-
PERCENTILE vs NTILE 완벽 비교 가이드Programing Language/SQL 2025. 6. 14. 11:51반응형
🎯 핵심 차이점 요약
구분PERCENTILENTILE
목적 특정 백분위값 구하기 데이터를 N등분하기 반환값 실제 데이터 값 그룹 번호 (1, 2, 3...) 사용법 PERCENTILE_CONT(0.5) NTILE(4) 결과 중간값, 75백분위값 등 1분위, 2분위, 3분위 등
📊 PERCENTILE 함수들
1. PERCENTILE_CONT (연속형)
sql-- 50백분위 (중간값) 구하기 SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary FROM employees; -- 25백분위, 75백분위 구하기 SELECT PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY salary) AS q1, PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY salary) AS median, PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY salary) AS q3 FROM employees;
2. PERCENTILE_DISC (이산형)
sql-- 실제 존재하는 값 중에서 백분위에 해당하는 값 SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary FROM employees;
3. PERCENT_RANK (순위를 백분율로)
sql-- 각 직원의 급여가 전체에서 몇 백분위인지 SELECT name, salary, PERCENT_RANK() OVER (ORDER BY salary) * 100 AS percentile_rank FROM employees;
🎲 NTILE 함수
기본 사용법
sql-- 데이터를 4등분 (quartile) SELECT name, salary, NTILE(4) OVER (ORDER BY salary) AS quartile FROM employees; -- 데이터를 10등분 (decile) SELECT name, salary, NTILE(10) OVER (ORDER BY salary) AS decile FROM employees; -- 데이터를 100등분 (percentile groups) SELECT name, salary, NTILE(100) OVER (ORDER BY salary) AS percentile_group FROM employees;
🔍 구체적 예시로 이해하기
샘플 데이터
sql-- 10명의 급여 데이터 (이미 정렬됨) ID | Name | Salary 1 | A | 1000 2 | B | 2000 3 | C | 3000 4 | D | 4000 5 | E | 5000 6 | F | 6000 7 | G | 7000 8 | H | 8000 9 | I | 9000 10 | J | 10000
PERCENTILE 결과
sql-- PERCENTILE_CONT 결과 SELECT PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY salary) AS q1, -- 3250 (실제값 없음, 보간) PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY salary) AS median, -- 5500 (실제값 없음, 보간) PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY salary) AS q3 -- 7750 (실제값 없음, 보간) FROM employees; -- PERCENTILE_DISC 결과 SELECT PERCENTILE_DISC(0.25) WITHIN GROUP (ORDER BY salary) AS q1, -- 3000 (실제 존재하는 값) PERCENTILE_DISC(0.50) WITHIN GROUP (ORDER BY salary) AS median, -- 5000 (실제 존재하는 값) PERCENTILE_DISC(0.75) WITHIN GROUP (ORDER BY salary) AS q3 -- 8000 (실제 존재하는 값) FROM employees;
NTILE 결과
sql-- NTILE(4) 결과 Name | Salary | NTILE(4) A | 1000 | 1 ← 1분위 (하위 25%) B | 2000 | 1 C | 3000 | 1 D | 4000 | 2 ← 2분위 (25-50%) E | 5000 | 2 F | 6000 | 2 G | 7000 | 3 ← 3분위 (50-75%) H | 8000 | 3 I | 9000 | 4 ← 4분위 (상위 25%) J | 10000 | 4
🤔 언제 무엇을 사용할까?
PERCENTILE을 사용하는 경우
1. 특정 백분위 값이 궁금할 때
sql-- "급여의 중간값은 얼마인가?" SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary FROM employees; -- "상위 10%의 급여 기준선은?" SELECT PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY salary) AS top_10_percent FROM employees;
2. 통계 분석이 목적일 때
sql-- 사분위수 분석 SELECT MIN(salary) AS min_val, PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY salary) AS q1, PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY salary) AS median, PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY salary) AS q3, MAX(salary) AS max_val FROM employees;
NTILE을 사용하는 경우
1. 데이터를 균등하게 나누고 싶을 때
sql-- 고객을 5개 등급으로 나누기 SELECT customer_id, total_purchase, NTILE(5) OVER (ORDER BY total_purchase DESC) AS customer_tier FROM customers;
2. 각 그룹에 레이블을 붙이고 싶을 때
sql-- 성적에 따른 등급 부여 SELECT student_id, score, CASE NTILE(5) OVER (ORDER BY score DESC) WHEN 1 THEN 'A' WHEN 2 THEN 'B' WHEN 3 THEN 'C' WHEN 4 THEN 'D' WHEN 5 THEN 'F' END AS grade FROM students;
3. 균등 분할이 필요한 비즈니스 로직
sql-- 영업팀을 4개 그룹으로 나누어 관리 SELECT employee_id, sales_amount, NTILE(4) OVER (ORDER BY sales_amount DESC) AS performance_group, CASE NTILE(4) OVER (ORDER BY sales_amount DESC) WHEN 1 THEN 'Top Performer' WHEN 2 THEN 'High Performer' WHEN 3 THEN 'Average Performer' WHEN 4 THEN 'Needs Improvement' END AS performance_category FROM sales_team;
🎯 대장균 문제에서 왜 NTILE을 썼나?
문제 요구사항 분석
- "상위 0% ~ 25%를 'CRITICAL'"
- "26% ~ 50%를 'HIGH'"
- "51% ~ 75%를 'MEDIUM'"
- "76% ~ 100%를 'LOW'"
NTILE이 적합한 이유
sql-- 정확히 4개 그룹으로 균등 분할 NTILE(4) OVER (ORDER BY SIZE_OF_COLONY DESC) -- 결과: -- 1 → CRITICAL (상위 25%) -- 2 → HIGH (26-50%) -- 3 → MEDIUM (51-75%) -- 4 → LOW (76-100%)
PERCENTILE로 했다면?
sql-- 복잡하고 실수하기 쉬움 CASE WHEN PERCENT_RANK() OVER (ORDER BY SIZE_OF_COLONY DESC) <= 0.25 THEN 'CRITICAL' WHEN PERCENT_RANK() OVER (ORDER BY SIZE_OF_COLONY DESC) <= 0.50 THEN 'HIGH' WHEN PERCENT_RANK() OVER (ORDER BY SIZE_OF_COLONY DESC) <= 0.75 THEN 'MEDIUM' ELSE 'LOW' END
🔄 MySQL에서의 제한사항
MySQL에서 지원하는 함수들
sql-- ✅ 지원됨 NTILE(n) OVER (ORDER BY column) PERCENT_RANK() OVER (ORDER BY column) -- ❌ 지원 안됨 (PostgreSQL, SQL Server에서만 지원) PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY column) PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY column)
MySQL에서 PERCENTILE 구현하기
sql-- 중간값 구하기 (MySQL 방식) SELECT AVG(salary) as median_salary FROM ( SELECT salary, ROW_NUMBER() OVER (ORDER BY salary) as rn, COUNT(*) OVER () as total_count FROM employees ) ranked WHERE rn IN (FLOOR((total_count + 1) / 2), CEIL((total_count + 1) / 2));
🎯 실무 활용 시나리오
고객 세그멘테이션
sql-- NTILE: 고객을 5개 등급으로 균등 분할 WITH customer_tiers AS ( SELECT customer_id, total_purchase, NTILE(5) OVER (ORDER BY total_purchase DESC) AS tier FROM customer_summary ) SELECT tier, COUNT(*) AS customer_count, MIN(total_purchase) AS min_purchase, MAX(total_purchase) AS max_purchase FROM customer_tiers GROUP BY tier;
성과 평가
sql-- PERCENT_RANK: 각 직원이 상위 몇 %인지 SELECT employee_id, performance_score, ROUND(PERCENT_RANK() OVER (ORDER BY performance_score DESC) * 100, 1) AS percentile FROM employee_performance WHERE percentile <= 10; -- 상위 10% 직원들
🏆 요약
PERCENTILE
- 목적: "몇 백분위 값이 얼마인가?"
- 결과: 실제 값
- 용도: 통계 분석, 기준값 설정
NTILE
- 목적: "데이터를 N개 그룹으로 나누자"
- 결과: 그룹 번호
- 용도: 등급 분류, 균등 분할
균등한 그룹 분할이 목적이라면 NTILE이 최적입니다! 🎯
반응형'Programing Language > SQL' 카테고리의 다른 글
2. JOIN + GROUP BY 문제 (0) 2025.06.14 1. 날짜/시간 처리 문제 (0) 2025.06.14 프로그래머스_대장균의 크기에 따라 분류하기 2 (0) 2025.06.14 DAYOFWEEK 함수 (0) 2025.06.14 CTE(Common Table Expression) 완벽 이해 가이드 (1) 2025.06.13