ABOUT ME

-

Today
-
Yesterday
-
Total
-
  • PERCENTILE vs NTILE 완벽 비교 가이드
    Programing Language/SQL 2025. 6. 14. 11:51
    반응형

    🎯 핵심 차이점 요약

    구분PERCENTILENTILE

    목적 특정 백분위값 구하기 데이터를 N등분하기
    반환값 실제 데이터 값 그룹 번호 (1, 2, 3...)
    사용법 PERCENTILE_CONT(0.5) NTILE(4)
    결과 중간값, 75백분위값 등 1분위, 2분위, 3분위 등

    📊 PERCENTILE 함수들

    1. PERCENTILE_CONT (연속형)

     
    sql
    -- 50백분위 (중간값) 구하기
    SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary
    FROM employees;
    
    -- 25백분위, 75백분위 구하기  
    SELECT 
        PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY salary) AS q1,
        PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY salary) AS median,
        PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY salary) AS q3
    FROM employees;

    2. PERCENTILE_DISC (이산형)

     
    sql
    -- 실제 존재하는 값 중에서 백분위에 해당하는 값
    SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary
    FROM employees;

    3. PERCENT_RANK (순위를 백분율로)

     
    sql
    -- 각 직원의 급여가 전체에서 몇 백분위인지
    SELECT name, salary,
           PERCENT_RANK() OVER (ORDER BY salary) * 100 AS percentile_rank
    FROM employees;

    🎲 NTILE 함수

    기본 사용법

     
    sql
    -- 데이터를 4등분 (quartile)
    SELECT name, salary,
           NTILE(4) OVER (ORDER BY salary) AS quartile
    FROM employees;
    
    -- 데이터를 10등분 (decile)
    SELECT name, salary,
           NTILE(10) OVER (ORDER BY salary) AS decile
    FROM employees;
    
    -- 데이터를 100등분 (percentile groups)
    SELECT name, salary,
           NTILE(100) OVER (ORDER BY salary) AS percentile_group
    FROM employees;

    🔍 구체적 예시로 이해하기

    샘플 데이터

     
    sql
    -- 10명의 급여 데이터 (이미 정렬됨)
    ID | Name | Salary
    1  | A    | 1000
    2  | B    | 2000  
    3  | C    | 3000
    4  | D    | 4000
    5  | E    | 5000
    6  | F    | 6000
    7  | G    | 7000
    8  | H    | 8000
    9  | I    | 9000
    10 | J    | 10000

    PERCENTILE 결과

     
    sql
    -- PERCENTILE_CONT 결과
    SELECT 
        PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY salary) AS q1,    -- 3250 (실제값 없음, 보간)
        PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY salary) AS median, -- 5500 (실제값 없음, 보간)
        PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY salary) AS q3     -- 7750 (실제값 없음, 보간)
    FROM employees;
    
    -- PERCENTILE_DISC 결과  
    SELECT 
        PERCENTILE_DISC(0.25) WITHIN GROUP (ORDER BY salary) AS q1,    -- 3000 (실제 존재하는 값)
        PERCENTILE_DISC(0.50) WITHIN GROUP (ORDER BY salary) AS median, -- 5000 (실제 존재하는 값)
        PERCENTILE_DISC(0.75) WITHIN GROUP (ORDER BY salary) AS q3     -- 8000 (실제 존재하는 값)
    FROM employees;

    NTILE 결과

     
    sql
    -- NTILE(4) 결과
    Name | Salary | NTILE(4)
    A    | 1000   | 1    ← 1분위 (하위 25%)
    B    | 2000   | 1    
    C    | 3000   | 1    
    D    | 4000   | 2    ← 2분위 (25-50%)
    E    | 5000   | 2    
    F    | 6000   | 2    
    G    | 7000   | 3    ← 3분위 (50-75%)
    H    | 8000   | 3    
    I    | 9000   | 4    ← 4분위 (상위 25%)
    J    | 10000  | 4

    🤔 언제 무엇을 사용할까?

    PERCENTILE을 사용하는 경우

    1. 특정 백분위 값이 궁금할 때

     
    sql
    -- "급여의 중간값은 얼마인가?"
    SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary
    FROM employees;
    
    -- "상위 10%의 급여 기준선은?"
    SELECT PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY salary) AS top_10_percent
    FROM employees;

    2. 통계 분석이 목적일 때

     
    sql
    -- 사분위수 분석
    SELECT 
        MIN(salary) AS min_val,
        PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY salary) AS q1,
        PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY salary) AS median,
        PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY salary) AS q3,
        MAX(salary) AS max_val
    FROM employees;

    NTILE을 사용하는 경우

    1. 데이터를 균등하게 나누고 싶을 때

     
    sql
    -- 고객을 5개 등급으로 나누기
    SELECT customer_id, total_purchase,
           NTILE(5) OVER (ORDER BY total_purchase DESC) AS customer_tier
    FROM customers;

    2. 각 그룹에 레이블을 붙이고 싶을 때

     
    sql
    -- 성적에 따른 등급 부여
    SELECT student_id, score,
           CASE NTILE(5) OVER (ORDER BY score DESC)
               WHEN 1 THEN 'A'
               WHEN 2 THEN 'B' 
               WHEN 3 THEN 'C'
               WHEN 4 THEN 'D'
               WHEN 5 THEN 'F'
           END AS grade
    FROM students;

    3. 균등 분할이 필요한 비즈니스 로직

     
    sql
    -- 영업팀을 4개 그룹으로 나누어 관리
    SELECT employee_id, sales_amount,
           NTILE(4) OVER (ORDER BY sales_amount DESC) AS performance_group,
           CASE NTILE(4) OVER (ORDER BY sales_amount DESC)
               WHEN 1 THEN 'Top Performer'
               WHEN 2 THEN 'High Performer'  
               WHEN 3 THEN 'Average Performer'
               WHEN 4 THEN 'Needs Improvement'
           END AS performance_category
    FROM sales_team;

    🎯 대장균 문제에서 왜 NTILE을 썼나?

    문제 요구사항 분석

    • "상위 0% ~ 25%를 'CRITICAL'"
    • "26% ~ 50%를 'HIGH'"
    • "51% ~ 75%를 'MEDIUM'"
    • "76% ~ 100%를 'LOW'"

    NTILE이 적합한 이유

     
    sql
    -- 정확히 4개 그룹으로 균등 분할
    NTILE(4) OVER (ORDER BY SIZE_OF_COLONY DESC)
    
    -- 결과:
    -- 1 → CRITICAL (상위 25%)
    -- 2 → HIGH (26-50%)  
    -- 3 → MEDIUM (51-75%)
    -- 4 → LOW (76-100%)

    PERCENTILE로 했다면?

     
    sql
    -- 복잡하고 실수하기 쉬움
    CASE 
        WHEN PERCENT_RANK() OVER (ORDER BY SIZE_OF_COLONY DESC) <= 0.25 THEN 'CRITICAL'
        WHEN PERCENT_RANK() OVER (ORDER BY SIZE_OF_COLONY DESC) <= 0.50 THEN 'HIGH'
        WHEN PERCENT_RANK() OVER (ORDER BY SIZE_OF_COLONY DESC) <= 0.75 THEN 'MEDIUM'
        ELSE 'LOW'
    END

    🔄 MySQL에서의 제한사항

    MySQL에서 지원하는 함수들

     
    sql
    -- ✅ 지원됨
    NTILE(n) OVER (ORDER BY column)
    PERCENT_RANK() OVER (ORDER BY column)
    
    -- ❌ 지원 안됨 (PostgreSQL, SQL Server에서만 지원)
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY column)
    PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY column)

    MySQL에서 PERCENTILE 구현하기

     
    sql
    -- 중간값 구하기 (MySQL 방식)
    SELECT AVG(salary) as median_salary
    FROM (
        SELECT salary,
               ROW_NUMBER() OVER (ORDER BY salary) as rn,
               COUNT(*) OVER () as total_count
        FROM employees
    ) ranked
    WHERE rn IN (FLOOR((total_count + 1) / 2), CEIL((total_count + 1) / 2));

    🎯 실무 활용 시나리오

    고객 세그멘테이션

     
    sql
    -- NTILE: 고객을 5개 등급으로 균등 분할
    WITH customer_tiers AS (
        SELECT customer_id, total_purchase,
               NTILE(5) OVER (ORDER BY total_purchase DESC) AS tier
        FROM customer_summary
    )
    SELECT tier,
           COUNT(*) AS customer_count,
           MIN(total_purchase) AS min_purchase,
           MAX(total_purchase) AS max_purchase
    FROM customer_tiers
    GROUP BY tier;

    성과 평가

     
    sql
    -- PERCENT_RANK: 각 직원이 상위 몇 %인지
    SELECT employee_id, performance_score,
           ROUND(PERCENT_RANK() OVER (ORDER BY performance_score DESC) * 100, 1) AS percentile
    FROM employee_performance
    WHERE percentile <= 10;  -- 상위 10% 직원들

    🏆 요약

    PERCENTILE

    • 목적: "몇 백분위 값이 얼마인가?"
    • 결과: 실제 값
    • 용도: 통계 분석, 기준값 설정

    NTILE

    • 목적: "데이터를 N개 그룹으로 나누자"
    • 결과: 그룹 번호
    • 용도: 등급 분류, 균등 분할

    균등한 그룹 분할이 목적이라면 NTILE이 최적입니다! 🎯

    반응형

    댓글

Designed by Tistory.