Table of contents
TOC
Collapse the table of content
Expand the table of content

PERCENTILE_CONT (U-SQL)

Michael Rys|Last Updated: 5/15/2018
|
4 Contributors

The PERCENTILE_CONT analytic function calculates a percentile based on a continuous distribution of the values in the window. The result is interpolated and might not be equal to any of the specific values in the column.

PERCENTILE_CONT can only be used in the context of a windowing expression.

Syntax
PERCENTILE_CONT_Expression :=                                                                            
     'PERCENTILE_CONT' '(' double_literal ')'
     'WITHIN' 'GROUP' '(' Simple_Order_By_Clause ')'.

Semantics of Syntax Elements

  • double_literal
    The percentile to compute. The value must range between 0.0 and 1.0 and be of type double.

  • WITHIN GROUP ( Simple_Order_By_Clause )
    Specifies a list of numeric values to sort and compute the percentile over.

    Syntax
    Simple_Order_By_Clause :=                                                                           
         'ORDER' 'BY' Sort_Item_Expression.
    Sort_Item_Expression := expression [Sort_Direction].
    Sort_Direction := 'ASC' | 'DESC'.

    Only one ORDER BY expression is allowed. The type of the expression has to be a nullable or nonnullable numeric type. The default sort order is ascending.

Return Type

The return type is double?.

Usage in Windowing Expression

This analytic function can be used in a windowing expression with the following restrictions:

  • Only the PARTITION BY clause can be specified with the OVER operator.
  • The ORDER BY clause in the OVER operator is not allowed.
  • The ROWS clause in the OVER operator is not allowed.

Example

  • The examples can be executed in Visual Studio with the Azure Data Lake Tools plug-in.
  • The scripts can be executed locally. An Azure subscription and Azure Data Lake Analytics account is not needed when executed locally.
  • The example below is based on the dataset defined below. Ensure your execution includes the rowset variable.

     @employees = 
         SELECT * FROM 
             ( VALUES
             (1, "Noah",   "Engineering", 100, 10000),
             (2, "Sophia", "Engineering", 100, 20000),
             (3, "Liam",   "Engineering", 100, 30000),
             (4, "Amy",    "Engineering", 100, 35000),
             (5, "Emma",   "HR",          200, 8000),
             (6, "Jacob",  "HR",          200, 8000),
             (7, "Olivia", "HR",          200, 8000),
             (8, "Mason",  "Executive",   300, 50000),
             (9, "Ava",    "Marketing",   400, 15000),
             (10, "Ethan", "Marketing",   400, 9000) 
             ) AS T(EmpID, EmpName, DeptName, DeptID, Salary);
    

Using PERCENTILE_CONT
The following example uses PERCENTILE_CONT to find the median employee salary in each department.

@result =
    SELECT DISTINCT DeptName,
                    PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY Salary) OVER(PARTITION BY DeptName) AS MedianCont
    FROM @employees;

OUTPUT @result
TO "/Output/ReferenceGuide/percentile_cont/example.csv"
USING Outputters.Csv();

See Also

© 2018 Microsoft