Table of contents
TOC
Collapse the table of content
Expand the table of content

OVER Expression (U-SQL)

Michael Rys|Last Updated: 5/15/2018
|
4 Contributors

A windowing expression is an expression whose value is computed by applying for every row a window function to multiple row values of a column (the window defined by the OVER operator) instead of just the column’s value of the row.

Note

The OVER expression is sometimes referred to as an OVER Clause to make it search discoverable.

The OVER operator allows to specify a partition or window of column data that the window functions operate on. The windows provide the ability to access data from previous rows without having to use a self-join and provide both cumulative and sliding windows. It thus provides the ability to compute aggregated values such as moving averages, cumulative aggregates, running totals, or a top N per group results.

It is similar to aggregation over grouping but instead of grouping the whole rowset with GROUP BY and then returning a single aggregated value for each group, the window defines a set of rows for a particular column in the SELECT clause that will be aggregated for each row. In the GROUP BY case, the result will be a row per group, with windowing expressions, the number of rows returned by the SELECT will not be affected by the windows.

Windowing expressions are only supported inside expressions in a SELECT FROM clause. More than one windowing expression can be used in a single query with a single FROM clause. The OVER operator for each function can differ in partitioning and ordering.

A windowing expression can never be used if a GROUP BY clause is present.

Syntax
Windowing_Expression :=                                                                                  
     Window_Function_Call 'OVER' '('
          [ Over_Partition_By_Clause ]
          [ Order_By_Clause ]
          [ Row_Clause ]
     ')'.

Semantics of Syntax Elements

  • Window_Function_Call
    The window function that is being applied to the window.

    Syntax
    Window_Function_Call :=                                                                             
          Aggregate_Function_Call
    |     Analytic_Function_Call
    |     Ranking_Function_Call.
    

    Window functions can be one of the following:

    Aggregation functions applied to a window cannot be used with DISTINCT.

  • OVER ( … )
    The OVER operator that specifies the window with the following components. Note that certain windowing functions have certain requirements with respect to the presence or absence of some if these components. These requirements are explained in the relevant section describing that function.

  • Over_Partition_By_Clause
    The OVER operator’s optional partition clause defines the window by partitioning the rowset. The window function is applied to each partition separately and computation restarts for each partition.

    Syntax
    Over_Partition_By_Clause :=                                                                         
         'PARTITION' 'BY' Expression_List.
    

    The data can be partitioned according to a list of scalar expressions. The result type of each of the expression has to return an equality comparable type or an error is raised. Most commonly, the partition expressions refer to the rowset’s columns (as specified by the FROM clause and not the columns from the SELECT clause).

    If PARTITION BY is not specified, the function treats all rows of the query result set as a single group.

  • Order_By_Clause
    The OVER operator’s optional ORDER BY clause defines the order of the rows withing each of the windows.

    Syntax
    Order_By_Clause :=                                                                                  
         'ORDER' 'BY' Sort_Item_Expression_List.
    Sort_Item_Expression_List := Sort_Item_Expression { ',' Sort_Item_Expression }.
    Sort_Item_Expression := expression [Sort_Direction].

    The syntax and semantics follows the normal ORDER BY clause. For window functions that depend on the order such as the ranking functions, this order by clause specifies the logical order in which the window function calculation is performed.

    If the ORDER BY clause is not specified, then the window is not ordered.

  • Row_Clause
    The OVER operator’s optional ROWS clause further limits the rows within a partition by specifying fixed number of rows preceding or following the current row. Preceding and following rows are defined based on the ordering in the ORDER BY clause. Therefore, the ROWS clause requires that the ORDER BY clause be specified.

    Syntax
    Row_Clause :=                                                                                       
         'ROWS' Window_Frame_Extent.
    

    If the ROWS clause is not specified but ORDER BY is specified, ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is used as the default for the window frame.

    The ROWS clause is incompatible with the DISTINCT aggregate option and an error is raised.

    • Window_Frame_Extent
      The window frame extent specifies the rows specifying a lower bound and an upper bound by either just specifying the preceding rows from the current row or providing a frame between two explicitly specified end points.

      Syntax
      Window_Frame_Extent :=                                                                         
           Window_Frame_Preceding | Window_Frame_Between.
      

      For example, ROWS BETWEEN 2 PRECEDING AND CURRENT ROW means that the window of rows that the function operates on is three rows in size, starting with 2 rows preceding until and including the current row.

      • Window_Frame_Preceding
        Defines the window frame relative from the current row. Currently supported are:

        Syntax
        Window_Frame_Preceding :=                                                                 
             'UNBOUNDED' 'PRECEDING'
        |    unsigned_integer_literal 'PRECEDING' 
        |    'CURRENT' 'ROW'.
        

        With the following semantics:

        • UNBOUNDED PRECEDING
          Specifies that the window starts at the first row of the partition. It can only be specified as the window starting point.

        • unsigned_integer_literal PRECEDING
          The nonnegative integer literal specifies the number of rows to precede the current row in the partition. If the value is larger than the available number of preceding rows in the partition, then the window starts at the first row of the partition.

        • CURRENT ROW
          Specifies that the window starts or ends at the current row. It can be specified as both a starting and ending point.

      • Window_Frame_Between
        Specifies the lower (starting) and upper (ending) boundary points of the window.

        Syntax
        Window_Frame_Between :=                                                                   
             'BETWEEN' Window_Frame_Bound
             'AND' Window_Frame_Bound.
        Window_Frame_Bound := Window_Frame_Preceding | Window_Frame_Following.

        The first frame bound specifies the lower, the second frame bound the upper boundary (inclusive). If the upper boundary is smaller than the lower boundary, an error is raised. Note that a frame can be all preceding or all following the current row. The frame bounds are specified as follows:

        • Window_Frame_Preceding
          Specifies that the frame starts or ends before the current row as above

        • Window_Frame_Following
          Specifies that the frame starts or ends following the current row:

          Syntax
          Window_Frame_Following :=                                                            
               unsigned_integer_literal 'FOLLOWING' 
          |    'CURRENT' 'ROW'.
          

          With the following semantics:

          • unsigned_integer_literal FOLLOWING
            The nonnegative integer literal specifies the number of rows to follow the current row in the partition. If the value is larger than the available number of following rows in the partition, then last row of the partition is chosen as the boundary.

          • CURRENT ROW
            Specifies that the window boundary is at the current row.

            Tip

            U-SQL does currently not support UNBOUNDED FOLLOWING. One can achieve the same result by inverting the ordering and use UNBOUNDED PRECEDING instead.

Examples

Sample Data

It contains two views that we will use for the sample data.

  • QueryLog
    To access this data use the code below.
    sql querylog = Demo.QueryLog();

data:

FruitQuantitySource
Banana300Image
Cherry300Image
Durian500Image
Apple100Web
Fig200Web
Papay200Web
Avocado300Web
Cherry400Web
Durian500Web
  • Employees
    To access this data use the code below.
    sql employees = Demo.Employees();

data:

EmpIDEmpNameDeptNameDeptIDSalary
1NoahEngineering10010000
2SophiaEngineering10020000
3LiamEngineering10030000
4EmmaHR20010000
5JacobHR20010000
6OliviaHR20010000
7MasonExecutive30050000
8AvaMarketing40015000
9EthanMarketing40010000

Now, let’s walk through windowing functions and aggregations. As we do this the answer to this question will be clear and we’ll also understand the conceptual difference of Windowing Functions from Grouping.

data3 =         
    SELECT EmpName, SUM(Salary) OVER( ) As SalaryAllDepts        
    FROM employees; 

Note these things:

  • We are still doing a SUM() but it is modified by an OVER clause
  • The OVER clause is empty – there’s nothing between the parentheses
  • The OVER clause defines the “window” – in this case since it is empty the window is the entire set of rows
  • Thus you can read SUM(Salary) OVER () as “The sum of Salary across the window of rows”
EmpNameTotalAllDepts
Noah165000
Sophia165000
Liam165000
Emma165000
Jacob165000
Olivia165000
Mason165000
Ava165000
Ethan165000

When we did the first SUM without any GROUP BY we received the correct total salary 165000. But grouping collapsed that into a single row. Here we see there are as many output rows as input rows. More interestingly the 165000 value is clearly calculated in each row but to calculate it data from EVERY row was used.

Another way of thinking about this is that each output row has knowledge of multiple input rows - in this all of the input rows. The “window” is all the rows. Now let’s try an OVER clause that contains something. In this case by using PARTITION BY DeptName our window is based on DeptName.

data4 =         
    SELECT EmpName, DeptName, SUM(Salary) OVER( PARTITION BY DeptName ) AS SalaryByDept  
    FROM employees; 

result:

EmpNameDeptNameSalaryByDept:double
NoahEngineering60000
SophiaEngineering60000
LiamEngineering60000
MasonExecutive50000
EmmaHR30000
JacobHR30000
OliviaHR30000
AvaMarketing25000
EthanMarketing25000

Again, notice that there are the same number of input rows as output rows. However now each row has a Department total - again this is the cause the window for SUM was defined on the DeptName. Key points

  • grouping collapses input rows
  • with grouping aggregation happens on the entire set of rows or what is specified by the GROUP BY clause
  • windowing does not collapse rows
  • with windowing aggregations happen on the window defined by the OVER clause – either all rows and the rows defined by the window
  • OVER( ) – defines a window of all rows
  • OVER( PARTITION BY X ) – defines a window on column X

Now let’s calculate the percentage of salary devoted to each department form the total salary across all departments. We can’t mix Windowing Functions and GROUP BY in the same statement so we’ll build it up like this:

a =         
    SELECT TOP 1 SUM(Salary) OVER( ) AS SalaryAllDepts        
    FROM employees;     

b =           
    SELECT DeptName, SUM(Salary) AS SalaryByDept         
    FROM employees         
    GROUP BY DeptName;     

c = 
    SELECT DeptName, SalaryByDept, SalaryAllDepts, (SalaryByDept/SalaryAllDepts) AS Percentage         
    FROM a CROSS JOIN b;

result:

DeptNameSalaryByDeptSalaryAllDeptsPercentage
Engineering600001650000.363636363636364
Executive500001650000.303030303030303
HR300001650000.181818181818182
Marketing250001650000.151515151515152

See also

© 2018 Microsoft