ORDER BY and OFFSET/FETCH Clauses (U-SQL)

 

Updated: September 19, 2017

U-SQL provides the optional ORDER BY clause to order its rowset on a SELECT expression and an OUTPUT statement.

The only reason to order inside a SELECT expression is to fetch a limited number of rows. The output rowset from a SELECT does not have a guaranteed order because it is not output to a final destination, but passed on through other processing steps that will not preserve the order. Thus, U-SQL in its current batch-mode form requires that the ORDER BY clause in a SELECT expression is only used in conjunction with an OFFSET/FETCH clause that fetches the limited set of rows.

To output with a specific order, use ORDER BY with the OUTPUT statement which has no such restrictions and makes the OFFSET/FETCH clause optional. For more information on the OUTPUT statement specific ordering and examples please refer to Output Statement (U-SQL). Note that the ORDER BY and OFFSET/FETCH semantics also apply to the OUTPUT statement’s ORDER BY clause.

Syntax
Order_By_Fetch_Clause :=                                                                                 
    Order_By_Clause Offset_Fetch.
Order_By_Clause := 'ORDER' 'BY' Sort_Item_Expression_List.

Semantics of Syntax Elements

  • Sort_Item_Expression_List
    The ORDER BY sort item expression list can refer to any of the columns in the SELECT’s rowset and can include any C# expression.

    Syntax
    Sort_Item_Expression_List :=                                                                        
         Sort_Item_Expression { ',' Sort_Item_Expression }.
    Sort_Item_Expression := expression [Sort_Direction].

    The order gets applied from left to right in the expression list and may either be sorted in ascending or descending order depending on the optionally specified sort direction. The default sort direction is ascending.

    U-SQL always orders null values last, regardless of the sort direction or data types.

  • Offset_Fetch
    The full syntax of the OFFSET/FETCH clause is:

    Syntax
    Offset_Fetch :=                                                                                     
           ['OFFSET' integer_or_long_literal ('ROW' | 'ROWS')] [Fetch].
    Fetch := 'FETCH' ['FIRST' | 'NEXT'] integer_or_long_literal ['ROW' | 'ROWS'] ['ONLY'].

    The OFFSET/FETCH clause is the ANSI SQL-conformant way to specify getting the first number of rows. U-SQL makes many of the keywords optional to minimize the amount of typing required. If the OFFSET x ROWS clause is not specified, it defaults to OFFSET 0 ROWS. If the FETCH clause is not specified, then all the rows starting after the skipped rows are being returned.

    System_CAPS_ICON_note.jpg Note

    There is no guarantee that different invocations with the same or different offsets operate from a single snapshot of the ordered rowset. The set of returned rows may be non-deterministically impacted if the order specification is not deterministic (e.g., the order by clause is under-specified, so that multiple rows can be ordered in the same local position) or the data changes between different invocations.

    The OFFSET and FETCH clauses allow specifying integer or long literal values. The value for the FETCH clause has to be in the range [1, 10000], otherwise an error is raised.

    System_CAPS_ICON_note.jpg Note

    Other SQL dialects may offer different syntactic options such as TOP() or a p-=.LIMIT clause, while U-SQL currently offers the FETCH clause.

Examples

  • The examples can be executed in Visual Studio with the Azure Data Lake Tools plug-in.
  • The scripts can be executed locally. An Azure subscription and Azure Data Lake Analytics account is not needed when executed locally.
  • The examples below are based on the dataset defined below. Ensure your execution includes the rowset variable.

Dataset

@employees = 
    SELECT * FROM 
        ( VALUES
        (1, "Noah",   100, (int?)10000, new DateTime(2012,05,31), "cell:030-0074321,office:030-0076545"),
        (2, "Sophia", 100, (int?)15000, new DateTime(2012,03,19), "cell:(5) 555-4729,office:(5) 555-3745"),
        (3, "Liam",   100, (int?)30000, new DateTime(2014,09,14), "cell:(5) 555-3932"),
        (4, "Amy",    100, (int?)35000, new DateTime(1999,02,27), "cell:(171) 555-7788,office:(171) 555-6750, home:(425) 555-6238"),
        (5, "Justin", 100, (int?)15000, new DateTime(2015,01,12), "cell:0921-12 34 65,office:0921-12 34 67"),
        (6, "Emma",   200, (int?)8000,  new DateTime(2014,03,08), (string)null),
        (7, "Jacob",  200, (int?)8000,  new DateTime(2014,09,02), ""),
        (8, "Olivia", 200, (int?)8000,  new DateTime(2013,12,11), "cell:88.60.15.31,office:88.60.15.32"),
        (9, "Mason",  300, (int?)50000, new DateTime(2016,01,01), "cell:(91) 555 22 82,office:(91) 555 91 99, home:(425) 555-2819"),
        (10, "Ava",   400, (int?)15000, new DateTime(2014,09,14), "cell:91.24.45.40,office:91.24.45.41"),
        (11, "Ethan", 400, (int?)9000,  new DateTime(2015,08,22), "cell:(604) 555-4729,office:(604) 555-3745"),
        (12, "David", 800, (int?)100,   new DateTime(2016,11,01), "cell:(171) 555-1212"),
        (13, "Andrew", 100, (int?)null, new DateTime(1995,07,16), "cell:(1) 135-5555,office:(1) 135-4892"),
        (14, "Jennie", 100, (int?)34000, new DateTime(2000,02,12), "cell:(5) 555-3392,office:(5) 555-7293")
        ) AS T(EmpID, EmpName, DeptID, Salary, StartDate, PhoneNumbers);

SELECT & ORDER BY using FETCH only
Returns first five records.

@result =
    SELECT *
    FROM @employees
    ORDER BY EmpID ASC
    FETCH 5 ROWS;    

OUTPUT @result
TO "/Output/ReferenceGuide/PrimaryRowsetExpressions/Select/OrderBy/exampleA.txt"
USING Outputters.Csv();

SELECT & ORDER BY using OFFSET and FETCH
Skip first two records, then return next five records.

@result =
    SELECT *
    FROM @employees
    ORDER BY EmpID ASC
    OFFSET 2 ROWS 
    FETCH 5 ROWS;

OUTPUT @result
TO "/Output/ReferenceGuide/PrimaryRowsetExpressions/Select/OrderBy/exampleB.txt"
USING Outputters.Csv();

SELECT & ORDER BY using OFFSET only
Skip first two records, then return remainder.

@result =
    SELECT *
    FROM @employees
    ORDER BY EmpID ASC
    OFFSET 2 ROWS; 

OUTPUT @result
TO "/Output/ReferenceGuide/PrimaryRowsetExpressions/Select/OrderBy/exampleC.txt"
USING Outputters.Csv();

OUTPUT & ORDER BY

OUTPUT @employees
TO "/Output/ReferenceGuide/PrimaryRowsetExpressions/Select/OrderBy/exampleD.txt"
ORDER BY EmpID ASC
OFFSET 2 ROW    // Can be used without FETCH
FETCH 5 ROWS    // Can be used without OFFSET
USING Outputters.Csv();

SELECT and OUTPUT using ORDER BY

// Order by EmpID ASC and select all but first two records
@result =
    SELECT *
    FROM @employees
    ORDER BY EmpID ASC
    OFFSET 2 ROWS;

// Then output remaining records in reverse EmpID order
OUTPUT @result
TO "/Output/ReferenceGuide/PrimaryRowsetExpressions/Select/OrderBy/exampleE.txt"
ORDER BY EmpID DESC
USING Outputters.Csv();

See Also

Community Additions

ADD
Show: