ORDER BY and OFFSET/FETCH Clauses (U-SQL)
Updated: September 19, 2017
U-SQL provides the optional ORDER BY clause to order its rowset on a SELECT expression and an OUTPUT statement.
The only reason to order inside a SELECT expression is to fetch a limited number of rows. The output rowset from a SELECT does not have a guaranteed order because it is not output to a final destination, but passed on through other processing steps that will not preserve the order. Thus, U-SQL in its current batch-mode form requires that the ORDER BY clause in a SELECT expression is only used in conjunction with an OFFSET/FETCH clause that fetches the limited set of rows.
To output with a specific order, use ORDER BY with the OUTPUT statement which has no such restrictions and makes the OFFSET/FETCH clause optional. For more information on the OUTPUT statement specific ordering and examples please refer to Output Statement (U-SQL). Note that the ORDER BY and OFFSET/FETCH semantics also apply to the OUTPUT statement’s ORDER BY clause.
| Syntax |
|---|
Order_By_Fetch_Clause :=
Order_By_Clause Offset_Fetch. |
Semantics of Syntax Elements
Sort_Item_Expression_List
The ORDER BY sort item expression list can refer to any of the columns in the SELECT’s rowset and can include any C# expression.Syntax Sort_Item_Expression_List := Sort_Item_Expression { ',' Sort_Item_Expression }.
Sort_Item_Expression := expression [Sort_Direction].The order gets applied from left to right in the expression list and may either be sorted in ascending or descending order depending on the optionally specified sort direction. The default sort direction is ascending.
U-SQL always orders null values last, regardless of the sort direction or data types.
Offset_Fetch
The full syntax of theOFFSET/FETCHclause is:Syntax Offset_Fetch := ['OFFSET' integer_or_long_literal ('ROW' | 'ROWS')] [Fetch].
Fetch := 'FETCH' ['FIRST' | 'NEXT'] integer_or_long_literal ['ROW' | 'ROWS'] ['ONLY'].The
OFFSET/FETCHclause is the ANSI SQL-conformant way to specify getting the first number of rows. U-SQL makes many of the keywords optional to minimize the amount of typing required. If theOFFSET x ROWSclause is not specified, it defaults toOFFSET 0 ROWS. If theFETCHclause is not specified, then all the rows starting after the skipped rows are being returned.
Note There is no guarantee that different invocations with the same or different offsets operate from a single snapshot of the ordered rowset. The set of returned rows may be non-deterministically impacted if the order specification is not deterministic (e.g., the order by clause is under-specified, so that multiple rows can be ordered in the same local position) or the data changes between different invocations.
The
OFFSETandFETCHclauses allow specifying integer or long literal values. The value for theFETCHclause has to be in the range [1, 10000], otherwise an error is raised.
Note Other SQL dialects may offer different syntactic options such as
TOP()or a p-=.LIMITclause, while U-SQL currently offers theFETCHclause.
Examples
- The examples can be executed in Visual Studio with the Azure Data Lake Tools plug-in.
- The scripts can be executed locally. An Azure subscription and Azure Data Lake Analytics account is not needed when executed locally.
- The examples below are based on the dataset defined below. Ensure your execution includes the rowset variable.
Dataset
@employees =
SELECT * FROM
( VALUES
(1, "Noah", 100, (int?)10000, new DateTime(2012,05,31), "cell:030-0074321,office:030-0076545"),
(2, "Sophia", 100, (int?)15000, new DateTime(2012,03,19), "cell:(5) 555-4729,office:(5) 555-3745"),
(3, "Liam", 100, (int?)30000, new DateTime(2014,09,14), "cell:(5) 555-3932"),
(4, "Amy", 100, (int?)35000, new DateTime(1999,02,27), "cell:(171) 555-7788,office:(171) 555-6750, home:(425) 555-6238"),
(5, "Justin", 100, (int?)15000, new DateTime(2015,01,12), "cell:0921-12 34 65,office:0921-12 34 67"),
(6, "Emma", 200, (int?)8000, new DateTime(2014,03,08), (string)null),
(7, "Jacob", 200, (int?)8000, new DateTime(2014,09,02), ""),
(8, "Olivia", 200, (int?)8000, new DateTime(2013,12,11), "cell:88.60.15.31,office:88.60.15.32"),
(9, "Mason", 300, (int?)50000, new DateTime(2016,01,01), "cell:(91) 555 22 82,office:(91) 555 91 99, home:(425) 555-2819"),
(10, "Ava", 400, (int?)15000, new DateTime(2014,09,14), "cell:91.24.45.40,office:91.24.45.41"),
(11, "Ethan", 400, (int?)9000, new DateTime(2015,08,22), "cell:(604) 555-4729,office:(604) 555-3745"),
(12, "David", 800, (int?)100, new DateTime(2016,11,01), "cell:(171) 555-1212"),
(13, "Andrew", 100, (int?)null, new DateTime(1995,07,16), "cell:(1) 135-5555,office:(1) 135-4892"),
(14, "Jennie", 100, (int?)34000, new DateTime(2000,02,12), "cell:(5) 555-3392,office:(5) 555-7293")
) AS T(EmpID, EmpName, DeptID, Salary, StartDate, PhoneNumbers);
SELECT & ORDER BY using FETCH only
Returns first five records.
@result =
SELECT *
FROM @employees
ORDER BY EmpID ASC
FETCH 5 ROWS;
OUTPUT @result
TO "/Output/ReferenceGuide/PrimaryRowsetExpressions/Select/OrderBy/exampleA.txt"
USING Outputters.Csv();
SELECT & ORDER BY using OFFSET and FETCH
Skip first two records, then return next five records.
@result =
SELECT *
FROM @employees
ORDER BY EmpID ASC
OFFSET 2 ROWS
FETCH 5 ROWS;
OUTPUT @result
TO "/Output/ReferenceGuide/PrimaryRowsetExpressions/Select/OrderBy/exampleB.txt"
USING Outputters.Csv();
SELECT & ORDER BY using OFFSET only
Skip first two records, then return remainder.
@result =
SELECT *
FROM @employees
ORDER BY EmpID ASC
OFFSET 2 ROWS;
OUTPUT @result
TO "/Output/ReferenceGuide/PrimaryRowsetExpressions/Select/OrderBy/exampleC.txt"
USING Outputters.Csv();
OUTPUT & ORDER BY
OUTPUT @employees TO "/Output/ReferenceGuide/PrimaryRowsetExpressions/Select/OrderBy/exampleD.txt" ORDER BY EmpID ASC OFFSET 2 ROW // Can be used without FETCH FETCH 5 ROWS // Can be used without OFFSET USING Outputters.Csv();
SELECT and OUTPUT using ORDER BY
// Order by EmpID ASC and select all but first two records
@result =
SELECT *
FROM @employees
ORDER BY EmpID ASC
OFFSET 2 ROWS;
// Then output remaining records in reverse EmpID order
OUTPUT @result
TO "/Output/ReferenceGuide/PrimaryRowsetExpressions/Select/OrderBy/exampleE.txt"
ORDER BY EmpID DESC
USING Outputters.Csv();