Table of contents
Collapse the table of content
Expand the table of content

COMBINE Expression (U-SQL)

Michael Rys|Last Updated: 5/15/2018
4 Contributors

U-SQL provides the ability to write custom rowset combiners in C# using the user-defined operator extensibility framework by implementing an ICombiner. See U-SQL Programmability Guide: User-Defined Combiner for more information.

A combiner provides a way to implement custom join operators that are more complex than the standard U-SQL join expressions.

[//]: # 'Add link to join section'

A combiner is being invoked with the COMBINE expression that provide the necessary information about both the input rowsets, the join clause, the expected result schema as well as additional information.

A combiner provides limited optimization support, because an optimizer cannot reason about the procedural C# code defining the combiner For example, it cannot not push predicates through to earlier statements unless the column used in the predicate is marked as read only. Therefore, it is recommended to instead use the standard joins if possible or use the READONLY and REQURED clauses.

Combine_Expression :=                                                                                    
    'COMBINE' Combine_Input                            
    'WITH' Combine_InputJoin_On_ClauseProduce_Clause                                 

Semantics of Syntax Elements

  • Combine_Input
    Specifies the input rowsets that the combiner will join.

[//]: # 'ADD SOON: with an optional presort clause. [Presort_Clause]'

  Combine_Input :=                                                                                    


  • Aliased_Rowset
    Aliased rowsets are rowsets that may or may not need a table alias.

    Aliased_Rowset :=                                                                              
        Rowset [Alias] | Rowset_Expression Alias.


    • Alias is the rowset alias given to the rowset that can be used in the remainder of the COMBINE expression to refer to that specific rowset. It can be a quoted or unquoted identifier:

      Alias :=                                                                                  
          'AS' Quoted_or_Unquoted_Identifier.

    • Rowset
      The two simplest rowset sources are a rowset variable such as @rowset that has been defined in a previous statement of the script or a table that has been created in the account’s catalog:

      Rowset :=                                                                                 
          Rowset_Variable | Identifier.

      A table can be referenced either with its fully qualified 3-part name, within the current database context with a 2-part name, or within the current database and schema context with a single-part name. Optionally, a table alias can be provided for an input rowset variable or table which then can be used in the remainder of the COMBINE expression.

      Providing a rowset alias is optional.

    • Rowset_Expression
      U-SQL also provides the ability to combine nested query expressions, table-valued function calls or querying external rowsets. Follow the links for more details on each.

      Rowset_Expression :=                                                                      
          '(' Query_Expression ')' 
      |   Function_Call
      |   External_Rowset_Expression.

      In these cases, a rowset alias has to be provided and cannot be left out.
      The UDO programming model makes both the values and the schema of the input rowsets available in the context of the combiner's implementation.

[//]: # '• Presort_Clause
Content is under development and will be provided soon.
Presort_Clause := 'PRESORT' Sort_Item_List.'

  • Join_On_Clause
    Specifies the join comparison that is being used to identify which of the rows from each rowset will be joined.

    Join_On_Clause :=                                                                                   
        'ON' Join_Comparison_Expression.

    where Join_Comparison_Expression provides the join comparison expression for the combiner.

  • Produce_Clause
    Specifies the rowset schema returned by the COMBINE expression.

    Produce_Clause :=                                                                                   
        'PRODUCE' Column_Definition_List.


    • Column_Definition_List
      This list defines the schema of the combiner. The returned columns are defined as a pair of column names and column types:

      Column_Definition_List :=                                                                      
           Column_Definition { ',' Column_Definition}.
      Column_Definition := Quoted_or_Unquoted_IdentifierBuilt_in_Type.

      Each column has an identifier that can be either a quoted or unquoted identifier. A column is typed with one of the U-SQL types that the combiner supports.

      The UDO programming model makes the specified rowset schema available to the implementation of the combiner. An error is raised if the combiner is producing a schema that is incompatible with the specified return schema.

  • Readonly_Clause
    The optional READONLY clause can help the UDO programmer to write more efficient code. For more information on how the UDO programmer can take advantage of this hint, see the U-SQL C# Developer’s Guide.

    The optional READONLY clause specifies the columns are read only for the combiner and will be passed through to the output using either the same name or the specified column name in parenthesis. Only columns in the combine expression’s ON clause can be marked READONLY, otherwise the error “E_CSC_USER_UDOREADONLYNOTKEYCOLUMN: Column '…' cannot be marked as READONLY” is raised.

    Readonly_Clause :=                                                                                  
        'READONLY' Star_Or_Readonly_Column_List.
    Star_Or_Readonly_Column_List := '*' | Readonly_Column_List.
    Readonly_Column_List := Readonly_Column { ',' Readonly_Column }.
    Readonly_Column := Column_Identifier [Output_Column_Dependency_Alias].
    Output_Column_Dependency_Alias := '(' Quoted_or_Unquoted_Identifier ')'.
  • Required_Clause
    The optional REQUIRED clause can help the UDO programmer to write more efficient code. For more information on how the UDO programmer can take advantage of this hint, see the U-SQL C# Developer’s Guide.

    The optional REQUIRED clause specifies that either all columns are required on input for the combiner (if specified with *) or the specified columns are required. If a specified column is followed by a list of columns in parenthesis, then the input column is only required if the columns in that list are referenced from the output.

    Required_Clause :=                                                                                  
        'REQUIRED' Star_Or_Required_Column_List.
    Star_Or_Required_Column_List := '*' | Required_Column_List.
    Required_Column_List := Required_Column { ',' Required_Column}.
    Required_Column := Column_Identifier [Required_Output_Column_Dependency_List].
    Required_Output_Column_Dependency_List := '(' Identifier_List ')'.
  • Using_Clause
    The USING clause specifies which combiner should be used to transform the input rowset.

    USING_Clause :=                                                                                     
        'USING' udo_expression.

    The USING clause takes a C# expression that returns an instance of ICombiner. Users can write their own by implementing an ICombiner (see U-SQL Programmability Guide: User-Defined Combiner for more detail on how to write your own combiner). Most commonly, the UDO expression is either the instantiation of a combiner class of the form

    USING new MyNameSpace.MyCombiner(parameter:"value")

    or the invocation of a factory method

    USING MyNameSpace.MyCombinerFactory(parameter:"value")

    where parameter is a parameter of the combiner.


User-Defined Combiner - CombinerEX
c# code is placed in the associated Code-Behind .cs file. See usage in next section, below.

using Microsoft.Analytics.Interfaces;
using Microsoft.Analytics.Types.Sql;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ReferenceGuide_Examples
    public class CombinerEX : ICombiner
        public override IEnumerable<IRow> Combine(IRowset left, IRowset right, IUpdatableRow output)
            var theRight = (from row in right.Rows
                            select new
                                Division = row.Get<string>("Division"),
                                carIDstart = row.Get<int>("carIDstart"),
                                carIDend = row.Get<int>("carIDend")

            foreach (var row in left.Rows)
                int carID = row.Get<int>("carID");

                var theLeft = (from w in theRight
                               where carID >= w.carIDstart && carID <= w.carIDend
                               select new
                                   Division = w.Division

                output.Set<string>("currentModel", row.Get<string>("currentModel"));
                output.Set<string>("introYear", row.Get<string>("introYear"));
                output.Set<string>("Division", theLeft[0].Division);
                yield return output.AsReadOnly();

Using User-Defined Combiner - CombinerEX
Using Code-Behind from previous section, above.

@right = 
        ( VALUES
        (0, 99, "Ford Motor Company", "Ford"),
        (100, 199, "Ford Motor Company", "Lincoln"),
        (200, 299, "Ford Motor Company", "Motorcraft"),
        (300, 399, "General Motors Company", "Buick"),
        (400, 499, "General Motors Company", "Chevrolet"),
        (500, 599, "General Motors Company", "Cadillac"),
        (600, 699, "General Motors Company", "GMC"),
        (700, 799, "Fiat Chrysler", "Chrysler"),
        (800, 899, "Fiat Chrysler", "Dodge"),
        (900, 999, "Fiat Chrysler", "Jeep"),
        (1000, 1099, "Fiat Chrysler", "Ram"),
        (1100, 1199, "Fiat Chrysler", "Mopar"),
        (1200, 1299, "Fiat Chrysler", "SRT")
        ) AS T(carIDstart, carIDend, Automaker, Division);

@left = 
        ( VALUES
        (3, 0, "Mustang", "1964"),
        (7, 0, "Fiesta", "1976"),
        (133, 100, "Navigator", "1998"),
        (160, 100, "Continental", "2017"),
        (639, 600, "Canyon", "2004"),
        (801, 800, "Challenger", "2008"),
        (802, 800, "Charger", "2006")
        ) AS T(carID, baseID, currentModel, introYear);

@result =
    COMBINE @left AS l
    WITH @right AS r
    ON l.baseID == r.carIDstart
    PRODUCE currentModel string,
            Division string,
            introYear string
    REQUIRED Division, carIDstart, carIDend, carID, currentModel, introYear
    USING new ReferenceGuide_Examples.CombinerEX();

OUTPUT @result
TO "/ReferenceGuide/QSE/PrimaryRowsetExpressions/Combine/ExampleA.txt"
USING Outputters.Tsv(outputHeader: true);

Combiner with ORDER BY and FETCH
The ORDER BY clause with FETCH allows the selection of a limited number of rows based on the specified order. This examples continues to use CombinerEX defined above.

// Same as previous example but only returns top 3 records ordered by introYear
@result =
    COMBINE @left AS l
    WITH @right AS r
    ON l.baseID == r.carIDstart
    PRODUCE currentModel string,
            Division string,
            introYear string            
    REQUIRED Division, carIDstart, carIDend, carID, currentModel, introYear
    USING new ReferenceGuide_Examples.CombinerEX()

    OUTPUT @result
    TO "/ReferenceGuide/QSE/PrimaryRowsetExpressions/Combine/ExampleB.txt"
    USING Outputters.Tsv(outputHeader: true);

See Also

© 2018 Microsoft