SEMIJOIN (U-SQL)

 

Updated: March 10, 2017

Semijoins are U-SQL’s way filter a rowset based on the inclusion of its rows in another rowset. Other SQL dialects express this with the SELECT * FROM A WHERE A.key IN (SELECT B.key FROM B) pattern. There are two variants: LEFT SEMIJOIN and RIGHT SEMIJOIN.

A LEFT SEMIJOIN (or just SEMIJOIN) gives only those rows in the left rowset that have a matching row in the right rowset.

The RIGHT SEMIJOIN gives only those rows in the right rowset that have a matching row in the left rowset.

The join expression in the ON clause specifies how to determine the match.

Examples

Given the following rowsets:

@employees
EmpName stringDepID int?
Rafferty31
Jones33
Heisenberg33
Robinson34
Smith34
Williamsnull
@departments
DeptID intDepName string
31Sales
33Engineering
34Clerical
35Marketing

The following query finds all employees that are in valid departments by finding all the employees in the left @employees rowset that have a depID that is listed in the right @departments rowset):

@employees = SELECT *  
               FROM (VALUES   
                      ("Rafferty", (int?) 31)  
                    , ("Jones", (int?) 33)  
                    , ("Heisenberg", (int?) 33)  
                    , ("Robinson", (int?) 34)  
                    , ("Smith", (int?) 34)  
                    , ("Williams", (int?) null)) AS E(EmpName, DepID);  
                      
@departments = SELECT *  
                FROM (VALUES  
                       ((int) 31, "Sales")  
                     , ((int) 33, "Engineering")  
                     , ((int) 34, "Clerical")  
                     , ((int) 35, "Marketing")) AS D(DepID, DepName);  
                       
@emps_in_valid_dept =  
    SELECT e.EmpName, e.DepID  
    FROM @employees AS e  
         LEFT SEMIJOIN (SELECT (int?) DepID AS DepID, DepName FROM @departments) AS d  
         ON e.DepID == d.DepID;  
  
OUTPUT @emps_in_valid_dept   
TO "/output/rsLeftSemiJoinEmployeesInValidDept.csv"  
USING Outputters.Csv();

The resulting rowset looks like:

@emps_in_valid_dept
EmpName stringDepID int?
Rafferty31
Jones33
Heisenberg33
Robinson34
Smith34

The following query finds all departments with at least one employee:

@employees = SELECT *  
               FROM (VALUES   
                      ("Rafferty", (int?) 31)  
                    , ("Jones", (int?) 33)  
                    , ("Heisenberg", (int?) 33)  
                    , ("Robinson", (int?) 34)  
                    , ("Smith", (int?) 34)  
                    , ("Williams", (int?) null)) AS E(EmpName, DepID);  
                      
@departments = SELECT *  
                FROM (VALUES  
                       ((int) 31, "Sales")  
                     , ((int) 33, "Engineering")  
                     , ((int) 34, "Clerical")  
                     , ((int) 35, "Marketing")) AS D(DepID, DepName);  
                       
@depts_with_emps =  
    SELECT d.DepName, d.DepID  
    FROM @employees AS e  
         RIGHT SEMIJOIN (SELECT (int?) DepID AS DepID, DepName FROM @departments) AS d   
         ON e.DepID == d.DepID;  
  
  
OUTPUT @depts_with_emps   
TO "/output/rsRightSemiJoinDepartmentsWithEmployees.csv"  
USING Outputters.Csv();

The query return the rowset:

@depts_with_emps_0/@depts_with_emps_1
DepName stringDepID int?
Sales31
Engineering33
Clerical34

Additional Example
An example of comparing SEMIJOIN against a subquery with IN is available at Subqueries with IN/NOT IN and SEMIJOIN/ANTISEMIJOIN.

See Also

Community Additions

ADD
Show: