Table of contents
TOC
Collapse the table of content
Expand the table of content

SEMIJOIN (U-SQL)

Michael Rys|Last Updated: 5/15/2018
|
4 Contributors

Semijoins are U-SQL’s way filter a rowset based on the inclusion of its rows in another rowset. Other SQL dialects express this with the SELECT * FROM A WHERE A.key IN (SELECT B.key FROM B) pattern. There are two variants: LEFT SEMIJOIN and RIGHT SEMIJOIN.

A LEFT SEMIJOIN (or just SEMIJOIN) gives only those rows in the left rowset that have a matching row in the right rowset.

The RIGHT SEMIJOIN gives only those rows in the right rowset that have a matching row in the left rowset.

The join expression in the ON clause specifies how to determine the match.

Examples

Given the following rowsets:

EmpNameDepID
Rafferty31
Jones33
Heisenberg33
Robinson34
Smith34
Williamsnull
DeptIDDepName
31Sales
33Engineering
34Clerical
35Marketing

The following query finds all employees that are in valid departments by finding all the employees in the left @employees rowset that have a depID that is listed in the right @departments rowset):

@employees = SELECT *  
               FROM (VALUES   
                      ("Rafferty", (int?) 31)  
                    , ("Jones", (int?) 33)  
                    , ("Heisenberg", (int?) 33)  
                    , ("Robinson", (int?) 34)  
                    , ("Smith", (int?) 34)  
                    , ("Williams", (int?) null)) AS E(EmpName, DepID);  

@departments = SELECT *  
                FROM (VALUES  
                       ((int) 31, "Sales")  
                     , ((int) 33, "Engineering")  
                     , ((int) 34, "Clerical")  
                     , ((int) 35, "Marketing")) AS D(DepID, DepName);  

@emps_in_valid_dept =  
    SELECT e.EmpName, e.DepID  
    FROM @employees AS e  
         LEFT SEMIJOIN (SELECT (int?) DepID AS DepID, DepName FROM @departments) AS d  
         ON e.DepID == d.DepID;  

OUTPUT @emps_in_valid_dept   
TO "/output/rsLeftSemiJoinEmployeesInValidDept.csv"  
USING Outputters.Csv();

The resulting rowset looks like:

EmpNameDepID
Rafferty31
Jones33
Heisenberg33
Robinson34
Smith34

The following query finds all departments with at least one employee:

@employees = SELECT *  
               FROM (VALUES   
                      ("Rafferty", (int?) 31)  
                    , ("Jones", (int?) 33)  
                    , ("Heisenberg", (int?) 33)  
                    , ("Robinson", (int?) 34)  
                    , ("Smith", (int?) 34)  
                    , ("Williams", (int?) null)) AS E(EmpName, DepID);  

@departments = SELECT *  
                FROM (VALUES  
                       ((int) 31, "Sales")  
                     , ((int) 33, "Engineering")  
                     , ((int) 34, "Clerical")  
                     , ((int) 35, "Marketing")) AS D(DepID, DepName);  

@depts_with_emps =  
    SELECT d.DepName, d.DepID  
    FROM @employees AS e  
         RIGHT SEMIJOIN (SELECT (int?) DepID AS DepID, DepName FROM @departments) AS d   
         ON e.DepID == d.DepID;  


OUTPUT @depts_with_emps   
TO "/output/rsRightSemiJoinDepartmentsWithEmployees.csv"  
USING Outputters.Csv();

The query return the rowset:

DepNameDepID
Sales31
Engineering33
Clerical34

Additional Example
An example of comparing SEMIJOIN against a subquery with IN is available at Subqueries with IN/NOT IN and SEMIJOIN/ANTISEMIJOIN.

See Also

© 2018 Microsoft