Join Data

 

Updated: October 16, 2015

Joins two datasets

You can use the Join Data module to perform a join operation on two tables. The module supports the following join types:

  • Inner Join. An inner join is the typical join operation. It returns the combined rows only when the values of the key columns match.

  • Left Outer Join. A left outer join returns joined rows for all rows from the left table. When a row in the left table has no matching rows in the right table, the returned row contains missing values for all columns that come from the right table unless you specify a replacement value for missing values.

  • Full Outer Join. A full outer join returns all rows from the left table (table1) and from the right table (table2).

    For each of the rows in the left table that have no matching rows in the right table, the join results include a row containing missing values from the right table.

    For each of the rows in the right table that have no matching rows in the left table, the join results include a row containing missing values for all columns from the left table.

  • Left Semi-Join. A left semi-join returns only the values from the left table when the values of the key columns match.

To perform a join on two datasets, they must be related by a single key column. Composite keys are not supported.

You can see examples of how this module is used by exploring these sample experiments in the Model Gallery:

For additional examples of Join Data, see these sample experiments:

  • The combined dataset cannot have two columns with the same name. Therefore, if the left and right datasets have any duplicate column names, a numeric suffix is appended to the column names of the right dataset to make them unique. For example, if both datasets had a column named Month, the column from the left dataset would remain as is, and the column from the right dataset would be renamed Month (1).

  • The algorithm that is used for comparison of key values is hash-forced.

  • Each column of the joined dataset preserves a categorical type, if the corresponding column of the input dataset is categorical.

  • In left outer joins, if there are any missing values, a categorical level is created in the left dataset for missing values. This is true even if there are no missing values in the joined (right) dataset.

Name

Type

Description

Dataset1

Data Table

First dataset to join

Dataset2

Data Table

Second dataset to join

Name

Range

Type

Default

Description

Join key columns for L

Any

ColumnSelection

Select the join key columns for the first dataset.

Join key columns for R

Any

ColumnSelection

Select the join key columns for the second dataset.

Match case

Any

Boolean

True

Indicate whether a case-sensitive comparison is allowed in key columns.

Join type

List

Type

Inner join

Choose a join type.

Keep right key columns in joined table

Any

Boolean

True

Indicate whether to keep key columns from the second dataset in the joined dataset.

Name

Type

Description

Results dataset

Data Table

Result of join operation

For a list of all exceptions, see Machine Learning Module Error Codes.

Exception

Description

Error 0001

An exception occurs if one or more specified columns of the dataset couldn't be found.

Error 0003

An exception occurs if one or more inputs are null or empty.

Error 0006

An exception occurs if the parameter is greater than or equal to the specified value.

Error 0016

An exception occurs if the input datasets that are passed to the module should have compatible column types, but they do not.

Error 0017

An exception occurs if one or more specified columns have types that are unsupported by the current module.

Error 0020

An exception occurs if the number of columns in some of the datasets that are passed to the module is too small.

Error 0028

An exception occurs when the column set contains duplicate column names and it is not allowed.

Error 0011

An exception occurs if the argument for the passed column set does not apply to any dataset columns.

Error 0027

An exception occurs when two objects have to be of the same size, but they are not.

Show: