You can combine related data streams within your data flow by using the Merge node, the Join node or the Lookup node.
Which node should I use to combine my data?
- The Merge node combines left, right and inner joins into a single node. The Merge node is a great place to start, as it provides a good overview of the matches and non-matches in both data sets.
- The Join node has only one output, and allows you to pick a specific type of join to combine your data. You can select from the following join types: left, left orphans, right, right orphans, inner, all orphans and full outer.
- The Lookup node is an optimization of the inner and left joins and is best used to enrich a large data set (left) with a smaller data set (right).
Example data flows
A number of sample Data Flows are available from the Samples workspace, found in the Analyze Directory page.
In the Directory under the /Data360 Samples/Tutorials/
folder, you will find "Joining and Blending Data", which shows examples of how to use this node.
Join types
The following table compares the Join and Merge nodes to describe the different ways that you can combine data from two sources in Data360 Analyze:
Join type | Description | Nodes | ||||
---|---|---|---|---|---|---|
|
A left orphans join combines two data sources and outputs any records that are only found in the left data set. |
|
||||
|
A right orphans join combines two data sources and outputs any records that are only found in the right data set. |
|
||||
|
An inner join combines two data sources and outputs only the matches i.e. data that is found in both the left and right data sets. |
|
||||
|
A left join combines two data sources and outputs all records from the left data set and any corresponding matches from the right data set. Tip: A left join in Data360 Analyze is equivalent to a SQL left join.
|
|
||||
|
A right join combines two data sources and outputs all records from the right data set and any corresponding matches from the left data set. Tip: A right join in Data360 Analyze is equivalent to a SQL right join.
|
|
||||
|
An all orphans join combines two data sources and outputs all records both data sets, apart from those that match. |
|
||||
|
A full outer join combines two data sources and outputs all records from both data sets. |
|
Example using the Merge node
For example, suppose you are analyzing data for a home appliances retailer. You have the following two input data sets, where the first contains a list of items and their cost, and the second shows the store where the item was purchased, as well as how many of each item were sold:
Data set one
Item | Cost |
---|---|
Iron | 20.99 |
Washing machine | 350.00 |
Fridge | 300.00 |
Data set two
Item | Store ID | Number sold |
---|---|---|
Ironing board | 123986 | 50 |
Iron | 123900 | 20 |
Fridge | 123901 | 2 |
By joining these two data sets, you can view and use the related data in your analysis, for example to answer questions such as; "What was the total income from iron purchases at a given store?".
- Connect each of the inputs to the Merge node, then run the input nodes.
The Merge node is used to join the two data sources on the "Item" field.
In the Match Keys Grid, select "Item" for both the Left Field and Right Field. - In the Field List grid, untick "Item" from the right Fields list.
- Run the Merge node.
The left orphans output pin shows the data that appears in the first (left) data source but not in the second (right) data source:
Item | Cost |
---|---|
Washing machine | 350.00 |
The matches output pin shows the combined data:
Item | Cost | Store ID | Number sold |
---|---|---|---|
Fridge | 300.00 | 123901 | 2 |
Iron | 20.99 | 123900 | 20 |
The right orphans output pin shows the data that appears in the second (right) data source but not in the first (left) data source:
Item | Store ID | Number sold |
---|---|---|
Ironing board | 123986 | 50 |
Combining two data sets with no matching fields
In this example, you have two data sets that you want to combine, but the data sets do not contain any matching keys:
Data set one
Customer_Name | City |
---|---|
Andrew Morley | Sheffield |
Mark Smith | Leeds |
Rachel Woods | London |
Data set two
Bank_ID |
---|
21285 |
- Connect each of the inputs to the Merge node, then run the input nodes.
- On the Merge node, do not specify any fields in the Match Keys grid, and use the default settings in the Field List grid.
- Set the NoJoinKeyBehavior property to Ignore.
The matches output pin shows the combined data:
Customer_Name | City | Bank_ID |
---|---|---|
Andrew Morley | Sheffield | 21285 |
Mark Smith | Leeds | 21285 |
Rachel Woods | London | 21285 |