Joining data - Data360_Analyze - Latest

Data360 Analyze Server Help

Product type
Software
Portfolio
Verify
Product family
Data360
Product
Data360 Analyze
Version
Latest
Language
English
Product name
Data360 Analyze
Title
Data360 Analyze Server Help
Copyright
2024
First publish date
2016
Last updated
2024-11-28
Published on
2024-11-28T15:26:57.181000

You can combine related data streams within your data flow by using the Merge node, the Join node or the Lookup node.

Which node should I use to combine my data?

  • The Merge node combines left, right and inner joins into a single node. The Merge node is a great place to start, as it provides a good overview of the matches and non-matches in both data sets.
  • The Join node has only one output, and allows you to pick a specific type of join to combine your data. You can select from the following join types: left, left orphans, right, right orphans, inner, all orphans and full outer.
  • The Lookup node is an optimization of the inner and left joins and is best used to enrich a large data set (left) with a smaller data set (right).

Example data flows

A number of sample Data Flows are available from the Samples workspace, found in the Analyze Directory page.

In the Directory under the /Data360 Samples/Tutorials/ folder, you will find "Joining and Blending Data", which shows examples of how to use this node.

Note: Upgrades will overwrite all data flows in the workspace. If you want to make changes to one of the sample data flows, we recommend you create a copy and save it elsewhere, using Save as...

Join types

The following table compares the Join and Merge nodes to describe the different ways that you can combine data from two sources in Data360 Analyze:

Join type Description Nodes
Left orphans Join left

A left orphans join combines two data sources and outputs any records that are only found in the left data set.

Join left The left orphans output pin of the Merge node outputs a left join. Alternatively, you can use the Join node to produce the same result by selecting Left Orphans in the JoinMode property.
Right orphans Join right

A right orphans join combines two data sources and outputs any records that are only found in the right data set.

Join right The right orphans output pin of the Merge node outputs a right join. Alternatively, you can use the Join node to produce the same result by selecting Right Orphans in the JoinMode property.
Inner Join inner

An inner join combines two data sources and outputs only the matches i.e. data that is found in both the left and right data sets.

Join inner The matches output pin of the Merge node outputs the data that appears in both data sources. Alternatively, you can use the Join node to produce the same result. The Join node outputs an inner join by default, therefore you do not need to change the setting of the JoinMode property to output this type of join.
Left Join left inner

A left join combines two data sources and outputs all records from the left data set and any corresponding matches from the right data set.

Tip: A left join in Data360 Analyze is equivalent to a SQL left join.

Join left inner

You can create a left inner join with the Join node by selecting Left in the JoinMode property.

Alternatively, a concatenation of the left orphans and matches outputs of the Merge node produces the same result, although the order of the records may be different.

Right Join right inner

A right join combines two data sources and outputs all records from the right data set and any corresponding matches from the left data set.

Tip: A right join in Data360 Analyze is equivalent to a SQL right join.
Join right

You can create a right inner join with the Join node by selecting Right in the JoinMode property.

Alternatively, a concatenation of the right orphans and matches outputs of the Merge node produces the same result, although the order of the records may be different.

All orphans All orphans

An all orphans join combines two data sources and outputs all records both data sets, apart from those that match.

Join full orphans

A concatenation of the left orphans and right orphans outputs of the Merge node produces an all orphans join.

Selecting All Orphans in the JoinMode property also outputs all non-matched records from both data sets.

Full outer Full outer join
A full outer join combines two data sources and outputs all records from both data sets.

A concatenation of the left orphans, right orphans and matches outputs of the Merge node produces a full outer join.

Alternatively, you can output all records from both inputs by selecting Full Outer in the JoinMode property of the Join node.

Selecting All Orphans in the JoinMode property outputs all non-matched records from both data sets.

Example using the Merge node

For example, suppose you are analyzing data for a home appliances retailer. You have the following two input data sets, where the first contains a list of items and their cost, and the second shows the store where the item was purchased, as well as how many of each item were sold:

Data set one

Item Cost
Iron 20.99
Washing machine 350.00
Fridge 300.00

Data set two

Item Store ID Number sold
Ironing board 123986 50
Iron 123900 20
Fridge 123901 2

By joining these two data sets, you can view and use the related data in your analysis, for example to answer questions such as; "What was the total income from iron purchases at a given store?".

  1. Connect each of the inputs to the Merge node, then run the input nodes.

    The Merge node is used to join the two data sources on the "Item" field.

    In the Match Keys Grid, select "Item" for both the Left Field and Right Field.
  2. In the Field List grid, untick "Item" from the right Fields list.
  3. Run the Merge node.

The left orphans output pin shows the data that appears in the first (left) data source but not in the second (right) data source:

Item Cost
Washing machine 350.00

The matches output pin shows the combined data:

Item Cost Store ID Number sold
Fridge 300.00 123901 2
Iron 20.99 123900 20

The right orphans output pin shows the data that appears in the second (right) data source but not in the first (left) data source:

Item Store ID Number sold
Ironing board 123986 50

Combining two data sets with no matching fields

In this example, you have two data sets that you want to combine, but the data sets do not contain any matching keys:

Data set one

Customer_Name City
Andrew Morley Sheffield
Mark Smith Leeds
Rachel Woods London

Data set two

Bank_ID
21285

  1. Connect each of the inputs to the Merge node, then run the input nodes.
  2. On the Merge node, do not specify any fields in the Match Keys grid, and use the default settings in the Field List grid.
  3. Set the NoJoinKeyBehavior property to Ignore.

The matches output pin shows the combined data:

Customer_Name City Bank_ID
Andrew Morley Sheffield 21285
Mark Smith Leeds 21285
Rachel Woods London 21285