This deprecated node enriches a large data set (left table) with a small data set (right table) by joining the two data sets based on common fields.
The Lookup node works in a similar way to the Join node and the Merge node and is an optimization of the inner and left joins.
An advantage of using the Lookup node is that the input data sets do not need to be sorted prior to using the node; however, the entire data set from the right table is loaded into memory, therefore, it is only recommended for use with a small data set on the right input. If you want to join two large data sets, it is recommend that you use the Merge node.
To merge two data sets using the Lookup node, in the InputKey property, specify the field names from the first (main) input data set that you want to base the join on. Then, in the LookupKey property, specify the input fields from the second (smaller) data set that you want to base the join on.
You can also use the Lookup node to join a single record (containing potentially multiple fields) to all the records in the other data set. In this case, enter 1
or true
in both the InputKey and LookupKey properties.
For advanced users, it is also possible to define additional filtering criteria to be applied by the Lookup node while the data sets are merged; you can do this by entering Data360 Analyze Script in the Script property. See the Transform (Deprecated) or Split (Deprecated) node help topics for more information on the types of Script statements that can be used to refine your output.
In the output, if a left orphan is found, that is, if a record is not found in the lookup (right) data set, the right row contains a NULLvalue. When a match is found, both sides contain data.
Example
You have the following main data set containing a list of customers and locations:
Townstring | Customer_Namestring |
---|---|
Liverpool | Derek |
Leeds | Simon |
Sheffield | Alison |
Chester | David |
Liverpool | Emma |
Liverpool | Jane |
Chester | Mark |
Sheffield | Holly |
Sheffield | Stephen |
Liverpool | Wendy |
Leeds | Adrian |
You want to enrich this data set with the name of the sales representative that is responsible for each customer:
townstring | sales_repstring |
---|---|
Liverpool | Jon |
Chester | Adam |
Sheffield | Paul |
Leeds | Claire |
In the InputKey property, type:
Town
In the LookupKey property, type:
town
The output contains the following merged data set showing the areas that each sales rep are responsible for:
Townstring | Customer_Namestring | sales_repstring |
---|---|---|
Liverpool | Derek | Jon |
Leeds | Simon | Claire |
Sheffield | Alison | Paul |
Chester | David | Adam |
Liverpool | Emma | Jon |
Liverpool | Jane | Jon |
Chester | Mark | Adam |
Sheffield | Holly | Paul |
Sheffield | Stephan | Paul |
Liverpool | Wendy | Jon |
Leeds | Adrian | Claire |
Properties
InputKey
Specify an expression or field name to determine the names of the input fields from the first (main) input data set that you want to base the join on.
A value is required for this property.
LookupKey
Specify an expression or field name to determine the names of the input fields from the lookup (smaller) data set that you want to base the join on.
A value is required for this property.
Script
Optionally specify Script to define additional filtering criteria to be applied by the Lookup node while the data sets are merged. The default code in the Script property outputs all records from the left input along with any matching fields from the right input.
The Script property is evaluated once for every row of data, regardless of whether or not a match is found.
If you only want to output fields that are found in both data sets, uncomment the line that states "where matchIsFound" (the matchIsFound variable is only true when a row in the main data set matches a row in the lookup data set).
DuplicateKeyBehavior
Optionally specify what to do if there are duplicate records in the lookup (right) data set, based on the values specified in the LookupKey property. Choose from:
- Error - The node fails when it encounters a duplicate value.
- Log - A warning is logged for each duplicate value. The lookup row that is emitted is the first match found.
- Ignore - The first matching lookup value is used and no warning or error is logged.
The default value is Error.
Inputs and outputs
Inputs: data, lookup.
Outputs: out1, multiple optional.