The Streaming Join node joins two streaming sources, or a streaming source and a batch source, allowing you to combine fields from two separate data stores to create a new sheet made of rows where related fields are matched.
The first data set (left) must be a streaming data source. The second data set (right) can be either a streaming data source or a batch data source. The combined output is always a streaming data set.
To perform a successful Join, the two chosen fields must be related.
The Streaming Join node allows three join types:
Inner
Only creates a row if the left field value has a matching field value on the right.
Left outer
Creates rows for every left field value and joins any matching right field values. Does not include right field values without a matching left field value.
Right outer
Creates rows for every right field value and joins any matching left field values. Does not include left field values without a matching right field value.
Example
The first data set (left) contains order transactions from a streaming data source:
AccountIDint | Datedate | Itemstring |
---|---|---|
1000101 | 2016-01-01 | Tea |
1000102 | 2016-01-01 | Coffee |
1000103 | 2016-01-01 | Scone |
1000104 | 2016-01-02 | Water |
1000105 | 2016-01-12 | Tea |
The second data set (right) contains customer information from a batch data source:
AccountIDint | Namestring | Addressstring |
---|---|---|
1000101 | MARCIA SIDONY | "10 Rose Lane, Springfield" |
1000102 | DAVE THURSTAN | "73 Gorham Street, Boston" |
1000103 | ALEXANDRA GILLIAN | "7 James Street, Florida" |
1000104 | HALEY JANNA | "78 Summer Way, Denver" |
You want to join these two data sets to see if there are any transactions that are not associated with a customer, and any customers who have not made any transactions. The input field AccountID
is common to both data sets, so you can use this field to join them (note that even if the names of the input fields did not match, if the contents of the fields are similar they can still be used to join).
The following table shows the combined output data (streaming):
AccountID | Date | Item | Name | Address |
---|---|---|---|---|
1000101 | 2016-01-01 | Tea | MARCIA SIDONY | 10 Rose Lane, Springfield |
1000102 | 2016-01-01 | Coffee | DAVE THURSTAN | 73 Gorham Street, Boston |
1000103 | 2016-01-01 | Scone | ALEXANDRA GILLIAN | 7 James Street, Florida |
1000104 | 2016-01-02 | Water | HALEY JANNA | 78 Summer Way, Denver |
Properties
Display Name
Specify the name of the node that is displayed on the Analysis Designer canvas.
The default value is Streaming Join.
Join Type
Select a join type. Choose from:
- Inner
- Left outer
- Right outer
The default value is Left outer.
Source Configuration
Optionally click the Source Configuration button if you want to configure watermark settings.
- In the Join Sources dialog, click the Configure button to specify watermark settings for the left or right inputs.
- Select a datetime field in the Watermark Time Field property.
- Specify the number of seconds to wait for late events in the Watermark Window Threshold property.
Note that if you specify a value for one of the watermark properties, the other is also required.
Click Save to close the Configure dialog, then click Save on the Join Sources dialog when you have finished editing both inputs.
Join Fields
Click the Add Field button to specify the left and right input fields on which you want to base the join, then click Update. The selected fields must be related to perform a successful join.
Click Advanced to specify additional non-equality joining criteria. The system shows any advanced criteria in the text box below the Advanced button.
Output Fields
Select the fields that you want to include in the output.