Streaming Join - Latest

Data360 DQ+ Help

Product type
Software
Portfolio
Verify
Product family
Data360
Product
Data360 DQ+
Version
Latest
Language
English
Product name
Data360 DQ+
Title
Data360 DQ+ Help
Copyright
2024
First publish date
2016
ft:lastEdition
2024-07-09
ft:lastPublication
2024-07-09T15:09:58.774265

The Streaming Join node joins two streaming sources, or a streaming source and a batch source, allowing you to combine fields from two separate data stores to create a new sheet made of rows where related fields are matched.

The first data set (left) must be a streaming data source. The second data set (right) can be either a streaming data source or a batch data source. The combined output is always a streaming data set.

To perform a successful Join, the two chosen fields must be related.

The Streaming Join node allows three join types:

Join types with the Streaming Join node

Inner

Only creates a row if the left field value has a matching field value on the right.

Left outer

Creates rows for every left field value and joins any matching right field values. Does not include right field values without a matching left field value.

Right outer

Creates rows for every right field value and joins any matching left field values. Does not include left field values without a matching right field value.

Example

The first data set (left) contains order transactions from a streaming data source:

AccountIDint Datedate Itemstring
1000101 2016-01-01 Tea
1000102 2016-01-01 Coffee
1000103 2016-01-01 Scone
1000104 2016-01-02 Water
1000105 2016-01-12 Tea

The second data set (right) contains customer information from a batch data source:

AccountIDint Namestring Addressstring
1000101 MARCIA SIDONY "10 Rose Lane, Springfield"
1000102 DAVE THURSTAN "73 Gorham Street, Boston"
1000103 ALEXANDRA GILLIAN "7 James Street, Florida"
1000104 HALEY JANNA "78 Summer Way, Denver"

You want to join these two data sets to see if there are any transactions that are not associated with a customer, and any customers who have not made any transactions. The input field AccountID is common to both data sets, so you can use this field to join them (note that even if the names of the input fields did not match, if the contents of the fields are similar they can still be used to join).

The following table shows the combined output data (streaming):

AccountID Date Item Name Address
1000101 2016-01-01 Tea MARCIA SIDONY 10 Rose Lane, Springfield
1000102 2016-01-01 Coffee DAVE THURSTAN 73 Gorham Street, Boston
1000103 2016-01-01 Scone ALEXANDRA GILLIAN 7 James Street, Florida
1000104 2016-01-02 Water HALEY JANNA 78 Summer Way, Denver

Properties

Display Name

Specify the name of the node that is displayed on the Analysis Designer canvas.

The default value is Streaming Join.

Join Type

Select a join type. Choose from:

  • Inner
  • Left outer
  • Right outer

The default value is Left outer.

Source Configuration

Optionally click the Source Configuration button if you want to configure watermark settings.

  1. In the Join Sources dialog, click the Configure button to specify watermark settings for the left or right inputs.
  2. Select a datetime field in the Watermark Time Field property.
  3. Specify the number of seconds to wait for late events in the Watermark Window Threshold property.

Note that if you specify a value for one of the watermark properties, the other is also required.

Click Save to close the Configure dialog, then click Save on the Join Sources dialog when you have finished editing both inputs.

Join Fields

Click the Add Field button to specify the left and right input fields on which you want to base the join, then click Update. The selected fields must be related to perform a successful join.

Click Advanced to specify additional non-equality joining criteria. The system shows any advanced criteria in the text box below the Advanced button.

Output Fields

Select the fields that you want to include in the output.