The Streaming Deduplication node removes duplicate records from streaming data, based on the fields that you select as unique identifiers.
Example
You are analyzing account transaction data and want to obtain a list of unique customers.
An extract of the data imported by the Streaming Data Store node is shown in the following table:
firstName | lastName | age | state | transactionNumber |
---|---|---|---|---|
Rajesh | Rao | 22 | IL | 12445 |
John | Briggs | 41 | NY | 12446 |
George | Arnold | 22 | CA | 12447 |
Rajesh | Rao | 22 | IL | 12448 |
The customer Rajesh Rao appears twice, with two different transaction numbers. To remove the redundant record and obtain a list of unique customers by name, configure the Streaming Deduplication node with the firstName and lastName fields selected in the Unique Identifier Fields property.
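To illustrate the effect of deduplicating by name, the following minimal Python sketch applies key-based deduplication to the sample records above. It is not the node's implementation. The function name `deduplicate` is hypothetical, and the sketch assumes that the first record seen for each key is the one that is kept, which this page does not state.

```python
# Sample records from the table above.
records = [
    {"firstName": "Rajesh", "lastName": "Rao", "age": 22, "state": "IL", "transactionNumber": 12445},
    {"firstName": "John", "lastName": "Briggs", "age": 41, "state": "NY", "transactionNumber": 12446},
    {"firstName": "George", "lastName": "Arnold", "age": 22, "state": "CA", "transactionNumber": 12447},
    {"firstName": "Rajesh", "lastName": "Rao", "age": 22, "state": "IL", "transactionNumber": 12448},
]


def deduplicate(stream, key_fields):
    """Yield only the first record seen for each combination of key_fields.

    Hypothetical helper: keeping the first occurrence is an assumption.
    """
    seen = set()
    for record in stream:
        # Build the deduplication key from the selected identifier fields.
        key = tuple(record[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            yield record


# Deduplicate by customer name, as in the example.
unique_customers = list(deduplicate(records, ("firstName", "lastName")))

for customer in unique_customers:
    print(customer["firstName"], customer["lastName"], customer["transactionNumber"])
```

Under these assumptions, the second Rajesh Rao record (transaction 12448) is dropped and three records remain. Adding transactionNumber to the key fields would keep all four records, because every transaction number in the sample is distinct.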
Properties
Display Name
Specify the name of the node that is displayed on the Analysis Designer canvas.
The default value is Streaming Deduplication.
Watermark Time Field
Optionally select a datetime input field to use as the event time when waiting for late events.
Watermark Window Threshold
Specify, as an integer, the number of seconds to wait for late events. This value applies only if a field is selected in the Watermark Time Field property.
The default value is 600 seconds.
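The following Python sketch shows one common way that stream processors combine a watermark with deduplication. It is an illustration under stated assumptions, not a description of how this node is implemented. It assumes that the watermark trails the latest event time seen by the threshold, that events older than the watermark are discarded, and that remembered keys older than the watermark are forgotten. The function name `deduplicate_with_watermark` and the `eventTime` field are hypothetical.

```python
from datetime import datetime, timedelta


def deduplicate_with_watermark(events, key_fields, time_field, threshold_seconds=600):
    """Yield the first event per key, using a watermark to bound remembered state.

    Hypothetical sketch: the eviction and late-event rules are assumptions.
    """
    threshold = timedelta(seconds=threshold_seconds)
    seen = {}  # key -> event time of the first occurrence
    max_event_time = None
    for event in events:
        event_time = event[time_field]
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time
        # The watermark trails the latest event time by the threshold.
        watermark = max_event_time - threshold
        # Forget keys that fall behind the watermark.
        seen = {k: t for k, t in seen.items() if t >= watermark}
        if event_time < watermark:
            continue  # too late: assumed to be discarded
        key = tuple(event[field] for field in key_fields)
        if key not in seen:
            seen[key] = event_time
            yield event


t0 = datetime(2024, 1, 1, 12, 0, 0)
events = [
    {"firstName": "Rajesh", "lastName": "Rao", "eventTime": t0},
    # Duplicate inside the 600-second window: dropped.
    {"firstName": "Rajesh", "lastName": "Rao", "eventTime": t0 + timedelta(minutes=5)},
    # Advances the watermark to t0 + 10 minutes.
    {"firstName": "John", "lastName": "Briggs", "eventTime": t0 + timedelta(minutes=20)},
    # Older than the watermark: discarded as late.
    {"firstName": "George", "lastName": "Arnold", "eventTime": t0 + timedelta(minutes=2)},
    # The earlier Rao key has been forgotten, so this event is emitted again.
    {"firstName": "Rajesh", "lastName": "Rao", "eventTime": t0 + timedelta(minutes=21)},
]

result = list(deduplicate_with_watermark(events, ("firstName", "lastName"), "eventTime"))
```

In this model, a larger threshold tolerates later events and catches duplicates that are further apart in time, at the cost of remembering more keys.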
Unique Identifier Fields
Select one or more fields that uniquely identify a record.
Click the Add Field button, use the arrow buttons in the Field Selector dialog to add or remove fields, and then click Save.
You must select at least one field.