Streaming Deduplication

Data360 DQ+ Help

Product type: Software
Portfolio: Verify
Product family: Data360
Product: Data360 DQ+
Version: Latest
Language: English
Copyright: 2024
First publish date: 2016
Last updated: 2024-12-12
Published on: 2024-12-12T10:34:46.959692

The Streaming Deduplication node removes duplicate records from streaming data.

Example

You are analyzing account transaction data and want to obtain a list of unique customers.

An extract of the data imported by the Streaming Data Store node is shown in the following table:

firstName  lastName  age  state  transactionNumber
Rajesh     Rao       22   IL     12445
John       Briggs    41   NY     12446
George     Arnold    22   CA     12447
Rajesh     Rao       22   IL     12448

The customer Rajesh Rao appears twice, with two different transaction numbers. To remove the redundant record and obtain a list of unique customers by name, configure the Streaming Deduplication node with firstName and lastName selected as the Unique Identifier Fields.

Properties

Display Name

Specify the name of the node that is displayed on the Analysis Designer canvas.

The default value is Streaming Deduplication.

Watermark Time Field

Optionally, select the datetime input field on which the watermark is based.

Watermark Window Threshold

Specify, as an integer, the number of seconds to wait for late events. This value applies only if you select a field in the Watermark Time Field property.

The default value is 600 seconds.
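The interaction between a watermark time field and a threshold can be illustrated with a minimal Python sketch. It is not the node's implementation. It assumes the common streaming model in which the watermark is the latest event time seen minus the threshold, records older than the watermark are dropped as too late, and remembered keys older than the watermark are forgotten. The class name, the eventTime field, and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta


class WatermarkedDeduplicator:
    """Illustrative model only; names and behavior are assumptions."""

    def __init__(self, key_fields, time_field, threshold_seconds=600):
        self.key_fields = key_fields
        self.time_field = time_field
        self.threshold = timedelta(seconds=threshold_seconds)
        self.max_event_time = None  # latest event time seen so far
        self.seen = {}              # key -> event time of the record that was kept

    def process(self, record):
        """Return True if the record is emitted, False if it is dropped."""
        event_time = record[self.time_field]
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        watermark = self.max_event_time - self.threshold

        # Forget keys whose kept record is older than the watermark.
        self.seen = {k: t for k, t in self.seen.items() if t >= watermark}

        # Assumption: records older than the watermark are treated as too late.
        if event_time < watermark:
            return False

        key = tuple(record[f] for f in self.key_fields)
        if key in self.seen:
            return False  # duplicate within the retained window
        self.seen[key] = event_time
        return True


t0 = datetime(2024, 1, 1, 12, 0, 0)  # hypothetical timestamps
events = [
    {"firstName": "Rajesh", "lastName": "Rao", "eventTime": t0},
    # Same customer 300 seconds later: a duplicate inside the window.
    {"firstName": "Rajesh", "lastName": "Rao", "eventTime": t0 + timedelta(seconds=300)},
    # Moves the watermark forward to t0 + 600 seconds.
    {"firstName": "John", "lastName": "Briggs", "eventTime": t0 + timedelta(seconds=1200)},
    # Arrives late: its event time is older than the watermark.
    {"firstName": "George", "lastName": "Arnold", "eventTime": t0 + timedelta(seconds=100)},
]

dedup = WatermarkedDeduplicator(["firstName", "lastName"], "eventTime", threshold_seconds=600)
results = [dedup.process(e) for e in events]
print(results)
```

In this model, a larger threshold tolerates later arrivals but keeps more keys in memory, which is the usual trade-off when choosing a value.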

Unique Identifier Fields

Select one or more fields that uniquely identify a record.

Click the Add Field button, use the arrow buttons in the Field Selector dialog to add or remove fields, then click Save.

You must select at least one field.
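The effect of the identifying fields on the example data can be sketched in Python. This is an illustration under stated assumptions, not the node's implementation: it assumes that firstName and lastName are the selected fields and that the first record seen for each key is the one retained. The deduplicate function is a hypothetical helper.

```python
from typing import Dict, Iterable, Iterator, Sequence

Record = Dict[str, object]


def deduplicate(records: Iterable[Record], key_fields: Sequence[str]) -> Iterator[Record]:
    """Yield the first record seen for each combination of key field values."""
    if not key_fields:
        raise ValueError("Select at least one identifying field")
    seen = set()
    for record in records:
        key = tuple(record[f] for f in key_fields)
        if key in seen:
            continue  # same identifying values as an earlier record
        seen.add(key)
        yield record


# The extract from the example table.
records = [
    {"firstName": "Rajesh", "lastName": "Rao", "age": 22, "state": "IL", "transactionNumber": 12445},
    {"firstName": "John", "lastName": "Briggs", "age": 41, "state": "NY", "transactionNumber": 12446},
    {"firstName": "George", "lastName": "Arnold", "age": 22, "state": "CA", "transactionNumber": 12447},
    {"firstName": "Rajesh", "lastName": "Rao", "age": 22, "state": "IL", "transactionNumber": 12448},
]

unique_customers = list(deduplicate(records, ["firstName", "lastName"]))
for customer in unique_customers:
    print(customer["firstName"], customer["lastName"], customer["transactionNumber"])
```

Adding transactionNumber to the key fields in this sketch would keep all four records, because every transaction number is distinct. The choice of fields therefore determines what counts as a duplicate.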