Remove Duplicates (Deprecated) - Data360_Analyze - Latest

Data360 Analyze Server Help

Product type: Software
Portfolio: Verify
Product family: Data360
Product: Data360 Analyze
Version: Latest
Language: English
Product name: Data360 Analyze
Title: Data360 Analyze Server Help
Copyright: 2024
First publish date: 2016
Last updated: 2024-11-28
Published on: 2024-11-28T15:26:57.181000

This deprecated node removes duplicate records from one or more inputs, based on the input fields specified in the FieldListExpr property.

CAUTION:
This node has been deprecated and will not be supported in a future release. As an alternative, you can use the Remove Duplicates node, which provides similar functionality but is implemented in Python rather than Data360 Analyze Script.

The output data is also sorted by this node.

To detect duplicates, you can use the Duplicate Detection node.

Example

You have the following input data:

Product_Code (unicode)  Product_Name (unicode)
15                      Tea
2                       Coffee
3                       Water
15                      Tea-EarlGrey
15                      Tea-Herbal
15                      Tea

Removing duplicates across all input fields

If you do not enter a value in the FieldListExpr property, duplicate detection runs across all of the input fields. Only records that are identical in every field are treated as duplicates, and all but one record in each set of duplicates is removed.

One of the two "15, Tea" records is removed, and the output is sorted:

Product_Code (unicode)  Product_Name (unicode)
15                      Tea
15                      Tea-EarlGrey
15                      Tea-Herbal
2                       Coffee
3                       Water
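The following Python sketch illustrates the idea behind this behavior: sort the records on every field, then keep only the first record of each run of identical records. It is an illustration under assumptions, not the node's implementation (the node is built on Data360 Analyze Script), and the function name remove_exact_duplicates is hypothetical. Records are modeled as tuples of strings to mirror the unicode fields in the example.

```python
# Hypothetical sketch of "no FieldListExpr" behavior on in-memory records.
# Each record is a tuple of strings, mirroring the unicode fields above.

def remove_exact_duplicates(records):
    """Sort on every field, then keep the first of each run of identical records."""
    output = []
    for record in sorted(records):  # tuples compare field by field
        if not output or output[-1] != record:
            output.append(record)
    return output

rows = [
    ("15", "Tea"),
    ("2", "Coffee"),
    ("3", "Water"),
    ("15", "Tea-EarlGrey"),
    ("15", "Tea-Herbal"),
    ("15", "Tea"),
]

for code, name in remove_exact_duplicates(rows):
    print(code, name)
```

Because both fields are unicode, the sort is a string comparison, so "15" sorts before "2". This is consistent with the order of the node's output shown above.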

Removing duplicates based on a specified field

If you enter the name of an input field in the FieldListExpr property, records that share the same value in that field are treated as duplicates and all but one of them are removed, regardless of whether the other input field(s) contain matching data.

For example, in the FieldListExpr property, if you enter:

Product_Code

Of the four records with a Product_Code of "15", three are removed despite the differences in the Product_Name field, leaving one:

Product_Code (unicode)  Product_Name (unicode)
15                      Tea
2                       Coffee
3                       Water
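A comparable Python sketch for a specified key field sorts on the key only and keeps the first record for each key value. The function name remove_duplicates_by_key is hypothetical, and which record survives is an assumption of this sketch: because Python's sort is stable, it keeps the earliest input record for each key, which here is "15 Tea", the same record that remains in the output above.

```python
from operator import itemgetter

def remove_duplicates_by_key(records, key_indexes):
    """Sort on the key fields only, then keep the first record for each key value."""
    get_key = itemgetter(*key_indexes)
    output = []
    previous_key = None
    # sorted() is stable, so records with equal keys stay in input order.
    for record in sorted(records, key=get_key):
        current_key = get_key(record)
        if not output or current_key != previous_key:
            output.append(record)
            previous_key = current_key
    return output

rows = [
    ("15", "Tea"),
    ("2", "Coffee"),
    ("3", "Water"),
    ("15", "Tea-EarlGrey"),
    ("15", "Tea-Herbal"),
    ("15", "Tea"),
]

# Index 0 is Product_Code, the stand-in for entering Product_Code in FieldListExpr.
print(remove_duplicates_by_key(rows, [0]))
# [('15', 'Tea'), ('2', 'Coffee'), ('3', 'Water')]
```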

To check for duplicates before removing them, you can use the Duplicate Detection node.

Properties

FieldListExpr

Specify a Script expression, or enter a comma-separated list of input fields, to be used to identify duplicate records in the input data.

If no value is given, then all fields will be used.
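As a rough illustration of this default, the hypothetical helper below turns a comma-separated field list into the list of fields used for duplicate detection and falls back to all input fields when the value is empty. It handles only a plain field list, not a Script expression, and raising an error for unknown field names is an assumption of this sketch rather than documented node behavior.

```python
def resolve_key_fields(field_list, input_fields):
    """Return the fields used to identify duplicates (hypothetical helper).

    Handles only a plain comma-separated list of field names, not a Script
    expression. An empty value selects every input field.
    """
    names = [name.strip() for name in field_list.split(",") if name.strip()]
    if not names:
        return list(input_fields)
    # Assumption: unknown field names are treated as an error in this sketch.
    unknown = [name for name in names if name not in input_fields]
    if unknown:
        raise ValueError(f"unknown field(s): {', '.join(unknown)}")
    return names

fields = ["Product_Code", "Product_Name"]
print(resolve_key_fields("", fields))              # ['Product_Code', 'Product_Name']
print(resolve_key_fields("Product_Code", fields))  # ['Product_Code']
```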

StableSort

Optionally specify whether records with the same values in the duplicate detection fields keep their relative input order (a stable sort).

The default value is False, meaning that the sort does not guarantee to preserve the input order of such records.
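The following Python sketch shows what a stable sort guarantees and why it can matter when duplicates are removed. Python's sorted() is always stable; the second ordering, which breaks ties on Product_Name, only stands in for one order that a sort without this guarantee might produce. The helper first_per_key is hypothetical and keeps the first record it sees for each Product_Code.

```python
rows = [
    ("15", "Tea-Herbal"),
    ("2", "Coffee"),
    ("15", "Tea"),
]

def by_code(record):
    return record[0]

# Stable: records with equal Product_Code values keep their input order.
stable = sorted(rows, key=by_code)

# Stand-in for an order that does not preserve input order (ties broken on Product_Name).
tie_broken = sorted(rows)

def first_per_key(sorted_rows):
    """Keep the first record seen for each Product_Code (dicts keep insertion order)."""
    kept = {}
    for record in sorted_rows:
        kept.setdefault(record[0], record)
    return list(kept.values())

print(stable)                     # [('15', 'Tea-Herbal'), ('15', 'Tea'), ('2', 'Coffee')]
print(first_per_key(stable))      # [('15', 'Tea-Herbal'), ('2', 'Coffee')]
print(first_per_key(tie_broken))  # [('15', 'Tea'), ('2', 'Coffee')]
```

With the stable order, the surviving "15" record is the one that appeared first in the input; with the other order, a different record survives.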

Epsilon

Optionally specify a tolerance (epsilon) for comparing floating-point numbers, for example, 0.1, so that values that are close but not exactly equal can be treated as duplicates.
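Exact equality is unreliable for floating-point values (for example, 0.1 + 0.2 is not exactly 0.3 in binary floating point), which is what a tolerance addresses. The Python sketch below is a minimal illustration under stated assumptions: it sorts the values and treats any value within epsilon of the last kept value as a duplicate. Whether the node compares against the previous record or the first record of a group, and whether the boundary is inclusive, is not documented here; dedupe_floats is a hypothetical name.

```python
def dedupe_floats(values, epsilon):
    """Sort the values, then drop any value within epsilon of the last value kept."""
    kept = []
    for value in sorted(values):
        # In this sketch, a difference no larger than epsilon counts as "equal".
        if not kept or abs(value - kept[-1]) > epsilon:
            kept.append(value)
    return kept

print(0.1 + 0.2 == 0.3)                                 # False
print(dedupe_floats([0.1 + 0.2, 0.3], 1e-9))            # [0.3]
print(dedupe_floats([1.0, 1.05, 1.2, 2.0, 1.95], 0.1))  # [1.0, 1.2, 1.95]
```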

MergeOnly

Optionally specify whether to only perform a merge operation on the input data. This mode is useful for merging the output of multiple parallel sort nodes.

Note: This option requires the data on the input pins to already be sorted. If the input data is not sorted, the output is undefined and may not be sorted.
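The Python sketch below illustrates merge-only processing under an assumption this page does not state explicitly: that duplicates are still removed from the merged stream. heapq.merge from the Python standard library combines inputs that are each already sorted without re-sorting them; the second call shows how unsorted input produces output that is neither sorted nor fully deduplicated. The function name merge_and_dedupe is hypothetical.

```python
import heapq

def merge_and_dedupe(*sorted_inputs):
    """Merge inputs that are each already sorted, then drop adjacent duplicates."""
    output = []
    # heapq.merge assumes every input is already sorted; it never re-sorts.
    for record in heapq.merge(*sorted_inputs):
        if not output or output[-1] != record:
            output.append(record)
    return output

pin_a = [("15", "Tea"), ("2", "Coffee")]
pin_b = [("15", "Tea"), ("15", "Tea-Herbal"), ("3", "Water")]
print(merge_and_dedupe(pin_a, pin_b))
# [('15', 'Tea'), ('15', 'Tea-Herbal'), ('2', 'Coffee'), ('3', 'Water')]

# pin_c is not sorted, so the result is unsorted and a duplicate survives.
pin_c = [("2", "Coffee"), ("15", "Tea")]
pin_d = [("15", "Tea")]
print(merge_and_dedupe(pin_c, pin_d))
# [('15', 'Tea'), ('2', 'Coffee'), ('15', 'Tea')]
```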

Inputs and outputs

Inputs: Duplicates, multiple optional.

Outputs: Duplicates removed.