This deprecated node removes duplicate records from one or more inputs based on the value(s) specified in the FieldListExpr property.
The output data is also sorted by this node.
To detect duplicates, you can use the Duplicate Detection node.
Example
You have the following input data:
Product_Code (unicode) | Product_Name (unicode) |
15 | Tea |
2 | Coffee |
3 | Water |
15 | Tea-EarlGrey |
15 | Tea-Herbal |
15 | Tea |
Removing duplicates across all input fields
If you do not enter a value in the FieldListExpr property, the duplicate detection will run across all of the input fields, meaning that only records that are identical across all fields will be removed.
One instance of "15, Tea" is removed in the output:
Product_Code (unicode) | Product_Name (unicode) |
15 | Tea |
15 | Tea-EarlGrey |
15 | Tea-Herbal |
2 | Coffee |
3 | Water |
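The behavior described above can be approximated with a short Python sketch. It is illustrative only and is not the node's actual implementation; the function name `remove_duplicates_all_fields` is hypothetical, and the fields are treated as strings to mirror the unicode field type in the example.

```python
# Illustrative sketch only; not the node's actual implementation.
records = [
    ("15", "Tea"),
    ("2", "Coffee"),
    ("3", "Water"),
    ("15", "Tea-EarlGrey"),
    ("15", "Tea-Herbal"),
    ("15", "Tea"),
]


def remove_duplicates_all_fields(rows):
    """Sort rows on every field, then drop any row identical to the one before it."""
    result = []
    for row in sorted(rows):
        if not result or result[-1] != row:
            result.append(row)
    return result


deduped = remove_duplicates_all_fields(records)
for code, name in deduped:
    print(f"{code} | {name}")
```

Because the values are compared as strings, "15" sorts before "2", which matches the order shown in the output table.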
Removing duplicates based on a specified field
If you enter the name of an input field in the FieldListExpr property, records with a duplicate value in that field are removed, regardless of whether the other input field(s) contain matching data.
For example, enter the following in the FieldListExpr property:
Product_Code
Three of the four records that have a Product_Code of "15" are removed, despite the differences in the Product_Name field:
Product_Code (unicode) | Product_Name (unicode) |
15 | Tea |
2 | Coffee |
3 | Water |
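As a rough sketch of key-based duplicate removal, the following Python compares records on the listed fields only. The helper `remove_duplicates_by_field` is hypothetical, and keeping the first record encountered for each key is an assumption made for illustration; it happens to match the example output, but the source does not state which record the node retains.

```python
# Illustrative sketch only; not the node's actual implementation.
FIELD_NAMES = ("Product_Code", "Product_Name")

records = [
    ("15", "Tea"),
    ("2", "Coffee"),
    ("3", "Water"),
    ("15", "Tea-EarlGrey"),
    ("15", "Tea-Herbal"),
    ("15", "Tea"),
]


def remove_duplicates_by_field(rows, field_names, key_fields):
    """Sort rows on the key fields and keep one row per distinct key.

    Assumption: the first row seen for each key is the one retained.
    """
    key_idx = [field_names.index(name) for name in key_fields]

    def key_of(row):
        return tuple(row[i] for i in key_idx)

    result = []
    previous_key = None
    # sorted() is stable, so rows with equal keys keep their input order.
    for row in sorted(rows, key=key_of):
        current_key = key_of(row)
        if not result or current_key != previous_key:
            result.append(row)
            previous_key = current_key
    return result


deduped_by_code = remove_duplicates_by_field(records, FIELD_NAMES, ["Product_Code"])
for code, name in deduped_by_code:
    print(f"{code} | {name}")
```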
To check for duplicates before removing them, you can use the Duplicate Detection node.
Properties
FieldListExpr
Specify a Script expression or enter a comma-separated list of input fields to be used to identify duplicate records in the input data.
If no value is given, then all fields will be used.
StableSort
Optionally specify whether records that have the same value(s) in the compared field(s) remain in their original relative order.
The default value is False, meaning that the records are sorted when this node runs.
Epsilon
Optionally specify a tolerance epsilon for the comparison of floating point numbers, for example, 0.1.
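To give a feel for what a comparison tolerance does, the sketch below treats two floating point values as equal when their absolute difference is no greater than epsilon. That rule is an assumption; the source does not define exactly how the node applies the Epsilon property. The names `nearly_equal` and `dedupe_with_epsilon` are hypothetical.

```python
# Illustrative sketch only; the exact comparison rule used by the node is assumed.
def nearly_equal(a, b, epsilon=0.0):
    """Return True when a and b differ by no more than epsilon."""
    return abs(a - b) <= epsilon


def dedupe_with_epsilon(values, epsilon=0.0):
    """Sort values, then drop any value within epsilon of the last kept value."""
    kept = []
    for value in sorted(values):
        if not kept or not nearly_equal(kept[-1], value, epsilon):
            kept.append(value)
    return kept


prices = [2.0, 1.05, 1.0, 1.3]
exact = dedupe_with_epsilon(prices)           # no tolerance
tolerant = dedupe_with_epsilon(prices, 0.1)   # tolerance of 0.1
print(exact)
print(tolerant)
```

With no tolerance, 1.0 and 1.05 are distinct values; with a tolerance of 0.1 they are treated as duplicates in this sketch.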
MergeOnly
Optionally specify whether to only perform a merge operation on the input data. This mode is useful for merging the output of multiple parallel sort nodes.
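The idea of merging inputs that are already sorted, rather than sorting them again, can be sketched with Python's standard `heapq.merge`. This is an analogy only: the function `merge_and_dedupe` is hypothetical, and dropping adjacent duplicates during the merge is an assumption about how the mode behaves, not something the source states.

```python
# Illustrative sketch only; not the node's actual implementation.
import heapq

# Two inputs that are each already sorted, as if produced by parallel sort nodes.
sorted_a = [("15", "Tea"), ("2", "Coffee")]
sorted_b = [("15", "Tea"), ("15", "Tea-Herbal"), ("3", "Water")]


def merge_and_dedupe(*sorted_inputs):
    """Merge pre-sorted inputs in one pass and drop adjacent identical rows.

    Assumption: every input is already sorted on the same fields.
    """
    result = []
    for row in heapq.merge(*sorted_inputs):
        if not result or result[-1] != row:
            result.append(row)
    return result


merged = merge_and_dedupe(sorted_a, sorted_b)
for code, name in merged:
    print(f"{code} | {name}")
```

A merge reads each input once and never re-sorts, which is why it suits inputs that upstream sort nodes have already ordered.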
Inputs and outputs
Inputs: Duplicates, multiple optional.
Outputs: Duplicates removed.