Functions: DEDUPE

Trillium Control Center

Product type: Software
Portfolio: Verify
Product family: Trillium
Product: Trillium > Trillium Discovery
Version: Latest
Language: English
Product name: Trillium Quality and Discovery
Title: Trillium Control Center
Copyright: 2024
First publish date: 2008
Last updated: 2024-10-18
Published on: 2024-10-18T15:02:04.502478

Separates the value in an attribute into tokens and returns a deduplicated, delimited list of tokens. Deduplication starts by searching for duplicate phrases that contain the maximum number of tokens per phrase. The search is then repeated with the number of tokens per phrase reduced by 1 each time, until it reaches the specified minimum.
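The multi-pass search described above can be sketched in Python. This is an illustrative approximation only, not Trillium's implementation: the function name `dedupe_sketch`, its parameter names, the space delimiter, and the rule of keeping the first occurrence of a phrase are all assumptions made for the example. It also does not model the restriction on matches across previously removed phrases or the token reordering noted in the guidelines.

```python
def dedupe_sketch(value, delimiter=" ", max_tokens=2, min_tokens=1):
    """Hypothetical sketch: drop repeated phrases, longest phrases first."""
    tokens = [t for t in value.split(delimiter) if t]
    # Search with the maximum phrase length first, then decrement by 1
    # until the minimum phrase length has been processed.
    for n in range(max_tokens, min_tokens - 1, -1):
        seen = set()
        kept = []
        i = 0
        while i < len(tokens):
            phrase = tuple(tokens[i:i + n])
            if len(phrase) == n and phrase in seen:
                # Assumption: later occurrences are removed, the first is kept.
                i += n
                continue
            if len(phrase) == n:
                seen.add(phrase)
            kept.append(tokens[i])
            i += 1
        tokens = kept
    return delimiter.join(tokens)


print(dedupe_sketch("red car blue red car", max_tokens=2, min_tokens=1))
# red car blue
```

In the sample call, the two-token pass removes the second "red car", and the one-token pass then finds nothing left to remove.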

Note: You can use the DEDUPE function in the Transformer and the Set Selection utility.

General Guidelines

  • The search is case-sensitive. For example, "Car" and "car" are not duplicates. When the data is in mixed case, combine the DEDUPE function with the UPPER or LOWER function so that the values are compared in a single case.
  • Duplicate phrases cannot extend across previously removed phrase(s) within a given record.
  • Pay special attention to the logic for multi-token phrase processing outlined in Example 1. When a duplicate multi-token phrase is found, the original order of tokens in the attribute may not be maintained.
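The effect of case sensitivity can be illustrated with a small Python sketch. It is not Trillium syntax: `dedupe_tokens` is a hypothetical single-token helper written for this example, and Python's `str.upper()` stands in for the UPPER function as an assumption.

```python
def dedupe_tokens(value, delimiter=" "):
    """Hypothetical helper: keep the first occurrence of each token."""
    seen = set()
    kept = []
    for token in value.split(delimiter):
        # Comparison is exact, so "Car" and "car" are different tokens.
        if token and token not in seen:
            seen.add(token)
            kept.append(token)
    return delimiter.join(kept)


mixed = "Car car truck"
print(dedupe_tokens(mixed))          # Car car truck
print(dedupe_tokens(mixed.upper()))  # CAR TRUCK
```

Normalizing the case before the search lets the two spellings match, at the cost of returning the value in the normalized case.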

Guidelines for the Set Selection Utility

  • Numerical values are returned using the currently calculated precision and string formatting.
  • Duplicate phrases cannot extend across boundaries of a given attribute for a given record within the set.
  • The maximum length of the returned string is the length of the input attribute multiplied by the number of records in the set. For example, if the input attribute has a length of 30, the set contains 4 records that each hold a full 30 characters of data, and no duplicates are found, the concatenation of the 4 records is 120 characters long. If this returned string is written back to a receiving attribute with a length of less than 120 characters, the returned data is truncated to the length of the receiving attribute.
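The length arithmetic in the last guideline can be checked with a short Python sketch. The helper names `max_returned_length` and `write_back` are hypothetical, and plain string concatenation and slicing are used here only to mirror the 30 x 4 = 120 example, not to describe how the utility stores data.

```python
def max_returned_length(attribute_length: int, records_in_set: int) -> int:
    """Upper bound on the returned string length for one set."""
    return attribute_length * records_in_set


def write_back(returned: str, receiving_length: int) -> str:
    """Truncate the returned string to the receiving attribute's length."""
    return returned[:receiving_length]


# Four records, each filling a 30-character attribute, with no duplicates.
records = ["A" * 30, "B" * 30, "C" * 30, "D" * 30]
returned = "".join(records)

print(len(returned), max_returned_length(30, 4))  # 120 120
print(len(write_back(returned, 100)))             # 100
```

Sizing the receiving attribute to at least the computed upper bound avoids the truncation shown in the second call.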