Quality Functions

Trillium Control Center

Product type: Software
Portfolio: Verify
Product family: Trillium
Product: Trillium > Trillium Discovery
Version: Latest
Language: English
Product name: Trillium Quality and Discovery
Title: Trillium Control Center
Copyright: 2024
First publish date: 2008
Last updated: 2024-10-18
Published on: 2024-10-18T15:02:04.502478

The following table lists the Quality functions you can use in the Expression Builder. The Country column shows which countries each function applies to.

Function Name

Country

Description

ASCTOFULL

Country: China, Japan, Korea, Taiwan

Transforms all half-width ASCII characters (single-byte) in an attribute to their full-width (double-byte) representation.

ASCTOHALF

Country: China, Japan, Korea, Taiwan

Transforms all full-width ASCII characters (double-byte) in an attribute to their half-width (single-byte) representation.
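As a rough illustration of ASCTOFULL and ASCTOHALF, the Python sketch below relies on the fact that the Unicode full-width forms U+FF01 to U+FF5E sit at a fixed offset (0xFEE0) from the printable ASCII characters U+0021 to U+007E. The helper names asc_to_full and asc_to_half are hypothetical. Mapping the ASCII space to the ideographic space U+3000 is also an assumption. This is not the Trillium implementation.

```python
# Hypothetical helpers that mimic ASCTOFULL / ASCTOHALF; not Trillium code.
FULL_WIDTH_OFFSET = 0xFEE0    # U+FF01..U+FF5E minus U+0021..U+007E
IDEOGRAPHIC_SPACE = "\u3000"  # assumed full-width counterpart of " "


def asc_to_full(text: str) -> str:
    """Convert printable half-width ASCII to full-width; leave other characters alone."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0x21 <= code <= 0x7E:
            out.append(chr(code + FULL_WIDTH_OFFSET))
        elif ch == " ":
            out.append(IDEOGRAPHIC_SPACE)
        else:
            out.append(ch)
    return "".join(out)


def asc_to_half(text: str) -> str:
    """Convert full-width ASCII forms back to half-width ASCII."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0xFF01 <= code <= 0xFF5E:
            out.append(chr(code - FULL_WIDTH_OFFSET))
        elif ch == IDEOGRAPHIC_SPACE:
            out.append(" ")
        else:
            out.append(ch)
    return "".join(out)


print(asc_to_half("\uff34\uff25\uff2c\uff1a\uff10\uff13"))  # TEL:03
```

Characters outside the ASCII ranges, such as kanji, pass through both helpers unchanged.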

CJKTOARABICNUM

Country: China, Japan, Korea, Taiwan

Transforms Chinese numeral characters in an attribute to their Arabic decimal equivalents.

Note: Apply this function only to attributes in which Chinese numeral characters always represent numbers. Otherwise, numerals that form part of a word are converted as well. For example, 千葉県 (Chiba Prefecture) returns 1000葉県.
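The pitfall in the note follows from treating every run of numeral characters as a number. The Python sketch below is a hypothetical illustration of that behavior. It assumes a small set of numeral characters (〇, 零, 一 to 九, 十, 百, 千, 万, 億), reads consecutive digits positionally, and treats a bare unit such as 千 as 1000. Trillium's actual character coverage and parsing rules may differ.

```python
from itertools import groupby

# Hypothetical numeral tables; Trillium's real coverage is not documented here.
DIGITS = {"〇": 0, "零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10_000, "億": 100_000_000}
NUMERAL_CHARS = set(DIGITS) | set(SMALL_UNITS) | set(LARGE_UNITS)


def _run_value(run: str) -> int:
    """Evaluate one uninterrupted run of numeral characters."""
    total = 0     # value of completed 万/億 groups
    section = 0   # value accumulated below the current large unit
    digit = None  # pending digit(s); None means no digit seen yet
    for ch in run:
        if ch in DIGITS:
            # Consecutive digits are read positionally, e.g. 二〇二四 -> 2024.
            digit = (digit or 0) * 10 + DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit such as 千 counts as 1 x 1000.
            section += (1 if digit is None else digit) * SMALL_UNITS[ch]
            digit = None
        else:
            section += digit or 0
            total += (section or 1) * LARGE_UNITS[ch]
            section, digit = 0, None
    return total + section + (digit or 0)


def cjk_to_arabic_num(text: str) -> str:
    """Replace every run of Chinese numeral characters with its Arabic value."""
    parts = []
    for is_numeral, chars in groupby(text, key=lambda ch: ch in NUMERAL_CHARS):
        run = "".join(chars)
        parts.append(str(_run_value(run)) if is_numeral else run)
    return "".join(parts)


print(cjk_to_arabic_num("千二百三十四円"))  # 1234円
print(cjk_to_arabic_num("千葉県"))          # 1000葉県, the pitfall described above
```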

CJKTOFULL

Country: China, Japan, Korea, Taiwan

Transforms half-width characters in an attribute to their full-width form. For Japan, this function automatically composes kana sound marks (dakuten and handakuten) appropriately. See Japanese full-width and half-width characters for details.

CJKTOHALF

Country: China, Japan, Korea, Taiwan

Transforms full-width characters in an attribute to their half-width form. For Japan, this function automatically decomposes kana sound marks (dakuten and handakuten) appropriately. See Japanese full-width and half-width characters for details.

CTOSIMPCHINESE

Country: China, Taiwan

Transforms Traditional Chinese characters in an attribute to their Simplified Chinese equivalents.

CTOTRADCHINESE

Country: China, Taiwan

Transforms Simplified Chinese characters in an attribute to their Traditional Chinese equivalents.

DEDUPE

Country: All

Splits the value of an attribute into tokens and returns a delimited list of the tokens with duplicates removed. The search for duplicate phrases starts with the maximum number of tokens per phrase. It is then repeated, with the phrase length reduced by one token each time, until the phrase length reaches the specified minimum.

Note: You can use the DEDUPE function in the Transformer and the Set Selection utility.

General Guidelines

  • The search is case-sensitive. For example, "Car" and "car" are not duplicates. When the data is in mixed case, the DEDUPE function can be used with the UPPER or LOWER function.
  • Duplicate phrases cannot extend across previously removed phrase(s) within a given record.
  • Pay special attention to the logic for multi-token phrase processing outlined in Example 1. When a duplicate multi-token phrase is found, the original order of tokens in the attribute may not be maintained.
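The following Python sketch illustrates the longest-phrase-first search described above. It is not the Trillium implementation. The following are all assumptions:

  • splitting on whitespace
  • keeping the first occurrence of a phrase and dropping later repeats
  • rejoining the result with a single space
  • the parameter names max_tokens and min_tokens

The token reordering mentioned for multi-token phrases in Example 1 is not reproduced.

```python
def dedupe(value: str, max_tokens: int, min_tokens: int = 1, delimiter: str = " ") -> str:
    """Remove repeated phrases, searching the longest phrase length first."""
    if not 1 <= min_tokens <= max_tokens:
        raise ValueError("expected 1 <= min_tokens <= max_tokens")
    # None marks a removed token. Windows that contain None are skipped, so a
    # duplicate phrase can never extend across a previously removed phrase.
    tokens: list[str | None] = [token for token in value.split()]
    for n in range(max_tokens, min_tokens - 1, -1):  # n = tokens per phrase
        first_seen: dict[tuple, int] = {}
        i = 0
        while i + n <= len(tokens):
            window = tokens[i:i + n]
            if None in window:
                i += 1
                continue
            phrase = tuple(window)
            start = first_seen.get(phrase)
            earlier_intact = start is not None and None not in tokens[start:start + n]
            if earlier_intact and start + n <= i:
                # A later, non-overlapping repeat of an intact phrase: drop it.
                tokens[i:i + n] = [None] * n
                i += n
            else:
                if not earlier_intact:
                    first_seen[phrase] = i  # remember where this phrase first occurs
                i += 1
    return delimiter.join(token for token in tokens if token is not None)


print(dedupe("NEW YORK NEW YORK CITY", max_tokens=2))  # NEW YORK CITY
print(dedupe("Car car", max_tokens=1))                 # Car car (case-sensitive)
print(dedupe("Car car".upper(), max_tokens=1))         # CAR
```

The last call mirrors the first guideline: normalizing case with UPPER before DEDUPE lets "Car" and "car" be treated as duplicates.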

Guidelines for the Set Selection Utility

  • Numeric values are returned using the currently calculated precision and string formatting.
  • Duplicate phrases cannot extend across the boundaries of a given attribute for a given record within the set.
  • The maximum length of the returned string is the length of the input attribute multiplied by the number of records in the set. For example, suppose the input attribute has a length of 30 and each of the 4 records in the set contains a full 30 characters of data. If no duplicates are found, the concatenation of the 4 records is 120 characters. If this string is written to a receiving attribute shorter than 120 characters, the returned data is truncated to the length of the receiving attribute.
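The length arithmetic in the last guideline can be sketched as follows. The sketch counts plain concatenation with no delimiters, as the 120-character example does. It also assumes that truncation drops the trailing characters. Both helper names are hypothetical.

```python
def max_result_length(attribute_length: int, record_count: int) -> int:
    """Maximum length of the returned string for a set, per the guideline above."""
    return attribute_length * record_count


def write_to_receiving_attribute(result: str, receiving_length: int) -> str:
    """Assumed truncation behavior: keep only the first receiving_length characters."""
    return result[:receiving_length]


# Four records, each filling a 30-character attribute, with no duplicates found.
records = ["A" * 30, "B" * 30, "C" * 30, "D" * 30]
combined = "".join(records)
assert len(combined) == max_result_length(30, 4) == 120

stored = write_to_receiving_attribute(combined, 100)
print(len(stored))  # 100: the last 20 characters are lost
```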

JCOMBINE

Country: Japan

Transforms spacing sound marks (dakuten and handakuten) in an attribute to their combining form. This function is usually run before JCOMPOSE.

If a sound mark cannot be merged with the preceding character (for example, "ア"), it is written out in hankaku in the output. If you need such sound marks in zenkaku in the output, run JSMARK after JCOMPOSE (JCOMBINE, then JCOMPOSE, then JSMARK). See Japanese Sound Marks for details.

JCOMPOSE

Country: Japan

Merges combining sound marks (dakuten and handakuten) with their base characters to build the corresponding voiced or semi-voiced characters (for example, か followed by a dakuten becomes が). It is recommended to run JCOMBINE first.

If a sound mark cannot be merged with the preceding character (for example, "ア"), it is written out in hankaku in the output. If you need such sound marks in zenkaku in the output, run JSMARK after JCOMPOSE (JCOMBINE, then JCOMPOSE, then JSMARK). See Japanese Sound Marks for details.
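In Unicode terms, the spacing sound marks are ゛ (U+309B) and ゜ (U+309C), and their combining forms are U+3099 and U+309A. The Python sketch below models JCOMBINE followed by JCOMPOSE using that mapping and Unicode NFC composition. It illustrates the idea under that assumption and is not Trillium's implementation. The helper names are hypothetical. Where Trillium writes an unmergeable mark in hankaku, the sketch simply leaves it as a combining mark.

```python
import unicodedata

# Unicode code points used as stand-ins for the sound-mark forms.
SPACING_TO_COMBINING = str.maketrans({"\u309b": "\u3099",   # spacing -> combining dakuten
                                      "\u309c": "\u309a"})  # spacing -> combining handakuten
COMBINING_MARKS = {"\u3099", "\u309a"}


def jcombine(text: str) -> str:
    """Turn spacing dakuten/handakuten into their combining forms."""
    return text.translate(SPACING_TO_COMBINING)


def jcompose(text: str) -> str:
    """Merge each combining mark into the preceding character if a precomposed form exists."""
    out: list[str] = []
    for ch in text:
        if ch in COMBINING_MARKS and out:
            merged = unicodedata.normalize("NFC", out[-1] + ch)
            if len(merged) == 1:  # e.g. か + dakuten -> が
                out[-1] = merged
                continue
        out.append(ch)            # no precomposed form (e.g. ア): keep the mark as-is
    return "".join(out)


print(jcompose(jcombine("か\u309bハ\u309c")))  # がパ
```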

JDECOMPOSE

Country: Japan

Separates combining sound marks (dakuten and handakuten) from their base characters. This function is usually run before JSMARK. See Japanese Sound Marks for details.
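Using the same Unicode stand-ins (combining marks U+3099 and U+309A, spacing marks U+309B and U+309C), JDECOMPOSE followed by JSMARK can be sketched with canonical decomposition (NFD). This is an illustration only. The helper names are hypothetical. Restricting the split to kana sound marks is a choice made here so that other characters, such as accented Latin letters, are left untouched.

```python
import unicodedata

SOUND_MARKS = ("\u3099", "\u309a")  # combining dakuten, combining handakuten
COMBINING_TO_SPACING = str.maketrans({"\u3099": "\u309b", "\u309a": "\u309c"})


def jdecompose(text: str) -> str:
    """Split kana such as が into base character + combining sound mark."""
    out = []
    for ch in text:
        parts = unicodedata.normalize("NFD", ch)
        # Only split characters whose decomposition ends in a kana sound mark.
        if len(parts) == 2 and parts[1] in SOUND_MARKS:
            out.append(parts)
        else:
            out.append(ch)
    return "".join(out)


def jsmark(text: str) -> str:
    """Convert combining sound marks to their spacing forms."""
    return text.translate(COMBINING_TO_SPACING)


print(jsmark(jdecompose("ガイド")))  # カ゛イト゛
```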

JHIRAGANASTOL

Country: Japan

Transforms small kana characters in an attribute, such as yo-on and soku-on characters, to their large equivalents, as listed below.

Zenkaku

Large: あいうえおつやゆよわアイウエオツヤユヨワ

Small: ぁぃぅぇぉっゃゅょゎァィゥェォッャュョヮ

Hankaku

Large: ｱｲｳｴｵﾂﾔﾕﾖ

Small: ｧｨｩｪｫｯｬｭｮ
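Because this transformation is a fixed character-for-character substitution, it can be sketched with a translation table built from the small and large character pairs listed above. In this sketch, the half-width (hankaku) characters are written as Unicode half-width katakana escapes, with the glyphs shown in comments. The function name is hypothetical, and the sketch is not Trillium code.

```python
# Small -> large pairs from the table above (zenkaku hiragana and katakana).
ZENKAKU_SMALL = "ぁぃぅぇぉっゃゅょゎァィゥェォッャュョヮ"
ZENKAKU_LARGE = "あいうえおつやゆよわアイウエオツヤユヨワ"
# Half-width katakana (Unicode Halfwidth and Fullwidth Forms block).
HANKAKU_SMALL = "\uff67\uff68\uff69\uff6a\uff6b\uff6f\uff6c\uff6d\uff6e"  # ｧｨｩｪｫｯｬｭｮ
HANKAKU_LARGE = "\uff71\uff72\uff73\uff74\uff75\uff82\uff94\uff95\uff96"  # ｱｲｳｴｵﾂﾔﾕﾖ

# str.maketrans raises ValueError if the two strings differ in length.
SMALL_TO_LARGE = str.maketrans(ZENKAKU_SMALL + HANKAKU_SMALL,
                               ZENKAKU_LARGE + HANKAKU_LARGE)


def jhiragana_stol(text: str) -> str:
    """Replace each listed small kana with its large counterpart."""
    return text.translate(SMALL_TO_LARGE)


print(jhiragana_stol("きゃっと"))  # きやつと
```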

 

JKANATOROMAN

Country: Japan

Transforms hiragana and full-width katakana characters in an attribute to Hebon-style (Hepburn) romaji. See Romaji characters.

JROMANTOKANA

Country: Japan

Transforms romaji (Hebon) characters in an attribute to full-width katakana. See Romaji characters.

JSMARK

Country: Japan

Transforms combining sound marks (dakuten and handakuten) in an attribute to their spacing form. This function is usually run after CJKTOFULL or JDECOMPOSE. See Japanese full-width and half-width characters for details.

JTOHIRAGANA

Country: Japan

Transforms full-width katakana characters in an attribute to hiragana. To convert half-width katakana characters to hiragana, run CJKTOFULL first and then JTOHIRAGANA. See Japanese full-width and half-width characters for details.

JTOKATAKANA

Country: Japan

Transforms hiragana characters in an attribute to full-width katakana. See Japanese full-width and half-width characters for details.
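The Unicode hiragana and katakana blocks are aligned, so JTOHIRAGANA and JTOKATAKANA can be sketched as a fixed code-point shift of 0x60 between full-width katakana and hiragana. The helpers below are hypothetical. Characters without a counterpart in the other script, such as the prolonged sound mark ー, pass through unchanged. NFKC normalization is used only as a rough stand-in for CJKTOFULL, to widen half-width katakana first. Unlike CJKTOFULL, NFKC also narrows full-width ASCII.

```python
import unicodedata

# Katakana U+30A1..U+30F6 sit exactly 0x60 above hiragana U+3041..U+3096.
KANA_OFFSET = 0x60


def to_hiragana(text: str) -> str:
    """Full-width katakana -> hiragana; everything else (including ー) is unchanged."""
    return "".join(chr(ord(ch) - KANA_OFFSET) if "\u30a1" <= ch <= "\u30f6" else ch
                   for ch in text)


def to_katakana(text: str) -> str:
    """Hiragana -> full-width katakana."""
    return "".join(chr(ord(ch) + KANA_OFFSET) if "\u3041" <= ch <= "\u3096" else ch
                   for ch in text)


# Half-width ｺｰﾋｰ must be widened first; NFKC stands in for CJKTOFULL here.
widened = unicodedata.normalize("NFKC", "\uff7a\uff70\uff8b\uff70")
print(to_hiragana(widened))     # こーひー
print(to_katakana("ひらがな"))  # ヒラガナ
```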

KTOROMAN

Country: Korea

Transforms Korean Hangul characters in an attribute to their romanized forms.

MATCH

Country: All

Compares attributes and/or values and returns a match score based on the Relationship Linker Comparison routines and modifiers. All comparison routines are available for use with this function.

PROXIMITY

Country: All

Returns the distance between two coordinates, calculated with the DISTANCE Relationship Linker routine. Each coordinate consists of two numbers, a latitude and a longitude. Distance can be measured in kilometers (KM), miles (MI), or nautical miles (NM).

Use this function in a Transformer expression to write the calculated distance to a new attribute, or as part of a conditional statement.
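The formula behind the DISTANCE routine is not described here. As a rough, hypothetical stand-in, the sketch below uses the haversine great-circle formula with an assumed mean Earth radius of 6,371 km. It uses the standard conversions 1 mi = 1.609344 km and 1 NM = 1.852 km. The function name proximity and its argument order are illustrative only, and Trillium's results may differ.

```python
import math

# Assumed constants; the DISTANCE routine may use a different Earth model.
EARTH_RADIUS_KM = 6371.0
KM_PER_UNIT = {"KM": 1.0, "MI": 1.609344, "NM": 1.852}


def proximity(lat1: float, lon1: float, lat2: float, lon2: float, unit: str = "KM") -> float:
    """Haversine distance between two latitude/longitude pairs given in degrees."""
    if unit not in KM_PER_UNIT:
        raise ValueError(f"unit must be one of {sorted(KM_PER_UNIT)}")
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    central_angle = 2 * math.asin(min(1.0, math.sqrt(a)))  # clamp guards rounding error
    return EARTH_RADIUS_KM * central_angle / KM_PER_UNIT[unit]


# One degree of longitude along the equator.
print(round(proximity(0.0, 0.0, 0.0, 1.0, "KM"), 1))  # 111.2
print(round(proximity(0.0, 0.0, 0.0, 1.0, "NM"), 1))  # 60.0
```

A result like this could then feed a conditional, for example flagging record pairs whose distance is below a chosen threshold.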

UNIQUE_ID

Country: All

Generates universally unique identifiers (UUIDs) to serve as unique, permanent record identifiers. A UUID is a unique 36-character key that can be used to maintain high volumes of records in the database. For example, you can use UUIDs to identify record and attribute changes in sorted files and to manage multiple views of matched relationships.

UUIDs are represented as 32 hexadecimal digits, displayed in five groups separated by hyphens in the form 8-4-4-4-12, for a total of 36 characters (32 hexadecimal digits and four hyphens).

Example:

f18e79d0-d474-494e-8290-7e09c4b9679d

You can configure the Quality processes to generate UUIDs by creating a new attribute and assigning the UNIQUE_ID function to it.

Note: The attribute that holds the unique IDs must be at least 36 characters long, and its attribute type must be ASCII.
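A minimal sketch of generating and checking such identifiers with Python's standard uuid module follows. Using random (version 4) UUIDs is an assumption: the sample value above has a version digit of 4, but the documentation does not state which UUID version Trillium generates. The helper names unique_id and fits_attribute are hypothetical.

```python
import re
import uuid

# 8-4-4-4-12 lowercase hexadecimal groups, 36 characters in total.
UUID_PATTERN = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")
MIN_ATTRIBUTE_LENGTH = 36


def unique_id() -> str:
    """Return a random (version 4) UUID as a 36-character string."""
    return str(uuid.uuid4())


def fits_attribute(value: str, attribute_length: int) -> bool:
    """Check the documented constraints: ASCII content that fits the attribute length."""
    return value.isascii() and len(value) <= attribute_length


new_id = unique_id()
print(len(new_id), bool(UUID_PATTERN.fullmatch(new_id)))  # 36 True
print(fits_attribute(new_id, MIN_ATTRIBUTE_LENGTH))       # True
print(fits_attribute(new_id, 35))                         # False
```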