Parsing Names for China, Korea, and Taiwan

Title: Trillium Control Center
Product name: Trillium Quality and Discovery
Product type: Software
Portfolio: Verify
Product family: Trillium
Product: Trillium > Trillium Discovery
Version: Latest
Language: English
Copyright: 2024
First publish date: 2008
Last updated: 2024-10-18
Published on: 2024-10-18

The Customer Data Parser (CDP) processes personal and business names for China, Korea, and Taiwan in three steps.

Step 1: Token Identification

The first step, called "Token Identification," isolates words and phrases into tokens. A token contains one or more characters or symbols that are identifiable as a word or as an element of a word or phrase. During the initial scan, the Parser uses commas and space characters in the input attribute to determine where one token ends and the next begins.
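The initial scan can be pictured with a short Python sketch. This is only an illustration of splitting on commas and spaces, not the Parser's actual implementation; the function name identify_tokens is hypothetical, and treating the full-width comma and ideographic space as delimiters is an assumption.

```python
import re

# Assumption: besides the ASCII comma and space, the full-width comma (U+FF0C)
# and any Unicode whitespace (which includes the ideographic space U+3000)
# also act as token boundaries.
_DELIMITERS = re.compile(r"[,\uFF0C\s]+")


def identify_tokens(attribute: str) -> list[str]:
    """Split an input attribute into tokens at commas and space characters."""
    # re.split can yield empty strings at the edges, so drop them.
    return [token for token in _DELIMITERS.split(attribute) if token]


if __name__ == "__main__":
    print(identify_tokens("王小明, 大同有限公司"))
```

Note that a name written without spaces stays a single token at this stage; splitting it further is left to the lookup step.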

Step 2: Table Lookup

The second step scans each token against the Standard and Custom Parser Definitions tables (also known as lookup tables or word pattern tables). This process verifies which tokens are personal names and which are business names. It also identifies the surname character or characters and uncovers new tokens based on the lookup results. Any word element that can be further identified as part of a name, such as a surname or a given name, becomes a separate token.
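The sketch below illustrates the idea of a table lookup that classifies a token and splits it into new tokens. Everything in it is hypothetical: the table contents, the category labels, the function name lookup, and the rule that custom entries override standard ones are assumptions made for the example, not the product's actual tables or logic.

```python
# Hypothetical stand-ins for the Standard and Custom Parser Definitions tables.
STANDARD_TABLE = {"王": "SURNAME", "李": "SURNAME", "欧阳": "SURNAME", "公司": "BUSINESS"}
CUSTOM_TABLE = {"有限公司": "BUSINESS"}


def lookup(token: str) -> list[tuple[str, str]]:
    """Classify a token and split it into (text, category) pairs."""
    # Assumption: a custom entry overrides a standard entry with the same key.
    table = {**STANDARD_TABLE, **CUSTOM_TABLE}
    # Try longer entries first so that "欧阳" wins over any one-character match.
    entries = sorted(table.items(), key=lambda item: -len(item[0]))

    if token in table:
        return [(token, table[token])]

    # A trailing business word marks the token as a business name.
    for entry, category in entries:
        if category == "BUSINESS" and token.endswith(entry):
            return [(token[: -len(entry)], "BUSINESS_NAME"), (entry, "BUSINESS_WORD")]

    # A leading surname splits a personal name into surname and given name.
    for entry, category in entries:
        if category == "SURNAME" and token.startswith(entry):
            return [(entry, "SURNAME"), (token[len(entry):], "GIVEN_NAME")]

    return [(token, "UNKNOWN")]


if __name__ == "__main__":
    for tok in ("王小明", "欧阳修", "大同有限公司"):
        print(tok, lookup(tok))
```

Matching longer entries first is one simple way to handle two-character surnames; a real lookup table would carry far more entries and pattern rules.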

Step 3: Output

In the third step, the Parser passes a comprehensive data block called the PREPOS (Parser Repository). The PREPOS contains parsed data, including error codes, identification indicators, and name information. The output schema determines which of these attributes are returned in the output.
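As a rough illustration of how an output schema selects attributes, the following Python sketch filters a dictionary standing in for the PREPOS. The field names and the function apply_output_schema are invented for the example and do not reflect the real PREPOS layout or schema format.

```python
# Hypothetical PREPOS-like record; the real PREPOS layout is not shown here.
prepos = {
    "surname": "王",
    "given_name": "小明",
    "name_type": "PERSONAL",  # stands in for an identification indicator
    "error_code": 0,
}


def apply_output_schema(record: dict, schema: list[str]) -> dict:
    """Return only the attributes that the output schema names."""
    # Attributes listed in the schema but absent from the record are skipped.
    return {field: record[field] for field in schema if field in record}


if __name__ == "__main__":
    print(apply_output_schema(prepos, ["surname", "given_name"]))
```

The point of the sketch is only that the full data block is always produced, while the schema decides which parts of it reach the output.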

Click the following topics to set up and run the Customer Data Parser process.