Parsing Names for China, Korea, and Taiwan - trillium_discovery - trillium_quality - 17.2

The Customer Data Parser (CDP) processes personal and business names for China, Korea, and Taiwan in three steps.

Step 1: Token Identification

The first step, called Token Identification, isolates words and phrases into tokens. A token contains one or more characters or symbols that are identifiable as a word or as an element of a word or phrase. During the initial scan, the Parser uses commas and space characters in the input attribute to determine where one token ends and the next begins.
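
The initial scan can be illustrated with a minimal Python sketch. This is not the Parser's actual implementation: the function name identify_tokens is hypothetical, and treating the full-width comma and the ideographic space as delimiters alongside their ASCII counterparts is an assumption made for this example.

```python
import re

# Hypothetical delimiter set: ASCII comma, any Unicode whitespace (which
# includes the ideographic space U+3000), and the full-width comma U+FF0C.
# Whether the Parser treats full-width characters this way is an assumption.
_DELIMITERS = re.compile(r"[,\s\uFF0C]+")


def identify_tokens(attribute: str) -> list[str]:
    """Split an input attribute into tokens at commas and spaces."""
    # A run of adjacent delimiters counts as one boundary; empty pieces
    # produced by leading or trailing delimiters are discarded.
    return [piece for piece in _DELIMITERS.split(attribute) if piece]


if __name__ == "__main__":
    print(identify_tokens("王 小明, 有限公司"))
```

Each returned string is a candidate token only. Deciding what kind of token it is happens in the next step.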


Step 2: Table Lookup

The second step scans each token against the Standard and Custom Parser Definitions tables (also known as lookup tables or word pattern tables). This process determines which tokens are personal names and which are business names. It also identifies the surname character or characters and uncovers new tokens based on the lookup results. During this process, every word element that can be further identified as part of a name (for example, a surname or a given name) becomes a separate token.
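
The lookup idea can be sketched in Python as follows. Everything here is illustrative: the table contents, the category labels, the function name lookup_tokens, and the longest-match rule for surnames are assumptions, not the contents or behavior of the actual Parser Definitions tables.

```python
# Hypothetical stand-ins for the Standard and Custom Parser Definitions tables.
STANDARD_SURNAMES = {"王", "李", "金", "歐陽"}
CUSTOM_SURNAMES = {"司馬"}
BUSINESS_WORDS = {"有限公司", "公司", "주식회사"}


def lookup_tokens(tokens):
    """Classify tokens and split a surname from the rest of a name token."""
    surnames = STANDARD_SURNAMES | CUSTOM_SURNAMES
    longest = max(len(s) for s in surnames)
    results = []
    for token in tokens:
        # A token ending in a business word is treated as a business name.
        if any(token.endswith(word) for word in BUSINESS_WORDS):
            results.append((token, "BUSINESS"))
            continue
        # Try the longest possible surname first so that a two-character
        # surname is not cut short by a one-character entry.
        for size in range(min(longest, len(token)), 0, -1):
            if token[:size] in surnames:
                results.append((token[:size], "SURNAME"))
                if token[size:]:
                    # The remainder becomes a new, separate token.
                    results.append((token[size:], "GIVEN_NAME"))
                break
        else:
            results.append((token, "UNKNOWN"))
    return results


if __name__ == "__main__":
    print(lookup_tokens(["王小明", "華為有限公司"]))
```

In this sketch a single input token such as 王小明 yields two output tokens, which mirrors how the lookup results can uncover new tokens.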


Step 3: Output

The Parser produces a comprehensive data block called the PREPOS (Parser Repository). The PREPOS contains the parsed data, including error codes, identification indicators, and name information. The output schema determines which of these attributes are written to the output.
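
The relationship between the PREPOS and the output schema can be pictured with a short Python sketch. The attribute names (error_code, name_type, surname, given_name) and the function apply_output_schema are hypothetical and do not reflect the real PREPOS layout. The sketch shows only the idea that the schema selects a subset of the available attributes.

```python
def apply_output_schema(prepos: dict, schema: list[str]) -> dict:
    """Return only the PREPOS attributes that the output schema names."""
    # Attributes named in the schema but absent from the block are skipped.
    return {name: prepos[name] for name in schema if name in prepos}


if __name__ == "__main__":
    # Hypothetical PREPOS content for one parsed personal name.
    prepos = {
        "error_code": 0,
        "name_type": "PERSONAL",
        "surname": "王",
        "given_name": "小明",
    }
    print(apply_output_schema(prepos, ["surname", "given_name"]))
```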

Click the following topics to set up and run the Customer Data Parser process.