Customer Data Parser Process - trillium_discovery - trillium_quality - 17.1

Trillium Control Center

First publish date
2008

The Customer Data Parser (CDP) standardizes name and address data by identifying individual elements and recoding them according to country-specific rules and tables. The CDP completes this process in seven major steps:

Note: Details of the steps described in this topic are written to the CDP debug file. By enabling debugging in the CDP settings, you can review the process from tokenization to the final output for each record.

Step 1: Assign Intrinsic Attributes

The CDP separates the input data lines into tokens and assigns an intrinsic attribute to each token. Tokens are strings that contain one or more characters (and/or symbols) that are identifiable as a word or phrase. An intrinsic attribute describes the type of data in a token (for example, ALPHA, NUMERIC, and so on). For example, a token containing only alphabetic characters would be assigned an intrinsic attribute of ALPHA, and a token containing a single numeric digit would be assigned an intrinsic attribute of 1NUMERIC.
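The following Python sketch illustrates the idea of tokenizing a line and assigning an intrinsic attribute to each token. It is not the CDP's implementation: splitting on whitespace is a simplification (CDP tokens can also be phrases), the function names are hypothetical, and MIXED is an invented catch-all rather than a documented attribute. Only ALPHA, NUMERIC, and 1NUMERIC come from the text above.

```python
import re


def intrinsic_attribute(token: str) -> str:
    """Classify a token by the kind of characters it contains (illustrative rules only)."""
    if token.isalpha():
        return "ALPHA"
    if re.fullmatch(r"[0-9]", token):
        # A single numeric digit, as in the example above.
        return "1NUMERIC"
    if re.fullmatch(r"[0-9]+", token):
        return "NUMERIC"
    # Hypothetical catch-all; not a documented CDP attribute.
    return "MIXED"


def tokenize(line: str) -> list[tuple[str, str]]:
    """Split a line on whitespace and pair each token with its intrinsic attribute."""
    return [(tok, intrinsic_attribute(tok)) for tok in line.split()]


if __name__ == "__main__":
    for token, attribute in tokenize("John Smith 5 Main St"):
        print(token, attribute)
```

In this sketch every attribute is derived from the characters of the token alone; no lookup table is consulted at this stage.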

Step 2: Assign Specific Attributes

The CDP looks up all tokens in the country-specific parser table and assigns specific attributes. A specific attribute identifies the token as a particular name or address element, such as GIVEN-NAME.
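As a rough illustration of this lookup, the Python sketch below uses a small dictionary as a stand-in for a country-specific parser table. The table contents, the function name, and the choice to keep the intrinsic attribute when a token is not found are all assumptions made for the example; the real parser table format is not shown here.

```python
# Hypothetical stand-in for a country-specific parser table.
# Keys are upper-cased tokens; values are specific attributes.
PARSER_TABLE = {
    "JOHN": "GIVEN-NAME",
    "MAIN": "STREET-NAME",
    "ST": "STREET-TYPE",
}


def specific_attribute(token: str, intrinsic: str) -> str:
    """Return the table's attribute for a token.

    Assumption: a token missing from the table keeps its intrinsic attribute.
    """
    return PARSER_TABLE.get(token.upper(), intrinsic)


if __name__ == "__main__":
    print(specific_attribute("John", "ALPHA"))   # found in the table
    print(specific_attribute("Smith", "ALPHA"))  # not found, stays ALPHA
```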

Step 3: Assign Line Types

Using the attributes assigned in Step 2, the CDP identifies the line type and reassesses the attributes based on the line type. For example, if the attributes on a line include STREET-NAME and STREET-TYPE, the CDP would identify the line type as Street. The CDP identifies four types of lines:

  • Name (N)
  • Street (S)
  • Geography (G)
  • Miscellaneous (M)
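A minimal Python sketch of line typing follows. Only the Street rule (STREET-NAME together with STREET-TYPE) comes from the text above; the Name and Geography rules, the CITY-NAME attribute, and the order in which rules are tried are assumptions for illustration. The sketch also omits the reassessment of attributes that follows line typing.

```python
def line_type(attributes: list[str]) -> str:
    """Return N, S, G, or M for a line, given the attributes of its tokens."""
    attrs = set(attributes)
    # Rule taken from the example above: both street attributes present.
    if {"STREET-NAME", "STREET-TYPE"} <= attrs:
        return "S"
    # Assumed rule: a given name marks a Name line.
    if "GIVEN-NAME" in attrs:
        return "N"
    # Assumed rule; CITY-NAME is a hypothetical attribute name.
    if "CITY-NAME" in attrs:
        return "G"
    # Anything unrecognized falls back to Miscellaneous.
    return "M"


if __name__ == "__main__":
    print(line_type(["1NUMERIC", "STREET-NAME", "STREET-TYPE"]))
    print(line_type(["GIVEN-NAME", "ALPHA"]))
```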

Step 4: Process Geography Lines

After identifying the lines, the CDP then parses each line in detail. Geographic lines are parsed first using the city tables.

Step 5: Process Name Lines

Next, the CDP parses the name lines. The CDP looks up the original pattern of the name line in the country-specific word and pattern definitions table and recodes the line based on the recoded pattern.
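The pattern lookup described above can be pictured with the Python sketch below. It treats the sequence of attributes on a line as the original pattern and maps it to a recoded pattern through a dictionary. The table entry, the SURNAME attribute, the function name, and the behavior when no pattern matches are all hypothetical; the actual word and pattern definitions table has its own format.

```python
# Hypothetical pattern table: original pattern -> recoded pattern.
PATTERN_TABLE = {
    ("GIVEN-NAME", "ALPHA"): ("GIVEN-NAME", "SURNAME"),
}


def recode_line(tokens: list[str], attributes: list[str]) -> list[tuple[str, str]]:
    """Pair each token with its recoded attribute when the line's pattern is known.

    Assumption: a line whose pattern is not in the table keeps its attributes.
    """
    recoded = PATTERN_TABLE.get(tuple(attributes), tuple(attributes))
    return list(zip(tokens, recoded))


if __name__ == "__main__":
    print(recode_line(["John", "Smith"], ["GIVEN-NAME", "ALPHA"]))
```

Keying the table on the whole pattern, rather than on individual tokens, is what lets an ambiguous token such as an unclassified ALPHA word be reinterpreted from its position in the line.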

Step 6: Process Street Lines

Street lines are processed in the same way as name lines. The CDP looks up the original pattern of the street line in the country-specific word and pattern definitions table and recodes the line based on the recoded pattern.

Step 7: Generate Output

Based on the final attributes, the CDP generates standardized output in the PR_ ("Parser Repository") fields, which can then be analyzed and corrected.