CN Postal Matching Process (Trillium Discovery and Trillium Quality 17.1)

Trillium Control Center

Product type: Software
Portfolio: Verify
Product family: Trillium
Products: Trillium Discovery, Trillium Quality
Version: 17.1
Language: English
Product name: Trillium Quality and Discovery
Topic types: Overview, Administration, Configuration, Installation, Reference, How Do I
First publish date: 2008

The CN (China) Postal Matcher identifies and matches postal information in four steps.

Step 1: Initial Parsing

The first step of the postal matching process isolates words and phrases by breaking the input attribute(s) into recognizable tokens. During this initial scan, the Postal Matcher treats commas and space characters in the input attribute as the boundaries between tokens.

Example

Input record: 吴卓霖,广东省广州市南华路22号,135800

Initial token results (6 tokens):

吴卓霖 | 广东省 | 广州市 | 南华路 | 22号 | 135800
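A minimal Python sketch of this initial scan, assuming that any run of commas and whitespace acts as a single token boundary. The function name `initial_parse` is hypothetical and is not part of the Trillium product. Because the sketch splits only at commas and whitespace, it yields the six tokens above only when the address components are separated by spaces; the spaced input below is an assumption made for illustration.

```python
import re


def initial_parse(record: str) -> list[str]:
    """Hypothetical sketch of Step 1: split a raw input attribute into tokens.

    Any run of commas and/or whitespace is treated as one token boundary,
    and empty pieces (from leading or trailing delimiters) are dropped.
    """
    return [token for token in re.split(r"[,\s]+", record) if token]


# Assumed input: the address components are separated by spaces.
record = "吴卓霖,广东省 广州市 南华路 22号,135800"
print(initial_parse(record))
# ['吴卓霖', '广东省', '广州市', '南华路', '22号', '135800']
```

Given the input record exactly as printed above (no spaces inside the address), the same function returns three tokens, with the whole address kept as a single token.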

Step 2: Table-Based Tokenizing

After the initial tokens are created, the Postal Matcher checks each token against the Parser Definition tables to identify it further. During this secondary identification, any element that the Parser Definition entries can identify is split out as a token of its own.

Example

Token results from the previous step (6 tokens):

吴卓霖 | 广东省 | 广州市 | 南华路 | 22号 | 135800

Token results of this step (12 tokens):

Previous result → New result (reasoning)

吴卓霖 → 吴 | 卓霖 (surname lookup)
广东省 → 广东省 (L1 lookup)
广州市 → 广州市 (L2 lookup)
南华路 → 南华路 (L4 lookup)
22号 → 22 | 号 (recognized as a house number by table lookup)
135800 → 135800 (not resolved by table lookup; see Step 3)
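The table-driven refinement can be sketched in Python as follows. The lookup sets (`SURNAMES`, `LEVEL_TABLES`) and the house-number pattern are hypothetical stand-ins for Parser Definition entries, which this page does not list; the sketch only reproduces the decisions shown in the table above and is not Trillium's implementation.

```python
import re

# Hypothetical stand-ins for Parser Definition table entries.
SURNAMES = {"吴", "王", "李", "张"}
LEVEL_TABLES = {
    "L1": {"广东省"},  # assumed province-level entries
    "L2": {"广州市"},  # assumed city-level entries
    "L4": {"南华路"},  # assumed street-level entries
}
HOUSE_NUMBER = re.compile(r"(\d+)(号)")  # digits followed by 号, e.g. 22号


def table_tokenize(tokens: list[str]) -> list[tuple[list[str], str]]:
    """Hypothetical sketch of Step 2: refine each Step 1 token by table lookup.

    Returns one (new_tokens, reasoning) pair per input token.
    """
    results = []
    for token in tokens:
        # 1. Whole-token match against an address-level table.
        level = next(
            (name for name, entries in LEVEL_TABLES.items() if token in entries),
            None,
        )
        if level is not None:
            results.append(([token], f"{level} lookup"))
            continue
        # 2. House number: split the digits from the 号 suffix.
        match = HOUSE_NUMBER.fullmatch(token)
        if match:
            results.append(([match.group(1), match.group(2)], "house number"))
            continue
        # 3. Single-character surname followed by a given name.
        if len(token) > 1 and token[0] in SURNAMES:
            results.append(([token[0], token[1:]], "surname lookup"))
            continue
        # 4. Nothing matched: leave the token unknown for Step 3.
        results.append(([token], "unknown"))
    return results


step1_tokens = ["吴卓霖", "广东省", "广州市", "南华路", "22号", "135800"]
for original, (new_tokens, reason) in zip(step1_tokens, table_tokenize(step1_tokens)):
    print(original, "->", " | ".join(new_tokens), f"({reason})")
# 吴卓霖 -> 吴 | 卓霖 (surname lookup)
# 广东省 -> 广东省 (L1 lookup)
# 广州市 -> 广州市 (L2 lookup)
# 南华路 -> 南华路 (L4 lookup)
# 22号 -> 22 | 号 (house number)
# 135800 -> 135800 (unknown)
```

Whole-token table matches are tried before the partial splits so that a known place name is never broken apart by the surname rule; tokens that no lookup recognizes, such as 135800, are passed on unchanged for mask-based identification.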

Step 3: Mask-Based Data Identification

Any token that remains unknown after the table lookup is then checked against a set of predefined masks (data shapes) in the Parser Definition table. In the example above, this applies to the token 135800.

For example, 1100000 is identified as a postcode based on the mask entry pr_postcode: 1100000.

1100000 → pr_postcode: 1100000 (recognized as a postcode by mask lookup)
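Mask-based identification can be sketched in Python under an assumed notation: each token is reduced to a shape in which every digit becomes `9`, and that shape is looked up in a mask table. Both the notation and the `MASKS` table are hypothetical and differ from the Parser Definition syntax (pr_postcode: 1100000) shown above; the six-digit mask reflects the format of Chinese postcodes such as 135800.

```python
from typing import Optional

# Hypothetical mask table: shape -> token type, where "9" stands for any digit.
# Chinese postcodes are six digits long, e.g. 135800.
MASKS = {"999999": "postcode"}


def shape(token: str) -> str:
    """Reduce a token to its data shape: every decimal digit becomes '9'."""
    return "".join("9" if ch.isdecimal() else ch for ch in token)


def identify_by_mask(token: str) -> Optional[str]:
    """Return the type whose mask matches the token's shape, or None."""
    return MASKS.get(shape(token))


print(identify_by_mask("135800"))   # postcode
print(identify_by_mask("1358000"))  # None (seven digits, no matching mask)
```

Matching on shape rather than on literal values lets one mask entry cover every postcode, which is why masks are consulted only after the value-specific table lookups have failed.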

Step 4: Output to PREPOS

The Postal Matcher passes the results of the data identification process to the PREPOS program. For details, see Analyzing the Postal Matcher Results for Asian Countries.