TW Postal Matching Process

Trillium Control Center

Product type: Software
Portfolio: Verify
Product family: Trillium™ software
Product: Trillium™ software > Trillium™ Discovery
Version: Latest
Locale: en-US
Product name: Trillium Quality and Discovery
Copyright: 2025
First publish date: 2008
Last edition: 2025-08-28
Last publication: 2025-08-28T06:18:28.409000

The TW Postal Matcher uses a four-step process to identify and match postal information.

Step 1: Initial Parsing

The first step in the Postal Matching process isolates all words and phrases by breaking up the input attribute(s) into recognizable tokens. During the initial scan, the Postal Matcher uses commas or space characters in the input attribute to determine where one token ends and the next begins.

Example

Input record: 鄭淑珍, 台北市四維路 2號 3樓, 106

Initial token results: (six tokens)

Token 1   Token 2   Token 3   Token 4   Token 5   Token 6
鄭淑珍    台北市    四維路    2號       3樓       106
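The initial scan can be sketched in a few lines of Python. This is only an illustration of splitting on commas and space characters, not the Postal Matcher's actual implementation; the function name `initial_parse` is hypothetical. The sample record is written with a space between 台北市 and 四維路, which is an assumption made here because the six-token result lists them as separate tokens.

```python
import re

def initial_parse(record: str) -> list[str]:
    """Split an input record into tokens on commas and whitespace.

    Hypothetical helper: a stand-in for the initial scan, not a Trillium API.
    """
    # Any run of commas and/or whitespace ends one token and starts the next.
    pieces = re.split(r"[,\s]+", record)
    # Drop empty strings produced by leading or trailing separators.
    return [p for p in pieces if p]

# Assumed sample: a space separates 台北市 and 四維路.
record = "鄭淑珍, 台北市 四維路 2號 3樓, 106"
tokens = initial_parse(record)
print(len(tokens), tokens)
```

Without a separator between 台北市 and 四維路, this sketch would keep them together as a single token.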

Step 2: Table-Based Tokenizing

After the initial tokens are created, the Postal Matcher checks each token against the Parser Definition tables to identify it further. During this second pass, any token that contains elements identifiable through Parser Definition entries is split into separate tokens.

Example

Token results of previous step: (6 tokens)

鄭淑珍 | 台北市 | 四維路 | 2號 | 3樓 | 106

Token results of this step: (9 tokens)

Previous Results    New Results    Reasoning
鄭淑珍              鄭 | 淑珍      Based on surname lookup
台北市              台北市         Based on L1 lookup
四維路              四維路         Based on L4 lookup
2號                 2 | 號         Recognized as house number based on table lookup
3樓                 3 | 樓         Recognized as floor number based on table lookup
106                 106            See Step 3.
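The table-based pass can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the lookup sets stand in for Parser Definition entries (whose real format is not shown in this topic), the labels are informal, and the function name `table_tokenize` is hypothetical. The sketch only handles single-character surnames and a number followed by one unit character.

```python
import re

# Hypothetical stand-ins for Parser Definition entries (not the real table format).
SURNAMES = {"鄭"}
L1_NAMES = {"台北市"}
L4_NAMES = {"四維路"}
UNIT_MARKERS = {"號": "house number", "樓": "floor number"}

def table_tokenize(tokens):
    """Return (token, label) pairs; label is None when the token stays unknown."""
    out = []
    for tok in tokens:
        if tok in L1_NAMES:
            out.append((tok, "L1"))
        elif tok in L4_NAMES:
            out.append((tok, "L4"))
        elif len(tok) > 1 and tok[0] in SURNAMES:
            # Split the surname from the rest of the name.
            out.append((tok[0], "surname"))
            out.append((tok[1:], "rest of name"))
        else:
            # Digits followed by exactly one non-digit unit character, e.g. 2號.
            m = re.fullmatch(r"(\d+)(\D)", tok)
            if m and m.group(2) in UNIT_MARKERS:
                out.append((m.group(1), UNIT_MARKERS[m.group(2)]))
                out.append((m.group(2), "unit marker"))
            else:
                # Left for the mask-based step.
                out.append((tok, None))
    return out

step1_tokens = ["鄭淑珍", "台北市", "四維路", "2號", "3樓", "106"]
result = table_tokenize(step1_tokens)
for token, label in result:
    print(token, label)
```

Run on the six tokens from Step 1, the sketch produces nine tokens, with 106 left unlabeled for the next step.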

Step 3: Mask-Based Data Identification

Any token that is still unidentified after the table lookup is then compared against a set of predefined masks (data shapes) in the Parser Definition table.

Example

The token 1100000 is identified as a postcode based on the mask entry pr_postcode: 1100000.

1100000    pr_postcode: 1100000    Recognized as postcode based on mask lookup.
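A mask lookup of this kind can be sketched in Python as follows. The shape notation (each digit reduced to `9`, each ASCII letter to `A`) and the contents of the `MASKS` table are assumptions made for illustration and do not reflect the actual mask syntax of the Parser Definition table; only the attribute name `pr_postcode` comes from the example above. The helper names `shape_of` and `identify_by_mask` are hypothetical.

```python
def shape_of(token: str) -> str:
    """Reduce a token to a data shape: digits become '9', ASCII letters 'A'."""
    out = []
    for ch in token:
        if ch.isdigit():
            out.append("9")
        elif ch.isascii() and ch.isalpha():
            out.append("A")
        else:
            # Other characters (for example CJK) are kept unchanged.
            out.append(ch)
    return "".join(out)

# Hypothetical mask table: data shape -> attribute name.
MASKS = {
    "999": "pr_postcode",       # assumed three-digit postcode shape
    "9999999": "pr_postcode",   # assumed seven-digit postcode shape
}

def identify_by_mask(token: str):
    """Return the attribute for a token whose shape matches a mask, else None."""
    return MASKS.get(shape_of(token))

for token in ["1100000", "106", "淑珍"]:
    print(token, shape_of(token), identify_by_mask(token))
```

A token whose shape matches no mask, such as 淑珍 here, simply stays unidentified in this sketch.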

Step 4: Output to PREPOS

The Postal Matcher passes the results of the data identification process to the PREPOS program. See Analyzing the Postal Matcher Results for Asian Countries.