JP Postal Matching Process (Trillium Quality and Discovery 17.2)

Trillium Control Center

The Standard JP Postal Matcher uses a four-step process to identify and match postal information.

Step 1: Initial Parsing

The first step in the Postal Matching process isolates words and phrases by breaking the input attribute(s) into tokens. During this initial scan, the Postal Matcher treats commas and space characters in the input attribute as token boundaries: each one marks where one token ends and the next begins.


Input record: 東京都 台東区 浅草 4丁目 10番 1110032

Initial token results: (six tokens)

Token 1 | Token 2 | Token 3 | Token 4 | Token 5 | Token 6
東京都 | 台東区 | 浅草 | 4丁目 | 10番 | 1110032
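The initial scan can be sketched in Python as a split on commas and whitespace. This is a minimal illustration, not the Postal Matcher's implementation: the helper name `initial_tokens` is hypothetical, consecutive delimiters are assumed to produce no empty tokens, and treating the full-width ideographic space (U+3000) as a space is an assumption that follows from Python's Unicode-aware `\s`.

```python
import re

# Delimiters for the initial scan: commas and any whitespace character.
# Assumption: Python's \s also matches the ideographic space U+3000.
_DELIMITERS = re.compile(r"[,\s]+")


def initial_tokens(attribute: str) -> list[str]:
    """Split an input attribute into tokens on commas and spaces.

    Hypothetical helper illustrating Step 1; empty strings produced by
    leading or trailing delimiters are discarded.
    """
    return [tok for tok in _DELIMITERS.split(attribute) if tok]


if __name__ == "__main__":
    record = "東京都 台東区 浅草 4丁目 10番 1110032"
    tokens = initial_tokens(record)
    print(len(tokens), tokens)
```

Applied to the six-token example, this yields 東京都, 台東区, 浅草, 4丁目, 10番 and 1110032. Note that at this stage a token such as 4丁目 is still a single unit; splitting it is left to the table lookups of Step 2.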

Step 2: Table-Based Tokenizing

After the initial tokens are created, the Postal Matcher scans each token against the Parser Definition tables to identify it further. During this secondary identification, any element that a Parser Definition entry can identify is split off into its own token.


Token results of previous step: (six tokens)

東京都 | 台東区 | 浅草 | 4丁目 | 10番 | 1110032

Token results of this step: (eight tokens)

Previous result → New result(s) (reasoning)
東京都 → 東京都 (based on L1 lookup)
台東区 → 台東区 (based on L2 lookup)
浅草 → 浅草 (based on L3 lookup)
4丁目 → 4 | 丁目 (based on Chome lookup)
10番 → 10 | 番 (based on Ban lookup)
1110032 → 1110032 (not identified by table lookup; see Step 3)
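The table-based split can be approximated in Python as follows. This is a sketch under stated assumptions: the dictionaries `LEVEL_TABLES` and `SUFFIX_TABLES`, the tag names such as `CHOME_NUMBER`, and the function `table_tokenize` are hypothetical stand-ins for the Parser Definition tables, whose real format is not described here. They contain only the entries needed for the example record.

```python
# Hypothetical stand-ins for Parser Definition table entries (L1, L2, L3
# lookups). Only the values needed for the example record are included.
LEVEL_TABLES = {
    "L1": {"東京都"},
    "L2": {"台東区"},
    "L3": {"浅草"},
}

# Hypothetical suffix entries that cause a token to be split
# (standing in for the Chome and Ban lookups).
SUFFIX_TABLES = {
    "CHOME": "丁目",
    "BAN": "番",
}


def table_tokenize(tokens: list[str]) -> list[tuple[str, str]]:
    """Return (token, tag) pairs after the Step 2 table lookups.

    Tokens not found in any table are tagged "UNKNOWN" and left for the
    mask-based identification of Step 3.
    """
    result: list[tuple[str, str]] = []
    for tok in tokens:
        # Whole-token lookup against the level tables.
        level = next((name for name, values in LEVEL_TABLES.items() if tok in values), None)
        if level is not None:
            result.append((tok, level))
            continue
        for tag, suffix in SUFFIX_TABLES.items():
            number = tok[: -len(suffix)]
            # Split e.g. "4丁目" into "4" + "丁目" when the prefix is numeric.
            if tok.endswith(suffix) and number.isdigit():
                result.append((number, tag + "_NUMBER"))
                result.append((suffix, tag))
                break
        else:
            # No suffix entry matched: keep the token as-is for Step 3.
            result.append((tok, "UNKNOWN"))
    return result


if __name__ == "__main__":
    step1 = ["東京都", "台東区", "浅草", "4丁目", "10番", "1110032"]
    for token, tag in table_tokenize(step1):
        print(token, tag)
```

Running it on the six Step 1 tokens produces eight tokens, matching the count above: 4丁目 and 10番 each contribute two, and 1110032 remains unknown.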

Step 3: Mask-Based Data Identification

Any token that remains unknown after the table lookup is then compared against a set of predefined masks (data shapes) in the Parser Definition table.


For example, the token 1100000 is identified as a postcode based on the entry pr_postcode: 1100000.

Token → Parser Definition entry (reasoning)
1100000 → pr_postcode: 1100000 (recognized as a postcode based on mask lookup)
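Mask matching can be illustrated by comparing data shapes rather than literal values. The sketch below assumes that every ASCII digit in a mask matches any ASCII digit in a token and that all other characters must match exactly; the actual Trillium mask syntax may differ. The names `MASKS`, `shape` and `identify_by_mask` are hypothetical.

```python
from typing import Optional

# Hypothetical mask table: maps a tag to example values whose shape
# defines the mask. Assumption: each ASCII digit matches any ASCII digit;
# every other character must match exactly.
MASKS = {
    "pr_postcode": ["1100000"],
}


def shape(value: str) -> str:
    """Reduce a value to its data shape by replacing each ASCII digit with '9'."""
    return "".join("9" if ch in "0123456789" else ch for ch in value)


def identify_by_mask(token: str) -> Optional[str]:
    """Return the first mask tag whose shape matches the token, else None."""
    token_shape = shape(token)
    for tag, examples in MASKS.items():
        if any(shape(example) == token_shape for example in examples):
            return tag
    return None


if __name__ == "__main__":
    for token in ["1100000", "1110032", "111-0032"]:
        print(token, identify_by_mask(token))
```

Under this shape-based assumption, both 1100000 and the example token 1110032 match pr_postcode, while a hyphenated value such as 111-0032 would need its own mask entry.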

Step 4: Output to PREPOS

The Postal Matcher passes the results of the data identification process to the PREPOS program. See Analyzing the Postal Matcher Results for Asian Countries.