Token Masks for China, Japan, Korea, and Taiwan  - trillium_discovery - trillium_quality - 17.1

Trillium Parser Tuner

First publish date: 2008

Token masks detect the data patterns that allow the Customer Data Parser and Postal Matcher to identify data elements. The following table lists the valid token masks. Each entry applies to all four countries (China, Japan, Korea, and Taiwan) unless its description names specific countries.

| Token Mask | Description |
| --- | --- |
| 1 | Level 1 Geography (State/Prefecture). |
| 2 | Level 2 Geography (City). |
| 3 | Level 3 Geography (Town). |
| 4 | Level 4 Geography (Street). |
| A | Ambiguous address token. |
| B | Business branch indicator. |
| C | Business clue word. |
| D | Business department. |
| E | Special business type token. It is preceded by an ‘N’ token and followed by a ‘B’ token. |
| F | First name. |
| G | Number, no hyphen. Unknown with one number (Japan). |
| H | Honorific. |
| I | String to be ignored. |
| J | One alpha character. If this token is not changed by a mask, it is converted to an unknown token and saved in the pr_unknown_token_n field. |
| L | Last name, surname. |
| M | Merge token with the previous token. |
| N | Business name. |
| O | Other tokens, such as house number, building number, floor number, or entrance number. Two numbers with hyphen (Korea). |
| P | Job position. |
| Q | Postal code. |
| R | Place or region (used for business clue). |
| S | Space character. Not a token itself, but part of a mask. |
| T | Business type. |
| U | Unknown token. |
| V | More than one alpha character. If this token is not changed by a mask, it is converted to an unknown token and saved in the pr_unknown_token_n field. |
| W | Postal mask, but invalid value. |
| X | Unknown personal name token. |
| Y | Chinese direction character (north, south, east, west). If this token is not changed by a mask, it is converted to an unknown token and saved in the pr_unknown_token_n field. |
| Z | Merge with lowest confidence before or after. |
| a | Second string that matches a Level 1 geography. (See token ‘1’.) |
| b | Second string that matches a Level 2 geography. (See token ‘2’.) |
| c | Second string that matches a Level 3 geography. (See token ‘3’.) |
| d | Second string that matches a Level 4 geography. (See token ‘4’.) |
| e | Area name (China, Taiwan). Apartment number (Korea). Post office name (Japan). |
| f | Floor number. |
| g | Building name. |
| h | House number (China, Taiwan). Chome (Japan). Apartment-House indicator with two numbers (Korea). |
| i | Building number (China, Taiwan). Ban (Japan). House indicator with one number (Korea). |
| j | Block number with one number (Taiwan lane). Go (Japan). |
| k | Block number with two numbers (Taiwan alley). |
| l | Business names that are followed by PO Boxes. They are put in Level 4 because they act like addresses. Examples: post offices and schools. |
| m | Level 4 with ‘RI’ at end (Taiwan sub-house). Sub-house number (China). |
| n | Sub-block (Korea). Three numbers with hyphen such as "1-2-3" (China, Taiwan). Building number (China). Department number (Taiwan). Unknown with three numbers (Japan). |
| o | Two numbers with hyphen. Unknown with two numbers (Japan). |
| p | PO Box. |
| r | Room / Apt / Suite number. |
| s | Two numbers separated by slash. Building number indicator (Japan). |
| t | Entrance number (China, Taiwan). Two numbers separated by slash with tong (Korea). |
| u | One number with tong (Korea). Section number (China, Taiwan). |
| v | One number with ban (Japan). Sub-block number (Korea). Room type (China, Taiwan). |
| # | Comment token (for example, enclosed in parentheses). |
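
Because several tokens change meaning by country, it can help to look a mask up one character at a time. The following Python sketch is illustrative only and is not part of Trillium or its API. It assumes a mask can be written as a plain string with one token character per position, which this topic does not confirm. The names `TOKEN_DESCRIPTIONS`, `COUNTRY_SPECIFIC`, `SECOND_MATCH`, `describe_token`, and `describe_mask` are hypothetical, and only a subset of the table above is included.

```python
# Illustrative lookup only; not a Trillium API. Descriptions are copied
# from a subset of the token mask table.
TOKEN_DESCRIPTIONS = {
    "1": "Level 1 Geography (State/Prefecture)",
    "2": "Level 2 Geography (City)",
    "3": "Level 3 Geography (Town)",
    "4": "Level 4 Geography (Street)",
    "Q": "Postal code",
    "S": "Space character",
    "U": "Unknown token",
    "f": "Floor number",
    "g": "Building name",
    "r": "Room / Apt / Suite number",
}

# Tokens whose meaning depends on the country, as listed in the table.
COUNTRY_SPECIFIC = {
    "h": {
        "China": "House number",
        "Taiwan": "House number",
        "Japan": "Chome",
        "Korea": "Apartment-House indicator with two numbers",
    },
    "i": {
        "China": "Building number",
        "Taiwan": "Building number",
        "Japan": "Ban",
        "Korea": "House indicator with one number",
    },
    "j": {
        "Taiwan": "Block number with one number (lane)",
        "Japan": "Go",
    },
}

# Tokens a-d mark a second string matching geography levels 1-4.
SECOND_MATCH = {"a": "1", "b": "2", "c": "3", "d": "4"}


def describe_token(token, country):
    """Return the table description of one token for the given country."""
    if token in SECOND_MATCH:
        base = TOKEN_DESCRIPTIONS[SECOND_MATCH[token]]
        return "Second string matching: " + base
    by_country = COUNTRY_SPECIFIC.get(token)
    if by_country is not None:
        return by_country.get(country, "not listed for this country")
    return TOKEN_DESCRIPTIONS.get(token, "not in this sketch")


def describe_mask(mask, country):
    """Describe each character of a mask string (assumed one token each)."""
    return [(token, describe_token(token, country)) for token in mask]


if __name__ == "__main__":
    for token, meaning in describe_mask("12S4h", "Japan"):
        print(token, "-", meaning)
```

Keeping the country-specific entries in a separate dictionary mirrors how the table qualifies descriptions with country names, so a token with no entry for a country is reported explicitly rather than silently falling back to another country's meaning.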