The TOKENIZE routine allows fields to be matched on a token-by-token basis. The default score is based on the percentage of matched tokens.
Score | Description |
---|---|
Base score | (Number of matched tokens) divided by the (maximum number of matched tokens). The maximum number of matched tokens is the number of tokens in the shorter field. |
100 | All tokens match. |
88 | Blank field versus blank field. |
80 | Blank field versus non-blank field. |
Deduct from the base score | |
-3 | For each extra token in the longer field. |
Example
Field 1: ABC Company Trillium Software
Field 2: ABC Company TrilliumSoftware
Field 1 | Field 2 |
---|---|
Token 1: ABC | Token 1: ABC |
Token 2: Company | Token 2: Company |
Token 3: Trillium | Token 3: TrilliumSoftware |
Token 4: Software | |
Attempted matches would be:
Field 1 | | Field 2 | Result |
---|---|---|---|
ABC | vs | ABC | Match |
Company | vs | Company | Match |
Trillium | vs | TrilliumSoftware | |
Software | vs | TrilliumSoftware | |
The matched percentage is 2 (the number of matched tokens) divided by 3 (the number of tokens in the shorter field, Field 2), which is about 66.7 percent and yields a base score of 67. A deduction of 3 is applied for the one extra token in the longer field, Field 1. The final score is 67 - 3 = 64.
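The arithmetic in this example can be reproduced with a short Python sketch. This is not the TOKENIZE routine itself: the function name `tokenize_score`, the whitespace tokenization, the exact (case-sensitive) token comparison, and the half-up rounding of the base score are all assumptions made for illustration. The actual routine may tokenize and compare tokens differently.

```python
from collections import Counter

# Score values taken from the table above.
BLANK_VS_BLANK = 88
BLANK_VS_NONBLANK = 80
EXTRA_TOKEN_PENALTY = 3


def tokenize_score(field1: str, field2: str) -> int:
    """Hypothetical sketch of the default TOKENIZE scoring rules."""
    # Assumption: tokens are separated by whitespace.
    tokens1 = field1.split()
    tokens2 = field2.split()

    # Blank-field cases use the fixed scores from the table.
    if not tokens1 and not tokens2:
        return BLANK_VS_BLANK
    if not tokens1 or not tokens2:
        return BLANK_VS_NONBLANK

    # Assumption: two tokens match only if they are exactly equal,
    # and each token can be used in at most one match.
    matched = sum((Counter(tokens1) & Counter(tokens2)).values())

    shorter = min(len(tokens1), len(tokens2))    # maximum possible matches
    extra = abs(len(tokens1) - len(tokens2))     # extra tokens in the longer field

    # Assumption: the percentage is rounded half-up to a whole number.
    base = int(100 * matched / shorter + 0.5)
    return base - EXTRA_TOKEN_PENALTY * extra


print(tokenize_score("ABC Company Trillium Software",
                     "ABC Company TrilliumSoftware"))  # 64
```

With the two example fields, the sketch finds 2 matched tokens out of a possible 3, giving a base score of 67, then subtracts 3 for the one extra token in Field 1.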