The PHRASE2 modifier attempts to improve the comparison by concatenating tokens into phrases before matching. When this modifier is specified, the Relationship Linker performs the following steps:
- Performs the regular TOKENIZE routine and obtains the base score.
- If the base score is less than 98, and there are multiple tokens to concatenate in Field 1, and there are unmatched tokens in Field 2, it attempts the PHRASE2 matching and obtains the score.
- If the resulting score is less than 98, and there are multiple tokens to concatenate in Field 2, and there are unmatched tokens in Field 1, it attempts the reverse PHRASE2 matching and obtains the final score.
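The steps above can be sketched as a small control flow. This is a minimal Python illustration, not the Relationship Linker's actual implementation: `run_phrase2`, `phrase_pass`, and `demo_pass` are hypothetical names, and the demo pass simply joins all unmatched tokens and adds a fixed bonus so that the gating conditions can be exercised.

```python
THRESHOLD = 98  # from the steps above: PHRASE2 is tried only below this score


def run_phrase2(base_score, unmatched1, unmatched2, phrase_pass):
    """Apply the forward and reverse PHRASE2 gates in order.

    phrase_pass(score, concat_side, other_side) is a hypothetical callable
    returning (new_score, remaining_concat_side, remaining_other_side).
    """
    score, attempted = base_score, []
    # Forward: concatenate tokens in Field 1, compare with Field 2.
    if score < THRESHOLD and len(unmatched1) > 1 and unmatched2:
        score, unmatched1, unmatched2 = phrase_pass(score, unmatched1, unmatched2)
        attempted.append("forward")
    # Reverse: concatenate tokens in Field 2, compare with Field 1.
    if score < THRESHOLD and len(unmatched2) > 1 and unmatched1:
        score, unmatched2, unmatched1 = phrase_pass(score, unmatched2, unmatched1)
        attempted.append("reverse")
    return score, attempted


def demo_pass(score, concat_side, other_side):
    """Illustrative stand-in: join every token and add a fixed 15 on a match."""
    phrase = "".join(concat_side)
    if phrase in other_side:
        rest = [t for t in other_side if t != phrase]
        return score + 15, [], rest
    return score, concat_side, other_side


print(run_phrase2(64, ["Trillium", "Software"], ["TrilliumSoftware"], demo_pass))
# (79, ['forward'])
print(run_phrase2(64, ["TrilliumSoftware"], ["Trillium", "Software"], demo_pass))
# (79, ['reverse'])
```

Passing the phrase-matching step in as a callable keeps the sketch focused on the three conditions that gate each pass.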
Each matched phrase increases the final score by a weight, which is calculated as described below.
Score | Description |
---|---|
Base score + Weight | Sum of the base score and weight. Weight is calculated by: (Number of matches) * (Weight factor) * 3 |
Weight Factor
Difference in Number of Tokens in Fields | Weight Factor |
---|---|
0 | 5 |
1 | 4 |
2 | 3 |
3 | 2 |
4+ | 1 |
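The weight formula and the weight factor table can be expressed as a short lookup. This is a minimal Python sketch; the names `weight_factor` and `phrase_weight` are hypothetical, and treating the token difference as an absolute value is an assumption.

```python
# Weight factor by difference in token counts (from the table above).
WEIGHT_FACTORS = {0: 5, 1: 4, 2: 3, 3: 2}


def weight_factor(token_diff):
    """Look up the weight factor; a difference of 4 or more maps to 1."""
    # Assumption: the difference is taken as an absolute value.
    return WEIGHT_FACTORS.get(abs(token_diff), 1)


def phrase_weight(matches, token_diff):
    """Weight = (number of matches) * (weight factor) * 3."""
    return matches * weight_factor(token_diff) * 3


print(phrase_weight(1, 0))  # 15
print(phrase_weight(2, 3))  # 12
```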
Example 1 - no reverse matching
Field 1: ABC Company Trillium Software
Field 2: ABC Company TrilliumSoftware
In this case, Field 1 has 4 tokens and Field 2 has 3 tokens. The Relationship Linker first performs the regular TOKENIZE routine.
Field 1 | Field 2 |
---|---|
Token 1: ABC | Token 1: ABC |
Token 2: Company | Token 2: Company |
Token 3: Trillium | Token 3: TrilliumSoftware |
Token 4: Software | |
Attempted matches would be:
Field 1 | | Field 2 | Result |
---|---|---|---|
ABC | vs | ABC | Match |
Company | vs | Company | Match |
Trillium | vs | TrilliumSoftware | |
Software | vs | TrilliumSoftware | |
The matched percentage is calculated as 2 (number of matched tokens) divided by 3 (number of tokens in the shorter field, Field 2), yielding 67. A deduction of 3 is applied for the 1 extra token in the longer field, Field 1, so the base score is 67 - 3 = 64.
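The arithmetic above can be reproduced with a short Python sketch. It is not the actual TOKENIZE routine: the function name `base_score` is hypothetical, and the matching rule (identical tokens only), the rounding to the nearest integer, and the 3-point deduction per extra token are assumptions inferred from this single example.

```python
def base_score(tokens1, tokens2, penalty_per_extra=3):
    """Approximate the base score used in this example.

    Assumptions (not confirmed by the source): tokens match only when they
    are identical, the percentage is rounded to the nearest integer, and each
    extra token in the longer field costs `penalty_per_extra` points.
    """
    shorter = min(len(tokens1), len(tokens2))
    if shorter == 0:
        return 0
    remaining = list(tokens2)
    matched = 0
    for token in tokens1:
        if token in remaining:
            remaining.remove(token)  # each Field 2 token can match only once
            matched += 1
    extra = abs(len(tokens1) - len(tokens2))
    return round(100 * matched / shorter) - penalty_per_extra * extra


field1 = "ABC Company Trillium Software".split()
field2 = "ABC Company TrilliumSoftware".split()
print(base_score(field1, field2))  # 64
```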
Since the base score is less than 98, there are two tokens left to concatenate in Field 1 ("Trillium" and "Software"), and there is one unmatched token in Field 2 ("TrilliumSoftware"), the Relationship Linker attempts the PHRASE2 matching.
Field 1 | | Field 2 | Result |
---|---|---|---|
Trillium+Software | vs | TrilliumSoftware | Match |
After the tokens in Field 1 are concatenated, the number of tokens in Field 1 is 1 (TrilliumSoftware) and the number of tokens in Field 2 is 1 (TrilliumSoftware). The number of matches is 1 (TrilliumSoftware vs TrilliumSoftware) and the difference in tokens is 1 - 1 = 0, which yields a weight factor of 5. The total weight is 1 * 5 * 3 = 15. Adding this weight to the base score of 64 yields a final score of 64 + 15 = 79. In this example, there are no unmatched tokens left at this point, so the reverse matching is not attempted.
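The concatenation step and the resulting weight can be sketched in Python as follows. This is an illustration under stated assumptions rather than the product's algorithm: `phrase_pass` and `phrase2_bonus` are hypothetical names, only pairs of adjacent unmatched tokens are joined, and the token difference is taken between the unmatched tokens that remain on each side after joining, as in the walkthrough above.

```python
WEIGHT_FACTORS = {0: 5, 1: 4, 2: 3, 3: 2}  # a difference of 4+ maps to 1


def phrase_pass(concat_side, other_side):
    """Join adjacent unmatched tokens on one side and match them to the other.

    Returns (matches, tokens on the concatenated side after joining).
    Assumption: only pairs of adjacent tokens are joined.
    """
    merged, matches, i = [], 0, 0
    targets = list(other_side)
    while i < len(concat_side):
        pair = "".join(concat_side[i:i + 2])
        if i + 1 < len(concat_side) and pair in targets:
            targets.remove(pair)  # each target token can match only once
            merged.append(pair)
            matches += 1
            i += 2
        else:
            merged.append(concat_side[i])
            i += 1
    return matches, merged


def phrase2_bonus(concat_side, other_side):
    """Weight = (number of matches) * (weight factor) * 3."""
    matches, merged = phrase_pass(concat_side, other_side)
    diff = abs(len(merged) - len(other_side))
    return matches * WEIGHT_FACTORS.get(diff, 1) * 3


base = 64  # base score from the TOKENIZE step in this example
bonus = phrase2_bonus(["Trillium", "Software"], ["TrilliumSoftware"])
print(base + bonus)  # 79
```

Because the function takes the side to concatenate as its first argument, the same sketch applies in either direction by swapping the two arguments.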
Example 2 - reverse matching
Field 1: ABC Company TrilliumSoftware
Field 2: ABC Company Trillium Software
In this case, Field 1 has 3 tokens and Field 2 has 4 tokens. The Relationship Linker first performs the regular TOKENIZE routine.
Field 1 | Field 2 |
---|---|
Token 1: ABC | Token 1: ABC |
Token 2: Company | Token 2: Company |
Token 3: TrilliumSoftware | Token 3: Trillium |
| | Token 4: Software |
Attempted matches would be:
Field 1 | | Field 2 | Result |
---|---|---|---|
ABC | vs | ABC | Match |
Company | vs | Company | Match |
TrilliumSoftware | vs | Trillium | |
TrilliumSoftware | vs | Software | |
The matched percentage is calculated as 2 (number of matched tokens) divided by 3 (number of tokens in the shorter field, Field 1), yielding 67. A deduction of 3 is applied for the 1 extra token in the longer field, Field 2, so the base score is 67 - 3 = 64. Since Field 1 has only one unmatched token, there are not multiple tokens to concatenate in Field 1, so the Relationship Linker attempts the reverse matching.
In this case, there are two tokens to concatenate in Field 2 ("Trillium" and "Software"), and there is one unmatched token in Field 1 ("TrilliumSoftware").
Field 2 | | Field 1 | Result |
---|---|---|---|
Trillium+Software | vs | TrilliumSoftware | Match |
After the tokens in Field 2 are concatenated, the number of tokens in Field 2 is 1 (TrilliumSoftware) and the number of tokens in Field 1 is 1 (TrilliumSoftware). The number of matches is 1 (TrilliumSoftware vs TrilliumSoftware) and the difference in tokens is 1 - 1 = 0, which yields a weight factor of 5. The total weight is 1 * 5 * 3 = 15. Adding this weight to the base score of 64 yields a final score of 64 + 15 = 79.