TOKENIZE Routine

Trillium Control Center

Product type: Software
Portfolio: Verify
Product family: Trillium
Product: Trillium > Trillium Discovery
Version: Latest
Language: English
Product name: Trillium Quality and Discovery
Title: Trillium Control Center
Copyright: 2024
First publish date: 2008
Last updated: 2024-10-18
Published on: 2024-10-18T15:02:04.502478

The TOKENIZE routine matches fields token by token. By default, the score is based on the percentage of tokens that match.

Table 1. Scoring for TOKENIZE

Score                          Description
Base score                     (Number of matched tokens) divided by (maximum number of matched tokens), expressed as a percentage. The maximum number of matched tokens is the number of tokens in the shorter field.
100                            All tokens match.
88                             Blank field versus blank field.
80                             Blank field versus non-blank field.
Deduct from the base score:
-3                             For each extra token in the longer field.
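For two non-blank fields, the rules in Table 1 can be summarized as a single formula, where m is the number of matched tokens and n_1 and n_2 are the token counts of the two fields. Rounding to a whole number is an assumption inferred from the worked example (2/3 gives a base score of 67); the table itself does not state a rounding rule. The blank-field cases (88 and 80) are separate fixed scores.

$$
\text{score} = \operatorname{round}\!\left(100 \cdot \frac{m}{\min(n_1, n_2)}\right) - 3\,\lvert n_1 - n_2 \rvert
$$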

Example

Field 1: ABC Company Trillium Software

Field 2: ABC Company TrilliumSoftware

Field 1                 Field 2
Token 1: ABC            Token 1: ABC
Token 2: Company        Token 2: Company
Token 3: Trillium       Token 3: TrilliumSoftware
Token 4: Software

The attempted matches are:

Field 1        Field 2              Result
ABC       vs   ABC                  Match
Company   vs   Company              Match
Trillium  vs   TrilliumSoftware     No match
Software  vs   TrilliumSoftware     No match

The match percentage is 2 (the number of matched tokens) divided by 3 (the number of tokens in the shorter field, Field 2). Because 2/3 is about 0.667, the base score is 67. A deduction of 3 points is applied for the one extra token in the longer field, Field 1. The final score is 67 - 3 = 64.
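The Python sketch below reproduces the scoring in Table 1 and the worked example. It is not Trillium's implementation: the function name `tokenize_score` is hypothetical, and several details the documentation leaves open are assumptions. Tokens are split on whitespace, a token matches only an identical (case-sensitive) token in the other field, each token takes part in at most one match, and the base score is rounded half up to a whole number. The table does not say whether a score can fall below zero, so the sketch does not clamp it.

```python
from collections import Counter


def tokenize_score(field1: str, field2: str) -> int:
    """Hypothetical sketch of the TOKENIZE scoring rules in Table 1."""
    # Assumption: tokens are whitespace-separated words.
    tokens1 = field1.split()
    tokens2 = field2.split()

    # Blank-field scores from Table 1.
    if not tokens1 and not tokens2:
        return 88  # blank field versus blank field
    if not tokens1 or not tokens2:
        return 80  # blank field versus non-blank field

    # Assumption: a token matches only an identical token in the other field,
    # and each token is used in at most one match (multiset intersection).
    matched = sum((Counter(tokens1) & Counter(tokens2)).values())

    # The maximum number of matched tokens is the token count of the shorter field.
    max_matches = min(len(tokens1), len(tokens2))

    # Base score as a whole-number percentage, rounded half up (assumed rule),
    # computed with integer arithmetic to avoid floating-point rounding issues.
    base = (200 * matched + max_matches) // (2 * max_matches)

    # Deduct 3 points for each extra token in the longer field.
    extra_tokens = abs(len(tokens1) - len(tokens2))
    return base - 3 * extra_tokens


print(tokenize_score("ABC Company Trillium Software", "ABC Company TrilliumSoftware"))  # 64
print(tokenize_score("", ""))                        # 88
print(tokenize_score("ABC", "   "))                  # 80
print(tokenize_score("ABC Company", "ABC Company"))  # 100
```

Counting matches with a multiset intersection ignores token order. The documentation does not state exactly how tokens are paired for comparison, so the real routine may behave differently, for example with reordered or partially matching tokens.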