TOKENIZE Routine - Trillium Discovery and Trillium Quality 17.1

Trillium Control Center

First publish date: 2008

The TOKENIZE routine matches fields on a token-by-token basis. The default score is calculated from the percentage of matched tokens.

Table 1. Scoring for TOKENIZE

Score                       Description
--------------------------  -----------------------------------------------
Base score                  (Number of matched tokens) divided by the
                            (maximum number of matched tokens), expressed
                            as a percentage. The maximum number of matched
                            tokens is the number of tokens in the shorter
                            field.
100                         All tokens match.
88                          Blank field versus blank field.
80                          Blank field versus non-blank field.
Deduct from the base score
-3                          For each extra token in the longer field.
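The scoring rules in Table 1 can be sketched in a few lines of Python. This is an illustration only, not the product's implementation: the function name `tokenize_score` and its parameters are hypothetical, rounding the base score to the nearest whole number is an assumption drawn from the example on this page (2 of 3 yields 67), and the sketch does not clamp the result to a minimum value.

```python
def tokenize_score(matched: int, tokens_a: int, tokens_b: int) -> int:
    """Hypothetical helper: compute a score from token counts per Table 1."""
    # Blank-field cases use the fixed scores listed in the table.
    if tokens_a == 0 and tokens_b == 0:
        return 88
    if tokens_a == 0 or tokens_b == 0:
        return 80

    shorter = min(tokens_a, tokens_b)
    longer = max(tokens_a, tokens_b)

    # Base score: matched tokens as a percentage of the shorter field's tokens.
    # Rounding behavior is an assumption, not documented in the table.
    base = round(100 * matched / shorter)

    # Deduct 3 for each extra token in the longer field.
    return base - 3 * (longer - shorter)
```

With this sketch, two fields of three tokens each that match completely score 100, and a blank field compared with a non-blank field scores 80.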

Example

Field 1: ABC Company Trillium Software

Field 2: ABC Company TrilliumSoftware

Field 1                Field 2
---------------------  --------------------------
Token 1: ABC           Token 1: ABC
Token 2: Company       Token 2: Company
Token 3: Trillium      Token 3: TrilliumSoftware
Token 4: Software

Attempted matches would be:

Field 1         Field 2           Result
--------  --  ----------------  --------
ABC       vs  ABC               Match
Company   vs  Company           Match
Trillium  vs  TrilliumSoftware  No match
Software  vs  TrilliumSoftware  No match

The matched percentage is calculated as 2 (the number of matched tokens) divided by 3 (the number of tokens in the shorter field, Field 2), which yields a base score of 67. A deduction of 3 is applied for the one extra token in the longer field, Field 1. The final score is 67 - 3 = 64.
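The worked example can be reproduced with a short Python sketch. It rests on several assumptions that this page does not confirm: tokens are split on whitespace, two tokens match only when they are identical, each token pairs with at most one token in the other field, and the base score is rounded to the nearest whole number. The name `count_matches` is hypothetical.

```python
from collections import Counter


def count_matches(field_1: str, field_2: str) -> tuple:
    """Hypothetical helper: return (matched, token count 1, token count 2)."""
    # Assumption: tokens are separated by whitespace.
    tokens_1 = field_1.split()
    tokens_2 = field_2.split()
    # Multiset intersection: each token pairs with at most one identical token.
    common = Counter(tokens_1) & Counter(tokens_2)
    return sum(common.values()), len(tokens_1), len(tokens_2)


matched, n1, n2 = count_matches(
    "ABC Company Trillium Software",
    "ABC Company TrilliumSoftware",
)
shorter, longer = min(n1, n2), max(n1, n2)

base = round(100 * matched / shorter)   # 2 of 3 tokens, rounded to 67
final = base - 3 * (longer - shorter)   # one extra token in Field 1

print(matched, n1, n2, base, final)     # prints: 2 4 3 67 64
```

Under exact comparison, "Trillium" and "Software" do not match "TrilliumSoftware", so only "ABC" and "Company" count toward the base score.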