BUSNAME Routine - trillium_discovery - trillium_quality - 17.1

Inline Quality and Discovery

Version: 17.1
Language: English
Product name: Trillium Quality and Discovery
Title: Inline Quality and Discovery

The BUSNAME routine compares two business names, assuming the fields are left-justified and blank-filled. The routine also assumes that a word is any group of characters followed by a blank. This routine can successfully match:

By default, the BUSNAME routine scores a pair of fields in the following order:

  1. Test for blank fields. If either field is blank and the routine modifier is not COMPACT, return a score of 50.
  2. Test for an exact match of the unmodified input fields. If they match exactly, return a score of 100.
  3. Apply any of the following routine modifiers in the listed order:
    • ALPHANUM
    • NOCASE
    • DECOMP
    • DI
    • SORT - at the end of SORT processing, returns a score of 0 or 95.
    • COMPACT - at the end of COMPACT processing, returns a score of 0, 75, 85, or 90.
    • ACRONYM - sets the maximum return score to be 99 before applying the business name match algorithm.
  4. Apply the BUSNAME routine logic below.
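Steps 1 to 3 can be sketched in Python as follows. This is a minimal illustration, not the product's implementation: the names `order_modifiers` and `precheck` are hypothetical, modifiers are passed as a set of the names listed above, and the modifiers themselves are not implemented because this section does not define their transformations.

```python
# Hypothetical sketch of BUSNAME steps 1-3; not Trillium's implementation.

# Modifier order as listed in step 3.
MODIFIER_ORDER = ["ALPHANUM", "NOCASE", "DECOMP", "DI", "SORT", "COMPACT", "ACRONYM"]


def order_modifiers(requested):
    """Return the requested modifier names in the order step 3 applies them."""
    unknown = set(requested) - set(MODIFIER_ORDER)
    if unknown:
        raise ValueError(f"unknown modifiers: {sorted(unknown)}")
    return [name for name in MODIFIER_ORDER if name in requested]


def precheck(a, b, modifiers):
    """Steps 1 and 2: return 50 or 100 early, or None to continue scoring."""
    blank_a = a.strip() == ""
    blank_b = b.strip() == ""
    # Step 1: a blank field scores 50 unless COMPACT is in effect.
    if (blank_a or blank_b) and "COMPACT" not in modifiers:
        return 50
    # Step 2: exact match on the unmodified fields, excluding blank versus blank.
    if a == b and not blank_a:
        return 100
    return None


print(order_modifiers({"NOCASE", "ACRONYM", "ALPHANUM"}))  # ['ALPHANUM', 'NOCASE', 'ACRONYM']
print(precheck("ACME CORP", "ACME CORP", set()))           # 100
print(precheck("", "ACME CORP", set()))                    # 50
```

A `None` result means neither early exit applied, so the modifiers and then the business name logic would run next.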
Table 1. Scoring for BUSNAME

Score   Description
100     Exact match, excluding blank versus blank.
50      One or both fields are blank.

The following logic is then applied to refine the score. Each rule that applies deducts its value from 100.

Deduct from 100   Reason for deduction
-1                For each extra word, if there were no other errors.
-3                For each extra word, if there were any word or letter errors.
-3                For each inserted word (also count one word error).
-4                For each word transposition (also count one word error).
-15               For extra words, if the only matches were single-letter exact substrings and there were fewer than three such matches.

If two words are not equal and none of the preceding word-error rules apply, the words are compared letter by letter and the following deductions apply.

-1                For each doubled letter (also count one letter error).
-2                For each transposed letter (also count one letter error).
-2                For each inserted letter (also count one letter error).
-2                For each mismatch (also count one letter error).
-2                For any extra characters, if one word is longer than the other (also count letter errors).
-10               If the number of word errors is more than one third of the larger word count of the two fields.
-10               If the number of errors is more than 25% of the larger character count of the two fields.
-25               If the number of word errors is more than 50% of the smaller word count of the two fields.
-25               If the number of character errors is more than 50% of the smaller character count of the two fields.
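To make the deduction arithmetic concrete, the Python sketch below turns error counts into a score. It assumes the counts have already been tallied; how the routine detects doubled, transposed, or inserted letters is not described here, so the `Tally` fields and `busname_score` function are hypothetical. The sketch also fills gaps the table leaves open, all of which are assumptions: word errors come only from inserted and transposed words (the rows that say so), the 25% rule counts letter errors, extra characters are charged -2 each like inserted letters, the -15 rule adds to the per-word deductions, every threshold rule applies independently, and the result is clamped between 0 and the maximum score (99 when ACRONYM is in effect).

```python
from dataclasses import dataclass


@dataclass
class Tally:
    """Hypothetical error counts for one pair of business names."""
    extra_words: int = 0
    inserted_words: int = 0
    word_transpositions: int = 0
    only_short_single_letter_matches: bool = False  # condition of the -15 rule
    doubled_letters: int = 0
    transposed_letters: int = 0
    inserted_letters: int = 0
    mismatches: int = 0
    extra_chars: int = 0
    words_a: int = 0
    words_b: int = 0
    chars_a: int = 0
    chars_b: int = 0


def busname_score(t: Tally, max_score: int = 100) -> int:
    """Apply the deduction table to pre-computed counts (sketch only)."""
    # Inserted words and word transpositions each count one word error.
    word_errors = t.inserted_words + t.word_transpositions
    # Each letter-level finding counts one letter error (assumed: one per extra char).
    letter_errors = (t.doubled_letters + t.transposed_letters
                     + t.inserted_letters + t.mismatches + t.extra_chars)

    deduction = 0
    # Extra words cost 1 each on an otherwise clean match, else 3 each.
    per_extra = 3 if (word_errors or letter_errors) else 1
    deduction += per_extra * t.extra_words
    deduction += 3 * t.inserted_words
    deduction += 4 * t.word_transpositions
    if t.extra_words and t.only_short_single_letter_matches:
        deduction += 15

    deduction += 1 * t.doubled_letters
    deduction += 2 * (t.transposed_letters + t.inserted_letters + t.mismatches)
    deduction += 2 * t.extra_chars  # assumption: charged per extra character

    # Threshold rules, each assumed to apply independently.
    if word_errors > max(t.words_a, t.words_b) / 3:
        deduction += 10
    if letter_errors > 0.25 * max(t.chars_a, t.chars_b):
        deduction += 10
    if word_errors > 0.5 * min(t.words_a, t.words_b):
        deduction += 25
    if letter_errors > 0.5 * min(t.chars_a, t.chars_b):
        deduction += 25

    return max(0, min(max_score, 100 - deduction))


# "ACME CORP INC" vs "ACME CORP": one extra word and no other errors.
print(busname_score(Tally(extra_words=1, words_a=3, words_b=2, chars_a=13, chars_b=9)))  # 99
```

Under these assumptions a single word transposition in a two-word name scores 86: -4 for the transposition and -10 because one word error exceeds one third of two words.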

Examples

"IBM" vs "ibm" with NOCASE = 100

"I BM" vs "I-BM" with ALPHANUM = 100
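The two examples can be reproduced with a small Python sketch. This section does not define what NOCASE and ALPHANUM do beyond these results, so the transformations below are assumptions: NOCASE folds both fields to upper case, and ALPHANUM drops every character that is not a letter or digit. The function names are hypothetical; the modifiers are applied in the listed order (ALPHANUM before NOCASE) before the exact-match test.

```python
def nocase(s: str) -> str:
    # Assumed NOCASE behavior: ignore letter case by folding to upper case.
    return s.upper()


def alphanum(s: str) -> str:
    # Assumed ALPHANUM behavior: keep only letters and digits.
    return "".join(ch for ch in s if ch.isalnum())


# Only the two modifiers used in the examples, in BUSNAME's listed order.
MODIFIERS = [("ALPHANUM", alphanum), ("NOCASE", nocase)]


def exact_after_modifiers(a: str, b: str, requested: set) -> bool:
    """Apply the requested modifiers in order, then test for an exact match."""
    for name, transform in MODIFIERS:
        if name in requested:
            a, b = transform(a), transform(b)
    return a == b


print(exact_after_modifiers("IBM", "ibm", {"NOCASE"}))      # True, scores 100
print(exact_after_modifiers("I BM", "I-BM", {"ALPHANUM"}))  # True, scores 100
print(exact_after_modifiers("IBM", "ibm", set()))           # False
```

Replacing punctuation with blanks instead of removing it would give the same result for the second example, so the example alone does not settle ALPHANUM's exact behavior.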