LEVENSHTEIN Routine - trillium_discovery - trillium_quality - 17.1

Trillium Control Center

Product type: Software
Portfolio: Verify
Product family: Trillium
Product: Trillium > Trillium Discovery; Trillium > Trillium Quality
Version: 17.1
Language: English
Product name: Trillium Quality and Discovery
Title: Trillium Control Center
Topic type: Overview, Administration, Configuration, Installation, Reference, How Do I
First publish date: 2008

The LEVENSHTEIN routine compares two strings by measuring how different they are with the Levenshtein distance algorithm. The Levenshtein distance is the minimum number of edits required to transform one string into the other, where each edit is an insertion, deletion, or substitution of a single character.

For example, the Levenshtein distance between "test" and "test" is 0 because the strings are identical and no edit is needed. If the strings are "test" and "tent", the distance is 1 because one substitution (change 's' to 'n') is required. If the strings are "book" and "back", the distance is 2, since the following two edits are required to change one into the other:

1. book → baok (substitution of 'a' for 'o')

2. baok → back (substitution of 'c' for 'o')

A higher distance means the strings are less similar to each other.
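The distance defined above can be computed with the standard dynamic-programming (Wagner-Fischer) method. The following Python sketch is for illustration only: it is not the Trillium implementation, and the function name `levenshtein` is hypothetical. It keeps one row of the edit-distance table at a time and reproduces the three examples above.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b.

    Hypothetical helper for illustration; not the Trillium routine."""
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b (one row of the DP table).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1  # exact (case-sensitive) comparison
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) a character
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    for s1, s2 in [("test", "test"), ("test", "tent"), ("book", "back")]:
        print(s1, s2, levenshtein(s1, s2))
    # test test 0
    # test tent 1
    # book back 2
```

Because characters are compared exactly, "Test" and "test" are also at distance 1 in this sketch.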

Guidelines

  • The LEVENSHTEIN routine is case-sensitive.
  • As of V15.8.2, the LEVENSHTEIN routine is not supported for the MATCH function in the Expression Builder.

Routine Modifiers

The LEVENSHTEIN routine supports the following optional modifiers for fine-tuning the matching scores. You can combine multiple modifiers:

  • ANCHOR. Finds a match of one string (String 1) within another string (String 2) by making a SUBSTRNG-like comparison from the beginning or the end of String 2.
  • DAMERAU. Enables scoring according to the Damerau algorithm. The Damerau algorithm allows adjacent transpositions in addition to insertions, deletions, and substitutions.
  • EXACT. Specifies the number of characters that must match exactly at the beginning and/or the end of the string.
  • RECIPROCAL. Makes an additional comparison by reversing the order of strings, and takes the higher score of the two comparisons.
  • TOLERANCE. Specifies an error tolerance by limiting the edit distance allowed for strings of a given length.
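To illustrate what the DAMERAU modifier adds, the Python sketch below extends the Levenshtein recurrence with an adjacent-transposition rule. The source does not say which Damerau variant Trillium uses; this sketch assumes the restricted variant (optimal string alignment distance), and the function name `edit_distance` and its `damerau` flag are hypothetical. It compares distances only and does not reproduce Trillium scoring.

```python
def edit_distance(a: str, b: str, damerau: bool = False) -> int:
    """Levenshtein distance, optionally with adjacent transpositions.

    Hypothetical helper for illustration. With damerau=True it computes the
    optimal string alignment (restricted Damerau-Levenshtein) distance,
    which is an assumption about what the DAMERAU modifier does."""
    n, m = len(a), len(b)
    # d[i][j] is the distance between the prefixes a[:i] and b[:j].
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all i characters
    for j in range(m + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Damerau rule: swapping two adjacent characters is one edit.
            if (damerau and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[n][m]


if __name__ == "__main__":
    print(edit_distance("tset", "test"))                # 2
    print(edit_distance("tset", "test", damerau=True))  # 1
```

With the transposition rule, the swapped pair "se"/"es" in "tset" costs one edit instead of two substitutions.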

Table 1. Scoring for LEVENSHTEIN

Score             Description
100               Exact match.
99                The base score for strings that are not an exact match.
2                 No match.
1                 Blank field versus non-blank field.
0                 Blank field versus blank field.
Deduct from 99:
-1                For each unit of edit distance.
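Read literally, Table 1 gives 100 for an exact match, 0 or 1 when blank fields are involved, and otherwise 99 minus one point per unit of edit distance. The Python sketch below applies those rows under stated assumptions: a field counts as blank if it is empty or whitespace-only, modifiers are ignored, and because the table does not say when the No match score (2) is assigned, that score is not modeled. The function names `levenshtein` and `levenshtein_score` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def levenshtein_score(s1: str, s2: str) -> int:
    """Score two fields following the rows of Table 1 (hypothetical helper).

    Assumption: a field is blank if it is empty or whitespace-only.
    The "No match" score (2) is not modeled because the table does not
    say when it applies."""
    blank1, blank2 = not s1.strip(), not s2.strip()
    if blank1 and blank2:
        return 0    # blank field versus blank field
    if blank1 or blank2:
        return 1    # blank field versus non-blank field
    if s1 == s2:
        return 100  # exact match
    # Base score 99, minus 1 for each unit of edit distance.
    return 99 - levenshtein(s1, s2)


if __name__ == "__main__":
    for s1, s2 in [("test", "test"), ("test", "tent"),
                   ("book", "back"), ("test", "")]:
        print(repr(s1), repr(s2), levenshtein_score(s1, s2))
```

Under these assumptions, "test" versus "tent" scores 98 and "book" versus "back" scores 97. For very different strings the raw formula can fall to 2 or below, overlapping the special scores; the source does not state how the routine handles that case.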

Routine with Modifier(s):

  • LEVENSHTEIN With Modifier (ANCHOR)
  • LEVENSHTEIN With Modifier (DAMERAU)
  • LEVENSHTEIN With Modifier (EXACT)
  • LEVENSHTEIN With Modifier (RECIPROCAL)
  • LEVENSHTEIN With Modifier (TOLERANCE)