LEVENSHTEIN Routine

Inline Quality and Discovery

Product type: Software
Portfolio: Verify
Product family: Trillium
Product: Trillium > Trillium Quality; Trillium > Trillium Discovery
Version: Latest
Language: English
Product name: Trillium Quality and Discovery
Title: Inline Quality and Discovery
Copyright: 2024
First publish date: 2008
Last updated: 2024-10-18
Published on: 2024-10-18T15:10:12.949492

The LEVENSHTEIN routine compares two strings by measuring the difference between them with the Levenshtein distance algorithm. The Levenshtein distance is the minimum number of single-character edits required to transform one string into the other. The allowed edit operations are insertion, deletion, and substitution.

For example, the Levenshtein distance between "test" and "test" is 0 because the strings are identical and no edit is needed. If the strings are "test" and "tent", the distance is 1 because one substitution (change 's' to 'n') is required. If the strings are "book" and "back", the distance is 2, since the following two edits are required to change one into the other:

1. book → baok (substitution of 'a' for the first 'o')

2. baok → back (substitution of 'c' for the remaining 'o')

A higher distance value means the two strings differ more from each other.
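
The distance calculation described above can be sketched in Python with the standard dynamic-programming approach. This is an illustrative sketch only, not the routine's actual implementation; the function name `levenshtein` is an assumption made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b (case-sensitive)."""
    # prev[j] = distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("test", "test"))  # 0
print(levenshtein("test", "tent"))  # 1
print(levenshtein("book", "back"))  # 2
print(levenshtein("Test", "test"))  # 1, because the comparison is case-sensitive
```

Only two rows of the table are kept at a time, so memory use grows with the length of the second string rather than with the product of both lengths.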

Guidelines

  • The LEVENSHTEIN routine is case-sensitive.
  • As of V15.8.2, the LEVENSHTEIN routine is not supported for the MATCH function in the Expression Builder.

Routine Modifiers

The LEVENSHTEIN routine uses the following optional modifiers to fine-tune the matching scores. You can use multiple modifiers at the same time:

  • ANCHOR. Finds a match of one string (String 1) within another string (String 2) by making a SUBSTRNG-like comparison from the beginning or the end of String 2.
  • DAMERAU. Enables scoring according to the Damerau algorithm. The Damerau algorithm allows adjacent transpositions in addition to insertions, deletions, and substitutions.
  • EXACT. Specifies the number of exact characters to match at the beginning and/or the end of the string.
  • RECIPROCAL. Makes an additional comparison by reversing the order of strings, and takes the higher score of the two comparisons.
  • TOLERANCE. Specifies the error tolerance by limiting the edit distance allowed for strings of a given length.
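
To illustrate what the DAMERAU modifier changes, the following Python sketch counts an adjacent transposition as a single edit. It uses the restricted "optimal string alignment" variant as an assumption; the documentation does not state which Damerau variant the routine uses, and the function name `osa_distance` is hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance that also allows swapping two adjacent characters
    (optimal string alignment variant, assumed for illustration)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # i deletions to reach the empty string
    for j in range(cols):
        d[0][j] = j  # j insertions from the empty string
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Adjacent transposition: "ab" in a lines up with "ba" in b
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("form", "from"))  # 1 (plain Levenshtein distance would be 2)
print(osa_distance("test", "tent"))  # 1
```

Without the transposition step, swapping two neighboring characters costs two substitutions, so a simple typing slip such as "form" for "from" is penalized twice.
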

Table 1. Scoring for LEVENSHTEIN

Score    Description
100      Exact match.
99       The base score for the strings that are not an exact match.
2        No match.
1        Blank field versus non-blank field.
0        Blank field versus blank field.

Deduct from 99
-1       For each edit distance.
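
The scoring in Table 1 can be sketched in Python as follows. This is one literal reading of the table, not the product's implementation, and it rests on several assumptions: a field counts as blank when it is empty or whitespace only, one point is deducted from 99 per unit of edit distance, and the score never drops below the no-match score of 2. The documentation does not state the point at which a comparison becomes a no match. The names `edit_distance` and `levenshtein_score` are hypothetical, and no modifiers are modeled.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def levenshtein_score(s1: str, s2: str) -> int:
    """Score two fields following one reading of Table 1."""
    blank1, blank2 = not s1.strip(), not s2.strip()  # assumption: whitespace is blank
    if blank1 and blank2:
        return 0    # blank field versus blank field
    if blank1 or blank2:
        return 1    # blank field versus non-blank field
    if s1 == s2:
        return 100  # exact match
    # Base score 99, minus 1 per edit; floor at 2 (no match) is an assumption
    return max(99 - edit_distance(s1, s2), 2)


print(levenshtein_score("test", "test"))  # 100
print(levenshtein_score("test", "tent"))  # 98
print(levenshtein_score("book", "back"))  # 97
print(levenshtein_score("", "test"))      # 1
print(levenshtein_score("", ""))          # 0
```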

Routine with Modifier(s): 

LEVENSHTEIN With Modifier (ANCHOR)

LEVENSHTEIN With Modifier (DAMERAU)

LEVENSHTEIN With Modifier (EXACT)

LEVENSHTEIN With Modifier (RECIPROCAL)

LEVENSHTEIN With Modifier (TOLERANCE)