The LEVENSHTEIN routine compares two strings by measuring how much they differ, using the Levenshtein distance algorithm. The Levenshtein distance is the minimum number of edits required to transform one string into the other, where each edit is an insertion, deletion, or substitution of a single character.
For example, the Levenshtein distance between "test" and "test" is 0 because the strings are identical and no edit is needed. If the strings are "test" and "tent", the distance is 1 because one substitution (change 's' to 'n') is required. If the strings are "book" and "back", the distance is 2, since the following two edits are required to change one into the other:
1. book → baok (substitution of 'a' for 'o')
2. baok → back (substitution of 'c' for 'o')
A higher distance value means the two strings differ more from each other.
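The distance calculation described above can be sketched in Python with the standard dynamic-programming approach. This is an illustrative sketch only, not the routine's actual implementation; the function name `levenshtein_distance` is hypothetical.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    # previous[j] = distance between the prefix of a processed so far and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into an empty string takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a matching character
            ))
        previous = current
    return previous[-1]


print(levenshtein_distance("test", "test"))  # 0
print(levenshtein_distance("test", "tent"))  # 1
print(levenshtein_distance("book", "back"))  # 2
```

The three calls reproduce the examples given above. The comparison in this sketch is character-for-character, so strings that differ only in letter case count as different.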
Guidelines
- The LEVENSHTEIN routine is case-sensitive.
- As of V15.8.2, the LEVENSHTEIN routine is not supported for the MATCH function in the Expression Builder.
Routine Modifiers
The LEVENSHTEIN routine uses the following optional modifiers to fine-tune the matching scores. You can use multiple modifiers at the same time:
- ANCHOR. Finds a match of one string (String 1) within another string (String 2) by making a SUBSTRNG-like comparison from the beginning or the end of String 2.
- DAMERAU. Enables scoring according to the Damerau algorithm. The Damerau algorithm allows adjacent transpositions in addition to insertions, deletions, and substitutions.
- EXACT. Specifies the number of exact characters to match at the beginning and/or the end of the string.
- RECIPROCAL. Makes an additional comparison by reversing the order of strings, and takes the higher score of the two comparisons.
- TOLERANCE. Specifies error tolerance by limiting the edit distance allowed for strings of a given length.
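To illustrate what counting adjacent transpositions changes, the following Python sketch implements the optimal string alignment variant of the Damerau-Levenshtein distance. It is an assumption that this variant resembles what the DAMERAU modifier does; the function name `osa_distance` is hypothetical and the code is not taken from the product.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance that also counts a swap of two adjacent characters as one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition: the last two characters are swapped.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("form", "from"))  # 1
print(osa_distance("test", "tent"))  # 1
```

Without transpositions, "form" and "from" are two substitutions apart, so their plain Levenshtein distance is 2. With transpositions counted as a single edit, the distance drops to 1.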
Score | Description |
---|---|
100 | Exact match. |
99 | The base score for strings that are not an exact match. |
2 | No match. |
1 | Blank field versus non-blank field. |
0 | Blank field versus blank field. |
Deduct from 99 | |
-1 | For each unit of edit distance. |
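
The scoring table can be read as a small function from two strings to a score. The Python sketch below is one possible interpretation, not the product's implementation. It makes three assumptions that the table does not state: a "blank" field is an empty string, the score never drops below the "no match" value of 2, and no modifiers are in effect. The names `levenshtein_distance` and `levenshtein_score` are hypothetical.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def levenshtein_score(s1: str, s2: str) -> int:
    """Map two field values to a score following the table above."""
    blank1, blank2 = s1 == "", s2 == ""  # assumption: blank means empty string
    if blank1 and blank2:
        return 0    # blank field versus blank field
    if blank1 or blank2:
        return 1    # blank field versus non-blank field
    if s1 == s2:
        return 100  # exact match
    # Base score of 99, minus 1 per unit of edit distance.
    # Assumption: the result is floored at 2, the "no match" score.
    return max(2, 99 - levenshtein_distance(s1, s2))


print(levenshtein_score("test", "test"))  # 100
print(levenshtein_score("test", "tent"))  # 98
print(levenshtein_score("book", "back"))  # 97
print(levenshtein_score("", "book"))      # 1
print(levenshtein_score("", ""))          # 0
```

Under these assumptions, one edit gives 98 and two edits give 97. The point at which the actual routine reports a no-match score may depend on modifiers such as TOLERANCE, which this sketch does not model.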
Routine with Modifier(s):
LEVENSHTEIN With Modifier (ANCHOR)
LEVENSHTEIN With Modifier (DAMERAU)
LEVENSHTEIN With Modifier (EXACT)
LEVENSHTEIN With Modifier (RECIPROCAL)
LEVENSHTEIN With Modifier (TOLERANCE)