The section outlines the differences in the SOUNDEX1 comparison routines between the Series 7 matcher and winkey modules and the TS Quality (TSQ) versions. Key differences include variations in padding methods, handling of duplicate adjacent character values, and treatment of spaces in character evaluations. Updates for TSQ v17.3.0 aim to provide consistent results and address prior deficiencies.
Change type | Description |
---|---|
Introduced in version 17.3 | A new section that provides a comparative analysis of SOUNDEX1 routines across Trillium Series versions. |
For Series 7, the matcher and winkey modules had different algorithm implementations for the SOUNDEX1 comparison routine, which yielded different results. These differences are explained below, along with examples.
- Padding of Results with Less Than 4 Characters
  - Matcher pads with the zero character.
    - Example: The soundex result for ‘LEE’ is ‘L’; the matcher result is ‘L000’.
  - Winkey pads with the space or blank character.
    - Example: The soundex result for ‘LEE’ is ‘L’; the winkey result is ‘L ’.
- Eliminating Duplicate Adjacent Character Values
  - Matcher eliminates duplicates after character values have been recoded.
    - Example: The soundex result for ‘PFISTER’ is ‘P236’; ‘P’ and ‘F’ are coded to ‘1’, which are identified as duplicates, resulting in ‘F’ being eliminated from the result.
  - Winkey eliminates duplicates before character values have been recoded.
    - Example: The soundex result for ‘PFISTER’ is ‘P123’; ‘P’ and ‘F’ are coded to ‘1’, but the duplicate value is not eliminated since the prior values of ‘P’ and ‘F’ are different.
- Converting Vowels to Spaces and Eliminating the Spaces When Considering Adjacent Character Values
  - Matcher removes spaces before eliminating duplicates, starting after the first character value.
    Example of algorithm steps for ‘JACKSON’:
    - Characters are coded as: J = 2, A = space, C = 2, K = 2, S = 2, O = space, N = 5.
    - Spaces are removed: J = 2, C = 2, K = 2, S = 2, N = 5.
    - Eliminate duplicates: J = 2, C = 2, N = 5.
    - Result: ‘J25’.
    - Result padded with zero: ‘J250’.
  - Winkey does not remove spaces before eliminating duplicates, starting after the first character value.
    Example of algorithm steps for ‘JACKSON’:
    - Characters are coded as: J = 2, A = space, C = 2, K = 2, S = 2, O = space, N = 5.
    - No duplicates: J = 2, C = 2, K = 2, S = 2.
    - Result: ‘J222’.
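The three Series 7 differences above can be illustrated with a short Python sketch. This is not the Series 7 implementation: the function names, the use of the standard Soundex digit groups, and the treatment of H, W, Y and blanks as vowel-like placeholders are assumptions made for illustration. To reproduce both the ‘PFISTER’ and ‘JACKSON’ matcher results with one routine, the matcher-style function keeps the placeholders in place while collapsing equal codes and drops them afterwards; this is one possible reading of the steps listed above, checked only against the three worked examples.

```python
# Standard Soundex digit groups. Any other character (vowels, and by
# assumption H, W, Y and blanks) is coded as a space placeholder.
GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
CODES = {letter: digit for digit, letters in GROUPS.items() for letter in letters}


def code_of(ch):
    return CODES.get(ch, " ")


def collapse(seq):
    """Drop every item that equals the item immediately before it."""
    out = []
    for item in seq:
        if not out or item != out[-1]:
            out.append(item)
    return out


def matcher_style(name):
    """Recode first, then collapse equal adjacent codes; pad with '0'."""
    name = name.upper()  # assumes a non-empty name
    codes = collapse([code_of(ch) for ch in name])
    digits = [c for c in codes[1:] if c != " "]
    return (name[0] + "".join(digits))[:4].ljust(4, "0")


def winkey_style(name):
    """Collapse identical adjacent letters first, then recode; pad with ' '."""
    letters = collapse(name.upper())  # assumes a non-empty name
    digits = [code_of(ch) for ch in letters[1:] if code_of(ch) != " "]
    return (letters[0] + "".join(digits))[:4].ljust(4, " ")


for name in ("LEE", "PFISTER", "JACKSON"):
    print(name, repr(matcher_style(name)), repr(winkey_style(name)))
```

Under these assumptions the loop prints 'L000' and 'L   ' for LEE, 'P236' and 'P123' for PFISTER, and 'J250' and 'J222' for JACKSON, which matches the matcher and winkey examples above.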
When upgrading to TS Quality, the matcher module was replaced with the rellink module. For this upgrade, the differences between the two Series 7 modules were removed so that the TS Quality modules produce consistent results. An additional change was made for TS Quality so that the first character value is also evaluated when eliminating duplicate adjacent character values.
To provide continued support for the Series 7 comparison routines, the Routine Modifier (S7) was introduced. For v17.3.0, updates were made to these routines to fix prior deficiencies where results were found to be inconsistent.
Differences Between Series 7 SOUNDEX1 Comparison Routines Used for Matcher and Winkey Modules and the TSQ v17.x Rellink and Winkey Modules:
- For TSQ versions prior to v17.3.0, an additional difference was introduced when
converting vowels to spaces and eliminating the spaces when considering adjacent
character values. For TS Quality, the first character value was evaluated when
considering adjacent character values.
Example of algorithm steps for coding ‘JACKSON’ (v17.2 and below):
- Characters are coded as: J = 2, A = space, C = 2, K = 2, S = 2, O = space, N = 5.
- Spaces are removed: J = 2, C = 2, K = 2, S = 2, N = 5.
- Eliminate duplicates (C = 2, K = 2, S = 2): J = 2, N = 5.
- Result: ‘J5’.
- Result padded with zero: ‘J500’.
For v17.3.0, the first character value is no longer evaluated when considering adjacent character values. This change was made to align the algorithm with that of the U.S. National Archives.
Example of algorithm steps for coding ‘JACKSON’ (v17.3.0):
- Characters are coded as: J = 2, A = space, C = 2, K = 2, S = 2, O = space, N = 5.
- Spaces are removed: J = 2, C = 2, K = 2, S = 2, N = 5.
- Eliminate duplicates (K = 2, S = 2): J = 2, C = 2, N = 5.
- Result: ‘J25’.
- Result padded with zero: ‘J250’.
- For TSQ versions prior to v17.3.0, spaces between words were eliminated prior to
considering adjacent character values. This was in line with the U.S. National
Archives.
Example of algorithm steps for coding ‘PAM NAILS’ (v17.2 and below):
- Characters are coded as: P = 1, A = space, M = 5, space, N = 5, A = space, I = space, L = 4, S = 2.
- All spaces are removed: P = 1, M = 5, N = 5, L = 4, S = 2.
- Eliminate duplicates (N = 5): P = 1, M = 5, L = 4, S = 2.
- Result: ‘P542’.
For v17.3.0, spaces between words are no longer eliminated when considering adjacent character values. This change restored the significance placed on spaces between words due to their use in multi-word business names.
Example of algorithm steps for coding ‘PAM NAILS’ (v17.3.0):
- Characters are coded as: P = 1, A = space, M = 5, space, N = 5, A = space, I = space, L = 4, S = 2.
- Spaces are removed within words only: P = 1, M = 5, space, N = 5, L = 4, S = 2.
- No adjacent duplicates to remove.
- Eliminate remaining spaces: P = 1, M = 5, N = 5, L = 4, S = 2.
- Result: ‘P554’.
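The step lists for ‘JACKSON’ and ‘PAM NAILS’ can be followed in a small Python sketch. It is an illustration under stated assumptions, not the TS Quality implementation: the function names `tsq_before_17_3` and `tsq_17_3`, the placeholder symbols, and the treatment of H, W, and Y as vowels are hypothetical. It has only been checked against the two walk-throughs above and is not claimed to reproduce every table entry.

```python
# Standard Soundex digit groups, consistent with the codes used in the examples.
GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
CODES = {letter: digit for digit, letters in GROUPS.items() for letter in letters}
VOWEL, GAP = "-", " "  # placeholders for a vowel and for a blank between words


def code_of(ch):
    if ch == " ":
        return GAP
    return CODES.get(ch, VOWEL)  # H, W, Y treated like vowels (assumption)


def collapse(seq):
    """Drop every item that equals the item immediately before it."""
    out = []
    for item in seq:
        if not out or item != out[-1]:
            out.append(item)
    return out


def finish(first, codes):
    """Drop remaining placeholders, truncate to 4, pad with '0'."""
    digits = "".join(c for c in codes if c not in (VOWEL, GAP))
    return (first + digits)[:4].ljust(4, "0")


def tsq_before_17_3(name):
    name = name.upper()  # assumes a non-empty name
    # Remove vowel placeholders and blanks between words before comparing.
    codes = [c for c in map(code_of, name[1:]) if c not in (VOWEL, GAP)]
    # The first character value takes part in duplicate elimination.
    kept = collapse([code_of(name[0])] + codes)[1:]
    return finish(name[0], kept)


def tsq_17_3(name):
    name = name.upper()  # assumes a non-empty name
    # Remove vowel placeholders only; blanks between words stay in place.
    codes = [c for c in map(code_of, name[1:]) if c != VOWEL]
    # The first character value is not compared with what follows.
    return finish(name[0], collapse(codes))


for name in ("JACKSON", "PAM NAILS"):
    print(name, tsq_before_17_3(name), tsq_17_3(name))
```

Under these assumptions the loop prints J500 and J250 for JACKSON, and P542 and P554 for PAM NAILS, matching the v17.2 and v17.3.0 walk-throughs.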
Tables of Sample Differences in SOUNDEX1 Routine Results Across Versions and Modules Using Different Routine Modifiers:
Version/Test Data | Series 7 Winkey | TSQ 17.2 S7 Modifier Winkey | TSQ 17.3 S7 Modifier Winkey | TSQ 17.2 No S7 Modifier | TSQ 17.3 No S7 Modifier | National Archives Online |
---|---|---|---|---|---|---|
JACKSON | J222 | J225 | J222 | J250 | J250 | J250 |
PAM NAILS | P554 | P542 | P554 | P542 | P554 | P542 |
CORNEY | C65 | C5 | C65 | C650 | C650 | C650 |
LEE | L | Error | L | L000 | L000 | L000 |
GUTIERREZ | G362 | G62 | G362 | G362 | G362 | G362 |
PFISTER | P123 | P236 | P123 | P123 | P236 | P236 |
ASHCRAFT | A226 | A226 | A226 | A261 | A261 | A261 |
LLOYD | L3 | L | L3 | L430 | L300 | L300 |
CAMPBELL | C511 | C114 | C522 | C514 | C514 | C514 |
MCGEE | M22 | M2 | M22 | M200 | M200 | M200 |
RIEDEMANAS | R355 | R552 | R355 | R352 | R355 | R355 |
SCHAFER | S216 | S16 | S216 | S216 | S160 | S160 |
SHAEFFER | S16 | S6 | S16 | S160 | S160 | S160 |
Version/Test Data | Series 7 Matcher | TSQ 17.2 S7 Modifier Rellink | TSQ 17.3 S7 Modifier Rellink | TSQ 17.2 No S7 Modifier | TSQ 17.3 No S7 Modifier | National Archives Online |
---|---|---|---|---|---|---|
JACKSON | J250 | J5 | J250 | J250 | J250 | J250 |
PAM NAILS | P554 | P542 | P554 | P542 | P554 | P542 |
CORNEY | C650 | C5 | C650 | C650 | C650 | C650 |
LEE | L000 | Error | L000 | L000 | L000 | L000 |
GUTIERREZ | G362 | G62 | G362 | G362 | G362 | G362 |
PFISTER | P236 | P236 | P236 | P123 | P236 | P236 |
ASHCRAFT | A226 | A226 | A226 | A261 | A261 | A261 |
LLOYD | L300 | L | L3000 | L430 | L300 | L300 |
CAMPBELL | C514 | C14 | C514 | C514 | C514 | C514 |
MCGEE | M200 | M | M200 | M200 | M200 | M200 |
RIEDEMANAS | R355 | R552 | R355 | R352 | R355 | R355 |
SCHAFER | S160 | S16 | S160 | S216 | S160 | S160 |
SHAEFFER | S160 | S6 | S160 | S160 | S160 | S160 |