Default Matching Method (Spectrum Data Quality 23.1)

Spectrum Data Quality Guide

First publish date: 2007
Last edition: 2024-03-04
Last publication: 2024-03-04T22:52:13.486265

Using the group by (match group) setting chosen by the user, the matcher identifies groups of records that might be duplicates of one another. The matcher then proceeds through each record in the group. If the record matches an existing Suspect, it is considered a Duplicate of that Suspect, is assigned a Score, CollectionNumber, and MatchRecordType (Duplicate), and is eliminated from the match. If the record matches no existing Suspect within the match group, it becomes a new Suspect: it is added to the current match group so that subsequent records can be matched against it.

When the matcher has exhausted all records in the current match group, it eliminates all Suspects from the match. A Suspect with no duplicates is labeled with a MatchRecordType of Unique and assigned a CollectionNumber of 0. A Suspect with at least one duplicate retains a MatchRecordType of Suspect and is assigned the same CollectionNumber as its matched duplicate records. Finally, when all records within a match group have been written to the output, the matcher moves on to compare a new match group.

Note: The Default Matching Method will only compare records that are within the same match group.
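The flow described above can be sketched in a few lines of Python. This is a minimal illustration, not Spectrum's implementation: the `similarity` scorer, the `threshold` parameter, the dictionary field names `key` and `value`, and the choice to stop at the first matching Suspect are all assumptions made for the example. Only the Suspect, Duplicate, and Unique labels and the collection number rules come from the description.

```python
from collections import defaultdict


def similarity(a: str, b: str) -> int:
    """Hypothetical scorer: 100 for a case-insensitive exact match, else 0."""
    return 100 if a.strip().lower() == b.strip().lower() else 0


def default_match(records, threshold=80, score_fn=similarity):
    """Sketch of the default matching method over dict records.

    Each record is a dict with a 'key' (match group) and a 'value' to compare.
    Returns the records annotated with MatchRecordType, CollectionNumber,
    and (for duplicates) MatchScore.
    """
    # Records are only ever compared within their own match group.
    groups = defaultdict(list)
    for rec in records:
        groups[rec["key"]].append(dict(rec))

    next_collection = 1
    output = []
    for members in groups.values():
        suspects = []  # Suspects seen so far in this match group
        for rec in members:
            for suspect in suspects:
                score = score_fn(suspect["value"], rec["value"])
                if score >= threshold:
                    # Assumption: the first matching Suspect wins.
                    if suspect["CollectionNumber"] == 0:
                        suspect["CollectionNumber"] = next_collection
                        next_collection += 1
                    rec["MatchScore"] = score
                    rec["CollectionNumber"] = suspect["CollectionNumber"]
                    rec["MatchRecordType"] = "Duplicate"
                    suspect["duplicates"].append(rec)
                    break
            else:
                # No Suspect matched: the record becomes a new Suspect.
                rec["CollectionNumber"] = 0
                rec["duplicates"] = []
                suspects.append(rec)

        # Group exhausted: label each Suspect and write the group out.
        for suspect in suspects:
            duplicates = suspect.pop("duplicates")
            suspect["MatchRecordType"] = "Suspect" if duplicates else "Unique"
            output.append(suspect)
            output.extend(duplicates)
    return output
```

With this sketch, two records that share a match group and a matching value end up as a Suspect and a Duplicate with the same collection number, while an identical value in a different match group is never compared and comes out as Unique with a collection number of 0.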

The type of matching (Intraflow or Interflow) determines how Express Key match results translate to Candidate MatchScores. In Interflow matching, a successful Express Key match always confers a MatchScore of 100 on the Candidate. In Intraflow matching, the score a Candidate gains from an Express Key match depends on whether the record it matched was itself a match of some other Suspect. Express Key duplicates of a Suspect always have a MatchScore of 100, whereas Express Key duplicates of another Candidate (which was a duplicate of a Suspect) inherit the MatchScore of that Candidate, which is not necessarily 100.
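The scoring rule for Express Key matches can be written as a small function. This is an illustrative sketch only: the function name `express_key_score`, the string values `"interflow"` and `"intraflow"`, and the convention of passing `None` when the matched record is a Suspect are assumptions for the example, not part of the product's API.

```python
from typing import Optional


def express_key_score(match_type: str,
                      matched_candidate_score: Optional[int] = None) -> int:
    """Return the MatchScore a Candidate gains from an Express Key match.

    match_type: "interflow" or "intraflow" (hypothetical labels).
    matched_candidate_score: None if the Candidate matched a Suspect directly;
        otherwise the MatchScore of the Candidate it matched.
    """
    if match_type == "interflow":
        # Interflow: an Express Key match always scores 100.
        return 100
    if match_type == "intraflow":
        if matched_candidate_score is None:
            # Express Key duplicate of a Suspect.
            return 100
        # Express Key duplicate of another Candidate: inherit its score.
        return matched_candidate_score
    raise ValueError(f"unknown match type: {match_type!r}")
```

For example, `express_key_score("intraflow", 87)` returns 87 because the score is inherited from the matched Candidate, while the same call with `"interflow"` returns 100.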