Matching Terminology

Spectrum Data Quality Guide

Product: Spectrum Data Quality
Version: 23.1
Language: English
First publish date: 2007
Last publication: 2024-03-04
Average Score
The average match score of all duplicates. The possible values are 0-100, with 0 indicating a poor match and 100 indicating an exact match.
Baseline
The selected match result that will be compared against another match result.
Candidate Group
Suspect and candidate records grouped together by an ID assigned by Candidate Finder. The suspect (the first record in the group) is a record read from an input source, while its candidates are usually records found in a database using an SQL query.
Candidate Records
All non-suspect records in a match group or candidate group.
Drop
A decrease in duplicates.
Detail Match Record
A single record that corresponds to a record processed by a match stage. Each record indicates whether it was a Suspect, a Unique, or a Duplicate, and identifies its Match Group or Candidate Group and its output collection. Candidate records also indicate why the record matched or did not match its suspect.
Duplicate Collections
A duplicate collection consists of a Suspect and its Duplicate records grouped together by a CollectionNumber. Unique records always belong to CollectionNumber 0.
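The numbering described above can be illustrated with a minimal Python sketch. The function name, the input shapes, and the choice to start numbering at 1 are assumptions made for illustration. Only the CollectionNumber field and the rule that unique records belong to collection 0 come from the definition.

```python
def assign_collection_numbers(suspects_with_duplicates, unique_records):
    """Return copies of the records with a CollectionNumber field added.

    suspects_with_duplicates: list of (suspect, [duplicates]) pairs (hypothetical shape).
    unique_records: list of records that matched nothing.
    """
    numbered = []
    next_number = 1  # assumption: collections are numbered from 1 upward
    for suspect, duplicates in suspects_with_duplicates:
        # The suspect and all of its duplicates share one collection number.
        for record in [suspect, *duplicates]:
            numbered.append({**record, "CollectionNumber": next_number})
        next_number += 1
    # Unique records always belong to collection 0.
    for record in unique_records:
        numbered.append({**record, "CollectionNumber": 0})
    return numbered
```

For example, one suspect with two duplicates plus one unique record yields three records in collection 1 and one record in collection 0.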
Duplicate Records
Number of records that match another record within a match group.
Express Matches
An express match is made when a suspect and candidate have an exact match on the contents of a designated field, usually an ExpressMatchKey provided by the Match Key Generator. If an express match is made, no further processing is done to determine whether the suspect and candidate are duplicates.
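The short-circuit behavior of an express match can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's implementation: the function name, the `score_fn` callback, the result dictionary, and the threshold of 80 are all hypothetical. Only the ExpressMatchKey field name comes from the definition.

```python
def compare(suspect, candidate, score_fn, threshold=80,
            express_field="ExpressMatchKey"):
    """Decide whether candidate duplicates suspect, trying the express key first."""
    key = suspect.get(express_field)
    if key is not None and key == candidate.get(express_field):
        # Exact match on the designated field: skip all further processing.
        return {"duplicate": True, "express": True, "score": None}
    # No express match: fall back to the regular scoring (hypothetical callback).
    score = score_fn(suspect, candidate)
    return {"duplicate": score >= threshold, "express": False, "score": score}
```

When the keys are equal, `score_fn` is never called, which is the point of the express path: it avoids the cost of full rule evaluation.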
Input Records
Order of the records in the matching stage before the matching sort is performed.
Interflow Match
A matching stage that locates matches between similar data records across two input record streams. The first record stream is a source for suspect records and the second stream is a source for candidate records.
Intraflow Match
A matching stage that locates matches between similar data records within a single input stream.
Lift
An increase in duplicates.
Match Groups
(Group By) Records grouped together either by a match key or a sliding window.
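Grouping by a match key can be pictured with a short Python sketch. The field name `MatchKey` and the dictionary-based record shape are assumptions for illustration. The sliding window alternative is described under its own entry.

```python
from collections import defaultdict

def group_by_match_key(records, key_field="MatchKey"):
    """Group records that share the same match key value."""
    groups = defaultdict(list)
    for record in records:
        # Records with identical key values land in the same match group.
        groups[record[key_field]].append(record)
    return dict(groups)
```

Only records within the same group would then be compared with one another, which keeps the number of comparisons manageable.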
Match Results
(or Resource Bundle) Logical grouping of files produced by a stage. This data is saved for each run of a stage and stored on disk. Subsequent runs will not overwrite or change the results from a previous run. In MAT, the bundles are used to provide information about the summary and detail results, as well as settings information.
Match Results List
List of match results of a single type that MAT can analyze in the current analysis session.
Match Results Type
Indicates the contents of the match results. MAT uses the match results type to determine how to use the data.
Matcher Stage
A stage on the canvas that performs matching routines. The matcher stages are Interflow Match, Intraflow Match, and Transactional Match.
Missed Match
A record that was previously a suspect or duplicate but is now unique.
New Match
A record that was previously unique but is now a suspect or duplicate.
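The comparison terms in this glossary (Baseline, New Match, Missed Match, Lift, and Drop) can be tied together with a minimal Python sketch. It assumes each match result is reduced to a mapping from a record ID to one of the statuses "Suspect", "Duplicate", or "Unique". That representation and the function name are hypothetical.

```python
def compare_results(baseline, comparison):
    """Compare two match results keyed by record ID (hypothetical shape)."""
    matched = {"Suspect", "Duplicate"}
    # New match: unique in the baseline, suspect or duplicate in the comparison.
    new_matches = sorted(
        rid for rid, status in baseline.items()
        if status == "Unique" and comparison.get(rid) in matched
    )
    # Missed match: suspect or duplicate in the baseline, unique in the comparison.
    missed_matches = sorted(
        rid for rid, status in baseline.items()
        if status in matched and comparison.get(rid) == "Unique"
    )
    # Lift and drop: change in the number of duplicates relative to the baseline.
    change = (
        sum(1 for s in comparison.values() if s == "Duplicate")
        - sum(1 for s in baseline.values() if s == "Duplicate")
    )
    return {
        "new_matches": new_matches,
        "missed_matches": missed_matches,
        "lift": max(change, 0),
        "drop": max(-change, 0),
    }
```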
Sliding Window
The sliding window matching method sequentially fills a buffer of predetermined size, called a window, with the corresponding number of data rows. As each row is added to the window, it is compared to each row already contained in the window.
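A minimal Python sketch of this comparison pattern follows. It assumes that once the window is full, the oldest row is dropped to make room for the new one. The definition does not state the eviction rule, so that detail and the function name are assumptions.

```python
from collections import deque

def sliding_window_pairs(rows, window_size):
    """Yield (earlier_row, new_row) pairs that a sliding window would compare."""
    window = deque()
    for row in rows:
        # Assumption: when the window is full, the oldest row slides out.
        if len(window) == window_size:
            window.popleft()
        # Compare the incoming row to every row already in the window.
        for earlier in window:
            yield (earlier, row)
        window.append(row)
```

With four rows and a window of three, the first and last rows are never compared, because the first row has left the window by the time the fourth arrives.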
Suspect Records
A driver record that is matched against candidates within a match group or a candidate group.
Transactional Match
A matching stage that matches suspect records against candidate records that are returned from Candidate Finder or by an external application.
Unique Records
A suspect or candidate record that does not match any other records in a match group. If it is the only record in a match group, a suspect is automatically unique.