Options - dataflow_designer - spectrum_quality_1 - 23.1

Spectrum Data Quality Guide

Product type: Software
Portfolio: Verify
Product family: Spectrum
Product: Spectrum > Quality > Spectrum Quality
Version: 23.1
Language: English
Product name: Spectrum Data Quality
Title: Spectrum Data Quality Guide
First publish date: 2007
Last updated: 2024-03-04
Published on: 2024-03-04T22:52:13.486265
  1. In the Load match rule field, select one of the predefined match rules, which you can use as-is or modify to suit your needs. To create a new match rule without using a predefined match rule as a starting point, click New. A dataflow can have only one custom rule.
    Note: Do not use special characters while creating a new rule.
    Note: The Dataflow Options feature in Enterprise Designer enables the match rule to be exposed for configuration at runtime.
  2. Click Group By to select a field to use for grouping records in the match queue. Intraflow Match only attempts to match records against other records in the same match queue.
  3. Select the Sort box to perform a pre-match sort of your input based on the field selected in the Group By field.
  4. Click Advanced to specify additional sort performance options.
    Note: The optimal sort performance settings depend on your server's hardware configuration. You can use this equation as a general guideline to produce good sort performance:

    (InMemoryRecordLimit × MaxNumberOfTempFiles ÷ 2) >= TotalNumberOfRecords
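
As a rough illustration of the guideline above, this Python sketch checks whether a pair of settings satisfies the inequality. The function name and the sample values are hypothetical; only the inequality itself comes from this guide.

```python
def sort_settings_cover(in_memory_record_limit: int,
                        max_temp_files: int,
                        total_records: int) -> bool:
    """Return True if (limit * temp files / 2) >= total number of records."""
    return in_memory_record_limit * max_temp_files / 2 >= total_records


# Hypothetical values chosen only for illustration.
print(sort_settings_cover(200_000, 20, 1_500_000))  # 2,000,000 >= 1,500,000
print(sort_settings_cover(200_000, 10, 1_500_000))  # 1,000,000 <  1,500,000
```

If the check fails, the guideline suggests raising one of the two settings until the left side is at least the total record count.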

  5. Click Express Match On to perform an initial comparison of express key values to determine whether two records are considered a match.

    Express Key matching can be a useful tool for reducing the number of compares performed and thereby improving execution speed. A loose express key results in many false positive matches. You can generate an express key as part of generating a match key through MatchKeyGenerator. See Match Key Generator for more information.

    If two records have an exact match on the express key, the candidate is considered a 100% duplicate. If two records do not match on an express key value, they are compared using the rules-based method.

    To determine whether a candidate was matched using an express key, look at the value of the ExpressKeyIdentified field, which is either Y for a match or N for no match. Note that suspect records always have an ExpressKeyIdentified value of N.
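
The express key behavior described in this step can be sketched as follows. This is a simplified model, not the product's implementation: the `ExpressMatchKey` field name, the `threshold` value, and the `rule_score` callback are assumptions made for illustration.

```python
from typing import Callable, Tuple


def compare_pair(suspect: dict, candidate: dict,
                 rule_score: Callable[[dict, dict], float],
                 threshold: float = 80.0) -> Tuple[bool, float, str]:
    """Return (is_duplicate, score, ExpressKeyIdentified) for one comparison."""
    s_key = suspect.get("ExpressMatchKey")      # assumed field name
    c_key = candidate.get("ExpressMatchKey")
    if s_key and s_key == c_key:
        # Exact express key match: treated as a 100% duplicate, rules skipped.
        return True, 100.0, "Y"
    # No express key match: fall back to the rules-based comparison.
    score = rule_score(suspect, candidate)
    return score >= threshold, score, "N"


def name_score(a: dict, b: dict) -> float:
    """Hypothetical stand-in for a match rule: exact, case-insensitive name."""
    return 100.0 if a["Name"].lower() == b["Name"].lower() else 0.0


suspect = {"Name": "John Smith", "ExpressMatchKey": "SMITHJ"}
same_key = {"Name": "Jon Smith", "ExpressMatchKey": "SMITHJ"}
other_key = {"Name": "john smith", "ExpressMatchKey": "SMITHJO"}
print(compare_pair(suspect, same_key, name_score))   # (True, 100.0, 'Y')
print(compare_pair(suspect, other_key, name_score))  # (True, 100.0, 'N')
```

The first pair shows why a loose express key can produce false positives: the names differ, yet the shared key alone marks the candidate as a duplicate.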

  6. In the Initial Collection Number text box, specify the starting number to assign to the collection number field for duplicate records.

    The collection number identifies each duplicate record in a match queue. Unique records are assigned a collection number of 0. Each duplicate record is assigned a collection number starting with the value specified in the Initial Collection Number text box.

  7. Select one of the following:
    Option: Compare suspect to all candidates
    Description: This option matches the suspect to all candidates in the same match group (Group By option), even if a duplicate has already been found within the match group. For example:

    Suspect - John Smith
    Candidate - Bill Jones
    Candidate - John Smith
    Candidate - John Smith

    In this example, the suspect John Smith would be compared to both John Smith candidates.

    Check the Return Unique Candidates box to return records within a match group from the candidate port that have been identified as unique records.

    Option: Stop comparing suspect against candidates after finding n duplicates
    Description: This option matches the suspect to candidates in the same match group (Group By option) but stops comparing once the user-defined number of duplicates has been identified. For example, if you chose to stop comparing candidates after finding one duplicate and you had this data:

    Suspect - John Smith
    Candidate - Bill Jones
    Candidate - John Smith
    Candidate - John Smith

    In this example, the suspect John Smith would stop being compared within the match group as soon as the first John Smith candidate is identified as a duplicate.
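
The difference between the two options in this step can be modeled with a short Python sketch. The function and parameter names are hypothetical, and exact string equality stands in for a real match rule.

```python
from typing import Callable, List, Optional


def find_duplicates(suspect: str, candidates: List[str],
                    is_match: Callable[[str, str], bool],
                    stop_after: Optional[int] = None) -> List[int]:
    """Return the indexes of candidates flagged as duplicates of the suspect.

    stop_after=None models "Compare suspect to all candidates";
    stop_after=n models "Stop comparing ... after finding n duplicates".
    """
    found: List[int] = []
    for i, candidate in enumerate(candidates):
        if is_match(suspect, candidate):
            found.append(i)
            if stop_after is not None and len(found) >= stop_after:
                break  # enough duplicates found; skip remaining candidates
    return found


def same_name(a: str, b: str) -> bool:
    """Hypothetical stand-in for a match rule."""
    return a == b


candidates = ["Bill Jones", "John Smith", "John Smith"]
print(find_duplicates("John Smith", candidates, same_name))                # [1, 2]
print(find_duplicates("John Smith", candidates, same_name, stop_after=1))  # [1]
```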

  8. Click Generate Data for Analysis to generate match results. For more information, see Analyzing Match Results.
  9. The Assign collection number 0 to unique records box is checked by default, which assigns a collection number of zero to unique records. Uncheck this option to generate collection numbers other than zero for unique records. The unique record collection numbers will be in sequence with any other collection numbers. For example, if your matching dataflow finds five records and the first three records are unique, the collection numbers would be assigned as shown in the first table below. If your matching dataflow finds five records and the last three are unique, the collection numbers would be assigned as shown in the second table below.

    Collection Number    Record Type
    1                    Unique
    2                    Unique
    3                    Unique
    4                    Duplicate/Suspect
    4                    Duplicate/Suspect

    Collection Number    Record Type
    1                    Duplicate/Suspect
    1                    Duplicate/Suspect
    2                    Unique
    3                    Unique
    4                    Unique

    If you leave this box checked, any unique records found in your dataflow are assigned a collection number of zero.
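
The numbering in this step can be reproduced with a small Python sketch. It is an illustrative model only: the function name, the grouped-input shape, and the behavior shown for the checked case (duplicates numbered from the initial collection number, unique records skipped) are assumptions.

```python
from typing import List


def assign_collection_numbers(groups: List[List[str]],
                              initial: int = 1,
                              zero_for_unique: bool = True) -> List[int]:
    """Return one collection number per record, in output order.

    groups: collections in output order; a single-record group is unique.
    initial: models the Initial Collection Number setting.
    """
    numbers: List[int] = []
    next_number = initial
    for group in groups:
        if len(group) == 1 and zero_for_unique:
            numbers.append(0)  # unique record keeps collection number 0
            continue
        numbers.extend([next_number] * len(group))
        next_number += 1
    return numbers


uniques_first = [["A"], ["B"], ["C"], ["D", "D2"]]
uniques_last = [["D", "D2"], ["A"], ["B"], ["C"]]
print(assign_collection_numbers(uniques_first, zero_for_unique=False))  # [1, 2, 3, 4, 4]
print(assign_collection_numbers(uniques_last, zero_for_unique=False))   # [1, 1, 2, 3, 4]
print(assign_collection_numbers(uniques_first))                         # [0, 0, 0, 1, 1]
```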
  10. Select the Return match rule name option to include the selected match rule name in the stage output.
  11. Select Return detailed match information if you want detailed match information to be displayed as an output for your match rule. For more information about the output fields, see Output.
    Note: Enabling this option reduces overall stage performance.
  12. If you are creating a new custom matching rule, see Building a Match Rule for more information.
  13. Click Evaluate to evaluate how a suspect record scored against candidate records. For more information, see Interflow Match.