Creating a Best of Breed Record

Spectrum Data Quality Guide

Product type: Software
Portfolio: Verify
Product family: Spectrum
Product: Spectrum > Quality > Spectrum Quality
Version: 23.1
Language: English
Product name: Spectrum Data Quality
Title: Spectrum Data Quality Guide
First publish date: 2007
Last updated: 2024-03-04
Published on: 2024-03-04T22:52:13.486265

To eliminate duplicate records from your data, you may choose to merge data from groups of duplicate records into a single "best of breed" record. This approach is useful when each duplicate record contains data of the same type (for example, phone numbers or names) and you want to preserve the best data from each record in the surviving record.

This procedure describes how to create a dataflow that merges duplicate records into a best of breed record.

  1. In Enterprise Designer, create a dataflow that identifies duplicate records through matching.

    Matching is the first step in deduplication because you need to identify records that are similar, such as records that have the same account number or name. See the following topics for instructions on creating a dataflow that matches records.

    Note: You only need to build the dataflow to the point where it reads data and performs matching with an Interflow Match, Intraflow Match, or Transactional Match stage. Once you have created a dataflow to this point, continue with the following steps.
  2. Once you have defined a dataflow that reads data and matches records, drag a Best of Breed stage to the canvas and connect it to the stage that performs the matching (Interflow Match, Intraflow Match, or Transactional Match).

    For example, if your dataflow reads data from a file and performs matching with Intraflow Match, your dataflow would look like this after adding a Best of Breed stage:

    [Figure: Best of Breed stage in dataflow]
  3. Double-click the Best of Breed stage on the canvas.
  4. In the Group by field, select CollectionNumber.
  5. Under Best of Breed Settings, select Rules in the conditions tree.
  6. Click Add Rule.

    Records in each group are evaluated to see if they meet the rules you define here. If a record matches a rule, its data may be copied to the best of breed record, depending on how you configure the actions associated with the rule. You will define actions later.

  7. Define a rule that a duplicate record must meet in order for its data to be copied to the best of breed record.

    Configure the options that define the rule. For more information, see Rule options.

  8. Click OK.
  9. Click the Actions node in the tree.
  10. Click Add Action.
  11. Specify the data to copy to the best of breed record if the record meets the criteria you defined in the rule.
    For more information, see Actions options.
  12. Click OK.

    You have now configured Best of Breed with one rule and one action. You can add more rules and actions if needed.

  13. Click OK to close the Best of Breed Options window.
  14. Drag a sink stage onto the canvas and connect it to the Best of Breed stage.

    For example, if you were using a Write to File sink stage, your dataflow would look like this:

    [Figure: Write to File in dataflow]
  15. Double-click the sink stage and configure it.

    For information on configuring sink stages, see the Dataflow Designer's Guide.

You now have a dataflow that identifies matching records and merges records within a collection into a single best of breed record.
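
The rule-and-action model configured above can be illustrated outside the product with a minimal Python sketch. It is not Spectrum code and does not use any Spectrum API. The function `best_of_breed`, the helpers `has_phone` and `copy_phone`, and the sample records are all hypothetical. The sketch also assumes that the merged record starts as a copy of the first record in each group; in the Best of Breed stage, the starting record depends on how the stage is configured.

```python
from itertools import groupby
from operator import itemgetter


def best_of_breed(records, group_by, rule, action):
    """Merge each group of duplicate records into a single record.

    rule(record) returns True if the record qualifies.
    action(best, record) copies data from a qualifying record
    into the merged record for that group.
    """
    merged = []
    # groupby only groups adjacent items, so sort by the grouping field first.
    ordered = sorted(records, key=itemgetter(group_by))
    for _, group in groupby(ordered, key=itemgetter(group_by)):
        group = list(group)
        # Assumption: start from a copy of the first record in the group.
        best = dict(group[0])
        for record in group:
            if rule(record):
                action(best, record)
        merged.append(best)
    return merged


# Hypothetical matched records; CollectionNumber identifies each group.
records = [
    {"CollectionNumber": 1, "Name": "J. Smith", "Phone": ""},
    {"CollectionNumber": 1, "Name": "John Smith", "Phone": "555-0100"},
    {"CollectionNumber": 2, "Name": "Ann Lee", "Phone": "555-0199"},
]


def has_phone(record):
    # Rule: the record has a non-empty phone number.
    return record["Phone"] != ""


def copy_phone(best, record):
    # Action: copy the phone number to the merged record.
    best["Phone"] = record["Phone"]


result = best_of_breed(records, "CollectionNumber", has_phone, copy_phone)
for row in result:
    print(row)
```

In this sketch, the two records in collection 1 collapse into one record that keeps the name from the first record and takes the phone number from the second, which is the kind of outcome a single rule paired with a single action produces.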