Matching Records Between and Within Sources

Spectrum Data Quality Guide

Product type: Software
Portfolio: Verify
Product family: Spectrum
Product: Spectrum > Quality > Spectrum Quality
Version: 23.1
Language: English
Product name: Spectrum Data Quality
Topic type: How Do I, Overview, Tips, Reference
First published: 2007
Last edited: 2024-03-04
Last published: 2024-03-04

This procedure describes how to use an Intraflow Match stage to identify records in one file that match records in another file, as well as records that match other records in the same file. For example, suppose you have two files (file A and file B). You want to find records in file A that match records in file B, and you also want to find records in file A that match other records in file A. You can accomplish this by using a Stream Combiner stage to merge both files into a single stream and an Intraflow Match stage to find matches within that combined stream.
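
The key idea is that once both files are merged into one stream, a single within-stream comparison finds both kinds of matches: pairs from different files (between sources) and pairs from the same file (within a source). The following Python sketch illustrates this outside Spectrum. The record fields, the `source` tag, and the `is_match` rule are hypothetical, and for simplicity it compares every pair of records instead of using match keys.

```python
from itertools import combinations

# Hypothetical sample records; "source" tags which file each record came from.
file_a = [
    {"source": "A", "id": 1, "name": "John Smith", "postal": "12345"},
    {"source": "A", "id": 2, "name": "JOHN SMITH", "postal": "12345"},
]
file_b = [
    {"source": "B", "id": 3, "name": "john smith", "postal": "12345"},
    {"source": "B", "id": 4, "name": "Mary Jones", "postal": "67890"},
]

def is_match(r1, r2):
    # Hypothetical match rule: same name (ignoring case) and same postal code.
    return r1["name"].lower() == r2["name"].lower() and r1["postal"] == r2["postal"]

def find_matches(*sources):
    # Merge all sources into one stream (the role a Stream Combiner plays),
    # then compare every pair of records in the merged stream.
    combined = [rec for src in sources for rec in src]
    between, within = [], []
    for r1, r2 in combinations(combined, 2):
        if is_match(r1, r2):
            pair = (r1["id"], r2["id"])
            if r1["source"] != r2["source"]:
                between.append(pair)  # match across files
            else:
                within.append(pair)   # duplicate inside one file
    return between, within

between, within = find_matches(file_a, file_b)
print("between:", between)  # between: [(1, 3), (2, 3)]
print("within:", within)    # within: [(1, 2)]
```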

  1. In Enterprise Designer, create a new dataflow.
  2. Drag a source stage onto the canvas.
  3. Double-click the source stage and configure it. See the Dataflow Designer's Guide for instructions on configuring source stages.
  4. Drag a second source stage onto the canvas and configure it to read the second data source into the dataflow.
  5. Drag a Stream Combiner stage onto the canvas and connect the two source stages to it.

    For example, if your dataflow has two Read from File stages, it would look like this after you add the Stream Combiner:

    [Image: Stream Combiner in dataflow]
  6. Drag a Match Key Generator stage onto the canvas and connect it to the Stream Combiner stage.

    For example, your dataflow may now look like this:

    [Image: Match Key Generator in dataflow]

    Match Key Generator creates a non-unique key for each record, which matching stages can then use to identify groups of potentially duplicate records. Match keys make matching more efficient: records are grouped by match key, and only records within the same group are compared with each other.

  7. Double-click Match Key Generator.
  8. Click Add.
  9. Define the rule used to generate a match key for each record.
    For more information, see Match Key Generator Options.
  10. When you are done defining the rule, click OK.
  11. If you want to add more rules, click Add and define them. Otherwise, click OK when you are done.
  12. Drag an Intraflow Match stage onto the canvas and connect it to the Match Key Generator stage.

    For example, your dataflow may now look like this:

    [Image: Intraflow Match in dataflow]
  13. Double-click Intraflow Match.
  14. In the Load match rule field, select one of the predefined match rules, which you can use as-is or modify to suit your needs. To create a new match rule without starting from a predefined one, click New. You can have only one custom rule in a dataflow.
    Note: Do not use special characters when creating a new rule.
    Note: The Dataflow Options feature in Enterprise Designer lets you expose the match rule for configuration at runtime.
  15. In the Group by field, select MatchKey.

    This places records that have the same match key into a group. The match rule is then applied to the records within each group to find duplicates. The match key for each record is generated by the Match Key Generator stage you configured earlier in this procedure.

  16. For information about modifying the other options, see Building a Match Rule.
  17. Click OK to save your Intraflow Match configuration and return to the dataflow canvas.
  18. Drag a sink stage onto the canvas and connect it to the Intraflow Match stage.

    For example, if you are using a Write to File sink stage, your dataflow would look like this:

    [Image: Write to File in dataflow]
  19. Double-click the sink stage and configure it.

    For information on configuring sink stages, see the Dataflow Designer's Guide.
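
To make the flow of records through these stages concrete, here is a minimal Python sketch that mimics the finished dataflow outside Spectrum: merge two input streams, add a match key to each record, group by that key, compare records only within each group, and write the result as CSV. Every name here is a hypothetical stand-in, including the key rule (first three letters of the last name plus the first five characters of the postal code), the match rule, the simple greedy clustering, and the `collection` field; none of it is Spectrum's API, matching algorithm, or output format.

```python
import csv
import io
from collections import defaultdict

# Hypothetical input records (stand-ins for two Read from File stages).
file_a = [
    {"id": "A1", "last": "Smith", "postal": "12345"},
    {"id": "A2", "last": "Smyth", "postal": "12345"},
    {"id": "A3", "last": "Smith", "postal": "12345-6789"},
]
file_b = [
    {"id": "B1", "last": "SMITH", "postal": "12345"},
    {"id": "B2", "last": "Jones", "postal": "67890"},
]

def match_key(rec):
    # Hypothetical key rule: first 3 letters of the uppercased last name
    # plus the first 5 characters of the postal code.
    return rec["last"].upper()[:3] + rec["postal"][:5]

def is_match(r1, r2):
    # Hypothetical match rule, applied only within a match-key group.
    return r1["last"].upper() == r2["last"].upper()

def run_dataflow(*sources):
    # Stream Combiner stand-in: merge all inputs into one list (copying records).
    combined = [dict(rec) for src in sources for rec in src]
    # Match Key Generator stand-in: add a non-unique MatchKey and group on it.
    groups = defaultdict(list)
    for rec in combined:
        rec["MatchKey"] = match_key(rec)
        groups[rec["MatchKey"]].append(rec)
    # Intraflow Match stand-in: within each group, put a record into the first
    # cluster whose first record it matches; otherwise start a new cluster.
    next_collection = 1
    for members in groups.values():
        clusters = []
        for rec in members:
            for cluster in clusters:
                if is_match(cluster[0], rec):
                    cluster.append(rec)
                    break
            else:
                clusters.append([rec])
        for cluster in clusters:
            if len(cluster) > 1:
                for rec in cluster:
                    rec["collection"] = next_collection
                next_collection += 1
            else:
                cluster[0]["collection"] = 0  # no duplicate found
    return combined

def write_csv(records):
    # Write to File stand-in: serialize the results as CSV text.
    out = io.StringIO()
    fields = ["id", "last", "postal", "MatchKey", "collection"]
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()

results = run_dataflow(file_a, file_b)
print(write_csv(results))
```

With this sample data, A1, A3 (same file) and B1 (other file) end up in collection 1, so within-source and between-source duplicates are reported together. A2 ("Smyth") is never compared with the "Smith" records because its match key differs. This illustrates the tradeoff match keys make: far fewer comparisons, at the risk of missing matches that a looser key would have grouped together.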