Matching Records from One Source to Another Source - 23.1

Spectrum Data Quality Guide

This procedure describes how to use an Interflow Match stage to identify records in one source that match records in another source. The first source contains suspect records and the second source contains candidate records. The dataflow matches records only across the two sources; it does not attempt to match records within the same source. The dataflow groups matching records into collections and writes these collections to an output file.

  1. In Enterprise Designer, create a new dataflow.
  2. Drag two source stages onto the canvas. Configure one of them to point to the source of the suspect records and configure the other to point to the source of the candidate records.

    See the Dataflow Designer's Guide for instructions on configuring source stages.

  3. Drag a Match Key Generator stage onto the canvas and connect it to one of the source stages.

    For example, if you are using a Read from File source stage, your dataflow would now look like this:

    Read from File in dataflow

    Match Key Generator creates a non-unique key for each record, which matching stages can then use to identify groups of potentially duplicate records. Match keys make matching more efficient: records are grouped by match key, and only records within the same group are compared with each other.

    Note: You will add a second Match Key Generator stage later. For now you only need one on the canvas.
  4. Double-click the Match Key Generator stage.
  5. Click Add.
  6. Define the rule to use to generate a match key for each record.
    For more information, see Match Key Generator Options.
  7. When you are done defining the rule, click OK.
  8. Right-click the Match Key Generator stage on the canvas and select Copy Stage.
  9. Right-click in an empty area of the canvas and select Paste.
  10. Connect the copy of Match Key Generator to the other source stage.

    For example, if you are using Read from File source stages, your dataflow would now look like this:

    Read from File in dataflow

    The dataflow now contains two Match Key Generator stages that produce match keys for each source using exactly the same rules. Identically configured Match Key Generator stages are essential here: if the two stages used different rules, matching records from the two sources could receive different keys and would never be compared.

  11. Drag an Interflow Match stage onto the canvas and connect each of the Match Key Generator stages to it.

    For example, if you are using Read from File source stages, your dataflow would now look like this:

    Interflow Match in dataflow
  12. Double-click the Interflow Match stage.
  13. In the Load match rule field, select one of the predefined match rules. You can use a predefined rule as-is or modify it to suit your needs. To create a new match rule without using a predefined rule as a starting point, click New. You can have only one custom rule in a dataflow.
    Note: Do not use special characters while creating a new rule.
    Note: The Dataflow Options feature in Enterprise Designer enables the match rule to be exposed for configuration at runtime.
  14. In the Group by field, select MatchKey.

    This places records that have the same match key into a group. The match rule is then applied to the records within each group to determine whether they are duplicates. The match key for each record is generated by the Match Key Generator stages you configured earlier in this procedure.

  15. Configure the other options as needed. For information about modifying these options, see Building a Match Rule.
  16. Drag a sink stage onto the canvas and connect it to the Interflow Match stage.

    For example, if you are using a Write to File sink stage, your dataflow would now look like this:

    Write to File in dataflow
  17. Double-click the sink stage and configure it.

    For information on configuring sink stages, see the Dataflow Designer's Guide.

You now have a dataflow that will match records from two data sources.
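
The logic this dataflow implements can be illustrated outside Spectrum with a short Python sketch. It is not Spectrum's implementation. The field names, the `match_key` rule (first three letters of the last name plus the postal code), and the `is_match` rule (a fuzzy name comparison with a 0.8 threshold) are all assumptions made for the example. The sketch shows the same key rule being applied to both sources, and each suspect being compared only with candidates that share its key.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def match_key(record):
    """Hypothetical key rule: first 3 letters of last name + postal code."""
    return record["last_name"][:3].upper() + record["postal_code"]


def is_match(suspect, candidate, threshold=0.8):
    """Hypothetical match rule: fuzzy comparison of full names."""
    a = f"{suspect['first_name']} {suspect['last_name']}".lower()
    b = f"{candidate['first_name']} {candidate['last_name']}".lower()
    return SequenceMatcher(None, a, b).ratio() >= threshold


def interflow_match(suspects, candidates):
    """Compare each suspect only with candidates that share its match key."""
    # Group candidates by key, using the same key rule as for suspects.
    candidates_by_key = defaultdict(list)
    for candidate in candidates:
        candidates_by_key[match_key(candidate)].append(candidate)

    # Suspects are never compared with other suspects.
    collections = []
    for suspect in suspects:
        group = candidates_by_key.get(match_key(suspect), [])
        matches = [c for c in group if is_match(suspect, c)]
        if matches:
            collections.append({"suspect": suspect, "duplicates": matches})
    return collections


suspects = [
    {"first_name": "Jon", "last_name": "Smith", "postal_code": "60601"},
    {"first_name": "Ana", "last_name": "Lopez", "postal_code": "10001"},
]
candidates = [
    {"first_name": "John", "last_name": "Smith", "postal_code": "60601"},
    {"first_name": "Mary", "last_name": "Smith", "postal_code": "60601"},
    {"first_name": "Ana", "last_name": "Lopez", "postal_code": "94105"},
]

result = interflow_match(suspects, candidates)
for collection in result:
    print(collection["suspect"], "->", collection["duplicates"])
```

In this sample data, the two Ana Lopez records receive different keys because their postal codes differ, so they are never compared. This is the trade-off of grouping by match key: fewer comparisons, but a match is missed if the key rule separates records that belong together.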

Matching Records from Multiple Sources

As a direct mail company, you want to identify people who are on a do-not-mail list so that you do not send direct mail to them. You have a list of recipients in one file, and a list of people who do not wish to receive direct marketing mail in another file (a suppression file).

The following dataflow provides a solution to this business scenario:

Business scenario solution dataflow

The Read from File stage reads data from your mailing list, and the Read from File 2 stage reads data from the suppression list. The two Match Key Generator stages are identically configured so that they produce a match key which can be used by Interflow Match to form groups of potential matches. Interflow Match identifies records in the mailing list that are also in the suppression file and marks these records as duplicates. Conditional Router sends unique records, meaning those records that were not found in the suppression list, to Write to File to be written out to a file. The Conditional Router stage sends all other records to Write to Null where they are discarded.
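
The routing in this scenario can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not how the Spectrum stages are implemented: the record fields, the `match_key` rule (postal code plus last-name initial), and the `same_person` rule (case-insensitive name equality) are hypothetical stand-ins for the rules you would configure in Match Key Generator and Interflow Match.

```python
def match_key(record):
    """Hypothetical key rule shared by both inputs."""
    return record["postal_code"] + record["last_name"][:1].upper()


def same_person(a, b):
    """Hypothetical match rule: case-insensitive name equality."""
    name_a = (a["first_name"].lower(), a["last_name"].lower())
    name_b = (b["first_name"].lower(), b["last_name"].lower())
    return name_a == name_b


def route(mailing_list, suppression_list):
    """Split the mailing list into (unique, suppressed) records."""
    # Group suppression records by match key.
    suppression_by_key = {}
    for person in suppression_list:
        suppression_by_key.setdefault(match_key(person), []).append(person)

    unique, suppressed = [], []
    for recipient in mailing_list:
        group = suppression_by_key.get(match_key(recipient), [])
        if any(same_person(recipient, s) for s in group):
            suppressed.append(recipient)  # plays the role of Write to Null
        else:
            unique.append(recipient)      # plays the role of Write to File
    return unique, suppressed


mailing_list = [
    {"first_name": "Maria", "last_name": "Garcia", "postal_code": "73301"},
    {"first_name": "Lee", "last_name": "Wong", "postal_code": "30301"},
    {"first_name": "Sam", "last_name": "Patel", "postal_code": "02108"},
]
suppression_list = [
    {"first_name": "maria", "last_name": "GARCIA", "postal_code": "73301"},
    {"first_name": "Sam", "last_name": "Patel", "postal_code": "98101"},
]

unique, suppressed = route(mailing_list, suppression_list)
print("Mail to:", [r["first_name"] for r in unique])
print("Discarded:", [r["first_name"] for r in suppressed])
```

Here the Sam Patel on the suppression list has a different postal code, so under the assumed key rule he falls into a different group and the mailing-list record is treated as unique. Whether that is the desired outcome depends on how strict the key and match rules should be for your data.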