Matching Records Using Multiple Match Rules - spectrum_quality_1 - 23.1

Spectrum Data Quality Guide

Product type
Software
Portfolio
Verify
Product family
Spectrum
Product
Spectrum > Quality > Spectrum Quality
Version
23.1
Language
English
Product name
Spectrum Data Quality
Title
Spectrum Data Quality Guide
Topic type
How Do I
Overview
Tips
Reference
First publish date
2007
ft:lastEdition
2024-03-04
ft:lastPublication
2024-03-04T22:52:13.486265

If you have records that you want to match and you want to use more than one matching operation, you can create a dataflow that uses more than one match key then combines the results to effectively match on multiple separate criteria. For example, say you want to create a dataflow that matches records where:

The name and address match
OR
The date of birth and government ID match

To perform matching using this logic, you create a dataflow that performs name and address matching in one stage, and date of birth and government ID matching in another stage, then combine the matching records into a single collection.

This topic provides a general procedure for setting up a dataflow where matching occurs over the course of two matching stages. For purposes of illustration this procedure uses Intraflow Match stages. However, you can use this technique with Interflow Match as well.

  1. In Enterprise Designer, create a new dataflow.
  2. Drag a source stage onto the canvas.
  3. Double-click the source stage and configure it. See the Dataflow Designer's Guide for instructions on configuring source stages.
  4. Define the first matching pass. The results of this first matching pass will be collections of records that match on your first set of matching criteria, for example records that match on name and address.
    1. Drag a Match Key Generator and Intraflow Match stage to the canvas and connect them so that you have a dataflow that looks like this:
      Match Key Generator and Intraflow Match in dataflow
    1. In the Match Key Generator stage, define the match key to use for the first matching pass.

      For example, if you want the first matching pass to match on name and address, you may create a match key based on the fields containing the last name and postal code.

    2. In the Intraflow Match stage, define the match rules you want to perform the first matching pass.

      For example, if you may configure this matching stage to match on name and address.

  5. Save the collection numbers from the first matching pass to another field. This is necessary because the CollectionNumber field will be overwritten during the second matching pass. It is necessary to rename the CollectionNumber field in order to preserve the results of the first matching pass.
    1. Drag a Transformer stage to the canvas and connect it to the Intraflow Match stage so that you have a dataflow that looks like this:
      Transformer in dataflow
    2. Configure the Transformer stage to rename the field CollectionNumber to CollectionNumberPass1.
  6. Define the second matching pass. The results of this second matching pass will be collections of records that match on your second set of matching criteria, for example records that date of birth and government ID.
    1. Drag a Match Key Generator and Intraflow Match stage to the canvas and connect them so that you have a dataflow that looks like this:
      Match Key Generator and Intraflow Match in dataflow
    2. In the second Match Key Generator stage, define the match key to use for the second matching pass.

      For example, if you want the second matching pass to match date of birth and government ID, you might create a match key based on the fields containing the birthday and government ID.

    3. In the second Intraflow Match stage, define the match rule for the second matching pass.

      For example, if you may configure this matching stage to match on date of birth and government ID.

  7. Determine if any of the duplicate records identified by the second matching pass were also identified as duplicates in the first matching pass.
    1. Create the dataflow snippet shown below following the second Intraflow Match stage:
      Dataflow snippet following Intraflow Match
    2. Configure the Conditional Router stage so that records where the CollectionNumber field is not equal to 0 are routed to the Duplicate Synchronization stage.

      This will route the duplicates from the second matching pass to the Duplicate Synchronization stage.

    3. Configure the Duplicate Synchronization stage to group records by the CollectionNumer field (this is the collection number from the second matching pass). Then within each collection, identify whether any of the records in the collection were also identified as duplicates in the first matching pass. If they were, copy the collection number from the first pass to a new field called CollectionNumberConsolidated. To accomplish this, configure Duplicate Synchronization as shown here:
      Duplicate Synchronization Options dialog box
    4. In the Transformer stage that follows the Duplicate Synchronization stage, create a custom transform using this script:
      if (data['CollectionNumberConsolidated'] == null) {
       data['CollectionNumberConsolidated'] = data['CollectionNumber']
      }
    5. In the Transformer that immediately follows the Conditional Router (Transformer 2 in sample dataflow) configure a transform to copy CollectionNumberPass1 to CollectionNumberConsolidated.

      This takes the unique records from the second matching pass and copies CollectionNumberPass1 to CollectionNumberConsolidated.

  8. After the Stream Combiner you will have collections of records that match in either of the matching passes. The CollectionNumberConsolidated field indicates the matching records. You can add a sink or any additional processing you wish to perform after the Stream Combiner stage.