Matching Records Against a Database - spectrum_quality_1 - 23.1

Spectrum Data Quality Guide

Product type
Software
Portfolio
Verify
Product family
Spectrum
Product
Spectrum > Quality > Spectrum Quality
Version
23.1
Language
English
Product name
Spectrum Data Quality
Title
Spectrum Data Quality Guide
First publish date
2007
Last updated
2024-03-04
Published on
2024-03-04T22:52:13.486265

This procedure describes how to match records where the suspect records come from a source such as a file or database, and the candidate records are in a database with other unrelated records. For each input record, the dataflow queries the database for candidates for that record, then uses a Transactional Match stage to match records. Finally, the dataflow writes the collections of matching records to an output file.

Note: Transactional Match only matches suspect records to candidates. It does not attempt to match suspect records to other suspect records as is done in Intraflow Match.
  1. In Enterprise Designer, create a new dataflow.
  2. Drag a source stage onto the canvas.
  3. Double-click the source stage and configure it. See the Dataflow Designer's Guide for instructions on configuring source stages.
  4. Drag a Candidate Finder stage to the canvas and connect the source stage to it.

    For example, if you were using the Read from File source stage, your dataflow would look like this:

    Read from File in dataflow

    Candidate Finder obtains the candidate records that will form the set of potential matches that Transactional Match will evaluate later in the dataflow.

  5. Double-click the Candidate Finder stage on the canvas.
  6. In the Connection field, select the database you want to query to find candidate records. If the database you want is not listed, open Management Console and define the database connection there first.
  7. In the SQL field, enter a SQL SELECT statement that finds records that are candidates based on the value in one of the dataflow fields. To reference dataflow fields, use the format ${FieldName}, where FieldName is the name of the field you want to reference.

    For example, if you wanted to find records in the database where the value in the LastName column is the same as the dataflow records' Customer_LastName field, you would write a SQL statement like this:

    SELECT FirstName, LastName, Address, City, State, PostalCode 
    FROM Customer_Table 
    WHERE LastName = ${Customer_LastName};
  8. On the Field Map tab, select which fields in the dataflow should contain the data from each database column.

    The Selected Fields column lists the database columns and theStage Fields lists the fields in the dataflow.

  9. Click OK.
  10. Drag a Transactional Match stage onto the canvas and connect the Candidate Finder stage to it.

    For example, if you are using a Read from File input stage your dataflow would now look like this:

    Read from File in dataflow

    Transactional Match matches suspect records against candidate records that are returned from the Candidate Finder stage. Transactional Match uses matching rules to compare the suspect record to all candidate records with the same candidate group number (assigned in Candidate Finder) to identify duplicates.

  11. Double-click the Transactional Match stage on the canvas.
  12. In the Load match rule field, select one of the predefined match rules which you can either use as-is or modify to suit your needs. If you want to create a new match rule without using one of the predefined match rules as a starting point, click New. You can only have one custom rule in a dataflow.
    Note: Do not use special characters while creating a new rule.
    Note: The Dataflow Options feature in Enterprise Designer enables the match rule to be exposed for configuration at runtime.
  13. For information about modifying the other options, see Building a Match Rule.
  14. When you are done configuring the Transactional Match stage, click OK.
  15. Drag a sink stage onto the canvas and connect it to the Transactional Match stage.

    For example, if you were using a Write to File sink stage your dataflow would look like this:

    Write to File in dataflow
  16. Double-click the sink stage and configure it.

    For information on configuring sink stages, see the Dataflow Designer's Guide.

You now have a dataflow that will match records from two data sources.

Example of Matching Records Against a Database

As a sales executive for an online sales company you want to determine if an online prospect is an existing customer or a new customer.

The following dataflow service provides a solution to the business scenario:

Dataflow solution to business scenario

This dataflow is a service that evaluates prospect data sent to it by an API call or web service call. It evaluates the data against customer data in a customer database to determine if a prospect is a customer.

The Input stage is configured so that the dataflow accepts the following input fields: AddressLine1, City, Name, PostalCode, and StateProvince. AddressLine1 and Name are the fields that are key to the dataflow processing in this template.

The Candidate Finder stage obtains the candidate records that will form the set of potential matches that the Transactional Match stage will evaluate.

The Transactional Match stage matches suspect records against potential candidate records that are returned from the Candidate Finder Stage. Transactional Match uses matching rules to compare the suspect record to all candidate records with the same candidate group number (assigned in Candidate Finder) to identify duplicates. In this example, Transactional Match compares LastName and AddressLine1.

The Output stage returns the results of the dataflow through an API or web service response.