Determining if a Prospect is a Customer - spectrum_quality_1 - 23.1

Spectrum Data Quality Guide

Product type
Product family
Spectrum > Quality > Spectrum Quality
Product name
Spectrum Data Quality
Spectrum Data Quality Guide
Topic type
How Do I
First publish date

This dataflow template demonstrates how to evaluate prospect data in an input file to customer data in a customer database to determine if a prospect is a customer. This is a service dataflow, meaning that the dataflow can be accessed via the API or web services.

Business Scenario

As a sales executive for an online sales company you want to determine if an online prospect is an existing customer or a new customer.

The following dataflow service provides a solution to the business scenario:

Business flow solution dataflow

This dataflow template is available in Enterprise Designer. Go to File > New > Dataflow > From template and select ProspectMatching. This dataflow requires these products: Advanced Matching and Universal Name.

For each record in the input file, this dataflow does the following:


The selected input fields for this template are AddressLine1, City, Name, PostalCode, and StateProvince. AddressLine1 and Name are the fields that are key to the dataflow processing in this template.

Name Parser

In this template, the Name Parser stage is named Parse Personal Name. Parse Personal Name stage examines name fields and compares them to name data stored in the Spectrum Technology Platform name database files. Based on the comparison, it parses the name data into First, Middle, and Last name fields, assigns an entity type, and a gender to each name. It also uses pattern recognition in addition to the name data.

In this template the Parse Personal Name stage is configured as follows.

  • Parse personal names is selected and Parse business names is cleared. When you select these options, first names are evaluated for gender, order, and punctuation and no evaluation of business names is performed.
  • Gender Determination Source is set to default. For most cases, Default is the best setting for gender determination because it covers a wide variety of names. However, if you are processing names from a specific culture, select that culture. Selecting a specific culture helps ensure that the proper gender is assigned to the names. For example, if you leave Default selected, then the name Jean will be identified as a female name. However, if you select French, it will be identified as a male name.
  • Order is set to natural. The name fields are ordered by Title, First Name, Middle Name, Last Name, and Suffix.
  • Retain periods is cleared. Any punctuation in the name data is not retained.

Candidate Finder

The Candidate Finder stage is used in combination with the Transactional Match stage.

The Candidate Finder stage obtains the candidate records that will form the set of potential matches that the Transactional Match stage will evaluate. In addition, depending on the format of your data, Candidate Finder may need to parse the name or address of the suspect record, the candidate records, or both.

As part of configuring Candidate Finder, you select the database connection through which the specified query will be executed. You can select any connection configured in Management Console. To connect to a database not listed, configure a connection to that database in Management Console, then close and reopen Candidate Finder to refresh the connection list.

To define the SQL query you can type any valid SQL select statement into the text box on the Candidate Finder Options view. For example, assume you have a table in your database called Customer_Table that has the following columns:







Note: You can type any valid SQL select, however, Select * is not valid in this control.

To retrieve all the rows from the database, you might construct a query similar to the following:

select Cust_Name, Cust_Address, Cust_City, Cust_State, Cust_Zip from Customer_Table; 

However, it is unlikely that you would want to match your transaction against all the rows in the database.To return only relevant candidate records, you will want to add a WHERE clause using variable substitution.Variable substitution refers to a special notation that you will use to cause the Candidate Selection engine to replace the variable with the actual data from your suspect record.

To use variable substitution, enclose the field name in braces preceded by a dollar sign using the form ${FieldName}. For example, the following query will return only those records that have a value in Cust_Zip that matches the value in PostalCode on the suspect record.

select Cust_Name, Cust_Address, Cust_City, Cust_State,Cust_Zip 
from Customer_Table 
where Cust_Zip = ${PostalCode}; 

Next you need to map database columns to stage fields if the column names in your database do not match the Component Field names exactly. If they do match they will be automatically mapped to the corresponding Stage Fields. You will need to use the Selected Fields (columns from the database) to map to the Stage Fields (field names defined in the dataflow).

Again consider the Customer_Table from the above example:







When you retrieve these records from the database, you need to map the column names to the field names that will be used by the Transactional Match stage and other stages in your dataflow. For example, Cust_Address might be mapped to AddressLine1, and Cust_Zip would be mapped to PostalCode.

  1. Select the drop-down list under Selected Fields in the candidate Finder Options view. Then, select the database column Cust_Zip.
  2. Select the drop-down list under Stage Fields. Then, select the field to which you want to map.

For example, if you want to map Cust_Zip to Postal Code, first select Cust_Zip under Selected fields and then select PostalCode on the corresponding Stage Field row.

In addition to mapping fields as described above, you can use special notation in your SQL query to perform the mapping.To do this, you will enter the name of the Stage Field, enclosed in braces, after the column name in your query.When you do this, the selected fields will be automatically mapped to the corresponding stage fields.

An example of this using the query from the previous example follows:

select Cust_Name {Name}, Cust_Address {AddressLine1}, 
 Cust_City {City}, Cust_State {StateProvince}, 
 Cust_Zip {PostalCode} 
from Customer 
where Cust_Zip = ${PostalCode}; 

Transactional Match

The Transactional Match stage is used in combination with the Candidate Finder stage.

The Transactional Match stage allows you to match suspect records against potential candidate records that are returned from the Candidate Finder Stage.

Transactional Match uses matching rules to compare the suspect record to all candidate records with the same candidate group number (assigned in Candidate Finder) to identify duplicates. If the candidate record is a duplicate, it is assigned a collection number, the match record type is labeled a Duplicate, and the record is then written out. Any unmatched candidates in the group are assigned a collection number of 0, labeled as Unique and then written out as well.

In this template, you create a custom matching rule that compares LastName and AddressLine1.

Here are some guidelines to follow when creating your matching hierarchy:

  • A parent node must be given a unique name. It can not be a field.
  • The child field must be a Spectrum Technology Platform data type field, that is, one available through one or more stages.
  • All children under a parent must use the same logical operators. To combine connectors you must first create intermediate parent nodes.
  • Thresholds at the parent node could be higher than the threshold of the children.
  • Parent nodes do not have to have a threshold.


As a service, this template sends all available fields to the output. You can limit the output based on your needs.