Defining the SQL Query - dataflow_designer - spectrum_quality_1 - 23.1

Spectrum Data Quality Guide

Product type
Software
Portfolio
Verify
Product family
Spectrum
Product
Spectrum > Quality > Spectrum Quality
Version
23.1
Language
English
Product name
Spectrum Data Quality
Title
Spectrum Data Quality Guide
Topic type
Overview
Reference
Tips
How Do I
First publish date
2007
ft:lastEdition
2024-03-04
ft:lastPublication
2024-03-04T22:52:13.486265
You can type any valid SQL select statement into the text box on the Candidate Finder Options dialog.
Note: Select * is not valid.
For example, assume you have a table in your database called Customer_Table that has the following columns:
  • Customer_Table
  • Cust_Name
  • Cust_Address
  • Cust_City
  • Cust_State
  • Cust_Zip

To retrieve all the rows from the database, you might construct a query similar to the following:

SELECT Cust_Name, Cust_Address, Cust_City, Cust_State, Cust_Zip from Customer_Table; 

You will rarely want to match your transaction against all the rows in the database. To return only relevant candidate records, add a WHERE clause using variable substitution. Variable substitution refers to a special notation that you will use to cause the Candidate Selection engine to replace the variable with the actual data from your suspect record.

To use variable substitution, enclose the field name in braces preceded by a dollar sign using the form ${FieldName}. For example, the following query will return only those records that have a value in Cust_Zip that matches the value in PostalCode on the suspect record.

SELECT Cust_Name, Cust_Address, Cust_City, Cust_State,Cust_Zip 
FROM Customer_Table 
WHERE Cust_Zip = ${PostalCode}; 

For SQL 2000, the data type needs to be identical to the data type for Candidate Finder. The JDBC driver sets the Candidate Finder input variable (Ex: ${MatchKey}) that is used in the WHERE clause to a data type of nVarChar(4000). If the data in the database is set to a data type of VarChar, SQL Server will ignore the index on the database. If the index is ignored, then performance will be degraded. Therefore, use the following query for SQL 2000:

SELECT Cust_Name, Cust_Address, Cust_City, Cust_State,Cust_Zip 
FROM Customer_Table 
WHERE Cust_Zip = CAST(${PostalCode} AS VARCHAR(255));