Analyzing Results - discovery - 23.1

Spectrum Discovery Guide

The Analyze Results page displays the generated nested Boolean match rule and the potential match key components that the system learned from the information you provided. You can review the match rule and export it to the match rules repository under the Match Rules Management option in Enterprise Designer, where your batch jobs can then consume it. After review, the potential match key components can be used in the Match Key Generator stage of Enterprise Designer.

Match Rule Tab

This tab displays the match rule and its associated conditions, along with the value of each attribute: Threshold, Scoring Method, Algorithms, Missing Data, and Matching Method.
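
To make the idea of a nested Boolean match rule concrete, here is a minimal Python sketch of how such a rule could be evaluated against a pair of records. It is not the product's implementation: the `Condition` and `Group` classes, the `missing_matches` flag (a stand-in for a Missing Data policy), the field names, and the use of `difflib.SequenceMatcher` as the similarity score are all assumptions made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Union


@dataclass
class Condition:
    """Leaf of the rule: compare one field against a threshold (hypothetical)."""
    field: str
    threshold: float               # minimum similarity score, 0 to 100
    missing_matches: bool = False  # assumed policy when either value is missing


@dataclass
class Group:
    """Inner node of the rule: combines child results with AND or OR."""
    operator: str                  # "AND" or "OR"
    children: list


def similarity(a: str, b: str) -> float:
    # Stand-in scoring method; the product offers its own algorithms.
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def matches(node: Union[Condition, Group], left: dict, right: dict) -> bool:
    if isinstance(node, Condition):
        a, b = left.get(node.field), right.get(node.field)
        if not a or not b:
            return node.missing_matches
        return similarity(a, b) >= node.threshold
    results = (matches(child, left, right) for child in node.children)
    if node.operator == "AND":
        return all(results)
    if node.operator == "OR":
        return any(results)
    raise ValueError(f"Unknown operator: {node.operator}")


# last_name must be similar AND (phone OR email must be identical)
rule = Group("AND", [
    Condition("last_name", 75),
    Group("OR", [Condition("phone", 100), Condition("email", 100)]),
])
```

With this sketch, `matches(rule, a, b)` returns `True` only when the last names are similar enough and at least one of the nested phone or email conditions also holds.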

You can also use the Linked Match Key drop-down to select the match key to link to the match rule. You can select a match key available in the repository or one of the match keys suggested by the system. If you select a system-suggested match key, give it a name at the time of publishing, because the system-suggested default names are Match Key 1, Match Key 2, and so on.
Note: You can unlink the match key at any time by clearing the match key. You must publish the rule again for the change to take effect.

To preview the match key, click the Preview Match Key button. A new window opens, where you can modify the match key available in the repository or use the system-suggested match key as needed.

Note: Spectrum Smart Data Quality (SDQ) is integrated with Data Stewardship, which helps you improve the match rules based on the exception handling done in Data Stewardship. When you save manual updates to records in Data Stewardship, a notification appears on the Projects page in SDQ for the project you modified.
Note: The Data Stewardship Data Quality page provides information about trends across data flows and stages.

Match Key Tab

This tab displays potential match key components in a tabular format. For each component, it shows the Column in which the component was detected and the Algorithm to use. The Average Group Size is the average size of the groups that the match key produces across the complete dataset. The average is computed per match key combination to avoid any loss of accuracy. You can review the potential match key components and use any that suit your scenario by adding them to the Match Key Generator stage of Enterprise Designer.
Note: The following algorithms are currently supported.
Algorithm: Description
Soundex: Returns a Soundex code of the selected fields. Soundex produces a fixed-length code based on the English pronunciation of a word.
Metaphone: Returns a Metaphone coded key of the selected fields. Metaphone is an algorithm for coding words using their English pronunciation.
Consonant: Returns the specified fields with consonants removed.
Substring: Returns a specified portion of the selected field.
Nysiis: Phonetic code algorithm that matches an approximate pronunciation to an exact spelling and indexes words that are pronounced similarly. Part of the New York State Identification and Intelligence System.

For example, suppose you are looking for someone's information in a database of people. You believe that the person's name sounds like "John Smith", but it is in fact spelled "Jon Smyth". If you searched for an exact match for "John Smith", no results would be returned. However, if you index the database using the NYSIIS algorithm and search using the NYSIIS algorithm again, the correct match is returned because the algorithm indexes both "John Smith" and "Jon Smyth" as "JAN SNATH".

Double Metaphone: Returns a code based on a phonetic representation of the characters in the selected fields. Double Metaphone is an improved version of the Metaphone algorithm and attempts to account for the many irregularities found in different languages.
MD5: A message digest algorithm that produces a 128-bit hash value. This algorithm is commonly used to check data integrity.
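
To give a feel for what three of these algorithms return, the following Python sketch implements the standard American Soundex rules, a 1-based inclusive substring, and an MD5 digest using the standard library. It is an illustration only: the function names are hypothetical, and the Match Key Generator stage may differ in details such as how it handles case, whitespace, or non-alphabetic characters.

```python
import hashlib

# Standard American Soundex letter-to-digit groups.
_SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(word: str) -> str:
    """Return the 4-character Soundex code of a single word."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        digit = _SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so a repeated code counts again
    return (code + "000")[:4]


def substring_key(value: str, start: int, end: int) -> str:
    """Return characters start..end, 1-based and inclusive (assumed convention)."""
    return value[start - 1:end]


def md5_key(value: str) -> str:
    """Return the 128-bit MD5 digest as 32 hexadecimal characters."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


print(soundex("Smith"), soundex("Smyth"))   # both S530
print(substring_key("5551234567", 1, 7))    # 5551234
```

Because "Smith" and "Smyth" produce the same Soundex code, records with either spelling would fall into the same group when that code is used as a match key component.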
Example: This table displays a potential match key, Match Key 1, detected in the phone column with an average group size of 2. The algorithm to use is SUBSTRING (1, 7), where 1 is the starting index and 7 is the last index; specify these values in the options of the Match Key Generator stage. The starting index is fixed at 1 for all potential match key components.
Match Key     Column   Algorithm          Average Group Size
Match Key 1   phone    SUBSTRING (1, 7)   2
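
The example above can be sketched in Python as follows. The sketch groups records by the first seven characters of a phone value and reports the mean number of records per distinct key. The sample records and function names are hypothetical, and treating Average Group Size as "records divided by distinct keys" is an assumption about how the figure is derived, not a statement of the product's exact calculation.

```python
from collections import defaultdict


def substring_key(value: str, start: int, end: int) -> str:
    # 1-based, inclusive indices, mirroring SUBSTRING (1, 7) in the example.
    return value[start - 1:end]


def group_by_key(records: list, column: str, start: int, end: int) -> dict:
    """Map each substring key to the records that share it."""
    groups = defaultdict(list)
    for record in records:
        groups[substring_key(record[column], start, end)].append(record)
    return dict(groups)


def average_group_size(records: list, column: str, start: int, end: int) -> float:
    """Mean number of records per distinct key (assumed definition)."""
    groups = group_by_key(records, column, start, end)
    if not groups:
        return 0.0
    return len(records) / len(groups)


# Hypothetical sample data: two pairs of phone numbers sharing a 7-character prefix.
sample = [
    {"name": "John Smith", "phone": "5551234567"},
    {"name": "Jon Smyth", "phone": "5551234999"},
    {"name": "Ann Lee", "phone": "5559876543"},
    {"name": "Anne Lee", "phone": "5559876000"},
]

print(average_group_size(sample, "phone", 1, 7))  # 2.0
```

A small average group size means each key narrows the comparison to a few candidate records; a very large one suggests the key is too coarse to be useful on its own.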

Based on your actions (the variations present in the uploaded sample data, the columns selected for matching, and the records tagged), the system identifies patterns in your data to provide a match rule and potential match key components. It is recommended that you test the generated results on your dataset.