Building a Match Rule

Spectrum Data Quality Guide

Version
23.1
First publish date
2007
Last publication date
2024-03-04

Match rules are used in Interflow Match, Intraflow Match, and Transactional Match to define the criteria that determine if one record matches another. Match rules specify the fields to compare, how to compare the fields, and a hierarchy of comparisons for complex matching rules.

You can build match rules in Interflow Match, Intraflow Match, and Transactional Match. You can also build match rules in the Enterprise Designer Match Rule Management tool. Building a rule in the Match Rule Management tool makes the rule available to use in any dataflow, and also makes it available to other users. Building a match rule in one of the matcher stages makes the rule available only for that stage, unless you save the rule by clicking the Save button, which makes it available to other stages and users.

  1. Open Enterprise Designer.
  2. Do one of the following:
    • If you want to define a match rule in Interflow Match, Intraflow Match, or Transactional Match, double-click the match stage for which you want to define a match rule. In the Load match rule field, choose a predefined match rule as a starting point. If you want to start with a blank match rule, click New.
    • If you want to define a match rule in the Match Rule Management tool, select Tools > Match Rule Management. If you want to use an existing rule as a starting point for your rule, check the Copy from box and select the rule to use as a starting point.
  3. Specify the dataflow fields you want to use in the match rule as well as the match rule hierarchy.
    1. Click Add Parent.
    2. Type a name for the parent. The name must be unique and cannot be the name of a field. The first parent in the hierarchy is used as the match rule name in the Load match rule field. All custom match rules that you create and predefined rules that you modify are saved with the word "Custom" prepended to the name.
    3. Click Add Child. A drop-down menu appears in the rule hierarchy. Select a field to add to the parent.
      Note: All children under a parent must use the same logical operator. To use different logical operators between fields, you must first create intermediate parents.
    4. Repeat to complete your matching hierarchy.
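The note about intermediate parents can be pictured with a small data structure. The following Python sketch is illustrative only and is not Spectrum's internal rule format: the `Parent` and `Child` classes, the `describe` helper, and the field names (`LastName`, `AddressLine1`, `PostalCode`) are all hypothetical. It shows how a rule that needs both AND and OR is expressed by nesting an intermediate parent, because each parent applies a single logical operator to all of its children.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Child:
    field_name: str  # hypothetical dataflow field name


@dataclass
class Parent:
    name: str  # must be unique and must not be a field name
    matching_method: str  # "all_true" (AND) or "any_true" (OR)
    children: List[Union["Parent", Child]] = field(default_factory=list)


def describe(node: Union[Parent, Child]) -> str:
    """Render the hierarchy as a logical expression."""
    if isinstance(node, Child):
        return node.field_name
    # One connector per parent: all children share the same operator.
    connector = " AND " if node.matching_method == "all_true" else " OR "
    return "(" + connector.join(describe(c) for c in node.children) + ")"


# "Household" uses AND; the intermediate parent "Location" uses OR.
rule = Parent("Household", "all_true", [
    Child("LastName"),
    Parent("Location", "any_true", [
        Child("AddressLine1"),
        Child("PostalCode"),
    ]),
])

print(describe(rule))  # (LastName AND (AddressLine1 OR PostalCode))
```

Without the intermediate `Location` parent, the three fields would all have to share one operator under `Household`.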
  4. Define parent options. Parent options are displayed to the right of the rule hierarchy when a parent node is selected.
    1. Click Match when not true to change the logical operator for the parent from AND to AND NOT. If you select this option, records will only match if they do not match the logic defined in this parent.
      Note: Checking the Match when not true option has the effect of negating the Matching Method options. For more information, see Negative Match Conditions.
    2. In the Matching Method field, specify how to determine whether a parent is a match or a non-match. Choose one of the following:
      All true
      A parent is considered a match if all children are determined to match. This method creates an "AND" connector between children.
      Any true
      A parent is considered a match if at least one child is determined to match. This method creates an "OR" connector between children.
      Based on threshold
      A parent is considered a match if the score of the parent is greater than or equal to the parent's threshold. When you select this option, the Threshold slider appears. Use this slider to specify a threshold. The scoring method determines which logical connector to use. Thresholds at the parent cannot be higher than the threshold of the children.
      Note: The threshold set here can be overridden at runtime in the Dataflow Options dialog box. Go to Edit > Dataflow Options and click Add. Expand the stage, click Top level threshold, and enter the threshold in the Default value field.
    3. In the Missing Data field, specify how to score blank data in a field. Choose one of the following:
      Ignore blanks
      Ignores the field if it contains blank data.
      Count as 0
      Scores the field as 0 if it contains blank data.
      Count as 100
      Scores the field as 100 if it contains blank data.
      Compare Blanks
      Scores the suspect and candidate fields as 100 if they both contain blank data; otherwise, scores the suspect and candidate fields as 0.
    4. In the Scoring method field, select the method used to determine the matching score. Choose one of the following:
      Weighted Average
      Uses the weight of each child to determine the average match score.
      Average
      Uses the average score of each child to determine the score of a parent.
      Maximum
      Uses the highest child score to determine the score of a parent.
      Minimum
      Uses the lowest child score to determine the score of a parent.
      Vector Summation
      Uses the vector summation of each child score to determine the score of the parent. The formula for calculation is:

      sqrt(a^2 + b^2 + c^2) / sqrt(n), where: a, b, and c are the scores of three children and n is the number of children.

      The following table shows the logical relationship between matching methods and scoring methods and how each combination changes the logic used during match processing.

      Table 1. Matching Method-to-Scoring Method Matrix
      (The Any True, All True, and Based on Threshold columns are the Matching Methods.)

      Scoring Method   | Any True | All True | Based on Threshold | Comments
      -----------------+----------+----------+--------------------+---------
      Weighted Average | n/a      | AND      | AND                | Only available when All True or Based on Threshold are selected as the Matching Method.
      Average          | n/a      | AND      | AND                |
      Vector Summation | n/a      | AND      | AND                |
      Maximum          | OR       | n/a      | OR                 | Only available when Any True or Based on Threshold are selected as the Matching Method.
      Minimum          | OR       | n/a      | OR                 |
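The scoring and matching methods described in this step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the product's implementation: the function names `parent_score` and `parent_matches` and the method-name strings are hypothetical, and the weighted average is assumed to be the sum of weight times score divided by the sum of weights. The vector summation branch follows the formula given above, generalized to n children.

```python
import math
from typing import List, Optional


def parent_score(scores: List[float], method: str,
                 weights: Optional[List[float]] = None) -> float:
    """Combine child scores (0-100) into a parent score."""
    n = len(scores)
    if method == "average":
        return sum(scores) / n
    if method == "weighted_average":
        # Assumption: equal weights when none are supplied.
        weights = weights or [1.0] * n
        return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    if method == "maximum":
        return max(scores)
    if method == "minimum":
        return min(scores)
    if method == "vector_summation":
        # sqrt(a^2 + b^2 + ...) / sqrt(n)
        return math.sqrt(sum(s * s for s in scores)) / math.sqrt(n)
    raise ValueError(f"unknown scoring method: {method}")


def parent_matches(child_matched: List[bool], matching_method: str,
                   score: float = 0.0, threshold: float = 0.0) -> bool:
    """Decide whether the parent is a match."""
    if matching_method == "all_true":
        return all(child_matched)      # AND between children
    if matching_method == "any_true":
        return any(child_matched)      # OR between children
    if matching_method == "based_on_threshold":
        return score >= threshold      # greater than or equal to
    raise ValueError(f"unknown matching method: {matching_method}")


scores = [100.0, 80.0, 60.0]
print(parent_score(scores, "average"))                             # 80.0
print(parent_score(scores, "weighted_average", [2.0, 1.0, 1.0]))   # 85.0
print(round(parent_score(scores, "vector_summation"), 2))          # 81.65
```

With the same three child scores, vector summation yields a slightly higher result than the plain average because higher scores contribute proportionately more to the sum of squares. The sketch does not enforce the availability restrictions listed in Table 1.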
  5. Define child options. Child options are displayed to the right of the rule hierarchy when a child is selected.
    1. Check the Candidate field option to map the selected child field to a field in the input file.
    2. Check the Cross match against option and select one or more items from the dropdown list to match different fields to one another between two records. If you are using the Match Rule Management tool to create or edit a match rule, there is no dropdown list; instead, enter the field names, separated by commas.
    3. Click Match when not true to change the logical operator from AND to NOT. If you select this option, the match rule will only evaluate to true if the records do not match the logic defined in this child.

      For example, if you want to identify individuals who are associated with multiple accounts, you could create a match rule that matches on name but where the account number does not match. You would use the Match when not true option for the child that matches the account number.

    4. In the Missing Data field, specify how to score blank data in a field. Choose one of the following:
      Ignore blanks
      Ignores the field if it contains blank data.
      Count as 0
      Scores the field as 0 if it contains blank data.
      Count as 100
      Scores the field as 100 if it contains blank data.
      Compare Blanks
      Scores the suspect and candidate fields as 100 if they both contain blank data; otherwise, scores the suspect and candidate fields as 0.
    5. In the Threshold field, specify the threshold that must be met at the individual field level for that field to be considered a match.
    6. In the Scoring method field, select the method used to determine the matching score. Choose one of the following:
      Weighted Average
      Uses the weight of each algorithm to determine the average match score.
      Average
      Uses the average score of each algorithm to determine the match score.
      Maximum
      Uses the highest algorithm score to determine the match score.
      Minimum
      Uses the lowest algorithm score to determine the match score.
      Vector Summation
      Uses vector summation of the score of each algorithm to determine the match score. This scoring method is useful if you want a higher match score in one or more algorithms to get proportionately represented in the final match score. The formula used for calculating the final score is:

      sqrt(a^2 + b^2 + c^2) / sqrt(n), where: a, b, and c are the scores of three different algorithms and n is the number of algorithms used.

    7. Choose one or more algorithms to use to determine if the values in the field match.
      For more information, see Algorithms to determine matching values.
  6. If you are defining a rule in Interflow Match, Intraflow Match, or Transactional Match, and you want to share the rule with other stages and/or users, click the Save button at the top of the window.
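The child options in step 5 can be illustrated with the account-number example given there. The Python sketch below is a simplified model, not the product's matching engine: `child_score`, `child_matches`, `exact`, and the option strings are hypothetical names, and the `exact` comparison stands in for whichever matching algorithms you select. It also assumes that the Missing Data options apply when either value is blank, and that Compare Blanks falls back to the algorithm when neither value is blank.

```python
from typing import Callable, Optional


def is_blank(value: Optional[str]) -> bool:
    return value is None or value.strip() == ""


def exact(a: str, b: str) -> float:
    """Stand-in algorithm: 100 for a case-insensitive exact match, else 0."""
    return 100.0 if a.strip().lower() == b.strip().lower() else 0.0


def child_score(suspect: Optional[str], candidate: Optional[str],
                missing_data: str,
                algorithm: Callable[[str, str], float]) -> Optional[float]:
    """Score one field pair; None means the field is ignored."""
    blank_s, blank_c = is_blank(suspect), is_blank(candidate)
    if blank_s or blank_c:  # assumption: either blank triggers the option
        if missing_data == "ignore_blanks":
            return None
        if missing_data == "count_as_0":
            return 0.0
        if missing_data == "count_as_100":
            return 100.0
        if missing_data == "compare_blanks":
            return 100.0 if (blank_s and blank_c) else 0.0
        raise ValueError(f"unknown missing data option: {missing_data}")
    return algorithm(suspect, candidate)


def child_matches(score: float, threshold: float,
                  match_when_not_true: bool = False) -> bool:
    """Apply the field threshold, optionally negated."""
    matched = score >= threshold
    return not matched if match_when_not_true else matched


# Same person, different accounts (hypothetical records).
suspect = {"Name": "Ann Lee", "AccountNumber": "1001"}
candidate = {"Name": "ann lee", "AccountNumber": "2002"}

name_ok = child_matches(
    child_score(suspect["Name"], candidate["Name"], "count_as_0", exact), 100.0)
account_differs = child_matches(
    child_score(suspect["AccountNumber"], candidate["AccountNumber"],
                "count_as_0", exact),
    100.0, match_when_not_true=True)

# Parent with the All true matching method: both children must evaluate to true.
print(name_ok and account_differs)  # True
```

Here the name child matches normally, while the account-number child evaluates to true only because its values do not match, which is the behavior the Match when not true option is described as providing.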