Using Probabilistic Matching

Product type: Software
Portfolio: Verify
Product family: Trillium
Product: Trillium > Trillium Discovery; Trillium > Trillium Quality
Version: 17.1
Language: English
Product name: Trillium Quality and Discovery
Title: Trillium Control Center

First publish date: 2008

Probabilistic matching takes into account how frequently data values occur among a group of potential match candidates. It is useful when you want to include the minimum frequency of data in the match rule, or when you want to extend matches to include frequently occurring data that would otherwise fail to match.

Probabilistic Matching Process

Before records are matched, you obtain frequency counts of the data and append a flag, such as a zero (0), to the records, signaling that those records have a high probability of matching. During the linking process, the Relationship Linker or Reference Matcher uses a second match field that contains the probability flag. If two records being compared contain the flag value (0) and an "A" grade is returned, the Relationship Linker or Reference Matcher automatically converts the "A" grade to an "M" grade. It then returns a match result based on the grade pattern that uses the "M" grade. You decide whether grade patterns that contain "M" are P (matches) or S (suspect matches). If no pattern contains "M", the "M" is treated as an "A".
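The grade conversion described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the Relationship Linker's actual implementation: the function name `convert_grade` and its parameters are hypothetical. The `require_both` parameter is an assumption that follows the wording of this section, where both compared records carry the flag.

```python
def convert_grade(grade, flag_left, flag_right, flag_value="0", require_both=True):
    """Return "M" in place of "A" when the probabilistic flag applies.

    Hypothetical helper: grade is the grade returned for the match field,
    flag_left and flag_right are the flag attribute values of the two
    records being compared.
    """
    flagged = [flag_left == flag_value, flag_right == flag_value]
    # Assumption: by default both records must carry the flag value.
    flag_applies = all(flagged) if require_both else any(flagged)
    if grade == "A" and flag_applies:
        return "M"
    # Every other grade is returned unchanged.
    return grade


print(convert_grade("A", "0", "0"))  # both records flagged
print(convert_grade("A", "0", ""))   # only one record flagged
print(convert_grade("B", "0", "0"))  # grade is not "A"
```

Only an "A" grade is ever converted; lower grades pass through unchanged regardless of the flag.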

Note: You can use all of the Relationship Linker routines except those that require multiple match fields, such as SUBSTRING.

Note: The value of the flag attribute must consist of ASCII characters.

To use probabilistic matching, perform these tasks in order:

Step 1: Analyze the entity for input and obtain frequency counts of data.

Step 2: Scan and flag the records that contain the frequent values.

Step 3: Modify the field list in the Relationship Linker Rules Editor to use the flag attribute as a secondary field.

Step 4: Modify the grade pattern list in the Relationship Linker Rules Editor to include M grades.
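Steps 1 and 2 are performed with the product's own analysis and Transformer tools. Purely as an illustration of the logic, the following Python sketch counts value frequencies and flags the records that hold frequent values. The function name `flag_frequent_values` and the `min_share` threshold are hypothetical, and the attribute names are borrowed from the example in this topic.

```python
from collections import Counter


def flag_frequent_values(records, attribute, min_share=0.5,
                         flag_attribute="P_FLAG", flag_value="0"):
    """Return copies of records with a flag set on frequent values.

    Assumption: a value is "frequent" when its share of all records is at
    least min_share. The product does not define this threshold; it is a
    placeholder for whatever cutoff you choose after the analysis.
    """
    # Step 1: frequency counts of the attribute values.
    counts = Counter(record[attribute] for record in records)
    total = len(records)
    frequent = {value for value, n in counts.items() if n / total >= min_share}

    # Step 2: scan the records and flag those that contain a frequent value.
    flagged = []
    for record in records:
        out = dict(record)
        out[flag_attribute] = flag_value if record[attribute] in frequent else ""
        flagged.append(out)
    return flagged


records = [
    {"PR_GIVEN_NAME1_RECODED_01": "John"},
    {"PR_GIVEN_NAME1_RECODED_01": "John"},
    {"PR_GIVEN_NAME1_RECODED_01": "Maria"},
    {"PR_GIVEN_NAME1_RECODED_01": "John"},
    {"PR_GIVEN_NAME1_RECODED_01": "Wei"},
]

for row in flag_frequent_values(records, "PR_GIVEN_NAME1_RECODED_01"):
    print(row)
```

In this sample, "John" appears in three of five records, so only those three records receive the flag value "0".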

Example

You run an analysis on the input entity and find that a high percentage of records contain "John" in the PR_GIVEN_NAME1_RECODED_01 attribute, so you decide to use this attribute for probabilistic matching. In the Transformer, you add an attribute called P_FLAG to the output schema and create an attribute scan. The attribute scan examines PR_GIVEN_NAME1_RECODED_01 and, if it finds "John", returns a zero (0) in the P_FLAG attribute as a probabilistic flag. You run the Transformer to generate the flag.

Next, in the Relationship Linker Rules Editor, you add the P_FLAG attribute to the field list for PR_GIVEN_NAME1_RECODED_01. This tells the Relationship Linker or Reference Matcher to convert any grade "A" to a grade "M" if one or both of the records being compared have a value of zero (0) in the P_FLAG attribute.

If the Relationship Linker or Reference Matcher compares two records that both have PR_GIVEN_NAME1_RECODED_01="John" (and consequently P_FLAG="0"), the comparison score is 100, which converts to a grade "A" and then to a grade "M".

You want records with the "M" grade to be S (suspect), so you add a pattern S210 that uses the grade "M" to the grade pattern list.

The next time you run the Relationship Linker, some records that previously hit pattern P110 now hit pattern S210, because a grade "A" is converted to an "M" when the P_FLAG attribute contains the probabilistic flag (0).
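The shift from P110 to S210 can be illustrated with a small Python sketch of a grade pattern lookup. It is a simplification under stated assumptions: the pattern list is modeled as a dictionary from a grade string to a pattern name, only the single match field from this example is graded, and the function name `resolve_pattern` is hypothetical. The fallback reflects the rule that an "M" is treated as an "A" when no pattern contains "M".

```python
def resolve_pattern(grades, pattern_list):
    """Look up the pattern name for a grade string.

    Hypothetical model: pattern_list maps grade strings (one letter per
    match field) to pattern names such as "P110" or "S210".
    """
    # If no pattern in the list uses "M", treat every "M" as an "A".
    if not any("M" in pattern_grades for pattern_grades in pattern_list):
        grades = grades.replace("M", "A")
    # Returns None when no pattern matches the grade string.
    return pattern_list.get(grades)


# Pattern list before and after adding the S210 pattern for grade "M".
before = {"A": "P110"}
after = {"A": "P110", "M": "S210"}

print(resolve_pattern("M", before))  # no "M" pattern exists yet
print(resolve_pattern("M", after))   # the "M" pattern now applies
print(resolve_pattern("A", after))   # unflagged records are unaffected
```

Records whose "A" grade was converted to "M" resolve to P110 with the first list and to S210 with the second, while records that keep their "A" grade resolve to P110 in both cases.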