Examining Patterns - trillium_discovery - 17.1

Trillium Discovery Center

Product type: Software
Portfolio: Verify
Product family: Trillium
Product: Trillium > Trillium Discovery
Version: 17.1
Language: English
Product name: Trillium Discovery
Title: Trillium Discovery Center
Topic type: Overview, Administration, Configuration, Installation, Reference, How Do I
First publish date: 2008

Understanding data patterns gives you the information you need to make decisions about standards used in your data quality projects.

Examining patterns helps identify format deviations and anomalies in your data. Drilling down to data rows from a pattern adds context to help determine whether a pattern is correct or not. You can then export the data rows to your local or server system.
Guidelines:

When you examine patterns, you drill down from a data source to see the pattern values and pattern metadata for a selected attribute. You can then drill down to see the attribute rows that contain the pattern.

Note the following guidelines when examining each pattern type:
  • Character Patterns. If you have attributes that need to conform to a fixed format, such as a date or currency format, examine character patterns to find inconsistencies and errors. See Character Pattern Types for information about choices for pattern encoding.
  • Masks. You can examine an attribute for a unique mask pattern. A mask is a description of a word, phrase, or number that identifies each character as alphabetic, numeric, or a special character. The mask pattern is the shape produced by this encoding and shows the qualities shared by the words, phrases, or numbers it describes. Before you examine a mask, note the encoding conventions used by the mask pattern.
  • Metaphones. If an attribute has a large number of values compared to the number of metaphones, you may have multiple misspelled values.
  • Soundexes. Discovery Center groups data values that have been analyzed as having similar sounds and identifies them as soundexes. Examining soundexes helps you find duplicated data and misspellings.
    Note: Soundexes are not available for numeric values or non-ASCII encoded data.
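
To illustrate how grouping values by sound can surface misspellings, here is a minimal Python sketch. It assumes the classic American Soundex algorithm; this document does not specify the algorithm Discovery Center uses, so the function names `soundex` and `group_by_soundex` and the resulting codes are illustrative only.

```python
from collections import defaultdict

# Classic American Soundex digit table (an assumption; the product's own
# soundex encoding is not described in this document).
_CODES = {}
for _digit, _letters in (("1", "BFPV"), ("2", "CGJKQSXZ"), ("3", "DT"),
                         ("4", "L"), ("5", "MN"), ("6", "R")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(value):
    """Return a 4-character Soundex code, or None if the value has no ASCII letters."""
    letters = [c for c in value.upper() if "A" <= c <= "Z"]
    if not letters:
        return None  # e.g. purely numeric values get no code
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # vowels reset prev, so a repeated code after a vowel counts
    return (code + "000")[:4]


def group_by_soundex(values):
    """Group distinct values under their Soundex code, skipping uncodable values."""
    groups = defaultdict(set)
    for v in values:
        code = soundex(v)
        if code is not None:
            groups[code].add(v)
    return dict(groups)


names = ["Robert", "Rupert", "Smith", "Smyth", "12345"]
groups = group_by_soundex(names)
# groups["S530"] == {"Smith", "Smyth"}: two spellings that share one code
```

A code that collects several distinct spellings, such as `S530` here, is a candidate for review as a duplicate or misspelling.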

Examine all patterns in an attribute

To examine all patterns in an attribute

  1. Open a data source.
  2. Click the Attribute Details tab.
  3. In the Attribute Name list, select the attribute that contains the patterns you want to examine. A tab named for the attribute opens below the Data Source: Name panel showing an overview of the attribute's metadata. Rows showing metadata for discovered character patterns, masks, soundexes, and metaphones are highlighted in blue. Note the value. This is the number of unique patterns in the attribute.
  4. Double-click a pattern. The pattern: attribute_name tab opens showing all patterns of the selected type in the attribute, along with metadata such as pattern frequency and distribution %.
  5. To see pattern values, double-click a row. The Values: Selected pattern tab opens showing the pattern values, frequency, distribution %, length, and other metadata.
  6. To see data rows that contain the pattern values, double-click a row. The Data Rows: Selected pattern Value tab opens.
  7. (Optional) Export selected rows to your local or server system as a .CSV file. You can export all rows to your server system. See Exporting Tab Rows.
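
The drill-down in the steps above (patterns, then frequency and distribution %, then the rows behind a pattern) can be approximated outside the product with a short Python sketch. The encoding in `char_pattern` (digits become `9`, letters become `a`, other characters are kept) is a hypothetical stand-in, not the encoding described in Character Pattern Types, and distribution % is assumed here to be the share of rows that carry the pattern.

```python
from collections import Counter


def char_pattern(value):
    """Hypothetical character-pattern encoding: digit -> '9', letter -> 'a', else unchanged."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("a")
        else:
            out.append(ch)
    return "".join(out)


def pattern_profile(values, pattern_of=char_pattern):
    """Return (pattern, frequency, distribution %) tuples, most frequent first."""
    counts = Counter(pattern_of(v) for v in values)
    total = sum(counts.values())
    return [(pattern, freq, round(100.0 * freq / total, 2))
            for pattern, freq in counts.most_common()]


def rows_with_pattern(values, pattern, pattern_of=char_pattern):
    """Return (row index, value) pairs whose encoded pattern equals `pattern`."""
    return [(i, v) for i, v in enumerate(values) if pattern_of(v) == pattern]


dates = ["2024-01-15", "2024-02-30", "15/01/2024", "2024-03-01"]
profile = pattern_profile(dates)
# profile == [("9999-99-99", 3, 75.0), ("99/99/9999", 1, 25.0)]
outliers = rows_with_pattern(dates, "99/99/9999")
# outliers == [(2, "15/01/2024")]
```

The low-frequency pattern points to the format deviation. Note that a value such as `2024-02-30` matches the dominant pattern but is not a valid date, which is why looking at the underlying rows adds context.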

Examine pattern values, metadata, and associated data rows

To examine pattern values, metadata, and associated data rows

  1. Open a data source.
  2. Click the Attribute Details tab.
  3. In the Attribute Name list, select the attribute that contains the patterns you want to examine. The charts and graphs populate with details for that attribute's values, patterns, and structure.
  4. In the Patterns drop-down list, select the pattern type you want to view: Character Patterns, Masks, Metaphones, or Soundexes. The Patterns distribution chart populates with up to 10 bars. Each bar corresponds to a unique pattern discovered in the attribute.
    Note: For more information about using the charts and graphs, see Data Visualization Tools.
  5. Hover over a bar to see the pattern value and metadata, including frequency, distribution %, value length (character patterns only), mask stats (masks only), and value count.
    Mask stats show the total number of alphabetic (A) and numeric (N) characters in the mask pattern. If there are no alphabetic or numeric characters, a question mark (?) displays.
  6. Double-click a bar to open the Values: Selected Pattern tab showing the values for the pattern, along with the following metadata:

    Column Name           Description
    Value                 The word, phrase, or number represented by the pattern.
    Frequency             The number of times the pattern occurs in the attribute.
    Distribution %        The measure of how much of the attribute contains the pattern.
    Length                The length of the pattern in characters.
    Mask Recode           All mask values. (Mask pattern only.)
    Mask Recoded Value    Value that corresponds to the mask, with the mask recode applied. (Mask pattern only.)
    Mask Recoded Status   Either Yes or No. Yes indicates a recoded mask. No indicates no recode has been applied. (Mask pattern only.)
  7. Examine the data values that match the pattern code(s) you selected.
  8. To see the rows in the attribute that contain the selected pattern value, double-click a row. The Data Rows: Selected Pattern Value tab opens.
  9. (Optional) Export selected rows to your local or server system as a .CSV file. You can export all rows to your server system. See Exporting Tab Rows.
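
As a rough illustration of masks and the mask stats shown when you hover over a bar, the following Python sketch encodes a value and counts its alphabetic and numeric characters. The letters `A` and `N` follow the mask stats description; keeping special characters unchanged and the `A=… N=…` display format are assumptions, and the actual mask encoding conventions used by Discovery Center may differ.

```python
def mask(value):
    """Encode each character: alphabetic -> 'A', numeric -> 'N', special kept as-is (assumed)."""
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("A")
        elif ch.isdigit():
            out.append("N")
        else:
            out.append(ch)
    return "".join(out)


def mask_stats(mask_pattern):
    """Summarize A and N counts in a mask; return '?' when the mask has neither."""
    alpha = mask_pattern.count("A")
    numeric = mask_pattern.count("N")
    if alpha == 0 and numeric == 0:
        return "?"
    return f"A={alpha} N={numeric}"  # hypothetical display format


plate = mask("AB-1234")      # "AA-NNNN"
stats = mask_stats(plate)    # "A=2 N=4"
empty = mask_stats(mask("--"))  # "?"
```

Values with different content but the same shape, such as `AB-1234` and `XY-9876`, produce the same mask, which is what makes a rare mask stand out.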