Examining Duplicate Data Rows and Values - trillium_discovery - 17.1

Trillium Discovery Center

First publish date: 2008

When you import data into a repository, the data is analyzed to identify structural information such as source type, minimum and maximum row length, the count of duplicate values, and the count of rows that are potential duplicates of other rows in an attribute. Duplicate data rows indicate non-unique values. Review duplicate values and duplicate data rows to identify candidates for remediation.

To examine duplicate rows, you drill down from a data source to view the metadata for each duplicate value, and then open the data rows that contain that value.

To examine potential duplicate data rows

  1. Open a data source.
  2. Click the Source Metadata tab.
  3. Double-click Duplicate Rows if the Value column shows a number greater than 0. The Duplicates tab opens showing the following metadata for each potential duplicate value:

    • Duplicate ID: Indicates whether a data row is a duplicate of one or more other data rows in the data source. A value of 1 indicates that the row is in the first group of duplicates found, a value of 2 indicates that the row is in the second group of duplicates found, and so on. Data rows with a value of 0 are unique.
    • Duplicate Value Frequency: The number of times the duplicate value occurs.
    • Distribution %: The percentage of the attribute's values that match the duplicate data value.

  4. Examine the Duplicate Value Frequency column. The value indicates the number of data rows that contain the potential duplicate value. An attribute can have one or more potential duplicate values, and each is assigned a unique Duplicate ID value.
  5. Double-click a row whose values you want to examine. The Data Rows: Duplicate ID value tab opens, showing the rows that contain the duplicate value. Examine the rows to verify whether they are true duplicates.
    Note: If the rows contain no values, then the shared duplicate value is a null (empty) value.
  6. Optional: Export selected rows to your local or server system as a .CSV file. For more information, see Exporting Tab Rows.
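
The Duplicate ID, Duplicate Value Frequency, and Distribution % metadata described in the procedure above can be illustrated with a minimal Python sketch. This is not how Trillium Discovery computes these values internally. It assumes that two rows are duplicates only when every field matches exactly, and that Distribution % is the group's share of all rows. The function names `assign_duplicate_ids` and `summarize_duplicates` are hypothetical and are not part of the product.

```python
from collections import Counter, defaultdict


def assign_duplicate_ids(rows):
    """Return one ID per row: 0 for unique rows, 1, 2, ... for duplicate groups.

    Assumption: rows are duplicates only when all fields match exactly.
    """
    positions = defaultdict(list)
    for index, row in enumerate(rows):
        positions[tuple(row)].append(index)

    ids = [0] * len(rows)
    next_id = 1
    # Dicts keep insertion order, so groups are numbered in the order found.
    for indexes in positions.values():
        if len(indexes) > 1:
            for index in indexes:
                ids[index] = next_id
            next_id += 1
    return ids


def summarize_duplicates(rows):
    """Return frequency and distribution percentage for each duplicate group."""
    ids = assign_duplicate_ids(rows)
    counts = Counter(dup_id for dup_id in ids if dup_id != 0)
    total = len(rows)
    return [
        {
            "duplicate_id": dup_id,
            "frequency": frequency,
            # Assumption: percentage of all rows that fall in this group.
            "distribution_pct": round(100.0 * frequency / total, 2),
        }
        for dup_id, frequency in sorted(counts.items())
    ]


# Illustrative data only; the rows with empty fields model the null case
# mentioned in the note under step 5.
rows = [
    ("Ann", "Boston"),
    ("Bob", "Denver"),
    ("Ann", "Boston"),
    ("", ""),
    ("Cy", "Austin"),
    ("", ""),
    ("Ann", "Boston"),
]

ids = assign_duplicate_ids(rows)
summary = summarize_duplicates(rows)
# Rows that share Duplicate ID 2, comparable to opening the data rows for that ID.
group_two = [row for row, dup_id in zip(rows, ids) if dup_id == 2]
```

In this sample, the three matching "Ann" rows form the first group and the two empty rows form the second, which mirrors the case where the shared duplicate value is null.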

Examine duplicate values

To examine duplicate values

  1. Open a data source.
  2. Click the Attribute Details tab.
  3. To see duplicate values for an attribute, do any of the following:
    • In the Attribute Name list, click the name of the attribute you want to examine, and then do either of the following:
      ◦ In the Values > Completeness table, note the total duplicate values for the attribute.
      ◦ Click the attribute_name tab and note the value displayed in the Value column for the Duplicates metadata.
    • Click the Attribute Summary tab and note the value in the Duplicates column for each attribute. Any attribute with a value greater than zero (0) includes one or more duplicate values.
  4. Optional: Export selected rows to your local or server system as a .CSV file. For more information, see Exporting Tab Rows.
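
To make the per-attribute Duplicates figure more concrete, the following Python sketch counts, for each attribute, how many distinct values occur more than once. This is only an illustration. The product may count duplicates differently (for example, by counting rows rather than distinct values), and the function name `duplicate_value_counts` and the sample records are hypothetical.

```python
from collections import Counter


def duplicate_value_counts(records):
    """Map each attribute name to the number of distinct values seen more than once.

    Assumption: every record has the same attribute names as the first record.
    """
    if not records:
        return {}
    summary = {}
    for name in records[0]:
        frequencies = Counter(record[name] for record in records)
        summary[name] = sum(1 for count in frequencies.values() if count > 1)
    return summary


# Illustrative data only.
records = [
    {"id": 1, "city": "Boston", "state": "MA"},
    {"id": 2, "city": "Denver", "state": "CO"},
    {"id": 3, "city": "Boston", "state": "MA"},
    {"id": 4, "city": "Austin", "state": "TX"},
    {"id": 5, "city": "Denver", "state": "CO"},
]

summary = duplicate_value_counts(records)
# Attributes with a count greater than zero contain one or more duplicate values.
flagged = [name for name, count in summary.items() if count > 0]
```

Here `id` has no repeated values, while `city` and `state` each have two, so only the last two attributes are flagged for review.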