When you import data into a repository, the data is analyzed to identify structural
information such as source type, minimum and maximum row length, the count of duplicate
values, and the count of rows that are potential duplicates of other rows in an attribute.
Duplicate data rows indicate non-unique values. Review duplicate values and duplicate data
rows to identify candidates for remediation.
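As a rough illustration of the two duplicate counts described above, the following Python sketch profiles a single attribute. It assumes that the count of duplicate values means the number of distinct values that occur more than once, and that the count of potential duplicate rows means the number of rows holding such a value. The function name `profile_duplicates` is hypothetical, and the product's own analysis may compute these figures differently.

```python
from collections import Counter

def profile_duplicates(values):
    """Return (duplicate_value_count, duplicate_row_count) for one attribute.

    Assumption (not confirmed by the product): a duplicate value is a distinct
    value that occurs more than once, and a potential duplicate row is any
    row that holds such a value.
    """
    counts = Counter(values)  # value -> number of rows holding it
    duplicate_values = [v for v, n in counts.items() if n > 1]
    duplicate_rows = sum(counts[v] for v in duplicate_values)
    return len(duplicate_values), duplicate_rows

city = ["NY", "CA", "NY", "TX", "CA", "NY"]
print(profile_duplicates(city))  # (2, 5)
```

Here "NY" and "CA" are the two duplicate values, and the five rows holding them are the potential duplicates; the single "TX" row is unique.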
To examine duplicate rows, drill down from a data source to view metadata for each
duplicate value, and then open the data rows that contain the duplicate values.
To examine potential duplicate data rows
- Open a data source.
- Click the Source Metadata tab.
- If the Value column for Duplicate Rows shows a number greater than 0, double-click
Duplicate Rows. The Duplicates tab opens, showing the following metadata for each
potential duplicate value:
Column Name | Description
Duplicate ID | Indicates whether a data row is a duplicate of one or more other data rows in the data source. A value of 1 indicates that the row is in the first group of duplicates found, a value of 2 indicates that the row is in the second group, and so on. Rows with a value of 0 are unique.
Duplicate Value Frequency | The number of times the duplicate value occurs.
Distribution % | The percentage of the attribute's values that match the duplicate value.
- Examine the Duplicate Value Frequency column. The value indicates the number of data
rows that contain the potential duplicate value. An attribute can have one or more
potential duplicate values, and each is assigned a unique Duplicate ID value.
- Double-click a row whose values you want to examine. The Data Rows: Duplicate ID
value tab opens, showing the rows that contain the duplicate value. Examine the rows
to verify whether the information is actually a duplicate.
Note: If the rows contain no values, then the shared duplicate value is a null
(empty) value.
- Optional: Export selected rows to your local or server system as a .CSV file. For more
information, see Exporting Tab Rows.
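The Duplicate ID, Duplicate Value Frequency, and Distribution % columns, together with the drill-down from a Duplicate ID to its data rows, can be approximated with the minimal Python sketch below. It is not the product's implementation. It assumes that groups are numbered in the order their value first appears, that Distribution % is a group's frequency divided by the total row count, and that a missing value (None) and an empty string count as the same null value. The function names `null_key`, `duplicate_groups`, and `rows_for_duplicate_id` are hypothetical.

```python
from collections import Counter

def null_key(value):
    """Map None and "" to one shared null key (assumed null handling)."""
    return None if value is None or value == "" else value

def duplicate_groups(rows, attribute):
    """Assign a Duplicate ID to each row and summarize each duplicate group.

    Assumptions: groups are numbered 1, 2, ... in order of first appearance,
    unique rows get 0, and Distribution % = frequency / total rows * 100.
    """
    keys = [null_key(row.get(attribute)) for row in rows]
    counts = Counter(keys)
    group_of = {}  # duplicate value -> Duplicate ID (dicts keep insertion order)
    row_ids = []   # Duplicate ID for each row, in row order
    for key in keys:
        if counts[key] > 1:
            group_of.setdefault(key, len(group_of) + 1)  # next group number
            row_ids.append(group_of[key])
        else:
            row_ids.append(0)  # value occurs once: the row is unique
    summary = [
        {"duplicate_id": gid, "value": key, "frequency": counts[key],
         "distribution_pct": 100.0 * counts[key] / len(rows)}
        for key, gid in group_of.items()
    ]
    return row_ids, summary

def rows_for_duplicate_id(rows, row_ids, duplicate_id):
    """Drill down: return the rows assigned to one Duplicate ID."""
    return [row for row, rid in zip(rows, row_ids) if rid == duplicate_id]

rows = [
    {"id": 1, "city": "NY"},
    {"id": 2, "city": ""},
    {"id": 3, "city": "CA"},
    {"id": 4, "city": "NY"},
    {"id": 5, "city": None},
    {"id": 6, "city": "NY"},
]
row_ids, summary = duplicate_groups(rows, "city")
print(row_ids)  # [1, 2, 0, 1, 2, 1]
for group in summary:
    print(group["duplicate_id"], group["value"], group["frequency"],
          round(group["distribution_pct"], 1))
# 1 NY 3 50.0
# 2 None 2 33.3
print([row["id"] for row in rows_for_duplicate_id(rows, row_ids, 2)])  # [2, 5]
```

In this example, Duplicate ID 2 groups rows 2 and 5, whose city is empty or missing; this is the situation described in the note about rows that contain no values.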