Data Distribution - trillium_discovery - 17.1

Trillium Discovery Center

Product type: Software
Portfolio: Verify
Product family: Trillium
Product: Trillium > Trillium Discovery
Version: 17.1
Language: English
Product name: Trillium Discovery
Title: Trillium Discovery Center
Topic type: How Do I, Installation, Reference, Configuration, Administration, Overview
First publish date: 2008

Knowing what your data looks like is important to a successful data quality project. Data distribution analysis occurs when you create a profiled (fully loaded) data source in the Discovery Center. You can make better-informed decisions about how to profile and cleanse your data, or how to plan a data integration project, when you:

  • Verify how complete (or incomplete) your data set is
  • Recognize whether your data falls within acceptable minimum and maximum ranges
  • Understand how frequently values occur in an attribute
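
To make these three checks concrete, the following is a minimal Python sketch of how completeness, minimum and maximum values, and value frequency might be computed for a single attribute. It is an illustration only, not how the Discovery Center computes its metadata. The function name `summarize_attribute` and the convention that `None` or an empty string marks a missing value are assumptions made for this example.

```python
from collections import Counter

def summarize_attribute(values):
    """Summarize completeness, range, and value frequency for one attribute.

    Assumption for this sketch: None or "" represents a missing value.
    """
    present = [v for v in values if v is not None and v != ""]
    total = len(values)
    return {
        # Fraction of rows that actually hold a value
        "completeness": len(present) / total if total else 0.0,
        # Smallest and largest populated values
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        # How often each distinct value occurs
        "frequency": Counter(present),
    }

# Hypothetical sample attribute with two missing values
ages = [34, 29, None, 34, 51, "", 29, 34]
summary = summarize_attribute(ages)
print(summary)
```

In this sample, six of the eight rows are populated, so a completeness threshold or an expected age range could be checked directly against the returned values.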

Data distribution analysis includes:

  • Data Patterns. Patterns describe the shape of a data value in an attribute and help identify format deviations, misspellings, and duplications in your data.
  • Data Type Structure. Knowing the type of data you load into the repository helps you understand the structure of your data, including what percentage of the data consists of string, integer, decimal, and null values.
  • Data Values. Each attribute and data row contains a set of values. Important metadata about data values includes how complete the values are, how frequently each value occurs, and the range over which the values are distributed.
  • Standard Deviation. Standard deviation measures how far the values of a numeric attribute are dispersed from the attribute's average (mean) value.
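
The measures listed above can be sketched in a few lines of Python. This is a hedged illustration, not the Discovery Center implementation. The pattern symbols (`9` for a digit, `A` or `a` for a letter), the type inference rules, the helper names, and the choice of population standard deviation are all assumptions made for this example; the product's own notation and formulas may differ.

```python
import statistics
from collections import Counter

def value_pattern(value):
    """Return the shape of a value: digit -> 9, letter -> A or a, other kept as-is.

    The symbols used here are hypothetical, chosen only for illustration.
    """
    shape = []
    for ch in value:
        if ch.isdigit():
            shape.append("9")
        elif ch.isalpha():
            shape.append("A" if ch.isupper() else "a")
        else:
            shape.append(ch)
    return "".join(shape)

def infer_type(value):
    """Classify a raw text value as null, integer, decimal, or string."""
    if value is None or value == "":
        return "null"
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "decimal"
    except ValueError:
        return "string"

def type_percentages(values):
    """Percentage of values falling into each inferred data type."""
    counts = Counter(infer_type(v) for v in values)
    return {t: 100 * n / len(values) for t, n in counts.items()}

# Data patterns: the most common shape stands out, deviations are easy to spot
phones = ["555-1234", "555-9876", "5551234", "555-12A4"]
patterns = Counter(value_pattern(p) for p in phones)

# Data type structure: share of integer, decimal, string, and null values
raw = ["12", "7", "3.5", "abc", ""]
types = type_percentages(raw)

# Standard deviation: dispersion around the mean (population form assumed)
amounts = [2, 4, 4, 4, 5, 5, 7, 9]
spread = statistics.pstdev(amounts)

print(patterns.most_common())
print(types)
print(spread)
```

In the phone sample, two values share the shape `999-9999`, while `9999999` and `999-99A9` surface as format deviations worth reviewing.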