Knowing what your data looks like is important to a successful data quality project. Data distribution analysis occurs when you create a profiled (fully loaded) data source in the Discovery Center. You can make better-informed decisions about how to profile and cleanse your data, or how to plan a data integration project, when you:
- Verify how complete (or incomplete) your data set is
- Recognize whether your data falls within acceptable minimum and maximum ranges
- Understand how frequently values occur in an attribute
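
As a rough illustration of these three checks outside the product, the following Python sketch computes completeness, minimum and maximum, and value frequency for a single attribute. The function names and the sample `ages` column are hypothetical and are not part of the Discovery Center; the sketch also assumes that a null is represented by `None` or an empty string.

```python
from collections import Counter

def completeness(values):
    """Fraction of values that are neither None nor blank (assumed null markers)."""
    if not values:
        return 0.0
    filled = sum(1 for v in values if v is not None and str(v).strip() != "")
    return filled / len(values)

def numeric_range(values):
    """(min, max) over the numeric values, or None if there are none."""
    nums = [v for v in values
            if isinstance(v, (int, float)) and not isinstance(v, bool)]
    return (min(nums), max(nums)) if nums else None

def frequencies(values):
    """Occurrence count per distinct non-null value, most frequent first."""
    return Counter(v for v in values if v is not None).most_common()

# Hypothetical attribute with two missing values.
ages = [34, 27, None, 34, 51, None, 27, 34]

print(completeness(ages))   # 0.75
print(numeric_range(ages))  # (27, 51)
print(frequencies(ages))    # [(34, 3), (27, 2), (51, 1)]
```
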
Data distribution analysis includes:
- Data Patterns. Patterns describe the shape of a data value in an attribute and help identify format deviations, misspellings, and duplications in your data.
- Data Type Structure. Knowing the type of data you load into the repository allows you to better understand the structure of your data, including what percentage of the data consists of string, integer, decimal, and null values.
- Data Values. Each attribute and data row contains a set of values. Important metadata about data values includes how complete or incomplete the values are, how frequently each value occurs, and the range over which the values are distributed across your data.
- Standard Deviation. Standard deviation measures how dispersed the values of a numeric attribute are from the attribute's average (mean) value.
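
The items above can be approximated with a short Python sketch. It is only an illustration under stated assumptions, not how the Discovery Center implements its analysis: the pattern notation (`A` for uppercase letters, `a` for lowercase letters, `9` for digits), the type rules, the choice of the population standard deviation, and all function names and sample values are hypothetical.

```python
import re
import statistics
from collections import Counter

def pattern(value):
    """Map ASCII letters to 'A'/'a' and digits to '9'; keep other characters."""
    s = re.sub(r"[A-Z]", "A", value)
    s = re.sub(r"[a-z]", "a", s)
    return re.sub(r"[0-9]", "9", s)

def classify(value):
    """Classify a raw string as 'null', 'integer', 'decimal', or 'string'."""
    if value is None or value.strip() == "":
        return "null"
    text = value.strip()
    if re.fullmatch(r"[+-]?[0-9]+", text):
        return "integer"
    if re.fullmatch(r"[+-]?[0-9]+\.[0-9]+", text):
        return "decimal"
    return "string"

def type_structure(values):
    """Percentage of values falling into each assumed type category."""
    counts = Counter(classify(v) for v in values)
    return {kind: 100.0 * n / len(values) for kind, n in counts.items()}

# Hypothetical product codes: the third one deviates from the common format.
codes = ["AB-1234", "CD-5678", "ef-90x1"]
print(Counter(pattern(c) for c in codes))

# Hypothetical raw column mixing integers, a decimal, a string, and nulls.
raw = ["12", "3.5", "abc", None, "", "7"]
print(type_structure(raw))

# Population standard deviation: dispersion of values around their mean.
amounts = [2, 4, 4, 4, 5, 5, 7, 9]
print(statistics.mean(amounts), statistics.pstdev(amounts))
```

In this sample, two product codes share the pattern `AA-9999` while the third yields `aa-99a9`, which is the kind of format deviation a pattern count makes visible. If the sample standard deviation is wanted instead, `statistics.stdev` can be substituted for `statistics.pstdev`.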