Data dependencies are a critical part of your data structure, so it is worth investigating both the quality and the types of dependencies your data contains. When you import data into a repository, Trillium analyzes an initial sample of 10,000 rows. If your data contains more rows than that, consider re-running the analysis on a larger data set.
If your data contains more than 10,000 rows, review all dependencies discovered during the import and then, based on the results, consider re-running the analysis on a larger, more representative data set. Precisely recommends re-running a dependency analysis if the initial run shows unexpected or missing dependencies.
When you review your dependencies, Trillium tells you whether an analysis was performed against ALL rows in your data or only a subset. You may find that analyzing a subset is adequate for discovering all dependencies, or that you need to analyze all rows to get a complete picture. Your familiarity with the data and its schema will help you choose the best strategy.
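The risk of relying on a sample can be illustrated with a minimal Python sketch. It assumes the initial sample is simply the first 10,000 rows and uses a hypothetical `dependency_holds` helper that checks whether each left-hand value maps to exactly one right-hand value; Trillium's own sampling and discovery logic may differ. The point is that a dependency that looks exact in the sample can still be violated by rows the sample never saw.

```python
def dependency_holds(rows, lhs, rhs):
    """Return True if every lhs value maps to exactly one rhs value in `rows`.

    Hypothetical helper for illustration; not Trillium's discovery algorithm.
    """
    seen = {}
    for row in rows:
        key, value = row[lhs], row[rhs]
        # setdefault stores the first rhs seen for this key and returns it;
        # any later, different rhs for the same key breaks the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Assumption: the import sample is the first 10,000 rows of the table.
SAMPLE_SIZE = 10_000

# Hypothetical data: Postal_Code determines City throughout the first
# 10,000 rows, but one conflicting City value appears later in the table.
rows = [{"Postal_Code": f"{i % 500:05d}", "City": f"City-{i % 500}"} for i in range(12_000)]
rows[11_500]["City"] = "Conflicting City"

print(dependency_holds(rows[:SAMPLE_SIZE], "Postal_Code", "City"))  # True
print(dependency_holds(rows, "Postal_Code", "City"))                # False
```

Here the sample suggests an exact Postal_Code → City dependency, while the full table does not support it, which is the situation in which re-running the analysis against all rows pays off.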
As you work with dependencies, you'll discover how attributes within the same entity relate to each other. Understanding the dependencies in your data is useful in several ways.
Dependencies help you to discover the relationships in your data by:
- Identifying entities that are candidates for normalization
You can identify data that has not been fully normalized by finding dependencies in which a left-hand attribute (Lh Attr) determines the value of a right-hand attribute (Rh Attr) close to 100% of the time, as reported by the dependency's Quality %. Such a dependency indicates that the entity can be split into two or more entities, further normalizing your data.
- Measuring the accuracy of business rules
You can run a dependency analysis to test business rules you expect your data to support. For example, a business rule might state that "each Postal_Code value always corresponds to a single City value". If the City attribute contains missing values, alternate spellings, misspellings, or other errors, a dependency analysis will reveal them.
- Testing referential constraints in data
Typically, certain attributes must always be populated during a transaction for a record to be referentially correct. For example, your business may require that every ORDER record has a CUSTOMER_ID. If some ORDER records lack one, a dependency analysis can identify the discrepancies.
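To make the Quality % idea and the checks above more concrete, here is a minimal Python sketch. It is not Trillium's implementation: it assumes quality is measured as the percentage of rows whose right-hand value agrees with the most common right-hand value for their left-hand value, and the function names `dependency_quality` and `missing_values`, as well as the sample records, are hypothetical.

```python
from collections import Counter, defaultdict


def dependency_quality(rows, lhs, rhs):
    """Estimate how well `lhs` determines `rhs` (the dependency lhs -> rhs).

    Assumed measure (not necessarily Trillium's Quality % formula): for each
    distinct lhs value, the most frequent rhs value is treated as expected,
    and quality is the share of rows that agree with it.
    Returns (quality_percent, conflicting_rows).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[lhs]].append(row)

    conforming = 0
    conflicts = []
    for group in groups.values():
        counts = Counter(r[rhs] for r in group)
        expected, n = counts.most_common(1)[0]  # ties: first value encountered
        conforming += n
        conflicts.extend(r for r in group if r[rhs] != expected)

    quality = 100.0 * conforming / len(rows) if rows else 100.0
    return quality, conflicts


def missing_values(rows, column):
    """Return rows where `column` is absent, None, or blank."""
    missing = []
    for r in rows:
        value = r.get(column)
        if value is None or str(value).strip() == "":
            missing.append(r)
    return missing


# Hypothetical address records: one misspelled City breaks Postal_Code -> City.
addresses = [
    {"Postal_Code": "10001", "City": "New York"},
    {"Postal_Code": "10001", "City": "New York"},
    {"Postal_Code": "10001", "City": "New Yrok"},
    {"Postal_Code": "94105", "City": "San Francisco"},
    {"Postal_Code": "94105", "City": "San Francisco"},
]
quality, conflicts = dependency_quality(addresses, "Postal_Code", "City")
print(f"Postal_Code -> City quality: {quality:.1f}%")  # 80.0%
print(conflicts)

# Hypothetical ORDER records: two of them lack a CUSTOMER_ID.
orders = [
    {"ORDER_ID": 1, "CUSTOMER_ID": "C1"},
    {"ORDER_ID": 2, "CUSTOMER_ID": None},
    {"ORDER_ID": 3, "CUSTOMER_ID": ""},
]
missing = missing_values(orders, "CUSTOMER_ID")
print([r["ORDER_ID"] for r in missing])
```

A quality just below 100%, as in the address example, is the typical signal described above: the dependency is real, and the few conflicting rows point at data errors such as the misspelled City. Treating the majority value as "expected" is only one reasonable choice; a stricter check could flag every left-hand value that maps to more than one right-hand value.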