Use the data store Profile tab to profile data store data. Complete the following steps to profile a data store:
- Select the Pipelines menu at the top of the page.
- Click the menu button to the right of the data store and select Edit > Edit Stage.
- Select the Profile tab.
- Make sure Profile Data Store Data is selected.
- Complete the fields, using the guidance below.
- Click Save.
- From the pipeline, click the menu button to the right of the data store, and select Execute > Run.
When the execution is complete, you can view the profile data store data from the Metrics tab. For more information, see Metrics.
Data load range
Choose the conditions that determine when profiling occurs.
For example, you can choose to profile all data in the data store, or only new data.
Choose from the following options:
- All - profile all data in the data store.
- New Data Since Last Load - profile only data added since the last load.
- Based On Date Parameters - profile data added between two dates.
- Based On Work ID Parameter - profile a subset of your data, which can be altered by input or Analysis Run to create outputs parsed by Work ID. For more information, see Using Work ID Parameters.
- Based On File Path Pattern Parameter - profile a subset of your data, sourced only from a file path that matches the pattern used in a parameter. For more information, see Using a File Path Pattern Parameter Name.
- Based On File Path Parameter - profile a subset of your data, sourced only from a specified file.
Sampling rate percent
Specify the percentage of the records being considered for profiling that are actually profiled as the sample.
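As a rough illustration of the arithmetic, the following Python sketch estimates how many records are profiled for a given sampling rate. The function name `sample_size` and the rounding behavior are assumptions made for this example; the way the product selects and rounds the sample is not described here.

```python
def sample_size(candidate_records: int, sampling_rate_percent: float) -> int:
    """Estimate how many records are profiled for a given sampling rate.

    Hypothetical helper: rounding to the nearest whole record is an
    assumption, not documented product behavior.
    """
    if not 0 <= sampling_rate_percent <= 100:
        raise ValueError("sampling_rate_percent must be between 0 and 100")
    return round(candidate_records * sampling_rate_percent / 100)


# 25 percent of 20,000 candidate records
print(sample_size(20_000, 25))  # 5000
```

For example, if the data load range selects 20,000 records and the sampling rate is 25, about 5,000 records are profiled.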
Value limit for field values
Specify the maximum number of field values that can be displayed for a field after data profiling. The field values are available from the Profile Data Store Data table on the Metrics tab.
If the number of values found in a field is greater than the value limit, no values are displayed.
The default value is 10,000.
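The effect of the value limit can be sketched as follows. This is a minimal illustration, assuming that "the number of values found" means the number of distinct values in the field; the function name `displayed_values` is hypothetical and is not part of the product.

```python
from collections import Counter


def displayed_values(field_values, value_limit=10_000):
    """Return the distinct values to display, or nothing if over the limit.

    Assumption: the limit is compared against the count of distinct values.
    """
    distinct = Counter(field_values)
    if len(distinct) > value_limit:
        # More values than the limit: no values are displayed at all.
        return []
    return sorted(distinct)


print(displayed_values(["a", "b", "a"], value_limit=2))  # ['a', 'b']
print(displayed_values(["a", "b", "c"], value_limit=2))  # []
```

Note that exceeding the limit does not truncate the list; it suppresses the values for that field entirely.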
Profile data store
The Profile Data Store contains information about the fields in the data store. The fields are the same as those listed for the Profile Data Node, with the exception of the values and patterns fields, which are not included in the Profile Data Store.
You can select an existing data store or create a new data store from the profile data.
If any of the profile fields are missing from the selected Profile Data Store, a prompt will ask whether you want to automatically add the missing fields to the Profile Data Store.
Profile patterns data store
The Profile Patterns Data Store contains information about the content of the fields in the data store. Each entry represents a unique pattern found in a field in the profiled data store.
The Profile Patterns Data Store contains the following fields:
- dataStoreID - The system ID of the Data Store.
- field - The name of the field in the Data Store.
- pattern - A representation of the pattern found.
- patternRegex - A regular expression matching the value of the pattern field.
- totalCount - The number of times the value of pattern occurs in field.
- patternPercentage - totalCount expressed as a percentage of the total number of occurrences of field.
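To show how these fields relate to each other, the following Python sketch builds pattern entries for a single field. The pattern notation used here (ASCII digits become `9`, ASCII letters become `A`, other characters are kept) is a hypothetical scheme chosen for illustration only; the notation that the product uses is not specified in this topic. The helper names and the `ds-1` ID are also assumptions.

```python
import re
from collections import Counter


def to_pattern(value: str) -> str:
    # Hypothetical notation: ASCII digit -> "9", ASCII letter -> "A".
    return "".join(
        "9" if c.isascii() and c.isdigit()
        else "A" if c.isascii() and c.isalpha()
        else c
        for c in value
    )


def pattern_to_regex(pattern: str) -> str:
    # Build a regular expression that matches values with this pattern.
    parts = {"9": "[0-9]", "A": "[A-Za-z]"}
    return "^" + "".join(parts.get(c, re.escape(c)) for c in pattern) + "$"


def profile_patterns(data_store_id, field, values):
    counts = Counter(to_pattern(v) for v in values)
    total = sum(counts.values())
    return [
        {
            "dataStoreID": data_store_id,
            "field": field,
            "pattern": pattern,
            "patternRegex": pattern_to_regex(pattern),
            "totalCount": count,
            "patternPercentage": 100 * count / total,
        }
        for pattern, count in counts.most_common()
    ]


rows = profile_patterns("ds-1", "postcode", ["AB1 2CD", "XY9 8ZT", "12345", "EF3 4GH"])
for row in rows:
    print(row["pattern"], row["totalCount"], row["patternPercentage"])
```

In this sketch, three of the four values share one pattern, so that entry has a totalCount of 3 and a patternPercentage of 75.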
Profile values data store
The Profile Values Data Store contains information about the content of the fields in the data store. Instead of detailing the patterns found in the data, however, each entry represents a unique value found in a field in the profiled data store.
The Profile Values Data Store contains the following fields:
- dataStoreID - The system ID of the Data Store.
- field - The name of the field in the Data Store.
- value - A value found in field.
- totalCount - The number of times value occurs in field.
- valuePercentage - totalCount expressed as a percentage of the total number of occurrences of field.
- outlier - A boolean value, indicating whether value is an outlier.
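The relationship between value, totalCount, and valuePercentage can be sketched in Python as follows. The function name `profile_values` and the `ds-1` ID are hypothetical. The outlier field is left out of the sketch because the criterion used to flag outliers is not described in this topic.

```python
from collections import Counter


def profile_values(data_store_id, field, values):
    """Build one entry per unique value found in a field."""
    counts = Counter(values)
    total = sum(counts.values())
    return [
        {
            "dataStoreID": data_store_id,
            "field": field,
            "value": value,
            "totalCount": count,
            "valuePercentage": 100 * count / total,
            # "outlier" omitted: the outlier rule is not documented here.
        }
        for value, count in counts.most_common()
    ]


rows = profile_values("ds-1", "country", ["UK", "US", "UK", "FR"])
for row in rows:
    print(row["value"], row["totalCount"], row["valuePercentage"])
```

Here "UK" occurs twice in four records, so its totalCount is 2 and its valuePercentage is 50.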
Profile rules data store
The Profile Rules Data Store contains information about the results of Data Quality Checks.
Custom counters
You use Custom Counters to create expressions that are applied to individual fields. The result is a count of values that satisfy the expression. For example, consider the following data set:
| id | value |
|---|---|
| 001 | 100 |
| 002 | 125 |
| 003 | 150 |
| 004 | 175 |
| 005 | 200 |
If you create a custom counter using the expression value < 150, the value of the counter is 2, because two values (100 and 125) satisfy the expression.
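The counting logic in this example can be reproduced with a short Python sketch. The `custom_counter` function is a hypothetical stand-in for the product's expression evaluation, and the Python lambda stands in for the expression value < 150.

```python
records = [
    {"id": "001", "value": 100},
    {"id": "002", "value": 125},
    {"id": "003", "value": 150},
    {"id": "004", "value": 175},
    {"id": "005", "value": 200},
]


def custom_counter(rows, predicate):
    """Count the rows whose field value satisfies the expression."""
    return sum(1 for row in rows if predicate(row))


# Equivalent of the expression: value < 150
count = custom_counter(records, lambda row: row["value"] < 150)
print(count)  # 2
```

The record with the value 150 is not counted, because the comparison is strictly less than.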
Data quality checks
Use Data Quality Checks to specify rules that you have created and apply those rules to the data in your data store. For more information, see Rule Library.
The results of the data quality checks are contained in the Profile Rules Data Store.
To add a new data quality check, click Add. The Add Data Quality Check dialog is displayed.
Enter a Check Name for your new data quality check.
Complete the tabs:
Details
- Select the Rule Library that contains the rule group to apply to your data.
- Select the Rule Group that you want to use.
- Enter names for the Result Field, Error Reason Field, and Result Count Field.
- Tick the box for Execute All Rules in a Group, or select individual rules in the Rules to Execute panel.
Placeholder mappings
For each of the placeholder fields applicable to the rule group, click Edit to open the Edit Placeholder Mapping panel.
The placeholder data type is automatically filled in based on the rule.
Choose a Field Selection Type to determine which fields in the data store will be tested against the rule group.
- Specified Field - Select one field to be tested from the Select Incoming Fields dropdown.
- Specified Fields - Select multiple fields to be tested from the Select Incoming Fields dropdown.
- All Fields based on Semantic Type - Test all fields against the semantic type provided in the field definition.
- All Fields - Test all fields.
Send profile results for Data Governance
When the environment is configured for integration with DIS-Govern, select the Send Profile Results for Data Governance checkbox to send the profiling results to DIS-Govern. The results that are sent are based on the data stored in the Profile Data Store, the Profile Values Data Store, or both.
Executing a data store for profiling
Once you have configured a data store's Profile tab, the data store becomes executable, and executing it performs the profiling. When the execution is complete, the data store gains a Metrics tab, which contains a summary of each field's profiling information. This summary comes from the profile data store that belongs to the profiled data store.
Additionally, each profile data store that was set up in the Profile tab will be populated with profiling information. You can then view the profiling information using ad-hoc queries in the Data360 DQ+ visualizer or in an analysis. You can also view a summary of the information contained within the profile values and profile patterns data stores by viewing the profiled data store's Metrics tab and using the View Values and View Patterns buttons.