Profile - Data360_DQ+ - Latest

Data360 DQ+ Help

Product type: Software
Portfolio: Verify
Product family: Data360
Product: Data360 DQ+
Version: Latest
Language: English
Product name: Data360 DQ+
Title: Data360 DQ+ Help
Copyright: 2024
First publish date: 2016
Last updated: 2024-10-09
Published on: 2024-10-09T14:37:51.625264

Use the data store Profile tab to profile data store data. Complete the following steps to profile a data store:

  1. Select the Pipelines menu at the top of the page.
  2. Click the menu button to the right of the data store and select Edit > Edit Stage.
  3. Select the Profile tab.
  4. Make sure Profile Data Store Data is selected.
  5. Complete the fields, using the guidance below.
  6. Click Save.
  7. From the pipeline, click the menu button to the right of the data store, and select Execute > Run.

When the execution is complete, you can view the Profile Data Store Data table on the Metrics tab. For more information, see Metrics.

Data load range

Choose the conditions that determine when profiling occurs.

For example, you can choose to profile all data in the data store, or only new data.

Choose from the following options:

  • All - Profile all data in the data store.
  • New Data Since Last Load - Profile only data that is new since the last load.
  • Based On Date Parameters - Profile data added between two dates.
  • Based On Work ID Parameter - Profile a subset of your data, which can be altered by input or Analysis Run to create outputs parsed by Work ID. For more information, see Using Work ID Parameters.
  • Based On File Path Pattern Parameter - Profile a subset of your data, sourced only from a file path that matches the pattern used in a parameter. For more information, see Using a File Path Pattern Parameter Name.
  • Based On File Path Parameter - Profile a subset of your data, sourced only from a specified file.

Sampling rate percent

Specify the percentage of the records considered for profiling that is actually profiled. This percentage determines the sample size.
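
As a rough illustration of the arithmetic only, the following Python sketch computes how many records a given sampling rate would select. The helper name sample_size and the rounding rule are assumptions made for this example; Data360 DQ+ may choose its sample differently.

```python
def sample_size(candidate_records: int, sampling_rate_percent: float) -> int:
    """Return the number of records that would be profiled.

    Hypothetical helper: rounding to the nearest whole record is an
    assumption, not documented Data360 DQ+ behavior.
    """
    if not 0 <= sampling_rate_percent <= 100:
        raise ValueError("sampling_rate_percent must be between 0 and 100")
    return round(candidate_records * sampling_rate_percent / 100)


# A 25% sampling rate applied to 2,000 candidate records.
print(sample_size(2000, 25))  # 500
```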

Value limit for field values

Specify the maximum number of field values that can be displayed for a field after data profiling. The field values are available from the Profile Data Store Data table on the Metrics tab.

If the number of values found in a field is greater than the value limit, no values are displayed.

The default value is 10,000.
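
The rule above can be sketched in Python as follows. This is a minimal illustration, not the product's implementation: the function name displayable_values is hypothetical, and the sketch assumes that "the number of values found in a field" means the number of distinct values.

```python
from collections import Counter

DEFAULT_VALUE_LIMIT = 10_000  # default value limit stated in this topic


def displayable_values(values, value_limit=DEFAULT_VALUE_LIMIT):
    """Return each distinct value with its count, or nothing if over the limit.

    Assumption: the limit is compared against the number of distinct values.
    """
    counts = Counter(values)
    if len(counts) > value_limit:
        return {}  # over the limit: no values are displayed
    return dict(counts)


print(displayable_values(["a", "b", "a"], value_limit=2))  # {'a': 2, 'b': 1}
print(displayable_values(["a", "b", "c"], value_limit=2))  # {}
```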

Profile data store

The Profile Data Store contains information about the fields in the data store. The fields are the same as those listed for the Profile Data Node, with the exception of the values and patterns fields, which are not included in the Profile Data Store.

You can select an existing data store or create a new data store from the profile data.

If any of the profile fields are missing from the selected Profile Data Store, a prompt will ask whether you want to automatically add the missing fields to the Profile Data Store.

Profile patterns data store

The Profile Patterns Data Store contains information about the content of the fields in the data store. Each entry represents a unique pattern found in a field in the profiled data store.

The Profile Patterns Data Store contains the following fields:

  • dataStoreID - The system ID of the Data Store.
  • field - The name of the field in the Data Store.
  • pattern - A representation of the pattern found.
  • patternRegex - A regular expression matching the value of the pattern field.
  • totalCount - The number of times the value of pattern occurs in field.
  • patternPercentage - totalCount expressed as a percentage of the total number of occurrences of field.
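
To make the relationship between these fields concrete, the following Python sketch builds entries of the same shape from a list of values. The pattern notation (ASCII digits become 9, ASCII letters become A, other characters are kept) and the functions to_pattern, pattern_to_regex, and profile_patterns are assumptions made for this example; the notation that Data360 DQ+ actually uses is not described in this topic.

```python
import re
import string
from collections import Counter


def to_pattern(value: str) -> str:
    """Map digits to '9' and letters to 'A'; keep other characters (assumed notation)."""
    return "".join(
        "9" if ch in string.digits else "A" if ch in string.ascii_letters else ch
        for ch in value
    )


def pattern_to_regex(pattern: str) -> str:
    """Build a regular expression that matches values with the given pattern."""
    parts = []
    for ch in pattern:
        if ch == "9":
            parts.append("[0-9]")
        elif ch == "A":
            parts.append("[A-Za-z]")
        else:
            parts.append(re.escape(ch))
    return "^" + "".join(parts) + "$"


def profile_patterns(data_store_id: str, field: str, values: list) -> list:
    """Return one entry per unique pattern, most frequent first."""
    counts = Counter(to_pattern(v) for v in values)
    total = len(values)
    return [
        {
            "dataStoreID": data_store_id,
            "field": field,
            "pattern": pattern,
            "patternRegex": pattern_to_regex(pattern),
            "totalCount": count,
            "patternPercentage": 100 * count / total,
        }
        for pattern, count in counts.most_common()
    ]


# Hypothetical data store ID, field name, and values.
rows = profile_patterns("ds-1", "postcode", ["AB1 2CD", "XY9 8ZW", "12345", "EF3 4GH"])
for row in rows:
    print(row["pattern"], row["totalCount"], row["patternPercentage"])
# AA9 9AA 3 75.0
# 99999 1 25.0
```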

Profile values data store

The Profile Values Data Store contains information about the content of the fields in the data store. Instead of detailing the patterns found in the data, however, each entry represents a unique value found in a field in the profiled data store.

The Profile Values Data Store contains the following fields:

  • dataStoreID - The system ID of the Data Store.
  • field - The name of the field in the Data Store.
  • value - A value found in field.
  • totalCount - The number of times value occurs in field.
  • valuePercentage - totalCount expressed as a percentage of the total number of occurrences of field.
  • outlier - A boolean value, indicating whether value is an outlier.
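
The count and percentage fields can be illustrated with a short Python sketch. The function profile_values and the sample data are hypothetical. The outlier field is left out because this topic does not state how outliers are determined.

```python
from collections import Counter


def profile_values(data_store_id: str, field: str, values: list) -> list:
    """Return one entry per unique value, most frequent first."""
    counts = Counter(values)
    total = len(values)
    return [
        {
            "dataStoreID": data_store_id,
            "field": field,
            "value": value,
            "totalCount": count,
            "valuePercentage": 100 * count / total,
        }
        for value, count in counts.most_common()
    ]


# Hypothetical data store ID, field name, and values.
value_rows = profile_values("ds-1", "status", ["open", "closed", "open", "open"])
for row in value_rows:
    print(row["value"], row["totalCount"], row["valuePercentage"])
# open 3 75.0
# closed 1 25.0
```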

Profile rules data store

The Profile Rules Data Store contains information about the results of Data Quality Checks.

Custom counters

You use Custom Counters to create expressions that are applied to individual fields. The result is a count of values that satisfy the expression. For example, consider the following data set:

id     value
001    100
002    125
003    150
004    175
005    200

If you create a custom counter using the expression value < 150, the result of the counter is 2, because two values (100 and 125) are less than 150.
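
The same count can be reproduced outside the product with a few lines of Python. This sketch only mirrors the arithmetic of the five-record example above; the custom_counter function is hypothetical, and the expression is written as a Python lambda rather than in the Data360 DQ+ expression syntax.

```python
# The five records from the example data set.
records = [
    {"id": "001", "value": 100},
    {"id": "002", "value": 125},
    {"id": "003", "value": 150},
    {"id": "004", "value": 175},
    {"id": "005", "value": 200},
]


def custom_counter(rows, predicate):
    """Count the rows for which the predicate is true."""
    return sum(1 for row in rows if predicate(row))


# Equivalent of the expression: value < 150
count = custom_counter(records, lambda row: row["value"] < 150)
print(count)  # 2
```

Note that the record with value 150 is not counted, because the comparison is strictly less than.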

Data quality checks

Use Data Quality Checks to specify rules that you have created and apply those rules to the data in your data store. For more information, see Rule Library.

The results of the data quality checks are contained in the Profile Rules Data Store.

To add a new data quality check, click Add. The Add Data Quality Check dialog is displayed.

Enter a Check Name for your new data quality check.

Complete the tabs:

Details

  1. Select the Rule Library that contains the rule group to apply to your data.
  2. Select the Rule Group that you want to use.
  3. Enter names for the Result Field, Error Reason Field, and Result Count Field.
  4. Select the Execute All Rules in a Group checkbox, or select individual rules in the Rules to Execute panel.

Placeholder mappings

For each of the placeholder fields applicable to the rule group, click Edit to open the Edit Placeholder Mapping panel.

The placeholder data type is automatically filled in based on the rule.

Choose a Field Selection Type to determine which fields in the data store will be tested against the rule group.

  • Specified Field - Select one field to be tested from the Select Incoming Fields dropdown.
  • Specified Fields - Select multiple fields to be tested from the Select Incoming Fields dropdown.
  • All Fields based on Semantic Type - Test all fields against the semantic type provided in the field definition.
  • All Fields - Test all fields.

Send profile results for Data Governance

When the environment is configured for integration with DIS-Govern, select the Send Profile Results for Data Governance checkbox to send the profiling results to DIS-Govern. The results sent are based on the data stored in the Profile Data Store and/or the Profile Values Data Store.

Executing a data store for profiling

Once you have configured a data store's Profile tab, the data store becomes executable, and executing it performs the profiling. When the execution is complete, the data store gains a Metrics tab, which contains a summary of each field's profiling information. This summary is drawn from the Profile Data Store associated with the profiled data store.

Additionally, each profile data store that was set up in the Profile tab will be populated with profiling information. You can then view the profiling information using ad-hoc queries in the Data360 DQ+ visualizer or in an analysis. You can also view a summary of the information contained within the profile values and profile patterns data stores by viewing the profiled data store's Metrics tab and using the View Values and View Patterns buttons.