Profiling panel - Data360_Govern - Preview

Data360 Govern Help

Copyright: 2024
First publish date: 2014

The Profiling side panel is shown when any profiling data is available, for either a business or technical asset. It is populated using the Data360 Govern DataProfiles v2.0 APIs.

The name of the asset displays as the first heading, followed by the asset type, then the categories.

All categories are expanded by default, with a maximum of five entries each. Further entries can be displayed by clicking Show more. If a category has no entries, the label is not displayed.

Note: The most recent profiling data is displayed. Data profiles can be updated multiple times a day.

The Profiling panel can include these categories:

  • Sample Summary
  • Sample Quality
  • Sample Distribution
  • Top Values
  • Bottom Values
  • Invalid/Outliers
  • Shapes
  • Statistics

The sample categories are always displayed first and should always have data. However, the API does not validate any required field other than profileSetDate, so these values are not guaranteed to be present.

Sample summary

| Field | Description | Source |
|---|---|---|
| Effective Date | The date of the latest set of profiling information received for the asset. | profileSetDate |
| Total Row Count | The total number of rows in the entire data set. | totalCount |
| Sample Row Count | The number of rows that are profiled. A percentage of the total count is also displayed. | sampleCount |
| Base Type | The type of data. | type |
| Semantic Type | A further explanation of the type. For example, if the type is "String", the Semantic Type may be Email or Name. | typeQualifier |
| Type Confidence | A percentage of how confident you can be that the profiling results are from the type specified. For example, "I'm 97.5% sure the data sampled was from a date field". Displayed as a percentage with two decimal places. For example, if the value in the API is .9753, the UI displays 97.53%. | confidence |
| Match Detection | The number of duplicates and similar fields in the sample data. | |
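
The Type Confidence and Sample Row Count descriptions above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample payload are hypothetical, only the key names (totalCount, sampleCount, confidence) come from the Source column, and rounding (rather than truncating) to two decimal places is an assumption.

```python
def format_percent(fraction: float) -> str:
    """Render a 0-1 fraction as a percentage with two decimal places."""
    return f"{fraction * 100:.2f}%"


def sample_share(sample_count: int, total_count: int) -> str:
    """Express the Sample Row Count as a percentage of the Total Row Count."""
    if total_count <= 0:
        # Assumption: the placeholder for a missing or zero total is not documented.
        return "--"
    return format_percent(sample_count / total_count)


# Hypothetical profile payload; only the key names are taken from the help text.
profile = {"totalCount": 20000, "sampleCount": 5000, "confidence": 0.9753}

print(format_percent(profile["confidence"]))  # 97.53%
print(sample_share(profile["sampleCount"], profile["totalCount"]))  # 25.00%
```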

Semantic types

Semantic types are standardized strings of characters that help to describe the type of information particular data represents.

When the profiling side panel is displayed, a check is carried out to see if the semantic type is found in the semantic definitions. If a match is not found, the qualifier is displayed against the Semantic Type. For example, HONORIFIC_EN.

If a match is found, the name of the semantic type displays as a link to the appropriate type, for example Date. Click the link to display the Information secondary side panel, to the left of the Profiling panel. The definition details of the relevant semantic type are displayed. Click any link on the Information secondary side panel to replace the semantic type information with its details.

If the semantic type is not sent with the profiling data, the Semantic Type label is still shown, but its value is filled with dashes.
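
The three cases described above (match found, no match, no semantic type sent) could be modeled roughly as follows. This is a sketch, not the product's implementation: the function, the shape of the definitions lookup, and the exact dash placeholder are all assumptions made for illustration.

```python
from typing import Dict, Optional

# Assumption: the exact dash string shown by the UI is not documented.
PLACEHOLDER = "--"


def semantic_type_display(type_qualifier: Optional[str],
                          definitions: Dict[str, str]) -> dict:
    """Decide what the Semantic Type field shows for one profile."""
    if not type_qualifier:
        # No semantic type was sent with the profiling data.
        return {"text": PLACEHOLDER, "link": False}
    name = definitions.get(type_qualifier)
    if name is None:
        # No matching semantic definition: show the raw qualifier.
        return {"text": type_qualifier, "link": False}
    # Match found: show the definition name as a link.
    return {"text": name, "link": True}


# Hypothetical mapping of qualifiers to semantic definition names.
definitions = {"DATE": "Date"}

print(semantic_type_display("DATE", definitions))
print(semantic_type_display("HONORIFIC_EN", definitions))
print(semantic_type_display(None, definitions))
```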

Note: Historical profiling information is never linked to semantic types because you cannot control the effective date of the semantic definition. Semantic types are always created with today as the effective date.

Match detection

The Match Detection field is displayed as part of the Sample Summary category, and shows the number of duplicates and similar fields in the sample data. Match detection is based on the data signature and data structure that is passed to Data360 Govern by Data360 Analyze after an asset is profiled. All profiled assets are checked for similar or duplicate entries, based on those passed fields. If two assets have the same data signature, they are classified as duplicates, but if they have the same data structure, they are regarded as similar assets.
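
As a rough illustration of the classification rule above, the following Python sketch groups other profiled assets into duplicates and similar assets. The dictionary keys (path, signature, structure) are hypothetical names, and checking the signature before the structure is an assumption about precedence.

```python
def classify_matches(target: dict, others: list) -> dict:
    """Split other profiled assets into duplicates and similar assets."""
    duplicates, similar = [], []
    for asset in others:
        if asset["signature"] == target["signature"]:
            # Same data signature: classified as a duplicate.
            duplicates.append(asset["path"])
        elif asset["structure"] == target["structure"]:
            # Same data structure but a different signature: a similar asset.
            similar.append(asset["path"])
    return {"duplicates": duplicates, "similar": similar}


# Hypothetical assets for illustration only.
target = {"path": "a", "signature": "s1", "structure": "t1"}
others = [
    {"path": "b", "signature": "s1", "structure": "t1"},
    {"path": "c", "signature": "s2", "structure": "t1"},
    {"path": "d", "signature": "s3", "structure": "t2"},
]

print(classify_matches(target, others))
```

The lengths of the two lists correspond to the totals shown next to the duplicate and similar badges.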

Duplicates:

  • Have a red badge, positioned to the left of the label, together with the total number found. Click the link to open the Match Detection dialog.
  • If you hover your mouse over the label, a tooltip displays the number of assets detected that are of the same type and have matching data.
  • If there are no duplicates, the red badge is muted and the label appears gray with no link.

Similar fields:

  • Have an orange badge, positioned to the left of the label, together with the total number found. Click the link to open the Match Detection dialog.
  • If you hover your mouse over the label, a tooltip displays the number of assets detected that are of the same type but with different data.
  • If there are no similar fields, the orange badge is muted and the label appears gray with no link.

The Match Detection dialog includes the asset path for the item that you are investigating, and a grid with the details of either the duplicate or similar fields, depending on the link that was clicked on the Profiling panel. If you select more than one asset path, a menu button displays to the right of the filter field. Click it and select Edit Tags, where you can, for example, add one or more tags to the selected asset paths.

Note: You cannot add tags to both Duplicate Fields and Similar Fields at the same time.

Sample quality

  • A tooltip next to each calculated percentage shows the percentage of the total. The total itself is relative: for example, the total of the sample, the total of the valid values, and so on.

| Field | Description | Source |
|---|---|---|
| Quality bar | A single horizontal bar showing the spread of valid, invalid, and not populated row counts from the sample data. | |
| Valid | The number of valid values found in the sample data, based on the Type or Semantic Type. Next to the count is a percentage. This is calculated within Data360 Govern, and equals Valid Count divided by the Sample Count. Distinct indicates how many of the valid values are distinct. | matchCount, cardinality |
| Invalid/Outliers | The number of invalid or outlier values found in the sample data. Next to the count is a percentage. This is calculated within Data360 Govern, and equals Invalid/Outliers Count divided by the Sample Count. | outlierCount |
| Null/Blank | The count of either nulls or blanks found in the sample data. Next to the count is a percentage. This is calculated within Data360 Govern, and equals Not Populated Count divided by the Sample Count. | nullCount + blankCount |
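
The percentage calculations described for Sample Quality can be sketched as below. The function and payload are hypothetical. Reading matchCount as the Valid count, treating missing counts as zero, and rounding to two decimal places are assumptions, not documented behavior.

```python
def quality_breakdown(profile: dict) -> dict:
    """Compute counts and percentages of the sample for each quality field."""
    sample = profile.get("sampleCount") or 0
    counts = {
        # Assumption: matchCount supplies the Valid count.
        "Valid": profile.get("matchCount") or 0,
        "Invalid/Outliers": profile.get("outlierCount") or 0,
        # Null/Blank combines nullCount and blankCount.
        "Null/Blank": (profile.get("nullCount") or 0)
        + (profile.get("blankCount") or 0),
    }
    return {
        label: {
            "count": count,
            # Each percentage is the count divided by the sample count.
            "percent": round(count * 100 / sample, 2) if sample else None,
        }
        for label, count in counts.items()
    }


# Hypothetical payload; only the key names come from the Source column.
profile = {"sampleCount": 200, "matchCount": 150, "outlierCount": 30,
           "nullCount": 15, "blankCount": 5}

for label, row in quality_breakdown(profile).items():
    print(label, row["count"], row["percent"])
```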

Sample distribution

The bar chart shows the distribution of samples, according to the type of data. For example, if the data is:

  • Date/Time - The bar chart shows the distribution over time.
  • String - The bar chart shows the distribution according to distinct string values.
  • Number - The bar chart shows the range distribution, together with the standard deviation and mean for the distinct values.
  • Boolean - The bar chart shows whether values are true or false.

The bar chart displays the relevant results with green bars, and also includes the invalid/outliers, if any, with a red bar and null/blank values with a gray one.

Top values, bottom values, invalid/outliers and shapes

These categories all behave in a similar way, and only display if there is data for them. Each value displays as a bar chart with the value and count.

Note: Top and bottom sample sets keep the sort order in which they are received. The one exception is number-type values, which are sorted by key value: descending for top values and ascending for bottom values.

A percentage is displayed next to the count, calculated as the value count divided by the sample count.

  • Top Values - The values are from topK with the count of each in cardinalityDetail.
  • Bottom Values - The values are from bottomK with the count of each in cardinalityDetail.
  • Invalid/Outlier Values - Both the values and counts are in outlierDetail.
  • Shapes - Both the values and counts are in shapesDetail.
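
The ordering and percentage rules for Top Values and Bottom Values might look like this in outline. The payload shapes are assumptions: topK and bottomK are treated as lists of values, cardinalityDetail as a mapping from value to count, and "Number" as the literal base type. None of these shapes is confirmed by the API description here.

```python
def value_rows(values, counts, sample_count, base_type, descending):
    """Build (value, count, percent) rows for Top Values or Bottom Values."""
    ordered = list(values)  # keep the order in which the values were received
    if base_type == "Number":
        # Number-type values are sorted by key value instead:
        # descending for top values, ascending for bottom values.
        ordered.sort(key=float, reverse=descending)
    rows = []
    for value in ordered:
        count = counts.get(value, 0)
        # Percentage is the value count divided by the sample count.
        percent = round(count * 100 / sample_count, 2) if sample_count else None
        rows.append((value, count, percent))
    return rows


# Hypothetical data for illustration only.
top_k = ["5", "100", "20"]
cardinality_detail = {"5": 10, "100": 30, "20": 60}

print(value_rows(top_k, cardinality_detail, 100, "Number", descending=True))
print(value_rows(top_k, cardinality_detail, 100, "String", descending=True))
```

For bottom values, the same function would be called with the bottomK list and descending=False.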

Statistics

The following statistics are delivered through the APIs.

| Label | Source |
|---|---|
| Null Count | nullCount |
| Blank Count | blankCount |
| Minimum Value | min |
| Maximum Value | max |
| Minimum Length | minLength |
| Maximum Length | maxLength |
| Mean | mean |
| Standard Deviation | standardDeviation |
| Multiline | multiline |
| Leading Whitespace | leadingWhiteSpace |
| Trailing Whitespace | trailingWhiteSpace |
| Leading Zero Count | leadingZeroCount |
| Validation Regular Expression | regExp |

The availability of a particular statistical value depends, in part, on the data type. For example, if the data type is boolean, only Blank Count, Null Count and Validation Regular Expression are displayed, and only if they have a value.
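
One way to picture the label-to-source mapping and the "only if it has a value" rule is the sketch below. The function and payload are hypothetical, and treating a zero count as a value (so that only absent or null entries are hidden) is an assumption.

```python
# Label and source pairs, in the order listed in the Statistics table.
STATISTICS = [
    ("Null Count", "nullCount"),
    ("Blank Count", "blankCount"),
    ("Minimum Value", "min"),
    ("Maximum Value", "max"),
    ("Minimum Length", "minLength"),
    ("Maximum Length", "maxLength"),
    ("Mean", "mean"),
    ("Standard Deviation", "standardDeviation"),
    ("Multiline", "multiline"),
    ("Leading Whitespace", "leadingWhiteSpace"),
    ("Trailing Whitespace", "trailingWhiteSpace"),
    ("Leading Zero Count", "leadingZeroCount"),
    ("Validation Regular Expression", "regExp"),
]


def visible_statistics(profile: dict) -> list:
    """Return (label, value) pairs for statistics that have a value."""
    return [
        (label, profile[key])
        for label, key in STATISTICS
        # Assumption: zero is a value; only absent or null entries are hidden.
        if profile.get(key) is not None
    ]


# Hypothetical payload for a boolean column.
boolean_profile = {"nullCount": 0, "blankCount": 3, "regExp": None}

print(visible_statistics(boolean_profile))
```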