Column Details in Profiling Results - discovery - 23.1

Spectrum Discovery Guide

Product type
Software
Portfolio
Verify
Product family
Spectrum
Product
Spectrum > Discovery
Version
23.1
Language
English
Product name
Spectrum Discovery
Title
Spectrum Discovery Guide
First publish date
2007
Last updated
2024-02-07
Published on
2024-02-07T17:21:58.768552
Based on what column you click in the left pane of the Data Profiling Results page, these details are displayed:
Note: The categories in the doughnut charts and values on X and Y axes of graphs are hyper-linked. Clicking the links navigates you to the associated list of records.
  • Completeness: The percentage of Complete, Null, and Empty Strings detected in the column. Click the hyperlink to view the records in each categories.
  • Uniqueness: These statistics are displayed here:
    • Unique: Records with no duplicates in the data source.
    • Non- unique: Records having duplicates in the data source.
    • Distinct: A list of all records present in your data source irrespective of those being unique or non-unique record.

    For example, your column contains these names:

    Roger
    Gigi
    Gigi
    Gigi
    Garey
    Elena
    Brad
    Brad
    Here:
    • Roger, Garey, and Elena are unique records.

    • Gigi and Brad are non-unique records.
    • Roger, Gigi, Garey, Brad, and Elena are distinct records.
  • Min Length and Max Length of characters in case the column data is a string
  • Standard deviation (Stdev), Maximum (Max), Minimum (Min), Variance, and Average value in case the column has numerical data.
  • Histogram representing the distribution of data in case the column has numerical data
  • Percentile in case the column has numerical data
  • Frequency of data in the column. Click the hyper-linked value on the X axis to go the list of associated records.
  • String Lengths in case the column data type is string
  • Scripts Distribution in case the column data type is string
  • Text Pattern Frequency in case the column data type is string
  • Character Categories: Graphically displays the frequencies of Latin character types detected in the selected string column. The various categories are:
    • Casing: Upper Case, Lower Case, and Mixed Case
    • Character Data Types: Alphabetic, Numeric, and Alphanumeric
    • Contains Spaces: Single Space, Multiple Spaces, and Trailing or Leading Spaces
    • Special Character: Contains or Does not Contain.
      Note: Only the special characters defined during configuration of Character Analysis rule will be considered here.

Viewing Outliers

You can view the outliers detected in a column by clicking the Outliers tab. This tab displays a tabular structure of category-wise detected outliers in your column. Details such as Pattern, Length, Value or Frequency, Computed Range, and Occurrence are also displayed.

Viewing the Suggestions Page

This page displays a comprehensive view of various suggestions generated for your data. You can choose to select these suggestions to cleanse your data.

This page is divided into two parts:

  • A header showing the profile details - These details are displayed for the analyzed profile:
    • Profile name: Name of the profile
    • Description: A brief description of the profile
    • Started: The date (yyyy/mm/dd), time (hh:mm:ss), and period (AM or PM) when the profile analysis was initialized.
    • Data Source: The source of the data in the profile being analyzed. It can be a physical or logical model or a direct data source connection.
    • User: The user who created the profile
    • Tables: Number of tables in the selected model
    • Data source type: Whether the data source for the profile is a model or a database
    • Records: Total number of records analyzed
    • Statistics Button: Navigates you to the Data Profiling Results page. For more information, see Viewing Suggestions on the Data Profiling Results Page.
  • The second section displays the name of your table along with a Recommendation button. On clicking this button, suggestion-wise Groovy Scripts for your table are displayed. On expanding the table through the icon, you can view column-wise suggestions for your table.
Details such as Suggestions and their corresponding Descriptions are displayed in a tabular format for various columns on which suggestive actions were generated. The View Details button displays a preview of statistics and options for your data, you can choose to select any suggestion here by clicking the Select Suggestion button or remove an already applied suggestion by clicking the Remove Suggestion button. icon signifies successful selection of a suggestion. You can choose to view and copy the groovy script for any particular suggestion by clicking the corresponding Recommendation button.
Note: To download and copy the Groovy Script for all the columns of a particular table for which suggestions are selected, click the Recommendation button placed adjacent to the table name. On clicking this button, a Recommendation window is displayed, this window displays details such as the table name, column name, and suggestion generated followed by the groovy script.

You can view and select suggestions for independent statistics from the Data Profiling Results page. Examine the statistics of your data and select any suggestion according to your requirements. For more information, see Viewing Suggestions on the Data Profiling Results Page.

Viewing Suggestions on the Data Profiling Results Page

You can view the suggestions provided to cleanse your data. These suggestions are generated based on the statistics of your data. You can choose to select these. Click the Suggestions button, a new window will pop-up showing suggestions. You can move to the next suggestion by clicking the icon.
Note: To view the transformed data after selecting an option for any suggestion, click the icon. If you wish to select the suggestion, click the Select Suggestion button, a message will be displayed on successful selection of the suggestion.

Suggestions for Summary Tab

For Character Categories, you will view any of these suggestions based on your data:
  • Character Data Type: Character data types such as numeric, alphanumeric, and alphabetic detected in your data are displayed in a tabular format with their frequencies. From the list of options, you can select to Trim or Eliminate these values, or set to a default value. You have to specify this value in the text box which appears on selecting this option.
  • Remove Spaces: Redundant spaces such as a Single Space, Multiple Spaces, and Trailing or leading Spaces present in your data are displayed in a tabular format with their frequencies. You can choose any of these options according to your requirement:
    • Trim Spaces - This option will remove any trailing or leading spaces present in your data.
    • Minimize white spaces - This option will remove redundant spaces between any two words and also remove any leading or trailing spaces
    • Remove spaces - This option will remove all spaces present in your data.
  • Standardize Casing: You can regularize your data by standardizing the casing of your data such as Mixed Case, Upper Case, and Lower Case through this option. You can choose to change the casing of your data to Lower case or Upper case, this will cleanse your data, improve it's uniformity, and readability.
  • Remove Special Characters: You can choose to remove any redundant special characters present in your data through this option. The detected special characters are displayed under the Options section in a text box. To cleanse your data, you can select the special characters to be removed and delete the characters which you want to keep using the backspace key from the text box.
  • Remove Control Characters: You can remove any redundant non-printable control characters present in your data through this option. The control characters detected in your data are displayed under the Statistics section, select the Remove Control Characters option and apply this suggestion to further cleanse your data.
For Uniqueness, you will see this suggestion:

Create Semantic Type: You can categorize your distinct data into a semantic type such as Gender, First Name, Last Name, Country, and Phone. Click on the statistics graph of the distinct data, a preview of your data is shown. Download this data and create semantic types through Table Management. For more information about Table Management, see Introduction to Lookup Tables.

For Completeness, you will see this suggestion:

Handling Nulls: You can set the null and empty strings present in your data to a default value. This increases the uniformity and completeness of your data. The frequencies of Complete, Null, and Empty strings are displayed in a tabular format under the Statistics section. Enter the desired default value in the Options text-box and select this suggestion.

Suggestions for Date Summary Tab

For Date Patterns, you will see this suggestion:

Standardize Date: You can regulate the formats of dates present in your data to improve the consistency of your data. Frequencies of various date formats detected in your data are displayed in a tabular format. Select the desired format from the date format drop-down placed under the Options section and select this suggestion.

Suggestions for Custom Pattern Tab

For Matched Patterns, you will see this suggestion:

Pattern Cleansing: You can eliminate values from your data not following the custom pattern specified by you while configuring a profile. For more information about custom patterns, see Single Column Key Analysis. The category with the highest frequency is treated as reference, other statistics are matched to it. You can select any of these options for non-matching statistics:
  • Set them to a default value. Enter the default value in the text-box provided.
  • Set them to null

After selecting the desired option, select this suggestion to cleanse your data.

Suggestions for Outlier Tab

For semantic type outliers, you will see this suggestion:

Segregating Semantic Types: You can move any semantic type outliers present in a column to a separate column and improve the uniformity and clarity of your data. The frequency of various outliers detected in your data are displayed in a tabular format, you can choose to move any of the detected outliers to a separate custom column and segregate the semantic types.

Select the outliers which you wish to move to a separate column, enter the custom name of the column in the text box and select this suggestion.

Note: On closing the suggestions window after the successful selection of any suggestion, a Show Recommendations prompt is displayed. On clicking Yes, you will be navigated to the Suggestions page. You can copy the groovy script from this page and use in the Enterprise Designer. For more information, see Viewing the Suggestions Page.