Histogram - Data360_Analyze - Latest

Data360 Analyze Server Help

Product type: Software
Portfolio: Verify
Product family: Data360
Product: Data360 Analyze
Version: Latest
Language: English
Product name: Data360 Analyze
Title: Data360 Analyze Server Help
Copyright: 2024
First publish date: 2016
Last updated: 2024-11-28
Published on: 2024-11-28T15:26:57.181000

Creates a list of unique values from the input data set, with an instance count for each value. By default, the most commonly occurring value is listed first.

To configure this node:

  1. In the InputFields property, select or type the name of one or more fields from the input data set that you want to include in the output. Any input fields that you do not specify in this property will not be output when the node is run. For each field that you specify, each unique value within that field will be listed in a separate row in the output, along with a count of how many times the value occurs.

    The InputFields property is a multi-field picker, a property type which is found on a number of nodes. For more information on this property type, see Multi-field picker.

  2. Optionally, in the ResultField property, specify a name for the output field that contains the count for each value. If you do not specify a name, the field is named "Count".
  3. By default, the node will sort the output by frequency of occurrence, listing the most common values first. If you would prefer to sort by the input values, rather than by occurrence, set the SortResults property to False.
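To make the three properties concrete, the following plain-Python sketch mimics what the node computes. It is an illustration only, not the node's implementation: the function `histogram` and its parameters are hypothetical stand-ins for InputFields, ResultField, and SortResults, records are assumed to be dictionaries, and when several fields are selected the sketch assumes each unique combination of values is counted as one row.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Mapping, Sequence


def histogram(
    records: Iterable[Mapping[str, Any]],
    input_fields: Sequence[str],
    result_field: str = "Count",
    sort_results: bool = True,
) -> List[Dict[str, Any]]:
    """Hypothetical emulation of the Histogram node (not its real code)."""
    # Key each record by the tuple of its values in the selected fields.
    # All other fields are dropped, as unselected InputFields are not output.
    counts = Counter(tuple(rec[f] for f in input_fields) for rec in records)

    if sort_results:
        # SortResults = True: most frequent value first.
        # Counter.most_common keeps first-seen order for equal counts.
        items = counts.most_common()
    else:
        # SortResults = False: order by the input values themselves.
        items = sorted(counts.items())

    # Emit the count field first, followed by the selected fields.
    return [{result_field: n, **dict(zip(input_fields, key))} for key, n in items]


rows = [{"colour": "red"}, {"colour": "blue"}, {"colour": "red"}]
print(histogram(rows, ["colour"]))
```

Run as a script, this prints `[{'Count': 2, 'colour': 'red'}, {'Count': 1, 'colour': 'blue'}]`. With `sort_results=False`, the "blue" row comes first because the rows are ordered by value rather than by count.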

The Histogram node works best on fields that have a relatively small number of unique values. If a field has many unique values, one approach is to use a Transform node to group the values into ranges before passing the data to the Histogram node, as in the following example.

Example - Working with numeric data

This example uses the default data in a Create Data node and analyzes the values in the "rand" field.

Because this field contains a large number of unique values, use a Transform node to group the values into ranges before passing them to the Histogram node.

  1. Create a basic data flow containing a Create Data node connected to a Transform node, then connect the Transform node to a Histogram node.
  2. In the Transform node ConfigureFields property, enter the following Python script:

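    # Pass the rand field through to the output unchanged
    out1.rand = in1.rand
    # Declare a new string output field to hold the range label
    out1.range = str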

    In the ProcessRecords property, enter the following:

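    # Read the current value of the rand field
    x = in1.rand

    # Assign the value to one of four labeled groups
    # ("label" avoids shadowing Python's built-in range)
    if x < 0:
        label = "< 0"
    elif x < 1000:
        label = "0 to 999"
    elif x < 10000:
        label = "1K to 10K"
    else:
        label = "> 10K"

    # Write the original value and its group label to the output
    out1.rand = x
    out1.range = label

  3. Run the Create Data and Transform nodes.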

    The Transform node outputs the "rand" field along with a "range" field that assigns each "rand" value to one of the four groups. With the default Create Data output, every row falls into the "< 0", "1K to 10K", or "> 10K" group; no rows fall into the "0 to 999" group.

  4. In the InputFields property of the Histogram node, select the "range" field, then run the Histogram node.

    The Histogram node output is as follows:

    Count (long)    range (string)
    14              > 10K
    4               1K to 10K
    2               < 0
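The if/elif chain in the ProcessRecords script hard-codes the group boundaries. If you expect to change or add groups, one option is a table-driven version built on Python's standard `bisect` module. The standalone sketch below checks, outside Data360 Analyze, that this approach assigns the same labels as the chain. The names `BOUNDARIES`, `LABELS`, `chain_label`, and `table_label` are hypothetical, and whether `bisect` can be imported inside a Transform node script is an assumption to verify in your environment.

```python
from bisect import bisect_right

# Hypothetical boundary table: each boundary starts a new group, matching
# the "x < boundary" comparisons in the example's if/elif chain.
BOUNDARIES = [0, 1000, 10000]
LABELS = ["< 0", "0 to 999", "1K to 10K", "> 10K"]


def chain_label(x):
    """The example's ProcessRecords logic, rewritten as a plain function."""
    if x < 0:
        return "< 0"
    elif x < 1000:
        return "0 to 999"
    elif x < 10000:
        return "1K to 10K"
    else:
        return "> 10K"


def table_label(x):
    """Same grouping, driven by the boundary table instead of branches."""
    # bisect_right returns how many boundaries are <= x, which is the
    # index of the group that x belongs to.
    return LABELS[bisect_right(BOUNDARIES, x)]


# Compare both versions on values around every boundary.
for value in [-1, 0, 999, 1000, 9999, 10000, 123456.7]:
    assert chain_label(value) == table_label(value), value
```

`bisect_right` is used rather than `bisect_left` so that a value equal to a boundary (for example, exactly 1000) lands in the higher group, as it does in the chain. Adding a group then means adding one entry to each list.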

Properties

InputFields

Select or type the names of one or more input fields to be analyzed.

From the menu button to the right of each field name, you can select Case Insensitive to match values regardless of letter case or, for more advanced cases, Compare Substrings. The menu also has a Delete option that removes the selected field from the list. The output records are also sorted by these matching criteria, and you can change the sort order to Sort Descending (high to low).

A value is required for this property.
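Case Insensitive matching means that values differing only in letter case are counted as one value. The plain-Python sketch below illustrates that idea; it is not the node's implementation. Which spelling the node reports for a merged group is not stated on this page, so keeping the first spelling encountered is an assumption, as is the function name `count_case_insensitive`. Compare Substrings is not modeled.

```python
from collections import Counter


def count_case_insensitive(values):
    """Hypothetical sketch: count strings, ignoring differences in case.

    Each group is reported under the first spelling seen (an assumption).
    Results are ordered by descending count, most frequent first.
    """
    counts = Counter()
    display = {}  # normalized key -> first original spelling encountered
    for v in values:
        key = v.casefold()  # casefold() is a stronger form of lower()
        display.setdefault(key, v)
        counts[key] += 1
    return [(display[k], n) for k, n in counts.most_common()]


print(count_case_insensitive(["Red", "red", "BLUE", "RED", "blue"]))
```

This prints `[('Red', 3), ('BLUE', 2)]`: the three spellings of "red" are counted together, and so are the two spellings of "blue".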

ResultField

Optionally specify the name of the field to create and populate with occurrence counts.

The default value is Count.

SortResults

Optionally specify whether to sort the results by occurrence count, in descending order. If set to False, the results are sorted by the input values instead.

The default value is True.

Inputs and outputs

Inputs: in1.

Outputs: Histogram Data.