Histogram (Deprecated) - Data360_Analyze - Latest

Data360 Analyze Server Help

Product type: Software
Portfolio: Verify
Product family: Data360
Product: Data360 Analyze
Version: Latest
Language: English
Product name: Data360 Analyze
Title: Data360 Analyze Server Help
Copyright: 2024
First publish date: 2016
Last updated: 2024-11-28
Published on: 2024-11-28T15:26:57.181000

This deprecated node creates a list of the unique values in the input data set, with an occurrence count for each value. By default, the most commonly occurring value is listed first.

CAUTION:
This node has been deprecated and will not be supported in a future release. As an alternative, you can use the Histogram node, which provides similar functionality but is implemented in Python rather than Data360 Analyze Script.

To configure this node:

  1. In the InputFields property, specify the names of one or more fields from the input data set that you want to include in the output. Input fields that you do not specify in this property are not output when the node is run. Each unique value in the specified field (or, if you specify multiple fields, each unique combination of values) is listed in a separate row in the output, along with a count of how many times it occurs. To specify more than one field, separate the field names with commas.
  2. In the ResultField property, you can optionally specify a name for the output field that contains the count for each value. If you do not specify a name, the default of "Count" is used.
  3. By default, the node will sort the output by frequency of occurrence, listing the most common values first. If you would prefer to sort by the input values, rather than by occurrence, set the SortResults property to False.
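The behavior described in the steps above can be approximated in plain Python, outside Data360 Analyze. This is a minimal sketch, not the node's actual implementation: it assumes records are Python dictionaries whose selected values are mutually comparable, and the function `histogram` and its parameters (chosen to mirror InputFields, ResultField and SortResults) are hypothetical. The order of tied counts, and the exact ordering used when SortResults is False, may differ in the node.

```python
from collections import Counter


def histogram(records, input_fields, result_field="Count", sort_results=True):
    """Count occurrences of each unique value (or combination of values).

    Hypothetical sketch: `input_fields` mirrors InputFields, `result_field`
    mirrors ResultField and `sort_results` mirrors SortResults.
    """
    # One key per record, built from the selected fields only; all other
    # fields are dropped, as they are in the node's output.
    counts = Counter(tuple(rec[f] for f in input_fields) for rec in records)

    if sort_results:
        # Most common first; ties keep the order in which keys were first seen.
        items = counts.most_common()
    else:
        # Sort by the input values instead of by occurrence.
        items = sorted(counts.items())

    rows = []
    for key, n in items:
        row = {result_field: n}
        row.update(zip(input_fields, key))
        rows.append(row)
    return rows


records = [
    {"type": "a", "status": 1},
    {"type": "b", "status": 1},
    {"type": "a", "status": 2},
    {"type": "a", "status": 1},
]
print(histogram(records, ["type"]))
# [{'Count': 3, 'type': 'a'}, {'Count': 1, 'type': 'b'}]
```

Passing several field names, for example `["type", "status"]`, counts combinations such as `("a", 1)` rather than counting each field separately.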

The Histogram node is best used with data sets where the fields have a relatively small number of unique values. If a field has a large number of unique values, one approach is to use a Transform node to group the values into categories, reducing the number of unique values before they reach the Histogram node, as in the following example.

Example - Working with numeric data

This example uses the default data in a Create Data node and analyzes the values in the "rand" field.

Because this field contains a large number of unique values, you can use a Transform node to group the values into ranges before passing them to the Histogram node.

  1. Create a basic data flow containing a Create Data node connected to a Transform node, then connect the Transform node to a Histogram node.
  2. In the ConfigureFields property of the Transform node, enter the following Python script:
    out1.rand = in1.rand
    out1.range = str

    In the ProcessRecords property, enter the following:

    x = in1.rand
    # Store the label in "label" rather than "range" so Python's built-in range() is not shadowed
    if x < 0:
        label = "< 0"
    elif x < 1000:
        label = "0 to 999"
    elif x < 10000:
        label = "1K to 10K"
    else:
        label = "> 10K"

    out1.rand = x
    out1.range = label
  3. Run the Create Data and Transform nodes.

    The Transform node outputs the "rand" field along with a "range" field, which assigns each value from the "rand" field to one of the four specified groups. With the default data, every row falls into the "< 0", "1K to 10K", or "> 10K" group; none fall into the "0 to 999" group.

  4. In the InputFields property of the Histogram node, click the menu button, select Input Fields > in1 > range, and then run the Histogram node.

    The Histogram node output is as follows:

    Count (integer)   range (string)
    14                > 10K
    4                 1K to 10K
    2                 < 0
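To check the grouping logic outside Data360 Analyze, the following plain-Python sketch applies the same range boundaries as the ProcessRecords script and then counts each range, most common first. The function names `to_range` and `range_histogram` are hypothetical, and the sample values are invented for illustration; they are not the Create Data node's default data, so the counts differ from the output above.

```python
from collections import Counter


def to_range(x):
    """Map a number to its range label, using the ProcessRecords boundaries."""
    if x < 0:
        return "< 0"
    elif x < 1000:
        return "0 to 999"
    elif x < 10000:
        return "1K to 10K"
    else:
        return "> 10K"


def range_histogram(values):
    """Group values into ranges, then count each range, most common first."""
    return Counter(to_range(v) for v in values).most_common()


# Hypothetical sample values (NOT the Create Data node's default data).
sample = [-5, 12000, 15000, 2500, 99999, -1, 4000]
print(range_histogram(sample))
# [('> 10K', 3), ('< 0', 2), ('1K to 10K', 2)]
```

Because the conditions are checked in order, each boundary value lands in the higher group: 1000 is labeled "1K to 10K" and 10000 is labeled "> 10K".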

Properties

InputFields

Specify an expression containing the names of the fields to analyze. For example:

  • 'type' - to count occurrences of values within a single field
  • 'type', status, order_id - to count occurrences of value combinations across multiple fields
Note: You might need to enclose a field name in single quotes if it is a reserved scripting keyword.

A value is required for this property.
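The following sketch illustrates how a comma-separated InputFields value such as `'type', status, order_id` breaks down into individual field names, with the single quotes removed. It is a simplified, hypothetical parser written for illustration only: it assumes names are separated by commas, may be wrapped in single quotes, and do not themselves contain commas or quote characters. The node's actual expression syntax may accept more than this.

```python
def parse_input_fields(expression):
    """Split an InputFields-style expression into bare field names.

    Hypothetical, simplified parser; the node's real syntax may be richer.
    """
    names = []
    for part in expression.split(","):
        name = part.strip()
        # Remove surrounding single quotes used to escape reserved keywords.
        if len(name) >= 2 and name[0] == "'" and name[-1] == "'":
            name = name[1:-1]
        if not name:
            raise ValueError("empty field name in InputFields expression")
        names.append(name)
    return names


print(parse_input_fields("'type', status, order_id"))
# ['type', 'status', 'order_id']
```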

ResultField

Optionally specify the name of the field to create and populate with occurrence counts.

The default value is Count.

SortResults

Optionally specify whether to sort the results by descending occurrence count. If set to False, the results are sorted by the input values instead.

The default value is True.

Inputs and outputs

Inputs: in1.

Outputs: out1.