Creates a list of unique values from the input data set with an instance count for each value. By default, the most commonly occurring value is listed first.
To configure this node:
- In the InputFields property, specify the names of one or more fields from the input data set to include in the output. Input fields that you do not specify are not output when the node is run. For a single field, each unique value in that field is listed in a separate row of the output, along with a count of how many times it occurs. To specify more than one field, separate the field names with a comma; each unique combination of values across those fields is then listed in a separate row with its count.
- In the ResultField property, you can optionally specify a name for the output field that contains the count for each value. If you do not specify a name, the default name "Count" is used.
- By default, the node sorts the output by frequency of occurrence, listing the most common values first. To sort by the input values instead, set the SortResults property to False.
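The configuration described above can be approximated in plain Python to make the counting and sorting rules concrete. This is a minimal sketch, not the node's implementation or a Data360 Analyze API. It assumes records are dictionaries, that multiple fields are counted as value combinations, and that values with equal counts keep the order in which they were first seen (the documentation does not specify tie-breaking). The function `histogram` and its parameters are hypothetical stand-ins for InputFields, ResultField, and SortResults.

```python
from collections import Counter


def histogram(records, input_fields, result_field="Count", sort_results=True):
    """Hypothetical emulation of the Histogram (Superseded) node.

    input_fields ~ InputFields, result_field ~ ResultField,
    sort_results ~ SortResults. Records are assumed to be dicts.
    """
    # Count each unique combination of values across the chosen fields.
    # Fields not listed in input_fields are not carried into the output.
    counts = Counter(tuple(rec[f] for f in input_fields) for rec in records)

    if sort_results:
        # Most common first. sorted() is stable (also with reverse=True),
        # so equal counts keep their first-seen order (an assumption).
        ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    else:
        # Sort by the input values themselves rather than by frequency.
        ordered = sorted(counts.items(), key=lambda kv: kv[0])

    # One output row per unique value (or combination), plus its count.
    rows = []
    for values, n in ordered:
        row = dict(zip(input_fields, values))
        row[result_field] = n
        rows.append(row)
    return rows


# Hypothetical sample data.
data = [{"colour": "red", "size": 1},
        {"colour": "blue", "size": 2},
        {"colour": "red", "size": 3}]

print(histogram(data, ["colour"]))
# [{'colour': 'red', 'Count': 2}, {'colour': 'blue', 'Count': 1}]
print(histogram(data, ["colour"], sort_results=False))
# [{'colour': 'blue', 'Count': 1}, {'colour': 'red', 'Count': 2}]
```

Note that the "size" field is dropped from the output because it is not listed in `input_fields`, matching the rule that unspecified input fields are not output.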
The Histogram (Superseded) node is best used with data sets where the fields have a relatively small number of unique values. If a field has many unique values, one approach is to group them with a Transform node before passing the data to the Histogram (Superseded) node, as in the following example.
Example - Working with numeric data
This example uses the default data in a Create Data node and analyzes the values in the "rand" field.
Because this field contains a large number of unique values, you can first use a Transform node to group the values into ranges, reducing the number of unique values passed to the Histogram (Superseded) node.
- Create a basic data flow containing a Create Data node connected to a Transform node, then connect the Transform node to a Histogram (Superseded) node.
- In the Transform node ConfigureFields property, enter the following Python script:
out1.rand = in1.rand
out1.range = str
In the ProcessRecords property, enter the following:
x = in1.rand
if x < 0:
    range = "< 0"
elif x < 1000:
    range = "0 to 999"
elif x < 10000:
    range = "1K to 10K"
else:
    range = "> 10K"
out1.rand = x
out1.range = range
- Run the Create Data and Transform nodes.
The Transform node outputs the "rand" field along with a "range" field, which assigns each value from the "rand" field to one of the four specified groups. You will notice that all rows fall into the "1K to 10K" group.
- In the InputFields property of the Histogram (Superseded) node, click the menu button, then select Input Fields > in1 > range. Then run the Histogram (Superseded) node.
The Histogram (Superseded) node output is as follows:

| Count | range |
|---|---|
| 4 | 1K to 10K |
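To check the grouping logic outside the product, the Transform and Histogram steps above can be approximated in plain Python. This is a sketch only: `to_range` is a hypothetical helper that mirrors the ProcessRecords script, and the `rand_values` list is invented sample data, not the Create Data node's actual default output.

```python
from collections import Counter


def to_range(x):
    """Hypothetical mirror of the ProcessRecords script: map a number to a label."""
    if x < 0:
        return "< 0"
    elif x < 1000:
        return "0 to 999"
    elif x < 10000:
        return "1K to 10K"
    else:
        return "> 10K"


# Invented sample "rand" values; the real Create Data defaults may differ.
rand_values = [1523, 4780, 9021, 2345]

# Transform step: derive a "range" label for each value.
ranges = [to_range(x) for x in rand_values]

# Histogram step on the "range" field, most common value first.
for label, count in Counter(ranges).most_common():
    print(count, label)
# 4 1K to 10K
```

Note that the boundaries are half-open: 1000 falls in "1K to 10K", while 10000 falls in "> 10K".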
InputFields
Specify an expression containing the names of the fields to analyze. For example:
- 'type': to count occurrences of values within a single field
- 'type', status, order_id: to count occurrences of value combinations across multiple fields
A value is required for this property.
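Listing several fields counts combinations of values rather than producing a separate count for each field. The short Python sketch below illustrates the difference using `collections.Counter`; it is an analogy, not the node's code, and the `records` list with its `type` and `status` values is hypothetical.

```python
from collections import Counter

# Hypothetical input records.
records = [
    {"type": "A", "status": "open"},
    {"type": "A", "status": "closed"},
    {"type": "A", "status": "open"},
    {"type": "B", "status": "open"},
]

# Analogous to InputFields = 'type': one count per unique type value.
by_type = Counter(r["type"] for r in records)

# Analogous to InputFields = 'type', status: one count per (type, status) pair.
by_pair = Counter((r["type"], r["status"]) for r in records)

print(by_type.most_common())
# [('A', 3), ('B', 1)]
print(by_pair.most_common())
# [(('A', 'open'), 2), (('A', 'closed'), 1), (('B', 'open'), 1)]
```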
ResultField
Optionally specify the name of the field to create and populate with occurrence counts.
The default value is Count.
SortResults
Optionally specify whether to sort the results in descending order of occurrence count.
The default value is True.
Inputs and outputs