Statistics

Data360 Analyze Server Help

Product type: Software
Portfolio: Verify
Product family: Data360
Product: Data360 Analyze
Version: 3.12
Language: English
Copyright: 2023
First publish date: 2016

Performs a quick statistical analysis of numeric input data, using the sum, min, max, average, count, first, last, standard deviation (sample and population), variance (sample and population), and null count functions.

To configure this node:

  1. In the FieldList property, specify the names of the fields over which you want to run the analysis. You can choose fields by selecting the Input Fields option in the property menu, or you can type the names of the fields in the following format:

    fields.<field name>

    Tip: If you do not enter any field names, the node analyzes all input fields.
  2. If you want to group the records by specific fields to produce more granular results, select the fields that you want to group by in the GroupBy property.

    The GroupBy property is a multi-field picker, a property type which is found on a number of nodes. For more information on this property type, see Multi-field picker.

  3. If you want to exclude any statistical functions from the evaluation, set the corresponding property to False. For example, to exclude the average function, set the IncludeAverage property to False.
    Tip: All functions run by default, but you can turn them off individually to improve performance. In particular, when working with large data sets, we recommend that you set IncludeSum and IncludeAverage to False if you do not need them.

By default, the node outputs one record per input field per group, containing the statistics for that field. To output the statistics for all fields in a single record per group instead, set the WideOutput property to True.
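The default output layout can be pictured with a minimal plain-Python sketch. This is not the node's implementation: the sample records, the summarize function, and the output key names are all hypothetical, and None stands in for a NULL value.

```python
from itertools import groupby
from statistics import mean

# Hypothetical input records; None stands in for a NULL value.
records = [
    {"region": "East", "sales": 10, "units": 1},
    {"region": "East", "sales": 30, "units": None},
    {"region": "West", "sales": 5, "units": 4},
]

def summarize(records, group_key, field_names):
    """Return one summary row per input field per group."""
    rows = []
    # Sort by the group key first so that groupby sees each group once.
    ordered = sorted(records, key=lambda r: r[group_key])
    for group, members in groupby(ordered, key=lambda r: r[group_key]):
        members = list(members)
        for name in field_names:
            values = [m[name] for m in members if m[name] is not None]
            rows.append({
                "group": group,
                "field": name,
                "count": len(members),
                "null_count": len(members) - len(values),
                "sum": sum(values),
                "min": min(values) if values else None,
                "max": max(values) if values else None,
                "average": mean(values) if values else None,
            })
    return rows

rows = summarize(records, "region", ["sales", "units"])
for row in rows:
    print(row)
```

With two groups and two analyzed fields, the sketch produces four rows, which mirrors the "one record per input field per group" layout.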

Properties

FieldList

Optionally specify a comma-separated list of input fields over which you want to run the analysis, in the following format: fields.<fieldName1>, fields.<fieldName2>

If no value is specified, all input fields are analyzed.

GroupBy

Select or type the names of the fields that you want to group by.

By default, the node also sorts the data in ascending order (low to high). From the menu button to the right of the field name, you can change the sort order to Sort Descending (high to low), select Case Insensitive sorting, or, for more advanced cases, choose to Compare Substrings. You can also Delete a selected field from the list.

If you have added multiple group by criteria, you can drag and drop the fields to reorder them if needed. The order of the fields determines which field the data will be sorted by first.

For advanced use cases, you can select the Advanced tab and type a Python script that specifies the fields that you want to group by. In this case, use the notation fields.<name>, separating each field reference with a comma. To sort in descending order, use the fn.desc function.

Example: fields.FirstName, fn.desc(fields.DOB)
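To see the ordering that this example describes, the following plain-Python sketch sorts some hypothetical records by FirstName ascending and then by DOB descending. It runs outside the product and does not use the node's fields or fn objects; the sample data is invented for illustration.

```python
from datetime import date

# Hypothetical records with the two fields used in the example above.
people = [
    {"FirstName": "Ann", "DOB": date(1990, 5, 1)},
    {"FirstName": "Bob", "DOB": date(1985, 3, 9)},
    {"FirstName": "Ann", "DOB": date(1995, 7, 4)},
]

# Python's sort is stable, so sort by the secondary key (DOB, descending)
# first, and then by the primary key (FirstName, ascending).
ordered = sorted(people, key=lambda p: p["DOB"], reverse=True)
ordered = sorted(ordered, key=lambda p: p["FirstName"])

for person in ordered:
    print(person["FirstName"], person["DOB"].isoformat())
```

The first field in the list decides the primary order, and later fields only break ties within it.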

WideOutput

Optionally specify the output format. If set to True, the statistics for all fields are output in one record per group. If set to False, they are output as one record per input field per group.

The default value is False.
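The difference between the two layouts can be sketched in plain Python as a pivot from per-field rows to one row per group. The to_wide function and the column naming scheme (field name, underscore, statistic name) are assumptions made for illustration; the node's actual output field names may differ.

```python
# Hypothetical per-field rows, as in the default (WideOutput = False) layout.
long_rows = [
    {"group": "East", "field": "sales", "sum": 40, "max": 30},
    {"group": "East", "field": "units", "sum": 1, "max": 1},
    {"group": "West", "field": "sales", "sum": 5, "max": 5},
    {"group": "West", "field": "units", "sum": 4, "max": 4},
]

def to_wide(long_rows):
    """Collapse per-field rows into one row per group."""
    wide = {}
    for row in long_rows:
        out = wide.setdefault(row["group"], {"group": row["group"]})
        for stat, value in row.items():
            if stat not in ("group", "field"):
                # Assumed naming scheme: <field>_<statistic>.
                out[f"{row['field']}_{stat}"] = value
    return list(wide.values())

wide_rows = to_wide(long_rows)
for row in wide_rows:
    print(row)
```

Four per-field rows become two rows, one per group, each carrying the statistics for every analyzed field.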

IncludeCount

Optionally specify whether to calculate the number of records across the group defined by the GroupBy property.

The default value is True.

IncludeNullCount

Optionally specify whether to calculate the number of NULL values of each field defined in the FieldList property across the group defined by the GroupBy property.

The default value is True.

IncludeSum

Optionally specify whether to calculate the sum of each field defined in the FieldList property across the group defined by the GroupBy property.

The default value is True.

IncludeAverage

Optionally specify whether to calculate the average of each field defined in the FieldList property across the group defined by the GroupBy property.

The default value is True.

IncludeMin

Optionally specify whether to calculate the minimum value of each field defined in the FieldList property across the group defined by the GroupBy property.

The default value is True.

IncludeMax

Optionally specify whether to calculate the maximum value of each field defined in the FieldList property across the group defined by the GroupBy property.

The default value is True.

IncludeFirst

Optionally specify whether to output the first value of each field defined in the FieldList property across the group defined by the GroupBy property.

The default value is True.

IncludeLast

Optionally specify whether to output the last value of each field defined in the FieldList property across the group defined by the GroupBy property.

The default value is True.

IncludeSampleStdev

Optionally specify whether to calculate the sample standard deviation of each field defined in the FieldList property across the group defined by the GroupBy property.

The default value is True.

IncludePopulationStdev

Optionally specify whether to calculate the population standard deviation of each field defined in the FieldList property across the group defined by the GroupBy property.

The default value is True.

IncludeSampleVariance

Optionally specify whether to calculate the sample variance of each field defined in the FieldList property across the group defined by the GroupBy property.

The default value is True.

IncludePopulationVariance

Optionally specify whether to calculate the population variance of each field defined in the FieldList property across the group defined by the GroupBy property.

The default value is True.
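The sample and population variants differ only in the divisor: under the standard definitions, the population variance divides the sum of squared deviations by n, and the sample variance divides it by n - 1. The sketch below uses Python's standard statistics module on invented values, assuming the node follows these standard definitions.

```python
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical field values for a single group.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Population statistics divide by n.
print("population variance:", pvariance(values))
print("population stdev:   ", pstdev(values))

# Sample statistics divide by n - 1, so they are slightly larger.
print("sample variance:    ", variance(values))
print("sample stdev:       ", stdev(values))
```

Sample statistics need at least two non-null values in a group, because the divisor n - 1 is zero for a single value.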

SortInput

Optionally specify whether the node sorts the input by the fields specified in the GroupBy property.

The default value is True.

UnsortedInputBehavior

Optionally specify the behavior when input data has not been sorted. Choose from:

  • Error - The node will fail if the input records are not sorted according to the GroupBy criteria.
  • Log - If the input records are not sorted according to the GroupBy criteria, a warning is logged the first time the problem is encountered, but the node continues processing.
  • Ignore - No action is taken if the input records are not sorted according to the GroupBy criteria.

The default value is Error.
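The three behaviors can be sketched in plain Python as follows. This is an illustration of the described semantics only, not the node's code: the count_unsorted function, the logger name, and the returned violation count are hypothetical.

```python
import logging

logger = logging.getLogger("statistics_sketch")  # hypothetical logger name

def count_unsorted(keys, behavior="Error"):
    """Walk group keys in input order and react to out-of-order keys.

    Returns the number of out-of-order keys found (Log and Ignore only).
    """
    violations = 0
    warned = False
    for index in range(1, len(keys)):
        if keys[index] < keys[index - 1]:
            if behavior == "Error":
                # Error: fail as soon as unsorted input is detected.
                raise ValueError(
                    f"input not sorted at record {index}: "
                    f"{keys[index]!r} follows {keys[index - 1]!r}"
                )
            if behavior == "Log" and not warned:
                # Log: warn only the first time, then keep processing.
                logger.warning("input not sorted at record %d", index)
                warned = True
            # Ignore: take no action.
            violations += 1
    return violations

print(count_unsorted(["East", "West", "North"], behavior="Log"))
```

Error is the safest choice when grouping depends on sorted input, because out-of-order records would otherwise split a group into several runs.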

Inputs and outputs

Inputs: Input Records.

Outputs: Statistics.