Aggregate - Data360_Analyze - Latest

Data360 Analyze Server Help

Product type
Software
Portfolio
Verify
Product family
Data360
Product
Data360 Analyze
Version
Latest
Language
English
Product name
Data360 Analyze
Title
Data360 Analyze Server Help
Copyright
2024
First publish date
2016
Last updated
2024-11-28
Published on
2024-11-28T15:26:57.181000

Aggregates data within groups, based on a value or values specified in the GroupBy property.

Tip: The simplest way to configure the Aggregate node is by using the Fields editor in the GroupBy property and the Grid editor in the Operations property. Alternatively, you can select the Advanced tabs to write Python script to perform advanced calculations. You must choose to work in either the Grid or the Advanced tab, as working in one will reset the other.

  1. Connect the Aggregate node to a node that has successfully finished running. This will ensure that the input data is available.
  2. In the GroupBy property, select the field by which you want to group the aggregate values. You can type in the drop-down to filter the list of fields to those which contain the typed text. You can select multiple fields by repeating this step, or you can leave this property blank to perform the aggregation over the entire input data set.
    Tip: If you have connected the Aggregate node to a node that has not yet run, the GroupBy property will not contain a list of available fields. In this case, you can type the name of a field in the GroupBy property and press Enter to commit the field name. When the node is executed, an error will be generated if this field does not exist.

    By default, the Aggregate node will also sort your data in ascending order (low to high) before the records are processed. The menu button to the right of each item in the Fields grid allows you to:

    • Sort your data in descending order (high to low).
    • Sort the selected field case insensitively.
    • Compare Substrings.
    • Delete the selected field.

    The GroupBy property is a multi-field picker, a property type which is found on a number of nodes. For more information on this property type, see Multi-field picker.

  3. In the Operations grid, specify how you want to aggregate the data by selecting an Operation and a Field. The selected operation will be calculated over the rows of data grouped by the fields specified in the GroupBy property. You can edit the Output Field Name as required. Repeat this step if you want to add multiple operations.
    Tip: The Count function will always provide a count of the records in the grouping, regardless of the field selected.
    Tip: The Sample StDev and Sample Variance functions assume the values provided represent a sample, rather than the entire population.

    For more information on the available aggregate operations, see Grouping functions.

Examples

Grouping, summing and finding the minimum value

You have the following input to an Aggregate node:

ZipCodestring SalePricedouble SatisfactionScoreint
45228 12.84 6
84140 127.68 9
33064 198.65 7
33064 29.75 6
84140 22.05 7
84140 128.95 8
45228 75.99 4

  1. In the GroupBy property, select ZipCode to group the data by Zip code.
  2. Select the following options in the Operations grid, and rename the output field names as follows:

    Operation Field Output Field Name
    Count ZipCode Count
    Sum SalePrice SalesTotal
    Min SatisfactionScore LeastSatisfied
    Tip: The order of the fields in the output is determined by the order of operations listed in the grid.
  3. After renaming the last output field name, click away from the Operations property to ensure that your final edit has been committed, then run the node.

The output is as follows:

ZipCodestring Countlong SalesTotaldouble LeastSatisfiedint
33064 2 228.4 6
45228 2 88.83 4
84140 3 278.68 7

Excluding Null values

Tip: You must choose to work in either the Grid or the Advanced tab, as working in one will reset the other. This example uses the Advanced tab.

You have the following input to an Aggregate node:

ZipCodestring SalePricedouble SatisfactionScoreint
45228 12.84 6
84140 127.68 9
33064 198.65 7
33064 29.75 6
84140 22.05 7
84140 128.95 8
45228 75.99 NULL

  1. In the GroupBy property, select ZipCode to group the data by Zip code.
  2. Select the Advanced tab of the Operations property.

    In the Configure Fields property, specify which input field names to use in the output, and which output fields to create using aggregate functions. Common aggregation functions are group.count(), group.sum(), group.min(), group.max() and group.avg(). This example illustrates how the group() functions can accept additional arguments to modify their behavior e.g. to exclude Null values.

    Type the following in Configure Fields:

    #Copy over GroupBy field
    outputs[0]["ZipCode"] = inputs[0]["ZipCode"]
    
    #Statistics
    outputs[0]["Count"] = group.count(fields["ZipCode"])
    outputs[0]["TotalSales"] = group.sum(fields["SalePrice"])
    outputs[0]["LeastSatisfied"] = group.min(fields["SatisfactionScore"])
    
    #Count all answered Satisfaction Scores, excluding any Null values
    outputs[0]["Answered"] = group.count(fields["SatisfactionScore"], False)
  3. In Process Records, specify that you want to copy all records from the input to the output by typing:
    outputs[0] += inputs[0]
Tip: The Python script "outputs[0] += inputs[0]" is not tied to any particular input or output pin names, and as such will continue to work even if the input or output names change.

The output would be as follows:

ZipCodestring Countlong TotalSalesdouble LeastSatisfiedint Answeredlong
33064 2 228.4 6 2
45228 2 88.83 6 1
84140 3 278.68 7 3

See Grouping functions for further examples of using Python script to configure the Aggregate node.

Properties

GroupBy

Select or type the names of the fields that you want to group by.

By default, the Aggregate node will also sort the data in ascending order (low to high). From the menu button to the right of the field name, you have the option to change the sort order to Sort Descending (high to low), you can select Case Insensitive sorting, or for more advanced cases you can choose to Compare Substrings. There is also an option to Delete a selected field from the list.

If you have added multiple group by criteria, you can drag and drop the fields to reorder them if needed. The order of the fields determines which field the data will be sorted by first.

For advanced use cases, you can select the Advanced tab to type Python script to specify the fields that you want to group by. In this case, use the notation fields.<name> separating each field reference with a comma. To sort in descending order, use the fn.desc function.

Example: fields.FirstName, fn.desc(fields.DOB)

Operations

Specify the operations you wish to calculate over the rows of data grouped by the fields specified in the GroupBy property.

In the Grid, select an aggregate operation, then select the field that you want to apply this operation to.

To apply multiple aggregate operations, add each new operation on a new row. The menu to the right of each row allows you to change the order of the operations or delete a row. The order of the operations in the grid dictates the order of the columns in the output.

For advanced use cases, you can select the Advanced tab to type Python script. For example, to exclude Null values from a count:

outputs[0]["OutputFieldName"] = group.count(fields["InputFieldName"], False)

A value is required for this property.

SortInput

Optionally specify whether the input will be sorted based on the fields specified in the GroupBy property.

The default value is True.

ImplicitWriteMode

Optionally specify whether or not the output records will be automatically written by the node after processing each input record. Choose from:

  • Always - Unless an output record has been explicitly written within the Python script, or has been explicitly set to None, it will always be written.
  • Never - Output records will never be automatically written. They need to be explicitly written within the ProcessRecords script.
  • When Modified - Output records will be automatically written whenever something (including None) has been set on the output record, and the output has not been set to None, and the record has not been explicitly written.

The default value is When Modified.

ImplicitWriteEvaluation

Optionally set when the ImplicitWriteMode is to be evaluated. Choose from:

  • Per Record - The ImplicitWriteMode property will be evaluated for each input record meaning records may be implicitly written per input record.
  • Group End - The ImplicitWriteMode property will be evaluated at the end of each group meaning records will only ever be implicitly written at the end of each group as configured by the GroupBy property.

The default value is Group End.

UnsortedInputBehavior

Optionally specify the behavior when input data has not been sorted. Choose from:

  • Error - The node will fail if the input records are not sorted according to the GroupBy criteria.
  • Log - If the input records are not sorted according to the GroupBy criteria, then a warning is logged the first time the problem is encountered however the node will continue processing.
  • Ignore - No action is taken if the input records are not sorted according to the GroupBy criteria.

The default value is Error.

Example data flows

A number of sample Data Flows are available from the Data360 Samples workspace, found in the Directory.

In the Directory under the /Data360 Samples/Node Examples/ folder, you will find "Aggregating data", which shows examples of how to use this node.

Note: Upgrades will overwrite all data flows in the workspace. If you want to make changes to one of the sample data flows, we recommend you create a copy and save it elsewhere, using Save as...

Inputs and outputs

Inputs: in1.

Outputs: out1.