Aggregates data within groups, based on a value or values specified in the GroupBy property.
- Connect the Aggregate node to a node that has successfully finished running. This will ensure that the input data is available.
- In the GroupBy property, select the field by which you want to
group the aggregate values. You can type in the drop-down to filter the list of fields to
those which contain the typed text. You can select multiple fields by repeating this step,
or you can leave this property blank to perform the aggregation over the entire input data
set.Tip: If you have connected the Aggregate node to a node that has not yet run, the GroupBy property will not contain a list of available fields. In this case, you can type the name of a field in the GroupBy property and press Enter to commit the field name. When the node is executed, an error will be generated if this field does not exist.
By default, the Aggregate node will also sort your data in ascending order (low to high) before the records are processed. The menu button to the right of each item in the Fields grid allows you to:
- Sort your data in descending order (high to low).
- Sort the selected field case insensitively.
- Compare Substrings.
- Delete the selected field.
The GroupBy property is a multi-field picker, a property type which is found on a number of nodes. For more information on this property type, see Multi-field picker.
- In the Operations grid, specify how you want to aggregate the data by selecting an Operation and a Field. The selected operation will be calculated over the rows of data grouped by the fields specified in the GroupBy property. You can edit the Output Field Name as required. Repeat this step if you want to add multiple operations.Tip: The Count function will always provide a count of the records in the grouping, regardless of the field selected.Tip: The Sample StDev and Sample Variance functions assume the values provided represent a sample, rather than the entire population.
For more information on the available aggregate operations, see Grouping functions.
Examples
Grouping, summing and finding the minimum value
You have the following input to an Aggregate node:
ZipCodestring | SalePricedouble | SatisfactionScoreint |
---|---|---|
45228 | 12.84 | 6 |
84140 | 127.68 | 9 |
33064 | 198.65 | 7 |
33064 | 29.75 | 6 |
84140 | 22.05 | 7 |
84140 | 128.95 | 8 |
45228 | 75.99 | 4 |
- In the GroupBy property, select
ZipCode
to group the data by Zip code. - Select the following options in the Operations grid, and rename the output field names as follows:
Operation Field Output Field Name Count ZipCode Count Sum SalePrice SalesTotal Min SatisfactionScore LeastSatisfied Tip: The order of the fields in the output is determined by the order of operations listed in the grid. - After renaming the last output field name, click away from the Operations property to ensure that your final edit has been committed, then run the node.
The output is as follows:
ZipCodestring | Countlong | SalesTotaldouble | LeastSatisfiedint |
---|---|---|---|
33064 | 2 | 228.4 | 6 |
45228 | 2 | 88.83 | 4 |
84140 | 3 | 278.68 | 7 |
Excluding Null values
You have the following input to an Aggregate node:
ZipCodestring | SalePricedouble | SatisfactionScoreint |
---|---|---|
45228 | 12.84 | 6 |
84140 | 127.68 | 9 |
33064 | 198.65 | 7 |
33064 | 29.75 | 6 |
84140 | 22.05 | 7 |
84140 | 128.95 | 8 |
45228 | 75.99 | NULL |
- In the GroupBy property, select
ZipCode
to group the data by Zip code. - Select the Advanced tab of the Operations property.
In the Configure Fields property, specify which input field names to use in the output, and which output fields to create using aggregate functions. Common aggregation functions are
group.count()
,group.sum()
,group.min()
,group.max()
andgroup.avg()
. This example illustrates how the group() functions can accept additional arguments to modify their behavior e.g. to exclude Null values.Type the following in Configure Fields:
#Copy over GroupBy field outputs[0]["ZipCode"] = inputs[0]["ZipCode"] #Statistics outputs[0]["Count"] = group.count(fields["ZipCode"]) outputs[0]["TotalSales"] = group.sum(fields["SalePrice"]) outputs[0]["LeastSatisfied"] = group.min(fields["SatisfactionScore"]) #Count all answered Satisfaction Scores, excluding any Null values outputs[0]["Answered"] = group.count(fields["SatisfactionScore"], False)
- In Process Records, specify that you want to copy all records from the input to the output by typing:
outputs[0] += inputs[0]
outputs[0] += inputs[0]
" is not tied to any particular input or output pin names, and as such will continue to work even if the input or output names change.
The output would be as follows:
ZipCodestring | Countlong | TotalSalesdouble | LeastSatisfiedint | Answeredlong |
---|---|---|---|---|
33064 | 2 | 228.4 | 6 | 2 |
45228 | 2 | 88.83 | 6 | 1 |
84140 | 3 | 278.68 | 7 | 3 |
See Grouping functions for further examples of using Python script to configure the Aggregate node.
Properties
GroupBy
Select or type the names of the fields that you want to group by.
By default, the Aggregate node will also sort the data in ascending order (low to high). From the menu button to the right of the field name, you have the option to change the sort order to Sort Descending (high to low), you can select Case Insensitive sorting, or for more advanced cases you can choose to Compare Substrings. There is also an option to Delete a selected field from the list.
If you have added multiple group by criteria, you can drag and drop the fields to reorder them if needed. The order of the fields determines which field the data will be sorted by first.
For advanced use cases, you can select the Advanced tab to type Python script to specify the fields that you want to group by. In this case, use the notation fields.<name>
separating each field reference with a comma. To sort in descending order, use the fn.desc
function.
Example: fields.FirstName, fn.desc(fields.DOB)
Operations
Specify the operations you wish to calculate over the rows of data grouped by the fields specified in the GroupBy property.
In the Grid, select an aggregate operation, then select the field that you want to apply this operation to.
To apply multiple aggregate operations, add each new operation on a new row. The menu to the right of each row allows you to change the order of the operations or delete a row. The order of the operations in the grid dictates the order of the columns in the output.
For advanced use cases, you can select the Advanced tab to type Python script. For example, to exclude Null values from a count:
outputs[0]["OutputFieldName"] = group.count(fields["InputFieldName"], False)
A value is required for this property.
SortInput
Optionally specify whether the input will be sorted based on the fields specified in the GroupBy property.
The default value is True.
ImplicitWriteMode
Optionally specify whether or not the output records will be automatically written by the node after processing each input record. Choose from:
- Always - Unless an output record has been explicitly written within the Python script, or has been explicitly set to None, it will always be written.
- Never - Output records will never be automatically written. They need to be explicitly written within the ProcessRecords script.
- When Modified - Output records will be automatically written whenever something (including None) has been set on the output record, and the output has not been set to None, and the record has not been explicitly written.
The default value is When Modified.
ImplicitWriteEvaluation
Optionally set when the ImplicitWriteMode is to be evaluated. Choose from:
- Per Record - The ImplicitWriteMode property will be evaluated for each input record meaning records may be implicitly written per input record.
- Group End - The ImplicitWriteMode property will be evaluated at the end of each group meaning records will only ever be implicitly written at the end of each group as configured by the GroupBy property.
The default value is Group End.
UnsortedInputBehavior
Optionally specify the behavior when input data has not been sorted. Choose from:
- Error - The node will fail if the input records are not sorted according to the GroupBy criteria.
- Log - If the input records are not sorted according to the GroupBy criteria, then a warning is logged the first time the problem is encountered however the node will continue processing.
- Ignore - No action is taken if the input records are not sorted according to the GroupBy criteria.
The default value is Error.
Example data flows
A number of sample Data Flows are available from the Data360 Samples workspace, found in the Directory.
In the Directory under the /Data360 Samples/Node Examples/
folder, you will find "Aggregating data", which shows examples of how to use this node.
Inputs and outputs
Inputs: in1.
Outputs: out1.