Agg Ex (Deprecated) - Data360_Analyze - Latest

Data360 Analyze Server Help

Product type
Software
Portfolio
Verify
Product family
Data360
Product
Data360 Analyze
Version
Latest
Language
English
Product name
Data360 Analyze
Title
Data360 Analyze Server Help
Copyright
2024
First publish date
2016
Last updated
2024-11-28
Published on
2024-11-28T15:26:57.181000

This deprecated node groups data based on a value or values that you specify in the GroupBy property.

CAUTION:
This node has been deprecated and will not be supported in a future release. As an alternative, the Aggregate node can be used to provide similar functionality, but the underlying code is Python rather than Data360 Analyze Script.

The input must be sorted by the field(s) in the GroupBy property. The Agg Ex (Extended Aggregate) node allows you to sort data as it is brought into the node via the SortInput property.

To configure the Agg Ex node:

  1. In the Script property, enter additional Data360 Analyze Script to control how the Agg Ex node analyzes the data. Some common functions that can be used are:
    • groupCount() - Counts the total number of instances of each value per field(s) specified in the GroupBy property. For example, if you specify two fields in the GroupBy property, then the groupCount function will only count values that match across the two fields. If you specify only one field in the GroupBy property, then all instances of each value in that field will be counted regardless of whether the corresponding values in other fields match or not.
    • groupSum() - Produces a sum of all matching values based on the field(s) specified in the GroupBy property. If the data type of the specified field is numeric, the numerical sum is returned. For string data sets, the values are concatenated.
    • firstInGroup - Boolean variable that has a value of true for the first instance of each unique value within each field specified in the GroupBy property. Commonly used to reset counters and initialize variables on each new value.
    • lastInGroup - Boolean variable that has a value of true for the last instance of each unique value within each field specified in the GroupBy property. This variable is included by default in the Script property to ensure that only one record is output per group after all records have been processed.
  2. In the GroupBy property, specify the field(s) by which you want to group the data. Note, using a value of either 1 or true in the GroupBy property will cause the aggregation to be performed over the entire input data set, rather than separately for each sub-group.
  3. If the input data is not sorted, set the SortInput property to true.

Example

In the following examples, the SortInput property is set to true.

Aggregating across one field using groupCount()

You have the following input data:

Product_Codeint Product_Namestring Payment_Amountunicode Customer_IDunicode
15 Tea 1.2 a
2 Coffee 1.5 b
3 Water NULL c
15 Tea_Earl Grey 1.2 d
15 Tea_Herbal 1.2 e
15 Tea 1.2 f
15 Tea 1.2 g

In the Script property, enter the following code:

Total = groupCount()emit Total, referencedFields(1, {{^GroupBy^}})where lastInGroup

In the GroupBy property, enter:

Product_Name

The output would list the Product_Name field and a "Total" column showing the total count of matching unique values:

Totalint Product_Namestring
1 Coffee
3 Tea
1 Tea_Earl Grey
1 Tea_Herbal
1 Water

Aggregating across two fields using groupCount()

You have the following input data:

Product_Codeint Product_Namestring Payment_Amountunicode Customer_IDunicode
15 Tea 1.2 a
2 Coffee 1.5 b
3 Water NULL c
15 Tea_Earl Grey 1.2 d
15 Tea_Herbal 1.2 e
15 Tea 1.2 f
15 Tea 1.2 g

In the Script property, enter the following code:

Total = groupCount()emit Total, referencedFields(1, {{^GroupBy^}})where lastInGroup

In the GroupBy property, enter:

Product_Name,Product_Code

The output would list the fields specified in the GroupBy property and would also include a "Total" column showing the total count of matching unique values across the 2 input fields:

Totalint Product_Namestring Product_Codeunicode
1 Coffee 2
3 Tea 15
1 Tea_Earl Grey 15
1 Tea_Herbal 15
1 Water 3

Aggregating across one field using groupSum()

You have the following input data:

Product_Codeint Product_Namestring Payment_Amountdouble
15 Tea 1.2
15 Tea 1.2
15 Tea 1.2

In the Script property, enter the following code:

Sum = groupSum(Payment_Amount)emit Sum,Product_Namewhere lastInGroup

In the GroupBy property, enter:

Payment_Amount

The output shows the sum of the matching Payment_Amount values grouped by the Product_Name:

Sumdouble Product_Namestring
3.6 Tea

Properties

SortInput

Optionally specify whether the input is sorted based on the field(s) specified in the GroupBy property.

The default value is False.

Script

Specify Script to control how the Agg Ex node analyzes the input data. Some common Script functions that can be used in the Agg Ex node are:

  • groupCount()
  • groupSum()
  • firstInGroup
  • lastInGroup

A value is required for this property.

GroupBy

Specify the fields by which you want to group the data. Note, using a value of either 1 or true in the GroupBy property will cause the aggregation to be performed over the entire input data set, rather than separately for each sub-group.

A value is required for this property.

StableSort

Optionally specify whether, for any items that are already in order, their order is maintained.

The default value is False.

VerifyInputsSorted

Optionally specify whether to verify that the inputs are sorted according to the values specified in the GroupBy property.

The default value is True.

VerifyInputsGrouped

Optionally specify whether to verify that the inputs are grouped according to the value(s) specified in the GroupBy property. This can be used in place of the VerifyInputsSorted property, but note that setting this property to true does not automatically disable the VerifyInputsSorted property which needs to be disabled separately. Note, using this property uses more memory than using the VerifyInputsSorted property.

The default value is False.

Inputs and outputs

Inputs: in1.

Outputs: out1.