The Group Statistics stage allows you to run statistical operations across multiple data rows broken down into groups that you want to analyze. If no groups are defined all rows will be treated as belonging to one group.
Groups are defined by one or more fields that have the same value across multiple data rows.
For example, the data in this table could be grouped by region, state, or both.
Region | State |
---|---|
East | MD |
East | MD |
East | CT |
West | CA |
West | CA |
A group by Region would yield East and West. A group by State would yield California, Connecticut, and Maryland. A group by Region and State would yield East/Maryland, East/Connecticut, and West/California.
Input
The Group Statistics stage takes any field as input. Grouping can be performed on numeric or string data.
Options
Option | Description |
---|---|
Input fields |
Lists the fields in the flow that you can use to group records and perform calculations. |
Row |
Specifies the field or fields you want to use as categories for the calculations. For example, if you had data that included a Region field and you wanted to calculate total population by region, you would group by the Region field. To add a field, select the field in the Input fields list then click >>. |
Column |
Optional. For creating a pivot table, specifies the field or fields whose values you want to pivot into columns for the purposes of cross tabulation. To add a field, select the field in the Input fields list then click >>. For example, if you had data that includes regions and shipping dates, and you want to tally the number of shipments each day for each state, you must specify the state field as a row and the shipment date field as a column. |
Rows and Columns are presorted in the configured order |
Indicates that the input data is already sorted. If this check box is checked, the stage does not sort the data and performs the specified operation directly on the input data. |
Operation |
Specifies the calculation to perform on each group. To add an operation, select the field in the Input fields list that you want to use for the operation then click >>. For more information about the supported Group Statistics operations, see Operations. |
Type |
For the input and output fields, specifies the data type.
Note: When using the integer and long types, data can be lost if the
input number or calculated number from an operation contains
decimal data.
|
Get count of records that are computed upon | Returns the actual number of records in a group on which the
selected operation is performed. This column Computational
Count excludes those input records where the column
on which the operation is performed contains
|
Fields Tab
The Fields tab is used when creating a pivot table. For more information, see Creating a Pivot Table.Output Tab
Option | Description |
---|---|
Return one row for each group |
For each group of rows, return a single row that contains the aggregated data for all rows in the group. Individual rows will be dropped. If this option is not selected, all rows will be returned. No data will be dropped. This option is not available if you use the Percent Rank or ZScore operations. |
Return a count of rows in each group |
Returns the number of rows in each group. The default output field name that will contain the count is GroupCount. |
Return a unique ID for each group |
Returns a unique ID for each group of rows. The ID starts at 1 and increments by 1 for each additional group found. The default field name is GroupID. |