The Group node groups records by one or more fields and produces an array associated with each group. These arrays can then be processed using other Analysis features, such as array functions or the Javascript node.
Grouped Output Field Name
This is the name of the new field that will be produced by your grouping and will contain one array for each group.
Fields to Group By
This is the set of fields that will be used to create the groups.
Group Node Example
Suppose you had the following dataset.
purpose | loan_amnt | installment |
---|---|---|
credit_card | 1000 | 150 |
credit_card | 1500 | 200 |
credit_card | 2000 | 250 |
home_improvement | 5000 | 300 |
home_improvement | 10000 | 400 |
home_improvement | 20000 | 900 |
small_business | 10000 | 500 |
small_business | 15000 | 750 |
small_business | 17500 | 850 |
If you named the Grouped Output Field amounts and used purpose as the Field to Group By, the Group node would produce the following output.
purpose | amounts |
---|---|
credit_card | [{loan_amnt: 1000, installment: 150}, {loan_amnt: 1500, installment: 200}, {loan_amnt: 2000, installment: 250}] |
home_improvement | [{loan_amnt: 5000, installment: 300}, {loan_amnt: 10000, installment: 400}, {loan_amnt: 20000, installment: 900}] |
small_business | [{loan_amnt: 10000, installment: 500}, {loan_amnt: 15000, installment: 750}, {loan_amnt: 17500, installment: 850}] |
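To make the transformation concrete, the single-field grouping shown above can be approximated in plain JavaScript. This is only a sketch of the behavior described here, not the Group node's actual implementation; `groupByField` and its parameters are hypothetical names introduced for illustration, and the sample records are a subset of the dataset above.

```javascript
// Hypothetical sketch: group records by one field and collect the remaining
// fields of each record into an array stored under outputName.
function groupByField(records, groupField, outputName) {
  var groups = new Map(); // group value -> output row, in first-seen order
  records.forEach(function (record) {
    var value = record[groupField];
    if (!groups.has(value)) {
      var row = {};
      row[groupField] = value;
      row[outputName] = [];
      groups.set(value, row);
    }
    // Keep every field except the one being grouped on.
    var item = {};
    Object.keys(record).forEach(function (field) {
      if (field !== groupField) { item[field] = record[field]; }
    });
    groups.get(value)[outputName].push(item);
  });
  return Array.from(groups.values());
}

var records = [
  { purpose: "credit_card", loan_amnt: 1000, installment: 150 },
  { purpose: "credit_card", loan_amnt: 1500, installment: 200 },
  { purpose: "home_improvement", loan_amnt: 5000, installment: 300 }
];

var grouped = groupByField(records, "purpose", "amounts");
console.log(JSON.stringify(grouped, null, 2));
```

Each output row carries the grouping value plus one array, which mirrors the shape of the table above.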
With this newly structured dataset, you could process the amounts field as an array.
For example, you could pass the amounts field into a Javascript node and run the following script.
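    // Add up loan_amnt across every record in this group's amounts array.
    var sum = 0;
    for (var i = 0; i < input.amounts.length; i++) {
        sum += input.amounts[i].loan_amnt;
    }
    // Emit the total alongside the field the records were grouped by.
    output.total_loan_amnt = sum;
    output.purpose = input.purpose;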
Such a script would produce the following dataset.
purpose | total_loan_amnt |
---|---|
credit_card | 4500 |
home_improvement | 35000 |
small_business | 42500 |
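The same kind of total could also be written with `Array.prototype.reduce`, which avoids the explicit loop counter. The sketch below stubs `input` and `output` as plain objects so it can run outside the Javascript node; inside the node these objects are assumed to be provided, as in the script above. The `total_installment` field is an illustrative addition, and support for `reduce` in the node's JavaScript engine is assumed rather than confirmed here.

```javascript
// Stand-ins for the objects the Javascript node is assumed to provide:
// `input` holds one grouped record, `output` collects the resulting fields.
var input = {
  purpose: "credit_card",
  amounts: [
    { loan_amnt: 1000, installment: 150 },
    { loan_amnt: 1500, installment: 200 },
    { loan_amnt: 2000, installment: 250 }
  ]
};
var output = {};

// Sum one numeric field across the group's array, starting from 0.
function sumField(items, field) {
  return items.reduce(function (sum, item) {
    return sum + item[field];
  }, 0);
}

output.purpose = input.purpose;
output.total_loan_amnt = sumField(input.amounts, "loan_amnt");
output.total_installment = sumField(input.amounts, "installment");

console.log(output);
```

For the credit_card group this yields a total_loan_amnt of 4500, matching the table above.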
Grouping by Multiple Fields
While the example above demonstrates grouping by a single field, you can also use the Group node to group by multiple fields. For example, consider a scenario where our original dataset had an additional field called grade.
purpose | grade | loan_amnt | installment |
---|---|---|---|
credit_card | A | 1000 | 150 |
credit_card | B | 1500 | 200 |
credit_card | B | 2000 | 250 |
home_improvement | A | 5000 | 300 |
home_improvement | A | 10000 | 400 |
home_improvement | C | 20000 | 900 |
small_business | B | 10000 | 500 |
small_business | B | 15000 | 750 |
small_business | D | 17500 | 850 |
Using both purpose and grade as Fields to Group By would produce the following dataset:
purpose | grade | amounts |
---|---|---|
credit_card | A | [{loan_amnt: 1000, installment: 150}] |
credit_card | B | [{loan_amnt: 1500, installment: 200}, {loan_amnt: 2000, installment: 250}] |
home_improvement | A | [{loan_amnt: 5000, installment: 300}, {loan_amnt: 10000, installment: 400}] |
home_improvement | C | [{loan_amnt: 20000, installment: 900}] |
small_business | B | [{loan_amnt: 10000, installment: 500}, {loan_amnt: 15000, installment: 750}] |
small_business | D | [{loan_amnt: 17500, installment: 850}] |
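Grouping by several fields can be thought of as grouping by a composite key built from all of the chosen fields. The following is a minimal sketch of that idea in plain JavaScript, not the Group node's actual implementation; `groupByFields` and its parameters are hypothetical names, and the sample records are a subset of the dataset above.

```javascript
// Hypothetical sketch: group by several fields using a composite key.
function groupByFields(records, groupFields, outputName) {
  var groups = new Map(); // composite key -> output row, in first-seen order
  records.forEach(function (record) {
    // Serializing the list of values gives an unambiguous composite key,
    // for example '["credit_card","B"]'.
    var key = JSON.stringify(groupFields.map(function (f) { return record[f]; }));
    if (!groups.has(key)) {
      var row = {};
      groupFields.forEach(function (f) { row[f] = record[f]; });
      row[outputName] = [];
      groups.set(key, row);
    }
    // Array items keep only the fields that are not grouping fields.
    var item = {};
    Object.keys(record).forEach(function (f) {
      if (groupFields.indexOf(f) === -1) { item[f] = record[f]; }
    });
    groups.get(key)[outputName].push(item);
  });
  return Array.from(groups.values());
}

var records = [
  { purpose: "credit_card", grade: "A", loan_amnt: 1000, installment: 150 },
  { purpose: "credit_card", grade: "B", loan_amnt: 1500, installment: 200 },
  { purpose: "credit_card", grade: "B", loan_amnt: 2000, installment: 250 },
  { purpose: "home_improvement", grade: "A", loan_amnt: 5000, installment: 300 }
];

var grouped = groupByFields(records, ["purpose", "grade"], "amounts");
console.log(JSON.stringify(grouped, null, 2));
```

Records land in the same array only when they match on every grouping field, which is why credit_card splits into separate A and B rows.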
Group Node Limitations
Because of browser memory constraints, there is a limit on how many records the Group node can handle per array. Take care to minimize the number of records that will be placed in each group.
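One way to anticipate oversized arrays is to count how many records would fall into each group before grouping. The sketch below is a hypothetical pre-check in plain JavaScript; `countPerGroup` and the `MAX_PER_GROUP` threshold are illustrative assumptions, not a documented limit or feature of the Group node.

```javascript
// Hypothetical pre-check: count how many records share each group value.
function countPerGroup(records, groupField) {
  var counts = Object.create(null); // no inherited keys to collide with
  records.forEach(function (record) {
    var value = record[groupField];
    counts[value] = (counts[value] || 0) + 1;
  });
  return counts;
}

var records = [
  { purpose: "credit_card", loan_amnt: 1000 },
  { purpose: "credit_card", loan_amnt: 1500 },
  { purpose: "small_business", loan_amnt: 10000 }
];

var MAX_PER_GROUP = 1; // illustrative threshold only, not a documented limit
var counts = countPerGroup(records, "purpose");

// Collect the group values whose arrays would exceed the threshold.
var oversized = Object.keys(counts).filter(function (value) {
  return counts[value] > MAX_PER_GROUP;
});
console.log(oversized);
```

If a group turns out to be too large, adding another field to Fields to Group By is one way to split it into smaller arrays.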