Data360 Analyze Script contains a number of pre-built macros.
These macros, which act as a single instruction, are made up of a number of smaller instructions to enable you to perform common multi-step analytical procedures in one step.
As a first step in using these macros, you should assign the macro to a variable, for example:
<myVariable> = <groupMacro()>
Then, define the macro behavior by specifying which field you want to be the subject of the operation, as detailed in the following sections.
Note that while group macro values are calculated over an entire execution group, the value varies as records are processed. The value that is output is the value at the end of the execution group. If the macro contains an optional filter expression, only rows where that expression is true are counted. The firstInGroup
and lastInGroup
built-in variables can be used to determine whether execution is at the start or end of an execution group, see Built-in variables.
groupCount()
count()
macro is no longer supported. If you previously used count()
, please use groupCount()
.
groupCount
, commonly used with the Agg Ex (Deprecated) node, can be used to count the number of instances of a value for each specified field.
The groupCount
macro should be used in the following format, where filter
is an optional boolean expression specifying what to count and accumulatorVariable
is the name of the variable used to store interim results:
groupCount([filter[, accumulatorVariable]])
If the optional filter expression is given, when this value is encountered, the count is incremented.
If a field is specified in the GroupBy property, the data is grouped by the specified value and all instances of that value are counted. If two fields are specified in the GroupBy property, the groupCount
function will only count values that match across the two fields. If only one field is specified in the GroupBy property, all instances of each value in that field will be counted regardless of whether the corresponding values in other fields match or not.
groupCount()
variable is reset to 0 on firstInGroup. Due to this relying on the execution group, rather than the total number of execution iterations, the results of using groupCount
on, for example, a Transform (Deprecated) and Agg Ex (Deprecated) node may vary even with the same number of records input. Using groupCount() with a Transform (Deprecated) node
You have the following input data and want to count the number of rows:
color:string |
---|
Red |
Green |
Blue |
Yellow |
Cyan |
In the Script property, enter:
rows = groupCount()
emit rows, color
This gives the following output:
rows:int | color:string |
---|---|
1 | Red |
2 | Green |
3 | Blue |
4 | Yellow |
5 | Cyan |
Using groupCount() with an Agg Ex (Deprecated) node
You have the following input data and want to group the data by "type" and count the number of instances of each:
type:string |
---|
Type1 |
Type1 |
Type1 |
Type2 |
Type2 |
Type3 |
In the GroupBy property enter:
type
In the Script property enter:
rows = groupCount()emit rows, type
where lastInGroup
The output, as follows, shows each different "type" and how many rows there are of each:
rows:int | type:string |
---|---|
3 | Type1 |
2 | Type2 |
1 | Type3 |
In the following example, consider the same data and GroupBy property as in the previous example. This time, enter the following in the Script property:
rows = groupCount(type == "Type1")
emit rows, type
where lastInGroup
In this example, the output shows the data grouped by "type" and a count of the rows that contain the type "Type1":
rows:int | type:string |
---|---|
3 | Type1 |
0 | Type2 |
0 | Type3 |
You can use the optional accumulator variable to explicitly control the accumulation and reset of accumulation.
You have the following input data:
type:string | include:boolean | reset:boolean |
---|---|---|
Type1 | false | false |
Type1 | true | true |
Type1 | true | false |
Type2 | true | false |
Type2 | true | true |
Type2 | true | false |
Type3 | true | true |
In the GroupBy property, enter:
type
In the Script property, enter:
if ('reset') then groupVar = 0rows = groupCount('include' == true, groupVar)emit rows, type
The output would be:
rows:int | type:string |
---|---|
0 | Type1 |
1 | Type1 |
2 | Type1 |
1 | Type2 |
1 | Type2 |
2 | Type2 |
1 | Type3 |
groupMin()
Returns the minimum value of the expression attained during execution within the current execution group. If initialValue
is specified it will use that as the initial value to compare to.
The groupMin macro should be used in the following format, where the expression
is often a column or variable, the initialValue
is an optional expression, which if present, will be used as the initial value to compare to and accumulator variable
is the name of the variable used to store interim results:
groupMin(expression[, initialValue[, accumulator variable]])
Given the following input data:
rand: double |
---|
279.59 |
7.138 |
15.79 |
15.488 |
14.584 |
0.3465 |
1.865 |
0.26377 |
2.7272 |
30.176 |
1.345 |
minrand = groupMin(rand, 200.0)
emit minrand
The output would be as follows:
rand: double |
---|
200.0 |
7.138 |
7.138 |
7.138 |
7.138 |
0.3465 |
0.3465 |
0.26377 |
0.26377 |
0.26377 |
0.26377 |
In the following example, consider the same input data as in the previous example, but in this case the macro statement does not contain an initial value, as follows:
minrand = groupMin(rand)emit minrand
In this case, the output would be as follows:
rand: double |
---|
279.59 |
7.138 |
7.138 |
7.138 |
7.138 |
0.3465 |
0.3465 |
0.26377 |
0.26377 |
0.26377 |
0.26377 |
In this example, the optional accumulator variable is used with the groupMin
macro to explicitly control the accumulation and reset of accumulation, that is, you can control when to reset the concatenation of values..
You have the following input data:
rand: double |
---|
7.138 |
15.79 |
15.488 |
14.584 |
0.3465 |
1.865 |
0.26377 |
2.7272 |
30.176 |
1.345 |
Specify the following groupMin
macro statement:
minrand = groupMin(rand, 20, minVar)
To explicitly reset the accumulator variable "minVar" if it drops below 1, you would enter the following Data360 Analyze Script:
if (minVar < 1) then minVar = 1emit minrand
In this case, the output would be as follows:
minrand: double |
---|
20 |
7.138 |
7.138 |
7.138 |
7.138 |
1 |
1 |
1 |
1 |
1 |
groupMax()
The groupMax()
macro is generally used to find the maximum values in a given field. For example, groupMax()
can be used to group your input data by customer, then find the maximum amount paid by each customer.
You can also use groupMax()
with a Transform (Deprecated) node, for example, to find the largest amount paid by any customer.
This macro should be used in the following format, where initialValue
is optional and if specified will be used as the first value to compare to, expression
references a field in the input data and accumulatorVariable
is the name of the variable used to store interim results.
groupMax(expression[, initialValue[, accumulatorVariable]])
Using groupMax() with an Agg Ex (Deprecated) node
You have the following input data:
Customer name:string | Payments:double |
---|---|
Bob | 34.87 |
David | 23.56 |
Bob | 23.56 |
Lucy | 235.87 |
David | 12.45 |
Lucy | 1.45 |
Bob | 12.56 |
Lucy | 300.00 |
Using an Agg Ex (Deprecated) node, group the data by customer name, then find the maximum amount paid by each customer.
In the GroupBy property, enter Customer name
.
In the Script property, enter:
_GroupMax = groupMax(Payments)
emit referencedFields(1,{{^GroupBy^}}),_GroupMax
where lastInGroup
The following data is output:
Customer name:string | _GroupMax:double |
---|---|
Bob | 34.87 |
David | 23.56 |
Lucy | 300.00 |
In this example, the '_GroupMax' column shows the highest amount paid by each customer.
Using groupMax() with a Transform (Deprecated) node
You have the following input data:
Customer name:string | Payments:double |
---|---|
Bob | 34.87 |
David | 23.56 |
Bob | 23.56 |
Lucy | 235.87 |
David | 12.45 |
Lucy | 1.45 |
Bob | 12.56 |
Lucy | 300.00 |
Using a Transform (Deprecated) node and the groupMax()
macro, enter the following in the Script property:
_maxPayment= groupMax('Payments')
emit 'Customer name','Payments', _maxPayment
The following data is output:
Customer name:string | Payments:double | _maxPayment:double |
---|---|---|
Bob | 34.87 | 34.87 |
David | 23.56 | 34.87 |
Bob | 23.56 | 34.87 |
Lucy | 235.87 | 235.87 |
David | 12.45 | 235.87 |
Lucy | 1.45 | 235.87 |
Bob | 12.56 | 235.87 |
Lucy | 300.00 | 300.00 |
In this example, the groupMax()
values are listed in the '_maxPayment' column. As the records are processed, each new value is compared to the first value. If the value is less than the first value, there is no change and the first value is still used. When a value that is higher than the first value is encountered, the higher value is listed.
Specifying an initial value
You have the following input data:
rand: int |
---|
7138 |
15790 |
27959 |
17809 |
16901 |
3465 |
22852 |
11228 |
26567 |
30176 |
12345 |
In a Transform (Deprecated) node, you specify the following in the Script property:
maxrand = groupMax(rand, 10000)
emit maxrand
The output would be:
maxrand: int |
---|
10000 |
15790 |
27959 |
27959 |
27959 |
27959 |
27959 |
27959 |
27959 |
30176 |
30176 |
With the same input data, if you did not specify an initial value in the Script property, the output would instead be as follows:
maxrand: int |
---|
7138 |
15790 |
27959 |
27959 |
27959 |
27959 |
27959 |
27959 |
27959 |
30176 |
30176 |
groupString()
Generally, the groupString()
macro is used with the Transform (Deprecated) or Agg Ex (Deprecated) nodes. It is used to concatenate the values from a given field, and must be used with string data. The value from the first row of data is added to the second row, then the two second row values are added to the third row, and so on. Optionally, you can specify an accumulator variable to define when to reset the accumulation. This macro may be useful if you want to reduce the number of rows across which the string data is spread.
This macro should be used in the following format:
groupString( value [, separator[, accumulatorVariable]])
Where:
-
value
must be a string data type, and is commonly an input field name. -
separator
is an optional string value that indicates the character to use to separate each of the concatenated strings, for example, a comma. -
accumulatorVariable
is the name of the variable used to store interim results.
Using groupString() with a Transform (Deprecated) node
You have the following input data:
color:string |
---|
Red |
Green |
Blue |
Yellow |
Cyan |
In the Script property, enter the following:
strColor = groupString(color, ", ")
emit strColor
The output would be:
strcolor:string |
---|
Red |
Red, Green |
Red, Green, Blue |
Red, Green, Blue, Yellow |
Red, Green, Blue, Yellow, Cyan |
Using groupString() with an Agg Ex (Deprecated) node
You have the following input data:
type:string | color:string |
---|---|
Primary | Red |
Primary | Green |
Primary | Blue |
Secondary | Yellow |
Secondary | Magenta |
Secondary | Cyan |
In the GroupBy property, enter type
.
In the Script property, enter the following:
strColor = groupString(color, ", ")
emit type, strColor
The output would be:
type:string | strcolor:string |
---|---|
Primary | Red |
Primary | Red, Green |
Primary | Red, Green, Blue |
Secondary | Yellow |
Secondary | Yellow, Magenta |
Secondary | Yellow, Magenta, Cyan |
In this example, you have the same input data as in the previous example and the same field is specified in the GroupBy property. This time, in the Script property, enter the following:
strColor = groupString(color, ", ")
emit type, strColor
where lastInGroup
The output would be:
type:string | strcolor:string |
---|---|
Primary | Red, Green, Blue |
Secondary | Yellow, Magenta, Cyan |
Using the accumulator variable
The accumulator variable can be used to explicitly control the accumulation and reset of accumulation, that is, you can control when to reset the concatenation of values..
You have the following input data to a Transform (Deprecated)node:
color:string | reset:boolean |
---|---|
Red | false |
Green | false |
Blue | true |
Yellow | false |
Cyan | false |
In the Script property, enter the following:
if ('reset') thencatVal = ""total = groupString(color, ", ", catVal)emit total
The output would be as follows:
total:string |
---|
Red |
Red, Green |
Blue |
Blue, Yellow |
Blue, Yellow, Cyan |
groupSum()
sum()
is no longer supported. If you previously used sum()
, please use groupSum()
.
groupSum()
, commonly used with the Agg Ex (Deprecated) node, produces a sum of all matching values based on the field(s) specified in the GroupBy property. If the data type of the specified field is numeric, the numerical sum is returned. For string data sets, the values are concatenated.
The groupSum()
macro should be used in the following format:
groupSum(value [, initialValue[, accumulatorVariable]])
Where:
-
value
is usually a field name -
initialValue
is an optional expression which, if specified, is used as the initial value of the sum -
accumulatorVariable
is optional and is the name of the variable that is used to store interim results
Using groupSum() with numerical data
You have the following input data:
id:int |
---|
1 |
2 |
3 |
4 |
5 |
In the Script property, enter the following:
total = groupSum(id)
emit total
The output would be:
id:int |
---|
1 |
3 |
6 |
10 |
15 |
Using groupSum() with string data
You have the following input data:
color:string |
---|
Red |
Green |
Blue |
Yellow |
Cyan |
In the Script property, enter the following:
total = groupSum(color, "Colors: ")
emit total
The output would be:
total:string |
---|
Colors: Red |
Colors: RedGreen |
Colors: RedGreenBlue |
Colors: RedGreenBlueYellow |
Colors: RedGreenBlueYellowCyan |
Using the accumulator variable
You can use the optional accumulator variable with the groupSum()
macro to explicitly control the accumulation and reset of accumulation, that is, you can control when to reset the concatenation of values.
You have the following input data:
color:string | reset:boolean |
---|---|
Red | false |
Green | false |
Blue | true |
Yellow | false |
Cyan | false |
In the Script property, enter the following:
if ('reset') thensumVar = "Colors: ")total = groupSum(color, "Colors: ", sumVar)emit total
The output would be:
total:string |
---|
Colors: Red |
Colors: RedGreen |
Colors: Blue |
Colors: BlueYellow |
Colors: BlueYellowCyan |