Built-in macros - Data360_Analyze - Latest

Data360 Analyze Server Help

Product type
Software
Portfolio
Verify
Product family
Data360
Product
Data360 Analyze
Version
Latest
Language
English
Product name
Data360 Analyze
Title
Data360 Analyze Server Help
Copyright
2024
First publish date
2016
Last updated
2024-11-28
Published on
2024-11-28T15:26:57.181000
CAUTION:
This topic relates to Data360 Analyze Script which is the language that is used in some deprecated nodes. If you are looking for help configuring the Python-based nodes, see Python scripting.

Data360 Analyze Script contains a number of pre-built macros.

These macros, which act as a single instruction, are made up of a number of smaller instructions to enable you to perform common multi-step analytical procedures in one step.

As a first step in using these macros, you should assign the macro to a variable, for example:

<myVariable> = <groupMacro()>

Then, define the macro behavior by specifying which field you want to be the subject of the operation, as detailed in the following sections.

Note that while group macro values are calculated over an entire execution group, the value varies as records are processed. The value that is output is the value at the end of the execution group. If the macro contains an optional filter expression, only rows where that expression is true are counted. The firstInGroup and lastInGroup built-in variables can be used to determine whether execution is at the start or end of an execution group, see Built-in variables.

groupCount()

Note: The count() macro is no longer supported. If you previously used count(), please use groupCount().

groupCount, commonly used with the Agg Ex (Deprecated) node, can be used to count the number of instances of a value for each specified field.

The groupCount macro should be used in the following format, where filter is an optional boolean expression specifying what to count and accumulatorVariable is the name of the variable used to store interim results:

groupCount([filter[, accumulatorVariable]])

If the optional filter expression is given, when this value is encountered, the count is incremented.

If a field is specified in the GroupBy property, the data is grouped by the specified value and all instances of that value are counted. If two fields are specified in the GroupBy property, the groupCount function will only count values that match across the two fields. If only one field is specified in the GroupBy property, all instances of each value in that field will be counted regardless of whether the corresponding values in other fields match or not.

Note: The groupCount() variable is reset to 0 on firstInGroup. Due to this relying on the execution group, rather than the total number of execution iterations, the results of using groupCount on, for example, a Transform (Deprecated) and Agg Ex (Deprecated) node may vary even with the same number of records input.

Using groupCount() with a Transform (Deprecated) node

You have the following input data and want to count the number of rows:

color:string
Red
Green
Blue
Yellow
Cyan

In the Script property, enter:

rows = groupCount() emit rows, color

This gives the following output:

rows:int color:string
1 Red
2 Green
3 Blue
4 Yellow
5 Cyan

Using groupCount() with an Agg Ex (Deprecated) node

You have the following input data and want to group the data by "type" and count the number of instances of each:

type:string
Type1
Type1
Type1
Type2
Type2
Type3

In the GroupBy property enter:

type

In the Script property enter:

rows = groupCount()emit rows, type where lastInGroup

The output, as follows, shows each different "type" and how many rows there are of each:

rows:int type:string
3 Type1
2 Type2
1 Type3

In the following example, consider the same data and GroupBy property as in the previous example. This time, enter the following in the Script property:

rows = groupCount(type == "Type1") emit rows, type where lastInGroup

In this example, the output shows the data grouped by "type" and a count of the rows that contain the type "Type1":

rows:int type:string
3 Type1
0 Type2
0 Type3

You can use the optional accumulator variable to explicitly control the accumulation and reset of accumulation.

You have the following input data:

type:string include:boolean reset:boolean
Type1 false false
Type1 true true
Type1 true false
Type2 true false
Type2 true true
Type2 true false
Type3 true true

In the GroupBy property, enter:

type

In the Script property, enter:

if ('reset') then groupVar = 0rows = groupCount('include' == true, groupVar)emit rows, type

The output would be:

rows:int type:string
0 Type1
1 Type1
2 Type1
1 Type2
1 Type2
2 Type2
1 Type3

groupMin()

Returns the minimum value of the expression attained during execution within the current execution group. If initialValue is specified it will use that as the initial value to compare to.

The groupMin macro should be used in the following format, where the expression is often a column or variable, the initialValue is an optional expression, which if present, will be used as the initial value to compare to and accumulator variable is the name of the variable used to store interim results:

groupMin(expression[, initialValue[, accumulator variable]])

Given the following input data:

rand: double
279.59
7.138
15.79
15.488
14.584
0.3465
1.865
0.26377
2.7272
30.176
1.345

minrand = groupMin(rand, 200.0) emit minrand

The output would be as follows:

rand: double
200.0
7.138
7.138
7.138
7.138
0.3465
0.3465
0.26377
0.26377
0.26377
0.26377

In the following example, consider the same input data as in the previous example, but in this case the macro statement does not contain an initial value, as follows:

minrand = groupMin(rand)emit minrand

In this case, the output would be as follows:

rand: double
279.59
7.138
7.138
7.138
7.138
0.3465
0.3465
0.26377
0.26377
0.26377
0.26377

In this example, the optional accumulator variable is used with the groupMin macro to explicitly control the accumulation and reset of accumulation, that is, you can control when to reset the concatenation of values..

You have the following input data:

rand: double
7.138
15.79
15.488
14.584
0.3465
1.865
0.26377
2.7272
30.176
1.345

Specify the following groupMin macro statement:

minrand = groupMin(rand, 20, minVar)

To explicitly reset the accumulator variable "minVar" if it drops below 1, you would enter the following Data360 Analyze Script:

if (minVar < 1) then minVar = 1emit minrand

In this case, the output would be as follows:

minrand: double
20
7.138
7.138
7.138
7.138
1
1
1
1
1

groupMax()

The groupMax() macro is generally used to find the maximum values in a given field. For example, groupMax() can be used to group your input data by customer, then find the maximum amount paid by each customer.

You can also use groupMax() with a Transform (Deprecated) node, for example, to find the largest amount paid by any customer.

This macro should be used in the following format, where initialValue is optional and if specified will be used as the first value to compare to, expression references a field in the input data and accumulatorVariable is the name of the variable used to store interim results.

groupMax(expression[, initialValue[, accumulatorVariable]])

Using groupMax() with an Agg Ex (Deprecated) node

You have the following input data:

Customer name:string Payments:double
Bob 34.87
David 23.56
Bob 23.56
Lucy 235.87
David 12.45
Lucy 1.45
Bob 12.56
Lucy 300.00

Using an Agg Ex (Deprecated) node, group the data by customer name, then find the maximum amount paid by each customer.

In the GroupBy property, enter Customer name.

In the Script property, enter:

_GroupMax = groupMax(Payments)

emit referencedFields(1,{{^GroupBy^}}),_GroupMax

where lastInGroup

The following data is output:

Customer name:string _GroupMax:double
Bob 34.87
David 23.56
Lucy 300.00

In this example, the '_GroupMax' column shows the highest amount paid by each customer.

Using groupMax() with a Transform (Deprecated) node

You have the following input data:

Customer name:string Payments:double
Bob 34.87
David 23.56
Bob 23.56
Lucy 235.87
David 12.45
Lucy 1.45
Bob 12.56
Lucy 300.00

Using a Transform (Deprecated) node and the groupMax() macro, enter the following in the Script property:

_maxPayment= groupMax('Payments')

emit 'Customer name','Payments', _maxPayment

The following data is output:

Customer name:string Payments:double _maxPayment:double
Bob 34.87 34.87
David 23.56 34.87
Bob 23.56 34.87
Lucy 235.87 235.87
David 12.45 235.87
Lucy 1.45 235.87
Bob 12.56 235.87
Lucy 300.00 300.00

In this example, the groupMax() values are listed in the '_maxPayment' column. As the records are processed, each new value is compared to the first value. If the value is less than the first value, there is no change and the first value is still used. When a value that is higher than the first value is encountered, the higher value is listed.

Specifying an initial value

You have the following input data:

rand: int
7138
15790
27959
17809
16901
3465
22852
11228
26567
30176
12345

In a Transform (Deprecated) node, you specify the following in the Script property:

maxrand = groupMax(rand, 10000) emit maxrand

The output would be:

maxrand: int
10000
15790
27959
27959
27959
27959
27959
27959
27959
30176
30176

With the same input data, if you did not specify an initial value in the Script property, the output would instead be as follows:

maxrand: int
7138
15790
27959
27959
27959
27959
27959
27959
27959
30176
30176

groupString()

Generally, the groupString() macro is used with the Transform (Deprecated) or Agg Ex (Deprecated) nodes. It is used to concatenate the values from a given field, and must be used with string data. The value from the first row of data is added to the second row, then the two second row values are added to the third row, and so on. Optionally, you can specify an accumulator variable to define when to reset the accumulation. This macro may be useful if you want to reduce the number of rows across which the string data is spread.

This macro should be used in the following format:

groupString( value [, separator[, accumulatorVariable]])

Where:

  • value must be a string data type, and is commonly an input field name.
  • separator is an optional string value that indicates the character to use to separate each of the concatenated strings, for example, a comma.
  • accumulatorVariable is the name of the variable used to store interim results.

Using groupString() with a Transform (Deprecated) node

You have the following input data:

color:string
Red
Green
Blue
Yellow
Cyan

In the Script property, enter the following:

strColor = groupString(color, ", ") emit strColor

The output would be:

strcolor:string
Red
Red, Green
Red, Green, Blue
Red, Green, Blue, Yellow
Red, Green, Blue, Yellow, Cyan

Using groupString() with an Agg Ex (Deprecated) node

You have the following input data:

type:string color:string
Primary Red
Primary Green
Primary Blue
Secondary Yellow
Secondary Magenta
Secondary Cyan

In the GroupBy property, enter type.

In the Script property, enter the following:

strColor = groupString(color, ", ") emit type, strColor

The output would be:

type:string strcolor:string
Primary Red
Primary Red, Green
Primary Red, Green, Blue
Secondary Yellow
Secondary Yellow, Magenta
Secondary Yellow, Magenta, Cyan

In this example, you have the same input data as in the previous example and the same field is specified in the GroupBy property. This time, in the Script property, enter the following:

strColor = groupString(color, ", ") emit type, strColor where lastInGroup

The output would be:

type:string strcolor:string
Primary Red, Green, Blue
Secondary Yellow, Magenta, Cyan

Using the accumulator variable

The accumulator variable can be used to explicitly control the accumulation and reset of accumulation, that is, you can control when to reset the concatenation of values..

You have the following input data to a Transform (Deprecated)node:

color:string reset:boolean
Red false
Green false
Blue true
Yellow false
Cyan false

In the Script property, enter the following:

if ('reset') thencatVal = ""total = groupString(color, ", ", catVal)emit total

The output would be as follows:

total:string
Red
Red, Green
Blue
Blue, Yellow
Blue, Yellow, Cyan

groupSum()

Note: The sum() is no longer supported. If you previously used sum(), please use groupSum().

groupSum(), commonly used with the Agg Ex (Deprecated) node, produces a sum of all matching values based on the field(s) specified in the GroupBy property. If the data type of the specified field is numeric, the numerical sum is returned. For string data sets, the values are concatenated.

The groupSum() macro should be used in the following format:

groupSum(value [, initialValue[, accumulatorVariable]])

Where:

  • value is usually a field name
  • initialValue is an optional expression which, if specified, is used as the initial value of the sum
  • accumulatorVariable is optional and is the name of the variable that is used to store interim results

Using groupSum() with numerical data

You have the following input data:

id:int
1
2
3
4
5

In the Script property, enter the following:

total = groupSum(id) emit total

The output would be:

id:int
1
3
6
10
15

Using groupSum() with string data

You have the following input data:

color:string
Red
Green
Blue
Yellow
Cyan

In the Script property, enter the following:

total = groupSum(color, "Colors: ") emit total

The output would be:

total:string
Colors: Red
Colors: RedGreen
Colors: RedGreenBlue
Colors: RedGreenBlueYellow
Colors: RedGreenBlueYellowCyan

Using the accumulator variable

You can use the optional accumulator variable with the groupSum() macro to explicitly control the accumulation and reset of accumulation, that is, you can control when to reset the concatenation of values.

You have the following input data:

color:string reset:boolean
Red false
Green false
Blue true
Yellow false
Cyan false

In the Script property, enter the following:

if ('reset') thensumVar = "Colors: ")total = groupSum(color, "Colors: ", sumVar)emit total

The output would be:

total:string
Colors: Red
Colors: RedGreen
Colors: Blue
Colors: BlueYellow
Colors: BlueYellowCyan