You can use Data360 Analyze Script to control the data that is output from nodes. Some of the most common Data360 Analyze Script output statements are listed below:
emit
The most common output-related keyword is emit
which simply means "output". Use this keyword to define which of the input fields you want to include in the output.
emit * (output all)
If you want to output all fields in the source data, use emit *
. In general, you will use emit * combined with other output-related Data360 Analyze Script.
override emit
To override the data in an existing field, override emit
can be used after an emit *
statement.
override emit
cannot be used if default emit
is used.You have the following input data to a Transform (Deprecated) node:
Product: string | number: double |
---|---|
Water | 1.0 |
Coffee | 2.0 |
Tea | 2.0 |
Juice | 3.0 |
Cake |
4.0 |
You want to change the data in the "number" field from a decimal number to a whole number. In the Script property, enter:
emit *override emit number.int() as number
The output would be:
Product: string | number: double |
---|---|
Water | 1 |
Coffee | 2 |
Tea | 2 |
Juice | 3 |
Cake |
4 |
default emit
Describes a default value to be used if there are no input fields with the specified name.
default
keyword is not used as a default value for cases where there are null values. For this, you should use the ifNull operator. default emit
cannot be used if override emit
is used.You have the following input to a Transform (Deprecated)node:
id:int | name:string |
---|---|
1 | fred |
2 | barry |
NULL | mark |
3 | NULL |
In the Script property, enter the following:
id = -1name = "Unknown"emit *default emit iddefault emit name
This results in the output being exactly the same as the input.
Consider the same Transform (Deprecated) node, with the same code in the Script property, but this time, you have the following input:
differentId:int | lastName:string |
---|---|
1 | fred |
2 | barry |
NULL | mark |
3 | NULL |
In this case, the output would be:
differentId:int | lastName:string | id:int | name:string |
---|---|---|---|
1 | fred | -1 | "unknown" |
2 | barry | -1 | "unknown" |
NULL | mark | -1 | "unknown" |
3 | NULL | -1 | "unknown" |
emit as
Allows you to specify a name to use for a given field in the output.
That is, emit x as y
will emit the value of the variable "x" to the field "y". If "y" has been defined as a variable, this will fail.
Renames the field with the value of 'tax' to the name "previousTax".
emit 'tax' as previousTax
rename
Renames a specified field name, and is used in the following format:
rename
<existing field name> as
<new field name>
Outputs the data from a column named "a" and labels it as "b":
emit arename a as b
rename
statement must follow an emit
statement. The following incorrect Data360 Analyze Script would produce the error "field 'a' does not exist":rename a as bemit *
exclude
Excludes a field from the output. Must be used within an output statement.
Rather than listing all the fields that you want to include in the output, you can instead list the ones that you want to exclude.
Outputs all input fields except "cust name" and "id" from billing input:
output 1 {
emit *exclude "cust name", "billing:id"}
where
The where
keyword is commonly used in conjunction with emit
to control the output based on a condition and is used in the following format:
emit <fieldList>where <conditional expression>
where fieldList
is a list of the fields that you want to include in the output and conditional expression
is a true / false expression. Each row is checked against the conditional expression, and where the value is true it is included in the output.
You have the following input data to a Transform (Deprecated) node:
Product:string | Cost:double |
---|---|
Chair | 51.0 |
Table | 400.0 |
Sofa | 350.5 |
Cushion | 12.0 |
TV stand | 20.0 |
You only want to output the rows where the cost is greater than 50.
In the Script property, enter the following:
emit *wherecost > 50
The output would be as follows:
Product:string | Cost:double |
---|---|
Chair | 51.0 |
Table | 400.0 |
Sofa | 350.5 |
output
Used when you want to take input data and split it into multiple outputs and denotes the output (name or number) to which the specified fields will be emitted. The output name is case insensitive. When used in a Transform (Deprecated) node, the output keyword allows you to split data into as many outputs as needed.
The following format is used ("extend" and "optional" are optional keywords and can be used in conjunction with each other):
[extend] output
[optional] <"name"
> or <number
> <output configuration statement
>
Outputs all fields where the state is "FL", except "state". The output name is "Florida".
output "Florida" {emit * where state = "FL"exclude "state"}
Outputs all fields where the state is anything other than "FL". The output number is "2".
output 2 {emit * where state != "FL"}
extend
The extend
keyword can be used within the context of an output statement. If an output has already been defined, the extend
keyword can be used to further configure the output.
You have the following input data to a Transform (Deprecated) node:
column1:int | column2:int |
---|---|
2 | 4 |
5 | 6 |
6 | 6 |
7 | 18 |
20 | 22 |
You want to calculate the difference between "column1" and "column2". In the Script property, enter the following:
difference = 'column2' - 'column1'emit *extend output 1emit difference as "Difference"
The output would be as follows:
column1:int | column2:int | Difference: int |
---|---|---|
2 | 4 | 2 |
5 | 6 | 1 |
6 | 6 | 0 |
7 | 18 | 11 |
20 | 22 | 2 |
optional
The optional
keyword allows you to specify an output statement that will only take effect if the specified output exists. For example, on a node which has an optional "error" pin, you may want to output any error information to the "error" pin, if it exists.
dividend = 'dividendField'denominator = 'denominatorField'fraction = double(null)if (denominator.isNull()) then {
error = trueerrorMsg = "Null denominator"
}else if (denominator == 0) then {
error = trueerrorMsg = "Unable to divide by 0"
}else {
fraction = dividend / denominator
}#Output all input fields, to all outputsemit *emit fraction as "Fraction"
#If there is a second output, write errors to that output if there are any.output optional 2 {
emit errorMsgemit execCount as "RecordNumber"where error
}
do output
Configures output for a node which is not necessarily based on the input. The statements contained within the do output block control what is sent to the given output.
Requirements:
- The output on which the do output block operates must have already been defined within the Data360 Analyze Script using the output operator.
- The do output can specify an output name or an output number.
- All of the fields which are to be output must have been defined via emit statements in the original output statement; no new fields can be defined in the do output block.
- Only specifically named fields can be used within an emit statement in the do output block; emitpattern, emit * and so on, cannot be used.
The following format is used, where num
must be an integer literal corresponding to an output on the node which has already been defined in an output statement:
do output {"name"|num} {statement}*
Note that the double quotes are required around the name
.
The output is defined to pass through all of the input fields, and define a "RecordNum" output field. The do output statement then specifies that "outputCount" records are to be written, where outputCount is a field on the input. If outputCount is less than or equal to zero, no records will be output. Otherwise, for a given input record, "outputRecord" records will be written.
output 1 {
#Pass through all of the input fields emit *
#Additionally, define an output field "RecordNum", but don't actually write the records here
emit null.int() as "RecordNum"
where false
}
#Now, loop through 'outputCount' times
i = 0
while (i < 'outputCount') { #Each time, define to write a record to output 1
do output 1 { #Write the iteration count as the "RecordNum" field
emit i as "RecordNum"
}
i = i + 1
}
The output statement is defined using the output name ("Data"). However, the do output statement still operates on the output number. Assume that there is an output "Data" which is the second node output.Define the output (required prior to using do output)
output "Data" { emit * emit null.int() as "RecordNum"
where false
}#Now, loop through 'outputCount' times
i = 0
while (i < 'outputCount') { #Each time, define to write a record to output 2 ("Data")
do output 2 { #Write the iteration count as the "RecordNum" field
emit i as "RecordNum"
}
i = i + 1
}
null
This keyword evaluates to the null value, with a null type. The null value is different to an empty string (""). Input and output fields, which must otherwise have a constant and well-defined value type, may always contain the null value in any record. Therefore, operators must handle the case where one or more of their parameters are null. When a null argument is provided for a parameter, operators will usually treat it as an identity operation for the other parameter(s), that is, 10 * null == 10 * 1
.
null + 10 # value: 10null - 10 # value: -1010 * null # value: 1010 / null # value: 10
If an identity operation is not possible, the operators will evaluate to null if it makes sense for the particular operator, otherwise they will error.
null / 10 # value: null
In comparison operations, null is the minimal element in any ordering.
10 > null # value: true-10 > null # value: true10 < null # value: false-10 < null # value: false
It is possible to check for null values, and supply defaults for any cases where a value is null using the ifNull operator. The following example can be used to search an input field for a string value, where the input field may contain nulls. If the input field ("myField" in this case) is null, then the search will instead be performed over the empty string. This makes the search "safe" for handling null values.
Find "j" in the field "myField". If "myField" is null, then it will instead search the empty string, returning -1 and assigning this to x.
x = strFind(ifNull('myField', ""), "j")
Even though output fields may contain the null value, they must still have a non-null type. A cast operator should be used on the null value when it would otherwise be ambiguous.
flag = Truefoo = if (flag) then 'color' else null # ambiguous typebar = if (flag) then "bar" else str(null) # string typeoutput 1 {
emit foo as "foo" # ERROR
emit bar as "bar" # ok
emit null as "baz" # ERROR
}
pattern
This keyword matches input field names to a pattern that you specify as a regular expression. If used directly after the emit
keyword, only field names that match the specified pattern are included in the output. Or, when used in conjunction with the keyword exclude
, matching fields are excluded from the output.
To indicate that you want to include fields that match the specified pattern
, the following format should be used, where regularExpression
is the pattern that is matched against the input field names:
emit pattern "regularExpression"
To indicate that you want to exclude fields that match the specified pattern
, the following format should be used, where field(s)
is a list of the fields that you want to output, excluding any that match the regularExpression
:
emit field(s) exclude pattern "regularExpression"
Technical note: regularExpression
is evaluated according to the Perl regular expression standard.
For each of the following three examples, the input data field names are as follows:
rand |
rind |
ron |
roberto |
rich |
robert |
bob |
bobert |
emit pattern "r"
Outputs all fields where the field name contains the letter "r". All fields except "bob" are output.
emit pattern "r?nd"
Outputs all fields where the field name contains either "rnd" or "nd" anywhere in their names. Only "rand" and "rind" are output because they contain "nd"; "rnd" does not match any fields.
emit * exclude pattern "r.+[ot]"
Excludes fields that start with "r" and end with "o" or "t" and contain at least one character in the middle; "roberto" and "robert" are excluded.
exclude pattern
is not case sensitive. For example, emit * exclude pattern "r.+[ot]"
would output all fields that start with "r" or "R".Using multiple exclude pattern statements
exclude pattern
statements are needed, they must be typed on separate lines.The following Data360 Analyze Script illustrates how to correctly write two exclude pattern
statements:emit * exclude pattern "[A-Z]{1}I[A-Z]{1}[0-9]{2}"exclude pattern "A"
For simpler wildcard matching, please see the wildcard keyword.
wildcard
You can use a wildcard character to match field names to a simple pattern. Matching fields are the included in, or excluded from, the output depending on the keyword that preceeds the wildcard.
Use the asterix character "*" to indicate that you want to match any number of characters.
Use the question mark character "?" to indicate that you want to match a single character.
Include all fields that begin with "A_" except if there is an "X" following the "A_":
emit wildcard "A_*"
exclude wildcard "A_X*"
In this example, you have the following input field names:
rand |
rind |
ron |
roberto |
rich |
robert |
bob |
bobert |
To output all fields where the field name begins with "r", you would type:
emit wildcard "r*"
In this case, all fields are output except for "bob" and "bobert".
To output any field where the field name begins with "ro" and has only one other additional character following "ro", you would type:
emit wildcard "ro?"
In this case, only "ron" is output.
To exclude any field where the field name starts with the letter "r", contains any single character, and then ends with "nd", you would type:
emit * exclude wildcard "r?nd"
In this case, "rand" and "rind" are excluded.
For more complex regular expression matching, please see the pattern keyword.