Performs calculations on input field values, producing new output fields.
Uses Python scripting to apply an expression to fields in data attached to the input node, generating new fields in your data to store the results of the calculation.
Examples
The examples below use the default data from the Create Data node.
Overwriting an existing field
If a calculated field has the same name as an existing field in your data, the field is overwritten with the contents of the calculated field. For example, using the data from the Create Data node, add a Calculated Field to alter the value of the rand field.
Name | Type | Expression |
---|---|---|
rand | int | rand *2 |
Running the node against the default data replaces the rand field in the original data, with a new rand field, displayed as column one in the data, where for each row the value is double the value from the input.
Reusing a calculated field
You can create a new calculated field and then reuse that field in a subsequent expression in the same table. For example, the previous example can be extended to create a new field called 'randx2'.This new field can then be used in a new calculation:
Name | Type | Expression |
---|---|---|
double rand | int |
rand *2
|
and add one | int |
|
Running the node against the default data creates two new fields, randx2 and randx4. The value of randx4 is based on the value of randx2.
Condition-based values
You can set a value for a calculated field depending on whether or not a condition is met in an existing field.
Name | Type | Expression |
---|---|---|
penalty | int |
100 if IOU>400 else 10
|
Using datetime
You can use the imported Python datetime
module to calculate values based on existing datetime fields. The following examples can be applied to the default data.
Name | Type | Expression |
---|---|---|
nextAnnual | date |
datetime.date(year=dueDate.year+1,month=dueDate.month,day=dueDate.day)
|
startOfMonth | date |
datetime.date(year=dueDate.year,month=dueDate.month,day=1)
|
sinceDue | int |
(datetime.datetime.now().date()-dueDate).days
|
Data can be automatically converted from datetime to date or time formats, or from date to datetime format. For example, if you have a field in datetime format, you can use the Calculate Fields node to create a new field in date or time format from the existing field:
Name | Type | Expression | Result |
---|---|---|---|
createDate | date |
create
|
2004-01-01 |
createTime | time |
create
|
12:35:41 |
See Pythonmodule support for more information about how datetime conversion handles your data.
Using math
You can use the imported Python math
module to use mathematical functions defined by the C standard, for example math.ceil()
, math.floor()
, math.sqrt()
, math.log()
.
Name | Type | Expression |
---|---|---|
rootIOU | double |
math.sqrt(IOU)
|
rootRand | double |
math.sqrt(rand)
|
cleanRootRand | double |
math.sqrt(rand) if rand > 0 else 0
|
Note that applying the second example to the default data produces 18 calculated fields and two errors, because two of the rows in the default data contain negative values for the rand
field. You can use the example in the third row to remove these errors.
Using regular expressions
You can use the imported Python re
module to apply regular expressions to a field.
Name | Type | Expression |
---|---|---|
Match_start | boolean |
re.match(r"An", name)
|
Match_any | boolean |
re.search(r"An", name, flags=re.IGNORECASE)
|
The first expression returns true for any row where the name field starts with 'An'. Note that the regular expression is case-sensitive.
The second expression returns true for any row where the name field contains 'An' at any position. Note that this expression is also not case-sensitive, so it returns true for 'Anatoli' and 'Jordan'.
Replacing null values
You can use the optional DefaultValueForNull* properties to define a default value for a NULL field of a given data type. This can be useful if you need to overcome a situation where a NULL value would otherwise cause the expression that you enter in the Calculated Fields table to fail.
For example, you have the following date
type input data:
dueDate |
---|
2022-10-10 |
2022-06-01 |
2022-05-05 |
NULL |
NULL |
NULL |
2022-04-23 |
2022-04-19 |
NULL |
NULL |
In the Calculated Fields table, define a date
field called dueDate
, to overwrite an existing dueDate
field:
Name | Type | Expression |
---|---|---|
dueDate | date |
dueDate
|
In the DefaultValueForNullDate property, set the value to Today.
This example uses the DefaultValueForNullDate property because the input field that is being calculated contains date type data. This example was created on 2022-03-02 (today), therefore this is the date value that will be used to replace NULL values when the DefaultValueForNullDate property is set to Today.
After running the Calculate Fields node, the output is as follows:
dueDate |
---|
2022-10-10 |
2022-06-01 |
2022-05-05 |
2022-03-02 |
2022-03-02 |
2022-03-02 |
2022-04-23 |
2022-04-19 |
2022-03-02 |
2022-03-02 |
Using built-in variables
You can use the following built-in variables to add new fields to your output:
-
node.firstExec
- Boolean object that is True for the first input record. Allows variable values to be set when processing the first record. -
node.lastExec
- Boolean object that is True for the last input record. Allows values to be derived when all records have been processed. -
node.execCount
- Provides a count of the current record number.
Name | Type | Expression |
---|---|---|
recordID | long |
|
delta | int |
|
increment | long |
|
completed | boolean |
|
Result
recordID | delta | increment | completed |
---|---|---|---|
1 | 0 | 0 | false |
2 | 100 | 200 | false |
3 | 100 | 300 | false |
4 | 100 | 400 | true |
Referencing property values in your calculation - advanced example
You can use the node.properties
built-in variable to add a new field from a calculation that references a property value on the node.
This example uses the following input data:
net_price (double) |
---|
10.0 |
12.5 |
22.34 |
10.5 |
The node has a Tax Multiplier
property which is set to 1.2
.
Calculated Fields (where net_price
is the name of the input field, tax.multiplier
is the run property name for the Tax Multiplier
property):
Name | Type | Expression |
---|---|---|
gross_price | double |
|
Result
gross_price (double) | net_price (double) |
---|---|
12.0 | 10.0 |
15.0 | 12.5 |
26.808 | 22.34 |
12.6 | 10.5 |
The same result could also be achieved by using textual substitution:
net_price * float("{{^Tax Multiplier^}}")
Properties
Calculated Fields
A table containing rows of fields to be created from expressions applied to existing fields. Populate the table to create new output fields, or to override input fields with calculated values based on the data from the node's input.
Each row in the table represents a different field to be written to the output. The table must be populated, and each column in the table must be filled for each of the table rows. Calculated fields can be referenced in any row in the table after they are declared.
Name
Enter a name for the output field. If the name of the output field is the same as that of an input field, the input field is overwritten.
Type
Choose a field Type from the drop-down list.
Possible values are:unicode, double, int, long, date, datetime, time, boolean, string. The value must match the data type of the evaluated expression.
Expression
Enter an expression to be applied to one or more fields in the data.
The expression should be entered as a Python expression. To include a reference to a field in the data in your expression, enter the name of the field.
fields['<FIELDNAME>']
if either of the following conditions applies:
- The field name is a reserved Python keyword.
- The field name contains a space.
If you enter a conditional expression, the expression must specify the values to be used for both True and False conditions, for example 100 IFIOU>400 else 10
. If the Type of the calculated field is boolean, enter an expression to be matched for a true condition, for example IOU>100
The following Python modules are imported by default for use in expressions:
- datetime - imported as datetime, allowing you to perform basic calculations on date, time, and datetime fields.
- fn - imported as fn, providing null-safe comparisons and manipulation of strings that might be null. For more information, see The null-safe function module "fn".
- math - imported as math, providing access to Python mathematical functions. For more information, see the Python math module documentation.
- re - imported as re, allowing you to specify regular expressions as part of an expression.
See Examples for sample usage.
DefaultValueForNullText
Optionally specify the value to be used for Null input string or unicode fields when they are referenced in an expression in the Calculated Fields table.
Either choose the Null value from the drop-down list, or enter a different default value. The default value is an empty string.
DefaultValueForNullNumber
Optionally specify the value to be used for Null input numeric fields when they are referenced in an expression in the Calculated Fields table.
Choose one of the following from the drop-down list, or enter a different value:
- 0
- 1
- Null - Null numeric values are left null when used in an expression.
The default value is 0.
DefaultValueForNullBoolean
Optionally specify the value to be used for Null input boolean fields when they are referenced in an expression in the Calculated Fields table.
Choose from one of the following options:
- True
- False
- Null
The default value is False.
DefaultValueForNullDate
Optionally specify the value to be used for Null input date fields when they are referenced in an expression in the Calculated Fields table.
Choose one of the following from the drop-down list, or enter a different default date value in the format yyyy-MM-dd:
- Today - Null date values that are used in an expression are set to the date when the node began execution.
- Null - Null date values are left null when used in an expression.
- 1970-01-01
The default value is 1970-01-01.
DefaultValueForNullTime
Optionally specify the value to be used for Null input time fields when they are referenced in an expression in the Calculated Fields table.
Choose one of the following from the drop-down list, or enter a different default time value in the format HH:mm:ss:
- Now - Null time values that are used in an expression are set to the time when the node began execution.
- 00:00:00
- Null - Null time values are left null when used in an expression.
The default value is 00:00:00.
DefaultValueForNullDateTime
Optionally specify the value to be used for Null input datetime fields when they are referenced in an expression in the Calculated Fields table.
Choose one of the following from the drop-down list, or enter a different default datetime value in the format yyyy-MM-dd HH:mm:ss:
- Today - Null datetime values that are used in an expression are set to the start of the day when the node began execution.
- Now - Null datetime values that are used in an expression are set to the value of datetime when the node began execution.
- Null - Null datetime values are left null when used in an expression.
- 1970-01-01 00:00:00
The default value is 1970-01-01 00:00:00.
WriteErrorRecordsToMainOutput
Optionally specify whether records are written to the first output when errors were encountered evaluating expressions against the corresponding input record.
For each input record, all expressions in the table are evaluated. For any expression that has an error evaluating against a given input record, an error record is written to the Errors output.
Choose one of the following values:
- False - no output record is written to the first output.
- True - an output record is written to the first output, with all of the output fields populated that did not have expression errors.
The default value is False.
ErrorThresholdCount
Optionally specify the number of input records that can result in expression errors before the node fails.
For each input record, all expressions in the table are evaluated. Therefore, for each input record there could potentially be an error for each row in the Calculated Fields table. This threshold, however, applies to the number of input records against which errors were reported while evaluating the expressions.
Specify a valid, non-negative integer value for the field, or one of the following values:
- 0 - the node fails on the first error encountered.
- -1 - the node never fails due to an expression error against an input record
The default value is -1.
ErrorThresholdPercentage
Optionally specify the percentage of input records that can result in expression errors before the node fails.
For each input record, all expressions in the table are evaluated. Therefore for each input record there could be potentially an error for each row in the Calculated Fields table. This threshold however, applies to the the percentage of input records against which errors were reported while evaluating the expressions.
The threshold is only applied after at least 100 records have been processed, or all records have been processed if there are less than 100 input records.
Specify a valid integer for the field between 1 and 100 (inclusive), or one of the following values:
- 0 - the node will never fail due to an expression error against an input record.
- -1 - the threshold is not applied.
The default value is 100 - meaning that the node will fail if all input records or all of the first 100 input records have errors.
Inputs and outputs
Inputs: in1.
Outputs: calculated fields, errors.