For both the ConfigureFields and ProcessRecords Python scripts, the following variables are available:
Variable name | Type | Description |
---|---|---|
node | Node | The node object used for logging and failure control, properties access and explicit record writing. |
inputs |
When used in ConfigureFields, "inputs" is a collection of Metadata objects. When used in ProcessRecords, "inputs" is a collection of Record objects. |
Please note that in the Transform node these lists will only contain one item as there is only one input. Access via:
|
outputs |
When used in ConfigureFields, "outputs" is a collection of MetadataBuilder objects. When used in ProcessRecords, "outputs" is a collection of output Record objects. |
Access via:
outputs['OutputName'] or
outputs[outputIndex] |
patterns | Pattern |
Used in ConfigureFields to build patterns to search for input fields. Can also be used in record key properties (i.e. in the Advanced tab of the Match Keys, GroupBy and SortBy properties). |
fields |
When used in ConfigureFields, "fields" contains the metadata for all of the fields in the inputs. In the Transform and Aggregate nodes, this is just one input, but in other nodes, this would contain references to all the fields against all inputs (unless explicitly restricted to a particular input). When used in ProcessRecords, "fields" references the fields on input records, as opposed to field metadata. |
Access the field metadata using
If there are multiple fields with the same name across the inputs to the node, any attempt to reference that field directly, or to use the |
fn | Null-safe functions module. | A set of Null-safe functions for data comparisons and string manipulation. |
group | Grouping module. |
A set of functions for easy aggregation of data. Note: The aggregation functions are only applicable to nodes which define some form of data grouping. For example, they can be used in the GroupBy property of the Aggregate and Transform nodes. The aggregation functions are not available on nodes that do not have a GroupBy property, including the Merge node.
|
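The name-or-index access described for the inputs and outputs collections can be pictured with a small stand-in. This is only a sketch: NamedCollection and the sample names are hypothetical and are not part of the product API, and the zero-based index is an assumption. In a real node the collections are supplied by the runtime.

```python
class NamedCollection(object):
    """Hypothetical stand-in for the inputs/outputs collections."""

    def __init__(self, pairs):
        # pairs: ordered list of (name, item) tuples
        self._names = [name for name, _ in pairs]
        self._items = [item for _, item in pairs]

    def __getitem__(self, key):
        # An integer is treated as a position (zero-based here, an assumption),
        # anything else as a name.
        if isinstance(key, int):
            return self._items[key]
        return self._items[self._names.index(key)]

    def __len__(self):
        return len(self._items)


# A Transform-style node has a single input, so the collection holds one item.
inputs = NamedCollection([("in1", {"id": 1})])
same_by_name = inputs["in1"]
same_by_index = inputs[0]
```

Both lookups return the same object, which is the behaviour the table describes for a node with one input.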
In addition, each input and output will be directly bound as a variable available within the Python script (e.g. in1, out1), as long as the input or output name meets the following criteria:
- Is a valid Python identifier (e.g. does not contain spaces).
- Does not conflict with any of the other bound variables (node, inputs, outputs, patterns).
- Does not conflict with any Python built-in type or keyword (e.g. print, raise, str, int).
- Does not conflict with an imported module provided with Data360 Analyze.
If the input or output does not meet the above criteria, it must be accessed from the corresponding inputs or outputs collection, respectively.
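The first three criteria above can be approximated in plain Python, which may help when choosing input and output names. This is a sketch under assumptions: can_bind_directly is a hypothetical helper, not a product function, and the fourth criterion (conflicts with modules provided with Data360 Analyze) is not checked because that list is not given here.

```python
import keyword
import re

try:
    import builtins  # Python 3
except ImportError:
    import __builtin__ as builtins  # Python 2

# Names the text lists as already bound in the script scope.
RESERVED = {"node", "inputs", "outputs", "patterns"}
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def can_bind_directly(name):
    """Approximate the first three criteria; module conflicts are not checked."""
    if not IDENTIFIER.match(name):
        return False
    if name in RESERVED:
        return False
    if keyword.iskeyword(name) or hasattr(builtins, name):
        return False
    return True


results = {name: can_bind_directly(name)
           for name in ["in1", "my input", "outputs", "raise", "str"]}
```

Only in1 passes; the other names would have to be reached through the inputs or outputs collection.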
The bound input and output variables will have the same type as the corresponding entry in the inputs/outputs collection – Metadata/Record for input in ConfigureFields/ProcessRecords respectively, and MetadataBuilder/Record for output in ConfigureFields/ProcessRecords respectively.
Each of these variables is bound into ConfigureFields, and then bound into ProcessRecords before each invocation of the ProcessRecords script. The local variable scope is shared between the ConfigureFields and ProcessRecords scripts, so any variable, function definition or import made within ConfigureFields is available within ProcessRecords, provided that its name does not conflict with one of the variables that are directly bound into ProcessRecords. For example, you can define a local variable "myVariable" in ConfigureFields and then use it in the ProcessRecords script. However, if the node has an input or output named "myVariable", the variable is overwritten by the input or output bound into ProcessRecords.
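The shared scope and the overwrite behaviour can be imitated with a single namespace dictionary. This is a minimal sketch, not how the node is implemented: the two script strings, run_node and the sample records are all hypothetical.

```python
# Hypothetical script bodies; in a real node these live in the two properties.
CONFIGURE_FIELDS = "myVariable = 10\ncalls = 0\n"
PROCESS_RECORDS = "calls += 1\nseen = myVariable\n"


def run_node(records, binding_name):
    scope = {}                          # one namespace shared by both scripts
    exec(CONFIGURE_FIELDS, scope)       # runs once
    for record in records:
        scope[binding_name] = record    # re-bound before every invocation
        exec(PROCESS_RECORDS, scope)
    return scope


records = [{"id": 1}, {"id": 2}]
kept = run_node(records, "in1")              # no name clash
clobbered = run_node(records, "myVariable")  # input named like the variable
```

In the first run, seen still holds the value set in the configuration script. In the second, the binding replaces it with the current record.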
For more information on the different object types, see:
- Node objects
- Metadata objects
- FieldMetadata objects
- MetadataBuilder objects
- Record objects
- Logger objects
- Properties objects
- Patterns objects
- FieldPattern objects
Node objects
Members |
properties
logger
firstExec
lastExec
firstInGroup
lastInGroup
execCount
|
Methods |
write(outputName|outputIndex, outputRecord)
fail()
|
Metadata objects
Metadata objects are a container of FieldMetadata.
The metadata object largely emulates a Python dict type with the key being the name of the field and the value, the field metadata.
The __len__, __getitem__, __iter__, __reversed__ and __contains__ functions are implemented on the Metadata type as defined in the Python documentation available at: https://docs.python.org/2/reference/datamodel.html#emulating-container-types* .
This means that along with being able to use the reversed and contains methods directly, the following operations are supported from the Python dict interface as defined in https://docs.python.org/2/library/stdtypes.html#mapping-types-dict*:
- len(d)
- d[key]
- key in d
- key not in d
- iter(d)
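The operations listed above correspond to the standard container methods. The following sketch shows one way a read-only, dict-like container could support them. MetadataSketch is a hypothetical stand-in, and iterating over field names (as a dict iterates over its keys) is an assumption.

```python
class MetadataSketch(object):
    """Hypothetical read-only container keyed by field name."""

    def __init__(self, fields):
        # fields: ordered list of (field name, field metadata) pairs
        self._names = [name for name, _ in fields]
        self._by_name = dict(fields)

    def __len__(self):
        return len(self._names)

    def __getitem__(self, name):
        return self._by_name[name]

    def __iter__(self):
        return iter(self._names)

    def __reversed__(self):
        return reversed(self._names)

    def __contains__(self, name):
        return name in self._by_name


meta = MetadataSketch([("id", "int"), ("name", "string")])
field_names = list(iter(meta))
last_first = list(reversed(meta))
```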
Members |
<fieldName>
The Metadata itself is also just a collection of fields such that individual fields can be accessed using either of the following forms:
|
Methods |
todict()
intersection(other)
difference(other)
|
FieldMetadata objects
Methods |
name()
type()
|
MetadataBuilder objects
The metadata builder is used to build the output metadata.
The metadata builder object largely emulates a Python dict type with the key being the name of the field and the value, the field metadata.
The __len__, __getitem__, __iter__, __reversed__, __setitem__, __delitem__ and __contains__ functions are implemented on the MetadataBuilder type as defined in: https://docs.python.org/2/reference/datamodel.html#emulating-container-types*
This means that along with being able to use the reversed and contains methods directly, the following operations are supported from the Python dict interface as defined in https://docs.python.org/2/library/stdtypes.html#mapping-types-dict*:
- len(d)
- d[key]
- d[key] = value
- del d[key]
- key in d
- key not in d
- iter(d)
The node takes care of constructing the output metadata from the builder, therefore there are generally no members and only a few methods required on the metadata builder. However, there are various useful operations, as follows:
Operations |
Note: When using the += operator, an error is raised if any of the fields being added to the output would have the same (case-insensitive) output field name as one that already exists. To replace an existing field that has the same (case-insensitive) name, explicitly name the new field by using the following syntax: out1.<fieldname> = obj
|
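The difference between appending with += and explicitly naming a field can be modelled as follows. BuilderSketch is a hypothetical stand-in, the ValueError is an assumed error type, and the sketch uses the d[key] = value form rather than attribute assignment to stay short.

```python
class BuilderSketch(object):
    """Hypothetical stand-in for a metadata builder; not the product class."""

    def __init__(self):
        self._fields = {}  # lower-cased name -> (name as given, metadata)

    def __iadd__(self, new_fields):
        # new_fields: list of (name, metadata) pairs to append
        clashes = [n for n, _ in new_fields if n.lower() in self._fields]
        if clashes:
            raise ValueError("duplicate output field(s): %s" % ", ".join(clashes))
        for name, meta in new_fields:
            self._fields[name.lower()] = (name, meta)
        return self

    def __setitem__(self, name, meta):
        # Explicit naming replaces any field with the same name, ignoring case.
        self._fields[name.lower()] = (name, meta)

    def names(self):
        return sorted(name for name, _ in self._fields.values())


out1 = BuilderSketch()
out1 += [("Color", "string")]
try:
    out1 += [("color", "int")]
    clash_reported = False
except ValueError:
    clash_reported = True
out1["color"] = "int"  # explicit assignment replaces "Color"
```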
Methods |
todict()
|
Record objects
Record objects are a container of field values.
The record object largely emulates a Python dict type with the key being the name of the field and the value being the value for that field on the record.
The __len__, __getitem__, __iter__, __reversed__, __setitem__ and __contains__ functions are implemented on the Record type as defined in: https://docs.python.org/2/reference/datamodel.html#emulating-container-types*
This means that along with being able to use the reversed and contains methods directly, the following operations are supported from the Python dict interface as defined in https://docs.python.org/2/library/stdtypes.html#mapping-types-dict*:
- len(d)
- d[key]
- d[key] = value
- key in d
- key not in d
- iter(d)
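Field values on a record can be reached in item form (d[key]) and, as in expressions such as in1.MyField, in attribute form. The sketch below shows how a single object can offer both. RecordSketch is a hypothetical stand-in and does not reflect how the product implements records.

```python
class RecordSketch(object):
    """Hypothetical stand-in for a record; not the product class."""

    def __init__(self, values):
        # Bypass our own __setattr__ while creating the backing dict.
        object.__setattr__(self, "_values", dict(values))

    def __getitem__(self, name):
        return self._values[name]

    def __setitem__(self, name, value):
        self._values[name] = value

    def __contains__(self, name):
        return name in self._values

    def __len__(self):
        return len(self._values)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self._values[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self._values[name] = value


rec = RecordSketch({"MyField": "a"})
rec["Other"] = 1        # item form
rec.Third = 2           # attribute form
```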
Members |
<fieldName>
The Record itself is also just a collection of field values such that individual field values can be accessed using either of the following forms:
|
Operations |
|
Methods |
metadata()
todict()
|
Logger objects
Methods |
debug(message)
info(message)
warn(message)
error(message)
|
Properties objects
Methods |
All property methods take both the property name and the Run Time Property name of the property. The node retrieves the property via the Run Time Property name; however, any errors on property retrieval are reported against the (often more user-friendly) property name.
isSet(propName, runtimePropName)
getString(propName, runtimePropName[, default])
getInt(propName, runtimePropName[, default])
getBool(propName, runtimePropName[, default])
|
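The split between the lookup key and the name used in error messages can be sketched as below. PropertiesSketch, the property names and the error types are all hypothetical, and raising when a property is unset and no default is given is an assumption. getBool is omitted for brevity.

```python
class PropertiesSketch(object):
    """Hypothetical stand-in; the real object is supplied by the node."""

    _MISSING = object()

    def __init__(self, values):
        self._values = values  # keyed by Run Time Property name

    def isSet(self, propName, runtimePropName):
        return runtimePropName in self._values

    def getString(self, propName, runtimePropName, default=_MISSING):
        if runtimePropName in self._values:
            return self._values[runtimePropName]
        if default is not PropertiesSketch._MISSING:
            return default
        # Errors name the friendlier property name, not the lookup key.
        raise KeyError("Property '%s' is not set" % propName)

    def getInt(self, propName, runtimePropName, default=_MISSING):
        raw = self.getString(propName, runtimePropName, default)
        try:
            return int(raw)
        except (TypeError, ValueError):
            raise ValueError("Property '%s' is not an integer" % propName)


props = PropertiesSketch({"example.batchSize": "500"})
batch = props.getInt("Batch Size", "example.batchSize")
limit = props.getInt("Limit", "example.limit", 10)
try:
    props.getString("Limit", "example.limit")
    error_text = ""
except KeyError as exc:
    error_text = str(exc)
```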
Patterns objects
Methods |
The Patterns object has simple methods for constructing and returning FieldPattern objects, which can then be provided to the MetadataBuilder. The Patterns object can also be used in record key properties (in the Advanced tab of the Match Keys, GroupBy and SortBy properties). This allows you to join, group or sort the input data on all input fields that match the pattern. For example, in the Aggregate node you can specify a wildcard pattern in the Advanced tab of the GroupBy property to group the input data by all fields that match the pattern. Note: If a pattern is specified, it must match at least one input field name.
wildcard(pattern[, metadata])
all([metadata])
regex(pattern[, metadata])
|
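Selecting field names by wildcard or regular expression can be illustrated with the standard library. This is a sketch with assumed semantics, not the product implementation: the functions take a plain list of names, wildcards follow shell-style rules, regular expressions must match the whole name, and ValueError is an assumed error type.

```python
import fnmatch
import re


def _require_match(matched, pattern):
    # Mirrors the note that a pattern must match at least one input field.
    if not matched:
        raise ValueError("pattern %r matched no input fields" % pattern)
    return matched


def wildcard(pattern, field_names):
    """Select names matching a shell-style wildcard (assumed semantics)."""
    return _require_match(
        [n for n in field_names if fnmatch.fnmatchcase(n, pattern)], pattern)


def regex(pattern, field_names):
    """Select names that the regular expression matches in full (assumed)."""
    full = re.compile(r"(?:%s)\Z" % pattern)
    return _require_match(
        [n for n in field_names if full.match(n)], pattern)


names = ["cust_id", "cust_name", "order_total"]
by_wildcard = wildcard("cust_*", names)
by_regex = regex(r".*_(id|total)", names)
```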
FieldPattern objects
Methods |
A FieldPattern object is returned as the result of the various method calls available on the Patterns object. FieldPattern objects are generally used as arguments to the MetadataBuilder.
rename(renamePattern)
Tip: Standard Python regular expressions are available via the "re" module.
|
(* links correct at time of publishing).
Null handling
All Null fields on data records are bound in as special Null objects in Python, not the Python None value.
To check if a value is Null, use the following syntax:
if in1.MyField is Null:
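The reason for testing with "is Null" rather than "is None" can be seen with a sentinel object. The _NullType class below is a hypothetical stand-in. In a real node, Null is provided by the runtime and should not be redefined.

```python
class _NullType(object):
    """Hypothetical sentinel type; the real Null is provided by the node."""

    def __repr__(self):
        return "Null"


Null = _NullType()


def describe(value):
    # Identity checks distinguish the sentinel from None and from real values.
    if value is Null:
        return "null field"
    if value is None:
        return "python None"
    return "value"


labels = [describe(v) for v in (Null, None, "", 0)]
```

An empty string and zero are ordinary values here; only the sentinel itself is reported as a null field.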
Non-ASCII characters
When working with the Python-based nodes, it is important to be aware of the following Python language notation if your data contains non-ASCII characters:
Prefix the string with the 'u' character to convert the string to a Unicode string.
For example, if you had the field name 'colör' as input to a Transform node, the following code would cause the node to fail:
out1.color = str(fields['colör'])
Instead, you would need to use the 'u' character prefix to convert the colör field to a Unicode string:
out1.color = str(fields[u'colör'])
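The following sketch does not reproduce the node's behaviour. It only shows that a Unicode string and its encoded byte string are different keys, which is one plausible reason why the unprefixed lookup fails. The plain dict standing in for the fields object and the UTF-8 encoding are assumptions.

```python
# u"col\u00f6r" is the field name "colör" written with an escape sequence.
unicode_name = u"col\u00f6r"
# Roughly what an unprefixed literal holds in Python 2, assuming UTF-8 source.
byte_name = unicode_name.encode("utf-8")

fields = {unicode_name: u"red"}  # plain dict standing in for the fields object

found_with_unicode = unicode_name in fields
found_with_bytes = byte_name in fields
name_lengths = (len(unicode_name), len(byte_name))
```

The Unicode name has five characters while the encoded form has six bytes, so the two never compare equal.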