Handling Python node inputs and outputs - 3.12

Data360 Analyze Server Help

This section details how to write the Python node code to handle node inputs and outputs.

Handling node input

Finding input fields

If you want to locate a specific field within an input, it is necessary to find the (zero-indexed) index of the field within the input metadata, using the name of the field to search. For example, the following code within a node called SumExample locates the field with the name specified in the InputFieldName property. The node will error if the field does not exist on the input:

self.inputFieldIndices = []
for nodeInput in self.inputs:
    idx = nodeInput.metadata.find(self.inputFieldName)
    if idx == -1:
        raise braininfo.BrainNodeException("Specified InputFieldName: %s does not exist on input: %s" % (self.inputFieldName, nodeInput.name))
    #Store the index so that the metadata does not need to be searched again
    self.inputFieldIndices.append(idx)

This code obtains the input object for each node input, then searches the input's metadata for the field with the name self.inputFieldName.

If the field cannot be found in the metadata, then a result of -1 is returned. In this case, the node raises an exception and errors.

It is often the case that the same field will need to be read from each record in an input. When this is the case, then it is recommended to store the field index (idx in the above code), such that it can be used without re-searching the record metadata each time. This should be performed in the initialize method of the node.
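
The caching pattern described above can be sketched outside the product with stand-in classes. FakeMetadata and SumSketch below are hypothetical and exist only to make the example runnable; the sketch assumes that find returns -1 for a missing field, as the text describes, and is meant to show only the look-up-once, reuse-many-times structure.

```python
class FakeMetadata(object):
    """Hypothetical stand-in for an input's metadata (not the product class)."""
    def __init__(self, names):
        self.names = list(names)
        self.lookups = 0  # counts how many name searches were performed

    def find(self, name):
        # Assumed behaviour, per the text: the field index, or -1 if absent
        self.lookups += 1
        return self.names.index(name) if name in self.names else -1


class SumSketch(object):
    """Hypothetical node-like object that caches the field index."""
    def __init__(self, metadata, fieldName):
        self.metadata = metadata
        self.fieldName = fieldName
        self.total = 0

    def initialize(self):
        # Search the metadata once and keep the result
        self.idx = self.metadata.find(self.fieldName)
        if self.idx == -1:
            raise ValueError("Field %s does not exist on the input" % self.fieldName)

    def process(self, record):
        # No metadata search here: the cached index is used directly
        self.total += record[self.idx]


meta = FakeMetadata(["id", "rand"])
node = SumSketch(meta, "rand")
node.initialize()
for rec in [(1, 10), (2, 20), (3, 30)]:
    node.process(rec)
```

However many records are processed, the metadata is searched only once, in initialize.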

Reading records

Note: Record processing should generally be performed in the pump method, see Pump.

Processing records is relatively straightforward. The following example shows how to read records from each input that still has records remaining, until all inputs have been completely read:

for inp in range(len(self.inputs)):
    inputRec = self.inputs[inp].read()
    if inputRec is not None:
        #process record
        pass

This example loops through each of the node inputs, then reads the next available record from the inpth input. If no more records are available from this input, then a null value (None in Python) will be returned.
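
To see how repeated passes of this loop drain inputs of different lengths, the following sketch uses a hypothetical FakeInput class whose read method returns None once its records are used up. The pumpOnce helper and the driving while loop are also illustrative assumptions; in the product, the node framework is what calls pump.

```python
class FakeInput(object):
    """Hypothetical stand-in for a node input; read() returns None when exhausted."""
    def __init__(self, records):
        self._records = list(records)

    def read(self):
        return self._records.pop(0) if self._records else None


def pumpOnce(inputs, processed):
    """One pass over the inputs; returns False once every input is exhausted."""
    readAny = False
    for inp in range(len(inputs)):
        inputRec = inputs[inp].read()
        if inputRec is not None:
            # "Process" the record by noting which input it came from
            processed.append((inp, inputRec))
            readAny = True
    return readAny


inputs = [FakeInput(["a1", "a2", "a3"]), FakeInput(["b1"])]
processed = []
while pumpOnce(inputs, processed):
    pass
```

The shorter input simply stops contributing records once it is exhausted, while the longer one continues to be read on later passes.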

Therefore, if we only had one input and wanted to continue processing records until this input had no more records to read, the following could be used instead:

inputRec = self.inputs[0].read()
if inputRec is None:
    return False
#process record
return True

Once the record has been obtained, and we have verified that it is not null (a null record would mean that there are no more records to read from the input), it is straightforward to get a field from this record.

To obtain the first field defined on a record, the following can be used:

field = inputRec[0]

However, since field ordering is not guaranteed, it is better to refer to the field by name. To get the field named "Foo" from a record read from the first input, the following can be used:

field = inputRec["Foo"]

However, obtaining a field by name forces the node to search through the record's metadata to locate the field with the specified name. If this is to be done repeatedly, the index should be stored in a variable so that the metadata does not need to be searched every time. For instance, assume that, in the initialize method, the index of the InputFieldName field for the inpth input has been stored in the self.inputFieldIndices array, at position inp.

The following example obtains the InputFieldName field from the inpth input:

field = inputRec[self.inputFieldIndices[inp]]

Null field data

If a field has not been set on an input record, the data viewer will display the value as "NULL". When accessing a field that has not been set on a record, the returned object will be equal to the Python None.

Therefore, when reading the field at the index "index" from a record, the following will check that this field value is set on the record:

field = inputRec[index]
if field is None:
    #handle the case where the field is not set
    pass
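
As one possible way of handling unset fields, the sketch below skips them when summing. Plain tuples stand in for records, and the sumField helper is hypothetical. The check uses "is None" rather than a truthiness test, so that a legitimate value of 0 is still counted.

```python
def sumField(records, index):
    """Sum the field at `index`, skipping records where it is unset (None)."""
    total = 0
    skipped = 0
    for inputRec in records:
        field = inputRec[index]
        if field is None:
            # The field is not set on this record: count it and move on
            skipped += 1
            continue
        total += field
    return total, skipped


# Plain tuples stand in for input records; None marks an unset field
records = [(1, 5), (2, None), (3, 0), (4, 7)]
total, skipped = sumField(records, 1)
```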

Handling node output

This section details how the Python node can be written to handle the node outputs.

Setting output metadata

Note: Wherever possible, outputs should have their metadata set in the initialize method, see Initialize. In cases where the metadata is dependent on data from input records, the metadata should be set in the pump method.

The metadata needs to be defined on an output before any records are written to that output.

A new metadata object, for use on a node output, can be obtained using:

metadata = self.newMetadata()

Constructing new metadata

In the following example, the OutputAsDouble and OutputFieldName properties are used to determine the output metadata.

When OutputAsDouble is set to True, the field named by the OutputFieldName property is given a floating-point type. Otherwise, the field is set up with an integer type, as shown in the following code:

om = self.newMetadata()
if self.outputAsDouble:
    om.append(self.outputFieldName, "double")
else:
    om.append(self.outputFieldName, "int")
self.outputs[0].metadata = om

The newMetadata call constructs a new BrainMetadata object. This is then populated with field metadata using the append method. Once all of the required field metadata has been added, the metadata can be set on the output, as follows:

self.outputs[0].metadata = om

After the metadata has been set on the output, no additional field metadata can be added to the metadata.

Reusing metadata

While constructing new metadata allows for full control of all fields in the output metadata, it may be that the output metadata should simply be the same as the metadata on an input. In this case, you can use the clone method defined on the BrainMetadata class. The following example code shows how to set the metadata for the first output to be the same as the metadata for the first input:

#Set up the output metadata
outputMetadata = self.inputs[0].metadata.clone()
self.outputs[0].metadata = outputMetadata

Similarly, the following example shows how to set the metadata to be used on multiple outputs:

metadata0 = self.inputs[0].metadata.clone()
self.outputs[0].metadata = metadata0
metadata1 = self.inputs[0].metadata.clone()
self.outputs[1].metadata = metadata1

Note: While the same metadata can be used on multiple outputs without modifying the metadata, if you need to have different metadata on different outputs, a different metadata object will be required.
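
The reason for taking a separate clone per output can be illustrated with a stand-in class. FakeMetadata below is hypothetical and models metadata as a simple list of (name, type) pairs; it is assumed, not confirmed, that the product's clone method similarly returns an independent copy.

```python
class FakeMetadata(object):
    """Hypothetical stand-in: an ordered list of (name, type) pairs."""
    def __init__(self, fields=None):
        self.fields = list(fields or [])

    def clone(self):
        # A new object with its own copy of the field list
        return FakeMetadata(self.fields)

    def append(self, name, typeName):
        self.fields.append((name, typeName))

    def names(self):
        return [name for name, _ in self.fields]


inputMetadata = FakeMetadata([("id", "int"), ("rand", "double")])
metadata0 = inputMetadata.clone()
metadata1 = inputMetadata.clone()
# Only the second output's metadata gets the extra field
metadata1.append("Running Id Total", "int")
```

Because each clone owns its own field list, changing metadata1 leaves both metadata0 and the input metadata untouched.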

Extending input metadata

For most nodes, you neither want to create an entirely new metadata, nor do you want to have exactly the same metadata on the output as on the input. In most cases, you want to use most or all of the fields from an input, and add additional fields for calculations based on the input data.

The following example shows how a node called Simple Metadata Modification takes a Create Data node as an input. The node then outputs all of the input fields from the Create Data node, except for the "junk" field. In addition to the fields from the input, the node outputs the additional fields "Running Id Total", and "Running Rand Total" which are the sums of the fields "id" and "rand" from the input.

To set up this metadata, the following code is used:

#Construct the output metadata, using all of the input fields
om = self.inputs[0].metadata.clone()
#Remove "junk" from the output metadata
om.remove("junk")
#Add the additional fields that we want to output
om.append("Running Id Total", "int")
om.append("Running Rand Total", "double")
#Set the output metadata
self.outputs[0].metadata = om

The first line of this code takes a copy of the metadata from the input pin. At this stage, all of the metadata from the input is set on the metadata object om. Then, since we don’t want the field "junk" on the output pin, this is removed from the output metadata. Next, the additional fields "Running Id Total" and "Running Rand Total" are added to the metadata object, before this is finally set as the metadata for the output pin.

When copying the metadata from the input pin in this fashion, it is possible to use the _copyFrom method on the output record to pass through all of the fields defined on the input record which are to be sent to the node’s output, see Copying data from input to output pins.

Writing records

Note: Record writing should generally be performed in the pump method, see Pump.

Records can only be written to an output after the output has its metadata set. It is a relatively straightforward process to write records within a Python node. First, a new record is obtained from the output. Then, on the returned record, each of the fields can be populated prior to writing the record to the output.

The following code shows how to write a simple record to the first output with one field set to the value of the sum variable:

outputRec = self.outputs[0].newRecord()
outputRec[0] = sum
self.outputs[0].write(outputRec)

Each field which is not set on a record prior to the record being written will appear as "NULL" in the data viewer. For instance, if in the above example, the first output was defined with a metadata containing two fields, the second field would be left as "NULL".
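
The two-field case described above can be mimicked with a hypothetical FakeOutput class. It assumes that a new record starts with every field unset (None in Python), which is consistent with the text but is modelled here, not taken from the product.

```python
class FakeOutput(object):
    """Hypothetical stand-in for a node output with a fixed list of fields."""
    def __init__(self, fieldNames):
        self.fieldNames = list(fieldNames)
        self.written = []

    def newRecord(self):
        # Assumption: every field starts out unset
        return [None] * len(self.fieldNames)

    def write(self, record):
        # Keep a copy of what was written so it can be inspected afterwards
        self.written.append(list(record))


output = FakeOutput(["Sum", "Comment"])
outputRec = output.newRecord()
outputRec[0] = 42  # only the first field is populated
output.write(outputRec)
```

The second field is never assigned, so it is still None when the record is written, which corresponds to "NULL" in the data viewer.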

Copying data from input to output pins

This section describes how to populate output records without needing to copy each of the fields across from the input individually.

Once you have set up the output metadata, you can use code similar to the following example to copy the data from the input record to the output. Note that for this example, variables holding the indices of the fields that need to be handled individually have already been created in the initialize method.

# Read the input record
inputRec = self.inputs[0].read()
# Check that we haven't passed the last record
if inputRec is None:
    return False
# Add to the running totals
self.runningIdTotal = self.runningIdTotal + inputRec[self.inputIdIdx]
self.runningRandTotal = self.runningRandTotal + inputRec[self.inputRandIdx]
# Create a new output record
outputRec = self.outputs[0].newRecord()
# Copy all of the fields from the input record that are defined on the output metadata
outputRec._copyFrom(inputRec)
# Add the additional running total field values to the output record
outputRec[self.outputIdTotalIdx] = self.runningIdTotal
outputRec[self.outputRandTotalIdx] = self.runningRandTotal
# Write the output record
self.outputs[0].write(outputRec)

The first two pieces of this code read a record from the first input, and check that we have not passed the last record.

The code then adds to the running totals it is keeping on the "id" and "rand" fields. The variables self.inputIdIdx and self.inputRandIdx store the indices of the "id" and "rand" fields on the input pin; these have already been set up in the initialize method. Then a new output record is constructed.

The next line, the call to _copyFrom, is the line of interest, as it copies all fields from the input record that are defined on the output metadata to the new output record.

Note that after these fields have been copied, it is possible to make additional changes to the output record prior to writing it. In the example, this involves simply setting the two running total fields. However, it would also be possible at this stage to overwrite the values of fields that have been set via the call to _copyFrom.
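
The overall effect of this pattern can be sketched with plain lists. The copyFrom function below is a hypothetical stand-in for the copy step and assumes that fields are matched by name, which is an illustration of the behaviour described above and not the product's implementation. The field names follow the Simple Metadata Modification example.

```python
def copyFrom(inputNames, inputRec, outputNames):
    """Hypothetical stand-in for the copy step: match fields by name."""
    outputRec = [None] * len(outputNames)
    for i, name in enumerate(inputNames):
        if name in outputNames:
            # Only fields that are defined on the output are copied
            outputRec[outputNames.index(name)] = inputRec[i]
    return outputRec


inputNames = ["id", "junk", "rand"]
outputNames = ["id", "rand", "Running Id Total", "Running Rand Total"]

runningIdTotal = 0
runningRandTotal = 0.0
written = []
for inputRec in [(1, "x", 0.5), (2, "y", 0.25)]:
    runningIdTotal += inputRec[0]
    runningRandTotal += inputRec[2]
    outputRec = copyFrom(inputNames, inputRec, outputNames)
    # Fill in the fields that do not come from the input
    outputRec[2] = runningIdTotal
    outputRec[3] = runningRandTotal
    written.append(outputRec)
```

The "junk" value never reaches the output because that field is not defined on the output, and the two running total fields are filled in after the copy.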