Python node recommendations - Latest

Data360 Analyze Server Help

Product type: Software
Portfolio: Verify
Product family: Data360
Product: Data360 Analyze
Version: Latest
Language: English
Product name: Data360 Analyze
Title: Data360 Analyze Server Help
Copyright: 2024
First publish date: 2016
Last updated: 2024-11-28
Published on: 2024-11-28T15:26:57.181000

Property visibility

If you want to create a Python node to implement some complex functionality, and allow this functionality to be configured by setting the node properties, you can create a Python library node. For more information on creating library nodes, see Creating library nodes. It is generally recommended that you hide the following Python-specific properties so that they are not visible to the end user of the node:

  • Python2Implementation
  • PythonCodeOnServer
Note: In the library node editor Define panel, you can hide inherited properties by clicking the relevant property menu and choosing Hide.

However, in advanced use cases the end user of the node may be expected to provide their own Python code. In such cases, it may be appropriate to expose the Python-specific properties.

Property base and run time property names

It is recommended that you set the property base to reflect the node's name. For example, the property base for a node called SumExample would be ls.brain.node.sumExample.

Experienced users will be aware that the structure of the node property hierarchy is important because the property resolution rules use the dot-separated property name as a search hierarchy.

If a search is performed for the property ls.brain.node.sumExample.foo, then

  • If it exists, it is returned
  • If it does not exist, a search is performed for ls.brain.node.foo, then
    • If this property exists, it is returned.
    • If it does not exist, a search is performed for ls.brain.foo, then
      • If this property exists, it is returned.
      • If it does not exist, a search is performed for ls.foo then
        • If this property exists, it is returned.
        • If it does not exist, a search is performed for foo.
          • If this property exists, it is returned, otherwise an error occurs.
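The search order described above can be sketched in plain Python. This is only an illustration of the lookup sequence, not the product's implementation: the candidateNames and resolve functions are hypothetical names, and a plain dictionary stands in for the node's property store.

```python
def candidateNames(propertyName):
    """List the names that are searched, from most to least specific."""
    parts = propertyName.split(".")
    prefix, leaf = parts[:-1], parts[-1]
    # Drop one trailing level of the prefix at a time, keeping the final name
    return [".".join(prefix[:i] + [leaf]) for i in range(len(prefix), -1, -1)]


def resolve(propertyName, properties):
    """Return the first matching value, or raise an error if none exists."""
    for name in candidateNames(propertyName):
        if name in properties:
            return properties[name]
    raise KeyError("Property not found: " + propertyName)


# Assumption: a dictionary stands in for the real property store
names = candidateNames("ls.brain.node.sumExample.foo")
value = resolve("ls.brain.node.sumExample.foo", {"ls.brain.foo": "x", "foo": "y"})
```

Here names holds the five names in the order listed above, and value is taken from ls.brain.foo because that is the most specific name present in the dictionary.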

Therefore, it is recommended that you ensure that the node hierarchy is mirrored in the run time property name hierarchy. It is also convenient to declare the property base once in the node code, rather than typing it out each time you require a property value. This way, a propertyName property that is declared on a parent node with the correct run time property name hierarchy can be obtained using the following line:

self.properties.getString(PROPERTY_BASE + ".propertyName")
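One way to declare the property base once is sketched below. The value shown assumes the SumExample node from earlier in this topic, and the runTimeName helper is a hypothetical convenience function, not part of the product API.

```python
# Declared once, near the top of the node code (assumed value for SumExample)
PROPERTY_BASE = "ls.brain.node.sumExample"


def runTimeName(shortName):
    """Build the full run time property name from a short property name."""
    return PROPERTY_BASE + "." + shortName


fullName = runTimeName("propertyName")
```

Within the node, the result could then be passed to self.properties.getString in place of the concatenation shown above.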

Code documentation and maintenance

Please keep in mind the following recommendations:

  • Comment any complex logic in the code for clarity for future users.
  • Provide sufficient information to the logs such that errors have sensible and understandable error messages that allow for node errors to be easily resolved.
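As an illustration of the second recommendation, the following sketch validates a property value and raises an error whose message names the property and the offending value. The parsePositiveInt function and the MaxRows property name are hypothetical; how the message reaches the node log depends on your node code.

```python
def parsePositiveInt(rawValue, propertyName):
    """Convert a property value to a positive integer, with a clear error."""
    try:
        value = int(rawValue)
    except (TypeError, ValueError):
        raise ValueError(
            "Property '%s' must be a whole number, but got %r"
            % (propertyName, rawValue))
    if value <= 0:
        raise ValueError(
            "Property '%s' must be greater than zero, but got %d"
            % (propertyName, value))
    return value


maxRows = parsePositiveInt("5", "MaxRows")
```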

Working with external files

In general, it is sufficient to use the node inputs and outputs for all required processing. Where permanent storage is required, standard Python I/O mechanisms can be used to create files on the server.

There is sometimes a need to create temporary files which only exist for the lifetime of the node. Mechanisms are provided within the Python node to allow for temp file management.

A file can be created using the standard Python I/O mechanisms and registered with the node to be deleted once the node has terminated. This can be done using the registerTemp method on the reporter object.

There is currently no way within the Python node to make temporary file persistence follow the temporary file configuration. A registered temp file is always deleted when the node has completed, regardless of whether the node succeeded or failed.

An example of how this could be used to create a temporary file with a .tmp extension in the system tmp directory is shown below:

  1. Firstly, both the os and tempfile modules must be imported, as follows:
    import os
    import tempfile
  2. Create a temp file using the standard Python mechanisms provided by the tempfile module, as follows:
    #Create the temp file
    (tempFileHandle, tempFileName) = tempfile.mkstemp(".tmp")
  3. Register this with the node such that it gets deleted when the node has completed execution, as follows:
    #Register the temp file to get deleted when the node completes
    self.reporter.registerTemp(tempFileName)
  4. Perform the normal file I/O operations to populate the file. An example of this (writing a simple line to the temp file then closing the file) is shown below:
    #Write something to the temp file & close
    #(a bytes literal works with os.write on both Python 2 and Python 3)
    os.write(tempFileHandle, b"This is a temp file\n")
    os.close(tempFileHandle)
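The four steps above can be combined into a single helper, sketched here so that it can be run outside a node. The StubReporter class is a hypothetical stand-in for the node's reporter object: it only imitates registerTemp and deletes the files when its cleanup method is called. The writeTempFile helper is also hypothetical.

```python
import os
import tempfile


class StubReporter(object):
    """Hypothetical stand-in for the node's reporter object."""

    def __init__(self):
        self.tempFiles = []

    def registerTemp(self, path):
        # Remember the file so that it can be deleted later
        self.tempFiles.append(path)

    def cleanup(self):
        # Imitates the deletion that happens when the node completes
        for path in self.tempFiles:
            if os.path.exists(path):
                os.remove(path)
        self.tempFiles = []


def writeTempFile(reporter, text):
    """Create a temp file, register it for deletion, and write text to it."""
    (tempFileHandle, tempFileName) = tempfile.mkstemp(".tmp")
    reporter.registerTemp(tempFileName)
    try:
        os.write(tempFileHandle, text.encode("utf-8"))
    finally:
        # Close the handle even if the write fails
        os.close(tempFileHandle)
    return tempFileName
```

Registering the file before writing to it means that the file is still cleaned up if the write raises an error.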

Controlling downstream processing

Note: The setSuccessReturnCode mechanism may not work correctly when the nodes are run in streaming mode. This is because in streaming mode, the downstream nodes start execution as soon as possible, rather than waiting for the predecessor nodes to complete. This means that by the time the setSuccessReturnCode has been set by a node, the downstream nodes that it is trying to prevent from running will have already started execution. To prevent this from happening, when constructing a Python node which uses the setSuccessReturnCode method, the node should be set to never be run in streaming mode.

The following steps will ensure that the node is never streamed:

  • Create a new boolean property on the node (name it "Streamable").
  • Set the run time property name of the property to ls.brain.node.streamable.
  • Set the property to False.
  • If you are creating a library node, this property should then be hidden to prevent end users of the node from modifying it.

For example, suppose that a node defines a property named ReturnCode, which is read by the code in the Python2Implementation property. This return code determines how the node exits, and also determines which nodes downstream from this node will be executed. Within the pump method, the ReturnCode property is used to set the exit status of the node, as follows:

self.reporter.setSuccessReturnCode(self.returnCode)

The valid values to pass to the setSuccessReturnCode method, and their meanings, are outlined in the following table.

ReturnCode    Description
0             All nodes wired to the output pin and output clock will execute normally.
-1            The node errors; no connected nodes execute on either the output pin or output clock.
100           Signal to any nodes that are wired to this node's output pin to change to the rescheduled state, without preserving any data on the output.
101-116       Signal to any nodes that are wired to this node's output pin to change to the rescheduled state. The data on the 1st to 16th output will be preserved (e.g. return code 102 will preserve the data on the 2nd output, 113 will preserve the data on the 13th output, and so on).
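The relationship between the rescheduling return codes and the output numbers can be expressed as a small helper. This is a sketch based only on the table above; the rescheduleCode function is a hypothetical name and is not part of the product API.

```python
def rescheduleCode(preservedOutput=None):
    """Return the rescheduling return code described in the table above.

    preservedOutput is the 1-based number of the output whose data should
    be preserved, or None if no output data should be preserved.
    """
    if preservedOutput is None:
        return 100
    if not 1 <= preservedOutput <= 16:
        raise ValueError(
            "preservedOutput must be between 1 and 16, but got %r"
            % (preservedOutput,))
    return 100 + preservedOutput


code = rescheduleCode(2)
```

The resulting value could then be passed to self.reporter.setSuccessReturnCode within the pump method.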

Adding extra code blocks

In some advanced cases, you may want to create some infrastructure for your Python node which defines the base operation, and then expose another Python property where simple code snippets can be entered to control some small piece of the processing. For example, you may be creating a Python library node that has some common base code, where each instance of the node needs to add or modify a small section of code. In such cases, multiple Python code blocks are required.

Modifying the Python2Implementation property allows you to execute your Python code. When you are writing an additional Python code block, you will need to load, import and execute the additional code, for example:

# If you implement your logic here, you don't need code below. If you want to inherit, and add more logic at next level, uncomment below code
# py3File = brainNodeControlObj.properties.getString("ls.brain.node.BrainPython.python3ImplementationFile")
# braininfo.loadModule(py3File, "py3", brainNodeControlObj)
# import py3
# BrainNodeImpl = py3.setup(brainNodeControlObj, BrainNode)
# return BrainNodeImpl
return BrainNode

When uncommented, this code does the following:

  1. Obtain the name of the file containing the Python code property which corresponds to the run time property name:
    ls.brain.node.BrainPython.python3ImplementationFile
  2. Load the code from this file into the module named "py3":
    braininfo.loadModule(py3File, "py3", brainNodeControlObj)
  3. Import the module py3:
    import py3
  4. Call the py3 setup method and assign the returned object to the BrainNodeImpl variable:
    BrainNodeImpl = py3.setup(brainNodeControlObj, BrainNode)
  5. Return this object to the caller of our setup method:
    return BrainNodeImpl

Additionally, after uncommenting the code, you must also complete the following steps:

  1. Create a new property:
    a. Name the property to represent its purpose (<PropertyName>).
    b. Set the type of the property to Python.
    c. Set the Run Time Property Name of the property to represent its purpose.
  2. Within the Python2Implementation code, uncomment from # py3File = to # return BrainNodeImpl.
  3. Within the Python2Implementation code, comment out the following line: return BrainNode.
  4. In the Python2Implementation code, replace the reference to ls.brain.node.BrainPython.python3ImplementationFile with the Run Time Property Name that you defined in step 1c.
  5. In the Python2Implementation code, change all of the references to py3 (both the whole word, and in the string py3File) to the property name that you defined in step 1a.
  6. In the (<PropertyName>) property code, add the following code:
    import braininfo
    def setup(brainNodeControlObj, BrainNodeClass):
        ''' Must return a class that inherits from BrainNodeClass. '''
        class BrainNode(BrainNodeClass):
            #Your methods should be implemented here
            #(remove the pass statement once a method has been added)
            pass

        return BrainNode
  7. Modify the (<PropertyName>) property code to add the methods that you want to define on the new class.
  8. Call these methods using the following format: self.<methodName>(<args>) within the Python2Implementation code, where appropriate.

The process of adding extra code can be expanded to any number of levels of inheritance. Each time you add a new code block, rather than returning BrainNode at the end of the setup method, you simply need to:

  • Obtain the filename of the Python code property from self.properties.
  • Load, then import, the module.
  • Call the setup method on that module.
  • Return the object that the setup method returns.
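The chaining of setup methods can be illustrated without the product libraries. In the sketch below, plain functions stand in for the modules that would normally be loaded with braininfo.loadModule, and BaseNode is a hypothetical stand-in for the BrainNode class. All names are assumptions made for illustration only.

```python
class BaseNode(object):
    """Hypothetical stand-in for the base BrainNode class."""

    def __init__(self):
        self.handled = []

    def pump(self):
        # processData is supplied by a later level of inheritance
        return self.processData()


def setupProcessData(controlObj, BaseClass):
    """First extra code block: adds processData, then chains to the next."""
    class Node(BaseClass):
        def processData(self):
            # processRecord is supplied by the next level of inheritance
            self.processRecord("record")
            return False

    # Rather than returning Node, call setup on the next code block
    return setupProcessRecord(controlObj, Node)


def setupProcessRecord(controlObj, BaseClass):
    """Second extra code block: the last level, so it returns its class."""
    class Node(BaseClass):
        def processRecord(self, record):
            self.handled.append(record.upper())

    return Node


NodeClass = setupProcessData(None, BaseNode)
node = NodeClass()
result = node.pump()
```

Each level receives the class built by the previous level and returns a subclass of it, so a method that is called in one code block can be defined in a later one.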

Example

You have a node that provides the code required to set up the outputs to have the same fields as the corresponding input metadata. For this to work, the node must have the same number of inputs as outputs.

The Python2Implementation property defines the base functionality, including the required initialize and pump methods. The pump method simply makes the following call:

return self.processData()

However, self.processData is not defined within the Python2Implementation property. Rather, this is declared in the additional Python code property ProcessData. The ProcessData property is declared to have a run time property name of ls.brain.node.inheritanceExample.processData.

At the end of the setup method in the Python2Implementation property, you can see that this Python code property is loaded, imported, the setup method is called on ProcessData, and the value returned from the ProcessData setup method is then returned from the Python2Implementation setup method, as follows:

processData = brainNodeControlObj.properties.getString("ls.brain.node.inheritanceExample.processData")
braininfo.loadModule(processData, "processDataModule", brainNodeControlObj)
import processDataModule
ProcessDataImpl = processDataModule.setup(brainNodeControlObj, BrainNode)
return ProcessDataImpl

With this approach, the call to self.processData within the Python2Implementation pump method will end up actually calling the processData method defined in the ProcessData property.

The processData method itself is designed to be expanded in nodes that inherit from the base node. It is defined as follows:

def processData(self):
    #Enter code here to process through the data
    #Return True to be called again for the next stage of processing
    #Return False to signal that there is no more work to do
    #The node will complete processing after False is returned

    return False

Since this is defined as a base node, and the nodes that inherit from this base node should not be modifying the Python2Implementation (rather they should modify the ProcessData method), the Python2Implementation and PythonCodeOnServer properties have been hidden from the implementing node.

A node called Multi Filter inherits the ProcessData property from the Abstract Metadata PassThrough node. The Multi Filter node then defines a new ProcessRecord property, which is invoked from the code within the ProcessData property. The processData method shown below continues reading records from each input until the inputs have no more records to read. For each record, the processRecord method is called.

def processData(self):
    allProcessed = True
    self.recordNum = self.recordNum + 1

    #Keep processing records until none left to read
    for inputNum in range(len(self.inputs)):
        inputRecord = self.inputs[inputNum].read()

        if inputRecord is not None:
            allProcessed = False

            #Call the processRecord method
            #to be defined on implementing nodes
            self.processRecord(inputNum, inputRecord, self.recordNum, self.inputs[inputNum], self.outputs[inputNum])

    return not allProcessed
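The way in which the return value drives repeated calls can be tried outside a node. In the following sketch, StubInput and SketchNode are hypothetical stand-ins, the read method is assumed to return None once an input is exhausted, and the while loop imitates the repeated calls made while True is returned.

```python
class StubInput(object):
    """Hypothetical stand-in for a node input."""

    def __init__(self, records):
        self._records = list(records)

    def read(self):
        # Assumption: None signals that there are no more records
        return self._records.pop(0) if self._records else None


class SketchNode(object):
    def __init__(self, inputs):
        self.inputs = inputs
        self.recordNum = 0
        self.seen = []

    def processRecord(self, inputNum, inputRecord, recordNum):
        self.seen.append((inputNum, recordNum, inputRecord))

    def processData(self):
        allProcessed = True
        self.recordNum = self.recordNum + 1
        for inputNum in range(len(self.inputs)):
            inputRecord = self.inputs[inputNum].read()
            if inputRecord is not None:
                allProcessed = False
                self.processRecord(inputNum, inputRecord, self.recordNum)
        return not allProcessed


node = SketchNode([StubInput(["a", "b"]), StubInput(["x"])])
calls = 0
# Keep calling processData while it reports that there is more work to do
while node.processData():
    calls += 1
```

Each call reads at most one record from every input, so inputs of different lengths are handled until the longest input is exhausted.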

Again, however, this code is calling a method that is not defined within the code block. The processData method calls the processRecord method which is not defined in the ProcessData property.

This node has introduced an additional level of inheritance, with an additional Python code property. This time, the property is ProcessRecord, with the run time property name of: ls.brain.node.inheritanceExample.processRecord. At the end of the ProcessData setup method, you can see that the same steps have been taken to load the property file, import the corresponding module and call setup on this module as shown in the code below:

processRecord = brainNodeControlObj.properties.getString("ls.brain.node.inheritanceExample.processRecord")
braininfo.loadModule(processRecord, "processRecordModule", brainNodeControlObj)
import processRecordModule
ProcessRecordImpl = processRecordModule.setup(brainNodeControlObj, BrainNode)
return ProcessRecordImpl

Therefore, within the processData method, the call to self.processRecord will be invoking the processRecord method defined within the ProcessRecord property.

As shown below, the implementation of the processRecord method is just a stub which is left to be filled out by Multi Filter node instances:

# Takes the following parameters
#   inputNum    - The index of the input that is being processed.
#                 Note: This is the same index as that of the output that is provided.
#   inputRecord - The record currently being processed from the specified input
#   recordNum   - The record being read is the <recordNum>th record read from the specified input
#   input       - The node input from which the record was read
#   output      - The node output which can be written to where desired
def processRecord(self, inputNum, inputRecord, recordNum, input, output):
    #Enter code here to process the record
    pass

Since users who implement the Multi Filter node are only meant to modify the contents of the ProcessRecord property and not the ProcessData property, the ProcessData property has been hidden on the Multi Filter node. Therefore, when an end user creates a new Multi Filter node, they are presented with a clean node interface that hides most of the implementation and restricts their focus to writing the code that handles the individual records as they are read.

The following example shows how a PassThrough node implements a straight pass through on all input records:

def processRecord(self, inputNum, inputRecord, recordNum, input, output):
    record = output.metadata.newRecord()
    record._copyFrom(inputRecord)
    output.write(record)

Since all of the record I/O has been taken care of in the base nodes, the amount of effort required from the node user is significantly reduced.

The next example shows how to convert strings to upper case. The code for the processRecord method is as follows:

def processRecord(self, inputNum, inputRecord, recordNum, input, output):
    outputRecord = output.metadata.newRecord()
    for idx in range(len(inputRecord)):
        val = inputRecord[idx]
        if isinstance(val, basestring):
            outputRecord[idx] = val.upper()
        else:
            outputRecord[idx] = val
    #Write the record once, after all of the fields have been copied
    output.write(outputRecord)
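A processRecord implementation such as the upper case conversion can be exercised outside the product by substituting simple stand-ins for the output objects. In this sketch, StubMetadata and StubOutput are hypothetical classes that imitate only the newRecord and write calls used above, records are represented as plain lists, and str replaces basestring so that the sketch also runs under Python 3.

```python
class StubMetadata(object):
    """Hypothetical stand-in for output.metadata."""

    def __init__(self, fieldCount):
        self.fieldCount = fieldCount

    def newRecord(self):
        # Assumption: a record is represented as a list of field values
        return [None] * self.fieldCount


class StubOutput(object):
    """Hypothetical stand-in for a node output."""

    def __init__(self, fieldCount):
        self.metadata = StubMetadata(fieldCount)
        self.written = []

    def write(self, record):
        self.written.append(record)


def processRecord(inputNum, inputRecord, recordNum, input, output):
    outputRecord = output.metadata.newRecord()
    for idx in range(len(inputRecord)):
        val = inputRecord[idx]
        # str is used here in place of basestring (see the note above)
        if isinstance(val, str):
            outputRecord[idx] = val.upper()
        else:
            outputRecord[idx] = val
    # One record is written per input record
    output.write(outputRecord)


out = StubOutput(3)
processRecord(0, ["abc", 1, "x"], 1, None, out)
```

After the call, out.written contains a single record in which the string fields are upper case and the numeric field is unchanged.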