The code skeleton provided by default with the Python node contains the basic code needed to run the node successfully. You only need to add your own code to the initialize and pump methods.
The process flow of the Python node follows the path:
initialize
while pump:
    # keep on calling pump until it returns False
    pass
finalize
This path is always followed, assuming that no exceptions are thrown from the code within these methods. Additional node states for controlling downstream processing (outside of failure and success) are described in Controlling downstream processing.
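The call order above can be sketched in plain Python. This is only an illustration: SketchNode and run_node are hypothetical names made up for this example, not part of the Python node's API, and the real node is driven by the product rather than by code like this.

```python
class SketchNode:
    """Hypothetical stand-in for a node; not the product's actual class."""

    def __init__(self):
        self.calls = []
        self._remaining = 3  # pretend there are three pieces of work

    def initialize(self):
        self.calls.append("initialize")

    def pump(self):
        self.calls.append("pump")
        self._remaining -= 1
        # Returning False tells the caller that there is no more work to do.
        return self._remaining > 0

    def finalize(self):
        self.calls.append("finalize")


def run_node(node):
    """Hypothetical driver that mirrors the documented call order."""
    node.initialize()
    while node.pump():
        pass  # keep on calling pump until it returns False
    node.finalize()
    return node.calls
```

In this sketch, run_node(SketchNode()) records one initialize call, three pump calls, and one finalize call. An exception raised in any method would propagate out of run_node and skip the remaining steps.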
Initialize
The initialize method is where you should perform the steps required to set up the node. This generally involves:
- Reading any required properties.
- Setting up output metadata (if that can be done without reading input records).
- Opening outputs (if the output metadata can be set up).
- Performing any other initialization that may be required.
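The steps listed above might be arranged as in the following sketch. Everything here is an assumption made for illustration: InitSketchNode, the properties dictionary, and the delimiter and output_fields property names are hypothetical and do not reflect the Python node's real property or output API.

```python
class InitSketchNode:
    """Hypothetical node; the properties dict and output fields are stand-ins."""

    def __init__(self, properties):
        self._properties = properties  # stand-in for the node's property store
        self.delimiter = None
        self.output_fields = None
        self.output_open = False
        self.rows_written = 0

    def initialize(self):
        # 1. Read any required properties, failing early if one is missing.
        if "delimiter" not in self._properties:
            raise ValueError("required property 'delimiter' is not set")
        self.delimiter = self._properties["delimiter"]

        # 2. Set up output metadata, when it is known without reading input.
        fields = self._properties.get("output_fields")
        if fields is not None:
            self.output_fields = list(fields)
            # 3. Open the output only once its metadata is in place.
            self.output_open = True

        # 4. Perform any other initialization, such as resetting counters.
        self.rows_written = 0
```

When the output fields are not known up front, this sketch leaves the output closed, on the assumption that the metadata would be set up later, once input records have been read.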
Pump
The pump method contains the business logic of the node. This involves reading and writing records, and applying any business logic or data transformations to those records. The pump method is called repeatedly until it returns False, so each call can do an individual piece of work rather than doing everything at once, which frees up the CPU.
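A minimal sketch of a pump method that does its work in small pieces is shown below. ChunkedPumpNode, the in-memory record list, and the batch_size parameter are hypothetical. A real node would read from its inputs and write to its outputs instead of using Python lists.

```python
class ChunkedPumpNode:
    """Hypothetical node that processes a list of records in small batches."""

    def __init__(self, records, batch_size=2):
        self._records = records
        self._batch_size = batch_size
        self._position = 0
        self.output = []

    def pump(self):
        # Take the next batch only, instead of looping over every record.
        end = self._position + self._batch_size
        for record in self._records[self._position:end]:
            self.output.append(record.upper())  # placeholder transformation
        self._position = end
        # True means "call me again"; False means all records are processed.
        return self._position < len(self._records)
```

With five records and a batch size of two, pump returns True twice and False on the third call, by which point every record has been transformed.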
Finalize
The finalize method finishes execution of the node and cleans up internal resources allocated during the initialize and pump methods.
While some additional code may be required in the finalize method to close external files, database connections, sockets and so on, the default implementation is often sufficient.
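As an illustration of the kind of extra code that finalize might need, the sketch below closes a file that was opened in initialize. FileBackedNode is a hypothetical class written for this example. It does not show the default implementation supplied with the skeleton, which a real node would presumably still need to run.

```python
class FileBackedNode:
    """Hypothetical node that owns an external file handle."""

    def __init__(self, path):
        self._path = path
        self._handle = None

    def initialize(self):
        # Acquire the external resource during setup.
        self._handle = open(self._path, "w", encoding="utf-8")

    def pump(self):
        self._handle.write("one record\n")
        return False  # a single pass is enough for this sketch

    def finalize(self):
        # Close the file if it was opened; a second call does nothing.
        if self._handle is not None:
            self._handle.close()
            self._handle = None
```

Resetting the handle to None after closing keeps finalize safe to call more than once, which can be convenient if cleanup is triggered from several places.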