In-container node execution

Data360 Analyze Server Help

Product: Data360 Analyze (Data360 product family, Verify portfolio), software, version Latest, English
Copyright 2024. First published 2016; last updated and published 2024-11-28.

In-container node execution improves the performance of nodes that process a moderate number of records. Rather than starting a separate process for each node (which can cause delays in data flows with many nodes), the "node container" process runs different nodes within the same process, sharing a single JVM and therefore reducing process start-up overhead.

This feature applies only to Java-based nodes, for example the Transform, Sort and Split nodes. It does not apply to deprecated nodes.

Disable in-container execution

In-container execution is on by default. If you want to disable in-container execution across the system, add the following line to your cust.prop file:

ls.brain.server.delegateToContainer=false

The cust.prop file can be found at:

<Data360Analyze site configuration directory>/cust.prop

Restart Data360 Analyze for the changes to take effect.
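Because cust.prop consists of key=value lines, the effect of this setting can be illustrated with java.util.Properties. The sketch below is hypothetical and is not how Data360 Analyze itself reads the file: the class name ContainerToggle is made up, and treating a missing key as "enabled" is an assumption based on the feature being on by default.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: how a reader of cust.prop could interpret the
// delegateToContainer line. This is not Data360 Analyze's own code.
class ContainerToggle {
    static final String KEY = "ls.brain.server.delegateToContainer";

    // A missing key defaults to "true" (in-container execution is on by default).
    // A present value is parsed with Boolean.parseBoolean, so only "true"
    // (case-insensitive) keeps the feature enabled.
    static boolean inContainerEnabled(String custPropText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(custPropText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Boolean.parseBoolean(props.getProperty(KEY, "true").trim());
    }

    public static void main(String[] args) {
        System.out.println(inContainerEnabled(""));               // true
        System.out.println(inContainerEnabled(KEY + "=false\n")); // false
    }
}
```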

Disable in-container execution for an individual node

If you do not want an individual node to run in the container (for example, because it contains a custom script that generates a large in-memory object), you can disable in-container execution for that node:

  1. Select the node for which you want to disable in-container execution, then click the Define tab of the Properties panel.
  2. Create a new Boolean type property: type the name "DontRunInContainer" and select Boolean from the Type list.

  3. Assign a Run Time Property Name to the new "DontRunInContainer" property:
    1. On the Define tab of the Properties panel, click the menu button to the right of the property and select Edit Details.
    2. In the Edit Property Definition dialog, type ls.brain.node.memorySafeExecutor in the Run Time Property Name field.
    3. Click Done.
  4. Change to the Configure tab of the Properties panel, then set the DontRunInContainer property to True.
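The combined effect of the system-wide setting and the per-node property from the steps above can be sketched as a simple decision. This is a hypothetical illustration, not the product's scheduler: the class NodePlacement and its method are made up, it assumes that setting DontRunInContainer to True makes the run-time property ls.brain.node.memorySafeExecutor arrive as "true", and it does not model the restriction to Java-based, non-deprecated nodes.

```java
import java.util.Map;

// Hypothetical sketch of the per-node opt-out; the actual scheduling logic
// inside Data360 Analyze is not documented here.
class NodePlacement {
    // Run Time Property Name assigned to the DontRunInContainer property.
    static final String MEMORY_SAFE = "ls.brain.node.memorySafeExecutor";

    // containerEnabled: the system-wide setting from cust.prop.
    // runTimeProps: a node's run-time properties, keyed by run-time name
    // (assumed: DontRunInContainer=True arrives here as MEMORY_SAFE -> "true").
    static boolean runsInContainer(boolean containerEnabled, Map<String, String> runTimeProps) {
        if (!containerEnabled) {
            return false; // disabled across the system
        }
        boolean optedOut = Boolean.parseBoolean(runTimeProps.getOrDefault(MEMORY_SAFE, "false"));
        return !optedOut;
    }

    public static void main(String[] args) {
        System.out.println(runsInContainer(true, Map.of()));                    // true
        System.out.println(runsInContainer(true, Map.of(MEMORY_SAFE, "true"))); // false
    }
}
```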

Configure the node container port

By default, when the Data360 Analyze server starts, it launches the node container on the next available port, starting from the Data360 Analyze server port + 1 (for example, 7732).

If you want to set a fixed port for the node container, add the following line to your cust.prop file, replacing <port number> with the required port number:

ls.brain.server.nodeContainer.port=<port number>

For example, to set the node container to run on port 8123, add the following line:

ls.brain.server.nodeContainer.port=8123

Restart Data360 Analyze for the changes to take effect.
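The default "next available port" behavior, and the way a fixed port overrides it, can be sketched as follows. This is a hypothetical illustration rather than the server's actual code: the class ContainerPort is made up, and it assumes that "available" means a listening socket can be bound to the port.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical sketch of "next available port, starting from server port + 1".
class ContainerPort {
    // fixedPort: value of ls.brain.server.nodeContainer.port, or null if unset.
    static int choosePort(int serverPort, Integer fixedPort) {
        if (fixedPort != null) {
            return fixedPort; // a fixed port from cust.prop takes precedence
        }
        for (int port = serverPort + 1; port <= 65535; port++) {
            if (isFree(port)) {
                return port;
            }
        }
        throw new IllegalStateException("no free port above " + serverPort);
    }

    // Assumption: a port counts as free if a listening socket can be bound to it.
    static boolean isFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

A probe like this can race with other processes that grab the port between the check and the real bind, which is one reason to prefer a fixed port when firewall rules or other services depend on it.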

Configure node container heap space

The node container has no default maximum heap size. To configure the maximum amount of heap memory that the node container can use, add the following line to your cust.prop file, replacing <heap size> with the required value:

ls.brain.server.nodeContainer.javaMaxHeapSize=<heap size>

For example, to set the container to use a maximum of 2GB, add the line:

ls.brain.server.nodeContainer.javaMaxHeapSize=2048m

Restart Data360 Analyze for the changes to take effect.
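A value such as 2048m uses the same size suffixes as the JVM's -Xmx option, where k, m and g are powers of 1024. The helper below is a hypothetical sketch that converts such a value to bytes; the class HeapSize is made up, and toJvmOption only reflects the assumption (suggested by the property name, not stated on this page) that the value is passed to the container JVM as -Xmx.

```java
import java.util.Locale;

// Hypothetical helper: interprets a javaMaxHeapSize value using the same
// suffix convention as the JVM's -Xmx option (k, m, g = powers of 1024).
class HeapSize {
    static long toBytes(String value) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        char unit = v.charAt(v.length() - 1);
        long multiplier;
        switch (unit) {
            case 'k': multiplier = 1024L; break;
            case 'm': multiplier = 1024L * 1024; break;
            case 'g': multiplier = 1024L * 1024 * 1024; break;
            default:  multiplier = 1L; // plain number of bytes
        }
        String digits = (multiplier == 1L) ? v : v.substring(0, v.length() - 1);
        return Long.parseLong(digits) * multiplier;
    }

    // Assumption: the value is handed to the container JVM as -Xmx<value>.
    static String toJvmOption(String value) {
        return "-Xmx" + value.trim();
    }

    public static void main(String[] args) {
        System.out.println(toBytes("2048m"));     // 2147483648
        System.out.println(toJvmOption("2048m")); // -Xmx2048m
    }
}
```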

Note: If you have set a maximum heap size for the node container and you increase the container thread limit, you should also increase the container's heap size accordingly.

Configure node container thread limit

There is a fixed number of threads available in the node container, and this number sets the maximum number of nodes that can run concurrently within the container. If all threads are in use when a node is due to be executed, the node runs in its own process rather than waiting for a free thread.

By default, the thread limit is 4. You may want to increase this limit, particularly on server instances where different users or schedules are likely to run multiple executions concurrently.
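The "run in its own process rather than wait" behavior maps naturally onto a java.util.concurrent.ThreadPoolExecutor with a fixed size and a SynchronousQueue, which never buffers tasks. This is a hypothetical model, not Data360 Analyze's implementation: the class NodeContainerPool is made up, and the runInOwnProcess callback merely stands in for launching a separate node process.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the container's thread limit, not Data360 Analyze code.
// A SynchronousQueue stores no tasks, so once all poolSize threads are busy,
// execute() throws instead of queueing and the caller takes the fallback path.
class NodeContainerPool {
    private final ThreadPoolExecutor pool;

    NodeContainerPool(int poolSize) { // plays the role of inContainerPoolSize
        pool = new ThreadPoolExecutor(poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>());
    }

    // Returns true if the node was handed to a container thread, false if
    // runInOwnProcess (a stand-in for launching a separate process) ran instead.
    boolean submit(Runnable node, Runnable runInOwnProcess) {
        try {
            pool.execute(node);
            return true;
        } catch (RejectedExecutionException allThreadsBusy) {
            runInOwnProcess.run();
            return false;
        }
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

With a pool size of 4, a fifth node submitted while four are still running takes the fallback path immediately. A bounded queue would instead make nodes wait for a free thread, which is the behavior the container avoids.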

You can override the default setting by adding the following line to your cust.prop file, and editing the number "4" as required:

ls.brain.server.nodeContainer.inContainerPoolSize=4

For example, to set the container to use a maximum of 20 threads, add the line:

ls.brain.server.nodeContainer.inContainerPoolSize=20

Restart Data360 Analyze for the changes to take effect.

Note: If you increase the node container thread limit and you have also set a maximum heap size for the container, you should increase the container's heap size accordingly; see Configure node container heap space.
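This page does not give a formula for "accordingly". One simple starting point, offered here purely as an assumption and not as vendor guidance, is to scale the heap in proportion to the thread limit so that each container thread keeps roughly the same share. The class HeapScaling below is hypothetical; for example, 2048m at 4 threads becomes 10240m at 20 threads.

```java
// Hypothetical rule of thumb, not a documented Data360 Analyze formula:
// scale the heap in proportion to the thread limit so that the heap
// available per container thread stays roughly the same.
class HeapScaling {
    static long scaledHeapMb(long currentHeapMb, int currentThreads, int newThreads) {
        // Ceiling division, so the per-thread share never shrinks.
        return (currentHeapMb * newThreads + currentThreads - 1) / currentThreads;
    }

    public static void main(String[] args) {
        // 2048m at 4 threads -> 10240m at 20 threads
        System.out.println(scaledHeapMb(2048, 4, 20)); // 10240
    }
}
```

Nodes whose scripts build large in-memory objects may need more than a proportional share; such nodes are candidates for the DontRunInContainer property instead.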