Execute Data Flow - Data360 Analyze - Latest

Data360 Analyze Server Help


Executes data flows.

You can use this node to run another data flow from within your current data flow.

You can browse directly to the data flow that you want to run. Alternatively, to run more than one data flow, connect an input node containing a list of paths to those data flows.

When you run another data flow by using the Execute Data Flow node, a "Run State" is created. A run state contains the run properties, node run information, and data that are created when the Execute Data Flow node runs another data flow. By default, run states are saved in the Directory only if the executed data flow fails. If you want to save run states in the Directory when a data flow executes successfully, modify the configuration of the CleanupOnSuccess property.

In the TemporaryExecutionDataLocation property, you can choose where to save temporary data files. This allows you to select a location that is accessible to other users who are collaborating with you on a data flow, so that they can view the run data without needing to re-run the nodes.

You can open run states from the Directory for viewing in read-only mode, with an option to edit the underlying data flow. When you open a run state, the information is overlaid onto the most recent version of the data flow. This means that if you have edited the data flow since the run state was created, the execution information relates to the older version of the data flow. For example, if you have edited a node in the data flow since the run state was created, when you view the run state you see the execution information and data for the node as it was before you edited it.

Run a single data flow

  1. In the DataFlow property, click the folder icon to browse to the data flow that you want to run.
  2. In the Choose a Data Flow dialog, select the data flow that you want to run and click Choose.
  3. If you want to configure run properties, you can reference a run property set that you have already defined on the system in the RunPropertySet property. For more information about run property sets, see Run property sets.
  4. By default, the run state that is generated by running the data flow will be saved into the same Directory folder as the current data flow. If you want to save the run state in a different location, specify where to save the run state in the OutputFolder property.

Run multiple data flows

In this example, you have an input node that contains a list of paths to the data flows that you want to run.

Path:string
//admin/Data flow 1#graph
//admin/Data flow 2#graph
//admin/Data flow 3#graph
Tip: You can copy the path to data flows in your system from the Directory. The Resource Path is listed in the details panel on the right of the screen.

For more information, see Resource path examples.
  1. Connect your input data node to the Execute Data Flow node. For example, a Create Data node containing a list of paths to the data flows that you want to run.
  2. Select the Execute Data Flow node and choose the (from Field) variant of the DataFlow property.
  3. Type the name of the input field containing the list of paths to the data flows that you want to run, in this case Path.
  4. If you want to configure run properties, you can reference a run property set that you have already defined on the system in the RunPropertySet property. For more information on run property sets, see Run property sets.
  5. By default, the run state that is generated by running the data flows will be saved into the same Directory folder as the current data flow. If you want to save the run states in a different location, specify where to save the run states in the OutputFolder property.
Tip: You can also reference an input field containing a run property set or output folder by selecting the (from Field) variant of the RunPropertySet and/or OutputFolder properties. The reference must be in the Resource Path format, which you can obtain from the Directory; see Resource path examples.

If the referenced data flows execute successfully, the Execute Data Flow node will run successfully. The node outputs run information, with one record for each data flow that was executed.

By default, the node will fail if any of the referenced data flows fail to run successfully. You can modify this behavior by using the FailedDataFlowBehavior property.

Run with data driven run properties

When running multiple data flows via the Execute Data Flow node, rather than specifying a run property set to run against each data flow, you can alternatively specify run properties in the incoming data set.

In the following example, the incoming data set contains two data flows to be run in the field "DataFlowName", and three other fields "myType", "myId" and "myName":

DataFlowName                   myType     myId  myName
//admin/testDataFlow#graph     tertiary   8     Bob
//admin/anotherDataFlow#graph  secondary  5     Bill

When the RunPropertySet property is left blank, the node will pass each unused input field as a run property to the data flow when it is run:

  • "testDataFlow" is passed the following run properties: myType=tertiary, myId=8, myName=Bob
  • "anotherDataFlow" is passed the following run properties: myType=secondary, myId=5, myName=Bill

When the node processes the incoming data set, it determines whether any unused fields (fields not used in the properties) are classed as "unmapped fields". If any unmapped fields are found, the UnmappedFieldBehavior property specifies how the node behaves.

In this example, if "testDataFlow" uses "myType", and "anotherDataFlow" uses "myType" and "myId", but no data flows use the "myName" field, then "myName" is classed as an unmapped field.
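
To make this concrete, the following is a minimal sketch of the set logic in Python. It is illustrative only, not Data360 Analyze code; the field usage is taken from the example above.

  # Illustrative sketch of the unmapped-field logic described above.
  # This models only the set arithmetic, not the product itself.

  # Input fields other than DataFlowName.
  candidate_fields = {"myType", "myId", "myName"}

  # Which run properties each executed data flow actually uses
  # (taken from the example above).
  used_fields = {
      "testDataFlow": {"myType"},
      "anotherDataFlow": {"myType", "myId"},
  }

  # A field is "unmapped" if no executed data flow uses it.
  used_anywhere = set().union(*used_fields.values())
  unmapped = candidate_fields - used_anywhere

  print(unmapped)  # {'myName'} -> handled according to UnmappedFieldBehavior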

Resource paths

On the Execute Data Flow node, you can specify resource paths as paths relative to the data flow that you are currently editing. For example:

The current data flow has the following path:

//public/myFolder/thisDataFlow

You want to reference a run property set which has this path:

//public/runPropSets/myRunPropSet

In this case, you can use a relative path, for example:

../runPropSets/myRunPropSet

This is useful, for example, when building data flows that you might want to move between a UAT and a Production system which have a similar folder structure.
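
As a rough illustration of how such a relative reference resolves, here is a small Python sketch using posixpath. It models only the path arithmetic; it is not product code.

  import posixpath

  # Illustrative only: resolve a relative reference against the folder
  # containing the current data flow.
  current_data_flow = "//public/myFolder/thisDataFlow"
  relative_ref = "../runPropSets/myRunPropSet"

  base_folder = posixpath.dirname(current_data_flow)  # //public/myFolder
  resolved = posixpath.normpath(posixpath.join(base_folder, relative_ref))

  print(resolved)  # //public/runPropSets/myRunPropSet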

Properties

DataFlow

Specify the data flow that is to be executed by the node.

Choose the (from Field) variant of this property to specify the name of an input field containing the data flow.

RunPropertySet

Optionally specify the parent run property set to be used by the data flow.

Choose the (from Field) variant of this property to specify the name of an input field containing the run property set.

OutputFolder

Optionally specify the directory to output the execution run state to.

If you do not specify a value, the run state will be saved in the same location as the selected data flow.

Choose the (from Field) variant of this property to specify the name of an input field containing the output folder.

TemporaryExecutionDataLocation

Optionally specify the location to be used for storing temporary execution data.

If the default value is used, the data flow name and execution ID are appended to the default location as a suffix.

Choose the (from Field) variant of this property to specify the name of an input field containing the location.

GenerateSubdirectories

Optionally specify the behavior of the node when the TemporaryExecutionDataLocation property is populated.

If True is selected, each run writes its temporary data into a separate subdirectory.

If False is selected, the files from all runs are written into the folder specified in the TemporaryExecutionDataLocation property, without any separation.

The default value is False.
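
As a rough sketch of the difference between the two settings, consider the following Python fragment. The per-run naming shown (data flow name plus execution ID) mirrors the suffix described for the default location above, but the exact scheme is an assumption, not product-specified.

  import posixpath

  # Illustrative only: contrast the two layouts. The per-run naming
  # (data flow name plus execution ID) is an assumed scheme.
  base = "/shared/tempData"
  runs = [("Data flow 1", "exec-001"), ("Data flow 2", "exec-002")]

  # GenerateSubdirectories = True: one subdirectory per run.
  for name, exec_id in runs:
      print(posixpath.join(base, name + "_" + exec_id))

  # GenerateSubdirectories = False: all runs write into the base folder.
  print(base)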

PassThroughFields

Optionally specify which input fields on the first input will "pass through" the node unchanged from the input to the output, assuming that the input exists. The input fields specified will appear on those output records which were produced as a result of the input fields. Choose from:

  • All - Passes through all the input data fields to the output.
  • None - Passes none of the input data fields to the output; as such, only the fields created by the node appear on the output.
  • Used - Passes through all the fields that the node used to create the output. Used fields include any input field referenced by a property, whether explicitly (e.g. via a 'field1' reference) or via a field pattern (e.g. '1:foo*'), as illustrated in the sketch below.
  • Unused - Passes through all the fields that the node did not use to create the output.

The default value is Used.

If a naming conflict exists between a pass-through field and an explicitly named output field, an error will occur.
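
As a rough illustration of the Used/Unused split, the following Python sketch partitions a set of hypothetical input fields; fnmatch stands in for the node's field-pattern matching, and the field names and property references are invented for the example.

  import fnmatch

  # Illustrative only: model the Used/Unused partition described above.
  # Field names and property references are hypothetical.
  input_fields = ["field1", "foo_a", "foo_b", "bar"]

  explicit_refs = {"field1"}   # e.g. a direct 'field1' reference
  field_patterns = ["foo*"]    # e.g. the pattern part of a '1:foo*' reference

  used = {f for f in input_fields
          if f in explicit_refs
          or any(fnmatch.fnmatch(f, p) for p in field_patterns)}
  unused = [f for f in input_fields if f not in used]

  print(sorted(used))  # ['field1', 'foo_a', 'foo_b']
  print(unused)        # ['bar']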

CleanupOnSuccess

Optionally specify what to clean up when the execution of a data flow succeeds. Choose from:

  • None
  • Temporary Data - Cleans up all the data produced by the nodes, so you cannot access any data from the input/output pins when viewing the run state.
  • Temporary Data and Logs - Cleans up temporary data and any logs produced during the run.
  • Node States - Cleans up temporary data and logs, and clears down the state of the nodes. The run state will still be available from the Directory, but no node state information will be available when viewing it.
  • Run State - Clears the entire run state, so that you cannot see it in the Directory.

The default value is Run State.

CleanupOnFailure

Optionally specify what to clean up when the execution of a data flow fails. Choose from:

  • None
  • Temporary Data - Cleans up all the data produced by the nodes, so you cannot access any data from the input/output pins when viewing the run state.
  • Temporary Data and Logs - Cleans up temporary data and any logs produced during the run.
  • Node States - Cleans up temporary data and logs, and clears down the state of the nodes. The run state will still be available from the Directory, but no node state information will be available when viewing it.
  • Run State - Clears the entire run state, so that you cannot see it in the Directory.

The default value is None.

FailedDataFlowBehavior

Optionally specify the behavior of the node if a failure occurs while executing a data flow. Choose from:

  • Error - Logs an error to the Errors panel.
  • Log - Logs a warning to the Errors panel.
  • Ignore - Ignores the error.

The default value is Error.

UnmappedFieldBehavior

Optionally specify the behavior of the node if an input field is not used by the node, that is, if an unmapped field is found. Choose from:

  • Error - Logs an error to the Errors panel.
  • Log - Logs a warning to the Errors panel.
  • Ignore - Ignores the error.

The default value is Ignore.

StopAtFirstFailure

Optionally specify if the node should terminate execution on the first failure it encounters.

The default value is False.

Example data flows

A number of sample data flows are available from the Samples workspace, found on the Analyze Directory page.

In the Directory under the /Data360 Samples/Node Examples/ folder, you will find "Running sub Data Flows with Execute Dataflow node", which shows examples of how to use this node.

Note: Upgrades will overwrite all data flows in the workspace. If you want to make changes to one of the sample data flows, we recommend you create a copy and save it elsewhere, using Save as...

Inputs and outputs

Inputs: 1 optional

Outputs: out and errors