/FLOW - Connect_ETL - 9.13

Connect ETL Data Transformation Language (DTL) Guide

Product type
Software
Portfolio
Integrate
Product family
Connect
Product
Connect > Connect (ETL, Sort, AppMod, Big Data)
Version
9.13
Language
English
Product name
Connect ETL
Title
Connect ETL Data Transformation Language (DTL) Guide
Copyright
2023
First publish date
2003
Last updated
2023-09-11
Published on
2023-09-11T19:01:45.019000

Purpose

To define a data flow or execution flow between tasks and/or subjobs.

Format

/FLOW task_or_subjob1 task_or_subjob2 [data_flow]

where

data_flow = [attributes…]
attributes = {dataflow_optimization}
dataflow_optimization = {DIRECT [VERIFY]} | {NOTDIRECT}

Arguments

task_or_subjob1 the pathname or alias of the first task or subjob in the execution flow.
task_or_subjob2 the pathname or alias of the second task or subjob in the execution flow.

Location

The option may appear anywhere in the job definition. When using direct data flows, however, specify the most data-intensive flows first to avoid ambiguity.

Notes

Data flows are established automatically at run-time between connected tasks. This is done by matching the fully qualified names of sources and targets and adding a dataflow link from the target of the first task to the matching source of the second task.

If one of the two items specified with the /FLOW option is a subjob, no data flow is established.

Data flows are always established automatically between the two tasks in a /FLOW statement.

When direct data flows are enabled globally via the /DEFAULTFLOW option, individual data flows between two tasks can be marked as not direct with the NOTDIRECT attribute. Conversely, when direct data flows are disabled globally, individual data flows can be marked as direct with the DIRECT attribute.
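The override pattern can be sketched as follows. The task names are hypothetical, and the /DEFAULTFLOW argument shown is an assumption; consult the /DEFAULTFLOW reference for its exact syntax:

```
/DEFAULTFLOW DIRECT
/FLOW taska taskb
/FLOW taskb taskc NOTDIRECT
```

Here the flow from taska to taskb follows the global default and is treated as direct, while the NOTDIRECT attribute forces the flow from taskb to taskc to write its intermediate file to disk.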

Direct data flows bypass writing the intermediate file to disk for better performance. This attribute can only be specified for data flows connecting a single file target to a single file source.

When the VERIFY keyword is specified and direct data flow optimization is enabled, Connect ETL generates a warning that identifies the data flows connecting a single file target to a single file source that cannot be treated as direct data flows and provides the reasons.

Note: Direct data flows apply to input and output files only. When using the VERIFY keyword, no warning is generated when 2 tasks in a flow contain non-file input and output. For example, when the VERIFY keyword is specified and task1 has DB output only and task2 has DB input, no warning is generated.
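For example, to request direct optimization on a single flow and have Connect ETL report the reasons if the flow cannot be treated as direct (task names are hypothetical):

```
/FLOW task1 task2 DIRECT VERIFY
```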

Data flows cannot be optimized into direct data flows in the following cases:

  • The source or target of the data flow has more than one connection. If a task has two data flow connections to two other tasks from its single target file, for example, neither of these data flows can be optimized into a direct data flow.
  • The source of the data flow contains a header layout.
  • The file name of the data flow target file contains a wildcard pattern.
  • The DTL job is customized with a third-party language.
  • Optimizing the data flow into a direct data flow creates a job cycle, which causes an infinite loop in the run sequence.
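The first case above can be illustrated with a sketch (task names are hypothetical): if task1 feeds both task2 and task3 from its single target file, the target has two connections, so neither flow can be made direct even when DIRECT is requested:

```
/FLOW task1 task2 DIRECT VERIFY
/FLOW task1 task3 DIRECT VERIFY
```

With VERIFY specified, Connect ETL would warn that these flows cannot be treated as direct data flows and identify the multiple connections as the reason.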

Examples

/FLOW task1 task2
An execution flow between the two tasks; task2 runs after task1 finishes. If the two tasks have any matching sources and targets, data flows are automatically established between them. If direct data flows are enabled for this flow, these data flows are treated as direct data flows and no intermediate files are created.
/FLOW task1 task2 DATAFLOW MAPREDUCE
A MapReduce flow between the two tasks.