/MAPREDUCE - Connect_ETL - 9.13

Connect ETL Data Transformation Language (DTL) Guide

Product type: Software
Portfolio: Integrate
Product family: Connect
Product: Connect > Connect (ETL, Sort, AppMod, Big Data)
Version: 9.13
Language: English
Product name: Connect ETL
Title: Connect ETL Data Transformation Language (DTL) Guide
Copyright: 2023
First publish date: 2003
Last updated: 2023-09-11
Published on: 2023-09-11T19:01:45.019000

Purpose

To indicate that the job is a MapReduce job, and to define a MapReduce data flow between two tasks.

Format

/MAPREDUCE [flow_definition]

where

flow_definition = FLOW task1 task2

Arguments

task1    The pathname or alias of the first task in the MapReduce flow.
task2    The pathname or alias of the second task in the MapReduce flow.

Location

The option may appear anywhere in the job definition.

Notes

When /MAPREDUCE is specified with no argument, the job is set as a map-only job when running on Hadoop.

When /MAPREDUCE is specified with the FLOW argument, a MapReduce data flow is established automatically between the two tasks at run-time. The MapReduce data flow is established by matching the fully qualified names of an eligible target of the first task with an eligible source of the second task. An eligible source or target is data that is either piped or is a non-lookup file. A MapReduce data flow splits the job between the map side and the reduce side when running on Hadoop.

If a MapReduce data flow cannot be created, or if either of the tasks specified in the MapReduce flow definition is a custom task, the HMRFLCNC error message is generated.

If more than one possible MapReduce data flow exists, an error message is generated. For example, if task1 has two file outputs, task2 has two file inputs, and either pairing can be converted to a MapReduce flow, the flow is ambiguous and the job fails with an error.
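The matching and error rules described in these notes can be sketched in Python. This is only an illustrative model, not Connect ETL code: the `Endpoint`, `Task`, `FlowError`, `candidate_flows`, and `resolve_flow` names are hypothetical, and the file paths in the usage note are invented. The sketch assumes that "matching" means exact equality of fully qualified names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Endpoint:
    """A source or target of a task (hypothetical model, not a product API)."""
    name: str            # fully qualified name of the file or pipe
    kind: str = "file"   # assumed values: "file", "pipe", or "lookup"

    def eligible(self) -> bool:
        # Eligible data is either piped or a non-lookup file.
        return self.kind in ("pipe", "file")


@dataclass
class Task:
    name: str
    sources: list = field(default_factory=list)
    targets: list = field(default_factory=list)
    custom: bool = False


class FlowError(Exception):
    """Raised when a MapReduce flow cannot be established."""


def candidate_flows(task1: Task, task2: Task) -> list:
    """Names that are both an eligible target of task1 and an eligible source of task2."""
    targets = {t.name for t in task1.targets if t.eligible()}
    sources = {s.name for s in task2.sources if s.eligible()}
    return sorted(targets & sources)


def resolve_flow(task1: Task, task2: Task) -> str:
    """Return the single matching name, or raise FlowError as the notes describe."""
    if task1.custom or task2.custom:
        raise FlowError("HMRFLCNC: a task in the flow definition is a custom task")
    matches = candidate_flows(task1, task2)
    if not matches:
        raise FlowError("HMRFLCNC: no MapReduce data flow can be created")
    if len(matches) > 1:
        # The source does not name the message for this case, so none is given here.
        raise FlowError("more than one possible MapReduce data flow: " + ", ".join(matches))
    return matches[0]
```

With a `task1` that writes `/data/a.dat` and `/data/b.dat` and a `task2` that reads both as ordinary files, `candidate_flows` returns two names and `resolve_flow` raises an error. If `task2` reads `/data/b.dat` as a lookup file instead, only `/data/a.dat` remains eligible and the flow resolves to it.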

Example

/MAPREDUCE
This option declares the job as a map-only MapReduce job.
/MAPREDUCE FLOW task1 task2

When exactly one data flow exists between the two tasks that can be a MapReduce flow, the MapReduce flow is automatically established between the tasks at run-time.