Purpose
To indicate that the job is a MapReduce job, and to define a MapReduce data flow between two tasks.
Format
/MAPREDUCE [flow_definition]
where
flow_definition = FLOW task1 task2
Arguments
task1 | the pathname or alias of the first task in the MapReduce flow.
task2 | the pathname or alias of the second task in the MapReduce flow.
Location
The option may appear anywhere in the job definition.
Notes
When /MAPREDUCE is specified with no argument, the job runs as a map-only job on Hadoop.
When /MAPREDUCE is specified with the FLOW argument, a MapReduce data flow is established automatically between the two tasks at run-time. The flow is established by matching the fully qualified name of an eligible target of the first task with that of an eligible source of the second task. An eligible source or target is either piped data or a non-lookup file. When the job runs on Hadoop, the MapReduce data flow splits it between the map side and the reduce side.
If a MapReduce data flow cannot be created, or if either of the tasks specified in the MapReduce flow definition is a custom task, the HMRFLCNC error message is generated.
If more than one possible MapReduce data flow exists, an error message is generated. This happens, for example, when task1 has two file outputs, task2 has two file inputs, and either flow can be converted to a MapReduce flow.
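The matching rules in these notes can be illustrated with a minimal Python sketch. It is not the product's implementation: the DataEndpoint class, the find_flow function, and the wording of the error texts are hypothetical and exist only for illustration. Only the HMRFLCNC message name and the eligibility rule come from the notes above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataEndpoint:
    """A source or target of a task (hypothetical model for illustration)."""
    qualified_name: str       # fully qualified name of the file or pipe
    is_lookup: bool = False   # lookup files are never eligible for a flow


def find_flow(task1_targets, task2_sources, has_custom_task=False):
    """Return the one qualified name that links task1 to task2, or raise."""
    if has_custom_task:
        # Assumed wording; the notes only name the HMRFLCNC message.
        raise ValueError("HMRFLCNC: custom tasks cannot be part of a MapReduce flow")

    # Keep only eligible endpoints: piped data or non-lookup files.
    targets = {t.qualified_name for t in task1_targets if not t.is_lookup}
    sources = {s.qualified_name for s in task2_sources if not s.is_lookup}

    # A flow exists where a target of task1 and a source of task2 share a name.
    matches = sorted(targets & sources)
    if not matches:
        raise ValueError("HMRFLCNC: no MapReduce data flow can be created")
    if len(matches) > 1:
        raise ValueError(f"ambiguous MapReduce data flow: {matches}")
    return matches[0]


# task1 writes two files; task2 reads one of them plus a lookup file.
task1_targets = [DataEndpoint("/data/sorted.dat"), DataEndpoint("/data/rejects.dat")]
task2_sources = [DataEndpoint("/data/sorted.dat"),
                 DataEndpoint("/data/codes.dat", is_lookup=True)]
print(find_flow(task1_targets, task2_sources))
```

In this sketch, if task2 also read /data/rejects.dat as a non-lookup file, two candidate flows would exist and the function would raise the ambiguity error instead of choosing one.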
Example
/MAPREDUCE
/MAPREDUCE FLOW task1 task2
When exactly one data flow exists between the two tasks that can be a MapReduce flow, the MapReduce flow is automatically established between the tasks at run-time.