A Connect for Big Data setup consists of the following:
- Windows workstation
  - Connect must be installed as described in Step-by-Step Installation, Windows Systems.
  - The Connect Job and Task Editors are used for MapReduce job development.
  - MapReduce jobs are submitted to Hadoop via the ETL server from the Job Editor.
- Linux server (edge node)
  - Connect must be installed as described in Step-by-Step Installation, UNIX Systems.
  - The Hadoop client must be installed and configured to connect to the Hadoop cluster.
  - The Editor Runtime Service, dmxd, must be running to respond to jobs run from the Windows workstation; it calls dmxjob with the /HADOOP option, which ultimately calls hadoop to submit jobs to the cluster.
- Hadoop cluster
  - Connect must be installed, without dmxd, on all nodes in the Hadoop cluster as described in Step-by-Step Installation, Hadoop Cluster.
  - Each mapper runs the map-side task(s) and each reducer runs the reduce-side task(s).
  - All file descriptors for sources, targets, and intermediate files are connected so that they fit into the Hadoop MapReduce data flow.
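The submission chain described above (Job Editor on Windows → dmxd on the edge node → dmxjob with /HADOOP → hadoop) can be sketched as a shell trace. This is an illustration only: the command names dmxd, dmxjob, /HADOOP, and hadoop come from this section, but the /RUN argument, the job file name, and the echoed hadoop invocation are assumptions, and the sketch echoes each step rather than executing it, since the real binaries require a configured edge node and cluster.

```shell
# Illustrative sketch of the edge-node submission chain; echoes each
# step instead of running it (real dmxjob/hadoop need a live cluster).
submit_job() {
  jobfile="$1"
  # dmxd, the Editor Runtime Service, receives the job from the Windows
  # Job Editor and invokes dmxjob with the /HADOOP option (the /RUN
  # argument shown here is an assumption for illustration):
  echo "dmxjob /RUN $jobfile /HADOOP"
  # dmxjob in turn calls the hadoop client, which submits the map-side
  # and reduce-side tasks to the cluster (invocation shape assumed):
  echo "hadoop jar ... (Connect runtime)"
}

submit_job myjob.dxj
```

The key design point the sketch reflects is that the Windows workstation never talks to the cluster directly: only the edge node, where the Hadoop client is installed and configured, submits work to Hadoop.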