When running MapReduce jobs, Connect for Big Data sort acceleration can be specified on the command line for individual jobs, or configured cluster-wide so that it is invoked for all MapReduce jobs.
To set up the Connect for Big Data sort acceleration properties for all nodes in the cluster, do the following (contact your cluster administrator for assistance):
- Stop the following Hadoop daemons:
  - For MRv1, stop JobTracker and all TaskTracker daemons.
  - For MRv2, stop ResourceManager and all NodeManager daemons.
- Edit the hadoop-env.sh file in the Hadoop conf directory (such as /etc/hadoop/conf) and specify the correct dmxhadoop jar file in the classpath, where <type> is the MapReduce version, as described at the beginning of Connect for Big Data Sort Properties:
    export HADOOP_CLASSPATH=<connect_install>/lib/dmxhadoop_<type>.jar:$HADOOP_CLASSPATH
  Otherwise, you will need to specify the dmxhadoop jar file location for each job at the command line with the option:
    -libjars <connect_install>/lib/dmxhadoop_<type>.jar
- Edit the mapred-site.xml file in the Hadoop conf directory and define the required properties shown in Required Properties. This will cause Connect for Big Data Sort to be invoked for all MapReduce jobs.
- Distribute the modified hadoop-env.sh and mapred-site.xml files to all nodes in the cluster.
  Note: For updates to mapred-site.xml to take effect, the applicable Hadoop daemons must be restarted.
- Re-start the following Hadoop daemons:
  - For MRv1, re-start JobTracker and all TaskTracker daemons.
  - For MRv2, re-start ResourceManager and all NodeManager daemons.
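
The classpath and distribution steps above can be sketched in shell. This is only an illustration under stated assumptions: the install path /opt/connect, the <type> value mrv2, and the node names node1 and node2 are hypothetical placeholders, and using scp to copy the files is one possible approach rather than a documented requirement. The helper functions only print the lines and commands so they can be reviewed before anything is changed.

```shell
#!/usr/bin/env bash
# Sketch only: the install path, <type> value, and node names used below
# are hypothetical placeholders, not values from the product documentation.

# Print the HADOOP_CLASSPATH line to add to hadoop-env.sh.
# $1 = Connect install directory, $2 = <type> (MapReduce version)
classpath_line() {
  local connect_install="$1" mr_type="$2"
  # Single quotes keep $HADOOP_CLASSPATH literal so it expands at runtime
  # inside hadoop-env.sh, not when this helper runs.
  printf 'export HADOOP_CLASSPATH=%s/lib/dmxhadoop_%s.jar:$HADOOP_CLASSPATH\n' \
    "$connect_install" "$mr_type"
}

# Print (but do not run) one copy command per node for the two edited files.
# $1 = Hadoop conf directory, remaining arguments = node names
distribute_commands() {
  local conf_dir="$1"; shift
  local node
  for node in "$@"; do
    printf 'scp %s/hadoop-env.sh %s/mapred-site.xml %s:%s/\n' \
      "$conf_dir" "$conf_dir" "$node" "$conf_dir"
  done
}

classpath_line "/opt/connect" "mrv2"
distribute_commands "/etc/hadoop/conf" node1 node2
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere; on a real cluster the daemons would still need to be stopped beforehand and restarted afterward, as described in the steps above.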