Set Properties Cluster-Wide - Connect_ETL - 9.13

Connect ETL for Big Data Sort User Guide

Product type: Software
Portfolio: Integrate
Product family: Connect
Product: Connect > Connect (ETL, Sort, AppMod, Big Data)
Version: 9.13
Language: English
Product name: Connect ETL
Title: Connect ETL for Big Data Sort User Guide
Copyright: 2023
First publish date: 2003
Last updated: 2023-09-11
Published on: 2023-09-11T19:03:59.237517

When running MapReduce jobs, Connect for Big Data sort acceleration can be specified per job at the command line, or it can be configured cluster-wide so that it is invoked for all MapReduce jobs.

To set up the Connect for Big Data sort acceleration properties for all nodes in the cluster, do the following (contact your cluster administrator for assistance):

  1. Stop the following Hadoop daemons:
    • For MRv1, stop JobTracker and all TaskTracker daemons.
    • For MRv2, stop ResourceManager and all NodeManager daemons.
  2. Edit the hadoop-env.sh file in the Hadoop conf directory (such as /etc/hadoop/conf) and specify the correct dmxhadoop jar file in the classpath, where <type> is the MapReduce version, as described at the beginning of Connect for Big Data Sort Properties:
    export HADOOP_CLASSPATH=\
                <connect_install>/lib/dmxhadoop_<type>.jar:\
                $HADOOP_CLASSPATH
    If you do not add the jar file to the classpath here, you must specify the dmxhadoop jar file location for each job at the command line with the option:
    -libjars <connect_install>/lib/dmxhadoop_<type>.jar
  3. Edit the mapred-site.xml file in the Hadoop conf directory and define the required properties shown in Required Properties. This causes Connect for Big Data Sort to be invoked for all MapReduce jobs.
  4. Distribute the modified hadoop-env.sh and mapred-site.xml files to all nodes in the cluster.
    Note: For updates to mapred-site.xml to take effect, the applicable Hadoop daemons must be restarted.
  5. Restart the following Hadoop daemons:
    • For MRv1, restart JobTracker and all TaskTracker daemons.
    • For MRv2, restart ResourceManager and all NodeManager daemons.
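
The classpath and distribution steps above can be sanity-checked with a small shell sketch such as the following. It is an illustration only and not part of the product: the function names classpath_has_dmxhadoop and print_distribute_commands are hypothetical, and the sketch assumes that the nodes are reachable with scp and use the same conf directory path as the local machine. The second function only prints the copy commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Sketch under stated assumptions; the function names are hypothetical
# helpers introduced for illustration, not Connect or Hadoop commands.

# Return 0 if the colon-separated classpath in $1 has an entry that
# ends in /dmxhadoop_<something>.jar, otherwise return 1.
classpath_has_dmxhadoop() {
  cp_rest="$1:"
  while [ -n "$cp_rest" ]; do
    cp_entry="${cp_rest%%:*}"   # text before the first colon
    cp_rest="${cp_rest#*:}"     # text after the first colon
    case "$cp_entry" in
      */dmxhadoop_*.jar) return 0 ;;
    esac
  done
  return 1
}

# Print (do not run) one scp command per node that would copy the two
# edited files. $1 is the conf directory; remaining arguments are nodes.
print_distribute_commands() {
  conf_dir="$1"
  shift
  for node in "$@"; do
    printf 'scp %s/hadoop-env.sh %s/mapred-site.xml %s:%s/\n' \
      "$conf_dir" "$conf_dir" "$node" "$conf_dir"
  done
}
```

For example, after sourcing the edited hadoop-env.sh, calling classpath_has_dmxhadoop "$HADOOP_CLASSPATH" succeeds only if the jar entry is present. Calling print_distribute_commands /etc/hadoop/conf followed by your own node names lists the copies to make. Many clusters distribute configuration through a management tool instead, in which case only the classpath check applies.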