When you invoke the hadoop command with the -conf option to specify the
configuration parameters, as described in Chapter 4, "Using Connect for Big Data Sort",
you need to provide an XML configuration file that conforms to the Hadoop
configuration file schema.
The following sample file includes the required Connect for Big Data Sort properties
along with some optional ones that override default settings. You can also specify
other Hadoop properties in this file to override the site-wide settings. Replace
connect_install_dir with the actual directory in which Connect for Big Data was
installed.
<?xml version="1.0"?>
<configuration>
  <!-- Required properties for Connect for Big Data Sort -->
  <property>
    <name>mapreduce.job.map.output.collector.class</name>
    <value>com.syncsort.dmexpress.hadoop.DMXMapOutputCollector</value>
  </property>
  <property>
    <name>mapreduce.job.reduce.shuffle.consumer.plugin.class</name>
    <value>com.syncsort.dmexpress.hadoop.DMXShuffleConsumerPlugin</value>
  </property>
  <property>
    <name>dmx.home.dir</name>
    <value>connect_install_dir</value>
  </property>
  <!-- Optional: override the dynamically set memory values (in MB) -->
  <property>
    <name>dmx.map.memory</name>
    <value>1024</value> <!-- 1GB -->
  </property>
  <property>
    <name>dmx.reduce.memory</name>
    <value>4096</value> <!-- 4GB -->
  </property>
</configuration>
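It is easy to miss a leftover connect_install_dir placeholder or a typographic quote in the XML declaration, and either problem only surfaces when the job runs. The following is a minimal Python sketch, not part of Connect for Big Data, that parses a Hadoop-style configuration file with the standard library's xml.etree.ElementTree and reports which of the required properties from the sample are missing, empty, or still set to the placeholder. The names load_properties, check_required, REQUIRED_PROPERTIES and INSTALL_DIR_PLACEHOLDER are hypothetical. The sketch checks only the configuration/property/name/value layout, not the full Hadoop schema, and it does not verify that the class names can be loaded.

```python
import xml.etree.ElementTree as ET

# Properties that the sample configuration marks as required for
# Connect for Big Data Sort.
REQUIRED_PROPERTIES = (
    "mapreduce.job.map.output.collector.class",
    "mapreduce.job.reduce.shuffle.consumer.plugin.class",
    "dmx.home.dir",
)

# Placeholder used in the sample; it must be replaced by the real install directory.
INSTALL_DIR_PLACEHOLDER = "connect_install_dir"


def load_properties(xml_text):
    """Parse a Hadoop-style <configuration> document into a {name: value} dict.

    XML comments (such as <!-- 1GB -->) are dropped by the default parser.
    Other child elements of <property> (e.g. <description>, <final>) are ignored.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "configuration":
        raise ValueError("root element must be <configuration>, not <%s>" % root.tag)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is None or not name.strip():
            raise ValueError("<property> element without a <name>")
        props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_required(xml_text):
    """Return a list of problems; an empty list means the required properties are set."""
    props = load_properties(xml_text)
    problems = ["missing or empty: " + name
                for name in REQUIRED_PROPERTIES if not props.get(name)]
    if props.get("dmx.home.dir") == INSTALL_DIR_PLACEHOLDER:
        problems.append("dmx.home.dir is still the placeholder " + INSTALL_DIR_PLACEHOLDER)
    return problems


if __name__ == "__main__":
    sample = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>dmx.home.dir</name>
    <value>connect_install_dir</value>
  </property>
</configuration>"""
    for problem in check_required(sample):
        print(problem)
    # Prints:
    # missing or empty: mapreduce.job.map.output.collector.class
    # missing or empty: mapreduce.job.reduce.shuffle.consumer.plugin.class
    # dmx.home.dir is still the placeholder connect_install_dir
```

To check a real file, pass its contents to check_required before handing the file to -conf. An empty result only means the three required properties are present and filled in; Hadoop's own parsing of the file remains the authoritative check.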