Use Connect for Big Data Sort - Connect_ETL - 9.13

Connect ETL for Big Data Sort User Guide

Version
9.13
Language
English
Product name
Connect ETL
Copyright
2023
First publish date
2003
Last updated
2023-09-11
Published on
2023-09-11T19:03:59.237517

For a MapReduce job to pick up the Connect for Big Data sort accelerator, the job must be invoked with the Connect for Big Data Sort properties set in one of the following ways:

  • Specifying the properties at the command line with the -D option:
    hadoop jar <jar_file> \
    -D mapreduce.job.map.output.collector.class=com.syncsort.dmexpress.hadoop.DMXMapOutputCollector \
    -D mapreduce.job.reduce.shuffle.consumer.plugin.class=com.syncsort.dmexpress.hadoop.DMXShuffleConsumerPlugin \
    -D dmx.home.dir=<connect_install> \
    [dmx_optional_parameters] \
    [command_options]
  • Using an XML configuration file that contains the Connect for Big Data Sort property settings:
    hadoop jar <jar_file> -conf <XML_file> [command_options]

where:

  • <jar_file> is the jar file that defines the MapReduce job.
  • <connect_install> is the directory where Connect is installed.
  • [dmx_optional_parameters] are any of the optional parameters shown in Optional Properties.
  • <XML_file> is the configuration file containing the same properties and values that would have been specified with the -D option. See Sample Hadoop Configuration File for a sample configuration file.
  • [command_options] are any application-specific options, such as input and output files.
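
As a rough illustration of the configuration-file approach, the following shell sketch writes the three properties from the -D example into a file in the standard Hadoop configuration layout and then prints the matching command. The install path /opt/connect, the jar name my-job.jar, and the file name dmx-sort-conf.xml are hypothetical placeholders, not values defined by the product; the sample file in Sample Hadoop Configuration File remains the authoritative reference.

```shell
#!/bin/sh
# Sketch only: generate a Hadoop configuration file with the Connect for
# Big Data Sort properties, then show the "hadoop jar ... -conf" command.
set -eu

CONNECT_INSTALL="/opt/connect"     # hypothetical Connect install directory
JOB_JAR="my-job.jar"               # hypothetical MapReduce job jar
CONF_FILE="dmx-sort-conf.xml"      # hypothetical configuration file name

# Each <property> mirrors one -D name=value pair from the command-line form.
cat > "$CONF_FILE" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.job.map.output.collector.class</name>
    <value>com.syncsort.dmexpress.hadoop.DMXMapOutputCollector</value>
  </property>
  <property>
    <name>mapreduce.job.reduce.shuffle.consumer.plugin.class</name>
    <value>com.syncsort.dmexpress.hadoop.DMXShuffleConsumerPlugin</value>
  </property>
  <property>
    <name>dmx.home.dir</name>
    <value>$CONNECT_INSTALL</value>
  </property>
</configuration>
EOF

# Dry run: print the command instead of executing it, so the sketch
# works on a machine without Hadoop installed.
echo "hadoop jar $JOB_JAR -conf $CONF_FILE [command_options]"
```

Any optional Connect for Big Data Sort parameters would be added to the file as further property elements of the same shape.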

If HADOOP_CLASSPATH was not set globally to point to the location of the dmxhadoop jar file, specify the jar file along with the other Connect for Big Data options as follows, where <type> is the MapReduce version, as described at the beginning of Optional Properties:

-libjars <connect_install>/lib/dmxhadoop_<type>.jar

Note that options beginning with "-" must precede any application-specific arguments that do not begin with "-" in order to be parsed correctly. If you get the warning "Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.", you need to modify the MapReduce job so that it picks up the -D generic command-line options. You can do this either by using the GenericOptionsParser class or by implementing the Tool interface and running your application with the ToolRunner utility, which uses GenericOptionsParser internally.
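
The ordering rule can be sketched in shell by keeping the generic options and the application-specific arguments in separate variables and always concatenating them in that order. The install path, jar name, and input and output paths below are hypothetical, and <type> is deliberately left as a placeholder to be replaced with the MapReduce version described in Optional Properties.

```shell
#!/bin/sh
# Sketch only: build the hadoop command line so that every generic option
# (-D, -libjars) comes before the application-specific arguments.
set -eu

CONNECT_INSTALL="/opt/connect"                              # hypothetical install directory
JOB_JAR="my-job.jar"                                        # hypothetical MapReduce job jar
DMXHADOOP_JAR="$CONNECT_INSTALL/lib/dmxhadoop_<type>.jar"   # replace <type> before real use

# Generic options: these are the ones handled by GenericOptionsParser.
generic_opts="-D mapreduce.job.map.output.collector.class=com.syncsort.dmexpress.hadoop.DMXMapOutputCollector"
generic_opts="$generic_opts -D mapreduce.job.reduce.shuffle.consumer.plugin.class=com.syncsort.dmexpress.hadoop.DMXShuffleConsumerPlugin"
generic_opts="$generic_opts -D dmx.home.dir=$CONNECT_INSTALL"
generic_opts="$generic_opts -libjars $DMXHADOOP_JAR"

# Application-specific arguments (hypothetical input and output paths).
app_args="/data/input /data/output"

# Generic options first, application arguments last.
cmd_line="hadoop jar $JOB_JAR $generic_opts $app_args"

# Dry run: print the command instead of executing it.
echo "$cmd_line"
```

Printing the command first makes it easy to check that no application argument has slipped in ahead of a -D or -libjars option before the job is actually submitted.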