To establish a baseline for comparison with TeraSort running under Connect for Big Data sort acceleration, run the TeraSort application with the native Hadoop sort as follows:
hadoop jar $HADOOP_MAPRED_HOME/<examples_jar_file> terasort <terasort_input_directory> <terasort_output_directory>
where:
<terasort_input_directory> is the HDFS directory containing the data that was previously generated with TeraGen
<terasort_output_directory> is the HDFS directory in which to output the sorted data
For example, to sort the data previously generated in input-1TB and output it to a directory named output-1TB, run the following:
hadoop jar $HADOOP_MAPRED_HOME/hadoop-examples.jar terasort input-1TB output-1TB
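The sketch below assumes a bash shell on a node where the hadoop client is on the PATH. It times the native run so that its elapsed time can be compared with the accelerated run. The function name time_native_terasort and the EXAMPLES_JAR and HADOOP_CMD overrides are hypothetical. The default jar name hadoop-examples.jar is also an assumption, taken from the example above, because the examples jar file name differs between Hadoop distributions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: times one native TeraSort run so the elapsed time can be
# compared with a run that uses Connect for Big Data sort acceleration.
# Assumptions: EXAMPLES_JAR and HADOOP_CMD are illustrative overrides, not Hadoop
# settings; the default jar name varies by Hadoop distribution.

time_native_terasort() {
  local input_dir="$1"    # HDFS directory previously populated by TeraGen
  local output_dir="$2"   # HDFS directory for the sorted data (must not already exist)
  local jar="${EXAMPLES_JAR:-$HADOOP_MAPRED_HOME/hadoop-examples.jar}"
  local hadoop_cmd="${HADOOP_CMD:-hadoop}"
  local start end

  start=$(date +%s)
  # Run the native Hadoop sort; stop without reporting a time if the job fails.
  "$hadoop_cmd" jar "$jar" terasort "$input_dir" "$output_dir" || return 1
  end=$(date +%s)

  # Whole-second wall-clock time for the job.
  echo "native terasort: $((end - start)) s"
}
```

For example, time_native_terasort input-1TB output-1TB times the sort shown above. Whole-second resolution is coarse, but it is adequate for jobs that run for many minutes.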