Run TeraSort without Connect for Big Data Sort - Connect_ETL - 9.13

Connect ETL for Big Data Sort User Guide

Product type: Software
Portfolio: Integrate
Product family: Connect
Product: Connect > Connect (ETL, Sort, AppMod, Big Data)
Version: 9.13
Language: English
Product name: Connect ETL
Title: Connect ETL for Big Data Sort User Guide
Copyright: 2023
First publish date: 2003
Last updated: 2023-09-11
Published on: 2023-09-11T19:03:59.237517

To obtain a baseline for comparison with TeraSort runs that use Connect for Big Data sort acceleration, run the TeraSort application with the native Hadoop sort as follows:

hadoop jar $HADOOP_MAPRED_HOME/<examples_jar_file> terasort <terasort_input_directory> <terasort_output_directory>

where:

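  • <examples_jar_file> is the Hadoop examples JAR file that contains the TeraSort application, for example, hadoop-examples.jar
  • <terasort_input_directory> is the HDFS directory containing the data that was previously generated with TeraGen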
  • <terasort_output_directory> is the HDFS directory in which to output the sorted data
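
Before submitting the job, it can help to confirm that the two directories are in the expected state. The following is a minimal sketch, not part of the product: check_terasort_dirs is a hypothetical helper name, and the sketch assumes that the hdfs command is on the PATH and that the job refuses to write to an output directory that already exists, which is the usual MapReduce behavior.

```shell
# Sketch: verify the TeraSort directories before submitting the job.
# check_terasort_dirs is a hypothetical helper, not part of Hadoop or Connect.
check_terasort_dirs() {
  input_dir="$1"
  output_dir="$2"

  # The input directory must already exist (populated earlier by TeraGen).
  if ! hdfs dfs -test -d "$input_dir"; then
    echo "input directory not found: $input_dir" >&2
    return 1
  fi

  # Assumption: the job fails if the output directory already exists.
  if hdfs dfs -test -e "$output_dir"; then
    echo "output directory already exists: $output_dir" >&2
    return 2
  fi

  echo "ok: $input_dir -> $output_dir"
}
```

Calling check_terasort_dirs <terasort_input_directory> <terasort_output_directory> returns a nonzero status and prints a message to standard error when either check fails.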

For example, to sort data previously generated in input-1TB and output it to a directory named output-1TB, run the following:

hadoop jar $HADOOP_MAPRED_HOME/hadoop-examples.jar terasort input-1TB output-1TB
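
Because the purpose of this run is a performance comparison, it is useful to record how long the job takes. The following is a minimal sketch under stated assumptions: time_terasort is a hypothetical wrapper name, the elapsed time is wall-clock seconds measured on the client with date, and the job history reported by the cluster may be a more precise source.

```shell
# Sketch: run the native Hadoop TeraSort and report the elapsed wall-clock time.
# time_terasort is a hypothetical wrapper, not part of Hadoop or Connect.
# Usage: time_terasort <examples_jar_file> <terasort_input_directory> <terasort_output_directory>
time_terasort() {
  jar="$1"
  input_dir="$2"
  output_dir="$3"

  start=$(date +%s)
  rc=0
  # Same command as in the example above, with the arguments passed in.
  hadoop jar "$jar" terasort "$input_dir" "$output_dir" || rc=$?
  end=$(date +%s)

  echo "terasort exit status: $rc, elapsed seconds: $((end - start))"
  return "$rc"
}

# Example call (commented out so that the file can be sourced safely):
# time_terasort "$HADOOP_MAPRED_HOME/hadoop-examples.jar" input-1TB output-1TB
```

Recording the same measurement for the run with sort acceleration enabled gives two figures that can be compared directly, provided that both runs use the same input data and cluster configuration.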