Intermediate File Compression - 9.13

Connect ETL for Big Data Sort User Guide

Product type: Software
Portfolio: Integrate
Product family: Connect
Product: Connect > Connect (ETL, Sort, AppMod, Big Data)
Version: 9.13
Language: English
Product name: Connect ETL
Title: Connect ETL for Big Data Sort User Guide
Copyright: 2023
First publish date: 2003
Last updated: 2023-09-11
Published on: 2023-09-11T19:03:59.237517

Compressing the intermediate files that pass between the mappers and reducers can significantly improve the performance of Hadoop jobs in some cases. To do so, enable map output compression and specify a compression codec by adding the following options to the hadoop job invocation:

-D mapred.compress.map.output=true
  • For gzip compression (CPU-intensive, good compression rates on text data):
    -D mapred.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
  • For Snappy compression (much faster than gzip, at the cost of lower compression rates):
    -D mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec

These compression options require installation of the corresponding Hadoop native library for the given codec. For details, see the documentation of the distribution you are using.
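
As a minimal sketch of how these options fit into a complete command line, the following script assembles a job invocation with Snappy map output compression and prints it rather than running it. The jar name (my-job.jar), main class (MyJob), and the /input and /output paths are hypothetical placeholders, not names from this guide. The sketch also assumes the job parses generic Hadoop options (for example through ToolRunner), which is what makes -D settings take effect when placed after the main class and before the job's own arguments.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with your own jar, main class, and paths.
JOB_JAR="my-job.jar"
JOB_CLASS="MyJob"
INPUT_DIR="/input"
OUTPUT_DIR="/output"

# Codec for the intermediate (map output) files; swap in
# org.apache.hadoop.io.compress.GzipCodec for gzip compression.
CODEC="org.apache.hadoop.io.compress.SnappyCodec"

# Enable map output compression and select the codec. The codec property
# and its value are kept on one line with no spaces around "=".
COMPRESS_OPTS="-D mapred.compress.map.output=true -D mapred.map.output.compression.codec=${CODEC}"

# Generic -D options go after the main class and before the job arguments.
CMD="hadoop jar ${JOB_JAR} ${JOB_CLASS} ${COMPRESS_OPTS} ${INPUT_DIR} ${OUTPUT_DIR}"

# Dry run: print the command for review instead of executing it.
echo "$CMD"
```

To run the job, execute the printed command on a node where the hadoop client is configured. On Hadoop 2 and later, the two property names shown here are deprecated aliases of mapreduce.map.output.compress and mapreduce.map.output.compress.codec; either spelling should be accepted, but check the documentation of your distribution.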