Pig Support - Connect_ETL - 9.13

Connect ETL for Big Data Sort User Guide

Product type: Software
Portfolio: Integrate
Product family: Connect
Product: Connect > Connect (ETL, Sort, AppMod, Big Data)
Version: 9.13
Language: English
Product name: Connect ETL
Copyright: 2023
First publish date: 2003
Last updated: 2023-09-11
Published on: 2023-09-11T19:03:59.237517

Pig is a platform (both a language and a runtime environment) for writing and executing complex data processing programs in the Hadoop framework. Pig Latin is the extensible scripting language for writing Pig programs, which are translated into a sequence of MapReduce jobs to be run in Hadoop. Connect for Big Data can be used to accelerate these MapReduce jobs.

Fields of the following Pig data types are supported as keys for Connect for Big Data Sort:

  • int: signed 32-bit integer
  • long: signed 64-bit integer
  • float: 32-bit floating point
  • double: 64-bit floating point
  • chararray: character array (string) in Unicode UTF-8 format
  • bytearray: byte array
  • boolean: boolean
  • tuple: ordered set of fields

In addition to the properties required to invoke Connect for Big Data Sort for any MapReduce job (see Connect for Big Data Sort Accelerator Properties), the following properties must be set for Pig programs. In the second property, <type> is the MapReduce version, as described at the beginning of Connect for Big Data Sort Accelerator Properties:

dmx.pig.support
    Value: true

pig.additional.jars
    Value: <connect_install>/lib/dmxhadoop_<type>.jar

The properties can be set in any of the following ways according to your needs:
  • For a single run: Specify the properties in a file, one per line, in the following form, and pass that file to the pig command with the -P option (for example, -P dmx-pig-properties.pig):
    <property_name>=<property_value>
  • For a single user, for all runs: Specify the properties in the file $HOME/.pigbootup (Pig version 0.11 and later) in the following form, one per line:
    set <property_name> <property_value>
  • For all users, for all runs: Specify the properties in the site-wide pig.properties file, in the following form, one per line:
    <property_name>=<property_value>

    For Apache Pig, the pig.properties file is located in /etc/pig/conf/. For other installations, consult the relevant documentation to find out where this file is located and how to edit it; it may be managed through a UI.
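
As an illustration of the first two options, the following shell sketch writes the two Pig properties to a properties file and prints the equivalent lines in the "set" form used by $HOME/.pigbootup. The install directory /opt/connect, the value "example" standing in for <type>, and the script name myscript.pig are hypothetical placeholders and not values taken from the product; substitute the ones that apply to your installation. The pig command is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical values: replace both with the ones for your installation.
CONNECT_INSTALL="${CONNECT_INSTALL:-/opt/connect}"  # assumed install directory
DMX_TYPE="${DMX_TYPE:-example}"                     # placeholder for <type>, not a real value
PROPS_FILE="${PROPS_FILE:-dmx-pig-properties.pig}"  # file name from the -P example

# Single run: <property_name>=<property_value>, one per line.
cat > "$PROPS_FILE" <<EOF
dmx.pig.support=true
pig.additional.jars=${CONNECT_INSTALL}/lib/dmxhadoop_${DMX_TYPE}.jar
EOF

# Single user, all runs: the same properties in the "set" form.
# Printed rather than written, so an existing $HOME/.pigbootup is not overwritten.
sed 's/^\([^=]*\)=\(.*\)$/set \1 \2/' "$PROPS_FILE"

# The pig invocation for the single-run case (printed, not executed).
echo "pig -P $PROPS_FILE myscript.pig"
```

To apply the settings for one user, the "set" lines printed by the sed command can be appended to $HOME/.pigbootup manually. The site-wide pig.properties file uses the same <property_name>=<property_value> form as the file generated here.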