Pig is a platform (both language and runtime environment) for writing and executing complex data processing programs in the Hadoop framework. Pig Latin is the extensible scripting language for writing Pig programs, which are translated into a sequence of MapReduce jobs to be run in Hadoop. Connect for Big Data can be used to accelerate these MapReduce jobs.
Fields of the following Pig data types are supported as keys for Connect for Big Data Sort:
- int: signed 32-bit integer
- long: signed 64-bit integer
- float: 32-bit floating point
- double: 64-bit floating point
- chararray: character array (string) in Unicode UTF-8 format
- bytearray: byte array
- boolean: boolean
- tuple: ordered set of fields
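For illustration, the following Pig Latin sketch sorts a relation on an int key. The relation, schema, and file names are hypothetical; the point is that an ORDER BY on any of the supported key types above produces a MapReduce sort that is eligible for acceleration once the properties described below are set.

    -- load a hypothetical comma-delimited input file
    orders = LOAD 'orders.csv' USING PigStorage(',')
             AS (id:int, region:chararray, amount:double);
    -- ORDER BY on an int field, one of the supported key types
    sorted_orders = ORDER orders BY id;
    STORE sorted_orders INTO 'sorted_orders_out';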
In addition to the properties required to invoke Connect for Big Data Sort for any MapReduce job, shown in Connect for Big Data Sort Accelerator Properties, the following properties must be set for Pig programs, where <type> is the MapReduce version, as described at the beginning of Connect for Big Data Sort Accelerator Properties:
- dmx.pig.support
  Value: true
- pig.additional.jars
  Value: <connect_install>/lib/dmxhadoop_<type>.jar
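For example, written in the <property_name>=<property_value> form used by the properties files described below, the two settings are:

    dmx.pig.support=true
    pig.additional.jars=<connect_install>/lib/dmxhadoop_<type>.jar

with the <connect_install> and <type> placeholders filled in as described in Connect for Big Data Sort Accelerator Properties.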
The properties can be set in any of the following ways according to your needs:
- For a single run: Specify the properties in a file in the following form, one per line, and pass that file to the pig command with the -P option (for example, -P dmx-pig-properties.pig):
  <property_name>=<property_value>
- For a single user, for all runs: Specify the properties in the file $HOME/.pigbootup (for Pig version 0.11 and up) in the following form, one per line:
  set <property_name> <property_value>
- For all users, for all runs: Specify the properties in the site-wide pig.properties file, in the following form, one per line:
  <property_name>=<property_value>
  For Apache Pig, the pig.properties file can be found in /etc/pig/conf/; for all other installations, check the relevant documentation for where to find and edit this file, as editing may be done through a UI.
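As an illustrative sketch of the first two methods (dmx-pig-properties.pig is the example file name used above; the script name myscript.pig is hypothetical), a single run would be launched as:

    pig -P dmx-pig-properties.pig myscript.pig

and a per-user $HOME/.pigbootup would contain, one property per line:

    set dmx.pig.support true
    set pig.additional.jars <connect_install>/lib/dmxhadoop_<type>.jar

The site-wide pig.properties file takes the same <property_name>=<property_value> lines shown earlier.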