Connect for Big Data is the Hadoop-enabled edition of Connect, providing the following Hadoop functionality:
- ETL Processing in Hadoop – Develop a Connect for Big Data ETL application entirely in the Connect GUI to run seamlessly in the Hadoop MapReduce framework, with no Pig, Hive, or Java programming required. Jobs can currently run in either MapReduce or Spark. See the online Connect Help topic “Connect for Big Data”.
- Hadoop Sort Acceleration – Seamlessly replace the native sort within Hadoop MapReduce processing with the high-speed Connect engine sort, providing performance benefits without programming changes to existing MapReduce jobs. See the Connect for Big Data Sort User Guide, which is included in the Documentation folder under your Connect software installation directory.
- Apache Spark Integration – Use the Spark mainframe connector to transfer mainframe data to HDFS. See the online Connect Help topic “Spark Mainframe Connector”.
- Apache Sqoop Integration – Use the Sqoop mainframe import connector to transfer mainframe data into HDFS. See the online Connect Help topic “Sqoop Mainframe Import Connector”.