Connect to HDFS from Connect - Connect_ETL - 9.13

Connect ETL Installation Guide

Product type
Software
Portfolio
Integrate
Product family
Connect
Product
Connect > Connect (ETL, Sort, AppMod, Big Data)
Version
9.13
Language
English
Product name
Connect ETL
Title
Connect ETL Installation Guide
Copyright
2024
First publish date
2003
Last updated
2024-11-08
Published on
2024-11-08T16:36:35.232000

For Connect to access data located in HDFS, a Hadoop distribution must be installed and configured as follows on the system where the Connect jobs and tasks are executed:

  • The hadoop command must be accessible to Connect:
    • Connect first looks for the hadoop command in $HADOOP_HOME/bin/hadoop, where the environment variable HADOOP_HOME is set to the directory where Hadoop is installed. Environment variables can be defined through the Environment Variables tab of the Connect Server dialog.
    • If HADOOP_HOME is not defined or the directory cannot be found, Connect looks for the hadoop command in the system path, to which some Hadoop distributions add it automatically.
  • The fs.default.name property in the core-site.xml configuration file must be set to point to the Hadoop file system.
  • The HTTP namenode daemon must be running on the default port, 50070. To use a different port number, contact Technical Support.
  • If the Hadoop cluster requires Kerberos authentication, you must use the dmxkinit utility to run your HDFS extract and load jobs and tasks.
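
For the fs.default.name requirement, a minimal core-site.xml fragment might look like the following. The hostname namenode-host and port 8020 are placeholders; use the values for your cluster (8020 is a common default for the HDFS namenode RPC port, but distributions vary).

```xml
<!-- core-site.xml (illustrative fragment; replace namenode-host:8020
     with your cluster's namenode address) -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode-host:8020</value>
  </property>
</configuration>
```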
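
The hadoop command lookup described above can be sketched as a small shell check. This is an illustrative script based on the lookup order stated in this guide, not part of Connect itself; the namenode hostname in the final comment is a placeholder you would replace with your own.

```shell
#!/bin/sh
# Sketch of the lookup order Connect uses to find the hadoop command:
# first $HADOOP_HOME/bin/hadoop, then the system path.

resolve_hadoop() {
    # 1. Prefer $HADOOP_HOME/bin/hadoop when HADOOP_HOME is set
    #    and points at an installed Hadoop distribution.
    if [ -n "${HADOOP_HOME:-}" ] && [ -x "${HADOOP_HOME}/bin/hadoop" ]; then
        echo "${HADOOP_HOME}/bin/hadoop"
        return 0
    fi
    # 2. Otherwise fall back to whatever "hadoop" resolves to on PATH.
    command -v hadoop
}

if hadoop_cmd=$(resolve_hadoop); then
    echo "hadoop found at: ${hadoop_cmd}"
else
    echo "hadoop not found; set HADOOP_HOME or add hadoop to PATH" >&2
fi

# Optional: confirm the namenode HTTP daemon answers on the default port
# (replace namenode-host with your namenode's hostname):
#   curl -s -o /dev/null -w '%{http_code}\n' http://namenode-host:50070/
```

Running this on the Connect server before submitting jobs is a quick way to confirm the environment matches the requirements above.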