Cluster in the cloud using Cloudera Director - Connect_ETL - 9.13

Connect ETL Installation Guide

Product type: Software
Portfolio: Integrate
Product family: Connect
Product: Connect > Connect (ETL, Sort, AppMod, Big Data)
Version: 9.13
Language: English
Product name: Connect ETL
Title: Connect ETL Installation Guide
Copyright: 2024
First publish date: 2003
Last updated: 2024-11-08
Published on: 2024-11-08T16:36:35.232000

Using Cloudera Director, you can install Connect for Big Data on all nodes of a cluster in Google Cloud Platform (GCP) or Amazon Web Services (AWS).

Provided that you update the Cloudera Director configuration file, Cloudera Director can install Connect for Big Data as part of a cluster creation process that is initiated from the Cloudera Director command-line interface (CLI).
Note: As Cloudera works toward supporting third-party parcels in Cloudera Director, Precisely is committed to updating the Connect for Big Data installation procedures in alignment with Cloudera Director enhanced functionality.

Pre-Installation

To enable Cloudera Director to install Connect for Big Data on a cluster in the cloud, update the instancePostCreateScripts section of the Cloudera Director configuration file to invoke a Connect installation script, which you create. At a minimum, the Connect installation script must install the DMExpress RPM.

Example: instancePostCreateScripts section of a Cloudera Director configuration file

In the following instancePostCreateScripts example, the Connect installation script is copied from a Google Cloud Storage bucket and executed.
instancePostCreateScripts: ["""#!/bin/sh
echo "Installing DMExpress..."
/usr/local/bin/gsutil cp gs://<bucket_name>/installdmx.sh installdmx.sh
chmod a+x installdmx.sh
sudo ./installdmx.sh
if test $? -ne 0
then
echo Failed to install Connect on cluster nodes.
exit 1
fi
echo "Done installing Connect ..."
exit 0
"""]

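The example above copies the script with gsutil, which applies to GCP. For a cluster in AWS, the same step could fetch the script from an S3 bucket instead. The following is a minimal sketch, assuming the AWS CLI is already present on the instance image and the instance can read the bucket; the helper name copy_command and the bucket name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the copy command for an object URI,
# choosing the tool by URI scheme. Assumes gsutil (GCP) or the
# AWS CLI (AWS) is already installed on the instance image.
copy_command() {
    uri=$1
    dest=$2
    case "$uri" in
        gs://*) echo "/usr/local/bin/gsutil cp $uri $dest" ;;
        s3://*) echo "aws s3 cp $uri $dest" ;;
        *)      echo "unsupported URI: $uri" >&2; return 1 ;;
    esac
}

# Example with a placeholder bucket name; prints the command only.
copy_command "s3://example-bucket/installdmx.sh" installdmx.sh
```

The helper only prints the command, so it can be checked before the line is placed in the instancePostCreateScripts section.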
Example: Connect installation script

#!/bin/bash
version=9.13
shrinkWrapFile=dmexpress-${version}-1.x86_64.bin
shrinkWrapResponse=shrinkWrapResponse.txt
# create the shrink-wrap response file
cat <<EOF > $shrinkWrapResponse
a
EOF
/usr/local/bin/gsutil cp gs://<bucket_name>/$shrinkWrapFile $shrinkWrapFile
if test $? -ne 0
then
echo Failed to copy Connect shrinkwrap file from the bucket
echo ""
exit 1
fi
chmod a+x $shrinkWrapFile
#extract the rpm
./$shrinkWrapFile < $shrinkWrapResponse > shrinkWrap.out 2>&1
#install the rpm
rpm -i dmexpress-${version}-1.x86_64.rpm
if test $? -ne 0
then
echo Failed to install Connect RPM package
echo ""
exit 1
fi
rm -f $shrinkWrapResponse
rm -f $shrinkWrapFile
rm -f dmexpress-${version}-1.x86_64.rpm
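The installation script repeats the same "if test $? -ne 0" check after each step. One possible way to shorten it is a small wrapper that runs a command and stops the script with a message when the command fails. This is a sketch only; the name run_or_die is hypothetical and is not part of Connect.

```shell
#!/bin/sh
# Hypothetical helper: run a command and, if it fails, print the given
# message to stderr and exit with status 1.
run_or_die() {
    message=$1
    shift
    if ! "$@"; then
        echo "$message" >&2
        exit 1
    fi
}

# Usage with a harmless stand-in command. In the installation script the
# commands would be the gsutil copy and the rpm install.
run_or_die "Stand-in step failed" true
echo "stand-in step succeeded"
```

Because the helper calls exit, a failed step ends the whole script, which matches the behavior of the explicit checks in the example.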

Installation

From the Cloudera Director CLI, create the cluster. When the Cloudera Director cluster deployment completes successfully, Connect for Big Data is installed on all of the nodes in the cluster.
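Before creating the cluster, it can help to confirm that the configuration file actually contains the instancePostCreateScripts section, because Connect is not installed without it. The following sketch assumes the standalone "cloudera-director bootstrap" command and a configuration file named cluster.conf; the file name and the helper has_post_create_scripts are hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeed only if the configuration file
# mentions instancePostCreateScripts. This is a plain substring match, so
# a commented-out section would also pass.
has_post_create_scripts() {
    grep -q 'instancePostCreateScripts' "$1"
}

config_file=${1:-cluster.conf}   # file name is an assumption

if [ ! -f "$config_file" ]; then
    echo "Configuration file $config_file not found."
elif ! has_post_create_scripts "$config_file"; then
    echo "No instancePostCreateScripts section in $config_file."
elif command -v cloudera-director >/dev/null 2>&1; then
    # Standalone bootstrap of the cluster described in the file.
    cloudera-director bootstrap "$config_file"
else
    echo "cloudera-director CLI not found on PATH."
fi
```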

Post-installation

To enable the submission of Connect for Big Data jobs from the Connect Job Editor on a Windows instance, do the following:

Connect to the ETL server/edge node over SSH and run a preparation script, which you create, that starts the Editor Runtime Service (dmxd), creates a UNIX account (user connectuser, password connectuser), and enables password authentication for SSH.

Example: ETL server/edge node preparation script
#!/bin/bash
# (1) start dmxd on master-node
DMEXPRESS_HOME_DIRECTORY=/usr/dmexpress
export DMEXPRESS_HOME_DIRECTORY
# keep an already permissive umask; otherwise set 022
case "$(umask)" in
022|0022|000|00|0000|002|02|0002|020|0020) ;;
*) umask 022 2>/dev/null ;;
esac
if [ ! -f $DMEXPRESS_HOME_DIRECTORY/bin/dmxd ]
then
echo Failed to locate the Editor Runtime Service 'dmxd'.
exit 1
fi
mkdir -p $DMEXPRESS_HOME_DIRECTORY/logs
echo "JOBS_DETAILS_DIR=$DMEXPRESS_HOME_DIRECTORY/logs" > $DMEXPRESS_HOME_DIRECTORY/bin/dmxd.conf
echo "DMEXPRESS_EXE=$DMEXPRESS_HOME_DIRECTORY/bin/dmexpress" >> $DMEXPRESS_HOME_DIRECTORY/bin/dmxd.conf
echo "DMEXPRESS_AUTHENTICATION_METHOD=DEFAULT" >> $DMEXPRESS_HOME_DIRECTORY/bin/dmxd.conf
PATH=$DMEXPRESS_HOME_DIRECTORY/bin:$PATH:/usr/bin; export PATH
LD_LIBRARY_PATH=$DMEXPRESS_HOME_DIRECTORY/lib:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH
cd $DMEXPRESS_HOME_DIRECTORY/bin
echo Starting the Editor Runtime Service at `date`...
nohup ./dmxd ./dmxd.conf 1>dmxd.stdout 2>dmxd.stderr &
# (2) create connectuser
useradd -d /home/connectuser -m -s /bin/bash "connectuser"
echo "connectuser:connectuser"| chpasswd
if test $? -ne 0
then
echo Failed to set password for user connectuser.
exit 1
fi
# (3) enable password authentication for sftp
cat /etc/ssh/sshd_config | sed -e "s/PasswordAuthentication.*no/PasswordAuthentication yes/" > sshd_config_temp
mv sshd_config_temp /etc/ssh/sshd_config
/etc/init.d/sshd restart
if test $? -ne 0
then
echo Failed to enable ssh password login.
exit 1
fi
exit 0
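Step (3) of the preparation script rewrites /etc/ssh/sshd_config in place. To see what the sed expression does before applying it to the real file, it can be tried on a scratch copy. The following is a minimal sketch; the helper name enable_password_auth is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: apply the PasswordAuthentication rewrite to the
# file given as $1. The result is written back with cat rather than mv,
# so the target file keeps its owner and permissions.
enable_password_auth() {
    conf=$1
    tmp=$(mktemp) || return 1
    sed -e 's/PasswordAuthentication.*no/PasswordAuthentication yes/' "$conf" > "$tmp" &&
        cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return $status
}

# Try it on a scratch file instead of /etc/ssh/sshd_config.
sample=$(mktemp)
printf 'Port 22\nPasswordAuthentication no\n' > "$sample"
enable_password_auth "$sample"
cat "$sample"
rm -f "$sample"
```

The scratch file ends up with "PasswordAuthentication yes" and the Port line unchanged. A line that is commented out in the original file is rewritten but stays commented out.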

As dmxd runs on port 32636 and the SSH service runs on port 22, modify the edge node network rules to allow TCP connections to these ports from the Windows instance.
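How the network rules are changed depends on the cloud provider. The following sketch prints, without running them, commands that could open ports 22 and 32636 to a single source address, assuming the gcloud CLI on GCP or the AWS CLI on AWS. The rule name, network, security group ID, and source address are placeholders, and the helper print_firewall_commands is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) commands that allow TCP access to
# the SSH port (22) and the dmxd port (32636) from one source CIDR.
# Rule name, network, and security group ID are placeholders.
print_firewall_commands() {
    provider=$1
    source_cidr=$2
    case "$provider" in
        gcp)
            echo "gcloud compute firewall-rules create allow-connect-editor --network default --allow tcp:22,tcp:32636 --source-ranges $source_cidr"
            ;;
        aws)
            for port in 22 32636; do
                echo "aws ec2 authorize-security-group-ingress --group-id sg-EXAMPLE --protocol tcp --port $port --cidr $source_cidr"
            done
            ;;
        *)
            echo "unknown provider: $provider" >&2
            return 1
            ;;
    esac
}

# The source address stands in for the Windows instance.
print_firewall_commands gcp 203.0.113.10/32
print_firewall_commands aws 203.0.113.10/32
```

Restricting the source range to the address of the Windows instance keeps both ports closed to other hosts.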