Using Cloudera Director, you can install Connect for Big Data on all of the nodes of a cluster in Google Cloud Platform (GCP) or in Amazon Web Services (AWS).
Pre-Installation
To enable Cloudera Director to install Connect for Big Data on a cluster in the cloud, update the instancePostCreateScripts section of the Cloudera Director configuration file to invoke a Connect installation script, which you create. At a minimum, the Connect installation script must install the DMExpress RPM.
Example: instancePostCreateScripts section of a Cloudera Director configuration file
In this instancePostCreateScripts example, the Connect installation script is copied from a Google Cloud Storage bucket and executed.
instancePostCreateScripts: ["""#!/bin/sh
echo "Installing DMExpress..."
/usr/local/bin/gsutil cp gs://<bucket_name>/installdmx.sh installdmx.sh
chmod a+x installdmx.sh
sudo ./installdmx.sh
if test $? -ne 0
then
echo Failed to install Connect on cluster nodes.
exit 1
fi
echo "Done installing Connect ..."
exit 0
"""]
Example: Connect installation script
#!/bin/bash
version=9.13
shrinkWrapFile=dmexpress-${version}-1.x86_64.bin
shrinkWrapResponse=shrinkWrapResponse.txt
# create the shrink-wrap response file
cat << EOF > $shrinkWrapResponse
a
EOF
/usr/local/bin/gsutil cp gs://<bucket_name>/$shrinkWrapFile $shrinkWrapFile
if test $? -ne 0
then
echo Failed to copy Connect shrinkwrap file from the bucket
echo ""
exit 1
fi
chmod a+x $shrinkWrapFile
#extract the rpm
./$shrinkWrapFile < $shrinkWrapResponse > shrinkWrap.out 2>&1
#install the rpm
rpm -i dmexpress-${version}-1.x86_64.rpm
if test $? -ne 0
then
echo Failed to install Connect RPM package
echo ""
exit 1
fi
rm -f $shrinkWrapResponse
rm -f $shrinkWrapFile
rm -f dmexpress-${version}-1.x86_64.rpm
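The installation script above derives its file names from the version number and fetches the shrink-wrap file with gsutil. The following is a minimal sketch, not part of the product, of the same naming and of choosing a copy command by storage URL so that a cluster in AWS could fetch from S3. The function names dmx_bin_name, dmx_rpm_name, and build_copy_command are hypothetical. It assumes that the dmexpress-<version>-1.x86_64 pattern seen in this example holds for the release you install and that the AWS CLI is available on the nodes when an s3:// URL is used.

```shell
#!/bin/sh
# Hypothetical helpers: derive artifact names from a version such as 9.13.
dmx_bin_name() { echo "dmexpress-$1-1.x86_64.bin"; }
dmx_rpm_name() { echo "dmexpress-$1-1.x86_64.rpm"; }

# Hypothetical helper: print the command that would copy a file from cloud storage.
# It only prints the command, so it can be reviewed before it is run.
build_copy_command() {
    src=$1
    dest=$2
    case "$src" in
        gs://*) echo "/usr/local/bin/gsutil cp $src $dest" ;;
        s3://*) echo "aws s3 cp $src $dest" ;;
        *) echo "Unsupported storage URL: $src" >&2; return 1 ;;
    esac
}
```

For example, build_copy_command "s3://<bucket_name>/$(dmx_bin_name 9.13)" "$(dmx_bin_name 9.13)" prints an aws s3 cp command for the shrink-wrap file, which leaves the rest of the installation script unchanged.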
Installation
From the Cloudera Director CLI, create the cluster. When the Cloudera Director cluster deployment completes successfully, Connect for Big Data is installed on all of the nodes in the cluster.
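As a rough sketch of this step, the wrapper below checks that the configuration file mentions instancePostCreateScripts before it starts the deployment. The function name create_cluster and the DRY_RUN variable are hypothetical, and the sketch assumes that the standalone cloudera-director client with its bootstrap command is the CLI in use; adjust the command if you deploy through a remote Cloudera Director server.

```shell
#!/bin/sh
# Hypothetical wrapper around the Cloudera Director client.
# $1 is the path to the Cloudera Director configuration file.
create_cluster() {
    conf=$1
    # Refuse to deploy if the post-create hook is missing (or the file is unreadable).
    if ! grep -q 'instancePostCreateScripts' "$conf" 2>/dev/null; then
        echo "No instancePostCreateScripts section found in $conf" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        echo "cloudera-director bootstrap $conf"
    else
        cloudera-director bootstrap "$conf"
    fi
}
```

Setting DRY_RUN=1 prints the command without contacting the cloud provider, which is a cheap way to confirm that the intended configuration file is being used.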
Post-Installation
To enable the submission of Connect for Big Data jobs from the Connect Job Editor on a Windows instance, do the following:
SSH to the ETL server/edge node and run a preparation script, which you create, to do the following: start the Editor Runtime Service, dmxd; create a UNIX account, connectuser/connectuser; and enable password authentication for SSH.
#!/bin/bash
# (1) start dmxd on master-node
DMEXPRESS_HOME_DIRECTORY=/usr/dmexpress
export DMEXPRESS_HOME_DIRECTORY
if [ "$(umask)" != "022" -a "$(umask)" != "0022" -a "$(umask)" != "000" -a "$(umask)" != "00" -a "$(umask)" != "0000" -a "$(umask)" != "002" -a "$(umask)" != "02" -a "$(umask)" != "0002" -a "$(umask)" != "020" -a "$(umask)" != "0020" ]
then
umask 022 2>/dev/null
fi
if [ ! -f $DMEXPRESS_HOME_DIRECTORY/bin/dmxd ]
then
echo Failed to locate the Editor Runtime Service 'dmxd'.
exit 1
fi
mkdir -p $DMEXPRESS_HOME_DIRECTORY/logs
echo "JOBS_DETAILS_DIR=$DMEXPRESS_HOME_DIRECTORY/logs" > $DMEXPRESS_HOME_DIRECTORY/bin/dmxd.conf
echo "DMEXPRESS_EXE=$DMEXPRESS_HOME_DIRECTORY/bin/dmexpress" >> $DMEXPRESS_HOME_DIRECTORY/bin/dmxd.conf
echo "DMEXPRESS_AUTHENTICATION_METHOD=DEFAULT" >> $DMEXPRESS_HOME_DIRECTORY/bin/dmxd.conf
PATH=$DMEXPRESS_HOME_DIRECTORY/bin:$PATH:/usr/bin; export PATH
LD_LIBRARY_PATH=$DMEXPRESS_HOME_DIRECTORY/lib:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH
cd $DMEXPRESS_HOME_DIRECTORY/bin
echo Starting the Editor Runtime Service at `date`...
nohup ./dmxd ./dmxd.conf 1>dmxd.stdout 2>dmxd.stderr &
# (2) create connectuser
useradd -d /home/connectuser -m -s /bin/bash "connectuser"
echo "connectuser:connectuser"| chpasswd
if test $? -ne 0
then
echo Failed to set password for user connectuser.
exit 1
fi
# (3) enable password authentication for sftp
cat /etc/ssh/sshd_config | sed -e "s/PasswordAuthentication.*no/PasswordAuthentication yes/" > sshd_config_temp
mv sshd_config_temp /etc/ssh/sshd_config
/etc/init.d/sshd restart
if test $? -ne 0
then
echo Failed to enable ssh password login.
exit 1
fi
exit 0
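The sed expression in step (3) above only rewrites lines where PasswordAuthentication is followed by no, so a commented-out setting stays commented out. The following is a minimal sketch of a stricter filter that could be used instead; the function name enable_password_auth is hypothetical, and it assumes a sed that supports the -E option (GNU and BSD sed both do).

```shell
#!/bin/sh
# Hypothetical filter: reads sshd_config text on stdin and writes the updated text on stdout.
# Any PasswordAuthentication yes/no line, commented out or not, becomes an active "yes" line.
# Other lines, including prose comments that merely mention the keyword, pass through unchanged.
enable_password_auth() {
    sed -E 's/^[[:space:]]*#?[[:space:]]*PasswordAuthentication[[:space:]]+(yes|no)[[:space:]]*$/PasswordAuthentication yes/'
}
```

One possible use is enable_password_auth < /etc/ssh/sshd_config > sshd_config_temp, followed by sshd -t -f sshd_config_temp to validate the result before it replaces the live file and the service is restarted.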
Because dmxd listens on port 32636 and the SSH service listens on port 22, modify the edge node network rules to allow TCP connections to these ports from the Windows instance.
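After the network rules are changed, a quick reachability check can confirm that both ports accept TCP connections. The sketch below is an illustration only: the function names port_open and check_edge_node_ports are hypothetical, and it assumes that bash and the coreutils timeout command are available on the Linux host where it runs. A check from a Linux host does not prove that the Windows instance is allowed through, so treat it as a first sanity test.

```shell
#!/bin/sh
# Hypothetical check: succeeds when a TCP connection to host ($1) and port ($2) can be opened.
# The optional third argument is the timeout in seconds (default 3).
# The connection is attempted through bash's /dev/tcp redirection.
port_open() {
    timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Report the state of the SSH port (22) and the dmxd port (32636) on the given host.
check_edge_node_ports() {
    for port in 22 32636; do
        if port_open "$1" "$port"; then
            echo "$1:$port reachable"
        else
            echo "$1:$port NOT reachable"
        fi
    done
}
```

Running check_edge_node_ports with the edge node address prints one line per port, which makes it easy to see whether the SSH rule, the dmxd rule, or both still need attention.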