- Managed Methods - recommended for large clusters
- Cloudera Manager Parcel Installation – Store the parcel in the Cloudera Manager local or remote parcel repository (requires root/sudo privileges), then distribute and activate the parcel on the cluster nodes via Cloudera Manager (requires Administrator access to Cloudera Manager). Available as of Cloudera Manager 4.5.
- Apache Ambari Service Installation – Deploy the Connect for Big Data Service Definition Package to the Ambari repository, then install Connect for Big Data on the nodes in the cluster using the Ambari web interface (requires root/sudo privileges). Available as of Ambari 1.7.
- RPM Installation – Deploy the RPM (Red Hat Package Manager) on all nodes in the cluster, then use the RPM to install Connect for Big Data on all nodes in the cluster (requires root/sudo privileges).
- Debian Package Installation – Deploy the Debian package (.deb) on all nodes in the cluster, then use the apt command to install Connect for Big Data on all nodes in the cluster (requires root/sudo privileges).
- Manual/Silent Installation – Install Connect for Big Data on one node and replicate the installation on all remaining nodes.
The Editor Runtime Service (dmxd) only needs to be running on the node(s) to which you want to submit jobs from the Connect GUI; typically, this is the machine designated as the edge node. When installing Connect for Big Data using any of the managed methods, the Editor Runtime Service is not installed. See Installing/Upgrading the Editor Runtime Service for instructions on how to do this on the edge node.
Installation Packages for Managed Methods
There are two separate installation packages for Connect for Big Data, one for the software and another for the license. If you do not already have a license installed, install a license package along with the software package. Without an installed license, Connect for Big Data runs in trial mode and stops working when the trial period expires.
If you want to upgrade from a release before the introduction of the second license package, you must install both the software and license packages.
Cloudera Manager Parcel Installation
Pre-Installation
Execute the following steps on the machine where Cloudera Manager is installed:
- Run the self-extracting executables for the software and license packages from the directory where they are located. For the software executable, this is:
./dmexpress-<Connect version>-<OS>.parcel.bin
For the license executable, this is:
./dmexpresslicense_<license site ID>-<date>-<OS>.parcel.bin
For example, dmexpresslicense_12345-20190928-el6.parcel.bin
- Read and accept the Software License Agreement.
- Enter a target directory in which to put the extracted .parcel, .sha, and manifest.json files. The default is the current folder. The manifest.json file is required to use Connect via a remote parcel repository.
Installation
- Depending on whether you are using a local parcel repository or a remote parcel repository, do one of the following:
- Local parcel repository – With root/sudo privileges, copy the extracted .parcel and .sha files for software and license to the Cloudera Manager local parcel repository. The default location is /opt/cloudera/parcel-repo/.
- Remote parcel repository – With root/sudo privileges, copy the extracted .parcel and manifest.json files for software and license to your remote parcel repository. Ensure that the files have read and execute permissions for all users. As outlined on Cloudera’s Creating and Using a Parcel Repository page, follow the steps to Configure the Cloudera Manager Server to Use the Parcel URL.
- Log in to Cloudera Manager as an Administrator user, then click on the parcel indicator button in the Cloudera Manager Admin console navigation bar to bring up the Parcels tab of the Hosts page.
- If not already detected, click on the Check for New Parcels button. Consider the following:
- If you are using a local parcel repository, you can see the “downloaded” parcels on this page, for example, dmexpress 9.11.1 and/or dmexpresslicense_12345 20180928.
- If you are using a remote parcel repository, click on the Download button to download the dmexpress and/or dmexpresslicense-XXXXX parcel from the remote repository.
- Click on the Distribute button to distribute the dmexpress and/or dmexpresslicense-XXXXX parcel to the nodes in the cluster. By default, the files are written to /opt/cloudera/parcels/parcel_name/ on each node.
- Upon completion of the distribution, activate either or both parcels by clicking the corresponding Activate button. If there was a previously activated distribution of Connect for Big Data, be sure that no Connect for Big Data jobs are running, because Cloudera Manager automatically deactivates the old parcel upon activation of the new parcel, and any running jobs fail.
- Upon activation, the symbolic link /usr/dmexpress is created/updated to point to the activated Connect installation.
See the Cloudera Manager Enterprise Edition User Guide for details on Managing Parcels.
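The local parcel repository step above can be scripted. The following is a minimal sketch, not part of the product: the function name stage_parcels is hypothetical, and it assumes the extracted .parcel and .sha files sit together in one directory. Only the default repository path /opt/cloudera/parcel-repo comes from the text above.

```shell
#!/bin/bash
# Hypothetical helper: copy extracted .parcel and .sha files into a local
# Cloudera Manager parcel repository. Run with root/sudo privileges when the
# destination is the real repository.
stage_parcels() {
    local src_dir="$1"
    local repo_dir="${2:-/opt/cloudera/parcel-repo}"   # default from the guide
    local copied=0
    local f

    [ -d "$repo_dir" ] || { echo "no such repository: $repo_dir" >&2; return 1; }

    for f in "$src_dir"/*.parcel "$src_dir"/*.sha; do
        [ -e "$f" ] || continue          # skip glob patterns that matched nothing
        cp -f "$f" "$repo_dir"/ || return 1
        copied=$((copied + 1))
    done

    echo "$copied"                       # number of files copied
    [ "$copied" -gt 0 ]                  # fail if nothing was staged
}
```

Called as stage_parcels /path/to/extracted from a root shell, it prints the number of files copied. After activation, readlink -f /usr/dmexpress shows which installation the symbolic link currently points to.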
Apache Ambari Service Installation
Pre-Installation
- Run the self-extracting shrink-wrap executable for the software package from the directory where it is located. For the software executable, this is:
./dmexpress-<Connect version>-<OS>.parcel.bin
For the license executable, this is:
./dmexpresslicense-<license site ID>-<date>-<arch>.ambari-service.bin
For example, dmexpresslicense-12345-20180928-any.ambari-service.bin
- Read and accept the Software License Agreement.
- Enter a target directory in which to extract the Connect for Big Data or Connect for Big Data license service folder, or press Enter to accept the default, which is the current directory. If a folder with the same name already exists, you are prompted to overwrite; enter yes to overwrite, or no to exit the extracting process.
- Enter a target directory in which to copy the Connect for Big Data or Connect for Big Data license service package where it can be found by the Ambari server, or press Enter to accept the default, which is the root path of the latest stack.
- Enter yes to restart the Ambari server for the new package to be picked up, or no to restart later.
- If the Connect for Big Data or Connect for Big Data license service definition already exists in the repository, you are prompted to upgrade; enter yes to upgrade, or no to exit the process without updating the existing service definition package.
- Enter the Ambari server's hostname, username, and password, and the cluster name, as prompted, to complete the upgrade.
- If the credentials entered fail, you can re-run this step manually by executing the following script, where <Ambari service extracted package path> is the directory you specified in step 3:
<Ambari service extracted package path>/services/DMXh/package/scripts/prepare_dmxh_upgrade.sh
- If the credentials entered fail for the license package, execute this script:
<Ambari service extracted package path>/services/DMXhLicense/package/scripts/prepare_dmxh_license_upgrade.sh
- If there is no license installed, repeat steps 1-7 for the license .bin file.
Installation
- Log in to the Ambari dashboard and select Actions->Add Service.
- On the Add Service Wizard page, select Connect for Big Data and/or Connect for Big Data License and click Next.
- On the Assign Slaves and Clients page, check Client for all nodes, and click Next.
- On the Configure Services page, click Next to continue with the default options (recommended). Alternatively, if you wish to change the default installation directory, expand the “Advanced” section and make changes to the Connect for Big Data Base Directory setting, ensuring that the same directory is specified for both the Connect for Big Data and Connect for Big Data License tabs, and then click Next.
- On the Review page, verify the configuration and click Deploy to deploy Connect for Big Data and/or Connect for Big Data License, or click Back to make modifications.
- On the Install, Start and Test page, wait for the Connect for Big Data and/or Connect for Big Data License service to be successfully installed on each node. If an error occurs, select the "Failures encountered" text to display an error log and identify the problem.
See http://docs.hortonworks.com/ for details on Apache Ambari.
RPM Installation
Pre-Installation
- Run the self-extracting shrink-wrap executable for the software and license packages from the directories where they are located. For the software RPM, this is:
./dmexpress-<Connect version>-1.x86_64.bin
For the license RPM, this is:
./dmexpresslicense-<license site ID>-<date>-<revision>.<arch>.bin
For example, dmexpresslicense-12345-20180927-1.x86_64.bin
- Read and accept the Software License Agreement.
- Enter a target directory in which to put the extracted RPM file (the default is the current folder).
Installation
You can deploy the RPM on all nodes in the cluster using configuration management software or install the Connect for Big Data RPM package on all nodes in the cluster directly:
Execute the following command with sudo or root privileges:
rpm -i dmexpress-<Connect version>-1.x86_64.rpm
The license RPM equivalent command is:
rpm -i dmexpresslicense-<license site ID>-<date>-<revision>.<arch>.rpm
This creates a dmexpress folder under the default install location of /usr. To install to a different location (not recommended), use the --prefix option for both the license and software installs, such as:
rpm -i --prefix /some/other/directory dmexpress-<Connect version>-1.x86_64.rpm
Alternatively, the RPM can be installed with your Linux distribution’s high-level package manager if it supports RPM. For example, on RHEL and CentOS, the yum command can be used:
yum install dmexpress-<version>-1.x86_64.rpm
or
yum install dmexpresslicense-<license site ID>-<date>-<revision>.<arch>.rpm
If there is an existing package, you can upgrade the software or license RPM instead:
rpm -U <package>.rpm
or
yum upgrade <package>.rpm
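When no configuration management software is available, installing directly on every node can be scripted as a loop over the cluster hosts. The sketch below is one possible approach, not the product's own tooling. It assumes passwordless SSH and sudo on every node, a hosts file with one hostname per line, and /tmp as a staging directory; none of these come from this guide, and the function names are hypothetical. It uses rpm -U, which installs the package if it is absent and upgrades it otherwise.

```shell
#!/bin/bash
# Sketch: push an RPM to each node and install or upgrade it over SSH.
# Assumptions (not from the guide): passwordless SSH and sudo on every node,
# a hosts file with one hostname per line, /tmp as the staging directory.
# With DRY_RUN=1 (the default) the commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

install_rpm_on_nodes() {
    local hosts_file="$1"
    local rpm_file="$2"
    local host
    while IFS= read -r host; do
        [ -n "$host" ] || continue       # ignore blank lines
        run scp "$rpm_file" "$host:/tmp/"
        # -n keeps ssh from consuming the rest of the hosts file on stdin
        run ssh -n "$host" "sudo rpm -U /tmp/$(basename "$rpm_file")"
    done < "$hosts_file"
}
```

Run install_rpm_on_nodes hosts.txt dmexpress-<Connect version>-1.x86_64.rpm once in dry-run mode to review the commands, then set DRY_RUN=0 to execute them. The same loop applies to the license RPM.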
Debian Package Installation
Execute the following steps on a Linux machine:
- Run the self-extracting shrink-wrap executable for the software and license packages from the directories where they are located on your system outside of the cluster. For the software Debian package, this is:
./dmexpress-<Connect_version>-1.bin
For the license Debian package, this is:
./dmexpresslicense-<license_site_ID>-<license_date>-<revision>.bin
For example, dmexpresslicense-12345-20180927-1.bin
- Read and accept the Software License Agreement.
- Enter a target directory in which to put the extracted .deb file (the default is the current folder).
Installation
You can deploy the Debian package on all nodes in the cluster using configuration management software or install the Debian package on all nodes in the cluster directly:
Execute the following command with sudo or root privileges:
apt install ./dmexpress-<Connect version>-1.deb
Optionally, to install a license, execute the following command with sudo or root privileges:
apt install ./dmexpresslicense-<license site ID>-<date>-<revision>.deb
This creates a dmexpress folder under the default install location of /usr.
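A quick check after installation is to confirm that the dmexpress folder exists under the install location. The following is a minimal sketch; the function name connect_installed is hypothetical, and only the default location /usr comes from the text above.

```shell
#!/bin/bash
# Sketch: report whether the dmexpress folder exists under an install location.
# The default of /usr comes from the guide; the function name is hypothetical.
connect_installed() {
    local base_dir="${1:-/usr}"
    if [ -d "$base_dir/dmexpress" ]; then
        echo "found: $base_dir/dmexpress"
    else
        echo "not found: $base_dir/dmexpress" >&2
        return 1
    fi
}
```

Calling connect_installed with no argument checks /usr. On Debian-based systems, dpkg -s dmexpress additionally queries the package database, assuming the installed package is named dmexpress as in the uninstall command in this guide.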
Databricks cluster only: If the Debian package will be installed on a Databricks cluster, you must manually install the Debian package on each node in the cluster. As an alternative to manual installs, you can specify the install commands in the cluster node initialization (init) script. Databricks then uses the init script during cluster creation to install the Connect for Big Data packages automatically on each node in the cluster. For example:
#!/bin/bash
# Location of the .deb packages on DBFS and their file names.
debPackageDBFSPath=/dbfs/mnt/azuregen2/dir1
connectDEBPackageName=dmexpress_9.11.33-1.deb
connectLicenseDEBPackageName=dmexpresslicense-123456_77910427-1.deb
# Copy both packages from DBFS to the current directory on the node.
cp -f "$debPackageDBFSPath/$connectDEBPackageName" "$connectDEBPackageName"
chmod a+x "$connectDEBPackageName"
cp -f "$debPackageDBFSPath/$connectLicenseDEBPackageName" "$connectLicenseDEBPackageName"
chmod a+x "$connectLicenseDEBPackageName"
# Install the software package, then the license package, removing each local copy afterward.
sudo apt install "./$connectDEBPackageName"
rm "$connectDEBPackageName"
sudo apt install "./$connectLicenseDEBPackageName"
rm "$connectLicenseDEBPackageName"
To deploy Connect to a Databricks cluster, see Deploying Connect to a Databricks cluster in the cloud.
Uninstallation
Execute the following command with sudo or root privileges on the machine where you want to uninstall:
apt -y remove dmexpress
Manual/Silent Installation
Pre-Installation
- Create a shared directory, hereafter referred to as <shared_directory>, that can be accessed by all nodes in the cluster for sharing the following files/folders (otherwise, they would need to be copied to the same location on each node in the cluster):
- The DMExpressLicense.txt file obtained from the download.
- The dmexpress sub-directory created upon the dmexpress tar file extraction.
- The response file for the Connect silent installations (generated upon install on the first node).
- Extract the Connect Software.
- Copy DMExpressLicense.txt and the dmexpress tar file to the <shared_directory>.
- Extract the contents of the dmexpress tar file in the <shared_directory> on your UNIX system:
tar xvof dmexpress_<Connect version>-1_<language>_linux_2-6_x86-64_64bit.tar
This creates a dmexpress/ directory under the current directory, hereafter referred to as the <connect_download_directory>.
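The extraction step can be wrapped in a small function that also verifies the result. This is a minimal sketch: the function name extract_connect_tar is hypothetical, and it assumes a tar implementation that supports the -C option (GNU tar and BSD tar both do). It keeps the o option used in the command above and drops v to keep the output quiet.

```shell
#!/bin/bash
# Sketch: extract the dmexpress tar file into the shared directory and confirm
# that the dmexpress/ directory was created. The function name is hypothetical.
extract_connect_tar() {
    local shared_dir="$1"
    local tar_file="$2"

    [ -d "$shared_dir" ] || { echo "missing directory: $shared_dir" >&2; return 1; }
    [ -f "$tar_file" ]   || { echo "missing tar file: $tar_file" >&2; return 1; }

    # -C extracts into the shared directory without changing the caller's cwd
    tar -x -o -f "$tar_file" -C "$shared_dir" || return 1

    [ -d "$shared_dir/dmexpress" ] || { echo "dmexpress/ not found after extraction" >&2; return 1; }
    echo "$shared_dir/dmexpress"         # the <connect_download_directory>
}
```

On success the function prints the path that this guide calls the <connect_download_directory>, so the result can be captured in a variable for the installation steps.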
Installation
To install Connect for Big Data on each node in the cluster, follow the instructions under UNIX Systems, Silent Installation. You must manually install Connect for Big Data on the first node, specifying a file to record your responses to the install prompts, and can then silently install Connect for Big Data on the remaining nodes using the recorded response file, ensuring that all nodes are configured consistently.
When running the manual installation on the first machine, respond no to the prompt about installing the Editor Runtime Service unless you want all the nodes in the cluster to install/run it. See Installing/Upgrading the Editor Runtime Service for instructions on installing it on at least one machine to which Connect for Big Data jobs are submitted from the GUI.
Installing/Upgrading the Editor Runtime Service
The Editor Runtime Service (dmxd) must be installed and running on any machine to which Connect for Big Data jobs are submitted from the GUI; typically, this is the machine designated as the edge node. If you install/upgrade Connect for Big Data on the edge node using any of the managed installation methods, or using the Manual/Silent installation method where you answer no to the prompt about installing the service, the Editor Runtime Service is not installed/upgraded.
To install/upgrade the Editor Runtime Service on any machine where Connect for Big Data is installed, follow the instructions for UNIX systems in Configuring the Editor Runtime Service.