Performing installation - Data360_DQ+ - 11.X

Data360 DQ+ Enterprise Installation


Once you have met the prerequisites outlined in Preparing to install DQ+, you can move on to installation. The installer is a shell script named dqplus_installer_XXXX.sh, which you must execute to begin the installation process.

Extract the installer

Run the installer script as the maintenance user to extract the installation files:

sudo -u sagacity ./dqplus_installer_XXXX.sh

You should see that the installer files have been extracted to the /opt/infogix directory.
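
To confirm that the extraction succeeded, you can list the target directory and check that the installer files are present:

ls /opt/infogix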

Deciding how to configure your cluster

Before performing installation, you should consider on which machines you are going to place the components described in Description of architecture (Application, Load Balancer, ApplicationDb, Compute primary/secondaries (Hadoop cluster), and ComputeDb Nodes).

This assignment is performed by making modifications to the install.properties file, as described in the "Edit the install.properties file" section of this guide.

To help you design your cluster, a few basic examples are provided:

Maintenance Machine configuration

The Maintenance Machine is configured by assigning its IP address to a set of properties found under the Node Mapping section of the install.properties file. Individual components are described in greater detail later in this guide; however, the basic configuration is as follows.

(Diagram: Maintenance Machine configuration)

Assuming the Maintenance Machine had an IP address of 192.0.2.1, this configuration would be accomplished by making the following assignments in the install.properties file:

sagacityLoadBalancer_ip=192.0.2.1

sagacityComputeDbNode1_ip=192.0.2.1

dqplusApplicationDb_ip=192.0.2.1

Note: This example is for illustrative purposes only. For a more detailed description of these properties, consult the table found in the "Edit the install.properties file" section.

Three machine configuration

In addition to the recommended Maintenance Machine configuration, the other components that comprise Data360 DQ+ need to be assigned to the other IP addresses of the machines you have available in your cluster. Some additional components may also be assigned to the Maintenance Machine.

(Diagram: Three machine configuration)

For example, in the diagram above, a three machine cluster is shown with component assignment spread across the three machines. Notice that nodes within the Compute/Hadoop and ComputeDb clusters should be distributed across machines.

Here, this configuration would be accomplished with the following assignments within the install.properties file:

sagacityLoadBalancer_ip=192.0.2.1

sagacityApplicationServer1_ip=192.0.2.1

sagacityApplicationServer1_memory=3g

sagacityApplicationServer1_memory-swap=6g

sagacityApplicationServer2_ip=192.0.2.2

sagacityApplicationServer2_memory=3g

sagacityApplicationServer2_memory-swap=6g

dqplusApplicationDb_ip=192.0.2.1

sagacityComputeDbMaster_ip=192.0.2.1

sagacityComputeDbNode1_ip=192.0.2.2

sagacityComputeDbNode2_ip=192.0.2.3

Note: This example is for illustrative purposes only. For a more detailed description of these properties, consult the table found in the "Edit the install.properties file" section.

Six node example

If working with two or more machines, other configurations can be considered. Generally, you will want to assign more memory to the components your users will use the most. For more information on what each component does, see Getting started: Basic concepts and terminology. As a general example, consider the following six machine configuration.

(Diagram: Six node example)

Here, this configuration would be accomplished with the following assignments within the install.properties file:

sagacityLoadBalancer_ip=192.0.2.1

sagacityApplicationServer1_ip=192.0.2.1

sagacityApplicationServer1_memory=3g

sagacityApplicationServer1_memory-swap=6g

sagacityApplicationServer2_ip=192.0.2.2

sagacityApplicationServer2_memory=3g

sagacityApplicationServer2_memory-swap=6g

dqplusApplicationDb_ip=192.0.2.1

sagacityComputeDbMaster_ip=192.0.2.3

sagacityComputeDbNode1_ip=192.0.2.4

sagacityComputeDbNode2_ip=192.0.2.5

Note: This example is for illustrative purposes only. For a more detailed description of these properties, consult the table found in the "Edit the install.properties file" section.

Edit the install.properties file

Once you have an idea of how you will configure your cluster, and before running the first installation script, you need to make some modifications to the file found at:

/opt/infogix/dqplus/properties/install.properties

The modifications you make to the install.properties file will map Data360 DQ+ components to machines and will also determine a wide range of other settings that control the system.

See the following tables describing all of the properties found in this file:

Description of install.properties properties

Below is a description of each property found within the install.properties file. Properties are grouped by topic, as is done by the comments in the actual file itself.

Administrator details

Property

Description

sagacityAdministratorEmail

Set to your email.

sagacityAdministratorFirst

Set to your first name

sagacityAdministratorLast

Set to your last name

sagacityCompanyName

Set to your company’s name

sagacityCompanyId

Set to a unique id that can be used to identify your company.

For example: yourcompany.com
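
For illustration, a completed Administrator details section might look like the following (all values here are placeholders; substitute your own details):

sagacityAdministratorEmail=jane.smith@yourcompany.com

sagacityAdministratorFirst=Jane

sagacityAdministratorLast=Smith

sagacityCompanyName=Your Company

sagacityCompanyId=yourcompany.com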

SMTP properties

Property

Description

EMAIL_HOST

The SMTP server name.

EMAIL_PORT

The SMTP server port.

EMAIL_USER

The user id from which the outgoing notifications from Data360 DQ+ will be sent.

EMAIL_PASSWORD

Password for the EMAIL_USER

EMAIL_FROM_ADDRESS

The email that should appear in the From field when a user receives an email from Data360 DQ+.
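
For example, a typical SMTP configuration might look like the following (the host, port, and account shown are placeholders):

EMAIL_HOST=smtp.yourcompany.com

EMAIL_PORT=587

EMAIL_USER=dqplus-notifications

EMAIL_PASSWORD=<password>

EMAIL_FROM_ADDRESS=no-reply@yourcompany.com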

Default deployment user

Property

Description

sagacityMaintenanceUser

Set to the name that you assigned to the operating system user you created during preparation.

sagacityMaintenancePassword

Set this to the password for the maintenance user on the operating system. This is required, as it will be used by Vertica to connect with other Vertica hosts within the system.

JAVA_HOME and Security Lib

Property

Description

JavaHomeDir

Path to the Maintenance machine’s installation of Java.

For example: /opt/java/current

JavaSecurityLib

Path to the Maintenance machine’s Java Cryptography Extension (JCE) Library.

For example: ${JavaHomeDir}/lib/security

Note: There is no need to change this property from its default value.

DNS hostname mapping

Property

Description

dns_<ip address>_hostname=<hostname>

A mapping of all cluster machines’ host names to IP addresses.

Note that both sides of the property are variable.

For example, if you were using 2 machines, you’d have:

dns_172.170.130.195_hostname=ip-172-170-130-195.ec2.internal

dns_172.170.130.245_hostname=ip-172-170-130-245.ec2.internal

Authentication type settings

Property

Description

AUTH_TYPE

Authentication type to use.

Possible values: INTERNAL, SAML, or HTTPHEADER.

If no value is specified and IS_SAML_ENABLED is set to true, the value defaults to SAML.

If no value is specified and IS_SAML_ENABLED is not set to true, the value defaults to INTERNAL.

SAML settings

Property

Description

IS_SAML_ENABLED

Set to true if SAML is used as the authentication type (AUTH_TYPE).

Set to false otherwise.

SAML_METADATA_IDP

Path to Identity Provider (IDP) Metadata XML file if SAML is used.

If SAML is not used, leave blank.

SAML_SSO_MAX_AUTH_AGE

Maximum allowable time (in seconds) between the authentication of a user and the processing of an authentication statement when SSO is enabled.

Default setting: 43200 seconds (12 hours)
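
As an illustration, a SAML-enabled deployment might combine the authentication and SAML settings as follows (the metadata file path is a placeholder):

AUTH_TYPE=SAML

IS_SAML_ENABLED=true

SAML_METADATA_IDP=/opt/infogix/saml/idp-metadata.xml

SAML_SSO_MAX_AUTH_AGE=43200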

Node mapping

Property

Description

sagacityLoadBalancer_ip

IP address of the machine to place the Load Balancer on. This should be the Maintenance machine.

dqplusApplicationDb_ip

IP address of the machine to place ApplicationDb on. This should be the Maintenance machine.

sagacityStandbyApplicationDb_ip

If setting up a warm standby Postgres database, IP address of the instance on which the standby container should be placed.

sagacityApplicationServer1_ip

IP address of the machine to place App Server 1 on. For App Server 1, this should be the Maintenance machine. For additional App Servers, this should be the additional machines in use.

To add additional App Servers, create a sagacityApplicationServer_ip property and IP Address mapping for each one, using incremental numbering.

For example:

sagacityApplicationServer2_ip, sagacityApplicationServer3_ip...

sagacityApplicationServerN_ip

sagacityApplicationServer1_memory

Default setting: 3g

Used as a command line parameter that is passed when creating the Application component.

For example,

sagacityApplicationServer1_memory=3g

allocates 3g of memory for App Server 1.

Note that if multiple App Servers are in use, you need to assign memory to each one, using the appropriate numbers to identify each server.

For example:

sagacityApplicationServer2_memory,

sagacityApplicationServer3_memory...

sagacityApplicationServerN_memory

sagacityApplicationServer1_memory-swap

Default setting: 6g

Used as a command line parameter that is passed when creating the Application component.

For example,

sagacityApplicationServer1_memory-swap=6g

allocates 6g of memory-swap for App Server 1.

Note that if multiple App Servers are in use, you need to assign memory-swap to each one, using the appropriate numbers to identify each server.

For example:

sagacityApplicationServer2_memory-swap,

sagacityApplicationServer3_memory-swap...

sagacityApplicationServerN_memory-swap

sagacityComputeDbMaster_ip

IP address of the machine to place ComputeDb primary on. This should be the Maintenance machine.

sagacityComputeDbNode1_ip

IP address of the machine to place ComputeDb Node 1 on.

To add additional ComputeDb Nodes, create a sagacityComputeDbNode_ip property and IP Address mapping for each one, using incremental numbering.

For example:

sagacityComputeDbNode2_ip, sagacityComputeDbNode3_ip...

sagacityComputeDbNodeN_ip
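
For example, to extend a cluster with a third App Server and a third ComputeDb Node, you might add entries such as the following (the IP addresses are placeholders):

sagacityApplicationServer3_ip=192.0.2.4

sagacityApplicationServer3_memory=3g

sagacityApplicationServer3_memory-swap=6g

sagacityComputeDbNode3_ip=192.0.2.5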

Load Balancer configuration

Property

Description

IS_SSL_ENABLED

Set to true if SSL is enabled; false if SSL is disabled.

LOAD_BAL_SSL_CERT

If SSL is enabled, set this parameter to the path of the file containing the security certificate in the PEM format.

If SSL is disabled, leave blank.

LOAD_BAL_SSL_CERT_KEY

If SSL is enabled, set this parameter to the path of the file containing the security certificate’s secret key in the PEM format.

If SSL is disabled, leave blank.

LOAD_BAL_SSL_TRUSTED_CERT

If SSL is enabled, set this parameter to the path of the file containing a trusted CA certificate in the PEM format (optional).

If SSL is disabled, leave blank.
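
For example, an SSL-enabled Load Balancer configuration might look like the following (the certificate paths are placeholders):

IS_SSL_ENABLED=true

LOAD_BAL_SSL_CERT=/opt/infogix/ssl/server.crt.pem

LOAD_BAL_SSL_CERT_KEY=/opt/infogix/ssl/server.key.pem

LOAD_BAL_SSL_TRUSTED_CERT=/opt/infogix/ssl/ca.pem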

Deployment properties

Property

Description

DEPLOY_HOST

Used to specify the application access point, i.e., the address of the Load Balancer.

Default setting: ${sagacityLoadBalancer_ip}

DEPLOY_HOST_URL

Used to specify the URL of the application access point.

If SSL is disabled, set to: http://${DEPLOY_HOST}

If SSL is enabled, set to: https://${DEPLOY_HOST}

DEPLOY_JAVA_OPTS

Sets the default file encoding for the JVM to UTF-8, and sets the timezone.

You must modify the user.timezone property value to match the timezone of the local host. The value of user.timezone must be one of the "TZ database name" values supported by Java, as listed at: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones

The timezone specified in this property must match the timezone of the system(s) that the product is installed on and the timezone in the Admin Settings screen within the product.

Default setting:

-Dfile.encoding=UTF-8 -Duser.timezone=America/Chicago
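
For example, an SSL-enabled deployment in the UK might use deployment settings such as the following (the timezone is illustrative; use the timezone of your local host):

DEPLOY_HOST=${sagacityLoadBalancer_ip}

DEPLOY_HOST_URL=https://${DEPLOY_HOST}

DEPLOY_JAVA_OPTS=-Dfile.encoding=UTF-8 -Duser.timezone=Europe/London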

SSO properties

Property

Description

USE_IFRAME_FOR_SSO

Used to specify if the system should display SSO verification in an iFrame.

Values can be: true or false.

Default value: false

Special folders: Application mounts

Property

Description

sagacitySharedMountPoint

Set this to the path of a directory that can be shared across all machines in the cluster to hold data, logs, and backup content.

sagacityExclusiveMountPoint

Set this to the path of a directory located on the Maintenance Machine that can be used to contain runtime data for the Compute, Load Balancer, and Application components.
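
For example, assuming an NFS share mounted at /mnt/dqplus on every machine and local storage under /opt/infogix/local on the Maintenance Machine (both paths are placeholders), you might set:

sagacitySharedMountPoint=/mnt/dqplus

sagacityExclusiveMountPoint=/opt/infogix/local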

Special folders: Shared mount subdirectories

Property

Description

DATA_HOME

Default setting: ${sagacitySharedMountPoint}/data

The folder within sagacitySharedMountPoint that hosts Application runtime data.

Note: There is no need to change this property from its default value.

sagacityLogsFolder

Default setting: ${sagacitySharedMountPoint}/logs

The folder within sagacitySharedMountPoint that hosts system logs.

Note: There is no need to change this property from its default value.

sagacityBackupFolder

Default setting: ${sagacitySharedMountPoint}/backup

The folder within sagacitySharedMountPoint that hosts all database backups for the system.

Note: There is no need to change this property from its default value.

Special folders: Exclusive mount sub-directories

Property

Description

sagacityDataFolder

Default setting: ${sagacityExclusiveMountPoint}/application

The folder within sagacityExclusiveMountPoint that hosts the databases of the components that comprise Data360 DQ+.

Note: There is no need to change this property from its default value.

sagacityFolderCompute

Default setting: ${sagacityExclusiveMountPoint}/runtime/hadoop

The folder within sagacityExclusiveMountPoint that hosts the Compute component.

Note: There is no need to change this property from its default value.

sagacityFolderLoadBalancer

Default setting: ${sagacityExclusiveMountPoint}/runtime/nginx

The folder within sagacityExclusiveMountPoint that hosts the Load Balancer component.

Note: There is no need to change this property from its default value.

Special folders: External folders (shared across all nodes)

Property

Description

externalDataFolder

Set this to the path of a directory that can be shared across all machines in the cluster to hold files for processing.

externalDataFolderMountDestination

Set this to a path that you would like to be used to represent the externalDataFolder whenever that path is shown in the UI.
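
For example, if files for processing are shared at /mnt/dqplus/external but you want users to see the shorter path /externaldata in the UI (both paths are placeholders), you would set:

externalDataFolder=/mnt/dqplus/external

externalDataFolderMountDestination=/externaldata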

Application keystore

Property

Description

APP_KEYSTORE_FILE

See the comments in the install.properties file.

APP_KEYSTORE_PASSWORD

The password to the Application Keystore.

Vertica DB (ComputeDb properties)

Property

Description

VERTICA_DB_URL

Set to the Vertica data access url.

VERTICA_DB_URL_ESCAPED

Same as VERTICA_DB_URL, but with the ampersand escaped.

Only used when VERTICA_SSL_ENABLED = true

VERTICA_DB_USER

Default setting: ${sagacityMaintenanceUser}

VERTICA_DB_PASSWORD

The Vertica database password.

VERTICA_DATABASE_SCHEMA

The name of the schema in the Vertica database to use.

Default setting: public

VERTICA_SSL_ENABLED

Set to true to enable Vertica SSL.

If the VERTICA_SSL_ENABLED property is set to true, then the system expects the physical locations of the CRT and the KEY files to be referenced in the VERTICA_SSL_CRT_FILE and VERTICA_SSL_KEY_FILE properties, respectively.

VERTICA_SSL_CERTIFICATE_GENERATE

Set to true to generate a self-signed SSL certificate for Vertica. Note that VERTICA_SSL_ENABLED must also be set to true. If this property is set to false, a self-signed SSL certificate will not be automatically generated for Vertica; in this case, use the VERTICA_SSL_CRT_FILE and VERTICA_SSL_KEY_FILE properties to define where to locate your existing certificate and key files.

VERTICA_SSL_CRT_FILE

Physical location of your CRT file.

Only used when the VERTICA_SSL_ENABLED property is set to true.

If you have set VERTICA_SSL_CERTIFICATE_GENERATE to true, this is the location where the generated CRT file will be stored.

VERTICA_SSL_KEY_FILE

Physical location of your KEY file.

Only used when the VERTICA_SSL_ENABLED property is set to true.

If you have set VERTICA_SSL_CERTIFICATE_GENERATE to true, this is the location where the generated KEY file will be stored.

VERTICA_LICENSE_FILE

If using the Community Edition of Vertica, set to: CE

Otherwise, set to the path of your Vertica license file.

VERTICA_RESOURCE_POOL_MAXMEMORYSIZE

Default setting: 25%

Can be set to a percentage using % or an actual amount of space, for example: 10G

See Vertica documentation if you would like to change this setting.

VERTICA_MAX_DATA_UTILIZATION_PERCENTAGE

Default setting: 80

Always interpreted as a percentage, so % is not needed.

See Vertica documentation if you would like to change this setting.
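
For example, a Community Edition deployment with a generated self-signed certificate might use settings such as the following (the file paths are placeholders):

VERTICA_SSL_ENABLED=true

VERTICA_SSL_CERTIFICATE_GENERATE=true

VERTICA_SSL_CRT_FILE=/opt/infogix/ssl/vertica.crt

VERTICA_SSL_KEY_FILE=/opt/infogix/ssl/vertica.key

VERTICA_LICENSE_FILE=CE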

Database properties (ApplicationDb properties)

Property

Description

DATABASE_ADMIN_PASSWORD

The database administrator password.

DATABASE_PASSWORD

The database user password. Set to the password of the application database user.

DATABASE_SERVER

Default setting: ${dqplusApplicationDb_ip}

The IP address of the machine that the ApplicationDb component is hosted on.

Note: There is no need to change this property from its default value.

STANDBY_DATABASE_SERVER

Default setting: ${sagacityStandbyApplicationDb_ip}

Note: There is no need to change this property from its default value.

SQL_DATASTORE_DB_SERVER

Default setting: ${DATABASE_SERVER}

The IP address of the machine that the ApplicationDb component is hosted on.

Note: There is no need to change this property from its default value.

DATABASE_SERVER_SSL_ENABLED

Set to true to enable Database SSL. If the DATABASE_SERVER_SSL_ENABLED property is set to true, then the system expects the physical locations of the CRT and the KEY files to be referenced in the DATABASE_SERVER_SSL_CRT_FILE and DATABASE_SERVER_SSL_KEY_FILE properties, respectively.

DATABASE_SERVER_SSL_CERTIFICATE_GENERATE

true to generate a Self Signed SSL certificate for the Application DB. Note that DATABASE_SERVER_SSL_ENABLED must also be set to true.

false to not generate a Self Signed SSL certificate for the Application DB.

DATABASE_SERVER_SSL_CRT_FILE

Physical location of your CRT file.

Only used when DATABASE_SERVER_SSL_ENABLED = true

DATABASE_SERVER_SSL_KEY_FILE

Physical location of your KEY file.

Only used when DATABASE_SERVER_SSL_ENABLED = true

DATABASE_URL

Set to the url by which JDBC is established

to access the ApplicationDb database.

Note: There is no need to change this property from its default value.

DATABASE_URL_ESCAPED

Same as DATABASE_URL, but with the ampersand escaped.

Only used when DATABASE_SERVER_SSL_ENABLED = true

SQL_DATASTORE_DB_URL

Default setting: ${DATABASE_URL}

Note: There is no need to change this property from its default value.
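
For example, to enable SSL for the ApplicationDb using an existing certificate and key (the file paths are placeholders), you might set:

DATABASE_SERVER_SSL_ENABLED=true

DATABASE_SERVER_SSL_CERTIFICATE_GENERATE=false

DATABASE_SERVER_SSL_CRT_FILE=/opt/infogix/ssl/appdb.crt

DATABASE_SERVER_SSL_KEY_FILE=/opt/infogix/ssl/appdb.key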

Hadoop connectivity properties (Compute properties)

Property

Description

HADOOP_VENDOR

The distribution type of Hadoop in use.

Permissible values: cloudera

HADOOP_VENDOR_VERSION

The version of the Hadoop distribution.

HADOOP_VENDOR_VERSION_COMPLETE

The complete version of the Hadoop distribution.

HDFS_MASTER_NODE_ADDRESS

Host reference of the Hadoop Distributed File System primary node in use.

Use a Fully Qualified Domain Name or an IP address/port. For example: 172.17.13.168:8020

YARN_RESOURCE_MANAGER_ADDRESS

Host reference of the Yarn resource manager node within the Hadoop cluster in use.

Use a Fully Qualified Domain Name or an IP address/port. For example: 172.17.13.190:8050

HADOOP_KMS_SERVER_URL

Default setting: kms://http@${sagacityComputeMaster_ip}:16000/kms

The url by which the Compute primary component’s KMS server is accessible.

Note: There is no need to change this property from its default value.

YARN_NODE_MANAGER_VCORES

Controls the value of this Hadoop service.

Value is determined on a case by case basis by Infogix.

YARN_NODE_MANAGER_MEMORY_MB

Controls the value of this Hadoop service.

Value is determined on a case by case basis by Infogix.

YARN_SCHEDULER_MAXIMUM_ALLOCATION_MB

Controls the value of this Hadoop service.

Value is determined on a case by case basis by Infogix.

YARN_SCHEDULER_MAXIMUM_ALLOCATION_VCORES

Controls the value of this Hadoop service.

Value is determined on a case by case basis by Infogix.

HADOOP_YARN_CLIENT_CONF_FILE

The absolute path to the Hadoop Yarn configuration file.

Note: This file must be made available at an accessible location prior to installation.

HADOOP_HDFS_DATA_FOLDER

The absolute path of the local HDFS data folder on the Data360 DQ+ cluster nodes.

HADOOP_SPARK_DYNAMIC_ALLOCATION

Used to specify if dynamic allocation is enabled for the Hadoop cluster.

Should be set to true, unless Infogix support advises setting it to false.

HADOOP_EXTRA_JAVA_OPTIONS

Used to specify additional Java command-line parameters for Spark jobs executed in the Hadoop cluster.
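
For example, a Cloudera-based cluster might use connectivity settings such as the following (the addresses reuse the illustrative values above; substitute your own hosts):

HADOOP_VENDOR=cloudera

HDFS_MASTER_NODE_ADDRESS=172.17.13.168:8020

YARN_RESOURCE_MANAGER_ADDRESS=172.17.13.190:8050

HADOOP_SPARK_DYNAMIC_ALLOCATION=true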

Hadoop security properties (Compute properties)

Property

Description

SECURITY_AUTH_MODE

Default setting: kerberos

Note that Hadoop only supports kerberos as an authentication mode.

Note: There is no need to change this property from its default value, and it is only applicable if SECURITY_AUTH_ENABLED is set to true.

SECURITY_AUTH_ENABLED

Set to true to enable Kerberos authentication in Hadoop.

Set to false to disable Kerberos authentication in Hadoop.

DEFAULT_REALM

Default setting: sagacity

Effectively, a collection of credentials used for authentication purposes.

Using the default setting 'sagacity' is recommended; however, the realm can be given any other name if necessary.

KDC_SERVER

The IP address or DNS hostname of the machine that the Kerberos component is hosted on.

For example:

${sagacityKerberosServer_ip}

KDC_DB_USER

The KerberosDb server user name.

KDC_DB_PASSWORD

Password for the KerberosDb.

KDC_USER_PASSWORD

Password for the default user on Kerberos.

KRB5_CONF_FILE

The absolute path to the Hadoop cluster’s Kerberos configuration file.

Typically a copy of the file (krb5.conf) is maintained on the Hadoop installation node and the path to this file can be used.

Note: Only specify when using a secured cluster.

KRB5_KEYTABS_DIR

The root directory where the Application Server nodes’ keytab files are stored.

Note: For each Application Server node, the installation process expects a directory with a name matching the IP address of the Application Server node. Additionally, the Data360 DQ+ system user keytab needs to be in each of these directories.

Note: Only specify when using a secured cluster.

HADOOP_KERBEROS_KEYTAB_FILE

Used to specify the reference to the Data360 DQ+ system user's keytab file. The keytab file referenced by this property must be inside the following directory: /etc/security/keytabs/

For example: HADOOP_KERBEROS_KEYTAB_FILE=/etc/security/keytabs/sagacity.keytab

This is the same keytab file that is copied into all of the node directories under the KRB5_KEYTABS_DIR directory.

Note: Only specify when using a secured cluster.

HADOOP_KERBEROS_KEYTAB_USER

The principal name of the HADOOP_KERBEROS_KEYTAB_FILE file.

Note: Only specify when using a secured cluster.

HADOOP_KERBEROS_NN_KEYTAB_USER

The principal name of the Hadoop cluster's name-node. Can be obtained from the Hadoop cluster's security configuration.

Note: Only specify when using a secured cluster.

HADOOP_KERBEROS_RM_KEYTAB_USER

The principal name of the Hadoop cluster’s resource manager. Can be obtained from the Hadoop cluster’s security configuration.

Note: Only specify when using a secured cluster.

HADOOP_DFS_DATA_TRANSFER_PROTECTION

The Hadoop cluster's preset data transfer protection scheme. Can be obtained from the Hadoop cluster’s security configuration.

Note: Only specify when using a secured cluster.

HADOOP_DFS_HTTP_POLICY

The Hadoop cluster's preset HTTP access policy scheme.

Note: When using a secured cluster, specify: HTTPS_ONLY

HADOOP_SPARK_AUTHENTICATE

Used to specify if Spark authentication is enabled on the Hadoop cluster. Set to true or false.
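
For example, a secured (Kerberized) cluster might use security settings such as the following (the file paths are placeholders; the keytab layout must match the KRB5_KEYTABS_DIR note above):

SECURITY_AUTH_ENABLED=true

SECURITY_AUTH_MODE=kerberos

DEFAULT_REALM=sagacity

KRB5_CONF_FILE=/etc/krb5.conf

KRB5_KEYTABS_DIR=/etc/security/keytabs

HADOOP_KERBEROS_KEYTAB_FILE=/etc/security/keytabs/sagacity.keytab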

HDFS/Hadoop client configuration properties

Property

Description

HADOOP_HDFS_CLIENT_CONF_FILE

The HDFS client configuration file path and name.

Google Dataproc connectivity properties (Compute properties)

Property

Description

GCP_PROJECT_ID

The Google GCP project ID to use.

GCP_LOCATION_ID

The Google GCP location ID to use.

GCP_SERVICE_ACCOUNT_KEY_FILE_LOCAL

Path to the service account key (.json) file on the maintenance machine.

This file will be copied to the remote Docker containers at the location specified by GCP_SERVICE_ACCOUNT_KEY_FILE.

GCP_SERVICE_ACCOUNT_KEY_FILE

Path to the service account key (.json) file used by the application at runtime.

GCP_KMS_MASTER_KEY_RING_ID

The Google Cloud KMS key ring ID used for data encryption and decryption.

GCP_KMS_MASTER_KEY_ID

The Google Cloud KMS key ID used for data encryption and decryption.

HADOOP_ENCRYPTION_KEY

The Google Cloud KMS key ID used for encryption and decryption in the Spark or Hadoop environment.

SECURITY_CRYPT_PROVIDER

Value must be set to GOOGLE_KMS.

EXECUTION_SECURITY_CRYPT_PROVIDER

Value must be set to GOOGLE_KMS_SPARK.

DATAPROC_ENABLED

Set to true to enable Google Dataproc.

DATAPROC_PROJECTID

Set to the same value as GCP_PROJECT_ID.

DATAPROC_BUCKET_NAME

The name of the Cloud Storage bucket created for DQ+.

DATAPROC_LOGGING_BUCKET_NAME

The name of the Cloud Storage logging bucket created for DQ+.

DATAPROC_REGION

Set to the same value as GCP_LOCATION_ID.

DATAPROC_SHARED_CLUSTER_NAME

The name of the shared Dataproc cluster created for DQ+.

HADOOP_HDFS_FILESYSTEM

Value must be set to gs.
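
For example, a Dataproc-backed deployment might use settings such as the following (the project, location, bucket, and cluster names are placeholders):

GCP_PROJECT_ID=my-dqplus-project

GCP_LOCATION_ID=us-central1

DATAPROC_ENABLED=true

DATAPROC_PROJECTID=my-dqplus-project

DATAPROC_REGION=us-central1

DATAPROC_BUCKET_NAME=my-dqplus-bucket

DATAPROC_SHARED_CLUSTER_NAME=my-dqplus-cluster

HADOOP_HDFS_FILESYSTEM=gs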

Optional: Edit the install.properties file to support Apache Impala or Hive

If you are a new customer installing the product for the first time, and you are using Apache Impala or Apache Hive for Data360 DQ+ Data Views, you will need to uncomment the relevant section (for Impala or Hive) within the install.properties file.

Tables describing the relevant properties found in this file are provided below.

Impala properties

Property

Description

IMPALA_AUTH_MODE

The authentication mode.

Possible values: kerberos or empty string.

IMPALA_DB_URL

The Impala Database’s url.

IMPALA_DB_USER

The Impala Database username.

IMPALA_DB_PASSWORD

The Impala Database password.

IMPALA_DATABASE_SCHEMA

The Impala Database schema.

Default value: default

HADOOP_CHANGE_EXT_TABLE_LOCATION_PERMISSION

Set this to true if the JDBC user does not already have write permission to the folder.

Hive properties

Property

Description

HIVE_AUTH_MODE

The authentication mode.

Possible values: kerberos or empty string.

HIVE_DB_URL

The Hive Database url.

HIVE_DB_USER

The Hive Database username.

HIVE_DB_PASSWORD

The Hive Database password.

HIVE_DATABASE_SCHEMA

The Hive Database schema.

Default value: default

HADOOP_CHANGE_EXT_TABLE_LOCATION_PERMISSION

Set this to true if the JDBC user does not already have write permission to the folder.

Dataview options

Property

Description

DATAVIEW_STORE_TYPE

Acceptable values:

Impala or Hive
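
For example, a deployment using Impala with Kerberos for Data Views might uncomment and set the following (the JDBC URL is a placeholder):

DATAVIEW_STORE_TYPE=Impala

IMPALA_AUTH_MODE=kerberos

IMPALA_DB_URL=jdbc:impala://impala-host.yourcompany.com:21050

IMPALA_DATABASE_SCHEMA=default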

Encrypting the install.properties file

After you have configured the install.properties file, it is recommended that you encrypt its contents using the following command, which is located in the /opt/infogix/dqplus/bin directory:

encryptProperties

If you later need to decrypt the contents of the install.properties file, you can run the following command, which is also located in the /opt/infogix/dqplus/bin directory.

decryptProperties

 

Note: After running decryptProperties, you will then need to re-run encryptProperties to re-encrypt the contents of the install.properties file.

Optional: Changing the Docker image storage directory

Once a sagacityExclusiveMountPoint has been set, you may need to change the Docker image storage directory from its default location (/var/lib/docker) to <sagacityExclusiveMountPoint>/docker.

Note that you will only need to do this if there is not enough disk space in /.

To do so, run the following commands on all machines in your cluster:

sudo mkdir <sagacityExclusiveMountPoint>/docker

sudo ln -s <sagacityExclusiveMountPoint>/docker /var/lib/docker
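
For example, if sagacityExclusiveMountPoint is /opt/infogix/local (a placeholder), the commands would be:

sudo mkdir /opt/infogix/local/docker

sudo ln -s /opt/infogix/local/docker /var/lib/docker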

Prerequisite verification

Before running the core installation commands, you need to verify that all system prerequisites have been met.

First, check environment-wide prerequisites by running the following command:

./verifyEnvironment

 

Then, check prerequisites specific to Vertica, by running this command:

./verifyVerticaRequirements

Core installation commands

After completing the preliminary steps listed above, you will simply need to run the following commands under the /opt/infogix/dqplus-<version>/bin directory. Note that it will take some time for each command to complete.

./install

./initialize

If you have chosen to enable SSL for Vertica by setting VERTICA_SSL_ENABLED to true, then you also need to run ./startComputeDb after running ./initialize.

Troubleshooting initialization errors

If you are attempting to use an application database from 3.0, you may see the following error during initialization:

dqplusApplicationDb#13|ERROR: an older version of the "repmgr" extension is installed

dqplusApplicationDb#13|DETAIL: version 4.2 is installed but newer version 4.4 is available

dqplusApplicationDb#13|HINT: update the installed extension version by executing "ALTER EXTENSION repmgr UPDATE"

If you see this error, the workaround is to upgrade repmgr, as follows:

  1. Run the following command on the same host as the application database to upgrade the repmgr db schema:

    docker exec -it -u postgres dqplusApplicationDb psql repmgr -tAc "ALTER EXTENSION repmgr UPDATE"

  2. Re-run initialize to complete the initialization.

This workaround is only needed if you are attempting to use an application database from 3.0. New installations are not affected.

Testing installation

Once you have completed installation, you should verify that the application is functioning properly, as follows:

  1. Create a user account for yourself.
  2. Create a Data360 DQ+ Environment. The steps required to complete these tasks can be found in the "First user setup" section of the integrated product help.
  3. Test feature functionality by creating Data Stages. To verify that Data360 DQ+ is functioning properly, you should attempt to create a Pipeline/Path containing one of each type of Data Stage. The basic workflow of this process is to:
    1. Create a Pipeline
    2. Create a Path within your Pipeline
    3. Acquire a data source, and create a Data Store
    4. Create and Execute an Analysis
    5. Create an Analytic Model
    6. Create and Execute a Data View
    7. Create a Dashboard
    8. Create and Execute a Process Model

For more information on each specific data stage, you can watch the Introductory Screencasts available on the Data360 DQ+ Welcome Screen. You can also learn more by reading the integrated product help.

After installation has completed, you should once again run ./verifyVerticaRequirements to verify that Vertica has been properly installed.