Once you have met the prerequisites outlined in Preparing to install DQ+, you can move on to installation. The installer is a shell script named dqplus_installer_XXXX.sh, which you need to execute to begin the installation process.
Unzip the installer
sudo -u sagacity dqplus_installer_XXXX.sh
You should see that the installer files have been extracted to the /opt/infogix directory.
Deciding how to configure your cluster
Before performing installation, you should consider on which machines you are going to place the components described in Description of architecture (Application, Load Balancer, ApplicationDb, Compute primary/secondaries (Hadoop cluster), and ComputeDb Nodes).
This assignment is performed by making modifications to the install.properties file, as described in the "Edit the install.properties file" section of this guide.
To help you design your cluster, a few basic examples are provided:
Maintenance Machine configuration
The Maintenance Machine is configured by assigning its IP address to a set of properties found under the Node Mapping section of the install.properties file. Individual components are described in greater detail later in this guide; however, the basic configuration is as follows.
Assuming the Maintenance Machine has an IP address of 192.0.2.1, this configuration would be accomplished by making the following assignments in the install.properties file:
sagacityLoadBalancer_ip=192.0.2.1
sagacityComputeDbNode1_ip=192.0.2.1
dqplusApplicationDb_ip=192.0.2.1
Three machine configuration
In addition to the recommended Maintenance Machine configuration, the other components that comprise Data360 DQ+ need to be assigned to the other IP addresses of the machines you have available in your cluster. Some additional components may also be assigned to the Maintenance Machine.
For example, consider a three machine cluster with component assignments spread across the three machines. Note that nodes within the Compute/Hadoop and ComputeDb clusters should be distributed across machines.
Here, this configuration would be accomplished with the following assignments within the install.properties file:
sagacityLoadBalancer_ip=192.0.2.1
sagacityApplicationServer1_ip=192.0.2.1
sagacityApplicationServer1_memory=3g
sagacityApplicationServer1_memory-swap=6g
sagacityApplicationServer2_ip=192.0.2.2
sagacityApplicationServer2_memory=3g
sagacityApplicationServer2_memory-swap=6g
dqplusApplicationDb_ip=192.0.2.1
sagacityComputeDbMaster_ip=192.0.2.1
sagacityComputeDbNode1_ip=192.0.2.2
sagacityComputeDbNode2_ip=192.0.2.3
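Before running the installer, the node-mapping assignments can be reviewed at a glance with a quick grep. This is an illustrative sketch, not part of the installer: it builds a tiny sample file for demonstration, whereas on a real system you would point PROPS at /opt/infogix/dqplus/properties/install.properties instead.

```shell
# Sketch: list every component-to-IP assignment so the cluster layout
# can be reviewed before installation. The sample file below is purely
# illustrative; use your real install.properties path in practice.
PROPS=$(mktemp)
cat > "$PROPS" <<'EOF'
sagacityLoadBalancer_ip=192.0.2.1
dqplusApplicationDb_ip=192.0.2.1
sagacityComputeDbNode1_ip=192.0.2.2
EMAIL_PORT=25
EOF
# Only *_ip assignments are node mappings; other properties are skipped.
grep -E '^(sagacity|dqplus)[A-Za-z0-9]*_ip=' "$PROPS" | sort
```

Reviewing this output makes it easy to spot a component accidentally mapped to the wrong machine before any installation work has started.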
Six node example
When working with additional machines, other configurations can be considered. Generally, you will want to assign more memory to the components that your users will use the most. For more information on what each component does, see Getting started: Basic concepts and terminology. As a general example, consider the following six machine configuration.
Here, this configuration would be accomplished with the following assignments within the install.properties file:
sagacityLoadBalancer_ip=192.0.2.1
sagacityApplicationServer1_ip=192.0.2.1
sagacityApplicationServer1_memory=3g
sagacityApplicationServer1_memory-swap=6g
sagacityApplicationServer2_ip=192.0.2.2
sagacityApplicationServer2_memory=3g
sagacityApplicationServer2_memory-swap=6g
dqplusApplicationDb_ip=192.0.2.1
sagacityComputeDbMaster_ip=192.0.2.3
sagacityComputeDbNode1_ip=192.0.2.4
sagacityComputeDbNode2_ip=192.0.2.5
Edit the install.properties file
Once you have an idea of how you will configure your cluster, and prior to running the first installation script, you need to make some modifications to the file found at:
/opt/infogix/dqplus/properties/install.properties
The modifications you make to the install.properties file will map Data360 DQ+ components to machines and will also determine a wide range of other settings that control the system.
See the following tables describing all of the properties found in this file:
Description of install.properties properties
Below is a description of each property found within the install.properties file. Properties are grouped by topic, matching the comment groupings in the file itself.
Administrator details
| Property | Description |
|---|---|
| sagacityAdministratorEmail | Set to your email address. |
| sagacityAdministratorFirst | Set to your first name. |
| sagacityAdministratorLast | Set to your last name. |
| sagacityCompanyName | Set to your company's name. |
| sagacityCompanyId | Set to a unique ID that can be used to identify your company. For example: yourcompany.com |
SMTP properties
| Property | Description |
|---|---|
| EMAIL_HOST | The SMTP server name. |
| EMAIL_PORT | The SMTP server port. |
| EMAIL_USER | The user ID from which outgoing notifications from Data360 DQ+ will be sent. |
| EMAIL_PASSWORD | The password for the EMAIL_USER. |
| EMAIL_FROM_ADDRESS | The email address that should appear in the From field when a user receives an email from Data360 DQ+. |
Default deployment user
| Property | Description |
|---|---|
| sagacityMaintenanceUser | Set to the name that you assigned to the operating system user you created during preparation. |
| sagacityMaintenancePassword | Set to the password for the maintenance user on the operating system. This is required, as it will be used by Vertica to connect to other Vertica hosts within the system. |
JAVA_HOME and Security Lib
| Property | Description |
|---|---|
| JavaHomeDir | Path to the Maintenance machine's installation of Java. For example: /opt/java/current |
| JavaSecurityLib | Path to the Maintenance machine's Java Cryptography Extension (JCE) library. For example: ${JavaHomeDir}/lib/security. Note: There is no need to change this property from its default value. |
DNS hostname mapping
| Property | Description |
|---|---|
| dns_<ip address>_hostname=<hostname> | A mapping of all cluster machines' host names to IP addresses. Note that both sides of the property are variable. For example, if you were using two machines, you would have: dns_172.170.130.195_hostname=ip-172-170-130-195.ec2.internal and dns_172.170.130.245_hostname=ip-172-170-130-245.ec2.internal |
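Because both sides of this property are variable, the mapping lines can be generated mechanically from "ip hostname" pairs rather than typed by hand. A small sketch (the IPs and hostnames are the examples from the table above; substitute your own cluster machines):

```shell
# Sketch: turn "ip hostname" pairs into dns_<ip>_hostname=<hostname>
# lines ready to paste into install.properties.
awk '{ printf "dns_%s_hostname=%s\n", $1, $2 }' <<'EOF'
172.170.130.195 ip-172-170-130-195.ec2.internal
172.170.130.245 ip-172-170-130-245.ec2.internal
EOF
```

On a cluster machine, the same pairs can often be taken straight from /etc/hosts or your inventory file, which avoids transcription typos in the IP-embedded property names.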
Authentication type settings
| Property | Description |
|---|---|
| AUTH_TYPE | Authentication type to use. Possible values: INTERNAL, SAML, or HTTPHEADER. If no value is specified, the default is SAML when IS_SAML_ENABLED is set to 'true'; otherwise, the default is INTERNAL. |
SAML settings
| Property | Description |
|---|---|
| IS_SAML_ENABLED | Set to true if SAML is used as the sagacityCompanyAuthType; set to false otherwise. |
| SAML_METADATA_IDP | Path to the Identity Provider (IDP) Metadata XML file if SAML is used. If SAML is not used, leave blank. |
| SAML_SSO_MAX_AUTH_AGE | Maximum allowable time (in seconds) between the authentication of a user and the processing of an authentication statement when SSO is enabled. Default setting: 43200 seconds (12 hours). |
Node mapping
| Property | Description |
|---|---|
| sagacityLoadBalancer_ip | IP address of the machine to place the Load Balancer on. This should be the Maintenance machine. |
| dqplusApplicationDb_ip | IP address of the machine to place ApplicationDb on. This should be the Maintenance machine. |
| sagacityStandbyApplicationDb_ip | If setting up a warm standby Postgres database, the IP address of the instance on which the standby container should be placed. |
| sagacityApplicationServer1_ip | IP address of the machine to place App Server 1 on. For App Server 1, this should be the Maintenance machine; additional App Servers should be placed on the additional machines in use. To add more App Servers, create a sagacityApplicationServerN_ip property and IP address mapping for each one, using incremental numbering. For example: sagacityApplicationServer2_ip, sagacityApplicationServer3_ip... sagacityApplicationServerN_ip |
| sagacityApplicationServer1_memory | Default setting: 3g. Used as a command line parameter that is passed when creating the Application component. For example, sagacityApplicationServer1_memory=3g allocates 3g of memory for App Server 1. Note that if multiple App Servers are in use, you need to assign memory to each one, using the appropriate numbers to identify each server. For example: sagacityApplicationServer2_memory, sagacityApplicationServer3_memory... sagacityApplicationServerN_memory |
| sagacityApplicationServer1_memory-swap | Default setting: 6g. Used as a command line parameter that is passed when creating the Application component. For example, sagacityApplicationServer1_memory-swap=6g allocates 6g of memory-swap for App Server 1. Note that if multiple App Servers are in use, you need to assign memory-swap to each one, using the appropriate numbers to identify each server. For example: sagacityApplicationServer2_memory-swap, sagacityApplicationServer3_memory-swap... sagacityApplicationServerN_memory-swap |
| sagacityComputeDbMaster_ip | IP address of the machine to place the ComputeDb primary on. This should be the Maintenance machine. |
| sagacityComputeDbNode1_ip | IP address of the machine to place ComputeDb Node 1 on. To add additional ComputeDb Nodes, create a sagacityComputeDbNodeN_ip property and IP address mapping for each one, using incremental numbering. For example: sagacityComputeDbNode2_ip, sagacityComputeDbNode3_ip... sagacityComputeDbNodeN_ip |
Load Balancer configuration
| Property | Description |
|---|---|
| IS_SSL_ENABLED | Set to true if SSL is enabled; false if SSL is disabled. |
| LOAD_BAL_SSL_CERT | If SSL is enabled, set this parameter to the path of the file containing the security certificate in PEM format. If SSL is disabled, leave blank. |
| LOAD_BAL_SSL_CERT_KEY | If SSL is enabled, set this parameter to the path of the file containing the security certificate's secret key in PEM format. If SSL is disabled, leave blank. |
| LOAD_BAL_SSL_TRUSTED_CERT | If SSL is enabled, optionally set this parameter to the path of the file containing a trusted CA certificate in PEM format. If SSL is disabled, leave blank. |
Deployment properties
| Property | Description |
|---|---|
| DEPLOY_HOST | Used to specify the application access point, i.e., the address of the Load Balancer. Default setting: ${sagacityLoadBalancer_ip} |
| DEPLOY_HOST_URL | Used to specify the URL of the application access point. If SSL is disabled, set to: http://${DEPLOY_HOST} If SSL is enabled, set to: https://${DEPLOY_HOST} |
| DEPLOY_JAVA_OPTS | Sets the default file encoding for the JVM to UTF-8 and sets the timezone. Modify the default setting as needed for your environment. |
SSO properties
| Property | Description |
|---|---|
| USE_IFRAME_FOR_SSO | Used to specify whether the system should display SSO verification in an iFrame. Possible values: true or false. Default value: false |
Special folders: Application mounts
| Property | Description |
|---|---|
| sagacitySharedMountPoint | Set to the path of a directory that can be shared across all machines in the cluster to hold data, logs, and backup content. |
| sagacityExclusiveMountPoint | Set to the path of a directory located on the Maintenance Machine that can be used to contain runtime data for the Compute, Load Balancer, and Application components. |
Special folders: Shared mount subdirectories
| Property | Description |
|---|---|
| DATA_HOME | Default setting: ${sagacitySharedMountPoint}/data. The folder within sagacitySharedMountPoint that hosts Application runtime data. Note: There is no need to change this property from its default value. |
| sagacityLogsFolder | Default setting: ${sagacitySharedMountPoint}/logs. The folder within sagacitySharedMountPoint that hosts system logs. Note: There is no need to change this property from its default value. |
| sagacityBackupFolder | Default setting: ${sagacitySharedMountPoint}/backup. The folder within sagacitySharedMountPoint that hosts all database backups for the system. Note: There is no need to change this property from its default value. |
Special folders: Exclusive mount sub-directories
| Property | Description |
|---|---|
| sagacityDataFolder | Default setting: ${sagacityExclusiveMountPoint}/application. The folder within sagacityExclusiveMountPoint that hosts the databases of the components that comprise Data360 DQ+. Note: There is no need to change this property from its default value. |
| sagacityFolderCompute | Default setting: ${sagacityExclusiveMountPoint}/runtime/hadoop. The folder within sagacityExclusiveMountPoint that hosts the Compute component. Note: There is no need to change this property from its default value. |
| sagacityFolderLoadBalancer | Default setting: ${sagacityExclusiveMountPoint}/runtime/nginx. The folder within sagacityExclusiveMountPoint that hosts the Load Balancer component. Note: There is no need to change this property from its default value. |
Special folders: External folders (shared across all nodes)
| Property | Description |
|---|---|
| externalDataFolder | Set to the path of a directory that can be shared across all machines in the cluster to hold files for processing. |
| externalDataFolderMountDestination | Set to the path that you would like to be used to represent the externalDataFolder whenever that path is shown in the UI. |
Application keystore
| Property | Description |
|---|---|
| APP_KEYSTORE_FILE | See the comments in the install.properties file. |
| APP_KEYSTORE_PASSWORD | The password for the Application Keystore. |
Vertica DB (ComputeDb properties)
| Property | Description |
|---|---|
| VERTICA_DB_URL | Set to the Vertica data access URL. |
| VERTICA_DB_URL_ESCAPED | Same as VERTICA_DB_URL, but with the ampersand escaped. Only used when VERTICA_SSL_ENABLED is set to true. |
| VERTICA_DB_USER | Default setting: ${sagacityMaintenanceUser} |
| VERTICA_DB_PASSWORD | The Vertica database password. |
| VERTICA_DATABASE_SCHEMA | The name of the schema in the Vertica database to use. Default setting: public |
| VERTICA_SSL_ENABLED | Set to true to enable Vertica SSL. If this property is set to true, the system expects the physical locations of the CRT and KEY files to be referenced in the VERTICA_SSL_CRT_FILE and VERTICA_SSL_KEY_FILE properties, respectively. |
| VERTICA_SSL_CERTIFICATE_GENERATE | Set to true to generate a self-signed SSL certificate for Vertica; VERTICA_SSL_ENABLED must also be set to true. If this property is set to false, a self-signed SSL certificate will not be automatically generated for Vertica; in this case, use the VERTICA_SSL_CRT_FILE and VERTICA_SSL_KEY_FILE properties to define where to locate your existing certificate and key files. |
| VERTICA_SSL_CRT_FILE | Physical location of your CRT file. Only used when VERTICA_SSL_ENABLED is set to true. If you have set VERTICA_SSL_CERTIFICATE_GENERATE to true, this is the location where the generated CRT file will be stored. |
| VERTICA_SSL_KEY_FILE | Physical location of your KEY file. Only used when VERTICA_SSL_ENABLED is set to true. If you have set VERTICA_SSL_CERTIFICATE_GENERATE to true, this is the location where the generated KEY file will be stored. |
| VERTICA_LICENSE_FILE | If using the Community Edition of Vertica, set to: CE. Otherwise, set to the path of your Vertica license file. |
| VERTICA_RESOURCE_POOL_MAXMEMORYSIZE | Default setting: 25%. Can be set to a percentage using %, or to an actual amount of space, for example: 10G. See the Vertica documentation if you would like to change this setting. |
| VERTICA_MAX_DATA_UTILIZATION_PERCENTAGE | Default setting: 80. Always interpreted as a percentage, so % is not needed. See the Vertica documentation if you would like to change this setting. |
Database properties (ApplicationDb properties)
| Property | Description |
|---|---|
| DATABASE_ADMIN_PASSWORD | The database administrator password. |
| DATABASE_PASSWORD | The database user password. Set to the password of the application database user. |
| DATABASE_SERVER | Default setting: ${dqplusApplicationDb_ip}. The IP address of the machine that the ApplicationDb component is hosted on. Note: There is no need to change this property from its default value. |
| STANDBY_DATABASE_SERVER | Default setting: ${sagacityStandbyApplicationDb_ip}. Note: There is no need to change this property from its default value. |
| SQL_DATASTORE_DB_SERVER | Default setting: ${DATABASE_SERVER}. The IP address of the machine that the ApplicationDb component is hosted on. Note: There is no need to change this property from its default value. |
| DATABASE_SERVER_SSL_ENABLED | Set to true to enable database SSL. If this property is set to true, the system expects the physical locations of the CRT and KEY files to be referenced in the DATABASE_SERVER_SSL_CRT_FILE and DATABASE_SERVER_SSL_KEY_FILE properties, respectively. |
| DATABASE_SERVER_SSL_CERTIFICATE_GENERATE | Set to true to generate a self-signed SSL certificate for the Application DB; DATABASE_SERVER_SSL_ENABLED must also be set to true. Set to false to not generate a self-signed SSL certificate for the Application DB. |
| DATABASE_SERVER_SSL_CRT_FILE | Physical location of your CRT file. Only used when DATABASE_SERVER_SSL_ENABLED is set to true. |
| DATABASE_SERVER_SSL_KEY_FILE | Physical location of your KEY file. Only used when DATABASE_SERVER_SSL_ENABLED is set to true. |
| DATABASE_URL | Set to the URL by which the JDBC connection is established to access the ApplicationDb database. Note: There is no need to change this property from its default value. |
| DATABASE_URL_ESCAPED | Same as DATABASE_URL, but with the ampersand escaped. Only used when DATABASE_SERVER_SSL_ENABLED is set to true. |
| SQL_DATASTORE_DB_URL | Default setting: ${DATABASE_URL}. Note: There is no need to change this property from its default value. |
Hadoop connectivity properties (Compute properties)
| Property | Description |
|---|---|
| HADOOP_VENDOR | The distribution type of Hadoop in use. Permissible value: cloudera |
| HADOOP_VENDOR_VERSION | The version of the Hadoop distribution. |
| HADOOP_VENDOR_VERSION_COMPLETE | The complete version string of the Hadoop distribution. |
| HDFS_MASTER_NODE_ADDRESS | Host reference of the Hadoop Distributed File System primary node in use. Use a fully qualified domain name or an IP address/port. For example: 172.17.13.168:8020 |
| YARN_RESOURCE_MANAGER_ADDRESS | Host reference of the YARN resource manager node within the Hadoop cluster in use. Use a fully qualified domain name or an IP address/port. For example: 172.17.13.190:8050 |
| HADOOP_KMS_SERVER_URL | Default setting: kms://http@${sagacityComputeMaster_ip}:16000/kms. The URL by which the Compute primary component's KMS server is accessible. Note: There is no need to change this property from its default value. |
| YARN_NODE_MANAGER_VCORES | Controls the value of this Hadoop service setting. The value is determined on a case by case basis by Infogix. |
| YARN_NODE_MANAGER_MEMORY_MB | Controls the value of this Hadoop service setting. The value is determined on a case by case basis by Infogix. |
| YARN_SCHEDULER_MAXIMUM_ALLOCATION_MB | Controls the value of this Hadoop service setting. The value is determined on a case by case basis by Infogix. |
| YARN_SCHEDULER_MAXIMUM_ALLOCATION_VCORES | Controls the value of this Hadoop service setting. The value is determined on a case by case basis by Infogix. |
| HADOOP_YARN_CLIENT_CONF_FILE | The absolute path to the Hadoop YARN configuration file. Note: This file must be made available at an accessible location prior to installation. |
| HADOOP_HDFS_DATA_FOLDER | The absolute path of the local HDFS data folder on the Data360 DQ+ cluster nodes. |
| HADOOP_SPARK_DYNAMIC_ALLOCATION | Used to specify whether dynamic allocation is enabled for the Hadoop cluster. Should be set to true, unless Infogix support advises setting it to false. |
| HADOOP_EXTRA_JAVA_OPTIONS | Used to specify additional Java command-line parameters for Spark jobs executed in the Hadoop cluster. |
Hadoop security properties (Compute properties)
| Property | Description |
|---|---|
| SECURITY_AUTH_MODE | Default setting: kerberos. Note that Hadoop only supports kerberos as an authentication mode. Note: There is no need to change this property from its default value, and it is only applicable if SECURITY_AUTH_ENABLED is set to true. |
| SECURITY_AUTH_ENABLED | Set to true to enable Kerberos authentication in Hadoop; set to false to disable it. |
| DEFAULT_REALM | Default setting: sagacity. Effectively, a collection of credentials used for authentication purposes. Using the default setting 'sagacity' is recommended; however, the realm can be given any other name if necessary. |
| KDC_SERVER | The IP address or DNS hostname the Kerberos component is hosted on. For example: ${sagacityKerberosServer_ip} |
| KDC_DB_USER | The KerberosDb server user name. |
| KDC_DB_PASSWORD | Password for the KerberosDb. |
| KDC_USER_PASSWORD | Password for the default user on Kerberos. |
| KRB5_CONF_FILE | The absolute path to the Hadoop cluster's Kerberos configuration file. Typically, a copy of the file (krb5.conf) is maintained on the Hadoop installation node, and the path to this file can be used. Note: Only specify when using a secured cluster. |
| KRB5_KEYTABS_DIR | The root directory where the Application Server nodes' keytab files are stored. Note: For each Application Server node, the installation process expects a directory with a name matching the IP address of the Application Server node. Additionally, the Data360 DQ+ system user keytab needs to be in each of these directories. Note: Only specify when using a secured cluster. |
| HADOOP_KERBEROS_KEYTAB_FILE | Used to specify the reference to the Data360 DQ+ system user's keytab file. This is the same keytab file that is copied into each of the node directories under the KRB5_KEYTABS_DIR directory. Note: Only specify when using a secured cluster. |
| HADOOP_KERBEROS_KEYTAB_USER | The principal name of the HADOOP_KERBEROS_KEYTAB_FILE file. Note: Only specify when using a secured cluster. |
| HADOOP_KERBEROS_NN_KEYTAB_USER | The principal name of the Hadoop cluster's name node. Can be obtained from the Hadoop cluster's security configuration. Note: Only specify when using a secured cluster. |
| HADOOP_KERBEROS_RM_KEYTAB_USER | The principal name of the Hadoop cluster's resource manager. Can be obtained from the Hadoop cluster's security configuration. Note: Only specify when using a secured cluster. |
| HADOOP_DFS_DATA_TRANSFER_PROTECTION | The Hadoop cluster's preset data transfer protection scheme. Can be obtained from the Hadoop cluster's security configuration. Note: Only specify when using a secured cluster. |
| HADOOP_DFS_HTTP_POLICY | The Hadoop cluster's preset HTTP access policy scheme. Note: When using a secured cluster, specify: HTTPS_ONLY |
| HADOOP_SPARK_AUTHENTICATE | Used to specify whether Spark authentication is enabled on the Hadoop cluster. Set to true or false. |
HDFS/Hadoop client configuration properties
| Property | Description |
|---|---|
| HADOOP_HDFS_CLIENT_CONF_FILE | The HDFS client configuration file path and name. |
Google Dataproc connectivity properties (Compute properties)
| Property | Description |
|---|---|
| GCP_PROJECT_ID | The Google GCP project ID to use. |
| GCP_LOCATION_ID | The Google GCP location ID to use. |
| GCP_SERVICE_ACCOUNT_KEY_FILE_LOCAL | Path to the service account key (.json) file on the maintenance machine. This file will be copied to the remote docker containers at the location specified by GCP_SERVICE_ACCOUNT_KEY_FILE. |
| GCP_SERVICE_ACCOUNT_KEY_FILE | Path to the service account key (.json) file used by the application at runtime. |
| GCP_KMS_MASTER_KEY_RING_ID | The Google Cloud KMS key ring ID used for data encryption and decryption. |
| GCP_KMS_MASTER_KEY_ID | The Google Cloud KMS key ID used for data encryption and decryption. |
| HADOOP_ENCRYPTION_KEY | The Google Cloud KMS key ID used for encryption and decryption in the Spark or Hadoop environment. |
| SECURITY_CRYPT_PROVIDER | Value must be set to GOOGLE_KMS. |
| EXECUTION_SECURITY_CRYPT_PROVIDER | Value must be set to GOOGLE_KMS_SPARK. |
| DATAPROC_ENABLED | Set to true to enable Google Dataproc. |
| DATAPROC_PROJECTID | Set to the same value as GCP_PROJECT_ID. |
| DATAPROC_BUCKET_NAME | The name of the Cloud Storage bucket created for DQ+. |
| DATAPROC_LOGGING_BUCKET_NAME | The name of the Cloud Storage bucket created for DQ+ logging. |
| DATAPROC_REGION | Set to the same value as GCP_LOCATION_ID. |
| DATAPROC_SHARED_CLUSTER_NAME | The name of the shared Dataproc cluster created for DQ+. |
| HADOOP_HDFS_FILESYSTEM | Value must be set to gs. |
Optional: Edit the install.properties file to support Apache Impala or Hive
If you are a new customer installing the product for the first time, and you are using Apache Impala or Apache Hive for Data360 DQ+ Data Views, you will need to uncomment the relevant section (for Impala or Hive) within the install.properties file.
Tables describing the relevant properties found in this file are provided below.
Impala properties
| Property | Description |
|---|---|
| IMPALA_AUTH_MODE | The authentication mode. Possible values: kerberos or empty string. |
| IMPALA_DB_URL | The Impala database URL. |
| IMPALA_DB_USER | The Impala database username. |
| IMPALA_DB_PASSWORD | The Impala database password. |
| IMPALA_DATABASE_SCHEMA | The Impala database schema. Default value: default |
| HADOOP_CHANGE_EXT_TABLE_LOCATION_PERMISSION | Set this to true if the JDBC user does not already have write permission to the folder. |
Hive properties
| Property | Description |
|---|---|
| HIVE_AUTH_MODE | The authentication mode. Possible values: kerberos or empty string. |
| HIVE_DB_URL | The Hive database URL. |
| HIVE_DB_USER | The Hive database username. |
| HIVE_DB_PASSWORD | The Hive database password. |
| HIVE_DATABASE_SCHEMA | The Hive database schema. Default value: default |
| HADOOP_CHANGE_EXT_TABLE_LOCATION_PERMISSION | Set this to true if the JDBC user does not already have write permission to the folder. |
Dataview options
| Property | Description |
|---|---|
| DATAVIEW_STORE_TYPE | Acceptable values: Impala or Hive |
Encrypting the install.properties file
After you have configured the install.properties file, it is recommended that you encrypt its contents using the following command, which is located in the /opt/infogix/dqplus/bin directory:
encryptProperties
If you later need to decrypt the contents of the install.properties file, you can run the following command, which is also located in the /opt/infogix/dqplus/bin directory.
decryptProperties
Optional: Changing the Docker image storage directory
Once a sagacityExclusiveMountPoint has been set, you may need to change the Docker image storage directory from its default location (/var/lib/docker) to <sagacityExclusiveMountPoint>/docker.
Note that you will only need to do this if there is not enough disk space in /.
To do so, run the following commands on all machines in your cluster:
sudo mkdir <sagacityExclusiveMountPoint>/docker
sudo ln -s <sagacityExclusiveMountPoint>/docker /var/lib/docker
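The effect of these two commands can be seen in a small sketch that uses throwaway temp paths instead of the real ones (EXCL and TARGET are hypothetical stand-ins; on your cluster machines, run the sudo commands above against <sagacityExclusiveMountPoint>/docker and /var/lib/docker):

```shell
# Sketch: the same mkdir + symlink pattern on disposable paths.
EXCL=$(mktemp -d)                    # stand-in for <sagacityExclusiveMountPoint>
TARGET=$(mktemp -d)/var-lib-docker   # stand-in for /var/lib/docker
mkdir -p "$EXCL/docker"
ln -s "$EXCL/docker" "$TARGET"
readlink "$TARGET"                   # shows where the storage now actually lives
```

After the real symlink is in place, Docker writes its images under the exclusive mount point while still addressing them through /var/lib/docker, so no Docker configuration change is needed.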
Prerequisite verification
Before running the core installation commands, you need to verify that all system prerequisites have been met.
First, check environment wide prerequisites, by running the following command:
./verifyEnvironment
Then, check prerequisites specific to Vertica, by running this command:
./verifyVerticaRequirements
Core installation commands
After completing the preliminary steps listed above, run the following commands from the /opt/infogix/dqplus-<version>/bin directory. Note that each command will take some time to complete.
./install
./initialize
If you have chosen to enable SSL for Vertica by setting VERTICA_SSL_ENABLED to true, you also need to run ./startComputeDb after running ./initialize.
Troubleshooting initialization errors
If you are attempting to use an application database from 3.0, you may see the following error during initialization:
dqplusApplicationDb#13|ERROR: an older version of the "repmgr" extension is installed
dqplusApplicationDb#13|DETAIL: version 4.2 is installed but newer version 4.4 is available
dqplusApplicationDb#13|HINT: update the installed extension version by executing "ALTER EXTENSION repmgr UPDATE"
If you see this error, the workaround is to upgrade repmgr, as follows:
- Run the following command on the same host as the application database to upgrade the repmgr db schema:
docker exec -it -u postgres dqplusApplicationDb psql repmgr -tAc "ALTER EXTENSION repmgr UPDATE"
- Re-run initialize to complete the initialization.
This workaround is only needed if you are attempting to use an application database from 3.0. New installations are not affected.
Testing installation
Once you have completed installation, you should verify that the application is functioning properly, as follows:
- Create a user account for yourself.
- Create a Data360 DQ+ Environment. The steps required to complete these tasks can be found in the "First user setup" section of the integrated product help.
- Test feature functionality by creating Data Stages. To verify that Data360 DQ+ is functioning properly, you should attempt to create a Pipeline/Path containing one of each type of Data Stage. The basic workflow of this process is to:
- Create a Pipeline
- Create a Path within your Pipeline
- Acquire a data source, and create a Data Store
- Create and Execute an Analysis
- Create an Analytic Model
- Create and Execute a Data View
- Create a Dashboard
- Create and Execute a Process Model
For more information on each specific data stage, you can watch the Introductory Screencasts available on the Data360 DQ+ Welcome Screen. You can also learn more by reading the integrated product help.
After installation has completed, you should once again run ./verifyVerticaRequirements to verify that Vertica has been properly installed.