Create the infrastructure - Data360 DQ+ - 12.0

Data360 DQ+ AWS Installation

  • Prerequisite: You have a Linux Virtual Machine (VM) that meets the Setup requirements and that you can use to deploy the infrastructure and install the application.
  • Prerequisite: Your Precisely representative must have provided you with a <build>-dist.zip file, which you have unzipped; the extracted infra folder will be used to deploy the infrastructure.
  • Prerequisite: To configure SSL, make sure you have a valid SSL certificate that has been imported into AWS Certificate Manager. See https://aws.amazon.com/certificate-manager/ for more information.
  • Prerequisite: All non-public-facing component services, including AWS EKS, must be deployed across two subnets.
  • Prerequisite: Tag all subnets that your cluster uses for load balancers (a tagging sketch follows this list).
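
    For example, the conventional tags used by Kubernetes for load balancer subnet discovery can be applied with a short boto3 script. This is a minimal sketch, assuming the us-east-1 region, hypothetical subnet IDs and a hypothetical cluster name (dqplus-cluster); use the tag keys appropriate to your deployment:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Hypothetical subnet IDs; replace with the subnets your cluster uses.
    private_subnets = ["subnet-07b3d0bf6h8op62ed", "subnet-0a1b2c3d4e5f6a7b8"]

    # Private subnets are tagged for internal load balancers; public subnets
    # would take the kubernetes.io/role/elb tag instead.
    ec2.create_tags(
        Resources=private_subnets,
        Tags=[
            {"Key": "kubernetes.io/role/internal-elb", "Value": "1"},
            {"Key": "kubernetes.io/cluster/dqplus-cluster", "Value": "shared"},
        ],
    )
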
  1. If not already configured, configure the network as follows:
    1. Create a VPC (Virtual Private Cloud) by signing in to your AWS account.
    2. Choose a deployment option and create the required subnets:
      • Private deployment - Create two private subnets.
      • Public deployment - Create two private subnets and two public subnets.
    3. Configure all necessary routing rules.
      Note: For a private subnet, ensure that no routing tables have a connection to an Internet Gateway.
    4. If your private environments require VPC endpoints to communicate with AWS services, ensure that you have an endpoint for S3 and an endpoint for KMS (a sketch follows this step).
    5. Ensure that your Linux VM with the necessary Setup requirements is running in one of the subnets.
    6. Ensure that you have created all the necessary security groups, IAM roles / policies / instance profiles, and encryption keys, as directed by Infogix, and that you have the ARN / ID / name of these as required. For more information, see the table in Creating the infrastructure in silent mode.
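
    If you need to create the S3 and KMS endpoints mentioned above, the following is a minimal boto3 sketch, assuming the us-east-1 region and hypothetical VPC, route table and subnet IDs:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Gateway endpoint for S3, attached to the private subnets' route table.
    ec2.create_vpc_endpoint(
        VpcEndpointType="Gateway",
        VpcId="vpc-6hj35709",                     # hypothetical VPC ID
        ServiceName="com.amazonaws.us-east-1.s3",
        RouteTableIds=["rtb-0123456789abcdef0"],  # hypothetical route table ID
    )

    # Interface endpoint for KMS, placed in the private subnets.
    ec2.create_vpc_endpoint(
        VpcEndpointType="Interface",
        VpcId="vpc-6hj35709",
        ServiceName="com.amazonaws.us-east-1.kms",
        SubnetIds=["subnet-07b3d0bf6h8op62ed"],   # hypothetical subnet ID
        PrivateDnsEnabled=True,
    )
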
  2. The infra folder contains the files that you need to deploy the infrastructure. Before running the provided scripts, make a note of the following information:
    • The AWS access key and secret key that will be used for deployment, for example AKIAIOSFODNN7EXAMPLE and wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

      This must have Admin access for the duration of the installation. A copy of the key for the Elastic Beanstalk instances must be present on the VM, and you will need to know the local path to this file. For more information, see the AWS documentation at:

      https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html

    • The AWS primary and secondary regions, for example us-east-1 and us-east-2.
      Note: The primary and secondary regions cannot be the same.
    • The domain name for the installation, for example infogix.com.
    • The VPC ID of the Virtual Private Cloud to install into, for example vpc-6hj35709.
    • The subnet IDs for all subnets, for example subnet-07b3d0bf6h8op62ed. You can choose to have a public deployment or a private deployment. If you choose a public deployment, you will be required to provide the subnet IDs for two public and two private subnets; if you choose a private deployment, you will be asked to provide the subnet IDs for two private subnets.
    • The unique deployment ID associated with the AWS IAM user or AWS account, for example dev.
    • The password for the Amazon Aurora account.
    • To install a Redshift cluster, set the 'redshiftEnabled' flag to 'true'; otherwise, set it to 'false'.
    • The password for the Amazon Redshift database.
    • The name of the AWS key pair for the Elastic Beanstalk (EB) instance, and the name of the AWS key pair for the Elastic Map Reduce (EMR) instance. You can use the same key pair for both, or different key pairs. You also need to note down the paths to the key pairs on the VM.
    • For a Kubernetes deployment, note the cluster name, the Kubernetes master node IP address, the Kubernetes master node port number, the Kubernetes namespace, the label to use for labeling pods, and the ALB certificate ARN value.
    • The user ID from which the outgoing notifications from Data360 DQ+ will be sent, and the password for this user.
    • The password used by the application keystore file.
    • The SMTP server name, for example EMAIL_HOST=email-smtp.us-east-1.amazonaws.com.
    • The email address that should appear in the 'From' field when a user receives an email from Data360 DQ+, for example do-not-reply@example.com.
    • The company domain used when the company was first added to the system, for example THE_COMPANY.
    • The display friendly company name, for example The Company Inc.
    • The email address, first name and last name of the administrator.
    • A location to store the <OVERRIDES_FOLDER> folder, for example /tmp/files/. This folder will be used to store third party or custom files, such as JAR files for connecting to an external database, self-signed certificates or custom node plugins. It is recommended that you store the <OVERRIDES_FOLDER> folder in a separate location to the extracted <build>-dist.zip folder.
    • The ARN of the SSL certificate you wish to use. You can get this from the AWS Certificate Manager.

    In general, for any settings not called out in this step you can use the default values. For a full list of all settings, see Creating the infrastructure in silent mode.
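
    Before running the setup script, you can sanity-check some of these values. The following is a minimal sketch, assuming the boto3 library and that the deployment access key is configured as the default credentials; it prints the identity behind the key and the ARNs of issued certificates in the Certificate Manager:

    import boto3

    # Confirm which IAM identity the configured access key belongs to.
    sts = boto3.client("sts", region_name="us-east-1")
    print(sts.get_caller_identity()["Arn"])

    # List issued certificates so you can note the SSL certificate ARN.
    acm = boto3.client("acm", region_name="us-east-1")
    certs = acm.list_certificates(CertificateStatuses=["ISSUED"])
    for cert in certs["CertificateSummaryList"]:
        print(cert["DomainName"], cert["CertificateArn"])
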

  3. If you need to install self-signed certificates, for example, if you plan to call REST APIs from Data360 DQ+ and the REST service uses self-signed certificates or requires mutual SSL authentication, add these certificates to the following folder:

    /<OVERRIDES_FOLDER>/sparkshared/certs

    For example:

    /tmp/files/sparkshared/certs/

    Certificates from this location will be added to the trust store on EMR nodes.

    If you have a custom EMR Bootstrap script that needs to be run against EMR clusters, please name it emrBootstrap2.sh and insert it in the config/ folder, replacing the file that already exists.

    If your custom Bootstrap script requires any arguments to be passed to it, please include them in the config/dqplus.tfvarsvars.auto.tfvars file by adding the appropriate line. For example:

    emrBootstrap2Args = ["s3://cafeqa.cafe.infogix.com/spark/certs/", "us-east-1"]

    Note: Each argument is enclosed in double quotes and separated by commas, and the list as a whole is enclosed in square brackets.

    You can add as many arguments as your script requires. If this file does not exist yet, create it and then add the line.

    If you use a custom EMR script, this also needs to be placed in the /<OVERRIDES_FOLDER>/sparkshared/bootstrap/ directory.

    For example:

    /tmp/files/sparkshared/bootstrap/emrBootstrap2.sh

    Note: To patch EMR for the CVE-2021-45046 log4j vulnerability, download s3://elasticmapreduce/bootstrap-actions/log4j/patch-log4j-emr-6.4.0-v1.sh, rename it as emrBootstrap2.sh, then use it as the custom bootstrap script, according to the instructions above.
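
    As a sketch of that patch step, assuming boto3 and the example overrides location /tmp/files (the local paths are assumptions):

    import shutil

    import boto3

    s3 = boto3.client("s3", region_name="us-east-1")

    # Download the EMR log4j patch script under the expected bootstrap name.
    local_path = "/tmp/files/sparkshared/bootstrap/emrBootstrap2.sh"
    s3.download_file(
        "elasticmapreduce",
        "bootstrap-actions/log4j/patch-log4j-emr-6.4.0-v1.sh",
        local_path,
    )

    # Also place a copy in the config/ folder, replacing the existing file.
    shutil.copy(local_path, "config/emrBootstrap2.sh")
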
  4. Navigate to the infra/aws/config directory and run the following command:

    python3 setup.py

    Follow the instructions provided by the script. You will need the information that you gathered in step 2 to answer some of the questions. Settings that have a default value do not need to be changed.

    Note: 1. "setup.py" script will execute "properties.py" and "password.py" after infra creation which will add properties in dqplus.properties and pw.properties files.2. Similarly, "setup.py" script will execute "properties.py", "password.py", "properties-ecr.py" and "properties-k8s.py" which will add properties in dqplus.properties and pw.properties files.
  5. Make a copy of the following files from the infra folder, and save them in a different location as a backup:
    • terraform.tfstate - This is a JSON file that is created dynamically by Terraform, containing the metadata of the resources that are created by Terraform. This file is created the first time that you run Terraform in your environment. This file is critical because it is used by Terraform to determine the changes that should be made to the infrastructure. If you are installing a new patch, Terraform will use this file to determine what changes need to be made for the new release.
    • dqplus.tfvarsvars.auto.tfvars - This file contains a list of variables that are consumed by Terraform as part of the build and is populated based on your responses to the install questions. For more information, see https://learn.hashicorp.com/tutorials/terraform/aws-variables.
    • dqplus.properties - This is a dynamically generated file that is created when the properties.py Python script is run during setup. It takes outputs from Terraform and constructs a file containing the properties that are needed to install the application. For a Kubernetes deployment, the file is updated with Kubernetes-specific properties (AWS ECR and K8s settings).
    • pw.properties - This is a dynamically generated file and is created when the password.py Python script is run during setup. It contains password information for the created infrastructure which is then used by the application.
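
    A minimal sketch of this backup step, assuming the generated files are in the current directory and /tmp/infra-backup is a suitable backup location (both assumptions):

    import shutil
    from pathlib import Path

    backup_dir = Path("/tmp/infra-backup")  # assumed backup location
    backup_dir.mkdir(parents=True, exist_ok=True)

    # Copy each generated file, preserving its timestamps.
    for name in ["terraform.tfstate", "dqplus.tfvarsvars.auto.tfvars",
                 "dqplus.properties", "pw.properties"]:
        shutil.copy2(name, backup_dir / name)
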
  6. Similarly, for a Kubernetes deployment, the Helm chart is updated when the Gradle tasks are executed.