Create the infrastructure in silent mode - Data360_DQ+ - 12.0

Data360 DQ+ AWS Installation

Product type: Software
Portfolio: Verify
Product family: Data360
Product: Data360 DQ+
Version: 12.0
ft:locale: en-US
Product name: Data360 DQ+
ft:title: Data360 DQ+ AWS Installation
Copyright: 2024
First publish date: 2016
ft:lastEdition: 2024-12-12
ft:lastPublication: 2024-12-12T10:33:57.869000

As an alternative to creating the infrastructure manually, you can deploy the infrastructure in silent mode.

  • Prerequisite: The prerequisites for deploying the infrastructure manually also apply to a silent deployment. See Setup requirements.
  • Prerequisite: Before running a silent deployment, you must complete steps 1 and 2 of Creating the infrastructure. You should also read step 3 and complete it if required.
  • Prerequisite: To configure SSL, make sure that you have a valid SSL certificate that has been imported into AWS Certificate Manager. See https://aws.amazon.com/certificate-manager/ for more information.
  • Prerequisite: Make sure that you have created a Kubernetes cluster and three Docker repositories in the AWS ECR registry. The repositories must be named dqplus-main, dqplus-os and dqplus-extension.
Note: The instructions in this section assume that you are using an AWS access key and secret key to create the infrastructure. If you want to use instance profiles for authentication instead, see Using instance profiles for authentication. Note that instance profile authentication is only supported when running a silent installation, and not when creating the infrastructure by running the interactive script.
  1. Populate a variables file with the following required settings. Use the format setting = "value", for example awsPrimaryRegion = "us-east-1":

    awsPrimaryRegion = " "

    primaryRegion = " "

    awsAccessKey = " "

    awsSecretKey = " "

    awsSecondaryRegion = " "

    domainName = " "

    vpcId = " "

    privateSubnet1Id = " "

    privateSubnet2Id = " "

    deploymentId = " "

    primaryS3Bucket = " "

    availabilityZone1 = " "

    availabilityZone2 = " "

    localPathToEmrKey = " "

    localPathToEc2Key = " "

    s3KmsKey = " "

    #Aurora

    auroraPostgresPassword = " "

    auroraDbInstanceClass = "db.r5.xlarge"

    auroraBackupRetentionPeriod = "30"

    auroraPreferredBackupWindow = "04:02-04:32"

    auroraPreferredMaintenanceWindow = "sat:02:00-sat:02:30"

    auroraSnapshotIdentifier = " "

    auroraSecurityGroupId = " "

    rdsSubnetGroupName = " "

    postgresKmsKeyArn = " "

    postgresEngineVersion = ""

    #Redshift

    redshiftEnabled = "true"

    redshiftMasterPassword = " "

    redshiftStandardSnapshotIdentifier = " "

    redshiftHpSnapshotIdentifier = " "

    redshiftClusterSubnetGroupName = " "

    redshiftSecurityGroupId = " "

    redshiftIamRoleArn = " "

    #EMR

    emrEc2KeyName = " "

    emrCoreInstanceCount = "3"

    emrCoreInstanceType = "r5d.xlarge"

    emrTaskInstanceCount = "1"

    emrTaskInstanceType = "r5d.xlarge"

    emrMasterInstanceCount = "3"

    emrMasterInstanceType = "r5d.xlarge"

    emrKmsKeyArn = " "

    emrMasterSecurityGroupId = " "

    emrSlaveSecurityGroupId = " "

    emrServiceAccessSecurityGroupId = " "

    emrInstanceProfileArn = " "

    emrIamRoleArn = " "

    emrCustomAmiId = " "

    #EB

    publicSubnet1Id = " "

    publicSubnet2Id = " "

    elasticBeanstalkPlatformName = "64bit Amazon Linux 2 v4.1.3 running Tomcat 9.0 Corretto 11"

    ebMaxInstanceCount = "4"

    ebMinInstanceCount = "2"

    ebInstanceType = "c5.2xlarge"

    ebKeyPairName = " "

    ebAppCount = "1"

    sslConnectionStatus = "true"

    httpsListenerEnable = "true"

    httpListenerEnable = "false"

    ebOrK8s = "0"

    Note: ebOrK8s is a flag that selects either an Elastic Beanstalk or a Kubernetes deployment. If ebOrK8s = "0", Elastic Beanstalk is deployed. If ebOrK8s = "1", the Elastic Beanstalk deployment is skipped and the configuration required for a Kubernetes deployment is applied instead.

    sslCertificateArn = "arn:aws:acm:us-east-1:0517023248360:certificate/xxxxx-aabd-4134-a85e-1ea86xxxxxx"

    Note: sslConnectionStatus acts as a flag, whereas the other three SSL values are read into a Terraform variable file. If sslConnectionStatus = "true", httpsListenerEnable should be "true", httpListenerEnable should be "false", and sslCertificateArn should be set to the ARN of the SSL certificate. Similarly, if sslConnectionStatus = "false", httpsListenerEnable should be "false", httpListenerEnable should be "true", and sslCertificateArn can be left blank ("").

    ebInstanceProfileArn = " "

    ebSecurityGroupIds = " "

    ebCustomAmiId = " "

    Note: Where a setting above contains a value, for example ebMaxInstanceCount = "4", this is the recommended default value, which should be used in most cases. You must include all settings in the file, except those that the table below describes as optional.
    Note: The elasticBeanstalkPlatformName value can change regularly due to the AWS release schedule. It is recommended that you verify the value for this setting from your AWS account by running the following AWS CLI command: aws elasticbeanstalk list-available-solution-stacks | grep "running Tomcat 9.0 Corretto 11"
    Tip: During a manual deployment, the values for these settings are entered via prompts when running the python3 setup.py script.
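Before running Terraform, it can help to check the variables file mechanically. The following is a minimal sketch, not part of the product, that assumes every non-comment line follows the setting = "value" format described above. The function names parse_settings and blank_settings are hypothetical. It reports settings that appear more than once and settings whose value is still blank, so that you can review them against the table of properties.

```python
import re

# Matches lines of the form: setting = "value"
SETTING_RE = re.compile(r'^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*"(.*)"\s*$')


def parse_settings(text):
    """Return (settings, duplicates) parsed from variables-file text."""
    settings = {}
    duplicates = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment headings such as "#Aurora".
        if not stripped or stripped.startswith("#"):
            continue
        match = SETTING_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognised line: {line!r}")
        name, value = match.groups()
        if name in settings:
            duplicates.append(name)
        settings[name] = value
    return settings, duplicates


def blank_settings(settings):
    """List the settings whose value is empty or only whitespace."""
    return sorted(name for name, value in settings.items() if not value.strip())


# Illustrative input only; the values are not a recommendation.
sample = '''
#EB
ebMaxInstanceCount = "4"
ebMinInstanceCount = "2"
ebKeyPairName = " "
ebMinInstanceCount = "2"
'''
settings, duplicates = parse_settings(sample)
print(duplicates)                # ['ebMinInstanceCount']
print(blank_settings(settings))  # ['ebKeyPairName']
```

A blank value is not necessarily an error, because some settings may legitimately be left as " ". The list is only a prompt to review them.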
    The following table provides additional information about each of the properties:
    Property Description
    awsPrimaryRegion

    The AWS primary region, for example:

    awsPrimaryRegion = "us-east-1"

    primaryRegion

    The AWS primary region, for example:

    primaryRegion = "us-east-1"

    awsAccessKey

    The AWS access key that will be used for deployment, for example:

    awsAccessKey = "AKIAIOSFODNN7EXAMPLE"

    This must have Admin access for the duration of the installation. For more information, see the AWS documentation, for example at: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html

    awsSecretKey

    The AWS secret key that will be used for deployment, for example:

    awsSecretKey = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"

    This must have Admin access for the duration of the installation. For more information, see the AWS documentation, for example at: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html

    awsSecondaryRegion

    The AWS secondary region, for example:

    awsSecondaryRegion = "us-east-2"

    domainName

    The domain name for the installation, for example:

    domainName = "infogix.com"

    vpcId

    The VPC ID of the Virtual Private Cloud to install into, for example:

    vpcId = "vpc-6hj35709"

    privateSubnet1Id

    The subnet ID for the first private subnet, for example:

    privateSubnet1Id = "subnet-07b3d0bf6h8op62ed"

    privateSubnet2Id

    The subnet ID for the second private subnet, for example:

    privateSubnet2Id = "subnet-00ecf411w465df7ef"

    deploymentId

    The unique deployment ID associated with the AWS IAM user or AWS account, for example:

    deploymentId = "dev"

    primaryS3Bucket

    The name for the S3 bucket to be used for the installation.

    availabilityZone1

    The name of the first Availability Zone for the Aurora cluster.

    availabilityZone2

    The name of the second Availability Zone for the Aurora cluster.

    localPathToEmrKey

    The local path to the .pem file that can be used to log in to the EMR primary server. This must be a local path on the VM from which the installation is initiated.

    localPathToEc2Key

    The path to the Elastic Beanstalk key pair on the VM, for example:

    localPathToEc2Key = "/home/ec2-user/engineering.pem"

    s3KmsKey

    The ARN of the KMS key used for S3.

    auroraPostgresPassword

    The password for the Amazon Aurora account, for example:

    auroraPostgresPassword = "Password1"

    auroraDbInstanceClass

    The Amazon Aurora database instance class. The default value is:

    auroraDbInstanceClass = "db.r5.xlarge"

    For more information, see the AWS documentation, for example at:

    https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Concepts.DBInstanceClass.html

    auroraBackupRetentionPeriod

    The number of days to retain backup data. The default value is 30 days:

    auroraBackupRetentionPeriod = "30"

    auroraPreferredBackupWindow

    The preferred time window in which to perform backups. The default value is between 04:02 AM and 04:32 AM:

    auroraPreferredBackupWindow = "04:02-04:32"

    auroraPreferredMaintenanceWindow

    The preferred time window in which to perform maintenance. The default value is on a Saturday between 02:00 AM and 02:30 AM:

    auroraPreferredMaintenanceWindow = "sat:02:00-sat:02:30"

    auroraSnapshotIdentifier

    The snapshot from which to create this cluster, if you are restoring from a snapshot. You can use either the name or the ARN when specifying a DB cluster snapshot, or the ARN when specifying a DB snapshot.

    auroraSecurityGroupId

    The security group ID to use for the Aurora database cluster.

    rdsSubnetGroupName

    The subnet group name to be used for the Aurora database cluster.

    postgresKmsKeyArn

    The ARN of the KMS key to be used for Aurora Postgres encryption.

    postgresEngineVersion

    The version of the Aurora engine to use. This setting is optional. If it is not included in the variables file, the default of 11.9 is used.

    redshiftEnabled

    Determines whether to install the Redshift environment. If the value is "true", the script creates the Redshift environment. If the value is "false", Redshift creation is skipped.

    redshiftMasterPassword

    The password for the Amazon Redshift database, for example:

    redshiftMasterPassword = "Password1"

    redshiftClusterSubnetGroupName

    The name of the subnet group to use for the Redshift cluster.

    redshiftSecurityGroupId

    The ID of the security group to use for Redshift access.

    redshiftIamRoleArn

    The ARN of the IAM role to use for Redshift access.

    redshiftStandardSnapshotIdentifier

    The name of the snapshot from which to create the new Standard Redshift cluster if restoring from an existing DB.

    redshiftHpSnapshotIdentifier

    The name of the snapshot from which to create the new High Performance Redshift cluster if restoring from an existing DB.

    ebOrK8s

    A flag that selects either an Elastic Beanstalk or a Kubernetes deployment. If ebOrK8s = "0", Elastic Beanstalk is deployed. If ebOrK8s = "1", the Elastic Beanstalk deployment is skipped and the configuration required for a Kubernetes deployment is applied instead.

    publicSubnet1Id

    If you are using public subnets, enter the first subnet ID. Otherwise, leave the value as " ".

    publicSubnet2Id

    If you are using public subnets, enter the second subnet ID. Otherwise, leave the value as " ".

    elasticBeanstalkPlatformName

    The Elastic Beanstalk platform name, for example:

    elasticBeanstalkPlatformName = "64bit Amazon Linux 2 v4.1.3 running Tomcat 9.0 Corretto 11"

    The elasticBeanstalkPlatformName value can change regularly due to the AWS release schedule. It is recommended that you verify the value for this setting from your AWS account by running the following AWS CLI command: aws elasticbeanstalk list-available-solution-stacks | grep "running Tomcat 9.0 Corretto 11"

    ebMaxInstanceCount

    The maximum number of EC2 instances in your Elastic Beanstalk environment. The default value is:

    ebMaxInstanceCount = "4"

    ebMinInstanceCount

    The minimum number of EC2 instances in your Elastic Beanstalk environment. The default value is:

    ebMinInstanceCount = "2"

    ebInstanceType

    The Elastic Beanstalk instance type. The default value is:

    ebInstanceType = "c5.2xlarge"

    ebKeyPairName

    The name of the AWS key pair for the Elastic Beanstalk (EB) instance, for example:

    ebKeyPairName = "engineering"

    This can be the same as the emrEc2KeyName, or it can be different.

    ebAppCount

    Determines whether to install the Elastic Beanstalk environments into an existing application called dqplus. If you already have a Data360 DQ+ installation in AWS, the value should be 0. If this is your first implementation of Data360 DQ+, the value should be 1.

    The default value is: ebAppCount = "1"

    ebInstanceProfileArn

    The ARN of the instance profile to assign to Elastic Beanstalk instances.

    ebSecurityGroupIds

    The security group IDs, separated by commas, to apply to Elastic Beanstalk instances.

    ebCustomAmiId

    The AMI ID to use for Elastic Beanstalk instances if you are using a custom image.

    If you are not using a custom image, do not add this setting to the file.

    emrEc2KeyName

    The name of the AWS key pair for the Elastic Map Reduce (EMR) instance, for example:

    emrEc2KeyName = "engineering"

    This can be the same as the ebKeyPairName, or it can be different.

    emrCoreInstanceCount

    The number of EMR core instances. The default value is:

    emrCoreInstanceCount = "3"

    emrCoreInstanceType

    The EMR core node type. Infogix supports the r5d instance range for this setting.

    The default value is:

    emrCoreInstanceType = "r5d.xlarge"

    emrTaskInstanceCount

    The number of task nodes in your EMR cluster. The default value is:

    emrTaskInstanceCount = "1"

    emrTaskInstanceType

    The EMR task node type. Infogix supports the r5d instance range for this setting.

    The default value is:

    emrTaskInstanceType = "r5d.xlarge"

    emrMasterInstanceCount

    The number of EMR primary instances.

    The default value is: emrMasterInstanceCount = "3"

    emrMasterInstanceType

    The EMR primary node type. Infogix supports the r5d instance range for this setting.

    The default value is:

    emrMasterInstanceType = "r5d.xlarge"

    emrKmsKeyArn

    The ARN of the KMS key to use for EMR.

    emrMasterSecurityGroupId

    The ID of the security group to use for the EMR primary instance.

    emrSlaveSecurityGroupId

    The ID of the security group to use for the EMR secondary instances.

    emrServiceAccessSecurityGroupId

    The ID of the Amazon EC2 service-access security group. This is required when the cluster runs on a private subnet.

    emrInstanceProfileArn

    The ARN of the instance profile to assign to EMR instances.

    emrIamRoleArn

    The ARN of the IAM role to use for EMR access.

    emrCustomAmiId

    The AMI ID to use for EMR instances if you are using a custom image.

    If you are not using a custom image, do not add this setting to the file.

    sslConnectionStatus

    A flag that determines whether SSL is configured. The default value is: sslConnectionStatus = "false" (HTTP).

    httpsListenerEnable

    Determines whether Elastic Beanstalk enables the HTTPS listener on the Application Load Balancer. The default value is: httpsListenerEnable = "false" (because sslConnectionStatus defaults to "false").

    httpListenerEnable

    Determines whether Elastic Beanstalk enables the HTTP listener on the Application Load Balancer. The default value is: httpListenerEnable = "true" (because sslConnectionStatus defaults to "false").

    sslCertificateArn

    The ARN of the SSL certificate that Elastic Beanstalk uses. The default value is: sslCertificateArn = "" (because sslConnectionStatus defaults to "false", which gives an HTTP configuration).
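The relationship between the four SSL settings, and the meaning of ebOrK8s, can be checked before deployment. The sketch below is illustrative only and is not part of the product. It takes the settings as a plain dictionary of strings, and the function names check_ssl_settings and deployment_target are hypothetical. It also assumes that any sslConnectionStatus value other than "true" means that SSL is disabled.

```python
def check_ssl_settings(settings):
    """Return a list of problems found in the SSL-related settings."""
    problems = []
    # Assumption: anything other than "true" is treated as SSL disabled.
    ssl_on = settings.get("sslConnectionStatus", "").strip().lower() == "true"
    expected = {
        "httpsListenerEnable": "true" if ssl_on else "false",
        "httpListenerEnable": "false" if ssl_on else "true",
    }
    for name, want in expected.items():
        got = settings.get(name, "").strip().lower()
        if got != want:
            problems.append(f'{name} should be "{want}" but is "{got}"')
    # A certificate ARN is only needed when SSL is enabled.
    if ssl_on and not settings.get("sslCertificateArn", "").strip():
        problems.append("sslCertificateArn must be set when SSL is enabled")
    return problems


def deployment_target(settings):
    """Translate the ebOrK8s flag into the deployment type it selects."""
    value = settings.get("ebOrK8s", "").strip()
    if value == "0":
        return "Elastic Beanstalk"
    if value == "1":
        return "Kubernetes"
    raise ValueError(f'ebOrK8s must be "0" or "1", not {value!r}')


# Illustrative values only.
example = {
    "sslConnectionStatus": "true",
    "httpsListenerEnable": "true",
    "httpListenerEnable": "false",
    "sslCertificateArn": "arn:aws:acm:example",
    "ebOrK8s": "0",
}
print(check_ssl_settings(example))  # []
print(deployment_target(example))   # Elastic Beanstalk
```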
  2. Name the variables file dqplus.tfvarsvars.auto.tfvars and save it in the infra/aws/config directory.

  3. Create a new variables file called dqplus.properties and add the following property:

    OVERRIDES_FOLDER=<path to OVERRIDES_FOLDER>

    For example: OVERRIDES_FOLDER=/tmp/files/

  4. From the infra/aws/config directory, run the following commands:

    terraform plan --target=module.buckets --var-file=dqplus.tfvarsvars.auto.tfvars --out=bucketplan

    terraform apply bucketplan

  5. Upload the emrBootstrap.sh and emrBootstrap2.sh files to the newly created bucket in the following location:

    /sparkshared/bootstrap/emrBootstrap.sh and /sparkshared/bootstrap/emrBootstrap2.sh.
    Tip: You can find the emrBootstrap.sh file in the config directory.
    Note: If you are using a custom EMR Bootstrap script, upload that one, not the file from the config directory. See step 3 of Creating the infrastructure for more details.
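One way to perform the upload is with the AWS CLI command aws s3 cp. The following sketch only builds the two commands so that you can review them before running them. The function name bootstrap_upload_commands and the bucket name example-dqplus-bucket are hypothetical, and the sketch assumes that both scripts are in the same local directory.

```python
import shlex

# Destination prefix inside the bucket, as given in the step above.
BOOTSTRAP_PREFIX = "sparkshared/bootstrap"


def bootstrap_upload_commands(bucket, local_dir="."):
    """Build one 'aws s3 cp' command per bootstrap script."""
    commands = []
    for name in ("emrBootstrap.sh", "emrBootstrap2.sh"):
        source = f"{local_dir.rstrip('/')}/{name}"
        target = f"s3://{bucket}/{BOOTSTRAP_PREFIX}/{name}"
        commands.append(["aws", "s3", "cp", source, target])
    return commands


# Print the commands for review instead of executing them.
for argv in bootstrap_upload_commands("example-dqplus-bucket"):
    print(shlex.join(argv))
```

If you use a custom bootstrap script, pass the directory that contains it as local_dir.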
  6. Upload any self-signed certificates that you need to install to the newly created bucket in the following location:

    /sparkshared/certs/

  7. From the infra/aws/config directory, run the following commands:

    terraform plan --var-file=dqplus.tfvarsvars.auto.tfvars --out=plan

    terraform apply plan

    Note: Running terraform apply plan will create the infrastructure and will incur costs. Before executing this command, it is recommended that you first verify what will be created by running terraform show plan.
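The plan, review and apply sequence can be wrapped so that the apply never runs without an explicit confirmation. This is a minimal sketch under stated assumptions, not the product's own tooling. The function names are hypothetical, the variables file name is passed in as a parameter, and the script must be run from the infra/aws/config directory.

```python
import subprocess


def full_deployment_commands(var_file, plan_file="plan"):
    """Return the plan, show and apply commands as argument lists."""
    return [
        ["terraform", "plan", f"--var-file={var_file}", f"--out={plan_file}"],
        ["terraform", "show", plan_file],
        ["terraform", "apply", plan_file],
    ]


def run_with_confirmation(commands, confirm=input, run=subprocess.run):
    """Run plan and show, then ask before running the final apply."""
    *review, apply_cmd = commands
    for argv in review:
        run(argv, check=True)
    answer = confirm("Apply this plan? Costs will be incurred [y/N]: ")
    if answer.strip().lower() != "y":
        return False  # Nothing was created.
    run(apply_cmd, check=True)
    return True


# Example usage, with the variables file name from step 2:
# run_with_confirmation(full_deployment_commands("<variables file>"))
```

Passing confirm and run as parameters keeps the sketch easy to dry-run, because you can substitute functions that only record the commands.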
  8. From the infra/aws/config directory, run the following commands:

    For EB deployment:

    python3 ./properties.py

    python3 ./password.py

    For K8s deployment:

    python3 ./properties.py

    python3 ./password.py

    python3 ./properties-ecr.py

    python3 ./password-k8s.py

The properties.py script parses data from the output of Terraform and creates a new properties file required by the installer to install the product.

The password.py script parses data from the output of Terraform and creates a new file containing sensitive information required by the installer.

The properties-ecr.py script parses data from the output of Terraform and creates a new properties file required by the installer for the Docker container registry (AWS ECR).

The properties-k8s.py script parses data and edits the properties required by the installer for a Kubernetes deployment.
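The choice of scripts in step 8 can be expressed as a small lookup. The sketch below is an illustration only. It assumes that the scripts to run follow the ebOrK8s value set in the variables file ("0" for Elastic Beanstalk, "1" for Kubernetes), and it uses the script names exactly as they are listed in step 8. The function names are hypothetical.

```python
# Script lists as given in step 8, in the order in which they are run.
EB_SCRIPTS = ["properties.py", "password.py"]
K8S_SCRIPTS = EB_SCRIPTS + ["properties-ecr.py", "password-k8s.py"]


def post_deployment_scripts(eb_or_k8s):
    """Return the scripts to run for the given ebOrK8s value."""
    if eb_or_k8s == "0":
        return list(EB_SCRIPTS)
    if eb_or_k8s == "1":
        return list(K8S_SCRIPTS)
    raise ValueError(f'ebOrK8s must be "0" or "1", not {eb_or_k8s!r}')


def script_commands(eb_or_k8s):
    """Build the python3 command for each script, without running it."""
    return [["python3", f"./{name}"] for name in post_deployment_scripts(eb_or_k8s)]


for argv in script_commands("1"):
    print(" ".join(argv))
```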