This section gives an architectural overview of running Data360 DQ+ on AWS.
At a high level, the deployment requires a primary region and a secondary region. The following components exist within the primary region:
- Application Load Balancer - Distributes incoming traffic across the AWS Elastic Beanstalk instances.
- AWS Elastic Beanstalk - Provisions the necessary AWS resources to run Data360 DQ+, allowing users to access the application via a web browser. For more information, see the AWS documentation, for example: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/Welcome.html
- Amazon Athena - Analyzes data in your S3 storage buckets using standard SQL. For more information, see the AWS documentation, for example: https://docs.aws.amazon.com/athena/latest/ug/what-is.html
- Amazon Aurora - Contains items created by Data360 DQ+ users, including pipelines, paths and other data stages. For more information, see the AWS documentation, for example: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/CHAP_AuroraOverview.html
- Amazon Redshift Databases - Load data into Data360 DQ+ data views and process queries made by Data360 DQ+ dashboards. For more information, see the AWS documentation, for example: https://docs.aws.amazon.com/redshift/latest/mgmt/welcome.html
- Elastic MapReduce (EMR) Cluster - Allows you to run big data frameworks, such as Apache Hadoop. For more information, see the AWS documentation, for example: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-what-is-emr.html
- Amazon S3 Buckets - Cloud-based data storage. For more information, see the AWS documentation, for example: https://docs.aws.amazon.com/AmazonS3/latest/dev/Welcome.html
The secondary region contains a replica of the data stored in Amazon S3 in the primary region.
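As an illustration of the primary-to-secondary copy, the following is a minimal sketch that builds an S3 replication configuration. It assumes the copy is implemented with S3 Cross-Region Replication, which this overview does not confirm. The role ARN, bucket ARN, and rule ID are hypothetical placeholders, not values used by Data360 DQ+.

```python
import json


def build_replication_config(role_arn, destination_bucket_arn, rule_id="replicate-all"):
    """Return a replication configuration that copies every object.

    Assumption: replication is done with S3 Cross-Region Replication.
    All identifiers passed in are supplied by the caller.
    """
    return {
        "Role": role_arn,  # IAM role that S3 assumes to replicate objects
        "Rules": [
            {
                "ID": rule_id,
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {},  # empty filter: the rule applies to all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": destination_bucket_arn},
            }
        ],
    }


# Hypothetical identifiers, for illustration only.
config = build_replication_config(
    role_arn="arn:aws:iam::123456789012:role/example-replication-role",
    destination_bucket_arn="arn:aws:s3:::example-dqplus-secondary",
)
print(json.dumps(config, indent=2))
```

A dictionary of this shape can be passed as the ReplicationConfiguration argument of the boto3 S3 client's put_bucket_replication call on the primary-region bucket. Replication requires versioning to be enabled on both the source and destination buckets.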
See the architecture diagram below for an overview of how these components fit together.
Data360 DQ+ Architecture with Elastic Beanstalk