Description of architecture - Data360_DQ+ - 11.X

Data360 DQ+ Enterprise Installation

Product type: Software
Portfolio: Verify
Product family: Data360
Product: Data360 DQ+
Version: 11.X
Language: English
Product name: Data360 DQ+
Title: Data360 DQ+ Enterprise Installation
Copyright: 2024
First publish date: 2016
Last edition: 2024-06-06
Last publication: 2024-06-06T12:37:34.761477

Application

Application refers to the Data360 DQ+ web application servers, the components that allow clients to access Data360 DQ+ via a web browser.

In the diagram, Application components are labeled as App Servers. The diagram shows n App Servers running on n nodes, indicating that you can create as many App Servers as you need, with a minimum of two. Generally, the more users making requests to the system, the more App Servers are required. Specific configuration examples using two App Servers are provided in Performing installation.

Amount allowed: 2 to n

Load Balancer

The Load Balancer is a component that runs HTTPD to balance the load across the multiple Application Servers used by Data360 DQ+. It receives client requests before they reach an Application Server and decides which Application Server each request should be routed to, depending on which server can most efficiently serve the client.

In the diagram, the Load Balancer is shown connected to each Application Server. In a typical configuration, the Load Balancer resides on the system’s Maintenance node.
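The routing decision described above can be illustrated with a minimal Python sketch. It assumes a "fewest active requests" policy purely for illustration; the actual balancing method used by HTTPD in a Data360 DQ+ installation is not stated here, and the `AppServer` class, `pick_server`, `route`, and the server names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AppServer:
    """Hypothetical stand-in for one Data360 DQ+ App Server node."""
    name: str
    active_requests: int = 0


def pick_server(servers):
    """Return the server currently handling the fewest requests.

    Ties are broken by list order, because min() returns the first minimum.
    This policy is an assumption, not the documented HTTPD behavior.
    """
    if not servers:
        raise ValueError("at least one App Server is required")
    return min(servers, key=lambda s: s.active_requests)


def route(servers):
    """Choose a server for a new request and record the extra load."""
    target = pick_server(servers)
    target.active_requests += 1
    return target.name


# Two App Servers, matching the minimum allowed configuration.
servers = [AppServer("app1"), AppServer("app2")]
assignments = [route(servers) for _ in range(4)]
print(assignments)
```

With two idle servers and no requests completing, this policy alternates between them, which is the simplest form of the balanced load the Load Balancer is meant to provide.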

Amount allowed: 1

ApplicationDb

From a functional perspective, ApplicationDb is the container that holds all Data360 DQ+ definitions that need to be persisted, that is, items created by users of Data360 DQ+, including Pipelines, Paths, and other Data Stages. ApplicationDb does not, however, store the data associated with a definition. That is the responsibility of the Compute cluster (Compute primary and Compute secondary).

In the diagram, ApplicationDb is shown connected to all App Servers, as every server must be able to access definitions when a user requests them.

Currently, only one ApplicationDb component is supported per instance of Data360 DQ+, and it should typically reside on the system’s Maintenance node. This is detailed in Performing installation.

Amount allowed: 1

Compute Cluster

The terms "Compute primary and Compute secondary" and "Compute Cluster" both refer to a Hadoop cluster or Google Dataproc cluster to which you can point the system. This cluster performs Analysis processing for Data360 DQ+.

When used with Hadoop, HDFS serves as the repository for all of the data that Data360 DQ+ users work with. When used with Google Dataproc, Google Cloud Storage (GCS) serves as the repository for all data that Data360 DQ+ users work with.
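The pairing of cluster type and data repository described above can be written down as a small lookup. This is only an illustrative Python sketch; the `ClusterType` enum and the `data_repository` function are hypothetical names and do not correspond to any Data360 DQ+ configuration setting.

```python
from enum import Enum


class ClusterType(Enum):
    """The two Compute cluster options named in the text."""
    HADOOP = "hadoop"
    DATAPROC = "dataproc"


# Mapping taken from the text: Hadoop uses HDFS, Google Dataproc uses GCS.
DATA_REPOSITORY = {
    ClusterType.HADOOP: "HDFS",
    ClusterType.DATAPROC: "Google Cloud Storage (GCS)",
}


def data_repository(cluster_type):
    """Return the repository that holds user data for the given cluster type."""
    return DATA_REPOSITORY[cluster_type]


for cluster in ClusterType:
    print(f"{cluster.value}: {data_repository(cluster)}")
```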

ComputeDb Nodes

ComputeDb refers to the component that loads data into Data360 DQ+ Data Views and processes queries made by Data360 DQ+ Dashboards. This component includes Vertica, an application designed to perform Big Data analytics.

In the diagram, ComputeDb is shown in the ComputeDb cluster, with n nodes. Like ApplicationDb and the Compute cluster, the ComputeDb cluster must be connected to all App Servers, as every server must be able to initiate visualization processing when a user requests it. In a typical configuration, the ComputeDb nodes are distributed across all machines. This is detailed in Performing installation.

Generally, the number of ComputeDb nodes you need depends on how extensively you use Data360 DQ+ for data visualization. If you intend to use Dashboards that perform resource-intensive computations, or whose data is updated on a regular basis, adding more ComputeDb nodes will increase system performance. On the other hand, if you are using Data360 DQ+ primarily for other purposes, such as data analysis, you will want to save space for Compute cluster components instead.

Amount allowed: n Nodes
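The "Amount allowed" values listed for each component can be collected into a simple planning check. The following Python sketch is illustrative only: the `Topology` class and `validate` function are hypothetical, and treating "n Nodes" as "at least one node" is an assumption, since the text does not state a minimum for ComputeDb.

```python
from dataclasses import dataclass


@dataclass
class Topology:
    """Hypothetical summary of a planned Data360 DQ+ deployment."""
    app_servers: int
    load_balancers: int
    application_dbs: int
    computedb_nodes: int


def validate(topology):
    """Return a list of violations of the documented component counts."""
    problems = []
    if topology.app_servers < 2:
        problems.append("App Servers: 2 to n allowed")
    if topology.load_balancers != 1:
        problems.append("Load Balancer: exactly 1 allowed")
    if topology.application_dbs != 1:
        problems.append("ApplicationDb: exactly 1 allowed")
    # Assumption: "n Nodes" is read as one or more ComputeDb nodes.
    if topology.computedb_nodes < 1:
        problems.append("ComputeDb: at least 1 node assumed")
    return problems


plan = Topology(app_servers=2, load_balancers=1,
                application_dbs=1, computedb_nodes=3)
print(validate(plan))
```

An empty list means the plan is consistent with the counts given in this section; it says nothing about sizing, which depends on user load and Dashboard usage as described above.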