The Execute Query in DB node allows you to take the contents of a data set that you have placed in a Data Store and use those contents to query an external data base. Results from the query are then returned to the node, where they can be further used in your Analysis.
Details tab
The Details tab is used to construct your SQL query and to specify how (and what) incoming data should be loaded.
Edit SQL
The Edit SQL button allows you to construct a query that will be executed in the external database that the node is configured to point to. This query will be made when the Analysis executes.
When using field names from incoming nodes, the following syntax should be used:
${fieldName}
As an example of using the Edit SQL feature, consider the following data set:
TableName |
---|
ACME_RECON_FILES_CDATA |
ACMECORP |
ACME_PROMO_CDATA |
Table Names Data Set
To use the contents from each row in the TableName field within a query, you would refer to the TableName field as ${TableName}. When the query executed, it would then run the query for each row in the dataset, substituting the TableName field value for ${TableName}.
For example, with the data set above, the query:
SELECT * FROM ${TableName}
Would run the following 3 queries during execution.
SELECT * FROM ACME_RECON_FILES_CDATA
SELECT * FROM ACMECORP
SELECT * FROM ACME_PROMO_CDATA
As discussed below, further modification of the query and the creation of Output fields in the Outputs tab would allow you to return the results of the query to the node.
Edit Sample SQL
Like the Edit SQL option, the Edit Sample SQL option allows you to write queries using the contents of fields from incoming nodes. The difference between the two, however, is that Edit Sample SQL only uses content from the Analysis sample (i.e., the values shown in the Analysis' sheets). Edit SQL, on the other hand, is queried against the entire database and is what is run during the Analysis' execution.
Effectively, this means: 1) that if you write an Edit Sample SQL script, its results will be shown in the Execute Query in DB node's sheet during Analysis editing; 2) if no Edit Sample SQL script is written, the results of the Edit SQL script will be shown in the sample sheet instead; 3) and, regardless of whether an Edit Sample SQL script is used, the Edit SQL script will always be used during execution.
Array Field to Process
The Execute Query in DB node can also process up to one incoming array field, and use the contents of that field in its queries. Incoming array fields may come from other Analysis nodes that output them, such as a Co-Group node or a Script node.
Once the array field is selected, it can be used in queries just like other fields, using the same ${fieldName} syntax.
Data Load Range
AllUses all data in the table, regardless of any last update time stamp.
New Data Since Last ExecutionOnly uses data that has been added to the database table since the last time the Analysis ran.
If you would like your query to only pull new data since the last time the analysis executed, you will also need to include one of the following in a WHERE clause:
WHERE <fieldContainingLastUpdateTime> BETWEEN ${fromTimestamp} AND ${toTimestamp}
WHERE <fieldContainingLastUpdateTime> >= ${fromTimestamp} and <fieldContainingLastUpdateTime> < ${toTimestamp}
Additionally, <fieldContainingLastUpdateTime> must be in the UTC time zone or converted to the UTC time zone within the query.
In these queries, <fieldContainingLastUpdateTime> refers to a field in your database table that contains last update time stamps. ${fromTimestamp} and ${toTimestamp} refer to system fields that allow the Execute Query in DB node to calculate the Data Load range, based on the setting you choose in the Data Load Range dropdown menu. After adding the WHERE clause to your query, you should make the New Data Since Last Execution selection.
Isolate each record for processing
If checked, the Execute Query in DB node will open a database connection for each record that is processed. If unchecked, the node will create one connection per data partition.
Database
The Database tab is used to connect to an external database. The database specified here is where the query created in the Details tab will execute.
Outputs
The Outputs tab is used to create fields that will hold the results of queries made in the Details tab. Output fields can then be referenced in your SQL query using the AS keyword.
For example, consider the following dataset:
TableName |
---|
ACME_RECON_FILES_CDATA |
ACMECORP |
ACME_PROMO_CDATA |
Table Names Data Set
To find the row count of each table that is named in this data set, you could first use the Outputs tab to create a New Field called ROWCOUNT.
Within the Details tab, you could then write the following query:
SELECT COUNT(*) AS ROWCOUNT FROM ${TableName}
The results of this query, displayed in the Execute Query in DB node's sheet, would simply be the number of rows contained in each table from the TableName data set. This could, for example, look something like this:
ROWCOUNT |
---|
4 |
91 |
18 |
Table Names Query Result Set
New Field vs. Input Field
As with other Analysis nodes and features throughout the system, the Outputs tab will let you add either New Fields or Input Fields.
A New Field is a field that will be newly created by the result of the node's query (referred to in the SQL script with an AS operator). You must create a New Field for every field you wish to create with your query.
An Input Field, on the other hand, refers to fields that are entering the Execute Query in DB node from other nodes. If you would like to use one of these incoming fields in your SQL script, you must create an Input field for it.