Convert to Micro Batch - Data360_DQ+ - Latest

Copyright: 2024
First publish date: 2016
Last edition: 2024-07-09
Last publication: 2024-07-09T15:09:58.774265

The Convert to Micro Batch node converts streaming data to batched data.

The input for this node must come from another Streaming node. Streaming nodes feed data into the application in real time from a streaming source, such as a Kafka topic. Use this node as a bridge between the incoming streaming data and an Analysis node that requires batch input for further processing. You can define the fixed interval at which the batched data is processed.

Example

An Analysis contains a Streaming Data Store node that pulls account data from a Kafka topic:

accountID   dateOfJoin   name    accountType
52145697    0/29/2004    Smith   Savings
82225766    7/22/2003    Green   Checking
97145123    9/12/2018    Mohan   Savings

You want to use a Filter node to remove any records that contain an invalid date. Because the Filter node requires a batch input, the data is first passed from the Streaming Data Store node to a Convert to Micro Batch node.

The Batch Interval property is set to 50000 milliseconds, and the Output Mode property is set to Append.

The Convert to Micro Batch node then feeds the data into a Filter node that has the following script in the Filter Expression property:

ISDATE(dateOfJoin, 'MM/dd/yyyy')

In this example, the first record is removed from the data set because the dateOfJoin value 0/29/2004 is not a valid date (the month is 0).
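The effect of this filter can be approximated outside the product. The following is a minimal Python sketch, not the DQ+ ISDATE implementation: it assumes that `datetime.strptime` with the format `%m/%d/%Y` is a reasonable stand-in for the 'MM/dd/yyyy' pattern, and the helper name `is_date` is hypothetical.

```python
from datetime import datetime

# Sample records from the example above.
records = [
    {"accountID": "52145697", "dateOfJoin": "0/29/2004", "name": "Smith", "accountType": "Savings"},
    {"accountID": "82225766", "dateOfJoin": "7/22/2003", "name": "Green", "accountType": "Checking"},
    {"accountID": "97145123", "dateOfJoin": "9/12/2018", "name": "Mohan", "accountType": "Savings"},
]


def is_date(value, fmt="%m/%d/%Y"):
    """Hypothetical helper: True if value parses as a date in the given format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


# Keep only the records whose dateOfJoin parses as a date.
valid = [r for r in records if is_date(r["dateOfJoin"])]
```

In this sketch, `valid` keeps the Green and Mohan records and drops the Smith record, matching the outcome described for the Filter node.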

Properties

Display Name

Specify the name of the node that is displayed on the Analysis Designer canvas.

The default value is Convert to Micro Batch.

Batch Interval

Specify the period of time over which to collect data. Enter a numeric value and select a unit of time. Choose from:

  • Milliseconds
  • Seconds
  • Minutes
  • Hour(s)

The default unit is Milliseconds.
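To illustrate what a fixed batch interval means, the following Python sketch groups timestamped events into consecutive windows of equal width. It is an illustration only: the names `UNIT_TO_MS`, `interval_ms`, and `micro_batches` are hypothetical, the sketch groups by a timestamp carried on each event, and it assumes windows are aligned to time zero. How the product aligns batch boundaries is not stated here.

```python
# Millisecond multipliers for the units offered by the Batch Interval property.
UNIT_TO_MS = {
    "Milliseconds": 1,
    "Seconds": 1_000,
    "Minutes": 60_000,
    "Hour(s)": 3_600_000,
}


def interval_ms(value, unit="Milliseconds"):
    """Convert a numeric value and a unit of time to milliseconds."""
    return value * UNIT_TO_MS[unit]


def micro_batches(events, interval):
    """Group (timestamp_ms, payload) events into fixed-width windows, in time order."""
    windows = {}
    for timestamp, payload in events:
        # Integer division assigns each event to the window it falls in.
        windows.setdefault(timestamp // interval, []).append(payload)
    return [windows[key] for key in sorted(windows)]


events = [(1_000, "a"), (49_999, "b"), (50_000, "c"), (120_000, "d")]
batches = micro_batches(events, interval_ms(50, "Seconds"))
```

With a 50-second interval, the events at 1,000 ms and 49,999 ms land in the first batch, the event at 50,000 ms starts the second, and the event at 120,000 ms falls in a third.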

Output Mode

Select an output mode. Choose from:

  • Append - Adds new rows. Append mode supports queries in which rows never change after they are added, for example, select, filter, or join queries.
  • Update - Outputs rows that were updated since the last trigger.
  • Complete - Outputs all rows after every trigger. Complete mode supports aggregation queries.

The default value is Append.
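The difference between the three modes can be sketched with a small result table that changes between two triggers. This is a simplified Python model, not the product's engine: the function `emit` is hypothetical, the result table is represented as a dictionary, and Append is modeled as "rows whose key was not present before".

```python
def emit(mode, previous, current):
    """Return the rows one trigger would output, given the result table before and after it."""
    if mode == "Complete":
        # Every row, whether or not it changed.
        return dict(current)
    if mode == "Update":
        # Rows that are new or whose value changed since the last trigger.
        return {k: v for k, v in current.items() if previous.get(k) != v}
    if mode == "Append":
        # Only rows that did not exist before (simplifying assumption).
        return {k: v for k, v in current.items() if k not in previous}
    raise ValueError(f"unknown output mode: {mode}")


# Hypothetical running count of accounts per accountType.
previous = {"Savings": 2, "Checking": 1}
current = {"Savings": 3, "Checking": 1, "Loan": 1}

complete_rows = emit("Complete", previous, current)
update_rows = emit("Update", previous, current)
append_rows = emit("Append", previous, current)
```

In this model, Complete returns all three rows, Update returns the changed Savings row and the new Loan row, and Append returns only the Loan row. The changed Savings count is never emitted in Append mode, which suggests why Append suits queries whose rows do not change once added.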

The following table lists the different streaming queries that are supported by each output mode:

Query type / Supported output mode

Queries with aggregation
  • All output modes are supported for aggregations that use watermarking. For aggregations that do not use watermarking, only the Update and Complete output modes are supported.
  • A Streaming SQL node with aggregation will not work if a subsequent Convert to Micro Batch node is in Append mode.

Queries with joins
  • If a Convert to Micro Batch node follows a Streaming Join node, the Output Mode property must be set to Append.

Other queries
  • Only the Append and Update output modes are supported for queries where the data is not aggregated.
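The rules in this table can be written down as a small lookup, which may help when checking a design before running it. The Python sketch below is a paraphrase of the table, not product code: the function `supported_modes` and its parameters are hypothetical, it does not model the separate Streaming SQL restriction, and giving the Streaming Join rule precedence when a query is also aggregated is an assumption that the table does not confirm.

```python
def supported_modes(aggregated, watermarked=False, follows_streaming_join=False):
    """Return the set of output modes the table lists for a given query shape."""
    if follows_streaming_join:
        # Assumption: the join rule takes precedence over the other rows.
        return {"Append"}
    if aggregated:
        if watermarked:
            return {"Append", "Update", "Complete"}
        return {"Update", "Complete"}
    # Queries where the data is not aggregated.
    return {"Append", "Update"}


plain_filter = supported_modes(aggregated=False)
unwatermarked_count = supported_modes(aggregated=True)
after_join = supported_modes(aggregated=False, follows_streaming_join=True)
```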

Trigger Once

Select Trigger Once to specify that the node pulls data from the streaming source only once, processing all available data before stopping. This option is useful when you want to periodically spin up a cluster, process everything that has become available since the previous run, and then shut down the cluster. In some cases, this can lead to significant cost savings.

By default, Trigger Once is not selected.

Note: There is no guarantee that each execution will process all available data in the streaming source when only triggered once.
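The run-then-stop pattern can be sketched as follows. This Python model is illustrative only: the list `source` and the integer `checkpoint` are hypothetical stand-ins for a streaming source and a saved position, and how the product tracks progress between executions is not described here.

```python
def run_once(source, checkpoint):
    """Process the records available at start time, then stop.

    Returns the processed records and the position to resume from next time.
    """
    end = len(source)                    # snapshot of what is available right now
    processed = source[checkpoint:end]   # everything since the previous run
    return processed, end


source = ["r1", "r2", "r3"]
first, checkpoint = run_once(source, 0)

# Records that arrive while no run is active.
source.extend(["r4", "r5"])
second, checkpoint = run_once(source, checkpoint)
```

In this sketch, the first run handles `r1` to `r3` and the second run picks up `r4` and `r5`. Any record that arrives after a run has taken its snapshot waits for the next run.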