The Convert to Micro Batch node converts streaming data to batched data.
The input for this node must come from another Streaming node. Streaming nodes feed data into the application in real time from a streaming source, such as a Kafka topic. You can use this node as a bridge between the incoming streaming data and an Analysis node that requires a batch input for further processing. You can define the fixed interval at which the batched data is processed.
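Conceptually, the node buffers incoming records and emits them as one batch per fixed interval. The following Python sketch illustrates that grouping on a pre-timestamped list of records; it is a minimal illustration of the idea, not the product's implementation, and the function name is invented for the example.

```python
def to_micro_batches(records, interval_ms):
    """Group (timestamp_ms, payload) records into fixed-interval batches.

    Records are assumed to arrive in timestamp order. Each batch covers
    one interval, mirroring the node's Batch Interval property.
    """
    batches = []
    current = []
    window_end = None
    for ts, payload in records:
        if window_end is None:
            window_end = ts + interval_ms
        while ts >= window_end:  # close every window that has elapsed
            batches.append(current)
            current = []
            window_end += interval_ms
        current.append(payload)
    if current:
        batches.append(current)
    return batches

# Three records with a 50 ms interval: the first two fall in one window,
# the third lands in the next.
batches = to_micro_batches([(0, "a"), (10, "b"), (60, "c")], interval_ms=50)
```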
Example
An Analysis contains a Streaming Data Store node that pulls account data from a Kafka topic:
accountID | dateOfJoin | name | accountType |
---|---|---|---|
52145697 | 0/29/2004 | Smith | Savings |
82225766 | 7/22/2003 | Green | Checking |
97145123 | 9/12/2018 | Mohan | Savings |
You want to use a Filter node to remove any invalid date records. As the Filter node requires a batch input, the data is first passed from the Streaming Data Store node to a Convert to Micro Batch node.
The Batch Interval property is set to 50000 milliseconds, and the Output Mode property is set to Append.
The Convert to Micro Batch node then feeds the data into a Filter node which has the following script in the Filter Expression property:
ISDATE(dateOfJoin, 'MM/dd/yyyy')
In this example, the first record is removed from the data set due to the invalid date value of 0/29/2004 in the dateOfJoin field.
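The ISDATE expression above uses a Java-style date pattern ('MM/dd/yyyy'). The same check can be sketched in Python, where the corresponding strptime format is '%m/%d/%Y'; the is_date helper below is an illustrative analogue, not the product's built-in function.

```python
from datetime import datetime

def is_date(value, fmt="%m/%d/%Y"):
    """Return True if value parses as a date in the given format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False

rows = [
    {"accountID": 52145697, "dateOfJoin": "0/29/2004", "name": "Smith"},
    {"accountID": 82225766, "dateOfJoin": "7/22/2003", "name": "Green"},
    {"accountID": 97145123, "dateOfJoin": "9/12/2018", "name": "Mohan"},
]

# The 0/29/2004 record is dropped: 0 is not a valid month.
valid = [r for r in rows if is_date(r["dateOfJoin"])]
```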
Properties
Display Name
Specify the name of the node that is displayed on the Analysis Designer canvas.
The default value is Convert to Micro Batch.
Batch Interval
Specify the period of time over which to collect data. Enter a numeric value and select a unit of time. Choose from:
- Milliseconds
- Seconds
- Minutes
- Hours
The default value is Milliseconds.
Output Mode
Select an output mode. Choose from:
- Append - Adds new rows only. Append mode supports queries where the added rows will never change, for example, select, filter, or join queries.
- Update - Outputs rows that were updated since the last trigger.
- Complete - Outputs all rows after every trigger. Complete mode supports aggregation queries.
The default value is Append.
The following table lists the streaming query types that are supported by each output mode:

Query type | Supported output mode |
---|---|
Queries with aggregation | Update, Complete |
Queries with joins | Append |
Other queries | Append, Update |
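The difference between the three output modes can be sketched with a running count per key. The snippet below is a simplified model of the semantics, assuming each batch is a list of keys; it is not the engine's actual behavior.

```python
def run_batches(batches, mode):
    """Illustrate Append, Update, and Complete semantics for a running count.

    Append emits only rows seen for the first time, Update emits rows whose
    count changed in the current batch, and Complete emits the full state
    after every batch.
    """
    state = {}
    emitted = []
    for batch in batches:
        changed = {}
        for key in batch:
            is_new = key not in state
            state[key] = state.get(key, 0) + 1
            if mode == "append" and is_new:
                changed[key] = state[key]
            elif mode == "update":
                changed[key] = state[key]
        emitted.append(dict(state) if mode == "complete" else changed)
    return emitted
```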
Trigger Once
Select Trigger Once to specify that the node should pull data from a streaming source only once, processing all available data before stopping. This option is useful when you want to periodically spin up a cluster, process everything that is available since the last period, and then shut down the cluster. In some cases, this can lead to significant cost savings.
By default, Trigger Once is not selected.
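The Trigger Once behavior (process whatever is currently available, then stop rather than wait for more data) can be sketched with a simple queue drain. The function below is an illustrative analogue, not the node's implementation.

```python
import queue

def process_available_once(source, handle):
    """Drain everything currently in the queue, then stop.

    Unlike a continuous consumer, this never blocks waiting for new items,
    mirroring the trigger-once style of processing.
    """
    processed = 0
    while True:
        try:
            item = source.get_nowait()
        except queue.Empty:
            return processed  # nothing left: stop instead of waiting
        handle(item)
        processed += 1

# Example: three items are already waiting; they are processed, then we stop.
q = queue.Queue()
for item in ("a", "b", "c"):
    q.put(item)
seen = []
count = process_available_once(q, seen.append)
```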