Using iteration with an embedded flow - 23.1

Spectrum Dataflow Designer Guide

Version: 23.1
Language: English
Product name: Spectrum Technology Platform
Title: Spectrum Dataflow Designer Guide
First publish date: 2007
Last updated: 2024-05-09
Published on: 2024-05-09T23:01:03.226155

Iteration settings specify how an embedded flow should process incoming records. By default, an embedded flow processes each record individually, just as any other stage in the flow would. If you use iteration, you can process groups of records together, which is useful for performing comparisons or calculations based on groups of records rather than the entire set of input data. You can also use iteration to set stage options based on the data in each record.

There are two kinds of iteration: per-record iteration and per-group iteration. In per-record iteration, an embedded flow processes one record at a time and the result is sent along to the next stage following the embedded flow. Per-record iteration is useful if you want to set stage options on a record-by-record basis using field values.

In per-group iteration, records are grouped by a key field and the embedded flow processes each group. All the records in a group are processed in one iteration, then the group is written to the next stage following the embedded flow. Use per-group iteration to perform processing on groups of related records, as well as to set stage options to use when processing the group of records. For example, you might want to group records by customer ID so that you can perform an analysis of each customer's records, perhaps to determine which store each customer visits most often.
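The customer example above can be sketched in plain Python to show what "one iteration per group" means. This is only an illustration of the grouping idea, not Spectrum Technology Platform code: the field names `customer_id` and `store`, the sample data, and the function name are all assumptions made for the sketch.

```python
from collections import Counter, defaultdict

# Hypothetical input records; the field names are assumptions for illustration.
visits = [
    {"customer_id": "C1", "store": "Downtown"},
    {"customer_id": "C2", "store": "Airport"},
    {"customer_id": "C1", "store": "Downtown"},
    {"customer_id": "C1", "store": "Mall"},
    {"customer_id": "C2", "store": "Airport"},
]


def most_visited_store_per_customer(records):
    """Group records by the key field, then analyze each group as a unit."""
    groups = defaultdict(list)
    for record in records:
        groups[record["customer_id"]].append(record["store"])

    # One pass of this comprehension per key stands in for one per-group iteration.
    return {
        customer_id: Counter(stores).most_common(1)[0][0]
        for customer_id, stores in groups.items()
    }


result = most_visited_store_per_customer(visits)
print(result)  # {'C1': 'Downtown', 'C2': 'Airport'}
```

In per-record iteration, by contrast, each of the five records would be handled on its own, so no calculation could span a customer's full set of visits.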

You should consider the impact on performance when using iteration. Each time a new iteration starts, there is some overhead during the initialization of the embedded flow, and this overhead can be significant, especially if you have embedded flows within other embedded flows. For example, if an embedded flow iterates 1,000 times and it contains within it another embedded flow that also iterates 1,000 times, the total number of iterations would be 1,000,000. Using per-record iteration has a more significant impact on performance since each record kicks off a new iteration.
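The arithmetic behind the nested example can be checked with a short Python sketch. The per-iteration startup cost used here (5 ms) is a purely hypothetical figure chosen to make the scale visible; the actual overhead depends on your flow and environment and is not stated in this guide.

```python
def nested_iterations(outer_iterations, inner_iterations_per_outer):
    """Each outer iteration restarts the inner embedded flow, so the counts multiply."""
    return outer_iterations * inner_iterations_per_outer


def startup_overhead_ms(iterations, overhead_per_iteration_ms):
    """Total initialization time if every iteration pays the same startup cost."""
    return iterations * overhead_per_iteration_ms


total = nested_iterations(1_000, 1_000)
print(total)  # 1000000

# Hypothetical startup cost of 5 ms per iteration (an assumption, not a measured value).
overhead_ms = startup_overhead_ms(total, 5)
print(overhead_ms / 1000, "seconds of startup overhead")
```

Because the counts multiply, reducing the number of iterations at either level, for example by iterating per group instead of per record, reduces the total proportionally.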

  1. Create an embedded flow containing the stage or stages that you use for iteration.
    Note: There are some limitations to what can be included in embedded flows that have iteration enabled:
    • The Stream Combiner stage cannot be the first stage in an embedded flow that has iteration enabled.
    • The embedded flow cannot contain a sink that writes to a file located on the client. Sinks inside an embedded flow must write to a file on the Spectrum Technology Platform server or on a file server.
  2. Double-click the embedded flow icon.
  3. Check the Enable iteration check box.
  4. If there is more than one input channel connected to the embedded flow, use the Port field to choose the port whose records you want to use to drive iteration.

    For example, say you have two input ports, A and B, and you choose to iterate each time a key field changes. If you choose to use port B for iteration, the embedded flow will start a new iteration each time a key field in the records from port B changes. All the records from the other port, port A, will be read into the embedded flow, cached, and used for each iteration.

  5. Select the type of iteration you want to perform.
    Iterate each time a key field changes
    In this type of iteration, the embedded flow processes groups of records that have the same value in one or more fields. When the embedded flow finishes processing the group of records, the embedded flow resets and a new group of records is processed. Use this type of iteration to create embedded flows that process groups of records and output each group separately.
    Tip: If you choose this type of iteration, you can improve performance by placing a Sorter stage in front of the embedded flow and sorting the records by the key field.
    Iterate per record
    In this type of iteration, the embedded flow processes one record at a time. Every time a record completes processing in the embedded flow, the result is sent to the output and a new record is processed. Embedded flows that iterate per record handle each record as a new flow execution.
  6. If you chose Iterate each time a key field changes, check the Ignore case when comparing values box if you want to ignore differences in case when evaluating key field values to determine record groups.
  7. Specify one or more key fields.
    1. Click Add.
    2. Choose the field you want to use as a key field.
    3. If you want to use the field's value to set a stage option within the embedded flow, specify the name of the option you want to set.
    4. Click OK.
    5. Add additional key fields if needed.

      If you have more than one key field and you chose the option Iterate each time a key field changes, records must contain the same value in all key fields to be grouped together.
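The settings in the procedure above can be modeled with a small Python sketch. It treats "iterate each time a key field changes" as starting a new iteration whenever the combined key differs from the previous record's key, passes the records from the non-driving port to every iteration, and optionally ignores case. All names here (`iterate_on_key_change`, `port_a`, `port_b`, the field names) are hypothetical, and the sketch is a simplified model of the behavior described, not the product's implementation.

```python
from itertools import groupby


def iterate_on_key_change(driving, key_fields, ignore_case=False, cached=()):
    """Yield one (key, group, cached_records) tuple per modeled iteration."""

    def key_of(record):
        # Records must match on ALL key fields to fall into the same group.
        values = tuple(str(record[field]) for field in key_fields)
        return tuple(v.lower() for v in values) if ignore_case else values

    # groupby starts a new group whenever the key differs from the previous record's key.
    for key, group in groupby(driving, key=key_of):
        # The other port's records are cached and reused for each iteration.
        yield key, list(group), list(cached)


# Port B drives iteration; port A is read in full and cached (sample data is made up).
port_b = [
    {"customer_id": "C1", "region": "east"},
    {"customer_id": "C1", "region": "East"},
    {"customer_id": "C2", "region": "east"},
    {"customer_id": "C1", "region": "east"},
]
port_a = [{"store": "S1"}, {"store": "S2"}]
keys = ["customer_id", "region"]

exact = list(iterate_on_key_change(port_b, keys, cached=port_a))
relaxed = list(iterate_on_key_change(port_b, keys, ignore_case=True, cached=port_a))

# Sorting by the key fields first keeps equal keys adjacent.
presorted = sorted(port_b, key=lambda r: tuple(r[k].lower() for k in keys))
after_sort = list(iterate_on_key_change(presorted, keys, ignore_case=True, cached=port_a))

print(len(exact), len(relaxed), len(after_sort))  # 4 3 2
```

In this model, the unsorted input triggers more iterations because the same key reappears after a different one. That is one plausible reading of why placing a Sorter stage in front of the embedded flow improves performance, but it is an assumption of the sketch rather than something this guide states.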