Record Joiner - 23.1

Spectrum Dataflow Designer Guide

Version
23.1
Language
English
Product name
Spectrum Technology Platform
Title
Spectrum Dataflow Designer Guide
Copyright
2024
First publish date
2007
Last updated
2024-10-11
Published on
2024-10-11T22:55:47.128293

Record Joiner performs a SQL-style JOIN operation to combine records from different streams based on a relationship between fields in the streams. You can use Record Joiner to join records from multiple files, multiple databases, or any upstream channels in the flow. You must connect at least two input channels to Record Joiner. The results of the JOIN operation are then written to one output channel. Optionally, records that do not match the join condition can be written to a separate output channel.

Using Record Joiner

To use the Record Joiner stage in a new Flow, perform these steps:
  1. On the Spectrum Flow Designer Home page, click New.
  2. On the New Flow page, click Job, Service, or Subflow, as required and then click the correcsponding blank canvas.
  3. Click Ok.
  4. In the dialog box that appears, give a name to the Flow, Job, Service, or Subflow you are creating.
  5. Click Ok.
  6. From the Palette Panel drag the Record Joiner stage to the canvas.
    Note: Record Joiner is one of the Control Stages.
  7. Drag all Sources (of the records that are to be joined) to the canvas and connect their Output Port to the Record Joiner Input Port.
  8. Drag the Sink for the joined records and connect its Input Port to the Record Joiner Output Port.
  9. Configure the Sources. See the documentation of the respective stage for field-level details.
  10. Click Record Joiner and configure the Join Definition as described below.

Join Definition

Option Description

Left port

The port whose records you want to use as the left table in the JOIN operation. All other input ports will be used as right tables in the JOIN operation.

Note: "Left" table and "right" table are SQL JOIN concepts. Before using Record Joiner you should have a good understanding of the SQL JOIN operation. For more information, see wikipedia.org/wiki/Join_(SQL).

Join type

The type of JOIN operation you want to perform. One of the following:

Left Outer
Returns all records from the left port even if there are no matches between the left port and the other ports. This option returns all records from the left port plus any records that match in any of the other ports.
Full
Returns all records from all ports.
Inner
Returns only those records that have a match between the left port and another port. For example, if you have four input sources and port 1 is the left port, an inner join will return records that have matching fields between port 1 and port 2, port 1 and port 3, and port 1 and port 4.

Join Fields

The field or fields from the left port that must match the data in a field from another port in order for the records to be joined.

Note: The valid data types for join fields are integer, string, datetime, date, long, float, double, and big decimal.

Data from the left port is sorted

Specifies whether the records in the left port are already sorted by the field specified in Join Fields. If the records are already sorted, checking this box can improve performance. If you do not check this box, Record Joiner will sort the records according to the field specified in Join Fields before performing the join operation.

If you have specified multiple join fields, then the records must be sorted using the order of the fields listed in Join Fields. For example, if you have two join fields:

  • Amount
  • Region

Then the records must be sorted first by the Amount field, then by the Region field.

Important: If you select this option but the records are not sorted, you will get incorrect results from Record Joiner. Only select this option if you are sure that the records in the left port are already sorted.

Join Definitions

Describes the join conditions that will be used to determine if a record from the left port should be joined with a record from one of the other ports: port1.Name = port2.Name

This indicates that if the value in the Name field of a record from port1 matches the value in the Name field of a record from port2, the two records will be joined.

To modify a join condition, click Modify. Select a field from the right port whose data must match the data in the join field from the left port in order for the records to be joined. If you want to change the left port field, click Cancel and change it in the Join Fields field. If the records in the right port are sorted by the join field, check the box Data from the right port is sorted. Checking this box can improve performance.

Important: If you select Data from the right port is sorted but the records are not sorted, you will get incorrect results from Record Joiner. Only select this option if you are sure that the records in the right port are already sorted.

Field Resolution

This tab specifies which port's data to use in the joined record in cases where the same field name exists in more than one input port. For example, if you are performing a join on two sources of data, and each source contains a field named DateOfBirth, you can specify which port's data to use in the DateOfBirth field in the joined record.

If there are fields of the same name but with different data, and you want to preserve both fields' data in the joined record, you must rename one of the fields before the data is sent to Record Joiner. You can use the Transformer stage to rename fields.

Handling Records That Are Not Joined

In order for a record to be included in the Record Joiner output it must meet the join condition, or a join type must be selected that returns both joined records and those that did not meet the join condition. For example, a full join will return all records from all input ports regardless of whether a record meets the join condition. In the case of a join type that does not return all records from all ports, such as a left outer join or an inner join, only records that match the join condition are included in the Record Joiner output.

To capture the records that are not included in the result of the join operation, use the not_joined output port. The output from this port contains all records that were not included in the regular output port.

Records that come out of this port have the field InputPortIndex added to them. This field contains the number of the Record Joiner input port where the record came from. This allows you to identify the source of the record.

Note:
  • For optimal performance of this stage, ensure two independent streams of records are joined to generate a consolidated output.
  • If a single path is first branched using either a broadcaster or conditional router then re-joined back using a Record Joiner, the flow may stop responding. In case multiple stages are used between branching and joining, use the Sorter as close to the Record Joiner as possible.

Math

The Math stage handles mathematical calculations on a single data row and allows you to conduct a variety of math functions using one or more expressions. Data is input as strings but the values must be numeric or Boolean, based on the type of operation being performed on the data.

  1. Under Control Stages, click the Math stage and drag it to the canvas, placing it where you want on the flow.
  2. Connect the stage to other stages on the canvas.
  3. Double-click the Math stage. The Math Options dialog box appears, with the Expressions tab open. This view shows the input fields, the Calculator, and the Expressions canvas. Alternately, you can click the Functions tab to use functions instead of the Calculator.
The Input fields control lists the valid fields found on the input port. Field name syntax is very flexible but has some restrictions based on Groovy scripting rules. If you are not familiar with Groovy scripting, see this website for complete information about Groovy:groovy-lang.org.
Note: This stage is not available in the tech preview version of Flow Designer.