Connection points allow you to simplify the view of your data flow by organizing multiple data sets into a single bundle, reducing the number of connection lines on the canvas. This can be particularly useful when working with large, complex data flows.
Even with a relatively small number of nodes and connections, connection points can noticeably reduce the amount of noise on the canvas. In the following data flow, three Cat nodes are connected to varying combinations of the upstream nodes.
Using a connection point in the same data flow results in a much cleaner canvas.
The connection point also makes it easier to reuse the various data sets in different contexts, by using tags to identify the data sets and select which data sets to pass to downstream nodes.
You can also use connection points to set up conditional path execution in data flows, by configuring the connection point to pass the first enabled data set to the next node in the data flow. For more information, see First Enabled Data Set.
Migrating from LAE
- If you are migrating from LAE, you can import legacy data flows that contain Bundlers, Unbundlers and Bypasses. These are displayed in Data360 Analyze as connection points and connection filters, and they continue to work without any additional configuration. You can also upgrade selected connection filters to use the new, improved filter mechanics. Note: Imported legacy connection filters are displayed in orange. When you edit a legacy connection filter, it is upgraded and its color returns to normal.
- You cannot use run property values to configure connection points.
Adding a connection point
- Open a data flow.
- Right-click anywhere on the canvas, and select Insert > Connection Point.
- Connect a node output to the connection point input. The connection point updates to show that one data set has been connected to it.
- Connect a second node output to the connection point input. As long as both inputs have been executed, the connection point updates to show that two data sets are connected.
- Hover over the "2 sets" icon to see details of the data sets that are connected to the connection point.
- Connect further node outputs as required.
Unbundling data sets
"Unbundling" is the process of selecting which data sets will be passed to the next node in the data flow.
- Add a node to the canvas.
- Connect the output of the connection point to the node's input. When you make the connection, a connection filter is added, and a button is displayed that you can click to select the data sets that will be passed to the input. The available options depend on whether the node that you added in the previous step accepts only a single data set as an input, or can accept multiple data sets (for example, a Cat node or Composite node). The connection filter button is initially displayed with a white background, which turns blue after you have configured a filter.
- Click the connection filter button, then select which of the data sets going into the connection point will be passed to the next node. Choose one of the following options.
- All Data Sets - Pass through all data sets. This is the default option for nodes that can accept multiple data sets.
- Data Sets by Tag / Single Data Set by Tag - Select one or more specific data sets to pass through based on node output pin names. If a node can take multiple data sets, the option is displayed as Data Sets by Tag. If a node can accept only a single data set, the option is displayed as Single Data Set by Tag, and is selected by default.
- First Enabled Data Set - Pass through only the first enabled data set. In this case, you define which inputs to evaluate, and the first enabled data set from the list of selected inputs will be passed through.
All data sets
For a node that can accept multiple data sets, the default behavior for a connection filter is to pass through all data sets to the next node in the data flow. In this case, the connection filter button is displayed with a white background to indicate that a filter has not been configured.
When you connect a connection point to a node that can accept only a single data set as its input, this option is not available.
Data sets by tag
Select the data sets that you want to unbundle and pass to the next node in the data flow. The data sets that you select are highlighted in the list as you make selections.
When you hover over a data set's tag, its path in the data flow is highlighted on the canvas.
If a tag that you have previously selected in the connection filter does not match any data sets, the tag is displayed in red. This can happen when output pins are renamed, or when input pins are removed.
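The tag matching described above can be pictured with a short Python sketch. This is only an illustration of the behavior, not the product's implementation: the function name `filter_by_tag`, the dictionary representation of a bundle, and the "rawCustomers" tag are all hypothetical.

```python
from typing import Dict, List, Tuple


def filter_by_tag(
    bundle: Dict[str, list], selected_tags: List[str]
) -> Tuple[Dict[str, list], List[str]]:
    """Return the data sets whose tags are selected, plus any selected
    tags that no longer match a data set in the bundle."""
    passed = {tag: bundle[tag] for tag in selected_tags if tag in bundle}
    unmatched = [tag for tag in selected_tags if tag not in bundle]
    return passed, unmatched


# Hypothetical bundle: tag (output pin name) -> records.
bundle = {
    "sortedOrders": [{"id": 1}],
    "cleanTransactions": [{"id": 7}, {"id": 8}],
}

# "rawCustomers" (hypothetical) was selected earlier, but its output pin
# has since been renamed, so it no longer matches any data set.
passed, unmatched = filter_by_tag(bundle, ["cleanTransactions", "rawCustomers"])

print(list(passed))  # tags that are passed to the next node
print(unmatched)     # tags that the filter would display in red
```

In this sketch, the unmatched list corresponds to the tags that the connection filter displays in red.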
Single data set by tag
Select a data set to pass to the next node in the data flow. The data set that you select is highlighted.
This is the default option when you connect from a connection point to a node that accepts a single data set as its input.
First enabled data set
You can ensure that your connection point outputs a single data set by selecting the First Enabled Data Set option.
When you select this option, the available data sets are listed. You define the order in which the data sets are used by selecting them from the list. Click the buttons in the order that you want the data sets to be selected. As you select data sets, the order of priority is updated.
If you click a data set that is already selected, the selection is removed, and any data sets that are still selected move up in the order of priority accordingly.
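The way the order of priority changes as you click can be modeled as a simple list operation. The following Python sketch is an assumption about the behavior described above, not the product's code; `toggle_selection` and the "rawOrders" tag are hypothetical.

```python
from typing import List


def toggle_selection(current: List[str], tag: str) -> List[str]:
    """Return the new priority list after clicking `tag`.

    Clicking an unselected data set appends it to the end of the order.
    Clicking a selected data set removes it, so later entries move up.
    """
    if tag in current:
        return [t for t in current if t != tag]
    return current + [tag]


priority: List[str] = []
for clicked in ["cleanTransactions", "sortedOrders", "rawOrders"]:
    priority = toggle_selection(priority, clicked)

# Clicking "sortedOrders" again deselects it; "rawOrders" moves up to second.
priority = toggle_selection(priority, "sortedOrders")
print(priority)
```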
For example, in the data flow above, suppose that the node that trims the white space from transactions, and which generates the "cleanTransactions" data set, is disabled. The connection point then lists two data sets, and the connection filter shows that no records are arriving from the "cleanTransactions" output pin. The next node in the data flow now takes the "sortedOrders" data set as its input.
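The selection rule in this example can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function `first_enabled` and the dictionary of enabled flags are hypothetical, and returning `None` when no data set is enabled is a choice made for this sketch rather than documented product behavior.

```python
from typing import Dict, List, Optional


def first_enabled(priority: List[str], enabled: Dict[str, bool]) -> Optional[str]:
    """Return the first tag in priority order whose data set is enabled."""
    for tag in priority:
        if enabled.get(tag, False):
            return tag
    return None  # assumption: nothing is passed when no data set is enabled


# Priority order as configured in the connection filter.
priority_order = ["cleanTransactions", "sortedOrders"]

# The node that generates "cleanTransactions" is disabled in the data flow.
enabled = {"cleanTransactions": False, "sortedOrders": True}

chosen = first_enabled(priority_order, enabled)
print(chosen)
```

If the "cleanTransactions" node is enabled again, the same call returns "cleanTransactions", because it is first in the order of priority.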
Aggregating connection points
You can connect connection points to each other to aggregate their contents. To help you identify which connection point each data set comes from, you can add name tags to the connection points.
- Right-click the connection point.
- Select Add Name Tag.
- Enter a name for the connection point.
After you have added name tags, the data sets are identified by the connection point name tags. In the screenshot below, two connection points have been tagged "source" and "processed". A third connection point has been added, and connected to a downstream composite node. The connection filter for this node aggregates the data sets from both the "source" and "processed" connection points, and the data sets from the "processed" connection point have been selected.
When you have a connection point that accepts multiple bundles, as in the screenshot above, you can select all the data sets from a single connection point by selecting the connection point name tag. In the example above, clicking any of the instances of "source" would select all three data sets from the "source" connection point. You could then deselect individual data sets as required.
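Selecting by name tag can be thought of as selecting every data set that shares a connection point name. The Python sketch below illustrates this idea only; the functions `aggregate` and `select_by_name_tag`, and the data set names under "source", are hypothetical and do not come from the product.

```python
from typing import Dict, List, Set, Tuple

# A data set is identified by (connection point name tag, data set tag).
DataSetKey = Tuple[str, str]


def aggregate(points: Dict[str, List[str]]) -> List[DataSetKey]:
    """Combine the bundles of several named connection points."""
    return [(name, tag) for name, tags in points.items() for tag in tags]


def select_by_name_tag(available: List[DataSetKey], name_tag: str) -> Set[DataSetKey]:
    """Select every data set that comes from the named connection point."""
    return {key for key in available if key[0] == name_tag}


# Hypothetical contents of the two tagged connection points.
points = {
    "source": ["customers", "orders", "transactions"],
    "processed": ["sortedOrders", "cleanTransactions"],
}

available = aggregate(points)

# Clicking "source" selects all three of its data sets...
selection = select_by_name_tag(available, "source")

# ...and individual data sets can then be deselected as required.
selection.discard(("source", "orders"))
print(sorted(selection))
```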
Troubleshooting
If you rename an output pin after you have connected it to a connection point, and the connection point is connected to a downstream node, an error is generated at the connection filter. The connection line out of the connection point to the next node is highlighted in red to show the error, and the connection filter shows that the connection point expects a data set with a different name, which is no longer connected.