Hash Split

Data360 Analyze Server Help

Product type: Software
Portfolio: Verify
Product family: Data360
Product: Data360 Analyze
Version: Latest
Language: English
Copyright: 2024
First publish date: 2016
Last updated: 2024-11-28
Published on: 2024-11-28T15:26:57.181000

Splits the input record set into multiple streams to allow parallel processing of subsets of the input.

Takes one input and a list of field names, and splits the input into any number of outputs by hashing the values in the specified fields. This lets you distribute your data across a set of downstream nodes that each process a smaller number of records at the same time, rather than having a single node process a very large number of records.
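The splitting behavior described above can be sketched in Python. This is only an illustration of hash-based partitioning, not the node's actual implementation: the function name `hash_split`, the use of CRC32 as the hash, and the sample records are all assumptions made for the example.

```python
import zlib


def hash_split(records, split_fields, num_outputs):
    """Route each record to one of num_outputs lists by hashing the split fields.

    Hypothetical helper for illustration; the real node's hash function is not
    documented here, so CRC32 is used as a stand-in.
    """
    outputs = [[] for _ in range(num_outputs)]
    for record in records:
        # Build one key string from the values of the chosen fields.
        key = "\x1f".join(str(record[field]) for field in split_fields)
        # CRC32 gives the same value on every run, so routing is repeatable.
        index = zlib.crc32(key.encode("utf-8")) % num_outputs
        outputs[index].append(record)
    return outputs


regions = ["east", "west", "north", "east", "west", "east"]
records = [{"id": i, "region": r} for i, r in enumerate(regions)]
outputs = hash_split(records, ["region"], 3)
```

Every record lands in exactly one output, and records that share the same values in the split fields always land in the same output.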

Note that if you use Hash Split on two streams that you intend to merge in a Join operation, you must set the SplitFields property. Otherwise, the results will be inconsistent across the streams or non-deterministic.
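The following Python sketch illustrates why hashing both streams on the join key matters. It assumes both streams are split on the same field into the same number of outputs. The `partition` and `join` helpers, the CRC32 hash, and the sample data are hypothetical and stand in for the Hash Split and Join nodes.

```python
import zlib


def partition(records, field, num_outputs):
    """Hypothetical stand-in for Hash Split: route records by hashing one field."""
    outputs = [[] for _ in range(num_outputs)]
    for record in records:
        index = zlib.crc32(str(record[field]).encode("utf-8")) % num_outputs
        outputs[index].append(record)
    return outputs


def join(left, right, field):
    """Hypothetical stand-in for Join: naive inner join on one field."""
    return [(l, r) for l in left for r in right if l[field] == r[field]]


orders = [{"cust": c, "order": n} for n, c in enumerate(["a", "b", "c", "a", "d"])]
customers = [{"cust": c, "name": c.upper()} for c in ["a", "b", "c", "d"]]

n = 3
order_parts = partition(orders, "cust", n)
customer_parts = partition(customers, "cust", n)

# Join output i of one stream only with output i of the other stream.
partitioned = [
    pair for i in range(n) for pair in join(order_parts[i], customer_parts[i], "cust")
]
full = join(orders, customers, "cust")
```

Because matching keys hash to the same output index in both streams, the per-output joins together produce the same pairs as a single join over the full inputs. If the two streams were split on different fields, matching records could land in different outputs and be missed.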

Tip: For optimal performance, the number of outputs should be odd, preferably a prime number.
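The source does not say why a prime number of outputs helps. One common reason in hash partitioning generally is that keys with a regular stride can collide when the output count shares a factor with that stride. The sketch below assumes, purely for illustration, that the hash reduces to the integer key modulo the number of outputs; `bucket_counts` is a hypothetical helper and does not reflect the node's real hash function.

```python
from collections import Counter


def bucket_counts(keys, num_outputs):
    """Count how many keys land in each output under a simple modulo hash."""
    counts = Counter(key % num_outputs for key in keys)
    return [counts.get(i, 0) for i in range(num_outputs)]


keys = range(0, 400, 4)  # 100 keys, all multiples of 4

even_split = bucket_counts(keys, 4)   # every key lands in output 0
prime_split = bucket_counts(keys, 5)  # keys spread evenly over 5 outputs
```

Under this assumption, four outputs leave three of them empty, while five outputs receive 20 keys each.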

Properties

SplitFields

Specify a comma-separated list of fields on which to hash.

Inputs and outputs

Inputs: in1.

Outputs: out1, out2, plus any number of optional additional outputs.