The Repartition node can be used to control how data is partitioned during Analysis execution.
Partition By Fields
This parameter allows you to choose which fields to use when repartitioning the data set.
Number of Partitions
This parameter allows you to choose how many partitions the data set should be divided into.
Repartitioning example
Suppose you had the following data set.
name | value |
---|---|
A | 10 |
B | 11 |
C | 12 |
D | 13 |
A | 14 |
A | 15 |
C | 16 |
B | 17 |
C | 18 |
B | 19 |
Were you to select name as a Partition By Field and specify 4 as the Number of Partitions, the Repartition node might produce the following result.
name | value |
---|---|
C | 18 |
C | 12 |
C | 16 |
A | 10 |
A | 14 |
A | 15 |
B | 11 |
B | 17 |
B | 19 |
D | 13 |
Within the result data set, records with the same Partition By Field value are placed in the same partition, that is, next to one another in the data set. Records within a partition appear in no particular order. In this example, the specified Number of Partitions (4) matches the number of unique values in the Partition By Field (A, B, C, and D).
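To make the grouping behavior concrete, the following Python sketch mimics it outside the product. It assumes a simple hash-based scheme: the Partition By Field values are hashed and the hash, modulo the Number of Partitions, picks the partition. The function name `repartition`, its parameters, and the use of CRC32 are illustrative assumptions, not the Repartition node's actual implementation.

```python
import zlib
from collections import defaultdict


def repartition(records, partition_by, num_partitions):
    """Group records into at most num_partitions buckets.

    Hypothetical helper: hashes the partition-by field values so that
    records sharing those values always land in the same bucket.
    """
    partitions = defaultdict(list)
    for record in records:
        # Build one key from all partition-by fields of this record.
        key = "|".join(str(record[field]) for field in partition_by)
        # CRC32 is stable across runs, unlike Python's built-in str hash.
        index = zlib.crc32(key.encode("utf-8")) % num_partitions
        partitions[index].append(record)
    return dict(partitions)


# The example data set from this page.
records = [
    {"name": "A", "value": 10}, {"name": "B", "value": 11},
    {"name": "C", "value": 12}, {"name": "D", "value": 13},
    {"name": "A", "value": 14}, {"name": "A", "value": 15},
    {"name": "C", "value": 16}, {"name": "B", "value": 17},
    {"name": "C", "value": 18}, {"name": "B", "value": 19},
]

partitions = repartition(records, partition_by=["name"], num_partitions=4)

for index, rows in sorted(partitions.items()):
    print(index, [(row["name"], row["value"]) for row in rows])
```

Under this assumed scheme, all records with the same name always share a partition, but two different names can hash to the same partition, so some of the four partitions may stay empty. How the Repartition node itself assigns values to partitions is not described here.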