You must define a Distribution Key, that is, a column or a set of columns that uniquely identifies each row in a table, for each table involved in data distribution. The columns act as, but need not be defined as, a primary key in the data modeling sense.
A primary key or unique index is:
- Recommended for target-only, unidirectional tables for maximum Connect CDC performance
- Required by Connect CDC for bi-directional tables
Distribution Keys are particularly important when splitting a table. The Distribution Key functions as a set of unique constraining columns in one table that are sent to the second table. Because it uniquely identifies those rows, the Distribution Key ensures that the data sent does not collide with other data or with other database constraints.
All Distribution Key columns must be defined as a primary key or unique index. Otherwise, the merge fails: without a unique constraint, inserts to the target table do not fail when data arrives from multiple source tables. The merge depends on the unique constraint violation to change each secondary insert into an update.
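
The insert-then-update behavior described above can be illustrated with a minimal sketch. This is not Connect CDC code or its API. It uses Python's built-in `sqlite3` module, and the `target` table, its `name` and `email` columns, and the `merge_row` and `demo` helpers are hypothetical names chosen only for illustration. It assumes two source tables each contribute one column to the same target row.

```python
import sqlite3

ALLOWED_COLUMNS = ("name", "email")  # hypothetical target columns


def merge_row(conn, key, column, value):
    """Try an insert; if the unique constraint rejects it, update instead."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unexpected column: {column}")
    try:
        conn.execute(
            f"INSERT INTO target (id, {column}) VALUES (?, ?)", (key, value)
        )
        return "insert"
    except sqlite3.IntegrityError:
        # The key already exists, so this is a secondary insert:
        # turn it into an update of the existing row.
        conn.execute(
            f"UPDATE target SET {column} = ? WHERE id = ?", (value, key)
        )
        return "update"


def demo(with_unique_key):
    """Merge one row from two illustrative sources, with or without a key."""
    conn = sqlite3.connect(":memory:")
    key_clause = "PRIMARY KEY" if with_unique_key else ""
    conn.execute(
        f"CREATE TABLE target (id INTEGER {key_clause}, name TEXT, email TEXT)"
    )
    actions = [
        merge_row(conn, 1, "name", "Ada"),               # from the first source
        merge_row(conn, 1, "email", "ada@example.com"),  # from the second source
    ]
    rows = conn.execute(
        "SELECT id, name, email FROM target ORDER BY rowid"
    ).fetchall()
    conn.close()
    return actions, rows


if __name__ == "__main__":
    print(demo(with_unique_key=True))
    print(demo(with_unique_key=False))
```

With the key defined as a primary key, the second write is rejected as an insert and applied as an update, leaving one merged row. Without it, both inserts succeed and the target ends up with two partial rows for the same key, which is the failure the requirement is meant to prevent.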