Implementing a Kafka architecture requires choosing a format for the Kafka message payload. The two most popular choices are JSON and AVRO. JSON is a nearly universal format that is simple to read and use, but every Kafka message produced and consumed also carries the schema describing the data in its payload. The drawback is size: the schema is repeated in every message, which enlarges the combined JSON payload, increases the volume of data written over TCP/IP, and increases the storage the cluster needs to hold the duplicated schema information.
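The size cost described above can be illustrated with a minimal Python sketch. The record, the field names, and the envelope layout with "schema" and "payload" keys are assumptions made for illustration; they are not the exact JSON the Apply Engine writes.

```python
import json

# Hypothetical employee record; the field names and values are illustrative only.
payload = {"EMP_NO": 1001, "LAST_NAME": "SMITH", "DEPT_NO": "A00"}

# Hypothetical schema block describing the record above.
schema = {
    "type": "struct",
    "name": "EMPLOYEE",
    "fields": [
        {"field": "EMP_NO", "type": "int32", "optional": False},
        {"field": "LAST_NAME", "type": "string", "optional": True},
        {"field": "DEPT_NO", "type": "string", "optional": True},
    ],
}


def encoded_size(obj) -> int:
    """Return the size in bytes of obj serialized as compact UTF-8 JSON."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


# Message that carries its schema alongside the data.
with_schema = encoded_size({"schema": schema, "payload": payload})
# Message that carries the data only.
data_only = encoded_size(payload)
# Bytes repeated in every message solely to describe the data.
overhead = with_schema - data_only

print(f"with schema: {with_schema} bytes")
print(f"data only:   {data_only} bytes")
print(f"overhead:    {overhead} bytes per message")
```

Because the schema block is identical for every record of the same table, the overhead is paid again on every message, on the wire and in cluster storage.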
AVRO solves these problems with a serialization system that separates the data from the schema. The producer writes the data, and the consumer reads it, using the schema that described the data at the time the producer wrote it. This assumes that the tools and languages used by the producer and consumer are AVRO "aware". The Connect CDC SQData Apply Engine automates the registration of Kafka topic schemas.
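One common way the data is kept separate from the schema is the Confluent wire format, in which each message begins with a zero "magic" byte and a 4-byte big-endian schema ID, followed by the AVRO-encoded data. The sketch below shows only that framing, assuming the messages written with FORMAT CONFLUENT follow this layout; the helper names, the schema ID, and the payload bytes are hypothetical.

```python
import struct
from typing import Tuple

MAGIC_BYTE = 0   # first byte of a Confluent-framed message
HEADER_SIZE = 5  # 1 magic byte + 4-byte schema ID


def frame(schema_id: int, avro_body: bytes) -> bytes:
    """Prefix already-encoded AVRO data with the magic byte and schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_body


def unframe(message: bytes) -> Tuple[int, bytes]:
    """Split a framed message into (schema_id, avro_body)."""
    if len(message) < HEADER_SIZE:
        raise ValueError("message is too short to contain a header")
    magic, schema_id = struct.unpack(">bI", message[:HEADER_SIZE])
    if magic != MAGIC_BYTE:
        raise ValueError("unexpected magic byte: %d" % magic)
    return schema_id, message[HEADER_SIZE:]


# Hypothetical schema ID and placeholder bytes standing in for AVRO data.
message = frame(42, b"\x02\x06abc")
schema_id, body = unframe(message)
print(schema_id, len(body))
```

The consumer uses the schema ID to fetch the writer's schema from the registry once and can then decode every message that carries the same ID, so only five bytes of schema information travel with each message.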
Automatic registration is a two-step process that takes place when the Apply Engine is parsed, using either the two-step "Parse and Run" technique or the "two-in-one SQDENG" method. First, the Engine retrieves from the Confluent Registry each schema that matches a Description in the Engine script. Then it compares those schemas to the Descriptions, using the TOPIC and SUBJECT values specified for each Description. If they are compatible, the Parse phase completes and the Apply Engine is ready for execution. Any retrieved schemas that do not match the "current" Description are automatically registered, after which the Parse phase completes and the Apply Engine is ready for execution.
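The retrieve-then-compare flow can be sketched in Python as follows. This is not the Engine's implementation: the in-memory registry class, the function names, and the schema strings are hypothetical, the comparison is simplified to plain equality rather than a compatibility check, and the sketch assumes that the schema derived from the current Description is what gets registered.

```python
from typing import Dict, List, Optional


class InMemoryRegistry:
    """Hypothetical stand-in for a schema registry: subject -> schema versions."""

    def __init__(self) -> None:
        self._subjects: Dict[str, List[str]] = {}

    def latest(self, subject: str) -> Optional[str]:
        """Return the newest schema for subject, or None if none is registered."""
        versions = self._subjects.get(subject)
        return versions[-1] if versions else None

    def register(self, subject: str, schema: str) -> int:
        """Append schema as a new version and return its version number."""
        versions = self._subjects.setdefault(subject, [])
        versions.append(schema)
        return len(versions)


def parse_phase(registry: InMemoryRegistry, descriptions: Dict[str, str]) -> List[str]:
    """Return the subjects that had to be registered during the parse."""
    registered = []
    for subject, current_schema in descriptions.items():
        # Step 1: retrieve the schema stored under the Description's subject.
        retrieved = registry.latest(subject)
        # Step 2: compare it with the schema derived from the Description.
        if retrieved != current_schema:
            registry.register(subject, current_schema)
            registered.append(subject)
    return registered


if __name__ == "__main__":
    registry = InMemoryRegistry()
    registry.register("EMPLOYEE-value", "employee-schema-v1")
    descriptions = {
        "EMPLOYEE-value": "employee-schema-v1",      # already matches
        "DEPARTMENT-value": "department-schema-v1",  # not yet registered
    }
    print(parse_phase(registry, descriptions))
```

Running the parse a second time with unchanged Descriptions registers nothing, which mirrors the case in which the schemas match and the Parse phase simply completes.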
Example 1 Db2 to Kafka AVRO
----------------------------------------------------------------------
-- Name: DB2TOKAF: Z/OS DB2 To Kafka AVRO on Linux
...
-- Change Log:
----------------------------------------------------------------------
-- 2019-02-01 INITIAL RELEASE using JSON
-- 2019-02-02 Switch to Confluent AVRO Schema Registry
...
OPTIONS
CDCOP('I','U','D') -- Set CHANGE OP Constants
,USE AVRO COMPATIBLE NAMES -- Required for AVRO Targets
,CONFLUENT REPOSITORY 'http://schema_registry.precisely.com:8081'
;
----------------------------------------------------------------------
-- Source Descriptions
----------------------------------------------------------------------
BEGIN GROUP DB2_SOURCE;
DESCRIPTION DB2SQL ./DB2DDL/EMP.ddl AS EMPLOYEE
KEY IS EMP_NO
TOPIC IVP_HR_EMPLOYEE
SUBJECT IVP_HR_EMPLOYEE-value;
DESCRIPTION DB2SQL ./DB2DDL/DEPT.ddl AS DEPARTMENT
KEY IS DEPT_NO
TOPIC IVP_HR_DEPARTMENT
SUBJECT IVP_HR_DEPARTMENT-value;
END GROUP;
----------------------------------------------------------------------
-- Target Datastore(s)
----------------------------------------------------------------------
DATASTORE kafka:///*/key -- specify dynamic topic
OF AVRO -- specify AVRO format
FORMAT CONFLUENT -- use Confluent Schema Registry
AS TARGET
DESCRIBED BY GROUP DB2_SOURCE -- use source for REPLICATE
;
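After a script like Example 1 has been parsed, the subjects it names can be looked up in the registry. The sketch below only builds the lookup URLs and does not call the server. It assumes the registry at the CONFLUENT REPOSITORY URL exposes the standard Confluent Schema Registry REST API, in which the latest schema of a subject is read from /subjects/{subject}/versions/latest; the helper name is hypothetical.

```python
from urllib.parse import quote

# Values taken from the OPTIONS and DESCRIPTION statements in Example 1.
REGISTRY_URL = "http://schema_registry.precisely.com:8081"
SUBJECTS = ["IVP_HR_EMPLOYEE-value", "IVP_HR_DEPARTMENT-value"]


def latest_schema_url(registry_url: str, subject: str) -> str:
    """Build the URL that returns the latest registered schema for a subject."""
    base = registry_url.rstrip("/")
    # Percent-encode the subject so unusual characters remain one path segment.
    return f"{base}/subjects/{quote(subject, safe='')}/versions/latest"


urls = [latest_schema_url(REGISTRY_URL, subject) for subject in SUBJECTS]
for url in urls:
    print(url)
```

Issuing an HTTP GET against each URL, for example with a browser or curl, should return the schema registered for that subject if registration succeeded.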