Automatic AVRO Schema registration - connect_cdc_sqdata - Latest

Connect CDC (SQData) Apply engine

Product: Connect CDC (SQData)
Version: Latest
Language: English
Copyright: 2024 (first published 2000; last published 2024-07-30)

Implementing a Kafka architecture requires a decision about the format of the Kafka message data payload. The two most popular choices are JSON and AVRO. While JSON is a nearly universal format that is simple to read and use, every Kafka message produced and consumed must also carry the schema describing the data in its payload. The drawbacks of this technique are the size of the combined JSON payload, the volume of data written over TCP/IP, and the storage required in the cluster for the duplicated schema information.
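The overhead is easy to see with a hypothetical record (the schema and field names below are illustrative, not Connect CDC output): the schema bytes travel with every single message when it is embedded in JSON.

```python
import json

# Hypothetical illustration: the same employee record sent as JSON with an
# embedded schema versus the data alone.
schema = {
    "type": "record",
    "name": "EMPLOYEE",
    "fields": [{"name": "EMP_NO", "type": "string"},
               {"name": "LAST_NAME", "type": "string"}],
}
record = {"EMP_NO": "000010", "LAST_NAME": "HAAS"}

with_schema = json.dumps({"schema": schema, "payload": record}).encode()
data_only = json.dumps(record).encode()

# The schema bytes are repeated in every message written to the topic.
overhead = len(with_schema) - len(data_only)
print(len(with_schema), len(data_only), overhead)
```

Multiply that per-message overhead by millions of messages and the network and cluster-storage cost of embedded schemas becomes significant.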

AVRO solves these problems through a serialization system that separates the data from the schema. Data is written by the producer and read by the consumer using the schema that described the data at the time it was written. This, of course, assumes that the tools and languages used by the producer and consumer are AVRO "aware". The Connect CDC SQData Apply Engine automates the registration of Kafka topic schemas.
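With the Confluent Schema Registry, each message carries only a 5-byte header, a magic byte of 0 followed by the 4-byte big-endian ID of the registered schema, rather than the schema itself. A minimal sketch of that wire format, assuming a schema ID already assigned by the registry and dummy Avro bytes:

```python
import struct

def frame_confluent(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix an Avro binary payload with the Confluent wire-format header:
    magic byte 0x00 followed by the 4-byte big-endian schema ID."""
    return struct.pack(">bI", 0, schema_id) + avro_payload

def unframe_confluent(message: bytes) -> tuple[int, bytes]:
    """Split a framed message back into (schema_id, avro_payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != 0:
        raise ValueError("not a Confluent-framed message")
    return schema_id, message[5:]

msg = frame_confluent(42, b"\x0c000010")   # schema ID 42, dummy Avro bytes
print(unframe_confluent(msg))
```

A consumer uses the schema ID from the header to fetch the writer's schema from the registry once, caches it, and decodes every subsequent message on the topic without any schema bytes in the payload.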

Automatic registration is a two-step process. When the Apply Engine is parsed, whether using the two-step "Parse and Run" technique or the "two-in-one SQDENG" method, the Engine first retrieves from the Confluent Registry each schema matching the Descriptions in the Engine script, using the TOPIC and SUBJECT values specified for each Description. It then compares those schemas to the Descriptions. If they are compatible, the Parse phase completes and the Apply Engine is ready for execution. Any Description whose retrieved schema does not match its "current" definition is automatically registered; the Parse phase then completes and the Apply Engine is ready for execution.
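The parse-time check can be sketched as follows. This is a simplified stand-in, not the Engine's implementation: a plain dict replaces the Confluent Schema Registry REST API, exact string equality replaces the real compatibility check, and all names are hypothetical.

```python
# Hypothetical sketch of the parse-time check: fetch the latest schema for
# each Description's SUBJECT and register the Description's schema when the
# registry copy is missing or does not match.

def ensure_registered(registry: dict, subject: str, desired_schema: str) -> bool:
    """Return True if a registration was performed, False if already current."""
    current = registry.get(subject)        # GET /subjects/{subject}/versions/latest
    if current == desired_schema:          # simplified "compatible" test
        return False
    registry[subject] = desired_schema     # POST /subjects/{subject}/versions
    return True

# One subject is already registered and current; the other is new.
registry = {"IVP_HR_EMPLOYEE-value": '{"type":"record","name":"EMPLOYEE"}'}
descriptions = {
    "IVP_HR_EMPLOYEE-value": '{"type":"record","name":"EMPLOYEE"}',
    "IVP_HR_DEPARTMENT-value": '{"type":"record","name":"DEPARTMENT"}',
}

for subject, schema in descriptions.items():
    registered = ensure_registered(registry, subject, schema)
    print(subject, "registered" if registered else "already current")
```

After the loop, every subject in the registry matches its Description, which is the state the Parse phase requires before the Apply Engine can run.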

Note: Apply Engines that utilize the REPLICATE function for DB2, IMS and VSAM source data still require manual intervention to add or update the source DESCRIPTION parts that correspond to new and altered RDBMS schemas and IMS or VSAM "copybooks". Once that has been accomplished, however, the Apply Engine need only be parsed before execution, because registration of the updated AVRO schemas is performed automatically. Even Version 4 Apply Engines that have "customized" target DESCRIPTIONS and mapping PROCEDURES benefit, because the target DESCRIPTIONS used to create the AVRO schemas are automatically validated and registered when the Engine is started.

Example 1 Db2 to Kafka AVRO

Review the Apply Engine script in the Db2 to Kafka Replication use case. Only a few elements of the Engine script need to be added or altered to switch to the AVRO format, including the Confluent Schema Registry URL. Modified lines are identified by a green bar in their first character:
----------------------------------------------------------------------
-- Name: DB2TOKAF:  Z/OS DB2 To Kafka AVRO on Linux
...
--       Change Log:
----------------------------------------------------------------------
-- 2019-02-01 INITIAL RELEASE using JSON
-- 2019-02-02 Switch to Confluent AVRO Schema Registry
...
OPTIONS
   CDCOP('I','U','D')                 -- Set CHANGE OP Constants
  ,USE AVRO COMPATIBLE NAMES           -- Required for AVRO Targets
  ,CONFLUENT REPOSITORY 'http://schema_registry.precisely.com:8081'
;
----------------------------------------------------------------------
--       Source Descriptions
----------------------------------------------------------------------
BEGIN GROUP DB2_SOURCE;
DESCRIPTION DB2SQL ./DB2DDL/EMP.ddl AS EMPLOYEE
            KEY IS EMP_NO
            TOPIC IVP_HR_EMPLOYEE
            SUBJECT IVP_HR_EMPLOYEE-value;
DESCRIPTION DB2SQL ./DB2DDL/DEPT.ddl AS DEPARTMENT
            KEY IS DEPT_NO
            TOPIC IVP_HR_DEPARTMENT
            SUBJECT IVP_HR_DEPARTMENT-value;
END GROUP;
----------------------------------------------------------------------
--       Target Datastore(s)
----------------------------------------------------------------------
DATASTORE kafka:///*/key                    -- specify dynamic topic
          OF AVRO                           -- specify AVRO format
          FORMAT CONFLUENT                  -- use Confluent Schema Registry
          AS TARGET
          DESCRIBED BY GROUP DB2_SOURCE     -- use source for REPLICATE
;
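The SUBJECT values in the script above follow the Confluent TopicNameStrategy, under which the subject for a message's value schema is the topic name with a "-value" suffix (and "-key" for key schemas). A small sketch of that convention:

```python
def value_subject(topic: str) -> str:
    """Confluent TopicNameStrategy: subject name for a topic's value schema."""
    return f"{topic}-value"

def key_subject(topic: str) -> str:
    """Confluent TopicNameStrategy: subject name for a topic's key schema."""
    return f"{topic}-key"

print(value_subject("IVP_HR_EMPLOYEE"))    # IVP_HR_EMPLOYEE-value
print(value_subject("IVP_HR_DEPARTMENT"))  # IVP_HR_DEPARTMENT-value
```

Keeping the SUBJECT values aligned with this convention lets standard Confluent producers and consumers locate each topic's schema in the registry without extra configuration.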