Precisely supports a variety of target formats for Kafka topics, including JSON, AVRO, and delimited. JSON is the preferred format for streaming changed data capture (CDC), particularly while testing, because it is self-describing. AVRO is structurally similar to JSON except that the schema is maintained separately, so the Kafka messages carry a significantly lighter payload than their JSON equivalents.
With AVRO and JSON, you have the option of replicating an entire source record/row, including the before and after images for updates, without any mapping. This is performed via the REPLICATE function, and the examples provided later in this document use the source descriptions to generate the target Kafka payload.
You also have the option of customizing your output, with source-to-target mapping and dynamic construction of JSON in a custom payload format, using relational DDL-type syntax and explicit source-to-target column mapping. While the custom target DDL can use any combination of upper and lower case, as well as hyphens or underscores, in Kafka object names, we strongly recommend against hyphens and strongly recommend all lower case characters, because underscores and lower case are the de facto standards in the open source Kafka and Java communities. While you may be just getting started with Kafka, most large Kafka and Hadoop users eventually choose to separate the schemas from the data payload by using AVRO: JSON is easy for a human to read, but it consumes significantly more storage. AVRO is one method of separating the two, and it imposes much stricter rules on object names and data types.
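As an illustration of the lower case, underscore-only naming convention recommended above, a small helper (hypothetical, not part of the product) might normalize a source column name into a Kafka- and AVRO-friendly identifier like this:

```python
import re

def to_kafka_name(source_name: str) -> str:
    """Normalize a source column name to the lower-case,
    underscore-only style preferred in the Kafka/Java community.
    Illustrative only; not a Precisely product function."""
    name = source_name.strip().lower()
    # Replace hyphens and whitespace runs with a single underscore.
    name = re.sub(r"[-\s]+", "_", name)
    # Drop any remaining characters AVRO would reject in a name.
    name = re.sub(r"[^a-z0-9_]", "", name)
    # AVRO names must not start with a digit.
    if name and name[0].isdigit():
        name = "_" + name
    return name

print(to_kafka_name("PHONE-NO"))   # phone_no
print(to_kafka_name("Hire Date"))  # hire_date
```

A name produced this way is valid both as a Kafka object name and as an AVRO record field name, which avoids a rename later if you migrate from JSON to AVRO.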
You can also create a custom message using the PRINTF function. Please note that custom JSON is outside the scope of this document. Visit Precisely support at https://www.precisely.com/support for assistance with this type of processing.
DATASTORE kafka:///* -- specify dynamic topic
OF JSON -- specify JSON format
AS TARGET
DESCRIBED BY GROUP DB2_SOURCE -- use source for REPLICATE
Sample - JSON data payload

{"object_name":"SQDATA.EMPLOYEE",
 "stck":"00d16918eb93309dc0",
 "timestamp":"2018-06-01-13.18.41.780264",
 "change_op":"U",
 "after_image":
 {
  "empno":"000010",
  "first_name":"SANDRA",
  "midinit":"i",
  "last_name":"HAAS",
  "workdept":"A00",
  "phone_no":"3978",
  "hire_date":"1978-09-15",
  "job":"PRES",
  "ed_level":14,
  "sex":"f",
  "birth_date":"1954-03-31",
  "salary":59250.00,
  "bonus":6600.00,
  "comm":2340.00
 },
 "before_image":
 {
  "empno":"000010",
  "first_name":"CHRISTINE",
  "midinit":"I",
  "last_name":"HAAS",
  "workdept":"A00",
  "phone_no":"3978",
  "hire_date":"1978-09-15",
  "job":"PRES",
  "ed_level":14,
  "sex":"f",
  "birth_date":"1954-03-31",
  "salary":59250.00,
  "bonus":6600.00,
  "comm":2340.00
 }
}
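To make the REPLICATE message structure above concrete, here is a sketch (plain Python, standard library only; the function and variable names are illustrative, not a product API) of how a consumer might parse such a message and compute which columns an update changed by diffing the before and after images:

```python
import json

def changed_columns(message: str) -> dict:
    """Given a REPLICATE JSON message for an update ('U'),
    return {column: (before, after)} for every changed column.
    Illustrative consumer-side helper, not a Precisely function."""
    record = json.loads(message)
    if record.get("change_op") != "U":
        return {}
    before = record.get("before_image") or {}
    after = record.get("after_image") or {}
    return {
        col: (before.get(col), after.get(col))
        for col in sorted(set(before) | set(after))
        if before.get(col) != after.get(col)
    }

# A trimmed version of the sample payload above:
sample = json.dumps({
    "object_name": "SQDATA.EMPLOYEE",
    "change_op": "U",
    "after_image":  {"empno": "000010", "first_name": "SANDRA",
                     "midinit": "i"},
    "before_image": {"empno": "000010", "first_name": "CHRISTINE",
                     "midinit": "I"},
})
print(changed_columns(sample))
# {'first_name': ('CHRISTINE', 'SANDRA'), 'midinit': ('I', 'i')}
```

This self-describing property is exactly why JSON is recommended for initial testing: a consumer needs no external schema to inspect the change.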
DATASTORE kafka:///* -- specify dynamic topic
OF AVRO -- specify AVRO format
AS TARGET
DESCRIBED BY GROUP DB2_SOURCE -- use source for REPLICATE
Sample - AVRO schema

{"type":"record","namespace":"sqdata","name":"employee__replicate",
 "fields":[
  {"name":"change_op","type":"string"},
  {"name":"object_name","type":"string"},
  {"name":"alias","type":"string"},
  {"name":"stck","type":"bytes","default":"\u0000"},
  {"name":"timestamp","type":"string"},
  {"name":"seq","type":"long"},
  {"name":"after_image","type":["null",{"type":"record","name":"employee",
   "fields":[
    {"name":"empno","type":"string"},
    {"name":"first_name","type":"string"},
    {"name":"midinit","type":"string"},
    {"name":"last_name","type":"string"},
    {"name":"workdept","type":"string"},
    {"name":"phone_no","type":"string"},
    {"name":"hire_date","type":"string"},
    {"name":"job","type":["null","string"]},
    {"name":"ed_level","type":["null","int"]},
    {"name":"sex","type":["null","string"]},
    {"name":"birth_date","type":["null","string"]},
    {"name":"salary","type":["null","double"]},
    {"name":"bonus","type":["null","double"]},
    {"name":"comm","type":["null","double"]}
   ]}]},
  {"name":"before_image","type":["null","employee"]}]}
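In the schema above, a union type such as ["null","string"] marks a field as optional (nullable), while a plain type such as "string" makes it required. As a hedged, stdlib-only illustration of that rule (not the product's serializer, and the trimmed field list is adapted from the schema above), a consumer could derive the required fields like this:

```python
import json

# A trimmed, illustrative field list in the shape of the schema above.
schema_fields = json.loads("""
[
  {"name": "empno", "type": "string"},
  {"name": "job",   "type": ["null", "string"]},
  {"name": "sex",   "type": ["null", "string"]}
]
""")

def required_fields(fields):
    """Fields whose type is not a union containing "null" must be
    present in every record; the rest are optional/nullable."""
    return [
        f["name"] for f in fields
        if not (isinstance(f["type"], list) and "null" in f["type"])
    ]

print(required_fields(schema_fields))  # ['empno']
```

This is the stricter typing discipline AVRO imposes that the JSON payload alone does not express.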
Sample - AVRO data payload
No sample AVRO payload is included because, in addition to removing the schema information and adding a header containing the schema name, Registry ID number, and date/time stamps, other formatting changes make the content of the message all but unreadable. For this reason, Precisely suggests testing first with human-readable JSON formatted output.