Kafka contingencies

Connect CDC (SQData) Kafka Quickstart

While the very nature of Kafka and its cluster-based architecture is to accommodate large volumes of data from Producers, issues can occur. Various factors may make a contingency plan for a Kafka cluster desirable or even a requirement. One such contingency scenario is described below.

The principal issue associated with an extended Kafka outage traces back to the source of the data, where Connect CDC SQData Capture and Publishing occur. High end-to-end throughput is achieved through careful management of the captured data and by avoiding the I/O operations required when that transient data must be "landed", in other words written to disk, before it is consumed and written to its eventual target, in this case Kafka.

When the Apply or Replicator Engine is unable to write to Kafka, that eventually translates into the need to hold the captured data and/or slow down its capture at the source. That can become problematic, particularly when the source itself generates a very high volume of Change Data. When an Engine stops, data cannot be published; committed units-of-work (UOWs) continue to be captured, and the data ordinarily held in memory must be written to a transient storage area. Depending on the Capture, that may be a z/OS high performance LogStream or disk storage dedicated for this purpose. Eventually the transient data area will be exhausted. When that happens, the Capture will slow and eventually stop reading the database log. This is considered an acceptable state: not normal or desirable, but manageable. The real problem arises if the source database log files are archived, moved to slower storage or even deleted. When that happens, Capture will be unable to continue from where it left off and the possibility of data loss becomes reality. Talk to Precisely about the best practices we recommend for log archiving.
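
To illustrate how an extended outage surfaces to a publishing application, the following is a minimal sketch using the standard Apache Kafka Java producer client. It is not the Connect CDC SQData Engine itself, and the broker address, topic name and configuration values are placeholder assumptions. While brokers are unreachable, records accumulate in the client's local buffer; once the buffer fills, sends block and ultimately fail, which is the point at which captured data must be held or landed upstream.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class OutageAwareProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Broker list and serializers; host and topic names here are placeholders.
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                      "org.apache.kafka.common.serialization.StringSerializer");
            // While brokers are unreachable, records accumulate in this local buffer.
            props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 33554432L);       // 32 MB
            // Once the buffer is full, send() blocks for at most this long ...
            props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, 60000);
            // ... and a record that cannot be delivered within this window fails.
            props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 120000);

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("cdc.topic", "key", "value"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            // During an extended outage this fires with a TimeoutException;
                            // the publisher must then hold or land the data upstream.
                            System.err.println("Delivery failed: " + exception);
                        }
                    });
            }
        }
    }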

If a contingency plan for this situation is required, the following two approaches may be considered. While each does "land" the data, it does so on disk that is often much lower in cost than storage on the source machine. A generic sketch of this land-and-replay pattern follows the list.
  • Use SQDUtil
  • Use a special Engine and Kafka utility
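
For context only, the sketch below illustrates the general land-and-replay idea behind both approaches using the standard Apache Kafka Java producer client. It is not SQDUtil syntax and not the special Engine's implementation; the landed-file path, broker address and topic name are assumed placeholders. The idea is that records landed to inexpensive disk during the outage are replayed, in capture order, once the cluster recovers.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class LandedDataReplay {
        public static void main(String[] args) throws IOException {
            // File where data was "landed" during the outage (hypothetical path).
            Path landed = Path.of("/var/sqdata/contingency/landed_records.txt");

            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                      "org.apache.kafka.common.serialization.StringSerializer");
            // Preserve ordering and avoid duplicates while replaying.
            props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
            props.put(ProducerConfig.ACKS_CONFIG, "all");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // One landed record per line; replay them in capture order.
                List<String> records = Files.readAllLines(landed);
                for (String value : records) {
                    producer.send(new ProducerRecord<>("cdc.topic", value));
                }
                producer.flush();   // ensure everything reached the recovered cluster
            }
        }
    }

In practice, the choice between the two approaches above determines what lands the data and how it is replayed; the sketch only shows why landing to low-cost disk preserves the captured changes until the cluster is available again.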