File or HDFS AVRO container target

Connect CDC (SQData) Replicator Engine

Copyright 2024. First published: 2000. Last edition: 2024-08-01. Last published: 2024-08-01.

Replication to AVRO containers, in either plain files or HDFS (Hadoop Distributed File System), can be accomplished with minimal configuration and provides excellent performance.

Prerequisites
  • AVRO container targets are restricted to a single worker, and each target must have its own file.
  • Consequently, the target URL must be a generic URL (one containing the * placeholder, as in the examples below), and the substitution parameter that replaces the placeholder must be unique for each target.
  • By default, the qualified name of the source is used as the substitution parameter. This can be overridden in the MAPPINGS section, using either static or dynamic mappings.
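The uniqueness requirement above can be illustrated with a short Python sketch. This is not the Replicator Engine's implementation; it only models the idea that the * placeholder in a generic URL is replaced by a per-source name and that two sources must never resolve to the same file. The function names, the `overrides` dictionary (standing in for a MAPPINGS override), and the sample source names are all hypothetical.

```python
from __future__ import annotations


def expand_target_url(template: str, name: str) -> str:
    """Replace the '*' placeholder in a generic URL with a name.

    Assumption for this sketch: a generic URL contains exactly one '*'.
    """
    if template.count("*") != 1:
        raise ValueError("expected exactly one '*' in the generic URL")
    return template.replace("*", name)


def plan_targets(
    template: str,
    sources: list[str],
    overrides: dict[str, str] | None = None,
) -> dict[str, str]:
    """Map each source name to its target URL and reject collisions.

    `overrides` is a hypothetical stand-in for a MAPPINGS override;
    without one, the source's qualified name is substituted.
    """
    overrides = overrides or {}
    targets: dict[str, str] = {}
    owner_of: dict[str, str] = {}  # target URL -> source that claimed it
    for source in sources:
        name = overrides.get(source, source)
        url = expand_target_url(template, name)
        if url in owner_of:
            raise ValueError(
                f"{source} and {owner_of[url]} would share target {url}"
            )
        owner_of[url] = source
        targets[source] = url
    return targets
```

For instance, under these assumptions `plan_targets("file:///out_*_v1", ["HR.EMP", "HR.DEPT"])` yields one distinct URL per source, while an override that maps both sources to the same name raises an error instead of letting two targets share a file.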
Examples
  • AVRO Container formatted file or HDFS (Hadoop Distributed File System) target, where every source object (i.e. table) is written to its own file using AVRO CONTAINER formatting and the default file rotation.
    REPLICATE
      DB2 cdc://<host_name>:<sqdaemon_port>/<publisher_name>/<subscription_name>
      TO AVRO CONTAINER [file | hdfs]:///*
    ;
  • AVRO Container formatted file or HDFS (Hadoop Distributed File System) target, where every source object (i.e. table) is written to a file whose name combines a common prefix, the source object name, and a common suffix, with specific file rotation parameters specified in an OPTIONS statement.
    REPLICATE
      DB2 cdc://<host_name>:<sqdaemon_port>/<publisher_name>/<subscription_name>
      TO AVRO CONTAINER [file | hdfs]:///<prefix>_*_<suffix>
    ;
    OPTIONS
    ROTATE SIZE 100M,
    ROTATE DELAY 30
    
Note: AVRO Container targets use a file rotation method controlled by a size and/or a delay. For Hadoop (HDFS), if an OPTIONS statement does not define a size and delay, a delay of one hour is applied by default. A delay of one hour means that the file is rotated one hour after the first record is written to it, unless it has already been rotated for another reason.
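
The rotation rule in the note can be sketched in Python as follows. This is an illustrative model only, not the product's code. It assumes that the HDFS default applies when neither a size nor a delay is configured, and it expresses the delay in seconds purely for the sake of the sketch (the unit used by ROTATE DELAY is not stated here). The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# One hour, the HDFS default delay described in the note.
HDFS_DEFAULT_DELAY_SECONDS = 3600


@dataclass
class RotationPolicy:
    max_bytes: Optional[int] = None            # size threshold, if any
    max_delay_seconds: Optional[float] = None  # delay threshold, if any


def effective_policy(policy: RotationPolicy, is_hdfs: bool) -> RotationPolicy:
    """Apply the one-hour default for HDFS when nothing is configured.

    Assumption: the default applies only when neither limit is set.
    """
    unset = policy.max_bytes is None and policy.max_delay_seconds is None
    if is_hdfs and unset:
        return RotationPolicy(max_delay_seconds=HDFS_DEFAULT_DELAY_SECONDS)
    return policy


def should_rotate(
    policy: RotationPolicy,
    bytes_written: int,
    first_write_time: Optional[float],
    now: float,
) -> bool:
    """Return True when either configured limit has been reached."""
    if first_write_time is None:
        # Nothing written yet, so the delay clock has not started.
        return False
    if policy.max_bytes is not None and bytes_written >= policy.max_bytes:
        return True
    if policy.max_delay_seconds is not None:
        return now - first_write_time >= policy.max_delay_seconds
    return False
```

The delay is measured from the first write to the current file rather than from the last rotation, which mirrors the wording of the note: a file that receives no records never starts its clock.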