Construct the HDFS apply engine script - connect_cdc_sqdata - Latest

Connect CDC (SQData) HDFS Quickstart

Product type
Software
Portfolio
Integrate
Product family
Connect
Product
Connect > Connect CDC (SQData)
Version
Latest
Language
English
Product name
Connect CDC (SQData)
Title
Connect CDC (SQData) HDFS Quickstart
Copyright
2024
First publish date
2000
Last edition
2024-07-30
Last publish date
2024-07-30T20:10:32.610182

The following examples replicate DB2 changed data (CDC) for the IVP EMP and DEPT tables into HDFS in JSON or AVRO format. An example of the JSON output can be seen above in Determine HDFS Output Format. The example also includes a filter for the EMPLOYEE table, only updates to employees with a bonus over $5,000 will cause the record to be written to HDFS. All changes to the DEPT table are Replicated with no filter applied.

Example 1
----------------------------------------------------------------------
-- Name: DB2TOHDFS z/OS DB2 To HDFS on Linux
-- Client/Project: client/project
----------------------------------------------------------------------
--  SUBSTITUTION PARMS USED IN THIS SCRIPT:
--    %(ENGINE) - ENGINE Name
--    %(SHOST) - Source HOST of the Capture/Publisher
--    %(SPORT) - Source HOST SQDaemon PORT
--    %(PUBNM) - Source Capture/Publisher Agent Name
--    %(THOST) - Target HDFS HOST
--    %(TPORT) - Target HDFS HOST PORT
----------------------------------------------------------------------
--       Change Log:
----------------------------------------------------------------------
-- 2017-06-01 INITIAL RELEASE
----------------------------------------------------------------------
JOBNAME DB2TOHDFS;
OPTIONS
   CDCOP('I','U','D')                  -- Set CHANGE OP Constants
  ,USE AVRO COMPATIBLE NAMES
;
----------------------------------------------------------------------
--       Data Definition Section
----------------------------------------------------------------------
----------------------------------------------------------------------
--       Source Descriptions
----------------------------------------------------------------------
BEGIN GROUP DB2_SOURCE;
DESCRIPTION DB2SQL ./DB2DDL/EMP.ddl AS EMP;
DESCRIPTION DB2SQL ./DB2DDL/DEPT.ddl AS DEPT;
END GROUP;
-----------------------------------------------------------------------
--       Target Descriptions
----------------------------------------------------------------------
----------------------------------------------------------------------
--       Datastore Section
----------------------------------------------------------------------
----------------------------------------------------------------------
--       Source Datastore
----------------------------------------------------------------------
DATASTORE cdc://%(SHOST):%(SPORT)/%(PUBNM)/%(ENGINE)
        OF UTSCDC
        AS CDCIN
        DESCRIBED BY GROUP DB2_SOURCE
;
----------------------------------------------------------------------
--       Target Datastore(s)
----------------------------------------------------------------------
DATASTORE hdfs://%(THOST):%(TPORT)/*.dat
        OF JSON                           -- specify JSON format
        AS TARGET
        DESCRIBED BY GROUP DB2_SOURCE     -- use source for REPLICATE
        STAGING SIZE 4G                   -- file size 4G max
        STAGING DELAY 60                 -- rotate every 60 minutes
;
----------------------------------------------------------------------
--       Field Specification Section
----------------------------------------------------------------------
DATEFORMAT 'ISO';
----------------------------------------------------------------------
--       Procedure Section
----------------------------------------------------------------------
CREATE PROC P_EMP AS SELECT
{
  IF EMP.BONUS > '5000'
  {
      REPLICATE(TARGET, EMP)
  }
}
FROM CDCIN;
----------------------------------------------------------------------
--       Main Section
----------------------------------------------------------------------
PROCESS INTO TARGET
SELECT
{
  CASE RECNAME(CDCIN)
      WHEN 'EMP' CALLPROC(P_EMP)
      WHEN 'DEPT' REPLICATE(TARGET, DEPT)
}
FROM CDCIN;

Example 2

Switching the Output format to AVRO, including a Schema Repository, requires only the following elements of the script above to be added or altered:
------------------------------------------------------------
--       SCHEMA ID LIST
------------------------------------------------------------
SCHEMA ID FOR
  CDC HEADER 1
,EMP <employee_Schema_ID>
,DEPT <department_Schema_IDn>

----------------------------------------------------------------------
--       Target Datastore(s)
----------------------------------------------------------------------
DATASTORE hdfs://%(THOST):%(TPORT)/*.dat
        OF AVRO                           -- specify AVRO format
        AS TARGET
        DESCRIBED BY GROUP DB2_SOURCE     -- use source for REPLICATE
        STAGING SIZE 4G                   -- file size 4G max
        STAGING DELAY 60                 -- rotate every 60 minutes