The following examples replicate DB2 changed data (CDC) for the IVP EMP and DEPT tables into HDFS in JSON or AVRO format. An example of the JSON output can be seen above in Determine HDFS Output Format. The example also includes a filter for the EMPLOYEE table, only updates to employees with a bonus over $5,000 will cause the record to be written to HDFS. All changes to the DEPT table are Replicated with no filter applied.
Example 1
----------------------------------------------------------------------
-- Name: DB2TOHDFS z/OS DB2 To HDFS on Linux
-- Client/Project: client/project
----------------------------------------------------------------------
-- SUBSTITUTION PARMS USED IN THIS SCRIPT:
-- %(ENGINE) - ENGINE Name
-- %(SHOST) - Source HOST of the Capture/Publisher
-- %(SPORT) - Source HOST SQDaemon PORT
-- %(PUBNM) - Source Capture/Publisher Agent Name
-- %(THOST) - Target HDFS HOST
-- %(TPORT) - Target HDFS HOST PORT
----------------------------------------------------------------------
-- Change Log:
----------------------------------------------------------------------
-- 2017-06-01 INITIAL RELEASE
----------------------------------------------------------------------
JOBNAME DB2TOHDFS;
OPTIONS
CDCOP('I','U','D') -- Set CHANGE OP Constants
,USE AVRO COMPATIBLE NAMES
;
----------------------------------------------------------------------
-- Data Definition Section
----------------------------------------------------------------------
----------------------------------------------------------------------
-- Source Descriptions
----------------------------------------------------------------------
BEGIN GROUP DB2_SOURCE;
DESCRIPTION DB2SQL ./DB2DDL/EMP.ddl AS EMP;
DESCRIPTION DB2SQL ./DB2DDL/DEPT.ddl AS DEPT;
END GROUP;
-----------------------------------------------------------------------
-- Target Descriptions
----------------------------------------------------------------------
----------------------------------------------------------------------
-- Datastore Section
----------------------------------------------------------------------
----------------------------------------------------------------------
-- Source Datastore
----------------------------------------------------------------------
DATASTORE cdc://%(SHOST):%(SPORT)/%(PUBNM)/%(ENGINE)
OF UTSCDC
AS CDCIN
DESCRIBED BY GROUP DB2_SOURCE
;
----------------------------------------------------------------------
-- Target Datastore(s)
----------------------------------------------------------------------
DATASTORE hdfs://%(THOST):%(TPORT)/*.dat
OF JSON -- specify JSON format
AS TARGET
DESCRIBED BY GROUP DB2_SOURCE -- use source for REPLICATE
STAGING SIZE 4G -- file size 4G max
STAGING DELAY 60 -- rotate every 60 minutes
;
----------------------------------------------------------------------
-- Field Specification Section
----------------------------------------------------------------------
DATEFORMAT 'ISO';
----------------------------------------------------------------------
-- Procedure Section
----------------------------------------------------------------------
CREATE PROC P_EMP AS SELECT
{
IF EMP.BONUS > '5000'
{
REPLICATE(TARGET, EMP)
}
}
FROM CDCIN;
----------------------------------------------------------------------
-- Main Section
----------------------------------------------------------------------
PROCESS INTO TARGET
SELECT
{
CASE RECNAME(CDCIN)
WHEN 'EMP' CALLPROC(P_EMP)
WHEN 'DEPT' REPLICATE(TARGET, DEPT)
}
FROM CDCIN;
Example 2
Switching the Output format to AVRO, including a Schema Repository, requires only the following elements of the script above to be added or altered:
------------------------------------------------------------
-- SCHEMA ID LIST
------------------------------------------------------------
SCHEMA ID FOR
CDC HEADER 1
,EMP <employee_Schema_ID>
,DEPT <department_Schema_IDn>
----------------------------------------------------------------------
-- Target Datastore(s)
----------------------------------------------------------------------
DATASTORE hdfs://%(THOST):%(TPORT)/*.dat
OF AVRO -- specify AVRO format
AS TARGET
DESCRIBED BY GROUP DB2_SOURCE -- use source for REPLICATE
STAGING SIZE 4G -- file size 4G max
STAGING DELAY 60 -- rotate every 60 minutes