Connect CDC (SQData) Change Data Capture was designed to be both robust and forgiving in that it takes a simple and conservative approach when deciding where in the log to start capturing data when first started or upon being restarted following either a scheduled interruption of the capture or an unplanned disruption in a production environment.
Unlike Db2 z/OS , the first time the UDB (Db2/LUW) capture agent is started it will use the oldest available archived Log as starting point, unless the LSN is explicitly specified.
Regardless of the LSN specified, there are likely to be transactions that started prior to the selected LSN, but that have not been committed or rolled back. These are referred to as in-flight transactions. The capture agent keeps track of these in-flight transactions, and therefore, in case of a normal restart, knows exactly how far back in time to re-mine to guarantee the re-capture of all in-flight transactions. On the initial start however, the capture agent does not have any such indication so the recommended behavior is to choose the current LSN as the start point for mining the Log. If a log record belongs to a transaction for which the begin transaction record has not been seen by the capture agent, the record is considered an orphan record. If the transaction is committed after the start LSN or if the transaction is rolled back, the orphaned records of that transaction are voided (that is not counted as orphan record). Therefore it is possible that the statistics, immediately after a re-start, show some orphan records, but does not show them later.
It is recommended to use an LSN close to the "Current LSN" ensuring that transactions committed prior to that point are ignored.
The safe approach for initiating capture the first time involves the following steps:
- Schedule implementation for a quiet point.
- Determine the "current" LSN as a starting point by following the procedure and examples below.
- Setting the LSN at the global capture agent level (i.e. all tables) using the SQDCONF modify command.
- Start the capture agent.
Example 1
Set the LSN at the capture agent level (i.e. all tables) with the SQDCONF modify command to start at the "Current LSN".
- Run the Db2pd command to determine the current log number, for example:
$ Db2pd -log -db <database_name>
Database Member 0 -- Database SAMPLE -- Active -- Up 0 days 00:27:02 -- Date 2018-05-23-14.49.45.515600 Logs: Current Log Number 2 Pages Written 19 Cur Commit Disk Log Reads 0 Cur Commit Total Log Reads 0 Method 1 Archive Status n/a Method 1 Next Log to Archive n/a Method 1 First Failure n/a Method 2 Archive Status n/a Method 2 Next Log to Archive n/a Method 2 First Failure n/a Log Chain ID 0 Current LSO 57142478 Current LSN 0x000000000004FB14 Address StartLSN StartLSO State Size Pages Filename 0x00007F164E99F090 00000000000429EC 48912001 0x00000000 1000 1000 S0000000.LOG 0x00007F16511319F0 000000000004901E 52988001 0x00000000 1000 1000 S0000001.LOG 0x00007F1651132350 000000000004F9A6 57064001 0x00000000 1000 1000 S0000002.LOG
Note: The most recent (current) log is number 2. - Run the Db2flsn command to determine the LRI / LSN range of that log file:
$ Db2flsn -db SAMPLE -lrirange 2
S0000002.LOG: has LRI range 00000000000000010000000000002254000000000004F9A6 to 000000000000000100000000000022CC000000000004FB13
- Run the sqdconf utility (or edit the shell script used to run the utilitiy) to modify the capture configuration and start the capture:
$ sqdconf modify /<SQDATA_VAR_DIR>/udbcdc/udbcdc.cab --lsn=00000000000022CC:000000000004FB13 $ sqdconf apply /<SQDATA_VAR_DIR>/udbcdc/udbcdc.cab $ sqdconf start /<SQDATA_VAR_DIR>/udbcdc/udbcdc.cab --safe-restart=00000000000022CC:000000000004FB13
Example 2
- Run GetStartLSN.sh
#!/bin/bash DATABASE="${1:-sample}" save_clnum="" db2pd -log -db $DATABASE > /tmp/db2.log.output_$$ while IFS= read -r line; do if [ ! -z "$line" ]; then fields=($(printf "%s" "$line"|cut -d' ' --output-delimiter=' ' -f1-)) if [ "${fields[0]}" == "Current" ]; then if [ "${fields[1]}" == "Log" ]; then save_clnum="${fields[3]}" fi fi fi done < '/tmp/db2.log.output_'$$ #cat /tmp/db2.log.output_$$ db2flsn -db $DATABASE -lrirange ${save_clnum} | (read -r lri; flds=($(printf "%s" "$lri"|cut -d' ' --output-delimiter=' ' -f1-)) lsn=${flds[6]:16:16}":"${flds[6]:32:16} echo " --lsn="$lsn echo "--safe-restart="$lsn ) rm /tmp/db2.log.output_$$
- Perform step 3 in Example 1 to modify the capture configuration and start the capture.
- Both the manual procedure and shell script examples use commands that will likely require DBA permissions because they can only be run on the DB server and the second requires permission to access the log directory/files. If too much time elapses between running these commands and your using them the logs will likely have rolled or been archived in your environment making the number's too old to use.
- Once the starting point is identified it must be specified in the capture configuration file. The initial starting LSN should be set at a global capture agent level (i.e. all tables in the configuration file). Once the LSN is established, it is typically not altered for normal operation. The capture agent continuously updates the global LSN and the individual table LSNs in the configuration file as changed data is being processed. Each time the capture agent starts, data capture is resumed from the last LSN processed.
- A colon (:) must be added to the middle of the two part LRI/LSN.
- If this is the first time the capture has been started after "creating" the capture configuration (.cab) file, then an initial "sqdconf apply" must be run following the "sqdconf mount" command and before issuing the first "sqdconf modify" command.
- A similar procedure would be used in other situations, such as a point-in-time recovery, where you go back and recapture the changes made from a few hours earlier. The appropriate LSN would be determined using the Db2pd and Db2flsn commands.