Validating data integrity

Assure MIMIX for AIX Guide

Product type: Software
Portfolio: Integrate
Product family: Assure
Product: Assure MIMIX™ Software for AIX
Version: 6.0
Locale: en-US
Product name: Assure MIMIX for AIX
Copyright: 2025
First publish date: 2003
Last edition: 2025-03-12
Last publication: 2025-03-12T05:07:12.494000

Validating the data integrity of the replica is critical. Before you perform the failover operations, validate the data integrity of the replica on the recovery server and, if necessary, restore the replica to an earlier validated point. To validate the data, create a snapshot of the replica and then analyze the snapshot with the application itself.
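
Steps 1 through 4 below form a repeatable cycle: clean up any old snapshot, create a new one from the current redo log, and mount it for analysis. The following is a minimal POSIX shell sketch of that cycle, assuming the same Primary Context ID is passed to every command (steps 1, 2, and 4 write `<Context ID>`, step 3 writes `<Primary Context ID>`). The names `SCRT_BIN`, `DRY_RUN`, `run_step`, `make_validation_snapshot`, and `release_validation_snapshot` are hypothetical; only the `/usr/scrt/bin` commands and their flags come from this guide. Analyzing the mounted snapshot with your application remains a manual step.

```shell
#!/bin/sh
# Minimal sketch of the snapshot validation cycle (steps 1-4).
# SCRT_BIN, DRY_RUN and the function names are hypothetical; the commands
# and flags are the ones shown in the procedure.
SCRT_BIN=${SCRT_BIN:-/usr/scrt/bin}

# Print each command; run it only when DRY_RUN is not set to 1.
run_step() {
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        return 0
    fi
    "$@"
}

# Steps 1-4 for one context ID.
make_validation_snapshot() {
    ctx=$1
    # Steps 1-2: clean up earlier snapshots. A non-zero status is only
    # reported, because there may be nothing mounted or nothing to remove
    # (assumption: how these commands report that case is not documented here).
    run_step "$SCRT_BIN/rtumnt" -C "$ctx" || echo "warning: rtumnt exited with $?" >&2
    run_step "$SCRT_BIN/scrt_ra" -C "$ctx" -W || echo "warning: scrt_ra -W exited with $?" >&2
    # Step 3: snapshot based on the current redo log; step 4: mount it.
    run_step "$SCRT_BIN/scrt_ra" -C "$ctx" -X &&
        run_step "$SCRT_BIN/rtmnt" -C "$ctx"
}

# After analysis: unmount the snapshot filesystems, then remove the snapshot.
release_validation_snapshot() {
    ctx=$1
    run_step "$SCRT_BIN/rtumnt" -C "$ctx" &&
        run_step "$SCRT_BIN/scrt_ra" -C "$ctx" -W
}
```

After sourcing the script, set `DRY_RUN=1` and run `make_validation_snapshot 1` to preview the commands. Without `DRY_RUN`, run `make_validation_snapshot 1`, analyze the data on the mounted snapshot filesystems with your application, and then run `release_validation_snapshot 1`.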

Follow these steps to perform the Unplanned Failover/Resync/Failback operations:

  1. On the recovery server, make sure all snapshot filesystems are unmounted before trying to release the snapshot.

    /usr/scrt/bin/rtumnt -C <Context ID>
  2. Make sure no snapshots already exist on the recovery server by removing any existing snapshot:

    /usr/scrt/bin/scrt_ra -C <Context ID> -W

  3. On the recovery server, create a snapshot based on the current redo log so that you can validate the data integrity. Enter the following command:

    /usr/scrt/bin/scrt_ra -C <Primary Context ID> -X

    You should see output similar to the following:

    Making SNAP /dev/rsnc1lif_bk_1, 66.306
    Making SNAP /dev/rsnc1lif_bk_2, 66.310
    Making SNAP /dev/rsnc1lif_bk_3, 66.314
    Making SNAP /dev/rsnc1dbmf_bk_1, 66.318
    Making SNAP /dev/rsntestlv, 66.450
    Making SNAP /dev/rsnrtlog, 66.454
    Making SNAP /dev/rtestlv, 66.448
    Making SNAP /dev/rrtlog, 66.452
    Snap Devs (read only, raw)  Minor  rj  wj  Snap level  Vdev level  Vdevs (read/write, block)
    --------------------------  -----  --  --  ----------  ----------  -------------------------
    /dev/rsnc1lif_bk_1          304    ON  --  10          0           N/A
    /dev/rsnc1lif_bk_2          308    ON  --  10          0           N/A
    /dev/rsnc1lif_bk_3          312    ON  --  10          0           N/A
    /dev/rsnc1dbmf_bk_1         316    ON  --  10          0           N/A
    /dev/rsntestlv              448    ON  ON  10          10          /dev/testlv
    /dev/rsnrtlog               452    ON  ON  10          10          /dev/rtlog
  4. Mount the snapshot filesystems on the recovery server. Enter the following command:

    /usr/scrt/bin/rtmnt -C <Context ID>

    You should see output similar to the following:

    Determining Filesystems to mount...
    fsck -fp -y /dev/rtestlv
    log redo processing for /dev/rtestlv
    syncpt record at 7028
    end of log 7028
    syncpt record at 7028
    syncpt address 7028
    number of log records = 1
    number of do blocks = 0
    number of nodo blocks = 0
    /dev/rtestlv (/test): ** Unmounted cleanly - Check suppressed
    Mounting /test...

    - To validate the data, analyze it with your application.

    - If the analysis indicates that the data is valid, use rtumnt to unmount the snapshot filesystems and scrt_ra to remove the snapshot:

      /usr/scrt/bin/rtumnt -C <Context ID>

      /usr/scrt/bin/scrt_ra -C <Context ID> -W

      Then proceed to Step 6 to fail over to the latest point in the data.

    - If the analysis indicates data corruption, use rtumnt to unmount the snapshot filesystems and scrt_ra to remove the snapshot:

      /usr/scrt/bin/rtumnt -C <Context ID>

      /usr/scrt/bin/scrt_ra -C <Context ID> -W

      Use the following command to display the available rollback windows:

      /usr/scrt/bin/scrt_ra -C <Context ID> -vpe

      Then create a snapshot based on one of the available rollback points (an LFC, an epoch time, a date and time, or an event mark) to locate and validate an optimal restore point.

      After you locate an optimal restore point, unmount the snapshot filesystems and remove the snapshot. Then proceed to Step 5 to back up the replica, or to Step 6 to roll back the replica and fail over.

  5. On the recovery server, if you have TSM or SysBack, back up the replica. Keeping complete copies of the data on archive media such as tape provides additional data protection. For details, refer to Working with Archived Data.

  6. On the recovery server, depending on the results from Step 4, either roll back the replica to the validated rollback location before failing over, or fail over using the current redo log.

    To roll back the replica to the validated rollback location from Step 4, use:

    scrt_ra -C <Primary Context ID> -F [-t <LFC> | -S <epoch time> | -D "<date and time>"]

    For example, to restore the replica to a previously validated LFC:

    scrt_ra -C <Primary Context ID> -F -t 70250

    For example, to restore the replica to a previously validated epoch time (1169584788 seconds since 00:00:00 UTC on January 1, 1970):

    scrt_ra -C <Primary Context ID> -F -S 1169584788

    For example, to restore the replica to a previously validated date and time:

    scrt_ra -C <Primary Context ID> -F -D "05/15/09 09:33:40"

    For example, to restore the replica to a previously validated event mark, specify the epoch time of the event mark (1389727288):

    scrt_ra -C <Primary Context ID> -F -S 1389727288
  7. On the recovery server, fail over the replication group:

    rtdr -C <Primary Context ID> failover

    Note: At this point the configured recovery server has become the new production server and the configured production server has become the new recovery server.
  8. On the new production server, start your application.

    Note: At this point your application is running on the new production server and both configured production servers are down. When one or both configured production servers come back online, PowerHA starts and brings the resource groups online on one of the configured production servers.
  9. On the configured production server, use PowerHA Stop Cluster Services to unmanage the resource groups. Select the following values:

    Stop now, on system restart or both     [now]

    Stop Cluster Services on these nodes    [both nodes]

    Select an Action on Resource Groups     [Unmanage Resource Groups]

  10. On the configured production server, stop your application.
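
The rollback forms in step 6 differ only in the flag that follows `-F`: `-t` takes an LFC number, `-S` takes an epoch time (the event mark example also uses `-S`), and `-D` takes a date and time. The sketch below, assuming POSIX shell and using the hypothetical names `rollback_replica`, `run_step`, `SCRT_BIN`, and `DRY_RUN`, checks the value against the shapes shown in the step 6 examples before calling `scrt_ra`. The `MM/DD/YY HH:MM:SS` pattern is inferred from the single documented date example and may be stricter than what `scrt_ra` actually accepts.

```shell
#!/bin/sh
# Minimal sketch: choose the step 6 rollback flag from the kind of restore
# point and validate the value first. Names other than scrt_ra are hypothetical.
SCRT_BIN=${SCRT_BIN:-/usr/scrt/bin}

# Print each command; run it only when DRY_RUN is not set to 1.
run_step() {
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        return 0
    fi
    "$@"
}

# Usage: rollback_replica CONTEXT_ID {lfc|epoch|date} VALUE
rollback_replica() {
    ctx=$1 kind=$2 value=$3
    case $kind in
        lfc|epoch)
            # LFC numbers and epoch times (including event marks) are integers.
            case $value in
                ''|*[!0-9]*) echo "invalid $kind value: $value" >&2; return 1 ;;
            esac
            if [ "$kind" = lfc ]; then flag=-t; else flag=-S; fi
            ;;
        date)
            # Shape of the documented example, e.g. "05/15/09 09:33:40".
            case $value in
                [0-9][0-9]/[0-9][0-9]/[0-9][0-9]\ [0-9][0-9]:[0-9][0-9]:[0-9][0-9]) flag=-D ;;
                *) echo "invalid date value: $value" >&2; return 1 ;;
            esac
            ;;
        *)
            echo "unknown restore point kind: $kind" >&2
            return 1
            ;;
    esac
    # The value is passed as one argument, so a date with a space stays intact.
    run_step "$SCRT_BIN/scrt_ra" -C "$ctx" -F "$flag" "$value"
}
```

For example, `rollback_replica 1 date '05/15/09 09:33:40'` runs `scrt_ra -C 1 -F -D` with the date as a single argument. To see which wall-clock time an epoch value refers to before rolling back, `perl -e 'print scalar localtime(1169584788), "\n"'` prints it in the server's local time zone.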

    Unplanned failover - resync operation

    A resync operation is required when the Production Volumes and Recovery Replica Volumes diverge. This occurs after a failover to the recovery server.

    When the application is started on the new production server, its updates to the Replica Volumes cause them to diverge from the data on the new recovery server.

  11. On the new recovery server, execute the resync command to start replication:

    /usr/scrt/bin/rtdr -C <Primary Context ID> resync

    Answer “y” to the “Do you wish to” questions:

    scsetup: You have requested failover processing.
    scsetup: Do you wish to continue? [y|n]
    !!! RESET WARNING !!!
    You have requested an LCA reset.
    All outstanding sealed LFCs will be dumped.
    Do you wish to do this? [y|n]
  12. On the new recovery server, verify that the resync process completed.

    Example:

    -- Failover Context ID <Failover ContextID> is enabled and ready for re-sync. ---

  13. On the new production server, execute the resync command to start replication on the new production server.

    /usr/scrt/bin/rtdr -C <Primary Context ID> resync

  14. On the new production server, verify that the resync process completed.

    Example:

    --- System re-sync is activated. ---

    Note: At this point your application is running on the new production server and data is being replicated to the new recovery server.
    Important: If the failover from the production server to the recovery server was necessary because all protected data was lost on the production server, the data must be restored to the production server before you perform a failback.
  15. On the new production server, use the scconfig command to resync all protected data from the new production server to the new recovery server.

    /usr/scrt/bin/scconfig -B -G -C <Failover Context ID>
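
Steps 12 and 14 ask you to confirm completion from the command output. If you record the session, for example by running `script /tmp/resync.log` before `rtdr ... resync` and typing `exit` afterward, a small check such as the sketch below can search the recorded log. The function name `resync_completed` and the log path are hypothetical, and the message text is copied from the examples in steps 12 and 14, so it may differ in other releases.

```shell
#!/bin/sh
# Minimal sketch: look for the resync completion messages shown in steps 12
# and 14. The function name is hypothetical; the message text comes from the
# examples in this guide and may differ between releases.

# Usage: resync_completed {recovery|production} LOGFILE
resync_completed() {
    role=$1 logfile=$2
    case $role in
        recovery)   pattern='is enabled and ready for re-sync' ;;   # step 12
        production) pattern='System re-sync is activated' ;;        # step 14
        *) echo "unknown role: $role" >&2; return 2 ;;
    esac
    # Fixed-string, quiet search: exit status 0 only if the message is present.
    grep -F -q -- "$pattern" "$logfile"
}
```

For example, on the new recovery server, `resync_completed recovery /tmp/resync.log && echo 'resync ready'` prints the confirmation only when the step 12 message was recorded.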

    Unplanned failover - failback operation

  16. Before performing the failback operation, ensure that you stop your application on the new production server.

  17. On the new production server, use rtumnt to unmount the protected file systems and transfer any current LFC data to the recovery server:

    /usr/scrt/bin/rtumnt -FC<FailoverContextID>

  18. On the new production server, verify that the protected file systems are unmounted and that data synchronization completed successfully.

    Example:

    Determining Filesystems to unmount...
    Unmounting /dev/lvFS_1_C1 from /FS_1_C1...
    Sync: transferring current LFC to Recovery Server
    Waiting for synchronization of data to complete.
    All data has been synchronized to the Recovery Server.

  19. On the new production server, use rtstop to unmount the protected file systems, transfer any current LFC data to the recovery server, and unload the Assure MIMIX for AIX production server drivers:

    /usr/scrt/bin/rtstop -FSC<FailoverContextID>
  20. On the new production server, verify that data synchronization completed successfully.

    Example:

    No mounted filesystems
    Sync: transferring any current LFC data to Recovery Server
    Waiting for synchronization of data to complete.
    All data has been synchronized to the Recovery Server.
    Stopping scrt_lca................
    Unloading Assure MIMIX  DR for AIX Production Server Drivers
  21. On the new recovery server, execute the failback command to start the failback process. Wait for failback to complete successfully before you perform failback on the new production server.

    /usr/scrt/bin/rtdr -C <Primary Context ID> failback

  22. On the new recovery server, verify that the failback process completed.

    Example:

    Assure MIMIX  DR for AIX Production Server Drivers are already loaded for Context ID 1.
    Starting scrt_lca
    fsck -fp -y /dev/rlvFS_1_C1
    The current volume is: /dev/lvFS_1_C1
    Primary superblock is valid.
    Mounting  /FS_1_C1...
    --- Primary Context ID <1> is enabled. ---
  23. On the new production server, execute the failback command to start the failback process. Wait for failback to complete successfully before you start your application.

    /usr/scrt/bin/rtdr -C <Failover Context ID> failback
  24. On the new production server, verify that the failback process completed.

    Example:

    Assure MIMIX  DR for AIX Recovery Server Drivers are already loaded for Context ID 1.
    Starting scrt_aba
    --- Primary Context ID <1> is enabled. ---
    Note: At this point the production and recovery servers have returned to their original roles and replication is active.
  25. On the production server, start your application.

  26. Use PowerHA Start Cluster Services to manage the resource groups. Select the following values:

    Start now, on system restart or both    [both]

    Start Cluster Services on these nodes   [both nodes]

    Manage Resource Groups                  [Automatically]

Note: This concludes the Unplanned Failover Operations (Failover, Resync, and Failback).
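
Steps 20, 22, and 24 rely on specific lines in the command output, and step 21 should not start until the step 19 output shows that all data has been synchronized. The sketch below assumes the output of each step was saved to a file (for example, with the `script` command). The function names `data_synchronized`, `context_enabled`, and `ready_for_failback` are hypothetical, and the message text is copied from the examples in steps 20, 22, and 24, so it may differ in other releases.

```shell
#!/bin/sh
# Minimal sketch: check saved failback output for the messages shown in
# steps 20, 22 and 24. Function names are hypothetical; message text is
# copied from the examples in this guide.

# Step 20: rtstop on the new production server reported full synchronization.
data_synchronized() {
    grep -F -q 'All data has been synchronized to the Recovery Server.' "$1"
}

# Steps 22 and 24: failback reported that the given Primary Context ID is enabled.
# Usage: context_enabled LOGFILE CONTEXT_ID
context_enabled() {
    grep -F -q -- "--- Primary Context ID <$2> is enabled. ---" "$1"
}

# Gate for step 21: refuse to continue until the step 19 output confirms the sync.
# Usage: ready_for_failback RTSTOP_LOGFILE
ready_for_failback() {
    if data_synchronized "$1"; then
        echo "synchronization confirmed; continue with step 21"
    else
        echo "synchronization not confirmed in $1; do not start failback" >&2
        return 1
    fi
}
```

For example, run `ready_for_failback /tmp/rtstop.log` against the saved step 19 output before you execute the failback command in step 21, and `context_enabled /tmp/failback.log 1` after steps 21 and 23.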