MIMIX includes functions that can assist you in resolving a variety of problems that could not be automatically corrected. Depending on the type of problem, some problem resolution tasks may need to be performed from the system where the problem occurs, such as on the source system where the journal resides or on the target system if the problem is related to the apply process. MIMIX will direct you to the correct system when this is required.
Object activity: The Work with Data Group Activity (WRKDGACT) command allows you to track system journal replication activity associated with a data group. You can see the object, DLO, IFS, and spooled file activity, which can help you determine the cause of an error. You can also see an error view that identifies the reason why the object is in error. Options on the Work with Data Group Activity display allow you to see messages associated with an entry, synchronize the entry between systems, and remove a failed entry with or without related entries.
Failed object activity requests: During normal processing, system journal replication processes may encounter object replication requests that cannot be processed due to an error and cannot be automatically recovered. Often the error is due to a transient condition, such as when an object is in use by another process at the time the object retrieve process attempts to gather the object data. Although MIMIX will attempt some automatic retries, requests may still result in a Failed status. In many cases, failed entries can be resubmitted and they will succeed. Some errors may require user intervention, such as a never-ending process that holds a lock on the object.
When the Automatic object recovery policy is enabled, MIMIX will attempt a third retry cycle using the settings from the Number of third delay/retries (OBJRTY) and Third retry interval (min.) (OBJRTYITV) policies. These policies can be set for the installation or adjusted for a specific data group.
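The retry behavior described above can be pictured as a sequence of retry cycles, each with its own number of attempts and delay. The following is a minimal sketch only, not MIMIX code: the `RetryCycle` type, the `process_with_retries` function, the status labels, and every count and interval shown are hypothetical values chosen for illustration, and it assumes that cycles run in order until one attempt succeeds.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RetryCycle:
    """One retry cycle (hypothetical model): attempt count and delay before each attempt."""
    attempts: int
    interval_seconds: float


def process_with_retries(
    request: Callable[[], bool],
    cycles: List[RetryCycle],
    sleep: Callable[[float], None],
) -> str:
    """Run the request once, then work through each retry cycle in order.

    Returns "COMPLETED" as soon as an attempt succeeds, or "FAILED" once
    every cycle has been exhausted.
    """
    if request():
        return "COMPLETED"
    for cycle in cycles:
        for _ in range(cycle.attempts):
            sleep(cycle.interval_seconds)  # wait before retrying
            if request():
                return "COMPLETED"
    return "FAILED"


def make_transient_request(failures_before_success: int) -> Callable[[], bool]:
    """Build a fake request that fails a fixed number of times, like an object in use."""
    remaining = [failures_before_success]

    def request() -> bool:
        if remaining[0] > 0:
            remaining[0] -= 1
            return False
        return True

    return request


# Illustrative values only; real settings come from your own configuration.
cycles = [RetryCycle(2, 5), RetryCycle(2, 300), RetryCycle(3, 1200)]

waits: List[float] = []  # record the delays instead of actually sleeping
status = process_with_retries(make_transient_request(5), cycles, waits.append)

# A request that never succeeds (for example, a lock that is never released).
stuck_status = process_with_retries(lambda: False, cycles, lambda seconds: None)
```

In this sketch the transient request succeeds on the first attempt of the third cycle, while the request that never succeeds ends with a failed status and would need user intervention.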
You can manually request that MIMIX retry processing for a data group activity entry that has a status of *FAILED. These entries can be viewed using the Work with Data Group Activity (WRKDGACT) command. From the Work with Data Group Activity or Work with Data Group Activity Entries displays, you can use the retry option to resubmit individual failed entries or all of the entries for an object. This option calls the Retry Data Group Activity Entries (RTYDGACTE) command. From the Work with Data Group Activity display, you can also specify a time at which to start the request, thereby delaying the retry attempt until a time when it is more likely to succeed.
Files on hold: When the database apply process detects a data synchronization problem with a file or member, it logs the problem in an internal database. The replication manager evaluates the logged problem and initiates a request to perform an automatic recovery action. If the recovery action fails to correct the problem or cannot be performed, it generates an error condition of “held due to error” that appears in the status of the affected data group file entry. For as long as the error condition exists, replication activity for the identified file or member is held and is not applied to the target system. You need to analyze the cause of the problem to determine how to correct it, release the file, and ensure that the problem does not occur again.
An option on the Work with Data Groups display provides quick access to the subset of file entries that are in error for a data group. From the Work with DG File Entries display, you can see the status of an entry and use a number of options to assist in resolving the error. An alternative view shows the database error code and journal code. Available options include access to the Work with DG Files on Hold (WRKDGFEHLD) command. The WRKDGFEHLD command allows you to work with file entries that are in a held status. When this option is selected from the target system, you can view and work with the entry for which the error was detected and work with all other entries following the entry in error.
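To illustrate why a held file accumulates entries behind the one in error, here is a simplified sketch of the hold behavior. It is an assumption-laden model, not the MIMIX apply process: the `apply_entries` function, the `(sequence, file name)` entry shape, and the file names are all hypothetical, and recovery is reduced to a callback that returns whether the problem was corrected.

```python
from typing import Callable, Dict, List, Tuple

Entry = Tuple[int, str]  # (sequence number, file name); hypothetical shape


def apply_entries(
    entries: List[Entry],
    apply: Callable[[Entry], bool],
    recover: Callable[[Entry], bool],
) -> Tuple[List[Entry], Dict[str, List[Entry]]]:
    """Apply entries in order, holding a file once an error cannot be recovered.

    Returns the applied entries and, per held file, the entry in error
    followed by every later entry for that file.
    """
    applied: List[Entry] = []
    held: Dict[str, List[Entry]] = {}
    for entry in entries:
        _, file_name = entry
        if file_name in held:
            # The file is already held: queue the entry behind the one in error.
            held[file_name].append(entry)
            continue
        if apply(entry) or recover(entry):
            applied.append(entry)
        else:
            # Recovery failed or was not possible: the file becomes held.
            held[file_name] = [entry]
    return applied, held


entries = [(1, "ORDERS"), (2, "CUSTOMERS"), (3, "ORDERS"), (4, "CUSTOMERS")]
applied, held = apply_entries(
    entries,
    apply=lambda entry: entry != (1, "ORDERS"),  # entry 1 hits a sync problem
    recover=lambda entry: False,                 # automatic recovery does not succeed
)
```

Here activity for `CUSTOMERS` continues to be applied, while `ORDERS` holds the entry in error and the entry that follows it until the problem is corrected and the file is released.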
Advanced problem resolution support: If you use the Assure MIMIX portal application from the Assure UI portal, you can access much more information about problems detected in replication processes and about automatic recovery actions, and you can use preferred options to manually address errors. For additional details, see Support for advanced analysis.
Journal analysis: With user journal replication, when the system that is the source of replicated data fails, it is possible that some of the generated journal entries may not have been transmitted to or received by the target system. However, it is not always possible to determine this until the failed system has been recovered. Even if the failed system is recovered, damage to a disk unit or to the journal itself may prevent an accurate analysis of any missed data. Once the source system is available again, if there is no damage to the disk unit or journal and its associated journal receivers, you can use the journal analysis function to help determine what journal entries may have been missed and to which journaled files, data areas, data queues, and IFS objects the data belongs. You can only perform journal analysis on the system where a journal resides.
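Conceptually, this kind of analysis compares the entries in the recovered source journal with the last entry known to have reached the target. The sketch below shows only that idea and does not reflect how the journal analysis function is implemented: the `JournalEntry` type, the `missed_entries_by_object` function, the object names, and the sequence numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class JournalEntry(NamedTuple):
    sequence: int      # position of the entry in the journal
    object_name: str   # journaled file, data area, data queue, or IFS object


def missed_entries_by_object(
    source_entries: Iterable[JournalEntry],
    last_received_sequence: int,
) -> Dict[str, List[int]]:
    """Group the sequence numbers the target never received by journaled object."""
    missed: Dict[str, List[int]] = defaultdict(list)
    for entry in source_entries:
        if entry.sequence > last_received_sequence:
            missed[entry.object_name].append(entry.sequence)
    return dict(missed)


# Hypothetical journal contents read from the recovered source system.
source_journal = [
    JournalEntry(101, "ORDERS"),
    JournalEntry(102, "INVENTORY"),
    JournalEntry(103, "ORDERS"),
    JournalEntry(104, "/home/app/config.json"),
]

# Assume the target had received everything up to and including entry 102.
missed = missed_entries_by_object(source_journal, last_received_sequence=102)
```

The result maps each affected object to the entries that may need attention, which is only possible when the source journal and its receivers are intact.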