Support for automatically detecting and resolving problems - assure_mimix - 10.0

Assure MIMIX Administrator Reference

The following processes automatically detect changes within your replication environment. When the default configuration settings that control automatic recovery actions are in use, these processes also attempt to correct any problems they find.

  • Replication processes: MIMIX automatically checks for common problems during user journal and system journal replication and attempts to automatically correct them. A replication error is not reported for the affected file or object unless the attempted recovery action fails or is prevented by policy values.

  • Target journal inspection: Target journal inspection processes check for objects changed on the target by users or programs other than MIMIX and notify you when changes are detected. MIMIX automatically attempts to correct the detected changes unless prevented from doing so by policies.

  • Audits: MIMIX ships with a set of audits which are scheduled to run automatically and check for common problems within data groups. Any detected difference is automatically corrected unless policies explicitly prevent audit recovery actions. Audits that automatically check all configured objects associated with their audit class run on a weekly basis. Audits that automatically check a subset of replicated objects selected by priorities run every day during a specific time. Details of scheduling and priority eligibility can be customized. Audits can be invoked manually as well. You have control over other aspects of audit runtime behavior, including optionally disabling automatic recovery.

  • Virtual switch: During a virtual switch procedure, MIMIX tracks which objects are changed on the target node during testing. When testing is complete, the virtual switch procedure submits recovery actions to remove the tracked changes and return target node objects to their pre-test state. These virtual switch recovery actions are performed regardless of the values of the automatic database and object recovery policies.

Problems can be detected for library-based objects, directory-based objects, and folder-based objects. The types of problems detected are associated with creating or deleting an object or changing an object's authorities, attributes, or data. More than one type of error can exist for an object. MIMIX tracks all detected problems and may initiate recovery actions that address multiple problems. If recovery actions cannot correct the problem or if automatic recovery policies are disabled, you must perform manual recovery.

Recoveries for problems detected by audits are submitted immediately after the audit compare phase completes. Most audit recoveries are processed by submitting a recovery request to the replication manager, which determines how the request is processed. A limited number of audit recoveries are processed within the audit job.

Replication Manager: Recoveries for problems detected by audits and other processes are initiated by the MIMIX replication manager (MXREPMGR). The replication manager is a persistent job in the MIMIXSBS subsystem. For recovery processing, the persistent job periodically checks the internal database for any objects identified as needing correction. On the source system, an additional persistent job in the subsystem, the replication manager recovery controller job (MXRPMCTLR), routes recovery requests to multiple recovery handler jobs (MXRPMHDLR), which perform the recovery actions. The controller job on the source system determines how to resolve the identified problems for an object and whether to perform a synchronize operation for a recovery request or to route the request through object replication for processing. The replication manager also prioritizes the object for the next priority-based audit. Recovery processing policies determine how many handler jobs can run on a source system and how many retry attempts can be made when processing a recovery.
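The effect of a retry limit on a single recovery can be pictured with a minimal sketch. This is an illustration only, not MIMIX code: the function name `run_with_retries` and the parameter `max_retries` are hypothetical, and the sketch assumes that the retry count is in addition to the first attempt, which this topic does not state.

```python
from typing import Callable


def run_with_retries(action: Callable[[], bool], max_retries: int) -> bool:
    """Run a recovery action once, then retry up to max_retries more times.

    Hypothetical helper: 'action' returns True when the recovery succeeds.
    Assumption: the retry limit does not count the initial attempt.
    """
    for _ in range(1 + max_retries):
        if action():
            return True
    # All attempts failed; in MIMIX terms the recovery would be reported
    # as an error condition for the affected object.
    return False
```

In this sketch, a recovery that succeeds on its third attempt is treated as successful when `max_retries` is 2 or higher and as failed otherwise.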

The replication manager initiates recoveries only for objects (file members, library-based objects, directory-based objects, or folder-based objects) within the name space of objects eligible for replication. The type of problem detected for an object determines whether the recovery action is initiated by the replication manager on the source or the target system. Policy values also affect the recovery action.

If the object exists on the source system, the recovery action is initiated from the source system. The recovery action can only be performed if policies allow automatic recovery for replication processes (DBRCY and OBJRCY). The recovery action submits a U-MX journal entry into the appropriate journal for processing by MIMIX replication. The information to be captured for the synchronization request in the U-MX entry is subject to any delay time specified in the Source capture delay (SRCCAPDLY) policy.

If the percentage of storage used by the system auxiliary storage pool (ASP 1) on the source system of a data group exceeds the value of the system ASP threshold warning for delayed data recoveries (RCYASPTHLD) policy, the replication manager delays submitting recoveries for member data of *FILE objects. This prevents a potentially large number of U-MX journal entries from being added to the user journal on a system that is already storage constrained. When the percentage of ASP 1 currently in use falls below the threshold set in the policy, the replication manager automatically resumes processing these recoveries. Other types of recoveries are not affected by the RCYASPTHLD policy.

Note: The replication manager does not check the system ASP and does not delay member data recoveries when the source system of an Assure MIMIX for PowerHA installation is installed on an independent ASP.
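The threshold check and its exceptions can be summarized in a short sketch. This is an illustration of the rules described above, not MIMIX code: the class `SourceSystem`, the function `should_delay_recovery`, and their fields are hypothetical names. The sketch also assumes that usage exactly equal to the threshold does not cause a delay, which this topic does not specify.

```python
from dataclasses import dataclass


@dataclass
class SourceSystem:
    asp1_percent_used: float  # current percentage of ASP 1 in use
    installed_on_iasp: bool   # True when the installation is on an independent ASP


def should_delay_recovery(system: SourceSystem,
                          is_file_member_data: bool,
                          rcyaspthld_percent: float) -> bool:
    """Return True when a recovery should be held back (hypothetical helper)."""
    if not is_file_member_data:
        # Only member data recoveries for *FILE objects are subject to RCYASPTHLD.
        return False
    if system.installed_on_iasp:
        # The system ASP is not checked for an installation on an independent ASP.
        return False
    # Assumption: "exceeds" is a strict comparison.
    return system.asp1_percent_used > rcyaspthld_percent
```

Because the check is evaluated against current usage, a delayed recovery becomes eligible again as soon as usage drops back under the threshold.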

If the detected object or member is eligible for replication but exists only on the target system, the replication manager on the target system uses the automatic recovery policies (DBRCY, OBJRCY) and the value of the Object only on target (OBJONTGT) policy in effect to determine how to recover the object or member. If automatic recovery policies are enabled, the value of the OBJONTGT policy determines how the recovery action is processed. If automatic recovery policies are disabled, the OBJONTGT policy is ignored and the recovery action cannot be processed.
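The way object location and policy values combine can be sketched as a small decision function. This is a simplified illustration, not MIMIX code: the names `Outcome` and `plan_recovery` are hypothetical, and the sketch collapses the DBRCY and OBJRCY policies into a single boolean. It does not model the individual OBJONTGT values, which are not listed in this topic.

```python
from enum import Enum


class Outcome(Enum):
    INITIATE_FROM_SOURCE = "initiate recovery from the source system"
    APPLY_OBJONTGT_ON_TARGET = "process on the target according to OBJONTGT"
    CANNOT_PROCESS = "recovery cannot be processed"


def plan_recovery(exists_on_source: bool, auto_recovery_enabled: bool) -> Outcome:
    """Decide how a replication-eligible object is recovered (hypothetical helper)."""
    if not auto_recovery_enabled:
        # With automatic recovery disabled, no recovery action is performed;
        # for target-only objects the OBJONTGT policy is ignored.
        return Outcome.CANNOT_PROCESS
    if exists_on_source:
        return Outcome.INITIATE_FROM_SOURCE
    # Object exists only on the target: OBJONTGT determines the processing.
    return Outcome.APPLY_OBJONTGT_ON_TARGET
```

For example, `plan_recovery(exists_on_source=False, auto_recovery_enabled=True)` yields `Outcome.APPLY_OBJONTGT_ON_TARGET`.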

If a recovery action fails to correct the problem or is not attempted due to policy values, the recovery action generates an error condition that is reflected in the status of the file entry, tracking entry, or object activity entry associated with the affected object.

The replication manager sends an error notification when a recovery action for a replication-eligible, target-only object fails or is prevented from running by policies. This type of notification is sent on a per-object or per-member basis. Notifications of this type during the recovery phase of a virtual switch are sent once per data group.
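The difference in notification granularity can be illustrated with a minimal sketch. It is not MIMIX code: the function `notification_keys` and the representation of a failure as a `(data_group, object_name)` pair are assumptions made for the example.

```python
from typing import Iterable, List, Tuple


def notification_keys(failures: Iterable[Tuple[str, str]],
                      virtual_switch_recovery: bool) -> List[str]:
    """Return one key per notification that would be sent (hypothetical helper).

    failures: (data_group, object_name) pairs for target-only objects whose
    recovery failed or was prevented by policies.
    """
    failures = list(failures)
    if virtual_switch_recovery:
        # One notification per data group, in first-seen order.
        return list(dict.fromkeys(dg for dg, _ in failures))
    # Otherwise one notification per object or member.
    return [f"{dg}:{obj}" for dg, obj in failures]
```

With three failures spread across two data groups, this sketch produces three keys in normal operation and two keys during the recovery phase of a virtual switch.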

MIMIX retains information about an object or member with detected problems until its recovery action completes or fails and three days have passed since its most recently detected problem.
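The retention rule combines two conditions, which a short sketch makes explicit. This is an illustration only: the function `can_discard` and the constant `RETENTION` are hypothetical names, and the sketch assumes that exactly three days counts as "three days have passed".

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=3)  # retention period stated in this topic


def can_discard(recovery_finished: bool,
                last_problem_detected: datetime,
                now: datetime) -> bool:
    """Return True when retained problem information may be dropped.

    recovery_finished is True once the recovery action has completed or failed.
    Both conditions must hold: the recovery is finished and the retention
    period has elapsed since the most recently detected problem.
    """
    return recovery_finished and (now - last_problem_detected) >= RETENTION
```

A newly detected problem for the same object or member resets `last_problem_detected`, so the three-day period starts over.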