Fine-tuning backlog warning thresholds for a data group

Assure MIMIX Administrator Reference

Product type: Software
Portfolio: Integrate
Product family: Assure
Product: Assure MIMIX™ Software
Version: 10.0
Language: English
Product name: Assure MIMIX
Title: Assure MIMIX Administrator Reference
Copyright: 2023
First publish date: 1999
Last edition: 2024-05-07
Last publication: 2024-05-07T13:36:02.962500

MIMIX supports the ability to set a backlog threshold on each of the replication jobs used by a data group. When a job has a backlog that reaches or exceeds the specified threshold, the threshold condition is indicated in the job status and reflected in user interfaces.

Threshold settings are meant to inform you that, while normal replication processes are active, a condition exists that could become a problem. What is an acceptable risk for some data groups may not be acceptable for other data groups or in some environments. For example, a threshold condition that occurs after restarting a process that was temporarily ended, or while processing an unusually large object that rarely changes, may be an acceptable risk. However, a process that is continuously in a threshold condition, or multiple processes that frequently enter threshold conditions, may indicate a more serious exposure that requires attention. Ultimately, each threshold setting must balance allowing normal fluctuations against ensuring that a job status is highlighted when a backlog approaches an unacceptable level of risk to your recovery time objective (RTO) or of data loss.

Important! When evaluating whether threshold settings are compatible with your RTO, you must consider all of the processes in the replication paths for which the data group is configured, along with their thresholds. Each threshold represents only one process in either the user journal replication path or the system journal replication path. If the threshold for one process is set higher than its shipped value, a backlog for that process may not trigger a threshold condition yet still be large enough to cause subsequent processes to have backlogs that exceed their thresholds. Consider the cumulative effect that having multiple processes in threshold conditions would have on RTO and on your tolerance for data loss in the event of a failure.
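To illustrate why thresholds must be judged together, the following Python sketch models a replication path as a list of process backlogs. Following the risk descriptions in the table of shipped thresholds, backlogs already on the target system add to the work that must finish before switching, while backlogs still on the source system represent changes at risk of loss. The class, the function, the per-process minute estimates, and the 15-minute RTO are all hypothetical, and the assumption that estimates simply add up is a deliberately crude worst case, not a description of how MIMIX measures backlogs.

```python
from dataclasses import dataclass


@dataclass
class Backlog:
    process: str      # replication process name (illustrative label)
    minutes: float    # hypothetical estimate of the time needed to clear this backlog
    on_target: bool   # True if the backlogged entries are already on the target system


def cumulative_exposure(backlogs):
    """Return (catch-up minutes before a switch, minutes of changes at risk).

    Crude worst-case assumption: estimates for processes in the same
    replication path simply add up, ignoring any overlap between processes.
    """
    catch_up = sum(b.minutes for b in backlogs if b.on_target)
    at_risk = sum(b.minutes for b in backlogs if not b.on_target)
    return catch_up, at_risk


# User journal path with remote journaling: no single backlog looks alarming,
# but the target-side backlogs together exceed a hypothetical 15-minute RTO.
path = [
    Backlog("remote journaling", 8, on_target=False),
    Backlog("database reader", 9, on_target=True),
    Backlog("database apply", 9, on_target=True),
]
rto_minutes = 15
catch_up, at_risk = cumulative_exposure(path)
print(f"catch-up before switch: {catch_up} min; changes at risk: {at_risk} min")
print("exceeds RTO" if catch_up > rto_minutes else "within RTO")
```

Even in this simplified model, raising one process's threshold hides its backlog without reducing the total exposure along the path.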

Table 38 lists the shipped values for thresholds available in a data group definition, identifies the risk associated with a backlog for each replication process, and identifies available options to address a persistent threshold condition. For each data group, you may need to use multiple options or adjust one or more threshold values multiple times before finding an appropriate setting.

Table 1. Shipped threshold values for replication processes and the risk associated with a backlog

For each replication process, the entries below give the backlog threshold and its shipped default value, the risk associated with a backlog, and the options for resolving persistent threshold conditions.

Remote journaling threshold
  Shipped default value: 10 minutes
  Risk associated with a backlog: All journal entries in the backlog for the remote journaling function exist only in the source system journal and are waiting to be transmitted to the remote journal. These entries cannot be processed by MIMIX user journal replication processes and are at risk of being lost if the source system fails. After the source system becomes available again, journal analysis may be required.
  Options for resolving persistent threshold conditions: Option 3, Option 5

Database reader/send threshold
  Shipped default value: 10 minutes
  Risk associated with a backlog:
    For data groups that use remote journaling, all journal entries in the database reader backlog are physically located on the target system, but MIMIX has not started to replicate them. If the source system fails, these entries need to be read and applied before switching.
    For data groups that use MIMIX source-send processing, all journal entries in the database send backlog are waiting to be read and transmitted to the target system. The backlogged journal entries exist only on the source system and are at risk of being lost if the source system fails. After the source system becomes available again, journal analysis may be required.
  Options for resolving persistent threshold conditions: Option 2, Option 3, Option 5

Database apply threshold warning (1000s)
  Shipped default value: 100,000 entries
  Risk associated with a backlog: All of the entries in the database apply backlog are waiting to be applied to the target system. If the source system fails, these entries need to be applied before switching. A large backlog can also affect performance.
  Options for resolving persistent threshold conditions: Option 2, Option 3, Option 5

Object send threshold
  Shipped default value: 10 minutes
  Risk associated with a backlog: All of the journal entries in the object send backlog exist only in the system journal on the source system and are at risk of being lost if the source system fails. MIMIX may not have determined all of the information necessary to replicate the objects associated with the journal entries. As this backlog clears, subsequent processes may have backlogs as replication progresses. If the object send process is shared among multiple data groups and the backlog is persistent, it may be necessary to reduce the number of data groups sharing the same object send process.
  Options for resolving persistent threshold conditions: Option 2, Option 3, Option 4, Option 5

Object retrieve warning message threshold
  Shipped default value: 100 entries
  Risk associated with a backlog: All of the objects associated with journal entries in the object retrieve backlog are waiting to be packaged so they can be sent to the target system. The latest changes to these objects exist only on the source system and are at risk of being lost if the source system fails. As this backlog clears, subsequent processes may have backlogs as replication progresses.
  Options for resolving persistent threshold conditions: Option 1, Option 2, Option 3, Option 5

Container send warning message threshold
  Shipped default value: 100 entries
  Risk associated with a backlog: All of the packaged objects associated with journal entries in the container send backlog are waiting to be sent to the target system. The latest changes to these objects exist only on the source system and are at risk of being lost if the source system fails. As this backlog clears, subsequent processes may have backlogs as replication progresses.
  Options for resolving persistent threshold conditions: Option 1, Option 2, Option 3, Option 5

Object apply warning message threshold
  Shipped default value: 100 requests
  Risk associated with a backlog: All of the entries in the object apply backlog are waiting to be applied to the target system. If the source system fails, these entries need to be applied before switching. Any related objects for which an automatic recovery action was collecting data may be lost.
  Options for resolving persistent threshold conditions: Option 1, Option 2, Option 3, Option 5
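The shipped defaults in Table 1 can be captured as data and checked against a snapshot of backlogs, flagging any process whose backlog reaches or exceeds its threshold. In this Python sketch the dictionary keys, the snapshot format, and the function name are hypothetical; they are not MIMIX parameter names or an interface MIMIX provides. The database apply value is stored as an entry count even though the data group parameter is expressed in thousands.

```python
# Shipped default thresholds from Table 1, as (limit, unit).
# Keys and the snapshot format are hypothetical, for illustration only.
SHIPPED_DEFAULTS = {
    "remote journaling": (10, "minutes"),
    "database reader/send": (10, "minutes"),
    "database apply": (100_000, "entries"),  # parameter is specified in 1000s
    "object send": (10, "minutes"),
    "object retrieve": (100, "entries"),
    "container send": (100, "entries"),
    "object apply": (100, "requests"),
}


def processes_in_threshold(snapshot):
    """Return the processes whose backlog reaches or exceeds its shipped threshold.

    snapshot maps a process name to its current backlog, expressed in the
    same unit as that process's shipped threshold.
    """
    flagged = []
    for process, backlog in snapshot.items():
        limit, unit = SHIPPED_DEFAULTS[process]
        if backlog >= limit:  # reaching the threshold is enough to trigger it
            flagged.append(f"{process}: {backlog} {unit} (threshold {limit})")
    return flagged


snapshot = {"remote journaling": 3, "database apply": 120_000, "object retrieve": 100}
for line in processes_in_threshold(snapshot):
    print(line)
```

Note that the object retrieve backlog of exactly 100 entries is flagged, because a threshold condition is indicated when a backlog reaches the threshold, not only when it exceeds it.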

The following options are available, listed in order of preference. Some options are not available for all thresholds.

Option 1 - Adjust the number of available jobs. This option is available only for the object retrieve, container send, and object apply processes. Each of these processes has a configurable minimum and maximum number of jobs, a threshold at which more jobs are started, and a warning message threshold. If the number of entries in a backlog divided by the number of active jobs exceeds the job threshold, extra jobs are automatically started in an attempt to reduce the backlog. If the backlog reaches the higher value specified in the warning message threshold, the process status reflects the threshold condition. If the process frequently shows a threshold status, the maximum number of jobs may be too low or the job threshold value may be too high. Adjusting either value in the data group configuration can increase throughput.
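The job-scaling behavior described for Option 1 can be sketched in Python as follows. The class and function names are hypothetical, the minimum, maximum, and job threshold values are invented for the example (only the warning value of 100 matches a shipped default), and the assumption that extra jobs start one at a time is a simplification, not documented MIMIX behavior.

```python
from dataclasses import dataclass


@dataclass
class JobSettings:
    min_jobs: int           # configured minimum number of jobs
    max_jobs: int           # configured maximum number of jobs
    job_threshold: int      # backlog per active job above which another job is started
    warning_threshold: int  # backlog at which the process shows a threshold condition


def evaluate(backlog, active_jobs, s):
    """Return (number of jobs after scaling, whether a threshold condition is shown)."""
    jobs = max(active_jobs, s.min_jobs, 1)
    # Assumption: extra jobs start one at a time until the backlog per job no
    # longer exceeds the job threshold or the configured maximum is reached.
    while jobs < s.max_jobs and backlog / jobs > s.job_threshold:
        jobs += 1
    return jobs, backlog >= s.warning_threshold


settings = JobSettings(min_jobs=1, max_jobs=3, job_threshold=20, warning_threshold=100)
print(evaluate(150, 1, settings))  # (3, True): capped at 3 jobs, 50 entries per job
print(evaluate(60, 1, settings))   # (3, False): 20 entries per job, below the warning
```

In the first call the process is capped at its maximum while each job still carries well over the job threshold, which is the pattern that suggests raising the maximum number of jobs or lowering the job threshold.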

Option 2 - Temporarily increase job performance. This option is available for all processes except the RJ link. Use work management functions to increase the resources available to a job, such as raising its run priority (on IBM i, a lower RUNPTY value means a higher priority) or increasing its time slice with the Change Job (CHGJOB) command. These changes are effective only for the current instance of the job and are lost when the job ends.

Option 3 - Change threshold values or add criteria. All processes support changing the threshold value. In addition, if the quantity of entries is more of a concern than time, some processes support additional threshold criteria that the shipped default settings do not use. For the remote journal, database reader (or database send), and object send processes, you can adjust the threshold so that a number of journal entries is used as a criterion instead of, or in addition to, a time value. If both time and entries are specified, whichever criterion is reached first triggers the threshold condition. Changes to threshold values take effect the next time the process status is requested.
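The rule that the first criterion reached triggers the threshold condition amounts to a logical OR over whichever criteria are configured. This Python sketch models a threshold with an optional time criterion and an optional entry-count criterion; the class and method names and the 50,000-entry value are hypothetical and do not correspond to MIMIX parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Threshold:
    minutes: Optional[float] = None  # time criterion; None if not used
    entries: Optional[int] = None    # journal-entry criterion; None if not used

    def __post_init__(self):
        if self.minutes is None and self.entries is None:
            raise ValueError("at least one threshold criterion is required")

    def reached(self, backlog_minutes, backlog_entries):
        """True as soon as any configured criterion is reached or exceeded."""
        if self.minutes is not None and backlog_minutes >= self.minutes:
            return True
        if self.entries is not None and backlog_entries >= self.entries:
            return True
        return False


shipped = Threshold(minutes=10)                   # time only, like the shipped default
combined = Threshold(minutes=10, entries=50_000)  # hypothetical entry count added
# A burst of activity: only 4 minutes behind, but 60,000 entries waiting.
print(shipped.reached(4, 60_000))   # False
print(combined.reached(4, 60_000))  # True
```

Adding an entry criterion lets a sudden burst of journal entries surface as a threshold condition even while the time-based backlog is still small.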

Option 4 - Adjust the number of object send jobs. This option is available only for the object send process. Determine whether the data group uses a shared object send job. If the threshold condition persists, it may be necessary to reduce the number of data groups sharing the same object send process. For details, see Optimizing performance for a shared object send process.

Option 5 - Get assistance. If you have tried the other options and threshold conditions persist, contact your Certified MIMIX Consultant for assistance. It may be necessary to change configurations to adjust what is defined to each data group or to make permanent work management changes for specific jobs.