Cluster Resource Services - assure_mimix - 10.0

Assure MIMIX Operations with PowerHA User Guide

Product type: Software
Portfolio: Integrate
Product family: Assure
Product: Assure MIMIX™ Software
Version: 10.0
Language: English
Product name: Assure MIMIX
Title: Assure MIMIX Operations with PowerHA User Guide
Topic type: How Do I
Copyright: 2023
First publish date: 2009

IBM Cluster Resource Services is part of the base operating system. Cluster Resource Services provides the integrated services and application programming interfaces (APIs) necessary to create and manage a cluster. This includes:

  • Heartbeat monitoring - Heartbeat monitoring ensures that each node (system) in the cluster is active. At regular intervals, each active node signals that it is active by sending a heartbeat to its adjacent nodes. Each node expects an acknowledgment of the heartbeat it sent as well as an incoming heartbeat from the adjacent node. If a node fails to send a predetermined number of consecutive heartbeats, a heartbeat failure is signaled. Cluster Resource Services then determines which event to initiate, considering the role of the failing node and whether the failure can be confirmed by a distress message. If the failure cannot be confirmed, Cluster Resource Services partitions the cluster.

  • Reliable messaging - The reliable messaging function keeps track of all nodes within a cluster and ensures that all nodes have consistent information about the state of cluster resources. Any status change for a node is broadcast along with a reason code. Retry and timeout values determine how many times a message can be sent to a node before a failure or partition event is signaled. Remote networks are allowed more time before a failure is signaled.

  • Switchover administration - Cluster Resource Services maintains the node hierarchy when a switchover or failover occurs. This hierarchy, called the recovery domain, determines which node assumes the role of the primary node.

  • Distributed activities - Distributed activities synchronize actions across the nodes, or a subset of nodes, in a cluster so that all nodes affected by an action are involved and the results are consistently reflected across the cluster.

  • Parallel jobs - A set of parallel jobs is used to control the cluster and the resources defined to it, perform user and exit program requests, and interact with subsystems for highly available applications.

  • APIs - Application programming interfaces (APIs) provide the ability to create clusters, add or remove nodes, and create and manage the system objects which identify groups of cluster resources.

  • IP address takeover - The IP address takeover function allows access to an application or device regardless of the system on which the application is running or where the device is varied on. A floating IP address is switched from the primary node to a backup node without requiring clients to be reconfigured. IP address takeover is a key component in providing application resiliency and device resiliency.

  • Resiliency support - Application, data, and device resiliency depend upon cluster resource group (*CRG) system objects. Cluster Resource Services enables users and programs to allocate resources to these objects and to manage them.
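The heartbeat and recovery domain behavior described in the list above can be illustrated with a small model. The following Python sketch is not product code and does not call any IBM or MIMIX API. The class name, function name, node names, and the threshold of three missed heartbeats are all hypothetical. It also assumes the recovery domain can be treated as a simple ordered list, and it leaves out distress messages and cluster partitioning.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HeartbeatMonitor:
    """Counts consecutive missed heartbeats per node (illustrative model only)."""
    miss_threshold: int = 3  # hypothetical value, not a product default
    missed: dict = field(default_factory=dict)

    def record(self, node: str, heartbeat_received: bool) -> bool:
        """Record one interval; return True when a heartbeat failure is signaled."""
        if heartbeat_received:
            self.missed[node] = 0  # any heartbeat resets the consecutive count
            return False
        self.missed[node] = self.missed.get(node, 0) + 1
        return self.missed[node] >= self.miss_threshold


def next_primary(recovery_domain: list, failed: set) -> Optional[str]:
    """Return the first node, in recovery domain order, that has not failed."""
    for node in recovery_domain:
        if node not in failed:
            return node
    return None  # no node is available to take the primary role


# Hypothetical three-node recovery domain; NODEA is the current primary.
domain = ["NODEA", "NODEB", "NODEC"]
monitor = HeartbeatMonitor(miss_threshold=3)
failed = set()

# One received heartbeat followed by three consecutive misses from NODEA.
for received in (True, False, False, False):
    if monitor.record("NODEA", received):
        failed.add("NODEA")

new_primary = next_primary(domain, failed)
print(new_primary)  # NODEB
```

In this model a single received heartbeat resets the count, so only consecutive misses lead to a failure being signaled, which matches the description of heartbeat monitoring above.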

The characteristics of messaging and heartbeat monitoring can be adjusted to match the performance of the network.
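To show why these values matter, the following Python sketch models how retry and timeout settings bound the time before a failure or partition event is signaled. It is only an illustration under simplifying assumptions: the worst case is taken to be the retry count multiplied by the timeout, and the profile names and numbers are hypothetical, not product defaults or real tuning parameters.

```python
from typing import Callable


def deliver(send: Callable[[float], bool], retries: int, timeout_s: float) -> bool:
    """Call send(timeout_s) up to `retries` times; True on the first acknowledgment."""
    for _ in range(retries):
        if send(timeout_s):
            return True
    return False  # at this point a failure or partition event would be signaled


def worst_case_detection_s(retries: int, timeout_s: float) -> float:
    """Upper bound on the wait before a failure is signaled (simplified model)."""
    return retries * timeout_s


# Hypothetical tuning profiles: the remote profile allows more time per attempt.
PROFILES = {
    "local": {"retries": 3, "timeout_s": 5.0},
    "remote": {"retries": 3, "timeout_s": 20.0},
}

for name, profile in PROFILES.items():
    print(name, worst_case_detection_s(**profile))
```

A longer timeout tolerates a slower network at the cost of slower failure detection, which is the trade-off to weigh when adjusting these characteristics.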