High Availability (HA) / Duplex Deployment

For high availability, the solution is deployed on a three-node cluster. The Primary and Secondary nodes run the Docker images, while the Arbiter node handles the load balancing.

Both nodes (primary and secondary) run the full stack, with the Secondary node acting as a replica of the Primary node. The cluster exposes a VIP (Virtual IP, managed by the VRRP protocol) which routes service requests to the active primary node at any point in time.

The two nodes are kept synchronized, with Keepalived enabled. As soon as the primary node goes down, the secondary node resumes services and acts as the primary. Failover from primary to secondary is therefore nearly seamless, provided that Keepalived is running.
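
For reference, a minimal Keepalived (VRRP) configuration for the primary node is sketched below. The interface name, virtual_router_id, priority, and VIP address are illustrative placeholders, not values from this deployment; the secondary node would use state BACKUP and a lower priority.

    # /etc/keepalived/keepalived.conf (primary node, illustrative values)
    vrrp_instance VI_1 {
        state MASTER              # BACKUP on the secondary node
        interface eth0            # NIC carrying service traffic (placeholder)
        virtual_router_id 51      # must match on both nodes
        priority 150              # use a lower value (e.g. 100) on the secondary
        advert_int 1              # VRRP advertisement interval, in seconds
        virtual_ipaddress {
            10.0.0.100/24         # the VIP exposed to clients (placeholder)
        }
    }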

The Arbiter node does not replicate the data. Its purpose is to participate in the election process when a primary node is being elected.

Elections occur when the primary becomes unavailable. The remaining nodes in the replica set, i.e. the arbiter and the secondary, call for an election to select a new primary so that the cluster resumes normal operations.
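
If the CIM datastore is a MongoDB-style replica set (this page does not name the engine, so treat that as an assumption), the three-member topology described above could be initiated as in the following Python sketch; the hostnames, port, and replica-set name are placeholders.

    # Sketch only: assumes a MongoDB-style replica set; hostnames, port,
    # and the replica-set name are placeholders.
    from pymongo import MongoClient

    client = MongoClient("mongodb://primary-node:27017/?directConnection=true")
    config = {
        "_id": "cimReplSet",
        "members": [
            {"_id": 0, "host": "primary-node:27017"},    # data-bearing, preferred primary
            {"_id": 1, "host": "secondary-node:27017"},  # data-bearing replica
            {"_id": 2, "host": "arbiter-node:27017", "arbiterOnly": True},  # votes only, no data
        ],
    }
    client.admin.command("replSetInitiate", config)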

Fault Tolerance

This three-node cluster design ensures that the system remains available if either the primary or the secondary is unavailable. If the primary is unavailable, the Arbiter elects the secondary to be the new primary.

However, if both the primary and the arbiter are down, the secondary cannot be elected primary, because an election requires a majority of the three voting members (i.e. two of them); read/write operations will therefore fail.

Note:

  1. Each node represents a VM.

  2. At any given point in time, at least two nodes should be up.

  3. For hardware resilience, these nodes/VMs should be distributed across at least two physical servers. If all the VMs are deployed on the same physical server, the deployment provides only VM-level resilience and fault tolerance.

Failover scenarios

When the Primary or Secondary node is down

The system will switch to the other active node.

Impact

All applications will continue to work after the seamless failover.

Recovery

NA
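
One way to confirm that a failover was transparent to clients is to probe the service through the VIP. The sketch below assumes a hypothetical VIP address and service port; substitute the real values for the deployment.

    # Minimal probe: checks that the service still answers through the VIP
    # after a node failover. VIP address and port are placeholders.
    import socket

    def service_reachable(vip="10.0.0.100", port=443, timeout=2.0):
        try:
            with socket.create_connection((vip, port), timeout=timeout):
                return True
        except OSError:
            return False

    print("service up" if service_reachable() else "service down")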


When the Primary and Secondary nodes are down


Impact

All applications and services will stop working.

Recovery

Manual intervention is required to recover the system.



When the primary or secondary CIM datastore is down

The Arbiter elects the datastore instance on the other node as the new primary.

Impact

All read/write operations on the CIM datastore will continue to function seamlessly.

Recovery

NA
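
To verify which datastore instance became primary after such a failover, a replica-set status query can be used. The sketch below again assumes a MongoDB-style datastore, with a placeholder hostname.

    # Sketch (MongoDB-style replica set assumed): list each member's current
    # role after a datastore failover. Hostname is a placeholder.
    from pymongo import MongoClient

    client = MongoClient("mongodb://secondary-node:27017/?directConnection=true")
    status = client.admin.command("replSetGetStatus")
    for member in status["members"]:
        print(member["name"], member["stateStr"])  # PRIMARY / SECONDARY / ARBITER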


When the primary and secondary CIM datastores are down


Impact

All read/write operations on the datastore will fail.

Recovery

Manual intervention is required to resume datastore operations.


When the CIM primary datastore and the Arbiter are down


Impact

All write operations on the primary datastore will fail. Read operations (if configured to read from the secondary DB) will continue to work normally.

Recovery

Manual intervention is required to resume datastore write operations.
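
The "read from secondary" behaviour mentioned above is a client-side setting. Assuming a MongoDB-style datastore, it corresponds to a secondary-tolerant read preference, sketched below with placeholder connection, database, and collection names.

    # Sketch: connect directly to the surviving secondary and allow reads from
    # it (MongoDB-style datastore assumed; URI, database, and collection names
    # are placeholders). Writes would still fail without an elected primary.
    from pymongo import MongoClient

    client = MongoClient(
        "mongodb://secondary-node:27017/"
        "?directConnection=true&readPreference=secondaryPreferred"
    )
    doc = client["cim"]["interactions"].find_one()  # reads succeed
    print(doc)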


When the CIM secondary datastore and the Arbiter are down


Impact

All write operations on the secondary datastore will fail. Read operations will continue to work normally.

Recovery

Manual intervention is required to resume datastore write operations.


When the Arbiter is down


Impact

All read/write operations on the datastore will continue to function seamlessly.

Recovery

NA