The time required to fail over depends on how long it takes to manually promote the replica and redirect traffic. It typically takes between 20 and 30 minutes.
Promoting a replica does not automatically set up replication for existing appliances. After promoting a replica, if desired, you can set up replication from the new primary to existing appliances and the previous primary.
-
If the primary appliance is available, put it into maintenance mode so that replication can finish before you switch appliances.
-
Put the appliance into maintenance mode.
-
To use the management console, see Enabling and scheduling maintenance mode.
-
You can also use the ghe-maintenance -s command:
ghe-maintenance -s
-
When the number of active Git operations, MySQL queries, and Resque jobs reaches zero, wait 30 seconds.
Note: Nomad will always have jobs running, even in maintenance mode, so you can safely ignore these jobs.
-
To verify that all replication channels report OK, use the ghe-repl-status -vv command:
ghe-repl-status -vv
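Replication may need a little time to catch up after maintenance mode is enabled, so the status check is often repeated. The following is a minimal sketch of such a retry loop, not part of the product. The helper name wait_until_ok and the ATTEMPTS and DELAY variables are hypothetical, and the sketch assumes that ghe-repl-status exits with status 0 only when every channel reports OK.

```shell
#!/bin/sh
# wait_until_ok CMD [ARGS...]   (hypothetical helper, not a GHES utility)
# Reruns CMD until it exits 0, up to $ATTEMPTS times, pausing $DELAY seconds
# between tries. Returns 0 on success, 1 if every attempt failed.
# Assumption: the status command exits 0 only when all channels report OK.
wait_until_ok() {
  attempts="${ATTEMPTS:-10}"
  delay="${DELAY:-30}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Only pause when another attempt is still to come.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example call on an appliance (commented out so the sketch is inert):
# wait_until_ok ghe-repl-status -vv
```

If the loop returns a non-zero status, review the verbose output by hand before promoting the replica.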
-
Enable maintenance mode on all active replica appliances. For more information, see Enabling and scheduling maintenance mode.
-
On the replica appliance you'd like to fail over to, stop replication and promote the replica appliance to primary status with the ghe-repl-promote command:
ghe-repl-promote
Note: If the primary node is unavailable, warnings and timeouts may occur but can be ignored.
-
Update the DNS record to point to the IP address of the replica. Traffic is directed to the replica after the TTL period elapses. If you are using a load balancer, ensure it is configured to send traffic to the replica.
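Because traffic only moves once the old record's TTL expires, it can help to confirm what the hostname currently resolves to. The sketch below is one possible way to do that from a workstation. The helper name contains_ip is hypothetical, and HOSTNAME and REPLICA_IP in the commented example are placeholders for your own values.

```shell
#!/bin/sh
# contains_ip IP   (hypothetical helper)
# Reads resolver output on stdin, one address per line, and succeeds only if
# some line is exactly IP. -F treats the dots literally; -x requires a
# whole-line match, so 203.0.113.100 does not match 203.0.113.10.
contains_ip() {
  grep -qxF -- "$1"
}

# Example call (commented out; HOSTNAME and REPLICA_IP are placeholders):
# dig +short A HOSTNAME | contains_ip REPLICA_IP && echo "DNS points to the replica"
```

A resolver may keep returning the old address until its cached record expires, so a failed check shortly after the update is not necessarily an error.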
-
Notify users that they can resume normal operations.
-
If desired, set up replication from the new primary to existing appliances and the previous primary. For more information, see About high availability configuration.
-
Any appliances that were part of the high availability configuration before the failover, and that you do not intend to set up replication to, must be removed from the high availability configuration by UUID.
-
On each former appliance, get its UUID with the following command:
cat /data/user/common/uuid
-
On the new primary, remove the UUIDs using ghe-repl-teardown. Replace UUID with a UUID you retrieved in the previous step:
ghe-repl-teardown -u UUID
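When several former appliances have to be removed, the teardown command can be repeated for each UUID. The following is a minimal sketch under stated assumptions: the helper name teardown_all and the DRY_RUN variable are hypothetical, and the input is assumed to be a text file that you created yourself with one collected UUID per line. By default it only prints the commands it would run.

```shell
#!/bin/sh
# teardown_all FILE   (hypothetical helper, not a GHES utility)
# For each non-empty line in FILE, treated as a UUID, either print the
# teardown command (DRY_RUN=1, the default) or run it (DRY_RUN=0).
teardown_all() {
  while IFS= read -r uuid; do
    # Skip blank lines.
    [ -n "$uuid" ] || continue
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "ghe-repl-teardown -u $uuid"
    else
      # Redirect stdin so the command cannot consume the rest of FILE.
      ghe-repl-teardown -u "$uuid" </dev/null
    fi
  done < "$1"
}
```

Reviewing the dry-run output first, and then rerunning with DRY_RUN=0 on the new primary, reduces the risk of removing the wrong node.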
-