About high availability replication for clusters
You can provide protection against disruption in a datacenter or cloud region by configuring a cluster deployment of GitHub Enterprise Server for high availability. In a high availability configuration, an identical set of replica nodes sync with the nodes in your active cluster. If hardware or software failures affect the datacenter with your active cluster, you can manually fail over to the replica nodes and continue processing user requests, minimizing the impact of the outage.
In a high availability configuration, nodes that host data services sync regularly with the replica cluster. Replica nodes run in standby and do not serve applications or process user requests.
We recommend configuring high availability as a part of a comprehensive disaster recovery plan for GitHub Enterprise Server clustering. We also recommend performing regular backups. For more information, see Configuring backups on your instance.
Prerequisites
Hardware and software
For each existing node in your active cluster, you'll need to provision a second virtual machine with identical hardware resources. For example, if your cluster has 13 nodes and each node has 12 vCPUs, 96 GB of RAM, and 750 GB of attached storage, you must provision 13 new virtual machines that each have 12 vCPUs, 96 GB of RAM, and 750 GB of attached storage.
On each new virtual machine, install the same version of GitHub Enterprise Server that runs on the nodes in your active cluster. You don't need to upload a license or perform any additional configuration. For more information, see Setting up a GitHub Enterprise Server instance.
Note
The nodes that you intend to use for high availability replication should be standalone GitHub Enterprise Server instances. Don't initialize the replica nodes as a second cluster.
Network
You must assign a static IP address to each new node that you provision, and you must configure a load balancer to accept connections and direct them to the nodes in your cluster's front-end tier.
The latency between primary and replica nodes must be less than 70 milliseconds. We don't recommend configuring a firewall between the nodes' networks. For more information about network connectivity between nodes in the replica cluster, see Cluster network configuration.
Creating a high availability replica for a cluster
To create a high availability replica for your cluster, use the ghe-cluster-repl-bootstrap
utility, then complete the follow-up tasks that the tool details.
-
SSH into any node in your cluster. For more information, see Accessing the administrative shell (SSH).
-
To begin configuration of high availability, run the following command. The
-p
and-s
flags are optional. If you're using the flags, replace PRIMARY-DATACENTER and SECONDARY-DATACENTER with the names of your primary and secondary datacenters.Note
- By default, the utility will use the name of the primary datacenter in
cluster.conf
. - If no name for the primary datacenter is defined, the utility will use
mona
. - If no name for the secondary datacenter is defined, the utility will use
hubot
.
Shell ghe-cluster-repl-bootstrap -p PRIMARY-DATACENTER -s SECONDARY-DATACENTER
ghe-cluster-repl-bootstrap -p PRIMARY-DATACENTER -s SECONDARY-DATACENTER
- By default, the utility will use the name of the primary datacenter in
-
After the utility runs, you will see output with further instructions. To finish the configuration, complete the tasks listed in the output.
Monitoring replication between active and replica cluster nodes
Initial replication between the active and replica nodes in your cluster takes time. The amount of time depends on the amount of data to replicate and the activity levels for GitHub Enterprise Server.
You can monitor the progress on any node in the cluster, using command-line tools available via the GitHub Enterprise Server administrative shell. For more information about the administrative shell, see Accessing the administrative shell (SSH).
To monitor the replication of all services, use the following command.
ghe-cluster-repl-status
You can use ghe-cluster-status
to review the overall health of your cluster. For more information, see Command-line utilities.
Reconfiguring high availability replication after a failover
After you fail over from the cluster's active nodes to the cluster's replica nodes, you can reconfigure high availability in one of two ways. The method you choose will depend on the reason that you failed over, and the state of the original active nodes.
- Provision and configure a new set of replica nodes for each of the new active nodes in your secondary datacenter.
- Use the original active nodes as the new replica nodes.
The process for reconfiguring high availability is identical to the initial configuration of high availability. For more information, see Creating a high availability replica for a cluster.
If you use the original active nodes, after reconfiguring high availability, you will need to unset maintenance mode on the nodes. For more information, see Enabling and scheduling maintenance mode.
Disabling high availability replication for a cluster
You can stop replication to the replica nodes for your cluster deployment of GitHub Enterprise Server using the ghe-cluster-repl-teardown
utility. Alternatively, you can manually disable replication.
Disabling replication using ghe-cluster-repl-teardown
-
SSH into any node in your cluster. For more information, see Accessing the administrative shell (SSH).
-
To disable replication, run the following command:
Shell ghe-cluster-repl-teardown
ghe-cluster-repl-teardown
-
After the configuration run finishes, GitHub Enterprise Server displays the following message.
Finished cluster configuration
Manually disabling replication
-
SSH into any node in your cluster. For more information, see Accessing the administrative shell (SSH).
-
Open the cluster configuration file at
/data/user/common/cluster.conf
in a text editor. For example, you can use Vim. Create a backup of thecluster.conf
file before you edit the file.Shell sudo vim /data/user/common/cluster.conf
sudo vim /data/user/common/cluster.conf
-
In the top-level
[cluster]
section, delete theredis-master-replica
, andmysql-master-replica
key-value pairs. -
Delete each section for a replica node. For replica nodes,
replica
is configured asenabled
. -
Apply the new configuration. This command can take some time to finish, so we recommend running the command in a terminal multiplexer like
screen
ortmux
.ghe-cluster-config-apply
-
After the configuration run finishes, GitHub Enterprise Server displays the following message.
Finished cluster configuration