About high availability configuration

In a high availability configuration, a fully redundant secondary GitHub Enterprise Server appliance is kept in sync with the primary appliance through replication of all major datastores.

When you configure high availability, there is an automated setup of one-way, asynchronous replication of all datastores (Git repositories, MySQL, Redis, and Elasticsearch) from the primary to the replica appliance.

GitHub Enterprise Server supports an active/passive configuration, where the replica appliance runs as a standby with database services running in replication mode but application services stopped.

注意: 允许 GitHub Enterprise Server 使用最多 8 种高可用性副本(被动和主动/地理副本)。

Targeted failure scenarios

Use a high availability configuration for protection against:

  • 软件崩溃,操作系统失败或不可恢复的应用程序所致。
  • 硬件故障,包括存储硬件、CPU、RAM、网络接口等。
  • 虚拟化主机系统故障,包括 AWS 上未计划和计划内的维护事件
  • 逻辑或物理服务网络(如果故障转移设备在不受故障影响的单独网络上)。

A high availability configuration is not a good solution for:

  • Scaling-out. While you can distribute traffic geographically using geo-replication, the performance of writes is limited to the speed and availability of the primary appliance. For more information, see "About geo-replication."
  • Backing up your primary appliance. A high availability replica does not replace off-site backups in your disaster recovery plan. Some forms of data corruption or loss may be replicated immediately from the primary to the replica. To ensure safe rollback to a stable past state, you must perform regular backups with historical snapshots.
  • Zero downtime upgrades. To prevent data loss and split-brain situations in controlled promotion scenarios, place the primary appliance in maintenance mode and wait for all writes to complete before promoting the replica.

Network traffic failover strategies

During failover, you must separately configure and manage redirecting network traffic from the primary to the replica.

DNS failover

With DNS failover, use short TTL values in the DNS records that point to the primary GitHub Enterprise Server appliance. We recommend a TTL between 60 seconds and five minutes.

During failover, you must place the primary into maintenance mode and redirect its DNS records to the replica appliance's IP address. The time needed to redirect traffic from primary to replica will depend on the TTL configuration and time required to update the DNS records.

If you are using geo-replication, you must configure Geo DNS to direct traffic to the nearest replica. For more information, see "About geo-replication."

Load balancer

负载均衡器设计使用网络设备将 Git 和 HTTP 流量引导至各个 GitHub Enterprise Server 设备。 您可以使用负载均衡器限制引导至设备的流量以确保安全,或者在没有 DNS 记录更改的情况下根据需要重定向流量。 我们强烈建议使用支持 PROXY 协议的基于 TCP 负载均衡器。 对 GitHub Enterprise Server 主机名的 DNS 查询应解析为负载均衡器。 我们建议您启用子域隔离。 如果启用了子域隔离,另一个通配符记录 (*.HOSTNAME) 也应解析为负载均衡器。 更多信息请参阅“启用子域隔离”。

During failover, you must place the primary appliance into maintenance mode. You can configure the load balancer to automatically detect when the replica has been promoted to primary, or it may require a manual configuration change. You must manually promote the replica to primary before it will respond to user traffic. For more information, see "Using GitHub Enterprise Server with a load balancer."

您可以勾选 https://HOSTNAME/status URL 返回的状态代码,以监控 GitHub Enterprise Server 的可用性。 可提供用户流量的设备将返回状态代码 200(正常)。 设备可能出于以下几个原因而返回 503(服务不可用):

  • 设备是被动副本,例如双节点高可用性配置中的副本。
  • 设备处于维护模式。
  • 设备属于地理副本配置,但为非活动的副本。



Utilities for replication management

To manage replication on GitHub Enterprise Server, use these command line utilities by connecting to the replica appliance using SSH.


The ghe-repl-setup command puts a GitHub Enterprise Server appliance in replica standby mode.

  • An encrypted WireGuard VPN tunnel is configured for communication between the two appliances.
  • Database services are configured for replication and started.
  • Application services are disabled. Attempts to access the replica appliance over HTTP, Git, or other supported protocols will result in an "appliance in replica mode" maintenance page or error message.
admin@169-254-1-2:~$ ghe-repl-setup
Verifying ssh connectivity with ...
Connection check succeeded.
Configuring database replication against primary ...
Success: Replica mode is configured against
To disable replica mode and undo these changes, run `ghe-repl-teardown'.
Run `ghe-repl-start' to start replicating against the newly configured primary.


The ghe-repl-start command turns on active replication of all datastores.

admin@169-254-1-2:~$ ghe-repl-start
Starting MySQL replication ...
Starting Redis replication ...
Starting Elasticsearch replication ...
Starting Pages replication ...
Starting Git replication ...
Success: replication is running for all services.
Use `ghe-repl-status' to monitor replication health and progress.


The ghe-repl-status command returns an OK, WARNING or CRITICAL status for each datastore replication stream. When any of the replication channels are in a WARNING state, the command will exit with the code 1. Similarly, when any of the channels are in a CRITICAL state, the command will exit with the code 2.

admin@169-254-1-2:~$ ghe-repl-status
OK: mysql replication in sync
OK: redis replication is in sync
OK: elasticsearch cluster is in sync
OK: git data is in sync (10 repos, 2 wikis, 5 gists)
OK: pages data is in sync

The -v and -vv options give details about each datastore's replication state:

$ ghe-repl-status -v
OK: mysql replication in sync
  | IO running: Yes, SQL running: Yes, Delay: 0

OK: redis replication is in sync
  | master_host:
  | master_port:6379
  | master_link_status:up
  | master_last_io_seconds_ago:3
  | master_sync_in_progress:0

OK: elasticsearch cluster is in sync
  | {
  |   "cluster_name" : "github-enterprise",
  |   "status" : "green",
  |   "timed_out" : false,
  |   "number_of_nodes" : 2,
  |   "number_of_data_nodes" : 2,
  |   "active_primary_shards" : 12,
  |   "active_shards" : 24,
  |   "relocating_shards" : 0,
  |   "initializing_shards" : 0,
  |   "unassigned_shards" : 0
  | }

OK: git data is in sync (366 repos, 31 wikis, 851 gists)
  |                   TOTAL         OK      FAULT    PENDING      DELAY
  | repositories        366        366          0          0        0.0
  |        wikis         31         31          0          0        0.0
  |        gists        851        851          0          0        0.0
  |        total       1248       1248          0          0        0.0

OK: pages data is in sync
  | Pages are in sync


The ghe-repl-stop command temporarily disables replication for all datastores and stops the replication services. To resume replication, use the ghe-repl-start command.

admin@168-254-1-2:~$ ghe-repl-stop
Stopping Pages replication ...
Stopping Git replication ...
Stopping MySQL replication ...
Stopping Redis replication ...
Stopping Elasticsearch replication ...
Success: replication was stopped for all services.


The ghe-repl-promote command disables replication and converts the replica appliance to a primary. The appliance is configured with the same settings as the original primary and all services are enabled.

推广副本不会自动为现有设备创建副本。 在推广副本后,如有需要,可以设置从新的主设备复制到现有设备及之前的主设备。

admin@168-254-1-2:~$ ghe-repl-promote
Enabling maintenance mode on the primary to prevent writes ...
Stopping replication ...
  | Stopping Pages replication ...
  | Stopping Git replication ...
  | Stopping MySQL replication ...
  | Stopping Redis replication ...
  | Stopping Elasticsearch replication ...
  | Success: replication was stopped for all services.
Switching out of replica mode ...
  | Success: Replication configuration has been removed.
  | Run `ghe-repl-setup' to re-enable replica mode.
Applying configuration and starting services ...
Success: Replica has been promoted to primary and is now accepting requests.


The ghe-repl-teardown command disables replication mode completely, removing the replica configuration.

Further reading




所有 GitHub 文档都是开源的。看到错误或不清楚的内容了吗?提交拉取请求。


或者, 了解如何参与。