Before you begin using GitHub Enterprise in production, you should set up a backup host, schedule automated backups, and develop a recovery plan as part of an overall disaster recovery plan.

GitHub Enterprise Backup Utilities

The GitHub Enterprise Backup Utilities are a companion piece of software designed to run on a Linux (or other modern Unix) host system separate from the primary GitHub Enterprise instance.

Backup snapshots are taken at regular intervals over a secure (SSH) network connection initiated from the backup host to the GitHub Enterprise instance. Backups may be taken while the instance is online and in active use, and they run under the lowest scheduling priority to minimize performance impact. Snapshots are incremental: only data added since the last snapshot is transferred over the network and occupies additional physical storage space.

Backup snapshots may be used to restore an existing or newly provisioned GitHub Enterprise instance to a previous state by issuing a restore command from the backup host.

See the github/backup-utils project repository on GitHub.com for more detailed information on features, requirements, and advanced usage.

Backup host requirements and recommendations

The backup utilities are designed to run on a wide range of Unix/Linux host operating systems so that they integrate easily into established backup and disaster recovery environments. If your organization already operates a separate, isolated environment for long-term storage of mission-critical data, the backup utilities should fit right in.

It's highly recommended that the backup host and primary GitHub Enterprise instance be geographically distant from each other. This ensures that backups are available for recovery in the event of a major disaster or network outage at the primary site.

Physical storage requirements will vary based on Git repository disk usage and expected growth patterns. Allocating at least 5x the amount of storage allocated to the primary GitHub appliance is recommended to allow for historical snapshots and growth over time.

The following list summarizes the recommended hardware requirements:

  • vCPU: 2
  • Memory: 2GB
  • Storage: 5x the primary instance's allocated storage
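
As a rough illustration of the 5x guideline, the sizing arithmetic can be sketched in shell. The function name recommended_backup_storage_gb is hypothetical and not part of the backup utilities; it only multiplies the primary instance's allocated storage (in GB) by five.

```shell
# Hypothetical helper (not part of backup-utils): apply the 5x storage
# guideline to the primary instance's allocated storage, given in GB.
recommended_backup_storage_gb() {
  primary_gb="$1"
  echo $(( primary_gb * 5 ))
}
```

For example, `recommended_backup_storage_gb 200` prints 1000, meaning a primary instance with 200GB allocated would call for roughly 1TB on the backup host.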

Installing the Backup Utilities

The backup utilities are distributed as a separate software package. Follow the instructions below to install, configure, and perform an initial full backup of the primary GitHub Enterprise instance.

  1. Download the latest github/backup-utils release and extract it on the backup host.
  2. Copy the included backup.config-example file to backup.config and open it in an editor.
  3. Set the GHE_HOSTNAME value to the primary GitHub Enterprise instance's host name or IP address and the GHE_DATA_DIR value to the filesystem location where backup snapshots should be stored.
  4. Open the primary instance's settings page at https://[hostname]/setup/settings and add the backup host's SSH key to the list of Authorized SSH keys. See Administrative Shell (SSH) Access for detailed instructions.
  5. Run bin/ghe-host-check to verify SSH connectivity with the GitHub appliance.
  6. Run bin/ghe-backup to perform an initial full backup.
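
Steps 5 and 6 can be wrapped in a small guard that first confirms the two settings from step 3 are present. This is only a sketch: the function name run_first_backup is hypothetical, and it assumes that backup.config holds uncommented NAME=value lines and that the function is called from the directory where the release was extracted.

```shell
# Hypothetical wrapper (not part of backup-utils) around steps 5 and 6.
# Assumption: backup.config contains lines of the form NAME=value.
run_first_backup() {
  for name in GHE_HOSTNAME GHE_DATA_DIR; do
    if ! grep -q "^${name}=" backup.config 2>/dev/null; then
      echo "backup.config: ${name} is not set" >&2
      return 1
    fi
  done

  # Step 4 (adding the backup host's SSH key) happens in the web UI and
  # must be complete before the connectivity check can succeed.
  bin/ghe-host-check || return 1   # step 5: verify SSH connectivity
  bin/ghe-backup                   # step 6: initial full backup
}
```

Stopping when the host check fails avoids starting a backup that cannot reach the appliance.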

Backup scheduling and recovery point objective

Regular backups should be scheduled on the backup host using cron(8) or a similar command scheduling service. The configured backup frequency dictates the worst-case recovery point objective (RPO) in your recovery plan. For example, if backups are scheduled to run every day at midnight, you could lose up to 24 hours of data in a disaster scenario.

We recommend starting with an hourly backup schedule, which limits the worst-case data loss to one hour should primary site data be destroyed. If backup runs overlap, the most recent ghe-backup command aborts with an error message indicating that another backup is already in progress. If this occurs, we recommend decreasing the frequency of your scheduled backups until the overlaps no longer occur.
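
The relationship between the schedule interval, the worst-case RPO, and overlapping runs comes down to simple arithmetic. The function below is a hypothetical helper, not part of the backup utilities. It takes the schedule interval and an observed backup duration, both in minutes, and reports whether a run would still be in progress when the next one starts.

```shell
# Hypothetical helper (not part of backup-utils).
#   $1: minutes between scheduled backup runs
#   $2: minutes a single backup run was observed to take
check_backup_schedule() {
  interval_min="$1"
  duration_min="$2"
  echo "worst-case RPO: ${interval_min} minutes"
  if [ "$duration_min" -ge "$interval_min" ]; then
    echo "overlap: runs take ${duration_min} minutes; schedule backups less often"
    return 1
  fi
  echo "ok: runs finish before the next one starts"
}
```

With an hourly schedule, `check_backup_schedule 60 20` reports no overlap, while `check_backup_schedule 60 75` reports an overlap and returns a non-zero status.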

Example cron configuration

The following assumes that the backup utilities are installed at /opt/backup-utils. Commands should be issued under a user that has write access to the GHE_DATA_DIR path set in the backup.config file.

To schedule hourly backup snapshots with verbose informational output appended to a log file and errors generating an email (cron mails any output left on standard error to the MAILTO address):

MAILTO=admin@example.com
0 * * * * /opt/backup-utils/bin/ghe-backup -v 1>>/opt/backup-utils/backup.log
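
Because the cron job runs unattended, it can help to confirm beforehand that the scheduling user is able to write to the snapshot location. The check below is a minimal sketch. The function name check_data_dir_writable is hypothetical, and the path passed to it should be whatever GHE_DATA_DIR is set to in backup.config.

```shell
# Hypothetical pre-flight check (not part of backup-utils): confirm the
# current user can write to the directory configured as GHE_DATA_DIR.
check_data_dir_writable() {
  data_dir="$1"
  if [ -d "$data_dir" ] && [ -w "$data_dir" ]; then
    echo "ok: $data_dir is writable by $(id -un)"
  else
    echo "error: $data_dir is missing or not writable by $(id -un)" >&2
    return 1
  fi
}
```

Run it as the same user that owns the crontab entry, since write access is evaluated for the invoking user.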

Recovery procedures

In the event of a prolonged outage or catastrophic event at the primary site, basic operations of your GitHub Enterprise environment can be restored by provisioning a GitHub Enterprise appliance and performing a restore from the backup host.

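To successfully restore a backup, first add the backup host's SSH key to the GitHub appliance as an Authorized SSH key. For more information, see "Administrative Shell (SSH) Access." Then run the ghe-restore command from the backup host, passing the appliance's host name or IP address: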

$ ghe-restore 169.154.1.1
Starting restore of 169.154.1.1 from snapshot 20141111T174152
Connect 169.154.1.1 OK (v2.0.0)
Enabling maintenance mode on 169.154.1.1 ...
Restoring Git repositories ...
Restoring GitHub Pages ...
Restoring MySQL database ...
Restoring Redis database ...
Restoring SSH authorized keys ...
Restoring Elasticsearch indices ...
Restoring SSH host keys ...
Completed restore of 169.154.1.1 from snapshot 20141111T174152
Visit https://169.154.1.1/setup/settings to configure the recovered appliance.
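
The restore output above names the snapshot it restored from. To see which snapshot is newest before starting a restore, something like the following could be used. It is a sketch under one stated assumption: that snapshots are stored as directories named with a timestamp such as 20141111T174152 directly under the GHE_DATA_DIR path. The function name latest_snapshot is hypothetical.

```shell
# Hypothetical helper (not part of backup-utils).
# Assumption: each snapshot is a directory under GHE_DATA_DIR whose name is
# a timestamp like 20141111T174152, so a plain sort orders them by age.
latest_snapshot() {
  data_dir="$1"
  ls -1 "$data_dir" 2>/dev/null \
    | grep -E '^[0-9]{8}T[0-9]{6}$' \
    | sort \
    | tail -n 1
}
```

The function prints nothing when no matching directory is found, which makes an empty result easy to test for in a wrapper script.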