Before you begin using GitHub Enterprise in production, you should set up a backup host, schedule automated backups, and develop a disaster recovery plan.
GitHub Enterprise Backup Utilities
The GitHub Enterprise Backup Utilities are a companion software package designed to run on a Linux (or other modern Unix) host system separate from the primary GitHub Enterprise instance.
The backup utilities take snapshots at regular intervals over a secure SSH network connection initiated from the backup host to your GitHub Enterprise instance. Snapshots can be taken while the instance is online and in active use, and they run under the lowest scheduling priority to minimize performance impact. Snapshots are incremental: only data added since the last snapshot is transferred over the network and occupies additional physical storage space.
Use backup snapshots to restore an existing or newly provisioned GitHub Enterprise instance to a previous state by issuing a restore command from the backup host.
See the github/backup-utils project repository on GitHub.com for more detailed information on features, requirements, and advanced usage.
Backup host requirements and recommendations
The backup utilities run on a wide range of Unix/Linux host operating systems so that they can integrate easily into established backup and disaster recovery environments. You can integrate the backup utilities into an existing environment for long-term storage of mission-critical data.
We recommend that the backup host and your GitHub Enterprise instance be geographically distant from each other. This ensures that backups remain available for recovery in the event of a major disaster or network outage at the primary site.
Physical storage requirements will vary based on Git repository disk usage and expected growth patterns. We recommend allocating at least 5x the amount of storage allocated to your GitHub Enterprise instance to allow for historical snapshots and growth over time.
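To make the sizing rule concrete, the following POSIX shell sketch multiplies the primary instance's allocated storage by the recommended factor. The function name recommended_backup_storage_gb is hypothetical and not part of the backup utilities; it only encodes the 5x minimum, with an optional larger multiplier if you want to keep more snapshot history.

```shell
#!/bin/sh
# Sketch of the sizing rule: backup storage >= 5x the primary instance's
# allocated storage. Hypothetical helper, not part of backup-utils.

# Usage: recommended_backup_storage_gb ALLOCATED_GB [MULTIPLIER]
# The body runs in a subshell ( ... ), so its variables stay local.
recommended_backup_storage_gb() (
  allocated_gb=$1
  multiplier=${2:-5}   # 5x is the recommended minimum; use more for longer history
  for n in "$allocated_gb" "$multiplier"; do
    case $n in
      ''|*[!0-9]*)
        echo "usage: recommended_backup_storage_gb ALLOCATED_GB [MULTIPLIER]" >&2
        exit 1 ;;   # leaves only the subshell, so the function returns 1
    esac
  done
  echo $((allocated_gb * multiplier))
)
```

For example, recommended_backup_storage_gb 500 prints 2500 for an instance with 500 GB allocated.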
Hardware recommendations
- vCPU: 2
- Memory: 2GB
- Storage: 5x the primary instance's allocated storage
Installing the Backup Utilities
The backup utilities are distributed as a separate software package. Follow the instructions below to install, configure, and perform an initial full backup of the primary GitHub Enterprise instance.
- Download the latest github/backup-utils release and extract it on the backup host.
- Copy the included backup.config-example file to backup.config and open it in an editor.
- Set the GHE_HOSTNAME value to the primary GitHub Enterprise instance's host name or IP address, and the GHE_DATA_DIR value to the filesystem location where you want to store backup snapshots.
- Open the primary instance's settings page at https://[hostname]/setup/settings and add the backup host's SSH key to the list of Authorized SSH keys. For more information, see "Administrative Shell (SSH) Access."
- Run bin/ghe-host-check to verify SSH connectivity with the GitHub Enterprise appliance.
- Run bin/ghe-backup to perform an initial full backup.
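After copying and editing the file, the two settings from the steps above might read as follows. The host name and path are illustrative placeholders, not defaults; the file uses shell-style NAME="value" assignments, so keep the quotes.

```shell
# backup.config (fragment). Values below are placeholders for illustration.
# Host name or IP address of the primary GitHub Enterprise instance.
GHE_HOSTNAME="github.example.com"
# Filesystem location on the backup host where snapshots are written.
GHE_DATA_DIR="/data/ghe-backups"
```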
Backup scheduling and recovery point objective
Schedule regular backups on the backup host using cron(8) or a similar command scheduling service. The configured backup frequency determines the worst-case recovery point objective (RPO) in your recovery plan. For example, if you schedule the backup to run every day at midnight, you could lose up to 24 hours of data in a disaster scenario.
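Stated as a worked equation, with Δt the interval between scheduled backups and ignoring the time a backup run itself takes, the worst-case data loss is roughly one interval:

```latex
\mathrm{RPO}_{\text{worst}} \approx \Delta t,
\qquad
\Delta t = 24\ \text{h} \;\Rightarrow\; \text{up to } 24\ \text{h of data lost},
\qquad
\Delta t = 1\ \text{h} \;\Rightarrow\; \text{up to } 1\ \text{h of data lost}.
```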
We recommend starting with an hourly backup schedule, which limits worst-case data loss to one hour should primary site data be destroyed. If backup runs overlap, the most recent ghe-backup command aborts with an error message indicating that a simultaneous backup is in progress. If this occurs, we recommend decreasing the frequency of your scheduled backups until the overlaps no longer occur.
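One rough way to apply this advice is to compare the longest backup run you have observed with the scheduled interval. The POSIX shell sketch below is hypothetical: check_backup_schedule is not a backup-utils command, and both durations are inputs you measure yourself, for example by timing past runs. It reports the worst-case RPO for the interval and, when a run is at least as long as the interval, suggests the next whole-hour interval that exceeds the run.

```shell
#!/bin/sh
# Hypothetical planning helper (not part of backup-utils).
# Usage: check_backup_schedule INTERVAL_MIN LONGEST_RUN_MIN
# Both arguments are whole minutes. Returns 1 when overlap is likely.
# The body runs in a subshell ( ... ), so its variables stay local.
check_backup_schedule() (
  interval=$1
  longest_run=$2
  echo "worst-case RPO: ~${interval} min"
  if [ "$longest_run" -ge "$interval" ]; then
    # Smallest whole number of hours strictly longer than the longest run.
    suggested=$(( (longest_run / 60 + 1) * 60 ))
    echo "overlap likely: runs take up to ${longest_run} min but start every ${interval} min"
    echo "try an interval of at least ${suggested} min"
    exit 1   # leaves only the subshell, so the function returns 1
  fi
  echo "no overlap expected"
)
```

For example, check_backup_schedule 60 75 returns 1 and suggests an interval of at least 120 minutes, which in cron terms is a schedule such as 0 */2 * * *.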
Example cron configuration
The following assumes that the backup utilities are installed at /opt/backup-utils. Add the cron entries under a user that has write access to the GHE_DATA_DIR path set in the backup.config file.
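Before adding the entries, you may want to confirm that the cron user can actually write to the snapshot directory. This POSIX shell sketch is a hypothetical pre-flight check, not part of the backup utilities; pass it the same path you set as GHE_DATA_DIR (an absolute path is assumed here).

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of backup-utils): confirm that the
# current user can create files in the snapshot directory.
# Usage: check_data_dir_writable GHE_DATA_DIR_PATH
# The body runs in a subshell ( ... ), so its variables stay local.
check_data_dir_writable() (
  dir=$1
  if [ ! -d "$dir" ]; then
    echo "$dir: not a directory" >&2
    exit 1
  fi
  if [ ! -w "$dir" ]; then
    echo "$dir: not writable by $(id -un)" >&2
    exit 1
  fi
  echo "$dir: writable by $(id -un)"
)
```

Run it as the same user that will own the cron entry, for example check_data_dir_writable /data/ghe-backups with your own path.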
To schedule hourly backup snapshots, with verbose informational output appended to a log file and errors sent by email:
MAILTO=admin@example.com
0 * * * * /opt/backup-utils/bin/ghe-backup -v 1>>/opt/backup-utils/backup.log
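If hourly runs overlap, one way to reduce the frequency is a step value in the hour field. The variant below is a sketch that keeps the same paths and logging as the entry above but runs every four hours; four hours is only an example, and it raises the worst-case RPO to about four hours.

```shell
MAILTO=admin@example.com
# Every four hours (00:00, 04:00, 08:00, ...) instead of hourly.
0 */4 * * * /opt/backup-utils/bin/ghe-backup -v 1>>/opt/backup-utils/backup.log
```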
Recovery procedures
Tip: This section focuses primarily on disaster recovery scenarios, but you can also use the restore utility to set up staging environments for testing upgrades, or to migrate between supported platforms and across major version upgrades.
In the event of a prolonged outage or catastrophic event at the primary site, you can restore operations of your GitHub Enterprise environment by provisioning a GitHub Enterprise appliance and performing a restore from the backup host.
To restore a backup, add the backup host's SSH key to the target GitHub Enterprise appliance as an Authorized SSH key. For more information, see "Administrative Shell (SSH) Access."
This example uses the -c flag, which overwrites the settings, certificate, and license data on the target host even if it is already configured. Omit this flag if you are setting up a staging instance for testing purposes and want to retain the existing configuration on the target.
Note: The network settings are excluded from the backup snapshot. You must manually configure the network on the target GitHub Enterprise appliance as required for your environment.
$ ghe-restore -c 169.154.1.1
Checking for leaked keys in the backup snapshot that is being restored ...
* No leaked keys found
Connect 169.154.1.1:122 OK (v2.9.0)

WARNING: All data on GitHub Enterprise appliance 169.154.1.1 (v2.9.0) will be overwritten with data from snapshot 20170329T150710.
Please verify that this is the correct restore host before continuing.
Type 'yes' to continue: yes

Starting restore of 169.154.1.1:122 from snapshot 20170329T150710
# ...output truncated
Completed restore of 169.154.1.1:122 from snapshot 20170329T150710
Visit https://169.154.1.1/setup/settings to review appliance configuration.
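When several snapshots exist, it can help to confirm which one is newest before typing yes at the WARNING prompt, which names the snapshot being restored. The POSIX shell sketch below assumes that snapshots are kept as subdirectories of GHE_DATA_DIR named with a timestamp in the same form as the snapshot ID in the example output (YYYYMMDDTHHMMSS). latest_snapshot is a hypothetical helper, so check the layout of your own data directory before relying on it.

```shell
#!/bin/sh
# Hypothetical helper (not part of backup-utils). Assumes each snapshot is a
# subdirectory of GHE_DATA_DIR named like 20170329T150710 (YYYYMMDDTHHMMSS).
# Fixed-width timestamps sort lexically in chronological order.
# Usage: latest_snapshot GHE_DATA_DIR_PATH
latest_snapshot() (
  data_dir=$1
  LC_ALL=C   # byte-order sorting for the glob expansion below
  latest=
  for path in "$data_dir"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]T[0-9][0-9][0-9][0-9][0-9][0-9]; do
    [ -d "$path" ] || continue   # skip plain files and an unmatched glob
    latest=${path##*/}           # glob results are sorted, so the last match is newest
  done
  if [ -z "$latest" ]; then
    echo "no snapshots found in $data_dir" >&2
    exit 1   # leaves only the subshell, so the function returns 1
  fi
  echo "$latest"
)
```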