As part of a disaster recovery plan, you can protect production data on your GitHub Enterprise instance by configuring automated backups.

About GitHub Enterprise Backup Utilities

GitHub Enterprise Backup Utilities is a backup system you install on a separate host, which takes backup snapshots of your GitHub Enterprise instance at regular intervals over a secure SSH network connection. You can use a snapshot to restore an existing GitHub Enterprise instance to a previous state from the backup host.

Only data added since the last snapshot will transfer over the network and occupy additional physical storage space. To minimize performance impact, backups are performed online under the lowest CPU/IO priority. You do not need to schedule a maintenance window to perform a backup.

For more detailed information on features, requirements, and advanced usage, see the GitHub Enterprise Backup Utilities README.

Prerequisites

To use GitHub Enterprise Backup Utilities, you must have a Linux or Unix host system separate from your GitHub Enterprise instance.

You can also integrate GitHub Enterprise Backup Utilities into an existing environment for long-term permanent storage of critical data.

We recommend that the backup host and your GitHub Enterprise instance be geographically distant from each other. This ensures that backups are available for recovery in the event of a major disaster or network outage at the primary site.

Physical storage requirements will vary based on Git repository disk usage and expected growth patterns:

Hardware    Recommendation
vCPUs       2
Memory      2 GB
Storage     Five times the primary instance's allocated storage

More resources may be required depending on your usage, such as user activity and selected integrations.
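
As a rough sizing illustration, the storage recommendation above is a simple multiplication. The sketch below assumes a hypothetical primary instance with 500 GB of allocated storage; the helper name recommended_backup_storage_gb is made up for this example and is not part of GitHub Enterprise Backup Utilities.

```shell
#!/bin/sh
# Recommended backup storage = 5 x the primary instance's allocated storage.
# recommended_backup_storage_gb is a hypothetical helper for illustration only.
recommended_backup_storage_gb() {
  primary_gb="$1"
  echo $(( primary_gb * 5 ))
}

# Example: a primary instance with 500 GB allocated (assumed value).
recommended_backup_storage_gb 500   # prints 2500
```

Treat the result as a starting point; actual needs depend on repository growth and how many snapshots you retain.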

Installing GitHub Enterprise Backup Utilities

Note: To ensure a recovered appliance is immediately available, perform backups targeting the primary instance even in a Geo-replication configuration.

  1. Download the latest GitHub Enterprise Backup Utilities release and extract the file with the tar command.

    tar -xzvf /path/to/github-backup-utils-vMAJOR.MINOR.PATCH.tar.gz
    
  2. Copy the included backup.config-example file to backup.config and open it in an editor.

  3. Set the GHE_HOSTNAME value to your primary GitHub Enterprise instance's hostname or IP address.
  4. Set the GHE_DATA_DIR value to the filesystem location where you want to store backup snapshots.
  5. Open your primary instance's settings page at https://HOSTNAME/setup/settings and add the backup host's SSH key to the list of authorized SSH keys. For more information, see Accessing the administrative shell (SSH).
  6. Verify SSH connectivity with your GitHub Enterprise instance with the ghe-host-check command.

    bin/ghe-host-check
    
  7. To create an initial full backup, run the ghe-backup command.

    bin/ghe-backup
    

For more information on advanced usage, see the GitHub Enterprise Backup Utilities README.
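
Steps 3 and 4 amount to two assignments in backup.config. A minimal sketch, assuming a hypothetical hostname (github.example.com) and a hypothetical data directory (/data/ghe-backups); substitute the values for your own environment:

```shell
# backup.config (fragment); both values below are placeholders.

# Hostname or IP address of the primary GitHub Enterprise instance.
GHE_HOSTNAME="github.example.com"

# Filesystem location where backup snapshots are stored.
GHE_DATA_DIR="/data/ghe-backups"
```

Other settings in the example file can usually be left at their defaults for a first backup; see the README for details.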

Scheduling a backup

You can schedule regular backups on the backup host using the cron(8) command or a similar command scheduling service. The configured backup frequency dictates the worst-case recovery point objective (RPO) in your recovery plan. For example, if you schedule the backup to run every day at midnight, you could lose up to 24 hours of data in a disaster scenario. We recommend starting with an hourly backup schedule, which limits the worst-case data loss to about one hour if the primary site data is destroyed.

If backup attempts overlap, the ghe-backup command will abort with an error message indicating that another backup is already running. If this occurs, we recommend decreasing the frequency of your scheduled backups. For more information, see the "Scheduling backups" section of the GitHub Enterprise Backup Utilities README.
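
As an illustration of an hourly schedule, a cron job could call a small wrapper script that runs ghe-backup and keeps a log. This is a sketch under assumptions: the install path /opt/backup-utils, the log file location, the script name run-backup.sh, and the function name run_backup are all hypothetical, and crontab syntax can differ between cron implementations.

```shell
#!/bin/sh
# Hypothetical wrapper for cron. An hourly crontab entry might look like:
#   0 * * * * /opt/backup-utils/run-backup.sh
# (the script path is an assumption).

# Assumed install location and log file; override through the environment.
BACKUP_UTILS_DIR="${BACKUP_UTILS_DIR:-/opt/backup-utils}"
BACKUP_LOG="${BACKUP_LOG:-$BACKUP_UTILS_DIR/backup.log}"

run_backup() {
  # Append stdout and stderr to the log so that any error from ghe-backup,
  # including an overlap with a running backup, is recorded.
  if "$BACKUP_UTILS_DIR/bin/ghe-backup" >>"$BACKUP_LOG" 2>&1; then
    echo "$(date) backup succeeded" >>"$BACKUP_LOG"
  else
    status=$?
    echo "$(date) backup failed with exit status $status" >>"$BACKUP_LOG"
    return "$status"
  fi
}

# Add a call to run_backup as the last line when installing the script.
```

Reviewing the log for failed runs is one way to notice overlapping backups and decide whether the schedule is too frequent.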

Restoring a backup

In the event of a prolonged outage or catastrophic event at the primary site, you can restore your GitHub Enterprise instance by provisioning another GitHub Enterprise appliance and performing a restore from the backup host. Before restoring, you must add the backup host's SSH key to the target GitHub Enterprise appliance as an authorized SSH key.

If you are restoring to an appliance running GitHub Enterprise 2.11 from a backup snapshot taken on version 2.9 or 2.10, you may need to run a migration script on the original appliance first. For more information, see "Migrating audit logs to GitHub Enterprise 2.11."

To restore your GitHub Enterprise instance from the last successful snapshot, use the ghe-restore command. You should see output similar to this:

    $ ghe-restore -c 169.154.1.1
    Checking for leaked keys in the backup snapshot that is being restored ...
    * No leaked keys found
    Connect 169.154.1.1:122 OK (v2.9.0)
    WARNING: All data on GitHub Enterprise appliance 169.154.1.1 (v2.9.0)
             will be overwritten with data from snapshot 20170329T150710.
    Please verify that this is the correct restore host before continuing.
    Type 'yes' to continue: yes
    Starting restore of 169.154.1.1:122 from snapshot 20170329T150710
    # ...output truncated
    Completed restore of 169.154.1.1:122 from snapshot 20170329T150710
    Visit https://169.154.1.1/setup/settings to review appliance configuration.

Note: The network settings are excluded from the backup snapshot. You must manually configure the network on the target GitHub Enterprise appliance as required for your environment.

You can use these additional options with the ghe-restore command:

  • The -c flag overwrites the settings, certificate, and license data on the target host even if it is already configured. Omit this flag if you are setting up a staging instance for testing purposes and you wish to retain the existing configuration on the target. For more information, see the "Using the backup and restore commands" section of the GitHub Enterprise Backup Utilities README.
  • The -s flag allows you to select a different backup snapshot.
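
For example, to restore from a specific snapshot rather than the most recent one, pass its ID with -s. The sketch below reuses the snapshot ID and IP address from the sample output above purely for illustration. restore_snapshot and the DRY_RUN variable are hypothetical and not part of GitHub Enterprise Backup Utilities; by default the helper only prints the command it would run.

```shell
#!/bin/sh
# Hypothetical helper: build a ghe-restore command for a chosen snapshot.
# Usage: restore_snapshot SNAPSHOT_ID HOST
restore_snapshot() {
  snapshot_id="$1"
  host="$2"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: only print the command so nothing is overwritten by accident.
    echo "bin/ghe-restore -s $snapshot_id $host"
  else
    # Run from the Backup Utilities directory on the backup host.
    bin/ghe-restore -s "$snapshot_id" "$host"
  fi
}

# Values taken from the sample output above, for illustration only.
restore_snapshot 20170329T150710 169.154.1.1
```

Printing the command first gives you a chance to confirm the snapshot ID and target host before any data is overwritten.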