GitHub Enterprise is your organization's private copy of GitHub, contained within a virtual machine that you configure and control. Supported virtualization platforms include Amazon Web Services (cloud hosting) and VMware (on premises).

This section outlines the basic system and network architecture and options for deploying in your environment.

Storage architecture

The GitHub Enterprise virtual machine requires two storage volumes, which must be attached to the virtual machine separately and are mounted on these paths:

  • / - The root filesystem. It is included in the distributed machine image and contains the base operating system along with the GitHub Enterprise application environment. The root filesystem should be treated as ephemeral. Note: Any data on this filesystem will be replaced when upgrading to future GitHub Enterprise releases. The root filesystem holds the following information:
    • Custom CA Certificates (in /usr/local/share/ca-certificates)
    • Custom networking configurations
    • Custom firewall configurations
    • The replication state
  • /data/user - The user filesystem. This contains all configuration and user data, such as:
    • Git repositories
    • Databases
    • Search indexes
    • Published Pages content
    • Large files from Git Large File Storage
    • Pre-receive hook environments

The intent of this architecture is to simplify upgrade, rollback, and recovery procedures by separating the running software environment from persistent application data.
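As an illustration of the two-volume layout described above, the following is a minimal sketch that checks whether / and /data/user are backed by different devices. It assumes mount information in the Linux /proc/mounts text format; the function names parse_mounts and volumes_are_separate are hypothetical and are not part of GitHub Enterprise.

```python
from typing import Dict


def parse_mounts(mounts_text: str) -> Dict[str, str]:
    """Map each mount point to its backing device.

    Expects text in the Linux /proc/mounts format, where the first two
    whitespace-separated fields are the device and the mount point.
    """
    mounts: Dict[str, str] = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            device, mount_point = fields[0], fields[1]
            # A later entry for the same mount point overrides an earlier one.
            mounts[mount_point] = device
    return mounts


def volumes_are_separate(
    mounts_text: str, root: str = "/", user: str = "/data/user"
) -> bool:
    """Return True when both paths are mount points on different devices."""
    mounts = parse_mounts(mounts_text)
    return root in mounts and user in mounts and mounts[root] != mounts[user]
```

On a Linux host, passing the contents of /proc/mounts to volumes_are_separate would report whether the user filesystem sits on its own volume. The check only compares device names and does not verify volume size or health.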

Network architecture

You can run GitHub Enterprise as either a single virtual machine host or in a two-node active/passive configuration for increased redundancy in high availability deployments. Some organizations with tens of thousands of developers may also benefit from GitHub Enterprise Clustering. For more information, see "Clustering Overview".

When using a single node, network configuration requires only that the GitHub Enterprise instance be reachable by its users and by administrators. You can make the instance accessible over the public internet (in private mode only) or restrict access to specific closed networks. For more information on specific network ports required for application and administrative use, see "Network Port Configuration".

The two-node active/passive configuration is more complex. It requires two identical virtual machines, provisioned with separate storage and as much underlying hardware and network redundancy as possible. The instances must be reachable from each other over ports 122/TCP (SSH) and 1194/UDP (OpenVPN) for replication services, and DNS must be configured with short TTL values to support network failover.
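Before setting up replication, it can help to confirm that one node can reach the other on the SSH replication port. The following is a minimal sketch using only the Python standard library; the function name tcp_port_open and the constant REPLICATION_SSH_PORT are hypothetical. It covers 122/TCP only: a UDP port such as 1194 cannot be confirmed with a simple connect, because UDP has no handshake.

```python
import socket

# Port named in the text above for replication over SSH.
REPLICATION_SSH_PORT = 122


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, tcp_port_open("replica.example.com", REPLICATION_SSH_PORT), run from the primary with a placeholder hostname, would indicate whether the replica accepts TCP connections on port 122. A successful connection shows only that the port is reachable, not that replication is configured.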

For detailed instructions on setting up a two-node active/passive configuration, see "High Availability Configuration".

Data retention and datacenter redundancy

Before using GitHub Enterprise in a production environment, we strongly recommend you set up backups and a disaster recovery plan. For more information, see "Backups and Disaster Recovery".

GitHub Enterprise includes support for online and incremental backups via the GitHub Enterprise Backup Utilities. You can take incremental snapshots across long distances over a secure network link (the SSH administrative port) for off-site or geographically dispersed storage. In case of disaster at the primary datacenter, you can restore snapshots over the network into a newly provisioned GitHub Enterprise virtual machine.
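Regular snapshots accumulate on the backup host, so a retention rule is usually needed. The following is a minimal sketch of such a rule, assuming snapshots are identified by fixed-width timestamp names such as 20240101T010000. That naming and the function split_snapshots are assumptions made for illustration; consult the Backup Utilities documentation for the actual snapshot layout and retention settings.

```python
from typing import List, Tuple


def split_snapshots(names: List[str], keep: int) -> Tuple[List[str], List[str]]:
    """Return (kept, pruned), where kept holds the `keep` newest snapshots.

    Assumes fixed-width timestamp names, which sort chronologically
    when compared as plain strings.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ordered = sorted(names)  # oldest first
    return ordered[-keep:], ordered[:-keep]
```

The function only decides which names to keep. Deleting the pruned snapshots is left to the caller so that the decision can be reviewed first.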

In addition to network backups, both AWS (EBS) and VMware disk snapshots of the user storage volumes are supported while the appliance is offline or in maintenance mode. Regular volume snapshots can be used as a low-cost, low-complexity alternative to network backups with backup-utils if your service level requirements allow for regular offline maintenance.