Replacing a cluster node

To replace a node in a GitHub Enterprise Server cluster, mark the affected node offline in the cluster configuration file (cluster.conf) and add the replacement node. You might need to do this if a node fails, or if you want to swap in a node with more resources to improve performance.

About replacement of cluster nodes

The cluster topology for GitHub Enterprise Server provides horizontal scaling for companies with tens of thousands of developers. GitHub recommends clustering if a single primary node would routinely experience resource exhaustion. Clustering requires careful planning and additional administrative overhead. For more information, see "About clustering."

You can replace a functional node in a cluster, or you can replace a node that has failed unexpectedly.

Warning: To avoid conflicts, the replacement node must use a new hostname that has not been previously used in the cluster.
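
As a quick pre-check for the warning above, the following sketch tests whether a candidate hostname is already listed in a copy of cluster.conf. It is illustrative only. hostname_in_conf is a hypothetical helper, not a GitHub Enterprise Server utility. It assumes each node section contains a line of the form hostname = VALUE, as in the examples in this article. It cannot detect hostnames that were used in the past and have since been removed from the file.

```shell
#!/usr/bin/env bash
# Sketch: report whether a candidate hostname is already listed in a
# cluster.conf-style file. hostname_in_conf is a hypothetical helper name.

hostname_in_conf() {
  local conf="$1" candidate="$2"
  awk -v want="$candidate" '
    # Match lines of the form "hostname = VALUE", ignoring indentation.
    /^[ \t]*hostname[ \t]*=/ {
      value = $0
      sub(/^[^=]*=[ \t]*/, "", value)   # drop the key and the equals sign
      sub(/[ \t]+$/, "", value)         # drop trailing whitespace
      if (value == want) found = 1
    }
    # Exit 0 when the hostname was found, 1 otherwise.
    END { exit (found ? 0 : 1) }
  ' "$conf"
}
```

For example, hostname_in_conf cluster.conf ghe-replacement-data-node-3 exits with status 0 if that hostname already appears in the file, and with status 1 if it does not.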

Replacing a functional node

  1. Provision and install GitHub Enterprise Server with a unique hostname on the replacement node.

  2. Using the administrative shell or DHCP, configure only the IP address of the replacement node. Don't configure any other settings.

  3. To add the newly provisioned replacement node, on any node, modify the cluster.conf file to remove the node you are replacing and add the replacement node. For example, this modified cluster.conf file replaces ghe-data-node-3 with the newly provisioned node, ghe-replacement-data-node-3:

    [cluster "ghe-replacement-data-node-3"]
      hostname = ghe-replacement-data-node-3
      ipv4 = 192.168.0.7
      # ipv6 = fd12:3456:789a:1::7
      git-server = true
      pages-server = true
      mysql-server = true
      elasticsearch-server = true
      redis-server = true
      memcache-server = true
      metrics-server = true
      storage-server = true
    
  4. From the administrative shell of the node with the modified cluster.conf, run ghe-cluster-config-init. This will initialize the newly added node in the cluster.

  5. From the same node, run ghe-cluster-config-apply. This will validate the configuration file, copy it to each node in the cluster, and configure each node according to the modified cluster.conf file.

  6. If you're taking a node offline that provides data services, such as git-server, pages-server, or storage-server, evacuate the node. For more information, see "Evacuating a cluster node running data services."

  7. To mark the node you are replacing as offline, on any node, modify the cluster configuration file (cluster.conf) in the relevant node section to include the text offline = true.

    For example, this modified cluster.conf will mark the ghe-data-node-3 node as offline:

      [cluster "ghe-data-node-3"]
        hostname = ghe-data-node-3
        offline = true
        ipv4 = 192.168.0.6
        # ipv6 = fd12:3456:789a:1::6
      
  8. From the administrative shell of the node where you modified cluster.conf, run ghe-cluster-config-apply. This will validate the configuration file, copy it to each node in the cluster, and mark the node offline.

  9. If you're replacing the primary MySQL or Redis node, in cluster.conf, set the mysql-master or redis-master value to the name of the replacement node.

    For example, this modified cluster.conf file specifies a newly provisioned cluster node, ghe-replacement-data-node-1, as the primary MySQL and Redis node:

    mysql-master = ghe-replacement-data-node-1
    redis-master = ghe-replacement-data-node-1
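
If you prefer to script the edit in the last step rather than use a text editor, it could be sketched as follows. This is a minimal illustration. set_primary is a hypothetical helper, not a GitHub Enterprise Server utility, and it assumes the key appears on its own line in the form key = value, as in the example above. Try it on a copy of cluster.conf first.

```shell
#!/usr/bin/env bash
# Sketch: set mysql-master or redis-master to a replacement node name in a
# cluster.conf-style file. set_primary is a hypothetical helper name.

set_primary() {
  local conf="$1" key="$2" node="$3" tmp status=0
  tmp="$(mktemp)" || return 1
  awk -v key="$key" -v node="$node" '
    # Rewrite the value of the requested key, preserving its indentation.
    $0 ~ ("^[ \t]*" key "[ \t]*=") {
      indent = $0
      sub(/[^ \t].*/, "", indent)   # keep only the leading whitespace
      print indent key " = " node
      next
    }
    { print }                       # copy every other line unchanged
  ' "$conf" > "$tmp" && cat "$tmp" > "$conf" || status=$?
  # cat (rather than mv) keeps the original file permissions and ownership.
  rm -f "$tmp"
  return "$status"
}
```

For example, set_primary cluster.conf mysql-master ghe-replacement-data-node-1 updates only the mysql-master line. Call it again with redis-master if the replacement node is also the primary Redis node.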
    

Replacing a node in an emergency

  1. Provision and install GitHub Enterprise Server with a unique hostname on the replacement node.

  2. Using the administrative shell or DHCP, configure only the IP address of the replacement node. Don't configure any other settings.

  3. To mark the failed node offline, on any node, modify the cluster configuration file (cluster.conf) in the relevant node section to include the text offline = true.

    For example, this modified cluster.conf will mark the ghe-data-node-3 node as offline:

      [cluster "ghe-data-node-3"]
        hostname = ghe-data-node-3
        offline = true
        ipv4 = 192.168.0.6
        # ipv6 = fd12:3456:789a:1::6
      
  4. From the administrative shell of the node where you modified cluster.conf, run ghe-cluster-config-apply. This will validate the configuration file, copy it to each node in the cluster, and mark the node offline.

  5. To add the newly provisioned replacement node, on any node, modify the cluster.conf file to remove the failed node and add the replacement node. For example, this modified cluster.conf file replaces ghe-data-node-3 with the newly provisioned node, ghe-replacement-data-node-3:

    [cluster "ghe-replacement-data-node-3"]
      hostname = ghe-replacement-data-node-3
      ipv4 = 192.168.0.7
      # ipv6 = fd12:3456:789a:1::7
      git-server = true
      pages-server = true
      mysql-server = true
      elasticsearch-server = true
      redis-server = true
      memcache-server = true
      metrics-server = true
      storage-server = true
    
  6. If you're replacing the primary MySQL or Redis node, in cluster.conf, set the mysql-master or redis-master value to the name of the replacement node.

    For example, this modified cluster.conf file specifies a newly provisioned cluster node, ghe-replacement-data-node-1, as the primary MySQL and Redis node:

    mysql-master = ghe-replacement-data-node-1
    redis-master = ghe-replacement-data-node-1
    
  7. From the administrative shell of the node with the modified cluster.conf, run ghe-cluster-config-init. This will initialize the newly added node in the cluster.

  8. From the same node, run ghe-cluster-config-apply. This will validate the configuration file, copy it to each node in the cluster, and configure each node according to the modified cluster.conf file.

  9. If you're taking a node offline that provides data services, such as git-server, pages-server, or storage-server, evacuate the node. For more information, see "Evacuating a cluster node running data services."
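
In an emergency, it can help to script the offline = true edit from step 3 instead of making it by hand. The following is a minimal sketch under stated assumptions. mark_offline is a hypothetical helper, not a GitHub Enterprise Server utility, and it assumes the node section has a [cluster "NAME"] header followed by a hostname line, as in the examples in this article. Try it on a copy of cluster.conf first.

```shell
#!/usr/bin/env bash
# Sketch: add "offline = true" to one node section of a cluster.conf-style
# file. mark_offline is a hypothetical helper name.

mark_offline() {
  local conf="$1" node="$2" tmp status=0
  tmp="$(mktemp)" || return 1
  awk -v header="[cluster \"$node\"]" '
    {
      line = $0
      gsub(/^[ \t]+|[ \t]+$/, "", line)   # trimmed copy used for comparisons
    }
    # Track whether the current line is inside the target node section.
    line ~ /^\[/ { in_section = (line == header) }
    # Drop any existing offline setting so the function can be re-run safely.
    in_section && line ~ /^offline[ \t]*=/ { next }
    { print }
    # Right after the hostname line of the target section, add the flag.
    in_section && line ~ /^hostname[ \t]*=/ {
      indent = $0
      sub(/[^ \t].*/, "", indent)         # reuse the indentation of the hostname line
      print indent "offline = true"
    }
  ' "$conf" > "$tmp" && cat "$tmp" > "$conf" || status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example, mark_offline cluster.conf ghe-data-node-3 produces the node section shown in step 3. The change still takes effect only after you run ghe-cluster-config-apply, as described in step 4.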