
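This version of GitHub Enterprise was discontinued on 2020-11-12. No patch releases will be made, even for critical security issues. For better performance, improved security, and new features, upgrade to the latest version of GitHub Enterprise. For help with the upgrade, contact GitHub Enterprise support.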

Monitoring cluster nodes

A GitHub Enterprise Server cluster comprises redundant services distributed across two or more nodes. If an individual service or an entire node fails, the failure should not be immediately apparent to users of the cluster. However, because performance and redundancy are affected, it is important to monitor the health of a GitHub Enterprise Server cluster.



Manually checking cluster status

GitHub Enterprise Server has a built-in command-line utility for monitoring the health of the cluster. From the administrative shell, the ghe-cluster-status command runs a series of health checks on each node, including verification of connectivity and service status. The output lists every test result, each marked with the text ok or error. For example, to display only the failing tests, run:

admin@ghe-data-node-0:~$ ghe-cluster-status | grep error
> mysql-replication ghe-data-node-0: error Stopped
> mysql cluster: error

Note: If there are no failing tests, this command produces no output. This indicates the cluster is healthy.
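Because an empty result means a healthy cluster, the same filter can be turned into a pass/fail check for scripts. The following is a minimal sketch, not part of GitHub Enterprise Server: cluster_failures is a hypothetical helper that reads status output on standard input, prints the failing lines, and returns a non-zero status when any are found.

```shell
# Hypothetical helper (not shipped with GitHub Enterprise Server).
# Reads ghe-cluster-status output on stdin, prints every line that
# contains the word "error", and returns 1 if at least one was found.
cluster_failures() {
  # grep exits 0 when it matches; invert that so "failures present"
  # becomes a non-zero (failing) status for the caller.
  if grep -w 'error'; then
    return 1
  fi
  return 0
}

# Intended use from the administrative shell (assumption: run on a node):
#   ghe-cluster-status | cluster_failures || echo "cluster is degraded"
```

The word match (-w) keeps the filter from firing on a longer token that merely contains the letters "error".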

Monitoring cluster status with Nagios

You can configure Nagios to monitor GitHub Enterprise Server. In addition to monitoring basic connectivity to each of the cluster nodes, you can check the cluster status by configuring Nagios to use the ghe-cluster-status -n command. This returns output in a format that Nagios understands.
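For context, Nagios plugins conventionally report state through their exit status: 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN. The sketch below maps an exit status to those names. nagios_state is a hypothetical helper for illustration, and this page does not document which of these states ghe-cluster-status -n actually emits.

```shell
# Hypothetical helper: translate a plugin exit status into the
# conventional Nagios state name.
nagios_state() {
  case "$1" in
    0) echo "OK" ;;
    1) echo "WARNING" ;;
    2) echo "CRITICAL" ;;
    3) echo "UNKNOWN" ;;
    # Anything else is outside the convention; this sketch chooses to
    # label it UNKNOWN as well.
    *) echo "UNKNOWN" ;;
  esac
}

# Example: nagios_state 2   prints CRITICAL
```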

Prerequisites

  • Linux host running Nagios.
  • Network access to the GitHub Enterprise Server cluster.

Configuring the Nagios host

  1. Generate an SSH key with a blank passphrase. Nagios uses this to authenticate to the GitHub Enterprise Server cluster.

    nagiosuser@nagios:~$ ssh-keygen -t rsa -b 4096
    > Generating public/private rsa key pair.
    > Enter file in which to save the key (/home/nagiosuser/.ssh/id_rsa):
    > Enter passphrase (empty for no passphrase): leave blank by pressing enter
    > Enter same passphrase again: press enter again
    > Your identification has been saved in /home/nagiosuser/.ssh/id_rsa.
    > Your public key has been saved in /home/nagiosuser/.ssh/id_rsa.pub.

    Security Warning: An SSH key without a passphrase can pose a security risk if authorized for full access to a host. Limit this key's authorization to a single read-only command.

  2. Copy the private key (id_rsa) to the nagios user's home folder and set the appropriate ownership.

    nagiosuser@nagios:~$ sudo cp .ssh/id_rsa /var/lib/nagios/.ssh/
    nagiosuser@nagios:~$ sudo chown nagios:nagios /var/lib/nagios/.ssh/id_rsa

  3. To authorize the public key to run only the ghe-cluster-status -n command, use a command= prefix in the /data/user/common/authorized_keys file. From the administrative shell on any node, modify this file to add the public key generated in step 1. For example:

    command="/usr/local/bin/ghe-cluster-status -n" ssh-rsa AAAA....

  4. Validate and copy the configuration to each node in the cluster by running ghe-cluster-config-apply on the node where you modified the /data/user/common/authorized_keys file.

    admin@ghe-data-node-0:~$ ghe-cluster-config-apply
    > Validating configuration
    > ...
    > Finished cluster configuration

  5. To test that the Nagios plugin can successfully execute the command, run it interactively from the Nagios host.

    nagiosuser@nagios:~$ /usr/lib/nagios/plugins/check_by_ssh -l admin -p 122 -H hostname -C "ghe-cluster-status -n" -t 30
    > OK - No errors detected

  6. Create a command definition in your Nagios configuration.

    Example definition
    define command {
          command_name    check_ssh_ghe_cluster
          command_line    $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "ghe-cluster-status -n" -l admin -p 122 -t 30
    }
    
  7. Add this command to a service definition for a node in the GitHub Enterprise Server cluster.

    Example definition
    define host {
          use                     generic-host
          host_name               ghe-data-node-0
          alias                   ghe-data-node-0
          address                 10.11.17.180
    }

    define service {
          use                     generic-service
          host_name               ghe-data-node-0
          service_description     GitHub Cluster Status
          check_command           check_ssh_ghe_cluster
    }
    

Once you add the definition to Nagios, the service check executes according to your configuration. You should be able to see the newly configured service in the Nagios web interface.
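The restricted authorized_keys entry from step 3 can be assembled from the public key file instead of being typed by hand. This is a minimal sketch under stated assumptions: restricted_key_line is a hypothetical helper, and the key path in the usage comment is the default location from step 1.

```shell
# Hypothetical helper: print an authorized_keys entry that limits the
# given public key to the single command "ghe-cluster-status -n".
# $1 is the path to the public key file (for example id_rsa.pub).
restricted_key_line() {
  if [ ! -r "$1" ]; then
    echo "cannot read public key file: $1" >&2
    return 1
  fi
  # The command= prefix is the same one shown in step 3.
  printf 'command="/usr/local/bin/ghe-cluster-status -n" %s\n' "$(cat "$1")"
}

# Example (assumes the key generated in step 1):
#   restricted_key_line /home/nagiosuser/.ssh/id_rsa.pub
```

The printed line is what gets added to /data/user/common/authorized_keys before running ghe-cluster-config-apply.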

