ping903

1 Overview

Ping903 is designed to periodically monitor a very large number of remote hosts using ICMP ECHO packets. The system uses a client-server architecture. The main component (ping903) is a daemon that stays in memory and wakes up periodically to send a certain number of ICMP echo packets to a preconfigured set of hosts and to collect the replies. The resulting round-trip statistics are made available via a REST API.

The daemon reads its settings from a plain text configuration file. Most settings have sensible defaults, so that the only thing that the user needs to supply to get started is a list of IP addresses to monitor. This list is referred to in this document as ip-list.

A simple command line client utility (ping903q) allows the user to communicate with the daemon and obtain information about a particular host or about all monitored hosts at once. This utility can operate in several modes. In particular, it can be used as a Nagios external check tool, instead of the standard check_ping utility.

The package and its companion packages are available for download from https://download.gnu.org.ua/release/ping903.

For recent news, visit the package development site at https://puszcza.gnu.org.ua/projects/ping903.

The git repository is available at http://git.gnu.org.ua/cgit/ping903.git.

2 Installation

To build ping903 you will need the GNU Libmicrohttpd library. It is available for download from http://ftp.gnu.org/gnu/libmicrohttpd.

When building from a source package, the usual incantations apply:

./configure
make
make install

This will install the package under /usr/local. That is, the server will be installed as /usr/local/sbin/ping903, the client program as /usr/local/bin/ping903q, etc. You can give a number of options to ./configure in order to customize your installation, in particular to alter the default installation paths. For example, to install to the /usr file hierarchy, use

./configure --prefix=/usr

Please refer to the INSTALL document in the source directory for a discussion of available options to configure and their effect.

After installing the package, copy the file src/ping903.conf to /etc/ping903.conf and edit it to your liking. This file contains configuration settings that control the behavior of the server daemon and, to a certain extent, that of the query tool. Short annotations before each statement will help you navigate through it. You will find a detailed discussion of the configuration file in the manpage ping903.conf(5). What follows is a short outline, intended as a quick start.

To begin with, you can leave most settings at their default values. You might wish to supply a list of IP addresses to monitor, although even that is not mandatory, since you can add them later, when the program is already running. To specify them in the configuration file, use the ip-list statement. Its argument is either the name of a file with the IP addresses, or a list of IP addresses as a here-document:

ip-list FILENAME

or

ip-list <<EOF
...
EOF

You can use an arbitrary word in place of EOF in the latter example; the only requirement is that the list end with exactly the same word as the one that followed << at its beginning.

In either case, IP addresses must be listed one per line of input. Leading and trailing whitespace is ignored, as well as empty lines. Comments are introduced by a hash sign (#) appearing as the first non-whitespace character on a line.
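
The rules above can be illustrated with a minimal Python sketch. It is not the daemon's own parser; the function name parse_ip_list and the sample text are invented for illustration. It only mirrors the stated rules: one entry per line, surrounding whitespace and empty lines ignored, and a line treated as a comment when # is its first non-whitespace character.

```python
def parse_ip_list(text):
    """Return the entries of an ip-list body, one per non-empty line."""
    entries = []
    for line in text.splitlines():
        entry = line.strip()        # leading and trailing whitespace is ignored
        if not entry:               # empty lines are ignored as well
            continue
        if entry.startswith("#"):   # '#' as first non-blank character: comment
            continue
        entries.append(entry)
    return entries


# Hypothetical sample content, using documentation addresses.
sample = """
# core routers
  203.0.113.1
203.0.113.5

    # a symbolic name
server.example.net
"""
print(parse_ip_list(sample))
```

Such a helper can be handy for checking a list file before pointing the daemon at it, for instance to count the entries that would actually be loaded.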

You are not required to keep all your IP addresses in a single file. If necessary, you can scatter them among several files and name each of them in a separate ip-list statement.

IP addresses listed in ip-list files form the immutable IP list, so called because it cannot be altered while the program is running. The REST API allows the user to add any number of IP addresses at runtime, as well as to remove any of the IP addresses added this way. These addresses form the mutable IP list. The mutable IP list is preserved across program restarts.

This means that the immutable IP list is in fact optional. You may choose to keep all monitored addresses in external storage (an SQL database, for example) and load them dynamically after the daemon has started. A working example program for adding IP addresses from a MySQL database is shipped in the examples directory. A full-fledged client package, able to add or delete IP addresses at runtime, both individually and in batches, and providing other features, is available from http://git.gnu.org.ua/cgit/ping903/mangemanche.git.

Normally, the ip-list file should contain IP addresses of the hosts to monitor. It is OK, however, to use symbolic DNS names, too. If a hostname resolves to a single A record, such usage is equivalent to placing that IP in the ip-list. However, if the hostname resolves to multiple IPs, only the first one will be used.

By default, the server will wake up each minute and send 10 echo requests at 1-second intervals to each registered IP. If the number of collected replies is less than 7, the IP will be declared dead ("alive": false, in the returned JSON). Otherwise it is considered alive ("alive": true).

The following settings control these parameters:

probe-interval N
Interval between wake-ups in seconds. Default N=60.
ping-count N
Number of ICMP packets to send within each probe. Default N=10.
ping-interval N
Interval in seconds between two sequential echo requests. Default N=1.
tolerance N
Maximum number of lost requests after which the host is considered dead. Default N=3.

Another statement worth your attention is listen. It configures the IP address and port on which the server will listen for incoming HTTP requests. The default is localhost:8080. Change this setting if this port is already occupied on your system or if you wish to make ping903 accessible from outside.

The access to the HTTP interface is protected by the default access control library (the files /etc/hosts.allow and /etc/hosts.deny). Refer to hosts_access(3) for details.

When you have finished with the configuration file, run ping903 to start the daemon. Check that there are no errors (on the standard error and in the syslog channel daemon). To verify that the daemon is operational, run

curl http://localhost:8080/config

This should return the running configuration.

Within the next probe-interval seconds the server will collect enough statistics to answer your queries. You can request information about any particular IP from your ip-list by running

ping903q IP

This will return the current status of the IP, e.g.

$ ping903q 203.0.113.1
203.0.113.1 is alive

To get the detailed statistics use the -v option. The result will be formatted in a ping(8)-like manner:

$ ping903q -v 203.0.113.1
203.0.113.1 is alive
--- 203.0.113.1 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9414ms
rtt min/avg/max/mdev = 41.212/41.265/41.374/0.046 ms

In both cases, any number of IP addresses can be given. For example, the following command will return statistics for two IPs:

$ ping903q -v 203.0.113.1 203.0.113.5

To check the current status of all hosts, run

$ ping903q -a

Note that, depending on your settings, the output can be huge.

Please refer to ping903q(1) for a detailed discussion of the tool.

3 System start-up sequence

To configure ping903 to start automatically at system start-up, see the rc subdirectory. It contains start-up scripts for various flavors of GNU/Linux distributions. Please refer to the README file in this directory for detailed instructions.

4 Nagios external check

The ping903q tool can be used as a Nagios external check program. The following snippet illustrates a simple Nagios configuration that makes use of it:

# Define the check_ping903 command
define command {
  command_name  check_ping903
  command_line  /usr/bin/ping903q -r -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$
}

# Define the service using the new command
define service {
  host_name            server.example.net
  address              203.0.113.1
  service_description  Server status
  check_command        check_ping903!200.0,20%!600.0,60%
  check_interval       5
  retry_interval       1
}
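
The two arguments passed to check_ping903 above look like check_ping-style thresholds. The following Python sketch shows how such values could be read, assuming (this is not stated in this document) that each one is a pair RTA,PL%, that is, average round-trip time in milliseconds and packet loss in percent, and that a value at or above a threshold triggers it. The function names are invented; ping903q(1) remains the authority on the actual semantics. The exit codes 0, 1 and 2 are the usual Nagios plugin codes for OK, WARNING and CRITICAL.

```python
def parse_threshold(spec):
    """Split 'RTA,PL%' (for example '200.0,20%') into (rta_ms, loss_percent)."""
    rta, loss = spec.split(",")
    if not loss.endswith("%"):
        raise ValueError("packet loss must end with '%': " + spec)
    return float(rta), float(loss[:-1])


def check_state(avg_ms, loss_pct, warn, crit):
    """Return a Nagios plugin exit code: 0 OK, 1 WARNING, 2 CRITICAL."""
    warn_rta, warn_loss = parse_threshold(warn)
    crit_rta, crit_loss = parse_threshold(crit)
    # Assumption: reaching a threshold is enough to trigger it.
    if avg_ms >= crit_rta or loss_pct >= crit_loss:
        return 2
    if avg_ms >= warn_rta or loss_pct >= warn_loss:
        return 1
    return 0


print(check_state(41.265, 0.0, "200.0,20%", "600.0,60%"))
```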

5 Installation from a git clone

If you are building from a clone of the Git repository, you will need GNU autotools to bootstrap the package first. Run

./bootstrap

in the top level source directory. This will create the configure script and populate the directory with the missing files. Then proceed as described above.

6 REST API

The default channel for communication with the ping903 daemon is the HTTP socket open on localhost port 8080. The endpoints described in this section accept only GET requests. The following endpoints are provided:

6.1 /id

Identifies the running instance. On success, a JSON object with the following attributes is returned:

"package": string
The package name.
"version": string
Package version string.
"pid": number
PID of the running instance.

6.2 /id/ATTR

ATTR is one of the attributes discussed above. Returned is the value of that attribute.
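
As an illustration, the /id endpoints could be queried from Python roughly as follows. This is a sketch using only the standard library; the helper names endpoint_url and get_json are invented here, the base URL is the default listen address, and decoding the reply of /id/ATTR as JSON is an assumption.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"   # default listen address


def endpoint_url(*parts, base=BASE_URL):
    """Join path components onto the base URL, e.g. ('id', 'version')."""
    return base.rstrip("/") + "/" + "/".join(parts)


def get_json(url, timeout=5.0):
    """Send a GET request and decode the JSON body of the reply."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


print(endpoint_url("id"))
print(endpoint_url("id", "version"))
```

With a running daemon, get_json(endpoint_url("id")) should yield a dictionary with the "package", "version" and "pid" keys described above.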

6.3 /host/[NAME]?[select=HOSTLIST][&attr=ATTRLIST]

NAME is the IP address or hostname, and HOSTLIST is a comma-separated list of such names. If NAME is supplied, it is added at the beginning of HOSTLIST, and the hosts in HOSTLIST are then looked up in the list of hosts being monitored. Note that NAME is treated as a character string and must coincide exactly with the IP or hostname as it was supplied in the configuration. In particular, if a host was specified by its symbolic DNS name in the configuration, exactly that name must be used in the URL to obtain statistics for that host. If you wish to use the IP instead, see the /match endpoint, discussed below.

The return value is a JSON array whose elements correspond to the entries in HOSTLIST (after addition of NAME, if given). Each element is an object with the following attributes:

"name": string
The IP or hostname of the host under which it was supplied in the ip-list.
"validity": boolean
Status of this record. If false, the data have not been collected yet or the host is unreachable. More detailed information is available in the "status" member (see below). If "validity" is false, only the following keys are guaranteed to be present in the object: "name", "validity", "status", and "xmit-timestamp". If it is true, the full statistics are available as described below.
"status": string
Detailed status of the object. The following values are defined:
"init"
Initial state: data are being collected ("validity":false).
"valid"
The object is valid and its statistics are reliable ("validity": true).
"pending"
The object is valid and contains reliable statistics. The host is being probed at the moment and the object will be updated soon ("validity": true).
"invalid"
Host is unreachable. No statistics available ("validity": false).
"xmit-timestamp": number
Time (the number of seconds since the Epoch) when the last ICMP ECHO request was transmitted.
"start-timestamp": number
Time when the most recent probe sequence was initiated.
"stop-timestamp": number
Time when the most recent probe sequence was finished.
"xmit": number
Number of ICMP ECHO requests transmitted during the probe.
"recv": number
Number of ICMP ECHO responses received during the probe.
"loss": number
Percentage of lost packets.
"tmin": number
Minimal round-trip time observed during the probe.
"tmax": number
Maximal round-trip time observed during the probe.
"avg": number
Average round-trip time.
"stddev": number
Standard deviation of round-trip times.
"alive": boolean
Host status computed as a result of the probe. It is true if the difference between the "xmit" and "recv" parameters is less than the "tolerance" configuration setting, and false otherwise.
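
A client consuming these objects has to check "validity" before touching the statistics. The Python sketch below shows one way to do that; the function name summarize and the two sample dictionaries are invented for illustration and only use the attributes listed above.

```python
def summarize(stat):
    """Render one element of the returned array as a short text line."""
    name = stat["name"]
    if not stat.get("validity", False):
        # Only name, validity, status and xmit-timestamp are guaranteed here.
        return "%s: no statistics (status: %s)" % (name, stat.get("status"))
    state = "alive" if stat["alive"] else "dead"
    return "%s is %s: %d/%d replies, %.0f%% loss, rtt min/avg/max = %.3f/%.3f/%.3f" % (
        name, state, stat["recv"], stat["xmit"], stat["loss"],
        stat["tmin"], stat["avg"], stat["tmax"])


# Hypothetical objects shaped after the attribute list above.
reachable = {"name": "203.0.113.1", "validity": True, "status": "valid",
             "alive": True, "xmit": 10.0, "recv": 10.0, "loss": 0.0,
             "tmin": 25.812, "avg": 25.851, "tmax": 25.914, "stddev": 0.032}
unreachable = {"name": "203.0.113.2", "validity": False, "status": "invalid",
               "xmit-timestamp": 1581666176.01373}

print(summarize(reachable))
print(summarize(unreachable))
```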

Example of an array element returned for a reachable host:

{
   "alive":true,
   "avg":25.85150,
   "dup":0.00000,
   "loss":0.00000,
   "name":"203.0.113.1",
   "recv":10.00000,
   "start-timestamp":1581666176.01285,
   "status":"valid",
   "stddev":0.03201,
   "stop-timestamp":1581666185.27210,
   "tmax":25.91400,
   "tmin":25.81200,
   "validity":true,
   "xmit":10.00000,
   "xmit-timestamp":1581666185.24628
}

Example of an array element returned for an unreachable host:

{
   "name":"203.0.113.2",
   "status":"invalid",
   "validity":false,
   "xmit-timestamp":1581666176.01373
}

The "attr" request argument allows you to specify which attributes of each returned object (a "stat" object) you are interested in. It is a comma-separated list of attribute names. If given, each returned "stat" object will contain only elements from that list.
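
To show how NAME, "select" and "attr" combine, here is a small Python sketch that assembles a /host request URL. The function name host_url is invented, the base URL is the default listen address, and joining the two arguments with & follows ordinary query-string syntax, which is assumed rather than taken from this document.

```python
from urllib.parse import quote, urlencode


def host_url(name=None, select=(), attr=(), base="http://localhost:8080"):
    """Build a /host request URL from NAME, HOSTLIST and ATTRLIST."""
    url = base.rstrip("/") + "/host"
    if name:
        url += "/" + quote(name, safe="")
    query = {}
    if select:
        query["select"] = ",".join(select)
    if attr:
        query["attr"] = ",".join(attr)
    if query:
        # Keep the list-separating commas literal for readability.
        url += "?" + urlencode(query, safe=",")
    return url


print(host_url("203.0.113.1", attr=["alive", "loss"]))
print(host_url(select=["203.0.113.1", "203.0.113.5"], attr=["alive"]))
```

Requesting only the attributes that are actually needed keeps replies small when many hosts are selected at once.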

6.4 /host

Return statistics for all monitored hosts. The result is returned as an array of JSON "stat" objects (described above).

This is an experimental endpoint. Be careful with it, as it may cause considerable strain on the server.

6.5 /match/[HOST]?[select=HOSTLIST]

Return monitored names that correspond to HOST or HOSTLIST. HOSTLIST is a comma-separated list of host names or IPv4 addresses; HOST is a single such name or address. Both HOST and HOSTLIST can be supplied: /match/HOST?select=HOSTLIST is equivalent to /match?select=HOST,HOSTLIST.

Each name in the resulting list is resolved, and the monitored hosts whose IPs match any of its IPv4 addresses are reported. The result is an array of JSON objects. Each element in the array describes a single name from the list and has the following attributes:

"name": string
Name under which this host appears in the HOSTLIST.
"hosts": array of strings
Array of monitored names corresponding to this "name". Each name from the array can be used as argument in a GET request to the /host endpoint.

This array is empty if none of the IP addresses of this "name" are monitored by the server.

If any error occurred during processing, the following attribute is added as well:

"error": string
Textual description of the error.
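
The reply described above can be reduced to a simple lookup table. The Python sketch below assumes the reply has already been decoded from JSON; the function name monitored_names and the sample reply, including its error text, are invented for illustration.

```python
def monitored_names(match_reply):
    """Map each queried name to the monitored names reported for it."""
    found, errors = {}, {}
    for item in match_reply:
        if "error" in item:
            errors[item["name"]] = item["error"]
        # "hosts" is empty when nothing matching is monitored.
        found[item["name"]] = list(item.get("hosts", []))
    return found, errors


# Hypothetical decoded reply of /match?select=...
reply = [
    {"name": "server.example.net", "hosts": ["203.0.113.1"]},
    {"name": "203.0.113.9", "hosts": []},
    {"name": "nosuch.example.net", "hosts": [], "error": "host not found"},
]
found, errors = monitored_names(reply)
print(found)
print(errors)
```

Each name collected in found can then be used in a request to the /host endpoint.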

6.6 /config

Return current server configuration as a JSON object.

6.7 /config/KEYWORD

Return the value of a particular configuration setting.

7 Updating configuration on the fly

The following requests allow the administrator to update the IP list without reloading the server. For the purposes of updating, the IP list is divided into two parts:

1. Immutable IP addresses
These are IP addresses specified in the configuration file via the ip-list statement. These addresses cannot be modified using the API described in this section. An attempt to do so will return an error status.
2. Mutable IP addresses
These are additional IP addresses configured via this API. The mutable IP addresses are saved in the file /var/lib/ping903/ip-list before the next ping probe starts. This file is read upon start-up, after all files supplied in the configuration have been read and processed. This ensures that the mutable IP address list persists between restarts.

7.1 POST /config/ip-list

Adds one or more IP addresses to the list. The request must have the content type "application/json". The content can be either an array of IP addresses in dotted-quad representation (or hostnames that can be resolved to IPv4 addresses) or an object. The latter must contain the attribute "ip-list" whose value is an array of IP addresses formatted as described above, and the "mode" attribute. If "mode" has the value "replace", the new addresses will replace the current content of the ip-list. If its value is "append", the new addresses will be appended to the ip-list.

On success, returns 200 (OK). On error, returns a meaningful error status. If the error response has the content type "application/json", the returned JSON object describes the error in detail. It contains at least the "message" attribute with a descriptive explanation of the error. If the error refers to an element of the "ip-list" array, the "index" attribute contains the 1-based index of that element in the array.
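
The two request forms and the error object can be sketched in Python as follows. The code only prepares the request and does not send it. The function names ip_list_post and describe_error are invented, the URL assumes the default listen address, and the sample error message is made up.

```python
import json
import urllib.request


def ip_list_post(addresses, mode=None,
                 url="http://localhost:8080/config/ip-list"):
    """Prepare (but do not send) a POST request for /config/ip-list."""
    if mode is None:
        payload = list(addresses)                              # bare array form
    elif mode in ("append", "replace"):
        payload = {"mode": mode, "ip-list": list(addresses)}   # object form
    else:
        raise ValueError("mode must be 'append' or 'replace'")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def describe_error(error, addresses):
    """Turn a decoded JSON error object into a message naming the bad entry."""
    text = error.get("message", "unknown error")
    index = error.get("index")
    if index is not None:
        text += " (entry %d: %s)" % (index, addresses[index - 1])  # 1-based
    return text


request = ip_list_post(["203.0.113.7", "203.0.113.8"], mode="append")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

The prepared request can be sent with urllib.request.urlopen(request). For error statuses that call raises urllib.error.HTTPError, whose body can be read and, if it is JSON, passed to describe_error.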

7.2 PUT /config/ip-list/IP

Adds IP to the current IP list. Returns HTTP status 201 (Created) on success. On error, the following codes can be returned:

403 (Forbidden)
The entry for this IP address already exists or (if a hostname is given) the argument cannot be resolved to an IPv4 address. If the Content-Type of the response is "application/json", the "message" attribute of the returned JSON object supplies an explanation of the error.
500 (Internal server error)
Ping903 was unable to fulfill the request. See the log files for detailed diagnostics.

7.3 DELETE /config/ip-list/IP-OR-HOSTNAME

Deletes IP-OR-HOSTNAME from the IP list. Returns 200 (OK) on success. If IP-OR-HOSTNAME was not found in the IP list or is immutable, returns 404 (Not found).

All update requests are queued and take effect at the beginning of the next ping probe.
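
For single-address updates, a client mostly needs to pick the right method and interpret the status code. The Python sketch below prepares PUT and DELETE requests without sending them and maps the status codes listed above to short explanations. The names OUTCOMES, ip_list_request and outcome are invented, and the base URL is the default listen address.

```python
import urllib.request
from urllib.parse import quote

# Explanations for the status codes documented for each method.
OUTCOMES = {
    ("PUT", 201): "added",
    ("PUT", 403): "already present, or the hostname does not resolve to IPv4",
    ("PUT", 500): "server failure, see the log files",
    ("DELETE", 200): "deleted",
    ("DELETE", 404): "not found in the IP list, or immutable",
}


def ip_list_request(method, name, base="http://localhost:8080"):
    """Prepare (but do not send) a PUT or DELETE for /config/ip-list/NAME."""
    if method not in ("PUT", "DELETE"):
        raise ValueError("method must be PUT or DELETE")
    url = "%s/config/ip-list/%s" % (base.rstrip("/"), quote(name, safe=""))
    return urllib.request.Request(url, method=method)


def outcome(method, status):
    """Explain an HTTP status code in the context of the given method."""
    return OUTCOMES.get((method, status), "unexpected status %d" % status)


request = ip_list_request("PUT", "203.0.113.7")
print(request.get_method(), request.full_url)
print(outcome("DELETE", 404))
```

Because updates are queued until the next probe, a freshly added address may not show up in /host replies immediately.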

8 References

Ping903 is documented in the following manual pages:

ping903(8)
Documentation for the ping903 server program.
ping903.conf(5)
Documents the ping903 configuration.
ping903q(1)
Documents the ping903 query program.
ping903.cred(5)
Documents the format of the user's credential storage file.

9 Bug reports

If you think you have found a bug in ping903 or its documentation, please send mail to Sergey Poznyakoff or use the bug tracker at https://puszcza.gnu.org.ua/bugs/?group=ping903 (requires authorization).

10 Copyright

Copyright (C) 2020 Sergey Poznyakoff

Permission is granted to anyone to make or distribute verbatim copies of this document as received, in any medium, provided that the copyright notice and this permission notice are preserved, thus giving the recipient permission to redistribute in turn.

Permission is granted to distribute modified versions of this document, or of portions of it, under the above conditions, provided also that they carry prominent notices stating who last changed them.

Author: Sergey Poznyakoff

Created: 2020-03-22 Sun 13:40
