Welcome to SimpleMonitor
Installation
SimpleMonitor is available via PyPI:
pip install simplemonitor
Tip
You may want to install it in a virtualenv, or you can use pipx which automatically manages virtualenvs for command-line tools.
Create the configuration files: monitor.ini and monitors.ini. See Configuration.
Warning
I know the configuration file names are dumb, sorry.
Running
Just run:
simplemonitor
SimpleMonitor does not fork. For best results, run it with a service management tool such as daemontools, supervisor, or systemd. You can find some sample configurations for this purpose on GitHub.
SimpleMonitor will look for its configuration files in the current working directory. You can specify a different configuration file using -f.
You can verify the configuration files’ syntax with -t.
By default, SimpleMonitor’s output is limited to errors and other issues, and it emits a . character every two loops. Use -H to disable the latter, and -v, -d and -q (or -l) to control the former.
If you are using something like systemd or multilog, which add their own timestamps to the start of the line, you may want --no-timestamps to avoid having unnecessary timestamps added.
Command Line Options Reference
General options
- -h, --help
show help message and exit
- --version
show version number and exit
Execution options
- -p PIDFILE, --pidfile PIDFILE
Write PID into this file
- -N, --no-network
Disable network listening socket (if enabled in config)
- -f CONFIG, --config CONFIG
configuration file (this is the main config; you also need a monitors file, by default monitors.ini)
- -j THREADS, --threads THREADS
number of threads to run for checking monitors (default is number of CPUs detected)
Output options
- -v, --verbose
Alias for --log-level=info
- -q, --quiet
Alias for --log-level=critical
- -d, --debug
Alias for --log-level=debug
- -H, --no-heartbeat
Omit printing the . character when running checks
- -l LOGLEVEL, --log-level LOGLEVEL
Log level: critical, error, warn, info, debug
- -C, --no-colour, --no-color
Do not colourise log output
- --no-timestamps
Do not prefix log output with timestamps
Testing options
- -t, --test
Test config and exit
These options are really for testing SimpleMonitor itself, and you probably don’t need them.
- -1, --one-shot
Run the monitors once only, without alerting. Require monitors without “fail” in the name to succeed. Exit zero or non-zero accordingly.
- --loops LOOPS
Number of iterations to run before exiting
- --dump-known-resources
Print out loaded Monitor, Alerter and Logger types
Configuration
The main configuration lives in monitor.ini. By default, SimpleMonitor will look for it in the working directory when launched. To specify a different file, use the -f option.
The format is fairly standard “INI”; section names are lowercase in [square brackets], and values inside the sections are defined as key=value. You can use blank lines to space things out, and comments start with #.
Section names and option values, but not option names, support environment variable injection. To include the value of an environment variable, use %env:VARIABLE%, which will inject the value of $VARIABLE from the environment. You can use this to share a common configuration file across multiple hosts, for example.
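As a rough illustration of the substitution rule, the sketch below expands %env:NAME% markers with a regular expression. It is not SimpleMonitor’s own code: the helper name inject_env is hypothetical, and raising KeyError for an unset variable is an assumption, since that case isn’t described here.

```python
import os
import re
from typing import Mapping

# Matches %env:NAME% where NAME is letters, digits or underscores.
ENV_PATTERN = re.compile(r"%env:(\w+)%")


def inject_env(value: str, environ: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: replace each %env:NAME% with environ[NAME].

    Assumption: an unset variable raises KeyError rather than expanding
    to an empty string.
    """
    return ENV_PATTERN.sub(lambda match: environ[match.group(1)], value)


if __name__ == "__main__":
    fake_env = {"SITE": "web01", "ALERT_TO": "admin@example.com"}
    print(inject_env("[%env:SITE%-ping]", fake_env))  # [web01-ping]
    print(inject_env("to=%env:ALERT_TO%", fake_env))  # to=admin@example.com
```

Passing a plain dict instead of os.environ makes the behaviour easy to try without touching the real environment.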
This main configuration file contains the global settings for SimpleMonitor, plus the logging and alerting configuration. A separate file, by default monitors.ini, contains the monitor configuration. You can specify a different monitors configuration file using a directive in the main configuration.
Warning
I know the configuration file names are dumb, sorry.
Configuration value types
Values which take bool accept 1, yes, and true as truthy, and everything else as falsey.
Values which take bytes accept suffixes of K, M, or G for kibibytes, mebibytes or gibibytes; otherwise they are just a number of bytes.
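A minimal sketch of these two value rules in Python follows. The function names are hypothetical, and ignoring case and surrounding whitespace (so that True or a lowercase g are accepted) is an assumption rather than documented behaviour.

```python
import re

# Assumption: matching ignores case and surrounding whitespace.
TRUTHY = {"1", "yes", "true"}
# Binary multiples: K = kibibytes, M = mebibytes, G = gibibytes.
BYTE_SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_bool(value: str) -> bool:
    """1, yes and true are truthy; everything else is falsey."""
    return value.strip().lower() in TRUTHY


def parse_bytes(value: str) -> int:
    """Convert e.g. '1G' to 1073741824; a bare number is a count of bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", value.strip().upper())
    if match is None:
        raise ValueError(f"not a byte size: {value!r}")
    number, suffix = match.groups()
    return int(number) * BYTE_SUFFIXES.get(suffix, 1)
```

For example, parse_bytes("1G") gives 1073741824, the same threshold as limit=1G in the diskspace example further down.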
monitor.ini
This file must contain a [monitor] section, which must contain at least the interval setting.
[monitor] section
- interval
- Type
integer
- Required
true
defines how many seconds to wait between running all the monitors. Note that the time taken to run the monitors is not subtracted from the interval, so the next iteration will run at interval + time_to_run_monitors seconds.
- monitors
- Type
string
- Required
false
- Default
monitors.ini
the filename to load the monitors themselves from. Relative to the cwd, not the path of this configuration file.
- pidfile
- Type
string
- Required
false
- Default
none
the path to write a pidfile to.
- remote
- Type
bool
- Required
false
- Default
false
enables the listener for receiving data from remote instances. Can be disabled with the -N command line option.
- remote_port
- Type
integer
- Required
if remote is enabled
the TCP port to listen on for remote data
- key
- Type
string
- Required
if remote is enabled
shared secret for validating data from remote instances.
- bind_host
- Type
string
- Required
false
- Default
0.0.0.0 (all interfaces)
the local IP address to listen on, if remote is enabled.
- hup_file
- Type
string
- Required
false
- Default
none
a file to watch the modification time on. If the modification time increases, SimpleMonitor reloads its configuration.
Tip
SimpleMonitor will reload if it receives SIGHUP; this option is useful for platforms which don’t have that.
[reporting] section
- loggers
- Type
comma-separated list of string
- Required
false
- Default
none
the names of the loggers you want to use. Each one must be a [section] in this configuration file. See Loggers for the common options and list of Loggers with their configurations.
- alerters
- Type
comma-separated list of string
- Required
false
- Default
none
the names of the alerters you want to use. Each one must be a [section] in this configuration file. See Alerters for the common options and list of Alerters with their configurations.
monitors.ini
This file only contains monitors. Each monitor is a [section] in the file, with the section name giving the monitor its name. The section name defaults is reserved, and can be used to specify default values for options. Each monitor’s individual configuration overrides the defaults.
See Monitors for the common options and list of Monitors with their configurations.
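To make the override rule concrete, here is a sketch that layers a [defaults] section under every monitor using Python’s standard configparser. It illustrates the behaviour described above rather than reproducing SimpleMonitor’s loader; load_monitors is a hypothetical name, and the option values are made up.

```python
import configparser
from typing import Dict

MONITORS_INI = """
[defaults]
tolerance=2
urgent=false

[www-ping]
type=ping
host=www.example.com

[root-diskspace]
type=diskspace
partition=/
limit=1G
urgent=true
"""


def load_monitors(text: str) -> Dict[str, Dict[str, str]]:
    """Hypothetical loader: each monitor's own options override [defaults]."""
    # "defaults" is an ordinary section here, not configparser's special
    # DEFAULT section, so the merge is done by hand.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    defaults = dict(parser["defaults"]) if parser.has_section("defaults") else {}
    monitors = {}
    for name in parser.sections():
        if name == "defaults":
            continue  # reserved name, not a monitor
        monitors[name] = {**defaults, **dict(parser[name])}
    return monitors


if __name__ == "__main__":
    for name, options in load_monitors(MONITORS_INI).items():
        print(name, options)
```

Here www-ping inherits urgent=false from the defaults, while root-diskspace sets urgent=true itself and so overrides it.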
Example configuration
This is an example pair of configuration files to show what goes where. For more examples, see Config examples.
monitor.ini:
[monitor]
interval=60
[reporting]
loggers=logfile
alerters=email,sms
# write a log file with the state of each monitor, each time
[logfile]
type=logfile
filename=monitor.log
# email me when monitors fail or succeed
[email]
type=email
host=mailserver.example.com
from=monitor@example.com
to=admin@example.com
# send me an SMS after a monitor has failed 10 times in a row
[sms]
type=bulksms
username=some-username
password=some-password
target=+447777123456
limit=10
monitors.ini:
# check the webserver pings
[www-ping]
type=ping
host=www.example.com
# check the webserver answers https; don't bother checking if it's not pinging
[www-http]
type=http
url=https://www.example.com
depend=www-ping
# check the root partition has at least 1GB of free space
[root-diskspace]
type=diskspace
partition=/
limit=1G
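The cross-references in this example can be checked mechanically. The sketch below is a hypothetical helper, not part of SimpleMonitor (use -t to test a real configuration). It confirms that every name in loggers and alerters has its own section in the main file, and that every depend entry names an existing monitor.

```python
import configparser
from typing import List


def _parse(text: str) -> configparser.ConfigParser:
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


def _split(value: str) -> List[str]:
    """Split a comma-separated option value, dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def check_config(main_text: str, monitors_text: str) -> List[str]:
    """Hypothetical check: return a list of dangling references."""
    main = _parse(main_text)
    monitors = _parse(monitors_text)
    problems = []
    for option in ("loggers", "alerters"):
        for name in _split(main.get("reporting", option, fallback="")):
            if not main.has_section(name):
                problems.append(f"{option}: no [{name}] section")
    for monitor in monitors.sections():
        for dep in _split(monitors.get(monitor, "depend", fallback="")):
            if not monitors.has_section(dep):
                problems.append(f"{monitor}: unknown dependency {dep}")
    return problems


if __name__ == "__main__":
    main_ini = "[monitor]\ninterval=60\n[reporting]\nalerters=email,sms\n[email]\ntype=email\n"
    monitors_ini = "[www-ping]\ntype=ping\n[www-http]\ntype=http\ndepend=www-ping\n"
    print(check_config(main_ini, monitors_ini))  # ['alerters: no [sms] section']
```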
Reloading
You can send SimpleMonitor a SIGHUP to make it reload its configuration. On platforms which don’t have that (e.g. Windows), you can specify a file to watch. If the modification time of the file changes, SimpleMonitor will reload its configuration.
Reloading will pick up a change to interval but no other configuration in the [monitor] section. Monitors, Alerters and Loggers are reloaded. You can add and remove them, and change their configurations, but not change their types. (To change a type, first remove it from the configuration and reload, then add it back in.)
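The two reload triggers can be sketched together as below. ReloadWatcher is a hypothetical class, not SimpleMonitor’s implementation. It follows the hup_file description by treating only an increase in modification time as a reload request, and it installs a SIGHUP handler only where the platform defines that signal.

```python
import os
import signal


class ReloadWatcher:
    """Hypothetical helper combining SIGHUP and a watched file's mtime."""

    def __init__(self, hup_file=None, install_signal=True):
        self.hup_file = hup_file
        self._requested = False
        self._last_mtime = self._mtime()
        # SIGHUP does not exist on Windows, hence the hasattr check.
        if install_signal and hasattr(signal, "SIGHUP"):
            signal.signal(signal.SIGHUP, self._on_sighup)

    def _on_sighup(self, signum, frame):
        self._requested = True

    def _mtime(self):
        if self.hup_file is None:
            return None
        try:
            return os.stat(self.hup_file).st_mtime
        except FileNotFoundError:
            return None

    def reload_due(self):
        """Call once per loop; True means the configuration should be reloaded."""
        mtime = self._mtime()
        if mtime is not None:
            # Assumption: only an increase in mtime triggers a reload.
            if self._last_mtime is not None and mtime > self._last_mtime:
                self._requested = True
            self._last_mtime = mtime
        due, self._requested = self._requested, False
        return due
```

A monitoring loop would call reload_due() once per iteration and, when it returns True, re-read the configuration files.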
Monitor Configuration
Monitors are defined in (by default) monitors.ini. The monitor is named by its [section] heading. If you create a [defaults] section, the values are used as defaults for all the other monitors. Each monitor’s configuration will override the values from the default.
Common options
These options are common to all monitor types.
- type
- Type
string
- Required
true
the type of the monitor; one of those in the list below.
- runon
- Type
string
- Required
false
- Default
none
a hostname on which the monitor should run. If not set, always runs. You can use this to share one config file among many hosts. (The value compared against is the one returned by Python’s socket.gethostname().)
- depend
- Type
comma-separated list of string
- Required
false
- Default
none
the monitors on which this one depends. This monitor will run after those, unless one of them fails or is skipped, in which case this one will also skip. A skip does not trigger an alerter.
- tolerance
- Type
integer
- Required
false
- Default
1
the number of times a monitor can fail before it enters the failed state. Handy for things which intermittently fail, such as unreliable links. The number of times the monitor has actually failed, minus this number, is its “Virtual Failure Count”. See also the limit option on Alerters.
- urgent
- Type
boolean
- Required
false
- Default
true
if this monitor is “urgent” or not. Non-urgent monitors do not trigger urgent alerters (e.g. BulkSMS)
- gap
- Type
integer
- Required
false
- Default
0
the number of seconds this monitor should allow to pass before polling. Use it to make a monitor poll only once an hour (3600), for example. Setting this value lower than the interval will have no effect, and the monitor will run every loop like normal. Some monitors default to a higher value when it doesn’t make sense to run their check too frequently because the underlying data will not change that often or quickly, such as pkgaudit. You can override their default to a lower value as required.
Hint
Monitors which are in the failed state will poll every loop, regardless of this setting, in order to detect recovery as quickly as possible
- remote_alert
- Type
boolean
- Required
false
- Default
false
set to true to have this monitor’s alerting handled by a remote instance instead of the local one. If you’re using the remote feature, this is a good candidate to put in the [defaults] section.
- recover_command
- Type
string
- Required
false
- Default
none
a command to execute once when this monitor enters the failed state. For example, it could attempt to restart a service.
- recovered_command
- Type
string
- Required
false
- Default
none
a command to execute once when this monitor returns to the OK state. For example, it could restart a service which was affected by the failure of what this monitor checks.
- notify
- Type
boolean
- Required
false
- Default
true
if this monitor should alert at all.
- group
- Type
string
- Required
false
- Default
default
the group the monitor belongs to. Alerters and Loggers will only fire for monitors which appear in their groups.
- failure_doc
- Type
string
- Required
false
- Default
none
information to include in alerts on failure (e.g. a URL to a runbook)
- gps
- Type
string
- Required
no, unless you want to use the html logger’s map
comma-separated latitude and longitude of this monitor
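The rules for runon, depend, tolerance and gap above can be summarised as small decision functions. This is a hypothetical sketch, not SimpleMonitor’s scheduler; the check callables are stand-ins that return True for OK. Assumptions: hostnames compare exactly, an unknown dependency counts as not OK, the virtual failure count is clamped at zero, and a monitor’s first run always polls. It needs Python 3.9 or later for graphlib.

```python
import socket
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Optional


def should_run_here(runon: Optional[str], hostname: Optional[str] = None) -> bool:
    """runon: no value means always run; otherwise compare with the hostname."""
    if runon is None:
        return True
    return runon == (hostname or socket.gethostname())


def is_failed(failures: int, tolerance: int = 1) -> bool:
    """A monitor may fail `tolerance` times before it enters the failed state."""
    return failures > tolerance


def virtual_failure_count(failures: int, tolerance: int = 1) -> int:
    """Failures minus tolerance (assumed to be clamped at zero)."""
    return max(0, failures - tolerance)


def should_poll(now: float, last_polled: Optional[float], gap: int, failed: bool) -> bool:
    """gap: seconds to wait between polls; failed monitors poll every loop."""
    if failed or last_polled is None:
        return True
    # Loops are at least `interval` seconds apart, so a gap shorter than the
    # interval is always satisfied and the monitor runs every loop.
    return now - last_polled >= gap


def run_with_dependencies(
    checks: Dict[str, Callable[[], bool]], depends: Dict[str, List[str]]
) -> Dict[str, str]:
    """Run each check after its dependencies; skip it if any of them did not pass."""
    graph = {name: depends.get(name, []) for name in checks}
    results: Dict[str, str] = {}
    for name in TopologicalSorter(graph).static_order():
        if name not in checks:
            continue  # a dependency with no check of its own
        if any(results.get(dep) != "ok" for dep in depends.get(name, [])):
            results[name] = "skipped"  # a skip does not trigger an alerter
        else:
            results[name] = "ok" if checks[name]() else "failed"
    return results
```

With www-http depending on www-ping, a failing ping produces "failed" for www-ping and "skipped" for www-http, mirroring the example configuration.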
Monitors
Note
The type of the monitor is the first word in its heading.
apcupsd - APC UPS status
Uses an existing and configured apcupsd installation to check the UPS status. Any status other than ONLINE is a failure.
- path
- Type
string
- Required
false
- Default
none
the path to the apcaccess binary. On Windows, defaults to C:\apcupsd\bin. On other platforms, looks in $PATH.
arlo_camera - Arlo camera battery level
Checks Arlo camera battery level is high enough.
- username
- Type
string
- Required
true
Arlo username
- password
- Type
string
- Required
true
Arlo password
- device_name
- Type
string
- Required
true
the device to check (e.g. Front Camera)
- base_station_id
- Type
integer
- Required
false
- Default
0
the number of your base station. Only required if you have more than one. It’s an array index, but figuring out which is which is an exercise left to the reader.
command - run an external command
Run a command, and optionally verify its output. If the command exits non-zero, this monitor fails.
- command
- Type
string
- Required
true
the command to run.
- result_regexp
- Type
string (regular expression)
- Required
false
- Default
none
if supplied, the output of the command must match this regular expression, otherwise the monitor fails.
- result_max
- Type
integer
- Required
false
if supplied, the output of the command is evaluated as an integer and if greater than this, the monitor fails. If the output cannot be converted to an integer, the monitor fails.
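Putting the three rules together (exit code, result_regexp, result_max) gives roughly the following. run_command_check is a hypothetical function, not SimpleMonitor’s code. It assumes the command is split shell-style and run without a shell, that the regexp may match anywhere in the stripped output (re.search), and that the output is stripped before integer conversion.

```python
import re
import shlex
import subprocess
from typing import Optional, Tuple


def run_command_check(
    command: str, result_regexp: Optional[str] = None, result_max: Optional[int] = None
) -> Tuple[bool, str]:
    """Hypothetical sketch of the command monitor's pass/fail rules."""
    # Assumption: split shell-style and executed directly, without a shell.
    proc = subprocess.run(shlex.split(command), capture_output=True, text=True)
    if proc.returncode != 0:
        return False, f"command exited {proc.returncode}"
    output = proc.stdout.strip()
    # Assumption: the regexp may match anywhere in the output.
    if result_regexp is not None and re.search(result_regexp, output) is None:
        return False, "output did not match result_regexp"
    if result_max is not None:
        try:
            value = int(output)
        except ValueError:
            return False, "output is not an integer"
        if value > result_max:
            return False, f"{value} is greater than {result_max}"
    return True, output
```

For example, run_command_check("echo 50", result_max=10) fails because 50 is greater than the limit, and "echo hello" fails the same check because its output is not an integer.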
compound - combine monitors
Combine (logical-and) multiple monitors. By default, if any monitor in the list is OK, this monitor is OK. If they all fail, this monitor fails. To change this limit use the min_fail setting.
Warning
Do not specify the other monitors in this monitor’s depend setting. The dependency handling for compound monitors is a special case and done for you.
- monitors
- Type
comma-separated list of string
- Required
true
the monitors to combine
- min_fail
- Type
integer
- Required
false
- Default
the number of monitors in the list
the number of monitors from the list which should be failed for this monitor to fail. The default is that all the monitors must fail.
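The min_fail rule amounts to a simple count, sketched below with a hypothetical function name. Each input flag says whether one of the combined monitors is currently failed.

```python
from typing import Iterable, Optional


def compound_failed(failed_states: Iterable[bool], min_fail: Optional[int] = None) -> bool:
    """True if at least min_fail of the combined monitors are failed.

    By default min_fail is the number of monitors, so all of them must fail.
    """
    states = list(failed_states)
    if min_fail is None:
        min_fail = len(states)
    return sum(states) >= min_fail
```

With three monitors of which two are failed, compound_failed([True, False, True]) is False under the default, but True with min_fail=2.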
diskspace - free disk space
Checks the free space on the given partition/drive.
- partition
- Type
string
- Required
true
the partition/drive to check. On Windows, give the drive letter (e.g. C:). Otherwise, give the mountpoint (e.g. /usr).
dns - resolve record
Attempts to resolve the DNS record, and optionally checks the result. Requires dig to be installed and on the PATH.
- record
- Type
string
- Required
true
the DNS name to resolve
- record_type
- Type
string
- Required
false
- Default
A
the type of record to request
- desired_val
- Type
string
- Required
false
if not given, this monitor simply checks the record resolves.
Give the special value NXDOMAIN to check the record does not resolve. If you need to check a multivalue response (e.g. MX records), format them like this (note the leading spaces on the continuation lines):
desired_val=10 a.mx.domain.com
    20 b.mx.domain.com
    30 c.mx.domain.com
- server
- Type
string
- Required
false
the server to send the request to. If not given, uses the system default.
- port
- Type
integer
- Required
false
- Default
53
the port on the DNS server to use
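Multivalue desired_val settings rely on INI continuation lines: indented lines continue the previous value. The sketch below shows how Python’s standard configparser reads such a value, assuming SimpleMonitor’s INI handling treats continuation lines the same way. The section name mx-check is hypothetical.

```python
import configparser

# A monitors.ini fragment using continuation lines for a multivalue answer.
MONITORS_INI = """
[mx-check]
type=dns
record=domain.com
record_type=MX
desired_val=10 a.mx.domain.com
    20 b.mx.domain.com
    30 c.mx.domain.com
"""

parser = configparser.ConfigParser(interpolation=None)
parser.read_string(MONITORS_INI)
desired = parser["mx-check"]["desired_val"]
# configparser strips each line and joins them with newlines.
print(desired.splitlines())
# ['10 a.mx.domain.com', '20 b.mx.domain.com', '30 c.mx.domain.com']
```

Without the leading spaces, the parser would treat "20 b.mx.domain.com" as a new option line and reject it, which is why the indentation matters.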
eximqueue - Exim queue size
Checks the output of exigrep to make sure the queue isn’t too big.
- max_length
- Type
integer
- Required
false
- Default
1
the maximum acceptable queue length
- path
- Type
string
- Required
false
- Default
/usr/local/sbin
the path containing the exigrep binary
fail - always fails
This monitor fails 5 times in a row, then succeeds once. Use for testing. See the null monitor for the inverse.
filestat - file size and age
Examines a file’s size and age. If neither of the age/size values are given, simply checks the file exists.
- filename
- Type
string
- Required
true
the path of the file to monitor.
- maxage
- Type
integer
- Required
false
the maximum allowed age of the file in seconds. If not given, not checked.
hass_sensor - Home Automation Sensors
This monitor checks for the existence of a home automation sensor.
- url
- Type
string
- Required
true
API URL for the monitor
- sensor
- Type
string
- Required
true
the name of the sensor
- token
- Type
string
- Required
true
API token for the sensor
- timeout
- Type
int
- Required
false
- Default
5
Timeout for HTTP request to HASS
host - ping a host
Check a host is pingable.
Tip
This monitor relies on executing the ping command provided by your OS. It has known issues on non-English locales on Windows. You should use the ping monitor instead. The only reason to use this one is that it does not require SimpleMonitor to run as root.
- host
- Type
string
- Required
true
the hostname/IP to ping
- ping_regexp
- Type
regexp
- Required
false
- Default
automatic
the regexp which matches a successful ping. You may need to set this to use this monitor in a non-English locale.
- time_regexp
- Type
regexp
- Required
false
- Default
automatic
the regexp which matches the ping time in the output. Must set a match group named ms. You may need to set this as above.
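If you need a custom time_regexp, it helps to test it against your local ping output first. The pattern below is a hypothetical value, not SimpleMonitor’s built-in default. The two sample lines are illustrative Linux-style and Windows-style output, not captured from a real run.

```python
import re

# Hypothetical time_regexp value; it must define a named group called ms.
TIME_REGEXP = r"time[=<](?P<ms>\d+(?:\.\d+)?) ?ms"

# Illustrative ping output lines (Linux style, then Windows style).
samples = [
    "64 bytes from 192.0.2.10: icmp_seq=1 ttl=56 time=11.6 ms",
    "Reply from 192.0.2.10: bytes=32 time=12ms TTL=56",
]

for line in samples:
    match = re.search(TIME_REGEXP, line)
    if match:
        print(float(match.group("ms")))  # 11.6, then 12.0
```

In the INI file, the same pattern would be written unquoted, as time_regexp=time[=<](?P<ms>\d+(?:\.\d+)?) ?ms.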
http - fetch and verify a URL
Attempts to fetch a URL and makes sure the HTTP return code is (by default) 200/OK. Can also match the content of the page to a regular expression.
- url
- Type
string
- Required
true
the URL to open
- regexp
- Type
regexp
- Required
false
- Default
none
the regexp to look for in the body of the response
- allowed_codes
- Type
comma-separated list of integer
- Required
false
- Default
200
a list of acceptable HTTP status codes
- allow_redirects
- Type
bool
- Required
false
- Default
true
Follow redirects
- username
- Type
str
- Required
false
- Default
none
Username for http basic auth
- password
- Type
str
- Required
false
- Default
none
Password for http basic auth
- verify_hostname
- Type
boolean
- Required
false
- Default
true
set to false to disable SSL hostname verification (e.g. with self-signed certificates)
- timeout
- Type
integer
- Required
false
- Default
5
the timeout in seconds for the HTTP request to complete
- headers
- Type
JSON map as string
- Required
false
- Default
{}
JSON map of HTTP header names and values to add to the request
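The pass/fail rules and the headers format can be sketched as two small functions, applied to a response that has already been fetched. Both function names are hypothetical. Matching regexp anywhere in the body (re.search) is an assumption. In monitors.ini, headers might look like headers={"Accept": "application/json"}.

```python
import json
import re
from typing import Dict, Optional


def http_response_ok(status: int, body: str, allowed_codes: str = "200",
                     regexp: Optional[str] = None) -> bool:
    """Pass if status is in allowed_codes and, if given, regexp matches the body."""
    codes = {int(code) for code in allowed_codes.split(",")}
    if status not in codes:
        return False
    # Assumption: the regexp may match anywhere in the body.
    return regexp is None or re.search(regexp, body) is not None


def parse_headers(headers: str = "{}") -> Dict[str, str]:
    """The headers option holds a JSON object of header names and values."""
    parsed = json.loads(headers)
    if not isinstance(parsed, dict):
        raise ValueError("headers must be a JSON map")
    return parsed
```

For example, with allowed_codes=200,301 a 301 response passes, while a 200 response whose body lacks the configured regexp fails.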
loadavg - load average
Check the load average on the host.
- which
- Type
integer
- Required
false
- Default
1
the load average to monitor. 0 = 1min, 1 = 5min, 2 = 15min
- max
- Type
float
- Required
false
- Default
1.00
the maximum acceptable load average
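The which and max options map directly onto Python’s os.getloadavg(), which returns the 1, 5 and 15 minute averages in that order on Unix-like systems. The sketch below uses a hypothetical function name and assumes a load exactly equal to max still passes.

```python
import os
from typing import Optional, Sequence


def load_ok(which: int = 1, max_load: float = 1.00,
            loads: Optional[Sequence[float]] = None) -> bool:
    """which: 0 = 1min, 1 = 5min, 2 = 15min; passes while load <= max_load."""
    if loads is None:
        loads = os.getloadavg()  # Unix only: (1min, 5min, 15min)
    return loads[which] <= max_load
```

Passing loads explicitly, as in load_ok(1, 1.0, (0.5, 1.5, 0.2)), makes the comparison easy to try: the 5 minute average of 1.5 exceeds the default maximum, so it fails.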
memory - free memory percent
Check free memory percentage.
- percent_free
- Type
int
- Required
true
the minimum percent of available (as per psutil’s definition) memory
null - always passes
Monitor which always passes. Use for testing. See the fail monitor for the inverse.
This monitor has no additional parameters.
ping - ping a host
Pings a host to make sure it’s up. Uses a Python ping module instead of calling out to an external app, but needs to be run as root.
- host
- Type
string
- Required
true
the hostname or IP to ping
- timeout
- Type
int
- Required
false
- Default
5
the timeout for the ping in seconds
pkgaudit - FreeBSD pkg audit
Fails if pkg audit reports any vulnerable packages installed.
- path
- Type
string
- Required
false
- Default
/usr/local/sbin/pkg
the path to the pkg binary
portaudit - FreeBSD port audit
Fails if portaudit reports any vulnerable ports installed.
- path
- Type
string
- Required
false
- Default
/usr/local/sbin/portaudit
the path to the portaudit binary
process - running process
Check for a running process.
- process_name
- Type
string
- Required
true
the process name to check for
- min_count
- Type
integer
- Required
false
- Default
1
the minimum number of matching processes
- max_count
- Type
integer
- Required
false
- Default
infinity
the maximum number of matching processes
- username
- Type
string
- Required
false
- Default
any user
limit matches to processes owned by this user.
rc - FreeBSD rc service
Checks a FreeBSD-style service is running, by running its rc script (in /usr/local/etc/rc.d) with the status command.
Tip
You may want the unix_service monitor for a more generic check.
- service
- Type
string
- Required
true
the name of the service to check. Should be the name of the rc.d script in /usr/local/etc/rc.d. Any trailing .sh is optional and added if needed.
- path
- Type
string
- Required
false
- Default
/usr/local/etc/rc.d
the path of the folder containing the rc script.
- return_code
- Type
integer
- Required
false
- Default
0
the required return code from the script
ring_doorbell - Ring doorbell battery
Check the battery level of a Ring doorbell.
- device_name
- Type
string
- Required
true
the name of the Ring Doorbell to monitor.
- minimum_battery
- Type
integer
- Required
false
- Default
25
the minimum battery percent allowed.
- username
- Type
string
- Required
true
your Ring username (e.g. email address). Accounts using MFA are not supported. You can create a separate user for API access.
- password
- Type
string
- Required
true
your Ring password.
Warning
Do not commit credentials to source control!
- device_type
- Type
string
- Required
false
- Default
doorbell
the device type. Acceptable values are doorbell or camera.
service - Windows Service
Checks a Windows service to make sure it’s in the correct state.
- service
- Type
string
- Required
true
the short name of the service to monitor (this is the “Service Name” on the General tab of the service Properties in the Services MMC snap-in).
- want_state
- Type
string
- Required
false
- Default
RUNNING
the required status for the service. One of: RUNNING, STOPPED, PAUSED, START_PENDING, PAUSE_PENDING, CONTINUE_PENDING, STOP_PENDING.
Tip
Version 1.9 and earlier had a host parameter, which is no longer used.
svc - daemontools service
Checks a daemontools supervise-managed service is running.
- path
- Type
string
- Required
true
the path to the service’s directory (e.g. /var/service/something)
swap - available swap space
Checks for available swap space.
- percent_free
- Type
integer
- Required
true
minimum acceptable free swap percent
systemd-unit - systemd unit check
Monitors a systemd unit status, via dbus. You may want the unix_service monitor instead if you just want to ensure a service is running.
- name
- Type
string
- Required
true
the name of the unit to monitor
- load_states
- Type
comma-separated list of string
- Required
false
- Default
loaded
desired load states for the unit
- active_states
- Type
comma-separated list of string
- Required
false
- Default
active,reloading
desired active states for the unit
- sub_states
- Type
comma-separated list of string
- Required
false
- Default
none
desired sub states for the service
tcp - open TCP port
Checks a TCP port is connectible. Doesn’t care what happens after the connection is opened.
- host
- Type
string
- Required
true
the name/IP of the host to connect to
- port
- Type
integer
- Required
true
the port number to connect to.
tls_expiry - TLS cert expiration
Checks an SSL/TLS certificate is not due to expire/has expired.
Note
This monitor’s gap defaults to 12 hours.
Warning
Due to a limitation of the underlying Python modules in use, this does not currently support TLS 1.3.
- host
- Type
string
- Required
true
the hostname to connect to
- port
- Type
integer
- Required
false
- Default
443
the port number to connect on
- min_days
- Type
integer
- Required
false
- Default
7
the minimum allowable number of days until expiry
- sni
- Type
string
- Required
false
the hostname to send during TLS handshake for SNI. Use if you are serving multiple certificates from the same host/port. If empty, will just get the default certificate from the server
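The expiry check boils down to reading the certificate’s notAfter field and comparing the remaining days with min_days. The sketch below uses Python’s ssl module and hypothetical function names; it is not SimpleMonitor’s implementation. With the default, verifying context an already-expired certificate fails the handshake instead of returning a date, so a real monitor would need to treat that error as a failure too.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_not_after(host: str, port: int = 443, sni: Optional[str] = None) -> str:
    """Return the notAfter field of the certificate the server presents."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        # server_hostname sets SNI and is also the name the certificate is
        # verified against. An expired or otherwise invalid certificate makes
        # the handshake raise ssl.SSLCertVerificationError.
        with context.wrap_socket(sock, server_hostname=sni or host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a notAfter timestamp such as 'Jan  5 09:34:43 2018 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400
```

A check with the default min_days would then be days_until_expiry(fetch_not_after("www.example.com")) >= 7.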
unifi_failover - USG failover WAN status
Checks a Unifi Security Gateway for failover WAN status. Connects via SSH; the USG must be in your known_hosts file. Requires the specified interface to have the carrier up, a gateway, and not be in the failover state.
- router_address
- Type
string
- Required
true
the address of the USG
- router_username
- Type
string
- Required
true
the username to log in as
- router_password
- Type
string
- Required
conditional
the password to log in with. Required if not using ssh_key.
- ssh_key
- Type
string
- Required
conditional
path to the SSH private key to log in with. Required if not using router_password.
- check_interface
- Type
string
- Required
false
- Default
eth2
the interface which should be ready for failover.
unifi_watchdog - USG failover watchdog
Checks a Unifi Security Gateway to make sure the failover WAN is healthy. Connects via SSH; the USG must be in your known_hosts file. Requires the specified interface to have status Running and the ping target to be REACHABLE.
- router_address
- Type
string
- Required
true
the address of the USG
- router_username
- Type
string
- Required
true
the username to log in as
- router_password
- Type
string
- Required
conditional
the password to log in with. Required if not using ssh_key.
- ssh_key
- Type
string
- Required
conditional
path to the SSH private key to log in with. Required if not using router_password.
- primary_interface
- Type
string
- Required
false
- Default
pppoe0
the primary WAN interface
- secondary_interface
- Type
string
- Required
false
- Default
eth2
the secondary (failover) WAN interface
unix_service - generic UNIX service
Generic UNIX service check, by running service ... status.
- service
- Type
string
- Required
true
the name of the service to check
- state
- Type
string
- Required
false
- Default
running
the state of the service; either running (status command exits 0) or stopped (status command exits 1).
Alerter Configuration
Alerters send one-off alerts when a monitor fails. They can also send an alert when it succeeds again.
An alerter knows if it is urgent or not; if a monitor defined as non-urgent fails, an urgent alerter will not trigger for it. This means you can avoid receiving SMS alerts for things which don’t require your immediate attention.
Alerters can also have a time configuration for hours when they are or are not allowed to alert. They can also send an alert at the end of the silence period for any monitors which are currently failed.
Alerters are defined in the main configuration file, which by default is monitor.ini. The section name is the name of your alerter, which you should then add to the alerters configuration value.
Common options
These options are common to all alerter types.
- type
- Type
string
- Required
true
the type of the alerter; one of those in the list below.
- depend
- Type
comma-separated list of string
- Required
false
- Default
none
a list of monitors this alerter depends on. If any of them fail, no attempt will be made to send the alert.
- limit
- Type
integer
- Required
false
- Default
1
the (virtual) number of times a monitor must have failed before this alerter fires for it. You can use this to escalate an alert to another email address or text messaging, for example. See the tolerance Monitor configuration option.
- dry_run
- Type
boolean
- Required
false
- Default
false
makes an alerter do everything except actually send the message, and instead will print some information about what it would do.
- ooh_success
- Type
boolean
- Required
false
- Default
false
makes an alerter trigger its success action even if out of hours
- groups
- Type
comma-separated list of string
- Required
false
- Default
default
list of monitor groups this alerter should fire for. See the group setting for monitors.
- only_failures
- Type
boolean
- Required
false
- Default
false
if true, only send alerts for failures (or catchups)
- tz
- Type
string
- Required
false
- Default
UTC
the timezone to use in alert messages. See also times_tz.
- repeat
- Type
boolean
- Required
false
- Default
false
fire this alerter (for a failed monitor) every iteration
- urgent
- Type
boolean
- Required
false
if the alerter should be urgent or not. The default varies from alerter to alerter. Typically, those which send “page” style alerts such as SMS default to urgent. You can use this option to override that in e.g. the case of the SNS alerter, which could be urgent if sending SMSes, but non-urgent if sending emails.
Time restrictions
All alerters accept time period configuration. By default, an alerter is active at all times, so you will always immediately receive an alert at the point where a monitor has failed enough (more times than the limit). To set limits on when an alerter can send, use the configuration values below.
Note that the times_tz option sets the timezone in which all the values are interpreted. The default is the local timezone of the host evaluating the logic.
- day
- Type
comma-separated list of integer
- Required
false
- Default
all days
which days an alerter can operate on. 0 is Monday, 6 is Sunday.
- times_type
- Type
string
- Required
false
- Default
always
one of always, only, or not. only means that the limits specify the period the alerter is allowed to operate in. not means they specify the period it isn’t allowed to operate in, and outside of that time it is allowed.
- time_lower
- Type
string
- Required
when times_type is anything other than always
the lower end of the time range. Must be lower than time_upper. The format is HH:mm in 24-hour clock.
- time_upper
- Type
string
- Required
when times_type is anything other than always
the upper end of the time range. Must be higher than time_lower. The format is HH:mm in 24-hour clock.
- times_tz
- Type
string
- Required
false
- Default
the host’s local time
the timezone for day, time_lower and time_upper to be interpreted in.
- delay
- Type
boolean
- Required
false
- Default
false
set to true to have the alerter send a “catch-up” alert about a failed monitor if it failed during a time the alerter was not allowed to send, and is still failed as the alerter enters the time it is allowed to send. If the monitor fails and recovers during the not-allowed time, no alert is sent either way.
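A literal reading of the day and times_type rules is sketched below. alerter_allowed is a hypothetical function, not SimpleMonitor’s code. It assumes now has already been converted to the times_tz timezone, that time_lower is inclusive and time_upper exclusive, and that a day outside the allowed list blocks the alerter whatever times_type says. The delay catch-up behaviour is not modelled.

```python
from datetime import datetime, time as dtime
from typing import Optional, Sequence


def _parse_hhmm(value: str) -> dtime:
    hours, minutes = value.split(":")
    return dtime(int(hours), int(minutes))


def alerter_allowed(now: datetime, times_type: str = "always",
                    time_lower: Optional[str] = None,
                    time_upper: Optional[str] = None,
                    allowed_days: Sequence[int] = range(7)) -> bool:
    """Hypothetical check of whether an alerter may send at `now`."""
    if now.weekday() not in allowed_days:  # weekday(): 0 is Monday, 6 is Sunday
        return False
    if times_type == "always":
        return True
    if time_lower is None or time_upper is None:
        raise ValueError("time_lower and time_upper are required")
    # Assumption: the lower bound is inclusive and the upper bound exclusive.
    in_range = _parse_hhmm(time_lower) <= now.time() < _parse_hhmm(time_upper)
    if times_type == "only":
        return in_range
    if times_type == "not":
        return not in_range
    raise ValueError(f"unknown times_type: {times_type!r}")
```

For instance, with times_type=only, time_lower=07:30 and time_upper=22:00, a failure at 23:00 is held back; with delay enabled, the alerter would report it once the allowed window opens again.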
Time examples
These snippets omit the alerter-specific configuration values.
Don’t trigger during the hours I’m in the office (8:30am to 5:30pm, Monday to Friday):
[out_of_hours]
type=some-alerter-type
times_type=not
time_lower=08:30
time_upper=17:30
days=0,1,2,3,4
Don’t send at antisocial times, but let me know later if something broke and hasn’t recovered yet:
[polite_alerter]
type=some-alerter-type
times_type=only
time_lower=07:30
time_upper=22:00
delay=1
Alerters
Note
The type of the alerter is the first word in its heading.
46elks - 46elks notifications
Warning
Do not commit your credentials to a public repo!
You will need to register for an account at 46elks.
- username
- Type
string
- Required
true
your 46elks username
- password
- Type
string
- Required
true
your 46elks password
- target
- Type
string
- Required
true
46elks target value
- sender
- Type
string
- Required
false
- Default
SmplMntr
your SMS sender field. Start with a + if using a phone number.
- api_host
- Type
string
- Required
false
- Default
api.46elks.com
API endpoint to use
- timeout
- Type
int
- Required
false
- Default
5
Timeout for HTTP request
bulksms - SMS via BulkSMS
Warning
Do not commit your credentials to a public repo!
- sender
- Type
string
- Required
false
- Default
SmplMntr
who the SMS should appear to be from. Max 11 chars, and best to stick to alphanumerics.
- username
- Type
string
- Required
true
your BulkSMS username
- password
- Type
string
- Required
true
your BulkSMS password
- target
- Type
string
- Required
true
the number to send the SMS to. Specify using country code and number, with no + or international prefix. For example, 447777123456 for a UK mobile.
- timeout
- Type
int
- Required
false
- Default
5
Timeout for HTTP request
email - send via SMTP
Warning
Do not commit your credentials to a public repo!
- host
- Type
string
- Required
true
the email server to connect to
- port
- Type
integer
- Required
false
- Default
25
the port to connect on
- from
- Type
string
- Required
true
the email address to give as the sender
- to
- Type
string
- Required
true
the email address to send to. You can specify multiple addresses by separating them with ;.
- cc
- Type
string
- Required
false
the email address to cc to. You can specify multiple addresses by separating them with ;.
- username
- Type
string
- Required
false
the username to log in to the SMTP server with
- password
- Type
string
- Required
false
the password to log in to the SMTP server with
- ssl
- Type
string
- Required
false
specify starttls to use StartTLS. Specify yes to use SMTP SSL. Otherwise, no SSL is used at all.
execute - run external command
- fail_command
- Type
string
- Required
false
command to execute when a monitor fails
- success_command
- Type
string
- Required
false
command to execute when a monitor recovers
- catchup_command
- Type
string
- Required
false
command to execute when exiting a time period when the alerter couldn’t fire, a monitor failed during that time, and hasn’t recovered yet. (See the delay configuration option.) If you specify the literal string fail_command, this will share the fail_command configuration value.
You can specify the following variables inside {curly brackets} to have them substituted when the command is executed:
- hostname: the host the monitor is running on
- name: the monitor’s name
- days, hours, minutes, and seconds: the monitor’s downtime
- failed_at: the date and time the monitor failed
- virtual_fail_count: the monitor’s virtual failure count (number of failed checks - tolerance)
- info: the additional information the monitor recorded about its status
- description: description of what the monitor is checking
You will probably need to quote parameters to the command. For example:
fail_command=say "Oh no, monitor {name} has failed at {failed_at}"
The commands are executed directly by Python. If you require shell features, such as piping and redirection, you should use something like bash -c "...". For example:
fail_command=/bin/bash -c "/usr/bin/printf \"The simplemonitor for {name} has failed on {hostname}.\n\nTime: {failed_at}\nInfo: {info}\n\" | /usr/bin/mailx -A gmail -s \"PROBLEM: simplemonitor {name} has failed on {hostname}.\" email@address"
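As a rough picture of what "executed directly by Python" means, the sketch below fills in the {placeholders} with str.format and then splits the result into arguments with shlex.split, which honours double quotes but never starts a shell. It assumes the alerter works along these lines. build_argv is a hypothetical helper, and the real alerter may tokenise differently.

```python
import shlex
from typing import List


def build_argv(template: str, **values: str) -> List[str]:
    # Fill in the {placeholders}, then split the string into arguments the
    # way a POSIX shell would tokenise it, without actually running a shell
    # (so there is no piping or redirection).
    return shlex.split(template.format(**values))


argv = build_argv(
    'say "Oh no, monitor {name} has failed at {failed_at}"',
    name="web1",
    failed_at="2024-01-01 10:00:00",
)
print(argv)  # ['say', 'Oh no, monitor web1 has failed at 2024-01-01 10:00:00']
```

Without a shell, build_argv("echo {name} | mail root", name="web1") passes | to echo as a literal argument. That is why the bash -c wrapper is needed. A substituted value that itself contains a quote character (info is a likely source) also changes how the line is split, so quote defensively.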
nc - macOS notifications
Publishes alerts to the macOS Notification Center. Only for macOS.
No configuration options.
nextcloud_notification - notifications
Warning
Do not commit your credentials to a public repo!
Send a notification to a Nextcloud server.
- token
- Type
string
- Required
true
your nextcloud token
- user
- Type
string
- Required
true
the admin user name
- server
- Type
string
- Required
true
the nextcloud server
- server
- Type
string
- Required
true
the user id who should receive the notification
- timeout
- Type
int
- Required
false
- Default
5
Timeout for HTTP request
pushbullet - push notifications
Warning
Do not commit your credentials to a public repo!
You will need to be registered at pushbullet.
- token
- Type
string
- Required
true
your pushbullet token
- timeout
- Type
int
- Required
false
- Default
5
Timeout for HTTP request
pushover - notifications
Warning
Do not commit your credentials to a public repo!
You will need to be registered at pushover.
- user
- Type
string
- Required
true
your pushover user key
- token
- Type
string
- Required
true
your pushover app token
- timeout
- Type
int
- Required
false
- Default
5
Timeout for HTTP request
ses - email via Amazon Simple Email Service
Warning
Do not commit your credentials to a public repo!
If you have AWS credentials configured elsewhere (e.g. in ~/.aws/credentials), or in the environment, this will use those and you do not need to specify credentials in your configuration file.
As a best practice, use an IAM User/Role which is only allowed to access the resources in use.
You will need to verify an address or domain.
- from
- Type
string
- Required
true
the email address to send from
- to
- Type
string
- Required
true
the email address to send to
- aws_region
- Type
string
- Required
false
the AWS region to use (e.g. eu-west-1)
- aws_access_key
- Type
string
- Required
false
the AWS access key to use
- aws_secret_access_key
- Type
string
- Required
false
the AWS secret access key to use
slack - Slack webhook
Warning
Do not commit your credentials to a public repo!
First, set up a webhook for this to use.
Add a new webhook
Configure it to taste (channel, name, icon)
Copy the webhook URL for your configuration below
- url
- Type
string
- Required
true
the Slack webhook URL
- channel
- Type
string
- Required
false
- Default
the channel configured on the webhook
the channel to send to
- username
- Type
string
- Required
false
- Default
a username to send to
- timeout
- Type
int
- Required
false
- Default
5
Timeout for HTTP request to Slack
sms77 - SMS via sms77
Warning
Do not commit your credentials to a public repo!
Send SMSes via the SMS77 service.
- api_key
- Type
string
- Required
true
your API key for SMS77
- target
- Type
string
- Required
true
the target number to send to
- sender
- Type
string
- Required
false
- Default
SmplMntr
the sender to use for the SMS
- timeout
- Type
int
- Required
false
- Default
5
Timeout for HTTP request
sns - Amazon Simple Notification Service
Warning
Do not commit your credentials to a public repo!
If you have AWS credentials configured elsewhere (e.g. in ~/.aws/credentials), or in the environment, this will use those and you do not need to specify credentials in your configuration file.
As a best practice, use an IAM User/Role which is only allowed to access the resources in use.
Note that not all regions with SNS also support sending SMS.
- topic
- Type
string
- Required
yes, if number is not given
the ARN of the SNS topic to publish to. Specify this, or number, but not both.
- number
- Type
string
- Required
yes, if topic is not given
the phone number to SMS. Give the number as country code then number, without a + or other international access code. For example, 447777123456 for a UK mobile. Specify this, or topic, but not both.
- aws_region
- Type
string
- Required
false
the AWS region to use (e.g. eu-west-1)
- aws_access_key
- Type
string
- Required
false
the AWS access key to use
- aws_secret_access_key
- Type
string
- Required
false
the AWS secret access key to use
syslog - send to syslog
Syslog alerters have no additional configuration.
telegram - send to a chat
Warning
Do not commit your credentials to a public repo!
- token
- Type
string
- Required
true
the token to access Telegram
- chat_id
- Type
string
- Required
true
the chat id to send to
- timeout
- Type
int
- Required
false
- Default
5
Timeout for HTTP request to Telegram
twilio_sms - SMS via Twilio
Warning
Do not commit your credentials to a public repo!
Send SMSes via the Twilio service.
- account_sid
- Type
string
- Required
true
your account SID for Twilio
- auth_token
- Type
string
- Required
true
your auth token for Twilio
- target
- Type
string
- Required
true
the target number to send to. Format should be + followed by a country code and then the phone number
- sender
- Type
string
- Required
false
- Default
SmplMntr
the sender to use for the SMS. Should be a number in the same format as the target parameter, or you may be able to use an alphanumeric ID.
Logger Configuration
Loggers record the state of every monitor after each interval.
Loggers are defined in the main configuration file, which by default is monitor.ini. The section name is the name of your logger, which you should then add to the loggers configuration value.
Common options
These options are common to all logger types.
- type
- Type
string
- Required
true
the type of the logger; one of those in the list below.
- depend
- Type
comma-separated list of string
- Required
false
- Default
none
a list of monitors this logger depends on. If any of them fail, no attempt will be made to log.
- groups
- Type
comma-separated list of string
- Required
false
- Default
default
list of monitor groups this logger should record. Use the special value _all to match all groups. See the group setting for monitors.
- tz
- Type
string
- Required
false
- Default
UTC
the timezone to convert date/times to
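The groups rule above can be sketched in a few lines. This illustrates the documented behaviour under two assumptions: comma-separated values are split and trimmed, and a monitor with no group setting belongs to default. parse_list and logger_records are hypothetical names.

```python
from typing import List


def parse_list(value: str) -> List[str]:
    # "web, db" -> ["web", "db"]; empty items are dropped
    return [item.strip() for item in value.split(",") if item.strip()]


def logger_records(logger_groups: str, monitor_group: str = "default") -> bool:
    # _all matches every monitor; otherwise the monitor's group must
    # appear in the logger's groups list.
    groups = parse_list(logger_groups)
    return "_all" in groups or monitor_group in groups
```

With the default groups=default, a logger only records monitors left in the default group. To record everything, set groups=_all.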
Loggers
Note
The type of the logger is the first word in its heading.
db - sqlite log of results
Logs results to a SQLite database. The results are written to a table named results.
If you want to have a SQLite snapshot of the current state of the monitors (not a log of results), see the dbstatus logger.
Automatically creates the database schema.
- path
- Type
string
- Required
true
the path to the database file to use
dbstatus - sqlite status snapshot
Stores a snapshot of monitor status in a SQLite database. The statuses are written to a table named status.
If you want to have a SQLite log of results (not a snapshot), see the db logger.
Automatically creates the database schema.
- path
- Type
string
- Required
true
the path to the database file to use
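Neither section lists the columns these loggers create, so rather than guess at them, you can inspect the schema a db or dbstatus file actually contains. The sketch below does this with Python's standard sqlite3 module. table_columns and show_schema are hypothetical helpers.

```python
import sqlite3
from typing import List


def table_columns(conn: sqlite3.Connection, table: str) -> List[str]:
    # Table names cannot be bound as SQL parameters, so only accept the
    # two names the db and dbstatus loggers are documented to write to.
    if table not in ("results", "status"):
        raise ValueError("unexpected table name: " + table)
    # PRAGMA table_info returns one row per column; field 1 is the name.
    rows = conn.execute("PRAGMA table_info({})".format(table)).fetchall()
    return [row[1] for row in rows]


def show_schema(path: str) -> None:
    # Open read-only so this never creates or modifies the logger's file.
    conn = sqlite3.connect("file:{}?mode=ro".format(path), uri=True)
    try:
        for table in ("results", "status"):
            columns = table_columns(conn, table)
            if columns:  # a missing table yields an empty list
                print(table, columns)
    finally:
        conn.close()
```

Call show_schema() with the same value you gave the logger's path option.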
html - HTML status page
Warning
Do not commit your credentials to a public repo!
Writes an HTML status page. Can optionally display a map.
The supplied template includes JavaScript to notify you if the page either doesn’t auto-refresh, or if SimpleMonitor has stopped updating it. This requires your machine running SimpleMonitor and the machine you are browsing from to agree on what the time is (timezone doesn’t matter)! The template is written using Jinja2.
You can use the upload_command setting to specify a command to push the generated files to another location (e.g. a web server, an S3 bucket, etc.). I’d suggest putting the commands in a script and just specifying that script as the value for this setting.
- filename
- Type
string
- Required
true
the HTML file to output. It will be stored in the directory set by folder
- folder
- Type
string
- Required
false
- Default
html
the folder to write the output file(s) to. Must exist.
- copy_resources
- Type
boolean
- Required
false
- Default
true
if true, copy supporting files (CSS, images, etc.) to the folder
- source_folder
- Type
string
- Required
false
the path to find the template and supporting files in. Defaults to those contained in the package. (In the package source, they are in simplemonitor/html/.)
- upload_command
- Type
string
- Required
false
if set, a command to execute each time the output is updated to e.g. upload the files to an external webserver
- map
- Type
boolean
- Required
false
set to true to enable the map display instead of the table. You must set the gps value on your Monitors for them to show up!
- map_start
- Type
comma-separated list of float
- Required
false
three comma-separated values: the latitude the map display should start at, the longitude, and the zoom level. A good starting value for the zoom is probably between 10 and 15.
- map_token
- Type
string
- Required
yes, if using the map
an API token for mapbox.com in order to make the map work
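A map_start value is three comma-separated numbers. The sketch below shows one way to sanity-check a value before putting it in your config. parse_map_start is hypothetical, the range checks are ordinary latitude/longitude bounds rather than anything SimpleMonitor is documented to enforce, and the example coordinates (central London) are purely illustrative.

```python
from typing import Tuple


def parse_map_start(value: str) -> Tuple[float, float, float]:
    # "latitude,longitude,zoom" -> three floats, with basic range checks.
    parts = [part.strip() for part in value.split(",")]
    if len(parts) != 3:
        raise ValueError("map_start needs exactly three values: lat,lon,zoom")
    lat, lon, zoom = (float(part) for part in parts)
    if not -90 <= lat <= 90:
        raise ValueError("latitude out of range")
    if not -180 <= lon <= 180:
        raise ValueError("longitude out of range")
    return lat, lon, zoom


print(parse_map_start("51.5074, -0.1278, 12"))  # (51.5074, -0.1278, 12.0)
```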
json - write JSON status file
Writes the status of monitors to a JSON file.
- filename
- Type
string
- Required
true
the filename to write to
logfile - write a logfile
Writes a log file of the status of monitors.
The logfile format is:
datetime monitor-name: status; VFC=vfc (message) (execution-time)
where the fields have the following meanings:
- datetime
the datetime of the entry. Format is controlled by the dateformat configuration option.
- monitor-name
the name of the monitor
- status
either ok if the monitor succeeded, or failed since YYYY-MM-DD HH:MM:SS
- vfc
the virtual failure count: the number of failures of the monitor beyond its tolerance. Not present for ok lines.
- message
the message the monitor recorded as the reason for failure. Not present for ok lines.
- execution-time
the time the monitor took to execute its check
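If you want to post-process this file, a regular expression can split each line into the fields described above. This is a sketch built only from the documented layout. It assumes the monitor name contains no spaces or colons, the message contains no unbalanced parentheses, and the datetime does not contain ": ". parse_line is a hypothetical helper, and the sample lines in the checks are made up rather than captured from a real log.

```python
import re
from typing import Dict, Optional

# datetime monitor-name: status; VFC=vfc (message) (execution-time)
# VFC and message are optional because ok lines omit them.
LINE_RE = re.compile(
    r"^(?P<datetime>.+?) (?P<name>[^ :]+): "
    r"(?P<status>ok|failed since \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
    r"(?:; VFC=(?P<vfc>\d+))?"
    r"(?: \((?P<message>.*)\))?"
    r" \((?P<execution_time>[^()]*)\)$"
)


def parse_line(line: str) -> Optional[Dict[str, Optional[str]]]:
    # Returns the named fields, or None if the line doesn't fit the pattern.
    match = LINE_RE.match(line.rstrip("\n"))
    return match.groupdict() if match else None
```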
- filename
- Type
string
- Required
true
the filename to write to. Rotating this file underneath SimpleMonitor will likely result in breakage. If you would like the logfile to rotate automatically based on size or age, see the logfileng logger.
- buffered
- Type
boolean
- Required
false
- Default
true
disable to use unbuffered writes to the logfile, allowing it to be watched in real time. Otherwise, you will find that updates don’t appear in the file immediately.
- only_failures
- Type
boolean
- Required
false
- Default
false
set to have only monitor failures written to the log file (almost, but not quite, turning it into an alerter)
logfileng - write a logfile with rotation
Writes a log file of the status of monitors. Rotates and deletes old log files based on size or age.
The logfile format is:
datetime monitor-name: status; VFC=vfc (message) (execution-time)
where the fields have the following meanings:
- datetime
the datetime of the entry. Format is controlled by the dateformat configuration option.
- monitor-name
the name of the monitor
- status
either ok if the monitor succeeded, or failed since YYYY-MM-DD HH:MM:SS
- vfc
the virtual failure count: the number of failures of the monitor beyond its tolerance. Not present for ok lines.
- message
the message the monitor recorded as the reason for failure. Not present for ok lines.
- execution-time
the time the monitor took to execute its check
- filename
- Type
string
- Required
true
the filename to write to. Rotated logs have either .N (where N is an incrementing number) or the date/time appended to the filename, depending on the rotation mode.
- rotation_type
- Type
string
- Required
true
one of time or size
- when
- Type
string
- Required
false
- Default
h
Only for rotation based on time. The unit in which interval is measured: one of s for seconds, m for minutes, h for hours, or d for days
- interval
- Type
integer
- Required
false
- Default
1
Only for rotation based on time. The number of when units between file rotations.
- max_bytes
- Type
- Required
yes, when rotation_type is size
the maximum log file size before it is rotated.
- backup_count
- Type
integer
- Required
false
- Default
1
the number of old files to keep
- only_failures
- Type
boolean
- Required
false
- Default
false
set to have only monitor failures written to the log file (almost, but not quite, turning it into an alerter)
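The when, interval, max_bytes and backup_count options have the same meanings as the parameters of the standard library's logging.handlers.TimedRotatingFileHandler and RotatingFileHandler, and both handlers produce the .N or date/time suffixes described above. Assuming logfileng behaves like those handlers (the option names suggest it, but this page does not say so), the mapping looks roughly like the sketch below. make_handler is a hypothetical helper, not SimpleMonitor's code.

```python
import logging.handlers
from typing import Any, Dict


def make_handler(options: Dict[str, Any]) -> logging.Handler:
    # Map logfileng-style options onto the stdlib rotating handlers.
    # delay=True: the file is not opened until the first record is written.
    filename = options["filename"]
    backup_count = int(options.get("backup_count", 1))
    if options["rotation_type"] == "time":
        return logging.handlers.TimedRotatingFileHandler(
            filename,
            when=options.get("when", "h"),
            interval=int(options.get("interval", 1)),
            backupCount=backup_count,
            delay=True,
        )
    if options["rotation_type"] == "size":
        return logging.handlers.RotatingFileHandler(
            filename,
            maxBytes=int(options["max_bytes"]),
            backupCount=backup_count,
            delay=True,
        )
    raise ValueError("rotation_type must be 'time' or 'size'")
```

For example, when=h with interval=2 rotates every two hours. The handler stores that internally as 7200 seconds.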
mqtt - send to MQTT server
Warning
Do not commit your credentials to a public repo!
Sends monitor status to an MQTT server. Supports Home Assistant specifics (see https://www.home-assistant.io/docs/mqtt/discovery/ for more information).
- host
- Type
string
- Required
true
the hostname/IP to connect to
- port
- Type
integer
- Required
false
- Default
1883
the port to connect on
- hass
- Type
boolean
- Required
false
- Default
false
enable Home Assistant specific features for MQTT discovery
- topic
- Type
string
- Required
false
- Default
see below
the MQTT topic to post to. By default, if hass is not enabled, uses simplemonitor, else homeassistant/binary_sensor
- username
- Type
string
- Required
false
the username to use
- password
- Type
string
- Required
false
the password to use
network - remote SimpleMonitor logging
Warning
Do not commit your credentials to a public repo!
This logger is used to send status reports of all monitors to a remote instance. The remote instance must be configured to listen for connections. The key parameter is a shared secret used to generate a hash of the network traffic so the receiving instance knows to trust the data.
Warning
Note that the traffic is not encrypted, just given a hash to validate it.
The remote instance will need the remote, remote_port, and key configuration values set.
If you want the remote instance to handle alerting for this instance’s monitors, you need to set the remote_alert option on your monitors. This is a good candidate to go in the [defaults] section of your monitors config file.
- host
- Type
string
- Required
true
the remote hostname/IP to send to
- port
- Type
string
- Required
true
the remote port to connect to
- key
- Type
string
- Required
true
the shared secret to validate communications
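The idea behind the key is a keyed hash: the sender attaches a digest that only someone holding the same secret could compute, and the receiver recomputes it and rejects anything that doesn't match. The sketch below shows that principle with HMAC-SHA256 from Python's standard library. The hashing scheme and wire format SimpleMonitor actually uses are not described here and may differ, so this code is illustrative and not interoperable with a real instance. sign and verify are hypothetical helpers.

```python
import hashlib
import hmac


def sign(key: bytes, payload: bytes) -> str:
    # The digest proves the sender knew the shared key. It does NOT hide
    # the payload, which still travels in clear text.
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(key: bytes, payload: bytes, digest: str) -> bool:
    # compare_digest avoids leaking how many characters matched.
    return hmac.compare_digest(sign(key, payload), digest)
```

This is why the warning above matters: anyone on the path can read the monitor data, even though they cannot forge it without the key.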
seq - seq log server
Sends the status of monitors to a seq log server. See https://datalust.co for more information on Seq.
- endpoint
- Type
string
- Required
true
the full URI for the endpoint on the seq server, for example http://localhost:5341/api/events/seq.
- timeout
- Type
int
- Required
false
- Default
5
Timeout for HTTP request to seq
Creating Monitors
To create your own Monitor, you need to:
Create a Python file in simplemonitor/Monitors (or pick a suitable existing one to add it to). If you’re creating a new file, you’ll need a couple of imports:
from .monitor import Monitor, register
Define your monitor class, which should subclass Monitor and be decorated by @register. Set a class attribute for the “type” which will be used in the monitor configuration to use it.
@register
class MonitorMyThing(Monitor):
    monitor_type = "my_thing"
Define your initialiser. It should call the superclass’s initialiser, and then read its configuration values from the supplied dict. You can also do any other initialisation here.
This code should be safe to re-run: if SimpleMonitor reloads its configuration, it will call __init__() with the new configuration dict. Use the get_config_option() helper to read config values.
@register
class MonitorMyThing(Monitor):
    monitor_type = "my_thing"
    def __init__(self, name: str, config_options: dict) -> None:
        super().__init__(name, config_options)
        self.my_setting = self.get_config_option("my_setting", required=True)
Add a run_test function. This should perform the test for your monitor, and call record_fail() or record_success() as appropriate. It must also return False or True to match. The two record_*() methods return the right value, so you can just use them as the value to return. You can use self.monitor_logger to perform logging (it’s a standard Python logging object).
You should catch any suitable exceptions and handle them as a failure of the monitor. The main loop will handle any uncaught exceptions and fail the monitor with a generic message.
@register
class MonitorMyThing(Monitor):
    # ...
    def run_test(self) -> bool:
        # my test logic here, which sets test_succeeded and test_result
        if test_succeeded:
            return self.record_success("it worked")
        return self.record_fail(f"failed with message {test_result}")
You should also give a describe function, which explains what this monitor is checking for:
@register
class MonitorMyThing(Monitor):
    # ...
    def describe(self) -> str:
        return f"checking that thing {self.my_setting} does foo"
In simplemonitor/Monitors/__init__.py, add your Monitor to the list of imports.
That’s it! You should now be able to use type=my_thing in your Monitors configuration to use your monitor.
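To try out the run_test contract outside the package, the sketch below pairs a small, complete monitor with a hypothetical stand-in for the Monitor base class. The stand-in mimics only the methods named in the steps above, and MonitorFileExists with its my_file_exists type and path option is invented for illustration. In a real monitor you would import Monitor and register from .monitor and decorate the class with @register instead of using the stand-in.

```python
import logging
import os
from typing import Any, Dict, Optional


class Monitor:
    """Hypothetical stand-in for SimpleMonitor's Monitor base class."""

    def __init__(self, name: str, config_options: Dict[str, Any]) -> None:
        self.name = name
        self._config_options = config_options
        self.monitor_logger = logging.getLogger("sketch.monitor." + name)
        self.last_result: Optional[str] = None

    def get_config_option(self, key: str, required: bool = False) -> Any:
        if required and key not in self._config_options:
            raise ValueError("missing required option " + key)
        return self._config_options.get(key)

    def record_success(self, message: str = "") -> bool:
        self.last_result = message
        return True  # record_success() doubles as the return value

    def record_fail(self, message: str = "") -> bool:
        self.last_result = message
        return False


class MonitorFileExists(Monitor):
    monitor_type = "my_file_exists"  # hypothetical type name

    def __init__(self, name: str, config_options: Dict[str, Any]) -> None:
        super().__init__(name, config_options)
        self.path = self.get_config_option("path", required=True)

    def run_test(self) -> bool:
        try:
            size = os.path.getsize(self.path)
        except OSError as error:
            # Expected problems become a monitor failure, not an exception.
            return self.record_fail("cannot stat {}: {}".format(self.path, error))
        return self.record_success("{} is {} bytes".format(self.path, size))

    def describe(self) -> str:
        return "checking that {} exists".format(self.path)
```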
If you’d like to share your monitor back via a PR, please also:
Use type annotations, and verify with mypy. You may need to use cast(TYPE, self.get_config_option(...)) in your __init__() to get things to settle down. See existing monitors for examples.
Use Black to format the code.
Add documentation for your monitor. Create a file in docs/monitors/ called my_thing.rst and follow the pattern in the other files to document it.
There’s a pre-commit configuration in the repo which you can use to check things over.
Creating Alerters
To create your own Alerter, you need to:
Create a Python file in simplemonitor/Alerters (or pick a suitable existing one to add it to). If you’re creating a new file, you’ll need a couple of imports:
from ..Monitors.monitor import Monitor
from .alerter import Alerter, AlertLength, AlertType, register
Define your alerter class, which should subclass Alerter and be decorated by @register. Set a class attribute for the “type” which will be used in the alerter configuration to use it.
@register
class MyAlerter(Alerter):
    alerter_type = "my_alerter"
Define your initialiser. It should call the superclass’s initialiser, and then read its configuration values from the supplied dict. You can also do any other initialisation here.
This code should be safe to re-run: if SimpleMonitor reloads its configuration, it will call __init__() with the new configuration dict. Use the get_config_option() helper to read config values.
@register
class MyAlerter(Alerter):
    alerter_type = "my_alerter"
    def __init__(self, config_options: dict) -> None:
        super().__init__(config_options)
        self.my_setting = self.get_config_option("setting", required=True)
Add a send_alert function. This receives the information for a single monitor. You should first call self.should_alert(monitor), which will return the type of alert to be sent (e.g. failure). You should return if it is AlertType.NONE.
You should then prepare your message. Call self.build_message() to generate the message content. Check the value of self._dry_run and if it is True, you should log (using self.alerter_logger.info(...)) what you would do, else you should do it.
- Alerter.build_message(length: AlertLength, alert_type: AlertType, monitor: Monitor) → str
Generate a suitable length alert message for the given type of alert, for the given Monitor.
- Parameters
length (AlertLength) – one of the AlertLength enum values: NOTIFICATION (shortest), SMS (will be <= 140 chars), ONELINE, TERSE (not currently supported), FULL, or ESSAY
alert_type (AlertType) – one of the AlertType enum values: NONE, FAILURE, CATCHUP, or SUCCESS
monitor (Monitor) – the Monitor to generate the message for
- Returns
the built message
- Return type
str
- Raises
ValueError – if the AlertType is unknown
NotImplementedError – if the AlertLength is unknown or unsupported
You should also give a _describe_action function, which explains what this alerter does. Note that the time configuration for the alerter will be automatically added:
@register
class MyAlerter(Alerter):
    # ...
    def _describe_action(self) -> str:
        return f"sending FooAlerters to {self.recipient}"
In simplemonitor/Alerters/__init__.py, add your Alerter to the list of imports.
That’s it! You should now be able to use type=my_alerter in your Alerters configuration to use your alerter.
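The sketch below walks through the alerting flow from the steps above: check should_alert(), bail out on AlertType.NONE, build the message, and honour dry-run. It runs against hypothetical stand-ins for the Alerter base class and the two enums. The enum member names follow the build_message() reference, but their values here are arbitrary. The real should_alert() logic is far richer, and how the real base class sets _dry_run is not covered on this page. ListAlerter, FakeMonitor and the my_list type are invented for illustration, and the method is named send_alert here.

```python
import enum
import logging
from typing import Any, Dict, List


class AlertLength(enum.Enum):
    NOTIFICATION = 1
    SMS = 2
    ONELINE = 3
    TERSE = 4
    FULL = 5
    ESSAY = 6


class AlertType(enum.Enum):
    NONE = 0
    FAILURE = 1
    CATCHUP = 2
    SUCCESS = 3


class Alerter:
    """Hypothetical stand-in for SimpleMonitor's Alerter base class."""

    def __init__(self, config_options: Dict[str, Any]) -> None:
        self._dry_run = bool(config_options.get("dry_run", False))
        self.alerter_logger = logging.getLogger("sketch.alerter")

    def should_alert(self, monitor: Any) -> AlertType:
        return monitor.pending_alert  # the real decision is much richer

    def build_message(self, length: AlertLength, alert_type: AlertType, monitor: Any) -> str:
        return "{}: {}".format(alert_type.name, monitor.name)


class FakeMonitor:
    def __init__(self, name: str, pending_alert: AlertType) -> None:
        self.name = name
        self.pending_alert = pending_alert


class ListAlerter(Alerter):
    alerter_type = "my_list"  # hypothetical type name

    def __init__(self, config_options: Dict[str, Any]) -> None:
        super().__init__(config_options)
        self.sent: List[str] = []

    def send_alert(self, name: str, monitor: Any) -> None:
        alert_type = self.should_alert(monitor)
        if alert_type == AlertType.NONE:
            return
        message = self.build_message(AlertLength.ONELINE, alert_type, monitor)
        if self._dry_run:
            self.alerter_logger.info("dry run: would send %r", message)
            return
        self.sent.append(message)  # a real alerter would call its service here
```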
Creating Loggers
Before writing your logger, you need to consider if you should support batching or not. If a logger supports batching, then it collects all the monitor results and then performs its logging action. For example, the HTML logger uses batching so that when it generates the HTML output, it knows all the monitors to include (and can sort them etc). Non-batching loggers will simply perform their logging action multiple times, once per monitor.
To create your own Logger, you need to:
Create a Python file in simplemonitor/Loggers (or pick a suitable existing one to add it to). If you’re creating a new file, you’ll need a couple of imports:
from ..Monitors.monitor import Monitor
from .logger import Logger, register
Define your logger class, which should subclass Logger and be decorated by @register. Set a class attribute for the “type” which will be used in the logger configuration to use it. Additionally, set the supports_batch value to indicate if your logger should be used in batching mode.
@register
class MyLogger(Logger):
    logger_type = "my_logger"
    supports_batch = True  # or False
Define your initialiser. It should call the superclass’s initialiser, and then read its configuration values from the supplied dict. You can also do any other initialisation here.
This code should be safe to re-run: if SimpleMonitor reloads its configuration, it will call __init__() with the new configuration dict. Use the get_config_option() helper to read config values.
@register
class MyLogger(Logger):
    logger_type = "my_logger"
    def __init__(self, config_options: dict) -> None:
        super().__init__(config_options)
        self.my_setting = self.get_config_option("setting", required=True)
Add a save_result2 function (yes, I know). This receives the information for a single monitor.
Batching loggers should save the information they need into self.batch_data, which should be (but does not have to be) a dict of str: Any using the monitor name as the key. This is automatically initialised to an empty dict at the start of the batch. To customise this initialisation, extend the start_batch method from Logger.
@register
class MyLogger(Logger):
    # ...
    def save_result2(self, name: str, monitor: Monitor) -> None:
        self.batch_data[name] = monitor.state
Non-batching loggers can perform whatever logging action they are designed for at this point.
@register
class MyLogger(Logger):
    # ...
    def save_result2(self, name: str, monitor: Monitor) -> None:
        self._my_logger_action(f"Monitor {name} is in state {monitor.state}")
Only batching loggers should provide a process_batch method, which is called after all the monitors have been processed. This is where you should perform your batched logging operation.
@register
class MyLogger(Logger):
    # ...
    def process_batch(self) -> None:
        with open(self.filename, "w") as file_handle:
            for monitor, state in self.batch_data.items():
                file_handle.write(f"Monitor {monitor} is in state {state}\n")
You should also give a describe function, which explains what this logger does:
@register
class MyLogger(Logger):
    # ...
    def describe(self) -> str:
        return f"writing monitor info to {self.filename}"
In simplemonitor/Loggers/__init__.py, add your Logger to the list of imports.
That’s it! You should now be able to use type=my_logger in your Loggers configuration to use your logger.
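The sketch below shows the call order for a batching logger: start_batch(), then save_result2() once per monitor, then process_batch() once at the end. It uses a hypothetical stand-in for the Logger base class and a hypothetical driver function. The real main loop is not shown on this page. SortedStateLogger and its my_sorted_state type are invented for illustration, and, like the HTML logger, it sorts its output because it sees every monitor before writing.

```python
from typing import Any, Dict, List


class Logger:
    """Hypothetical stand-in for SimpleMonitor's Logger base class."""

    supports_batch = False

    def __init__(self, config_options: Dict[str, Any]) -> None:
        self._config_options = config_options
        self.batch_data: Dict[str, Any] = {}

    def start_batch(self) -> None:
        self.batch_data = {}  # fresh store at the start of each batch


class FakeMonitor:
    def __init__(self, state: str) -> None:
        self.state = state


class SortedStateLogger(Logger):
    logger_type = "my_sorted_state"  # hypothetical type name
    supports_batch = True

    def __init__(self, config_options: Dict[str, Any]) -> None:
        super().__init__(config_options)
        self.lines: List[str] = []

    def save_result2(self, name: str, monitor: Any) -> None:
        # Called once per monitor: just remember what we need.
        self.batch_data[name] = monitor.state

    def process_batch(self) -> None:
        # Called once after every monitor has been seen.
        self.lines = [
            "Monitor {} is in state {}".format(name, state)
            for name, state in sorted(self.batch_data.items())
        ]


def run_loop(logger: Any, monitors: Dict[str, Any]) -> None:
    # Hypothetical driver showing the order of calls.
    logger.start_batch()
    for name, monitor in monitors.items():
        logger.save_result2(name, monitor)
    if logger.supports_batch:
        logger.process_batch()
```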
Getting configuration values
When loading configuration values for Monitors, Alerters and Loggers, you can use the get_config_option() function to perform sanity checks on the loaded config.
- get_config_option(config_options: dict, key: str[, default=None[, required=False[, required_type="str"[, allowed_values=None[, allow_empty=True[, minimum=None[, maximum=None]]]]]]])
Get a config value out of a dict, and perform basic validation on it.
- Parameters
config_options (dict) – The dict to get the value from
key (str) – The key to the value in the dict
default – The default value to return if the key is not found
required (bool) – Throw an exception if the value is not present (and default is None)
required_type (str) – One of str, int, float, bool, [int] (list of int), [str] (list of str)
allowed_values – A list of allowed values
allow_empty (bool) – Allow the empty string when required_type is “str”
minimum (integer, float or None) – The minimum allowed value for int and float
maximum (integer, float or None) – The maximum allowed value for int and float
- Returns
the fetched configuration value (or the default)
Note that the return type of the function signature covers all supported types, so you should use typing.cast() to help mypy understand. Do not use assert.
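To make the parameter descriptions concrete, here is a simplified re-implementation of the documented behaviour. It is a sketch, not SimpleMonitor's code. In particular, the set of strings treated as true for bool and the use of ValueError for every validation error are assumptions. The final line shows the recommended cast() pattern.

```python
from typing import Any, Dict, List, Optional, Union, cast

Number = Union[int, float]


def get_config_option(
    config_options: Dict[str, str],
    key: str,
    default: Any = None,
    required: bool = False,
    required_type: str = "str",
    allowed_values: Optional[List[Any]] = None,
    allow_empty: bool = True,
    minimum: Optional[Number] = None,
    maximum: Optional[Number] = None,
) -> Any:
    """Simplified sketch of the documented behaviour."""
    if key not in config_options:
        if required and default is None:
            raise ValueError("config option {} is required".format(key))
        return default
    raw = config_options[key]
    value: Any
    if required_type == "str":
        if not allow_empty and raw == "":
            raise ValueError("config option {} cannot be empty".format(key))
        value = raw
    elif required_type in ("int", "float"):
        value = int(raw) if required_type == "int" else float(raw)
        if minimum is not None and value < minimum:
            raise ValueError("{} must be at least {}".format(key, minimum))
        if maximum is not None and value > maximum:
            raise ValueError("{} must be at most {}".format(key, maximum))
    elif required_type == "bool":
        value = raw.lower() in ("1", "yes", "true", "on")  # assumed spellings
    elif required_type in ("[int]", "[str]"):
        items = [item.strip() for item in raw.split(",")]
        value = [int(item) for item in items] if required_type == "[int]" else items
    else:
        raise ValueError("unknown required_type " + required_type)
    if allowed_values is not None and value not in allowed_values:
        raise ValueError("{} must be one of {}".format(key, allowed_values))
    return value


# cast() narrows the Any return type for mypy without a runtime check.
timeout = cast(int, get_config_option(
    {"timeout": "10"}, "timeout", required_type="int", minimum=1))
```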
SimpleMonitor is a Python script which monitors hosts and network connectivity and status. It is designed to be quick and easy to set up and lacks complex features that can make things like Nagios, OpenNMS and Zenoss overkill for a small business or home network. Remote monitor instances can send their results back to a central location.
SimpleMonitor supports Python 3.6.2 and higher on Windows, Linux and FreeBSD.
To get started, see Installation.
Features
Things SimpleMonitor can monitor
For the complete list, see Monitors.
Host ping
Host open ports (TCP)
HTTP (is a URL fetchable without error? Does the page content match a regular expression?)
DNS record return value
Services: Windows, Linux, FreeBSD services are supported
Disk space
File existence, age and time
FreeBSD portaudit (and pkg audit) for security notifications
Load average
Process existence
Exim queue size monitoring
APC UPS monitoring (requires apcupsd to be installed and configured)
Running an arbitrary command and checking the output
Compound monitors to combine any other types
Adding your own Monitor type is straightforward with a bit of Python knowledge.
Logging and Alerting
To SimpleMonitor, a Logger is something which reports the status of every monitor, each time it’s checked. An Alerter sends a message about a monitor changing state.
Some of the options include (for the complete list, see Loggers and Alerters):
Writing the state of each monitor at each iteration to a SQLite database
Sending an email alert when a monitor fails, and when it recovers, directly over SMTP or via Amazon SES
Writing a log file of all successes and failures, or just failures
Sending a message via BulkSMS, Amazon Simple Notification Service (SNS), Telegram, Slack, MQTT (with HomeBridge support) and more
Writing an HTML status page
Writing an entry to the syslog (non-Windows only)
Executing arbitrary commands on monitor failure and recovery
Other features
Simple configuration file format: it’s a standard INI file for the overall configuration and another for the monitor definitions
Remote monitors: An instance running on a remote machine can send its results back to a central instance for central logging and alerting
Dependencies: Monitors can be declared as depending on the success of others. If a monitor fails, the monitors which depend on it will be skipped until it succeeds
Tolerance: Monitors checking things on the other side of unreliable links or which have many transient failures can be configured to require their test to fail a number of times in a row before they report a problem
Escalation of alerts: Alerters can be configured to require a monitor to fail a number of times in a row (after its tolerance limit) before they fire, so alerts can be sent to additional addresses or people
Urgency: Monitors can be defined as non-urgent so that urgent alerting methods (like SMS) are not wasted on them
Per-host monitors: Define a monitor which should only run on a particular host and all other hosts will ignore it – so you can share one configuration file between all your hosts
Groups: Configure some Alerters to only react to some monitors
Monitor gaps: By default every monitor polls every interval (e.g. 60 seconds). Monitors can be given a gap between polls so that they only poll once a day (for example)
Alert periods: Alerters can be configured to only alert during certain times and/or on certain days
Alert catchup: …and also to alert you to a monitor which failed when they were unable to tell you. (For example, I don’t want to be woken up overnight by an SMS, but if something’s still broken I’d like an SMS at 7am as I’m getting up.)
Contributing
Clone the GitHub repo
poetry install
You can use pre-commit to ensure your code is up to my exacting standards ;)
You can run tests with make unit-test. See the Makefile for other useful targets.
Licence
SimpleMonitor is released under the BSD Licence.