General Remarks

Most changes to our Nagios server configuration are confidential, so they are not described in this public area.

Here you will find only a few examples; some values are anonymized.

References

Check HTTP: http://linux.101hacks.com/unix/check-http/

Global configuration

nagios.cfg

vi /usr/local/nagios/etc/nagios.cfg

Our changes were:

  • Include files. 
    • We use the cfg_dir directive, which includes all files ending in .cfg in the specified directory.
    • We commented out the reference to localhost.cfg.
  • Admin mail address (admin_email and admin_pager, though they are not used).
  • The global date format (date_format).
Code block: nagios.cfg
# You can specify individual object config files as shown below:
...
cfg_file=/usr/local/nagios/etc/objects/templates.cfg

# Gutzmann setup
cfg_file=/usr/local/nagios/etc/objects/hostgroups.cfg
cfg_dir=/usr/local/nagios/etc/objects/hosts

# Definitions for monitoring the local (Linux) host
# cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
...
date_format=iso8601
...
admin_email=thomas.gutzmann@gutzmann.com
admin_pager=thomas.gutzmann@gutzmann.com
...
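
As a quick cross-check of the cfg_dir line above, the following sketch lists the files Nagios would load from such a directory. It assumes the documented cfg_dir behaviour (only files ending in .cfg are read, subdirectories included); the function name list_cfg_dir_files is made up for this example.

```shell
# Hypothetical helper: print the files a cfg_dir entry would pick up.
# Assumption: Nagios reads only files ending in .cfg, including subdirectories.
list_cfg_dir_files() {
    dir="$1"
    find "$dir" -type f -name '*.cfg' | sort
}

# Example (path taken from the cfg_dir line above):
# list_cfg_dir_files /usr/local/nagios/etc/objects/hosts
```

A backup copy such as wiki.cfg.bak would not show up in this list, so it could stay in the directory without being loaded.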

 

cgi.cfg

vi /usr/local/nagios/etc/cgi.cfg

Our changes:

  • refresh_rate=60 # default is 90 seconds. Determines how often the web pages are refreshed.
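
If you prefer to script such a change instead of editing cgi.cfg by hand, a minimal sketch could look like this. set_cfg_value is a hypothetical helper, not part of Nagios; it assumes the key appears uncommented at the start of a line and that key and value contain no "|" or "&" characters.

```shell
# Hypothetical helper: set "key=value" in a Nagios-style config file.
# Replaces existing "key=..." lines, or appends the setting if none exists.
set_cfg_value() {
    file="$1"; key="$2"; value="$3"
    if grep -q "^${key}=" "$file"; then
        # Go through a temporary file ("sed -i" is not portable) and copy the
        # result back with cat so the original owner and permissions are kept.
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "${file}.tmp" &&
            cat "${file}.tmp" > "$file" && rm -f "${file}.tmp"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example:
# set_cfg_value /usr/local/nagios/etc/cgi.cfg refresh_rate 60
```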

Objects configuration

commands.cfg

vi /usr/local/nagios/etc/objects/commands.cfg

 

The first block of changes relates to sending SMS with sendmessage.pl:

Code block: SMS related commands
define command {
        command_name    notify-service-by-sms
        command_line    $USER1$/sendmessage.pl $CONTACTPAGER$ "Nagios-$NOTIFICATIONTYPE$ : $HOSTALIAS$/$SERVICEDESC$ State: $SERVICESTATE$ Additional Info:$SERVICEOUTPUT$"
        }
define command {
        command_name    notify-host-by-sms
        command_line    $USER1$/sendmessage.pl $CONTACTPAGER$ "Nagios - $NOTIFICATIONTYPE$ : Host $HOSTALIAS$ is $HOSTSTATE$ ($HOSTOUTPUT$)"
        }

To monitor a remote node with NRPE, another command has to be defined:

Code block: NRPE
define command {
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
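
Before relying on the check_nrpe command in service definitions, it can help to run the plugin by hand. The sketch below only prints the command line that Nagios would build from the definition above. It assumes that $USER1$ points to /usr/local/nagios/libexec (the usual setting in resource.cfg); the address 192.0.2.10 is a placeholder and nrpe_command_line is a hypothetical helper name.

```shell
# Assumption: $USER1$ resolves to this directory (see resource.cfg).
PLUGIN_DIR="${PLUGIN_DIR:-/usr/local/nagios/libexec}"

# Print the command line Nagios would run for "check_nrpe!<command>".
nrpe_command_line() {
    host_address="$1"; nrpe_command="$2"
    printf '%s/check_nrpe -H %s -c %s\n' "$PLUGIN_DIR" "$host_address" "$nrpe_command"
}

# Example: show the command, then run it manually as the nagios user.
# nrpe_command_line 192.0.2.10 check_users
```

Running the printed command from the Nagios server shows whether the NRPE daemon on the remote node accepts the connection and knows the requested command.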

contacts.cfg

vi /usr/local/nagios/etc/objects/contacts.cfg

Give the Nagios admin a real mail address:

Code block: Nagios Admin
define contact{
        contact_name                    nagiosadmin             ; Short name of user
        use                             generic-contact         
        alias                           Nagios Admin            ; Full name of user
        email                           xxx@gutzmann.com
        }

Set up individual users:

Code block: Additional users
define contact {
        contact_name                    gutzmtho
        use                             gutzmann-contact
        alias                           Thomas Gutzmann
        service_notification_commands   notify-service-by-email,notify-service-by-sms
        host_notification_commands      notify-host-by-email,notify-host-by-sms
        pager                           +46...
        email                           thomas.gutzmann@gutzmann.com
        }

Set up groups for ourselves and our customers:

Code block: User groups
define contactgroup {
        contactgroup_name       gutzmann-admins
        alias                   Gutzmann Administrators
        members                 gutzmtho
        }
define contactgroup {
        contactgroup_name       xxx-admins
        alias                   Xxx
        members                 xxx
        }

templates.cfg

vi /usr/local/nagios/etc/objects/templates.cfg

Some new templates are currently just placeholders; they make it easier to apply company-specific changes at a later point in time.

Code block: Placeholders
define contact {
        name                            gutzmann-contact
        use                             generic-contact
        register                        0
        }

Host templates for own servers:

Code block: Own servers
define host {
        name                            gutzmann-host
        use                             generic-host
        check_period                    24x7
        check_interval                  5
        retry_interval                  1
        max_check_attempts              5
        check_command                   check-host-alive
        notification_interval           60
        notification_options            d,u,r
        contact_groups                  gutzmann-admins
        notification_period             workhours
        register                        0
        }
define host {
        name                            gutzmann-host-critical
        use                             gutzmann-host
        check_interval                  2
        notification_period             24x7
        register                        0
        }

Host templates for customers:

Code block: Customers
define host {
        name                            xxx-host
        use                             gutzmann-host
        contact_groups                  xxx-admins
        notification_period             workhours
        register                        0
        }
define host {
        name                            xxx-host-critical
        use                             xxx-host
        check_interval                  1
        max_check_attempts              3
        notification_period             24x7
        register                        0
        }

Define service groups for easier overviews in the web interface:

Code block: Service groups
define servicegroup {
        servicegroup_name               gutzmann-mail-services
        alias                           Mail Services
        }
define servicegroup {
        servicegroup_name               gutzmann-db-services
        alias                           DB Services
        }
define servicegroup {
        servicegroup_name               gutzmann-os-services
        alias                           OS-Level Services
        }
define servicegroup {
        servicegroup_name               gutzmann-web-services
        alias                           Web-Level Services
        }

And now the service templates. They are mainly used to shorten and standardize the actual service definitions.

First a subclass of the generic service to set the default contact group:

Code block: gutzmann-service
define service {
        name                            gutzmann-service
        use                             generic-service
        servicegroups                   gutzmann-os-services
        contact_groups                  gutzmann-admins
        register                        0
        }

Now the OS level services. These and all other templates use NRPE by default; for the local host, the check commands are given directly in the host's own file (see hosts/monitor-a.cfg below).

Code block: OS checks
define service {
        name                            gutzmann-service-users
        use                             gutzmann-service
        servicegroups                   gutzmann-os-services
        service_description             User Count
        check_command                   check_nrpe!check_users
        register                        0
        }
define service {
        name                            gutzmann-service-load
        use                             gutzmann-service
        servicegroups                   gutzmann-os-services
        normal_check_interval           1
        service_description             CPU Load
        check_command                   check_nrpe!check_load
        register                        0
        }
define service {
        name                            gutzmann-service-disk-root
        use                             gutzmann-service
        servicegroups                   gutzmann-os-services
        service_description             Check Disk - root
        check_command                   check_nrpe!check_root
        register                        0
        }
define service {
        name                            gutzmann-service-disk-home
        use                             gutzmann-service
        servicegroups                   gutzmann-os-services
        service_description             Check Disk - home
        check_command                   check_nrpe!check_home
        register                        0
        }
define service {
        name                            gutzmann-service-total-procs
        use                             gutzmann-service
        servicegroups                   gutzmann-os-services
        service_description             Check Total Processes
        check_command                   check_nrpe!check_total_procs
        register                        0
        }
define service {
        name                            gutzmann-service-zombie-procs
        use                             gutzmann-service
        servicegroups                   gutzmann-os-services
        service_description             Check Zombie Processes
        check_command                   check_nrpe!check_zombie_procs
        register                        0
        }

The next groups of service definitions are used to check databases. They refer to commands described in other articles of this How-To group (individual links to be supplied).

Code block: MySQL
define service {
        name                            gutzmann-service-mysql
        use                             gutzmann-service
        servicegroups                   gutzmann-db-services
        service_description             MySQL
        normal_check_interval           1
        check_command                   check_nrpe!check_mysql
        register                        0
        }

Code block: Oracle
# Test Oracle listener directly (w/o NRPE)
define service {
        name                            gutzmann-service-1521
        use                             gutzmann-service
        servicegroups                   gutzmann-db-services
        service_description             PING 1521
        normal_check_interval           1
        check_command                   check_tcp!1521
        register                        0
        }
# Tests through NRPE. 
# The first test is equivalent to the previous, but it runs on the database server.
define service {
        name                            gutzmann-service-tnsping
        use                             gutzmann-service
        servicegroups                   gutzmann-db-services
        service_description             TNS Ping
        normal_check_interval           1
        check_command                   check_nrpe!check_tnsping
        register                        0
        }
define service {
        name                            gutzmann-service-tablespaces
        use                             gutzmann-service
        servicegroups                   gutzmann-db-services
        service_description             Tablespace Usage
        check_command                   check_nrpe!check_tablespaces
        register                        0
        }
define service {
        name                            gutzmann-service-flra
        use                             gutzmann-service
        servicegroups                   gutzmann-db-services
        service_description             FLRA Usage
        check_command                   check_nrpe!check_flra
        register                        0
        }
define service {
        name                            gutzmann-service-sga
        use                             gutzmann-service
        servicegroups                   gutzmann-db-services
        service_description             Shared Pool Free
        check_command                   check_nrpe!check_sga
        register                        0
        }
#

Code block: Postgres
define service {
        name                            gutzmann-service-pg-connection
        use                             gutzmann-service
        servicegroups                   gutzmann-db-services
        service_description             Postgres Connection
        check_command                   check_nrpe!check_pg_connection
        register                        0
        }
define service {
        name                            gutzmann-service-pg-dbstats
        use                             gutzmann-service
        servicegroups                   gutzmann-db-services
        service_description             Postgres DB Stats
        check_command                   check_nrpe!check_pg_dbstats
        register                        0
        }
define service {
        name                            gutzmann-service-pg-bloat
        use                             gutzmann-service
        servicegroups                   gutzmann-db-services
        service_description             Postgres Bloat
        check_command                   check_nrpe!check_pg_bloat
        register                        0
        }
define service {
        name                            gutzmann-service-pg-db-size
        use                             gutzmann-service
        servicegroups                   gutzmann-db-services
        service_description             Postgres DB Size
        check_command                   check_nrpe!check_pg_database_size
        register                        0
        }

Here are the HTTP checks; they are run directly from the server, not through NRPE:

Code block: HTTP
define service {
        name                            gutzmann-service-http
        use                             gutzmann-service
        servicegroups                   gutzmann-web-services
        service_description             HTTP
        normal_check_interval           1
        check_command                   check_http!-p 80
        register                        0
        }
define service {
        name                            gutzmann-service-https
        use                             gutzmann-service
        servicegroups                   gutzmann-web-services
        service_description             HTTPS
        normal_check_interval           1
        check_command                   check_http!-p 443
        register                        0
        }

And finally the simple SMTP check:

Code block: Mail
define service {
        name                            gutzmann-service-smtp
        use                             gutzmann-service
        servicegroups                   gutzmann-mail-services
        service_description             SMTP
        normal_check_interval           1
        check_command                   check_smtp
        register                        0
        }
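
With this many templates it is easy to lose track of what inherits from what. The following sketch prints each template together with its parent. It assumes the layout used on this page (one directive per line, closing brace on its own line); list_templates is a hypothetical helper name.

```shell
# Print "name <- parent" for every object that sets "register 0",
# i.e. for every template in the given file.
list_templates() {
    awk '
        $1 == "define"                { name = ""; parent = "-"; is_template = 0 }
        $1 == "name"                  { name = $2 }
        $1 == "use"                   { parent = $2 }
        $1 == "register" && $2 == "0" { is_template = 1 }
        $1 == "}" {
            if (is_template && name != "") print name, "<-", parent
            is_template = 0
        }
    ' "$1"
}

# Example:
# list_templates /usr/local/nagios/etc/objects/templates.cfg
```

Objects without "register 0" (real hosts and services) are skipped, so the output contains the template hierarchy only.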

hostgroups.cfg

vi /usr/local/nagios/etc/objects/hostgroups.cfg 

Here we define host groups to improve the overviews in the web interface:

Code block: Host groups
define hostgroup {
        hostgroup_name          gutzmann-servers-critical
        alias                   Critical Gutzmann Servers
        }
define hostgroup {
        hostgroup_name          xxx-servers-critical
        alias                   Critical Xxx Servers
        }
define hostgroup {
        hostgroup_name          gutzmann-servers-other
        alias                   Uncritical Gutzmann Servers
        }
define hostgroup {
        hostgroup_name          xxx-servers-other
        alias                   Uncritical Xxx Servers
        } 

hosts/*.cfg

In the subdirectory /usr/local/nagios/etc/objects/hosts we have one file per monitored host. Each file contains one host definition and several service definitions.

To avoid DNS lookup problems, I use IP addresses only.

Only a few examples are shown here:

hosts/monitor-a.cfg

This is the Nagios server itself - i.e. localhost.

Code block: monitor-a.cfg
define host {
        use                     gutzmann-host-critical
        host_name               monitor-a
        alias                   monitor-a.gutzmann.com
        hostgroups              gutzmann-servers-critical
        address                 localhost
}
define service {
        use                     local-service         ; Name of service template to use
        host_name               monitor-a
        service_description     Check Disk - root
        check_command           check_local_disk!20%!10%!/
        notifications_enabled   1
        }
define service {
        use                     local-service         ; Name of service template to use
        host_name               monitor-a
        service_description     Check Total Processes
        check_command           check_local_procs!250!400!RSZDT
        }
define service {
        use                     local-service         ; Name of service template to use
        host_name               monitor-a
        service_description     CPU Load
        check_command           check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
        }
define service {
        use                     local-service         ; Name of service template to use
        host_name               monitor-a
        service_description     HTTP
        check_command           check_http
        notifications_enabled   1
        }

hosts/wiki.cfg

wiki.gutzmann.com is our wiki server; we have set up Confluence with Postgres, so I include the Postgres checks as well.

Code block: wiki.cfg
define host {
        use                     gutzmann-host-critical
        host_name               wiki
        alias                   wiki.gutzmann.com
        hostgroups              gutzmann-servers-critical
        address                 81.20.132.106
}
define service {
        use                     gutzmann-service-http
        host_name               wiki
        check_command           check_http!-H wiki.gutzmann.com
        }
define service {
        use                     gutzmann-service-load
        host_name               wiki
        }
define service {
        use                     gutzmann-service-disk-root
        host_name               wiki
        }
define service {
        use                     gutzmann-service-total-procs
        host_name               wiki
        }
define service {
        use                     gutzmann-service-zombie-procs
        host_name               wiki
        }
define service {
        use                     gutzmann-service-pg-connection
        host_name               wiki
        }
define service {
        use                     gutzmann-service-pg-dbstats
        host_name               wiki
        }
define service {
        use                     gutzmann-service-pg-bloat
        host_name               wiki
        }
define service {
        use                     gutzmann-service-pg-db-size
        host_name               wiki
        }
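
Because most service definitions in a host file consist only of a template name and a host name, they can be generated. The sketch below is one possible way to do that; service_stanzas is a hypothetical helper, and the host definition itself still has to be written by hand.

```shell
# Hypothetical generator: emit one service definition per template.
# Usage: service_stanzas <host_name> <template> [<template> ...]
service_stanzas() {
    host="$1"; shift
    for template in "$@"; do
        printf 'define service {\n'
        printf '        use                     %s\n' "$template"
        printf '        host_name               %s\n' "$host"
        printf '        }\n'
    done
}

# Example: print the OS level checks for the host "wiki".
# service_stanzas wiki gutzmann-service-load gutzmann-service-disk-root
```

Services that override a template value, such as the HTTP check with its own check_command above, are better kept as hand-written definitions.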

Test the configuration

You can use the service command to check the configuration before you restart Nagios:

service nagios checkconfig

The error messages can be found in /usr/local/nagios/var/nagios.log.

However, it is much more convenient to use the nagios command, because you see the errors immediately:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
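
The check can also be used as a guard, so that Nagios is only reloaded or restarted when the configuration is valid. This is a minimal sketch that relies on nagios -v returning a non-zero exit status on errors; the paths are the ones used on this page, and config_is_valid is a hypothetical helper name.

```shell
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"

# Run the pre-flight check; print its output only when the check fails.
config_is_valid() {
    if output=$("$NAGIOS_BIN" -v "$NAGIOS_CFG" 2>&1); then
        return 0
    fi
    printf '%s\n' "$output" >&2
    return 1
}

# Example: reload only if the configuration passes the check.
# config_is_valid && service nagios reload
```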

Restart Nagios and check for errors

If you have modified the main configuration files directly in /usr/local/nagios/etc (especially nagios.cfg), you must restart Nagios:

service nagios restart

If you have changed any other configuration files under /usr/local/nagios/etc, you should use "reload":

service nagios reload

Error messages can be found in

  • /usr/local/nagios/var/nagios.log
  • /var/log/httpd/error_log
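
To pick out only the problem lines from nagios.log, a small filter can help. The sketch assumes the usual Nagios log format, where each line starts with a timestamp in square brackets followed by "Error:" or "Warning:"; it does not apply to the httpd error_log. recent_problems is a hypothetical helper name.

```shell
# Print the most recent Error/Warning lines from a Nagios log file.
# Assumption: lines look like "[<epoch>] Error: ..." or "[<epoch>] Warning: ...".
# Usage: recent_problems <logfile> [<max_lines>]
recent_problems() {
    logfile="$1"; count="${2:-20}"
    grep -E '^\[[0-9]+\] (Error|Warning):' "$logfile" | tail -n "$count"
}

# Example:
# recent_problems /usr/local/nagios/var/nagios.log 10
```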

Instead of reading or tailing the files in the console, you may prefer to use the web interface (provided that Nagios did restart).

Nagios Management

Add Administrator Users

Web Interface

Code block: Add Web user
htpasswd /usr/local/nagios/etc/htpasswd.users some_user
vi /usr/local/nagios/etc/cgi.cfg
# In cgi.cfg, add the user where required, for example:
#   authorized_for_system_information=nagiosadmin,some_user
#   ... and other privileges as required
service nagios restart
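
The cgi.cfg edit can be scripted as well. The following sketch appends a user to one authorized_for_* line unless the user is already listed. It assumes the line exists and already names at least one user, and that the user name contains no regular expression characters; grant_cgi_privilege is a hypothetical helper name.

```shell
# Append a user to an authorized_for_* line in cgi.cfg (idempotent).
# Usage: grant_cgi_privilege <cgi.cfg> <key> <user>
grant_cgi_privilege() {
    file="$1"; key="$2"; user="$3"
    # Nothing to do if the user is already in the comma-separated list.
    if grep -Eq "^${key}=(.*,)?${user}(,.*)?$" "$file"; then
        return 0
    fi
    # Copy back with cat so owner and permissions of the file are kept.
    sed "s|^\(${key}=.*\)$|\1,${user}|" "$file" > "${file}.tmp" &&
        cat "${file}.tmp" > "$file" && rm -f "${file}.tmp"
}

# Example:
# grant_cgi_privilege /usr/local/nagios/etc/cgi.cfg authorized_for_system_information some_user
```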