UNICORE Monitoring Infrastructure Probes Administrator and Developer Guide
==========================================================================
Mariusz Strzelecki <szczeles@mat.umk.pl>
1.5, 17.01.2013

This document describes a part of the http://nagios.org/[Nagios-based] UNICORE Monitoring Infrastructure developed in the http://www.plgrid.pl/[PL-Grid] and http://www.eu-emi.eu/[EMI] projects.

General Documentation
---------------------

UNICORE Monitoring Infrastructure Probes (UMI-Probes for short) is a package of scripts that test the functionality of each of the main http://unicore.eu/[UNICORE] components:

 * UNICORE Gateway,
 * UNICORE Registry,
 * UNICORE/X (including CIP component),
 * UNICORE SMS implementations,
 * UVOS,
 * UNICORE Workflow Factory,
 * UNICORE Service Orchestrator,
 * UNICORE Common Information Service,
 * UNICORE StorageFactory,
 * UNICORE accounting system message broker (ActiveMQ).

Additionally, a script is available that checks whether any application installed in the Grid environment works correctly.

All scripts can be easily used as probes in a Nagios-based monitoring environment. Each of them presents the test result in the well-known http://nagios.sourceforge.net/docs/3_0/pluginapi.html[Nagios probes format] and is compatible with the http://nagiosplug.sourceforge.net/developer-guidelines.html[Nagios probes development standards]. The probes are written mainly in Perl, Java and Groovy. Most scripts depend on the standard UNICORE clients, UNICORE Commandline Client and UVOS Commandline Client (both available in the newest EMI release), and on some Perl modules: perl-Error, perl-Sort-Versions, perl-XML-RSS (all available via CPAN).

What's new
~~~~~~~~~~

In version 2.3.2 of the probes:

 * ability to check exit code of a job in check_application has been added
 * unknown error in check_workflow has been fixed
 * unknown error in check_workflow when workflows limit is reached has been fixed
 * unknown error raised when a temporary file cannot be created because it already exists has been fixed
 * unknown error in check_workflow when registry.listServices() fails has been fixed

Known issues
~~~~~~~~~~~~

 * The probes package depends on a specific version of the UCC package. Most probes are written in Groovy using UCC internals, and UCC still does not have a stable API - probes may work with a newer UCC package than the one specified in the spec file, but this cannot be guaranteed.
 * This release is not backward compatible with previous ones because of changes
 in the configuration file syntax (due to
 http://sourceforge.net/tracker/index.php?func=detail&aid=3554834&group_id=248204&atid=2181153)

System Administrator Documentation
----------------------------------

The UMI-Probes component is available as an rpm package for Red Hat-compatible systems and as a deb package for Debian-compatible systems. The following guides are known to work on Scientific Linux 5.5, Fedora 14 and Debian 6.

Installation Guide
~~~~~~~~~~~~~~~~~~

As mentioned above, the probes are written in Perl, Java and Groovy and use the standard UNICORE clients. Assuming that the EMI repository is enabled, the component can be installed with a single command:

  # yum install unicore-nagios-plugins

This package provides:

 * Nagios commands configuration in +/etc/unicore/monitoring-probes/commands.cfg+
 * Documentation and Licence in directory +/usr/share/doc/unicore/monitoring-probes/+
 * Probes in directory +/usr/libexec/grid-monitoring/probes/pl.plgrid/UNICORE/+

Each probe is placed in a directory named after the probe and consists of the main test program (an executable script named after the probe, with the +.pl+ extension), a readme in text format (with the +.README+ extension) and a readme in HTML format (with the +.html+ extension). There may be other files (Groovy scripts or Java classes) that the tests use internally.

Configuration Guide
~~~~~~~~~~~~~~~~~~~

Each probe needs three kinds of configuration:

 * configuration of related UNICORE client,
 * logging configuration for probe,
 * configuration of probe itself.

The whole configuration process can be automated with the UMI-Autoconf package, released by the PL-Grid project.

Sample UNICORE *client configurations* are shipped in the client packages and have to be adapted before the clients can connect to the Grid. It is recommended to test the prepared client configuration by executing the following commands:

  $ uvos-clc -b getMyIds
  $ ucc list-sites
  $ ucc list-storages
  $ ucc run /usr/share/doc/unicore/ucc/samples/date.u
  $ ucc workflow-submit \
  /usr/share/doc/unicore/ucc/samples/workflows/date-with-stageout.swf

If all of these commands finish without errors, the client configuration is ready to be used with the probes.
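
The checks above can also be chained so that the first failing command is reported. The following is only a minimal sketch: +check_clients+ is a hypothetical helper that is not shipped with UMI-Probes, and it assumes each command line is passed as a single argument.

```shell
# Hypothetical helper (not shipped with UMI-Probes): run each command line
# in turn and stop at the first one that fails.
check_clients() {
    for cmd in "$@"; do
        # Output is discarded; only the exit status of the command matters here.
        if ! sh -c "$cmd" >/dev/null 2>&1; then
            echo "FAILED: $cmd"
            return 1
        fi
    done
    echo "all client checks passed"
}
```

It could be called, for example, as +check_clients "uvos-clc -b getMyIds" "ucc list-sites" "ucc list-storages"+. To see the actual error, rerun the failed command by hand.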

The second step is to prepare the log4j *logging configuration*. This is a standard log4j configuration file that will be used by the UNICORE clients (both UCC and UVOS CLC are written in Java); samples are placed in +/etc/unicore/ucc/logging.properties+ and +/etc/unicore/uvos-clc/log4j.properties+. Each probe needs two configuration files: one for standard execution and one for debugging. The naming convention is strict: files have to be named +log4j-[clientname].properties+ or +log4j-[clientname]-debug.properties+ (where +[clientname]+ is +ucc+ or +uvosclc+). At each run, every probe looks for the appropriate logging configuration in the following directories, in this order:

1. location of probe configuration,
2. location of UNICORE clients configuration,
3. location of logging directory of each probe (the least recommended way).

If no configuration is found, the probe does not start and displays an appropriate message.
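
The lookup order can be pictured with a short shell sketch. It is not the probes' actual implementation (that lives in the Perl library). +find_log4j_config+ and its argument order are assumptions made for illustration; only the file naming convention comes from the text above.

```shell
# Hypothetical helper: print the first matching log4j file found in the
# given directories, searched in the order in which they are passed.
# Usage: find_log4j_config CLIENT DEBUG(0|1) DIR...
find_log4j_config() (
    client=$1
    debug=$2
    shift 2
    if [ "$debug" = "1" ]; then
        name="log4j-${client}-debug.properties"
    else
        name="log4j-${client}.properties"
    fi
    for dir in "$@"; do
        if [ -f "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            exit 0      # leaves only the subshell that forms the function body
        fi
    done
    echo "no $name found in: $*" >&2
    exit 1
)
```

The directories would be passed in the documented order: the probe configuration directory first, then the client configuration directory, then the probe logging directory.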

Finally, the third type of configuration is the *probes configuration*. Every probe gets its "what is to be tested" information from a configuration file. In some cases several probes can share the same configuration file (especially if they monitor one Grid site). The structure of the file is shown below:

--------------

# Comments must start with a hash
# Comment

UCC_PATH="/usr/bin/ucc"

# The line above sets the variable UCC_PATH to /usr/bin/ucc
# (quotes are mandatory)

LOGS_DIR="/var/log/unicore/monitoring/icm.edu.pl"

--------------

The description of each probe contains a section that lists the values that need to be set in its configuration file.
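
To make the file format concrete, the sketch below reads one value from such a file. +get_config_value+ is a hypothetical helper, not the parser used by the probes, and it assumes exactly the +KEY="value"+ form shown above, with no spaces around +=+ and nothing after the closing quote.

```shell
# Hypothetical helper: print the value of KEY from a probe configuration file.
# Comment lines and blank lines are skipped; values must be double-quoted.
# Usage: get_config_value FILE KEY
get_config_value() (
    file=$1
    key=$2
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            '#'*|'') continue ;;
            "$key"=\"*\")
                value=${line#*=\"}    # drop everything up to and including ="
                value=${value%\"}     # drop the closing quote
                printf '%s\n' "$value"
                exit 0
                ;;
        esac
    done < "$file"
    exit 1                            # KEY not present in the file
)
```

A missing key yields a non-zero exit status, which a caller can turn into an +UNKNOWN+ result.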

Probes Reference Card
~~~~~~~~~~~~~~~~~~~~~

All probes follow the Nagios probes standard. This means that every probe:

1. uses Perl as the main programming language (but disables the usage of Nagios embedded perl),
2. has definable directory for storing logs and temporary files,
3. has the ability to set a timeout for probe execution (option +-t+ or +--timeout+),
4. has the ability to set the verbosity level (option +-v+ or +--verbosity+) to one of
  * 0 -> prints only one line with the status,
  * 1 -> default, prints the status line with optional debug info,
  * 2 -> prints the same data as +-v 1+ plus information about the probe environment (configuration parsing, client running, debug info from UCC),
  * 3 -> prints the same data as +-v 2+ and keeps temporary files even after successful execution,

5. shows the readme when the +-h+ or +--help+ flag is given,
6. shows the probe version with the +--version+ option,
7. writes each shell command to the log file before executing it.
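
The verbosity levels act as a threshold: a piece of output is printed only when the requested level is at least the level assigned to that output. The shell sketch below illustrates this idea only. The real logic is in the Perl library, and the +VERBOSITY+ variable and this +message+ function are assumptions made for the example.

```shell
# Requested verbosity; 1 is the documented default level.
VERBOSITY=${VERBOSITY:-1}

# Print TEXT only when the requested verbosity is at least MIN_LEVEL.
# Usage: message TEXT MIN_LEVEL
message() {
    if [ "$VERBOSITY" -ge "$2" ]; then
        printf '%s\n' "$1"
    fi
    return 0
}
```

With the default level, +message "OK: service responds" 0+ prints its text while +message "UCC debug output" 2+ prints nothing.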

Descriptions of all the probes are included in the following sections:

include::../../../../umi2/check_activemq/check_activemq.README[]

include::../../../../umi2/check_application/check_application.README[]

include::../../../../umi2/check_certificate/check_certificate.README[]

include::../../../../umi2/check_cip/check_cip.README[]

include::../../../../umi2/check_cis/check_cis.README[]

include::../../../../umi2/check_freespace/check_freespace.README[]

include::../../../../umi2/check_gateway/check_gateway.README[]

include::../../../../umi2/check_registry/check_registry.README[]

include::../../../../umi2/check_servorch/check_servorch.README[]

include::../../../../umi2/check_storagefactory/check_storagefactory.README[]

include::../../../../umi2/check_sms/check_sms.README[]

include::../../../../umi2/check_unicorex/check_unicorex.README[]

include::../../../../umi2/check_uvos/check_uvos.README[]

include::../../../../umi2/check_versions/check_versions.README[]

include::../../../../umi2/check_workflow/check_workflow.README[]

include::../../../../umi2/check_workflowservice/check_workflowservice.README[]

Developer Guide
---------------

API Documentation
~~~~~~~~~~~~~~~~~

All the probes described above use a single-file Perl library that makes writing new ones quite easy. The file is located at +umi2/commons.pm+ and provides several "public" (exported) functions:

 * +exit_plugin+ - the preferred way of exiting a probe. Takes two arguments. The first is the status line in the format +[STATUS]: [message]+, where +[STATUS]+ is one of: +OK, WARNING, CRITICAL, UNKNOWN+. This line is shown as the probe output in every verbose mode (and the exit code of the script is set accordingly to meet the Nagios API requirements). The second argument is optional debug data. The provided string is evaluated and displayed by the probe in verbose mode 1 and above.

 * +setup_plugin+ - this function has to be called at the beginning of probe execution. It takes two parameters: the location of the readme file (a fragment of which is displayed as the help message if the +--help+ option is given) and the version of the probe (displayed when a user calls the script with the +--version+ flag). The procedure reads the options from the command line and stores them in the external +%config+ variable. Next it sets the probe timeout to the value given by the user on the command line, or to 300 seconds by default. Then it loads the configuration file and stores its data in the +%config+ hash as well. Finally it changes the working directory to +[LOGS_DIR]/[plugin_name]+ - the place where log files and temporary files are stored.

 * +message+ - shows a message to the user according to the requested verbose mode. Takes two parameters: the message to display and the lowest verbose mode at which the message is added to the output.

 * +check_conditions+ - checks a list of conditions and exits the probe if any of them is met. Takes one argument - an array of conditions. Every element is a hash with three keys: +test+, +output+ and +show_debug+. The first, +test+, is the logical condition to be evaluated. If it evaluates to false (empty string or 0), the function moves on to the next condition. Otherwise it calls the +exit_plugin+ subroutine with the +output+ and +show_debug+ parameters. If +show_debug+ is +'0'+, debug messages are not shown.

 * +create_temp_file+ - creates a temporary file with the name given as the first argument. The file is stored in the current directory (set by +setup_plugin+) and is registered for deletion at the end of probe execution.

 * +check_config+ - checks whether the variables required by the script are available in the configuration file. Takes two parameters. The first is a comma-separated list of configuration variables. If any of them is missing from the configuration, the probe exits with the +UNKNOWN+ status. If the second (optional) parameter is defined, the function does not exit the probe and only returns the number of undefined options.

 * +run+ - executes an external command with timeout checking. The command can be one of: +ucc, uvosclc, java+ and is given as the first argument. The second parameter is a line of arguments to be passed to the command (configuration files for UNICORE clients and the Registry URL for UCC are added to the command line by the script). The third argument is the path to the file where the output will be saved (preferably the path returned by +create_temp_file+ described above, or just +'/dev/null'+ if the command output does not matter). The fourth parameter is optional and should be set if the stderr stream has to be attached to the output (if the verbose mode is 2 or higher, this flag is set by default). Both UCC and UVOS CLC are executed with appropriate environment variables that set the path to the log4j properties file.

 * +check_file_existence+ - checks whether the file passed as the first parameter exists in the file system. It is useful when the developer is not sure that the script to be executed is properly defined.

 * +is_debug_enabled+ - returns true if the verbose mode is 2 or higher.
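
The status line passed to +exit_plugin+ determines the exit code of the probe. The Nagios plugin API defines the codes as 0 for +OK+, 1 for +WARNING+, 2 for +CRITICAL+ and 3 for +UNKNOWN+. The mapping can be sketched in shell as below. +nagios_exit_code+ is a hypothetical helper and not part of +commons.pm+, and treating an unrecognised status as +UNKNOWN+ is an assumption of this sketch.

```shell
# Hypothetical helper (not part of commons.pm): map a status line of the
# form "STATUS: message" to the exit code defined by the Nagios plugin API.
nagios_exit_code() {
    case ${1%%:*} in          # keep only the text before the first colon
        OK)       echo 0 ;;
        WARNING)  echo 1 ;;
        CRITICAL) echo 2 ;;
        *)        echo 3 ;;   # UNKNOWN, and (by assumption) anything else
    esac
}
```

For instance, +nagios_exit_code "CRITICAL: gateway does not respond"+ prints +2+.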

Additionally, the library offers another feature that developers may need:

 * +$main::CLEANUP+ variable - if set, it is executed at the end of probe execution. It can be used, for example, to remove Grid objects at the end of each run (see the +check_application+ source code).

Build Documentation
~~~~~~~~~~~~~~~~~~~

The component is built with the UNICORE packman tool (in fact a modified version of packman, see the +packaging/packman-opts.xml+ file). The packaging script has three main targets:

 * +./packaging/packman.sh probes-clean+ - deletes +.class+ files and temporary build workspace directories
 * +./packaging/packman.sh probes-compile+ - compiles two +.java+ classes (used by check_uvos and check_gateway)
 * +./packaging/packman.sh all-rpm+ - packages component into four files: binary rpm, binary tar, source rpm and source tar.

The documentation is built with the UNICORE docman tool, which is run with the command +./packaging/docman.sh+.

Component changelog
-------------------

include::changelog.txt[]
