DESCRIPTION
===========

	Job Monarch is a set of tools to monitor and optionally archive (batch) job information.

	It is an add-on for the Ganglia monitoring system and plugs into an existing Ganglia setup.

	To view an operational setup with Job Monarch, have a look here: http://ganglia.sara.nl/


	Job Monarch stands for 'Job Monitoring and Archiving' tool and consists of three (3) components:

	* jobmond

		The Job Monitoring Daemon.
		  
		Gathers PBS/Torque batch statistics on jobs/nodes and submits them into
		Ganglia's XML stream.

		Through this daemon, users are able to view the PBS/Torque batch system and the
		jobs/nodes that are in it (be it either running or queued).

	* jobarchived (optional)

		The Job Archiving Daemon.

		Listens to Ganglia's XML stream and archives the job and node statistics.
		It stores the job statistics in a PostgreSQL database and the node statistics
		in RRD files.

		Through this daemon, users are able to look up an old/finished job
		and view all its statistics.

		Optional: run this daemon only if your users have a use for it. It can be
		a heavy application to run and not everyone needs it.

		- Multithreaded:	Will not miss any data regardless of (slow) storage

		- Staged writing:	Spread load over bigger time periods

		- High precision RRDs:	Allow for zooming in on old periods with high precision

		- Timeperiod RRDs:	Allow for smaller number of files while still keeping advantage
					of small disk space
		
	* web

		The Job Monarch web interface.

		This interfaces with the jobmond data and (optionally) the jobarchived and presents the
		data and graphs.

		It does this in a layout/setup similar to Ganglia's own, so navigation and usage are intuitive.

		- Graphical usage:	Displays graphical cluster overview so you can see the cluster (job) state
					in one view/image and additional pie chart with relevant information on your
					current view

		- Filters:		Ability to filter output to limit information displayed (useful for those
					clusters with 500+ jobs). This also filters the graphical overview images
					and pie chart so you only see the data relevant to the filter

		- Archive:		When enabling jobarchived, users can go back as far as recorded in the database
					or archived RRDs to find out what happened to a crashed or old job

		- Zoom ability:		Users can zoom into a time period as small as the smallest grain of the RRDs
					(typically down to 10 seconds) when a jobarchived is present
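
	Since jobmond publishes its data through Ganglia's XML stream, a quick way to see
	what ends up there is to read that stream and filter it. The following is a minimal
	sketch, not part of Job Monarch: it assumes gmond answers on its default TCP port
	8649, and the metric name prefix you pass in (for example "MONARCH") is an
	assumption that you should check against your own stream. Both function names are
	made up for this illustration.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_ganglia_xml(host="localhost", port=8649, timeout=10.0):
    """Read the XML dump that gmond sends when a client connects.

    Port 8649 is gmond's default; change it if your setup differs.
    Returns raw bytes so the parser can honour the XML encoding declaration.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def metrics_with_prefix(xml_data, prefix):
    """Return (host, metric name, value) tuples for metrics whose name starts with prefix."""
    root = ET.fromstring(xml_data)
    found = []
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            name = metric.get("NAME", "")
            if name.startswith(prefix):
                found.append((host.get("NAME"), name, metric.get("VAL")))
    return found
```

	For example, metrics_with_prefix(fetch_ganglia_xml("localhost"), "MONARCH") would
	list the matching metrics per host, provided that prefix matches what your jobmond
	actually submits.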

REQUIREMENTS
============

	all:

		- Python 2.3 or higher

	jobmond:

		- pbs_python v2.8.2 or higher
		  https://subtrac.sara.nl/oss/pbs_python/

		- gmond v3.0.1 or higher
		  http://www.ganglia.info/

	jobarchived:

		- PostgreSQL v7.xx
		  http://www.postgres.org/

		- rrdtool v1.xx
		  http://people.ee.ethz.ch/~oetiker/webtools/rrdtool/

		- py-rrdtool
		  http://sourceforge.net/projects/py-rrdtool/

		- python-psycopg2
		  http://sourceforge.net/projects/pypgsql/

		- gmetad v3.x.x
		  http://www.ganglia.info/

	web:

		- PHP v4.1 or higher
		  http://www.php.net

		- php-pgsql v4.x.x
		  (should come with Postgres)

		- php-mbstring

		- GD v2.x
		  http://www.boutell.com/gd/

		- Ganglia web frontend v3.x.x
		  http://www.ganglia.info
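
	Before installing, it can help to check that the Python modules are importable.
	The sketch below is an illustration only and assumes a Python 3 interpreter (on
	older interpreters a plain try/import does the same job). The module names are
	assumptions about what the packages above install ("pbs" for pbs_python,
	"rrdtool" for py-rrdtool, "psycopg2" for python-psycopg2); adjust them if your
	distribution differs.

```python
import importlib.util

# Assumed top-level module names for the packages listed above.
JOBMOND_MODULES = ["pbs"]
JOBARCHIVED_MODULES = ["rrdtool", "psycopg2"]


def missing_modules(names):
    """Return the module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for component, modules in (("jobmond", JOBMOND_MODULES),
                               ("jobarchived", JOBARCHIVED_MODULES)):
        missing = missing_modules(modules)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(component + ": " + status)
```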


INSTALLATION
============

	Prior to installing the software make sure you meet the necessary requirements as
	mentioned above.

	NOTE: You can choose to install to other paths/directories if your setup is different.

	* jobmond

		1. Copy jobmond.py:

		 > cp jobmond/jobmond.py /usr/local/sbin/jobmond.py

		2. Copy jobmond.conf:
		
		 > cp jobmond/jobmond.conf /etc/jobmond.conf

	* jobarchived

		1. Create a PostgreSQL database for jobarchived:

		 > createdb jobarchive

		2. Set up jobarchived's tables:

		 > psql -f jobarchived/job_dbase.sql jobarchive

		3. Copy jobarchived/jobarchived.conf:

		 > cp jobarchived/jobarchived.conf /etc/jobarchived.conf

		4. Copy jobarchived.py:

		 > cp jobarchived/jobarchived.py /usr/local/sbin/jobarchived.py

	* web

		1. Copy the Job Monarch Template to your Ganglia installation

		 > cp -a web/templates/job_monarch /var/www/ganglia/templates

		2. Copy the web interface files to the addon directory in Ganglia

		 > mkdir -p /var/www/ganglia/addons
		 > cp -a web/addons/job_monarch /var/www/ganglia/addons
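
	After copying, a small check can confirm that everything landed where the steps
	above put it. This is a hedged sketch rather than a tool shipped with Job Monarch:
	the paths are the default destinations used in this document, and the helper name
	and its "root" parameter are invented here so the check can also be pointed at a
	staging directory.

```python
import os

# Default destinations from the installation steps; edit if you installed elsewhere.
EXPECTED_PATHS = [
    "/usr/local/sbin/jobmond.py",
    "/etc/jobmond.conf",
    "/usr/local/sbin/jobarchived.py",
    "/etc/jobarchived.conf",
    "/var/www/ganglia/templates/job_monarch",
    "/var/www/ganglia/addons/job_monarch",
]


def missing_paths(paths, root="/"):
    """Return the paths that do not exist, each resolved relative to root."""
    return [p for p in paths
            if not os.path.exists(os.path.join(root, p.lstrip("/")))]


if __name__ == "__main__":
    for path in missing_paths(EXPECTED_PATHS):
        print("not found: " + path)
```

	If you skipped the optional jobarchived component, its two entries are expected
	to be reported as not found.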

CONFIGURATION
=============

	After installation each component requires additional configuration.

	* jobmond
	
		1. Edit Jobmond's config to reflect your settings:

		 - In /etc/jobmond.conf

		   ( see config comments for syntax and explanation )

	* jobarchived

		1. Edit Jobarchived's config to reflect your settings:

		 - In /etc/jobarchived.conf

		   ( see config comments for syntax and explanation )

	* web

		1. Change your Ganglia's web template to Job Monarch

		 - In /var/www/ganglia/conf.php:

		 > $template_name = "job_monarch";

		2. Change Job Monarch's config to reflect your settings:

		 - In /var/www/ganglia/addons/job_monarch/conf.php

		   ( see config comments for syntax and explanation )
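
	A common slip in the web step is leaving an older $template_name assignment in
	place further down the file. The sketch below, which is only an illustration,
	reports the last value assigned in a conf.php text. It assumes the assignment is
	written on one line with a quoted string, as shown above; the function name is
	hypothetical.

```python
import re

# Matches lines such as:  $template_name = "job_monarch";
# Lines that start with a comment marker are not matched.
_TEMPLATE_RE = re.compile(
    r"""^\s*\$template_name\s*=\s*["']([^"']+)["']\s*;""",
    re.MULTILINE,
)


def configured_template(conf_text):
    """Return the last $template_name value assigned in conf_text, or None."""
    matches = _TEMPLATE_RE.findall(conf_text)
    return matches[-1] if matches else None


if __name__ == "__main__":
    # Default location used in this document; adjust for your installation.
    with open("/var/www/ganglia/conf.php") as handle:
        print(configured_template(handle.read()))
```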

START
=====

	* jobmond

		The Job Monitor has to be run on a machine that is allowed to
		query the PBS/Torque server.
		If you have 'acl_hosts' enabled on your PBS/Torque server, make sure
		that jobmond's machine is listed in it.

		1. Start the Job Monitor:

		 > /usr/local/sbin/jobmond.py -c /etc/jobmond.conf

	* jobarchived

		1. Start the Job Archiver:

		 > /usr/local/sbin/jobarchived.py -c /etc/jobarchived.conf

	* web

		Doesn't require you to (re)start anything.
		( make sure the Postgres database is running though )

CONTACT
=======

	To contact the author for anything from bugfixes to flame/hate mail:

	* Ramon Bastiaans

	  <bastiaans ( a t ) sara ( d o t ) nl>
