Plumage - NoSQL Operational Data Store for HTCondor
.................................................

Overview
========
This contrib provides components for an Operational Data Store (ODS) capability
in HTCondor using the NoSQL database mongodb. The Quill contrib is a similar
effort but based on an RDBMS model. The essential design of Plumage is to capture
the data traffic emitted from the various HTCondor daemons and convert it from
ClassAd form into document instances in a mongodb collection. Once converted,
these ClassAd records can be queried simply and directly on any attribute.
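
To illustrate the ClassAd-to-document idea, here is a minimal Python sketch. It is
not the plugin's actual conversion code; the function names are made up for this
example, and the parsing is deliberately simplistic (one "Attr = value" per line,
no escaped quotes, expressions kept as plain text).

```python
import re

# Matches one "Attribute = value" line of a ClassAd in long form.
_ATTR_LINE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*?)\s*$")

def parse_value(raw):
    """Best-effort conversion of a ClassAd literal to a Python value."""
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        return raw[1:-1]
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw  # leave expressions as unevaluated text

def classad_to_document(text):
    """Turn ClassAd text into a dict, the shape a mongodb document takes."""
    doc = {}
    for line in text.splitlines():
        match = _ATTR_LINE.match(line)
        if match:
            doc[match.group(1)] = parse_value(match.group(2))
    return doc

ad_text = '''MyType = "Machine"
Name = "slot1@milo.usersys.redhat.com"
Arch = "INTEL"
LoadAvg = 0.24
KeyboardIdle = 205
'''
doc = classad_to_document(ad_text)
print(doc["MyType"], doc["LoadAvg"])
```

Because every attribute becomes a top-level key, a query on any attribute is just
a filter on that key.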

This contrib features two capabilities:
- a Plumage ODS plugin inside a view collector that captures the raw ClassAds for 
the Machine and Submitter types. The plugin also operates on a timer, taking
snapshots of essential attributes at a configurable interval and writing those
values to a separate mongodb collection.
- a job collection daemon that monitors the HTCondor job queue log and history files.
Finalized job ads (i.e., Completed, Removed) are recorded to mongodb.
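
The "finalized" test for a job ad can be sketched as below. The JobStatus codes
are the standard HTCondor values; the helper name is_finalized is hypothetical
and this is only an illustration, not the daemon's own logic.

```python
# JobStatus codes used in HTCondor job ClassAds.
JOB_STATUS = {1: "Idle", 2: "Running", 3: "Removed", 4: "Completed", 5: "Held"}
FINAL_STATES = {"Removed", "Completed"}

def is_finalized(job_ad):
    """Return True if the job ad is in a state that Plumage would record."""
    return JOB_STATUS.get(job_ad.get("JobStatus")) in FINAL_STATES

print(is_finalized({"ClusterId": 6, "ProcId": 0, "JobStatus": 4}))
```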

Installation
============
You will need to install mongodb and the C++ driver (1.6.4 minimum) as well as
the following dependencies:
- js
- boost
- pymongo
- python-dateutil

Plumage can be included in an HTCondor source build using the following variables 
when cmake is invoked:

	-DWANT_CONTRIB:BOOL=TRUE -DWITH_MANAGEMENT:BOOL=TRUE -DWITH_PLUMAGE:BOOL=TRUE

As part of a standard view collector setup, you may need to manually create a 
directory where the view collector can write its standard stats files:

     sudo mkdir -p /var/lib/condor/ViewHist

Note that the job history recording feature currently relies on the existing HTCondor 
file-based job history mechanism. The Plumage daemon essentially performs ETL
on the HTCondor history file pointed to by the $HISTORY config parameter. As such,
the schedd must be generating that history file for the Plumage ETL to work
correctly. However, once the jobs are recorded to mongodb, the HTCondor history file
can be rolled over or removed.
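
The "extract" step of that ETL can be pictured with the following Python sketch.
It assumes the usual history file layout in which each job ad is a run of
"Attr = value" lines terminated by a banner line starting with "***". The function
name and the sample values (including the GlobalJobId strings) are invented for
illustration, and values are kept as raw text rather than typed.

```python
def iter_history_ads(lines):
    """Yield one dict per job ad; a line starting with '***' ends an ad."""
    current = {}
    for line in lines:
        line = line.strip()
        if line.startswith("***"):
            if current:
                yield current
            current = {}
        elif " = " in line:
            name, _, value = line.partition(" = ")
            current[name] = value.strip('"')
    if current:  # tolerate a final ad with no trailing banner
        yield current

# Made-up sample in the assumed layout.
sample = '''ClusterId = 6
ProcId = 0
Owner = "pmackinn"
GlobalJobId = "hamish#6.0#1349908737"
*** Offset = 0 ClusterId = 6 ProcId = 0
ClusterId = 6
ProcId = 1
Owner = "pmackinn"
GlobalJobId = "hamish#6.1#1349908737"
*** Offset = 95 ClusterId = 6 ProcId = 1
'''
ads = list(iter_history_ads(sample.splitlines()))
print(len(ads), ads[0]["GlobalJobId"])
```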

Configuration
=============
This initial release is implemented as a plugin to the existing view collector,
so much of the HTCondor configuration is that of a standard view collector.

# "real" Collector config ...
######################
COLLECTOR.CONDOR_VIEW_HOST = $(CONDOR_HOST):12345
# list the ClassAds of interest to be forwarded
CONDOR_VIEW_CLASSAD_TYPES = Machine, Submitter

################
# View Server
################
VIEW_SERVER = $(COLLECTOR)
VIEW_SERVER_ARGS = -f -p 12345 -local-name VIEW_SERVER
VIEW_SERVER_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/ViewServerLog"
# make sure the view server doesn't point at itself
VIEW_SERVER.CONDOR_VIEW_HOST =
VIEW_SERVER.KEEP_POOL_HISTORY = True
VIEW_SERVER.SAMPLING_INTERVAL=20
VIEW_SERVER.PLUGINS = $(LIB)/plugins/PlumageCollectorPlugin-plugin.so
# or if not from an rpm...
#VIEW_SERVER.PLUGINS = $(VIEW_SERVER.PLUGINS) $(LIBEXEC)/PlumageCollectorPlugin-plugin.so
POOL_HISTORY_SAMPLING_INTERVAL = 60
UPDATE_INTERVAL = 300
# the following are defaults in code also
PLUMAGE_DB_HOST = localhost
PLUMAGE_DB_PORT = 27017
DAEMON_LIST = $(DAEMON_LIST), VIEW_SERVER

##############
# History daemon
##############
HISTORY = $(SPOOL)/history

# Job ETL server, provides continuous loading of job history into mongodb
JOB_ETL_SERVER = $(SBIN)/plumage_job_etl_server
JOB_ETL_SERVER_ARGS = -f
JOB_ETL_SERVER.JOB_ETL_SERVER_LOG = $(LOG)/JobEtlLog
JOB_ETL_SERVER.JOB_ETL_SERVER_DEBUG = D_ALWAYS
JOB_ETL_SERVER.JOB_ETL_SERVER_ADDRESS_FILE = $(LOG)/.job_etl_server_address

# add the job etl server to daemon list
DAEMON_LIST = $(DAEMON_LIST), JOB_ETL_SERVER
DC_DAEMON_LIST = + JOB_ETL_SERVER



plumage_stats
=============

Plumage has a client tool with *similar* capabilities to the condor_stats tool.
It provides listings of submitters, resources and records over time spans. The 
submitter records include point-in-time snapshots of running, held and idle jobs.
The resource records show arch, OS, keyboard idle time, load average and state.

Usage: plumage_stats [options]

Query HTCondor ODS for statistics

Options:
  -h, --help            show this help message and exit
  -v, --verbose         enable logging
  -s SERVER, --server=SERVER
                        mongodb database server location: e.g., somehost,
                        localhost:2011
  --u=USER, --user=USER
                        stats for a single submitter:
                        user,timestamp,running,held,idle
  --r=RESOURCE, --resource=RESOURCE
                        stats for a single resource: slot,timestamp,keyboard
                        idle,load average,status
  --f=START, --from=START
                        records from datetime in ISO8601 format e.g.,
                        '2011-09-29 12:03'
  --t=END, --to=END     records to datetime in ISO8601 format e.g.,
                        '2011-09-30T17:16'
  --ul, --userlist      list all submitters
  --ugl, --usergrouplist
                        list all submitter groups
  --rl, --resourcelist  list all resources

Use Cases
---------

Find the names of all submitters.
$ ./plumage_stats --ul
pmackinn@redhat.com

Find the names of all user groups (i.e., users plus submitter machine).
$ ./plumage_stats --ugl
pmackinn@redhat.com/milo.usersys.redhat.com

Find the names of all resources (aka startds, slots).
$ ./plumage_stats --rl
slot1@milo.usersys.redhat.com
slot2@milo.usersys.redhat.com

Print the stats for any username that starts with 'pmackinn' for the previous hour (default time lookback).
$ ./plumage_stats --u pmackinn

Print the stats for user 'pmackinn' over a given datetime range.
$ ./plumage_stats --u pmackinn --from '2011-09-29 14:02' --to '2011-09-29 14:05'
pmackinn@redhat.com 	2011-09-29 14:02:19.239000 	2 	0 	0
pmackinn@redhat.com 	2011-09-29 14:03:19.235000 	2 	0 	0
pmackinn@redhat.com 	2011-09-29 14:04:19.469000 	0 	0 	0

Print the stats for a resource name that contains 'milo' for the previous hour (default time lookback).
$ ./plumage_stats --r milo

Print the stats for resource 'milo' over a given datetime range.
$ ./plumage_stats --r milo --from '2011-09-29 14:02' --to '2011-09-29 14:05'
slot1@milo.usersys.redhat.com 	2011-09-29 14:02:19.240000 	INTEL/LINUX 	205 	0.24 	Claimed
slot2@milo.usersys.redhat.com 	2011-09-29 14:02:19.240000 	INTEL/LINUX 	533 	0.00 	Claimed
slot1@milo.usersys.redhat.com 	2011-09-29 14:03:19.236000 	INTEL/LINUX 	12  	0.33 	Claimed
slot2@milo.usersys.redhat.com 	2011-09-29 14:03:19.236000 	INTEL/LINUX 	603 	0.00 	Claimed
slot2@milo.usersys.redhat.com 	2011-09-29 14:04:19.470000 	INTEL/LINUX 	653 	0.00 	Unclaimed
slot1@milo.usersys.redhat.com 	2011-09-29 14:04:19.471000 	INTEL/LINUX 	30  	0.28 	Unclaimed
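
The name matching and time range behaviour in these use cases could be expressed
as a mongodb filter document along the following lines. This is a sketch, not
the tool's source: the field names "name" and "ts" are hypothetical placeholders
(the real terse keys are listed in the SCHEMA file), and the handling of a
missing --from or --to is an assumption.

```python
import re
from datetime import datetime, timedelta

def build_sample_query(name, contains=False, start=None, end=None, now=None,
                       name_field="name", time_field="ts"):
    """Build a mongodb filter for one submitter or resource.

    name_field and time_field are hypothetical, not Plumage's real keys.
    """
    now = now or datetime.now()
    end_dt = datetime.fromisoformat(end) if end else now
    # Assumed default: look back one hour from the end of the range.
    start_dt = datetime.fromisoformat(start) if start else end_dt - timedelta(hours=1)
    # Submitters match on a prefix, resources on a substring.
    pattern = re.escape(name) if contains else "^" + re.escape(name)
    return {
        name_field: {"$regex": pattern},
        time_field: {"$gte": start_dt, "$lte": end_dt},
    }

query = build_sample_query("pmackinn",
                           start="2011-09-29 14:02", end="2011-09-29 14:05")
print(query)
```

With pymongo, a dict like this can be passed directly to a collection's find().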

Database Layout
---------------
The Plumage plugin emits machine and submitter classads to the db.collection 'condor_raw.ads'. It can be
queried in the mongo shell like so:

$ mongo
MongoDB shell version: 1.6.4
connecting to: test
> use condor_raw
switched to db condor_raw
> db.ads.find({"MyType" : "Machine"})
<output not shown>
> db.ads.find({"MyType" : "Submitter"})
<output not shown>

The stats records are in the db.collections 'condor_stats.samples.machine' and 'condor_stats.samples.submitter'
respectively:

> use condor_stats
> db.samples.machine.find()
<output not shown>

> db.samples.submitter.find()
<output not shown>

NOTE: After the Plumage collector plugin has been activated, it can take 5 minutes (the default) for statistics 
to become available to the tool.

Additional sample tools
-----------------------
plumage_accounting: Query HTCondor ODS for accounting group data
plumage_scheduler: Query HTCondor ODS for scheduler job totals
plumage_utilization: Query HTCondor ODS for utilization efficiency

Please consult the help info (--help) for details on using each of these.

Sampled data schema
-------------------
An additional file called SCHEMA shows the name mapping between terse key 
fields in the samples collections and their ClassAd counterparts.

plumage_history
===============
Plumage provides a tool that is analogous to the condor_history client. It provides
summary and full job ad information from the job history stored in mongodb.

The mongodb database.collection for job history is 'condor_jobs.history'. 

Usage: plumage_history [options]

Query HTCondor ODS for job history

Options:
  -h, --help            show this help message and exit
  -s SERVER, --server=SERVER
                        mongodb database server location: e.g., somehost,
                        localhost:2012
  -f, --forward         list jobs oldest-to-newest
  -m MATCH, --match=MATCH
                        limit the number of jobs displayed
  -l, --long            verbose output (entire classads)
  -c CPROC, --cproc=CPROC
                        get information about specific cluster[.proc], e.g., 5
                        or 4.2
  -o OWNER, --owner=OWNER
                        information about jobs owned by <owner>
  -S SUB, --sub=SUB     information about jobs grouped by <submission>

Example
-------

$ ./plumage_history
ID      OWNER            SUBMITTED                   RUN_TIME   ST  COMPLETED                   CMD
6.1     pmackinn         Wed Oct 10 18:38:57 2012    00:01:00   C   Wed Oct 10 18:40:00 2012    /bin/sleep 60
6.2     pmackinn         Wed Oct 10 18:38:57 2012    00:01:00   C   Wed Oct 10 18:40:00 2012    /bin/sleep 60
6.0     pmackinn         Wed Oct 10 18:38:57 2012    00:01:01   C   Wed Oct 10 18:40:00 2012    /bin/sleep 60
5.0     pmackinn         Wed Oct 10 18:35:21 2012    00:01:01   C   Wed Oct 10 18:36:28 2012    /bin/sleep 60
4.0     pmackinn         Wed Oct 10 17:48:38 2012    00:01:01   C   Wed Oct 10 17:49:41 2012    /bin/sleep 60
3.0     pmackinn         Wed Oct 10 17:41:39 2012    00:01:01   C   Wed Oct 10 17:42:41 2012    /bin/sleep 60
2.0     pmackinn         Wed Oct 10 17:36:09 2012    00:01:01   C   Wed Oct 10 17:37:15 2012    /bin/sleep 60
1.0     pmackinn         Wed Oct 10 10:10:07 2012    00:01:01   C   Wed Oct 10 10:11:27 2012    /bin/sleep 60
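
As a rough illustration of how the --cproc and --owner options could map onto a
query against 'condor_jobs.history', consider the sketch below. ClusterId, ProcId
and Owner are standard job ClassAd attributes; the function itself is
hypothetical and is not taken from the tool.

```python
def build_history_query(cproc=None, owner=None):
    """Translate cluster[.proc] and owner selections into a mongodb filter."""
    query = {}
    if cproc:
        cluster, _, proc = str(cproc).partition(".")
        query["ClusterId"] = int(cluster)
        if proc:  # only constrain ProcId when "cluster.proc" was given
            query["ProcId"] = int(proc)
    if owner:
        query["Owner"] = owner
    return query

print(build_history_query(cproc="4.2", owner="pmackinn"))
```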


plumage_history_load
====================
Plumage provides a tool to load existing history files and their backups into mongodb in the absence of the
continuous loading provided by the job history daemon. It can be used to "catch up" or prime a mongodb instance 
with job history data prior to activating the job history daemon. Jobs are uniquely identified by the value of 
their GlobalJobId attribute.
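
Keying on GlobalJobId is what makes reloading the same history file harmless.
The in-memory Python sketch below shows the upsert idea with a plain dict
standing in for the collection. The helper name and sample ids are made up, and
the tool's own implementation may differ.

```python
def upsert_job(store, job_ad):
    """Insert or update job_ad in store, keyed by its GlobalJobId.

    Returns True when a new record was added, False when one was updated.
    """
    key = job_ad["GlobalJobId"]
    is_new = key not in store
    store.setdefault(key, {}).update(job_ad)
    return is_new

store = {}
first = upsert_job(store, {"GlobalJobId": "hamish#1.0#1", "JobStatus": 4})
again = upsert_job(store, {"GlobalJobId": "hamish#1.0#1", "JobStatus": 4})
print(first, again, len(store))
```

With pymongo, the comparable call would be
update_one({"GlobalJobId": key}, {"$set": job_ad}, upsert=True) on the
history collection.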

Example
-------

$ ./plumage_history_load -h
Usage: ./plumage_history_load [-db <mongodb URL>] [-f <full path to condor history file>] [-h | -help]
        -db : the mongod connection URL (host[:port])
        -f : full path to a specific HTCondor history file to load
        -h | -help : this help message
No arguments will load all the history and backup files from $HISTORY directory to a mongod server at $PLUMAGE_DB_HOST/_PORT.
Duplicate jobs (same GlobalJobID) will be upserted (updated or added if missing).

$ ./plumage_history_load 
SCHEDD_NAME = hamish
mongod server = localhost:27017
processed 8 records from '/home/pmackinn/personal-condor/spool/history' in 0.020 sec
