# $Id: DESIGN 65 2002-03-13 21:06:59Z rgb $
#
# See copyright in copyright.h and the accompanying file COPYING
#========================================================================

                       XML System Daemon

The XML system daemon is being designed as a replacement for procstatd,
an early way to provide lightweight access to key system statistics for
remote cluster or LAN monitoring.

Basically, the XML system daemon will be designed to provide:

  a) a lightweight wrapper function, based on libxml, that can
encapsulate arbitrary fields from /proc-based files.  The XML will
transparently encode the path and field indices of the data returned. At
this time read-only access will be provided; this is NOT intended to
function as a daemon-based proc control interface.

  b) functions to produce xml-encoded access to selected systems
information calls such as time and users that might be useful to systems
or cluster managers.

  c) a daemon-based command interface that can be used to select
specific path/field or supported systems data call descriptors for
return in an xml-encoded packet produced by the xml interface.  As the
project develops, commands for selection based on glob or regexp
criteria may be added.  Initially this will be procstatd's old ascii
command interface only.

This greatly simplifies and hence improves upon the functionality of
procstatd -- basically a small set of routines configures the daemon and
opens the selected proc paths, and then reads, xml encodes, and returns
the requested data in packet form through a standard socket interface.
All presentation and processing is the responsibility of the calling
routine.

This simplification means that trivial modifications of e.g. ps or
uptime or even top can be used to monitor remote hosts as easily as
local ones via a socket interface.  It will also permit the construction
of web, gui, and even (via a suitable DTD) text applications for
converting return data into "reports" or visual displays of entire
cluster or lan proc/system data.

A swiss-army-knife tty monitoring tool (wulfstat) is provided so that
the daemon can be immediately useful until I finish a Gtk-based GUI app
with more features.

Further design notes and modifications will be documented below as
appropriate.  Of principal interest are the rules to be followed by
folks seeking to add support (e.g. for sensors) to the daemon for their
local hardware.  PLEASE follow these rules, as they are key to modifying
the daemon in a way that will not break applications like wulfstat AND
that can use wulfstat's core parsing routines to fairly easily extract
the stuff that you add.

  rgb  3/13/02

========================================================================
                 <xmlsysd> Language Specification

The language is very simple.  Note that each packet must start with a
Content-Length header (as in HTTP/1.1) followed by a blank line, so
that the message parser knows how many bytes to read from the stream.
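
A minimal client-side sketch of that framing, in Python rather than the
daemon's own C.  The function name read_message() is made up for this
illustration, and the sketch assumes header lines end in a newline
(either "\n" or "\r\n"); neither detail is taken from the daemon source.

```python
import io


def read_message(stream):
    """Read one Content-Length framed message from a binary stream.

    Hypothetical helper: returns the message body as bytes.
    """
    length = None
    # Read header lines until the blank line that ends the header.
    while True:
        line = stream.readline()
        if not line:
            raise EOFError("stream closed before the header was complete")
        text = line.decode("ascii").strip()
        if text == "":
            break
        name, _, value = text.partition(":")
        if name.strip().lower() == "content-length":
            length = int(value)
    if length is None:
        raise ValueError("no Content-Length header found")
    # Read exactly `length` bytes; a socket may deliver them in pieces.
    body = b""
    while len(body) < length:
        chunk = stream.read(length - len(body))
        if not chunk:
            raise EOFError("stream closed before the body was complete")
        body += chunk
    return body


# Demonstration on an in-memory stream standing in for the socket.
payload = b"<xmlsysd><proc/></xmlsysd>"
header = ("Content-Length: %d\n\n" % len(payload)).encode("ascii")
stream = io.BytesIO(header + payload + b"trailing bytes")
message = read_message(stream)
leftover = stream.read()
```

Because the reader stops after exactly Content-Length bytes, whatever
follows the message stays in the stream for the next read.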

The document root is <xmlsysd> and must
wrap the whole message.

There are two toplevel tags, <system> and <proc>.  <system> wraps fields
derived from system calls, e.g. gettimeofday() and gethostname(), or
from non-proc files like wtmp (users).  <proc> wraps all fields
encapsulating data parsed from files under /proc.

In general I have tried to create a tag structure that -- up to a point
-- recapitulates an xpath-like map:

  /proc/stat <-> <proc><stat>
  /proc/net/dev <-> <proc><net><dev>

This breaks down, of course, because /proc looks like it was designed by
a myriad of scraggly monkeys loaded on Jolt Cola and banging randomly on
a keyboard... no, that is too kind.  Some parts of it look like they
were designed by >>deranged<< scraggly monkeys.  Seriously, /proc is
crying out for a fundamental makeover into something that CAN be neatly
echoed in xml.

In any event, one can see that within e.g. /proc/stat I TRY to assign a
tag per line parsed, and to tag the fields parsed out of the line with
sensible (and accurate) names.  Of course its format is completely
different from /proc/meminfo, which is TOTALLY different from
/proc/net/dev, and don't get me started on sensors (the /proc API
designed by SADISTIC deranged monkeys who may or may not be scraggly).

The fundamental design principles are:

  a) TRY to create a rational representation of /proc, feeling free to
ignore irrelevant (to monitoring, in your opinion) fields or to impose
order on chaos as needed.
  b) VERY IMPORTANT!  Every Tag/XPath Must Be Unique!  For example, the

<proc>
 <stat>
   <cpu>
    ...(content tags)
   </cpu>

can occur several times, for cpu0, cpu1.. on an SMP system.  We
hence differentiate the cpus (AND get unique xpaths) by adding an
"id" attribute:
<proc>
 <stat>
   <cpu id="0">
     <user>234235</user>
    ...(content tags)
   </cpu>
   <cpu id="1">
    ...(content tags)
   </cpu>

which can be accessed uniquely via xpaths like

  /xmlsysd/proc/stat/cpu[@id="0"]/user
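
As a rough illustration of such a lookup from the client side, here is a
sketch using Python's standard xml.etree.ElementTree (not wulfstat's own
parsing routines).  The value for cpu 1 is invented for the example.
ElementTree paths are relative to the root element, so the leading
/xmlsysd becomes "." here.

```python
import xml.etree.ElementTree as ET

# Cut-down message; the cpu id="1" value is made up for illustration.
SAMPLE = """<xmlsysd>
  <proc>
    <stat>
      <cpu id="0"><user>234235</user></cpu>
      <cpu id="1"><user>1187</user></cpu>
    </stat>
  </proc>
</xmlsysd>"""

root = ET.fromstring(SAMPLE)  # root is the <xmlsysd> element

# The id attribute picks out exactly one <cpu>, so each path is unique.
user0 = root.findtext("./proc/stat/cpu[@id='0']/user")
user1 = root.findtext("./proc/stat/cpu[@id='1']/user")

# A path that matches nothing returns None instead of a wrong value.
user2 = root.findtext("./proc/stat/cpu[@id='2']/user")
```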

Similarly there are many network devices (with different names and
types!) that occur inside /proc/net/dev, so we use a toplevel
  ...
   <interface id="0" devtype="eth" devid="0">

to tag eth0, for example.  That way

  /xmlsysd/proc/net/dev/interface[@devtype="eth" and @devid="0"]/receive/bytes

makes sense.  If we differentiated via only e.g.

  <interface>
    <name>eth0</name>
    <receive>
      <bytes>12091</bytes>

we'd have to "look ahead" to figure out which of the <interface>
elements to descend into to retrieve rx bytes.  Trust me, this is a real
pain and we don't want to do it on the client application side.
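
The difference between the two approaches can be sketched in Python with
the standard xml.etree.ElementTree module.  This is an illustration
only, not the wulfstat code; the helper name rx_bytes_by_name() is
hypothetical.  ElementTree supports only a subset of XPath, so the two
attribute tests are written as chained predicates.

```python
import xml.etree.ElementTree as ET

# Cut-down <dev> section with values taken from the sample message.
SAMPLE = """<xmlsysd><proc><net><dev>
  <interface id="0" devtype="lo" devid="">
    <name>lo</name>
    <receive><bytes>97083646</bytes></receive>
  </interface>
  <interface id="1" devtype="eth" devid="0">
    <name>eth0</name>
    <receive><bytes>762887796</bytes></receive>
  </interface>
</dev></net></proc></xmlsysd>"""

root = ET.fromstring(SAMPLE)

# Attribute-based selection: one path expression, no inspection of children.
rx_by_attr = root.findtext(
    "./proc/net/dev/interface[@devtype='eth'][@devid='0']/receive/bytes"
)


def rx_bytes_by_name(root, name):
    """Look-ahead selection: open every <interface> and check its <name>."""
    for iface in root.findall("./proc/net/dev/interface"):
        if iface.findtext("name") == name:
            return iface.findtext("receive/bytes")
    return None


rx_by_name = rx_bytes_by_name(root, "eth0")
```

Both give the same answer, but the second needs a loop and a child
lookup for every interface, which is the work the attributes are meant
to spare the client.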

There are more sensible rules for xml design than I can capture right
here and now, so you should likely read a book or two (or ASK me) before
trying anything major that you then have to redo.  For example, I
learned the hard way not to create tags like <eth0> or tags with values
given in attributes, so you shouldn't have to.

A good rule of thumb is: only create new tags that wulfstat's "xtract()"
or "xtract_attribute()" functions can access (if necessary, adding a new
enumerated type to the return list) and MODIFY THIS FUNCTION as needed
at the SAME time.  The latter is often a very instructive step...

  c) Some quantities that we might want to present are rates.  In order
to create a rate (on the client side) we need to know EXACTLY when the
data was read from proc.  Many file tags therefore carry a timestamp
attribute set right before opening the proc file for reading, e.g.

  <stat tv_sec="1016048061" tv_usec="789464">

so that two successive reads can form the time difference (accurate to
a few microseconds), the value difference, and form a rate.  The rate
formed won't be horribly accurate if the time delta is less than 1 sec
(many things monitored are very bursty and short time rates are not good
predictors of average rates) but we need the timestamp(s).  Besides,
the top to bottom timestamp delta gives you a very good idea of how long
the message took to assemble on the monitored host and hence the relative
load produced by running the daemon.
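
A sketch of that client-side rate calculation, in Python.  The function
names and the second sample (five seconds later, 1000 more context
switches) are invented for the illustration; only the first timestamp
and counter come from the sample message.

```python
import xml.etree.ElementTree as ET


def elapsed_seconds(old, new):
    """Time between two reads, from their tv_sec/tv_usec attributes."""
    sec = int(new.get("tv_sec")) - int(old.get("tv_sec"))
    usec = int(new.get("tv_usec")) - int(old.get("tv_usec"))
    # Combine as integer microseconds first to avoid float rounding.
    return (sec * 1_000_000 + usec) / 1_000_000


def rate(old, new, path):
    """Change per second of the counter found at `path` in both reads."""
    dt = elapsed_seconds(old, new)
    if dt <= 0:
        raise ValueError("second sample is not newer than the first")
    dv = int(new.findtext(path)) - int(old.findtext(path))
    return dv / dt


first = ET.fromstring(
    '<stat tv_sec="1016048061" tv_usec="789464"><ctxt>8610762</ctxt></stat>'
)
# Hypothetical second read of the same file.
second = ET.fromstring(
    '<stat tv_sec="1016048066" tv_usec="789464"><ctxt>8611762</ctxt></stat>'
)

ctxt_rate = rate(first, second, "ctxt")  # context switches per second
```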

  d) Don't Break Things.  There are two things you need to be VERY
careful not to break.  One is existing tags and paths.  PLEASE submit
major changes in format back to me so I can incorporate them, explaining
why the changes are necessary, but remember that if the changes are
really big (change the paths completely) all the clients will break.
There are tags I >>plan<< to add (e.g. <disk> from /proc/stat) but
adding it shouldn't break <cpu> or <intr>.  The other is memory
management.  libxml is very tricky in that you have to free all sorts of
things used to construct or parse the xml document using ITS constructs,
while still remembering to free your own.  Even being careful, my first
cut of this daemon leaked memory like a sonuvabitch until I squashed a
nasty little bug.  It is a Bad Thing to run a leaky daemon intended to
run for months at a time and hundreds of thousands of message cycles...

Below you can see a typical return message (without compression and with
"pretty" indentation)  from the daemon as of this moment.  Note that
information redundancy is OK, and even tagname reuse is OK as long as
paths remain unique (including attributes).
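
One way to check a candidate message against the uniqueness rule is
sketched below in Python.  It is not part of the daemon or wulfstat; the
names element_paths() and duplicate_paths() are hypothetical, and the
choice to ignore the tv_sec/tv_usec timestamp attributes when comparing
paths is an assumption made for this sketch.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Timestamps change on every read, so they are not treated as identity.
IGNORED_ATTRIBUTES = {"tv_sec", "tv_usec"}


def element_paths(elem, prefix=""):
    """Yield a path string, including attributes, for every element."""
    attrs = "".join(
        '[@%s="%s"]' % (key, value)
        for key, value in sorted(elem.attrib.items())
        if key not in IGNORED_ATTRIBUTES
    )
    path = "%s/%s%s" % (prefix, elem.tag, attrs)
    yield path
    for child in elem:
        yield from element_paths(child, path)


def duplicate_paths(root):
    """Return the sorted list of paths that occur more than once."""
    counts = Counter(element_paths(root))
    return sorted(path for path, count in counts.items() if count > 1)


bad = ET.fromstring(
    "<xmlsysd><proc><stat>"
    "<cpu><user>1</user></cpu><cpu><user>2</user></cpu>"
    "</stat></proc></xmlsysd>"
)
good = ET.fromstring(
    "<xmlsysd><proc><stat>"
    '<cpu id="0"><user>1</user></cpu><cpu id="1"><user>2</user></cpu>'
    "</stat></proc></xmlsysd>"
)

bad_duplicates = duplicate_paths(bad)
good_duplicates = duplicate_paths(good)
```

An empty list means every tag can be reached by exactly one path, so
reused tag names such as <swap> under both <stat> and <meminfo> pass.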

Good Luck,

  rgb   (rgb@phy.duke.edu)



%< Snip Snip ============= Sample return xmlsysd message =============
Content-Length: 4028

<?xml version="1.0"?>
<xmlsysd>
  <system>
    <time tv_sec="1016048061" tv_usec="788933">2:34:21 pm</time>
    <identity>
      <hostname>golem</hostname>
      <hostip>192.168.1.140</hostip>
    </identity>
    <users tv_sec="1016048061" tv_usec="789442">3</users>
  </system>
  <proc>
    <stat tv_sec="1016048061" tv_usec="789464">
      <cpu id="0">
        <user>316976</user>
        <nice>773</nice>
        <sys>92659</sys>
        <tot>49861016</tot>
      </cpu>
      <page>
        <in>317279</in>
        <out>1860644</out>
      </page>
      <swap>
        <in>1</in>
        <out>0</out>
      </swap>
      <intr>54537416</intr>
      <ctxt>8610762</ctxt>
      <processes>52819</processes>
    </stat>
    <meminfo tv_sec="1016048061" tv_usec="790457">
      <memory>
        <total>327102464</total>
        <used>263335936</used>
        <free>63766528</free>
        <shared>393216</shared>
        <buffers>109690880</buffers>
        <cached>108212224</cached>
      </memory>
      <swap>
        <total>1077469184</total>
        <used>102400</used>
        <free>1077366784</free>
      </swap>
    </meminfo>
    <net>
      <dev tv_sec="1016048061" tv_usec="798104">
        <interface id="0" devtype="lo" devid="">
          <name>lo</name>
          <ip>127.0.0.1</ip>
          <host>localhost</host>
          <receive>
            <bytes>97083646</bytes>
            <packets>141291</packets>
            <errs>0</errs>
            <drop>0</drop>
            <fifo>0</fifo>
            <frame>0</frame>
            <compressed>0</compressed>
            <multicast>0</multicast>
          </receive>
          <transmit>
            <bytes>97083646</bytes>
            <packets>141291</packets>
            <errs>0</errs>
            <drop>0</drop>
            <fifo>0</fifo>
            <collisions>0</collisions>
            <carrier>0</carrier>
            <compressed>0</compressed>
          </transmit>
        </interface>
        <interface id="1" devtype="eth" devid="0">
          <name>eth0</name>
          <ip>192.168.1.140</ip>
          <host>golem.rgb.private.net</host>
          <receive>
            <bytes>762887796</bytes>
            <packets>2036397</packets>
            <errs>0</errs>
            <drop>0</drop>
            <fifo>0</fifo>
            <frame>0</frame>
            <compressed>0</compressed>
            <multicast>0</multicast>
          </receive>
          <transmit>
            <bytes>326138029</bytes>
            <packets>1860317</packets>
            <errs>0</errs>
            <drop>0</drop>
            <fifo>0</fifo>
            <collisions>0</collisions>
            <carrier>0</carrier>
            <compressed>0</compressed>
          </transmit>
        </interface>
      </dev>
      <sockstat tv_sec="1016048061" tv_usec="799216">
        <used>45</used>
        <tcp>
          <inuse>13</inuse>
          <orphan>0</orphan>
          <timewait>0</timewait>
          <alloc>13</alloc>
          <mem>1</mem>
        </tcp>
        <udp>
          <inuse>6</inuse>
        </udp>
        <raw>
          <inuse>0</inuse>
        </raw>
        <frag>
          <inuse>0</inuse>
          <memory>0</memory>
        </frag>
      </sockstat>
    </net>
    <loadavg tv_sec="1016048061" tv_usec="799388">
      <load1>0.21</load1>
      <load5>0.05</load5>
      <load15>0.02</load15>
    </loadavg>
    <cpuinfo tv_sec="1016048061" tv_usec="799554">
      <processor id="0">
        <vendor_id>GenuineIntel</vendor_id>
        <family>6</family>
        <model_num>8</model_num>
        <model_name>Celeron (Coppermine)</model_name>
        <clock units="MHz">801.824</clock>
        <cachesize units="KB">128</cachesize>
      </processor>
    </cpuinfo>
    <sysvipc tv_sec="1016048061" tv_usec="799947">
      <msgbufs>0</msgbufs>
      <msgtot>0</msgtot>
      <sembufs>0</sembufs>
      <semtot>0</semtot>
      <shmbufs>2</shmbufs>
      <shmtot>786432</shmtot>
    </sysvipc>
    <version>2.4.9-21</version>
  </proc>
</xmlsysd>
quit
rgb@golem|T:103>exit
Script done on Wed Mar 13 14:34:25 2002
