* there is an (unchecked) assumption/requirement that each node's submit
  file queues only 1 job -- if this is violated, many things may break
  (e.g., the injection of DAGNodeName into the submit file)

* there are two different recovery scenarios:

  - one where DAGMAN sees DONE statements in the DAG file.  (This
    isn't strictly "recovery" from an implementation point of view,
    but rather a specific type of DAG to start.)

  - another where DAGMAN detects a .lock files from a previous,
    aborted run, and uses the .log file to reconstruct its state in
    order to continue
