Thursday, 14 June 2018

Overlaying SLURM job timings on Grafana plots

As you may have noticed, I'm quite fond of Grafana and use it both at home and at work. One of the dashboards I have at work shows the general state of our Lustre filesystems: IO and metadata traffic, collected by a custom Python script (which I'm working on converting to a real collectd Python plugin) and stored in InfluxDB.

I've since written a small Python script that talks to our SLURM accounting DB so that, given a job ID, we can get the job's start and end times and overlay them on the plots using Grafana's annotations API. One minor niggle is that the API expects epoch milliseconds, and seems to be tied to the TZ of the browser that generated the API key.
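For the curious, here is a rough sketch of the annotation half. It is not the actual script, just a minimal illustration of turning SLURM-style timestamps into epoch milliseconds and posting a region annotation to Grafana's `/api/annotations` endpoint. The function names, the `slurm` tag, the annotation text and the UTC+8 offset are all assumptions made for the example; adjust the timezone to wherever your cluster reports its times.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

# Assumption: the cluster reports job times in UTC+8. Change to suit your site.
CLUSTER_TZ = timezone(timedelta(hours=8))


def to_epoch_ms(stamp, tz=CLUSTER_TZ):
    """Convert 'YYYY-MM-DD HH:MM:SS' (interpreted in tz) to epoch milliseconds."""
    dt = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    return int(dt.timestamp() * 1000)


def build_annotation(job_id, start, end, tz=CLUSTER_TZ, tags=("slurm",)):
    """Build the JSON body for a region annotation spanning the job's runtime."""
    return {
        "time": to_epoch_ms(start, tz),
        "timeEnd": to_epoch_ms(end, tz),
        # Older Grafana releases use this flag to mark a region annotation;
        # treat it as an assumption and check the docs for your version.
        "isRegion": True,
        "tags": list(tags),
        "text": f"SLURM job {job_id}",
    }


def post_annotation(base_url, api_key, payload):
    """POST the annotation to Grafana and return the HTTP status code."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/annotations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Building the payload separately from sending it makes the timestamp conversion easy to check before anything is written to Grafana, which is handy given the timezone niggle above.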

~$ annotate_job 2924399
Found the following job:
  User: bskjerven (pawsey0001)
  Cluster: magnus, Partition: workq, QOS: normal
  Nodes: 768, CPUs: 36864
  Start: 2018-06-11 17:23:22, End: 2018-06-11 19:54:44
Got something back - Annotate? (y/n) y
200 - Annotation added

and lo - 

