Thursday, 14 June 2018

Overlaying SLURM job timings on Grafana plots

As you may have noticed, I'm quite fond of Grafana and use it at home and work. One of the dashboards I have at work is the general state of our lustre filesystems, showing IO and metadata traffic, collected by a custom python script (I'm working on converting this to a real collectd python plugin) which stores the data in an influxDB.

I've since written a small python script that talks to our SLURM accounting DB, so that given a jobID, we can get the start/end times and overlay those using the annotations API. One minor niggle is that the API expects epoch milliseconds, and seems to be tied to the TZ of the browser that generated the API key.

however...
~$ annotate_job 2924399
Found the following job:
  User: bskjerven (pawsey0001)
  Cluster: magnus, Partition: workq, QOS: normal
  Nodes: 768, CPUs: 36864
  Start: 2018-06-11 17:23:22, End: 2018-06-11 19:54:44
Got something back - Annotate? (y/n) y
200 - Annotation added
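
As a rough sketch of the conversion step (this is not the actual annotate_job script): the snippet below turns the start/end strings into epoch milliseconds and builds a region-annotation payload. The fixed UTC+8 offset, the function names and the "slurm" tag are assumptions for illustration. The field names follow Grafana's annotations HTTP API as I understand it from the 5.x series, so check them against your version before POSTing to /api/annotations.

```python
import json
from datetime import datetime, timedelta, timezone

# Assumption: the accounting DB reports wall-clock time in a fixed UTC+8 zone.
LOCAL_TZ = timezone(timedelta(hours=8))


def to_epoch_ms(stamp, tz=LOCAL_TZ):
    """Convert 'YYYY-MM-DD HH:MM:SS' in the given zone to epoch milliseconds."""
    dt = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    return int(dt.timestamp() * 1000)


def build_annotation(job_id, start, end, tags=("slurm",)):
    """Build a region annotation spanning the job's run time."""
    return {
        "time": to_epoch_ms(start),
        "timeEnd": to_epoch_ms(end),
        "isRegion": True,
        "tags": list(tags),
        "text": "SLURM job %s" % job_id,
    }


payload = build_annotation(2924399, "2018-06-11 17:23:22", "2018-06-11 19:54:44")
print(json.dumps(payload, indent=2))
```

Pinning the timezone explicitly avoids depending on whatever TZ the machine running the script happens to have.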

and lo - 


Thursday, 22 March 2018

weewx to home-assistant

At home I have a Fine Offset (this one branded by Jaycar) weather station that publishes to weather.two-fifteen.com via weewx (this is much simpler now I don't have to have the thing solar mounted in a field using a 3G dongle - at least the NBN is useful for some things) but I'd like to be able to use some of the measurements in home-assistant.

There isn't yet a direct plugin (spare time project anyone?) that I can see, but because I'm pushing the metrics locally to influxdb (have I mentioned I like drawing graphs of things?) for grafana, it's possible to use this in home-assistant via weewx-influx.

Weewx config:
[StdRESTful]
      [[Influx]]
        host = localhost
        database = weather
        unit_system = METRIC

and on the home-assistant server:
  - platform: influxdb
    host:    203.0.113.88
    queries:
    - name: Outside Temp
      database: weather
      measurement: 'record'
      field: 'outTemp_C'
      group_function: last
      where: 'time > now() - 5m'
      unit_of_measurement: °C
      value_template: '{{ value | round(1) }}'
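
If the sensor shows up as unknown, it can help to run the equivalent query by hand. The sketch below builds roughly the InfluxQL that the configuration above amounts to; the helper name is made up, and the exact string home-assistant generates internally may differ between versions.

```python
def build_query(measurement, field, group_function, where):
    """Assemble an InfluxQL query from the same keys used in the YAML above."""
    return 'SELECT %s("%s") AS value FROM "%s" WHERE %s' % (
        group_function, field, measurement, where)


query = build_query("record", "outTemp_C", "last", "time > now() - 5m")
print(query)
```

Pasting the printed query into the influx shell against the weather database should return a single row if weewx has written a record in the last five minutes.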

and lo and behold, you should end up with something like this:
which when compared to our nearest BOM observation site up the road in Grove correlates nicely (Grove is in a valley, we're at the top of a hill)


Monday, 15 January 2018

Don't count your chickens...

We have a cheapo Chinese incubator for hatching eggs. According to popular Internet postings, the calibration of the 'temperature setting' on the front vs reality inside isn't terribly accurate. Since I have a stack of 'Ruuvitags' (https://ruuvi.com) from their kickstarter, I decided to combine them with Grafana so I could start logging the data and plotting trends.

First up, by default they broadcast an Eddystone beacon, so that you can simply see the data on a phone / tablet (via the Physical Web), however with Google dropping this feature, I decided to switch them to Raw mode which has a higher accuracy. This is done by simply opening the tag and pressing the 'B' button to toggle between Raw (LED blinks red) and URL (LED blinks green) mode.

I'm using a Raspberry Pi 3 as a Bluetooth receiver. Running Raspbian 9 (stretch) means that I get a recent (v 5.43) version of bluez which understands BLE. Although there's a Java app to push to influxdb, I'd rather use Python, so I pip-installed the ruuvitag_sensor package.

Initially this worked 'OK', but the logs were full of noise on the listener, so I hacked up a quick script based loosely on the examples. When done, it was much cleaner than the original and was picking up more of the broadcasts.

Once again, it was trivial to add to influx with some templating.
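
The templating step might look something like the sketch below, which formats one decoded reading as an InfluxDB line-protocol string. This is not the script I actually run: the measurement name, the MAC address and the reading values are made up, and the dictionary keys are an assumption about what the decoder hands back.

```python
def to_line_protocol(measurement, tags, fields, ts_ns):
    """Format one point as: measurement,tag=val field=val timestamp."""
    tag_part = ",".join("%s=%s" % (k, tags[k]) for k in sorted(tags))
    field_part = ",".join(
        "%s=%s" % (k, float(fields[k])) for k in sorted(fields))
    return "%s,%s %s %d" % (measurement, tag_part, field_part, ts_ns)


# Hypothetical reading; a real one would come from the BLE listener.
reading = {"temperature": 37.5, "humidity": 45}
line = to_line_protocol(
    "ruuvitag", {"mac": "AA:BB:CC:DD:EE:FF"}, reading, 1516000000000000000)
print(line)
```

Sorting the keys keeps the output stable between runs, which makes the lines easier to diff when debugging.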

Tuesday, 10 October 2017

Plotting Lustre MDS stats

At $dayjob we have several large filesystems - for example our /scratch system has 3.1 PB of space using over 1000 HDDs. Although each vendor offers their own dashboard for monitoring they're all a little bit crap and don't integrate with anything else.

Cue an afternoon setting up influxdb (trivial) and grafana (also trivial) on a spare VM and a simple python script run on the metadata servers:

[admin@snx11038n003 ~]$ cat push_mdt_stats.py
#!/usr/local/bin/python2.7
import sys
import time
import urllib

def grabbit(mds):
 post = ""
 with open(('/proc/fs/lustre/mdt/%s/md_stats' % mds), 'r') as f:
    for line in f:
        k,v,null = line.split(None,2)
        if k == "snapshot_time":
            ts=int(float(v)*1000000)
        else:
            post += 'metadata,fs={3} {0}={1} {2}\n'.format(k,v,ts,mds)
 with open(('/proc/fs/lustre/mdd/%s/changelog_users' % mds), 'r') as f:
    tmp = f.read().split()
    # we can cheat here as they have the same format - 3rd item in list is current changelog count, and then
    # from the 6th item on we get changelog id / position to pull into a dict
    head = int(tmp[2])
    clog = dict(zip(tmp[5:][0::2], tmp[5:][1::2]))
    post += 'changelog,fs={2} head={0} {1}\n'.format(head,ts,mds)
    for cl,count in clog.items():
        post += 'changelog,fs={3} {0}={1} {2}\n'.format(cl,count,ts,mds)

 post=post.encode('ascii')
 p = urllib.urlopen('http://influxbox:8086/write?db=lustre&precision=u',post)
 #print(p.getcode())

while True:
  try:
    grabbit('snx11038-MDT0000')
  except Exception:
    sys.exit("Whoa, that went a bit Pete Tong!")
  time.sleep(10)
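
For testing away from the metadata servers, the two parsing steps can be pulled out into pure functions and fed sample text. This is a sketch rather than the script above: the function names are mine and the sample contents are invented to match the layout the script relies on (snapshot_time in seconds, the head index as the 3rd token, id/position pairs from the 6th token on).

```python
def parse_md_stats(text):
    """Return (timestamp in microseconds, {counter: sample count})."""
    ts = None
    counters = {}
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        key, value = parts[0], parts[1]
        if key == "snapshot_time":
            ts = int(float(value) * 1000000)
        else:
            counters[key] = int(value)
    return ts, counters


def parse_changelog_users(text):
    """Return (head index, {changelog id: position})."""
    tmp = text.split()
    head = int(tmp[2])
    # Tokens 5, 7, 9... are ids; tokens 6, 8, 10... are their positions.
    users = dict(zip(tmp[5::2], tmp[6::2]))
    return head, {cl: int(pos) for cl, pos in users.items()}


# Invented sample data, for illustration only.
SAMPLE_MD_STATS = """snapshot_time 1507612345.500000 secs.usecs
open 10 samples [reqs]
close 8 samples [reqs]
"""
SAMPLE_CHANGELOG = """current index: 120
ID    index
cl1   100
cl2   115
"""

ts, counters = parse_md_stats(SAMPLE_MD_STATS)
head, users = parse_changelog_users(SAMPLE_CHANGELOG)
print(ts, counters, head, users)
```

The gap between head and each user's position is the backlog for that changelog consumer, which is the number worth graphing.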

And a couple of clicks in Grafana can soon knock up a dashboard:


Monday, 28 August 2017

PSU tinkering, Part 1

As previously blogged, I've got a couple of 12v 86.7A PSUs that I'm trying to control under arduino. Stage 1 complete - it powers up with a trivial bit of code:

/* Arduino control for (ex) server PSU 
 * Andrew Elwell <andrew.elwell@gmail.com> August 2016
 * Released under BSD licence
 */

 /* Controls / Pins based on data sheet available at 
  *  https://belfuse.com/resources/PowerSolutions/SFP1050/bcd20031_ab_sfp1050-12bg.pdf
  *  
  *  A6/B4/C4/D4         +3.3 standby (power to arduino)
  *  A3/B1/B3/C1/C3/D3   Return 
  *  B5(SDA) / C5(SCL)   I2C
  *  B6                  Bring low for PS ON
  *  C6                  AC OK (if high)
  *  D6                  PWR OK (if high)
  *  
  */


#include <Wire.h>

int ACOK  = 2;
int PSON  = 3;
int PWROK = 4;
int LED   = 13;

void setup() {
  Wire.begin();                // join i2c bus (address optional for master)
  pinMode(ACOK, INPUT);
  pinMode(PSON, OUTPUT);
  pinMode(PWROK, INPUT);
  pinMode(LED,  OUTPUT);       // LED is driven by digitalWrite() in loop()
  
  digitalWrite(PSON,HIGH) ;   // Stay off until ready
}

void loop() {
  if (digitalRead(ACOK) == HIGH) {
    digitalWrite(PSON,LOW) ;
  } 
  if (digitalRead(PWROK) == HIGH) {
    digitalWrite(LED,HIGH) ;
  } 
}

The one gotcha that I needed to get it working was to also bring PS A0 low (I2C address) and suddenly green led and 12v out!



Tuesday, 4 July 2017

I've got the power

(It's getting, it's getting, it's getting kinda hectic)

So, another "I should really get round to that" project that's worked its way to the top of the desk is repurposing a skip-dived server PSU (or 4) to be more usable.

Exhibit A - One ex-sun 'SPASUNM-03G' PSU, which spits out a fairly chunky 12v at 86.7A
Since these were pulled from a bunch of servers, the output is a less than friendly set of three paired contacts for +12 and another set of three pairs for the ground. It won't start spitting out 12v when you plug a mains lead in as it needs the PS_ON connector bringing low. Power-one seem to have been bought out by bel, and the datasheet is available here.


Rather than the (sometimes) crude way people have modified these and similar server PSUs over at RC Groups, I thought I'd hook up an arduino and be "smart"

So - Grand Plan (TM)
* Nice big illuminated push button for on/standby
* LCD display to show status (output / alarms / temp)
* No screaming 'fan-at-maximum' setting all the time

This shouldn't be that hard, right? Arduinos can do i2c and I have a bunch of 3.3v ones to hand, so I can drive this off the stby 3.3v (even that's at 3A on this thing)

TO THE SOLDERING IRON! ... to be continued


Sunday, 25 June 2017

Handheld animal RFID reader teardown

As some of you may know, we have a small herd of llamas, and although each is easily recognisable, legislation and the Llama Association of Australasia Inc. require that the animals be microchipped. As a geek, I'd also like to integrate routine monitoring of weight, so my idea is to use the embedded microchips (same as your pet FDX-B 'grain of rice' thing) to save the scale output into the relevant animal record (yup, another use of a Raspberry Pi in the steading).

Aliexpress provided the reader, and 5 small screws later (plus another 2 for the PCB) the innards are exposed at https://www.flickr.com/photos/elwell/albums/72157683149501201
