Wednesday, 2 September 2020

Thoughts on integrations

$dayjob uses Statuspage for our public status page. Although I've got some things automated, there are still a bunch of components that need manual updating when there's an issue (enough to keep me out of mischief working out how to automate them without false alerts, anyway). However, this post isn't about getting info _in_ to statuspage, but about how to get it _out_, and what I want.

We're predominantly a command line shop (no, we don't yet have a JupyterHub frontend), so users are presented with a MOTD on login. Now, wouldn't it be good if that could be updated automatically with details of upcoming planned outages as well as any recent (and current) incidents that affected the service you're currently logged into?

So, armed with the API, it should be possible to get upcoming maintenance[✓] and the impacted components[✓], but where do I map the cluster name to the statuspage group_id? Or, for that matter, any of the autogenerated id strings? Hard coding them into scripts is out. A lookup makes sense, but how many CMDBs out there come with that sort of functionality built in? And we're back to another 'where is my source of truth?' problem. Sure, I can string match on components->name, check that "group": true and then pull the ID, but... yeah, faffy.
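For what it's worth, the faffy string match can be sketched in a few lines. This is only a rough illustration, assuming the component list has already been fetched and parsed into a list of dicts carrying the "id", "name", "group" and "group_id" fields; the function names, cluster names and id strings below are made up for the example.

```python
# Hypothetical sample of a parsed component list; the ids are invented.
EXAMPLE_COMPONENTS = [
    {"id": "grp111", "name": "Cluster-A", "group": True, "group_id": None},
    {"id": "cmp001", "name": "Login nodes", "group": False, "group_id": "grp111"},
    {"id": "cmp002", "name": "Scheduler", "group": False, "group_id": "grp111"},
    {"id": "grp222", "name": "Cluster-B", "group": True, "group_id": None},
    {"id": "cmp003", "name": "Login nodes", "group": False, "group_id": "grp222"},
]


def find_group_id(components, cluster_name):
    """Return the id of the one component group whose name matches cluster_name.

    Matching is case-insensitive and ignores surrounding whitespace.
    Raises LookupError when there is no match or more than one.
    """
    wanted = cluster_name.strip().lower()
    matches = [
        c["id"]
        for c in components
        if c.get("group") is True and c.get("name", "").strip().lower() == wanted
    ]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one group named {cluster_name!r}, found {len(matches)}"
        )
    return matches[0]


def components_in_group(components, group_id):
    """Return the components that belong to the given group."""
    return [c for c in components if c.get("group_id") == group_id]


if __name__ == "__main__":
    gid = find_group_id(EXAMPLE_COMPONENTS, "cluster-a")
    print(gid, [c["name"] for c in components_in_group(EXAMPLE_COMPONENTS, gid)])
```

Failing loudly on zero or duplicate matches is deliberate: a renamed group should break the build rather than quietly put the wrong cluster's outages in the MOTD. It still leaves the group name as the de facto source of truth, which is the original complaint.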

Anyway, after much parsing (all hail requests) it's possible to get the various incidents, component states and planned maintenance out. The resulting text snippets are then touched to the matching updated_at timestamp, so they get pulled into the final output whenever the files are regenerated by a Makefile build under Jenkins. The goal of DRY is starting to be achieved: update one place (currently statuspage) and have that trigger a build via webhook, which then distributes the info out to the clusters.
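The touching step might look something like the sketch below. It assumes updated_at arrives as an ISO 8601 string (with either a trailing "Z" or a numeric offset); the function name and file path are hypothetical, not lifted from the real scripts.

```python
import os
from datetime import datetime
from pathlib import Path


def write_snippet(path, text, updated_at):
    """Write a text snippet and set its mtime to the updated_at timestamp.

    Returns the timestamp used, as seconds since the epoch.
    """
    path = Path(path)
    path.write_text(text, encoding="utf-8")
    # Older Pythons' fromisoformat() rejects a trailing "Z", so normalise it.
    stamp = datetime.fromisoformat(updated_at.replace("Z", "+00:00")).timestamp()
    # Set both access and modification time, like `touch -d` would.
    os.utime(path, (stamp, stamp))
    return stamp


if __name__ == "__main__":
    # Hypothetical file name and incident text, for illustration only.
    write_snippet(
        "incident.txt",
        "Scheduler degraded, investigating\n",
        "2020-09-02T10:15:00.000Z",
    )
```

Because make compares modification times, a snippet whose updated_at hasn't changed keeps the same mtime on every run and doesn't trigger a rebuild of whatever depends on it.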

I think a former colleague summed it up fairly well tho.

Feeling Pumped!

Having just had a day without power, and then going round the site to check everything came back online correctly (including services such a...