Tollef Fog Heen's blog

tfheen Sun, 16 Nov 2014 - Resigning as a Debian systemd maintainer

Apparently, people care when you, as a privileged person (white, male, long-time Debian Developer), throw in the towel because the amount of crap thrown your way just becomes too much. I guess that's good, both because it gives me a soap box for a short while, but also because if enough people talk about how poisonous the well that is Debian has become, we can fix it.

This morning, I resigned as a member of the systemd maintainer team. I then proceeded to leave the relevant IRC channels and announced this on twitter. The responses I've gotten have almost all been heartwarming. People have generally been offering hugs, saying thanks for the work put into systemd in Debian and so on. I've greatly appreciated those (and I've been getting those before I resigned too, so this isn't just a response to that). I feel bad about leaving the rest of the team, they're a great bunch: competent, caring, funny, wonderful people. On the other hand, at some point I had to draw a line and say "no further".

Debian and its various maintainer teams are a bunch of tribes (with possibly Debian itself being a supertribe). Unlike many other situations, you can be part of multiple tribes. I'm still a member of the DSA tribe, for instance. Leaving pkg-systemd means leaving one of my tribes. That hurts. It hurts even more because it feels like a forced exit rather than losing interest or being distracted by other shiny things for long enough that you no longer really feel like part of the tribe. That happened to me with debian-installer. It was my baby for a while (with a then quite small team), then a bunch of real life things interfered, and other people picked it up, ran with it and made it greater and more fantastic than before. I kinda lost touch, and while it's still dear to me, I no longer identify as part of the debian-boot tribe.

Now, how did I, standing stout and tall, get forced out of my tribe? I've been a DD for almost 14 years; I should be able to weather any storm, shouldn't I? It turns out that no, the mountain does get worn down by the rain. It's not a single hurtful comment here and there. There's a constant drumbeat about this all being some sort of conspiracy, and there are occasional flare-ups where people wish those involved in systemd would be run over by a bus, or simply accuse them of incompetence.

Our code of conduct says, "assume good faith". If you ever find yourself not doing that, step back, breathe. See if there's a reasonable explanation for why somebody is saying something or behaving in a way that doesn't make sense to you. It might be as simple as your native tongue being English and theirs being something else.

If you do genuinely disagree with somebody (something which is entirely fine), try not to escalate, even if the stakes are high. Examples from the last year include talking about this as a war and talking about "increasingly bitter rear-guard battles". By using and accepting this terminology, we, as a project, poison ourselves. Sam Hartman puts this better than me:

I'm hoping that we can all take a few minutes to gain empathy for those who disagree with us. Then I'm hoping we can use that understanding to reassure them that they are valued and respected and their concerns considered even when we end up strongly disagreeing with them or valuing different things.

I'd be lying if I said I never felt the urge to demonise my opponents in discussions, to think that they're worse, as people, than I am. However, it is imperative never to give in to this, since doing so diminishes us as humans and makes the entire project poorer. Civil disagreements with reasonable discussions lead to better technical outcomes, happier humans and a healthier project.

[23:55] | Debian | Resigning as a Debian systemd maintainer

tfheen Fri, 29 Nov 2013 - Redirect loop with interaktiv.nsb.no (and how to fix it)

I'm running a local unbound instance on my laptop to get working DNSSEC. It turns out that this doesn't play well with the captive portal used by NSB (the Norwegian national rail company): you end up in an endless series of redirects. Changing resolv.conf to use the DHCP-provided resolver stops the redirect loop, and you can then log in. Afterwards, you're free to switch back to using your own local resolver.
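
If you want to do the switch by hand, something along these lines works (run as root); the resolver address below is just a placeholder for whatever the DHCP lease hands out, and the mechanics differ if resolv.conf is managed by resolvconf or NetworkManager:

# Park the local-resolver config, point at the portal's DNS, log in, switch back.
cp /etc/resolv.conf /etc/resolv.conf.local
echo "nameserver 10.0.0.1" > /etc/resolv.conf
# ... log in to the captive portal in a browser ...
mv /etc/resolv.conf.local /etc/resolv.conf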

[08:37] | tech | Redirect loop with interaktiv.nsb.no (and how to fix it)

tfheen Thu, 03 Oct 2013 - Fingerprints as lightweight authentication

Dustin Kirkland recently wrote that "Fingerprints are usernames, not passwords". I don't really agree; I think fingerprints are fine for lightweight authentication. iOS at least allows you to only require a passcode after a time period has expired, so you don't have to authenticate to the phone all the time. Replacing no authentication with weak authentication (but only for a fairly short period) improves security over the status quo, even if it's not perfect.

Having something similar for Linux would also be reasonable, I think. Allow authentication with a fingerprint if I've only been gone for lunch (or maybe just for a trip to the loo), but require password or token if I've been gone for longer. There's a balance to be struck between convenience and security.

[11:20] | tech | Fingerprints as lightweight authentication

tfheen Thu, 27 Jun 2013 - Getting rid of NSCA using Python and Chef

NSCA is a tool used to submit passive check results to nagios. Unfortunately, an incompatibility was recently introduced between wheezy clients and old servers. Since I don't want to upgrade my server, this caused some problems and I decided to just get rid of NSCA completely.

The server side of NSCA is pretty trivial: it basically just adds a timestamp and a command name to the data sent by the client, then changes tabs into semicolons and stuffs all of that down Nagios' command pipe.

The script I came up with was:

#! /usr/bin/python
# -*- coding: utf-8 -*-

import time
import sys

# format is:
# [TIMESTAMP] COMMAND_NAME;argument1;argument2;…;argumentN
#
# For passive checks, we want PROCESS_SERVICE_CHECK_RESULT with the
# format:
#
# PROCESS_SERVICE_CHECK_RESULT;<host_name>;<service_description>;<return_code>;<plugin_output>
#
# return code is 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
#
# Read lines from stdin with the format:
# $HOSTNAME\t$SERVICE_NAME\t$RETURN_CODE\t$TEXT_OUTPUT

if len(sys.argv) != 2:
    print "Usage: {0} HOSTNAME".format(sys.argv[0])
    sys.exit(1)
HOSTNAME = sys.argv[1]

timestamp = int(time.time())
nagios_cmd = file("/var/lib/nagios3/rw/nagios.cmd", "w")
for line in sys.stdin:
    # Strip the trailing newline from the input line; the format string below
    # adds its own, and we don't want an extra empty line in the command pipe.
    (_, service, return_code, text) = line.rstrip("\n").split("\t", 3)
    nagios_cmd.write(u"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;{hostname};{service};{return_code};{text}\n".format
                     (timestamp = timestamp,
                      hostname = HOSTNAME,
                      service = service,
                      return_code = return_code,
                      text = text))

The reason for the hostname in the line (even though it's overridden) is to be compatible with send_nsca's input format.

Machines submit check results over SSH, using its excellent ForceCommand capabilities. The Chef template for the authorized_keys file looks like:

<% for host in @nodes %>
command="/usr/local/lib/nagios/nagios-passive-check-result <%= host[:hostname] %>",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa <%= host[:keys][:ssh][:host_rsa_public] %> <%= host[:hostname] %>
<% end %>

The actual Chef recipe looks like:

nodes = []
search(:node, "*:*") do |n|
  # Ignore not-yet-configured nodes
  next unless n[:hostname]
  next unless n[:nagios]
  next if n[:nagios].has_key?(:ignore)
  nodes << n
end
nodes.sort! { |a,b| a[:hostname] <=> b[:hostname] }
print nodes

template "/etc/ssh/userkeys/nagios" do
  source "authorized_keys.erb"
  mode 0400
  variables({
              :nodes => nodes
            })
end

cookbook_file "/usr/local/lib/nagios/nagios-passive-check-result" do
  mode 0555
end

user "nagios" do
  action :manage
  shell "/bin/sh"
end

To submit a check, hosts do:

printf "$HOSTNAME\t$SERVICE_NAME\t$RET\t$TEXT\n" | ssh -i /etc/ssh/ssh_host_rsa_key -o BatchMode=yes -o StrictHostKeyChecking=no -T nagios@$NAGIOS_SERVER
[10:09] | tech | Getting rid of NSCA using Python and Chef

tfheen Tue, 18 Jun 2013 - An otter, please (or, a better notification system)

Recently, there have been discussions on IRC and the debian-devel mailing list about how to notify users, typically from a cron script or a system daemon needing to tell the user their hard drive is about to fail. The current way is generally "send email to root" and, for some bits, "pop up a notification bubble, hoping the user will see it". Emailing me means I get far too many notifications. They're often not actionable (apt-get update failed two days ago) and they're not aggregated.

I think we need a system that at its core has level and edge triggers and some way of doing flap detection. A level trigger means "tell me if a disk is full right now". An edge trigger means "tell me if the checksums have changed, even if they now look ok". Flap detection means "tell me if the nightly apt-get update fails more often than once a week". It would be useful if it could extrapolate some notifications too, so it could tell me "your disk is going to be full in $period unless you add more space".
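
To make the flap-detection part concrete, here is a toy sketch in shell; the log file and its format are made up for the example:

# Only alert when the nightly job has failed more than once in the last seven runs.
failures=$(tail -n 7 /var/log/nightly-apt-update.log | grep -c '^FAIL')
if [ "$failures" -gt 1 ]; then
    echo "apt-get update failed $failures times in the last week" \
        | mail -s "apt-get update is flapping" root
fi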

The system needs to be able to take input in a variety of formats: syslog, unstructured output from cron scripts (including their exit codes), SNMP, Nagios notifications, sockets and FIFOs and so on. Based on those inputs and any correlations it can pull out of them, it should try to reason about what's happening on the system. If the conclusion is "something is broken", it should see if it's something it can reasonably fix by itself. If so, fix it and record the fix (so it can be used for notification if appropriate: I want to be told if apache is being restarted every two minutes). If it can't fix it, notify the admin.

It should also group similar messages so a single important message doesn't drown in a million unimportant ones. Ideally, this aggregation should work across hosts. It should also be possible to escalate notifications if they're not handled within some time period.

I'm not aware of such a tool. Maybe one could be rigged together by careful application of logstash, nagios, munin/ganglia/something and sentry. If anybody knows of such a tool, or if you're working on one, please let me know.

[09:15] | tech | An otter, please (or, a better notification system)

tfheen Fri, 22 Mar 2013 - Sharing an SSH key, securely

Update: This isn't actually that much better than letting them access the private key, since nothing is stopping the user from running their own SSH agent, which can be run under strace. A better solution is in the works. Thanks Timo Juhani Lindfors and Bob Proulx for both pointing this out.

At work, we have a shared SSH key between the different people manning the support queue. So far, this has just been a file in a directory where everybody could read it and people would sudo to the support user and then run SSH.

This has bugged me a fair bit, since there was nothing stopping a person from making a copy of the key onto their laptop, except policy.

Thanks to a tip, I got around to implementing this and figured writing up how to do it would be useful.

First, you need a directory readable by root only; I use /var/local/support-ssh here. The other bits you need are a small sudo snippet and a profile.d script.
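
Setting up the directory itself might look something like this; where you copy the key from is up to you, and the file name is just what the sudo rule below expects:

# Root-only directory holding the shared key.
install -d -m 0700 -o root -g root /var/local/support-ssh
install -m 0600 -o root -g root /path/to/existing/id_rsa /var/local/support-ssh/id_rsa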

My sudo snippet looks like:

Defaults!/usr/bin/ssh-add env_keep += "SSH_AUTH_SOCK"
%support ALL=(root)  NOPASSWD: /usr/bin/ssh-add /var/local/support-ssh/id_rsa

This lets everybody in the support group run ssh-add as root, but only to add that particular key.

The profile.d script goes in /etc/profile.d/support.sh and looks like:

if [ -n "$(groups | grep -E "(^| )support( |$)")" ]; then
    # Only members of the support group get the shared agent set up.
    export SSH_AUTH_ENV="$HOME/.ssh/agent-env"
    if [ -f "$SSH_AUTH_ENV" ]; then
        . "$SSH_AUTH_ENV"
    fi
    # ssh-add -l exits with 2 if it cannot talk to an agent; start one in that case.
    ssh-add -l >/dev/null 2>&1
    if [ $? = 2 ]; then
        mkdir -p "$HOME/.ssh"
        rm -f "$SSH_AUTH_ENV"
        ssh-agent > "$SSH_AUTH_ENV"
        . "$SSH_AUTH_ENV"
    fi
    # The sudo rule only permits adding this particular key.
    sudo ssh-add /var/local/support-ssh/id_rsa
fi

The key is unavailable to the user in question because ssh-agent is setgid: it runs with group ssh, and the process is only debuggable by root. The only things missing are that there's no way to have the agent prompt before using the key, and I would like it to die or at least unload keys when the last session for a user is closed, but neither seems trivial to do.

[09:45] | tech | Sharing an SSH key, securely

tfheen Tue, 29 Jan 2013 - Abusing sbuild for fun and profit

Over the last couple of weeks, I have been working on getting binary packages for Varnish modules built. In the current version, you need to have a built, unpacked source tree to build a module against. This is being fixed in the next version, but until then, I needed to provide this in the build environment somehow.

RPMs were surprisingly easy, since our RPM build setup is much simpler and doesn't use mock/mach or other chroot-based tools. Just make a source RPM available and unpack + compile that.

Debian packages, on the other hand, were not easy to get going. My first problem was just getting the Varnish source package into the chroot. I ended up making a directory in /var/lib/sbuild/build, which is exposed as /build once sbuild runs. The other hard part was getting Varnish itself built. sbuild exposes two hooks that could work: a pre-build hook and a chroot-setup hook. Neither worked: pre-build is called before the chroot is set up, so we can't build Varnish there, and chroot-setup is run before the build-dependencies are installed and runs as the user invoking sbuild, so it can't install packages.

Sparc32 and similar architectures use the linux32 tool to set the personality before building packages. I ended up abusing this, so I set HOME to a temporary directory where I create a .sbuildrc which sets $build_env_cmnd to a script which in turn unpacks the Varnish source, builds it and then chains to dpkg-buildpackage. Of course, the build-dependencies for modules don't include all the build-dependencies for Varnish itself, so I have to extract those from the Varnish source package too.
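
To give an idea of the shape of this, here is a hypothetical sketch of what such a script could look like; the paths and the Varnish build steps are illustrative, not my actual setup:

#! /bin/sh
# Sketch only: unpack the Varnish source made available under /build, build it
# so modules have a tree to compile against, then chain to the normal build.
set -e
dpkg-source -x /build/varnish_*.dsc /build/varnish-source
( cd /build/varnish-source && ./configure && make )
exec dpkg-buildpackage "$@"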

No source available at this point, mostly because it's beyond ugly. I'll see if I can get it cleaned up.

[15:32] | tech | Abusing sbuild for fun and profit

tfheen Mon, 28 Jan 2013 - FOSDEM talk: systemd in Debian

Michael Biebl and I are giving a talk on systemd in Debian at FOSDEM on Sunday morning at 10. We'll be talking a bit about the current state in Wheezy, what our plans for Jessie are and what Debian packagers should be aware of. We would love to get input from people about what systemd in Jessie should look like, so if you have any ideas, opinions or insights, please come along. If you're just curious, you are also of course welcome to join.

[16:20] | Debian | FOSDEM talk: systemd in Debian

tfheen Thu, 17 Jan 2013 - Gitano – git hosting with ACLs and other shininess

Gitano is not entirely unlike the non-web, server side of GitHub. It allows you to create and manage users and their SSH keys, groups and repositories from the command line. Repositories have ACLs associated with them. Those can be complex ("allow user X to push to master in the doc/ subtree") or trivial ("admin can do anything"). Gitano is written by Daniel Silverstone, and I'd like to thank him both for writing it and for holding my hand as I went stumbling through my initial Gitano setup.

Getting started with Gitano can be a bit tricky, as it's not yet packaged and fairly undocumented. Until it is packaged, it's install from source time. You need luxio, lace, supple, clod, gall and gitano itself.

luxio needs a make install LOCAL=1; the others will be installed to /usr/local with just make install.
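
Assuming you have already cloned each component into the current directory, the install step boils down to something like this (run as a user that can write to /usr/local, or via sudo):

( cd luxio && make install LOCAL=1 )
for component in lace supple clod gall gitano; do
    ( cd "$component" && make install )
done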

Once everything is installed, create a user to hold the instance. I've named mine git, but you're free to name it whatever you would like. As that user, run gitano-setup and answer the prompts. I'll use git.example.com as the host name and john as the user I'm setting this up for.

To create users, run ssh git@git.example.com user add john john@example.com John Doe, then add their SSH key with ssh git@git.example.com as john sshkey add workstation < /tmp/john_id_rsa.pub.

To create a repository, run ssh git@git.example.com repo create myrepo. Out of the box, this only allows the owner (typically "admin", unless overridden) to do anything with it. To change ACLs, you'll want to grab the refs/gitano/admin branch. This lives outside of the space git usually uses for branches, so you can't just check it out. The easiest way to check it out is to use git-admin-clone. Run it as git-admin-clone git@git.example.com:myrepo ~/myrepo-admin and then edit in ~/myrepo-admin. Use git to add, commit and push as normal from there.
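
Put together, the edit cycle looks roughly like this, with host and repository names as in the examples above:

git-admin-clone git@git.example.com:myrepo ~/myrepo-admin
cd ~/myrepo-admin
$EDITOR rules/main.lace    # the ACL rules, described below
git add rules/main.lace
git commit -m "Tighten ACLs for myrepo"
git push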

To change ACLs for a given repo, you'll want to edit the rules/main.lace file. A real-world example can be found in the NetSurf repository, and reading up on the lace syntax might be useful. A lace file consists of four types of lines; the ones you will mostly care about here are definitions, allows and denials.

Rules are processed one by one from the top, and processing terminates as soon as a matching allow or deny is found.

Conditions can be matches against an update, such as ref refs/heads/master to match updates to the master branch. To create groupings, you can use the anyof or allof verbs in a definition. Allows and denials are checked against all the definitions listed, and if all of them match, the appropriate action is taken.

Pay some attention to which conditions you group together, since a basic operation (op_is_basic, i.e. op_read or op_write) happens before git is even involved and you don't have a tree at that point, so rules like:

define is_master ref refs/heads/master
allow "Devs can push" op_is_basic is_master

simply won't work. You'll want to use a group and check on that for basic operations and then have a separate rule to restrict refs.

[07:34] | tech | Gitano – git hosting with ACLs and other shininess

tfheen Tue, 04 Sep 2012 - Driving Jenkins using YAML and a bit of python

We recently switched from Buildbot to Jenkins at work, for building Varnish on various platforms. Buildbot worked-ish, but was a bit fiddly to get going on some platforms such as Mac OS and Solaris. Where Buildbot has a daemon on each node that is responsible for contacting the central host, Jenkins uses SSH as the transport and centrally manages retries if a host goes down or is rebooted.

All in all, we are pretty happy with Jenkins, except for one thing: the job configurations are a bunch of XML files, and the way you are supposed to manage them is through a web interface. That doesn't scale particularly well when you want to build many very similar jobs. We want to build multiple branches, some of which are not public, and we want to build on many slaves. The latter we could partially solve with matrix builds, except that a matrix build fails the entire build if a single slave fails with a transient error that goes away on retry. As the number of slaves increases, such failures become more common.

To solve this, I hacked together a crude tool that takes a YAML file and writes the XML files. It's nowhere near as well structured and pretty as liw's jenkinstool, but it is quite good at translating the YAML into a bunch of XML files. I don't know if it's useful for anybody else, and there is no documentation, but if you want to take a look, it's on github.

Feedback is most welcome, as usual. Patches even more so.

[13:03] | varnish | Driving Jenkins using YAML and a bit of python

Tollef Fog Heen <tfheen@err.no>