2013-01-28 – FOSDEM talk: systemd in Debian
Michael Biebl and I are giving a talk on systemd in Debian at FOSDEM on Sunday morning at 10. We’ll be talking a bit about the current state in Wheezy, what our plans for Jessie are and what Debian packagers should be aware of. We would love to get input from people about what systemd in Jessie should look like, so if you have any ideas, opinions or insights, please come along. If you’re just curious, you are also of course welcome to join.
2013-01-17 – Gitano – git hosting with ACLs and other shininess
gitano is not entirely unlike the non-web, server side of github. It allows you to create and manage users and their SSH keys, groups and repositories from the command line. Repositories have ACLs associated with them. Those can be complex (“allow user X to push to master in the doc/ subtree”) or trivial (“admin can do anything”). Gitano is written by Daniel Silverstone, and I’d like to thank him both for writing it and for holding my hand as I went stumbling through my initial gitano setup.
Getting started with Gitano can be a bit tricky, as it’s not yet packaged and fairly undocumented. Until it is packaged, it’s install from source time. You need luxio, lace, supple, clod, gall and gitano itself.
luxio needs a make install LOCAL=1; the others will be installed to /usr/local with just make install.
Once that is installed, create a user to hold the instance. I’ve named mine git, but you’re free to name it whatever you would like. As that user, run gitano-setup and answer the prompts. I’ll use git.example.com as the host name and john as the user I’m setting this up for.
To create users, run ssh git@git.example.com user add john john@example.com John Doe, then add their SSH key with ssh git@git.example.com as john sshkey add workstation < /tmp/john_id_rsa.pub.
To create a repository, run ssh git@git.example.com repo create myrepo. Out of the box, this only allows the owner (typically “admin”, unless overridden) to do anything with it. To change ACLs, you’ll want to grab the refs/gitano/admin branch. This lives outside of the space git usually uses for branches, so you can’t just check it out. The easiest way to check it out is to use git-admin-clone. Run it as git-admin-clone git@git.example.com:myrepo ~/myrepo-admin and then edit in ~/myrepo-admin. Use git to add, commit and push as normal from there.
To change ACLs for a given repo, you’ll want to edit the rules/main.lace file. A real-world example can be found in the NetSurf repository, and the lace syntax documentation might be useful. A lace file consists of four types of lines:
- Comments, start with -- or #
- defines, look like define name conditions
- allows, look like allow "reason" definition [definition…]
- denials, look like deny "reason" definition [definition…]
Rules are processed one by one from the top, and processing terminates whenever a matching allow or deny is found.
Conditions can be matches against an update, such as ref refs/heads/master to match updates to the master branch. To create groupings, you can use the anyof or allof verbs in a definition. Allows and denials are checked against all the definitions listed, and if all of them match, the appropriate action is taken.
Pay some attention to what conditions you group together, since a basic operation (is_basic_op, aka op_read and op_write) happens before git is even involved and you don’t have a tree at that point, so rules like:
define is_master ref refs/heads/master
allow "Devs can push" op_is_basic is_master
simply won’t work. You’ll want to use a group and check on that for basic operations and then have a separate rule to restrict refs.
2012-09-04 – Driving Jenkins using YAML and a bit of python
We recently switched from Buildbot to Jenkins at work, for building Varnish on various platforms. Buildbot worked-ish, but was a bit fiddly to get going on some platforms such as Mac OS and Solaris. Where buildbot has a daemon on each node that is responsible for contacting the central host, Jenkins uses SSH as the transport and centrally manages retries if a host goes down or is rebooted.
All in all, we are pretty happy with Jenkins, except for one thing: the job configurations are a bunch of XML files and the way you are supposed to configure them is through a web interface. That doesn’t scale particularly well when you want to build many very similar jobs. We want to build multiple branches, some of which are not public, and we want to build on many slaves. The latter we could partially solve with matrix builds, except a matrix build fails entirely if a single slave fails with an error that would go away on retry. As the number of slaves increases, such failures become more common.
To solve this, I hacked together a crude tool that takes a yaml file and writes the XML files. It’s not anywhere near as well structured and pretty as liw’s jenkinstool, but it is quite good at translating the YAML into a bunch of XML files. I don’t know if it’s useful for anybody else, there is no documentation and so on, but if you want to take a look, it’s on github.
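To give a flavour of the approach (a minimal sketch, not the actual tool; the YAML layout below is made up for illustration), generating a freestyle job skeleton per YAML entry can look roughly like this:

#!/usr/bin/env python
# Minimal sketch of going from YAML to Jenkins config.xml files.
# The YAML layout (jobs -> name/branch/command) is made up for
# illustration; the real tool linked above does considerably more.
import os
import xml.sax.saxutils
import yaml

TEMPLATE = """<?xml version='1.0' encoding='UTF-8'?>
<project>
  <description>Build {name}, branch {branch}</description>
  <scm class="hudson.scm.NullSCM"/>
  <builders>
    <hudson.tasks.Shell>
      <command>{command}</command>
    </hudson.tasks.Shell>
  </builders>
</project>
"""

def generate(yaml_file, outdir):
    with open(yaml_file) as f:
        config = yaml.safe_load(f)
    for job in config["jobs"]:
        jobdir = os.path.join(outdir, job["name"])
        os.makedirs(jobdir, exist_ok=True)
        with open(os.path.join(jobdir, "config.xml"), "w") as out:
            out.write(TEMPLATE.format(
                name=xml.sax.saxutils.escape(job["name"]),
                branch=xml.sax.saxutils.escape(job["branch"]),
                command=xml.sax.saxutils.escape(job["command"])))

if __name__ == "__main__":
    generate("jobs.yaml", "jenkins-jobs")

Each generated directory can then be dropped under Jenkins’ jobs/ directory and the configuration reloaded from disk.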
Feedback is most welcome, as usual. Patches even more so.
2012-07-23 – Automating managing your on-call support rotation using google docs
At work, we have a rotation of who is on call at a given time. We have few calls, but they do happen, so it’s important to ensure both that a person is available and that they’re aware they are on call (so they don’t stray too far from their phone or a computer).
In the grand tradition of abusing spreadsheets, we are using google docs for the roster. It’s basically just two columns, one with date and one with user name. Since the volume is so low, people tend to be on call for about a week at a time, 24 hours a day.
Up until now, we’ve just had a pretty old and dumb phone that people have carried around, but that’s not really swish, so I have implemented a small system which grabs the current data, looks up the support person in LDAP and sends SMSes when people go on and off duty as well as reminding the person who’s on duty once a day.
If you’re interested, you can look at the (slightly redacted) script.
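The core of it boils down to something like the following sketch; the spreadsheet export URL, the LDAP details and the send_sms helper are placeholders rather than our actual setup:

#!/usr/bin/env python
# Rough sketch of the on-call reminder script: read the roster from a
# CSV export of the spreadsheet, look up the person's phone number in
# LDAP and notify them. The URL, base DN and send_sms() are placeholders.
import csv
import datetime
import urllib.request

import ldap  # python-ldap

ROSTER_URL = "https://docs.google.com/spreadsheets/d/PLACEHOLDER/export?format=csv"
LDAP_URI = "ldap://ldap.example.com"
LDAP_BASE = "ou=people,dc=example,dc=com"

def current_oncall():
    """Return the user name on the last roster row whose date has passed."""
    today = datetime.date.today().isoformat()
    data = urllib.request.urlopen(ROSTER_URL).read().decode()
    on_call = None
    for row in csv.reader(data.splitlines()):
        if len(row) < 2:
            continue
        date, user = row[0], row[1]
        if date <= today:  # assumes ISO-formatted dates, one row per handover
            on_call = user
    return on_call

def phone_number(user):
    """Look up the mobile number for a user in LDAP."""
    conn = ldap.initialize(LDAP_URI)
    conn.simple_bind_s()
    dn, attrs = conn.search_s(LDAP_BASE, ldap.SCOPE_SUBTREE,
                              "(uid=%s)" % user, ["mobile"])[0]
    return attrs["mobile"][0].decode()

def send_sms(number, message):
    # Placeholder: hand the message to whatever SMS gateway you use.
    print("Would send to %s: %s" % (number, message))

if __name__ == "__main__":
    user = current_oncall()
    send_sms(phone_number(user), "Reminder: you are on call today.")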
2011-10-21 – Today's rant about RPM
Before I start, I’ll admit that I’m not a real RPM packager. Maybe I’m approaching this from completely the wrong direction; what do I know?
I’m in the process of packaging Varnish 3.0.2 which includes mangling the spec file. The top of the spec file reads:
%define v_rc
%define vd_rc %{?v_rc:-%{?v_rc}}
Apparently, this is not legal, since we’re trying to define v_rc as a macro with no body. It’s however not possible to directly define it as an empty string which can later be tested on; you have to do something like:
%define v_rc %{nil}
%define vd_rc %{?v_rc:-%{?v_rc}}
Now, this doesn’t work correctly either. %{?macro} tests if macro is defined, not whether it’s an empty string, so instead of two lines, we have to write:
%define v_rc %{nil}
%if 0%{?v_rc} != 0
%define vd_rc %{?v_rc:-%{?v_rc}}
%endif
The 0%{?v_rc} != 0 workaround is there so that we don’t accidentally end up with == 0, which would be a syntax error.
I think having four lines like that is pretty ugly, so I looked for a workaround and figured that, ok, I’ll just rewrite every use of %{vd_rc} to %{?v_rc:-%{?v_rc}}. There are only a couple, so the damage is limited. Also, I’d then just comment out the v_rc definition, since that makes it clear what you should uncomment to have a release candidate version.
In my naivety, I tried:
# %define v_rc ""
# is used as a comment character in spec files, but apparently not for defines. The define was still processed and the build process stopped pretty quickly.
Luckily, doing # % define "" seems to work fine and is not processed. I have no idea how people put up with this or if I’m doing something very wrong. Feel free to point me at a better way of doing this, of course.
2011-10-05 – The SugarCRM rest interface
We use SugarCRM at work and I’ve complained about its not-very-RESTy REST interface. John Mertic, a (the?) SugarCRM Community Manager, asked me what problems I’d had (apart from its lack of RESTfulness) and I said I’d write a blog post about it.
In our case, the REST interface is used to integrate Sugar and RT so we get a link in both interfaces to jump from opportunities to the corresponding RT ticket (and back again). This should be a fairly trivial exercise, or so you would think.
The problems, as I see it, are:
- Not REST-y.
- Exposes the database tables all the way through the REST interface
- Lack of useful documentation forcing the developer to cargo cult and guess
- Annoying data structures
- Forced pagination
My first gripe is the complete lack of REST in the URLs. Everything is just sent to https://sugar/service/v2/rest.php. Usually a POST, but sometimes a GET. It’s not documented what to use where.
The POST parameters we send when logging in are:
method=>"login"
input_type=>"JSON"
response_type=>"JSON"
rest_data=>json($params)
$params is a hash as follows:
user_auth => {
user_name => $USERNAME,
password => $PW,
version => "1.2",
},
application => "foo",
Nothing seems to actually care about the value of application, nor about the user_auth.version value. The password is the md5 of the actual password, hex encoded. I’m not sure why, as this adds absolutely no security, but there it is. This is also not properly documented.
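Put together, the login call ends up looking roughly like this; a sketch in Python using requests, with the field names and md5 handling as described above:

# A sketch of the login call described above, using Python and the
# requests library for illustration.
import hashlib
import json

import requests

URL = "https://sugar/service/v2/rest.php"

def login(username, password):
    params = {
        "user_auth": {
            "user_name": username,
            # The password is the hex-encoded md5 of the real password,
            # which adds no security but is what the API expects.
            "password": hashlib.md5(password.encode()).hexdigest(),
            "version": "1.2",
        },
        "application": "foo",  # nothing seems to care about this value
    }
    response = requests.post(URL, data={
        "method": "login",
        "input_type": "JSON",
        "response_type": "JSON",
        "rest_data": json.dumps(params),
    })
    return response.json()["id"]  # the session id

session_id = login("john", "s3cret")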
This gives us a JSON object back with a somewhat haphazard selection of attributes (reformatted here for readability):
{
"id":"<hex session id>,
"module_name":"Users",
"name_value_list": {
"user_id": {
"name":"user_id",
"value":"1"
},
"user_name": {
"name":"user_name",
"value":"<username>"
},
"user_language": {
"name":"user_language",
"value":"en_us"
},
"user_currency_id": {
"name":"user_currency_id",
"value":"-99"
},
"user_currency_name": {
"name":"user_currency_name",
"value":"Euro"
}
}
}
What is the module_name? No real idea. In general, when you get back an id and a module_name field, it tells you that the id is an object that exists in the context of the given module. Not here, since the session id is not a user.
The worst here is the name_value_list concept, which is used all over the REST interface. First, it’s not a list, it’s a hash. Secondly, I have no idea what would be wrong with just using keys directly in the top-level object, so the object would have looked somewhat like:
{
"id":"<hex session id>,
"user_id": 1,
"user_name": "<username>,
"user_language":"en_us",
"user_currency_id": "-99",
"user_currency_name": "Euro"
}
Some people might argue that since you can have custom field names this can cause clashes. Except it can’t, since they’re all suffixed with _c.
So we’re now logged in and can fetch all opportunities. This we do by posting:
method=>"get_entry_list",
input_type=>"JSON",
response_type=>"JSON",
rest_data=>to_json([
$sid,
$module,
$where,
"",
$next,
$fields,
$links,
1000
])
- $sid is our session id from the login
- $module is “Opportunities”
- $where is opportunities_cstm.rt_id_c IS NOT NULL. Yes, that’s right: an SQL fragment right there, and you have to know that you’ll join the opportunities_cstm and opportunities tables because we are using a custom field. I find this completely crazy.
- $next starts out at 0 and we’re limited to 1000 entries at a time. There is, apparently, no way to say “just give me all you have”.
- $fields is an array, in our case consisting of id, name, description, rt_id_c and rt_status_c. To find out the field names, look at the database schema or poke around in the SugarCRM studio.
- $links is to link records together. I still haven’t been able to make this work properly and just do multiple queries.
- 1000 is the maximum number of records. No, you can’t say -1 and get everything.
Why this is a list rather than a hash? Again, I don’t know. A hash would make more sense to me.
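For illustration, the request above looks roughly like this from Python; again a sketch reusing the session id from the login example, where the empty links list is an assumption on my part:

# A sketch of the get_entry_list call, continuing from the login example.
import json

import requests

URL = "https://sugar/service/v2/rest.php"

def get_opportunities(session_id, offset=0):
    rest_data = [
        session_id,
        "Opportunities",
        "opportunities_cstm.rt_id_c IS NOT NULL",  # raw SQL fragment, as noted
        "",                                        # left empty in our case
        offset,                                    # $next
        ["id", "name", "description", "rt_id_c", "rt_status_c"],
        [],                                        # $links; empty here, we just do multiple queries
        1000,                                      # maximum number of records
    ]
    response = requests.post(URL, data={
        "method": "get_entry_list",
        "input_type": "JSON",
        "response_type": "JSON",
        "rest_data": json.dumps(rest_data),
    })
    return response.json()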
The resulting JSON looks like:
{
"result_count" : 16,
"relationship_list" : [],
"entry_list" : [
{
"name_value_list" : {
"rt_status_c" : {
"value" : "resolved",
"name" : "rt_status_c"
},
[…]
},
"module_name" : "Opportunities",
"id" : "<entry_uuid>"
},
[…]
],
"next_offset" : 16
}
Now, entry_list actually is a list here, which is good and all, but there’s still the annoying name_value_list concept.
Last, we want to update the record in Sugar. To do this, we do:
method=>"set_entry",
input_type=>"JSON",
response_type=>"JSON",
rest_data=>to_json([
$sid,
"Opportunities",
$fields
])
$fields is not a name_value_list, but instead is:
{
"rt_status_c" : "resolved",
"id" : "<status text>"
}
Why this works when my attempts at using a proper name_value_list didn’t, I have no idea.
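For completeness, a sketch of the working update:

# A sketch of the set_entry call that updates the RT status on an opportunity.
import json

import requests

URL = "https://sugar/service/v2/rest.php"

def set_status(session_id, entry_id, status):
    response = requests.post(URL, data={
        "method": "set_entry",
        "input_type": "JSON",
        "response_type": "JSON",
        "rest_data": json.dumps([
            session_id,
            "Opportunities",
            {"id": entry_id, "rt_status_c": status},
        ]),
    })
    return response.json()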
I think that pretty much sums it up. I’m sure there are other problems in there (such as the over 100 lines of support code for the about 20 lines of actual code that does useful work), though.
2011-08-31 – Bizarre slapd (and gnutls) failures
Just this morning, I was setting up TLS on a LDAP host, but slapd
refused to start afterwards with a bizarre error message:
TLS init def ctx failed: -207
The key and certificate were freshly generated using openssl on my laptop (running wheezy, so OpenSSL 1.0.0d-3). After a bit of googling, I discovered that -207 is gnutls-esque for “Base64 error”. Of course, the key looks just fine and decodes fine using base64, openssl base64 and even gnutls’s own certtool.
Now, certtool also spits out what it considers the right base64 version of the key and I noticed it differed. Using the one certtool output seems to work, though, so if you ever run into this problem, try running the key through certtool --infile foo.pem -k and use the base64 representation it outputs.
2011-08-03 – libvmod_curl – using cURL from inside Varnish Cache
It’s sometimes necessary to be able to access HTTP resources from inside VCL. Some use cases include authentication or authorization, where a service validates a token and then tells Varnish whether to proceed or not.
To do this, we recently implemented libvmod_curl, which is a set of cURL bindings for VCL so you can fetch remote resources easily. HTTP would be the usual method, but cURL also supports other protocols such as LDAP or POP3.
The API is very simple; to use it you would do something like:
import curl;
sub vcl_recv {
curl.fetch("http://authserver/validate?key=" + regsub(req.url, ".*key=([a-z0-9]+)", "\1"));
if (curl.status() != 200) {
error 403 "Go away";
}
}
Other methods you can use are curl.header(headername) to get the contents of a given header and curl.body() to get the body of the response. See the README file in the source for more information.
2011-05-21 – Upgrading Alioth
A while ago, we got another machine for hosting Alioth and so we started thinking about how to use that machine. It’s a used machine and not massively faster than the current hardware, so just moving everything over wouldn’t actually get us that much of a performance upgrade.
However, Alioth is using FusionForge, which is supposed to be able to run on a cluster of machines. After all, this was originally built for SourceForge.net, which certainly does not run on a single host. So, a split of services is what we’ll do.
This weekend, we’re having a sprint in Collabora’s office in Cambridge, actually implementing the split and doing a bit of general planning for the future.
Yesterday afternoon (Friday), European time, we started the migration. The first step is to move all the data off the Xen guest on wagner, where Alioth is currently hosted. This finished a few minutes ago; it turns out syncing about 8.5 million files across almost 400G of data takes a little while.
The new host is called vasks and will host the database, run the main apache and be the canonical location for the various SCM repositories.
We are not decommissioning wagner, but it’ll be reinstalled without Xen or other virtualisation, which should help performance a bit. It’ll host everything that has lower performance requirements, such as cron jobs, mailing lists and so on.
I’ll try to keep you all updated and feel free to drop by #alioth on irc.debian.org if you have any questions.
2010-11-30 – My Varnish is leaking memory
Every so often, we get bug reports about Varnish leaking memory. People have told Varnish to use 20 gigabytes for cache, then they discover the process is eating 30 gigabytes of memory and get confused about what’s going on. So, let’s take a look.
First, a little bit of history. Varnish 2.0 had a fixed per-object workspace which was used both for header manipulations in vcl_fetch and for storing the headers of the object when vcl_fetch was done. The default size of this workspace was 8k. If we assume an average object size of 20k, that is almost 1/3 of the store being overhead.
With 2.1, this changed. First, vcl_fetch doesn’t have obj any longer; it only has beresp, which is the backend response. At the end of vcl_fetch, the headers and other relevant bits of the backend response are copied into an object. This means we no longer have a fixed overhead; we use what we need. Of course, we’re still subject to malloc’s whims when it comes to page sizes and how it actually allocates memory.
Less overhead means more objects in the store. More objects in the store means, everything else being equal, more overhead outside the store (for the hash buckets or critbit tree and other structs). This is where lots of people get confused, since what they see is just Varnish consuming more memory. When moving from 2.0 to 2.1, people should lower their cache size. How much depends on the number of objects they have, but if they have many small objects, a significant reduction might be needed. For a machine dedicated to Varnish, we usually recommend making the cache size 70-75% of the memory of the machine.
A reasonable question to ask at this point is what all this overhead is being used for. Part of it is a per-thread overhead. Linux has a 10MB stack size by default, but luckily, most of it isn’t allocated, so it only counts against virtual, not resident memory. In addition, we have a hash algorithm which has overhead and the headers from the objects are stored in the object itself and not in the stevedore (object store). Last, but by no means least, we usually see an overhead of around 1k per object, but I have seen up to somewhere above 2k. This doesn’t sound like much, but when you’re looking at servers with 10 million objects, 1k of overhead means 10 gigabytes of total overhead, leading to the confusion I talked about at the start.