We recently switched from Buildbot to Jenkins at work, for
building Varnish on various platforms. Buildbot worked-ish, but
was a bit fiddly to get going on some platforms such as Mac OS and
Solaris. Where Buildbot has a daemon on each node that is responsible
for contacting the central host, Jenkins uses SSH as the transport and
centrally manages retries if a host goes down or is rebooted.
All in all, we are pretty happy with Jenkins, except for one thing:
The job configurations are a bunch of XML files and the way you are
supposed to configure this is through a web interface. That doesn't
scale particularly well when you want to build many very similar
jobs. We want to build multiple branches, some of which are not public,
and we want to build on many slaves. The latter we could partially
solve with matrix builds, except that a matrix build fails as a whole
if a single slave hits a transient error that would succeed on retry.
As the number of slaves increases, such failures become more common.
To solve this, I hacked together a crude tool that takes a YAML
file and writes the XML files. It's nowhere near as well
structured and pretty as liw's jenkinstool, but it is quite good
at translating the YAML into a bunch of XML files. I don't know if
it's useful for anybody else, and there is no documentation and so on,
but if you want to take a look, it's on GitHub.
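To give an idea of the approach, here is a minimal sketch in Python. It is not the actual tool: the job naming scheme, the keys in the configuration and the build command are all made up for illustration, a plain dict stands in for the parsed YAML file, and the XML is only a small subset of what a Jenkins freestyle job configuration contains. It generates one job per branch and node, instead of one matrix job.

```python
import xml.etree.ElementTree as ET

# Stand-in for the parsed YAML file. All keys and values here are
# hypothetical examples, not the format used by the real tool.
CONFIG = {
    "branches": ["master", "3.0"],
    "nodes": ["debian-amd64", "solaris-x86"],
    # Hypothetical build command; {branch} is filled in per job.
    "command": "git checkout {branch} && ./autogen.sh && ./configure && make check",
}


def job_name(branch, node):
    """Build a job name from branch and node (naming scheme is made up)."""
    return "varnish-%s-%s" % (branch, node)


def job_xml(branch, node, command):
    """Return a simplified freestyle job configuration as an XML string."""
    project = ET.Element("project")
    ET.SubElement(project, "description").text = (
        "Build branch %s on %s" % (branch, node))
    # Tie the job to one node, so a failure only affects this job.
    ET.SubElement(project, "assignedNode").text = node
    ET.SubElement(project, "canRoam").text = "false"
    builders = ET.SubElement(project, "builders")
    shell = ET.SubElement(builders, "hudson.tasks.Shell")
    ET.SubElement(shell, "command").text = command.format(branch=branch)
    return ET.tostring(project, encoding="unicode")


def generate_jobs(config):
    """Return a dict mapping job name to XML, one job per branch and node."""
    jobs = {}
    for branch in config["branches"]:
        for node in config["nodes"]:
            jobs[job_name(branch, node)] = job_xml(
                branch, node, config["command"])
    return jobs
```

Each generated string would then be written to the config.xml of the corresponding job directory on the Jenkins master. With two branches and two nodes, generate_jobs(CONFIG) returns four jobs, and adding a node to the list adds one job per branch.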
Feedback is most welcome, as usual. Patches even more so.
Every so often, we get bug reports about Varnish leaking memory.
People have told Varnish to use 20 gigabytes for cache and they
discover the process is eating 30 gigabytes of memory and they get
confused about what's going on. So, let's take a look.
First, a little bit of history. Varnish 2.0 had a fixed per-object
workspace which was used for both header manipulations in vcl_fetch
as well as for storing the headers of the object when vcl_fetch was
done. The default size of this workspace was 8k. If we assume an
average object size of 20k, that is 8k out of every 28k stored, or
almost 1/3 of the store being overhead.
With 2.1, this changed. First, vcl_fetch no longer has obj; it only
has beresp, which is the backend response. At the
end of vcl_fetch, the headers and other relevant bits of the backend
response are copied into an object. This means we no longer have a
fixed overhead; we use only what we need. Of course, we're still subject
to malloc's whims when it comes to page sizes and how it actually
allocates memory.
Less overhead means more objects in the store. More objects in the
store means, everything else being equal, more overhead outside the
store (for the hash buckets or critbit tree and other structs). This
is where lots of people get confused, since what they see is just
Varnish consuming more memory. When moving from 2.0 to 2.1, people
should lower their cache size. How much depends on the number of
objects they have, but if they have many small objects, a
significant reduction might be needed. For a machine dedicated to
Varnish, we usually recommend making the cache size be 70-75% of the
memory of the machine.
A reasonable question to ask at this point is what all this overhead
is being used for. Part of it is a per-thread overhead. Linux
typically has a default stack size of 8 to 10MB, depending on the
distribution, but luckily, most of it isn't allocated,
so it only counts against virtual, not resident memory. In addition,
we have a hash algorithm which has overhead and the headers from the
objects are stored in the object itself and not in the stevedore
(object store). Last, but by no means least, we usually see an
overhead of around 1k per object, but I have seen up to somewhere
above 2k. This doesn't sound like much, but when you're looking at
servers with 10 million objects, 1k of overhead per object means 10
gigabytes of total overhead, leading to the confusion I talked about
at the start.
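The arithmetic can be sketched in a few lines of Python. This is a rough estimate only: the function names are made up, it uses decimal units (1k = 1000 bytes) to match the round numbers above, and it ignores thread stacks and whatever malloc does behind the scenes.

```python
GB = 1000 ** 3  # decimal gigabyte, to match the round numbers in the text


def estimated_footprint(cache_bytes, n_objects, overhead_per_object):
    """Rough memory use: the cache itself plus per-object overhead."""
    return cache_bytes + n_objects * overhead_per_object


def cache_size_for_budget(budget_bytes, n_objects, overhead_per_object):
    """Cache size to configure so the estimate stays within the budget."""
    return budget_bytes - n_objects * overhead_per_object


# A 20 gigabyte cache holding 10 million objects at about 1k overhead each.
total = estimated_footprint(20 * GB, 10_000_000, 1000)
print(total / GB)
```

With these numbers the estimate comes to 30 gigabytes, which is the situation from the bug reports: a cache configured for 20 gigabytes and a process using 30. At 2k of overhead per object the same cache would come to 40.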
Currently, varnishlog does not support very advanced filtering. If
you run it with -o, you can also give it a tag and a regular
expression to match against that tag. An example would be
varnishlog -o TxStatus 404
which only shows log records where the transmitted status is 404
(not found).
While in Brazil, I needed something a bit more expressive. I needed
something that would tell me if I had vcl_recv call pass and the URL
ended in .jpg.
varnishlog -o -c | perl -ne 'BEGIN { $/ = "";} print if
(/RxURL.*jpg$/m and /VCL_call.*recv pass/);'
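fixed this for me. Setting $/ to the empty string puts Perl in
paragraph mode, so each blank-line-separated group of log lines is
read as one record and both expressions are matched against the whole
request.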
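If Perl one-liners aren't your thing, the same idea can be sketched in Python. This is just an illustration under the same assumption as the one-liner, namely that the grouped output separates requests with blank lines. The function names are made up.

```python
import re
import sys

# The same two expressions as in the Perl one-liner.
URL_RE = re.compile(r"RxURL.*jpg$", re.MULTILINE)
PASS_RE = re.compile(r"VCL_call.*recv pass")


def records(lines):
    """Group an iterable of lines into blank-line-separated records."""
    buf = []
    for line in lines:
        if line.strip():
            buf.append(line.rstrip("\n"))
        elif buf:
            yield "\n".join(buf)
            buf = []
    if buf:
        # Last record, if the input did not end with a blank line.
        yield "\n".join(buf)


def wanted(record):
    """True if the record has a URL ending in jpg and a pass from vcl_recv."""
    return bool(URL_RE.search(record) and PASS_RE.search(record))


def filter_stream(lines, out):
    """Write every matching record to out, followed by a blank line."""
    for record in records(lines):
        if wanted(record):
            out.write(record + "\n\n")
```

Saved as a script that ends with a call to filter_stream(sys.stdin, sys.stdout), it can take the place of the perl command in the pipeline. Adding another condition is then a matter of adding another expression to wanted().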