phk [Fri, 15 Feb 2008 08:54:20 +0000 (08:54 +0000)]
Determine our backend (using the director) before we filter the req
into the bereq, in order to be able to assign a default Host: header
if there is none.
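A minimal sketch of the new order of operations, with illustrative
names (these are not the actual varnishd structures or functions):

    struct backend { const char *hostname; };
    struct director { struct backend *(*choose)(void); };

    static const char *
    bereq_host(struct director *dir, const char *req_host)
    {
        /* Pick the backend host first ... */
        struct backend *be = dir->choose();

        /* ... so a req without a Host: header can default to the
         * chosen backend's host name when the bereq is built. */
        return (req_host != NULL ? req_host : be->hostname);
    }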
des [Wed, 13 Feb 2008 13:05:36 +0000 (13:05 +0000)]
Some source files (especially in libraries) have embedded test
programs. Add a configure option and a corresponding automake
conditional to enable these tests.
des [Mon, 11 Feb 2008 10:46:09 +0000 (10:46 +0000)]
Update backend declaration syntax. Note that a) vcl.7 needs a partial
rewrite to track this change, and b) there have been other changes which
also need to be merged in.
phk [Thu, 7 Feb 2008 09:52:26 +0000 (09:52 +0000)]
Remove the ident string from directors; they are not recycled.
Add vcl_name to backend hosts. Simple backends get the obvious
name; a director's inlined backend hosts get the director's name
with an array suffix, for instance "b1[1]".
phk [Wed, 6 Feb 2008 15:19:49 +0000 (15:19 +0000)]
First part of major backend overhaul.
*** Please do not use -trunk in production until I say so again ***
I have not entirely decided on the precise terminology, so the following
may sound a lot more complicated than it really is:
In VCL we can now have "backends" and "directors" both of which we
treat as a "backend".
When we define backends and directors in VCL, they refer to "backend
hosts" which is just another way to say "hostname+portname" but later
these will grow other parameters (max connections etc).
A director is a piece of code that somehow selects a "backend host";
"random" and "round-robin" are the first algorithms. A backend
can still be specified directly, of course; that is the "simple director",
which always returns the same "backend host".
This is probably where an example is in order:
    /* A backend as we know it */
    backend b1 {
        .host = "fs";
        .port = "80";
    }

    /* A director */
    director b2 random {
        {
            /* We can refer to named backends */
            .backend = b1;
            .weight = 7;
        }
        {
            /* Or define them inline */
            .backend = {
                .host = "fs2";
            }
            .weight = 3;
        }
    }

    sub vcl_recv {
        if (req.url ~ "\[[a-z]]") {
            set req.backend = b2;
        } else {
            set req.backend = b1;
        }
    }
This results in quite a lot of changes in the C code, VRT API and
VCL compiler, the major thrust being:
Directors like "simple" and "random" will not have to think about
the actual connections to the backends, but just concentrate on
selecting which backend should be used.
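As a minimal sketch of that split (illustrative types and names, not
the actual VRT API), a director's entire contract can be a single
choose() hook; the weighted "random" director then contains no
connection logic at all:

    #include <stdlib.h>

    struct backend_host {
        const char *hostname;
        const char *portname;
        double weight;
    };

    struct director {
        const char *name;
        /* The whole director contract: pick a host, nothing more. */
        struct backend_host *(*choose)(struct director *);
        void *priv;
    };

    /* One possible "random" implementation: weighted pick. */
    struct random_priv {
        struct backend_host *hosts;
        int nhosts;
        double wsum;            /* sum of all weights */
    };

    static struct backend_host *
    random_choose(struct director *d)
    {
        struct random_priv *rp = d->priv;
        double r = drand48() * rp->wsum;
        int i;

        for (i = 0; i < rp->nhosts; i++) {
            r -= rp->hosts[i].weight;
            if (r <= 0.0)
                return (&rp->hosts[i]);
        }
        return (&rp->hosts[rp->nhosts - 1]);
    }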
When a new VCL is loaded, it will instantiate all directors, but
try to reuse any preexisting "backend hosts" (which we still
call "backend" in the C code).
This is simple for a backend like "b1" in the example above, but
slightly more complex for the backend inlined in b2. The VCL
compiler solves this by qualifying the ident string for the inlined
backend host with the prefix "b2 random :: 2 :: ", so that a reload
of the same director with the same (unchanged) inline backend host
will match, but none other will.
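Illustratively (this helper is hypothetical; only the prefix format is
taken from the description above), the qualification amounts to:

    #include <stdio.h>

    /* Hypothetical: prefix an inline backend host's ident with the
     * director's name, kind and slot, e.g. "b2 random :: 2 :: ..." */
    static void
    qualify_ident(char *buf, size_t len, const char *dname,
        const char *dkind, int slot, const char *host_ident)
    {
        (void)snprintf(buf, len, "%s %s :: %d :: %s",
            dname, dkind, slot, host_ident);
    }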
One implication of instantiating all directors for every VCL load
is that private statistics cannot be reused, but stats on the
backend host can. This is likely a very fine point of no consequence.
Once the backend is selected by the director, the generic code in
cache_backend.c will cope with reusing the connection pool,
establishing connections and all that, moving most of the nastiness
out of directors and leaving cache_dir_simple.c with only 96 lines of
code, of which the license is a large fraction.
Until now, we have done automatic DNS re-lookups, but they seem to
cause more grief than advantage (I suspect some of the DNS lookups
are responsible for long timeouts), so that will be dropped, and
instead we might add an explicit CLI command for this later.
The code as here committed can handle a couple of simple requests,
but there are a large number of INCOMPL()'s that need to be resolved
before this is ready for prime time again.
des [Sun, 3 Feb 2008 22:27:15 +0000 (22:27 +0000)]
Clean up checks for non-portable pthread extensions, and add a check for
pthread_mutex_islocked_np() (not present on any platform I know of, but I
am testing a FreeBSD patch)
phk [Sun, 3 Feb 2008 15:59:01 +0000 (15:59 +0000)]
Look for the new SF_SYNC facility in FreeBSD's sendfile(2), and if we
find it, allow its use, but still default it to off via the
sendfile_threshold parameter.
SF_SYNC is only available in FreeBSD-current as of a few seconds
ago, and is unlikely to appear in any release before FreeBSD-8.0
for intricate reasons of ABI compliance.
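A sketch of how the facility would be used; since SF_SYNC is simply
absent elsewhere, it has to be probed for at build time:

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    static int
    send_whole_file(int fd, int sock, off_t off, size_t len)
    {
        off_t sent = 0;
        int flags = 0;

    #ifdef SF_SYNC
        /* Don't return until the kernel is done with the pages. */
        flags |= SF_SYNC;
    #endif
        return (sendfile(fd, sock, off, len, NULL, &sent, flags));
    }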
des [Sat, 2 Feb 2008 10:58:05 +0000 (10:58 +0000)]
Add an ALOCKED() macro which asserts that a mutex is locked. Unfortunately,
there is no portable way to do this, so we have to fake it by trying to
lock the mutex and asserting that it fails. This can be very expensive,
so we only do it when built with --enable-diagnostics.
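A self-contained version of the trick (conceptually; the macro in the
tree is compiled in only with --enable-diagnostics):

    #include <assert.h>
    #include <errno.h>
    #include <pthread.h>

    /* Assert that a (non-recursive) mutex is currently held:
     * trylock on a held mutex must fail with EBUSY. */
    #define ALOCKED(mtx) assert(pthread_mutex_trylock(&(mtx)) == EBUSY)

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

    int
    main(void)
    {
        pthread_mutex_lock(&m);
        ALOCKED(m);             /* passes: we own the lock */
        pthread_mutex_unlock(&m);
        return (0);
    }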
phk [Tue, 29 Jan 2008 16:05:54 +0000 (16:05 +0000)]
I am not sure if this is a/the race some users are seeing, or if it
can even have any effect, but this will close it at the cost of one
extra kevent(2) call every 100 ms timer tick.
The (perceived) problem is that we have pending kqueue changes we
have not yet told the kernel about, then close a number of expired
fds which might instantly be recycled by the accept(2) over in
the other thread before we tell the kernel about the pending changes.
In that case, the kernel has no way of knowing that our changes
referred to the previous instance of the fd and not the new one.
The solution is to push the changes to the kernel before servicing
the timer.
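Sketch of the fix, with illustrative names; the point is only the
ordering of flush and close:

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>

    /* Push the accumulated changelist to the kernel *before* closing
     * expired fds, so no queued change can be misapplied to a
     * recycled fd number. */
    static void
    flush_kq_changes(int kq, struct kevent *chg, int *nchg)
    {
        static const struct timespec zero = { 0, 0 };

        if (*nchg > 0) {
            /* nevents == 0: apply changes, return immediately */
            (void)kevent(kq, chg, *nchg, NULL, 0, &zero);
            *nchg = 0;
        }
        /* ...now it is safe to close() the expired sessions' fds. */
    }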
phk [Mon, 28 Jan 2008 09:09:12 +0000 (09:09 +0000)]
Instead of sleeping as soon as we see a busy object, traverse the rest
of the objects on the objecthead to see if there is anything we can use.
This unpessimizes Vary: processing, where we previously might go to sleep
on a busy object despite the fact that we have a good and valid object
with the Vary: we desire.
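In sketch form (illustrative types, not the actual hash code):

    /* Prefer any usable object on the objecthead; only fall back to
     * sleeping on a busy object if that is all we found. */
    struct object {
        int busy;
        struct object *next;
    };

    static struct object *
    lookup(struct object *list, int (*vary_match)(const struct object *))
    {
        struct object *o, *busy = NULL;

        for (o = list; o != NULL; o = o->next) {
            if (o->busy) {
                if (busy == NULL)
                    busy = o;   /* remember it, keep looking */
                continue;
            }
            if (vary_match(o))
                return (o);     /* good object: no need to sleep */
        }
        return (busy);          /* NULL, or a busy object to wait on */
    }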
phk [Mon, 28 Jan 2008 09:01:38 +0000 (09:01 +0000)]
With Vary, Prefetch and degraded mode, a session does not sleep on a
particular object, because we cannot know beforehand whether it will work
out for us; instead it sleeps on any one of potentially multiple busy
objects becoming ready for us to test against.
Therefore it makes sense to move the waiting list from the object to the
objecthead, as this both simplifies the code and eliminates a refhold on
busy objects.
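As a structural sketch (member names illustrative):

    #include <sys/queue.h>

    struct sess;

    /* The waiting list now hangs off the objecthead, so a session
     * waits for *any* busy object under that head and needs no
     * reference on one specific busy object. */
    struct objhead {
        TAILQ_HEAD(, sess) waitinglist;
    };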
phk [Mon, 28 Jan 2008 08:46:15 +0000 (08:46 +0000)]
Deoptimize the central object matching loop in the hash code:
With the advent of prefetch and degraded mode, the invariants of
objectheads change so that more than one object can be busy at any
one time.
Thus we can no longer assume that the busy object, or one subsequent to
it, is the one we eventually desire, and we must start our search from
the front of the list again.
As an amusing sidenote: this eliminates the only "goto" in all of varnishd.
des [Fri, 25 Jan 2008 15:38:18 +0000 (15:38 +0000)]
Roundup of old uncommitted changes: Getopt::Long cleanup, IO::Multiplex
cleanup, statistics. Also improve banning, and avoid the // operator,
which is only available in very recent Perl versions.
des [Wed, 23 Jan 2008 16:23:28 +0000 (16:23 +0000)]
It is possible for VSS_parse() to succeed and return a NULL addr but a
non-NULL port (e.g. ":80", which is a valid listening address). In that
case, port should be free()d before returning.
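A sketch of the resulting cleanup path; the VSS_parse() signature here
is an assumption, not copied from the tree:

    #include <stdlib.h>

    int VSS_parse(const char *str, char **addr, char **port); /* assumed */

    static int
    parse_requiring_addr(const char *spec)
    {
        char *addr, *port;

        if (VSS_parse(spec, &addr, &port) != 0)
            return (-1);
        if (addr == NULL) {
            /* ":80" parses fine: NULL addr, malloc'ed port.
             * Don't leak the port on the early return. */
            free(port);
            return (-1);
        }
        /* ... use addr/port ... */
        free(addr);
        free(port);
        return (0);
    }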
des [Wed, 23 Jan 2008 15:45:03 +0000 (15:45 +0000)]
Don't assume that res0 != NULL automatically means i == 0. I can't say for
sure (without more coffee) that the assumption is incorrect, but it makes
the code gratuitously non-transparent.
des [Wed, 23 Jan 2008 13:45:48 +0000 (13:45 +0000)]
Add -c and -r options:
- If the former is specified, fetcher will go into a loop after having
traversed the entire tree, and continuously re-fetch all known URLs.
- The latter is not yet implemented, but the idea is to assign a random
probability to each URL based on an inverse-exponential (or similar)
distribution, and re-fetch URLs at random according to this frequency.
This will help simulate a "short head, long tail" scenario (see the
sketch after this list).
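One possible shape for the -r idea; a sketch only, since it is
unimplemented, and the scale parameter and drand48() are arbitrary
choices here:

    #include <math.h>
    #include <stdlib.h>

    /* Draw an exponentially distributed rank so low-numbered ("head")
     * URLs are re-fetched far more often than the long tail. */
    static size_t
    pick_url(size_t nurls, double scale)
    {
        double r = -log(1.0 - drand48()) * scale; /* Exp, mean = scale */
        size_t idx = (size_t)r;

        return (idx < nurls ? idx : nurls - 1);
    }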
Some restructuring.
Add a comment about a possible improvement which will help work around
bugs in certain commonly used data sets (e.g. the Apache httpd manual)
that can result in an infinite set of URLs (which in reality map to
a fairly large but finite set of pages).