Let's put my money where my mouth is. Smaller code is almost always
faster, if only because a single I$ miss ends up leaving a lot of cycles
to make up for. And system software - kernels in particular - are known
for taking more cache misses than most other kinds.
On my random config, this made the kernel about 10% smaller, and lmbench
seems to say that it's pretty uniformly faster too. Your milage may vary.