schedule_on_each_cpu() presently does a large kmalloc - 96 kbytes on 1024 CPU
64-bit.
Rework it so that we do one 8192-byte allocation and then a pile of tiny ones,
via alloc_percpu(). This has a much higher chance of success (100% in the
current VM).
This also has the effect of reducing the memory requirements from NR_CPUS*n to
num_possible_cpus()*n.
Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>