Paul Mackerras [Fri, 10 Aug 2007 11:04:07 +0000 (21:04 +1000)]
[POWERPC] Fix potential duplicate entry in SLB shadow buffer
We were getting a duplicate entry in the SLB shadow buffer in
slb_flush_and_rebolt() if the kernel stack was in the same segment
as PAGE_OFFSET, which on POWER6 causes the hypervisor to terminate
the partition with an error. This fixes it.
Also we were not creating an SLB entry (or an SLB shadow buffer
entry) for the kernel stack on secondary CPUs when starting the
CPU. This isn't a major problem, since an appropriate entry will
be created on demand, but this fixes that also for consistency.
Jesper Juhl [Wed, 8 Aug 2007 23:31:30 +0000 (16:31 -0700)]
SLUB: Fix format specifier in Documentation/vm/slabinfo.c
There's a little problem in Documentation/vm/slabinfo.c
The code is using "%d" in a printf() call to print an 'unsigned long'.
This patch corrects it to use "%lu" instead.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Christoph Lameter <clameter@sgi.com>
The dynamic dma kmalloc creation can run into trouble if a
GFP_ATOMIC allocation is the first one performed for a certain size
of dma kmalloc slab.
- Move the adding of the slab to sysfs into a workqueue
(sysfs does GFP_KERNEL allocations)
- Do not call kmem_cache_destroy() (uses slub_lock)
- Only acquire the slub_lock once and--if we cannot wait--do a trylock.
This introduces a slight risk of the first kmalloc(x, GFP_DMA|GFP_ATOMIC)
for a range of sizes failing due to another process holding the slub_lock.
However, we only need to acquire the spinlock once in order to establish
each power of two DMA kmalloc cache. The possible conflict is with the
slub_lock taken during slab management actions (create / remove slab cache).
It is rather typical that a driver will first fill its buffers using
GFP_KERNEL allocations which will wait until the slub_lock can be acquired.
Drivers will also create its slab caches first outside of an atomic
context before starting to use atomic kmalloc from an interrupt context.
If there are any failures then they will occur early after boot or when
loading of multiple drivers concurrently. Drivers can already accomodate
failures of GFP_ATOMIC for other reasons. Retries will then create the slab.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
SLUB: Remove checks for MAX_PARTIAL from kmem_cache_shrink
The MAX_PARTIAL checks were supposed to be an optimization. However, slab
shrinking is a manually triggered process either through running slabinfo
or by the kernel calling kmem_cache_shrink.
If one really wants to shrink a slab then all operations should be done
regardless of the size of the partial list. This also fixes an issue that
could surface if the number of partial slabs was initially above MAX_PARTIAL
in kmem_cache_shrink and later drops below MAX_PARTIAL through the
elimination of empty slabs on the partial list (rare). In that case a few
slabs may be left off the partial list (and only be put back when they
are empty).
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Alan Cox [Wed, 8 Aug 2007 23:57:54 +0000 (00:57 +0100)]
remove dubious legal statment from uio-howto
UIO currently contains a rather dubious statement which wants removing.
The actual questions around whether user space code that depends tightly
on kernel GPL code designed to co-work with it are derivative works of
the kernel is extremely complex, and since we don't have space for either
a masters length essay on legal issues or need to start flamewars lets
simply remove the comment and leave law to lawyers
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Linus Torvalds <tovalds@linux-foundation.org>
Linus Torvalds [Thu, 9 Aug 2007 15:38:14 +0000 (08:38 -0700)]
Merge git://git.linux-nfs.org/pub/linux/nfs-2.6
* git://git.linux-nfs.org/pub/linux/nfs-2.6:
SUNRPC: Replace flush_workqueue() with cancel_work_sync() and friends
NFS: Replace flush_scheduled_work with cancel_work_sync() and friends
SUNRPC: Don't call gss_delete_sec_context() from an rcu context
NFSv4: Don't call put_rpccred() from an rcu callback
NFS: Fix NFSv4 open stateid regressions
NFSv4: Fix a locking regression in nfs4_set_mode_locked()
NFS: Fix put_nfs_open_context
SUNRPC: Fix a race in rpciod_down()
Linus Torvalds [Thu, 9 Aug 2007 15:27:25 +0000 (08:27 -0700)]
Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SPARC64]: Fix memory leak when cpu hotplugging.
[SPARC64]: Do not assume sun4v chips have load-twin/store-init support.
[SPARC64]: Fix hard-coding of cpu type output in /proc/cpuinfo on sun4v.
[SPARC]: Centralize find_in_proplist() instead of duplicating N times.
Linus Torvalds [Thu, 9 Aug 2007 15:23:47 +0000 (08:23 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc:
mmc: at91_mci: remove whitespace at the end of lines
mmc: reorganize bounce buffer init
wbsd: fix section mismatch warnings
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: (61 commits)
sched: refine negative nice level granularity
sched: fix update_stats_enqueue() reniced codepath
sched: round a bit better
sched: make the multiplication table more accurate
sched: optimize update_rq_clock() calls in the load-balancer
sched: optimize activate_task()
sched: clean up set_curr_task_fair()
sched: remove __update_rq_clock() call from entity_tick()
sched: move the __update_rq_clock() call to scheduler_tick()
sched debug: remove the 'u64 now' parameter from print_task()/_rq()
sched: remove the 'u64 now' local variables
sched: remove the 'u64 now' parameter from deactivate_task()
sched: remove the 'u64 now' parameter from dequeue_task()
sched: remove the 'u64 now' parameter from enqueue_task()
sched: remove the 'u64 now' parameter from dec_nr_running()
sched: remove the 'u64 now' parameter from inc_nr_running()
sched: remove the 'u64 now' parameter from dec_load()
sched: remove the 'u64 now' parameter from inc_load()
sched: remove the 'u64 now' parameter from update_curr_load()
sched: remove the 'u64 now' parameter from ->task_new()
...
lguest: avoid shared libraries mapped over guest memory
Some versions of ld.so mmap the shared libraries right in over guest
memory, so compile lguest statically by default.
[ FC7 maps shared libraries very low, where the launcher maps guest's
physical memory. Quick fix is to link Launcher static, real fix is
for 2.6.24. ]
-static is a simple fix. I expect this problem will be more common than we
like, as different distro's make different "improvements" to ld.so
Signed-off-by: Ronald G. Minnich <rminnich@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rusty Russell [Thu, 9 Aug 2007 10:57:13 +0000 (20:57 +1000)]
lguest: Fix Malicious Guest GDT Host Crash
If a Guest makes hypercall which sets a GDT entry to not present, we
currently set any segment registers using that GDT entry to 0.
Unfortunately, this is not sufficient: there are other ways of
altering GDT entries which will cause a fault.
The correct solution to do what Linux does: let them set any GDT value
they want and handle the #GP when popping causes a fault. This has
the added benefit of making our Switcher slightly more robust in the
case of any other bugs which cause it to fault.
We kill the Guest if it causes a fault in the Switcher: it's the
Guest's responsibility to make sure it's not using segments when it
changes them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rusty Russell [Thu, 9 Aug 2007 10:52:35 +0000 (20:52 +1000)]
Fix non-TSC guest clocksource lockup
lguest uses a host-supplied wallclock-based clocksource when the TSC
is not reliable. As this is already in nanoseconds, I naively used a
multiplier of 1 and a shift of 0.
But update_wall_time() in its infinite wisdom decides to adjust the
clock a little (where does it think it's getting a more accurate time
from?)
It will happily tweak the multiplier... to 0, then -1.
So the "fix" is to use a shift of 22 like everyone else, and a
multiplier of 1 << 22.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 9 Aug 2007 15:10:16 +0000 (08:10 -0700)]
Revert "genirq: temporary fix for level-triggered IRQ resend"
This reverts commit 0fc4969b866671dfe39b1a9119d0fdc7ea0f63e5. It was
always meant to be temporary, but it's generating more useless noise
than anything else, and we probably should never have done it in the
generic kernel (only had the people involved test it on their own).
Gabriel C [Thu, 2 Aug 2007 18:20:44 +0000 (20:20 +0200)]
wbsd: fix section mismatch warnings
This patch fixes the following section mismatch warnings
...
WARNING: vmlinux.o(.init.text+0x29d40): Section mismatch: reference to .exit.text:wbsd_release_resources (between 'wbsd_init' and 'wbsd_probe')
WARNING: vmlinux.o(.init.text+0x29d49): Section mismatch: reference to .exit.text:wbsd_free_mmc (between 'wbsd_init' and 'wbsd_probe')
WARNING: vmlinux.o(.init.text+0x29f28): Section mismatch: reference to .exit.text:wbsd_free_mmc (between 'wbsd_init' and 'wbsd_probe')
...
Signed-off-by: Gabriel Craciunescu <nix.or.die@googlemail.com> Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Ingo Molnar [Thu, 9 Aug 2007 09:16:52 +0000 (11:16 +0200)]
sched: refine negative nice level granularity
refine the granularity of negative nice level tasks: let them
reschedule more often to offset the effect of them consuming
their wait_runtime proportionately slower. (This makes nice-0
task scheduling smoother in the presence of negatively
reniced tasks.)
Ingo Molnar [Thu, 9 Aug 2007 09:16:51 +0000 (11:16 +0200)]
sched: optimize update_rq_clock() calls in the load-balancer
optimize update_rq_clock() calls in the load-balancer: update them
right after locking the runqueue(s) so that the pull functions do
not have to call it.
Ingo Molnar [Thu, 9 Aug 2007 09:16:51 +0000 (11:16 +0200)]
sched: optimize activate_task()
optimize activate_task() by removing update_rq_clock() from it.
(and add update_rq_clock() to all callsites of activate_task() that
did not have it before.)
Ingo Molnar [Thu, 9 Aug 2007 09:16:47 +0000 (11:16 +0200)]
sched: remove 'now' use from assignments
change all 'now' timestamp uses in assignments to rq->clock.
( this is an identity transformation that causes no functionality change:
all such new rq->clock is necessarily preceded by an update_rq_clock()
call. )
Ingo Molnar [Thu, 9 Aug 2007 09:16:46 +0000 (11:16 +0200)]
sched: add [__]update_rq_clock(rq)
add the [__]update_rq_clock(rq) functions. (No change in functionality,
just reorganization to prepare for elimination of the heavy 64-bit
timestamp-passing in the scheduler.)
Peter Williams [Thu, 9 Aug 2007 09:16:46 +0000 (11:16 +0200)]
sched: fix bug in balance_tasks()
There are two problems with balance_tasks() and how it used:
1. The variables best_prio and best_prio_seen (inherited from the old
move_tasks()) were only required to handle problems caused by the
active/expired arrays, the order in which they were processed and the
possibility that the task with the highest priority could be on either.
These issues are no longer present and the extra overhead associated
with their use is unnecessary (and possibly wrong).
2. In the absence of CONFIG_FAIR_GROUP_SCHED being set, the same
this_best_prio variable needs to be used by all scheduling classes or
there is a risk of moving too much load. E.g. if the highest priority
task on this at the beginning is a fairly low priority task and the rt
class migrates a task (during its turn) then that moved task becomes the
new highest priority task on this_rq but when the sched_fair class
initializes its copy of this_best_prio it will get the priority of the
original highest priority task as, due to the run queue locks being
held, the reschedule triggered by pull_task() will not have taken place.
This could result in inappropriate overriding of skip_for_load and
excessive load being moved.
The attached patch addresses these problems by deleting all reference to
best_prio and best_prio_seen and making this_best_prio a reference
parameter to the various functions involved.
load_balance_fair() has also been modified so that this_best_prio is
only reset (in the loop) if CONFIG_FAIR_GROUP_SCHED is set. This should
preserve the effect of helping spread groups' higher priority tasks
around the available CPUs while improving system performance when
CONFIG_FAIR_GROUP_SCHED isn't set.
Signed-off-by: Peter Williams <pwil3058@bigpond.net.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Josh Triplett [Thu, 9 Aug 2007 09:16:46 +0000 (11:16 +0200)]
sched: mark print_cfs_stats static
sched_fair.c defines print_cfs_stats, and sched_debug.c uses it, but sched.c
includes both sched_fair.c and sched_debug.c, so all the references to
print_cfs_stats occur in the same compilation unit. Thus, mark
print_cfs_stats static.
Eliminates a sparse warning:
warning: symbol 'print_cfs_stats' was not declared. Should it be static?
Ulrich Drepper [Thu, 9 Aug 2007 09:16:46 +0000 (11:16 +0200)]
sched: clean up sched_getaffinity()
here's another tiny cleanup. The generated code is not affected (gcc is
smart enough) but for people looking over the code it is just irritating
to have the extra conditional.
Peter Williams [Thu, 9 Aug 2007 09:16:46 +0000 (11:16 +0200)]
sched: simplify move_tasks()
The move_tasks() function is currently multiplexed with two distinct
capabilities:
1. attempt to move a specified amount of weighted load from one run
queue to another; and
2. attempt to move a specified number of tasks from one run queue to
another.
The first of these capabilities is used in two places, load_balance()
and load_balance_idle(), and in both of these cases the return value of
move_tasks() is used purely to decide if tasks/load were moved and no
notice of the actual number of tasks moved is taken.
The second capability is used in exactly one place,
active_load_balance(), to attempt to move exactly one task and, as
before, the return value is only used as an indicator of success or failure.
This multiplexing of sched_task() was introduced, by me, as part of the
smpnice patches and was motivated by the fact that the alternative, one
function to move specified load and one to move a single task, would
have led to two functions of roughly the same complexity as the old
move_tasks() (or the new balance_tasks()). However, the new modular
design of the new CFS scheduler allows a simpler solution to be adopted
and this patch addresses that solution by:
1. adding a new function, move_one_task(), to be used by
active_load_balance(); and
2. making move_tasks() a single purpose function that tries to move a
specified weighted load and returns 1 for success and 0 for failure.
One of the consequences of these changes is that neither move_one_task()
or the new move_tasks() care how many tasks sched_class.load_balance()
moves and this enables its interface to be simplified by returning the
amount of load moved as its result and removing the load_moved pointer
from the argument list. This helps simplify the new move_tasks() and
slightly reduces the amount of work done in each of
sched_class.load_balance()'s implementations.
Further simplification, e.g. changes to balance_tasks(), are possible
but (slightly) complicated by the special needs of load_balance_fair()
so I've left them to a later patch (if this one gets accepted).
NB Since move_tasks() gets called with two run queue locks held even
small reductions in overhead are worthwhile.
[ mingo@elte.hu ]
this change also reduces code size nicely:
text data bss dec hex filename
39216 3618 24 42858 a76a sched.o.before
39173 3618 24 42815 a73f sched.o.after
Signed-off-by: Peter Williams <pwil3058@bigpond.net.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Thu, 9 Aug 2007 09:16:45 +0000 (11:16 +0200)]
sched: reorder update_cpu_load(rq) with the ->task_tick() call
Peter Williams suggested to flip the order of update_cpu_load(rq) with
the ->task_tick() call. This is a NOP for the current scheduler (the
two functions are independent of each other), ->task_tick() might
create some state for update_cpu_load() in the future (or in PlugSched).
David S. Miller [Thu, 9 Aug 2007 00:32:33 +0000 (17:32 -0700)]
[SPARC64]: Fix memory leak when cpu hotplugging.
Every time a cpu is added via hotplug, we allocate the per-cpu MONDO
queues but we never free them up. Freeing isn't easy since the first
cpu gets this memory from bootmem.
Therefore, the simplest thing to do to fix this bug is to allocate the
queues for all possible cpus at boot time.
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Tue, 7 Aug 2007 23:01:46 +0000 (00:01 +0100)]
fix oops in __audit_signal_info()
The check for audit_signals is misplaced and the check for
audit_dummy_context() is missing; as the result, if we send a signal to
auditd from task with NULL ->audit_context while we have audit_signals
!= 0 we end up with an oops.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix estimation of maxRTT. The original code ignores rtt measurements
during slow start (via the check tp->snd_ssthresh < 0xFFFF) yet this
is probably a good time to try to estimate max rtt as delayed acking
is disabled and slow start will only exit on a loss which presumably
corresponds to a maxrtt measurement. Second, the original code (via
the check htcp_ccount(ca) > 3) ignores rtt data during what it
estimates to be the first 3 round-trip times. This seems like an
unnecessary check now that the RCV timestamp are no longer used
for rtt estimation.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[NETFILTER]: ctnetlink: return EEXIST instead of EINVAL for existing nat'ed conntracks
ctnetlink must return EEXIST for existing nat'ed conntracks instead of
EINVAL. Only return EINVAL if we try to update a conntrack with NAT
handlings (that is not allowed).
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Juhl [Wed, 8 Aug 2007 01:10:54 +0000 (18:10 -0700)]
[NETFILTER]: ipt_recent: avoid a possible NULL pointer deref in recent_seq_open()
If the call to seq_open() returns != 0 then the code calls
kfree(st) but then on the very next line proceeds to
dereference the pointer - not good.
Problem spotted by the Coverity checker.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Moore [Wed, 8 Aug 2007 00:53:10 +0000 (17:53 -0700)]
[NetLabel]: add missing rcu_dereference() calls in the LSM domain mapping hash table
The LSM domain mapping head table pointer was not being referenced via the RCU
safe dereferencing function, rcu_dereference(). This patch adds those missing
calls to the NetLabel code.
This has been tested using recent linux-2.6 git kernels with no visible
regressions.
Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
> >> Looks like memset() is zeroing wrong nr of bytes.
> >
> > Good catch, however, I think we can just remove this memset altogether
> > since the memory gets allocated via kzalloc.
>
> Correct, that memset() is superfluous.