Robert P. J. Day [Mon, 16 Jul 2007 06:39:57 +0000 (23:39 -0700)]
Remove unnecessary includes of spinlock.h under include/linux
Remove the obviously unnecessary includes of <linux/spinlock.h> under the
include/linux/ directory, and fix the couple errors that are introduced as
a result of that.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Mon, 16 Jul 2007 06:39:51 +0000 (23:39 -0700)]
percpu_counters(): use cpu notifiers
per-cpu counters presently must iterate over all possible CPUs in the
exhaustive percpu_counter_sum().
But it can be much better to only iterate over the presently-online CPUs. To
do this, we must arrange for an offlined CPU's count to be spilled into the
counter's central count.
We can do this for all percpu_counters in the machine by linking them into a
single global list and walking that list at CPU_DEAD time.
(I hope. Might have race windows in which the percpu_counter_sum() count is
inaccurate?)
Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Mon, 16 Jul 2007 06:39:50 +0000 (23:39 -0700)]
vxfs warning fixes
gcc-4.3:
fs/freevxfs/vxfs_lookup.c: In function 'vxfs_find_entry':
fs/freevxfs/vxfs_lookup.c:139: warning: cast from pointer to integer of different size
fs/freevxfs/vxfs_lookup.c: In function 'vxfs_readdir':
fs/freevxfs/vxfs_lookup.c:294: warning: cast from pointer to integer of different size
Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Mon, 16 Jul 2007 06:39:50 +0000 (23:39 -0700)]
fuse warning fix
gcc-4.3:
fs/fuse/dir.c: In function 'parse_dirfile':
fs/fuse/dir.c:833: warning: cast from pointer to integer of different size
fs/fuse/dir.c:835: warning: cast from pointer to integer of different size
[miklos@szeredi.hu: use offsetof] Acked-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cpu hotplug: fix ksoftirqd termination on cpu hotplug with naughty realtime process
Fix ksoftirqd termination on cpu hotplug with naughty real time process.
Assuming the following case:
- Try to hot remove CPU2 from CPU1.
- There is a real time process on CPU2, and that process doesn't sleep at all.
- That rt process and ksoftirqd/2 is migrated to the CPU0
Then ksoftirqd/2 can't stop becasue that rt process runs everlastingly on
CPU0, and CPU1 waiting the ksoftirqd/2's termination hangs up. To fix this
problem, set the priority of ksoftirqd/2 to max one before kthread_stop().
Fix stop_machine_run problem with naughty real time process
stop_machine_run() does its work on "kstopmachine" thread having max
priority. However that thread get such priority after woken up.
Therefore, in the following case ...
- "kstopmachine" try to run on CPU1
- There is a real time process which doesn't relinquish CPU time
voluntary on CPU1
... "kstopmachine" can't start to run and the CPU on which
stop_machine_run() is runing hangs up. To fix this problem, call
sched_setscheduler() before waking up that thread.
This patch adds checking for granted memory while filling up inode data to
prevent possible NULL pointer usage. If there is not enough memory to fill
inode data we just mark it as "bad". Also some whitespace cleanup.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Jan Kara <jack@ucw.cz> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 16 Jul 2007 06:39:43 +0000 (23:39 -0700)]
Prevent an O_NDELAY writer from blocking when a tty write is blocked by the tty atomic writer mutex
Without this a tty write could block if a previous blocking tty write was
in progress on the same tty and blocked by a line discipline or hardware
event. Originally found and reported by Dave Johnson.
Signed-off-by: Alan Cox <alan@redhat.com> Acked-by: Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
UDF: check for allocated memory for data of new inodes
Add checking for granted memory for inode data at the moment of its
creation.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Jan Kara <jack@ucw.cz> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tomas Janousek [Mon, 16 Jul 2007 06:39:42 +0000 (23:39 -0700)]
Use boot based time for uptime in /proc
Commit 411187fb05cd11676b0979d9fbf3291db69dbce2 caused uptime not to increase
during suspend. This may cause confusion so I restore the old behaviour by
using the boot based time instead of monotonic for uptime.
Signed-off-by: Tomas Janousek <tjanouse@redhat.com> Acked-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tomas Janousek [Mon, 16 Jul 2007 06:39:42 +0000 (23:39 -0700)]
Use boot based time for process start time and boot time in /proc
Commit 411187fb05cd11676b0979d9fbf3291db69dbce2 caused boot time to move and
process start times to become invalid after suspend. Using boot based time
for those restores the old behaviour and fixes the issue.
[akpm@linux-foundation.org: little cleanup] Signed-off-by: Tomas Janousek <tjanouse@redhat.com> Cc: Tomas Smetana <tsmetana@redhat.com> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
changed the monotonic time so that it no longer jumps after resume, but it's
not possible to use it for boot time and process start time calculations then.
Also, the uptime no longer increases during suspend.
I add a variable to track the wall_to_monotonic changes, a function to get the
real boot time and a function to get the boot based time from the monotonic
one.
[akpm@linux-foundation.org: remove exports, add comment] Signed-off-by: Tomas Janousek <tjanouse@redhat.com> Cc: Tomas Smetana <tsmetana@redhat.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Before calling init_hwif_default, ide_unregister gets lock ide_lock and
disables irq. init_hwif_default calls ide_default_io_base which calls
pci_get_device and later pci_get_subsys tries to apply for semaphore
pci_bus_sem and goes to sleep.
Mostly, pci_get_device should be called when irq is turned on.
ide_default_io_base just needs find if list pci_devices is empty.
Jan Engelhardt [Mon, 16 Jul 2007 06:39:30 +0000 (23:39 -0700)]
Use menuconfig objects II - oprofile
Make a "menuconfig" out of the Kconfig objects "menu, ..., endmenu",
so that the user can disable all the options in that menu at once
instead of having to disable each option separately.
Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Cc: Philippe Elie <phil.el@wanadoo.fr> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Engelhardt [Mon, 16 Jul 2007 06:39:29 +0000 (23:39 -0700)]
Use menuconfig objects II - misc strange dev
Make a "menuconfig" out of the Kconfig objects "menu, ..., endmenu",
so that the user can disable all the options in that menu at once
instead of having to disable each option separately.
Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Engelhardt [Mon, 16 Jul 2007 06:39:26 +0000 (23:39 -0700)]
Use menuconfig objects II - auxdisplay
Make a "menuconfig" out of the Kconfig objects "menu, ..., endmenu",
so that the user can disable all the options in that menu at once
instead of having to disable each option separately.
Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Acked-by: Miguel Ojeda Sandonis <maxextreme@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adrian Bunk [Mon, 16 Jul 2007 06:39:01 +0000 (23:39 -0700)]
more scheduled OSS driver removal
This patch contains the scheduled removal of OSS drivers that:
- have ALSA drivers for the same hardware without known regressions and
- whose Kconfig options have been removed in 2.6.20.
Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix following races:
===========================================
1. Write via ->write_proc sleeps in copy_from_user(). Module disappears
meanwhile. Or, more generically, system call done on /proc file, method
supplied by module is called, module dissapeares meanwhile.
NOTE, NOTE, NOTE: file_operations are proxied for regular files only. Let's
see how this scheme behaves, then extend if needed for directories.
Directories creators in /proc only set ->owner for them, so proxying for
directories may be unneeded.
NOTE, NOTE, NOTE: methods being proxied are ->llseek, ->read, ->write,
->poll, ->unlocked_ioctl, ->ioctl, ->compat_ioctl, ->open, ->release.
If your in-tree module uses something else, yell on me. Full audit pending.
Jeff Dike [Mon, 16 Jul 2007 06:38:58 +0000 (23:38 -0700)]
uml: remove dead file
I forgot this file was here. It hasn't been used since UML has been in
mainline.
Thanks to Jesper for finding something that needed doing to it, thus reminding
me of its existence.
Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Mon, 16 Jul 2007 06:38:58 +0000 (23:38 -0700)]
uml: limit request size on COWed devices
COWed devices can't handle more than 32 (64 on x86_64) sectors in one request
due to the size of the bitmap being carried around in the io_thread_req.
Enforce that by telling the block layer not to put too many sectors in
requests to COWed devices.
Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Mon, 16 Jul 2007 06:38:57 +0000 (23:38 -0700)]
uml: export hostfs symbols
Add some exports for hostfs that are required after Alberto Bertogli's fixes
for accessing unlinked host files.
Also did some style cleanups while I was here.
Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Mon, 16 Jul 2007 06:38:56 +0000 (23:38 -0700)]
uml: Eliminate kernel allocator wrappers
UML had two wrapper procedures for kmalloc, um_kmalloc and um_kmalloc_atomic
because the flag constants weren't available in userspace code.
kern_constants.h had made kernel constants available for a long time, so there
is no need for these wrappers any more. Rather, userspace code calls kmalloc
directly with the userspace versions of the gfp flags.
kmalloc isn't a real procedure, so I had to essentially copy the inline
wrapper around __kmalloc.
vmalloc also had its own wrapper for no good reason. This is now gone.
Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Mon, 16 Jul 2007 06:38:56 +0000 (23:38 -0700)]
uml: simplify helper stack handling
run_helper and run_helper_thread had arguments which were the same in all
callers. run_helper's stack_out was always NULL and run_helper_thread's
stack_order was always 0. These are now gone, and the constants folded
into the code.
Also fixed leaks of the helper stack in the AIO and SIGIO code.
Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Mon, 16 Jul 2007 06:38:55 +0000 (23:38 -0700)]
uml: SIGIO support cleanup
Cleanup of the SIGWINCH support.
Some code and comment reformatting.
The stack used for SIGWINCH threads was leaked. This is now fixed by storing
it with the pid and other information, and freeing it when the thread is
killed.
If something goes wrong with a WIGWINCH thread, and this is discovered in the
interrupt handler, the winch record would leak. It is now freed, except that
the IRQ isn't freed. This is hard to do from interrupt context. This has the
side-effect that the IRQ system maintains a reference to the freed structure,
but that shouldn't cause a problem since the descriptor is disabled.
register_winch_irq is now much better about cleaning up after an
initialization failure.
Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Mon, 16 Jul 2007 06:38:54 +0000 (23:38 -0700)]
uml: handle errors on opening host side of consoles
If the host side of a console can't be opened, this will now produce visible
error messages.
enable_chan now returns a status and this is passed up to con_open and
ssl_open, which will complain if anything went wrong.
The default host device for the serial line driver is now a pts device rather
than a pty device since lots of hosts have LEGACY_PTYS disabled. This had
always been failing on such hosts, but silently.
Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Mon, 16 Jul 2007 06:38:53 +0000 (23:38 -0700)]
uml: pty channel tidying
Cleanup, mostly style violations.
Tidied the includes.
getmaster returns a real errno, which pty_open returns if there's a
problem.
The printks now have severity.
Changed os_* calls to call libc directly.
Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Mon, 16 Jul 2007 06:38:52 +0000 (23:38 -0700)]
uml: xterm driver tidying
Major tidying of the xterm console driver:
got rid of the tt-mode gdb support
tidied up the includes
fixed lots of style violations
replaced os_* calls with glibc calls in xterm.c
all printk calls now have a severity indicator
the error paths of xterm_open are closer to being right
Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
DEBUG_SHIRQ generates spurious interrupts, triggering handlers such as
mconsole_interrupt() or line_interrupt(). They expect data to be available to
be read from their sockets/pipes, but in the case of spurious interrupts, the
host didn't actually send anything, so UML hangs in read() and friends.
Setting those fd's as O_NONBLOCK makes DEBUG_SHIRQ-enabled UML kernels boot
and run correctly.
Signed-off-by: Eduard-Gabriel Munteanu <maxdamage@aladin.ro> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Mon, 16 Jul 2007 06:38:48 +0000 (23:38 -0700)]
uml: use get_free_pages to allocate kernel stacks
For some reason, I was using kmalloc instead of get_free_pages for kernel
stacks.
Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Mon, 16 Jul 2007 06:38:47 +0000 (23:38 -0700)]
uml: fix request->sector update
It is theoretically possible for a request to finish and be freed between
writing it to the I/O thread and updating the sector count. In this case, the
update will dereference a freed pointer.
To avoid this, I delay the update until processing the next sg segment, when
the request pointer is known to be good.
Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robert P. J. Day [Mon, 16 Jul 2007 06:38:46 +0000 (23:38 -0700)]
CRIS: replace old-style member inits with designated inits
Replace the old-style structure member initializers with designated
initializers.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: Mikael Starvik <starvik@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 16 Jul 2007 06:38:42 +0000 (23:38 -0700)]
ARM26: enable arbitary speed tty ioctls and split input/output speed
Add the ioctls and values needed for this to the ARM26/ARM32 ports. The
actual code has been in the base kernel for a while and automatically turns
on when a port sets the required defines.
Signed-off-by: Alan Cox <alan@redhat.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ivan Kokshaysky [Mon, 16 Jul 2007 06:38:41 +0000 (23:38 -0700)]
fix alpha ISA support
isa_bus_to_virt() is still needed in a few places (lance.c, at least). When
we switch the kernel to using -Werror-implicit-function-declaration, the lack
of isa_bus_to_virt() breaks alpha allmodconfig builds.
Add isa_bus_to_virt() and deprecate the ezisting ISA APIs, though it might be
better to define these functions as BUG(), since virt_to_bus/bus_to_virt just
do wrong things on a number of machines.
[akpm@linux-foundation.org: build fix] Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sam Ravnborg [Mon, 16 Jul 2007 06:38:37 +0000 (23:38 -0700)]
alpha: fix trivial section mismatch warnings
Fix the following section mismatch warnings:
WARNING: arch/alpha/kernel/built-in.o(.text+0x7c78): Section mismatch: reference to .init.text:init_rtc_irq (between 'common_init_rtc' and 'timer_interrupt')
WARNING: arch/alpha/kernel/built-in.o(.text+0x7c7c): Section mismatch: reference to .init.text:init_rtc_irq (between 'common_init_rtc' and 'timer_interrupt')
WARNING: arch/alpha/kernel/built-in.o(.data+0x2c30): Section mismatch: reference to .init.text:srm_console_setup (between 'srmcons' and 'tsunami_pci_ops')
In all three cases functions marked __init was called outside __init context.
So the fix was to just drop the __init attribute.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Meelis Roos <mroos@linux.ee> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
m68knommu: remove old cache management cruft from mm code
Remove cache management cruft. This code is dead, all the cache manangement
functions for the ColdFire exist in the header file
include/asm-m68knommu/cacheflush.h.
. remove include files not needed
. remove not used CAT_ROMARRAY code
. remove generic machine pointers not used
. remove unused functions
. fix email address in copyrights
David Howells [Mon, 16 Jul 2007 06:38:27 +0000 (23:38 -0700)]
FRV: Be (self-)consistent and use CONFIG_GDB_CONSOLE everywhere
Be (self-)consistent and use CONFIG_GDB_CONSOLE everywhere rather than using
CONFIG_GDBSTUB_CONSOLE in some places and not others. This is also then
consistent with other archs.
Also remove the gdbstub console device() op which doesn't seem to be necessary
now (especially as it doesn't compile).
[Found by Robert P. J. Day <rpjday@mindspring.com>]
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The do_loop_readv_writev implementation of readv breaks out of the loop as
soon as a single read request didn't fill it's buffer:
if (nr != len)
break;
The generic_file_aio_read version doesn't. So if it hits EOF before the end
of the list of buffers, it will try again on the next buffer. If the file was
extended in the mean time, this will produce a bad result.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
do not limit locked memory when RLIMIT_MEMLOCK is RLIM_INFINITY
Fix a bug in mm/mlock.c on 32-bit architectures that prevents a user from
locking more than 4GB of shared memory, or allocating more than 4GB of
shared memory in hugepages, when rlim[RLIMIT_MEMLOCK] is set to
RLIM_INFINITY.
Signed-off-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com> Acked-by: Chris Mason <chris.mason@oracle.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Mundt [Mon, 16 Jul 2007 06:38:24 +0000 (23:38 -0700)]
slob: sparsemem support
Currently slob is disabled if we're using sparsemem, due to an earlier
patch from Goto-san. Slob and static sparsemem work without any trouble as
it is, and the only hiccup is a missing slab_is_available() in the case of
sparsemem extreme. With this, we're rid of the last set of restrictions
for slob usage.
Signed-off-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Matt Mackall <mpm@selenic.com> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mspec_mmap was setting VM_LOCKED (without adjusting locked_vm): don't do
that, it serves no purpose in 2.6, other than to mess up the locked_vm
accounting - mspec's pages won't get reclaimed anyway. Thanks to Dmitry
Monakhov for raising the issue.
Paul Mundt [Mon, 16 Jul 2007 06:38:22 +0000 (23:38 -0700)]
slob: initial NUMA support
This adds preliminary NUMA support to SLOB, primarily aimed at systems with
small nodes (tested all the way down to a 128kB SRAM block), whether
asymmetric or otherwise.
We follow the same conventions as SLAB/SLUB, preferring current node
placement for new pages, or with explicit placement, if a node has been
specified. Presently on UP NUMA this has the side-effect of preferring
node#0 allocations (since numa_node_id() == 0, though this could be
reworked if we could hand off a pfn to determine node placement), so
single-CPU NUMA systems will want to place smaller nodes further out in
terms of node id. Once a page has been bound to a node (via explicit node
id typing), we only do block allocations from partial free pages that have
a matching node id in the page flags.
The current implementation does have some scalability problems, in that all
partial free pages are tracked in the global freelist (with contention due
to the single spinlock). However, these are things that are being reworked
for SMP scalability first, while things like per-node freelists can easily
be built on top of this sort of functionality once it's been added.
Acked-by: Christoph Lameter <clameter@sgi.com> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Mundt [Mon, 16 Jul 2007 06:38:20 +0000 (23:38 -0700)]
mm: more __meminit annotations
Currently zone_spanned_pages_in_node() and zone_absent_pages_in_node() are
non-static for ARCH_POPULATES_NODE_MAP and static otherwise. However, only
the non-static versions are __meminit annotated, despite only being called
from __meminit functions in either case.
zone_init_free_lists() is currently non-static and not __meminit annotated
either, despite only being called once in the entire tree by
init_currently_empty_zone(), which too is __meminit. So make it static and
properly annotated.
Signed-off-by: Paul Mundt <lethal@linux-sh.org> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Beulich [Mon, 16 Jul 2007 06:38:17 +0000 (23:38 -0700)]
page table handling cleanup
Kill pte_rdprotect(), pte_exprotect(), pte_mkread(), pte_mkexec(), pte_read(),
pte_exec(), and pte_user() except where arch-specific code is making use of
them.
Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Mundt [Mon, 16 Jul 2007 06:38:16 +0000 (23:38 -0700)]
numa: mempolicy: trivial debug fixes.
Enabling debugging fails to build due to the nodemask variable in
do_mbind() having changed names, and then oopses on boot due to the
assumption that the nodemask can be dereferenced -- which doesn't work out
so well when the policy is changed to MPOL_DEFAULT with a NULL nodemask by
numa_default_policy().
This fixes it up, and switches from PDprintk() to pr_debug() while
we're at it.
Signed-off-by: Paul Mundt <lethal@linux-sh.org> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
oom: stop allocating user memory if TIF_MEMDIE is set
get_user_pages() can try to allocate a nearly unlimited amount of memory on
behalf of a user process, even if that process has been OOM killed. The
OOM kill occurs upon return to user space via a SIGKILL, but
get_user_pages() will try allocate all its memory before returning. Change
get_user_pages() to check for TIF_MEMDIE, and if set then return
immediately.
Paul Mundt [Mon, 16 Jul 2007 06:38:15 +0000 (23:38 -0700)]
numa: mempolicy: dynamic interleave map for system init
This converts the default system init memory policy to use a dynamically
created node map instead of defaulting to all online nodes. Nodes of a
certain size (>= 16MB) are judged to be suitable for interleave, and are added
to the map. If all nodes are smaller in size, the largest one is
automatically selected.
Without this, tiny nodes find themselves out of memory before we even make it
to userspace. Systems with large nodes will notice no change.
Only the system init policy is effected by this change, the regular
MPOL_DEFAULT policy is still switched to later on in the boot process as
normal.
Signed-off-by: Paul Mundt <lethal@linux-sh.org> Cc: Andi Kleen <ak@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If set then the kernel will be booted by default with slab debugging
switched on. Similar to CONFIG_SLAB_DEBUG. By default slab debugging
is available but must be enabled by specifying "slub_debug" as a
kernel parameter.
Also add support to switch off slab debugging for a kernel that was
built with CONFIG_SLUB_DEBUG_ON. This works by specifying
slub_debug=-
as a kernel parameter.
Dave Jones wanted this feature.
http://marc.info/?l=linux-kernel&m=118072189913045&w=2
[akpm@linux-foundation.org: clean up switch statement] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nick Piggin [Mon, 16 Jul 2007 06:38:12 +0000 (23:38 -0700)]
mm: debug check for the fault vs invalidate race
Add a bugcheck for Andrea's pagefault vs invalidate race. This is triggerable
for both linear and nonlinear pages with a userspace test harness (using
direct IO and truncate, respectively).
Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nick Piggin [Mon, 16 Jul 2007 06:38:09 +0000 (23:38 -0700)]
slob: improved alignment handling
Remove the core slob allocator's minimum alignment restrictions, and instead
introduce the alignment restrictions at the slab API layer. This lets us heed
the ARCH_KMALLOC/SLAB_MINALIGN directives, and also use __alignof__ (unsigned
long) for the default alignment (which should allow relaxed alignment
architectures to take better advantage of SLOB's small minimum alignment).
Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nick Piggin [Mon, 16 Jul 2007 06:38:08 +0000 (23:38 -0700)]
slob: remove bigblock tracking
Remove the bigblock lists in favour of using compound pages and going directly
to the page allocator. Allocation size is stored in page->private, which also
makes ksize more accurate than it previously was.
Saves ~.5K of code, and 12-24 bytes overhead per >= PAGE_SIZE allocation.
Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>