[PATCH] page migration: fail if page is in a vma flagged VM_LOCKED
page migration currently simply retries a couple of times if try_to_unmap()
fails without inspecting the return code.
However, SWAP_FAIL indicates that the page is in a vma that has the
VM_LOCKED flag set (if ignore_refs ==1). We can check for that return code
and avoid retrying the migration.
migrate_page_remove_references() now needs to return a reason why the
failure occured. So switch migrate_page_remove_references to use -Exx
style error messages.
Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Nathan Scott [Wed, 15 Mar 2006 04:14:45 +0000 (15:14 +1100)]
Fix a direct I/O locking issue revealed by the new mutex code.
Affects only XFS (i.e. DIO_OWN_LOCKING case) - currently it is
not possible to get i_mutex locking correct when using DIO_OWN
direct I/O locking in a filesystem due to indeterminism in the
possible return code/lock/unlock combinations. This can cause
a direct read to attempt a double i_mutex unlock inside XFS.
We're now ensuring __blockdev_direct_IO always exits with the
inode i_mutex (still) held for a direct reader.
Tested with the three different locking modes (via direct block
device access, ext3 and XFS) - both reading and writing; cannot
find any regressions resulting from this change, and it clearly
fixes the mutex_unlock warning originally reported here:
http://marc.theaimsgroup.com/?l=linux-kernel&m=114189068126253&w=2
Signed-off-by: Nathan Scott <nathans@sgi.com> Acked-by: Christoph Hellwig <hch@lst.de>
Maneesh Soni [Tue, 14 Mar 2006 09:33:14 +0000 (15:03 +0530)]
[PATCH] Plug kdump shutdown race window
lapic_shutdown() re-enables interrupts which is un-desirable for panic
case, so use local_irq_save() and local_irq_restore() to keep the irqs
disabled for kexec on panic case, and close a possible race window while
kdump shutdown as shown in this stack trace
Dave Peterson [Tue, 14 Mar 2006 05:20:50 +0000 (21:20 -0800)]
[PATCH] EDAC: disable sysfs interface
- Disable the EDAC sysfs code. The sysfs interface that EDAC presents to
user space needs more thought, and is likely to change substantially.
Therefore disable it for now so users don't start depending on it in its
current form.
- Disable the default behavior of calling panic() when an uncorrectible
error is detected (since for now, there is no sysfs interface that allows
the user to configure this behavior).
Signed-off-by: David S. Peterson <dsp@llnl.gov> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Trond Myklebust [Tue, 14 Mar 2006 05:20:48 +0000 (21:20 -0800)]
[PATCH] SUNRPC: Fix potential deadlock in RPC code
In rpc_wake_up() and rpc_wake_up_status(), it is possible for the call to
__rpc_wake_up_task() to fail if another thread happens to be calling
rpc_wake_up_task() on the same rpc_task.
Trond Myklebust [Tue, 14 Mar 2006 05:20:47 +0000 (21:20 -0800)]
[PATCH] NFSv4: fix mount segfault on errors returned that are < -1000
It turns out that nfs4_proc_get_root() may return raw NFSv4 errors instead of
mapping them to kernel errors. Problem spotted by Neil Horman
<nhorman@tuxdriver.com>
Trond Myklebust [Tue, 14 Mar 2006 05:20:46 +0000 (21:20 -0800)]
[PATCH] NFS: Fix a potential panic in O_DIRECT
Based on an original patch by Mike O'Connor and Greg Banks of SGI.
Mike states:
A normal user can panic an NFS client and cause a local DoS with
'judicious'(?) use of O_DIRECT. Any O_DIRECT write to an NFS file where the
user buffer starts with a valid mapped page and contains an unmapped page,
will crash in this way. I haven't followed the code, but O_DIRECT reads with
similar user buffers will probably also crash albeit in different ways.
Details: when nfs_get_user_pages() calls get_user_pages(), it detects and
correctly handles get_user_pages() returning an error, which happens if the
first page covered by the user buffer's address range is unmapped. However,
if the first page is mapped but some subsequent page isn't, get_user_pages()
will return a positive number which is less than the number of pages requested
(this behaviour is sort of analagous to a short write() call and appears to be
intentional). nfs_get_user_pages() doesn't detect this and hands off the
array of pages (whose last few elements are random rubbish from the newly
allocated array memory) to it's caller, whence they go to
nfs_direct_write_seg(), which then totally ignores the nr_pages it's given,
and calculates its own idea of how many pages are in the array from the user
buffer length. Needless to say, when it comes to transmit those uninitialised
page* pointers, we see a crash in the network stack.
GOTO Masanori [Tue, 14 Mar 2006 05:20:44 +0000 (21:20 -0800)]
[PATCH] Fix sigaltstack corruption among cloned threads
This patch fixes alternate signal stack corruption among cloned threads
with CLONE_SIGHAND (and CLONE_VM) for linux-2.6.16-rc6.
The value of alternate signal stack is currently inherited after a call of
clone(... CLONE_SIGHAND | CLONE_VM). But if sigaltstack is set by a
parent thread, and then if multiple cloned child threads (+ parent threads)
call signal handler at the same time, some threads may be conflicted -
because they share to use the same alternative signal stack region.
Finally they get sigsegv. It's an undesirable race condition. Note that
child threads created from NPTL pthread_create() also hit this conflict
when the parent thread uses sigaltstack, without my patch.
To fix this problem, this patch clears the child threads' sigaltstack
information like exec(). This behavior follows the SUSv3 specification.
In SUSv3, pthread_create() says "The alternate stack shall not be inherited
(when new threads are initialized)". It means that sigaltstack should be
cleared when sigaltstack memory space is shared by cloned threads with
CLONE_SIGHAND.
Note that I chose "if (clone_flags & CLONE_SIGHAND)" line because:
- If clone_flags line is not existed, fork() does not inherit sigaltstack.
- CLONE_VM is another choice, but vfork() does not inherit sigaltstack.
- CLONE_SIGHAND implies CLONE_VM, and it looks suitable.
- CLONE_THREAD is another candidate, and includes CLONE_SIGHAND + CLONE_VM,
but this flag has a bit different semantics.
I decided to use CLONE_SIGHAND.
[ Changed to test for CLONE_VM && !CLONE_VFORK after discussion --Linus ]
[PATCH] macintosh: correct AC Power info in /proc/pmu/info
Report AC Power present in /proc/pmu/info if there is no battery.
Signed-off-by: Olaf Hering <olh@suse.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>, Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Brownell [Tue, 14 Mar 2006 05:20:40 +0000 (21:20 -0800)]
[PATCH] mtd_dataflash, fix block vs page erase
Fix a bug in the block-erase optimization for Dataflash; it was using block
erase even for smaller segments that need page erase.
That wouldn't matter for JFFS2, which never erases less than one block
(sometimes several blocks), but for other callers it might.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Acked-by: David Woodhouse <dwmw2@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Herbert Xu [Mon, 13 Mar 2006 22:26:12 +0000 (14:26 -0800)]
[TCP]: Fix zero port problem in IPv6
When we link a socket into the hash table, we need to make sure that we
set the num/port fields so that it shows us with a non-zero port value
in proc/netlink and on the wire. This code and comment is copied over
from the IPv4 stack as is.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Andi Kleen [Sun, 12 Mar 2006 22:52:59 +0000 (23:52 +0100)]
[PATCH] x86-64: Fix up handling of non canonical user RIPs
EM64T CPUs have somewhat weird error reporting for non canonical RIPs in
SYSRET.
We can't handle any exceptions there because the exception handler would
end up running on the user stack which is unsafe.
To avoid problems any code that might end up with a user touched pt_regs
should return using int_ret_from_syscall. int_ret_from_syscall ends up
using IRET, which allows safe exceptions.
Linus Torvalds [Sun, 12 Mar 2006 22:56:02 +0000 (14:56 -0800)]
Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm:
[ARM] iwmmxt thread state alignment
[ARM] 3350/1: Enable 1-wire on ARM
[ARM] 3356/1: Workaround for the ARM1136 I-cache invalidation problem
[ARM] 3355/1: NSLU2: remove propmt depends
[ARM] 3354/1: NAS100d: fix power led handling
[ARM] Fix muldi3.S
Russell King [Sun, 12 Mar 2006 22:36:06 +0000 (22:36 +0000)]
[ARM] iwmmxt thread state alignment
This patch removes the reliance of iwmmxt on hand coded alignments.
Since thread_info is always 8K aligned, specifying that fpstate is
8-byte aligned achieves the same effect without needing to resort
to hand coded alignments.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Gregor Maier [Sun, 12 Mar 2006 02:51:25 +0000 (18:51 -0800)]
[NETFILTER]: Fix wrong option spelling in Makefile for CONFIG_BRIDGE_EBT_ULOG
Signed-off-by: Gregor Maier <gregor@net.in.tum.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Brian Haley [Sun, 12 Mar 2006 02:50:14 +0000 (18:50 -0800)]
[IPV6]: fix ipv6_saddr_score struct element
The scope element in the ipv6_saddr_score struct used in
ipv6_dev_get_saddr() is an unsigned integer, but __ipv6_addr_src_scope()
returns a signed integer (and can return -1).
Signed-off-by: Brian Haley <brian.haley@hp.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Sam Ravnborg [Wed, 8 Mar 2006 08:06:33 +0000 (00:06 -0800)]
[PATCH] de620: fix section mismatch warning
In latest -mm de620 gave following warning:
WARNING: drivers/net/de620.o - Section mismatch: reference to \
.init.text:de620_probe from .text between 'init_module' (at offset \
0x1682) and 'cleanup_module'
init_module() call de620_probe() which is declared __init.
Fix is to declare init_module() __init too.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Jon Mason [Fri, 10 Mar 2006 21:12:10 +0000 (15:12 -0600)]
[PATCH] dl2k: DMA freeing error
This patch fixes an error in the dl2k driver's DMA mapping/unmapping.
The adapter uses the upper 16bits of the DMA address for the buffer
size. However, this is not masked off when referencing the DMA
address, and can lead to errors by trying to free a DMA address out of
range.
Thanks,
Jon
Signed-off-by: Jon Mason <jdmason@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
David S. Miller [Sat, 11 Mar 2006 02:08:09 +0000 (18:08 -0800)]
[PATCH] Wrong return value corrupts free object in e1000 driver
For some reason, E1000's ->hard_start_xmit() routine returns -EFAULT
instead of one of the NETDEV_TX_* error codes. In fact, it frees up
the SKB before returning this. This makes the queueing layer think
the packet should be requeued and subsequently we corrupt a freed
object.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jeff Garzik <jeff@garzik.org>
The patch '[PATCH] RCU signal handling' [1] added an export for
__put_task_struct_cb, a put_task_struct helper newly introduced in that
patch. But the put_task_struct couldn't be used modular previously as
__put_task_struct wasn't exported. There are not callers of it in modular
code, and it shouldn't be exported because we don't want drivers to hold
references to task_structs.
This patch removes the export and folds __put_task_struct into
__put_task_struct_cb as there's no other caller.
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Stephen Smalley [Sat, 11 Mar 2006 11:27:16 +0000 (03:27 -0800)]
[PATCH] selinux: tracer SID fix
Fix SELinux to not reset the tracer SID when the child is already being
traced, since selinux_ptrace is also called by proc for access checking
outside of the context of a ptrace attach.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: James Morris <jmorris@namei.org> Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Arjan van de Ven [Sat, 11 Mar 2006 11:27:15 +0000 (03:27 -0800)]
[PATCH] edac: disable a few sysfs files to avoid them becoming an ABI
Disable (via ugly #if 0's) the 3 sysfs files that I think by now we all
agree are very much wrong. These files shouldn't become part of the ABI by
the 2.6.16 release, so I rather have this minimal patch merged to disable
them for now, the real fix can then come during the 2.6.17 devel window.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Badari Pulavarty [Sat, 11 Mar 2006 11:27:14 +0000 (03:27 -0800)]
[PATCH] ext3: fix nobh mode for chattr +j inodes
One can do "chattr +j" on a file to change its journalling mode. Fix
writeback mode with "nobh" handling for it.
Even though, we mount ext3 filesystem in writeback mode with "nobh" option,
some one can do "chattr +j" on a single file to force it to do journalled
mode. In order to do journaling, ext3_block_truncate_page() need to
fallback to default case of creating buffers and adding them to transaction
etc.
Kirill Korotaev [Sat, 11 Mar 2006 11:27:13 +0000 (03:27 -0800)]
[PATCH] ext3: ext3_symlink should use GFP_NOFS allocations inside
This patch fixes illegal __GFP_FS allocation inside ext3 transaction in
ext3_symlink(). Such allocation may re-enter ext3 code from
try_to_free_pages. But JBD/ext3 code keeps a pointer to current journal
handle in task_struct and, hence, is not reentrable.
This bug led to "Assertion failure in journal_dirty_metadata()" messages.
Dmitry Torokhov [Sat, 11 Mar 2006 05:23:38 +0000 (00:23 -0500)]
[PATCH] Input: psmouse - disable autoresync
Automatic resynchronization in psmouse driver causes problems on some
hardware so disable it by default for now. People with KVM switches
that require resync can still enable it via module parameter or sysfs
attribute.
Jan Beulich [Wed, 22 Feb 2006 12:29:04 +0000 (13:29 +0100)]
[PATCH] kbuild: version.h should depend on .kernelrelease
Rebuilding a previously built tree while using make's -j option from
time to time results in the version.h check running at the same time as
the updating of .kernelrelease, resulting in UTS_RELEASE remaining an
empty string (and as a side effect causing the entire kernel to be
rebuilt).
Signed-Off-By: Jan Beulich <jbeulich@novell.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Catalin Marinas [Fri, 10 Mar 2006 22:26:47 +0000 (22:26 +0000)]
[ARM] 3356/1: Workaround for the ARM1136 I-cache invalidation problem
Patch from Catalin Marinas
ARM1136 erratum 371025 (category 2) specifies that, under rare
conditions, an invalidate I-cache by MVA (line or range) operation can
fail to invalidate a cache line. The recommended workaround is to
either invalidate the entire I-cache or invalidate the range by
set/way rather than MVA.
Note that for a 16K cache size, invalidating a 4K page by set/way is
equivalent to invalidating the entire I-cache.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[PATCH] slab: Node rotor for freeing alien caches and remote per cpu pages.
The cache reaper currently tries to free all alien caches and all remote
per cpu pages in each pass of cache_reap. For a machines with large number
of nodes (such as Altix) this may lead to sporadic delays of around ~10ms.
Interrupts are disabled while reclaiming creating unacceptable delays.
This patch changes that behavior by adding a per cpu reap_node variable.
Instead of attempting to free all caches, we free only one alien cache and
the per cpu pages from one remote node. That reduces the time spend in
cache_reap. However, doing so will lengthen the time it takes to
completely drain all remote per cpu pagesets and all alien caches. The
time needed will grow with the number of nodes in the system. All caches
are drained when they overflow their respective capacity. So the drawback
here is only that a bit of memory may be wasted for awhile longer.
Details:
1. Rename drain_remote_pages to drain_node_pages to allow the specification
of the node to drain of pcp pages.
2. Add additional functions init_reap_node, next_reap_node for NUMA
that manage a per cpu reap_node counter.
3. Add a reap_alien function that reaps only from the current reap_node.
For us this seems to be a critical issue. Holdoffs of an average of ~7ms
cause some HPC benchmarks to slow down significantly. F.e. NAS parallel
slows down dramatically. NAS parallel has a 12-16 seconds runtime w/o rotor
compared to 5.8 secs with the rotor patches. It gets down to 5.05 secs with
the additional interrupt holdoff reductions.
Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently the code tries up to spin_retry times to grab a lock using the cs
instruction. The cs instruction has exclusive access to a memory region
and therefore invalidates the appropiate cache line of all other cpus. If
there is contention on a lock this leads to cache line trashing. This can
be avoided if we first check wether a cs instruction is likely to succeed
before the instruction gets actually executed.
Signed-off-by: Christian Ehrhardt <ehrhardt@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] vmscan: no zone_reclaim if PF_MALLOC is set
If the process has already set PF_MALLOC and is already using
current->reclaim_state then do not try to reclaim memory from the zone.
This is set by kswapd and/or synchrononous global reclaim which will not
take it lightly if we zap the reclaim_state.
Signed-off-by: Christoph Lameter <clameter@sig.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adrian Bunk [Fri, 10 Mar 2006 01:33:45 +0000 (17:33 -0800)]
[PATCH] xtensa must set RWSEM_GENERIC_SPINLOCK=y
/usr/src/ctest/git/kernel/mm/rmap.c: In function `page_referenced_one':
/usr/src/ctest/git/kernel/mm/rmap.c:354: warning: implicit declaration of function `rwsem_is_locked'
Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Doug Warzecha [Fri, 10 Mar 2006 01:33:35 +0000 (17:33 -0800)]
[PATCH] dcdbas: dcdbas_pdev referenced after platform_device_unregister on exit
smi_data_buf_free() references dcdbas_pdev when calling
dma_free_coherent(). In dcdbas_exit(), smi_data_buf_free() is called after
platform_device_unregister(dcdbas_pdev).
This patch moves platform_device_unregister(dcdbas_pdev) after
smi_data_buf_free() in dcdbas_exit().
Hugh Dickins [Fri, 10 Mar 2006 01:33:34 +0000 (17:33 -0800)]
[PATCH] page_add_file_rmap(): remove BUG_ON()s
Remove two early-development BUG_ONs from page_add_file_rmap.
The pfn_valid test (originally useful for checking that nobody passed an
artificial struct page) comes too late, since we already have the struct
page.
The PageAnon test (useful when anon was first distinguished from file rmap)
prevents ->nopage implementations from reusing ->mapping, which would
otherwise be available.
(1) Replace scsi_add_device with scsi_scan_target.
(Thus the rport instead of the scsi_host becomes parent of a
scsi_target again.)
(2) Avoid scsi_device allocation during registration of an remote port.
(Would be done during fc_scsi_scan_rport.)
(3) Fix queuecommand behaviour when an zfcp unit is blocked.
(Call scsi_done with DID_NO_CONNECT instead of returning
SCSI_MLQUEUE_DEVICE_BUSY otherwise we might end up waiting
for completion in blk_execute_rq for ever.)
Signed-off-by: Andreas Herrmann <aherrman@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Ralf Baechle [Wed, 8 Mar 2006 14:13:04 +0000 (14:13 +0000)]
[MIPS] Discard .exit.text at runtime.
At times gcc will place bits of __exit functions into .rodata. If
compiled into the kernle itself we used to discard .exit.text - but
not the bits left in .rodata. While harmless this did at times result
in a large number of warnings. So until gcc fixes this, discard
.exit.text at runtime.
Ralf Baechle [Thu, 2 Mar 2006 16:50:12 +0000 (16:50 +0000)]
[MIPS] Threaten removal of code for NEC DDB5074 and DDB5476 evaluation boards.
What: Support for NEC DDB5074 and DDB5476 evaluation boards.
When: June 2006
Why: Board specific code doesn't build anymore since ~2.6.0 and no
users have complained indicating there is no more need for these
boards. This should really be considered a last call.
Who: Ralf Baechle <ralf@linux-mips.org>
In the past I added an host attribute but unfortunately
I forgot to increase FC_HOST_NUM_ATTRS.
This is fixed with the patch. Otherwise an fibre channel
lld might run into
BUG_ON(count > FC_HOST_NUM_ATTRS);
in fc_attach_transport().
Signed-off-by: Andreas Herrmann <aherrman@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>