NeilBrown [Mon, 28 Nov 2005 21:44:09 +0000 (13:44 -0800)]
[PATCH] md: improve read speed to raid10 arrays using 'far copies'
raid10 has two different layouts. One uses near-copies (so multiple
copies of a block are at the same or similar offsets of different
devices) and the other uses far-copies (so multiple copies of a block
are stored a greatly different offsets on different devices). The point
of far-copies is that it allows the first section (normally first half)
to be layed out in normal raid0 style, and thus provide raid0 sequential
read performance.
Unfortunately, the read balancing in raid10 makes some poor decisions
for far-copies arrays and you don't get the desired performance. So
turn off that bad bit of read_balance for far-copies arrays.
With this patch, read speed of an 'f2' array is comparable with a raid0
with the same number of devices, though write speed is ofcourse still
very slow.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Rik van Riel [Mon, 28 Nov 2005 21:44:07 +0000 (13:44 -0800)]
[PATCH] temporarily disable swap token on memory pressure
Some users (hi Zwane) have seen a problem when running a workload that
eats nearly all of physical memory - th system does an OOM kill, even
when there is still a lot of swap free.
The problem appears to be a very big task that is holding the swap
token, and the VM has a very hard time finding any other page in the
system that is swappable.
Instead of ignoring the swap token when sc->priority reaches 0, we could
simply take the swap token away from the memory hog and make sure we
don't give it back to the memory hog for a few seconds.
This patch resolves the problem Zwane ran into.
Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Jackson [Mon, 28 Nov 2005 21:44:05 +0000 (13:44 -0800)]
[PATCH] cpuset fork locking fix
Move the cpuset_fork() call below the write_unlock_irq call in
kernel/fork.c copy_process().
Since the cpuset-dual-semaphore-locking-overhaul.patch, the cpuset_fork()
routine acquires task_lock(), so cannot be called while holding the
tasklist_lock for write.
Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now when pages_low is reached, we want to kick asynch reclaim, which gives us
an interval of "b" before we must start synch reclaim, and gives kswapd an
interval of "a" before it need go back to sleep.
When pages_min is reached, normal allocators must enter synch reclaim, but
PF_MEMALLOC, ALLOC_HARDER, and ALLOC_HIGH (ie. atomic allocations, recursive
allocations, etc.) get access to varying amounts of the reserve "c".
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: "Seth, Rohit" <rohit.seth@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] ext3: Wrong return value for EXT3_IOC_GROUP_ADD
This patch corrects the return value for the EXT3_IOC_GROUP_ADD in case it
fails due to the presence of multiple resizers at the filesystem.
The problem is a little bit more serious than a wrong return value in this
case, since the clause err=0 in the exit_journal path will lead to a call
to update_backups which in turns causes a NULL pointer dereference.
Signed-off-by: Glauber de Oliveira Costa <glommer@br.ibm.com> Cc: "Stephen C. Tweedie" <sct@redhat.com> Cc: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hirokazu Takata [Mon, 28 Nov 2005 21:43:58 +0000 (13:43 -0800)]
[PATCH] m32r: Fix sys_tas() syscall
This patch fixes a deadlock problem of the m32r SMP kernel.
In the m32r kernel, sys_tas() system call is provided as a test-and-set
function for userspace, for backward compatibility.
In some multi-threading application program, deadlocks were rarely caused
at sys_tas() funcion. Such a deadlock was caused due to a collision of
__pthread_lock() and __pthread_unlock() operations.
The "tas" syscall is repeatedly called by pthread_mutex_lock() to get a
lock, while a lock variable's value is not 0. On the other hand,
pthead_mutex_unlock() sets the lock variable to 0 for unlocking.
In the previous implementation of sys_tas() routine, there was a
possibility that a unlock operation was ignored in the following case:
- Assume a lock variable (*addr) was equal to 1 before sys_tas() execution.
- __pthread_unlock() operation is executed by the other processor
and the lock variable (*addr) is set to 0, between a read operation
("oldval = *addr;") and the following write operation ("*addr = 1;")
during a execution of sys_tas().
In this case, the following write operation ("*addr = 1;") overwrites the
__pthread_unlock() result, and sys_tas() fails to get a lock in the next
turn and after that.
According to the attatched patch, sys_tas() returns 0 value in the next
turn and deadlocks never happen.
Ben Collins [Mon, 28 Nov 2005 21:43:56 +0000 (13:43 -0800)]
[PATCH] Fix hardcoded cpu=0 in workqueue for per_cpu_ptr() calls
Tracked this down on an Ultra Enterprise 3000. It's a 6-way machine. Odd
thing about this machine (and it's good for finding bugs like this) is that
the CPU id's are not 0 based. For instance, on my machine the CPU's are
6/7/10/11/14/15.
This caused some NULL pointer dereference in kernel/workqueue.c because for
single_threaded workqueue's, it hardcoded the cpu to 0.
I changed the 0's to any_online_cpu(cpu_online_mask), which cpumask.h
claims is "First cpu in mask". So this fits the same usage.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oleg Drokin [Mon, 28 Nov 2005 21:43:53 +0000 (13:43 -0800)]
[PATCH] reiserfs: fix 32-bit overflow in map_block_for_writepage()
I now see another overflow in reiserfs that should lead to data corruptions
with files that are bigger than 4G under certain circumstances when using
mmap.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove bogus usage of test/set_bit() from fbcon rotation code and just
manipulate the bits directly. This fixes an oops on powerpc among others
and should be faster. Seems to work fine on the G5 here.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Antonino Daplas <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andrew Morton [Mon, 28 Nov 2005 21:43:47 +0000 (13:43 -0800)]
[PATCH] memory_sysdev_class is static
So don't define it as extern in the header file.
drivers/base/memory.c:28: error: static declaration of 'memory_sysdev_class' follows non-static declaration
include/linux/memory.h:88: error: previous declaration of 'memory_sysdev_class' was here
Ashok Raj [Mon, 28 Nov 2005 21:43:46 +0000 (13:43 -0800)]
[PATCH] clean up lock_cpu_hotplug() in cpufreq
There are some callers in cpufreq hotplug notify path that the lowest
function calls lock_cpu_hotplug(). The lock is already held during
cpu_up() and cpu_down() calls when the notify calls are broadcast to
registered clients.
Ideally if possible, we could disable_preempt() at the highest caller and
make sure we dont sleep in the path down in cpufreq->driver_target() calls
but the calls are so intertwined and cumbersome to cleanup.
Hence we consistently use lock_cpu_hotplug() and unlock_cpu_hotplug() in
all places.
- Removed export of cpucontrol semaphore and made it static.
- removed explicit uses of up/down with lock_cpu_hotplug()
so we can keep track of the the callers in same thread context and
just keep refcounts without calling a down() that causes a deadlock.
- Removed current_in_hotplug() uses
- Removed PF_HOTPLUG_CPU in sched.h introduced for the current_in_hotplug()
temporary workaround.
Tested with insmod of cpufreq_stat.ko, and logical online/offline
to make sure we dont have any hang situations.
Signed-off-by: Ashok Raj <ashok.raj@intel.com> Cc: Zwane Mwaikambo <zwane@linuxpower.ca> Cc: Shaohua Li <shaohua.li@intel.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Alan Stern [Mon, 28 Nov 2005 21:43:44 +0000 (13:43 -0800)]
[PATCH] Workaround for gcc 2.96 (undefined references)
LD .tmp_vmlinux1
mm/built-in.o(.text+0x100d6): In function `copy_page_range':
: undefined reference to `__pud_alloc'
mm/built-in.o(.text+0x1010b): In function `copy_page_range':
: undefined reference to `__pmd_alloc'
mm/built-in.o(.text+0x11ef4): In function `__handle_mm_fault':
: undefined reference to `__pud_alloc'
fs/built-in.o(.text+0xc930): In function `install_arg_page':
: undefined reference to `__pud_alloc'
make: *** [.tmp_vmlinux1] Error 1
Those missing references in mm/memory.c arise from this code in
include/linux/mm.h, combined with the fact that __PGTABLE_PMD_FOLDED and
__PGTABLE_PUD_FOLDED are both set and __ARCH_HAS_4LEVEL_HACK is not:
/*
* The following ifdef needed to get the 4level-fixup.h header to work.
* Remove it when 4level-fixup.h has been removed.
*/
#if defined(CONFIG_MMU) && !defined(__ARCH_HAS_4LEVEL_HACK)
static inline pud_t *pud_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address)
{
return (unlikely(pgd_none(*pgd)) && __pud_alloc(mm, pgd, address))?
NULL: pud_offset(pgd, address);
}
With my configuration the pgd_none and pud_none routines are inlines
returning a constant 0. Apparently the old compiler avoids generating
calls to __pud_alloc and __pmd_alloc but still lists them as undefined
references in the module's symbol table.
I don't know which change caused this problem. I think it was added
somewhere between 2.6.14 and 2.6.15-rc1, because I remember building
several 2.6.14-rc kernels without difficulty. However I can't point to an
individual culprit.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus Torvalds [Mon, 28 Nov 2005 22:34:23 +0000 (14:34 -0800)]
mm: re-architect the VM_UNPAGED logic
This replaces the (in my opinion horrible) VM_UNMAPPED logic with very
explicit support for a "remapped page range" aka VM_PFNMAP. It allows a
VM area to contain an arbitrary range of page table entries that the VM
never touches, and never considers to be normal pages.
Any user of "remap_pfn_range()" automatically gets this new
functionality, and doesn't even have to mark the pages reserved or
indeed mark them any other way. It just works. As a side effect, doing
mmap() on /dev/mem works for arbitrary ranges.
Trond Myklebust [Fri, 25 Nov 2005 22:10:11 +0000 (17:10 -0500)]
SUNRPC: Funny looking code in __rpc_purge_upcall
In __rpc_purge_upcall (net/sunrpc/rpc_pipe.c), the newer code to clean up
the in_upcall list has a typo.
Thanks to Vince Busam <vbusam@google.com> for spotting this!
Trond Myklebust [Fri, 25 Nov 2005 22:10:06 +0000 (17:10 -0500)]
NFS: Fix a spinlock recursion inside nfs_update_inode()
In cases where the server has gone insane, nfs_update_inode() may end
up calling nfs_invalidate_inode(), which again calls stuff that takes
the inode->i_lock that we're already holding.
In addition, given the sort of things we have in NFS these days that
need to be cleaned up on inode release, I'm not sure we should ever
be calling make_bad_inode().
Fix up spinlock recursion, and limit nfs_invalidate_inode() to clearing
the caches, and marking the inode as being stale.
Thanks to Steve Dickson <SteveD@redhat.com> for spotting this.
David Gibson [Thu, 24 Nov 2005 02:34:56 +0000 (13:34 +1100)]
[PATCH] powerpc: More hugepage boundary case fixes
Blah. The patch [0] I recently sent fixing errors with
in_hugepage_area() and prepare_hugepage_range() for powerpc itself has
an off-by-one bug. Furthermore, the related functions
touches_hugepage_*_range() and within_hugepage_*_range() are also
buggy. Some of the bugs, like those addressed in [0] originated with
commit 7d24f0b8a53261709938ffabe3e00f88f6498df9 where we tweaked the
semantics of where hugepages are allowed. Other bugs have been there
essentially forever, and are due to the undefined behaviour of '<<'
with shift counts greater than the type width (LOW_ESID_MASK could
return non-zero for high ranges with the right congruences).
The good news is that I now have a testsuite which should pick up
things like this if they creep in again.
Stephen Rothwell [Tue, 22 Nov 2005 01:05:26 +0000 (12:05 +1100)]
[PATCH] powerpc: remove arch/powerpc/include hack for 64 bit
With the removal of include/asm-powerpc, we no longer need
arch/powerpc/include/asm for the 64 bit build. We also do not need
-Iarch/powerpc for the 64 bit build either.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
Felix Blyakher [Fri, 25 Nov 2005 05:42:13 +0000 (16:42 +1100)]
[XFS] Tight loop in xfs_finish_reclaim_all prevented the xfslogd to run
its queue of IO completion callbacks, thus creating the deadlock between
umount and xfslogd. Breaking the loop solves the problem.
Jasper Spaans [Thu, 24 Nov 2005 15:53:36 +0000 (16:53 +0100)]
[PATCH] fbcon: fix obvious bug in fbcon logo rotation code
This code fixes a tiny problem with the recent fbcon rotation changes:
fb_prepare_logo doesn't check the return value of fb_find_logo and that
causes a crash for my while booting.
Dave Airlie [Thu, 24 Nov 2005 10:41:14 +0000 (21:41 +1100)]
drm: fix quiescent locking
A fix for a locking bug which is triggered when a client tries to lock with
flag DMA_QUIESCENT (typically the X server), but gets interrupted by a signal.
The locking IOCTL should then return an error, but if DMA_QUIESCENT succeeds
it returns 0, and the client falsely thinks it has the lock. In addition
The client waits for DMA_QUISCENT and possibly DMA_READY without having the lock.
From: Thomas Hellstrom Signed-off-by: Dave Airlie <airlied@linux.ie>
David Härdeman [Wed, 23 Nov 2005 23:45:49 +0000 (15:45 -0800)]
[PATCH] USB: fix USB key generates ioctl_internal_command errors issue
On Wed, Nov 16, 2005 at 06:34:24PM -0800, Pete Zaitcev wrote:
>On Wed, 16 Nov 2005 23:52:32 +0100, David Härdeman <david@2gen.com> wrote:
>> usb-storage: waiting for device to settle before scanning
>> Vendor: I0MEGA Model: UMni1GB*IOM2K4 Rev: 1.01
>> Type: Direct-Access ANSI SCSI revision: 02
>> SCSI device sda: 2048000 512-byte hdwr sectors (1049 MB)
>> sda: Write Protect is off
>> sda: Mode Sense: 00 00 00 00
>> sda: assuming drive cache: write through
>> ioctl_internal_command: <8 0 0 0> return code = 8000002
>> : Current: sense key=0x0
>> ASC=0x0 ASCQ=0x0
>> SCSI device sda: 2048000 512-byte hdwr sectors (1049 MB)
>
>I think it's harmless. I saw things like that, and initially I plugged
>them with workarounds like this:
Thanks for the pointer, and yes, it is harmless, but it floods the
console with the messages which hides other (potentially important)
messages...following your example I've made a patch which fixes the
problem.
This should fix a suspend/resume issues that appear with OHCI on some
PPC hardware. The PCI layer should doesn't have the hooks needed for
such ASIC-specific hooks (in this case, software clock gating), so
this moves the code to do that into hcd-pci.c ... where it can be
done after the relevant PCI PM state transition (to/from D3).
David Brownell [Wed, 23 Nov 2005 23:45:37 +0000 (15:45 -0800)]
[PATCH] USB: EHCI updates split init/reinit logic for resume
Moving the PCI-specific parts of the EHCI driver into their own file
created a few issues ... notably on resume paths which (like swsusp)
require re-initializing the controller. This patch:
- Splits the EHCI startup code into run-once HCD setup code and
separate "init the hardware" reinit code. (That reinit code is
a superset of the "early usb handoff" code.)
- Then it makes the PCI init code run both, and the resume code only
run the reinit code.
- It also removes needless pci wrappers around EHCI start/stop methods.
- Removes a byteswap issue that would be seen on big-endian hardware.
The HCD glue still doesn't actually provide a good way to do all this
run-one init stuff in one place though.
David Brownell [Wed, 23 Nov 2005 23:45:28 +0000 (15:45 -0800)]
[PATCH] USB: EHCI updates
This fixes some bugs in EHCI suspend/resume that joined us over the past
few releases (as usbcore, PCI, pmcore, and other components evolved):
- Removes suspend and resume recursion from the EHCI driver, getting
rid of the USB_SUSPEND special casing.
- Updates the wakeup mechanism to work again; there's a newish usbcore
call it needs to use.
- Provide simpler tests for "do we need to restart from scratch", to
address another case where PCI Vaux was lost. (In this case it was
restoring a swsusp snapshot, but there could be others.)
Un-exports a symbol that was temporarily exported.
A notable change from previous version is that this doesn't move
the spinlock init, so there's still a resume/reinit path bug.
Ian Abbott [Wed, 23 Nov 2005 23:45:23 +0000 (15:45 -0800)]
[PATCH] USB: ftdi_sio: new IDs for KOBIL devices
This patch adds two new devices to the ftdi_sio driver's device ID
table. The device IDs were supplied by Stefan Nies of KOBIL Systems for
two of their devices using the FTDI chip.
Damian Wrobel [Wed, 23 Nov 2005 23:45:17 +0000 (15:45 -0800)]
[PATCH] USB: SN9C10x driver - bad page state fix
This patch solves the following problem I've already discovered on the
latest 2.6.15-rc1-git1 kernel:
Nov 13 07:37:28 wrobel kernel: Bad page state at free_hot_cold_page (in process 'motion', page c164e020)
Nov 13 07:37:28 wrobel kernel: flags:0x40000400 mapping:00000000 mapcount:0 count:0
Nov 13 07:37:28 wrobel kernel: Backtrace:
Nov 13 07:37:28 wrobel kernel: [<c0146d86>] bad_page+0x85/0xbe
Nov 13 07:37:28 wrobel kernel: [<c0147629>] free_hot_cold_page+0x54/0x129
Nov 13 07:37:28 wrobel kernel: [<c01598c6>] __vunmap+0xa9/0xfe
Nov 13 07:37:28 wrobel kernel: [<c0154114>] vmalloc_to_page+0x34/0x55
Nov 13 07:37:28 wrobel kernel: [<c0159942>] vfree+0x27/0x35
Nov 13 07:37:28 wrobel kernel: [<f8a20292>] sn9c102_release_buffers+0x30/0x3f [sn9c102]
Nov 13 07:37:28 wrobel kernel: [<f8a231c2>] sn9c102_release+0x37/0xeb [sn9c102]
Nov 13 07:37:28 wrobel kernel: [<c0163e74>] __fput+0xa9/0x1aa
Nov 13 07:37:28 wrobel kernel: [<c01624f7>] filp_close+0x49/0x6d
Nov 13 07:37:30 wrobel kernel: [<c016258f>] sys_close+0x74/0x95
Nov 13 07:37:30 wrobel kernel: [<c0102ef9>] syscall_call+0x7/0xb
Nov 13 07:37:31 wrobel kernel: Trying to fix it up, but a reboot is needed
When attempting to hotadd a PCI card with a bridge on it, I saw
the kernel reporting resource collision errors even when there were
really no collisions. The problem is that the code doesn't skip
over "invalid" resources with their resource type flag not set.
Others have reported similar problems at boot time and for
non-bridge PCI card hotplug too, where the code flags a
resource collision for disabled ROMs. This patch fixes both
problems.
Rajesh Shah [Wed, 23 Nov 2005 23:44:54 +0000 (15:44 -0800)]
[PATCH] PCI Express Hotplug: clear sticky power-fault bit
Per the PCI Express spec, the power-fault-detected bit in the
slot status register can be set anytime hardware detects a power
fault, regardless of whether the slot has a device populated in
it or not. This bit is sticky and must be explicitly cleared.
This patch is needed to allow hot-add after such a power fault
has been detected.
Jean Delvare [Wed, 23 Nov 2005 23:44:31 +0000 (15:44 -0800)]
[PATCH] hwmon: Fix missing it87 fan div init
Fix a bug where setting the low fan speed limits will not work if no
data was ever read through the sysfs interface and the fan clock
dividers have not been explicitely set yet either. The reason is that
data->fan_div[nr] may currently be used before it is initialized from
the chip register values. The fix is to explicitely initialize
data->fan_div[nr] before using it.
Jean Delvare [Wed, 23 Nov 2005 23:44:26 +0000 (15:44 -0800)]
[PATCH] hwmon: Fix lm78 VID conversion
Fix the lm78 VID reading, which I accidentally broke while making
this driver use the common vid_from_reg function rather than
reimplementing its own in 2.6.14-rc1.
Jody McIntyre [Wed, 23 Nov 2005 23:44:03 +0000 (15:44 -0800)]
[PATCH] Clarify T: field in MAINTAINERS
Pavel Machek points out that for git repos, what we include is not
actually a URL. It is undesirable to use a URL since git repos can be
accessed in many different ways.
Olaf Rempel [Thu, 24 Nov 2005 03:04:08 +0000 (19:04 -0800)]
[BRIDGE]: recompute features when adding a new device
We must recompute bridge features everytime the list of underlying
devices changes, or we might end up with features that are not
supported by all devices (eg. NETIF_F_TSO)
This patch adds the missing recompute when adding a device to the bridge.
Signed-off-by: Olaf Rempel <razzor@kopf-tisch.de> Signed-off-by: David S. Miller <davem@davemloft.net>
net/ipv4/netfilter/ip_conntrack_netlink.c: In function 'ctnetlink_dump_table':
net/ipv4/netfilter/ip_conntrack_netlink.c:409: warning: implicit declaration of function 'local_bh_disable'
net/ipv4/netfilter/ip_conntrack_netlink.c:427: warning: implicit declaration of function 'local_bh_enable'
Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David Gibson [Wed, 23 Nov 2005 21:37:45 +0000 (13:37 -0800)]
[PATCH] powerpc: fix for hugepage areas straddling 4GB boundary
Commit 7d24f0b8a53261709938ffabe3e00f88f6498df9 fixed bugs in the ppc64 SLB
miss handler with respect to hugepage handling, and in the process tweaked
the semantics of the hugepage address masks in mm_context_t.
Unfortunately, it left out a couple of necessary changes to go with that
change. First, the in_hugepage_area() macro was not updated to match,
second prepare_hugepage_range() was not updated to correctly handle
hugepages regions which straddled the 4GB point.
The latter appears only to cause process-hangs when attempting to map such
a region, but the former can cause oopses if a get_user_pages() is
triggered at the wrong point. This patch addresses both bugs.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If unregister_console() is inadvertently called while no consoles are
registered, it will crash trying to dereference NULL pointer. It is
necessary to fix that because register_console() provides no indication
that it actually registered the console passed in. In fact, it may well
decide not to register it based on various things...
(akpm: It'd be better to make register_console() return something and fix the
callers. All 106 of them...)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jim Keniston [Wed, 23 Nov 2005 21:37:42 +0000 (13:37 -0800)]
[PATCH] kprobes: Fix return probes on sys_execve
Fix a bug in kprobes that can cause an Oops or even a crash when a return
probe is installed on one of the following functions: sys_execve,
do_execve, load_*_binary, flush_old_exec, or flush_thread. The fix is to
remove the call to kprobe_flush_task() in flush_thread(). This fix has
been tested on all architectures for which the return-probes feature has
been implemented (i386, x86_64, ppc64, ia64). Please apply.
BACKGROUND
Up to now, we have called kprobe_flush_task() under two situations: when a
task exits, and when it execs. Flushing kretprobe_instances on exit is
correct because (a) do_exit() doesn't return, and (b) one or more
return-probed functions may be active when a task calls do_exit(). Neither
is the case for sys_execve() and its callees.
Initially, the mistaken call to kprobe_flush_task() on exec was harmless
because we put the "real" return address of each active probed function
back in the stack, just to be safe, when we recycled its
kretprobe_instance. When support for ppc64 and ia64 was added, this safety
measure couldn't be employed, and was eventually dropped even for i386 and
x86_64. sys_execve() and its callees were informally blacklisted for
return probes until this fix was developed.
Acked-by: Prasanna S Panchamukhi <prasanna@in.ibm.com> Signed-off-by: Jim Keniston <jkenisto@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Wed, 23 Nov 2005 21:37:40 +0000 (13:37 -0800)]
[PATCH] mm: fill arch atomic64 gaps
alpha, sparc64, x86_64 are each missing some primitives from their atomic64
support: fill in the gaps I've noticed by extrapolating asm, follow the
groupings in each file. But powerpc and parisc still lack atomic64.
Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: "David S. Miller" <davem@davemloft.net> Cc: Andi Kleen <ak@muc.de> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Wed, 23 Nov 2005 21:37:39 +0000 (13:37 -0800)]
[PATCH] mm: powerpc ptlock comments
Update comments (only) on page_table_lock and mmap_sem in arch/powerpc.
Removed the comment on page_table_lock from hash_huge_page: since it's no
longer taking page_table_lock itself, it's irrelevant whether others are; but
how it is safe (even against huge file truncation?) I can't say.
Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Wed, 23 Nov 2005 21:37:38 +0000 (13:37 -0800)]
[PATCH] mm: unbloat get_futex_key
The follow_page changes in get_futex_key have left it with two almost
identical blocks, when handling the rare case of a futex in a nonlinear vma.
get_user_pages will itself do that follow_page, and its additional
find_extend_vma is hardly any overhead since the vma is already cached. Let's
just delete the follow_page block and let get_user_pages do it.
Hugh Dickins [Wed, 23 Nov 2005 21:37:37 +0000 (13:37 -0800)]
[PATCH] mm: update split ptlock Kconfig
Closer attention to the arithmetic shows that neither ppc64 nor sparc really
uses one page for multiple page tables: how on earth could they, while
pte_alloc_one returns just a struct page pointer, with no offset?
Well, arm26 manages it by returning a pte_t pointer cast to a struct page
pointer, harumph, then compensating in its pmd_populate. But arm26 is never
SMP, so it's not a problem for split ptlock either.
And the PA-RISC situation has been recently improved: CONFIG_PA20 works
without the 16-byte alignment which inflated its spinlock_t. But the current
union of spinlock_t with private does make the 7xxx struct page significantly
larger, even without debug, so disable its split ptlock.
Changing the #define to an inline function breaks on non-SMP builds,
since wuite a few places in the kernel do not implement the ipi handler
when compiling for UP.
Dave Airlie [Wed, 23 Nov 2005 11:12:59 +0000 (22:12 +1100)]
drm: move is_pci to the end of the structure
We memset the structure across opens except for the flags. The correct
fix is more intrusive but this should fix a problem with bad iounmaps
seen on AGP radeons acting like PCI ones.
Dave Airlie [Wed, 23 Nov 2005 10:45:43 +0000 (21:45 +1100)]
I think that if a PCI bus is a root bus, attached to a host bridge not a
PCI->PCI bridge, then bus->self is allowed to be NULL. Certainly that's
the case on my Pegasos, and it makes the MGA DRM driver oops...
Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Dave Airlie <airlied@linux.ie>
Linus Torvalds [Wed, 23 Nov 2005 03:39:30 +0000 (19:39 -0800)]
Fix up GFP_ZONEMASK for GFP_DMA32 usage
There was some confusion about the different zone usage, this should fix
up the resulting mess in the GFP zonemask handling.
The different zone usage is still confusing (it's very easy to mix up
the individual zone numbers with the GFP zone _list_ numbers), so we
might want to clean up some of this in the future, but in the meantime
this should fix the actual problems.
Neil Horman [Tue, 22 Nov 2005 22:56:32 +0000 (14:56 -0800)]
[NET]: Fix ifenslave to not fail on lack of IP information
Patch to ifenslave so that under older ABI versions, a failure to propogate ip
information from master to slave does not result in a filure to enslave the
slave device.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Remove proto == NULL checking since ip_conntrack_[nat_]proto_find_get
always returns a valid pointer.
Fix missing ip_conntrack_proto_put in some paths.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Tue, 22 Nov 2005 22:47:37 +0000 (14:47 -0800)]
[IPV4]: Fix secondary IP addresses after promotion
This patch fixes the problem with promoting aliases when:
a) a single primary and > 1 secondary addresses
b) multiple primary addresses each with at least one secondary address
Based on earlier efforts from Brian Pomerantz <bapper@piratehaven.org>,
Patrick McHardy <kaber@trash.net> and Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Zippel [Tue, 22 Nov 2005 05:32:38 +0000 (21:32 -0800)]
[PATCH] prefer pkg-config for the QT check
This makes pkg-config now the prefered way to configure QT and properly
fixes the recent Fedora breakage and leaves the old QT detection as
fallback mechanism.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] device-mapper raid1: drop mark_region spinlock fix
The spinlock region_lock is held while calling mark_region which can sleep.
Drop the spinlock before calling that function.
A region's state and inclusion in the clean list are altered by rh_inc and
rh_dec. The state variable is set to RH_CLEAN in rh_dec, but only if
'pending' is zero. It is set to RH_DIRTY in rh_inc, but not if it is already
so. The changes to 'pending', the state, and the region's inclusion in the
clean list need to be atomicly.
Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>