Tom Tucker [Tue, 13 May 2008 14:16:05 +0000 (09:16 -0500)]
svcrdma: Verify read-list fits within RPCSVC_MAXPAGES
A RDMA read-list cannot contain more elements than RPCSVC_MAXPAGES or
it will overflow the DTO context. Verify this when processing the
protocol header.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Tom Tucker [Wed, 7 May 2008 20:47:42 +0000 (15:47 -0500)]
svcrdma: Change svc_rdma_send_error return type to void
The svc_rdma_send_error function is called when an RPCRDMA protocol
error is detected. This function attempts to post an error reply message.
Since an error posting to a transport in error is ignored, change
the return type to void.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Tom Tucker [Wed, 7 May 2008 18:52:42 +0000 (13:52 -0500)]
svcrdma: Copy transport address and arm CQ before calling rdma_accept
This race was found by inspection. Messages can be received from the peer
immediately following the rdma_accept call, however, the CQ have not yet
been armed and the transport address has not yet been set.
Set the transport address in the connect request handler and arm the CQ
prior to calling rdma_accept.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Tom Tucker [Wed, 7 May 2008 18:49:58 +0000 (13:49 -0500)]
svcrdma: Set rqstp transport address in rdma_read_complete function
The rdma_read_complete function needs to copy the rqstp transport address
from the transport. Failure to do so can result in using the wrong
authentication method for the RPC or bug checking if the rqstp address
is not valid.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Tom Tucker [Thu, 1 May 2008 16:13:50 +0000 (11:13 -0500)]
svcrdma: Move the QP and cm_id destruction to svc_rdma_free
Move the destruction of the QP and CM_ID to the free path so that the
QP cleanup code doesn't race with the dto_tasklet handling flushed WR.
The QP reference is not needed because we now have a reference for
every WR.
Also add a guard in the SQ and RQ completion handlers to ignore
calls generated by some providers when the QP is destroyed.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Tom Tucker [Thu, 1 May 2008 03:00:46 +0000 (22:00 -0500)]
svcrdma: Move destroy to kernel thread
Some providers may wait while destroying adapter resources.
Since it is possible that the last reference is put on the
dto_tasklet, the actual destroy must be scheduled as a work item.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Tom Tucker [Tue, 6 May 2008 16:49:05 +0000 (11:49 -0500)]
svcrdma: Shrink scope of spinlock on RQ CQ
The rq_cq_reap function is only called from the dto_tasklet. The
only resource shared with other threads is the sc_rq_dto_q. Move the
spin lock to protect only this list.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Tom Tucker [Thu, 1 May 2008 01:44:39 +0000 (20:44 -0500)]
svcrdma: Use standard Linux lists for context cache
Replace the one-off linked list implementation used to implement the
context cache with the standard Linux list_head lists. Add a context
counter to catch resource leaks. A WARN_ON will be added later to
ensure that we've freed all contexts.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
An NFS_WRITE requires a set of RDMA_READ requests to fetch the write
data from the client. There are two principal pieces of data that
need to be tracked: the list of pages that comprise the completed RPC
and the SGE of dma mapped pages to refer to this list of pages. Previously
this whole bit was managed as a linked list of contexts with the
context containing the page list buried in this list. This patch
simplifies this processing by not keeping a linked list, but rather only
a pionter from the last submitted RDMA_READ's context to the context
that maps the set of pages that describe the RPC. This significantly
simplifies this code path. SGE contexts are cleaned up inline in the DTO
path instead of at read completion time.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Tom Tucker [Tue, 6 May 2008 15:04:50 +0000 (10:04 -0500)]
svcrdma: Return error from rdma_read_xdr so caller knows to free context
The rdma_read_xdr function did not discriminate between no read-list and
an error posting the read-list. This results in a leak of a page if there
is an error posting the read-list.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Tom Tucker [Tue, 6 May 2008 14:45:54 +0000 (09:45 -0500)]
svcrdma: Fix error handling during listening endpoint creation
A listening endpoint isn't known to the generic transport switch until
the svc_create_xprt function returns without error. Calling
svc_xprt_put within the xpo_create function causes the module reference
count to be erroneously decremented.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Tom Tucker [Fri, 25 Apr 2008 23:08:59 +0000 (18:08 -0500)]
svcrdma: Free context on post_recv error in send_reply
If an error is encountered trying to post a recv buffer in send_reply,
free the passed in context. Return an error to the caller so it is
aware that the request was not posted.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Tom Tucker [Fri, 25 Apr 2008 19:11:31 +0000 (14:11 -0500)]
svcrdma: Free context on ib_post_recv error
If there is an error posting the recv WR to the RQ, free the
context associated with the WR. This would leak a context when
asynchronous errors occurred on the transport while conccurent threads
were processing their RPC.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Tom Tucker [Thu, 24 Apr 2008 19:17:21 +0000 (14:17 -0500)]
svcrdma: Add put of connection ESTABLISHED reference in rdma_cma_handler
The svcrdma transport takes a reference when it gets the ESTABLISHED
event from the provider. This reference is supposed to be removed when
the DISCONNECT event is received, however, the call to svc_xprt_put
was missing in the switch statement. This results in the memory
associated with the transport never being freed.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Tom Tucker [Fri, 25 Apr 2008 20:51:27 +0000 (15:51 -0500)]
svcrdma: Fix return value in svc_rdma_send
Fix the return value on close to -ENOTCONN so caller knows to free context.
Also if a thread is waiting for free SQ space, check for close when waking
to avoid posting WR to a closing transport.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Tom Tucker [Tue, 6 May 2008 16:33:11 +0000 (11:33 -0500)]
svcrdma: Fix race with dto_tasklet in svc_rdma_send
The svc_rdma_send function will attempt to reap SQ WR to make room for
a new request if it finds the SQ full. This function races with the
dto_tasklet that also reaps SQ WR. To avoid polling and arming the CQ
unnecessarily move the test_and_clear_bit of the RDMAXPRT_SQ_PENDING
flag and arming of the CQ to the sq_cq_reap function.
Refactor the rq_cq_reap function to match sq_cq_reap so that the
code is easier to follow.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Tom Tucker [Wed, 23 Apr 2008 21:49:54 +0000 (16:49 -0500)]
svcrdma: Simplify receive buffer posting
The svcrdma transport provider currently allocates receive buffers
to the RQ through the xpo_release_rqst method. This approach is overly
complicated since it means that the rqstp rq_xprt_ctxt has to be
selectively set based on whether the RPC is going to be processed
immediately or deferred. Instead, just post the receive buffer when
we are certain that we are replying in the send_reply function.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Tom Tucker [Fri, 25 Apr 2008 16:07:10 +0000 (11:07 -0500)]
svc: Remove extra check for XPT_DEAD bit in svc_xprt_enqueue
Remove a redundant check for the XPT_DEAD bit in the svc_xprt_enqueue
function. This same bit is checked below while holding the pool lock
and prints a debug message if found to be dead.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Commit f15364bd4cf8799a7677b6daeed7b67d9139d974 ("IPv6 support for NFS
server export caches") dropped a couple spaces, rendering the output
here difficult to read.
(However note that we expect the output to be parsed only by humans, not
machines, so this shouldn't have broken any userland software.)
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Linus Torvalds [Sun, 18 May 2008 20:56:54 +0000 (13:56 -0700)]
Merge branch 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6
* 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6:
i2c/max6875: Really prevent 24RF08 corruption
i2c-amd756: Fix functionality flags
i2c: Kill the old driver matching scheme
i2c: Convert remaining new-style drivers to use module aliasing
i2c: Switch pasemi to the new device/driver matching scheme
i2c: Clean up Blackfin BF527 I2C device declarations
i2c-nforce2: Disable the second SMBus channel on the DFI Lanparty NF4 Expert
i2c: New co-maintainer
Add multi_defconfig, to build a kernel for all supported m68k platforms,
excluding Sun 3 (Sun 3 kernels are incompatible with all other m68k platforms)
The *_ISA type defines are quite generic and cause namespace conflicts
(e.g. with `AMIGAHW_DECLARE(GG2_ISA)' in <asm/amigahw.h>) for some kernel
configurations. Use ISA_TYPE_* to avoid such conflicts.
Use `__builtin_trap()' instead of `asm volatile("illegal")' in the m68k BUG()
macros (as suggested by Andrew Pinski), to kill warnings in code that assumes
BUG() does not return.
m68k: FB_HP300 depends on DIO and doesnt need FB_CFB_FILLRECT
Correct FB_HP300 dependencies:
- FB_HP300 doesn't depend only on HP300, but also on DIO (which depends on
HP300)
- FB_HP300 does not need FB_CFB_FILLRECT
Jean Delvare [Sun, 18 May 2008 18:49:41 +0000 (20:49 +0200)]
i2c/max6875: Really prevent 24RF08 corruption
i2c-core takes care of the possible corruption of 24RF08 chips for
quite some times, so device devices no longer need to do it. And they
really should not, as applying the prevention twice voids it.
I thought that I had fixed all drivers long ago but apparently I had
missed that one.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Ben Gardner <bgardner@wabtec.com>
Jean Delvare [Sun, 18 May 2008 18:49:40 +0000 (20:49 +0200)]
i2c: Convert remaining new-style drivers to use module aliasing
Update all the remaining new-style i2c drivers to use standard module
aliasing instead of the old driver_name/type driver matching scheme.
Note that the tuner driver is a bit quirky at the moment, as it
overwrites i2c_client.name with arbitrary strings. We write "tuner"
back on remove, to make sure that driver cycling will work properly,
but there may still be troublesome corner cases.
Jean Delvare [Sun, 18 May 2008 18:49:40 +0000 (20:49 +0200)]
i2c-nforce2: Disable the second SMBus channel on the DFI Lanparty NF4 Expert
There is a strange chip at 0x2e on the second SMBus channel of the
DFI Lanparty NF4 Expert motherboard. Accessing the chip reboots the
system. As there's nothing interesting on this SMBus channel, the
easiest and safest thing to do is to disable it on that board.
This is a better fix to bug #5889 than the it87 driver update that was
done originally:
http://bugzilla.kernel.org/show_bug.cgi?id=5889
Jean Delvare [Sun, 18 May 2008 18:49:40 +0000 (20:49 +0200)]
i2c: New co-maintainer
Ben Dooks agreed to become my co-maintainer for the i2c subsystem. In
particular, Ben will help with drivers for embedded systems, of which
my experience is inexistent. Thanks Ben and welcome on board!
Signed-off-by: Jean Delvare <khali@linux-fr.org> Acked-by: Ben Dooks <ben-linux@fluff.org>
[ARM] fix parenthesis in include/asm-arm/arch-omap/control.h
Parenthesis fix in include/asm-arm/arch-omap/control.h
Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Cc: Paul Walmsley <paul@pwsan.com> Cc: Tony Lindgren <tony@atomide.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Michael Abbott [Wed, 14 May 2008 23:29:24 +0000 (16:29 -0700)]
[ARM] colibri: fix support for DM9000 ethernet device
Two changes are necessary to enable proper operation of the DM9000 device with
the Colibri PXA 270 board: firstly, the IRQ type needs to be configured for
rising edge interrupts, and secondly this configuration needs to be
communicated through to the DM9000.
[akpm@linux-foundation.org: remove set_irq_type() call as per ben-linux request] Signed-off-by: Michael Abbott <michael.abbott@diamond.ac.uk> Cc: Daniel Mack <daniel@caiaq.org> Cc: Jeff Garzik <jeff@garzik.org> Cc: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[ARM] 5037/1: Orion: fix DNS323/Kurobox Pro PCI initialisation
Whereas most Orion 5x machine support code would initialise the PCI
subsystem with nr_controllers in their struct hw_pci set to 2, the
DNS323 and Kurobox Pro machine support code had nr_controllers set
to 1.
This was presumably done because on those two machines, the PCI(-X)
controller (nr == 1) isn't used, requiring initialisation of only
the PCIe controller (nr == 0.) However, not initialising the PCI(-X)
controller on boards that don't use it leads to a situation where
both the PCIe and the PCI(-X) controller think that their root bus is
zero, and it messes up IRQ assignment.
This patch changes the DNS323 and Kurobox Pro support code to always
use nr_controllers == 2.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[ARM] 5034/1: fix arm{925,926,940,946} dma_flush_range() in WT mode
The CPU's dma_flush_range() operation needs to clean+invalidate the
given memory area if the cache is in writeback mode, or do just the
invalidate part if the cache is in writethrough mode, but the current
proc-arm{925,926,940,946} (incorrectly) do a cache clean in the
latter case. This patch fixes that.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Linus Torvalds [Sat, 17 May 2008 21:21:43 +0000 (14:21 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: disable mwait for AMD family 10H/11H CPUs
x86: fix crash on cpu hotplug on pat-incapable machines
x86: remove mwait capability C-state check
It depends on the CPU. For AMD CPUs that support MWAIT this is wrong.
Family 0x10 and 0x11 CPUs will enter C1 on HLT. Powersavings then
depend on a clock divisor and current Pstate of the core.
If all cores of a processor are in halt state (C1) the processor can
enter the C1E (C1 enhanced) state. If mwait is used this will never
happen.
Thus HLT saves more power than MWAIT here.
It might be best to switch off the mwait flag for these AMD CPU
families like it was introduced with commit f039b754714a422959027cb18bb33760eb8153f0 (x86: Don't use MWAIT on AMD
Family 10)
Re-add the AMD families 10H/11H check and disable the mwait usage for
those.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Avi Kivity [Wed, 14 May 2008 09:20:32 +0000 (12:20 +0300)]
x86: fix crash on cpu hotplug on pat-incapable machines
pat_disable() is __init, which means it goes away after booting is complete.
Unfortunately it is used by the hotplug code if the machine is not
pat-capable, causing a crash.
Fix by marking pat_disable() as __cpuinit.
Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Wed, 14 May 2008 06:47:40 +0000 (08:47 +0200)]
x86: remove mwait capability C-state check
Vegard Nossum reports:
| powertop shows between 200-400 wakeups/second with the description
| "<kernel IPI>: Rescheduling interrupts" when all processors have load (e.g.
| I need to run two busy-loops on my 2-CPU system for this to show up).
|
| The bisect resulted in this commit:
|
| commit 0c07ee38c9d4eb081758f5ad14bbffa7197e1aec
| Date: Wed Jan 30 13:33:16 2008 +0100
|
| x86: use the correct cpuid method to detect MWAIT support for C states
remove the functional effects of this patch and make mwait unconditional.
A future patch will turn off mwait on specific CPUs where that causes
power to be wasted.
Linus Torvalds [Fri, 16 May 2008 01:28:46 +0000 (18:28 -0700)]
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[POWERPC] macintosh: Replace deprecated __initcall with device_initcall
[POWERPC] cell: Fix section mismatches in io-workarounds code
[POWERPC] spufs: Fix compile error
[POWERPC] Fix uninitialized variable bug in copy_{to|from}_user
[POWERPC] Add null pointer check to of_find_property
[POWERPC] vmemmap fixes to use smaller pages
[POWERPC] spufs: Fix pointer reference in find_victim
[POWERPC] 85xx: SBC8548 - Add flash support and HW Rev reporting
[POWERPC] 85xx: Fix some sparse warnings for 85xx MDS
[POWERPC] 83xx: Enable DMA engine on the MPC8377 MDS board.
[POWERPC] 86xx: mpc8610_hpcd: fix second serial port
[POWERPC] 86xx: mpc8610_hpcd: add support for NOR and NAND flashes
[POWERPC] 85xx: Add 8568 PHY workarounds to board code
[POWERPC] 86xx: mpc8610_hpcd: use ULI526X driver for on-board ethernet
Linus Torvalds [Fri, 16 May 2008 01:28:28 +0000 (18:28 -0700)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
jbd2: update transaction t_state to T_COMMIT fix
ext4: Retry block allocation if new blocks are allocated from system zone.
ext4: mballoc fix mb_normalize_request algorithm for 1KB block size filesystems
ext4: fix typos in messages and comments (journalled -> journaled)
ext4: fix synchronization of quota files in journal=data mode
ext4: Fix mount messages when quota disabled
ext4: correct mount option parsing to detect when quota options can be changed
Linus Torvalds [Fri, 16 May 2008 00:50:37 +0000 (17:50 -0700)]
Clean up 'print_fn_descriptor_symbol()' types
Everybody wants to pass it a function pointer, and in fact, that is what
you _must_ pass it for it to make sense (since it knows that ia64 and
ppc64 use descriptors for function pointers and fetches the actual
address from there).
So don't make the argument be a 'unsigned long' and force everybody to
add a cast.
Mingming Cao [Thu, 15 May 2008 18:46:17 +0000 (14:46 -0400)]
jbd2: update transaction t_state to T_COMMIT fix
Updating the current transaction's t_state is protected by j_state_lock. We
need to do the same when updating the t_state to T_COMMIT.
Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ext4: Retry block allocation if new blocks are allocated from system zone.
If the block allocator gets blocks out of system zone ext4 calls
ext4_error. But if the file system is mounted with errors=continue
retry block allocation. We need to mark the system zone blocks as
in use to make sure retry don't pick them again
System zone is the block range mapping block bitmap, inode bitmap and inode
table.
Ingo Molnar [Wed, 14 May 2008 15:11:46 +0000 (17:11 +0200)]
tty: fix BKL related leak and crash
Enabling the BKL to be lockdep tracked uncovered the following
upstream kernel bug in the tty code, which caused a BKL
reference leak:
================================================
[ BUG: lock held when returning to user space! ]
------------------------------------------------
dmesg/3121 is leaving the kernel with locks still held!
1 lock held by dmesg/3121:
#0: (kernel_mutex){--..}, at: [<c02f34d9>] opost+0x24/0x194
this might explain some of the atomicity warnings and crashes
that -tip tree testing has been experiencing since the BKL
was converted back to a spinlock.
The patch aims to fix a performance issue for the syscall
personality(PER_LINUX32).
On IA-64 box, the syscall personality (PER_LINUX32) has poor performance
because it failed to find the Linux/x86 execution domain. Then it tried
to load the kernel module however it failed always and it used the default
execution domain PER_LINUX instead. Requesting kernel modules is very
expensive. It caused the performance issue. (see the function
lookup_exec_domain in kernel/exec_domain.c).
To resolve the issue, execution domain Linux/x86 is always registered in
initialization time for IA-64 architecture.
Signed-off-by: Xiaolan Huang <xiaolan.huang@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
Linus Torvalds [Thu, 15 May 2008 16:10:13 +0000 (09:10 -0700)]
Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
[S390] show_interrupts: prevent cpu hotplug when walking cpu_online_map.
[S390] smp: __smp_call_function_map vs cpu_online_map fix.
[S390] tape: Use ccw_dev_id to build cdev_id.
[S390] dasd: fix timeout handling in interrupt handler
[S390] s390dbf: Use const char * for dbf name.
[S390] dasd: Use const in busid functions.
[S390] blacklist.c: removed duplicated include
[S390] vmlogrdr: module initialization function should return negative errors
[S390] sparsemem vmemmap: initialize memmap.
[S390] Remove last traces of cio_msg=.
[S390] cio: Remove CCW_CMD_SUSPEND_RECONN in front of CCW_CMD_SET_PGID.
Heiko Carstens [Thu, 15 May 2008 14:52:38 +0000 (16:52 +0200)]
[S390] smp: __smp_call_function_map vs cpu_online_map fix.
Both smp_call_function() and __smp_call_function_map() access
cpu_online_map. Both functions run with preemption disabled which
protects for cpus going offline. However new cpus can be added and
therefore the cpu_online_map can change unexpectedly.
So use the call_lock to protect against changes to the cpu_online_map
in start_secondary() and all smp_call_* functions.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Stefan Weinhuber [Thu, 15 May 2008 14:52:36 +0000 (16:52 +0200)]
[S390] dasd: fix timeout handling in interrupt handler
When the dasd_int_handler is called with an error code instead of
an irb, the associated request should be restarted. This handling
was missing from the -ETIMEDOUT case. In fact it should be done in
any case.
Signed-off-by: Stefan Weinhuber <wein@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>