Bob Wilson [Sat, 2 Feb 2008 00:56:32 +0000 (16:56 -0800)]
[XTENSA] Fix makefile to work with binutils-2.18.
When building with binutils-2.18, vmlinux includes .note.gnu.build-id
sections that need to be stripped out when building the binary image.
The old .xt.insn sections haven't been used for a long time, so don't
bother stripping them.
Signed-off-by: Bob Wilson <bob.wilson@acm.org> Signed-off-by: Chris Zankel <chris@zankel.net>
Chris Zankel [Tue, 22 Jan 2008 08:45:25 +0000 (00:45 -0800)]
[XTENSA] Fix register corruption for certain processor configurations
For processor configurations that have optional registers
(compiler-used but non-coprocessor), user space registers
might get corrupted when there are only 4 registers in
the current window-frame, ie. register a4 belongs to the
oldest frame in the register file.
Chris Zankel [Sat, 19 Jan 2008 00:15:29 +0000 (16:15 -0800)]
[XTENSA] Fix cache flush macro for D$/I$ aliasing/non-aliasing
For configurations that have aliasing in the data cache but
not in the instruction cache, we don't need to flush the
instruction cache. Thus, we didn't define the macros to
flush the instruction cache. Some cache-flush functions,
howerver, were using those macros.
Chris Zankel [Fri, 11 Jan 2008 19:44:17 +0000 (11:44 -0800)]
[XTENSA] Add support for the sa_restorer function
Supporting the sa_restorer function allows for better security
since the sigreturn system call doesn't need to be placed on
the stack, so the stack doesn't need to be executable. This
requires support from the c-library as it has to provide the
restorer function.
Chris Zankel [Tue, 12 Feb 2008 21:17:07 +0000 (13:17 -0800)]
[XTENSA] Add support for configurable registers and coprocessors
The Xtensa architecture allows to define custom instructions and
registers. Registers that are bound to a coprocessor are only
accessible if the corresponding enable bit is set, which allows
to implement a 'lazy' context switch mechanism. Other registers
needs to be saved and restore at the time of the context switch
or during interrupt handling.
This patch adds support for these additional states:
- save and restore registers that are used by the compiler upon
interrupt entry and exit.
- context switch additional registers unbound to any coprocessor
- 'lazy' context switch of registers bound to a coprocessor
- ptrace interface to provide access to additional registers
- update configuration files in include/asm-xtensa/variant-fsf
Bob Wilson [Tue, 16 Oct 2007 23:43:00 +0000 (16:43 -0700)]
[XTENSA] Clean up stat structs.
Avoid using typedefs for stat fields.
Make stat64.st_blocks an unsigned long long to avoid endian-specific
padding with 32-bit values.
Clean up signed vs. unsigned and int vs. long types to be consistent
with other uses of these values.
Signed-off-by: Bob Wilson <bob.wilson@acm.org> Signed-off-by: Chris Zankel <chris@zankel.net>
Chris Zankel [Tue, 12 Feb 2008 21:10:40 +0000 (13:10 -0800)]
[XTENSA] Remove unused code
We will never (need to) support signal handling coming from a
double exception. There are too many things that could go wrong
and delivering signals is not the fastest method for IPC, anyway.
Chris Zankel [Tue, 8 Jan 2008 00:42:21 +0000 (16:42 -0800)]
[XTENSA] Fix modules for non-exec processor configurations
We need to use vmalloc_exec for module loading. Also remove
the definitions MODULE_START and MODULE_END, which wasn't
used, and increase the VMALLOC memory range accordingly.
Chris Zankel [Wed, 14 Nov 2007 21:47:02 +0000 (13:47 -0800)]
[XTENSA] Add missing a2 register restore in register spill routine
Register a2 is saved in depc but wasn't getting restored before
returning from _spill_registers when there weren't any registers
to spill. The mask to cut the top bit from the rotated WINDOWMASK
register was also one bit short.
Move boot-redboot load address from 0xD0200000 to 0xD1000000
to make space for larger kernel images, in particular those with
an embedded initramfs filesystem.
Also properly set the ELF start address in boot-elf images so
that PC need not be set manually when loading them using GDB.
Chris Zankel [Tue, 12 Feb 2008 19:55:32 +0000 (11:55 -0800)]
[XTENSA] Clean up elf-gregset.
Remove additional registers from the ELF gregset structure that
are only used by the kernel or are not required or invalid in
user-space. The ar registers are always aligned to a windowbase
value of 0, and the WB register is always assumed to be 0.
Increase the size of the structure to 128 entries. This will
provide enough space in future.
Chris Zankel [Tue, 23 Oct 2007 17:58:53 +0000 (10:58 -0700)]
[XTENSA] Fix clobbered register in asm macro
We dangerously re-used an input operand to an asm macro
without defining a constraint. By defining a separate
output operand (instead of input/output operand), the
compiler is more flexible during register allocation.
Marc Gauthier [Fri, 21 Sep 2007 23:38:09 +0000 (16:38 -0700)]
[XTENSA] Prevent inlining ISS platform asm constructs
The simcall asm macro assumes Windowed ABI parameter passing
in registers, and doesn't work if its containing function gets
inlined. This fix prevents that from happening.
Chris Zankel [Thu, 6 Sep 2007 08:38:18 +0000 (01:38 -0700)]
[XTENSA] Flush the page-address in update-mmu instead of user-address
The TLB entry for the user address doesn't exist at the time we
want to flush the caches, so use the page address. Note that processor
configurations with cache-aliasing issues are treated separately.
Chris Zankel [Thu, 14 Feb 2008 00:44:19 +0000 (16:44 -0800)]
[XTENSA] Add .literal sections for various init sectiont to linker script
Xtensa requires separate .literal section for each .text section.
Adding addition init sections for cpuinit, meminit, and devinit,
broke the Xtensa linker script, so, add these literal sections
manually for now.
Adrian Bunk [Sat, 8 Dec 2007 01:14:47 +0000 (17:14 -0800)]
[XTENSA] Remove dead code reported by Robert P. J. Day.
Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Christian Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Chris Zankel [Fri, 18 Jan 2008 22:26:55 +0000 (14:26 -0800)]
[XTENSA] Remove duplicate includes.
Signed-off-by: Lucas Woods <woodzy@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Christian Zankel <chris@zankel.net>
Linus Torvalds [Tue, 12 Feb 2008 04:52:01 +0000 (20:52 -0800)]
WMI: initialize wmi_blocks.list even if ACPI is disabled
Even if we don't want to register the WMI driver, we should initialize
the wmi_blocks list to be empty, since we don't want the wmi helper
functions to oops just because that basic list has not even been set up.
With this, "find_guid()" will happily return "not found" rather than
oopsing all over the place, and the callers will then just automatically
return false or AE_NOT_FOUND as appropriate.
Roland McGrath [Mon, 11 Feb 2008 22:38:51 +0000 (14:38 -0800)]
x86: vdso_install fix
The makefile magic for installing the 32-bit vdso images on disk had a
little error. A single-line change would fix that bug, but this does a
little more to reduce the error-prone duplication of this bit of
makefile variable magic.
Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Tue, 12 Feb 2008 04:30:22 +0000 (13:30 +0900)]
mempolicy: silently restrict nodemask to allowed nodes
Kosaki Motohito noted that "numactl --interleave=all ..." failed in the
presence of memoryless nodes. This patch attempts to fix that problem.
Some background:
numactl --interleave=all calls set_mempolicy(2) with a fully populated
[out to MAXNUMNODES] nodemask. set_mempolicy() [in do_set_mempolicy()]
calls contextualize_policy() which requires that the nodemask be a
subset of the current task's mems_allowed; else EINVAL will be returned.
A task's mems_allowed will always be a subset of node_states[N_HIGH_MEMORY]
i.e., nodes with memory. So, a fully populated nodemask will be
declared invalid if it includes memoryless nodes.
NOTE: the same thing will occur when running in a cpuset
with restricted mem_allowed--for the same reason:
node mask contains dis-allowed nodes.
mbind(2), on the other hand, just masks off any nodes in the nodemask
that are not included in the caller's mems_allowed.
In each case [mbind() and set_mempolicy()], mpol_check_policy() will
complain [again, resulting in EINVAL] if the nodemask contains any
memoryless nodes. This is somewhat redundant as mpol_new() will remove
memoryless nodes for interleave policy, as will bind_zonelist()--called
by mpol_new() for BIND policy.
Proposed fix:
1) modify contextualize_policy logic to:
a) remember whether the incoming node mask is empty.
b) if not, restrict the nodemask to allowed nodes, as is
currently done in-line for mbind(). This guarantees
that the resulting mask includes only nodes with memory.
NOTE: this is a [benign, IMO] change in behavior for
set_mempolicy(). Dis-allowed nodes will be
silently ignored, rather than returning an error.
c) fold this code into mpol_check_policy(), replace 2 calls to
contextualize_policy() to call mpol_check_policy() directly
and remove contextualize_policy().
2) In existing mpol_check_policy() logic, after "contextualization":
a) MPOL_DEFAULT: require that in coming mask "was_empty"
b) MPOL_{BIND|INTERLEAVE}: require that contextualized nodemask
contains at least one node.
c) add a case for MPOL_PREFERRED: if in coming was not empty
and resulting mask IS empty, user specified invalid nodes.
Return EINVAL.
c) remove the now redundant check for memoryless nodes
3) remove the now redundant masking of policy nodes for interleave
policy from mpol_new().
4) Now that mpol_check_policy() contextualizes the nodemask, remove
the in-line nodes_and() from sys_mbind(). I believe that this
restores mbind() to the behavior before the memoryless-nodes
patch series. E.g., we'll no longer treat an invalid nodemask
with MPOL_PREFERRED as local allocation.
[ Patch history:
v1 -> v2:
- Communicate whether or not incoming node mask was empty to
mpol_check_policy() for better error checking.
- As suggested by David Rientjes, remove the now unused
cpuset_nodes_subset_current_mems_allowed() from cpuset.h
v2 -> v3:
- As suggested by Kosaki Motohito, fold the "contextualization"
of policy nodemask into mpol_check_policy(). Looks a little
cleaner. ]
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jonathan Corbet [Mon, 11 Feb 2008 23:17:33 +0000 (16:17 -0700)]
Be more robust about bad arguments in get_user_pages()
So I spent a while pounding my head against my monitor trying to figure
out the vmsplice() vulnerability - how could a failure to check for
*read* access turn into a root exploit? It turns out that it's a buffer
overflow problem which is made easy by the way get_user_pages() is
coded.
In particular, "len" is a signed int, and it is only checked at the
*end* of a do {} while() loop. So, if it is passed in as zero, the loop
will execute once and decrement len to -1. At that point, the loop will
proceed until the next invalid address is found; in the process, it will
likely overflow the pages array passed in to get_user_pages().
I think that, if get_user_pages() has been asked to grab zero pages,
that's what it should do. Thus this patch; it is, among other things,
enough to block the (already fixed) root exploit and any others which
might be lurking in similar code. I also think that the number of pages
should be unsigned, but changing the prototype of this function probably
requires some more careful review.
Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 12 Feb 2008 04:42:11 +0000 (20:42 -0800)]
Merge branch 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
sata_mv: platform driver allocs dma without create
pata_ninja32: setup changes
pata_legacy: typo fix
pata_amd: Note in the module description it handles Nvidia
sata_mv: fix loop with last port
libata: ignore deverr on SETXFER if mode is configured
pata_via: fix SATA cable detection on cx700
Olof Johansson [Mon, 11 Feb 2008 02:22:57 +0000 (20:22 -0600)]
mlx4_core: Fix build break (missing include)
Commit 313abe55 ("mlx4_core: For 64-bit systems, vmap() kernel queue
buffers") caused this to pop up on powerpc allyesconfig, looks like a
missing include file:
drivers/net/mlx4/alloc.c: In function 'mlx4_buf_alloc':
drivers/net/mlx4/alloc.c:162: error: implicit declaration of function 'vmap'
drivers/net/mlx4/alloc.c:162: error: 'VM_MAP' undeclared (first use in this function)
drivers/net/mlx4/alloc.c:162: error: (Each undeclared identifier is reported only once
drivers/net/mlx4/alloc.c:162: error: for each function it appears in.)
drivers/net/mlx4/alloc.c:162: warning: assignment makes pointer from integer without a cast
drivers/net/mlx4/alloc.c: In function 'mlx4_buf_free':
drivers/net/mlx4/alloc.c:187: error: implicit declaration of function 'vunmap'
Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Tony Luck [Mon, 11 Feb 2008 21:23:46 +0000 (13:23 -0800)]
[IA64] Fix build for sim_defconfig
Commit bdc807871d58285737d50dc6163d0feb72cb0dc2 broke the build
for this config because the sim_defconfig selects CONFIG_HZ=250
but include/asm-ia64/param.h has an ifdef for the simulator to
force HZ to 32. So we ended up with a kernel/timeconst.h set
for HZ=250 ... which then failed the check for the right HZ
value and died with:
Drop the #ifdef magic from param.h and make force CONFIG_HZ=32
directly for the simulator.
Alan Cox [Fri, 8 Feb 2008 15:25:10 +0000 (15:25 +0000)]
pata_ninja32: setup changes
Forcibly set more of the configuration at init time. This seems to fix at
least one problem reported. We don't know what most of these bits do, but
we do know what windows stuffs there.
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Thu, 7 Feb 2008 01:34:08 +0000 (10:34 +0900)]
libata: ignore deverr on SETXFER if mode is configured
Some controllers (VIA CX700) raise device error on SETXFER even after
mode configuration succeeded. Update ata_dev_set_mode() such that
device error is ignored if transfer mode is configured correctly. To
implement this, device is revalidated even after device error on
SETXFER.
This fixes kernel bugzilla bug 8563.
Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Thomas Gleixner [Sun, 10 Feb 2008 22:57:36 +0000 (23:57 +0100)]
x86: remove over noisy debug printk
pageattr-test.c contains a noisy debug printk that people reported.
The condition under which it prints (randomly tapping into a mem_map[]
hole and not being able to c_p_a() there) is valid behavior and not
interesting to report.
Andi Kleen [Mon, 11 Feb 2008 00:35:20 +0000 (01:35 +0100)]
Prevent IDE boot ops on NUMA system
Without this patch a Opteron test system here oopses at boot with
current git.
Calling to_pci_dev() on a NULL pointer gives a negative value so the
following NULL pointer check never triggers and then an illegal address
is referenced. Check the unadjusted original device pointer for NULL
instead.
Linus Torvalds [Mon, 11 Feb 2008 17:19:47 +0000 (09:19 -0800)]
Merge branch 'for-linus' of git://linux-nfs.org/~bfields/linux
* 'for-linus' of git://linux-nfs.org/~bfields/linux:
SUNPRC: Fix printk format warning
nfsd: clean up svc_reserve_auth()
NLM: don't requeue block if it was invalidated while GRANT_MSG was in flight
NLM: don't reattempt GRANT_MSG when there is already an RPC in flight
NLM: have server-side RPC clients default to soft RPC tasks
NLM: set RPC_CLNT_CREATE_NOPING for NLM RPC clients
Matthew Wilcox [Mon, 11 Feb 2008 04:18:15 +0000 (23:18 -0500)]
Use proper abstractions in quirk_intel_irqbalance
Since we may not have a pci_dev for the device we need to access, we can't
use pci_read_config_word. But raw_pci_read is an internal implementation
detail; it's better to use the architected pci_bus_read_config_word
interface. Using PCI_DEVFN instead of a mysterious constant helps
reassure everyone that we really do intend to access device 8.
[ Thanks to Grant Grundler for pointing out to me that this is exactly
what the write immediately above this is doing -- enabling device 8 to
respond to config space cycles.
- Matthew
Grant also says:
"Can you also add a comment which points at the Intel
documentation?
The 'Intel E7320 Memory Controller Hub (MCH) Datasheet' at
Stephen Smalley [Thu, 7 Feb 2008 16:21:04 +0000 (11:21 -0500)]
selinux: support 64-bit capabilities
Fix SELinux to handle 64-bit capabilities correctly, and to catch
future extensions of capabilities beyond 64 bits to ensure that SELinux
is properly updated.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>
ide: switch idedisk_prepare_flush() to use REQ_TYPE_ATA_TASKFILE requests
...
broke flush requests.
Allocating IDE command structure on the stack for flush requests is not
a very brilliant idea:
- idedisk_prepare_flush() only prepares the request and it doesn't wait
for it to be completed
- there are can be multiple flush requests queued in the queue
Fix the problem (per hints from James Bottomley) by:
- dynamically allocating ide_task_t instance using kmalloc(..., GFP_ATOMIC)
- adding new taskfile flag (IDE_TFLAG_DYN)
- calling kfree() in ide_end_drive_command() if IDE_TFLAG_DYN is set
(while at it rename 'args' to 'task' and fix whitespace damage)
[ This will be fixed properly before 2.6.25 but this bug is rather
critical and the proper solution requires some more work + testing. ]
Thanks to Sebastian Siewior and Christoph Hellwig for reporting the
problem and testing patches (extra thanks to Sebastian for bisecting
it to the guilty commmit).
Tested-by: Sebastian Siewior <ide-bug@ml.breakpoint.cc> Cc: Christoph Hellwig <hch@infradead.org> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Sergei Shtylyov [Sun, 10 Feb 2008 23:32:14 +0000 (00:32 +0100)]
ide: introduce CONFIG_BLK_DEV_IDEDMA_SFF option
Introduce new option CONFIG_BLK_DEV_IDEDMA_SFF for non-PCI SFF-8038i compatible
bus mastering IDE controllers (which there are a few known), thus fixing a hack
made for Palmchip BK3710 controller...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Anton Salnikov <asalnikov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
On Saturday 09 February 2008, Adrian Bunk wrote:
> Commit 9e016a719209d95338e314b46c3012cc7feaaeec causes the following
> compile error:
>
> <-- snip -->
>
> ...
> CC drivers/ide/arm/bast-ide.o
> /home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/ide/arm/bast-ide.c: In function 'bastide_register':
> /home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/ide/arm/bast-ide.c:31: error: 'hwif' redeclared as different kind of symbol
> /home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/ide/arm/bast-ide.c:29: error: previous definition of 'hwif' was here
> make[4]: *** [drivers/ide/arm/bast-ide.o] Error 1
>
> <-- snip -->
Remove 'ide_hwif_t **hwif' argument from bastide_register()
(together with write-only ifs[]).
Cc: Adrian Bunk <bunk@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
On Thursday 03 January 2008, Robert Hancock wrote:
[...]
> How about getting rid of this stupid thing in drivers/ide/ide.c:
>
> #define       REVISION        "Revision: 7.00alpha2"
>
> which is used in:
>
> printk(KERN_INFO "Uniform Multi-Platform E-IDE driver " REVISION "\n");
>
> It's been 7.00alpha2 for god knows how long, so clearly this version
> number is not useful..
Cc: Robert Hancock <hancockr@shaw.ca> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Convert palm_bk3710 host driver to use ide_device_add() instead of
ide_register_hw() (while at it drop doing "ide_unregister()" loop which
tries to unregister _all_ IDE interfaces if useable ide_hwifs[] slot
cannot be find).
Sergei Shtylyov [Sun, 10 Feb 2008 23:32:12 +0000 (00:32 +0100)]
ide: insert BUG_ON() into __ide_set_handler() (take 2)
Replace the check for hwgroup->handler and printk(KERN_CRIT, ...) at the start
of __ide_set_handler() with mere BUG_ON() while removing such from the caller,
ide_execute_command(). Fix up the code formatting, while at it...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Kiyoshi Ueda [Sun, 10 Feb 2008 23:32:11 +0000 (00:32 +0100)]
ide: another possible ide panic fix for blk-end-request
I have reviewed all blk-end-request patches again to confirm whether
there are any similar problems with the last week's ide-cd panic:
http://lkml.org/lkml/2008/1/29/140
And I found a possible similar bug in ide-io change:
ide_end_drive_cmd() could be called for blk_pc_request() which could
have bios. To complete such requests correctly, we need to pass
the actual size of the request.
Otherwise, __blk_end_request() returns 1 because the request still has
bios, and the system will BUG() unnecessarily.
The following patch fixes the bug and should be applied on top of
Linus' git.
J. Bruce Fields [Fri, 8 Feb 2008 04:10:21 +0000 (23:10 -0500)]
nfsd: clean up svc_reserve_auth()
This is a void function attempting to return the return value from
another void function, which seems harmless but extremely weird, and
apparently makes some compilers complain.
While we're there, clean up a little (e.g. the switch statement had a
minor style problem and seemed overkill as long as there's only one
case).
Thanks to Trond for noticing this.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Jeff Layton [Wed, 6 Feb 2008 16:34:13 +0000 (11:34 -0500)]
NLM: don't requeue block if it was invalidated while GRANT_MSG was in flight
It's possible for lockd to catch a SIGKILL while a GRANT_MSG callback
is in flight. If this happens we don't want lockd to insert the block
back into the nlm_blocked list.
This helps that situation, but there's still a possible race. Fixing
that will mean adding real locking for nlm_blocked.
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Jeff Layton [Wed, 6 Feb 2008 16:34:12 +0000 (11:34 -0500)]
NLM: don't reattempt GRANT_MSG when there is already an RPC in flight
With the current scheme in nlmsvc_grant_blocked, we can end up with more
than one GRANT_MSG callback for a block in flight. Right now, we requeue
the block unconditionally so that a GRANT_MSG callback is done again in
30s. If the client is unresponsive, it can take more than 30s for the
call already in flight to time out.
There's no benefit to having more than one GRANT_MSG RPC queued up at a
time, so put it on the list with a timeout of NLM_NEVER before doing the
RPC call. If the RPC call submission fails, we requeue it with a short
timeout. If it works, then nlmsvc_grant_callback will end up requeueing
it with a shorter timeout after it completes.
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Jeff Layton [Wed, 6 Feb 2008 16:34:11 +0000 (11:34 -0500)]
NLM: have server-side RPC clients default to soft RPC tasks
Now that it no longer does an RPC ping, lockd always ends up queueing
an RPC task for the GRANT_MSG callback. But, it also requeues the block
for later attempts. Since these are hard RPC tasks, if the client we're
calling back goes unresponsive the GRANT_MSG callbacks can stack up in
the RPC queue.
Fix this by making server-side RPC clients default to soft RPC tasks.
lockd requeues the block anyway, so this should be OK.
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Jeff Layton [Wed, 6 Feb 2008 16:34:10 +0000 (11:34 -0500)]
NLM: set RPC_CLNT_CREATE_NOPING for NLM RPC clients
It's currently possible for an unresponsive NLM client to completely
lock up a server's lockd. The scenario is something like this:
1) client1 (or a process on the server) takes a lock on a file
2) client2 tries to take a blocking lock on the same file and
awaits the callback
3) client2 goes unresponsive (plug pulled, network partition, etc)
4) client1 releases the lock
...at that point the server's lockd will try to queue up a GRANT_MSG
callback for client2, but first it requeues the block with a timeout of
30s. nlm_async_call will attempt to bind the RPC client to client2 and
will call rpc_ping. rpc_ping entails a sync RPC call and if client2 is
unresponsive it will take around 60s for that to time out. Once it times
out, it's already time to retry the block and the whole process repeats.
Once in this situation, nlmsvc_retry_blocked will never return until
the host starts responding again. lockd won't service new calls.
Fix this by skipping the RPC ping on NLM RPC clients. This makes
nlm_async_call return quickly when called.
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Linus Torvalds [Sun, 10 Feb 2008 22:18:14 +0000 (14:18 -0800)]
Linux 2.6.25-rc1
.. and I really need to call it something else. Maybe it is time to
bring back the weasel series, since weasels always make me feel good
about a kernel.
Linus Torvalds [Sun, 10 Feb 2008 22:09:44 +0000 (14:09 -0800)]
Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm
* 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (30 commits)
[ARM] constify function pointer tables
[ARM] 4823/1: AT91 section fix
[ARM] 4824/1: pxa: clear RDH bit after any reset
[ARM] pxa: remove debugging PM: printk
ARM: OMAP1: Misc clean-up
ARM: OMAP1: Update defconfigs for omap1
ARM: OMAP1: Palm Tungsten E board clean-up
ARM: OMAP1: Use I2C bus registration helper for omap1
ARM: OMAP1: Remove omap_sram_idle()
ARM: OMAP1: PM fixes for OMAP1
ARM: OMAP1: Use MMC multislot structures for Siemens SX1 board
ARM: OMAP1: Make omap1 use MMC multislot structures
ARM: OMAP1: Change the comments to C style
ARM: OMAP1: Make omap1 boards to use omap_nand_platform_data
ARM: OMAP: Add helper module for board specific I2C bus registration
ARM: OMAP: Add dmtimer support for OMAP3
ARM: OMAP: Pre-3430 clean-up for dmtimer.c
ARM: OMAP: Add DMA support for chaining and 3430
ARM: OMAP: Add 24xx GPIO debounce support
ARM: OMAP: Get rid of unnecessary ifdefs in GPIO code
...
Matthew Wilcox [Sun, 10 Feb 2008 14:45:28 +0000 (09:45 -0500)]
Change pci_raw_ops to pci_raw_read/write
We want to allow different implementations of pci_raw_ops for standard
and extended config space on x86. Rather than clutter generic code with
knowledge of this, we make pci_raw_ops private to x86 and use it to
implement the new raw interface -- raw_pci_read() and raw_pci_write().
Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ivan Kokshaysky [Mon, 14 Jan 2008 22:31:09 +0000 (17:31 -0500)]
PCI x86: always use conf1 to access config space below 256 bytes
Thanks to Loic Prylli <loic@myri.com>, who originally proposed
this idea.
Always using legacy configuration mechanism for the legacy config space
and extended mechanism (mmconf) for the extended config space is
a simple and very logical approach. It's supposed to resolve all
known mmconf problems. It still allows per-device quirks (tweaking
dev->cfg_size). It also allows to get rid of mmconf fallback code.
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bastian Blank [Sun, 10 Feb 2008 14:47:57 +0000 (16:47 +0200)]
splice: fix user pointer access in get_iovec_page_array()
Commit 8811930dc74a503415b35c4a79d14fb0b408a361 ("splice: missing user
pointer access verification") added the proper access_ok() calls to
copy_from_user_mmap_sem() which ensures we can copy the struct iovecs
from userspace to the kernel.
But we also must check whether we can access the actual memory region
pointed to by the struct iovec to fix the access checks properly.
Signed-off-by: Bastian Blank <waldi@debian.org> Acked-by: Oliver Pinter <oliver.pntr@gmail.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David S. Miller [Sun, 10 Feb 2008 11:48:15 +0000 (03:48 -0800)]
[PKT_SCHED] ematch: Fix build warning.
Commit 954415e33ed6cfa932c13e8c2460bd05e50723b5 ("[PKT_SCHED] ematch:
tcf_em_destroy robustness") removed a cast on em->data when
passing it to kfree(), but em->data is an integer type that can
hold pointers as well as other values so the cast is necessary.
Signed-off-by: David S. Miller <davem@davemloft.net>
Oleg Nesterov [Fri, 1 Feb 2008 17:41:30 +0000 (20:41 +0300)]
hrtimer: don't modify restart_block->fn in restart functions
hrtimer_nanosleep_restart() clears/restores restart_block->fn. This is
pointless and complicates its usage. Note that if sys_restart_syscall()
doesn't actually happen, we have a bogus "pending" restart->fn anyway,
this is harmless.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: Pavel Emelyanov <xemul@sw.ru> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Toyo Abe <toyoa@mvista.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Also, set ->addr_limit = KERNEL_DS before doing hrtimer_nanosleep(), this func
was changed by the previous patch and now takes the "__user *" parameter.
Thanks to Ingo Molnar for fixing the bug in this patch.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: Pavel Emelyanov <xemul@sw.ru> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Toyo Abe <toyoa@mvista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Oleg Nesterov [Fri, 1 Feb 2008 14:29:05 +0000 (17:29 +0300)]
hrtimer: fix *rmtp handling in hrtimer_nanosleep()
Spotted by Pavel Emelyanov and Alexey Dobriyan.
hrtimer_nanosleep() sets restart_block->arg1 = rmtp, but this rmtp points to
the local variable which lives in the caller's stack frame. This means that
if sys_restart_syscall() actually happens and it is interrupted as well, we
don't update the user-space variable, but write into the already dead stack
frame.
Change the callers to pass "__user *rmtp" to hrtimer_nanosleep(), and change
hrtimer_nanosleep() to use copy_to_user() to actually update *rmtp.
Small problem remains. man 2 nanosleep states that *rtmp should be written if
nanosleep() was interrupted (it says nothing whether it is OK to update *rmtp
if nanosleep returns 0), but (with or without this patch) we can dirty *rem
even if nanosleep() returns 0.
NOTE: this patch doesn't change compat_sys_nanosleep(), because it has other
bugs. Fixed by the next patch.
clocksource initialization and error accumulation. This corrects a 280ppm
drift seen on some systems using acpi_pm, and affects other clocksources as
well (likely to a lesser degree).
Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Jarek Poplawski [Sun, 10 Feb 2008 07:44:00 +0000 (23:44 -0800)]
[NET_SCHED] sch_htb: htb_requeue fix
htb_requeue() enqueues skbs for which htb_classify() returns NULL.
This is wrong because such skbs could be handled by NET_CLS_ACT code,
and the decision could be different than earlier in htb_enqueue().
So htb_requeue() is changed to work and look more like htb_enqueue().
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Morton [Sun, 10 Feb 2008 07:42:17 +0000 (23:42 -0800)]
starfire: secton fix
gcc-3.4.4 on powerpc:
drivers/net/starfire.c:219: error: version causes a section type conflict
Cc: Jeff Garzik <jeff@garzik.org> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Morton [Sun, 10 Feb 2008 07:41:40 +0000 (23:41 -0800)]
via-velocity: section fix
From: Andrew Morton <akpm@linux-foundation.org>
gcc-3.4.4 on powerpc:
drivers/net/via-velocity.c:443: error: chip_info_table causes a section type conflict
on this one I had to remove the __devinitdata too. Don't know why.
Cc: Jeff Garzik <jeff@garzik.org> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Morton [Sun, 10 Feb 2008 07:41:08 +0000 (23:41 -0800)]
natsemi: section fix
gcc-3.4.4 on powerpc:
drivers/net/natsemi.c:245: error: natsemi_pci_info causes a section type conflict
Cc: Jeff Garzik <jeff@garzik.org> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Morton [Sun, 10 Feb 2008 07:40:34 +0000 (23:40 -0800)]
typhoon: section fix
gcc-3.4.4 on powerpc:
drivers/net/typhoon.c:137: error: version causes a section type conflict
Cc: Jeff Garzik <jeff@garzik.org> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Sam Ravnborg [Sun, 10 Feb 2008 07:29:28 +0000 (23:29 -0800)]
isdn: fix section mismatch warning for ISACVer
Fix following warnings:
WARNING: drivers/isdn/hisax/built-in.o(.text+0x19723): Section mismatch in reference from the function ISACVersion() to the variable .devinit.data:ISACVer
WARNING: drivers/isdn/hisax/built-in.o(.text+0x2005b): Section mismatch in reference from the function setup_avm_a1_pcmcia() to the function .devinit.text:setup_isac()
ISACVer were only used from function annotated __devinit
so add same annotation to ISACVer.
One af the fererencing functions missed __devinit so add it
and kill an additional warning.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Acked-by: Karsten Keil <kkeil@suse.de> Cc: Jeff Garzik <jgarzik@pobox.com> Signed-off-by: David S. Miller <davem@davemloft.net>