err.no Git - linux-2.6/log

s390: netiucv inlining cleanup

The recent iucv rework patches re-introduced some unnecessary inlines.
Remove them again.

Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

s390: netiucv spinlock initializer cleanup

spinlock initializer cleanup in netiucv.c

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

s390: don't call iucv_path_connect from tasklet context

net/iucv/iucv.c creates the requirement for
iucv_path_connect not to be called from tasklet context anymore.
An extra checking is added in case of a failing netiucv_tx
to fulfil this requirement for netiucv.

Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

s390: Use ccw_device_get_id() in qeth/claw drivers

Use ccw_device_get_id() to get a device number
instead of parsing the ccw device's bus id.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

s390: qeth: wrong packet length in qdio header

Packets Length in qdio header is broken when using
EDDP on Layer2 devices. This leads to skb_under_panic on receiver
system when running on z/VM GuestLAN devices.

Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

s390: avoid inconsistent lock state in qeth

ipv6_regen_rndid in net/ipv6/addrconf.c makes use of "write_lock_bh"
for its inet6_dev->lock. It may run in softirq-context.
qeth makes use of "read_lock" for the same inet6_dev->lock.
To avoid a potential deadlock situation, qeth should make use of
"read_lock_bh" for its usages of inet6_dev->lock.

Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

s390: qeth driver does not recover

While first recovery continues, the card issues
a STARTLAN command itself. In this case qeth
schedules another recovery. This second
recovery is cancelled because of an already running first recovery.
Stop first recovery in case of 0xe080.

Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

s390: print correct level for HiperSockets devices

For real HiperSockets the EBCDIC-ASCII conversion is not necessary.
This is only needed for z/VM GuestLAN devices.

Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

bonding: Fix 802.3ad no carrier on "no partner found" instance

Modify carrier state determination for 802.3ad mode to comply
with section 43.3.9 of IEEE 802.3, which requires that "Links that are
not successful candidates for aggregation (e.g., links that are attached
to other devices that cannot perform aggregation or links that have been
manually configured to be non-aggregatable) are enabled to operate as
individual IEEE 802.3 links."

Bug reported by Laurent Chavey <chavey@google.com>. This patch
is an updated version of his patch that changes the wording of
commentary and adds an update to the driver version.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Laurent Chavey <chavey@google.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

bonding: Fix use after free in unregister path

The following patch (based on a patch from Stephen Hemminger
<shemminger@linux-foundation.org>) removes use after free conditions in
the unregister path for the bonding master. Without this patch, an
operation of the form "echo -bond0 > /sys/class/net/bonding_masters"
would trigger a NULL pointer dereference in sysfs. I was not able to
induce the failure with the non-sysfs code path, but for consistency I
updated that code as well.

I also did some testing of the bonding /proc file being open
while the bond is being deleted, and didn't see any problems there.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

spidernet: checksum and ethtool

It doesn't look like spidernet hardware can really checksum all protocols,
the code looks like it does IPV4 only. If so, it should use NETIF_F_IP_CSUM
instead of NETIF_F_HW_CSUM.

The driver doesn't need it's own get/set for ethtool tx csum, and it
should use the standard ethtool_op_get_link.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

spidernet: turn off descriptor chain end interrupt.

At some point, the transmit descriptor chain end interrupt (TXDCEINT)
was turned on. This is a mistake; and it damages small packet
transmit performance, as it results in a huge storm of interrupts.
Turn it off.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

spidernet: silence the ramfull messages

Although the previous patch resolved issues with hangs when the
RX ram full interrupt is encountered, there are still situations
where lots of RX ramfull interrupts arrive, resulting in a noisy
log in syslog. There is no need for this.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

spidernet: Don't terminate the RX ring

The terminated RX ring will cause trouble during the RX ram full
conditions, leading to a hung driver, as the hardware can't find
the next descr. There is no real reason to terminate the RX ring;
it doesn't make the operation any smooother, and it does
require an extra sync. So don't do it.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

spidernet: Cure RX ram full bug

This patch fixes a rare deadlock that can occur when the kernel
is not able to empty out the RX ring quickly enough. Below follows
a detailed description of the bug and the fix.

As long as the OS can empty out the RX buffers at a rate faster than
the hardware can fill them, there is no problem. If, for some reason,
the OS fails to empty the RX ring fast enough, the hardware GDACTDPA
pointer will catch up to the head, notice the not-empty condition,
ad stop. However, RX packets may still continue arriving on the wire.
The spidernet chip can save some limited number of these in local RAM.
When this local ram fills up, the spider chip will issue an interrupt
indicating this (GHIINT0STS will show ERRINT, and the GRMFLLINT bit
will be set in GHIINT1STS). When te RX ram full condition occurs,
a certain bug/feature is triggered that has to be specially handled.
This section describes the special handling for this condition.

When the OS finally has a chance to run, it will empty out the RX ring.
In particular, it will clear the descriptor on which the hardware had
stopped. However, once the hardware has decided that a certain
descriptor is invalid, it will not restart at that descriptor; instead
it will restart at the next descr. This potentially will lead to a
deadlock condition, as the tail pointer will be pointing at this descr,
which, from the OS point of view, is empty; the OS will be waiting for
this descr to be filled. However, the hardware has skipped this descr,
and is filling the next descrs. Since the OS doesn't see this, there
is a potential deadlock, with the OS waiting for one descr to fill,
while the hardware is waiting for a differen set of descrs to become
empty.

A call to show_rx_chain() at this point indicates the nature of the
problem. A typical print when the network is hung shows the following:

net eth1: Spider RX RAM full, incoming packets might be discarded!
net eth1: Total number of descrs=256
net eth1: Chain tail located at descr=255
net eth1: Chain head is at 255
net eth1: HW curr desc (GDACTDPA) is at 0
net eth1: Have 1 descrs with stat=xa0800000
net eth1: HW next desc (GDACNEXTDA) is at 1
net eth1: Have 127 descrs with stat=x40800101
net eth1: Have 1 descrs with stat=x40800001
net eth1: Have 126 descrs with stat=x40800101
net eth1: Last 1 descrs with stat=xa0800000

Both the tail and head pointers are pointing at descr 255, which is
marked xa... which is "empty". Thus, from the OS point of view, there
is nothing to be done. In particular, there is the implicit assumption
that everything in front of the "empty" descr must surely also be empty,
as explained in the last section. The OS is waiting for descr 255 to
become non-empty, which, in this case, will never happen.

The HW pointer is at descr 0. This descr is marked 0x4.. or "full".
Since its already full, the hardware can do nothing more, and thus has
halted processing. Notice that descrs 0 through 254 are all marked
"full", while descr 254 and 255 are empty. (The "Last 1 descrs" is
descr 254, since tail was at 255.) Thus, the system is deadlocked,
and there can be no forward progress; the OS thinks there's nothing
to do, and the hardware has nowhere to put incoming data.

This bug/feature is worked around with the spider_net_resync_head_ptr()
routine. When the driver receives RX interrupts, but an examination
of the RX chain seems to show it is empty, then it is probable that
the hardware has skipped a descr or two (sometimes dozens under heavy
network conditions). The spider_net_resync_head_ptr() subroutine will
search the ring for the next full descr, and the driver will resume
operations there. Since this will leave "holes" in the ring, there
is also a spider_net_resync_tail_ptr() that will skip over such holes.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

spidernet: null out skb pointer after its been used.

Avoid kernel crash in mm/slab.c due to double-free of pointer.

If the ethernet interface is brought down while there is still
RX traffic in flight, the device shutdown routine can end up
trying to double-free an skb, leading to a crash in mm/slab.c
Avoid the double-free by nulling out the skb pointer.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
  firewire: Only set client->iso_context if allocation was successful.
  ieee1394: fix to ether1394_tx in ether1394.c
  firewire: fix hang after card ejection

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/mlx4: Make sure inline data segments don't cross a 64 byte boundary
  IB/mlx4: Handle FW command interface rev 3
  IB/mlx4: Handle buffer wraparound in __mlx4_ib_cq_clean()
  IB/mlx4: Get rid of max_inline_data calculation
  IB/mlx4: Handle new FW requirement for send request prefetching
  IB/mlx4: Fix warning in rounding up queue sizes
  IB/mlx4: Fix handling of wq->tail for send completions

firewire: Only set client->iso_context if allocation was successful.

This patch fixes an OOPS on cdev release for an fd where iso context
creation failed.

Signed-off-by: Kristian Høgsberg <krh@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus

* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
[MIPS] Don't drag a platform specific header into generic arch code.

Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[POWERPC] Fix powermac late initcall to only run on powermac
[POWERPC] PowerPC: Prevent data exception in kernel space (32-bit)

Fix up CREDIT entry ordering

Reorder my CREDIT entry to make it in alphabetic order by last name.

Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86_64: fix link warning between for .text and .init.text

WARNING: arch/x86_64/kernel/built-in.o(.text+0xace9): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'mtrr_wrmsr')
WARNING: arch/x86_64/kernel/built-in.o(.text+0xad09): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'mtrr_wrmsr')
WARNING: arch/x86_64/kernel/built-in.o(.text+0xad38): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'mtrr_wrmsr')
WARNING: drivers/built-in.o(.text+0x3a680): Section mismatch: reference to .init.text:acpi_map_pxm_to_node (between 'acpi_get_node' and 'acpi_lock_ac_dir')

AK: also marked mtrr_bp_init __init to avoid some more warnings

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86: change_page_attr bandaids

- Disable CLFLUSH again; it is still broken. Always do WBINVD.
- Always flush in the i386 case, not only when there are deferred pages.

These are both brute-force inefficient fixes, to be improved
next release cycle.

The changes to i386 are a little more extensive than strictly
needed (some dead code added), but it is more similar to the x86-64 version
now and the dead code will be used soon.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86: Disable KPROBES with DEBUG_RODATA for now

Right now Kprobes cannot write to the write protected kernel text when
DEBUG_RODATA is enabled. Disallow this in Kconfig for now.

Temporary fix for 2.6.22. In .23 add code to temporarily
unprotect it.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86: Only make Macintosh drivers default on Macs

It's already annoying that they appear on x86 now -- that's for the 3button
emulation needed on x86 macs -- but at least don't make them default.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86_64: Quieten Atari keyboard warnings in Kconfig

Not directly related to x86, but I got tired of seeing these warnings on every
kconfig update when building on a non m68k box:

drivers/input/keyboard/Kconfig:170:warning: 'select' used by config symbol 'KEYBOARD_ATARI' refers to undefined symbol 'ATARI_KBD_CORE'
drivers/input/mouse/Kconfig:182:warning: 'select' used by config symbol 'MOUSE_ATARI' refers to undefined symbol 'ATARI_KBD_CORE'

I moved the definition of ATARI_KBD_CORE into drivers/input/keyboard/Kconfig
so it's always seen by Kconfig.

Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86: Disable DAC on VIA bridges

Several reports that VIA bridges don't support DAC and corrupt
data. I don't know if it's fixed, but let's just blacklist
them all for now.

It can be overwritten with iommu=usedac

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86_64: Fix eventd/timerfd syscalls

They had the same syscall number.

Pointed out by Davide Libenzi

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86_64: Fix readahead/sync_file_range/fadvise64 compat calls

Correctly convert the u64 arguments from 32bit to 64bit.

Pointed out by Heiko Carstens.

I guess this proves Linus' theory that nobody uses the more exotic Linux
specific syscalls. It wasn't discovered by a user.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

[MIPS] Don't drag a platform specific header into generic arch code.

For some platforms it's definitions may conflict. So that's the one-liner.
The rest is 10 square kilometers of collateral damage fixup this include
used to paper over.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[POWERPC] Fix powermac late initcall to only run on powermac

Current ppc64_defconfig kernel fails to boot on iSeries, dying with:

Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc00000000071b258
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=32 iSeries
<snip>
NIP [c00000000071b258] .iSeries_src_init+0x34/0x64
LR [c000000000701bb4] .kernel_init+0x1fc/0x3bc
Call Trace:
[c000000007d0be30] [0000000000008000] 0x8000 (unreliable)
[c000000007d0bea0] [c000000000701bb4] .kernel_init+0x1fc/0x3bc
[c000000007d0bf90] [c0000000000262d4] .kernel_thread+0x4c/0x68
Instruction dump:
e922cba8 3880ffff 78840420 f8010010 f821ff91 60000000 e8090000 78095fe3
4182002c e922cb58 e862cbb0 e9290140 <e8090000> f8410028 7c0903a6 e9690010
Kernel panic - not syncing: Attempted to kill init!

This happens because some powermac code unconditionally sets
ppc_md.progress to NULL. This patch makes sure the powermac late
initcall is only run on powermac machines.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>

[POWERPC] PowerPC: Prevent data exception in kernel space (32-bit)

The "is_exec" branch of the protection check in do_page_fault()
didn't do anything on 32-bit PowerPC. So if a userland program
jumps to a page with Linux protection flags "---p", all the tests
happily fall through, and handle_mm_fault() is called, which in
turn calls handle_pte_fault(), which calls update_mmu_cache(),
which goes flush the dcache to a page with no access rights.

Boom.

This fixes it.

Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>

[POWERPC] rheap - eliminates internal fragments caused by alignment

The patch adds fragments caused by rh_alloc_align() back to free list, instead
of allocating the whole chunk of memory. This will greatly improve memory
utilization managed by rheap.

It solves MURAM not enough problem with 3 UCCs enabled on MPC8323.

Signed-off-by: Li Yang <leoli@freescale.com>
Acked-by: Joakim Tjernlund <joakim.tjernlund@transmode.se>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>

Merge master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh64-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh64-2.6:
sh64: Handle -ERESTART_RESTARTBLOCK for restartable syscalls.

Merge master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6:
  sh: Handle -ERESTART_RESTARTBLOCK for restartable syscalls.
  sh: oops_enter()/oops_exit() in die().
  sh: Fix restartable syscall arg5 clobbering.

Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6

* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
  [S390] Move psw_set_key.
  [S390] Add oops_enter()/oops_exit() calls to die().
  [S390] Print list of modules on die().
  [S390] Fix yet another two section mismatches.
  [S390] Fix zfcpdump header
  [S390] Missing blank when appending cio_ignore kernel parameter

Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6

* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6:
  [XFS] Update the MAINTAINERS file entry for XFS - change git repo name.
  [XFS] s/memclear_highpage_flush/zero_user_page/
  [XFS] Update the MAINTAINERS file entry for XFS.

[S390] Move psw_set_key.

Move psw_set_key() from ptrace.h to processor.h which is a more
suitable place for it. In addition the moves makes the function
invisible to user space.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

[S390] Add oops_enter()/oops_exit() calls to die().

This is mainly to switch off all potentially debugging stuff that
won't report anything useful after an oops happened.
Besided that setting pause_on_oops will work too, but doesn't make
too much sense on s390.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

[S390] Print list of modules on die().

Print list of modules on die() like a lot of other architectures do.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

[S390] Fix yet another two section mismatches.

WARNING: arch/s390/kernel/built-in.o(.text+0xb92a):
Section mismatch: reference to .init.text:start_secondary
(between 'restart_addr' and 'stack_overflow')
WARNING: arch/s390/appldata/built-in.o(.data+0xdc):
Section mismatch: reference to .init.text:
(between 'appldata_nb' and 'appldata_timer_lock')

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

[S390] Fix zfcpdump header

Added members for volume number and real memory size to header information.

Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

[S390] Missing blank when appending cio_ignore kernel parameter

When appending the 'cio_ignore' kernel parameter to the command line, a blank
has to be inserted in order to separate 'cio_ignore' from the preceding kernel
parameters.

Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

[XFS] Update the MAINTAINERS file entry for XFS - change git repo name.

Make the git repository bare and so give it the conventional .git suffix.

Signed-off-by: Tim Shimmin <tes@sgi.com>

[XFS] s/memclear_highpage_flush/zero_user_page/

SGI-PV: 957103
SGI-Modid: xfs-linux-melb:xfs-kern:28678a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>

Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6 into for-linus

[POWERPC] Fix snd-powermac refcounting bugs

The old snd-powermac driver has some serious refcounting issues when
initialisation fails, which is the case on all new machines with
a layout-id since those are handled by the new snd-aoa driver.

Some of those bugs seem to have been under the radar for some time
(like double pci_dev_put), but one was actually added in 2.6.22 with
Stephen attempt at teaching refcounting to the driver which didn't
do it at all.

This patch fixes both, thus removing all sort of kref errors that
would happen if that driver gets loaded on a G5 machine or a recent
PowerBook due to OF nodes left around with a 0 refcount.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>

sh64: Handle -ERESTART_RESTARTBLOCK for restartable syscalls.

The current implementation only handles -ERESTARTNOHAND, whereas we
also need to handle -ERESTART_RESTARTBLOCK in the handle_signal()
case for restartable system calls. Follows the sh change.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>

sh: Handle -ERESTART_RESTARTBLOCK for restartable syscalls.

The current implementation only handles -ERESTARTNOHAND, whereas we
also need to handle -ERESTART_RESTARTBLOCK in the handle_signal()
case for restartable system calls.

As noted by Carl:

This fixes the LTP test nanosleep03 - the current kernel causes
-ERESTART_RESTARTBLOCK to reach user space rather than the correct
-EINTR.

Reported-by: Carl Shaw <shaw.carl@gmail.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

Fix possible runqueue lock starvation in wait_task_inactive()

Miklos Szeredi reported very long pauses (several seconds, sometimes
more) on his T60 (with a Core2Duo) which he managed to track down to
wait_task_inactive()'s open-coded busy-loop.

He observed that an interrupt on one core tries to acquire the
runqueue-lock but does not succeed in doing so for a very long time -
while wait_task_inactive() on the other core loops waiting for the first
core to deschedule a task (which it wont do while spinning in an
interrupt handler).

This rewrites wait_task_inactive() to do all its waiting optimistically
without any locks taken at all, and then just double-check the end
result with the proper runqueue lock held over just a very short
section.  If there were races in the optimistic wait, of a preemption
event scheduled the process away, we simply re-synchronize, and start
over.

So the code now looks like this:

repeat:
/* Unlocked, optimistic looping! */
rq = task_rq(p);
while (task_running(rq, p))
cpu_relax();

/* Get the *real* values */
rq = task_rq_lock(p, &flags);
running = task_running(rq, p);
array = p->array;
task_rq_unlock(rq, &flags);

/* Check them.. */
if (unlikely(running)) {
cpu_relax();
goto repeat;
}

/* Preempted away? Yield if so.. */
if (unlikely(array)) {
yield();
goto repeat;
}

Basically, that first "while()" loop is done entirely without any
locking at all (and doesn't check for the case where the target process
might have been preempted away), and so it's possibly "incorrect", but
we don't really care.  Both the runqueue used, and the "task_running()"
check might be the wrong tests, but they won't oops - they just mean
that we could possibly get the wrong results due to lack of locking and
exit the loop early in the case of a race condition.

So once we've exited the loop, we then get the proper (and careful) rq
lock, and check the running/runnable state _safely_.  And if it turns
out that our quick-and-dirty and unsafe loop was wrong after all, we
just go back and try it all again.

(The patch also adds a lot of comments, which is the actual bulk of it
all, to make it more obvious why we can do these things without holding
the locks).

Thanks to Miklos for all the testing and tracking it down.

Tested-by: Miklos Szeredi <miklos@szeredi.hu>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

sched: fix SysRq-N (normalize RT tasks)

Gene Heskett reported the following problem while testing CFS: SysRq-N
is not always effective in normalizing tasks back to SCHED_OTHER.

The reason for that turns out to be the following bug:

- normalize_rt_tasks() uses for_each_process() to iterate through all
tasks in the system. The problem is, this method does not iterate
through all tasks, it iterates through all thread groups.

The proper mechanism to enumerate over all threads is to use a
do_each_thread() + while_each_thread() loop.

Reported-by: Gene Heskett <gene.heskett@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
[SCSI] ESP: Don't forget to clear ESP_FLAG_RESETTING.
[SCSI] fusion: fix for BZ 8426 - massive slowdown on SCSI CD/DVD drive

Fix signalfd interaction with thread-private signals

Don't let signalfd dequeue private signals off other threads (in the
case of things like SIGILL or SIGSEGV, trying to do so would result
in undefined behaviour on who actually gets the signal, since they
are force unblocked).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Revert "futex_requeue_pi optimization"

This reverts commit d0aa7a70bf03b9de9e995ab272293be1f7937822.

It not only introduced user space visible changes to the futex syscall,
it is also non-functional and there is no way to fix it proper before
the 2.6.22 release.

The breakage report ( http://lkml.org/lkml/2007/5/12/17 ) went
unanswered, and unfortunately it turned out that the concept is not
feasible at all. It violates the rtmutex semantics badly by introducing
a virtual owner, which hacks around the coupling of the user-space
pi_futex and the kernel internal rt_mutex representation.

At the moment the only safe option is to remove it fully as it contains
user-space visible changes to broken kernel code, which we do not want
to expose in the 2.6.22 release.

The patch reverts the original patch mostly 1:1, but contains a couple
of trivial manual cleanups which were necessary due to patches, which
touched the same area of code later.

Verified against the glibc tests and my own PI futex tests.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Cc: Pierre Peiffer <pierre.peiffer@bull.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

IB/mlx4: Make sure inline data segments don't cross a 64 byte boundary

Inline data segments in send WQEs are not allowed to cross a 64 byte
boundary. We use inline data segments to hold the UD headers for MLX
QPs (QP0 and QP1). A send with GRH on QP1 will have a UD header that
is too big to fit in a single inline data segment without crossing a
64 byte boundary, so split the header into two inline data segments.

Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/mlx4: Handle FW command interface rev 3

Upcoming firmware introduces command interface revision 3, which
changes the way port capabilities are queried and set. Update the
driver to handle both the new and old command interfaces by adding a
new MLX4_FLAG_OLD_PORT_CMDS that it is set after querying the firmware
interface revision and then using the correct interface based on the
setting of the flag.

Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/mlx4: Handle buffer wraparound in __mlx4_ib_cq_clean()

When compacting CQ entries, we need to set the correct value of the
ownership bit in case the value is different between the index we copy
the CQE from and the index we copy it to.

Found by Ronni Zimmerman of Mellanox.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/mlx4: Get rid of max_inline_data calculation

The calculation of max_inline_data in set_kernel_sq_size() is bogus,
since it doesn't take into account the fact that inline segments may
not cross a 64-byte boundary, and hence multiple inline segments will
probably need to be used to post large inline sends.

We don't support inline sends for kernel QPs anyway, so there's no
point in doing this calculation anyway, since the field is just zeroed
out a little later. So just delete the bogus calculation.

Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/mlx4: Handle new FW requirement for send request prefetching

New ConnectX firmware introduces FW command interface revision 2,
which requires that for each QP, a chunk of send queue entries (the
"headroom") is kept marked as invalid, so that the HCA doesn't get
confused if it prefetches entries that haven't been posted yet. Add
code to the driver to do this, and also update the user ABI so that
userspace can request that the prefetcher be turned off for userspace
QPs (we just leave the prefetcher on for all kernel QPs).

Unfortunately, marking send queue entries this way is confuses older
firmware, so we change the driver to allow only FW command interface
revisions 2. This means that users will have to update their firmware
to work with the new driver, but the firmware is changing quickly and
the old firmware has lots of other bugs anyway, so this shouldn't be too
big a deal.

Based on a patch from Jack Morgenstein <jackm@dev.mellanox.co.il>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>

sh: oops_enter()/oops_exit() in die().

As Russell helpfully pointed out on linux-arch:

http://marc.info/?l=linux-arch&m=118208089204630&w=2

We were missing the oops_enter/exit() in the sh die() implementation.
As we do support lockdep, it's beneficial to add these calls so lockdep
properly disables itself in the die() case.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>

sh: Fix restartable syscall arg5 clobbering.

We use R0 as the 5th argument of syscall. When the syscall restarts
after signal handling, we should restore the old value of R0.
The attached patch does it. Without this patch, I've experienced random
failures in the situation which signals are issued frequently.

Signed-off-by: Kaz Kojima <kkojima@rr.iij4u.or.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

Linux 2.6.22-rc5

The manatees, they are dancing!

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

shm: fix the filename of hugetlb sysv shared memory

Some user space tools need to identify SYSV shared memory when examining
/proc/<pid>/maps. To do so they look for a block device with major zero, a
dentry named SYSV<sysv key>, and having the minor of the internal sysv
shared memory kernel mount.

To help these tools and to make it easier for people just browsing
/proc/<pid>/maps this patch modifies hugetlb sysv shared memory to use the
SYSV<key> dentry naming convention.

User space tools will still have to be aware that hugetlb sysv shared
memory lives on a different internal kernel mount and so has a different
block device minor number from the rest of sysv shared memory.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Albert Cahalan <acahalan@gmail.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

hugetlb: fix get_policy for stacked shared memory files

Here's another breakage as a result of shared memory stacked files :(

The NUMA policy for a VMA is determined by checking the following (in the
order given):

1) vma->vm_ops->get_policy() (if defined)
2) vma->vm_policy (if defined)
3) task->mempolicy (if defined)
4) Fall back to default_policy

By switching to stacked files for shared memory, get_policy() is now always
set to shm_get_policy which is a wrapper function. This causes us to stop
at step 1, which yields NULL for hugetlb instead of task->mempolicy which
was the previous (and correct) result.

This patch modifies the shm_get_policy() wrapper to maintain steps 1-3 for
the wrapped vm_ops.

(akpm: the refcounting of mempolicies is busted and this patch does nothing to
improve it)

Signed-off-by: Adam Litke <agl@us.ibm.com>
Acked-by: William Irwin <bill.irwin@oracle.com>
Cc: dean gaudet <dean@arctic.org>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

udf: fix possible leakage of blocks

We have to take care that when we call udf_discard_prealloc() from
udf_clear_inode() we have to write inode ourselves afterwards (otherwise,
some changes might be lost leading to leakage of blocks, use of free blocks
or improperly aligned extents).

Also udf_discard_prealloc() does two different things - it removes
preallocated blocks and truncates the last extent to exactly match i_size.
We move the latter functionality to udf_truncate_tail_extent(), call
udf_discard_prealloc() when last reference to a file is dropped and call
udf_truncate_tail_extent() when inode is being removed from inode cache
(udf_clear_inode() call).

We cannot call udf_truncate_tail_extent() earlier as subsequent open+write
would find the last block of the file mapped and happily write to the end
of it, although the last extent says it's shorter.

[akpm@linux-foundation.org: Make checkpatch.pl happier]
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Eric Sandeen <sandeen@sandeen.net>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

SLUB: minimum alignment fixes

If ARCH_KMALLOC_MINALIGN is set to a value greater than 8 (SLUBs smallest
kmalloc cache) then SLUB may generate duplicate slabs in sysfs (yes again)
because the object size is padded to reach ARCH_KMALLOC_MINALIGN.  Thus the
size of the small slabs is all the same.

No arch sets ARCH_KMALLOC_MINALIGN larger than 8 though except mips which
for some reason wants a 128 byte alignment.

This patch increases the size of the smallest cache if
ARCH_KMALLOC_MINALIGN is greater than 8.  In that case more and more of the
smallest caches are disabled.

If we do that then the count of the active general caches that is displayed
on boot is not correct anymore since we may skip elements of the kmalloc
array.  So count them separately.

This approach was tested by Havard yesterday.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Rework ptep_set_access_flags and fix sun4c

Some changes done a while ago to avoid pounding on ptep_set_access_flags and
update_mmu_cache in some race situations break sun4c which requires
update_mmu_cache() to always be called on minor faults.

This patch reworks ptep_set_access_flags() semantics, implementations and
callers so that it's now responsible for returning whether an update is
necessary or not (basically whether the PTE actually changed). This allow
fixing the sparc implementation to always return 1 on sun4c.

[akpm@linux-foundation.org: fixes, cleanups]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: David Miller <davem@davemloft.net>
Cc: Mark Fortescue <mark@mtfhpc.demon.co.uk>
Acked-by: William Lee Irwin III <wli@holomorphy.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

random: fix output buffer folding

(As reported by linux@horizon.com)

Folding is done to minimize the theoretical possibility of systematic
weakness in the particular bits of the SHA1 hash output. The result of
this bug is that 16 out of 80 bits are un-folded. Without a major new
vulnerability being found in SHA1, this is harmless, but still worth
fixing.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: <linux@horizon.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

uml: kill x86_64 STACK_TOP_MAX

The x86_64 a.out.h got a definition of STACK_TOP_MAX, which interferes with
the UML version. So, just undef it like STACK_TOP.

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

uml: remove PAGE_SIZE from libc code

Distros seem to be removing PAGE_SIZE from asm/page.h. So, the libc side of
UML should stop using it.

I replace it with UM_KERN_PAGE_SIZE, which is defined to be the same as
PAGE_SIZE on the kernel side of the house. I could also use getpagesize(),
but it's more important that UML have the same value of PAGE_SIZE everywhere.
It's conceivable that it could be built with a larger PAGE_SIZE, and use of
getpagesize() would break that badly.

PAGE_MASK got the same treatment, as it is closely tied to PAGE_SIZE.

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

spi doc updates

Update two points in the SPI interface documentation:

- Update description of the "chip stays selected after message ends"
  mode.  In some cases it's required for correctness; it isn't just a
  performance tweak.  (Yes: to use this mode on mult-device busses, another
  programming interface will be needed.  One draft has been circulated
  already.)

- Clarify spi_setup(), highlighting that callers must ensure that no
  requests are queued (can't change configuration except between I/Os), and
  that the device must be deselected when this returns (which is a key part
  of why it's called during device init).

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

md: fix bug in error handling during raid1 repair

If raid1/repair (which reads all block and fixes any differences it finds)
hits a read error, it doesn't reset the bio for writing before writing
correct data back, so the read error isn't fixed, and the device probably
gets a zero-length write which it might complain about.

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

md: fix two raid10 bugs

1/ When resyncing a degraded raid10 which has more than 2 copies of each block,
garbage can get synced on top of good data.

2/ We round the wrong way in part of the device size calculation, which
can cause confusion.

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fuse: ->fs_flags fixlet

fs/fuse/inode.c:658:3: error: Initializer entry defined twice
fs/fuse/inode.c:661:3: also defined here

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

perfctr-watchdog: fix interchanged parameters to release_{evntsel,perfctr}_nmi

Fix oops triggered during: echo 0 > /proc/sys/kernel/nmi_watchdog

The culprit seems to be 09198e68501a7e34737cd9264d266f42429abcdc:
[PATCH] i386: Clean up NMI watchdog code

In two places, the parameters to release_{evntsel,perfctr}_nmi
got interchanged during the cleanup.

Fix interchanged parameters to release_{evntsel,perfctr}_nmi.

Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Cc: Stephane Eranian <eranian@hpl.hp.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

swsusp: Fix userland interface

Fix oops caused by 'cat /dev/snapshot', reported by Arkadiusz Miskiewicz,
and make it impossible to thaw tasks with the help of the swsusp userland
interface while there is a snapshot image ready to save.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

toshiba_acpi: fix section mismatch in allyesconfig

Fix section error (allyesconfig). The exit function is called from init,
so functions that are called by the exit function cannot be marked __exit.

WARNING: drivers/built-in.o(.text+0xe5bc6): Section mismatch: reference to .exit.
text: (between 'toshiba_acpi_exit' and 'hci_raw')

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Len Brown <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cpuset: zero malloc - fix for old cpusets

The cpuset code to present a list of tasks using a cpuset to user space could
write to an array that it had kmalloc'd, after a kmalloc request of zero size.

The problem was that the code didn't check for writes past the allocated end
of the array until -after- the first write.

This is a race condition that is likely rare -- it would only show up if a
cpuset went from being empty to having a task in it, during the brief time
between the allocation and the first write.

Prior to roughly 2.6.22 kernels, this was also a benign problem, because a
zero kmalloc returned a few usable bytes anyway, and no harm was done with the
bogus write.

With the 2.6.22 kernel changes to make issue a warning if code tries to write
to the location returned from a zero size allocation, this problem is no
longer benign. This cpuset code would occassionally trigger that warning.

The fix is trivial -- check before storing into the array, not after, whether
the array is big enough to hold the store.

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Serge E. Hallyn" <serue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Paul Menage <menage@google.com>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

i386 mm: use pte_update() in ptep_test_and_clear_dirty()

It is not safe to use pte_update_defer() in ptep_test_and_clear_young():
its only user, /proc/<pid>/clear_refs, drops pte lock before flushing TLB.
Use the safe though less efficient pte_update() paravirtop in its place.
Likewise in ptep_test_and_clear_dirty(), though that has no current use.

These are macros (header file dependency stops them from becoming inline
functions), so be more liberal with the underscores and parentheses.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Restore shmid as inode# to fix /proc/pid/maps ABI breakage

shmid used to be stored as inode# for shared memory segments. Some of
the proc-ps tools use this from /proc/pid/maps. Recent cleanups
to newseg() changed it. This patch sets inode number back to shared
memory id to fix breakage.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: "Albert Cahalan" <acahalan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

SLUB slab validation: Alloc while interrupts are disabled must use GFP_ATOMIC

The data structure to manage the information gathered about functions
allocating and freeing objects is allocated when the list_lock has already
been taken. We need to allocate with GFP_ATOMIC instead of GFP_KERNEL.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

i386: use the right wrapper to disable the NMI watchdog

When disabled through /proc/sys/kernel/nmi_watchdog, the NMI watchdog uses the
stop() method directly, which does not decrement the activity counter, leading
to a BUG(). Use the wrapper function instead to fix that.

Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

i386: fix NMI watchdog not reserving its MSRs

At system boot time, the NMI watchdog no longer reserved its MSRs, allowing
other subsystems to mess with them. Fix that.

Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

tty: restore locked ioctl file op

Restore tty locked ioctl handler which was replaced with
an unlocked ioctl handler in hung_up_tty_fops by the patch:

commit e10cc1df1d2014f68a4bdcf73f6dd122c4561f94
Author: Paul Fulghum <paulkf@microgate.com>
Date:   Thu May 10 22:22:50 2007 -0700

    tty: add compat_ioctl

This was reported in:
[Bug 8473] New: Oops: 0010 [1] SMP

The bug is caused by switching to hung_up_tty_fops in do_tty_hangup.  An
ioctl call can be waiting on BLK after testing for existence of the locked
ioctl handler in the normal tty fops, but before calling the locked ioctl
handler.  If a hangup occurs at that point, the locked ioctl fop is NULL
and an oops occurs.

(akpm: we can remove my debugging code from do_ioctl() now, but it'll be OK to
do that for 2.6.23)

Signed-off-by: Paul Fulghum <paulkf@microgate.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fix radeon setparam on 32/64 systems, harder.

Commit 9b01bd5b284bbf519b726b39f1352023cb5e9e69 introduced a
compat_ioctl handler for RADEON_SETPARAM, the sole purpose of which was
to handle the fact that on i386, alignof(uint64_t)==4.

Unfortunately, this handler was installed for _all_ 64-bit
architectures, instead of only x86_64 and ia64. And thus it breaks
32-bit compatibility on every other arch, where 64-bit integers are
aligned to 8 bytes in 32-bit mode just the same as in 64-bit mode.

Arnd has a cunning plan to use 'compat_u64' with appropriate alignment
attributes according to the 32-bit ABI, but for now let's just make the
compat_radeon_cp_setparam routine entirely disappear on 64-bit machines
whose 32-bit compat support isn't for i386. It would be a no-op with
compat_u64 anyway.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Airlie <airlied@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ieee1394: fix to ether1394_tx in ether1394.c

This patch fixes a problem that occurs when packets cannot be sent across
the ieee1394 bus and we return NETDEV_TX_BUSY in the net driver "hard start
xmit" routine ether1394_tx. When we return NETDEV_TX_BUSY the stack will
call ether1394_tx again with the same skb. So we need to restore the header
to look like it did before we munged it for xmit over ieee1394.

[Stefan Richter: changed whitespace, deleted a local variable]

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: fix hang after card ejection

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Kristian Høgsberg <krh@redhat.com>

Merge master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6:
ide-scsi: fix OOPS in idescsi_expiry()
Resume from RAM on HPC nx6325 broken

ide-scsi: fix OOPS in idescsi_expiry()

drive->driver_data contains pointer to Scsi_Host not idescsi_scsi_t.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

Resume from RAM on HPC nx6325 broken

generic_ide_resume() should check if dev->driver is not NULL before applying
to_ide_driver() to it. Fix that.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
  [RXRPC] net/rxrpc/ar-connection.c: fix NULL dereference
  [TCP]: Fix logic breakage due to DSACK separation
  [TCP]: Congestion control API RTT sampling fix

mm: Fix memory/cpu hotplug section mismatch and oops.

When building with memory hotplug enabled and cpu hotplug disabled, we
end up with the following section mismatch:

WARNING: mm/built-in.o(.text+0x4e58): Section mismatch: reference to
.init.text: (between 'free_area_init_node' and '__build_all_zonelists')

This happens as a result of:

        -> free_area_init_node()
          -> free_area_init_core()
            -> zone_pcp_init() <-- all __meminit up to this point
              -> zone_batchsize() <-- marked as __cpuinit                     fo

This happens because CONFIG_HOTPLUG_CPU=n sets __cpuinit to __init, but
CONFIG_MEMORY_HOTPLUG=y unsets __meminit.

Changing zone_batchsize() to __devinit fixes this.

__devinit is the only thing that is common between CONFIG_HOTPLUG_CPU=y and
CONFIG_MEMORY_HOTPLUG=y. In the long run, perhaps this should be moved to
another section identifier completely. Without this, memory hot-add
of offline nodes (via hotadd_new_pgdat()) will oops if CPU hotplug is
not also enabled.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
--

mm/page_alloc.c |    2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/cooloney/blackfin-2.6

* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/cooloney/blackfin-2.6: (30 commits)
  Blackfin SMC91X ethernet supporting driver: SMC91C111 LEDs are note drived in the kernel like in uboot
  Blackfin SPI driver: fix bug SPI DMA incomplete transmission
  Blackfin SPI driver: tweak spi cleanup function to match newer kernel changes
  Blackfin RTC drivers: update MAINTAINERS information
  Blackfin serial driver: decouple PARODD and CMSPAR checking from PARENB
  Blackfin serial driver: actually implement the break_ctl() function
  Blackfin serial driver: ignore framing and parity errors
  Blackfin serial driver: hook up our UARTs STP bit with userspaces CMSPAR
  Blackfin arch: move HI/LO macros into blackfin.h and punt the rest of macros.h as it includes VDSP macros we never use
  Blackfin arch: redo our linker script a bit
  Blackfin arch: make sure we initialize our L1 Data B section properly based on the linked kernel
  Blackfin arch: fix bug can not wakeup from sleep via push buttons
  Blackfin arch: add support for Alon Bar-Lev's dynamic kernel command-line
  Blackfin arch: add missing gpio.h header to fix compiling in some pm configurations
  Blackfin arch: As Mike pointed out range goes form m..MAX_BLACKFIN_GPIO -1
  Blackfin arch: fix spelling typo in output
  Blackfin arch: try to split up functions like this into smaller units according to LKML review
  Blackfin arch: add proper ENDPROC()
  Blackfin arch: move more of our startup code to .init so it can be freed once we are up and running
  Blackfin arch: unify differences between our diff head.S files -- no functional changes
  ...

Merge branch 'splice-2.6.22' of git://git.kernel.dk/data/git/linux-2.6-block

* 'splice-2.6.22' of git://git.kernel.dk/data/git/linux-2.6-block:
  splice: only check do_wakeup in splice_to_pipe() for a real pipe
  splice: fix leak of pages on short splice to pipe
  splice: adjust balance_dirty_pages_ratelimited() call

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
KVM: Prevent guest fpu state from leaking into the host

Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus

* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
  [MIPS] Fix builds where MSC01E_xxx is undefined.
  [MIPS] Separate performance counter interrupts
  [MIPS] Malta: Fix for SOCitSC based Maltas

Merge branch 'for-linus' of git://www.atmel.no/~hskinnemoen/linux/kernel/avr32

* 'for-linus' of git://www.atmel.no/~hskinnemoen/linux/kernel/avr32:
  [AVR32] Define ARCH_KMALLOC_MINALIGN to L1_CACHE_BYTES
  [AVR32] STK1000: Set SPI_MODE_3 in the ltv350qv board info
  [AVR32] gpio_*_cansleep() fix
  [AVR32] ratelimit segfault reporting rate

block: always requeue !fs requests at the front

SCSI marks internal commands with REQ_PREEMPT and push it at the front
of the request queue using blk_execute_rq().  When entering suspended
or frozen state, SCSI devices are quiesced using
scsi_device_quiesce().  In quiesced state, only REQ_PREEMPT requests
are processed.  This is how SCSI blocks other requests out while
suspending and resuming.  As all internal commands are pushed at the
front of the queue, this usually works.

Unfortunately, this interacts badly with ordered requeueing.  To
preserve request order on requeueing (due to busy device, active EH or
other failures), requests are sorted according to ordered sequence on
requeue if IO barrier is in progress.

The following sequence deadlocks.

1. IO barrier sequence issues.

2. Suspend requested.  Queue is quiesced with part or all of IO
   barrier sequence at the front.

3. During suspending or resuming, SCSI issues internal command which
   gets deferred and requeued for some reason.  As the command is
   issued after the IO barrier in #1, ordered requeueing code puts the
   request after IO barrier sequence.

4. The device is ready to process requests again but still is in
   quiesced state and the first request of the queue isn't
   REQ_PREEMPT, so command processing is deadlocked -
   suspending/resuming waits for the issued request to complete while
   the request can't be processed till device is put back into
   running state by resuming.

This can be fixed by always putting !fs requests at the front when
requeueing.

The following thread reports this deadlock.

  http://thread.gmane.org/gmane.linux.kernel/537473

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: David Greaves <david@dgreaves.com>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

[RXRPC] net/rxrpc/ar-connection.c: fix NULL dereference

This patch fixes a NULL dereference spotted by the Coverity checker.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>