David Brownell [Wed, 28 Jun 2006 14:47:15 +0000 (07:47 -0700)]
[PATCH] SPI: infrastructure to initialize spi_device.mode early
This patch adds earlier initialization of spi_device.mode, as needed
on boards using nondefault chipselect polarity. An example would be
ones using the RS5C348 RTC without an external signal inverter between
the RTC chipselect and the SPI controller.
Without this mechanism, the first setup() call for that chip would
wrongly enable chips, corrupting transfers to/from other chips sharing
that SPI bus.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Greg Ungerer [Wed, 28 Jun 2006 06:39:19 +0000 (16:39 +1000)]
[PATCH] m68knommu: configuration options for ROM region
Use Kconfig options to setup the optional ROM region used on some
platforms. We used to define this in the linker script on a per
board basis. The configure options are more flexible and clean up
the linker script a lot.
Linus Torvalds [Wed, 28 Jun 2006 22:00:49 +0000 (15:00 -0700)]
Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64-SGI] fix prom revision checks in SN kernel
[IA64] tiger_defconfig s/NR_CPUS=4/NR_CPUS=16/
[IA64-SGI] - Pass OS logical cpu number to the SN prom (bios)
[IA64] palinfo.c: s/register_cpu_notifier/register_hotcpu_notifier/
Alan Cox [Wed, 28 Jun 2006 11:26:59 +0000 (04:26 -0700)]
[PATCH] ide: clean up siimage
Remove all the ifdef preparation for enhanced features that never occcurred
and is only in libata. For the SATA chips (but not yet PATA ones) politely
suggest to the user that libata may offer more features.
Signed-off-by: Alan Cox <alan@redhat.com> Cc: Sergei Shtylylov <sshtylyov@ru.mvista.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Alan Cox [Wed, 28 Jun 2006 11:26:58 +0000 (04:26 -0700)]
[PATCH] Old IDE, fix SATA detection for cabling
This is based on the proposed patches flying around but also checks that
the device in question is new enough to have word 93 rather thanb blindly
assuming word 93 == 0 means SATA (see ATA-5, ATA-7)
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Randy Dunlap [Wed, 28 Jun 2006 11:26:57 +0000 (04:26 -0700)]
[PATCH] ac97_codec: make bitfield unsigned
Make a 1-bit bitfield unsigned (no space for sign bit).
Removes 24 sparse warnings from this one file:
include/linux/ac97_codec.h:262:13: error: dubious one-bit signed bitfield
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Randy Dunlap [Wed, 28 Jun 2006 11:26:56 +0000 (04:26 -0700)]
[PATCH] sound: fix cs4232 section mismatch
Fix section mismatch: adds __init to probe function,
frees some init memory, not critical.
WARNING: sound/oss/cs4232.o - Section mismatch: reference to .init.text: from .text after 'cs4232_pnp_probe' (at offset 0x152)
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a erroneous calculation of the legacy brightness values as reported by
Paul Collins. Additionally, it moves the calculation of the negative value
in the radeonfb driver after the value check.
Signed-off-by: Michael Hanselmann <linux-kernel@hansmi.ch> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Paul Collins <paul@briny.ondioline.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Corey Minyard [Wed, 28 Jun 2006 11:26:55 +0000 (04:26 -0700)]
[PATCH] IPMI: watchdog handle panic properly
Modify the watchdog timeout in IPMI to only do things at panic/reboot time if
the watchdog timer was already running. Some BIOSes do not disable the
watchdog timer at startup, and this led to a reboot a while later if the new
OS running didn't start monitoring the watchdog, even if the watchdog was not
running before.
Latchesar Ionkov [Wed, 28 Jun 2006 11:26:50 +0000 (04:26 -0700)]
[PATCH] v9fs: return the correct error when interrupted by signal
If a signal interrupts the user process, v9fs sends a flush request to the
file server and waits for its response. It error code is incorrectly set
to the error code of the flush message instead of ERESTARTSYS. The patch
sets the error code to the correct value.
Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Fulghum [Wed, 28 Jun 2006 11:26:49 +0000 (04:26 -0700)]
[PATCH] remove active field from tty buffer structure
Remove 'active' field from tty buffer structure. This was added in 2.6.16
as part of a patch to make the new tty buffering SMP safe. This field is
unnecessary with the more intelligently written flush_to_ldisc that adds
receive_room handling.
Removing this field reverts to simpler logic where the tail buffer is
always the 'active' buffer, which should not be freed by flush_to_ldisc.
(active == buffer being filled with new data)
The result is simpler, smaller, and faster tty buffer code.
Signed-off-by: Paul Fulghum <paulkf@microgate.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Fulghum [Wed, 28 Jun 2006 11:26:48 +0000 (04:26 -0700)]
[PATCH] add receive_room flow control to flush_to_ldisc
Flush data serially to line discipline in blocks no larger than
tty->receive_room to avoid losing data if line discipline is busy (such as
N_TTY operating at high speed on heavily loaded system) or does not accept
data in large blocks (such as N_MOUSE).
Signed-off-by: Paul Fulghum <paulkf@microgate.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Fulghum [Wed, 28 Jun 2006 11:26:47 +0000 (04:26 -0700)]
[PATCH] remove TTY_DONT_FLIP
Remove TTY_DONT_FLIP tty flag. This flag was introduced in 2.1.X kernels
to prevent the N_TTY line discipline functions read_chan() and
n_tty_receive_buf() from running at the same time. 2.2.15 introduced
tty->read_lock to protect access to the N_TTY read buffer, which is the
only state requiring protection between these two functions.
The current TTY_DONT_FLIP implementation is broken for SMP, and is not
universally honored by drivers that send data directly to the line
discipline receive_buf function.
Because TTY_DONT_FLIP is not necessary, is broken in implementation, and is
not universally honored, it is removed.
Signed-off-by: Paul Fulghum <paulkf@microgate.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Arjan van de Ven [Wed, 28 Jun 2006 11:26:45 +0000 (04:26 -0700)]
[PATCH] Add EXPORT_UNUSED_SYMBOL and EXPORT_UNUSED_SYMBOL_GPL
Temporarily add EXPORT_UNUSED_SYMBOL and EXPORT_UNUSED_SYMBOL_GPL. These
will be used as a transition measure for symbols that aren't used in the
kernel and are on the way out. When a module uses such a symbol, a warning
is printk'd at modprobe time.
The main reason for removing unused exports is size: eacho export takes
roughly between 100 and 150 bytes of kernel space in the binary. This
patch gives users the option to immediately get this size gain via a config
option.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Same as with already do with the file operations: keep them in .rodata and
prevents people from doing runtime patching.
Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Aaron Young [Wed, 28 Jun 2006 15:34:55 +0000 (08:34 -0700)]
[IA64-SGI] fix prom revision checks in SN kernel
The following patch fixes two spots in the SN kernel
that check a fixed prom revision number to determine prom
feature support. These checks are only valid on shub1 systems.
They are invalid on shub2 systems which have a different prom
with different revision numbers.
Signed-off-by: Aaron Young <ayoung@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
Chandra Seetharaman missed one place in commit: 65edc68c345cbe21d0b0375c3452a3ed5e322868
[but it only shows up when building the ski simulator configuration
of ia64, so thats understandable]
Linus Torvalds [Wed, 28 Jun 2006 02:15:50 +0000 (19:15 -0700)]
Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6:
[netdrvr] Remove long-unused bits from Becker template drivers
[netdrvr] natsemi: minor cleanups
[netdrvr] natsemi: Separate out media initialization code
[PATCH] WAN: update info page for a bunch of my drivers
[PATCH] drivers/net/hamradio/dmascc.c: fix section mismatch
[PATCH] Fix phy id for LXT971A/LXT972A
[PATCH] DM9000 - minor code cleanups
[PATCH] DM9000 - do no re-init spin lock
[PATCH] DM9000 - check for MAC left in by bootloader
[PATCH] DM9000 - better checks for platform resources
Linus Torvalds [Wed, 28 Jun 2006 02:09:16 +0000 (19:09 -0700)]
Merge git://oss.sgi.com:8090/nathans/xfs-2.6
* git://oss.sgi.com:8090/nathans/xfs-2.6:
[XFS] Fixup whitespace damage in log_write, remove final warning.
[XFS] Rework code snippets slightly to remove remaining recent-gcc
[XFS] Fix realtime subvolume expansion, a porting bug b0rked it. Coverity
[XFS] Remove a race condition where a linked inode could BUG_ON in
[XFS] Remove redundant directory checks from inode link operation.
[XFS] Remove a couple of no-longer-used macros.
[XFS] Reduce size of xfs_trans_t structure. * remove ->t_forw, ->t_back --
[XFS] remove unused behaviour lock - shrink XFS vnode as a side effect.
[XFS] * There is trivial "inode => vnode => inode" conversion, but only
[XFS] link(2) on directory is banned in VFS.
Daniel Ritz [Tue, 27 Jun 2006 16:40:54 +0000 (18:40 +0200)]
[PATCH] i2c-i801.c: don't pci_disable_device() after it was just enabled
Commit 02dd7ae2892e5ceff111d032769c78d3377df970 ("[PATCH] i2c-i801:
Merge setup function") has a missing return 0 in the _probe() function.
This means the error path is always executed and pci_disable_device() is
called even when the device just got successfully enabled.
Having the SMBus device disabled makes some systems (eg.
Fujitsu-Siemens Lifebook E8010) hang hard during power-off.
Intead of reverting the whole commit this patch fixes it up:
- don't ever call pci_disable_device(), also not in the _remove() function
to avoid hangs
- fix missing pci_release_region() in error path
Signed-off-by: Daniel Ritz <daniel.ritz@gmx.ch> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: (25 commits)
[CIFS] Fix authentication choice so we do not force NTLMv2 unless the
[CIFS] Fix alignment of unicode strings in previous patch
[CIFS] Fix allocation of buffers for new session setup routine to allow
[CIFS] Remove calls to to take f_owner.lock
[CIFS] remove some redundant null pointer checks
[CIFS] Fix compile warning when CONFIG_CIFS_EXPERIMENTAL is off
[CIFS] Enable sec flags on mount for cifs (part one)
[CIFS] Fix suspend/resume problem which causes EIO on subsequent access to
[CIFS] fix minor compile warning when config_cifs_weak_security is off
[CIFS] NTLMv2 support part 5
[CIFS] Add support for readdir to legacy servers
[CIFS] NTLMv2 support part 4
[CIFS] NTLMv2 support part 3
[CIFS] NTLMv2 support part 2
[CIFS] Fix mask so can set new cifs security flags properly
CIFS] Support for older servers which require plaintext passwords - part 2
[CIFS] Support for older servers which require plaintext passwords
[CIFS] Fix mapping of old SMB return code Invalid Net Name so it is
[CIFS] Missing brace
[CIFS] Do not overwrite aops
...
Greg Ungerer [Tue, 27 Jun 2006 03:19:33 +0000 (13:19 +1000)]
[PATCH] m68knommu: FEC driver event/irq fixes
Collection of fixes for the ColdFire FEC ethernet driver:
. reworked event setting so that it occurs after the MII setup.
roucaries bastien <roucaries.bastien@gmail.com>
. Do not read cbd_sc in memory for each bit we test. Once per buffer is enough.
. Overrun errors must increase `rx_fifo_errors', not `rx_crc_errors'
. No need for a special value to activate rx or tx. Only write access matters.
. Simplify parameter of eth_copy_and_sum : `data' has already the right value.
. Some spelling fixes.
Signed-off-by: Philippe De Muyter <phdm@macqel.be> Signed-off-by: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Willson Callan [Tue, 27 Jun 2006 03:13:44 +0000 (13:13 +1000)]
[PATCH] m68knommu: FEC driver set different priority/level on each IRQ
Set different irq priority levels for each IRQ requested.
According to the Freescale ColdFire documentation each separate IRQ
must have its own unique priority/level combination.
Here is a small patch that made my kernel .text segment shrink by 8k IIRC
on my 5272-based board, by removing `-Wa,-S' from CFLAGS.
The `-Wa,-S' option prevents `gas' from using short forms of jsr.
Without it, `gas' replaces `jsr xxx.l' (6 bytes) by `jsr xxx@pc'
(4 bytes) when possible. On 5272, both forms are equally fast.
The `-Wa,-m5307' option is useless, because gcc already gives it
to `gas' from the `-m5307' option.
* master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb: (26 commits)
V4L/DVB (4263): Fix warning when compiling on 64 bit machines
V4L/DVB (4261): Included required header for in-kernel compilation
V4L/DVB (4260): Stradis.c: make 2 functions static
V4L/DVB (4259): Pass an explicit log prefix to cx2341x_log_status
V4L/DVB (4257): Fix 64-bit compile warnings.
V4L/DVB (4255): Tda9887 default TOP value is 0x10
V4L/DVB (4254): Remove obsoleted tuner_debug option.
V4L/DVB (4253): IVTV VBI format description too long.
V4L/DVB (4252): Remove duplicate 'tda9887' in info messages.
V4L/DVB (4245): Reduce the amount of pvrusb2-sourced noise going into the system log
V4L/DVB (4244): Implement use of cx2341x module in pvrusb2 driver
V4L/DVB (4243): Exploit new V4L control features in pvrusb2
V4L/DVB (4242): Don't suspend encoder when changing its attributes (in pvrusb2)
V4L/DVB (4241): Fix faulty encoder error recovery in pvrusb2
V4L/DVB (4240): Various V4L control enhancements in pvrusb2
V4L/DVB (4239): Handle boolean controls in pvrusb2
V4L/DVB (4238): Make sure flags field is initialized when quering a control in pvrusb2
V4L/DVB (4237): Move LOG_STATUS bracketing to a different part of the pvrusb2 driver
V4L/DVB (4236): Rearrange things in pvrusb2 driver in preparation for using cx2341x module
V4L/DVB (4235): Increase the maximum number of controls that pvrusb2-sysfs.c can handle.
...
Thomas Gleixner [Tue, 27 Jun 2006 09:55:02 +0000 (02:55 -0700)]
[PATCH] rtmutex: Propagate priority settings into PI lock chains
When the priority of a task, which is blocked on a lock, changes we must
propagate this change into the PI lock chain. Therefor the chain walk code
is changed to get rid of the references to current to avoid false positives
in the deadlock detector, as setscheduler might be called by a task which
holds the lock on which the task whose priority is changed is blocked.
Also add some comments about the get/put_task_struct usage to avoid
confusion.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Thomas Gleixner [Tue, 27 Jun 2006 09:55:01 +0000 (02:55 -0700)]
[PATCH] rtmutex: Modify rtmutex-tester to test the setscheduler propagation
Make test suite setscheduler calls asynchronously. Remove the waits in the
test cases and add a new testcase to verify the correctness of the
setscheduler priority propagation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Thomas Gleixner [Tue, 27 Jun 2006 09:55:00 +0000 (02:55 -0700)]
[PATCH] Drop tasklist lock in do_sched_setscheduler
There is no need to hold tasklist_lock across the setscheduler call, when
we pin the task structure with get_task_struct(). Interrupts are disabled
in setscheduler anyway and the permission checks do not need interrupts
disabled.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ingo Molnar [Tue, 27 Jun 2006 09:54:57 +0000 (02:54 -0700)]
[PATCH] pi-futex: rt mutex futex api
Add proxy-locking rt-mutex functionality needed by pi-futexes.
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Thomas Gleixner [Tue, 27 Jun 2006 09:54:56 +0000 (02:54 -0700)]
[PATCH] pi-futex: rt mutex tester
RT-mutex tester: scriptable tester for rt mutexes, which allows userspace
scripting of mutex unit-tests (and dynamic tests as well), using the actual
rt-mutex implementation of the kernel.
[akpm@osdl.org: fixlet] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ingo Molnar [Tue, 27 Jun 2006 09:54:55 +0000 (02:54 -0700)]
[PATCH] pi-futex: rt mutex debug
Runtime debugging functionality for rt-mutexes.
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Steven Rostedt [Tue, 27 Jun 2006 09:54:54 +0000 (02:54 -0700)]
[PATCH] pi-futex: rt mutex docs
Add rt-mutex documentation.
[rostedt@goodmis.org: Update rt-mutex-design.txt as per Randy Dunlap suggestions] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ingo Molnar [Tue, 27 Jun 2006 09:54:53 +0000 (02:54 -0700)]
[PATCH] pi-futex: rt mutex core
Core functions for the rt-mutex subsystem.
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ingo Molnar [Tue, 27 Jun 2006 09:54:51 +0000 (02:54 -0700)]
[PATCH] pi-futex: scheduler support for pi
Add framework to boost/unboost the priority of RT tasks.
This consists of:
- caching the 'normal' priority in ->normal_prio
- providing a functions to set/get the priority of the task
- make sched_setscheduler() aware of boosting
The effective_prio() cleanups also fix a priority-calculation bug pointed out
by Andrey Gelman, in set_user_nice().
has_rt_policy() fix: Peter Williams <pwil3058@bigpond.net.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Andrey Gelman <agelman@012.net.il> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ingo Molnar [Tue, 27 Jun 2006 09:54:51 +0000 (02:54 -0700)]
[PATCH] pi-futex: add plist implementation
Add the priority-sorted list (plist) implementation.
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add debug_check_no_locks_freed(), as a central inline to add
bad-lock-free-debugging functionality to.
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ingo Molnar [Tue, 27 Jun 2006 09:54:47 +0000 (02:54 -0700)]
[PATCH] pi-futex: futex code cleanups
We are pleased to announce "lightweight userspace priority inheritance" (PI)
support for futexes. The following patchset and glibc patch implements it,
ontop of the robust-futexes patchset which is included in 2.6.16-mm1.
We are calling it lightweight for 3 reasons:
- in the user-space fastpath a PI-enabled futex involves no kernel work
(or any other PI complexity) at all. No registration, no extra kernel
calls - just pure fast atomic ops in userspace.
- in the slowpath (in the lock-contention case), the system call and
scheduling pattern is in fact better than that of normal futexes, due to
the 'integrated' nature of FUTEX_LOCK_PI. [more about that further down]
- the in-kernel PI implementation is streamlined around the mutex
abstraction, with strict rules that keep the implementation relatively
simple: only a single owner may own a lock (i.e. no read-write lock
support), only the owner may unlock a lock, no recursive locking, etc.
Many of you heard the horror stories about the evil PI code circling Linux for
years, which makes no real sense at all and is only used by buggy applications
and which has horrible overhead. Some of you have dreaded this very moment,
when someone actually submits working PI code ;-)
So why would we like to see PI support for futexes?
We'd like to see it done purely for technological reasons. We dont think it's
a buggy concept, we think it's useful functionality to offer to applications,
which functionality cannot be achieved in other ways. We also think it's the
right thing to do, and we think we've got the right arguments and the right
numbers to prove that. We also believe that we can address all the
counter-arguments as well. For these reasons (and the reasons outlined below)
we are submitting this patch-set for upstream kernel inclusion.
What are the benefits of PI?
The short reply:
----------------
User-space PI helps achieving/improving determinism for user-space
applications. In the best-case, it can help achieve determinism and
well-bound latencies. Even in the worst-case, PI will improve the statistical
distribution of locking related application delays.
The longer reply:
-----------------
Firstly, sharing locks between multiple tasks is a common programming
technique that often cannot be replaced with lockless algorithms. As we can
see it in the kernel [which is a quite complex program in itself], lockless
structures are rather the exception than the norm - the current ratio of
lockless vs. locky code for shared data structures is somewhere between 1:10
and 1:100. Lockless is hard, and the complexity of lockless algorithms often
endangers to ability to do robust reviews of said code. I.e. critical RT
apps often choose lock structures to protect critical data structures, instead
of lockless algorithms. Furthermore, there are cases (like shared hardware,
or other resource limits) where lockless access is mathematically impossible.
Media players (such as Jack) are an example of reasonable application design
with multiple tasks (with multiple priority levels) sharing short-held locks:
for example, a highprio audio playback thread is combined with medium-prio
construct-audio-data threads and low-prio display-colory-stuff threads. Add
video and decoding to the mix and we've got even more priority levels.
So once we accept that synchronization objects (locks) are an unavoidable fact
of life, and once we accept that multi-task userspace apps have a very fair
expectation of being able to use locks, we've got to think about how to offer
the option of a deterministic locking implementation to user-space.
Most of the technical counter-arguments against doing priority inheritance
only apply to kernel-space locks. But user-space locks are different, there
we cannot disable interrupts or make the task non-preemptible in a critical
section, so the 'use spinlocks' argument does not apply (user-space spinlocks
have the same priority inversion problems as other user-space locking
constructs). Fact is, pretty much the only technique that currently enables
good determinism for userspace locks (such as futex-based pthread mutexes) is
priority inheritance:
Currently (without PI), if a high-prio and a low-prio task shares a lock [this
is a quite common scenario for most non-trivial RT applications], even if all
critical sections are coded carefully to be deterministic (i.e. all critical
sections are short in duration and only execute a limited number of
instructions), the kernel cannot guarantee any deterministic execution of the
high-prio task: any medium-priority task could preempt the low-prio task while
it holds the shared lock and executes the critical section, and could delay it
indefinitely.
Implementation:
---------------
As mentioned before, the userspace fastpath of PI-enabled pthread mutexes
involves no kernel work at all - they behave quite similarly to normal
futex-based locks: a 0 value means unlocked, and a value==TID means locked.
(This is the same method as used by list-based robust futexes.) Userspace uses
atomic ops to lock/unlock these mutexes without entering the kernel.
To handle the slowpath, we have added two new futex ops:
FUTEX_LOCK_PI
FUTEX_UNLOCK_PI
If the lock-acquire fastpath fails, [i.e. an atomic transition from 0 to TID
fails], then FUTEX_LOCK_PI is called. The kernel does all the remaining work:
if there is no futex-queue attached to the futex address yet then the code
looks up the task that owns the futex [it has put its own TID into the futex
value], and attaches a 'PI state' structure to the futex-queue. The pi_state
includes an rt-mutex, which is a PI-aware, kernel-based synchronization
object. The 'other' task is made the owner of the rt-mutex, and the
FUTEX_WAITERS bit is atomically set in the futex value. Then this task tries
to lock the rt-mutex, on which it blocks. Once it returns, it has the mutex
acquired, and it sets the futex value to its own TID and returns. Userspace
has no other work to perform - it now owns the lock, and futex value contains
FUTEX_WAITERS|TID.
If the unlock side fastpath succeeds, [i.e. userspace manages to do a TID ->
0 atomic transition of the futex value], then no kernel work is triggered.
If the unlock fastpath fails (because the FUTEX_WAITERS bit is set), then
FUTEX_UNLOCK_PI is called, and the kernel unlocks the futex on the behalf of
userspace - and it also unlocks the attached pi_state->rt_mutex and thus wakes
up any potential waiters.
Note that under this approach, contrary to other PI-futex approaches, there is
no prior 'registration' of a PI-futex. [which is not quite possible anyway,
due to existing ABI properties of pthread mutexes.]
Also, under this scheme, 'robustness' and 'PI' are two orthogonal properties
of futexes, and all four combinations are possible: futex, robust-futex,
PI-futex, robust+PI-futex.
glibc support:
--------------
Ulrich Drepper and Jakub Jelinek have written glibc support for PI-futexes
(and robust futexes), enabling robust and PI (PTHREAD_PRIO_INHERIT) POSIX
mutexes. (PTHREAD_PRIO_PROTECT support will be added later on too, no
additional kernel changes are needed for that). [NOTE: The glibc patch is
obviously inofficial and unsupported without matching upstream kernel
functionality.]
the patch-queue and the glibc patch can also be downloaded from:
http://redhat.com/~mingo/PI-futex-patches/
Many thanks go to the people who helped us create this kernel feature: Steven
Rostedt, Esben Nielsen, Benedikt Spranger, Daniel Walker, John Cooper, Arjan
van de Ven, Oleg Nesterov and others. Credits for related prior projects goes
to Dirk Grambow, Inaky Perez-Gonzalez, Bill Huey and many others.
Clean up the futex code, before adding more features to it:
- use u32 as the futex field type - that's the ABI
- use __user and pointers to u32 instead of unsigned long
- code style / comment style cleanups
- rename hash-bucket name from 'bh' to 'hb'.
I checked the pre and post futex.o object files to make sure this
patch has no code effects.
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Steven Rostedt [Tue, 27 Jun 2006 09:54:44 +0000 (02:54 -0700)]
[PATCH] BUG() if setscheduler is called from interrupt context
Thomas Gleixner is adding the call to a rtmutex function in setscheduler.
This call grabs a spin_lock that is not always protected by interrupts
disabled. So this means that setscheduler cant be called from interrupt
context.
To prevent this from happening in the future, this patch adds a
BUG_ON(in_interrupt()) in that function. (Thanks to akpm <aka. Andrew
Morton> for this suggestion).
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Siddha, Suresh B [Tue, 27 Jun 2006 09:54:42 +0000 (02:54 -0700)]
[PATCH] sched: mc/smt power savings sched policy
sysfs entries 'sched_mc_power_savings' and 'sched_smt_power_savings' in
/sys/devices/system/cpu/ control the MC/SMT power savings policy for the
scheduler.
Based on the values (1-enable, 0-disable) for these controls, sched groups
cpu power will be determined for different domains. When power savings
policy is enabled and under light load conditions, scheduler will minimize
the physical packages/cpu cores carrying the load and thus conserving
power(with a perf impact based on the workload characteristics... see OLS
2005 CMP kernel scheduler paper for more details..)
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Con Kolivas <kernel@kolivas.org> Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>