David S. Miller [Wed, 1 Feb 2006 02:34:51 +0000 (18:34 -0800)]
[SPARC64]: Fix race in LOAD_PER_CPU_BASE()
Since we use %g5 itself as a temporary, it can get clobbered
if we take an interrupt mid-stream and thus cause end up with
the final %g5 value too early as a result of rtrap processing.
Set %g5 at the very end, atomically, to avoid this problem.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 1 Feb 2006 02:34:21 +0000 (18:34 -0800)]
[SPARC64]: Fix too early reference to %g6
%g6 is not necessarily set to current_thread_info()
at sparc64_realfault_common. So store the fault
code and address after we invoke etrap and %g6 is
properly set up.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 1 Feb 2006 02:33:00 +0000 (18:33 -0800)]
[SPARC64]: Fix bogus flush instruction usage.
Some of the trap code was still assuming that alternate
global %g6 was hard coded with current_thread_info().
Let's just consistently flush at KERNBASE when we need
a pipeline synchronization. That's locked into the TLB
and will always work.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 1 Feb 2006 02:31:20 +0000 (18:31 -0800)]
[SPARC64]: Add infrastructure for dynamic TSB sizing.
This also cleans up tsb_context_switch(). The assembler
routine is now __tsb_context_switch() and the former is
an inline function that picks out the bits from the mm_struct
and passes it into the assembler code as arguments.
setup_tsb_parms() computes the locked TLB entry to map the
TSB. Later when we support using the physical address quad
load instructions of Cheetah+ and later, we'll simply use
the physical address for the TSB register value and set
the map virtual and PTE both to zero.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 1 Feb 2006 02:31:06 +0000 (18:31 -0800)]
[SPARC64]: TSB refinements.
Move {init_new,destroy}_context() out of line.
Do not put huge pages into the TSB, only base page size translations.
There are some clever things we could do here, but for now let's be
correct instead of fancy.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 27 Feb 2006 07:24:22 +0000 (23:24 -0800)]
[SPARC64]: Elminate all usage of hard-coded trap globals.
UltraSPARC has special sets of global registers which are switched to
for certain trap types. There is one set for MMU related traps, one
set of Interrupt Vector processing, and another set (called the
Alternate globals) for all other trap types.
For what seems like forever we've hard coded the values in some of
these trap registers. Some examples include:
1) Interrupt Vector global %g6 holds current processors interrupt
work struct where received interrupts are managed for IRQ handler
dispatch.
2) MMU global %g7 holds the base of the page tables of the currently
active address space.
3) Alternate global %g6 held the current_thread_info() value.
Such hardcoding has resulted in some serious issues in many areas.
There are some code sequences where having another register available
would help clean up the implementation. Taking traps such as
cross-calls from the OBP firmware requires some trick code sequences
wherein we have to save away and restore all of the special sets of
global registers when we enter/exit OBP.
We were also using the IMMU TSB register on SMP to hold the per-cpu
area base address, which doesn't work any longer now that we actually
use the TSB facility of the cpu.
The implementation is pretty straight forward. One tricky bit is
getting the current processor ID as that is different on different cpu
variants. We use a stub with a fancy calling convention which we
patch at boot time. The calling convention is that the stub is
branched to and the (PC - 4) to return to is in register %g1. The cpu
number is left in %g6. This stub can be invoked by using the
__GET_CPUID macro.
We use an array of per-cpu trap state to store the current thread and
physical address of the current address space's page tables. The
TRAP_LOAD_THREAD_REG loads %g6 with the current thread from this
table, it uses __GET_CPUID and also clobbers %g1.
TRAP_LOAD_IRQ_WORK is used by the interrupt vector processing to load
the current processor's IRQ software state into %g6. It also uses
__GET_CPUID and clobbers %g1.
Finally, TRAP_LOAD_PGD_PHYS loads the physical address base of the
current address space's page tables into %g7, it clobbers %g1 and uses
__GET_CPUID.
Many refinements are possible, as well as some tuning, with this stuff
in place.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 1 Feb 2006 02:29:18 +0000 (18:29 -0800)]
[SPARC64]: Move away from virtual page tables, part 1.
We now use the TSB hardware assist features of the UltraSPARC
MMUs.
SMP is currently knowingly broken, we need to find another place
to store the per-cpu base pointers. We hid them away in the TSB
base register, and that obviously will not work any more :-)
Another known broken case is non-8KB base page size.
Also noticed that flush_tlb_all() is not referenced anywhere, only
the internal __flush_tlb_all() (local cpu only) is used by the
sparc64 port, so we can get rid of flush_tlb_all().
The kernel gets it's own 8KB TSB (swapper_tsb) and each address space
gets it's own private 8K TSB. Later we can add code to dynamically
increase the size of per-process TSB as the RSS grows. An 8KB TSB is
good enough for up to about a 4MB RSS, after which the TSB starts to
incur many capacity and conflict misses.
We even accumulate OBP translations into the kernel TSB.
Another area for refinement is large page size support. We could use
a secondary address space TSB to handle those.
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch "[SPARC64]: Get rid of fast IRQ feature"
moved the the code from arch/sparc64/kernel/entry.S:
lduba [%g7] ASI_PHYS_BYPASS_EC_E, %g5
or %g5, AUXIO_AUX1_FTCNT, %g5
stba %g5, [%g7] ASI_PHYS_BYPASS_EC_E
andn %g5, AUXIO_AUX1_FTCNT, %g5
stba %g5, [%g7] ASI_PHYS_BYPASS_EC_E
to arch/sparc64/kernel/irq.c:
val = readb(auxio_register);
val |= AUXIO_AUX1_FTCNT;
writeb(val, auxio_register);
val &= AUXIO_AUX1_FTCNT;
writeb(val, auxio_register);
This looks like it it missing a bitwise not, which is reintroduced
by this patch.
Due to lack of a floppy device, I could not test it, but it looks
evident.
Signed-off-by: Bernhard R Link <brlink@debian.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 20 Mar 2006 05:12:00 +0000 (21:12 -0800)]
Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
[MIPS] SB1: Check for -mno-sched-prolog if building corelis debug kernel.
[MIPS] Sibyte: Fix race in sb1250_gettimeoffset().
[MIPS] Sibyte: Fix interrupt timer off by one bug.
[MIPS] Sibyte: Fix M_SCD_TIMER_INIT and M_SCD_TIMER_CNT wrong field width.
[MIPS] Protect more of timer_interrupt() by xtime_lock.
[MIPS] Work around bad code generation for <asm/io.h>.
[MIPS] Simple patch to power off DBAU1200
[MIPS] Fix DBAu1550 software power off.
[MIPS] local_r4k_flush_cache_page fix
[MIPS] SB1: Fix interrupt disable hazard.
[MIPS] Get rid of the IP22-specific code in arclib.
Update MAINTAINERS entry for MIPS.
Michael Chan [Sun, 19 Mar 2006 21:21:12 +0000 (13:21 -0800)]
[TG3]: 40-bit DMA workaround part 2
The 40-bit DMA workaround recently implemented for 5714, 5715, and
5780 needs to be expanded because there may be other tg3 devices
behind the EPB Express to PCIX bridge in the 5780 class device.
For example, some 4-port card or mother board designs have 5704 behind
the 5714.
All devices behind the EPB require the 40-bit DMA workaround.
Thanks to Chris Elmquist again for reporting the problem and testing
the patch.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
If the AX.25 dialect chosen by the sysadmin is set to DAMA master / 3
(or DAMA slave / 2, if CONFIG_AX25_DAMA_SLAVE=n) ax25_kick() will fall
through the switch statement without calling ax25_send_iframe() or any
other function that would eventually free skbn thus leaking the packet.
Fix by restricting the sysctl inferface to allow only actually supported
AX.25 dialects.
The system administration mistake needed for this to happen is rather
unlikely, so this is an uncritical hole.
Coverity #651.
Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Krufky [Wed, 15 Mar 2006 05:36:13 +0000 (02:36 -0300)]
[PATCH] Kconfig: swap VIDEO_CX88_ALSA and VIDEO_CX88_DVB
VIDEO_CX88_ALSA should not be between VIDEO_CX88_DVB and
VIDEO_CX88_DVB_ALL_FRONTENDS
When cx88-alsa was added to cx88/Kconfig, it was added in between
VIDEO_CX88_DVB and VIDEO_CX88_DVB_ALL_FRONTENDS. This caused
undesireable effects to the appearance of the menu options in
menuconfig.
This fix reorders cx88-alsa and cx88-dvb in Kconfig, to match saa7134,
and restore the correct menuconfig appearance.
Oleg Nesterov [Sat, 18 Mar 2006 17:41:10 +0000 (20:41 +0300)]
[PATCH] disable unshare(CLONE_VM) for now
sys_unshare() does mmput(new_mm). This is not enough if we have
mm->core_waiters.
This patch is a temporary fix for soon to be released 2.6.16.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
[ Checked with Uli: "I'm not planning to use unshare(CLONE_VM). It's
not needed for any functionality planned so far. What we (as in Red
Hat) need unshare() for now is the filesystem side." ] Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ralf Baechle [Wed, 15 Mar 2006 00:03:29 +0000 (00:03 +0000)]
[MIPS] Sibyte: Fix race in sb1250_gettimeoffset().
From Dave Johnson <djohnson+linuxmips@sw.starentnetworks.com>:
sb1250_gettimeoffset() simply reads the current cpu 0 timer remaining
value, however once this counter reaches 0 and the interrupt is raised,
it immediately resets and begins to count down again.
If sb1250_gettimeoffset() is called on cpu 1 via do_gettimeofday() after
the timer has reset but prior to cpu 0 processing the interrupt and
taking write_seqlock() in timer_interrupt() it will return a full value
(or close to it) causing time to jump backwards 1ms. Once cpu 0 handles
the interrupt and timer_interrupt() gets far enough along it will jump
forward 1ms.
Fix this problem by implementing mips_hpt_*() on sb1250 using a spare
timer unrelated to the existing periodic interrupt timers. It runs at
1Mhz with a full 23bit counter. This eliminated the custom
do_gettimeoffset() for sb1250 and allowed use of the generic
fixed_rate_gettimeoffset() using mips_hpt_*() and timerhi/timerlo.
Ralf Baechle [Tue, 14 Mar 2006 23:46:58 +0000 (23:46 +0000)]
[MIPS] Protect more of timer_interrupt() by xtime_lock.
From Dave Johnson <djohnson+linuxmips@sw.starentnetworks.com>:
* do_timer() expects the arch-specific handler to take the lock as it
modifies jiffies[_64] and xtime.
* writing timerhi/lo in timer_interrupt() will mess up
fixed_rate_gettimeoffset() which reads timerhi/lo.
Ralf Baechle [Wed, 15 Mar 2006 11:36:31 +0000 (11:36 +0000)]
[MIPS] Work around bad code generation for <asm/io.h>.
If a call to set_io_port_base() was being followed by usage of
mips_io_port_base in the same function gcc was possibly using the old
value due to some clever abuse of const. Adding a barrier will keep
the optimization and result in correct code with latest gcc.
Atsushi Nemoto [Mon, 13 Mar 2006 09:23:03 +0000 (18:23 +0900)]
[MIPS] local_r4k_flush_cache_page fix
If dcache_size != icache_size or dcache_size != scache_size, or
set-associative cache, icache/scache does not flushed properly. Make
blast_?cache_page_indexed() masks its index value correctly. Also,
use physical address for physically indexed pcache/scache.
Hugh Dickins [Fri, 17 Mar 2006 07:04:09 +0000 (23:04 -0800)]
[PATCH] fix free swap cache latency
Lee Revell reported 28ms latency when process with lots of swapped memory
exits.
2.6.15 introduced a latency regression when unmapping: in accounting the
zap_work latency breaker, pte_none counted 1, pte_present PAGE_SIZE, but a
swap entry counted nothing at all. We think of pages present as the slow
case, but Lee's trace shows that free_swap_and_cache's radix tree lookup
can make a lot of work - and we could have been doing it many thousands of
times without a latency break.
Move the zap_work update up to account swap entries like pages present.
This does account non-linear pte_file entries, and unmap_mapping_range
skipping over swap entries, by the same amount even though they're quick:
but neither of those cases deserves complicating the code (and they're
treated no worse than they were in 2.6.14).
Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: Nick Piggin <npiggin@suse.de> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sam Ravnborg [Fri, 17 Mar 2006 07:04:08 +0000 (23:04 -0800)]
[PATCH] kbuild: fix buffer overflow in modpost
Jiri Benc <jbenc@suse.cz> reported that modpost would stop with SIGABRT if
used with long filepaths.
The error looked like:
> Building modules, stage 2.
> MODPOST
> *** glibc detected *** scripts/mod/modpost: realloc(): invalid next size:
+0x0809f588 ***
> [...]
Fix this by allocating at least the required memory + SZ bytes each time.
Before we sometimes ended up allocating too little memory resuting in the
glibc detected bug above. Based on patch originally submitted by: Jiri
Benc <jbenc@suse.cz>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This can happen because the arguments to the nfsservctl() system call are
versioned. This is a good thing. However, when a bad version is detected,
the kernel prints a message and then returns an error.
Signed-off-by: Peter Staubach <staubach@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We can call try_to_release_page() with PagePrivate off and a valid
page->mapping This may cause all sorts of trouble for the filesystem
*_releasepage() handlers. XFS bombs out in that case.
Lock the page before checking for page private.
Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Kevin Corry [Fri, 17 Mar 2006 07:04:03 +0000 (23:04 -0800)]
[PATCH] dm stripe: Fix bounds
The dm-stripe target currently does not enforce that the size of a stripe
device be a multiple of the chunk-size. Under certain conditions, this can
lead to I/O requests going off the end of an underlying device. This
test-case shows one example.
This will produce the output:
dd: writing '/dev/mapper/stripe0': Input/output error
97+0 records in
96+0 records out
And in the kernel log will be:
attempt to access beyond end of device
dm-0: rw=0, want=104, limit=100
The patch will check that the table size is a multiple of the stripe
chunk-size when the table is created, which will prevent the above striped
device from being created.
This should not affect tools like LVM or EVMS, since in all the cases I can
think of, striped devices are always created with the sizes being a
multiple of the chunk-size.
The size of a stripe device must be a multiple of its chunk-size.
(akpm: that typecast is quite gratuitous)
Signed-off-by: Kevin Corry <kevcorry@us.ibm.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] x86: check for online cpus before bringing them up
Bryce reported a bug wherein offlining CPU0 (on x86 box) and then
subsequently onlining it resulted in a lockup.
On x86, CPU0 is never offlined. The subsequent attempt to online CPU0
doesn't take that into account. It actually tries to bootup the already
booted CPU. Following patch fixes the problem (as acknowledged by Bryce).
Please consider for inclusion in 2.6.16.
[PATCH] v9fs: fix overzealous dropping of dentry which breaks dcache
There is a d_drop in dir_release which caused problems as it invalidates
dcache entries too soon. This was likely a part of the wierd cwd behavior
folks were seeing.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Roman Zippel [Fri, 17 Mar 2006 07:04:01 +0000 (23:04 -0800)]
[PATCH] posix-timers: fix requeue accounting when signal is ignored
When the posix-timer signal is ignored then the timer is rearmed by the
callback function. The requeue pending accounting has to be fixed up else
the state might be wrong.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The pointer to the current time interpolator and the current list of time
interpolators are typically only changed during bootup. Adding
__read_mostly takes them away from possibly hot cachelines.
Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] page migration: Fail with error if swap not setup
Currently the migration of anonymous pages will silently fail if no swap is
setup. This patch makes page migration functions check for available swap
and fail with -ENODEV if no swap space is available.
Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] unshare: Use rcu_assign_pointer when setting sighand
The sighand pointer only needs the rcu_read_lock on the
read side. So only depending on task_lock protection
when setting this pointer is not enough. We also need
a memory barrier to ensure the initialization is seen first.
Use rcu_assign_pointer as it does this for us, and clearly
documents that we are setting an rcu readable pointer.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Scott Bardone [Fri, 17 Mar 2006 00:20:40 +0000 (19:20 -0500)]
[netdrvr] fix array overflows in Chelsio driver
Adrian Bunk wrote:
> The Coverity checker spotted the following two array overflows in
> drivers/net/chelsio/sge.c (in both cases, the arrays contain 3
> elements):
[snip]
This is a bug. The array should contain 2 elements. Here is the fix.
Signed-off-by: Scott Bardone <sbardone@chelsio.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge:
powerpc: update defconfigs
[PATCH] powerpc: properly configure DDR/P5IOC children devs
[PATCH] powerpc: remove duplicate EXPORT_SYMBOLS
[PATCH] powerpc: RTC memory corruption
[PATCH] powerpc: enable NAP only on cpus who support it to avoid memory corruption
[PATCH] powerpc: Clarify wording for CRASH_DUMP Kconfig option
[PATCH] powerpc/64: enable CONFIG_BLK_DEV_SL82C105
[PATCH] powerpc: correct cacheflush loop in zImage
powerpc: Fix problem with time going backwards
powerpc: Disallow lparcfg being a module
John Rose [Tue, 14 Mar 2006 23:46:45 +0000 (17:46 -0600)]
[PATCH] powerpc: properly configure DDR/P5IOC children devs
The dynamic add path for PCI Host Bridges can fail to configure children
adapters under P5IOC controllers. It fails to properly fixup bus/device
resources, and it fails to properly enable EEH. Both of these steps
need to occur before any children devices are enabled in
pci_bus_add_devices().
Signed-off-by: John Rose <johnrose@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
Olaf Hering [Tue, 14 Mar 2006 20:21:11 +0000 (21:21 +0100)]
[PATCH] powerpc: remove duplicate EXPORT_SYMBOLS
remove warnings when building a 64bit kernel.
smp_call_function triggers also with 32bit kernel.
WARNING: vmlinux: duplicate symbol 'smp_call_function' previous definition was in vmlinux
arch/powerpc/kernel/ppc_ksyms.c:164:EXPORT_SYMBOL(smp_call_function);
arch/powerpc/kernel/smp.c:300:EXPORT_SYMBOL(smp_call_function);
WARNING: vmlinux: duplicate symbol 'ioremap' previous definition was in vmlinux
arch/powerpc/kernel/ppc_ksyms.c:113:EXPORT_SYMBOL(ioremap);
arch/powerpc/mm/pgtable_64.c:321:EXPORT_SYMBOL(ioremap);
WARNING: vmlinux: duplicate symbol '__ioremap' previous definition was in vmlinux
arch/powerpc/kernel/ppc_ksyms.c:117:EXPORT_SYMBOL(__ioremap);
arch/powerpc/mm/pgtable_64.c:322:EXPORT_SYMBOL(__ioremap);
WARNING: vmlinux: duplicate symbol 'iounmap' previous definition was in vmlinux
arch/powerpc/kernel/ppc_ksyms.c:118:EXPORT_SYMBOL(iounmap);
arch/powerpc/mm/pgtable_64.c:323:EXPORT_SYMBOL(iounmap);
Signed-off-by: Olaf Hering <olh@suse.de> Signed-off-by: Paul Mackerras <paulus@samba.org>
[PATCH] powerpc: enable NAP only on cpus who support it to avoid memory corruption
This patch fixes incorrect setting of powersave_nap to 1 on all
PowerMacs, potentially causing memory corruption on some models. This
bug was introuced by me during the 32/64 bits merge.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Michael Ellerman [Fri, 10 Mar 2006 04:01:08 +0000 (15:01 +1100)]
[PATCH] powerpc: Clarify wording for CRASH_DUMP Kconfig option
The wording of the CRASH_DUMP Kconfig option is not very clear. It gives you a
kernel that can be used _as_ the kdump kernel, not a kernel that can boot into
a kdump kernel.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
Enable the onboard IDE driver for p610, p615 and p630.
They have the CD connected to this card. All other RS/6000 systems with this
controller have no connectors and dont need this option.
Signed-off-by: Olaf Hering <olh@suse.de> Signed-off-by: Paul Mackerras <paulus@samba.org>
Paul Mackerras [Wed, 15 Mar 2006 02:47:15 +0000 (13:47 +1100)]
powerpc: Fix problem with time going backwards
The recent changes to keep gettimeofday in sync with xtime had the side
effect that it was occasionally possible for the time reported by
gettimeofday to go back by a microsecond. There were two reasons:
(1) when we recalculated the offsets used by gettimeofday every 2^31
timebase ticks, we lost an accumulated fractional microsecond, and
(2) because the update is done some time after the notional start of
jiffy, if ntp is slowing the clock, it is possible to see time go backwards
when the timebase factor gets reduced.
This fixes it by (a) slowing the gettimeofday clock by about 1us in
2^31 timebase ticks (a factor of less than 1 in 3.7 million), and (b)
adjusting the timebase offsets in the rare case that the gettimeofday
result could possibly go backwards (i.e. when ntp is slowing the clock
and the timer interrupt is late). In this case the adjustment will
reduce to zero eventually because of (a).
This fixes not one, but _two_, silly (but admittedly hard to hit) bugs
in the ext2 filesystem "readdir()" function. It also cleans up the code
to avoid the unnecessary goto mess.
The bugs were related to re-valiating the f_pos value after somebody had
either done an "lseek()" on the directory to an invalid offset, or when
the offset had become invalid due to a file being unlinked in the
directory. The code would not only set the f_version too eagerly, it
would also not update f_pos appropriately for when the offset fixup took
place.
When that happened, we'd occasionally subsequently fail the readdir()
even when we shouldn't (no real harm done, but an ugly printk, and
obviously you would end up not necessarily seeing all entries).
Thanks to Masoud Sharbiani <masouds@google.com> who noticed the problem
and had a test-case for it, and also fixed up a thinko in the first
version of this patch.
Ben Dooks [Wed, 15 Mar 2006 23:17:30 +0000 (23:17 +0000)]
[ARM] 3365/1: [cleanup] header for compat.c exported functions
Patch from Ben Dooks
arch/arm/kernel/compat.c exports two functions,
convert_to_tag_list and squash_mem_tags which
are not defined in any header files, and not
used outside arch/arm/kernel.
Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Ben Dooks [Wed, 15 Mar 2006 23:17:23 +0000 (23:17 +0000)]
[ARM] 3363/1: [cleanup] process.c - fix warnings
Patch from Ben Dooks
Fix the following warnings from sparse:
arch/arm/kernel/process.c:86:6: warning: symbol 'default_idle' was not declared. Should it be static?
arch/arm/kernel/process.c:378:5: warning: symbol 'dump_fpu' was not declared. Should it be static?
Include <linux/elfcore.h> for dump_fpu() decleration, and
make default_idle() static as it is not used outside the file.
Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Hong Liu [Wed, 8 Mar 2006 02:50:20 +0000 (10:50 +0800)]
[PATCH] ieee80211: Fix QoS is not active problem
Fix QoS is not active even the network and the card is QOS enabled.
The problem is we pass the wrong ieee80211_network address to
ipw_handle_beacon/ipw_handle_probe_response, thus the
ieee80211_network->qos_data.active will not be set, causing the driver
not sending QoS frames at all.
Signed-off-by: Hong Liu <hong.liu@intel.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Jesse Brandeburg [Wed, 15 Mar 2006 18:55:24 +0000 (10:55 -0800)]
e100: fix eeh on pseries during ethtool -t
Olaf Hering reported a problem on pseries with e100 where ethtool -t would
cause a bus error, and the e100 driver would stop working. Due to the new
load ucode command the cb list must be allocated before calling
e100_init_hw, so remove the call and just let e100_up take care of it.
Add DMA resources to s3c2410 spi platform devices - dma_(alloc|free)_coherent should now work as expected.
Signed-off-by: Albrecht Dreß <albrecht.dress@lios-tech.com> Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Pavel Machek [Wed, 15 Mar 2006 16:03:03 +0000 (16:03 +0000)]
[ARM] 3357/1: enable frontlight on collie
Patch from Pavel Machek
Enable frontlight during collie bootup, so that display is actually
readable in anything other than bright sunlight.
Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[PATCH] Consistent capabilites associated with MPOL_MOVE_ALL
It seems that setting scheduling policy and priorities is also the kind of
thing that might be performed in apps that also use the NUMA API, so it
would seem consistent to use CAP_SYS_NICE for NUMA also.
So use CAP_SYS_NICE for controlling migration permissions.
Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Rework text in vm/page-migration to be clearer and reflect the final
version of page migration in 2.6.16. Mention Andi Kleen's numactl
package that contains user space tools for page migration via
libnuma. Add reference to numa_maps and to the manpage in numactl.
- Add todo list for outstanding issues
Signed-off-by: Christoph Lameter <clameter@sgi.com> Acked-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] page migration: fail if page is in a vma flagged VM_LOCKED
page migration currently simply retries a couple of times if try_to_unmap()
fails without inspecting the return code.
However, SWAP_FAIL indicates that the page is in a vma that has the
VM_LOCKED flag set (if ignore_refs ==1). We can check for that return code
and avoid retrying the migration.
migrate_page_remove_references() now needs to return a reason why the
failure occured. So switch migrate_page_remove_references to use -Exx
style error messages.
Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Nathan Scott [Wed, 15 Mar 2006 04:14:45 +0000 (15:14 +1100)]
Fix a direct I/O locking issue revealed by the new mutex code.
Affects only XFS (i.e. DIO_OWN_LOCKING case) - currently it is
not possible to get i_mutex locking correct when using DIO_OWN
direct I/O locking in a filesystem due to indeterminism in the
possible return code/lock/unlock combinations. This can cause
a direct read to attempt a double i_mutex unlock inside XFS.
We're now ensuring __blockdev_direct_IO always exits with the
inode i_mutex (still) held for a direct reader.
Tested with the three different locking modes (via direct block
device access, ext3 and XFS) - both reading and writing; cannot
find any regressions resulting from this change, and it clearly
fixes the mutex_unlock warning originally reported here:
http://marc.theaimsgroup.com/?l=linux-kernel&m=114189068126253&w=2
Signed-off-by: Nathan Scott <nathans@sgi.com> Acked-by: Christoph Hellwig <hch@lst.de>
Maneesh Soni [Tue, 14 Mar 2006 09:33:14 +0000 (15:03 +0530)]
[PATCH] Plug kdump shutdown race window
lapic_shutdown() re-enables interrupts which is un-desirable for panic
case, so use local_irq_save() and local_irq_restore() to keep the irqs
disabled for kexec on panic case, and close a possible race window while
kdump shutdown as shown in this stack trace
Dave Peterson [Tue, 14 Mar 2006 05:20:50 +0000 (21:20 -0800)]
[PATCH] EDAC: disable sysfs interface
- Disable the EDAC sysfs code. The sysfs interface that EDAC presents to
user space needs more thought, and is likely to change substantially.
Therefore disable it for now so users don't start depending on it in its
current form.
- Disable the default behavior of calling panic() when an uncorrectible
error is detected (since for now, there is no sysfs interface that allows
the user to configure this behavior).
Signed-off-by: David S. Peterson <dsp@llnl.gov> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Trond Myklebust [Tue, 14 Mar 2006 05:20:48 +0000 (21:20 -0800)]
[PATCH] SUNRPC: Fix potential deadlock in RPC code
In rpc_wake_up() and rpc_wake_up_status(), it is possible for the call to
__rpc_wake_up_task() to fail if another thread happens to be calling
rpc_wake_up_task() on the same rpc_task.
Trond Myklebust [Tue, 14 Mar 2006 05:20:47 +0000 (21:20 -0800)]
[PATCH] NFSv4: fix mount segfault on errors returned that are < -1000
It turns out that nfs4_proc_get_root() may return raw NFSv4 errors instead of
mapping them to kernel errors. Problem spotted by Neil Horman
<nhorman@tuxdriver.com>
Trond Myklebust [Tue, 14 Mar 2006 05:20:46 +0000 (21:20 -0800)]
[PATCH] NFS: Fix a potential panic in O_DIRECT
Based on an original patch by Mike O'Connor and Greg Banks of SGI.
Mike states:
A normal user can panic an NFS client and cause a local DoS with
'judicious'(?) use of O_DIRECT. Any O_DIRECT write to an NFS file where the
user buffer starts with a valid mapped page and contains an unmapped page,
will crash in this way. I haven't followed the code, but O_DIRECT reads with
similar user buffers will probably also crash albeit in different ways.
Details: when nfs_get_user_pages() calls get_user_pages(), it detects and
correctly handles get_user_pages() returning an error, which happens if the
first page covered by the user buffer's address range is unmapped. However,
if the first page is mapped but some subsequent page isn't, get_user_pages()
will return a positive number which is less than the number of pages requested
(this behaviour is sort of analagous to a short write() call and appears to be
intentional). nfs_get_user_pages() doesn't detect this and hands off the
array of pages (whose last few elements are random rubbish from the newly
allocated array memory) to it's caller, whence they go to
nfs_direct_write_seg(), which then totally ignores the nr_pages it's given,
and calculates its own idea of how many pages are in the array from the user
buffer length. Needless to say, when it comes to transmit those uninitialised
page* pointers, we see a crash in the network stack.
GOTO Masanori [Tue, 14 Mar 2006 05:20:44 +0000 (21:20 -0800)]
[PATCH] Fix sigaltstack corruption among cloned threads
This patch fixes alternate signal stack corruption among cloned threads
with CLONE_SIGHAND (and CLONE_VM) for linux-2.6.16-rc6.
The value of alternate signal stack is currently inherited after a call of
clone(... CLONE_SIGHAND | CLONE_VM). But if sigaltstack is set by a
parent thread, and then if multiple cloned child threads (+ parent threads)
call signal handler at the same time, some threads may be conflicted -
because they share to use the same alternative signal stack region.
Finally they get sigsegv. It's an undesirable race condition. Note that
child threads created from NPTL pthread_create() also hit this conflict
when the parent thread uses sigaltstack, without my patch.
To fix this problem, this patch clears the child threads' sigaltstack
information like exec(). This behavior follows the SUSv3 specification.
In SUSv3, pthread_create() says "The alternate stack shall not be inherited
(when new threads are initialized)". It means that sigaltstack should be
cleared when sigaltstack memory space is shared by cloned threads with
CLONE_SIGHAND.
Note that I chose "if (clone_flags & CLONE_SIGHAND)" line because:
- If clone_flags line is not existed, fork() does not inherit sigaltstack.
- CLONE_VM is another choice, but vfork() does not inherit sigaltstack.
- CLONE_SIGHAND implies CLONE_VM, and it looks suitable.
- CLONE_THREAD is another candidate, and includes CLONE_SIGHAND + CLONE_VM,
but this flag has a bit different semantics.
I decided to use CLONE_SIGHAND.
[ Changed to test for CLONE_VM && !CLONE_VFORK after discussion --Linus ]
[PATCH] macintosh: correct AC Power info in /proc/pmu/info
Report AC Power present in /proc/pmu/info if there is no battery.
Signed-off-by: Olaf Hering <olh@suse.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>, Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Brownell [Tue, 14 Mar 2006 05:20:40 +0000 (21:20 -0800)]
[PATCH] mtd_dataflash, fix block vs page erase
Fix a bug in the block-erase optimization for Dataflash; it was using block
erase even for smaller segments that need page erase.
That wouldn't matter for JFFS2, which never erases less than one block
(sometimes several blocks), but for other callers it might.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Acked-by: David Woodhouse <dwmw2@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>