David S. Miller [Thu, 22 Jun 2006 23:18:54 +0000 (16:18 -0700)]
[SPARC64]: Convert sparc64 PCI layer to in-kernel device tree.
One thing this change pointed out was that we really should
pull the "get 'local-mac-address' property" logic into a helper
function all the network drivers can call.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 22 Jun 2006 07:01:56 +0000 (00:01 -0700)]
[SPARC64]: Fix for Niagara memory corruption.
On some sun4v systems, after netboot the ethernet controller and it's
DMA mappings can be left active. The net result is that the kernel
can end up using memory the ethernet controller will continue to DMA
into, resulting in corruption.
To deal with this, we are more careful about importing IOMMU
translations which OBP has left in the IO-TLB. If the mapping maps
into an area the firmware claimed was free and available memory for
the kernel to use, we demap instead of import that IOMMU entry.
This is going to cause the network chip to take a PCI master abort on
the next DMA it attempts, if it has been left going like this. All
tests show that this is handled properly by the PCI layer and the e1000
drivers.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 22 Jun 2006 07:00:00 +0000 (00:00 -0700)]
[SPARC64]: Minor bug fix to obp_read_memory().
If we end up zero'ing out the size of one of the entries,
pop it out of the array completely because some code that
examines these things cannot handle a zero length element
properly.
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Morton [Fri, 23 Jun 2006 10:31:19 +0000 (03:31 -0700)]
[PATCH] cpufreq build fix
drivers/cpufreq/cpufreq_ondemand.c: In function 'do_dbs_timer':
drivers/cpufreq/cpufreq_ondemand.c:374: warning: implicit declaration of function 'lock_cpu_hotplug'
drivers/cpufreq/cpufreq_ondemand.c:381: warning: implicit declaration of function 'unlock_cpu_hotplug'
drivers/cpufreq/cpufreq_conservative.c: In function 'do_dbs_timer':
drivers/cpufreq/cpufreq_conservative.c:425: warning: implicit declaration of function 'lock_cpu_hotplug'
drivers/cpufreq/cpufreq_conservative.c:432: warning: implicit declaration of function 'unlock_cpu_hotplug'
Cc: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andi Kleen [Wed, 21 Jun 2006 12:48:09 +0000 (14:48 +0200)]
[BLOCK] Fix bounce limit address check
Do a safer check for when to enable DMA. Currently we enable ISA DMA
for cases that do not need it, resulting in OOM conditions when ZONE_DMA
runs out of space.
Jens Axboe [Fri, 16 Jun 2006 09:23:00 +0000 (11:23 +0200)]
[PATCH] cfq-iosched: many performance fixes
This is a collection of patches that greatly improve CFQ performance
in some circumstances.
- Change the idling logic to only kick in after a request is done and we
are deciding what to do. Before the idling included the request service
time, so it was hard to adjust. Now it's true think/idle time.
- Take advantage of TCQ/NCQ/queueing for seeky sync workloads, but keep
it in control for sync and sequential (or close to) workloads.
- Expire queues immediately and move on to other busy queues, if we are
not going to idle after the current one finishes.
- Don't rearm idle timer if there are no busy queues. Just leave the
system idle.
Jens Axboe [Wed, 14 Jun 2006 07:10:45 +0000 (09:10 +0200)]
[PATCH] cfq-iosched: correctly set ioprio on both targets
Patch originally from Vasily Tarasov <vtaras@sw.ru>
If you set io-priority of process 1 using sys_ioprio_set system call by
another process 2 (like ionice do), then cfq_init_prio_data() function
sets priority of process 2 (current) on queue of process 1 and clears
the flag, that designates change of ioprio. So the process 1 will work
like with priority of process 2.
I propose not to call cfq_init_prio_data() on io-priority change, but
only mark queue as queue with changed prority. Every time when new
request comes cfq-scheduler checks for this flag and atomaticaly changes
priority of queue to new value.
Jens Axboe [Tue, 13 Jun 2006 06:26:10 +0000 (08:26 +0200)]
[PATCH] Kill PF_SYNCWRITE flag
A process flag to indicate whether we are doing sync io is incredibly
ugly. It also causes performance problems when one does a lot of async
io and then proceeds to sync it. Part of the io will go out as async,
and the other part as sync. This causes a disconnect between the
previously submitted io and the synced io. For io schedulers such as CFQ,
this will cause us lost merges and suboptimal behaviour in scheduling.
Remove PF_SYNCWRITE completely from the fsync/msync paths, and let
the O_DIRECT path just directly indicate that the writes are sync
by using WRITE_SYNC instead.
Jens Axboe [Tue, 13 Jun 2006 06:08:38 +0000 (08:08 +0200)]
[PATCH] cfq-iosched: Don't set the queue batching limits
We cannot update them if the user changes nr_requests, so don't
set it in the first place. The gains are pretty questionable as
well. The batching loss has been shown to decrease throughput.
Dave Jones [Mon, 12 Jun 2006 12:20:58 +0000 (14:20 +0200)]
[PATCH] remove dead code from elevator switching
We already drop the refcount in elevator_exit(), and as
we're setting 'e' to NULL, we'll never take that branch anyway.
Finally, as 'e' is a local var that isn't referenced afterwards,
setting it to NULL is pointless.
Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Jens Axboe <axboe@suse.de>
[PATCH] blk_start_queue() must be called with irq disabled - add warning
The queue lock can be taken from interrupts so it must always be taken with
irq disabling primitives. Some primitives already verify this.
blk_start_queue() is called under this lock, so interrupts must be
disabled.
Also document this requirement clearly in blk_init_queue(), where the queue
spinlock is set.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Jens Axboe <axboe@suse.de>
Linus Torvalds [Fri, 23 Jun 2006 14:52:36 +0000 (07:52 -0700)]
Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (65 commits)
ACPI: suppress power button event on S3 resume
ACPI: resolve merge conflict between sem2mutex and processor_perflib.c
ACPI: use for_each_possible_cpu() instead of for_each_cpu()
ACPI: delete newly added debugging macros in processor_perflib.c
ACPI: UP build fix for bugzilla-5737
Enable P-state software coordination via _PDC
P-state software coordination for speedstep-centrino
P-state software coordination for acpi-cpufreq
P-state software coordination for ACPI core
ACPI: create acpi_thermal_resume()
ACPI: create acpi_fan_suspend()/acpi_fan_resume()
ACPI: pass pm_message_t from acpi_device_suspend() to root_suspend()
ACPI: create acpi_device_suspend()/acpi_device_resume()
ACPI: replace spin_lock_irq with mutex for ec poll mode
ACPI: Allow a WAN module enable/disable on a Thinkpad X60.
sem2mutex: acpi, acpi_link_lock
ACPI: delete unused acpi_bus_drivers_lock
sem2mutex: drivers/acpi/processor_perflib.c
ACPI add ia64 exports to build acpi_memhotplug as a module
ACPI: asus_acpi_init(): propagate correct return value
...
Bob Copeland [Fri, 23 Jun 2006 09:06:09 +0000 (02:06 -0700)]
[PATCH] docs: update sparse.txt with CHECK_ENDIAN
Update the sparse documentation to omit the -Wbitwise flag example (as it
is now passed by default), and document the kernel defines to enable
endianness checking.
Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] drivers/block/loop.c: don't return garbage if LOOP_SET_STATUS not called
While writing a version of losetup, I ran into the problem that the loop
device was returning total garbage.
It turns out the problem was that this losetup was only issuing the
LOOP_SET_FD ioctl and not issuing a subsequent LOOP_SET_STATUS ioctl. This
losetup didn't have any special status to set, so it left out the call.
The deeper cause is that loop_set_fd sets the transfer function to NULL,
which causes no transfer to happen lo_do_transfer.
This patch fixes the problem by setting transfer to transfer_none in
loop_set_fd.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mike Miller [Fri, 23 Jun 2006 09:06:07 +0000 (02:06 -0700)]
[PATCH] make kernel warn about incorrectly sized partitions
Sometimes partitions claim to be larger than the reported capacity of a
disk device. This patch makes the kernel warn about those partitions.
We still permit these patitions to be used. Quoting Andries Brouwer
<Andries.Brouwer@cwi.nl>:
Case 1: The kernel is mistaken about the size of the disk. (There are
commands to clip a disk to a certain capacity, there are jumpers to tell a
disk that it should report a certain capacity etc. Usually this is because
of BIOS bugs. In bad cases the machine will crash in the BIOS and hence fail
to boot if the disk reports full capacity.) In such cases actually accessing
the blocks of the partition may work fine, or may work fine after running an
unclip utility. I wrote "setmax" some years ago precisely for this reason.
Case 2: There was a messy partition table (maybe just a rounding error) but
the actual filesystem on the partition is contained in the physical disk.
Now using the filesystem goes without problem.
Case 3: Both partition and filesystem extend beyond the end of the disk. In
forensic or debugging situations one often uses a copy of the start of a
disk. Now access beyond the end gives an expected I/O error.
Signed-off-by: Mike Miller <mike.miller@hp.com> Signed-off-by: Stephen Cameron <steve.cameron@hp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jan Kara [Fri, 23 Jun 2006 09:06:05 +0000 (02:06 -0700)]
[PATCH] JBD: split checkpoint lists
Split the checkpoint list of the transaction into two lists. In the first
list we keep the buffers that need to be submitted for IO. In the second
list are kept buffers that were already submitted and we just have to wait
for the IO to complete. This should simplify a handling of checkpoint
lists a bit and can eventually be also a performance gain.
Signed-off-by: Jan Kara <jack@suse.cz> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: "Stephen C. Tweedie" <sct@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jan Beulich [Fri, 23 Jun 2006 09:06:00 +0000 (02:06 -0700)]
[PATCH] adjust handle_IRR_event() return type
Correct the return type of handle_IRQ_event() (inconsistency noticed during
Xen development), and remove redundant declarations. The return type
adjustment required breaking out the definition of irqreturn_t into a
separate header, in order to satisfy current include order dependencies.
Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Hirokazu Takata <takata.hirokazu@renesas.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adrian Bunk [Fri, 23 Jun 2006 09:05:59 +0000 (02:05 -0700)]
[PATCH] drivers/md/raid6algos.c: fix a NULL dereference
This patch fixes a NULL dereference spotted by the Coverity checker.
Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Porpoise [Fri, 23 Jun 2006 09:05:56 +0000 (02:05 -0700)]
[PATCH] When CONFIG_BASE_SMALL=1, cascade() may enter an infinite loop
When CONFIG_BASE_SAMLL=1, cascade() in may enter the infinite loop.
Because of CONFIG_BASE_SMALL=1(TVR_BITS=6 and TVN_BITS=4), the list
base->tv5 may cascade into base->tv5. So, the kernel enters the infinite
loop in the function cascade().
I created a test module to verify this bug, and a patch to fix it.
Oleg Nesterov [Fri, 23 Jun 2006 09:05:55 +0000 (02:05 -0700)]
[PATCH] list: use list_replace_init() instead of list_splice_init()
list_splice_init(list, head) does unneeded job if it is known that
list_empty(head) == 1. We can use list_replace_init() instead.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Brent Casavant [Fri, 23 Jun 2006 09:05:52 +0000 (02:05 -0700)]
[PATCH] SGI IOC4: Detect IO card variant
There are three different IO cards which an SGI IOC4 controller may find
itself on. One of these variants does not bring out the IDE and serial
signals, so we need to disable attaching the corresponding IOC4 subdrivers
to such cards.
Cleans up message clutter emitted during device probing.
Paul E. McKenney [Fri, 23 Jun 2006 09:05:51 +0000 (02:05 -0700)]
[PATCH] Make RCU API inaccessible to non-GPL Linux kernel modules
Remove synchronize_kernel() (deprecated 2-APR-2005 in
http://lkml.org/lkml/2005/4/3/11) and makes the RCU API inaccessible to
non-GPL Linux kernel modules (as was announced more than one year ago in
http://lkml.org/lkml/2005/4/3/8). Tested on x86 and ppc64.
Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Fri, 23 Jun 2006 09:05:48 +0000 (02:05 -0700)]
[PATCH] Remove semi-softlockup from invalidate_mapping_pages
If invalidate_mapping_pages is called to invalidate a very large mapping
(e.g. a very large block device) and if the only active page in that
device is near the end (or at least, at a very large index), such as, say,
the superblock of an md array, and if that page happens to be locked when
invalidate_mapping_pages is called, then
pagevec_lookup will return this page and
as it is locked, 'next' will be incremented and pagevec_lookup
will be called again. and again. and again.
while we count from 0 upto a very large number.
We should really always set 'next' to 'page->index+1' before going around
the loop again, not just if the page isn't locked.
Cc: "Steinar H. Gunderson" <sgunderson@bigfoot.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mingming Cao [Fri, 23 Jun 2006 09:05:41 +0000 (02:05 -0700)]
[PATCH] percpu counter data type changes to suppport more than 2**31 ext3 free blocks counter
The percpu counter data type are changed in this set of patches to support
more users like ext3 who need more than 32 bit to store the free blocks
total in the filesystem.
- Generic perpcu counters data type changes. The size of the global counter
and local counter were explictly specified using s64 and s32. The global
counter is changed from long to s64, while the local counter is changed from
long to s32, so we could avoid doing 64 bit update in most cases.
- Users of the percpu counters are updated to make use of the new
percpu_counter_init() routine now taking an additional parameter to allow
users to pass the initial value of the global counter.
Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Laurent MEYER [Fri, 23 Jun 2006 09:05:36 +0000 (02:05 -0700)]
[PATCH] fix incorrect SA_ONSTACK behaviour for 64-bit processes
- When setting a sighandler using sigaction() call, if the flag
SA_ONSTACK is set and no alternate stack is provided via sigaltstack(),
the kernel still try to install the alternate stack. This behavior is
the opposite of the one which is documented in Single Unix Specifications
V3.
- Also when setting an alternate stack using sigaltstack() with the flag
SS_DISABLE, the kernel try to install the alternate stack on signal
delivery.
These two use cases makes the process crash at signal delivery.
Signed-off-by: Laurent Meyer <meyerlau@fr.ibm.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andrew Morton [Fri, 23 Jun 2006 09:05:31 +0000 (02:05 -0700)]
[PATCH] jbd: avoid kfree(NULL)
There are a couple of places where JBD has to check to see whether an unneeded
memory allocation was performed. Usually it _was_ needed, so we end up
calling kfree(NULL). We can micro-optimise that by checking the pointer
before calling kfree().
Thanks to Steven Rostedt <rostedt@goodmis.org> for identifying this.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andreas Mohr [Fri, 23 Jun 2006 09:05:30 +0000 (02:05 -0700)]
[PATCH] x86/powerpc make hardirq_ctx and softirq_ctx __read_mostly
The hardirq_ctx and softirq_ctx variables are written to on init only,
Signed-off-by: Andreas Mohr <andi@lisas.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>