NeilBrown [Fri, 26 Jan 2007 08:57:14 +0000 (00:57 -0800)]
[PATCH] md: remove unnecessary printk when raid5 gets an unaligned read.
raid5_mergeable_bvec tries to ensure that raid5 never sees a read request
that does not fit within just one chunk. However as we must always accept
a single-page read, that is not always possible.
So when "in_chunk_boundary" fails, it might be unusual, but it is not a
problem and printing a message every time is a bad idea.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Fri, 26 Jan 2007 08:57:12 +0000 (00:57 -0800)]
[PATCH] Fix UML on non-standard VM split hosts
This fixes UML on hosts with non-standard VM splits. We had changed the
config variable that controls UML behavior on such hosts, but not
propogated the change everywhere. In particular, the values of STUB_CODE
and STUB_DATA relied on the old variable.
I also reformatted the HOST_VMSPLIT_3G help to make it more standard.
Spotted by uml@flonatel.org.
Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: Blaisorblade <blaisorblade@yahoo.it> Cc: Pravin <shindepravin@gmail.com> Cc: <uml@flonatel.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Fri, 26 Jan 2007 08:57:10 +0000 (00:57 -0800)]
[PATCH] knfsd: Fix type mismatch with filldir_t used by nfsd
nfsd defines a type 'encode_dent_fn' which is much like 'filldir_t' except
that the first pointer is 'struct readdir_cd *' rather than 'void *'. It
then casts encode_dent_fn points to 'filldir_t' as needed. This hides any
other type mismatches between the two such as the fact that the 'ino' arg
recently changed from ino_t to u64.
So: get rid of 'encode_dent_fn', get rid of the cast of the function type,
change the first arg of various functions from 'struct readdir_cd *' to
'void *', and live with the fact that we have a little less type checking
on the calling of these functions now. Less internal (to nfsd) checking
offset by more external checking, which is more important.
Thanks to Gabriel Paubert <paubert@iram.es> for discovering this and
providing an initial patch.
Signed-off-by: Gabriel Paubert <paubert@iram.es> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robert P. J. Day [Fri, 26 Jan 2007 08:57:09 +0000 (00:57 -0800)]
[PATCH] fix various kernel-doc in header files
Fix a number of kernel-doc entries for header files in include/linux by
making sure they begin with the appropriate '/**' notation and use @var
notation.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jun'ichi Nomura [Fri, 26 Jan 2007 08:57:07 +0000 (00:57 -0800)]
[PATCH] dm-multipath: fix stall on noflush suspend/resume
Allow noflush suspend/resume of device-mapper device only for the case
where the device size is unchanged.
Otherwise, dm-multipath devices can stall when resumed if noflush was used
when suspending them, all paths have failed and queue_if_no_path is set.
Explanation:
1. Something is doing fsync() on the block dev,
holding inode->i_sem
2. The fsync write is blocked by all-paths-down and queue_if_no_path
3. Someone requests to suspend the dm device with noflush.
Pending writes are left in queue.
4. In the middle of dm_resume(), __bind() tries to get
inode->i_sem to do __set_size() and waits forever.
'noflush suspend' is a new device-mapper feature introduced in
early 2.6.20. So I hope the fix being included before 2.6.20 is
released.
Example of reproducer:
1. Create a multipath device by dmsetup
2. Fail all paths during mkfs
3. Do dmsetup suspend --noflush and load new map with healthy paths
4. Do dmsetup resume
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[PATCH] 9p: null terminate error strings for debug print
We weren't properly NULL terminating protocol error strings for our debug
printk resulting in garbage being included in the output when debug was
enabled.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[PATCH] 9p: fix segfault caused by race condition in meta-data operations
Running dbench multithreaded exposed a race condition where fid structures
were removed while in use. This patch adds semaphores to meta-data operations
to protect the fid structure. Some cleanup of error-case handling in the
inode operations is also included.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[PATCH] 9p: update documentation regarding server applications
Update the documentation to cover using Inferno as a server for 9p and to
include information about spfs (a stable single-threaded stand-alone 9p
server).
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9p doesn't handle renames between directories -- however, we were returning
EPERM instead of EXDEV when we detected this case.
Signed-off-by: Eric Van Hensbergren <ericvh@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[PATCH] 9p: fix bogus return code checks during initialization
There is a simple logic error in init_v9fs - the return code checks are
reversed. This patch fixes the return code and adds some messages to prevent
module initialization from failing silently.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Fri, 26 Jan 2007 08:57:03 +0000 (00:57 -0800)]
[PATCH] md: avoid reading past the end of a bitmap file
In most cases we check the size of the bitmap file before reading data from
it. However when reading the superblock, we always read the first PAGE_SIZE
bytes, which might not always be appropriate. So limit that read to the size
of the file if appropriate.
Also, we get the count of available bytes wrong in one place, so that too can
read past the end of the file.
Cc: "yang yin" <yinyang801120@gmail.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Fri, 26 Jan 2007 08:57:02 +0000 (00:57 -0800)]
[PATCH] md: make sure the events count in an md array never returns to zero
Now that we sometimes step the array events count backwards (when
transitioning dirty->clean where nothing else interesting has happened - so
that we don't need to write to spares all the time), it is possible for the
event count to return to zero, which is potentially confusing and triggers and
MD_BUG.
We could possibly remove the MD_BUG, but is just as easy, and probably safer,
to make sure we never return to zero.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Fri, 26 Jan 2007 08:57:01 +0000 (00:57 -0800)]
[PATCH] md: make 'repair' actually work for raid1
When 'repair' finds a block that is different one the various parts of the
mirror. it is meant to write a chosen good version to the others. However it
currently writes out the original data to each. The memcpy to make all the
data the same is missing.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Staubach [Fri, 26 Jan 2007 08:57:00 +0000 (00:57 -0800)]
[PATCH] knfsd: Don't mess with the 'mode' when storing a exclusive-create cookie
NFS V3 (and V4) support exclusive create by passing a 'cookie' which can get
stored with the file. If the file exists but has exactly the right cookie
stored, then we assume this is a retransmit and the exclusive create was
successful.
The cookie is 64bits and is traditionally stored in the mtime and atime
fields. This causes a problem with Solaris7 as negative mtime or atime
confuse it. So we moved two bits into the mode word instead.
But inherited ACLs sometimes overwrite the mode word on create, so this is a
problem.
So we give up and just store 62 of the 64 bits and assume that is close
enough.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Fri, 26 Jan 2007 08:56:59 +0000 (00:56 -0800)]
[PATCH] knfsd: replace some warning ins nfsfh.h with BUG_ON or WARN_ON
A couple of the warnings will be followed by an Oops if they ever fire, so may
as well be BUG_ON. Another isn't obviously fatal but has never been known to
fire, so make it a WARN_ON.
Cc: Adrian Bunk <bunk@stusta.de> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Fri, 26 Jan 2007 08:56:59 +0000 (00:56 -0800)]
[PATCH] knfsd: fix an NFSD bug with full sized, non-page-aligned reads
NFSd assumes that largest number of pages that will be needed for a
request+response is 2+N where N pages is the size of the largest permitted
read/write request. The '2' are 1 for the non-data part of the request, and 1
for the non-data part of the reply.
However, when a read request is not page-aligned, and we choose to use
->sendfile to send it directly from the page cache, we may need N+1 pages to
hold the whole reply. This can overflow and array and cause an Oops.
This patch increases size of the array for holding pages by one and makes sure
that entry is NULL when it is not in use.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tilman Schmidt [Fri, 26 Jan 2007 08:56:56 +0000 (00:56 -0800)]
[PATCH] Gigaset ISDN driver error handling fixes
Fix several flaws in the error handling of the Siemens Gigaset ISDN driver,
including one that would cause an Oops when connecting more than one device
of the same type.
Ingo Molnar [Fri, 26 Jan 2007 08:56:55 +0000 (00:56 -0800)]
[PATCH] ACPI: fix cpufreq regression
Recently cpufreq support on my laptop (Lenovo T60) broke completely: when
it's plugged into AC it would never go higher than 1 GHz - neither 1.3 GHz
nor 1.83 GHz is possible - no matter which governor (userspace, speed or
ondemand) is used.
After some cpufreq debugging i tracked the regression back to the following
(totally correct) bug-fix commit:
[PATCH] Correct bound checking from the value returned from _PPC method.
This bugfix, which makes other laptops work, made a previously hidden
(BIOS) bug visible on my laptop.
The bug is the following: if the _PPC (Performance Present Capabilities)
optional ACPI object is queried /after/ bootup then the BIOS reports an
incorrect value of '2'.
My laptop (Lenovo T60) has the following performance states supported:
Per ACPI specification, a _PPC value of '0' means that all 3 performance
states are usable. A _PPC value of '1' means states 1 .. 2 are usable, a
value of '2' means only state '2' (slowest) is usable.
now, the _PPC object is optional, and it also comes with notification.
Furthermore, when a CPU object is initialized, the _PPC object is
initialized as well. So the following evaluation of the _PPC object is
superfluous:
And this is the point where my laptop's BIOS returns the incorrect value of
'2'. Note that it has not sent any notification event, so the value is
probably not really intentional (possibly spurious), and Windows likely
doesnt query it after bootup either. Maybe the value is kept at '2'
normally, and is only set to the real value when a true asynchronous event
(such as AC plug event, battery switch, etc.) occurs.
So i /think/ this is a grey area of the ACPI spec: per the letter of the
spec the _PPC value only changes when notified, so there's no reason to
query it after the system has booted up. So in my opinion the best (and
most compatible) strategy would be to do the change below, and to not
evaluate the _PPC object in the acpi_processor_get_performance_info() call,
but only evaluate it if _PPC is present during CPU object init, or if it's
notified during an asynchronous event. This change is more permissive than
the previous logic, so it definitely shouldnt break any existing system.
This also happens to fix my laptop, which is merrily chugging along at
1.83 GHz now. Yay!
Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Dave Jones <davej@redhat.com> Acked-by: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Atsushi Nemoto [Fri, 26 Jan 2007 08:56:54 +0000 (00:56 -0800)]
[PATCH] SPI: alternative fix for spi_busnum_to_master
If a SPI master device exists, udev (udevtrigger) causes kernel crash, due
to wrong kobj pointer in kobject_uevent_env(). This problem was not in
2.6.19.
The backtrace (on MIPS) was:
[<8024db6c>] kobject_uevent_env+0x54c/0x5e8
[<802a8264>] store_uevent+0x1c/0x3c (in drivers/class.c)
[<801cb14c>] subsys_attr_store+0x2c/0x50
[<801cb80c>] flush_write_buffer+0x38/0x5c
[<801cb900>] sysfs_write_file+0xd0/0x190
[<80181444>] vfs_write+0xc4/0x1a0
[<80181cdc>] sys_write+0x54/0xa0
[<8010dae4>] stack_done+0x20/0x3c
flush_write_buffer() passes kobject of spi_master_class.subsys to
subsys_addr_store(), then subsys_addr_store() passes a pointer to a struct
subsystem to store_uevent() which expects a pointer to a struct
class_device. The problem seems subsys_attr_store() called instead of
class_device_attr_store().
This mismatch was caused by commit 3bd0f6943520e459659d10f3282285e43d3990f1, which overrides kset of master
class. This made spi_master_class.subsys.kset.ktype NULL so
subsys_sysfs_ops is used instead of class_dev_sysfs_ops.
The commit was to fix spi_busnum_to_master(). Here is a patch fixes
this function in other way, just searching children list of
class_device.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roland McGrath [Fri, 26 Jan 2007 08:56:52 +0000 (00:56 -0800)]
[PATCH] x86_64 ia32 vDSO: define arch_vma_name
This patch makes x86_64 define arch_vma_name for CONFIG_IA32_EMULATION. This
makes the ia32 vDSO mapping appear in /proc/PID/maps with "[vdso]" for ia32
processes, as it does on native i386.
Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roland McGrath [Fri, 26 Jan 2007 08:56:50 +0000 (00:56 -0800)]
[PATCH] x86_64 ia32 vDSO: use VM_ALWAYSDUMP
This patch fixes ia32 core dumps on x86_64 to include just one phdr for the
vDSO vma. Currently it writes a confused format with two phdrs for the
address, one without contents and one with. This patch removes the
special-case core writing macros for the ia32 vDSO. Instead, it uses
VM_ALWAYSDUMP in the vma. This changes core dumps so they no longer include
the non-PT_LOAD phdrs from the vDSO, consistent with fixed native i386 core
dumps.
Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roland McGrath [Fri, 26 Jan 2007 08:56:49 +0000 (00:56 -0800)]
[PATCH] i386 vDSO: use VM_ALWAYSDUMP
This patch fixes core dumps to include the vDSO vma, which is left out now.
It removes the special-case core writing macros, which were not doing the
right thing for the vDSO vma anyway. Instead, it uses VM_ALWAYSDUMP in the
vma; there is no need for the fixmap page to be installed. It handles the
CONFIG_COMPAT_VDSO case by making elf_core_dump use the fake vma from
get_gate_vma after real vmas in the same way the /proc/PID/maps code does.
This changes core dumps so they no longer include the non-PT_LOAD phdrs from
the vDSO. I made the change to add them in the first place, but in turned out
that nothing ever wanted them there since the advent of NT_AUXV. It's cleaner
to leave them out, and just let the phdrs inside the vDSO image speak for
themselves.
Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roland McGrath [Fri, 26 Jan 2007 08:56:48 +0000 (00:56 -0800)]
[PATCH] Add VM_ALWAYSDUMP
This patch adds the VM_ALWAYSDUMP flag for vm_flags in vm_area_struct. This
provides a clean explicit way to have a vma always included in core dumps, as
is needed for vDSO's.
Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roland McGrath [Fri, 26 Jan 2007 08:56:47 +0000 (00:56 -0800)]
[PATCH] Fix gate_vma.vm_flags
This patch fixes the initialization of gate_vma.vm_flags and
gate_vma.vm_page_prot to reflect reality. This makes the "[vdso]" line in
/proc/PID/maps correctly show r-xp instead of ---p, when gate_vma is used
(CONFIG_COMPAT_VDSO on i386).
Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roland McGrath [Fri, 26 Jan 2007 08:56:46 +0000 (00:56 -0800)]
[PATCH] Fix CONFIG_COMPAT_VDSO
I wouldn't mind if CONFIG_COMPAT_VDSO went away entirely. But if it's there,
it should work properly. Currently it's quite haphazard: both real vma and
fixmap are mapped, both are put in the two different AT_* slots, sysenter
returns to the vma address rather than the fixmap address, and core dumps yet
are another story.
This patch makes CONFIG_COMPAT_VDSO disable the real vma and use the fixmap
area consistently. This makes it actually compatible with what the old vdso
implementation did.
Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Justin Clacherty [Fri, 26 Jan 2007 08:56:44 +0000 (00:56 -0800)]
[PATCH] spi: fix error setting the spi mode in pxa2xx_spi.c
Currently the spi mode can be set to the wrong mode if you are switching
from any mode other than mode 0. This is because the mode is set using a
bitwise or on uncleared bits. The following patch clears the mode bits
before setting the new mode. I've also modified it to use the appropriate
defines from pxa-regs.h for readability.
Signed-off-by: Justin Clacherty <justin@redfish-group.com> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joerg Roedel [Fri, 26 Jan 2007 08:56:42 +0000 (00:56 -0800)]
[PATCH] KVM: SVM: Propagate cpu shutdown events to userspace
This patch implements forwarding of SHUTDOWN intercepts from the guest on to
userspace on AMD SVM. A SHUTDOWN event occurs when the guest produces a
triple fault (e.g. on reboot). This also fixes the bug that a guest reboot
actually causes a host reboot under some circumstances.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Avi Kivity [Fri, 26 Jan 2007 08:56:41 +0000 (00:56 -0800)]
[PATCH] KVM: MMU: Report nx faults to the guest
With the recent guest page fault change, we perform access checks on our
own instead of relying on the cpu. This means we have to perform the nx
checks as well.
Software like the google toolbar on windows appears to rely on this
somehow.
Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Avi Kivity [Fri, 26 Jan 2007 08:56:41 +0000 (00:56 -0800)]
[PATCH] KVM: MMU: Perform access checks in walk_addr()
Check pte permission bits in walk_addr(), instead of scattering the checks all
over the code. This has the following benefits:
1. We no longer set the accessed bit for accessed which fail permission checks.
2. Setting the accessed bit is simplified.
3. Under some circumstances, we used to pretend a page fault was fixed when
it would actually fail the access checks. This caused an unnecessary
vmexit.
4. The error code for guest page faults is now correct.
The fix helps netbsd further along booting, and allows kvm to pass the new mmu
testsuite.
Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Leonard Norrgard [Fri, 26 Jan 2007 08:56:38 +0000 (00:56 -0800)]
[PATCH] KVM: SVM: Fix SVM idt confusion
There's an obvious typo in svm_{get,set}_idt, causing it to access the ldt
instead.
Because these functions are only called for save/load on AMD, the bug does not
impact normal operation. With the fix, save/load works as expected on AMD
hosts.
Signed-off-by: Uri Lublin <uril@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 26 Jan 2007 20:53:20 +0000 (12:53 -0800)]
Write back inode data pages even when the inode itself is locked
In __writeback_single_inode(), when we find a locked inode and we're not
doing a data-integrity sync, we used to just skip writing entirely,
since we didn't want to wait for the inode to unlock.
However, there's really no reason to skip writing the data pages, which
are likely to be the the bulk of the dirty state anyway (and the main
reason why writeback was started for the non-data-integrity case, of
course!)
Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Andrew Morton <akpm@osdl.org>, Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hugh@veritas.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 26 Jan 2007 20:47:06 +0000 (12:47 -0800)]
Resurrect 'try_to_free_buffers()' VM hackery
It's not pretty, but it appears that ext3 with data=journal will clean
pages without ever actually telling the VM that they are clean. This,
in turn, will result in the VM (and balance_dirty_pages() in particular)
to never realize that the pages got cleaned, and wait forever for an
event that already happened.
Technically, this seems to be a problem with ext3 itself, but it used to
be hidden by 'try_to_free_buffers()' noticing this situation on its own,
and just working around the filesystem problem.
This commit re-instates that hack, in order to avoid a regression for
the 2.6.20 release. This fixes bugzilla 7844:
http://bugzilla.kernel.org/show_bug.cgi?id=7844
Peter Zijlstra points out that we should probably retain the debugging
code that this removes from cancel_dirty_page(), and I agree, but for
the imminent release we might as well just silence the warning too
(since it's not a new bug: anything that triggers that warning has been
around forever).
Acked-by: Randy Dunlap <rdunlap@xenotime.net> Acked-by: Jens Axboe <jens.axboe@oracle.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roland McGrath [Fri, 26 Jan 2007 01:19:51 +0000 (17:19 -0800)]
[PATCH] x86_64: fix put_user for 64-bit constant
On x86-64, a put_user call using a 64-bit pointer and a constant value that
is > 0xffffffff will produce code that doesn't assemble. This patch fixes
the asm construct to use the Z constraint for 32-bit constants.
Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 24 Jan 2007 20:31:28 +0000 (12:31 -0800)]
Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
[MIPS] Fix wrong checksum calculation on 64-bit MIPS
[MIPS] VPE loader: Initialize lists before they're actually being used ...
[MIPS] Fix reported amount of freed memory - it's in kB not bytes
[MIPS] vr41xx: need one more nop with mtc0_tlbw_hazard()
[MIPS] SMTC: Fix module build by exporting symbol
[MIPS] SMTC: Fix TLB sizing bug for TLB of 64 >= entries
[MIPS] Fix APM build
[MIPS] There is no __GNUC_MAJOR__
The problem is the commit replaces some unsigned long with __be32. On
64bit MIPS, a __be32 (i.e. unsigned int) value is represented as a
sign-extented 32-bit value in a 64-bit argument register. So the
address 192.168.0.1 (0xc0a80001) is passed as 0xffffffffc0a80001 to
csum_tcpudp_nofold() but the asm code in the function expects
0x00000000c0a80001, therefore it returns a wrong checksum. Explicit
cast to unsigned long is needed to drop high 32bit.
Ralf Baechle [Wed, 24 Jan 2007 19:13:08 +0000 (19:13 +0000)]
[MIPS] VPE loader: Initialize lists before they're actually being used ...
kspd which due to makefile order happens to be initialized before the
vpe loader causes references to vpecontrol lists before they're actually
been initialized.
Alexey Dobriyan [Tue, 23 Jan 2007 18:30:14 +0000 (21:30 +0300)]
[MIPS] There is no __GNUC_MAJOR__
Gcc major version number is in __GNUC__. As side effect fix checking
with sparse if sparse was built with gcc 4.1 and mips cross-compiler
is 3.4.
Sparse will inherit version 4.1, __GNUC__ won't be filtered from
"-dM -E -xc" output, sparse will pick only new major, effectively becoming
gcc version 3.1 which is unsupported.
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[CIFS] Fix oops when Windows server sent bad domain name null terminator
[CIFS] cifs sprintf fix
[CIFS] Remove 2 unneeded kzalloc casts
[CIFS] Update CIFS version number
Brian King [Wed, 17 Jan 2007 18:32:12 +0000 (12:32 -0600)]
libata: Fixup n_elem initialization
Fixup the inialization of qc->n_elem. It currently gets
initialized to 1 for commands that do not transfer any data.
Fix this by initializing n_elem to 0 and only setting to 1
in ata_scsi_qc_new when there is data to transfer. This fixes
some problems seen with SATA devices attached to ipr adapters.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Sat, 20 Jan 2007 17:10:11 +0000 (02:10 +0900)]
ahci: don't enter slumber on power down
Some ATA/ATAPI devices act weirdly after the link is put into slumber
mode. Some hang completely requiring physical power removal while
others fail to wake up till the link is hardreset a couple of times.
The addition of slumber on power down was never driven by real need.
It just followed what ahci spec said literally. The spec itself seems
faulty in that it doesn't consider devices (not controllers) which
don't support link powersaving mode.
Theory never matches reality when it comes to dark allys of cheap
ATA/ATAPI world. It's just unrealistic to expect vendors to test
rarely used link powersaving feature rigorously. This patch makes
ahci more friendly to the coldness of reality.
This shouldn't have any negative effect - when suspend operation
succeeds, we power off the whole machine; otherwise, we wake up
everything. I can't see any reason to be so elaborate with powering
down the link in the first place.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Robert Hancock [Wed, 24 Jan 2007 02:09:02 +0000 (20:09 -0600)]
sata_nv: don't rely on NV_INT_DEV indication with ADMA
Several people reported issues with certain drive commands timing out on
sata_nv controllers running in ADMA mode. The commands in question were
non-DMA-mapped commands, usually FLUSH CACHE or FLUSH CACHE EXT.
From experimentation it appears that the NV_INT_DEV indication isn't
always set when a legitimate command completion interrupt is received on
a legacy-mode command, at least not on these controllers in ADMA mode.
When a command is pending on the port, force the flag on always in the
irq_stat value before calling nv_host_intr so that the drive busy state
is always checked by ata_host_intr.
This also fixes some questionable code in nv_host_intr which called
ata_check_status when a command was pending and ata_host_intr returned
"unhandled". If the device interrupted at just the wrong time this could
cause interrupts to be lost.
Signed-off-by: Robert Hancock <hancockr@shaw.ca> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Tue, 23 Jan 2007 06:13:39 +0000 (15:13 +0900)]
ahci: make ULi M5288 ignore interface fatal error bit
As with JMicron controllers, ULi M5288 sets interface fatal error bit
on device error including ATAPI CC. This makes libata hardreset the
port on ATAPI CC thus making it impossible to use. Ignore interface
fatal error bit on ULi M5288. This fixes bugzilla bug #7837.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
[IP] TUNNEL: Fix to be built with user application.
include/linux/if_tunnel.h is broken for user application
because it was changed to use __be32 which is required
to include linux/types.h in advance but didn't.
(This issue is found when building MIPL2 daemon. We are not sure this
is the last header to be fixed about __be32.)
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: TAKAMIYA Noriaki <takamiya@po.ntts.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
Jarek Poplawski [Wed, 24 Jan 2007 06:07:12 +0000 (22:07 -0800)]
[TCP]: rare bad TCP checksum with 2.6.19
The patch "Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE"
changed to unconditional copying of ip_summed field from collapsed
skb. This patch reverts this change.
The majority of substantial work including heavy testing
and diagnosing by: Michael Tokarev <mjt@tls.msk.ru>
Possible reasons pointed by: Herbert Xu and Patrick McHardy.
Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 24 Jan 2007 06:00:13 +0000 (22:00 -0800)]
[NETFILTER]: Fix iptables ABI breakage on (at least) CRIS
With the introduction of x_tables we accidentally broke compatibility
by defining IPT_TABLE_MAXNAMELEN to XT_FUNCTION_MAXNAMELEN instead of
XT_TABLE_MAXNAMELEN, which is two bytes larger.
On most architectures it doesn't really matter since we don't have
any tables with names that long in the kernel and the structure
layout didn't change because of alignment requirements of following
members. On CRIS however (and other architectures that don't align
data) this changed the structure layout and thus broke compatibility
with old iptables binaries.
Changing it back will break compatibility with binaries compiled
against recent kernels again, but since the breakage has only been
there for three releases this seems like the better choice.
Spotted by Jonas Berlin <xkr47@outerspace.dyndns.org>.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
I encountered a kernel panic with my test program, which is a very
simple IPv6 client-server program.
The server side sets IPV6_RECVPKTINFO on a listening socket, and the
client side just sends a message to the server. Then the kernel panic
occurs on the server. (If you need the test program, please let me
know. I can provide it.)
This problem happens because a skb is forcibly freed in
tcp_rcv_state_process().
When a socket in listening state(TCP_LISTEN) receives a syn packet,
then tcp_v6_conn_request() will be called from
tcp_rcv_state_process(). If the tcp_v6_conn_request() successfully
returns, the skb would be discarded by __kfree_skb().
However, in case of a listening socket which was already set
IPV6_RECVPKTINFO, an address of the skb will be stored in
treq->pktopts and a ref count of the skb will be incremented in
tcp_v6_conn_request(). But, even if the skb is still in use, the skb
will be freed. Then someone still using the freed skb will cause the
kernel panic.
I suggest to use kfree_skb() instead of __kfree_skb().
Signed-off-by: Masayuki Nakagawa <nakagawa.msy@ncos.nec.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Wed, 17 Jan 2007 00:52:02 +0000 (16:52 -0800)]
[IPSEC]: Policy list disorder
The recent hashing introduced an off-by-one bug in policy list insertion.
Instead of adding after the last entry with a lesser or equal priority,
we're adding after the successor of that entry.
This patch fixes this and also adds a warning if we detect a duplicate
entry in the policy list. This should never happen due to this if clause.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Since we stop using dev_alloc_skb on the IrDA TX frame, we constantly run
into the case of the skb headroom being 0, and thus we call skb_cow for
every IrDA TX frame.
This patch uses a local buffer and memcpy the skb to it, saving us a
kmalloc for each of those IrDA TX frames.
Signed-off-by: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Tue, 16 Jan 2007 03:20:21 +0000 (19:20 -0800)]
[SCTP]: Fix SACK sequence during shutdown
Currently, when association enters SHUTDOWN state,the
implementation will SACK any DATA first and then transmit
the SHUTDOWN chunk. This is against the order required by
2960bis spec. SHUTDOWN must always be first, followed by
SACK. This change forces this order and also enables bundling.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Consider the chunk as Out-of-the-Blue if we don't have
an endpoint. Otherwise discard it as before.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Tue, 16 Jan 2007 03:15:45 +0000 (19:15 -0800)]
[SCTP]: Verify some mandatory parameters.
Verify init_tag and a_rwnd mandatory parameters in INIT and
INIT-ACK chunks.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Tue, 16 Jan 2007 03:12:31 +0000 (19:12 -0800)]
[SCTP]: Set correct error cause value for missing parameters
sctp_process_missing_param() needs to use the SCTP_ERROR_MISS_PARAM
error cause value.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
In file included from net/netfilter/xt_state.c:13:
include/net/netfilter/nf_conntrack_compat.h: In function 'nf_ct_l3proto_try_module_get':
include/net/netfilter/nf_conntrack_compat.h:70: error: 'PF_INET' undeclared (first use in this function)
include/net/netfilter/nf_conntrack_compat.h:70: error: (Each undeclared identifier is reported only once
include/net/netfilter/nf_conntrack_compat.h:70: error: for each function it appears in.)
include/net/netfilter/nf_conntrack_compat.h:71: warning: control reaches end of non-void function
make[2]: *** [net/netfilter/xt_state.o] Error 1
make[1]: *** [net/netfilter] Error 2
make: *** [net] Error 2
A simple fix is to have nf_conntrack_compat.h #include <linux/socket.h>.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Venkat Yekkirala [Tue, 16 Jan 2007 00:38:45 +0000 (16:38 -0800)]
[SELINUX]: increment flow cache genid
Currently, old flow cache entries remain valid even after
a reload of SELinux policy.
This patch increments the flow cache generation id
on policy (re)loads so that flow cache entries are
revalidated as needed.
Thanks to Herbet Xu for pointing this out. See:
http://marc.theaimsgroup.com/?l=linux-netdev&m=116841378704536&w=2
There's also a general issue as well as a solution proposed
by David Miller for when flow_cache_genid wraps. I might be
submitting a separate patch for that later.
I request that this be applied to 2.6.20 since it's
a security relevant fix.
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[IPV6] MCAST: Fix joining all-node multicast group on device initialization.
Join all-node multicast group after assignment of dev->ip6_ptr
because it must be assigned when ipv6_dev_mc_inc() is called.
This fixes Bug#7817, reported by <gernoth@informatik.uni-erlangen.de>.
Closes: 7817 Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 11 Jan 2007 06:06:32 +0000 (22:06 -0800)]
[IPSEC] flow: Fix potential memory leak
When old flow cache entries that are not at the head of their chain
trigger a transient security error they get unlinked along with all
the entries preceding them in the chain. The preceding entries are
not freed correctly.
This patch fixes this by simply leaving the entry around. It's based
on a suggestion by Venkat Yekkirala.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
[ The irony. Somebody still has his sign-off message hardcoded
in a script or his brainstem ;^] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 23 Jan 2007 22:16:31 +0000 (14:16 -0800)]
Clear spurious irq stat information when adding irq handler
Any newly added irq handler may obviously make any old spurious irq
status invalid, since the new handler may well be the thing that is
supposed to handle any interrupts that came in.
So just clear the statistics when adding handlers.
Pointed-out-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dale Farnsworth [Tue, 23 Jan 2007 16:52:25 +0000 (09:52 -0700)]
mv643xx_eth: Fix race condition in mv643xx_eth_free_tx_descs
mv643xx_eth: Fix race condition in mv643xx_eth_free_tx_descs
This bug was found and isolated by Thibaut VARENE <T-Bone@parisc-linux.org>
and Jarek Poplawski <jarkao2@o2.pl>. This patch is a modification of their
fixes. We acquire and release the lock for each descriptor that is freed
to minimize the time the lock is held.
Linus Torvalds [Tue, 23 Jan 2007 19:13:06 +0000 (11:13 -0800)]
Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
IB/ehca: Fix mismatched spin_unlock in irq handler
IB/ehca: Fix improper use of yield() with spinlock held
IB/srp: Check match_strdup() return
while lock-profiling the -rt kernel i noticed weird contention during
mmap-intense workloads, and the tracer showed the following gem, in one
of our MM hotpaths:
ouch! a global rw-semaphore taken in one of the most performance-
sensitive codepaths of the kernel. And i dont even have oprofile
enabled! All distro kernels have CONFIG_PROFILING enabled, so this
scalability problem affects the majority of Linux users.
The fix is to enhance blocking_notifier_call_chain() to only take the
lock if there appears to be work on the call-chain.
With this patch applied i get nicely saturated system, and much higher
munmap performance, on SMP systems.
And as a bonus this also fixes a similar scalability bottleneck in the
thread-exit codepath: profile_task_exit() ...
Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>