Remove "noreplacement" kernel command line option.
It is no longer valid to not replace instructions, since we depend on
different behaviour depending on CPU capabilities.
If you need to limit the capabilities of the replacements (because the
boot CPU has features that non-boot CPU's do not have, for example), you
need to explicitly disable those capabilities that are not shared across
all CPU's.
For example, if your boot CPU has FXSR, but other CPU's in your system
do not, you need to use the "nofxsr" kernel command line, not disable
instruction replacement per se.
x86: use alternative instructions for fnsave/fxsave too
This one ends up using an inline asm format that claims to read memory
and then clobber it (rather than just write it directly), which made it
easier to use the existing "alternative_input()" infrastructure support.
x86: make restore_fpu() use alternative assembler instructions
It's really just a single instruction, conditional on whether the CPU
supports FXSR or not, so implement it as such instead of making it a
function that queries FXSR dynamically.
This means that the instruction just gets automatically rewritten to the
correct one at boot-time.
The portptr pointing to the port in the conntrack tuple is declared static,
which could result in memory corruption when two packets of the same
protocol are NATed at the same time and one conntrack goes away.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Fix up incorrect "unlikely()" on %gs reload in x86 __switch_to
These days %gs is normally the TLS segment, so it's no longer zero. As
a result, we shouldn't just assume that %fs/%gs tend to be zero
together, but test them independently instead.
Also, fix setting of debug registers to use the "next" pointer instead
of "current". It so happens that the scheduler will have set the new
current pointer before calling __switch_to(), but that's just an
implementation detail.
Rusty Russell [Thu, 21 Jul 2005 20:14:46 +0000 (13:14 -0700)]
[NETFILTER]: ip_conntrack_expect_related must not free expectation
If a connection tracking helper tells us to expect a connection, and
we're already expecting that connection, we simply free the one they
gave us and return success.
The problem is that NAT helpers (eg. FTP) have to allocate the
expectation first (to see what port is available) then rewrite the
packet. If that rewrite fails, they try to remove the expectation,
but it was freed in ip_conntrack_expect_related.
This is one example of a larger problem: having registered the
expectation, the pointer is no longer ours to use. Reference counting
is needed for ctnetlink anyway, so introduce it now.
To have a single "put" path, we need to grab the reference to the
connection on creation, rather than open-coding it in the caller.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Adrian Bunk [Tue, 19 Jul 2005 21:00:53 +0000 (14:00 -0700)]
[NET]: NETCONSOLE must depend on INET
NETCONSOLE=y and INET=n results in the following compile error:
net/built-in.o: In function `netpoll_parse_options':
: undefined reference to `in_aton'
net/built-in.o: In function `netpoll_parse_options':
: undefined reference to `in_aton'
Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Duncan Sands <baldrick@free.fr> Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
Victor Fusco [Tue, 19 Jul 2005 20:56:29 +0000 (13:56 -0700)]
[ATM]: [ambassador] Fix the sparse warning "implicit cast to nocast type"
Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
Victor Fusco [Tue, 19 Jul 2005 20:56:01 +0000 (13:56 -0700)]
[ATM]: [firestream] fix the sparse warning "implicit cast to nocast type"
Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Marcelo Feitoza Parisi <marcelo@feitoza.com.br> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap [Mon, 18 Jul 2005 20:45:12 +0000 (13:45 -0700)]
[NET]: Kconfig: NETCONSOLE and NETPOLL together
Put NETCONSOLE and NETPOLL options together since they are related.
This cuts down on the hassle of flipping back and forth between
the Networking menu and the Network drivers menu to change their
config settings.
Tested with menuconfig, gconfig, and xconfig.
gconfig has a small problem with this. I think that it's
a bug in gconfig and I will take it up with Romain Lievin.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Christophe Lucas [Mon, 18 Jul 2005 20:38:07 +0000 (13:38 -0700)]
[SCTP]: Audit return code of create_proc_*
From: Christophe Lucas <clucas@rotomalug.org>
Audit return of create_proc_* functions.
Signed-off-by: Christophe Lucas <clucas@rotomalug.org> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Victor Fusco [Mon, 18 Jul 2005 20:36:38 +0000 (13:36 -0700)]
[NET]: Fix "nocast type" warnings in skbuff.h
From: Victor Fusco <victor@cetuc.puc-rio.br>
Fix the sparse warning "implicit cast to nocast type"
Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Victor Fusco [Mon, 18 Jul 2005 20:35:43 +0000 (13:35 -0700)]
[NETLINK]: Fix "nocast type" warnings
From: Victor Fusco <victor@cetuc.puc-rio.br>
Fix the sparse warning "implicit cast to nocast type"
Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Mon, 18 Jul 2005 20:34:35 +0000 (13:34 -0700)]
[PKT_SCHED]: Kill TCF_META_ID_TCCLASSID.
Thomas Graf states:
> I used to mark such ids as obsolete in the header but since
> skb is on diet anyway and there has been no official
> iproute2 release with the ematch bits included it might be
> a better idea to remove the ids from the header completely.
> Those that have picked up my patch on netdev shouldn't care
> about a ABI breakage, actually I doubt that someone is using
> it already.
So here's the patch to remove it.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Mon, 18 Jul 2005 20:30:53 +0000 (13:30 -0700)]
[PKT_SCHED]: Reduce branch mispredictions in pfifo_fast_dequeue
The current call to __qdisc_dequeue_head leads to a branch
misprediction for every loop iteration, the fact that the
most common priority is 2 makes this even worse. This issue
has been brought up by Eric Dumazet <dada1@cosmosbay.com>
but unlike his solution which was to manually unroll the loop,
this approach preserves the possibility to increase the number
of bands at compile time.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Schulz [Sat, 16 Jul 2005 16:17:18 +0000 (17:17 +0100)]
[PATCH] ARM: 2815/1: Shark: new defconfig, fixes with __io and serial ports
Patch from Alexander Schulz
This patch brings a new default config file for the shark and
fixes a compilation issue with io addressing and a runtime
problem with the serial ports, where I corrected a wrong
regshift value.
These are all shark specific files so I hope it is ok to
put them in one patch.
Signed-off-by: Alexander Schulz <alex@shark-linux.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Sat, 16 Jul 2005 08:30:53 +0000 (09:30 +0100)]
[PATCH] Serial: Move deprecation of register_serial forward to September
I think it's about time to make the build a little more vocal about the
expiry of these functions. Due to recent discussions with problems in
the console initialisation vs power manglement, I'd like to move the
date forward to September.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Thomas Gleixner [Fri, 15 Jul 2005 13:53:51 +0000 (14:53 +0100)]
[MTD] NAND: Fix broken bad block scan for 16 bit devices
The previous change to read a single byte from oob breaks the
bad block scan on 16 bit devices, when the byte is on an odd
address. Read the complete oob for now.
Remove the unused arguments from check_short_pattern()
Move the wait for ready function so it is only executed when
consecutive reads happen.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[PATCH] x86_64: TASK_SIZE fixes for compatibility mode processes
A malicious 32bit app can have an elf section at 0xffffe000. During
exec of this app, we will have a memory leak as insert_vm_struct() is
not checking for return value in syscall32_setup_pages() and thus not
freeing the vma allocated for the vsyscall page.
Check the return value and free the vma incase of failure.
Use of the time_after() macro, defined at linux/jiffies.h, which deal
with wrapping correctly and are nicer to read.
Signed-off-by: Marcelo Feitoza Parisi <marcelo@feitoza.com.br> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Use of the time_after() macro, defined at linux/jiffies.h, which deal
with wrapping correctly and are nicer to read.
Signed-off-by: Marcelo Feitoza Parisi <marcelo@feitoza.com.br> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[PATCH] md/raid1: clear bitmap when fullsync completes
We need to be careful differentiating between a resync of a complete array,
in which we can clear the bitmap, and a resync of a degraded array, in
which we cannot.
This patch cleans all that up.
Cc: Paul Clements <paul.clements@steeleye.com> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Michal Ostrowski [Fri, 15 Jul 2005 10:56:33 +0000 (03:56 -0700)]
[PATCH] rocket.c: Fix ldisc ref count handling
If bailing out because there is nothing to receive in rp_do_receive(),
tty_ldisc_deref is not called. Failure to do so increases the ref count
and causes release_dev() to hang since it can't get the ref count to 0.
Signed-off-by: Michal Ostrowski <mostrows@watson.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch includes feedback from Andrew and Christoph. Thanks for
taking time to review.
Use of empty_zero_page was eliminated to fix compilation for architectures
that don't have it.
This patch removes setting pages up-to-date in ext2_get_xip_page and all
bug checks to verify that the page is indeed up to date. Setting the page
state on mapping to userland is bogus. None of the code patchs involved
with these pages in mm cares about the page state.
still on my ToDo list: identify a place outside second extended where
__inode_direct_access should reside
[PATCH] v4l: bug fixes for tuner, cx88 and tea5767
- In CX88 code, some cards needs to have audio reprogramed after changing
video channel;
- Tuner autodetection code seems not to work on some cards. Now,
no_autodetect insmod option allows disabling autodetection code;
- Minor fixes in tea5767 to reduce integer trunc;
- There are some new Pixelview Ultra Pro cards that doesn't use TEA5767
for radio. As autodetection is capable of checking for tea, radio tuners
and addresses removed.
This patch updates the Option Card driver:
- remove a deadlock
- add sponsor notice
- add new card
- renamed the device to what's usually printed on it
- removed some dead code
- clean up a bunch of irregular whitespace (end-of-line, tabs)
Also add a MAINTAINERS entry for the Option Card driver.
Remove ROOT_DEV after unexporting it in the previous patch, as requested time
ago by Christoph Hellwig.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Minimal patch removing uses of ROOT_DEV; next patch unexports it. I've
opposed this, but I've planned to reintroduce the functionality without using
ROOT_DEV.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] uml: allow building as 32-bit binary on 64bit host
This patch makes the command:
make ARCH=um SUBARCH=i386
work on x86_64 hosts (with support for building 32-bit binaries). This is
especially needed since 64-bit UMLs don't support 32-bit emulation for guest
binaries, currently. This has been tested in all possible cases and works.
Only exception is that I've built but not tested a 64-bit binary, because I
hadn't a 64-bit filesystem available.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The pcap support was not working because of some linking problems (expressing
the construct in Kbuild was a bit difficult) and because there was no user
request. Now that this has come back, here's the support.
This has been tested and works on both 32 and 64-bit hosts, even when
"cross-"building 32-bit binaries.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix the error message to refer to the error code, i.e. err, not count, plus
add some cosmetical fixes.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
1) Cleanup an ugly hyper-nested code in Makefile (now only the arith.
expression is passed through the host bash).
2) Fix a problem with GCC 2.95: according to a report from Raphael Bossek,
.remap_data : { arch/um/sys-SUBARCH/unmap_fin.o (.data .bss) } is expanded
into: .remap_data : { arch/um/sys-i386 /unmap_fin.o (.data .bss) }
(because I didn't use ## to join the two tokens), thus stopping linking. Pass
the whole path from the Makefile as a simple and nice fix.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Raphael Bossek <raphael.bossek@gmx.de> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*) Reorganize the two cases of sys_modify_ldt to share all the reasonably
common code.
*) Avoid memory allocation when unneeded (i.e. when we are writing and the
passed buffer size is known), thus not returning ENOMEM (which isn't
allowed for this syscall, even if there is no strict "specification").
*) Add copy_{from,to}_user to modify_ldt for TT mode.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] uml: workaround host bug in "TT mode vs. NPTL link fix"
A big bug has been diagnosed on hosts running the SKAS patch and built with
CONFIG_REGPARM, due to some missing prevent_tail_call().
On these hosts, this workaround is needed to avoid triggering that bug,
because "to" is kept by GCC only in EBX, which is corrupted at the return of
mmap2().
Since to trigger this bug int 0x80 must be used when doing the call, it rarely
manifests itself, so I'd prefer to get this merged to workaround that host
bug, since it should cause no functional change. Still, you might prefer to
drop it, I'll leave this to you.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This construct is refused by GCC 4, so here's the (corrected) fix. Thanks to
Russell for noticing a stupid mistake I did when first sending this.
As he noted, the code is largely suboptimal however it currently works, and
will be fixed shortly. Just read the access_ok check on fp which is NULL, or
the pointer arithmetic below which should be done with a cast to void*:
The code shows clearly that has been taken from
arch/x86_64/kernel/signal.c:setup_rt_frame(), maybe in a bit of a hurry.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mark's patch added "attribute((packed))" for pcdp_uart, without
accounting for the fact that the structure definition _relied_ on
implicit padding by 6 bytes. Fix is to make the padding explicit.
Signed-off-by: David Mosberger-Tang <David.Mosberger@acm.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
Dean Nelson [Wed, 13 Jul 2005 12:45:00 +0000 (05:45 -0700)]
[IA64] fix call of smp_processor_id() by XPC while
XPC calls smp_processor_id() twice from xpc_setup_infrastructure() with
preemption enabled, which gets flagged if 'DEBUG_PREEMPT=y'. This patch
replaces the two calls to smp_processor_id() by a single call to
raw_smp_processor_id() since any CPU within the partition will do.
Signed-off-by: Dean Nelson <dcn@sgi.com> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
Olof Johansson [Wed, 13 Jul 2005 08:11:44 +0000 (01:11 -0700)]
[PATCH] ppc64: add 970MP PVR
Add PVR value and tests for 970MP. Also switch to a simpler (but slightly
longer) check at init time for simplicity.
Signed-off-by: Olof Johansson <olof@austin.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Olaf Hering [Wed, 13 Jul 2005 08:11:41 +0000 (01:11 -0700)]
[PATCH] ppc32: make -j12 all fails in uImage target
make -j zImage may call if_changed twice at the same time, the result is a
corrupted vmlinux.gz
Write to a temporary file for the time being until someone with make skills
fix the serialization properly.
Signed-off-by: Olaf Hering <olh@suse.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Sam Ravnborg <sam@ravnborg.org> Acked-by: Tom Rini <trini@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Steve Dickson [Wed, 13 Jul 2005 08:10:47 +0000 (01:10 -0700)]
[PATCH] NFS: procfs/sysctl interfaces for lockd do not work on x86_64
Allow the setting of NLM timeouts and grace periods through the proc and
sysclt interfaces on x86_64 architectures
Signed-off-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add special case for the POSIX_FADV_DONTNEED and POSIX_FADV_NOREUSE hint
values for s390-64. The user space values in the s390-64 glibc headers for
these two defines have always been 6 and 7 instead of 4 and 5. All 64 bit
applications therefore use the "wrong" values. To get these applications
working without recompiling the kernel needs to accept the "wrong" values.
Since the values for s390-31 are 4 and 5 the compat wrapper for fadvise64
and fadvise64_64 need to rewrite the values for 31 bit system calls.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix for a race condition when a task gets preempted by another task while
executing the destroy_context(...) in a FEW_CONTEXTS environment.
mm->context == NO_CONTEXT but the context_map may indicate all contexts are
in use.
The solution to this problem is to disable kernel preemption while
destroying a MMU context.
Signed-off-by: Guillaume Autran <gautran@mrv.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] Fix soft lockup due to NTFS: VFS part and explanation
Something has changed in the core kernel such that we now get concurrent
inode write outs, one e.g via pdflush and one via sys_sync or whatever.
This causes a nasty deadlock in ntfs. The only clean solution
unfortunately requires a minor vfs api extension.
First the deadlock analysis:
Prerequisive knowledge: NTFS has a file $MFT (inode 0) loaded at mount
time. The NTFS driver uses the page cache for storing the file contents as
usual. More interestingly this file contains the table of on-disk inodes
as a sequence of MFT_RECORDs. Thus NTFS driver accesses the on-disk inodes
by accessing the MFT_RECORDs in the page cache pages of the loaded inode
$MFT.
The situation: VFS inode X on a mounted ntfs volume is dirty. For same
inode X, the ntfs_inode is dirty and thus corresponding on-disk inode,
which is as explained above in a dirty PAGE_CACHE_PAGE belonging to the
table of inodes ($MFT, inode 0).
What happens:
Process 1: sys_sync()/umount()/whatever... calls __sync_single_inode() for
$MFT -> do_writepages() -> write_page for the dirty page containing the
on-disk inode X, the page is now locked -> ntfs_write_mst_block() which
clears PageUptodate() on the page to prevent anyone else getting hold of it
whilst it does the write out (this is necessary as the on-disk inode needs
"fixups" applied before the write to disk which are removed again after the
write and PageUptodate is then set again). It then analyses the page
looking for dirty on-disk inodes and when it finds one it calls
ntfs_may_write_mft_record() to see if it is safe to write this on-disk
inode. This then calls ilookup5() to check if the corresponding VFS inode
is in icache(). This in turn calls ifind() which waits on the inode lock
via wait_on_inode whilst holding the global inode_lock.
Process 2: pdflush results in a call to __sync_single_inode for the same
VFS inode X on the ntfs volume. This locks the inode (I_LOCK) then calls
write-inode -> ntfs_write_inode -> map_mft_record() -> read_cache_page() of
the page (in page cache of table of inodes $MFT, inode 0) containing the
on-disk inode. This page has PageUptodate() clear because of Process 1
(see above) so read_cache_page() blocks when tries to take the page lock
for the page so it can call ntfs_read_page().
Thus Process 1 is holding the page lock on the page containing the on-disk
inode X and it is waiting on the inode X to be unlocked in ifind() so it
can write the page out and then unlock the page.
And Process 2 is holding the inode lock on inode X and is waiting for the
page to be unlocked so it can call ntfs_readpage() or discover that
Process 1 set PageUptodate() again and use the page.
Thus we have a deadlock due to ifind() waiting on the inode lock.
The only sensible solution: NTFS does not care whether the VFS inode is
locked or not when it calls ilookup5() (it doesn't use the VFS inode at
all, it just uses it to find the corresponding ntfs_inode which is of
course attached to the VFS inode (both are one single struct); and it uses
the ntfs_inode which is subject to its own locking so I_LOCK is irrelevant)
hence we want a modified ilookup5_nowait() which is the same as ilookup5()
but it does not wait on the inode lock.
Without such functionality I would have to keep my own ntfs_inode cache in
the NTFS driver just so I can find ntfs_inodes independent of their VFS
inodes which would be slow, memory and cpu cycle wasting, and incredibly
stupid given the icache already exists in the VFS.
Below is a patch that does the ilookup5_nowait() implementation in
fs/inode.c and exports it.
ilookup5_nowait.diff:
Introduce ilookup5_nowait() which is basically the same as ilookup5() but
it does not wait on the inode's lock (i.e. it omits the wait_on_inode()
done in ifind()).
This is needed to avoid a nasty deadlock in NTFS.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
James Bottomley [Wed, 13 Jul 2005 13:38:05 +0000 (09:38 -0400)]
[PATCH] fix voyager subarchitecture EXPORT_SYMBOL breakage caused by i386_ksym reduction
This patch:
[PATCH] Remove i386_ksyms.c, almost
made files like smp.c do their own EXPORT_SYMBOLS. This means that all
subarchitectures that override these symbols now have to do the exports
themselves. This patch adds the exports for voyager (which is the most
affected since it has a separate smp harness). However, someone should
audit all the other subarchitectures to see if any others got broken.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>