Al Viro [Fri, 21 Oct 2005 07:20:48 +0000 (03:20 -0400)]
[PATCH] gfp_t: fs/*
- ->releasepage() annotated (s/int/gfp_t), instances updated
- missing gfp_t in fs/* added
- fixed misannotation from the original sweep caught by bitwise checks:
XFS used __nocast both for gfp_t and for flags used by XFS allocator.
The latter left with unsigned int __nocast; we might want to add a
different type for those but for now let's leave them alone. That,
BTW, is a case when __nocast use had been actively confusing - it had
been used in the same code for two different and similar types, with
no way to catch misuses. Switch of gfp_t to bitwise had caught that
immediately...
One tricky bit is left alone to be dealt with later - mapping->flags is
a mix of gfp_t and error indications. Left alone for now.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Al Viro [Fri, 21 Oct 2005 06:55:38 +0000 (02:55 -0400)]
[PATCH] gfp_t: infrastructure
Beginning of gfp_t annotations:
- -Wbitwise added to CHECKFLAGS
- old __bitwise renamed to __bitwise__
- __bitwise defined to either __bitwise__ or nothing, depending on
__CHECK_ENDIAN__ being defined
- gfp_t switched from __nocast to __bitwise__
- force cast to gfp_t added to __GFP_... constants
- new helper - gfp_zone(); extracts zone bits out of gfp_t value and casts
the result to int
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jens Axboe [Fri, 28 Oct 2005 06:30:39 +0000 (08:30 +0200)]
[BLOCK] elevator switch fixes/cleanup
- 100msec sleep is a little excessive, lots of requests can complete
in that timeframe. Use 10msec instead.
- Rename QUEUE_FLAG_BYPASS to QUEUE_FLAG_ELVSWITCH to indicate what
is going on.
Tejun Heo [Fri, 28 Oct 2005 06:29:39 +0000 (08:29 +0200)]
[BLOCK] Reimplement elevator switch
This patch reimplements elevator switch. This patch assumes generic
dispatch queue patchset is applied.
* Each request is tagged with REQ_ELVPRIV flag if it has its elevator
private data set.
* Requests which doesn't have REQ_ELVPRIV flag set never enter
iosched. They are always directly back inserted to dispatch queue.
Of course, elevator_put_req_fn is called only for requests which
have its REQ_ELVPRIV set.
* Request queue maintains the current number of requests which have
its elevator data set (elevator_set_req_fn called) in
q->rq->elvpriv.
* If a request queue has QUEUE_FLAG_BYPASS set, elevator private data
is not allocated for new requests.
To switch to another iosched, we set QUEUE_FLAG_BYPASS and wait until
elvpriv goes to zero; then, we attach the new iosched and clears
QUEUE_FLAG_BYPASS. New implementation is much simpler and main code
paths are less cluttered, IMHO.
Tejun Heo [Thu, 20 Oct 2005 14:46:23 +0000 (16:46 +0200)]
[PATCH] 03/05 move last_merge handlin into generic elevator code
Currently, both generic elevator code and specific ioscheds
participate in the management and usage of last_merge. This
and the following patches move last_merge handling into
generic elevator code.
Jens Axboe [Thu, 20 Oct 2005 14:42:29 +0000 (16:42 +0200)]
[PATCH] 02/05: update ioscheds to use generic dispatch queue
This patch updates all four ioscheds to use generic dispatch
queue. There's one behavior change in as-iosched.
* In as-iosched, when force dispatching
(ELEVATOR_INSERT_BACK), batch_data_dir is reset to REQ_SYNC
and changed_batch and new_batch are cleared to zero. This
prevernts AS from doing incorrect update_write_batch after
the forced dispatched requests are finished.
* In cfq-iosched, cfqd->rq_in_driver currently counts the
number of activated (removed) requests to determine
whether queue-kicking is needed and cfq_max_depth has been
reached. With generic dispatch queue, I think counting
the number of dispatched requests would be more appropriate.
* cfq_max_depth can be lowered to 1 again.
Original from Tejun Heo, modified version applied.
Tejun Heo [Thu, 20 Oct 2005 14:23:44 +0000 (16:23 +0200)]
[PATCH] 01/05 Implement generic dispatch queue
Implements generic dispatch queue which can replace all
dispatch queues implemented by each iosched. This reduces
code duplication, eases enforcing semantics over dispatch
queue, and simplifies specific ioscheds.
Tejun Heo [Thu, 20 Oct 2005 08:56:41 +0000 (10:56 +0200)]
[PATCH] fix try_module_get race in elevator_find
This patch removes try_module_get race in elevator_find.
try_module_get should always be called with the spinlock protecting
what the module init/cleanup routines register/unregister to held. In
the case of elevators, we should be holding elv_list to avoid it going
away between spin_unlock_irq and try_module_get.
Chen, Kenneth W [Thu, 13 Oct 2005 19:49:29 +0000 (21:49 +0200)]
Following the same idea, it occurs to me that we should only update
disk stat when "now" is different from disk->stamp. Otherwise, we
are again needlessly adding zero to the stats.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Jens Axboe <axboe@suse.de>
Chen, Kenneth W [Thu, 13 Oct 2005 19:48:42 +0000 (21:48 +0200)]
[patch] remove gendisk->stamp_idle field
struct gendisk has these two fields: stamp, stamp_idle. Update to
stamp_idle is always in sync with stamp and they are always the same.
Therefore, it does not add any value in having two fields tracking
same timestamp. Suggest to remove it.
Also, we should only update gendisk stats with non-zero value.
Advantage is that we don't have to needlessly calculate memory address,
and then add zero to the content.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Jens Axboe <axboe@suse.de>
Trond Myklebust [Fri, 28 Oct 2005 02:12:41 +0000 (22:12 -0400)]
NFS: Add optional post-op getattr instruction to the NFSv4 file close.
"Optional" means that the close call will not fail if the getattr
at the end of the compound fails.
If it does succeed, try to refresh inode attributes.
Chuck Lever [Tue, 25 Oct 2005 15:48:36 +0000 (11:48 -0400)]
NFS: nfs_lookup doesn't need to revalidate the parent directory's inode
nfs_lookup() used to consult a lookup cache before trying an actual wire
lookup operation. The lookup cache would be invalid, of course, if the
parent directory's mtime had changed, so nfs_lookup performed an inode
revalidation on the parent.
Since nfs_lookup() doesn't use a cache anymore, the revalidation is no
longer necessary. There are cases where it will generate a lot of
unnecessary GETATTR traffic.
See http://bugzilla.linux-nfs.org/show_bug.cgi?id=9
Test-plan:
Use lndir and "rm -rf" and watch for excess GETATTR traffic or application
level errors.
Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Fri, 28 Oct 2005 02:12:39 +0000 (22:12 -0400)]
NFS: Don't let nfs_end_data_update() clobber attribute update information
Since we almost always call nfs_end_data_update() after we called
nfs_refresh_inode(), we now end up marking the inode metadata
as needing revalidation immediately after having updated it.
This patch rearranges things so that we mark the inode as needing
revalidation _before_ we call nfs_refresh_inode() on those operations
that need it.
Herbert Xu [Thu, 27 Oct 2005 08:47:46 +0000 (18:47 +1000)]
[TCP]: Clear stale pred_flags when snd_wnd changes
This bug is responsible for causing the infamous "Treason uncloaked"
messages that's been popping up everywhere since the printk was added.
It has usually been blamed on foreign operating systems. However,
some of those reports implicate Linux as both systems are running
Linux or the TCP connection is going across the loopback interface.
In fact, there really is a bug in the Linux TCP header prediction code
that's been there since at least 2.1.8. This bug was tracked down with
help from Dale Blount.
The effect of this bug ranges from harmless "Treason uncloaked"
messages to hung/aborted TCP connections. The details of the bug
and fix is as follows.
When snd_wnd is updated, we only update pred_flags if
tcp_fast_path_check succeeds. When it fails (for example,
when our rcvbuf is used up), we will leave pred_flags with
an out-of-date snd_wnd value.
When the out-of-date pred_flags happens to match the next incoming
packet we will again hit the fast path and use the current snd_wnd
which will be wrong.
In the case of the treason messages, it just happens that the snd_wnd
cached in pred_flags is zero while tp->snd_wnd is non-zero. Therefore
when a zero-window packet comes in we incorrectly conclude that the
window is non-zero.
In fact if the peer continues to send us zero-window pure ACKs we
will continue making the same mistake. It's only when the peer
transmits a zero-window packet with data attached that we get a
chance to snap out of it. This is what triggers the treason
message at the next retransmit timeout.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Oleg Nesterov [Wed, 26 Oct 2005 16:26:53 +0000 (20:26 +0400)]
[PATCH] Fix cpu timers expiration time
There's a silly off-by-one error in the code that updates the expiration
of posix CPU timers, causing them to not be properly updated when they
hit exactly on their expiration time (which should be the normal case).
This causes them to then fire immediately again, and only _then_ get
properly updated.
Peter Wainwright [Wed, 26 Oct 2005 08:59:02 +0000 (01:59 -0700)]
[PATCH] Fix HFS+ to free up the space when a file is deleted.
fsck_hfs reveals lots of temporary files accumulating in the hidden
directory "\000\000\000HFS+ Private Data". According to the HFS+
documentation these are files which are unlinked while in use. However,
there may be a bug in the Linux hfsplus implementation which causes this to
happen even when the files are not in use. It looks like the "opencnt"
field is never initialized as (I think) it should be in hfsplus_read_inode.
This means that a file can appear to be still in use when in fact it has
been closed. This patch seems to fix it for me.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Garzik [Wed, 26 Oct 2005 08:59:01 +0000 (01:59 -0700)]
[PATCH] kill massive wireless-related log spam
Although this message is having the intended effect of causing wireless
driver maintainers to upgrade their code, I never should have merged this
patch in its present form. Leading to tons of bug reports and unhappy
users.
Some wireless apps poll for statistics regularly, which leads to a printk()
every single time they ask for stats. That's a little bit _too_ much of a
reminder that the driver is using an old API.
Change this to printing out the message once, per kernel boot.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] ppc64: Fix wrong register mapping in mpic driver
The mpic interrupt controller driver (used on G5 and early pSeries among
others) has a bug where it doesn't get the right virtual address for the
timer registers. It causes the driver to poke at the MMIO space of
whatever has been mapped just next to it (ouch !) when initializing and
causes boot failures on some IBM machines.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Magnus Damm [Wed, 26 Oct 2005 08:58:59 +0000 (01:58 -0700)]
[PATCH] NUMA: broken per cpu pageset counters
The NUMA counters in struct per_cpu_pageset (linux/mmzone.h) are never
cleared today. This works ok for CPU 0 on NUMA machines because
boot_pageset[] is already zero, but for other CPU:s this results in
uninitialized counters.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Wed, 26 Oct 2005 08:58:58 +0000 (01:58 -0700)]
[PATCH] md: make sure mdthreads will always respond to kthread_stop
There are still a couple of cases where md threads (the resync/recovery
thread) is not interruptible since the change to use kthreads. All places
there it tests "signal_pending", it should also test kthread_should_stop,
as with this patch.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ian Campbell [Wed, 26 Oct 2005 14:04:21 +0000 (15:04 +0100)]
[ARM] 3032/1: sparse: complains about generic_fls() prototype in asm-arm/bitops.h
Patch from Ian Campbell
Sparse complains about the definition of generic_fls in asm-arm/bitops.h:
CHECK /home/icampbell/devel/kernel/2.6/arch/arm/mach-pxa/viper.c
include2/asm/bitops.h:350:34: error: marked inline, but without a definition
The definition is unnecessary since linux/bitops.h defines generic_fls before including asm/bitops.h and asm/bitops.h should not be included directly. There are still some places where asm/bitops.h is directly included, but I think that code should be fixed. I was a little wary of the patch for this reason but lubbock, mainstone and assabet all build OK and so do my in house boards...
ARM is the only arch with the generic_fls prototype in this way.
Signed-off-by: Ian Campbell <icampbell@arcom.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Linus Torvalds [Wed, 26 Oct 2005 03:40:09 +0000 (20:40 -0700)]
PCI: be more verbose about resource quirks
When reserving an PCI quirk, note that in the kernel bootup messages.
Also, parse the strange PIIX4 device resources - they should get their
own PCI resource quirks, but for now just print out what it finds to
verify that the code does the right thing.
David Engel [Sat, 22 Oct 2005 03:09:16 +0000 (22:09 -0500)]
[IPV4]: Fix setting broadcast for SIOCSIFNETMASK
Fix setting of the broadcast address when the netmask is set via
SIOCSIFNETMASK in Linux 2.6. The code wanted the old value of
ifa->ifa_mask but used it after it had already been overwritten with
the new value.
Signed-off-by: David Engel <gigem@comcast.net> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Jayachandran C [Thu, 13 Oct 2005 18:43:02 +0000 (11:43 -0700)]
[IPV4]: Remove dead code from ip_output.c
skb_prev is assigned from skb, which cannot be NULL. This patch removes the
unnecessary NULL check.
Signed-off-by: Jayachandran C. <c.jayachandran at gmail.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Jayachandran C [Thu, 13 Oct 2005 18:43:05 +0000 (11:43 -0700)]
[NETLINK]: Remove dead code in af_netlink.c
Remove the variable nlk & call to nlk_sk as it does not have any side effect.
Signed-off-by: Jayachandran C. <c.jayachandran at gmail.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Herbert Xu [Fri, 14 Oct 2005 23:42:39 +0000 (09:42 +1000)]
[IPV4]: Kill redundant rcu_dereference on fa_info
This patch kills a redundant rcu_dereference on fa->fa_info in fib_trie.c.
As this dereference directly follows a list_for_each_entry_rcu line, we
have already taken a read barrier with respect to getting an entry from
the list.
This read barrier guarantees that all values read out of fa are valid.
In particular, the contents of structure pointed to by fa->fa_info is
initialised before fa->fa_info is actually set (see fn_trie_insert);
the setting of fa->fa_info itself is further separated with a write
barrier from the insertion of fa into the list.
Therefore by taking a read barrier after obtaining fa from the list
(which is given by list_for_each_entry_rcu), we can be sure that
fa->fa_info contains a valid pointer, as well as the fact that the
data pointed to by fa->fa_info is itself valid.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Harald Welte [Sun, 16 Oct 2005 12:22:59 +0000 (14:22 +0200)]
[NETFILTER] ip_conntrack: Make "hashsize" conntrack parameter writable
It's fairly simple to resize the hash table, but currently you need to
remove and reinsert the module. That's bad (we lose connection
state). Harald has even offered to write a daemon which sets this
based on load.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>