Thomas Graf [Sat, 5 Nov 2005 20:14:28 +0000 (21:14 +0100)]
[PKT_SCHED]: (G)RED: Introduce hard dropping
Introduces a new flag TC_RED_HARDDROP which specifies that if ECN
marking is enabled packets should still be dropped once the
average queue length exceeds the maximum threshold.
This _may_ help to avoid global synchronisation during small
bursts of peers advertising but not caring about ECN. Use this
option very carefully, it does more harm than good if
(qth_max - qth_min) does not cover at least two average burst
cycles.
The difference to the current behaviour, in which we'd run into
the hard queue limit, is that due to the low pass filter of RED
short bursts are less likely to cause a global synchronisation.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Thomas Graf [Sat, 5 Nov 2005 20:14:24 +0000 (21:14 +0100)]
[PKT_SCHED]: GRED: Remove auto-creation of default VQ
Since we are no longer depending on the default VQ to be always
allocated we can leave it up to the user to actually create it.
This gives the user the ability to leave it out on purpose and
enqueue packets directly to the device without applying the RED
algorithm.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Thomas Graf [Sat, 5 Nov 2005 20:14:23 +0000 (21:14 +0100)]
[PKT_SCHED]: GRED: Dont abuse default VQ for equalizing
Introduces a new red parameter set for use in equalize mode,
although only the qavg variable and the idle period marker are
being used for now this makes it possible to allow a separate
parameter set to be used for equalize later on.
The use of this separate parameter set fixes a bogus start of
an idle period in gred_drop() which did start an idle period
on the default VQ even if equalize mode was disabled.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Thomas Graf [Sat, 5 Nov 2005 20:14:17 +0000 (21:14 +0100)]
[PKT_SCHED]: GRED: Do not reset statistics in gred_reset/gred_change
Qdiscs are not supposed to reset statistics in reset() and while
changing parameters. My argumentation is that if the user wants
the counters to be reset he can simply remove and readd the
qdiscs, that's what most users do anyway.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Thomas Graf [Sat, 5 Nov 2005 20:14:16 +0000 (21:14 +0100)]
[PKT_SCHED]: GRED: Use new generic red interface
Simplifies code a lot by separating the red algorithm and the
queueing logic. We now differentiate between probability marks
and forced marks but sum them together again to not break
backwards compatibility.
This brings GRED back to the level of RED and improves the
accuracy of the averge queue length calculations when stab
suggests a zero shift.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Thomas Graf [Sat, 5 Nov 2005 20:14:15 +0000 (21:14 +0100)]
[PKT_SCHED]: GRED: Use central VQ change procedure
Introduces a function gred_change_vq() acting as a central point
to change VQ parameters. Fixes priority inheritance in rio mode
when the default DP equals 0. Adds proper locking during changes.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Thomas Graf [Sat, 5 Nov 2005 20:14:13 +0000 (21:14 +0100)]
[PKT_SCHED]: GRED: Use a central table definition change procedure
Introduces a function gred_change_table_def() acting as a central
point to change the table definition.
Adds missing validations for table definition: MAX_DPs > DPs > 0
and def_DP < DPs thus fixing possible invalid memory reference
oopses. Only root could do it but having a typo crashing the
machine is a bit hard.
Adds missing locking while changing the table definition, the
operation of changing the number of DPs and removing shadowed VQs
may not be interrupted by a dequeue.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Thomas Graf [Sat, 5 Nov 2005 20:14:11 +0000 (21:14 +0100)]
[PKT_SCHED]: GRED: Cleanup dumping
Avoids the allocation of a buffer by appending the VQs directly
to the skb and simplifies the code by using the appropriate
message construction macros.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Thomas Graf [Sat, 5 Nov 2005 20:14:09 +0000 (21:14 +0100)]
[PKT_SCHED]: GRED: Cleanup equalize flag and add new WRED mode detection
Introduces a flags variable using bitops and transforms eqp to use
it. Converts the conditions of the form (wred && rio) to (wred)
since wred can only be enabled in rio mode anyway.
The patch also improves WRED mode detection. The current behaviour
does not allow WRED mode to be turned off again without removing
the whole qdisc first. The new algorithm checks each VQ against
each other looking for equal priorities every time a VQ is changed
or added. The performance is poor, O(n**2), but it's used only
during administrative tasks and the number of VQs is strictly
limited.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Thomas Graf [Sat, 5 Nov 2005 20:14:08 +0000 (21:14 +0100)]
[PKT_SCHED]: RED: Cleanup and remove unnecessary code
Removes the skb trimming code which is not needed since we never
touch the skb upon failure. Removes unnecessary includes,
initializers, and simplifies the code a bit. Removes Jamal's
obsolete email addresses upon his own request.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Thomas Graf [Sat, 5 Nov 2005 20:14:05 +0000 (21:14 +0100)]
[PKT_SCHED]: RED: Use new generic red interface
Simplifies code a lot by separating the red algorithm and the
queueing logic. We now differentiate between probability marks
and forced marks but sum them together again to not break
backwards compatibility.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Thomas Graf [Sat, 5 Nov 2005 20:14:04 +0000 (21:14 +0100)]
[NET]: Introduce INET_ECN_set_ce() function
Changes IP_ECN_set_ce() and IP6_ECN_set_ce() to return 0 if the CE
bits could not bet set because none of the ECT bits are set or 1
if the CE bits are already set or have been successfully set.
Introduces INET_ECN_set_ce(skb) to enable CE bits for all supported
protocols.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Thomas Graf [Sat, 5 Nov 2005 20:14:03 +0000 (21:14 +0100)]
[PKT_SCHED]: Generic RED layer
Extracts the RED algorithm from sch_red.c and puts it into include/net/red.h
for use by other RED based modules. The statistics are extended to be more
fine grained in order to differ between probability/forced marks/drops.
We now reset the average queue length when setting new parameters, leaving
it might result in an unreasonable qavg for a while depending on the value of W.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Harald Welte [Thu, 3 Nov 2005 19:03:24 +0000 (20:03 +0100)]
[NETFILTER] nf_queue: Fix Ooops when no queue handler registered
With the new nf_queue generalization in 2.6.14, we've introduced a bug
that causes an oops as soon as a packet is queued but no queue handler
registered. This patch fixes it.
Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Harald Welte [Thu, 3 Nov 2005 18:25:56 +0000 (19:25 +0100)]
[NETFILTER]: CONNMARK target needs ip_conntrack
There's a missing dependency from the CONNMARK target to ip_conntrack.
Signed-off-by: Pablo Neira Ayuso <pablo@eurodev.net> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Harald Welte [Thu, 3 Nov 2005 19:17:51 +0000 (20:17 +0100)]
[NETFILTER] NAT: Fix module refcount dropping too far
The unknown protocol is used as a fallback when a protocol isn't known.
Hence we cannot handle it failing, so don't set ".me". It's OK, since we
only grab a reference from within the same module (iptable_nat.ko), so we
never take the module refcount from 0 to 1.
Also, remove the "protocol is NULL" test: it's never NULL.
Signed-off-by: Rusty Rusty <rusty@rustcorp.com.au> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Trond Myklebust [Fri, 4 Nov 2005 20:38:11 +0000 (15:38 -0500)]
NFSv4: Recover locks too when returning a delegation
Delegations allow us to cache posix and BSD locks, however when the
delegation is recalled, we need to "flush the cache" and send
the cached LOCK requests to the server.
Trond Myklebust [Fri, 4 Nov 2005 20:35:02 +0000 (15:35 -0500)]
NFSv4: Return any delegations before sillyrenaming the file
I missed this one... Any form of rename will result in a delegation
recall, so it is more efficient to return the one we hold before
trying the rename.
Trond Myklebust [Fri, 4 Nov 2005 20:33:38 +0000 (15:33 -0500)]
NFSv4: Fix problem with OPEN_DOWNGRADE
RFC 3530 states that for OPEN_DOWNGRADE "The share_access and share_deny
bits specified must be exactly equal to the union of the share_access and
share_deny bits specified for some subset of the OPENs in effect for
current openowner on the current file.
Setattr is currently violating the NFSv4 rules for OPEN_DOWNGRADE in that
it may cause a downgrade from OPEN4_SHARE_ACCESS_BOTH to
OPEN4_SHARE_ACCESS_WRITE despite the fact that there exists no open file
with O_WRONLY access mode.
Fix the problem by replacing nfs4_find_state() with a modified version of
nfs_find_open_context().
Nicolas Pitre [Fri, 4 Nov 2005 17:17:30 +0000 (17:17 +0000)]
[ARM] 3097/1: change library link ordering
Patch from Nicolas Pitre
We have an optimized sha1 routine (arch/arm/lib/sha1.S) meant to
override the generic one in lib/sha1.c.
Unfortunately lib/lib.a is listed _before_ arch/arm/lib/lib.a in the
link argument list and therefore the architecture specific lib functions
are not picked up before the generic versions.
This patch is a quick fix to change that ordering for ARM. Here's what
the kbuild maintainer had to say about it (was also CC'd on lkml):
On Wed, 2 Nov 2005, Sam Ravnborg wrote:
> This looks like an obvious way to achive correct ordering.
> We could change it so arch defines always took precedence but
> the above is so simple that it is not worth the effort.
Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Add platform devices for flash to Lubbock and Mainstone board files.
Once in place, the two existing mtd map drivers for the boards will be
converted to use a single pxa2xx map driver in the linux-mtd tree.
Take 4: flash_platform_data .map_name vs. .name cleaned up, resync with
merged irda patch context.
Signed-off-by: Todd Poynor <tpoynor@mvista.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Dave Jiang [Fri, 4 Nov 2005 17:15:44 +0000 (17:15 +0000)]
[ARM] 3086/1: ixp2xxx error irq handling
Patch from Dave Jiang
This provides support for IXP2xxx error interrupt handling. Previously there was a patch to remove this (although the original stuff was broken). Well, now the error bits are needed again. These are used extensively by the micro-engine drivers according to Deepak and also we will need it for the new EDAC code that Alan Cox is trying to push into the main kernel.
Re-submit of 3072/1, generated against git tree pulled today. AFAICT, this git tree pulled in all the ARM changes that's in arm.diff. Please let me know if there are additional changes. Thx!
Signed-off-by: Dave Jiang <djiang@mvista.com> Signed-off-by: Deepak Saxena <dsaxena@plexity.net> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Nicolas Pitre [Fri, 4 Nov 2005 17:15:43 +0000 (17:15 +0000)]
[ARM] 3094/1: remove PLD stuff from old uaccess code
Patch from Nicolas Pitre
ARM processors that have pld instructions are not using those copy_user
implementation anymore. Let's remove the useless PLD lines which were
half wrong anyway.
Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Nicolas Pitre [Thu, 3 Nov 2005 20:40:50 +0000 (20:40 +0000)]
[ARM] 3092/1: remove excessive print format padding
Patch from Nicolas Pitre
Using a llx format to print addresses that might possibly be (only) 36
bits wide make sense. However making it a zero padded 16 char wide
field is a bit excessive and useless.
Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Make "QoS and/or fair queueing" have its own menu, it's too big to be
inlined into "Network options". Remove the obsolete NET_QOS option.
Automatically select NET_CLS if needed. Do the same for NET_ESTIMATOR
but allow it to be selected manually for statistical purposes. Add
comments to separate queueing from classification. Fix dependencies
and ordering of classifiers. Improve descriptions/help texts and
remove outdated pieces.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Alexandre Oliva [Mon, 31 Oct 2005 20:29:36 +0000 (18:29 -0200)]
[PATCH] x86-64: bitops fix for -Os
This fixes the x86-64 find_[first|next]_zero_bit() function for the
end-of-range case. It didn't test for a zero size, and the "rep scas"
would do entirely the wrong thing.
Nathan Scott [Thu, 3 Nov 2005 02:55:06 +0000 (13:55 +1100)]
[XFS] fix XFS quota for modular XFS builds
Cannot build XFS filesystem support as module with quota support. It
works only when the XFS filesystem support is compiled into the kernel.
Menuconfig prevents from setting CONFIG_XFS_FS=m and CONFIG_XFS_QUOTA=y.
How to reproduce: configure the XFS filesystem with quota support as
module. The resulting kernel won't have quota support compiled into
xfs.ko.
Fix: Changing the fs/xfs/Kconfig file from tristate to bool lets you
configure the quota support to be compiled into the XFS module. The
Makefile-linux-2.6 checks only for CONFIG_XFS_QUOTA=y.
Signed-off-by: Dimitri Puzin <tristan-777@ddkom-online.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Nathan Scott <nathans@sgi.com>
Nathan Scott [Thu, 3 Nov 2005 02:53:34 +0000 (13:53 +1100)]
[XFS] Add a mechanism for XFS to use the generic quota sync method.
This is now used to issue a delayed allocation flush before reporting
quota, which allows the used space quota report to match reality.
Fix up etherdevice docbook comments and make them (and other networking stuff)
get dragged into the kernel-api. Delete the old 8390 stuff, it really isn't
interesting anymore.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Optimize the match for broadcast address by using bit operations instead
of comparison. This saves a number of conditional branches, and generates
smaller code.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
The max growth of BIC TCP is too large. Original code was based on
BIC 1.0 and the default there was 32. Later code (2.6.13) included
compensation for delayed acks, and should have reduced the default
value to 16; since normally TCP gets one ack for every two packets sent.
The current value of 32 makes BIC too aggressive and unfair to other
flows.
Submitted-by: Injong Rhee <rhee@eos.ncsu.edu> Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Acked-by: Ian McDonald <imcdnzl@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Yan Zheng [Fri, 28 Oct 2005 00:02:08 +0000 (08:02 +0800)]
[MCAST]: ip[6]_mc_add_src should be called when number of sources is zero
And filter mode is exclude.
Further explanation by David Stevens:
Multicast source filters aren't widely used yet, and that's really the only
feature that's affected if an application actually exercises this bug, as far
as I can tell. An ordinary filter-less multicast join should still work, and
only forwarded multicast traffic making use of filters and doing empty-source
filters with the MSFILTER ioctl would be at risk of not getting multicast
traffic forwarded to them because the reports generated would not be based on
the correct counts.
Signed-off-by: Yan Zheng <yanzheng@21cn.com Acked-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Russell King [Wed, 2 Nov 2005 14:11:35 +0000 (14:11 +0000)]
[ARM] Fix mm initialisation with write buffered write allocate caches
It seems that without the extra tlb flush, we may end up faulting
during the early kernel initialisation because the TLB can't see
the updated page tables.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
John Bowler [Wed, 2 Nov 2005 11:55:12 +0000 (11:55 +0000)]
[ARM] 3083/1: include/asm-arm/arch-ixp4xx/io.h: eliminate warnings for pointer passed to integral function argument
Patch from John Bowler
Fix for a compiler warning, this wasn't apparent in 2.6.12, I
believe the compiler options have been changed (somewhere) so
that passing a (void*) to a (u32) argument is now warned.
This accounts for the majority of the warnings in my builds of
the 2.6.14 kernel for NSLU2.
The patch changes pointer parameters declared as u32 to be
declared as either, for read parameters:
const volatile void __iomem *
and for write parameters:
volatile void __iomem *
Signed-off-by: John Bowler <jbowler@acm.org> Signed-off-by: Deepak Saxena <dsaxena@plexity.net> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tejun Heo [Tue, 1 Nov 2005 08:23:49 +0000 (17:23 +0900)]
[PATCH] blk: fix dangling pointer access in __elv_add_request
cfq's add_req_fn callback may invoke q->request_fn directly and
depending on low-level driver used and timing, a queued request may be
finished & deallocated before add_req_fn callback returns. So,
__elv_add_request must not access rq after it's passed to add_req_fn
callback.
This patch moves rq_mergeable test above add_req_fn(). This may
result in q->last_merge pointing to REQ_NOMERGE request if add_req_fn
callback sets it but as RQ_NOMERGE is checked again when blk layer
actually tries to merge requests, this does not cause any problem.
Santiago Leon [Tue, 1 Nov 2005 19:15:09 +0000 (14:15 -0500)]
[PATCH] ibmveth fix panic in initial replenish cycle
This patch fixes a panic in the current tree caused by a race condition between the initial replenish cycle and the rx processing of the first packets trying to replenish the buffers.
Signed-off-by: Santiago Leon <santil@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>