Make "QoS and/or fair queueing" have its own menu, it's too big to be
inlined into "Network options". Remove the obsolete NET_QOS option.
Automatically select NET_CLS if needed. Do the same for NET_ESTIMATOR
but allow it to be selected manually for statistical purposes. Add
comments to separate queueing from classification. Fix dependencies
and ordering of classifiers. Improve descriptions/help texts and
remove outdated pieces.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Fix up etherdevice docbook comments and make them (and other networking stuff)
get dragged into the kernel-api. Delete the old 8390 stuff, it really isn't
interesting anymore.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Optimize the match for broadcast address by using bit operations instead
of comparison. This saves a number of conditional branches, and generates
smaller code.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
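The bit-operation match described above can be sketched as follows. This is a minimal illustration, not the kernel's actual code: both function names are hypothetical. It compares a per-octet test with a single AND over all six octets.

```c
#include <stdbool.h>
#include <stdint.h>

/* Comparison version: up to six compare-and-branch steps. */
bool demo_is_broadcast_cmp(const uint8_t addr[6])
{
	return addr[0] == 0xff && addr[1] == 0xff && addr[2] == 0xff &&
	       addr[3] == 0xff && addr[4] == 0xff && addr[5] == 0xff;
}

/* Bitwise version: AND all six octets together, then compare once. */
bool demo_is_broadcast_and(const uint8_t addr[6])
{
	return (addr[0] & addr[1] & addr[2] &
		addr[3] & addr[4] & addr[5]) == 0xff;
}
```

The AND of the six octets is 0xff only if every octet is 0xff, so both functions agree on every input.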
The max growth of BIC TCP is too large. Original code was based on
BIC 1.0 and the default there was 32. Later code (2.6.13) included
compensation for delayed acks, and should have reduced the default
value to 16, since TCP normally gets one ack for every two packets sent.
The current value of 32 makes BIC too aggressive and unfair to other
flows.
Submitted-by: Injong Rhee <rhee@eos.ncsu.edu> Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Acked-by: Ian McDonald <imcdnzl@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Yan Zheng [Fri, 28 Oct 2005 00:02:08 +0000 (08:02 +0800)]
[MCAST]: ip[6]_mc_add_src should be called when number of sources is zero
And filter mode is exclude.
Further explanation by David Stevens:
Multicast source filters aren't widely used yet, and that's really the only
feature that's affected if an application actually exercises this bug, as far
as I can tell. An ordinary filter-less multicast join should still work, and
only forwarded multicast traffic making use of filters and doing empty-source
filters with the MSFILTER ioctl would be at risk of not getting multicast
traffic forwarded to them because the reports generated would not be based on
the correct counts.
Signed-off-by: Yan Zheng <yanzheng@21cn.com> Acked-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Tejun Heo [Tue, 1 Nov 2005 08:23:49 +0000 (17:23 +0900)]
[PATCH] blk: fix dangling pointer access in __elv_add_request
cfq's add_req_fn callback may invoke q->request_fn directly and
depending on low-level driver used and timing, a queued request may be
finished & deallocated before add_req_fn callback returns. So,
__elv_add_request must not access rq after it's passed to add_req_fn
callback.
This patch moves rq_mergeable test above add_req_fn(). This may
result in q->last_merge pointing to REQ_NOMERGE request if add_req_fn
callback sets it, but as REQ_NOMERGE is checked again when the blk layer
actually tries to merge requests, this does not cause any problem.
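The ordering rule above (read what you need from rq before handing it to a callback that may free it) can be sketched in plain C. All types and names here are hypothetical stand-ins, not the block layer's. The sketch also assumes the completing callback clears last_merge if it points at the request.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_queue;

struct demo_request {
	bool mergeable;
};

struct demo_queue {
	struct demo_request *last_merge;
	/* May complete and free rq before it returns. */
	void (*add_req_fn)(struct demo_queue *q, struct demo_request *rq);
};

/* A callback that finishes the request immediately. */
void demo_add_and_complete(struct demo_queue *q, struct demo_request *rq)
{
	/* Assumption: completion drops any merge hint to this request. */
	if (q->last_merge == rq)
		q->last_merge = NULL;
	free(rq);
}

void demo_add_request(struct demo_queue *q, struct demo_request *rq)
{
	/* Test mergeability while rq is still guaranteed to be alive. */
	if (rq->mergeable)
		q->last_merge = rq;

	q->add_req_fn(q, rq);
	/* rq must not be dereferenced from here on. */
}
```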
Santiago Leon [Tue, 1 Nov 2005 19:15:09 +0000 (14:15 -0500)]
[PATCH] ibmveth fix panic in initial replenish cycle
This patch fixes a panic in the current tree caused by a race condition between the initial replenish cycle and the rx processing of the first packets trying to replenish the buffers.
Signed-off-by: Santiago Leon <santil@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Chris Wright [Tue, 1 Nov 2005 07:44:28 +0000 (23:44 -0800)]
[PATCH] TPM compile fix
CC drivers/char/tpm/tpm_nsc.o
drivers/char/tpm/tpm_nsc.c:277: error: `platform_bus_type' undeclared here (not in a function)
...
CC drivers/char/tpm/tpm_atmel.o
drivers/char/tpm/tpm_atmel.c:175: error: `platform_bus_type' undeclared here (not in a function)
Make sure to include proper headers.
Signed-off-by: Chris Wright <chrisw@osdl.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Greg Ungerer [Wed, 2 Nov 2005 05:10:22 +0000 (15:10 +1000)]
[PATCH] m68knommu: add 5208 ColdFire pit interrupt support
The PIT timer in the 5208 ColdFire has slightly different interrupt
bit definitions than the PIT timer used on other ColdFire parts.
Define the commonly used bit and mask numbers here, and let
part specific defines take precedence if they are defined.
Patch originally from Matt Wadell (from code originally written by
Mike Lavender).
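One common way to let part-specific defines take precedence is an #ifndef guard around the generic values. The sketch below assumes that approach. The macro names and numbers are made up for illustration and are not the ColdFire header contents.

```c
/* Pretend a part-specific header, included earlier, defined this. */
#define DEMO_PIT_IRQ_BIT 4

/* Generic fallbacks: used only where nothing was defined before. */
#ifndef DEMO_PIT_IRQ_BIT
#define DEMO_PIT_IRQ_BIT 36
#endif
#ifndef DEMO_PIT_IRQ_MASK
#define DEMO_PIT_IRQ_MASK 0x10
#endif

/* Helpers that expose which value actually won. */
unsigned int demo_pit_irq_bit(void)
{
	return DEMO_PIT_IRQ_BIT;	/* part-specific value */
}

unsigned int demo_pit_irq_mask(void)
{
	return DEMO_PIT_IRQ_MASK;	/* generic fallback */
}
```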
Greg Ungerer [Wed, 2 Nov 2005 05:02:01 +0000 (15:02 +1000)]
[PATCH] m68knommu: add 5208 ColdFire support defines
Add support for the internal register map of the 5208 ColdFire family.
Patch originally from Matt Wadell (from code originally written by
Mike Lavender).
Roland Dreier [Sat, 29 Oct 2005 04:50:35 +0000 (21:50 -0700)]
[PATCH] toshiba_ohci1394_dmi_table should be __devinitdata, not __devinit
I don't really understand why gcc gives the error it does, but without
this patch, when building with CONFIG_HOTPLUG=n, I get errors like:
CC arch/x86_64/pci/../../i386/pci/fixup.o
arch/x86_64/pci/../../i386/pci/fixup.c: In function `pci_fixup_i450nx':
arch/x86_64/pci/../../i386/pci/fixup.c:13: error: pci_fixup_i450nx causes a section type conflict
The change is obviously correct: an array should be declared
__devinitdata rather than __devinit.
Signed-off-by: Roland Dreier <rolandd@cisco.com> Acked-by: Martin J. Bligh <mbligh@mbligh.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Deepak Saxena [Tue, 1 Nov 2005 22:32:12 +0000 (22:32 +0000)]
[ARM] 3081/1: Remove GTWX5715 from ixp4xx_defconfig
Patch from Deepak Saxena
CONFIG_MACH_GTWX5715 hardcodes the machine type in head-xscale.S so we
can no longer boot on any other machine types. The proper fix would be
to remove the hardcoding, but that machine is an off-the-shelf system
and most users won't have access to the bootloader. :(
Signed-off-by: Deepak Saxena <dsaxena@plexity.net> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Dan Williams [Tue, 1 Nov 2005 22:31:12 +0000 (22:31 +0000)]
[ARM] 3079/1: Fix typo in i2c-iop3xx.c (invalid pointer passed to release_mem_region)
Patch from Dan Williams
* If request_irq fails then a call to release_mem_region will be made with an invalid pointer.
* Two formatting fixes
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Deepak Saxena <dsaxena@plexity.net> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
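The class of bug fixed above, an error path releasing a resource through the wrong handle, is usually avoided with a goto-based unwind that releases exactly what was acquired. The sketch below is a userspace illustration under that assumption. malloc()/free() stand in for request_mem_region()/release_mem_region(), and every name is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

struct demo_dev {
	void *region;	/* stands in for the claimed memory region */
	int irq_ok;
};

/* Stand-in for request_irq(); fails on demand. */
int demo_request_irq(int should_fail)
{
	return should_fail ? -1 : 0;
}

int demo_probe(struct demo_dev *dev, int irq_should_fail)
{
	int ret;

	dev->region = malloc(64);
	if (!dev->region)
		return -1;

	ret = demo_request_irq(irq_should_fail);
	if (ret)
		goto release_region;

	dev->irq_ok = 1;
	return 0;

release_region:
	/* Release through the same handle that was acquired above. */
	free(dev->region);
	dev->region = NULL;
	return ret;
}
```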
This patch adds a microcode loader for the ixp2000 architecture.
The ixp2000 is an xscale-based CPU with a number of additional small
CPUs ('microengines') on die that can be programmed to do various
things. Depending on the ixp2000 model, there are between 2 and 16
microengines.
This code provides an API that allows configuring the microengines,
loading code into them, and starting and stopping them and reading
out a number of status registers, and is used by the microengine
network driver that was recently announced to netdev.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Deepak Saxena <dsaxena@plexity.net> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Nicolas Pitre [Tue, 1 Nov 2005 19:52:24 +0000 (19:52 +0000)]
[ARM] 2948/1: new preemption safe copy_{to|from}_user implementation
Patch from Nicolas Pitre
This patch provides a preemption safe implementation of copy_to_user
and copy_from_user based on the copy template also used for memcpy.
It is enabled unconditionally when CONFIG_PREEMPT=y. Otherwise if the
configured architecture is not ARMv3, it is enabled as well, since it
gives better performance at least on StrongARM and XScale cores. If
ARMv3 is not too affected or if it doesn't matter too much then
uaccess.S could be removed altogether.
Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Nicolas Pitre [Tue, 1 Nov 2005 19:52:23 +0000 (19:52 +0000)]
[ARM] 2947/1: copy template with new memcpy/memmove
Patch from Nicolas Pitre
This patch provides a new implementation for optimized memory copy
functions on ARM. It is made of two levels: a template that consists of
the core copy code and separate files that define macros to be used with
the core code depending on the type of copy needed. This allows for best
performance while sharing the same core for implementing memcpy(),
copy_from_user() and copy_to_user() for instance.
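The two-level structure can be illustrated in C with a macro that plays the role of the template. The real implementation is ARM assembly split across files. Everything below is a hypothetical analogue, with a counter standing in for "checked" user-space accesses.

```c
#include <stddef.h>

/*
 * The "template": one copy loop whose load and store steps are
 * supplied by the instantiating code as macros.
 */
#define DEMO_DEFINE_COPY(name, LOAD, STORE) \
void *name(void *dst, const void *src, size_t n) \
{ \
	unsigned char *d = dst; \
	const unsigned char *s = src; \
	while (n--) { \
		unsigned char c; \
		LOAD(c, s); \
		STORE(d, c); \
		s++; \
		d++; \
	} \
	return dst; \
}

/* Plain accessors, as a memcpy-style instance would use. */
#define DEMO_PLAIN_LOAD(c, s)	((c) = *(s))
#define DEMO_PLAIN_STORE(d, c)	(*(d) = (c))

/* Counting loads stand in for protection-checking user accesses. */
unsigned long demo_checked_loads;
#define DEMO_CHECKED_LOAD(c, s)	(demo_checked_loads++, (c) = *(s))

/* Two instances sharing the same core loop. */
DEMO_DEFINE_COPY(demo_memcpy, DEMO_PLAIN_LOAD, DEMO_PLAIN_STORE)
DEMO_DEFINE_COPY(demo_copy_from_user, DEMO_CHECKED_LOAD, DEMO_PLAIN_STORE)
```

Any optimization made to the loop body is picked up by every instance, which is the sharing benefit the text describes.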
Two reasons for this work:
1) the current copy_to_user/copy_from_user implementation assumes no
task switch will ever occur in the middle of each copied page making
it completely unsafe with CONFIG_PREEMPT=y.
2) current copy implementations are measurably suboptimal and optimizing
different implementations separately is a pain and more opportunities
for bugs.
The reason for (1) is that copies inside user pages are performed
with the ldm instruction, which has no means of testing user protections
and could possibly race with process preemption bypassing the COW mechanism
for example. This is a longstanding issue that we said ought to be fixed
for about two years now. The solution is to substitute those ldm insns
with a series of ldrt or strt insns to enforce user memory protection.
At least on StrongARM and XScale cores the ldm is not faster than the
equivalent ldr/str insns with a warm i-cache so there is no measurable
performance degradation with that change. The fact that the copy code is
a template makes it pretty easy to reuse the same core code as for memcpy
and benefit from the same performance optimizations.
Now (2) is best demonstrated with actual throughput measurements.
First, here is a summary of memcopy tests performed on a StrongARM core:
The buffer size is in bytes and the measured speed in MB/s. The copy
was performed repeatedly with given buffer and throughput averaged over
3 seconds.
Here we can see that the current kernel version has a higher entry cost
that shows up with small buffers. As buffer size grows both implementations
converge to the same throughput.
Now here's the exact same test performed on an XScale core (PXA255):
Again we can see the entry setup cost being higher for the current kernel
before getting to the main copy loop. Then throughput results converge
as long as the buffer remains in the cache. Then the 1MB case shows more
differences probably due to better pld placement and/or less instruction
interlocks in this proposed implementation.
Disclaimer: The PXA system was running with slower clocks than the
StrongARM system so trying to infer any conclusion by comparing those
separate sets of results side by side would be completely inappropriate.
So... What this patch does is to replace both memcpy and memmove with
an implementation based on the provided copy code template. The memmove
code is kept separate since it is used only if the memory areas involved
do overlap in which case the code is a transposition of the template but
with the copy occurring in the opposite direction (trying to fit that
mode into the template turned it into a mess not worth it for memmove
alone). And obviously both memcpy and memmove were tested with all kinds
of pointer alignments and buffer sizes to exercise all code paths for
correctness.
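The direction choice described above can be sketched byte by byte in C. This is only an illustration of the overlap test and the reversed copy. The function name is hypothetical, and the actual ARM code is optimized assembly.

```c
#include <stddef.h>
#include <stdint.h>

void *demo_memmove(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	/*
	 * Unsigned distance from src to dst. If dst is below src the
	 * subtraction wraps to a large value, so one comparison covers
	 * both "dst before src" and "dst past the end of src".
	 */
	if ((uintptr_t)d - (uintptr_t)s >= n) {
		/* A forward copy cannot clobber unread source bytes. */
		while (n--)
			*d++ = *s++;
	} else {
		/* dst lies inside [src, src + n): copy from the end. */
		d += n;
		s += n;
		while (n--)
			*--d = *--s;
	}
	return dst;
}
```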
The next patch will provide the now trivial replacement implementation
of copy_to_user and copy_from_user.
Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Switch the users of ixp2000_reg_write that depend on writes being
flushed out of the write buffer by the time that function returns
over to ixp2000_reg_wrb.
When using XCB=101, writes to the same functional unit are still
guaranteed to complete in order, so we only need to protect against:
- reordering of writes to different functional units
- masking an interrupt and then reenabling the IRQ bit in CPSR
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Deepak Saxena <dsaxena@plexity.net> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
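A frequently used way to make sure a register write has left the write buffer is to read the same register back before returning. The sketch below assumes that idiom. It is not the actual ixp2000_reg_wrb implementation, and both function names are hypothetical.

```c
#include <stdint.h>

/* Plain write: may still be sitting in a write buffer on return. */
void demo_reg_write(volatile uint32_t *reg, uint32_t val)
{
	*reg = val;
}

/* Write, then read the same register back before returning. */
void demo_reg_wrb(volatile uint32_t *reg, uint32_t val)
{
	*reg = val;
	(void)*reg;	/* volatile read-back; the value is discarded */
}
```

Callers that only need ordering within one functional unit can keep the cheaper plain write, while callers in the two cases listed above would use the read-back variant.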
On the ixdp2x00, the slave CPU is currently not allowed to reset itself
for fear that it will do something 'funky' on the PCI bus. This fear is
ungrounded -- the slave CPU is wired up such that a CPU reset will not
cause a PCI bus reset to be done. This patch changes arch_reset() so
that the slave CPU also executes the reset sequence, allowing it to
reboot itself using /sbin/reboot.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Deepak Saxena <dsaxena@plexity.net> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[ARM] 3062/1: map in various enp2611 peripherals for the ixp2000 netdev driver
Patch from Lennert Buytenhek
The enp2611 version of the ixp2000 netdev driver needs to be able to
access a number of on-board peripherals. ioremap() is not suitable
for this, as that will cause XCB=000 mappings to be done, which will
make the cpu susceptible to crashing on ixp2400 erratum #66. Properly
aligned iotable mappings with MT_IXP2000_DEVICE will cause section
mappings with XCB=101 to be done, which is safe.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Deepak Saxena <dsaxena@plexity.net> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Wim Van Sebroeck [Sun, 23 Oct 2005 13:21:44 +0000 (15:21 +0200)]
[WATCHDOG] adds device_driver .owner field
Initialise the .owner field of the device driver
with the module that owns it, for easier tracking
of device driver ownership. (probably also better
for sysfs...)
Jens Axboe [Tue, 1 Nov 2005 08:26:16 +0000 (09:26 +0100)]
[BLOCK] Unify the separate read/write io stat fields into arrays
Instead of having ->read_sectors and ->write_sectors, combine the two
into ->sectors[2] and similar for the other fields. This saves a branch
in several places in the io path, since we don't have to care what the
actual io direction is. On my x86-64 box, that's 200 bytes less text in
just the core (not counting the various drivers).
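The effect of the array layout can be sketched as follows. Only ->sectors[2] is named in the text. The struct, the ios field and the function are hypothetical.

```c
/* Direction doubles as the array index. */
enum { DEMO_READ = 0, DEMO_WRITE = 1 };

struct demo_disk_stats {
	unsigned long sectors[2];	/* was read_sectors / write_sectors */
	unsigned long ios[2];
};

/*
 * Before: if (rw == DEMO_WRITE) write_sectors += n; else read_sectors += n;
 * After: the direction indexes the array, so no branch is needed.
 */
void demo_account(struct demo_disk_stats *st, int rw, unsigned long nsect)
{
	st->sectors[rw] += nsect;
	st->ios[rw]++;
}
```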
Jens Axboe [Tue, 1 Nov 2005 07:35:42 +0000 (08:35 +0100)]
[BLOCK] Update read/write block io statistics at completion time
Right now we do it at queueing time, which works alright for reads
(since they are usually sync), but not for async writes since we can
queue io a lot faster than we can complete it. This makes the vmstat
output look extremely bursty.
Linus Torvalds [Tue, 1 Nov 2005 05:12:40 +0000 (21:12 -0800)]
Don't touch USB controller IO registers when they are disabled
The USB "handoff" code is an early PCI quirk to make sure we own the USB
controller (as opposed to the BIOS/SMM). But if the controller isn't
even enabled yet, don't try to access it.
Acked-by: Paul Mackerras <paulus@samba.org> (who had an alternate patch) Signed-off-by: Linus Torvalds <torvalds@osdl.org>
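One way to express "don't access a disabled controller" is to test the decode-enable bits of the PCI command register first. The sketch below assumes that approach. The bit values are the standard PCI command register bits, but the helper names are hypothetical and this is not the quirk code itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PCI_COMMAND_IO	0x1	/* I/O space decoding enabled */
#define DEMO_PCI_COMMAND_MEMORY	0x2	/* memory space decoding enabled */

/* Port-mapped registers are reachable only with I/O decoding on. */
bool demo_can_touch_io_regs(uint16_t pci_command)
{
	return (pci_command & DEMO_PCI_COMMAND_IO) != 0;
}

/* Memory-mapped registers need memory decoding instead. */
bool demo_can_touch_mmio_regs(uint16_t pci_command)
{
	return (pci_command & DEMO_PCI_COMMAND_MEMORY) != 0;
}
```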
Linus Torvalds [Tue, 1 Nov 2005 03:16:17 +0000 (19:16 -0800)]
Revert "i386: move apic init in init_IRQs"
Commit f2b36db692b7ff6972320ad9839ae656a3b0ee3e causes a bootup hang on
at least one machine. Revert for now until we understand why. The old
code may be ugly, but it works.
Herbert Xu [Sun, 30 Oct 2005 00:20:59 +0000 (11:20 +1100)]
[DCCP]: Set socket owner iff packet is not data
Here is a complimentary insurance policy for those feeling a bit insecure.
You don't have to accept this. However, if you do, you can't blame me for
it :)
> 1) dccp_transmit_skb sets the owner for all packets except data packets.
We can actually verify this by looking at pkt_type.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
This adds the magic IO wakeup code for the CardBus version of the
Creative Labs Audigy 2 to the snd-emu10k1 driver.
Without the magic IO enable sequence, reading from the IO region of the
card will fail spectacularly, and the machine will hang.
My next task will be getting the driver to actually play sound without
distortion.
Signed-off-by: James Courtier-Dutton <James@superbug.co.uk>
[ This is a work-in-progress, but since it avoids a total lockup
if the emu10k module is loaded on a machine with the cardbus
card inserted, we're better off with it than without it, even
if sound quality is bad right now ]
Andrea Arcangeli [Mon, 31 Oct 2005 22:08:54 +0000 (14:08 -0800)]
[PATCH] fix __writeback_single_inode WARN_ON
When the inode count is zero in inode writeback, the
WARN_ON(!(inode->i_state & I_WILL_FREE));
is broken, and needs to test for either I_WILL_FREE|I_FREEING.
When the inode is in I_FREEING state, it's already out of the visibility
of the vm so it can't be freed so it doesn't require the __iget and the
generic_delete_inode path can call the sync internally to the lowlevel
fs callback during the last iput. So the inode being in I_FREEING is
also a valid condition for calling the sync with i_count == 0.
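The difference between the old and the fixed condition can be shown with two small predicates. The flag values below are hypothetical placeholders, not the kernel's definitions.

```c
#include <stdbool.h>

#define DEMO_I_FREEING		0x10u
#define DEMO_I_WILL_FREE	0x20u

/* Old check: warns unless I_WILL_FREE is set. */
bool demo_old_should_warn(unsigned int state)
{
	return !(state & DEMO_I_WILL_FREE);
}

/* Fixed check: either flag makes i_count == 0 legitimate. */
bool demo_new_should_warn(unsigned int state)
{
	return !(state & (DEMO_I_WILL_FREE | DEMO_I_FREEING));
}
```

With only I_FREEING set, the old predicate warns and the new one does not, which is the false positive the patch removes.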
Herbert Xu [Sun, 30 Oct 2005 00:20:59 +0000 (11:20 +1100)]
[DCCP]: Simplify skb_set_owner_w semantics
While we're at it let's reorganise the set_owner_w calls a little so that:
1) dccp_transmit_skb sets the owner for all packets except data packets.
2) Add dccp_skb_entail to set owner for packets queued for retransmission.
3) Make dccp_transmit_skb static.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Yan Zheng [Fri, 28 Oct 2005 22:12:00 +0000 (15:12 -0700)]
[IPV6]: Fix behavior of ip6_route_input() for link local address
I find that Linux will reply to an echo request destined to an address which
belongs to an interface other than the one on which the request was received.
This behavior doesn't make sense for link-local addresses.
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> said:
Please note that the sender does need to set up a neighbor entry by hand to reproduce
this bug. (Link-local address on eth1 is not visible on eth0, from the point
of view of neighbor discovery in IPv6.)
Harald Welte [Wed, 26 Oct 2005 07:34:24 +0000 (09:34 +0200)]
[NETFILTER]: Add "revision" support to arp_tables and ip6_tables
As ip_tables has already had for some time, this adds support for
having multiple revisions for each match/target. We steal one byte from
the name in order to accommodate an 8-bit version number.
Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Arthur Othieno [Mon, 31 Oct 2005 04:04:05 +0000 (23:04 -0500)]
[PATCH] i386: CONFIG_PC removal
CONFIG_PC is left-over cruft after the introduction of CONFIG_X86_PC with
the subarch split. Remove it, and fixup the remaining users to depend on
CONFIG_X86_PC instead.
Signed-off-by: Arthur Othieno <a.othieno@bluewin.ch> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
"I'm currently debugging this. The problem is that we are using the
generic dispatch queue directly in the noop sched and merging is NOT
allowed on dispatch queues but generic handling of last_merge tries
to merge requests. I'm still trying to verify this, so I'll be back
with results soon."
In the meantime, disable merging for noop by setting REQ_NOMERGE in
elevator_noop_add_request().
Eventually, we should add a noop_list and do the dispatching like in the
other io schedulers. Merging is still beneficial for noop (and it has
always done it).
Jeff Garzik [Mon, 31 Oct 2005 04:31:48 +0000 (23:31 -0500)]
[libata] locking rewrite (== fix)
A lot of power packed into a little patch.
This change eliminates the sharing between our controller-wide spinlock
and the SCSI core's Scsi_Host lock. The locking in libata was
already highly compartmentalized, always referencing our own lock and
never scsi_host::host_lock.
As a side effect, this change eliminates a deadlock from calling
scsi_finish_command() while inside our spinlock.
Paul Mackerras [Mon, 31 Oct 2005 02:07:02 +0000 (13:07 +1100)]
powerpc: Fix bug arising from having multiple memory_limit variables
We had a static memory_limit in prom.c, and then another one defined
in setup_64.c and used in numa.c, which resulted in the kernel crashing
when mem=xxx was given on the command line. This puts the declaration
in system.h and the definition in mem.c. This also moves the
definition of tce_alloc_start/end out of setup_64.c.
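The "one declaration, one definition" arrangement can be sketched in a single file, with comments marking which kernel file each part would correspond to. The demo_ names are hypothetical.

```c
/* Shared header (system.h in the text): one declaration for all users. */
extern unsigned long demo_memory_limit;

/* One source file (mem.c in the text): the single definition. */
unsigned long demo_memory_limit;

/* Command-line parsing side: writes the shared variable. */
void demo_parse_mem(unsigned long bytes)
{
	demo_memory_limit = bytes;
}

/* Consumer side: reads the same object, not a private copy. */
unsigned long demo_numa_limit(void)
{
	return demo_memory_limit;
}
```

Had the parsing side kept its own static variable of the same name, the consumer would still see zero after parsing, which is the kind of mismatch the patch removes.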