David S. Miller [Tue, 5 Jul 2005 22:43:58 +0000 (15:43 -0700)]
[TCP]: Never TSO defer under periods of congestion.
Congestion window recover after loss depends upon the fact
that if we have a full MSS sized frame at the head of the
send queue, we will send it. TSO deferral can defeat the
ACK clocking necessary to exit cleanly from recovery.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 5 Jul 2005 22:24:38 +0000 (15:24 -0700)]
[TCP]: Move to new TSO segmenting scheme.
Make TSO segment transmit size decisions at send time not earlier.
The basic scheme is that we try to build as large a TSO frame as
possible when pulling in the user data, but the size of the TSO frame
output to the card is determined at transmit time.
This is guided by tp->xmit_size_goal. It is always set to a multiple
of MSS and tells sendmsg/sendpage how large an SKB to try and build.
Later, tcp_write_xmit() and tcp_push_one() chop up the packet if
necessary and conditions warrant. These routines can also decide to
"defer" in order to wait for more ACKs to arrive and thus allow larger
TSO frames to be emitted.
A general observation is that TSO elongates the pipe, thus requiring a
larger congestion window and larger buffering especially at the sender
side. Therefore, it is important that applications 1) get a large
enough socket send buffer (this is accomplished by our dynamic send
buffer expansion code) 2) do large enough writes.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 5 Jul 2005 22:19:54 +0000 (15:19 -0700)]
[TCP]: Break out tcp_snd_test() into it's constituent parts.
tcp_snd_test() does several different things, use inline
functions to express this more clearly.
1) It initializes the TSO count of SKB, if necessary.
2) It performs the Nagle test.
3) It makes sure the congestion window is adhered to.
4) It makes sure SKB fits into the send window.
This cleanup also sets things up so that things like the
available packets in the congestion window does not need
to be calculated multiple times by packet sending loops
such as tcp_write_xmit().
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 5 Jul 2005 22:19:38 +0000 (15:19 -0700)]
[TCP]: Fix __tcp_push_pending_frames() 'nonagle' handling.
'nonagle' should be passed to the tcp_snd_test() function
as 'TCP_NAGLE_PUSH' if we are checking an SKB not at the
tail of the write_queue. This is because Nagle does not
apply to such frames since we cannot possibly tack more
data onto them.
However, while doing this __tcp_push_pending_frames() makes
all of the packets in the write_queue use this modified
'nonagle' value.
Fix the bug and simplify this function by just calling
tcp_write_xmit() directly if sk_send_head is non-NULL.
As a result, we can now make tcp_data_snd_check() just call
tcp_push_pending_frames() instead of the specialized
__tcp_data_snd_check().
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 5 Jul 2005 22:18:51 +0000 (15:18 -0700)]
[TCP]: Kill extra cwnd validate in __tcp_push_pending_frames().
The tcp_cwnd_validate() function should only be invoked
if we actually send some frames, yet __tcp_push_pending_frames()
will always invoke it. tcp_write_xmit() does the call for us,
so the call here can simply be removed.
Also, tcp_write_xmit() can be marked static.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 5 Jul 2005 22:18:34 +0000 (15:18 -0700)]
[TCP]: Add missing skb_header_release() call to tcp_fragment().
When we add any new packet to the TCP socket write queue,
we must call skb_header_release() on it in order for the
TSO sharing checks in the drivers to work.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 5 Jul 2005 22:18:03 +0000 (15:18 -0700)]
[TCP]: Move send test logic out of net/tcp.h
This just moves the code into tcp_output.c, no code logic changes are
made by this patch.
Using this as a baseline, we can begin to untangle the mess of
comparisons for the Nagle test et al. We will also be able to reduce
all of the redundant computation that occurs when outputting data
packets.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 5 Jul 2005 22:17:45 +0000 (15:17 -0700)]
[TCP]: Fix quick-ack decrementing with TSO.
On each packet output, we call tcp_dec_quickack_mode()
if the ACK flag is set. It drops tp->ack.quick until
it hits zero, at which time we deflate the ATO value.
When doing TSO, we are emitting multiple packets with
ACK set, so we should decrement tp->ack.quick that many
segments.
Note that, unlike this case, tcp_enter_cwr() should not
take the tcp_skb_pcount(skb) into consideration. That
function, one time, readjusts tp->snd_cwnd and moves
into TCP_CA_CWR state.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 5 Jul 2005 22:17:25 +0000 (15:17 -0700)]
[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.
The ideal and most optimal layout for an SKB when doing
scatter-gather is to put all the headers at skb->data, and
all the user data in the page array.
This makes SKB splitting and combining extremely simple,
especially before a packet goes onto the wire the first
time.
So, when sk_stream_alloc_pskb() is given a zero size, make
sure there is no skb_tailroom(). This is achieved by applying
SKB_DATA_ALIGN() to the header length used here.
Next, make select_size() in TCP output segmentation use a
length of zero when NETIF_F_SG is true on the outgoing
interface.
Signed-off-by: David S. Miller <davem@davemloft.net>
I suspect "#define __ARGS(x) ()" was deprecated before I was born.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Dave, you were right and the sleeping locks in shaper were
broken. Markus Kanet noticed this and also tested the patch below that
switches locking to spinlocks.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Olsson [Tue, 5 Jul 2005 22:02:40 +0000 (15:02 -0700)]
[IPV4]: More broken memory allocation fixes for fib_trie
Below a patch to preallocate memory when doing resize of trie (inflate halve)
If preallocations fails it just skips the resize of this tnode for this time.
The oops we got when killing bgpd (with full routing) is now gone.
Patrick memory patch is also used.
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 5 Jul 2005 21:55:24 +0000 (14:55 -0700)]
[NET]: Hashed spinlocks in net/ipv4/route.c
- Locking abstraction
- Spinlocks moved out of rt hash table : Less memory (50%) used by rt
hash table. it's a win even on UP.
- Sizing of spinlocks table depends on NR_CPUS
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Tue, 5 Jul 2005 21:44:55 +0000 (14:44 -0700)]
[IPV4]: Handle large allocations in fib_trie
Inflating a node a couple of times makes it exceed the 128k kmalloc limit.
Use __get_free_pages for allocations > PAGE_SIZE, as in fib_hash.
Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Robert Olsson <Robert.Olsson@data.slu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Tue, 5 Jul 2005 21:15:53 +0000 (14:15 -0700)]
[PKT_SCHED]: Report rate estimator configuration errors during qdisc allocation
Current behaviour is to not report an error if a rate
estimator is created together with a qdisc and the
configuration of the rate estimator is bogus. This leads
to unexpected behaviour because the user is not notified.
New behaviour is to report the error and let the whole
qdisc creation operation fail so the user is able to fix
his mistake.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Tue, 5 Jul 2005 21:13:41 +0000 (14:13 -0700)]
[NET]: Reduce size of sk_buff by 4 bytes
Reduce local_df to a bit field and ip_summed to a 2 bits
field thus saving 13 bits. Move bit fields, packet type,
and protocol into the spare area between the priority
and the destructor. Saves 4 bytes on both, 32bit and
64bit architectures.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Tue, 5 Jul 2005 21:08:57 +0000 (14:08 -0700)]
[NET]: Remove redundant code in net/core/filter.c
skb_header_pointer handles linear and non-linear data, no need to handle
linear data again.
Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Tue, 5 Jul 2005 21:08:10 +0000 (14:08 -0700)]
[NET]: Fix signedness issues in net/core/filter.c
This is the code to load packet data into a register:
k = fentry->k;
if (k < 0) {
...
} else {
u32 _tmp, *p;
p = skb_header_pointer(skb, k, 4, &_tmp);
if (p != NULL) {
A = ntohl(*p);
continue;
}
}
skb_header_pointer checks if the requested data is within the
linear area:
int hlen = skb_headlen(skb);
if (offset + len <= hlen)
return skb->data + offset;
When offset is within [INT_MAX-len+1..INT_MAX] the addition will
result in a negative number which is <= hlen.
I couldn't trigger a crash on my AMD64 with 2GB of memory, but a
coworker tried on his x86 machine and it crashed immediately.
This patch fixes the check in skb_header_pointer to handle large
positive offsets similar to skb_copy_bits. Invalid data can still
be accessed using negative offsets (also similar to skb_copy_bits),
anyone using negative offsets needs to verify them himself.
Thanks to Thomas Vögtle <thomas.voegtle@coreworks.de> for verifying the
problem by crashing his machine and providing me with an Oops.
Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
[PATCH] ARM: 2784/1: Fix the block cache flush operation range
Patch from Catalin Marinas
The range for the ARMv6 block cache operations is inclusive but the
kernel doesn't re-calculate the end address, causing a page fault when
used (this only happens with support for cache aliasing, otherwise the
blk_flush_kern_dcache_page() is not called). This patch subtracts
L1_CACHE_BYTES from the end address.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Ben Dooks [Sun, 3 Jul 2005 16:44:40 +0000 (17:44 +0100)]
[PATCH] ARM: 2785/1: S3C24XX - serial calls request_irq() with IRQs disabled
Patch from Ben Dooks
The request_irq() function is called by s3c24xx uart driver with
the local IRQs disabled. The request_irq() function can allocate
memory via kmalloc(), and this may sleep causing a warning about
sleeping in an invalid context.
Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Rob Punkunus [Sun, 3 Jul 2005 15:37:18 +0000 (17:37 +0200)]
[PATCH] amd74xx: support MCP55 device IDs
From: Rob Punkunus <rpunkunus@nvidia.com>
Rob Punkunus recently submitted a patch to enable support for MCP51/MCP55 in
the amd74xx driver. This patch was whitespace-corrupted and didn't apply to
2.6.12 since MCP51 support was merged in the 2.6.12-rc series.
Gentoo would like to support this hardware for our upcoming release media, so
I fixed the patch, and here it is :)
Signed-off-by: Daniel Drake <dsd@gentoo.org> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@elka.pw.edu.pl>
Ivan Kokshaysky [Fri, 1 Jul 2005 12:46:26 +0000 (16:46 +0400)]
[PATCH] alpha smp fix (part #2)
This fixes the bug that caused BUG_ON(!irqs_disabled()) to trigger in
run_posix_cpu_timers() on alpha/smp. We didn't disable interrupts
properly before calling smp_percpu_timer_interrupt().
We *do* disable interrupts everywhere except this unfortunate
smp_percpu_timer_interrupt(). Fixed thus.
[PATCH] ARM: replace schedule_timeout() with msleep()
Use msleep() instead of schedule_timeout() to guarantee the task
delays as expected. Neither signals nor wait-queue events are
important at this point in the code, I believe.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Ben Dooks [Fri, 1 Jul 2005 10:27:06 +0000 (11:27 +0100)]
[PATCH] ARM: 2783/1: Remove omnimeter_defconfig as there is no kernel support
Patch from Ben Dooks
The omnimeter_defconfig does not define any machines and
seems to have no other support in the current kernel.
This patch removes the config file, as this is the only
thing currently mentioning the ominmeter.
Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Add support for PXA27x Standby mode, a low-power mode that retains CPU
and some peripheral state (the existing "sleep" mode is a power-power
mode that retains less state). Activated via:
echo -n standby > /sys/power/state
From: David Burrage and Todd Poynor
Signed-off-by: Todd Poynor <tpoynor@mvista.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Ivan Kokshaysky [Thu, 30 Jun 2005 16:02:18 +0000 (20:02 +0400)]
[PATCH] alpha smp fix
As usual, the reason of this breakage is quite silly: in do_entIF, we
are checking for PS == 0 to see whether it was a kernel BUG() or
userspace trap.
It works, unless BUG() happens in interrupt - PS is not 0 in kernel mode
due to non-zero IPL, and the things get messed up horribly then. In
this particular case it was BUG_ON(!irqs_disabled()) triggered in
run_posix_cpu_timers(), so we ended up shooting "current" with the
bursts of one SIGTRAP and three SIGILLs on every timer tick. ;-)
This patch calculates the AFS partition length by expanding the image
length information to the nearest erase block boundary. This
eliminates the problems with JFFS2 erasing the footer.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Thu, 30 Jun 2005 21:41:22 +0000 (22:41 +0100)]
[PATCH] Serial: Fix small CONFIG_SERIAL_8250_NR_UARTS
If CONFIG_SERIAL_8250_NR_UARTS is smaller than the array size in
asm/serial.h, we trampled on memory which wasn't ours. Take our
big boots away by limiting the number of ports initialised to the
smaller of ...NR_UARTS and the array size.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Catalin Marinas [Thu, 30 Jun 2005 16:04:14 +0000 (17:04 +0100)]
[PATCH] ARM: 2779/1: Fix the V bit setting for the ARM1020x CPUs
Patch from Catalin Marinas
This patch fixes the V bit setting for the ARM1020x processors. At
reset, this bit is automatically set to the value of the HIVECSINIT
input signal which just happened to be 1 but it is not mandatory.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Catalin Marinas [Thu, 30 Jun 2005 16:04:14 +0000 (17:04 +0100)]
[PATCH] ARM: 2778/1: Add -mno-thumb-interwork to CFLAGS_ABI
Patch from Catalin Marinas
The new EABI gcc adds -mthumb-interwork by default, even if
-mabi=apcs-gnu is passed. This causes a warning for every compiled C
file when -march=armv4 is used. The patch adds -mno-thumb-interwork
if the option is supported. This is also useful since we don't need
any ARM/Thumb interworking in the kernel
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Pekka Enberg [Thu, 30 Jun 2005 09:59:05 +0000 (02:59 -0700)]
[PATCH] freevxfs: minor cleanups
This patch addresses the following minor issues:
- Typo in printk
- Redundant casts
- Use C99 struct initializers instead of memset
- Parenthesis around return value
- Use inline instead of __inline__
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jay Lan [Thu, 30 Jun 2005 09:59:03 +0000 (02:59 -0700)]
[PATCH] Improper initrd failure message at boot time
On system boot up, there was an failure reported to boot.msg:
<5>Trying to move old root to /initrd ... failed
According to initrd(4) man page, step #7 of BOOT-UP OPERATION
is described as below:
7. If the normal root file has directory /initrd, device
/dev/ram0 is moved from / to /initrd. Otherwise if
directory /initrd does not exist device /dev/ram0 is
unmounted.
We got service calls from customers concerning about this failure message
at boot time. Many systems do not have /initrd and thus the message can be
changed in the case of non-existing /initrd so that it does not sound like
a failure of the system.
Signed-off-by: Jay Lan <jlan@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Chris Zankel [Thu, 30 Jun 2005 09:58:59 +0000 (02:58 -0700)]
[PATCH] xtensa: Removed local copy of zlib and fixed O= support
Removed an unnecessary local copy of zlib (sorry for the add'l traffic).
Fixed 'O=' support (thanks to Jan Dittmer for pointing it out). Some minor
clean-ups in the make files.
Signed-off-by: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>