ipv6 route: Convert rt6_device_match() to use RT6_LOOKUP_F_xxx flags.
The commit 77d16f450ae0452d7d4b009f78debb1294fb435c ("[IPV6] ROUTE:
Unify RT6_F_xxx and RT6_SELECT_F_xxx flags") intended to pass various
routing lookup hints around RT6_LOOKUP_F_xxx flags, but conversion was
missing for rt6_device_match().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Moore [Sat, 28 Jun 2008 03:12:32 +0000 (20:12 -0700)]
netlabel: Fix a problem when dumping the default IPv6 static labels
There is a missing "!" in a conditional statement which is causing entries to
be skipped when dumping the default IPv6 static label entries. This can be
demonstrated by running the following:
... you will notice that the entry for the IPv6 localhost address is not
displayed but does exist (works correctly, causes collisions when attempting
to add duplicate entries, etc.).
Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Cohen [Sat, 28 Jun 2008 03:09:00 +0000 (20:09 -0700)]
net/inet_lro: remove setting skb->ip_summed when not LRO-able
When an SKB cannot be chained to a session, the current code attempts
to "restore" its ip_summed field from lro_mgr->ip_summed. However,
lro_mgr->ip_summed does not hold the original value; in fact, we'd
better not touch skb->ip_summed since it is not modified by the code
in the path leading to a failure to chain it. Also use a cleaer
comment to the describe the ip_summed field of struct net_lro_mgr.
Issue raised by Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Sat, 28 Jun 2008 03:06:08 +0000 (20:06 -0700)]
inet fragments: fix race between inet_frag_find and inet_frag_secret_rebuild
The problem is that while we work w/o the inet_frags.lock even
read-locked the secret rebuild timer may occur (on another CPU, since
BHs are still disabled in the inet_frag_find) and change the rnd seed
for ipv4/6 fragments.
It was caused by my patch fd9e63544cac30a34c951f0ec958038f0529e244
([INET]: Omit double hash calculations in xxx_frag_intern) late
in the 2.6.24 kernel, so this should probably be queued to -stable.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Li Zefan [Sat, 28 Jun 2008 03:03:24 +0000 (20:03 -0700)]
CONNECTOR: add a proc entry to list connectors
I got a problem when I wanted to check if the kernel supports process
event connector, and It seems there's no way to do this check.
At best I can check if the kernel supports connector or not, by looking
into /proc/net/netlink, or maybe checking the return value of bind() to
see if it's ENOENT.
So it would be useful to add /proc/net/connector to list all supported
connectors:
# cat /proc/net/connector
Name ID
connector 4294967295:4294967295
cn_proc 1:1
w1 3:1
Changelog:
- fix memory leak: s/seq_release/single_release
- use spin_lock_bh instead of spin_lock_irqsave
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
Adrian Bunk [Sat, 28 Jun 2008 02:54:05 +0000 (19:54 -0700)]
pkt_sched: Remove CONFIG_NET_SCH_RR
Commit d62733c8e437fdb58325617c4b3331769ba82d70
([SCHED]: Qdisc changes and sch_rr added for multiqueue)
added a NET_SCH_RR option that was unused since the code
went unconditionally into sch_prio.
Reported-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Rainer Weikusat [Sat, 28 Jun 2008 02:34:18 +0000 (19:34 -0700)]
af_unix: fix 'poll for write'/connected DGRAM sockets
For n:1 'datagram connections' (eg /dev/log), the unix_dgram_sendmsg
routine implements a form of receiver-imposed flow control by
comparing the length of the receive queue of the 'peer socket' with
the max_ack_backlog value stored in the corresponding sock structure,
either blocking the thread which caused the send-routine to be called
or returning EAGAIN. This routine is used by both SOCK_DGRAM and
SOCK_SEQPACKET sockets. The poll-implementation for these socket types
is datagram_poll from core/datagram.c. A socket is deemed to be
writeable by this routine when the memory presently consumed by
datagrams owned by it is less than the configured socket send buffer
size. This is always wrong for PF_UNIX non-stream sockets connected to
server sockets dealing with (potentially) multiple clients if the
abovementioned receive queue is currently considered to be full.
'poll' will then return, indicating that the socket is writeable, but
a subsequent write result in EAGAIN, effectively causing an (usual)
application to 'poll for writeability by repeated send request with
O_NONBLOCK set' until it has consumed its time quantum.
The change below uses a suitably modified variant of the datagram_poll
routines for both type of PF_UNIX sockets, which tests if the
recv-queue of the peer a socket is connected to is presently
considered to be 'full' as part of the 'is this socket
writeable'-checking code. The socket being polled is additionally
put onto the peer_wait wait queue associated with its peer, because the
unix_dgram_recvmsg routine does a wake up on this queue after a
datagram was received and the 'other wakeup call' is done implicitly
as part of skb destruction, meaning, a process blocked in poll
because of a full peer receive queue could otherwise sleep forever
if no datagram owned by its socket was already sitting on this queue.
Among this change is a small (inline) helper routine named
'unix_recvq_full', which consolidates the actual testing code (in three
different places) into a single location.
Signed-off-by: Rainer Weikusat <rweikusat@mssgmbh.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Octavian Purdila [Sat, 28 Jun 2008 00:27:21 +0000 (17:27 -0700)]
tcp: fix for splice receive when used with software LRO
If an skb has nr_frags set to zero but its frag_list is not empty (as
it can happen if software LRO is enabled), and a previous
tcp_read_sock has consumed the linear part of the skb, then
__skb_splice_bits:
(a) incorrectly reports an error and
(b) forgets to update the offset to account for the linear part
Any of the two problems will cause the subsequent __skb_splice_bits
call (the one that handles the frag_list skbs) to either skip data,
or, if the unadjusted offset is greater then the size of the next skb
in the frag_list, make tcp_splice_read loop forever.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tcp: calculate tcp_mem based on low memory instead of all memory
The tcp_mem array which contains limits on the total amount of memory
used by TCP sockets is calculated based on nr_all_pages. On a 32 bits
x86 system, we should base this on the number of lowmem pages.
Signed-off-by: Miquel van Smoorenburg <miquels@cistron.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
mac80211: fix an oops in several failure paths in key allocation
This patch fixes an oops in several failure paths in key allocation. This
Oops occurs when freeing a key that has not been linked yet, so the
key->sdata is not set.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Ivo van Doorn [Wed, 25 Jun 2008 19:27:00 +0000 (21:27 +0200)]
rt2x00: Fix lock dependency errror
This fixes a circular locking dependency in the workqueue handling.
The interface work task uses the mac80211 function
ieee80211_iterate_active_interfaces() which grabs the RTNL lock.
However when the interface is brough down, this happens under the RTNL
lock as well, this causes problems because mac80211 will flush the workqueue
during the ifdown event. This causes mac80211 to wait until the driver has
completed all work which can't finish because it is waiting on the RTNL lock.
This is fixed by moving rt2x00 workqueue tasks on a different workqueue,
this workqueue can be flushed when the ieee80211_hw structure is removed
by the driver (when the driver is unloaded) which does not happen under the
RTNL lock.
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Andi Kleen [Wed, 18 Jun 2008 11:58:36 +0000 (13:58 +0200)]
[netdrvr] Fix IOMMU overflow checking in s2io.c
s2io has IOMMU overflow checking, but unfortunately it is wrong.
It didn't use the standard macros, which meant that it only worked
on POWER and SPARC because only those define DMA_ERROR_CODE. Convert it to
use the standard macros instead.
I also commented two more bugs in the IOMMU handling. It assumes
that 0 DMA addresses cannot happen, but that's not true in all IOMMU setups.
The information if a buffer has been already mapped needs to be stored
elsewhere.
Didn't fix those because it needs careful checking of the buffer handling
by the maintainers.
Andy Gospodarek [Thu, 19 Jun 2008 21:19:02 +0000 (17:19 -0400)]
e1000: only enable TSO6 via ethtool when using correct hardware
When enabling TSO via ethool on e1000, it is possible to set
NETIF_F_TSO6 on hardware that does not support it. Setting TSO via
ethtool now matches the settings used when the hardware is probed.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Al Viro [Mon, 23 Jun 2008 01:04:50 +0000 (02:04 +0100)]
[netdrvr] netxen: fix netxen_pci_tbl[] breakage
PCI_DEVICE_CLASS sets .device and .vendor to PCI_ANY_DEV,
which overrides the effect of preceding PCI_DEVICE() and makes
all elements of netxen_pci_tbl[] identical. Introduced in the
commit dcd56fdbaeae1008044687b973c4a3e852e8a726.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Ingo Molnar [Mon, 23 Jun 2008 08:41:23 +0000 (10:41 +0200)]
[netdrvr] 3c59x: remove irqs_disabled warning from local_bh_enable
Original Author: Michael Buesch <mb@bu3sch.de>
net, vortex: fix lockup
Ingo Molnar reported:
-tip testing found that Johannes Berg's "softirq: remove irqs_disabled
warning from local_bh_enable" enhancement to lockdep triggers a new
warning on an old testbox that uses 3c59x vortex and netlogging:
----->
calling vortex_init+0x0/0xb0
PCI: Found IRQ 10 for device 0000:00:0b.0
PCI: Sharing IRQ 10 with 0000:00:0a.0
PCI: Sharing IRQ 10 with 0000:00:0b.1
3c59x: Donald Becker and others.
0000:00:0b.0: 3Com PCI 3c556 Laptop Tornado at e0800400.
PCI: Enabling bus mastering for device 0000:00:0b.0
initcall vortex_init+0x0/0xb0 returned 0 after 47 msecs
...
calling init_netconsole+0x0/0x1b0
netconsole: local port 4444
netconsole: local IP 10.0.1.9
netconsole: interface eth0
netconsole: remote port 4444
netconsole: remote IP 10.0.1.16
netconsole: remote ethernet address 00:19:xx:xx:xx:xx
netconsole: device eth0 not up yet, forcing it
eth0: setting half-duplex.
eth0: setting full-duplex.
------------[ cut here ]------------
WARNING: at kernel/softirq.c:137 local_bh_enable_ip+0xd1/0xe0()
Pid: 1, comm: swapper Not tainted 2.6.26-rc6-tip #2091
[<c0125ecf>] warn_on_slowpath+0x4f/0x70
[<c0126834>] ? release_console_sem+0x1b4/0x1d0
[<c0126d00>] ? vprintk+0x2a0/0x450
[<c012fde5>] ? __mod_timer+0xa5/0xc0
[<c046f7fd>] ? mdio_sync+0x3d/0x50
[<c0160ef6>] ? marker_probe_cb+0x46/0xa0
[<c0126ed7>] ? printk+0x27/0x50
[<c046f4c3>] ? vortex_set_duplex+0x43/0xc0
[<c046f521>] ? vortex_set_duplex+0xa1/0xc0
[<c0471b92>] ? vortex_timer+0xe2/0x3e0
[<c012b361>] local_bh_enable_ip+0xd1/0xe0
[<c08d9f9f>] _spin_unlock_bh+0x2f/0x40
[<c0471b92>] vortex_timer+0xe2/0x3e0
[<c014743b>] ? trace_hardirqs_on+0xb/0x10
[<c0147358>] ? trace_hardirqs_on_caller+0x88/0x160
[<c012f8b2>] run_timer_softirq+0x162/0x1c0
[<c0471ab0>] ? vortex_timer+0x0/0x3e0
[<c012b361>] local_bh_enable_ip+0xd1/0xe0
[<c08d9f9f>] _spin_unlock_bh+0x2f/0x40
[<c0471b92>] vortex_timer+0xe2/0x3e0
[<c014743b>] ? trace_hardirqs_on+0xb/0x10
[<c0147358>] ? trace_hardirqs_on_caller+0x88/0x160
[<c012f8b2>] run_timer_softirq+0x162/0x1c0
[<c0471ab0>] ? vortex_timer+0x0/0x3e0
[<c0471ab0>] ? vortex_timer+0x0/0x3e0
[<c012b60a>] __do_softirq+0x9a/0x160
[<c012b570>] ? __do_softirq+0x0/0x160
[<c0106775>] call_on_stack+0x15/0x30
[<c012b4f5>] ? irq_exit+0x55/0x60
[<c0106e85>] ? do_IRQ+0x85/0xd0
[<c0147391>] ? trace_hardirqs_on_caller+0xc1/0x160
[<c0104888>] ? common_interrupt+0x28/0x30
[<c08d8ac8>] ? mutex_unlock+0x8/0x10
[<c08d8180>] ? _cond_resched+0x10/0x30
[<c07a3be7>] ? netpoll_setup+0x117/0x390
[<c0cbfcfe>] ? init_netconsole+0x14e/0x1b0
[<c013d539>] ? ktime_get+0x19/0x40
[<c0c9bab2>] ? kernel_init+0x1b2/0x2c0
[<c0cbfbb0>] ? init_netconsole+0x0/0x1b0
[<c0396aa4>] ? trace_hardirqs_on_thunk+0xc/0x10
[<c0103f12>] ? restore_nocheck_notrace+0x0/0xe
[<c0c9b900>] ? kernel_init+0x0/0x2c0
[<c0c9b900>] ? kernel_init+0x0/0x2c0
[<c0104aa7>] ? kernel_thread_helper+0x7/0x10
=======================
---[ end trace 37f9c502aff112e0 ]---
console [netcon0] enabled
netconsole: network logging started
initcall init_netconsole+0x0/0x1b0 returned 0 after 2914 msecs
looking at the driver I think the bug is real and the fix actually
is trivial.
vp->lock is also taken in hardware IRQ context, so we _have_ to always
use irqsafe locking. As we run in a timer with IRQs disabled,
we can simply use spin_lock.
Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Atsushi Nemoto [Thu, 26 Jun 2008 08:14:15 +0000 (17:14 +0900)]
tc35815: Fix receiver hangup on Rx FIFO overflow
On Rx FIFO overflow error, the controller consume a buffer descriptor
but currently the driver does not give it back to the controller.
This results unrecoverable 'Buffer List Exhausted' condition. This
patch fix this problem by moving a "fbl_count--" line to proper place.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Michal Schmidt [Thu, 26 Jun 2008 14:06:19 +0000 (16:06 +0200)]
s2io: fix documentation about intr_type
The documentation for intr_type module parameter of the s2io driver is
not consistent with the code. The comments in drivers/net/s2io.c are
OK, but Documentation/networking/s2io.txt is wrong.
Pointed out by Andrew Hecox.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Ron Rindjunsky [Wed, 25 Jun 2008 08:46:31 +0000 (16:46 +0800)]
iwlwifi: improve scanning band selection management
This patch modifies the band selection management when scanning, so
bands are now scanned according to HW band support.
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Ivo van Doorn [Fri, 20 Jun 2008 20:11:00 +0000 (22:11 +0200)]
rt2x00: Fix unbalanced mutex locking
The usb_cache_mutex was not correctly released
under all circumstances. Both rt73usb as rt2500usb
didn't release the mutex under certain conditions
when the register access failed. Obviously such
failure would lead to deadlocks.
In addition under similar circumstances when the
bbp register couldn't be read the value must be
set to 0xff to indicate that the value is wrong.
This too didn't happen under all circumstances.
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Michael Buesch [Fri, 20 Jun 2008 09:40:46 +0000 (11:40 +0200)]
b43legacy: Fix possible NULL pointer dereference in DMA code
This fixes a possible NULL pointer dereference in an error path of the
DMA allocation error checking code. This is also necessary for a future
DMA API change that is on its way into the mainline kernel that adds
an additional dev parameter to dma_mapping_error().
Signed-off-by: Michael Buesch <mb@bu3sch.de> Cc: stable <stable@kernel.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Michael Buesch [Sun, 15 Jun 2008 14:01:24 +0000 (16:01 +0200)]
b43: Fix possible MMIO access while device is down
This fixes a possible MMIO access while the device is still down
from a suspend cycle. MMIO accesses with the device powered down
may cause crashes on certain devices.
Signed-off-by: Michael Buesch <mb@bu3sch.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Michael Buesch [Sun, 15 Jun 2008 13:17:29 +0000 (15:17 +0200)]
b43: Do not return TX_BUSY from op_tx
Never return TX_BUSY from op_tx. It doesn't make sense to return
TX_BUSY, if we can not transmit the packet.
Drop the packet and return TX_OK.
This will fix the resume hang.
Signed-off-by: Michael Buesch <mb@bu3sch.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Tony Vroon [Wed, 11 Jun 2008 20:23:56 +0000 (16:23 -0400)]
mac80211: implement EU regulatory domain
Implement missing EU regulatory domain for mac80211. Based on the
information in IEEE 802.11-2007 (specifically pages 1142, 1143 & 1148)
and ETSI 301 893 (V1.4.1).
With thanks to Johannes Berg.
Signed-off-by: Tony Vroon <tony@linx.net> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Wendy Xiong [Tue, 24 Jun 2008 03:36:22 +0000 (20:36 -0700)]
bnx2x: Add PCIE EEH support
Add PCI recovery functions to the driver. The initial PCI state is
also saved so the MSI state can be restored during PCI recovery.
Signed-off-by: Wendy Xiong <wendyx@us.ibm.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yitchak Gertner [Tue, 24 Jun 2008 03:35:51 +0000 (20:35 -0700)]
bnx2x: Enhanced self test
Added registers, memories, loopback, nvram, interrupt and link tests to
the self-test
Signed-off-by: Yitchak Gertner <gertner@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eilon Greenstein [Tue, 24 Jun 2008 03:35:13 +0000 (20:35 -0700)]
bnx2x: Re-factor Tx code
Add support for IPv6 TSO
Re-factor the Tx code with smaller functions to increase readability.
Add linearization code in case packet is too fragmented for the
microcode to handle.
Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The TPA stands for Transparent Packet Aggregation. When enabled, the FW
aggregate in-order TCP packets according to the 4-tuple match and sends
1 big packet to the driver. This packet is stored on an SGL in which
each SGE is 1 page. The FW also implements a timeout algorithm and it
honors all TCP flag, including the push flag as a trigger to halt
aggregation.
After receiving Ben Hutchings comments, we also added ethtool support,
so now, thanks to Ben's patch, when forwarding is enabled, our
aggregation is turned off using the LRO flags.
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yitchak Gertner [Tue, 24 Jun 2008 03:33:36 +0000 (20:33 -0700)]
bnx2x: New statistics code
To avoid race conditions with link up/down and driver up/down - the
statistics handling was re-written in a form of state machine.
Also supporting statistics for 57711
Signed-off-by: Yitchak Gertner <gertner@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eilon Greenstein [Tue, 24 Jun 2008 03:33:01 +0000 (20:33 -0700)]
bnx2x: Add support for BCM57711 HW
Supporting the 57711 and 57711E - refers to in the code as E1H. The
57710 is referred to as E1.
To support the new members in the family, the bnx2x structure was
divided to 3 parts: common, port and function. These changes caused some
rearrangement in the bnx2x.h file.
A set of accessories macros were added to make access to the bnx2x
structure more readable
Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eilon Greenstein [Tue, 24 Jun 2008 03:29:02 +0000 (20:29 -0700)]
bnx2x: New init infrastructure
This new initialization code supports the 57711 HW. It also supports
the emulation and FPGA for the 57711 and 57710 initializations values
(very small amount of code which is very helpful in the lab - less
than 30 lines).
The initialization is done via DMAE after the DMAE block is ready -
before it is ready, some of the initialization is done via PCI
configuration transactions (referred to as indirect write). A mutex
to protect the DMAE from being overlapped was added. There are few
new registers which needs to be initialized by SW - the full comment
for those registers is added to the register file. A place holder for
the 57711 (referred to as E1H) microcode was added- the microcode
itself is too big and it is split over the following 4 patches
Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yaniv Rosner [Tue, 24 Jun 2008 03:27:52 +0000 (20:27 -0700)]
bnx2x: New link code
New Link code:
Moving all the link related code (including the calculations, the
initialization of the MAC and PHY and the external PHY's code) into
a separated file. The changes from the code that used to be part of
bnx2x.c (now called bnx2x_main.c) are:
- Using separate structures for link inputs and link outputs to clearly
identify what was configured and what is the outcome
- Adding code to read external PHY FW version and print it as part of
ethtool -i
- Adding code to upgrade external PHY FW from ethtool -E with special
magic number - Changing the link down indication to ERR level
- Adding a lock on all PHY access to prevent an interrupt and
setting changes to overlap
- Adding support for emulation and FPGA (small chunk of code that really
helps in the lab) - Adding support for 1G on BCM8706 PHY
- Adding clear debug print incase of fan failure (the PHY type is now
"failure")
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yaniv Rosner [Tue, 24 Jun 2008 03:27:26 +0000 (20:27 -0700)]
bnx2x: Adding bnx2x_link
This patch is int the new bnx2x_link files (C and H). The files are
still not used in this patch, only in the next one so the patch will
be small enough for the mailing list.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilong Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Receiving packets while we are cleaning up a network namespace is a
racy proposition. It is possible when the packet arrives that we have
removed some but not all of the state we need to fully process it. We
have the choice of either playing wack-a-mole with the cleanup routines
or simply dropping packets when we don't have a network namespace to
handle them.
Since the check looks inexpensive in netif_receive_skb let's just
drop the incoming packets.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fix warning:
drivers/net/pppoe.c: In function 'pppoe_recvmsg':
drivers/net/pppoe.c:945: warning: comparison of distinct pointer types lacks a cast
because skb->len is unsigned int and total_len is size_t
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 19 Jun 2008 23:41:57 +0000 (16:41 -0700)]
bnx2: Use one handler for all MSI-X vectors.
Use the same MSI-X handler to schedule NAPI. Change the dev_instance
void pointer to the bnx2_napi struct instead so we can have the proper
context for each MSI-X vector.
Add a new bnx2_poll_msix() that is optimized for handling MSI-X
NAPI polling of rx/tx work only. Remove the old bnx2_tx_poll() that
is no longer needed. Each MSI-X vector handles 1 tx and 1 rx ring.
The first vector handles link events as well.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Benjamin Li <benli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 19 Jun 2008 23:41:08 +0000 (16:41 -0700)]
bnx2: Optimize fast-path tx and rx work.
Add hw_tx_cons_ptr and hw_rx_cons_ptr to speed up the retreival of
the tx and rx consumer index, since the MSI-X and default status
blocks have different structures.
Combine status_blk and status_blk_msix into a union. We'll only use
one type of status block for each vector.
Separate the code to detect more rx and tx work from the code to
detect link related work.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Benjamin Li <benli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 19 Jun 2008 23:38:19 +0000 (16:38 -0700)]
bnx2: Put rx ring variables in a separate struct.
In preparation for multi-ring support, rx ring variables are now put
in a separate bnx2_rx_ring_info struct. With MSI-X, we can support
multiple rx rings.
The functions to allocate/free rx memory and to initialize rx rings
are now modified to handle multiple rings.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Benjamin Li <benli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 19 Jun 2008 23:37:42 +0000 (16:37 -0700)]
bnx2: Put tx ring variables in a separate struct.
In preparation for multi-ring support, tx ring variables are now put
in a separate bnx2_tx_ring_info struct. Multi tx ring will not be
enabled until it is fully supported by the stack. Only 1 tx ring
will be used at the moment.
The functions to allocate/free tx memory and to initialize tx rings
are now modified to handle multiple rings.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Benjamin Li <benli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: Drop packets for loopback address from outside of the box.
[ Based upon original report and patch by Karsten Keil. Karsten
has verified that this fixes the TAHI test case "ICMPv6 test
v6LC.5.1.2 Part F". -DaveM ]
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Shan Wei [Thu, 19 Jun 2008 23:29:39 +0000 (16:29 -0700)]
ipv6: Remove options header when setsockopt's optlen is 0
Remove the sticky Hop-by-Hop options header by calling setsockopt()
for IPV6_HOPOPTS with a zero option length, per RFC3542.
Routing header and Destination options header does the same as
Hop-by-Hop options header.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Thu, 19 Jun 2008 23:15:47 +0000 (16:15 -0700)]
net: Disable LRO on devices that are forwarding
Large Receive Offload (LRO) is only appropriate for packets that are
destined for the host, and should be disabled if received packets may be
forwarded. It can also confuse the GSO on output.
Add dev_disable_lro() function which uses the appropriate ethtool ops to
disable LRO if enabled.
Add calls to dev_disable_lro() in br_add_if() and functions that enable
IPv4 and IPv6 forwarding.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Thu, 19 Jun 2008 23:08:18 +0000 (16:08 -0700)]
sctp: Follow security requirement of responding with 1 packet
RFC 4960, Section 11.4. Protection of Non-SCTP-Capable Hosts
When an SCTP stack receives a packet containing multiple control or
DATA chunks and the processing of the packet requires the sending of
multiple chunks in response, the sender of the response chunk(s) MUST
NOT send more than one packet. If bundling is supported, multiple
response chunks that fit into a single packet MAY be bundled together
into one single response packet. If bundling is not supported, then
the sender MUST NOT send more than one response chunk and MUST
discard all other responses. Note that this rule does NOT apply to a
SACK chunk, since a SACK chunk is, in itself, a response to DATA and
a SACK does not require a response of more DATA.
We implement this by not servicing our outqueue until we reach the end
of the packet. This enables maximum bundling. We also identify
'response' chunks and make sure that we only send 1 packet when sending
such chunks.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Thu, 19 Jun 2008 23:07:48 +0000 (16:07 -0700)]
sctp: Validate Initiate Tag when handling ICMP message
This patch add to validate initiate tag and chunk type if verification
tag is 0 when handling ICMP message.
RFC 4960, Appendix C. ICMP Handling
ICMP6) An implementation MUST validate that the Verification Tag
contained in the ICMP message matches the Verification Tag of the peer.
If the Verification Tag is not 0 and does NOT match, discard the ICMP
message. If it is 0 and the ICMP message contains enough bytes to
verify that the chunk type is an INIT chunk and that the Initiate Tag
matches the tag of the peer, continue with ICMP7. If the ICMP message
is too short or the chunk type or the Initiate Tag does not match,
silently discard the packet.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Wed, 18 Jun 2008 22:39:48 +0000 (15:39 -0700)]
mac80211: detect driver tx bugs
When a driver rejects a frame in it's ->tx() callback, it must also
stop queues, otherwise mac80211 can go into a loop here. Detect this
situation and abort the loop after five retries, warning about the
driver bug.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 18 Jun 2008 09:07:07 +0000 (02:07 -0700)]
netlink: genl: fix circular locking
genetlink has a circular locking dependency when dumping the registered
families:
- dump start:
genl_rcv() : take genl_mutex
genl_rcv_msg() : call netlink_dump_start() while holding genl_mutex
netlink_dump_start(),
netlink_dump() : take nlk->cb_mutex
ctrl_dumpfamily() : try to detect this case and not take genl_mutex a
second time
- dump continuance:
netlink_rcv() : call netlink_dump
netlink_dump : take nlk->cb_mutex
ctrl_dumpfamily() : take genl_mutex
Register genl_lock as callback mutex with netlink to fix this. This slightly
widens an already existing module unload race, the genl ops used during the
dump might go away when the module is unloaded. Thomas Graf is working on a
seperate fix for this.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Wang Chen [Wed, 18 Jun 2008 08:48:28 +0000 (01:48 -0700)]
netdevice: Fix promiscuity and allmulti overflow
Max of promiscuity and allmulti plus positive @inc can cause overflow.
Fox example: when allmulti=0xFFFFFFFF, any caller give dev_set_allmulti() a
positive @inc will cause allmulti be off.
This is not what we want, though it's rare case.
The fix is that only negative @inc will cause allmulti or promiscuity be off
and when any caller makes the counters touch the roof, we return error.
Change of v2:
Change void function dev_set_promiscuity/allmulti to return int.
So callers can get the overflow error.
Caller's fix will be done later.
Change of v3:
1. Since we return error to caller, we don't need to print KERN_ERROR,
KERN_WARNING is enough.
2. In dev_set_promiscuity(), if __dev_set_promiscuity() failed, we
return at once.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
The problem is that the mac80211 stack not only needs to be able to
muck with the link-level headers, it also might need to mangle all of
the packet data if doing sw wireless encryption.
This fixes kernel bugzilla #10903. Thanks to Didier Raboud (for the
bugzilla report), Andrew Prince (for bisecting), Johannes Berg (for
bringing this bisection analysis to my attention), and Ilpo (for
trying to analyze this purely from the TCP side).
In 2.6.27 we can take another stab at this, by using something like
skb_cow_data() when the TX path of mac80211 ends up with a non-NULL
tx->key. The ESP protocol code in the IPSEC stack can be used as a
model for implementation.
Signed-off-by: David S. Miller <davem@davemloft.net>
Rainer Weikusat [Wed, 18 Jun 2008 05:28:05 +0000 (22:28 -0700)]
af_unix: fix 'poll for write'/ connected DGRAM sockets
The unix_dgram_sendmsg routine implements a (somewhat crude)
form of receiver-imposed flow control by comparing the length of the
receive queue of the 'peer socket' with the max_ack_backlog value
stored in the corresponding sock structure, either blocking
the thread which caused the send-routine to be called or returning
EAGAIN. This routine is used by both SOCK_DGRAM and SOCK_SEQPACKET
sockets. The poll-implementation for these socket types is
datagram_poll from core/datagram.c. A socket is deemed to be writeable
by this routine when the memory presently consumed by datagrams
owned by it is less than the configured socket send buffer size. This
is always wrong for connected PF_UNIX non-stream sockets when the
abovementioned receive queue is currently considered to be full.
'poll' will then return, indicating that the socket is writeable, but
a subsequent write result in EAGAIN, effectively causing an
(usual) application to 'poll for writeability by repeated send request
with O_NONBLOCK set' until it has consumed its time quantum.
The change below uses a suitably modified variant of the datagram_poll
routines for both type of PF_UNIX sockets, which tests if the
recv-queue of the peer a socket is connected to is presently
considered to be 'full' as part of the 'is this socket
writeable'-checking code. The socket being polled is additionally
put onto the peer_wait wait queue associated with its peer, because the
unix_dgram_sendmsg routine does a wake up on this queue after a
datagram was received and the 'other wakeup call' is done implicitly
as part of skb destruction, meaning, a process blocked in poll
because of a full peer receive queue could otherwise sleep forever
if no datagram owned by its socket was already sitting on this queue.
Among this change is a small (inline) helper routine named
'unix_recvq_full', which consolidates the actual testing code (in three
different places) into a single location.
Signed-off-by: Rainer Weikusat <rweikusat@mssgmbh.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 18 Jun 2008 04:26:37 +0000 (21:26 -0700)]
ax25: Fix std timer socket destroy handling.
Tihomir Heidelberg - 9a4gl, reports:
--------------------
I would like to direct you attention to one problem existing in ax.25
kernel since 2.4. If listening socket is closed and its SKB queue is
released but those sockets get weird. Those "unAccepted()" sockets
should be destroyed in ax25_std_heartbeat_expiry, but it will not
happen. And there is also a note about that in ax25_std_timer.c:
/* Magic here: If we listen() and a new link dies before it
is accepted() it isn't 'dead' so doesn't get removed. */
This issue cause ax25d to stop accepting new connections and I had to
restarted ax25d approximately each day and my services were unavailable.
Also netstat -n -l shows invalid source and device for those listening
sockets. It is strange why ax25d's listening socket get weird because of
this issue, but definitely when I solved this bug I do not have problems
with ax25d anymore and my ax25d can run for months without problems.
--------------------
Actually as far as I can see, this problem is even in releases
as far back as 2.2.x as well.
It seems senseless to special case this test on TCP_LISTEN state.
Anything still stuck in state 0 has no external references and
we can just simply kill it off directly.
Signed-off-by: David S. Miller <davem@davemloft.net>
Wang Chen [Wed, 18 Jun 2008 04:12:48 +0000 (21:12 -0700)]
netdevice: change net_device->promiscuity/allmulti to unsigned int
The comments of dev_set_allmulti/promiscuity() is that "While the count in
the device remains above zero...". So negative count is useless.
Fix the type of the counter from "int" to "unsigned int".
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ang Way Chuang [Wed, 18 Jun 2008 04:10:33 +0000 (21:10 -0700)]
tun: Proper handling of IPv6 header in tun driver when TUN_NO_PI is set
By default, tun.c running in TUN_TUN_DEV mode will set the protocol of
packet to IPv4 if TUN_NO_PI is set. My program failed to work when I
assumed that the driver will check the first nibble of packet,
determine IP version and set the appropriate protocol.
Signed-off-by: Ang Way Chuang <wcang@nav6.org> Acked-by: Max Krasnyansky <maxk@qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jay Vosburgh [Sat, 14 Jun 2008 01:12:04 +0000 (18:12 -0700)]
bonding: Allow setting max_bonds to zero
Permit bonding to function rationally if max_bonds is set to
zero. This will load the module, but create no master devices (which can
be created via sysfs).
Requires some change to bond_create_sysfs; currently, the
netdev sysfs directory is determined from the first bonding device created,
but this is no longer possible. Instead, an interface from net/core is
created to create and destroy files in net_class.
Based on a patch submitted by Phil Oester <kernel@linuxaces.com>.
Modified by Jay Vosburgh to fix the sysfs issue mentioned above and to
update the documentation.
Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
bonding: Send more than one gratuitous ARP when slave takes over
This change modifies that support to remove duplicated code,
add support for ARP monitor (the original only supported miimon), clear
the grat ARP counter in bond_close (lest a later "ifconfig up" immediately
start spewing ARPs), and add documentation for the module parameter.
Also updated driver version to 3.3.0.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Or Gerlitz [Sat, 14 Jun 2008 01:12:02 +0000 (18:12 -0700)]
bonding: deliver netdev event for fail-over under the active-backup mode
under active-backup mode and when there's actual new_active slave,
have bond_change_active_slave() call the networking core to deliver
NETDEV_BONDING_FAILOVER event such that the fail-over can be notable
by code outside of the bonding driver such as the RDMA stack and
monitoring tools.
As the correct context of locking appropriate for notifier calls is RTNL
and nothing else, bond->curr_slave_lock and bond->lock are unlocked and
later locked again. This is ensured by the rest of the code to be safe
under backup-mode AND when new_active is not NULL.
Jay Vosburgh modified the original patch for formatting and fixed a
compiler error.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Or Gerlitz [Sat, 14 Jun 2008 01:12:01 +0000 (18:12 -0700)]
bonding: bond_change_active_slave() cleanup under active-backup
simplified the code of bond_change_active_slave() such that under
active-backup mode there's one "if (new_active)" test and the rest
of the code only does extra checks on top of it. This removed an
unneeded "if (bond->send_grat_arp > 0)" check and avoid calling
bond_send_gratuitous_arp when there's no active slave.
Jay Vosburgh made minor coding style changes to the orignal patch.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Or Gerlitz [Sat, 14 Jun 2008 01:12:00 +0000 (18:12 -0700)]
net/core: add NETDEV_BONDING_FAILOVER event
Add NETDEV_BONDING_FAILOVER event to be used in a successive patch
by bonding to announce fail-over for the active-backup mode through the
netdev events notifier chain mechanism. Such an event can be of use for the
RDMA CM (communication manager) to let native RDMA ULPs (eg NFS-RDMA, iSER)
always be aligned with the IP stack, in the sense that they use the same
ports/links as the stack does. More usages can be done to allow monitoring
tools based on netlink events being aware to bonding fail-over.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>