err.no Git - linux-2.6/log

Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:

drivers/net/wireless/iwlwifi/iwl4965-base.c

ipv6 route: Convert rt6_device_match() to use RT6_LOOKUP_F_xxx flags.

The commit 77d16f450ae0452d7d4b009f78debb1294fb435c ("[IPV6] ROUTE:
Unify RT6_F_xxx and RT6_SELECT_F_xxx flags") intended to pass various
routing lookup hints around RT6_LOOKUP_F_xxx flags, but conversion was
missing for rt6_device_match().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlabel: Fix a problem when dumping the default IPv6 static labels

There is a missing "!" in a conditional statement which is causing entries to
be skipped when dumping the default IPv6 static label entries. This can be
demonstrated by running the following:

# netlabelctl unlbl add default address:::1 \
label:system_u:object_r:unlabeled_t:s0
# netlabelctl -p unlbl list

... you will notice that the entry for the IPv6 localhost address is not
displayed but does exist (works correctly, causes collisions when attempting
to add duplicate entries, etc.).

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/inet_lro: remove setting skb->ip_summed when not LRO-able

When an SKB cannot be chained to a session, the current code attempts
to "restore" its ip_summed field from lro_mgr->ip_summed. However,
lro_mgr->ip_summed does not hold the original value; in fact, we'd
better not touch skb->ip_summed since it is not modified by the code
in the path leading to a failure to chain it. Also use a cleaer
comment to the describe the ip_summed field of struct net_lro_mgr.

Issue raised by Or Gerlitz <ogerlitz@voltaire.com>

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet fragments: fix race between inet_frag_find and inet_frag_secret_rebuild

The problem is that while we work w/o the inet_frags.lock even
read-locked the secret rebuild timer may occur (on another CPU, since
BHs are still disabled in the inet_frag_find) and change the rnd seed
for ipv4/6 fragments.

It was caused by my patch fd9e63544cac30a34c951f0ec958038f0529e244
([INET]: Omit double hash calculations in xxx_frag_intern) late
in the 2.6.24 kernel, so this should probably be queued to -stable.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

CONNECTOR: add a proc entry to list connectors

I got a problem when I wanted to check if the kernel supports process
event connector, and It seems there's no way to do this check.

At best I can check if the kernel supports connector or not, by looking
into /proc/net/netlink, or maybe checking the return value of bind() to
see if it's ENOENT.

So it would be useful to add /proc/net/connector to list all supported
connectors:
# cat /proc/net/connector
Name            ID
connector       4294967295:4294967295
cn_proc         1:1
w1              3:1

Changelog:
- fix memory leak: s/seq_release/single_release
- use spin_lock_bh instead of spin_lock_irqsave

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: Fix some doc comments in net/netlink/attr.c

Fix some doc comments to match function and attribute names in
net/netlink/attr.c.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: /proc/net/tcp rto,ato values not scaled properly (v2)

I found another case where we are sending information to userspace
in the wrong HZ scale. This should have been fixed back in 2.5 :-(

This means an ABI change but as it stands there is no way for an application
like ss to get the right value.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

include/linux/netdevice.h: don't export MAX_HEADER to userspace

Due to the CONFIG_'s the value is anyway not correct in userspace.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

pkt_sched: Remove CONFIG_NET_SCH_RR

Commit d62733c8e437fdb58325617c4b3331769ba82d70
([SCHED]: Qdisc changes and sch_rr added for multiqueue)
added a NET_SCH_RR option that was unused since the code
went unconditionally into sch_prio.

Reported-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

pkt_sched: ERR_PTR() ususally encodes an negative errno, not positive.

Note, in the following patch, 'err' is initialized as:

int err = -ENOBUFS;

Signed-off-by: WANG Cong <wcong@critical-links.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdevice: Fix typo of dev_unicast_add() comment

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

af_unix: fix 'poll for write'/connected DGRAM sockets

For n:1 'datagram connections' (eg /dev/log), the unix_dgram_sendmsg
routine implements a form of receiver-imposed flow control by
comparing the length of the receive queue of the 'peer socket' with
the max_ack_backlog value stored in the corresponding sock structure,
either blocking the thread which caused the send-routine to be called
or returning EAGAIN. This routine is used by both SOCK_DGRAM and
SOCK_SEQPACKET sockets. The poll-implementation for these socket types
is datagram_poll from core/datagram.c. A socket is deemed to be
writeable by this routine when the memory presently consumed by
datagrams owned by it is less than the configured socket send buffer
size. This is always wrong for PF_UNIX non-stream sockets connected to
server sockets dealing with (potentially) multiple clients if the
abovementioned receive queue is currently considered to be full.
'poll' will then return, indicating that the socket is writeable, but
a subsequent write result in EAGAIN, effectively causing an (usual)
application to 'poll for writeability by repeated send request with
O_NONBLOCK set' until it has consumed its time quantum.

The change below uses a suitably modified variant of the datagram_poll
routines for both type of PF_UNIX sockets, which tests if the
recv-queue of the peer a socket is connected to is presently
considered to be 'full' as part of the 'is this socket
writeable'-checking code. The socket being polled is additionally
put onto the peer_wait wait queue associated with its peer, because the
unix_dgram_recvmsg routine does a wake up on this queue after a
datagram was received and the 'other wakeup call' is done implicitly
as part of skb destruction, meaning, a process blocked in poll
because of a full peer receive queue could otherwise sleep forever
if no datagram owned by its socket was already sitting on this queue.
Among this change is a small (inline) helper routine named
'unix_recvq_full', which consolidates the actual testing code (in three
different places) into a single location.

Signed-off-by: Rainer Weikusat <rweikusat@mssgmbh.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: fix for splice receive when used with software LRO

If an skb has nr_frags set to zero but its frag_list is not empty (as
it can happen if software LRO is enabled), and a previous
tcp_read_sock has consumed the linear part of the skb, then
__skb_splice_bits:

(a) incorrectly reports an error and

(b) forgets to update the offset to account for the linear part

Any of the two problems will cause the subsequent __skb_splice_bits
call (the one that handles the frag_list skbs) to either skip data,
or, if the unadjusted offset is greater then the size of the next skb
in the frag_list, make tcp_splice_read loop forever.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: calculate tcp_mem based on low memory instead of all memory

The tcp_mem array which contains limits on the total amount of memory
used by TCP sockets is calculated based on nr_all_pages. On a 32 bits
x86 system, we should base this on the number of lowmem pages.

Signed-off-by: Miquel van Smoorenburg <miquels@cistron.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>

hamradio: remove unused variable

Signed-off-by: Andre Haupt <andre@bitwigglers.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

mac80211: fix an oops in several failure paths in key allocation

This patch fixes an oops in several failure paths in key allocation. This
Oops occurs when freeing a key that has not been linked yet, so the
key->sdata is not set.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

prism: islpci_eth.c endianness fix

clock is already cpu-endian (see le32_to_cpu slightly before), so
le64_to_cpu doesn't make much sense.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

rt2x00: Fix lock dependency errror

This fixes a circular locking dependency in the workqueue handling.
The interface work task uses the mac80211 function
ieee80211_iterate_active_interfaces() which grabs the RTNL lock.

However when the interface is brough down, this happens under the RTNL
lock as well, this causes problems because mac80211 will flush the workqueue
during the ifdown event. This causes mac80211 to wait until the driver has
completed all work which can't finish because it is waiting on the RTNL lock.

This is fixed by moving rt2x00 workqueue tasks on a different workqueue,
this workqueue can be flushed when the ieee80211_hw structure is removed
by the driver (when the driver is unloaded) which does not happen under the
RTNL lock.

Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/linville/wireless-2.6

Hold RTNL while calling dev_close()

dev_close() must be called holding the RTNL. Compile-tested only.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

qla3xxx: Hold RTNL while calling dev_close()

dev_close() must be called holding the RTNL. Compile-tested only.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

[netdrvr] Fix IOMMU overflow checking in s2io.c

s2io has IOMMU overflow checking, but unfortunately it is wrong.

It didn't use the standard macros, which meant that it only worked
on POWER and SPARC because only those define DMA_ERROR_CODE. Convert it to
use the standard macros instead.

I also commented two more bugs in the IOMMU handling. It assumes
that 0 DMA addresses cannot happen, but that's not true in all IOMMU setups.
The information if a buffer has been already mapped needs to be stored
elsewhere.

Didn't fix those because it needs careful checking of the buffer handling
by the maintainers.

Cc: ram.vepa@neterion.com
Cc: santosh.rastapur@neterion.com
Cc: sivakumar.subramani@neterion.com
Cc: sreenivasa.honnur@neterion.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

e1000: only enable TSO6 via ethtool when using correct hardware

When enabling TSO via ethool on e1000, it is possible to set
NETIF_F_TSO6 on hardware that does not support it. Setting TSO via
ethtool now matches the settings used when the hardware is probed.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

e100: Do pci_dma_sync after skb_alloc for proper operation on ixp4xx

The E100 device can't work on current kernel (2.6.26-rc6) and will cause
kernel corruption on intel ixdp4xx.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

[netdrvr] netxen: fix netxen_pci_tbl[] breakage

PCI_DEVICE_CLASS sets .device and .vendor to PCI_ANY_DEV,
which overrides the effect of preceding PCI_DEVICE() and makes
all elements of netxen_pci_tbl[] identical. Introduced in the
commit dcd56fdbaeae1008044687b973c4a3e852e8a726.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

[netdrvr] 3c59x: remove irqs_disabled warning from local_bh_enable

Original Author: Michael Buesch <mb@bu3sch.de>

net, vortex: fix lockup

Ingo Molnar reported:

-tip testing found that Johannes Berg's "softirq: remove irqs_disabled
warning from local_bh_enable" enhancement to lockdep triggers a new
warning on an old testbox that uses 3c59x vortex and netlogging:

----->
    calling  vortex_init+0x0/0xb0
    PCI: Found IRQ 10 for device 0000:00:0b.0
    PCI: Sharing IRQ 10 with 0000:00:0a.0
    PCI: Sharing IRQ 10 with 0000:00:0b.1
    3c59x: Donald Becker and others.
    0000:00:0b.0: 3Com PCI 3c556 Laptop Tornado at e0800400.
    PCI: Enabling bus mastering for device 0000:00:0b.0
    initcall vortex_init+0x0/0xb0 returned 0 after 47 msecs
...
    calling  init_netconsole+0x0/0x1b0
    netconsole: local port 4444
    netconsole: local IP 10.0.1.9
    netconsole: interface eth0
    netconsole: remote port 4444
    netconsole: remote IP 10.0.1.16
    netconsole: remote ethernet address 00:19:xx:xx:xx:xx
    netconsole: device eth0 not up yet, forcing it
    eth0:  setting half-duplex.
    eth0:  setting full-duplex.
------------[ cut here ]------------
    WARNING: at kernel/softirq.c:137 local_bh_enable_ip+0xd1/0xe0()
    Pid: 1, comm: swapper Not tainted 2.6.26-rc6-tip #2091
     [<c0125ecf>] warn_on_slowpath+0x4f/0x70
     [<c0126834>] ? release_console_sem+0x1b4/0x1d0
     [<c0126d00>] ? vprintk+0x2a0/0x450
     [<c012fde5>] ? __mod_timer+0xa5/0xc0
     [<c046f7fd>] ? mdio_sync+0x3d/0x50
     [<c0160ef6>] ? marker_probe_cb+0x46/0xa0
     [<c0126ed7>] ? printk+0x27/0x50
     [<c046f4c3>] ? vortex_set_duplex+0x43/0xc0
     [<c046f521>] ? vortex_set_duplex+0xa1/0xc0
     [<c0471b92>] ? vortex_timer+0xe2/0x3e0
     [<c012b361>] local_bh_enable_ip+0xd1/0xe0
     [<c08d9f9f>] _spin_unlock_bh+0x2f/0x40
     [<c0471b92>] vortex_timer+0xe2/0x3e0
     [<c014743b>] ? trace_hardirqs_on+0xb/0x10
     [<c0147358>] ? trace_hardirqs_on_caller+0x88/0x160
     [<c012f8b2>] run_timer_softirq+0x162/0x1c0
     [<c0471ab0>] ? vortex_timer+0x0/0x3e0
     [<c012b361>] local_bh_enable_ip+0xd1/0xe0
     [<c08d9f9f>] _spin_unlock_bh+0x2f/0x40
     [<c0471b92>] vortex_timer+0xe2/0x3e0
     [<c014743b>] ? trace_hardirqs_on+0xb/0x10
     [<c0147358>] ? trace_hardirqs_on_caller+0x88/0x160
     [<c012f8b2>] run_timer_softirq+0x162/0x1c0
     [<c0471ab0>] ? vortex_timer+0x0/0x3e0
     [<c0471ab0>] ? vortex_timer+0x0/0x3e0
     [<c012b60a>] __do_softirq+0x9a/0x160
     [<c012b570>] ? __do_softirq+0x0/0x160
     [<c0106775>] call_on_stack+0x15/0x30
     [<c012b4f5>] ? irq_exit+0x55/0x60
     [<c0106e85>] ? do_IRQ+0x85/0xd0
     [<c0147391>] ? trace_hardirqs_on_caller+0xc1/0x160
     [<c0104888>] ? common_interrupt+0x28/0x30
     [<c08d8ac8>] ? mutex_unlock+0x8/0x10
     [<c08d8180>] ? _cond_resched+0x10/0x30
     [<c07a3be7>] ? netpoll_setup+0x117/0x390
     [<c0cbfcfe>] ? init_netconsole+0x14e/0x1b0
     [<c013d539>] ? ktime_get+0x19/0x40
     [<c0c9bab2>] ? kernel_init+0x1b2/0x2c0
     [<c0cbfbb0>] ? init_netconsole+0x0/0x1b0
     [<c0396aa4>] ? trace_hardirqs_on_thunk+0xc/0x10
     [<c0103f12>] ? restore_nocheck_notrace+0x0/0xe
     [<c0c9b900>] ? kernel_init+0x0/0x2c0
     [<c0c9b900>] ? kernel_init+0x0/0x2c0
     [<c0104aa7>] ? kernel_thread_helper+0x7/0x10
     =======================
---[ end trace 37f9c502aff112e0 ]---
    console [netcon0] enabled
    netconsole: network logging started
    initcall init_netconsole+0x0/0x1b0 returned 0 after 2914 msecs

looking at the driver I think the bug is real and the fix actually
is trivial.

vp->lock is also taken in hardware IRQ context, so we _have_ to always
use irqsafe locking. As we run in a timer with IRQs disabled,
we can simply use spin_lock.

Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ipg: use NULL, not zero, for pointers

Fixes a sparse warning in a code block that's hidden under JUMBO_FRAME #ifdef.

Tested-by: Andrew Savchenko <Bircoph@list.ru>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ipg: fix jumbo frame compilation

Make jumbo frame support compile again. It was broken by the cleanup series
before the merge because the code is hidden under JUMBO_FRAME #ifdef.

Tested-by: Andrew Savchenko <Bircoph@list.ru>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

drivers/net/r6040.c: Eliminate double sizeof

Taking sizeof the result of sizeof is quite strange and does not seem to be
what is wanted here.

This was fixed using the following semantic patch.
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@@
expression E;
@@

- sizeof (
sizeof (E)
- )
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

pcnet_cs, axnet_cs: clear bogus interrupt before request_irq

Signed-off-by: Komuro <komurojun-mbn@nifty.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

e1000e: fix EEH recovery during reset on PPC

EEH is not recovering in a reasonable amount of time on PPC during
e1000e_down().

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

igb: fix EEH recovery during reset on PPC

EEH is not recovering in a reasonable amount of time on PPC during
igb_down().

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ixgbe: fix EEH recovery during reset on PPC

EEh is not recovering in a resonable amount of time on PPC during
ixgbe_down().

Signed-off-by: Paul Larson <pl@us.ibm.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

tc35815: Fix receiver hangup on Rx FIFO overflow

On Rx FIFO overflow error, the controller consume a buffer descriptor
but currently the driver does not give it back to the controller.
This results unrecoverable 'Buffer List Exhausted' condition. This
patch fix this problem by moving a "fbl_count--" line to proper place.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

tc35815: Mark carrier-off before starting PHY

Call netif_carrier_off() before starting PHY device. This is a
behavior before converting to generic PHY layer.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

s2io: fix documentation about intr_type

The documentation for intr_type module parameter of the s2io driver is
not consistent with the code. The comments in drivers/net/s2io.c are
OK, but Documentation/networking/s2io.txt is wrong.

Pointed out by Andrew Hecox.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

iwlwifi: improve scanning band selection management

This patch modifies the band selection management when scanning, so
bands are now scanned according to HW band support.

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

rt2x00: Fix unbalanced mutex locking

The usb_cache_mutex was not correctly released
under all circumstances. Both rt73usb as rt2500usb
didn't release the mutex under certain conditions
when the register access failed. Obviously such
failure would lead to deadlocks.

In addition under similar circumstances when the
bbp register couldn't be read the value must be
set to 0xff to indicate that the value is wrong.
This too didn't happen under all circumstances.

Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

b43legacy: Fix possible NULL pointer dereference in DMA code

This fixes a possible NULL pointer dereference in an error path of the
DMA allocation error checking code. This is also necessary for a future
DMA API change that is on its way into the mainline kernel that adds
an additional dev parameter to dma_mapping_error().

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Cc: stable <stable@kernel.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

b43: Fix possible MMIO access while device is down

This fixes a possible MMIO access while the device is still down
from a suspend cycle. MMIO accesses with the device powered down
may cause crashes on certain devices.

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

b43legacy: Do not return TX_BUSY from op_tx

Never return TX_BUSY from op_tx. It doesn't make sense to return
TX_BUSY, if we can not transmit the packet.
Drop the packet and return TX_OK.

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

b43: Do not return TX_BUSY from op_tx

Never return TX_BUSY from op_tx. It doesn't make sense to return
TX_BUSY, if we can not transmit the packet.
Drop the packet and return TX_OK.
This will fix the resume hang.

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

mac80211: implement EU regulatory domain

Implement missing EU regulatory domain for mac80211. Based on the
information in IEEE 802.11-2007 (specifically pages 1142, 1143 & 1148)
and ETSI 301 893 (V1.4.1).
With thanks to Johannes Berg.

Signed-off-by: Tony Vroon <tony@linx.net>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

netfilter: ip6table_mangle: don't reroute in LOCAL_IN

Rerouting should only happen in LOCAL_OUT, in INPUT its useless
since the packet has already chosen its final destination.

Noticed by Alexey Dobriyan <adobriyan@gmail.com>.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Update version

Updating to version 1.45.6

Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Add PCIE EEH support

Add PCI recovery functions to the driver. The initial PCI state is
also saved so the MSI state can be restored during PCI recovery.

Signed-off-by: Wendy Xiong <wendyx@us.ibm.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Enhanced self test

Added registers, memories, loopback, nvram, interrupt and link tests to
the self-test

Signed-off-by: Yitchak Gertner <gertner@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Re-factor Tx code

Add support for IPv6 TSO
Re-factor the Tx code with smaller functions to increase readability.
Add linearization code in case packet is too fragmented for the
microcode to handle.

Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Add TPA, Broadcoms HW LRO

The TPA stands for Transparent Packet Aggregation. When enabled, the FW
aggregate in-order TCP packets according to the 4-tuple match and sends
1 big packet to the driver. This packet is stored on an SGL in which
each SGE is 1 page. The FW also implements a timeout algorithm and it
honors all TCP flag, including the push flag as a trigger to halt
aggregation.

After receiving Ben Hutchings comments, we also added ethtool support,
so now, thanks to Ben's patch, when forwarding is enabled, our
aggregation is turned off using the LRO flags.

Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: New statistics code

To avoid race conditions with link up/down and driver up/down - the
statistics handling was re-written in a form of state machine.
Also supporting statistics for 57711

Signed-off-by: Yitchak Gertner <gertner@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Add support for BCM57711 HW

Supporting the 57711 and 57711E - refers to in the code as E1H. The
57710 is referred to as E1.

To support the new members in the family, the bnx2x structure was
divided to 3 parts: common, port and function. These changes caused some
rearrangement in the bnx2x.h file.

A set of accessories macros were added to make access to the bnx2x
structure more readable

Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: New microcode part 3/3

The new Microcode BLOB - broken into a separate patch to make it small
enough for the mailing list

Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: New microcode part 2/3

The new Microcode BLOB - broken into a separate patch to make it small
enough for the mailing list

Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: New microcode part 1/3

The new Microcode BLOB - broken into a separate patch to make it small
enough for the mailing list

Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Remove old microcode

Removing the old Microcode from the BLOB - broken into a separate
patch to make it small enough for the mailing list

Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: New init infrastructure

This new initialization code supports the 57711 HW. It also supports
the emulation and FPGA for the 57711 and 57710 initializations values
(very small amount of code which is very helpful in the lab - less
than 30 lines).

The initialization is done via DMAE after the DMAE block is ready -
before it is ready, some of the initialization is done via PCI
configuration transactions (referred to as indirect write).  A mutex
to protect the DMAE from being overlapped was added.  There are few
new registers which needs to be initialized by SW - the full comment
for those registers is added to the register file.  A place holder for
the 57711 (referred to as E1H) microcode was added- the microcode
itself is too big and it is split over the following 4 patches

Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: New link code

New Link code:
Moving all the link related code (including the calculations, the
initialization of the MAC and PHY and the external PHY's code) into
a separated file. The changes from the code that used to be part of
bnx2x.c (now called bnx2x_main.c) are:
- Using separate structures for link inputs and link outputs to clearly
  identify what was configured and what is the outcome
- Adding code to read external PHY FW version and print it as part of
  ethtool -i
- Adding code to upgrade external PHY FW from ethtool -E with special
  magic number - Changing the link down indication to ERR level
- Adding a lock on all PHY access to prevent an interrupt and
  setting changes to overlap
- Adding support for emulation and FPGA (small chunk of code that really
  helps in the lab) - Adding support for 1G on BCM8706 PHY
- Adding clear debug print incase of fan failure (the PHY type is now
  "failure")

Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Adding bnx2x_link

This patch is int the new bnx2x_link files (C and H). The files are
still not used in this patch, only in the next one so the patch will
be small enough for the mailing list.

Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilong Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Rename bnx2x.c to bnx2x_main.c

This patch is the rename of bnx2x.c to bnx2x_main.c.

Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns: Don't receive new packets in a dead network namespace.

Alexey Dobriyan <adobriyan@gmail.com> writes:
> Subject: ICMP sockets destruction vs ICMP packets oops

> After icmp_sk_exit() nuked ICMP sockets, we get an interrupt.
> icmp_reply() wants ICMP socket.
>
> Steps to reproduce:
>
> launch shell in new netns
> move real NIC to netns
> setup routing
> ping -i 0
> exit from shell
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> IP: [<ffffffff803fce17>] icmp_sk+0x17/0x30
> PGD 17f3cd067 PUD 17f3ce067 PMD 0
> Oops: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC
> CPU 0
> Modules linked in: usblp usbcore
> Pid: 0, comm: swapper Not tainted 2.6.26-rc6-netns-ct #4
> RIP: 0010:[<ffffffff803fce17>]  [<ffffffff803fce17>] icmp_sk+0x17/0x30
> RSP: 0018:ffffffff8057fc30  EFLAGS: 00010286
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff81017c7db900
> RDX: 0000000000000034 RSI: ffff81017c7db900 RDI: ffff81017dc41800
> RBP: ffffffff8057fc40 R08: 0000000000000001 R09: 000000000000a815
> R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff8057fd28
> R13: ffffffff8057fd00 R14: ffff81017c7db938 R15: ffff81017dc41800
> FS:  0000000000000000(0000) GS:ffffffff80525000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 000000017fcda000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper (pid: 0, threadinfo ffffffff8053a000, task ffffffff804fa4a0)
> Stack:  0000000000000000 ffff81017c7db900 ffffffff8057fcf0 ffffffff803fcfe4
>  ffffffff804faa38 0000000000000246 0000000000005a40 0000000000000246
>  000000000001ffff ffff81017dd68dc0 0000000000005a40 0000000055342436
> Call Trace:
>  <IRQ>  [<ffffffff803fcfe4>] icmp_reply+0x44/0x1e0
>  [<ffffffff803d3a0a>] ? ip_route_input+0x23a/0x1360
>  [<ffffffff803fd645>] icmp_echo+0x65/0x70
>  [<ffffffff803fd300>] icmp_rcv+0x180/0x1b0
>  [<ffffffff803d6d84>] ip_local_deliver+0xf4/0x1f0
>  [<ffffffff803d71bb>] ip_rcv+0x33b/0x650
>  [<ffffffff803bb16a>] netif_receive_skb+0x27a/0x340
>  [<ffffffff803be57d>] process_backlog+0x9d/0x100
>  [<ffffffff803bdd4d>] net_rx_action+0x18d/0x250
>  [<ffffffff80237be5>] __do_softirq+0x75/0x100
>  [<ffffffff8020c97c>] call_softirq+0x1c/0x30
>  [<ffffffff8020f085>] do_softirq+0x65/0xa0
>  [<ffffffff80237af7>] irq_exit+0x97/0xa0
>  [<ffffffff8020f198>] do_IRQ+0xa8/0x130
>  [<ffffffff80212ee0>] ? mwait_idle+0x0/0x60
>  [<ffffffff8020bc46>] ret_from_intr+0x0/0xf
>  <EOI>  [<ffffffff80212f2c>] ? mwait_idle+0x4c/0x60
>  [<ffffffff80212f23>] ? mwait_idle+0x43/0x60
>  [<ffffffff8020a217>] ? cpu_idle+0x57/0xa0
>  [<ffffffff8040f380>] ? rest_init+0x70/0x80
> Code: 10 5b 41 5c 41 5d 41 5e c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53
> 48 83 ec 08 48 8b 9f 78 01 00 00 e8 2b c7 f1 ff 89 c0 <48> 8b 04 c3 48 83 c4 08
> 5b c9 c3 66 66 66 66 66 2e 0f 1f 84 00
> RIP  [<ffffffff803fce17>] icmp_sk+0x17/0x30
>  RSP <ffffffff8057fc30>
> CR2: 0000000000000000
> ---[ end trace ea161157b76b33e8 ]---
> Kernel panic - not syncing: Aiee, killing interrupt handler!

Receiving packets while we are cleaning up a network namespace is a
racy proposition. It is possible when the packet arrives that we have
removed some but not all of the state we need to fully process it.  We
have the choice of either playing wack-a-mole with the cleanup routines
or simply dropping packets when we don't have a network namespace to
handle them.

Since the check looks inexpensive in netif_receive_skb let's just
drop the incoming packets.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: Make sure N * sizeof(union sctp_addr) does not overflow.

As noticed by Gabriel Campana, the kmalloc() length arg
passed in by sctp_getsockopt_local_addrs_old() can overflow
if ->addr_num is large enough.

Therefore, enforce an appropriate limit.

Signed-off-by: David S. Miller <davem@davemloft.net>

pppoe: warning fix

Fix warning:
drivers/net/pppoe.c: In function 'pppoe_recvmsg':
drivers/net/pppoe.c:945: warning: comparison of distinct pointer types lacks a cast
because skb->len is unsigned int and total_len is size_t

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: Kill unused variable in sctp_assoc_bh_rcv()

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2: Update driver version to 1.7.7.

And update module description.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2: Cleanup error handling in bnx2_open().

All error handling in bnx2_open() can be consolidated.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2: Turn on multi rx rings.

Enable multiple rx rings if MSI-X vectors are available. We enable
up to 7 rx rings.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2: Update firmware to support multi rx rings.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2: Use one handler for all MSI-X vectors.

Use the same MSI-X handler to schedule NAPI.  Change the dev_instance
void pointer to the bnx2_napi struct instead so we can have the proper
context for each MSI-X vector.

Add a new bnx2_poll_msix() that is optimized for handling MSI-X
NAPI polling of rx/tx work only.  Remove the old bnx2_tx_poll() that
is no longer needed.  Each MSI-X vector handles 1 tx and 1 rx ring.
The first vector handles link events as well.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2: Optimize fast-path tx and rx work.

Add hw_tx_cons_ptr and hw_rx_cons_ptr to speed up the retreival of
the tx and rx consumer index, since the MSI-X and default status
blocks have different structures.

Combine status_blk and status_blk_msix into a union. We'll only use
one type of status block for each vector.

Separate the code to detect more rx and tx work from the code to
detect link related work.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2: Put rx ring variables in a separate struct.

In preparation for multi-ring support, rx ring variables are now put
in a separate bnx2_rx_ring_info struct. With MSI-X, we can support
multiple rx rings.

The functions to allocate/free rx memory and to initialize rx rings
are now modified to handle multiple rings.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2: Put tx ring variables in a separate struct.

In preparation for multi-ring support, tx ring variables are now put
in a separate bnx2_tx_ring_info struct. Multi tx ring will not be
enabled until it is fully supported by the stack. Only 1 tx ring
will be used at the moment.

The functions to allocate/free tx memory and to initialize tx rings
are now modified to handle multiple rings.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Drop packets for loopback address from outside of the box.

[ Based upon original report and patch by Karsten Keil.  Karsten
  has verified that this fixes the TAHI test case "ICMPv6 test
  v6LC.5.1.2 Part F". -DaveM ]

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Remove options header when setsockopt's optlen is 0

Remove the sticky Hop-by-Hop options header by calling setsockopt()
for IPV6_HOPOPTS with a zero option length, per RFC3542.

Routing header and Destination options header does the same as
Hop-by-Hop options header.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Discard and warn about LRO'd skbs received for forwarding

Add skb_warn_if_lro() to test whether an skb was received with LRO and
warn if so.

Change br_forward(), ip_forward() and ip6_forward() to call it) and
discard the skb if it returns true.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Disable LRO on devices that are forwarding

Large Receive Offload (LRO) is only appropriate for packets that are
destined for the host, and should be disabled if received packets may be
forwarded. It can also confuse the GSO on output.

Add dev_disable_lro() function which uses the appropriate ethtool ops to
disable LRO if enabled.

Add calls to dev_disable_lro() in br_add_if() and functions that enable
IPv4 and IPv6 forwarding.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: Follow security requirement of responding with 1 packet

RFC 4960, Section 11.4. Protection of Non-SCTP-Capable Hosts

When an SCTP stack receives a packet containing multiple control or
DATA chunks and the processing of the packet requires the sending of
multiple chunks in response, the sender of the response chunk(s) MUST
NOT send more than one packet.  If bundling is supported, multiple
response chunks that fit into a single packet MAY be bundled together
into one single response packet.  If bundling is not supported, then
the sender MUST NOT send more than one response chunk and MUST
discard all other responses.  Note that this rule does NOT apply to a
SACK chunk, since a SACK chunk is, in itself, a response to DATA and
a SACK does not require a response of more DATA.

We implement this by not servicing our outqueue until we reach the end
of the packet.  This enables maximum bundling.  We also identify
'response' chunks and make sure that we only send 1 packet when sending
such chunks.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: Validate Initiate Tag when handling ICMP message

This patch add to validate initiate tag and chunk type if verification
tag is 0 when handling ICMP message.

RFC 4960, Appendix C. ICMP Handling

ICMP6) An implementation MUST validate that the Verification Tag
contained in the ICMP message matches the Verification Tag of the peer.
If the Verification Tag is not 0 and does NOT match, discard the ICMP
message. If it is 0 and the ICMP message contains enough bytes to
verify that the chunk type is an INIT chunk and that the Initiate Tag
matches the tag of the peer, continue with ICMP7. If the ICMP message
is too short or the chunk type or the Initiate Tag does not match,
silently discard the packet.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:

net/mac80211/tx.c

mac80211: detect driver tx bugs

When a driver rejects a frame in it's ->tx() callback, it must also
stop queues, otherwise mac80211 can go into a loop here. Detect this
situation and abort the loop after five retries, warning about the
driver bug.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: genl: fix circular locking

genetlink has a circular locking dependency when dumping the registered
families:

- dump start:
genl_rcv()            : take genl_mutex
genl_rcv_msg()        : call netlink_dump_start() while holding genl_mutex
netlink_dump_start(),
netlink_dump()        : take nlk->cb_mutex
ctrl_dumpfamily()     : try to detect this case and not take genl_mutex a
                        second time

- dump continuance:
netlink_rcv()         : call netlink_dump
netlink_dump          : take nlk->cb_mutex
ctrl_dumpfamily()     : take genl_mutex

Register genl_lock as callback mutex with netlink to fix this. This slightly
widens an already existing module unload race, the genl ops used during the
dump might go away when the module is unloaded. Thomas Graf is working on a
seperate fix for this.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdevice: Fix promiscuity and allmulti overflow

Max of promiscuity and allmulti plus positive @inc can cause overflow.
Fox example: when allmulti=0xFFFFFFFF, any caller give dev_set_allmulti() a
positive @inc will cause allmulti be off.
This is not what we want, though it's rare case.
The fix is that only negative @inc will cause allmulti or promiscuity be off
and when any caller makes the counters touch the roof, we return error.

Change of v2:
Change void function dev_set_promiscuity/allmulti to return int.
So callers can get the overflow error.
Caller's fix will be done later.

Change of v3:
1. Since we return error to caller, we don't need to print KERN_ERROR,
KERN_WARNING is enough.
2. In dev_set_promiscuity(), if __dev_set_promiscuity() failed, we
return at once.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "mac80211: Use skb_header_cloned() on TX path."

This reverts commit 608961a5eca8d3c6bd07172febc27b5559408c5d.

The problem is that the mac80211 stack not only needs to be able to
muck with the link-level headers, it also might need to mangle all of
the packet data if doing sw wireless encryption.

This fixes kernel bugzilla #10903. Thanks to Didier Raboud (for the
bugzilla report), Andrew Prince (for bisecting), Johannes Berg (for
bringing this bisection analysis to my attention), and Ilpo (for
trying to analyze this purely from the TCP side).

In 2.6.27 we can take another stab at this, by using something like
skb_cow_data() when the TX path of mac80211 ends up with a non-NULL
tx->key. The ESP protocol code in the IPSEC stack can be used as a
model for implementation.

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: minor cleanup in net/ipv6/tcp_ipv6.c [RESEND ].

In net/ipv6/tcp_ipv6.c:

- Remove unneeded tcp_v6_send_check() declaration.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add sk_set_socket() helper.

In order to more easily grep for all things that set
sk->sk_socket, add sk_set_socket() helper inline function.

Suggested (although only half-seriously) by Evgeniy Polyakov.

Signed-off-by: David S. Miller <davem@davemloft.net>

af_unix: fix 'poll for write'/ connected DGRAM sockets

The unix_dgram_sendmsg routine implements a (somewhat crude)
form of receiver-imposed flow control by comparing the length of the
receive queue of the 'peer socket' with the max_ack_backlog value
stored in the corresponding sock structure, either blocking
the thread which caused the send-routine to be called or returning
EAGAIN. This routine is used by both SOCK_DGRAM and SOCK_SEQPACKET
sockets. The poll-implementation for these socket types is
datagram_poll from core/datagram.c. A socket is deemed to be writeable
by this routine when the memory presently consumed by datagrams
owned by it is less than the configured socket send buffer size. This
is always wrong for connected PF_UNIX non-stream sockets when the
abovementioned receive queue is currently considered to be full.
'poll' will then return, indicating that the socket is writeable, but
a subsequent write result in EAGAIN, effectively causing an
(usual) application to 'poll for writeability by repeated send request
with O_NONBLOCK set' until it has consumed its time quantum.

The change below uses a suitably modified variant of the datagram_poll
routines for both type of PF_UNIX sockets, which tests if the
recv-queue of the peer a socket is connected to is presently
considered to be 'full' as part of the 'is this socket
writeable'-checking code. The socket being polled is additionally
put onto the peer_wait wait queue associated with its peer, because the
unix_dgram_sendmsg routine does a wake up on this queue after a
datagram was received and the 'other wakeup call' is done implicitly
as part of skb destruction, meaning, a process blocked in poll
because of a full peer receive queue could otherwise sleep forever
if no datagram owned by its socket was already sitting on this queue.
Among this change is a small (inline) helper routine named
'unix_recvq_full', which consolidates the actual testing code (in three
different places) into a single location.

Signed-off-by: Rainer Weikusat <rweikusat@mssgmbh.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'davem-next' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6

Merge branch 'davem-fixes' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6

ax25: Fix std timer socket destroy handling.

Tihomir Heidelberg - 9a4gl, reports:

--------------------
I would like to direct you attention to one problem existing in ax.25
kernel since 2.4. If listening socket is closed and its SKB queue is
released but those sockets get weird. Those "unAccepted()" sockets
should be destroyed in ax25_std_heartbeat_expiry, but it will not
happen. And there is also a note about that in ax25_std_timer.c:
/* Magic here: If we listen() and a new link dies before it
is accepted() it isn't 'dead' so doesn't get removed. */

This issue cause ax25d to stop accepting new connections and I had to
restarted ax25d approximately each day and my services were unavailable.
Also netstat -n -l shows invalid source and device for those listening
sockets. It is strange why ax25d's listening socket get weird because of
this issue, but definitely when I solved this bug I do not have problems
with ax25d anymore and my ax25d can run for months without problems.
--------------------

Actually as far as I can see, this problem is even in releases
as far back as 2.2.x as well.

It seems senseless to special case this test on TCP_LISTEN state.
Anything still stuck in state 0 has no external references and
we can just simply kill it off directly.

Signed-off-by: David S. Miller <davem@davemloft.net>

netdevice: change net_device->promiscuity/allmulti to unsigned int

The comments of dev_set_allmulti/promiscuity() is that "While the count in
the device remains above zero...". So negative count is useless.
Fix the type of the counter from "int" to "unsigned int".

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tun: Proper handling of IPv6 header in tun driver when TUN_NO_PI is set

By default, tun.c running in TUN_TUN_DEV mode will set the protocol of
packet to IPv4 if TUN_NO_PI is set. My program failed to work when I
assumed that the driver will check the first nibble of packet,
determine IP version and set the appropriate protocol.

Signed-off-by: Ang Way Chuang <wcang@nav6.org>
Acked-by: Max Krasnyansky <maxk@qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: sk_drops handling

In commits 33c732c36169d7022ad7d6eb474b0c9be43a2dc1 ([IPV4]: Add raw
drops counter) and a92aa318b4b369091fd80433c80e62838db8bc1c ([IPV6]:
Add raw drops counter), Wang Chen added raw drops counter for
/proc/net/raw & /proc/net/raw6

This patch adds this capability to UDP sockets too (/proc/net/udp &
/proc/net/udp6).

This means that 'RcvbufErrors' errors found in /proc/net/snmp can be also
be examined for each udp socket.

# grep Udp: /proc/net/snmp
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 23971006 75 899420 16390693 146348 0

# cat /proc/net/udp
sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt  ---
uid  timeout inode ref pointer drops
75: 00000000:02CB 00000000:0000 07 00000000:00000000 00:00000000 00000000  ---
  0        0 2358 2 ffff81082a538c80 0
111: 00000000:006F 00000000:0000 07 00000000:00000000 00:00000000 00000000  ---
  0        0 2286 2 ffff81042dd35c80 146348

In this example, only port 111 (0x006F) was flooded by messages that
user program could not read fast enough. 146348 messages were lost.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

MAINTAINERS

Add PJ Waskiewicz to the list of maintainers for Intel 10/100/1000/10GbE
adapters.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: PJ Waskiewicz <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

bonding: Allow setting max_bonds to zero

Permit bonding to function rationally if max_bonds is set to
zero. This will load the module, but create no master devices (which can
be created via sysfs).

Requires some change to bond_create_sysfs; currently, the
netdev sysfs directory is determined from the first bonding device created,
but this is no longer possible. Instead, an interface from net/core is
created to create and destroy files in net_class.

Based on a patch submitted by Phil Oester <kernel@linuxaces.com>.
Modified by Jay Vosburgh to fix the sysfs issue mentioned above and to
update the documentation.

Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

bonding: Rework / fix multiple gratuitous ARP support

Support for sending multiple gratuitous ARPs during failovers
was added by commit:

commit 7893b2491a2d5f716540ac5643d78d37a7f6628b
Author: Moni Shoua <monis@voltaire.com>
Date: Sat May 17 21:10:12 2008 -0700

bonding: Send more than one gratuitous ARP when slave takes over

This change modifies that support to remove duplicated code,
add support for ARP monitor (the original only supported miimon), clear
the grat ARP counter in bond_close (lest a later "ifconfig up" immediately
start spewing ARPs), and add documentation for the module parameter.

Also updated driver version to 3.3.0.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

bonding: deliver netdev event for fail-over under the active-backup mode

under active-backup mode and when there's actual new_active slave,
have bond_change_active_slave() call the networking core to deliver
NETDEV_BONDING_FAILOVER event such that the fail-over can be notable
by code outside of the bonding driver such as the RDMA stack and
monitoring tools.

As the correct context of locking appropriate for notifier calls is RTNL
and nothing else, bond->curr_slave_lock and bond->lock are unlocked and
later locked again. This is ensured by the rest of the code to be safe
under backup-mode AND when new_active is not NULL.

Jay Vosburgh modified the original patch for formatting and fixed a
compiler error.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

bonding: bond_change_active_slave() cleanup under active-backup

simplified the code of bond_change_active_slave() such that under
active-backup mode there's one "if (new_active)" test and the rest
of the code only does extra checks on top of it. This removed an
unneeded "if (bond->send_grat_arp > 0)" check and avoid calling
bond_send_gratuitous_arp when there's no active slave.

Jay Vosburgh made minor coding style changes to the orignal patch.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

net/core: add NETDEV_BONDING_FAILOVER event

Add NETDEV_BONDING_FAILOVER event to be used in a successive patch
by bonding to announce fail-over for the active-backup mode through the
netdev events notifier chain mechanism. Such an event can be of use for the
RDMA CM (communication manager) to let native RDMA ULPs (eg NFS-RDMA, iSER)
always be aligned with the IP stack, in the sense that they use the same
ports/links as the stack does. More usages can be done to allow monitoring
tools based on netlink events being aware to bonding fail-over.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

sky2: version 1.22

New version to reflect new hardware support

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

sky2: 88E8057 chip support

Add support for Yukon 2 Ultra 2 chip set (88E8057) based on code in latest
version of vendor driver (sk98lin 10.60.2.3). Untested on real hardware.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>