[BRIDGE]: forwarding remove unneeded preempt and bh diasables
Optimize the forwarding and transmit paths. Both places are
called with bottom half/no preempt so there is no need to use
spin_lock_bh or rcu_read_lock.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Use kzalloc versus kmalloc+memset. Also don't need to do
memset() of bridge address since it is in netdev private data
that is already zero'd in alloc_netdev.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[DCCP] feat: Pass dccp_minisock ptr where only the minisock is used
This is in preparation for having a dccp_minisock embedded into
dccp_request_sock so that feature negotiation can be done prior to
creating the full blown dccp_sock.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[DCCP] minisock: Rename struct dccp_options to struct dccp_minisock
This will later be included in struct dccp_request_sock so that we can
have per connection feature negotiation state while in the 3way
handshake, when we clone the DCCP_ROLE_LISTEN socket (in
dccp_create_openreq_child) we'll just copy this state from
dreq_minisock to dccps_minisock.
Also the feature negotiation and option parsing code will mostly touch
dccps_minisock, which will simplify some stuff.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
A recent changeset removes dummy_socket_getpeersec, replacing it with
two new functions, but still references the removed function in the
security_fixup_ops table, fix it by doing the replacement operation in
the fixup table too.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Mishin [Tue, 21 Mar 2006 06:45:21 +0000 (22:45 -0800)]
[NET]: {get|set}sockopt compatibility layer
This patch extends {get|set}sockopt compatibility layer in order to
move protocol specific parts to their place and avoid huge universal
net/compat.c file in the future.
Signed-off-by: Dmitry Mishin <dim@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Tue, 21 Mar 2006 06:43:56 +0000 (22:43 -0800)]
[NET]: Replace skb_pull/skb_postpull_rcsum with skb_pull_rcsum
We're now starting to have quite a number of places that do skb_pull
followed immediately by an skb_postpull_rcsum. We can merge these two
operations into one function with skb_pull_rcsum. This makes sense
since most pull operations on receive skb's need to update the
checksum.
I've decided to make this out-of-line since it is fairly big and the
fast path where hardware checksums are enabled need to call
csum_partial anyway.
Since this is a brand new function we get to add an extra check on the
len argument. As it is most callers of skb_pull ignore its return
value which essentially means that there is no check on the len
argument.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
As per Robert Olsson's patch for ipv4, this is the DECnet
version to keep the code "in step". It changes the list
of rules to use RCU rather than an rwlock.
Inspired-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: Steven Whitehouse <steve@chygwyn.com> Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch means that 64bit kernel/32bit userland platforms will now
work correctly with DECnet.
Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com> Signed-off-by: Steven Whitehouse <steve@chygwyn.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The typedef for dn_address has been removed in favour of using __le16
or __u16 directly as appropriate. All the DECnet header files are
updated accordingly.
The byte ordering of dn_eth2dn() and dn_dn2eth() are both changed
since just about all their callers wanted network order rather than
host order, so the conversion is now done in the functions themselves.
Several missed endianess conversions have been picked up during the
conversion process. The nh_gw field in struct dn_fib_info has been
changed from a 32 bit field to 16 bits as it ought to be.
One or two cases of using htons rather than dn_htons in the routing
code have been found and fixed.
There are still a few warnings to fix, but this patch deals with the
important cases.
Signed-off-by: Steven Whitehouse <steve@chygwyn.com> Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Catherine Zhang [Tue, 21 Mar 2006 06:41:23 +0000 (22:41 -0800)]
[SECURITY]: TCP/UDP getpeersec
This patch implements an application of the LSM-IPSec networking
controls whereby an application can determine the label of the
security association its TCP or UDP sockets are currently connected to
via getsockopt and the auxiliary data mechanism of recvmsg.
Patch purpose:
This patch enables a security-aware application to retrieve the
security context of an IPSec security association a particular TCP or
UDP socket is using. The application can then use this security
context to determine the security context for processing on behalf of
the peer at the other end of this connection. In the case of UDP, the
security context is for each individual packet. An example
application is the inetd daemon, which could be modified to start
daemons running at security contexts dependent on the remote client.
Patch design approach:
- Design for TCP
The patch enables the SELinux LSM to set the peer security context for
a socket based on the security context of the IPSec security
association. The application may retrieve this context using
getsockopt. When called, the kernel determines if the socket is a
connected (TCP_ESTABLISHED) TCP socket and, if so, uses the dst_entry
cache on the socket to retrieve the security associations. If a
security association has a security context, the context string is
returned, as for UNIX domain sockets.
- Design for UDP
Unlike TCP, UDP is connectionless. This requires a somewhat different
API to retrieve the peer security context. With TCP, the peer
security context stays the same throughout the connection, thus it can
be retrieved at any time between when the connection is established
and when it is torn down. With UDP, each read/write can have
different peer and thus the security context might change every time.
As a result the security context retrieval must be done TOGETHER with
the packet retrieval.
The solution is to build upon the existing Unix domain socket API for
retrieving user credentials. Linux offers the API for obtaining user
credentials via ancillary messages (i.e., out of band/control messages
that are bundled together with a normal message).
Patch implementation details:
- Implementation for TCP
The security context can be retrieved by applications using getsockopt
with the existing SO_PEERSEC flag. As an example (ignoring error
checking):
The SELinux function, selinux_socket_getpeersec, is extended to check
for labeled security associations for connected (TCP_ESTABLISHED ==
sk->sk_state) TCP sockets only. If so, the socket has a dst_cache of
struct dst_entry values that may refer to security associations. If
these have security associations with security contexts, the security
context is returned.
getsockopt returns a buffer that contains a security context string or
the buffer is unmodified.
- Implementation for UDP
To retrieve the security context, the application first indicates to
the kernel such desire by setting the IP_PASSSEC option via
getsockopt. Then the application retrieves the security context using
the auxiliary data mechanism.
An example server application for UDP should look like this:
ip_setsockopt is enhanced with a new socket option IP_PASSSEC to allow
a server socket to receive security context of the peer. A new
ancillary message type SCM_SECURITY.
When the packet is received we get the security context from the
sec_path pointer which is contained in the sk_buff, and copy it to the
ancillary message space. An additional LSM hook,
selinux_socket_getpeersec_udp, is defined to retrieve the security
context from the SELinux space. The existing function,
selinux_socket_getpeersec does not suit our purpose, because the
security context is copied directly to user space, rather than to
kernel space.
Testing:
We have tested the patch by setting up TCP and UDP connections between
applications on two machines using the IPSec policies that result in
labeled security associations being built. For TCP, we can then
extract the peer security context using getsockopt on either end. For
UDP, the receiving end can retrieve the security context using the
auxiliary data mechanism of recvmsg.
Signed-off-by: Catherine Zhang <cxzhang@watson.ibm.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Tue, 21 Mar 2006 06:40:54 +0000 (22:40 -0800)]
[XFRM]: Fix aevent related crash
When xfrm_user isn't loaded xfrm_nl is NULL, which makes IPsec crash because
xfrm_aevent_is_on passes the NULL pointer to netlink_has_listeners as socket.
A second problem is that the xfrm_nl pointer is not cleared when the socket
is releases at module unload time.
Protect references of xfrm_nl from outside of xfrm_user by RCU, check
that the socket is present in xfrm_aevent_is_on and set it to NULL
when unloading xfrm_user.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Rick Jones [Tue, 21 Mar 2006 06:40:29 +0000 (22:40 -0800)]
[TCP]: sysctl to allow TCP window > 32767 sans wscale
Back in the dark ages, we had to be conservative and only allow 15-bit
window fields if the window scale option was not negotiated. Some
ancient stacks used a signed 16-bit quantity for the window field of
the TCP header and would get confused.
Those days are long gone, so we can use the full 16-bits by default
now.
There is a sysctl added so that we can still interact with such old
stacks
Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Tue, 21 Mar 2006 06:40:03 +0000 (22:40 -0800)]
[IPV4] ARP: Documentation for new arp_accept sysctl variable.
As John pointed out, I had not added documentation to describe the
arp_accpet sysctl that I posted in my last patch. This patch adds
that documentation.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Mahoney [Tue, 21 Mar 2006 06:39:21 +0000 (22:39 -0800)]
[TG3]: netif_carrier_off runs too early; could still be queued when init fails
Move the netif_carrier_off() call from tg3_init_one()->
tg3_init_link_config() to tg3_open() as is the convention for most other
network drivers.
I was getting a panic after a tg3 device failed to initialize due to DMA
failure. The oops pointed to the link watch queue with spinlock debugging
enabled. Without spinlock debugging, the Oops didn't occur.
I suspect that the link event was getting queued but not executed until
after the DMA test had failed and the device was freed. The link event was
then operating on freed memory, which could contain anything. With this
patch applied, the Oops no longer occurs.
[ Based upon feedback from Michael Chan, we move netif_carrier_off()
to the end of tg3_init_one() instead of moving it to tg3_open() -DaveM ]
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Per Liden [Tue, 21 Mar 2006 06:38:14 +0000 (22:38 -0800)]
[TIPC]: Reduce stack usage
The node_map struct can be quite large (516 bytes) and allocating two of
them on the stack is not a good idea since we might only have a 4K stack
to start with.
Signed-off-by: Per Liden <per.liden@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Adrian Bunk [Tue, 21 Mar 2006 06:37:52 +0000 (22:37 -0800)]
[TIPC]: Cleanups
This patch contains the following possible cleanups:
- make needlessly global code static
- #if 0 the following unused global functions:
- name_table.c: tipc_nametbl_print()
- name_table.c: tipc_nametbl_dump()
- net.c: tipc_net_next_node()
Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Per Liden <per.liden@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sam Ravnborg [Tue, 21 Mar 2006 06:37:04 +0000 (22:37 -0800)]
[TIPC]: Remove inlines from *.c
With reference to latest discussions on linux-kernel with respect to
inline here is a patch for tipc to remove all inlines as used in
the .c files. See also chapter 14 in Documentation/CodingStyle.
Before:
text data bss dec hex filename
102990 5292 1752 110034 1add2 tipc.o
Now:
text data bss dec hex filename
101190 5292 1752 108234 1a6ca tipc.o
This is a nice text size reduction which will improve icache usage.
In some cases bigger (> 4 lines) functions where declared inline
and used in many places, they are most probarly no longer inlined by gcc
resulting in the size reduction.
There are several one liners that no longer are declared inline, but gcc
should inline these just fine without the inline hint.
With this patch applied one warning is added about an unused static
function - that was hidded by utilising inline before.
The function in question were kept so this patch is solely a
inline removal patch.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Per Liden <per.liden@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sam Ravnborg [Tue, 21 Mar 2006 06:36:47 +0000 (22:36 -0800)]
[TIPC]: Fix simple sparse warnings
Tried to run the new tipc stack through sparse.
Following patch fixes all cases where 0 was used
as replacement of NULL.
Use NULL to document this is a pointer and to silence sparse.
This brough sparse warning count down with 127 to 24 warnings.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Per Liden <per.liden@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 21 Mar 2006 06:36:21 +0000 (22:36 -0800)]
[NETFILTER]: Fix warnings in ip_nat_snmp_basic.c
net/ipv4/netfilter/ip_nat_snmp_basic.c: In function 'asn1_header_decode':
net/ipv4/netfilter/ip_nat_snmp_basic.c:248: warning: 'len' may be used uninitialized in this function
net/ipv4/netfilter/ip_nat_snmp_basic.c:248: warning: 'def' may be used uninitialized in this function
net/ipv4/netfilter/ip_nat_snmp_basic.c: In function 'snmp_translate':
net/ipv4/netfilter/ip_nat_snmp_basic.c:672: warning: 'l' may be used uninitialized in this function
net/ipv4/netfilter/ip_nat_snmp_basic.c:668: warning: 'type' may be used uninitialized in this function
Signed-off-by: David S. Miller <davem@davemloft.net>
Sam Ravnborg [Tue, 21 Mar 2006 06:34:52 +0000 (22:34 -0800)]
[WAN]: fix section mismatch warning in sbni
In latest -mm sbni gives following warning: WARNING:
drivers/net/wan/sbni.o - Section mismatch: reference to \ .init.data:
from .text between 'init_module' (at offset 0x14ef) and \
'cleanup_module'
The warning is caused by init_module() calling a function declared
__init. Declare init_module() __init too to fix warning.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Arjan van de Ven [Tue, 21 Mar 2006 06:33:17 +0000 (22:33 -0800)]
[NET] sem2mutex: net/
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Arjan van de Ven [Tue, 21 Mar 2006 06:32:53 +0000 (22:32 -0800)]
[IRDA] sem2mutex: drivers/net/irda
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Get rid of the old __dev_put macro that is just a hold over from pre 2.6
kernel. And turn dev_hold into an inline instead of a macro.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[DCCP]: Use sk->sk_prot->max_header consistently for non-data packets
Using this also provides opportunities for introducing
inet_csk_alloc_skb that would call alloc_skb, account it to the sock
and skb_reserve(max_header), but I'll leave this for later, for now
using sk_prot->max_header consistently is enough.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Tue, 21 Mar 2006 06:28:41 +0000 (22:28 -0800)]
[TG3]: Add new one-shot MSI handler
Support one-shot MSI on 5787.
This one-shot MSI idea is credited to David Miller. In this mode, MSI
disables itself automatically after it is generated, saving the driver
a register access to disable it for NAPI.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Tue, 21 Mar 2006 06:28:05 +0000 (22:28 -0800)]
[TG3]: Add new hard_start_xmit
Support 5787 hardware TSO using a new flag TG3_FLG2_HW_TSO_2.
Since the TSO interface is slightly different and these chips have
finally fixed the 4GB DMA problem and do not have the 40-bit DMA
problem, a new hard_start_xmit is used for these chips. All previous
chips will use the old hard_start_xmit that is now renamed
tg3_start_xmit_dma_bug().
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Benjamin LaHaise [Tue, 21 Mar 2006 06:27:12 +0000 (22:27 -0800)]
[NET]: use fget_light() in net/socket.c
Here's an updated copy of the patch to use fget_light in net/socket.c.
Rerunning the tests show a drop of ~80Mbit/s on average, which looks
bad until you see the drop in cpu usage from ~89% to ~82%. That will
get fixed in another patch...
Alpt [Tue, 21 Mar 2006 06:26:17 +0000 (22:26 -0800)]
[NET] rtnetlink: Add RTPROT entry for Netsukuku.
The Netsukuku daemon is using the same number to mark its routes, you
can see it here:
http://hinezumilabs.org/cgi-bin/viewcvs.cgi/netsukuku/src/krnl_route.h?rev=HEAD&content-type=text/vnd.viewcvs-markup
Signed-off-by: David S. Miller <davem@davemloft.net>
[NET]: Move destructor from neigh->ops to neigh_params
struct neigh_ops currently has a destructor field, which no in-kernel
drivers outside of infiniband use. The infiniband/ulp/ipoib in-tree
driver stashes some info in the neighbour structure (the results of
the second-stage lookup from ARP results to real link-level path), and
it uses neigh->ops->destructor to get a callback so it can clean up
this extra info when a neighbour is freed. We've run into problems
with this: since the destructor is in an ops field that is shared
between neighbours that may belong to different net devices, there's
no way to set/clear it safely.
The following patch moves this field to neigh_parms where it can be
safely set, together with its twin neigh_setup. Two additional
patches in the patch series update ipoib to use this new interface.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Baruch Even [Tue, 21 Mar 2006 06:22:47 +0000 (22:22 -0800)]
[TCP] H-TCP: Account for delayed-ACKs
Account for delayed-ACKs in H-TCP.
Delayed-ACKs cause H-TCP to be less aggressive than its design calls
for. It is especially true when the receiver is a Linux machine where
the average delayed ack is over 3 packets with values of 7 not unheard
of.
Signed-off-By: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Baruch Even [Tue, 21 Mar 2006 06:22:20 +0000 (22:22 -0800)]
[TCP] H-TCP: Use msecs_to_jiffies
Use functions to calculate jiffies from milliseconds and not the old,
crude method of dividing HZ by a value. Ensures more accurate values
even in the face of strange HZ values.
Signed-off-By: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Evgeniy Polyakov [Tue, 21 Mar 2006 06:21:40 +0000 (22:21 -0800)]
[CONNECTOR]: Use netlink_has_listeners() to avoind unnecessary allocations.
Return -ESRCH from cn_netlink_send() when there are not listeners,
just as it could be done by netlink_broadcast(). Propagate
netlink_broadcast() error back to the caller.
Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
David Basden [Tue, 21 Mar 2006 06:21:10 +0000 (22:21 -0800)]
[IRDA]: TOIM3232 dongle support
Here goes a patch for supporting TOIM3232 based serial IrDA dongles.
The code is based on the tekram dongle code.
It's been tested with a TOIM3232 based IRWave 320S dongle. It may work
for TOIM4232 dongles, although it's not been tested.
Signed-off-by: David Basden <davidb-irda@rcpt.to> Signed-off-by: Samuel Ortiz <samuel.ortiz@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Luiz Capitulino [Tue, 21 Mar 2006 06:17:55 +0000 (22:17 -0800)]
[PKTGEN]: Fix Initialization fail leak.
Even if pktgen's thread initialization fails for all CPUs, the module
will be successfully loaded.
This patch changes that behaivor, by returning an error on module load time,
and also freeing all the resources allocated. It also prints a warning if a
thread initialization has failed.
Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br> Signed-off-by: David S. Miller <davem@davemloft.net>
Luiz Capitulino [Tue, 21 Mar 2006 06:16:40 +0000 (22:16 -0800)]
[PKTGEN]: Ports thread list to Kernel list implementation.
The final result is a simpler and smaller code.
Note that I'm adding a new member in the struct pktgen_thread called
'removed'. The reason is that I didn't find a better wait condition to
be used in the place of the replaced one.
Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br> Signed-off-by: David S. Miller <davem@davemloft.net>
[DCCP] options: Fix some aspects of mandatory option processing
According to dccp draft (draft-ietf-dccp-spec-13.txt) section 5.8.2
(Mandatory Option) the following patch correct the handling of the
following cases:
1) "... and any Mandatory options received on DCCP-Data packets MUST be
ignored."
2) "The connection is in error and should be reset with Reset Code 5, ...
if option O is absent (Mandatory was the last byte of the option list), or
if option O equals Mandatory."
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Ditched IP_DCCP_UNLOAD_HACK, as now we would have to do it for both
IPv6 and IPv4, so I'll come up with another way for freeing the
control sockets in upcoming changesets.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Adrian Bunk [Tue, 21 Mar 2006 05:58:29 +0000 (21:58 -0800)]
[DCCP] ipv4: make struct dccp_v4_prot static
There's no reason for struct dccp_v4_prot being global.
Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>