]> err.no Git - linux-2.6/log
linux-2.6
18 years ago[NETFILTER]: Use macro for spinlock_t/rwlock_t initializations/definition.
YOSHIFUJI Hideaki [Wed, 4 Jan 2006 21:56:54 +0000 (13:56 -0800)]
[NETFILTER]: Use macro for spinlock_t/rwlock_t initializations/definition.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IPV6]: Use macro for rwlock_t initialization.
YOSHIFUJI Hideaki [Wed, 4 Jan 2006 21:56:31 +0000 (13:56 -0800)]
[IPV6]: Use macro for rwlock_t initialization.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[ECONET]: Use macro for spinlock_t definition.
YOSHIFUJI Hideaki [Wed, 4 Jan 2006 21:56:08 +0000 (13:56 -0800)]
[ECONET]: Use macro for spinlock_t definition.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IPVS]: Add missing include <linux/net.h>
Arnaldo Carvalho de Melo [Wed, 4 Jan 2006 04:02:20 +0000 (02:02 -0200)]
[IPVS]: Add missing include <linux/net.h>

  CC [M]  net/ipv4/ipvs/ip_vs_conn.o
  /pub/scm/linux/kernel/git/acme/net-2.6/net/ipv4/ipvs/ip_vs_conn.c: In
  function 'ip_vs_conn_new':
  /pub/scm/linux/kernel/git/acme/net-2.6/net/ipv4/ipvs/ip_vs_conn.c:606:
  warning: implicit declaration of function 'net_ratelimit'
  /pub/scm/linux/kernel/git/acme/net-2.6/net/ipv4/ipvs/ip_vs_conn.c: In
  function 'ip_vs_random_dropentry':
  /pub/scm/linux/kernel/git/acme/net-2.6/net/ipv4/ipvs/ip_vs_conn.c:810:
  warning: implicit declaration of function 'net_random'

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
18 years ago[TCP]: syn_flood_warning is only needed if CONFIG_SYN_COOKIES is selected
Arnaldo Carvalho de Melo [Wed, 4 Jan 2006 03:58:06 +0000 (01:58 -0200)]
[TCP]: syn_flood_warning is only needed if CONFIG_SYN_COOKIES is selected

  CC      net/ipv4/tcp_ipv4.o
  /pub/scm/linux/kernel/git/acme/net-2.6/net/ipv4/tcp_ipv4.c:665: warning:
  'syn_flood_warning' defined but not used

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
18 years ago[DCCP] ackvec: use u8 for the buf offsets
Arnaldo Carvalho de Melo [Wed, 4 Jan 2006 03:46:34 +0000 (01:46 -0200)]
[DCCP] ackvec: use u8 for the buf offsets

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
18 years ago[DCCP] ackvec: Fix spelling of "throw"
Andrea Bittau [Wed, 4 Jan 2006 03:45:17 +0000 (01:45 -0200)]
[DCCP] ackvec: Fix spelling of "throw"

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
18 years ago[TCP]: less inline's
Stephen Hemminger [Wed, 4 Jan 2006 00:03:49 +0000 (16:03 -0800)]
[TCP]: less inline's

TCP inline usage cleanup:
 * get rid of inline in several places
 * replace __inline__ with inline where possible
 * move functions used in one file out of tcp.h
 * let compiler decide on used once cases

On x86_64:
   text    data     bss     dec     hex filename
3594701  648348  567400 4810449  4966d1 vmlinux.orig
3593133  648580  567400 4809113  496199 vmlinux

On sparc64:
   text    data     bss     dec     hex filename
2538278  406152  530392 3474822  350586 vmlinux.ORIG
2536382  406384  530392 3473158  34ff06 vmlinux

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IEEE80211] ipw2200: Simplify multicast checks.
Stephen Hemminger [Tue, 3 Jan 2006 23:27:38 +0000 (15:27 -0800)]
[IEEE80211] ipw2200: Simplify multicast checks.

From: Stephen Hemminger <shemminger@osdl.org>

is_multicast_ether_addr() accepts broadcast too, so the
is_broadcast_ether_addr() calls are redundant.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NET]: Don't exclude broadcast addresses from is_multicast_ether_addr()
Stephen Hemminger [Tue, 3 Jan 2006 23:25:45 +0000 (15:25 -0800)]
[NET]: Don't exclude broadcast addresses from is_multicast_ether_addr()

The check for multicast shouldn't exclude broadcast type addresses.
This reverts the incorrect change done in 2.6.13.

The broadcast address is a multicast address and should be excluded
from being a valid_ether_address for use in bridging or device address.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IPV4] fib_trie: build fix
Stephen Hemminger [Tue, 3 Jan 2006 22:38:34 +0000 (14:38 -0800)]
[IPV4] fib_trie: build fix

Need this to fix build of fib_trie in net-2.6.16 (rebased) tree.
The code needs the new inet_make_mask inline.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[BRIDGE]: Fix faulty check in br_stp_recalculate_bridge_id()
Stephen Hemminger [Tue, 3 Jan 2006 22:35:54 +0000 (14:35 -0800)]
[BRIDGE]: Fix faulty check in br_stp_recalculate_bridge_id()

One of the conversions from memcmp to compare_ether_addr is incorrect.
We need to do relative comparison to determine min MAC address to
use in bridge id.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[DCCP]: Notify CCID only after ACK vectors have been processed.
Andrea Bittau [Tue, 3 Jan 2006 22:26:15 +0000 (14:26 -0800)]
[DCCP]: Notify CCID only after ACK vectors have been processed.

The CCID should be notified of packet reception only when a packet is
valid.  Therefore, the ACK vector needs to be processed before
notifying the CCID.  Also, the CCID might need information provided by
the ACK vector.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[DCCP]: Send an ACK vector when ACKing a response packet
Andrea Bittau [Tue, 3 Jan 2006 22:25:49 +0000 (14:25 -0800)]
[DCCP]: Send an ACK vector when ACKing a response packet

If ACK vectors are used, each packet with an ACK should contain an ACK
vector.  The only exception currently is response packets.  It
probably is not a good idea to store ACK vector state before the
connection is completed (to help protect from syn floods).

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[DCCP]: Do not process a packet twice when it's not in state DCCP_OPEN.
Andrea Bittau [Tue, 3 Jan 2006 22:25:17 +0000 (14:25 -0800)]
[DCCP]: Do not process a packet twice when it's not in state DCCP_OPEN.

When packets are received, the connection is either in DCCP_OPEN
[fast-path] or it isn't.  If it's not [e.g. DCCP_PARTOPEN] upper
layers will perform sanity checks and parse options.  If it is in
DCCP_OPEN, dccp_rcv_established() will do it.  It is important not to
re-parse options in dccp_rcv_established() when it is not called from
the fast-path.  Else, fore example, the ack vector will be added twice
and the CCID will see the packet twice.

The solution is to always enfore sanity checks from the upper layers.
When packets arrive in the fast-path, sanity checks will be performed
before calling dccp_rcv_established().

Note(acme): I rewrote the patch to achieve the same result but keeping
dccp_rcv_established with the previous semantics and having it split
into __dccp_rcv_established, that doesn't does do any sanity check,
code in state != DCCP_OPEN use this lighter version as they already do
the sanity checks.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[DECNET]: Only use local routers
Patrick Caulfield [Tue, 3 Jan 2006 22:24:02 +0000 (14:24 -0800)]
[DECNET]: Only use local routers

The attached patch makes DECnet routing only use routers from the same
area - rather than the highest rated router seen.

In theory there should not be an out-of-area router on a local network
but some networks are bridged rather than properly routed. VMS seems
to behave similarly: if I bring up a VMS node with no router then it
can't see anything else on the global network.

Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IPVS]: Cleanup IP_VS_DBG statements.
Roberto Nibali [Tue, 3 Jan 2006 22:22:59 +0000 (14:22 -0800)]
[IPVS]: Cleanup IP_VS_DBG statements.

From: Roberto Nibali <ratz@drugphish.ch>

The attached patch (against current -GIT) is a cleanup patch which does
following:

o lookup debug messages shifted back to 9
o added more informational value to flags and refcnt since those
entries can be in multiple referenced structures
o cleanup 80 char violation

It's the prepatch to the session pool implementation and helps very much
to debug and monitor important variables and structures regarding the
threshold limitation and persistency without the thousands of lookup
messages which noone is interested in.

Signed-off-by: Horms <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[TG3]: fixup tot_len calculation
Alexey Dobriyan [Tue, 3 Jan 2006 22:19:25 +0000 (14:19 -0800)]
[TG3]: fixup tot_len calculation

Turning struct iphdr::tot_len into __be16 added sparse warning.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NET]: Add a dev_ioctl() fallback to sock_ioctl()
Christoph Hellwig [Tue, 3 Jan 2006 22:18:33 +0000 (14:18 -0800)]
[NET]: Add a dev_ioctl() fallback to sock_ioctl()

Currently all network protocols need to call dev_ioctl as the default
fallback in their ioctl implementations.  This patch adds a fallback
to dev_ioctl to sock_ioctl if the protocol returned -ENOIOCTLCMD.
This way all the procotol ioctl handlers can be simplified and we don't
need to export dev_ioctl.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETROM]: Remove unessecary lock_sock calls in netrom_ioctl()
Christoph Hellwig [Tue, 3 Jan 2006 22:14:46 +0000 (14:14 -0800)]
[NETROM]: Remove unessecary lock_sock calls in netrom_ioctl()

lock_sock is needed only in very few cases, so do it there instead of
around the switch statement.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETLINK] genetlink: fix cmd type in genl_ops to be consistent to u8
Per Liden [Tue, 3 Jan 2006 22:13:29 +0000 (14:13 -0800)]
[NETLINK] genetlink: fix cmd type in genl_ops to be consistent to u8

Signed-off-by: Per Liden <per.liden@ericsson.com>
ACKed-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[AF_UNIX]: Convert to use a spinlock instead of rwlock
Benjamin LaHaise [Tue, 3 Jan 2006 22:10:46 +0000 (14:10 -0800)]
[AF_UNIX]: Convert to use a spinlock instead of rwlock

From: Benjamin LaHaise <bcrl@kvack.org>

In af_unix, a rwlock is used to protect internal state.  At least on my
P4 with HT it is faster to use a spinlock due to the simpler memory
barrier used to unlock.  This patch raises bw_unix to ~690K/s.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NET]: Speed up __alloc_skb()
Benjamin LaHaise [Tue, 3 Jan 2006 22:06:50 +0000 (14:06 -0800)]
[NET]: Speed up __alloc_skb()

From: Benjamin LaHaise <bcrl@kvack.org>

In __alloc_skb(), the use of skb_shinfo() which casts a u8 * to the
shared info structure results in gcc being forced to do a reload of the
pointer since it has no information on possible aliasing.  Fix this by
using a pointer to refer to skb_shared_info.

By initializing skb_shared_info sequentially, the write combining buffers
can reduce the number of memory transactions to a single write.  Reorder
the initialization in __alloc_skb() to match the structure definition.
There is also an alignment issue on 64 bit systems with skb_shared_info
by converting nr_frags to a short everything packs up nicely.

Also, pass the slab cache pointer according to the fclone flag instead
of using two almost identical function calls.

This raises bw_unix performance up to a peak of 707KB/s when combined
with the spinlock patch.  It should help other networking protocols, too.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[PPPOX]: Fix assignment into const proto_ops.
David S. Miller [Wed, 28 Dec 2005 04:57:40 +0000 (20:57 -0800)]
[PPPOX]: Fix assignment into const proto_ops.

And actually, with this, the whole pppox layer can basically
be removed and subsumed into pppoe.c, no other pppox sub-protocol
implementation exists and we've had this thing for at least 4
years.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[TCP]: Don't use __constant_htonl for a non const arg
Arnaldo Carvalho de Melo [Tue, 27 Dec 2005 17:17:57 +0000 (15:17 -0200)]
[TCP]: Don't use __constant_htonl for a non const arg

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[INET_SOCK]: Move struct inet_sock & helper functions to net/inet_sock.h
Arnaldo Carvalho de Melo [Tue, 27 Dec 2005 04:43:12 +0000 (02:43 -0200)]
[INET_SOCK]: Move struct inet_sock & helper functions to net/inet_sock.h

To help in reducing the number of include dependencies, several files were
touched as they were getting needed headers indirectly for stuff they use.

Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c had
linux/dccp.h include twice.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SOCK]: Introduce sk_receive_skb
Arnaldo Carvalho de Melo [Tue, 27 Dec 2005 04:42:22 +0000 (02:42 -0200)]
[SOCK]: Introduce sk_receive_skb

Its common enough to to justify that, TCP still can't use it as it has the
prequeueing stuff, still to be made generic in the not so distant future :-)

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NET]: restructure sock_aio_{read,write} / sock_{readv,writev}
Christoph Hellwig [Fri, 23 Dec 2005 05:08:46 +0000 (21:08 -0800)]
[NET]: restructure sock_aio_{read,write} / sock_{readv,writev}

Mid-term I plan to restructure the file_operations so that we don't need
to have all these duplicate aio and vectored versions.  This patch is
a small step in that direction but also a worthwile cleanup on it's own:

(1) introduce a alloc_sock_iocb helper that encapsulates allocating a
    proper sock_iocb
(2) add do_sock_read and do_sock_write helpers for common read/write
    code

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NET]: Fix sock_init() return value.
David S. Miller [Thu, 22 Dec 2005 20:58:55 +0000 (12:58 -0800)]
[NET]: Fix sock_init() return value.

It needs to return zero now that it is an initcall.

Also, net/nonet.c no longer needs a dummy sock_init().

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[PKTGEN]: Deinitialise static variables.
Jaco Kroon [Thu, 22 Dec 2005 20:51:46 +0000 (12:51 -0800)]
[PKTGEN]: Deinitialise static variables.

static variables should not be explicitly initialised to 0.  This causes
them to be placed in .data instead of .bss.  This patch de-initialises 3
static variables in net/core/pktgen.c.

There are approximately 800 more such variables in the source tree
(2.6.15rc5).  If there is more interrest I'd be willing to track down the
rest of these as well and de-initialise them as well.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NET]: move struct proto_ops to const
Eric Dumazet [Thu, 22 Dec 2005 20:49:22 +0000 (12:49 -0800)]
[NET]: move struct proto_ops to const

I noticed that some of 'struct proto_ops' used in the kernel may share
a cache line used by locks or other heavily modified data. (default
linker alignement is 32 bytes, and L1_CACHE_LINE is 64 or 128 at
least)

This patch makes sure a 'struct proto_ops' can be declared as const,
so that all cpus can share all parts of it without false sharing.

This is not mandatory : a driver can still use a read/write structure
if it needs to (and eventually a __read_mostly)

I made a global stubstitute to change all existing occurences to make
them const.

This should reduce the possibility of false sharing on SMP, and
speedup some socket system calls.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NET]: Small cleanup to socket initialization
Andi Kleen [Thu, 22 Dec 2005 20:43:42 +0000 (12:43 -0800)]
[NET]: Small cleanup to socket initialization

sock_init can be done as a core_initcall instead of calling
it directly in init/main.c

Also I removed an out of date #ifdef.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SCTP]: Add support for SCTP_DELAYED_ACK_TIME socket option.
Frank Filz [Thu, 22 Dec 2005 19:37:30 +0000 (11:37 -0800)]
[SCTP]: Add support for SCTP_DELAYED_ACK_TIME socket option.

Signed-off-by: Frank Filz <ffilz@us.ibm.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SCTP]: Update SCTP_PEER_ADDR_PARAMS socket option to the latest api draft.
Frank Filz [Thu, 22 Dec 2005 19:36:46 +0000 (11:36 -0800)]
[SCTP]: Update SCTP_PEER_ADDR_PARAMS socket option to the latest api draft.

This patch adds support to set/get heartbeat interval, maximum number of
retransmissions, pathmtu, sackdelay time for a particular transport/
association/socket as per the latest SCTP sockets api draft11.

Signed-off-by: Frank Filz <ffilz@us.ibm.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IPV4] fib_trie: Add credits.
Robert Olsson [Thu, 22 Dec 2005 19:25:10 +0000 (11:25 -0800)]
[IPV4] fib_trie: Add credits.

Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[TCP] cubic: use Newton-Raphson
Stephen Hemminger [Thu, 22 Dec 2005 03:32:36 +0000 (19:32 -0800)]
[TCP] cubic: use Newton-Raphson

Replace cube root algorithim with a faster version using Newton-Raphson.
Surprisingly, doing the scaled div64_64 is faster than a true 64 bit
division on 64 bit CPU's.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[TCP] cubic: precompute constants
Stephen Hemminger [Thu, 22 Dec 2005 03:32:08 +0000 (19:32 -0800)]
[TCP] cubic: precompute constants

Revised version of patch to pre-compute values for TCP cubic.
  * d32,d64 replaced with descriptive names
  * cube_factor replaces
 srtt[scaled by count] / HZ * ((1 << (10+2*BICTCP_HZ)) / bic_scale)
  * beta_scale replaces
8*(BICTCP_BETA_SCALE+beta)/3/(BICTCP_BETA_SCALE-beta);

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[FLS64]: x86_64 version
Stephen Hemminger [Thu, 22 Dec 2005 03:31:36 +0000 (19:31 -0800)]
[FLS64]: x86_64 version

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[FLS64]: generic version
Stephen Hemminger [Thu, 22 Dec 2005 03:30:53 +0000 (19:30 -0800)]
[FLS64]: generic version

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[PKT_SCHED] netem: packet corruption option
Stephen Hemminger [Thu, 22 Dec 2005 03:03:44 +0000 (19:03 -0800)]
[PKT_SCHED] netem: packet corruption option

Here is a new feature for netem in 2.6.16. It adds the ability to
randomly corrupt packets with netem. A version was done by
Hagen Paul Pfeifer, but I redid it to handle the cases of backwards
compatibility with netlink interface and presence of hardware checksum
offload. It is useful for testing hardware offload in devices.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[BRIDGE]: add version number
Stephen Hemminger [Thu, 22 Dec 2005 03:01:30 +0000 (19:01 -0800)]
[BRIDGE]: add version number

Add version info to bridge module.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[BRIDGE]: limited ethtool support
Stephen Hemminger [Thu, 22 Dec 2005 03:00:58 +0000 (19:00 -0800)]
[BRIDGE]: limited ethtool support

Add limited ethtool support to bridge to allow disabling
features.

Note: if underlying device does not support a feature (like checksum
offload), then the bridge device won't inherit it.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[BRIDGE]: filter packets in learning state
Stephen Hemminger [Thu, 22 Dec 2005 03:00:18 +0000 (19:00 -0800)]
[BRIDGE]: filter packets in learning state

While in the learning state, run filters but drop the result.
This prevents us from acquiring bad fdb entries in learning state.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[BRIDGE]: handle speed detection after carrier changes
Stephen Hemminger [Tue, 20 Dec 2005 23:19:51 +0000 (15:19 -0800)]
[BRIDGE]: handle speed detection after carrier changes

Speed of a interface may not be available until carrier
is detected in the case of autonegotiation. To get the correct value
we need to recheck speed after carrier event.  But the check needs to
be done in a context that is similar to normal ethtool interface (can sleep).

Also, delay check for 1ms to try avoid any carrier bounce transitions.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[BRIDGE]: allow setting hardware address of bridge pseudo-dev
Stephen Hemminger [Thu, 22 Dec 2005 02:51:49 +0000 (18:51 -0800)]
[BRIDGE]: allow setting hardware address of bridge pseudo-dev

Some people are using bridging to hide multiple machines from an ISP
that restricts by MAC address. So in that case allow the bridge mac
address to be set to any of the existing interfaces.  I don't want to
allow any arbitrary value and confuse STP.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[AF_UNIX]: Use spinlock for unix_table_lock
David S. Miller [Wed, 14 Dec 2005 07:26:29 +0000 (23:26 -0800)]
[AF_UNIX]: Use spinlock for unix_table_lock

This lock is actually taken mostly as a writer,
so using a rwlock actually just makes performance
worse especially on chips like the Intel P4.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IP_SOCKGLUE]: Remove most of the tcp specific calls
Arnaldo Carvalho de Melo [Wed, 14 Dec 2005 07:26:10 +0000 (23:26 -0800)]
[IP_SOCKGLUE]: Remove most of the tcp specific calls

As DCCP needs to be called in the same spots.

Now we have a member in inet_sock (is_icsk), set at sock creation time from
struct inet_protosw->flags (if INET_PROTOSW_ICSK is set, like for TCP and
DCCP) to see if a struct sock instance is a inet_connection_sock for places
like the ones in ip_sockglue.c (v4 and v6) where we previously were looking if
sk_type was SOCK_STREAM, that is insufficient because we now use the same code
for DCCP, that has sk_type SOCK_DCCP.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[TCP]: Move the TCPF_ enum to tcp_states.h
Arnaldo Carvalho de Melo [Wed, 14 Dec 2005 07:25:56 +0000 (23:25 -0800)]
[TCP]: Move the TCPF_ enum to tcp_states.h

Upcoming patches will make, for instance, ip_sockglue.c need just this enum
and not all of tcp.h.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[INET6]: Generalise tcp_v6_hash_connect
Arnaldo Carvalho de Melo [Wed, 14 Dec 2005 07:25:44 +0000 (23:25 -0800)]
[INET6]: Generalise tcp_v6_hash_connect

Renaming it to inet6_hash_connect, making it possible to ditch
dccp_v6_hash_connect and share the same code with TCP instead.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[INET]: Generalise tcp_v4_hash_connect
Arnaldo Carvalho de Melo [Wed, 14 Dec 2005 07:25:31 +0000 (23:25 -0800)]
[INET]: Generalise tcp_v4_hash_connect

Renaming it to inet_hash_connect, making it possible to ditch
dccp_v4_hash_connect and share the same code with TCP instead.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[TWSK]: Introduce struct timewait_sock_ops
Arnaldo Carvalho de Melo [Wed, 14 Dec 2005 07:25:19 +0000 (23:25 -0800)]
[TWSK]: Introduce struct timewait_sock_ops

So that we can share several timewait sockets related functions and
make the timewait mini sockets infrastructure closer to the request
mini sockets one.

Next changesets will take advantage of this, moving more code out of
TCP and DCCP v4 and v6 to common infrastructure.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[DCCP]: Use reqsk_free in dccp_v4_conn_request
Arnaldo Carvalho de Melo [Wed, 14 Dec 2005 07:25:06 +0000 (23:25 -0800)]
[DCCP]: Use reqsk_free in dccp_v4_conn_request

Now we have the destructor (dccp_v4_reqsk_destructor) in our
request_sock_ops vtable.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[DCCP]: Introduce DCCPv6
Arnaldo Carvalho de Melo [Wed, 14 Dec 2005 07:24:53 +0000 (23:24 -0800)]
[DCCP]: Introduce DCCPv6

Still needs mucho polishing, specially in the checksum code, but works
just fine, inet_diag/iproute2 and all 8)

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IPV6]: Export ipv6_opt_accepted
Arnaldo Carvalho de Melo [Wed, 14 Dec 2005 07:24:28 +0000 (23:24 -0800)]
[IPV6]: Export ipv6_opt_accepted

It was already non-TCP specific, will be used by DCCPv6.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[DCCP]: Prepare the AF agnostic core for the introduction of DCCPv6
Arnaldo Carvalho de Melo [Wed, 14 Dec 2005 07:24:16 +0000 (23:24 -0800)]
[DCCP]: Prepare the AF agnostic core for the introduction of DCCPv6

Basically exports a similar set of functions as the one exported by
the non-AF specific TCP code.

In the process moved some non-AF specific code from dccp_v4_connect to
dccp_connect_init and moved the checksum verification from
dccp_invalid_packet to dccp_v4_rcv, so as to use it in dccp_v6_rcv
too.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[DCCP]: Just rename dccp_v4_prot to dccp_prot
Arnaldo Carvalho de Melo [Wed, 14 Dec 2005 07:23:32 +0000 (23:23 -0800)]
[DCCP]: Just rename dccp_v4_prot to dccp_prot

To match TCP equivalent.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IPV6]: Export some symbols for DCCPv6
Arnaldo Carvalho de Melo [Wed, 14 Dec 2005 07:23:20 +0000 (23:23 -0800)]
[IPV6]: Export some symbols for DCCPv6

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IPV6]: Introduce inet6_timewait_sock
Arnaldo Carvalho de Melo [Wed, 14 Dec 2005 07:23:09 +0000 (23:23 -0800)]
[IPV6]: Introduce inet6_timewait_sock

Out of tcp6_timewait_sock, that now is just an aggregation of
inet_timewait_sock and inet6_timewait_sock, using tw_ipv6_offset in struct
inet_timewait_sock, that is common to the IPv6 transport protocols that use
timewait sockets, like DCCP and TCP.

tw_ipv6_offset plays the struct inet_sock pinfo6 role, i.e. for the generic
code to find the IPv6 area in a timewait sock.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IPV6]: Generalise some functions
Arnaldo Carvalho de Melo [Wed, 14 Dec 2005 07:22:54 +0000 (23:22 -0800)]
[IPV6]: Generalise some functions

Using sk->sk_protocol instead of IPPROTO_TCP.

Will be used by DCCPv6 in the next changesets.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[AF_UNIX]: Remove superfluous reference counting in unix_stream_sendmsg
Benjamin LaHaise [Wed, 14 Dec 2005 07:22:32 +0000 (23:22 -0800)]
[AF_UNIX]: Remove superfluous reference counting in unix_stream_sendmsg

AF_UNIX stream socket performance on P4 CPUs tends to suffer due to a
lot of pipeline flushes from atomic operations.  The patch below
removes the sock_hold() and sock_put() in unix_stream_sendmsg().  This
should be safe as the socket still holds a reference to its peer which
is only released after the file descriptor's final user invokes
unix_release_sock().  The only consideration is that we must add a
memory barrier before setting the peer initially.

Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NET]: Avoid atomic xchg() for non-error case
Benjamin LaHaise [Wed, 14 Dec 2005 07:22:19 +0000 (23:22 -0800)]
[NET]: Avoid atomic xchg() for non-error case

It also looks like there were 2 places where the test on sk_err was
missing from the event wait logic (in sk_stream_wait_connect and
sk_stream_wait_memory), while the rest of the sock_error() users look
to be doing the right thing.  This version of the patch fixes those,
and cleans up a few places that were testing ->sk_err directly.

Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IPVS]: remove dead code
Roberto Nibali [Wed, 14 Dec 2005 07:17:20 +0000 (23:17 -0800)]
[IPVS]: remove dead code

This patch removes dead code. I don't see the reason to keep this cruft
around, besides cluttering the nice and functionally working code.

Signed-off-by: Roberto Nibali <ratz@drugphish.ch>
Signed-off-by: Horms <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[UDP]: udp_checksum_init return value
Stephen Hemminger [Wed, 14 Dec 2005 07:17:02 +0000 (23:17 -0800)]
[UDP]: udp_checksum_init return value

Since udp_checksum_init always returns 0 there is no point in
having it return a value.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IP]: Simplify and consolidate MSG_PEEK error handling
Herbert Xu [Wed, 14 Dec 2005 07:16:37 +0000 (23:16 -0800)]
[IP]: Simplify and consolidate MSG_PEEK error handling

When a packet is obtained from skb_recv_datagram with MSG_PEEK enabled
it is left on the socket receive queue.  This means that when we detect
a checksum error we have to be careful when trying to free the packet
as someone could have dequeued it in the time being.

Currently this delicate logic is duplicated three times between UDPv4,
UDPv6 and RAWv6.  This patch moves them into a one place and simplifies
the code somewhat.

This is based on a suggestion by Eric Dumazet.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[DCCP]: Introduce dccp_ipv4_af_ops
Arnaldo Carvalho de Melo [Wed, 14 Dec 2005 07:16:16 +0000 (23:16 -0800)]
[DCCP]: Introduce dccp_ipv4_af_ops

And make the core DCCP code AF agnostic, just like TCP, now its time
to work on net/dccp/ipv6.c, we are close to the end!

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[ICSK]: Move v4_addr2sockaddr from TCP to icsk
Arnaldo Carvalho de Melo [Wed, 14 Dec 2005 07:16:04 +0000 (23:16 -0800)]
[ICSK]: Move v4_addr2sockaddr from TCP to icsk

Renaming it to inet_csk_addr2sockaddr.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[ICSK]: Rename struct tcp_func to struct inet_connection_sock_af_ops
Arnaldo Carvalho de Melo [Wed, 14 Dec 2005 07:15:52 +0000 (23:15 -0800)]
[ICSK]: Rename struct tcp_func to struct inet_connection_sock_af_ops

And move it to struct inet_connection_sock. DCCP will use it in the
upcoming changesets.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IPV6]: Introduce inet6_rsk()
Arnaldo Carvalho de Melo [Wed, 14 Dec 2005 07:15:40 +0000 (23:15 -0800)]
[IPV6]: Introduce inet6_rsk()

And inet6_rsk_offset in inet_request_sock, for the same reasons as
inet_sock's pinfo6 member.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IPV6]: Generalise tcp_v6_search_req & tcp_v6_synq_add
Arnaldo Carvalho de Melo [Wed, 14 Dec 2005 07:15:24 +0000 (23:15 -0800)]
[IPV6]: Generalise tcp_v6_search_req & tcp_v6_synq_add

More work is needed tho to introduce inet6_request_sock from
tcp6_request_sock, in the same layout considerations as ipv6_pinfo in
inet_sock, next changeset will do that.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[ICSK]: make inet_csk_reqsk_queue_hash_add timeout arg unsigned long
Arnaldo Carvalho de Melo [Wed, 14 Dec 2005 07:15:12 +0000 (23:15 -0800)]
[ICSK]: make inet_csk_reqsk_queue_hash_add timeout arg unsigned long

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IPV6]: Generalise __tcp_v6_hash, renaming it to __inet6_hash
Arnaldo Carvalho de Melo [Wed, 14 Dec 2005 07:15:01 +0000 (23:15 -0800)]
[IPV6]: Generalise __tcp_v6_hash, renaming it to __inet6_hash

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IPV6]: Reuse inet_csk_get_port in tcp_v6_get_port
Arnaldo Carvalho de Melo [Wed, 14 Dec 2005 07:14:47 +0000 (23:14 -0800)]
[IPV6]: Reuse inet_csk_get_port in tcp_v6_get_port

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IPV4]: Safer reassembly
Herbert Xu [Wed, 14 Dec 2005 07:14:27 +0000 (23:14 -0800)]
[IPV4]: Safer reassembly

Another spin of Herbert Xu's "safer ip reassembly" patch
for 2.6.16.

(The original patch is here:
http://marc.theaimsgroup.com/?l=linux-netdev&m=112281936522415&w=2
and my only contribution is to have tested it.)

This patch (optionally) does additional checks before accepting IP
fragments, which can greatly reduce the possibility of reassembling
fragments which originated from different IP datagrams.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Arthur Kepner <akepner@sgi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER] ebtables: Support nf_log API from ebt_log and ebt_ulog
Bart De Schuymer [Wed, 14 Dec 2005 07:14:08 +0000 (23:14 -0800)]
[NETFILTER] ebtables: Support nf_log API from ebt_log and ebt_ulog

This makes ebt_log and ebt_ulog use the new nf_log api.  This enables
the bridging packet filter to log packets e.g. via nfnetlink_log.

Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER] ip_tables: NUMA-aware allocation
Eric Dumazet [Wed, 14 Dec 2005 07:13:48 +0000 (23:13 -0800)]
[NETFILTER] ip_tables: NUMA-aware allocation

Part of a performance problem with ip_tables is that memory allocation
is not NUMA aware, but 'only' SMP aware (ie each CPU normally touch
separate cache lines)

Even with small iptables rules, the cost of this misplacement can be
high on common workloads.  Instead of using one vmalloc() area
(located in the node of the iptables process), we now allocate an area
for each possible CPU, using vmalloc_node() so that memory should be
allocated in the CPU's node if possible.

Port to arp_tables and ip6_tables by Harald Welte.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[TCP] BIC: CUBIC window growth (2.0)
Stephen Hemminger [Wed, 14 Dec 2005 07:13:28 +0000 (23:13 -0800)]
[TCP] BIC: CUBIC window growth (2.0)

Replace existing BIC version 1.1 with new version 2.0.
The main change is to replace the window growth function
with a cubic function as described in:
  http://www.csc.ncsu.edu/faculty/rhee/export/bitcp/cubic-paper.pdf

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[TCP] BIC: spelling and whitespace
Stephen Hemminger [Wed, 14 Dec 2005 07:13:13 +0000 (23:13 -0800)]
[TCP] BIC: spelling and whitespace

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[TCP] BIC: remove low utilization code.
Stephen Hemminger [Wed, 14 Dec 2005 07:13:00 +0000 (23:13 -0800)]
[TCP] BIC: remove low utilization code.

The latest BICTCP patch at:
http://www.csc.ncsu.edu:8080/faculty/rhee/export/bitcp/index_files/Page546.htm

disables the low_utilization feature of BICTCP because it doesn't work
in some cases. This patch removes it.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[LSM-IPSec]: Per-packet access control.
Trent Jaeger [Wed, 14 Dec 2005 07:12:40 +0000 (23:12 -0800)]
[LSM-IPSec]: Per-packet access control.

This patch series implements per packet access control via the
extension of the Linux Security Modules (LSM) interface by hooks in
the XFRM and pfkey subsystems that leverage IPSec security
associations to label packets.  Extensions to the SELinux LSM are
included that leverage the patch for this purpose.

This patch implements the changes necessary to the SELinux LSM to
create, deallocate, and use security contexts for policies
(xfrm_policy) and security associations (xfrm_state) that enable
control of a socket's ability to send and receive packets.

Patch purpose:

The patch is designed to enable the SELinux LSM to implement access
control on individual packets based on the strongly authenticated
IPSec security association.  Such access controls augment the existing
ones in SELinux based on network interface and IP address.  The former
are very coarse-grained, and the latter can be spoofed.  By using
IPSec, the SELinux can control access to remote hosts based on
cryptographic keys generated using the IPSec mechanism.  This enables
access control on a per-machine basis or per-application if the remote
machine is running the same mechanism and trusted to enforce the
access control policy.

Patch design approach:

The patch's main function is to authorize a socket's access to a IPSec
policy based on their security contexts.  Since the communication is
implemented by a security association, the patch ensures that the
security association's negotiated and used have the same security
context.  The patch enables allocation and deallocation of such
security contexts for policies and security associations.  It also
enables copying of the security context when policies are cloned.
Lastly, the patch ensures that packets that are sent without using a
IPSec security assocation with a security context are allowed to be
sent in that manner.

A presentation available at
www.selinux-symposium.org/2005/presentations/session2/2-3-jaeger.pdf
from the SELinux symposium describes the overall approach.

Patch implementation details:

The function which authorizes a socket to perform a requested
operation (send/receive) on a IPSec policy (xfrm_policy) is
selinux_xfrm_policy_lookup.  The Netfilter and rcv_skb hooks ensure
that if a IPSec SA with a securit y association has not been used,
then the socket is allowed to send or receive the packet,
respectively.

The patch implements SELinux function for allocating security contexts
when policies (xfrm_policy) are created via the pfkey or xfrm_user
interfaces via selinux_xfrm_policy_alloc.  When a security association
is built, SELinux allocates the security context designated by the
XFRM subsystem which is based on that of the authorized policy via
selinux_xfrm_state_alloc.

When a xfrm_policy is cloned, the security context of that policy, if
any, is copied to the clone via selinux_xfrm_policy_clone.

When a xfrm_policy or xfrm_state is freed, its security context, if
any is also freed at selinux_xfrm_policy_free or
selinux_xfrm_state_free.

Testing:

The SELinux authorization function is tested using ipsec-tools.  We
created policies and security associations with particular security
contexts and added SELinux access control policy entries to verify the
authorization decision.  We also made sure that packets for which no
security context was supplied (which either did or did not use
security associations) were authorized using an unlabelled context.

Signed-off-by: Trent Jaeger <tjaeger@cse.psu.edu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[LSM-IPSec]: Security association restriction.
Trent Jaeger [Wed, 14 Dec 2005 07:12:27 +0000 (23:12 -0800)]
[LSM-IPSec]: Security association restriction.

This patch series implements per packet access control via the
extension of the Linux Security Modules (LSM) interface by hooks in
the XFRM and pfkey subsystems that leverage IPSec security
associations to label packets.  Extensions to the SELinux LSM are
included that leverage the patch for this purpose.

This patch implements the changes necessary to the XFRM subsystem,
pfkey interface, ipv4/ipv6, and xfrm_user interface to restrict a
socket to use only authorized security associations (or no security
association) to send/receive network packets.

Patch purpose:

The patch is designed to enable access control per packets based on
the strongly authenticated IPSec security association.  Such access
controls augment the existing ones based on network interface and IP
address.  The former are very coarse-grained, and the latter can be
spoofed.  By using IPSec, the system can control access to remote
hosts based on cryptographic keys generated using the IPSec mechanism.
This enables access control on a per-machine basis or per-application
if the remote machine is running the same mechanism and trusted to
enforce the access control policy.

Patch design approach:

The overall approach is that policy (xfrm_policy) entries set by
user-level programs (e.g., setkey for ipsec-tools) are extended with a
security context that is used at policy selection time in the XFRM
subsystem to restrict the sockets that can send/receive packets via
security associations (xfrm_states) that are built from those
policies.

A presentation available at
www.selinux-symposium.org/2005/presentations/session2/2-3-jaeger.pdf
from the SELinux symposium describes the overall approach.

Patch implementation details:

On output, the policy retrieved (via xfrm_policy_lookup or
xfrm_sk_policy_lookup) must be authorized for the security context of
the socket and the same security context is required for resultant
security association (retrieved or negotiated via racoon in
ipsec-tools).  This is enforced in xfrm_state_find.

On input, the policy retrieved must also be authorized for the socket
(at __xfrm_policy_check), and the security context of the policy must
also match the security association being used.

The patch has virtually no impact on packets that do not use IPSec.
The existing Netfilter (outgoing) and LSM rcv_skb hooks are used as
before.

Also, if IPSec is used without security contexts, the impact is
minimal.  The LSM must allow such policies to be selected for the
combination of socket and remote machine, but subsequent IPSec
processing proceeds as in the original case.

Testing:

The pfkey interface is tested using the ipsec-tools.  ipsec-tools have
been modified (a separate ipsec-tools patch is available for version
0.5) that supports assignment of xfrm_policy entries and security
associations with security contexts via setkey and the negotiation
using the security contexts via racoon.

The xfrm_user interface is tested via ad hoc programs that set
security contexts.  These programs are also available from me, and
contain programs for setting, getting, and deleting policy for testing
this interface.  Testing of sa functions was done by tracing kernel
behavior.

Signed-off-by: Trent Jaeger <tjaeger@cse.psu.edu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years agoLinux v2.6.15 v2.6.15
Linus Torvalds [Tue, 3 Jan 2006 03:21:10 +0000 (19:21 -0800)]
Linux v2.6.15

Hey, it's fifteen years today since I bought the machine that got Linux
started.  January 2nd is a good date.

18 years ago[PATCH] Make sure interleave masks have at least one node set
Andi Kleen [Mon, 2 Jan 2006 23:07:28 +0000 (00:07 +0100)]
[PATCH] Make sure interleave masks have at least one node set

Otherwise a bad mem policy system call can confuse the interleaving
code into referencing undefined nodes.

Originally reported by Doug Chapman

I was told it's CVE-2005-3358
(one has to love these security people - they make everything sound important)

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] Avoid namespace pollution in <asm/param.h>
Dag-Erling Smørgrav [Mon, 2 Jan 2006 14:57:06 +0000 (15:57 +0100)]
[PATCH] Avoid namespace pollution in <asm/param.h>

In commit 3D59121003721a8fad11ee72e646fd9d3076b5679c, the x86 and x86-64
<asm/param.h> was changed to include <linux/config.h> for the
configurable timer frequency.

However, asm/param.h is sometimes used in userland (it is included
indirectly from <sys/param.h>), so your commit pollutes the userland
namespace with tons of CONFIG_FOO macros.  This greatly confuses
software packages (such as BusyBox) which use CONFIG_FOO macros
themselves to control the inclusion of optional features.

After a short exchange, Christoph approved this patch

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] powerpc: more g5 overtemp problem fix
Benjamin Herrenschmidt [Mon, 2 Jan 2006 02:04:44 +0000 (13:04 +1100)]
[PATCH] powerpc: more g5 overtemp problem fix

Some G5s still occasionally experience shutdowns due to overtemp
conditions despite the recent fix. After analyzing logs from such
machines, it appears that the overtemp code is a bit too quick at
shutting the machine down when reaching the critical temperature (tmax +
8) and doesn't leave the fan enough time to actually cool it down. This
happens if the temperature of a CPU suddenly rises too high in a very
short period of time, or occasionally on boot (that is the CPUs are
already overtemp by the time the driver loads).

This patches makes the code a bit more relaxed, leaving a few seconds to
the fans to do their job before kicking the machine shutown.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86: teach dump_task_regs() about the -8 offset.
Stas Sergeev [Sun, 1 Jan 2006 01:18:52 +0000 (04:18 +0300)]
[PATCH] x86: teach dump_task_regs() about the -8 offset.

This should fix multi-threaded core-files

Signed-off-by: stsp@aknet.ru
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years agosysctl: make sure to terminate strings with a NUL
Linus Torvalds [Sun, 1 Jan 2006 01:00:29 +0000 (17:00 -0800)]
sysctl: make sure to terminate strings with a NUL

This is a slightly more complete fix for the previous minimal sysctl
string fix.  It always terminates the returned string with a NUL, even
if the full result wouldn't fit in the user-supplied buffer.

The returned length is the full untruncated length, so that you can
tell when truncation has occurred.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years agoMerge master.kernel.org:/home/rmk/linux-2.6-serial
Linus Torvalds [Sat, 31 Dec 2005 21:49:26 +0000 (13:49 -0800)]
Merge master.kernel.org:/home/rmk/linux-2.6-serial

18 years ago[PATCH] Fix false old value return of sysctl
Yi Yang [Fri, 30 Dec 2005 08:37:10 +0000 (16:37 +0800)]
[PATCH] Fix false old value return of sysctl

For the sysctl syscall, if the user wants to get the old value of a
sysctl entry and set a new value for it in the same syscall, the old
value is always overwritten by the new value if the sysctl entry is of
string type and if the user sets its strategy to sysctl_string.  This
issue lies in the strategy being run twice if the strategy is set to
sysctl_string, the general strategy sysctl_string always returns 0 if
success.

Such strategy routines as sysctl_jiffies and sysctl_jiffies_ms return 1
because they do read and write for the sysctl entry.

The strategy routine sysctl_string return 0 although it actually read
and write the sysctl entry.

According to my analysis, if a strategy routine do read and write, it
should return 1, if it just does some necessary check but not read and
write, it should return 0, for example sysctl_intvec.

Signed-off-by: Yi Yang <yang.y.yi@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years agosysctl: don't overflow the user-supplied buffer with '\0'
Linus Torvalds [Sat, 31 Dec 2005 01:18:53 +0000 (17:18 -0800)]
sysctl: don't overflow the user-supplied buffer with '\0'

If the string was too long to fit in the user-supplied buffer,
the sysctl layer would zero-terminate it by writing past the
end of the buffer. Don't do that.

Noticed by Yi Yang <yang.y.yi@gmail.com>

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years agoInsanity avoidance in /proc
Linus Torvalds [Fri, 30 Dec 2005 16:39:10 +0000 (08:39 -0800)]
Insanity avoidance in /proc

The old /proc interfaces were never updated to use loff_t, and are just
generally broken.  Now, we should be using the seq_file interface for
all of the proc files, but converting the legacy functions is more work
than most people care for and has little upside..

But at least we can make the non-LFS rules explicit, rather than just
insanely wrapping the offset or something.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] Input: wacom - fix X axis setup
Denny Priebe [Fri, 30 Dec 2005 03:19:09 +0000 (22:19 -0500)]
[PATCH] Input: wacom - fix X axis setup

This patch fixes a typo introduced by conversion to dynamic input_dev
allocation.

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] Input: warrior - fix HAT0Y axis setup
Dmitry Torokhov [Fri, 30 Dec 2005 03:19:08 +0000 (22:19 -0500)]
[PATCH] Input: warrior - fix HAT0Y axis setup

This patch fixes a typo introduced by conversion to dynamic input_dev
allocation.

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] Input: kbtab - fix Y axis setup
Dmitry Torokhov [Fri, 30 Dec 2005 03:19:07 +0000 (22:19 -0500)]
[PATCH] Input: kbtab - fix Y axis setup

This patch fixes a typo introduced by conversion to dynamic input_dev
allocation.

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[ARM] 3216/1: indent and typo in drivers/serial/pxa.c
Erik Hovland [Fri, 30 Dec 2005 15:57:35 +0000 (15:57 +0000)]
[ARM] 3216/1: indent and typo in drivers/serial/pxa.c

Patch from Erik Hovland

This patch provides two changes. An indent is supplied for an if/else clause so that it is more readable. An acronym is incorrectly typed as UER when it should be IER.

Signed-off-by: Erik Hovland <erik@hovland.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
18 years ago[PATCH] Simplify the VIDEO_SAA7134_OSS Kconfig dependency line
Jean Delvare [Thu, 29 Dec 2005 21:07:30 +0000 (22:07 +0100)]
[PATCH] Simplify the VIDEO_SAA7134_OSS Kconfig dependency line

Thanks to Roman Zippel for the suggestion.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
[ Short explanation: Kconfig uses ternary math: n/m/y, and !m is m ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years agoRevert radeon AGP aperture offset changes
Linus Torvalds [Thu, 29 Dec 2005 21:01:54 +0000 (13:01 -0800)]
Revert radeon AGP aperture offset changes

This reverts the series of commits

67dbb4ea33731415fe09c62149a34f472719ac1d
281ab031a8c9e5b593142eb4ec59a87faae8676a
47807ce381acc34a7ffee2b42e35e96c0f322e52

that changed the GART VM start offset.  It fixed some machines, but
seems to continually interact badly with some X versions.

Quoth Ben Herrenschmidt:

  "So I think at this point, the best is that we keep the old bogus code
   that at least is consistent with the bug in the server. I'm working on a
   big patch to X that reworks the memory map stuff completely and fixes
   those issues on the server side, I'll do a DRM patch matching this X fix
   as well so that the memory map is only ever set in one place and with
   what I hope is a correct algorithm..."

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years agoMerge master.kernel.org:/home/rmk/linux-2.6-mmc
Linus Torvalds [Thu, 29 Dec 2005 18:27:28 +0000 (10:27 -0800)]
Merge master.kernel.org:/home/rmk/linux-2.6-mmc

18 years agoMerge master.kernel.org:/home/rmk/linux-2.6-serial
Linus Torvalds [Thu, 29 Dec 2005 18:27:07 +0000 (10:27 -0800)]
Merge master.kernel.org:/home/rmk/linux-2.6-serial

18 years ago[PATCH] Fix recursive config dependency for SAA7134
Jean Delvare [Wed, 28 Dec 2005 20:02:57 +0000 (21:02 +0100)]
[PATCH] Fix recursive config dependency for SAA7134

Fix the cyclic dependency issue between CONFIG_SAA7134_ALSA and
CONFIG_SAA7134_OSS (credits to Mauro Carvalho Chehab.)

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] ppc64: htab_initialize_secondary cannot be marked __init
Anton Blanchard [Wed, 28 Dec 2005 23:46:29 +0000 (10:46 +1100)]
[PATCH] ppc64: htab_initialize_secondary cannot be marked __init

Sonny has noticed hotplug CPU on ppc64 is broken in 2.6.15-*. One of the
problems is that htab_initialize_secondary is called when a cpu is being
brought up, but it is marked __init.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>