err.no Git - linux-2.6/log
linux-2.6
17 years ago[NETFILTER]: nfnetlink_queue: don't unregister handler of other subsystem
Yasuyuki Kozakai [Sun, 8 Jul 2007 05:40:08 +0000 (22:40 -0700)]
[NETFILTER]: nfnetlink_queue: don't unregister handler of other subsystem

The queue handlers registered by ip[6]_queue.ko at initialization should
not be unregistered in response to requests from userland programs
using nfnetlink_queue. If we allow that, there is no way to register
the handlers of built-in ip[6]_queue again.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: Convert DEBUGP to pr_debug
Patrick McHardy [Sun, 8 Jul 2007 05:39:38 +0000 (22:39 -0700)]
[NETFILTER]: Convert DEBUGP to pr_debug

Convert DEBUGP to pr_debug and fix lots of non-compiling debug statements.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: xt_helper: use RCU
Patrick McHardy [Sun, 8 Jul 2007 05:39:16 +0000 (22:39 -0700)]
[NETFILTER]: xt_helper: use RCU

The ->helper pointer is protected by RCU, no need to take
nf_conntrack_lock. Also remove excessive debugging.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_conntrack_h323: turn some printks into DEBUGPs
Patrick McHardy [Sun, 8 Jul 2007 05:38:54 +0000 (22:38 -0700)]
[NETFILTER]: nf_conntrack_h323: turn some printks into DEBUGPs

Don't spam the ringbuffer with decoding errors. The only printks remaining
are for dropped packets when we're certain they are H.323.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: ipt_CLUSTERIP: add compat code
Patrick McHardy [Sun, 8 Jul 2007 05:38:30 +0000 (22:38 -0700)]
[NETFILTER]: ipt_CLUSTERIP: add compat code

Adjust structure size and don't expect pointers passed in from
userspace to be valid. Also replace an enum in an ABI structure
by a fixed size type.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: ipt_SAME: add to feature-removal-schedule
Patrick McHardy [Sun, 8 Jul 2007 05:38:07 +0000 (22:38 -0700)]
[NETFILTER]: ipt_SAME: add to feature-removal-schedule

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_conntrack: early_drop improvement
Patrick McHardy [Sun, 8 Jul 2007 05:37:38 +0000 (22:37 -0700)]
[NETFILTER]: nf_conntrack: early_drop improvement

When the maximum number of conntrack entries is reached and a new
one needs to be allocated, conntrack tries to drop an unassured
connection from the same hash bucket the new conntrack would hash
to. Since with a properly sized hash the average number of entries
per bucket is 1, the chances of actually finding one are not very
good. This patch makes it walk the hash until a minimum number of
8 entries are checked.

Based on patch by Vasily Averin <vvs@sw.ru>.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_conntrack: mark helpers __read_mostly
Patrick McHardy [Sun, 8 Jul 2007 05:37:03 +0000 (22:37 -0700)]
[NETFILTER]: nf_conntrack: mark helpers __read_mostly

Most are __read_mostly already, this changes the remaining ones.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_conntrack_helper: use hashtable for conntrack helpers
Patrick McHardy [Sun, 8 Jul 2007 05:36:46 +0000 (22:36 -0700)]
[NETFILTER]: nf_conntrack_helper: use hashtable for conntrack helpers

Eliminate the last global list searched for every new connection.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_conntrack_expect: introduce nf_conntrack_expect_max sysctl
Patrick McHardy [Sun, 8 Jul 2007 05:36:24 +0000 (22:36 -0700)]
[NETFILTER]: nf_conntrack_expect: introduce nf_conntrack_expect_max sysctl

As a last step of preventing DoS by creating lots of expectations, this
patch introduces a global maximum and a sysctl to control it. The default
is initialized to 4 * the expectation hash table size, which results in
1/64 of the default maximum of conntracks.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_conntrack_expect: maintain per conntrack expectation list
Patrick McHardy [Sun, 8 Jul 2007 05:35:56 +0000 (22:35 -0700)]
[NETFILTER]: nf_conntrack_expect: maintain per conntrack expectation list

This patch brings back the per-conntrack expectation list that was
removed around 2.6.10 to avoid walking all expectations on expectation
eviction and conntrack destruction.

As these were the last users of the global expectation list, this patch
also kills that.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_conntrack_helper/nf_conntrack_netlink: convert to expectation hash
Patrick McHardy [Sun, 8 Jul 2007 05:35:21 +0000 (22:35 -0700)]
[NETFILTER]: nf_conntrack_helper/nf_conntrack_netlink: convert to expectation hash

Convert from the global expectation list to the hash table.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_conntrack_expect: convert proc functions to hash
Patrick McHardy [Sun, 8 Jul 2007 05:34:07 +0000 (22:34 -0700)]
[NETFILTER]: nf_conntrack_expect: convert proc functions to hash

Convert from the global expectation list to the hash table.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_conntrack: use hashtable for expectations
Patrick McHardy [Sun, 8 Jul 2007 05:33:47 +0000 (22:33 -0700)]
[NETFILTER]: nf_conntrack: use hashtable for expectations

Currently all expectations are kept on a global list that

- needs to be searched for every new connection
- needs to be walked for evicting expectations when a master connection
  has reached its limit
- needs to be walked on connection destruction for connections that
  have open expectations

This is obviously not good, especially when considering helpers like
H.323 that register *lots* of expectations and can set up permanent
expectations, but it also allows for an easy DoS against firewalls
using connection tracking helpers.

Use a hashtable for expectations to avoid incurring the search overhead
for every new connection. The default hash size is 1/256 of the conntrack
hash table size; this can be overridden using a module parameter.

This patch only introduces the hash table for expectation lookups and
leaves the other users of the global list untouched to reduce the noise;
the following patches will get rid of the list completely.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_conntrack: move expectation related init code to nf_conntrack_expect.c
Patrick McHardy [Sun, 8 Jul 2007 05:32:53 +0000 (22:32 -0700)]
[NETFILTER]: nf_conntrack: move expectation related init code to nf_conntrack_expect.c

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_conntrack_netlink: sync expectation dumping with conntrack table...
Patrick McHardy [Sun, 8 Jul 2007 05:32:34 +0000 (22:32 -0700)]
[NETFILTER]: nf_conntrack_netlink: sync expectation dumping with conntrack table dumping

Resync expectation table dumping code with conntrack dumping: don't
rely on the unique ID anymore since that requires walking the list
backwards, which doesn't work with the upcoming conversion to hlists.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_conntrack_expect: avoid useless list walking
Patrick McHardy [Sun, 8 Jul 2007 05:32:03 +0000 (22:32 -0700)]
[NETFILTER]: nf_conntrack_expect: avoid useless list walking

Don't walk the list when unexpecting an expectation, we already
have a reference and the timer check is enough to guarantee
that it still is on the list.

This comment suggests that it was copied there by mistake from
expectation eviction:

/* choose the oldest expectation to evict */

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_conntrack: reduce masks to a subset of tuples
Patrick McHardy [Sun, 8 Jul 2007 05:31:32 +0000 (22:31 -0700)]
[NETFILTER]: nf_conntrack: reduce masks to a subset of tuples

Since conntrack currently allows masks to be used for every bit of both
helper and expectation tuples, we can't hash them and have to keep
them on two global lists that are searched for every new connection.

This patch removes the never used ability to use masks for the
destination part of the expectation tuple and completely removes
masks from helpers since the only reasonable choice is a full
match on l3num, protonum and src.u.all.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_conntrack_ftp: use nf_ct_expect_init
Patrick McHardy [Sun, 8 Jul 2007 05:31:07 +0000 (22:31 -0700)]
[NETFILTER]: nf_conntrack_ftp: use nf_ct_expect_init

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_conntrack_expect: function naming unification
Patrick McHardy [Sun, 8 Jul 2007 05:30:49 +0000 (22:30 -0700)]
[NETFILTER]: nf_conntrack_expect: function naming unification

Currently there is a wild mix of nf_conntrack_expect_, nf_ct_exp_,
expect_, exp_, ...

Consistently use nf_ct_ as prefix for exported functions.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_nat: use hlists for bysource hash
Patrick McHardy [Sun, 8 Jul 2007 05:30:27 +0000 (22:30 -0700)]
[NETFILTER]: nf_nat: use hlists for bysource hash

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_conntrack: export hash allocation/destruction functions
Patrick McHardy [Sun, 8 Jul 2007 05:30:08 +0000 (22:30 -0700)]
[NETFILTER]: nf_conntrack: export hash allocation/destruction functions

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_conntrack: remove 'ignore_conntrack' argument from nf_conntrack_find_get
Patrick McHardy [Sun, 8 Jul 2007 05:28:42 +0000 (22:28 -0700)]
[NETFILTER]: nf_conntrack: remove 'ignore_conntrack' argument from nf_conntrack_find_get

All callers pass NULL, this also doesn't seem very useful for modules.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_conntrack: use hlists for conntrack hash
Patrick McHardy [Sun, 8 Jul 2007 05:28:14 +0000 (22:28 -0700)]
[NETFILTER]: nf_conntrack: use hlists for conntrack hash

Convert conntrack hash to hlists to reduce its size and cache
footprint. Since the default hashsize to max. entries ratio
sucks (1:16), this patch doesn't reduce the amount of memory
used for the hash by default, but instead uses a better ratio
of 1:8, which results in the same max. entries value.

One thing worth noting is early_drop. It really should use LRU,
so it now has to iterate over the entire chain to find the last
unconfirmed entry. Since chains shouldn't be very long and the
entire operation is very rare this shouldn't be a problem.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_conntrack: round up hashsize to next multiple of PAGE_SIZE
Patrick McHardy [Sun, 8 Jul 2007 05:27:33 +0000 (22:27 -0700)]
[NETFILTER]: nf_conntrack: round up hashsize to next multiple of PAGE_SIZE

Don't let the rest of the page go to waste.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_conntrack_extend: use __read_mostly for struct nf_ct_ext_type
Patrick McHardy [Sun, 8 Jul 2007 05:27:06 +0000 (22:27 -0700)]
[NETFILTER]: nf_conntrack_extend: use __read_mostly for struct nf_ct_ext_type

Also make them static.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_nat: merge nf_conn and nf_nat_info
Yasuyuki Kozakai [Sun, 8 Jul 2007 05:26:35 +0000 (22:26 -0700)]
[NETFILTER]: nf_nat: merge nf_conn and nf_nat_info

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_nat: kill global 'destroy' operation
Yasuyuki Kozakai [Sun, 8 Jul 2007 05:26:16 +0000 (22:26 -0700)]
[NETFILTER]: nf_nat: kill global 'destroy' operation

This kills the global 'destroy' operation which was used by NAT.
Instead it uses the extension infrastructure so that multiple
extensions can register their own operations.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_conntrack: remove old memory allocator of conntrack
Yasuyuki Kozakai [Sun, 8 Jul 2007 05:25:51 +0000 (22:25 -0700)]
[NETFILTER]: nf_conntrack: remove old memory allocator of conntrack

Memory space for help and NAT is now allocated by the extension
infrastructure.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_nat: remove unused nf_nat_module_is_loaded
Yasuyuki Kozakai [Sun, 8 Jul 2007 05:25:28 +0000 (22:25 -0700)]
[NETFILTER]: nf_nat: remove unused nf_nat_module_is_loaded

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_nat: use extension infrastructure
Yasuyuki Kozakai [Sun, 8 Jul 2007 05:24:28 +0000 (22:24 -0700)]
[NETFILTER]: nf_nat: use extension infrastructure

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_nat: add reference to conntrack from entry of bysource list
Yasuyuki Kozakai [Sun, 8 Jul 2007 05:24:04 +0000 (22:24 -0700)]
[NETFILTER]: nf_nat: add reference to conntrack from entry of bysource list

I will split 'struct nf_nat_info' out from conntrack. So I cannot use
'offsetof' to get the pointer to conntrack from it.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_conntrack: use extension infrastructure for helper
Yasuyuki Kozakai [Sun, 8 Jul 2007 05:23:42 +0000 (22:23 -0700)]
[NETFILTER]: nf_conntrack: use extension infrastructure for helper

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_conntrack: introduce extension infrastructure
Yasuyuki Kozakai [Sun, 8 Jul 2007 05:23:21 +0000 (22:23 -0700)]
[NETFILTER]: nf_conntrack: introduce extension infrastructure

The old conntrack space allocator had extensibility problems:
- It required a slab cache per combination of extensions.
- It had to predict which extensions would be assigned. Since that
  could not be predicted completely, we allocated bigger memory objects
  than really required.
- It needed to search for the helper twice due to a locking issue.

Now the basic information of a connection is stored in 'struct nf_conn',
and storage for extensions (helper, NAT) is allocated by kmalloc.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_nat: move NAT declarations from nf_conntrack_ipv4.h to nf_nat.h
Yasuyuki Kozakai [Sun, 8 Jul 2007 05:22:33 +0000 (22:22 -0700)]
[NETFILTER]: nf_nat: move NAT declarations from nf_conntrack_ipv4.h to nf_nat.h

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: x_tables: mark matches and targets __read_mostly
Patrick McHardy [Sun, 8 Jul 2007 05:22:02 +0000 (22:22 -0700)]
[NETFILTER]: x_tables: mark matches and targets __read_mostly

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: x_tables: add TRACE target
Jozsef Kadlecsik [Sun, 8 Jul 2007 05:21:23 +0000 (22:21 -0700)]
[NETFILTER]: x_tables: add TRACE target

The TRACE target can be used to follow IP and IPv6 packets through
the ruleset.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: Add u32 match
Jan Engelhardt [Sun, 8 Jul 2007 05:20:36 +0000 (22:20 -0700)]
[NETFILTER]: Add u32 match

Along comes... xt_u32, a revamped ipt_u32 from POM-NG,
Plus:

    * 2007-06-02: added ipv6 support
    * 2007-06-05: uses kmalloc for the big buffer
    * 2007-06-05: added inversion
    * 2007-06-20: use skb_copy_bits() and get rid of the big buffer
      and lock (suggested by Pablo Neira Ayuso)

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_nat_sip: only perform RTP DNAT if SIP session was SNATed
Jerome Borsboom [Sun, 8 Jul 2007 05:19:48 +0000 (22:19 -0700)]
[NETFILTER]: nf_nat_sip: only perform RTP DNAT if SIP session was SNATed

DNAT of the RTP session is only necessary if the SIP session has
been SNATed.

Signed-off-by: Jerome Borsboom <j.borsboom@erasmusmc.nl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: Remove redundant parentheses/braces
Jan Engelhardt [Sun, 8 Jul 2007 05:19:08 +0000 (22:19 -0700)]
[NETFILTER]: Remove redundant parentheses/braces

Remove redundant parentheses and braces (and add one pair in an
xt_tcpudp.c macro).

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: Remove incorrect inline markers
Jan Engelhardt [Sun, 8 Jul 2007 05:17:36 +0000 (22:17 -0700)]
[NETFILTER]: Remove incorrect inline markers

device_cmp: the function's address is taken (call to nf_ct_iterate_cleanup)
alloc_null_binding: referenced externally

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: add some consts, remove some casts
Jan Engelhardt [Sun, 8 Jul 2007 05:16:55 +0000 (22:16 -0700)]
[NETFILTER]: add some consts, remove some casts

Make a number of variables const and/or remove unneeded casts.

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: x_tables: switch xt_target->checkentry to bool
Jan Engelhardt [Sun, 8 Jul 2007 05:16:26 +0000 (22:16 -0700)]
[NETFILTER]: x_tables: switch xt_target->checkentry to bool

Switch the return type of target checkentry functions to boolean.

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: x_tables: switch xt_match->checkentry to bool
Jan Engelhardt [Sun, 8 Jul 2007 05:16:00 +0000 (22:16 -0700)]
[NETFILTER]: x_tables: switch xt_match->checkentry to bool

Switch the return type of match checkentry functions to boolean.

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: x_tables: switch xt_match->match to bool
Jan Engelhardt [Sun, 8 Jul 2007 05:15:35 +0000 (22:15 -0700)]
[NETFILTER]: x_tables: switch xt_match->match to bool

Switch the return type of match functions to boolean

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: x_tables: switch hotdrop to bool
Jan Engelhardt [Sun, 8 Jul 2007 05:15:12 +0000 (22:15 -0700)]
[NETFILTER]: x_tables: switch hotdrop to bool

Switch the "hotdrop" variables to boolean

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: ip6_tables: fix explanation of valid upper protocol number
Yasuyuki Kozakai [Sun, 8 Jul 2007 05:14:23 +0000 (22:14 -0700)]
[NETFILTER]: ip6_tables: fix explanation of valid upper protocol number

This explains the allowed upper protocol numbers. IP6T_F_NOPROTO was
introduced to use 0 as Hop-by-Hop option header, not wildcard. But that
seemed to be forgotten. 0 has been used as wildcard since 2002-08-23.

Signed-off-by: Yasuyuki Kozakai <yasuyuki@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: nf_conntrack_h323: check range first in sequence extension
Jing Min Zhao [Sun, 8 Jul 2007 05:13:17 +0000 (22:13 -0700)]
[NETFILTER]: nf_conntrack_h323: check range first in sequence extension

Check range before checking STOP flag. This optimization may save a
nanosecond or less :)

Signed-off-by: Jing Min Zhao <zhaojingmin@vivecode.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[UDP]: Cleanup UDP encapsulation code
James Chapman [Fri, 6 Jul 2007 00:08:05 +0000 (17:08 -0700)]
[UDP]: Cleanup UDP encapsulation code

This cleanup fell out after adding L2TP support where a new encap_rcv
funcptr was added to struct udp_sock. Have XFRM use the new encap_rcv
funcptr, which allows us to move the XFRM encap code from udp.c into
xfrm4_input.c.

Make xfrm4_rcv_encap() static since it is no longer called externally.

Signed-off-by: James Chapman <jchapman@katalix.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[IrDA]: tsap init routine factorisation.
G. Liakhovetski [Tue, 3 Jul 2007 05:56:57 +0000 (22:56 -0700)]
[IrDA]: tsap init routine factorisation.

This patch extracts common code from irttp_open_tsap() and irttp_dup()
into a new function to 1) avoid code duplication, 2) help avoid
forgetting object initialization in the tsap duplication path in the
future.

Signed-off-by: G. Liakhovetski <gl@dsa-ac.de>
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[IrDA]: kingsun-sir.c charset fix.
Samuel Ortiz [Tue, 3 Jul 2007 05:56:15 +0000 (22:56 -0700)]
[IrDA]: kingsun-sir.c charset fix.

Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[IrDA]: Monitor mode.
Samuel Ortiz [Tue, 3 Jul 2007 05:55:31 +0000 (22:55 -0700)]
[IrDA]: Monitor mode.

Through the IrDA netlink set mode command, we switch to IrDA monitor
mode, where one IrLAP instance receives all the packets on the media,
without ever responding to them.

Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[IrDA]: Netlink layer.
Samuel Ortiz [Tue, 3 Jul 2007 05:54:18 +0000 (22:54 -0700)]
[IrDA]: Netlink layer.

First IrDA configuration netlink layer implementation.
Currently, we only support the set/get mode commands.

Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NET]: Allow group ownership of TUN/TAP devices.
Guido Guenther [Tue, 3 Jul 2007 05:50:25 +0000 (22:50 -0700)]
[NET]: Allow group ownership of TUN/TAP devices.

Introduce a new ioctl, TUNSETGROUP, for setting group ownership of tap
devices. A user is now allowed to send packets if either his euid or
his egid matches the one specified via tunctl (via -u or -g
respectively). If both gid and uid are set via tunctl, both have to
match.

Signed-off-by: Guido Guenther <agx@sigxcpu.org>
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NET_SCHED]: Remove unnecessary includes
Patrick McHardy [Tue, 3 Jul 2007 05:49:07 +0000 (22:49 -0700)]
[NET_SCHED]: Remove unnecessary includes

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NET_SCHED]: sch_htb: use generic estimator
Patrick McHardy [Tue, 3 Jul 2007 05:48:13 +0000 (22:48 -0700)]
[NET_SCHED]: sch_htb: use generic estimator

Use the generic estimator instead of reimplementing (parts of) it.
For compatibility always create a default estimator for new classes.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NET_SCHED]: Remove unnecessary stats_lock pointers
Patrick McHardy [Tue, 3 Jul 2007 05:47:37 +0000 (22:47 -0700)]
[NET_SCHED]: Remove unnecessary stats_lock pointers

Remove stats_lock pointers from qdisc-internal structures, in all cases
it points to dev->queue_lock. The only case where it is necessary is for
top-level qdiscs, where it might also point to dev->ingress_lock in case
of the ingress qdisc. Also remove it from actions completely, it always
points to the actions internal lock.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NET_SCHED]: Remove CONFIG_NET_ESTIMATOR option
Patrick McHardy [Tue, 3 Jul 2007 05:46:07 +0000 (22:46 -0700)]
[NET_SCHED]: Remove CONFIG_NET_ESTIMATOR option

The generic estimator is always built in anyway, and all the config option
does is prevent including a minimal amount of code for setting it up.
Additionally the option is already automatically selected for most cases.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[PKTGEN]: IPSEC support
Jamal Hadi Salim [Tue, 3 Jul 2007 05:41:59 +0000 (22:41 -0700)]
[PKTGEN]: IPSEC support

Added transport mode ESP support for starters.  I will send more of
these modes and types once I have resolved the tunnel mode issues.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[XFRM] Introduce standalone SAD lookup
Jamal Hadi Salim [Tue, 3 Jul 2007 05:41:14 +0000 (22:41 -0700)]
[XFRM] Introduce standalone SAD lookup

This allows other in-kernel functions to do SAD lookups.
The only known user at the moment is pktgen.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[PKTGEN]: Introduce sequential flows
Jamal Hadi Salim [Tue, 3 Jul 2007 05:40:36 +0000 (22:40 -0700)]
[PKTGEN]: Introduce sequential flows

By default all flows in pktgen are randomly selected.
This patch introduces the ability to have all defined flows
sent sequentially. Robert defined randomness to be the
default behavior.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[PKTGEN]: Centralize packet overhead tracking
Jamal Hadi Salim [Tue, 3 Jul 2007 05:39:50 +0000 (22:39 -0700)]
[PKTGEN]: Centralize packet overhead tracking

Track the extra packet overhead for VLAN tags, MPLS, IPSEC etc

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[MAC80211]: Set low initial rate in rc80211_simple
Larry Finger [Tue, 3 Jul 2007 05:36:38 +0000 (22:36 -0700)]
[MAC80211]: Set low initial rate in rc80211_simple

The initial rate for STA's using rc80211_simple is set to the last
rate in the rate table. For situations for which the signal is weak,
the rate may be too high for authentication and association. Although
the rc80211_simple module will adjust the speed, the response may not
be fast enough for a successful connection. This modification sets the
initial rate to the lowest supported value.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[TCP]: SACK fastpath did override adjusted fackets_out
Ilpo Järvinen [Tue, 3 Jul 2007 05:07:22 +0000 (22:07 -0700)]
[TCP]: SACK fastpath did override adjusted fackets_out

Do same adjustment to SACK fastpath counters provided that
they're valid.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NET]: Fix secondary unicast/multicast address count maintenance
Patrick McHardy [Sat, 30 Jun 2007 20:35:52 +0000 (13:35 -0700)]
[NET]: Fix secondary unicast/multicast address count maintenance

When a reference to an existing address is increased or decreased without
hitting zero, the address count is incorrectly adjusted.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SCHED]: Qdisc changes and sch_rr added for multiqueue
Peter P Waskiewicz Jr [Fri, 29 Jun 2007 04:04:31 +0000 (21:04 -0700)]
[SCHED]: Qdisc changes and sch_rr added for multiqueue

Add the new sch_rr qdisc for multiqueue network device support.  Allow
sch_prio and sch_rr to be compiled with or without multiqueue hardware
support.

sch_rr is part of sch_prio, and is referenced from MODULE_ALIAS.  This
was done since sch_prio and sch_rr only differ in their dequeue
routine.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[CORE] Stack changes to add multiqueue hardware support API
Peter P Waskiewicz Jr [Fri, 6 Jul 2007 20:36:20 +0000 (13:36 -0700)]
[CORE] Stack changes to add multiqueue hardware support API

Add the multiqueue hardware device support API to the core network
stack.  Allow drivers to allocate multiple queues and manage them at
the netdev level if they choose to do so.

Added a new field to sk_buff, namely queue_mapping, for drivers to
know which tx_ring to select based on OS classification of the flow.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NET]: [DOC] Multiqueue hardware support documentation
Peter P Waskiewicz Jr [Fri, 29 Jun 2007 03:45:47 +0000 (20:45 -0700)]
[NET]: [DOC] Multiqueue hardware support documentation

Add a brief howto to Documentation/networking for multiqueue.  It
explains how to use the multiqueue API in a driver to support
multiqueue paths from the stack, as well as the qdiscs to use for
feeding a multiqueue device.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NET]: Fix TX checksum feature check
Herbert Xu [Thu, 28 Jun 2007 20:44:37 +0000 (13:44 -0700)]
[NET]: Fix TX checksum feature check

This patch fixes a boolean error in the new TX checksum check
that causes bogus TSO packets to be generated.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[L2TP]: Add PPPoL2TP in-kernel documentation
James Chapman [Wed, 27 Jun 2007 22:53:49 +0000 (15:53 -0700)]
[L2TP]: Add PPPoL2TP in-kernel documentation

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[L2TP]: Add PPPoL2TP maintainer
James Chapman [Wed, 27 Jun 2007 22:53:17 +0000 (15:53 -0700)]
[L2TP]: Add PPPoL2TP maintainer

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[PPPOL2TP]: Use proper printf format specifier for size_t.
David S. Miller [Wed, 27 Jun 2007 22:52:25 +0000 (15:52 -0700)]
[PPPOL2TP]: Use proper printf format specifier for size_t.

Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[L2TP]: PPP over L2TP driver core
James Chapman [Wed, 27 Jun 2007 22:49:24 +0000 (15:49 -0700)]
[L2TP]: PPP over L2TP driver core

This driver handles only L2TP data frames; control frames are handled
by a userspace application. It implements L2TP using the PPPoX socket
family. There is a PPPoX socket for each L2TP session in an L2TP
tunnel.  PPP data within each session is passed through the kernel's
PPP subsystem via this driver. Kernel parameters of each socket can be
read or modified using ioctl() or [gs]etsockopt() calls.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[L2TP]: Changes to existing ppp and socket kernel headers for L2TP
James Chapman [Wed, 27 Jun 2007 22:43:43 +0000 (15:43 -0700)]
[L2TP]: Changes to existing ppp and socket kernel headers for L2TP

Add struct sockaddr_pppol2tp to carry L2TP-specific address
information for the PPPoX (PPPoL2TP) socket. Unfortunately we can't
use the union inside struct sockaddr_pppox because the L2TP-specific
data is larger than the current size of the union and we must preserve
the size of struct sockaddr_pppox for binary compatibility.

Also add a PPPIOCGL2TPSTATS ioctl to allow userspace to obtain
L2TP counters and state from the kernel.

Add new if_pppol2tp.h header.

[ Modified to use aligned_u64 in statistics structure -DaveM ]

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[UDP]: Introduce UDP encapsulation type for L2TP
James Chapman [Wed, 27 Jun 2007 22:37:46 +0000 (15:37 -0700)]
[UDP]: Introduce UDP encapsulation type for L2TP

This patch adds a new UDP_ENCAP_L2TPINUDP encapsulation type for UDP
sockets. When a UDP socket's encap_type is UDP_ENCAP_L2TPINUDP, the
skb is delivered to a function pointed to by the udp_sock's
encap_rcv funcptr. If the skb isn't wanted by L2TP, it returns >0, which
causes it to be passed through to UDP.

Include padding to put the new encap_rcv field on a 4-byte boundary.

Previously, the only user of UDP encap sockets was ESP, so when
CONFIG_XFRM was not defined, some of the encap code was compiled
out. This patch changes that. As a result, udp_encap_rcv() will
now do a little more work when CONFIG_XFRM is not defined.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
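
The receive path described above can be sketched as a function-pointer
dispatch. This is a simplified model, not the kernel code: toy_skb,
toy_udp_sock, toy_l2tp_rcv and toy_udp_rcv are hypothetical, and the
numeric value of TOY_ENCAP_L2TPINUDP is only illustrative. The convention
that a return value greater than zero passes the skb on to UDP is taken
from the commit message.

```c
#include <stddef.h>

#define TOY_ENCAP_L2TPINUDP 3  /* illustrative value */

struct toy_skb {
	int is_l2tp;             /* 1 if the payload is an L2TP data frame */
};

struct toy_udp_sock {
	int encap_type;                         /* 0 means no encapsulation */
	int (*encap_rcv)(struct toy_skb *skb);  /* installed by the encap user */
	int udp_delivered;                      /* packets handled by plain UDP */
};

/* Hypothetical L2TP handler: consume L2TP frames (return 0) and return
 * a value > 0 for anything else so that UDP handles it. */
static int toy_l2tp_rcv(struct toy_skb *skb)
{
	return skb->is_l2tp ? 0 : 1;
}

/* Returns 1 if the packet went through normal UDP processing, 0 if the
 * encapsulation handler consumed it. */
static int toy_udp_rcv(struct toy_udp_sock *up, struct toy_skb *skb)
{
	if (up->encap_type == TOY_ENCAP_L2TPINUDP && up->encap_rcv != NULL) {
		if (up->encap_rcv(skb) <= 0)
			return 0;
	}
	up->udp_delivered++;
	return 1;
}
```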
17 years ago[NET]: dev: secondary unicast address support
Patrick McHardy [Wed, 27 Jun 2007 08:28:10 +0000 (01:28 -0700)]
[NET]: dev: secondary unicast address support

Add support for configuring secondary unicast addresses on network
devices. To support this, devices capable of filtering multiple
unicast addresses need to change their set_multicast_list function
to configure unicast filters as well and assign it to dev->set_rx_mode
instead of dev->set_multicast_list. Other devices are put into promiscuous
mode when secondary unicast addresses are present.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
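
The fallback rule above can be modelled roughly as follows. The callback
names set_rx_mode and set_multicast_list come from the commit message;
toy_dev, toy_program_filters, toy_update_rx and the counters are
assumptions made for this sketch.

```c
#include <stddef.h>

struct toy_dev {
	int uc_count;      /* number of secondary unicast addresses */
	int promisc;       /* 1 if forced into promiscuous mode */
	int filter_loads;  /* times the hardware filter was reprogrammed */
	void (*set_rx_mode)(struct toy_dev *dev);         /* unicast + multicast */
	void (*set_multicast_list)(struct toy_dev *dev);  /* multicast only */
};

/* Hypothetical driver callback: just count the reprogramming. */
static void toy_program_filters(struct toy_dev *dev)
{
	dev->filter_loads++;
}

/* Devices with set_rx_mode filter unicast addresses themselves; the
 * others go promiscuous while secondary unicast addresses exist. */
static void toy_update_rx(struct toy_dev *dev)
{
	if (dev->set_rx_mode != NULL) {
		dev->set_rx_mode(dev);
		return;
	}
	dev->promisc = dev->uc_count > 0;
	if (dev->set_multicast_list != NULL)
		dev->set_multicast_list(dev);
}
```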
17 years ago[NET]: dev_mcast: switch to generic net_device address lists
Patrick McHardy [Wed, 27 Jun 2007 08:26:58 +0000 (01:26 -0700)]
[NET]: dev_mcast: switch to generic net_device address lists

Use generic net_device address lists for multicast list handling.
Some defines are used to keep drivers working.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NET]: dev: introduce generic net_device address lists
Patrick McHardy [Wed, 27 Jun 2007 08:26:19 +0000 (01:26 -0700)]
[NET]: dev: introduce generic net_device address lists

Introduce struct dev_addr_list and list maintenance functions
based on dev_mc_list and the related functions. This will be
used by follow-up patches for both multicast and secondary
unicast addresses.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
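
A minimal sketch of a reference-counted address list in the spirit of the
description above. It is not the kernel's struct dev_addr_list: the type
toy_addr, the helpers toy_addr_add and toy_addr_delete, and the choice to
count distinct addresses are all assumptions. The point it shows is that
the address count should change only when an entry is created or when its
user count drops to zero.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_ADDR_LEN 6

struct toy_addr {
	struct toy_addr *next;
	unsigned char addr[TOY_ADDR_LEN];
	int users;                 /* how many times this address was added */
};

/* Add addr to the list; a duplicate only bumps the user count.
 * *count tracks distinct addresses. Returns 0 on success, -1 on OOM. */
static int toy_addr_add(struct toy_addr **list, int *count,
			const unsigned char *addr)
{
	struct toy_addr *da;

	for (da = *list; da != NULL; da = da->next) {
		if (memcmp(da->addr, addr, TOY_ADDR_LEN) == 0) {
			da->users++;
			return 0;
		}
	}
	da = malloc(sizeof(*da));
	if (da == NULL)
		return -1;
	memcpy(da->addr, addr, TOY_ADDR_LEN);
	da->users = 1;
	da->next = *list;
	*list = da;
	(*count)++;
	return 0;
}

/* Drop one user; unlink the entry and decrement *count only when the
 * user count reaches zero. Returns 0 on success, -1 if not found. */
static int toy_addr_delete(struct toy_addr **list, int *count,
			   const unsigned char *addr)
{
	struct toy_addr **pp, *da;

	for (pp = list; (da = *pp) != NULL; pp = &da->next) {
		if (memcmp(da->addr, addr, TOY_ADDR_LEN) != 0)
			continue;
		if (--da->users > 0)
			return 0;
		*pp = da->next;
		free(da);
		(*count)--;
		return 0;
	}
	return -1;
}
```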
17 years ago[NET]: dev_mcast: unexport dev_mc_upload
Patrick McHardy [Wed, 27 Jun 2007 08:25:11 +0000 (01:25 -0700)]
[NET]: dev_mcast: unexport dev_mc_upload

dev_mc_add/dev_mc_delete take care of uploading the list when
necessary, and that's the only interface other code should use.
Also remove two incorrect calls in DECnet.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NET]: IPV6 checksum offloading in network devices
Stephen Hemminger [Wed, 27 Jun 2007 07:47:37 +0000 (00:47 -0700)]
[NET]: IPV6 checksum offloading in network devices

The existing model for checksum offload does not correctly handle
devices that can offload IPV4 and IPV6 only. The NETIF_F_HW_CSUM flag
implies that the device can checksum any arbitrary protocol.

This patch:
 * adds NETIF_F_IPV6_CSUM for those devices
 * fixes bnx2 and tg3 devices that need it
 * adds NETIF_F_IPV6_CSUM to ipv6 output (incl GSO)
 * fixes assumptions about NETIF_F_ALL_CSUM in nat
 * adjusts bridge union of checksumming computation

Signed-off-by: David S. Miller <davem@davemloft.net>
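
To make the flag semantics concrete, here is a hedged sketch of a
per-protocol offload check. The TOY_F_* bit values and the toy_proto enum
are invented for illustration and do not match the kernel's definitions;
the IPv4-only flag is an assumption, since the message names only
NETIF_F_HW_CSUM and NETIF_F_IPV6_CSUM.

```c
/* Illustrative bit values; the kernel's definitions may differ. */
#define TOY_F_IP_CSUM    0x1u  /* can checksum TCP/UDP over IPv4 only */
#define TOY_F_HW_CSUM    0x2u  /* can checksum any protocol */
#define TOY_F_IPV6_CSUM  0x4u  /* can checksum TCP/UDP over IPv6 */

enum toy_proto { TOY_PROTO_IPV4, TOY_PROTO_IPV6, TOY_PROTO_OTHER };

/* Return 1 if the device can checksum this packet in hardware,
 * 0 if software must compute the checksum before transmission. */
static int toy_can_offload_csum(unsigned int features, enum toy_proto proto)
{
	if (features & TOY_F_HW_CSUM)
		return 1;
	if (proto == TOY_PROTO_IPV4)
		return (features & TOY_F_IP_CSUM) != 0;
	if (proto == TOY_PROTO_IPV6)
		return (features & TOY_F_IPV6_CSUM) != 0;
	return 0;
}
```

A device advertising only the IPv4 and IPv6 flags would offload both IP
versions but still fall back to software for any other protocol.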
17 years ago[XFRM]: Add module alias for transformation type.
Masahide NAKAMURA [Wed, 27 Jun 2007 06:57:49 +0000 (23:57 -0700)]
[XFRM]: Add module alias for transformation type.

This cleans up the XFRM type modules and adds a module alias for each
protocol:
 ESP, AH, IPCOMP, IPIP and IPv6 for IPsec
 ROUTING and DSTOPTS for MIPv6

It is almost the same as the XFRM mode alias, but new XFRM_PROTO_XXX
defines are added for the preprocessor, since some protocols are
defined as enum values.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Acked-by: Ingo Oeser <netdev@axxeo.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[IPV6] MIP6: Loadable module support for MIPv6.
Masahide NAKAMURA [Wed, 27 Jun 2007 06:56:32 +0000 (23:56 -0700)]
[IPV6] MIP6: Loadable module support for MIPv6.

This patch makes MIPv6 a loadable module named "mip6".

Here is a modprobe.conf(5) example to load it automatically
when a user application uses XFRM state for MIPv6:

alias xfrm-type-10-43 mip6
alias xfrm-type-10-60 mip6

Some MIPv6 features are not covered by this module. However, other
features such as IPsec and plain IPv6 should behave the same with and
without the patch.
XFRM, MH (RAW socket) and ancillary data/sockopt can be discussed
separately as future work.

Loadable features:
* MH receiving check (to send ICMP error back)
* RO header parsing and building (i.e. RH2 and HAO in DSTOPTS)
* XFRM policy/state database handling for RO

These are NOT covered as loadable:
* Home Address flags and its rule on source address selection
* XFRM sub policy (depends on its own kernel option)
* XFRM functions to receive RO as IPv6 extension header
* MH sending/receiving through a raw socket if a user application
  opens one (since raw sockets allow this)
* RH2 sending as ancillary data
* RH2 operation with setsockopt(2)

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
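
The alias names in the modprobe.conf example appear to encode the address
family and the protocol number: 10 is AF_INET6 on Linux, and 43 and 60 are
the IPv6 Routing header and Destination Options protocol numbers. The
sketch below builds such a string at run time; toy_xfrm_type_alias and the
TOY_* constants are hypothetical helpers, whereas the kernel would generate
the alias at build time.

```c
#include <stdio.h>

#define TOY_AF_INET6         10  /* AF_INET6 on Linux */
#define TOY_IPPROTO_ROUTING  43  /* IPv6 routing header */
#define TOY_IPPROTO_DSTOPTS  60  /* IPv6 destination options */

/* Format "xfrm-type-<family>-<proto>" into buf.
 * Returns the snprintf() result (length of the full alias). */
static int toy_xfrm_type_alias(char *buf, size_t len, int family, int proto)
{
	return snprintf(buf, len, "xfrm-type-%d-%d", family, proto);
}
```

Called with TOY_AF_INET6 and TOY_IPPROTO_ROUTING, this yields the first
alias from the example, xfrm-type-10-43.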
17 years ago[IPV6] MIP6: Kill unnecessary ifdefs.
Masahide NAKAMURA [Wed, 27 Jun 2007 06:51:41 +0000 (23:51 -0700)]
[IPV6] MIP6: Kill unnecessary ifdefs.

Kill unnecessary CONFIG_IPV6_MIP6.

o It is redundant for the RAW socket code to keep MH out based on the
  config option, since a raw socket can handle any protocol.
o Clean-up at AH.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[RTNETLINK]: Fix rtnetlink compat attribute patch
Patrick McHardy [Tue, 26 Jun 2007 10:23:44 +0000 (03:23 -0700)]
[RTNETLINK]: Fix rtnetlink compat attribute patch

Sent the wrong patch previously.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[RTNETLINK]: Add nested compat attribute
Patrick McHardy [Mon, 25 Jun 2007 21:30:16 +0000 (14:30 -0700)]
[RTNETLINK]: Add nested compat attribute

Add a nested compat attribute type that can be used to convert
attributes that contain a structure to nested attributes in a
backwards compatible way.

The attribute looks like this:

struct {
        [ compat contents ]
        struct rtattr {
                .rta_len        = total size,
                .rta_type       = type,
        } rta;
        struct old_structure struct;

        [ nested top-level attribute ]
        struct rtattr {
                .rta_len        = nest size,
                .rta_type       = type,
        } nest_attr;

        [ optional 0 .. n nested attributes ]
        struct rtattr {
                .rta_len        = private attribute len,
                .rta_type       = private attribute type,
        } nested_attr;
        struct nested_data data;
};

Since both userspace and kernel deal correctly with attributes that are
larger than expected old versions will just parse the compat part and
ignore the rest.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
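
The backward-compatibility argument can be checked with a small userspace
model of the layout shown above. Everything here is an assumption made for
illustration: toy_rtattr mirrors only the two fields shown in the commit
message, toy_old is a made-up legacy payload, the type numbers are
arbitrary, and the nested part carries a single 32-bit attribute.

```c
#include <stdint.h>
#include <string.h>

#define TOY_ALIGN(len)  (((size_t)(len) + 3) & ~(size_t)3)

struct toy_rtattr {
	uint16_t rta_len;    /* header plus payload, in bytes */
	uint16_t rta_type;
};

struct toy_old { uint32_t bands; };   /* hypothetical legacy payload */

#define TOY_HDRLEN  TOY_ALIGN(sizeof(struct toy_rtattr))

/* Build: outer header, legacy struct, nest header, one nested u32.
 * buf must hold at least 20 bytes. Returns the total length written. */
static size_t toy_build(unsigned char *buf, uint32_t bands, uint32_t extra)
{
	struct toy_old old = { bands };
	struct toy_rtattr outer, nest, inner;
	size_t off = TOY_HDRLEN;

	memcpy(buf + off, &old, sizeof(old));
	off += TOY_ALIGN(sizeof(old));

	inner.rta_len = (uint16_t)(TOY_HDRLEN + sizeof(extra));
	inner.rta_type = 1;
	nest.rta_len = (uint16_t)(TOY_HDRLEN + TOY_ALIGN(inner.rta_len));
	nest.rta_type = 2;
	memcpy(buf + off, &nest, sizeof(nest));
	memcpy(buf + off + TOY_HDRLEN, &inner, sizeof(inner));
	memcpy(buf + off + 2 * TOY_HDRLEN, &extra, sizeof(extra));
	off += nest.rta_len;

	outer.rta_len = (uint16_t)off;   /* total size, as in the layout */
	outer.rta_type = 3;
	memcpy(buf, &outer, sizeof(outer));
	return off;
}

/* Old parser: knows only the legacy struct and ignores trailing bytes. */
static uint32_t toy_parse_old(const unsigned char *buf)
{
	struct toy_old old;

	memcpy(&old, buf + TOY_HDRLEN, sizeof(old));
	return old.bands;
}

/* New parser: if the attribute extends past the legacy struct, read the
 * nested u32. Returns 1 and stores it in *extra if present, else 0. */
static int toy_parse_new(const unsigned char *buf, uint32_t *extra)
{
	struct toy_rtattr outer, nest, inner;
	size_t off = TOY_HDRLEN + TOY_ALIGN(sizeof(struct toy_old));

	memcpy(&outer, buf, sizeof(outer));
	if (outer.rta_len < off + TOY_HDRLEN)
		return 0;                       /* compat part only */
	memcpy(&nest, buf + off, sizeof(nest));
	if (nest.rta_len < 2 * TOY_HDRLEN + sizeof(*extra) ||
	    off + nest.rta_len > outer.rta_len)
		return 0;
	memcpy(&inner, buf + off + TOY_HDRLEN, sizeof(inner));
	if (inner.rta_len < TOY_HDRLEN + sizeof(*extra))
		return 0;
	memcpy(extra, buf + off + 2 * TOY_HDRLEN, sizeof(*extra));
	return 1;
}
```

Both parsers accept the same buffer: the old one reads only the legacy
struct, and the new one additionally finds the nested value.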
17 years ago[NETLINK]: attr: add nested compat attribute type
Patrick McHardy [Mon, 25 Jun 2007 20:49:35 +0000 (13:49 -0700)]
[NETLINK]: attr: add nested compat attribute type

Add a nested compat attribute type that can be used to convert
attributes that contain a structure to nested attributes in a
backwards compatible way.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SKBUFF]: Keep track of writable header len of headerless clones
Patrick McHardy [Mon, 25 Jun 2007 11:35:20 +0000 (04:35 -0700)]
[SKBUFF]: Keep track of writable header len of headerless clones

Currently NAT (and others) that want to modify cloned skbs copy them,
even if in the vast majority of cases it's not necessary because the
skb is a clone made by TCP and the portion NAT wants to modify is
actually writable because TCP releases the header reference before
cloning.

The problem is that there is no clean way for NAT to find out how
long the writable header area is, so this patch introduces skb->hdr_len
to hold this length. When a headerless skb is cloned skb->hdr_len
is set to the current headroom, for regular clones it is copied from
the original. A new function skb_clone_writable(skb, len) returns
whether the skb is writable up to len bytes from skb->data. To avoid
enlarging the skb the mac_len field is reduced to 16 bit and the
new hdr_len field is put in the remaining 16 bit.

I've done a few rough benchmarks of NAT (not with this exact patch,
but a very similar one). As expected it saves huge amounts of system
time in case of sendfile, bringing it down to basically the same
amount as without NAT, with sendmsg it only helps on loopback,
probably because of the large MTU.

Transmit a 1GB file using sendfile/sendmsg over eth0/lo with and
without NAT:

- sendfile eth0, no NAT: sys     0m0.388s
- sendfile eth0, NAT: sys     0m1.835s
- sendfile eth0, NAT + patch: sys     0m0.370s (~ -80%)

- sendfile lo, no NAT: sys     0m0.258s
- sendfile lo, NAT: sys     0m2.609s
- sendfile lo, NAT + patch: sys     0m0.260s (~ -90%)

- sendmsg eth0, no NAT: sys     0m2.508s
- sendmsg eth0, NAT: sys     0m2.539s
- sendmsg eth0, NAT + patch: sys     0m2.445s (no change)

- sendmsg lo, no NAT: sys     0m2.151s
- sendmsg lo, NAT: sys     0m3.557s
- sendmsg lo, NAT + patch: sys     0m2.159s (~ -40%)

I expect other users can see a similar performance improvement;
packet mangling iptables targets, ipip and ip_gre come to mind.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
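
A simplified model of the writability check described above. Only hdr_len
and the function's purpose come from the commit message; the toy_skb
layout, the header_cloned flag and the helper names are assumptions, and
the real function may differ in detail.

```c
struct toy_skb {
	unsigned char *head;   /* start of the buffer */
	unsigned char *data;   /* start of the packet data */
	int header_cloned;     /* headers are still shared with another owner */
	unsigned int hdr_len;  /* writable header length, measured from head */
};

/* Bytes between the start of the buffer and the packet data. */
static unsigned int toy_headroom(const struct toy_skb *skb)
{
	return (unsigned int)(skb->data - skb->head);
}

/* Return 1 if the first len bytes at skb->data may be modified in place,
 * i.e. the headers are not shared and the range ends within hdr_len. */
static int toy_clone_writable(const struct toy_skb *skb, unsigned int len)
{
	return !skb->header_cloned &&
	       toy_headroom(skb) + len <= skb->hdr_len;
}
```

For example, if hdr_len was recorded as 64 at clone time and 40 bytes of
headers were then pushed (headroom 24), exactly those 40 bytes are
reported as writable, so a caller such as NAT could skip the copy.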
17 years ago[NET]: qdisc_restart - couple of optimizations.
Krishna Kumar [Mon, 25 Jun 2007 02:57:27 +0000 (19:57 -0700)]
[NET]: qdisc_restart - couple of optimizations.

Changes :

- netif_queue_stopped need not be called inside qdisc_restart as
  it has been called already in qdisc_run() before the first skb
  is sent, and in __qdisc_run() after each intermediate skb is
  sent (note: we are the only sender, so the queue cannot get
  stopped while we hold the tx lock in the ~LLTX case).

- BUG_ON((int) q->q.qlen < 0) was a relic from old times when -1
  meant more packets are available, and __qdisc_run used to loop
  when qdisc_restart() returned -1. During those days, it was
  necessary to make sure that qlen is never less than zero, since
  __qdisc_run would get into an infinite loop if no packets are on
  the queue and this bug in qdisc was there (and worse - no more
  skbs could ever get queue'd as we hold the queue lock too). With
  Herbert's recent change to return values, this check is not
  required.  Hopefully Herbert can validate this change. If at all
  this is required, it should be added to skb_dequeue (in failure
  case), and not to qdisc_qlen.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NET]: qdisc_restart - readability changes plus one bug fix.
Krishna Kumar [Mon, 25 Jun 2007 02:56:09 +0000 (19:56 -0700)]
[NET]: qdisc_restart - readability changes plus one bug fix.

New changes :

- Incorporated Peter Waskiewicz's comments.
- Re-added one warning message (on driver returning wrong value).

Previous changes :

- Converted to use switch/case code which looks neater.

- "if (ret == NETDEV_TX_LOCKED && lockless)" is buggy, and the lockless
  check should be removed, since driver will return NETDEV_TX_LOCKED only
  if lockless is true and driver has to do the locking. In the original
  code as well as the latest code, this code can result in a bug where
  if LLTX is not set for a driver (lockless == 0) but the driver is written
  wrongly to do a trylock (despite LLTX not being set), the driver returns
  LOCKED. But since lockless is zero, the packet is requeue'd instead of
  calling collision code which will issue warning and free up the skb.
  Instead this skb will be retried with this driver next time, and the same
  result will ensue. Removing this check will catch these driver bugs instead
  of hiding the problem. I am keeping this change to readability section
  since :
   a. it is confusing to check two things as it is; and
   b. it is difficult to keep this check in the changed 'switch' code.

- Changed some names, like try_get_tx_pkt to dev_dequeue_skb (as that is
  the work being done and easier to understand) and do_dev_requeue to
  dev_requeue_skb, merged handle_dev_cpu_collision and tx_islocked to
  dev_handle_collision (handle_dev_cpu_collision is a small routine with only
  one caller, so there is no need to have two separate routines), which also
  results in getting rid of two macros, etc.

- Removed an XXX comment as it should never fail (I suspect this was related
  to batch skb WIP, Jamal ?). Converted some functions to original coding
  style of having the return values and the function name on same line, eg
  prio2list.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
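
The switch/case structure and the removed lockless check can be sketched
as below. This is a toy model under stated assumptions: the TOY_TX_* codes,
toy_stats and toy_handle_xmit are hypothetical, and counters stand in for
the real requeue and collision handling.

```c
/* Toy return codes standing in for the driver's transmit result. */
enum toy_tx { TOY_TX_OK = 0, TOY_TX_BUSY = 1, TOY_TX_LOCKED = -1 };

struct toy_stats {
	int sent;        /* packets handed to the driver */
	int requeued;    /* packets put back on the queue */
	int collisions;  /* packets routed to the collision handler */
	int warnings;    /* unexpected driver return values */
};

/* Handle the driver's return code. Returns the remaining queue length
 * so the caller can decide whether to keep running; 0 means stop. */
static int toy_handle_xmit(int ret, int qlen, struct toy_stats *st)
{
	switch (ret) {
	case TOY_TX_OK:
		st->sent++;
		return qlen;
	case TOY_TX_LOCKED:
		/* No "&& lockless" test: a driver that returns LOCKED
		 * always goes through the collision path. */
		st->collisions++;
		return qlen;
	default:
		if (ret != TOY_TX_BUSY)
			st->warnings++;   /* driver returned a wrong value */
		st->requeued++;
		return 0;
	}
}
```

Because the LOCKED case no longer depends on a lockless flag, a driver
that returns it by mistake shows up in the collision path instead of
being silently requeued.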
17 years ago[CCID3]: Fix a bug in the send time processing
Gerrit Renker [Sat, 16 Jun 2007 16:48:50 +0000 (13:48 -0300)]
[CCID3]: Fix a bug in the send time processing

ccid3_hc_tx_send_packet currently returns 0 when the time difference between
current time and t_nom is less than 1000 microseconds.

In this case the packet is sent immediately; but, unlike other packets that can
be emitted on first attempt, it will not have its window counter updated and
its options set as required. This is a bug.

Fix: Require the time difference to be at least 1000 microseconds. The
algorithm then converges: time differences > 1000 microseconds trigger the
timer in dccp_write_xmit; after timer expiry this function is tried again; when
the time difference is less than 1000, the packet will have its options added
and window counter updated as required.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
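
The fixed threshold can be illustrated with a small helper, assuming
timestamps in microseconds. toy_send_delay_ms is a hypothetical name, and
the sketch covers only the timing decision; setting the options and
updating the window counter on the send path is omitted.

```c
#include <stdint.h>

/* Returns 0 if the packet may be sent now, otherwise the number of
 * milliseconds the caller should wait before trying again. */
static uint32_t toy_send_delay_ms(int64_t t_nom_us, int64_t now_us)
{
	int64_t delay = t_nom_us - now_us;

	/* Only differences of at least 1000 us arm the timer. */
	if (delay >= 1000)
		return (uint32_t)(delay / 1000);
	return 0;
}
```

A difference of 999 microseconds therefore takes the normal send path,
while 1000 microseconds or more defers the packet by whole milliseconds.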
17 years ago[CCID3]: Sending time: update to ktime_t
Gerrit Renker [Sat, 16 Jun 2007 16:34:02 +0000 (13:34 -0300)]
[CCID3]: Sending time: update to ktime_t

This updates the computation of t_nom and t_last_win_count to use the newer
gettimeofday interface.

Committer note: used ktime_to_timeval to set the 'now' variable to t_ld in
                ccid3hctx_no_feedback_timer

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
17 years ago[KTIME]: Introduce ktime_add_us
Arnaldo Carvalho de Melo [Sat, 16 Jun 2007 15:39:38 +0000 (12:39 -0300)]
[KTIME]: Introduce ktime_add_us

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
17 years ago[KTIME]: Introduce ktime_us_delta
Gerrit Renker [Sat, 16 Jun 2007 15:38:51 +0000 (12:38 -0300)]
[KTIME]: Introduce ktime_us_delta

This provides a reusable time difference function which returns the difference in
microseconds, as often used in the DCCP code.

Committer note: renamed ktime_delta to ktime_us_delta and put it in ktime.h.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
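
A userspace sketch of what the two ktime helpers compute, assuming a
timestamp stored as signed 64-bit nanoseconds. toy_ktime_t and both
functions are stand-ins written for illustration, not the kernel
definitions.

```c
#include <stdint.h>

typedef int64_t toy_ktime_t;   /* nanoseconds; a stand-in type */

/* Add a number of microseconds to a timestamp. */
static toy_ktime_t toy_ktime_add_us(toy_ktime_t kt, uint64_t usec)
{
	return kt + (int64_t)usec * 1000;
}

/* Difference later - earlier, expressed in microseconds. */
static int64_t toy_ktime_us_delta(toy_ktime_t later, toy_ktime_t earlier)
{
	return (later - earlier) / 1000;
}
```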
17 years agoloss_interval: make struct dccp_li_hist_entry private
Arnaldo Carvalho de Melo [Mon, 28 May 2007 21:56:44 +0000 (18:56 -0300)]
loss_interval: make struct dccp_li_hist_entry private

net/dccp/ccids/lib/loss_interval.c is the only place where this struct is used.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
17 years agoloss_interval: Nuke dccp_li_hist
Arnaldo Carvalho de Melo [Mon, 28 May 2007 21:53:08 +0000 (18:53 -0300)]
loss_interval: Nuke dccp_li_hist

It had just a slab cache, so for the sake of simplicity just make the
dccp_tfrc_lib module init routine create the slab cache; there is no need
for users of the lib to create a private loss_interval object.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
17 years agoloss_interval: Make dccp_li_hist_entry_{new,delete} private
Arnaldo Carvalho de Melo [Mon, 28 May 2007 21:25:12 +0000 (18:25 -0300)]
loss_interval: Make dccp_li_hist_entry_{new,delete} private

Not used outside the loss_interval code anymore.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
17 years agoloss_interval: unexport dccp_li_hist_interval_new
Arnaldo Carvalho de Melo [Mon, 28 May 2007 21:21:53 +0000 (18:21 -0300)]
loss_interval: unexport dccp_li_hist_interval_new

Now it's only used inside the loss_interval code.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
17 years ago[DCCP] loss_interval: Move ccid3_hc_rx_update_li to loss_interval
Arnaldo Carvalho de Melo [Thu, 14 Jun 2007 20:41:28 +0000 (17:41 -0300)]
[DCCP] loss_interval: Move ccid3_hc_rx_update_li to loss_interval

Renaming it to dccp_li_update_li.

Also based on previous work by Ian McDonald.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
17 years ago[CCID3]: Pass ccid3_li_hist to ccid3_hc_rx_update_li
Arnaldo Carvalho de Melo [Thu, 14 Jun 2007 15:24:46 +0000 (12:24 -0300)]
[CCID3]: Pass ccid3_li_hist to ccid3_hc_rx_update_li

Now that ccid3_hc_rx_update_li uses the same interface as the other
functions in net/dccp/ccids/lib/loss_interval, it is ready to be moved
there.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
17 years agoRemove accesses to ccid3_hc_rx_sock in ccid3_hc_rx_{update,calc_first}_li
Arnaldo Carvalho de Melo [Mon, 28 May 2007 21:04:14 +0000 (18:04 -0300)]
Remove accesses to ccid3_hc_rx_sock in ccid3_hc_rx_{update,calc_first}_li

This is a preparatory patch for moving these loss interval functions from
net/dccp/ccids/ccid3.c to net/dccp/ccids/lib/loss_interval.c.

Based on a patch by Ian McDonald.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>