Herbert Xu [Thu, 18 Oct 2007 04:35:51 +0000 (21:35 -0700)]
[IPSEC]: Rename mode to outer_mode and add inner_mode
This patch adds a new field to xfrm states called inner_mode. The existing
mode object is renamed to outer_mode.
This is the first part of an attempt to fix inter-family transforms. As it
is we always use the outer family when determining which mode to use. As a
result we may end up shoving IPv4 packets into netfilter6 and vice versa.
What we really want is to use the inner family for the first part of outbound
processing and the outer family for the second part. For inbound processing
we'd use the opposite pairing.
I've also added a check to prevent silly combinations such as transport mode
with inter-family transforms.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 18 Oct 2007 04:35:15 +0000 (21:35 -0700)]
[IPSEC]: Disallow combinations of RO and AH/ESP/IPCOMP
Combining RO and AH/ESP/IPCOMP does not make sense. So this patch adds a
check in the state initialisation function to prevent this.
This allows us to safely remove the mode input function of RO since it
can never be called anymore. Indeed, if somehow it does get called we'll
know about it through an OOPS instead of it slipping past silently.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 18 Oct 2007 04:34:46 +0000 (21:34 -0700)]
[IPSEC]: Use the top IPv4 route's peer instead of the bottom
For IPv4 we were using the bottom route's peer instead of the top one.
This is wrong because the peer is only used by TCP to keep track of
information about the TCP destination address which certainly does not
live in the bottom route.
This patch fixes that which allows us to get rid of the family check
since the bottom route could be IPv6 while the top one must always
be IPv4.
I've also changed the other fields which are IPv4-specific to get the
info from the top route instead of potentially bogus data from the
bottom route.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 18 Oct 2007 04:33:12 +0000 (21:33 -0700)]
[IPSEC]: Store afinfo pointer in xfrm_mode
It is convenient to have a pointer from xfrm_state to address-specific
functions such as the output function for a family. Currently the
address-specific policy code calls out to the xfrm state code to get
those pointers when we could get it in an easier way via the state
itself.
This patch adds an xfrm_state_afinfo to xfrm_mode (since they're
address-specific) and changes the policy code to use it. I've also
added an owner field to do reference counting on the module providing
the afinfo even though it isn't strictly necessary today since IPv6
can't be unloaded yet.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 18 Oct 2007 04:31:50 +0000 (21:31 -0700)]
[IPSEC]: Add missing BEET checks
Currently BEET mode does not reinject the packet back into the stack
like tunnel mode does. Since BEET should behave just like tunnel mode
this is incorrect.
This patch fixes this by introducing a flags field to xfrm_mode that
tells the IPsec code whether it should terminate and reinject the packet
back into the stack.
It then sets the flag for BEET and tunnel mode.
I've also added a number of missing BEET checks elsewhere where we check
whether a given mode is a tunnel or not.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 18 Oct 2007 04:31:12 +0000 (21:31 -0700)]
[IPSEC]: Move type and mode map into xfrm_state.c
The type and mode maps are only used by SAs, not policies. So it makes
sense to move them from xfrm_policy.c into xfrm_state.c. This also allows
us to mark xfrm_get_type/xfrm_put_type/xfrm_get_mode/xfrm_put_mode as
static.
The only other change I've made in the move is to get rid of the casts
on the request_module call for types. They're unnecessary because C
will promote them to ints anyway.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 18 Oct 2007 04:30:34 +0000 (21:30 -0700)]
[IPSEC]: Fix length check in xfrm_parse_spi
Currently xfrm_parse_spi requires there to be 16 bytes for AH and ESP.
In contrived cases there may not actually be 16 bytes there since the
respective header sizes are less than that (8 and 12 currently).
This patch changes the test to use the actual header length instead of 16.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 18 Oct 2007 04:30:07 +0000 (21:30 -0700)]
[IPSEC]: Move ip_summed zapping out of xfrm6_rcv_spi
Not every transform needs to zap ip_summed. For example, a pure tunnel
mode encapsulation does not affect the hardware checksum at all. In fact,
every algorithm (that needs this) other than AH6 already does its own
ip_summed zapping.
This patch moves the zapping into AH6 which is in line with what IPv4 does.
Possible future optimisation: Checksum the data as we copy them in IPComp.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 18 Oct 2007 04:29:25 +0000 (21:29 -0700)]
[IPSEC]: Get nexthdr from caller in xfrm6_rcv_spi
Currently xfrm6_rcv_spi gets the nexthdr value itself from the packet.
This means that we need to fix up the value in case we have a 4-on-6
tunnel. Moving this logic into the caller simplifies things and allows
us to merge the code with IPv4.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 18 Oct 2007 04:28:53 +0000 (21:28 -0700)]
[IPSEC]: Move tunnel parsing for IPv4 out of xfrm4_input
This patch moves the tunnel parsing for IPv4 out of xfrm4_input and into
xfrm4_tunnel. This change is in line with what IPv6 does and will allow
us to merge the two input functions.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 18 Oct 2007 04:28:06 +0000 (21:28 -0700)]
[IPSEC]: Fix pure tunnel modes involving IPv6
I noticed that my recent patch broke 6-on-4 pure IPsec tunnels (the ones
that are only used for incompressible IPsec packets). Subsequent reviews
show that I broke 6-on-6 pure tunnels more than three years ago and nobody
ever noticed. I suppose every must be testing 6-on-6 IPComp with large
pings which are very compressible :)
This patch fixes both cases.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 18 Oct 2007 04:22:42 +0000 (21:22 -0700)]
[NET]: Fix the race between sk_filter_(de|at)tach and sk_clone()
The proposed fix is to delay the reference counter decrement
until the quiescent state pass. This will give sk_clone() a
chance to get the reference on the cloned filter.
Regular sk_filter_uncharge can happen from the sk_free() only
and there's no need in delaying the put - the socket is dead
anyway and is to be release itself.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 18 Oct 2007 04:22:17 +0000 (21:22 -0700)]
[NET]: Cleanup the error path in sk_attach_filter
The sk_filter_uncharge is called for error handling and
for releasing the former filter, but this will have to
be done in a bit different manner, so cleanup the error
path a bit.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 18 Oct 2007 02:48:26 +0000 (19:48 -0700)]
[INET]: Consolidate frag queues freeing
Since we now allocate the queues in inet_fragment.c, we
can safely free it in the same place. The ->destructor
callback thus becomes optional for inet_frags.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 18 Oct 2007 02:47:56 +0000 (19:47 -0700)]
[INET]: Remove no longer needed ->equal callback
Since this callback is used to check for conflicts in
hashtable when inserting a newly created frag queue, we can
do the same by checking for matching the queue with the
argument, used to create one.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 18 Oct 2007 02:47:21 +0000 (19:47 -0700)]
[INET]: Consolidate xxx_find() in fragment management
Here we need another callback ->match to check whether the
entry found in hash matches the key passed. The key used
is the same as the creation argument for inet_frag_create.
Yet again, this ->match is the same for netfilter and ipv6.
Running a frew steps forward - this callback will later
replace the ->equal one.
Since the inet_frag_find() uses the already consolidated
inet_frag_create() remove the xxx_frag_create from protocol
codes.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 18 Oct 2007 02:45:23 +0000 (19:45 -0700)]
[INET]: Consolidate xxx_frag_alloc()
Just perform the kzalloc() allocation and setup common
fields in the inet_frag_queue(). Then return the result
to the caller to initialize the rest.
The inet_frag_alloc() may return NULL, so check the
return value before doing the container_of(). This looks
ugly, but the xxx_frag_alloc() will be removed soon.
The xxx_expire() timer callbacks are patches,
because the argument is now the inet_frag_queue, not
the protocol specific queue.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 18 Oct 2007 02:44:34 +0000 (19:44 -0700)]
[INET]: Consolidate xxx_frag_intern
This routine checks for the existence of a given entry
in the hash table and inserts the new one if needed.
The ->equal callback is used to compare two frag_queue-s
together, but this one is temporary and will be removed
later. The netfilter code and the ipv6 one use the same
routine to compare frags.
The inet_frag_intern() always returns non-NULL pointer,
so convert the inet_frag_queue into protocol specific
one (with the container_of) without any checks.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 18 Oct 2007 02:43:37 +0000 (19:43 -0700)]
[INET]: Omit double hash calculations in xxx_frag_intern
Since the hash value is already calculated in xxx_find, we can
simply use it later. This is already done in netfilter code,
so make the same in ipv4 and ipv6.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Dike [Thu, 18 Oct 2007 02:35:04 +0000 (19:35 -0700)]
[UMP]: header_ops conversion needed for non-ethernet drivers
UML's two non-ethernet drivers need some header_ops conversion.
Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Every IrCOMM socket is registered with the discovery subsystem, so we don't
need to loop over all of them for every discovery event. We just need to
do it for the registered IrCOMM socket.
Signed-off-by: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Ingo Molnar [Thu, 18 Oct 2007 02:33:06 +0000 (19:33 -0700)]
[DCCP]: fix link error with !CONFIG_SYSCTL
Do not define the sysctl_dccp_sync_ratelimit sysctl variable in the
CONFIG_SYSCTL dependent sysctl.c module - move it to input.c instead.
This fixes the following build bug:
net/built-in.o: In function `dccp_check_seqno':
input.c:(.text+0xbd859): undefined reference to `sysctl_dccp_sync_ratelimit'
distcc[29953] ERROR: compile (null) on localhost failed
make: *** [vmlinux] Error 1
Found via 'make randconfig' build testing.
Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 18 Oct 2007 02:30:40 +0000 (19:30 -0700)]
[IPV6]: Fix memory leak in cleanup_ipv6_mibs()
The icmpv6msg mib statistics is not freed.
This is almost not critical for current kernel, since ipv6
module is unloadable, but this can happen on load error and
will happen every time we stop the network namespace (when
we have one, of course).
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 18 Oct 2007 02:26:15 +0000 (19:26 -0700)]
[BNX2]: Fix Serdes WoL bug.
The bug is in the code in bnx2_set_power_state() that assumes copper
devices when setting up WoL. This is no longer true after adding WoL
support for Serdes devices.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Morton [Wed, 17 Oct 2007 21:28:38 +0000 (14:28 -0700)]
[IA64] fix non-numa build
arch/ia64/kernel/machine_kexec.c: In function `arch_crash_save_vmcoreinfo':
arch/ia64/kernel/machine_kexec.c:131: error: `pgdat_list' undeclared (first use in this function)
arch/ia64/kernel/machine_kexec.c:131: error: (Each undeclared identifier is reported only once
arch/ia64/kernel/machine_kexec.c:131: error: for each function it appears in.)
arch/ia64/kernel/machine_kexec.c:134: error: `node_memblk' undeclared (first use in this function)
arch/ia64/kernel/machine_kexec.c:135: error: `NR_NODE_MEMBLKS' undeclared (first use in this function)
arch/ia64/kernel/machine_kexec.c:136: error: invalid application of `sizeof' to incomplete type `node_memblk_s'
arch/ia64/kernel/machine_kexec.c:137: error: dereferencing pointer to incomplete type
arch/ia64/kernel/machine_kexec.c:138: error: dereferencing pointer to incomplete type
make[1]: *** [arch/ia64/kernel/machine_kexec.o] Error 1
Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
Linus Torvalds [Wed, 17 Oct 2007 21:12:44 +0000 (14:12 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc:
net: libertas sdio driver
mmc: at91_mci: cleanup: use MCI_ERRORS
mmc: possible leak in mmc_read_ext_csd
A sysctl method was added to enable and disable debugging levels. After
further review, it was decided that there are better approaches to doing this
and the sysctl methodology isn't really desirable. This patch removes the
sysctl code from 9p.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Andrew Victor [Wed, 17 Oct 2007 09:53:40 +0000 (11:53 +0200)]
mmc: at91_mci: cleanup: use MCI_ERRORS
A small MMC driver cleanup.
Use the defined AT91_MCI_ERRORS in at91_mci_completed_command() instead
of specifying all the error bits individually.
Signed-off-by: Andrew Victor <andrew@sanpeople.com> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Loose mode in 9p utilizes the page cache without respecting coherency with
the server. Any writes previously invaldiated the entire mapping for a file.
This patch softens the behavior to only invalidate the region of the actual
write.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Latchesar Ionkov [Wed, 17 Oct 2007 19:31:07 +0000 (14:31 -0500)]
9p: attach-per-user
The 9P2000 protocol requires the authentication and permission checks to be
done in the file server. For that reason every user that accesses the file
server tree has to authenticate and attach to the server separately.
Multiple users can share the same connection to the server.
Currently v9fs does a single attach and executes all I/O operations as a
single user. This makes using v9fs in multiuser environment unsafe as it
depends on the client doing the permission checking.
This patch improves the 9P2000 support by allowing every user to attach
separately. The patch defines three modes of access (new mount option
'access'):
- attach-per-user (access=user) (default mode for 9P2000.u)
If a user tries to access a file served by v9fs for the first time, v9fs
sends an attach command to the server (Tattach) specifying the user. If
the attach succeeds, the user can access the v9fs tree.
As there is no uname->uid (string->integer) mapping yet, this mode works
only with the 9P2000.u dialect.
- allow only one user to access the tree (access=<uid>)
Only the user with uid can access the v9fs tree. Other users that attempt
to access it will get EPERM error.
- do all operations as a single user (access=any) (default for 9P2000)
V9fs does a single attach and all operations are done as a single user.
If this mode is selected, the v9fs behavior is identical with the current
one.
Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Latchesar Ionkov [Wed, 17 Oct 2007 19:31:07 +0000 (14:31 -0500)]
9p: rename uid and gid parameters
Change the names of 'uid' and 'gid' parameters to the more appropriate
'dfltuid' and 'dfltgid'. This also sets the default uid/gid to -2
(aka nfsnobody)
Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
This patch abstracts out the interfaces to underlying transports so that
new transports can be added as modules. This should also allow kernel
configuration of transports without ifdef-hell.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Sam Ravnborg [Wed, 17 Oct 2007 19:16:33 +0000 (21:16 +0200)]
x86: fix kernel rebuild due to vsyscall fallout
Fix rebuild of kernel when there is no changes.
This happened for i386.
Using make V=2 hinted that the output files were
not assigned to targets - fixed by this patch.
Reported by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Roland McGrath [Wed, 17 Oct 2007 16:04:41 +0000 (18:04 +0200)]
x86: vdso linker script cleanup
I can't see the reason ". = VDSO_PRELINK + 0x900;" was ever there in
the linker script for the x86_64 vDSO. I can't find anything that
depends on this magic offset, or that should care at all about the
particular location of of the .data section (all from vvar.c) in the
vDSO image. If it is really desireable to place .data at 0x900, then it
should be after all the other sections so they fill in the space up to
0x900.
This removes the 0x900 magic and cleans up the output sections generally
in the vDSO linker script. This saves a few hundred bytes in the size
of the vDSO file, bringing it back well under 4kb total so that its vma
only needs one page.
Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
A threshold interrupt occurs when ECC memory correction is occuring at too
high a frequency. Thresholds are used by the ECC hardware as occasional
ECC failures are part of normal operation, but long sequences of ECC
failures usually indicate a memory chip that is about to fail.
Thermal event interrupts occur when a temperature threshold has been
exceeded for some CPU chip. IIRC, a thermal interrupt is also generated
when the temperature drops back to a normal level.
A spurious interrupt is an interrupt that was raised then lowered by the
device before it could be fully processed by the APIC. Hence the apic sees
the interrupt but does not know what device it came from. For this case
the APIC hardware will assume a vector of 0xff.
Rescheduling, call, and TLB flush interrupts are sent from one CPU to
another per the needs of the OS. Typically, their statistics would be used
to discover if an interrupt flood of the given type has been occuring.
AK: merged v2 and v4 which had some more tweaks
AK: replace Local interrupts with Local timer interrupts
AK: Fixed description of interrupt types.
[ tglx: arch/x86 adaptation ]
[ mingo: small cleanup ]
Signed-off-by: Joe Korty <joe.korty@ccur.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Tim Hockin <thockin@hockin.org> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Such output is gained with some ugly if (!nl) printk("\n"); code and
besides being a waste of lines, this is also annoying to read. The
following output looks better (and it is how it looks on x86_64):
Satyam Sharma [Wed, 17 Oct 2007 16:04:40 +0000 (18:04 +0200)]
x86: call cache_add_dev() from cache_sysfs_init() explicitly
Call cache_add_dev() from cache_sysfs_init() explicitly, instead of
referencing the CPU notifier callback directly from generic startup
code. Looks cleaner (to me at least) this way, and also makes it
possible to use other tricks to replace __cpuinit{data} annotations, as
recently discussed on this list.
Roland McGrath [Wed, 17 Oct 2007 16:04:40 +0000 (18:04 +0200)]
x86: vdso put vars in rodata
This adds a const to the definitions vvar.c makes, so that the vdso_*
variables go into .rodata instead of .data. This is essentially a
cosmetic change, just giving the section headers in the vDSO file more
pleasing flags. These variables are read-only from the perspective of
the vDSO itself and user mode, even though the contents of the DSO image
were adjusted at boot.
Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Andrew Morton [Wed, 17 Oct 2007 16:04:39 +0000 (18:04 +0200)]
x86: asm-i386/io.h fix constness
- Fix this:
include/asm/io.h: In function `memcpy_fromio':
include/asm/io.h:208: warning: passing argument 2 of `__memcpy' discards qualifiers from pointer target type
- Clean up code a bit
Reported-by: Uwe Bugla <uwe.bugla@gmx.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>