Failing context is a multi threaded process context and the failing
sequence is as follows.
One thread T0 doing self modifying code on page X on processor P0 and
another thread T1 doing COW (breaking the COW setup as part of just
happened fork() in another thread T2) on the same page X on processor P1.
T0 doing SMC can endup modifying the new page Y (allocated by the T1 doing
COW on P1) but because of different I/D TLB's, P0 ITLB will not see the new
mapping till the flush TLB IPI from P1 is received. During this interval,
if T0 executes the code created by SMC it can result in an app error (as
ITLB still points to old page X and endup executing the content in page X
rather than using the content in page Y).
Fix this issue by first clearing the PTE and flushing it, before updating
it with new entry.
Hugh sayeth:
I was a bit sceptical, in the habit of thinking that Self Modifying Code
must look such issues itself: but I guess there's nothing it can do to avoid
this one.
Fair enough, what you're changing it to is pretty much what powerpc and
s390 were already doing, and is a more robust way of proceeding, consistent
with how ptes are set everywhere else.
The ptep_clear_flush is a bit heavy-handed (it's anxious to return the pte
that was atomically cleared), but we'd have to wander through lots of arches
to get the right minimal behaviour. It'd also be nice to eliminate
ptep_establish completely, now only used to define other macros/inlines: it
always seemed obfuscation to me, what you've got there now is clearer.
Let's put those cleanups on a TODO list.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: "David S. Miller" <davem@davemloft.net> Acked-by: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] convert s390 page handling macros to functions
Convert s390 page handling macros to functions. In particular this fixes a
problem with s390's SetPageUptodate macro which uses its input parameter
twice which again can cause subtle bugs.
[akpm@osdl.org: build fix] Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Woodhouse [Fri, 29 Sep 2006 08:58:37 +0000 (01:58 -0700)]
[PATCH] Fix uninitialised spinlock in via-pmu-backlight code.
The uninitialised pmu_backlight_lock causes the current Fedora test kernel
(which has spinlock debugging enabled) to panic on suspend.
This is suboptimal, so I fixed it.
Signed-off-by: David Woodhouse <dwmw2@infradead.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Michael Hanselmann <linux-kernel@hansmi.ch> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Matthew Wilcox [Fri, 29 Sep 2006 08:58:36 +0000 (01:58 -0700)]
[PATCH] remove generic__raw_read_trylock()
If the cpu has the lock held for write, is interrupted, and the interrupt
handler calls read_trylock(), it's an instant deadlock.
Now, Dave Miller has subsequently pointed out that we don't have any
situations where this can occur. Nevertheless, we should delete
generic__raw_read_lock (and its associated EXPORT to make Arjan happy) so that
nobody thinks they can use it.
Acked-by: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Al Viro [Fri, 29 Sep 2006 08:58:34 +0000 (01:58 -0700)]
[PATCH] __percpu_alloc_mask() has to be __always_inline in UP case
... or we'll end up with cpu_online_map being evaluated on UP. In
modules. cpumask.h is very careful to avoid that, and for a very good
reason. So should we...
PS: yes, it really triggers (on alpha).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[Bluetooth]: Fix section mismatch of bt_sysfs_cleanup()
The bt_sysfs_cleanup() is marked with __exit attribute, but it will
be called from an __init function in the error case. So the __exit
attribute must be removed.
Signed-off-by: Arnaud Patard <arnaud.patard@rtp-net.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[Bluetooth]: Don't update disconnect timer for incoming connections
In the case of device pairing the only safe method is to establish
a low-level ACL link. In this case, the remote side should not use
the disconnect timer to give the other side the chance to enter the
PIN code. If the disconnect timer is used, the connection will be
dropped to soon, because it is impossible to identify an actual user
of this link.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: David S. Miller <davem@davemloft.net>
John Heffner [Thu, 28 Sep 2006 21:47:38 +0000 (14:47 -0700)]
[TCP]: Fix and simplify microsecond rtt sampling
This changes the microsecond RTT sampling so that samples are taken in
the same way that RTT samples are taken for the RTO calculator: on the
last segment acknowledged, and only when the segment hasn't been
retransmitted.
Signed-off-by: John Heffner <jheffner@psc.edu> Acked-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Sesterhenn [Thu, 28 Sep 2006 21:37:07 +0000 (14:37 -0700)]
[SUNRPC]: Remove unnecessary check in net/sunrpc/svcsock.c
coverity spotted this one as possible dereference in the dprintk(),
but since there is only one caller of svc_create_socket(), which always
passes a valid sin, we dont need this check.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Thu, 28 Sep 2006 05:53:24 +0000 (22:53 -0700)]
[IPVS]: Make sure ip_vs_ftp ports are valid: module_param_array approach
I'm not entirely sure what happens in the case of a valid port,
at best it'll be silently ignored. This patch ensures that
the port values are unsigned short values, and thus always valid.
FIFOCTL_RXERR and FIFOCTL_TXERR are undocumented bits, according to the
Sigmatel datasheet. We should thus not take any assumption on their values
and semantics.
Problem spotted by andrzej zaborowski <balrogg@gmail.com> Signed-off-by: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch detects the smsc-ircc chipset on the nx1000
(including nx7000 and nx7010) and the nx5000 HP/Compaq laptop series.
Patch from "Linus Walleij (LD/EAB)" <linus.walleij@ericsson.com> Signed-off-by: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[IrDA] nsc-ircc: Configuration base address for PC87383
According to NatSemi datasheet, the configuration base address for the PC8738x
family is 0x2e or 0x164. 0x0 doesn't appear in any datasheet.
Patch from Lamarque Vieira Souza <lamarque@gmail.com> Signed-off-by: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[NET]: Move netlink interface bits to linux/if_link.h.
Moving netlink interface bits to linux/if.h is rather troublesome for
applications including both linux/if.h (which was changed to be included
from linux/rtnetlink.h automatically) and net/if.h.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[XFRM]: Do not add a state whose SPI is zero to the SPI hash.
SPI=0 is used for acquired IPsec SA and MIPv6 RO state.
Such state should not be added to the SPI hash
because we do not care about it on deleting path.
The loopback device status structure is a singleton and doesn't
need to be allocated. Add ethtool_ops hooks to show checksum always on,
and make ethtool_ops const.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Samuel Ortiz [Thu, 28 Sep 2006 03:05:38 +0000 (20:05 -0700)]
[IrDA]: af_irda.c cleanups
We lock the socket when both releasing and getting a disconnected
notification. In the latter case, we also ste the socket as orphan.
This fixes a potential kernel bug that can be triggered when we get the
disconnection notification before closing the socket.
Signed-off-by: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 28 Sep 2006 02:03:36 +0000 (19:03 -0700)]
[IPV6]: Disable SG for GSO unless we have checksum
Because the system won't turn off the SG flag for us we
need to do this manually on the IPv6 path. Otherwise we
will throw IPv6 packets with bad checksums at the hardware.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Thu, 28 Sep 2006 01:43:33 +0000 (18:43 -0700)]
[IPV4]: annotate inet_lookup() and friends
inet_lookup() annotated along with helper functions (__inet_lookup(),
__inet_lookup_established(), inet_lookup_established(),
inet_lookup_listener(), __inet_lookup_listener() and inet_ehashfn())
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Thu, 28 Sep 2006 01:43:07 +0000 (18:43 -0700)]
[IPV4]: INET_MATCH() annotations
INET_MATCH() and friends depend on an interesting set of kludges:
* there's a pair of adjacent fields in struct inet_sock - __be16 dport
followed by __u16 num. We want to search by pair, so we combine the keys into
a single 32bit value and compare with 32bit value read from &...->dport.
* on 64bit targets we combine comparisons with pair of adjacent __be32
fields in the same way.
Make sure that we don't mix those values with anything else and that pairs
we form them from have correct types.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Thu, 28 Sep 2006 01:32:28 +0000 (18:32 -0700)]
[TCP]: struct tcp_sack_block annotations
Some of the instances of tcp_sack_block are host-endian, some - net-endian.
Define struct tcp_sack_block_wire identical to struct tcp_sack_block
with u32 replaced with __be32; annotate uses of tcp_sack_block replacing
net-endian ones with tcp_sack_block_wire. Change is obviously safe since
for cc(1) __be32 is typedefed to u32.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 27 Sep 2006 23:45:45 +0000 (16:45 -0700)]
[NET_SCHED]: Fix fallout from dev->qdisc RCU change
The move of qdisc destruction to a rcu callback broke locking in the
entire qdisc layer by invalidating previously valid assumptions about
the context in which changes to the qdisc tree occur.
The two assumptions were:
- since changes only happen in process context, read_lock doesn't need
bottem half protection. Now invalid since destruction of inner qdiscs,
classifiers, actions and estimators happens in the RCU callback unless
they're manually deleted, resulting in dead-locks when read_lock in
process context is interrupted by write_lock_bh in bottem half context.
- since changes only happen under the RTNL, no additional locking is
necessary for data not used during packet processing (f.e. u32_list).
Again, since destruction now happens in the RCU callback, this assumption
is not valid anymore, causing races while using this data, which can
result in corruption or use-after-free.
Instead of "fixing" this by disabling bottem halfs everywhere and adding
new locks/refcounting, this patch makes these assumptions valid again by
moving destruction back to process context. Since only the dev->qdisc
pointer is protected by RCU, but ->enqueue and the qdisc tree are still
protected by dev->qdisc_lock, destruction of the tree can be performed
immediately and only the final free needs to happen in the rcu callback
to make sure dev_queue_xmit doesn't access already freed memory.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 27 Sep 2006 23:36:23 +0000 (16:36 -0700)]
[NET_SCHED]: HTB: fix incorrect use of RB_EMPTY_NODE
Fix incorrect use of RB_EMPTY_NODE in htb_safe_rb_erase, which makes it
skip nodes within the rbtree instead of nodes not in the tree, resulting
in crashes later on.
The root cause for this seems to be the very counter-intuitive behaviour
of the RB_EMPTY_NODE macro, which returns _false_ when the node is empty.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Olaf Kirch [Wed, 27 Sep 2006 23:33:45 +0000 (16:33 -0700)]
[IPV4]: Fix order in inet_init failure path.
This is just a minor buglet I came across by accident - when inet_init
fails to register raw_prot, it jumps to out_unregister_udp_proto which
should unregister UDP _and_ TCP.
Signed-off-by: Olaf Kirch <okir@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Francesco Fondelli <francesco.fondelli@gmail.com> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>