Linus Torvalds [Fri, 20 May 2005 05:43:37 +0000 (22:43 -0700)]
Fix get_unmapped_area sanity tests
As noted by Chris Wright, we need to do the full range of tests regardless
of whether MAP_FIXED is set or not, so re-organize get_unmapped_area()
slightly to do the sanity checks unconditionally.
In netlink_broadcast() we're sending shared skb's to netlink listeners
when possible (saves some copying). This is OK, since we hold the only
other reference to the skb.
However, this implies that we must drop our reference on the skb, before
allowing a receiving socket to disappear. Otherwise, the socket buffer
accounting is disrupted.
Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
[NETLINK]: Move broadcast skb_orphan to the skb_get path.
Cloned packets don't need the orphan call.
Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
assertion (!atomic_read(&sk->sk_rmem_alloc)) failed at net/netlink/af_netlink.c (122)
What's happening is that:
1) The skb is sent to socket 1.
2) Someone does a recvmsg on socket 1 and drops the ref on the skb.
Note that the rmalloc is not returned at this point since the
skb is still referenced.
3) The same skb is now sent to socket 2.
This version of the fix resurrects the skb_orphan call that was moved
out, last time we had 'shared-skb troubles'. It is practically a no-op
in the common case, but still prevents the possible race with recvmsg.
Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 19 May 2005 19:39:04 +0000 (12:39 -0700)]
[IPSEC]: Fixed alg_key_len usage in attach_one_algo
The variable alg_key_len is in bits and not bytes. The function
attach_one_algo is currently using it as if it were in bytes.
This causes it to read memory which may not be there.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 18 May 2005 22:39:33 +0000 (15:39 -0700)]
[PATCH] prevent NULL mmap in topdown model
Prevent the topdown allocator from allocating mmap areas all the way
down to address zero.
We still allow a MAP_FIXED mapping of page 0 (needed for various things,
ranging from Wine and DOSEMU to people who want to allow speculative
loads off a NULL pointer).
Herbert Xu [Thu, 19 May 2005 05:52:33 +0000 (22:52 -0700)]
[IPV4/IPV6] Ensure all frag_list members have NULL sk
Having frag_list members which holds wmem of an sk leads to nightmares
with partially cloned frag skb's. The reason is that once you unleash
a skb with a frag_list that has individual sk ownerships into the stack
you can never undo those ownerships safely as they may have been cloned
by things like netfilter. Since we have to undo them in order to make
skb_linearize happy this approach leads to a dead-end.
So let's go the other way and make this an invariant:
For any skb on a frag_list, skb->sk must be NULL.
That is, the socket ownership always belongs to the head skb.
It turns out that the implementation is actually pretty simple.
The above invariant is actually violated in the following patch
for a short duration inside ip_fragment. This is OK because the
offending frag_list member is either destroyed at the end of the
slow path without being sent anywhere, or it is detached from
the frag_list before being sent.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Evgeniy Polyakov [Thu, 19 May 2005 05:51:45 +0000 (22:51 -0700)]
[XFRM]: skb_cow_data() does not set proper owner for new skbs.
It looks like skb_cow_data() does not set
proper owner for newly created skb.
If we have several fragments for skb and some of them
are shared(?) or cloned (like in async IPsec) there
might be a situation when we require recreating skb and
thus using skb_copy() for it.
Newly created skb has neither a destructor nor a socket
assotiated with it, which must be copied from the old skb.
As far as I can see, current code sets destructor and socket
for the first one skb only and uses truesize of the first skb
only to increment sk_wmem_alloc value.
If above "analysis" is correct then attached patch fixes that.
Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 19 May 2005 05:49:26 +0000 (22:49 -0700)]
[TG3]: Set minimal hw interrupt mitigation.
Even though we do software interrupt mitigation
via NAPI, it still helps to have some minimal
hw assisted mitigation.
This helps, particularly, on systems where register
I/O overhead is much greater than the CPU horsepower.
For example, it helps on NUMA systems. In such cases
the PIO overhead to disable interrupts for NAPI accounts
for the majority of the packet processing cost. The
CPU is fast enough such that only a single packet is
processed by each NAPI poll call.
Thanks to Michael Chan for reviewing this patch.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 19 May 2005 05:46:34 +0000 (22:46 -0700)]
[TG3]: Add tagged status support.
When supported, use the TAGGED interrupt processing support
the chip provides. In this mode, instead of a "on/off" binary
semaphore, an incrementing tag scheme is used to ACK interrupts.
All MSI supporting chips support TAGGED mode, so the tg3_msi()
interrupt handler uses it unconditionally. This invariant is
verified when MSI support is tested.
Since we can invoke tg3_poll() multiple times per interrupt under
high packet load, we fetch a new copy of the tag value in the
status block right before we actually do the work.
Also, because the tagged status tells the chip exactly which
work we have processed, we can make two optimizations:
1) tg3_restart_ints() need not check tg3_has_work()
2) the tg3_timer() need not poke the chip 10 times per
second to keep from losing interrupt events
Based upon valuable feedback from Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Tweedie [Wed, 18 May 2005 15:47:17 +0000 (11:47 -0400)]
[PATCH] Avoid console spam with ext3 aborted journal.
Avoid console spam with ext3 aborted journal.
ext3 usually reports error conditions that it detects in its environment.
But when its journal gets aborted due to such errors, it can sometimes
continue to report that condition forever, spamming the console to such
an extent that the initial first cause of the journal abort can be lost.
When the journal aborts, we put the filesystem into readonly mode. Most
subsequent filesystem operations will get rejected immediately by checks
for MS_RDONLY either in the filesystem or in the VFS. But some paths do
not have such checks --- for example, if we continue to write to a file
handle that was opened before the fs went readonly. (We only check for
the ROFS condition when the file is first opened.) In these cases, we
can continue to generate log errors similar to
EXT3-fs error (device $DEV) in start_transaction: Journal has aborted
for each subsequent write.
There is really no point in generating these errors after the initial
error has been fully reported. Specifically, if we're starting a
completely new filesystem operation, and the filesystem is *already*
readonly (ie. the ext3 layer has already detected and handled the
underlying jbd abort), and we see an EROFS error, then there is simply
no point in reporting it again.
Signed-off-by: Stephen Tweedie <sct@redhat.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Stephen Tweedie [Wed, 18 May 2005 15:22:31 +0000 (11:22 -0400)]
[PATCH] Fix filp being passed through raw ioctl handler
Don't pass meaningless file handles to block device ioctls.
The recent raw IO ioctl-passthrough fix started passing the raw file
handle into the block device ioctl handler. That's unlikely to be
useful, as the file handle is actually open on a character-mode raw
device, not a block device, so dereferencing it is not going to yield
useful results to a block device ioctl handler.
Previously we just passed NULL; also not a value that can usefully
be dereferenced, but at least if it does happen, we'll oops instead of
silently pretending that the file is a block device, so NULL is the more
defensive option here. This patch reverts to that behaviour.
Noticed by Al Viro.
Signed-off-by: Stephen Tweedie <sct@redhat.com> Acked-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Brownell [Thu, 12 May 2005 19:06:27 +0000 (12:06 -0700)]
[PATCH] Driver Core: remove driver model detach_state
The driver model has a "detach_state" mechanism that:
- Has never been used by any in-kernel drive;
- Is superfluous, since driver remove() methods can do the same thing;
- Became buggy when the suspend() parameter changed semantics and type;
- Could self-deadlock when called from certain suspend contexts;
- Is effectively wasted documentation, object code, and headspace.
This removes that "detach_state" mechanism; net code shrink, as well
as a per-device saving in the driver model and sysfs.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David Brownell [Mon, 9 May 2005 15:07:00 +0000 (08:07 -0700)]
[PATCH] Driver Core: pm diagnostics update, check for errors
This patch includes various tweaks in the messaging that appears during
system pm state transitions:
* Warn about certain illegal calls in the device tree, like resuming
child before parent or suspending parent before child. This could
happen easily enough through sysfs, or in some cases when drivers
use device_pm_set_parent().
* Be more consistent about dev_dbg() tracing ... do it for resume() and
shutdown() too, and never if the driver doesn't have that method.
* Say which type of system sleep state is being entered.
Except for the warnings, these only affect debug messaging.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Scott Murray [Mon, 9 May 2005 21:36:27 +0000 (17:36 -0400)]
[PATCH] PCI Hotplug: remove pci_visit_dev
If my CPCI hotplug update patch is applied, then there are no longer any
in tree users of the pci_visit_dev API, and it and its related code can be
removed.
Signed-off-by: Scott Murray <scottm@somanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Scott Murray [Mon, 9 May 2005 21:31:50 +0000 (17:31 -0400)]
[PATCH] PCI Hotplug: CPCI update
[PATCH] CPCI: update
I have finally done some work to update the CompactPCI hotplug driver to
fix some of the outstanding issues in 2.6:
- Added adapter and latch status ops so that those files will get created
by the current PCI hotplug core. This used to not be required, but
seems to be now after some of the sysfs rework in the core.
- Replaced slot list spinlock with a r/w semaphore to avoid any potential
issues with sleeping. This quiets all of the runtime warnings.
- Reworked interrupt driven hot extraction handling to remove need for a
polling operator for ENUM# status. There are a lot of boards that only
have an interrupt driven by ENUM#, so this lowers the bar to entry.
- Replaced pci_visit_dev usage with better use of the PCI core functions.
The new code is functionally equivalent to the previous code, but the
use of pci_enable_device on insert needs to be investigated further, as
I need to do some more testing to see if it is still necessary.
Signed-off-by: Scott Murray <scottm@somanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dely Sy [Sat, 7 May 2005 00:19:09 +0000 (17:19 -0700)]
[PATCH] PCI Hotplug: get pciehp to work on the downstream port of a switch
Here is the updated patch to get pciehp driver to work for downstream
port of a switch and handle the difference in the offset value of PCI
Express capability list item of different ports.
Signed-off-by: Dely Sy <dely.l.sy@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dely Sy [Thu, 5 May 2005 18:57:25 +0000 (11:57 -0700)]
[PATCH] PCI Hotplug: Fix echoing 1 to power file of enabled slot problem with SHPC driver
Here is a patch to fix the problem of echoing 1 to "power" file
to enabled slot causing the slot to power down, and echoing 0
to disabled slot causing shpchp_disabled_slot() to be called
twice. This problem was reported by kenji Kaneshige.
Thanks,
Dely
Signed-off-by: Dely Sy <dely.l.sy@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[PATCH] fix memory scribble in arch/i386/pci/fixup.c
The GET_INDEX() macro should use just the low three bits of the devfn,
otherwise we have a memory scribble in pcie_rootport_aspm_quirk that
overwrites ptype_all
Fix it to be more careful about its arguments while at it.
- remove enable_ci, ci interface is assumed to be present if the saa7113
is not found.
- reduce the delay when checking for saa7113
- clean up the cam reset according to specifications
- turn off Vcc to the cam slot if cam is removed or fails reset
- remove cam reset in ciintf_init
- clean up printks (KERN_)
- move gpio setting into saa7113_init
- clean up unreadable frontend_init
(Kenneth Aafloy)
Signed-off-by: Johannes Stezenbach <js@linuxtv.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] dvb: modified dvb_register_adapter() to avoid kmalloc/kfree
Modified dvb_register_adapter() to avoid kmalloc/kfree. Drivers have to embed
struct dvb_adapter into their private data struct from now on. (Andreas
Oberritter)
Signed-off-by: Johannes Stezenbach <js@linuxtv.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] dvb: make needlessly global code static or drop it
- make needlessly global code static
- #if 0 the following unused global functions:
- ttpci/av7110_hw.c: av7110_reset_arm
- ttpci/av7110_hw.c: av7110_send_ci_cmd
- frontends/mt352.[ch]: drop mt352_read
Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Johannes Stezenbach <js@linuxtv.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Switch analog output of the Crystal sound chip to left/stereo/right mode.
This will fix problems with some (most?) channels which do not encode
2-channel audio correctly. (Oliver Endriss)
Signed-off-by: Johannes Stezenbach <js@linuxtv.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- driver receives many null TS packets (pid=0x1fff). They occupy the
limited USB bandwidth and this leads to loss of video packets. Enabling the
null packet filter fixes this.
- packets that flexcop sends to USB have a 2 byte header that has to be
removed.
- sometimes a TS packet is split between different urbs. These parts have
to be combined in a temporary buffer.
Signed-off-by: Vadim Catana <skystar@moldova.cc> Signed-off-by: Patrick Boettcher <pb@linuxtv.org> Signed-off-by: Johannes Stezenbach <js@linuxtv.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jody McIntyre [Tue, 17 May 2005 04:54:05 +0000 (21:54 -0700)]
[PATCH] ieee1394: fix premature expiry of async packets
Set the initial sendtime to be 10 seconds in the future, to avoid the packet
timing out while it's still queued to be sent. This fixes furthur "no tlabel
match" problems caused by premature expiry.
Jody McIntyre [Tue, 17 May 2005 04:54:04 +0000 (21:54 -0700)]
[PATCH] ieee1394: single buffer fixes to video1394
Apply and fixup patch from Markus Tavenrath <speedygoo@speedygoo.de> for
video1394 to allow only a single buffer on receive and two buffers on
transmit. Tested with libdc1394 and dvconnect (libdv).
Signed-off-by: Dan Dennedy <dan@dennedy.org> Signed-off-by: Jody McIntyre <scjody@steamballoon.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jody McIntyre [Tue, 17 May 2005 04:54:03 +0000 (21:54 -0700)]
[PATCH] ieee1394: drivers/ieee1394/pcilynx.c: Use the DMA_32BIT_MASK constant
Use the DMA_32BIT_MASK constant from dma-mapping.h when calling
pci_set_dma_mask() or pci_set_consistent_dma_mask() These patches include
dma-mapping.h explicitly because it caused errors on some architectures
otherwise. See http://marc.theaimsgroup.com/?t=108001993000001&r=1&w=2 for
details
profile=schedule parsing is not quite what it should be. First, str[7] is
'e', not ',', but then even if it did fall through, prof_on =
SCHED_PROFILING would be clobbered inside if (get_option(...)) So a small
amount of rearrangement is done in this patch to correct it.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Stephen Smalley [Tue, 17 May 2005 04:53:52 +0000 (21:53 -0700)]
[PATCH] selinux: fix avc_alloc_node() oom with no policy loaded
This patch should fix the avc_alloc_node() oom condition that Andrew
reported when no policy is loaded in SELinux.
Prior to this patch, when no policy was loaded, the SELinux "security
server" (policy engine) was only returning allowed decisions for the
requested permissions for each access check. This caused the cache to
thrash when trying to use SELinux for real work with no policy loaded
(typically, the no policy loaded state is only for bootstrapping to the
point where we can load an initial policy).
This patch changes the SELinux security server to return the complete
allowed access vector at once, and then to reset the cache after the
initial policy load to flush the initial cache state created during
bootstrapping.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Kirill Korotaev [Tue, 17 May 2005 04:53:50 +0000 (21:53 -0700)]
[PATCH] do_swap_page() can map random data if swap read fails
There is a bug in do_swap_page(): when swap page happens to be unreadable,
page filled with random data is mapped into user address space. The fix is
to check for PageUptodate and send SIGBUS in case of error.
If block_read_full_page() detects an error when running get_block() it will
run SetPageError(), then it will zero out the block in pagecache and will mark
the buffer_head uptodate.
So at the end of readahead we end up with a non-uptodate pagecache page which
is marked PageError. But it has uptodate buffers.
The pagefault code will run ClearPageError, will launch readpage a second time
and block_read_full_page() will notice the uptodate buffers and will mark the
page uptodate as well. We end up with an uptodate, !PageError page full of
zeros and the error is lost.
(It seems a little odd that filemap_nopage() runs ClearPageError(). I guess
all of this adds up to meaning that for each attempted access to the page, the
pagefault handler will retry the I/O. Which is good and bad. If the app is
ignoring SIGBUS for some reason we could get a lot of back-to-back I/O
errors.)
Fix it by not marking the pagecache buffer_head as uptodate if the attempt to
map that buffer to a disk block failed.
Credit-to: Qu Fuping <fs@ercist.iscas.ac.cn>
For reporting the bug and identifying its source.