This should fix a suspend/resume issues that appear with OHCI on some
PPC hardware. The PCI layer should doesn't have the hooks needed for
such ASIC-specific hooks (in this case, software clock gating), so
this moves the code to do that into hcd-pci.c ... where it can be
done after the relevant PCI PM state transition (to/from D3).
David Brownell [Wed, 23 Nov 2005 23:45:37 +0000 (15:45 -0800)]
[PATCH] USB: EHCI updates split init/reinit logic for resume
Moving the PCI-specific parts of the EHCI driver into their own file
created a few issues ... notably on resume paths which (like swsusp)
require re-initializing the controller. This patch:
- Splits the EHCI startup code into run-once HCD setup code and
separate "init the hardware" reinit code. (That reinit code is
a superset of the "early usb handoff" code.)
- Then it makes the PCI init code run both, and the resume code only
run the reinit code.
- It also removes needless pci wrappers around EHCI start/stop methods.
- Removes a byteswap issue that would be seen on big-endian hardware.
The HCD glue still doesn't actually provide a good way to do all this
run-one init stuff in one place though.
David Brownell [Wed, 23 Nov 2005 23:45:28 +0000 (15:45 -0800)]
[PATCH] USB: EHCI updates
This fixes some bugs in EHCI suspend/resume that joined us over the past
few releases (as usbcore, PCI, pmcore, and other components evolved):
- Removes suspend and resume recursion from the EHCI driver, getting
rid of the USB_SUSPEND special casing.
- Updates the wakeup mechanism to work again; there's a newish usbcore
call it needs to use.
- Provide simpler tests for "do we need to restart from scratch", to
address another case where PCI Vaux was lost. (In this case it was
restoring a swsusp snapshot, but there could be others.)
Un-exports a symbol that was temporarily exported.
A notable change from previous version is that this doesn't move
the spinlock init, so there's still a resume/reinit path bug.
Ian Abbott [Wed, 23 Nov 2005 23:45:23 +0000 (15:45 -0800)]
[PATCH] USB: ftdi_sio: new IDs for KOBIL devices
This patch adds two new devices to the ftdi_sio driver's device ID
table. The device IDs were supplied by Stefan Nies of KOBIL Systems for
two of their devices using the FTDI chip.
Damian Wrobel [Wed, 23 Nov 2005 23:45:17 +0000 (15:45 -0800)]
[PATCH] USB: SN9C10x driver - bad page state fix
This patch solves the following problem I've already discovered on the
latest 2.6.15-rc1-git1 kernel:
Nov 13 07:37:28 wrobel kernel: Bad page state at free_hot_cold_page (in process 'motion', page c164e020)
Nov 13 07:37:28 wrobel kernel: flags:0x40000400 mapping:00000000 mapcount:0 count:0
Nov 13 07:37:28 wrobel kernel: Backtrace:
Nov 13 07:37:28 wrobel kernel: [<c0146d86>] bad_page+0x85/0xbe
Nov 13 07:37:28 wrobel kernel: [<c0147629>] free_hot_cold_page+0x54/0x129
Nov 13 07:37:28 wrobel kernel: [<c01598c6>] __vunmap+0xa9/0xfe
Nov 13 07:37:28 wrobel kernel: [<c0154114>] vmalloc_to_page+0x34/0x55
Nov 13 07:37:28 wrobel kernel: [<c0159942>] vfree+0x27/0x35
Nov 13 07:37:28 wrobel kernel: [<f8a20292>] sn9c102_release_buffers+0x30/0x3f [sn9c102]
Nov 13 07:37:28 wrobel kernel: [<f8a231c2>] sn9c102_release+0x37/0xeb [sn9c102]
Nov 13 07:37:28 wrobel kernel: [<c0163e74>] __fput+0xa9/0x1aa
Nov 13 07:37:28 wrobel kernel: [<c01624f7>] filp_close+0x49/0x6d
Nov 13 07:37:30 wrobel kernel: [<c016258f>] sys_close+0x74/0x95
Nov 13 07:37:30 wrobel kernel: [<c0102ef9>] syscall_call+0x7/0xb
Nov 13 07:37:31 wrobel kernel: Trying to fix it up, but a reboot is needed
When attempting to hotadd a PCI card with a bridge on it, I saw
the kernel reporting resource collision errors even when there were
really no collisions. The problem is that the code doesn't skip
over "invalid" resources with their resource type flag not set.
Others have reported similar problems at boot time and for
non-bridge PCI card hotplug too, where the code flags a
resource collision for disabled ROMs. This patch fixes both
problems.
Rajesh Shah [Wed, 23 Nov 2005 23:44:54 +0000 (15:44 -0800)]
[PATCH] PCI Express Hotplug: clear sticky power-fault bit
Per the PCI Express spec, the power-fault-detected bit in the
slot status register can be set anytime hardware detects a power
fault, regardless of whether the slot has a device populated in
it or not. This bit is sticky and must be explicitly cleared.
This patch is needed to allow hot-add after such a power fault
has been detected.
Jean Delvare [Wed, 23 Nov 2005 23:44:31 +0000 (15:44 -0800)]
[PATCH] hwmon: Fix missing it87 fan div init
Fix a bug where setting the low fan speed limits will not work if no
data was ever read through the sysfs interface and the fan clock
dividers have not been explicitely set yet either. The reason is that
data->fan_div[nr] may currently be used before it is initialized from
the chip register values. The fix is to explicitely initialize
data->fan_div[nr] before using it.
Jean Delvare [Wed, 23 Nov 2005 23:44:26 +0000 (15:44 -0800)]
[PATCH] hwmon: Fix lm78 VID conversion
Fix the lm78 VID reading, which I accidentally broke while making
this driver use the common vid_from_reg function rather than
reimplementing its own in 2.6.14-rc1.
Jody McIntyre [Wed, 23 Nov 2005 23:44:03 +0000 (15:44 -0800)]
[PATCH] Clarify T: field in MAINTAINERS
Pavel Machek points out that for git repos, what we include is not
actually a URL. It is undesirable to use a URL since git repos can be
accessed in many different ways.
Olaf Rempel [Thu, 24 Nov 2005 03:04:08 +0000 (19:04 -0800)]
[BRIDGE]: recompute features when adding a new device
We must recompute bridge features everytime the list of underlying
devices changes, or we might end up with features that are not
supported by all devices (eg. NETIF_F_TSO)
This patch adds the missing recompute when adding a device to the bridge.
Signed-off-by: Olaf Rempel <razzor@kopf-tisch.de> Signed-off-by: David S. Miller <davem@davemloft.net>
net/ipv4/netfilter/ip_conntrack_netlink.c: In function 'ctnetlink_dump_table':
net/ipv4/netfilter/ip_conntrack_netlink.c:409: warning: implicit declaration of function 'local_bh_disable'
net/ipv4/netfilter/ip_conntrack_netlink.c:427: warning: implicit declaration of function 'local_bh_enable'
Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David Gibson [Wed, 23 Nov 2005 21:37:45 +0000 (13:37 -0800)]
[PATCH] powerpc: fix for hugepage areas straddling 4GB boundary
Commit 7d24f0b8a53261709938ffabe3e00f88f6498df9 fixed bugs in the ppc64 SLB
miss handler with respect to hugepage handling, and in the process tweaked
the semantics of the hugepage address masks in mm_context_t.
Unfortunately, it left out a couple of necessary changes to go with that
change. First, the in_hugepage_area() macro was not updated to match,
second prepare_hugepage_range() was not updated to correctly handle
hugepages regions which straddled the 4GB point.
The latter appears only to cause process-hangs when attempting to map such
a region, but the former can cause oopses if a get_user_pages() is
triggered at the wrong point. This patch addresses both bugs.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If unregister_console() is inadvertently called while no consoles are
registered, it will crash trying to dereference NULL pointer. It is
necessary to fix that because register_console() provides no indication
that it actually registered the console passed in. In fact, it may well
decide not to register it based on various things...
(akpm: It'd be better to make register_console() return something and fix the
callers. All 106 of them...)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jim Keniston [Wed, 23 Nov 2005 21:37:42 +0000 (13:37 -0800)]
[PATCH] kprobes: Fix return probes on sys_execve
Fix a bug in kprobes that can cause an Oops or even a crash when a return
probe is installed on one of the following functions: sys_execve,
do_execve, load_*_binary, flush_old_exec, or flush_thread. The fix is to
remove the call to kprobe_flush_task() in flush_thread(). This fix has
been tested on all architectures for which the return-probes feature has
been implemented (i386, x86_64, ppc64, ia64). Please apply.
BACKGROUND
Up to now, we have called kprobe_flush_task() under two situations: when a
task exits, and when it execs. Flushing kretprobe_instances on exit is
correct because (a) do_exit() doesn't return, and (b) one or more
return-probed functions may be active when a task calls do_exit(). Neither
is the case for sys_execve() and its callees.
Initially, the mistaken call to kprobe_flush_task() on exec was harmless
because we put the "real" return address of each active probed function
back in the stack, just to be safe, when we recycled its
kretprobe_instance. When support for ppc64 and ia64 was added, this safety
measure couldn't be employed, and was eventually dropped even for i386 and
x86_64. sys_execve() and its callees were informally blacklisted for
return probes until this fix was developed.
Acked-by: Prasanna S Panchamukhi <prasanna@in.ibm.com> Signed-off-by: Jim Keniston <jkenisto@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Wed, 23 Nov 2005 21:37:40 +0000 (13:37 -0800)]
[PATCH] mm: fill arch atomic64 gaps
alpha, sparc64, x86_64 are each missing some primitives from their atomic64
support: fill in the gaps I've noticed by extrapolating asm, follow the
groupings in each file. But powerpc and parisc still lack atomic64.
Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: "David S. Miller" <davem@davemloft.net> Cc: Andi Kleen <ak@muc.de> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Wed, 23 Nov 2005 21:37:39 +0000 (13:37 -0800)]
[PATCH] mm: powerpc ptlock comments
Update comments (only) on page_table_lock and mmap_sem in arch/powerpc.
Removed the comment on page_table_lock from hash_huge_page: since it's no
longer taking page_table_lock itself, it's irrelevant whether others are; but
how it is safe (even against huge file truncation?) I can't say.
Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Wed, 23 Nov 2005 21:37:38 +0000 (13:37 -0800)]
[PATCH] mm: unbloat get_futex_key
The follow_page changes in get_futex_key have left it with two almost
identical blocks, when handling the rare case of a futex in a nonlinear vma.
get_user_pages will itself do that follow_page, and its additional
find_extend_vma is hardly any overhead since the vma is already cached. Let's
just delete the follow_page block and let get_user_pages do it.
Hugh Dickins [Wed, 23 Nov 2005 21:37:37 +0000 (13:37 -0800)]
[PATCH] mm: update split ptlock Kconfig
Closer attention to the arithmetic shows that neither ppc64 nor sparc really
uses one page for multiple page tables: how on earth could they, while
pte_alloc_one returns just a struct page pointer, with no offset?
Well, arm26 manages it by returning a pte_t pointer cast to a struct page
pointer, harumph, then compensating in its pmd_populate. But arm26 is never
SMP, so it's not a problem for split ptlock either.
And the PA-RISC situation has been recently improved: CONFIG_PA20 works
without the 16-byte alignment which inflated its spinlock_t. But the current
union of spinlock_t with private does make the 7xxx struct page significantly
larger, even without debug, so disable its split ptlock.
Changing the #define to an inline function breaks on non-SMP builds,
since wuite a few places in the kernel do not implement the ipi handler
when compiling for UP.
Linus Torvalds [Wed, 23 Nov 2005 03:39:30 +0000 (19:39 -0800)]
Fix up GFP_ZONEMASK for GFP_DMA32 usage
There was some confusion about the different zone usage, this should fix
up the resulting mess in the GFP zonemask handling.
The different zone usage is still confusing (it's very easy to mix up
the individual zone numbers with the GFP zone _list_ numbers), so we
might want to clean up some of this in the future, but in the meantime
this should fix the actual problems.
Neil Horman [Tue, 22 Nov 2005 22:56:32 +0000 (14:56 -0800)]
[NET]: Fix ifenslave to not fail on lack of IP information
Patch to ifenslave so that under older ABI versions, a failure to propogate ip
information from master to slave does not result in a filure to enslave the
slave device.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Remove proto == NULL checking since ip_conntrack_[nat_]proto_find_get
always returns a valid pointer.
Fix missing ip_conntrack_proto_put in some paths.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Tue, 22 Nov 2005 22:47:37 +0000 (14:47 -0800)]
[IPV4]: Fix secondary IP addresses after promotion
This patch fixes the problem with promoting aliases when:
a) a single primary and > 1 secondary addresses
b) multiple primary addresses each with at least one secondary address
Based on earlier efforts from Brian Pomerantz <bapper@piratehaven.org>,
Patrick McHardy <kaber@trash.net> and Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Zippel [Tue, 22 Nov 2005 05:32:38 +0000 (21:32 -0800)]
[PATCH] prefer pkg-config for the QT check
This makes pkg-config now the prefered way to configure QT and properly
fixes the recent Fedora breakage and leaves the old QT detection as
fallback mechanism.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] device-mapper raid1: drop mark_region spinlock fix
The spinlock region_lock is held while calling mark_region which can sleep.
Drop the spinlock before calling that function.
A region's state and inclusion in the clean list are altered by rh_inc and
rh_dec. The state variable is set to RH_CLEAN in rh_dec, but only if
'pending' is zero. It is set to RH_DIRTY in rh_inc, but not if it is already
so. The changes to 'pending', the state, and the region's inclusion in the
clean list need to be atomicly.
Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The linux bitset operators (test_bit, set_bit etc) work on arrays of "unsigned
long". dm-log uses such bitsets but treats them as arrays of uint32_t, only
allocating and zeroing a multiple of 4 bytes (as 'clean_bits' is a uint32_t).
The patch below fixes this problem.
The problem is specific to 64-bit big endian machines such as s390x or ppc-64
and can prevent pvmove terminating.
In the simplest case, if "region_count" were (say) 30, then
bitset_size (below) would be 4 and bitset_uint32_count would be 1.
Thus the memory for this butset, after allocation and zeroing would
be
0 0 0 0 X X X X
On a bigendian 64bit machine, bit 0 for this bitset is in the 8th
byte! (and every bit that dm-log would use would be in the X area).
0 0 0 0 X X X X
^
here
which hasn't been cleared properly.
As the dm-raid1 code only syncs and counts regions which have a 0 in the
'sync_bits' bitset, and only finishes when it has counted high enough, a large
number of 1's among those 'X's will cause the sync to not complete.
It is worth noting that the code uses the same bitsets for in-memory and
on-disk logs. As these bitsets are host-endian and host-sized, this means
that they cannot safely be moved between computers with
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In some circumstances the LIST_VERSIONS output is truncated because the size
calculation forgets about a 'uint32_t' in each structure - but the inclusion
of the whole of ALIGN_MASK frequently compensates for the omission.
This is a quick workaround to use an upper bound. (The code ought to be fixed
to supply the actual size.)
Running 'dmsetup targets' may demonstrate the problem: when I run it, the last
line comes out as 'erro' instead of 'error'. Consequently, 'lvcreate --type
error' doesn't work.
Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Matthew Dobson [Tue, 22 Nov 2005 05:32:29 +0000 (21:32 -0800)]
[PATCH] Fix a bug in scsi_get_command
scsi_get_command() attempts to write into a structure that may not have
been successfully allocated. Move this write inside the if statement that
ensures we won't panic the kernel with a NULL pointer dereference.
Signed-off-by: Matthew Dobson <colpatch@us.ibm.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Grant Coady [Tue, 22 Nov 2005 05:32:28 +0000 (21:32 -0800)]
[PATCH] cpufreq: silence cpufreq for UP
drivers/cpufreq/cpufreq.c: In function `cpufreq_remove_dev':
drivers/cpufreq/cpufreq.c:696: warning: unused variable `cpu_sys_dev'
Signed-off-by: Grant Coady <gcoady@gmail.com> Cc: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eric Paris [Tue, 22 Nov 2005 05:32:28 +0000 (21:32 -0800)]
[PATCH] hugetlb: fix race in set_max_huge_pages for multiple updaters of nr_huge_pages
If there are multiple updaters to /proc/sys/vm/nr_hugepages simultaneously
it is possible for the nr_huge_pages variable to become incorrect. There
is no locking in the set_max_huge_pages function around
alloc_fresh_huge_page which is able to update nr_huge_pages. Two callers
to alloc_fresh_huge_page could race against each other as could a call to
alloc_fresh_huge_page and a call to update_and_free_page. This patch just
expands the area covered by the hugetlb_lock to cover the call into
alloc_fresh_huge_page. I'm not sure how we could say that a sysctl section
is performance critical where more specific locking would be needed.
My reproducer was to run a couple copies of the following script
simultaneously
and then watch /proc/meminfo and eventually you will see things like
HugePages_Total: 100
HugePages_Free: 109
After applying the patch all seemed well.
Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: William Irwin <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] vgacon: Fix usage of stale height value on vc initialization
Reported by: Wayne E. Harlan
"[1.] One line summary of the problem:
When the kernel option "vga=1" is used, additional tty's (alt+control+Fx
with x=2,3,4,5, etc) do not provide the full 50 lines of output. The first
one does have 50 lines, however.
[2.] Full description of the problem/report:
These addtitional tty's show only 39 lines plus the top pixel of the 40-th
line. The remaining lines are black and not shown. Kernel version
2.6.13.4 does not show this problem."
This bug is caused by using a stale font height value on vgacon_init.
Booting with vga=1 gives an 80x50 screen with an 8x8 font. Somewhere
during the initialization, the font was changed to 8x9 and the first
vc was correctly resized to 80x44. However, the rest of the vc's were
not allocated yet, and when they were subsequently initialized, they
still used a font height of 8 (instead of 9) causing the mentioned bug.
Fix by saving the new font height to vga_video_font_height.
The shift value (amount to shift the bitmap so first pixel starts at
origin(0,0)) is incorrect. This causes corrupted characters or a kernel crash
if fontwidth is not divisible by 8 at 270 degrees, or fontheight not divisible
by 8 at 180 degrees.
Report and part of the fix contributed by Knut Petersen.
David Gibson [Tue, 22 Nov 2005 05:32:24 +0000 (21:32 -0800)]
[PATCH] Fix hugetlbfs_statfs() reporting of block limits
Currently, if a hugetlbfs is mounted without limits (the default), statfs()
will return -1 for max/free/used blocks. This does not appear to be in
line with normal convention: simple_statfs() and shmem_statfs() both return
0 in similar cases. Worse, it confuses the translation logic in
put_compat_statfs(), causing it to return -EOVERFLOW on such a mount.
This patch alters hugetlbfs_statfs() to return 0 for max/free/used blocks
on a mount without limits. Note that we need the test in the patch below,
rather than just using 0 in the sbinfo structure, because the -1 marked in
the free blocks field is used internally to tell the
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Gibson [Tue, 22 Nov 2005 05:32:23 +0000 (21:32 -0800)]
[PATCH] Fix error handling with put_compat_statfs()
In fs/compat.c, whenever put_compat_statfs() returns an error, the
containing syscall returns -EFAULT. This is presumably by analogy with the
non-compat case, where any non-zero code from copy_to_user() should be
translated into an EFAULT. However, put_compat_statfs() is also return
-EOVERFLOW. The same applies for put_compat_statfs64().
This bug can be observed with a statfs() on a hugetlbfs directory.
hugetlbfs, when mounted without limits reports available, free and total
blocks as -1 (itself a bug, another patch coming). statfs() will
mysteriously return EFAULT although it's parameters are perfectly valid
addresses.
This patch causes the compat versions of statfs() and statfs64() to
correctly propogate the return values from put_compat_statfs() and
put_compat_statfs64().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Tue, 22 Nov 2005 05:32:22 +0000 (21:32 -0800)]
[PATCH] unpaged: fix sound Bad page states
Earlier I unifdefed PageCompound, so that snd_pcm_mmap_control_nopage and
others can give out a 0-order component of a higher-order page, which won't
be mistakenly freed when zap_pte_range unmaps it. But many Bad page states
reported a PG_reserved was freed after all: I had missed that we need to
say __GFP_COMP to get compound page behaviour.
Some of these higher-order pages are allocated by snd_malloc_pages, some by
snd_malloc_dev_pages; or if SBUS, by sbus_alloc_consistent - but that has
no gfp arg, so add __GFP_COMP into its sparc32/64 implementations.
I'm still rather puzzled that DRM seems not to need a similar change.
Hugh Dickins [Tue, 22 Nov 2005 05:32:20 +0000 (21:32 -0800)]
[PATCH] unpaged: PG_reserved bad_page
It used to be the case that PG_reserved pages were silently never freed, but
in 2.6.15-rc1 they may be freed with a "Bad page state" message. We should
work through such cases as they appear, fixing the code; but for now it's
safer to issue the message without freeing the page, leaving PG_reserved set.
Hugh Dickins [Tue, 22 Nov 2005 05:32:19 +0000 (21:32 -0800)]
[PATCH] unpaged: ZERO_PAGE in VM_UNPAGED
It's strange enough to be looking out for anonymous pages in VM_UNPAGED areas,
let's not insert the ZERO_PAGE there - though whether it would matter will
depend on what we decide about ZERO_PAGE refcounting.
But whereas do_anonymous_page may (exceptionally) be called on a VM_UNPAGED
area, do_no_page should never be: just BUG_ON.
Hugh Dickins [Tue, 22 Nov 2005 05:32:18 +0000 (21:32 -0800)]
[PATCH] unpaged: anon in VM_UNPAGED
copy_one_pte needs to copy the anonymous COWed pages in a VM_UNPAGED area,
zap_pte_range needs to free them, do_wp_page needs to COW them: just like
ordinary pages, not like the unpaged.
But recognizing them is a little subtle: because PageReserved is no longer a
condition for remap_pfn_range, we can now mmap all of /dev/mem (whether the
distro permits, and whether it's advisable on this or that architecture, is
another matter). So if we can see a PageAnon, it may not be ours to mess with
(or may be ours from elsewhere in the address space). I suspect there's an
entertaining insoluble self-referential problem here, but the page_is_anon
function does a good practical job, and MAP_PRIVATE PROT_WRITE VM_UNPAGED will
always be an odd choice.
In updating the comment on page_address_in_vma, noticed a potential NULL
dereference, in a path we don't actually take, but fixed it.
Hugh Dickins [Tue, 22 Nov 2005 05:32:17 +0000 (21:32 -0800)]
[PATCH] unpaged: COW on VM_UNPAGED
Remove the BUG_ON(vma->vm_flags & VM_UNPAGED) from do_wp_page, and let it do
Copy-On-Write without touching the VM_UNPAGED's page counts - but this is
incomplete, because the anonymous page it inserts will itself need to be
handled, here and in other functions - next patch.
We still don't copy the page if the pfn is invalid, because the
copy_user_highpage interface does not allow it. But that's not been a problem
in the past: can be added in later if the need arises.
Hugh Dickins [Tue, 22 Nov 2005 05:32:16 +0000 (21:32 -0800)]
[PATCH] unpaged: VM_NONLINEAR VM_RESERVED
There's one peculiar use of VM_RESERVED which the previous patch left behind:
because VM_NONLINEAR's try_to_unmap_cluster uses vm_private_data as a swapout
cursor, but should never meet VM_RESERVED vmas, it was a way of extending
VM_NONLINEAR to VM_RESERVED vmas using vm_private_data for some other purpose.
But that's an empty set - they don't have the populate function required. So
just throw away those VM_RESERVED tests.
But one more interesting in rmap.c has to go too: try_to_unmap_one will want
to swap out an anonymous page from VM_RESERVED or VM_UNPAGED area.
Hugh Dickins [Tue, 22 Nov 2005 05:32:15 +0000 (21:32 -0800)]
[PATCH] unpaged: VM_UNPAGED
Although we tend to associate VM_RESERVED with remap_pfn_range, quite a few
drivers set VM_RESERVED on areas which are then populated by nopage. The
PageReserved removal in 2.6.15-rc1 changed VM_RESERVED not to free pages in
zap_pte_range, without changing those drivers not to set it: so their pages
just leak away.
Let's not change miscellaneous drivers now: introduce VM_UNPAGED at the core,
to flag the special areas where the ptes may have no struct page, or if they
have then it's not to be touched. Replace most instances of VM_RESERVED in
core mm by VM_UNPAGED. Force it on in remap_pfn_range, and the sparc and
sparc64 io_remap_pfn_range.
Revert addition of VM_RESERVED to powerpc vdso, it's not needed there. Is it
needed anywhere? It still governs the mm->reserved_vm statistic, and special
vmas not to be merged, and areas not to be core dumped; but could probably be
eliminated later (the drivers are probably specifying it because in 2.4 it
kept swapout off the vma, but in 2.6 we work from the LRU, which these pages
don't get on).
Use the VM_SHM slot for VM_UNPAGED, and define VM_SHM to 0: it serves no
purpose whatsoever, and should be removed from drivers when we clean up.
Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: William Irwin <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Tue, 22 Nov 2005 05:32:14 +0000 (21:32 -0800)]
[PATCH] unpaged: unifdefed PageCompound
It looks like snd_xxx is not the only nopage to be using PageReserved as a way
of holding a high-order page together: which no longer works, but is masked by
our failure to free from VM_RESERVED areas. We cannot fix that bug without
first substituting another way to hold the high-order page together, while
farming out the 0-order pages from within it.
That's just what PageCompound is designed for, but it's been kept under
CONFIG_HUGETLB_PAGE. Remove the #ifdefs: which saves some space (out- of-line
put_page), doesn't slow down what most needs to be fast (already using
hugetlb), and unifies the way we handle high-order pages.
Hugh Dickins [Tue, 22 Nov 2005 05:32:13 +0000 (21:32 -0800)]
[PATCH] unpaged: sound nopage get_page
Something noticed when studying use of VM_RESERVED in different drivers:
snd_usX2Y_hwdep_pcm_vm_nopage omitted to get_page: fixed.
And how did this work before? Aargh! That nopage is returning a page from
within a buffer allocated by snd_malloc_pages, which allocates a high-order
page, then does SetPageReserved on each 0-order page within.
That would have worked in 2.6.14, because when the area was unmapped,
PageReserved inhibited put_page. 2.6.15-rc1 removed that inhibition (while
leaving ineffective PageReserveds around for now), but it hasn't caused
trouble because.. we've not been freeing from VM_RESERVED at all.
Hugh Dickins [Tue, 22 Nov 2005 05:32:12 +0000 (21:32 -0800)]
[PATCH] unpaged: private write VM_RESERVED
The PageReserved removal in 2.6.15-rc1 issued a "deprecated" message when you
tried to mmap or mprotect MAP_PRIVATE PROT_WRITE a VM_RESERVED, and failed
with -EACCES: because do_wp_page lacks the refinement to COW pages in those
areas, nor do we expect to find anonymous pages in them; and it seemed just
bloat to add code for handling such a peculiar case. But immediately it
caused vbetool and ddcprobe (using lrmi) to fail.
So revert the "deprecated" messages, letting mmap and mprotect succeed. But
leave do_wp_page's BUG_ON(vma->vm_flags & VM_RESERVED) in place until we've
added the code to do it right: so this particular patch is only good if the
app doesn't really need to write to that private area.
Hugh Dickins [Tue, 22 Nov 2005 05:32:11 +0000 (21:32 -0800)]
[PATCH] unpaged: get_user_pages VM_RESERVED
The PageReserved removal in 2.6.15-rc1 prohibited get_user_pages on the areas
flagged VM_RESERVED in place of PageReserved. That is correct in theory - we
ought not to interfere with struct pages in such a reserved area; but in
practice it broke BTTV for one.
So revert to prohibiting only on VM_IO: if someone gets into trouble with
get_user_pages on VM_RESERVED, it'll just be a "don't do that".
You can argue that videobuf_mmap_mapper shouldn't set VM_RESERVED in the first
place, but now's not the time for breaking drivers without notice.
Jeff Dike [Tue, 22 Nov 2005 05:32:10 +0000 (21:32 -0800)]
[PATCH] uml: eliminate use of libc PAGE_SIZE
On some systems, libc PAGE_SIZE calls getpagesize, which can't happen from a
stub. So, I use UM_KERN_PAGE_SIZE, which is less variable in its definition,
instead.
Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Dike [Tue, 22 Nov 2005 05:32:09 +0000 (21:32 -0800)]
[PATCH] uml: properly invoke x86_64 system calls
This patch makes stub_segv use the stub_syscall macros. This was needed
anyway, but the bug that prompted this was the discovery that gcc was storing
stuff in RCX, which is trashed across a system call. This is exactly the sort
of problem that the new macros fix.
There is a stub_syscall0 for getpid. stub_segv was changed to be a libc file,
and that caused some include changes.
Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Dike [Tue, 22 Nov 2005 05:32:04 +0000 (21:32 -0800)]
[PATCH] uml: eliminate use of local in clone stub
We have a bug in the i386 stub_syscall6 which pushes ebp before the system
call and pops it afterwards. Because we use syscall6 to remap the stack, the
old contents of the stack (and the former value of ebp) are no longer
available. Some versions of gcc make from a real local, accessed through ebp,
despite my efforts to make it obvious that references to from are really
constants. This patch attempts to make it even more obvious by eliminating
from and using a macro to access the stub's data explicitly with constants.
My original thinking on this was to replace syscall6 with a remap_stack
interface which saved ebp someplace and restored it afterwards. The problem
is that there are no registers to put it in, except for esp. That could work,
since we can store a constant in esp after the mmap because we just replaced
the stack. However, this approach seems a tad cleaner.
Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
hawkes@sgi.com [Fri, 18 Nov 2005 19:30:34 +0000 (11:30 -0800)]
[IA64] fix bug in sn/ia64 for sparse CPU numbering
The kernel's use of the for_each_*cpu(i) macros has allowed for sparse CPU
numbering. When I hacked the kernel to test sparse cpu_present_map[] and
cpu_possible_map[] cpumasks, I discovered one remaining spot, in
sn_hwperf_ioctl() during sn initialization, that needs to be fixed.
Signed-off-by: John Hawkes <hawkes@sgi.com> Signed-off-by: Dean Roe <roe@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
Jens Axboe [Mon, 21 Nov 2005 18:49:41 +0000 (19:49 +0100)]
[PATCH] as-iosched: remove state assertion in as_add_request()
Kill the arq->state poison statement in as_add_request(), it can trigger
for perfectly valid code that just reuses a request after io completion
instead of freeing it and allocating a new one. We probably should
introduce a blk_init_request() to start from scratch, but for now just
kill it as we will be removing the as specific poisoning soon.
Russell King [Mon, 21 Nov 2005 15:26:18 +0000 (15:26 +0000)]
[ARM] Add asm/memory.h to asm/numnodes.h
Since the defintion of NODES_SHIFT may be overridden in asm/arch/memory.h
it's important to include asm/memory.h into asm/numnodes.h to ensure
that the correct value is always defined.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>