[CELL] spufs: rework list management and associated locking
This sorts out the various lists and related locks in the spu code.
In detail:
- the per-node free_spus and active_list are gone. Instead struct spu
gained an alloc_state member telling whether the spu is free or not
- the per-node spus array is now locked by a per-node mutex, which
takes over from the global spu_lock and the per-node active_mutex
- the spu_alloc* and spu_free function are gone as the state change is
now done inline in the spufs code. This allows some more sharing of
code for the affinity vs normal case and more efficient locking
- some little refactoring in the affinity code for this locking scheme
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Bob Nelson [Fri, 20 Jul 2007 19:39:53 +0000 (21:39 +0200)]
[CELL] oprofile: add support to OProfile for profiling CELL BE SPUs
From: Maynard Johnson <mpjohn@us.ibm.com>
This patch updates the existing arch/powerpc/oprofile/op_model_cell.c
to add in the SPU profiling capabilities. In addition, a 'cell' subdirectory
was added to arch/powerpc/oprofile to hold Cell-specific SPU profiling code.
Exports spu_set_profile_private_kref and spu_get_profile_private_kref which
are used by OProfile to store private profile information in spufs data
structures.
Also incorporated several fixes from other patches (rrn). Check pointer
returned from kzalloc. Eliminated unnecessary cast. Better error
handling and cleanup in the related area. 64-bit unsigned long parameter
was being demoted to 32-bit unsigned int and eventually promoted back to
unsigned long.
Signed-off-by: Carl Love <carll@us.ibm.com> Signed-off-by: Maynard Johnson <mpjohn@us.ibm.com> Signed-off-by: Bob Nelson <rrnelson@us.ibm.com> Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org>
Bob Nelson [Fri, 20 Jul 2007 19:39:52 +0000 (21:39 +0200)]
[CELL] oprofile: enable SPU switch notification to detect currently active SPU tasks
From: Maynard Johnson <mpjohn@us.ibm.com>
This patch adds to the capability of spu_switch_event_register so that
the caller is also notified of currently active SPU tasks.
Exports spu_switch_event_register and spu_switch_event_unregister so
that OProfile can get access to the notifications provided.
Signed-off-by: Maynard Johnson <mpjohn@us.ibm.com> Signed-off-by: Carl Love <carll@us.ibm.com> Signed-off-by: Bob Nelson <rrnelson@us.ibm.com> Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org>
[CELL] cell: indexing of SPUs based on firmware vicinity properties
This patch links spus according to their physical position using
information provided by the firmware through a special vicinity
device-tree property. This property is present in current version
of Malta firmware.
Example of vicinity properties for a node in Malta:
[CELL] spufs: integration of SPE affinity with the scheduller
This patch makes the scheduller honor affinity information for each
context being scheduled. If the context has no affinity information,
behaviour is unchanged. If there are affinity information, context is
schedulled to be run on the exact spu recommended by the affinity
placement algorithm.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com> Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
[CELL] cell: add placement computation for scheduling of affinity contexts
This patch provides the spu affinity placement logic for the spufs scheduler.
Each time a gang is going to be scheduled, the placement of a reference
context is defined. The placement of all other contexts with affinity from
the gang is defined based on this reference context location and on a
precomputed displacement offset.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com> Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
[CELL] spufs: extension of spu_create to support affinity definition
This patch adds support for additional flags at spu_create, which relate
to the establishment of affinity between contexts and contexts to memory.
A fourth, optional, parameter is supported. This parameter represent
a affinity neighbor of the context being created, and is used when defining
SPU-SPU affinity.
Affinity is represented as a doubly linked list of spu_contexts.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com> Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
[CELL] cell: add hardcoded spu vicinity information for QS20
This patch allows the use of spu affinity on QS20, whose
original FW does not provide affinity information.
This is done through two hardcoded arrays, and by reading the reg
property from each spu.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com> Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch adds affinity data to each spu instance.
A doubly linked list is created, meant to connect the spus
in the physical order they are placed in the BE. SPUs
near to memory should be marked as having memory affinity.
Adjustments of the fields acording to FW properties is done
in separate patches, one for CPBW, one for Malta (patch for
Malta under testing).
Signed-off-by: Andre Detsch <adetsch@br.ibm.com> Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
[CELL] cell: add per BE structure with info about its SPUs
Addition of a spufs-global "cbe_info" array. Each entry contains information
about one Cell/B.E. node, namelly:
* list of spus (both free and busy spus are in this list);
* list of free spus (replacing the static spu_list from spu_base.c)
* number of spus;
* number of reserved (non scheduleable) spus.
SPE affinity implementation actually requires only access to one spu per
BE node (since it implements its own pointer to walk through the other spus
of the ring) and the number of scheduleable spus (n_spus - non_sched_spus)
However having this more general structure can be useful for other
functionalities, concentrating per-cbe statistics / data.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com> Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
[CELL] spufs: use find_first_bit() instead of sched_find_first_bit()
spu_sched->bitmap has MAX_PRIO(=140) width in bits.However, since ff80a77f20f811c0cc5b251d0f657cbc6f788385, sched_find_first_bit()
only supports 100-bit bitmaps.
Thus, spu_sched->bitmap should be treated by generic find_first_bit().
The decr_status in the LSCSA is confusedly used as two meanings:
* SPU decrementer was running
* SPU decrementer was wrapped as a result of adjust
and the code to set decr_status is missing.
This patch fixes these problems by using the decr_status argument as a
set of flags. This requires a rebuild of the shipped spu_restore code.
The following steps are not needed in the SPE context save/restore
paths:
Save Step 12: save_mfc_decr()
save suspend_time to CSA (It will be done by step 14)
save ch 7 (decrementer value will be saved in LSCSA by spe-side step 10)
Restore Step 59: restore_ch_part1()
restore ch 1 (it will be done by spe-side step 15)
[CELL] spufs: make sure context are scheduled again after spu_acquire_saved
Currently a process is removed from the physical spu when spu_acquire_saved
is saved but never put back. This patch adds a new spu_release_saved
that is to be paired with spu_acquire_saved and put the process back if
it has been in RUNNABLE state before.
Niether Jeremy not be are entirely happy about this exact patch because
it adds another spu_activate call outside of the owner thread, but I
feel this is the best short-term fix we can come up with.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Andre Detsch [Fri, 20 Jul 2007 19:39:33 +0000 (21:39 +0200)]
[CELL] spufs: add spu stats in sysfs and ctx stat file in spufs
This patch exports per-context statistics in spufs as long as spu
statistics in sysfs.
It was formed by merging:
"spufs: add spu stats in sysfs" From: Christoph Hellwig
"spufs: add stat file to spufs" From: Christoph Hellwig
"spufs: fix libassist accounting" From: Jeremy Kerr
"spusched: fix spu utilization statistics" From: Luke Browning
And some adjustments by myself, after suggestions on cbe-oss-dev.
Having separate patches was making the review process harder
than it should, as we end up integrating spus and ctx statistics
accounting much more than it was on the first implementation.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com> Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Jeremy Kerr [Fri, 20 Jul 2007 19:39:32 +0000 (21:39 +0200)]
[CELL] spufs: Remove spurious WARN_ON for spu_deactivate for NOSCHED contexts
In 6cbf93960e64f313f6e247cbca7afaa50e3ee2c we added a WARN_ON for
calling spu_deactivate on contexts created with the SPU_CREATE_NOSCHED
flag. However, all NOSCHED contexts will need to be deactivated when
the context is destroyed, so this gives a spurious warning when any
NOSCHED context is closed.
This change removes the WARN_ON.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
[CELL] spufs: Avoid unexpectedly restaring MFC during context save
The current SPU context saving procedure in SPUFS unexpectedly
restarts MFC when halting decrementer, because MFC_CNTL[Dh] is set
without MFC_CNTL[Sm]. This bug causes, for example, saving broken DMA
queues. Here is a patch to fix the problem.
Michael Ellerman [Fri, 20 Jul 2007 19:39:28 +0000 (21:39 +0200)]
[CELL] add support for MSI on Axon-based Cell systems
This patch adds support for the setup and decoding of MSIs
on Axon-based Cell systems, using the MSIC mechanism.
This involves setting up an area of BE memory which the Axon
then uses as a FIFO for MSI messages. When one or more MSIs
are decoded by the MSIC we receive an interrupt on the MPIC,
and the MSI messages are written into the FIFO. At the moment
we use a 64KB FIFO, one per MSIC/BE.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Andre Detsch [Fri, 20 Jul 2007 19:39:27 +0000 (21:39 +0200)]
[CELL] saving spus information for kexec crash
This patch adds support for investigating spus information after a
kernel crash event, through kdump vmcore file.
Implementation is based on xmon code, but the new functionality was
kept independent from xmon.
Signed-off-by: Lucio Jose Herculano Correia <luciojhc@br.ibm.com> Signed-off-by: Andre Detsch <adetsch@br.ibm.com> Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The Axon bridge chip used on new Cell/B.E. based blade servers
comes with a DDR2 memory controller that can be used to
attach cheap memory modules, as opposed to the high-speed
XDR memory that is used by the CPU itself.
Since the memory controller does not participate in the
cache coherency protocol, we can not use the memory direcly
for Linux applications, but by providing a block device
it can be used for swap space, temporary file storage and
through the use of the direct_access block device operation
for mapping into user addresses, when it is mounted with
an appropriate file system.
[CELL] allow linux to map Cell regs on legacy SLOF tree.
The platforms missing the "cpus" property in the "be" node are mono-Cell
platforms such as CAB or Getaway.
Therefore it is possible to assume that if there is no "cpus" properties
under the "be" node then we can safely return the "device node" without
more checking. This is a bit hacky but ... it allows it to work on
these platforms.
Previous patch changed based on Christian Krafft's comment.
On some legacy SLOF tree the generic code is unable to ioremap some Cell BE
registers. Therefore the "generic" functions are returning a NULL pointer,
triggering a crash on such platforms.
Previous patch changed based on Christian Krafft's comment.
On some legacy SLOF tree the generic code is unable to ioremap some Cell BE
registers. Therefore the "generic" functions are returning a NULL pointer,
triggering a crash on such platforms.
Christian Krafft [Fri, 20 Jul 2007 19:39:22 +0000 (21:39 +0200)]
[CELL] cbe_cpufreq: reorganize code
This patch reorganizes the code of the driver into three files.
Two cbe_cpufreq_pmi.c and cbe_cpufreq_pervasive.c care about hardware.
cbe_cpufreq.c contains the logic.
There is no changed behaviour, except that the PMI related function
is now located in a seperate module cbe_cpufreq_pmi. This module
will be required by cbe_cpufreq, if CONFIG_CBE_CPUFREQ_PMI has been set.
Signed-off-by: Christian Krafft <krafft@de.ibm.com> Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Christian Krafft [Fri, 20 Jul 2007 19:39:21 +0000 (21:39 +0200)]
[CELL] cbe_cpufreq: fix minor issues
Minor issues have been fixed:
* added a missing call to of_node_put()
* signedness of a function parameter
* added some line breaks
* changed global pmi_frequency_limit to a
per node pmi_slow_mode_limit array
Signed-off-by: Christian Krafft <krafft@de.ibm.com> Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Christian Krafft [Fri, 20 Jul 2007 19:39:20 +0000 (21:39 +0200)]
[CELL] cbe_cpufreq: fix initialization
This patch fixes the initialization of the cbe_cpufreq driver.
The code that initializes the PMI related functions was called per cpu:
* registering cpufreq notifier block
* registering a pmi handler
This ends in a bug that the notifier block gets called in an endless loop.
The initialization code is being put to the
module init code path by this patch. This way it only gets called once.
Signed-off-by: Christian Krafft <krafft@de.ibm.com> Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Christian Krafft [Fri, 20 Jul 2007 19:39:18 +0000 (21:39 +0200)]
[CELL] pmi: remove support for mutiple devices.
The pmi driver got simplified by removing support for multiple devices.
As there is no more than one pmi device per maschine, there is no need to
specify the device for listening and sending messages.
This way the caller (cbe_cpufreq) doesn't need to scan the device tree.
When registering the handler on a board without a pmi
interface, pmi.c will just return -ENODEV.
The patch that fixed the breakage of cell_defconfig has been
broken out of the earlier version of this patch. So this is
the version that applies cleanly on top of it.
Signed-off-by: Christian Krafft <krafft@de.ibm.com> Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Avi Kivity [Fri, 20 Jul 2007 05:18:27 +0000 (08:18 +0300)]
KVM: MMU: Fix oopses with SLUB
The kvm mmu uses page->private on shadow page tables; so does slub, and
an oops result. Fix by allocating regular pages for shadows instead of
using slub.
Tested-by: S.Çağlar Onur <caglar@pardus.org.tr> Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Tue, 17 Jul 2007 10:04:56 +0000 (13:04 +0300)]
KVM: Fix memory slot management functions for guest smp
The memory slot management functions were oriented against vcpu 0, where
they should be kvm-wide. This causes hangs starting X on guest smp.
Fix by making the functions (and resultant tail in the mmu) non-vcpu-specific.
Unfortunately this reduces the efficiency of the mmu object cache a bit. We
may have to revisit this later.
Avi Kivity [Tue, 10 Jul 2007 14:50:55 +0000 (17:50 +0300)]
KVM: MMU: Store nx bit for large page shadows
We need to distinguish between large page shadows which have the nx bit set
and those which don't. The problem shows up when booting a newer smp Linux
kernel, where the trampoline page (which is in real mode, which uses the
same shadow pages as large pages) is using the same mapping as a kernel data
page, which is mapped using nx, causing kvm to spin on that page.
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sfr/ofcons
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sfr/ofcons:
Create drivers/of/platform.c
Create linux/of_platorm.h
[SPARC/64] Rename some functions like PowerPC
Begin consolidation of of_device.h
Begin to consolidate of_device.c
Consolidate of_find_node_by routines
Consolidate of_get_next_child
Consolidate of_get_parent
Consolidate of_find_property
Consolidate of_device_is_compatible
Start split out of common open firmware code
Split out common parts of prom.h
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: appletouch - improve powersaving for Geyser3 devices
Input: lifebook - fix an oops on Panasonic CF-18
Input: document intended meaning of KEY_SWITCHVIDEOMODE
Input: switch to using seq_list_xxx helpers
Input: i8042 - give more trust to PNP data on i386
Input: add driver for Fujitsu serial touchscreens
Input: ads7846 - re-check pendown status before reporting events
Input: ads7846 - introduce sample settling delay
Input: xpad - add support for leds on xbox 360 pad
If add_to_page_cache_lru() fails, the page will not be locked. But
splice jumps to an error path that does a page release and unlock,
causing a BUG() in unlock_page().
Fix this by adding one more label that just releases the page. This bug
was actually triggered on EL5 by gurudas pai <gurudas.pai@oracle.com>
using fio.
Rusty Russell [Fri, 20 Jul 2007 12:15:01 +0000 (22:15 +1000)]
lguest: override sched_clock
Guests currently use the default scheduler clock: this means they
always use jiffies even if TSC is actually available. It doesn't make
any noticeable difference here, but it's a better thing to do.
Also remove commented-out asm/sched-clock.h from -mm tree.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rusty Russell [Fri, 20 Jul 2007 12:11:13 +0000 (22:11 +1000)]
lguest: fix sense if IF flag on interrupt injection
The sense of the IF bit is backwards in the host interrupt handling.
This means we always save "IF=1" on the stack when injecting an
interrupt. It turns out this is almost always correct (unless the
guest is taking a page fault in an interrupt due to an unpopulated
vmalloc mapping), so went unnoticed.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Fri, 20 Jul 2007 09:59:41 +0000 (10:59 +0100)]
AFS: Use patched rxrpc_kernel_send_data() correctly
Fix afs_send_simple_reply() to accept a greater-than-zero return value from
rxrpc_kernel_send_data() as being a successful return rather than thinking it
an error and aborting the call.
rxrpc_kernel_send_data() previously returned zero incorrectly when it worked
successfully, but has been patched to return the number of bytes it
transmitted.
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6: (26 commits)
sh: intc - add support for SH7750 and its variants
sh: Move entry point code to .text.head.
sh: heartbeat: Shut up resource size warning.
sh: update r2d defconfig and fix SH7751R pci compliation
sh: Many symbol exports for nommu allmodconfig.
sh: zero terminate 8250 platform data for r2d board
sh: cpufreq: Fix up the build for SH-2.
sh: Make on-chip DMA channel selection explicit.
sh: Fix up CPU dependencies for on-chip DMAC.
sh: cpufreq: clock framework support.
sh: Support rate rounding for SH7722 FRQCR clocks.
sh: Implement clk_round_rate() in the clock framework.
sh: Fix up PCI section mismatch warnings.
sh: Wire up fallocate() syscall.
sh: intc - add support for 7780
sh: intc - improve group support
sh: Fix up SH-3 and SH-4 driver dependencies.
sh: push-switch: Correct license string.
sh: cpufreq: Fix driver dependencies and flag as broken.
sh: IPR/INTC2 IRQ setup consolidation.
...
Merge branch 'linus' of master.kernel.org:/pub/scm/linux/kernel/git/perex/alsa
* 'linus' of master.kernel.org:/pub/scm/linux/kernel/git/perex/alsa: (102 commits)
[ALSA] version 1.0.14
[ALSA] remove duplicate Logitech Quickcam USB ID in usbquirks.h
[ALSA] hda-codec - Fix input with STAC92xx
[ALSA] hda-intel: support for iMac 24'' released on 09/2006
[ALSA] hda-codec - Add quirk for Asus P5LD2
[ALSA] snd-ca0106: Add support for X-Fi Extreme Audio.
[ALSA] snd-emu10k1:Enable E-Mu 1616m notebook firmware loading.
[ALSA] snd-emu10k1: Initial support for E-Mu 1616 and 1616m.
[ALSA] cs46xx - Fix PM resume
[ALSA] hda: Enable SPDIF in/out on some stac9205 boards
[ALSA] timer: check for incorrect device state in non-debug compiles, too
[ALSA] snd-aoa-codec-onyx: fix typo
[ALSA] hda-codec - Add quirks for HP dx2200/dx2250
[ALSA] hda-codec - Rename HP model-specific quirks
[ALSA] hda-codec - Add quirk for HP Samba
[ALSA] hda-codec - Add LG LW20 line-in capture source
[ALSA] usb-audio - Fix AC3 with M-Audio Audiophile USB
[ALSA] hda: stac9202 mixer fix
[ALSA] Make s3c24xx_i2s_set_clkdiv() change the correct bits
[ALSA] hda-codec - Add LG LW20 si3054 modem id
...
* master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh64-2.6:
sh64: Flag sh64_get_page() as __init_refok.
sh64: Move entry point code to .text.head.
sh64: Fix up PCI section mismatch warnings.
sh64: Update cayman defconfig.
sh64: Wire up fallocate() syscall.
Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev: (29 commits)
libata: implement EH fast drain
libata: schedule probing after SError access failure during autopsy
libata: clear HOTPLUG flag after a reset
libata: reorganize ata_ehi_hotplugged()
libata: improve SCSI scan failure handling
libata: quickly trigger SATA SPD down after debouncing failed
libata: improve SATA PHY speed down logic
The SATA controller device ID is different according to
ahci: implement SCR_NOTIFICATION r/w
ahci: make NO_NCQ handling more consistent
libata: make ->scr_read/write callbacks return error code
libata: implement AC_ERR_NCQ
libata: improve EH report formatting
sata_sil24: separate out sil24_do_softreset()
sata_sil24: separate out sil24_exec_polled_cmd()
sata_sil24: replace sil24_update_tf() with sil24_read_tf()
ahci: separate out ahci_do_softreset()
ahci: separate out ahci_exec_polled_cmd()
ahci: separate out ahci_kick_engine()
ahci: use deadline instead of fixed timeout for 1st FIS for SRST
...
Dan Williams [Fri, 20 Jul 2007 07:31:46 +0000 (00:31 -0700)]
async_tx: fix kmap_atomic usage in async_memcpy
Andrew Morton:
[async_memcpy] is very wrong if both ASYNC_TX_KMAP_DST and
ASYNC_TX_KMAP_SRC can ever be set. We'll end up using the same kmap
slot for both src add dest and we get either corrupted data or a BUG.
Evgeniy Polyakov:
Btw, shouldn't it always be kmap_atomic() even if flag is not set.
That pages are usual one returned by alloc_page().
So fix the usage of kmap_atomic and kill the ASYNC_TX_KMAP_DST and
ASYNC_TX_KMAP_SRC flags.
Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nick Piggin [Fri, 20 Jul 2007 07:31:45 +0000 (00:31 -0700)]
fix some conversion overflows
Fix page index to offset conversion overflows in buffer layer, ecryptfs,
and ocfs2.
It would be nice to convert the whole tree to page_offset, but for now
just fix the bugs.
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Mundt [Fri, 20 Jul 2007 07:31:44 +0000 (00:31 -0700)]
mm: fix memory hotplug oops from ZONE_MOVABLE changes.
zone_movable_pfn is presently marked as __initdata and referenced from
adjust_zone_range_for_zone_movable(), which in turn is referenced by
zone_spanned_pages_in_node(). Both of these are __meminit annotated. When
memory hotplug is enabled, this will oops on a hot-add, due to
zone_movable_pfn having been freed.
__meminitdata annotation gives the desired behaviour.
This will only impact platforms that enable both memory hotplug
and ARCH_POPULATES_NODE_MAP.
Signed-off-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roland McGrath [Fri, 20 Jul 2007 07:31:43 +0000 (00:31 -0700)]
xen: disable vdso "nosegneg" on native boot
One of the nice ideas behind paravirt is that CONFIG_XEN=y can be included
in a standard configuration and be no worse for native booting than as a
Xen guest. The glibc feature that supports the vDSO "nosegneg" note is
designed specifically to make this easy. You just have to flip one bit at
boot time. This patch makes Xen flip the bit, so a CONFIG_XEN=y kernel on
bare hardware does not make glibc use the less-optimized library builds.
Signed-off-by: Roland McGrath <roland@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SPARC64]: Fix two year old bug in early bootup asm.
[SPARC64]: Update defconfig.
[SPARC64]: Fix log message type in vio_create_one().
[SPARC64]: Tweak assertions in sun4v_build_virq().
[SPARC64]: Tweak kernel log messages in power_probe().
[SPARC64]: Fix handling of multiple vdc-port nodes.
[SPARC64]: Fix device type matching in VIO's devspec_show().
[SPARC64]: Fix MODULE_DEVICE_TABLE() specification in VDC and VNET.
[SPARC]: Add sys_fallocate() entries.
[SPARC64]: Use orderly_poweroff().
Al Viro [Fri, 20 Jul 2007 15:07:33 +0000 (16:07 +0100)]
Fix up sky2 breakage
Doing |= 1 << 19 to 16bit unsigned is not particulary useful;
that register is 32bit, unlike the ones dealt with in the rest of
function, so we need u32 variable here.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In most cases, when EH is scheduled, all in-flight commands are
aborted causing EH to kick in immediately. However, in some cases
(especially with PMP), it's unclear which commands are affected by the
error condition and although aborting all in-flight commands work, it
isn't optimal and may cause unnecessary disruption. On the other
hand, waiting for in-flight commands to drain themselves can take up
to 30seconds.
This patch implements EH fast drain to handle such situations. It
gives in-flight commands some time to finish up but doesn't wait for
too long. After EH is scheduled, fast drain timer is started and if
no other completion occurs in ATA_EH_FASTDRAIN_INTERVAL all in-flight
commands are aborted. If any completion occurred in the interval, the
port is given another interval to finish up itself.
Currently ATA_EH_FASTDRAIN_INTERVAL is 3 secs which should be enough
for finishing up most commands.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
libata: schedule probing after SError access failure during autopsy
If SError isn't accessible, EH can't tell whether hotplug has happened
or not. Report SError read failure with AC_ERR_OTHER and schedule
probing with hardreset. This will be mainly useful for PMPs.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
ATA_EHI_HOTPLUGGED is a hint for reset functions indicating the the
port might have gone through hotplug/unplug just before entering EH.
Reset functions modify their behaviors a bit to handle the situation
better - e.g. using longer debouncing delay.
Currently, once HOTPLUG is set, it isn't cleared till the end of EH.
This is unnecessary and makes EH take longer. Clear the HOTPLUGGED
flag after a reset try (successful or not).
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
__ata_ehi_hotplugged() now has no users. Regorganize
ata_ehi_hotplugged() such that a new function ata_ehi_schedule_probe()
deals with scheduling probing. ata_ehi_hotplugged() calls it and
additionally marks hotplug specific flags. ata_ehi_schedule_probe()
will be used laster.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
SCSI scan may fail due to memory allocation failure even if EH is not
in progress. Due to use of GFP_ATOMIC in SCSI scan path, allocation
failure isn't too rare especially while probing multiple devices at
once which is the case when a bunch of devices are connected to PMP.
This patch moves SCSI scan failure detetion logic from
ata_scsi_hotplug() to ata_scsi_scan_host() and implement synchronous
scan behavior. The synchronous path sleeps briefly and repeats SCSI
scan if some devices aren't attached properly. It contains robust
retry loop to minimize the chance of device misdetection during boot
and falls back to async retry if everything fails.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
libata: quickly trigger SATA SPD down after debouncing failed
Debouncing failure is a good indicator of basic link problem. Use
-EPIPE to indicate debouncing failure and make ata_eh_reset() invoke
sata_down_spd_limit() if the error occurs during reset.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
sata_down_spd_limit() first reads the current SPD from SStatus and
limit the speed to the lower one of one below the current limit or one
below the current SPD in SStatus. SPD may not be accessible or valid
when SPD down is requested making sata_down_spd_limit() fail when it's
most needed.
This patch makes the current SPD cached after each successful reset
and forces GEN I speed (1.5Gbps) if neither of SStatus or the cached
value is valid, so sata_down_spd_limit() is now guaranteed to lower
the speed limit if lower speed is available.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
su henry [Fri, 20 Jul 2007 12:07:46 +0000 (08:07 -0400)]
The SATA controller device ID is different according to
the onchip SATA type set in the system BIOS:
Device Device ID
SATA in IDE mode 0x4390
SATA in AHCI mode 0x4391
SATA in non-raid5 driver 0x4392
SATA in raid5 driver 0x4393
Although the device ID is different, they use the same AHCI driver
.The attached file is the patch for adding these device
IDs for ATI SB700.
Signed-off-by: henry.su.ati@gmail.com Signed-off-by: Jeff Garzik <jeff@garzik.org>
ahci_save_initial_config() is responsible for reading, screening the
host CAP register and storing the modified result into hpriv->cap for
the rest of the driver. Move ATA_FLAG_NO_NCQ handling into
ahci_save_initial_config(). It's more consistent this way and the
rest of the driver can always refer to hpriv->cap to determine
configured capability.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
When an NCQ command fails, all commands in flight are aborted and the
offending one is reported using log page 10h. Depending on controller
characteristics and LLD implementation, all commands may appear as
having a device error due to shared TF status making it hard to
determine what's actually going on.
This patch adds AC_ERR_NCQ, marks the command reported by log page 10h
with it and print extra "<F>" after the error report for the command
to help distinguishing the offending command.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Requiring LLDs to format multiple error description messages properly
doesn't work too well. Help LLDs a bit by making ata_ehi_push_desc()
insert ", " on each invocation. __ata_ehi_push_desc() is the raw
version without the automatic separator.
While at it, make ehi_desc interface proper functions instead of
macros.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
sata_sil24: replace sil24_update_tf() with sil24_read_tf()
Replace sil24_update_tf() to sil24_read_tf() which reads TF into
passed int result TF argument and can read TFs of PMP links. This
will be used by PMP support.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Separate out ahci_exec_polled_cmd() from ahci_softreset(). This will
be used to implement ahci_pmp_read/write(). ahci_exec_polled_cmd()
performs reset_engine before returning if the command fails (times
out). This is to improve robustness.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Separate out stop_engine - CLO - start_engine sequence from
ahci_softreset() and ahci_clo() into ahci_reset_engine() and use it in
ahci_softreset() and ahci_post_internal_cmd(). The function will also
be used to prepare for and clean up after PMP register access
commands.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>