One of the last IOMMU updates covered a bug in the AMD IOMMU code. The early
detection code does not succeed if the GART is already detected. This patch
fixes this.
x86, AMD IOMMU: don't try to init IOMMU if early detect code did not detect one
This patch adds a check if the early detect code has found AMD IOMMU hardware
descriptions and does not try to initialize hardware if the check failed.
x86, AMD IOMMU: flush domain TLB when there is more than one page to flush
This patch changes the domain TLB flushing behavior of the driver. When there
is more than one page to flush it flushes the whole domain TLB instead of every
single page. So we send only a single command to the IOMMU in every case which
is faster to execute.
Joerg Roedel [Mon, 30 Jun 2008 18:18:02 +0000 (20:18 +0200)]
x86, AMD IOMMU: disable suspend/resume with IOMMU enabled (for now)
This patch disables suspend/resume on machines with AMD IOMMU enabled. Real
suspend/resume support for AMD IOMMU is currently being worked on. Until this
is ready it will be disabled to avoid data corruption when the IOMMU is not
properly re-enabled at resume. The patch is based on a similar patch for the
GART driver written by Pavel Machek.
The overall driver merged into tip/master is tested with parallel disk and
network loads and showed no problems in a test running for 3 days.
Ingo Molnar [Fri, 27 Jun 2008 09:31:28 +0000 (11:31 +0200)]
x86, AMD IOMMU: build fix #3
fix typo causing:
arch/x86/kernel/built-in.o: In function `__unmap_single':
amd_iommu.c:(.text+0x17771): undefined reference to `iommu_area_free'
arch/x86/kernel/built-in.o: In function `__map_single':
amd_iommu.c:(.text+0x1797a): undefined reference to `iommu_area_alloc'
amd_iommu.c:(.text+0x179a2): undefined reference to `iommu_area_alloc'
Ingo Molnar [Fri, 27 Jun 2008 08:48:16 +0000 (10:48 +0200)]
x86, AMD IOMMU, build fix #2
fix:
arch/x86/kernel/amd_iommu.c: In function ‘amd_iommu_init_dma_ops':
arch/x86/kernel/amd_iommu.c:940: error: lvalue required as left operand of assignment
arch/x86/kernel/amd_iommu.c:941: error: lvalue required as left operand of assignment
Ingo Molnar [Fri, 27 Jun 2008 08:37:03 +0000 (10:37 +0200)]
x86, AMD IOMMU: build fix
fix:
arch/x86/kernel/amd_iommu_init.c:247: warning: 'struct acpi_table_header' declared inside parameter list
arch/x86/kernel/amd_iommu_init.c:247: warning: its scope is only this definition or declaration, which is probably not what you want
arch/x86/kernel/amd_iommu_init.c: In function 'find_last_devid_acpi':
arch/x86/kernel/amd_iommu_init.c:257: error: dereferencing pointer to incomplete type
arch/x86/kernel/amd_iommu_init.c:265: error: dereferencing pointer to incomplete type
Joerg Roedel [Thu, 26 Jun 2008 19:28:07 +0000 (21:28 +0200)]
x86, AMD IOMMU: initialize dma_ops from IOMMU initialization and enable IOMMUs
This patch adds a call to the driver specific dma_ops initialization routine
from the AMD IOMMU hardware initialization. Further it adds the necessary code
to finally enable AMD IOMMU hardware.
Joerg Roedel [Thu, 26 Jun 2008 19:28:04 +0000 (21:28 +0200)]
x86, AMD IOMMU: add pre-allocation of protection domains
This patch adds a function to pre-allocate protection domains. So we don't have
to allocate it on the first request for a device (which can happen in atomic
mode).
Joerg Roedel [Thu, 26 Jun 2008 19:27:41 +0000 (21:27 +0200)]
x86, AMD IOMMU: add functions to find last possible PCI device for IOMMU
This patch adds functions to find the last PCI bus/device/function the IOMMU
driver has to handle. This information is used later to allocate the memory for
the data structures.
Linus Torvalds [Wed, 25 Jun 2008 01:12:33 +0000 (18:12 -0700)]
Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Eliminate NULL test after alloc_bootmem in iosapic_alloc_rte()
[IA64] Handle count==0 in sn2_ptc_proc_write()
[IA64] Fix boot failure on ia64/sn2
Linus Torvalds [Wed, 25 Jun 2008 01:09:06 +0000 (18:09 -0700)]
Merge branch 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
KVM: Remove now unused structs from kvm_para.h
x86: KVM guest: Use the paravirt clocksource structs and functions
KVM: Make kvm host use the paravirt clocksource structs
x86: Make xen use the paravirt clocksource structs and functions
x86: Add structs and functions for paravirt clocksource
KVM: VMX: Fix host msr corruption with preemption enabled
KVM: ioapic: fix lost interrupt when changing a device's irq
KVM: MMU: Fix oops on guest userspace access to guest pagetable
KVM: MMU: large page update_pte issue with non-PAE 32-bit guests (resend)
KVM: MMU: Fix rmap_write_protect() hugepage iteration bug
KVM: close timer injection race window in __vcpu_run
KVM: Fix race between timer migration and vcpu migration
Jie Luo [Tue, 24 Jun 2008 17:38:31 +0000 (10:38 -0700)]
enable bus mastering on i915 at resume time
On 9xx chips, bus mastering needs to be enabled at resume time for much of the
chip to function. With this patch, vblank interrupts will work as expected
on resume, along with other chip functions. Fixes kernel bugzilla #10844.
Signed-off-by: Jie Luo <clotho67@gmail.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gerd Hoffmann [Tue, 3 Jun 2008 14:17:32 +0000 (16:17 +0200)]
x86: KVM guest: Use the paravirt clocksource structs and functions
This patch updates the kvm host code to use the pvclock structs
and functions, thereby making it compatible with Xen.
The patch also fixes an initialization bug: on SMP systems the
per-cpu has two different locations early at boot and after CPU
bringup. kvmclock must take that in account when registering the
physical address within the host.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
Gerd Hoffmann [Tue, 3 Jun 2008 14:17:29 +0000 (16:17 +0200)]
x86: Add structs and functions for paravirt clocksource
This patch adds structs for the paravirt clocksource ABI
used by both xen and kvm (pvclock-abi.h).
It also adds some helper functions to read system time and
wall clock time from a paravirtual clocksource (pvclock.[ch]).
They are based on the xen code. They are enabled using
CONFIG_PARAVIRT_CLOCK.
Subsequent patches of this series will put the code in use.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch changes the computation for zero_metapath_length(), which it
renames to metapath_branch_start(). When you are extending the metadata
tree, The indirect blocks that point to the new data block must either
diverge from the existing tree either at the inode, or at the first
indirect block. They can diverge at the first indirect block because the
inode has room for 483 pointers while the indirect blocks have room for
509 pointers, so when the tree is grown, there is some free space in the
first indirect block. What metapath_branch_start() now computes is the
height where the first indirect block for the new data block is located.
It can either be 1 (if the indirect block diverges from the inode) or 2
(if it diverges from the first indirect block).
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Julia Lawall [Tue, 24 Jun 2008 08:22:05 +0000 (10:22 +0200)]
[IA64] Eliminate NULL test after alloc_bootmem in iosapic_alloc_rte()
As noted by Akinobu Mita alloc_bootmem and related functions never return
NULL and always return a zeroed region of memory. Thus a NULL test or
memset after calls to these functions is unnecessary.
Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Tony Luck <tony.luck@intel.com>
Cliff Wickman [Tue, 24 Jun 2008 17:20:06 +0000 (10:20 -0700)]
[IA64] Handle count==0 in sn2_ptc_proc_write()
The fix applied in e0c6d97c65e0784aade7e97b9411f245a6c543e7
"security hole in sn2_ptc_proc_write" didn't take into account
the case where count==0 (which results in a buffer underrun
when adding the trailing '\0'). Thanks to Andi Kleen for
pointing this out.
Signed-off-by: Cliff Wickman <cpw@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
Non-PAE operation has been deprecated in Xen for a while, and is
rarely tested or used. xen-unstable has now officially dropped
non-PAE support. Since Xen/pvops' non-PAE support has also been
broken for a while, we may as well completely drop it altogether.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Avi Kivity [Tue, 24 Jun 2008 08:48:49 +0000 (11:48 +0300)]
KVM: VMX: Fix host msr corruption with preemption enabled
Switching msrs can occur either synchronously as a result of calls to
the msr management functions (usually in response to the guest touching
virtualized msrs), or asynchronously when preempting a kvm thread that has
guest state loaded. If we're unlucky enough to have the two at the same
time, host msrs are corrupted and the machine goes kaput on the next syscall.
Most easily triggered by Windows Server 2008, as it does a lot of msr
switching during bootup.
Avi Kivity [Tue, 17 Jun 2008 22:36:36 +0000 (15:36 -0700)]
KVM: ioapic: fix lost interrupt when changing a device's irq
The ioapic acknowledge path translates interrupt vectors to irqs. It
currently uses a first match algorithm, stopping when it finds the first
redirection table entry containing the vector. That fails however if the
guest changes the irq to a different line, leaving the old redirection table
entry in place (though masked). Result is interrupts not making it to the
guest.
Fix by always scanning the entire redirection table.
Avi Kivity [Thu, 12 Jun 2008 13:54:41 +0000 (16:54 +0300)]
KVM: MMU: Fix oops on guest userspace access to guest pagetable
KVM has a heuristic to unshadow guest pagetables when userspace accesses
them, on the assumption that most guests do not allow userspace to access
pagetables directly. Unfortunately, in addition to unshadowing the pagetables,
it also oopses.
This never triggers on ordinary guests since sane OSes will clear the
pagetables before assigning them to userspace, which will trigger the flood
heuristic, unshadowing the pagetables before the first userspace access. One
particular guest, though (Xenner) will run the kernel in userspace, triggering
the oops. Since the heuristic is incorrect in this case, we can simply
remove it.
Marcelo Tosatti [Wed, 11 Jun 2008 23:32:40 +0000 (20:32 -0300)]
KVM: MMU: large page update_pte issue with non-PAE 32-bit guests (resend)
kvm_mmu_pte_write() does not handle 32-bit non-PAE large page backed
guests properly. It will instantiate two 2MB sptes pointing to the same
physical 2MB page when a guest large pte update is trapped.
Instead of duplicating code to handle this, disallow directory level
updates to happen through kvm_mmu_pte_write(), so the two 2MB sptes
emulating one guest 4MB pte can be correctly created by the page fault
handling path.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
rmap_next() does not work correctly after rmap_remove(), as it expects
the rmap chains not to change during iteration. Fix (for now) by restarting
iteration from the beginning.
Marcelo Tosatti [Fri, 6 Jun 2008 19:37:36 +0000 (16:37 -0300)]
KVM: close timer injection race window in __vcpu_run
If a timer fires after kvm_inject_pending_timer_irqs() but before
local_irq_disable() the code will enter guest mode and only inject such
timer interrupt the next time an unrelated event causes an exit.
It would be simpler if the timer->pending irq conversion could be done
with IRQ's disabled, so that the above problem cannot happen.
For now introduce a new vcpu requests bit to cancel guest entry.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
Marcelo Tosatti [Fri, 6 Jun 2008 19:37:35 +0000 (16:37 -0300)]
KVM: Fix race between timer migration and vcpu migration
A guest vcpu instance can be scheduled to a different physical CPU
between the test for KVM_REQ_MIGRATE_TIMER and local_irq_disable().
If that happens, the timer will only be migrated to the current pCPU on
the next exit, meaning that guest LAPIC timer event can be delayed until
a host interrupt is triggered.
Fix it by cancelling guest entry if any vcpu request is pending. This
has the side effect of nicely consolidating vcpu->requests checks.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
Linus Torvalds [Mon, 23 Jun 2008 23:25:11 +0000 (16:25 -0700)]
Merge branch 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: nfs_updatepage(): don't mark page as dirty if an error occurred
NFS: Fix filehandle size comparisons in the mount code
NFS: Reduce the NFS mount code stack usage.
Linus Torvalds [Mon, 23 Jun 2008 19:48:50 +0000 (12:48 -0700)]
Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: refactor wait_for_completion_timeout()
sched: fix wait_for_completion_timeout() spurious failure under heavy load
sched: rt: dont stop the period timer when there are tasks wanting to run
Linus Torvalds [Mon, 23 Jun 2008 19:48:17 +0000 (12:48 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
xen: don't drop NX bit
xen: mask unwanted pte bits in __supported_pte_mask
xen: Use wmb instead of rmb in xen_evtchn_do_upcall().
x86: fix NULL pointer deref in __switch_to