KVM: x86 emulator: Make a distinction between repeat prefixes F3 and F2
cmps and scas instructions accept repeat prefixes F3 and F2. So in
order to emulate those prefixed instructions we need to be able to know
if prefixes are REP/REPE/REPZ or REPNE/REPNZ. Currently kvm doesn't make
this distinction. This patch introduces this distinction.
Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
Dan Kenigsberg [Wed, 21 Nov 2007 15:10:04 +0000 (17:10 +0200)]
KVM: Enhance guest cpuid management
The current cpuid management suffers from several problems, which inhibit
passing through the host feature set to the guest:
- No way to tell which features the host supports
While some features can be supported with no changes to kvm, others
need explicit support. That means kvm needs to vet the feature set
before it is passed to the guest.
- No support for indexed or stateful cpuid entries
Some cpuid entries depend on ecx as well as on eax, or on internal
state in the processor (running cpuid multiple times with the same
input returns different output). The current cpuid machinery only
supports keying on eax.
- No support for save/restore/migrate
The internal state above needs to be exposed to userspace so it can
be saved or migrated.
This patch adds extended cpuid support by means of three new ioctls:
- KVM_GET_SUPPORTED_CPUID: get all cpuid entries the host (and kvm)
supports
- KVM_SET_CPUID2: sets the vcpu's cpuid table
- KVM_GET_CPUID2: gets the vcpu's cpuid table, including hidden state
[avi: fix original KVM_SET_CPUID not removing nx on non-nx hosts as it did
before]
Signed-off-by: Dan Kenigsberg <danken@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Wed, 21 Nov 2007 13:28:32 +0000 (15:28 +0200)]
KVM: MMU: Rename variables of type 'struct kvm_mmu_page *'
These are traditionally named 'page', but even more traditionally, that name
is reserved for variables that point to a 'struct page'. Rename them to 'sp'
(for "shadow page").
Avi Kivity [Wed, 21 Nov 2007 12:44:45 +0000 (14:44 +0200)]
KVM: MMU: Introduce gfn_to_gpa()
Converting a frame number to an address is tricky since the data type changes
size. Introduce a function to do it. This fixes an actual bug when
accessing guest ptes.
Avi Kivity [Wed, 21 Nov 2007 00:06:21 +0000 (02:06 +0200)]
KVM: MMU: Avoid unnecessary remote tlb flushes when guest updates a pte
If all we're doing is increasing permissions on a pte (typical for demand
paging), then there's not need to flush remote tlbs. Worst case they'll
get a spurious page fault.
Avi Kivity [Tue, 20 Nov 2007 19:39:54 +0000 (21:39 +0200)]
KVM: MMU: Implement guest page fault bypass for nonpae
I spent an hour worrying why I see so many guest page faults on FC6 i386.
Turns out bypass wasn't implemented for nonpae. Implement it so it doesn't
happen again.
Avi Kivity [Tue, 20 Nov 2007 13:30:24 +0000 (15:30 +0200)]
KVM: Split vcpu creation to avoid vcpu_load() before preemption setup
Split kvm_arch_vcpu_create() into kvm_arch_vcpu_create() and
kvm_arch_vcpu_setup(), enabling preemption notification between the two.
This mean that we can now do vcpu_load() within kvm_arch_vcpu_setup().
Avi Kivity [Tue, 20 Nov 2007 10:49:31 +0000 (12:49 +0200)]
KVM: x86 emulator: retire ->write_std()
Theoretically used to acccess memory known to be ordinary RAM, it was
never implemented. It is questionable whether it is possible to implement
it correctly.
Izik Eidus [Tue, 20 Nov 2007 09:49:33 +0000 (11:49 +0200)]
KVM: MMU: Selectively set PageDirty when releasing guest memory
Improve dirty bit setting for pages that kvm release, until now every page
that we released we marked dirty, from now only pages that have potential
to get dirty we mark dirty.
Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
Izik Eidus [Tue, 20 Nov 2007 09:30:04 +0000 (11:30 +0200)]
KVM: MMU: Fix potential memory leak with smp real-mode
When we map a page, we check whether some other vcpu mapped it for us and if
so, bail out. But we should decrease the refcount on the page as we do so.
Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
Jan Kiszka [Mon, 19 Nov 2007 09:21:45 +0000 (10:21 +0100)]
KVM: VMX: Force seg.base == (seg.sel << 4) in real mode
Ensure that segment.base == segment.selector << 4 when entering the real
mode on Intel so that the CPU will not bark at us. This fixes some old
protected mode demo from http://www.x86.org/articles/pmbasics/tspec_a1_doc.htm.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 18 Nov 2007 11:50:24 +0000 (13:50 +0200)]
KVM: Replace 'light_exits' stat with 'host_state_reload'
This is a little more accurate (since it counts actual reloads, not potential
reloads), and reverses the sense of the statistic to measure a bad event like
most of the other stats (e.g. we want to minimize all counters).
Avi Kivity [Thu, 15 Nov 2007 16:06:18 +0000 (18:06 +0200)]
KVM: VMX: Consolidate register usage in vmx_vcpu_run()
We pass vcpu, vmx->fail, and vmx->launched to assembly code, but all three
are fields within vmx. Consolidate by only passing in vmx and offsets for
the rest.
Izik Eidus [Sun, 11 Nov 2007 20:10:22 +0000 (22:10 +0200)]
KVM: Change kvm_{read,write}_guest() to use copy_{from,to}_user()
This changes kvm_write_guest_page/kvm_read_guest_page to use
copy_to_user/read_from_user, as a result we get better speed
and better dirty bit tracking.
Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Thu, 22 Nov 2007 09:42:59 +0000 (11:42 +0200)]
KVM: Fix faults during injection of real-mode interrupts
If vmx fails to inject a real-mode interrupt while fetching the interrupt
redirection table, it fails to record this in the vectoring information
field. So we detect this condition and do it ourselves.
Avi Kivity [Thu, 8 Nov 2007 16:19:20 +0000 (18:19 +0200)]
KVM: VMX: Use vmx to inject real-mode interrupts
Instead of injecting real-mode interrupts by writing the interrupt frame into
guest memory, abuse vmx by injecting a software interrupt. We need to
pretend the software interrupt instruction had a length > 0, so we have to
adjust rip backward.
This lets us not to mess with writing guest memory, which is complex and also
sleeps.
Avi Kivity [Wed, 31 Oct 2007 09:15:56 +0000 (11:15 +0200)]
KVM: x86 emulator: centralize decoding of one-byte register access insns
Instructions like 'inc reg' that have the register operand encoded
in the opcode are currently specially decoded. Extend
decode_register_operand() to handle that case, indicated by having
DstReg or SrcReg without ModRM.
Carsten Otte [Tue, 30 Oct 2007 17:44:25 +0000 (18:44 +0100)]
KVM: Portability: Move pio emulation functions to x86.c
This patch moves implementation of the following functions from
kvm_main.c to x86.c:
free_pio_guest_pages, vcpu_find_pio_dev, pio_copy_data, complete_pio,
kernel_pio, pio_string_write, kvm_emulate_pio, kvm_emulate_pio_string
The function inject_gp, which was duplicated by yesterday's patch
series, is removed from kvm_main.c now because it is not needed anymore.
Carsten Otte [Tue, 30 Oct 2007 17:44:21 +0000 (18:44 +0100)]
KVM: Portability: Move x86 emulation and mmio device hook to x86.c
This patch moves the following functions to from kvm_main.c to x86.c:
emulator_read/write_std, vcpu_find_pervcpu_dev, vcpu_find_mmio_dev,
emulator_read/write_emulated, emulator_write_phys,
emulator_write_emulated_onepage, emulator_cmpxchg_emulated,
get_setment_base, emulate_invlpg, emulate_clts, emulator_get/set_dr,
kvm_report_emulation_failure, emulate_instruction
The following data type is moved to x86.c:
struct x86_emulate_ops emulate_ops
Carsten Otte [Tue, 30 Oct 2007 17:44:17 +0000 (18:44 +0100)]
KVM: Portability: Move kvm_get/set_msr[_common] to x86.c
This patch moves the implementation of the functions of kvm_get/set_msr,
kvm_get/set_msr_common, and set_efer from kvm_main.c to x86.c. The
definition of EFER_RESERVED_BITS is moved too.
Anthony Liguori [Mon, 29 Oct 2007 20:15:20 +0000 (15:15 -0500)]
KVM: Fix gfn_to_page() acquiring mmap_sem twice
KVM's nopage handler calls gfn_to_page() which acquires the mmap_sem when
calling out to get_user_pages(). nopage handlers are already invoked with the
mmap_sem held though. Introduce a __gfn_to_page() for use by the nopage
handler which requires the lock to already be held.
This was noticed by tglx.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch based on CR8/TPR patch, and enable the TPR shadow (FlexPriority)
for 32bit Windows. Since TPR is accessed very frequently by 32bit
Windows, especially SMP guest, with FlexPriority enabled, we saw significant
performance gain.
Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
Carsten Otte [Mon, 29 Oct 2007 15:09:35 +0000 (16:09 +0100)]
KVM: Portability: Move control register helper functions to x86.c
This patch moves the definitions of CR0_RESERVED_BITS,
CR4_RESERVED_BITS, and CR8_RESERVED_BITS along with the following
functions from kvm_main.c to x86.c:
set_cr0(), set_cr3(), set_cr4(), set_cr8(), get_cr8(), lmsw(),
load_pdptrs()
The static function wrapper inject_gp is duplicated in kvm_main.c and
x86.c for now, the version in kvm_main.c should disappear once the last
user of it is gone too.
The function load_pdptrs is no longer static, and now defined in x86.h
for the time being, until the last user of it is gone from kvm_main.c.
Signed-off-by: Carsten Otte <cotte@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
Carsten Otte [Mon, 29 Oct 2007 15:08:51 +0000 (16:08 +0100)]
KVM: Portability: Move memory segmentation to x86.c
This patch moves the definition of segment_descriptor_64 for AMD64 and
EM64T from kvm_main.c to segment_descriptor.h. It also adds a proper
#ifndef...#define...#endif around that header file.
The implementation of segment_base is moved from kvm_main.c to x86.c.
Signed-off-by: Carsten Otte <cotte@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>