Arjan van de Ven [Thu, 17 Apr 2008 15:41:31 +0000 (17:41 +0200)]
x86: add comments to describe the new api's in cacheflush.h
The new cacheflush.h API's didn't have any comments describing
how they're to be used yet and the conventions around these functions.
This patch adds comments to this effect; in order for that to be
a logical series, some prototypes had to move around.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jesper Juhl [Wed, 26 Mar 2008 01:16:15 +0000 (02:16 +0100)]
x86 floppy: kill off the 'register' keyword from header
When compilers became generally better at optimizing code than humans, the
register keyword became mostly useless. For the floppy driver it certainly
is since it's so slow compared to the rest of the system that optimizing
access to a single variable or two isn't going to make any real difference
So let's just leave it to the compiler - it'll do a better job anyway.
This patch does away with a few register keywords in the x86 floppy driver.
Andi Kleen [Wed, 12 Mar 2008 02:53:32 +0000 (03:53 +0100)]
x86: split large page mapping for AMD TSEG
On AMD SMM protected memory is part of the address map, but handled
internally like an MTRR. That leads to large pages getting split
internally which has some performance implications. Check for the
AMD TSEG MSR and split the large page mapping on that area
explicitely if it is part of the direct mapping.
There is also SMM ASEG, but it is in the first 1MB and already covered by
the earlier split first page patch.
Idea for this came from an earlier patch by Andreas Herrmann
On a RevF dual Socket Opteron system kernbench shows a clear
improvement from this:
(together with the earlier patches in this series, especially the
split first 2MB patch)
[lower is better]
no split stddev split stddev delta
Elapsed Time 87.146 (0.727516) 84.296 (1.09098) -3.2%
User Time 274.537 (4.05226) 273.692 (3.34344) -0.3%
System Time 34.907 (0.42492) 34.508 (0.26832) -1.1%
Percent CPU 322.5 (38.3007) 326.5 (44.5128) +1.2%
=> About 3.2% improvement in elapsed time for kernbench.
With GB pages on AMD Fam1h the impact of splitting is much higher of course,
since it would split two full GB pages (together with the first
1MB split patch) instead of two 2MB pages. I could not benchmark
a clear difference in kernbench on gbpages, so I kept it disabled
for that case
That was only limited benchmarking of course, so if someone
was interested in running more tests for the gbpages case
that could be revisited (contributions welcome)
I didn't bother implementing this for 32bit because it is very
unlikely the 32bit lowmem mapping overlaps into the TSEG near 4GB
and the 2MB low split is already handled for both.
[ mingo@elte.hu: do it on gbpages kernels too, there's no clear reason
why it shouldnt help there. ]
Andi Kleen [Wed, 12 Mar 2008 02:53:30 +0000 (03:53 +0100)]
x86: don't use large pages to map the first 2/4MB of memory
Intel recommends to not use large pages for the first 1MB
of the physical memory because there are fixed size MTRRs there
which cause splitups in the TLBs.
On AMD doing so is also a good idea.
The implementation is a little different between 32bit and 64bit.
On 32bit I just taught the initial page table set up about this
because it was very simple to do. This also has the advantage
that the risk of a prefetch ever seeing the page even
if it only exists for a short time is minimized.
On 64bit that is not quite possible, so use set_memory_4k() a little
later (in check_bugs) instead.
Andi Kleen [Wed, 12 Mar 2008 02:53:29 +0000 (03:53 +0100)]
x86: add set_memory_4k to pageattr.c
Add a new function to force split large pages into 4k pages.
This is needed for some followup optimizations.
I had to add a new field to cpa_data to pass down the information
that try_preserve_large_page should not run.
Right now no set_page_4k() because I didn't need it and all the
specialized users I have in mind would be more comfortable with
pure addresses. I also didn't export it because it's unlikely
external code needs it.
Andi Kleen [Wed, 12 Mar 2008 02:53:27 +0000 (03:53 +0100)]
x86: implement true end_pfn_mapped for 32bit
Even on 32bit 2MB pages can map more memory than is in the true
max_low_pfn if end_pfn is not highmem and not aligned to 2MB.
Add a end_pfn_map similar to x86-64 that accounts for this
fact. This is important for code that really needs to know about
all mapping aliases.
Andi Kleen [Tue, 11 Mar 2008 01:23:21 +0000 (02:23 +0100)]
x86: move early exception handlers into init.text
Currently they are in .text.head because the rest of head_64.S.
.text.head is not removed as init data, but the early exception handlers
should be because they are not needed after early boot of the BP.
So move them over.
Andi Kleen [Tue, 11 Mar 2008 01:23:22 +0000 (02:23 +0100)]
x86: replace early exception setup macro recursion with loop
The early exception handlers are currently set up using a macro
recursion. There is only one user left. Replace the macro with a
standard loop in place.
* Ingo Molnar (mingo@elte.hu) wrote:
>
> * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
>
> > The shadow vmap for DEBUG_RODATA kernel text modification uses
> > virt_to_page to get the pages from the pointer address.
> >
> > However, I think vmalloc_to_page would be required in case the page is
> > used for modules.
> >
> > Since only the core kernel text is marked read-only, use
> > kernel_text_address() to make sure we only shadow map the core kernel
> > text, not modules.
>
> actually, i think we should mark module text readonly too.
>
Yes, but in the meantime, the x86 tree would need this patch to make
kprobes work correctly on modules.
I suspect that without this fix, with the enhanced hotplug and kprobes
patch, kprobes will use text_poke to insert breakpoints in modules
(vmalloced pages used), which will map the wrong pages and corrupt
random kernel locations instead of updating the correct page.
Work that would write protect the module pages should clearly be done,
but it can come in a later time. We have to make sure we interact
correctly with the page allocation debugging, as an example.
Here is the patch against x86.git 2.6.25-rc5 :
The shadow vmap for DEBUG_RODATA kernel text modification uses virt_to_page to
get the pages from the pointer address.
However, I think vmalloc_to_page would be required in case the page is used for
modules.
Since only the core kernel text is marked read-only, use kernel_text_address()
to make sure we only shadow map the core kernel text, not modules.
vSMP detection: access pci config space early in boot to detect if the
system is a vSMPowered box, and cache the result in a flag, so that
is_vsmp_box() retrieves the value of the flag always.
x86: only enable interrupts when kernel state has been set up
The sysenter path tries to enable interrupts immediately. Unfortunately
this doesn't work in a paravirt environment, because not enough kernel
state has been set up at that point (namely, pointing %fs to the kernel
percpu data segment). To fix this, defer ENABLE_INTERRUPTS until after
the kernel state has been set up.
Unfortunately this means that we're running with interrupts disabled
for a while without calling the IRQ tracing code, but that can't be
called without setting up %fs either.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>