arch/x86/kernel/acpi/boot.c: In function ‘dmi_ignore_irq0_timer_override’:
arch/x86/kernel/acpi/boot.c:1443: error: implicit declaration of function ‘force_mask_ioapic_irq_2’
The problems are that, with the ACPI vs timer overring issue _fixed_,
after using the box for some time (between several seconds and 1 hour, at
random) processes get very high CPU loads (once I've got X using 107% of
the CPU, for example) and the system becomes unresponsive, as though there
were interrupts lost or something similar.
Andreas Herrman reproduced similar problems:
> Ok, now I've reproduced the stability problem.
> - Using tip/master,
> - reverting e38502eb8aa82314d5ab0eba45f50e6790dadd88 and
> - applying your patch from this posting
> http://marc.info/?l=linux-kernel&m=121539354224562&w=4
>
> Starting X, firefox, gimp, tuxpaint and doing some drawing in tuxpaint
> results in a slow system. Drawing is almost not possible anymore --
> Selections of new colors, cursors etc. is performed with huge delay
> if it's performed at all.
>
> BTW, the code sets up timer IRQ as Virtual Wire IRQ:
>
> Jul 8 14:57:58 kodscha IO-APIC (apicid-pin) 2-22, 2-23 not connected.
> Jul 8 14:57:58 kodscha ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> Jul 8 14:57:58 kodscha ...trying to set up timer as Virtual Wire IRQ... works.
>
> and both INT0 and INT2 of IOAPIC are masked:
>
> Jul 8 14:57:58 kodscha NR Dst Mask Trig IRR Pol Stat Dmod Deli Vect:
> Jul 8 14:57:58 kodscha 00 000 1 0 0 0 0 0 0 00
> Jul 8 14:57:58 kodscha 01 003 0 0 0 0 0 1 1 31
> Jul 8 14:57:58 kodscha 02 003 1 0 0 0 0 0 0 30
>
> I've also seen strange CPU utilization -- with syslog-ng:
>
> top - 15:33:06 up 35 min, 4 users, load average: 1.70, 0.68, 0.37
> Tasks: 64 total, 4 running, 60 sleeping, 0 stopped, 0 zombie
> Cpu0 : 0.0%us,100.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu1 : 6.4%us, 87.2%sy, 0.0%ni, 5.8%id, 0.0%wa, 0.6%hi, 0.0%si, 0.0%st
> Mem: 895384k total, 283568k used, 611816k free, 35492k buffers
> Swap: 1959920k total, 0k used, 1959920k free, 163044k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 4632 root 20 0 17216 800 580 S 104 0.1 0:34.22 syslog-ng
> 28505 root 20 0 205m 11m 4024 S 6 1.3 0:21.16 X
> 28518 root 20 0 56292 5652 4492 S 1 0.6 0:01.80 fluxbox
> 1 root 20 0 3724 608 508 S 0 0.1 0:00.36 init
>
> So far I have no clue why C1E-idle in conjunction with virtual wire
> mode causes this strange behaviour.
>
> ... and I start to think about the root cause of all this.
>
> I've performed similar tests under X with the IRQ0/INT0 configuration and
> I did not see above symptoms.
So lets fall back to the IRQ0/INT0 configuration on this box.
This basically restores the dont-use-the-lapic-timer exception mechanism
that was unconditional on this box prior commit 8750bf5 ("x86: add C1E
aware idle function").
Glauber Costa [Wed, 25 Jun 2008 15:48:47 +0000 (12:48 -0300)]
x86: merge __get_user_asm and its users.
Move __get_user_asm and __get_user_size and __get_user_nocheck
to uaccess.h. This requires us to define a macro at __get_user_size
for the 64-bit access case.
Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Glauber Costa [Wed, 25 Jun 2008 14:48:29 +0000 (11:48 -0300)]
x86: merge __put_user_asm and its user.
Move both __put_user_asm and __put_user_size to
uaccess.h. i386 already had a special function for 64-bit access,
so for x86_64, we just define a macro with the same name.
Note that for X86_64, CONFIG_X86_WP_WORKS_OK will always
be defined, so the #else part will never be even compiled in.
Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Glauber Costa [Wed, 25 Jun 2008 14:05:11 +0000 (11:05 -0300)]
x86: merge getuser.
Merge versions of getuser from uaccess_32.h and uaccess_64.h into
uaccess.h. There is a part which is 64-bit only (for now), and for
that, we use a __get_user_8 macro.
Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Glauber Costa [Fri, 13 Jun 2008 17:39:25 +0000 (14:39 -0300)]
x86: merge common parts of uaccess.
Common parts of uaccess_32.h and uaccess_64.h
are put in uaccess.h. Bits in uaccess_32.h and
uaccess_64.h that come to this file are equal
except for comments and whitespaces differences.
Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Glauber Costa [Mon, 30 Jun 2008 20:07:51 +0000 (17:07 -0300)]
x86: change asm constraint.
Our integration efforts broke a build with this function being used
with i386. Reason is "g" can put the operand in an imm32, which according
to The Book (tm), is invalid as the second operand.
This is actually a bug
in x86_64 too, since the x86_64 instruction set reference does not list
it as valid.
We probably didn't trigger this before due to the ammount of
registers available for 64-bit platforms. But that's just my guess.
Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Glauber Costa [Wed, 25 Jun 2008 13:14:13 +0000 (10:14 -0300)]
x86: commonize __range_not_ok.
For i386, __range_not_ok is a better name than __range_ok, since
it returns 0 when it is in fact okay. Other than that,
both versions does not need the word size specifiers, and we remove them.
Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Glauber Costa [Tue, 24 Jun 2008 19:51:59 +0000 (16:51 -0300)]
x86: change testing logic in putuser_64.S.
Instead of operating over a register we need to put back
into normal state afterwards (the memory position), just
sub from rbx, which is trashed anyway. We can save a few instructions.
Also, this is the i386 way.
Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Glauber Costa [Tue, 24 Jun 2008 18:02:31 +0000 (15:02 -0300)]
x86: user put_user_x instead of all variants.
Follow the pattern, and define a single put_user_x, instead
of defining macros for all available sizes. Exception is
put_user_8, since the "A" constraint does not give us enough
power to specify which register (a or d) to use in the 32-bit
common case.
Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Glauber Costa [Tue, 24 Jun 2008 14:37:57 +0000 (11:37 -0300)]
x86: introduce __ASM_REG macro.
There are situations in which the architecture wants to use the
register that represents its word-size, whatever it is. For those,
introduce __ASM_REG in asm.h, along with the first users _ASM_AX
and _ASM_DX. They have users waiting for it, namely the getuser
functions.
Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Glauber Costa [Fri, 13 Jun 2008 19:35:52 +0000 (16:35 -0300)]
x86: don't clobber r8 nor use rcx.
There's really no reason to clobber r8 or pass the address in rcx.
We can safely use only two registers (which we already have to touch anyway)
to do the job.
Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86, uv: build fix #2 for "x86, uv: update x86 mmr list for SGI uv"
fix:
In file included from arch/x86/kernel/tlb_uv.c:14:
include/asm/uv/uv_mmrs.h:986: error: redefinition of ‘union uvh_rh_gam_cfg_overlay_config_mmr_u’
include/asm/uv/uv_mmrs.h:988: error: redefinition of ‘struct uvh_rh_gam_cfg_overlay_config_mmr_s’
include/asm/uv/uv_mmrs.h:1064: error: redefinition of ‘union uvh_rh_gam_mmioh_overlay_config_mmr_u’
include/asm/uv/uv_mmrs.h:1066: error: redefinition of ‘struct uvh_rh_gam_mmioh_overlay_config_mmr_s’
caused by another duplicate section (cut & paste error) in commit 5d061e397db1 "x86, uv: update x86 mmr list for SGI uv".
x86, uv: build fix for "x86, uv: update x86 mmr list for SGI uv"
fix:
In file included from arch/x86/kernel/genx2apic_uv_x.c:25:
include/asm/uv/uv_mmrs.h:986: error: redefinition of ‘union uvh_rh_gam_cfg_overlay_config_mmr_u’
include/asm/uv/uv_mmrs.h:988: error: redefinition of ‘struct uvh_rh_gam_cfg_overlay_config_mmr_s’
include/asm/uv/uv_mmrs.h:1064: error: redefinition of ‘union uvh_rh_gam_mmioh_overlay_config_mmr_u’
include/asm/uv/uv_mmrs.h:1066: error: redefinition of ‘struct uvh_rh_gam_mmioh_overlay_config_mmr_s’
caused by duplicate section (cut & paste error) in commit 5d061e397db1 "x86, uv: update x86 mmr list for SGI uv".
- local caching of smp_processor_id() in default_do_nmi()
- v2: do not split default_do_nmi over two lines
On Wed, Jul 02, 2008 at 08:12:20PM +0400, Cyrill Gorcunov wrote:
> | -static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
> | +static notrace __kprobes void
> | +default_do_nmi(struct pt_regs *regs)
> | [ ... ]
> | -asmlinkage notrace __kprobes void default_do_nmi(struct pt_regs *regs)
> | +asmlinkage notrace __kprobes void
> | +default_do_nmi(struct pt_regs *regs)
>
> Hi Alexander, good done, thanks! But why did you split default_do_nmi
> definition by two lines? I think it would be better to keep them as it
> was before, ie by a single line
>
> static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
Thanks! Here is the replacement patch with default_do_nmi left on
a single line.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Acked-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reorder headers and collect globals in traps_32.c and traps_64.c
Code size and data size are unaffected by the changes. Code
itself is changed due to different ordering of data and bss.
The bss segment changed size due to a change in the packing
of the variables.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Acked-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Merge the tsc calibration code for the 32bit and 64bit kernel.
The paravirtualized calculate_cpu_khz for 64bit now points to the correct
tsc_calibrate code as in 32bit.
Original native_calculate_cpu_khz for 64 bit is now called as calibrate_cpu.
Also moved the recalibrate_cpu_khz function in the common file.
Note that this function is called only from powernow K7 cpu freq driver.
Signed-off-by: Alok N Kataria <akataria@vmware.com> Signed-off-by: Dan Hecht <dhecht@vmware.com> Cc: Dan Hecht <dhecht@vmware.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move the basic global variable definitions and sched_clock handling in the
common "tsc.c" file.
- Unify notsc kernel command line handling for 32 bit and 64bit.
- Functional changes for 64bit.
- "tsc_disabled" is updated if "notsc" is passed at boottime.
- Fallback to jiffies for sched_clock, incase notsc is passed on
commandline.
Signed-off-by: Alok N Kataria <akataria@vmware.com> Signed-off-by: Dan Hecht <dhecht@vmware.com> Cc: Dan Hecht <dhecht@vmware.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Bernhard Walle [Fri, 27 Jun 2008 11:12:55 +0000 (13:12 +0200)]
x86: use FIRMWARE_MEMMAP on x86/E820
This patch uses the /sys/firmware/memmap interface provided in the last patch
on the x86 architecture when E820 is used. The patch copies the E820
memory map very early, and registers the E820 map afterwards via
firmware_map_add_early().
Bernhard Walle [Fri, 27 Jun 2008 11:12:54 +0000 (13:12 +0200)]
sysfs: add /sys/firmware/memmap
This patch adds /sys/firmware/memmap interface that represents the BIOS
(or Firmware) provided memory map. The tree looks like:
/sys/firmware/memmap/0/start (hex number)
end (hex number)
type (string)
... /1/start
end
type
With the following shell snippet one can print the memory map in the same form
the kernel prints itself when booting on x86 (the E820 map).
--------- 8< --------------------------
#!/bin/sh
cd /sys/firmware/memmap
for dir in * ; do
start=$(cat $dir/start)
end=$(cat $dir/end)
type=$(cat $dir/type)
printf "%016x-%016x (%s)\n" $start $[ $end +1] "$type"
done
--------- >8 --------------------------
That patch only provides the needed interface:
1. The sysfs interface.
2. The structure and enumeration definition.
3. The function firmware_map_add() and firmware_map_add_early()
that should be called from architecture code (E820/EFI, for
example) to add the contents to the interface.
If the kernel is compiled without CONFIG_FIRMWARE_MEMMAP, the interface does
nothing without cluttering the architecture-specific code with #ifdef's.
The purpose of the new interface is kexec: While /proc/iomem represents
the *used* memory map (e.g. modified via kernel parameters like 'memmap'
and 'mem'), the /sys/firmware/memmap tree represents the unmodified memory
map provided via the firmware. So kexec can:
- use the original memory map for rebooting,
- use the /proc/iomem for setting up the ELF core headers for kdump
case that should only represent the memory of the system.
Rather than using _PAGE_GLOBAL - which not all CPUs support - to test
CPA, use one of the reserved-for-software-use PTE flags instead. This
allows CPA testing to work on CPUs which don't support PGD.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86_32: remove __PAGE_KERNEL(_EXEC)
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Older x86-32 processors do not support global mappings (PGD), so must
only use it if the processor supports it.
The _PAGE_KERNEL* flags always have _PAGE_KERNEL set, since logically
we always want it set.
This is OK even on processors which do not support PGD, since all
_PAGE flags are masked with __supported_pte_mask before being turned
into a real in-pagetable pte. On 32-bit systems, __supported_pte_mask
is initialized to not contain _PAGE_GLOBAL, and it is then added if
the CPU is found to support it.
The x86-32 code used to use __PAGE_KERNEL/__PAGE_KERNEL_EXEC for this
purpose, but they're now redundant and can be removed.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86: always set _PAGE_GLOBAL in _PAGE_KERNEL* flags
Consistently set _PAGE_GLOBAL in _PAGE_KERNEL flags. This makes 32-
and 64-bit code consistent, and removes some special cases where
__PAGE_KERNEL* did not have _PAGE_GLOBAL set, causing confusion as a
result of the inconsistencies.
This patch only affects x86-64, which generally always supports PGD.
The x86-32 patch is next.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Yinghai Lu [Thu, 3 Jul 2008 01:53:44 +0000 (18:53 -0700)]
x86: move init_cpu_to_node after get_smp_config
when acpi=off, cpu_to_apicid is ready after get_smp_config
so need to move init_cpu_to_node after it.
otherwise, we will get wrong cpu->node mapping, and it will rely on
amd_detect_cmp() to correct it - but that is too late as
setup_per_cpu_data is already called before that so we will get
per_cpu_data on the wrong node.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Bernhard Walle [Thu, 26 Jun 2008 19:54:08 +0000 (21:54 +0200)]
x86: find offset for crashkernel reservation automatically
This patch removes the need of the crashkernel=...@offset parameter to define
a fixed offset for crashkernel reservation. That feature can be used together
with a relocatable kernel where the kexec-tools relocate the kernel and
get the actual offset from /proc/iomem.
The use case is a kernel where the .text+.data+.bss is after 16M physical
memory (debug kernel with lockdep on x86_64 can cause that) which caused a
major pain in autoconfiguration in our distribution.
Also, that patch unifies crashdump architectures a bit since IA64 has
that semantics from the very beginning of the kdump port.
Signed-off-by: Bernhard Walle <bwalle@suse.de> Cc: vgoyal@redhat.com Cc: Bernhard Walle <bwalle@suse.de> Cc: kexec@lists.infradead.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mike Travis [Fri, 27 Jun 2008 17:10:13 +0000 (10:10 -0700)]
x86: add check for node passed to node_to_cpumask, v3
* When CONFIG_DEBUG_PER_CPU_MAPS is set, the node passed to
node_to_cpumask and node_to_cpumask_ptr should be validated.
If invalid, then a dump_stack is performed and a zero cpumask
is returned.
v2: Slightly different version to remove a compiler warning.
v3: Redone to reflect moving setup.c -> setup_percpu.c
Signed-off-by: Mike Travis <travis@sgi.com> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: "akpm@linux-foundation.org" <akpm@linux-foundation.org> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86: fix CPA self-test for "x86/paravirt: groundwork for 64-bit Xen support"
Ingo Molnar wrote:
> -tip auto-testing found pagetable corruption (CPA self-test failure):
>
> [ 32.956015] CPA self-test:
> [ 32.958822] 4k 2048 large 508 gb 0 x 2556[ffff880000000000-ffff88003fe00000] miss 0
> [ 32.964000] CPA ffff88001d54e000: bad pte 1d4000e3
> [ 32.968000] CPA ffff88001d54e000: unexpected level 2
> [ 32.972000] CPA ffff880022c5d000: bad pte 22c000e3
> [ 32.976000] CPA ffff880022c5d000: unexpected level 2
> [ 32.980000] CPA ffff8800200ce000: bad pte 200000e3
> [ 32.984000] CPA ffff8800200ce000: unexpected level 2
> [ 32.988000] CPA ffff8800210f0000: bad pte 210000e3
>
> config and full log can be found at:
>
> http://redhat.com/~mingo/misc/config-Mon_Jun_30_11_11_51_CEST_2008.bad
> http://redhat.com/~mingo/misc/log-Mon_Jun_30_11_11_51_CEST_2008.bad
Phew. OK, I've worked this out. Short version is that's it's a false
alarm, and there was no real failure here. Long version:
* I changed the code to create the physical mapping pagetables to
reuse any existing mapping rather than replace it. Specifically,
reusing an pud pointed to by the pgd caused this symptom to appear.
* The specific PUD being reused is the one created statically in
head_64.S, which creates an initial 1GB mapping.
* That mapping doesn't have _PAGE_GLOBAL set on it, due to the
inconsistency between __PAGE_* and PAGE_*.
* The CPA test attempts to clear _PAGE_GLOBAL, and then checks to
see that the resulting range is 1) shattered into 4k pages, and 2)
has no _PAGE_GLOBAL.
* However, since it didn't have _PAGE_GLOBAL on that range to start
with, change_page_attr_clear() had nothing to do, and didn't
bother shattering the range,
* resulting in the reported messages
The simple fix is to set _PAGE_GLOBAL in level2_ident_pgt.
An additional fix to make CPA testing more robust by using some other
pagetable bit (one of the unused available-to-software ones). This
would solve spurious CPA test warnings under Xen which uses _PAGE_GLOBAL
for its own purposes (ie, not under guest control).
Also, we should revisit the use of _PAGE_GLOBAL in asm-x86/pgtable.h,
and use it consistently, and drop MAKE_GLOBAL. The first time I
proposed it it caused breakages in the very early CPA code; with luck
that's all fixed now.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Mark McLoughlin <markmc@redhat.com> Cc: xen-devel <xen-devel@lists.xensource.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Yinghai Lu [Mon, 30 Jun 2008 23:20:54 +0000 (16:20 -0700)]
x86: move reserve_setup_data to setup.c
Ying Huang would like setup_data to be reserved, but not included in the
no save range.
Here we try to modify the e820 table to reserve that range early.
also add that in early_res in case bootloader messes up with the ramdisk.
other solution would be
1. add early_res_to_highmem...
2. early_res_to_e820...
but they could reserve another type memory wrongly, if early_res has some
resource reserved early, and not needed later, but it is not removed from
early_res in time. Like the RAMDISK (already handled).