x86: generic versions of find_first_(zero_)bit, convert i386
Generic versions of __find_first_bit and __find_first_zero_bit
are introduced as simplified versions of __find_next_bit and
__find_next_zero_bit. Their compilation and use are guarded by
a new config variable GENERIC_FIND_FIRST_BIT.
The generic versions of find_first_bit and find_first_zero_bit
are implemented in terms of the newly introduced __find_first_bit
and __find_first_zero_bit.
This patch does not remove the i386-specific implementation,
but it does switch i386 to use the generic functions by setting
GENERIC_FIND_FIRST_BIT=y for X86_32.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86: merge the simple bitops and move them to bitops.h
Some of those can be written in such a way that the same
inline assembly can be used to generate both 32 bit and
64 bit code.
For ffs and fls, x86_64 unconditionally used the cmov
instruction and i386 unconditionally used a conditional
branch over a mov instruction. In the current patch I
chose to select the version based on the availability
of the cmov instruction instead. A small detail here is
that x86_64 did not previously set CONFIG_X86_CMOV=y.
Improved comments for ffs, ffz, fls and variations.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86, generic: optimize find_next_(zero_)bit for small constant-size bitmaps
This moves an optimization for searching constant-sized small
bitmaps form x86_64-specific to generic code.
On an i386 defconfig (the x86#testing one), the size of vmlinux hardly
changes with this applied. I have observed only four places where this
optimization avoids a call into find_next_bit:
In the functions return_unused_surplus_pages, alloc_fresh_huge_page,
and adjust_pool_surplus, this patch avoids a call for a 1-bit bitmap.
In __next_cpu a call is avoided for a 32-bit bitmap. That's it.
On x86_64, 52 locations are optimized with a minimal increase in
code size:
Current #testing defconfig:
146 x bsf, 27 x find_next_*bit
text data bss dec hex filename 5392637 846592 724424 6963653 6a41c5 vmlinux
After removing the x86_64 specific optimization for find_next_*bit:
94 x bsf, 79 x find_next_*bit
text data bss dec hex filename 5392358 846592 724424 6963374 6a40ae vmlinux
After this patch (making the optimization generic):
146 x bsf, 27 x find_next_*bit
text data bss dec hex filename 5392396 846592 724424 6963412 6a40d4 vmlinux
The versions with inline assembly are in fact slower on the machines I
tested them on (in userspace) (Athlon XP 2800+, p4-like Xeon 2.8GHz, AMD
Opteron 270). The i386-version needed a fix similar to 06024f21 to avoid
crashing the benchmark.
Benchmark using: gcc -fomit-frame-pointer -Os. For each bitmap size
1...512, for each possible bitmap with one bit set, for each possible
offset: find the position of the first bit starting at offset. If you
follow ;). Times include setup of the bitmap and checking of the
results.
If the bitmap size is not a multiple of BITS_PER_LONG, and no set
(cleared) bit is found, find_next_bit (find_next_zero_bit) returns a
value outside of the range [0, size]. The generic version always returns
exactly size. The generic version also uses unsigned long everywhere,
while the x86 versions use a mishmash of int, unsigned (int), long and
unsigned long.
Using the generic version does give a slightly bigger kernel, though.
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes:
x86 PAT: decouple from nonpromisc devmem
x86 PAT: tone down debugging messages
* git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb:
V4L/DVB (7751): ir-kbd-i2c: Save a temporary memory allocation in ir_probe
V4L/DVB (7750): au0828/ cleanups and fixes
V4L/DVB (7748): tuner-core: some adjustments at tuner logs, if debug enabled
V4L/DVB (7746): pvrusb2: make signed one-bit bitfields unsigned
V4L/DVB (7744): pvrusb2-dvb: add atsc/qam support for Hauppauge pvrusb2 model 751xx
V4L/DVB (7742): cx88: Add support for the DViCO FusionHDTV_7_GOLD digital modes
V4L/DVB (7741): s5h1411: Adding support for this ATSC/QAM demodulator
V4L/DVB (7740): tuner-xc2028.c dubious !x & y
V4L/DVB (7739): mt312.h: dubious one-bit signed bitfield
V4L/DVB (7735): Fix compilation for au0828
V4L/DVB (7734): em28xx: copy and paste error in em28xx_init_isoc
V4L/DVB (7733): blackbird_find_mailbox negative return ignored in blackbird_initialize_codec()
V4L/DVB (7732): vivi: fix a warning
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (61 commits)
ide: sanitize handling of IDE_HFLAG_NO_SET_MODE host flag
sis5513: fail early for unsupported chipsets
it821x: fix kzalloc() failure handling
qd65xx: use IDE_HFLAG_SINGLE host flag
qd65xx: always use ->selectproc method
ide-cd: put proc-related functions together under single ifdef
ide-cd: Replace __FUNCTION__ with __func__
IDE: Coding Style fixes to drivers/ide/ide-cd.c
IDE: Coding Style fixes to drivers/ide/pci/cy82c693.c
IDE: Coding Style fixes to drivers/ide/pci/it8213.c
IDE: Coding Style fixes to drivers/ide/ide-floppy.c
IDE: Coding Style fixes to drivers/ide/legacy/ali14xx.c
IDE: Coding Style fixes to drivers/ide/legacy/hd.c
IDE: Coding Style fixes to drivers/ide/pci/cmd640.c
IDE: Coding Style fixes to drivers/ide/pci/opti621.c
IDE: Coding Style fixes to drivers/ide/ide-pnp.c
IDE: Coding Style fixes to drivers/ide/ide-proc.c
IDE: Coding Style fixes to drivers/ide/legacy/ide-4drives.c
IDE: Coding Style fixes to drivers/ide/legacy/umc8672.c
IDE: Coding Style fixes to drivers/ide/pci/generic.c
...
Merge branch 'agp-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6
* 'agp-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6:
agp: convert drivers/char/agp/frontend.c to use unlocked_ioctl
agp: fix shadowed variable warning in amd-k7-agp.c
Merge branch 'drm-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm: _end is shadowing real _end, just rename it.
drm/vbl rework: rework how the drm deals with vblank.
drm: reorganise minor number handling using backported modesetting code.
drm/i915: Handle tiled buffers in vblank tasklet
drm/i965: On I965, use correct 3DSTATE_DRAWING_RECTANGLE command in vblank
drm: Remove unneeded dma sync in ATI pcigart alloc
drm: Fix mismerge of non-coherent DMA patch
Ingo Molnar [Mon, 3 Mar 2008 11:38:52 +0000 (12:38 +0100)]
x86: add optimized inlining
add CONFIG_OPTIMIZE_INLINING=y.
allow gcc to optimize the kernel image's size by uninlining
functions that have been marked 'inline'. Previously gcc was
forced by Linux to always-inline these functions via a gcc
attribute:
Especially when the user has already selected
CONFIG_OPTIMIZE_FOR_SIZE=y this can make a huge difference in
kernel image size (using a standard Fedora .config):
qd_select() checks itself whether timings should be reprogrammed so
remove superfluous qd_timing_ok() and always use ->selectproc method
(rename qd_select() to qd65xx_select() while at it).
All host drivers now either set hwif->mmio or reserve continuous
I/O resources so remove no longer needed hwif->straight8 flag
and never reached code for 'hwif->straight8 == 0' case.
Adrian Bunk [Sat, 26 Apr 2008 15:36:37 +0000 (17:36 +0200)]
remove include/linux/hdsmart.h
include/linux/hdsmart.h is not used by the kernel and should therefore
be removed.
Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: "Robert P. J. Day" <rpjday@crashcourse.ca>, Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
This removes the internal ide-cd buffer and falls back to read-ahead block layer
capabilities. Thorough testing (cd burning, dvd read, raw read) gives with the
bufferless mode marginally better performance in addition to simplified code.
bufferless:
dd: reading `/dev/hdc': Input/output error
6238+0 records in
6238+0 records out 204406784 bytes (204 MB) copied, 259.891 s, 787 kB/s
real 4m21.598s
user 0m0.014s
sys 0m0.744s
with the old buffer (2.6.25-rc1):
dd: reading `/dev/hdc': Input/output error
6238+0 records in
6238+0 records out 204406784 bytes (204 MB) copied, 262.893 s, 778 kB/s
Sergei Shtylyov [Sat, 26 Apr 2008 15:36:31 +0000 (17:36 +0200)]
ide: make ide_pci_check_iomem() actually work
This function didn't actually check if a given BAR is in I/O space because of
using the bogus PCI_BASE_ADDRESS_IO_MASK (which equals ~3) to test the resource
flags instead of IORESOURCE_IO -- fix this, make ide_hwif_configure() check the
results failing if necessary, and move the printk() call to the failure path.
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Jacek Luczak [Fri, 11 Apr 2008 11:29:04 +0000 (13:29 +0200)]
x86: section mismatch fixes, #3
This patch fixes section mismatch warnings in unlock_ExtINT_logic().
WARNING: arch/x86/kernel/built-in.o(.text+0x14a92): Section mismatch in reference from the function unlock_ExtINT_logic()
to the function .init.text:find_isa_irq_pin()
The function unlock_ExtINT_logic() references
the function __init find_isa_irq_pin().
This is often because unlock_ExtINT_logic lacks a __init
annotation or the annotation of find_isa_irq_pin is wrong.
Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Jacek Luczak [Fri, 11 Apr 2008 11:28:49 +0000 (13:28 +0200)]
x86: section mismatch fixes, #2
This patch fixes section mismatch warnings in smpboot_setup_io_apic().
WARNING: arch/x86/kernel/built-in.o(.text+0x11781): Section mismatch in reference from the function smpboot_setup_io_apic()
to the function .init.text:setup_IO_APIC()
The function smpboot_setup_io_apic() references
the function __init setup_IO_APIC().
This is often because smpboot_setup_io_apic lacks a __init
annotation or the annotation of setup_IO_APIC is wrong.
Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Roland McGrath [Tue, 22 Apr 2008 19:21:25 +0000 (12:21 -0700)]
x86_64 ia32 ptrace: convert to compat_arch_ptrace
Now that there are no more special cases in sys32_ptrace, we
can convert to using the generic compat_sys_ptrace entry point.
The sys32_ptrace function gets simpler and becomes compat_arch_ptrace.
Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Roland McGrath [Tue, 22 Apr 2008 19:20:20 +0000 (12:20 -0700)]
x86_64 ia32 ptrace: use compat_ptrace_request for siginfo
This removes the special-case handling for PTRACE_GETSIGINFO
and PTRACE_SETSIGINFO from x86_64's sys32_ptrace. The generic
compat_ptrace_request code handles these.
Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>