err.no Git - linux-2.6/log

[libata] wrap kmap_atomic(KM_IRQ0) with local_irq_save/restore()

Interrupts must be disabled if using kmap_atomic(KM_IRQ0), but that was
not the case in a few code paths coming directly from ATA driver
interrupt handlers (which use spin_lock rather than spin_lock_irqsave).

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

sata_svw: Add support for HT1100 SATA controller

This patch adds support (including ATAPI DMA) for HT1100 (aka BCM11000) SATA controller.

Signed-off-by: Anantha Subramanyam <ananth@broadcom.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6

* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6:
  [XFS] Undo bit ops cleanup mod due to regression on 32-bit powermac
  [XFS] Undo bit ops cleanup mod due to regression on 32-bit powermac
  Remove empty file fs/xfs/Makefile-linux-2.6.

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: add missing ext4_journal_stop()
  ext4: ext4_find_next_zero_bit needs an aligned address on some arch
  ext4: set EXT4_EXTENTS_FL only for directory and regular files
  ext4: Don't mark filesystem error if fallocate fails
  ext4: Fix BUG when writing to an unitialized extent
  ext4: Don't use ext4_dec_count() if not needed
  ext4: modify block allocation algorithm for the last group
  ext4: Don't claim block from group which has corrupt bitmap
  ext4: Get journal write access before modifying the extent tree
  ext4: Fix memory and buffer head leak in callers to ext4_ext_find_extent()
  ext4: Don't leave behind a half-created inode if ext4_mkdir() fails
  ext4: Fix kernel BUG at fs/ext4/mballoc.c:910!
  ext4: Fix locking hierarchy violation in ext4_fallocate()
  Remove incorrect BKL comments in ext4

Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
Revert "power_state: get rid of write-only variable in SATA"
make atapi_dmadir static

Merge branch 'v2.6.25-rc3-lockdep' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep

* 'v2.6.25-rc3-lockdep' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep:
Subject: lockdep: include all lock classes in all_lock_classes
lockdep: increase MAX_LOCK_DEPTH

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
  firewire: fix NULL pointer deref. and resource leak
  Documentation: correction to debugging-via-ohci1394
  ieee1394: sbp2: fix rescan-scsi-bus
  firewire: fw-sbp2: fix NULL pointer deref. in scsi_remove_device
  firewire: fw-sbp2: fix NULL pointer deref. in slave_alloc
  firewire: fw-sbp2: (try to) avoid I/O errors during reconnect
  firewire: fw-sbp2: enforce a retry of __scsi_add_device if bus generation changed
  firewire: fw-sbp2: sort includes
  firewire: fw-sbp2: logout and login after failed reconnect
  firewire: fw-sbp2: don't add scsi_device twice
  firewire: fw-sbp2: log bus_id at management request failures
  firewire: fw-sbp2: wait for completion of fetch agent reset
  ieee1394: sbp2: add INQUIRY delay workaround
  firewire: fw-sbp2: add INQUIRY delay workaround
  firewire: log GUID of new devices
  firewire: fw-sbp2: don't retry login or reconnect after unplug
  firewire: fix "kobject_add failed for fw* with -EEXIST"
  firewire: fw-sbp2: fix logout before login retry
  firewire: fw-sbp2: unsigned int vs. unsigned

Merge git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86

* git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86: (24 commits)
  x86: no robust/pi futex for real i386 CPUs
  x86: fix boot failure on 486 due to TSC breakage
  x86: fix build on non-C locales.
  x86: make c_idle.work have a static address.
  x86: don't save unreliable stack trace entries
  x86: don't make swapper_pg_pmd global
  x86: don't print a warning when MTRR are blank and running in KVM
  x86: fix execve with -fstack-protect
  x86: fix vsyscall wreckage
  x86: rename KERNEL_TEXT_SIZE => KERNEL_IMAGE_SIZE
  x86: fix spontaneous reboot with allyesconfig bzImage
  x86: remove double-checking empty zero pages debug
  x86: notsc is ignored on common configurations
  x86/mtrr: fix kernel-doc missing notation
  x86: handle BIOSes which terminate e820 with CF=1 and no SMAP
  x86: add comments for NOPs
  x86: don't use P6_NOPs if compiling with CONFIG_X86_GENERIC
  x86: require family >= 6 if we are using P6 NOPs
  x86: do not promote TM3x00/TM5x00 to i686-class
  x86: hpet fix docbook comment
  ...

Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched

* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched:
  latencytop: change /proc task_struct access method
  latencytop: fix memory leak on latency proc file
  latencytop: fix kernel panic while reading latency proc file
  sched: add declaration of sched_tail to sched.h
  sched: fix signedness warnings in sched.c
  sched: clean up __pick_last_entity() a bit
  sched: remove duplicate code from sched_fair.c
  sched: make early bootup sched_clock() use safer

printk: fix possible printk overrun

printk recursion detection prepends message to printk_buf and offsets
printk_buf when actual message is printed but it forgets to trim buffer
length accordingly. This can result in overrun in extreme cases. Fix it.

[ mingo@elte.hu:

  bug was introduced by me via:

   commit 32a76006683f7b28ae3cc491da37716e002f198e
   Author: Ingo Molnar <mingo@elte.hu>
   Date:   Fri Jan 25 21:07:58 2008 +0100

       printk: make printk more robust by not allowing recursion
]

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86: no robust/pi futex for real i386 CPUs

Real i386 CPUs do not have cmpxchg instructions. Catch it before
crashing on an invalid opcode.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: fix boot failure on 486 due to TSC breakage

> Diffing dmesg between git7 and git8 doesn't sched any light since
> git8 also removed the printouts of the x86 caps as they were being
> initialised and updated. I'm currently adding those printouts back
> in the hope of seeing where and when the caps get broken.

That turned out to be very illuminating:

--- dmesg-2.6.24-git7 2008-02-24 18:01:25.295851000 +0100
+++ dmesg-2.6.24-git8 2008-02-24 18:01:25.530358000 +0100
...
CPU: After generic identify, caps: 00000003 00000000 00000000 00000000 00000000 00000000 00000000 00000000

CPU: After all inits, caps: 00000003 00000000 00000000 00000000 00000000 00000000 00000000 00000000
+CPU: After applying cleared_cpu_caps, caps: 00000013 00000000 00000000 00000000 00000000 00000000 00000000 00000000

Notice how the TSC cap bit goes from Off to On.

(The first two lines are printout loops from -git7 forward-ported
to -git8, the third line is the same printout loop added just after
the xor-with-cleared_cpu_caps[] loop.)

Here's how the breakage occurs:
1. arch/x86/kernel/tsc_32.c:tsc_init() sees !cpu_has_tsc,
   so bails and calls setup_clear_cpu_cap(X86_FEATURE_TSC).
2. include/asm-x86/cpufeature.h:setup_clear_cpu_cap(bit) clears
   the bit in boot_cpu_data and sets it in cleared_cpu_caps
3. arch/x86/kernel/cpu/common.c:identify_cpu() XORs all caps
   in with cleared_cpu_caps
   HOWEVER, at this point c->x86_capability correctly has TSC
   Off, cleared_cpu_caps has TSC On, so the XOR incorrectly
   sets TSC to On in c->x86_capability, with disastrous results.

The real bug is that clearing bits with XOR only works if the
bits are known to be 1 prior to the XOR, and that's not true here.

A simple fix is to convert the XOR to AND-NOT instead. The following
patch does that, and allows my 486 to boot 2.6.25-rc kernels again.

[ mingo@elte.hu: fixed a similar bug in setup_64.c as well. ]

The breakage was introduced via commit 7d851c8d3db0.

Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86: fix build on non-C locales.

For some locales regex range [a-zA-Z] does not work as it is supposed to.
so we have to use [:alnum:] and [:xdigit:] to make it work as intended.

[1] http://en.wikipedia.org/wiki/Estonian_alphabet

Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86: make c_idle.work have a static address.

Currently, c_idle is declared in the stack, and thus, have no static address.

Peter Zijlstra points out this simple solution, in which c_idle.work
is initializated separatedly. Note that the INIT_WORK macro has a static
declaration of a key inside.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Acked-by: Peter Zijlstra <pzijlstr@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86: don't save unreliable stack trace entries

Currently, there is no way for print_stack_trace() to determine whether
a given stack trace entry was deemed reliable or not, simply because
save_stack_trace() does not record this information. (Perhaps needless
to say, this makes the saved stack traces A LOT harder to read, and
probably with no other benefits, since debugging features that use
save_stack_trace() most likely also require frame pointers, etc.)

This patch reverts to the old behaviour of only recording the reliable trace
entries for saved stack traces.

Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86: don't make swapper_pg_pmd global

There doesn't seem to be any reason for swapper_pg_pmd being global.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86: don't print a warning when MTRR are blank and running in KVM

Inside a KVM virtual machine the MTRRs are usually blank. This confuses Linux
and causes a warning message at boot. This patch removes that warning message
when running Linux as a KVM guest.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86: fix execve with -fstack-protect

pointed out by pageexec@freemail.hu:

> what happens here is that gcc treats the argument area as owned by the
> callee, not the caller and is allowed to do certain tricks. for ssp it
> will make a copy of the struct passed by value into the local variable
> area and pass *its* address down, and it won't copy it back into the
> original instance stored in the argument area.
>
> so once sys_execve returns, the pt_regs passed by value hasn't at all
> changed and its default content will cause a nice double fault (FWIW,
> this part took me the longest to debug, being down with cold didn't
> help it either ;).

To fix this we pass in pt_regs by pointer.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: fix vsyscall wreckage

based on a report from Arne Georg Gleditsch about user-space apps
misbehaving after toggling /proc/sys/kernel/vsyscall64, a review
of the code revealed that the "NOP patching" done there is
fundamentally unsafe for a number of reasons:

1) the patching code runs without synchronizing other CPUs

2) it inserts NOPs even if there is no clock source which provides vread

3) when the clock source changes to one without vread we run in
exactly the same problem as in #2

4) if nobody toggles the proc entry from 1 to 0 and to 1 again, then
the syscall is not patched out

as a result it is possible to break user-space via this patching.
The only safe thing for now is to remove the patching.

This code was broken since v2.6.21.

Reported-by: Arne Georg Gleditsch <arne.gleditsch@dolphinics.no>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86: rename KERNEL_TEXT_SIZE => KERNEL_IMAGE_SIZE

The KERNEL_TEXT_SIZE constant was mis-named, as we not only map the kernel
text but data, bss and init sections as well.

That name led me on the wrong path with the KERNEL_TEXT_SIZE regression,
because i knew how big of _text_ my images have and i knew about the 40 MB
"text" limit so i wrongly thought to be on the safe side of the 40 MB limit
with my 29 MB of text, while the total image size was slightly above 40 MB.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86: fix spontaneous reboot with allyesconfig bzImage

recently the 64-bit allyesconfig bzImage kernel started spontaneously
rebooting during early bootup.

after a few fun hours spent with early init debugging, it turns out
that we've got this rather annoying limit on the size of the kernel
image:

      #define KERNEL_TEXT_SIZE  (40*1024*1024)

which limit my vmlinux just happened to pass:

       text           data       bss        dec       hex   filename
   29703744        4222751   8646224   42572719   2899baf   vmlinux

40 MB is 42572719 bytes, so my vmlinux was just 1.5% above this limit :-/

So it happily crashed right in head_64.S, which - as we all know - is
the most debuggable code in the whole architecture ;-)

So increase the limit to allow an up to 128MB kernel image to be mapped.
(should anyone be that crazy or lazy)

We have a full 4K of pagetable (level2_kernel_pgt) allocated for these
mappings already, so there's no RAM overhead and the limit was rather
pointless and arbitrary.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86: remove double-checking empty zero pages debug

so far no one complained about that.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86: notsc is ignored on common configurations

notsc is ignored in 32-bit kernels if CONFIG_X86_TSC is on.. which is
bad, fix it.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86/mtrr: fix kernel-doc missing notation

Fix mtrr kernel-doc warning:
Warning(linux-2.6.24-git12//arch/x86/kernel/cpu/mtrr/main.c:677): No description found for parameter 'end_pfn'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86: handle BIOSes which terminate e820 with CF=1 and no SMAP

The proper way to terminate the e820 chain is with %ebx == 0 on the
last legitimate memory block.  However, several BIOSes don't do that
and instead return error (CF = 1) when trying to read off the end of
the list.  For this error return, %eax doesn't necessarily return the
SMAP signature -- correctly so, since %ah should contain an error code
in this case.

To deal with some particularly broken BIOSes, we clear the entire e820
chain if the SMAP signature is missing in the middle, indicating a
plain insane e820 implementation.  However, we need to make the test
for CF = 1 before the SMAP check.

This fixes at least one HP laptop (nc6400) for which none of the
memory-probing methods (e820, e801, 88) functioned fully according to
spec.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: add comments for NOPs

Add comments describing the various NOP sequences.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: don't use P6_NOPs if compiling with CONFIG_X86_GENERIC

P6_NOPs are definitely not supported on some VIA CPUs, and possibly
(unverified) on AMD K7s. It is also the only thing that prevents a
686 kernel from running on Transmeta TM3x00/5x00 (Crusoe) series.

The performance benefit over generic NOPs is very small, so when
building for generic consumption, avoid using them.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: require family >= 6 if we are using P6 NOPs

The P6 family of NOPs are only available on family >= 6 or above, so
enforce that in the boot code.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: do not promote TM3x00/TM5x00 to i686-class

We have been promoting Transmeta TM3x00/TM5x00 chips to i686-class
based on the notion that they contain all the user-space visible
features of an i686-class chip. However, this is not actually true:
they lack the EA-taking long NOPs (0F 1F /0). Since this is a
userspace-visible incompatibility, downgrade these CPUs to the
manufacturer-defined i586 level.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: hpet fix docbook comment

Signed-off-by: Pavel Machek <Pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: make DEBUG_PAGEALLOC and CPA more robust

Use PF_MEMALLOC to prevent recursive calls in the DBEUG_PAGEALLOC
case. This makes the code simpler and more robust against allocation
failures.

This fixes the following fallback to non-mmconfig:

http://lkml.org/lkml/2008/2/20/551
http://bugzilla.kernel.org/show_bug.cgi?id=10083

Also, for DEBUG_PAGEALLOC=n reduce the pool size to one page.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86/lguest: fix pgdir pmd index calculation

Hi all,

Beginning from commits close to v2.6.25-rc2, running lguest always oopses
the host kernel. Oops is at [1].

Bisection led to the following commit:

commit 37cc8d7f963ba2deec29c9b68716944516a3244f

    x86/early_ioremap: don't assume we're using swapper_pg_dir

    At the early stages of boot, before the kernel pagetable has been
    fully initialized, a Xen kernel will still be running off the
    Xen-provided pagetables rather than swapper_pg_dir[].  Therefore,
    readback cr3 to determine the base of the pagetable rather than
    assuming swapper_pg_dir[].

static inline pmd_t * __init early_ioremap_pmd(unsigned long addr)
{
- pgd_t *pgd = &swapper_pg_dir[pgd_index(addr)];
+ /* Don't assume we're using swapper_pg_dir at this point */
+ pgd_t *base = __va(read_cr3());
+ pgd_t *pgd = &base[pgd_index(addr)];
pud_t *pud = pud_offset(pgd, addr);
pmd_t *pmd = pmd_offset(pud, addr);

Trying to analyze the problem, it seems on the guest side of lguest,
%cr3 has a different value from &swapper_pg-dir (which
is AFAIK fine on a pravirt guest):

Putting some debugging messages in early_ioremap_pmd:

/* Appears 3 times */
[    0.000000] ***************************
[    0.000000] __va(%cr3) = c0000000, &swapper_pg_dir = c02cc000
[    0.000000] ***************************

After 8 hours of debugging and staring on lguest code, I noticed something
strange in paravirt_ops->set_pmd hypercall invocation:

static void lguest_set_pmd(pmd_t *pmdp, pmd_t pmdval)
{
*pmdp = pmdval;
lazy_hcall(LHCALL_SET_PMD, __pa(pmdp)&PAGE_MASK,
   (__pa(pmdp)&(PAGE_SIZE-1))/4, 0);
}

The first hcall parameter is global pgdir which looks fine. The second
parameter is the pmd index in the pgdir which is suspectful.

AFAIK, calculating the index of pmd does not need a divisoin over four.
Removing the division made lguest work fine again . Patch is at [2].

I am not sure why the division over four existed in the first place. It
seems bogus, maybe the Xen patch just made the problem appear ?

[2]: The patch:

[PATCH] lguest: fix pgdir pmd index cacluation

Remove an error in index calculation which leads to removing
a not existing shadow page table (leading to a Null dereference).

Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

lguest: fix build breakage

[ mingo@elte.hu: merged to Rusty's patch ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

lguest: include function prototypes

Added a declaration to asm-x86/lguest.h and moved the extern arrays there
as well. As an alternative to including asm/lguest.h directly, an
include could be put in linux/lguest.h

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: "rusty@rustcorp.com.au" <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

[XFS] Undo bit ops cleanup mod due to regression on 32-bit powermac
platform.

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:30559a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>

[XFS] Undo bit ops cleanup mod due to regression on 32-bit powermac
platform.

SGI-PV: 974005
SGI-Modid: xfs-linux-melb:xfs-kern:30558a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into for-linus

Revert "power_state: get rid of write-only variable in SATA"

This reverts commit 559bbe6cbd0d8c68d40076a5f7dc98e3bf5864b2.

Michael S. Tsirkin reports that this changes breaks suspend/resume.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

make atapi_dmadir static

atapi_dmadir can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

Subject: lockdep: include all lock classes in all_lock_classes
Add each lock class to the all_lock_classes list when it is
first registered.

Previously, lock classes were added to all_lock_classes when
the lock class was first used. Since one of the uses of the
list is to find unused locks, this didn't work well.

Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

lockdep: increase MAX_LOCK_DEPTH

Some code paths exceed the current max lock depth (XFS), so increase
this limit a bit. I looked at making this a dynamic allocated array,
but we should not advocate insane lock depths, so stay with this as
long as it works...

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

ext4: add missing ext4_journal_stop()

Add missing ext4_journal_stop() in error handling.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: adilger@clusterfs.com
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mingming Cao <cmm@us.ibm.com>

latencytop: change /proc task_struct access method

Change getting task_struct by get_proc_task() at read or write time,
and returns -ESRCH if get_proc_task() returns NULL.
This is same behavior as other /proc files.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

latencytop: fix memory leak on latency proc file

At lstats_open(), calling get_proc_task() gets task struct, but it never put.
put_task_struct() should be called when releasing.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

latencytop: fix kernel panic while reading latency proc file

Reading /proc/<pid>/latency or /proc/<pid>/task/<tid>/latency could cause
NULL pointer dereference.

In lstats_open(), get_proc_task() can return NULL, in which case the kernel
will oops at lstats_show_proc() because m->private is NULL.

When get_proc_task() returns NULL, the kernel should return -ENOENT.

This can be reproduced by the following script.
while :
do
        date
        bash -c 'ls > ls.$$' &
        pid=$!
        cat /proc/$pid/latency &
        cat /proc/$pid/latency &
        cat /proc/$pid/latency &
        cat /proc/$pid/latency
done

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: add declaration of sched_tail to sched.h

Avoids sparse warnings:
kernel/sched.c:2170:17: warning: symbol 'schedule_tail' was not declared. Should it be static?

Avoids the need for an external declaration in arch/um/process.c

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: fix signedness warnings in sched.c

Unsigned long values are always assigned to switch_count,
make it unsigned long.

kernel/sched.c:3897:15: warning: incorrect type in assignment (different signedness)
kernel/sched.c:3897:15:    expected long *switch_count
kernel/sched.c:3897:15:    got unsigned long *<noident>
kernel/sched.c:3921:16: warning: incorrect type in assignment (different signedness)
kernel/sched.c:3921:16:    expected long *switch_count
kernel/sched.c:3921:16:    got unsigned long *<noident>

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: clean up __pick_last_entity() a bit

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: remove duplicate code from sched_fair.c

pick_task_entity() duplicates existing code. This functionality can be
easily obtained using rb_last(). Avoid code duplication by using rb_last().

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: make early bootup sched_clock() use safer

do not call sched_clock() too early. Not only might rq->idle
not be set up - but pure per-cpu data might not be accessible
either.

this solves an ia64 early bootup hang with CONFIG_PRINTK_TIME=y.

Tested-by: Tony Luck <tony.luck@gmail.com>
Acked-by: Tony Luck <tony.luck@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Linux 2.6.25-rc3

i2c-i801: Add support for the ICH10

Add the Intel ICH10 SMBus Controller DeviceID's and updates
Tolapai support.

Signed-off-by: Jason Gaston <jason.d.gaston@intel.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>

i2c: Make i2c_register_board_info() a NOP when CONFIG_I2C_BOARDINFO=n

Don't require platform code to be #ifdeffed according to whether
I2C is enabled or not ... if it's not enabled, let GCC compile out
all I2C device declarations. (Issue noted on an NSLU2 build that
didn't configure I2C.)

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>

i2c-pca-isa: Add access check to legacy ioports

When probing i2c-pca-isa writes to legacy ioports, which crashes the kernel
if there is no device at that port.
This patch adds a check_legacy_ioport call, so probe fails gracefully
and thus prevents the oops.

Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>

Alchemy: compile fix

Commit 8b798c4d16b762d15f4055597ff8d87f73b35552 broke
alchemy build, fix it. Pointed out by Adrian Bunk.

Signed-off-by: Manuel Lauss <mano@roarinelk.homelinux.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>

i2c: Storage class should be before const qualifier

The C99 specification states in section 6.11.5:

The placement of a storage-class specifier other than at the
beginning of the declaration specifiers in a declaration is an
obsolescent feature.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Jean Delvare <khali@linux-fr.org>

i2c-pxa: Misc fixes

While working on the PCA9564-platform driver, I sometimes had a glimpse at the
pxa-driver. I found some suspicious places, and this patch contains my
suggestions. Note: They are not tested, due to no hardware.

[JD: Some more fixes.]

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Tested-by: Mike Rapoport <mike@compulab.co.il>
Tested-by: Eric Miao <ymiao3@marvell.com>

ARM: OMAP: Release i2c_adapter after use (Siemens SX1)

Each call to i2c_get_adapter() must be followed by a call to
i2c_put_adapter() to release the grabbed reference. Otherwise the
reference count grows forever and the adapter can never be
unregistered.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Vladimir Ananiev <vovan888@gmail.com>
Acked-by: Tony Lindgren <tony@atomide.com>

Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  libata-core: fix kernel-doc warning
  sata_fsl: fix build with ATA_VERBOSE_DEBUG
  [libata] ahci: AMD SB700/SB800 SATA support 64bit DMA
  libata-pmp: clear hob for pmp register accesses
  libata: automatically use DMADIR if drive/bridge requires it
  power_state: get rid of write-only variable in SATA
  pata_atiixp: Use 255 sector limit

libata-core: fix kernel-doc warning

Fix libata-core kernel-doc warning:
Warning(linux-2.6.25-rc2-git6//drivers/ata/libata-core.c:168): No description found for parameter 'ap'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

sata_fsl: fix build with ATA_VERBOSE_DEBUG

This patch fixes build and few warnings when ATA_VERBOSE_DEBUG
is defined:

CC drivers/ata/sata_fsl.o
drivers/ata/sata_fsl.c: In function ‘sata_fsl_fill_sg’:
drivers/ata/sata_fsl.c:338: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘void *’
drivers/ata/sata_fsl.c:338: warning: format ‘%x’ expects type ‘unsigned int’, but argument 4 has type ‘struct prde *’
drivers/ata/sata_fsl.c: In function ‘sata_fsl_qc_issue’:
drivers/ata/sata_fsl.c:459: error: ‘csr_base’ undeclared (first use in this function)
drivers/ata/sata_fsl.c:459: error: (Each undeclared identifier is reported only once
drivers/ata/sata_fsl.c:459: error: for each function it appears in.)
drivers/ata/sata_fsl.c: In function ‘sata_fsl_freeze’:
drivers/ata/sata_fsl.c:525: error: ‘csr_base’ undeclared (first use in this function)
make[2]: *** [drivers/ata/sata_fsl.o] Error 1

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

[libata] ahci: AMD SB700/SB800 SATA support 64bit DMA

SB700 SATA controller can support 64 bit DMA, the previous commit
badc2341579511a247f5993865aa68379e283c5c was added with
careless reference to SB600, which should be modified by this patch.

Signed-off-by: Shane Huang <shane.huang@amd.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

libata-pmp: clear hob for pmp register accesses

>> Mark Lord wrote:
>>> Tejun, I've added PMP to sata_mv, and am now trying to get it
>>> to work with a Marvell PM attached.
>>>
>>> And the behaviour I see is very bizarre.
>>>
>>> After hard+soft resets, the PM signature is found,
>>> and libata interrogates the PM registers.
>>>
>>> It successfully reads register 0, and then register 1.
>>> But all subsequent registers read out (incorrectly) as zeros.
...

This behavior has been confirmed by Marvell with a SATA analyzer.
The Marvell port-multiplier apparently likes to see clean HOB
information when accessing PMP registers.

Since sata_mv uses PIO shadow register access, this doesn't happen
automatically, as it might in a more purely FIS-based driver (eg. ahci).

One way to fix this is to flag these commands with ATA_TFLAG_LBA48,
forcing libata to write out the HOB fields with known (zero) values.

Signed-off-by: Saeed Bishara <saeed@marvell.com>
Acked-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

libata: automatically use DMADIR if drive/bridge requires it

Back in 2.6.17-rc2, a libata module parameter was added for atapi_dmadir.

That's nice, but most SATA devices which need it will tell us about it
in their IDENTIFY PACKET response, as bit-15 of word-62 of the
returned data (as per ATA7, ATA8 specifications).

So for those which specify it, we should automatically use the DMADIR bit.
Otherwise, disc writing will fail by default on many SATA-ATAPI drives.

This patch adds ATA_DFLAG_DMADIR and make ata_dev_configure() set it
if atapi_dmadir is set or identify data indicates DMADIR is necessary.
atapi_xlat() is converted to check ATA_DFLAG_DMADIR before setting
DMADIR.

Original patch is from Mark Lord.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

power_state: get rid of write-only variable in SATA

power_state is scheduled for removal, and libata uses it in write-only
mode. Remove it.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

pata_atiixp: Use 255 sector limit

AHCI needs sorting too but this deals with the old interface

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (37 commits)
  [NETFILTER]: fix ebtable targets return
  [IP_TUNNEL]: Don't limit the number of tunnels with generic name explicitly.
  [NET]: Restore sanity wrt. print_mac().
  [NEIGH]: Fix race between neighbor lookup and table's hash_rnd update.
  [RTNL]: Validate hardware and broadcast address attribute for RTM_NEWLINK
  tg3: ethtool phys_id default
  [BNX2]: Update version to 1.7.4.
  [BNX2]: Disable parallel detect on an HP blade.
  [BNX2]: More 5706S link down workaround.
  ssb: Fix support for PCI devices behind a SSB->PCI bridge
  zd1211rw: fix sparse warnings
  rtl818x: fix sparse warnings
  ssb: Fix pcicore cardbus mode
  ssb: Make the GPIO API reentrancy safe
  ssb: Fix the GPIO API
  ssb: Fix watchdog access for devices without a chipcommon
  ssb: Fix serial console on new bcm47xx devices
  ath5k: Fix build warnings on some 64-bit platforms.
  WDEV, ath5k, don't return int from bool function
  WDEV: ath5k, fix lock imbalance
  ...

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  [SPARC64]: make IOMMU code respect the segment boundary limits
  [SPARC64]: Fix cpu trampoline et al. mismatch warnings.
  [SPARC64]: More sparse warning fixes in process.c
  [SPARC64]: Fix sparse warning wrt. fault_in_user_windows.
  [SPARC64]: Kill show_regs32().
  [SPARC64]: Fix sparse warnings wrt. __show_regs().
  [SPARC64]: Kill show_stackframe{,32}().
  [SPARC64]: Fix sparse warnings wrt. machine_alt_power_off().

Fix u132-hcd.c compile error

This fixes the following compile error caused by commit
3a2d5b700132f35401f1d9e22fe3c2cab02c2549 ("PM: Introduce
PM_EVENT_HIBERNATE callback state")

    CC [M]  drivers/usb/host/u132-hcd.o
  drivers/usb/host/u132-hcd.c: In function ‘u132_suspend’:
  drivers/usb/host/u132-hcd.c:3224: error: expected expression before ‘int’
  drivers/usb/host/u132-hcd.c:3225: error: ‘ports’ undeclared (first use in this function)
  ...

Signed-off-by: Mirco Tischler <mt-ml@gmx.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

[NETFILTER]: fix ebtable targets return

The function ebt_do_table doesn't take NF_DROP as a verdict from the targets.

Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IP_TUNNEL]: Don't limit the number of tunnels with generic name explicitly.

Use the added dev_alloc_name() call to create tunnel device name,
rather than iterate in a hand-made loop with an artificial limit.

Thanks Patrick for noticing this.

[ The way this works is, when the device is actually registered,
the generic code noticed the '%' in the name and invokes
dev_alloc_name() to fully resolve the name. -DaveM ]

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Restore sanity wrt. print_mac().

MAC_FMT had only one user and we tried to get rid of
that, but this created more problems than it solved.

As a result, this reverts three commits:

235365f3aaaa10b7056293877c0ead50425f25c7 ("net/8021q/vlan_dev.c: Use
print_mac."), fea5fa875eb235dc186b1f5184eb36abc63e26cc ("[NET]: Remove
MAC_FMT"), and 8f789c48448aed74fe1c07af76de8f04adacec7d ("[NET]:
Elminate spurious print_mac() calls.")

Signed-off-by: David S. Miller <davem@davemloft.net>

[NEIGH]: Fix race between neighbor lookup and table's hash_rnd update.

The neigh_hash_grow() may update the tbl->hash_rnd value, which
is used in all tbl->hash callbacks to calculate the hashval.

Two lookup routines may race with this, since they call the
->hash callback without the tbl->lock held. Since the hash_rnd
is changed with this lock write-locked moving the calls to ->hash
under this lock read-locked closes this gap.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[RTNL]: Validate hardware and broadcast address attribute for RTM_NEWLINK

RTM_NEWLINK allows for already existing links to be modified. For this
purpose do_setlink() is called which expects address attributes with a
payload length of at least dev->addr_len. This patch adds the necessary
validation for the RTM_NEWLINK case.

The address length for links to be created is not checked for now as the
actual attribute length is used when copying the address to the netdevice
structure. It might make sense to report an error if less than addr_len
bytes are provided but enforcing this might break drivers trying to be
smart with not transmitting all zero addresses.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

tg3: ethtool phys_id default

When asked to blink LEDs the tg3 driver behaves when using:
ethtool -p ethX
The default value for data is zero, and other drivers interpret this
as blink forever (or at least a really long time). The tg3 driver
interprets this as blink once. All drivers should have the same
behaviour.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[BNX2]: Update version to 1.7.4.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fix vmsas.c file permissions

Signed-off-by: Oliver Pinter <oliver.pntr@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

[BNX2]: Disable parallel detect on an HP blade.

Because of some board issues, we need to disable parallel detect on
an HP blade. Without this patch, the link state can become stuck
when it goes into parallel detect mode.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[BNX2]: More 5706S link down workaround.

The previous patches to workaround the 5706S on an HP blade were not
sufficient. The link state still does not change properly in some
cases. This patch adds polling to make it completely reliable.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Add memory barrier semantics to wake_up() & co

Oleg Nesterov and others have pointed out that on some architectures,
the traditional sequence of

set_current_state(TASK_INTERRUPTIBLE);
if (CONDITION)
return;
schedule();

is racy wrt another CPU doing

CONDITION = 1;
wake_up_process(p);

because while set_current_state() has a memory barrier separating
setting of the TASK_INTERRUPTIBLE state from reading of the CONDITION
variable, there is no such memory barrier on the wakeup side.

Now, wake_up_process() does actually take a spinlock before it reads and
sets the task state on the waking side, and on x86 (and many other
architectures) that spinlock is in fact equivalent to a memory barrier,
but that is not generally guaranteed. The write that sets CONDITION
could move into the critical region protected by the runqueue spinlock.

However, adding a smp_wmb() to before the spinlock should now order the
writing of CONDITION wrt the lock itself, which in turn is ordered wrt
the accesses within the spinlock (which includes the reading of the old
state).

This should thus close the race (which probably has never been seen in
practice, but since smp_wmb() is a no-op on x86, it's not like this will
make anything worse either on the most common architecture where the
spinlock already gave the required protection).

Acked-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mvsas: fix build warning, clean prototypes

- Fix build 'make randconfig' build warning spotted by Toralf Foerster:

drivers/scsi/mvsas.c: In function 'mvs_hexdump':
drivers/scsi/mvsas.c:715: error: implicit declaration of function 'isalnum'

- Remove unneeded prototypes (spotted by hch)

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

documentation: atomic_add_unless() doesn't imply mb() on failure

(sorry for being offtpoic, but while experts are here...)

A "typical" implementation of atomic_add_unless() can return 0 immediately
after the first atomic_read() (before doing cmpxchg). In that case it doesn't
provide any barrier semantics. See include/asm-ia64/atomic.h as an example.

We should either change the implementation, or fix the docs.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcgroup: return negative error code in mem_cgroup_create()

Cgroup requires the subsystem to return negative error code on error in the
create method.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcgroup: remove a useless VM_BUG_ON()

Remove this VM_BUG_ON(), as Balbir stated:

We used to have a for loop with !list_empty() as a termination condition
and VM_BUG_ON(!pc) is a spill over. With the new loop, VM_BUG_ON(!pc) does
not make sense.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Balbir Singh <balbir@in.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcgroup: fix and update documentation

- remove trailing " Bytes"s in the demonstration
- remove section 4.4 (feature control_type has been removed)
- fix reference section

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cgroup: remove dead code in cgroup_get_rootdir()

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cgroup: remove duplicate code in find_css_set()

The list head res->tasks gets initialized twice in find_css_set().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cgroup: fix subsys bitops

Cgroup uses unsigned long for subsys bitops, not unsigned long long.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cgroup: fix memory leak in cgroup_get_sb()

opts.release_agent is not kfree()ed in all necessary places.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cgroup: clean up cgroup.h

- replace old name 'cont' with 'cgrp' (Paul Menage did this cleanup for
cgroup.c in commit bd89aabc6761de1c35b154fe6f914a445d301510)
- remove a duplicate declaration of cgroup_path()

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cgroup: fix comments

fix:
- comments about need_forkexit_callback
- comments about release agent
- typo and comment style, etc.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cgroup: fix and update documentation

Misc fixes and updates, make the doc consistent with current cgroup
implementation.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Solve section mismatch for free_area_init_core.

WARNING: vmlinux.o(.meminit.text+0x649):
Section mismatch in reference from the
function free_area_init_core() to the function .init.text:setup_usemap()
The function __meminit free_area_init_core() references
a function __init setup_usemap().
If free_area_init_core is only used by setup_usemap then
annotate free_area_init_core with a matching annotation.

The warning is covers this stack of functions in mm/page_alloc.c:

alloc_bootmem_node must be marked __init.
alloc_bootmem_node is used by setup_usemap, if !SPARSEMEM.
(usemap_size is only used by setup_usemap, if !SPARSEMEM.)
setup_usemap is only used by free_area_init_core.
free_area_init_core is only used by free_area_init_node.

free_area_init_node is used by:
arch/alpha/mm/numa.c: __init paging_init()
arch/arm/mm/init.c: __init bootmem_init_node()
arch/avr32/mm/init.c: __init paging_init()
arch/cris/arch-v10/mm/init.c: __init paging_init()
arch/cris/arch-v32/mm/init.c: __init paging_init()
arch/m32r/mm/discontig.c: __init zone_sizes_init()
arch/m32r/mm/init.c: __init zone_sizes_init()
arch/m68k/mm/motorola.c: __init paging_init()
arch/m68k/mm/sun3mmu.c: __init paging_init()
arch/mips/sgi-ip27/ip27-memory.c: __init paging_init()
arch/parisc/mm/init.c: __init paging_init()
arch/sparc/mm/srmmu.c: __init srmmu_paging_init()
arch/sparc/mm/sun4c.c: __init sun4c_paging_init()
arch/sparc64/mm/init.c: __init paging_init()
mm/page_alloc.c: __init free_area_init_nodes()
mm/page_alloc.c: __init free_area_init()
and
mm/memory_hotplug.c: hotadd_new_pgdat()

hotadd_new_pgdat can not be an __init function, but:

It is compiled for MEMORY_HOTPLUG configurations only
MEMORY_HOTPLUG depends on SPARSEMEM || X86_64_ACPI_NUMA
X86_64_ACPI_NUMA depends on X86_64
ARCH_FLATMEM_ENABLE depends on X86_32
ARCH_DISCONTIGMEM_ENABLE depends on X86_32
So X86_64_ACPI_NUMA implies SPARSEMEM, right?

So we can mark the stack of functions __init for !SPARSEMEM, but we must mark
them __meminit for SPARSEMEM configurations. This is ok, because then the
calls to alloc_bootmem_node are also avoided.

Compile-tested on:
silly minimal config
defconfig x86_32
defconfig x86_64
defconfig x86_64 -HIBERNATION +MEMORY_HOTPLUG

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Smack: update for file capabilities

Update the Smack LSM to allow the registration of the capability "module"
as a secondary LSM. Integrate the new hooks required for file based
capabilities.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Paul Moore <paul.moore@hp.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kprobes: refuse kprobe insertion on add/sub_preempt_counter()

Kprobes makes use of preempt_disable(),preempt_enable_noresched() and these
functions inturn call add/sub_preempt_count(). So we need to refuse user from
inserting probe in to these functions.

This patch disallows user from probing add/sub_preempt_count().

Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cgroup memory controller: document huge memory/cache overhead in Kconfig

Document huge memory/cache overhead of memory controller in Kconfig

I was a little surprised that 2.6.25-rc* increased struct page for the
memory controller. At least on many x86-64 machines it will not fit into a
single cache line now anymore and also costs considerable amounts of RAM.
At earlier review I remembered asking for a external data structure for
this.

It's also quite unobvious that a innocent looking Kconfig option with a
single line Kconfig description has such a negative effect.

This patch attempts to document these disadvantages at least so that users
configuring their kernel can make a informed decision.

Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Balbir Singh <balbir@in.ibm.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel-doc: fix function-pointer-parameter parsing

When running "make htmldocs" I'm seeing some non-fatal perl errors caused
by trying to parse the callback function definitions in blk-core.c.

The errors are "Use of uninitialized value in concatenation (.)..."
in combination with:
Warning(linux-2.6.25-rc2/block/blk-core.c:1877): No description found for parameter ''

The function pointers are defined without a * i.e.
int (drv_callback)(struct request *)

The compiler is happy with them, but kernel-doc isn't.

This patch teaches create_parameterlist in kernel-doc to parse this type of
function pointer definition, but is it the right way to fix the problem ?
The problem only seems to occur in blk-core.c.

However with the patch applied, kernel-doc finds the correct parameter
description for the callback in blk_end_request_callback, which is doesn't
normally.

I thought it would be a bit odd to change to code to use the more normal
form of function pointers just to get the documentation to work, so I fixed
kernel-doc instead - even though this is teaching it to understand code
that might go away (The comment for blk_end_request_callback says that it
should not be used and will removed at some point).

Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

h8300: defconfig update

defconfig update.

Signed-off-by: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

h8300: IRQ handling update

- add missing file and declare.
- remove unused file and macros.
- some cleanup.

Signed-off-by: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

h8300: uaccess.h update

get_user const *ptr access fix.

Signed-off-by: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>