err.no Git - linux-2.6/log

commit | commitdiff | tree

Alexey Dobriyan [Fri, 8 Feb 2008 12:18:37 +0000 (04:18 -0800)]

proc: fix ->open'less usage due to ->proc_fops flip

Typical PDE creation code looks like:

	pde = create_proc_entry("foo", 0, NULL);
	if (pde)
		pde->proc_fops = &foo_proc_fops;

Notice that the PDE is created first, and only then is ->proc_fops set to
its final value. This is a problem because right after creation
a) the PDE is fully visible in /proc, and
b) ->proc_fops is proc_file_operations, which does not have an ->open
   callback. So it is possible to ->read without ->open (see one class of
   oopses below).

The fix is a new API called proc_create(), which makes sure ->proc_fops is
set up before gluing the PDE to the main tree. Typical new code looks like:

	pde = proc_create("foo", 0, NULL, &foo_proc_fops);
	if (!pde)
		return -ENOMEM;

Fix most networking users for a start.

In the long run, create_proc_entry() for regular files will go away.
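
The ordering problem described above can be illustrated with a small user-space sketch. This is not the kernel implementation: struct fops, struct entry, the list-based tree and both create_*_style() helpers are hypothetical stand-ins, chosen only to show why the operations table must be final before the entry becomes visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-ins for kernel types; not the real API. */
struct fops {
	int (*open)(void);
	int (*read)(void);
};

struct entry {
	const char *name;
	const struct fops *fops;
	struct entry *next;
};

static struct entry *tree;	/* models the visible /proc tree */

static int foo_open(void) { return 0; }
static int foo_read(void) { return 0; }
static const struct fops foo_fops = { foo_open, foo_read };
static const struct fops default_fops = { NULL, foo_read };	/* no ->open */

/* Old pattern: the entry is linked with default fops and patched later. */
static void create_old_style(struct entry *e, const char *name)
{
	e->name = name;
	e->fops = &default_fops;
	e->next = tree;
	tree = e;		/* visible here, still without ->open */
}

/* New pattern: fops are final before the entry is linked. */
static void create_new_style(struct entry *e, const char *name,
			     const struct fops *fops)
{
	assert(fops && fops->open);
	e->name = name;
	e->fops = fops;
	e->next = tree;
	tree = e;
}

/* What a concurrent reader would find in the tree. */
static const struct entry *lookup(const char *name)
{
	for (const struct entry *e = tree; e; e = e->next)
		if (strcmp(e->name, name) == 0)
			return e;
	return NULL;
}
```

In this model, a lookup() that runs right after create_old_style() sees an entry whose ->open is NULL, which is the window the commit closes; after create_new_style() no such window exists.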

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000024
printing eip: c1188c1b *pdpt = 000000002929e001 *pde = 0000000000000000
Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/block/sda/sda1/dev
Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom

Pid: 24679, comm: cat Not tainted (2.6.24-rc3-mm1 #2)
EIP: 0060:[<c1188c1b>] EFLAGS: 00210002 CPU: 0
EIP is at mutex_lock_nested+0x75/0x25d
EAX: 000006fe EBX: fffffffb ECX: 00001000 EDX: e9340570
ESI: 00000020 EDI: 00200246 EBP: e9340570 ESP: e8ea1ef8
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process cat (pid: 24679, ti=E8EA1000 task=E9340570 task.ti=E8EA1000)
Stack: 00000000 c106f7ce e8ee05b4 00000000 00000001 458003d0 f6fb6f20 fffffffb
       00000000 c106f7aa 00001000 c106f7ce 08ae9000 f6db53f0 00000020 00200246
       00000000 00000002 00000000 00200246 00200246 e8ee05a0 fffffffb e8ee0550
Call Trace:
[<c106f7ce>] seq_read+0x24/0x28a
[<c106f7aa>] seq_read+0x0/0x28a
[<c106f7ce>] seq_read+0x24/0x28a
[<c106f7aa>] seq_read+0x0/0x28a
[<c10818b8>] proc_reg_read+0x60/0x73
[<c1081858>] proc_reg_read+0x0/0x73
[<c105a34f>] vfs_read+0x6c/0x8b
[<c105a6f3>] sys_read+0x3c/0x63
[<c10025f2>] sysenter_past_esp+0x5f/0xa5
[<c10697a7>] destroy_inode+0x24/0x33
=======================
INFO: lockdep is turned off.
Code: 75 21 68 e1 1a 19 c1 68 87 00 00 00 68 b8 e8 1f c1 68 25 73 1f c1 e8 84 06 e9 ff e8 52 b8 e7 ff 83 c4 10 9c 5f fa e8 28 89 ea ff <f0> fe 4e 04 79 0a f3 90 80 7e 04 00 7e f8 eb f0 39 76 34 74 33
EIP: [<c1188c1b>] mutex_lock_nested+0x75/0x25d SS:ESP 0068:e8ea1ef8

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Eric W. Biederman [Fri, 8 Feb 2008 12:18:35 +0000 (04:18 -0800)]

proc: fix the threaded /proc/self

Long ago, when the CLONE_THREAD support first went in, someone thought it
would be wise to point /proc/self at /proc/<tgid> instead of /proc/<pid>.

Given that /proc/<tgid> can return information about a very different task
(if enough things have been unshared) than our current process, /proc/<tgid>
seems blatantly wrong. So far I have yet to think up an example where the
current behavior would be advantageous, and I can see several places where
it is seriously non-intuitive.

We may be stuck with the current broken behavior for backwards
compatibility reasons, but let's try fixing our ancient bug for the 2.6.25
time frame and see if anyone screams.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: "Guillaume Chazarain" <guichaz@yahoo.fr>
Cc: "Pavel Emelyanov" <xemul@openvz.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Eric W. Biederman [Fri, 8 Feb 2008 12:18:34 +0000 (04:18 -0800)]

proc: proper pidns handling for /proc/self

Currently, if you access a /proc that is not mounted with your process's
current pid namespace, /proc/self will point at a completely random task.

This patch fixes /proc/self to point to the current process if it is
available in the particular mount of /proc or to return -ENOENT if the
current process is not visible.
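
A minimal sketch of the lookup rule described above, assuming a simplified model in which each task carries one pid number per namespace level. The types struct pid_ns, struct upid and struct task_pid, the MAX_LEVEL limit and both function names are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_LEVEL 4	/* hypothetical nesting limit for this sketch */

struct pid_ns { int level; };

struct upid { int nr; const struct pid_ns *ns; };

struct task_pid {
	int level;			/* deepest namespace level */
	struct upid numbers[MAX_LEVEL];	/* one number per ancestor namespace */
};

/* Return the pid as seen from ns, or 0 if the task is not visible there. */
static int pid_nr_in_ns(const struct task_pid *pid, const struct pid_ns *ns)
{
	assert(pid->level >= 0 && pid->level < MAX_LEVEL);
	if (ns->level >= 0 && ns->level <= pid->level &&
	    pid->numbers[ns->level].ns == ns)
		return pid->numbers[ns->level].nr;
	return 0;
}

/* Model of the /proc/self lookup: the namespace comes from the mount. */
static int proc_self_target(const struct task_pid *current_pid,
			    const struct pid_ns *mount_ns)
{
	int nr = pid_nr_in_ns(current_pid, mount_ns);

	return nr ? nr : -ENOENT;
}
```

The point of the sketch is that the namespace argument is taken from the mount being accessed rather than from the calling task, so a task outside that namespace gets -ENOENT instead of an unrelated pid.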

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Eric W. Biederman [Fri, 8 Feb 2008 12:18:33 +0000 (04:18 -0800)]

proc: seqfile convert proc_pid_status to properly handle pid namespaces

Currently we may look up the pid in the wrong pid namespace. So convert
proc_pid_status to seq_file, which ensures the proper pid namespace is
passed in.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: another build fix]
[akpm@linux-foundation.org: s390 build fix]
[akpm@linux-foundation.org: fix task_name() output]
[akpm@linux-foundation.org: fix nommu build]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andrew Morgan <morgan@kernel.org>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Eric W. Biederman [Fri, 8 Feb 2008 12:18:32 +0000 (04:18 -0800)]

seqfile convert proc_pid_statm

This conversion is just for code cleanliness, uniformity, and general safety.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Eric W. Biederman [Fri, 8 Feb 2008 12:18:31 +0000 (04:18 -0800)]

proc: rewrite do_task_stat to correctly handle pid namespaces.

Currently (as pointed out by Oleg) do_task_stat has a race when calling
task_pid_nr_ns with the task exiting.  In addition, do_task_stat does not
currently display information in the context of the pid namespace that
mounted the /proc filesystem.  So "cut -d' ' -f 1 /proc/<pid>/stat" may not
equal <pid>.

This patch fixes the problem by converting to a single_open seq_file show
method, getting the pid namespace from the filesystem superblock instead of
current, and simply using the struct pid from the inode instead of
attempting to get that same pid from the task.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Eric W. Biederman [Fri, 8 Feb 2008 12:18:30 +0000 (04:18 -0800)]

proc: implement proc_single_file_operations

Currently many /proc/pid files use a crufty precursor to the current seq_file
API, and they don't have direct access to the pid_namespace or the pid for
which they are displaying data.

So implement proc_single_file_operations to make the seq_file routines easy to
use, and to give access to the full state of the pid we are displaying data
for.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Zhang Rui [Fri, 8 Feb 2008 12:18:29 +0000 (04:18 -0800)]

proc: detect duplicate names on registration

Print a warning if a PDE is registered with a name which already exists in
the target directory.
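
As a rough illustration of the check, here is a user-space sketch that walks a directory's entry list before linking a new entry. The struct names, the warning text and the decision to link the entry anyway are assumptions made for the example; the commit message only states that a warning is printed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for a proc directory and its entries. */
struct dir_entry { const char *name; struct dir_entry *next; };
struct dir { const char *name; struct dir_entry *subdir; };

/*
 * Link de into parent. Returns 1 if a warning was printed because the
 * name already existed, 0 otherwise. The entry is linked in both cases.
 */
static int register_entry(struct dir *parent, struct dir_entry *de)
{
	int dup = 0;

	assert(parent && de && de->name);
	for (const struct dir_entry *t = parent->subdir; t; t = t->next) {
		if (strcmp(t->name, de->name) == 0) {
			fprintf(stderr, "warning: name '%s/%s' already registered\n",
				parent->name, de->name);
			dup = 1;
			break;
		}
	}
	de->next = parent->subdir;
	parent->subdir = de;
	return dup;
}
```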

Bug report and a simple fix can be found here:
http://bugzilla.kernel.org/show_bug.cgi?id=8798

[\n fixlet and no undescriptive variable usage --adobriyan]
[akpm@linux-foundation.org: make printk comprehensible]
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Alexey Dobriyan [Fri, 8 Feb 2008 12:18:28 +0000 (04:18 -0800)]

proc: remove useless check on symlink removal

proc symlinks always have a valid ->data containing the destination of the
symlink. No need to check it on removal -- proc_symlink() has already done it.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Alexey Dobriyan [Fri, 8 Feb 2008 12:18:27 +0000 (04:18 -0800)]

proc: simplify function prototypes

Move code around so as to reduce the number of forward-declarations.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Alexey Dobriyan [Fri, 8 Feb 2008 12:18:27 +0000 (04:18 -0800)]

proc: less LOCK operations during lookup

Pseudo-code for lookup effectively is:

	LOCK kernel
	LOCK proc_subdir_lock
	find PDE
	UNLOCK proc_subdir_lock

	get inode

	LOCK proc_subdir_lock
	goto unlock
	UNLOCK proc_subdir_lock
	UNLOCK kernel

We can get rid of the LOCK/UNLOCK pair after getting the inode simply by
jumping to unlock_kernel() directly.
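
The saving can be made concrete with a toy single-threaded model that counts LOCK operations for both flows. Everything here (the flag-based locks, the counter and the lookup_old()/lookup_new() names) is a hypothetical illustration of the pseudo-code above, not the proc lookup code itself.

```c
#include <assert.h>

static int lock_ops;			/* number of LOCK operations taken */
static int kernel_held, subdir_held;	/* toy lock state, single-threaded */

static void lock_kernel_model(void)   { assert(!kernel_held); kernel_held = 1; lock_ops++; }
static void unlock_kernel_model(void) { assert(kernel_held);  kernel_held = 0; }
static void lock_subdir(void)         { assert(!subdir_held); subdir_held = 1; lock_ops++; }
static void unlock_subdir(void)       { assert(subdir_held);  subdir_held = 0; }

static void find_pde(void)  { assert(subdir_held); }	/* needs the list lock */
static void get_inode(void) { assert(!subdir_held); }	/* runs without it */

/* Old flow: retakes proc_subdir_lock only to drop it again. */
static int lookup_old(void)
{
	lock_ops = 0;
	lock_kernel_model();
	lock_subdir();
	find_pde();
	unlock_subdir();
	get_inode();
	lock_subdir();		/* redundant pair */
	unlock_subdir();
	unlock_kernel_model();
	return lock_ops;
}

/* New flow: jumps straight to the kernel unlock after getting the inode. */
static int lookup_new(void)
{
	lock_ops = 0;
	lock_kernel_model();
	lock_subdir();
	find_pde();
	unlock_subdir();
	get_inode();
	unlock_kernel_model();
	return lock_ops;
}
```

In this model the old flow takes three locks per lookup and the new flow takes two.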

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Alexey Dobriyan [Fri, 8 Feb 2008 12:18:26 +0000 (04:18 -0800)]

proc: remove MODULE_LICENSE

proc is not modular, so MODULE_LICENSE just expands to empty space. proc
without doubt remains GPLed.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Pavel Emelyanov [Fri, 8 Feb 2008 12:18:25 +0000 (04:18 -0800)]

namespaces: mark NET_NS with "depends on NAMESPACES"

There's already an option controlling the net namespaces cloning code, so make
it work the same way as all the other namespaces' options.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Pavel Emelyanov [Fri, 8 Feb 2008 12:18:24 +0000 (04:18 -0800)]

namespaces: cleanup the code managed with PID_NS option

Just like with the user namespaces, move the namespace management code into
a separate .c file and mark the (already existing) PID_NS option as "depends
on NAMESPACES".

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Pavel Emelyanov [Fri, 8 Feb 2008 12:18:23 +0000 (04:18 -0800)]

namespaces: cleanup the code managed with the USER_NS option

Make the user_namespace.o compilation depend on this option and move
init_user_ns into the user.c file to make the kernel compile and work without
namespaces support. This makes the user namespace code organized similarly to
the other namespaces'.

Also mark the USER_NS option as "depends on NAMESPACES".

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Pavel Emelyanov [Fri, 8 Feb 2008 12:18:22 +0000 (04:18 -0800)]

namespaces: move the IPC namespace under IPC_NS option

Currently the IPC namespace management code is spread over the ipc/*.c files.
I moved this code into ipc/namespace.c file which is compiled out when needed.

The linux/ipc_namespace.h file is used to store the prototypes of the
functions in namespace.c and the stubs for the NAMESPACES=n case.  This is
done because the stub for copy_ipc_namespace requires knowledge of the
CLONE_NEWIPC flag, which is in sched.h.  But the linux/ipc.h file itself is
included into many .c files via the sys.h->sem.h sequence, so adding
sched.h to it would make all these .c files depend on sched.h, which is not
good.  On the other hand, knowledge of the namespaces stuff is required
in only 4 .c files.

Besides, this patch compiles out some auxiliary functions from the ipc/sem.c,
msg.c and shm.c files.  It turned out that moving these functions into
namespace.c is not that easy because they use many other calls and macros
from the original file.  Moving them would make this patch complicated.  On
the other hand, all these functions can be consolidated, so I will send a
separate patch doing this a bit later.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Pavel Emelyanov [Fri, 8 Feb 2008 12:18:21 +0000 (04:18 -0800)]

namespaces: move the UTS namespace under UTS_NS option

Currently all the namespace management code is in the kernel/utsname.c file,
so just compile it out and make stubs in the appropriate header.

The init namespace itself is in init/version.c and is in the kernel all the
time.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Pavel Emelyanov [Fri, 8 Feb 2008 12:18:19 +0000 (04:18 -0800)]

namespaces: add the NAMESPACES config option

The option is selectable only if EMBEDDED is chosen. When EMBEDDED is off,
namespaces will be on.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Nishanth Aravamudan [Fri, 8 Feb 2008 12:18:18 +0000 (04:18 -0800)]

hugetlb: add locking for overcommit sysctl

When I replaced hugetlb_dynamic_pool with nr_overcommit_hugepages I used
proc_doulongvec_minmax() directly.  However, hugetlb.c's locking rules
require that all counter modifications occur under the hugetlb_lock.  Add a
callback into the hugetlb code similar to the one for nr_hugepages.  Grab
the lock around the manipulation of nr_overcommit_hugepages in
proc_doulongvec_minmax().
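
A user-space sketch of the pattern: parse the new value first, then publish it to the protected counter while holding the lock. The flag-based lock, parse_ulong() and overcommit_handler() are hypothetical stand-ins for hugetlb_lock, proc_doulongvec_minmax() and the new callback; the real handler signature is not shown in this commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int hugetlb_lock_held;		/* toy stand-in for the spinlock */
static unsigned long nr_overcommit;	/* counter protected by the lock */

static void lock_model(void)   { assert(!hugetlb_lock_held); hugetlb_lock_held = 1; }
static void unlock_model(void) { assert(hugetlb_lock_held);  hugetlb_lock_held = 0; }

/* Stand-in for the generic sysctl parser: no locking of its own. */
static int parse_ulong(const char *buf, unsigned long *out)
{
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(buf, &end, 10);
	if (errno || end == buf || *end != '\0')
		return -EINVAL;
	*out = v;
	return 0;
}

/* Callback: parse into a temporary, then update the counter under the lock. */
static int overcommit_handler(const char *buf)
{
	unsigned long tmp;
	int ret = parse_ulong(buf, &tmp);

	if (ret)
		return ret;
	lock_model();
	nr_overcommit = tmp;
	unlock_model();
	return 0;
}
```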

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Acked-by: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Ulisses Furquim [Fri, 8 Feb 2008 12:18:16 +0000 (04:18 -0800)]

inotify: fix check for one-shot watches before destroying them

As the IN_ONESHOT bit is never set when an event is sent, we must check it
in the watch's mask and not in the event's mask.

Signed-off-by: Ulisses Furquim <ulissesf@gmail.com>
Reported-by: "Clem Taylor" <clem.taylor@gmail.com>
Tested-by: "Clem Taylor" <clem.taylor@gmail.com>
Cc: Amy Griffis <amy.griffis@hp.com>
Cc: Robert Love <rlove@google.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Linus Torvalds [Fri, 8 Feb 2008 03:30:50 +0000 (19:30 -0800)]

Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm

* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm: (44 commits)
  dm raid1: report fault status
  dm raid1: handle read failures
  dm raid1: fix EIO after log failure
  dm raid1: handle recovery failures
  dm raid1: handle write failures
  dm snapshot: combine consecutive exceptions in memory
  dm: stripe enhanced status return
  dm: stripe trigger event on failure
  dm log: auto load modules
  dm: move deferred bio flushing to workqueue
  dm crypt: use async crypto
  dm crypt: prepare async callback fn
  dm crypt: add completion for async
  dm crypt: add async request mempool
  dm crypt: extract scatterlist processing
  dm crypt: tidy io ref counting
  dm crypt: introduce crypt_write_io_loop
  dm crypt: abstract crypt_write_done
  dm crypt: store sector mapping in dm_crypt_io
  dm crypt: move queue functions
  ...

commit | commitdiff | tree

Linus Torvalds [Fri, 8 Feb 2008 03:15:38 +0000 (19:15 -0800)]

Merge branch 'release' of git://lm-sensors.org/kernel/mhoffman/hwmon-2.6

* 'release' of git://lm-sensors.org/kernel/mhoffman/hwmon-2.6: (59 commits)
  hwmon: (lm80) Add individual alarm files
  hwmon: (lm80) De-macro the sysfs callbacks
  hwmon: (lm80) Various cleanups
  hwmon: (w83627hf) Refactor beep enable handling
  hwmon: (w83627hf) Add individual alarm and beep files
  hwmon: (w83627hf) Enable VBAT monitoring
  hwmon: (w83627ehf) The W83627DHG has 8 VID pins
  hwmon: (asb100) Add individual alarm files
  hwmon: (asb100) De-macro the sysfs callbacks
  hwmon: (asb100) Various cleanups
  hwmon: VRM is not written to registers
  hwmon: (dme1737) fix Super-IO device ID override
  hwmon: (dme1737) fix divide-by-0
  hwmon: (abituguru3) Add AUX4 fan input for Abit IP35 Pro
  hwmon: Add support for Texas Instruments/Burr-Brown ADS7828
  hwmon: (adm9240) Add individual alarm files
  hwmon: (lm77) Add individual alarm files
  hwmon: Discard useless I2C driver IDs
  hwmon: (lm85) Make the pwmN_enable files writable
  hwmon: (lm85) Return standard values in pwmN_enable
  ...

commit | commitdiff | tree

Linus Torvalds [Fri, 8 Feb 2008 03:12:12 +0000 (19:12 -0800)]

Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6

* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: (62 commits)
  [XFS] add __init/__exit mark to specific init/cleanup functions
  [XFS] Fix oops in xfs_file_readdir()
  [XFS] kill xfs_root
  [XFS] keep i_nlink updated and use proper accessors
  [XFS] stop updating inode->i_blocks
  [XFS] Make xfs_ail_check check less by default
  [XFS] Move AIL pushing into it's own thread
  [XFS] use generic_permission
  [XFS] stop re-checking permissions in xfs_swapext
  [XFS] clean up xfs_swapext
  [XFS] remove permission check from xfs_change_file_space
  [XFS] prevent panic during log recovery due to bogus op_hdr length
  [XFS] Cleanup various fid related bits:
  [XFS] Fix xfs_lowbit64
  [XFS] Remove CFORK macros and use code directly in IFORK and DFORK macros.
  [XFS] kill superflous buffer locking (2nd attempt)
  [XFS] Use kernel-supplied "roundup_pow_of_two" for simplicity
  [XFS] Remove the BPCSHIFT and NB* based macros from XFS.
  [XFS] Remove bogus assert
  [XFS] optimize XFS_IS_REALTIME_INODE w/o realtime config
  ...

commit | commitdiff | tree

Nick Piggin [Fri, 8 Feb 2008 02:46:06 +0000 (18:46 -0800)]

Convert SG from nopage to fault.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Douglas Gilbert <dougg@torque.net>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Linus Torvalds [Fri, 8 Feb 2008 02:22:29 +0000 (18:22 -0800)]

Merge branch 'slub-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/christoph/vm

* 'slub-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/christoph/vm:
  SLUB: fix checkpatch warnings
  Use non atomic unlock
  SLUB: Support for performance statistics
  SLUB: Alternate fast paths using cmpxchg_local
  SLUB: Use unique end pointer for each slab page.
  SLUB: Deal with annoying gcc warning on kfree()

commit | commitdiff | tree

Jonathan Brassow [Fri, 8 Feb 2008 02:11:39 +0000 (02:11 +0000)]

dm raid1: report fault status

This patch adds extra information to the mirror status output, so that
it can be determined which device(s) have failed.  For each mirror device,
a character is printed indicating the most severe error encountered.  The
characters are:
*    A => Alive - No failures
*    D => Dead - A write failure occurred leaving mirror out-of-sync
*    S => Sync - A synchronization failure occurred, mirror out-of-sync
*    R => Read - A read failure occurred, mirror data unaffected
This allows userspace to properly reconfigure the mirror set.
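
One way to picture the mapping is a function that turns a per-device set of error bits into a single character. The error-bit names and the severity order (write over sync over read) are assumptions made for this sketch; the message above only says that the most severe error is reported.

```c
#include <assert.h>

/* Hypothetical error bits for one mirror device. */
enum { ERR_WRITE = 1 << 0, ERR_SYNC = 1 << 1, ERR_READ = 1 << 2 };

/* Map accumulated errors to one status character (assumed severity order). */
static char mirror_status_char(unsigned int errors)
{
	if (errors & ERR_WRITE)
		return 'D';
	if (errors & ERR_SYNC)
		return 'S';
	if (errors & ERR_READ)
		return 'R';
	return 'A';
}

/* Fill buf with one character per device; buf must hold n + 1 bytes. */
static void mirror_status(const unsigned int *errors, int n, char *buf)
{
	assert(n >= 0);
	for (int i = 0; i < n; i++)
		buf[i] = mirror_status_char(errors[i]);
	buf[n] = '\0';
}
```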

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Jonathan Brassow [Fri, 8 Feb 2008 02:11:37 +0000 (02:11 +0000)]

dm raid1: handle read failures

This patch gives the ability to respond-to/record device failures
that happen during read operations.  It also adds the ability to
read from mirror devices that are not the primary if they are
in-sync.

There are essentially two read paths in mirroring; the direct path
and the queued path.  When a read request is mapped, if the region
is 'in-sync' the direct path is taken; otherwise the queued path
is taken.

If the direct path is taken, we must record bio information so that
if the read fails we can retry it.  We then discover the status of
a direct read through mirror_end_io.  If the read has failed, we will
mark the device from which the read was attempted as failed (so we
don't try to read from it again), restore the bio and try again.

If the queued path is taken, we discover the results of the read
from 'read_callback'.  If the device failed, we will mark the device
as failed and attempt the read again if there is another device
where this region is known to be 'in-sync'.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Jonathan Brassow [Fri, 8 Feb 2008 02:11:35 +0000 (02:11 +0000)]

dm raid1: fix EIO after log failure

This patch adds the ability to requeue write I/O to
core device-mapper when there is a log device failure.

If a write to the log produces an error, the pending writes are
put on the "failures" list.  Since the log is marked as failed,
they will stay on the failures list until a suspend happens.

Suspends come in two phases, presuspend and postsuspend.  We must
make sure that all the writes on the failures list are requeued
in the presuspend phase (a requirement of dm core).  This means
that recovery must be complete (because writes may be delayed
behind it) and the failures list must be requeued before we
return from presuspend.

The mechanisms to ensure recovery is complete (or stopped) were
already in place, but needed to be moved from postsuspend to
presuspend.  We rely on 'flush_workqueue' to ensure that the
mirror thread is complete and, therefore, has requeued all writes
in the failures list.

Because we are using flush_workqueue, we must ensure that no
additional 'queue_work' calls will produce additional I/O
that we need to requeue (because once we return from
presuspend, we are unable to do anything about it).  'queue_work'
is called in response to the following functions:
- complete_resync_work = NA, recovery is stopped
- rh_dec (mirror_end_io) = NA, only calls 'queue_work' if it
                           is ready to recover the region
                           (recovery is stopped) or it needs
                           to clear the region in the log*
                           **this doesn't get called while
                           suspending**
- rh_recovery_end = NA, recovery is stopped
- rh_recovery_start = NA, recovery is stopped
- write_callback = 1) Writes w/o failures simply call
                   bio_endio -> mirror_end_io -> rh_dec
                   (see rh_dec above)
                   2) Writes with failures are put on
                   the failures list and queue_work is
                   called**
                   ** write_callbacks don't happen
                   during suspend **
- do_failures = NA, 'queue_work' not called if suspending
- add_mirror (initialization) = NA, only done on mirror creation
- queue_bio = NA, 1) delayed I/O scheduled before flush_workqueue
              is called.  2) No more I/Os are being issued.
              3) Re-attempted READs can still be handled.
              (Write completions are handled through rh_dec/
              write_callback - mentioned above - and do not
              use queue_bio.)

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Jonathan Brassow [Fri, 8 Feb 2008 02:11:32 +0000 (02:11 +0000)]

dm raid1: handle recovery failures

This patch adds the calls to 'fail_mirror' if an error occurs during
mirror recovery (aka resynchronization). 'fail_mirror' is responsible
for recording the type of error by mirror device and ensuring an event
gets raised for the purpose of notifying userspace.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Jonathan Brassow [Fri, 8 Feb 2008 02:11:29 +0000 (02:11 +0000)]

dm raid1: handle write failures

This patch gives mirror the ability to handle device failures
during normal write operations.

The 'write_callback' function is called when a write completes.
If all the writes failed or succeeded, we report failure or
success respectively.  If some of the writes failed, we call
fail_mirror, which increments the error count for the device, notes
the type of error encountered (DM_RAID1_WRITE_ERROR), and
selects a new primary (if necessary).  Note that the primary
device can never change while the mirror is not in-sync (IOW,
while recovery is happening).  This means that the scenario
where a failed write changes the primary and gives
recovery_complete a chance to misread the primary never happens.
The fact that the primary can change has necessitated the change
to the default_mirror field.  We need to protect against reading
garbage while the primary changes.  We then add the bio to a new
list in the mirror set, 'failures'.  For every bio in the 'failures'
list, we call a new function, '__bio_mark_nosync', where we mark
the region 'not-in-sync' in the log and properly set the region
state as RH_NOSYNC.  Userspace must also be notified of the
failure.  This is done by 'raising an event' (dm_table_event()).
If fail_mirror is called in process context, the event can be raised
right away.  If in interrupt context, the event is deferred to the
kmirrord thread, which raises the event if 'event_waiting' is set.

Backwards compatibility is maintained by ignoring errors if
the DM_FEATURES_HANDLE_ERRORS flag is not present.
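
The three outcomes of a completed write can be sketched in user space as below. The struct layout, MAX_MIRRORS, the outcome names and fail_mirror_model() are illustrative assumptions; primary selection, the 'failures' list and the interrupt-context deferral are left out, and the sketch simplifies by not touching the error counts when every write fails.

```c
#include <assert.h>

#define MAX_MIRRORS 8	/* hypothetical limit for this sketch */

enum write_outcome { WRITE_SUCCESS, WRITE_FAILURE, WRITE_DEGRADED };

struct mirror_set {
	unsigned int nr_mirrors;
	unsigned int error_count[MAX_MIRRORS];
	int events_raised;
};

/* Stand-in for fail_mirror(): count the error and notify userspace. */
static void fail_mirror_model(struct mirror_set *ms, unsigned int i)
{
	ms->error_count[i]++;
	ms->events_raised++;
}

/* failed_bits has bit i set when the write to mirror i failed. */
static enum write_outcome write_done(struct mirror_set *ms,
				     unsigned long failed_bits)
{
	unsigned long all;

	assert(ms->nr_mirrors > 0 && ms->nr_mirrors <= MAX_MIRRORS);
	all = (1UL << ms->nr_mirrors) - 1;
	failed_bits &= all;

	if (failed_bits == 0)
		return WRITE_SUCCESS;
	if (failed_bits == all)
		return WRITE_FAILURE;

	/* Partial failure: mark each failed device, keep the mirror running. */
	for (unsigned int i = 0; i < ms->nr_mirrors; i++)
		if (failed_bits & (1UL << i))
			fail_mirror_model(ms, i);
	return WRITE_DEGRADED;
}
```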

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Milan Broz [Fri, 8 Feb 2008 02:11:27 +0000 (02:11 +0000)]

dm snapshot: combine consecutive exceptions in memory

Provided sector_t is 64 bits, reduce the in-memory footprint of the
snapshot exception table by the simple method of using unused bits of
the chunk number to combine consecutive entries.
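
A sketch of the packing idea, assuming a 64-bit chunk value split into a 56-bit chunk number and an 8-bit count of consecutive chunks kept in the top bits of the new chunk. The split, the field names and try_merge() are assumptions for illustration; the commit message does not give the exact layout.

```c
#include <assert.h>
#include <stdint.h>

#define NUMBER_BITS	56	/* assumed width of the chunk number */
#define CONSECUTIVE_MAX	255u	/* assumed 8-bit consecutive counter */

struct exception {
	uint64_t old_chunk;
	uint64_t new_chunk;	/* top bits hold the consecutive count */
};

static uint64_t chunk_number(uint64_t c)
{
	return c & ((UINT64_C(1) << NUMBER_BITS) - 1);
}

static unsigned int consecutive_count(const struct exception *e)
{
	return (unsigned int)(e->new_chunk >> NUMBER_BITS);
}

/* Extend e by one chunk if (old, new) directly follows it; 1 if merged. */
static int try_merge(struct exception *e, uint64_t old, uint64_t new_chunk)
{
	unsigned int n = consecutive_count(e);

	assert(chunk_number(new_chunk) == new_chunk);
	if (n == CONSECUTIVE_MAX)
		return 0;
	if (old == e->old_chunk + n + 1 &&
	    new_chunk == chunk_number(e->new_chunk) + n + 1) {
		e->new_chunk += UINT64_C(1) << NUMBER_BITS;
		return 1;
	}
	return 0;
}
```

With this layout, a run of consecutive chunk mappings costs one table entry instead of one entry per chunk, at the price of limiting chunk numbers to the assumed 56 bits.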

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Brian Wood [Fri, 8 Feb 2008 02:11:24 +0000 (02:11 +0000)]

dm: stripe enhanced status return

This patch adds additional information to the status line. It is added at the
end of the returned text so it will not interfere with existing
implementations using this data. The addition of this information will allow
for a common return interface to match that returned with the dm-raid1.c
status line (with Jonathan Brassow's patches).

Here is a sample of what is returned with a mirror "status" call:
isw_eeaaabgfg_mirror: 0 488390920 mirror 2 8:16 8:32 3727/3727 1 AA 1 core

Here's what's returned with this patch for a stripe "status" call:
isw_dheeijjdej_stripe: 0 976783872 striped 2 8:16 8:32 1 AA
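
For userspace consumers, the appended field can be read by taking the last space-separated token of the stripe status line. The helper below is a hypothetical sketch: it assumes the line has no trailing whitespace, that the health field is the final token (true for the stripe sample above, not for the mirror sample), and that any character other than 'A' marks an unhealthy device.

```c
#include <assert.h>
#include <string.h>

/* Count devices whose health character is not 'A' in a stripe status line. */
static int count_unhealthy(const char *status)
{
	const char *last;
	const char *field;
	int bad = 0;

	assert(status != NULL);
	last = strrchr(status, ' ');
	field = last ? last + 1 : status;
	for (; *field; field++)
		if (*field != 'A')
			bad++;
	return bad;
}
```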

Signed-off-by: Brian Wood <brian.j.wood@intel.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Brian Wood [Fri, 8 Feb 2008 02:11:22 +0000 (02:11 +0000)]

dm: stripe trigger event on failure

This patch adds the stripe_end_io function to process errors that might
occur after an IO operation. As part of this there are a number of
enhancements made to record and trigger events:

- New atomic variable in struct stripe to record the number of
errors each stripe volume device has experienced (could be used
later with uevents to report back directly to userspace)

- New workqueue/work struct setup to process the trigger_event function

- New end_io function. It is here that testing for BIO error conditions
takes place. It determines the exact stripe that caused the error,
records this in the new atomic variable, and calls the queue_work() function

- New trigger_event function to process failure events. This
calls dm_table_event()
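
The pieces listed above fit together roughly as in this user-space sketch, which uses C11 atomics in place of the kernel's atomic type and a flag in place of the workqueue. All names (struct stripe_set, stripe_end_io_model(), trigger_event_model()) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_STRIPES 4	/* hypothetical limit for this sketch */

struct stripe_set {
	int nr_stripes;
	atomic_int error_count[MAX_STRIPES];	/* per-device error counter */
	atomic_int work_queued;			/* stands in for the work struct */
};

static void stripe_set_init(struct stripe_set *s, int n)
{
	assert(n > 0 && n <= MAX_STRIPES);
	s->nr_stripes = n;
	for (int i = 0; i < MAX_STRIPES; i++)
		atomic_init(&s->error_count[i], 0);
	atomic_init(&s->work_queued, 0);
}

/* Model of end_io: on error, charge the stripe and queue the event work. */
static void stripe_end_io_model(struct stripe_set *s, int stripe, int error)
{
	if (!error)
		return;
	assert(stripe >= 0 && stripe < s->nr_stripes);
	atomic_fetch_add(&s->error_count[stripe], 1);
	atomic_store(&s->work_queued, 1);	/* stands in for queue_work() */
}

/* Model of trigger_event: returns 1 if an event was pending and raised. */
static int trigger_event_model(struct stripe_set *s)
{
	return atomic_exchange(&s->work_queued, 0);
}
```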

Signed-off-by: Brian Wood <brian.j.wood@intel.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Jonathan Brassow [Fri, 8 Feb 2008 02:11:19 +0000 (02:11 +0000)]

dm log: auto load modules

If the log type is not recognised, attempt to load the module
'dm-log-<type>.ko'.
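
The naming rule can be sketched as a small helper that builds the module file name from the log type. log_module_name() is a hypothetical function written for this example, and the "clustered" type used in testing it is only a sample value; the actual module-loading call is not modelled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "dm-log-<type>.ko" into buf; 0 on success, -1 on bad input. */
static int log_module_name(char *buf, size_t len, const char *type)
{
	int n;

	assert(buf != NULL && type != NULL);
	if (strlen(type) == 0)
		return -1;
	n = snprintf(buf, len, "dm-log-%s.ko", type);
	if (n < 0 || (size_t)n >= len)
		return -1;	/* output would not fit in buf */
	return 0;
}
```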

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Milan Broz [Fri, 8 Feb 2008 02:11:17 +0000 (02:11 +0000)]

dm: move deferred bio flushing to workqueue

Add a single-thread workqueue for each mapped device
and move flushing of the lists of pushback and deferred bios
to this new workqueue.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Milan Broz [Fri, 8 Feb 2008 02:11:14 +0000 (02:11 +0000)]

dm crypt: use async crypto

dm-crypt: Use crypto ablkcipher interface

Move encrypt/decrypt core to async crypto call.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Milan Broz [Fri, 8 Feb 2008 02:11:12 +0000 (02:11 +0000)]

dm crypt: prepare async callback fn

dm-crypt: Use crypto ablkcipher interface

Prepare callback function for async crypto operation.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Milan Broz [Fri, 8 Feb 2008 02:11:09 +0000 (02:11 +0000)]

dm crypt: add completion for async

dm-crypt: Use crypto ablkcipher interface
Prepare completion for async crypto request.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Milan Broz [Fri, 8 Feb 2008 02:11:07 +0000 (02:11 +0000)]

dm crypt: add async request mempool

dm-crypt: Use crypto ablkcipher interface

Introduce mempool for async crypto requests.

cc->req is used mainly during synchronous operations
(to prevent allocation and deallocation of the same object).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Milan Broz [Fri, 8 Feb 2008 02:11:04 +0000 (02:11 +0000)]

dm crypt: extract scatterlist processing

dm-crypt: Use crypto ablkcipher interface

Move scatterlists to separate dm_crypt_struct and
pick out block processing from crypt_convert.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Milan Broz [Fri, 8 Feb 2008 02:11:02 +0000 (02:11 +0000)]

dm crypt: tidy io ref counting

Make io reference counting more obvious.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Milan Broz [Fri, 8 Feb 2008 02:10:59 +0000 (02:10 +0000)]

dm crypt: introduce crypt_write_io_loop

Introduce crypt_write_io_loop().

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Milan Broz [Fri, 8 Feb 2008 02:10:57 +0000 (02:10 +0000)]

dm crypt: abstract crypt_write_done

Process write request in separate function and queue
final bio through io workqueue.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Milan Broz [Fri, 8 Feb 2008 02:10:54 +0000 (02:10 +0000)]

dm crypt: store sector mapping in dm_crypt_io

Add sector into dm_crypt_io instead of using local variable.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Alasdair G Kergon [Fri, 8 Feb 2008 02:10:52 +0000 (02:10 +0000)]

dm crypt: move queue functions

Reorder kcryptd functions for clarity.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Milan Broz [Fri, 8 Feb 2008 02:10:49 +0000 (02:10 +0000)]

dm crypt: adjust io processing functions

Rename functions to follow calling convention.
Prepare write io error processing function skeleton.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Milan Broz [Fri, 8 Feb 2008 02:10:46 +0000 (02:10 +0000)]

dm crypt: tidy crypt_endio

Simplify crypt_endio function.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Milan Broz [Fri, 8 Feb 2008 02:10:43 +0000 (02:10 +0000)]

dm crypt: move error setting outside crypt_dec_pending

Move error code setting outside of crypt_dec_pending function.
Use -EIO if crypt_convert_scatterlist() fails.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Milan Broz [Fri, 8 Feb 2008 02:10:41 +0000 (02:10 +0000)]

dm crypt: remove unnecessary crypt_context write parm

Remove write attribute from convert_context and use bio flag instead.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Milan Broz [Fri, 8 Feb 2008 02:10:38 +0000 (02:10 +0000)]

dm crypt: move convert_context inside dm_crypt_io

Move convert_context inside dm_crypt_io.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Alasdair G Kergon [Fri, 8 Feb 2008 02:10:35 +0000 (02:10 +0000)]

dm mpath: add missing static

A static declaration was missing.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Alasdair G Kergon [Fri, 8 Feb 2008 02:10:32 +0000 (02:10 +0000)]

dm: targets no longer experimental

Drop the EXPERIMENTAL tag from well-established device-mapper targets, so
the newer ones stand out better.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Milan Broz [Fri, 8 Feb 2008 02:10:30 +0000 (02:10 +0000)]

dm: refactor dm_suspend completion wait

Move completion wait to separate function

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Milan Broz [Fri, 8 Feb 2008 02:10:27 +0000 (02:10 +0000)]

dm: split dm_suspend io_lock hold into two

Change io_locking to allow processing flush in separate thread.

Because we have DMF_BLOCK_IO already set, any possible
new ios are queued in dm_requests now.

In the case of interrupting previous wait there can be more
ios queued (we unlocked io_lock for a while) but this is safe.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Milan Broz [Fri, 8 Feb 2008 02:10:25 +0000 (02:10 +0000)]

dm: tidy dm_suspend

Tidy dm_suspend function

- change return value logic in dm_suspend
- use atomic_read only once.
- move DMF_BLOCK_IO clearing into one place

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Milan Broz [Fri, 8 Feb 2008 02:10:22 +0000 (02:10 +0000)]

dm: refactor deferred bio_list processing

Refactor deferred bio_list processing.

- use separate _merge_pushback_list function
- move deferred bio list pick up to flush function
- use bio_list_pop instead of bio_list_get
- simplify noflush flag use

No real functional change in this patch.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Milan Broz [Fri, 8 Feb 2008 02:10:19 +0000 (02:10 +0000)]

dm: tidy alloc_dev labels

Tidy labels in alloc_dev to make later patches more clear.

No functional change in this patch.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Andrew Morton [Fri, 8 Feb 2008 02:10:16 +0000 (02:10 +0000)]

dm ioctl: use uninitialized_var

drivers/md/dm-ioctl.c:1405: warning: 'param' may be used uninitialized in this function

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Andrew Morton [Fri, 8 Feb 2008 02:10:14 +0000 (02:10 +0000)]

dm: table use uninitialized_var

drivers/md/dm-table.c: In function 'dm_get_device':
drivers/md/dm-table.c:478: warning: 'dev' may be used uninitialized in this function

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Andrew Morton [Fri, 8 Feb 2008 02:10:11 +0000 (02:10 +0000)]

dm snapshot: use uninitialized_var

drivers/md/dm-exception-store.c: In function 'persistent_read_metadata':
drivers/md/dm-exception-store.c:452: warning: 'new_snapshot' may be used uninitialized in this function

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Daniel Walker [Fri, 8 Feb 2008 02:10:08 +0000 (02:10 +0000)]

dm: convert suspend_lock semaphore to mutex

Replace semaphore with mutex.

Signed-off-by: Daniel Walker <dwalker@mvista.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Robert P. J. Day [Fri, 8 Feb 2008 02:10:06 +0000 (02:10 +0000)]

dm snapshot: use rounddown_pow_of_two

Since the source file already includes the log2.h header file, it
seems pointless to re-invent the necessary routine.
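
For readers unfamiliar with the helper, the semantics of rounding down to a power of two can be sketched as follows. This is a userspace illustration with a hypothetical name (rounddown_pow2_sketch), not the kernel's implementation from log2.h.

```c
#include <assert.h>
#include <stdint.h>

/* Round n down to the nearest power of two; n must be non-zero. */
static uint32_t rounddown_pow2_sketch(uint32_t n)
{
    assert(n != 0);
    /* Smear the highest set bit into every lower bit position... */
    n |= n >> 1;
    n |= n >> 2;
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;
    /* ...then drop everything below the highest bit. */
    return n - (n >> 1);
}
```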

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Jun'ichi Nomura [Fri, 8 Feb 2008 02:10:04 +0000 (02:10 +0000)]

dm: table remove unused total

"total = 0" does nothing.

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Vasily Averin [Fri, 8 Feb 2008 02:10:01 +0000 (02:10 +0000)]

dm: table remove unused variable

Save some bytes.

Signed-off-by: Vasily Averin <vvs@sw.ru>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Paul Jimenez [Fri, 8 Feb 2008 02:09:59 +0000 (02:09 +0000)]

dm: table use list_for_each

This patch is some minor janitorish cleanup, using some macros
from linux/list.h (already #included via dm.h) to improve
readability.

Signed-off-by: Paul Jimenez <pj@place.org>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Milan Broz [Fri, 8 Feb 2008 02:09:56 +0000 (02:09 +0000)]

dm ioctl: move compat code

Move compat_ioctl handling into dm-ioctl.c.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Alasdair G Kergon [Fri, 8 Feb 2008 02:09:53 +0000 (02:09 +0000)]

dm ioctl: remove lock_kernel

Remove lock_kernel() from the device-mapper ioctls - there should
be sufficient internal locking already where required.

Also remove some superfluous casts.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Alasdair G Kergon [Fri, 8 Feb 2008 02:09:51 +0000 (02:09 +0000)]

dm: mark function lists static

Add a couple of statics.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Milan Broz [Fri, 8 Feb 2008 02:09:49 +0000 (02:09 +0000)]

dm: add missing memory barrier to dm_suspend

Add memory barrier to fix atomic_read of pending value.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Ingo Molnar [Wed, 6 Feb 2008 01:57:39 +0000 (17:57 -0800)]

SLUB: fix checkpatch warnings

fix checkpatch --file mm/slub.c errors and warnings.

$ q-code-quality-compare
                                      errors   lines of code   errors/KLOC
mm/slub.c      [before]                  22            4204           5.2
mm/slub.c      [after]                    0            4210             0

no code changed:

    text    data     bss     dec     hex filename
   22195    8634     136   30965    78f5 slub.o.before
   22195    8634     136   30965    78f5 slub.o.after

   md5:
     93cdfbec2d6450622163c590e1064358  slub.o.before.asm
     93cdfbec2d6450622163c590e1064358  slub.o.after.asm

[clameter: rediffed against Pekka's cleanup patch, omitted
moves of the name of a function to the start of line]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Christoph Lameter <clameter@sgi.com>

commit | commitdiff | tree

Nick Piggin [Tue, 8 Jan 2008 07:20:27 +0000 (23:20 -0800)]

Use non atomic unlock

Slub can use the non-atomic version to unlock because other flags will not
get modified with the lock held.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Christoph Lameter [Fri, 8 Feb 2008 01:47:41 +0000 (17:47 -0800)]

SLUB: Support for performance statistics

The statistics provided here allow the monitoring of allocator behavior but
at the cost of some (minimal) loss of performance. Counters are placed in
SLUB's per cpu data structure. The per cpu structure may be extended by the
statistics to grow larger than one cacheline which will increase the cache
footprint of SLUB.

There is a compile option to enable/disable the inclusion of the runtime
statistics and it is off by default.

The slabinfo tool is enhanced to support these statistics via the following options:

-D Switches the line of information displayed for a slab from size
mode to activity mode.

-A Sorts the slabs displayed by activity. This allows the display of
the slabs most important to the performance of a certain load.

-r Report option will report detailed statistics on

Example (tbench load):

slabinfo -AD ->Shows the most active slabs

Name                   Objects    Alloc     Free   %Fast
skbuff_fclone_cache         33 111953835 111953835  99  99
:0000192                  2666  5283688  5281047  99  99
:0001024                   849  5247230  5246389  83  83
vm_area_struct            1349   119642   118355  91  22
:0004096                    15    66753    66751  98  98
:0000064                  2067    25297    23383  98  78
dentry                   10259    28635    18464  91  45
:0000080                 11004    18950     8089  98  98
:0000096                  1703    12358    10784  99  98
:0000128                   762    10582     9875  94  18
:0000512                   184     9807     9647  95  81
:0002048                   479     9669     9195  83  65
anon_vma                   777     9461     9002  99  71
kmalloc-8                 6492     9981     5624  99  97
:0000768                   258     7174     6931  58  15

So the skbuff_fclone_cache is of highest importance for the tbench load.
Pretty high load on the 192 sized slab. Look for the aliases

slabinfo -a | grep 000192
:0000192     <- xfs_btree_cur filp kmalloc-192 uid_cache tw_sock_TCP
request_sock_TCPv6 tw_sock_TCPv6 skbuff_head_cache xfs_ili

Likely skbuff_head_cache.

Looking into the statistics of the skbuff_fclone_cache is possible through

slabinfo skbuff_fclone_cache ->-r option implied if cache name is mentioned

.... Usual output ...

Slab Perf Counter       Alloc     Free %Al %Fr
--------------------------------------------------
Fastpath             111953360 111946981  99  99
Slowpath                 1044     7423   0   0
Page Alloc                272      264   0   0
Add partial                25      325   0   0
Remove partial             86      264   0   0
RemoteObj/SlabFrozen      350     4832   0   0
Total                111954404 111954404

Flushes       49 Refill        0
Deactivate Full=325(92%) Empty=0(0%) ToHead=24(6%) ToTail=1(0%)

Looks good because the fastpath is overwhelmingly taken.

skbuff_head_cache:

Slab Perf Counter       Alloc     Free %Al %Fr
--------------------------------------------------
Fastpath              5297262  5259882  99  99
Slowpath                 4477    39586   0   0
Page Alloc                937      824   0   0
Add partial                 0     2515   0   0
Remove partial           1691      824   0   0
RemoteObj/SlabFrozen     2621     9684   0   0
Total                 5301739  5299468

Deactivate Full=2620(100%) Empty=0(0%) ToHead=0(0%) ToTail=0(0%)

Descriptions of the output:

Total: The total number of allocations and frees that occurred for a
slab

Fastpath: The number of allocations/frees that used the fastpath.

Slowpath: Other allocations

Page Alloc: Number of calls to the page allocator as a result of slowpath
processing

Add Partial: Number of slabs added to the partial list through free or
alloc (occurs during cpuslab flushes)

Remove Partial: Number of slabs removed from the partial list as a result of
allocations retrieving a partial slab or by a free freeing
the last object of a slab.

RemoteObj/Froz: RemoteObj: How many times remotely freed objects were
encountered when a slab was about to be deactivated. Frozen: How many
times free was able to skip list processing because the slab was in use
as the cpuslab of another processor.

Flushes: Number of times the cpuslab was flushed on request
(kmem_cache_shrink, may result from races in __slab_alloc)

Refill: Number of times we were able to refill the cpuslab from
remotely freed objects for the same slab.

Deactivate: Statistics on how slabs were deactivated. Shows how they were
put onto the partial list.

In general fastpath is very good. Slowpath without partial list processing is
also desirable. Any touching of partial list uses node specific locks which
may potentially cause list lock contention.
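
The percentage columns in the sample output are consistent with a truncating integer percentage, for example Full=325 out of 350 deactivations shown as 92%. A minimal sketch of that arithmetic follows; the helper name percent_of is hypothetical and the truncation is an assumption inferred from the figures above, not taken from the slabinfo source.

```c
#include <assert.h>

/* Integer percentage of 'part' within 'total', truncated; 0 when total is 0. */
static unsigned int percent_of(unsigned long long part, unsigned long long total)
{
    if (total == 0)
        return 0;
    /* 64-bit arithmetic: part * 100 overflows 32 bits for the counts above. */
    return (unsigned int)(part * 100 / total);
}
```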

Signed-off-by: Christoph Lameter <clameter@sgi.com>

commit | commitdiff | tree

Christoph Lameter [Tue, 8 Jan 2008 07:20:30 +0000 (23:20 -0800)]

SLUB: Alternate fast paths using cmpxchg_local

Provide an alternate implementation of the SLUB fast paths for alloc
and free using cmpxchg_local. The cmpxchg_local fast path is selected
for arches that have CONFIG_FAST_CMPXCHG_LOCAL set. An arch should only
set CONFIG_FAST_CMPXCHG_LOCAL if the cmpxchg_local is faster than an
interrupt enable/disable sequence. This is known to be true for both
x86 platforms so set FAST_CMPXCHG_LOCAL for both arches.
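
The idea of a compare-and-exchange fast path can be sketched in userspace C. This is not the SLUB code: the types and the names cpu_freelist, fast_alloc and fast_free are hypothetical, and a full C11 atomic compare-exchange stands in for cmpxchg_local, which in the kernel only has to be atomic with respect to the local CPU. The sketch also ignores the ABA problem, so it is meant for illustration rather than concurrent use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* A free object stores the pointer to the next free object in itself. */
struct object {
    struct object *next;
};

/* Hypothetical per-cpu freelist head. */
struct cpu_freelist {
    _Atomic(struct object *) head;
};

/* Pop one object without taking a lock or disabling interrupts. */
static struct object *fast_alloc(struct cpu_freelist *c)
{
    struct object *old = atomic_load(&c->head);

    while (old != NULL) {
        /* On failure 'old' is refreshed with the current head and we retry. */
        if (atomic_compare_exchange_weak(&c->head, &old, old->next))
            return old;
    }
    return NULL; /* empty: a real allocator would take the slow path here */
}

/* Push an object back onto the freelist. */
static void fast_free(struct cpu_freelist *c, struct object *obj)
{
    struct object *old = atomic_load(&c->head);

    do {
        obj->next = old;
    } while (!atomic_compare_exchange_weak(&c->head, &old, obj));
}
```

The retry loop is what replaces the interrupt disable/enable pair: if anything changed the head between the load and the exchange, the exchange fails and the operation is simply repeated.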

Currently another requirement for the fastpath is that the kernel is
compiled without preemption. The restriction will go away with the
introduction of a new per cpu allocator and new per cpu operations.

The advantages of a cmpxchg_local based fast path are:

1. Potentially lower cycle count (30%-60% faster)

2. There is no need to disable and enable interrupts on the fast path.
   Currently interrupts have to be disabled and enabled on every
   slab operation. This is likely avoiding a significant percentage
   of interrupt off / on sequences in the kernel.

3. The disposal of freed slabs can occur with interrupts enabled.

The alternate path is realized using #ifdef's. Several attempts to do the
same with macros and inline functions resulted in a mess (in particular due
to the strange way that local_interrupt_save() handles its argument and due
to the need to define macros/functions that sometimes disable interrupts
and sometimes do something else).

[clameter: Stripped preempt bits and disabled fastpath if preempt is enabled]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Christoph Lameter [Tue, 8 Jan 2008 07:20:29 +0000 (23:20 -0800)]

SLUB: Use unique end pointer for each slab page.

We use a NULL pointer on freelists to signal that there are no more objects.
However the NULL pointers of all slabs match in contrast to the pointers to
the real objects which are in different ranges for different slab pages.

Change the end pointer to be a pointer to the first object and set bit 0.
Every slab will then have a different end pointer. This is necessary to ensure
that end markers can be matched to the source slab during cmpxchg_local.
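
A minimal sketch of the tagging scheme described above, in userspace C. The helper names are hypothetical, and the sketch assumes objects are at least 2-byte aligned so that bit 0 of a real object pointer is always clear.

```c
#include <assert.h>
#include <stdint.h>

/* Build the per-slab end marker: address of the first object with bit 0 set. */
static void *slab_end_marker(void *first_object)
{
    return (void *)((uintptr_t)first_object | (uintptr_t)1);
}

/* A freelist pointer is an end marker if and only if bit 0 is set. */
static int is_end(const void *p)
{
    return ((uintptr_t)p & 1) != 0;
}

/* Recover the address of the slab's first object from its end marker. */
static void *end_to_first_object(void *end)
{
    return (void *)((uintptr_t)end & ~(uintptr_t)1);
}
```

Because each marker embeds the address of its own slab, two markers from different slabs never compare equal, which a shared NULL terminator cannot guarantee.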

Bring back the use of the mapping field by SLUB since we would otherwise have
to call a relatively expensive function page_address() in __slab_alloc(). Use
of the mapping field allows avoiding a call to page_address() in various other
functions as well.

There is no need to change the page_mapping() function since bit 0 is set on
the mapping, just as it is for anonymous pages. page_mapping(slab_page) will
therefore still return NULL although the mapping field is overloaded.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Christoph Lameter [Fri, 8 Feb 2008 01:47:41 +0000 (17:47 -0800)]

SLUB: Deal with annoying gcc warning on kfree()

gcc 4.2 spits out an annoying warning if one casts a const void *
pointer to a void * pointer. No warning is generated if the
conversion is done through an assignment.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

commit | commitdiff | tree

Jean Delvare [Sat, 5 Jan 2008 14:40:38 +0000 (15:40 +0100)]

hwmon: (lm80) Add individual alarm files

The new libsensors needs these individual alarm files.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>

commit | commitdiff | tree

Jean Delvare [Sat, 5 Jan 2008 14:37:05 +0000 (15:37 +0100)]

hwmon: (lm80) De-macro the sysfs callbacks

Use standard dynamic sysfs callbacks instead of macro-generated
functions. This makes the code more readable, and the binary smaller
(by about 34%).

As a side note, another benefit of this type of cleanup is that it
shrinks the build time. For example, this cleanup saves about 29% of
the lm80 driver build time.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>

commit | commitdiff | tree

Jean Delvare [Sat, 5 Jan 2008 14:35:09 +0000 (15:35 +0100)]

hwmon: (lm80) Various cleanups

* Drop trailing whitespace
* Fold a long line
* Rename new_client to client
* Drop redundant initializations to 0
* Drop bogus comment

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>

commit | commitdiff | tree

Jean Delvare [Thu, 3 Jan 2008 22:04:55 +0000 (23:04 +0100)]

hwmon: (w83627hf) Refactor beep enable handling

We can handle the beep enable bit as any other beep mask bit for
slightly smaller code.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>

commit | commitdiff | tree

Jean Delvare [Thu, 3 Jan 2008 22:00:30 +0000 (23:00 +0100)]

hwmon: (w83627hf) Add individual alarm and beep files

The new libsensors needs these individual alarm and beep files. The
code was copied from the w83781d driver. I've tested the alarm files
on a W83627THF. I couldn't test the beep files as the system in
question doesn't have a speaker.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>

commit | commitdiff | tree

Jean Delvare [Thu, 3 Jan 2008 21:54:13 +0000 (22:54 +0100)]

hwmon: (w83627hf) Enable VBAT monitoring

If VBAT monitoring is disabled, enable it. Bug reported on the
lm-sensors trac system:
http://lm-sensors.org/ticket/2282
This is the exact same patch that was applied to the w83627ehf driver
6 months ago.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>

commit | commitdiff | tree

Jean Delvare [Thu, 3 Jan 2008 20:22:44 +0000 (21:22 +0100)]

hwmon: (w83627ehf) The W83627DHG has 8 VID pins

While the W83627EHF/EHG has only 6 VID pins, the W83627DHG has 8 VID
pins, to support VRD 11.0. Add support for this.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>

commit | commitdiff | tree

Jean Delvare [Thu, 3 Jan 2008 22:24:24 +0000 (23:24 +0100)]

hwmon: (asb100) Add individual alarm files

The new libsensors needs these individual alarm files.

I did not create alarm files for in5 and in6 as these alarms are documented
as not working.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Hans de Goede <j.w.r.degoede@hhs.nl>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>

commit | commitdiff | tree

Jean Delvare [Thu, 3 Jan 2008 22:21:07 +0000 (23:21 +0100)]

hwmon: (asb100) De-macro the sysfs callbacks

Use standard dynamic sysfs callbacks instead of macro-generated
wrappers. This makes the code more readable, and the binary smaller
(by about 12%).

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Hans de Goede <j.w.r.degoede@hhs.nl>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>

commit | commitdiff | tree

Jean Delvare [Thu, 3 Jan 2008 22:15:49 +0000 (23:15 +0100)]

hwmon: (asb100) Various cleanups

* Drop history, it's incomplete and doesn't belong there
* Drop unused version number
* Drop trailing spaces
* Coding style fixes
* Fold long lines
* Rename new_client to client
* Drop redundant initializations to 0

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Hans de Goede <j.w.r.degoede@hhs.nl>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>

commit | commitdiff | tree

Jean Delvare [Sat, 1 Dec 2007 10:25:33 +0000 (11:25 +0100)]

hwmon: VRM is not written to registers

What was true of reading the VRM value is also true of writing it: not
being a register value, it doesn't need hardware access, so we don't
need a reference to the i2c client. This allows for a minor code
cleanup. As gcc appears to be smart enough to simplify the generated
code by itself, this cleanup only affects the source code, the
generated binaries are unchanged.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>

commit | commitdiff | tree

Juerg Haefliger [Sat, 26 Jan 2008 16:54:24 +0000 (08:54 -0800)]

hwmon: (dme1737) fix Super-IO device ID override

The dme1737 has a second place where the Super-IO device ID is
checked. This has been missed by Jean's initial patch that adds
support for user-controlled Super-IO device ID override. This patch
fixes this issue.

Signed-off-by: Juerg Haefliger <juergh at gmail.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>

commit | commitdiff | tree

Juerg Haefliger [Mon, 28 Jan 2008 00:39:46 +0000 (16:39 -0800)]

hwmon: (dme1737) fix divide-by-0

This patch fixes a possible divide-by-0 and a minor bug in the
FAN_FROM_REG macro (in TPC mode).

Signed-off-by: Juerg Haefliger <juergh at gmail.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>

commit | commitdiff | tree

Sergey Vlasov [Tue, 15 Jan 2008 18:57:44 +0000 (21:57 +0300)]

hwmon: (abituguru3) Add AUX4 fan input for Abit IP35 Pro

Abit IP35 Pro has 6 fan connectors (CPU, SYS and AUX1-4), but the
entry for AUX4 was missing from the table.

Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Acked-by: Hans de Goede <j.w.r.degoede@hhs.nl>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>

commit | commitdiff | tree

Steve Hardy [Tue, 22 Jan 2008 23:00:02 +0000 (23:00 +0000)]

hwmon: Add support for Texas Instruments/Burr-Brown ADS7828

Signed-off-by: Steve Hardy <steve@linuxrealtime.co.uk>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>

commit | commitdiff | tree

Jean Delvare [Sun, 6 Jan 2008 14:49:19 +0000 (15:49 +0100)]

hwmon: (adm9240) Add individual alarm files

The new libsensors needs these individual alarm files.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Tested-by: Grant Coady <gcoady.lk@gmail.com>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>

commit | commitdiff | tree

Jean Delvare [Thu, 3 Jan 2008 22:35:33 +0000 (23:35 +0100)]

hwmon: (lm77) Add individual alarm files

The new libsensors needs this. As the old library never had support for
the lm77 driver, I even dropped the legacy "alarms" file.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Hans de Goede <j.w.r.degoede@hhs.nl>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>

commit | commitdiff | tree

Jean Delvare [Thu, 3 Jan 2008 18:44:09 +0000 (19:44 +0100)]

hwmon: Discard useless I2C driver IDs

Many I2C hwmon drivers define a driver ID but no other code references
these, meaning that they are useless. Discard them, along with a few
IDs which are defined but never used at all.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>

commit | commitdiff | tree

Jean Delvare [Mon, 3 Dec 2007 22:28:42 +0000 (23:28 +0100)]

hwmon: (lm85) Make the pwmN_enable files writable

Make the pwmN_enable files writable. This makes it possible to use
standard fan speed control tools (pwmconfig, fancontrol) with the lm85
driver.

I left the non-standard pwmN_auto_channels files in place, as they
give additional control for the automatic mode, and some users might
be used to them by now.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>

commit | commitdiff | tree

Jean Delvare [Mon, 3 Dec 2007 22:23:21 +0000 (23:23 +0100)]

hwmon: (lm85) Return standard values in pwmN_enable

The values returned by the lm85 driver in pwmN_enable sysfs files do
not match the standard. Fix that.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>

commit | commitdiff | tree

Jean Delvare [Sun, 2 Dec 2007 22:42:24 +0000 (23:42 +0100)]

hwmon: (adm1031) Add individual alarm and fault files

The new libsensors needs these.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>

commit | commitdiff | tree

Jean Delvare [Sun, 2 Dec 2007 22:39:38 +0000 (23:39 +0100)]

hwmon: (adm1031) Get rid of macro-generated wrappers

Use the standard dynamic sysfs callbacks instead of macro-generated
wrappers. It makes the code simpler and the binary smaller (-8% on
my system).

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>

commit | commitdiff | tree

Jean Delvare [Sun, 2 Dec 2007 22:33:57 +0000 (23:33 +0100)]

hwmon: (adm1031) Various cleanups

* Rename new_client to client
* Drop redundant initializations to 0
* Drop trailing space
* Other whitespace cleanups
* Split/fold a few long lines
* Constify static data
* Optimizations in set_fan_div()

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>

commit | commitdiff | tree

Jean Delvare [Sun, 2 Dec 2007 22:32:42 +0000 (23:32 +0100)]

hwmon: (adm1031) Fix register overwrite in set_fan_div()

Don't rely on the register cache when setting a new fan clock divider.
For one thing, the cache might not have been initialized at all if the
driver has just been loaded. For another, the cached values may be old
and you never know what can happen behind the driver's back.

Also invalidate the cache instead of trying to adjust the measured fan
speed: the whole point of changing the clock divider is to get a better
reading.
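
The pattern described here (read the register fresh, change only the divider bits, write it back, then invalidate the cache) can be sketched with a simulated chip. Everything in this sketch is hypothetical, including the bit layout and the names fan_chip, chip_read, chip_write and set_fan_div_sketch; it is not the adm1031 driver code.

```c
#include <assert.h>
#include <stdint.h>

#define FAN_DIV_SHIFT 6                      /* hypothetical bit position */
#define FAN_DIV_MASK  (0x3u << FAN_DIV_SHIFT)

struct fan_chip {
    uint8_t hw_reg;      /* simulates the hardware register */
    uint8_t cached_reg;  /* driver-side cache, possibly stale */
    int cache_valid;
};

/* Simulated bus accessors. */
static uint8_t chip_read(struct fan_chip *c)
{
    return c->hw_reg;
}

static void chip_write(struct fan_chip *c, uint8_t v)
{
    c->hw_reg = v;
}

static void set_fan_div_sketch(struct fan_chip *c, unsigned int div_bits)
{
    /* Read the live register instead of trusting c->cached_reg. */
    uint8_t reg = chip_read(c);

    /* Replace only the divider field; keep all other bits as read. */
    reg = (uint8_t)((reg & ~FAN_DIV_MASK) |
                    ((div_bits << FAN_DIV_SHIFT) & FAN_DIV_MASK));
    chip_write(c, reg);

    /* Force the next reading to come from the hardware. */
    c->cache_valid = 0;
}
```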

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>

commit | commitdiff | tree

Jean Delvare [Fri, 14 Dec 2007 13:41:53 +0000 (14:41 +0100)]

hwmon: (it87) Delete pwmN_freq files on driver removal

In commit f8d0c19a93cea3a26a90f2462295e1e01a4cd250 I forgot to delete
the pwmN_freq files on driver removal, here's the fix.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Riku Voipio <riku.voipio@movial.fi>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>
