Robert P. J. Day [Tue, 29 Apr 2008 08:02:52 +0000 (01:02 -0700)]
nbd: delete superfluous test for __GNUC__
Since <linux/compiler.h> already tests for __GNUC__, there's no point in nbd.h
repeating that test.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Cc: Paul Clements <paul.clements@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Permit the use of partitions with network block devices (NBD).
A new parameter is introduced to define how many partition we want to be able
to manage per network block device. This parameter is "max_part".
For instance, to manage 63 partitions / loop device, we will do:
[on the server side]
# nbd-server 1234 /dev/sdb
[on the client side]
# modprobe nbd max_part=63
# ls -l /dev/nbd*
brw-rw---- 1 root disk 43, 0 2008-03-25 11:14 /dev/nbd0
brw-rw---- 1 root disk 43, 64 2008-03-25 11:11 /dev/nbd1
brw-rw---- 1 root disk 43, 640 2008-03-25 11:11 /dev/nbd10
brw-rw---- 1 root disk 43, 704 2008-03-25 11:11 /dev/nbd11
brw-rw---- 1 root disk 43, 768 2008-03-25 11:11 /dev/nbd12
brw-rw---- 1 root disk 43, 832 2008-03-25 11:11 /dev/nbd13
brw-rw---- 1 root disk 43, 896 2008-03-25 11:11 /dev/nbd14
brw-rw---- 1 root disk 43, 960 2008-03-25 11:11 /dev/nbd15
brw-rw---- 1 root disk 43, 128 2008-03-25 11:11 /dev/nbd2
brw-rw---- 1 root disk 43, 192 2008-03-25 11:11 /dev/nbd3
brw-rw---- 1 root disk 43, 256 2008-03-25 11:11 /dev/nbd4
brw-rw---- 1 root disk 43, 320 2008-03-25 11:11 /dev/nbd5
brw-rw---- 1 root disk 43, 384 2008-03-25 11:11 /dev/nbd6
brw-rw---- 1 root disk 43, 448 2008-03-25 11:11 /dev/nbd7
brw-rw---- 1 root disk 43, 512 2008-03-25 11:11 /dev/nbd8
brw-rw---- 1 root disk 43, 576 2008-03-25 11:11 /dev/nbd9
# nbd-client localhost 1234 /dev/nbd0
Negotiation: ..size = 80418240KB
bs=1024, sz=80418240
-------NOTE, RFC: partition table is not automatically read.
The driver sets bdev->bd_invalidated to 1 to force the read of the partition
table of the device, but this is done only on an open of the device.
So we have to do a "touch /dev/nbdX" or something like that.
It can't be done from the nbd-client or nbd driver because at this
level we can't ask to read the partition table and to serve the request
at the same time (-> deadlock)
If someone has a better idea, I'm open to any suggestion.
-------NOTE, RFC
# fdisk -l /dev/nbd0
Disk /dev/nbd0: 82.3 GB, 82348277760 bytes
255 heads, 63 sectors/track, 10011 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/nbd0p1 * 1 9965 80043831 83 Linux
/dev/nbd0p2 9966 10011 369495 5 Extended
/dev/nbd0p5 9966 10011 369463+ 82 Linux swap / Solaris
# ls -l /dev/nbd0*
brw-rw---- 1 root disk 43, 0 2008-03-25 11:16 /dev/nbd0
brw-rw---- 1 root disk 43, 1 2008-03-25 11:16 /dev/nbd0p1
brw-rw---- 1 root disk 43, 2 2008-03-25 11:16 /dev/nbd0p2
brw-rw---- 1 root disk 43, 5 2008-03-25 11:16 /dev/nbd0p5
# mount /dev/nbd0p1 /mnt
# ls /mnt
bin dev initrd lost+found opt sbin sys var
boot etc initrd.img media proc selinux tmp vmlinuz
cdrom home lib mnt root srv usr
# umount /mnt
# nbd-client -d /dev/nbd0
# ls -l /dev/nbd0*
brw-rw---- 1 root disk 43, 0 2008-03-25 11:16 /dev/nbd0
-------NOTE
On "nbd-client -d", we can do an iocl(BLKRRPART) to update partition table:
as the size of the device is 0, we don't have to serve the partition manager
request (-> no deadlock).
-------NOTE
Signed-off-by: Paul Clements <paul.clements@steeleye.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch allows Network Block Device to be mounted locally (nbd-client to
nbd-server over 127.0.0.1).
It creates a kthread to avoid the deadlock described in NBD tools
documentation. So, if nbd-client hangs waiting for pages, the kblockd thread
can continue its work and free pages.
I have tested the patch to verify that it avoids the hang that always occurs
when writing to a localhost nbd connection. I have also tested to verify that
no performance degradation results from the additional thread and queue.
Patch originally from Laurent Vivier.
Signed-off-by: Paul Clements <paul.clements@steeleye.com> Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tim Gardner [Tue, 29 Apr 2008 08:02:45 +0000 (01:02 -0700)]
edd: add default mode CONFIG_EDD_OFF=n, override with edd={on,off}
Add a kernel parameter option to 'edd' to enable/disable BIOS Enhanced Disk
Drive Services. CONFIG_EDD_OFF disables EDD while still compiling EDD into
the kernel. Default behavior can be forced using 'edd=on' or 'edd=off' as
a kernel parameter.
[akpm@linux-foundation.org: fix kernel-parameters.txt] Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Matt Domsch <Matt_Domsch@dell.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Tue, 29 Apr 2008 08:02:44 +0000 (01:02 -0700)]
sysctl: add the ->permissions callback on the ctl_table_root
When reading from/writing to some table, a root, which this table came from,
may affect this table's permissions, depending on who is working with the
table.
The core hunk is at the bottom of this patch. All the rest is just pushing
the ctl_table_root argument up to the sysctl_perm() function.
This will be mostly (only?) used in the net sysctls.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: Denis V. Lunev <den@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Tue, 29 Apr 2008 08:02:41 +0000 (01:02 -0700)]
sysctl: clean from unneeded extern and forward declarations
The do_sysctl_strategy isn't used outside kernel/sysctl.c, so this can be
static and without a prototype in header.
Besides, move this one and parse_table() above their callers and drop the
forward declarations of the latter call.
One more "besides" - fix two checkpatch warnings: space before a ( and an
extra space at the end of a line.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: Denis V. Lunev <den@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Tue, 29 Apr 2008 08:02:40 +0000 (01:02 -0700)]
sysctl: merge equal proc_sys_read and proc_sys_write
Many (most of) sysctls do not have a per-container sense. E.g.
kernel.print_fatal_signals, vm.panic_on_oom, net.core.netdev_budget and so on
and so forth. Besides, tuning then from inside a container is not even
secure. On the other hand, hiding them completely from the container's tasks
sometimes causes user-space to stop working.
When developing net sysctl, the common practice was to duplicate a table and
drop the write bits in table->mode, but this approach was not very elegant,
lead to excessive memory consumption and was not suitable in general.
Here's the alternative solution. To facilitate the per-container sysctls
ctl_table_root-s were introduced. Each root contains a list of
ctl_table_header-s that are visible to different namespaces. The idea of this
set is to add the permissions() callback on the ctl_table_root to allow ctl
root limit permissions to the same ctl_table-s.
The main user of this functionality is the net-namespaces code, but later this
will (should) be used by more and more namespaces, containers and control
groups.
Actually, this idea's core is in a single hunk in the third patch. First two
patches are cleanups for sysctl code, while the third one mostly extends the
arguments set of some sysctl functions.
This patch:
These ->read and ->write callbacks act in a very similar way, so merge these
paths to reduce the number of places to patch later and shrink the .text size
(a bit).
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: "David S. Miller" <davem@davemloft.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: Denis V. Lunev <den@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
proc: introduce proc_create_data to setup de->data
This set of patches fixes an proc ->open'less usage due to ->proc_fops flip in
the most part of the kernel code. The original OOPS is described in the
commit 2d3a4e3666325a9709cc8ea2e88151394e8f20fc:
Typical PDE creation code looks like:
pde = create_proc_entry("foo", 0, NULL);
if (pde)
pde->proc_fops = &foo_proc_fops;
Notice that PDE is first created, only then ->proc_fops is set up to
final value. This is a problem because right after creation
a) PDE is fully visible in /proc , and
b) ->proc_fops are proc_file_operations which do not have ->open callback. So, it's
possible to ->read without ->open (see one class of oopses below).
The fix is new API called proc_create() which makes sure ->proc_fops are
set up before gluing PDE to main tree. Typical new code looks like:
pde = proc_create("foo", 0, NULL, &foo_proc_fops);
if (!pde)
return -ENOMEM;
Fix most networking users for a start.
In the long run, create_proc_entry() for regular files will go.
In addition to this, proc_create_data is introduced to fix reading from
proc without PDE->data. The race is basically the same as above.
create_proc_entries is replaced in the entire kernel code as new method
is also simply better.
This patch:
The problem is the same as for de->proc_fops. Right now PDE becomes visible
without data set. So, the entry could be looked up without data. This, in
most cases, will simply OOPS.
proc_create_data call is created to address this issue. proc_create now
becomes a wrapper around it.
Signed-off-by: Denis V. Lunev <den@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Chris Mason <chris.mason@oracle.com> Acked-by: David Howells <dhowells@redhat.com> Cc: Dmitry Torokhov <dtor@mail.ru> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jaroslav Kysela <perex@suse.cz> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Karsten Keil <kkeil@suse.de> Cc: Kyle McMartin <kyle@parisc-linux.org> Cc: Len Brown <lenb@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Mikael Starvik <starvik@axis.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Neil Brown <neilb@suse.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Osterlund <petero2@telia.com> Cc: Pierre Peiffer <peifferp@gmail.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Takashi Iwai <tiwai@suse.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
proc: convert /proc/tty/ldiscs to seq_file interface
Note: THIS_MODULE and header addition aren't technically needed because
this code is not modular, but let's keep it anyway because people
can copy this code into modular code.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that last dozen or so users of ->get_info were removed, ditch it too.
Everyone sane shouldd have switched to seq_file interface long ago.
P.S.: Co-existing 3 interfaces (->get_info/->read_proc/->proc_fops) for proc
is long-standing crap, BTW, thus
a) put ->read_proc/->write_proc/read_proc_entry() users on death row,
b) new such users should be rejected,
c) everyone is encouraged to convert his favourite ->read_proc user or
I'll do it, lazy bastards.
proc: switch /proc/scsi/device_info to seq_file interface
Note 1: 0644 should be used, but root bypasses permissions, so writing
to /proc/scsi/device_info still works.
Note 2: looks like scsi_dev_info_list is unprotected
Note 3: probably make proc whine about "unwriteable but with ->write hook"
entries. Probably.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Cc: James Bottomley <James.Bottomley@SteelEye.com> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Matthew Wilcox <matthew@wil.cx> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
proc: switch /proc/irda/irnet to seq_file interface
Probably interface misuse, because of the way iterating over hashbin is done.
However! Printing of socket number ("IrNET socket %d - ", i++") made conversion
to proper ->start/->next difficult enough to do blindly without hardware.
Said that, please apply.
Remove useless comment while I am it.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Cc: Samuel Ortiz <samuel@sortiz.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
proc: switch /proc/bus/ecard/devices to seq_file interface
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Yani Ioannou <yani.ioannou@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If valid "parent" is passed to proc_create/remove_proc_entry(), then name of
PDE should consist of only one path component, otherwise creation or or
removal will fail. However, if NULL is passed as parent then create/remove
accept full path as a argument. This is arbitrary restriction -- all
infrastructure is in place.
proc_subdir_lock protects only modifying and walking through PDE lists, so
after we've found PDE to remove and actually removed it from lists, there is
no need to hold proc_subdir_lock for the rest of operation.
Roland McGrath [Tue, 29 Apr 2008 08:01:38 +0000 (01:01 -0700)]
procfs: mem permission cleanup
This cleans up the permission checks done for /proc/PID/mem i/o calls. It
puts all the logic in a new function, check_mem_permission().
The old code repeated the (!MAY_PTRACE(task) || !ptrace_may_attach(task))
magical expression multiple times. The new function does all that work in one
place, with clear comments.
The old code called security_ptrace() twice on successful checks, once in
MAY_PTRACE() and once in __ptrace_may_attach(). Now it's only called once,
and only if all other checks have succeeded.
Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matt Helsley [Tue, 29 Apr 2008 08:01:36 +0000 (01:01 -0700)]
procfs task exe symlink
The kernel implements readlink of /proc/pid/exe by getting the file from
the first executable VMA. Then the path to the file is reconstructed and
reported as the result.
Because of the VMA walk the code is slightly different on nommu systems.
This patch avoids separate /proc/pid/exe code on nommu systems. Instead of
walking the VMAs to find the first executable file-backed VMA we store a
reference to the exec'd file in the mm_struct.
That reference would prevent the filesystem holding the executable file
from being unmounted even after unmapping the VMAs. So we track the number
of VM_EXECUTABLE VMAs and drop the new reference when the last one is
unmapped. This avoids pinning the mounted filesystem.
[akpm@linux-foundation.org: improve comments]
[yamamoto@valinux.co.jp: fix dup_mmap] Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: David Howells <dhowells@redhat.com>
Cc:"Eric W. Biederman" <ebiederm@xmission.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
proc: print more information when removing non-empty directories
This usually saves one recompile to insert similar printk like below. :)
Sample nastygram:
remove_proc_entry: removing non-empty directory '/proc/foo', leaking at least 'bar'
------------[ cut here ]------------
WARNING: at fs/proc/generic.c:776 remove_proc_entry+0x18a/0x200()
Modules linked in: foo(-) container fan battery dock sbs ac sbshc backlight ipv6 loop af_packet amd_rng sr_mod i2c_amd8111 i2c_amd756 cdrom i2c_core button thermal processor
Pid: 3034, comm: rmmod Tainted: G M 2.6.25-rc1 #5
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Tue, 29 Apr 2008 08:01:34 +0000 (01:01 -0700)]
keys: make key_serial() a function if CONFIG_KEYS=y
Make key_serial() an inline function rather than a macro if CONFIG_KEYS=y.
This prevents double evaluation of the key pointer and also provides better
type checking.
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robert P. J. Day [Tue, 29 Apr 2008 08:01:32 +0000 (01:01 -0700)]
keys: explicitly include required slab.h header file.
Since these two source files invoke kmalloc(), they should explicitly
include <linux/slab.h>.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Maximum number of keys that each non-root user may have and the maximum
total number of bytes of data that each of those users may have stored in
their keys.
Also increase the quotas as a number of people have been complaining that it's
not big enough. I'm not sure that it's big enough now either, but on the
other hand, it can now be set in /etc/sysctl.conf.
Signed-off-by: David Howells <dhowells@redhat.com> Cc: <kwc@citi.umich.edu> Cc: <arunsr@cse.iitk.ac.in> Cc: <dwalsh@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Tue, 29 Apr 2008 08:01:31 +0000 (01:01 -0700)]
keys: don't generate user and user session keyrings unless they're accessed
Don't generate the per-UID user and user session keyrings unless they're
explicitly accessed. This solves a problem during a login process whereby
set*uid() is called before the SELinux PAM module, resulting in the per-UID
keyrings having the wrong security labels.
This also cures the problem of multiple per-UID keyrings sometimes appearing
due to PAM modules (including pam_keyinit) setuiding and causing user_structs
to come into and go out of existence whilst the session keyring pins the user
keyring. This is achieved by first searching for extant per-UID keyrings
before inventing new ones.
The serial bound argument is also dropped from find_keyring_by_name() as it's
not currently made use of (setting it to 0 disables the feature).
Signed-off-by: David Howells <dhowells@redhat.com> Cc: <kwc@citi.umich.edu> Cc: <arunsr@cse.iitk.ac.in> Cc: <dwalsh@redhat.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
keys: allow clients to set key perms in key_create_or_update()
The key_create_or_update() function provided by the keyring code has a default
set of permissions that are always applied to the key when created. This
might not be desirable to all clients.
Here's a patch that adds a "perm" parameter to the function to address this,
which can be set to KEY_PERM_UNDEF to revert to the current behaviour.
Signed-off-by: Arun Raghavan <arunsr@cse.iitk.ac.in> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Satyam Sharma <ssatyam@cse.iitk.ac.in> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Tue, 29 Apr 2008 08:01:26 +0000 (01:01 -0700)]
keys: add keyctl function to get a security label
Add a keyctl() function to get the security label of a key.
The following is added to Documentation/keys.txt:
(*) Get the LSM security context attached to a key.
long keyctl(KEYCTL_GET_SECURITY, key_serial_t key, char *buffer,
size_t buflen)
This function returns a string that represents the LSM security context
attached to a key in the buffer provided.
Unless there's an error, it always returns the amount of data it could
produce, even if that's too big for the buffer, but it won't copy more
than requested to userspace. If the buffer pointer is NULL then no copy
will take place.
A NUL character is included at the end of the string if the buffer is
sufficiently big. This is included in the returned count. If no LSM is
in force then an empty string will be returned.
A process must have view permission on the key for this function to be
successful.
[akpm@linux-foundation.org: declare keyctl_get_security()] Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Cc: Paul Moore <paul.moore@hp.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: James Morris <jmorris@namei.org> Cc: Kevin Coffman <kwc@citi.umich.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Tue, 29 Apr 2008 08:01:24 +0000 (01:01 -0700)]
keys: allow the callout data to be passed as a blob rather than a string
Allow the callout data to be passed as a blob rather than a string for
internal kernel services that call any request_key_*() interface other than
request_key(). request_key() itself still takes a NUL-terminated string.
Signed-off-by: David Howells <dhowells@redhat.com> Cc: Paul Moore <paul.moore@hp.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Kevin Coffman <kwc@citi.umich.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kevin Coffman [Tue, 29 Apr 2008 08:01:22 +0000 (01:01 -0700)]
keys: check starting keyring as part of search
Check the starting keyring as part of the search to (a) see if that is what
we're searching for, and (b) to check it is still valid for searching.
The scenario: User in process A does things that cause things to be created in
its process session keyring. The user then does an su to another user and
starts a new process, B. The two processes now share the same process session
keyring.
Process B does an NFS access which results in an upcall to gssd. When gssd
attempts to instantiate the context key (to be linked into the process session
keyring), it is denied access even though it has an authorization key.
The order of calls is:
keyctl_instantiate_key()
lookup_user_key() (the default: case)
search_process_keyrings(current)
search_process_keyrings(rka->context) (recursive call)
keyring_search_aux()
keyring_search_aux() verifies the keys and keyrings underneath the top-level
keyring it is given, but that top-level keyring is neither fully validated nor
checked to see if it is the thing being searched for.
This patch changes keyring_search_aux() to:
1) do more validation on the top keyring it is given and
2) check whether that top-level keyring is the thing being searched for
Signed-off-by: Kevin Coffman <kwc@citi.umich.edu> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Paul Moore <paul.moore@hp.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Kevin Coffman <kwc@citi.umich.edu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Tue, 29 Apr 2008 08:01:19 +0000 (01:01 -0700)]
keys: increase the payload size when instantiating a key
Increase the size of a payload that can be used to instantiate a key in
add_key() and keyctl_instantiate_key(). This permits huge CIFS SPNEGO blobs
to be passed around. The limit is raised to 1MB. If kmalloc() can't allocate
a buffer of sufficient size, vmalloc() will be tried instead.
Signed-off-by: David Howells <dhowells@redhat.com> Cc: Paul Moore <paul.moore@hp.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Kevin Coffman <kwc@citi.umich.edu> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
WANG Cong [Tue, 29 Apr 2008 08:01:18 +0000 (01:01 -0700)]
elf: fix shadowed variables in fs/binfmt_elf.c
Fix these sparse warings:
fs/binfmt_elf.c:1749:29: warning: symbol 'tmp' shadows an earlier one
fs/binfmt_elf.c:1734:28: originally declared here
fs/binfmt_elf.c:2009:26: warning: symbol 'vma' shadows an earlier one
fs/binfmt_elf.c:1892:24: originally declared here
[akpm@linux-foundation.org: chose better variable name] Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robert P. J. Day [Tue, 29 Apr 2008 08:01:14 +0000 (01:01 -0700)]
ipmi: make comment match actual preprocessor check
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lots of style fixes for the miscellaneous IPMI files. No functional
changes. Basically fixes everything reported by checkpatch and fixes the
comment style.
Lots of style fixes for the IPMI system interface driver. No functional
changes. Basically fixes everything reported by checkpatch and fixes the
comment style.
Convert the #defines for statistics into an enum in the IPMI system interface
and remove the unused timeout_restart statistic. And comment what these
statistics mean.
Signed-off-by: Corey Minyard <cminyard@mvista.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The "run_to_completion" mode was somewhat broken. Locks need to be avoided in
run_to_completion mode, and it shouldn't be used by normal users, just
internally for panic situations.
This patch removes locks in run_to_completion mode and removes the user call
for setting the mode. The only user was the poweroff code, but it was easily
converted to use the polling interface.
CLONE_NEWIPC|CLONE_SYSVSEM interaction isn't handled properly. This can cause
a kernel memory corruption. CLONE_NEWIPC must detach from the existing undo
lists.
Fix, part 3: refuse clone(CLONE_SYSVSEM|CLONE_NEWIPC).
With unshare, specifying CLONE_SYSVSEM means unshare the sysvsem. So it seems
reasonable that CLONE_NEWIPC without CLONE_SYSVSEM would just imply
CLONE_SYSVSEM.
However with clone, specifying CLONE_SYSVSEM means *share* the sysvsem. So
calling clone(CLONE_SYSVSEM|CLONE_NEWIPC) is explicitly asking for something
we can't allow. So return -EINVAL in that case.
[akpm@linux-foundation.org: cleanups] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Manfred Spraul <manfred@colorfullife.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Pierre Peiffer <peifferp@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ipc: sysvsem: force unshare(CLONE_SYSVSEM) when CLONE_NEWIPC
sys_unshare(CLONE_NEWIPC) doesn't handle the undo lists properly, this can
cause a kernel memory corruption. CLONE_NEWIPC must detach from the existing
undo lists.
Fix, part 2: perform an implicit CLONE_SYSVSEM in CLONE_NEWIPC. CLONE_NEWIPC
creates a new IPC namespace, the task cannot access the existing semaphore
arrays after the unshare syscall. Thus the task can/must detach from the
existing undo list entries, too.
This fixes the kernel corruption, because it makes it impossible that
undo records from two different namespaces are in sysvsem.undo_list.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Pierre Peiffer <peifferp@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sys_unshare(CLONE_NEWIPC) doesn't handle the undo lists properly, this can
cause a kernel memory corruption. CLONE_NEWIPC must detach from the existing
undo lists.
Fix, part 1: add support for sys_unshare(CLONE_SYSVSEM)
The original reason to not support it was the potential (inevitable?)
confusion due to the fact that sys_unshare(CLONE_SYSVSEM) has the
inverse meaning of clone(CLONE_SYSVSEM).
Our two most reasonable options then appear to be (1) fully support
CLONE_SYSVSEM, or (2) continue to refuse explicit CLONE_SYSVSEM,
but always do it anyway on unshare(CLONE_SYSVSEM). This patch does
(1).
Changelog:
Apr 16: SEH: switch to Manfred's alternative patch which
removes the unshare_semundo() function which
always refused CLONE_SYSVSEM.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Pierre Peiffer <peifferp@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pierre Peiffer [Tue, 29 Apr 2008 08:00:54 +0000 (01:00 -0700)]
IPC: consolidate all xxxctl_down() functions
semctl_down(), msgctl_down() and shmctl_down() are used to handle the same set
of commands for each kind of IPC. They all start to do the same job (they
retrieve the ipc and do some permission checks) before handling the commands
on their own.
This patch proposes to consolidate this by moving these same pieces of code
into one common function called ipcctl_pre_down().
It simplifies a little these xxxctl_down() functions and increases a little
the maintainability.
Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pierre Peiffer [Tue, 29 Apr 2008 08:00:51 +0000 (01:00 -0700)]
IPC: introduce ipc_update_perm()
The IPC_SET command performs the same permission setting for all IPCs. This
patch introduces a common ipc_update_perm() function to update these
permissions and makes use of it for all IPCs.
Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pierre Peiffer [Tue, 29 Apr 2008 08:00:50 +0000 (01:00 -0700)]
IPC: get rid of the use *_setbuf structure.
All IPCs make use of an intermetiate *_setbuf structure to handle the IPC_SET
command. This is not really needed and, moreover, it complicates a little bit
the code.
This patch gets rid of the use of it and uses directly the semid64_ds/
msgid64_ds/shmid64_ds structure.
In addition of removing one struture declaration, it also simplifies and
improves a little bit the common 64-bits path.
Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pierre Peiffer [Tue, 29 Apr 2008 08:00:49 +0000 (01:00 -0700)]
IPC/semaphores: move the rwmutex handling inside semctl_down
semctl_down is called with the rwmutex (the one which protects the list of
ipcs) taken in write mode.
This patch moves this rwmutex taken in write-mode inside semctl_down.
This has the advantages of reducing a little bit the window during which this
rwmutex is taken, clarifying sys_semctl, and finally of having a coherent
behaviour with [shm|msg]ctl_down
Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pierre Peiffer [Tue, 29 Apr 2008 08:00:48 +0000 (01:00 -0700)]
IPC/message queues: introduce msgctl_down
Currently, sys_msgctl is not easy to read.
This patch tries to improve that by introducing the msgctl_down function to
handle all commands requiring the rwmutex to be taken in write mode (ie
IPC_SET and IPC_RMID for now). It is the equivalent function of semctl_down
for message queues.
This greatly changes the readability of sys_msgctl and also harmonizes the way
these commands are handled among all IPCs.
Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pierre Peiffer [Tue, 29 Apr 2008 08:00:47 +0000 (01:00 -0700)]
IPC/shared memory: introduce shmctl_down
Currently, the way the different commands are handled in sys_shmctl introduces
some duplicated code.
This patch introduces the shmctl_down function to handle all the commands
requiring the rwmutex to be taken in write mode (ie IPC_SET and IPC_RMID for
now). It is the equivalent function of semctl_down for shared memory.
This removes some duplicated code for handling these both commands and
harmonizes the way they are handled among all IPCs.
Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>