err.no Git - linux-2.6/log
linux-2.6
17 years ago[PATCH] fdtable: Make fdarray and fdsets equal in size
Vadim Lobanov [Sun, 10 Dec 2006 10:21:12 +0000 (02:21 -0800)]
[PATCH] fdtable: Make fdarray and fdsets equal in size

Currently, each fdtable supports three dynamically-sized arrays of data: the
fdarray and two fdsets.  The code allows the number of fds supported by the
fdarray (fdtable->max_fds) to differ from the number of fds supported by each
of the fdsets (fdtable->max_fdset).

In practice, it is wasteful for these two sizes to differ: whenever we hit a
limit on the smaller-capacity structure, we will reallocate the entire fdtable
and all the dynamic arrays within it, so any delta in the memory used by the
larger-capacity structure will never be touched at all.

Rather than hogging this excess, we shouldn't even allocate it in the first
place, and keep the capacities of the fdarray and the fdsets equal.  This
patch removes fdtable->max_fdset.  As an added bonus, most of the supporting
code becomes simpler.
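
A minimal sketch of the idea, not the kernel's actual structures: with a
single max_fds capacity, the fd array and both fd sets are sized from the same
number, so no array is ever larger than the others.  The struct layout, every
field name other than max_fds, and the helper fdtable_alloc_sketch() are
assumptions made for illustration.

```c
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical, simplified fdtable: one capacity covers all three arrays. */
struct fdtable_sketch {
	unsigned int max_fds;          /* capacity of fd[], open_fds and close_on_exec */
	void **fd;                     /* stand-in for the struct file pointer array */
	unsigned long *open_fds;       /* bitmap, one bit per fd */
	unsigned long *close_on_exec;  /* bitmap, one bit per fd */
};

/* Number of longs needed for a bitmap covering nr bits. */
static size_t bitmap_longs(unsigned int nr)
{
	return (nr + BITS_PER_LONG_SKETCH - 1) / BITS_PER_LONG_SKETCH;
}

/* Size all three arrays from the same capacity; 0 on success, -1 on failure. */
static int fdtable_alloc_sketch(struct fdtable_sketch *fdt, unsigned int nr)
{
	fdt->fd = calloc(nr, sizeof(*fdt->fd));
	fdt->open_fds = calloc(bitmap_longs(nr), sizeof(unsigned long));
	fdt->close_on_exec = calloc(bitmap_longs(nr), sizeof(unsigned long));
	if (!fdt->fd || !fdt->open_fds || !fdt->close_on_exec) {
		free(fdt->fd);
		free(fdt->open_fds);
		free(fdt->close_on_exec);
		return -1;
	}
	fdt->max_fds = nr;
	return 0;
}
```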

Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] fdtable: Delete pointless code in dup_fd()
Vadim Lobanov [Sun, 10 Dec 2006 10:21:09 +0000 (02:21 -0800)]
[PATCH] fdtable: Delete pointless code in dup_fd()

The dup_fd() function creates a new files_struct and fdtable embedded inside
that files_struct, and then possibly expands the fdtable using expand_files().

The out_release error path is invoked when expand_files() returns an error
code.  However, when this attempt to expand fails, the fdtable is left in its
original embedded form, so it is pointless to try to free the associated
fdarray and fdsets.

Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] dio: lock refcount operations
Zach Brown [Sun, 10 Dec 2006 10:21:07 +0000 (02:21 -0800)]
[PATCH] dio: lock refcount operations

The wait_for_more_bios() function name was poorly chosen.  While looking to
clean it up I noticed that the dio struct refcounting between the bio
completion and dio submission paths was racy.

The bio submission path was simply freeing the dio struct if
atomic_dec_and_test() indicated that it dropped the final reference.

The aio bio completion path was dereferencing its dio struct pointer *after
dropping its reference* based on the remaining number of references.

These two paths could race and result in the aio bio completion path
dereferencing a freed dio, though this was not observed in the wild.

This moves the refcount under the bio lock so that bio completion can drop
its reference and decide to wake all in one atomic step.

Once testing and waking are locked, dio_await_one() can test its sleeping
condition and mark itself uninterruptible under the lock.  It gets simpler
and wait_for_more_bios() disappears.
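
As a rough userspace sketch of "drop the reference and decide to wake in one
atomic step": the struct, the spin lock built on a C11 atomic_flag, and the
wakeups counter below are all assumptions standing in for the real dio struct,
the bio lock and the task wake-up.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the dio struct: a lock-protected reference count. */
struct dio_lock_sketch {
	atomic_flag lock;       /* plays the role of the bio lock */
	int refcount;           /* protected by lock */
	bool waiter_sleeping;   /* protected by lock */
	int wakeups;            /* counts wake-ups, for illustration only */
};

static void sketch_lock(struct dio_lock_sketch *dio)
{
	while (atomic_flag_test_and_set(&dio->lock))
		;	/* spin until the flag was previously clear */
}

static void sketch_unlock(struct dio_lock_sketch *dio)
{
	atomic_flag_clear(&dio->lock);
}

/*
 * Completion side: drop a reference and decide whether to wake the waiter
 * while still holding the lock.  The caller gets the remaining count back
 * and never needs to read the struct again to make that decision.
 */
static int dio_put_and_maybe_wake(struct dio_lock_sketch *dio)
{
	int remaining;

	sketch_lock(dio);
	remaining = --dio->refcount;
	if (remaining == 1 && dio->waiter_sleeping)
		dio->wakeups++;	/* real code would wake the waiting task here */
	sketch_unlock(dio);
	return remaining;
}
```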

The addition of the interrupt-masking spin lock acquisition in dio_bio_submit()
looks alarming.  This lock acquisition existed in that path before the recent
dio completion patch set.  We shouldn't expect significant performance
regression from returning to the behaviour that existed before the
completion clean up work.

This passed 4k block ext3 O_DIRECT fsx and aio-stress on an SMP machine.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Suparna Bhattacharya <suparna@in.ibm.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: <xfs-masters@oss.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] dio: only call aio_complete() after returning -EIOCBQUEUED
Zach Brown [Sun, 10 Dec 2006 10:21:05 +0000 (02:21 -0800)]
[PATCH] dio: only call aio_complete() after returning -EIOCBQUEUED

The only time it is safe to call aio_complete() is when the ->ki_retry
function returns -EIOCBQUEUED to the AIO core.  direct_io_worker() has
historically done this by relying on its caller to translate positive return
codes into -EIOCBQUEUED for the aio case.  It did this by trying to keep
conditionals in sync.  direct_io_worker() knew when finished_one_bio() was
going to call aio_complete().  It would reverse the test and wait and free the
dio in the cases it thought that finished_one_bio() wasn't going to.

Not surprisingly, it ended up getting it wrong.  'ret' could be a negative
errno from the submission path but it failed to communicate this to
finished_one_bio().  direct_io_worker() would return < 0, its callers
wouldn't raise -EIOCBQUEUED, and aio_complete() would be called.  In the
future finished_one_bio()'s tests wouldn't reflect this and aio_complete()
would be called for a second time which can manifest as an oops.

The previous cleanups have whittled the sync and async completion paths down
to the point where we can collapse them and clearly reassert the invariant
that we must only call aio_complete() after returning -EIOCBQUEUED.
direct_io_worker() will only return -EIOCBQUEUED when it is not the last to
drop the dio refcount and the aio bio completion path will only call
aio_complete() when it is the last to drop the dio refcount.
direct_io_worker() can ensure that it is the last to drop the reference count
by waiting for bios to drain.  It does this for sync ops, of course, and for
partial dio writes that must fall back to buffered and for aio ops that saw
errors during submission.

This means that operations that end up waiting, even if they were issued as
aio ops, will not call aio_complete() from dio.  Instead we return the return
code of the operation and let the aio core call aio_complete().  This is
purposely done to fix a bug where AIO DIO file extensions would call
aio_complete() before their callers have a chance to update i_size.
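
The invariant can be modelled in a few lines.  This is a hedged sketch, not
the code in fs/direct-io.c: the struct, both function names and the locally
defined EIOCBQUEUED_SKETCH constant are assumptions.  Whichever side drops the
last reference owns completion, so aio_complete() runs only when the worker
has already returned -EIOCBQUEUED.

```c
#define EIOCBQUEUED_SKETCH 529	/* local stand-in for the kernel's EIOCBQUEUED */

/* Hypothetical model of a dio shared by the submitter and one bio. */
struct aio_dio_sketch {
	int refcount;
	int aio_complete_calls;	/* how many times completion was signalled */
};

/* Submission side: drop the worker's reference. */
static long worker_drop_ref(struct aio_dio_sketch *dio, long result)
{
	if (--dio->refcount == 0)
		return result;		/* last reference: the aio core completes the iocb */
	return -EIOCBQUEUED_SKETCH;	/* bios still in flight: completion comes later */
}

/* Completion side: drop a bio's reference. */
static void bio_drop_ref(struct aio_dio_sketch *dio)
{
	if (--dio->refcount == 0)
		dio->aio_complete_calls++;	/* stand-in for aio_complete() */
}
```

In either ordering of the two drops, exactly one of "return the result" and
"call aio_complete()" happens.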

Now that direct_io_worker() is explicitly returning -EIOCBQUEUED its callers
no longer have to translate for it.  XFS needs to be careful not to free
resources that will be used during AIO completion if -EIOCBQUEUED is returned.
We maintain the previous behaviour of trying to write fs metadata for O_SYNC
aio+dio writes.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Suparna Bhattacharya <suparna@in.ibm.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Cc: <xfs-masters@oss.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] dio: remove duplicate bio wait code
Zach Brown [Sun, 10 Dec 2006 10:21:01 +0000 (02:21 -0800)]
[PATCH] dio: remove duplicate bio wait code

Now that we have a single refcount and waiting path we can reuse it in the
async 'should_wait' path.  It continues to rely on the fragile link between
the conditional in dio_complete_aio() which decides to complete the AIO and
the conditional in direct_io_worker() which decides to wait and free.

By waiting before dropping the reference we stop dio_bio_end_aio() from
calling dio_complete_aio() which used to wake up the waiter after seeing the
reference count drop to 0.  We hoist this wake up into dio_bio_end_aio() which
now notices when it's left a single remaining reference that is held by the
waiter.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Suparna Bhattacharya <suparna@in.ibm.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] dio: formalize bio counters as a dio reference count
Zach Brown [Sun, 10 Dec 2006 10:20:59 +0000 (02:20 -0800)]
[PATCH] dio: formalize bio counters as a dio reference count

Previously we had two confusing counts of bio progress.  'bio_count' was
decremented as bios were processed and freed by the dio core.  It was used to
indicate final completion of the dio operation.  'bios_in_flight' reflected
how many bios were between submit_bio() and bio->end_io.  It was used by the
sync path to decide when to wake up and finish completing bios and was ignored
by the async path.

This patch collapses the two notions into one notion of a dio reference count.
Bios hold a dio reference when they're between submit_bio() and bio->end_io.

Since bios_in_flight was only used in the sync path it is now equivalent to
dio->refcount - 1 which accounts for direct_io_worker() holding a reference
for the duration of the operation.

dio_bio_complete() -> finished_one_bio() was called from the sync path after
finding bios on the list that the bio->end_io function had deposited.
finished_one_bio() can not drop the dio reference on behalf of these bios now
because bio->end_io already has.  The is_async test in finished_one_bio()
meant that it never actually did anything other than drop the bio_count for
sync callers.  So we remove its refcount decrement, don't call it from
dio_bio_complete(), and hoist its call up into the async dio_bio_complete()
caller after an explicit refcount decrement.  It is renamed dio_complete_aio()
to reflect the remaining work it actually does.
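
A small model of the single count, under the assumption that the worker takes
the initial reference and each submitted bio takes one more.  All names here
are hypothetical; only the relation bios_in_flight == refcount - 1 comes from
the description above.

```c
/* Hypothetical model of the unified dio reference count. */
struct dio_ref_sketch {
	int refcount;
};

/* The worker holds one reference for the duration of the operation. */
static void dio_ref_init(struct dio_ref_sketch *dio)
{
	dio->refcount = 1;
}

/* A bio takes a reference when it is submitted... */
static void dio_ref_submit_bio(struct dio_ref_sketch *dio)
{
	dio->refcount++;
}

/* ...and drops it in its end_io handler; returns the remaining count. */
static int dio_ref_end_io(struct dio_ref_sketch *dio)
{
	return --dio->refcount;
}

/* While the worker still holds its reference, this is the old bios_in_flight. */
static int dio_ref_bios_in_flight(const struct dio_ref_sketch *dio)
{
	return dio->refcount - 1;
}
```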

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Suparna Bhattacharya <suparna@in.ibm.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] dio: call blk_run_address_space() once per op
Zach Brown [Sun, 10 Dec 2006 10:20:56 +0000 (02:20 -0800)]
[PATCH] dio: call blk_run_address_space() once per op

We only need to call blk_run_address_space() once after all the bios for the
direct IO op have been submitted.  This removes the chance of calling
blk_run_address_space() after spurious wake ups as the sync path waits for
bios to drain.  It's also one less difference between the sync and async paths.

In the process we remove a redundant dio_bio_submit() that its caller had
already performed.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Suparna Bhattacharya <suparna@in.ibm.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] dio: centralize completion in dio_complete()
Zach Brown [Sun, 10 Dec 2006 10:20:54 +0000 (02:20 -0800)]
[PATCH] dio: centralize completion in dio_complete()

There have been a lot of bugs recently due to the way direct_io_worker() tries
to decide how to finish direct IO operations.  In the worst examples it has
failed to call aio_complete() at all (hang) or called it too many times
(oops).

This set of patches cleans up the completion phase with the goal of removing
the complexity that led to these bugs.  We end up with one path that
calculates the result of the operation after all of the bios have completed.
We decide when to generate a result of the operation using that path based on
the final release of a refcount on the dio structure.

I tried to progress towards the final state in steps that were relatively easy
to understand.  Each step should compile but I only tested the final result of
having all the patches applied.

I've tested these on low end PC drives with aio-stress, the direct IO tests I
could manage to get running in LTP, orasim, and some home-brew functional
tests.

In http://lkml.org/lkml/2006/9/21/103 IBM reports success with ext2 and ext3
running DIO LTP tests.  They found that XFS bug which has since been addressed
in the patch series.

This patch:

The mechanics which decide the result of a direct IO operation were duplicated
in the sync and async paths.

The async path didn't check page_errors which can manifest as silently
returning success when the final pointer in an operation faults and its
matching file region is filled with zeros.

The sync path and async path differed in whether they passed errors to the
caller's dio->end_io operation.  The async path was passing errors to it which
trips an assertion in XFS, though it is apparently harmless.

This centralizes the completion phase of dio ops in one place.  AIO will now
return EFAULT consistently and all paths fall back to the previously sync
behaviour of passing the number of bytes 'transferred' to the dio->end_io
callback, regardless of errors.

dio_await_completion() doesn't have to propagate EIO from non-uptodate bios
now that it's being propagated through dio_complete() via dio->io_error.  This
lets it return void which simplifies its sole caller.
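
One way to picture a single place that computes the result is the sketch
below.  The struct, the function name and in particular the precedence order
(submission error, then page fault error, then I/O error, then bytes
transferred) are assumptions for illustration, not a copy of dio_complete().

```c
#include <errno.h>

/* Hypothetical subset of the per-operation state consulted at completion. */
struct dio_result_sketch {
	long page_errors;	/* e.g. -EFAULT from a faulting user pointer, else 0 */
	long io_error;		/* e.g. -EIO from a non-uptodate bio, else 0 */
	long transferred;	/* bytes transferred */
};

/* Pick the first non-zero error; fall back to the byte count. */
static long dio_pick_result(const struct dio_result_sketch *dio, long ret)
{
	if (ret == 0)
		ret = dio->page_errors;
	if (ret == 0)
		ret = dio->io_error;
	if (ret == 0)
		ret = dio->transferred;
	return ret;
}
```

Because sync and async callers share the one function, a page fault error can
no longer be silently dropped on the async path.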

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Suparna Bhattacharya <suparna@in.ibm.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] md: assorted md and raid1 one-liners
NeilBrown [Sun, 10 Dec 2006 10:20:52 +0000 (02:20 -0800)]
[PATCH] md: assorted md and raid1 one-liners

Fix a few bugs that meant that:
  - superblocks weren't always written at exactly the right time (this
    could show up if the array was not written to - writing to the array
    causes lots of superblock updates and so hides these errors).

  - restarting device recovery after a clean shutdown (version-1 metadata
    only) didn't work as intended (or at all).

1/ Ensure superblock is updated when a new device is added.
2/ Remove an inappropriate test on MD_RECOVERY_SYNC in md_do_sync.
   The body of this if takes one of two branches depending on whether
   MD_RECOVERY_SYNC is set, so testing it in the clause of the if
   is wrong.
3/ Flag superblock for updating after a resync/recovery finishes.
4/ If we find the need to restart a recovery in the middle (version-1
   metadata only) make sure a full recovery (not just as guided by
   bitmaps) does get done.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] md: return a non-zero error to bi_end_io as appropriate in raid5
NeilBrown [Sun, 10 Dec 2006 10:20:51 +0000 (02:20 -0800)]
[PATCH] md: return a non-zero error to bi_end_io as appropriate in raid5

Currently raid5 depends on clearing the BIO_UPTODATE flag to signal an error
to higher levels.  While this should be sufficient, it is safer to explicitly
set the error code as well - less room for confusion.
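
A hedged sketch of signalling the failure both ways.  The bio_sketch struct,
the BIO_UPTODATE_SKETCH bit and the callback signature are simplified
stand-ins, not the block layer's real definitions.

```c
#include <errno.h>

#define BIO_UPTODATE_SKETCH 0x1UL	/* hypothetical "data is valid" flag bit */

/* Hypothetical, stripped-down bio. */
struct bio_sketch {
	unsigned long flags;
	int last_error;		/* what the end_io callback was told */
};

/* Derive an explicit error code from the flag instead of always passing 0. */
static int bio_error_code(const struct bio_sketch *bio)
{
	return (bio->flags & BIO_UPTODATE_SKETCH) ? 0 : -EIO;
}

/* Complete the bio, handing the explicit error to the callback. */
static void finish_bio(struct bio_sketch *bio,
		       void (*end_io)(struct bio_sketch *, int))
{
	end_io(bio, bio_error_code(bio));
}

/* Example callback that just records the error it received. */
static void record_error(struct bio_sketch *bio, int error)
{
	bio->last_error = error;
}
```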

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] md: remove some old ifdefed-out code from raid5.c
NeilBrown [Sun, 10 Dec 2006 10:20:50 +0000 (02:20 -0800)]
[PATCH] md: remove some old ifdefed-out code from raid5.c

There are some vestiges of old code that was used for bypassing the stripe
cache on reads in raid5.c.  This was never updated after the change from
buffer_heads to bios, but was left as a reminder.

That functionality has now been implemented in a completely different way, so
the old code can go.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] MD: conditionalize some code
Jeff Garzik [Sun, 10 Dec 2006 10:20:50 +0000 (02:20 -0800)]
[PATCH] MD: conditionalize some code

The autorun code is only used if this module is built into the static
kernel image.  Adjust #ifdefs accordingly.

Signed-off-by: Jeff Garzik <jeff@garzik.org>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] md: fix innocuous bug in raid6 stripe_to_pdidx
NeilBrown [Sun, 10 Dec 2006 10:20:49 +0000 (02:20 -0800)]
[PATCH] md: fix innocuous bug in raid6 stripe_to_pdidx

stripe_to_pdidx finds the index of the parity disk for a given stripe.  It
assumes raid5 in that it uses "disks-1" to determine the number of data disks.

This is incorrect for raid6 but fortunately the two usages cancel each other
out.  The only way that 'data_disks' affects the calculation of pd_idx in
raid5_compute_sector is when it is divided into the sector number.  But as
that sector number is calculated by multiplying in the wrong value of
'data_disks' the division produces the right value.

So it is innocuous but needs to be fixed.

Also change the calculation of raid_disks in compute_blocknr to make it
more obviously correct (it seems at first to always use disks-1 too).
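
The cancellation can be checked with plain integer arithmetic.  This sketch
only mirrors the multiply-then-divide shape described above; the function and
parameter names are assumptions, and chunk_offset is assumed to be smaller
than sectors_per_chunk.

```c
/*
 * Build a sector number from a stripe using some data-disk count, then
 * recover the stripe by dividing by the same count.  Any non-zero count
 * gives the stripe back, which is why the wrong value was harmless.
 */
static unsigned long long stripe_round_trip(unsigned long long stripe,
					    unsigned int sectors_per_chunk,
					    unsigned int chunk_offset,
					    unsigned int data_disks)
{
	unsigned long long sector =
		stripe * data_disks * sectors_per_chunk + chunk_offset;
	unsigned long long chunk_number = sector / sectors_per_chunk;

	return chunk_number / data_disks;
}
```

For a 6-disk raid6 array the correct count is 4 and "disks-1" gives 5; both
values return the original stripe.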

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] md: enable bypassing cache for reads
Raz Ben-Jehuda(caro) [Sun, 10 Dec 2006 10:20:48 +0000 (02:20 -0800)]
[PATCH] md: enable bypassing cache for reads

Call the chunk_aligned_read where appropriate.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] md: allow reads that have bypassed the cache to be retried on failure
Raz Ben-Jehuda(caro) [Sun, 10 Dec 2006 10:20:47 +0000 (02:20 -0800)]
[PATCH] md: allow reads that have bypassed the cache to be retried on failure

If a bypass-the-cache read fails, we simply try again through the cache.  If
it fails again it will trigger normal recovery procedures.

update 1:

From: NeilBrown <neilb@suse.de>

1/
  chunk_aligned_read and retry_aligned_read assume that
      data_disks == raid_disks - 1
  which is not true for raid6.
  So when an aligned read request bypasses the cache, we can get the wrong data.

2/ The cloned bio is being used-after-free in raid5_align_endio
   (to test BIO_UPTODATE).

3/ We forgot to add rdev->data_offset when submitting
   a bio for aligned-read

4/ clone_bio calls blk_recount_segments and then we change bi_bdev,
   so we need to invalidate the segment counts.

5/ We don't de-reference the rdev when the read completes.
   This means we need to record the rdev so it is still
   available in the end_io routine.  Fortunately
   bi_next in the original bio is unused at this point so
   we can stuff it in there.

6/ We leak a cloned bio if the target rdev is not usable.

From: NeilBrown <neilb@suse.de>

update 2:

1/ When aligned requests fail (read error) they need to be retried
   via the normal method (stripe cache).  As we cannot be sure that
   we can process a single read in one go (we may not be able to
   allocate all the stripes needed) we store a bio-being-retried
   and a list of bios-that-still-need-to-be-retried.
   When we find a bio that needs to be retried, we should add it to
   the list, not to single-bio...

2/ We were never incrementing 'scnt' when resubmitting failed
   aligned requests.

[akpm@osdl.org: build fix]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] md: handle bypassing the read cache (assuming nothing fails)
Raz Ben-Jehuda(caro) [Sun, 10 Dec 2006 10:20:46 +0000 (02:20 -0800)]
[PATCH] md: handle bypassing the read cache (assuming nothing fails)

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] md: define raid5_mergeable_bvec
Raz Ben-Jehuda(caro) [Sun, 10 Dec 2006 10:20:45 +0000 (02:20 -0800)]
[PATCH] md: define raid5_mergeable_bvec

This will encourage read requests to be on only one device, so we will often be
able to bypass the cache for read requests.

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] md: tidy up device-change notification when an md array is stopped
NeilBrown [Sun, 10 Dec 2006 10:20:44 +0000 (02:20 -0800)]
[PATCH] md: tidy up device-change notification when an md array is stopped

An md array can be stopped leaving all the settings still in place, or it can
be torn down and destroyed.  set_capacity and other change notifications only
happen in the latter case, but should happen in both.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] Fbdev driver for IBM GXT4500P videocards
Paul Mackerras [Sun, 10 Dec 2006 10:20:42 +0000 (02:20 -0800)]
[PATCH] Fbdev driver for IBM GXT4500P videocards

This is an fbdev driver for the IBM GXT4500P display card found in some IBM
System P (pSeries) machines.  These cards have hardware 2D and 3D
capabilities, but the driver does not use them; it just exports a dumb
framebuffer.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: James Simmons <jsimmons@infradead.org>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] ide-cd: Handle strange interrupt on the Intel ESB2
Alan Cox [Sun, 10 Dec 2006 10:20:39 +0000 (02:20 -0800)]
[PATCH] ide-cd: Handle strange interrupt on the Intel ESB2

The ESB2 appears to emit spurious DMA interrupts when configured for native
mode and handling ATAPI devices.  Stratus were able to pin this bug down and
produce a patch.  This is a rework which applies the fixup only to the ESB2
(for now).  We can apply it to other chips later if the same problem is found.

This code has been tested and confirmed to fix the problem on the tested
systems.

Signed-off-by: Alan Cox <alan@redhat.com>
(Most of the hard work done by Stratus however)
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] kernel/sched.c: whitespace cleanups
Miguel Ojeda Sandonis [Sun, 10 Dec 2006 10:20:38 +0000 (02:20 -0800)]
[PATCH] kernel/sched.c: whitespace cleanups

[akpm@osdl.org: additional cleanups]
Signed-off-by: Miguel Ojeda Sandonis <maxextreme@gmail.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] sched: optimize activate_task for RT task
Chen, Kenneth W [Sun, 10 Dec 2006 10:20:36 +0000 (02:20 -0800)]
[PATCH] sched: optimize activate_task for RT task

An RT task does not participate in interactiveness priority and thus shouldn't
be bothered with timestamp and p->sleep_type manipulation when the task is
being put on the run queue.  Bypass all of them with a single if (rt_task)
test.
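
A simplified illustration of the bypass.  The task_sketch struct, the
MAX_RT_PRIO_SKETCH cutoff and both helpers are hypothetical; the point is only
that one test skips all interactivity bookkeeping while the task is still
enqueued.

```c
#include <stdbool.h>

#define MAX_RT_PRIO_SKETCH 100	/* assumed cutoff: priorities below this are RT */

/* Hypothetical subset of a task's scheduling state. */
struct task_sketch {
	int prio;
	unsigned long long timestamp;
	int sleep_type;
	bool on_runqueue;
};

static bool rt_task_sketch(const struct task_sketch *p)
{
	return p->prio < MAX_RT_PRIO_SKETCH;
}

static void activate_task_sketch(struct task_sketch *p, unsigned long long now)
{
	if (!rt_task_sketch(p)) {
		/* Interactivity bookkeeping only matters for non-RT tasks. */
		p->sleep_type = 0;
		p->timestamp = now;
	}
	p->on_runqueue = true;	/* both kinds of task are still enqueued */
}
```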

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] sched: remove lb_stopbalance counter
Chen, Kenneth W [Sun, 10 Dec 2006 10:20:35 +0000 (02:20 -0800)]
[PATCH] sched: remove lb_stopbalance counter

Remove the scheduler stats lb_stopbalance counter.  This counter can be
calculated by: lb_balanced - lb_nobusyg - lb_nobusyq.  There is no need to
create a gazillion counters when we can derive the value.
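
For anyone post-processing the statistics, the derivation is a one-liner.  The
struct below is a hypothetical container for the three remaining counters, not
the kernel's schedstat layout.

```c
/* Hypothetical container for the counters that remain after this patch. */
struct lb_stats_sketch {
	unsigned long lb_balanced;
	unsigned long lb_nobusyg;
	unsigned long lb_nobusyq;
};

/* Recover the removed counter from the ones that are still kept. */
static unsigned long lb_stopbalance_derived(const struct lb_stats_sketch *s)
{
	return s->lb_balanced - s->lb_nobusyg - s->lb_nobusyq;
}
```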

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] sched: decrease number of load balances
Siddha, Suresh B [Sun, 10 Dec 2006 10:20:33 +0000 (02:20 -0800)]
[PATCH] sched: decrease number of load balances

Currently, at a particular domain, each cpu in the sched group will do a
load balance at the frequency of balance_interval.  The more cores and
threads there are, the more cpus will be in each sched group at the SMP and
NUMA domains, and we end up spending quite a bit of time doing load
balancing in those domains.

Fix this by making only one cpu (the first idle cpu, or the first cpu in the
group if all the cpus are busy) in the sched group do the load balance at
that particular sched domain.  This load will slowly percolate down to the
other cpus within that group (when they do load balancing at lower
domains).
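
The selection rule can be sketched as follows.  The array-based group
representation and the function name are assumptions; the kernel uses cpumasks
rather than plain arrays.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return the cpu that should balance on behalf of the group: the first
 * idle cpu if there is one, otherwise the first cpu in the group.
 * group_cpus lists cpu numbers; cpu_idle is indexed by cpu number.
 * Returns -1 for an empty group.
 */
static int pick_balance_cpu(const int *group_cpus, const bool *cpu_idle,
			    size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (cpu_idle[group_cpus[i]])
			return group_cpus[i];
	return n ? group_cpus[0] : -1;
}
```

A cpu would then skip balancing at this domain unless pick_balance_cpu()
returns its own number.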

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] sched: improve migration accuracy
Mike Galbraith [Sun, 10 Dec 2006 10:20:31 +0000 (02:20 -0800)]
[PATCH] sched: improve migration accuracy

Co-opt rq->timestamp_last_tick to maintain a cache_hot_time evaluation
reference timestamp at both tick and sched times to prevent said reference,
formerly rq->timestamp_last_tick, from being behind task->last_ran at
evaluation time, and to move said reference closer to current time on the
remote processor, intent being to improve cache hot evaluation and
timestamp adjustment accuracy for task migration.

Fix minor sched_time double accounting error which occurs when a task
passing through schedule() does not schedule off, and takes the next timer
tick.

[kenneth.w.chen@intel.com: cleanup]
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: Don Mullis <dwm@meer.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] sched: add option to serialize load balancing
Christoph Lameter [Sun, 10 Dec 2006 10:20:29 +0000 (02:20 -0800)]
[PATCH] sched: add option to serialize load balancing

Large sched domains can be very expensive to scan.  Add an option SD_SERIALIZE
to the sched domain flags.  If that flag is set then we make sure that no
other such domain is being balanced.
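
A userspace sketch of the serialization, assuming a single global try-lock
shared by all flagged domains.  SD_SERIALIZE_SKETCH, the atomic_flag and the
counter standing in for the balancing work are all hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SD_SERIALIZE_SKETCH 0x1u	/* hypothetical flag value */

/* One global try-lock shared by every domain that sets the flag. */
static atomic_flag balancing = ATOMIC_FLAG_INIT;

/*
 * Balance a domain unless it is serialized and another serialized domain
 * is already being balanced.  Returns true if balancing ran.
 */
static bool try_balance(unsigned int flags, int *balanced_count)
{
	bool serialize = (flags & SD_SERIALIZE_SKETCH) != 0;

	if (serialize && atomic_flag_test_and_set(&balancing))
		return false;	/* someone else holds the lock: skip this round */

	(*balanced_count)++;	/* stand-in for the actual load balancing */

	if (serialize)
		atomic_flag_clear(&balancing);
	return true;
}
```

Skipping rather than spinning keeps a busy cpu from waiting on an expensive
scan that another cpu is already doing.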

[akpm@osdl.org: build fix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Peter Williams <pwil3058@bigpond.net.au>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] sched: call tasklet less frequently
Christoph Lameter [Sun, 10 Dec 2006 10:20:27 +0000 (02:20 -0800)]
[PATCH] sched: call tasklet less frequently

Trigger softirq less frequently

Before this patch we trigger the softirq using an offset of sd->interval.
However, if the queue is busy then it is sufficient to schedule the softirq
with sd->interval * busy_factor.

So we modify the calculation of the next time to balance by taking
the interval added to last_balance again. This is only the
right value if the idle/busy situation continues as is.

There are two potential trouble spots:
- If the queue was idle and now gets busy then we call rebalance
  early. However, that is not a problem because we will then use
  the longer interval for the next period.

- If the queue was busy and becomes idle then we potentially
  wait too long before rebalancing. However, when the task
  goes idle then idle_balance is called. We add another calculation
  of the next balance time based on sd->interval in idle_balance
  so that we will rebalance soon.

V2->V3:
- Calculate rebalance time based on current jiffies and not
  based on the jiffies at the last time we load balanced.
  We no longer rely on staggering and therefore we can
  afford to do this now.

V3->V4:
- Use functions to do jiffy comparisons.
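
A sketch of the interval calculation and of a wrap-safe jiffy comparison.
Both function names and the parameter layout are assumptions; the comparison
follows the usual signed-difference idiom rather than quoting the kernel's
macros.

```c
#include <limits.h>

/* Wrap-safe test for "a is later than b" on a free-running tick counter. */
static int sketch_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* A busy queue stretches the interval by busy_factor; an idle one does not. */
static unsigned long next_balance_time(unsigned long base,
				       unsigned long interval,
				       unsigned int busy_factor, int busy)
{
	if (busy)
		interval *= busy_factor;
	return base + interval;
}
```

Comparing with a plain `>` would misfire when the counter wraps around, which
is why the difference is interpreted as a signed value.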

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Peter Williams <pwil3058@bigpond.net.au>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] sched: use softirq for load balancing
Christoph Lameter [Sun, 10 Dec 2006 10:20:25 +0000 (02:20 -0800)]
[PATCH] sched: use softirq for load balancing

Call rebalance_tick (renamed to run_rebalance_domains) from a newly introduced
softirq.

We calculate the earliest time for each layer of sched domains to be rescanned
(this is the rescan time for idle) and use the earliest of those to schedule
the softirq via a new field "next_balance" added to struct rq.
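
Picking the earliest rescan time across the domain layers might look like the
sketch below.  The array of per-domain times and the function name are
assumptions; the comparison uses a signed difference so that it tolerates
counter wrap-around.

```c
#include <stddef.h>

/*
 * Return the earliest of the per-domain rescan times, starting from a
 * fallback value that is used when there are no domains.
 */
static unsigned long earliest_balance(const unsigned long *next, size_t n,
				      unsigned long fallback)
{
	unsigned long earliest = fallback;

	for (size_t i = 0; i < n; i++)
		if ((long)(next[i] - earliest) < 0)	/* next[i] is before earliest */
			earliest = next[i];
	return earliest;
}
```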

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Peter Williams <pwil3058@bigpond.net.au>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] sched: move idle status calculation into rebalance_tick()
Christoph Lameter [Sun, 10 Dec 2006 10:20:23 +0000 (02:20 -0800)]
[PATCH] sched: move idle status calculation into rebalance_tick()

Perform the idle state determination in rebalance_tick.

If we separate balancing from sched_tick then we also need to determine the
idle state in rebalance_tick.

V2->V3
Remove useless idle != 0 check. Checking nr_running seems
to be sufficient. Thanks Suresh.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Peter Williams <pwil3058@bigpond.net.au>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] sched: extract load calculation from rebalance_tick
Christoph Lameter [Sun, 10 Dec 2006 10:20:22 +0000 (02:20 -0800)]
[PATCH] sched: extract load calculation from rebalance_tick

A load calculation is always done in rebalance_tick() in addition to the real
load balancing activities that only take place when certain jiffies counts have
been reached.  Move that processing into a separate function and call it
directly from scheduler_tick().

Also extract the time slice handling from scheduler_tick and put it into a
separate function.  Then we can clean up scheduler_tick significantly.  It
will no longer have any gotos.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Peter Williams <pwil3058@bigpond.net.au>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] sched: disable interrupts for locking in load_balance()
Christoph Lameter [Sun, 10 Dec 2006 10:20:21 +0000 (02:20 -0800)]
[PATCH] sched: disable interrupts for locking in load_balance()

Interrupts must be disabled for run queue locks if we want to run
load_balance() with interrupts enabled.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Peter Williams <pwil3058@bigpond.net.au>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] sched: remove staggering of load balancing
Christoph Lameter [Sun, 10 Dec 2006 10:20:19 +0000 (02:20 -0800)]
[PATCH] sched: remove staggering of load balancing

Timer interrupts already are staggered.  We do not need an additional layer of
time staggering for short load balancing actions that take a reasonably small
portion of the time slice.

For load balancing on large sched_domains we will add a serialization later
that avoids concurrent load balance operations and thus has the same effect as
load staggering.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Peter Williams <pwil3058@bigpond.net.au>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] sched: avoid taking rq lock in wake_priority_sleeper
Christoph Lameter [Sun, 10 Dec 2006 10:20:13 +0000 (02:20 -0800)]
[PATCH] sched: avoid taking rq lock in wake_priority_sleeper

Avoid taking the runqueue lock in wake_priority_sleeper if there are no
running processes.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Peter Williams <pwil3058@bigpond.net.au>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] sched domain: increase the SMT busy rebalance interval
Siddha, Suresh B [Sun, 10 Dec 2006 10:20:12 +0000 (02:20 -0800)]
[PATCH] sched domain: increase the SMT busy rebalance interval

With SMT, if the logical processor is busy, load balancing happens every
8 msec (min) to 16 msec (max).  There is no need to do this so often, as it is
just for fairness (to maintain uniform runqueue lengths) and the default time
slice is 100 msec anyhow.

The appended patch increases this interval to 64 msec (min) to 128 msec (max)
when the logical processor is busy.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] move_task_off_dead_cpu() should be called with disabled ints
Kirill Korotaev [Sun, 10 Dec 2006 10:20:11 +0000 (02:20 -0800)]
[PATCH] move_task_off_dead_cpu() should be called with disabled ints

move_task_off_dead_cpu() requires interrupts to be disabled, while
migrate_dead() calls it with enabled interrupts.  Added appropriate
comments to functions and added BUG_ON(!irqs_disabled()) into
double_rq_lock() and double_lock_balance() which are the origin sources of
such bugs.

Signed-off-by: Kirill Korotaev <dev@openvz.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] sched domain: move sched group allocations to percpu area
Siddha, Suresh B [Sun, 10 Dec 2006 10:20:07 +0000 (02:20 -0800)]
[PATCH] sched domain: move sched group allocations to percpu area

Move the sched group allocations to percpu area.  This will minimize cross
node memory references and also cleans up the sched groups allocation for
allnodes sched domain.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] sched.c: correct comment for this_rq_lock()
Robert P. J. Day [Sun, 10 Dec 2006 10:20:00 +0000 (02:20 -0800)]
[PATCH] sched.c: correct comment for this_rq_lock()

Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] Don't build some broken ISDN drivers on big endian MIPS
Ralf Baechle [Sun, 10 Dec 2006 10:19:58 +0000 (02:19 -0800)]
[PATCH] Don't build some broken ISDN drivers on big endian MIPS

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Karsten Keil <kkeil@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] io-accounting: add to getdelays
Andrew Morton [Sun, 10 Dec 2006 10:19:56 +0000 (02:19 -0800)]
[PATCH] io-accounting: add to getdelays

Wire up the IO accounting into getdelays.c.

Usage:

To display I/O stats for each exiting task:

vmm:/home/akpm> ./getdelays -m0,1,2,3 -i -l
cpumask 0 maskset 1
printing IO accounting
listen forever
rm: read=8192, write=0, cancelled_write=0
cvs: read=733184, write=4255744, cancelled_write=4096
make: read=217088, write=0, cancelled_write=0
cc1: read=4263936, write=12288, cancelled_write=0
as: read=811008, write=8192, cancelled_write=0
gcc: read=323584, write=0, cancelled_write=12288
cc1: read=0, write=8192, cancelled_write=0
as: read=4096, write=4096, cancelled_write=0
gcc: read=16384, write=0, cancelled_write=4096
as: read=4096, write=4096, cancelled_write=0
gcc: read=16384, write=0, cancelled_write=8192
ld: read=1011712, write=16384, cancelled_write=0
collect2: read=626688, write=0, cancelled_write=0
gcc: read=204800, write=0, cancelled_write=0
cc1: read=0, write=8192, cancelled_write=0
as: read=4096, write=4096, cancelled_write=0
gcc: read=16384, write=0, cancelled_write=8192
ld: read=8192, write=16384, cancelled_write=0
collect2: read=49152, write=0, cancelled_write=0
gcc: read=0, write=0, cancelled_write=0
cc1: read=0, write=4096, cancelled_write=0
ld: read=4096, write=12288, cancelled_write=0
collect2: read=49152, write=0, cancelled_write=0
gcc: read=0, write=0, cancelled_write=0

To display I/O stats for a particular presently-running task:

vmm:/home/akpm> ./getdelays -i -p $(pidof crond)
printing IO accounting
crond: read=61440, write=0, cancelled_write=0

Cc: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: David Wright <daw@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] getdelays: various fixes
Andrew Morton [Sun, 10 Dec 2006 10:19:55 +0000 (02:19 -0800)]
[PATCH] getdelays: various fixes

- Various cleanups

- Report errors to stderr, not stdout

- A printf was missing a \n and was hiding from me.

Cc: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: David Wright <daw@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] io-accounting: via taskstats
Andrew Morton [Sun, 10 Dec 2006 10:19:53 +0000 (02:19 -0800)]
[PATCH] io-accounting: via taskstats

Deliver IO accounting via taskstats.

Cc: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: David Wright <daw@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] cleanup taskstats.h
Andrew Morton [Sun, 10 Dec 2006 10:19:50 +0000 (02:19 -0800)]
[PATCH] cleanup taskstats.h

Fix weird whitespace mangling in taskstats.h

Cc: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: David Wright <daw@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] io-accounting: report in procfs
Andrew Morton [Sun, 10 Dec 2006 10:19:48 +0000 (02:19 -0800)]
[PATCH] io-accounting: report in procfs

Add a simple /proc/pid/io to show the IO accounting fields.

Maybe this shouldn't be merged in mainline - the preferred reporting channel
is taskstats.  But given the poor state of our userspace support for
taskstats, this is useful for developer-testing, at least.  And it improves
the chances that the procps developers will wire it up into top(1).  Opinions
are sought.

The patch also wires up the existing IO-accounting fields.

It's a bit racy on 32-bit machines: if process A reads process B's
/proc/pid/io while process B is updating one of those 64-bit counters, process
A could see an intermediate result.
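
A minimal sketch of how a developer-testing tool might pick one counter out of
such a file, assuming the file is a sequence of "name: value" lines.  The field
names in the sample are those found in /proc/<pid>/io on current kernels (the
exact set at the time of this patch is an assumption), and parse_io_field() is
a hypothetical helper, not part of any existing tool.  It does nothing about
the 32-bit race described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: look up "name: value" in a buffer holding the
 * contents of /proc/<pid>/io.  Returns 0 on success, -1 if the field
 * is absent.
 */
static int parse_io_field(const char *text, const char *name,
                          unsigned long long *value)
{
    size_t len = strlen(name);
    const char *line = text;

    while (line && *line) {
        /* A match is the full field name immediately followed by ':'. */
        if (strncmp(line, name, len) == 0 && line[len] == ':') {
            *value = strtoull(line + len + 1, NULL, 10);
            return 0;
        }
        line = strchr(line, '\n');
        if (line)
            line++;    /* step past the newline to the next line */
    }
    return -1;
}

/* Usage example on a made-up sample; real code would read the file first. */
static void parse_io_field_example(void)
{
    const char *sample =
        "read_bytes: 61440\n"
        "write_bytes: 0\n"
        "cancelled_write_bytes: 0\n";
    unsigned long long v = 1;

    assert(parse_io_field(sample, "write_bytes", &v) == 0 && v == 0);
    assert(parse_io_field(sample, "read_bytes", &v) == 0 && v == 61440);
    assert(parse_io_field(sample, "no_such_field", &v) == -1);
}
```

Matching only at the start of a line keeps "write_bytes" from being found
inside "cancelled_write_bytes".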

Cc: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: David Wright <daw@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] io-accounting: direct-io
Andrew Morton [Sun, 10 Dec 2006 10:19:47 +0000 (02:19 -0800)]
[PATCH] io-accounting: direct-io

Account for direct-io.

Cc: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: David Wright <daw@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] io-accounting-read-accounting cifs fix
Andrew Morton [Sun, 10 Dec 2006 10:19:44 +0000 (02:19 -0800)]
[PATCH] io-accounting-read-accounting cifs fix

CIFS implements ->readpages and doesn't use read_cache_pages().  So wire the
read IO accounting up within CIFS.

Cc: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: Steven French <sfrench@us.ibm.com>
Cc: David Wright <daw@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] io-accounting-read-accounting nfs fix
Andrew Morton [Sun, 10 Dec 2006 10:19:40 +0000 (02:19 -0800)]
[PATCH] io-accounting-read-accounting nfs fix

nfs's ->readpages uses read_cache_pages().  Wire it up there.

[wfg@mail.ustc.edu.cn: account only successful nfs/fuse reads]
Cc: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: David Wright <daw@sgi.com>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] io-accounting: read accounting
Andrew Morton [Sun, 10 Dec 2006 10:19:35 +0000 (02:19 -0800)]
[PATCH] io-accounting: read accounting

Wire up read accounting for block devices, within submit_bio().

Cc: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: David Wright <daw@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] io-accounting: write-cancel accounting
Andrew Morton [Sun, 10 Dec 2006 10:19:31 +0000 (02:19 -0800)]
[PATCH] io-accounting: write-cancel accounting

Account for the number of bytes of writes which this process caused not to
happen after all.

Cc: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: David Wright <daw@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] io-accounting: write accounting
Andrew Morton [Sun, 10 Dec 2006 10:19:27 +0000 (02:19 -0800)]
[PATCH] io-accounting: write accounting

Accounting writes is fairly simple: whenever a process flips a page from clean
to dirty, we accuse it of having caused a write to underlying storage of
PAGE_CACHE_SIZE bytes.

This may overestimate the amount of writing: the page-dirtying may cause only
one buffer_head's worth of writeout.  Fixing that is possible, but probably a
bit messy and isn't obviously important.
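
The rule can be modelled in userspace as follows.  This is a toy sketch, not
the kernel code: the toy_* names are hypothetical, and TOY_PAGE_SIZE stands in
for PAGE_CACHE_SIZE with an assumed value of 4096.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096ULL   /* stand-in for PAGE_CACHE_SIZE (assumed 4 KiB) */
#define TOY_NPAGES    8

struct toy_task {
    unsigned long long write_bytes;   /* bytes this task is accused of writing */
};

struct toy_file {
    bool dirty[TOY_NPAGES];           /* one dirty flag per cached page */
};

/* Charge the task only when the page flips from clean to dirty. */
static void toy_set_page_dirty(struct toy_task *t, struct toy_file *f,
                               size_t page)
{
    assert(page < TOY_NPAGES);
    if (!f->dirty[page]) {
        f->dirty[page] = true;
        t->write_bytes += TOY_PAGE_SIZE;
    }
}

/* Usage example: three dirtying calls, but only two clean->dirty flips. */
static void toy_dirty_example(void)
{
    struct toy_task task = { 0 };
    struct toy_file file = { { false } };

    toy_set_page_dirty(&task, &file, 0);  /* clean -> dirty: charged */
    toy_set_page_dirty(&task, &file, 0);  /* already dirty: not charged again */
    toy_set_page_dirty(&task, &file, 3);  /* another page: charged */
    assert(task.write_bytes == 2 * TOY_PAGE_SIZE);
}
```

The charge is a whole page even if a single byte was modified, which is the
overestimate mentioned above.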

Cc: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: David Wright <daw@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] clean up __set_page_dirty_nobuffers()
Andrew Morton [Sun, 10 Dec 2006 10:19:24 +0000 (02:19 -0800)]
[PATCH] clean up __set_page_dirty_nobuffers()

Save a tabstop in __set_page_dirty_nobuffers() and __set_page_dirty_buffers()
and a few other places.  No functional changes.

Cc: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: David Wright <daw@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] io-accounting: core statistics
Andrew Morton [Sun, 10 Dec 2006 10:19:19 +0000 (02:19 -0800)]
[PATCH] io-accounting: core statistics

The present per-task IO accounting isn't very useful.  It simply counts the
number of bytes passed into read() and write().  So if a process reads 1MB
from an already-cached file, it is accused of having performed 1MB of I/O,
which is wrong.

(David Wright had some comments on the applicability of the present logical IO accounting:

  For billing purposes it is useless but for workload analysis it is very
  useful

  read_bytes/read_calls  average read request size
  write_bytes/write_calls average write request size

  read_bytes/read_blocks ie logical/physical can indicate hit rate or thrashing
  write_bytes/write_blocks  ie logical/physical  guess since pdflush writes can
                                                be missed

  I often look for logical larger than physical to see filesystem cache
  problems.  And the bytes/cpusec can help find applications that are
  dominating the cache and causing slow interactive response from page cache
  contention.

  I want to find the IO intensive applications and make sure they are doing
  efficient IO.  Thus the acctcms(sysV) or csacms command would give the high
  IO commands).

This patchset adds new accounting which tries to be more accurate.  We account
for three things:

reads:

  attempt to count the number of bytes which this process really did cause
  to be fetched from the storage layer.  Done at the submit_bio() level, so it
  is accurate for block-backed filesystems.  I also attempt to wire up NFS and
  CIFS.

writes:

  attempt to count the number of bytes which this process caused to be sent
  to the storage layer.  This is done at page-dirtying time.

  The big inaccuracy here is truncate.  If a process writes 1MB to a file
  and then deletes the file, it will in fact perform no writeout.  But it will
  have been accounted as having caused 1MB of write.

  So...

cancelled_writes:

  account the number of bytes which this process caused to not happen, by
  truncating pagecache.

  We _could_ just subtract this from the process's `write' accounting.  But
  that means that some processes would be reported to have done negative
  amounts of write IO, which is silly.

  So we just report the raw number and punt this decision up to userspace.
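
As an illustration of what that userspace decision might look like, here is a
toy sketch.  All names are hypothetical and the 4096-byte page size is an
assumption; clamping at zero is only one possible policy, not something this
patchset prescribes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096ULL  /* assumed page size */
#define SKETCH_NPAGES    4

struct io_counters {
    unsigned long long write_bytes;
    unsigned long long cancelled_write_bytes;
};

/* Truncation drops dirty pages from index 'first' on; each is a cancelled write. */
static void sketch_truncate(struct io_counters *c, bool dirty[SKETCH_NPAGES],
                            size_t first)
{
    assert(first <= SKETCH_NPAGES);
    for (size_t i = first; i < SKETCH_NPAGES; i++) {
        if (dirty[i]) {
            dirty[i] = false;
            c->cancelled_write_bytes += SKETCH_PAGE_SIZE;
        }
    }
}

/* One possible userspace policy: subtract, but never report a negative amount. */
static unsigned long long net_write_bytes(const struct io_counters *c)
{
    if (c->cancelled_write_bytes > c->write_bytes)
        return 0;
    return c->write_bytes - c->cancelled_write_bytes;
}

/* Usage example: one task dirtied two pages, another task truncates them. */
static void cancelled_write_example(void)
{
    bool dirty[SKETCH_NPAGES] = { true, true, false, false };
    struct io_counters writer = { 2 * SKETCH_PAGE_SIZE, 0 };
    struct io_counters other = { 0, 0 };

    sketch_truncate(&other, dirty, 0);
    assert(other.cancelled_write_bytes == 2 * SKETCH_PAGE_SIZE);
    assert(net_write_bytes(&other) == 0);   /* would be negative unclamped */
    assert(net_write_bytes(&writer) == 2 * SKETCH_PAGE_SIZE);
}
```

The second task shows why a plain subtraction is awkward: it cancelled writes
it was never charged for, so its raw difference would be negative.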

Now, we _could_ account for writes at the physical I/O level.  But

- This would require that we track memory-dirtying tasks at the per-page
  level (would require a new pointer in struct page).

- It would mean that IO statistics for a process are usually only available
  long after that process has exited.  Which means that we probably cannot
  communicate this info via taskstats.

This patch:

Wire up the kernel-private data structures and the accessor functions to
manipulate them.

Cc: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: David Wright <daw@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] pdc202xx_new: fix PLL/timing issues
Sergei Shtylyov [Sun, 10 Dec 2006 10:19:13 +0000 (02:19 -0800)]
[PATCH] pdc202xx_new: fix PLL/timing issues

Fix the CRC errors in the higher UltraDMA modes with the Promise PDC20268
and newer chips that always occur on non-x86 machines and when there are
more than 2 adapters on x86 machines.  Fix the overclocking issue for
PDC20269 and newer chips that occurs when an UltraDMA/133 capable drive is
connected.  Here's the summary of changes:

- add code to detect the PLL input clock and set up its output clock,
  remove the PowerMac hacks;

- replace the macros accessing the indexed registers with functions, switch to
  using them where appropriate, gather the PIO/MWDMA/UDMA timings into tables;

- rewrite the speedproc() handler to set the drive's transfer mode first, and
  then override the timing registers set by hardware on UltraDMA/133 chips;

- use better criterion for determining higher UltraDMA modes, and add comment
  concerning the doubtful value of the code enabling IORDY/prefetch;

- replace the stupid 'pdcnew_new_' prefixes with mere 'pdcnew_';

- get rid of unneeded spaces, parens and type casts, clean up some printk's,
  add some new lines here and there...

This work is loosely based on these former patches by Albert Lee:

[1] http://marc.theaimsgroup.com/?l=linux-ide&m=110992442032300
[2] http://marc.theaimsgroup.com/?l=linux-ide&m=110992457729382
[3] http://marc.theaimsgroup.com/?l=linux-ide&m=110992474205555
[4] http://marc.theaimsgroup.com/?l=linux-ide&m=111019224802939

Some PLL clock detection code was backported from his pata_pdc2027x driver...

This code has been successfully tested by me on PDC2026[89] chips.

I tried to keep this rework as several patches but it made no sense: [2] was
largely a modification of the non-working timing override code, [3] by itself
extended the overclocking issue to the case of non-UltraDMA/133 drives, and
finally, the cleanup patch based on [1] ended up rejected...

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Albert Lee <albertcc@tw.ibm.com>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] Fix noise in futex.h
David Woodhouse [Sun, 10 Dec 2006 10:19:11 +0000 (02:19 -0800)]
[PATCH] Fix noise in futex.h

There are some kernel-only bits in the middle of <linux/futex.h> which
should be removed in what we export to userspace.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] sysctl: remove unused "context" param
Alexey Dobriyan [Sun, 10 Dec 2006 10:19:10 +0000 (02:19 -0800)]
[PATCH] sysctl: remove unused "context" param

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] sysctl: remove some OPs
Alexey Dobriyan [Sun, 10 Dec 2006 10:19:09 +0000 (02:19 -0800)]
[PATCH] sysctl: remove some OPs

kernel.cap-bound uses only OP_SET and OP_AND

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] IPMI: misc fixes
Corey Minyard [Sun, 10 Dec 2006 10:19:08 +0000 (02:19 -0800)]
[PATCH] IPMI: misc fixes

Fix various problems pointed out by Andrew Morton and others:
  * platform_device_unregister checks for NULL, no need to check here.
  * Formatting fixes.
  * Remove big macro and convert to a function.
  * Use strcmp instead of defining a broken case-insensitive comparison,
    and make the output parameter info match the case of the input one
    (change "I/O" to "i/o").
  * Return the length instead of 0 from the hotmod parameter handler.
  * Remove some unused cruft.
  * The trydefaults parameter only has to do with scanning the "standard"
    addresses, don't check for that on ACPI.

Signed-off-by: Corey Minyard <cminyard@acm.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] IPMI: remove zero inits
Randy Dunlap [Sun, 10 Dec 2006 10:19:06 +0000 (02:19 -0800)]
[PATCH] IPMI: remove zero inits

Remove all =0 and =NULL from static initializers.  They are not needed and
removing them saves space in the object files.
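
For reference, C guarantees that objects with static storage duration start
out as zero (or NULL for pointers) when no initializer is given, so both
styles below yield the same values.  The variable names are illustrative only
and are not taken from the IPMI driver.

```c
#include <assert.h>
#include <stddef.h>

static int explicit_zero = 0;        /* the style this patch removes */
static int implicit_zero;            /* same initial value, no initializer */
static char *explicit_null = NULL;   /* the style this patch removes */
static char *implicit_null;          /* same initial value, no initializer */

/* Returns 1 when all four objects hold zero/NULL. */
static int all_zero(void)
{
    return explicit_zero == 0 && implicit_zero == 0 &&
           explicit_null == NULL && implicit_null == NULL;
}

/* Usage example: the check holds before anything has been assigned. */
static void zero_init_example(void)
{
    assert(all_zero());
}
```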

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] update MAINTAINERS with rtc-linux mailing list info
Alessandro Zummo [Sun, 10 Dec 2006 10:19:06 +0000 (02:19 -0800)]
[PATCH] update MAINTAINERS with rtc-linux mailing list info

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] AT91RM9200 RTC
Andrew Victor [Sun, 10 Dec 2006 10:19:03 +0000 (02:19 -0800)]
[PATCH] AT91RM9200 RTC

The new Atmel AT91SAM9261 and AT91SAM9260 processors do not have the
internal RTC peripheral.  This RTC driver is therefore
AT91RM9200-specific.

This patch renames rtc-at91.c to rtc-at91rm9200.c, and changes the name
of the configuration option.

Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] RTCs don't use i2c_adapter.dev
David Brownell [Sun, 10 Dec 2006 10:19:02 +0000 (02:19 -0800)]
[PATCH] RTCs don't use i2c_adapter.dev

Update more I2C drivers that live outside drivers/i2c to understand that using
adapter->dev is not The Way.  When actually referring to the adapter hardware,
adapter->class_dev.dev is the answer.  When referring to a device connected to
it, client->dev.dev is the answer.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] rtc: Add rtc_merge_alarm()
Scott Wood [Sun, 10 Dec 2006 10:19:00 +0000 (02:19 -0800)]
[PATCH] rtc: Add rtc_merge_alarm()

Add rtc_merge_alarm(), which can be used by rtc drivers to turn a partially
specified alarm expiry (i.e.  most significant fields set to -1, as with the
RTC_ALM_SET ioctl()) into a fully specified expiry.

If the most significant specified field is earlier than the current time, the
least significant unspecified field is incremented.
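
A simplified userspace sketch of that rule, assuming the RTC_ALM_SET-like case
where the time of day is given and day, month and year are all -1.  This is
not the kernel's rtc_merge_alarm(); toy_merge_alarm() is a hypothetical name
and it leans on mktime() to normalize the date.

```c
#include <assert.h>
#include <time.h>

/*
 * Fill in the date of 'alarm' from 'now'.  If the alarm's time of day is
 * earlier than the current time of day, move it to the next day, the day
 * being the least significant unspecified field in this simplified case.
 */
static struct tm toy_merge_alarm(const struct tm *now, const struct tm *alarm)
{
    struct tm out = *alarm;
    long now_secs = now->tm_hour * 3600L + now->tm_min * 60L + now->tm_sec;
    long alm_secs = alarm->tm_hour * 3600L + alarm->tm_min * 60L + alarm->tm_sec;

    /* This sketch requires a fully specified time of day. */
    assert(alarm->tm_hour >= 0 && alarm->tm_min >= 0 && alarm->tm_sec >= 0);

    out.tm_mday = now->tm_mday;
    out.tm_mon = now->tm_mon;
    out.tm_year = now->tm_year;
    out.tm_isdst = -1;          /* let the C library decide about DST */

    if (alm_secs < now_secs)
        out.tm_mday++;          /* already passed today: fire tomorrow */

    mktime(&out);               /* normalizes e.g. 32 December to 1 January */
    return out;
}

/* Usage example: at 10:19 on 10 Dec 2006, a 09:00 alarm lands on 11 Dec. */
static void toy_merge_alarm_example(void)
{
    struct tm now = { 0 }, alarm = { 0 }, r;

    now.tm_year = 106; now.tm_mon = 11; now.tm_mday = 10;
    now.tm_hour = 10; now.tm_min = 19;
    alarm.tm_year = alarm.tm_mon = alarm.tm_mday = -1;
    alarm.tm_hour = 9;

    r = toy_merge_alarm(&now, &alarm);
    assert(r.tm_mday == 11 && r.tm_mon == 11 && r.tm_year == 106);
}
```

A driver-side implementation would also have to cope with other combinations
of unspecified fields, which this sketch deliberately leaves out.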

Signed-off-by: Scott Wood <scottwood@freescale.com>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] geode crypto is PCI device
Randy Dunlap [Sun, 10 Dec 2006 10:19:00 +0000 (02:19 -0800)]
[PATCH] geode crypto is PCI device

This driver seems to be for a PCI device.

drivers/crypto/geode-aes.c:384: warning: implicit declaration of function 'pci_release_regions'
drivers/crypto/geode-aes.c:397: warning: implicit declaration of function 'pci_request_regions'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] freezer.h uses task_struct fields
Randy Dunlap [Sun, 10 Dec 2006 10:18:58 +0000 (02:18 -0800)]
[PATCH] freezer.h uses task_struct fields

freezer.h uses task_struct fields so it should include sched.h.

  CC [M]  fs/jfs/jfs_txnmgr.o
In file included from fs/jfs/jfs_txnmgr.c:49:
include/linux/freezer.h: In function 'frozen':
include/linux/freezer.h:9: error: dereferencing pointer to incomplete type
include/linux/freezer.h:9: error: 'PF_FROZEN' undeclared (first use in this function)
include/linux/freezer.h:9: error: (Each undeclared identifier is reported only once
include/linux/freezer.h:9: error: for each function it appears in.)
include/linux/freezer.h: In function 'freezing':
include/linux/freezer.h:17: error: dereferencing pointer to incomplete type
include/linux/freezer.h:17: error: 'PF_FREEZE' undeclared (first use in this function)
include/linux/freezer.h: In function 'freeze':
include/linux/freezer.h:26: error: dereferencing pointer to incomplete type
include/linux/freezer.h:26: error: 'PF_FREEZE' undeclared (first use in this function)
include/linux/freezer.h: In function 'do_not_freeze':
include/linux/freezer.h:34: error: dereferencing pointer to incomplete type
include/linux/freezer.h:34: error: 'PF_FREEZE' undeclared (first use in this function)
include/linux/freezer.h: In function 'thaw_process':
include/linux/freezer.h:43: error: dereferencing pointer to incomplete type
include/linux/freezer.h:43: error: 'PF_FROZEN' undeclared (first use in this function)
include/linux/freezer.h:44: warning: implicit declaration of function 'wake_up_process'
include/linux/freezer.h: In function 'frozen_process':
include/linux/freezer.h:55: error: dereferencing pointer to incomplete type
include/linux/freezer.h:55: error: dereferencing pointer to incomplete type
include/linux/freezer.h:55: error: 'PF_FREEZE' undeclared (first use in this function)
include/linux/freezer.h:55: error: 'PF_FROZEN' undeclared (first use in this function)
fs/jfs/jfs_txnmgr.c: In function 'freezing':
include/linux/freezer.h:18: warning: control reaches end of non-void function
make[2]: *** [fs/jfs/jfs_txnmgr.o] Error 1

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] Document how to decode an IOCTL number
Chuck Ebbert [Sun, 10 Dec 2006 10:18:57 +0000 (02:18 -0800)]
[PATCH] Document how to decode an IOCTL number

Document how to decode a binary IOCTL number.
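
As a rough illustration of such decoding, the sketch below splits a 32-bit
ioctl number using the generic field layout (8-bit number, 8-bit type, 14-bit
size, 2-bit direction).  Some architectures use different field widths, so
the masks here are an assumption; struct ioctl_fields and decode_ioctl() are
hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

struct ioctl_fields {
    unsigned dir;   /* 0 = none, 1 = write, 2 = read, 3 = read/write */
    unsigned size;  /* argument size in bytes */
    unsigned type;  /* "magic" character identifying the driver */
    unsigned nr;    /* command number within that type */
};

/* Split an ioctl number into its fields (generic 8/8/14/2 bit layout). */
static struct ioctl_fields decode_ioctl(uint32_t cmd)
{
    struct ioctl_fields f;

    f.nr   = cmd & 0xff;
    f.type = (cmd >> 8) & 0xff;
    f.size = (cmd >> 16) & 0x3fff;
    f.dir  = (cmd >> 30) & 0x3;
    return f;
}

/* Usage example: 0x82187201 is a read of 0x218 bytes, type 'r', number 1. */
static void decode_ioctl_example(void)
{
    struct ioctl_fields f = decode_ioctl(0x82187201u);

    assert(f.dir == 2 && f.size == 0x218 && f.type == 'r' && f.nr == 1);
}
```

With the type character and number in hand, grepping the headers for the
matching _IOR/_IOW definition identifies the call.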

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] submit checklist update
Andrew Morton [Sun, 10 Dec 2006 10:18:56 +0000 (02:18 -0800)]
[PATCH] submit checklist update

Mention the new fault-injection test framework.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] CodingStyle updates
Randy Dunlap [Sun, 10 Dec 2006 10:18:56 +0000 (02:18 -0800)]
[PATCH] CodingStyle updates

Add some kernel coding style comments, mostly pulled from emails
by Andrew Morton, Jesper Juhl, and Randy Dunlap.

- add paragraph on switch/case indentation (with fixes)
- add paragraph on multiple-assignments
- add more on Braces
- add section on Spaces; add typeof, alignof, & __attribute__ with sizeof;
  add more on postfix/prefix increment/decrement operators
- add paragraph on function breaks in source files; add info on
  function prototype parameter names
- add paragraph on EXPORT_SYMBOL placement
- add section on /*-comment style, long-comment style, and data
  declarations and comments
- correct some chapter number references that were missed when
  chapters were renumbered

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Jesper Juhl <jesper.juhl@gmail.com>
Acked-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] spi: stabilize PIO mode transfers on PXA2xx systems
Stephen Street [Sun, 10 Dec 2006 10:18:54 +0000 (02:18 -0800)]
[PATCH] spi: stabilize PIO mode transfers on PXA2xx systems

Stabilize PIO mode transfers against a range of word sizes and FIFO
thresholds and fix word size setup/override issues.

1) 16 and 32 bit DMA/PIO transfers broken due to timing differences.
2) Potential for bad transfer counts due to transfer size assumptions.
3) Setup function broken in multiple ways.
4) Per transfer bits_per_word changes break DMA setup in pump_transfers.
5) False positive timeouts are not errors.
6) Changes in pxa2xx_spi_chip not effective in calls to setup.
7) Timeout scaling wrong for PXA255 NSSP.
8) Driver leaks memory while busy during unloading.

Known issues:

SPI_CS_HIGH and SPI_LSB_FIRST settings in struct spi_device are not handled.

Testing:

This patch has been tested against the "random length, random bits/word,
random data (verified on loopback) and stepped baud rate by octaves
(3.6MHz to 115kHz)" test.  It is robust in PIO mode, using any
combination of tx and rx thresholds, and also in DMA mode (which
internally computes the thresholds).

Much thanks to Ned Forrester for exhaustive reviews, fixes and testing.
The driver is substantially better for his efforts.

Signed-off-by: Stephen Street <stephen@streetfiresound.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] ide: complete switch to pci_get
Alan Cox [Sun, 10 Dec 2006 10:18:53 +0000 (02:18 -0800)]
[PATCH] ide: complete switch to pci_get

The reverse get function allows the final piece of the switch to pci_get to
be completed for the old IDE layer.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] xtensa: fix system call interface
Chris Zankel [Sun, 10 Dec 2006 10:18:52 +0000 (02:18 -0800)]
[PATCH] xtensa: fix system call interface

This is a long outstanding patch to finally fix the syscall interface.  The
constants used for the system calls are those we have provided in our libc
patches.  This patch also fixes the shmbuf and stat structure, and fcntl
definitions.

Signed-off-by: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] xtensa: remove extra header files
Chris Zankel [Sun, 10 Dec 2006 10:18:48 +0000 (02:18 -0800)]
[PATCH] xtensa: remove extra header files

The Xtensa port contained many header files that were never needed.  This
rather lengthy patch removes all those files.  Unfortunately, there were
many dependencies that needed to be updated, so this patch touches quite a
few source files.

Signed-off-by: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] xtensa: fix irq and misc fixes
Chris Zankel [Sun, 10 Dec 2006 10:18:47 +0000 (02:18 -0800)]
[PATCH] xtensa: fix irq and misc fixes

Update the architecture specific interrupt handling code for Xtensa to support
the new API.  Use generic BUG macros in bug.h, and some minor fixes.

Signed-off-by: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] read_zero_pagealigned() locking fix
Hugh Dickins [Sun, 10 Dec 2006 10:18:43 +0000 (02:18 -0800)]
[PATCH] read_zero_pagealigned() locking fix

Ramiro Voicu hits the BUG_ON(!pte_none(*pte)) in zeromap_pte_range: kernel
bugzilla 7645.  Right: read_zero_pagealigned uses down_read of mmap_sem,
but another thread's racing read of /dev/zero, or a normal fault, can
easily set that pte again, in between zap_page_range and zeromap_page_range
getting there.  It's been wrong ever since 2.4.3.

The simple fix is to use down_write instead, but that would serialize reads
of /dev/zero more than at present: perhaps some app would be badly
affected.  So instead let zeromap_page_range return the error instead of
BUG_ON, and read_zero_pagealigned break to the slower clear_user loop in
that case - there's no need to optimize for it.

Use -EEXIST for when a pte is found: BUG_ON in mmap_zero (the other user of
zeromap_page_range), though it really isn't interesting there.  And since
mmap_zero wants -EAGAIN for out-of-memory, the zeromaps better return that
than -ENOMEM.
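
The "return an error and fall back" pattern described above can be sketched in
userspace C.  This is only an analogy under stated assumptions, not the kernel
code: fast_zero(), slow_zero(), zero_read() and the populated[] array are
hypothetical stand-ins for zeromap_page_range, the clear_user loop,
read_zero_pagealigned and the page table entries.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical fast path: refuses to run if any slot is already populated
 * and reports -EEXIST instead of aborting (the analogue of replacing
 * BUG_ON with an error return).
 */
static int fast_zero(unsigned char *buf, size_t len,
                     const int *populated, size_t nslots)
{
    for (size_t i = 0; i < nslots; i++)
        if (populated[i])
            return -EEXIST;
    memset(buf, 0, len);
    return 0;
}

/* Slow path: clears byte by byte and cannot fail. */
static void slow_zero(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = 0;
}

/* Returns 1 if the fast path was used, 0 if the caller fell back. */
static int zero_read(unsigned char *buf, size_t len,
                     const int *populated, size_t nslots)
{
    assert(buf != NULL);
    if (fast_zero(buf, len, populated, nslots) == 0)
        return 1;
    slow_zero(buf, len);    /* rare case, so no need to optimize it */
    return 0;
}
```

Either way the caller ends up with a zeroed buffer; only the cost differs, so
the racing case no longer needs stronger locking.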

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Ramiro Voicu <Ramiro.Voicu@cern.ch>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] kbuild: don't put temp files in source
Roman Zippel [Sun, 10 Dec 2006 10:18:41 +0000 (02:18 -0800)]
[PATCH] kbuild: don't put temp files in source

The as-instr/ld-option helpers need to create temporary files, but should
create them in the output directory when compiling external modules.  Reformat
them a bit and use $(CC) instead of $(AS), as the former is used by kbuild to
assemble files.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: <jpdenheijer@gmail.com>
Cc: Horst Schirmeier <horst@schirmeier.com>
Cc: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] kbuild: fix-rR-is-now-default
Oleg Verych [Sun, 10 Dec 2006 10:18:40 +0000 (02:18 -0800)]
[PATCH] kbuild: fix-rR-is-now-default

`make -d help | grep Makefile` shows patterns, where make tries to rebuild
included and top makefiles.

While the `make -rR is now default' commit should have fixed this, it was
actually just a little janitorial work.

This fix aims to complete the cancelling of implicit rules.

Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Oleg Verych <olecom@flower.upol.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] Kconfig refactoring for better menu nesting
Don Mullis [Sun, 10 Dec 2006 10:18:37 +0000 (02:18 -0800)]
[PATCH] Kconfig refactoring for better menu nesting

Refactor Kconfig content to maximize nesting of menus by menuconfig and
xconfig.

Tested by simultaneously running `make xconfig` with and without
patch, and comparing displays.

Signed-off-by: Don Mullis <dwm@meer.net>
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] ipc-procfs-sysctl mixups
Randy Dunlap [Sun, 10 Dec 2006 10:18:36 +0000 (02:18 -0800)]
[PATCH] ipc-procfs-sysctl mixups

When CONFIG_PROC_FS=n and CONFIG_PROC_SYSCTL=n but CONFIG_SYSVIPC=y, we get
this build error:

kernel/built-in.o:(.data+0xc38): undefined reference to `proc_ipc_doulongvec_minmax'
kernel/built-in.o:(.data+0xc88): undefined reference to `proc_ipc_doulongvec_minmax'
kernel/built-in.o:(.data+0xcd8): undefined reference to `proc_ipc_dointvec'
kernel/built-in.o:(.data+0xd28): undefined reference to `proc_ipc_dointvec'
kernel/built-in.o:(.data+0xd78): undefined reference to `proc_ipc_dointvec'
kernel/built-in.o:(.data+0xdc8): undefined reference to `proc_ipc_dointvec'
kernel/built-in.o:(.data+0xe18): undefined reference to `proc_ipc_dointvec'
make: *** [vmlinux] Error 1

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] ucb1400_ts depends SND_AC97_BUS
Randy Dunlap [Sun, 10 Dec 2006 10:18:34 +0000 (02:18 -0800)]
[PATCH] ucb1400_ts depends SND_AC97_BUS

This driver is an AC97 codec according to its help text.  However, if SOUND is
disabled, the "select SND_AC97_BUS" still inserts that into the .config file:

#
# Sound
#
# CONFIG_SOUND is not set
CONFIG_SND_AC97_BUS=m

Even if the config software followed dependency chains on selects, we should
try to limit usage of "select" to library-type code that is needed (e.g., CRC
functions) instead of bus-type support.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] workstruct: fix ieee80211-softmac compile problem
David Howells [Sun, 10 Dec 2006 10:18:31 +0000 (02:18 -0800)]
[PATCH] workstruct: fix ieee80211-softmac compile problem

Fix ieee80211-softmac compile problem where it's using schedule_work() on a
delayed_work struct.

Signed-off-by: David Howells <dhowells@redhat.com>
Cc: "John W. Linville" <linville@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years agoMerge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6
Linus Torvalds [Sat, 9 Dec 2006 21:31:07 +0000 (13:31 -0800)]
Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6

* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6:
  [PATCH] x86-64: no paravirt for X86_VOYAGER or X86_VISWS
  [PATCH] i386: Fix io_apic.c warning
  [PATCH] i386: export smp_num_siblings for oprofile
  [PATCH] x86: Work around gcc 4.2 over aggressive optimizer
  [PATCH] x86: Fix boot hang due to nmi watchdog init code
  [PATCH] x86: Fix verify_quirk_intel_irqbalance()
  [PATCH] i386: Update defconfig
  [PATCH] x86-64: Update defconfig

17 years ago[PATCH] x86-64: no paravirt for X86_VOYAGER or X86_VISWS
Randy Dunlap [Sat, 9 Dec 2006 20:33:36 +0000 (21:33 +0100)]
[PATCH] x86-64: no paravirt for X86_VOYAGER or X86_VISWS

Since Voyager and Visual WS already define ARCH_SETUP,
it looks like PARAVIRT shouldn't be offered for them.

In file included from arch/i386/kernel/setup.c:63:
include/asm-i386/mach-visws/setup_arch.h:8:1: warning: "ARCH_SETUP" redefined
In file included from include/asm/msr.h:5,
                 from include/asm/processor.h:17,
                 from include/asm/thread_info.h:16,
                 from include/linux/thread_info.h:21,
                 from include/linux/preempt.h:9,
                 from include/linux/spinlock.h:49,
                 from include/linux/capability.h:45,
                 from include/linux/sched.h:46,
                 from arch/i386/kernel/setup.c:26:
include/asm/paravirt.h:163:1: warning: this is the location of the previous definition
In file included from arch/i386/kernel/setup.c:63:
include/asm-i386/mach-visws/setup_arch.h:8:1: warning: "ARCH_SETUP" redefined
In file included from include/asm/msr.h:5,
                 from include/asm/processor.h:17,
                 from include/asm/thread_info.h:16,
                 from include/linux/thread_info.h:21,
                 from include/linux/preempt.h:9,
                 from include/linux/spinlock.h:49,
                 from include/linux/capability.h:45,
                 from include/linux/sched.h:46,
                 from arch/i386/kernel/setup.c:26:
include/asm/paravirt.h:163:1: warning: this is the location of the previous definition

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
17 years ago[PATCH] i386: Fix io_apic.c warning
Andi Kleen [Sat, 9 Dec 2006 20:33:36 +0000 (21:33 +0100)]
[PATCH] i386: Fix io_apic.c warning

gcc 4.2 warns

linux/arch/i386/kernel/io_apic.c: In function ‘create_irq’:
linux/arch/i386/kernel/io_apic.c:2488: warning: ‘vector’ may be used uninitialized in this function

The warning is false, but somewhat legitimate so work around it.
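
A minimal sketch of this kind of false positive and its workaround, assuming a
simplified shape for the code: create_slot(), in_use[] and the 0x20 base are
hypothetical and are not taken from io_apic.c.  The variable is only read on a
path where it has been assigned, but the compiler cannot always prove that, so
it is given an initial value.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Finds the first free slot and reports the vector chosen for it.
 * Returns the slot index, or -1 if every slot is in use.
 */
static int create_slot(const int *in_use, int n, int *vector_out)
{
    int slot = -1;
    int vector = 0;     /* workaround: explicit initial value */

    assert(vector_out != NULL);

    for (int i = 0; i < n; i++) {
        if (in_use[i])
            continue;
        vector = 0x20 + i;      /* hypothetical vector numbering */
        slot = i;
        break;
    }

    /* Only reached with slot >= 0 when vector was assigned in the loop. */
    if (slot >= 0)
        *vector_out = vector;

    return slot;
}
```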

Signed-off-by: Andi Kleen <ak@suse.de>
17 years ago[PATCH] i386: export smp_num_siblings for oprofile
Randy Dunlap [Sat, 9 Dec 2006 20:33:36 +0000 (21:33 +0100)]
[PATCH] i386: export smp_num_siblings for oprofile

oprofile uses smp_num_siblings without testing for CONFIG_X86_HT.
I looked at modifying oprofile, but this way is cleaner & simpler
and I didn't see a good reason not to just export it when CONFIG_SMP.

WARNING: "smp_num_siblings" [arch/i386/oprofile/oprofile.ko] undefined!

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andi Kleen <ak@suse.de>
17 years ago[PATCH] x86: Work around gcc 4.2 over aggressive optimizer
Andi Kleen [Sat, 9 Dec 2006 20:33:36 +0000 (21:33 +0100)]
[PATCH] x86: Work around gcc 4.2 over aggressive optimizer

The new PDA code uses a dummy _proxy_pda variable to describe
memory references to the PDA. It is never referenced
in inline assembly, but exists as input/output arguments.
gcc 4.2 in some cases can CSE references to this which causes
unresolved symbols.  Define it to zero to avoid this.

Signed-off-by: Andi Kleen <ak@suse.de>
17 years ago[PATCH] x86: Fix boot hang due to nmi watchdog init code
Ravikiran G Thirumalai [Sat, 9 Dec 2006 20:33:35 +0000 (21:33 +0100)]
[PATCH] x86: Fix boot hang due to nmi watchdog init code

2.6.19 stopped booting (or booted based on build/config) on our x86_64
systems due to a bug introduced in 2.6.19.  check_nmi_watchdog schedules an
IPI on all cpus to busy wait on a flag, but fails to set the busywait
flag if NMI functionality is disabled.  This causes the secondary cpus
to spin in an endless loop, causing the kernel bootup to hang.
Depending upon the build, the busywait flag got overwritten (stack variable)
and caused the kernel to boot up on certain builds.  The following patch fixes
the bug by setting the busywait flag before returning from check_nmi_watchdog.
I guess using a stack variable is not good here as the calling function could
potentially return while the busy wait loop is still spinning on the flag.

AK: I redid the patch significantly to be cleaner
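
The two points in the description (release the flag on every return path, and
do not keep it on the stack) can be illustrated with a single-threaded
userspace sketch.  This is not the merged patch: check_watchdog(),
secondary_wait() and the bounded spin count are hypothetical, and the bound
exists only so the example cannot hang.

```c
#include <assert.h>

/* Static storage: stays valid after check_watchdog() has returned. */
static volatile int endflag;

/*
 * Hypothetical secondary-CPU wait.  Spins until released or until the
 * bound runs out.  Returns 1 if released, 0 if it would still be spinning.
 */
static int secondary_wait(unsigned long max_spins)
{
    assert(max_spins > 0);
    while (max_spins--) {
        if (endflag)
            return 1;
    }
    return 0;
}

/* Returns 0 on success, -1 if the watchdog is disabled. */
static int check_watchdog(int nmi_enabled)
{
    int ret = 0;

    endflag = 0;
    if (!nmi_enabled) {
        ret = -1;
        goto out;       /* early exit still goes through the release */
    }
    /* the watchdog self-test would run here */
out:
    endflag = 1;        /* release the waiters on every path */
    return ret;
}
```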

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>
17 years ago[PATCH] x86: Fix verify_quirk_intel_irqbalance()
Andi Kleen [Sat, 9 Dec 2006 20:33:35 +0000 (21:33 +0100)]
[PATCH] x86: Fix verify_quirk_intel_irqbalance()

Fix verify_quirk_intel_irqbalance(). genapic checks should really
happen only on affected versions of the E7520/E7320/E7525 based platforms.

AK: This should fix akpm's Coyote SDV

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
17 years ago[PATCH] i386: Update defconfig
Andi Kleen [Sat, 9 Dec 2006 20:33:35 +0000 (21:33 +0100)]
[PATCH] i386: Update defconfig

Signed-off-by: Andi Kleen <ak@suse.de>
17 years ago[PATCH] x86-64: Update defconfig
Andi Kleen [Sat, 9 Dec 2006 20:33:35 +0000 (21:33 +0100)]
[PATCH] x86-64: Update defconfig

Signed-off-by: Andi Kleen <ak@suse.de>
17 years agoMerge branch 'drm-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied...
Linus Torvalds [Sat, 9 Dec 2006 20:26:37 +0000 (12:26 -0800)]
Merge branch 'drm-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6

* 'drm-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (21 commits)
  Fix http://bugzilla.kernel.org/show_bug.cgi?id=7606
  drm: add flag for mapping PCI DMA buffers read-only.
  drm: fix up irqflags in drm_lock.c
  drm: i915 updates
  drm: i915: fix up irqflags arg
  drm: i915: Only return EBUSY after we've established we need to schedule a new swap.
  drm: i915: Fix 'sequence has passed' condition in i915_vblank_swap().
  drm: i915: Add SAREA fields for determining which pipe to sync window buffer swaps to.
  drm: Make handling of dev_priv->vblank_pipe more robust.
  drm: DRM_I915_VBLANK_SWAP ioctl: Take drm_vblank_seq_type_t instead
  drm: i915: Add ioctl for scheduling buffer swaps at vertical blanks.
  drm: Core vsync: Don't clobber target sequence number when scheduling signal.
  drm: Core vsync: Add flag DRM_VBLANK_NEXTONMISS.
  drm: Make locked tasklet handling more robust.
  drm: drm_rmdraw: Declare id and idx as signed so testing for < 0 works as intended.
  drm: Change first valid DRM drawable ID to be 1 instead of 0.
  drm: drawable locking + memory management fixes + copyright
  drm: Add support for interrupt triggered driver callback with lock held to DRM core.
  drm: Add support for tracking drawable information to core
  drm: add support for secondary vertical blank interrupt to i915
  ...

17 years ago[PATCH] WorkStruct: Use direct assignment rather than cmpxchg()
David Howells [Thu, 7 Dec 2006 11:33:26 +0000 (11:33 +0000)]
[PATCH] WorkStruct: Use direct assignment rather than cmpxchg()

Use direct assignment rather than cmpxchg() as the latter is unavailable
and unimplementable on some platforms and is actually unnecessary.

The use of cmpxchg() was to guard against two possibilities, neither of
which can actually occur:

 (1) The pending flag may have been unset or may be cleared.  However, given
     where it's called, the pending flag is _always_ set.  I don't think it
     can be unset whilst we're in set_wq_data().

     Once the work is enqueued to be actually run, the only way off the queue
     is for it to be actually run.

     If it's a delayed work item, then the bit can't be cleared by the timer
     because we haven't started the timer yet.  Also, the pending bit can't be
     cleared by cancelling the delayed work _until_ the work item has had its
     timer started.

 (2) The workqueue pointer might change.  This can only happen in two cases:

     (a) The work item has just been queued to actually run, and so we're
         protected by the appropriate workqueue spinlock.

     (b) A delayed work item is being queued, and so the timer hasn't been
         started yet, and so no one else knows about the work item or can
         access it (the pending bit protects us).

     Besides, set_wq_data() _sets_ the workqueue pointer unconditionally, so
     it can be assigned instead.

So, replacing the cmpxchg() in set_wq_data() with a straight assignment would
be okay in most cases.

The problem is where we end up tangling with test_and_set_bit() emulated
using spinlocks, and even then it's not a problem _provided_
test_and_set_bit() doesn't attempt to modify the word if the bit was
set.

If that's a problem, then a bitops-proofed assignment will be required -
equivalent to atomic_set() vs other atomic_xxx() ops.
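
A userspace sketch of the idea, assuming a word that packs an aligned queue
pointer together with a couple of low flag bits.  The struct, the mask values
and the function names here are hypothetical and only mirror the shape of
set_wq_data(); the point is that the new word is built from the pointer plus
the existing flag bits and then stored with a plain assignment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: low two bits are flags, the rest is the pointer. */
#define WORK_FLAG_PENDING ((uintptr_t)0x1)
#define WORK_FLAG_MASK    ((uintptr_t)0x3)

struct work_item {
    uintptr_t data;     /* pointer bits | flag bits */
};

static void set_queue(struct work_item *work, void *queue)
{
    uintptr_t new_data;

    /* The caller must already own the pending bit. */
    assert(work->data & WORK_FLAG_PENDING);
    /* The pointer must leave the flag bits free. */
    assert(((uintptr_t)queue & WORK_FLAG_MASK) == 0);

    new_data = (uintptr_t)queue | (work->data & WORK_FLAG_MASK);
    work->data = new_data;      /* plain store, no compare-and-exchange */
}

static void *get_queue(const struct work_item *work)
{
    return (void *)(work->data & ~WORK_FLAG_MASK);
}
```

The plain store is only safe under the conditions argued above: nothing else
may modify the word while the pending bit is held by the caller.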

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] Amiga PCMCIA NE2000 Ethernet dev->irq init
Kars de Jong [Sat, 9 Dec 2006 09:51:03 +0000 (10:51 +0100)]
[PATCH] Amiga PCMCIA NE2000 Ethernet dev->irq init

Amiga PCMCIA NE2000 Ethernet: Add missing initialization of dev->irq

Signed-off-by: Kars de Jong <jongk@linux-m68k.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] m68k: EXPORT_SYMBOL(cache_{clear,push}) bogus comment
Geert Uytterhoeven [Sat, 9 Dec 2006 09:50:15 +0000 (10:50 +0100)]
[PATCH] m68k: EXPORT_SYMBOL(cache_{clear,push}) bogus comment

Remove bogus comments about unexporting cache_{push,clear}(), as inline
dma_cache_maintenance() (used by at least bionet and pamsnet) calls them.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] m68k/Atari: 2.6.18 Atari IDE interrupt needs SA_SHIRQ
Michael Schmitz [Sat, 9 Dec 2006 09:46:30 +0000 (10:46 +0100)]
[PATCH] m68k/Atari: 2.6.18 Atari IDE interrupt needs SA_SHIRQ

Atari IDE: The interrupt needs SA_SHIRQ now to get registered.

Signed-off-by: Michael Schmitz <schmitz@biophys.uni-duesseldorf.de>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] Sun3 SCSI: Make sun3 scsi drivers compile/work again
Sam Creasey [Sat, 9 Dec 2006 09:37:05 +0000 (10:37 +0100)]
[PATCH] Sun3 SCSI: Make sun3 scsi drivers compile/work again

Make sun3 scsi drivers compile/work again (though with way too many warnings...)

Tested on 3/50, 3/60.

Signed-off-by: Sam Creasey <sammy@sammy.net>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] Sun3: General updates
Sam Creasey [Sat, 9 Dec 2006 09:34:38 +0000 (10:34 +0100)]
[PATCH] Sun3: General updates

General compile fixes for 2.6.16 for sun3, and some updates to make the new
bootloader work correctly.  Tested on 3/50, 3/60, 3/80.

Signed-off-by: Sam Creasey <sammy@sammy.net>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] m68k/HP300: HP LANCE updates
Kars de Jong [Sat, 9 Dec 2006 09:29:58 +0000 (10:29 +0100)]
[PATCH] m68k/HP300: HP LANCE updates

- 7990: request_irq() should have SA_SHIRQ flag set
- hplance_init() printed dev->name before register_netdev() had filled it in

Signed-off-by: Kars de Jong <jongk@linux-m68k.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years agoMerge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Sat, 9 Dec 2006 17:38:59 +0000 (09:38 -0800)]
Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
  [NETLINK]: Put {IFA,IFLA}_{RTA,PAYLOAD} macros back for userspace.
  [NET_SCHED] sch_htb: turn intermediate classes into leaves
  [NET_SCHED] sch_cbq: deactivating when grafting, purging etc.
  [XFRM]: Fix XFRMGRP_REPORT to use correct multicast group.
  [NET]: Force a cache line split in hh_cache in SMP.
  [NETPOLL]: make arp replies through netpoll use mac address of sender
  [NETLINK]: Restore API compatibility of address and neighbour bits
  [AX.25]: Fix default address and broadcast address initialization.
  [AX.25]: Constify ax25 utility functions
  [BNX2]: Add an error check.
  [NET]: Convert hh_lock to seqlock.

17 years agoMerge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
Linus Torvalds [Sat, 9 Dec 2006 01:21:38 +0000 (17:21 -0800)]
Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus

* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
  [PATCH] add STB810 support (Philips PNX8550-based)
  [MIPS] Qemu now has an ELF loader.
  [MIPS] Add GENERIC_HARDIRQS_NO__DO_IRQ for i8259 users
  [MIPS] Optimize csum_partial for 64bit kernel
  [MIPS] Optimize flow of csum_partial
  [MIPS] Make csum_partial more readable
  [MIPS] Rename SNI_RM200_PCI to just SNI_RM preparing for more RM machines

17 years ago[NETLINK]: Put {IFA,IFLA}_{RTA,PAYLOAD} macros back for userspace.
David S. Miller [Sat, 9 Dec 2006 01:05:13 +0000 (17:05 -0800)]
[NETLINK]: Put {IFA,IFLA}_{RTA,PAYLOAD} macros back for userspace.

GLIBC uses them etc.

They are guarded by ifndef __KERNEL__ so nobody will start
accidentally using them in the kernel again, it's just for
userspace.

Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NET_SCHED] sch_htb: turn intermediate classes into leaves
Jarek Poplawski [Fri, 8 Dec 2006 08:26:56 +0000 (00:26 -0800)]
[NET_SCHED] sch_htb: turn intermediate classes into leaves

- turn intermediate classes into leaves again when their
  last child is deleted (struct htb_class changed)

Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NET_SCHED] sch_cbq: deactivating when grafting, purging etc.
Jarek Poplawski [Fri, 8 Dec 2006 08:25:55 +0000 (00:25 -0800)]
[NET_SCHED] sch_cbq: deactivating when grafting, purging etc.

- deactivating of active classes when q.qlen drops to zero
  (cbq_drop)

- a redundant instruction removed from cbq_deactivate_class

PS: probably htb_deactivate in htb_delete and
cbq_deactivate_class in cbq_delete are also
redundant now.

Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>