err.no Git - linux-2.6/log
linux-2.6
17 years ago sched: tidy up SCHED_RR
Dmitry Adamushko [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)]
sched: tidy up SCHED_RR

- make timeslices of SCHED_RR tasks constant and not
dependent on task's static_prio [1] ;
- remove obsolete code (timeslice related bits);
- make sched_rr_get_interval() return something more
meaningful [2] for SCHED_OTHER tasks.

[1] according to the following link, it's not compliant with SUSv3
(not sure though, what is the reference for us :-)
http://lkml.org/lkml/2007/3/7/656

[2] the interval is dynamic and can be depicted as follows "should a
task be one of the runnable tasks at this particular moment, it would
expect to run for this interval of time before being re-scheduled by the
scheduler tick".
(i.e. it's more precise if a task is runnable at the moment)

yeah, this seems to require task_rq_lock/unlock() but this is not a hot
path.

results:

(SCHED_FIFO)

dimm@earth:~/storage/prog$ sudo chrt -f 10 ./rr_interval
time_slice: 0 : 0

(SCHED_RR)

dimm@earth:~/storage/prog$ sudo chrt 10 ./rr_interval
time_slice: 0 : 99984800

(SCHED_NORMAL)

dimm@earth:~/storage/prog$ ./rr_interval
time_slice: 0 : 19996960

(SCHED_NORMAL + a cpu_hog of similar 'weight' on the same CPU --- so should be a half of the previous result)

dimm@earth:~/storage/prog$ taskset 1 ./rr_interval
time_slice: 0 : 9998480

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
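
The source of the rr_interval test program is not part of this log. A
minimal sketch of what such a helper might look like, assuming it simply
wraps the POSIX sched_rr_get_interval() call and prints the result in
the "time_slice: sec : nsec" format seen above (the function names
query_rr_interval and format_rr_interval are hypothetical):

```c
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>

/*
 * Ask the kernel for the round-robin interval of `pid`
 * (0 means the calling thread). Returns 0 on success, -1 on error.
 */
int query_rr_interval(pid_t pid, struct timespec *ts)
{
	return sched_rr_get_interval(pid, ts);
}

/* Format a result as "time_slice: <sec> : <nsec>" into buf. */
int format_rr_interval(const struct timespec *ts, char *buf, size_t len)
{
	return snprintf(buf, len, "time_slice: %ld : %ld",
			(long)ts->tv_sec, (long)ts->tv_nsec);
}
```

Started under chrt with different policies, a program built on these
helpers would report the interval for the policy it is running under.
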
17 years ago sched: uninline scheduler
Alexey Dobriyan [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)]
sched: uninline scheduler

* save ~300 bytes
* activate_idle_task() was moved to avoid a warning

bloat-o-meter output:

add/remove: 6/0 grow/shrink: 0/16 up/down: 438/-733 (-295) <===
function                                     old     new   delta
__enqueue_entity                               -     165    +165
finish_task_switch                             -     110    +110
update_curr_rt                                 -      79     +79
__load_balance_iterator                        -      32     +32
__task_rq_unlock                               -      28     +28
find_process_by_pid                            -      24     +24
do_sched_setscheduler                        133     123     -10
sys_sched_rr_get_interval                    176     165     -11
sys_sched_getparam                           156     145     -11
normalize_rt_tasks                           482     470     -12
sched_getaffinity                            112      99     -13
sys_sched_getscheduler                        86      72     -14
sched_setaffinity                            226     212     -14
sched_setscheduler                           666     642     -24
load_balance_start_fair                       33       9     -24
load_balance_next_fair                        33       9     -24
dequeue_task_rt                              133      67     -66
put_prev_task_rt                              97      28     -69
schedule_tail                                133      50     -83
schedule                                     682     594     -88
enqueue_entity                               499     366    -133
task_new_fair                                317     180    -137

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years ago sched: tweak wakeup granularity
Ingo Molnar [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)]
sched: tweak wakeup granularity

tweak wakeup granularity.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years ago sched: optimize schedule() a bit on SMP
Ingo Molnar [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)]
sched: optimize schedule() a bit on SMP

optimize schedule() a bit on SMP, by moving the rq-clock update
outside the rq lock.

code size is the same:

      text    data     bss     dec     hex filename
     25725    2666      96   28487    6f47 sched.o.before
     25725    2666      96   28487    6f47 sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: fix __pick_next_entity()
Dmitry Adamushko [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)]
sched: fix __pick_next_entity()

The thing is that __pick_next_entity() must never be called when
first_fair(cfs_rq) == NULL. It wouldn't be a problem, should 'run_node'
be the very first field of 'struct sched_entity' (and it's the second).

The 'nr_running != 0' check is _not_ enough, due to the fact that
'current' is not within the tree. Generic paths are ok (e.g. schedule()
as put_prev_task() is called previously)... I'm more worried about e.g.
migration_call() -> CPU_DEAD_FROZEN -> migrate_dead_tasks()... if
'current' == rq->idle, no problems.. if it's one of the SCHED_NORMAL
tasks (or imagine, some other use-cases in the future -- i.e. we should
not make outer world dependent on internal details of sched_fair class)
-- it may be "Houston, we've got a problem" case.

The fix adds 16 bytes to ".text". Another variant is to make 'run_node'
the first data member of 'struct sched_entity', but an additional check
(se != NULL) is still needed in pick_next_entity().

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
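
The hazard described above can be sketched outside the kernel. The
types below are simplified stand-ins, not the kernel's definitions:
because the tree node is not the first field, converting a NULL node
pointer to its containing entity would not yield NULL, so the empty-tree
case has to be checked before the conversion.

```c
#include <stddef.h>

/* Stand-in for a red-black tree node (hypothetical, not struct rb_node). */
struct rb_node_sketch {
	struct rb_node_sketch *left, *right;
};

/* Stand-in for struct sched_entity: run_node is the second field. */
struct entity_sketch {
	unsigned long load;             /* comes first, so run_node is not at offset 0 */
	struct rb_node_sketch run_node;
};

/* Unchecked node-to-entity conversion; only valid for a non-NULL node. */
struct entity_sketch *entity_of(struct rb_node_sketch *node)
{
	return (struct entity_sketch *)
		((char *)node - offsetof(struct entity_sketch, run_node));
}

/* Checked variant: an empty tree (no leftmost node) yields NULL. */
struct entity_sketch *pick_first_entity(struct rb_node_sketch *leftmost)
{
	if (!leftmost)
		return NULL;
	return entity_of(leftmost);
}
```
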
17 years ago sched: vslice fixups for non-0 nice levels
Ingo Molnar [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)]
sched: vslice fixups for non-0 nice levels

Make vslice accurate wrt nice levels, and add some comments
while we're at it.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: whitespace cleanups
Ingo Molnar [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)]
sched: whitespace cleanups

more whitespace cleanups. No code changed:

      text    data     bss     dec     hex filename
     26553    2790     288   29631    73bf sched.o.before
     26553    2790     288   29631    73bf sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: mark scheduling classes as const
Ingo Molnar [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)]
sched: mark scheduling classes as const

mark scheduling classes as const. This speeds up the code
a bit and shrinks it overall (text grows, data shrinks more):

   text    data     bss     dec     hex filename
  40027    4018     292   44337    ad31 sched.o.before
  40190    3842     292   44324    ad24 sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: group scheduler, fix latency
Srivatsa Vaddagiri [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)]
sched: group scheduler, fix latency

When a task of a group moves from one cpu to another, the group may
gain more cpu time than desired. See
http://marc.info/?l=linux-kernel&m=119073197730334 for details.

This is an attempt to fix that problem. Basically it simulates dequeue
of higher level entities as if they are going to sleep. Similarly it
simulates wakeup of higher level entities as if they are waking up from
sleep.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: group scheduler, fix bloat
Srivatsa Vaddagiri [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)]
sched: group scheduler, fix bloat

Recent fix to check_preempt_wakeup() to check for preemption at higher
levels caused a size bloat for !CONFIG_FAIR_GROUP_SCHED.

Fix the problem.

   text    data     bss     dec     hex filename
  42277   10598     320   53195    cfcb kernel/sched.o-before_this_patch
  42216   10598     320   53134    cf8e kernel/sched.o-after_this_patch

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: group scheduler, fix coding style issues
Srivatsa Vaddagiri [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)]
sched: group scheduler, fix coding style issues

Fix coding style issues reported by Randy Dunlap and others

Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: cleanup, remove stale comment
Ingo Molnar [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)]
sched: cleanup, remove stale comment

cleanup, remove stale comment.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: speed up and simplify vslice calculations
Peter Zijlstra [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)]
sched: speed up and simplify vslice calculations

speed up and simplify vslice calculations.

[ From: Mike Galbraith <efault@gmx.de>: build fix ]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years ago sched: clean up min_vruntime use
Peter Zijlstra [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)]
sched: clean up min_vruntime use

clean up min_vruntime use.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years ago sched: group scheduler SMP migration fix
Srivatsa Vaddagiri [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)]
sched: group scheduler SMP migration fix

group scheduler SMP migration fix: use task_cfs_rq(p) to get
to the relevant fair-scheduling runqueue of a task, rq->cfs
is not the right one.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years ago sched: clean up schedstats, cnt -> count
Ingo Molnar [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)]
sched: clean up schedstats, cnt -> count

rename all 'cnt' fields and variables to the less yucky 'count' name.

yuckage noticed by Andrew Morton.

no change in code, other than the /proc/sched_debug bkl_count string got
a bit larger:

   text    data     bss     dec     hex filename
  38236    3506      24   41766    a326 sched.o.before
  38240    3506      24   41770    a32a sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: yield fix
Dmitry Adamushko [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)]
sched: yield fix

fix yield bugs due to the current-not-in-rbtree changes: the task is
not in the rbtree so rbtree-removal is a no-no.

[ From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>: build fix. ]

also, nice code size reduction:

kernel/sched.o:
   text    data     bss     dec     hex filename
  38323    3506      24   41853    a37d sched.o.before
  38236    3506      24   41766    a326 sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: group scheduler wakeup latency fix
Srivatsa Vaddagiri [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)]
sched: group scheduler wakeup latency fix

group scheduler wakeup latency fix: when checking for preemption
we must check cross-group too, not just intra-group.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years ago sched: remove set_leftmost()
Ingo Molnar [Mon, 15 Oct 2007 15:00:11 +0000 (17:00 +0200)]
sched: remove set_leftmost()

Lee Schermerhorn noticed that set_leftmost() contains dead code,
remove this.

Reported-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: clean up sched_fork()
Hiroshi Shimamoto [Mon, 15 Oct 2007 15:00:11 +0000 (17:00 +0200)]
sched: clean up sched_fork()

Adjusting sched_class is a missing part of the already existing "do
not leak PI boosting priority to the child" logic in sched_fork(). This
patch moves the sched_class adjustment from wake_up_new_task() to
sched_fork().

this also shrinks the code a bit:

   text    data     bss     dec     hex filename
  40111    4018     292   44421    ad85 sched.o.before
  40102    4018     292   44412    ad7c sched.o.after

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: max_vruntime() simplification
Peter Zijlstra [Mon, 15 Oct 2007 15:00:11 +0000 (17:00 +0200)]
sched: max_vruntime() simplification

max_vruntime() simplification.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
17 years ago sched: fix sched_fork()
Ingo Molnar [Mon, 15 Oct 2007 15:00:11 +0000 (17:00 +0200)]
sched: fix sched_fork()

fix sched_fork(): large latencies at new task creation time because
the ->vruntime was not fixed up cross-CPU, if the parent got migrated
after the child's CPU got set up.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: fix sign check error in place_entity()
Ingo Molnar [Mon, 15 Oct 2007 15:00:11 +0000 (17:00 +0200)]
sched: fix sign check error in place_entity()

fix sign check error in place_entity() - we'd get excessive
latencies due to negatives being converted to large u64's.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
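
The class of bug named above can be illustrated with a small sketch. It
is not the code from place_entity(); the function names and the choice
to floor the result at zero are assumptions made for illustration. An
unsigned subtraction that "goes negative" wraps to a huge u64, so the
sign has to be tested on a signed view of the difference.

```c
#include <stdint.h>

/*
 * Unchecked pattern: if credit > vruntime the subtraction wraps and
 * the result is a very large unsigned value instead of a negative one.
 */
uint64_t credit_unchecked(uint64_t vruntime, uint64_t credit)
{
	return vruntime - credit;
}

/*
 * Checked pattern: reinterpret the difference as signed 64-bit
 * (two's complement, as on all Linux targets) and floor it at zero.
 */
uint64_t credit_checked(uint64_t vruntime, uint64_t credit)
{
	int64_t delta = (int64_t)(vruntime - credit);

	return delta < 0 ? 0 : (uint64_t)delta;
}
```
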
17 years ago sched: undo some of the recent changes
Ingo Molnar [Mon, 15 Oct 2007 15:00:11 +0000 (17:00 +0200)]
sched: undo some of the recent changes

undo some of the recent changes that are not needed after all,
such as last_min_vruntime.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
17 years ago sched: remove last_min_vruntime effect
Ingo Molnar [Mon, 15 Oct 2007 15:00:11 +0000 (17:00 +0200)]
sched: remove last_min_vruntime effect

remove last_min_vruntime use - prepare to remove it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
17 years ago sched: remove condition from set_task_cpu()
Ingo Molnar [Mon, 15 Oct 2007 15:00:11 +0000 (17:00 +0200)]
sched: remove condition from set_task_cpu()

remove condition from set_task_cpu(). Now that ->vruntime
is not global anymore, it should (and does) work fine without
it too.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
17 years ago sched: entity_key() fix
Ingo Molnar [Mon, 15 Oct 2007 15:00:11 +0000 (17:00 +0200)]
sched: entity_key() fix

entity_key() fix - we'd occasionally end up with a 0 vruntime
in the !initial case.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
17 years ago sched debug: check spread
Peter Zijlstra [Mon, 15 Oct 2007 15:00:10 +0000 (17:00 +0200)]
sched debug: check spread

debug feature: check how well we schedule within a reasonable
vruntime 'spread' range. (note that CPU overload can increase
the spread, so this is not a hard condition, but normal loads
should be within the spread.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
17 years ago sched debug: more width for parameter printouts
Ingo Molnar [Mon, 15 Oct 2007 15:00:10 +0000 (17:00 +0200)]
sched debug: more width for parameter printouts

more width for parameter printouts in /proc/sched_debug.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: add vslice
Peter Zijlstra [Mon, 15 Oct 2007 15:00:10 +0000 (17:00 +0200)]
sched: add vslice

add vslice: the load-dependent "virtual slice" a task should
run ideally, so that the observed latency stays within the
sched_latency window.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
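
The log does not show the vslice formula itself. As a rough sketch of
the idea of a load-dependent slice, assuming the latency window is
simply divided in proportion to a task's share of the runqueue weight
(the function name and the proportional rule are assumptions, not the
kernel's implementation):

```c
#include <stdint.h>

/*
 * Share of the latency window (in ns) for a task of the given weight
 * on a runqueue whose runnable tasks sum to total_weight.
 * An empty runqueue is treated as "the task gets the whole window".
 */
uint64_t slice_ns_sketch(uint64_t latency_ns, uint64_t weight,
			 uint64_t total_weight)
{
	if (total_weight == 0)
		return latency_ns;
	return latency_ns * weight / total_weight;
}
```

With a 20 ms window, one task alone would get the whole 20 ms, and two
tasks of equal weight would get 10 ms each.
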
17 years ago sched debug: print settings
Ingo Molnar [Mon, 15 Oct 2007 15:00:10 +0000 (17:00 +0200)]
sched debug: print settings

print the current value of all tunables in /proc/sched_debug output.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: remove unneeded tunables
Ingo Molnar [Mon, 15 Oct 2007 15:00:10 +0000 (17:00 +0200)]
sched: remove unneeded tunables

remove unneeded tunables.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched debug: BKL usage statistics, fix
S.Caglar Onur [Mon, 15 Oct 2007 15:00:10 +0000 (17:00 +0200)]
sched debug: BKL usage statistics, fix

build fix for the SCHED_DEBUG && !SCHEDSTATS case.

Signed-off-by: S.Caglar Onur <caglar@pardus.org.tr>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched debug: BKL usage statistics
Ingo Molnar [Mon, 15 Oct 2007 15:00:10 +0000 (17:00 +0200)]
sched debug: BKL usage statistics

add per task and per rq BKL usage statistics.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: enable CONFIG_FAIR_GROUP_SCHED=y by default
Ingo Molnar [Mon, 15 Oct 2007 15:00:09 +0000 (17:00 +0200)]
sched: enable CONFIG_FAIR_GROUP_SCHED=y by default

enable CONFIG_FAIR_GROUP_SCHED=y by default.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: fair-group sched, cleanups
Ingo Molnar [Mon, 15 Oct 2007 15:00:09 +0000 (17:00 +0200)]
sched: fair-group sched, cleanups

fair-group sched, cleanups.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: add fair-user scheduler
Srivatsa Vaddagiri [Mon, 15 Oct 2007 15:00:09 +0000 (17:00 +0200)]
sched: add fair-user scheduler

Enable user-id based fair group scheduling. This is useful for anyone
who wants to test the group scheduler w/o having to enable
CONFIG_CGROUPS.

A separate scheduling group (i.e. struct task_grp) is automatically created for
every new user added to the system. When a task's uid changes, the task is
moved to the corresponding scheduling group.

A /proc tunable (/proc/root_user_share) is also provided to tune root
user's quota of cpu bandwidth.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: clean up code under CONFIG_FAIR_GROUP_SCHED
Srivatsa Vaddagiri [Mon, 15 Oct 2007 15:00:09 +0000 (17:00 +0200)]
sched: clean up code under CONFIG_FAIR_GROUP_SCHED

With the view of supporting user-id based fair scheduling (and not just
container-based fair scheduling), this patch renames several functions
and makes them independent of whether they are being used for container
or user-id based fair scheduling.

Also fix a problem reported by KAMEZAWA Hiroyuki (wrt allocating
undersized arrays for tg->cfs_rq[] and tg->se[]).

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: print &rq->cfs stats
Srivatsa Vaddagiri [Mon, 15 Oct 2007 15:00:09 +0000 (17:00 +0200)]
sched: print &rq->cfs stats

- Print &rq->cfs statistics as well (useful for group scheduling)

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: print nr_running and load in /proc/sched_debug
Srivatsa Vaddagiri [Mon, 15 Oct 2007 15:00:09 +0000 (17:00 +0200)]
sched: print nr_running and load in /proc/sched_debug

- print nr_running and load information for cfs_rq in /proc/sched_debug

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: fix minor bug in yield
Srivatsa Vaddagiri [Mon, 15 Oct 2007 15:00:08 +0000 (17:00 +0200)]
sched: fix minor bug in yield

- fix a minor bug in yield (seen for CONFIG_FAIR_GROUP_SCHED),
  group scheduling would skew when yield was called.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: revert recent removal of set_curr_task()
Srivatsa Vaddagiri [Mon, 15 Oct 2007 15:00:08 +0000 (17:00 +0200)]
sched: revert recent removal of set_curr_task()

Revert removal of set_curr_task.
Use put_prev_task/set_curr_task when changing groups/policies

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
17 years ago sched: kernel/sched_fair.c whitespace cleanups
Ingo Molnar [Mon, 15 Oct 2007 15:00:08 +0000 (17:00 +0200)]
sched: kernel/sched_fair.c whitespace cleanups

some trivial whitespace cleanups.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: fix formatting of /proc/sched_debug
Mike Galbraith [Mon, 15 Oct 2007 15:00:08 +0000 (17:00 +0200)]
sched: fix formatting of /proc/sched_debug

fix formatting of /proc/sched_debug

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: enhance debug output
Ingo Molnar [Mon, 15 Oct 2007 15:00:08 +0000 (17:00 +0200)]
sched: enhance debug output

enhance debug output by changing 12345678 nsecs to 12.345678 output,
this is more human-readable.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
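
The output change described above amounts to splitting a nanosecond
count into a millisecond part and a six-digit remainder. A user-space
sketch of that formatting (the helper name is hypothetical and this is
not the kernel's own printing code):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Print a nanosecond count as "<msecs>.<6-digit remainder>", so that
 * 12345678 ns becomes "12.345678". Returns snprintf()'s result.
 */
int format_ns_as_msec(uint64_t ns, char *buf, size_t len)
{
	return snprintf(buf, len, "%" PRIu64 ".%06" PRIu64,
			ns / 1000000, ns % 1000000);
}
```
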
17 years ago sched: prettify /proc/sched_debug output
Ingo Molnar [Mon, 15 Oct 2007 15:00:08 +0000 (17:00 +0200)]
sched: prettify /proc/sched_debug output

print the correct amount of dashes in /proc/sched_debug.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: rework enqueue/dequeue_entity() to get rid of set_curr_task()
Dmitry Adamushko [Mon, 15 Oct 2007 15:00:08 +0000 (17:00 +0200)]
sched: rework enqueue/dequeue_entity() to get rid of set_curr_task()

rework enqueue/dequeue_entity() to get rid of
sched_class::set_curr_task(). This simplifies sched_setscheduler(),
rt_mutex_setprio() and sched_move_task().

   text    data     bss     dec     hex filename
  24330    2734      20   27084    69cc sched.o.before
  24233    2730      20   26983    6967 sched.o.after

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: simplify sched_class::yield_task()
Dmitry Adamushko [Mon, 15 Oct 2007 15:00:08 +0000 (17:00 +0200)]
sched: simplify sched_class::yield_task()

the 'p' (task_struct) parameter in the sched_class :: yield_task() is
redundant as the caller is always the 'current'. Get rid of it.

   text    data     bss     dec     hex filename
  24341    2734      20   27095    69d7 sched.o.before
  24330    2734      20   27084    69cc sched.o.after

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: optimize task_new_fair()
Dmitry Adamushko [Mon, 15 Oct 2007 15:00:08 +0000 (17:00 +0200)]
sched: optimize task_new_fair()

due to the fact that we no longer keep the 'current' within the tree,
dequeue/enqueue_entity() is useless for the 'current' in
task_new_fair(). We are about to reschedule and
sched_class->put_prev_task() will put the 'current' back into the tree,
based on its new key.

   text    data     bss     dec     hex filename
  24388    2734      20   27142    6a06 sched.o.before
  24341    2734      20   27095    69d7 sched.o.after

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: fix delay accounting performance regression
Ingo Molnar [Mon, 15 Oct 2007 15:00:08 +0000 (17:00 +0200)]
sched: fix delay accounting performance regression

fix delay accounting performance regression - those sched_clock()
calls are not needed.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: do not keep current in the tree and get rid of sched_entity::fair_key
Dmitry Adamushko [Mon, 15 Oct 2007 15:00:07 +0000 (17:00 +0200)]
sched: do not keep current in the tree and get rid of sched_entity::fair_key

Get rid of 'sched_entity::fair_key'.

As a side effect, 'current' is not kept within the tree for
SCHED_NORMAL/BATCH tasks anymore. This simplifies some parts of code
(e.g. entity_tick() and yield_task_fair()) and also somewhat optimizes
them (e.g. a single update_curr() now vs. dequeue/enqueue() before in
entity_tick()).

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: add set_curr_task() calls
Dmitry Adamushko [Mon, 15 Oct 2007 15:00:07 +0000 (17:00 +0200)]
sched: add set_curr_task() calls

p->sched_class->set_curr_task() has to be called before
activate_task()/enqueue_task() in rt_mutex_setprio(),
sched_setscheduler() and sched_move_task() in order to set up
'cfs_rq->curr'. The enqueueing logic depends on whether the task being
inserted is 'current' or not.

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: sched_setscheduler() fix
Dmitry Adamushko [Mon, 15 Oct 2007 15:00:07 +0000 (17:00 +0200)]
sched: sched_setscheduler() fix

Fix a problem in the 'sched-group' patch for !CONFIG_FAIR_GROUP_SCHED.

description:

sched_setscheduler()
{
	...
	if (task_running())
		p->sched_class->put_prev_entity();

	[ this one sets cfs_rq->curr to NULL ]

	...

	if (task_running())
		p->sched_class->set_curr_task();

	[ and this one is a _NOP_ (empty) for !CONFIG_FAIR_GROUP_SCHED ]
}

As a result, the task continues to run with cfs_rq->curr == NULL... no
crashes (due to checks for !NULL in place) but e.g. update_curr()
effectively becomes a NOP... i.e. runtime statistics for this task are
not accounted until it's rescheduled anew.

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: group-scheduler core
Srivatsa Vaddagiri [Mon, 15 Oct 2007 15:00:07 +0000 (17:00 +0200)]
sched: group-scheduler core

Add interface to control cpu bandwidth allocation to task-groups.

(not yet configurable, due to missing CONFIG_CONTAINERS)

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
17 years ago sched: fix SMP migration latencies
Mike Galbraith [Mon, 15 Oct 2007 15:00:07 +0000 (17:00 +0200)]
sched: fix SMP migration latencies

fix SMP migration latencies: the vruntimes of different CPUs are
at incompatible offsets so they have to be fixed up when migrating
a task across CPUs.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
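
The log does not include the fixup itself. One way to picture it,
assuming each CPU's runqueue has its own min_vruntime baseline and the
task should keep the same distance from the baseline after the move
(the function name and this exact rule are assumptions):

```c
#include <stdint.h>

/*
 * Re-express a task's vruntime relative to the destination runqueue:
 * keep its offset from the source baseline, apply it to the
 * destination baseline. Unsigned arithmetic wraps consistently.
 */
uint64_t rebase_vruntime_sketch(uint64_t vruntime, uint64_t src_min,
				uint64_t dst_min)
{
	return vruntime - src_min + dst_min;
}
```

For example, a task 50 units ahead of a source baseline of 1000 would
land 50 units ahead of the destination baseline, whatever that is.
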
17 years ago sched: better min_vruntime tracking
Peter Zijlstra [Mon, 15 Oct 2007 15:00:07 +0000 (17:00 +0200)]
sched: better min_vruntime tracking

Better min_vruntime tracking: update it every time 'curr' is
updated - not just when a task is enqueued into the tree.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
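
A sketch of what "tracking" a minimum that only moves forward could
look like, assuming the comparison is done on the signed difference so
that it stays correct across 64-bit wraparound (the function name is
hypothetical; the kernel's actual update logic is not shown in this
log):

```c
#include <stdint.h>

/*
 * Advance min_vruntime to `candidate` only if the candidate is ahead
 * of it. The signed difference makes "ahead" well defined even after
 * the unsigned counters wrap.
 */
uint64_t advance_min_vruntime_sketch(uint64_t min_vruntime,
				     uint64_t candidate)
{
	int64_t delta = (int64_t)(candidate - min_vruntime);

	if (delta > 0)
		min_vruntime = candidate;
	return min_vruntime;
}
```
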
17 years ago sched: x86: allow single-depth wchan output
Ingo Molnar [Mon, 15 Oct 2007 15:00:07 +0000 (17:00 +0200)]
sched: x86: allow single-depth wchan output

sched.o gets smaller and faster if we compile it with -fomit-frame-pointer,
so make this a config option. The cost is the loss of multi-depth wchan
lookups - but SysRq-T is a sufficient replacement for them anyway, so their
utility is much lower these days.

the size difference is significant:

   text    data     bss     dec     hex filename
  34005    3462      24   37491    9273 sched.o.before
  33470    3462      24   36956    905c sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: clean up schedstat block in dequeue_entity()
Dmitry Adamushko [Mon, 15 Oct 2007 15:00:06 +0000 (17:00 +0200)]
sched: clean up schedstat block in dequeue_entity()

Better placement of #ifdef CONFIG_SCHEDSTATS block in dequeue_entity().

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: remove wait_runtime fields and features
Ingo Molnar [Mon, 15 Oct 2007 15:00:06 +0000 (17:00 +0200)]
sched: remove wait_runtime fields and features

remove wait_runtime based fields and features, now that the CFS
math has been changed over to the vruntime metric.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: remove wait_runtime limit
Ingo Molnar [Mon, 15 Oct 2007 15:00:06 +0000 (17:00 +0200)]
sched: remove wait_runtime limit

remove the wait_runtime-limit fields and the code depending on it, now
that the math has been changed over to rely on the vruntime metric.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago sched: clean up struct load_stat
Dmitry Adamushko [Mon, 15 Oct 2007 15:00:06 +0000 (17:00 +0200)]
sched: clean up struct load_stat

'struct load_stat' is redundant now so let's get rid of it.

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: debug: update exec_clock only when SCHED_DEBUG
Ingo Molnar [Mon, 15 Oct 2007 15:00:06 +0000 (17:00 +0200)]
sched: debug: update exec_clock only when SCHED_DEBUG

micro-optimization: update cfs_rq->exec_clock only if
CONFIG_SCHED_DEBUG=y.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: add more vruntime statistics
Ingo Molnar [Mon, 15 Oct 2007 15:00:06 +0000 (17:00 +0200)]
sched: add more vruntime statistics

add more vruntime statistics.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: handle vruntime 64-bit overflow
Peter Zijlstra [Mon, 15 Oct 2007 15:00:05 +0000 (17:00 +0200)]
sched: handle vruntime 64-bit overflow

Handle vruntime overflow by centering the key space around min_vruntime.

( otherwise we could overflow 64-bit vruntime in a few days with SCHED_IDLE
 tasks - or in a few years with nice +19. )
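
A minimal sketch of the centering idea, assuming the tree key is the
signed distance from min_vruntime. The helper names entity_key() and
entity_before() are illustrative and not necessarily what the patch uses.
Because the subtraction is done in unsigned 64-bit arithmetic and only the
difference is interpreted as signed, ordering stays correct across a
wraparound as long as all queued entities lie within 2^63 of min_vruntime:

```c
#include <stdint.h>

/*
 * Sketch only: order two unsigned 64-bit virtual runtimes relative to a
 * moving base (min_vruntime) so that counter wraparound does not break
 * the ordering. Names are hypothetical.
 */
static inline int64_t entity_key(uint64_t min_vruntime, uint64_t vruntime)
{
	/*
	 * The unsigned subtraction wraps modulo 2^64; the cast to a signed
	 * type recovers the (possibly negative) distance from the base.
	 * This relies on two's-complement conversion, which common
	 * compilers provide.
	 */
	return (int64_t)(vruntime - min_vruntime);
}

/* Returns nonzero if a sorts before b in a tree keyed this way. */
static inline int entity_before(uint64_t min_vruntime, uint64_t a, uint64_t b)
{
	return entity_key(min_vruntime, a) < entity_key(min_vruntime, b);
}
```

A plain "a < b" comparison would misorder a value just below the wrap
point against one just past it; the centered key does not.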

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: add tree based averages
Peter Zijlstra [Mon, 15 Oct 2007 15:00:05 +0000 (17:00 +0200)]
sched: add tree based averages

add support for tree based vruntime averages.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: remove SCHED_FEAT_SKIP_INITIAL
Ingo Molnar [Mon, 15 Oct 2007 15:00:05 +0000 (17:00 +0200)]
sched: remove SCHED_FEAT_SKIP_INITIAL

remove SCHED_FEAT_SKIP_INITIAL - it was off by default and even
when enabled it never made any real difference.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: add se->vruntime debugging
Ingo Molnar [Mon, 15 Oct 2007 15:00:05 +0000 (17:00 +0200)]
sched: add se->vruntime debugging

debug se->vruntime fields.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
17 years agosched: clean up new task placement
Peter Zijlstra [Mon, 15 Oct 2007 15:00:05 +0000 (17:00 +0200)]
sched: clean up new task placement

clean up new task placement.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
17 years agosched: wakeup granularity increase
Ingo Molnar [Mon, 15 Oct 2007 15:00:05 +0000 (17:00 +0200)]
sched: wakeup granularity increase

increase wakeup granularity - we were overscheduling a bit.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
17 years agosched: simplify check_preempt() methods
Ingo Molnar [Mon, 15 Oct 2007 15:00:05 +0000 (17:00 +0200)]
sched: simplify check_preempt() methods

simplify the check_preempt() methods.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
17 years agosched: simplify adaptive latency
Peter Zijlstra [Mon, 15 Oct 2007 15:00:05 +0000 (17:00 +0200)]
sched: simplify adaptive latency

simplify adaptive latency.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: new task placement for vruntime
Peter Zijlstra [Mon, 15 Oct 2007 15:00:04 +0000 (17:00 +0200)]
sched: new task placement for vruntime

add proper new task placement for the vruntime based math too.

( note: introduces a swap() macro, but the swap token is too
  widely used in the kernel namespace for a generic version
  to be added without changing non-scheduler code - so this
  cleanup will be done separately. )

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: optimize vruntime based scheduling
Ingo Molnar [Mon, 15 Oct 2007 15:00:04 +0000 (17:00 +0200)]
sched: optimize vruntime based scheduling

optimize vruntime based scheduling.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: move sched_feat() definitions
Ingo Molnar [Mon, 15 Oct 2007 15:00:04 +0000 (17:00 +0200)]
sched: move sched_feat() definitions

move the sched_feat() definitions so that they can be used sooner by
generic code too.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: introduce se->vruntime
Ingo Molnar [Mon, 15 Oct 2007 15:00:04 +0000 (17:00 +0200)]
sched: introduce se->vruntime

introduce se->vruntime as a sum of weighted delta-exec's, and use that
as the key into the tree.
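
A rough sketch of what "sum of weighted delta-exec's" means, under the
assumption that each slice of real runtime is scaled by NICE_0_LOAD /
weight before being accumulated. The struct and function names are
hypothetical, and the value 1024 for NICE_0_LOAD is an assumption made for
this example:

```c
#include <stdint.h>

#define NICE_0_LOAD 1024ULL	/* assumed load weight of a nice-0 task */

/* Hypothetical, stripped-down stand-in for a scheduling entity. */
struct sched_entity_sketch {
	uint64_t weight;	/* load weight; must be nonzero */
	uint64_t vruntime;	/* accumulated weighted runtime, in ns */
};

/*
 * Charge delta_exec_ns of real runtime to the entity. Heavier entities
 * accumulate virtual time more slowly, so they stay further left in a
 * tree keyed by vruntime and get picked more often.
 */
static void account_runtime(struct sched_entity_sketch *se,
			    uint64_t delta_exec_ns)
{
	se->vruntime += delta_exec_ns * NICE_0_LOAD / se->weight;
}
```

With these assumptions, a task of weight 2048 that runs for 1000 ns is
charged 500 ns of virtual time, while a nice-0 task is charged the full
1000 ns.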

the idea of using absolute virtual time as the basic metric of scheduling
was first raised by William Lee Irwin, advanced by Tong Li and first
prototyped by Roman Zippel in the "Really Fair Scheduler" (RFS) patchset.

also see:

   http://lkml.org/lkml/2007/9/2/76

for a simpler variant of this patch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: clean up calc_weighted()
Ingo Molnar [Mon, 15 Oct 2007 15:00:04 +0000 (17:00 +0200)]
sched: clean up calc_weighted()

clean up calc_weighted() - we always use the normalized shift, so
there is no need to pass it in. Also, push the non-nice0 branch
into the function.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: speed up update_load_add/_sub()
Ingo Molnar [Mon, 15 Oct 2007 15:00:04 +0000 (17:00 +0200)]
sched: speed up update_load_add/_sub()

speed up update_load_add/_sub() by not delaying the division - this
reduces CPU pipeline dependencies.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: uninline __enqueue_entity()/__dequeue_entity()
Ingo Molnar [Mon, 15 Oct 2007 15:00:04 +0000 (17:00 +0200)]
sched: uninline __enqueue_entity()/__dequeue_entity()

suggested by Roman Zippel: uninline __enqueue_entity() and
__dequeue_entity().

this reduces code size:

      text    data     bss     dec     hex filename
     25385    2386      16   27787    6c8b sched.o.before
     25257    2386      16   27659    6c0b sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: simplify SCHED_FEAT_* code
Peter Zijlstra [Mon, 15 Oct 2007 15:00:03 +0000 (17:00 +0200)]
sched: simplify SCHED_FEAT_* code

Peter Zijlstra suggested simplifying the SCHED_FEAT_* checks via the
sched_feat(x) macro.
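
A sketch of how such a macro can look, assuming the features are bits in
a single mask and the macro pastes the SCHED_FEAT_ prefix onto its
argument. The feature names ALPHA, BETA and GAMMA are made up for the
example:

```c
/* Hypothetical feature bits; the real flag names differ. */
enum {
	SCHED_FEAT_ALPHA = 1,
	SCHED_FEAT_BETA  = 2,
	SCHED_FEAT_GAMMA = 4,
};

/* Assumed: one bitmask holds all currently enabled features. */
static unsigned int sysctl_sched_features =
	SCHED_FEAT_ALPHA | SCHED_FEAT_GAMMA;

/* Token pasting turns sched_feat(BETA) into a test of SCHED_FEAT_BETA. */
#define sched_feat(x) (sysctl_sched_features & SCHED_FEAT_##x)
```

A call site then reads "if (sched_feat(ALPHA))" instead of spelling out
the mask test by hand, which is consistent with the unchanged object size
shown below.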

No code changed:

   text    data     bss     dec     hex filename
   38895    3550      24   42469    a5e5 sched.o.before
   38895    3550      24   42469    a5e5 sched.o.after

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: cleanup: simplify cfs_rq_curr() methods
Ingo Molnar [Mon, 15 Oct 2007 15:00:03 +0000 (17:00 +0200)]
sched: cleanup: simplify cfs_rq_curr() methods

cleanup: simplify cfs_rq_curr() methods - now that the cfs_rq->curr
pointer is unconditionally present, remove the wrappers.

  kernel/sched.o:
      text    data     bss     dec     hex filename
     11784     224    2012   14020    36c4 sched.o.before
     11784     224    2012   14020    36c4 sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: track cfs_rq->curr on !group-scheduling too
Ingo Molnar [Mon, 15 Oct 2007 15:00:03 +0000 (17:00 +0200)]
sched: track cfs_rq->curr on !group-scheduling too

Noticed by Roman Zippel: use cfs_rq->curr in the !group-scheduling
case too. Small micro-optimization and cleanup effect:

   text    data     bss     dec     hex filename
   36269    3482      24   39775    9b5f sched.o.before
   36177    3486      24   39687    9b07 sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: remove precise CPU load calculations #2
Ingo Molnar [Mon, 15 Oct 2007 15:00:03 +0000 (17:00 +0200)]
sched: remove precise CPU load calculations #2

continued removal of precise CPU load calculations.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: remove precise CPU load
Ingo Molnar [Mon, 15 Oct 2007 15:00:03 +0000 (17:00 +0200)]
sched: remove precise CPU load

CPU load calculations are statistical anyway, and there's little benefit
from having them calculated on every scheduling event. Removing this code
gets rid of a divide from the scheduler wakeup and context-switch
fastpath.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: remove stat_gran
Ingo Molnar [Mon, 15 Oct 2007 15:00:03 +0000 (17:00 +0200)]
sched: remove stat_gran

remove the stat_gran code - it was disabled by default and it causes
unnecessary overhead.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: use constants if !CONFIG_SCHED_DEBUG
Ingo Molnar [Mon, 15 Oct 2007 15:00:02 +0000 (17:00 +0200)]
sched: use constants if !CONFIG_SCHED_DEBUG

use constants if !CONFIG_SCHED_DEBUG.

this speeds up the code and reduces code-size:

    text    data     bss     dec     hex filename
   27464    3014      16   30494    771e sched.o.before
   26929    3010      20   29959    7507 sched.o.after
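
One way to get this effect, shown as a sketch: a qualifier macro that
makes a tunable a writable variable in debug builds and a compile-time
constant otherwise. The macro name const_debug, the variable name and the
20 ms value are assumptions made for the example:

```c
/*
 * With CONFIG_SCHED_DEBUG defined the tunable stays a normal variable
 * that can be changed at run time; without it the value becomes a
 * static constant that the compiler can fold into its users.
 */
#ifdef CONFIG_SCHED_DEBUG
# define const_debug
#else
# define const_debug static const
#endif

/* Hypothetical tunable: target latency in nanoseconds. */
const_debug unsigned int sched_latency_ns_sketch = 20000000U;

/* Example user: split the latency evenly among runnable tasks. */
unsigned int slice_for(unsigned int nr_running)
{
	if (!nr_running)
		return sched_latency_ns_sketch;
	return sched_latency_ns_sketch / nr_running;
}
```

In the non-debug build the division operates on a known constant, which
is the kind of saving the size table above reflects.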

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: uniform tunings
Ingo Molnar [Mon, 15 Oct 2007 15:00:02 +0000 (17:00 +0200)]
sched: uniform tunings

use the same defaults on both UP and SMP.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: debug: track maximum 'slice'
Ingo Molnar [Mon, 15 Oct 2007 15:00:02 +0000 (17:00 +0200)]
sched: debug: track maximum 'slice'

track the maximum amount of time a task has executed while
the CPU load was at least 2x (i.e. at least two nice-0
tasks were runnable).

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: small sched_debug cleanup
Ingo Molnar [Mon, 15 Oct 2007 15:00:02 +0000 (17:00 +0200)]
sched: small sched_debug cleanup

small kernel/sched_debug.c cleanup - break up
multi-variable assignment.

no code changed:

   text    data     bss     dec     hex filename
   38869    3550      24   42443    a5cb sched.o.before
   38869    3550      24   42443    a5cb sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: use list_for_each_entry_safe() in __wake_up_common()
Matthias Kaehlcke [Mon, 15 Oct 2007 15:00:02 +0000 (17:00 +0200)]
sched: use list_for_each_entry_safe() in __wake_up_common()

Use list_for_each_entry_safe() instead of list_for_each_safe() in
__wake_up_common()
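
The difference between the two iterators, as a userspace sketch. The
macros below are simplified re-implementations written for this example
(using the GNU __typeof__ extension), and struct waiter is a hypothetical
stand-in for the wait queue entry:

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Older style: walk raw nodes; the body converts each node by hand. */
#define list_for_each_safe(pos, n, head) \
	for (pos = (head)->next, n = pos->next; pos != (head); \
	     pos = n, n = pos->next)

/* Newer style: walk the containing structures directly. */
#define list_for_each_entry_safe(pos, n, head, member)			\
	for (pos = list_entry((head)->next, __typeof__(*pos), member),	\
	     n = list_entry(pos->member.next, __typeof__(*pos), member); \
	     &pos->member != (head);					\
	     pos = n, n = list_entry(n->member.next, __typeof__(*n), member))

struct waiter {			/* hypothetical list element */
	int id;
	struct list_head link;
};

static void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Before: needs a temporary node pointer plus an explicit list_entry(). */
static int sum_ids_old(struct list_head *head)
{
	struct list_head *tmp, *next;
	int sum = 0;

	list_for_each_safe(tmp, next, head) {
		struct waiter *w = list_entry(tmp, struct waiter, link);
		sum += w->id;
	}
	return sum;
}

/* After: the loop variable is already the element type. */
static int sum_ids_new(struct list_head *head)
{
	struct waiter *w, *next;
	int sum = 0;

	list_for_each_entry_safe(w, next, head, link)
		sum += w->id;
	return sum;
}
```

Both loops visit the same elements and both tolerate removal of the
current element; the entry variant only removes the manual conversion.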

Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: resched task in task_new_fair()
Ingo Molnar [Mon, 15 Oct 2007 15:00:02 +0000 (17:00 +0200)]
sched: resched task in task_new_fair()

to get full child-runs-first semantics, make sure the parent is
rescheduled.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: fix sysctl_sched_child_runs_first flag
Ingo Molnar [Mon, 15 Oct 2007 15:00:01 +0000 (17:00 +0200)]
sched: fix sysctl_sched_child_runs_first flag

fix the sched_child_runs_first flag: always call into ->task_new()
if we are on the same CPU, as SCHED_OTHER tasks depend on it for
correct initial setup.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agoFix compile while compiling drivers/mmc/host/mmc_spi.o with !BLOCK
David Brownell [Sun, 14 Oct 2007 21:50:25 +0000 (14:50 -0700)]
Fix compile while compiling drivers/mmc/host/mmc_spi.o with !BLOCK

Make sure the mmc_spi driver can build without CONFIG_BLOCK.
Issue noted by "Avuton Olrich" <avuton@gmail.com> and randconfig.

While that won't be a common configuration, sometimes embedded
boards use SDIO to interface WLAN or Bluetooth chips (vs some
parallel interface), and don't provide an MMC/SD socket for use
with flash memory cards.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86
Linus Torvalds [Sun, 14 Oct 2007 23:47:05 +0000 (16:47 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86

* git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86:
  x86: force timer broadcast on late AMD C1E detection
  x86: move local APIC timer init to the end of start_secondary()
  clockevents: introduce force broadcast notifier
  x86: fix missing include for vsyscall

17 years agosky2: reboot fix
Stephen Hemminger [Sun, 14 Oct 2007 20:25:22 +0000 (13:25 -0700)]
sky2: reboot fix

The call to napi_disable() in the PCI shutdown handler is problematic,
and is aggravated by the new NAPI.
Also, make sure watchdog timer doesn't go off.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agox86: force timer broadcast on late AMD C1E detection
Thomas Gleixner [Sun, 14 Oct 2007 20:57:45 +0000 (22:57 +0200)]
x86: force timer broadcast on late AMD C1E detection

The 64bit SMP bootup is slightly different to the 32bit one. It enables
the boot CPU local APIC timer before all CPUs are brought up. Some AMD C1E
systems have the C1E feature flag only set in the secondary CPU. Due to
the early enable of the boot CPU local APIC timer the APIC timer is
registered as a fully functional device. When we detect the wreckage during
the bringup of the secondary CPU, we need to force the boot CPU into
broadcast mode.

Check for the C1E-induced APIC timer disable when the secondary CPU's APIC
timer is initialized. If the boot CPU APIC timer was registered as a functional
clock event device, then fix this up and utilize the
CLOCK_EVT_NOTIFY_BROADCAST_FORCE mechanism to force the already
registered boot CPU APIC timer into broadcast mode.

Tested by force injecting the failure mode.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
17 years agox86: move local APIC timer init to the end of start_secondary()
Thomas Gleixner [Sun, 14 Oct 2007 20:57:45 +0000 (22:57 +0200)]
x86: move local APIC timer init to the end of start_secondary()

Preparatory patch for the AMD C1E wreckage fixup.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
17 years agoclockevents: introduce force broadcast notifier
Thomas Gleixner [Sun, 14 Oct 2007 20:57:45 +0000 (22:57 +0200)]
clockevents: introduce force broadcast notifier

The 64bit SMP bootup is slightly different to the 32bit one. It enables
the boot CPU local APIC timer before all CPUs are brought up. Some AMD C1E
systems have the C1E feature flag only set in the secondary CPU. Due to
the early enable of the boot CPU local APIC timer the APIC timer is
registered as a fully functional device. When we detect the wreckage during
the bringup of the secondary CPU, we need to force the boot CPU into
broadcast mode.

Add a new notifier reason and implement the force broadcast in the clock
events layer.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
17 years agox86: fix missing include for vsyscall
Dave Jones [Sun, 14 Oct 2007 20:57:45 +0000 (22:57 +0200)]
x86: fix missing include for vsyscall

 > Maybe I just picked a bad time to try, but...
 >
 > arch/x86/kernel/alternative.c: In function 'apply_alternatives':
 > arch/x86/kernel/alternative.c:191: error: 'VSYSCALL_START' undeclared (first use in this function)
 > arch/x86/kernel/alternative.c:191: error: (Each undeclared identifier is reported only once
 > arch/x86/kernel/alternative.c:191: error: for each function it appears in.)
 > arch/x86/kernel/alternative.c:191: error: 'VSYSCALL_END' undeclared (first use in this function)
 > make[1]: *** [arch/x86/kernel/alternative.o] Error 1
 > make: *** [arch/x86/kernel] Error 2

Try this.

Include missing header for vsyscall.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
17 years agoMerge branch 'release' of git://lm-sensors.org/kernel/mhoffman/hwmon-2.6
Linus Torvalds [Sun, 14 Oct 2007 19:50:19 +0000 (12:50 -0700)]
Merge branch 'release' of git://lm-sensors.org/kernel/mhoffman/hwmon-2.6

* 'release' of git://lm-sensors.org/kernel/mhoffman/hwmon-2.6: (53 commits)
  hwmon: (vt8231) fix sparse warning
  hwmon: (sis5595) fix sparse warning
  hwmon: (w83627hf) don't assume bank 0
  hwmon: (w83627hf) Fix setting fan min right after driver load
  hwmon: (w83627hf) De-macro sysfs callback functions
  hwmon: Add new combined driver for FSC chips
  hwmon: (ibmpex) Release IPMI user if hwmon registration fails
  hwmon: (dme1737) Add sch311x support
  hwmon: (dme1737) group functions logically
  hwmon: (dme1737) cleanups
  hwmon: IBM power meter driver
  hwmon: (coretemp) Add support for Celeron 4xx
  hwmon: (lm87) Disable VID when it should be
  hwmon: (w83781d) Add individual alarm and beep files
  hwmon: VRM is not read from registers
  MAINTAINERS: update hwmon subsystem git trees
  hwmon: Fix the code examples in documentation
  hwmon: update sysfs interface document - error handling
  hwmon: (thmc50) Fix a debug message
  hwmon: (thmc50) Don't create temp3 if not enabled
  ...

17 years agohisax: hfc_usb: update to current CVS version
Martin Bachem [Sun, 14 Oct 2007 16:10:30 +0000 (18:10 +0200)]
hisax: hfc_usb: update to current CVS version

- killed paranoid NULL Pointer check
- human readable LED states
- support for "Eicon DIVA USB 4.0" (0x071d/0x1005)

Signed-off-by: Martin Bachem <info@colognechip.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>