Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt:
hrtimer: optimize the softirq time optimization
hrtimer: reduce calls to hrtimer_get_softirq_time()
clockevents: fix typo in tick-broadcast.c
jiffies: add time_is_after_jiffies and others which compare with jiffies
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel: (62 commits)
sched: build fix
sched: better rt-group documentation
sched: features fix
sched: /debug/sched_features
sched: add SCHED_FEAT_DEADLINE
sched: debug: show a weight tree
sched: fair: weight calculations
sched: fair-group: de-couple load-balancing from the rb-trees
sched: fair-group scheduling vs latency
sched: rt-group: optimize dequeue_rt_stack
sched: debug: add some debug code to handle the full hierarchy
sched: fair-group: SMP-nice for group scheduling
sched, cpuset: customize sched domains, core
sched, cpuset: customize sched domains, docs
sched: prepatory code movement
sched: rt: multi level group constraints
sched: task_group hierarchy
sched: fix the task_group hierarchy for UID grouping
sched: allow the group scheduler to have multiple levels
sched: mix tasks and groups
...
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86: (77 commits)
x86: UV startup of slave cpus
x86: integrate pci-dma.c
x86: don't do dma if mask is NULL.
x86: return conditional to mmu
x86: remove kludge from x86_64
x86: unify gfp masks
x86: retry allocation if failed
x86: don't try to allocate from DMA zone at first
x86: use a fallback dev for i386
x86: use numa allocation function in i386
x86: remove virt_to_bus in pci-dma_64.c
x86: adjust dma_free_coherent for i386
x86: move bad_dma_address
x86: isolate coherent mapping functions
x86: move dma_coherent functions to pci-dma.c
x86: merge iommu initialization parameters
x86: merge dma_supported
x86: move pci fixup to pci-dma.c
x86: move x86_64-specific to common code.
x86: move initialization functions to pci-dma.c
...
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (27 commits)
sh: Fix up L2 cache probe.
sh: Fix up SH-4A part probe.
sh: Add support for SH7723 CPU subtype.
sh: Fix up SH7763 build.
sh: Add migor_ts support to MigoR
sh: Add rs5c732b RTC support to MigoR
sh: Add I2C support to MigoR
sh: Add I2C platform data to sh7722
sh: MigoR NAND flash support using gen_flash
sh: MigoR NOR flash support using physmap-flash
sh: Fix up mach-types formatting from merge damage.
sh: r7780rp: Hook up the I2C and SMBus platform devices.
sh: Use phyical addresses for MigoR smc91x resources
sh: Use physical addresses for sh7722 USBF resources
sh: Add MigoR header file
Fix sh_keysc double free
sh: Fix up __access_ok() check for nommu.
sh: Allow optimized clear/copy page routines to be used on SH-2.
sh: Hook up the rest of the SH7770 serial ports.
sh: Add support for Solution Engine SH7721 board
...
The RCU iterators used 'rcu_dereference()' on an already-fetched RCU
pointer value, which defeats the whole point of the exercise.
When we dereference a pointer protected by RCU, we need to make sure
that we only fetch the value _once_, because if the compiler ends up
re-loading it due to register pressure, the newly reloaded value could
be different from the previously fetched one, and you get inconsistent
results.
Cleaned-up, fixed, and the pointless list_for_each_safe_rcu #define
deleted by Paul Kenney.
Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Sat, 19 Apr 2008 19:31:26 +0000 (21:31 +0200)]
hrtimer: optimize the softirq time optimization
The previous optimization did not take the case into account where a
clock provides its own softirq_get_time() function.
Check for the availablitiy of the clock get time function first and
then check if we need to retrieve the time for both clocks via
hrtimer_softirq_gettime() to avoid a double evaluation of time in that
case as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
hrtimer: reduce calls to hrtimer_get_softirq_time()
It seems that hrtimer_run_queues() is calling hrtimer_get_softirq_time() more
often than it needs to. This can cause frequent contention on systems with
large numbers of processors/cores.
With this patch, hrtimer_run_queues only calls hrtimer_get_softirq_time() if
there is a pending timer in one of the hrtimer bases, and only once.
This also combines hrtimer_run_queues() and the inline run_hrtimer_queue()
into one function.
[ tglx@linutronix.de: coding style ]
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Glauber Costa [Fri, 18 Apr 2008 20:38:58 +0000 (13:38 -0700)]
clockevents: fix typo in tick-broadcast.c
braodcast -> broadcast
Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Dave Young [Fri, 18 Apr 2008 20:38:57 +0000 (13:38 -0700)]
jiffies: add time_is_after_jiffies and others which compare with jiffies
Most of time_after like macros usages just compare jiffies and another number,
so here add some time_is_* macros for convenience.
Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Peter Zijlstra [Sat, 19 Apr 2008 17:45:00 +0000 (19:45 +0200)]
sched: rt-group: optimize dequeue_rt_stack
Now that the group hierarchy can have an arbitrary depth the O(n^2) nature
of RT task dequeues will really hurt. Optimize this by providing space to
store the tree path, so we can walk it the other way.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Sat, 19 Apr 2008 17:45:00 +0000 (19:45 +0200)]
sched: fair-group: SMP-nice for group scheduling
Implement SMP nice support for the full group hierarchy.
On each load-balance action, compile a sched_domain wide view of the full
task_group tree. We compute the domain wide view when walking down the
hierarchy, and readjust the weights when walking back up.
After collecting and readjusting the domain wide view, we try to balance the
tasks within the task_groups. The current approach is a naively balance each
task group until we've moved the targeted amount of load.
Inspired by Srivatsa Vaddsgiri's previous code and Abhishek Chandra's H-SMP
paper.
XXX: there will be some numerical issues due to the limited nature of
SCHED_LOAD_SCALE wrt to representing a task_groups influence on the
total weight. When the tree is deep enough, or the task weight small
enough, we'll run out of bits.
This patch introduces new feature of cpuset - sched domain customization.
This version provides a per-cpuset file 'sched_relax_domain_level' that
enable us to change the searching range of scheduler, which used to limit
how many cpus the scheduler searches at some schedule events, such as
wakening task and running out of runqueue.
This patch allows tasks and groups to exist in the same cfs_rq. With this
change the CFS group scheduling follows a 1/(M+N) model from a 1/(1+N)
fairness model where M tasks and N groups exist at the cfs_rq level.
Mike Travis [Wed, 26 Mar 2008 21:23:49 +0000 (14:23 -0700)]
sched: add new set_cpus_allowed_ptr function
Add a new function that accepts a pointer to the "newly allowed cpus"
cpumask argument.
int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask)
The current set_cpus_allowed() function is modified to use the above
but this does not result in an ABI change. And with some compiler
optimization help, it may not introduce any additional overhead.
Additionally, to enforce the read only nature of the new_mask arg, the
"const" property is migrated to sub-functions called by set_cpus_allowed.
This silences compiler warnings.
Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mike Travis [Wed, 26 Mar 2008 21:23:48 +0000 (14:23 -0700)]
init: move setup of nr_cpu_ids to as early as possible
Move the setting of nr_cpu_ids from sched_init() to start_kernel()
so that it's available as early as possible.
Note that an arch has the option of setting it even earlier if need be,
but it should not result in a different value than the setup_nr_cpu_ids()
function.
Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mike Travis [Tue, 8 Apr 2008 18:43:04 +0000 (11:43 -0700)]
cpumask: add show cpu map functions
* Add cpu_sysdev_class functions to display the following maps
with cpulist_scnprintf().
cpu_online_map
cpu_present_map
cpu_possible_map
* Small change to include/linux/sysdev.h to allow the attribute
name and label to be different (to avoid collision with the
"attr_online" entry for bringing cpus on- and off-line.)
Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mike Travis [Sat, 5 Apr 2008 01:11:01 +0000 (18:11 -0700)]
x86: convert cpumask_of_cpu macro to allocated array
* Here is a simple patch to use an allocated array of cpumasks to
represent cpumask_of_cpu() instead of constructing one on the stack.
It's based on the Kconfig option "HAVE_CPUMASK_OF_CPU_MAP" which is
currently only set for x86_64 SMP. Otherwise the the existing
cpumask_of_cpu() is used but has been changed to produce an lvalue
so a pointer to it can be used.
Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mike Travis [Sat, 5 Apr 2008 01:11:02 +0000 (18:11 -0700)]
cpumask: add CPU_MASK_ALL_PTR macro
* Add a static cpumask_t variable "CPU_MASK_ALL_PTR" to use as
a pointer reference to CPU_MASK_ALL. This reduces where possible
the instances where CPU_MASK_ALL allocates and fills a large
array on the stack. Used only if NR_CPUS > BITS_PER_LONG.
* Change init/main.c to use new set_cpus_allowed_ptr().
Depends on:
[sched-devel]: sched: add new set_cpus_allowed_ptr function
Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mike Travis [Sat, 5 Apr 2008 01:11:11 +0000 (18:11 -0700)]
cpumask: reduce stack usage in SD_x_INIT initializers
* Remove empty cpumask_t (and all non-zero/non-null) variables
in SD_*_INIT macros. Use memset(0) to clear. Also, don't
inline the initializer functions to save on stack space in
build_sched_domains().
* Merge change to include/linux/topology.h that uses the new
node_to_cpumask_ptr function in the nr_cpus_node macro into
this patch.
Depends on:
[mm-patch]: asm-generic-add-node_to_cpumask_ptr-macro.patch
[sched-devel]: sched: add new set_cpus_allowed_ptr function
Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mike Travis [Sat, 5 Apr 2008 01:11:10 +0000 (18:11 -0700)]
nodemask: use new node_to_cpumask_ptr function
* Use new node_to_cpumask_ptr. This creates a pointer to the
cpumask for a given node. This definition is in mm patch:
asm-generic-add-node_to_cpumask_ptr-macro.patch
* Use new set_cpus_allowed_ptr function.
Depends on:
[mm-patch]: asm-generic-add-node_to_cpumask_ptr-macro.patch
[sched-devel]: sched: add new set_cpus_allowed_ptr function
[x86/latest]: x86: add cpus_scnprintf function
Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Greg Banks <gnb@melbourne.sgi.com> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mike Travis [Sat, 5 Apr 2008 01:11:06 +0000 (18:11 -0700)]
generic: use new set_cpus_allowed_ptr function
* Use new set_cpus_allowed_ptr() function added by previous patch,
which instead of passing the "newly allowed cpus" cpumask_t arg
by value, pass it by pointer:
Mike Travis [Sat, 5 Apr 2008 01:11:05 +0000 (18:11 -0700)]
x86: use new set_cpus_allowed_ptr function
* Use new set_cpus_allowed_ptr() function added by previous patch,
which instead of passing the "newly allowed cpus" cpumask_t arg
by value, pass it by pointer:
* Collapse other NR_CPUS changes to arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
Use pointers to cpumask_t arguments whenever possible.
Depends on:
[sched-devel]: sched: add new set_cpus_allowed_ptr function
Cc: Len Brown <len.brown@intel.com> Cc: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
(1) - these arrays are allocated via alloc_bootmem_low()
* Change sched_domain_debug_one() to use cpulist_scnprintf instead of
cpumask_scnprintf. This reduces the output buffer required and improves
readability when large NR_CPU count machines arrive.
* In sched_create_group() we allocate new arrays based on nr_cpu_ids.
Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mike Travis [Mon, 31 Mar 2008 15:41:55 +0000 (08:41 -0700)]
asm-generic: add node_to_cpumask_ptr macro
Create a simple macro to always return a pointer to the node_to_cpumask(node)
value. This relies on compiler optimization to remove the extra indirection:
A node_to_cpumask_ptr_next() macro is provided to access another
node_to_cpumask value.
The other change is to always include asm-generic/topology.h moving the
ifdef CONFIG_NUMA to this same file.
Note: there are no references to either of these new macros in this patch,
only the definition.
Based on 2.6.25-rc5-mm1
# alpha Cc: Richard Henderson <rth@twiddle.net>
# fujitsu Cc: David Howells <dhowells@redhat.com>
# ia64 Cc: Tony Luck <tony.luck@intel.com>
# powerpc Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org>
# sparc Cc: David S. Miller <davem@davemloft.net> Cc: William L. Irwin <wli@holomorphy.com>
# x86 Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
The existing code passed a reference to cpu 0's instance of struct op_msrs
to model->shutdown, whilst the other functions are passed a reference to
<this cpu's> instance of a struct op_msrs. This seemed to be a bug to me
even though as long as cpu 0 and <this cpu> are of the same type it would
have the same effect...?
Cc: Philippe Elie <phil.el@wanadoo.fr> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mike Travis [Tue, 25 Mar 2008 22:06:55 +0000 (15:06 -0700)]
cpumask: add cpumask_scnprintf_len function
Add a new function cpumask_scnprintf_len() to return the number of
characters needed to display "len" cpumask bits. The current method
of allocating NR_CPUS bytes is incorrect as what's really needed is
9 characters per 32-bit word of cpumask bits (8 hex digits plus the
seperator [','] or the terminating NULL.) This function provides the
caller the means to allocate the correct string length.
Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Olof Johansson [Tue, 4 Mar 2008 23:23:25 +0000 (15:23 -0800)]
tasklets: execute tasklets in the same order they were queued
I noticed this when looking at an openswan issue. Openswan (ab?)uses the
tasklet API to defer processing of packets in some situations, with one
packet per tasklet_action(). I started noticing sequences of
backwards-ordered sequence numbers coming over the wire, since new tasklets
are always queued at the head of the list but processed sequentially.
Convert it to instead append new entries to the tail of the list. As an
extra bonus, the splicing code in takeover_tasklets() no longer has to
iterate over the list.
Signed-off-by: Olof Johansson <olof@lixom.net> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Sat, 19 Apr 2008 17:44:58 +0000 (19:44 +0200)]
sched: rt-group: smp balancing
Currently the rt group scheduling does a per cpu runtime limit, however
the rt load balancer makes no guarantees about an equal spread of real-
time tasks, just that at any one time, the highest priority tasks run.
Solve this by making the runtime limit a global property by borrowing
excessive runtime from the other cpus once the local limit runs out.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Sat, 19 Apr 2008 17:44:57 +0000 (19:44 +0200)]
sched: fix wakeup granularity for buddies
The wakeup buddy logic didn't use the same wakeup granularity logic as the
wakeup preemption did, this might cause the ->next buddy to be selected past
the point where we would have preempted had the task been a single running
instance.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
sched: fix rq->clock overflows detection with CONFIG_NO_HZ
When using CONFIG_NO_HZ, rq->tick_timestamp is not updated every TICK_NSEC.
We check that the number of skipped ticks matches the clock jump seen in
__update_rq_clock().
Reynes Philippe [Mon, 17 Mar 2008 23:19:05 +0000 (16:19 -0700)]
sched: sched.c needs tick.h
kernel/sched.c:506: erreur: implicit declaration of function tick_get_tick_sched
kernel/sched.c:506: erreur: invalid type argument of ->
kernel/sched.c:506: erreur: NOHZ_MODE_INACTIVE undeclared (first use in this function)
kernel/sched.c:506: erreur: (Each undeclared identifier is reported only once
kernel/sched.c:506: erreur: for each function it appears in.)
Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jack Steiner [Wed, 16 Apr 2008 16:45:15 +0000 (11:45 -0500)]
x86: UV startup of slave cpus
This patch changes smpboot.c so that it can start slave cpus running
in UV non-unique apicid mode. The SIPI must be sent using a UV-specific
mechanism.
Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Glauber Costa [Wed, 9 Apr 2008 16:18:09 +0000 (13:18 -0300)]
x86: don't do dma if mask is NULL.
if the device hasn't provided a mask, abort allocation.
Note that we're using a fallback device now, so it does not cover
the case of a NULL device: just drivers passing NULL masks around.
Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Glauber Costa [Wed, 9 Apr 2008 16:18:08 +0000 (13:18 -0300)]
x86: return conditional to mmu
Just return our allocation if we don't have an mmu. For i386, where this patch
is being applied, we never have. So our goal is just to have the code to look like
x86_64's.
Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Glauber Costa [Wed, 9 Apr 2008 16:18:06 +0000 (13:18 -0300)]
x86: unify gfp masks
Use the same gfp masks for x86_64 and i386.
It involves using HIGHMEM or DMA32 where necessary, for the sake
of code compatibility, (no real effect), and using the NORETRY
mask for i386.
Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Glauber Costa [Wed, 9 Apr 2008 16:18:05 +0000 (13:18 -0300)]
x86: retry allocation if failed
This patch puts in the code to retry allocation in case it fails. By its
own, it does not make much sense but making the code look like x86_64.
But later patches in this series will make we try to allocate from
zones other than DMA first, which will possibly fail.
Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Glauber Costa [Tue, 8 Apr 2008 16:21:01 +0000 (13:21 -0300)]
x86: remove virt_to_bus in pci-dma_64.c
virt_to_bus() is deprecated according to the docs, and moreover,
won't return the right thing in i386 if we're dealing with high memory mappings.
So we make our allocation function return a page, and then use page_address() (for
virtual addr) and page_to_phys() (for physical addr) instead.
Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Glauber Costa [Tue, 8 Apr 2008 16:20:58 +0000 (13:20 -0300)]
x86: isolate coherent mapping functions
i386 implements the declare coherent memory API, and x86_64 does not
it is reflected in pieces of dma_alloc_coherent and dma_free_coherent.
Those pieces are isolated in separate functions, that are declared
as empty macros in x86_64. This way we can make the code the same.
Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>