Ingo Molnar [Thu, 9 Aug 2007 09:16:47 +0000 (11:16 +0200)]
sched: remove 'now' use from assignments
change all 'now' timestamp uses in assignments to rq->clock.
( this is an identity transformation that causes no functionality change:
every such new rq->clock use is necessarily preceded by an
update_rq_clock() call. )
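A minimal userspace sketch of the pattern described above, not the kernel code itself. The reduced struct rq, the exec_start field, and the fake_sched_clock() time source are assumptions made for illustration; the real update_rq_clock() does more work than this.

```c
#include <stdint.h>

/* Simplified stand-in for the kernel's run queue (assumption). */
struct rq {
	uint64_t clock;      /* cached timestamp, refreshed explicitly */
	uint64_t exec_start; /* example field that used to be assigned 'now' */
};

/* Hypothetical time source standing in for the scheduler clock. */
static uint64_t fake_time_ns;

static uint64_t fake_sched_clock(void)
{
	return fake_time_ns;
}

/* Refresh the cached clock; callers do this before using rq->clock. */
static void update_rq_clock(struct rq *rq)
{
	rq->clock = fake_sched_clock();
}

/*
 * Before the change this would have taken a 64-bit 'now' argument:
 *     static void start_exec(struct rq *rq, uint64_t now)
 * After the change it reads the value cached in the run queue.
 */
static void start_exec(struct rq *rq)
{
	rq->exec_start = rq->clock;
}
```

The result is identical as long as update_rq_clock() runs first, which is the invariant the commit message relies on.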
Ingo Molnar [Thu, 9 Aug 2007 09:16:46 +0000 (11:16 +0200)]
sched: add [__]update_rq_clock(rq)
add the [__]update_rq_clock(rq) functions. (No change in functionality,
just reorganization to prepare for elimination of the heavy 64-bit
timestamp-passing in the scheduler.)
Peter Williams [Thu, 9 Aug 2007 09:16:46 +0000 (11:16 +0200)]
sched: fix bug in balance_tasks()
There are two problems with balance_tasks() and how it is used:
1. The variables best_prio and best_prio_seen (inherited from the old
move_tasks()) were only required to handle problems caused by the
active/expired arrays, the order in which they were processed and the
possibility that the task with the highest priority could be on either.
These issues are no longer present and the extra overhead associated
with their use is unnecessary (and possibly wrong).
2. In the absence of CONFIG_FAIR_GROUP_SCHED being set, the same
this_best_prio variable needs to be used by all scheduling classes or
there is a risk of moving too much load. E.g. if the highest priority
task on this_rq at the beginning is a fairly low priority task and the rt
class migrates a task (during its turn) then that moved task becomes the
new highest priority task on this_rq but when the sched_fair class
initializes its copy of this_best_prio it will get the priority of the
original highest priority task as, due to the run queue locks being
held, the reschedule triggered by pull_task() will not have taken place.
This could result in inappropriate overriding of skip_for_load and
excessive load being moved.
The attached patch addresses these problems by deleting all reference to
best_prio and best_prio_seen and making this_best_prio a reference
parameter to the various functions involved.
load_balance_fair() has also been modified so that this_best_prio is
only reset (in the loop) if CONFIG_FAIR_GROUP_SCHED is set. This should
preserve the effect of helping spread groups' higher priority tasks
around the available CPUs while improving system performance when
CONFIG_FAIR_GROUP_SCHED isn't set.
Signed-off-by: Peter Williams <pwil3058@bigpond.net.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
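The effect of making this_best_prio a reference parameter can be sketched in plain C. This is an illustrative model under stated assumptions: class_balance() and balance_all() are hypothetical names, and each "class" moves exactly one task. It only shows why a shared value lets the fair class see what the rt class already moved.

```c
/* Lower numeric value means higher priority, as in the kernel. */

/*
 * Hypothetical per-class balancer: "moves" one task of priority
 * task_prio and records it if it is now the best priority seen
 * on the destination run queue.
 */
static void class_balance(int task_prio, int *this_best_prio)
{
	if (task_prio < *this_best_prio)
		*this_best_prio = task_prio;
}

/* Runs the rt class and then the fair class against one shared value. */
static int balance_all(int initial_best_prio, int rt_task_prio,
		       int fair_task_prio)
{
	int this_best_prio = initial_best_prio;

	class_balance(rt_task_prio, &this_best_prio);   /* rt class turn */
	class_balance(fair_task_prio, &this_best_prio); /* sees rt's update */
	return this_best_prio;
}
```

With a per-class copy, the second call would start again from initial_best_prio and could misjudge what is already on the destination queue.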
Josh Triplett [Thu, 9 Aug 2007 09:16:46 +0000 (11:16 +0200)]
sched: mark print_cfs_stats static
sched_fair.c defines print_cfs_stats, and sched_debug.c uses it, but sched.c
includes both sched_fair.c and sched_debug.c, so all the references to
print_cfs_stats occur in the same compilation unit. Thus, mark
print_cfs_stats static.
Eliminates a sparse warning:
warning: symbol 'print_cfs_stats' was not declared. Should it be static?
Ulrich Drepper [Thu, 9 Aug 2007 09:16:46 +0000 (11:16 +0200)]
sched: clean up sched_getaffinity()
here's another tiny cleanup. The generated code is not affected (gcc is
smart enough) but for people looking over the code it is just irritating
to have the extra conditional.
Peter Williams [Thu, 9 Aug 2007 09:16:46 +0000 (11:16 +0200)]
sched: simplify move_tasks()
The move_tasks() function is currently multiplexed with two distinct
capabilities:
1. attempt to move a specified amount of weighted load from one run
queue to another; and
2. attempt to move a specified number of tasks from one run queue to
another.
The first of these capabilities is used in two places, load_balance()
and load_balance_idle(), and in both of these cases the return value of
move_tasks() is used purely to decide if tasks/load were moved and no
notice of the actual number of tasks moved is taken.
The second capability is used in exactly one place,
active_load_balance(), to attempt to move exactly one task and, as
before, the return value is only used as an indicator of success or failure.
This multiplexing of move_tasks() was introduced, by me, as part of the
smpnice patches and was motivated by the fact that the alternative, one
function to move specified load and one to move a single task, would
have led to two functions of roughly the same complexity as the old
move_tasks() (or the new balance_tasks()). However, the new modular
design of the new CFS scheduler allows a simpler solution to be adopted
and this patch addresses that solution by:
1. adding a new function, move_one_task(), to be used by
active_load_balance(); and
2. making move_tasks() a single purpose function that tries to move a
specified weighted load and returns 1 for success and 0 for failure.
One of the consequences of these changes is that neither move_one_task()
nor the new move_tasks() cares how many tasks sched_class.load_balance()
moves and this enables its interface to be simplified by returning the
amount of load moved as its result and removing the load_moved pointer
from the argument list. This helps simplify the new move_tasks() and
slightly reduces the amount of work done in each of
sched_class.load_balance()'s implementations.
Further simplifications, e.g. changes to balance_tasks(), are possible
but (slightly) complicated by the special needs of load_balance_fair()
so I've left them to a later patch (if this one gets accepted).
NB Since move_tasks() gets called with two run queue locks held even
small reductions in overhead are worthwhile.
[ mingo@elte.hu ]
this change also reduces code size nicely:
   text    data     bss     dec     hex filename
  39216    3618      24   42858    a76a sched.o.before
  39173    3618      24   42815    a73f sched.o.after
Signed-off-by: Peter Williams <pwil3058@bigpond.net.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
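The single-purpose move_tasks() described in this commit can be sketched roughly as follows. This is a userspace model, not the kernel function: the load_balance_fn type and the two example hooks are hypothetical, and the real sched_class.load_balance() takes many more arguments.

```c
#include <stddef.h>

/*
 * Hypothetical per-class hook: tries to move up to max_load of
 * weighted load and returns the amount actually moved.
 */
typedef unsigned long (*load_balance_fn)(unsigned long max_load);

/* Example hooks, invented for illustration only. */
static unsigned long rt_balance(unsigned long max_load)
{
	(void)max_load;
	return 0; /* nothing movable in this class */
}

static unsigned long fair_balance(unsigned long max_load)
{
	return max_load < 512 ? max_load : 512; /* moves at most 512 */
}

/*
 * Ask each class in turn to move load until the target is reached.
 * Returns 1 if any load was moved and 0 otherwise; the number of
 * tasks moved is deliberately not tracked.
 */
static int move_tasks(const load_balance_fn *classes, size_t nr,
		      unsigned long max_load_move)
{
	unsigned long total_load_moved = 0;
	size_t i;

	for (i = 0; i < nr && total_load_moved < max_load_move; i++)
		total_load_moved += classes[i](max_load_move - total_load_moved);

	return total_load_moved > 0;
}
```

Because each hook returns the load it moved, no load_moved output pointer is needed in the argument list.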
Ingo Molnar [Thu, 9 Aug 2007 09:16:45 +0000 (11:16 +0200)]
sched: reorder update_cpu_load(rq) with the ->task_tick() call
Peter Williams suggested flipping the order of update_cpu_load(rq) and
the ->task_tick() call. This is a NOP for the current scheduler (the
two functions are independent of each other), but ->task_tick() might
create some state for update_cpu_load() in the future (or in PlugSched).
Rusty Russell [Mon, 6 Aug 2007 00:48:18 +0000 (10:48 +1000)]
Enable lguest drivers in Kconfig
Lguest drivers need to default to "Y" otherwise they're never selected
for new builds. (We don't bother prompting, because they're less than
4k combined, and implied by selecting lguest support).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Avi Kivity [Sun, 5 Aug 2007 07:16:11 +0000 (10:16 +0300)]
KVM: x86 emulator: fix debug reg mov instructions
More fallout from the writeback fixes: debug register transfer
instructions do their own writeback and thus need to disable the general
writeback mechanism.
This fixes oopses and some guest failures on AMD machines (the Intel
variant decodes the instruction in hardware and thus does not need
emulation).
Cc: Alistair John Strachan <alistair@devzero.co.uk>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 7 Aug 2007 00:52:56 +0000 (17:52 -0700)]
Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[NETFILTER]: Add xt_statistic.h to the header list for usermode programs
[BNX2]: Fix suspend/resume problem.
[TG3]: Fix suspend/resume problem.
Dave Airlie [Mon, 6 Aug 2007 23:09:51 +0000 (09:09 +1000)]
drm/i915: Fix i965 secured batchbuffer usage
The 965G and above chipsets moved the batch buffer non-secure bits to
another place. This means that previous drm versions allowed insecure
batchbuffers to be submitted to the hardware from non-privileged users who
are logged into X and have access to direct rendering.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Mon, 6 Aug 2007 15:10:54 +0000 (16:10 +0100)]
[ARM] pata_icside: fix the FIXMEs
Alan Cox suggested that the solution to the FIXMEs in pata_icside is
to use a private postreset method to detect the lack of devices on a
port, and in such a case, disable the interrupt for the port.
This patch implements such a method, and removes the hard coded
disable of port 0. Tested as working.
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[CRYPTO] api: fix writing into unallocated memory in setkey_aligned
setkey_unaligned() committed in ca7c39385ce1a7b44894a4b225a4608624e90730
overwrites unallocated memory in the following memset() because
I used the wrong buffer length.
Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Dan Williams [Thu, 2 Aug 2007 16:08:51 +0000 (17:08 +0100)]
[ARM] 4541/1: iop: defconfig updates
With the availability of the iop-adma driver iop platforms can now use
their offload engines for md-raid5 (copy+xor) and net-dma (tcp receive
copy) offload.
Cc: Lennert Buytenhek <kernel@wantstofly.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Salyzyn, Mark [Thu, 2 Aug 2007 19:38:59 +0000 (15:38 -0400)]
[SCSI] aacraid: prevent panic on adapter resource failure
If the driver fails to allocate the contiguous (DMAable) memory for
system reasons, we fail to load the instance, but then we try to free
the NULL allocation in the cleanup code and we get a panic in
pci_free_consistent(). This was reported against an older kernel;
hopefully it is still relevant for the latest one.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
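One common way to avoid this class of panic is to make the cleanup path skip resources that were never allocated. The sketch below is a userspace model under that assumption. It is not the aacraid patch: struct fake_adapter and both functions are hypothetical, and malloc()/free() stand in for the DMA allocator (note that free(NULL) is harmless in userspace, unlike the kernel call named above).

```c
#include <stdlib.h>

/* Hypothetical adapter state; field names are invented for this sketch. */
struct fake_adapter {
	void  *comm_addr; /* NULL when the allocation failed */
	size_t comm_size;
};

/* Returns 0 on success, -1 if the buffer could not be allocated. */
static int fake_adapter_init(struct fake_adapter *a, size_t size,
			     int simulate_failure)
{
	a->comm_size = size;
	a->comm_addr = simulate_failure ? NULL : malloc(size);
	return a->comm_addr ? 0 : -1;
}

/* Returns 1 if memory was released, 0 if there was nothing to free. */
static int fake_adapter_cleanup(struct fake_adapter *a)
{
	if (!a->comm_addr)
		return 0; /* allocation never succeeded: skip the free */

	free(a->comm_addr);
	a->comm_addr = NULL; /* makes a second cleanup call a no-op */
	return 1;
}
```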
The check_condition code path was similar to the Reset one, but more
complicated. It went like this:
1. Extra space was allocated in aha152x_scdata for mirroring
scsi_cmnd members.
2. In aha152x_internal_queue(), the members of every command that was
not a check_condition (REQUEST_SENSE) command were copied to the
mirror above, in case of error.
3. In busfree_run(), in the DONE_SC phase, if a Status of
SAM_STAT_CHECK_CONDITION was detected, the command was
re-queued internally using aha152x_internal_queue(,,check_condition,).
The old command members were overwritten with the
REQUEST_SENSE info.
4. In busfree_run(), in the DONE_SC phase again, if it was a
check_condition command, the info was restored from the mirror
made at the first call to aha152x_internal_queue() (see 2)
and the command was completed.
What I did is:
1. Allocate less space in aha152x_scdata, only for the 16-byte
original command (which is actually not needed by scsi-ml
anymore at this stage, but relying on that is too much knowledge of scsi-ml).
2. If Status == SAM_STAT_CHECK_CONDITION, then, like before,
re-queue a REQUEST_SENSE command. But only now save the original
command members (fewer of them).
3. In aha152x_internal_queue(), just like for Reset, use the
check_condition hint to set the working members differently, then
execute the command.
4. In busfree_run(), in the DONE_SC phase again, restore the needed
members.
While at it, this patch fixes a BUG. The old code, when sending
a REQUEST_SENSE for a failed command, would then return with
cmd->resid == 0, which was the status of the REQUEST_SENSE.
The failing command's resid was lost. And when would resid
be interesting if not on a failing command?
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
What the Reset code was doing: save the command's important/dangerous
info on the stack, NULL those members in scsi_cmnd,
issue a Reset, wait for it to finish, then restore the members
and return.
What I do is save or NULL nothing, but use the "resetting"
hint in aha152x_internal_queue() to NULL out the working members
and leave struct scsi_cmnd alone.
The indent here looks funny, but it will change/drop in the last
patch, and it is clear this way what changed.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
[SCSI] aha152x: preliminary fixes and some comments
hunk by hunk:
- CHECK_CONDITION is what you get from cmnd->status >> 1,
  or after the status_byte() macro. But here it is used
  directly on status, which means 0x1, an undefined
  bit in the standard and a status that will never
  be returned from a target.
- In busfree_run(), at the DONE_SC phase, we have 3 distinct
  operations:
  1. if (DONE_SC->SCp.phase & check_condition)
     The REQUEST_SENSE command returned.
     - Restore the original command.
     - Then continue to operation 3.
  2. if (DONE_SC->SCp.Status == SAM_STAT_CHECK_CONDITION)
     A regular command returned with a status.
     - Internally re-queue a REQUEST_SENSE.
     - Do not do operation 3.
  3.
     - Complete the command and return it to scsi-ml.
  So the 0x2 in both these operations (1, 2) means the SCSI
  check-condition status, hence SAM_STAT_CHECK_CONDITION.
- Here the code asks about !(DONE_SC->SCp.Status & not_issued),
  but "not_issued" is an enum belonging to the "phase" member
  and not to the Status returned from the target. The reason this
  works is that not_issued == 1 and also CHECK_CONDITION == 1
  (remember from hunk 1). So actually the code was asking
  !(DONE_SC->SCp.Status & CHECK_CONDITION), which means
  "Has the status been read from the target yet?"
  Status is read at status_run(). "not_issued" is
  cleared in seldo_run(), which is usually earlier than
  status_run().
So this patch does nothing as far as assembly is concerned,
but it does let the reader understand what is going on.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
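The relationship between the two constants in the first hunk can be shown in a few lines of C. The three macro values below are reproduced from memory of the kernel SCSI headers of that period and should be treated as an assumption; is_check_condition() is a hypothetical helper added for illustration.

```c
/* Values as remembered from the kernel SCSI headers (assumption). */
#define SAM_STAT_CHECK_CONDITION 0x02 /* raw status byte from the target */
#define CHECK_CONDITION          0x01 /* same status after shifting */
#define status_byte(result)      (((result) >> 1) & 0x7f)

/* Correct use of the shifted constant: shift first, then compare. */
static int is_check_condition(int status)
{
	return status_byte(status) == CHECK_CONDITION;
}
```

Testing the raw status against CHECK_CONDITION (0x1) therefore tests a bit that a check-condition status never sets, which is why the raw comparison must use SAM_STAT_CHECK_CONDITION.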
James Bottomley [Fri, 3 Aug 2007 21:41:11 +0000 (16:41 -0500)]
[SCSI] sd: disentangle barriers in SCSI
Our current implementation has a generic set of barrier functions that
go through the SCSI driver model. Realistically, this is unnecessary,
because the only device that can use barriers (sd) can set the flush
functions up at probe or revalidate time. This patch pulls the barrier
functions out of the mid layer and scsi driver model and relocates them
directly in sd.
Acked-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Michael Chan [Sat, 4 Aug 2007 03:57:25 +0000 (20:57 -0700)]
[BNX2]: Fix suspend/resume problem.
The device would not resume properly if it was shut down before the system
was suspended. In such a scenario, where the netif_running state is 0,
bnx2_suspend() would not save the PCI state and so the memory enable bit
and bus master enable bit would be lost.
We fix this by always saving and restoring the PCI state in
bnx2_suspend() and bnx2_resume() regardless of netif_running() state.
Update version to 1.6.4.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sat, 4 Aug 2007 03:56:54 +0000 (20:56 -0700)]
[TG3]: Fix suspend/resume problem.
Joachim Deguara <joachim.deguara@amd.com> reported that tg3 devices
would not resume properly if the device was shut down before the system
was suspended. In such a scenario, where the netif_running state is 0,
tg3_suspend() would not save the PCI state and so the memory enable bit
and bus master enable bit would be lost.
We fix this by always saving and restoring the PCI state in
tg3_suspend() and tg3_resume() regardless of netif_running() state.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
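The fix applied to both BNX2 and TG3 follows the same shape: save and restore the PCI state unconditionally, and make only the traffic handling depend on whether the interface is running. The sketch below is a userspace model of that ordering. struct fake_nic and every function in it are hypothetical stand-ins, not driver code.

```c
#include <stdbool.h>

/* Hypothetical device model; all names are invented for this sketch. */
struct fake_nic {
	bool running;          /* stand-in for netif_running() */
	bool mem_enable;       /* live PCI command bits */
	bool bus_master;
	bool saved_mem_enable; /* copy kept across suspend */
	bool saved_bus_master;
};

static void fake_save_state(struct fake_nic *n)
{
	n->saved_mem_enable = n->mem_enable;
	n->saved_bus_master = n->bus_master;
}

static void fake_restore_state(struct fake_nic *n)
{
	n->mem_enable = n->saved_mem_enable;
	n->bus_master = n->saved_bus_master;
}

static void fake_suspend(struct fake_nic *n)
{
	/* Save first, regardless of n->running. */
	fake_save_state(n);

	/* Only a running interface needs its traffic stopped (omitted). */

	/* Model the power loss: the live bits are cleared. */
	n->mem_enable = false;
	n->bus_master = false;
}

static void fake_resume(struct fake_nic *n)
{
	/* Restore first, regardless of n->running. */
	fake_restore_state(n);

	/* Only a running interface needs to be restarted (omitted). */
}
```

If the save were skipped for a device that is down, the restore would have nothing valid to write back and both enable bits would stay cleared.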
Linus Torvalds [Fri, 3 Aug 2007 22:16:33 +0000 (15:16 -0700)]
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[POWERPC] Fixes for the SLB shadow buffer code
[POWERPC] Fix a compile warning in powermac/feature.c
[POWERPC] Fix a compile warning in pci_32.c
[POWERPC] Fix parse_drconf_memory() for 64-bit start addresses
[POWERPC] Fix num_cpus calculation in smp_call_function_map()
[POWERPC] ps3: Fix section mismatch in ps3/setup.c
[POWERPC] spufs: Fix affinity after introduction of node_allowed() calls
[POWERPC] Fix special PTE code for secondary hash bucket
[POWERPC] Expand RPN field to 34 bits when using 64k pages
Oleg Nesterov [Fri, 3 Aug 2007 21:04:41 +0000 (01:04 +0400)]
Kill some obsolete sub-thread-ptrace stuff
There are a couple of subtle checks which were needed to handle ptracing from
the same thread group. This was deprecated a long time ago; imho this code just
complicates the understanding.
And the "->parent->signal->flags & SIGNAL_GROUP_EXIT" check in exit_notify()
is not right. SIGNAL_GROUP_EXIT can mean exec(), not exit_group(). This means
a ptracer can lose a ptraced zombie on exec(). A minor problem, but still a bug.
Daniel Ritz [Fri, 3 Aug 2007 14:07:43 +0000 (16:07 +0200)]
serial: fix 8250 early console setup
the early setup function serial8250_console_early_setup() can be called
from non __init code (eg. hotpluggable serial ports like serial_cs) so
remove the __init from the call chain to avoid crashes.
Signed-off-by: Daniel Ritz <daniel.ritz@gmx.ch>
Cc: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 3 Aug 2007 21:57:41 +0000 (14:57 -0700)]
Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[TCP]: DSACK signals data receival, be conservative
[TCP]: Also handle snd_una changes in tcp_cwnd_down
[TIPC]: Fix two minor sparse warnings.
[TIPC]: Make function tipc_nameseq_subscribe static.
[PF_KEY]: Fix ipsec not working in 2.6.23-rc1-git10
[TCP]: Invoke tcp_sendmsg() directly, do not use inet_sendmsg().
[IPV4] route.c: mostly kmalloc + memset conversion to k[cz]alloc
[IPV4] raw.c: kmalloc + memset conversion to kzalloc
[NETFILTER] nf_conntrack_l3proto_ipv4_compat.c: kmalloc + memset conversion to kzalloc
[NETFILTER] nf_conntrack_expect.c: kmalloc + memset conversion to kzalloc
[NET]: Removal of duplicated include net/wanrouter/wanmain.c
SCTP: remove useless code in function sctp_init_cause
SCTP: drop SACK if ctsn is not less than the next tsn of assoc
SCTP: IPv4 mapped addr not returned in SCTPv6 accept()
SCTP: IPv4 mapped addr not returned in SCTPv6 accept()
sctp: fix shadow symbol in net/sctp/tsnmap.c
sctp: try to fix readlock
sctp: remove shadowed symbols
sctp: move global declaration to header file.
sctp: make locally used function static
Satyam Sharma [Fri, 3 Aug 2007 02:57:13 +0000 (08:27 +0530)]
[MTD] Makefile fix for mtdsuper
We want drivers/mtd/{mtdcore, mtdsuper, mtdpart}.c to be built and linked
into the same mtd.ko module. Fix the Makefile to ensure this, and remove
duplicate MODULE_ declarations in mtdpart.c, as mtdcore.c already has them.
Signed-off-by: Satyam Sharma <satyam@infradead.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Michael Neuling [Fri, 3 Aug 2007 01:55:39 +0000 (11:55 +1000)]
[POWERPC] Fixes for the SLB shadow buffer code
On a machine with hardware 64kB pages and a kernel configured for a
64kB base page size, we need to change the vmalloc segment from 64kB
pages to 4kB pages if some driver creates a non-cacheable mapping in
the vmalloc area. However, we never updated the SLB shadow buffer.
This fixes it. Thanks to paulus for finding this.
Also added some write barriers to ensure the shadow buffer contents
are always consistent.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
[POWERPC] Fix parse_drconf_memory() for 64-bit start addresses
Some new machines use the "ibm,dynamic-reconfiguration-memory" property
to provide memory layout information, rather than via memory nodes.
There is a bug in the code to parse this property for start addresses
over 4GB; we store the start address in an unsigned int, which means
we throw away the high bits and add apparently duplicate regions.
This results in a BUG() in free_bootmem_core(). This fixes it by
using an unsigned long instead.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
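The truncation described above is easy to reproduce in isolation. In this sketch uint32_t and uint64_t stand in for the kernel's unsigned int and unsigned long on 64-bit powerpc, so that it behaves the same on any host; both function names are hypothetical.

```c
#include <stdint.h>

/* Buggy variant: the start address passes through a 32-bit variable. */
static uint64_t region_start_buggy(uint64_t base)
{
	uint32_t start = (uint32_t)base; /* high 32 bits are discarded */
	return start;
}

/* Fixed variant: the variable is wide enough for the full address. */
static uint64_t region_start_fixed(uint64_t base)
{
	uint64_t start = base;
	return start;
}
```

Two regions that differ only above bit 31, such as 0x010000000 and 0x110000000, collapse to the same start address in the buggy variant, which matches the "apparently duplicate regions" in the commit message.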
Kevin Corry [Tue, 31 Jul 2007 20:19:46 +0000 (06:19 +1000)]
[POWERPC] Fix num_cpus calculation in smp_call_function_map()
In smp_call_function_map(), num_cpus is set to the number of online
CPUs minus one. However, if the CPU mask does not include all CPUs
(except the one we're running on), the routine will hang in the first
while() loop until the 8 second timeout occurs.
The num_cpus should be set to the number of CPUs specified in the mask
passed into the routine, after we've made any modifications to the
mask. With this change, we can also get rid of the call to
cpus_empty() and avoid adding another pass through the bitmask.
Signed-off-by: Kevin Corry <kevcorry@us.ibm.com>
Signed-off-by: Carl Love <carll@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
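The corrected calculation amounts to counting the bits left in the mask after it has been adjusted. The sketch below models CPU masks as 64-bit integers. count_targets() is a hypothetical name, and the two specific adjustments (dropping offline CPUs and the calling CPU) are assumptions about which modifications the commit message refers to.

```c
#include <stdint.h>

/*
 * Number of CPUs that will actually be signalled and must respond.
 * 'self' is the calling CPU's number and must be below 64.
 */
static unsigned int count_targets(uint64_t mask, uint64_t online,
				  unsigned int self)
{
	unsigned int n = 0;

	mask &= online;                  /* drop CPUs that are not online */
	mask &= ~(UINT64_C(1) << self);  /* never signal ourselves */

	/* Count the remaining set bits (clears the lowest one each pass). */
	while (mask) {
		mask &= mask - 1;
		n++;
	}
	return n;
}
```

With four online CPUs and a mask naming only CPUs 1 and 2, this yields 2, whereas "online CPUs minus one" yields 3 and leaves the caller waiting for a response that never arrives. A result of 0 also covers the empty-mask case without a separate check.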
Andre Detsch [Mon, 30 Jul 2007 23:48:11 +0000 (09:48 +1000)]
[POWERPC] spufs: Fix affinity after introduction of node_allowed() calls
This patch fixes affinity reference point placement, which was not being
done in some situations, after the introduction of node_allowed() calls.
The previously used parameter, 'ctx', is just the iterator of the
previous list_for_each_entry_reverse loop, and its value might be
invalid at the end of the loop. Also, the right context to seek
for information when defining the reference ctx location
_is_ the reference ctx.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Paul Mackerras [Fri, 3 Aug 2007 09:16:11 +0000 (19:16 +1000)]
[POWERPC] Fix special PTE code for secondary hash bucket
The code for mapping special 4k pages on kernels using a 64kB base
page size was missing the code for doing the RPN (real page number)
manipulation when inserting the hardware PTE in the secondary hash
bucket. It needs the same code as has already been added to the
code that inserts the HPTE in the primary hash bucket. This adds it.
Paul Mackerras [Fri, 3 Aug 2007 04:08:24 +0000 (14:08 +1000)]
[POWERPC] Expand RPN field to 34 bits when using 64k pages
The real page number field in our PTEs when configured for 64kB pages
is currently 32 bits, which turns out to be not quite enough for the
resources that the eHCA driver wants to map. This expands the RPN
field to include 2 adjacent, previously-unused bits.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
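A rough feel for what the two extra bits buy can be had from the arithmetic below. It assumes the RPN counts pages of the 64kB base page size (a page shift of 16), which the commit message does not state explicitly; both function names are hypothetical.

```c
#include <stdint.h>

/* Width of the physical address reachable through an RPN field. */
static unsigned int phys_addr_bits(unsigned int rpn_bits,
				   unsigned int page_shift)
{
	return rpn_bits + page_shift;
}

/* Reachable bytes; valid while the total width stays below 64 bits. */
static uint64_t phys_addr_limit(unsigned int rpn_bits,
				unsigned int page_shift)
{
	return UINT64_C(1) << phys_addr_bits(rpn_bits, page_shift);
}
```

Under that assumption a 32-bit RPN reaches 2^48 bytes and a 34-bit RPN reaches 2^50 bytes, a fourfold increase.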