* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (67 commits)
ide: remove redundant DMA blacklist check from __ide_dma_on()
ide: cleanup ide_set_dma()
ide: remove redundant ->ide_dma_on call from set_using_dma()
sc1200: move DMA timings to timing tables
ide: add IDE_HFLAG_ABUSE_SET_DMA_MODE host flag
sis5513: factor out UDMA programming code
pdc202xx_new: move PIO programming code to pdcnew_set_pio_mode()
ide: make 'extra' field in struct ide_port_info u8
ide: kill duplicate code in ide_dump_{ata,atapi}_status()
ide-disk: use ide_get_lba_addr()
ide: printk fix
ide: add ide_tf_read() helper
ide: fix registers loading order in ide_dump_ata_status()
ide-disk: use do_rw_taskfile() (take 2)
ide-disk: add ide_tf_set_cmd() helper
ide-disk: extend timeout for PIO-in commands
ide: remove 'handler' field from ide_task_t (take 2)
ide: use ->data_phase to set ->handler in do_rw_taskfile()
ide: convert do_rw_taskfile() to use ->data_phase
ide: merge flagged_taskfile() into do_rw_taskfile()
...
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: (96 commits)
sched: keep total / count stats in addition to the max for
sched, futex: detach sched.h and futex.h
sched: fix: don't take a mutex from interrupt context
sched: print backtrace of running tasks too
printk: use ktime_get()
softlockup: fix signedness
sched: latencytop support
sched: fix goto retry in pick_next_task_rt()
timers: don't #error on higher HZ values
sched: monitor clock underflows in /proc/sched_debug
sched: fix rq->clock warps on frequency changes
sched: fix, always create kernel threads with normal priority
debug: clean up kernel/profile.c
sched: remove the !PREEMPT_BKL code
sched: make PREEMPT_BKL the default
debug: track and print last unloaded module in the oops trace
debug: show being-loaded/being-unloaded indicator for modules
sched: rt-watchdog: fix .rlim_max = RLIM_INFINITY
sched: rt-group: reduce rescheduling
hrtimer: unlock hrtimer_wakeup
...
ide: remove redundant DMA blacklist check from __ide_dma_on()
The ->ide_dma_on method is called only after a successful ide_dma_check() call
(ide_dma_check()->ide_tune_dma() checks the DMA blacklist) or if drive->using_dma
has been previously enabled for the given device (->ide_dma_on is the only place
that sets drive->using_dma to '1').
There should be no functionality changes caused by this patch.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add IDE_HFLAG_ABUSE_SET_DMA_MODE host flag and use it to decide
what to do with transfer modes < XFER_PIO_0 in ide_set_xfer_rate().
* Set IDE_HFLAG_ABUSE_SET_DMA_MODE in host drivers that need it
(aec62xx, amd74xx, cs5520, cs5535, hpt34x, hpt366, pdc202xx_old,
serverworks, tc86c001 and via82cxxx) and cleanup ->set_dma_mode
methods in host drivers that don't (IDE core code guarantees that
->set_dma_mode will be called only for modes which are present
in SWDMA/MWDMA/UDMA masks).
While at it:
* Add IDE_HFLAGS_HPT34X/HPT3XX/PDC202XX/SVWKS define in
hpt34x/hpt366/pdc202xx_old/serverworks host driver.
There should be no functionality changes caused by this patch.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
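A minimal standalone sketch of the decision the new host flag controls (the
flag's bit value, the struct and the return convention below are assumptions,
not the kernel's definitions; XFER_PIO_0 is the standard ATA value):

#include <stdio.h>

#define XFER_PIO_0                    0x08        /* standard ATA mode for PIO 0 */
#define IDE_HFLAG_ABUSE_SET_DMA_MODE  (1UL << 0)  /* assumed bit position        */

struct hwif_model { unsigned long host_flags; };

/* Model of the choice in ide_set_xfer_rate(): "special" modes below
 * XFER_PIO_0 reach ->set_dma_mode only when the host driver asked for the
 * old (abusive) behaviour; otherwise they are rejected here. */
static int model_set_xfer_rate(const struct hwif_model *hwif, unsigned char rate)
{
        if (rate < XFER_PIO_0 &&
            !(hwif->host_flags & IDE_HFLAG_ABUSE_SET_DMA_MODE))
                return -1;              /* refuse: not a real transfer mode */

        printf("mode 0x%02x handed to the ->set_*_mode methods\n", rate);
        return 0;
}

int main(void)
{
        struct hwif_model plain  = { 0 };
        struct hwif_model abuser = { IDE_HFLAG_ABUSE_SET_DMA_MODE };

        model_set_xfer_rate(&plain, 0x01);    /* rejected       */
        model_set_xfer_rate(&abuser, 0x01);   /* passed through */
        return 0;
}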
* Factor out code reading taskfile registers from ide_end_drive_cmd()
to the new ide_tf_read() helper.
* Add IDE_TFLAG_IN_* taskfile flags to indicate the need to load
particular IDE taskfile register in ide_tf_read().
* Update ide_end_drive_cmd() to set the respective IDE_TFLAG_IN_* taskfile flags.
* Add ide_get_lba_addr() for getting LBA sector address from taskfile struct.
* Factor out code getting sector address from ide_dump_ata_status()
to the new ide_dump_sector() function.
* Convert ide_dump_sector() to use ide_tf_read() and ide_get_lba_addr().
* Remove no longer needed ide_read_24().
The only change in functionality caused by this patch is that
ide_dump_ata_status() no longer prints the "high"/"low" parts of the LBA48
sector address (the full LBA48 sector address is, of course, still printed).
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
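The ide_get_lba_addr() helper mentioned above boils down to concatenating the
six LBA register bytes (HOB bytes on top); a self-contained sketch of that
computation, using a simplified stand-in for struct ide_taskfile:

#include <stdint.h>
#include <stdio.h>

/* simplified stand-in for the LBA-related taskfile registers */
struct tf_model {
        uint8_t hob_lbah, hob_lbam, hob_lbal;   /* "high order byte" registers */
        uint8_t lbah, lbam, lbal;
};

/* LBA48 address: HOB bytes form bits 47..24, the low bytes bits 23..0 */
static uint64_t model_get_lba_addr(const struct tf_model *tf)
{
        uint32_t high = (tf->hob_lbah << 16) | (tf->hob_lbam << 8) | tf->hob_lbal;
        uint32_t low  = (tf->lbah << 16) | (tf->lbam << 8) | tf->lbal;

        return ((uint64_t)high << 24) | low;
}

int main(void)
{
        struct tf_model tf = {
                .hob_lbal = 0x01,
                .lbah = 0x23, .lbam = 0x45, .lbal = 0x67,
        };

        /* prints lba = 0x1234567 */
        printf("lba = 0x%llx\n", (unsigned long long)model_get_lba_addr(&tf));
        return 0;
}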
* Add ide_tf_set_cmd() helper for selecting/setting command and data phase
(note: DMA data phases are there for completeness, they are not required ATM).
* Set IDE_TFLAG_WRITE taskfile flag for write requests in __ide_do_rw_disk().
* Convert __ide_do_rw_disk() to use the new ide_tf_set_cmd() helper.
There should be no functionality changes caused by this patch.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
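The selection such a helper makes can be illustrated with the standard ATA
opcodes; a sketch of the idea (the function below is a model, not the helper's
actual body; note that READ/WRITE DMA has no separate multi-sector opcode):

#include <stdint.h>

/* standard ATA command opcodes */
#define CMD_READ_SECTORS    0x20
#define CMD_WRITE_SECTORS   0x30
#define CMD_READ_MULTIPLE   0xC4
#define CMD_WRITE_MULTIPLE  0xC5
#define CMD_READ_DMA        0xC8
#define CMD_WRITE_DMA       0xCA

/* model of the decision: data phase (PIO single/multi, DMA) plus direction
 * picks the command opcode */
uint8_t model_tf_set_cmd(int write, int dma, int multi)
{
        if (dma)
                return write ? CMD_WRITE_DMA : CMD_READ_DMA;
        if (multi)
                return write ? CMD_WRITE_MULTIPLE : CMD_READ_MULTIPLE;
        return write ? CMD_WRITE_SECTORS : CMD_READ_SECTORS;
}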
ide: remove 'handler' field from ide_task_t (take 2)
* Add IDE_TFLAG_CUSTOM_HANDLER taskfile flag and use it for internal requests
which require custom handlers. Check the flag in do_rw_taskfile() and set
handler accordingly.
* Cleanup ide_init_{specify,restore,setmult}_cmd() and rename it to
ide_tf_set_{specify,restore,setmult}_cmd().
* Make {set_geometry,recal,set_multmode}_intr() static.
* Remove no longer needed 'handler' field from ide_task_t.
v2:
* 'handler' in do_rw_taskfile() must be set to NULL initially.
There should be no functionality changes caused by this patch.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Use task->data_phase in do_rw_taskfile() to decide what to do.
* task->prehandler is only used by TASKFILE[_MULTI]_OUT so just
use pre_task_out_intr() directly and remove no longer needed
'prehandler' field from ide_task_t.
* Remove no longer needed ide_pre_handler_t type.
There should be no functionality changes caused by this patch.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
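The dispatch described above is a switch on ->data_phase; schematically (the
phase names mirror the IDE code, the handler bodies here are empty
placeholders):

enum data_phase {
        TASKFILE_NO_DATA,
        TASKFILE_IN,  TASKFILE_MULTI_IN,
        TASKFILE_OUT, TASKFILE_MULTI_OUT,
};

typedef void (*ide_handler)(void);

/* placeholders for task_no_data_intr(), task_in_intr(), task_out_intr() */
static void no_data_intr(void) { }
static void pio_in_intr(void)  { }
static void pio_out_intr(void) { }   /* PIO-out is the one case that also
                                        needs pre_task_out_intr() first  */

ide_handler model_pick_handler(enum data_phase phase)
{
        switch (phase) {
        case TASKFILE_OUT:
        case TASKFILE_MULTI_OUT:
                return pio_out_intr;
        case TASKFILE_IN:
        case TASKFILE_MULTI_IN:
                return pio_in_intr;
        default:
                return no_data_intr;
        }
}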
ide: merge flagged_taskfile() into do_rw_taskfile()
Based on the earlier work by Tejun Heo.
The task->data_phase == TASKFILE_MULTI_{IN,OUT} vs. drive->mult_count == 0
check is also needed for ide_taskfile_ioctl() requests that don't have the
IDE_TFLAG_FLAGGED taskfile flag set.
* Add 'data_buf' and 'nsect' variables in ide_taskfile_ioctl()
to cache data buffer pointer and number of sectors to transfer
(this allows us to have only one ide_diag_taskfile() call).
* Add IDE_TFLAG_WRITE taskfile flag and use it to check whether
the REQ_RW request flag should be set.
* Move ->command_type handling from ide_diag_taskfile() to
ide_taskfile_ioctl() and use ->req_cmd instead of ->command_type.
* Add 'nsect' parameter to ide_raw_taskfile().
* Merge ide_diag_taskfile() into ide_raw_taskfile().
* Initialize ->data_phase explicitly in idedisk_prepare_flush(),
ide_start_power_step() and ide_disk_special().
* Remove no longer needed 'command_type' field from ide_task_t.
* Add #ifndef/#endif __KERNEL__ guards to <linux/hdreg.h> around the
IDE_DRIVE_TASK_* and TASKFILE_* defines that are no longer used by the kernel.
There should be no functionality changes caused by this patch.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
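The mult_count sanity check mentioned at the top of this entry is small enough
to spell out; a stand-in model (the struct and constant definitions here are
illustrative):

/* illustrative stand-ins; only the fields the check needs */
#define TASKFILE_MULTI_IN   1
#define TASKFILE_MULTI_OUT  2

struct task_model  { int data_phase; };
struct drive_model { unsigned int mult_count; };

/* Multi-sector PIO without a programmed multi count must be rejected,
 * whether or not the taskfile came in flagged (IDE_TFLAG_FLAGGED). */
int model_check_multi(const struct task_model *task,
                      const struct drive_model *drive)
{
        if ((task->data_phase == TASKFILE_MULTI_IN ||
             task->data_phase == TASKFILE_MULTI_OUT) &&
            drive->mult_count == 0)
                return -1;   /* abort the request */

        return 0;
}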
* Add IDE_TFLAG_OUT_DEVICE taskfile flag to indicate the need to write
the Device register, and handle it in ide_tf_load().
Update ide_tf_load() and {do_rw,flagged}_taskfile() users accordingly.
* Use struct ide_taskfile and ide_tf_load() in execute_drive_cmd().
* Make the debugging code dump all taskfile registers for both
REQ_TYPE_ATA_{CMD,TASK} requests and move it to ide_tf_load()
so it also covers REQ_TYPE_ATA_TASKFILE requests.
There should be no functionality changes caused by this patch
(unless DEBUG is defined).
* Rename 'args' variable in 'if (rq->cmd_type == REQ_TYPE_ATA_TASKFILE)'
block to 'task'.
* execute_drive_cmd() is used only for REQ_TYPE_ATA_{CMD,TASK,TASKFILE} so
we can move the common code out from 'if (rq->cmd_type == REQ_TYPE_ATA_CMD)'
and 'if (rq->cmd_type == REQ_TYPE_ATA_TASK)' blocks.
There should be no functionality changes caused by this patch.
ide: fix registers loading order for WIN_SMART in execute_drive_cmd()
Fix the register loading order for REQ_TYPE_ATA_CMD requests with the WIN_SMART
command in execute_drive_cmd() (load IDE_FEATURE_REG and IDE_SECTOR_REG
before loading IDE_LCYL_REG and IDE_HCYL_REG).
It shouldn't affect anything (just the usual paranoia of separating changes
that alter how the hardware is accessed from pure code cleanups).
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
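For reference, WIN_SMART carries its subcommand in the Feature register and
the fixed 0x4F/0xC2 signature in LCYL/HCYL; a standalone sketch of the load
order the fix establishes (the port writes are modelled as prints, the
register names are the usual IDE ones):

#include <stdio.h>

#define WIN_SMART        0xB0   /* ATA SMART command           */
#define SMART_LCYL_PASS  0x4F   /* fixed SMART signature, low  */
#define SMART_HCYL_PASS  0xC2   /* fixed SMART signature, high */

static void reg_write(const char *reg, unsigned char val)   /* stands in for outb() */
{
        printf("%-7s <- 0x%02x\n", reg, val);
}

/* Feature (SMART subcommand) and Sector Count go out first,
 * then the LCYL/HCYL signature, then the command itself. */
int main(void)
{
        unsigned char subcmd = 0xD0;   /* e.g. SMART READ DATA */

        reg_write("FEATURE", subcmd);
        reg_write("NSECTOR", 1);
        reg_write("LCYL", SMART_LCYL_PASS);
        reg_write("HCYL", SMART_HCYL_PASS);
        reg_write("COMMAND", WIN_SMART);
        return 0;
}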
ide: remove IRQF_DISABLED from IRQ flags for IDE IRQ handler
IRQF_DISABLED is not needed because the first thing that ide_intr()
(the IDE IRQ handler) does is call spin_lock_irqsave(), which disables
local IRQs (IRQ unmasking is later handled by drive->unmask).
kernel/irq/handle.c:

irqreturn_t handle_IRQ_event(unsigned int irq, struct irqaction *action)
        ...
        if (!(action->flags & IRQF_DISABLED))
                local_irq_enable_in_hardirq();

        do {
                ret = action->handler(irq, action->dev_id);
                if (ret == IRQ_HANDLED)
                        status |= action->flags;
                retval |= ret;
                action = action->next;
        } while (action);
        ...
* pmac_ide_init_hwif_ports() can be called by ide_init_hwif_ports()
(through the ppc_ide_md.ide_init_hwif hook) for non-IDE PMAC interfaces.
If this is the case, hw->io_ports[] should already be set up by
ide_init_hwif_ports()->ide_std_init_ports(), so remove the redundant code
from pmac_ide_init_hwif_ports().
As a side effect, this change fixes the ctl_addr == 0 special handling in
ide_init_hwif_ports().
* Fix a misleading comment while at it.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add 'tf_flags' field (for taskfile flags) to ide_task_t.
* Add IDE_TFLAG_LBA48 taskfile flag for LBA48 taskfiles.
* Add IDE_TFLAG_NO_SELECT_MASK taskfile flag for __ide_do_rw_disk()
which doesn't use SELECT_MASK() (looks like a bug but it requires
some more investigation).
* Split off ide_tf_load() helper from do_rw_taskfile().
* Convert __ide_do_rw_disk() to use ide_tf_load().
There should be no functionality changes caused by this patch.
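The core of a flag-driven loader like ide_tf_load() is that each
IDE_TFLAG_OUT_* bit gates one register write; a standalone sketch (the bit
values and the port-write helper below are illustrative, not the kernel's):

#include <stdint.h>
#include <stdio.h>

/* assumed bit layout, illustrative only */
#define OUT_FEATURE  (1u << 0)
#define OUT_NSECT    (1u << 1)
#define OUT_LBAL     (1u << 2)
#define OUT_LBAM     (1u << 3)
#define OUT_LBAH     (1u << 4)
#define OUT_DEVICE   (1u << 5)

struct tf_model {
        unsigned int tf_flags;
        uint8_t feature, nsect, lbal, lbam, lbah, device;
};

static void reg_write(const char *reg, uint8_t val)   /* stands in for outb() */
{
        printf("%-7s <- 0x%02x\n", reg, val);
}

/* only registers whose OUT_* flag is set are touched */
void model_tf_load(const struct tf_model *tf)
{
        if (tf->tf_flags & OUT_FEATURE) reg_write("FEATURE", tf->feature);
        if (tf->tf_flags & OUT_NSECT)   reg_write("NSECT",   tf->nsect);
        if (tf->tf_flags & OUT_LBAL)    reg_write("LBAL",    tf->lbal);
        if (tf->tf_flags & OUT_LBAM)    reg_write("LBAM",    tf->lbam);
        if (tf->tf_flags & OUT_LBAH)    reg_write("LBAH",    tf->lbah);
        if (tf->tf_flags & OUT_DEVICE)  reg_write("DEVICE",  tf->device);
}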
Sergei Shtylyov [Fri, 25 Jan 2008 21:17:05 +0000 (22:17 +0100)]
hpt366: merge set_dma_mode() methods
Group the array of pointers to the timing tables with the timing register
masks, which allows us to merge the HPT36x/HPT37x set_dma_mode() methods into one.
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
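The merge works because each chip family is fully described by the pair
(timing table, timing register mask); a sketch of the resulting single method
(the struct and names below are illustrative, not the driver's):

#include <stdint.h>

/* One entry per chip family: its mode->timing lookup table plus the mask of
 * timing-register bits that the table is allowed to touch. */
struct timing_info {
        const uint32_t *timings;   /* indexed by transfer mode       */
        uint32_t        mask;      /* bits owned by DMA/UDMA timings */
};

/* With the table and mask grouped, one set_dma_mode() body serves all chips:
 * read-modify-write the timing register using the per-chip mask. */
uint32_t model_set_dma_mode(const struct timing_info *info,
                            uint32_t old_reg, unsigned int mode)
{
        return (old_reg & ~info->mask) | (info->timings[mode] & info->mask);
}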
Sergei Shtylyov [Fri, 25 Jan 2008 21:17:04 +0000 (22:17 +0100)]
hpt366: change timing register masks
Since PIO autotuning is now always done, there's no longer any need to also
program the taskfile timings for DMA modes, so change the IDE timing register
masks accordingly, "inverting the polarity" of the masks while at it...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
        /* bail early if we've exceeded max_failures */
        if (drive->max_failures && (drive->failures > drive->max_failures)) {
                goto kill_rq;
        }
        (...)
kill_rq:
        ide_kill_rq(drive, rq);
        return ide_stopped;
ide_kill_rq() and the subsequent calls won't set REQ_FAILED in rq->cmd_flags,
and thus cdrom_queue_packet_command() won't return an error. Then:
        stat = cdrom_queue_packet_command(drive, &req);
        if (stat == 0) {
                *capacity = 1 + be32_to_cpu(capbuf.lba);
                *sectors_per_frame =
                        be32_to_cpu(capbuf.blocklen) >> SECTOR_BITS;
        }
cdrom_read_capacity() ends up believing capbuf is valid when in fact it's
just uninitialized data. Back in cdrom_read_toc():
        /* Try to get the total cdrom capacity and sector size. */
        stat = cdrom_read_capacity(drive, &toc->capacity, &sectors_per_frame,
                                   sense);
        if (stat)
                toc->capacity = 0x1fffff;

        set_capacity(info->disk, toc->capacity * sectors_per_frame);
        /* Save a private copy of the TOC capacity for error handling */
        drive->probed_capacity = toc->capacity * sectors_per_frame;
That will set drive->queue->hardsect_size to a random value. hardsect_size
is used to calculate inode->i_blkbits. Later on, on the read path:
void create_empty_buffers(struct page *page,
                        unsigned long blocksize, unsigned long b_state)
{
        struct buffer_head *bh, *head, *tail;

        head = alloc_page_buffers(page, blocksize, 1);
        bh = head;
        do {
                bh->b_state |= b_state;
                tail = bh;
                bh = bh->b_this_page;
        } while (bh);
        tail->b_this_page = head;
alloc_page_buffers() will return NULL if blocksize > 4096. blocksize is
calculated from inode->i_blkbits, so this triggers a NULL pointer
dereference in create_empty_buffers().
Linus Torvalds [Fri, 25 Jan 2008 20:20:32 +0000 (12:20 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6:
mount options: fix jfs
JFS: simplify types to get rid of sparse warning
JFS: Fix one more plain integer as NULL pointer warning
JFS: Remove defconfig ptr comparison to 0
JFS: use DIV_ROUND_UP where appropriate
Remove unnecessary kmalloc casts in the jfs filesystem
JFS is missing a memory barrier
JFS: Make sure special inode data is written after journal is flushed
JFS: clear PAGECACHE_TAG_DIRTY for no-write pages
Arjan van de Ven [Fri, 25 Jan 2008 20:08:35 +0000 (21:08 +0100)]
sched: keep total / count stats in addition to the max for
Right now, the Linux kernel (with scheduler statistics enabled) keeps track
of the maximum time a process waits to be scheduled. While the maximum
is a very useful metric, tracking the average and the total is equally useful
(at least for latencytop) to figure out the accumulated effect of scheduler
delays. The accumulated effect is important to judge the performance impact
of scheduler tuning/behavior.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
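Keeping the total and count alongside the max is a three-field accumulator; a
standalone model of the bookkeeping (the field names are illustrative, the
real counters live in the scheduler's per-entity statistics):

#include <stdint.h>

/* max alone vs. max + sum + count; the average is sum / count and is what
 * latencytop is after. */
struct wait_stats {
        uint64_t max;    /* worst single scheduling delay           */
        uint64_t sum;    /* total delay, for computing the average  */
        uint64_t count;  /* number of delays that went into the sum */
};

void model_account_wait(struct wait_stats *s, uint64_t delay)
{
        if (delay > s->max)
                s->max = delay;
        s->sum += delay;
        s->count++;
}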
Peter Zijlstra [Fri, 25 Jan 2008 20:08:34 +0000 (21:08 +0100)]
sched: fix: don't take a mutex from interrupt context
print_cfs_stats is callable from interrupt context (sysrq), hence it should
not take mutexes. Change it to use RCU, since the task group data is
RCU-freed anyway.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Dmitry Adamushko [Fri, 25 Jan 2008 20:08:34 +0000 (21:08 +0100)]
sched: fix goto retry in pick_next_task_rt()
looking at it one more time:
(1) it looks to me that there is no need to call
sched_rt_ratio_exceeded() from pick_next_rt_entity()
- [ for CONFIG_FAIR_GROUP_SCHED ] queues with rt_rq->rt_throttled are
not within this 'tree-like hierarchy' (or whatever we should call it
:-)
- there is also no need to re-check 'rt_rq->rt_time > ratio' at this
point as 'rt_rq->rt_time' couldn't have been increased since the last
call to update_curr_rt() (which obviously calls
sched_rt_ratio_exceeded())
well, it might be that 'ratio' for this rt_rq has been re-configured
(and the period over which this rt_rq was active has not yet been
finished)... but I don't think we should really take this into
account.
(2) now pick_next_rt_entity() must never return NULL, so let's change
pick_next_task_rt() accordingly.
Fix commit 2bacec8c318ca0418c0ee9ac662ee44207765dd4
("sched: touch softlockup watchdog after idling"), which reintroduced warps
on frequency changes. touch_softlockup_watchdog() calls __update_rq_clock(),
which checks rq->clock for warps, so call it after adjusting rq->clock.
Arjan van de Ven [Fri, 25 Jan 2008 20:08:33 +0000 (21:08 +0100)]
debug: track and print last unloaded module in the oops trace
Based on a suggestion from Andi:
In various cases, the unload of a module may leave some bad state around
that causes a kernel crash after the module is unloaded, and it's then hard
to find which module caused it.
This patch tracks the last unloaded module, and prints this as part of the
module list in the oops trace.
Right now, only the last unloaded module is tracked; I expect that this is
enough for the vast majority of cases where this information matters. If it
turns out that tracking more is important, we can always extend it.
[ mingo@elte.hu: build fix ]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Arjan van de Ven [Fri, 25 Jan 2008 20:08:33 +0000 (21:08 +0100)]
debug: show being-loaded/being-unloaded indicator for modules
It's rather common that an oops/WARN_ON/BUG happens during the load or
unload of a module. Unfortunately, it's not always easy to see directly
which module is being loaded/unloaded from the oops itself. Worse,
it's not even always possible to ask the bug reporter, since there
are so many components (udev etc) that auto-load modules that there's
a good chance that even the reporter doesn't know which module this is.
This patch extends the existing "show if it's tainting" print code,
which is used as part of printing the modules in the oops/BUG/WARN_ON
to include a "+" for "being loaded" and a "-" for "being unloaded".
As a result of this extension, the "taint_flags()" function gets renamed to
"module_flags()" (and takes a module struct as argument, not a taint
flags int).
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Fri, 25 Jan 2008 20:08:32 +0000 (21:08 +0100)]
sched: rt-watchdog: fix .rlim_max = RLIM_INFINITY
Remove the curious logic to set it_sched_expires in the future. It is useless
because rt.timeout wouldn't be incremented anyway.
Explicitly check for RLIM_INFINITY, as a test program that had a 1s soft limit
and an infinite hard limit would get SIGKILLed at 1s. This is because
RLIM_INFINITY + d - 1 is d - 2.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
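The wraparound alluded to above ("RLIM_INFINITY + d - 1 is d - 2") is ordinary
unsigned overflow, assuming the usual all-ones definition of RLIM_INFINITY:

#include <stdio.h>

int main(void)
{
        unsigned long inf = ~0UL;      /* RLIM_INFINITY is commonly all-ones */
        unsigned long d   = 1000;      /* any finite value                   */

        /* wraps around: (2^N - 1) + d - 1  ==  d - 2 (mod 2^N) */
        printf("%lu\n", inf + d - 1);  /* prints 998, i.e. d - 2 */
        return 0;
}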
Peter Zijlstra [Fri, 25 Jan 2008 20:08:30 +0000 (21:08 +0100)]
sched: rt group scheduling
Extend group scheduling to also cover the realtime classes. It uses the time
limiting introduced by the previous patch to allow multiple realtime groups.
The hard time limit is required to keep behaviour deterministic.
The algorithms used make the realtime scheduler O(tg), i.e. scaling linearly
with the number of task groups. This is the worst-case behaviour I can't seem
to get out of; the average case of the algorithms can be improved, but I
focused on correctness and the worst case.
[ akpm@linux-foundation.org: move side-effects out of BUG_ON(). ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Fri, 25 Jan 2008 20:08:29 +0000 (21:08 +0100)]
sched: rt time limit
A very simple time limit on the realtime scheduling classes:
allow the rq's realtime class to consume sched_rt_ratio of every
sched_rt_period slice. If the class exceeds this quota, the fair class
will preempt the realtime class.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
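The quota check this describes reduces to comparing accumulated realtime
runtime against ratio times period; a toy model of that comparison (the units
and the percentage representation of sched_rt_ratio are assumptions):

#include <stdint.h>

/* Toy model: has the realtime class used more than its share of the
 * current sched_rt_period?  The ratio is modelled as a percentage. */
int model_rt_throttled(uint64_t rt_time_ns, uint64_t rt_period_ns,
                       unsigned int rt_ratio_pct)
{
        return rt_time_ns > rt_period_ns * rt_ratio_pct / 100;
}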
Peter Zijlstra [Fri, 25 Jan 2008 20:08:29 +0000 (21:08 +0100)]
sched: high-res preemption tick
Use HR-timers (when available) to deliver an accurate preemption tick.
The regular scheduler tick that runs at 1/HZ can be too coarse when nice
levels are used. The fairness system will still keep CPU utilisation 'fair'
by delaying the task that got an excessive amount of CPU time, but tries to
minimize this by delivering preemption points spot-on.
The average frequency of this extra interrupt is sched_latency / nr_latency.
This need not be higher than 1/HZ; it's just that the distribution within
the sched_latency period is important.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Fri, 25 Jan 2008 20:08:27 +0000 (21:08 +0100)]
sched: SCHED_FIFO/SCHED_RR watchdog timer
Introduce a new rlimit that allows the user to set a runtime timeout on a
real-time task's timeslice. Once this limit is exceeded, the task will
receive SIGXCPU.
The limit measures runtime since the last sleep.
Input and ideas by Thomas Gleixner and Lennart Poettering.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Lennart Poettering <mzxreary@0pointer.de> CC: Michael Kerrisk <mtk.manpages@googlemail.com> CC: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
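From userspace the new limit is used like any other rlimit; a minimal example
(assuming the limit is exposed as RLIMIT_RTTIME with a microsecond unit, which
is how it eventually appeared in the headers), in which a SCHED_FIFO task asks
for SIGXCPU after roughly half a second of runtime without sleeping:

#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

static void on_sigxcpu(int sig)
{
        (void)sig;
        static const char msg[] = "SIGXCPU: RT runtime limit exceeded\n";
        write(STDERR_FILENO, msg, sizeof(msg) - 1);
}

int main(void)
{
        struct sched_param sp = { .sched_priority = 1 };
        struct rlimit rl = {
                .rlim_cur = 500000,          /* soft limit: 500 ms -> SIGXCPU */
                .rlim_max = RLIM_INFINITY,   /* hard limit: never SIGKILL     */
        };

        signal(SIGXCPU, on_sigxcpu);

        if (sched_setscheduler(0, SCHED_FIFO, &sp) < 0 ||   /* needs privileges */
            setrlimit(RLIMIT_RTTIME, &rl) < 0) {
                perror("setup");
                return 1;
        }

        for (;;)
                ;   /* busy-loop without sleeping until the watchdog fires */
}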