* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (67 commits)
ide: remove redundant DMA blacklist check from __ide_dma_on()
ide: cleanup ide_set_dma()
ide: remove redundant ->ide_dma_on call from set_using_dma()
sc1200: move DMA timings to timing tables
ide: add IDE_HFLAG_ABUSE_SET_DMA_MODE host flag
sis5513: factor out UDMA programming code
pdc202xx_new: move PIO programming code to pdcnew_set_pio_mode()
ide: make 'extra' field in struct ide_port_info u8
ide: kill duplicate code in ide_dump_{ata,atapi}_status()
ide-disk: use ide_get_lba_addr()
ide: printk fix
ide: add ide_tf_read() helper
ide: fix registers loading order in ide_dump_ata_status()
ide-disk: use do_rw_taskfile() (take 2)
ide-disk: add ide_tf_set_cmd() helper
ide-disk: extend timeout for PIO-in commands
ide: remove 'handler' field from ide_task_t (take 2)
ide: use ->data_phase to set ->handler in do_rw_taskfile()
ide: convert do_rw_taskfile() to use ->data_phase
ide: merge flagged_taskfile() into do_rw_taskfile()
...
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: (96 commits)
sched: keep total / count stats in addition to the max for
sched, futex: detach sched.h and futex.h
sched: fix: don't take a mutex from interrupt context
sched: print backtrace of running tasks too
printk: use ktime_get()
softlockup: fix signedness
sched: latencytop support
sched: fix goto retry in pick_next_task_rt()
timers: don't #error on higher HZ values
sched: monitor clock underflows in /proc/sched_debug
sched: fix rq->clock warps on frequency changes
sched: fix, always create kernel threads with normal priority
debug: clean up kernel/profile.c
sched: remove the !PREEMPT_BKL code
sched: make PREEMPT_BKL the default
debug: track and print last unloaded module in the oops trace
debug: show being-loaded/being-unloaded indicator for modules
sched: rt-watchdog: fix .rlim_max = RLIM_INFINITY
sched: rt-group: reduce rescheduling
hrtimer: unlock hrtimer_wakeup
...
ide: remove redundant DMA blacklist check from __ide_dma_on()
The ->ide_dma_on method is called only after a successful ide_dma_check() call
(ide_dma_check()->ide_tune_dma() checks the DMA blacklist) or if drive->using_dma
has been previously enabled for the given device (->ide_dma_on is the only place
that sets drive->using_dma to '1').
There should be no functionality changes caused by this patch.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add IDE_HFLAG_ABUSE_SET_DMA_MODE host flag and use it to decide
what to do with transfer modes < XFER_PIO_0 in ide_set_xfer_rate().
* Set IDE_HFLAG_ABUSE_SET_DMA_MODE in host drivers that need it
(aec62xx, amd74xx, cs5520, cs5535, hpt34x, hpt366, pdc202xx_old,
serverworks, tc86c001 and via82cxxx) and cleanup ->set_dma_mode
methods in host drivers that don't (IDE core code guarantees that
->set_dma_mode will be called only for modes which are present
in SWDMA/MWDMA/UDMA masks).
While at it:
* Add IDE_HFLAGS_HPT34X/HPT3XX/PDC202XX/SVWKS define in
hpt34x/hpt366/pdc202xx_old/serverworks host driver.
There should be no functionality changes caused by this patch.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
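A minimal standalone sketch of the decision the new host flag controls (the
flag's bit value, the struct and the return convention below are assumptions,
not the kernel's definitions; XFER_PIO_0 is the standard ATA value):

#include <stdio.h>

#define XFER_PIO_0                    0x08        /* standard ATA mode for PIO 0 */
#define IDE_HFLAG_ABUSE_SET_DMA_MODE  (1UL << 0)  /* assumed bit position        */

struct hwif_model { unsigned long host_flags; };

/* Model of the choice in ide_set_xfer_rate(): "special" modes below
 * XFER_PIO_0 reach ->set_dma_mode only when the host driver asked for the
 * old (abusive) behaviour; otherwise they are rejected here. */
static int model_set_xfer_rate(const struct hwif_model *hwif, unsigned char rate)
{
        if (rate < XFER_PIO_0 &&
            !(hwif->host_flags & IDE_HFLAG_ABUSE_SET_DMA_MODE))
                return -1;              /* refuse: not a real transfer mode */

        printf("mode 0x%02x handed to the ->set_*_mode methods\n", rate);
        return 0;
}

int main(void)
{
        struct hwif_model plain  = { 0 };
        struct hwif_model abuser = { IDE_HFLAG_ABUSE_SET_DMA_MODE };

        model_set_xfer_rate(&plain, 0x01);    /* rejected       */
        model_set_xfer_rate(&abuser, 0x01);   /* passed through */
        return 0;
}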
* Factor out code reading taskfile registers from ide_end_drive_cmd()
to the new ide_tf_read() helper.
* Add IDE_TFLAG_IN_* taskfile flags to indicate the need to load
particular IDE taskfile register in ide_tf_read().
* Update ide_end_drive_cmd() to set the respective IDE_TFLAG_IN_* taskfile flags.
* Add ide_get_lba_addr() for getting LBA sector address from taskfile struct.
* Factor out code getting sector address from ide_dump_ata_status()
to the new ide_dump_sector() function.
* Convert ide_dump_sector() to use ide_tf_read() and ide_get_lba_addr().
* Remove no longer needed ide_read_24().
The only change in functionality caused by this patch is that
ide_dump_ata_status() no longer prints the "high"/"low" parts of the LBA48
sector address (the full LBA48 sector address is, of course, still printed).
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
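The ide_get_lba_addr() helper mentioned above boils down to concatenating the
six LBA register bytes (HOB bytes on top); a self-contained sketch of that
computation, using a simplified stand-in for struct ide_taskfile:

#include <stdint.h>
#include <stdio.h>

/* simplified stand-in for the LBA-related taskfile registers */
struct tf_model {
        uint8_t hob_lbah, hob_lbam, hob_lbal;   /* "high order byte" registers */
        uint8_t lbah, lbam, lbal;
};

/* LBA48 address: HOB bytes form bits 47..24, the low bytes bits 23..0 */
static uint64_t model_get_lba_addr(const struct tf_model *tf)
{
        uint32_t high = (tf->hob_lbah << 16) | (tf->hob_lbam << 8) | tf->hob_lbal;
        uint32_t low  = (tf->lbah << 16) | (tf->lbam << 8) | tf->lbal;

        return ((uint64_t)high << 24) | low;
}

int main(void)
{
        struct tf_model tf = {
                .hob_lbal = 0x01,
                .lbah = 0x23, .lbam = 0x45, .lbal = 0x67,
        };

        /* prints lba = 0x1234567 */
        printf("lba = 0x%llx\n", (unsigned long long)model_get_lba_addr(&tf));
        return 0;
}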
* Add ide_tf_set_cmd() helper for selecting/setting command and data phase
(note: DMA data phases are there for completeness, they are not required ATM).
* Set IDE_TFLAG_WRITE taskfile flag for write requests in __ide_do_rw_disk().
* Convert __ide_do_rw_disk() to use the new ide_tf_set_cmd() helper.
There should be no functionality changes caused by this patch.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
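The selection such a helper makes can be illustrated with the standard ATA
opcodes; a sketch of the idea (the function below is a model, not the helper's
actual body; note that READ/WRITE DMA has no separate multi-sector opcode):

#include <stdint.h>

/* standard ATA command opcodes */
#define CMD_READ_SECTORS    0x20
#define CMD_WRITE_SECTORS   0x30
#define CMD_READ_MULTIPLE   0xC4
#define CMD_WRITE_MULTIPLE  0xC5
#define CMD_READ_DMA        0xC8
#define CMD_WRITE_DMA       0xCA

/* model of the decision: data phase (PIO single/multi, DMA) plus direction
 * picks the command opcode */
uint8_t model_tf_set_cmd(int write, int dma, int multi)
{
        if (dma)
                return write ? CMD_WRITE_DMA : CMD_READ_DMA;
        if (multi)
                return write ? CMD_WRITE_MULTIPLE : CMD_READ_MULTIPLE;
        return write ? CMD_WRITE_SECTORS : CMD_READ_SECTORS;
}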
ide: remove 'handler' field from ide_task_t (take 2)
* Add IDE_TFLAG_CUSTOM_HANDLER taskfile flag and use it for internal requests
which require custom handlers. Check the flag in do_rw_taskfile() and set
handler accordingly.
* Cleanup ide_init_{specify,restore,setmult}_cmd() and rename it to
ide_tf_set_{specify,restore,setmult}_cmd().
* Make {set_geometry,recal,set_multmode}_intr() static.
* Remove no longer needed 'handler' field from ide_task_t.
v2:
* 'handler' in do_rw_taskfile() must be set to NULL initially.
There should be no functionality changes caused by this patch.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Use task->data_phase in do_rw_taskfile() to decide what to do.
* task->prehandler is only used by TASKFILE[_MULTI]_OUT so just
use pre_task_out_intr() directly and remove no longer needed
'prehandler' field from ide_task_t.
* Remove no longer needed ide_pre_handler_t type.
There should be no functionality changes caused by this patch.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
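The dispatch described above is a switch on ->data_phase; schematically (the
phase names mirror the IDE code, the handler bodies here are empty
placeholders):

enum data_phase {
        TASKFILE_NO_DATA,
        TASKFILE_IN,  TASKFILE_MULTI_IN,
        TASKFILE_OUT, TASKFILE_MULTI_OUT,
};

typedef void (*ide_handler)(void);

/* placeholders for task_no_data_intr(), task_in_intr(), task_out_intr() */
static void no_data_intr(void) { }
static void pio_in_intr(void)  { }
static void pio_out_intr(void) { }   /* PIO-out is the one case that also
                                        needs pre_task_out_intr() first  */

ide_handler model_pick_handler(enum data_phase phase)
{
        switch (phase) {
        case TASKFILE_OUT:
        case TASKFILE_MULTI_OUT:
                return pio_out_intr;
        case TASKFILE_IN:
        case TASKFILE_MULTI_IN:
                return pio_in_intr;
        default:
                return no_data_intr;
        }
}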
ide: merge flagged_taskfile() into do_rw_taskfile()
Based on the earlier work by Tejun Heo.
The task->data_phase == TASKFILE_MULTI_{IN,OUT} vs. drive->mult_count == 0
check is also needed for ide_taskfile_ioctl() requests that don't have the
IDE_TFLAG_FLAGGED taskfile flag set.
* Add 'data_buf' and 'nsect' variables in ide_taskfile_ioctl()
to cache data buffer pointer and number of sectors to transfer
(this allows us to have only one ide_diag_taskfile() call).
* Add IDE_TFLAG_WRITE taskfile flag and use it to check whether
the REQ_RW request flag should be set.
* Move ->command_type handling from ide_diag_taskfile() to
ide_taskfile_ioctl() and use ->req_cmd instead of ->command_type.
* Add 'nsect' parameter to ide_raw_taskfile().
* Merge ide_diag_taskfile() into ide_raw_taskfile().
* Initialize ->data_phase explicitly in idedisk_prepare_flush(),
ide_start_power_step() and ide_disk_special().
* Remove no longer needed 'command_type' field from ide_task_t.
* Add #ifndef/#endif __KERNEL__ guards to <linux/hdreg.h> around the
IDE_DRIVE_TASK_* and TASKFILE_* defines that are no longer used by the kernel.
There should be no functionality changes caused by this patch.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
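The mult_count sanity check mentioned at the top of this entry is small enough
to spell out; a stand-in model (the struct and constant definitions here are
illustrative):

/* illustrative stand-ins; only the fields the check needs */
#define TASKFILE_MULTI_IN   1
#define TASKFILE_MULTI_OUT  2

struct task_model  { int data_phase; };
struct drive_model { unsigned int mult_count; };

/* Multi-sector PIO without a programmed multi count must be rejected,
 * whether or not the taskfile came in flagged (IDE_TFLAG_FLAGGED). */
int model_check_multi(const struct task_model *task,
                      const struct drive_model *drive)
{
        if ((task->data_phase == TASKFILE_MULTI_IN ||
             task->data_phase == TASKFILE_MULTI_OUT) &&
            drive->mult_count == 0)
                return -1;   /* abort the request */

        return 0;
}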
* Add IDE_TFLAG_OUT_DEVICE taskfile flag to indicate the need to write
the Device register, and handle it in ide_tf_load().
Update ide_tf_load() and {do_rw,flagged}_taskfile() users accordingly.
* Use struct ide_taskfile and ide_tf_load() in execute_drive_cmd().
* Make the debugging code dump all taskfile registers for both
REQ_TYPE_ATA_{CMD,TASK} requests and move it to ide_tf_load()
so it also covers REQ_TYPE_ATA_TASKFILE requests.
There should be no functionality changes caused by this patch
(unless DEBUG is defined).
* Rename 'args' variable in 'if (rq->cmd_type == REQ_TYPE_ATA_TASKFILE)'
block to 'task'.
* execute_drive_cmd() is used only for REQ_TYPE_ATA_{CMD,TASK,TASKFILE} so
we can move the common code out from 'if (rq->cmd_type == REQ_TYPE_ATA_CMD)'
and 'if (rq->cmd_type == REQ_TYPE_ATA_TASK)' blocks.
There should be no functionality changes caused by this patch.
ide: fix registers loading order for WIN_SMART in execute_drive_cmd()
Fix the register loading order for REQ_TYPE_ATA_CMD requests with the WIN_SMART
command in execute_drive_cmd() (load IDE_FEATURE_REG and IDE_SECTOR_REG
before loading IDE_LCYL_REG and IDE_HCYL_REG).
It shouldn't affect anything (just the usual paranoia of separating changes
that alter how the hardware is accessed from pure code cleanups).
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
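For reference, WIN_SMART carries its subcommand in the Feature register and
the fixed 0x4F/0xC2 signature in LCYL/HCYL; a standalone sketch of the load
order the fix establishes (the port writes are modelled as prints, the
register names are the usual IDE ones):

#include <stdio.h>

#define WIN_SMART        0xB0   /* ATA SMART command           */
#define SMART_LCYL_PASS  0x4F   /* fixed SMART signature, low  */
#define SMART_HCYL_PASS  0xC2   /* fixed SMART signature, high */

static void reg_write(const char *reg, unsigned char val)   /* stands in for outb() */
{
        printf("%-7s <- 0x%02x\n", reg, val);
}

/* Feature (SMART subcommand) and Sector Count go out first,
 * then the LCYL/HCYL signature, then the command itself. */
int main(void)
{
        unsigned char subcmd = 0xD0;   /* e.g. SMART READ DATA */

        reg_write("FEATURE", subcmd);
        reg_write("NSECTOR", 1);
        reg_write("LCYL", SMART_LCYL_PASS);
        reg_write("HCYL", SMART_HCYL_PASS);
        reg_write("COMMAND", WIN_SMART);
        return 0;
}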
ide: remove IRQF_DISABLED from IRQ flags for IDE IRQ handler
IRQF_DISABLED is not needed because the first thing that ide_intr()
(the IDE IRQ handler) does is call spin_lock_irqsave(), which disables
local IRQs (IRQ unmasking is later handled by drive->unmask).
kernel/irq/handle.c:

irqreturn_t handle_IRQ_event(unsigned int irq, struct irqaction *action)
        ...
        if (!(action->flags & IRQF_DISABLED))
                local_irq_enable_in_hardirq();

        do {
                ret = action->handler(irq, action->dev_id);
                if (ret == IRQ_HANDLED)
                        status |= action->flags;
                retval |= ret;
                action = action->next;
        } while (action);
        ...
* pmac_ide_init_hwif_ports() can be called by ide_init_hwif_ports()
(through the ppc_ide_md.ide_init_hwif hook) for non-IDE PMAC interfaces.
If this is the case, hw->io_ports[] should already be set up by
ide_init_hwif_ports()->ide_std_init_ports(), so remove the redundant code
from pmac_ide_init_hwif_ports().
As a side effect, this change fixes the ctl_addr == 0 special handling in
ide_init_hwif_ports().
* Fix a misleading comment while at it.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add 'tf_flags' field (for taskfile flags) to ide_task_t.
* Add IDE_TFLAG_LBA48 taskfile flag for LBA48 taskfiles.
* Add IDE_TFLAG_NO_SELECT_MASK taskfile flag for __ide_do_rw_disk()
which doesn't use SELECT_MASK() (looks like a bug but it requires
some more investigation).
* Split off ide_tf_load() helper from do_rw_taskfile().
* Convert __ide_do_rw_disk() to use ide_tf_load().
There should be no functionality changes caused by this patch.
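The core of a flag-driven loader like ide_tf_load() is that each
IDE_TFLAG_OUT_* bit gates one register write; a standalone sketch (the bit
values and the port-write helper below are illustrative, not the kernel's):

#include <stdint.h>
#include <stdio.h>

/* assumed bit layout, illustrative only */
#define OUT_FEATURE  (1u << 0)
#define OUT_NSECT    (1u << 1)
#define OUT_LBAL     (1u << 2)
#define OUT_LBAM     (1u << 3)
#define OUT_LBAH     (1u << 4)
#define OUT_DEVICE   (1u << 5)

struct tf_model {
        unsigned int tf_flags;
        uint8_t feature, nsect, lbal, lbam, lbah, device;
};

static void reg_write(const char *reg, uint8_t val)   /* stands in for outb() */
{
        printf("%-7s <- 0x%02x\n", reg, val);
}

/* only registers whose OUT_* flag is set are touched */
void model_tf_load(const struct tf_model *tf)
{
        if (tf->tf_flags & OUT_FEATURE) reg_write("FEATURE", tf->feature);
        if (tf->tf_flags & OUT_NSECT)   reg_write("NSECT",   tf->nsect);
        if (tf->tf_flags & OUT_LBAL)    reg_write("LBAL",    tf->lbal);
        if (tf->tf_flags & OUT_LBAM)    reg_write("LBAM",    tf->lbam);
        if (tf->tf_flags & OUT_LBAH)    reg_write("LBAH",    tf->lbah);
        if (tf->tf_flags & OUT_DEVICE)  reg_write("DEVICE",  tf->device);
}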
Sergei Shtylyov [Fri, 25 Jan 2008 21:17:05 +0000 (22:17 +0100)]
hpt366: merge set_dma_mode() methods
Group the array of pointers to the timing tables with the timing register
masks, which allows us to merge the HPT36x/HPT37x set_dma_mode() methods into one.
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
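The merge works because each chip family is fully described by the pair
(timing table, timing register mask); a sketch of the resulting single method
(the struct and names below are illustrative, not the driver's):

#include <stdint.h>

/* One entry per chip family: its mode->timing lookup table plus the mask of
 * timing-register bits that the table is allowed to touch. */
struct timing_info {
        const uint32_t *timings;   /* indexed by transfer mode       */
        uint32_t        mask;      /* bits owned by DMA/UDMA timings */
};

/* With the table and mask grouped, one set_dma_mode() body serves all chips:
 * read-modify-write the timing register using the per-chip mask. */
uint32_t model_set_dma_mode(const struct timing_info *info,
                            uint32_t old_reg, unsigned int mode)
{
        return (old_reg & ~info->mask) | (info->timings[mode] & info->mask);
}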
Sergei Shtylyov [Fri, 25 Jan 2008 21:17:04 +0000 (22:17 +0100)]
hpt366: change timing register masks
Since PIO autotuning is now always done, there's no longer any need to also
program the taskfile timings for DMA modes, so change the IDE timing register
masks accordingly, "inverting the polarity" of the masks while at it...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
        /* bail early if we've exceeded max_failures */
        if (drive->max_failures && (drive->failures > drive->max_failures)) {
                goto kill_rq;
        }
        (...)
kill_rq:
        ide_kill_rq(drive, rq);
        return ide_stopped;
ide_kill_rq() and the subsequent calls won't set REQ_FAILED in rq->cmd_flags,
and thus cdrom_queue_packet_command() won't return an error. Then:
        stat = cdrom_queue_packet_command(drive, &req);
        if (stat == 0) {
                *capacity = 1 + be32_to_cpu(capbuf.lba);
                *sectors_per_frame =
                        be32_to_cpu(capbuf.blocklen) >> SECTOR_BITS;
        }
cdrom_read_capacity() ends up believing capbuf is valid when in fact it's
just uninitialized data. Back in cdrom_read_toc():
        /* Try to get the total cdrom capacity and sector size. */
        stat = cdrom_read_capacity(drive, &toc->capacity, &sectors_per_frame,
                                   sense);
        if (stat)
                toc->capacity = 0x1fffff;

        set_capacity(info->disk, toc->capacity * sectors_per_frame);
        /* Save a private copy of the TOC capacity for error handling */
        drive->probed_capacity = toc->capacity * sectors_per_frame;
That will set drive->queue->hardsect_size to a random value. hardsect_size
is used to calculate inode->i_blkbits. Later on, on the read path:
void create_empty_buffers(struct page *page,
                        unsigned long blocksize, unsigned long b_state)
{
        struct buffer_head *bh, *head, *tail;

        head = alloc_page_buffers(page, blocksize, 1);
        bh = head;
        do {
                bh->b_state |= b_state;
                tail = bh;
                bh = bh->b_this_page;
        } while (bh);
        tail->b_this_page = head;
alloc_page_buffers() will return NULL if blocksize > 4096. blocksize is
calculated from inode->i_blkbits, so this triggers a NULL pointer
dereference in create_empty_buffers().
Linus Torvalds [Fri, 25 Jan 2008 20:20:32 +0000 (12:20 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6:
mount options: fix jfs
JFS: simplify types to get rid of sparse warning
JFS: Fix one more plain integer as NULL pointer warning
JFS: Remove defconfig ptr comparison to 0
JFS: use DIV_ROUND_UP where appropriate
Remove unnecessary kmalloc casts in the jfs filesystem
JFS is missing a memory barrier
JFS: Make sure special inode data is written after journal is flushed
JFS: clear PAGECACHE_TAG_DIRTY for no-write pages
Arjan van de Ven [Fri, 25 Jan 2008 20:08:35 +0000 (21:08 +0100)]
sched: keep total / count stats in addition to the max for
Right now, the Linux kernel (with scheduler statistics enabled) keeps track
of the maximum time a process waits to be scheduled. While the maximum
is a very useful metric, tracking the average and the total is equally useful
(at least for latencytop) to figure out the accumulated effect of scheduler
delays. The accumulated effect is important to judge the performance impact
of scheduler tuning/behavior.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
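Keeping the total and count alongside the max is a three-field accumulator; a
standalone model of the bookkeeping (the field names are illustrative, the
real counters live in the scheduler's per-entity statistics):

#include <stdint.h>

/* max alone vs. max + sum + count; the average is sum / count and is what
 * latencytop is after. */
struct wait_stats {
        uint64_t max;    /* worst single scheduling delay           */
        uint64_t sum;    /* total delay, for computing the average  */
        uint64_t count;  /* number of delays that went into the sum */
};

void model_account_wait(struct wait_stats *s, uint64_t delay)
{
        if (delay > s->max)
                s->max = delay;
        s->sum += delay;
        s->count++;
}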
Peter Zijlstra [Fri, 25 Jan 2008 20:08:34 +0000 (21:08 +0100)]
sched: fix: don't take a mutex from interrupt context
print_cfs_stats is callable from interrupt context (sysrq), hence it should
not take mutexes. Change it to use RCU, since the task group data is
RCU-freed anyway.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Dmitry Adamushko [Fri, 25 Jan 2008 20:08:34 +0000 (21:08 +0100)]
sched: fix goto retry in pick_next_task_rt()
looking at it one more time:
(1) it looks to me that there is no need to call
sched_rt_ratio_exceeded() from pick_next_rt_entity()
- [ for CONFIG_FAIR_GROUP_SCHED ] queues with rt_rq->rt_throttled are
not within this 'tree-like hierarchy' (or whatever we should call it
:-)
- there is also no need to re-check 'rt_rq->rt_time > ratio' at this
point as 'rt_rq->rt_time' couldn't have been increased since the last
call to update_curr_rt() (which obviously calls
sched_rt_ratio_exceeded())
well, it might be that 'ratio' for this rt_rq has been re-configured
(and the period over which this rt_rq was active has not yet been
finished)... but I don't think we should really take this into
account.
(2) now pick_next_rt_entity() must never return NULL, so let's change
pick_next_task_rt() accordingly.
Fix commit 2bacec8c318ca0418c0ee9ac662ee44207765dd4
("sched: touch softlockup watchdog after idling"), which reintroduced warps
on frequency changes. touch_softlockup_watchdog() calls __update_rq_clock(),
which checks rq->clock for warps, so call it after adjusting rq->clock.
Arjan van de Ven [Fri, 25 Jan 2008 20:08:33 +0000 (21:08 +0100)]
debug: track and print last unloaded module in the oops trace
Based on a suggestion from Andi:
In various cases, the unload of a module may leave some bad state around
that causes a kernel crash after the module is unloaded, and it's then hard
to find which module caused it.
This patch tracks the last unloaded module, and prints this as part of the
module list in the oops trace.
Right now, only the last unloaded module is tracked; I expect that this is
enough for the vast majority of cases where this information matters. If it
turns out that tracking more is important, we can always extend it.
[ mingo@elte.hu: build fix ]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Arjan van de Ven [Fri, 25 Jan 2008 20:08:33 +0000 (21:08 +0100)]
debug: show being-loaded/being-unloaded indicator for modules
It's rather common that an oops/WARN_ON/BUG happens during the load or
unload of a module. Unfortunately, it's not always easy to see directly
which module is being loaded/unloaded from the oops itself. Worse,
it's not even always possible to ask the bug reporter, since there
are so many components (udev etc) that auto-load modules that there's
a good chance that even the reporter doesn't know which module this is.
This patch extends the existing "show if it's tainting" print code,
which is used as part of printing the modules in the oops/BUG/WARN_ON
to include a "+" for "being loaded" and a "-" for "being unloaded".
As a result of this extension, the "taint_flags()" function gets renamed to
"module_flags()" (and takes a module struct as argument, not a taint
flags int).
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Fri, 25 Jan 2008 20:08:32 +0000 (21:08 +0100)]
sched: rt-watchdog: fix .rlim_max = RLIM_INFINITY
Remove the curious logic to set it_sched_expires in the future. It is useless
because rt.timeout wouldn't be incremented anyway.
Explicitly check for RLIM_INFINITY, as a test program that had a 1s soft limit
and an infinite hard limit would get SIGKILLed at 1s. This is because
RLIM_INFINITY + d - 1 is d - 2.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
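The wraparound alluded to above ("RLIM_INFINITY + d - 1 is d - 2") is ordinary
unsigned overflow, assuming the usual all-ones definition of RLIM_INFINITY:

#include <stdio.h>

int main(void)
{
        unsigned long inf = ~0UL;      /* RLIM_INFINITY is commonly all-ones */
        unsigned long d   = 1000;      /* any finite value                   */

        /* wraps around: (2^N - 1) + d - 1  ==  d - 2 (mod 2^N) */
        printf("%lu\n", inf + d - 1);  /* prints 998, i.e. d - 2 */
        return 0;
}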
Peter Zijlstra [Fri, 25 Jan 2008 20:08:30 +0000 (21:08 +0100)]
sched: rt group scheduling
Extend group scheduling to also cover the realtime classes. It uses the time
limiting introduced by the previous patch to allow multiple realtime groups.
The hard time limit is required to keep behaviour deterministic.
The algorithms used make the realtime scheduler O(tg), i.e. scaling linearly
with the number of task groups. This is the worst-case behaviour I can't seem
to get out of; the average case of the algorithms can be improved, but I
focused on correctness and the worst case.
[ akpm@linux-foundation.org: move side-effects out of BUG_ON(). ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Fri, 25 Jan 2008 20:08:29 +0000 (21:08 +0100)]
sched: rt time limit
A very simple time limit on the realtime scheduling classes:
allow the rq's realtime class to consume sched_rt_ratio of every
sched_rt_period slice. If the class exceeds this quota, the fair class
will preempt the realtime class.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
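The quota check this describes reduces to comparing accumulated realtime
runtime against ratio times period; a toy model of that comparison (the units
and the percentage representation of sched_rt_ratio are assumptions):

#include <stdint.h>

/* Toy model: has the realtime class used more than its share of the
 * current sched_rt_period?  The ratio is modelled as a percentage. */
int model_rt_throttled(uint64_t rt_time_ns, uint64_t rt_period_ns,
                       unsigned int rt_ratio_pct)
{
        return rt_time_ns > rt_period_ns * rt_ratio_pct / 100;
}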
Peter Zijlstra [Fri, 25 Jan 2008 20:08:29 +0000 (21:08 +0100)]
sched: high-res preemption tick
Use HR-timers (when available) to deliver an accurate preemption tick.
The regular scheduler tick that runs at 1/HZ can be too coarse when nice
levels are used. The fairness system will still keep CPU utilisation 'fair'
by delaying the task that got an excessive amount of CPU time, but tries to
minimize this by delivering preemption points spot-on.
The average frequency of this extra interrupt is sched_latency / nr_latency.
This need not be higher than 1/HZ; it's just that the distribution within
the sched_latency period is important.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Fri, 25 Jan 2008 20:08:27 +0000 (21:08 +0100)]
sched: SCHED_FIFO/SCHED_RR watchdog timer
Introduce a new rlimit that allows the user to set a runtime timeout on a
real-time task's timeslice. Once this limit is exceeded, the task will
receive SIGXCPU.
The limit measures runtime since the last sleep.
Input and ideas by Thomas Gleixner and Lennart Poettering.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Lennart Poettering <mzxreary@0pointer.de> CC: Michael Kerrisk <mtk.manpages@googlemail.com> CC: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
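From userspace the new limit is used like any other rlimit; a minimal example
(assuming the limit is exposed as RLIMIT_RTTIME with a microsecond unit, which
is how it eventually appeared in the headers), in which a SCHED_FIFO task asks
for SIGXCPU after roughly half a second of runtime without sleeping:

#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

static void on_sigxcpu(int sig)
{
        (void)sig;
        static const char msg[] = "SIGXCPU: RT runtime limit exceeded\n";
        write(STDERR_FILENO, msg, sizeof(msg) - 1);
}

int main(void)
{
        struct sched_param sp = { .sched_priority = 1 };
        struct rlimit rl = {
                .rlim_cur = 500000,          /* soft limit: 500 ms -> SIGXCPU */
                .rlim_max = RLIM_INFINITY,   /* hard limit: never SIGKILL     */
        };

        signal(SIGXCPU, on_sigxcpu);

        if (sched_setscheduler(0, SCHED_FIFO, &sp) < 0 ||   /* needs privileges */
            setrlimit(RLIMIT_RTTIME, &rl) < 0) {
                perror("setup");
                return 1;
        }

        for (;;)
                ;   /* busy-loop without sleeping until the watchdog fires */
}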