Linus Torvalds [Mon, 15 Oct 2007 20:41:39 +0000 (13:41 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (40 commits)
Input: use full RCU API
Input: remove tsdev interface
Input: add support for Blackfin BF54x Keypad controller
Input: appletouch - another fix for idle reset logic
HWMON: hdaps - switch to using input-polldev
Input: add support for SEGA Dreamcast keyboard
Input: omap-keyboard - don't pretend we support changing keymap
Input: lifebook - fix X and Y axis range
Input: usbtouchscreen - add support for GeneralTouch devices
Input: fix open count handling in input interfaces
Input: keyboard - add CapsShift lock
Input: adbhid - produce all CapsLock key events
Input: ALPS - add signature for ThinkPad R61
Input: jornada720_kbd - send MSC_SCAN events
Input: add support for the HP Jornada 7xx (710/720/728) touchscreen
Input: add support for HP Jornada 7xx onboard keyboard
Input: add support for HP Jornada onboard keyboard (HP6XX)
Input: ucb1400_ts - use schedule_timeout_uninterruptible
Input: xpad - fix dependancy on LEDS class
Input: auto-select INPUT for MAC_EMUMOUSEBTN option
...
Resolved conflicts manually in drivers/hwmon/applesmc.c: converting from
a class device to a device and converting to use input-polldev created a
few apparently trivial clashes..
Linus Torvalds [Mon, 15 Oct 2007 20:31:14 +0000 (13:31 -0700)]
Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev:
[libata] pata_pcmcia: Add additional id string (corsair, 1GB)
libata: prevent devices with blank model names from being DMA blacklisted
ata_piix: SATA 2port controller port map fix
pata_cs5536: ATA driver for Geode companion chip
libata: add ST9160821AS / 3.CCD to NCQ blacklist
libata: fix revalidation issuing after configuration commands
[libata] sata_nv: add SW NCQ support for MCP51/MCP55/MCP61
[libata] pata_sil680: Add MMIO support
Linus Torvalds [Mon, 15 Oct 2007 20:30:35 +0000 (13:30 -0700)]
Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (35 commits)
xen-netfront: rearrange netfront structure to separate tx and rx
netdev: convert non-obvious instances to use ARRAY_SIZE()
ucc_geth: Fix build break introduced by commit 09f75cd7bf13720738e6a196cc0107ce9a5bd5a0
gianfar: Fix regression caused by new napi interface
gianfar: Cleanup compile warning caused by 0795af57
gianfar: Fix compile regression caused by bea3348e
add new prom.h for AU1x00
update AU1000 get_ethernet_addr()
MIPSsim: General cleanup
Jazzsonic: Fix warning about unused variable.
Remove msic_dcr_read() in axon_msi.c
Use dcr_host_t.base in dcr_unmap()
Add dcr_host_t.base in dcr_read()/dcr_write()
Use dcr_host_t.base in ibm_emac_mal
Update ibm_newemac to use dcr_host_t.base
tehuti: possible leak in bdx_probe
TC35815: Fix build
SAA9730: Fix build
AR7 ethernet
myri10ge: update driver version to 1.3.2-1.287
...
xen-netfront: rearrange netfront structure to separate tx and rx
Keep tx and rx elements separate on different cachelines to prevent
bouncing.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Acked-by: Jeff Garzik <jgarzik@pobox.com> Cc: Stephen Hemminger <shemminger@linux-foundation.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Atari keyboard: incorporate additional review comments:
o Kill reference to source file name
o Return error value from input_register_device() instead of -ENOMEM
Linus Torvalds [Mon, 15 Oct 2007 19:55:20 +0000 (12:55 -0700)]
Reinstate lost flush_ioremap_region() fix to pxa2xx-flash driver
Commit 90833fdab89da02fc0276224167f0a42e5176f41 ("[ARM] 4554/1: replace
consistent_sync() with flush_ioremap_region()") introduced a new
"flush_ioremap_region()" function to be used by the MTD mainstone-flash
and lubbock-flash drivers to fix a regression from around 2.6.18.
Those drivers were independently merged into a single driver by Todd
Poynor in commit e644f7d6289456657996df4192de76c5d0a9f9c7 ("[MTD] MAPS:
Merge Lubbock and Mainstone drivers into common PXA2xx driver")
Later, those two commits were merged into the main MTD tree by commit b160292cc216a50fd0cd386b0bda2cd48352c73b ("Merge Linux 2.6.23") by David
Woodhouse, but in that merge, the fix to use flush_iomap_region() got
lost (as it was to files that now no longer existed).
This reinstates the fix in the new driver.
Noticed-by: Russell King <rmk@arm.linux.org.uk> Tested-and-acked-by: Nicolas Pitre <nico@cam.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Jared Hulbert <jaredeh@gmail.com> Cc: Todd Poynor <tpoynor@mvista.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 15 Oct 2007 19:46:16 +0000 (12:46 -0700)]
scsi/gdth: fix crash in gdth_timeout if no gdth controllers found
If the gdth module is loaded (or compiled in), the gdth_timeout function
gets started even if no actual gdth controllers are found b the probing.
That ends up not only being unnecessary, but also causes a crash due to
the function blindly just trying to pick the first entry off the
"gdth_instances" list, and accessing it - which obviously doesn't work
if the list is empty!
Andrew Paprocki [Mon, 15 Oct 2007 19:43:12 +0000 (15:43 -0400)]
libata: prevent devices with blank model names from being DMA blacklisted
The strn_pattern_cmp routine does not handle a blank name parameter
properly. The only patterns which should match a blank name are "*"
and an explicit "". If the function is passed a blank name in current
code, it will always match against the patt parameter. The bug manifests
itself as the device with the empty model name always matching the first
device in the DMA blacklist, forcing it to revert to PIO mode.
Signed-off-by: Andrew Paprocki <andrew@ishiboo.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This is a driver for the ATA controller on the Geode CS5536 companion
chip. The PCI device ID for this device was previously claimed by
pata_amd.c but the PIO timings were not correct. This driver also
works around a bug in some BIOSes that handle unaligned access to the
PCI config registers poorly. Finally, the driver allows fallback to
using MSR registers for configuration on BIOSes that are truly
broken.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Wed, 10 Oct 2007 06:57:44 +0000 (15:57 +0900)]
libata: fix revalidation issuing after configuration commands
After commands which can change device configuration, EH is scheduled
to revalidate and reconfigure the device. Host link was incorrectly
used unconditionally when scheduling EH action. This resulted in
bogus revalidation request and mismatched configuration between device
and driver. Fix it.
This bug was reported by Igor Durdanovic.
Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Igor Durdanovic <idurdanovic@comcast.net> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Kuan Luo [Mon, 15 Oct 2007 19:16:53 +0000 (15:16 -0400)]
[libata] sata_nv: add SW NCQ support for MCP51/MCP55/MCP61
Add the Software NCQ support to sata_nv.c for MCP51/MCP55/MCP61 SATA
controller. NCQ function is disable by default, you can enable it
with 'swncq=1'. NCQ will be turned off if the drive is Maxtor on
MCP51 or MCP55 rev 0xa2 platform.
[akpm@linux-foundation.org: build fix] Signed-off-by: Kuan Luo <kluo@nvidia.com> Signed-off-by: Peer Chen <pchen@nvidia.com> Cc: Zoltan Boszormenyi <zboszor@dunaweb.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This patch adds MMIO support to the pata_sil680 for taskfile IOs,
based on what the old siimage does.
I haven't bothered changing the chip setup stuff from PCI config
cycles to MMIO though (siimage does it), I don't think it matters,
I've only adapted it to use MMIO for taskfile accesses.
I've tested it on a Cell blade and it seems to work fine.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
drivers/net/ucc_geth.c: In function 'ucc_geth_rx':
drivers/net/ucc_geth.c:3483: error: 'dev' undeclared (first use in this function)
drivers/net/ucc_geth.c:3483: error: (Each undeclared identifier is reported only once
drivers/net/ucc_geth.c:3483: error: for each function it appears in.)
make[2]: *** [drivers/net/ucc_geth.o] Error 1
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Michael Ellerman [Mon, 15 Oct 2007 09:34:37 +0000 (19:34 +1000)]
Use dcr_host_t.base in dcr_unmap()
With the base stored in dcr_host_t, there's no need for callers to pass
the dcr_n into dcr_unmap(). In fact this removes the possibility of them
passing the incorrect value, which would then be iounmap()'ed.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Michael Ellerman [Mon, 15 Oct 2007 09:34:36 +0000 (19:34 +1000)]
Add dcr_host_t.base in dcr_read()/dcr_write()
Now that all users of dcr_read()/dcr_write() add the dcr_host_t.base, we
can save them the trouble and do it in dcr_read()/dcr_write().
As some background to why we just went through all this jiggery-pokery,
benh sayeth:
Initially the goal of the dcr_read/dcr_write routines was to operate like
mfdcr/mtdcr which take absolute DCR numbers. The reason is that on 4xx
hardware, indirect DCR access is a pain (goes through a table of
instructions) and it's useful to have the compiler resolve an absolute DCR
inline.
We decided that wasn't worth the API bastardisation since most places
where absolute DCR values are used are low level 4xx-only code which may
as well continue using mfdcr/mtdcr, while the new API is designed for
device "instances" that can exist on 4xx and Axon type platforms and may
be located at variable DCR offsets.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Michael Ellerman [Mon, 15 Oct 2007 09:34:35 +0000 (19:34 +1000)]
Use dcr_host_t.base in ibm_emac_mal
This requires us to do a sort-of fake dcr_map(), so that base is set
properly. This will be fixed/removed when the device-tree-aware emac driver
is merged.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Jeff Garzik <jeff@garzik.org>
CC drivers/net/tc35815.o
drivers/net/tc35815.c: In function 'tc35815_interrupt':
drivers/net/tc35815.c:1464: error: redefinition of 'lp'
drivers/net/tc35815.c:1443: error: previous definition of 'lp' was here
Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Jay Vosburgh [Wed, 10 Oct 2007 02:57:24 +0000 (19:57 -0700)]
net/bonding: Optionally allow ethernet slaves to keep own MAC
Update the "don't change MAC of slaves" functionality added in
previous changes to be a generic option, rather than something tied to
IB devices, as it's occasionally useful for regular ethernet devices as
well.
Adds "fail_over_mac" option (which is automatically enabled for IB
slaves), applicable only to active-backup mode.
Includes documentation update.
Updates bonding driver version to 3.2.0.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Moni Shoua [Wed, 10 Oct 2007 02:43:43 +0000 (19:43 -0700)]
net/bonding: Destroy bonding master when last slave is gone
When bonding enslaves non Ethernet devices it takes pointers to functions
in the module that owns the slaves. In this case it becomes unsafe
to keep the bonding master registered after last slave was unenslaved
because we don't know if the pointers are still valid. Destroying the bond when slave_cnt is zero
ensures that these functions be used anymore.
Signed-off-by: Moni Shoua <monis at voltaire.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Moni Shoua [Wed, 10 Oct 2007 02:43:42 +0000 (19:43 -0700)]
net/bonding: Delay sending of gratuitous ARP to avoid failure
Delay sending a gratuitous_arp when LINK_STATE_LINKWATCH_PENDING bit
in dev->state field is on. This improves the chances for the arp packet to
be transmitted.
Signed-off-by: Moni Shoua <monis at voltaire.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Moni Shoua [Wed, 10 Oct 2007 02:43:41 +0000 (19:43 -0700)]
net/bonding: Handlle wrong assumptions that slave is always an Ethernet device
bonding sometimes uses Ethernet constants (such as MTU and address length) which
are not good when it enslaves non Ethernet devices (such as InfiniBand).
Signed-off-by: Moni Shoua <monis at voltaire.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Moni Shoua [Wed, 10 Oct 2007 02:43:40 +0000 (19:43 -0700)]
net/bonding: Enable IP multicast for bonding IPoIB devices
Allow to enslave devices when the bonding device is not up. Over the discussion
held at the previous post this seemed to be the most clean way to go, where it
is not expected to cause instabilities.
Normally, the bonding driver is UP before any enslavement takes place.
Once a netdevice is UP, the network stack acts to have it join some multicast groups
(eg the all-hosts 224.0.0.1). Now, since ether_setup() have set the bonding device
type to be ARPHRD_ETHER and address len to be ETHER_ALEN, the net core code
computes a wrong multicast link address. This is b/c ip_eth_mc_map() is called
where for multicast joins taking place after the enslavement another ip_xxx_mc_map()
is called (eg ip_ib_mc_map() when the bond type is ARPHRD_INFINIBAND)
Signed-off-by: Moni Shoua <monis at voltaire.com> Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Moni Shoua [Wed, 10 Oct 2007 02:43:39 +0000 (19:43 -0700)]
net/bonding: Enable bonding to enslave netdevices not supporting set_mac_address()
This patch allows for enslaving netdevices which do not support
the set_mac_address() function. In that case the bond mac address is the one
of the active slave, where remote peers are notified on the mac address
(neighbour) change by Gratuitous ARP sent by bonding when fail-over occurs
(this is already done by the bonding code).
Signed-off-by: Moni Shoua <monis at voltaire.com> Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Moni Shoua [Wed, 10 Oct 2007 02:43:38 +0000 (19:43 -0700)]
net/bonding: Enable bonding to enslave non ARPHRD_ETHER
This patch changes some of the bond netdevice attributes and functions
to be that of the active slave for the case of the enslaved device not being
of ARPHRD_ETHER type. Basically it overrides those setting done by ether_setup(),
which are netdevice **type** dependent and hence might be not appropriate for
devices of other types. It also enforces mutual exclusion on bonding slaves
from dissimilar ether types, as was concluded over the v1 discussion.
IPoIB (see Documentation/infiniband/ipoib.txt) MAC address is made of a 3 bytes
IB QP (Queue Pair) number and 16 bytes IB port GID (Global ID) of the port this
IPoIB device is bounded to. The QP is a resource created by the IB HW and the
GID is an identifier burned into the HCA (i have omitted here some details which
are not important for the bonding RFC).
Signed-off-by: Moni Shoua <monis at voltaire.com> Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Moni Shoua [Wed, 10 Oct 2007 02:43:37 +0000 (19:43 -0700)]
IB/ipoib: Verify address handle validity on send
When the bonding device senses a carrier loss of its active slave it replaces
that slave with a new one. In between the times when the carrier of an IPoIB
device goes down and ipoib_neigh is destroyed, it is possible that the
bonding driver will send a packet on a new slave that uses an old ipoib_neigh.
This patch detects and prevents this from happenning.
Signed-off-by: Moni Shoua <monis at voltaire.com> Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com> Acked-by: Roland Dreier <rdreier@cisco.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Moni Shoua [Wed, 10 Oct 2007 02:43:36 +0000 (19:43 -0700)]
IB/ipoib: Bound the net device to the ipoib_neigh structue
IPoIB uses a two layer neighboring scheme, such that for each struct neighbour
whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour)
call.
When using the bonding driver, neighbours are created by the net stack on behalf
of the bonding (master) device. On the tx flow the bonding code gets an skb such
that skb->dev points to the master device, it changes this skb to point on the
slave device and calls the slave hard_start_xmit function.
Under this scheme, ipoib_neigh_destructor assumption that for each struct
neighbour it gets, n->dev is an ipoib device and hence netdev_priv(n->dev)
can be casted to struct ipoib_dev_priv is buggy.
To fix it, this patch adds a dev field to struct ipoib_neigh which is used
instead of the struct neighbour dev one, when n->dev->flags has the
IFF_MASTER bit set.
Signed-off-by: Moni Shoua <monis at voltaire.com> Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com> Acked-by: Roland Dreier <rdreier@cisco.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Mark Brown [Wed, 10 Oct 2007 10:05:44 +0000 (11:05 +0100)]
natsemi: Use round_jiffies() for slow timers
Unless we have failed to fill the RX ring the timer used by the natsemi
driver is not particularly urgent and can use round_jiffies() to allow
grouping with other timers.
Signed-off-by: Mark Brown <broonie@sirena.org.uk> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Linus Torvalds [Mon, 15 Oct 2007 17:46:05 +0000 (10:46 -0700)]
Merge git://git.linux-nfs.org/pub/linux/nfs-2.6
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (131 commits)
NFSv4: Fix a typo in nfs_inode_reclaim_delegation
NFS: Add a boot parameter to disable 64 bit inode numbers
NFS: nfs_refresh_inode should clear cache_validity flags on success
NFS: Fix a connectathon regression in NFSv3 and NFSv4
NFS: Use nfs_refresh_inode() in ops that aren't expected to change the inode
SUNRPC: Don't call xprt_release in call refresh
SUNRPC: Don't call xprt_release() if call_allocate fails
SUNRPC: Fix buggy UDP transmission
[23/37] Clean up duplicate includes in
[2.6 patch] net/sunrpc/rpcb_clnt.c: make struct rpcb_program static
SUNRPC: Use correct type in buffer length calculations
SUNRPC: Fix default hostname created in rpc_create()
nfs: add server port to rpc_pipe info file
NFS: Get rid of some obsolete macros
NFS: Simplify filehandle revalidation
NFS: Ensure that nfs_link() returns a hashed dentry
NFS: Be strict about dentry revalidation when doing exclusive create
NFS: Don't zap the readdir caches upon error
NFS: Remove the redundant nfs_reval_fsid()
NFSv3: Always use directory post-op attributes in nfs3_proc_lookup
...
Fix up trivial conflict due to sock_owned_by_user() cleanup manually in
net/sunrpc/xprtsock.c
Linus Torvalds [Mon, 15 Oct 2007 15:18:44 +0000 (08:18 -0700)]
Merge branch 'agp-patches' of master.kernel.org:/pub/scm/linux/kernel/git/airlied/agp-2.6
* 'agp-patches' of master.kernel.org:/pub/scm/linux/kernel/git/airlied/agp-2.6:
fix use after free in amd create gatt pages
AGP fix race condition between unmapping and freeing pages
Linus Torvalds [Mon, 15 Oct 2007 15:17:26 +0000 (08:17 -0700)]
Merge branch 'drm-patches' of ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-patches' of ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
via invalid device ids removal
radeon: Commit the ring after each partial texture upload blit.
i915: fix vbl swap allocation size.
drm: Replace DRM_IOCTL_ARGS with (dev, data, file_priv) and remove DRM_DEVICE.
drm: remove XFREE86_VERSION macros.
drm: Replace filp in ioctl arguments with drm_file *file_priv.
drm: Remove DRM_ERR OS macro.
Linus Torvalds [Mon, 15 Oct 2007 15:16:53 +0000 (08:16 -0700)]
Merge branch 'nfs-server-stable' of git://linux-nfs.org/~bfields/linux
* 'nfs-server-stable' of git://linux-nfs.org/~bfields/linux:
knfsd: query filesystem for NFSv4 getattr of FATTR4_MAXNAME
knfsd: nfsv4 delegation recall should take reference on client
knfsd: don't shutdown callbacks until nfsv4 client is freed
knfsd: let nfsd manage timing out its own leases
knfsd: Add source address to sunrpc svc errors
knfsd: 64 bit ino support for NFS server
svcgss: move init code into separate function
knfsd: remove code duplication in nfsd4_setclientid()
nfsd warning fix
knfsd: fix callback rpc cred
knfsd: move nfsv4 slab creation/destruction to module init/exit
knfsd: spawn kernel thread to probe callback channel
knfsd: nfs4 name->id mapping not correctly parsing negative downcall
knfsd: demote some printk()s to dprintk()s
knfsd: cleanup of nfsd4 cmp_* functions
knfsd: delete code made redundant by map_new_errors
nfsd: fix horrible indentation in nfsd_setattr
nfsd: remove unused cache_for_each macro
nfsd: tone down inaccurate dprintk
Jiri Kosina [Mon, 15 Oct 2007 13:17:41 +0000 (15:17 +0200)]
HID: fix HIDIOCGRDESC memory access in hidraw
Fix bogus copying of data into userspace when HIDIOCGRDESC is issued.
HID-transport layer makes sure that dev->hid->rdesc is not larger than
HID_MAX_DESCRIPTOR_SIZE.
Ingo Molnar [Mon, 15 Oct 2007 15:00:20 +0000 (17:00 +0200)]
sched: sync wakeups preempt too
make sure sync wakeups preempt too - the scheduler will not
overschedule as we've got various throttles against that.
As a result, sync wakeups can be used more widely in the kernel
(to signal wakeup affinity between tasks), and no arbitrary
latencies will be introduced either.
Ingo Molnar [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: affine sync wakeups
make sync wakeups affine for cache-cold tasks: if a cache-cold task
is woken up by a sync wakeup then use the opportunity to migrate it
straight away. (the two tasks are 'related' because they communicate)
Laurent Vivier [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: guest CPU accounting: maintain stats in account_system_time()
modify account_system_time() to add cputime to cpustat->guest if we are
running a VCPU. We add this cputime to cpustat->user instead of
cpustat->system because this part of KVM code is in fact user code
although it is executed in the kernel. We duplicate VCPU time between
guest and user to allow an unmodified "top(1)" to display correct value.
A modified "top(1)" is able to display good cpu user time and cpu guest
time by subtracting cpu guest time from cpu user time. Update "gtime" in
task_struct accordingly.
Laurent Vivier [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: guest CPU accounting: add guest-CPU /proc/<pid>/stat fields
like for cpustat, introduce the "gtime" (guest time of the task) and
"cgtime" (guest time of the task children) fields for the
tasks. Modify signal_struct and task_struct.
Modify /proc/<pid>/stat to display these new fields.
Laurent Vivier [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: guest CPU accounting: add guest-CPU /proc/stat field
as recent CPUs introduce a third running state, after "user" and
"system", we need a new field, "guest", in cpustat to store the time
used by the CPU to run virtual CPU. Modify /proc/stat to display this
new field.
Milton Miller [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: domain sysctl fixes: do not crash on allocation failure
Now that we are calling this at runtime, a more relaxed error path is
suggested. If an allocation fails, we just register the partial table,
which will show empty directories.
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Milton Miller [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: domain sysctl fixes: unregister the sysctl table before domains
Unregister and free the sysctl table before destroying domains, then
rebuild and register after creating the new domains. This prevents the
sysctl table from pointing to freed memory for root to write.
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Milton Miller [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: domain sysctl fixes: use for_each_online_cpu()
init_sched_domain_sysctl was walking cpus 0-n and referencing per_cpu
variables. If the cpus_possible mask is not contigious this will result
in a crash referencing unallocated data. If the online mask is not
contigious then we would show offline cpus and miss online ones.
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Arjan van de Ven [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
Make scheduler debug file operations const
In general, struct file_operations are const in the kernel, to not have
false cacheline sharing and to catch bugs at compiletime with accidental
writes to them. The new scheduler code introduces a new non-const one;
fix this up.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)]
sched: do not normalize kernel threads via SysRq-N
do not normalize kernel threads via SysRq-N: the migration threads,
softlockup threads, etc. might be essential for the system to
function properly. So only zap user tasks.
Andi Kleen [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: cleanup: refactor common code of sleep_on / wait_for_completion
Refactor common code of sleep_on / wait_for_completion
These functions were largely cut'n'pasted. This moves
the common code into single helpers instead. Advantage
is about 1k less code on x86-64 and 91 lines of code removed.
It adds one function call to the non timeout version of
the functions; i don't expect this to be measurable.
Mike Galbraith [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: prevent wakeup over-scheduling
Prevent wakeup over-scheduling. Once a task has been preempted by a
task of the same or lower priority, it becomes ineligible for repeated
preemption by same until it has been ticked, or slept. Instead, the
task is marked for preemption at the next tick. Tasks of higher
priority still preempt immediately.
Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Dmitry Adamushko [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: fix group scheduling for SCHED_BATCH
The following patch (sched: disable sleeper_fairness on SCHED_BATCH)
seems to break GROUP_SCHED. Although, it may be 'oops'-less due to the
possibility of 'p' being always a valid address.
Gautham R Shenoy [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: fix rt ptracer monopolizing CPU
yield() in wait_task_inactive(), can cause a high priority thread to be
scheduled back in, and there by loop forever while it is waiting for some
lower priority thread which is unfortunately still on the runqueue.
Use schedule_timeout_uninterruptible(1) instead.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Credit: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Dhaval Giani [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: group scheduling, sysfs tunables
Add tunables in sysfs to modify a user's cpu share.
A directory is created in sysfs for each new user in the system.
/sys/kernel/uids/<uid>/cpu_share
Reading this file returns the cpu shares granted for the user.
Writing into this file modifies the cpu share for the user. Only an
administrator is allowed to modify a user's cpu share.
Paul E. McKenney [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: export cpu_clock()
export cpu_clock() - the preferred API instead of sched_clock().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>