Brian King [Tue, 22 Jul 2008 13:31:42 +0000 (08:31 -0500)]
[SCSI] ibmvfc: Fix hang on module removal
If certain ELS events are received during module removal, after the kthread
is stopped, the rmmod can hang. This fixes the ibmvfc driver so that ELS
events during rmmod are ignored by stopping all device activity prior to
killing the kthread and also changes reinitialization to not attempt a reinit
if the adapter has been taken offline.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Brian King [Tue, 22 Jul 2008 13:31:39 +0000 (08:31 -0500)]
[SCSI] ibmvfc: Reduce unnecessary log noise
Reduces some unnecessary log noise by removing a printk during
host port state query and increasing the log level required to
log received async events.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Mike Anderson [Mon, 21 Jul 2008 22:58:32 +0000 (15:58 -0700)]
[SCSI] sym53c8xx: free luntbl in sym_hcb_free
This patch frees the luntbl dma area in sym_hcb_free if allocated.
Since the luntbl is part of a larger dma coherent area not freeing the
luntbl kept a 64k dma coherent area previous allocated through
dma_alloc_coherent allocated. This prevented a DLPAR remove IO
operation from completing successfully.
Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com> Cc: Matthew Wilcox <matthew@wil.cx> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Julia Lawall [Mon, 21 Jul 2008 07:58:30 +0000 (09:58 +0200)]
[SCSI] scsi_scan.c: Release mutex in error handling code
The mutex is released on a successful return, so it would seem that it
should be released on an error return as well.
The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
expression l;
@@
mutex_lock(l);
... when != mutex_unlock(l)
when any
when strict
(
if (...) { ... when != mutex_unlock(l)
+ mutex_unlock(l);
return ...;
}
|
mutex_unlock(l);
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Alan Stern [Mon, 21 Jul 2008 14:25:52 +0000 (10:25 -0400)]
[SCSI] scsi_eh_prep_cmnd should save scmd->underflow
This patch (as1116) fixes a bug in scsi_eh_prep_cmnd() and
scsi_eh_restore_cmnd(). These routines are supposed to save any
values they change and restore them later, but someone forgot to
save & restore scmd->underflow.
This fixes part of the problem reported in Bugzilla #9638.
[jejb: fix up rejections around DIF/DIX] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
[SCSI] sd: Identify DIF protection type and application tag ownership
If a disk is formatted with protection information (Inquiry bit
PROTECT=1) it is required to support Read Capacity(16). Force use of
the 16-bit command in this case and extract the P_TYPE field which
indicates whether the disk is formatted using DIF Type 1, 2 or 3.
The ATO (App Tag Own) bit in the Control Mode Page indicates whether
the storage device or the initiator own the contents of the
DIF application tag.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Controllers that support DMA of protection information must be told
explicitly how to handle the I/O. The controller has no knowledge of
the protection capabilities of the target device so this information
must be passed in the scsi_cmnd.
- The protection operation tells the HBA whether to generate, strip or
verify protection information.
- The protection type tells the HBA which layout the target is
formatted with. This is necessary because the controller must be
able to correctly interpret the included protection information in
order to verify it.
- When a scsi_cmnd is reused for error handling the protection
operation must be cleared and saved while error handling is in
progress.
- prot_op and prot_type are placed in an existing hole in scsi_cmnd
and don't cause the structure to grow.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Hannes Reinecke [Thu, 17 Jul 2008 23:53:33 +0000 (16:53 -0700)]
[SCSI] scsi_dh: create lookup cache
Create a cache of devices that are seen in a system. This will avoid
the unnecessary traversal of the device list in the scsi_dh when there
are multiple luns of a same type.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Hannes Reinecke [Fri, 18 Jul 2008 00:49:02 +0000 (17:49 -0700)]
[SCSI] scsi_dh: attach to hardware handler from dm-mpath
multipath keeps a separate device table which may be
more current than the built-in one.
So we should make sure to always call ->attach whenever
a multipath map with hardware handler is instantiated.
And we should call ->detach on removal, too.
[sekharan: update as per comments from agk] Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Hannes Reinecke [Thu, 17 Jul 2008 23:53:03 +0000 (16:53 -0700)]
[SCSI] scsi_dh: Update EMC handler
This patch converts the EMC device handler to use a proper
state machine. We now also parse the extended INQUIRY
information to determine if long trespass commands are
supported. And we're now using the long trespass command
correctly. And finally there's now an check at init time
to refuse to attach to devices not supporting EMC-specific
VPD pages.
Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Hannes Reinecke [Thu, 17 Jul 2008 23:52:57 +0000 (16:52 -0700)]
[SCSI] scsi_dh: Add 'dh_state' sysfs attribute
Implement a 'dh_state' sdev attribute for dynamic device handler
manipulation. A read on the attribute will return the name of
the currently attached device handler or 'detached' if no handler
is attached.
The attribute allows the following strings to be written:
- The name of the device handler to be attached if the state is
'detached'.
- 'activate' to trigger path activation if a device handler
is attached.
- 'detach' to detach the currently attached device handler.
Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Hannes Reinecke [Thu, 17 Jul 2008 23:52:51 +0000 (16:52 -0700)]
[SCSI] scsi_dh: Implement common device table handling
Instead of having each and every driver implement its own
device table scanning code we should rather implement a common
routine and scan the device tables there.
This allows us also to implement a general notifier chain
callback for all device handler instead for one per handler.
[sekharan: Fix rejections caused by conflicting bug fix] Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Matthew Wilcox [Tue, 15 Jul 2008 20:54:16 +0000 (14:54 -0600)]
[SCSI] Make host_no an unsigned int
Daniel Debonzi reports that he has managed to wrap host_no. Increasing
the number of host numbers available to 32-bit from 16-bit allows the
problem to be evaded for another hundred years.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Mike Christie [Sat, 12 Jul 2008 00:50:35 +0000 (19:50 -0500)]
[SCSI] fix shared tag map tag allocation
When drivers use a shared tag map we can end up with more requests
than tags, because the tag map is shost->can_queue tags and there
can be sdevs * sdev->queue_depth requests. In scsi_request_fn
if tag allocation fails we just drop down to just dequeueing the
tag without a tag. The problem is that drivers using the shared tag
map rely on a valid tag always being set, because it will use the
tag number to lookup commands later.
This patch has us check if we got a valid tag when the host lock
is held right before we check if the host queue is ready. We do the
check here because to allocate the tag we need the q lock, but
if the tag is bad we want to add the device/q onto the starved list
which requires the host lock.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Mike Christie [Sat, 12 Jul 2008 00:50:34 +0000 (19:50 -0500)]
[SCSI] stex: fix queue depth setting
We want to set the queue depth to something reasonable - not
the can_queue.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Cc: Ed Lin <ed.lin@promise.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Mike Christie [Sat, 12 Jul 2008 00:50:33 +0000 (19:50 -0500)]
[SCSI] qla4xxx: fix queue depth setting
We want to set the queue depth to something reasonable - not
the can_queue.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Cc: David Somayajulu <david.somayajulu@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Mike Christie [Sat, 12 Jul 2008 00:50:32 +0000 (19:50 -0500)]
[SCSI] fix shared tag map setup
Currently qla4xxx and stex pass in their can_queue values into
scsi_activate_tcq because they wanted the tag map that large.
The problem with this is that it ends up also setting the queue
depth to that large value. All we want to do this in this case
is set the device queue depth and the other device settings.
We do not need to touch the tag map sizing because the drivers
had setup that map according to their can_queue limits when the
shared map was created.
The scsi mid layer in request_fn will then handle the case where we
have more requests than available tags when it checks the host
queue ready function.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Kai Makisara [Fri, 11 Jul 2008 12:06:40 +0000 (15:06 +0300)]
[SCSI] st: Remove bogus memset
Mike Christie noticed a bogus memset. It can be removed as dead code
since the number of bytes in the driver buffer in fixed block mode is
always a multiple of the tape block size.
Signed-off-by: Kai Mäkisara <Kai.Makisara@kolumbus.fi> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Andrew Vasquez [Thu, 10 Jul 2008 23:56:01 +0000 (16:56 -0700)]
[SCSI] qla2xxx: Verify the RISC is not in ROM code if firmware-load is disabled.
Add an additional check to verify that the current executing
firmware is in fact non-ROM code. The non-ROM Get-ID mailbox
command is used for verification.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Seokmann Ju [Thu, 10 Jul 2008 23:55:58 +0000 (16:55 -0700)]
[SCSI] qla2xxx: Correct rport/fcport visibility-state handling during loop-resync.
There were several issues here, one, during RSCN handling if a
follow-on RSCN occurred (within interrupt context) the DPC thread
could inadvertantly leave the fcport in a stale lost state.
Secondly, scheduled rport removal is handled exclusively by the
'parent' DPC thread, so wake up the proper thread. Finally,
process vport loop-resync's only when the vport has in an
"active" state (ID acquired).
Signed-off-by: Seokmann Ju <seokmann.ju@qlogic.com> Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Seokmann Ju [Thu, 10 Jul 2008 23:55:57 +0000 (16:55 -0700)]
[SCSI] qla2xxx: Correct vport management of MBA_PORT_UPDATE.
By allowing the qla2x00_alert_all_vps() to manage per-vport
recognition of the MBA.
Signed-off-by: Seokmann Ju <seokmann.ju@qlogic.com> Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Seokmann Ju [Thu, 10 Jul 2008 23:55:56 +0000 (16:55 -0700)]
[SCSI] qla2xxx: Correct fcport state-management during loss.
All fcport->state management should be done within
qla2x00_mark_device_lost(), the assignment of state within
qla2x00_mark_vp_devices_dead() caused associated rports to not be
removed.
Signed-off-by: Seokmann Ju <seokmann.ju@qlogic.com> Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Seokmann Ju [Thu, 10 Jul 2008 23:55:55 +0000 (16:55 -0700)]
[SCSI] qla2xxx: Always aquire the parent's hardware_lock.
While issuing a marker, manipulating the request/response queues
and modifying the outstanding command array.
Signed-off-by: Seokmann Ju <seokmann.ju@qlogic.com> Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Andrew Vasquez [Thu, 10 Jul 2008 23:55:54 +0000 (16:55 -0700)]
[SCSI] qla2xxx: Swap enablement order of EFT and FCE.
The firmware group has suggested that FCE (Fibre Channel Event)
tracing be enabled prior to EFT (Extended Firmware Tracing) to
maximize the capturing of data on the wire. This change has no
real semantic effect on driver operation, as it's mostly a
shuffling of code.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Joe Carnuccio [Thu, 10 Jul 2008 23:55:53 +0000 (16:55 -0700)]
[SCSI] qla2xxx: Retrieve board serial-number and description from VPD.
Recent ISPs have this information written at manufacturing time,
so use the information. This also reduces future churn of the
qla_devtbl.h file contents, as the driver can now depend on the
information to be present in VPD.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Andrew Vasquez [Thu, 10 Jul 2008 23:55:52 +0000 (16:55 -0700)]
[SCSI] qla2xxx: Allow the user the option of disabling iIDMA.
iIDMA support requires the driver issue several additional
fabric-managegment (FM) commands per port discovered during SNS
scanning -- GFPN (Get Fabric Port Name) and GPSC (Get Port Speed
Capabilities). It has been found during testing that some
switches do not respond as *well* as expected to these commands
(silence -- no ACC nor BS_RJT). So, to handle such conditions,
allow the user the ability to indirectly disable the FM commands
by disabling iIDMA with the ql2xiidmaenable module-parameter.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Seokmann Ju [Thu, 10 Jul 2008 23:55:51 +0000 (16:55 -0700)]
[SCSI] qla2xxx: Cleanup NPIV related functions
Removed repeated or unnecessary operations during vport
creation/deletion.
Signed-off-by: Shyam Sundar <shyam.sundar@qlogic.com> Signed-off-by: Seokmann Ju <seokmann.ju@qlogic.com> Signed-off-by: Ravi Anand <ravi.anand@qlogic.com> Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Andrew Vasquez [Thu, 10 Jul 2008 23:55:48 +0000 (16:55 -0700)]
[SCSI] qla2xxx: Set an rport's dev_loss_tmo value in a consistent manner.
As there's no point in adding a fixed-fudge value (originally 5
seconds), honor the user settings only. We also remove the
driver's dead-callback get_rport_dev_loss_tmo function
(qla2x00_get_rport_loss_tmo()).
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Seokmann Ju <seokmann.ju@qlogic.com> Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Roland McGrath [Sat, 26 Jul 2008 03:00:10 +0000 (20:00 -0700)]
x86_64: fix ia32 AMD syscall audit fast-path
The new code in commit 5cbf1565f29eb57a86a305b08836613508e294d7
has a bug in the version supporting the AMD 'syscall' instruction.
It clobbers the user's %ecx register value (with the %ebp value).
Adrian Bunk [Fri, 25 Jul 2008 23:38:00 +0000 (02:38 +0300)]
MFD_TC6393XB is ARM-only
Compile error on other architectures:
CC drivers/mfd/tc6393xb.o
/home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/mfd/tc6393xb.c: In function ‘tc6393xb_attach_irq’:
/home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/mfd/tc6393xb.c:324: error: implicit declaration of function ‘set_irq_flags’
...
Reported-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
net/socket.c:1515:17: error: symbol 'sys_paccept' redeclared with different type (originally declared at include/linux/syscalls.h:413) - incompatible argument 4 (different address spaces)
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
powerpc: Fix boot problem due to AT_BASE_PLATFORM change
Commit 9115d13453dee22473a1e8cacc90a8d64a9c4bc9 ("powerpc: Enable
AT_BASE_PLATFORM aux vector") broke boot on 32-bit powerpc systems; we
have to use PTRRELOC to initialize powerpc_base_platform this early in
boot.
Bug reported by Jon Smirl.
Signed-off-by: Nathan Lynch <ntl@pobox.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Adrian Bunk [Tue, 1 Jul 2008 16:27:16 +0000 (19:27 +0300)]
remove dummy asm/kvm.h files
This patch removes the dummy asm/kvm.h files on architectures not (yet)
supporting KVM and uses the same conditional headers installation as
already used for a.out.h .
Also removed are superfluous install rules in the s390 and x86 Kbuild
files (they are already in Kbuild.asm).
Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
mm/hugetlb.c: fix build failure with !CONFIG_SYSCTL
on !CONFIG_SYSCTL on x86 with latest -git i get:
mm/hugetlb.c: In function 'decrement_hugepage_resv_vma':
mm/hugetlb.c:83: error: 'reserve' undeclared (first use in this function)
mm/hugetlb.c:83: error: (Each undeclared identifier is reported only once
mm/hugetlb.c:83: error: for each function it appears in.)
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (34 commits)
powerpc: Wireup new syscalls
Move update_mmu_cache() declaration from tlbflush.h to pgtable.h
powerpc/pseries: Remove kmalloc call in handling writes to lparcfg
powerpc/pseries: Update arch vector to indicate support for CMO
ibmvfc: Add support for collaborative memory overcommit
ibmvscsi: driver enablement for CMO
ibmveth: enable driver for CMO
ibmveth: Automatically enable larger rx buffer pools for larger mtu
powerpc/pseries: Verify CMO memory entitlement updates with virtual I/O
powerpc/pseries: vio bus support for CMO
powerpc/pseries: iommu enablement for CMO
powerpc/pseries: Add CMO paging statistics
powerpc/pseries: Add collaborative memory manager
powerpc/pseries: Utilities to set firmware page state
powerpc/pseries: Enable CMO feature during platform setup
powerpc/pseries: Split retrieval of processor entitlement data into a helper routine
powerpc/pseries: Add memory entitlement capabilities to /proc/ppc64/lparcfg
powerpc/pseries: Split processor entitlement retrieval and gathering to helper routines
powerpc/pseries: Remove extraneous error reporting for hcall failures in lparcfg
powerpc: Fix compile error with binutils 2.15
...
Fixed up conflict in arch/powerpc/platforms/52xx/Kconfig manually.
The new type checking of the flags arguments to irqsave and friends
(commit 3f307891ce0e7b0438c432af1aacd656a092ff45) pointed out this thing
with a big nice warning.
This module harvests more than just memory errors, it also harvests
various bus and dma errors that the Chipset detects. Previously, it would
report all such errors, which would cause output to be TOO loud.
This patches therefore adds a parameter which is used to turn off
NON-MEMORY error reports by default. Or the reporting can be enabled via
the parameter
Also did code style cleanup: less than 80 characters per line rule
Arthur Jones [Fri, 25 Jul 2008 08:49:12 +0000 (01:49 -0700)]
edac: core fix added newline to sysfs dimm labels
The channel DIMM label does not seem to be used much in the edac code.
However, where it is used (in the core code), it is assumed to not have a
newline embedded. This leaves the sysfs file newline free which looks
funny when cat'ing it. Here we just add the trailing newline to the sysfs
chX_dimm_label output...
[Doug Thompson note: the DIMM label is one of the primary uses of EDAC.
User space daemon scripts, edac-utils@sourceforge, populate the DIMM label
fields, via /sys/devices/system/edac attributes, with the silk screen
labels of the motherboard in use. dmidecode access BIOS tables, but BIOS
tables are well known to be incorrect and useless in these respects.
edac-utils will strip off any newlines before its use of the output, when
displaying DIMM slot silk screen labels.
Signed-off-by: Arthur Jones <ajones@riverbed.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arthur Jones [Fri, 25 Jul 2008 08:49:11 +0000 (01:49 -0700)]
edac: core fix static to dynamic kset
Static kobjects and ksets are not supported in Linux kernel. Convert the
mc_kset from static to dynamic. This patch depends on my previous patch
to remove the module parameter attributes from mc...
Signed-off-by: Arthur Jones <ajones@riverbed.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arthur Jones [Fri, 25 Jul 2008 08:49:10 +0000 (01:49 -0700)]
edac: core fix redundant sysfs controls to parameters
/sys/devices/system/edac/mc has a few files which are duplicated in
/sys/module/edac_core/parameters. Now that all the functionality is
duplicated between these two locations, we remove the former kobject
attributes and update the documentation.
Signed-off-by: Arthur Jones <ajones@riverbed.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arthur Jones [Fri, 25 Jul 2008 08:49:09 +0000 (01:49 -0700)]
edac: core fix workq timer
When updating the edac_mc_poll_msec module parameter from the sysfs
/sys/module/edac_core/parameters/edac_mc_poll_msec file, we don't update
the workq timers. So that, if we move from a big poll time to a small
one, the small one won't take effect until the big one has timed out.
Here we provide a new module parameter set method to call out to the
update routine. This brings the /sys/module/edac_core/parameters
functionality up to that provided by the /sys/drivers/system/edac/mc sysfs
module parameter files so that we can remove them or at least link to the
/sys/module files...
Signed-off-by: Arthur Jones <ajones@riverbed.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arthur Jones [Fri, 25 Jul 2008 08:49:08 +0000 (01:49 -0700)]
edac: core fix to use dynamic kobject
Static kobjects are not supported in linux kernel. Convert the
edac_pci_top_main_kobj from static to dynamic. This avoids the double
free of the edac_pci_top_main_kobj.name that we see on module reload of
the e752x edac driver (and probably others as well).
In addition Greg KH <greg@kroah.com> has pointed out that this code may be
cleaned up significantly. I will look at that as a follow-on patch, for
now, I just want the minimum fix to get this double-free oops bug
squashed...
Many thanks to Greg KH for his patience in showing me what the
Documentation/kobject.txt already said (oops)...
Signed-off-by: Arthur Jones <ajones@riverbed.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arthur Jones [Fri, 25 Jul 2008 08:49:08 +0000 (01:49 -0700)]
edac: i5100: cleanup
Some code cleanliness issues found by Andrew Morton (thanks!) which should
not affect functionality, but which should help make the code more
maintainable.
In particular, we now:
* convert all #define's w/ a parameter to static inlines
* use 1UL rather than 1ULL when calculating an unsigned long
* use pci_disable_device
The resulting code is tested and seems to work fine...
Signed-off-by: Arthur Jones <ajones@riverbed.com> Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arthur Jones [Fri, 25 Jul 2008 08:49:05 +0000 (01:49 -0700)]
edac: i5100 fix missing bits
The error mask we use to trigger ECC notifications is missing many bits of
interest. We add these bits here so that all possible ECC errors can be
reported.
Signed-off-by: Arthur Jones <ajones@riverbed.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arthur Jones [Fri, 25 Jul 2008 08:49:04 +0000 (01:49 -0700)]
edac: i5100 new intel chipset driver
Preliminary support for the Intel 5100 MCH. CE and UE errors are reported
along with the current DIMM label information and other memory parameters.
Reasons why this is preliminary:
1) This chip has 2 independent memory controllers which, for best
perforance, use interleaved accesses to the DDR2 memory. This
architecture does not map very well to the current edac data structures
which depend on symmetric channel access to the interleaved data.
Without core changes, the best I could do for now is to map both memory
controllers to different csrows (first all ranks of controller 0, then
all ranks of controller 1). Someone much more familiar with the edac
core than I will probably need to come up with a more general data
structure to handle the interleaving and de-interleaving of the two
memory controllers.
2) I have not yet tackled the de-interleaving of the rank/controller
address space into the physical address space of the CPU. There is
nothing fundamentally missing, it is just ending up to be a lot of
code, and I'd rather keep it separate for now, esp since it doesn't
work yet...
3) The code depends on a particular i5100 chip select to DIMM mainboard
chip select mapping. This mapping seems obvious to me in order to
support dual and single ranked memory, but it is not unique and DIMM
labels could be wrong on other mainboards. There is no way to query
this mapping that I know of.
4) The code requires that the i5100 is in 32GB mode. Only 4 ranks per
controller, 2 ranks per DIMM are supported. I do not have hardware
(nor do I expect to have hardware anytime soon) for the 48GB (6 ranks
per controller) mode.
5) The serial presence detect code should be broken out into a "real"
i2c driver so that decode-dimms.pl can work.
Signed-off-by: Arthur Jones <ajones@riverbed.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement the get_parent export operation by sending a LOOKUP request with
".." as the name.
Implement looking up an inode by node ID after it has been evicted from
the cache. This is done by seding a LOOKUP request with "." as the name
(for all file types, not just directories).
The filesystem can set the FUSE_EXPORT_SUPPORT flag in the INIT reply, to
indicate that it supports these special lookups.
Thanks to John Muir for the original implementation of this feature.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Matthew Wilcox <matthew@wil.cx> Cc: David Teigland <teigland@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a new helper function which sends a LOOKUP request with the supplied
name. This will be used by the next patch to send special LOOKUP requests
with "." and ".." as the name.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement export_operations, to allow fuse filesystems to be exported to
NFS. This feature has been in the out-of-tree fuse module, and is widely
used and tested.
It has not been originally merged into mainline, because doing the NFS
export in userspace was thought to be a cleaner and more efficient way of
doing it, than through the kernel.
While that is true, it would also have involved a lot of duplicated effort
at reimplementing NFS exporting (all the different versions of the
protocol). This effort was unfortunately not undertaken by anyone, so we
are left with doing it the easy but less efficient way.
If this feature goes in, the out-of-tree fuse module can go away,
which would have several advantages:
- not having to maintain two versions
- less confusion for users
- no bugs due to kernel API changes
Comment from hch:
- Use the same fh_type values as XFS, since we use the same fh encoding.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
locks: allow ->lock() to return FILE_LOCK_DEFERRED
Allow filesystem's ->lock() method to call posix_lock_file() instead of
posix_lock_file_wait(), and return FILE_LOCK_DEFERRED. This makes it
possible to implement a such a ->lock() function, that works with the lock
manager, which needs the call to be asynchronous.
Now the vfs_lock_file() helper can be used, so this is a cleanup as well.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Matthew Wilcox <matthew@wil.cx> Cc: David Teigland <teigland@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
locks: add special return value for asynchronous locks
Use a special error value FILE_LOCK_DEFERRED to mean that a locking
operation returned asynchronously. This is returned by
posix_lock_file() for sleeping locks to mean that the lock has been
queued on the block list, and will be woken up when it might become
available and needs to be retried (either fl_lmops->fl_notify() is
called or fl_wait is woken up).
f_op->lock() to mean either the above, or that the filesystem will
call back with fl_lmops->fl_grant() when the result of the locking
operation is known. The filesystem can do this for sleeping as well
as non-sleeping locks.
This is to make sure, that return values of -EAGAIN and -EINPROGRESS by
filesystems are not mistaken to mean an asynchronous locking.
This also makes error handling in fs/locks.c and lockd/svclock.c slightly
cleaner.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Matthew Wilcox <matthew@wil.cx> Cc: David Teigland <teigland@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix nlm_fopen() to return NLM_FAILED (or NLM_LCK_DENIED_NOLOCKS) instead
of NLM_LCK_DENIED. The latter means the lock request failed because of a
conflicting lock (i.e. a temporary error), which is wrong in this case.
Also fix the client to return ENOLCK instead of EAGAIN if a blocking lock
request returns with NLM_LOCK_DENIED.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Matthew Wilcox <matthew@wil.cx> Cc: David Teigland <teigland@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sometimes, application responses become bad under heavy memory load.
Applications take a bit time to reclaim memory. The statistics, how long
memory reclaim takes, will be useful to measure memory usage.
This patch adds accounting memory reclaim to per-task-delay-accounting for
accounting the time of do_try_to_free_pages().
<i.e>
- When System is under low memory load,
memory reclaim may not occur.
$ time tar cvf test.tar test.dat
real 0m13.388s
user 0m0.116s
sys 0m5.304s
$ ./delayget -d -p <pid>
CPU count real total virtual total delay total
428 5528345500547711608062749891
IO count delay total
338 8078977189
SWAP count delay total
0 0
RECLAIM count delay total
0 0
- When system is under heavy memory load
memory reclaim may occur.
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 7159032 49724 1812 3012 0 0 0 0 3 24 0 0 100 0
0 0 7159032 49724 1812 3012 0 0 0 0 4 24 0 0 100 0
0 0 7159032 49848 1812 3012 0 0 0 0 3 22 0 0 100 0
In this case, one process uses more 8G memory
by execution of malloc() and memset().
$ time tar cvf test.tar test.dat
real 1m38.563s <- increased by 85 sec
user 0m0.140s
sys 0m7.060s
David Howells [Fri, 25 Jul 2008 08:48:50 +0000 (01:48 -0700)]
tsacct: fix bacct_add_tsk()'s use of do_div()
Fix bacct_add_tsk()'s use of do_div() on an s64 by making ac_etime a u64
instead and dividing that.
Possibly this should be guarded lest the interval calculation turn up
negative, but the possible negativity of the result of the division is
cast away, and it shouldn't end up negative anyway.
Signed-off-by: David Howells <dhowells@redhat.com> Cc: Jay Lan <jlan@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea Righi [Fri, 25 Jul 2008 08:48:49 +0000 (01:48 -0700)]
task IO accounting: provide distinct tgid/tid I/O statistics
Report per-thread I/O statistics in /proc/pid/task/tid/io and aggregate
parent I/O statistics in /proc/pid/io. This approach follows the same
model used to account per-process and per-thread CPU times.
As a practial application, this allows for example to quickly find the top
I/O consumer when a process spawns many child threads that perform the
actual I/O work, because the aggregated I/O statistics can always be found
in /proc/pid/io.
[ Oleg Nesterov points out that we should check that the task is still
alive before we iterate over the threads, but also says that we can do
that fixup on top of this later. - Linus ]
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Andrea Righi <righi.andrea@gmail.com> Cc: Matt Heaton <matt@hostmonster.com> Cc: Shailabh Nagar <nagar@watson.ibm.com>
Acked-by-with-comments: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Fri, 25 Jul 2008 08:48:47 +0000 (01:48 -0700)]
bsdacct: turn acct off for all pidns-s on umount time
All the bsd_acct_strcts with opened accounting are linked into a global
list. So, the acct_auto_close(_mnt) walks one and drops the accounting
for each.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Balbir Singh <balbir@in.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>