Jonathan Corbet [Sat, 4 Nov 2006 12:22:27 +0000 (09:22 -0300)]
V4L/DVB (4796): A couple of V4L2 defines needed by Cafe Camara driver
Two defines for V4L2, needed by the Cafe camera driver:
1) Add the RGB444 image format
2) Add the "init" internal command which is separate from "reset".
Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Hartmut Hackmann [Mon, 30 Oct 2006 23:00:16 +0000 (20:00 -0300)]
V4L/DVB (4792): Add support for the Compro Videomate DVB-T200A
This board has the same PCI ID as the T200, so the exact board type
is determined from the eeprom.
The original patch was provided by Francis Barber <fedora@barber-family.id.au>
Hartmut Hackmann [Mon, 30 Oct 2006 22:56:59 +0000 (19:56 -0300)]
V4L/DVB (4791): Added autodetected flag to the saa7134_dev structure
In case the exact board type needs to be determined by probing
or evaluating the eeprom, this flag allows to still set the
board type via the card=xx insmod option.
This is an extract of a patch by Francis Barber.
Trent Piepho [Sun, 29 Oct 2006 16:35:39 +0000 (13:35 -0300)]
V4L/DVB (4789): Lgdt330x: SNR and signal strength reporting
Update the SNR calculations to use the new dvb_math log function, and add
SNR calculations for all supported modulations for both lg dt3302 and dt3303.
The QAM equations don't appear in the dt3302 datasheet, so the ones from the
dt3303 datasheet were used.
SNR returned is the actual value in dB as 8.8 fixed point.
Reporting of real signal strength isn't supported, so rather than return 0,
which confuses some software and users, a re-scaled SNR value is returned.
Code originally by Rusty Scott.
Signed-off-by: Trent Piepho <xyzzy@speakeasy.org> Signed-off-by: Rusty Scott <rustys@ieee.org> Signed-off-by: Michael Krufky <mkrufky@linuxtv.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Hartmut Hackmann [Thu, 12 Oct 2006 23:38:51 +0000 (20:38 -0300)]
V4L/DVB (4769): Added support for a ASUSTEK P7131 Dual DVB-T variant
This card has no firmware eeprom. The old version still should not
need a firmware file due to an undocumented feature of the TDA10046.
The patch also includes Hermann Pitton's proposal for improved
antenna switch handling
Mike Isely [Mon, 16 Oct 2006 00:35:14 +0000 (21:35 -0300)]
V4L/DVB (4763): Pvrusb2: Implement IR reception for 24xxx devices
Unlike 29xxx devices, the 24xxx model series does not have a dedicated
I2C device for reception of IR codes. Instead IR is handled directly
by the FX2 microcontroller and the results are communicated via
commands to the FX2. Rather than implement a whole new IR reception
pathway for 24xxx devices, this changeset instead emulates the
presence of the 29xxx device's I2C based IR receiver by intercepting
commands to that chip and issuing appropriate FX2 commands to do the
needed action. This has the result of allowing all the usual IR
frameworks (ir-kbd-i2c or lirc) to continue working unmodified for
24xxx devices.
Signed-off-by: Mike Isely <isely@pobox.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Michael Krufky [Mon, 16 Oct 2006 23:47:14 +0000 (20:47 -0300)]
V4L/DVB (4759): Cx88: use external adc for rca audio inputs on the ASUS PVR-416
For the ASUS PVR-416, the external adc must be used for
the rca audio inputs, but television / radio inputs use
the internal adc.
Thanks to Alex Deucher for lending his card to me.
Signed-off-by: Michael Krufky <mkrufky@linuxtv.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Michael Krufky [Mon, 16 Oct 2006 19:51:11 +0000 (16:51 -0300)]
V4L/DVB (4757): Cx88: determine whether or not to use external adc
Some cx88-blackbird boards use an external adc, but not necessarily
for all inputs. Thus, this needs to be configurable on the card level
for each input.
This patch allows for the usage of the external adc to be determined
by a bit setting in the cx88_input struct for cards based on the cx88
blackbird design.
Signed-off-by: Michael Krufky <mkrufky@linuxtv.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Steven Toth [Sat, 7 Oct 2006 00:29:25 +0000 (21:29 -0300)]
V4L/DVB (4736): Cx88-blackbird module is rejected during probe.
If the last cx88 board probed is not backbird based, and a previous board was,
the entire module is unloaded leading to an oops during mpeg_open on the
first /dev/videoN device.
Signed-off-by: Steven Toth <stoth@hauppauge.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Steven Toth [Fri, 6 Oct 2006 00:28:24 +0000 (21:28 -0300)]
V4L/DVB (4723): Bugfix: Select the correct cx8802_dev when enumerating by CX88_MPEG_type
A bug in cx8802_get_driver() meant that in multiboard environments, when testing
frontends on the non primary board, the incorrect device was returned resulting
in "Unsupported value in .mpeg.." messages. Depending on the electrical design
of the hardware (serial, parallel, rising/falling edge detect), transport would
still be delivered and the problem went unnoticed.
This patch ensures the correct instance of cx8802_dev is returned.
Signed-off-by: Steven Toth <stoth@hauppauge.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Steven Toth [Sat, 2 Dec 2006 23:15:51 +0000 (21:15 -0200)]
V4L/DVB (4676): Dynamic cx88 mpeg port management for HVR1300 MPEG2/DVB-T support.
A series of patches to change the cx88 framework to allow the
PCI mpeg port to be shared dynamically between different
types of drivers or applications. This patch changes the cx88-dvb
and cx88-blackbird drivers to become 'sub drivers' of a higher
single cx88-mpeg driver.
The cx88-mpeg driver is a superset of the previous cx88-mpeg/blackbird
drivers and now owns the IRQ. cx88-dvb/blackbird now become mini drivers,
registering themselves with cx88-mpeg through a standard interface with
callbacks.
Sub drivers request access to hardware via the cx88-mpeg driver. In turn
the cx88-mpeg driver determines whether the hardware is busy and accepts
or refuses the request, grant access using callbacks into the sub drivers.
The net effect is that you are no longer able to tamper with the mpeg port
from multiple different applications at the same time, potentially breaking
a live mpeg2 hardware encoding or dvb stream.
The mechanism extends to enable multiple dvb frontends to be registered
and share the single resource.
Signed-off-by: Steven Toth <stoth@hauppauge.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Trent Piepho [Wed, 4 Oct 2006 23:33:51 +0000 (20:33 -0300)]
V4L/DVB (4722): Cx88: Add support for VIDIOC_INT_[SR]_REGISTER ioctls
Add support for the advanced debugging ioctls, to allow access to the
cx88 registers from userspace. Only i2c_id == 0 is supported, for access
to the cx88 adapter itself. There isn't any support for access to I2C
clients of the adapter. Most of them don't have R/W registers anyway,
and its necessary to use i2c-dev to talk to them from userspace.
Linus Torvalds [Sat, 9 Dec 2006 21:31:07 +0000 (13:31 -0800)]
Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6:
[PATCH] x86-64: no paravirt for X86_VOYAGER or X86_VISWS
[PATCH] i386: Fix io_apic.c warning
[PATCH] i386: export smp_num_siblings for oprofile
[PATCH] x86: Work around gcc 4.2 over aggressive optimizer
[PATCH] x86: Fix boot hang due to nmi watchdog init code
[PATCH] x86: Fix verify_quirk_intel_irqbalance()
[PATCH] i386: Update defconfig
[PATCH] x86-64: Update defconfig
Randy Dunlap [Sat, 9 Dec 2006 20:33:36 +0000 (21:33 +0100)]
[PATCH] x86-64: no paravirt for X86_VOYAGER or X86_VISWS
Since Voyager and Visual WS already define ARCH_SETUP,
it looks like PARAVIRT shouldn't be offered for them.
In file included from arch/i386/kernel/setup.c:63:
include/asm-i386/mach-visws/setup_arch.h:8:1: warning: "ARCH_SETUP" redefin=
ed
In file included from include/asm/msr.h:5,
from include/asm/processor.h:17,
from include/asm/thread_info.h:16,
from include/linux/thread_info.h:21,
from include/linux/preempt.h:9,
from include/linux/spinlock.h:49,
from include/linux/capability.h:45,
from include/linux/sched.h:46,
from arch/i386/kernel/setup.c:26:
include/asm/paravirt.h:163:1: warning: this is the location of the previous=
definition
In file included from arch/i386/kernel/setup.c:63:
include/asm-i386/mach-visws/setup_arch.h:8:1: warning: "ARCH_SETUP" redefin=
ed
In file included from include/asm/msr.h:5,
from include/asm/processor.h:17,
from include/asm/thread_info.h:16,
from include/linux/thread_info.h:21,
from include/linux/preempt.h:9,
from include/linux/spinlock.h:49,
from include/linux/capability.h:45,
from include/linux/sched.h:46,
from arch/i386/kernel/setup.c:26:
include/asm/paravirt.h:163:1: warning: this is the location of the previous=
definition
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Andi Kleen [Sat, 9 Dec 2006 20:33:36 +0000 (21:33 +0100)]
[PATCH] i386: Fix io_apic.c warning
gcc 4.2 warns
linux/arch/i386/kernel/io_apic.c: In function ‘create_irq’:
linux/arch/i386/kernel/io_apic.c:2488: warning: ‘vector’ may be used uninitialized in this function
The warning is false, but somewhat legitimate so work around it.
Randy Dunlap [Sat, 9 Dec 2006 20:33:36 +0000 (21:33 +0100)]
[PATCH] i386: export smp_num_siblings for oprofile
oprofile uses smp_num_siblings without testing for CONFIG_X86_HT.
I looked at modifying oprofile, but this way is cleaner & simpler
and I didn't see a good reason not to just export it when CONFIG_SMP.
Andi Kleen [Sat, 9 Dec 2006 20:33:36 +0000 (21:33 +0100)]
[PATCH] x86: Work around gcc 4.2 over aggressive optimizer
The new PDA code uses a dummy _proxy_pda variable to describe
memory references to the PDA. It is never referenced
in inline assembly, but exists as input/output arguments.
gcc 4.2 in some cases can CSE references to this which causes
unresolved symbols. Define it to zero to avoid this.
[PATCH] x86: Fix boot hang due to nmi watchdog init code
2.6.19 stopped booting (or booted based on build/config) on our x86_64
systems due to a bug introduced in 2.6.19. check_nmi_watchdog schedules an
IPI on all cpus to busy wait on a flag, but fails to set the busywait
flag if NMI functionality is disabled. This causes the secondary cpus
to spin in an endless loop, causing the kernel bootup to hang.
Depending upon the build, the busywait flag got overwritten (stack variable)
and caused the kernel to bootup on certain builds. Following patch fixes
the bug by setting the busywait flag before returning from check_nmi_watchdog.
I guess using a stack variable is not good here as the calling function could
potentially return while the busy wait loop is still spinning on the flag.
Linus Torvalds [Sat, 9 Dec 2006 20:26:37 +0000 (12:26 -0800)]
Merge branch 'drm-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (21 commits)
Fix http://bugzilla.kernel.org/show_bug.cgi?id=7606
drm: add flag for mapping PCI DMA buffers read-only.
drm: fix up irqflags in drm_lock.c
drm: i915 updates
drm: i915: fix up irqflags arg
drm: i915: Only return EBUSY after we've established we need to schedule a new swap.
drm: i915: Fix 'sequence has passed' condition in i915_vblank_swap().
drm: i915: Add SAREA fileds for determining which pipe to sync window buffer swaps to.
drm: Make handling of dev_priv->vblank_pipe more robust.
drm: DRM_I915_VBLANK_SWAP ioctl: Take drm_vblank_seq_type_t instead
drm: i915: Add ioctl for scheduling buffer swaps at vertical blanks.
drm: Core vsync: Don't clobber target sequence number when scheduling signal.
drm: Core vsync: Add flag DRM_VBLANK_NEXTONMISS.
drm: Make locked tasklet handling more robust.
drm: drm_rmdraw: Declare id and idx as signed so testing for < 0 works as intended.
drm: Change first valid DRM drawable ID to be 1 instead of 0.
drm: drawable locking + memory management fixes + copyright
drm: Add support for interrupt triggered driver callback with lock held to DRM core.
drm: Add support for tracking drawable information to core
drm: add support for secondary vertical blank interrupt to i915
...
David Howells [Thu, 7 Dec 2006 11:33:26 +0000 (11:33 +0000)]
[PATCH] WorkStruct: Use direct assignment rather than cmpxchg()
Use direct assignment rather than cmpxchg() as the latter is unavailable
and unimplementable on some platforms and is actually unnecessary.
The use of cmpxchg() was to guard against two possibilities, neither of
which can actually occur:
(1) The pending flag may have been unset or may be cleared. However, given
where it's called, the pending flag is _always_ set. I don't think it
can be unset whilst we're in set_wq_data().
Once the work is enqueued to be actually run, the only way off the queue
is for it to be actually run.
If it's a delayed work item, then the bit can't be cleared by the timer
because we haven't started the timer yet. Also, the pending bit can't be
cleared by cancelling the delayed work _until_ the work item has had its
timer started.
(2) The workqueue pointer might change. This can only happen in two cases:
(a) The work item has just been queued to actually run, and so we're
protected by the appropriate workqueue spinlock.
(b) A delayed work item is being queued, and so the timer hasn't been
started yet, and so no one else knows about the work item or can
access it (the pending bit protects us).
Besides, set_wq_data() _sets_ the workqueue pointer unconditionally, so
it can be assigned instead.
So, replacing the set_wq_data() with a straight assignment would be okay
in most cases.
The problem is where we end up tangling with test_and_set_bit() emulated
using spinlocks, and even then it's not a problem _provided_
test_and_set_bit() doesn't attempt to modify the word if the bit was
set.
If that's a problem, then a bitops-proofed assignment will be required -
equivalent to atomic_set() vs other atomic_xxx() ops.
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[NETLINK]: Put {IFA,IFLA}_{RTA,PAYLOAD} macros back for userspace.
[NET_SCHED] sch_htb: turn intermediate classes into leaves
[NET_SCHED] sch_cbq: deactivating when grafting, purging etc.
[XFRM]: Fix XFRMGRP_REPORT to use correct multicast group.
[NET]: Force a cache line split in hh_cache in SMP.
[NETPOLL]: make arp replies through netpoll use mac address of sender
[NETLINK]: Restore API compatibility of address and neighbour bits
[AX.25]: Fix default address and broadcast address initialization.
[AX.25]: Constify ax25 utility functions
[BNX2]: Add an error check.
[NET]: Convert hh_lock to seqlock.
Linus Torvalds [Sat, 9 Dec 2006 01:21:38 +0000 (17:21 -0800)]
Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
[PATCH] add STB810 support (Philips PNX8550-based)
[MIPS] Qemu now has an ELF loader.
[MIPS] Add GENERIC_HARDIRQS_NO__DO_IRQ for i8259 users
[MIPS] Optimize csum_partial for 64bit kernel
[MIPS] Optimize flow of csum_partial
[MIPS] Make csum_partial more readable
[MIPS] Rename SNI_RM200_PCI to just SNI_RM preparing for more RM machines
J Hadi Salim [Fri, 8 Dec 2006 08:12:15 +0000 (00:12 -0800)]
[XFRM]: Fix XFRMGRP_REPORT to use correct multicast group.
XFRMGRP_REPORT uses 0x10 which is a group that belongs
to events. The correct value is 0x20.
We should really be using xfrm_nlgroups going forward; it was tempting
to delete the definition of XFRMGRP_REPORT but it would break at
least iproute2.
Signed-off-by: J Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 8 Dec 2006 08:08:43 +0000 (00:08 -0800)]
[NET]: Force a cache line split in hh_cache in SMP.
hh_lock was converted from rwlock to seqlock by Stephen.
To have a 100% benefit of this change, I suggest to place read mostly fields
of hh_cache in a separate cache line, because hh_refcnt may be changed quite
frequently on some busy machines.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Fri, 8 Dec 2006 08:05:55 +0000 (00:05 -0800)]
[NETPOLL]: make arp replies through netpoll use mac address of sender
Back in 2.4 arp requests that were recevied by netpoll were processed
in netconsole_receive_skb, where they were responded to using the src
mac of the request sender. In the 2.6 kernel arp_reply is responsible
for this function, but instead of using the src mac address of the
incomming request, the stored mac address that was registered for the
netconsole application is used. While this is usually ok, it can lead
to failures in netpoll in some situations (specifically situations
where a network may have two gateways, as arp requests from one may be
responded to using the mac address of the other). This patch reverts
the behavior to what we had in 2.4, in which all arp requests are sent
back using the src address of the request sender.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Chris Lalancette <clalance@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ralf Baechle [Thu, 7 Dec 2006 23:47:08 +0000 (15:47 -0800)]
[AX.25]: Fix default address and broadcast address initialization.
Only the callsign but not the SSID part of an AX.25 address is ASCII
based but Linux by initializes the SSID which should be just a 4-bit
number from ASCII anyway.
Fix that and convert the code to use a shared constant for both default
addresses. While at it, use the same style for null_ax25_address also.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Adrian Bunk [Thu, 7 Dec 2006 23:10:06 +0000 (15:10 -0800)]
[BNX2]: Add an error check.
This patch adds a missing error check spotted by the Coverity checker.
Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Jeff Garzik <jgarzik@pobox.com> Acked-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Atsushi Nemoto [Thu, 7 Dec 2006 16:04:45 +0000 (01:04 +0900)]
[MIPS] Optimize flow of csum_partial
Delete dead codes at end of the function and move small_csumcopy
there. This makes some labels (maybe_end_cruft, small_memcpy,
end_bytes, out) needless and eliminates some branches.
Linus Torvalds [Fri, 8 Dec 2006 19:21:55 +0000 (11:21 -0800)]
Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
[S390] Poison init section before freeing it.
[S390] Use add_active_range() and free_area_init_nodes().
[S390] Virtual memmap for s390.
[S390] Update documentation for dynamic subchannel mapping.
[S390] Use dev->groups for adding/removing the subchannel attribute group.
[S390] Support for disconnected devices reappearing on another subchannel.
[S390] subchannel lock conversion.
[S390] Some preparations for the dynamic subchannel mapping patch.
[S390] runtime switch for qdio performance statistics
[S390] New DASD feature for ERP related logging
[S390] add reset call handler to the ap bus.
[S390] more workqueue fixes.
[S390] workqueue fixes.
[S390] uaccess_pt: add missing down_read() and convert to is_init().
Jiri Kosina [Fri, 8 Dec 2006 17:41:17 +0000 (18:41 +0100)]
[PATCH] Generic HID layer - input and event reporting
hid_input_report() was needlessly USB-specific in USB HID. This patch
makes the function independent of HID implementation and fixes all
the current users. Bluetooth patches comply with this prototype.
Jiri Kosina [Fri, 8 Dec 2006 17:41:10 +0000 (18:41 +0100)]
[PATCH] Generic HID layer - hiddev
- hiddev is USB-only (agreed with Marcel Holtmann that Bluetooth currently
doesn't need it, and future planned interface (rawhid) will be more flexible
and usable)
- both HID and USB-hid can be now compiled as modules (wasn't possible before
hiddev was fully separated from generic HID layer)
Jiri Kosina [Fri, 8 Dec 2006 17:41:03 +0000 (18:41 +0100)]
[PATCH] Generic HID layer - USB API
- 'dev' in struct hid_device changed from struct usb_device to
struct device and fixed all the users
- renamed functions which are part of USB HID API from 'hid_*' to
'usbhid_*'
- force feedback initialization moved from common part into USB-specific
driver
- added usbhid.h header for USB HID API users
- removed USB-specific fields from struct hid_device and moved them
to new usbhid_device, which is pointed to by hid_device->driver_data
- fixed all USB users to use this new structure
Jiri Kosina [Fri, 8 Dec 2006 17:40:53 +0000 (18:40 +0100)]
[PATCH] Generic HID layer - API
- fixed generic API (added neccessary EXPORT_SYMBOL, fixed hid.h to provide correct
prototypes)
- extended hid_device with open/close/event function pointers to driver-specific
functions
- added driver specific driver_data to hid_device
Jiri Kosina [Fri, 8 Dec 2006 17:40:37 +0000 (18:40 +0100)]
[PATCH] Generic HID layer - disable USB HID
This patch is a part of generic HID layer introduction. USB HID is
disabled, so that the code split and changes could be introduced in a
way that is reviewable (i.e. separate patches), but not to break git
bisect by uncompilable kernel throughout different stages of the code
splitup and changes. The last patch of this series enables HID again.
Reset sync_search on resume. The effect is to retry syncing all out-of-sync
regions when a mirror is resumed, including ones that previously failed.
Signed-off-by: Jonathan E Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: dm-devel@redhat.com Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The complete_resync_work function only provides the ability to change an
out-of-sync region to in-sync. This patch enhances the function to allow us
to change the status from in-sync to out-of-sync as well, something that is
needed when a mirror write to one of the devices or an initial resync on a
given region fails.
Signed-off-by: Jonathan E Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: dm-devel@redhat.com Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Kiyoshi Ueda [Fri, 8 Dec 2006 10:41:10 +0000 (02:41 -0800)]
[PATCH] dm: mpath: use noflush suspending
Implement the pushback feature for the multipath target.
The pushback request is used when:
1) there are no valid paths;
2) queue_if_no_path was set;
3) a suspend is being issued with the DMF_NOFLUSH_SUSPENDING flag.
Otherwise bios are returned to applications with -EIO.
To check whether queue_if_no_path is specified or not, you need to check
both queue_if_no_path and saved_queue_if_no_path, because presuspend saves
the original queue_if_no_path value to saved_queue_if_no_path.
The check for 1 already exists in both map_io() and do_end_io().
So this patch adds __must_push_back() to check 2 and 3.
Test results:
See the test results in the preceding patch.
Kiyoshi Ueda [Fri, 8 Dec 2006 10:41:09 +0000 (02:41 -0800)]
[PATCH] dm: suspend: add noflush pushback
In device-mapper I/O is sometimes queued within targets for later processing.
For example the multipath target can be configured to store I/O when no paths
are available instead of returning it -EIO.
This patch allows the device-mapper core to instruct a target to transfer the
contents of any such in-target queue back into the core. This frees up the
resources used by the target so the core can replace that target with an
alternative one and then resend the I/O to it. Without this patch the only
way to change the target in such circumstances involves returning the I/O with
an error back to the filesystem/application. In the multipath case, this
patch will let us add new paths for existing I/O to try after all the existing
paths have failed.
DMF_NOFLUSH_SUSPENDING
----------------------
If the DM_NOFLUSH_FLAG ioctl option is specified at suspend time, the
DMF_NOFLUSH_SUSPENDING flag is set in md->flags during dm_suspend(). It
is always cleared before dm_suspend() returns.
The flag must be visible while the target is flushing pending I/Os so it
is set before presuspend where the flush starts and unset after the wait
for md->pending where the flush ends.
Target drivers can check this flag by calling dm_noflush_suspending().
A target's map() function can now return DM_MAPIO_REQUEUE to request the
device mapper core queue the bio.
Similarly, a target's end_io() function can return DM_ENDIO_REQUEUE to request
the same. This has been labelled 'pushback'.
The __map_bio() and clone_endio() functions in the core treat these return
values as errors and call dec_pending() to end the I/O.
dec_pending
-----------
dec_pending() saves the pushback request in struct dm_io->error. Once all
the split clones have ended, dec_pending() will put the original bio on
the md->pushback list. Note that this supercedes any I/O errors.
It is possible for the suspend with DM_NOFLUSH_FLAG to be aborted while
in progress (e.g. by user interrupt). dec_pending() checks for this and
returns -EIO if it happened.
pushdback list and pushback_lock
--------------------------------
The bio is queued on md->pushback temporarily in dec_pending(), and after
all pending I/Os return, md->pushback is merged into md->deferred in
dm_suspend() for re-issuing at resume time.
md->pushback_lock protects md->pushback.
The lock should be held with irq disabled because dec_pending() can be
called from interrupt context.
Queueing bios to md->pushback in dec_pending() must be done atomically
with the check for DMF_NOFLUSH_SUSPENDING flag. So md->pushback_lock is
held when checking the flag. Otherwise dec_pending() may queue a bio to
md->pushback after the interrupted dm_suspend() flushes md->pushback.
Then the bio would be left in md->pushback.
Flag setting in dm_suspend() can be done without md->pushback_lock because
the flag is checked only after presuspend and the set value is already
made visible via the target's presuspend function.
The flag can be checked without md->pushback_lock (e.g. the first part of
the dec_pending() or target drivers), because the flag is checked again
with md->pushback_lock held when the bio is really queued to md->pushback
as described above. So even if the flag is cleared after the lockless
checkings, the bio isn't left in md->pushback but returned to applications
with -EIO.
Other notes on the current patch
--------------------------------
- md->pushback is added to the struct mapped_device instead of using
md->deferred directly because md->io_lock which protects md->deferred is
rw_semaphore and can't be used in interrupt context like dec_pending(),
and md->io_lock protects the DMF_BLOCK_IO flag of md->flags too.
- Don't issue lock_fs() in dm_suspend() if the DM_NOFLUSH_FLAG
ioctl option is specified, because I/Os generated by lock_fs() would be
pushed back and never return if there were no valid devices.
- If an error occurs in dm_suspend() after the DMF_NOFLUSH_SUSPENDING
flag is set, md->pushback must be flushed because I/Os may be queued to
the list already. (flush_and_out label in dm_suspend())
Test results
------------
I have tested using multipath target with the next patch.
The following tests are for regression/compatibility:
- I/Os succeed when valid paths exist;
- I/Os fail when there are no valid paths and queue_if_no_path is not
set;
- I/Os are queued in the multipath target when there are no valid paths and
queue_if_no_path is set;
- The queued I/Os above fail when suspend is issued without the
DM_NOFLUSH_FLAG ioctl option. I/Os spanning 2 multipath targets also
fail.
The following tests are for the normal code path of new pushback feature:
- Queued I/Os in the multipath target are flushed from the target
but don't return when suspend is issued with the DM_NOFLUSH_FLAG
ioctl option;
- The I/Os above are queued in the multipath target again when
resume is issued without path recovery;
- The I/Os above succeed when resume is issued after path recovery
or table load;
- Queued I/Os in the multipath target succeed when resume is issued
with the DM_NOFLUSH_FLAG ioctl option after table load. I/Os
spanning 2 multipath targets also succeed.
The following tests are for the error paths of the new pushback feature:
- When the bdget_disk() fails in dm_suspend(), the
DMF_NOFLUSH_SUSPENDING flag is cleared and I/Os already queued to the
pushback list are flushed properly.
- When suspend with the DM_NOFLUSH_FLAG ioctl option is interrupted,
o I/Os which had already been queued to the pushback list
at the time don't return, and are re-issued at resume time;
o I/Os which hadn't been returned at the time return with EIO.
Kiyoshi Ueda [Fri, 8 Dec 2006 10:41:07 +0000 (02:41 -0800)]
[PATCH] dm: ioctl: add noflush suspend
Provide a dm ioctl option to request noflush suspending. (See next patch for
what this is for.) As the interface is extended, the version number is
incremented.
Other than accepting the new option through the interface, There is no change
to existing behaviour.
Test results:
Confirmed the option is given from user-space correctly.
Kiyoshi Ueda [Fri, 8 Dec 2006 10:41:05 +0000 (02:41 -0800)]
[PATCH] dm: map and endio return code clarification
Tighten the use of return values from the target map and end_io functions.
Values of 2 and above are now explictly reserved for future use. There are no
existing targets using such values.
The patch has no effect on existing behaviour.
o Reserve return values of 2 and above from target map functions.
Any positive value currently indicates "mapping complete", but all
existing drivers use the value 1. We now make that a requirement
so we can assign new meaning to higher values in future.
The new definition of return values from target map functions is:
< 0 : error
= 0 : The target will handle the io (DM_MAPIO_SUBMITTED).
= 1 : Mapping completed (DM_MAPIO_REMAPPED).
> 1 : Reserved (undefined). Previously this was the same as '= 1'.
o Reserve return values of 2 and above from target end_io functions
for similar reasons.
DM_ENDIO_INCOMPLETE is introduced for a return value of 1.
Test results:
I have tested by using the multipath target.
I/Os succeed when valid paths exist.
I/Os are queued in the multipath target when there are no valid paths and
queue_if_no_path is set.
I/Os fail when there are no valid paths and queue_if_no_path is not set.
Kiyoshi Ueda [Fri, 8 Dec 2006 10:41:04 +0000 (02:41 -0800)]
[PATCH] dm: suspend: parameter change
Change the interface of dm_suspend() so that we can pass several options
without increasing the number of parameters. The existing 'do_lockfs' integer
parameter is replaced by a flag DM_SUSPEND_LOCKFS_FLAG.
There is no functional change to the code.
Test results:
I have tested 'dmsetup suspend' command with/without the '--nolockfs'
option and confirmed the do_lockfs value is correctly set.