Rusty Russell [Tue, 5 Feb 2008 04:50:12 +0000 (23:50 -0500)]
virtio: balloon driver
After discussions with Anthony Liguori, it seems that the virtio
balloon can be made even simpler. Here's my attempt.
The device configuration tells the driver how much memory it should
take from the guest (ie. balloon size). The guest feeds the page
numbers it has taken via one virtqueue.
A second virtqueue feeds the page numbers the driver wants back: if
the device has the VIRTIO_BALLOON_F_MUST_TELL_HOST bit, then this
queue is compulsory, otherwise it's advisory (and the guest can simply
fault the pages back in).
This driver can be enhanced later to deflate the balloon via a
shrinker, oom callback or we could even go for a complete set of
in-guest regulators.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Anthony Liguori [Mon, 28 Jan 2008 15:59:59 +0000 (09:59 -0600)]
virtio: Use PCI revision field to indicate virtio PCI ABI version
As Avi pointed out, as we continue to massage the virtio PCI ABI, we can make
things a little more friendly to users by utilizing the PCI revision field to
indicate which version of the ABI we're using. This is a hard ABI version
and incrementing it will cause the guest driver to break.
This is the necessary changes to virtio_pci to support this.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
virtio_blk: implement naming for vda-vdz,vdaa-vdzz,vdaaa-vdzzz
Am Freitag, 1. Februar 2008 schrieb Christian Borntraeger:
> Right. I will fix that with an additional patch.
This patch goes on top of the minor number patch. Please let me know if
you want a merged patch:
Currently virtio_blk creates the disk name combinging "vd" with 'a'++.
This will give strange names after vdz. I have implemented names up to
vdzzz - inspired by the sd.c code. That should be sufficient for now.
There is one driver in the kernel (driver/s390/block/dasd_genhd.c) that
implements names from dasda-dasdzzzz allowing even more disks. Maybe
a janitor can come up with a common implementation usable for all kind
of block device drivers.
I have tested this patch with 100 disks - seems to work.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
currently virtio_blk uses one major number per device. While this works
quite well on most systems it is wasteful and will exhaust major numbers
on larger installations.
This patch allocates a major number on init and will use 16 minor numbers
for each disk. That will allow ~64k virtio_blk disks.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I currently try to make my guest boot from an virtio root device
without having an external kernel. Some of the tools that I tried
expect HDIO_GETGEO to work. The most interesting value is likely
the geo.start value to get the offset of a partition. This value
is filled by block/ioctl.c if fops->getgeo is set. This patch also
fills in some standard values for heads, sectors and cylinders.
Makes sense?
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Tue, 5 Feb 2008 04:50:07 +0000 (23:50 -0500)]
virtio: flush buffers on open
Fix bug found by Christian Borntraeger: if the other side fills all
the registered network buffers before we enable NAPI, we will never
get an interrupt. The simplest fix is to process the input queue once
on open.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Tue, 5 Feb 2008 04:50:04 +0000 (23:50 -0500)]
virtio: handle interrupts after callbacks turned off
Anthony Liguori found double interrupt suppression in the virtio_net
driver, triggered by two skb_recv_done's in a row. This is because
virtio_ring's interrupt suppression is a best-effort optimization: it
contains no synchronization so the host can miss it and still send
interrupts.
But it's certainly nicer for virtio users if calling disable_cb
actually disables callbacks, so we check for the race in the interrupt
routine.
Note: SMP guests might require syncronization here, but since
disable_cb is actually called from interrupt context, there has to be
some form of synchronization before the next same interrupt handler is
called (Linux guarantees that the same device's irq handler will never
run simultanously on multiple CPUs).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Tue, 5 Feb 2008 04:50:02 +0000 (23:50 -0500)]
virtio: Tweak virtio_net defines
1) Turn GSO on virtio net into an all-or-nothing (keep checksumming
separate). Having multiple bits is a pain: if you can't support something
you should handle it in software, which is still a performance win.
2) Make VIRTIO_NET_HDR_GSO_ECN a flag in the header, so it can apply to
IPv6 or v4.
3) Rename VIRTIO_NET_F_NO_CSUM to VIRTIO_NET_F_CSUM (ie. means we do
checksumming).
4) Add csum and gso params to virtio_net to allow more testing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Tue, 5 Feb 2008 04:50:01 +0000 (23:50 -0500)]
virtio: Net header needs hdr_len
It's far easier to deal with packets if we don't have to parse the
packet to figure out the header length to know how much to pull into
the skb data. Add the field to the virtio_net_hdr struct (and fix the
spaces that somehow crept in there).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Tue, 5 Feb 2008 04:49:59 +0000 (23:49 -0500)]
virtio: clarify NO_NOTIFY flag usage
The other side (host) can set the NO_NOTIFY flag as an optimization,
to say "no need to kick me when you add things". Make it clear that
this is advisory only; especially that we should always notify when
the ring is full.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Anthony Liguori [Fri, 21 Dec 2007 00:17:47 +0000 (02:17 +0200)]
virtio: Fix vring_init/vring_size to take unsigned long
Using unsigned int resulted in silent truncation of the upper 32-bit
on x86_64 resulting in an OOPS since the ring was being initialized
wrong.
Please reconsider my previous patch to just use PAGE_ALIGN(). Open
coding this sort of stuff, no matter how simple it seems, is just
asking for this sort of trouble.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Tue, 5 Feb 2008 04:49:58 +0000 (23:49 -0500)]
virtio: configuration change callback
Various drivers want to know when their configuration information
changes: the balloon driver is the immediate user, but the network
driver may one day have a "carrier" status as well.
This introduces that callback (lguest doesn't use it yet).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Tue, 5 Feb 2008 04:49:56 +0000 (23:49 -0500)]
virtio: simplify config mechanism.
Previously we used a type/len pair within the config space, but this
seems overkill. We now simply define a structure which represents the
layout in the config space: the config space can now only be extended
at the end.
The main driver-visible changes:
1) We indicate what fields are present with an explicit feature bit.
2) Virtqueues are explicitly numbered, and not in the config space.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Nick Piggin [Sat, 2 Feb 2008 14:01:17 +0000 (15:01 +0100)]
fix writev regression: pan hanging unkillable and un-straceable
Frederik Himpe reported an unkillable and un-straceable pan process.
Zero length iovecs can go into an infinite loop in writev, because the
iovec iterator does not always advance over them.
The sequence required to trigger this is not trivial. I think it
requires that a zero-length iovec be followed by a non-zero-length iovec
which causes a pagefault in the atomic usercopy. This causes the writev
code to drop back into single-segment copy mode, which then tries to
copy the 0 bytes of the zero-length iovec; a zero length copy looks like
a failure though, so it loops.
Put a test into iov_iter_advance to catch zero-length iovecs. We could
just put the test in the fallback path, but I feel it is more robust to
skip over zero-length iovecs throughout the code (iovec iterator may be
used in filesystems too, so it should be robust).
All those 2-byte values denoting the different capabilities are being written to
the local copy of the caps buffer without being converted to big endian for
simplicity of usage and shorter code later. Also, we add some comments stating
which are the fields of the caps page in question in order to alleviate the
cryptic pointer casting exercises as in e.g. idetape_get_mode_sense_results().
There should be no functional changes resulting from this patch.
SELECT_DRIVE() is called by IDE core code in start_request() before device
driver's ->do_request method. In ide-scsi case ->do_request is implemented
by idescsi_do_request() which is also the only user of idescsi_issue_pc().
* Factor out code for finding ide_hwifs[] slot from ide_register_hw()
to ide_deprecated_find_port().
* Convert bast-ide, ide-cs and delkin_cb host drivers to use ide_device_add()
instead of ide_register_hw() (while at it drop doing "ide_unregister()" loop
which tries to unregister _all_ IDE interfaces if useable ide_hwifs[] slot
cannot be find).
This patch leaves us with only two ide_register_hw() users:
- drivers/macintosh/mediabay.c
- drivers/ide/ide.c
Borislav Petkov [Sat, 2 Feb 2008 18:56:38 +0000 (19:56 +0100)]
ide-floppy: fix most of the remaining checkpatch.pl issues
such as
ERROR: switch and case should be at the same indent
ERROR: need spaces around that '=' (ctx:VxV)
ERROR: trailing statements should be on next line
WARNING: no space between function name and open parenthesis '('
WARNING: printk() should include KERN_ facility level
ERROR: That open brace { should be on the previous line
ERROR: use tabs not spaces
ERROR: do not use assignment in if condition
WARNING: braces {} are not necessary for single statement blocks
ERROR: need space after that ',' (ctx:VxV)
WARNING: line over 80 characters
ERROR: do not use assignment in if condition
...
and so on.
Borislav Petkov [Sat, 2 Feb 2008 18:56:37 +0000 (19:56 +0100)]
ide-floppy: remove unused flag PC_ABORT
This flag was never being set in the code so remove it. By the way, the
code in the second patch was being executed unconditionally, i.e. in case
pc->retries > IDEFLOPPY_MAX_PC_RETRIES is true (actually that is the only case
when the outer if-test passed), !test_bit(PC_ABORT, &pc->flags)
was always true so the comment is now incorrect and has to go.
We merge idefloppy_{input,output}_buffers() into idefloppy_io_buffers() by
introducing a 4th arg. called direction. According to its value
we atapi_input_bytes() or atapi_output_bytes(). Also, this simplifies the
interrupt handler logic a bit. Finally, rename idefloppy_io_buffers() to
ide_floppy_io_buffers().
Borislav Petkov [Sat, 2 Feb 2008 18:56:35 +0000 (19:56 +0100)]
ide-floppy: mv idefloppy_{should_,}report_error
In addition to shortening the function name, move the printk-call into the
function thereby saving some code lines. Also, make the function out_of_line
since it is not on a performance critical path. Finally, rename the reworked
function to ide_floppy..().