Michael Chan [Thu, 3 May 2007 20:24:48 +0000 (13:24 -0700)]
[BNX2]: Add 1-shot MSI handler for 5709.
The 5709 supports the one-shot MSI handler similar to some of the tg3
chips. In this mode, the MSI disables itself automatically until it
is re-enabled at the end of NAPI poll.
Put the request_irq/free_irq logic in common procedures.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 3 May 2007 20:20:19 +0000 (13:20 -0700)]
[BNX2]: Fix race conditions when calling register_netdev().
Hot-plug scripts can call bnx2_open() as soon as register_netdev() is
called in bnx2_init_one(). We need to call pci_set_drvdata() and
setup everything before calling register_netdev(). netif_carrier_off()
also needs to be moved to bnx2_open() to avoid race conditions with
the irq.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 3 May 2007 20:19:18 +0000 (13:19 -0700)]
[BNX2]: Add 40-bit DMA workaround for 5708.
The internal PCIE-to-PCIX bridge of the 5708 has the same 40-bit DMA
limitation as some of the tg3 chips. Set dma_mask and persistent DMA
mask to 40-bit to workaround.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Pitre [Wed, 21 Feb 2007 14:58:13 +0000 (15:58 +0100)]
[ARM] 4227/1: minor head.S fixups
Let's surround constructs like:
orr r3, r3, #(KERNEL_RAM_PADDR & 0x00f00000)
between .if .endif since (KERNEL_RAM_PADDR & 0x00f00000) is 0 in 99% of
all cases.
Also let's mask PHYS_OFFSET with 0x00f00000 instead of 0x00e00000.
Section mappings are really 1MB not 2MB and the 2MB groupping is
a higher level issue already much better enforced with
#if (PHYS_OFFSET & 0x001fffff)
#error "PHYS_OFFSET must be at an even 2MiB boundary!"
#endif
at the top of the file.
Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch moves the i.MX UART register descriptions from
include/asm-arm/arch-imx/imx-regs.h to the serial driver itself.
This helps using the driver on other architectures like mx31
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Andrew Victor [Thu, 3 May 2007 13:39:41 +0000 (14:39 +0100)]
[ARM] 4355/2: AT91: SAM9260-EK and SAM9263-EK board updates
Various small changes for the Atmel AT91SAM9260-EK and AT91SAM9263-EK
boards.
SAM9260-EK:
- Register I2C device.
SAM9263-EK:
- Add platform_data and register MACB device.
(Patch by Nicolas Ferre)
- Add platform_data and register AC97 device.
(Patch by Nicolas Ferre)
- Register I2C device.
Signed-off-by: Andrew Victor <andrew@sanpeople.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Thu, 3 May 2007 09:47:37 +0000 (10:47 +0100)]
[ARM] ecard: Convert card type enum to a flag
'type' in the struct expansion_card is only used to indicate
whether this card is an EASI card or not. Therefore, having
it as an enum is wasteful (and introduces additional noise
when we come to remove the enum.) Convert it to a mere flag
instead.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch modifies the startup of kecardd to use kthread_run not a
kernel_thread combination of kernel_thread and daemonize. Making the code
slightly simpler and more maintainable.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Mon, 2 Apr 2007 12:53:15 +0000 (13:53 +0100)]
[ARM] Set coherent DMA mask for Acorn expansion cards
Although expansion cards can't do bus-master DMA, subsystems
want to be able to use coherent memory for DMA purposes to
these cards. Therefore, set the coherent DMA mask to allow
such memory to be allocated.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Andrew Victor [Wed, 2 May 2007 17:00:45 +0000 (18:00 +0100)]
[ARM] 4354/1: AT91: Support ADS7846 touchsceen on SAM9263-EK board
Add support for the ADS7846 Touchscreen found on the Atmel
AT91SAM9263-EK board.
Signed-off-by: Nicolas Ferre <nicolas.ferre@rfo.atmel.com> Signed-off-by: Andrew Victor <andrew@sanpeople.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Andrew Victor [Wed, 2 May 2007 16:46:49 +0000 (17:46 +0100)]
[ARM] 4352/1: AT91: Platform data for LCD and AC97.
Define resources, platform_device and device registration functions for
the LCD and AC97 controllers on the AT91SAM9263.
Also update the AT91SAM9261 to use the common atmel_lcdfb driver.
Signed-off-by: Nicolas Ferre <nicolas.ferre@rfo.atmel.com> Signed-off-by: Andrew Victor <andrew@sanpeople.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Dan Williams [Wed, 2 May 2007 16:59:44 +0000 (17:59 +0100)]
[ARM] 4348/4: iop3xx: Give Linux control over PCI initialization
Currently the iop3xx platform support code assumes that RedBoot is the
bootloader and has already initialized the ATU. Linux should handle this
initialization for three reasons:
1/ The memory map that RedBoot sets up is not optimal (page_to_dma and
virt_to_phys return different addresses). The effect of this is that using
the dma mapping API for the internal bus dma units generates pci bus
addresses that are incorrect for the internal bus.
2/ Not all iop platforms use RedBoot
3/ If the ATU is already initialized it indicates that the iop is an add-in
card in another host, it does not own the PCI bus, and should not be
re-initialized.
Changelog:
* rather than change nr_controllers to zero, simply do not call
pci_common_init
Cc: Lennert Buytenhek <kernel@wantstofly.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Herbert Xu [Thu, 3 May 2007 10:35:31 +0000 (03:35 -0700)]
[NETFILTER]: sip: Fix RTP address NAT
I needed to use this recently to talk to a Cisco server. In my case
I only did SNAT while the Cisco server used a different address for
RTP traffic than the one for SIP. I discovered that nf_nat_sip NATed
the RTP address to the SIP one which was unnecessary but OK. However,
in doing so it did not DNAT the destination address on the RTP traffic
to the Cisco back to the original RTP address.
This patch corrects this by noting down the RTP address and using it
when the expectation fires.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Jorge Boncompte [Thu, 3 May 2007 10:34:42 +0000 (03:34 -0700)]
[NETFILTER]: nf_nat_proto_gre: do not modify/corrupt GREv0 packets through NAT
While porting some changes of the 2.6.21-rc7 pptp/proto_gre conntrack
and nat modules to a 2.4.32 kernel I noticed that the gre_key function
returns a wrong pointer to the GRE key of a version 0 packet thus
corrupting the packet payload.
The intended behaviour for GREv0 packets is to act like
nf_conntrack_proto_generic/nf_nat_proto_unknown so I have ripped the
offending functions (not used anymore) and modified the
nf_nat_proto_gre modules to not touch version 0 (non PPTP) packets.
Signed-off-by: Jorge Boncompte <jorge@dti2.net> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Thu, 3 May 2007 10:30:34 +0000 (03:30 -0700)]
[TCP]: Use S+L catcher only with SACK for now
TCP has a transitional state when SACK is not in use during
which this invariant is temporarily broken. Without SACK,
tcp_clean_rtx_queue does not decrement sacked_out. Therefore
calls to tcp_sync_left_out before sacked_out is again
corrected by tcp_fastretrans_alert can trigger this trap as
sacked_out still has couple of segments that are already out
of window.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells [Thu, 3 May 2007 10:29:41 +0000 (03:29 -0700)]
[AFS]: Adjust the new netdevice scanning code
Adjust the new netdevice scanning code provided by Patrick McHardy:
(1) Restore the function banner comments that were dropped.
(2) Rather than using an array size of 6 in some places and an array size of
ETH_ALEN in others, pass a pointer instead and pass the array size
through so that we can actually check it.
(3) Do the buffer fill count check before checking the for_primary_ifa
condition again. This permits us to skip that check should maxbufs be
reached before we run out of interfaces.
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 3 May 2007 10:28:49 +0000 (03:28 -0700)]
[AFS]: Replace rtnetlink client by direct dev_base walking
Replace the large and complicated rtnetlink client by two simple
functions for getting the MAC address for the first ethernet device
and building a list of IPv4 addresses.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 3 May 2007 10:28:13 +0000 (03:28 -0700)]
[NET]: Add __dev_getfirstbyhwtype
Add __dev_getfirstbyhwtype for callers that don't want a reference but
some data from the device and thus need to take the rtnl anyway.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 3 May 2007 10:27:39 +0000 (03:27 -0700)]
[AFS]: Fix memory leak in SRXAFSCB_GetCapabilities
The interface array is not freed on exit.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 3 May 2007 10:27:01 +0000 (03:27 -0700)]
[NETLINK]: Fix use after free in netlink_recvmsg
When the user passes in MSG_TRUNC the skb is used after getting freed.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 3 May 2007 10:17:14 +0000 (03:17 -0700)]
[NETLINK]: Kill CB only when socket is unused
Since we can still receive packets until all references to the
socket are gone, we don't need to kill the CB until that happens.
This also aligns ourselves with the receive queue purging which
happens at that point.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Avi Kivity [Sun, 29 Apr 2007 12:02:17 +0000 (15:02 +0300)]
KVM: Don't require explicit indication of completion of mmio or pio
It is illegal not to return from a pio or mmio request without completing
it, as mmio or pio is an atomic operation. Therefore, we can simplify
the userspace interface by avoiding the completion indication.
Avi Kivity [Wed, 14 Mar 2007 13:54:54 +0000 (15:54 +0200)]
KVM: Remove extraneous guest entry on mmio read
When emulating an mmio read, we actually emulate twice: once to determine
the physical address of the mmio, and, after we've exited to userspace to
get the mmio value, we emulate again to place the value in the result
register and update any flags.
But we don't really need to enter the guest again for that, only to take
an immediate vmexit. So, if we detect that we're doing an mmio read,
emulate a single instruction before entering the guest again.
Anthony Liguori [Sun, 29 Apr 2007 08:56:06 +0000 (11:56 +0300)]
KVM: SVM: Only save/restore MSRs when needed
We only have to save/restore MSR_GS_BASE on every VMEXIT. The rest can be
saved/restored when we leave the VCPU. Since we don't emulate the DEBUGCTL
MSRs and the guest cannot write to them, we don't have to worry about
saving/restoring them at all.
This shaves a whopping 40% off raw vmexit costs on AMD.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Thu, 19 Apr 2007 14:27:43 +0000 (17:27 +0300)]
KVM: Per-vcpu statistics
Make the exit statistics per-vcpu instead of global. This gives a 3.5%
boost when running one virtual machine per core on my two socket dual core
(4 cores total) machine.
Avi Kivity [Thu, 19 Apr 2007 11:28:44 +0000 (14:28 +0300)]
KVM: VMX: Only save/restore MSR_K6_STAR if necessary
Intel hosts only support syscall/sysret in long more (and only if efer.sce
is enabled), so only reload the related MSR_K6_STAR if the guest will
actually be able to use it.
This reduces vmexit cost by about 500 cycles (6400 -> 5870) on my setup.
Avi Kivity [Thu, 19 Apr 2007 10:22:48 +0000 (13:22 +0300)]
KVM: VMX: Don't switch 64-bit msrs for 32-bit guests
Some msrs are only used by x86_64 instructions, and are therefore
not needed when the guest is legacy mode. By not bothering to switch
them, we reduce vmexit latency by 2400 cycles (from about 8800) when
running a 32-bt guest on a 64-bit host.
Avi Kivity [Tue, 17 Apr 2007 12:30:24 +0000 (15:30 +0300)]
KVM: VMX: Reduce unnecessary saving of host msrs
THe automatically switched msrs are never changed on the host (with
the exception of MSR_KERNEL_GS_BASE) and thus there is no need to save
them on every vm entry.
This reduces vmexit latency by ~400 cycles on i386 and by ~900 cycles (10%)
on x86_64.
Avi Kivity [Tue, 17 Apr 2007 07:53:22 +0000 (10:53 +0300)]
KVM: Handle guest page faults when emulating mmio
Usually, guest page faults are detected by the kvm page fault handler,
which detects if they are shadow faults, mmio faults, pagetable faults,
or normal guest page faults.
However, in ceratin circumstances, we can detect a page fault much later.
One of these events is the following combination:
- A two memory operand instruction (e.g. movsb) is executed.
- The first operand is in mmio space (which is the fault reported to kvm)
- The second operand is in an ummaped address (e.g. a guest page fault)
The Windows 2000 installer does such an access, an promptly hangs. Fix
by adding the missing page fault injection on that path.
Avi Kivity [Thu, 12 Apr 2007 14:35:58 +0000 (17:35 +0300)]
KVM: Handle partial pae pdptr
Some guests (Solaris) do not set up all four pdptrs, but leave some invalid.
kvm incorrectly treated these as valid page directories, pinning the
wrong pages and causing general confusion.
Fix by checking the valid bit of a pae pdpte. This closes sourceforge bug 1698922.
where sp is a u16 is undefined in C since 'sp - 6' is promoted to int,
and signed overflow is undefined in C. gcc 4.2 actually warns about it.
Replace with a simpler test.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Avi Kivity <avi@qumranet.com>
Joerg Roedel [Fri, 30 Mar 2007 14:02:14 +0000 (17:02 +0300)]
KVM: SVM: enable LBRV virtualization if available
This patch enables the virtualization of the last branch record MSRs on
SVM if this feature is available in hardware. It also introduces a small
and simple check feature for specific SVM extensions.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Fri, 30 Mar 2007 13:54:30 +0000 (16:54 +0300)]
KVM: Add physical memory aliasing feature
With this, we can specify that accesses to one physical memory range will
be remapped to another. This is useful for the vga window at 0xa0000 which
is used as a movable window into the (much larger) framebuffer.
Avi Kivity [Fri, 30 Mar 2007 11:02:32 +0000 (14:02 +0300)]
KVM: Simply gfn_to_page()
Mapping a guest page to a host page is a common operation. Currently,
one has first to find the memory slot where the page belongs (gfn_to_memslot),
then locate the page itself (gfn_to_page()).
This is clumsy, and also won't work well with memory aliases. So simplify
gfn_to_page() not to require memory slot translation first, and instead do it
internally.
Dor Laor [Fri, 30 Mar 2007 10:06:33 +0000 (13:06 +0300)]
KVM: Add mmu cache clear function
Functions that play around with the physical memory map
need a way to clear mappings to possibly nonexistent or
invalid memory. Both the mmu cache and the processor tlb
are cleared.
Signed-off-by: Dor Laor <dor.laor@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Wed, 28 Mar 2007 18:04:16 +0000 (20:04 +0200)]
KVM: x86 emulator: fix bit string operations operand size
On x86, bit operations operate on a string of bits that can reside in
multiple words. For example, 'btsl %eax, (blah)' will touch the word
at blah+4 if %eax is between 32 and 63.
The x86 emulator compensates for that by advancing the operand address
by (bit offset / BITS_PER_LONG) and truncating the bit offset to the
range (0..BITS_PER_LONG-1). This has a side effect of forcing the operand
size to 8 bytes on 64-bit hosts.
Now, a 32-bit guest goes and fork()s a process. It write protects a stack
page at 0xbffff000 using the 'btr' instruction, at offset 0xffc in the page
table, with bit offset 1 (for the write permission bit).
The emulator now forces the operand size to 8 bytes as previously described,
and an innocent page table update turns into a cross-page-boundary write,
which is assumed by the mmu code not to be a page table, so it doesn't
actually clear the corresponding shadow page table entry. The guest and
host permissions are out of sync and guest memory is corrupted soon
afterwards, leading to guest failure.
Fix by not using BITS_PER_LONG as the word size; instead use the actual
operand size, so we get a 32-bit write in that case.
Note we still have to teach the mmu to handle cross-page-boundary writes
to guest page table; but for now this allows Damn Small Linux 0.4 (2.4.20)
to boot.
Avi Kivity [Sun, 25 Mar 2007 10:07:27 +0000 (12:07 +0200)]
KVM: SVM: Ensure timestamp counter monotonicity
When a vcpu is migrated from one cpu to another, its timestamp counter
may lose its monotonic property if the host has unsynced timestamp counters.
This can confuse the guest, sometimes to the point of refusing to boot.
As the rdtsc instruction is rather fast on AMD processors (7-10 cycles),
we can simply record the last host tsc when we drop the cpu, and adjust
the vcpu tsc offset when we detect that we've migrated to a different cpu.
Avi Kivity [Fri, 23 Mar 2007 07:55:25 +0000 (09:55 +0200)]
KVM: MMU: Fix hugepage pdes mapping same physical address with different access
The kvm mmu keeps a shadow page for hugepage pdes; if several such pdes map
the same physical address, they share the same shadow page. This is a fairly
common case (kernel mappings on i386 nonpae Linux, for example).
However, if the two pdes map the same memory but with different permissions, kvm
will happily use the cached shadow page. If the access through the more
permissive pde will occur after the access to the strict pde, an endless pagefault
loop will be generated and the guest will make no progress.
Fix by making the access permissions part of the cache lookup key.
The fix allows Xen pae to boot on kvm and run guest domains.
Thanks to Jeremy Fitzhardinge for reporting the bug and testing the fix.
Joerg Roedel [Wed, 21 Mar 2007 18:47:00 +0000 (19:47 +0100)]
KVM: SVM: forbid guest to execute monitor/mwait
This patch forbids the guest to execute monitor/mwait instructions on
SVM. This is necessary because the guest can execute these instructions
if they are available even if the kvm cpuid doesn't report its
existence.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
Sergey Kiselev [Thu, 22 Mar 2007 12:06:18 +0000 (14:06 +0200)]
KVM: Handle writes to MCG_STATUS msr
Some older (~2.6.7) kernels write MCG_STATUS register during kernel
boot (mce_clear_all() function, called from mce_init()). It's not
currently handled by kvm and will cause it to inject a GPF.
Following patch adds a "nop" handler for this.
Signed-off-by: Sergey Kiselev <sergey.kiselev@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Wed, 21 Mar 2007 15:58:32 +0000 (17:58 +0200)]
KVM: Hack real-mode segments on vmx from KVM_SET_SREGS
As usual, we need to mangle segment registers when emulating real mode
as vm86 has specific constraints. We special case the reset segment base,
and set the "access rights" (or descriptor flags) to vm86 comaptible values.
Avi Kivity [Wed, 21 Mar 2007 11:44:58 +0000 (13:44 +0200)]
KVM: Modify guest segments after potentially switching modes
The SET_SREGS ioctl modifies both cr0.pe (real mode/protected mode) and
guest segment registers. Since segment handling is modified by the mode on
Intel procesors, update the segment registers after the mode switch has taken
place.
Avi Kivity [Tue, 20 Mar 2007 16:44:51 +0000 (18:44 +0200)]
KVM: Remove set_cr0_no_modeswitch() arch op
set_cr0_no_modeswitch() was a hack to avoid corrupting segment registers.
As we now cache the protected mode values on entry to real mode, this
isn't an issue anymore, and it interferes with reboot (which usually _is_
a modeswitch).
Avi Kivity [Tue, 20 Mar 2007 16:40:40 +0000 (18:40 +0200)]
KVM: Workaround vmx inability to virtualize the reset state
The reset state has cs.selector == 0xf000 and cs.base == 0xffff0000,
which aren't compatible with vm86 mode, which is used for real mode
virtualization.
When we create a vcpu, we set cs.base to 0xf0000, but if we get there by
way of a reset, the values are inconsistent and vmx refuses to enter
guest mode.
Workaround by detecting the state and munging it appropriately.
Avi Kivity [Tue, 20 Mar 2007 12:34:28 +0000 (14:34 +0200)]
KVM: MMU: Remove global pte tracking
The initial, noncaching, version of the kvm mmu flushed the all nonglobal
shadow page table translations (much like a native tlb flush). The new
implementation flushes translations only when they change, rendering global
pte tracking superfluous.
This removes the unused tracking mechanism and storage space.