err.no Git - linux-2.6/log

Merge branch 'upstream' of git://git.infradead.org/~dedekind/ubi-2.6

* 'upstream' of git://git.infradead.org/~dedekind/ubi-2.6: (28 commits)
  UBI: fix compile warning
  UBI: fix error handling in erase worker
  UBI: fix comments
  UBI: remove unneeded error checks
  UBI: cleanup usage of try_module_get
  UBI: fix overflow bug
  UBI: bugfix in max_sqnum calculation
  UBI: bugfix in sqnum calculation
  UBI: fix signed-unsigned multiplication
  UBI: fix bug in atomic_leb_change()
  UBI: fix message
  UBI: fix debugging stuff
  UBI: bugfix in error path
  UBI: use is_power_of_2()
  UBI: fix freeing ubi->vtbl while unloading
  UBI: fix MAINTAINERS
  UBI: bugfix in ubi_leb_change()
  UBI: kill homegrown endian macros
  UBI: cleanup ioctl handling
  UBI: error path bugfix
  ...

Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (24 commits)
  [NETFILTER]: xt_connlimit needs to depend on nf_conntrack
  [NETFILTER]: ipt_iprange.h must #include <linux/types.h>
  [IrDA]: Fix IrDA build failure
  [ATM]: nicstar needs virt_to_bus
  [NET]: move __dev_addr_discard adjacent to dev_addr_discard for readability
  [NET]: merge dev_unicast_discard and dev_mc_discard into one
  [NET]: move dev_mc_discard from dev_mcast.c to dev.c
  [NETLINK]: negative groups in netlink_setsockopt
  [PPPOL2TP]: Reset meta-data in xmit function
  [PPPOL2TP]: Fix use-after-free
  [PKT_SCHED]: Some typo fixes in net/sched/Kconfig
  [XFRM]: Fix crash introduced by struct dst_entry reordering
  [TCP]: remove unused argument to cong_avoid op
  [ATM]: [idt77252] Rename CONFIG_ATM_IDT77252_SEND_IDLE to not resemble a Kconfig variable
  [ATM]: [drivers] ioremap balanced with iounmap
  [ATM]: [lanai] sram_test_word() must be __devinit
  [ATM]: [nicstar] Replace C code with call to ARRAY_SIZE() macro.
  [ATM]: Eliminate dead config variable CONFIG_BR2684_FAST_TRANS.
  [ATM]: Replacing kmalloc/memset combination with kzalloc.
  [NET]: gen_estimator deadlock fix
  ...

Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6

* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
  [SPARC64]: Set vio->desc_buf to NULL after freeing.
  [SPARC]: Mark sparc and sparc64 as not having virt_to_bus
  [SPARC64]: Fix reset handling in VNET driver.
  [SPARC64]: Handle reset events in vio_link_state_change().
  [SPARC64]: Handle LDC resets properly in domain-services driver.
  [SPARC64]: Massively simplify VIO device layer and support hot add/remove.
  [SPARC64]: Simplify VNET probing.
  [SPARC64]: Simplify VDC device probing.
  [SPARC64]: Add basic infrastructure for MD add/remove notification.

Merge branch 'xen-upstream' of ssh://master.kernel.org/pub/scm/linux/kernel/git/jeremy/xen

* 'xen-upstream' of ssh://master.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: (44 commits)
  xen: disable all non-virtual drivers
  xen: use iret directly when possible
  xen: suppress abs symbol warnings for unused reloc pointers
  xen: Attempt to patch inline versions of common operations
  xen: Place vcpu_info structure into per-cpu memory
  xen: handle external requests for shutdown, reboot and sysrq
  xen: machine operations
  xen: add virtual network device driver
  xen: add virtual block device driver.
  xen: add the Xenbus sysfs and virtual device hotplug driver
  xen: Add grant table support
  xen: use the hvc console infrastructure for Xen console
  xen: hack to prevent bad segment register reload
  xen: lazy-mmu operations
  xen: Add support for preemption
  xen: SMP guest support
  xen: Implement sched_clock
  xen: Account for stolen time
  xen: ignore RW mapping of RO pages in pagetable_init
  xen: Complete pagetable pinning
  ...

Revert "[POWERPC] Do firmware feature fixups after features are initialised"

This reverts commit 5a26f6bbb767d7ad23311a1e81cfdd2bebefb855.

The original patch causes boot failures when built with ppc64_defconfig. The
quickest fix is to revert it while alterates are investigated.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Fix compile failure in arch/powerpc/kernel/pci-common.c

This fixes the fallout from the recent powerpc merge (commit
489de30259e667d7bc47da9da44a0270b050cd97):

   CC      arch/powerpc/kernel/pci-common.o
  arch/powerpc/kernel/pci-common.c:160: error: conflicting types for 'pcibios_add_platform_entries'
  include/linux/pci.h:889: error: previous declaration of 'pcibios_add_platform_entries' was here

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Tested-by: Bret Towe <magnade@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

xen: disable all non-virtual drivers

A domU Xen environment has no non-virtual drivers, so make sure
they're all disabled at once.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>

xen: use iret directly when possible

Most of the time we can simply use the iret instruction to exit the
kernel, rather than having to use the iret hypercall - the only
exception is if we're returning into vm86 mode, or from delivering an
NMI (which we don't support yet).

When running native, iret has the behaviour of testing for a pending
interrupt atomically with re-enabling interrupts.  Unfortunately
there's no way to do this with Xen, so there's a window in which we
could get a recursive exception after enabling events but before
actually returning to userspace.

This causes a problem: if the nested interrupt causes one of the
task's TIF_WORK_MASK flags to be set, they will not be checked again
before returning to userspace.  This means that pending work may be
left pending indefinitely, until the process enters and leaves the
kernel again.  The net effect is that a pending signal or reschedule
event could be delayed for an unbounded amount of time.

To deal with this, the xen event upcall handler checks to see if the
EIP is within the critical section of the iret code, after events
are (potentially) enabled up to the iret itself.  If its within this
range, it calls the iret critical section fixup, which adjusts the
stack to deal with any unrestored registers, and then shifts the
stack frame up to replace the previous invocation.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>

xen: suppress abs symbol warnings for unused reloc pointers

arch/i386/xen/xen-asm.S defines some small pieces of code which are
used to implement a few paravirt_ops.  They're designed so they can be
used either in-place, or be inline patched into their callsites if
there's enough space.

Some of those operations need to make calls out (specifically, if you
re-enable events [interrupts], and there's a pending event at that
time).  These calls need the call instruction to be relocated if the
code is patched inline.  In this case xen_foo_reloc is a
section-relative symbol which points to xen_foo's required relocation.

Other operations have no need of a relocation, and so their
corresponding xen_bar_reloc is absolute 0.  These are the cases which
are triggering the warning.

This patch adds those symbols to the list of safe abs symbols.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Adrian Bunk <bunk@stusta.de>

xen: Attempt to patch inline versions of common operations

This patchs adds the mechanism to allow us to patch inline versions of
common operations.

The implementations of the direct-access versions save_fl, restore_fl,
irq_enable and irq_disable are now in assembler, and the same code is
used for both out of line and inline uses.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Keir Fraser <keir@xensource.com>

xen: Place vcpu_info structure into per-cpu memory

An experimental patch for Xen allows guests to place their vcpu_info
structs anywhere. We try to use this to place the vcpu_info into the
PDA, which allows direct access.

If this works, then switch to using direct access operations for
irq_enable, disable, save_fl and restore_fl.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Keir Fraser <keir@xensource.com>

xen: handle external requests for shutdown, reboot and sysrq

The guest domain can be asked to shutdown or reboot itself, or have a
sysrq key injected, via xenbus. This patch adds a watcher for those
events, and does the appropriate action.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>

xen: machine operations

Make the appropriate hypercalls to halt and reboot the virtual machine.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Acked-by: Chris Wright <chrisw@sous-sol.org>

xen: add virtual network device driver

The network device frontend driver allows the kernel to access network
devices exported exported by a virtual machine containing a physical
network device driver.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Acked-by: Jeff Garzik <jeff@garzik.org>
Cc: Ian Pratt <ian.pratt@xensource.com>
Cc: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Keir Fraser <Keir.Fraser@cl.cam.ac.uk>
Cc: netdev@vger.kernel.org

xen: add virtual block device driver.

The block device frontend driver allows the kernel to access block
devices exported exported by a virtual machine containing a physical
block device driver.

Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Greg KH <greg@kroah.com>
Cc: Jens Axboe <axboe@kernel.dk>

xen: add the Xenbus sysfs and virtual device hotplug driver

This communicates with the machine control software via a registry
residing in a controlling virtual machine. This allows dynamic
creation, destruction and modification of virtual device
configurations (network devices, block devices and CPUS, to name some
examples).

[ Greg, would you mind giving this a review? Thanks -J ]

Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Greg KH <greg@kroah.com>

xen: Add grant table support

Add Xen 'grant table' driver which allows granting of access to
selected local memory pages by other virtual machines and,
symmetrically, the mapping of remote memory pages which other virtual
machines have granted access to.

This driver is a prerequisite for many of the Xen virtual device
drivers, which grant the 'device driver domain' restricted and
temporary access to only those memory pages that are currently
involved in I/O operations.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>

xen: use the hvc console infrastructure for Xen console

Implement a Xen back-end for hvc console.

* * *
Add early printk support via hvc console, enable using
"earlyprintk=xen" on the kernel command line.

From: Gerd Hoffmann <kraxel@suse.de>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Olof Johansson <olof@lixom.net>

xen: hack to prevent bad segment register reload

The hypervisor saves and restores the segment registers as part of the
state is saves while context switching. If, during a context switch,
the next process doesn't use the TLS segments, it invalidates the GDT
entry, causing the segment register reload to fault. This fault
effectively doubles the cost of a context switch.

This patch is a band-aid workaround which clears the usermode %gs
after it has been saved for the previous process, but before it gets
reloaded for the next, and it avoids having the hypervisor attempt to
erroneously reload it.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>

xen: lazy-mmu operations

This patch uses the lazy-mmu hooks to batch mmu operations where
possible. This is primarily useful for batching operations applied to
active pagetables, which happens during mprotect, munmap, mremap and
the like (mmap does not do bulk pagetable operations, so it isn't
helped).

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Acked-by: Chris Wright <chrisw@sous-sol.org>

xen: Add support for preemption

Add Xen support for preemption. This is mostly a cleanup of existing
preempt_enable/disable calls, or just comments to explain the current
usage.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>

xen: SMP guest support

This is a fairly straightforward Xen implementation of smp_ops.

Xen has its own IPI mechanisms, and has no dependency on any
APIC-based IPI. The smp_ops hooks and the flush_tlb_others pv_op
allow a Xen guest to avoid all APIC code in arch/i386 (the only apic
operation is a single apic_read for the apic version number).

One subtle point which needs to be addressed is unpinning pagetables
when another cpu may have a lazy tlb reference to the pagetable. Xen
will not allow an in-use pagetable to be unpinned, so we must find any
other cpus with a reference to the pagetable and get them to shoot
down their references.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andi Kleen <ak@suse.de>

xen: Implement sched_clock

Implement xen_sched_clock, which returns the number of ns the current
vcpu has been actually in an unstolen state (ie, running or blocked,
vs runnable-but-not-running, or offline) since boot.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Cc: john stultz <johnstul@us.ibm.com>

xen: Account for stolen time

This patch accounts for the time stolen from our VCPUs. Stolen time is
time where a vcpu is runnable and could be running, but all available
physical CPUs are being used for something else.

This accounting gets run on each timer interrupt, just as a way to get
it run relatively often, and when interesting things are going on.
Stolen time is not really used by much in the kernel; it is reported
in /proc/stats, and that's about it.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Rik van Riel <riel@redhat.com>

xen: ignore RW mapping of RO pages in pagetable_init

When setting up the initial pagetable, which includes mappings of all
low physical memory, ignore a mapping which tries to set the RW bit on
an RO pte. An RO pte indicates a page which is part of the current
pagetable, and so it cannot be allowed to become RW.

Once xen_pagetable_setup_done is called, set_pte reverts to its normal
behaviour.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Cc: ebiederm@xmission.com (Eric W. Biederman)

xen: Complete pagetable pinning

Xen requires all active pagetables to be marked read-only.  When the
base of the pagetable is loaded into %cr3, the hypervisor validates
the entire pagetable and only allows the load to proceed if it all
checks out.

This is pretty slow, so to mitigate this cost Xen has a notion of
pinned pagetables.  Pinned pagetables are pagetables which are
considered to be active even if no processor's cr3 is pointing to is.
This means that it must remain read-only and all updates are validated
by the hypervisor.  This makes context switches much cheaper, because
the hypervisor doesn't need to revalidate the pagetable each time.

This also adds a new paravirt hook which is called during setup once
the zones and memory allocator have been initialized.  When the
init_mm pagetable is first built, the struct page array does not yet
exist, and so there's nowhere to put he init_mm pagetable's PG_pinned
flags.  Once the zones are initialized and the struct page array
exists, we can set the PG_pinned flags for those pages.

This patch also adds the Xen support for pte pages allocated out of
highmem (highpte) by implementing xen_kmap_atomic_pte.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Zach Amsden <zach@vmware.com>

xen: add pinned page flag

Add a new definition for PG_owner_priv_1 to define PG_pinned on Xen
pagetable pages.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>

xen: configuration

Put config options for Xen after the core pieces are in place.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>

xen: time implementation

Xen maintains a base clock which measures nanoseconds since system
boot.  This is provided to guests via a shared page which contains a
base time in ns, a tsc timestamp at that point and tsc frequency
parameters.  Guests can compute the current time by reading the tsc
and using it to extrapolate the current time from the basetime.  The
hypervisor makes sure that the frequency parameters are updated
regularly, paricularly if the tsc changes rate or stops.

This is implemented as a clocksource, so the interface to the rest of
the kernel is a simple clocksource which simply returns the current
time directly in nanoseconds.

Xen also provides a simple timer mechanism, which allows a timeout to
be set in the future.  When that time arrives, a timer event is sent
to the guest.  There are two timer interfaces:
- An old one which also delivers a stream of (unused) ticks at 100Hz,
   and on the same event, the actual timer events.  The 100Hz ticks
   cause a lot of spurious wakeups, but are basically harmless.
- The new timer interface doesn't have the 100Hz ticks, and can also
   fail if the specified time is in the past.

This code presents the Xen timer as a clockevent driver, and uses the
new interface by preference.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>

xen: event channels

Xen implements interrupts in terms of event channels. Each guest
domain gets 1024 event channels which can be used for a variety of
purposes, such as Xen timer events, inter-domain events,
inter-processor events (IPI) or for real hardware IRQs.

Within the kernel, we map the event channels to IRQs, and implement
the whole interrupt handling using a Xen irq_chip.

Rather than setting NR_IRQ to 1024 under PARAVIRT in order to
accomodate Xen, we create a dynamic mapping between event channels and
IRQs. Ideally, Linux will eventually move towards dynamically
allocating per-irq structures, and we can use a 1:1 mapping between
event channels and irqs.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Eric W. Biederman <ebiederm@xmission.com>

xen: virtual mmu

Xen pagetable handling, including the machinery to implement direct
pagetables.

Xen presents the real CPU's pagetables directly to guests, with no
added shadowing or other layer of abstraction.  Naturally this means
the hypervisor must maintain close control over what the guest can put
into the pagetable.

When the guest modifies the pte/pmd/pgd, it must convert its
domain-specific notion of a "physical" pfn into a global machine frame
number (mfn) before inserting the entry into the pagetable.  Xen will
check to make sure the domain is allowed to create a mapping of the
given mfn.

Xen also requires that all mappings the guest has of its own active
pagetable are read-only.  This is relatively easy to implement in
Linux because all pagetables share the same pte pages for kernel
mappings, so updating the pte in one pagetable will implicitly update
the mapping in all pagetables.

Normally a pagetable becomes active when you point to it with cr3 (or
the Xen equivalent), but when you do so, Xen must check the whole
pagetable for correctness, which is clearly a performance problem.

Xen solves this with pinning which keeps a pagetable effectively
active even if its currently unused, which means that all the normal
update rules are enforced.  This means that it need not revalidate the
pagetable when loading cr3.

This patch has a first-cut implementation of pinning, but it is more
fully implemented in a later patch.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>

xen: Core Xen implementation

This patch is a rollup of all the core pieces of the Xen
implementation, including:
- booting and setup
- pagetable setup
- privileged instructions
- segmentation
- interrupt flags
- upcalls
- multicall batching

BOOTING AND SETUP

The vmlinux image is decorated with ELF notes which tell the Xen
domain builder what the kernel's requirements are; the domain builder
then constructs the address space accordingly and starts the kernel.

Xen has its own entrypoint for the kernel (contained in an ELF note).
The ELF notes are set up by xen-head.S, which is included into head.S.
In principle it could be linked separately, but it seems to provoke
lots of binutils bugs.

Because the domain builder starts the kernel in a fairly sane state
(32-bit protected mode, paging enabled, flat segments set up), there's
not a lot of setup needed before starting the kernel proper.  The main
steps are:
  1. Install the Xen paravirt_ops, which is simply a matter of a
     structure assignment.
  2. Set init_mm to use the Xen-supplied pagetables (analogous to the
     head.S generated pagetables in a native boot).
  3. Reserve address space for Xen, since it takes a chunk at the top
     of the address space for its own use.
  4. Call start_kernel()

PAGETABLE SETUP

Once we hit the main kernel boot sequence, it will end up calling back
via paravirt_ops to set up various pieces of Xen specific state.  One
of the critical things which requires a bit of extra care is the
construction of the initial init_mm pagetable.  Because Xen places
tight constraints on pagetables (an active pagetable must always be
valid, and must always be mapped read-only to the guest domain), we
need to be careful when constructing the new pagetable to keep these
constraints in mind.  It turns out that the easiest way to do this is
use the initial Xen-provided pagetable as a template, and then just
insert new mappings for memory where a mapping doesn't already exist.

This means that during pagetable setup, it uses a special version of
xen_set_pte which ignores any attempt to remap a read-only page as
read-write (since Xen will map its own initial pagetable as RO), but
lets other changes to the ptes happen, so that things like NX are set
properly.

PRIVILEGED INSTRUCTIONS AND SEGMENTATION

When the kernel runs under Xen, it runs in ring 1 rather than ring 0.
This means that it is more privileged than user-mode in ring 3, but it
still can't run privileged instructions directly.  Non-performance
critical instructions are dealt with by taking a privilege exception
and trapping into the hypervisor and emulating the instruction, but
more performance-critical instructions have their own specific
paravirt_ops.  In many cases we can avoid having to do any hypercalls
for these instructions, or the Xen implementation is quite different
from the normal native version.

The privileged instructions fall into the broad classes of:
  Segmentation: setting up the GDT and the GDT entries, LDT,
     TLS and so on.  Xen doesn't allow the GDT to be directly
     modified; all GDT updates are done via hypercalls where the new
     entries can be validated.  This is important because Xen uses
     segment limits to prevent the guest kernel from damaging the
     hypervisor itself.
  Traps and exceptions: Xen uses a special format for trap entrypoints,
     so when the kernel wants to set an IDT entry, it needs to be
     converted to the form Xen expects.  Xen sets int 0x80 up specially
     so that the trap goes straight from userspace into the guest kernel
     without going via the hypervisor.  sysenter isn't supported.
  Kernel stack: The esp0 entry is extracted from the tss and provided to
     Xen.
  TLB operations: the various TLB calls are mapped into corresponding
     Xen hypercalls.
  Control registers: all the control registers are privileged.  The most
     important is cr3, which points to the base of the current pagetable,
     and we handle it specially.

Another instruction we treat specially is CPUID, even though its not
privileged.  We want to control what CPU features are visible to the
rest of the kernel, and so CPUID ends up going into a paravirt_op.
Xen implements this mainly to disable the ACPI and APIC subsystems.

INTERRUPT FLAGS

Xen maintains its own separate flag for masking events, which is
contained within the per-cpu vcpu_info structure.  Because the guest
kernel runs in ring 1 and not 0, the IF flag in EFLAGS is completely
ignored (and must be, because even if a guest domain disables
interrupts for itself, it can't disable them overall).

(A note on terminology: "events" and interrupts are effectively
synonymous.  However, rather than using an "enable flag", Xen uses a
"mask flag", which blocks event delivery when it is non-zero.)

There are paravirt_ops for each of cli/sti/save_fl/restore_fl, which
are implemented to manage the Xen event mask state.  The only thing
worth noting is that when events are unmasked, we need to explicitly
see if there's a pending event and call into the hypervisor to make
sure it gets delivered.

UPCALLS

Xen needs a couple of upcall (or callback) functions to be implemented
by each guest.  One is the event upcalls, which is how events
(interrupts, effectively) are delivered to the guests.  The other is
the failsafe callback, which is used to report errors in either
reloading a segment register, or caused by iret.  These are
implemented in i386/kernel/entry.S so they can jump into the normal
iret_exc path when necessary.

MULTICALL BATCHING

Xen provides a multicall mechanism, which allows multiple hypercalls
to be issued at once in order to mitigate the cost of trapping into
the hypervisor.  This is particularly useful for context switches,
since the 4-5 hypercalls they would normally need (reload cr3, update
TLS, maybe update LDT) can be reduced to one.  This patch implements a
generic batching mechanism for hypercalls, which gets used in many
places in the Xen code.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Ian Pratt <ian.pratt@xensource.com>
Cc: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Cc: Adrian Bunk <bunk@stusta.de>

xen: Add Xen interface header files

Add Xen interface header files. These are taken fairly directly from
the Xen tree, but somewhat rearranged to suit the kernel's conventions.

Define macros and inline functions for doing hypercalls into the
hypervisor.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>

Add nosegneg capability to the vsyscall page notes

Add the "nosegneg" fake capabilty to the vsyscall page notes. This is
used by the runtime linker to select a glibc version which then
disables negative-offset accesses to the thread-local segment via
%gs. These accesses require emulation in Xen (because segments are
truncated to protect the hypervisor address space) and avoiding them
provides a measurable performance boost.

Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Acked-by: Zachary Amsden <zach@vmware.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Ulrich Drepper <drepper@redhat.com>

Add a sched_clock paravirt_op

The tsc-based get_scheduled_cycles interface is not a good match for
Xen's runstate accounting, which reports everything in nanoseconds.

This patch replaces this interface with a sched_clock interface, which
matches both Xen and VMI's requirements.

In order to do this, we:
   1. replace get_scheduled_cycles with sched_clock
   2. hoist cycles_2_ns into a common header
   3. update vmi accordingly

One thing to note: because sched_clock is implemented as a weak
function in kernel/sched.c, we must define a real function in order to
override this weak binding.  This means the usual paravirt_ops
technique of using an inline function won't work in this case.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Dan Hecht <dhecht@vmware.com>
Cc: john stultz <johnstul@us.ibm.com>

paravirt: helper to disable all IO space

In a virtual environment, device drivers such as legacy IDE will waste
quite a lot of time probing for their devices which will never appear.
This helper function allows a paravirt implementation to lay claim to
the whole iomem and ioport space, thereby disabling all device drivers
trying to claim IO resources.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>

Allocate and free vmalloc areas

Allocate/release a chunk of vmalloc address space:
alloc_vm_area reserves a chunk of address space, and makes sure all
the pagetables are constructed for that address range - but no pages.

free_vm_area releases the address space range.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: "Jan Beulich" <JBeulich@novell.com>
Cc: "Andi Kleen" <ak@muc.de>

paravirt: export __supported_pte_mask

__supported_pte_mask is needed when constructing pte values. Xen
device drivers need to do this to make mappings of foreign pages (ie,
pages granted to us by other domains).

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>

paravirt: make siblingmap functions visible

Paravirt implementations need to set the sibling map on new cpus.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>

paravirt: unstatic smp_store_cpu_info

Paravirt implementations need to store cpu info when bringing up cpus.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>

paravirt: unstatic leave_mm

Make globally leave_mm visible, specifically so that Xen can use it to
shoot-down lazy uses of cr3.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>

paravirt: increase IRQ limit

When running with CONFIG_PARAVIRT, we may want lots of IRQs even if
there's no IO APIC.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>

paravirt: add a hook for once the allocator is ready

Add a hook so that the paravirt backend knows when the allocator is
ready. This is useful for the obvious reason that the allocator is
available, but the other side-effect of having the bootmem allocator
available is that each page now has an associated "struct page".

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>

paravirt: add an "mm" argument to alloc_pt

It's useful to know which mm is allocating a pagetable. Xen uses this
to determine whether the pagetable being added to is pinned or not.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>

use elfnote.h to generate vsyscall notes.

Use existing elfnote.h to generate vsyscall notes, rather than doing
it locally. Changes elfnote.h a bit to suit, since this is the first
asm user, and it wasn't quite right.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.com>

usermodehelper: Tidy up waiting

Rather than using a tri-state integer for the wait flag in
call_usermodehelper_exec, define a proper enum, and use that. I've
preserved the integer values so that any callers I've missed should
still work OK.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: David Howells <dhowells@redhat.com>

Add common orderly_poweroff()

Various pieces of code around the kernel want to be able to trigger an
orderly poweroff. This pulls them together into a single
implementation.

By default the poweroff command is /sbin/poweroff, but it can be set
via sysctl: kernel/poweroff_cmd. This is split at whitespace, so it
can include command-line arguments.

This patch replaces four other instances of invoking either "poweroff"
or "shutdown -h now": two sbus drivers, and acpi thermal
management.

sparc64 has its own "powerd"; still need to determine whether it should
be replaced by orderly_poweroff().

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Acked-by: Len Brown <lenb@kernel.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David S. Miller <davem@davemloft.net>

usermodehelper: split setup from execution

Rather than having hundreds of variations of call_usermodehelper for
various pieces of usermode state which could be set up, split the
info allocation and initialization from the actual process execution.

This means the general pattern becomes:
info = call_usermodehelper_setup(path, argv, envp); /* basic state */
call_usermodehelper_<SET EXTRA STATE>(info, stuff...); /* extra state */
call_usermodehelper_exec(info, wait); /* run process and free info */

This patch introduces wrappers for all the existing calling styles for
call_usermodehelper_*, but folds their implementations into one.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: David Howells <dhowells@redhat.com>
Cc: Bj?rn Steinbrink <B.Steinbrink@gmx.de>
Cc: Randy Dunlap <randy.dunlap@oracle.com>

add argv_split()

argv_split() is a helper function which takes a string, splits it at
whitespace, and returns a NULL-terminated argv vector.  This is
deliberately simple - it does no quote processing of any kind.

[ Seems to me that this is something which is already being done in
  the kernel, but I couldn't find any other implementations, either to
  steal or replace.  Keep an eye out. ]

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>

add kstrndup

Add a kstrndup function, modelled on strndup. Like strndup this
returns a string copied into its own allocated memory, but it copies
no more than the specified number of bytes from the source.

Remove private strndup() from irda code.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@mandriva.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Panagiotis Issaris <takis@issaris.org>
Cc: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>

zs: move to the serial subsystem

This is a reimplementation of the zs driver for the serial subsystem.  Any
resemblance to the old driver is purely coincidential.  ;-) I do hope I got
the handling of modem lines right -- better do not tackle me about the
issue unless you feel too good...

Any users of the old driver: please note the numbers of the serial lines
have now been swapped, i.e.  ttyS0 <-> ttyS1 and ttyS2 <-> ttyS3.  It has
to do with the modem lines mentioned above; basically the port A in a given
chip has to be initialised before the port B if you want to use the latter
as the serial console (which is usually the case), as operations on modem
lines of the serial line associated with the port B access both ports (see
the comment at the top of the driver for the details of wiring used).
Please update your scripts.

This is also the reason each SCC now requests an IRQ once only (as seen in
"/proc/interrupts") -- the handler takes care of both ports at once as the
line associated with the port B has to take status update interrupts from
both ports (and yet the line of the port A takes its own for itself too).
The old driver never got it right...

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

serial: add early_serial_setup() back to header file

early_serial_setup was removed from serial.h, but forgot to put in
serial_8250.h

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fbdev: make fb_append_extra_logo() depend on fb=y

We can't show the extra logo from boot code if FB is built as a module.
Make the FB_LOGO_EXTRA depend on FB=y.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dm: fix memory leak in dm_create_persistent() when starting metadata update thread fails

If, in dm_create_persistent(), the call to create_singlethread_workqueue()
fails then we'll return without freeing the memory allocated to 'ps', thus
leaking sizeof(struct pstore) bytes. This patch fixes the leak.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

UBI: fix compile warning

cdev.c whines in current git:

drivers/mtd/ubi/cdev.c: In function `major_to_device':
drivers/mtd/ubi/cdev.c:67: warning: control reaches end of non-void function

Shut it up.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBI: fix error handling in erase worker

Do not switch to read-only mode in case of -EINTR and some
other obvious cases. Switch to RO mode only when we do not
know what is the error.

Reported-by: Vinit Agnihotri <vinit.agnihotri@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBI: fix comments

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBI: remove unneeded error checks

Pointed to by viro.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBI: cleanup usage of try_module_get

The use of try_module_get(THIS_MODULE) in ubi_get_device_info does not
offer real protection against unexpected driver unloads, since we could
be preempted before try_modules_get gets executed. It is the caller who
should manipulate the refcounts. Besides, ubi_get_device_info is an
exported symbol which guarantees protection when accessed through
symbol_get.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBI: fix overflow bug

I was experiencing overflows in multiplications for
volume->used_bytes in vmt.c & vtbl.c, while creating & resizing large volumes.

vol->used_bytes is long long however its 2 operands vol->used_ebs &
vol->usable_leb_size
are int. So their multiplication for larger values causes integer overflows.
Typecasting them solves the problem.

My machine & flash details:

64Bit dual-core AMD opteron, 1 GB RAM, linux 2.6.18.3.
mtd size = 6GB, volume size= 5GB, peb_size = 4MB.

heres patch which does the fix.

Signed-off-by: Vinit Agnihotri <vinit.agnihotri@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBI: bugfix in max_sqnum calculation

Do not zero max_sqnum after a new volume has been found.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBI: bugfix in sqnum calculation

Hi,I came across problem of having two leb with same sequence no.This
happens when we continuously write one block again and again and reboot
machine before background thread erases those blocks.
The problem here was,when we find two blocks with same sequence no,we take
the higher one,but we were not updating max seq no,so next block may have
the same seqnum.
This patch solves this problem.

Signed-off-by: Brijesh Singh <brijesh.s.singh@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBI: fix signed-unsigned multiplication

There is signed multiplication assigned to unsigned ei.addr in io.c.
This causes wrong addresses for big multiplication.This patch solves the
problem.

Signed-off-by: Brijesh Singh <brijesh.s.singh@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBI: fix bug in atomic_leb_change()

atomic_leb_change() is only allowed for dynamic volumes, so set
the volume type correctly.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBI: fix message

Increase UBI devices couter after the message, not before.

Signed-off-by: Vinit Agnihotri <vinit.agnihotri@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBI: fix debugging stuff

Do not check volumes which are currently in use because thay may be
in inconsistent state.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBI: bugfix in error path

When volume creation fails, we have to set ubi->volumes[vol_id]
back to NULL.

This patch also tweaks some debugging stuff.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBI: use is_power_of_2()

Replacing (n & (n-1)) in the context of power of 2 checks
with is_power_of_2

Signed-off-by: Vignesh Babu <vignesh.babu@wipro.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBI: fix freeing ubi->vtbl while unloading

ubi->vtbl is allocated using vmalloc() in vtbl.c empty_create_lvol(),
but it is freed in build.c with kfree()

Signed-off-by: Vinit Agnihotri <vinit.agnihotri@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBI: fix MAINTAINERS

Fix UBI git tree URL.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBI: bugfix in ubi_leb_change()

Do not call 'ubi_wl_put_peb()' if the LEB was unmapped.

Reported-by: Gabor Loki <loki@inf.u-szeged.hu>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBI: kill homegrown endian macros

Kill UBI's homegrown endianess handling and replace it with
the standard kernel endianess handling.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBI: cleanup ioctl handling

- don't do access_ok + get/put user but use the proper macro
- remove useless checks

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBI: error path bugfix

No need to unlock the lock, this will be done at out_unlock.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBI: minor comma fix

Use coma at the the last elements of structure initializer.

Daniel Stone's explanation:

Because it turns:
-   .attr   = foo
+   .attr   = foo,
+   .bar    = baz

into:
+   .bar    = baz,

i.e., far less likely to screw up a merge.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBI: use vmalloc for large buffers

UBI allocates temporary buffers of PEB size, which may be 256KiB.
Use vmalloc instead of kmalloc for such big temporary buffers.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBI: add few more comments

Add few comments above ubi_scan_add_used() to explain why it is so
complex. Requested by Satyam Sharma <satyam.sharma@gmail.com>.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBI: set correct gluebi device size

In case of static volumes, make emulated MTD device size to
be equivalent to data size, rather then volume size.

Reported-by: John Smith <john@arrows.demon.co.uk>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBI: do not let to read too much

In case of static volumes it is prohibited to read more data
then available.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBI: fix error path in create_vtbl()

There were several bugs in volume table creation error path. Thanks to
Satyam Sharma <satyam.sharma@gmail.com> and Florin Malita <fmalita@gmail.com>
for finding and analysing them: http://lkml.org/lkml/2007/5/3/274

This patch makes ubi_scan_add_to_list() static and renames it to
add_to_list(), just because it is not needed outside scan.c anymore.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBI: fix dereference after kfree

Coverity (CID 1614) spotted new_seb being dereferenced after kfree() in
create_vtbl's write_error path.

Signed-off-by: Florin Malita <fmalita@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBI: fix memory leak in checking code

Reported-by: Eric Sesterhenn / Snakebyte <snakebyte@gmx.de>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

[NETFILTER]: xt_connlimit needs to depend on nf_conntrack

With NF_CONNTRACK=n, NETFILTER_XT_MATCH_CONNLIMIT=m I get the
following errors on current git:

  CC [M]  net/netfilter/xt_connlimit.o
  In file included from net/netfilter/xt_connlimit.c:27:
  include/net/netfilter/nf_conntrack.h:100: error: field 'ct_general' has incomplete type
  include/net/netfilter/nf_conntrack.h: In function 'nf_ct_get':
  include/net/netfilter/nf_conntrack.h:164: error: 'const struct sk_buff' has no member named 'nfct'
  include/net/netfilter/nf_conntrack.h: In function 'nf_ct_put':
  include/net/netfilter/nf_conntrack.h:171: warning: implicit declaration of function 'nf_conntrack_put'
  include/net/netfilter/nf_conntrack.h: In function 'nf_ct_is_untracked':
  include/net/netfilter/nf_conntrack.h:253: error: 'const struct sk_buff' has no member named 'nfct'
  In file included from net/netfilter/xt_connlimit.c:28:
  include/net/netfilter/nf_conntrack_core.h: In function 'nf_conntrack_confirm':
  include/net/netfilter/nf_conntrack_core.h:68: error: 'struct sk_buff' has no member named 'nfct'

Adding a dependency in Kconfig fixes this.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETFILTER]: ipt_iprange.h must #include <linux/types.h>

ipt_iprange.h must #include <linux/types.h> since it uses __be32.

This patch fixes kernel Bugzilla #7604.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IrDA]: Fix IrDA build failure

When having built-in IrDA, we hit the following error:

`irda_sysctl_unregister' referenced in section `.init.text' of
net/built-in.o: defined in discarded section `.exit.text' of
net/built-in.o
`irda_proc_unregister' referenced in section `.init.text' of
net/built-in.o: defined in discarded section `.exit.text' of
net/built-in.o
`irsock_cleanup' referenced in section `.init.text' of net/built-in.o:
defined in discarded section `.exit.text' of net/built-in.o
`irttp_cleanup' referenced in section `.init.text' of net/built-in.o:
defined in discarded section `.exit.text' of net/built-in.o
`iriap_cleanup' referenced in section `.init.text' of net/built-in.o:
defined in discarded section `.exit.text' of net/built-in.o
`irda_device_cleanup' referenced in section `.init.text' of
net/built-in.o: defined in discarded section `.exit.text' of
net/built-in.o
`irlap_cleanup' referenced in section `.init.text' of net/built-in.o:
defined in discarded section `.exit.text' of net/built-in.o
`irlmp_cleanup' referenced in section `.init.text' of net/built-in.o:
defined in discarded section `.exit.text' of net/built-in.o
make[1]: *** [.tmp_vmlinux1] Error 1
make: *** [_all] Error 2

This is due to the irda_init fix recently added, where we call __exit
routines from an __init one. It is a build failure that I didn't catch
because it doesn't show up when building IrDA as a module. My apologies
for that.
The following patch fixes that failure and is against your net-2.6
tree. I hope it can make it to the merge window, and stable@kernel.org
is CCed on this mail.

Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[ATM]: nicstar needs virt_to_bus

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: move __dev_addr_discard adjacent to dev_addr_discard for readability

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: merge dev_unicast_discard and dev_mc_discard into one

this two functions could share the dev->_xmit_lock acquired context.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: move dev_mc_discard from dev_mcast.c to dev.c

Because this function is only called by unregister_netdevice,
this moving could make this non-global function static,
and also remove its declaration in netdevice.h;

Any further, function __dev_addr_discard is also just called by
dev_mc_discard and dev_unicast_discard, keeping this two functions
both in one c file could make __dev_addr_discard also static
and remove its declaration in netdevice.h;

Futhermore, the sequential call to dev_unicast_discard and then
dev_mc_discard in unregister_netdevice have a similar mechanism that:
(netif_tx_lock_bh / __dev_addr_discard / netif_tx_unlock_bh),
they should merged into one to eliminate duplicates in acquiring and
releasing the dev->_xmit_lock, this would be done in my following patch.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETLINK]: negative groups in netlink_setsockopt

Reading netlink_setsockopt it's not immediately clear why there isn't a
bug when you pass in negative numbers, the reason being that the >=
comparison is really unsigned although 'val' is signed because
nlk->ngroups is unsigned. Make 'val' unsigned too.

[ Update the get_user() cast to match. --DaveM ]

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[PPPOL2TP]: Reset meta-data in xmit function

Reset netfilter data and IP CB, fix dst_entry leak.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[PPPOL2TP]: Fix use-after-free

Don't use skb->len after passing it to ip_queue_xmit.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[PKT_SCHED]: Some typo fixes in net/sched/Kconfig

Signed-off-by: Gabriel Craciunescu <nix.or.die@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[XFRM]: Fix crash introduced by struct dst_entry reordering

XFRM expects xfrm_dst->u.next to be same pointer as dst->next, which
was broken by the dst_entry reordering in commit 1e19e02c~, causing
an oops in xfrm_bundle_ok when walking the bundle upwards.

Kill xfrm_dst->u.next and change the only user to use dst->next instead.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: remove unused argument to cong_avoid op

None of the existing TCP congestion controls use the rtt value pased
in the ca_ops->cong_avoid interface. Which is lucky because seq_rtt
could have been -1 when handling a duplicate ack.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[ATM]: [idt77252] Rename CONFIG_ATM_IDT77252_SEND_IDLE to not resemble a Kconfig variable

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: chas williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>

[ATM]: [drivers] ioremap balanced with iounmap

Signed-off-by: Amol Lad <amol@verismonetworks.com>
Signed-off-by: chas williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>

[ATM]: [lanai] sram_test_word() must be __devinit

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: chas williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>

[ATM]: [nicstar] Replace C code with call to ARRAY_SIZE() macro.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: chas williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>

[ATM]: Eliminate dead config variable CONFIG_BR2684_FAST_TRANS.

Signed-off-by: chas williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>