err.no Git - linux-2.6/log

Martin J. Bligh [Thu, 23 Jun 2005 07:08:08 +0000 (00:08 -0700)]

[PATCH] add page_state info to show_mem

This helps a lot when debugging out of memory stuff - useful especially to
see if all the memory is sucked into slab, etc.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Matt Tolentino [Thu, 23 Jun 2005 07:08:07 +0000 (00:08 -0700)]

[PATCH] add x86-64 specific support for sparsemem

This patch adds in the necessary support for sparsemem such that x86-64
kernels may use sparsemem as an alternative to discontigmem for NUMA
kernels. Note that this does not preclude one from continuing to build NUMA
kernels using discontigmem, but merely allows the option to build NUMA
kernels with sparsemem.

Interestingly, the use of sparsemem in lieu of discontigmem in NUMA kernels
results in reduced text size for otherwise equivalent kernels as shown in
the example builds below:

   text    data     bss     dec     hex filename
2371036  765884 1237108 4374028  42be0c vmlinux.discontig
2366549  776484 1302772 4445805  43d66d vmlinux.sparse

Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Matt Tolentino [Thu, 23 Jun 2005 07:08:06 +0000 (00:08 -0700)]

[PATCH] reorganize x86-64 NUMA and DISCONTIGMEM config options

In order to use the alternative sparsemem implementation for NUMA kernels,
we need to reorganize the config options. This patch effectively abstracts
out the CONFIG_DISCONTIGMEM options to CONFIG_NUMA in most cases. Thus,
the discontigmem implementation may be employed as always, but the
sparsemem implementation may be used alternatively.

Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Matt Tolentino [Thu, 23 Jun 2005 07:08:05 +0000 (00:08 -0700)]

[PATCH] add x86-64 Kconfig options for sparsemem

Add the requisite arch specific Kconfig options to enable the use of the
sparsemem implementation for NUMA kernels on x86-64.

Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Matt Tolentino [Thu, 23 Jun 2005 07:08:03 +0000 (00:08 -0700)]

[PATCH] remove direct ref to contig_page_data for x86-64

This patch pulls out all remaining direct references to contig_page_data
from arch/x86-64, thus saving an ifdef in one case.

Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Andy Whitcroft [Thu, 23 Jun 2005 07:08:03 +0000 (00:08 -0700)]

[PATCH] ppc64: sparsemem memory model

Provide the architecture specific implementation for SPARSEMEM for PPC64
systems.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Mike Kravetz <kravetz@us.ibm.com> (in part)
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Andy Whitcroft [Thu, 23 Jun 2005 07:08:02 +0000 (00:08 -0700)]

[PATCH] ppc64: add memory present

Provide hooks for PPC64 to allow memory models to be informed of installed
memory areas. This allows SPARSEMEM to instantiate mem_map for the populated
areas.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Andy Whitcroft [Thu, 23 Jun 2005 07:08:01 +0000 (00:08 -0700)]

[PATCH] ppc64: add early_pfn_to_nid

Provide an implementation of early_pfn_to_nid for PPC64. This is used by
memory models to determine the node from which to take allocations before the
memory allocators are fully initialised.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Andy Whitcroft [Thu, 23 Jun 2005 07:08:00 +0000 (00:08 -0700)]

[PATCH] sparsemem hotplug base

Make sparse's initialization accessible at runtime.  This allows sparse
mappings to be created after boot in a hotplug situation.

This patch is separated from the previous one just to give an indication how
much of the sparse infrastructure is *just* for hotplug memory.

The section_mem_map doesn't really store a pointer.  It stores something that
is convenient to do some math against to get a pointer.  It isn't valid to
just do *section_mem_map, so I don't think it should be stored as a pointer.

There are a couple of things I'd like to store about a section.  First of all,
the fact that it is !NULL does not mean that it is present.  There could be
such a combination where section_mem_map *is* NULL, but the math gets you
properly to a real mem_map.  So, I don't think that check is safe.

Since we're storing 32-bit-aligned structures, we have a few bits in the
bottom of the pointer to play with.  Use one bit to encode whether there's
really a mem_map there, and the other one to tell whether there's a valid
section there.  We need to distinguish between the two because sometimes
there's a gap between when a section is discovered to be present and when we
can get the mem_map for it.
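
The two-bit encoding described above can be sketched in userspace C as
follows.  This is only an illustration under stated assumptions: the flag
names, the mask and the helper functions are chosen for this sketch, and
the stored value is the plain mem_map address rather than the biased value
the text describes.

```c
#include <assert.h>
#include <stdint.h>

/* Names assumed for this sketch: two low bits of the encoded word. */
#define SECTION_MARKED_PRESENT ((uintptr_t)1 << 0)
#define SECTION_HAS_MEM_MAP    ((uintptr_t)1 << 1)
#define SECTION_MAP_MASK       (~(uintptr_t)3)

struct page { unsigned long flags; };

struct mem_section {
    uintptr_t section_mem_map;   /* encoded word, never dereferenced directly */
};

/* Record that the section exists, before any mem_map is available. */
static void section_mark_present(struct mem_section *ms)
{
    ms->section_mem_map |= SECTION_MARKED_PRESENT;
}

/* Attach a mem_map while keeping whatever flag bits are already set. */
static void section_set_mem_map(struct mem_section *ms, struct page *map)
{
    uintptr_t flags = ms->section_mem_map & ~SECTION_MAP_MASK;

    /* The address must be aligned so that the two low bits are free. */
    assert(((uintptr_t)map & ~SECTION_MAP_MASK) == 0);
    ms->section_mem_map = (uintptr_t)map | flags | SECTION_HAS_MEM_MAP;
}

static int section_present(const struct mem_section *ms)
{
    return (ms->section_mem_map & SECTION_MARKED_PRESENT) != 0;
}

static int section_has_mem_map(const struct mem_section *ms)
{
    return (ms->section_mem_map & SECTION_HAS_MEM_MAP) != 0;
}

/* Strip the flag bits to recover the stored address. */
static struct page *section_mem_map_addr(const struct mem_section *ms)
{
    return (struct page *)(ms->section_mem_map & SECTION_MAP_MASK);
}
```

A section that has only been marked present reports section_present() as
true and section_has_mem_map() as false, which is the gap the message
refers to.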

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Andy Whitcroft [Thu, 23 Jun 2005 07:07:59 +0000 (00:07 -0700)]

[PATCH] sparsemem swiss cheese numa layouts

The part of the sparsemem patch which modifies memmap_init_zone() has recently
become a problem.  It changes behavior so that there is a call to
pfn_to_page() for each individual page inside of a node's range:
node_start_pfn through node_end_pfn.  It used to simply do this once, at the
beginning of the node, but having sparsemem's non-contiguous mem_map[]s inside
of a node made it necessary to change.

Mike Kravetz recently wrote a patch which made the NUMA code accept some new
kinds of layouts.  The system's memory was laid out like this, with node 0's
memory in two pieces: one before and one after node 1's memory:

Node 0: +++++     +++++
Node 1:      +++++

Previous behavior before Mike's patch was to assign nodes like this:

Node 0: 00000     XXXXX
Node 1:      11111

Where the 'X' areas were simply thrown away.  The new behavior was to make the
pg_data_t span node 0 across all of its areas, including areas that are really
node 1's:

Node 0: 000000000000000
Node 1:      11111

This wastes a little bit of mem_map space, but ends up being OK, and more
fully utilizes the system's memory.  memmap_init_zone() initializes all of the
"struct page"s for node 0, even for the "hole", but those never get used,
because there is no pfn_to_page() that resolves to those pages.  However,
because it calls pfn_to_page() only once, memmap_init_zone() always uses the
pages that were allocated for node0->node_mem_map:

	struct page *start = pfn_to_page(start_pfn);
	// effectively start = &node->node_mem_map[0]
	for (page = start; page < (start + size); page++) {
		init_page_here();...
	}

Slow, and wasteful, but generally harmless.

But, modify that to call pfn_to_page() for each loop iteration (like sparsemem
does):

	for (pfn = start_pfn; pfn < (start_pfn + size); pfn++) {
		page = pfn_to_page(pfn);
	}

And you end up trying to initialize node 1's pages too early, along with bogus
data from node 0.  This patch checks for those weird layouts and declines to
touch the pages, making the more frequent pfn_to_page() calls OK to do.
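
A toy userspace model of that check, using the layout from the diagrams
(node 0 owns pfns 0-4 and 10-14, node 1 owns pfns 5-9), might look like
this.  The function memmap_init_range() and the initialized_by field are
invented for the sketch, and early_pfn_to_nid() is hard-coded here; this is
not the code from the patch.

```c
#include <assert.h>

#define NR_PFNS 15

struct page { int initialized_by; };   /* -1 means "not touched yet" */

static struct page mem_map[NR_PFNS];

/* Hard-coded layout: pfns 5-9 belong to node 1, everything else to node 0. */
static int early_pfn_to_nid(unsigned long pfn)
{
    return (pfn >= 5 && pfn < 10) ? 1 : 0;
}

static struct page *pfn_to_page(unsigned long pfn)
{
    assert(pfn < NR_PFNS);
    return &mem_map[pfn];
}

/*
 * Initialise the pages in [start_pfn, start_pfn + size) that really belong
 * to nid, and return how many were touched.
 */
static unsigned long memmap_init_range(int nid, unsigned long start_pfn,
                                       unsigned long size)
{
    unsigned long pfn, touched = 0;

    for (pfn = start_pfn; pfn < start_pfn + size; pfn++) {
        if (early_pfn_to_nid(pfn) != nid)
            continue;           /* another node's page: decline to touch it */
        pfn_to_page(pfn)->initialized_by = nid;
        touched++;
    }
    return touched;
}
```

Initialising node 0 over its whole span of 15 pfns then touches only its
own 10 pages and leaves pfns 5-9 for node 1.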

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Andy Whitcroft [Thu, 23 Jun 2005 07:07:57 +0000 (00:07 -0700)]

[PATCH] sparsemem memory model for i386

Provide the architecture specific implementation for SPARSEMEM for i386 SMP
and NUMA systems.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Andy Whitcroft [Thu, 23 Jun 2005 07:07:54 +0000 (00:07 -0700)]

[PATCH] sparsemem memory model

Sparsemem abstracts the use of discontiguous mem_maps[].  This kind of
mem_map[] is needed by discontiguous memory machines (like in the old
CONFIG_DISCONTIGMEM case) as well as memory hotplug systems.  Sparsemem
replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually
become a complete replacement.

A significant advantage over DISCONTIGMEM is that it's completely separated
from CONFIG_NUMA.  When producing this patch, it became apparent that NUMA
and DISCONTIG are often confused.

Another advantage is that sparse doesn't require each NUMA node's ranges to be
contiguous.  It can handle overlapping ranges between nodes with no problems,
where DISCONTIGMEM currently throws away that memory.

Sparsemem uses an array to provide different pfn_to_page() translations for
each SECTION_SIZE area of physical memory.  This is what allows the mem_map[]
to be chopped up.
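
As a rough illustration of that per-section translation, the sketch below
keeps one mem_map pointer per section and indexes into it with the low bits
of the pfn.  The section size, the table size and the struct layout are
assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>

#define PFN_SECTION_SHIFT 4                          /* assumed: 16 pages per section */
#define PAGES_PER_SECTION (1UL << PFN_SECTION_SHIFT)
#define NR_MEM_SECTIONS   8                          /* assumed table size */

struct page { unsigned long flags; };

struct mem_section {
    struct page *mem_map;    /* NULL when the section holds no memory */
};

static struct mem_section mem_section[NR_MEM_SECTIONS];

static unsigned long pfn_to_section_nr(unsigned long pfn)
{
    return pfn >> PFN_SECTION_SHIFT;
}

/* Translate a pfn through its section; returns NULL for a hole. */
static struct page *pfn_to_page(unsigned long pfn)
{
    unsigned long nr = pfn_to_section_nr(pfn);

    assert(nr < NR_MEM_SECTIONS);
    if (mem_section[nr].mem_map == NULL)
        return NULL;
    return &mem_section[nr].mem_map[pfn & (PAGES_PER_SECTION - 1)];
}
```

Because each section carries its own pointer, the mem_map[] pieces do not
have to be adjacent in memory, and sections without memory cost only one
table entry.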

In order to do quick pfn_to_page() operations, the section number of the page
is encoded in page->flags.  Part of the sparsemem infrastructure enables
sharing of these bits more dynamically (at compile-time) between the
page_zone() and sparsemem operations.  However, on 32-bit architectures, the
number of bits is quite limited, and may require growing the size of the
page->flags type in certain conditions.  Several things might force this to
occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of
memory), an increase in the physical address space, or an increase in the
number of used page->flags.

One thing to note is that, once sparsemem is present, the NUMA node
information no longer needs to be stored in the page->flags.  It might provide
speed increases on certain platforms and will be stored there if there is
room.  But, if out of room, an alternate (theoretically slower) mechanism is
used.

This patch introduces CONFIG_FLATMEM.  It is used in almost all cases where
there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM
often have to compile out the same areas of code.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Andy Whitcroft [Thu, 23 Jun 2005 07:07:53 +0000 (00:07 -0700)]

[PATCH] generify memory present

Allow architectures to declare that they provide a hook, memory_present(),
to indicate installed memory areas. Provide prototypes for the i386
implementation.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Andy Whitcroft [Thu, 23 Jun 2005 07:07:52 +0000 (00:07 -0700)]

[PATCH] generify early_pfn_to_nid

Provide a default implementation for early_pfn_to_nid returning node 0. Allow
architectures to override this with their own implementation out of
asm/mmzone.h.
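
The default-with-override pattern can be sketched as below.  The macro form
and the boot_alloc_node() caller are assumptions for illustration; an
architecture would supply its own definition before the generic fallback is
reached.

```c
#include <assert.h>

#define MAX_NUMNODES 8   /* assumed limit for this sketch */

/*
 * An architecture header could define early_pfn_to_nid() before this
 * point.  If it does not, fall back to "everything is on node 0".
 */
#ifndef early_pfn_to_nid
#define early_pfn_to_nid(pfn) ((void)(pfn), 0UL)
#endif

/* Hypothetical caller: pick the node to allocate from for a given pfn. */
static unsigned long boot_alloc_node(unsigned long pfn)
{
    unsigned long nid = early_pfn_to_nid(pfn);

    assert(nid < MAX_NUMNODES);
    return nid;
}
```

With no override in place, every pfn maps to node 0.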

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Mike Kravetz [Thu, 23 Jun 2005 07:07:51 +0000 (00:07 -0700)]

[PATCH] ppc64: Kconfig memory models

This patch changes some of the default behavior in the ppc64 Kconfig file
that was recently changed/added to 2.6.12-rc2-mm1 by Dave Hansen in
preparation for SPARSEMEM. Patch allows the display of both FLAT and
DISCONTIG models on pseries. As before, default is DISCONTIG for SMP and
PSERIES and FLAT for others.

Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Dave Hansen [Thu, 23 Jun 2005 07:07:50 +0000 (00:07 -0700)]

[PATCH] mm/Kconfig: give DISCONTIG more help text

This gives DISCONTIGMEM a bit more help text to explain what it does, not just
when to choose it.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Dave Hansen [Thu, 23 Jun 2005 07:07:49 +0000 (00:07 -0700)]

[PATCH] mm/Kconfig: hide "Memory Model" selection menu

I got some feedback from users who think that the new "Memory Model" menu is a
little invasive. This patch will hide that menu, except when
CONFIG_EXPERIMENTAL is enabled *or* when an individual architecture wants it.

An individual arch may want to enable it because they've removed their
arch-specific DISCONTIG prompt in favor of the mm/Kconfig one.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Dave Hansen [Thu, 23 Jun 2005 07:07:48 +0000 (00:07 -0700)]

[PATCH] mm/Kconfig: kill unused ARCH_FLATMEM_DISABLE

This used to be used to disable FLATMEM selection, but I decided to change it
to be done generically when DISCONTIG is enabled. The option is unused, so
this kills it.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Dave Hansen [Thu, 23 Jun 2005 07:07:47 +0000 (00:07 -0700)]

[PATCH] sparsemem: fix minor "defaults" issue in mm/Kconfig

The following patch applies on top of 2.6.12-rc2-mm1. It fixes a minor
user interaction issue, and an early reference to SPARSEMEM.

This "choice" menu would always default to FLATMEM, as it was listed first.
Move it to the end so that the other defaults have a chance first.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Dave Hansen [Thu, 23 Jun 2005 07:07:47 +0000 (00:07 -0700)]

[PATCH] Introduce new Kconfig option for NUMA or DISCONTIG

Some confusion arose while working on the SPARSEMEM patch about what is
needed for DISCONTIG vs. NUMA.

Multiple pg_data_t's are needed for DISCONTIGMEM or NUMA, independently.
All of the current NUMA implementations require an implementation of
DISCONTIG.  Because of this, quite a lot of code which is really needed for
NUMA is actually under DISCONTIG #ifdefs.  For SPARSEMEM, we changed some
of these #ifdefs to CONFIG_NUMA, but that broke the DISCONTIG=y and NUMA=n
case.

Introducing this new NEED_MULTIPLE_NODES config option allows code that is
needed for both NUMA or DISCONTIG to be separated out from code that is
specific to DISCONTIG.

One great advantage of this approach is that it doesn't require every
architecture to be converted over.  All of the current implementations
should "just work", only the ones implementing SPARSEMEM will have to be
fixed up.

The change to free_area_init() makes it work inside, or out of the new
config option.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Dave Hansen [Thu, 23 Jun 2005 07:07:45 +0000 (00:07 -0700)]

[PATCH] update all defconfigs for ARCH_DISCONTIGMEM_ENABLE

This will at least suppress one prompt that users would have received the
first time they compile with the new DISCONTIG arch option. They'll still
get the "Memory Model" prompt, but 99% of them will have the default work
there.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Dave Hansen [Thu, 23 Jun 2005 07:07:43 +0000 (00:07 -0700)]

[PATCH] make each arch use mm/Kconfig

For all architectures, this just means that you'll see a "Memory Model"
choice in your architecture menu. For those that implement DISCONTIGMEM,
you may eventually want to make your ARCH_DISCONTIGMEM_ENABLE a "def_bool
y" and make your users select DISCONTIGMEM right out of the new choice
menu. The only disadvantage might be if you have some specific things that
you need in your help option to explain something about DISCONTIGMEM.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Dave Hansen [Thu, 23 Jun 2005 07:07:42 +0000 (00:07 -0700)]

[PATCH] create mm/Kconfig for arch-independent memory options

With sparsemem being introduced, we need a central place for new
memory-related .config options: mm/Kconfig.  This allows us to remove many
of the duplicated arch-specific options.

The new option, CONFIG_FLATMEM, is there to enable us to detangle NUMA and
DISCONTIGMEM.  This is a requirement for sparsemem because sparsemem uses
the NUMA code without the presence of DISCONTIGMEM.  The sparsemem patches
use CONFIG_FLATMEM in generic code, so this patch is a requirement before
applying them.

Almost all places that used to do '#ifndef CONFIG_DISCONTIGMEM' should use
'#ifdef CONFIG_FLATMEM' instead.
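
A small preprocessor sketch shows why the two guards differ once a third
memory model exists.  It assumes a build with only CONFIG_SPARSEMEM set;
the OLD_/NEW_ macro names are invented for the example.

```c
#include <assert.h>

/* Assumed configuration for this sketch: a SPARSEMEM build. */
#define CONFIG_SPARSEMEM 1

/* Old-style guard: anything that is not DISCONTIGMEM is treated as flat. */
#ifndef CONFIG_DISCONTIGMEM
#define OLD_GUARD_BUILDS_FLAT_CODE 1
#else
#define OLD_GUARD_BUILDS_FLAT_CODE 0
#endif

/* New-style guard: flat-only code is built only when FLATMEM is selected. */
#ifdef CONFIG_FLATMEM
#define NEW_GUARD_BUILDS_FLAT_CODE 1
#else
#define NEW_GUARD_BUILDS_FLAT_CODE 0
#endif

/* Under SPARSEMEM the old guard wrongly pulls in flat-memory code. */
static_assert(OLD_GUARD_BUILDS_FLAT_CODE == 1 && NEW_GUARD_BUILDS_FLAT_CODE == 0,
              "the two guards disagree on a SPARSEMEM build");
```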

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Dave Hansen [Thu, 23 Jun 2005 07:07:41 +0000 (00:07 -0700)]

[PATCH] sparsemem base: teach discontig about sparse ranges

discontig.c has some assumptions that mem_map[]s inside of a node are
contiguous. Teach it to make sure that each region that it's bringing online
is actually made up of valid ranges of ram.

Written-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Dave Hansen [Thu, 23 Jun 2005 07:07:40 +0000 (00:07 -0700)]

[PATCH] sparsemem base: reorganize page->flags bit operations

Generify the value fields in the page_flags.  The aim is to allow the location
and size of these fields to be varied.  Additionally we want to move away from
fixed allocations per field whilst still enforcing the overall bit utilisation
limits.  We rely on the compiler to spot and optimise the accessor functions.
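
One way to picture this is a set of fields whose shifts are derived from
their widths, with a compile-time check on the total.  The widths, macro
names and accessors below are assumptions for this sketch, not the values
used by the patch.

```c
#include <assert.h>
#include <limits.h>

/* Assumed widths; changing one moves every field below it automatically. */
#define FLAGS_RESERVED 20   /* low bits kept for ordinary page flags */
#define SECTIONS_WIDTH 6
#define NODES_WIDTH    3
#define ZONES_WIDTH    2

#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Fields are stacked downward from the top bit of the word. */
#define SECTIONS_PGSHIFT (BITS_PER_LONG - SECTIONS_WIDTH)
#define NODES_PGSHIFT    (SECTIONS_PGSHIFT - NODES_WIDTH)
#define ZONES_PGSHIFT    (NODES_PGSHIFT - ZONES_WIDTH)

#define SECTIONS_MASK ((1UL << SECTIONS_WIDTH) - 1)
#define NODES_MASK    ((1UL << NODES_WIDTH) - 1)

/* Enforce the overall bit budget at compile time. */
static_assert(FLAGS_RESERVED + ZONES_WIDTH + NODES_WIDTH + SECTIONS_WIDTH
              <= BITS_PER_LONG, "fields do not fit in page->flags");

struct page { unsigned long flags; };

static void set_page_section(struct page *page, unsigned long section)
{
    assert(section <= SECTIONS_MASK);
    page->flags &= ~(SECTIONS_MASK << SECTIONS_PGSHIFT);
    page->flags |= section << SECTIONS_PGSHIFT;
}

static void set_page_node(struct page *page, unsigned long nid)
{
    assert(nid <= NODES_MASK);
    page->flags &= ~(NODES_MASK << NODES_PGSHIFT);
    page->flags |= nid << NODES_PGSHIFT;
}

static unsigned long page_to_section(const struct page *page)
{
    return (page->flags >> SECTIONS_PGSHIFT) & SECTIONS_MASK;
}

static unsigned long page_to_nid(const struct page *page)
{
    return (page->flags >> NODES_PGSHIFT) & NODES_MASK;
}
```

Since every shift and mask is a compile-time constant, the accessors
reduce to a shift and a mask once inlined.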

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Dave Hansen [Thu, 23 Jun 2005 07:07:39 +0000 (00:07 -0700)]

[PATCH] sparsemem base: simple NUMA remap space allocator

Introduce a simple allocator for the NUMA remap space.  This space is very
scarce, used for structures which are best allocated node local.

This mechanism is also used on non-NUMA ia64 systems with a vmem_map to keep
the pgdat->node_mem_map initialized in a consistent place for all
architectures.

Issues:
o alloc_remap takes a node_id where we might expect a pgdat which was intended
  to allow us to allocate the pgdat's using this mechanism; which we do not yet
  do.  Could have alloc_remap_node() and alloc_remap_nid() for this purpose.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Dave Hansen [Thu, 23 Jun 2005 07:07:38 +0000 (00:07 -0700)]

[PATCH] sparsemem base: early_pfn_to_nid() (works before sparse is initialized)

The following four patches provide the last needed changes before the
introduction of sparsemem.  For a more complete description of what this
will do, please see this patch:

http://www.sr71.net/patches/2.6.11/2.6.11-bk7-mhp1/broken-out/B-sparse-150-sparsemem.patch

or previous posts on the subject:
http://marc.theaimsgroup.com/?t=110868540700001&r=1&w=2
http://marc.theaimsgroup.com/?l=linux-mm&m=109897373315016&w=2

Three of these are i386-only, but one of them reorganizes the macros
used to manage the space in page->flags, and will affect all platforms.
There are analogous patches to the i386 ones for ppc64, ia64, and
x86_64, but those will be submitted by the normal arch maintainers.

The combination of the four patches has been test-booted on a variety of
i386 hardware, and compiled for ppc64, i386, and x86-64 with about 17
different .configs.  It's also been runtime-tested on ia64 configs (with
more patches on top).

This patch:

We _know_ which node pages in general belong to, at least at a very gross
level in node_{start,end}_pfn[].  Use those to target the allocations of
pages.
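
A minimal sketch of such a lookup, assuming per-node arrays of start and
end pfns with made-up values, could be:

```c
#include <assert.h>

#define MAX_NUMNODES 4   /* assumed for this sketch */

/* Per-node pfn ranges as early boot code might record them (made-up values). */
static unsigned long node_start_pfn[MAX_NUMNODES] = { 0, 4096, 8192, 0 };
static unsigned long node_end_pfn[MAX_NUMNODES]   = { 4096, 8192, 12288, 0 };

/* Return the node whose [start, end) range holds pfn, or -1 if none does. */
static int early_pfn_to_nid(unsigned long pfn)
{
    int nid;

    for (nid = 0; nid < MAX_NUMNODES; nid++) {
        assert(node_start_pfn[nid] <= node_end_pfn[nid]);
        if (pfn >= node_start_pfn[nid] && pfn < node_end_pfn[nid])
            return nid;
    }
    return -1;
}
```

The -1 return for a pfn outside every range is a choice made for this
example; it only needs the two arrays, so it can run before sparse or the
page allocator are set up.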

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Dave Hansen [Thu, 23 Jun 2005 07:07:37 +0000 (00:07 -0700)]

[PATCH] remove non-DISCONTIG use of pgdat->node_mem_map

This patch effectively eliminates direct use of pgdat->node_mem_map outside
of the DISCONTIG code.  On a flat memory system, these fields aren't
currently used, neither are they on a sparsemem system.

There was also a node_mem_map(nid) macro on many architectures.  Its use
along with the use of ->node_mem_map itself was not consistent.  It has
been removed in favor of two new, more explicit, arch-independent macros:

pgdat_page_nr(pgdat, pagenr)
nid_page_nr(nid, pagenr)

I called them "pgdat" and "nid" because we overload the term "node" to mean
"NUMA node", "DISCONTIG node" or "pg_data_t" in very confusing ways.  I
believe the newer names are much clearer.

These macros can be overridden in the sparsemem case with a theoretically
slower operation using node_start_pfn and pfn_to_page(), instead.  We could
make this the only behavior if people want, but I don't want to change too
much at once.  One thing at a time.
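
For the DISCONTIG-style case, the two macros could plausibly be defined
along these lines.  The pg_data_t layout, the node_data[] array and the
node_page() wrapper are simplified stand-ins written for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NUMNODES 2   /* assumed for this sketch */

struct page { unsigned long flags; };

typedef struct pglist_data {
    struct page *node_mem_map;          /* first struct page of this node */
    unsigned long node_spanned_pages;   /* number of pages the node spans */
} pg_data_t;

static pg_data_t node_data[MAX_NUMNODES];
#define NODE_DATA(nid) (&node_data[(nid)])

/* Page number pagenr within a node, given its pg_data_t or its node id. */
#define pgdat_page_nr(pgdat, pagenr) ((pgdat)->node_mem_map + (pagenr))
#define nid_page_nr(nid, pagenr)     pgdat_page_nr(NODE_DATA(nid), (pagenr))

/* Hypothetical bounds-checked wrapper, not part of the described patch. */
static struct page *node_page(int nid, unsigned long pagenr)
{
    assert(nid >= 0 && nid < MAX_NUMNODES);
    assert(pagenr < NODE_DATA(nid)->node_spanned_pages);
    return nid_page_nr(nid, pagenr);
}
```

Callers written against the macros do not need to know whether the lookup
goes through node_mem_map or some other mechanism.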

This patch removes more code than it adds.

Compile tested on alpha, alpha discontig, arm, arm-discontig, i386, i386
generic, NUMAQ, Summit, ppc64, ppc64 discontig, and x86_64.  Full list
here: http://sr71.net/patches/2.6.12/2.6.12-rc1-mhp2/configs/

Boot tested on NUMAQ, x86 SMP and ppc64 power4/5 LPARs.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin J. Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Linus Torvalds [Thu, 23 Jun 2005 16:25:04 +0000 (09:25 -0700)]

Merge 'misc-fixes' branch of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6

commit | commitdiff | tree

Mitch Williams [Thu, 23 Jun 2005 07:41:00 +0000 (03:41 -0400)]

e1000: fix spinlock bug

This patch fixes an obvious and nasty bug where we could exit the transmit
routine while holding tx_lock.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>

commit | commitdiff | tree

Linus Torvalds [Thu, 23 Jun 2005 06:18:10 +0000 (23:18 -0700)]

Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6

commit | commitdiff | tree

Linus Torvalds [Thu, 23 Jun 2005 06:11:50 +0000 (23:11 -0700)]

Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

commit | commitdiff | tree

Greg Kroah-Hartman [Wed, 22 Jun 2005 23:09:05 +0000 (16:09 -0700)]

[PATCH] driver core: Fix up the device_attach() error handling in bus_add_device()

Don't error out if something "bad" happens when trying to bind a driver to a
device. We want the sysfs attributes to be present for later when we try to
tear down the device.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

commit | commitdiff | tree

Stelian Pop [Wed, 22 Jun 2005 15:53:28 +0000 (17:53 +0200)]

[PATCH] USB: fix hid core to return proper error code from probe

Drivers need to return -ENODEV when they can't bind to a device.
Anything else stops the "bind a device to a driver" search.
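
A toy model of the search described above, with invented types and function
names, shows why the exact error code matters.  It is only a sketch of the
behaviour this message describes, not the driver core itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct device { int id; };

struct driver {
    const char *name;
    int (*probe)(struct device *dev);
};

/*
 * Try each driver in turn.  -ENODEV means "not my device, keep looking";
 * any other error ends the search.  Returns the bound driver or NULL.
 */
static const struct driver *bind_device(struct device *dev,
                                        const struct driver *drivers,
                                        size_t n, int *err)
{
    size_t i;

    assert(dev != NULL && err != NULL);
    for (i = 0; i < n; i++) {
        int ret = drivers[i].probe(dev);

        if (ret == 0) {
            *err = 0;
            return &drivers[i];
        }
        if (ret != -ENODEV) {
            *err = ret;          /* search stops here */
            return NULL;
        }
    }
    *err = -ENODEV;
    return NULL;
}

static int probe_decline(struct device *dev)    { (void)dev; return -ENODEV; }
static int probe_wrong_code(struct device *dev) { (void)dev; return -EIO; }
static int probe_accept(struct device *dev)     { (void)dev; return 0; }
```

A driver that declines with -EIO instead of -ENODEV prevents a later,
suitable driver from ever being tried.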

From: Stelian Pop <stelian@popies.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

commit | commitdiff | tree

Nishanth Aravamudan [Thu, 23 Jun 2005 05:19:52 +0000 (22:19 -0700)]

[LTPC]: Replace schedule_timeout() with ssleep()/msleep()

Use ssleep() / msleep() [as appropriate]
instead of schedule_timeout() to guarantee the task delays as expected.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Acked-by: Arnaldo Carvalho de Melo <acme@conectiva.com.br>
Signed-off-by: Maximilian Attems <janitor@sternwelten.at>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Shaun Pereira [Thu, 23 Jun 2005 05:16:17 +0000 (22:16 -0700)]

[X25]: Fast select with no restriction on response

This patch is a follow up to patch 1 regarding "Selective Sub Address
matching with call user data".  It allows use of the Fast-Select-Acceptance
optional user facility for X.25.

This patch just implements fast select with no restriction on response
(NRR).  What this means (according to ITU-T Recommendation 10/96 section
6.16) is that if in an incoming call packet, the relevant facility bits are
set for fast-select-NRR, then the called DTE can issue a direct response to
the incoming packet using a call-accepted packet that contains
call-user-data.  This patch allows such a response.

The called DTE can also respond with a clear-request packet that contains
call-user-data.  However, this feature is currently not implemented by the
patch.

How is Fast Select Acceptance used?

By default, the system does not allow fast select acceptance (as before).
To enable a response to fast select acceptance, create and bind a listen
socket as follows:

	call_soc = socket(AF_X25, SOCK_SEQPACKET, 0);
	bind(call_soc, (struct sockaddr *)&locl_addr, sizeof(locl_addr));

Before the listen system call is made, issue the following ioctl:

	ioctl(call_soc, SIOCX25CALLACCPTAPPRV);

Now the listen system call can be made:

	listen(call_soc, 4);

After this, an incoming-call packet will be accepted, but no call-accepted
packet will be sent back until the following system call is made on the socket
that accepts the call:

	ioctl(vc_soc, SIOCX25SENDCALLACCPT);

The network (or cisco xot router used for testing here) will allow the
application server's call-user-data in the call-accepted packet,
provided the call-request was made with Fast-select NRR.

Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Shaun Pereira [Thu, 23 Jun 2005 05:15:01 +0000 (22:15 -0700)]

[X25]: Selective sub-address matching with call user data.

From: Shaun Pereira <spereira@tusc.com.au>

This is the first (independent of the second) patch of two that I am
working on with x25 on linux (tested with xot on a cisco router).  Details
are as follows.

Current state of module:

A server using the current implementation (2.6.11.7) of the x25 module will
accept a call request/ incoming call packet at the listening x.25 address,
from all callers to that address, as long as NO call user data is present
in the packet header.

If the server needs to choose to accept a particular call request/ incoming
call packet arriving at its listening x25 address, then the kernel has to
allow a match of call user data present in the call request packet with its
own.  This is required when multiple servers listen at the same x25 address
and device interface.  The kernel currently matches ALL call user data, if
present.

Current Changes:

This patch is a follow up to the patch submitted previously by Andrew
Hendry, and allows the user to selectively control the number of octets of
call user data in the call request packet, that the kernel will match.  By
default no call user data is matched, even if call user data is present.
To allow call user data matching, a cudmatchlength > 0 has to be passed
into the kernel after which the passed number of octets will be matched.
Otherwise the kernel behavior is exactly as the original implementation.
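
The matching rule can be sketched as a prefix comparison over the first
cudmatchlength octets.  The struct layout, the cud_matches() helper and the
handling of a call that carries fewer octets than requested are assumptions
of this example, not taken from the patch.

```c
#include <assert.h>
#include <string.h>

#define X25_MAX_CUD_LEN 128

/* Simplified call user data container, assumed for this sketch. */
struct x25_calluserdata {
    unsigned int  cudlength;
    unsigned char cuddata[X25_MAX_CUD_LEN];
};

/*
 * Return 1 if the listener should take the call.  A cudmatchlength of 0
 * keeps the original behaviour: no call user data is compared.
 */
static int cud_matches(const struct x25_calluserdata *listener,
                       unsigned int cudmatchlength,
                       const struct x25_calluserdata *incoming)
{
    assert(cudmatchlength <= X25_MAX_CUD_LEN);
    if (cudmatchlength == 0)
        return 1;
    if (incoming->cudlength < cudmatchlength)
        return 0;    /* assumption: too little data cannot match */
    return memcmp(listener->cuddata, incoming->cuddata, cudmatchlength) == 0;
}
```

Two servers listening on the same address can then be told apart by
giving each a different leading octet sequence and a cudmatchlength that
covers it.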

This patch also ensures that as is normally the case, no call user data
will be present in the Call accepted / call connected packet sent back to
the caller

Future Changes on next patch:

There are cases however when call user data may be present in the call
accepted packet.  According to the X.25 recommendation (ITU-T 10/96)
section 5.2.3.2 call user data may be present in the call accepted packet
provided the fast select facility is used.  My next patch will include this
fast select utility and the ability to send up to 128 octets call user data
in the call accepted packet provided the fast select facility is used.  I
am currently testing this, again with xot on linux and cisco.

Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
(With a fix from Alexey Dobriyan <adobriyan@gmail.com>)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

James Lamanna [Thu, 23 Jun 2005 05:12:57 +0000 (22:12 -0700)]

[EBTABLES]: vfree() checking cleanups

From: jlamanna@gmail.com

ebtables.c vfree() checking cleanups.

Signed-off-by: James Lamanna <jlamanna@gmail.com>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Nishanth Aravamudan [Thu, 23 Jun 2005 05:11:44 +0000 (22:11 -0700)]

[ATALK] aarp: replace schedule_timeout() with msleep()

From: Nishanth Aravamudan <nacc@us.ibm.com>

Use msleep() instead of schedule_timeout() to guarantee the task
delays as expected. The current code is not wrong, but it does not account for
early return due to signals, so I think msleep() should be appropriate.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Chuck Short [Thu, 23 Jun 2005 05:10:23 +0000 (22:10 -0700)]

[IPV4]: Fix route.c gcc4 warnings

Signed-off-by: Chuck Short <zulcss@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jeff Moyer [Thu, 23 Jun 2005 05:05:59 +0000 (22:05 -0700)]

[NETPOLL]: allow multiple netpoll_clients to register against one interface

This patch provides support for registering multiple netpoll clients to the
same network device.  Only one of these clients may register an rx_hook,
however.  In practice, this restriction has not been problematic.  It is
worth mentioning, though, that the current design can be easily extended to
allow for the registration of multiple rx_hooks.

The basic idea of the patch is that the rx_np pointer in the netpoll_info
structure points to the struct netpoll that has rx_hook filled in.  Aside
from this one case, there is no need for a pointer from the struct
net_device to an individual struct netpoll.

A lock is introduced to protect the setting and clearing of the rx_np
pointer.  The pointer will only be cleared upon netpoll client module
removal, and the lock should be uncontended.
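
The arrangement described above can be modelled in userspace roughly as follows. This is a simplified sketch, not the kernel code: the struct layouts, the np_attach()/np_detach()/np_deliver() helpers, the count_rx() hook, and the policy of refusing a second rx_hook owner are all assumptions made for illustration, and a C11 atomic_flag stands in for the kernel spinlock.

```c
#include <stdatomic.h>
#include <stddef.h>

struct netpoll;
typedef void (*rx_hook_fn)(struct netpoll *np, const char *data, int len);

/* One client attached to a device (hypothetical, simplified layout). */
struct netpoll {
    const char *name;
    rx_hook_fn rx_hook;     /* NULL for transmit-only clients */
    int rx_bytes;           /* bookkeeping for the example hook */
};

/* Per-device state shared by every client on that device. */
struct netpoll_info {
    atomic_flag rx_lock;    /* guards rx_np */
    struct netpoll *rx_np;  /* the one client with an rx_hook, or NULL */
};

void npinfo_init(struct netpoll_info *npinfo)
{
    atomic_flag_clear(&npinfo->rx_lock);
    npinfo->rx_np = NULL;
}

static void npinfo_lock(struct netpoll_info *npinfo)
{
    while (atomic_flag_test_and_set_explicit(&npinfo->rx_lock,
                                             memory_order_acquire))
        ;   /* spin until the flag is released */
}

static void npinfo_unlock(struct netpoll_info *npinfo)
{
    atomic_flag_clear_explicit(&npinfo->rx_lock, memory_order_release);
}

/* Attach a client. Returns -1 if another client already owns the rx hook
 * (this refusal policy is an assumption of the sketch). */
int np_attach(struct netpoll_info *npinfo, struct netpoll *np)
{
    int ret = 0;

    if (!np->rx_hook)
        return 0;           /* transmit-only: no back pointer needed */
    npinfo_lock(npinfo);
    if (npinfo->rx_np && npinfo->rx_np != np)
        ret = -1;
    else
        npinfo->rx_np = np;
    npinfo_unlock(npinfo);
    return ret;
}

/* Detach a client, clearing rx_np only if this client owned it. */
void np_detach(struct netpoll_info *npinfo, struct netpoll *np)
{
    npinfo_lock(npinfo);
    if (npinfo->rx_np == np)
        npinfo->rx_np = NULL;
    npinfo_unlock(npinfo);
}

/* Hand a received packet to the rx client; returns 1 if delivered. */
int np_deliver(struct netpoll_info *npinfo, const char *data, int len)
{
    int delivered = 0;

    npinfo_lock(npinfo);
    if (npinfo->rx_np) {
        npinfo->rx_np->rx_hook(npinfo->rx_np, data, len);
        delivered = 1;
    }
    npinfo_unlock(npinfo);
    return delivered;
}

/* Example hook: count the bytes received. */
void count_rx(struct netpoll *np, const char *data, int len)
{
    (void)data;
    np->rx_bytes += len;
}
```

Any number of transmit-only clients can attach in this model, since only a client with an rx_hook needs the device to point back at it.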

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Jeff Moyer [Thu, 23 Jun 2005 05:05:31 +0000 (22:05 -0700)]

[NETPOLL]: Introduce a netpoll_info struct

This patch introduces a netpoll_info structure, which the struct net_device
will now point to instead of pointing to a struct netpoll.  The reason for
this is two-fold: 1) fields such as the rx_flags, poll_owner, and poll_lock
should be maintained per net_device, not per netpoll;  and 2) this is a first
step in providing support for multiple netpoll clients to register against the
same net_device.

The struct netpoll is now pointed to by the netpoll_info structure.  As
such, the previous behaviour of the code is preserved.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Jeff Moyer [Thu, 23 Jun 2005 05:04:55 +0000 (22:04 -0700)]

[NETPOLL]: Set poll_owner to -1 before unlocking in netpoll_poll_unlock()

This trivial patch moves the assignment of poll_owner to -1 inside of
the lock. This fixes a potential SMP race in the code.
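
One plausible reading of the race: if the owner field is reset after the lock is released, another CPU can take the lock and record itself as owner in between, and the late reset then overwrites that value. The userspace sketch below shows the corrected ordering. It uses a C11 atomic_flag as a stand-in spinlock, and the struct and function names are hypothetical, not the kernel's.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the per-device poll state. */
struct poll_state {
    atomic_flag lock;   /* simple spinlock */
    atomic_int owner;   /* id of the CPU holding the lock, or -1 */
};

void poll_state_init(struct poll_state *ps)
{
    atomic_flag_clear(&ps->lock);
    atomic_init(&ps->owner, -1);
}

void poll_lock(struct poll_state *ps, int cpu)
{
    while (atomic_flag_test_and_set_explicit(&ps->lock,
                                             memory_order_acquire))
        ;   /* spin until the current holder releases the lock */
    atomic_store(&ps->owner, cpu);
}

void poll_unlock(struct poll_state *ps)
{
    /* Reset ownership while the lock is still held ... */
    atomic_store(&ps->owner, -1);
    /* ... so this store cannot land after the next holder's store. */
    atomic_flag_clear_explicit(&ps->lock, memory_order_release);
}
```

Swapping the two statements in poll_unlock() would reproduce the ordering the patch removes.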

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Christoph Lameter [Thu, 23 Jun 2005 03:26:07 +0000 (20:26 -0700)]

[PATCH] boot_pageset must not be freed.

The boot_pageset needs to be preserved for hotplugging and for offline
processors and nodes. Otherwise pointers will point into memory that now
has a different use. /proc/zoneinfo currently shows strange results
if processors or nodes are not present.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

Linus Torvalds [Wed, 22 Jun 2005 21:51:06 +0000 (14:51 -0700)]

Merge master.kernel.org:/home/rmk/linux-2.6-arm

Eric Dumazet [Wed, 22 Jun 2005 21:32:51 +0000 (14:32 -0700)]

[NET]: don't use strlen() but the result from a prior sprintf()

Small patch to save an unnecessary call to strlen(): sprintf() gave us
the length, just trust it.
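
The idea can be illustrated with a small userspace sketch. The function name format_stat() and the output format are hypothetical, and snprintf() is used here instead of sprintf() so that the sketch is bounds-checked; both return the number of characters written, excluding the terminating NUL, when the output fits.

```c
#include <stdio.h>

/*
 * Format "name: value\n" into buf. Returns the number of characters
 * written (excluding the terminating NUL), or -1 if it did not fit.
 */
int format_stat(char *buf, size_t size, const char *name,
                unsigned long value)
{
    int len = snprintf(buf, size, "%s: %lu\n", name, value);

    if (len < 0 || (size_t)len >= size)
        return -1;      /* encoding error or truncated output */
    return len;         /* already the string length: no strlen() pass */
}
```

The caller can use the return value directly wherever it would otherwise have called strlen(buf).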

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Linus Torvalds [Wed, 22 Jun 2005 21:32:15 +0000 (14:32 -0700)]

Merge rsync://client.linux-nfs.org/pub/linux/nfs-2.6

Russell King [Wed, 22 Jun 2005 20:47:25 +0000 (21:47 +0100)]

[PATCH] ARM: Remove explicit page-alignments in memory init

Since meminfo.bank[] array contains page-aligned start/size, we
no longer need to explicitly round up/down the addresses when
converting to PFNs.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

Russell King [Wed, 22 Jun 2005 20:43:10 +0000 (21:43 +0100)]

[PATCH] ARM: Ensure memory information is page aligned

Ensure that meminfo.bank[] array contains page-aligned start/size
information.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

Herbert Xu [Wed, 22 Jun 2005 20:29:03 +0000 (13:29 -0700)]

[CRYPTO]: Use CPU cycle counters in tcrypt

After using this facility for a while to test my changes to the
cipher crypt() layer, I realised that I should've listened to Dave
and made this thing use CPU cycle counters :) As it is, it's too
jittery for me to feel safe about relying on the results.

So here is a patch to make it use CPU cycles by default but fall
back to jiffies if the user specifies a non-zero sec value.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

Herbert Xu [Wed, 22 Jun 2005 20:27:51 +0000 (13:27 -0700)]

[CRYPTO]: Use template keys for speed tests if possible

The existing keys used in the speed tests do not pass the 3DES quality check.
This patch makes it use the template keys instead.

Other algorithms can supply template keys through the same interface if needed.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

Harald Welte [Wed, 22 Jun 2005 20:27:23 +0000 (13:27 -0700)]

[CRYPTO]: Add cipher speed tests

From: Reyk Floeter <reyk@vantronix.net>

I recently had the requirement to do some benchmarking on cryptoapi, and
I found reyk's very useful performance test patch [1].

However, I could not find any discussion on why that extension (or
something providing a similar feature but different implementation) was
not merged into mainline. If there was such a discussion, can someone
please point me to the archive[s]?

I've now merged the old patch into 2.6.12-rc1; the result can be found
attached to this email.

[1] http://lists.logix.cz/pipermail/padlock/2004/000010.html

Signed-off-by: Harald Welte <laforge@gnumonks.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

Herbert Xu [Wed, 22 Jun 2005 20:26:36 +0000 (13:26 -0700)]

[CRYPTO]: Kill unnecessary strncpy from tcrypt

It seems that bad code tends to get copied (see test_cipher_speed). So let's
kill this idiom before it spreads any further.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

Herbert Xu [Wed, 22 Jun 2005 20:26:03 +0000 (13:26 -0700)]

[CRYPTO]: White space and coding style clean up in tcrypt

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

Russell King [Wed, 22 Jun 2005 20:25:58 +0000 (21:25 +0100)]

[PATCH] ARM: Use list_for_each_entry() for dmabounce

Convert dmabounce.c to use list_for_each_entry() instead of
list_for_each() + list_entry().
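
For readers unfamiliar with the two idioms, the sketch below contrasts them in userspace. The macros are simplified re-creations of the kernel's list helpers (relying on the GCC/Clang __typeof__ extension), and struct buf with total_old()/total_new() is a hypothetical example, not code from dmabounce.c.

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Simplified userspace versions of the kernel list helpers. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)

#define list_for_each(pos, head) \
    for (pos = (head)->next; pos != (head); pos = pos->next)

#define list_for_each_entry(pos, head, member)                        \
    for (pos = list_entry((head)->next, __typeof__(*pos), member);    \
         &pos->member != (head);                                      \
         pos = list_entry(pos->member.next, __typeof__(*pos), member))

void list_init(struct list_head *h)
{
    h->next = h->prev = h;
}

void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Hypothetical element type with an embedded list node. */
struct buf {
    int size;
    struct list_head node;
};

/* Old idiom: iterate over nodes, then convert each node to its entry. */
int total_old(struct list_head *head)
{
    struct list_head *p;
    int sum = 0;

    list_for_each(p, head) {
        struct buf *b = list_entry(p, struct buf, node);
        sum += b->size;
    }
    return sum;
}

/* New idiom: iterate over the entries directly. */
int total_new(struct list_head *head)
{
    struct buf *b;
    int sum = 0;

    list_for_each_entry(b, head, node)
        sum += b->size;
    return sum;
}
```

Both functions visit the same elements; the second simply drops the temporary list_head cursor and the explicit list_entry() call.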

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

Kumar Gala [Wed, 22 Jun 2005 20:10:02 +0000 (15:10 -0500)]

[PATCH] ppc32: Fix building MPC8555 CDS

Adding support for MPC8548 without PCI support broke the MPC8555 CDS
build by removing a loop variable that is still used when PCI is enabled.

Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

Trond Myklebust [Wed, 22 Jun 2005 17:16:39 +0000 (17:16 +0000)]

[PATCH] NFS: Add debugging code to NFSv4 readdir

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

Manoj Naik [Wed, 22 Jun 2005 17:16:39 +0000 (17:16 +0000)]

[PATCH] NFSv4: Map a couple of NFSv4 errors to EINVAL.

This shows up on running tar over NFSv4.

Signed-off-by: Manoj Naik <manoj@almaden.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

Manoj Naik [Wed, 22 Jun 2005 17:16:39 +0000 (17:16 +0000)]

[PATCH] NFSv4: add support for rdattr_error in NFSv4 readdir requests.

Request RDATTR_ERROR as an attribute in readdir to distinguish between a
directory being within an absent filesystem or one (or more) of its entries.

Signed-off-by: Manoj Naik <manoj@almaden.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

Trond Myklebust [Wed, 22 Jun 2005 17:16:32 +0000 (17:16 +0000)]

[PATCH] NFSv4: Clean up nfs4 lock state accounting

Ensure that lock owner structures are not released prematurely.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

Trond Myklebust [Wed, 22 Jun 2005 17:16:31 +0000 (17:16 +0000)]

[PATCH] NLM: fix a client-side race on blocking locks.

If the lock blocks, the server may send us a GRANTED message that
races with the reply to our LOCK request. Make sure that we catch
the GRANTED by queueing up our request on the nlm_blocked list
before we send off the first LOCK rpc call.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

Trond Myklebust [Wed, 22 Jun 2005 17:16:31 +0000 (17:16 +0000)]

[PATCH] NLM: cleanup for blocked locks.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

Trond Myklebust [Wed, 22 Jun 2005 17:16:31 +0000 (17:16 +0000)]

[PATCH] VFS: Ensure that all the on-stack struct file_lock call fl_release_private

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

Trond Myklebust [Wed, 22 Jun 2005 17:16:31 +0000 (17:16 +0000)]

[PATCH] NFS: Replace nfs_page insertion sort with a radix sort

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
