]> err.no Git - linux-2.6/log
linux-2.6
17 years agox86_64: ia32 ptrace THREAD_AREA fix
Roland McGrath [Tue, 6 Nov 2007 23:30:38 +0000 (15:30 -0800)]
x86_64: ia32 ptrace THREAD_AREA fix

The addr argument to PTRACE_GET_THREAD_AREA and PTRACE_SET_THREAD_AREA is
not a magic constant.  It's derived from the segment register values being
used, which are computed originally from the index used with set_thread_area.
The value does not need to match what a native i386 kernel would accept.
It needs to match the segment selectors that can actually be in use in this
32-bit process.  The 64-bit ptrace support for PTRACE_GET_THREAD_AREA
(normally used only on 32-bit processes) is correct, but the 32-bit emulation
of ptrace is broken.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
17 years agovoyager: use struct instead of PARAM
Randy Dunlap [Sat, 10 Nov 2007 03:30:36 +0000 (04:30 +0100)]
voyager: use struct instead of PARAM

Use struct boot_params instead of PARAM + 0xoffsets.
Fixes one of many Voyager build problems.

arch/x86/kernel/setup_32.c:543: error: 'PARAM' undeclared (first use in this function)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
17 years ago[FUTEX] Fix address computation in compat code.
David Miller [Wed, 7 Nov 2007 05:13:56 +0000 (21:13 -0800)]
[FUTEX] Fix address computation in compat code.

compat_exit_robust_list() computes a pointer to the
futex entry in userspace as follows:

(void __user *)entry + futex_offset

'entry' is a 'struct robust_list __user *', and
'futex_offset' is a 'compat_long_t' (typically a 's32').

Things explode if the 32-bit sign bit is set in futex_offset.

Type promotion sign extends futex_offset to a 64-bit value before
adding it to 'entry'.

This triggered a problem on sparc64 running 32-bit applications which
would lock up a cpu looping forever in the fault handling for the
userspace load in handle_futex_death().

Compat userspace runs with address masking (wherein the cpu zeros out
the top 32-bits of every effective address given to a memory operation
instruction) so the sparc64 fault handler accounts for this by
zero'ing out the top 32-bits of the fault address too.

Since the kernel properly uses the compat_uptr interfaces, kernel side
accesses to compat userspace work too since they will only use
addresses with the top 32-bit clear.

Because of this compat futex layer bug we get into the following loop
when executing the get_user() load near the top of handle_futex_death():

1) load from address '0xfffffffff7f16bd8', FAULT
2) fault handler clears upper 32-bits, processes fault
   for address '0xf7f16bd8' which succeeds
3) goto #1

I want to thank Bernd Zeimetz, Josip Rodin, and Fabio Massimo Di Nitto
for their tireless efforts helping me track down this bug.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoMerge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
Linus Torvalds [Fri, 9 Nov 2007 23:28:11 +0000 (15:28 -0800)]
Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
  [IA64] IOSAPIC bogus error cleanup
  [IA64] Update printing of feature set bits
  [IA64] Fix IOSAPIC delivery mode setting
  [IA64] XPC heartbeat timer function must run on CPU 0
  [IA64] Clean up /proc/interrupts output
  [IA64] Disable/re-enable CPE interrupts on Altix
  [IA64] Clean-up McKinley Errata message
  [IA64] Add gate.lds to list of files ignored by Git
  [IA64] Fix section mismatch in contig.c version of per_cpu_init()
  [IA64] Wrong args to memset in efi_gettimeofday()
  [IA64] Remove duplicate includes from ia32priv.h
  [IA64] fix number of bytes zeroed by sys_fw_init() in arch/ia64/hp/sim/boot/fw-emu.c
  [IA64] Fix perfmon sysctl directory modes

17 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched
Linus Torvalds [Fri, 9 Nov 2007 23:27:54 +0000 (15:27 -0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched

* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched:
  sched: proper prototype for kernel/sched.c:migration_init()
  sched: avoid large irq-latencies in smp-balancing
  sched: fix copy_namespace() <-> sched_fork() dependency in do_fork
  sched: clean up the wakeup preempt check, #2
  sched: clean up the wakeup preempt check
  sched: wakeup preemption fix
  sched: remove PREEMPT_RESTRICT
  sched: turn off PREEMPT_RESTRICT
  KVM: fix !SMP build error
  x86: make nmi_cpu_busy() always defined
  x86: make ipi_handler() always defined
  sched: cleanup, use NSEC_PER_MSEC and NSEC_PER_SEC
  sched: reintroduce SMP tunings again
  sched: restore deterministic CPU accounting on powerpc
  sched: fix delay accounting regression
  sched: reintroduce the sched_min_granularity tunable
  sched: documentation: place_entity() comments
  sched: fix vslice

17 years agoMerge master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6
Linus Torvalds [Fri, 9 Nov 2007 23:25:29 +0000 (15:25 -0800)]
Merge master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6: (26 commits)
  sh: remove dead config symbols from SH code
  sh: Kill off broken snapgear ds1302 code.
  sh: Add a dummy vga.h.
  rtc: rtc-sh: Zero out tm value for invalid rtc states.
  rtc: sh-rtc: Handle rtc_device_register() failure properly.
  sh: Fix heartbeart on Solution Engine series
  sh: Remove SCI_NPORTS from sh-sci.h
  sh: Fix up PAGE_KERNEL_PCC() for nommu.
  sh: hs7751rvoip: Kill off dead IPR IRQ mappings.
  sh: hs7751rvoip: irq.c needs linux/interrupt.h.
  sh: Kill off __{copy,clear}_user_page().
  sh: Optimized copy_{to,from}_user_page() for SH-4.
  sh: Wire up clear_user_highpage().
  sh: Kill off the remaining ST40 cruft.
  superhyway: Handle device_register() retval properly.
  sh: kgdb sysrq depends on magic sysrq.
  sh: Add -Werror for clean directories.
  sh: Fix up kgdb build with modular sh-sci.
  sh: Export __{s,u}divsi3_i4i on all CPUs.
  sh: Fix up kgdb-on-NMI branch target.
  ...

17 years agoMerge master.kernel.org:/home/rmk/linux-2.6-arm
Linus Torvalds [Fri, 9 Nov 2007 23:24:19 +0000 (15:24 -0800)]
Merge master.kernel.org:/home/rmk/linux-2.6-arm

* master.kernel.org:/home/rmk/linux-2.6-arm:
  [ARM] pxa: fix one-shot timer mode
  [ARM] 4645/1: Cyberpro: Trivial fix to restore 16bpp mode.
  [ARM] 4644/2: fix flush_kern_tlb_range() in module space
  [ARM] Allow watchdog drivers to be selected again
  [ARM] 4633/1: omap build fix when FB enabled
  [ARM] 4642/2: netX: default config for netx based boards
  [ARM] 4641/2: netX: fix kobject_name type
  [ARM] Fix iop3xx macro

17 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Fri, 9 Nov 2007 23:19:54 +0000 (15:19 -0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  [LIB] crc32c: Keep intermediate crc state in cpu order

17 years agoMerge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
Linus Torvalds [Fri, 9 Nov 2007 23:17:49 +0000 (15:17 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  Add UNPLUG traces to all appropriate places
  block: fix requeue handling in blk_queue_invalidate_tags()
  mmc: Fix sg helper copy-and-paste error
  pktcdvd: fix BUG caused by sysfs module reference semantics change
  ioprio: allow sys_ioprio_set() value of 0 to reset ioprio setting
  cfq_idle_class_timer: add paranoid checks for jiffies overflow
  cfq: fix IOPRIO_CLASS_IDLE delays
  cfq: fix IOPRIO_CLASS_IDLE accounting

17 years agoMerge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
Linus Torvalds [Fri, 9 Nov 2007 23:16:52 +0000 (15:16 -0800)]
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (37 commits)
  [POWERPC] EEH: Make sure warning message is printed
  [POWERPC] Make altivec code in swsusp_32.S depend on CONFIG_ALTIVEC
  [POWERPC] windfarm: Fix windfarm thread freezer interaction
  [POWERPC] Fix si_addr value on low level hash failures
  [POWERPC] Refresh ppc64_defconfig and enable pasemi-related options
  [POWERPC] pasemi: Update defconfig
  [POWERPC] iSeries: Fix ref counting in vio setup
  [POWERPC] ] Fix memset size error
  [POWERPC] Fix link errors for allyesconfig
  [POWERPC] iSeries_init_IRQ non-PCI tidy
  [POWERPC] Change fallocate to match unistd.h on powerpc
  [POWERPC] EEH: Avoid crash on null device
  [POWERPC] EEH: Drivers that need reset trump others
  [POWERPC] EEH: Clean up comments
  [POWERPC] Fix off-by-one error in setting decrementer on Book E/4xx (v2)
  [POWERPC] Fix switch_slb handling of 1T ESID values
  [POWERPC] Fix build failure when CONFIG_VIRT_CPU_ACCOUNTING is not defined
  [POWERPC] Include udbg.h when using udbg_printf
  [POWERPC] Fix cache line vs. block size confusion
  [POWERPC] Fix sysctl table check failure on PowerMac
  ...

17 years agoMerge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfashe...
Linus Torvalds [Fri, 9 Nov 2007 23:11:58 +0000 (15:11 -0800)]
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  ocfs2: fix rename vs unlink race
  [PATCH] Fix possibly too long write in o2hb_setup_one_bio()
  ocfs2: fix write() performance regression
  ocfs2: Commit journal on sync writes
  ocfs2: Re-order iput in ocfs2_drop_dentry_lock
  ocfs2: Create locks at initially requested level
  [PATCH] Fix priority mistakes in fs/ocfs2/{alloc.c, dlmglue.c}
  [2.6 patch] make ocfs2_find_entry_el() static

17 years agofrv: Remove bogus NO_IRQ = -1 define
Alan Cox [Wed, 7 Nov 2007 16:53:00 +0000 (16:53 +0000)]
frv: Remove bogus NO_IRQ = -1 define

The old NO_IRQ define some platforms had was long ago declared obsolete
and wrong. FRV should therefore not be re-introducing this, especially as
IRQs are usually unsigned in the kernel. The "no IRQ" case is defined to be
zero and Linus made this rather clear at the time.

arch/frv shows no dependancy on this but it might show up driver fixes
needing doing I guess

Signed-off-by: Alan Cox <alan@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoMerge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6
Linus Torvalds [Fri, 9 Nov 2007 23:08:37 +0000 (15:08 -0800)]
Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6

* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
  [SPARC64]: Use "is_power_of_2" macro for simplicity.
  [SPARC]: Remove duplicate includes.

17 years agoMerge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Fri, 9 Nov 2007 23:07:57 +0000 (15:07 -0800)]
Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (44 commits)
  [NETLINK]: Fix unicast timeouts
  [INET]: Remove per bucket rwlock in tcp/dccp ehash table.
  [IPVS]: Synchronize closing of Connections
  [IPVS]: Bind connections on stanby if the destination exists
  [NET]: Remove Documentation/networking/pt.txt
  [NET]: Remove Documentation/networking/routing.txt
  [NET]: Remove Documentation/networking/ncsa-telnet
  [NET]: Remove comx driver docs.
  [NET]: Remove Documentation/networking/Configurable
  [NET]: Clean proto_(un)register from in-code ifdefs
  [IPSEC]: Fix crypto_alloc_comp error checking
  [VLAN]: Fix SET_VLAN_INGRESS_PRIORITY_CMD ioctl
  [NETNS]: Fix compiler error in net_namespace.c
  [TTY]: Use tty_mode_ioctl() in network drivers.
  [TTY]: Fix network driver interactions with TCGET/SET calls.
  [PKT_SCHED] CLS_U32: Fix endianness problem with u32 classifier hash masks.
  [NET]: Removing duplicit #includes
  [NET]: Let USB_USBNET always select MII.
  [RRUNNER]: Do not muck with sysctl_{r,w}mem_max
  [DLM] lowcomms: Do not muck with sysctl_rmem_max.
  ...

17 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394...
Linus Torvalds [Fri, 9 Nov 2007 23:04:12 +0000 (15:04 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
  firewire: fw-sbp2: fix refcounting

17 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris...
Linus Torvalds [Fri, 9 Nov 2007 23:02:43 +0000 (15:02 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6:
  SELinux: add more validity checks on policy load
  SELinux: fix bug in new ebitmap code.
  SELinux: suppress a warning for 64k pages.

17 years agoFRV: Remove the section annotation on free_initmem()
David Howells [Tue, 6 Nov 2007 21:54:44 +0000 (21:54 +0000)]
FRV: Remove the section annotation on free_initmem()

Remove the section annotation on FRV's free_initmem().  It can't be marked
__init, lest it free itself.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agosched: proper prototype for kernel/sched.c:migration_init()
Adrian Bunk [Fri, 9 Nov 2007 21:39:39 +0000 (22:39 +0100)]
sched: proper prototype for kernel/sched.c:migration_init()

This patch adds a proper prototype for migration_init() in
include/linux/sched.h

Since there's no point in always returning 0 to a caller that doesn't check
the return value it also changes the function to return void.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: avoid large irq-latencies in smp-balancing
Peter Zijlstra [Fri, 9 Nov 2007 21:39:39 +0000 (22:39 +0100)]
sched: avoid large irq-latencies in smp-balancing

SMP balancing is done with IRQs disabled and can iterate the full rq.
When rqs are large this can cause large irq-latencies. Limit the nr of
iterations on each run.

This fixes a scheduling latency regression reported by the -rt folks.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: fix copy_namespace() <-> sched_fork() dependency in do_fork
Srivatsa Vaddagiri [Fri, 9 Nov 2007 21:39:39 +0000 (22:39 +0100)]
sched: fix copy_namespace() <-> sched_fork() dependency in do_fork

Sukadev Bhattiprolu reported a kernel crash with control groups.
There are couple of problems discovered by Suka's test:

- The test requires the cgroup filesystem to be mounted with
  atleast the cpu and ns options (i.e both namespace and cpu
  controllers are active in the same hierarchy).

# mkdir /dev/cpuctl
# mount -t cgroup -ocpu,ns none cpuctl
(or simply)
# mount -t cgroup none cpuctl -> Will activate all controllers
 in same hierarchy.

- The test invokes clone() with CLONE_NEWNS set. This causes a a new child
  to be created, also a new group (do_fork->copy_namespaces->ns_cgroup_clone->
  cgroup_clone) and the child is attached to the new group (cgroup_clone->
  attach_task->sched_move_task). At this point in time, the child's scheduler
  related fields are uninitialized (including its on_rq field, which it has
  inherited from parent). As a result sched_move_task thinks its on
  runqueue, when it isn't.

  As a solution to this problem, I moved sched_fork() call, which
  initializes scheduler related fields on a new task, before
  copy_namespaces(). I am not sure though whether moving up will
  cause other side-effects. Do you see any issue?

- The second problem exposed by this test is that task_new_fair()
  assumes that parent and child will be part of the same group (which
  needn't be as this test shows). As a result, cfs_rq->curr can be NULL
  for the child.

  The solution is to test for curr pointer being NULL in
  task_new_fair().

With the patch below, I could run ns_exec() fine w/o a crash.

Reported-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: clean up the wakeup preempt check, #2
Ingo Molnar [Fri, 9 Nov 2007 21:39:39 +0000 (22:39 +0100)]
sched: clean up the wakeup preempt check, #2

clean up the preemption check to not use unnecessary 64-bit
variables. This improves code size:

   text    data     bss     dec     hex filename
  44227    3326      36   47589    b9e5 sched.o.before
  44201    3326      36   47563    b9cb sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: clean up the wakeup preempt check
Ingo Molnar [Fri, 9 Nov 2007 21:39:39 +0000 (22:39 +0100)]
sched: clean up the wakeup preempt check

clean up the wakeup preemption check. No code changed:

   text    data     bss     dec     hex filename
  44227    3326      36   47589    b9e5 sched.o.before
  44227    3326      36   47589    b9e5 sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: wakeup preemption fix
Ingo Molnar [Fri, 9 Nov 2007 21:39:39 +0000 (22:39 +0100)]
sched: wakeup preemption fix

wakeup preemption fix: do not make it dependent on p->prio.
Preemption purely depends on ->vruntime.

This improves preemption in mixed-nice-level workloads.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: remove PREEMPT_RESTRICT
Ingo Molnar [Fri, 9 Nov 2007 21:39:39 +0000 (22:39 +0100)]
sched: remove PREEMPT_RESTRICT

remove PREEMPT_RESTRICT. (this is a separate commit so that any
regression related to the removal itself is bisectable)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: turn off PREEMPT_RESTRICT
Ingo Molnar [Fri, 9 Nov 2007 21:39:39 +0000 (22:39 +0100)]
sched: turn off PREEMPT_RESTRICT

PREEMPT_RESTRICT was a method aimed at reducing the amount of wakeup
related preemption. It has a disadvantage though, it can prevent
legitimate wakeups if a task is 'unlucky' to be hit too early by a tick
that clears peer_preempt.

Now that the wakeup preemption has been cleaned up we dont seem to have
excessive preemptions anymore, so this feature can be turned off. (and
removed in the next patch)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agoKVM: fix !SMP build error
Ingo Molnar [Fri, 9 Nov 2007 21:39:38 +0000 (22:39 +0100)]
KVM: fix !SMP build error

fix a !SMP build error:

drivers/kvm/kvm_main.c: In function 'kvm_flush_remote_tlbs':
drivers/kvm/kvm_main.c:220: error: implicit declaration of function 'smp_call_function_mask'

(and also avoid unused function warning related to up_smp_call_function()
not making use of the 'func' parameter.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agox86: make nmi_cpu_busy() always defined
Ingo Molnar [Fri, 9 Nov 2007 21:39:38 +0000 (22:39 +0100)]
x86: make nmi_cpu_busy() always defined

nmi_cpu_busy() must be available on !SMP too.

this is in preparation to a smp_call_function_mask() fix.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agox86: make ipi_handler() always defined
Ingo Molnar [Fri, 9 Nov 2007 21:39:38 +0000 (22:39 +0100)]
x86: make ipi_handler() always defined

prepare for up_smp_call_function() to ensure that the 'func'
pointer is unused. (which is related to a KVM build fix)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: cleanup, use NSEC_PER_MSEC and NSEC_PER_SEC
Eric Dumazet [Fri, 9 Nov 2007 21:39:38 +0000 (22:39 +0100)]
sched: cleanup, use NSEC_PER_MSEC and NSEC_PER_SEC

1) hardcoded 1000000000 value is used five times in places where
   NSEC_PER_SEC might be more readable.

2) A conversion from nsec to msec uses the hardcoded 1000000 value,
   which is a candidate for NSEC_PER_MSEC.

no code changed:

    text    data     bss     dec     hex filename
   44359    3326      36   47721    ba69 sched.o.before
   44359    3326      36   47721    ba69 sched.o.after

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: reintroduce SMP tunings again
Ingo Molnar [Fri, 9 Nov 2007 21:39:38 +0000 (22:39 +0100)]
sched: reintroduce SMP tunings again

Yanmin Zhang reported an aim7 regression and bisected it down to:

 |  commit 38ad464d410dadceda1563f36bdb0be7fe4c8938
 |  Author: Ingo Molnar <mingo@elte.hu>
 |  Date:   Mon Oct 15 17:00:02 2007 +0200
 |
 |     sched: uniform tunings
 |
 |     use the same defaults on both UP and SMP.

fix this by reintroducing similar SMP tunings again. This resolves
the regression.

(also update the comments to match the ilog2(nr_cpus) tuning effect)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: restore deterministic CPU accounting on powerpc
Paul Mackerras [Fri, 9 Nov 2007 21:39:38 +0000 (22:39 +0100)]
sched: restore deterministic CPU accounting on powerpc

Since powerpc started using CONFIG_GENERIC_CLOCKEVENTS, the
deterministic CPU accounting (CONFIG_VIRT_CPU_ACCOUNTING) has been
broken on powerpc, because we end up counting user time twice: once in
timer_interrupt() and once in update_process_times().

This fixes the problem by pulling the code in update_process_times
that updates utime and stime into a separate function called
account_process_tick.  If CONFIG_VIRT_CPU_ACCOUNTING is not defined,
there is a version of account_process_tick in kernel/timer.c that
simply accounts a whole tick to either utime or stime as before.  If
CONFIG_VIRT_CPU_ACCOUNTING is defined, then arch code gets to
implement account_process_tick.

This also lets us simplify the s390 code a bit; it means that the s390
timer interrupt can now call update_process_times even when
CONFIG_VIRT_CPU_ACCOUNTING is turned on, and can just implement a
suitable account_process_tick().

account_process_tick() now takes the task_struct * as an argument.
Tested both with and without CONFIG_VIRT_CPU_ACCOUNTING.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: fix delay accounting regression
Balbir Singh [Fri, 9 Nov 2007 21:39:37 +0000 (22:39 +0100)]
sched: fix delay accounting regression

Fix the delay accounting regression introduced by commit
75d4ef16a6aa84f708188bada182315f80aab6fa. rq no longer has sched_info
data associated with it. task_struct sched_info structure is used by delay
accounting to provide back statistics to user space.

also remove direct use of sched_clock() (which is not a valid thing to
do anymore) and use rq->clock instead.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: reintroduce the sched_min_granularity tunable
Peter Zijlstra [Fri, 9 Nov 2007 21:39:37 +0000 (22:39 +0100)]
sched: reintroduce the sched_min_granularity tunable

we lost the sched_min_granularity tunable to a clever optimization
that uses the sched_latency/min_granularity ratio - but the ratio
is quite unintuitive to users and can also crash the kernel if the
ratio is set to 0. So reintroduce the min_granularity tunable,
while keeping the ratio maintained internally.

no functionality changed.

[ mingo@elte.hu: some fixlets. ]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: documentation: place_entity() comments
Peter Zijlstra [Fri, 9 Nov 2007 21:39:37 +0000 (22:39 +0100)]
sched: documentation: place_entity() comments

Add a few comments to place_entity(). No code changed.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: fix vslice
Peter Zijlstra [Fri, 9 Nov 2007 21:39:37 +0000 (22:39 +0100)]
sched: fix vslice

vslice was missing a factor NICE_0_LOAD, as weight is in
weight*NICE_0_LOAD units.

the effect of this bug was larger initial slices and
thus latency-noisier forks.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years ago[IA64] IOSAPIC bogus error cleanup
George Beshers [Thu, 11 Oct 2007 19:33:55 +0000 (15:33 -0400)]
[IA64] IOSAPIC bogus error cleanup

On Altix (sn2) machines the "Error parsing MADT" message is
misleading because the lack of IOSAPIC entries is expected.

Since I am sure someone will ask, I have been told that
the chance of this changing anytime soon is close to nil.

Signed-off-by: George Beshers <gbeshers@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
17 years ago[IA64] Update printing of feature set bits
Russ Anderson [Tue, 16 Oct 2007 22:02:38 +0000 (17:02 -0500)]
[IA64] Update printing of feature set bits

Newer Itanium versions have added additional processor feature set
bits.  This patch prints all the implemented feature set bits.  Some
bit descriptions have not been made public.  For those bits, a generic
"Feature set X bit Y" message is printed.  Bits that are not implemented
will no longer be printed.

Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
17 years ago[IA64] Fix IOSAPIC delivery mode setting
Kenji Kaneshige [Wed, 7 Nov 2007 06:38:30 +0000 (15:38 +0900)]
[IA64] Fix IOSAPIC delivery mode setting

Fix the problem that redirect hit bit in I/O SAPIC RTE is set even
when it must be disabled (e.g. nointroute boot option is set, CPU
hotplug is enabled or percpu vector is enabled).

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
17 years ago[IA64] XPC heartbeat timer function must run on CPU 0
Dean Nelson [Wed, 7 Nov 2007 13:53:06 +0000 (07:53 -0600)]
[IA64] XPC heartbeat timer function must run on CPU 0

Currently, XPC's heartbeat timer function runs on whatever CPU modprobe/insmod
ran on when XPC was started. To avoid the heartbeat from being delayed for
long periods the timer function must run on CPU 0.

N.B. Altix doesn't currently allow cpu0 to be taken offline, so this is
safe for now.  This code must be revised when offline of cpu0 is enabled.

Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
17 years agoAdd UNPLUG traces to all appropriate places
Alan D. Brunelle [Wed, 7 Nov 2007 19:26:56 +0000 (14:26 -0500)]
Add UNPLUG traces to all appropriate places

Added blk_unplug interface, allowing all invocations of unplugs to result
in a generated blktrace UNPLUG.

Signed-off-by: Alan D. Brunelle <Alan.Brunelle@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agoblock: fix requeue handling in blk_queue_invalidate_tags()
Jens Axboe [Fri, 9 Nov 2007 11:52:45 +0000 (12:52 +0100)]
block: fix requeue handling in blk_queue_invalidate_tags()

Credit goes to juergen.kadidlo@exasol.com for diagnosing this issue
and supplying the initial patch.

blk_queue_invalidate_tags() must use the proper requeueing paths instead
of open coding the re-add of the request, otherwise we bug out in rq
accounting. Just switch to using blk_requeue_request(), that takes care
of end-tag handling as well and also adds the blktrace REQUEUE notify
event that is also appropriate here.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agoMerge master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh64-2.6
Paul Mundt [Fri, 9 Nov 2007 08:59:53 +0000 (17:59 +0900)]
Merge master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh64-2.6

17 years ago[ARM] pxa: fix one-shot timer mode
Russell King [Thu, 8 Nov 2007 23:35:46 +0000 (23:35 +0000)]
[ARM] pxa: fix one-shot timer mode

One-shot timer mode on PXA has various bugs which prevent kernels
build with NO_HZ enabled booting.  They end up spinning on a
permanently asserted timer interrupt because we don't properly
clear it down - clearing the OIER bit does not stop the pending
interrupt status.  Fix this in the set_mode handler as well.

Moreover, the code which sets the next expiry point may race with
the hardware, and we might not set the match register sufficiently
in the future.  If we encounter that situation, return -ETIME so
the generic time code retries.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
17 years ago[ARM] 4645/1: Cyberpro: Trivial fix to restore 16bpp mode.
Jan Rinze [Thu, 8 Nov 2007 20:51:05 +0000 (21:51 +0100)]
[ARM] 4645/1: Cyberpro: Trivial fix to restore 16bpp mode.

Cyberpro: when user requests 16bpp, use it and not 24bpp.
There was a missing break causing requests for 16bpp mode
to end up in 24bpp mode.

Signed-off-by: Jan Rinze Peterzon <janrinze@home.nl>
Acked-by: Ralph Siemsen <ralphs@netwinder.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
17 years agosh: remove dead config symbols from SH code
Jiri Olsa [Thu, 8 Nov 2007 19:45:29 +0000 (04:45 +0900)]
sh: remove dead config symbols from SH code

Signed-off-by: Jiri Olsa <olsajiri@gmail.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
17 years ago[LIB] crc32c: Keep intermediate crc state in cpu order
Benny Halevy [Thu, 8 Nov 2007 13:34:09 +0000 (21:34 +0800)]
[LIB] crc32c: Keep intermediate crc state in cpu order

crypto/crc32.c:chksum_final() is computing the digest as
*(__le32 *)out = ~cpu_to_le32(mctx->crc);
so the low-level crc32c_le routines should just keep
the crc in cpu order, otherwise it is getting swabbed
one too many times on big-endian machines.

Signed-off-by: Benny Halevy <bhalevy@fs1.bhalevy.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
17 years agommc: Fix sg helper copy-and-paste error
Roland Dreier [Thu, 8 Nov 2007 12:50:58 +0000 (13:50 +0100)]
mmc: Fix sg helper copy-and-paste error

Commit 45711f1a ("[SG] Update drivers to use sg helpers") had the
following bogus change in drivers/mmc/card/queue.c:

    > - src_buf = page_address(src->page) + src->offset;
    > + src_buf = sg_virt(dst);

(Notice that "src" is converted to "dst").  Turn this "dst" back into
the intended "src".

Signed-off-by: Roland Dreier <roland@digitalvampire.org>
Tested-by: Romano Giannetti <romano.giannetti@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago[ARM] 4644/2: fix flush_kern_tlb_range() in module space
Kevin Hilman [Thu, 8 Nov 2007 00:48:16 +0000 (01:48 +0100)]
[ARM] 4644/2: fix flush_kern_tlb_range() in module space

For kernel addresses between TASK_SIZE and PAGE_OFFSET,
flush_tlb_kern_range() does not work as would be expected.

The TLB invalidate works with a matching ASID, or on entries marked as
global.  The set_pte_at() macro marks addresses >= PAGE_OFFSET as
global, but not addresses from TASK_SIZE to PAGE_OFFSET, which are
also kernel addresses.

The result is that the entries in this range are not actually
invalidated by flush_tlb_kern_range().

This patch instead marks addresses >= TASK_SIZE as global.

Signed-off-by: Satoru Fujii <s-fujii@ct.jp.nec.com>
Signed-off-by: Kevin Hilman <khilman@mvista.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
17 years agoMerge branch 'page_colouring_despair'
Paul Mundt [Thu, 8 Nov 2007 08:01:42 +0000 (17:01 +0900)]
Merge branch 'page_colouring_despair'

17 years agopktcdvd: fix BUG caused by sysfs module reference semantics change
Tejun Heo [Thu, 8 Nov 2007 07:00:24 +0000 (08:00 +0100)]
pktcdvd: fix BUG caused by sysfs module reference semantics change

pkt_setup_dev() expects module reference to be held on invocation.
This used to be true for sysfs callbacks but not anymore.  Test and
grab module reference around pkt_setup_dev() in
class_pktcdvd_store_add().

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agoMerge branch 'for-2.6.24' of master.kernel.org:/pub/scm/linux/kernel/git/jwboyer...
Paul Mackerras [Thu, 8 Nov 2007 03:28:14 +0000 (14:28 +1100)]
Merge branch 'for-2.6.24' of master.kernel.org:/pub/scm/linux/kernel/git/jwboyer/powerpc-4xx into merge

17 years ago[POWERPC] EEH: Make sure warning message is printed
Linas Vepstas [Wed, 7 Nov 2007 20:03:53 +0000 (07:03 +1100)]
[POWERPC] EEH: Make sure warning message is printed

Fix old buglet; a warning message should have been printed
when a hardware reset takes too long.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
17 years ago[POWERPC] Make altivec code in swsusp_32.S depend on CONFIG_ALTIVEC
Johannes Berg [Wed, 7 Nov 2007 12:59:44 +0000 (23:59 +1100)]
[POWERPC] Make altivec code in swsusp_32.S depend on CONFIG_ALTIVEC

This makes the altivec code in swsusp_32.S depend on CONFIG_ALTIVEC to
avoid build failures for systems that don't have altivec. I'm not sure
whether the code will actually work for other systems, but it was merged
for just ppc32 rather than powermac a very long time ago.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
17 years ago[POWERPC] windfarm: Fix windfarm thread freezer interaction
Johannes Berg [Wed, 7 Nov 2007 10:18:02 +0000 (21:18 +1100)]
[POWERPC] windfarm: Fix windfarm thread freezer interaction

When I fixed the windfarm freezer interaction first in commit
1ed2ddf380e19dafeec2150ca709ef7f4a67cd21, an earlier patch than the one
I came up with after comments was committed. This has come back to haunt
us now because commit d5d8c5976d6adeddb8208c240460411e2198b393 changed
the freezer to no long send signals. Fix it by removing the windfarm
thread's signal logic and restoring the original try_to_freeze().

We could simply revert 1ed2ddf380e19dafeec2150ca709ef7f4a67cd21 now
but I feel that the assertion that no signal is delivered to the
windfarm thread needs not be there.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Paul Mackerras <paulus@samba.org>
17 years ago[POWERPC] Fix si_addr value on low level hash failures
Benjamin Herrenschmidt [Wed, 7 Nov 2007 06:17:02 +0000 (17:17 +1100)]
[POWERPC] Fix si_addr value on low level hash failures

If the low level MMU hash table insertion returns an error (which
can happen in some rare circumstances when the hypervisor refuses
the insertion of a PTE, typically if you try to access junk via
/dev/mem), the generated signal had an incorrect si_addr value due
to a bug in the assembly, which was loading it as a 32 bits quantity
instead of a 64 bits quantity.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
17 years ago[POWERPC] Refresh ppc64_defconfig and enable pasemi-related options
Olof Johansson [Wed, 7 Nov 2007 05:53:58 +0000 (16:53 +1100)]
[POWERPC] Refresh ppc64_defconfig and enable pasemi-related options

Refresh ppc64_defconfig, add PPC_PASEMI and various options that the
common boards there need:

* Chip drivers (iommu, ethernet, IDE, CF, EDAC, MDIO/PHY)
* PCMCIA
* PATA_PCMCIA
* RTC_CLASS
* SATA_MV
* SATA_SIL24
* IP_PNP + NFS_ROOT for diskless booting

+ possibly some other things I might have missed to list

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
17 years ago[POWERPC] pasemi: Update defconfig
Olof Johansson [Wed, 7 Nov 2007 05:48:35 +0000 (16:48 +1100)]
[POWERPC] pasemi: Update defconfig

Update pasemi_defconfig.  Add a few missing options for default devices
on electra boards, enable tickless and hrtimers, etc, etc.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
17 years ago[POWERPC] iSeries: Fix ref counting in vio setup
Stephen Rothwell [Tue, 6 Nov 2007 06:26:42 +0000 (17:26 +1100)]
[POWERPC] iSeries: Fix ref counting in vio setup

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
17 years ago[POWERPC] ] Fix memset size error
Li Zefan [Mon, 5 Nov 2007 02:21:56 +0000 (13:21 +1100)]
[POWERPC] ] Fix memset size error

The size passing to memset is wrong.

Signed-off-by Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
17 years ago[POWERPC] Fix link errors for allyesconfig
Stephen Rothwell [Sun, 4 Nov 2007 02:28:39 +0000 (13:28 +1100)]
[POWERPC] Fix link errors for allyesconfig

An allyesconfig build creates a .text section that is so big that the
.text.init.refok and .fixup sections are too far away for the relocations
to be fixed up correctly. This patch fixes that by linking all the
relevent text sections for each file together.

Suggested by Paul Mackerras.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
17 years ago[POWERPC] iSeries_init_IRQ non-PCI tidy
Stephen Rothwell [Sun, 4 Nov 2007 02:22:46 +0000 (13:22 +1100)]
[POWERPC] iSeries_init_IRQ non-PCI tidy

ppc_md.init_IRQ is not called if it is NULL, so we don't need an empty
routine in the non PCI case.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
17 years ago[POWERPC] Change fallocate to match unistd.h on powerpc
Patrick Mansfield [Sat, 3 Nov 2007 17:42:03 +0000 (04:42 +1100)]
[POWERPC] Change fallocate to match unistd.h on powerpc

Fix the fallocate system call on powerpc to match its unistd.h.

This implies none of these system calls are currently working with the
unistd.h sys call values:
fallocate
signalfd
timerfd
eventfd
sync_file_range2

Signed-off-by: Patrick Mansfield <patmans@us.ibm.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
17 years ago[POWERPC] EEH: Avoid crash on null device
Linas Vepstas [Fri, 2 Nov 2007 20:29:01 +0000 (07:29 +1100)]
[POWERPC] EEH: Avoid crash on null device

Bugfix: avoid crash if there's no PCI device for a given
openfirmware node.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
17 years ago[POWERPC] EEH: Drivers that need reset trump others
Linas Vepstas [Fri, 2 Nov 2007 20:27:50 +0000 (07:27 +1100)]
[POWERPC] EEH: Drivers that need reset trump others

Bugfix: if a driver controlling one part of a multi-function PCI card
has asked for a reset, honor that request above all others.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
17 years ago[POWERPC] EEH: Clean up comments
Linas Vepstas [Fri, 2 Nov 2007 20:25:55 +0000 (07:25 +1100)]
[POWERPC] EEH: Clean up comments

Clean up commentary, remove dead code.

Signed-off-by Linas Vepstas <linas@austin.ibm.com>

Signed-off-by: Paul Mackerras <paulus@samba.org>
17 years ago[POWERPC] Fix off-by-one error in setting decrementer on Book E/4xx (v2)
Paul Mackerras [Wed, 31 Oct 2007 11:25:35 +0000 (22:25 +1100)]
[POWERPC] Fix off-by-one error in setting decrementer on Book E/4xx (v2)

The decrementer in Book E and 4xx processors interrupts on the
transition from 1 to 0, rather than on the 0 to -1 transition as on
64-bit server and 32-bit "classic" (6xx/7xx/7xxx) processors.  At the
moment we subtract 1 from the count of how many decrementer ticks are
required before the next interrupt before putting it into the
decrementer, which is correct for server/classic processors, but could
possibly cause the interrupt to happen too early on Book E and 4xx if
the timebase/decrementer frequency is low.

This fixes the problem by making set_dec subtract 1 from the count for
server and classic processors, instead of having the callers subtract
1.  Since set_dec already had a bunch of ifdefs to handle different
processor types, there is no net increase in ugliness. :)

Note that calling set_dec(0) may not generate an interrupt on some
processors.  To make sure that decrementer_set_next_event always calls
set_dec with an interval of at least 1 tick, we set min_delta_ns of
the decrementer_clockevent to correspond to 2 ticks (2 rather than 1
to compensate for truncations in the conversions between ticks and
ns).

This also removes a redundant call to set the decrementer to
0x7fffffff - it was already set to that earlier in timer_interrupt.

Signed-off-by: Paul Mackerras <paulus@samba.org>
17 years ago[POWERPC] Fix switch_slb handling of 1T ESID values
will schmidt [Tue, 30 Oct 2007 18:59:33 +0000 (05:59 +1100)]
[POWERPC] Fix switch_slb handling of 1T ESID values

Now that we have 1TB segment size support, we need to be using the
GET_ESID_1T macro when comparing ESID values for pc, stack, and
unmapped_base within switch_slb().   A new helper function called
esids_match() contains the logic for deciding when to call GET_ESID
and GET_ESID_1T.

This fixes a duplicate-slb-entry inspired machine-check exception I
was seeing when trying to run java on a power6 partition.

Tested on power6 and power5.

Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
17 years ago[POWERPC] Fix build failure when CONFIG_VIRT_CPU_ACCOUNTING is not defined
Tony Breeds [Tue, 30 Oct 2007 03:55:04 +0000 (14:55 +1100)]
[POWERPC] Fix build failure when CONFIG_VIRT_CPU_ACCOUNTING is not defined

Without this patch I get the following build failure
  CC      arch/powerpc/platforms/celleb/setup.o
arch/powerpc/platforms/celleb/setup.c:151: error: 'generic_calibrate_decr' undeclared here (not in a function)

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
17 years ago[POWERPC] Include udbg.h when using udbg_printf
will schmidt [Mon, 29 Oct 2007 19:24:19 +0000 (06:24 +1100)]
[POWERPC] Include udbg.h when using udbg_printf

This fixes the error
error: implicit declaration of function "udbg_printf"

We have a few spots where we reference udbg_printf() without #including
udbg.h.  These are within #ifdef DEBUG blocks, so unnoticed until we do
a #define DEBUG or #define DEBUG_LOW nearby.

Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
17 years ago[POWERPC] Fix cache line vs. block size confusion
Benjamin Herrenschmidt [Sat, 27 Oct 2007 21:49:28 +0000 (08:49 +1100)]
[POWERPC] Fix cache line vs. block size confusion

We had an historical confusion in the kernel between cache line
and cache block size. The former is an implementation detail of
the L1 cache which can be useful for performance optimisations,
the later is the actual size on which the cache control
instructions operate, which can be different.

For some reason, we had a weird hack reading the right property
on powermac and the wrong one on any other 64 bits (32 bits is
unaffected as it only uses the cputable for cache block size
infos at this stage).

This fixes the booting-without-of.txt documentation to mention
the right properties, and fixes the 64 bits initialization code
to look for the block size first, with a fallback to the line
size if the property is missing.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
17 years ago[POWERPC] Fix sysctl table check failure on PowerMac
Alexey Dobriyan [Sat, 27 Oct 2007 18:34:53 +0000 (05:34 +1100)]
[POWERPC] Fix sysctl table check failure on PowerMac

kernel was marked with 0755. Everywhere else it's 0555.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
17 years ago[POWERPC] Fix CONFIG_SMP=n build break
Olof Johansson [Sat, 27 Oct 2007 17:28:51 +0000 (04:28 +1100)]
[POWERPC] Fix CONFIG_SMP=n build break

Fix two build errors on powerpc allyesconfig + CONFIG_SMP=n:

arch/powerpc/platforms/built-in.o: In function `cpu_affinity_set':
arch/powerpc/platforms/cell/spu_priv1_mmio.c:78: undefined reference to `.iic_get_target_id'
arch/powerpc/platforms/built-in.o: In function `iic_init_IRQ':
arch/powerpc/platforms/cell/interrupt.c:397: undefined reference to `.iic_setup_cpu'

Signed-off-by: Olof Johansson <olof@lixom.net>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
17 years ago[POWERPC] bootwrapper: Revert ps3 binary flag usage, and remove .bin suffix
Scott Wood [Wed, 24 Oct 2007 16:56:28 +0000 (02:56 +1000)]
[POWERPC] bootwrapper: Revert ps3 binary flag usage, and remove .bin suffix

The ps3 target produces two images, and the binary one is not the
"primary" image that corresponds to the -o flag; thus, it no longer
uses the generic binary flag.

On platforms which do use the binary flag, it no longer produces a
.bin suffix, so that the output file matches what was passed to the -o flag.

This should fix the zImage ln problems for the ps3 target.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
17 years ago[POWERPC] Fix mv643xx_pci sysfs .read and .write functions
Dale Farnsworth [Tue, 23 Oct 2007 19:20:32 +0000 (05:20 +1000)]
[POWERPC] Fix mv643xx_pci sysfs .read and .write functions

Commit 91a69029 introduced an additional parameter to the .read and .write
methods for sysfs binary attributes.  Two mv64x60_pci functions
were missed in that patch, resulting in these errors:
/cache/git/linux-2.6/arch/powerpc/sysdev/mv64x60_pci.c:77: warning: initialization from incompatible pointer type
/cache/git/linux-2.6/arch/powerpc/sysdev/mv64x60_pci.c:78: warning: initialization from incompatible pointer type

Add the missing "struct bin_attribute *" parameter.

Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
17 years ago[POWERPC] i8259: Add disable method
Aurelien Jarno [Tue, 23 Oct 2007 13:06:38 +0000 (23:06 +1000)]
[POWERPC] i8259: Add disable method

Since commit 76d2160147f43f982dfe881404cfde9fd0a9da21, the NE2000 card
is not working anymore on PPC and POWERPC and produces WATCHDOG
timeouts.

The patch below fixes that the same way it has been done on x86, x86_64
and MIPS.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
17 years ago[POWERPC] Read back MSI message in rtas_setup_msi_irqs() so restore works
Michael Ellerman [Tue, 23 Oct 2007 04:23:44 +0000 (14:23 +1000)]
[POWERPC] Read back MSI message in rtas_setup_msi_irqs() so restore works

There are plans afoot to use pci_restore_msi_state() to restore MSI
state after a device reset.  In order for this to work for the RTAS MSI
backend, we need to read back the MSI message from config space after
it has been setup by firmware.

This should be sufficient for restoring the MSI state after a device
reset, however we will need to revisit this for suspend to disk if that
is ever implemented on pseries.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
17 years ago[POWERPC] Fix build break in arch/ppc/syslib/m8260_setup.c
Olof Johansson [Sat, 20 Oct 2007 03:39:26 +0000 (13:39 +1000)]
[POWERPC] Fix build break in arch/ppc/syslib/m8260_setup.c

Fix build break and warnings in current mainline git:

arch/ppc/syslib/m8260_setup.c: In function 'm8260_setup_arch':
arch/ppc/syslib/m8260_setup.c:63: error: implicit declaration of function 'identify_ppc_sys_by_name_and_id'
arch/ppc/syslib/m8260_setup.c:64: warning: passing argument 1 of 'in_be32' makes pointer from integer without a cast
arch/ppc/syslib/m8260_setup.c: In function 'm8260_show_cpuinfo':
arch/ppc/syslib/m8260_setup.c:158: warning: format '%08x' expects type 'unsigned int', but argument 5 has type 'long unsigned int'
arch/ppc/syslib/m8260_setup.c:158: warning: format '%d' expects type 'int', but argument 6 has type 'long unsigned int'
arch/ppc/syslib/m8260_setup.c:158: warning: format '%u' expects type 'unsigned int', but argument 7 has type 'long unsigned int'
arch/ppc/syslib/m8260_setup.c:158: warning: format '%u' expects type 'unsigned int', but argument 8 has type 'long unsigned int'
arch/ppc/syslib/m8260_setup.c:158: warning: format '%u' expects type 'unsigned int', but argument 9 has type 'long unsigned int'
make[1]: *** [arch/ppc/syslib/m8260_setup.o] Error 1
make[1]: *** Waiting for unfinished jobs....

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
17 years agosh: Kill off broken snapgear ds1302 code.
Paul Mundt [Thu, 8 Nov 2007 02:24:33 +0000 (11:24 +0900)]
sh: Kill off broken snapgear ds1302 code.

This will force the snapgear boards to use the on-chip SH RTC instead,
until the rtc-ds1302 driver is merged. The current code is broken
and hasn't built in some time, so just kill it off and get the board
working again.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
17 years agoSELinux: add more validity checks on policy load
Stephen Smalley [Wed, 7 Nov 2007 15:08:00 +0000 (10:08 -0500)]
SELinux: add more validity checks on policy load

Add more validity checks at policy load time to reject malformed
policies and prevent subsequent out-of-range indexing when in permissive
mode.  Resolves the NULL pointer dereference reported in
https://bugzilla.redhat.com/show_bug.cgi?id=357541.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
17 years agoSELinux: fix bug in new ebitmap code.
KaiGai Kohei [Tue, 6 Nov 2007 16:17:16 +0000 (01:17 +0900)]
SELinux: fix bug in new ebitmap code.

The "e_iter = e_iter->next;" statement in the inner for loop is primally
bug.  It should be moved to outside of the for loop.

Signed-off-by: KaiGai Kohei <kaigai@kaigai.gr.jp>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
17 years agoSELinux: suppress a warning for 64k pages.
Stephen Rothwell [Wed, 31 Oct 2007 05:47:19 +0000 (16:47 +1100)]
SELinux: suppress a warning for 64k pages.

On PowerPC allmodconfig build we get this:

security/selinux/xfrm.c:214: warning: comparison is always false due to limited range of data type

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: James Morris <jmorris@namei.org>
17 years ago[ARM] Allow watchdog drivers to be selected again
Russell King [Wed, 7 Nov 2007 14:13:35 +0000 (14:13 +0000)]
[ARM] Allow watchdog drivers to be selected again

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
17 years agoioprio: allow sys_ioprio_set() value of 0 to reset ioprio setting
Jens Axboe [Wed, 7 Nov 2007 12:54:07 +0000 (13:54 +0100)]
ioprio: allow sys_ioprio_set() value of 0 to reset ioprio setting

Normally io priorities follow the CPU nice, unless a specific scheduling
class has been set. Once that is set, there's no way to reset the
behaviour to 'none' so that it follows CPU nice again.

Currently passing in 0 as the ioprio class/value will return -1/EINVAL,
change that to allow resetting of a set scheduling class.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agocfq_idle_class_timer: add paranoid checks for jiffies overflow
Oleg Nesterov [Wed, 7 Nov 2007 12:51:35 +0000 (13:51 +0100)]
cfq_idle_class_timer: add paranoid checks for jiffies overflow

In theory, if the queue was idle long enough, cfq_idle_class_timer may have
a false (and very long) timeout because jiffies can wrap into the past wrt
->last_end_request.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago[NETLINK]: Fix unicast timeouts
Patrick McHardy [Wed, 7 Nov 2007 10:42:09 +0000 (02:42 -0800)]
[NETLINK]: Fix unicast timeouts

Commit ed6dcf4a in the history.git tree broke netlink_unicast timeouts
by moving the schedule_timeout() call to a new function that doesn't
propagate the remaining timeout back to the caller. This means on each
retry we start with the full timeout again.

ipc/mqueue.c seems to actually want to wait indefinitely so this
behaviour is retained.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[INET]: Remove per bucket rwlock in tcp/dccp ehash table.
Eric Dumazet [Wed, 7 Nov 2007 10:40:20 +0000 (02:40 -0800)]
[INET]: Remove per bucket rwlock in tcp/dccp ehash table.

As done two years ago on IP route cache table (commit
22c047ccbc68fa8f3fa57f0e8f906479a062c426) , we can avoid using one
lock per hash bucket for the huge TCP/DCCP hash tables.

On a typical x86_64 platform, this saves about 2MB or 4MB of ram, for
litle performance differences. (we hit a different cache line for the
rwlock, but then the bucket cache line have a better sharing factor
among cpus, since we dirty it less often). For netstat or ss commands
that want a full scan of hash table, we perform fewer memory accesses.

Using a 'small' table of hashed rwlocks should be more than enough to
provide correct SMP concurrency between different buckets, without
using too much memory. Sizing of this table depends on
num_possible_cpus() and various CONFIG settings.

This patch provides some locking abstraction that may ease a future
work using a different model for TCP/DCCP table.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[IPVS]: Synchronize closing of Connections
Rumen G. Bogdanovski [Wed, 7 Nov 2007 10:36:55 +0000 (02:36 -0800)]
[IPVS]: Synchronize closing of Connections

This patch makes the master daemon to sync the connection when it is about
to close.  This makes the connections on the backup to close or timeout
according their state.  Before the sync was performed only if the
connection is in ESTABLISHED state which always made the connections to
timeout in the hard coded 3 minutes. However the Andy Gospodarek's patch
([IPVS]: use proper timeout instead of fixed value) effectively did nothing
more than increasing this to 15 minutes (Established state timeout).  So
this patch makes use of proper timeout since it syncs the connections on
status changes to FIN_WAIT (2min timeout) and CLOSE (10sec timeout).
However if the backup misses CLOSE hopefully it did not miss FIN_WAIT.
Otherwise we will just have to wait for the ESTABLISHED state timeout. As
it is without this patch.  This way the number of the hanging connections
on the backup is kept to minimum. And very few of them will be left to
timeout with a long timeout.

This is important if we want to make use of the fix for the real server
overcommit on master/backup fail-over.

Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[IPVS]: Bind connections on stanby if the destination exists
Rumen G. Bogdanovski [Wed, 7 Nov 2007 10:35:54 +0000 (02:35 -0800)]
[IPVS]: Bind connections on stanby if the destination exists

This patch fixes the problem with node overload on director fail-over.
Given the scenario: 2 nodes each accepting 3 connections at a time and 2
directors, director failover occurs when the nodes are fully loaded (6
connections to the cluster) in this case the new director will assign
another 6 connections to the cluster, If the same real servers exist
there.

The problem turned to be in not binding the inherited connections to
the real servers (destinations) on the backup director. Therefore:
"ipvsadm -l" reports 0 connections:
root@test2:~# ipvsadm -l
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  test2.local:5999 wlc
  -> node473.local:5999           Route   1000   0          0
  -> node484.local:5999           Route   1000   0          0

while "ipvs -lnc" is right
root@test2:~# ipvsadm -lnc
IPVS connection entries
pro expire state       source             virtual            destination
TCP 14:56  ESTABLISHED 192.168.0.10:39164 192.168.0.222:5999
192.168.0.51:5999
TCP 14:59  ESTABLISHED 192.168.0.10:39165 192.168.0.222:5999
192.168.0.52:5999

So the patch I am sending fixes the problem by binding the received
connections to the appropriate service on the backup director, if it
exists, else the connection will be handled the old way. So if the
master and the backup directors are synchronized in terms of real
services there will be no problem with server over-committing since
new connections will not be created on the nonexistent real services
on the backup. However if the service is created later on the backup,
the binding will be performed when the next connection update is
received. With this patch the inherited connections will show as
inactive on the backup:

root@test2:~# ipvsadm -l
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  test2.local:5999 wlc
  -> node473.local:5999           Route   1000   0          1
  -> node484.local:5999           Route   1000   0          1

rumen@test2:~$ cat /proc/net/ip_vs
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP  C0A800DE:176F wlc
  -> C0A80033:176F      Route   1000   0          1
  -> C0A80032:176F      Route   1000   0          1

Regards,
Rumen Bogdanovski

Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
17 years ago[NET]: Remove Documentation/networking/pt.txt
Adrian Bunk [Wed, 7 Nov 2007 10:30:43 +0000 (02:30 -0800)]
[NET]: Remove Documentation/networking/pt.txt

There's no no point in keeping documentation for a driver that was
removed many years ago.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NET]: Remove Documentation/networking/routing.txt
Adrian Bunk [Wed, 7 Nov 2007 10:30:03 +0000 (02:30 -0800)]
[NET]: Remove Documentation/networking/routing.txt

This file is so outdated that I can't see any value in keeping it.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NET]: Remove Documentation/networking/ncsa-telnet
Adrian Bunk [Wed, 7 Nov 2007 10:29:33 +0000 (02:29 -0800)]
[NET]: Remove Documentation/networking/ncsa-telnet

Newsflash: There once was a version of NCSA telnet that had some bug.

Spotted by Pekka Pietikainen.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NET]: Remove comx driver docs.
Adrian Bunk [Wed, 7 Nov 2007 10:28:52 +0000 (02:28 -0800)]
[NET]: Remove comx driver docs.

The drivers have already been removed 3.5 years ago.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NET]: Remove Documentation/networking/Configurable
Adrian Bunk [Wed, 7 Nov 2007 10:26:15 +0000 (02:26 -0800)]
[NET]: Remove Documentation/networking/Configurable

After more than 11 years this file does no longer contain much useful
information.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NET]: Clean proto_(un)register from in-code ifdefs
Pavel Emelyanov [Wed, 7 Nov 2007 10:23:38 +0000 (02:23 -0800)]
[NET]: Clean proto_(un)register from in-code ifdefs

The struct proto has the per-cpu "inuse" counter, which is handled
with a special care. All the handling code hides under the ifdef
CONFIG_SMP and it introduces some code duplication and makes it
look worse than it could.

Clean this.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[IPSEC]: Fix crypto_alloc_comp error checking
Herbert Xu [Wed, 7 Nov 2007 10:21:47 +0000 (02:21 -0800)]
[IPSEC]: Fix crypto_alloc_comp error checking

The function crypto_alloc_comp returns an errno instead of NULL
to indicate error.  So it needs to be tested with IS_ERR.

This is based on a patch by Vicenç Beltran Querol.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[VLAN]: Fix SET_VLAN_INGRESS_PRIORITY_CMD ioctl
Patrick McHardy [Wed, 7 Nov 2007 09:31:32 +0000 (01:31 -0800)]
[VLAN]: Fix SET_VLAN_INGRESS_PRIORITY_CMD ioctl

Based on report and patch by Doug Kehn <rdkehn@yahoo.com>:

vconfig returns the following error when attempting to execute the
set_ingress_map command:

vconfig: socket or ioctl error for set_ingress_map: Operation not permitted

In vlan.c, vlan_ioctl_handler for SET_VLAN_INGRESS_PRIORITY_CMD
sets err = -EPERM and calls vlan_dev_set_ingress_priority.
vlan_dev_set_ingress_priority is a void function so err remains
at -EPERM and results in the vconfig error (even though the ingress
map was set).

Fix by setting err = 0 after the vlan_dev_set_ingress_priority call.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETNS]: Fix compiler error in net_namespace.c
Johann Felix Soden [Wed, 7 Nov 2007 09:30:30 +0000 (01:30 -0800)]
[NETNS]: Fix compiler error in net_namespace.c

Because net_free is called by copy_net_ns before its declaration, the
compiler gives an error. This patch puts net_free before copy_net_ns
to fix this.

The compiler error:
net/core/net_namespace.c: In function 'copy_net_ns':
net/core/net_namespace.c:97: error: implicit declaration of function 'net_free'
net/core/net_namespace.c: At top level:
net/core/net_namespace.c:104: warning: conflicting types for 'net_free'
net/core/net_namespace.c:104: error: static declaration of 'net_free' follows non-static declaration
net/core/net_namespace.c:97: error: previous implicit declaration of 'net_free' was here

The error was introduced by the '[NET]: Hide the dead code in the
net_namespace.c' patch (6a1a3b9f686bb04820a232cc1657ef2c45670709).

Signed-off-by: Johann Felix Soden <johfel@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[TTY]: Use tty_mode_ioctl() in network drivers.
Alan Cox [Wed, 7 Nov 2007 09:27:34 +0000 (01:27 -0800)]
[TTY]: Use tty_mode_ioctl() in network drivers.

We conciously make a change here - we permit mode and speed setting to
be done in things like SLIP mode. There isn't actually a technical
reason to disallow this. It's usually a silly thing to do but we can
do it and soemone might wish to do so.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[TTY]: Fix network driver interactions with TCGET/SET calls.
Alan Cox [Wed, 7 Nov 2007 09:24:56 +0000 (01:24 -0800)]
[TTY]: Fix network driver interactions with TCGET/SET calls.

Dave Miller noted various cases where line disciplines for things like
ppp go poking around in termios themselves in ways that broke with the
new termios code. Rather than have them all learning about termios
internals provide proper methods for this

- tty_mode_ioctl()

This handles all the terminal mode handling for speed/carrier
etc and none of the methods are ldisc dependant so they can be called
by any user

- tty_perform_flush()

This extracts the flush functionality and enables pppd the ppp
layer to share it cleanly.

The existing n_tty_ioctl code is refactored in this patch to provide
the new functions and to call them itself appropriately. This patch
has no (intended) behaviour changes and simply prepares for the other
fixes.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[PKT_SCHED] CLS_U32: Fix endianness problem with u32 classifier hash masks.
Radu Rendec [Wed, 7 Nov 2007 09:20:12 +0000 (01:20 -0800)]
[PKT_SCHED] CLS_U32: Fix endianness problem with u32 classifier hash masks.

While trying to implement u32 hashes in my shaping machine I ran into
a possible bug in the u32 hash/bucket computing algorithm
(net/sched/cls_u32.c).

The problem occurs only with hash masks that extend over the octet
boundary, on little endian machines (where htonl() actually does
something).

Let's say that I would like to use 0x3fc0 as the hash mask. This means
8 contiguous "1" bits starting at b6. With such a mask, the expected
(and logical) behavior is to hash any address in, for instance,
192.168.0.0/26 in bucket 0, then any address in 192.168.0.64/26 in
bucket 1, then 192.168.0.128/26 in bucket 2 and so on.

This is exactly what would happen on a big endian machine, but on
little endian machines, what would actually happen with current
implementation is 0x3fc0 being reversed (into 0xc03f0000) by htonl()
in the userspace tool and then applied to 192.168.x.x in the u32
classifier. When shifting right by 16 bits (rank of first "1" bit in
the reversed mask) and applying the divisor mask (0xff for divisor
256), what would actually remain is 0x3f applied on the "168" octet of
the address.

One could say is this can be easily worked around by taking endianness
into account in userspace and supplying an appropriate mask (0xfc03)
that would be turned into contiguous "1" bits when reversed
(0x03fc0000). But the actual problem is the network address (inside
the packet) not being converted to host order, but used as a
host-order value when computing the bucket.

Let's say the network address is written as n31 n30 ... n0, with n0
being the least significant bit. When used directly (without any
conversion) on a little endian machine, it becomes n7 ... n0 n8 ..n15
etc in the machine's registers. Thus bits n7 and n8 would no longer be
adjacent and 192.168.64.0/26 and 192.168.128.0/26 would no longer be
consecutive.

The fix is to apply ntohl() on the hmask before computing fshift,
and in u32_hash_fold() convert the packet data to host order before
shifting down by fshift.

With helpful feedback from Jamal Hadi Salim and Jarek Poplawski.

Signed-off-by: David S. Miller <davem@davemloft.net>