linux-2.6
17 years agoIB/ipath: Change UD to queue work requests like RC & UC
Ralph Campbell [Wed, 25 Jul 2007 18:08:28 +0000 (11:08 -0700)]
IB/ipath: Change UD to queue work requests like RC & UC

The code to post UD sends tried to process work requests at the time
ib_post_send() is called without using a WQE queue.  This was fine as
long as HW resources were available for sending a packet.  This patch
changes UD to be handled more like RC and UC and shares more code.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ipath: Performance optimization for CPU differences
Ralph Campbell [Tue, 24 Jul 2007 20:55:39 +0000 (13:55 -0700)]
IB/ipath: Performance optimization for CPU differences

Different processors have different ordering restrictions for write
combining.  By taking advantage of this, we can eliminate some write
barriers when writing to the send buffers.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ipath: iba6110 rev4 GPIO counters support
Arthur Jones [Thu, 2 Aug 2007 21:46:29 +0000 (14:46 -0700)]
IB/ipath: iba6110 rev4 GPIO counters support

On iba6110 rev4, support for three more IB counters was added: the
LocalLinkIntegrityError counter, the ExcessiveBufferOverrunErrors
counter, and error counting of flow control packets on an
invalid VL.  These counters trigger GPIO interrupts and the sw keeps
track of the counts.  Since we also use GPIO interrupts to signal packet
reception, we need to turn off the fast interrupts, or we risk losing a
GPIO interrupt.

Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ehca: Fix clipping of device limits to INT_MAX
Roland Dreier [Wed, 10 Oct 2007 02:59:18 +0000 (19:59 -0700)]
IB/ehca: Fix clipping of device limits to INT_MAX

Doing min_t(int, foo, INT_MAX) doesn't work correctly, because if foo
is bigger than INT_MAX, then when treated as a signed integer, it will
become negative and hence such an expression is just an elaborate NOP.

Fix such cases in ehca to do min_t(unsigned, foo, INT_MAX) instead.
This fixes negative reported values for max_cqe, max_pd and max_ah:

Before:

        max_cqe:                        -64
        max_pd:                         -1
        max_ah:                         -1

After:
        max_cqe:                        2147483647
        max_pd:                         2147483647
        max_ah:                         2147483647

Based on a bug report and fix from Anton Blanchard <anton@samba.org>.
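
For illustration, the effect can be reproduced in userspace.  This is a
minimal sketch, not the ehca code: min_t() below is a simplified
stand-in for the kernel macro, the helper names and raw values are
hypothetical, and it assumes a 32-bit int whose out-of-range
conversions wrap in the usual two's-complement way.

```c
#include <assert.h>
#include <limits.h>

/* Simplified stand-in for the kernel's min_t(); unlike the real macro
 * it evaluates its arguments more than once. */
#define min_t(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/* Signed compare: a value above INT_MAX turns negative before the
 * comparison, so it always "wins" and nothing is clipped. */
static int clip_signed(unsigned int hw_limit)
{
	return min_t(int, hw_limit, INT_MAX);
}

/* Unsigned compare: the value is compared as-is and clipped to INT_MAX. */
static int clip_unsigned(unsigned int hw_limit)
{
	return min_t(unsigned, hw_limit, INT_MAX);
}

/* Hypothetical raw values chosen to mirror the Before/After output. */
void min_t_self_check(void)
{
	assert(clip_signed(0xFFFFFFC0u) == -64);
	assert(clip_signed(0xFFFFFFFFu) == -1);
	assert(clip_unsigned(0xFFFFFFC0u) == INT_MAX);
	assert(clip_unsigned(100u) == 100);
}
```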

Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIPoIB/cm: Clean up initialization of QP attr in ipoib_cm_create_tx_qp()
Dotan Barak [Sun, 7 Oct 2007 07:30:48 +0000 (09:30 +0200)]
IPoIB/cm: Clean up initialization of QP attr in ipoib_cm_create_tx_qp()

Make the way QP is being created in ipoib_cm_create_tx_qp()
consistent with ipoib_cm_create_rx_qp().

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agomlx4_core: Use mmiowb() to avoid firmware commands getting jumbled up
Roland Dreier [Wed, 10 Oct 2007 02:59:18 +0000 (19:59 -0700)]
mlx4_core: Use mmiowb() to avoid firmware commands getting jumbled up

Firmware commands are sent to the HCA by writing multiple words to a
command register block.  Access to this block of registers is
serialized with a mutex.  However, on large SGI systems writes to the
register block may be reordered within the system interconnect and
reach the HCA in a different order than they were issued (even with
the mutex).  Fix this by adding an mmiowb() before dropping the mutex.

This bug was observed with real workloads with the similar FW command
code in the mthca driver, and adding the mmiowb() as in commit
66547550 ("IB/mthca: Use mmiowb() to avoid firmware commands getting
jumbled up") was confirmed to fix the problems, so we should add the
same fix to mlx4.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/mthca: Use mmiowb() to avoid firmware commands getting jumbled up
Roland Dreier [Wed, 10 Oct 2007 02:59:17 +0000 (19:59 -0700)]
IB/mthca: Use mmiowb() to avoid firmware commands getting jumbled up

Firmware commands are sent to the HCA by writing multiple words to a
command register block.  Access to this block of registers is
serialized with a mutex.  However, on large SGI systems, problems were
seen with multiple CPUs issuing FW commands at the same time, because
the writes to the register block may be reordered within the system
interconnect and reach the HCA in a different order than they were
issued (even with the mutex).  Fix this by adding an mmiowb() before
dropping the mutex.

Tested-by: Arthur Kepner <akepner@sgi.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoRDMA/cma: Queue IB CM MRAs to avoid unnecessary remote retries
Sean Hefty [Wed, 1 Aug 2007 21:47:16 +0000 (14:47 -0700)]
RDMA/cma: Queue IB CM MRAs to avoid unnecessary remote retries

Automatically queue MRA message to decrease the number of retries sent
by the remote side during connection establishment.  This also has the
effect of increasing the overall connection timeout without using a
longer retry time in the case of dropped packets.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/cm: Modify interface to send MRAs in response to duplicate messages
Sean Hefty [Wed, 1 Aug 2007 20:49:53 +0000 (13:49 -0700)]
IB/cm: Modify interface to send MRAs in response to duplicate messages

The IB CM provides a message receipt acknowledgement (MRA) message that
can be sent to indicate that a REQ or REP message has been received, but
will require more time to process than the timeout specified by those
messages.  In many cases, the application may not know how long it will
take to respond to a CM message, but most of the time it will respond
before a retry has been sent.  Rather than sending an
MRA in response to all messages just to handle the case where a longer
timeout is needed, it is more efficient to queue the MRA for sending in
case a duplicate message is received.

This avoids sending an MRA when it is not needed, but limits the number
of times that a REQ or REP will be resent.  It also provides for a
simpler implementation than generating the MRA based on a timer event.
(That is, trying to send the MRA after receiving the first REQ or REP if
a response has not been generated, so that it is received at the remote
side before a duplicate REQ or REP has been received)

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/mthca: Increase max number of QPs per multicast group to 56
Roland Dreier [Wed, 10 Oct 2007 02:59:17 +0000 (19:59 -0700)]
IB/mthca: Increase max number of QPs per multicast group to 56

Increase the number of QPs allowed per multicast group from 8 to 56.
This allows for one QP per core on 16-core systems, which are now
quite common, and allows some space for future growth.

This is basically the same patch that Jack Morgenstein
<jackm@dev.mellanox.co.il> just supplied for mlx4.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agomlx4_core: Increase max number of QPs per multicast group to 56
Jack Morgenstein [Tue, 2 Oct 2007 07:40:13 +0000 (09:40 +0200)]
mlx4_core: Increase max number of QPs per multicast group to 56

Increase the number of QPs allowed per multicast group from 8 to 56.
This allows for one QP per core on 16-core systems, which are now
quite common, and allows some space for future growth.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/mlx4: Implement FMRs
Jack Morgenstein [Wed, 1 Aug 2007 09:29:05 +0000 (12:29 +0300)]
IB/mlx4: Implement FMRs

Implement FMRs for mlx4.  This is an adaptation of code from mthca.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agomlx4_core: Write MTTs from CPU instead of with WRITE_MTT FW command
Jack Morgenstein [Wed, 1 Aug 2007 09:28:53 +0000 (12:28 +0300)]
mlx4_core: Write MTTs from CPU instead of with WRITE_MTT FW command

Write MTT entries directly to ICM from the driver (eliminating use of
WRITE_MTT command).  This reduces the number of FW commands needed to
register an MR by at least a factor of 2 and speeds up memory
registration significantly.  This code will also be used to implement
FMRs.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agomlx4_core: Fix meaning of dev->caps.reserved_mtts
Roland Dreier [Wed, 10 Oct 2007 02:59:16 +0000 (19:59 -0700)]
mlx4_core: Fix meaning of dev->caps.reserved_mtts

Everything that uses caps.reserved_mtts expects it to be a count of MTT
segments, not MTT entries.  So convert the value that the FW gives us to
a count of segments.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agomlx4_core: Reserve the correct number of MTT segments
Roland Dreier [Wed, 10 Oct 2007 02:59:16 +0000 (19:59 -0700)]
mlx4_core: Reserve the correct number of MTT segments

Taking ilog2(dev->caps.reserved_mtts) to find out the order to pass to
the MTT buddy allocator will do the wrong thing if reserved_mtts is ever
not a power of 2.  Be safe and use fls(dev->caps.reserved_mtts - 1).
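
The difference between the two expressions is easiest to see with a
count that is not a power of 2.  The following is a userspace sketch
only: fls_sketch() and ilog2_sketch() are hypothetical stand-ins that
follow the documented semantics of the kernel's fls() and ilog2(), and
the counts are made-up examples.

```c
#include <assert.h>

/* Position of the most significant set bit, counting from 1;
 * returns 0 for x == 0, like the kernel's fls(). */
static int fls_sketch(unsigned int x)
{
	int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/* floor(log2(x)) for x > 0, like the kernel's ilog2(). */
static int ilog2_sketch(unsigned int x)
{
	return fls_sketch(x) - 1;
}

/* Order that rounds down: too small when n is not a power of 2. */
static int order_round_down(unsigned int n)
{
	return ilog2_sketch(n);
}

/* Order that rounds up: 1 << order always covers n (for n >= 1). */
static int order_round_up(unsigned int n)
{
	return fls_sketch(n - 1);
}

void order_self_check(void)
{
	assert(order_round_down(16) == 4 && order_round_up(16) == 4);
	assert(order_round_down(20) == 4);	/* 1 << 4 = 16 < 20 */
	assert(order_round_up(20) == 5);	/* 1 << 5 = 32 >= 20 */
}
```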

Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agomlx4_core: Support ICM tables in coherent memory
Jack Morgenstein [Wed, 1 Aug 2007 09:28:20 +0000 (12:28 +0300)]
mlx4_core: Support ICM tables in coherent memory

Enable having ICM tables in coherent memory, and use coherent memory
for the dMPT table.  This will allow writing MPT entries for MRs both
via the SW2HW_MPT command and also directly by the driver for FMR
remapping without needing to flush or worry about cacheline boundaries.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/uverbs: Make ib_uverbs_release_event_file() static
Roland Dreier [Wed, 10 Oct 2007 02:59:15 +0000 (19:59 -0700)]
IB/uverbs: Make ib_uverbs_release_event_file() static

ib_uverbs_release_event_file() is only used in uverbs_main.c, so make it
static to that file.  Also move the definition before the first use, so
a forward declaration is not needed.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/umad: Fix bit ordering and 32-on-64 problems on big endian systems
Roland Dreier [Wed, 10 Oct 2007 02:59:15 +0000 (19:59 -0700)]
IB/umad: Fix bit ordering and 32-on-64 problems on big endian systems

The declaration of struct ib_user_mad_reg_req.method_mask[] exported
to userspace was an array of __u32, but the kernel internally treated
it as a bitmap made up of longs.  This makes a difference for 64-bit
big-endian kernels, where numbering the bits in an array of __u32 gives:

    |31.....0|63....32|95....64|127...96|

while numbering the bits in an array of longs gives:

    |63..............0|127............64|

64-bit userspace can handle this by just treating method_mask[] as an
array of longs, but 32-bit userspace is really stuck: the meaning of
the bits in method_mask[] depends on whether the kernel is 32-bit or
64-bit, and there's no sane way for userspace to know that.

Fix this by updating <rdma/ib_user_mad.h> to make it clear that
method_mask[] is an array of longs, and using a compat_ioctl method to
convert to an array of 64-bit longs to handle the 32-on-64 problem.
This fixes the interface description to match existing behavior (so
working binaries continue to work) in almost all situations, and gives
consistent semantics in the case of 32-bit userspace that can run on
either a 32-bit or 64-bit kernel, so that the same binary can work for
both 32-on-32 and 32-on-64 systems.
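
The mismatch between the two numberings can be sketched as plain index
arithmetic.  This is an illustration only, assuming a 64-bit big-endian
kernel; the function names are hypothetical and no real memory layout
is touched, so it gives the same answers on any host.

```c
#include <assert.h>

/* 32-bit word holding bit n when method_mask[] is numbered as an
 * array of 32-bit words. */
static unsigned int word_if_u32_array(unsigned int n)
{
	return n / 32;
}

/* 32-bit word holding bit n when the same memory is numbered as an
 * array of 64-bit longs on a big-endian machine: the high half of
 * each long comes first in memory. */
static unsigned int word_if_be64_longs(unsigned int n)
{
	unsigned int long_idx = n / 64;
	unsigned int in_high_half = (n % 64) >= 32;

	return 2 * long_idx + (in_high_half ? 0 : 1);
}

void method_mask_self_check(void)
{
	/* Bit 0 lands in word 0 under one numbering, word 1 under the other. */
	assert(word_if_u32_array(0) == 0);
	assert(word_if_be64_longs(0) == 1);
	assert(word_if_u32_array(32) == 1);
	assert(word_if_be64_longs(32) == 0);
}
```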

Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/umad: Add P_Key index support
Roland Dreier [Wed, 10 Oct 2007 02:59:15 +0000 (19:59 -0700)]
IB/umad: Add P_Key index support

Add support for setting the P_Key index of sent MADs and getting the
P_Key index of received MADs.  This requires a change to the layout of
the ABI structure struct ib_user_mad_hdr, so to avoid breaking
compatibility, we default to the old (unchanged) ABI and add a new
ioctl IB_USER_MAD_ENABLE_PKEY that allows applications that are aware
of the new ABI to opt into using it.

We plan on switching to the new ABI by default in a year or so, and
this patch adds a warning that is printed when an application uses the
old ABI, to push people towards converting to the new ABI.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Hal Rosenstock <hal@xsigo.com>
17 years agoIB/ehca: Return srq_attr->max_sge in ehca_query_srq()
Joachim Fenkes [Fri, 28 Sep 2007 15:20:05 +0000 (17:20 +0200)]
IB/ehca: Return srq_attr->max_sge in ehca_query_srq()

Totally forgot this.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ehca: Adjust 64-bit alignment of create QP response for userspace
Hoang-Nam Nguyen [Fri, 28 Sep 2007 15:18:47 +0000 (17:18 +0200)]
IB/ehca: Adjust 64-bit alignment of create QP response for userspace

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ehca: Fix mem leak of firmware ctrlblock in ehca_create_srq()
Hoang-Nam Nguyen [Fri, 28 Sep 2007 15:16:27 +0000 (17:16 +0200)]
IB/ehca: Fix mem leak of firmware ctrlblock in ehca_create_srq()

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/mlx4: Display misc device information under /sys/class/infiniband/
Jack Morgenstein [Tue, 18 Sep 2007 07:14:18 +0000 (09:14 +0200)]
IB/mlx4: Display misc device information under /sys/class/infiniband/

Display the following device information under /sys/class/infiniband/mlx4_X:
board_id, fw_ver, hw_rev, hca_type.

This patch makes this information available to userspace utilities
such as ibstat and ibv_devinfo.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/core: Fix handling of multicast response failures
Ralph Campbell [Thu, 20 Sep 2007 23:33:44 +0000 (16:33 -0700)]
IB/core: Fix handling of multicast response failures

I was looking at the code for multicast.c and noticed that
ib_sa_join_multicast() calls queue_join() which puts the
request at the front of the group->pending_list.  If this
is a second request, it seems like it would interfere with
process_join_error() since group->last_join won't point
to the member at the head of the pending_list. The sequence
would thus be:

1. ib_sa_join_multicast()
   puts member1 on head of pending_list and starts work thread
2. mcast_work_handler()
   calls send_join() which sets group->last_join to member1
3. ib_sa_join_multicast()
   puts member2 on head of pending_list
4. join operation for member1 receives failure response from SA.
5. join_handler() is called with error status
6. process_join_error() fails to process member1 since
   it doesn't match the first entry in the group->pending_list.

The impact is that the failed join request is tossed.  The second
request is processed, and after it completes, the original request ends
up being retried.

This change also results in join requests being processed in FIFO
order.
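
The ordering problem in steps 1-6 can be modelled with a tiny array
queue.  This is a sketch under assumptions: the types and helpers are
hypothetical, member IDs stand in for the real member structures, and
inserting at the tail is assumed here as the way to get the FIFO order
described above (the actual list calls in multicast.c are not shown in
this message).

```c
#include <assert.h>
#include <string.h>

#define MAX_PENDING 8

struct pending_list {
	int ids[MAX_PENDING];	/* ids[0] is the head of the list */
	int count;
};

/* Insert at the front: the newest request becomes the head. */
static void add_head(struct pending_list *l, int id)
{
	if (l->count >= MAX_PENDING)
		return;
	memmove(&l->ids[1], &l->ids[0], l->count * sizeof(l->ids[0]));
	l->ids[0] = id;
	l->count++;
}

/* Insert at the back: the oldest request stays at the head (FIFO). */
static void add_tail(struct pending_list *l, int id)
{
	if (l->count >= MAX_PENDING)
		return;
	l->ids[l->count++] = id;
}

/* Error handling only works if the in-flight join is still the head. */
static int head_matches(const struct pending_list *l, int last_join)
{
	return l->count > 0 && l->ids[0] == last_join;
}

void pending_self_check(void)
{
	struct pending_list front = { {0}, 0 }, fifo = { {0}, 0 };

	add_head(&front, 1);	/* member1 queued, becomes last_join */
	add_head(&front, 2);	/* member2 displaces it at the head */
	assert(!head_matches(&front, 1));

	add_tail(&fifo, 1);
	add_tail(&fifo, 2);
	assert(head_matches(&fifo, 1));
}
```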

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ehca: Misc cpuinit section annotations and #ifdef cleanups
Satyam Sharma [Wed, 22 Aug 2007 23:28:30 +0000 (04:58 +0530)]
IB/ehca: Misc cpuinit section annotations and #ifdef cleanups

* Replace {un}register_cpu_notifier with {un}register_hotcpu_notifier
  thereby losing a couple of #ifdef HOTPLUG_CPU pairs.
* Move comp_pool_callback_nb declaration to below that of callback
  function so that initialization of .notifier_call and .priority can
  occur at build time itself and not runtime.
* Mark the notifier_block (and callback function, and another static
  function used by it) as __cpuinit{data} for the sake of consistency
  and remove enclosing #ifdef. (This may increase size for modular
  build of this module, however, because these are no longer dropped
  unconditionally now.)

Signed-off-by: Satyam Sharma <satyam@infradead.org>
Acked-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agomlx4_core: Change capability decoding: SRC->XRC
Roland Dreier [Wed, 10 Oct 2007 02:59:13 +0000 (19:59 -0700)]
mlx4_core: Change capability decoding: SRC->XRC

The SRC ("scalable RC") transport has been renamed to XRC ("extended
RC"), to avoid having an abbreviation that is so easily confused with an
abbreviation for "source."  Update the HCA capability decoding output to
use the new name.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/iser: Remove unnecessary includes
Roland Dreier [Wed, 10 Oct 2007 02:59:13 +0000 (19:59 -0700)]
IB/iser: Remove unnecessary includes

<asm/scatterlist.h> is not needed because everyplace it appears,
<linux/scatterlist.h> also appears.  <asm/io.h> is not needed because
nothing seems to be using device IO anyway.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoRDMA/cma: Use neigh_event_send() to start neighbour discovery
Steve Wise [Wed, 12 Sep 2007 10:00:25 +0000 (05:00 -0500)]
RDMA/cma: Use neigh_event_send() to start neighbour discovery

Calling arp_send() to initiate neighbour discovery (ND) doesn't do the
full ND protocol.  Namely, it doesn't handle retransmitting the arp
request if it is dropped. The function neigh_event_send() does all
this.  Without doing full ND, RDMA address resolution fails in the
presence of dropped ARP broadcast packets.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ehca: Only use MR large pages for hugetlb regions
Joachim Fenkes [Thu, 13 Sep 2007 16:16:20 +0000 (18:16 +0200)]
IB/ehca: Only use MR large pages for hugetlb regions

...because, on virtualized hardware like System p, we can't be sure
that the physical pages behind them are contiguous otherwise.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/umem: Add hugetlb flag to struct ib_umem
Joachim Fenkes [Thu, 13 Sep 2007 16:15:28 +0000 (18:15 +0200)]
IB/umem: Add hugetlb flag to struct ib_umem

During ib_umem_get(), determine whether all pages from the memory
region are hugetlb pages and report this in the "hugetlb" member.
Low-level drivers can use this information if they need it.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/srp: Add QoS support through service ID
Sean Hefty [Wed, 8 Aug 2007 22:51:18 +0000 (15:51 -0700)]
IB/srp: Add QoS support through service ID

Provide the target service ID when performing a path record query to
support optional QoS capability.  QoS requires support from the SA.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoRDMA/ucma: Allow user space to set service type
Sean Hefty [Wed, 8 Aug 2007 22:51:13 +0000 (15:51 -0700)]
RDMA/ucma: Allow user space to set service type

Export the ability to set the type of service to user space.  Model
the interface after setsockopt.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoRDMA/cma: Add ability to specify type of service
Sean Hefty [Wed, 8 Aug 2007 22:51:06 +0000 (15:51 -0700)]
RDMA/cma: Add ability to specify type of service

Provide support to specify a type of service for a communication
identifier.  A new function call is used when dealing with IPv4
addresses.  For IPv6 addresses, the ToS is specified through the
traffic class field in the sockaddr_in6 structure.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
[ The comments Eitan Zahavi and myself have made over the v1 post at
  <http://lists.openfabrics.org/pipermail/general/2007-August/039247.html>
  were fully addressed. ]

Reviewed-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/sa: Add new QoS fields to path record
Sean Hefty [Wed, 8 Aug 2007 22:41:28 +0000 (15:41 -0700)]
IB/sa: Add new QoS fields to path record

The QoS annex defines new fields for path records.  Add them to the
ib_sa for consumers that want to use them.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIPoIB: Specify Traffic Class with path record queries for QoS support
Sean Hefty [Thu, 2 Aug 2007 19:21:31 +0000 (12:21 -0700)]
IPoIB: Specify Traffic Class with path record queries for QoS support

To support QoS within and between subnets, modify IPoIB to request
specific Traffic Class values with path record queries, using
the value associated with the IPoIB broadcast group.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
[ See some comments I made on this at v1 and v2 of the posts
  <http://lists.openfabrics.org/pipermail/general/2007-August/039275.html>
  <http://lists.openfabrics.org/pipermail/general/2007-September/040312.html> ]

Reviewed-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ehca: Fix large page HW cap defines
Hoang-Nam Nguyen [Thu, 13 Sep 2007 16:14:58 +0000 (18:14 +0200)]
IB/ehca: Fix large page HW cap defines

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ehca: Bump version number and change its format
Joachim Fenkes [Tue, 11 Sep 2007 13:35:32 +0000 (15:35 +0200)]
IB/ehca: Bump version number and change its format

Nobody needed the SVNEHCA_ prefix anyway.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ehca: Replace get_paca()->paca_index by the more portable raw_smp_processor_id()
Joachim Fenkes [Wed, 12 Sep 2007 14:44:11 +0000 (16:44 +0200)]
IB/ehca: Replace get_paca()->paca_index by the more portable raw_smp_processor_id()

We can use raw_smp_processor_id() here because the processor ID is
only used for debug output and therefore does not need to be preemption-safe.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ehca: Serialize MR alloc and MR free hvCalls
Joachim Fenkes [Tue, 11 Sep 2007 13:34:35 +0000 (15:34 +0200)]
IB/ehca: Serialize MR alloc and MR free hvCalls

Some firmware levels exhibit a race condition between H_ALLOC_RESOURCE(MR)
and H_FREE_RESOURCE(MR).  Work around this problem by locking these hvCalls
against each other.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ehca: Path migration support
Joachim Fenkes [Tue, 11 Sep 2007 13:34:04 +0000 (15:34 +0200)]
IB/ehca: Path migration support

Fix some modify_qp() issues related to path migration.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ehca: Add check for max #SGE to create_qp()
Joachim Fenkes [Tue, 11 Sep 2007 13:33:40 +0000 (15:33 +0200)]
IB/ehca: Add check for max #SGE to create_qp()

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ehca: ehca_gen_warn() should always print
Joachim Fenkes [Tue, 11 Sep 2007 13:32:50 +0000 (15:32 +0200)]
IB/ehca: ehca_gen_warn() should always print

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ehca: Print return codes as signed decimal integers
Joachim Fenkes [Tue, 11 Sep 2007 13:32:22 +0000 (15:32 +0200)]
IB/ehca: Print return codes as signed decimal integers

...because -12 is easier to read than FFFFFFF4.
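
As a quick illustration of the two formats, assuming a 32-bit int (the
helper name is hypothetical; on Linux, 12 is ENOMEM):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Render the same return code as unsigned hex and as signed decimal. */
static void format_rc(int rc, char *hex, size_t hex_len,
		      char *dec, size_t dec_len)
{
	snprintf(hex, hex_len, "%X", (unsigned int)rc);
	snprintf(dec, dec_len, "%d", rc);
}

void format_rc_self_check(void)
{
	char hex[16], dec[16];

	format_rc(-12, hex, sizeof(hex), dec, sizeof(dec));
	assert(strcmp(hex, "FFFFFFF4") == 0);
	assert(strcmp(dec, "-12") == 0);
}
```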

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ehca: Refactor hvcall tracing
Joachim Fenkes [Tue, 11 Sep 2007 13:31:49 +0000 (15:31 +0200)]
IB/ehca: Refactor hvcall tracing

Change hvcall trace output towards better readability: reg numbers
instead of argument numbers, return code as signed decimal instead of
unsigned hex.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ehca: Use remap_4k_pfn() to map firmware contexts to user space
Hoang-Nam Nguyen [Tue, 11 Sep 2007 13:31:06 +0000 (15:31 +0200)]
IB/ehca: Use remap_4k_pfn() to map firmware contexts to user space

Use Paul's new remap_4k_pfn() function to map our 4K firmware contexts
into user space on 64K-page machines without exposing neighboring
firmware contexts. Return the context's offset within a 64K page to
user space so it can determine the proper virtual address.

For details about remap_4k_pfn(), see commit 721151d0 or
http://patchwork.ozlabs.org/linuxppc/patch?id=10281

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ehca: Support more than 4k QPs for userspace and kernelspace
Stefan Roscher [Tue, 11 Sep 2007 13:29:39 +0000 (15:29 +0200)]
IB/ehca: Support more than 4k QPs for userspace and kernelspace

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ehca: Small QP userspace support
Stefan Roscher [Tue, 11 Sep 2007 13:26:33 +0000 (15:26 +0200)]
IB/ehca: Small QP userspace support

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/mthca: Use PCI-X/PCI-Express read control interfaces
Peter Oruba [Fri, 10 Aug 2007 20:54:33 +0000 (13:54 -0700)]
IB/mthca: Use PCI-X/PCI-Express read control interfaces

These driver changes incorporate the proposed PCI-X / PCI-Express read
byte count interface.  Reading and setting those values doesn't take
place "manually", instead wrapping functions are called to allow
quirks for some PCI bridges.

Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Based on work by Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/sa: Error handling thinko fix
Ali Ayoub [Sun, 9 Sep 2007 11:55:11 +0000 (14:55 +0300)]
IB/sa: Error handling thinko fix

ib_create_send_mad() returns an error code pointer on error, not NULL.
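
The distinction matters because an error pointer is never NULL, so a
NULL check silently accepts it.  Below is a userspace sketch of the
convention; err_ptr(), ptr_err() and is_err() are hypothetical
stand-ins modelled on the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(),
and create_buf_sketch() is a made-up function, not ib_create_send_mad().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in the pointer itself. */
static void *err_ptr(long error)
{
	return (void *)error;
}

/* Recover the errno value from an error pointer. */
static long ptr_err(const void *ptr)
{
	return (long)ptr;
}

/* Error pointers occupy the top MAX_ERRNO addresses. */
static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Made-up allocator that reports failure as an error pointer. */
static void *create_buf_sketch(int fail)
{
	static int buf;

	return fail ? err_ptr(-ENOMEM) : (void *)&buf;
}

void err_ptr_self_check(void)
{
	void *p = create_buf_sketch(1);

	assert(p != NULL);		/* a NULL check misses the failure */
	assert(is_err(p));		/* the error-pointer check catches it */
	assert(ptr_err(p) == -ENOMEM);
	assert(!is_err(create_buf_sketch(0)));
}
```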

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ehca: Export module parameters in sysfs
Anton Blanchard [Wed, 29 Aug 2007 17:43:01 +0000 (12:43 -0500)]
IB/ehca: Export module parameters in sysfs

At the moment the ehca module parameters are not exported in sysfs.
Export them with 0444 permissions.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ehca: Make output clearer by removing some debug messages
Anton Blanchard [Wed, 29 Aug 2007 16:05:35 +0000 (11:05 -0500)]
IB/ehca: Make output clearer by removing some debug messages

ehca spits out a lot of debugging information. I had to look closely to
see the "Port 1 is not active" message within all the debug output:

eHCA Infiniband Device Driver (Rel.: SVNEHCA_0022)
eHCA scaling code enabled
ehca D.001.DQDXYCB-P1-C9: PU0006 EHCA_ERR:ehca_define_sqp Port 1 is not active.
ehca D.001.DQDXYCB-P1-C9: PU0006 EHCA_ERR:ehca_create_qp ehca_define_sqp() failed rc=ffffffffffffffff
ib_mad: Couldn't create ib_mad QP1
ib_mad: Couldn't open ehca0 port 1
ehca D.001.DQDXYCB-P1-C9: PU0006 EHCA_ERR:ehca_alloc_fmr unsupported fmr_attr->page_shift=9
ehca D.001.DQDXYCB-P1-C9: PU0006 EHCA_ERR:ehca_alloc_fmr rc=ffffffffffffffea pd=c000000b4b5b2420 mr_access_flags=7 fmr_attr=c0000005afd37394
fmr_create failed for FMR 0

Remove a few debug statements so that things are clearer:

eHCA Infiniband Device Driver (Rel.: SVNEHCA_0022)
eHCA scaling code enabled
ehca D.001.DQDXYCB-P1-C9: PU0006 EHCA_ERR:ehca_define_sqp Port 1 is not active.
ib_mad: Couldn't create ib_mad QP1
ib_mad: Couldn't open ehca0 port 1
ehca D.001.DQDXYCB-P1-C9: PU0006 EHCA_ERR:ehca_alloc_fmr unsupported fmr_attr->page_shift=9
fmr_create failed for FMR 0

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/mlx4: Fix up SRQ limit_watermark endianness
Roland Dreier [Wed, 10 Oct 2007 02:59:06 +0000 (19:59 -0700)]
IB/mlx4: Fix up SRQ limit_watermark endianness

mlx4_srq_query() returns a big-endian 16-bit value through an int *,
which screws up sparse checking.  Fix this so that a CPU-endian value
is returned.
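
To show why the byte order matters, here is a small userspace sketch.
The helper names and the example watermark value are hypothetical; the
decoding works on raw bytes, so it behaves the same on any host.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a big-endian 16-bit field from its two bytes in memory. */
static uint16_t be16_decode(const unsigned char bytes[2])
{
	return (uint16_t)((bytes[0] << 8) | bytes[1]);
}

/* What a little-endian CPU sees if it reads the same two bytes
 * without converting. */
static uint16_t le16_misread(const unsigned char bytes[2])
{
	return (uint16_t)((bytes[1] << 8) | bytes[0]);
}

void limit_watermark_self_check(void)
{
	/* A watermark of 16 stored big-endian. */
	const unsigned char raw[2] = { 0x00, 0x10 };

	assert(be16_decode(raw) == 16);
	assert(le16_misread(raw) == 4096);
}
```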

Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIPoIB: Fix error path memory leak
Eli Cohen [Tue, 21 Aug 2007 15:46:10 +0000 (18:46 +0300)]
IPoIB: Fix error path memory leak

Clean up properly if ib_query_pkey() or ib_query_gid() fail.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIPoIB: Fix typo to end statement with ';' instead of ','
Eli Cohen [Wed, 10 Oct 2007 02:59:06 +0000 (19:59 -0700)]
IPoIB: Fix typo to end statement with ';' instead of ','

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/mthca: Enable MSI-X by default
Michael S. Tsirkin [Tue, 7 Aug 2007 13:10:34 +0000 (16:10 +0300)]
IB/mthca: Enable MSI-X by default

Recover from MSI-X errors by automatically falling back to regular
interrupts, instead of asking the user to do this manually.  This makes
it possible to enable MSI-X by default, and will make it possible to
get rid of the msi_x module option in the future.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agomlx4_core: Enable MSI-X by default
Michael S. Tsirkin [Tue, 7 Aug 2007 13:08:28 +0000 (16:08 +0300)]
mlx4_core: Enable MSI-X by default

Recover from MSI-X errors by automatically falling back to regular
interrupts, instead of asking the user to do this manually.  This makes
it possible to enable MSI-X by default, and will make it possible to
get rid of the msi_x module option in the future.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/fmr_pool: Clean up some error messages in fmr_pool.c
Anton Blanchard [Wed, 29 Aug 2007 13:36:22 +0000 (08:36 -0500)]
IB/fmr_pool: Clean up some error messages in fmr_pool.c

A number of printks in fmr_pool.c don't have newlines, eg:

    fmr_create failed for FMR 0<5>FS-Cache: Loaded

Fix them up.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ehca: Include <linux/mutex.h> from ehca_classes.h
Roland Dreier [Wed, 10 Oct 2007 02:59:05 +0000 (19:59 -0700)]
IB/ehca: Include <linux/mutex.h> from ehca_classes.h

ehca_classes.h uses struct mutex, so while <linux/mutex.h> seems to be
pulled in indirectly by one of the headers it includes, the right
thing is to include <linux/mutex.h> directly.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Acked-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/mlx4: Use __set_data_seg() in mlx4_ib_post_recv()
Roland Dreier [Wed, 10 Oct 2007 02:59:05 +0000 (19:59 -0700)]
IB/mlx4: Use __set_data_seg() in mlx4_ib_post_recv()

Use a __set_data_seg() helper in mlx4_ib_post_recv() too; in addition
to making the code easier to read, this also allows gcc to generate
better code -- on x86_64:

add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-8 (-8)
function                                     old     new   delta
mlx4_ib_post_recv                            359     351      -8

Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agomlx4_core: Don't free special QPs in QP number bitmap
Roland Dreier [Wed, 10 Oct 2007 02:59:05 +0000 (19:59 -0700)]
mlx4_core: Don't free special QPs in QP number bitmap

Special QPs are not allocated using the regular QP number bitmap, so
when they are destroyed, their QP number should not be freed in the
bitmap.

Found by Dotan Barak of Mellanox.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agomlx4_core: Use enum value GO_BIT_TIMEOUT_MSECS
Dotan Barak [Tue, 7 Aug 2007 08:18:52 +0000 (11:18 +0300)]
mlx4_core: Use enum value GO_BIT_TIMEOUT_MSECS

Rename GO_BIT_TIMEOUT to GO_BIT_TIMEOUT_MSECS for clarity, and
actually use it as the go bit timeout (instead of having the define
but then ignoring it and using a hard-coded 10 * HZ for the actual
timeout).
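
As a rough illustration of why a millisecond constant and a hard-coded
10 * HZ can describe the same timeout, here is a userspace sketch.  The
value 10000 is an assumption chosen to match 10 * HZ, and
msecs_to_ticks() is a hypothetical stand-in for the kernel's
msecs_to_jiffies().

```c
/* Assumed value: 10 seconds, chosen to match the old 10 * HZ. */
enum { GO_BIT_TIMEOUT_MSECS = 10000 };

/*
 * Hypothetical userspace stand-in for msecs_to_jiffies(): convert
 * milliseconds to timer ticks at a given tick rate, rounding up.
 */
unsigned long msecs_to_ticks(unsigned long msecs, unsigned long hz)
{
	return (msecs * hz + 999) / 1000;
}
```

At a tick rate of 250, msecs_to_ticks(GO_BIT_TIMEOUT_MSECS, 250) gives
2500 ticks, the same as 10 * 250.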

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB: find_first_zero_bit() takes unsigned pointer
Roland Dreier [Wed, 10 Oct 2007 02:59:04 +0000 (19:59 -0700)]
IB: find_first_zero_bit() takes unsigned pointer

Fix sparse warning

    drivers/infiniband/core/device.c:142:6: warning: incorrect type in argument 1 (different signedness)
    drivers/infiniband/core/device.c:142:6:    expected unsigned long const *addr
    drivers/infiniband/core/device.c:142:6:    got long *[assigned] inuse

by making the local variable inuse unsigned.  Does not affect generated
code at all.
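
For illustration, a simplified userspace analogue of a bitmap search
that takes an unsigned pointer.  first_zero_bit() is a hypothetical
name and a naive bit-by-bit loop, not the kernel implementation.

```c
#include <limits.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Return the index of the first clear bit in the bitmap, or size if
 * every one of the first size bits is set.
 */
unsigned long first_zero_bit(const unsigned long *addr, unsigned long size)
{
	unsigned long i;

	for (i = 0; i < size; i++)
		if (!(addr[i / BITS_PER_ULONG] & (1UL << (i % BITS_PER_ULONG))))
			return i;
	return size;
}
```

Declaring the bitmap as unsigned long, as the patch does for inuse,
keeps the argument type and the parameter type in agreement.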

Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIPoIB: Make sure no receives are handled when stopping device
Roland Dreier [Wed, 10 Oct 2007 02:59:04 +0000 (19:59 -0700)]
IPoIB: Make sure no receives are handled when stopping device

The current IPoIB code might process receive completions from
ipoib_drain_cq() when bringing down the interface.  This could cause
packets to be passed up the stack without the device's poll method
being called.  Avoid this by setting the status of any successful
completions to IB_WC_WR_FLUSH_ERR.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoRDMA/cxgb3: Make the iw_cxgb3 module parameters writable
Steve Wise [Sun, 29 Jul 2007 20:12:26 +0000 (15:12 -0500)]
RDMA/cxgb3: Make the iw_cxgb3 module parameters writable

Allow changing parameter values without having to reload the module.
This is safe because these parameters are only looked at when a new
connection is established.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoLinux 2.6.23 v2.6.23
Linus Torvalds [Tue, 9 Oct 2007 20:31:38 +0000 (13:31 -0700)]
Linux 2.6.23

17 years agoMerge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
Linus Torvalds [Tue, 9 Oct 2007 19:38:44 +0000 (12:38 -0700)]
Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus

* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
  [MIPS] Au1000: set the PCI controller IO base
  [MIPS] Alchemy: Fix USB initialization.
  [MIPS] IP32: Fix fatal typo in address computation.

17 years agoNLM: Fix a memory leak in nlmsvc_testlock
Trond Myklebust [Tue, 9 Oct 2007 15:04:57 +0000 (11:04 -0400)]
NLM: Fix a memory leak in nlmsvc_testlock

The recent fix for a circular lock dependency unfortunately introduced a
potential memory leak in the event where the call to nlmsvc_lookup_host
fails for some reason.

Thanks to Roel Kluin for spotting this.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agosata_mv: correct S/G table limits
Jeff Garzik [Tue, 9 Oct 2007 17:51:57 +0000 (13:51 -0400)]
sata_mv: correct S/G table limits

The recent mv_fill_sg() rewrite, to fix a data corruption problem
related to IOMMU virtual merging, forgot to account for the
potentially-increased size of the scatter/gather table after its run.

Additionally, the DMA boundary is reduced from 0xffffffff to 0xffff
to more closely match the needs of mv_fill_sg().

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years ago[MIPS] Au1000: set the PCI controller IO base
Florian Fainelli [Tue, 25 Sep 2007 15:07:30 +0000 (17:07 +0200)]
[MIPS] Au1000: set the PCI controller IO base

The PCI controller IO base was not set in the au1000 pci code.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John Crispin <blogic@openwrt.org>
Signed-off-by: Florian Fainelli <florian.fainelli@telecomint.eu>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
17 years ago[MIPS] Alchemy: Fix USB initialization.
Florian Fainelli [Tue, 25 Sep 2007 15:07:24 +0000 (17:07 +0200)]
[MIPS] Alchemy: Fix USB initialization.

This patch fixes a wrong ifdef in the board setup code, leading to the GPIO
pin not being pulled high, and thus the USB switch not being powered at all.

This finishes the rename of CONFIG_USB_OHCI to CONFIG_USB_OHCI_HCD, which
started in 2005 (before 2.6.12-rc2) and was then forgotten, probably because
things were working anyway for most people.

[Ralf: Paolo's original patch didn't fix the module case, Florian's patch
only fixed MTX1 etc. so this is a combined patch plus some cleanups.]

Cc: Giuseppe Patanè <giuseppe.patane@tvblob.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John Crispin <blogic@openwrt.org>
Signed-off-by: Florian Fainelli <florian.fainelli@telecomint.eu>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
17 years ago[MIPS] IP32: Fix fatal typo in address computation.
Giuseppe Sacco [Sat, 6 Oct 2007 17:55:03 +0000 (19:55 +0200)]
[MIPS] IP32: Fix fatal typo in address computation.

Signed-off-by: Giuseppe Sacco <eppesuig@debian.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
17 years agoCorrect Makefile rule for generating custom keymap
Maarten Bressers [Mon, 8 Oct 2007 22:59:13 +0000 (15:59 -0700)]
Correct Makefile rule for generating custom keymap

When building a custom keymap, after setting GENERATE_KEYMAP := 1 in
drivers/char/Makefile, the kernel build fails like this:

    CC      drivers/char/vt.o
  make[2]: *** No rule to make target `drivers/char/%.map', needed by `drivers/char/defkeymap.c'.  Stop.
  make[1]: *** [drivers/char] Error 2
  make: *** [drivers] Error 2

This was caused by commit af8b128719f5248e542036ea994610a29d0642a6, which
deleted a necessary colon from the Makefile rule that generates the keymap,
since that rule contains both a target and a target-pattern.  The following
patch puts the colon back:

Signed-off-by: Maarten Bressers <mbres@gentoo.org>
Cc: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoISDN: Fix data access out of array bounds
Karsten Keil [Mon, 8 Oct 2007 10:52:09 +0000 (12:52 +0200)]
ISDN: Fix data access out of array bounds

Fix an access to random data bytes outside the dev->chanmap array.
Thanks to Oliver Neukum for pointing me to this issue.

Signed-off-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoMerge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Mon, 8 Oct 2007 19:59:10 +0000 (12:59 -0700)]
Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
  [IPv6]: Fix ICMPv6 redirect handling with target multicast address
  [PKT_SCHED] cls_u32: error code isn't been propogated properly
  [ROSE]: Fix rose.ko oops on unload
  [TCP]: Fix fastpath_cnt_hint when GSO skb is partially ACKed

17 years agoAIO: fix cleanup in io_submit_one(...)
Yan Zheng [Mon, 8 Oct 2007 19:16:20 +0000 (12:16 -0700)]
AIO: fix cleanup in io_submit_one(...)

When IOCB_FLAG_RESFD flag is set and iocb->aio_resfd is incorrect,
statement 'goto out_put_req' is executed. At label 'out_put_req',
aio_put_req(..) is called, which requires 'req->ki_filp' set.

Signed-off-by: Yan Zheng <yanzheng@21cn.com>
Cc: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agofix page release issue in filemap_fault
Yan Zheng [Mon, 8 Oct 2007 17:08:37 +0000 (10:08 -0700)]
fix page release issue in filemap_fault

find_lock_page() increases the page's usage count, so we should decrease it
before returning VM_FAULT_SIGBUS.

Signed-off-by: Yan Zheng <yanzheng@21cn.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agofix VM_CAN_NONLINEAR check in sys_remap_file_pages
Yan Zheng [Mon, 8 Oct 2007 17:05:48 +0000 (10:05 -0700)]
fix VM_CAN_NONLINEAR check in sys_remap_file_pages

The test for VM_CAN_NONLINEAR always fails

Signed-off-by: Yan Zheng <yanzheng@21cn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agomm: set_page_dirty_balance() vs ->page_mkwrite()
Peter Zijlstra [Mon, 8 Oct 2007 16:54:37 +0000 (18:54 +0200)]
mm: set_page_dirty_balance() vs ->page_mkwrite()

All the current page_mkwrite() implementations also set the page dirty, which
results in the set_page_dirty_balance() call _not_ calling balance, because the
page is already found dirty.

This allows us to dirty a _lot_ of pages without ever hitting
balance_dirty_pages().  Not good (tm).

Force a balance call if ->page_mkwrite() was successful.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years ago[IPv6]: Fix ICMPv6 redirect handling with target multicast address
Brian Haley [Mon, 8 Oct 2007 07:12:05 +0000 (00:12 -0700)]
[IPv6]: Fix ICMPv6 redirect handling with target multicast address

When the ICMPv6 Target address is multicast, Linux processes the
redirect instead of dropping it.  The problem is in this code in
ndisc_redirect_rcv():

         if (ipv6_addr_equal(dest, target)) {
                 on_link = 1;
         } else if (!(ipv6_addr_type(target) & IPV6_ADDR_LINKLOCAL)) {
                 ND_PRINTK2(KERN_WARNING
                            "ICMPv6 Redirect: target address is not link-local.\n");
                 return;
         }

This second check will succeed if the Target address is, for example,
FF02::1 because it has link-local scope.  Instead, it should be checking
if it's a unicast link-local address, as stated in RFC 2461/4861 Section
8.1:

       - The ICMP Target Address is either a link-local address (when
         redirected to a router) or the same as the ICMP Destination
         Address (when redirected to the on-link destination).

I know this doesn't explicitly say unicast link-local address, but it's
implied.
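
The distinction can be shown with a small userspace sketch built on the
POSIX address macros.  It is an analogue of the check, not the kernel
code: has_linklocal_scope() and target_is_unicast_linklocal() are
hypothetical helper names.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>

/* True for any address with link-local scope, unicast or multicast. */
bool has_linklocal_scope(const char *text)
{
	struct in6_addr a;

	if (inet_pton(AF_INET6, text, &a) != 1)
		return false;
	return IN6_IS_ADDR_LINKLOCAL(&a) || IN6_IS_ADDR_MC_LINKLOCAL(&a);
}

/* True only for unicast link-local (fe80::/10) addresses. */
bool target_is_unicast_linklocal(const char *text)
{
	struct in6_addr a;

	if (inet_pton(AF_INET6, text, &a) != 1)
		return false;
	return IN6_IS_ADDR_LINKLOCAL(&a) && !IN6_IS_ADDR_MULTICAST(&a);
}
```

FF02::1 passes the scope-only test but fails the unicast test, while
fe80::1 passes both, so only the second form rejects a multicast
Target address.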

This bug is preventing Linux kernels from achieving IPv6 Logo Phase II
certification because of a recent error that was found in the TAHI test
suite - Neighbor Discovery suite test 206 (v6LC.2.3.6_G) had the
multicast address in the Destination field instead of Target field, so
we were passing the test.  This won't be the case anymore.

The patch below fixes this problem, and also fixes ndisc_send_redirect()
to not send an invalid redirect with a multicast address in the Target
field.  I re-ran the TAHI Neighbor Discovery section to make sure Linux
passes all 245 tests now.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Acked-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[PKT_SCHED] cls_u32: error code isn't been propogated properly
Stephen Hemminger [Mon, 8 Oct 2007 06:57:45 +0000 (23:57 -0700)]
[PKT_SCHED] cls_u32: error code isn't been propogated properly

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[ROSE]: Fix rose.ko oops on unload
Alexey Dobriyan [Mon, 8 Oct 2007 06:44:17 +0000 (23:44 -0700)]
[ROSE]: Fix rose.ko oops on unload

Commit a3d384029aa304f8f3f5355d35f0ae274454f7cd aka
"[AX.25]: Fix unchecked rose_add_loopback_neigh uses"
transformed the rose_loopback_neigh variable into a statically allocated one.
However, on unload it is still kfree'd, which can't work.

Steps to reproduce:

modprobe rose
rmmod rose

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
 printing eip:
c014c664
*pde = 00000000
Oops: 0000 [#1]
PREEMPT DEBUG_PAGEALLOC
Modules linked in: rose ax25 fan ufs loop usbhid rtc snd_intel8x0 snd_ac97_codec ehci_hcd ac97_bus uhci_hcd thermal usbcore button processor evdev sr_mod cdrom
CPU:    0
EIP:    0060:[<c014c664>]    Not tainted VLI
EFLAGS: 00210086   (2.6.23-rc9 #3)
EIP is at kfree+0x48/0xa1
eax: 00000556   ebx: c1734aa0   ecx: f6a5e000   edx: f7082000
esi: 00000000   edi: f9a55d20   ebp: 00200287   esp: f6a5ef28
ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068
Process rmmod (pid: 1823, ti=f6a5e000 task=f7082000 task.ti=f6a5e000)
Stack: f9a55d20 f9a5200c 00000000 00000000 00000000 f6a5e000 f9a5200c f9a55a00
       00000000 bf818cf0 f9a51f3f f9a55a00 00000000 c0132c60 65736f72 00000000
       f69f9630 f69f9528 c014244a f6a4e900 00200246 f7082000 c01025e6 00000000
Call Trace:
 [<f9a5200c>] rose_rt_free+0x1d/0x49 [rose]
 [<f9a5200c>] rose_rt_free+0x1d/0x49 [rose]
 [<f9a51f3f>] rose_exit+0x4c/0xd5 [rose]
 [<c0132c60>] sys_delete_module+0x15e/0x186
 [<c014244a>] remove_vma+0x40/0x45
 [<c01025e6>] sysenter_past_esp+0x8f/0x99
 [<c012bacf>] trace_hardirqs_on+0x118/0x13b
 [<c01025b6>] sysenter_past_esp+0x5f/0x99
 =======================
Code: 05 03 1d 80 db 5b c0 8b 03 25 00 40 02 00 3d 00 40 02 00 75 03 8b 5b 0c 8b 73 10 8b 44 24 18 89 44 24 04 9c 5d fa e8 77 df fd ff <8b> 56 08 89 f8 e8 84 f4 fd ff e8 bd 32 06 00 3b 5c 86 60 75 0f
EIP: [<c014c664>] kfree+0x48/0xa1 SS:ESP 0068:f6a5ef28

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[TCP]: Fix fastpath_cnt_hint when GSO skb is partially ACKed
Ilpo Järvinen [Mon, 8 Oct 2007 06:43:10 +0000 (23:43 -0700)]
[TCP]: Fix fastpath_cnt_hint when GSO skb is partially ACKed

When only a GSO skb was partially ACKed, no hints are reset,
therefore fastpath_cnt_hint must be tweaked too or else it can
corrupt fackets_out. For the corruption to occur, one must have a
non-trivial ACK/SACK sequence, so this bug is not often
harmful. There's a fackets_out state reset in TCP because
fackets_out is known to be inaccurate, and that fixes the issue
eventually anyway.

In case there was also at least one skb that got fully ACKed,
the fastpath_skb_hint is set to NULL which causes a recount for
fastpath_cnt_hint (the old value won't be accessed anymore),
thus it can safely be decremented without additional checking.

Reported by Cedric Le Goater <clg@fr.ibm.com>

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years agoDriver core: fix SYSF_DEPRECATED breakage for nested classdevs
Dmitry Torokhov [Sun, 7 Oct 2007 16:22:21 +0000 (12:22 -0400)]
Driver core: fix SYSF_DEPRECATED breakage for nested classdevs

We should only reparent to a class those former class devices that
form the base of the class hierarchy. Nested devices should still
grow from their real parents.

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Tested-by: Andrey Borzenkov <arvidjaar@mail.ru>
Tested-by: Anssi Hannula <anssi.hannula@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394...
Linus Torvalds [Sun, 7 Oct 2007 23:41:09 +0000 (16:41 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
  firewire: point to migration document

17 years agoAdd manufacturer and card id of teltonica pcmcia modems
Attila Kinali [Sun, 7 Oct 2007 07:24:38 +0000 (00:24 -0700)]
Add manufacturer and card id of teltonica pcmcia modems

Add the manufacturer and card id of teltonica pcmcia modems to serial_cs.c

Signed-off-by: Attila Kinali <attila@kinali.ch>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agosysrq docs: document sequence that actually works
Pavel Machek [Sun, 7 Oct 2007 07:24:37 +0000 (00:24 -0700)]
sysrq docs: document sequence that actually works

Document the sequence of keypresses that actually works. Yes, this changed
a year or so ago.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agofix bogus reporting of signals by audit
Al Viro [Sun, 7 Oct 2007 07:24:36 +0000 (00:24 -0700)]
fix bogus reporting of signals by audit

Async signals should not be reported as sent by current in audit log.  As
it is, we call audit_signal_info() too early in check_kill_permission().
Note that check_kill_permission() has that test already - it needs to know
if it should apply current-based permission checks.  So the solution is to
move the call of audit_signal_info() between those.

Bogosity in question is easily reproduced - add a rule watching for e.g.
kill(2) from specific process (so that audit_signal_info() would not
short-circuit to nothing), say load_policy, watch the bogus OBJ_PID entry
in audit logs claiming that write(2) on selinuxfs file issued by
load_policy(8) had somehow managed to send a signal to syslogd...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Steve Grubb <sgrubb@redhat.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoMove kasprintf.o to obj-y
Alexey Dobriyan [Sun, 7 Oct 2007 07:24:34 +0000 (00:24 -0700)]
Move kasprintf.o to obj-y

Modular lguest started giving linking errors:

MODPOST 1 modules
ERROR: "kasprintf" [drivers/lguest/lg.ko] undefined!

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agolockstat: documentation
Peter Zijlstra [Sun, 7 Oct 2007 07:24:33 +0000 (00:24 -0700)]
lockstat: documentation

Provide some documentation for CONFIG_LOCK_STAT.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Cc: Rob Landley <rob@landley.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoLonghaul: add auto enabled "revid_errata" option
Rafal Bilski [Sun, 7 Oct 2007 07:24:32 +0000 (00:24 -0700)]
Longhaul: add auto enabled "revid_errata" option

VIA C3 Ezra-T has RevisionID equal to 1, but it needs RevisionKey to be 0
or CPU will ignore new frequency and will continue to work at old
frequency.  New "revid_errata" option will force RevisionKey to be set to
0, whatever RevisionID is.

Additionally, "Longhaul" will not silently ignore an unsuccessful transition.
It will try to check if the "revid_errata" or "disable_acpi_c3" options need to
be enabled for this processor/system.

The same applies to Longhaul ver. 2 support.  It will be disabled if none of
the above options work.

 Best case scenario (with patch applied and v2 enabled):
 longhaul: VIA C3 'Ezra' [C5C] CPU detected.  Longhaul v2 supported.
 longhaul: Using northbridge support.
 longhaul: VRM 8.5
 longhaul: Max VID=1.350  Min VID=1.050, 13 possible voltage scales
 longhaul: f: 300000 kHz, index: 0, vid: 1050 mV
 [...]
 longhaul: Voltage scaling enabled.
 Worst case scenario:
 longhaul: VIA C3 'Ezra-T' [C5M] CPU detected.  Powersaver supported.
 longhaul: Using northbridge support.
 longhaul: Using ACPI support.
 longhaul: VRM 8.5
 longhaul: Claims to support voltage scaling but min & max are both 1.250. Voltage scaling disabled
 longhaul: Failed to set requested frequency!
 longhaul: Enabling "Ignore Revision ID" option.
 longhaul: Failed to set requested frequency!
 longhaul: Disabling ACPI C3 support.
 longhaul: Disabling "Ignore Revision ID" option.
 longhaul: Failed to set requested frequency!
 longhaul: Enabling "Ignore Revision ID" option.

[akpm@linux-foundation.org: coding-style cleanups]
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoFix timer_stats printout of events/sec
Anton Blanchard [Sun, 7 Oct 2007 07:24:31 +0000 (00:24 -0700)]
Fix timer_stats printout of events/sec

When using /proc/timer_stats on ppc64 I noticed the events/sec field wasn't
accurate.  Sometimes the integer part was incorrect due to rounding (we
weren't taking the fractional seconds into consideration).

The fraction part is also wrong; we need to pad the printf statement and
take the bottom three digits of 1000 times the value.
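
A minimal sketch of that arithmetic, using integer math only.  It is
not the kernel code: format_events_per_sec() is a hypothetical helper,
and it takes the sample period in milliseconds for simplicity.

```c
#include <stdio.h>

/*
 * Format events/sec with three decimals.  period_ms is the sample
 * period in milliseconds and must be non-zero.
 */
int format_events_per_sec(char *buf, size_t len,
			  unsigned long events, unsigned long period_ms)
{
	/* scaled is events/sec multiplied by 1000 */
	unsigned long long scaled =
		(unsigned long long)events * 1000ULL * 1000ULL / period_ms;

	/* %03llu pads the fraction so that 8 prints as 008 */
	return snprintf(buf, len, "%llu.%03llu",
			scaled / 1000, scaled % 1000);
}
```

For 5 events over 4960 ms the scaled value is 1008, so the padded
format prints 1.008; an unpadded fraction would print 1.8 instead.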

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoDon't do load-average calculations at even 5-second intervals
Linus Torvalds [Sun, 7 Oct 2007 23:17:38 +0000 (16:17 -0700)]
Don't do load-average calculations at even 5-second intervals

It turns out that there are a few other five-second timers in the
kernel, and if the timers get in sync, the load-average can get
artificially inflated by events that just happen to coincide.

So just offset the load average calculation by a timer tick.
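
The effect of the offset can be illustrated by counting how often two
periodic timers fire on the same tick.  This is a userspace sketch
under the assumption of HZ=100 (so five seconds is 500 ticks);
coincidences() is a hypothetical helper.

```c
/*
 * Count the ticks in 1..total_ticks on which two periodic timers,
 * both started at tick 0, expire together.
 */
unsigned long coincidences(unsigned long period_a, unsigned long period_b,
			   unsigned long total_ticks)
{
	unsigned long t, hits = 0;

	for (t = 1; t <= total_ticks; t++)
		if (t % period_a == 0 && t % period_b == 0)
			hits++;
	return hits;
}
```

Over 100000 ticks, two 500-tick timers coincide 200 times, while a
501-tick timer and a 500-tick timer never coincide: their first common
expiry is at tick 250500.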

Noticed by Anders Boström, for whom the coincidence started triggering
on one of his machines with the JBD jiffies rounding code (JBD is one of
the subsystems that also end up using a 5-second timer by default).

Tested-by: Anders Boström <anders@bostrom.dyndns.org>
Cc: Chuck Ebbert <cebbert@redhat.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoVT_WAITACTIVE: Avoid returning EINTR when not necessary
Linus Torvalds [Sun, 7 Oct 2007 23:02:55 +0000 (16:02 -0700)]
VT_WAITACTIVE: Avoid returning EINTR when not necessary

We should generally prefer to return ERESTARTNOHAND rather than EINTR,
so that processes with unhandled signals that get ignored don't return
EINTR.

This can help with X startup issues:

    Fatal server error:
    xf86OpenConsole: VT_WAITACTIVE failed: Interrupted system call

although the real fix is having the X server always retry EINTR
regardless (since EINTR does happen for signals that have handlers
installed). Keithp has a patch for that.
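
The userspace side of that advice, retrying on EINTR, might look like
the sketch below.  It is illustrative only and is not the X server
patch: retry_on_eintr() and flaky_call() are hypothetical names, and
flaky_call() simulates an interrupted system call in place of the real
VT_WAITACTIVE ioctl.

```c
#include <errno.h>

/* Repeat the call for as long as it fails with EINTR. */
int retry_on_eintr(int (*call)(void *), void *arg)
{
	int ret;

	do {
		ret = call(arg);
	} while (ret == -1 && errno == EINTR);
	return ret;
}

/* Simulated system call: fails with EINTR until *remaining is zero. */
int flaky_call(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		errno = EINTR;
		return -1;
	}
	return 0;
}
```

With a counter of 3, retry_on_eintr(flaky_call, &counter) absorbs three
interruptions and then returns 0.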

Regardless, ERESTARTNOHAND is the correct thing to use.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agofirewire: point to migration document
Stefan Richter [Sun, 7 Oct 2007 10:31:22 +0000 (12:31 +0200)]
firewire: point to migration document

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
17 years agoMerge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
Linus Torvalds [Sat, 6 Oct 2007 22:47:16 +0000 (15:47 -0700)]
Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus

* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
  [MIPS] IP32: Enable PCI bridges

17 years agoRevert "intel_agp: fix stolen mem range on G33"
Kyle McMartin [Sat, 6 Oct 2007 05:42:34 +0000 (01:42 -0400)]
Revert "intel_agp: fix stolen mem range on G33"

This reverts commit f443675affe3f16dd428e46f0f7fd3f4d703eeab, which
breaks horribly if you aren't running an unreleased xf86-video-intel
driver out of git.

Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Zhenyu Wang <zhenyu.z.wang@intel.com>
Acked-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoFix non-terminated PCI match table in PowerMac IDE
Benjamin Herrenschmidt [Sat, 6 Oct 2007 08:52:27 +0000 (18:52 +1000)]
Fix non-terminated PCI match table in PowerMac IDE

The PCI device table in the powermac IDE driver isn't properly
terminated.  Depending on how your kernel is linked and other random
factors, you can end up with this driver matched against any other PCI
device in your system, possibly crashing at boot.
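
Why the terminator matters can be seen in a simplified sketch of a
sentinel-terminated match table.  The struct, the IDs and match_id()
are hypothetical and much smaller than the real PCI structures.

```c
#include <stddef.h>

/* Hypothetical, cut-down device ID entry. */
struct dev_id {
	unsigned int vendor;
	unsigned int device;
};

/* Illustrative IDs only; the all-zero entry terminates the table. */
const struct dev_id example_ids[] = {
	{ 0x1234, 0x0001 },
	{ 0x1234, 0x0002 },
	{ 0, 0 }
};

/*
 * Walk the table until the all-zero terminator.  Without that entry
 * the loop would run past the array into whatever data follows it.
 */
const struct dev_id *match_id(const struct dev_id *table,
			      unsigned int vendor, unsigned int device)
{
	for (; table->vendor != 0; table++)
		if (table->vendor == vendor && table->device == device)
			return table;
	return NULL;
}
```

A lookup for an unlisted device returns NULL only because the walk
stops at the terminator; with the last entry missing, the result would
depend on the memory that happens to follow the table.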

Thanks to Heikki for tracking this down with me; the bug has been there
for some time, though it rarely hurts due to luck.  In this case, the
switch from .22 to .23-rc9 is causing it to show up due to differences
in the resulting layout of .data I suppose.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <pmac@au1.ibm.com>
Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
Cc: Heikki Lindholm <holindho@cs.helsinki.fi>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoxen: disable split pte locks for now
Jeremy Fitzhardinge [Sat, 6 Oct 2007 00:19:35 +0000 (17:19 -0700)]
xen: disable split pte locks for now

When pinning and unpinning pagetables, we must protect them against
being used by other CPUs, lest they see the pagetable in an
intermediate read-only-but-not-pinned state.

When using split pte locks, doing this properly would require taking
all the pte locks for the pagetable while pinning, but this may overflow
the PREEMPT_BITS part of the preempt counter if the process has mapped
more than about 512M of memory.
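
One way to arrive at the 512M figure, as a sketch: it assumes that
PREEMPT_BITS is 8 and that each pte page holds 512 entries mapping
4 KiB pages (as with PAE).  Neither number is stated in this commit
message, and max_mapped_bytes() is a hypothetical helper.

```c
/*
 * Memory covered when one pte lock is held per pte page and the
 * number of held locks reaches 2^preempt_bits.
 */
unsigned long long max_mapped_bytes(unsigned int preempt_bits,
				    unsigned int ptes_per_page,
				    unsigned long long page_size)
{
	unsigned long long max_locks = 1ULL << preempt_bits;

	return max_locks * ptes_per_page * page_size;
}
```

Under those assumptions, max_mapped_bytes(8, 512, 4096) is 256 locks
times 2 MiB per pte page, which is 512 MiB.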

However, failing to take the pte locks causes write-protect faults when
the pageout code is trying to clear the Access bit on a pte which is part
of a freshly created and still being pinned process after fork.

This is a short-term fix until the problem is solved properly.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Keir Fraser <keir@xensource.com>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoMerge master.kernel.org:/home/rmk/linux-2.6-arm
Linus Torvalds [Fri, 5 Oct 2007 21:09:10 +0000 (14:09 -0700)]
Merge master.kernel.org:/home/rmk/linux-2.6-arm

* master.kernel.org:/home/rmk/linux-2.6-arm:
  [ARM] 4598/2: OSIRIS: Ensure we do not get nRSTOUT during suspend
  [ARM] 4597/2: OSIRIS: ensure CPLD0 is preserved after suspend

17 years ago[ARM] 4598/2: OSIRIS: Ensure we do not get nRSTOUT during suspend
Ben Dooks [Thu, 4 Oct 2007 22:18:08 +0000 (23:18 +0100)]
[ARM] 4598/2: OSIRIS: Ensure we do not get nRSTOUT during suspend

Ensure nRSTOUT is not asserted during suspend or on resume.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>