Eli Cohen [Tue, 15 Jul 2008 06:48:53 +0000 (23:48 -0700)]
IB/mlx4: Use kzalloc() for new QPs so flags are initialized to 0
Current code uses kmalloc() and then just does a bitwise OR operation on
qp->flags in create_qp_common(), which means that qp->flags may
potentially have some unintended bits set. This patch uses kzalloc()
and avoids further explicit clearing of structure members, which also
shrinks the code:
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-65 (-65)
function old new delta
create_qp_common 2024 1959 -65
Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
mlx4_core: Use MOD_STAT_CFG command to get minimal page size
There was a bug in some versions of the mlx4 driver in
mlx4_alloc_fmr(), which hardcoded the minimum acceptable page_shift to
be 12. However, new ConnectX firmware can support a minimum
page_shift of 9 (log_pg_sz of 9 returned by QUERY_DEV_LIM) -- so with
old drivers, ib_fmr_alloc() would fail for ULPs using the device
minimum when creating FMRs.
To preserve firmware compatibility with released mlx4 drivers, the
firmware will continue to return 12 as before for log_page_sz in
QUERY_DEV_CAP for these drivers. However, to enable new drivers to
take advantage of the available smaller page size, the mlx4 driver now
first sets the log_pg_sz to the device minimum by setting a
log_page_sz value to 0 via the MOD_STAT_CFG command and then reading
the real minimum via QUERY_DEV_CAP.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Or Gerlitz [Tue, 15 Jul 2008 06:48:53 +0000 (23:48 -0700)]
RDMA/cma: Simplify locking needed for serialization of callbacks
The RDMA CM has some logic in place to make sure that callbacks on a
given CM ID are delivered to the consumer in a serialized manner.
Specifically it has code to protect against a device removal racing
with a running callback function.
This patch simplifies this logic by using a mutex per ID instead of a
wait queue and atomic variable. This means that cma_disable_remove()
now is more properly named to cma_disable_callback(), and
cma_enable_remove() can now be removed because it just would become a
trivial wrapper around mutex_unlock().
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Or Gerlitz [Tue, 15 Jul 2008 06:48:53 +0000 (23:48 -0700)]
RDMA/addr: Keep pointer to netdevice in struct rdma_dev_addr
Keep a pointer to the local (src) netdevice in struct rdma_dev_addr,
and copy it in as part of rdma_copy_addr(). Use rdma_translate_ip()
in cma_new_conn_id() to reduce some code duplication and also make
sure the src_dev member gets set.
In a high-availability configuration the netdevice pointer can be used
by the RDMA CM to align RDMA sessions to use the same links as the IP
stack does under fail-over and route change cases.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Steve Wise [Tue, 15 Jul 2008 06:48:53 +0000 (23:48 -0700)]
RDMA/cxgb3: Fixes for zero STag
Handling the zero STag in receive work request requires some extra
logic in the driver:
- Only set the QP_PRIV bit for kernel mode QPs.
- Add a zero STag build function for recv wrs. The uP needs a PBL
allocated and passed down in the recv WR so it can construct a HW
PBL for the zero STag S/G entries. Note: we need to place a few
restrictions on zero STag usage because of this:
1) all SGEs in a recv WR must either be zero STag or not. No mixing.
2) an individual SGE length cannot exceed 128MB for a zero-stag SGE.
This should be OK since it's not really practical to allocate
such a large chunk of pinned contiguous DMA mapped memory.
- Add an optimized non-zero-STag recv wr format for kernel users.
This is needed to optimize both zero and non-zero STag cracking in
the recv path for kernel users.
- Remove the iwch_ prefix from the static build functions.
- Bump required FW version.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Steve Wise [Tue, 15 Jul 2008 06:48:53 +0000 (23:48 -0700)]
RDMA/core: Add local DMA L_Key support
- Change the IB_DEVICE_ZERO_STAG flag to the transport-neutral name
IB_DEVICE_LOCAL_DMA_LKEY, which is used by iWARP RNICs to indicate 0
STag support and IB HCAs to indicate reserved L_Key support.
- Add a u32 local_dma_lkey member to struct ib_device. Drivers fill
this in with the appropriate local DMA L_Key (if they support it).
- Fix up the drivers using this flag.
Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Roland Dreier [Tue, 15 Jul 2008 06:48:52 +0000 (23:48 -0700)]
IB/mthca: Fix check of max_send_sge for special QPs
The MLX transport requires two extra gather entries for sends (one for
the header and one for the checksum at the end, as the comment says).
However the code checked that max_recv_sge was not too big, instead of
checking max_send_sge as it should have. Fix the code to check the
correct condition.
Eli Cohen [Tue, 15 Jul 2008 06:48:52 +0000 (23:48 -0700)]
IPoIB: Double default RX/TX ring sizes
Increase IPoIB ring sizes to twice their original sizes (RX: 128->256,
TX: 64->128) to act as a shock absorber for high traffic peaks. With
the current settings, we have seen cases that there are many calls to
netif_stop_queue(), which causes degradation in throughput. Also,
larger receive buffer sizes help IPoIB in CM mode to avoid experiencing
RNR NAK conditions due to insufficient receive buffers at the SRQ.
Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Eli Cohen [Tue, 15 Jul 2008 06:48:52 +0000 (23:48 -0700)]
IPoIB/cm: Reduce connected mode TX object size
Since IPoIB connected mode does not NETIF_F_SG, we only have one DMA
mapping per send, so we don't need a mapping[] array. Define a new
struct with a single u64 mapping member and use it for the CM tx_ring.
Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Ralph Campbell [Tue, 15 Jul 2008 06:48:52 +0000 (23:48 -0700)]
IB/ipath: Use IEEE OUI for vendor_id reported by ibv_query_device()
The IB spe. for SubnGet(NodeInfo) and query HCA says that the vendor
ID field should be the IEEE OUI assigned to the vendor. The ipath
driver was returning the PCI vendor ID instead. This will affect
applications which call ibv_query_device(). The old value was
0x001fc1 or 0x001077, the new value is 0x001175.
The vendor ID doesn't appear to be exported via /sys so that should
reduce possible compatibility issues. I'm only aware of Open MPI as a
major application which depends on this change, and they have made
necessary adjustments.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Eli Cohen [Tue, 15 Jul 2008 06:48:51 +0000 (23:48 -0700)]
IPoIB: Use dev_set_mtu() to change mtu
When the driver sets the MTU of the net device outside of its
change_mtu method, it should make use of dev_set_mtu() instead of
directly setting the mtu field of struct netdevice. Otherwise
functions registered to be called upon MTU change will not get called
(this is done through call_netdevice_notifiers() in dev_set_mtu()).
Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Eli Cohen [Tue, 15 Jul 2008 06:48:51 +0000 (23:48 -0700)]
IPoIB: Use rtnl lock/unlock when changing device flags
Use of this lock is required to synchronize changes to the netdvice's
data structs. Also move the call to ipoib_flush_paths() after the
modification of the netdevice flags in set_mode().
Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Eli Cohen [Tue, 15 Jul 2008 06:48:50 +0000 (23:48 -0700)]
IPoIB: Only set Q_Key once: after joining broadcast group
The current code will set the Q_Key for any join of a non-sendonly
multicast group. The operation involves a modify QP operation, which
is fairly heavyweight, and is only really required after the join of
the broadcast group. Fix this by adding a parameter to ipoib_mcast_attach()
to control when the Q_Key is set.
Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Jon Mason [Tue, 15 Jul 2008 06:48:49 +0000 (23:48 -0700)]
RDMA/cxgb3: Propagate HW page size capabilities
cxgb3 does not currently report the page size capabilities, and
incorrectly reports them internally.
This version changes the bit-shifting to a static value (per Steve's
request).
Signed-off-by: Jon Mason <jon@opengridcomputing.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Moni Shoua [Tue, 15 Jul 2008 06:48:49 +0000 (23:48 -0700)]
IPoIB: Refresh paths instead of flushing them on SM change events
The patch tries to solve the problem of device going down and paths being
flushed on an SM change event. The method is to mark the paths as candidates for
refresh (by setting the new valid flag to 0), and wait for an ARP
probe a new path record query.
The solution requires a different and less intrusive handling of SM
change event. For that, the second argument of the flush function
changes its meaning from a boolean flag to a level. In most cases, SM
failover doesn't cause LID change so traffic won't stop. In the rare
cases of LID change, the remote host (the one that hadn't changed its
LID) will lose connectivity until paths are refreshed. This is no
worse than the current state. In fact, preventing the device from
going down saves packets that otherwise would be lost.
Signed-off-by: Moni Levy <monil@voltaire.com> Signed-off-by: Moni Shoua <monis@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add "ipoib_use_lro" module parameter to enable LRO and an
"ipoib_lro_max_aggr" module parameter to set the max number of packets
to be aggregated. Make LRO controllable and LRO statistics accessible
through ethtool.
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il> Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Ron Livne [Tue, 15 Jul 2008 06:48:48 +0000 (23:48 -0700)]
IPoIB: Use multicast loopback blocking if available
Set IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK for IPoIB's UD QPs if
supported by the underlying device. This creates an improvement of up
to 39% in bandwidth when sending multicast packets with IPoIB, and an
improvment of 12% in cpu usage.
Signed-off-by: Ron Livne <ronli@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Ron Livne [Tue, 15 Jul 2008 06:48:48 +0000 (23:48 -0700)]
IB/core: Add support for multicast loopback blocking
This patch also adds a creation flag for QPs,
IB_QP_CREATE_MULTICAST_BLOCK_LOOPBACK, which when set means that
multicast sends from the QP to a group that the QP is attached to will
not be looped back to the QP's receive queue. This can be used to
save receive resources when a consumer does not want a local copy of
multicast traffic; for example IPoIB must waste CPU time throwing away
such local copies of multicast traffic.
This patch also adds a device capability flag that shows whether a
device supports this feature or not.
Signed-off-by: Ron Livne <ronli@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Steve Wise [Tue, 15 Jul 2008 06:48:48 +0000 (23:48 -0700)]
RDMA/core: Add iWARP protocol statistics attributes in sysfs
This patch adds a sysfs attribute group called "proto_stats" under
/sys/class/infiniband/$device/ and populates this group with protocol
statistics if they exist for a given device. Currently, only iWARP
stats are defined, but the code is designed to allow InfiniBand
protocol stats if they become available. These stats are per-device
and more importantly -not- per port.
Details:
- Add union rdma_protocol_stats in ib_verbs.h. This union allows
defining transport-specific stats. Currently only iwarp stats are
defined.
- Add struct iw_protocol_stats to define the current set of iwarp
protocol stats.
- Add new ib_device method called get_proto_stats() to return protocol
statistics.
- Add logic in core/sysfs.c to create iwarp protocol stats attributes
if the device is an RNIC and has a get_proto_stats() method.
Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Roland Dreier [Tue, 15 Jul 2008 06:48:47 +0000 (23:48 -0700)]
IPoIB/cm: Fix racy use of receive WR/SGL in ipoib_cm_post_receive_nonsrq()
For devices that don't support SRQs, ipoib_cm_post_receive_nonsrq() is
called from both ipoib_cm_handle_rx_wc() and ipoib_cm_nonsrq_init_rx(),
and these two callers are not synchronized against each other.
However, ipoib_cm_post_receive_nonsrq() always reuses the same receive
work request and scatter list structures, so multiple callers can end
up stepping on each other, which leads to posting garbled work
requests.
Fix this by having the caller pass in the ib_recv_wr and ib_sge
structures to use, and allocating new local structures in
ipoib_cm_nonsrq_init_rx().
Based on a patch by Pradeep Satyanarayana <pradeep@us.ibm.com> and
David Wilder <dwilder@us.ibm.com>, with debugging help from Hoang-Nam
Nguyen <hnguyen@de.ibm.com>.
Stefan Roscher [Tue, 15 Jul 2008 06:48:47 +0000 (23:48 -0700)]
IB/ehca: In case of lost interrupts, trigger EOI to reenable interrupts
During corner case testing, we noticed that some versions of ehca do
not properly transition to interrupt done in special load situations.
This can be resolved by periodically triggering EOI through H_EOI, if
EQEs are pending.
Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Roland Dreier [Tue, 15 Jul 2008 06:48:46 +0000 (23:48 -0700)]
IB/mlx4: Remove extra code for RESET->ERR QP state transition
Commit 65adfa91 ("IB/mlx4: Fix RESET to RESET and RESET to ERROR
transitions") added some extra code to handle a QP state transition
from RESET to ERROR. However, the latest 1.2.1 version of the IB spec
has clarified that this transition is actually not allowed, so we can
remove this extra code again.
Roland Dreier [Tue, 15 Jul 2008 06:48:46 +0000 (23:48 -0700)]
IB/mthca: Remove extra code for RESET->ERR QP state transition
Commit b18aad71 ("IB/mthca: Fix RESET to ERROR transition") added some
extra code to handle a QP state transition from RESET to ERROR.
However, the latest 1.2.1 version of the IB spec has clarified that
this transition is actually not allowed, so we can remove this extra
code again.
Ralph Campbell [Tue, 15 Jul 2008 06:48:46 +0000 (23:48 -0700)]
IB/core: Reset to error QP state transition is not allowed
I was reviewing the QP state transition diagram in the IB 1.2.1 spec
and the code for qp_state_table[], and noticed that the code allows a
QP to be modified from IB_QPS_RESET to IB_QPS_ERR whereas the notes
for figure 124 (pg 457) specifically says that this transition isn't
allowed. This is a clarification from earlier versions of the IB
spec, which were ambiguous in this area and suggested that the RESET
to ERR transition was allowed.
Fix up the qp_state_table[] to make RESET->ERR not allowed.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Eli Cohen [Tue, 15 Jul 2008 06:48:45 +0000 (23:48 -0700)]
IB/mlx4: Configure QPs' max message size based on real device capability
ConnectX returns the max message size it supports through the
QUERY_DEV_CAP firmware command. When modifying a QP to RTR, the max
message size for the QP must be specified. This value must not exceed
the value declared through QUERY_DEV_CAP. The current code ignores
the max allowed size and unconditionally sets the value to 2^31. This
patch sets all QPs to the max value allowed as returned from firmware.
Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Steve Wise [Tue, 15 Jul 2008 06:48:45 +0000 (23:48 -0700)]
RDMA/cxgb3: MEM_MGT_EXTENSIONS support
- set IB_DEVICE_MEM_MGT_EXTENSIONS capability bit if fw supports it.
- set max_fast_reg_page_list_len device attribute.
- add iwch_alloc_fast_reg_mr function.
- add iwch_alloc_fastreg_pbl
- add iwch_free_fastreg_pbl
- adjust the WQ depth for kernel mode work queues to account for
fastreg possibly taking 2 WR slots.
- add fastreg_mr work request support.
- add local_inv work request support.
- add send_with_inv and send_with_se_inv work request support.
- removed useless duplicate enums/defines for TPT/MW/MR stuff.
Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Steve Wise [Tue, 15 Jul 2008 06:48:45 +0000 (23:48 -0700)]
RDMA/core: Add memory management extensions support
This patch adds support for the IB "base memory management extension"
(BMME) and the equivalent iWARP operations (which the iWARP verbs
mandates all devices must implement). The new operations are:
- Allocate an ib_mr for use in fast register work requests.
- Allocate/free a physical buffer lists for use in fast register work
requests. This allows device drivers to allocate this memory as
needed for use in posting send requests (eg via dma_alloc_coherent).
- New send queue work requests:
* send with remote invalidate
* fast register memory region
* local invalidate memory region
* RDMA read with invalidate local memory region (iWARP only)
Consumer interface details:
- A new device capability flag IB_DEVICE_MEM_MGT_EXTENSIONS is added
to indicate device support for these features.
- New send work request opcodes IB_WR_FAST_REG_MR, IB_WR_LOCAL_INV,
IB_WR_RDMA_READ_WITH_INV are added.
- A new consumer API function, ib_alloc_mr() is added to allocate
fast register memory regions.
- New consumer API functions, ib_alloc_fast_reg_page_list() and
ib_free_fast_reg_page_list() are added to allocate and free
device-specific memory for fast registration page lists.
- A new consumer API function, ib_update_fast_reg_key(), is added to
allow the key portion of the R_Key and L_Key of a fast registration
MR to be updated. Consumers call this if desired before posting
a IB_WR_FAST_REG_MR work request.
Consumers can use this as follows:
- MR is allocated with ib_alloc_mr().
- Page list memory is allocated with ib_alloc_fast_reg_page_list().
- MR R_Key/L_Key "key" field is updated with ib_update_fast_reg_key().
- MR made VALID and bound to a specific page list via
ib_post_send(IB_WR_FAST_REG_MR)
- MR made INVALID via ib_post_send(IB_WR_LOCAL_INV),
ib_post_send(IB_WR_RDMA_READ_WITH_INV) or an incoming send with
invalidate operation.
- MR is deallocated with ib_dereg_mr()
- page lists dealloced via ib_free_fast_reg_page_list().
Applications can allocate a fast register MR once, and then can
repeatedly bind the MR to different physical block lists (PBLs) via
posting work requests to a send queue (SQ). For each outstanding
MR-to-PBL binding in the SQ pipe, a fast_reg_page_list needs to be
allocated (the fast_reg_page_list is owned by the low-level driver
from the consumer posting a work request until the request completes).
Thus pipelining can be achieved while still allowing device-specific
page_list processing.
The 32-bit fast register memory key/STag is composed of a 24-bit index
and an 8-bit key. The application can change the key each time it
fast registers thus allowing more control over the peer's use of the
key/STag (ie it can effectively be changed each time the rkey is
rebound to a page list).
Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Eli Cohen [Tue, 15 Jul 2008 06:48:44 +0000 (23:48 -0700)]
IPoIB: Copy small received SKBs in connected mode
The connected mode implementation in the IPoIB driver has a large
overhead in the way SKBs are handled in the receive flow. It usually
allocates an SKB with as big as was used in the currently received SKB
and moves unused fragments from the old SKB to the new one. This
involves a loop on all the remaining fragments and incurs overhead on
the CPU. This patch, for small SKBs, allocates an SKB just large
enough to contain the received data and copies to it the data from the
received SKB. The newly allocated SKB is passed to the stack and the
old SKB is reposted.
When running netperf, UDP small messages, without this pach I get:
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
14.4.3.178 (14.4.3.178) port 0 AF_INET
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # 10^6bits/sec
With this patch I get both send and receive at ~315 mbps.
The reason that send performance actually slows down is as follows:
When using this patch, the overhead of the CPU for handling RX packets
is dramatically reduced. As a result, we do not experience RNR NAK
messages from the receiver which cause the connection to be closed and
reopened again; when the patch is not used, the receiver cannot handle
the packets fast enough so there is less time to post new buffers and
hence the mentioned RNR NACKs. So what happens is that the
application *thinks* it posted a certain number of packets for
transmission but these packets are flushed and do not really get
transmitted. Since the connection gets opened and closed many times,
each time netperf gets the CPU time that otherwise would have been
given to IPoIB to actually transmit the packets. This can be verified
when looking at the port counters -- the output of ifconfig and the
oputput of netperf (this is for the case without the patch):
tx packets
==========
port counter: 1,543,996
ifconfig: 1,581,426
netperf: 5,142,034
Eli Cohen [Tue, 15 Jul 2008 06:48:44 +0000 (23:48 -0700)]
IB/mlx4: Optimize QP stamping
The idea is that for QPs with fixed size work requests (eg selective
signaling QPs), before stamping the WQE, we read the value of the DS
field, which gives the effective size of the descriptor as used in the
previous post. Then we stamp only that area, since the rest of the
descriptor is already stamped.
When initializing the send queue buffer, make sure the DS field is
initialized to the max descriptor size so that the subsequent stamping
will be done on the entire descriptor area.
Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Moni Shoua [Tue, 15 Jul 2008 06:48:43 +0000 (23:48 -0700)]
IB/sa: Fail requests made while creating new SM AH
This patch solves a race that occurs after an event occurs that causes
the SA query module to flush its SM address handle (AH). When SM AH
becomes invalid and needs an update it is handled by the global
workqueue. On the other hand this event is also handled in the IPoIB
driver by queuing work in the ipoib_workqueue that does multicast
joins. Although queuing is in the right order, it is done to 2
different workqueues and so there is no guarantee that the first to be
queued is the first to be executed.
This causes a problem because IPoIB may end up sending an request to
the old SM, which will take a long time to time out (since the old SM
is gone); this leads to a much longer than necessary interruption in
multicast traffer.
The patch sets the SA query module's SM AH to NULL when the event
occurs, and until update_sm_ah() is done, any request that needs sm_ah
fails with -EAGAIN return status.
For consumers, the patch doesn't make things worse. Before the patch,
MADs are sent to the wrong SM so the request gets lost. Consumers can
be improved if they examine the return code and respond to EAGAIN
properly but even without an improvement the situation is not getting
worse.
Signed-off-by: Moni Levy <monil@voltaire.com> Signed-off-by: Moni Shoua <monis@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Sean Hefty [Tue, 15 Jul 2008 06:48:43 +0000 (23:48 -0700)]
RDMA: Fix license text
The license text for several files references a third software license
that was inadvertently copied in. Update the license to what was
intended. This update was based on a request from HP.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Roland Dreier [Tue, 15 Jul 2008 06:48:43 +0000 (23:48 -0700)]
IB/srp: Remove use of cached P_Key/GID queries
The SRP initiator is currently using ib_find_cached_pkey() and
ib_get_cached_gid() in situations where the uncached ib_find_pkey()
and ib_query_gid() functions serve just as well: sleeping is allowed
and performance is not an issue. Since we want to eliminate the
cached operations in the long term, convert SRP to use the uncached
variants.
Kumar Gala [Mon, 14 Jul 2008 13:08:45 +0000 (08:08 -0500)]
powerpc: Fix pte_update for CONFIG_PTE_64BIT and !PTE_ATOMIC_UPDATES
Because the pte is now 64-bits the compiler was optimizing the update
to always clear the upper 32-bits of the pte. We need to ensure the
clr mask is treated as an unsigned long long to get the proper behavior.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
David Woodhouse [Tue, 15 Jul 2008 01:13:10 +0000 (18:13 -0700)]
Fix accidental reference to tg3 firmware
We're not updating the tg3 driver to use request_firmware() yet, but a
reference to its firmware accidentally slipped in as part of commit c4667746 ("dabusb: use request_firmware()"). Remove it again.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Reported-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Grant Erickson [Tue, 8 Jul 2008 15:03:06 +0000 (01:03 +1000)]
ibm_newemac: Add MII mode support to the EMAC RGMII bridge.
This patch adds support to the RGMII handler in the EMAC driver for
the MII PHY mode such that device tree entries of the form `phy-mode = "mii";'
are recognized and handled appropriately.
While logically, in software, "gmii" and "mii" modes are the same,
they are wired differently, so it makes sense to allow DTS authors to
specify each explicitly.
Signed-off-by: Grant Erickson <gerickson@nuovations.com> Acked-by: Stefan Roese <sr@denx.de> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Jeff Garzik <jgarzik@pobox.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc: Don't spin on sync instruction at boot time
Push the sync below the secondary smp init hold loop and comment its purpose.
This should speed up boot by reducing global traffic during the single-threaded
portion of boot.
Signed-off-by: Sonny Rao <sonnyrao@us.ibm.com> Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Dave Kleikamp [Wed, 9 Jul 2008 15:28:07 +0000 (01:28 +1000)]
powerpc: Remove unnecessary condition when sanity-checking WIMG bits
It is okay for both _PAGE_GUARDED and _PAGE_COHERENT (G and M) to be set
in the same pte. In fact, even if that were not the case, there doesn't
seem to be any place where G is set without also setting I (_PAGE_NO_CACHE),
so the test for I is sufficient as a condition to clear _PAGE_COHERENT
when filling the hash table.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Background from Maynard Johnson:
As of POWER6, a set of 32 common events is defined that must be
supported on all future POWER processors. The main impetus for this
compat set is the need to support partition migration, especially from
processor P(n) to processor P(n+1), where performance software that's
running in the new partition may not be knowledgeable about processor
P(n+1). If a performance tool determines it does not support the
physical processor, but is told (via the
PPC_FEATURE_PSERIES_PERFMON_COMPAT bit) that the processor supports
the notion of the PMU compat set, then the performance tool can
surface just those events to the user of the tool.
PPC_FEATURE_PSERIES_PERFMON_COMPAT indicates that the PMU supports at
least this basic subset of events which is compatible across POWER
processor lines.
Signed-off-by: Nathan Lynch <ntl@pobox.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc: Add driver for Barrier Synchronization Register
Adds a character driver for BSR support on IBM POWER systems including
Power5 and Power6. The BSR is an optional processor facility not currently
implemented by any other processors. It's primary purpose is fast large SMP
synchronization. More details on the BSR are in comments to the code which
follows. This patch adds BSR driver to pseries_defconfig.
Signed-off-by: Sonny Rao <sonnyrao@linux.vnet.ibm.com> Signed-off-by: Joel Schopp <jschopp@austin.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Stephen Rothwell [Mon, 14 Jul 2008 09:25:57 +0000 (19:25 +1000)]
powerpc: mman.h export fixups
Commit ef3d3246a0d06be622867d21af25f997aeeb105f ("powerpc/mm: Add Strong
Access Ordering support") in the powerpc/{next,master} tree caused the
following in a powerpc allmodconfig build:
usr/include/asm/mman.h requires linux/mm.h, which does not exist in exported headers
We should not use CONFIG_PPC64 in an unprotected (by __KERNEL__)
section of an exported include file and linux/mm.h is not exported. So
protect the whole section that is CONFIG_PPC64 with __KERNEL__ and put
the two introduced includes in there as well.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
David Woodhouse [Tue, 15 Jul 2008 00:50:24 +0000 (17:50 -0700)]
firmware: Correct dependency on CONFIG_EXTRA_FIRMWARE_DIR
When CONFIG_EXTRA_FIRMWARE_DIR gets changed, the filename in the .S file
(which uses .incbin to include the binary) needs to change. When we
renamed the BUILTIN_FIRMWARE_DIR option to EXTRA_FIRMWARE_DIR, we forgot
to update the manual dependency in firmware/Makefile, so it was
depending on a non-existent file in include/config/
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge branch 'for-2.6.27' of git://git.infradead.org/users/dwmw2/firmware-2.6
* 'for-2.6.27' of git://git.infradead.org/users/dwmw2/firmware-2.6: (64 commits)
firmware: convert sb16_csp driver to use firmware loader exclusively
dsp56k: use request_firmware
edgeport-ti: use request_firmware()
edgeport: use request_firmware()
vicam: use request_firmware()
dabusb: use request_firmware()
cpia2: use request_firmware()
ip2: use request_firmware()
firmware: convert Ambassador ATM driver to request_firmware()
whiteheat: use request_firmware()
ti_usb_3410_5052: use request_firmware()
emi62: use request_firmware()
emi26: use request_firmware()
keyspan_pda: use request_firmware()
keyspan: use request_firmware()
ttusb-budget: use request_firmware()
kaweth: use request_firmware()
smctr: use request_firmware()
firmware: convert ymfpci driver to use firmware loader exclusively
firmware: convert maestro3 driver to use firmware loader exclusively
...
Fix up trivial conflicts with BKL removal in drivers/char/dsp56k.c and
drivers/char/ip2/ip2main.c manually.
Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm
* 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (241 commits)
[ARM] 5171/1: ep93xx: fix compilation of modules using clocks
[ARM] 5133/2: at91sam9g20 defconfig file
[ARM] 5130/4: Support for the at91sam9g20
[ARM] 5160/1: IOP3XX: gpio/gpiolib support
[ARM] at91: Fix NAND FLASH timings for at91sam9x evaluation kits.
[ARM] 5084/1: zylonite: Register AC97 device
[ARM] 5085/2: PXA: Move AC97 over to the new central device declaration model
[ARM] 5120/1: pxa: correct platform driver names for PXA25x and PXA27x UDC drivers
[ARM] 5147/1: pxaficp_ir: drop pxa_gpio_mode calls, as pin setting
[ARM] 5145/1: PXA2xx: provide api to control IrDA pins state
[ARM] 5144/1: pxaficp_ir: cleanup includes
[ARM] pxa: remove pxa_set_cken()
[ARM] pxa: allow clk aliases
[ARM] Feroceon: don't disable BPU on boot
[ARM] Orion: LED support for HP mv2120
[ARM] Orion: add RD88F5181L-FXO support
[ARM] Orion: add RD88F5181L-GE support
[ARM] Orion: add Netgear WNR854T support
[ARM] s3c2410_defconfig: update for current build
[ARM] Acer n30: Minor style and indentation fixes.
...
Merge branch 'core/softirq' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core/softirq' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
softirq: remove irqs_disabled warning from local_bh_enable
softirq: remove initialization of static per-cpu variable
Remove argument from open_softirq which is always NULL
Merge branch 'core/printk' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core/printk' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, generic: mark early_printk as asmlinkage
printk: export console_drivers
printk: remember the message level for multi-line output
printk: refactor processing of line severity tokens
printk: don't prefer unsuited consoles on registration
printk: clean up recursion check related static variables
namespacecheck: more kernel/printk.c fixes
namespacecheck: fix kernel printk.c
Merge branch 'core/locking' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core/locking' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
lockdep: fix kernel/fork.c warning
lockdep: fix ftrace irq tracing false positive
lockdep: remove duplicate definition of STATIC_LOCKDEP_MAP_INIT
lockdep: add lock_class information to lock_chain and output it
lockdep: add lock_class information to lock_chain and output it
lockdep: output lock_class key instead of address for forward dependency output
__mutex_lock_common: use signal_pending_state()
mutex-debug: check mutex magic before owner
Merge branch 'sched/new-API-sched_setscheduler' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched/new-API-sched_setscheduler' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: add new API sched_setscheduler_nocheck: add a flag to control access checks
Merge branch 'tracing/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (228 commits)
ftrace: build fix for ftraced_suspend
ftrace: separate out the function enabled variable
ftrace: add ftrace_kill_atomic
ftrace: use current CPU for function startup
ftrace: start wakeup tracing after setting function tracer
ftrace: check proper config for preempt type
ftrace: trace schedule
ftrace: define function trace nop
ftrace: move sched_switch enable after markers
ftrace: prevent ftrace modifications while being kprobe'd, v2
fix "ftrace: store mcount address in rec->ip"
mmiotrace broken in linux-next (8-bit writes only)
ftrace: avoid modifying kprobe'd records
ftrace: freeze kprobe'd records
kprobes: enable clean usage of get_kprobe
ftrace: store mcount address in rec->ip
ftrace: build fix with gcc 4.3
namespacecheck: fixes
ftrace: fix "notrace" filtering priority
ftrace: fix printout
...
Jaswinder Singh [Fri, 27 Jun 2008 14:20:40 +0000 (19:50 +0530)]
vicam: use request_firmware()
Although it wasn't actually using ihex records before, we use the Intel
HEX record format for this firmware -- because that gives us a simple
way to split it into separate chunks internally as we need, without
loading each part as a separate file.
Signed-off-by: Jaswinder Singh <jaswinder@infradead.org> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Merge branch 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (76 commits)
sched_clock: and multiplier for TSC to gtod drift
sched_clock: record TSC after gtod
sched_clock: only update deltas with local reads.
sched_clock: fix calculation of other CPU
sched_clock: stop maximum check on NO HZ
sched_clock: widen the max and min time
sched_clock: record from last tick
sched: fix accounting in task delay accounting & migration
sched: add avg-overlap support to RT tasks
sched: terminate newidle balancing once at least one task has moved over
sched: fix warning
sched: build fix
sched: sched_clock_cpu() based cpu_clock(), lockdep fix
sched: export cpu_clock
sched: make sched_{rt,fair}.c ifdefs more readable
sched: bias effective_load() error towards failing wake_affine().
sched: incremental effective_load()
sched: correct wakeup weight calculations
sched: fix mult overflow
sched: update shares on wakeup
...
Jean Delvare [Mon, 14 Jul 2008 20:38:36 +0000 (22:38 +0200)]
i2c: Add detection capability to new-style drivers
Add a mechanism to let new-style i2c drivers optionally autodetect
devices they would support on selected buses and ask i2c-core to
instantiate them. This is a replacement for legacy i2c drivers, much
cleaner.
Where drivers had to implement both a legacy i2c_driver and a
new-style i2c_driver so far, this mechanism makes it possible to get
rid of the legacy i2c_driver and implement both enumerated and
detected device support with just one (new-style) i2c_driver.
Here is a quick conversion guide for these drivers, step by step:
* Delete the legacy driver definition, registration and removal.
Delete the attach_adapter and detach_client methods of the legacy
driver.
* Change the prototype of the legacy detect function from
static int foo_detect(struct i2c_adapter *adapter, int address, int kind);
to
static int foo_detect(struct i2c_client *client, int kind,
struct i2c_board_info *info);
* Set the new-style driver detect callback to this new function, and
set its address_data to &addr_data (addr_data is generally provided
by I2C_CLIENT_INSMOD.)
* Add the appropriate class to the new-style driver. This is
typically the class the legacy attach_adapter method was checking
for. Class checking is now mandatory (done by i2c-core.) See
<linux/i2c.h> for the list of available classes.
* Remove the i2c_client allocation and freeing from the detect
function. A pre-allocated client is now handed to you by i2c-core,
and is freed automatically.
* Make the detect function fill the type field of the i2c_board_info
structure it was passed as a parameter, and return 0, on success. If
the detection fails, return -ENODEV.
Jean Delvare [Mon, 14 Jul 2008 20:38:36 +0000 (22:38 +0200)]
i2c: Call client_unregister for new-style devices too
We call adapter->client_register for both legacy and new-style i2c
devices, however we only call adapter->client_unregister for legacy
drivers. This doesn't make much sense. Usually, drivers will undo
in client_unregister what they did in client_register, so we should
call neither or both for every given i2c device.
In order to ease the transition from legacy to new-style devices, it
seems preferable to actually call both.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: David Brownell <david-b@pacbell.net>
Sean MacLennan [Mon, 14 Jul 2008 20:38:36 +0000 (22:38 +0200)]
i2c-ibm_iic: Register child nodes
This patch completes the conversion of the IBM IIC driver to an
of-platform driver.
It removes the index from the IBM IIC driver and makes it an unnumbered
driver. It then calls of_register_i2c_devices to properly register all
the child nodes in the DTS.
Signed-off-by: Sean MacLennan <smaclennan@pikatech.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>