ext4: fdatasync should skip metadata writeout when overwriting
Currently fdatasync is identical to fsync in ext3.
I think fdatasync should skip journal flush in data=ordered and
data=writeback mode when it overwrites to already-instantiated blocks on
HDD. When I_DIRTY_DATASYNC flag is not set, fdatasync should skip journal
writeout because this indicates only atime or/and mtime updates.
Following patch is the same approach of ext2's fsync code(ext2_sync_file).
I did a performance test using the sysbench.
#sysbench --num-threads=128 --max-requests=50000 --test=fileio --file-total-size=128G
--file-test-mode=rndwr --file-fsync-mode=fdatasync run
The result on ext3 was:
-2.6.24
Operations performed: 0 Read, 50080 Write, 59600 Other = 109680 Total
Read 0b Written 782.5Mb Total transferred 782.5Mb (12.116Mb/sec)
775.45 Requests/sec executed
Test execution summary:
total time: 64.5814s
total number of events: 50080
total time taken by event execution: 3713.9836
per-request statistics:
min: 0.0000s
avg: 0.0742s
max: 0.9375s
approx. 95 percentile: 0.2901s
Threads fairness:
events (avg/stddev): 391.2500/23.26
execution time (avg/stddev): 29.0155/1.99
-2.6.24-patched
Operations performed: 0 Read, 50009 Write, 61596 Other = 111605 Total
Read 0b Written 781.39Mb Total transferred 781.39Mb (16.419Mb/sec)
1050.83 Requests/sec executed
Test execution summary:
total time: 47.5900s
total number of events: 50009
total time taken by event execution: 2934.5768
per-request statistics:
min: 0.0000s
avg: 0.0587s
max: 0.8938s
approx. 95 percentile: 0.1993s
Threads fairness:
events (avg/stddev): 390.6953/22.64
execution time (avg/stddev): 22.9264/1.17
Filesystem I/O throughput was improved.
Signed-off-by :Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Acked-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Josef Bacik [Thu, 17 Apr 2008 14:38:59 +0000 (10:38 -0400)]
jbd2: fix possible journal overflow issues
There are several cases where the running transaction can get buffers
added to its BJ_Metadata list which it never dirtied, which makes its
t_nr_buffers counter end up larger than its t_outstanding_credits
counter.
This will cause issues when starting new transactions as while we are
logging buffers we decrement t_outstanding_buffers, so when
t_outstanding_buffers goes negative, we will report that we need less
space in the journal than we actually need, so transactions will be
started even though there may not be enough room for them. In the worst
case scenario (which admittedly is almost impossible to reproduce) this
will result in the journal running out of space.
The fix is to only refile buffers from the committing transaction to the
running transactions BJ_Modified list when b_modified is set on that
journal, which is the only way to be sure if the running transaction has
modified that buffer.
This patch also fixes an accounting error in journal_forget, it is
possible that we can call journal_forget on a buffer without having
modified it, only gotten write access to it, so instead of freeing a
credit, we only do so if the buffer was modified. The assert will help
catch if this problem occurs. Without these two patches I could hit
this assert within minutes of running postmark, with them this issue no
longer arises.
Cc: <linux-ext4@vger.kernel.org> Cc: Jan Kara <jack@ucw.cz> Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Josef Bacik [Thu, 17 Apr 2008 14:38:59 +0000 (10:38 -0400)]
jbd2: fix the way the b_modified flag is cleared
Currently at the start of a journal commit we loop through all of the buffers
on the committing transaction and clear the b_modified flag (the flag that is
set when a transaction modifies the buffer) under the j_list_lock.
The problem is that everywhere else this flag is modified only under the jbd2
lock buffer flag, so it will race with a running transaction who could
potentially set it, and have it unset by the committing transaction.
This is also a big waste, you can have several thousands of buffers that you
are clearing the modified flag on when you may not need to. This patch
removes this code and instead clears the b_modified flag upon entering
do_get_write_access/journal_get_create_access, so if that transaction does
indeed use the buffer then it will be accounted for properly, and if it does
not then we know we didn't use it.
That will be important for the next patch in this series. Tested thoroughly
by myself using postmark/iozone/bonnie++.
Cc: <linux-ext4@vger.kernel.org> Cc: Jan Kara <jack@ucw.cz> Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
Smack: Integrate Smack with Audit
Security: Typecast CAP_*_SET macros
Security: Make secctx_to_secid() take const secdata
Ahmed S. Darwish [Tue, 29 Apr 2008 22:34:10 +0000 (08:34 +1000)]
Smack: Integrate Smack with Audit
Setup the new Audit hooks for Smack. SELinux Audit rule fields are recycled
to avoid `auditd' userspace modifications. Currently only equality testing
is supported on labels acting as a subject (AUDIT_SUBJ_USER) or as an object
(AUDIT_OBJ_USER).
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
[libata] linux/libata.h: reorganize ata_device struct members a bit
ahci: SB600 ahci can't do MSI, blacklist that capability
libata: More TSSTcorp pain, keep in sync with legacy IDE
pata_via: Fix 6410 misdetect
[libata] pata_atiixp: fix PIO timing data misprogramming
Alex Chiang [Tue, 29 Apr 2008 22:05:29 +0000 (15:05 -0700)]
[IA64] Provide ACPI fixup for /proc/cpuinfo/physical_id
Legacy HP ia64 platforms currently cannot provide
/proc/cpuinfo/physical_id due to legacy SAL/PAL implementations.
However, that physical topology information can be obtained
via ACPI.
Provide an interface that gives ACPI one last chance to provide
physical_id for these legacy platforms. This logic only comes
into play iff:
- ACPI actually provides slot information for the CPU
- we lack a valid socket_id
Otherwise, we don't do anything.
Since x86 uses the ACPI processor driver as well, we provide a nop
stub function for arch_fix_phys_package_id() in asm-x86/topology.h
Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb: (28 commits)
V4L-DVB(7789a): cx18: fix symbol conflict with ivtv driver
V4L/DVB (7789): tuner: remove static dependencies on analog tuner sub-modules
V4L/DVB (7785): [2.6 patch] make mt9{m001,v022}_controls[] static
V4L/DVB (7786): cx18: new driver for the Conexant CX23418 MPEG encoder chip
V4L/DVB (7783): drivers/media/dvb/frontends/s5h1420.c: printk fix
V4L/DVB (7782): pvrusb2: Driver is no longer experimental
V4L/DVB (7781): pvrusb2-dvb: include dvb support by default and update Kconfig help text
V4L/DVB (7780): pvrusb2: always enable support for OnAir Creator / HDTV USB2
V4L/DVB (7779): pvrusb2-dvb: quiet down noise in kernel log for feed debug
Rename common tuner Kconfig names to use the same
Fix V4L/DVB core help messages
V4L/DVB (7769): Move other terrestrial tuners to common/tuners
V4L/DVB (7768): reorganize some DVB-S Kconfig items
V4L/DVB(7767): Move tuners to common/tuners
V4L/DVB (7766): saa7134: add another PCI ID for Beholder M6
V4L/DVB (7765): Add support for Beholder BeholdTV H6
V4L/DVB (7763): ivtv: add tuner support for the AverMedia M116
V4L/DVB (7762): ivtv: fix tuner detection for PAL-N/Nc
V4L/DVB (7761): ivtv: increase the DMA timeout from 100 to 300 ms
V4L/DVB (7759): ivtv: increase version number to 1.2.1
...
Merge branch 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6
* 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6:
i2c: Convert most new-style drivers to use module aliasing
i2c: Add support for device alias names
i2c-amd756-s4882: Fix an error path
i2c: Drop unused RTC driver IDs
i2c/tps65010: Add missing intialization of client data
i2c-sis5595: Minor cleanups in sis5595_access
i2c-piix4: Minor cleanups
i2c: Spelling fix (successful)
i2c-stub: No newline in parameter description
V4L-DVB(7789a): cx18: fix symbol conflict with ivtv driver
LD drivers/media/video/built-in.o
drivers/media/video/cx18/built-in.o: In function `get_service_set':
/home/v4l/tokernel/git/drivers/media/video/cx18/cx18-ioctl.c:118: multiple definition of `get_service_set'
drivers/media/video/ivtv/built-in.o:/home/v4l/tokernel/git/drivers/media/video/ivtv/ivtv-ioctl.c:119: first defined here
drivers/media/video/cx18/built-in.o: In function `expand_service_set':
/home/v4l/tokernel/git/drivers/media/video/cx18/cx18-ioctl.c:92: multiple definition of `expand_service_set'
drivers/media/video/ivtv/built-in.o:/home/v4l/tokernel/git/drivers/media/video/ivtv/ivtv-ioctl.c:92: first defined here
drivers/media/video/cx18/built-in.o: In function `service2vbi':
/home/v4l/tokernel/git/drivers/media/video/cx18/cx18-ioctl.c:44: multiple definition of `service2vbi'
drivers/media/video/ivtv/built-in.o:/home/v4l/tokernel/git/drivers/media/video/ivtv/ivtv-ioctl.c:42: first defined here
make[2]: ** [drivers/media/video/built-in.o] Erro 1
make[1]: ** [drivers/media/video] Erro 2
make: ** [drivers/media/] Erro 2
Hans Verkuil [Mon, 28 Apr 2008 23:24:33 +0000 (20:24 -0300)]
V4L/DVB (7786): cx18: new driver for the Conexant CX23418 MPEG encoder chip
Many thanks to Steve Toth from Hauppauge and Nattu Dakshinamurthy from
Conexant for their support. I am in particular thankful to Hauppauge
since without their help this driver would not exist. It should also
be noted that Steve did the work to get the DVB part up and running.
Thank you!
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Steven Toth <stoth@hauppauge.com> Signed-off-by: Michael Krufky <mkrufky@linuxtv.org> Signed-off-by: G. Andrew Walls <awalls@radix.net> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
drivers/media/dvb/frontends/s5h1420.c: In function `s5h1420_setsymbolrate':
drivers/media/dvb/frontends/s5h1420.c:484: warning: long long unsigned int format, u64 arg (arg 2)
We do not know what type the architecture uses for u64.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Mike Isely [Mon, 28 Apr 2008 00:37:33 +0000 (21:37 -0300)]
V4L/DVB (7782): pvrusb2: Driver is no longer experimental
This driver has been in-kernel and reasonably stable for well over a
year. It is in a stable form and is known to work well. Remove its
experimental status.
Signed-off-by: Mike Isely <isely@pobox.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Michael Krufky [Sun, 27 Apr 2008 22:12:29 +0000 (19:12 -0300)]
V4L/DVB (7780): pvrusb2: always enable support for OnAir Creator / HDTV USB2
This was a build option in the past, to avoid conflicts with the cxusb module
for digital televsion support. Now that dtv mode support has been merged into
pvrusb2, the OnAir devices are fully supported by this single module. This no
longer should be a build option.
Signed-off-by: Michael Krufky <mkrufky@linuxtv.org> Signed-off-by: Mike Isely <isely@pobox.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
V4L/DVB (7769): Move other terrestrial tuners to common/tuners
Those tuners are currently used only under media/dvb. However,
they can support also analog TV. Better to move them to the same place
as the other hybrid tuners. This would make easier to use those tuners also
by analog drivers.
There were several issues in the past, caused by the hybrid tuner design, since
now, the same tuner can be used by drivers/media/dvb and drivers/media/video.
Kconfig items were rearranged, to split V4L/DVB core from their drivers.
Tuner setup were happening during i2c attach callback. This means that it would
happen on two conditions:
1) if tuner module weren't load, it will happen at request_module("tuner");
2) if tuner is not compiled as a module, or it is already loaded
(for example, on setups with more than one tuner), it will happen
when saa7134 registers I2C bus.
Due to that, if tuner were loaded, tuner setup will happen _before_ reading
the proper values at tuner eeprom. Since set_addr refuses to change for a tuner
that were previously defined (except if the tuner_addr is set), this were
making eeprom tuner detection useless.
This patch removes tuner type setup from saa7134-i2c, moving it to the proper
place, after taking eeprom into account.
Reviewed-by: Hermann Pitton <hermann-pitton@arcor.de> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Tuner setup were happening during i2c attach callback. This means that it would
happen on two conditions:
1) if tuner module weren't load, it will happen at request_module("tuner");
2) if tuner is not compiled as a module, or it is already loaded
(for example, on setups with more than one tuner), it will happen
when cx88 registers I2C bus.
Due to that, if tuner were loaded, tuner setup will happen _before_ reading
the proper values at tuner eeprom. Since set_addr refuses to change for a tuner
that were previously defined (except if the tuner_addr is set), this were making
eeprom tuner detection useless.
This patch removes tuner type setup from cx88-i2c, moving it to the proper
place, after taking eeprom into account.
Reviewed-by: Gert Vervoort <gert.vervoort@hccnet.nl> Reviewed-by: Ian Pickworth <ian@pickworth.me.uk> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Alan Cox [Tue, 29 Apr 2008 13:10:57 +0000 (14:10 +0100)]
pata_via: Fix 6410 misdetect
The discrete VIA ATA chips don't have 0x40 enable bits. We check that
properly in one location but not another. This causes some users 6410
RAID cards to be incorrectly skipped.
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Jean Delvare [Tue, 29 Apr 2008 21:11:40 +0000 (23:11 +0200)]
i2c: Convert most new-style drivers to use module aliasing
Based on earlier work by Jon Smirl and Jochen Friedrich.
Update most new-style i2c drivers to use standard module aliasing
instead of the old driver_name/type driver matching scheme. I've
left the video drivers apart (except for SoC camera drivers) as
they're a bit more diffcult to deal with, they'll have their own
patch later.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Jon Smirl <jonsmirl@gmail.com> Cc: Jochen Friedrich <jochen@scram.de>
Jean Delvare [Tue, 29 Apr 2008 21:11:39 +0000 (23:11 +0200)]
i2c: Add support for device alias names
Based on earlier work by Jon Smirl and Jochen Friedrich.
This patch allows new-style i2c chip drivers to have alias names using
the official kernel aliasing system and MODULE_DEVICE_TABLE(). At this
point, the old i2c driver binding scheme (driver_name/type) is still
supported.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Jochen Friedrich <jochen@scram.de> Cc: Jon Smirl <jonsmirl@gmail.com> Cc: Kay Sievers <kay.sievers@vrfy.org>
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
RDMA/nes: Formatting cleanup
RDMA/nes: Add support for SFP+ PHY
RDMA/nes: Use LRO
IPoIB: Copy child MTU from parent
IB/mthca: Avoid changing userspace ABI to handle DMA write barrier attribute
IB/mthca: Avoid recycling old FMR R_Keys too soon
mlx4_core: Avoid recycling old FMR R_Keys too soon
IB/ehca: Allocate event queue size depending on max number of CQs and QPs
IPoIB: Use separate CQ for UD send completions
IB/iser: Count FMR alignment violations per session
IB/iser: Move high-volume debug output to higher debug level
IB/ehca: handle negative return value from ibmebus_request_irq() properly
RDMA/cxgb3: Support peer-2-peer connection setup
RDMA/cxgb3: Set the max_mr_size device attribute correctly
RDMA/cxgb3: Correctly serialize peer abort path
mlx4_core: Add a way to set the "collapsed" CQ flag
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6:
alim15x3: disable init_hwif_ali15x3 for PowerPC
ide: fix crash at boot with siimage driver
Anton Vorontsov [Tue, 29 Apr 2008 20:57:38 +0000 (22:57 +0200)]
alim15x3: disable init_hwif_ali15x3 for PowerPC
We don't need init_hwif_ali15x3() on the PowerPC systems either.
Before:
ALI15X3: IDE controller (0x10b9:0x5229 rev 0xc8) at PCI slot 0001:03:1f.0
ALI15X3: 100% native mode on irq 19
ide0: BM-DMA at 0x1120-0x1127
ide1: BM-DMA at 0x1128-0x112f
hda: SONY DVD RW AW-Q170A, ATAPI CD/DVD-ROM drive
hda: UDMA/66 mode selected
ide0: Disabled unable to get IRQ 14.
ide0: failed to initialize IDE interface
ide1: Disabled unable to get IRQ 15.
ide1: failed to initialize IDE interface
After:
ALI15X3: IDE controller (0x10b9:0x5229 rev 0xc8) at PCI slot 0001:03:1f.0
ALI15X3: 100% native mode on irq 19
ide0: BM-DMA at 0x1120-0x1127
ide1: BM-DMA at 0x1128-0x112f
hda: SONY DVD RW AW-Q170A, ATAPI CD/DVD-ROM drive
hda: UDMA/66 mode selected
ide0 at 0x1100-0x1107,0x110a on irq 19
ide1 at 0x1110-0x1117,0x111a on irq 19
hda: ATAPI 48X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache
ide0 works well, though I can't test ide1, it isn't traced out on
the board.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Some change to the IDE layer are causing the siimage driver to crash
at boot with a NULL dereference. This is due to the sil_dma_ops not
containing all the necessary pointers. I suppose it used to just
"override" the defaults while now, it needs to contain everything.
[bart: while at it: sil_dma_ops should be const now (pointed out by Sergei)]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>, Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Alex Chiang [Thu, 24 Apr 2008 18:57:08 +0000 (12:57 -0600)]
[IA64] Remove printk noise on unimplemented SAL_PHYSICAL_ID_INFO
Commit 113134fcbca83619be4c68d0ca66db6093777b5d changed the flow of
control when calling PAL_LOGICAL_TO_PHYSICAL and SAL_PHYSICAL_ID_INFO.
With the change, if a platform did not implement the latter, a useless
printk would appear in the boot log:
ia64_sal_pltid failed with -1
So let's check the return code and only printk on a true error, and do
not print anything in the unimplemented case. While we're in there,
clean up some stylistic issues too.
Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
Various cleanups:
- Change // to /* .. */
- Place whitespace around binary operators.
- Trim down a few long lines.
- Some minor alignment formatting for better readability.
- Remove some silly tabs.
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Eli Cohen [Tue, 29 Apr 2008 20:46:53 +0000 (13:46 -0700)]
IPoIB: Copy child MTU from parent
When creating a child interface, copy the MTU information from the
parent. Otherwise when the child's multicast join completes, the MTU
will not be updated since the code does
dev->mtu = min(priv->mcast_mtu, priv->admin_mtu);
and priv->admin_mtu will be set to 0.
Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Roland Dreier [Tue, 29 Apr 2008 20:46:53 +0000 (13:46 -0700)]
IB/mthca: Avoid changing userspace ABI to handle DMA write barrier attribute
Commit cb9fbc5c ("IB: expand ib_umem_get() prototype") changed the
mthca userspace ABI to provide a way for userspace to indicate which
memory regions need the DMA write barrier attribute. However, it is
possible to handle this without breaking existing userspace, by having
the mthca kernel driver recognize whether it is talking to old or new
userspace, depending on the size of the register MR structure passed in.
The only potential drawback of this is that is allows old userspace
(which has a bug with DMA ordering on large SGI Altix systems) to
continue to run on new kernels, but the advantage of allowing old
userspace to continue to work on unaffected systems seems to outweigh
this, and we can print a warning to push people to upgrade their
userspace.
Olaf Kirch [Tue, 29 Apr 2008 20:46:53 +0000 (13:46 -0700)]
IB/mthca: Avoid recycling old FMR R_Keys too soon
When a FMR is unmapped, mthca resets the map count to 0, and clears
the upper part of the R_Key which is used as the sequence counter.
This poses a problem for RDS, which uses ib_fmr_unmap as a fence
operation. RDS assumes that after issuing an unmap, the old R_Keys
will be invalid for a "reasonable" period of time. For instance,
Oracle processes uses shared memory buffers allocated from a pool of
buffers. When a process dies, we want to reclaim these buffers -- but
we must make sure there are no pending RDMA operations to/from those
buffers. The only way to achieve that is by using unmap and sync the
TPT.
However, when the sequence count is reset on unmap, there is a high
likelihood that a new mapping will be given the same R_Key that was
issued a few milliseconds ago.
To prevent this, don't reset the sequence count when unmapping a FMR.
Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Olaf Kirch [Tue, 29 Apr 2008 20:46:53 +0000 (13:46 -0700)]
mlx4_core: Avoid recycling old FMR R_Keys too soon
When a FMR is unmapped, mlx4 resets the map count to 0, and clears the
upper part of the R_Key which is used as the sequence counter.
This poses a problem for RDS, which uses ib_fmr_unmap as a fence
operation. RDS assumes that after issuing an unmap, the old R_Keys
will be invalid for a "reasonable" period of time. For instance,
Oracle processes uses shared memory buffers allocated from a pool of
buffers. When a process dies, we want to reclaim these buffers -- but
we must make sure there are no pending RDMA operations to/from those
buffers. The only way to achieve that is by using unmap and sync the
TPT.
However, when the sequence count is reset on unmap, there is a high
likelihood that a new mapping will be given the same R_Key that was
issued a few milliseconds ago.
To prevent this, don't reset the sequence count when unmapping a FMR.
Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Stefan Roscher [Tue, 29 Apr 2008 20:46:53 +0000 (13:46 -0700)]
IB/ehca: Allocate event queue size depending on max number of CQs and QPs
If a lot of QPs fall into Error state at once and the EQ of the
respective HCA is too small, it might overrun, causing the eHCA driver
to stop processing completion events and calling the application's
completion handlers, effectively causing traffic to stop.
Fix this by limiting available QPs and CQs to a customizable max
count, and determining EQ size based on these counts and a worst-case
assumption.
Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Eli Cohen [Tue, 29 Apr 2008 20:46:53 +0000 (13:46 -0700)]
IPoIB: Use separate CQ for UD send completions
Use a dedicated CQ for UD send completions. Also, do not arm the UD
send CQ, which reduces the number of interrupts generated. This patch
farther reduces overhead by not calling poll CQ for every posted send
WR -- it does polls only when there 16 or more outstanding work requests.
Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
IB/ehca: handle negative return value from ibmebus_request_irq() properly
ehca_create_eq() was assigning a signed return value to an unsiged
local variable and then checking if the variable was < 0, which meant
that errors were always ignored. Fix this by using one variable for
signed integer return values and another for u64 hcall return values.
Bug originally found by Roel Kluin <12o3l@tiscali.nl>.
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Steve Wise [Tue, 29 Apr 2008 20:46:52 +0000 (13:46 -0700)]
RDMA/cxgb3: Support peer-2-peer connection setup
Open MPI, Intel MPI and other applications don't respect the iWARP
requirement that the client (active) side of the connection send the
first RDMA message. This class of application connection setup is
called peer-to-peer. Typically once the connection is setup, _both_
sides want to send data.
This patch enables supporting peer-to-peer over the chelsio RNIC by
enforcing this iWARP requirement in the driver itself as part of RDMA
connection setup.
Connection setup is extended, when the peer2peer module option is 1,
such that the MPA initiator will send a 0B Read (the RTR) just after
connection setup. The MPA responder will suspend SQ processing until
the RTR message is received and reply-to.
In the longer term, this will be handled in a standardized way by
enhancing the MPA negotiation so peers can indicate whether they
want/need the RTR and what type of RTR (0B read, 0B write, or 0B send)
should be sent. This will be done by standardizing a few bits of the
private data in order to negotiate all this. However this patch
enables peer-to-peer applications now and allows most of the required
firmware and driver changes to be done and tested now.
Design:
- Add a module option, peer2peer, to enable this mode.
- New firmware support for peer-to-peer mode:
- a new bit in the rdma_init WR to tell it to do peer-2-peer
and what form of RTR message to send or expect.
- process _all_ preposted recvs before moving the connection
into rdma mode.
- passive side: defer completing the rdma_init WR until all
pre-posted recvs are processed. Suspend SQ processing until
the RTR is received.
- active side: expect and process the 0B read WR on offload TX
queue. Defer completing the rdma_init WR until all
pre-posted recvs are processed. Suspend SQ processing until
the 0B read WR is processed from the offload TX queue.
- If peer2peer is set, driver posts 0B read request on offload TX
queue just after posting the rdma_init WR to the offload TX queue.
- Add CQ poll logic to ignore unsolicitied read responses.
Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Steve Wise [Tue, 29 Apr 2008 20:46:51 +0000 (13:46 -0700)]
RDMA/cxgb3: Correctly serialize peer abort path
Open MPI and other stress testing exposed a few bad bugs in handling
aborts in the middle of a normal close. Fix these by:
- serializing abort reply and peer abort processing with disconnect
processing
- warning (and ignoring) if ep timer is stopped when it wasn't running
- cleaning up disconnect path to correctly deal with aborting and
dead endpoints
- in iwch_modify_qp(), taking a ref on the ep before releasing the qp
lock if iwch_ep_disconnect() will be called. The ref is dropped
after calling disconnect.
Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
The referenced commit changed the order such that the CPU code was
initialised before MFP, resulting in unregistered MFP sysfs objects
being referenced. Reverse the link order of these so MFP is
initialised before the CPU code.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
spin_is_locked() doesn't work on UP without spinlock debugging. Make it
safer and just return 1 on UP, so we don't get false positives. The plan
is to kill this debug function during the -rc cycle.
drivers/net/tehuti: use proper capability check for raw IO access
Yeah, in practice they both mean "root", but Alan correctly points out
that anybody who gets to do raw IO space accesses should really be using
CAP_SYS_RAWIO rather than CAP_NET_ADMIN.
Pointed-out-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge branch 'audit.b50' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current
* 'audit.b50' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
[PATCH] new predicate - AUDIT_FILETYPE
[patch 2/2] Use find_task_by_vpid in audit code
[patch 1/2] audit: let userspace fully control TTY input auditing
[PATCH 2/2] audit: fix sparse shadowed variable warnings
[PATCH 1/2] audit: move extern declarations to audit.h
Audit: MAINTAINERS update
Audit: increase the maximum length of the key field
Audit: standardize string audit interfaces
Audit: stop deadlock from signals under load
Audit: save audit_backlog_limit audit messages in case auditd comes back
Audit: collect sessionid in netlink messages
Audit: end printk with newline
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (21 commits)
pciehp: fix error message about getting hotplug control
pci/irq: let pci_device_shutdown to call pci_msi_shutdown v2
pci/irq: restore mask_bits in msi shutdown -v3
doc: replace yet another dev with pdev for consistency in DMA-mapping.txt
PCI: don't expose struct pci_vpd to userspace
doc: fix an incorrect suggestion to pass NULL for PCI like buses
Consistently use pdev as the variable of type struct pci_dev *.
pciehp: Fix command write
shpchp: fix slot name
make pciehp_acpi_get_hp_hw_control_from_firmware()
pciehp: Clean up pcie_init()
pciehp: Mask hotplug interrupt at controller release
pciehp: Remove useless hotplug interrupt enabling
pciehp: Fix wrong slot capability check
pciehp: Fix wrong slot control register access
pciehp: Add missing memory barrier
pciehp: Fix interrupt event handlig
pciehp: fix slot name
Update MAINTAINERS with location of PCI tree
PCI: Add Intel SCH PCI IDs
...
The new queue_flag_set/clear() functions verify that the queue is
locked, but in doing so they will actually instead oops if the queue
lock hasn't been initialized at all.
So fix the lock debug test to consider the "no lock" case to be
unlocked. This way you get a nice WARN_ON_ONCE() instead of a fatal
oops.
Jarkko Nikula [Fri, 25 Apr 2008 11:55:19 +0000 (13:55 +0200)]
[ALSA] ASoC: Add drivers for the Texas Instruments OMAP processors
Add common OMAP ASoC drivers and machine driver for Nokia N810. Currently
supported features are:
- Covers OMAPs from 1510 to 2420
- Common DMA driver
- DAI link driver using McBSP port in I2S mode
- Basic machine driver for Nokia N810
Signed-off-by: Jarkko Nikula <jarkko.nikula@nokia.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>