md: introduce get_priority_stripe() to improve raid456 write performance
Improve write performance by preventing the delayed_list from dumping all its
stripes onto the handle_list in one shot. Delayed stripes are now further
delayed by being held on the 'hold_list'. The 'hold_list' is bypassed when:
* a STRIPE_IO_STARTED stripe is found at the head of 'handle_list'
* 'handle_list' is empty and i/o is being done to satisfy full stripe-width
write requests
* 'bypass_count' is less than 'bypass_threshold'. By default the threshold
is 1, i.e. every other stripe handled is a preread stripe provided the
top two conditions are false.
Changes since v1:
* reduce bypass_threshold from (chunk_size / sectors_per_chunk) to (1) and
make it configurable. This defaults to fairness and modest performance
gains out of the box.
Changes since v2:
* [neilb@suse.de]: kill STRIPE_PRIO_HI and preread_needed as they are not
necessary, the important change was clearing STRIPE_DELAYED in
add_stripe_bio and this has been moved out to make_request for the hang
fix.
* [neilb@suse.de]: simplify get_priority_stripe
* [dan.j.williams@intel.com]: reset the bypass_count when ->hold_list is
sampled empty (+11%)
* [dan.j.williams@intel.com]: decrement the bypass_count at the detection
of stripes being naturally promoted off of hold_list +2%. Note, resetting
bypass_count instead of decrementing on these events yields +4% but that is
probably too aggressive.
Changes since v3:
* cosmetic fixups
Tested-by: James W. Laferriere <babydr@baby-dragons.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>