Since GCC has to emit a call and a delay slot to the
out-of-line "membar" routines in arch/sparc64/lib/mb.S
it is much better to just do the necessary predicted
branch inline instead as:
ba,pt %xcc, 1f
membar #whatever
1:
instead of the current:
call membar_foo
dslot
because this way GCC is not required to allocate a stack
frame if the function can be a leaf function.
This also makes this bug fix easier to backport to 2.4.x
Signed-off-by: David S. Miller <davem@davemloft.net>