I found that NPTL's pthread_cond_signal() does not work properly on
kernels compiled by gcc 4.1.x. I suppose inline asm for
__futex_atomic_op() was wrong. I suppose:
1. "=&r" constraint should be used for oldval.
2. Instead of "r" (uaddr), "=R" (*uaddr) for output and "R" (*uaddr)
for input should be used.
3. "memory" should be added to the clobber list.