This and the following patches aim to optimize the code dealing with
page tables and TLB operations. Each patch reduces the time it takes
to gzip a 16 MB file slightly, but I expect things like fork() and
mmap() will improve somewhat more.
This patch deals with the low-level TLB operations:
* Remove unused _TLBEHI_I define
* Use gcc builtins instead of inline assembly
* Remove a few unnecessary pipeline flushes and nops
* Introduce NR_TLB_ENTRIES define and use it instead of hardcoding it
to 32 a few places throughout the code.
* Use sysreg bitops instead of hardcoded shifts and masks
* Make a few needlessly global functions static