The memcgroup regime relies upon a cgroup reclaiming pages from itself within
add_to_page_cache: which may involve some waiting. Whereas shmem and tmpfs
rely upon using add_to_page_cache while holding a spinlock: when it cannot
wait. The consequence is that when a cgroup reaches its limit, shmem_getpage
just hangs - unless there is outside memory pressure too, neither kswapd nor
radix_tree_preload get it out of the retry loop.
In most cases we can mem_cgroup_cache_charge the page waitably first, to
attach the page_cgroup in advance, so add_to_page_cache will do no more than
increment a count; then mem_cgroup_uncharge_page after (in both success and
failure cases) to balance the books again.
And where there used to be a congestion_wait for kswapd (recently made
redundant by radix_tree_preload), use mem_cgroup_cache_charge with NULL page
to go through a cycle of allocation and freeing, without accounting to any
particular page, and without updating the statistics vector. This brings the
cgroup below its limit so the next try usually succeeds.