The lock hold time in zap_page_range is horrid.  This patch breaks the
work up into chunks and relinquishes the lock after each iteration. 
This drastically lowers latency by creating a preemption point, as well
as lowering lock contention. 

The chunk size is ZAP_BLOCK_SIZE, currently 256*PAGE_SIZE.
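
For illustration only, here is a minimal userspace sketch of the chunking
idea. It is not the patch itself. A pthread mutex stands in for the kernel
lock, the 4096-byte page size is an assumption, and every sketch_* name is
hypothetical.

```c
#include <pthread.h>

#define SKETCH_PAGE_SIZE	4096UL	/* assumed page size, not from the patch */
#define SKETCH_ZAP_BLOCK_SIZE	(256 * SKETCH_PAGE_SIZE)	/* chunk size as described */

/* Stand-in for the kernel lock; the real code does not use a pthread mutex. */
static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical placeholder for the per-chunk work done under the lock. */
static void sketch_unmap_chunk(unsigned long start, unsigned long len)
{
	(void)start;
	(void)len;
}

/*
 * Walk [address, address + size) in chunks of at most
 * SKETCH_ZAP_BLOCK_SIZE bytes, taking the lock for each chunk and
 * dropping it before moving on.  Returns the number of chunks processed.
 */
static unsigned long sketch_zap_range(unsigned long address, unsigned long size)
{
	unsigned long chunks = 0;

	while (size) {
		unsigned long block = size < SKETCH_ZAP_BLOCK_SIZE ?
				      size : SKETCH_ZAP_BLOCK_SIZE;

		pthread_mutex_lock(&sketch_lock);
		sketch_unmap_chunk(address, block);
		/* Releasing here lets other threads take the lock between chunks. */
		pthread_mutex_unlock(&sketch_lock);

		address += block;
		size -= block;
		chunks++;
	}

	return chunks;
}
```

In this sketch the lock is never held for more than one chunk's worth of
work, so the worst-case hold time scales with the chunk size rather than
with the size of the whole range.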

This lowers the maximum latency in zap_page_range from 10-20ms (on a
dual Athlon, one of the worst latencies recorded) to unmeasurable.

The patch contains some other cleanups too.

	-- rml