Currently kswapd walks across all zones in dma->normal->highmem order,
performing proportional scanning until all zones are OK.  This means that
pressure against ZONE_NORMAL causes unnecessary reclaim of ZONE_HIGHMEM.

To fix that up we change kswapd so that it walks the zones in the
highmem->normal->dma direction, skipping zones which are OK.  Once it
encounters a zone which needs some reclaim, kswapd will perform proportional
scanning against that zone as well as all the succeeding lower zones.

We scan the lower zones even if they have sufficient free pages.  This is
because

a) the lower zone may be above pages_high, but because of the incremental
   min, the lower zone may still not be eligible for allocations.  That's bad
   because cache in that lower zone will then not be scanned at the correct
   rate.

b) pages in this lower zone are usable for allocations against the higher
   zone.  So we do want to scan all the relevant zones at an equal rate.


---

 mm/rmap.c   |    0 
 mm/vmscan.c |   11 ++++++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff -puN mm/rmap.c~kswapd-avoid-higher-zones mm/rmap.c
diff -puN mm/vmscan.c~kswapd-avoid-higher-zones mm/vmscan.c
--- 25/mm/vmscan.c~kswapd-avoid-higher-zones	2004-02-28 23:38:18.000000000 -0800
+++ 25-akpm/mm/vmscan.c	2004-02-28 23:38:18.000000000 -0800
@@ -900,6 +900,13 @@ out:
  * scanned twice and there has been zero successful reclaim. Mark the zone as
  * dead and from now on, only perform a short scan. Basically we're polling
  * the zone for when the problem goes away.
+ *
+ * kswapd scans the zones in the highmem->normal->dma direction. It skips
+ * zones which have free_pages > pages_high, but once a zone is found to have
+ * free_pages <= pages_high, we scan that zone and the lower zones regardless
+ * of the number of free pages in the lower zones. This interoperates with
+ * the page allocator fallback scheme to ensure that aging of pages is balanced
+ * across the zones.
  */
 static int balance_pgdat(pg_data_t *pgdat, int nr_pages, struct page_state *ps)
 {
@@ -920,7 +927,7 @@ static int balance_pgdat(pg_data_t *pgda
 		int all_zones_ok = 1;
 		int pages_scanned = 0;
 
-		for (i = 0; i < pgdat->nr_zones; i++) {
+		for (i = pgdat->nr_zones - 1; i >= 0; i--) {
 			struct zone *zone = pgdat->node_zones + i;
 			int total_scanned = 0;
 			int max_scan;
@@ -932,6 +939,8 @@ static int balance_pgdat(pg_data_t *pgda
 			if (nr_pages == 0) {	/* Not software suspend */
 				if (zone->free_pages <= zone->pages_high)
 					all_zones_ok = 0;
+				if (all_zones_ok)
+					continue;
 			}
 			zone->temp_priority = priority;
 			max_scan = zone->nr_inactive >> priority;

_