| author | Nick Piggin <nickpiggin@yahoo.com.au> | 2004-10-02 19:16:48 -0700 |
|---|---|---|
| committer | Linus Torvalds <torvalds@ppc970.osdl.org> | 2004-10-02 19:16:48 -0700 |
| commit | 31a9b2ac213ce619a75b3acd2f403685c800d06d (patch) | |
| tree | dd0ca31ece167148f8b9db4ff7efcd9798c7d2f4 /mm | |
| parent | aeb1ae30e6d1aaedd6508ab71bdd87da566f07ff (diff) | |
| download | history-31a9b2ac213ce619a75b3acd2f403685c800d06d.tar.gz | |
[PATCH] vm: prevent kswapd pageout priority windup
Now that we are correctly kicking off kswapd early (before the synchronous
reclaim watermark), it is really doing asynchronous pageout. This has
exposed a latent problem: allocators running at the same time make kswapd
think it is getting into trouble, causing too much swapping and suboptimal
behaviour.
This patch changes the kswapd scanning algorithm to use the same metric
for measuring pageout success as the synchronous reclaim path - namely, how
much work is required to free SWAP_CLUSTER_MAX pages.
This should make things less fragile all round, and has the added benefit
that kswapd will continue running so long as memory is low and it is
managing to free pages, rather than going through the full priority loop and
then giving up. This should result in much better behaviour, especially
when there are concurrent allocators.
akpm: the patch was confirmed to fix up the excessive swapout which Ray Bryant
<raybry@sgi.com> has been reporting.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Diffstat (limited to 'mm')
| -rw-r--r-- | mm/vmscan.c | 21 |
|---|---|---|
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 501ef742e59e83..b6f288a49d4cde 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -968,12 +968,16 @@ out:
 static int balance_pgdat(pg_data_t *pgdat, int nr_pages)
 {
 	int to_free = nr_pages;
+	int all_zones_ok;
 	int priority;
 	int i;
-	int total_scanned = 0, total_reclaimed = 0;
+	int total_scanned, total_reclaimed;
 	struct reclaim_state *reclaim_state = current->reclaim_state;
 	struct scan_control sc;
 
+loop_again:
+	total_scanned = 0;
+	total_reclaimed = 0;
 	sc.gfp_mask = GFP_KERNEL;
 	sc.may_writepage = 0;
 	sc.nr_mapped = read_page_state(nr_mapped);
@@ -987,10 +991,11 @@ static int balance_pgdat(pg_data_t *pgdat, int nr_pages)
 	}
 
 	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
-		int all_zones_ok = 1;
 		int end_zone = 0;	/* Inclusive.  0 = ZONE_DMA */
 		unsigned long lru_pages = 0;
 
+		all_zones_ok = 1;
+
 		if (nr_pages == 0) {
 			/*
 			 * Scan in the highmem->dma direction for the highest
@@ -1072,6 +1077,15 @@ scan:
 		 */
 		if (total_scanned && priority < DEF_PRIORITY - 2)
 			blk_congestion_wait(WRITE, HZ/10);
+
+		/*
+		 * We do this so kswapd doesn't build up large priorities for
+		 * example when it is freeing in parallel with allocators. It
+		 * matches the direct reclaim path behaviour in terms of impact
+		 * on zone->*_priority.
+		 */
+		if (total_reclaimed >= SWAP_CLUSTER_MAX)
+			break;
 	}
 out:
 	for (i = 0; i < pgdat->nr_zones; i++) {
@@ -1079,6 +1093,9 @@ out:
 		zone->prev_priority = zone->temp_priority;
 	}
 
+	if (!all_zones_ok)
+		goto loop_again;
+
 	return total_reclaimed;
 }