author      Joonsoo Kim <iamjoonsoo.kim@lge.com>    2014-06-18 14:10:59 +0900
committer   Paul Mackerras <paulus@samba.org>       2014-07-17 15:00:02 +1000
commit      512c0dd835c3c49a139acb7491456f13747d2e41 (patch)
tree        193c1614d573bff7af4b7bcde1cef001503bde1e
parent      425c4dbc49b29a7e4bb91089142e5308ebd387a0 (diff)
download    powerpc-512c0dd835c3c49a139acb7491456f13747d2e41.tar.gz
CMA: aggressively allocate the pages on CMA reserved memory when not used
CMA is introduced to provide physically contiguous pages at runtime. For this purpose, it reserves memory at boot time. Although the memory is reserved, it can still be used to satisfy movable memory allocation requests. This use case benefits systems that need the CMA reserved memory only infrequently, and it is one of the main purposes of introducing CMA.

But there is a problem in the current implementation: it behaves just like a plain reserved-memory approach. The pages in the CMA reserved memory are hardly ever used for movable memory allocations. This is caused by the combination of the allocation and reclaim policies. Pages in the CMA reserved memory are allocated only when there is no other movable memory, that is, as a fallback allocation, so by the time this fallback allocation starts, the system is already under heavy memory pressure. Even under this pressure, a movable allocation easily succeeds, since there are still many pages in the CMA reserved memory. But this is not the case for unmovable and reclaimable allocations, because they cannot use the pages in the CMA reserved memory. For watermark checking, these allocations regard the system's free memory as (free pages - free CMA pages), that is, free unmovable pages + free reclaimable pages + free movable pages. Because the movable pages are already exhausted, the only free pages left are of the unmovable and reclaimable types, and this is a really small amount, so the watermark check fails. This wakes up kswapd to make enough free memory for unmovable and reclaimable allocations, and kswapd does so. So before we fully utilize the pages in the CMA reserved memory, kswapd starts to reclaim memory and tries to push free memory over the high watermark. The watermark check done by kswapd does not take free CMA pages into account, so many movable pages are reclaimed. After that, we have a lot of movable pages again, so the fallback allocation does not happen again. To conclude, the amount of free memory in meminfo, which includes free CMA pages, hovers around 512 MB if I reserve 512 MB of memory for CMA.

I found this problem in the following experiment.

4 CPUs, 1024 MB, VIRTUAL MACHINE
make -j16

CMA reserve:            0 MB         512 MB
Elapsed-time:           225.2        472.5
Average-MemFree:        322490 KB    630839 KB

To solve this problem, I can think of the following two possible solutions.

1. Allocate the pages in the CMA reserved memory first, and when they are exhausted, allocate movable pages.
2. Interleaved allocation: allocate a specific amount of memory from the CMA reserved memory and then allocate from the free movable memory.

I tested approach #1 and found a problem. Although free memory in meminfo can stay around the low watermark, it fluctuates heavily, because too many pages are reclaimed when kswapd is invoked. The reason for this behaviour is that successively allocated CMA pages sit on the LRU list in that order and kswapd reclaims them in the same order. These pages do not help the watermark check done by kswapd, so too many pages are reclaimed, I guess.

So I implemented approach #2. One thing I should note is that we should not change the allocation target (movable list or CMA) on every allocation attempt, since that prevents the allocated pages from being physically successive, which can hurt the performance of some I/O devices. To solve this, I keep the allocation target for at least pageblock_nr_pages attempts and make this number reflect the ratio of free pages excluding free CMA pages to free CMA pages. With this approach, the system works very smoothly and fully utilizes the pages in the CMA reserved memory.
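As a rough illustration of this ratio (the numbers here are hypothetical and chosen only for the arithmetic; they are not taken from the experiments below): assume a zone where managed_pages minus managed_cma_pages minus the high watermark leaves 768 MB (196608 pages of 4 KB), managed_cma_pages is 256 MB (65536 pages), and pageblocks are 2 MB, so pageblock_nr_pages = 512. Since normal > cma, the patch computes

	max_try_normal = normal * pageblock_nr_pages / cma
	               = 196608 * 512 / 65536 = 1536 pages
	max_try_cma    = pageblock_nr_pages = 512 pages

so, for movable requests, the allocator serves up to 1536 pages from the normal free lists, then up to 512 pages from MIGRATE_CMA, then replenishes both counters. Roughly three pageblocks' worth of normal pages go out for every pageblock of CMA pages, draining both pools at their 3:1 ratio while keeping each run physically successive at pageblock granularity.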
Following are the experimental results with this patch.

4 CPUs, 1024 MB, VIRTUAL MACHINE
make -j16

<Before>
CMA reserve:            0 MB         512 MB
Elapsed-time:           225.2        472.5
Average-MemFree:        322490 KB    630839 KB
nr_free_cma:            0            131068
pswpin:                 0            261666
pswpout:                75           1241363

<After>
CMA reserve:            0 MB         512 MB
Elapsed-time:           222.7        224
Average-MemFree:        325595 KB    393033 KB
nr_free_cma:            0            61001
pswpin:                 0            6
pswpout:                44           502

There is no difference if we do not have any CMA reserved memory (the 0 MB case). But with CMA reserved memory (the 512 MB case), this patch fully utilizes the reserved memory and the system behaves as if it had not reserved any memory at all.

With this patch we aggressively allocate the pages in the CMA reserved memory, so CMA allocation latency can increase. Below are the experimental results for latency.

4 CPUs, 1024 MB, VIRTUAL MACHINE
CMA reserve: 512 MB
Background workload: make -jN
Real workload: 8 MB CMA allocation/free, 20 times with a 5 sec interval

N:                       1          4          8          16
Elapsed-time (Before):   4309.75    9511.09    12276.1    77103.5
Elapsed-time (After):    5391.69    16114.1    19380.3    34879.2

So in general we see a latency increase, and the ratio of this increase is rather large - up to 70%. But under heavy workload it shows a latency decrease - up to 55%. This may be a worst-case scenario, but reducing the increase would be important for some systems, so I can say that this patch has both advantages and disadvantages in terms of latency.

Although I think this patch is the right direction for CMA, it has a side effect in the following case. If there is a small memory zone and CMA occupies most of it, the LRU for this zone would contain many CMA pages. When reclaim starts, these CMA pages would be reclaimed but not counted for the watermark check, so too many CMA pages could be reclaimed unnecessarily. Until now this could not happen, because free CMA pages were rarely used; with this patch, free CMA pages are used readily, so this problem becomes possible. I will handle it in another patchset after some investigation.

v2:
- In the fastpath, just replenish the counters. The calculation is done whenever the CMA area is varied.

v3:
- Use an unsigned type in adjust_managed_cma_page_count() (per Gioh)
- Fix the +/- count when calling adjust_managed_cma_page_count() (per Gioh)
- Instead of implementing __rmqueue_cma() with another __rmqueue_smallest(), implement choose_rmqueue_migratetype() to change the original migratetype to MIGRATE_CMA according to the criteria. This helps not to violate layering. (per Minchan in offline discussion)

BZ: 111727
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
-rw-r--r--  arch/powerpc/kvm/book3s_hv_cma.c      4
-rw-r--r--  drivers/base/dma-contiguous.c         3
-rw-r--r--  include/linux/gfp.h                   1
-rw-r--r--  include/linux/mmzone.h               14
-rw-r--r--  mm/page_alloc.c                     100
5 files changed, 119 insertions, 3 deletions
diff --git a/arch/powerpc/kvm/book3s_hv_cma.c b/arch/powerpc/kvm/book3s_hv_cma.c
index d9d3d8553d51d9..4a2602a801e016 100644
--- a/arch/powerpc/kvm/book3s_hv_cma.c
+++ b/arch/powerpc/kvm/book3s_hv_cma.c
@@ -133,6 +133,7 @@ struct page *kvm_alloc_cma(unsigned long nr_pages, unsigned long align_pages)
bitmap_set(cma->bitmap, pageno, nr_chunk);
page = pfn_to_page(pfn);
memset(pfn_to_kaddr(pfn), 0, nr_pages << PAGE_SHIFT);
+ adjust_managed_cma_page_count(page_zone(page), -nr_pages);
break;
} else if (ret != -EBUSY) {
break;
@@ -180,6 +181,7 @@ bool kvm_release_cma(struct page *pages, unsigned long nr_pages)
(pfn - cma->base_pfn) >> (KVM_CMA_CHUNK_ORDER - PAGE_SHIFT),
nr_chunk);
free_contig_range(pfn, nr_pages);
+ adjust_managed_cma_page_count(page_zone(pages), nr_pages);
mutex_unlock(&kvm_cma_mutex);
return true;
@@ -210,6 +212,8 @@ static int __init kvm_cma_activate_area(unsigned long base_pfn,
}
init_cma_reserved_pageblock(pfn_to_page(base_pfn));
} while (--i);
+ adjust_managed_cma_page_count(zone, count);
+
return 0;
}
diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index 0ca54421ce977b..8d304536cb09cf 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -153,6 +153,7 @@ static __init int cma_activate_area(unsigned long base_pfn, unsigned long count)
}
init_cma_reserved_pageblock(pfn_to_page(base_pfn));
} while (--i);
+ adjust_managed_cma_page_count(zone, count);
return 0;
}
@@ -338,6 +339,7 @@ struct page *dma_alloc_from_contiguous(struct device *dev, int count,
if (ret == 0) {
bitmap_set(cma->bitmap, pageno, count);
page = pfn_to_page(pfn);
+ adjust_managed_cma_page_count(page_zone(page), -count);
break;
} else if (ret != -EBUSY) {
break;
@@ -384,6 +386,7 @@ bool dma_release_from_contiguous(struct device *dev, struct page *pages,
mutex_lock(&cma_mutex);
bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
free_contig_range(pfn, count);
+ adjust_managed_cma_page_count(page_zone(page), count);
mutex_unlock(&cma_mutex);
return true;
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 0f615eb23d054a..b32734a63d793c 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -410,6 +410,7 @@ extern int alloc_contig_range(unsigned long start, unsigned long end,
extern void free_contig_range(unsigned long pfn, unsigned nr_pages);
/* CMA stuff */
+extern void adjust_managed_cma_page_count(struct zone *zone, long count);
extern void init_cma_reserved_pageblock(struct page *page);
#endif
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 5c76737d836b1e..aa79acce6d9233 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -386,6 +386,20 @@ struct zone {
int compact_order_failed;
#endif
+#ifdef CONFIG_CMA
+ unsigned long managed_cma_pages;
+ /*
+ * Number of allocation attempts on each normal/cma type
+ * without switching the type. max_try_(normal/cma) maintain
+ * pre-calculated counters and replenish nr_try_(normal/cma)
+ * with them whenever both are 0.
+ */
+ int nr_try_normal;
+ int nr_try_cma;
+ int max_try_normal;
+ int max_try_cma;
+#endif
+
ZONE_PADDING(_pad1_)
/* Fields commonly accessed by the page reclaim scanner */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2ee0fd313f036e..d2a47b148d1c77 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -767,7 +767,56 @@ void __meminit __free_pages_bootmem(struct page *page, unsigned int order)
}
#ifdef CONFIG_CMA
-/* Free whole pageblock and set it's migration type to MIGRATE_CMA. */
+void adjust_managed_cma_page_count(struct zone *zone, long count)
+{
+ unsigned long flags;
+ unsigned long total, cma, normal;
+
+ spin_lock_irqsave(&zone->lock, flags);
+ zone->managed_cma_pages += count;
+
+ total = zone->managed_pages;
+ cma = zone->managed_cma_pages;
+ normal = total - cma - high_wmark_pages(zone);
+
+ /* No cma pages, so do only normal allocation */
+ if (cma == 0) {
+ zone->max_try_normal = pageblock_nr_pages;
+ zone->max_try_cma = 0;
+ goto out;
+ }
+
+ /*
+ * We want to consume cma pages in a well balanced ratio so that
+ * we have consumed enough cma pages before the reclaim starts. For
+ * this purpose, we can use the ratio normal : cma. And we don't
+ * want to switch too frequently, because that prevents the allocated
+ * pages from being successive, which is bad for some sorts of devices.
+ * I choose pageblock_nr_pages as the minimum amount of successive
+ * allocation because it is the size of a huge page and fragmentation
+ * avoidance is implemented based on this size.
+ *
+ * To meet the above criteria, I derive the following equation.
+ *
+ * if (normal > cma) then; normal : cma = X : pageblock_nr_pages
+ * else (normal <= cma) then; normal : cma = pageblock_nr_pages : X
+ */
+ if (normal > cma) {
+ zone->max_try_normal = normal * pageblock_nr_pages / cma;
+ zone->max_try_cma = pageblock_nr_pages;
+ } else {
+ zone->max_try_normal = pageblock_nr_pages;
+ zone->max_try_cma = cma * pageblock_nr_pages / normal;
+ }
+
+out:
+ zone->nr_try_normal = zone->max_try_normal;
+ zone->nr_try_cma = zone->max_try_cma;
+
+ spin_unlock_irqrestore(&zone->lock, flags);
+}
+
+/* Free whole pageblock and set its migration type to MIGRATE_CMA. */
void __init init_cma_reserved_pageblock(struct page *page)
{
unsigned i = pageblock_nr_pages;
@@ -1090,6 +1139,44 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
return NULL;
}
+#ifdef CONFIG_CMA
+static int __choose_rmqueue_migratetype(struct zone *zone, unsigned int order)
+{
+ if (zone->nr_try_normal > 0) {
+ zone->nr_try_normal -= 1 << order;
+ return MIGRATE_MOVABLE;
+ }
+
+ if (zone->nr_try_cma > 0) {
+ zone->nr_try_cma -= 1 << order;
+ return MIGRATE_CMA;
+ }
+
+ /* Reset counter */
+ zone->nr_try_normal = zone->max_try_normal;
+ zone->nr_try_cma = zone->max_try_cma;
+
+ zone->nr_try_normal -= 1 << order;
+ return MIGRATE_MOVABLE;
+}
+
+static inline int choose_rmqueue_migratetype(struct zone *zone,
+ unsigned int order, int migratetype)
+{
+ if (migratetype == MIGRATE_MOVABLE && zone->managed_cma_pages)
+ return __choose_rmqueue_migratetype(zone, order);
+
+ return migratetype;
+}
+
+#else
+static inline int choose_rmqueue_migratetype(struct zone *zone,
+ unsigned int order, int migratetype)
+{
+ return migratetype;
+}
+#endif
+
/*
* Do the hard work of removing an element from the buddy allocator.
* Call me with the zone->lock already held.
@@ -1099,10 +1186,17 @@ static struct page *__rmqueue(struct zone *zone, unsigned int order,
{
struct page *page;
-retry_reserve:
+ migratetype = choose_rmqueue_migratetype(zone, order, migratetype);
+
+retry:
page = __rmqueue_smallest(zone, order, migratetype);
if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
+ if (is_migrate_cma(migratetype)) {
+ migratetype = MIGRATE_MOVABLE;
+ goto retry;
+ }
+
page = __rmqueue_fallback(zone, order, migratetype);
/*
@@ -1112,7 +1206,7 @@ retry_reserve:
*/
if (!page) {
migratetype = MIGRATE_RESERVE;
- goto retry_reserve;
+ goto retry;
}
}