mm: gup: retain synchronicity of concurrent FOLL_LONGTERM R/O pins on SWP_STABLE_WRITE

David reported a SMP race condition discovered upstream, might affect downstream too. This is related to SWP_STABLE_WRITE potentially causing an extra COW that shouldn't have happened. Only swapping over zram, raid5 or blk-integrity or similar storage that would set SWP_STABLE_WRITES could trigger it, even then it's an almost impossible to trigger SMP race condition in real life. The only defect is a FOLL_LONGTERM R/O GUP pin, if taken concurrently with heavy swapout activity on the aforementioned storage devices requiring stable writes, could lose synchronicity. Short term R/O GUP pins are always fine to see the only the current snapshot of the memory even if do_wp_page replaces the page later, so they wouldn't be affected by SWP_STABLE_WRITES. This has never been reproduced except in synthetic testing and it was found upstream through code review only. NOTE: this problem already could happen before the GUP/COW fixes and it could have affected RDMA GUP pins well before gup_must_unshare and the COR fault were introduced, but thanks to gup_must_unshare and the COR fault, we can now correct this longstanding defect for good. Fixes: f05714293a59 ("mm: support anonymous stable page") Reported-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
author: Andrea Arcangeli <aarcange@redhat.com> 2023-05-30 21:22:19 -0400
committer: Andrea Arcangeli <aarcange@redhat.com> 2023-11-11 22:03:36 -0500
commit: 9088166cc63dc38823b5e9d1cb163f8e6f1dacfa (patch)
tree: 869333e291761ea904b896876bd87ed8d26bb62c
parent: 13b65d69e4047e7395155904e573ef7636c961c3 (diff)
download: aa-9088166cc63dc38823b5e9d1cb163f8e6f1dacfa.tar.gz
3 files changed, 35 insertions, 0 deletions
diff --git a/mm/gup.c b/mm/gup.c
index d404b47a58fd50..0450462ad1df46 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -146,6 +146,9 @@ static __always_inline bool __gup_must_unshare(unsigned int flags,
 			(irq_safe || !(vma->vm_flags & VM_SHARED));
 	if (PageKsm(page))
 		return gup_must_unshare_ksm(flags);
+	/* see the SWP_STABLE_WRITE comment in can_read_pin_swap_page */
+	if (PageWriteback(page))
+		return true;
 	if (PageHuge(page)) {
 		if (__page_mapcount(page) > 1)
 			return true;
diff --git a/mm/ksm.c b/mm/ksm.c
index 13242ead684aa1..7edb1db9742b43 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2706,6 +2706,11 @@ bool reuse_ksm_page(struct page *page,
 	}
 #endif
 
+	/*
+	 * We cannot reuse PageSwapCache also because it may be in
+	 * PageWriteback mode, so it would break SWP_STABLE_WRITE if
+	 * we did.
+	 */
 	if (PageSwapCache(page) || !page_stable_node(page))
 		return false;
 	/* Prohibit parallel get_ksm_page() */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 73f2a6b1a47502..e031ed44434c52 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1694,6 +1694,33 @@ bool can_read_pin_swap_page(struct page *page)
 {
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	VM_BUG_ON_PAGE(PageKsm(page), page);
+	/*
+	 * SWP_STABLE_WRITE might prevent page reuse later. So if we
+	 * don't unshare the page under stable write swapout I/O,
+	 * before taking the PIN, the R/O (!FOLL_WRITE) GUP pin we are
+	 * taking now, may be broken later in do_wp_page at the first
+	 * write access. In other words by unsharing the page here, we
+	 * clear PageWriteback before any GUP pin can be taken.
+	 *
+	 * This check relies on the vmscan.c is_page_cache_freeable()
+	 * check which guarantees us, if the page is under writeback,
+	 * it had to be unmapped from all ptes and not be pinned
+	 * first. And if it's still under writeback then this is the
+	 * first GUP pin being taken on it or it already would have
+	 * been unshared and in turn the PageWriteback would have been
+	 * cleared.
+	 *
+	 * KSM pages can be pinned even under writeback because they
+	 * would never be reused by the COW fault, unless
+	 * PageSwapCache is cleared first, so gup_must_unshare must
+	 * already cope with that regardless of SWP_STABLE_WRITE set
+	 * or not. If instead PageSwapCache has been cleared and the
+	 * page can be reused by the COW fault, then it'd imply
+	 * PageWriteback is gone as well and there would be no extra
+	 * COW caused by SWP_STABLE_WRITE either.
+	 */
+	if (PageWriteback(page))
+		return false;
 	return page_trans_huge_map_swapcount(page, NULL, NULL) <= 1;
 }
author	Andrea Arcangeli <aarcange@redhat.com>	2023-05-30 21:22:19 -0400
committer	Andrea Arcangeli <aarcange@redhat.com>	2023-11-11 22:03:36 -0500
commit	9088166cc63dc38823b5e9d1cb163f8e6f1dacfa (patch)
tree	869333e291761ea904b896876bd87ed8d26bb62c
parent	13b65d69e4047e7395155904e573ef7636c961c3 (diff)
download	aa-9088166cc63dc38823b5e9d1cb163f8e6f1dacfa.tar.gz