aboutsummaryrefslogtreecommitdiffstatshomepage
diff options
context:
space:
mode:
authorAndrea Arcangeli <aarcange@redhat.com>2023-05-30 21:22:19 -0400
committerAndrea Arcangeli <aarcange@redhat.com>2023-11-11 22:03:36 -0500
commit9088166cc63dc38823b5e9d1cb163f8e6f1dacfa (patch)
tree869333e291761ea904b896876bd87ed8d26bb62c
parent13b65d69e4047e7395155904e573ef7636c961c3 (diff)
downloadaa-9088166cc63dc38823b5e9d1cb163f8e6f1dacfa.tar.gz
mm: gup: retain synchronicity of concurrent FOLL_LONGTERM R/O pins on SWP_STABLE_WRITE
David reported a SMP race condition discovered upstream, might affect downstream too. This is related to SWP_STABLE_WRITE potentially causing an extra COW that shouldn't have happened. Only swapping over zram, raid5 or blk-integrity or similar storage that would set SWP_STABLE_WRITES could trigger it, even then it's an almost impossible to trigger SMP race condition in real life. The only defect is a FOLL_LONGTERM R/O GUP pin, if taken concurrently with heavy swapout activity on the aforementioned storage devices requiring stable writes, could lose synchronicity. Short term R/O GUP pins are always fine to see the only the current snapshot of the memory even if do_wp_page replaces the page later, so they wouldn't be affected by SWP_STABLE_WRITES. This has never been reproduced except in synthetic testing and it was found upstream through code review only. NOTE: this problem already could happen before the GUP/COW fixes and it could have affected RDMA GUP pins well before gup_must_unshare and the COR fault were introduced, but thanks to gup_must_unshare and the COR fault, we can now correct this longstanding defect for good. Fixes: f05714293a59 ("mm: support anonymous stable page") Reported-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
-rw-r--r--mm/gup.c3
-rw-r--r--mm/ksm.c5
-rw-r--r--mm/swapfile.c27
3 files changed, 35 insertions, 0 deletions
diff --git a/mm/gup.c b/mm/gup.c
index d404b47a58fd50..0450462ad1df46 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -146,6 +146,9 @@ static __always_inline bool __gup_must_unshare(unsigned int flags,
(irq_safe || !(vma->vm_flags & VM_SHARED));
if (PageKsm(page))
return gup_must_unshare_ksm(flags);
+ /* see the SWP_STABLE_WRITE comment in can_read_pin_swap_page */
+ if (PageWriteback(page))
+ return true;
if (PageHuge(page)) {
if (__page_mapcount(page) > 1)
return true;
diff --git a/mm/ksm.c b/mm/ksm.c
index 13242ead684aa1..7edb1db9742b43 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2706,6 +2706,11 @@ bool reuse_ksm_page(struct page *page,
}
#endif
+ /*
+ * We cannot reuse PageSwapCache also because it may be in
+ * PageWriteback mode, so it would break SWP_STABLE_WRITE if
+ * we did.
+ */
if (PageSwapCache(page) || !page_stable_node(page))
return false;
/* Prohibit parallel get_ksm_page() */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 73f2a6b1a47502..e031ed44434c52 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1694,6 +1694,33 @@ bool can_read_pin_swap_page(struct page *page)
{
VM_BUG_ON_PAGE(!PageLocked(page), page);
VM_BUG_ON_PAGE(PageKsm(page), page);
+ /*
+ * SWP_STABLE_WRITE might prevent page reuse later. So if we
+ * don't unshare the page under stable write swapout I/O,
+ * before taking the PIN, the R/O (!FOLL_WRITE) GUP pin we are
+ * taking now, may be broken later in do_wp_page at the first
+ * write access. In other words by unsharing the page here, we
+ * clear PageWriteback before any GUP pin can be taken.
+ *
+ * This check relies on the vmscan.c is_page_cache_freeable()
+ * check which guarantees us, if the page is under writeback,
+ * it had to be unmapped from all ptes and not be pinned
+ * first. And if it's still under writeback then this is the
+ * first GUP pin being taken on it or it already would have
+ * been unshared and in turn the PageWriteback would have been
+ * cleared.
+ *
+ * KSM pages can be pinned even under writeback because they
+ * would never be reused by the COW fault, unless
+ * PageSwapCache is cleared first, so gup_must_unshare must
+ * already cope with that regardless of SWP_STABLE_WRITE set
+ * or not. If instead PageSwapCache has been cleared and the
+ * page can be reused by the COW fault, then it'd imply
+ * PageWriteback is gone as well and there would be no extra
+ * COW caused by SWP_STABLE_WRITE either.
+ */
+ if (PageWriteback(page))
+ return false;
return page_trans_huge_map_swapcount(page, NULL, NULL) <= 1;
}