diff options
author | Andrea Arcangeli <aarcange@redhat.com> | 2023-05-30 21:22:19 -0400 |
---|---|---|
committer | Andrea Arcangeli <aarcange@redhat.com> | 2023-11-11 22:03:36 -0500 |
commit | 9088166cc63dc38823b5e9d1cb163f8e6f1dacfa (patch) | |
tree | 869333e291761ea904b896876bd87ed8d26bb62c | |
parent | 13b65d69e4047e7395155904e573ef7636c961c3 (diff) | |
download | aa-9088166cc63dc38823b5e9d1cb163f8e6f1dacfa.tar.gz |
mm: gup: retain synchronicity of concurrent FOLL_LONGTERM R/O pins on SWP_STABLE_WRITE
David reported a SMP race condition discovered upstream, might affect
downstream too. This is related to SWP_STABLE_WRITE potentially
causing an extra COW that shouldn't have happened.
Only swapping over zram, raid5 or blk-integrity or similar storage
that would set SWP_STABLE_WRITES could trigger it, even then it's an
almost impossible to trigger SMP race condition in real life. The only
defect is a FOLL_LONGTERM R/O GUP pin, if taken concurrently with
heavy swapout activity on the aforementioned storage devices requiring
stable writes, could lose synchronicity.
Short term R/O GUP pins are always fine to see the only the current
snapshot of the memory even if do_wp_page replaces the page later, so
they wouldn't be affected by SWP_STABLE_WRITES.
This has never been reproduced except in synthetic testing and it was
found upstream through code review only.
NOTE: this problem already could happen before the GUP/COW fixes and
it could have affected RDMA GUP pins well before gup_must_unshare and
the COR fault were introduced, but thanks to gup_must_unshare and the
COR fault, we can now correct this longstanding defect for good.
Fixes: f05714293a59 ("mm: support anonymous stable page")
Reported-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
-rw-r--r-- | mm/gup.c | 3 | ||||
-rw-r--r-- | mm/ksm.c | 5 | ||||
-rw-r--r-- | mm/swapfile.c | 27 |
3 files changed, 35 insertions, 0 deletions
diff --git a/mm/gup.c b/mm/gup.c index d404b47a58fd50..0450462ad1df46 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -146,6 +146,9 @@ static __always_inline bool __gup_must_unshare(unsigned int flags, (irq_safe || !(vma->vm_flags & VM_SHARED)); if (PageKsm(page)) return gup_must_unshare_ksm(flags); + /* see the SWP_STABLE_WRITE comment in can_read_pin_swap_page */ + if (PageWriteback(page)) + return true; if (PageHuge(page)) { if (__page_mapcount(page) > 1) return true; diff --git a/mm/ksm.c b/mm/ksm.c index 13242ead684aa1..7edb1db9742b43 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -2706,6 +2706,11 @@ bool reuse_ksm_page(struct page *page, } #endif + /* + * We cannot reuse PageSwapCache also because it may be in + * PageWriteback mode, so it would break SWP_STABLE_WRITE if + * we did. + */ if (PageSwapCache(page) || !page_stable_node(page)) return false; /* Prohibit parallel get_ksm_page() */ diff --git a/mm/swapfile.c b/mm/swapfile.c index 73f2a6b1a47502..e031ed44434c52 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1694,6 +1694,33 @@ bool can_read_pin_swap_page(struct page *page) { VM_BUG_ON_PAGE(!PageLocked(page), page); VM_BUG_ON_PAGE(PageKsm(page), page); + /* + * SWP_STABLE_WRITE might prevent page reuse later. So if we + * don't unshare the page under stable write swapout I/O, + * before taking the PIN, the R/O (!FOLL_WRITE) GUP pin we are + * taking now, may be broken later in do_wp_page at the first + * write access. In other words by unsharing the page here, we + * clear PageWriteback before any GUP pin can be taken. + * + * This check relies on the vmscan.c is_page_cache_freeable() + * check which guarantees us, if the page is under writeback, + * it had to be unmapped from all ptes and not be pinned + * first. And if it's still under writeback then this is the + * first GUP pin being taken on it or it already would have + * been unshared and in turn the PageWriteback would have been + * cleared. + * + * KSM pages can be pinned even under writeback because they + * would never be reused by the COW fault, unless + * PageSwapCache is cleared first, so gup_must_unshare must + * already cope with that regardless of SWP_STABLE_WRITE set + * or not. If instead PageSwapCache has been cleared and the + * page can be reused by the COW fault, then it'd imply + * PageWriteback is gone as well and there would be no extra + * COW caused by SWP_STABLE_WRITE either. + */ + if (PageWriteback(page)) + return false; return page_trans_huge_map_swapcount(page, NULL, NULL) <= 1; } |