aboutsummaryrefslogtreecommitdiffstats
path: root/mm/z3fold.c
AgeCommit message (Collapse)AuthorFilesLines
2024-04-25mm: zpool: return pool size in pagesJohannes Weiner1-5/+5
All zswap backends track their pool sizes in pages. Currently they multiply by PAGE_SIZE for zswap, only for zswap to divide again in order to do limit math. Report pages directly. Link: https://lkml.kernel.org/r/20240312153901.3441-2-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Yosry Ahmed <yosryahmed@google.com> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev> Reviewed-by: Nhat Pham <nphamcs@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23mm/z3fold: fix the comment for __encode_handle()Zhongkun He1-2/+3
The comment is confusing that Pool lock should be held as this function accesses first_num above the __encode_handle() because first_num is the element of z3fold_header which is protected by z3fold_header->page_lock. I found the same comment for encode_handle() in zbud.c by accident ,Pool lock should be held as this function accesses first|last_chunks, which is the element of zbud_header and it does not have any lock, so pool lock should be held. Z3fold is based on zbud, maybe the comment come from zbud, but it was wrong, so fix it. Link: https://lkml.kernel.org/r/20240219024453.2240147-1-hezhongkun.hzk@bytedance.com Signed-off-by: Zhongkun He <hezhongkun.hzk@bytedance.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21mm/z3fold: remove obsolete comment for struct z3fold_poolXiu Jianfeng1-2/+0
Since commit e774a7bc7f0a ("mm: zswap: remove page reclaim logic from z3fold"), zpool and zpool_ops have been removed, so also remove the corresponding comments. Link: https://lkml.kernel.org/r/20230814221142.486548-1-xiujianfeng@huaweicloud.com Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21mm/z3fold: use helper function put_z3fold_locked() and put_z3fold_locked_list()Ruan Jinjie1-8/+17
This code is already duplicated six times, use helper function put_z3fold_locked() to release z3fold page instead of open code it to help improve code readability a bit. And add put_z3fold_locked_list() helper function to be consistent with it. No functional change involved. Link: https://lkml.kernel.org/r/20230803113824.886413-1-ruanjinjie@huawei.com Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mm: zswap: remove shrink from zpool interfaceDomenico Cerasuolo1-3/+1
Now that all three zswap backends have removed their shrink code, it is no longer necessary for the zpool interface to include shrink/writeback endpoints. Link: https://lkml.kernel.org/r/20230612093815.133504-6-cerasuolodomenico@gmail.com Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Nhat Pham <nphamcs@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mm: zswap: remove page reclaim logic from z3foldDomenico Cerasuolo1-243/+2
Switch z3fold to the new generic zswap LRU and remove its custom implementation. Link: https://lkml.kernel.org/r/20230612093815.133504-4-cerasuolodomenico@gmail.com Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-18mm: remove PageMovable exportGreg Kroah-Hartman1-2/+0
The only in-kernel users that need PageMovable() to be exported are z3fold and zsmalloc and they are only using it for dubious debugging functionality. So remove those usages and the export so that no driver code accidentally thinks that they are allowed to use this symbol. Link: https://lkml.kernel.org/r/20230106135900.3763622-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-11zpool: clean out dead codeJohannes Weiner1-31/+5
There is a lot of provision for flexibility that isn't actually needed or used. Zswap (the only zpool user) always passes zpool_ops with an .evict method set. The backends who reclaim only do so for zswap, so they can also directly call zpool_ops without indirection or checks. Finally, there is no need to check the retries parameters and bail with -EINVAL in the reclaim function, when that's called just a few lines below with a hard-coded 8. There is no need to duplicate the evictable and sleep_mapped attrs from the driver in zpool_ops. Link: https://lkml.kernel.org/r/20221128191616.1261026-3-nphamcs@gmail.com Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Nhat Pham <nphamcs@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-02mm: Convert all PageMovable users to movable_operationsMatthew Wilcox (Oracle)1-75/+9
These drivers are rather uncomfortably hammered into the address_space_operations hole. They aren't filesystems and don't behave like filesystems. They just need their own movable_operations structure, which we can point to directly from page->mapping. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-05-27mm/z3fold: fix z3fold_page_migrate races with z3fold_mapMiaohe Lin1-4/+12
Think about the below scenario: CPU1 CPU2 z3fold_page_migrate z3fold_map z3fold_page_trylock ... z3fold_page_unlock /* slots still points to old zhdr*/ get_z3fold_header get slots from handle get old zhdr from slots z3fold_page_trylock return *old* zhdr encode_handle(new_zhdr, FIRST|LAST|MIDDLE) put_page(page) /* zhdr is freed! */ but zhdr is still used by caller! z3fold_map can map freed z3fold page and lead to use-after-free bug. To fix it, we add PAGE_MIGRATED to indicate z3fold page is migrated and soon to be released. So get_z3fold_header won't return such page. Link: https://lkml.kernel.org/r/20220429064051.61552-10-linmiaohe@huawei.com Fixes: 1f862989b04a ("mm/z3fold.c: support page migration") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27mm/z3fold: fix z3fold_reclaim_page races with z3fold_freeMiaohe Lin1-15/+3
Think about the below scenario: CPU1 CPU2 z3fold_reclaim_page z3fold_free spin_lock(&pool->lock) get_z3fold_header -- hold page_lock kref_get_unless_zero kref_put--zhdr->refcount can be 1 now !z3fold_page_trylock kref_put -- zhdr->refcount is 0 now release_z3fold_page WARN_ON(!list_empty(&zhdr->buddy)); -- we're on buddy now! spin_lock(&pool->lock); -- deadlock here! z3fold_reclaim_page might race with z3fold_free and will lead to pool lock deadlock and zhdr buddy non-empty warning. To fix this, defer getting the refcount until page_lock is held just like what __z3fold_alloc does. Note this has the side effect that we won't break the reclaim if we meet a soon to be released z3fold page now. Link: https://lkml.kernel.org/r/20220429064051.61552-9-linmiaohe@huawei.com Fixes: dcf5aedb24f8 ("z3fold: stricter locking and more careful reclaim") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27mm/z3fold: always clear PAGE_CLAIMED under z3fold page lockMiaohe Lin1-3/+3
Think about the below race window: CPU1 CPU2 z3fold_reclaim_page z3fold_free test_and_set_bit PAGE_CLAIMED failed to reclaim page z3fold_page_lock(zhdr); add back to the lru list; z3fold_page_unlock(zhdr); get_z3fold_header page_claimed=test_and_set_bit PAGE_CLAIMED clear_bit(PAGE_CLAIMED, &page->private); if (!page_claimed) /* it's false true */ free_handle is not called free_handle won't be called in this case. So z3fold_buddy_slots will leak. Fix it by always clear PAGE_CLAIMED under z3fold page lock. Link: https://lkml.kernel.org/r/20220429064051.61552-8-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27mm/z3fold: put z3fold page back into unbuddied list when reclaim or ↵Miaohe Lin1-0/+4
migration fails When doing z3fold page reclaim or migration, the page is removed from unbuddied list. If reclaim or migration succeeds, it's fine as page is released. But in case it fails, the page is not put back into unbuddied list now. The page will be leaked until next compaction work, reclaim or migration is done. Link: https://lkml.kernel.org/r/20220429064051.61552-7-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27revert "mm/z3fold.c: allow __GFP_HIGHMEM in z3fold_alloc"Miaohe Lin1-5/+3
Revert commit f1549cb5ab2b ("mm/z3fold.c: allow __GFP_HIGHMEM in z3fold_alloc"). z3fold can't support GFP_HIGHMEM page now. page_address is used directly at all places. Moreover, z3fold_header is on per cpu unbuddied list which could be accessed anytime. So we should remove the support of GFP_HIGHMEM allocation for z3fold. Link: https://lkml.kernel.org/r/20220429064051.61552-6-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27mm/z3fold: throw warning on failure of trylock_page in z3fold_allocMiaohe Lin1-4/+3
If trylock_page fails, the page won't be non-lru movable page. When this page is freed via free_z3fold_page, it will trigger bug on PageMovable check in __ClearPageMovable. Throw warning on failure of trylock_page to guard against such rare case just as what zsmalloc does. Link: https://lkml.kernel.org/r/20220429064051.61552-5-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27mm/z3fold: remove buggy use of stale list for allocationMiaohe Lin1-22/+1
Currently if z3fold couldn't find an unbuddied page it would first try to pull a page off the stale list. But this approach is problematic. If init z3fold page fails later, the page should be freed via free_z3fold_page to clean up the relevant resource instead of using __free_page directly. And if page is successfully reused, it will BUG_ON later in __SetPageMovable because it's already non-lru movable page, i.e. PAGE_MAPPING_MOVABLE is already set in page->mapping. In order to fix all of these issues, we can simply remove the buggy use of stale list for allocation because can_sleep should always be false and we never really hit the reusing code path now. Link: https://lkml.kernel.org/r/20220429064051.61552-4-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27mm/z3fold: fix possible null pointer dereferencingMiaohe Lin1-1/+11
alloc_slots could fail to allocate memory under heavy memory pressure. So we should check zhdr->slots against NULL to avoid future null pointer dereferencing. Link: https://lkml.kernel.org/r/20220429064051.61552-3-linmiaohe@huawei.com Fixes: fc5488651c7d ("z3fold: simplify freeing slots") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27mm/z3fold: fix sheduling while atomicMiaohe Lin1-2/+1
Patch series "A few fixup patches for z3fold". This series contains a few fixup patches to fix sheduling while atomic, fix possible null pointer dereferencing, fix various race conditions and so on. More details can be found in the respective changelogs. This patch (of 9): z3fold's page_lock is always held when calling alloc_slots. So gfp should be GFP_ATOMIC to avoid "scheduling while atomic" bug. Link: https://lkml.kernel.org/r/20220429064051.61552-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20220429064051.61552-2-linmiaohe@huawei.com Fixes: fc5488651c7d ("z3fold: simplify freeing slots") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28mm/z3fold: remove unneeded PAGE_HEADLESS check in free_handle()Miaohe Lin1-3/+0
The only caller z3fold_free() never calls free_handle() in PAGE_HEADLESS case. Remove this unneeded check. Link: https://lkml.kernel.org/r/20220308134311.59086-9-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28mm/z3fold: remove redundant list_del_init of zhdr->buddy in z3fold_freeMiaohe Lin1-3/+0
do_compact_page() will do list_del_init(&zhdr->buddy) for us. Remove this extra one to save some possible cpu cycles. Link: https://lkml.kernel.org/r/20220308134311.59086-8-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28mm/z3fold: move decrement of pool->pages_nr into __release_z3fold_page()Miaohe Lin1-29/+12
The z3fold will always do atomic64_dec(&pool->pages_nr) when the __release_z3fold_page() is called. Thus we can move decrement of pool->pages_nr into __release_z3fold_page() to simplify the code. Also we can reduce the size of z3fold.o ~1k. Without this patch: text data bss dec hex filename 15444 1376 8 16828 41bc mm/z3fold.o With this patch: text data bss dec hex filename 15044 1248 8 16300 3fac mm/z3fold.o Link: https://lkml.kernel.org/r/20220308134311.59086-7-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28mm/z3fold: remove confusing local variable l reassignmentMiaohe Lin1-1/+0
The local variable l holds the address of unbuddied[i] which won't change after we take the pool lock. Remove it to avoid confusion. Link: https://lkml.kernel.org/r/20220308134311.59086-6-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28mm/z3fold: remove unneeded page_mapcount_reset and ClearPagePrivateMiaohe Lin1-3/+0
Page->page_type and PagePrivate are not used in z3fold. We should remove these confusing unneeded operations. The z3fold do these here is due to referring to zsmalloc's migration code which does need these operations. Link: https://lkml.kernel.org/r/20220308134311.59086-5-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28mm/z3fold: minor clean up for z3fold_freeMiaohe Lin1-4/+4
Use put_z3fold_header() to pair with get_z3fold_header. Also fix the wrong comments. Minor readability improvement. Link: https://lkml.kernel.org/r/20220308134311.59086-4-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28mm/z3fold: remove obsolete comment in z3fold_allocMiaohe Lin1-3/+0
The highmem pages are supported since commit f1549cb5ab2b ("mm/z3fold.c: allow __GFP_HIGHMEM in z3fold_alloc"). Remove the residual comment. Link: https://lkml.kernel.org/r/20220308134311.59086-3-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28mm/z3fold: declare z3fold_mount with __initMiaohe Lin1-1/+1
Patch series "A few cleanup patches for z3fold", v2. This series contains a few patches to simplify the code, remove unneeded code, fix obsolete comment and so on. More details can be found in the respective changelogs. This patch (of 8): z3fold_mount is only called during init. So we should declare it with __init. Link: https://lkml.kernel.org/r/20220308134311.59086-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20220308134311.59086-2-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2021-07-01mm/z3fold: add kerneldoc fields for z3fold_poolMel Gorman1-0/+2
make W=1 generates the following warning for z3fold_pool mm/z3fold.c:171: warning: Function parameter or member 'zpool' not described in 'z3fold_pool' mm/z3fold.c:171: warning: Function parameter or member 'zpool_ops' not described in 'z3fold_pool' Commit 9a001fc19ccc ("z3fold: the 3-fold allocator for compressed pages") simply did not document the fields at the time. Add rudimentary documentation. Link: https://lkml.kernel.org/r/20210520084809.8576-11-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Yang Shi <shy828301@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Dan Streetman <ddstreet@ieee.org> Cc: David Hildenbrand <david@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-30mm/z3fold: use release_z3fold_page_locked() to release locked z3fold pageMiaohe Lin1-1/+1
We should use release_z3fold_page_locked() to release z3fold page when it's locked, although it looks harmless to use release_z3fold_page() now. Link: https://lkml.kernel.org/r/20210619093151.1492174-7-linmiaohe@huawei.com Fixes: dcf5aedb24f8 ("z3fold: stricter locking and more careful reclaim") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Hillf Danton <hdanton@sina.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-30mm/z3fold: fix potential memory leak in z3fold_destroy_pool()Miaohe Lin1-0/+1
There is a memory leak in z3fold_destroy_pool() as it forgets to free_percpu pool->unbuddied. Call free_percpu for pool->unbuddied to fix this issue. Link: https://lkml.kernel.org/r/20210619093151.1492174-6-linmiaohe@huawei.com Fixes: d30561c56f41 ("z3fold: use per-cpu unbuddied lists") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Hillf Danton <hdanton@sina.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-30mm/z3fold: remove unused function handle_to_z3fold_header()Miaohe Lin1-18/+4
handle_to_z3fold_header() is unused now. So we can remove it. As a result, get_z3fold_header() becomes the only caller of __get_z3fold_header() and the argument lock is always true. Therefore we could further fold the __get_z3fold_header() into get_z3fold_header() with lock = true. Link: https://lkml.kernel.org/r/20210619093151.1492174-5-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Hillf Danton <hdanton@sina.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-30mm/z3fold: remove magic number in z3fold_create_pool()Miaohe Lin1-1/+2
It's meaningless to pass a magic number 2 to __alloc_percpu() as there is a minimum alignment size of PCPU_MIN_ALLOC_SIZE (> 2) in it. Also there is no special alignment requirement for unbuddied. So we could replace this magic number with nature alignment, i.e. __alignof__(struct list_head), to improve readability. Link: https://lkml.kernel.org/r/20210619093151.1492174-4-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Hillf Danton <hdanton@sina.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-30mm/z3fold: avoid possible underflow in z3fold_alloc()Miaohe Lin1-2/+5
It is not enough to just make sure the z3fold header is not larger than the page size. When z3fold header is equal to PAGE_SIZE, we would underflow when check alloc size against PAGE_SIZE - ZHDR_SIZE_ALIGNED - CHUNK_SIZE in z3fold_alloc(). Make sure there has remaining spaces for its buddy to fix this theoretical issue. Link: https://lkml.kernel.org/r/20210619093151.1492174-3-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Hillf Danton <hdanton@sina.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-30mm/z3fold: define macro NCHUNKS as TOTAL_CHUNKS - ZHDR_CHUNKSMiaohe Lin1-1/+1
Patch series "Cleanup and fixup for z3fold". This series contains cleanups to remove unused function, redefine macro to improve readability and so on. Also this fixes several bugs in z3fold, such as memory leak in z3fold_destroy_pool(). More details can be found in the respective changelogs. This patch (of 6): To improve code readability, we could define macro NCHUNKS as TOTAL_CHUNKS - ZHDR_CHUNKS. No functional change intended. Link: https://lkml.kernel.org/r/20210619093151.1492174-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20210619093151.1492174-2-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Hillf Danton <hdanton@sina.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-07mm: fix some typos and code style problemsShijie Luo1-1/+1
fix some typos and code style problems in mm. gfp.h: s/MAXNODES/MAX_NUMNODES mmzone.h: s/then/than rmap.c: s/__vma_split()/__vma_adjust() swap.c: s/__mod_zone_page_stat/__mod_zone_page_state, s/is is/is swap_state.c: s/whoes/whose z3fold.c: code style problem fix in z3fold_unregister_migration zsmalloc.c: s/of/or, s/give/given Link: https://lkml.kernel.org/r/20210419083057.64820-1-luoshijie1@huawei.com Signed-off-by: Shijie Luo <luoshijie1@huawei.com> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-25z3fold: prevent reclaim/free race for headless pagesThomas Hebb1-1/+15
Commit ca0246bb97c2 ("z3fold: fix possible reclaim races") introduced the PAGE_CLAIMED flag "to avoid racing on a z3fold 'headless' page release." By atomically testing and setting the bit in each of z3fold_free() and z3fold_reclaim_page(), a double-free was avoided. However, commit dcf5aedb24f8 ("z3fold: stricter locking and more careful reclaim") appears to have unintentionally broken this behavior by moving the PAGE_CLAIMED check in z3fold_reclaim_page() to after the page lock gets taken, which only happens for non-headless pages. For headless pages, the check is now skipped entirely and races can occur again. I have observed such a race on my system: page:00000000ffbd76b7 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x165316 flags: 0x2ffff0000000000() raw: 02ffff0000000000 ffffea0004535f48 ffff8881d553a170 0000000000000000 raw: 0000000000000000 0000000000000011 00000000ffffffff 0000000000000000 page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0) ------------[ cut here ]------------ kernel BUG at include/linux/mm.h:707! invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 2 PID: 291928 Comm: kworker/2:0 Tainted: G B 5.10.7-arch1-1-kasan #1 Hardware name: Gigabyte Technology Co., Ltd. H97N-WIFI/H97N-WIFI, BIOS F9b 03/03/2016 Workqueue: zswap-shrink shrink_worker RIP: 0010:__free_pages+0x10a/0x130 Code: c1 e7 06 48 01 ef 45 85 e4 74 d1 44 89 e6 31 d2 41 83 ec 01 e8 e7 b0 ff ff eb da 48 c7 c6 e0 32 91 88 48 89 ef e8 a6 89 f8 ff <0f> 0b 4c 89 e7 e8 fc 79 07 00 e9 33 ff ff ff 48 89 ef e8 ff 79 07 RSP: 0000:ffff88819a2ffb98 EFLAGS: 00010296 RAX: 0000000000000000 RBX: ffffea000594c5a8 RCX: 0000000000000000 RDX: 1ffffd4000b298b7 RSI: 0000000000000000 RDI: ffffea000594c5b8 RBP: ffffea000594c580 R08: 000000000000003e R09: ffff8881d5520bbb R10: ffffed103aaa4177 R11: 0000000000000001 R12: ffffea000594c5b4 R13: 0000000000000000 R14: ffff888165316000 R15: ffffea000594c588 FS: 0000000000000000(0000) GS:ffff8881d5500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7c8c3654d8 CR3: 0000000103f42004 CR4: 00000000001706e0 Call Trace: z3fold_zpool_shrink+0x9b6/0x1240 shrink_worker+0x35/0x90 process_one_work+0x70c/0x1210 worker_thread+0x539/0x1200 kthread+0x330/0x400 ret_from_fork+0x22/0x30 Modules linked in: rfcomm ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ccm algif_aead des_generic libdes ecb algif_skcipher cmac bnep md4 algif_hash af_alg vfat fat intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel iwlmvm hid_logitech_hidpp kvm at24 mac80211 snd_hda_codec_realtek iTCO_wdt snd_hda_codec_generic intel_pmc_bxt snd_hda_codec_hdmi ledtrig_audio iTCO_vendor_support mei_wdt mei_hdcp snd_hda_intel snd_intel_dspcfg libarc4 soundwire_intel irqbypass iwlwifi soundwire_generic_allocation rapl soundwire_cadence intel_cstate snd_hda_codec intel_uncore btusb joydev mousedev snd_usb_audio pcspkr btrtl uvcvideo nouveau btbcm i2c_i801 btintel snd_hda_core videobuf2_vmalloc i2c_smbus snd_usbmidi_lib videobuf2_memops bluetooth snd_hwdep soundwire_bus snd_soc_rt5640 videobuf2_v4l2 cfg80211 snd_soc_rl6231 videobuf2_common snd_rawmidi lpc_ich alx videodev mdio snd_seq_device snd_soc_core mc ecdh_generic mxm_wmi mei_me hid_logitech_dj wmi snd_compress e1000e ac97_bus mei ttm rfkill snd_pcm_dmaengine ecc snd_pcm snd_timer snd soundcore mac_hid acpi_pad pkcs8_key_parser it87 hwmon_vid crypto_user fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_crypt cbc encrypted_keys trusted tpm rng_core usbhid dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper xhci_pci xhci_pci_renesas i915 video intel_gtt i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm agpgart ---[ end trace 126d646fc3dc0ad8 ]--- To fix the issue, re-add the earlier test and set in the case where we have a headless page. Link: https://lkml.kernel.org/r/c8106dbe6d8390b290cd1d7f873a2942e805349e.1615452048.git.tommyhebb@gmail.com Fixes: dcf5aedb24f8 ("z3fold: stricter locking and more careful reclaim") Signed-off-by: Thomas Hebb <tommyhebb@gmail.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Jongseok Kim <ks77sj@gmail.com> Cc: Snild Dolkow <snild@sony.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: set the sleep_mapped to true for zbud and z3foldTian Tao1-0/+1
zpool driver adds a flag to indicate whether the zpool driver can enter an atomic context after mapping. This patch sets it true for z3fold and zbud. Link: https://lkml.kernel.org/r/1611035683-12732-3-git-send-email-tiantao6@hisilicon.com Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reported-by: Mike Galbraith <efault@gmx.de> Cc: Seth Jennings <sjenning@redhat.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24z3fold: simplify the zhdr initialization code in init_z3fold_page()Miaohe Lin1-7/+1
We can simplify the zhdr initialization by memset() the zhdr first instead of set struct member to zero one by one. This would also make code more compact and clear. Link: https://lkml.kernel.org/r/20210120085851.16159-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24z3fold: remove unused attribute for release_z3fold_pageMiaohe Lin1-2/+1
Since commit dcf5aedb24f8 ("z3fold: stricter locking and more careful reclaim"), release_z3fold_page() is used again. So we can drop the unused attribute safely. Link: https://lkml.kernel.org/r/20210120084008.58432-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15z3fold: remove preempt disabled sections for RTVitaly Wool1-7/+10
Replace get_cpu_ptr() with migrate_disable()+this_cpu_ptr() so RT can take spinlocks that become sleeping locks. Signed-off-by Mike Galbraith <efault@gmx.de> Link: https://lkml.kernel.org/r/20201209145151.18994-3-vitaly.wool@konsulko.com Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15z3fold: stricter locking and more careful reclaimVitaly Wool1-58/+85
Use temporary slots in reclaim function to avoid possible race when freeing those. While at it, make sure we check CLAIMED flag under page lock in the reclaim function to make sure we are not racing with z3fold_alloc(). Link: https://lkml.kernel.org/r/20201209145151.18994-4-vitaly.wool@konsulko.com Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.com> Cc: <stable@vger.kernel.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15z3fold: simplify freeing slotsVitaly Wool1-42/+13
Patch series "z3fold: stability / rt fixes". Address z3fold stability issues under stress load, primarily in the reclaim and free aspects. Besides, it fixes the locking problems that were only seen in real-time kernel configuration. This patch (of 3): There used to be two places in the code where slots could be freed, namely when freeing the last allocated handle from the slots and when releasing the z3fold header these slots aree linked to. The logic to decide on whether to free certain slots was complicated and error prone in both functions and it led to failures in RT case. To fix that, make free_handle() the single point of freeing slots. Link: https://lkml.kernel.org/r/20201209145151.18994-1-vitaly.wool@konsulko.com Link: https://lkml.kernel.org/r/20201209145151.18994-2-vitaly.wool@konsulko.com Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.com> Tested-by: Mike Galbraith <efault@gmx.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13mm/z3fold.c: use xx_zalloc instead xx_alloc and memsetHui Su1-2/+1
alloc_slots() allocates memory for slots using kmem_cache_alloc(), then memsets it. We can just use kmem_cache_zalloc(). Signed-off-by: Hui Su <sh_def@163.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200926100834.GA184671@rlk Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-28mm/z3fold: silence kmemleak false positives of slotsQian Cai1-0/+3
Kmemleak reported many leaks while under memory pressue in, slots = alloc_slots(pool, gfp); which is referenced by "zhdr" in init_z3fold_page(), zhdr->slots = slots; However, "zhdr" could be gone without freeing slots as the later will be freed separately when the last "handle" off of "handles" array is freed. It will be within "slots" which is always aligned. unreferenced object 0xc000000fdadc1040 (size 104): comm "oom04", pid 140476, jiffies 4295359280 (age 3454.970s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: z3fold_zpool_malloc+0x7b0/0xe10 alloc_slots at mm/z3fold.c:214 (inlined by) init_z3fold_page at mm/z3fold.c:412 (inlined by) z3fold_alloc at mm/z3fold.c:1161 (inlined by) z3fold_zpool_malloc at mm/z3fold.c:1735 zpool_malloc+0x34/0x50 zswap_frontswap_store+0x60c/0xda0 zswap_frontswap_store at mm/zswap.c:1093 __frontswap_store+0x128/0x330 swap_writepage+0x58/0x110 pageout+0x16c/0xa40 shrink_page_list+0x1ac8/0x25c0 shrink_inactive_list+0x270/0x730 shrink_lruvec+0x444/0xf30 shrink_node+0x2a4/0x9c0 do_try_to_free_pages+0x158/0x640 try_to_free_pages+0x1bc/0x5f0 __alloc_pages_slowpath.constprop.60+0x4dc/0x15a0 __alloc_pages_nodemask+0x520/0x650 alloc_pages_vma+0xc0/0x420 handle_mm_fault+0x1174/0x1bf0 Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vitaly Wool <vitaly.wool@konsulko.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: http://lkml.kernel.org/r/20200522220052.2225-1-cai@lca.pw Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-23z3fold: fix use-after-free when freeing handlesUladzislau Rezki1-5/+6
free_handle() for a foreign handle may race with inter-page compaction, what can lead to memory corruption. To avoid that, take write lock not read lock in free_handle to be synchronized with __release_z3fold_page(). For example KASAN can detect it: ================================================================== BUG: KASAN: use-after-free in LZ4_decompress_safe+0x2c4/0x3b8 Read of size 1 at addr ffffffc976695ca3 by task GoogleApiHandle/4121 CPU: 0 PID: 4121 Comm: GoogleApiHandle Tainted: P S OE 4.19.81-perf+ #162 Hardware name: Sony Mobile Communications. PDX-203(KONA) (DT) Call trace: LZ4_decompress_safe+0x2c4/0x3b8 lz4_decompress_crypto+0x3c/0x70 crypto_decompress+0x58/0x70 zcomp_decompress+0xd4/0x120 ... Apart from that, initialize zhdr->mapped_count in init_z3fold_page() and remove "newpage" variable because it is not used anywhere. Signed-off-by: Uladzislau Rezki <uladzislau.rezki@sony.com> Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Qian Cai <cai@lca.pw> Cc: Raymond Jennings <shentino@gmail.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200520082100.28876-1-vitaly.wool@konsulko.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-06mm/z3fold.c: do not include rwlock.h directlySebastian Andrzej Siewior1-1/+0
rwlock.h should not be included directly. Instead linux/splinlock.h should be included. One thing it does is to break the RT build. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20200224133631.1510569-1-bigeasy@linutronix.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01mm/z3fold.c: add inter-page compactionVitaly Wool1-72/+303
For each page scheduled for compaction (e. g. by z3fold_free()), try to apply inter-page compaction before running the traditional/ existing intra-page compaction. That means, if the page has only one buddy, we treat that buddy as a new object that we aim to place into an existing z3fold page. If such a page is found, that object is transferred and the old page is freed completely. The transferred object is named "foreign" and treated slightly differently thereafter. Namely, we increase "foreign handle" counter for the new page. Pages with non-zero "foreign handle" count become unmovable. This patch implements "foreign handle" detection when a handle is freed to decrement the foreign handle counter accordingly, so a page may as well become movable again as the time goes by. As a result, we almost always have exactly 3 objects per page and significantly better average compression ratio. [cai@lca.pw: fix -Wunused-but-set-variable warnings] Link: http://lkml.kernel.org/r/1570542062-29144-1-git-send-email-cai@lca.pw [vitalywool@gmail.com: avoid subtle race when freeing slots] Link: http://lkml.kernel.org/r/20191127152118.6314b99074b0626d4c5a8835@gmail.com [vitalywool@gmail.com: compact objects more accurately] Link: http://lkml.kernel.org/r/20191127152216.6ad33745a21ba71c53606acb@gmail.com [vitalywool@gmail.com: protect handle reads] Link: http://lkml.kernel.org/r/20191127152345.8059852f60947686674d726d@gmail.com Link: http://lkml.kernel.org/r/20191006041457.24113-1-vitalywool@gmail.com Signed-off-by: Vitaly Wool <vitaly.vul@sony.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Henry Burns <henrywolfeburns@gmail.com> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-07mm/z3fold.c: claim page in the beginning of freeVitaly Wool1-2/+8
There's a really hard to reproduce race in z3fold between z3fold_free() and z3fold_reclaim_page(). z3fold_reclaim_page() can claim the page after z3fold_free() has checked if the page was claimed and z3fold_free() will then schedule this page for compaction which may in turn lead to random page faults (since that page would have been reclaimed by then). Fix that by claiming page in the beginning of z3fold_free() and not forgetting to clear the claim in the end. [vitalywool@gmail.com: v2] Link: http://lkml.kernel.org/r/20190928113456.152742cf@bigdell Link: http://lkml.kernel.org/r/20190926104844.4f0c6efa1366b8f5741eaba9@gmail.com Signed-off-by: Vitaly Wool <vitalywool@gmail.com> Reported-by: Markus Linnala <markus.linnala@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Henry Burns <henrywolfeburns@gmail.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Markus Linnala <markus.linnala@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-24z3fold: fix memory leak in kmem cacheVitaly Wool1-6/+9
Currently there is a leak in init_z3fold_page() -- it allocates handles from kmem cache even for headless pages, but then they are never used and never freed, so eventually kmem cache may get exhausted. This patch provides a fix for that. Link: http://lkml.kernel.org/r/20190917185352.44cf285d3ebd9e64548de5de@gmail.com Signed-off-by: Vitaly Wool <vitalywool@gmail.com> Reported-by: Markus Linnala <markus.linnala@gmail.com> Tested-by: Markus Linnala <markus.linnala@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Henry Burns <henrywolfeburns@gmail.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-24z3fold: fix retry mechanism in page reclaimVitaly Wool1-15/+34
z3fold_page_reclaim()'s retry mechanism is broken: on a second iteration it will have zhdr from the first one so that zhdr is no longer in line with struct page. That leads to crashes when the system is stressed. Fix that by moving zhdr assignment up. While at it, protect against using already freed handles by using own local slots structure in z3fold_page_reclaim(). Link: http://lkml.kernel.org/r/20190908162919.830388dc7404d1e2c80f4095@gmail.com Signed-off-by: Vitaly Wool <vitalywool@gmail.com> Reported-by: Markus Linnala <markus.linnala@gmail.com> Reported-by: Chris Murphy <bugzilla@colorremedies.com> Reported-by: Agustin Dall'Alba <agustin@dallalba.com.ar> Cc: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name> Cc: Shakeel Butt <shakeelb@google.com> Cc: Henry Burns <henrywolfeburns@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-24Revert "mm/z3fold.c: fix race between migration and destruction"Vitaly Wool1-90/+0
With the original commit applied, z3fold_zpool_destroy() may get blocked on wait_event() for indefinite time. Revert this commit for the time being to get rid of this problem since the issue the original commit addresses is less severe. Link: http://lkml.kernel.org/r/20190910123142.7a9c8d2de4d0acbc0977c602@gmail.com Fixes: d776aaa9895eb6eb77 ("mm/z3fold.c: fix race between migration and destruction") Reported-by: Agustín Dall'Alba <agustin@dallalba.com.ar> Signed-off-by: Vitaly Wool <vitalywool@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Jonathan Adams <jwadams@google.com> Cc: Henry Burns <henrywolfeburns@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-30mm/z3fold.c: fix lock/unlock imbalance in z3fold_page_isolateGustavo A. R. Silva1-0/+1
Fix lock/unlock imbalance by unlocking *zhdr* before return. Addresses Coverity ID 1452811 ("Missing unlock") Link: http://lkml.kernel.org/r/20190826030634.GA4379@embeddedor Fixes: d776aaa9895e ("mm/z3fold.c: fix race between migration and destruction") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Henry Burns <henrywolfeburns@gmail.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-24mm/z3fold.c: fix race between migration and destructionHenry Burns1-0/+89
In z3fold_destroy_pool() we call destroy_workqueue(&pool->compact_wq). However, we have no guarantee that migration isn't happening in the background at that time. Migration directly calls queue_work_on(pool->compact_wq), if destruction wins that race we are using a destroyed workqueue. Link: http://lkml.kernel.org/r/20190809213828.202833-1-henryburns@google.com Signed-off-by: Henry Burns <henryburns@google.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Jonathan Adams <jwadams@google.com> Cc: Henry Burns <henrywolfeburns@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13mm/z3fold.c: fix z3fold_destroy_pool() race conditionHenry Burns1-1/+4
The constraint from the zpool use of z3fold_destroy_pool() is there are no outstanding handles to memory (so no active allocations), but it is possible for there to be outstanding work on either of the two wqs in the pool. Calling z3fold_deregister_migration() before the workqueues are drained means that there can be allocated pages referencing a freed inode, causing any thread in compaction to be able to trip over the bad pointer in PageMovable(). Link: http://lkml.kernel.org/r/20190726224810.79660-2-henryburns@google.com Fixes: 1f862989b04a ("mm/z3fold.c: support page migration") Signed-off-by: Henry Burns <henryburns@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Jonathan Adams <jwadams@google.com> Cc: Vitaly Vul <vitaly.vul@sony.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Henry Burns <henrywolfeburns@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13mm/z3fold.c: fix z3fold_destroy_pool() orderingHenry Burns1-1/+8
The constraint from the zpool use of z3fold_destroy_pool() is there are no outstanding handles to memory (so no active allocations), but it is possible for there to be outstanding work on either of the two wqs in the pool. If there is work queued on pool->compact_workqueue when it is called, z3fold_destroy_pool() will do: z3fold_destroy_pool() destroy_workqueue(pool->release_wq) destroy_workqueue(pool->compact_wq) drain_workqueue(pool->compact_wq) do_compact_page(zhdr) kref_put(&zhdr->refcount) __release_z3fold_page(zhdr, ...) queue_work_on(pool->release_wq, &pool->work) *BOOM* So compact_wq needs to be destroyed before release_wq. Link: http://lkml.kernel.org/r/20190726224810.79660-1-henryburns@google.com Fixes: 5d03a6613957 ("mm/z3fold.c: use kref to prevent page free/compact race") Signed-off-by: Henry Burns <henryburns@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Jonathan Adams <jwadams@google.com> Cc: Vitaly Vul <vitaly.vul@sony.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk Cc: Henry Burns <henrywolfeburns@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-19Merge branch 'work.mount0' of ↵Linus Torvalds1-9/+5
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs mount updates from Al Viro: "The first part of mount updates. Convert filesystems to use the new mount API" * 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) mnt_init(): call shmem_init() unconditionally constify ksys_mount() string arguments don't bother with registering rootfs init_rootfs(): don't bother with init_ramfs_fs() vfs: Convert smackfs to use the new mount API vfs: Convert selinuxfs to use the new mount API vfs: Convert securityfs to use the new mount API vfs: Convert apparmorfs to use the new mount API vfs: Convert openpromfs to use the new mount API vfs: Convert xenfs to use the new mount API vfs: Convert gadgetfs to use the new mount API vfs: Convert oprofilefs to use the new mount API vfs: Convert ibmasmfs to use the new mount API vfs: Convert qib_fs/ipathfs to use the new mount API vfs: Convert efivarfs to use the new mount API vfs: Convert configfs to use the new mount API vfs: Convert binfmt_misc to use the new mount API convenience helper: get_tree_single() convenience helper get_tree_nodev() vfs: Kill sget_userns() ...
2019-07-16mm/z3fold.c: reinitialize zhdr structs after migrationHenry Burns1-0/+10
z3fold_page_migration() calls memcpy(new_zhdr, zhdr, PAGE_SIZE). However, zhdr contains fields that can't be directly coppied over (ex: list_head, a circular linked list). We only need to initialize the linked lists in new_zhdr, as z3fold_isolate_page() already ensures that these lists are empty Additionally it is possible that zhdr->work has been placed in a workqueue. In this case we shouldn't migrate the page, as zhdr->work references zhdr as opposed to new_zhdr. Link: http://lkml.kernel.org/r/20190716000520.230595-1-henryburns@google.com Fixes: 1f862989b04ade61d3 ("mm/z3fold.c: support page migration") Signed-off-by: Henry Burns <henryburns@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Vitaly Vul <vitaly.vul@sony.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Jonathan Adams <jwadams@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16mm/z3fold.c: remove z3fold_migration trylockHenry Burns1-6/+0
z3fold_page_migrate() will never succeed because it attempts to acquire a lock that has already been taken by migrate.c in __unmap_and_move(). __unmap_and_move() migrate.c trylock_page(oldpage) move_to_new_page(oldpage_newpage) a_ops->migrate_page(oldpage, newpage) z3fold_page_migrate(oldpage, newpage) trylock_page(oldpage) Link: http://lkml.kernel.org/r/20190710213238.91835-1-henryburns@google.com Fixes: 1f862989b04a ("mm/z3fold.c: support page migration") Signed-off-by: Henry Burns <henryburns@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Vitaly Vul <vitaly.vul@sony.com> Cc: Jonathan Adams <jwadams@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Snild Dolkow <snild@sony.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16mm/z3fold.c: allow __GFP_HIGHMEM in z3fold_allocHenry Burns1-3/+5
One of the gfp flags used to show that a page is movable is __GFP_HIGHMEM. Currently z3fold_alloc() fails when __GFP_HIGHMEM is passed. Now that z3fold pages are movable, we allow __GFP_HIGHMEM. We strip the movability related flags from the call to kmem_cache_alloc() for our slots since it is a kernel allocation. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20190712222118.108192-1-henryburns@google.com Signed-off-by: Henry Burns <henryburns@google.com> Acked-by: Vitaly Wool <vitalywool@gmail.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16mm/z3fold: don't try to use buddy slots after freeVitaly Wool1-1/+4
As reported by Henry Burns: Running z3fold stress testing with address sanitization showed zhdr->slots was being used after it was freed. z3fold_free(z3fold_pool, handle) free_handle(handle) kmem_cache_free(pool->c_handle, zhdr->slots) release_z3fold_page_locked_list(kref) __release_z3fold_page(zhdr, true) zhdr_to_pool(zhdr) slots_to_pool(zhdr->slots) *BOOM* To fix this, add pointer to the pool back to z3fold_header and modify zhdr_to_pool to return zhdr->pool. Link: http://lkml.kernel.org/r/20190708134808.e89f3bfadd9f6ffd7eff9ba9@gmail.com Fixes: 7c2b8baa61fe ("mm/z3fold.c: add structure for buddy handles") Signed-off-by: Vitaly Wool <vitalywool@gmail.com> Reported-by: Henry Burns <henryburns@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Jonathan Adams <jwadams@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-12mm/z3fold.c: lock z3fold page before __SetPageMovable()Henry Burns1-1/+11
Following zsmalloc.c's example we call trylock_page() and unlock_page(). Also make z3fold_page_migrate() assert that newpage is passed in locked, as per the documentation. [akpm@linux-foundation.org: fix trylock_page return value test, per Shakeel] Link: http://lkml.kernel.org/r/20190702005122.41036-1-henryburns@google.com Link: http://lkml.kernel.org/r/20190702233538.52793-1-henryburns@google.com Signed-off-by: Henry Burns <henryburns@google.com> Suggested-by: Vitaly Wool <vitalywool@gmail.com> Acked-by: Vitaly Wool <vitalywool@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Vitaly Vul <vitaly.vul@sony.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Xidong Wang <wangxidong_97@163.com> Cc: Jonathan Adams <jwadams@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-01z3fold: fix sheduling while atomicVitaly Wool1-5/+6
kmem_cache_alloc() may be called from z3fold_alloc() in atomic context, so we need to pass correct gfp flags to avoid "scheduling while atomic" bug. Link: http://lkml.kernel.org/r/20190523153245.119dfeed55927e8755250ddd@gmail.com Fixes: 7c2b8baa61fe5 ("mm/z3fold.c: add structure for buddy handles") Signed-off-by: Vitaly Wool <vitaly.vul@sony.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-25zsfold: Convert zsfold to use the new mount APIDavid Howells1-5/+5
Convert the zsfold filesystem to the new internal mount API as the old one will be obsoleted and removed. This allows greater flexibility in communication of mount parameters between userspace, the VFS and the filesystem. See Documentation/filesystems/mount_api.txt for more information. Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-25mount_pseudo(): drop 'name' argument, switch to d_make_root()Al Viro1-1/+1
Once upon a time we used to set ->d_name of e.g. pipefs root so that d_path() on pipes would work. These days it's completely pointless - dentries of pipes are not even connected to pipefs root. However, mount_pseudo() had set the root dentry name (passed as the second argument) and callers kept inventing names to pass to it. Including those that didn't *have* any non-root dentries to start with... All of that had been pointless for about 8 years now; it's time to get rid of that cargo-culting... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-21treewide: Add SPDX license identifier for more missed filesThomas Gleixner1-0/+1
Add SPDX license identifiers to all files which: - Have no license information of any form - Have MODULE_LICENCE("GPL*") inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21z3fold: don't bother with dentry_operationsDavid Howells1-5/+1
Don't bother with dentry_operations as no dentry is ever allocated. Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-14mm/z3fold.c: support page migrationVitaly Wool1-10/+231
Now that we are not using page address in handles directly, we can make z3fold pages movable to decrease the memory fragmentation z3fold may create over time. This patch starts advertising non-headless z3fold pages as movable and uses the existing kernel infrastructure to implement moving of such pages per memory management subsystem's request. It thus implements 3 required callbacks for page migration: * isolation callback: z3fold_page_isolate(): try to isolate the page by removing it from all lists. Pages scheduled for some activity and mapped pages will not be isolated. Return true if isolation was successful or false otherwise * migration callback: z3fold_page_migrate(): re-check critical conditions and migrate page contents to the new page provided by the memory subsystem. Returns 0 on success or negative error code otherwise * putback callback: z3fold_page_putback(): put back the page if z3fold_page_migrate() for it failed permanently (i. e. not with -EAGAIN code). [lkp@intel.com: z3fold_page_isolate() can be static] Link: http://lkml.kernel.org/r/20190419130924.GA161478@ivb42 Link: http://lkml.kernel.org/r/20190417103922.31253da5c366c4ebe0419cfc@gmail.com Signed-off-by: Vitaly Wool <vitaly.vul@sony.com> Signed-off-by: kbuild test robot <lkp@intel.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com> Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com> Cc: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14mm/z3fold.c: add structure for buddy handlesVitaly Wool1-40/+145
For z3fold to be able to move its pages per request of the memory subsystem, it should not use direct object addresses in handles. Instead, it will create abstract handles (3 per page) which will contain pointers to z3fold objects. Thus, it will be possible to change these pointers when z3fold page is moved. Link: http://lkml.kernel.org/r/20190417103826.484eaf18c1294d682769880f@gmail.com Signed-off-by: Vitaly Wool <vitaly.vul@sony.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com> Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com> Cc: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14mm/z3fold.c: improve compression by extending searchVitaly Wool1-0/+36
The current z3fold implementation only searches this CPU's page lists for a fitting page to put a new object into. This patch adds quick search for very well fitting pages (i. e. those having exactly the required number of free space) on other CPUs too, before allocating a new page for that object. Link: http://lkml.kernel.org/r/20190417103733.72ae81abe1552397c95a008e@gmail.com Signed-off-by: Vitaly Wool <vitaly.vul@sony.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com> Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com> Cc: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14mm/z3fold.c: introduce helper functionsVitaly Wool1-84/+100
Patch series "z3fold: support page migration", v2. This patchset implements page migration support and slightly better buddy search. To implement page migration support, z3fold has to move away from the current scheme of handle encoding. i. e. stop encoding page address in handles. Instead, a small per-page structure is created which will contain actual addresses for z3fold objects, while pointers to fields of that structure will be used as handles. Thus, it will be possible to change the underlying addresses to reflect page migration. To support migration itself, 3 callbacks will be implemented: 1: isolation callback: z3fold_page_isolate(): try to isolate the page by removing it from all lists. Pages scheduled for some activity and mapped pages will not be isolated. Return true if isolation was successful or false otherwise 2: migration callback: z3fold_page_migrate(): re-check critical conditions and migrate page contents to the new page provided by the system. Returns 0 on success or negative error code otherwise 3: putback callback: z3fold_page_putback(): put back the page if z3fold_page_migrate() for it failed permanently (i. e. not with -EAGAIN code). To make sure an isolated page doesn't get freed, its kref is incremented in z3fold_page_isolate() and decremented during post-migration compaction, if migration was successful, or by z3fold_page_putback() in the other case. Since the new handle encoding scheme implies slight memory consumption increase, better buddy search (which decreases memory consumption) is included in this patchset. This patch (of 4): Introduce a separate helper function for object allocation, as well as 2 smaller helpers to add a buddy to the list and to get a pointer to the pool from the z3fold header. No functional changes here. Link: http://lkml.kernel.org/r/20190417103633.a4bb770b5bf0fb7e43ce1666@gmail.com Signed-off-by: Vitaly Wool <vitaly.vul@sony.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com> Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com> Cc: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-18z3fold: fix possible reclaim racesVitaly Wool1-39/+62
Reclaim and free can race on an object which is basically fine but in order for reclaim to be able to map "freed" object we need to encode object length in the handle. handle_to_chunks() is then introduced to extract object length from a handle and use it during mapping. Moreover, to avoid racing on a z3fold "headless" page release, we should not try to free that page in z3fold_free() if the reclaim bit is set. Also, in the unlikely case of trying to reclaim a page being freed, we should not proceed with that page. While at it, fix the page accounting in reclaim function. This patch supersedes "[PATCH] z3fold: fix reclaim lock-ups". Link: http://lkml.kernel.org/r/20181105162225.74e8837d03583a9b707cf559@gmail.com Signed-off-by: Vitaly Wool <vitaly.vul@sony.com> Signed-off-by: Jongseok Kim <ks77sj@gmail.com> Reported-by-by: Jongseok Kim <ks77sj@gmail.com> Reviewed-by: Snild Dolkow <snild@sony.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-11z3fold: fix reclaim lock-upsVitaly Wool1-12/+30
Do not try to optimize in-page object layout while the page is under reclaim. This fixes lock-ups on reclaim and improves reclaim performance at the same time. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20180430125800.444cae9706489f412ad12621@gmail.com Signed-off-by: Vitaly Wool <vitaly.vul@sony.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Cc: <Oleksiy.Avramchenko@sony.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11mm/z3fold.c: use gfpflags_allow_blockingMatthew Wilcox1-1/+1
We have a perfectly good macro to determine whether the gfp flags allow you to sleep or not; use it instead of trying to infer it. Link: http://lkml.kernel.org/r/20180408062206.GC16007@bombadil.infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Vitaly Wool <vitalywool@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11z3fold: fix memory leakXidong Wang1-2/+7
In z3fold_create_pool(), the memory allocated by __alloc_percpu() is not released on the error path that pool->compact_wq , which holds the return value of create_singlethread_workqueue(), is NULL. This will result in a memory leak bug. [akpm@linux-foundation.org: fix oops on kzalloc() failure, check __alloc_percpu() retval] Link: http://lkml.kernel.org/r/1522803111-29209-1-git-send-email-wangxidong_97@163.com Signed-off-by: Xidong Wang <wangxidong_97@163.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05z3fold: limit use of stale list for allocationVitaly Wool1-16/+19
Currently if z3fold couldn't find an unbuddied page it would first try to pull a page off the stale list. The problem with this approach is that we can't 100% guarantee that the page is not processed by the workqueue thread at the same time unless we run cancel_work_sync() on it, which we can't do if we're in an atomic context. So let's just limit stale list usage to non-atomic contexts only. Link: http://lkml.kernel.org/r/47ab51e7-e9c1-d30e-ab17-f734dbc3abce@gmail.com Signed-off-by: Vitaly Vul <vitaly.vul@sony.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: <Oleksiy.Avramchenko@sony.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06mm: docs: fix parameter names mismatchMike Rapoport1-2/+2
There are several places where parameter descriptions do no match the actual code. Fix it. Link: http://lkml.kernel.org/r/1516700871-22279-3-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17mm/z3fold.c: use kref to prevent page free/compact raceVitaly Wool1-2/+8
There is a race in the current z3fold implementation between do_compact() called in a work queue context and the page release procedure when page's kref goes to 0. do_compact() may be waiting for page lock, which is released by release_z3fold_page_locked right before putting the page onto the "stale" list, and then the page may be freed as do_compact() modifies its contents. The mechanism currently implemented to handle that (checking the PAGE_STALE flag) is not reliable enough. Instead, we'll use page's kref counter to guarantee that the page is not released if its compaction is scheduled. It then becomes compaction function's responsibility to decrease the counter and quit immediately if the page was actually freed. Link: http://lkml.kernel.org/r/20171117092032.00ea56f42affbed19f4fcc6c@gmail.com Signed-off-by: Vitaly Wool <vitaly.wool@sonymobile.com> Cc: <Oleksiy.Avramchenko@sony.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-03z3fold: fix stale list handlingVitaly Wool1-4/+2
Fix the situation when clear_bit() is called for page->private before the page pointer is actually assigned. While at it, remove work_busy() check because it is costly and does not give 100% guarantee anyway. Signed-off-by: Vitaly Wool <vitalywool@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: <Oleksiy.Avramchenko@sony.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-03z3fold: fix potential race in z3fold_reclaim_pageVitaly Wool1-1/+3
It is possible that on a (partially) unsuccessful page reclaim, kref_put() called in z3fold_reclaim_page() does not yield page release, but the page is released shortly afterwards by another thread. Then z3fold_reclaim_page() would try to list_add() that (released) page again which is obviously a bug. To avoid that, spin_lock() has to be taken earlier, before the kref_put() call mentioned earlier. Link: http://lkml.kernel.org/r/20170913162937.bfff21c7d12b12a5f47639fd@gmail.com Signed-off-by: Vitaly Wool <vitalywool@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: <Oleksiy.Avramchenko@sony.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06z3fold: use per-cpu unbuddied listsVitaly Wool1-135/+344
It's been noted that z3fold doesn't scale well when it's run in a large number of threads on many cores, which can be easily reproduced with fio 'randrw' test with --numjobs=32. E.g. the result for 1 cluster (4 cores) is: Run status group 0 (all jobs): READ: io=244785MB, aggrb=496883KB/s, minb=15527KB/s, ... WRITE: io=246735MB, aggrb=500841KB/s, minb=15651KB/s, ... While for 8 cores (2 clusters) the result is: Run status group 0 (all jobs): READ: io=244785MB, aggrb=265942KB/s, minb=8310KB/s, ... WRITE: io=246735MB, aggrb=268060KB/s, minb=8376KB/s, ... The bottleneck here is the pool lock which many threads become waiting upon. To reduce that spin lock contention, z3fold can operate only on the lists local to the current CPU whenever possible. Due to the nature of z3fold unbuddied list handling (it only takes the first entry off the list on a hot path), if the z3fold pool is big enough and balanced well enough, limiting search to only local unbuddied list doesn't lead to a significant compression ratio degrade (2.57x vs 2.65x in our measurements). This patch also introduces two worker threads: one for async in-page object layout optimization and one for releasing freed pages. This is done to speed up z3fold_free() which is often on a hot path. The fio results for 8-core case are now the following: Run status group 0 (all jobs): READ: io=244785MB, aggrb=1568.3MB/s, minb=50182KB/s, ... WRITE: io=246735MB, aggrb=1580.8MB/s, minb=50582KB/s, ... So we're in for almost 6x performance increase. Link: http://lkml.kernel.org/r/20170806181443.f9b65018f8bde25ef990f9e8@gmail.com Signed-off-by: Vitaly Wool <vitalywool@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-13z3fold: fix page locking in z3fold_alloc()Vitaly Wool1-2/+7
Stress testing of the current z3fold implementation on a 8-core system revealed it was possible that a z3fold page deleted from its unbuddied list in z3fold_alloc() would be put on another unbuddied list by z3fold_free() while z3fold_alloc() is still processing it. This has been introduced with commit 5a27aa822 ("z3fold: add kref refcounting") due to the removal of special handling of a z3fold page not on any list in z3fold_free(). To fix this, the z3fold page lock should be taken in z3fold_alloc() before the pool lock is released. To avoid deadlocking, we just try to lock the page as soon as we get a hold of it, and if trylock fails, we drop this page and take the next one. Signed-off-by: Vitaly Wool <vitalywool@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: <Oleksiy.Avramchenko@sony.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-16z3fold: fix spinlock unlocking in page reclaimVitaly Wool1-0/+1
Commmit 5a27aa822029 ("z3fold: add kref refcounting") introduced a bug in z3fold_reclaim_page() with function exit that may leave pool->lock spinlock held. Here comes the trivial fix. Fixes: 5a27aa822029 ("z3fold: add kref refcounting") Link: http://lkml.kernel.org/r/20170311222239.7b83d8e7ef1914e05497649f@gmail.com Reported-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Vitaly Wool <vitalywool@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24z3fold: add kref refcountingVitaly Wool1-86/+69
With both coming and already present locking optimizations, introducing kref to reference-count z3fold objects is the right thing to do. Moreover, it makes buddied list no longer necessary, and allows for a simpler handling of headless pages. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20170131214650.8ea78033d91ded233f552bc0@gmail.com Signed-off-by: Vitaly Wool <vitalywool@gmail.com> Reviewed-by: Dan Streetman <ddstreet@ieee.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24z3fold: use per-page spinlockVitaly Wool1-42/+106
Most of z3fold operations are in-page, such as modifying z3fold page header or moving z3fold objects within a page. Taking per-pool spinlock to protect per-page objects is therefore suboptimal, and the idea of having a per-page spinlock (or rwlock) has been around for some time. This patch implements spinlock-based per-page locking mechanism which is lightweight enough to normally fit ok into the z3fold header. Link: http://lkml.kernel.org/r/20170131214438.433e0a5fda908337b63206d3@gmail.com Signed-off-by: Vitaly Wool <vitalywool@gmail.com> Reviewed-by: Dan Streetman <ddstreet@ieee.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24z3fold: extend compaction functionVitaly Wool1-1/+25
z3fold_compact_page() currently only handles the situation when there's a single middle chunk within the z3fold page. However it may be worth it to move middle chunk closer to either first or last chunk, whichever is there, if the gap between them is big enough. This patch adds the relevant code, using BIG_CHUNK_GAP define as a threshold for middle chunk to be worth moving. Link: http://lkml.kernel.org/r/20170131214334.c4f3eac9a477af0fa9a22c46@gmail.com Signed-off-by: Vitaly Wool <vitalywool@gmail.com> Reviewed-by: Dan Streetman <ddstreet@ieee.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24z3fold: fix header size related issuesVitaly Wool1-50/+64
Currently the whole kernel build will be stopped if the size of struct z3fold_header is greater than the size of one chunk, which is 64 bytes by default. This patch instead defines the offset for z3fold objects as the size of the z3fold header in chunks. Fixed also are the calculation of num_free_chunks() and the address to move the middle chunk to in case of in-page compaction in z3fold_compact_page(). Link: http://lkml.kernel.org/r/20170131214057.d98677032bc7b1c6c59a80c9@gmail.com Signed-off-by: Vitaly Wool <vitalywool@gmail.com> Reviewed-by: Dan Streetman <ddstreet@ieee.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24z3fold: make pages_nr atomicVitaly Wool1-11/+9
Convert pages_nr per-pool counter to atomic64_t. Link: http://lkml.kernel.org/r/20170131213946.b828676ab17bbea42022c213@gmail.com Signed-off-by: Vitaly Wool <vitalywool@gmail.com> Reviewed-by: Dan Streetman <ddstreet@ieee.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22mm/z3fold.c: limit first_num to the actual range of possible buddy indexeszhong jiang1-3/+7
At present, Tying the first_num size to NCHUNKS_ORDER is confusing. the number of chunks is completely unrelated to the number of buddies. The patch limits the first_num to actual range of possible buddy indexes. and that is more reasonable and obvious without functional change. Link: http://lkml.kernel.org/r/1476776569-29504-1-git-send-email-zhongjiang@huawei.com Signed-off-by: zhong jiang <zhongjiang@huawei.com> Suggested-by: Dan Streetman <ddstreet@ieee.org> Acked-by: Dan Streetman <ddstreet@ieee.org> Acked-by: Vitaly Wool <vitalywool@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-03mm/z3fold.c: avoid modifying HEADLESS page and minor cleanupVitaly Wool1-10/+14
Fix erroneous z3fold header access in a HEADLESS page in reclaim function, and change one remaining direct handle-to-buddy conversion to use the appropriate helper. Link: http://lkml.kernel.org/r/5748706F.9020208@gmail.com Signed-off-by: Vitaly Wool <vitalywool@gmail.com> Reviewed-by: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjenning@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20z3fold: the 3-fold allocator for compressed pagesVitaly Wool1-0/+792
This patch introduces z3fold, a special purpose allocator for storing compressed pages. It is designed to store up to three compressed pages per physical page. It is a ZBUD derivative which allows for higher compression ratio keeping the simplicity and determinism of its predecessor. This patch comes as a follow-up to the discussions at the Embedded Linux Conference in San-Diego related to the talk [1]. The outcome of these discussions was that it would be good to have a compressed page allocator as stable and deterministic as zbud with with higher compression ratio. To keep the determinism and simplicity, z3fold, just like zbud, always stores an integral number of compressed pages per page, but it can store up to 3 pages unlike zbud which can store at most 2. Therefore the compression ratio goes to around 2.6x while zbud's one is around 1.7x. The patch is based on the latest linux.git tree. This version has been updated after testing on various simulators (e.g. ARM Versatile Express, MIPS Malta, x86_64/Haswell) and basing on comments from Dan Streetman [3]. [1] https://openiotelc2016.sched.org/event/6DAC/swapping-and-embedded-compression-relieves-the-pressure-vitaly-wool-softprise-consulting-ou [2] https://lkml.org/lkml/2016/4/21/799 [3] https://lkml.org/lkml/2016/5/4/852 Link: http://lkml.kernel.org/r/20160509151753.ec3f9fda3c9898d31ff52a32@gmail.com Signed-off-by: Vitaly Wool <vitalywool@gmail.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>