Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull mm updates from Andrew Morton:
"The usual shower of singleton fixes and minor series all over MM,
documented (hopefully adequately) in the respective changelogs.
Notable series include:
- Lucas Stach has provided some page-mapping cleanup/consolidation/
maintainability work in the series "mm/treewide: Remove pXd_huge()
API".
- In the series "Allow migrate on protnone reference with
MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
MPOL_PREFERRED_MANY mode, yielding almost doubled performance in
one test.
- In their series "Memory allocation profiling" Kent Overstreet and
Suren Baghdasaryan have contributed a means of determining (via
/proc/allocinfo) whereabouts in the kernel memory is being
allocated: number of calls and amount of memory.
- Matthew Wilcox has provided the series "Various significant MM
patches" which does a number of rather unrelated things, but in
largely similar code sites.
- In his series "mm: page_alloc: freelist migratetype hygiene"
Johannes Weiner has fixed the page allocator's handling of
migratetype requests, with resulting improvements in compaction
efficiency.
- In the series "make the hugetlb migration strategy consistent"
Baolin Wang has fixed a hugetlb migration issue, which should
improve hugetlb allocation reliability.
- Liu Shixin has hit an I/O meltdown caused by readahead in a
memory-tight memcg. Addressed in the series "Fix I/O high when
memory almost met memcg limit".
- In the series "mm/filemap: optimize folio adding and splitting"
Kairui Song has optimized pagecache insertion, yielding ~10%
performance improvement in one test.
- Baoquan He has cleaned up and consolidated the early zone
initialization code in the series "mm/mm_init.c: refactor
free_area_init_core()".
- Baoquan has also redone some MM initializatio code in the series
"mm/init: minor clean up and improvement".
- MM helper cleanups from Christoph Hellwig in his series "remove
follow_pfn".
- More cleanups from Matthew Wilcox in the series "Various
page->flags cleanups".
- Vlastimil Babka has contributed maintainability improvements in the
series "memcg_kmem hooks refactoring".
- More folio conversions and cleanups in Matthew Wilcox's series:
"Convert huge_zero_page to huge_zero_folio"
"khugepaged folio conversions"
"Remove page_idle and page_young wrappers"
"Use folio APIs in procfs"
"Clean up __folio_put()"
"Some cleanups for memory-failure"
"Remove page_mapping()"
"More folio compat code removal"
- David Hildenbrand chipped in with "fs/proc/task_mmu: convert
hugetlb functions to work on folis".
- Code consolidation and cleanup work related to GUP's handling of
hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
- Rick Edgecombe has developed some fixes to stack guard gaps in the
series "Cover a guard gap corner case".
- Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the
series "mm/ksm: fix ksm exec support for prctl".
- Baolin Wang has implemented NUMA balancing for multi-size THPs.
This is a simple first-cut implementation for now. The series is
"support multi-size THP numa balancing".
- Cleanups to vma handling helper functions from Matthew Wilcox in
the series "Unify vma_address and vma_pgoff_address".
- Some selftests maintenance work from Dev Jain in the series
"selftests/mm: mremap_test: Optimizations and style fixes".
- Improvements to the swapping of multi-size THPs from Ryan Roberts
in the series "Swap-out mTHP without splitting".
- Kefeng Wang has significantly optimized the handling of arm64's
permission page faults in the series
"arch/mm/fault: accelerate pagefault when badaccess"
"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
- GUP cleanups from David Hildenbrand in "mm/gup: consistently call
it GUP-fast".
- hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault
path to use struct vm_fault".
- selftests build fixes from John Hubbard in the series "Fix
selftests/mm build without requiring "make headers"".
- Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
series "Improved Memory Tier Creation for CPUless NUMA Nodes".
Fixes the initialization code so that migration between different
memory types works as intended.
- David Hildenbrand has improved follow_pte() and fixed an errant
driver in the series "mm: follow_pte() improvements and acrn
follow_pte() fixes".
- David also did some cleanup work on large folio mapcounts in his
series "mm: mapcount for large folios + page_mapcount() cleanups".
- Folio conversions in KSM in Alex Shi's series "transfer page to
folio in KSM".
- Barry Song has added some sysfs stats for monitoring multi-size
THP's in the series "mm: add per-order mTHP alloc and swpout
counters".
- Some zswap cleanups from Yosry Ahmed in the series "zswap
same-filled and limit checking cleanups".
- Matthew Wilcox has been looking at buffer_head code and found the
documentation to be lacking. The series is "Improve buffer head
documentation".
- Multi-size THPs get more work, this time from Lance Yang. His
series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free"
optimizes the freeing of these things.
- Kemeng Shi has added more userspace-visible writeback
instrumentation in the series "Improve visibility of writeback".
- Kemeng Shi then sent some maintenance work on top in the series
"Fix and cleanups to page-writeback".
- Matthew Wilcox reduces mmap_lock traffic in the anon vma code in
the series "Improve anon_vma scalability for anon VMAs". Intel's
test bot reported an improbable 3x improvement in one test.
- SeongJae Park adds some DAMON feature work in the series
"mm/damon: add a DAMOS filter type for page granularity access recheck"
"selftests/damon: add DAMOS quota goal test"
- Also some maintenance work in the series
"mm/damon/paddr: simplify page level access re-check for pageout"
"mm/damon: misc fixes and improvements"
- David Hildenbrand has disabled some known-to-fail selftests ni the
series "selftests: mm: cow: flag vmsplice() hugetlb tests as
XFAIL".
- memcg metadata storage optimizations from Shakeel Butt in "memcg:
reduce memory consumption by memcg stats".
- DAX fixes and maintenance work from Vishal Verma in the series
"dax/bus.c: Fixups for dax-bus locking""
* tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits)
memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order
selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault
selftests: cgroup: add tests to verify the zswap writeback path
mm: memcg: make alloc_mem_cgroup_per_node_info() return bool
mm/damon/core: fix return value from damos_wmark_metric_value
mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED
selftests: cgroup: remove redundant enabling of memory controller
Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree
Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT
Docs/mm/damon/design: use a list for supported filters
Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
selftests/damon: classify tests for functionalities and regressions
selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'
selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts
selftests/damon/_damon_sysfs: check errors from nr_schemes file reads
mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()
selftests/damon: add a test for DAMOS quota goal
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"This contains the usual miscellaneous features, cleanups, and fixes
for vfs and individual fses.
Features:
- Free up FMODE_* bits. I've freed up bits 6, 7, 8, and 24. That
means we now have six free FMODE_* bits in total (but bit #6
already got used for FMODE_WRITE_RESTRICTED)
- Add FOP_HUGE_PAGES flag (follow-up to FMODE_* cleanup)
- Add fd_raw cleanup class so we can make use of automatic cleanup
provided by CLASS(fd_raw, f)(fd) for O_PATH fds as well
- Optimize seq_puts()
- Simplify __seq_puts()
- Add new anon_inode_getfile_fmode() api to allow specifying f_mode
instead of open-coding it in multiple places
- Annotate struct file_handle with __counted_by() and use
struct_size()
- Warn in get_file() whether f_count resurrection from zero is
attempted (epoll/drm discussion)
- Folio-sophize aio
- Export the subvolume id in statx() for both btrfs and bcachefs
- Relax linkat(AT_EMPTY_PATH) requirements
- Add F_DUPFD_QUERY fcntl() allowing to compare two file descriptors
for dup*() equality replacing kcmp()
Cleanups:
- Compile out swapfile inode checks when swap isn't enabled
- Use (1 << n) notation for FMODE_* bitshifts for clarity
- Remove redundant variable assignment in fs/direct-io
- Cleanup uses of strncpy in orangefs
- Speed up and cleanup writeback
- Move fsparam_string_empty() helper into header since it's currently
open-coded in multiple places
- Add kernel-doc comments to proc_create_net_data_write()
- Don't needlessly read dentry->d_flags twice
Fixes:
- Fix out-of-range warning in nilfs2
- Fix ecryptfs overflow due to wrong encryption packet size
calculation
- Fix overly long line in xfs file_operations (follow-up to FMODE_*
cleanup)
- Don't raise FOP_BUFFER_{R,W}ASYNC for directories in xfs (follow-up
to FMODE_* cleanup)
- Don't call xfs_file_open from xfs_dir_open (follow-up to FMODE_*
cleanup)
- Fix stable offset api to prevent endless loops
- Fix afs file server rotations
- Prevent xattr node from overflowing the eraseblock in jffs2
- Move fdinfo PTRACE_MODE_READ procfs check into the .permission()
operation instead of .open() operation since this caused userspace
regressions"
* tag 'vfs-6.10.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits)
afs: Fix fileserver rotation getting stuck
selftests: add F_DUPDFD_QUERY selftests
fcntl: add F_DUPFD_QUERY fcntl()
file: add fd_raw cleanup class
fs: WARN when f_count resurrection is attempted
seq_file: Simplify __seq_puts()
seq_file: Optimize seq_puts()
proc: Move fdinfo PTRACE_MODE_READ check into the inode .permission operation
fs: Create anon_inode_getfile_fmode()
xfs: don't call xfs_file_open from xfs_dir_open
xfs: drop fop_flags for directories
xfs: fix overly long line in the file_operations
shmem: Fix shmem_rename2()
libfs: Add simple_offset_rename() API
libfs: Fix simple_offset_rename_exchange()
jffs2: prevent xattr node from overflowing the eraseblock
vfs, swap: compile out IS_SWAPFILE() on swapless configs
vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
fs/direct-io: remove redundant assignment to variable retval
fs/dcache: Re-use value stored to dentry->d_flags instead of re-reading
...
|
|
Removes uses of page_mapping() and page_index().
Link: https://lkml.kernel.org/r/20240423225552.4113447-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
There's a bunch of flags that are purely based on what the file
operations support while also never being conditionally set or unset.
IOW, they're not subject to change for individual files. Imho, such
flags don't need to live in f_mode they might as well live in the fops
structs itself. And the fops struct already has that lonely
mmap_supported_flags member. We might as well turn that into a generic
fop_flags member and move a few flags from FMODE_* space into FOP_*
space. That gets us four FMODE_* bits back and the ability for new
static flags that are about file ops to not have to live in FMODE_*
space but in their own FOP_* space. It's not the most beautiful thing
ever but it gets the job done. Yes, there'll be an additional pointer
chase but hopefully that won't matter for these flags.
I suspect there's a few more we can move into there and that we can also
redirect a bunch of new flag suggestions that follow this pattern into
the fop_flags field instead of f_mode.
Link: https://lore.kernel.org/r/20240328-gewendet-spargel-aa60a030ef74@brauner
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Currently a device is only really released once the umount returns to
userspace due to how file closing works. That ultimately could cause
an old umount assumption to be violated that concurrent umount and mount
don't fail. So an exclusively held device with a temporary holder should
be yielded before the filesystem is gone. Add a helper that allows
callers to do that. This also allows us to remove the two holder ops
that Linus wasn't excited about.
Link: https://lore.kernel.org/r/20240326-vfs-bdev-end_holder-v1-1-20af85202918@kernel.org
Fixes: f3a608827d1f ("bdev: open block device as files") # mainline only
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs update from Jaegeuk Kim:
"In this round, there are a number of updates on mainly two areas:
Zoned block device support and Per-file compression. For example,
we've found several issues to support Zoned block device especially
having large sections regarding to GC and file pinning used for
Android devices. In compression side, we've fixed many corner race
conditions that had broken the design assumption.
Enhancements:
- Support file pinning for Zoned block device having large section
- Enhance the data recovery after sudden power cut on Zoned block
device
- Add more error injection cases to easily detect the kernel panics
- add a proc entry show the entire disk layout
- Improve various error paths paniced by BUG_ON in block allocation
and GC
- support SEEK_DATA and SEEK_HOLE for compression files
Bug fixes:
- avoid use-after-free issue in f2fs_filemap_fault
- fix some race conditions to break the atomic write design
assumption
- fix to truncate meta inode pages forcely
- resolve various per-file compression issues wrt the space
management and compression policies
- fix some swap-related bugs
In addition, we removed deprecated codes such as io_bits and
heap_allocation, and also fixed minor error handling routines with
neat debugging messages"
* tag 'f2fs-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (60 commits)
f2fs: fix to avoid use-after-free issue in f2fs_filemap_fault
f2fs: truncate page cache before clearing flags when aborting atomic write
f2fs: mark inode dirty for FI_ATOMIC_COMMITTED flag
f2fs: prevent atomic write on pinned file
f2fs: fix to handle error paths of {new,change}_curseg()
f2fs: unify the error handling of f2fs_is_valid_blkaddr
f2fs: zone: fix to remove pow2 check condition for zoned block device
f2fs: fix to truncate meta inode pages forcely
f2fs: compress: fix reserve_cblocks counting error when out of space
f2fs: compress: relocate some judgments in f2fs_reserve_compress_blocks
f2fs: add a proc entry show disk layout
f2fs: introduce SEGS_TO_BLKS/BLKS_TO_SEGS for cleanup
f2fs: fix to check return value of f2fs_gc_range
f2fs: fix to check return value __allocate_new_segment
f2fs: fix to do sanity check in update_sit_entry
f2fs: fix to reset fields for unloaded curseg
f2fs: clean up new_curseg()
f2fs: relocate f2fs_precache_extents() in f2fs_swap_activate()
f2fs: fix blkofs_end correctly in f2fs_migrate_blocks()
f2fs: ro: don't start discard thread for readonly image
...
|
|
syzbot reports a f2fs bug as below:
BUG: KASAN: slab-use-after-free in f2fs_filemap_fault+0xd1/0x2c0 fs/f2fs/file.c:49
Read of size 8 at addr ffff88807bb22680 by task syz-executor184/5058
CPU: 0 PID: 5058 Comm: syz-executor184 Not tainted 6.7.0-syzkaller-09928-g052d534373b7 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x163/0x540 mm/kasan/report.c:488
kasan_report+0x142/0x170 mm/kasan/report.c:601
f2fs_filemap_fault+0xd1/0x2c0 fs/f2fs/file.c:49
__do_fault+0x131/0x450 mm/memory.c:4376
do_shared_fault mm/memory.c:4798 [inline]
do_fault mm/memory.c:4872 [inline]
do_pte_missing mm/memory.c:3745 [inline]
handle_pte_fault mm/memory.c:5144 [inline]
__handle_mm_fault+0x23b7/0x72b0 mm/memory.c:5285
handle_mm_fault+0x27e/0x770 mm/memory.c:5450
do_user_addr_fault arch/x86/mm/fault.c:1364 [inline]
handle_page_fault arch/x86/mm/fault.c:1507 [inline]
exc_page_fault+0x456/0x870 arch/x86/mm/fault.c:1563
asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:570
The root cause is: in f2fs_filemap_fault(), vmf->vma may be not alive after
filemap_fault(), so it may cause use-after-free issue when accessing
vmf->vma->vm_flags in trace_f2fs_filemap_fault(). So it needs to keep vm_flags
in separated temporary variable for tracepoint use.
Fixes: 87f3afd366f7 ("f2fs: add tracepoint for f2fs_vm_page_mkwrite()")
Reported-and-tested-by: syzbot+763afad57075d3f862f2@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/000000000000e8222b060f00db3b@google.com
Cc: Ed Tsai <Ed.Tsai@mediatek.com>
Suggested-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In f2fs_do_write_data_page, FI_ATOMIC_FILE flag selects the target inode
between the original inode and COW inode. When aborting atomic write and
writeback occur simultaneously, invalid data can be written to original
inode if the FI_ATOMIC_FILE flag is cleared meanwhile.
To prevent the problem, let's truncate all pages before clearing the flag
Atomic write thread Writeback thread
f2fs_abort_atomic_write
clear_inode_flag(inode, FI_ATOMIC_FILE)
__writeback_single_inode
do_writepages
f2fs_do_write_data_page
- use dn of original inode
truncate_inode_pages_final
Fixes: 3db1de0e582c ("f2fs: change the current atomic write way")
Cc: stable@vger.kernel.org #v5.19+
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Yeongjin Gil <youngjin.gil@samsung.com>
Signed-off-by: Sunmin Jeong <s_min.jeong@samsung.com>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In f2fs_update_inode, i_size of the atomic file isn't updated until
FI_ATOMIC_COMMITTED flag is set. When committing atomic write right
after the writeback of the inode, i_size of the raw inode will not be
updated. It can cause the atomicity corruption due to a mismatch between
old file size and new data.
To prevent the problem, let's mark inode dirty for FI_ATOMIC_COMMITTED
Atomic write thread Writeback thread
__writeback_single_inode
write_inode
f2fs_update_inode
- skip i_size update
f2fs_ioc_commit_atomic_write
f2fs_commit_atomic_write
set_inode_flag(inode, FI_ATOMIC_COMMITTED)
f2fs_do_sync_file
f2fs_fsync_node_pages
- skip f2fs_update_inode since the inode is clean
Fixes: 3db1de0e582c ("f2fs: change the current atomic write way")
Cc: stable@vger.kernel.org #v5.19+
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Yeongjin Gil <youngjin.gil@samsung.com>
Signed-off-by: Sunmin Jeong <s_min.jeong@samsung.com>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, isofs, udf, and quota updates from Jan Kara:
"A lot of material this time:
- removal of a lot of GFP_NOFS usage from ext2, udf, quota (either it
was legacy or replaced with scoped memalloc_nofs_*() API)
- removal of BUG_ONs in quota code
- conversion of UDF to the new mount API
- tightening quota on disk format verification
- fix some potentially unsafe use of RCU pointers in quota code and
annotate everything properly to make sparse happy
- a few other small quota, ext2, udf, and isofs fixes"
* tag 'fs_for_v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (26 commits)
udf: remove SLAB_MEM_SPREAD flag usage
quota: remove SLAB_MEM_SPREAD flag usage
isofs: remove SLAB_MEM_SPREAD flag usage
ext2: remove SLAB_MEM_SPREAD flag usage
ext2: mark as deprecated
udf: convert to new mount API
udf: convert novrs to an option flag
MAINTAINERS: add missing git address for ext2 entry
quota: Detect loops in quota tree
quota: Properly annotate i_dquot arrays with __rcu
quota: Fix rcu annotations of inode dquot pointers
isofs: handle CDs with bad root inode but good Joliet root directory
udf: Avoid invalid LVID used on mount
quota: Fix potential NULL pointer dereference
quota: Drop GFP_NOFS instances under dquot->dq_lock and dqio_sem
quota: Set nofs allocation context when acquiring dqio_sem
ext2: Remove GFP_NOFS use in ext2_xattr_cache_insert()
ext2: Drop GFP_NOFS use in ext2_get_blocks()
ext2: Drop GFP_NOFS allocation from ext2_init_block_alloc_info()
udf: Remove GFP_NOFS allocation in udf_expand_file_adinicb()
...
|
|
Since atomic write way was changed to out-place-update, we should
prevent it on pinned files.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
{new,change}_curseg() may return error in some special cases,
error handling should be did in their callers, and this will also
facilitate subsequent error path expansion in {new,change}_curseg().
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
There are some cases of f2fs_is_valid_blkaddr not handled as
ERROR_INVALID_BLKADDR,so unify the error handling about all of
f2fs_is_valid_blkaddr.
Do f2fs_handle_error in __f2fs_is_valid_blkaddr for cleanup.
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Commit 2e2c6e9b72ce ("f2fs: remove power-of-two limitation of zoned
device") missed to remove pow2 check condition in init_blkz_info(),
fix it.
Fixes: 2e2c6e9b72ce ("f2fs: remove power-of-two limitation of zoned device")
Signed-off-by: Feng Song <songfeng@oppo.com>
Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Below race case can cause data corruption:
Thread A GC thread
- gc_data_segment
- ra_data_block
- locked meta_inode page
- f2fs_inplace_write_data
- invalidate_mapping_pages
: fail to invalidate meta_inode page
due to lock failure or dirty|writeback
status
- f2fs_submit_page_bio
: write last dirty data to old blkaddr
- move_data_block
- load old data from meta_inode page
- f2fs_submit_page_write
: write old data to new blkaddr
Because invalidate_mapping_pages() will skip invalidating page which
has unclear status including locked, dirty, writeback and so on, so
we need to use truncate_inode_pages_range() instead of
invalidate_mapping_pages() to make sure meta_inode page will be dropped.
Fixes: 6aa58d8ad20a ("f2fs: readahead encrypted block during GC")
Fixes: e3b49ea36802 ("f2fs: invalidate META_MAPPING before IPU/DIO write")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
When a file only needs one direct_node, performing the following
operations will cause the file to be unrepairable:
unisoc # ./f2fs_io compress test.apk
unisoc #df -h | grep dm-48
/dev/block/dm-48 112G 112G 1.2M 100% /data
unisoc # ./f2fs_io release_cblocks test.apk
924
unisoc # df -h | grep dm-48
/dev/block/dm-48 112G 112G 4.8M 100% /data
unisoc # dd if=/dev/random of=file4 bs=1M count=3
3145728 bytes (3.0 M) copied, 0.025 s, 120 M/s
unisoc # df -h | grep dm-48
/dev/block/dm-48 112G 112G 1.8M 100% /data
unisoc # ./f2fs_io reserve_cblocks test.apk
F2FS_IOC_RESERVE_COMPRESS_BLOCKS failed: No space left on device
adb reboot
unisoc # df -h | grep dm-48
/dev/block/dm-48 112G 112G 11M 100% /data
unisoc # ./f2fs_io reserve_cblocks test.apk
0
This is because the file has only one direct_node. After returning
to -ENOSPC, reserved_blocks += ret will not be executed. As a result,
the reserved_blocks at this time is still 0, which is not the real
number of reserved blocks. Therefore, fsck cannot be set to repair
the file.
After this patch, the fsck flag will be set to fix this problem.
unisoc # df -h | grep dm-48
/dev/block/dm-48 112G 112G 1.8M 100% /data
unisoc # ./f2fs_io reserve_cblocks test.apk
F2FS_IOC_RESERVE_COMPRESS_BLOCKS failed: No space left on device
adb reboot then fsck will be executed
unisoc # df -h | grep dm-48
/dev/block/dm-48 112G 112G 11M 100% /data
unisoc # ./f2fs_io reserve_cblocks test.apk
924
Fixes: c75488fb4d82 ("f2fs: introduce F2FS_IOC_RESERVE_COMPRESS_BLOCKS")
Signed-off-by: Xiuhong Wang <xiuhong.wang@unisoc.com>
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The following f2fs_io test will get a "0" result instead of -EINVAL,
unisoc # ./f2fs_io compress file
unisoc # ./f2fs_io reserve_cblocks file
0
it's not reasonable, so the judgement of
atomic_read(&F2FS_I(inode)->i_compr_blocks) should be placed after
the judgement of is_inode_flag_set(inode, FI_COMPRESS_RELEASED).
Fixes: c75488fb4d82 ("f2fs: introduce F2FS_IOC_RESERVE_COMPRESS_BLOCKS")
Signed-off-by: Xiuhong Wang <xiuhong.wang@unisoc.com>
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Pull block updates from Jens Axboe:
- MD pull requests via Song:
- Cleanup redundant checks (Yu Kuai)
- Remove deprecated headers (Marc Zyngier, Song Liu)
- Concurrency fixes (Li Lingfeng)
- Memory leak fix (Li Nan)
- Refactor raid1 read_balance (Yu Kuai, Paul Luse)
- Clean up and fix for md_ioctl (Li Nan)
- Other small fixes (Gui-Dong Han, Heming Zhao)
- MD atomic limits (Christoph)
- NVMe pull request via Keith:
- RDMA target enhancements (Max)
- Fabrics fixes (Max, Guixin, Hannes)
- Atomic queue_limits usage (Christoph)
- Const use for class_register (Ricardo)
- Identification error handling fixes (Shin'ichiro, Keith)
- Improvement and cleanup for cached request handling (Christoph)
- Moving towards atomic queue limits. Core changes and driver bits so
far (Christoph)
- Fix UAF issues in aoeblk (Chun-Yi)
- Zoned fix and cleanups (Damien)
- s390 dasd cleanups and fixes (Jan, Miroslav)
- Block issue timestamp caching (me)
- noio scope guarding for zoned IO (Johannes)
- block/nvme PI improvements (Kanchan)
- Ability to terminate long running discard loop (Keith)
- bdev revalidation fix (Li)
- Get rid of old nr_queues hack for kdump kernels (Ming)
- Support for async deletion of ublk (Ming)
- Improve IRQ bio recycling (Pavel)
- Factor in CPU capacity for remote vs local completion (Qais)
- Add shared_tags configfs entry for null_blk (Shin'ichiro
- Fix for a regression in page refcounts introduced by the folio
unification (Tony)
- Misc fixes and cleanups (Arnd, Colin, John, Kunwu, Li, Navid,
Ricardo, Roman, Tang, Uwe)
* tag 'for-6.9/block-20240310' of git://git.kernel.dk/linux: (221 commits)
block: partitions: only define function mac_fix_string for CONFIG_PPC_PMAC
block/swim: Convert to platform remove callback returning void
cdrom: gdrom: Convert to platform remove callback returning void
block: remove disk_stack_limits
md: remove mddev->queue
md: don't initialize queue limits
md/raid10: use the atomic queue limit update APIs
md/raid5: use the atomic queue limit update APIs
md/raid1: use the atomic queue limit update APIs
md/raid0: use the atomic queue limit update APIs
md: add queue limit helpers
md: add a mddev_is_dm helper
md: add a mddev_add_trace_msg helper
md: add a mddev_trace_remap helper
bcache: move calculation of stripe_size and io_opt into bcache_device_init
virtio_blk: Do not use disk_set_max_open/active_zones()
aoe: fix the potential use-after-free problem in aoecmd_cfg_pkts
block: move capacity validation to blkpg_do_ioctl()
block: prevent division by zero in blk_rq_stat_sum()
drbd: atomically update queue limits in drbd_reconsider_queue_parameters
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs uuid updates from Christian Brauner:
"This adds two new ioctl()s for getting the filesystem uuid and
retrieving the sysfs path based on the path of a mounted filesystem.
Getting the filesystem uuid has been implemented in filesystem
specific code for a while it's now lifted as a generic ioctl"
* tag 'vfs-6.9.uuid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
xfs: add support for FS_IOC_GETFSSYSFSPATH
fs: add FS_IOC_GETFSSYSFSPATH
fat: Hook up sb->s_uuid
fs: FS_IOC_GETUUID
ovl: convert to super_set_uuid()
fs: super_set_uuid()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull block handle updates from Christian Brauner:
"Last cycle we changed opening of block devices, and opening a block
device would return a bdev_handle. This allowed us to implement
support for restricting and forbidding writes to mounted block
devices. It was accompanied by converting and adding helpers to
operate on bdev_handles instead of plain block devices.
That was already a good step forward but ultimately it isn't necessary
to have special purpose helpers for opening block devices internally
that return a bdev_handle.
Fundamentally, opening a block device internally should just be
equivalent to opening files. So now all internal opens of block
devices return files just as a userspace open would. Instead of
introducing a separate indirection into bdev_open_by_*() via struct
bdev_handle bdev_file_open_by_*() is made to just return a struct
file. Opening and closing a block device just becomes equivalent to
opening and closing a file.
This all works well because internally we already have a pseudo fs for
block devices and so opening block devices is simple. There's a few
places where we needed to be careful such as during boot when the
kernel is supposed to mount the rootfs directly without init doing it.
Here we need to take care to ensure that we flush out any asynchronous
file close. That's what we already do for opening, unpacking, and
closing the initramfs. So nothing new here.
The equivalence of opening and closing block devices to regular files
is a win in and of itself. But it also has various other advantages.
We can remove struct bdev_handle completely. Various low-level helpers
are now private to the block layer. Other helpers were simply
removable completely.
A follow-up series that is already reviewed build on this and makes it
possible to remove bdev->bd_inode and allows various clean ups of the
buffer head code as well. All places where we stashed a bdev_handle
now just stash a file and use simple accessors to get to the actual
block device which was already the case for bdev_handle"
* tag 'vfs-6.9.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits)
block: remove bdev_handle completely
block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access
bdev: remove bdev pointer from struct bdev_handle
bdev: make struct bdev_handle private to the block layer
bdev: make bdev_{release, open_by_dev}() private to block layer
bdev: remove bdev_open_by_path()
reiserfs: port block device access to file
ocfs2: port block device access to file
nfs: port block device access to files
jfs: port block device access to file
f2fs: port block device access to files
ext4: port block device access to file
erofs: port device access to file
btrfs: port device access to file
bcachefs: port block device access to file
target: port block device access to file
s390: port block device access to file
nvme: port block device access to file
block2mtd: port device access to files
bcache: port block device access to files
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull iomap updates from Christian Brauner:
- Restore read-write hints in struct bio through the bi_write_hint
member for the sake of UFS devices in mobile applications. This can
result in up to 40% lower write amplification in UFS devices. The
patch series that builds on this will be coming in via the SCSI
maintainers (Bart)
- Overhaul the iomap writeback code. Afterwards ->map_blocks() is able
to map multiple blocks at once as long as they're in the same folio.
This reduces CPU usage for buffered write workloads on e.g., xfs on
systems with lots of cores (Christoph)
- Record processed bytes in iomap_iter() trace event (Kassey)
- Extend iomap_writepage_map() trace event after Christoph's
->map_block() changes to map mutliple blocks at once (Zhang)
* tag 'vfs-6.9.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits)
iomap: Add processed for iomap_iter
iomap: add pos and dirty_len into trace_iomap_writepage_map
block, fs: Restore the per-bio/request data lifetime fields
fs: Propagate write hints to the struct block_device inode
fs: Move enum rw_hint into a new header file
fs: Split fcntl_rw_hint()
fs: Verify write lifetime constants at compile time
fs: Fix rw_hint validation
iomap: pass the length of the dirty region to ->map_blocks
iomap: map multiple blocks at a time
iomap: submit ioends immediately
iomap: factor out a iomap_writepage_map_block helper
iomap: only call mapping_set_error once for each failed bio
iomap: don't chain bios
iomap: move the iomap_sector sector calculation out of iomap_add_to_ioend
iomap: clean up the iomap_alloc_ioend calling convention
iomap: move all remaining per-folio logic into iomap_writepage_map
iomap: factor out a iomap_writepage_handle_eof helper
iomap: move the PF_MEMALLOC check to iomap_writepages
iomap: move the io_folios field out of struct iomap_ioend
...
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/krisman/unicode into vfs.misc
Merge case-insensitive updates from Gabriel Krisman Bertazi:
- Patch case-insensitive lookup by trying the case-exact comparison
first, before falling back to costly utf8 casefolded comparison.
- Fix to forbid using a case-insensitive directory as part of an
overlayfs mount.
- Patchset to ensure d_op are set at d_alloc time for fscrypt and
casefold volumes, ensuring filesystem dentries will all have the
correct ops, whether they come from a lookup or not.
* tag 'for-next-6.9' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
libfs: Drop generic_set_encrypted_ci_d_ops
ubifs: Configure dentry operations at dentry-creation time
f2fs: Configure dentry operations at dentry-creation time
ext4: Configure dentry operations at dentry-creation time
libfs: Add helper to choose dentry operations at mount-time
libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops
fscrypt: Drop d_revalidate once the key is added
fscrypt: Drop d_revalidate for valid dentries during lookup
fscrypt: Factor out a helper to configure the lookup dentry
ovl: Always reject mounting over case-insensitive directories
libfs: Attempt exact-match comparison first during casefolded lookup
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This patch adds the disk map of block address ranges configured by multiple
partitions.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Just cleanup, no functional change.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
f2fs_gc_range may return error, so its caller
f2fs_allocate_pinning_section should determine whether
to do retry based on ist return value.
Also just do f2fs_gc_range when f2fs_allocate_new_section
return -EAGAIN, and check cp error case in f2fs_gc_range.
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
__allocate_new_segment may return error when get_new_segment
fails, so its caller should check its return value.
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If GET_SEGNO return NULL_SEGNO for some unecpected case,
update_sit_entry will access invalid memory address,
cause system crash. It is better to do sanity check about
GET_SEGNO just like update_segment_mtime & locate_dirty_segment.
Also remove some redundant judgment code.
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In f2fs_allocate_data_block(), before skip allocating new segment
for DATA_PINNED log header, it needs to tag log header as unloaded
one to avoid skipping logic in locate_dirty_segment() and
__f2fs_save_inmem_curseg().
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Move f2fs_valid_pinned_area() check logic from new_curseg() to
get_new_segment(), it can avoid calling __set_free() if it fails
to find free segment in conventional zone for pinned file.
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch exchangs position of f2fs_precache_extents() and
filemap_fdatawrite(), so that f2fs_precache_extents() can load
extent info after physical addresses of all data are fixed.
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In f2fs_migrate_blocks(), when traversing blocks in last section,
blkofs_end should be (start_blk + blkcnt - 1) % blk_per_sec, fix it.
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
[ 9299.893835] F2FS-fs (vdd): Allow to mount readonly mode only
mount: /mnt/f2fs: WARNING: source write-protected, mounted read-only.
root@qemu:/ ps -ef|grep f2fs
root 94 2 0 03:46 ? 00:00:00 [kworker/u17:0-f2fs_post_read_wq]
root 6282 2 0 06:21 ? 00:00:00 [f2fs_discard-253:48]
There will be no deletion in readonly image, let's skip starting
discard thread to save system resources.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Mapping info from dump.f2fs:
i_addr[0x2d] cluster flag [0xfffffffe : 4294967294]
i_addr[0x2e] [0x 10428 : 66600]
i_addr[0x2f] [0x 10429 : 66601]
i_addr[0x30] [0x 1042a : 66602]
f2fs_io fiemap 37 1 /mnt/f2fs/disk-58390c8c.raw
Previsouly, it missed to align fofs and ofs_in_node to cluster_size,
result in adding incorrect read extent cache, fix it.
Before:
f2fs_update_read_extent_tree_range: dev = (253,48), ino = 5, pgofs = 37, len = 4, blkaddr = 66600, c_len = 3
After:
f2fs_update_read_extent_tree_range: dev = (253,48), ino = 5, pgofs = 36, len = 4, blkaddr = 66600, c_len = 3
Fixes: 94afd6d6e525 ("f2fs: extent cache: support unaligned extent")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In f2fs_insert_range(), it missed to check return value of
filemap_write_and_wait_range(), fix it.
Meanwhile, just return error number once __exchange_data_block()
fails.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
@type in f2fs_allocate_data_block() indicates log header's type, it
can be CURSEG_COLD_DATA_PINNED or CURSEG_ALL_DATA_ATGC, rather than
type of data/node, however IS_DATASEG()/IS_NODESEG() only accept later
one, fix it.
Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Don't block mounting the partition, if cap is 100%.
Reviewed-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
No functional change, but add some more logs.
Note, it includes the spelling mistakes pointed by Colin Ian King.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Even if the roll forward recovery stopped due to any error, we have to fix
the write pointers in order to mount the disk from the previous checkpoint.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In cfd66bb715fd ("f2fs: fix deadloop in foreground GC"), we needed to check
the number of blocks in a section instead of the segment.
Fixes: cfd66bb715fd ("f2fs: fix deadloop in foreground GC")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Don't get stuck in the f2fs_gc loop while disabling checkpoint. Instead, we have
a time-based management.
Reviewed-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Use it to simulate no free segment case during block allocation.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
It needs to check compress flag w/ .i_sem lock, otherwise, compressed
inode may be disabled after the check condition, it's not needed to
set compress option on non-compress inode.
Fixes: e1e8debec656 ("f2fs: add F2FS_IOC_SET_COMPRESS_OPTION ioctl")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If CONFIG_F2FS_CHECK_FS is off, and for very rare corner case that
we run out of free segment, we should not panic kernel, instead,
let's handle such error correctly in its caller.
Signed-off-by: Chao Yu <chao@kernel.org>
Tested-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
There is low probability that an out-of-bounds segment will be got
on a small-capacity device. In order to prevent subsequent write requests
allocating block address from this invalid segment, which may cause
unexpected issue, stop checkpoint should be performed.
Also introduce a new stop cp reason: STOP_CP_REASON_NO_SEGMENT.
Note, f2fs_stop_checkpoint(, false) is complex and it may sleep, so we should
move it outside segmap_lock spinlock coverage in get_new_segment().
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Commit 093749e296e2 ("f2fs: support age threshold based garbage
collection") added this declaration, but w/ definition, delete
it.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This was already the case for case-insensitive before commit
bb9cd9106b22 ("fscrypt: Have filesystems handle their d_ops"), but it
was changed to set at lookup-time to facilitate the integration with
fscrypt. But it's a problem because dentries that don't get created
through ->lookup() won't have any visibility of the operations.
Since fscrypt now also supports configuring dentry operations at
creation-time, do it for any encrypted and/or casefold volume,
simplifying the implementation across these features.
Reviewed-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20240221171412.10710-9-krisman@suse.de
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
|
|
There are very similar codes in inc_valid_block_count() and
inc_valid_node_count() which is used for available user block
count calculation.
This patch introduces a new helper get_available_block_count()
to include those common codes, and used it to clean up codes.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Support file pinning with conventional storage area for zoned devices
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
No one uses this feature. Let's kill it.
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Fix to support SEEK_DATA and SEEK_HOLE for compression files
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
f2fs only support to config zstd compress level w/ a positive number due
to layout design, but since commit e0c1b49f5b67 ("lib: zstd: Upgrade to
latest upstream zstd version 1.4.10"), zstd supports negative compress
level, so that zstd_min_clevel() may return a negative number, then w/
below mount option, .compress_level can be configed w/ a negative number,
which is not allowed to f2fs, let's add check condition to avoid it.
mount -o compress_algorithm=zstd:4294967295 /dev/sdx /mnt/f2fs
Fixes: 00e120b5e4b5 ("f2fs: assign default compression level")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Use folio in f2fs_read_merkle_tree_page to reduce folio & page converisons
from find_get_page_flags and read_mapping_page functions. But the return
value should be the exact page.
Signed-off-by: HuangXiaojia <huangxiaojia2@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
generic/700 - output mismatch (see /media/fstests/results//generic/700.out.bad)
--- tests/generic/700.out 2023-03-28 10:40:42.735529223 +0000
+++ /media/fstests/results//generic/700.out.bad 2024-02-06 04:37:56.000000000 +0000
@@ -1,2 +1,4 @@
QA output created by 700
+/mnt/scratch_f2fs/f1: security.selinux: No such attribute
+/mnt/scratch_f2fs/f2: security.selinux: No such attribute
Silence is golden
...
(Run 'diff -u /media/fstests/tests/generic/700.out /media/fstests/results//generic/700.out.bad' to see the entire diff)
HINT: You _MAY_ be missing kernel fix:
70b589a37e1a xfs: add selinux labels to whiteout inodes
Previously, it missed to create selinux labels during whiteout inode
initialization, fix this issue.
Fixes: 7e01e7ad746b ("f2fs: support RENAME_WHITEOUT")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Make f2fs_gc_range() an extenal function to use it for GC for a range.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
No functional change.
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-22-adbd023e19cc@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add two new helpers to allow opening block devices as files.
This is not the final infrastructure. This still opens the block device
before opening a struct a file. Until we have removed all references to
struct bdev_handle we can't switch the order:
* Introduce blk_to_file_flags() to translate from block specific to
flags usable to pen a new file.
* Introduce bdev_file_open_by_{dev,path}().
* Introduce temporary sb_bdev_handle() helper to retrieve a struct
bdev_handle from a block device file and update places that directly
reference struct bdev_handle to rely on it.
* Don't count block device openes against the number of open files. A
bdev_file_open_by_{dev,path}() file is never installed into any
file descriptor table.
One idea that came to mind was to use kernel_tmpfile_open() which
would require us to pass a path and it would then call do_dentry_open()
going through the regular fops->open::blkdev_open() path. But then we're
back to the problem of routing block specific flags such as
BLK_OPEN_RESTRICT_WRITES through the open path and would have to waste
FMODE_* flags every time we add a new one. With this we can avoid using
a flag bit and we have more leeway in how we open block devices from
bdev_open_by_{dev,path}().
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-1-adbd023e19cc@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Let's deprecate an unused io_bits feature to save CPU cycles and memory.
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Now that all callers pass in GFP_KERNEL to blkdev_zone_mgmt() and use
memalloc_no{io,fs}_{save,restore}() to define the allocation scope, we can
drop the gfp_mask parameter from blkdev_zone_mgmt() as well as
blkdev_zone_reset_all() and blkdev_zone_reset_all_emulated().
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Link: https://lore.kernel.org/r/20240128-zonefs_nofs-v3-5-ae3b7c8def61@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Guard the calls to blkdev_zone_mgmt() with a memalloc_nofs scope.
This helps us getting rid of the GFP_NOFS argument to blkdev_zone_mgmt();
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20240128-zonefs_nofs-v3-4-ae3b7c8def61@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Some weird old filesytems have UUID-like things that we wish to expose
as UUIDs, but are smaller; add a length field so that the new
FS_IOC_(GET|SET)UUID ioctls can handle them in generic code.
And add a helper super_set_uuid(), for setting nonstandard length uuids.
Helper is now required for the new FS_IOC_GETUUID ioctl; if
super_set_uuid() hasn't been called, the ioctl won't be supported.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Link: https://lore.kernel.org/r/20240207025624.1019754-2-kent.overstreet@linux.dev
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Dquots pointed to from i_dquot arrays in inodes are protected by
dquot_srcu. Annotate them as such and change .get_dquots callback to
return properly annotated pointer to make sparse happy.
Fixes: b9ba6f94b238 ("quota: remove dqptr_sem")
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
Move enum rw_hint into a new header file to prepare for using this data
type in the block layer. Add the attribute __packed to reduce the space
occupied by instances of this data type from four bytes to one byte.
Change the data type of i_write_hint from u8 into enum rw_hint.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Chao Yu <chao@kernel.org> # for the F2FS part
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240202203926.2478590-5-bvanassche@acm.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
During recovery, if FAULT_BLOCK is on, it is possible that
f2fs_reserve_new_block() will return -ENOSPC during recovery,
then it may trigger panic.
Also, if fault injection rate is 1 and only FAULT_BLOCK fault
type is on, it may encounter deadloop in loop of block reservation.
Let's change as below to fix these issues:
- remove bug_on() to avoid panic.
- limit the loop count of block reservation to avoid potential
deadloop.
Fixes: 956fa1ddc132 ("f2fs: fix to check return value of f2fs_reserve_new_block()")
Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Now IS_DNODE is used in f2fs_flush_inline_data and it has some problems:
1. Just only inodes may include inline data,not all direct nodes
2. When system IO is busy, it is inefficient to lock a direct node page
but not an inode page. Besides, if this direct node page is being
locked by others for IO, f2fs_flush_inline_data will be blocked here,
which will affects the checkpoint process, this is unreasonable.
So IS_INODE should be used in f2fs_flush_inline_data.
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Just remove some redundant codes, no logic change.
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
- f2fs_disable_compressed_file
- check inode_has_data
- f2fs_file_mmap
- mkwrite
- f2fs_get_block_locked
: update metadata in compressed
inode's disk layout
- fi->i_flags &= ~F2FS_COMPR_FL
- clear_inode_flag(inode, FI_COMPRESSED_FILE);
we should use i_sem lock to prevent above race case.
Fixes: 4c8ff7095bef ("f2fs: support data compression")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Use f2fs_err_ratelimited() to instead f2fs_err() in
f2fs_record_stop_reason() and f2fs_record_errors() to
avoid redundant logs.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch supports using printk_ratelimited() in f2fs_printk(), and
wrap ratelimited f2fs_printk() into f2fs_{err,warn,info}_ratelimited(),
then, use these new helps to clean up codes.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
BUG: kernel NULL pointer dereference, address: 0000000000000014
RIP: 0010:f2fs_submit_page_write+0x6cf/0x780 [f2fs]
Call Trace:
<TASK>
? show_regs+0x6e/0x80
? __die+0x29/0x70
? page_fault_oops+0x154/0x4a0
? prb_read_valid+0x20/0x30
? __irq_work_queue_local+0x39/0xd0
? irq_work_queue+0x36/0x70
? do_user_addr_fault+0x314/0x6c0
? exc_page_fault+0x7d/0x190
? asm_exc_page_fault+0x2b/0x30
? f2fs_submit_page_write+0x6cf/0x780 [f2fs]
? f2fs_submit_page_write+0x736/0x780 [f2fs]
do_write_page+0x50/0x170 [f2fs]
f2fs_outplace_write_data+0x61/0xb0 [f2fs]
f2fs_do_write_data_page+0x3f8/0x660 [f2fs]
f2fs_write_single_data_page+0x5bb/0x7a0 [f2fs]
f2fs_write_cache_pages+0x3da/0xbe0 [f2fs]
...
It is possible that other threads have added this fio to io->bio
and submitted the io->bio before entering f2fs_submit_page_write().
At this point io->bio = NULL.
If is_end_zone_blkaddr(sbi, fio->new_blkaddr) of this fio is true,
then an NULL pointer dereference error occurs at bio_get(io->bio).
The original code for determining zone end was after "out:",
which would have missed some fio who is zone end. I've moved
this code before "skip:" to make sure it's done for each fio.
Fixes: e067dc3c6b9c ("f2fs: maintain six open zones for zoned devices")
Signed-off-by: Wenjie Qi <qwjhust@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
It needs to check last zone_pending_bio and wait IO completion before
traverse next fio in io->io_list, otherwise, bio in next zone may be
submitted before all IO completion in current zone.
Fixes: e067dc3c6b9c ("f2fs: maintain six open zones for zoned devices")
Cc: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
We will encounter below inconsistent status when FAULT_BLKADDR type
fault injection is on.
Info: checkpoint state = d6 : nat_bits crc fsck compacted_summary orphan_inodes sudden-power-off
[ASSERT] (fsck_chk_inode_blk:1254) --> ino: 0x1c100 has i_blocks: 000000c0, but has 191 blocks
[FIX] (fsck_chk_inode_blk:1260) --> [0x1c100] i_blocks=0x000000c0 -> 0xbf
[FIX] (fsck_chk_inode_blk:1269) --> [0x1c100] i_compr_blocks=0x00000026 -> 0x27
[ASSERT] (fsck_chk_inode_blk:1254) --> ino: 0x1cadb has i_blocks: 0000002f, but has 46 blocks
[FIX] (fsck_chk_inode_blk:1260) --> [0x1cadb] i_blocks=0x0000002f -> 0x2e
[FIX] (fsck_chk_inode_blk:1269) --> [0x1cadb] i_compr_blocks=0x00000011 -> 0x12
[ASSERT] (fsck_chk_inode_blk:1254) --> ino: 0x1c62c has i_blocks: 00000002, but has 1 blocks
[FIX] (fsck_chk_inode_blk:1260) --> [0x1c62c] i_blocks=0x00000002 -> 0x1
After we inject fault into f2fs_is_valid_blkaddr() during truncation,
a) it missed to increase @nr_free or @valid_blocks
b) it can cause in blkaddr leak in truncated dnode
Which may cause inconsistent status.
This patch separates FAULT_BLKADDR_CONSISTENCE from FAULT_BLKADDR,
and rename FAULT_BLKADDR to FAULT_BLKADDR_VALIDITY
so that we can:
a) use FAULT_BLKADDR_CONSISTENCE in f2fs_truncate_data_blocks_range()
to simulate inconsistent issue independently, then it can verify fsck
repair flow.
b) FAULT_BLKADDR_VALIDITY fault will not cause any inconsistent status,
we can just use it to check error path handling in kernel side.
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
verify_blkaddr() will trigger panic once we inject fault into
f2fs_is_valid_blkaddr(), fix to remove this unnecessary f2fs_bug_on().
Fixes: 18792e64c86d ("f2fs: support fault injection for f2fs_is_valid_blkaddr()")
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In reserve_compress_blocks(), we update blkaddrs of dnode in prior to
inc_valid_block_count(), it may cause inconsistent status bewteen
i_blocks and blkaddrs once inc_valid_block_count() fails.
To fix this issue, it needs to reverse their invoking order.
Fixes: c75488fb4d82 ("f2fs: introduce F2FS_IOC_RESERVE_COMPRESS_BLOCKS")
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Compressed cluster may not be released due to we can fail in
release_compress_blocks(), fix to handle reserved compressed
cluster correctly in reserve_compress_blocks().
Fixes: 4c8ff7095bef ("f2fs: support data compression")
Signed-off-by: Sheng Yong <shengyong@oppo.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
When we overwrite compressed cluster w/ normal cluster, we should
not unlock cp_rwsem during f2fs_write_raw_pages(), otherwise data
will be corrupted if partial blocks were persisted before CP & SPOR,
due to cluster metadata wasn't updated atomically.
Fixes: 4c8ff7095bef ("f2fs: support data compression")
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If data block in compressed cluster is not persisted with metadata
during checkpoint, after SPOR, the data may be corrupted, let's
guarantee to write compressed page by checkpoint.
Fixes: 4c8ff7095bef ("f2fs: support data compression")
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
'f2fs_is_checkpoint_ready()' checks free sections. If there is not
enough free sections, most f2fs operations will return -ENOSPC when
checkpoint is disabled.
It would be better to check free sections before disable checkpoint.
Signed-off-by: Wu Bo <bo.wu@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
[1] changed the below condition, which made f2fs_put_page() voided.
This patch reapplies the AL's resolution in -next from [2].
- if (S_ISDIR(old_inode->i_mode)) {
+ if (old_is_dir && old_dir != new_dir) {
old_dir_entry = f2fs_parent_dir(old_inode, &old_dir_page);
if (!old_dir_entry) {
if (IS_ERR(old_dir_page))
[1] 7deee77b993a ("f2fs: Avoid reading renamed directory if parent does not change")
[2] https://lore.kernel.org/all/20231220013402.GW1674809@ZenIV/
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/cel/linux
Merge exportfs fixes from Chuck Lever:
* tag 'exportfs-6.9' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/cel/linux:
fs: Create a generic is_dot_dotdot() utility
exportfs: fix the fallback implementation of the get_name export operation
Link: https://lore.kernel.org/r/BDC2AEB4-7085-4A7C-8DE8-A659FE1DBA6A@oracle.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
De-duplicate the same functionality in several places by hoisting
the is_dot_dotdot() utility function into linux/fs.h.
Suggested-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
kill_f2fs_super() is called even if f2fs_fill_super() fails.
f2fs_fill_super() frees the struct f2fs_sb_info, so it must set
sb->s_fs_info to NULL to prevent it from being freed again.
Fixes: 275dca4630c1 ("f2fs: move release of block devices to after kill_block_super()")
Reported-by: <syzbot+8f477ac014ff5b32d81f@syzkaller.appspotmail.com>
Closes: https://lore.kernel.org/lkml/0000000000006cb174060ec34502@google.com
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/linux-f2fs-devel/20240113005747.38887-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs update from Jaegeuk Kim:
"In this series, we've some progress to support Zoned block device
regarding to the power-cut recovery flow and enabling
checkpoint=disable feature which is essential for Android OTA.
Other than that, some patches touched sysfs entries and tracepoints
which are minor, while several bug fixes on error handlers and
compression flows are good to improve the overall stability.
Enhancements:
- enable checkpoint=disable for zoned block device
- sysfs entries such as discard status, discard_io_aware, dir_level
- tracepoints such as f2fs_vm_page_mkwrite(), f2fs_rename(),
f2fs_new_inode()
- use shared inode lock during f2fs_fiemap() and f2fs_seek_block()
Bug fixes:
- address some power-cut recovery issues on zoned block device
- handle errors and logics on do_garbage_collect(),
f2fs_reserve_new_block(), f2fs_move_file_range(),
f2fs_recover_xattr_data()
- don't set FI_PREALLOCATED_ALL for partial write
- fix to update iostat correctly in f2fs_filemap_fault()
- fix to wait on block writeback for post_read case
- fix to tag gcing flag on page during block migration
- restrict max filesize for 16K f2fs
- fix to avoid dirent corruption
- explicitly null-terminate the xattr list
There are also several clean-up patches to remove dead codes and
better readability"
* tag 'f2fs-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (33 commits)
f2fs: show more discard status by sysfs
f2fs: Add error handling for negative returns from do_garbage_collect
f2fs: Constrain the modification range of dir_level in the sysfs
f2fs: Use wait_event_freezable_timeout() for freezable kthread
f2fs: fix to check return value of f2fs_recover_xattr_data
f2fs: don't set FI_PREALLOCATED_ALL for partial write
f2fs: fix to update iostat correctly in f2fs_filemap_fault()
f2fs: fix to check compress file in f2fs_move_file_range()
f2fs: fix to wait on block writeback for post_read case
f2fs: fix to tag gcing flag on page during block migration
f2fs: add tracepoint for f2fs_vm_page_mkwrite()
f2fs: introduce f2fs_invalidate_internal_cache() for cleanup
f2fs: update blkaddr in __set_data_blkaddr() for cleanup
f2fs: introduce get_dnode_addr() to clean up codes
f2fs: delete obsolete FI_DROP_CACHE
f2fs: delete obsolete FI_FIRST_BLOCK_WRITTEN
f2fs: Restrict max filesize for 16K f2fs
f2fs: let's finish or reset zones all the time
f2fs: check write pointers when checkpoint=disable
f2fs: fix write pointers on zoned device after roll forward
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull rename updates from Al Viro:
"Fix directory locking scheme on rename
This was broken in 6.5; we really can't lock two unrelated directories
without holding ->s_vfs_rename_mutex first and in case of same-parent
rename of a subdirectory 6.5 ends up doing just that"
* tag 'pull-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
rename(): avoid a deadlock in the case of parents having no common ancestor
kill lock_two_inodes()
rename(): fix the locking of subdirectories
f2fs: Avoid reading renamed directory if parent does not change
ext4: don't access the source subdirectory content on same-directory rename
ext2: Avoid reading renamed directory if parent does not change
udf_rename(): only access the child content on cross-directory rename
ocfs2: Avoid touching renamed directory if parent does not change
reiserfs: Avoid touching renamed directory if parent does not change
|
|
Pull block updates from Jens Axboe:
"Pretty quiet round this time around. This contains:
- NVMe updates via Keith:
- nvme fabrics spec updates (Guixin, Max)
- nvme target udpates (Guixin, Evan)
- nvme attribute refactoring (Daniel)
- nvme-fc numa fix (Keith)
- MD updates via Song:
- Fix/Cleanup RCU usage from conf->disks[i].rdev (Yu Kuai)
- Fix raid5 hang issue (Junxiao Bi)
- Add Yu Kuai as Reviewer of the md subsystem
- Remove deprecated flavors (Song Liu)
- raid1 read error check support (Li Nan)
- Better handle events off-by-1 case (Alex Lyakas)
- Efficiency improvements for passthrough (Kundan)
- Support for mapping integrity data directly (Keith)
- Zoned write fix (Damien)
- rnbd fixes (Kees, Santosh, Supriti)
- Default to a sane discard size granularity (Christoph)
- Make the default max transfer size naming less confusing
(Christoph)
- Remove support for deprecated host aware zoned model (Christoph)
- Misc fixes (me, Li, Matthew, Min, Ming, Randy, liyouhong, Daniel,
Bart, Christoph)"
* tag 'for-6.8/block-2024-01-08' of git://git.kernel.dk/linux: (78 commits)
block: Treat sequential write preferred zone type as invalid
block: remove disk_clear_zoned
sd: remove the !ZBC && blk_queue_is_zoned case in sd_read_block_characteristics
drivers/block/xen-blkback/common.h: Fix spelling typo in comment
blk-cgroup: fix rcu lockdep warning in blkg_lookup()
blk-cgroup: don't use removal safe list iterators
block: floor the discard granularity to the physical block size
mtd_blkdevs: use the default discard granularity
bcache: use the default discard granularity
zram: use the default discard granularity
null_blk: use the default discard granularity
nbd: use the default discard granularity
ubd: use the default discard granularity
block: default the discard granularity to sector size
bcache: discard_granularity should not be smaller than a sector
block: remove two comments in bio_split_discard
block: rename and document BLK_DEF_MAX_SECTORS
loop: don't abuse BLK_DEF_MAX_SECTORS
aoe: don't abuse BLK_DEF_MAX_SECTORS
null_blk: don't cap max_hw_sectors to BLK_DEF_MAX_SECTORS
...
|
|
Pull fscrypt updates from Eric Biggers:
"Adjust the timing of the fscrypt keyring destruction, to prepare for
btrfs's fscrypt support.
Also document that CephFS supports fscrypt now"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
fs: move fscrypt keyring destruction to after ->put_super
f2fs: move release of block devices to after kill_block_super()
fscrypt: document that CephFS supports fscrypt now
fscrypt: update comment for do_remove_key()
fscrypt.rst: update definition of struct fscrypt_context_v2
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Many singleton patches against the MM code. The patch series which are
included in this merge do the following:
- Peng Zhang has done some mapletree maintainance work in the series
'maple_tree: add mt_free_one() and mt_attr() helpers'
'Some cleanups of maple tree'
- In the series 'mm: use memmap_on_memory semantics for dax/kmem'
Vishal Verma has altered the interworking between memory-hotplug
and dax/kmem so that newly added 'device memory' can more easily
have its memmap placed within that newly added memory.
- Matthew Wilcox continues folio-related work (including a few fixes)
in the patch series
'Add folio_zero_tail() and folio_fill_tail()'
'Make folio_start_writeback return void'
'Fix fault handler's handling of poisoned tail pages'
'Convert aops->error_remove_page to ->error_remove_folio'
'Finish two folio conversions'
'More swap folio conversions'
- Kefeng Wang has also contributed folio-related work in the series
'mm: cleanup and use more folio in page fault'
- Jim Cromie has improved the kmemleak reporting output in the series
'tweak kmemleak report format'.
- In the series 'stackdepot: allow evicting stack traces' Andrey
Konovalov to permits clients (in this case KASAN) to cause eviction
of no longer needed stack traces.
- Charan Teja Kalla has fixed some accounting issues in the page
allocator's atomic reserve calculations in the series 'mm:
page_alloc: fixes for high atomic reserve caluculations'.
- Dmitry Rokosov has added to the samples/ dorectory some sample code
for a userspace memcg event listener application. See the series
'samples: introduce cgroup events listeners'.
- Some mapletree maintanance work from Liam Howlett in the series
'maple_tree: iterator state changes'.
- Nhat Pham has improved zswap's approach to writeback in the series
'workload-specific and memory pressure-driven zswap writeback'.
- DAMON/DAMOS feature and maintenance work from SeongJae Park in the
series
'mm/damon: let users feed and tame/auto-tune DAMOS'
'selftests/damon: add Python-written DAMON functionality tests'
'mm/damon: misc updates for 6.8'
- Yosry Ahmed has improved memcg's stats flushing in the series 'mm:
memcg: subtree stats flushing and thresholds'.
- In the series 'Multi-size THP for anonymous memory' Ryan Roberts
has added a runtime opt-in feature to transparent hugepages which
improves performance by allocating larger chunks of memory during
anonymous page faults.
- Matthew Wilcox has also contributed some cleanup and maintenance
work against eh buffer_head code int he series 'More buffer_head
cleanups'.
- Suren Baghdasaryan has done work on Andrea Arcangeli's series
'userfaultfd move option'. UFFDIO_MOVE permits userspace heap
compaction algorithms to move userspace's pages around rather than
UFFDIO_COPY'a alloc/copy/free.
- Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm:
Add ksm advisor'. This is a governor which tunes KSM's scanning
aggressiveness in response to userspace's current needs.
- Chengming Zhou has optimized zswap's temporary working memory use
in the series 'mm/zswap: dstmem reuse optimizations and cleanups'.
- Matthew Wilcox has performed some maintenance work on the writeback
code, both code and within filesystems. The series is 'Clean up the
writeback paths'.
- Andrey Konovalov has optimized KASAN's handling of alloc and free
stack traces for secondary-level allocators, in the series 'kasan:
save mempool stack traces'.
- Andrey also performed some KASAN maintenance work in the series
'kasan: assorted clean-ups'.
- David Hildenbrand has gone to town on the rmap code. Cleanups, more
pte batching, folio conversions and more. See the series 'mm/rmap:
interface overhaul'.
- Kinsey Ho has contributed some maintenance work on the MGLRU code
in the series 'mm/mglru: Kconfig cleanup'.
- Matthew Wilcox has contributed lruvec page accounting code cleanups
in the series 'Remove some lruvec page accounting functions'"
* tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits)
mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
mm, treewide: introduce NR_PAGE_ORDERS
selftests/mm: add separate UFFDIO_MOVE test for PMD splitting
selftests/mm: skip test if application doesn't has root privileges
selftests/mm: conform test to TAP format output
selftests: mm: hugepage-mmap: conform to TAP format output
selftests/mm: gup_test: conform test to TAP format output
mm/selftests: hugepage-mremap: conform test to TAP format output
mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING
mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large
mm/memcontrol: remove __mod_lruvec_page_state()
mm/khugepaged: use a folio more in collapse_file()
slub: use a folio in __kmalloc_large_node
slub: use folio APIs in free_large_kmalloc()
slub: use alloc_pages_node() in alloc_slab_page()
mm: remove inc/dec lruvec page state functions
mm: ratelimit stat flush from workingset shrinker
kasan: stop leaking stack trace handles
mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE
mm/mglru: add dummy pmd_dirty()
...
|
|
Call destroy_device_list() and free the f2fs_sb_info from
kill_f2fs_super(), after the call to kill_block_super(). This is
necessary to order it after the call to fscrypt_destroy_keyring() once
generic_shutdown_super() starts calling fscrypt_destroy_keyring() just
after calling ->put_super. This is because fscrypt_destroy_keyring()
may call into f2fs_get_devices() via the fscrypt_operations.
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20231227171429.9223-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
The current pending_discard attr just only shows the discard_cmd_cnt
information. More discard status can be shown so that we can check
them through sysfs when needed.
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The function do_garbage_collect can return a value less than 0 due
to f2fs_cp_error being true or page allocation failure, as a result
of calling f2fs_get_sum_page. However, f2fs_gc does not account for
such cases, which could potentially lead to an abnormal total_freed
and thus cause subsequent code to behave unexpectedly. Given that
an f2fs_cp_error is irrecoverable, and considering that
do_garbage_collect already retries page allocation errors through
its call to f2fs_get_sum_page->f2fs_get_meta_page_retry, any error
reported by do_garbage_collect should immediately terminate the
current GC.
Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The {struct f2fs_sb_info}->dir_level can be modified through the sysfs
interface, but its value range is not limited. If the value exceeds
MAX_DIR_HASH_DEPTH and the mount options include "noinline_dentry",
the following error will occur:
[root@fedora ~]# mount -o noinline_dentry /dev/sdb /mnt/sdb/
[root@fedora ~]# echo 128 > /sys/fs/f2fs/sdb/dir_level
[root@fedora ~]# cd /mnt/sdb/
[root@fedora sdb]# mkdir test
[root@fedora sdb]# cd test/
[root@fedora test]# mkdir test
mkdir: cannot create directory 'test': Argument list too long
Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
A freezable kernel thread can enter frozen state during freezing by
either calling try_to_freeze() or using wait_event_freezable() and its
variants. So for the following snippet of code in a kernel thread loop:
wait_event_interruptible_timeout();
try_to_freeze();
We can change it to a simple wait_event_freezable_timeout() and then
eliminate the function calls to try_to_freeze() and freezing().
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
When zones were first added the SCSI and ATA specs, two different
models were supported (in addition to the drive managed one that
is invisible to the host):
- host managed where non-conventional zones there is strict requirement
to write at the write pointer, or else an error is returned
- host aware where a write point is maintained if writes always happen
at it, otherwise it is left in an under-defined state and the
sequential write preferred zones behave like conventional zones
(probably very badly performing ones, though)
Not surprisingly this lukewarm model didn't prove to be very useful and
was finally removed from the ZBC and SBC specs (NVMe never implemented
it). Due to to the easily disappearing write pointer host software
could never rely on the write pointer to actually be useful for say
recovery.
Fortunately only a few HDD prototypes shipped using this model which
never made it to mass production. Drop the support before it is too
late. Note that any such host aware prototype HDD can still be used
with Linux as we'll now treat it as a conventional HDD.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20231217165359.604246-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Should check return value of f2fs_recover_xattr_data in
__f2fs_setxattr rather than doing invalid retry if error happen.
Also just do set_page_dirty in f2fs_recover_xattr_data when
page is changed really.
Fixes: 50a472bbc79f ("f2fs: do not return EFSCORRUPTED, but try to run online repair")
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In f2fs_preallocate_blocks(), if it is partial write in 4KB, it's not
necessary to call f2fs_map_blocks() and set FI_PREALLOCATED_ALL flag.
Cc: Eric Biggers <ebiggers@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In f2fs_filemap_fault(), it fixes to update iostat info only if
VM_FAULT_LOCKED is tagged in return value of filemap_fault().
Fixes: 8b83ac81f428 ("f2fs: support read iostat")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
f2fs_move_file_range() doesn't support migrating compressed cluster
data, let's add the missing check condition and return -EOPNOTSUPP
for the case until we support it.
Fixes: 4c8ff7095bef ("f2fs: support data compression")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If inode is compressed, but not encrypted, it missed to call
f2fs_wait_on_block_writeback() to wait for GCed page writeback
in IPU write path.
Thread A GC-Thread
- f2fs_gc
- do_garbage_collect
- gc_data_segment
- move_data_block
- f2fs_submit_page_write
migrate normal cluster's block via
meta_inode's page cache
- f2fs_write_single_data_page
- f2fs_do_write_data_page
- f2fs_inplace_write_data
- f2fs_submit_page_bio
IRQ
- f2fs_read_end_io
IRQ
old data overrides new data due to
out-of-order GC and common IO.
- f2fs_read_end_io
Fixes: 4c8ff7095bef ("f2fs: support data compression")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
It needs to add missing gcing flag on page during block migration,
in order to garantee migrated data be persisted during checkpoint,
otherwise out-of-order persistency between data and node may cause
data corruption after SPOR.
Similar issue was fixed by commit 2d1fe8a86bf5 ("f2fs: fix to tag
gcing flag on page during file defragment").
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch adds to support tracepoint for f2fs_vm_page_mkwrite(),
meanwhile it prints more details for trace_f2fs_filemap_fault().
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Just cleanup, no logic changes.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch allows caller to pass blkaddr to f2fs_set_data_blkaddr()
and let __set_data_blkaddr() inside f2fs_set_data_blkaddr() to update
dn->data_blkaddr w/ last value of blkaddr.
Just cleanup, no logic changes.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Just cleanup, no logic changes.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
FI_DROP_CACHE was introduced in commit 1e84371ffeef ("f2fs: change
atomic and volatile write policies") for volatile write feature,
after commit 7bc155fec5b3 ("f2fs: kill volatile write support"),
we won't support volatile write, let's delete related codes.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Commit 3c6c2bebef79 ("f2fs: avoid punch_hole overhead when releasing
volatile data") introduced FI_FIRST_BLOCK_WRITTEN as below reason:
This patch is to avoid some punch_hole overhead when releasing volatile
data. If volatile data was not written yet, we just can make the first
page as zero.
After commit 7bc155fec5b3 ("f2fs: kill volatile write support"), we
won't support volatile write, but it missed to remove obsolete
FI_FIRST_BLOCK_WRITTEN, delete it in this patch.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
There were already assertions that we were not passing a tail page to
error_remove_page(), so make the compiler enforce that by converting
everything to pass and use a folio.
Link: https://lkml.kernel.org/r/20231117161447.2461643-7-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Blocks are tracked by u32, so the max permitted filesize is
(U32_MAX + 1) * BLOCK_SIZE. Additionally, in order to support crypto
data unit sizes of 4K with a 16K block with IV_INO_LBLK_{32,64}, we must
further restrict max filesize to (U32_MAX + 1) * 4096. This does not
affect 4K blocksize f2fs as the natural limit for files are well below
that.
Fixes: d7e9a9037de2 ("f2fs: Support Block Size == Page Size")
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In order to limit # of open zones, let's finish or reset zones given # of
valid blocks per section and its zone condition.
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Even if f2fs was rebooted as staying checkpoint=disable, let's match the write
pointers all the time.
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
1. do roll forward recovery
2. update current segments pointers
3. fix the entire zones' write pointers
4. do checkpoint
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If fsck can allocate a new zone, it'd be better to use that instead of
allocating a new one.
And, it modifies kernel messages.
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Let's allow checkpoint=disable back for zoned block device. It's very risky
as the feature relies on fsck or runtime recovery which matches the write
pointers again if the device rebooted while disabling the checkpoint.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
It gives a way to enable/disable IO aware feature for background
discard, so that we can tune background discard more precisely
based on undiscard condition. e.g. force to disable IO aware if
there are large number of discard extents, and discard IO may
always be interrupted by frequent common IO.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch adds tracepoints for f2fs_rename().
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
As Al reported in link[1]:
f2fs_rename()
...
if (old_dir != new_dir && !whiteout)
f2fs_set_link(old_inode, old_dir_entry,
old_dir_page, new_dir);
else
f2fs_put_page(old_dir_page, 0);
You want correct inumber in the ".." link. And cross-directory
rename does move the source to new parent, even if you'd been asked
to leave a whiteout in the old place.
[1] https://lore.kernel.org/all/20231017055040.GN800259@ZenIV/
With below testcase, it may cause dirent corruption, due to it missed
to call f2fs_set_link() to update ".." link to new directory.
- mkdir -p dir/foo
- renameat2 -w dir/foo bar
[ASSERT] (__chk_dots_dentries:1421) --> Bad inode number[0x4] for '..', parent parent ino is [0x3]
[FSCK] other corrupted bugs [Fail]
Fixes: 7e01e7ad746b ("f2fs: support RENAME_WHITEOUT")
Cc: Jan Kara <jack@suse.cz>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The VFS will not be locking moved directory if its parent does not
change. Change f2fs rename code to avoid reading renamed directory if
its parent does not change. Having it uninlined while we are reading
it would cause trouble and we won't be able to rely upon ->i_rwsem
on the directory being renamed in cases that do not alter its parent.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
When recovering zoned UFS, sometimes we add the same zone to discard multiple
times. Simple workaround is to bypass adding it.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
We have bdev_mark_dead() etc and we're going to move block device
freezing to holder ops in the next patch. Make the naming consistent:
* freeze_bdev() -> bdev_freeze()
* thaw_bdev() -> bdev_thaw()
Also document the return code.
Link: https://lore.kernel.org/r/20231024-vfs-super-freeze-v2-2-599c19f4faac@kernel.org
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Let's check return value of f2fs_reserve_new_block() in do_recover_data()
rather than letting it fails silently.
Also refactoring check condition on return value of f2fs_reserve_new_block()
as below:
- trigger f2fs_bug_on() only for ENOSPC case;
- use do-while statement to avoid redundant codes;
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
f2fs_fiemap() will only traverse metadata of inode, let's use shared
inode lock for it to avoid unnecessary race on inode lock.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Just cleanup, no logic changes.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
When setting an xattr, explicitly null-terminate the xattr list. This
eliminates the fragile assumption that the unused xattr space is always
zeroed.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
inode_lock_shared() -> down_read(&inode->i_rwsem)
inode_lock() -> down_write(&inode->i_rwsem)
Inode is not updated in f2fs_seek_block(), so there is no need
to hold write lock, use read lock for more efficiency.
Signed-off-by: zhangxirui <xirui.zhang@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fanotify fsid updates from Christian Brauner:
"This work is part of the plan to enable fanotify to serve as a drop-in
replacement for inotify. While inotify is availabe on all filesystems,
fanotify currently isn't.
In order to support fanotify on all filesystems two things are needed:
(1) all filesystems need to support AT_HANDLE_FID
(2) all filesystems need to report a non-zero f_fsid
This contains (1) and allows filesystems to encode non-decodable file
handlers for fanotify without implementing any exportfs operations by
encoding a file id of type FILEID_INO64_GEN from i_ino and
i_generation.
Filesystems that want to opt out of encoding non-decodable file ids
for fanotify that don't support NFS export can do so by providing an
empty export_operations struct.
This also partially addresses (2) by generating f_fsid for simple
filesystems as well as freevxfs. Remaining filesystems will be dealt
with by separate patches.
Finally, this contains the patch from the current exportfs maintainers
which moves exportfs under vfs with Chuck, Jeff, and Amir as
maintainers and vfs.git as tree"
* tag 'vfs-6.7.fsid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
MAINTAINERS: create an entry for exportfs
fs: fix build error with CONFIG_EXPORTFS=m or not defined
freevxfs: derive f_fsid from bdev->bd_dev
fs: report f_fsid from s_dev for "simple" filesystems
exportfs: support encoding non-decodeable file handles by default
exportfs: define FILEID_INO64_GEN* file handle types
exportfs: make ->encode_fh() a mandatory method for NFS export
exportfs: add helpers to check if filesystem can encode/decode file handles
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this cycle, we introduce a bigger page size support by changing the
internal f2fs's block size aligned to the page size. We also continue
to improve zoned block device support regarding the power off
recovery. As usual, there are some bug fixes regarding the error
handling routines in compression and ioctl.
Enhancements:
- Support Block Size == Page Size
- let f2fs_precache_extents() traverses in file range
- stop iterating f2fs_map_block if hole exists
- preload extent_cache for POSIX_FADV_WILLNEED
- compress: fix to avoid fragment w/ OPU during f2fs_ioc_compress_file()
Bug fixes:
- do not return EFSCORRUPTED, but try to run online repair
- finish previous checkpoints before returning from remount
- fix error handling of __get_node_page and __f2fs_build_free_nids
- clean up zones when not successfully unmounted
- fix to initialize map.m_pblk in f2fs_precache_extents()
- fix to drop meta_inode's page cache in f2fs_put_super()
- set the default compress_level on ioctl
- fix to avoid use-after-free on dic
- fix to avoid redundant compress extension
- do sanity check on cluster when CONFIG_F2FS_CHECK_FS is on
- fix deadloop in f2fs_write_cache_pages()"
* tag 'f2fs-for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: finish previous checkpoints before returning from remount
f2fs: fix error handling of __get_node_page
f2fs: do not return EFSCORRUPTED, but try to run online repair
f2fs: fix error path of __f2fs_build_free_nids
f2fs: Clean up errors in segment.h
f2fs: clean up zones when not successfully unmounted
f2fs: let f2fs_precache_extents() traverses in file range
f2fs: avoid format-overflow warning
f2fs: fix to initialize map.m_pblk in f2fs_precache_extents()
f2fs: Support Block Size == Page Size
f2fs: stop iterating f2fs_map_block if hole exists
f2fs: preload extent_cache for POSIX_FADV_WILLNEED
f2fs: set the default compress_level on ioctl
f2fs: compress: fix to avoid fragment w/ OPU during f2fs_ioc_compress_file()
f2fs: fix to drop meta_inode's page cache in f2fs_put_super()
f2fs: split initial and dynamic conditions for extent_cache
f2fs: compress: fix to avoid redundant compress extension
f2fs: compress: do sanity check on cluster when CONFIG_F2FS_CHECK_FS is on
f2fs: compress: fix to avoid use-after-free on dic
f2fs: compress: fix deadloop in f2fs_write_cache_pages()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Many singleton patches against the MM code. The patch series which are
included in this merge do the following:
- Kemeng Shi has contributed some compation maintenance work in the
series 'Fixes and cleanups to compaction'
- Joel Fernandes has a patchset ('Optimize mremap during mutual
alignment within PMD') which fixes an obscure issue with mremap()'s
pagetable handling during a subsequent exec(), based upon an
implementation which Linus suggested
- More DAMON/DAMOS maintenance and feature work from SeongJae Park i
the following patch series:
mm/damon: misc fixups for documents, comments and its tracepoint
mm/damon: add a tracepoint for damos apply target regions
mm/damon: provide pseudo-moving sum based access rate
mm/damon: implement DAMOS apply intervals
mm/damon/core-test: Fix memory leaks in core-test
mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval
- In the series 'Do not try to access unaccepted memory' Adrian
Hunter provides some fixups for the recently-added 'unaccepted
memory' feature. To increase the feature's checking coverage. 'Plug
a few gaps where RAM is exposed without checking if it is
unaccepted memory'
- In the series 'cleanups for lockless slab shrink' Qi Zheng has done
some maintenance work which is preparation for the lockless slab
shrinking code
- Qi Zheng has redone the earlier (and reverted) attempt to make slab
shrinking lockless in the series 'use refcount+RCU method to
implement lockless slab shrink'
- David Hildenbrand contributes some maintenance work for the rmap
code in the series 'Anon rmap cleanups'
- Kefeng Wang does more folio conversions and some maintenance work
in the migration code. Series 'mm: migrate: more folio conversion
and unification'
- Matthew Wilcox has fixed an issue in the buffer_head code which was
causing long stalls under some heavy memory/IO loads. Some cleanups
were added on the way. Series 'Add and use bdev_getblk()'
- In the series 'Use nth_page() in place of direct struct page
manipulation' Zi Yan has fixed a potential issue with the direct
manipulation of hugetlb page frames
- In the series 'mm: hugetlb: Skip initialization of gigantic tail
struct pages if freed by HVO' has improved our handling of gigantic
pages in the hugetlb vmmemmep optimizaton code. This provides
significant boot time improvements when significant amounts of
gigantic pages are in use
- Matthew Wilcox has sent the series 'Small hugetlb cleanups' - code
rationalization and folio conversions in the hugetlb code
- Yin Fengwei has improved mlock()'s handling of large folios in the
series 'support large folio for mlock'
- In the series 'Expose swapcache stat for memcg v1' Liu Shixin has
added statistics for memcg v1 users which are available (and
useful) under memcg v2
- Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable)
prctl so that userspace may direct the kernel to not automatically
propagate the denial to child processes. The series is named 'MDWE
without inheritance'
- Kefeng Wang has provided the series 'mm: convert numa balancing
functions to use a folio' which does what it says
- In the series 'mm/ksm: add fork-exec support for prctl' Stefan
Roesch makes is possible for a process to propagate KSM treatment
across exec()
- Huang Ying has enhanced memory tiering's calculation of memory
distances. This is used to permit the dax/kmem driver to use 'high
bandwidth memory' in addition to Optane Data Center Persistent
Memory Modules (DCPMM). The series is named 'memory tiering:
calculate abstract distance based on ACPI HMAT'
- In the series 'Smart scanning mode for KSM' Stefan Roesch has
optimized KSM by teaching it to retain and use some historical
information from previous scans
- Yosry Ahmed has fixed some inconsistencies in memcg statistics in
the series 'mm: memcg: fix tracking of pending stats updates
values'
- In the series 'Implement IOCTL to get and optionally clear info
about PTEs' Peter Xu has added an ioctl to /proc/<pid>/pagemap
which permits us to atomically read-then-clear page softdirty
state. This is mainly used by CRIU
- Hugh Dickins contributed the series 'shmem,tmpfs: general
maintenance', a bunch of relatively minor maintenance tweaks to
this code
- Matthew Wilcox has increased the use of the VMA lock over
file-backed page faults in the series 'Handle more faults under the
VMA lock'. Some rationalizations of the fault path became possible
as a result
- In the series 'mm/rmap: convert page_move_anon_rmap() to
folio_move_anon_rmap()' David Hildenbrand has implemented some
cleanups and folio conversions
- In the series 'various improvements to the GUP interface' Lorenzo
Stoakes has simplified and improved the GUP interface with an eye
to providing groundwork for future improvements
- Andrey Konovalov has sent along the series 'kasan: assorted fixes
and improvements' which does those things
- Some page allocator maintenance work from Kemeng Shi in the series
'Two minor cleanups to break_down_buddy_pages'
- In thes series 'New selftest for mm' Breno Leitao has developed
another MM self test which tickles a race we had between madvise()
and page faults
- In the series 'Add folio_end_read' Matthew Wilcox provides cleanups
and an optimization to the core pagecache code
- Nhat Pham has added memcg accounting for hugetlb memory in the
series 'hugetlb memcg accounting'
- Cleanups and rationalizations to the pagemap code from Lorenzo
Stoakes, in the series 'Abstract vma_merge() and split_vma()'
- Audra Mitchell has fixed issues in the procfs page_owner code's new
timestamping feature which was causing some misbehaviours. In the
series 'Fix page_owner's use of free timestamps'
- Lorenzo Stoakes has fixed the handling of new mappings of sealed
files in the series 'permit write-sealed memfd read-only shared
mappings'
- Mike Kravetz has optimized the hugetlb vmemmap optimization in the
series 'Batch hugetlb vmemmap modification operations'
- Some buffer_head folio conversions and cleanups from Matthew Wilcox
in the series 'Finish the create_empty_buffers() transition'
- As a page allocator performance optimization Huang Ying has added
automatic tuning to the allocator's per-cpu-pages feature, in the
series 'mm: PCP high auto-tuning'
- Roman Gushchin has contributed the patchset 'mm: improve
performance of accounted kernel memory allocations' which improves
their performance by ~30% as measured by a micro-benchmark
- folio conversions from Kefeng Wang in the series 'mm: convert page
cpupid functions to folios'
- Some kmemleak fixups in Liu Shixin's series 'Some bugfix about
kmemleak'
- Qi Zheng has improved our handling of memoryless nodes by keeping
them off the allocation fallback list. This is done in the series
'handle memoryless nodes more appropriately'
- khugepaged conversions from Vishal Moola in the series 'Some
khugepaged folio conversions'"
[ bcachefs conflicts with the dynamically allocated shrinkers have been
resolved as per Stephen Rothwell in
https://lore.kernel.org/all/20230913093553.4290421e@canb.auug.org.au/
with help from Qi Zheng.
The clone3 test filtering conflict was half-arsed by yours truly ]
* tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (406 commits)
mm/damon/sysfs: update monitoring target regions for online input commit
mm/damon/sysfs: remove requested targets when online-commit inputs
selftests: add a sanity check for zswap
Documentation: maple_tree: fix word spelling error
mm/vmalloc: fix the unchecked dereference warning in vread_iter()
zswap: export compression failure stats
Documentation: ubsan: drop "the" from article title
mempolicy: migration attempt to match interleave nodes
mempolicy: mmap_lock is not needed while migrating folios
mempolicy: alloc_pages_mpol() for NUMA policy without vma
mm: add page_rmappable_folio() wrapper
mempolicy: remove confusing MPOL_MF_LAZY dead code
mempolicy: mpol_shared_policy_init() without pseudo-vma
mempolicy trivia: use pgoff_t in shared mempolicy tree
mempolicy trivia: slightly more consistent naming
mempolicy trivia: delete those ancient pr_debug()s
mempolicy: fix migrate_pages(2) syscall return nr_failed
kernfs: drop shared NUMA mempolicy hooks
hugetlbfs: drop shared NUMA mempolicy pretence
mm/damon/sysfs-test: add a unit test for damon_sysfs_set_targets()
...
|
|
Pull fscrypt updates from Eric Biggers:
"This update adds support for configuring the crypto data unit size
(i.e. the granularity of file contents encryption) to be less than the
filesystem block size. This can allow users to use inline encryption
hardware in some cases when it wouldn't otherwise be possible.
In addition, there are two commits that are prerequisites for the
extent-based encryption support that the btrfs folks are working on"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
fscrypt: track master key presence separately from secret
fscrypt: rename fscrypt_info => fscrypt_inode_info
fscrypt: support crypto data unit size less than filesystem block size
fscrypt: replace get_ino_and_lblk_bits with just has_32bit_inodes
fscrypt: compute max_lblk_bits from s_maxbytes and block size
fscrypt: make the bounce page pool opt-in instead of opt-out
fscrypt: make it clearer that key_prefix is deprecated
|
|
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull vfs inode time accessor updates from Christian Brauner:
"This finishes the conversion of all inode time fields to accessor
functions as discussed on list. Changing timestamps manually as we
used to do before is error prone. Using accessors function makes this
robust.
It does not contain the switch of the time fields to discrete 64 bit
integers to replace struct timespec and free up space in struct inode.
But after this, the switch can be trivially made and the patch should
only affect the vfs if we decide to do it"
* tag 'vfs-6.7.ctime' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (86 commits)
fs: rename inode i_atime and i_mtime fields
security: convert to new timestamp accessors
selinux: convert to new timestamp accessors
apparmor: convert to new timestamp accessors
sunrpc: convert to new timestamp accessors
mm: convert to new timestamp accessors
bpf: convert to new timestamp accessors
ipc: convert to new timestamp accessors
linux: convert to new timestamp accessors
zonefs: convert to new timestamp accessors
xfs: convert to new timestamp accessors
vboxsf: convert to new timestamp accessors
ufs: convert to new timestamp accessors
udf: convert to new timestamp accessors
ubifs: convert to new timestamp accessors
tracefs: convert to new timestamp accessors
sysv: convert to new timestamp accessors
squashfs: convert to new timestamp accessors
server: convert to new timestamp accessors
client: convert to new timestamp accessors
...
|
|
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull vfs xattr updates from Christian Brauner:
"The 's_xattr' field of 'struct super_block' currently requires a
mutable table of 'struct xattr_handler' entries (although each handler
itself is const). However, no code in vfs actually modifies the
tables.
This changes the type of 's_xattr' to allow const tables, and modifies
existing file systems to move their tables to .rodata. This is
desirable because these tables contain entries with function pointers
in them; moving them to .rodata makes it considerably less likely to
be modified accidentally or maliciously at runtime"
* tag 'vfs-6.7.xattr' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (30 commits)
const_structs.checkpatch: add xattr_handler
net: move sockfs_xattr_handlers to .rodata
shmem: move shmem_xattr_handlers to .rodata
overlayfs: move xattr tables to .rodata
xfs: move xfs_xattr_handlers to .rodata
ubifs: move ubifs_xattr_handlers to .rodata
squashfs: move squashfs_xattr_handlers to .rodata
smb: move cifs_xattr_handlers to .rodata
reiserfs: move reiserfs_xattr_handlers to .rodata
orangefs: move orangefs_xattr_handlers to .rodata
ocfs2: move ocfs2_xattr_handlers and ocfs2_xattr_handler_map to .rodata
ntfs3: move ntfs_xattr_handlers to .rodata
nfs: move nfs4_xattr_handlers to .rodata
kernfs: move kernfs_xattr_handlers to .rodata
jfs: move jfs_xattr_handlers to .rodata
jffs2: move jffs2_xattr_handlers to .rodata
hfsplus: move hfsplus_xattr_handlers to .rodata
hfs: move hfs_xattr_handlers to .rodata
gfs2: move gfs2_xattr_handlers_max to .rodata
fuse: move fuse_xattr_handlers to .rodata
...
|
|
Rename the default helper for encoding FILEID_INO32_GEN* file handles to
generic_encode_ino32_fh() and convert the filesystems that used the
default implementation to use the generic helper explicitly.
After this change, exportfs_encode_inode_fh() no longer has a default
implementation to encode FILEID_INO32_GEN* file handles.
This is a step towards allowing filesystems to encode non-decodeable
file handles for fanotify without having to implement any
export_operations.
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231023180801.2953446-3-amir73il@gmail.com
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Convert f2fs to use bdev_open_by_dev/path() and pass the handle around.
CC: Jaegeuk Kim <jaegeuk@kernel.org>
CC: Chao Yu <chao@kernel.org>
CC: linux-f2fs-devel@lists.sourceforge.net
Acked-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230927093442.25915-23-jack@suse.cz
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Flush remaining checkpoint requests at the end of remount, since a new
checkpoint would be triggered while remount and we need to take care of
it before returning from remount, in order to avoid the below race
condition.
- Thread - checkpoint thread
do_remount()
down_write(&sb->s_umount);
f2fs_remount()
f2fs_disable_checkpoint(sbi) -> add checkpoints to the list
block_operations()
down_read_trylock(&sb->s_umount) = 0
up_write(&sb->s_umount);
f2fs_quota_sync()
dquot_writeback_dquots()
WARN_ON_ONCE(!rwsem_is_locked(&sb->s_umount));
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Use f2fs_handle_error to record inconsistent node block error
and return -EFSCORRUPTED instead of -EINVAL.
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If we return the error, there's no way to recover the status as of now, since
fsck does not fix the xattr boundary issue.
Cc: stable@vger.kernel.org
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Convert to using the new inode timestamp accessor functions.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20231004185347.80880-34-jlayton@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
If NAT is corrupted, let scan_nat_page() return EFSCORRUPTED, so that,
caller can set SBI_NEED_FSCK flag into checkpoint for later repair by
fsck.
Also, this patch introduces a new fscorrupted error flag, and in above
scenario, it will persist the error flag into superblock synchronously
to avoid it has no luck to trigger a checkpoint to record SBI_NEED_FSCK
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Fix the following errors reported by checkpatch:
ERROR: spaces required around that ':' (ctx:VxW)
Signed-off-by: KaiLong Wang <wangkailong@jari.cn>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
We can't trust write pointers when the previous mount was not
successfully unmounted.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Rather than in range of [0, max_file_blocks()), since data after EOF
is alwasy zero, it's unnecessary to preload mapping info of the data.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
With gcc and W=1 option, there's a warning like this:
fs/f2fs/compress.c: In function ‘f2fs_init_page_array_cache’:
fs/f2fs/compress.c:1984:47: error: ‘%u’ directive writing between
1 and 7 bytes into a region of size between 5 and 8
[-Werror=format-overflow=]
1984 | sprintf(slab_name, "f2fs_page_array_entry-%u:%u", MAJOR(dev),
MINOR(dev));
| ^~
String "f2fs_page_array_entry-%u:%u" can up to 35. The first "%u" can up
to 4 and the second "%u" can up to 7, so total size is "24 + 4 + 7 = 35".
slab_name's size should be 35 rather than 32.
Cc: stable@vger.kernel.org
Signed-off-by: Su Hui <suhui@nfschina.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Otherwise, it may print random physical block address in tracepoint
of f2fs_map_blocks() as below:
f2fs_map_blocks: dev = (253,16), ino = 2297, file offset = 0, start blkaddr = 0xa356c421, len = 0x0, flags = 0
Fixes: c4020b2da4c9 ("f2fs: support F2FS_IOC_PRECACHE_EXTENTS")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This makes it harder for accidental or malicious changes to
f2fs_xattr_handlers or f2fs_xattr_handler_map at runtime.
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: linux-f2fs-devel@lists.sourceforge.net
Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
Link: https://lore.kernel.org/r/20230930050033.41174-11-wedsonaf@gmail.com
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This allows f2fs to support cases where the block size = page size for
both 4K and 16K block sizes. Other sizes should work as well, should the
need arise. This does not currently support 4K Block size filesystems if
the page size is 16K.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Use new APIs to dynamically allocate the f2fs-shrinker.
Link: https://lkml.kernel.org/r/20230911094444.68966-8-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Chandan Babu R <chandan.babu@oracle.com>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Chuck Lever <cel@kernel.org>
Cc: Coly Li <colyli@suse.de>
Cc: Dai Ngo <Dai.Ngo@oracle.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Airlie <airlied@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Gao Xiang <hsiangkao@linux.alibaba.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jeffle Xu <jefflexu@linux.alibaba.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Kirill Tkhai <tkhai@ya.ru>
Cc: Marijn Suijten <marijn.suijten@somainline.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nadav Amit <namit@vmware.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Olga Kornievskaia <kolga@netapp.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Sean Paul <sean@poorly.run>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Song Liu <song@kernel.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: Yue Hu <huyue2@coolpad.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Let's avoid unnecessary f2fs_map_block calls to load extents.
# f2fs_io fadvise willneed 0 4096 /data/local/tmp/test
f2fs_map_blocks: dev = (254,51), ino = 85845, file offset = 386, start blkaddr = 0x34ac00, len = 0x1400, flags = 2,
f2fs_map_blocks: dev = (254,51), ino = 85845, file offset = 5506, start blkaddr = 0x34c200, len = 0x1000, flags = 2,
f2fs_map_blocks: dev = (254,51), ino = 85845, file offset = 9602, start blkaddr = 0x34d600, len = 0x1200, flags = 2,
f2fs_map_blocks: dev = (254,51), ino = 85845, file offset = 14210, start blkaddr = 0x34ec00, len = 0x400, flags = 2,
f2fs_map_blocks: dev = (254,51), ino = 85845, file offset = 15235, start blkaddr = 0x34f401, len = 0xbff, flags = 2,
f2fs_map_blocks: dev = (254,51), ino = 85845, file offset = 18306, start blkaddr = 0x350200, len = 0x1200, flags = 2
f2fs_map_blocks: dev = (254,51), ino = 85845, file offset = 22915, start blkaddr = 0x351601, len = 0xa7d, flags = 2
f2fs_map_blocks: dev = (254,51), ino = 85845, file offset = 25600, start blkaddr = 0x351601, len = 0x0, flags = 0
f2fs_map_blocks: dev = (254,51), ino = 85845, file offset = 25601, start blkaddr = 0x351601, len = 0x0, flags = 0
f2fs_map_blocks: dev = (254,51), ino = 85845, file offset = 25602, start blkaddr = 0x351601, len = 0x0, flags = 0
...
f2fs_map_blocks: dev = (254,51), ino = 85845, file offset = 1037188, start blkaddr = 0x351601, len = 0x0, flags = 0
f2fs_map_blocks: dev = (254,51), ino = 85845, file offset = 1038206, start blkaddr = 0x351601, len = 0x0, flags = 0
f2fs_map_blocks: dev = (254,51), ino = 85845, file offset = 1039224, start blkaddr = 0x351601, len = 0x0, flags = 0
f2fs_map_blocks: dev = (254,51), ino = 85845, file offset = 2075548, start blkaddr = 0x351601, len = 0x0, flags = 0
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch tries to preload extent_cache given POSIX_FADV_WILLNEED, which is
more useful for generic usecases.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Until now, fscrypt has always used the filesystem block size as the
granularity of file contents encryption. Two scenarios have come up
where a sub-block granularity of contents encryption would be useful:
1. Inline crypto hardware that only supports a crypto data unit size
that is less than the filesystem block size.
2. Support for direct I/O at a granularity less than the filesystem
block size, for example at the block device's logical block size in
order to match the traditional direct I/O alignment requirement.
(1) first came up with older eMMC inline crypto hardware that only
supports a crypto data unit size of 512 bytes. That specific case
ultimately went away because all systems with that hardware continued
using out of tree code and never actually upgraded to the upstream
inline crypto framework. But, now it's coming back in a new way: some
current UFS controllers only support a data unit size of 4096 bytes, and
there is a proposal to increase the filesystem block size to 16K.
(2) was discussed as a "nice to have" feature, though not essential,
when support for direct I/O on encrypted files was being upstreamed.
Still, the fact that this feature has come up several times does suggest
it would be wise to have available. Therefore, this patch implements it
by using one of the reserved bytes in fscrypt_policy_v2 to allow users
to select a sub-block data unit size. Supported data unit sizes are
powers of 2 between 512 and the filesystem block size, inclusively.
Support is implemented for both the FS-layer and inline crypto cases.
This patch focuses on the basic support for sub-block data units. Some
things are out of scope for this patch but may be addressed later:
- Supporting sub-block data units in combination with
FSCRYPT_POLICY_FLAG_IV_INO_LBLK_64, in most cases. Unfortunately this
combination usually causes data unit indices to exceed 32 bits, and
thus fscrypt_supported_policy() correctly disallows it. The users who
potentially need this combination are using f2fs. To support it, f2fs
would need to provide an option to slightly reduce its max file size.
- Supporting sub-block data units in combination with
FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32. This has the same problem
described above, but also it will need special code to make DUN
wraparound still happen on a FS block boundary.
- Supporting use case (2) mentioned above. The encrypted direct I/O
code will need to stop requiring and assuming FS block alignment.
This won't be hard, but it belongs in a separate patch.
- Supporting this feature on filesystems other than ext4 and f2fs.
(Filesystems declare support for it via their fscrypt_operations.)
On UBIFS, sub-block data units don't make sense because UBIFS encrypts
variable-length blocks as a result of compression. CephFS could
support it, but a bit more work would be needed to make the
fscrypt_*_block_inplace functions play nicely with sub-block data
units. I don't think there's a use case for this on CephFS anyway.
Link: https://lore.kernel.org/r/20230925055451.59499-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
Now that fs/crypto/ computes the filesystem's lblk_bits from its maximum
file size, it is no longer necessary for filesystems to provide
lblk_bits via fscrypt_operations::get_ino_and_lblk_bits.
It is still necessary for fs/crypto/ to retrieve ino_bits from the
filesystem. However, this is used only to decide whether inode numbers
fit in 32 bits. Also, ino_bits is static for all relevant filesystems,
i.e. it doesn't depend on the filesystem instance.
Therefore, in the interest of keeping things as simple as possible,
replace 'get_ino_and_lblk_bits' with a flag 'has_32bit_inodes'. This
can always be changed back to a function if a filesystem needs it to be
dynamic, but for now a static flag is all that's needed.
Link: https://lore.kernel.org/r/20230925055451.59499-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
Replace FS_CFLG_OWN_PAGES with a bit flag 'needs_bounce_pages' which has
the opposite meaning. I.e., filesystems now opt into the bounce page
pool instead of opt out. Make fscrypt_alloc_bounce_page() check that
the bounce page pool has been initialized.
I believe the opt-in makes more sense, since nothing else in
fscrypt_operations is opt-out, and these days filesystems can choose to
use blk-crypto which doesn't need the fscrypt bounce page pool. Also, I
happen to be planning to add two more flags, and I wanted to fix the
"FS_CFLG_" name anyway as it wasn't prefixed with "FSCRYPT_".
Link: https://lore.kernel.org/r/20230925055451.59499-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
fscrypt_operations::key_prefix should not be set by any filesystems that
aren't setting it already. This is already documented, but apparently
it's not sufficiently clear, as both ceph and btrfs have tried to set
it. Rename the field to legacy_key_prefix and improve the documentation
to hopefully make it clearer.
Link: https://lore.kernel.org/r/20230925055451.59499-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
Otherwise, we'll get a broken inode.
# touch $FILE
# f2fs_io setflags compression $FILE
# f2fs_io set_coption 2 8 $FILE
[ 112.227612] F2FS-fs (dm-51): sanity_check_compress_inode: inode (ino=8d3fe) has unsupported compress level: 0, run fsck to fix
Cc: stable@vger.kernel.org
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If file has both cold and compress flag, during f2fs_ioc_compress_file(),
f2fs will trigger IPU for non-compress cluster and OPU for compress
cluster, so that, data of the file may be fragmented.
Fix it by always triggering OPU for IOs from user mode compression.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
syzbot reports a kernel bug as below:
F2FS-fs (loop1): detect filesystem reference count leak during umount, type: 10, count: 1
kernel BUG at fs/f2fs/super.c:1639!
CPU: 0 PID: 15451 Comm: syz-executor.1 Not tainted 6.5.0-syzkaller-09338-ge0152e7481c6 #0
RIP: 0010:f2fs_put_super+0xce1/0xed0 fs/f2fs/super.c:1639
Call Trace:
generic_shutdown_super+0x161/0x3c0 fs/super.c:693
kill_block_super+0x3b/0x70 fs/super.c:1646
kill_f2fs_super+0x2b7/0x3d0 fs/f2fs/super.c:4879
deactivate_locked_super+0x9a/0x170 fs/super.c:481
deactivate_super+0xde/0x100 fs/super.c:514
cleanup_mnt+0x222/0x3d0 fs/namespace.c:1254
task_work_run+0x14d/0x240 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x210/0x240 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
syscall_exit_to_user_mode+0x1d/0x60 kernel/entry/common.c:296
do_syscall_64+0x44/0xb0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x63/0xcd
In f2fs_put_super(), it tries to do sanity check on dirty and IO
reference count of f2fs, once there is any reference count leak,
it will trigger panic.
The root case is, during f2fs_put_super(), if there is any IO error
in f2fs_wait_on_all_pages(), we missed to truncate meta_inode's page
cache later, result in panic, fix this case.
Fixes: 20872584b8c0 ("f2fs: fix to drop all dirty meta/node pages during umount()")
Reported-by: syzbot+ebd7072191e2eddd7d6e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/000000000000a14f020604a62a98@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Let's allocate the extent_cache tree without dynamic conditions to avoid a
missing condition causing a panic as below.
# create a file w/ a compressed flag
# disable the compression
# panic while updating extent_cache
F2FS-fs (dm-64): Swapfile: last extent is not aligned to section
F2FS-fs (dm-64): Swapfile (3) is not align to section: 1) creat(), 2) ioctl(F2FS_IOC_SET_PIN_FILE), 3) fallocate(2097152 * N)
Adding 124996k swap on ./swap-file. Priority:0 extents:2 across:17179494468k
==================================================================
BUG: KASAN: null-ptr-deref in instrument_atomic_read_write out/common/include/linux/instrumented.h:101 [inline]
BUG: KASAN: null-ptr-deref in atomic_try_cmpxchg_acquire out/common/include/asm-generic/atomic-instrumented.h:705 [inline]
BUG: KASAN: null-ptr-deref in queued_write_lock out/common/include/asm-generic/qrwlock.h:92 [inline]
BUG: KASAN: null-ptr-deref in __raw_write_lock out/common/include/linux/rwlock_api_smp.h:211 [inline]
BUG: KASAN: null-ptr-deref in _raw_write_lock+0x5a/0x110 out/common/kernel/locking/spinlock.c:295
Write of size 4 at addr 0000000000000030 by task syz-executor154/3327
CPU: 0 PID: 3327 Comm: syz-executor154 Tainted: G O 5.10.185 #1
Hardware name: emulation qemu-x86/qemu-x86, BIOS 2023.01-21885-gb3cc1cd24d 01/01/2023
Call Trace:
__dump_stack out/common/lib/dump_stack.c:77 [inline]
dump_stack_lvl+0x17e/0x1c4 out/common/lib/dump_stack.c:118
__kasan_report+0x16c/0x260 out/common/mm/kasan/report.c:415
kasan_report+0x51/0x70 out/common/mm/kasan/report.c:428
kasan_check_range+0x2f3/0x340 out/common/mm/kasan/generic.c:186
__kasan_check_write+0x14/0x20 out/common/mm/kasan/shadow.c:37
instrument_atomic_read_write out/common/include/linux/instrumented.h:101 [inline]
atomic_try_cmpxchg_acquire out/common/include/asm-generic/atomic-instrumented.h:705 [inline]
queued_write_lock out/common/include/asm-generic/qrwlock.h:92 [inline]
__raw_write_lock out/common/include/linux/rwlock_api_smp.h:211 [inline]
_raw_write_lock+0x5a/0x110 out/common/kernel/locking/spinlock.c:295
__drop_extent_tree+0xdf/0x2f0 out/common/fs/f2fs/extent_cache.c:1155
f2fs_drop_extent_tree+0x17/0x30 out/common/fs/f2fs/extent_cache.c:1172
f2fs_insert_range out/common/fs/f2fs/file.c:1600 [inline]
f2fs_fallocate+0x19fd/0x1f40 out/common/fs/f2fs/file.c:1764
vfs_fallocate+0x514/0x9b0 out/common/fs/open.c:310
ksys_fallocate out/common/fs/open.c:333 [inline]
__do_sys_fallocate out/common/fs/open.c:341 [inline]
__se_sys_fallocate out/common/fs/open.c:339 [inline]
__x64_sys_fallocate+0xb8/0x100 out/common/fs/open.c:339
do_syscall_64+0x35/0x50 out/common/arch/x86/entry/common.c:46
Cc: stable@vger.kernel.org
Fixes: 72840cccc0a1 ("f2fs: allocate the extent_cache by default")
Reported-and-tested-by: syzbot+d342e330a37b48c094b7@syzkaller.appspotmail.com
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
With below script, redundant compress extension will be parsed and added
by parse_options(), because parse_options() doesn't check whether the
extension is existed or not, fix it.
1. mount -t f2fs -o compress_extension=so /dev/vdb /mnt/f2fs
2. mount -t f2fs -o remount,compress_extension=so /mnt/f2fs
3. mount|grep f2fs
/dev/vdb on /mnt/f2fs type f2fs (...,compress_extension=so,compress_extension=so,...)
Fixes: 4c8ff7095bef ("f2fs: support data compression")
Fixes: 151b1982be5d ("f2fs: compress: add nocompress extensions support")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch covers sanity check logic on cluster w/ CONFIG_F2FS_CHECK_FS,
otherwise, there will be performance regression while querying cluster
mapping info.
Callers of f2fs_is_compressed_cluster() only care about whether cluster
is compressed or not, rather than # of valid blocks in compressed cluster,
so, let's adjust f2fs_is_compressed_cluster()'s logic according to
caller's requirement.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Call trace:
__memcpy+0x128/0x250
f2fs_read_multi_pages+0x940/0xf7c
f2fs_mpage_readpages+0x5a8/0x624
f2fs_readahead+0x5c/0x110
page_cache_ra_unbounded+0x1b8/0x590
do_sync_mmap_readahead+0x1dc/0x2e4
filemap_fault+0x254/0xa8c
f2fs_filemap_fault+0x2c/0x104
__do_fault+0x7c/0x238
do_handle_mm_fault+0x11bc/0x2d14
do_mem_abort+0x3a8/0x1004
el0_da+0x3c/0xa0
el0t_64_sync_handler+0xc4/0xec
el0t_64_sync+0x1b4/0x1b8
In f2fs_read_multi_pages(), once f2fs_decompress_cluster() was called if
we hit cached page in compress_inode's cache, dic may be released, it needs
break the loop rather than continuing it, in order to avoid accessing
invalid dic pointer.
Fixes: 6ce19aff0b8c ("f2fs: compress: add compress_inode to cache compressed blocks")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
With below mount option and testcase, it hangs kernel.
1. mount -t f2fs -o compress_log_size=5 /dev/vdb /mnt/f2fs
2. touch /mnt/f2fs/file
3. chattr +c /mnt/f2fs/file
4. dd if=/dev/zero of=/mnt/f2fs/file bs=1MB count=1
5. sync
6. dd if=/dev/zero of=/mnt/f2fs/file bs=111 count=11 conv=notrunc
7. sync
INFO: task sync:4788 blocked for more than 120 seconds.
Not tainted 6.5.0-rc1+ #322
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:sync state:D stack:0 pid:4788 ppid:509 flags:0x00000002
Call Trace:
<TASK>
__schedule+0x335/0xf80
schedule+0x6f/0xf0
wb_wait_for_completion+0x5e/0x90
sync_inodes_sb+0xd8/0x2a0
sync_inodes_one_sb+0x1d/0x30
iterate_supers+0x99/0xf0
ksys_sync+0x46/0xb0
__do_sys_sync+0x12/0x20
do_syscall_64+0x3f/0x90
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
The reason is f2fs_all_cluster_page_ready() assumes that pages array should
cover at least one cluster, otherwise, it will always return false, result
in deadloop.
By default, pages array size is 16, and it can cover the case cluster_size
is equal or less than 16, for the case cluster_size is larger than 16, let's
allocate memory of pages array dynamically.
Fixes: 4c8ff7095bef ("f2fs: support data compression")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this cycle, we don't have a highlighted feature enhancement, but
mostly have fixed issues mainly in two parts: 1) zoned block device,
and 2) compression support.
For zoned block device, we've tried to improve the power-off recovery
flow as much as possible. For compression, we found some corner cases
caused by wrong compression policy and logics. Other than them, there
were some reverts and stat corrections.
Bug fixes:
- use finish zone command when closing a zone
- check zone type before sending async reset zone command
- fix to assign compress_level for lz4 correctly
- fix error path of f2fs_submit_page_read()
- don't {,de}compress non-full cluster
- send small discard commands during checkpoint back
- flush inode if atomic file is aborted
- correct to account gc/cp stats
And, there are minor bug fixes, avoiding false lockdep warning, and
clean-ups"
* tag 'f2fs-for-6-6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (25 commits)
f2fs: use finish zone command when closing a zone
f2fs: compress: fix to assign compress_level for lz4 correctly
f2fs: fix error path of f2fs_submit_page_read()
f2fs: clean up error handling in sanity_check_{compress_,}inode()
f2fs: avoid false alarm of circular locking
Revert "f2fs: do not issue small discard commands during checkpoint"
f2fs: doc: fix description of max_small_discards
f2fs: should update REQ_TIME for direct write
f2fs: fix to account cp stats correctly
f2fs: fix to account gc stats correctly
f2fs: remove unneeded check condition in __f2fs_setxattr()
f2fs: fix to update i_ctime in __f2fs_setxattr()
Revert "f2fs: fix to do sanity check on extent cache correctly"
f2fs: increase usage of folio_next_index() helper
f2fs: Only lfs mode is allowed with zoned block device feature
f2fs: check zone type before sending async reset zone command
f2fs: compress: don't {,de}compress non-full cluster
f2fs: allow f2fs_ioc_{,de}compress_file to be interrupted
f2fs: don't reopen the main block device in f2fs_scan_devices
f2fs: fix to avoid mmap vs set_compress_option case
...
|
|
Pull block updates from Jens Axboe:
"Pretty quiet round for this release. This contains:
- Add support for zoned storage to ublk (Andreas, Ming)
- Series improving performance for drivers that mark themselves as
needing a blocking context for issue (Bart)
- Cleanup the flush logic (Chengming)
- sed opal keyring support (Greg)
- Fixes and improvements to the integrity support (Jinyoung)
- Add some exports for bcachefs that we can hopefully delete again in
the future (Kent)
- deadline throttling fix (Zhiguo)
- Series allowing building the kernel without buffer_head support
(Christoph)
- Sanitize the bio page adding flow (Christoph)
- Write back cache fixes (Christoph)
- MD updates via Song:
- Fix perf regression for raid0 large sequential writes (Jan)
- Fix split bio iostat for raid0 (David)
- Various raid1 fixes (Heinz, Xueshi)
- raid6test build fixes (WANG)
- Deprecate bitmap file support (Christoph)
- Fix deadlock with md sync thread (Yu)
- Refactor md io accounting (Yu)
- Various non-urgent fixes (Li, Yu, Jack)
- Various fixes and cleanups (Arnd, Azeem, Chengming, Damien, Li,
Ming, Nitesh, Ruan, Tejun, Thomas, Xu)"
* tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux: (113 commits)
block: use strscpy() to instead of strncpy()
block: sed-opal: keyring support for SED keys
block: sed-opal: Implement IOC_OPAL_REVERT_LSP
block: sed-opal: Implement IOC_OPAL_DISCOVERY
blk-mq: prealloc tags when increase tagset nr_hw_queues
blk-mq: delete redundant tagset map update when fallback
blk-mq: fix tags leak when shrink nr_hw_queues
ublk: zoned: support REQ_OP_ZONE_RESET_ALL
md: raid0: account for split bio in iostat accounting
md/raid0: Fix performance regression for large sequential writes
md/raid0: Factor out helper for mapping and submitting a bio
md raid1: allow writebehind to work on any leg device set WriteMostly
md/raid1: hold the barrier until handle_read_error() finishes
md/raid1: free the r1bio before waiting for blocked rdev
md/raid1: call free_r1bio() before allow_barrier() in raid_end_bio_io()
blk-cgroup: Fix NULL deref caused by blkg_policy_data being installed before init
drivers/rnbd: restore sysfs interface to rnbd-client
md/raid5-cache: fix null-ptr-deref for r5l_flush_stripe_to_raid()
raid6: test: only check for Altivec if building on powerpc hosts
raid6: test: make sure all intermediate and artifact files are .gitignored
...
|
|
Pull iomap updates from Darrick Wong:
"We've got some big changes for this release -- I'm very happy to be
landing willy's work to enable large folios for the page cache for
general read and write IOs when the fs can make contiguous space
allocations, and Ritesh's work to track sub-folio dirty state to
eliminate the write amplification problems inherent in using large
folios.
As a bonus, io_uring can now process write completions in the caller's
context instead of bouncing through a workqueue, which should reduce
io latency dramatically. IOWs, XFS should see a nice performance bump
for both IO paths.
Summary:
- Make large writes to the page cache fill sparse parts of the cache
with large folios, then use large memcpy calls for the large folio.
- Track the per-block dirty state of each large folio so that a
buffered write to a single byte on a large folio does not result in
a (potentially) multi-megabyte writeback IO.
- Allow some directio completions to be performed in the initiating
task's context instead of punting through a workqueue. This will
reduce latency for some io_uring requests"
* tag 'iomap-6.6-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (26 commits)
iomap: support IOCB_DIO_CALLER_COMP
io_uring/rw: add write support for IOCB_DIO_CALLER_COMP
fs: add IOCB flags related to passing back dio completions
iomap: add IOMAP_DIO_INLINE_COMP
iomap: only set iocb->private for polled bio
iomap: treat a write through cache the same as FUA
iomap: use an unsigned type for IOMAP_DIO_* defines
iomap: cleanup up iomap_dio_bio_end_io()
iomap: Add per-block dirty state tracking to improve performance
iomap: Allocate ifs in ->write_begin() early
iomap: Refactor iomap_write_delalloc_punch() function out
iomap: Use iomap_punch_t typedef
iomap: Fix possible overflow condition in iomap_write_delalloc_scan
iomap: Add some uptodate state handling helpers for ifs state bitmap
iomap: Drop ifs argument from iomap_set_range_uptodate()
iomap: Rename iomap_page to iomap_folio_state and others
iomap: Copy larger chunks from userspace
iomap: Create large folios in the buffered write path
filemap: Allow __filemap_get_folio to allocate large folios
filemap: Add fgf_t typedef
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull superblock updates from Christian Brauner:
"This contains the super rework that was ready for this cycle. The
first part changes the order of how we open block devices and allocate
superblocks, contains various cleanups, simplifications, and a new
mechanism to wait on superblock state changes.
This unblocks work to ultimately limit the number of writers to a
block device. Jan has already scheduled follow-up work that will be
ready for v6.7 and allows us to restrict the number of writers to a
given block device. That series builds on this work right here.
The second part contains filesystem freezing updates.
Overview:
The generic superblock changes are rougly organized as follows
(ignoring additional minor cleanups):
(1) Removal of the bd_super member from struct block_device.
This was a very odd back pointer to struct super_block with
unclear rules. For all relevant places we have other means to get
the same information so just get rid of this.
(2) Simplify rules for superblock cleanup.
Roughly, everything that is allocated during fs_context
initialization and that's stored in fs_context->s_fs_info needs
to be cleaned up by the fs_context->free() implementation before
the superblock allocation function has been called successfully.
After sget_fc() returned fs_context->s_fs_info has been
transferred to sb->s_fs_info at which point sb->kill_sb() if
fully responsible for cleanup. Adhering to these rules means that
cleanup of sb->s_fs_info in fill_super() is to be avoided as it's
brittle and inconsistent.
Cleanup shouldn't be duplicated between sb->put_super() as
sb->put_super() is only called if sb->s_root has been set aka
when the filesystem has been successfully born (SB_BORN). That
complexity should be avoided.
This also means that block devices are to be closed in
sb->kill_sb() instead of sb->put_super(). More details in the
lower section.
(3) Make it possible to lookup or create a superblock before opening
block devices
There's a subtle dependency on (2) as some filesystems did rely
on fill_super() to be called in order to correctly clean up
sb->s_fs_info. All these filesystems have been fixed.
(4) Switch most filesystem to follow the same logic as the generic
mount code now does as outlined in (3).
(5) Use the superblock as the holder of the block device. We can now
easily go back from block device to owning superblock.
(6) Export and extend the generic fs_holder_ops and use them as
holder ops everywhere and remove the filesystem specific holder
ops.
(7) Call from the block layer up into the filesystem layer when the
block device is removed, allowing to shut down the filesystem
without risk of deadlocks.
(8) Get rid of get_super().
We can now easily go back from the block device to owning
superblock and can call up from the block layer into the
filesystem layer when the device is removed. So no need to wade
through all registered superblock to find the owning superblock
anymore"
Link: https://lore.kernel.org/lkml/20230824-prall-intakt-95dbffdee4a0@brauner/
* tag 'v6.6-vfs.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (47 commits)
super: use higher-level helper for {freeze,thaw}
super: wait until we passed kill super
super: wait for nascent superblocks
super: make locking naming consistent
super: use locking helpers
fs: simplify invalidate_inodes
fs: remove get_super
block: call into the file system for ioctl BLKFLSBUF
block: call into the file system for bdev_mark_dead
block: consolidate __invalidate_device and fsync_bdev
block: drop the "busy inodes on changed media" log message
dasd: also call __invalidate_device when setting the device offline
amiflop: don't call fsync_bdev in FDFMTBEG
floppy: call disk_force_media_change when changing the format
block: simplify the disk_force_media_change interface
nbd: call blk_mark_disk_dead in nbd_clear_sock_ioctl
xfs use fs_holder_ops for the log and RT devices
xfs: drop s_umount over opening the log and RT devices
ext4: use fs_holder_ops for the log device
ext4: drop s_umount over opening the log device
...
|
|
Use the finish zone command first when a zone should be closed.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
After remount, F2FS_OPTION().compress_level was assgin to
LZ4HC_DEFAULT_CLEVEL incorrectly, result in lz4hc:9 was enabled, fix it.
1. mount /dev/vdb
/dev/vdb on /mnt/f2fs type f2fs (...,compress_algorithm=lz4,compress_log_size=2,...)
2. mount -t f2fs -o remount,compress_log_size=3 /mnt/f2fs/
3. mount|grep f2fs
/dev/vdb on /mnt/f2fs type f2fs (...,compress_algorithm=lz4:9,compress_log_size=3,...)
Fixes: 00e120b5e4b5 ("f2fs: assign default compression level")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In error path of f2fs_submit_page_read(), it missed to call
iostat_update_and_unbind_ctx() and free bio_post_read_ctx, fix it.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In sanity_check_{compress_,}inode(), it doesn't need to set SBI_NEED_FSCK
in each error case, instead, we can set the flag in do_read_inode() only
once when sanity_check_inode() fails.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
ssh://gitolite.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull filesystem freezing updates from Darrick Wong:
New code for 6.6:
* Allow the kernel to initiate a freeze of a filesystem. The kernel
and userspace can both hold a freeze on a filesystem at the same
time; the freeze is not lifted until /both/ holders lift it. This
will enable us to fix a longstanding bug in XFS online fsck.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Message-Id: <20230822182604.GB11286@frogsfrogsfrogs>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
======================================================
WARNING: possible circular locking dependency detected
6.5.0-rc5-syzkaller-00353-gae545c3283dc #0 Not tainted
------------------------------------------------------
syz-executor273/5027 is trying to acquire lock:
ffff888077fe1fb0 (&fi->i_sem){+.+.}-{3:3}, at: f2fs_down_write fs/f2fs/f2fs.h:2133 [inline]
ffff888077fe1fb0 (&fi->i_sem){+.+.}-{3:3}, at: f2fs_add_inline_entry+0x300/0x6f0 fs/f2fs/inline.c:644
but task is already holding lock:
ffff888077fe07c8 (&fi->i_xattr_sem){.+.+}-{3:3}, at: f2fs_down_read fs/f2fs/f2fs.h:2108 [inline]
ffff888077fe07c8 (&fi->i_xattr_sem){.+.+}-{3:3}, at: f2fs_add_dentry+0x92/0x230 fs/f2fs/dir.c:783
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&fi->i_xattr_sem){.+.+}-{3:3}:
down_read+0x9c/0x470 kernel/locking/rwsem.c:1520
f2fs_down_read fs/f2fs/f2fs.h:2108 [inline]
f2fs_getxattr+0xb1e/0x12c0 fs/f2fs/xattr.c:532
__f2fs_get_acl+0x5a/0x900 fs/f2fs/acl.c:179
f2fs_acl_create fs/f2fs/acl.c:377 [inline]
f2fs_init_acl+0x15c/0xb30 fs/f2fs/acl.c:420
f2fs_init_inode_metadata+0x159/0x1290 fs/f2fs/dir.c:558
f2fs_add_regular_entry+0x79e/0xb90 fs/f2fs/dir.c:740
f2fs_add_dentry+0x1de/0x230 fs/f2fs/dir.c:788
f2fs_do_add_link+0x190/0x280 fs/f2fs/dir.c:827
f2fs_add_link fs/f2fs/f2fs.h:3554 [inline]
f2fs_mkdir+0x377/0x620 fs/f2fs/namei.c:781
vfs_mkdir+0x532/0x7e0 fs/namei.c:4117
do_mkdirat+0x2a9/0x330 fs/namei.c:4140
__do_sys_mkdir fs/namei.c:4160 [inline]
__se_sys_mkdir fs/namei.c:4158 [inline]
__x64_sys_mkdir+0xf2/0x140 fs/namei.c:4158
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
-> #0 (&fi->i_sem){+.+.}-{3:3}:
check_prev_add kernel/locking/lockdep.c:3142 [inline]
check_prevs_add kernel/locking/lockdep.c:3261 [inline]
validate_chain kernel/locking/lockdep.c:3876 [inline]
__lock_acquire+0x2e3d/0x5de0 kernel/locking/lockdep.c:5144
lock_acquire kernel/locking/lockdep.c:5761 [inline]
lock_acquire+0x1ae/0x510 kernel/locking/lockdep.c:5726
down_write+0x93/0x200 kernel/locking/rwsem.c:1573
f2fs_down_write fs/f2fs/f2fs.h:2133 [inline]
f2fs_add_inline_entry+0x300/0x6f0 fs/f2fs/inline.c:644
f2fs_add_dentry+0xa6/0x230 fs/f2fs/dir.c:784
f2fs_do_add_link+0x190/0x280 fs/f2fs/dir.c:827
f2fs_add_link fs/f2fs/f2fs.h:3554 [inline]
f2fs_mkdir+0x377/0x620 fs/f2fs/namei.c:781
vfs_mkdir+0x532/0x7e0 fs/namei.c:4117
ovl_do_mkdir fs/overlayfs/overlayfs.h:196 [inline]
ovl_mkdir_real+0xb5/0x370 fs/overlayfs/dir.c:146
ovl_workdir_create+0x3de/0x820 fs/overlayfs/super.c:309
ovl_make_workdir fs/overlayfs/super.c:711 [inline]
ovl_get_workdir fs/overlayfs/super.c:864 [inline]
ovl_fill_super+0xdab/0x6180 fs/overlayfs/super.c:1400
vfs_get_super+0xf9/0x290 fs/super.c:1152
vfs_get_tree+0x88/0x350 fs/super.c:1519
do_new_mount fs/namespace.c:3335 [inline]
path_mount+0x1492/0x1ed0 fs/namespace.c:3662
do_mount fs/namespace.c:3675 [inline]
__do_sys_mount fs/namespace.c:3884 [inline]
__se_sys_mount fs/namespace.c:3861 [inline]
__x64_sys_mount+0x293/0x310 fs/namespace.c:3861
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
rlock(&fi->i_xattr_sem);
lock(&fi->i_sem);
lock(&fi->i_xattr_sem);
lock(&fi->i_sem);
Cc: <stable@vger.kernel.org>
Reported-and-tested-by: syzbot+e5600587fa9cbf8e3826@syzkaller.appspotmail.com
Fixes: 5eda1ad1aaff "f2fs: fix deadlock in i_xattr_sem and inode page lock"
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Previously, we have two mechanisms to cache & submit small discards:
a) set max small discard number in /sys/fs/f2fs/vdb/max_small_discards,
and checkpoint will cache small discard candidates w/ configured maximum
number.
b) call FITRIM ioctl, also, checkpoint in f2fs_trim_fs() will cache small
discard candidates w/ configured discard granularity, but w/o limitation
of number. FSTRIM interface is asynchronized, so it won't submit discard
directly.
Finally, discard thread will submit them in background periodically.
However, after commit 9ac00e7cef10 ("f2fs: do not issue small discard
commands during checkpoint"), the mechanism a) is broken, since no matter
how we configure the sysfs entry /sys/fs/f2fs/vdb/max_small_discards,
checkpoint will not cache small discard candidates any more.
echo 0 > /sys/fs/f2fs/vdb/max_small_discards
xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync"
xfs_io /mnt/f2fs/file -c "fpunch 0 4k"
sync
cat /proc/fs/f2fs/vdb/discard_plist_info |head -2
echo 100 > /sys/fs/f2fs/vdb/max_small_discards
rm /mnt/f2fs/file
xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync"
xfs_io /mnt/f2fs/file -c "fpunch 0 4k"
sync
cat /proc/fs/f2fs/vdb/discard_plist_info |head -2
Before the patch:
Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
0 . . . . . . . .
Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
0 3 1 . . . . . .
After the patch:
Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
0 . . . . . . . .
Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
0 . . . . . . . .
This patch reverts commit 9ac00e7cef10 ("f2fs: do not issue small discard
commands during checkpoint") in order to fix this issue.
Fixes: 9ac00e7cef10 ("f2fs: do not issue small discard commands during checkpoint")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The sending interval of discard and GC should also
consider direct write requests; filesystem is not
idle if there is direct write.
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
cp_foreground_calls sysfs entry shows total CP call count rather than
foreground CP call count, fix it.
Fixes: fc7100ea2a52 ("f2fs: Add f2fs stats to sysfs")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
As reported, status debugfs entry shows inconsistent GC stats as below:
GC calls: 6008 (BG: 6161)
- data segments : 3053 (BG: 3053)
- node segments : 2955 (BG: 2955)
Total GC calls is larger than BGGC calls, the reason is:
- f2fs_stat_info.call_count accounts total migrated section count
by f2fs_gc()
- f2fs_stat_info.bg_gc accounts total call times of f2fs_gc() from
background gc_thread
Another issue is gc_foreground_calls sysfs entry shows total GC call
count rather than FGGC call count.
This patch changes as below for fix:
- account GC calls and migrated segment count separately
- support to account migrated section count if it enables large section
mode
- fix to show correct value in gc_foreground_calls sysfs entry
Fixes: fc7100ea2a52 ("f2fs: Add f2fs stats to sysfs")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
It has checked return value of write_all_xattrs(), remove unneeded
following check condition.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
generic/728 - output mismatch (see /media/fstests/results//generic/728.out.bad)
--- tests/generic/728.out 2023-07-19 07:10:48.362711407 +0000
+++ /media/fstests/results//generic/728.out.bad 2023-07-19 08:39:57.000000000 +0000
QA output created by 728
+Expected ctime to change after setxattr.
+Expected ctime to change after removexattr.
Silence is golden
...
(Run 'diff -u /media/fstests/tests/generic/728.out /media/fstests/results//generic/728.out.bad' to see the entire diff)
generic/729 1s
It needs to update i_ctime after {set,remove}xattr, fix it.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
syzbot reports a f2fs bug as below:
UBSAN: array-index-out-of-bounds in fs/f2fs/f2fs.h:3275:19
index 1409 is out of range for type '__le32[923]' (aka 'unsigned int[923]')
Call Trace:
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
ubsan_epilogue lib/ubsan.c:217 [inline]
__ubsan_handle_out_of_bounds+0x11c/0x150 lib/ubsan.c:348
inline_data_addr fs/f2fs/f2fs.h:3275 [inline]
__recover_inline_status fs/f2fs/inode.c:113 [inline]
do_read_inode fs/f2fs/inode.c:480 [inline]
f2fs_iget+0x4730/0x48b0 fs/f2fs/inode.c:604
f2fs_fill_super+0x640e/0x80c0 fs/f2fs/super.c:4601
mount_bdev+0x276/0x3b0 fs/super.c:1391
legacy_get_tree+0xef/0x190 fs/fs_context.c:611
vfs_get_tree+0x8c/0x270 fs/super.c:1519
do_new_mount+0x28f/0xae0 fs/namespace.c:3335
do_mount fs/namespace.c:3675 [inline]
__do_sys_mount fs/namespace.c:3884 [inline]
__se_sys_mount+0x2d9/0x3c0 fs/namespace.c:3861
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
The issue was bisected to:
commit d48a7b3a72f121655d95b5157c32c7d555e44c05
Author: Chao Yu <chao@kernel.org>
Date: Mon Jan 9 03:49:20 2023 +0000
f2fs: fix to do sanity check on extent cache correctly
The root cause is we applied both v1 and v2 of the patch, v2 is the right
fix, so it needs to revert v1 in order to fix reported issue.
v1:
commit d48a7b3a72f1 ("f2fs: fix to do sanity check on extent cache correctly")
https://lore.kernel.org/lkml/20230109034920.492914-1-chao@kernel.org/
v2:
commit 269d11948100 ("f2fs: fix to do sanity check on extent cache correctly")
https://lore.kernel.org/lkml/20230207134808.1827869-1-chao@kernel.org/
Reported-by: syzbot+601018296973a481f302@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/000000000000fcf0690600e4d04d@google.com/
Fixes: d48a7b3a72f1 ("f2fs: fix to do sanity check on extent cache correctly")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Simplify code pattern of 'folio->index + folio_nr_pages(folio)' by using
the existing helper folio_next_index().
Signed-off-by: Minjie Du <duminjie@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Now f2fs support four block allocation modes: lfs, adaptive,
fragment:segment, fragment:block. Only lfs mode is allowed with zoned block
device feature.
Fixes: 6691d940b0e0 ("f2fs: introduce fragment allocation mode mount option")
Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The commit 25f9080576b9 ("f2fs: add async reset zone command support")
introduced "async reset zone commands" by calling
__submit_zone_reset_cmd() in async discard operations. However,
__submit_zone_reset_cmd() is called regardless of zone type of discard
target zone. When devices have conventional zones, zone reset commands
are sent to the conventional zones and cause I/O errors.
Avoid the I/O errors by checking that the discard target zone type is
sequential write required. If not, handle the discard operation in same
manner as non-zoned, regular block devices. For that purpose, add a new
helper function f2fs_bdev_index() which gets index of the zone reset
target device.
Fixes: 25f9080576b9 ("f2fs: add async reset zone command support")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
f2fs won't compress non-full cluster in tail of file, let's skip
dirtying and rewrite such cluster during f2fs_ioc_{,de}compress_file.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch allows f2fs_ioc_{,de}compress_file() to be interrupted, so that,
userspace won't be blocked when manual {,de}compression on large file is
interrupted by signal.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
f2fs_scan_devices reopens the main device since the very beginning, which
has always been useless, and also means that we don't pass the right
holder for the reopen, which now leads to a warning as the core super.c
holder ops aren't passed in for the reopen.
Fixes: 3c62be17d4f5 ("f2fs: support multiple devices")
Fixes: 0718afd47f70 ("block: introduce holder ops")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Compression option in inode should not be changed after they have
been used, however, it may happen in below race case:
Thread A Thread B
- f2fs_ioc_set_compress_option
- check f2fs_is_mmap_file()
- check get_dirty_pages()
- check F2FS_HAS_BLOCKS()
- f2fs_file_mmap
- set_inode_flag(FI_MMAP_FILE)
- fault
- do_page_mkwrite
- f2fs_vm_page_mkwrite
- f2fs_get_block_locked
- fault_dirty_shared_page
- set_page_dirty
- update i_compress_algorithm
- update i_log_cluster_size
- update i_cluster_size
Avoid such race condition by covering f2fs_file_mmap() w/ i_sem lock,
meanwhile add mmap file check condition in f2fs_may_compress() as well.
Fixes: e1e8debec656 ("f2fs: add F2FS_IOC_SET_COMPRESS_OPTION ioctl")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
https://bugzilla.kernel.org/show_bug.cgi?id=216050
Somehow we're getting a page which has a different mapping.
Let's avoid the infinite loop.
Cc: <stable@vger.kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Let's flush the inode being aborted atomic operation to avoid stale dirty
inode during eviction in this call stack:
f2fs_mark_inode_dirty_sync+0x22/0x40 [f2fs]
f2fs_abort_atomic_write+0xc4/0xf0 [f2fs]
f2fs_evict_inode+0x3f/0x690 [f2fs]
? sugov_start+0x140/0x140
evict+0xc3/0x1c0
evict_inodes+0x17b/0x210
generic_shutdown_super+0x32/0x120
kill_block_super+0x21/0x50
deactivate_locked_super+0x31/0x90
cleanup_mnt+0x100/0x160
task_work_run+0x59/0x90
do_exit+0x33b/0xa50
do_group_exit+0x2d/0x80
__x64_sys_exit_group+0x14/0x20
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
This triggers f2fs_bug_on() in f2fs_evict_inode:
f2fs_bug_on(sbi, is_inode_flag_set(inode, FI_DIRTY_INODE));
This fixes the syzbot report:
loop0: detected capacity change from 0 to 131072
F2FS-fs (loop0): invalid crc value
F2FS-fs (loop0): Found nat_bits in checkpoint
F2FS-fs (loop0): Mounted with checkpoint version = 48b305e4
------------[ cut here ]------------
kernel BUG at fs/f2fs/inode.c:869!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 5014 Comm: syz-executor220 Not tainted 6.4.0-syzkaller-11479-g6cd06ab12d1a #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
RIP: 0010:f2fs_evict_inode+0x172d/0x1e00 fs/f2fs/inode.c:869
Code: ff df 48 c1 ea 03 80 3c 02 00 0f 85 6a 06 00 00 8b 75 40 ba 01 00 00 00 4c 89 e7 e8 6d ce 06 00 e9 aa fc ff ff e8 63 22 e2 fd <0f> 0b e8 5c 22 e2 fd 48 c7 c0 a8 3a 18 8d 48 ba 00 00 00 00 00 fc
RSP: 0018:ffffc90003a6fa00 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: ffff8880273b8000 RSI: ffffffff83a2bd0d RDI: 0000000000000007
RBP: ffff888077db91b0 R08: 0000000000000007 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff888029a3c000
R13: ffff888077db9660 R14: ffff888029a3c0b8 R15: ffff888077db9c50
FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1909bb9000 CR3: 00000000276a9000 CR4: 0000000000350ef0
Call Trace:
<TASK>
evict+0x2ed/0x6b0 fs/inode.c:665
dispose_list+0x117/0x1e0 fs/inode.c:698
evict_inodes+0x345/0x440 fs/inode.c:748
generic_shutdown_super+0xaf/0x480 fs/super.c:478
kill_block_super+0x64/0xb0 fs/super.c:1417
kill_f2fs_super+0x2af/0x3c0 fs/f2fs/super.c:4704
deactivate_locked_super+0x98/0x160 fs/super.c:330
deactivate_super+0xb1/0xd0 fs/super.c:361
cleanup_mnt+0x2ae/0x3d0 fs/namespace.c:1254
task_work_run+0x16f/0x270 kernel/task_work.c:179
exit_task_work include/linux/task_work.h:38 [inline]
do_exit+0xa9a/0x29a0 kernel/exit.c:874
do_group_exit+0xd4/0x2a0 kernel/exit.c:1024
__do_sys_exit_group kernel/exit.c:1035 [inline]
__se_sys_exit_group kernel/exit.c:1033 [inline]
__x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1033
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f309be71a09
Code: Unable to access opcode bytes at 0x7f309be719df.
RSP: 002b:00007fff171df518 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00007f309bef7330 RCX: 00007f309be71a09
RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000001
RBP: 0000000000000001 R08: ffffffffffffffc0 R09: 00007f309bef1e40
R10: 0000000000010600 R11: 0000000000000246 R12: 00007f309bef7330
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:f2fs_evict_inode+0x172d/0x1e00 fs/f2fs/inode.c:869
Code: ff df 48 c1 ea 03 80 3c 02 00 0f 85 6a 06 00 00 8b 75 40 ba 01 00 00 00 4c 89 e7 e8 6d ce 06 00 e9 aa fc ff ff e8 63 22 e2 fd <0f> 0b e8 5c 22 e2 fd 48 c7 c0 a8 3a 18 8d 48 ba 00 00 00 00 00 fc
RSP: 0018:ffffc90003a6fa00 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: ffff8880273b8000 RSI: ffffffff83a2bd0d RDI: 0000000000000007
RBP: ffff888077db91b0 R08: 0000000000000007 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff888029a3c000
R13: ffff888077db9660 R14: ffff888029a3c0b8 R15: ffff888077db9c50
FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1909bb9000 CR3: 00000000276a9000 CR4: 0000000000350ef0
Cc: <stable@vger.kernel.org>
Reported-and-tested-by: syzbot+e1246909d526a9d470fa@syzkaller.appspotmail.com
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
f2fs_compress_alloc_page() uses mempool to allocate memory, it never
fail, don't handle error case in its callers.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This reverts commit bfd476623999118d9c509cb0fa9380f2912bc225.
Shinichiro Kawasaki reported:
When I ran workloads on f2fs using v6.5-rcX with fixes [1][2] and a zoned block
devices with 4kb logical block size, I observe mount failure as follows. When
I revert this commit, the failure goes away.
[ 167.781975][ T1555] F2FS-fs (dm-0): IO Block Size: 4 KB
[ 167.890728][ T1555] F2FS-fs (dm-0): Found nat_bits in checkpoint
[ 171.482588][ T1555] F2FS-fs (dm-0): Zone without valid block has non-zero write pointer. Reset the write pointer: wp[0x1300,0x8]
[ 171.496000][ T1555] F2FS-fs (dm-0): (0) : Unaligned zone reset attempted (block 280000 + 80000)
[ 171.505037][ T1555] F2FS-fs (dm-0): Discard zone failed: (errno=-5)
The patch replaced "sbi->log_blocksize - SECTOR_SHIFT" with
"sbi->log_sectors_per_block". However, I think these two are not equal when the
device has 4k logical block size. The former uses Linux kernel sector size 512
byte. The latter use 512b sector size or 4kb sector size depending on the
device. mkfs.f2fs obtains logical block size via BLKSSZGET ioctl from the device
and reflects it to the value sbi->log_sector_size_per_block. This causes
unexpected write pointer calculations in check_zone_write_pointer(). This
resulted in unexpected zone reset and the mount failure.
[1] https://lkml.kernel.org/linux-f2fs-devel/20230711050101.GA19128@lst.de/
[2] https://lore.kernel.org/linux-f2fs-devel/20230804091556.2372567-1-shinichiro.kawasaki@wdc.com/
Cc: stable@vger.kernel.org
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: bfd476623999 ("f2fs: clean up w/ sbi->log_sectors_per_block")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The file system type is not a very useful holder as it doesn't allow us
to go back to the actual file system instance. Pass the super_block instead
which is useful when passed back to the file system driver.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Message-Id: <20230802154131.2221419-7-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
generic_fillattr just fills in the entire stat struct indiscriminately
today, copying data from the inode. There is at least one attribute
(STATX_CHANGE_COOKIE) that can have side effects when it is reported,
and we're looking at adding more with the addition of multigrain
timestamps.
Add a request_mask argument to generic_fillattr and have most callers
just pass in the value that is passed to getattr. Have other callers
(e.g. ksmbd) just pass in STATX_BASIC_STATS. Also move the setting of
STATX_CHANGE_COOKIE into generic_fillattr.
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: "Paulo Alcantara (SUSE)" <pc@manguebit.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Message-Id: <20230807-mgctime-v7-2-d1dec143a704@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add a new config option that controls building the buffer_head code, and
select it from all file systems and stacking drivers that need it.
For the block device nodes and alternative iomap based buffered I/O path
is provided when buffer_head support is not enabled, and iomap needs a
a small tweak to define the IOMAP_F_BUFFER_HEAD flag to 0 to not call
into the buffer_head code when it doesn't exist.
Otherwise this is just Kconfig and ifdef changes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20230801172201.1923299-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
block_page_mkwrite_return is neither block nor mkwrite specific, and
should not be under CONFIG_BLOCK. Move it to mm.h and rename it to
vmf_fs_error.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230801172201.1923299-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Similarly to gfp_t, define fgf_t as its own type to prevent various
misuses and confusion. Leave the flags as FGP_* for now to reduce the
size of this patch; they will be converted to FGF_* later. Move the
documentation to the definition of the type insted of burying it in the
__filemap_get_folio() documentation.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
In later patches, we're going to change how the inode's ctime field is
used. Switch to using accessor functions instead of raw accesses of
inode->i_ctime.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230705190309.579783-41-jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Userspace can freeze a filesystem using the FIFREEZE ioctl or by
suspending the block device; this state persists until userspace thaws
the filesystem with the FITHAW ioctl or resuming the block device.
Since commit 18e9e5104fcd ("Introduce freeze_super and thaw_super for
the fsfreeze ioctl") we only allow the first freeze command to succeed.
The kernel may decide that it is necessary to freeze a filesystem for
its own internal purposes, such as suspends in progress, filesystem fsck
activities, or quiescing a device prior to removal. Userspace thaw
commands must never break a kernel freeze, and kernel thaw commands
shouldn't undo userspace's freeze command.
Introduce a couple of freeze holder flags and wire it into the
sb_writers state. One kernel and one userspace freeze are allowed to
coexist at the same time; the filesystem will not thaw until both are
lifted.
I wonder if the f2fs/gfs2 code should be using a kernel freeze here, but
for now we'll use FREEZE_HOLDER_USERSPACE to preserve existing
behaviors.
Cc: mcgrof@kernel.org
Cc: jack@suse.cz
Cc: hch@infradead.org
Cc: ruansy.fnst@fujitsu.com
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this cycle, we've mainly investigated the zoned block device
support along with patches such as correcting write pointers between
f2fs and storage, adding asynchronous zone reset flow, and managing
the number of open zones.
Other than them, f2fs adds another mount option, "errors=x" to specify
how to handle when it detects an unexpected behavior at runtime.
Enhancements:
- support 'errors=remount-ro|continue|panic' mount option
- enforce some inode flag policies
- allow .tmp compression given extensions
- add some ioctls to manage the f2fs compression
- improve looped node chain flow
- avoid issuing small-sized discard commands during checkpoint
- implement an asynchronous zone reset
Bug fixes:
- fix deadlock in xattr and inode page lock
- fix and add sanity check in some error paths
- fix to avoid NULL pointer dereference f2fs_write_end_io() along
with put_super
- set proper flags to quota files
- fix potential deadlock due to unpaired node_write lock use
- fix over-estimating free section during FG GC
- fix the wrong condition to determine atomic context
As usual, also there are a number of patches with code refactoring and
minor clean-ups"
* tag 'f2fs-for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (46 commits)
f2fs: fix to do sanity check on direct node in truncate_dnode()
f2fs: only set release for file that has compressed data
f2fs: fix compile warning in f2fs_destroy_node_manager()
f2fs: fix error path handling in truncate_dnode()
f2fs: fix deadlock in i_xattr_sem and inode page lock
f2fs: remove unneeded page uptodate check/set
f2fs: update mtime and ctime in move file range method
f2fs: compress tmp files given extension
f2fs: refactor struct f2fs_attr macro
f2fs: convert to use sbi directly
f2fs: remove redundant assignment to variable err
f2fs: do not issue small discard commands during checkpoint
f2fs: check zone write pointer points to the end of zone
f2fs: add f2fs_ioc_get_compress_blocks
f2fs: cleanup MIN_INLINE_XATTR_SIZE
f2fs: add helper to check compression level
f2fs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method
f2fs: do more sanity check on inode
f2fs: compress: fix to check validity of i_compress_flag field
f2fs: add sanity compress level check for compressed file
...
|
|
syzbot reports below bug:
BUG: KASAN: slab-use-after-free in f2fs_truncate_data_blocks_range+0x122a/0x14c0 fs/f2fs/file.c:574
Read of size 4 at addr ffff88802a25c000 by task syz-executor148/5000
CPU: 1 PID: 5000 Comm: syz-executor148 Not tainted 6.4.0-rc7-syzkaller-00041-ge660abd551f1 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:351
print_report mm/kasan/report.c:462 [inline]
kasan_report+0x11c/0x130 mm/kasan/report.c:572
f2fs_truncate_data_blocks_range+0x122a/0x14c0 fs/f2fs/file.c:574
truncate_dnode+0x229/0x2e0 fs/f2fs/node.c:944
f2fs_truncate_inode_blocks+0x64b/0xde0 fs/f2fs/node.c:1154
f2fs_do_truncate_blocks+0x4ac/0xf30 fs/f2fs/file.c:721
f2fs_truncate_blocks+0x7b/0x300 fs/f2fs/file.c:749
f2fs_truncate.part.0+0x4a5/0x630 fs/f2fs/file.c:799
f2fs_truncate include/linux/fs.h:825 [inline]
f2fs_setattr+0x1738/0x2090 fs/f2fs/file.c:1006
notify_change+0xb2c/0x1180 fs/attr.c:483
do_truncate+0x143/0x200 fs/open.c:66
handle_truncate fs/namei.c:3295 [inline]
do_open fs/namei.c:3640 [inline]
path_openat+0x2083/0x2750 fs/namei.c:3791
do_filp_open+0x1ba/0x410 fs/namei.c:3818
do_sys_openat2+0x16d/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_creat fs/open.c:1448 [inline]
__se_sys_creat fs/open.c:1442 [inline]
__x64_sys_creat+0xcd/0x120 fs/open.c:1442
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
The root cause is, inodeA references inodeB via inodeB's ino, once inodeA
is truncated, it calls truncate_dnode() to truncate data blocks in inodeB's
node page, it traverse mapping data from node->i.i_addr[0] to
node->i.i_addr[ADDRS_PER_BLOCK() - 1], result in out-of-boundary access.
This patch fixes to add sanity check on dnode page in truncate_dnode(),
so that, it can help to avoid triggering such issue, and once it encounters
such issue, it will record newly introduced ERROR_INVALID_NODE_REFERENCE
error into superblock, later fsck can detect such issue and try repairing.
Also, it removes f2fs_truncate_data_blocks() for cleanup due to the
function has only one caller, and uses f2fs_truncate_data_blocks_range()
instead.
Reported-and-tested-by: syzbot+12cb4425b22169b52036@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/000000000000f3038a05fef867f8@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If a file is not comprssed yet or does not have compressed data,
for example, its data has a very low compression ratio, do not
set FI_COMPRESS_RELEASED flag.
Signed-off-by: Sheng Yong <shengyong@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
fs/f2fs/node.c: In function ‘f2fs_destroy_node_manager’:
fs/f2fs/node.c:3390:1: warning: the frame size of 1048 bytes is larger than 1024 bytes [-Wframe-larger-than=]
3390 | }
Merging below pointer arrays into common one, and reuse it by cast type.
struct nat_entry *natvec[NATVEC_SIZE];
struct nat_entry_set *setvec[SETVEC_SIZE];
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If truncate_node() fails in truncate_dnode(), it missed to call
f2fs_put_page(), fix it.
Fixes: 7735730d39d7 ("f2fs: fix to propagate error from __get_meta_page()")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Thread #1:
[122554.641906][ T92] f2fs_getxattr+0xd4/0x5fc
-> waiting for f2fs_down_read(&F2FS_I(inode)->i_xattr_sem);
[122554.641927][ T92] __f2fs_get_acl+0x50/0x284
[122554.641948][ T92] f2fs_init_acl+0x84/0x54c
[122554.641969][ T92] f2fs_init_inode_metadata+0x460/0x5f0
[122554.641990][ T92] f2fs_add_inline_entry+0x11c/0x350
-> Locked dir->inode_page by f2fs_get_node_page()
[122554.642009][ T92] f2fs_do_add_link+0x100/0x1e4
[122554.642025][ T92] f2fs_create+0xf4/0x22c
[122554.642047][ T92] vfs_create+0x130/0x1f4
Thread #2:
[123996.386358][ T92] __get_node_page+0x8c/0x504
-> waiting for dir->inode_page lock
[123996.386383][ T92] read_all_xattrs+0x11c/0x1f4
[123996.386405][ T92] __f2fs_setxattr+0xcc/0x528
[123996.386424][ T92] f2fs_setxattr+0x158/0x1f4
-> f2fs_down_write(&F2FS_I(inode)->i_xattr_sem);
[123996.386443][ T92] __f2fs_set_acl+0x328/0x430
[123996.386618][ T92] f2fs_set_acl+0x38/0x50
[123996.386642][ T92] posix_acl_chmod+0xc8/0x1c8
[123996.386669][ T92] f2fs_setattr+0x5e0/0x6bc
[123996.386689][ T92] notify_change+0x4d8/0x580
[123996.386717][ T92] chmod_common+0xd8/0x184
[123996.386748][ T92] do_fchmodat+0x60/0x124
[123996.386766][ T92] __arm64_sys_fchmodat+0x28/0x3c
Cc: <stable@vger.kernel.org>
Fixes: 27161f13e3c3 "f2fs: avoid race in between read xattr & write xattr"
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull mm updates from Andrew Morton:
- Yosry Ahmed brought back some cgroup v1 stats in OOM logs
- Yosry has also eliminated cgroup's atomic rstat flushing
- Nhat Pham adds the new cachestat() syscall. It provides userspace
with the ability to query pagecache status - a similar concept to
mincore() but more powerful and with improved usability
- Mel Gorman provides more optimizations for compaction, reducing the
prevalence of page rescanning
- Lorenzo Stoakes has done some maintanance work on the
get_user_pages() interface
- Liam Howlett continues with cleanups and maintenance work to the
maple tree code. Peng Zhang also does some work on maple tree
- Johannes Weiner has done some cleanup work on the compaction code
- David Hildenbrand has contributed additional selftests for
get_user_pages()
- Thomas Gleixner has contributed some maintenance and optimization
work for the vmalloc code
- Baolin Wang has provided some compaction cleanups,
- SeongJae Park continues maintenance work on the DAMON code
- Huang Ying has done some maintenance on the swap code's usage of
device refcounting
- Christoph Hellwig has some cleanups for the filemap/directio code
- Ryan Roberts provides two patch series which yield some
rationalization of the kernel's access to pte entries - use the
provided APIs rather than open-coding accesses
- Lorenzo Stoakes has some fixes to the interaction between pagecache
and directio access to file mappings
- John Hubbard has a series of fixes to the MM selftesting code
- ZhangPeng continues the folio conversion campaign
- Hugh Dickins has been working on the pagetable handling code, mainly
with a view to reducing the load on the mmap_lock
- Catalin Marinas has reduced the arm64 kmalloc() minimum alignment
from 128 to 8
- Domenico Cerasuolo has improved the zswap reclaim mechanism by
reorganizing the LRU management
- Matthew Wilcox provides some fixups to make gfs2 work better with the
buffer_head code
- Vishal Moola also has done some folio conversion work
- Matthew Wilcox has removed the remnants of the pagevec code - their
functionality is migrated over to struct folio_batch
* tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits)
mm/hugetlb: remove hugetlb_set_page_subpool()
mm: nommu: correct the range of mmap_sem_read_lock in task_mem()
hugetlb: revert use of page_cache_next_miss()
Revert "page cache: fix page_cache_next/prev_miss off by one"
mm/vmscan: fix root proactive reclaim unthrottling unbalanced node
mm: memcg: rename and document global_reclaim()
mm: kill [add|del]_page_to_lru_list()
mm: compaction: convert to use a folio in isolate_migratepages_block()
mm: zswap: fix double invalidate with exclusive loads
mm: remove unnecessary pagevec includes
mm: remove references to pagevec
mm: rename invalidate_mapping_pagevec to mapping_try_invalidate
mm: remove struct pagevec
net: convert sunrpc from pagevec to folio_batch
i915: convert i915_gpu_error to use a folio_batch
pagevec: rename fbatch_count()
mm: remove check_move_unevictable_pages()
drm: convert drm_gem_put_pages() to use a folio_batch
i915: convert shmem_sg_free_table() to use a folio_batch
scatterlist: add sg_set_folio()
...
|