aboutsummaryrefslogtreecommitdiffstats
path: root/fs/nilfs2
AgeCommit message (Collapse)AuthorFilesLines
8 hoursMerge tag 'mm-nonmm-stable-2024-05-19-11-56' of ↵HEADmasterLinus Torvalds9-247/+214
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-mm updates from Andrew Morton: "Mainly singleton patches, documented in their respective changelogs. Notable series include: - Some maintenance and performance work for ocfs2 in Heming Zhao's series "improve write IO performance when fragmentation is high". - Some ocfs2 bugfixes from Su Yue in the series "ocfs2 bugs fixes exposed by fstests". - kfifo header rework from Andy Shevchenko in the series "kfifo: Clean up kfifo.h". - GDB script fixes from Florian Rommel in the series "scripts/gdb: Fixes for $lx_current and $lx_per_cpu". - After much discussion, a coding-style update from Barry Song explaining one reason why inline functions are preferred over macros. The series is "codingstyle: avoid unused parameters for a function-like macro"" * tag 'mm-nonmm-stable-2024-05-19-11-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (62 commits) fs/proc: fix softlockup in __read_vmcore nilfs2: convert BUG_ON() in nilfs_finish_roll_forward() to WARN_ON() scripts: checkpatch: check unused parameters for function-like macro Documentation: coding-style: ask function-like macros to evaluate parameters nilfs2: use __field_struct() for a bitwise field selftests/kcmp: remove unused open mode nilfs2: remove calls to folio_set_error() and folio_clear_error() kernel/watchdog_perf.c: tidy up kerneldoc watchdog: allow nmi watchdog to use raw perf event watchdog: handle comma separated nmi_watchdog command line nilfs2: make superblock data array index computation sparse friendly squashfs: remove calls to set the folio error flag squashfs: convert squashfs_symlink_read_folio to use folio APIs scripts/gdb: fix detection of current CPU in KGDB scripts/gdb: make get_thread_info accept pointers scripts/gdb: fix parameter handling in $lx_per_cpu scripts/gdb: fix failing KGDB detection during probe kfifo: don't use "proxy" headers media: stih-cec: add missing io.h media: rc: add missing io.h ...
6 daysMerge tag 'vfs-6.10.misc' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual miscellaneous features, cleanups, and fixes for vfs and individual fses. Features: - Free up FMODE_* bits. I've freed up bits 6, 7, 8, and 24. That means we now have six free FMODE_* bits in total (but bit #6 already got used for FMODE_WRITE_RESTRICTED) - Add FOP_HUGE_PAGES flag (follow-up to FMODE_* cleanup) - Add fd_raw cleanup class so we can make use of automatic cleanup provided by CLASS(fd_raw, f)(fd) for O_PATH fds as well - Optimize seq_puts() - Simplify __seq_puts() - Add new anon_inode_getfile_fmode() api to allow specifying f_mode instead of open-coding it in multiple places - Annotate struct file_handle with __counted_by() and use struct_size() - Warn in get_file() whether f_count resurrection from zero is attempted (epoll/drm discussion) - Folio-sophize aio - Export the subvolume id in statx() for both btrfs and bcachefs - Relax linkat(AT_EMPTY_PATH) requirements - Add F_DUPFD_QUERY fcntl() allowing to compare two file descriptors for dup*() equality replacing kcmp() Cleanups: - Compile out swapfile inode checks when swap isn't enabled - Use (1 << n) notation for FMODE_* bitshifts for clarity - Remove redundant variable assignment in fs/direct-io - Cleanup uses of strncpy in orangefs - Speed up and cleanup writeback - Move fsparam_string_empty() helper into header since it's currently open-coded in multiple places - Add kernel-doc comments to proc_create_net_data_write() - Don't needlessly read dentry->d_flags twice Fixes: - Fix out-of-range warning in nilfs2 - Fix ecryptfs overflow due to wrong encryption packet size calculation - Fix overly long line in xfs file_operations (follow-up to FMODE_* cleanup) - Don't raise FOP_BUFFER_{R,W}ASYNC for directories in xfs (follow-up to FMODE_* cleanup) - Don't call xfs_file_open from xfs_dir_open (follow-up to FMODE_* cleanup) - Fix stable offset api to prevent endless loops - Fix afs file server rotations - Prevent xattr node from overflowing the eraseblock in jffs2 - Move fdinfo PTRACE_MODE_READ procfs check into the .permission() operation instead of .open() operation since this caused userspace regressions" * tag 'vfs-6.10.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits) afs: Fix fileserver rotation getting stuck selftests: add F_DUPDFD_QUERY selftests fcntl: add F_DUPFD_QUERY fcntl() file: add fd_raw cleanup class fs: WARN when f_count resurrection is attempted seq_file: Simplify __seq_puts() seq_file: Optimize seq_puts() proc: Move fdinfo PTRACE_MODE_READ check into the inode .permission operation fs: Create anon_inode_getfile_fmode() xfs: don't call xfs_file_open from xfs_dir_open xfs: drop fop_flags for directories xfs: fix overly long line in the file_operations shmem: Fix shmem_rename2() libfs: Add simple_offset_rename() API libfs: Fix simple_offset_rename_exchange() jffs2: prevent xattr node from overflowing the eraseblock vfs, swap: compile out IS_SWAPFILE() on swapless configs vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements fs/direct-io: remove redundant assignment to variable retval fs/dcache: Re-use value stored to dentry->d_flags instead of re-reading ...
8 daysnilfs2: convert BUG_ON() in nilfs_finish_roll_forward() to WARN_ON()Ryusuke Konishi1-1/+3
The BUG_ON check performed on the return value of __getblk() in nilfs_finish_roll_forward() assumes that a buffer that has been successfully read once is retrieved with the same parameters and does not fail (__getblk() does not return an error due to memory allocation failure). Also, nilfs_finish_roll_forward() is called at most once during mount. Taking these into consideration, rewrite the check to use WARN_ON() to avoid using BUG_ON(). Link: https://lkml.kernel.org/r/20240508221429.7559-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
8 daysnilfs2: remove calls to folio_set_error() and folio_clear_error()Matthew Wilcox (Oracle)2-8/+1
Nobody checks this flag on nilfs2 folios, stop setting and clearing it. That lets us simplify nilfs_end_folio_io() slightly. Link: https://lkml.kernel.org/r/20240420025029.2166544-17-willy@infradead.org Link: https://lkml.kernel.org/r/20240430050901.3239-1-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: kernel test robot <lkp@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 daysnilfs2: make superblock data array index computation sparse friendlyRyusuke Konishi1-2/+18
Upon running sparse, "warning: dubious: x & !y" is output at an array index calculation within nilfs_load_super_block(). The calculation is not wrong, but to eliminate the sparse warning, replace it with an equivalent calculation. Also, add a comment to make it easier to understand what the unintuitive array index calculation is doing and whether it's correct. Link: https://lkml.kernel.org/r/20240430080019.4242-3-konishi.ryusuke@gmail.com Fixes: e339ad31f599 ("nilfs2: introduce secondary super block") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: kernel test robot <lkp@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 daysnilfs2: convert to use the new mount APIEric Sandeen4-229/+174
Convert nilfs2 to use the new mount API. [sandeen@redhat.com: v2] Link: https://lkml.kernel.org/r/33d078a7-9072-4d8e-a3a9-dec23d4191da@redhat.com Link: https://lkml.kernel.org/r/20240425190526.10905-1-konishi.ryusuke@gmail.com [konishi.ryusuke: fixed missing SB_RDONLY flag repair in nilfs_reconfigure] Link: https://lkml.kernel.org/r/33d078a7-9072-4d8e-a3a9-dec23d4191da@redhat.com Link: https://lkml.kernel.org/r/20240424182716.6024-1-konishi.ryusuke@gmail.com Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25nilfs2: add kernel-doc comments to nilfs_remove_all_gcinodes()Yang Li1-0/+1
This commit adds kernel-doc style comments with complete parameter descriptions for the function nilfs_remove_all_gcinodes. Link: https://lkml.kernel.org/r/20240410075629.3441-4-konishi.ryusuke@gmail.com Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25nilfs2: add kernel-doc comments to nilfs_btree_convert_and_insert()Yang Li1-7/+16
This commit adds kernel-doc style comments with complete parameter descriptions for the function nilfs_btree_convert_and_insert. Link: https://lkml.kernel.org/r/20240410075629.3441-3-konishi.ryusuke@gmail.com Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25nilfs2: add kernel-doc comments to nilfs_do_roll_forward()Yang Li1-0/+1
Patch series "nilfs2: fix missing kernel-doc comments". This commit adds kernel-doc style comments with complete parameter descriptions for the function nilfs_do_roll_forward. Link: https://lkml.kernel.org/r/20240410075629.3441-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20240410075629.3441-2-konishi.ryusuke@gmail.com Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-16nilfs2: fix OOB in nilfs_set_de_typeJeongjun Park1-1/+1
The size of the nilfs_type_by_mode array in the fs/nilfs2/dir.c file is defined as "S_IFMT >> S_SHIFT", but the nilfs_set_de_type() function, which uses this array, specifies the index to read from the array in the same way as "(mode & S_IFMT) >> S_SHIFT". static void nilfs_set_de_type(struct nilfs_dir_entry *de, struct inode *inode) { umode_t mode = inode->i_mode; de->file_type = nilfs_type_by_mode[(mode & S_IFMT)>>S_SHIFT]; // oob } However, when the index is determined this way, an out-of-bounds (OOB) error occurs by referring to an index that is 1 larger than the array size when the condition "mode & S_IFMT == S_IFMT" is satisfied. Therefore, a patch to resize the nilfs_type_by_mode array should be applied to prevent OOB errors. Link: https://lkml.kernel.org/r/20240415182048.7144-1-konishi.ryusuke@gmail.com Reported-by: syzbot+2e22057de05b9f3b30d8@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=2e22057de05b9f3b30d8 Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-09nilfs2: fix out-of-range warningArnd Bergmann1-1/+1
clang-14 points out that v_size is always smaller than a 64KB page size if that is configured by the CPU architecture: fs/nilfs2/ioctl.c:63:19: error: result of comparison of constant 65536 with expression of type '__u16' (aka 'unsigned short') is always false [-Werror,-Wtautological-constant-out-of-range-compare] if (argv->v_size > PAGE_SIZE) ~~~~~~~~~~~~ ^ ~~~~~~~~~ This is ok, so just shut up that warning with a cast. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20240328143051.1069575-7-arnd@kernel.org Fixes: 3358b4aaa84f ("nilfs2: fix problems of memory allocation in ioctl") Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reviewed-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-03-14nilfs2: prevent kernel bug at submit_bh_wbc()Ryusuke Konishi1-1/+1
Fix a bug where nilfs_get_block() returns a successful status when searching and inserting the specified block both fail inconsistently. If this inconsistent behavior is not due to a previously fixed bug, then an unexpected race is occurring, so return a temporary error -EAGAIN instead. This prevents callers such as __block_write_begin_int() from requesting a read into a buffer that is not mapped, which would cause the BUG_ON check for the BH_Mapped flag in submit_bh_wbc() to fail. Link: https://lkml.kernel.org/r/20240313105827.5296-3-konishi.ryusuke@gmail.com Fixes: 1f5abe7e7dbc ("nilfs2: replace BUG_ON and BUG calls triggerable from ioctl") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-14nilfs2: fix failure to detect DAT corruption in btree and direct mappingsRyusuke Konishi2-4/+14
Patch series "nilfs2: fix kernel bug at submit_bh_wbc()". This resolves a kernel BUG reported by syzbot. Since there are two flaws involved, I've made each one a separate patch. The first patch alone resolves the syzbot-reported bug, but I think both fixes should be sent to stable, so I've tagged them as such. This patch (of 2): Syzbot has reported a kernel bug in submit_bh_wbc() when writing file data to a nilfs2 file system whose metadata is corrupted. There are two flaws involved in this issue. The first flaw is that when nilfs_get_block() locates a data block using btree or direct mapping, if the disk address translation routine nilfs_dat_translate() fails with internal code -ENOENT due to DAT metadata corruption, it can be passed back to nilfs_get_block(). This causes nilfs_get_block() to misidentify an existing block as non-existent, causing both data block lookup and insertion to fail inconsistently. The second flaw is that nilfs_get_block() returns a successful status in this inconsistent state. This causes the caller __block_write_begin_int() or others to request a read even though the buffer is not mapped, resulting in a BUG_ON check for the BH_Mapped flag in submit_bh_wbc() failing. This fixes the first issue by changing the return value to code -EINVAL when a conversion using DAT fails with code -ENOENT, avoiding the conflicting condition that leads to the kernel bug described above. Here, code -EINVAL indicates that metadata corruption was detected during the block lookup, which will be properly handled as a file system error and converted to -EIO when passing through the nilfs2 bmap layer. Link: https://lkml.kernel.org/r/20240313105827.5296-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20240313105827.5296-2-konishi.ryusuke@gmail.com Fixes: c3a7abf06ce7 ("nilfs2: support contiguous lookup of blocks") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+cfed5b56649bddf80d6e@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=cfed5b56649bddf80d6e Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-12nilfs2: use div64_ul() instead of do_div()Thorsten Blum6-7/+7
Fixes Coccinelle/coccicheck warnings reported by do_div.cocci. Compared to do_div(), div64_ul() does not implicitly cast the divisor and does not unnecessarily calculate the remainder. Link: https://lkml.kernel.org/r/20240229210456.63234-2-thorsten.blum@toblux.com Link: https://lkml.kernel.org/r/20240306142547.4612-1-konishi.ryusuke@gmail.com Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22nilfs2: convert cpfile to use kmap_localRyusuke Konishi1-45/+45
Convert all remaining usages of kmap_atomic in cpfile to kmap_local. Link: https://lkml.kernel.org/r/20240122140202.6950-16-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22nilfs2: remove nilfs_cpfile_{get,put}_checkpoint()Ryusuke Konishi2-107/+0
All calls to nilfs_cpfile_get_checkpoint() and nilfs_cpfile_put_checkpoint() that call kmap() and kunmap() separately are now gone, so remove these methods. Link: https://lkml.kernel.org/r/20240122140202.6950-15-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22nilfs2: localize highmem mapping for checkpoint reading within cpfileRyusuke Konishi5-34/+87
Move the code for reading from a checkpoint entry that is performed in nilfs_attach_checkpoint() to the cpfile side, and make the page mapping local and temporary. And use kmap_local instead of kmap to access the checkpoint entry page. In order to load the ifile inode information included in the checkpoint entry within the inode lock section of nilfs_ifile_read(), the newly added checkpoint reading method nilfs_cpfile_read_checkpoint() is called indirectly via nilfs_ifile_read() instead of from nilfs_attach_checkpoint(). Link: https://lkml.kernel.org/r/20240122140202.6950-14-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22nilfs2: localize highmem mapping for checkpoint finalization within cpfileRyusuke Konishi3-46/+82
Move the checkpoint finalization routine to the cpfile side, and make the page mapping local and temporary. And use kmap_local instead of kmap to access the checkpoint entry page when finalizing a checkpoint. In this conversion, some of the information on the checkpoint entry being rewritten is passed through the arguments of the newly added method nilfs_cpfile_finalize_checkpoint(). Link: https://lkml.kernel.org/r/20240122140202.6950-13-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22nilfs2: localize highmem mapping for checkpoint creation within cpfileRyusuke Konishi3-29/+77
In order to convert kmap() used in cpfile to kmap_local, first move the checkpoint creation routine, which is one of the places where kmap is used, to the cpfile side and make the page mapping local and temporary. And use kmap_local instead of kmap to access the checkpoint entry page (and header block page) when generating a checkpoint. Link: https://lkml.kernel.org/r/20240122140202.6950-12-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22nilfs2: convert ifile to use kmap_localRyusuke Konishi4-10/+9
Convert deprecated kmap() and kmap_atomic() to use kmap_local for the ifile metadata file used to manage disk inodes. In some usages, calls to kmap_local and kunmap_local are split into different helpers, but those usages can be safely changed to local thread kmap. Link: https://lkml.kernel.org/r/20240122140202.6950-11-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22nilfs2: do not acquire rwsem in nilfs_bmap_write()Ryusuke Konishi1-3/+0
It is now clear that nilfs_bmap_write() is only used to finalize logs written to disk. Concurrent bmap modification operations are not performed on bmaps in this context. Additionally, this function does not modify data used in read-only operations such as bmap lookups. Therefore, there is no need to acquire bmap->b_sem in nilfs_bmap_write(), so delete it. Link: https://lkml.kernel.org/r/20240122140202.6950-10-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22nilfs2: move nilfs_bmap_write call out of nilfs_write_inode_commonRyusuke Konishi3-31/+57
Before converting the disk inode management metadata file ifile, the call to nilfs_bmap_write(), the i_device_code setting, and the zero-fill code for inodes on the super root block are moved from nilfs_write_inode_common() to its callers. This cleanup simplifies the role and arguments of nilfs_write_inode_common() and collects calls to nilfs_bmap_write() to the log writing code. Also, add and use a new helper nilfs_write_root_mdt_inode() to avoid code duplication in the data export routine nilfs_segctor_fill_in_super_root() to the super root block's buffer. Link: https://lkml.kernel.org/r/20240122140202.6950-9-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22nilfs2: convert DAT to use kmap_localRyusuke Konishi1-19/+19
Concerning the code of the metadata file DAT for disk address translation, convert all parts that use the deprecated kmap_atomic to use kmap_local. All transformations are directly possible. Link: https://lkml.kernel.org/r/20240122140202.6950-8-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22nilfs2: convert persistent object allocator to use kmap_localRyusuke Konishi1-45/+46
Regarding the allocator code that is commonly used in the ondisk inode metadata file ifile and the disk address translation metadata file DAT, convert the parts that use the deprecated kmap_atomic() and kmap() to use kmap_local. Most can be converted directly, but only nilfs_palloc_prepare_alloc_entry() needs to be rewritten to change mapping sections so that multiple kmap_local/kunmap_local calls are nested and disk I/O can be avoided within the mapping sections. Link: https://lkml.kernel.org/r/20240122140202.6950-7-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22nilfs2: convert sufile to use kmap_localRyusuke Konishi1-43/+43
Concerning the code of the metadata file sufile for segment management, convert all parts that uses the deprecated kmap_atomic() to use kmap_local. All transformations are directly possible here. Link: https://lkml.kernel.org/r/20240122140202.6950-6-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22nilfs2: convert metadata file common code to use kmap_localRyusuke Konishi1-2/+2
In the common code of metadata files, the new block creation routine nilfs_mdt_insert_new_block() still uses the deprecated kmap_atomic(), so convert it to use kmap_local. Link: https://lkml.kernel.org/r/20240122140202.6950-5-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22nilfs2: convert nilfs_copy_buffer() to use kmap_localRyusuke Konishi1-4/+4
The routine nilfs_copy_buffer() that copies a block buffer still uses the deprecated kmap_atomic(), so convert it to use kmap_local. Link: https://lkml.kernel.org/r/20240122140202.6950-4-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22nilfs2: convert segment buffer to use kmap_localRyusuke Konishi1-2/+2
In the segment buffer code used for log writing, a CRC calculation routine uses the deprecated kmap_atomic(), so convert it to use kmap_local. Link: https://lkml.kernel.org/r/20240122140202.6950-3-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22nilfs2: convert recovery logic to use kmap_localRyusuke Konishi1-2/+2
Patch series "nilfs2: eliminate kmap and kmap_atomic calls". This series converts remaining kmap and kmap_atomic calls to use kmap_local, mainly in metadata files, and eliminates calls to these deprecated kmap functions from nilfs2. This series does not include converting metadata files to use folios, but it is a step in that direction. Most conversions are straightforward, but some are not: the checkpoint file, the inode file, and the persistent object allocator. These have been adjusted or rewritten to avoid multiple kmap_local calls or nest them if necessary, and to eliminate long waits like block I/O within the highmem mapping sections. This series has been tested in both 32-bit and 64-bit environments with varying block sizes. This patch (of 15): In the recovery function when mounting, nilfs_recovery_copy_block() uses the deprecated kmap_atomic(), so convert it to use kmap_local. Link: https://lkml.kernel.org/r/20240122140202.6950-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20240122140202.6950-2-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-07nilfs2: fix potential bug in end_buffer_async_writeRyusuke Konishi1-3/+5
According to a syzbot report, end_buffer_async_write(), which handles the completion of block device writes, may detect abnormal condition of the buffer async_write flag and cause a BUG_ON failure when using nilfs2. Nilfs2 itself does not use end_buffer_async_write(). But, the async_write flag is now used as a marker by commit 7f42ec394156 ("nilfs2: fix issue with race condition of competition between segments for dirty blocks") as a means of resolving double list insertion of dirty blocks in nilfs_lookup_dirty_data_buffers() and nilfs_lookup_node_buffers() and the resulting crash. This modification is safe as long as it is used for file data and b-tree node blocks where the page caches are independent. However, it was irrelevant and redundant to also introduce async_write for segment summary and super root blocks that share buffers with the backing device. This led to the possibility that the BUG_ON check in end_buffer_async_write would fail as described above, if independent writebacks of the backing device occurred in parallel. The use of async_write for segment summary buffers has already been removed in a previous change. Fix this issue by removing the manipulation of the async_write flag for the remaining super root block buffer. Link: https://lkml.kernel.org/r/20240203161645.4992-1-konishi.ryusuke@gmail.com Fixes: 7f42ec394156 ("nilfs2: fix issue with race condition of competition between segments for dirty blocks") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+5c04210f7c7f897c1e7f@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/00000000000019a97c05fd42f8c8@google.com Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-07nilfs2: fix hang in nilfs_lookup_dirty_data_buffers()Ryusuke Konishi1-1/+7
Syzbot reported a hang issue in migrate_pages_batch() called by mbind() and nilfs_lookup_dirty_data_buffers() called in the log writer of nilfs2. While migrate_pages_batch() locks a folio and waits for the writeback to complete, the log writer thread that should bring the writeback to completion picks up the folio being written back in nilfs_lookup_dirty_data_buffers() that it calls for subsequent log creation and was trying to lock the folio. Thus causing a deadlock. In the first place, it is unexpected that folios/pages in the middle of writeback will be updated and become dirty. Nilfs2 adds a checksum to verify the validity of the log being written and uses it for recovery at mount, so data changes during writeback are suppressed. Since this is broken, an unclean shutdown could potentially cause recovery to fail. Investigation revealed that the root cause is that the wait for writeback completion in nilfs_page_mkwrite() is conditional, and if the backing device does not require stable writes, data may be modified without waiting. Fix these issues by making nilfs_page_mkwrite() wait for writeback to finish regardless of the stable write requirement of the backing device. Link: https://lkml.kernel.org/r/20240131145657.4209-1-konishi.ryusuke@gmail.com Fixes: 1d1d1a767206 ("mm: only enforce stable page writes if the backing device requires it") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+ee2ae68da3b22d04cd8d@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/00000000000047d819061004ad6c@google.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-07nilfs2: fix data corruption in dsync block recovery for small block sizesRyusuke Konishi1-3/+4
The helper function nilfs_recovery_copy_block() of nilfs_recovery_dsync_blocks(), which recovers data from logs created by data sync writes during a mount after an unclean shutdown, incorrectly calculates the on-page offset when copying repair data to the file's page cache. In environments where the block size is smaller than the page size, this flaw can cause data corruption and leak uninitialized memory bytes during the recovery process. Fix these issues by correcting this byte offset calculation on the page. Link: https://lkml.kernel.org/r/20240124121936.10575-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-11Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-6/+1
Pull misc filesystem updates from Al Viro: "Misc cleanups (the part that hadn't been picked by individual fs trees)" * tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: apparmorfs: don't duplicate kfree_link() orangefs: saner arguments passing in readdir guts ocfs2_find_match(): there's no such thing as NULL or negative ->d_parent reiserfs_add_entry(): get rid of pointless namelen checks __ocfs2_add_entry(), ocfs2_prepare_dir_for_insert(): namelen checks ext4_add_entry(): ->d_name.len is never 0 befs: d_obtain_alias(ERR_PTR(...)) will do the right thing affs: d_obtain_alias(ERR_PTR(...)) will do the right thing /proc/sys: use d_splice_alias() calling conventions to simplify failure exits hostfs: use d_splice_alias() calling conventions to simplify failure exits udf_fiiter_add_entry(): check for zero ->d_name.len is bogus... udf: d_obtain_alias(ERR_PTR(...)) will do the right thing... udf: d_splice_alias() will do the right thing on ERR_PTR() inode nfsd: kill stale comment about simple_fill_super() requirements bfs_add_entry(): get rid of pointless ->d_name.len checks nilfs2: d_obtain_alias(ERR_PTR(...)) will do the right thing... zonefs: d_splice_alias() will do the right thing on ERR_PTR() inode
2024-01-09Merge tag 'mm-nonmm-stable-2024-01-09-10-33' of ↵Linus Torvalds14-365/+375
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "Quite a lot of kexec work this time around. Many singleton patches in many places. The notable patch series are: - nilfs2 folio conversion from Matthew Wilcox in 'nilfs2: Folio conversions for file paths'. - Additional nilfs2 folio conversion from Ryusuke Konishi in 'nilfs2: Folio conversions for directory paths'. - IA64 remnant removal in Heiko Carstens's 'Remove unused code after IA-64 removal'. - Arnd Bergmann has enabled the -Wmissing-prototypes warning everywhere in 'Treewide: enable -Wmissing-prototypes'. This had some followup fixes: - Nathan Chancellor has cleaned up the hexagon build in the series 'hexagon: Fix up instances of -Wmissing-prototypes'. - Nathan also addressed some s390 warnings in 's390: A couple of fixes for -Wmissing-prototypes'. - Arnd Bergmann addresses the same warnings for MIPS in his series 'mips: address -Wmissing-prototypes warnings'. - Baoquan He has made kexec_file operate in a top-down-fitting manner similar to kexec_load in the series 'kexec_file: Load kernel at top of system RAM if required' - Baoquan He has also added the self-explanatory 'kexec_file: print out debugging message if required'. - Some checkstack maintenance work from Tiezhu Yang in the series 'Modify some code about checkstack'. - Douglas Anderson has disentangled the watchdog code's logging when multiple reports are occurring simultaneously. The series is 'watchdog: Better handling of concurrent lockups'. - Yuntao Wang has contributed some maintenance work on the crash code in 'crash: Some cleanups and fixes'" * tag 'mm-nonmm-stable-2024-01-09-10-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (157 commits) crash_core: fix and simplify the logic of crash_exclude_mem_range() x86/crash: use SZ_1M macro instead of hardcoded value x86/crash: remove the unused image parameter from prepare_elf_headers() kdump: remove redundant DEFAULT_CRASH_KERNEL_LOW_SIZE scripts/decode_stacktrace.sh: strip unexpected CR from lines watchdog: if panicking and we dumped everything, don't re-enable dumping watchdog/hardlockup: use printk_cpu_sync_get_irqsave() to serialize reporting watchdog/softlockup: use printk_cpu_sync_get_irqsave() to serialize reporting watchdog/hardlockup: adopt softlockup logic avoiding double-dumps kexec_core: fix the assignment to kimage->control_page x86/kexec: fix incorrect end address passed to kernel_ident_mapping_init() lib/trace_readwrite.c:: replace asm-generic/io with linux/io nilfs2: cpfile: fix some kernel-doc warnings stacktrace: fix kernel-doc typo scripts/checkstack.pl: fix no space expression between sp and offset x86/kexec: fix incorrect argument passed to kexec_dprintk() x86/kexec: use pr_err() instead of kexec_dprintk() when an error occurs nilfs2: add missing set_freezable() for freezable kthread kernel: relay: remove relay_file_splice_read dead code, doesn't work docs: submit-checklist: remove all of "make namespacecheck" ...
2024-01-08Merge tag 'vfs-6.8.super' of ↵Linus Torvalds1-8/+0
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs super updates from Christian Brauner: "This contains the super work for this cycle including the long-awaited series by Jan to make it possible to prevent writing to mounted block devices: - Writing to mounted devices is dangerous and can lead to filesystem corruption as well as crashes. Furthermore syzbot comes with more and more involved examples how to corrupt block device under a mounted filesystem leading to kernel crashes and reports we can do nothing about. Add tracking of writers to each block device and a kernel cmdline argument which controls whether other writeable opens to block devices open with BLK_OPEN_RESTRICT_WRITES flag are allowed. Note that this effectively only prevents modification of the particular block device's page cache by other writers. The actual device content can still be modified by other means - e.g. by issuing direct scsi commands, by doing writes through devices lower in the storage stack (e.g. in case loop devices, DM, or MD are involved) etc. But blocking direct modifications of the block device page cache is enough to give filesystems a chance to perform data validation when loading data from the underlying storage and thus prevent kernel crashes. Syzbot can use this cmdline argument option to avoid uninteresting crashes. Also users whose userspace setup does not need writing to mounted block devices can set this option for hardening. We expect that this will be interesting to quite a few workloads. Btrfs is currently opted out of this because they still haven't merged patches we require for this to work from three kernel releases ago. - Reimplement block device freezing and thawing as holder operations on the block device. This allows us to extend block device freezing to all devices associated with a superblock and not just the main device. It also allows us to remove get_active_super() and thus another function that scans the global list of superblocks. Freezing via additional block devices only works if the filesystem chooses to use @fs_holder_ops for these additional devices as well. That currently only includes ext4 and xfs. Earlier releases switched get_tree_bdev() and mount_bdev() to use @fs_holder_ops. The remaining nilfs2 open-coded version of mount_bdev() has been converted to rely on @fs_holder_ops as well. So block device freezing for the main block device will continue to work as before. There should be no regressions in functionality. The only special case is btrfs where block device freezing for the main block device never worked because sb->s_bdev isn't set. Block device freezing for btrfs can be fixed once they can switch to @fs_holder_ops but that can happen whenever they're ready" * tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits) block: Fix a memory leak in bdev_open_by_dev() super: don't bother with WARN_ON_ONCE() super: massage wait event mechanism ext4: Block writes to journal device xfs: Block writes to log device fs: Block writes to mounted block devices btrfs: Do not restrict writes to btrfs devices block: Add config option to not allow writing to mounted devices block: Remove blkdev_get_by_*() functions bcachefs: Convert to bdev_open_by_path() fs: handle freezing from multiple devices fs: remove dead check nilfs2: simplify device handling fs: streamline thaw_super_locked ext4: simplify device handling xfs: simplify device handling fs: simplify setup_bdev_super() calls blkdev: comment fs_holder_ops porting: document block device freeze and thaw changes fs: remove unused helper ...
2024-01-08Merge tag 'vfs-6.8.misc' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual miscellaneous features, cleanups, and fixes for vfs and individual fses. Features: - Add Jan Kara as VFS reviewer - Show correct device and inode numbers in proc/<pid>/maps for vma files on stacked filesystems. This is now easily doable thanks to the backing file work from the last cycles. This comes with selftests Cleanups: - Remove a redundant might_sleep() from wait_on_inode() - Initialize pointer with NULL, not 0 - Clarify comment on access_override_creds() - Rework and simplify eventfd_signal() and eventfd_signal_mask() helpers - Process aio completions in batches to avoid needless wakeups - Completely decouple struct mnt_idmap from namespaces. We now only keep the actual idmapping around and don't stash references to namespaces - Reformat maintainer entries to indicate that a given subsystem belongs to fs/ - Simplify fput() for files that were never opened - Get rid of various pointless file helpers - Rename various file helpers - Rename struct file members after SLAB_TYPESAFE_BY_RCU switch from last cycle - Make relatime_need_update() return bool - Use GFP_KERNEL instead of GFP_USER when allocating superblocks - Replace deprecated ida_simple_*() calls with their current ida_*() counterparts Fixes: - Fix comments on user namespace id mapping helpers. They aren't kernel doc comments so they shouldn't be using /** - s/Retuns/Returns/g in various places - Add missing parameter documentation on can_move_mount_beneath() - Rename i_mapping->private_data to i_mapping->i_private_data - Fix a false-positive lockdep warning in pipe_write() for watch queues - Improve __fget_files_rcu() code generation to improve performance - Only notify writer that pipe resizing has finished after setting pipe->max_usage otherwise writers are never notified that the pipe has been resized and hang - Fix some kernel docs in hfsplus - s/passs/pass/g in various places - Fix kernel docs in ntfs - Fix kcalloc() arguments order reported by gcc 14 - Fix uninitialized value in reiserfs" * tag 'vfs-6.8.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (36 commits) reiserfs: fix uninit-value in comp_keys watch_queue: fix kcalloc() arguments order ntfs: dir.c: fix kernel-doc function parameter warnings fs: fix doc comment typo fs tree wide selftests/overlayfs: verify device and inode numbers in /proc/pid/maps fs/proc: show correct device and inode numbers in /proc/pid/maps eventfd: Remove usage of the deprecated ida_simple_xx() API fs: super: use GFP_KERNEL instead of GFP_USER for super block allocation fs/hfsplus: wrapper.c: fix kernel-doc warnings fs: add Jan Kara as reviewer fs/inode: Make relatime_need_update return bool pipe: wakeup wr_wait after setting max_usage file: remove __receive_fd() file: stop exposing receive_fd_user() fs: replace f_rcuhead with f_task_work file: remove pointless wrapper file: s/close_fd_get_file()/file_close_fd()/g Improve __fget_files_rcu() code generation (and thus __fget_light()) file: massage cleanup of files that failed to open fs/pipe: Fix lockdep false-positive in watchqueue pipe_write() ...
2023-12-29nilfs2: cpfile: fix some kernel-doc warningsRandy Dunlap1-5/+23
Correct the function parameter names for nilfs_cpfile_get_info(): cpfile.c:564: warning: Function parameter or member 'cnop' not described in 'nilfs_cpfile_get_cpinfo' cpfile.c:564: warning: Function parameter or member 'mode' not described in 'nilfs_cpfile_get_cpinfo' cpfile.c:564: warning: Function parameter or member 'buf' not described in 'nilfs_cpfile_get_cpinfo' cpfile.c:564: warning: Function parameter or member 'cisz' not described in 'nilfs_cpfile_get_cpinfo' cpfile.c:564: warning: Excess function parameter 'cno' description in 'nilfs_cpfile_get_cpinfo' cpfile.c:564: warning: Excess function parameter 'ci' description in 'nilfs_cpfile_get_cpinfo' Also add missing descriptions of the function's specification. [ konishi.ryusuke@gmail.com: filled in missing descriptions ] Link: https://lkml.kernel.org/r/20231220065931.2372-1-rdunlap@infradead.org Link: https://lkml.kernel.org/r/20231220221342.11505-1-konishi.ryusuke@gmail.com Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29nilfs2: add missing set_freezable() for freezable kthreadKevin Hao1-0/+1
The kernel thread function nilfs_segctor_thread() invokes the try_to_freeze() in its loop. But all the kernel threads are non-freezable by default. So if we want to make a kernel thread to be freezable, we have to invoke set_freezable() explicitly. Link: https://lkml.kernel.org/r/20231219090918.2329-1-konishi.ryusuke@gmail.com Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-20nilfs2: switch WARN_ONs to warning output in nilfs_sufile_do_free()Ryusuke Konishi1-2/+7
nilfs_sufile_do_free(), which is called when log write fails or during GC, uses WARN_ONs to check for abnormal status of metadata. In the former case, these WARN_ONs will not be fired, but in the latter case they don't "never-happen". It is possible to trigger these by intentionally modifying the userland GC library to release segments that are not in the expected state. So, replace them with warning output using the dedicated macro nilfs_warn(). This replaces two potentially triggered WARN_ONs with ones that use a warning output macro. Link: https://lkml.kernel.org/r/20231207045730.5205-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-20nilfs2: d_obtain_alias(ERR_PTR(...)) will do the right thing...Al Viro1-6/+1
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-12-10nilfs2: convert nilfs_page_bug() to nilfs_folio_bug()Matthew Wilcox (Oracle)3-17/+18
All callers have a folio now, so convert it. Link: https://lkml.kernel.org/r/20231127143036.2425-18-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: convert nilfs_prepare_chunk() and nilfs_commit_chunk() to foliosMatthew Wilcox (Oracle)1-20/+19
All callers now have a folio, so convert these two functions. Saves one call to compound_head() in unlock_page(). [konishi.ryusuke: resolved conflicts in nilfs_{set_link,delete_entry}] Link: https://lkml.kernel.org/r/20231127143036.2425-17-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: convert nilfs_make_empty() to use a folioMatthew Wilcox (Oracle)1-9/+9
Remove two calls to compound_head() and switch from kmap_atomic to kmap_local. Link: https://lkml.kernel.org/r/20231127143036.2425-16-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: convert nilfs_empty_dir() to use a folioMatthew Wilcox (Oracle)1-15/+4
Remove three calls to compound_head() by using the folio API. Link: https://lkml.kernel.org/r/20231127143036.2425-15-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: convert nilfs_add_link() to use a folioMatthew Wilcox (Oracle)1-17/+14
Remove six calls to compound_head() by using the folio API. Link: https://lkml.kernel.org/r/20231127143036.2425-14-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: convert nilfs_rename() to use foliosMatthew Wilcox (Oracle)3-64/+60
This involves converting nilfs_find_entry(), nilfs_dotdot(), nilfs_set_link(), nilfs_delete_entry() and nilfs_do_unlink() to use folios as well. [konishi.ryusuke: followed the change of page release helper call sites] Link: https://lkml.kernel.org/r/20231127143036.2425-13-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: convert nilfs_find_entry to use a folioMatthew Wilcox (Oracle)1-6/+6
Use the new folio APIs to remove calls to compound_head(). [konishi.ryusuke: resolved a conflict due to style warning correction] Link: https://lkml.kernel.org/r/20231127143036.2425-12-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: convert nilfs_readdir to use a folioMatthew Wilcox (Oracle)1-5/+5
Use the new folio APIs to remove calls to compound_head(). Link: https://lkml.kernel.org/r/20231127143036.2425-11-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: add nilfs_get_folio()Matthew Wilcox (Oracle)1-21/+32
Convert nilfs_get_page() to be a wrapper. Also convert nilfs_check_page() to nilfs_check_folio(). Link: https://lkml.kernel.org/r/20231127143036.2425-10-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: switch to kmap_local for directory handlingMatthew Wilcox (Oracle)3-26/+19
Match ext2 by using kmap_local() instead of kmap(). This is more efficient. Also use unmap_and_put_page() instead of duplicating it as a nilfs function. [konishi.ryusuke: followed the change of page release helper call sites] Link: https://lkml.kernel.org/r/20231127143036.2425-9-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: pass the mapped address to nilfs_check_page()Matthew Wilcox (Oracle)1-3/+2
Remove another use of page_address() as part of preparing for the kmap to kmap_local transition. Link: https://lkml.kernel.org/r/20231127143036.2425-8-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: return the mapped address from nilfs_get_page()Matthew Wilcox (Oracle)1-30/+27
In prepartion for switching from kmap() to kmap_local(), return the kmap address from nilfs_get_page() instead of having the caller look up page_address(). [konishi.ryusuke: fixed a missing blank line after declaration] Link: https://lkml.kernel.org/r/20231127143036.2425-7-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: remove page_address() from nilfs_delete_entryMatthew Wilcox (Oracle)1-2/+2
In preparation for removing kmap from directory handling, mask the directory entry pointer to discover the start address of the page. Matches ext2. Link: https://lkml.kernel.org/r/20231127143036.2425-6-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: remove page_address() from nilfs_add_linkMatthew Wilcox (Oracle)1-1/+1
In preparation for removing kmap from directory handling, use offset_in_page() to calculate 'from'. Matches ext2. Link: https://lkml.kernel.org/r/20231127143036.2425-5-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: remove page_address() from nilfs_set_linkMatthew Wilcox (Oracle)1-1/+1
In preparation for removing kmap from directory handling, use offset_in_page() to calculate 'from'. Matches ext2. Link: https://lkml.kernel.org/r/20231127143036.2425-4-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: eliminate staggered calls to kunmap in nilfs_renameRyusuke Konishi1-1/+2
In nilfs_rename(), calls to nilfs_put_page() to release pages obtained with nilfs_find_entry() or nilfs_dotdot() are alternated in the normal path. When replacing the kernel memory mapping method from kmap to kmap_local_{page,folio}, this violates the constraint on the calling order of kunmap_local(). Swap the order of nilfs_put_page calls where the kmap sections of multiple pages overlap so that they are nested, allowing direct replacement of nilfs_put_page() -> unmap_and_put_page(). Without this reordering, that replacement will cause a kernel WARNING in kunmap_local_indexed() on architectures with high memory mapping. Link: https://lkml.kernel.org/r/20231127143036.2425-3-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: move page release outside of nilfs_delete_entry and nilfs_set_linkRyusuke Konishi3-16/+14
Patch series "nilfs2: Folio conversions for directory paths". This series applies page->folio conversions to nilfs2 directory operations. This reduces hidden compound_head() calls and also converts deprecated kmap calls to kmap_local in the directory code. Although nilfs2 does not yet support large folios, Matthew has done his best here to include support for large folios, which will be needed for devices with large block sizes. This series corresponds to the second half of the original post [1], but with two complementary patches inserted at the beginning and some adjustments, to prevent a kmap_local constraint violation found during testing with highmem mapping. [1] https://lkml.kernel.org/r/20231106173903.1734114-1-willy@infradead.org I have reviewed all changes and tested this for regular and small block sizes, both on machines with and without highmem mapping. No issues found. This patch (of 17): In a few directory operations, the call to nilfs_put_page() for a page obtained using nilfs_find_entry() or nilfs_dotdot() is hidden in nilfs_set_link() and nilfs_delete_entry(), making it difficult to track page release and preventing change of its call position. By moving nilfs_put_page() out of these functions, this makes the page get/put correspondence clearer and makes it easier to swap nilfs_put_page() calls (and kunmap calls within them) when modifying multiple directory entries simultaneously in nilfs_rename(). Also, update comments for nilfs_set_link() and nilfs_delete_entry() to reflect changes in their behavior. To make nilfs_put_page() visible from namei.c, this moves its definition to nilfs.h and replaces existing equivalents to use it, but the exposure of that definition is temporary and will be removed on a later kmap -> kmap_local conversion. Link: https://lkml.kernel.org/r/20231127143036.2425-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20231127143036.2425-2-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: convert nilfs_btnode_abort_change_key to use a folioMatthew Wilcox (Oracle)1-1/+1
Saves one call to compound_head(). Link: https://lkml.kernel.org/r/20231114084436.2755-21-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: convert nilfs_btnode_commit_change_key to use a folioMatthew Wilcox (Oracle)1-6/+6
Saves one call to compound_head(). Link: https://lkml.kernel.org/r/20231114084436.2755-20-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: convert nilfs_btnode_prepare_change_key to use a folioMatthew Wilcox (Oracle)1-9/+9
Saves three calls to compound_head(). Link: https://lkml.kernel.org/r/20231114084436.2755-19-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: convert nilfs_btnode_delete to use a folioMatthew Wilcox (Oracle)1-9/+9
Saves six calls to compound_head(). Link: https://lkml.kernel.org/r/20231114084436.2755-18-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: convert nilfs_btnode_submit_block to use a folioMatthew Wilcox (Oracle)1-4/+4
Saves two calls to compound_head(). Link: https://lkml.kernel.org/r/20231114084436.2755-17-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: convert nilfs_btnode_create_block to use a folioMatthew Wilcox (Oracle)1-2/+2
Saves two calls to compound_head(). Link: https://lkml.kernel.org/r/20231114084436.2755-16-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: convert nilfs_gccache_submit_read_data to use a folioMatthew Wilcox (Oracle)1-2/+2
Saves two calls to compound_head(). Link: https://lkml.kernel.org/r/20231114084436.2755-15-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: convert nilfs_mdt_submit_block to use a folioMatthew Wilcox (Oracle)1-2/+2
Saves two calls to compound_head(). Link: https://lkml.kernel.org/r/20231114084436.2755-14-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: convert nilfs_mdt_create_block to use a folioMatthew Wilcox (Oracle)1-2/+2
Saves two calls to compound_head(). Link: https://lkml.kernel.org/r/20231114084436.2755-13-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: convert nilfs_page_mkwrite() to use a folioMatthew Wilcox (Oracle)1-13/+15
Using the new folio APIs saves seven hidden calls to compound_head(). Link: https://lkml.kernel.org/r/20231114084436.2755-12-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: convert nilfs_segctor_prepare_write to use foliosMatthew Wilcox (Oracle)1-29/+29
Use the new folio APIs, saving 17 hidden calls to compound_head(). Link: https://lkml.kernel.org/r/20231114084436.2755-11-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: convert to __nilfs_clear_folio_dirty()Matthew Wilcox (Oracle)3-11/+12
All callers now have a folio, so convert to pass a folio. No caller uses the return value, so make it return void. Removes a couple of hidden calls to compound_head(). Link: https://lkml.kernel.org/r/20231114084436.2755-10-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: convert to nilfs_clear_folio_dirty()Matthew Wilcox (Oracle)4-16/+17
All callers of nilfs_clear_dirty_page() now have a folio, so rename the function and pass in the folio. Saves three hidden calls to compound_head(). Link: https://lkml.kernel.org/r/20231114084436.2755-9-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: convert nilfs_mdt_write_page() to use a folioMatthew Wilcox (Oracle)1-6/+7
Convert the incoming page to a folio. Replaces three calls to compound_head() with one. Link: https://lkml.kernel.org/r/20231114084436.2755-8-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: convert nilfs_writepage() to use a folioMatthew Wilcox (Oracle)1-4/+5
Convert the incoming page to a folio. Replaces three calls to compound_head() with one. Link: https://lkml.kernel.org/r/20231114084436.2755-7-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: convert to nilfs_folio_buffers_clean()Matthew Wilcox (Oracle)3-12/+12
All callers of nilfs_page_buffers_clean() now have a folio, so convert it to take a folio. While I'm at it, make it return a bool. Link: https://lkml.kernel.org/r/20231114084436.2755-6-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: convert nilfs_forget_buffer to use a folioMatthew Wilcox (Oracle)1-5/+5
Save two hidden calls to compound_head(). Link: https://lkml.kernel.org/r/20231114084436.2755-5-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: convert nilfs_segctor_complete_write to use foliosMatthew Wilcox (Oracle)1-28/+21
Use the new folio APIs, saving five calls to compound_head(). This includes the last callers of nilfs_end_page_io(), so remove that too. Link: https://lkml.kernel.org/r/20231114084436.2755-4-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: convert nilfs_abort_logs to use foliosMatthew Wilcox (Oracle)1-14/+14
Use the new folio APIs, saving five hidden calls to compound_head(). Link: https://lkml.kernel.org/r/20231114084436.2755-3-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10nilfs2: add nilfs_end_folio_io()Matthew Wilcox (Oracle)1-14/+22
Patch series "nilfs2: Folio conversions for file paths". This series advances page->folio conversions for a wide range of nilfs2, including its file operations, block routines, and the log writer's writeback routines. It doesn't cover large folios support, but it saves a lot of hidden compound_head() calls while preserving the existing support range behavior. The original series in post [1] also covered directory-related page->folio conversions, but that was put on hold because a regression was found in testing, so this is an excerpt from the first half of the original post. [1] https://lkml.kernel.org/r/20231106173903.1734114-1-willy@infradead.org I tested this series in both 32-bit and 64-bit environments, switching between normal and small block sizes. I also reviewed all changes in all patches to ensure they do not break existing behavior. There were no problems. This patch (of 20): This is the folio counterpart of the existing nilfs_end_page_io() which is retained as a wrapper of nilfs_end_folio_io(). Replaces nine hidden calls to compound_head() with one. Link: https://lkml.kernel.org/r/20231114084436.2755-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20231114084436.2755-2-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10fs/nilfs2: use standard array-copy-functionPhilipp Stanner1-6/+4
ioctl.c utilizes memdup_user() to copy a userspace array. An overflow check is performed manually before the function's invocation. The new function memdup_array_user() standardizes copying userspace arrays, thus, improving readability by making it more clear that an array is being copied. Additionally, it also performs an overflow check. Remove the (now redundant) manual overflow-check and replace memdup_user() with memdup_array_user(). In addition, improve the grammar of the comment above memdup_array_user(). Link: https://lkml.kernel.org/r/20231106224416.3055-1-konishi.ryusuke@gmail.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> Link: https://lkml.kernel.org/r/20231103184831.99406-2-pstanner@redhat.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Suggested-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06nilfs2: prevent WARNING in nilfs_sufile_set_segment_usage()Ryusuke Konishi1-7/+35
If nilfs2 reads a disk image with corrupted segment usage metadata, and its segment usage information is marked as an error for the segment at the write location, nilfs_sufile_set_segment_usage() can trigger WARN_ONs during log writing. Segments newly allocated for writing with nilfs_sufile_alloc() will not have this error flag set, but this unexpected situation will occur if the segment indexed by either nilfs->ns_segnum or nilfs->ns_nextnum (active segment) was marked in error. Fix this issue by inserting a sanity check to treat it as a file system corruption. Since error returns are not allowed during the execution phase where nilfs_sufile_set_segment_usage() is used, this inserts the sanity check into nilfs_sufile_mark_dirty() which pre-reads the buffer containing the segment usage record to be updated and sets it up in a dirty state for writing. In addition, nilfs_sufile_set_segment_usage() is also called when canceling log writing and undoing segment usage update, so in order to avoid issuing the same kernel warning in that case, in case of cancellation, avoid checking the error flag in nilfs_sufile_set_segment_usage(). Link: https://lkml.kernel.org/r/20231205085947.4431-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+14e9f834f6ddecece094@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=14e9f834f6ddecece094 Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06nilfs2: fix missing error check for sb_set_blocksize callRyusuke Konishi1-1/+5
When mounting a filesystem image with a block size larger than the page size, nilfs2 repeatedly outputs long error messages with stack traces to the kernel log, such as the following: getblk(): invalid block size 8192 requested logical block size: 512 ... Call Trace: dump_stack_lvl+0x92/0xd4 dump_stack+0xd/0x10 bdev_getblk+0x33a/0x354 __breadahead+0x11/0x80 nilfs_search_super_root+0xe2/0x704 [nilfs2] load_nilfs+0x72/0x504 [nilfs2] nilfs_mount+0x30f/0x518 [nilfs2] legacy_get_tree+0x1b/0x40 vfs_get_tree+0x18/0xc4 path_mount+0x786/0xa88 __ia32_sys_mount+0x147/0x1a8 __do_fast_syscall_32+0x56/0xc8 do_fast_syscall_32+0x29/0x58 do_SYSENTER_32+0x15/0x18 entry_SYSENTER_32+0x98/0xf1 ... This overloads the system logger. And to make matters worse, it sometimes crashes the kernel with a memory access violation. This is because the return value of the sb_set_blocksize() call, which should be checked for errors, is not checked. The latter issue is due to out-of-buffer memory being accessed based on a large block size that caused sb_set_blocksize() to fail for buffers read with the initial minimum block size that remained unupdated in the super_block structure. Since nilfs2 mkfs tool does not accept block sizes larger than the system page size, this has been overlooked. However, it is possible to create this situation by intentionally modifying the tool or by passing a filesystem image created on a system with a large page size to a system with a smaller page size and mounting it. Fix this issue by inserting the expected error handling for the call to sb_set_blocksize(). Link: https://lkml.kernel.org/r/20231129141547.4726-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-11-21fs: Rename mapping private membersMatthew Wilcox (Oracle)1-2/+2
It is hard to find where mapping->private_lock, mapping->private_list and mapping->private_data are used, due to private_XXX being a relatively common name for variables and structure members in the kernel. To fit with other members of struct address_space, rename them all to have an i_ prefix. Tested with an allmodconfig build. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20231117215823.2821906-1-willy@infradead.org Acked-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-11-18nilfs2: simplify device handlingJan Kara1-8/+0
We removed all codepaths where s_umount is taken beneath open_mutex and bd_holder_lock so don't make things more complicated than they need to be and hold s_umount over block device opening. CC: Ryusuke Konishi <konishi.ryusuke@gmail.com> CC: <linux-nilfs@vger.kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231101172739.8676-1-jack@suse.cz Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-11-02Merge tag 'mm-stable-2023-11-01-14-33' of ↵Linus Torvalds4-84/+76
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Many singleton patches against the MM code. The patch series which are included in this merge do the following: - Kemeng Shi has contributed some compation maintenance work in the series 'Fixes and cleanups to compaction' - Joel Fernandes has a patchset ('Optimize mremap during mutual alignment within PMD') which fixes an obscure issue with mremap()'s pagetable handling during a subsequent exec(), based upon an implementation which Linus suggested - More DAMON/DAMOS maintenance and feature work from SeongJae Park i the following patch series: mm/damon: misc fixups for documents, comments and its tracepoint mm/damon: add a tracepoint for damos apply target regions mm/damon: provide pseudo-moving sum based access rate mm/damon: implement DAMOS apply intervals mm/damon/core-test: Fix memory leaks in core-test mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval - In the series 'Do not try to access unaccepted memory' Adrian Hunter provides some fixups for the recently-added 'unaccepted memory' feature. To increase the feature's checking coverage. 'Plug a few gaps where RAM is exposed without checking if it is unaccepted memory' - In the series 'cleanups for lockless slab shrink' Qi Zheng has done some maintenance work which is preparation for the lockless slab shrinking code - Qi Zheng has redone the earlier (and reverted) attempt to make slab shrinking lockless in the series 'use refcount+RCU method to implement lockless slab shrink' - David Hildenbrand contributes some maintenance work for the rmap code in the series 'Anon rmap cleanups' - Kefeng Wang does more folio conversions and some maintenance work in the migration code. Series 'mm: migrate: more folio conversion and unification' - Matthew Wilcox has fixed an issue in the buffer_head code which was causing long stalls under some heavy memory/IO loads. Some cleanups were added on the way. Series 'Add and use bdev_getblk()' - In the series 'Use nth_page() in place of direct struct page manipulation' Zi Yan has fixed a potential issue with the direct manipulation of hugetlb page frames - In the series 'mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO' has improved our handling of gigantic pages in the hugetlb vmmemmep optimizaton code. This provides significant boot time improvements when significant amounts of gigantic pages are in use - Matthew Wilcox has sent the series 'Small hugetlb cleanups' - code rationalization and folio conversions in the hugetlb code - Yin Fengwei has improved mlock()'s handling of large folios in the series 'support large folio for mlock' - In the series 'Expose swapcache stat for memcg v1' Liu Shixin has added statistics for memcg v1 users which are available (and useful) under memcg v2 - Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable) prctl so that userspace may direct the kernel to not automatically propagate the denial to child processes. The series is named 'MDWE without inheritance' - Kefeng Wang has provided the series 'mm: convert numa balancing functions to use a folio' which does what it says - In the series 'mm/ksm: add fork-exec support for prctl' Stefan Roesch makes is possible for a process to propagate KSM treatment across exec() - Huang Ying has enhanced memory tiering's calculation of memory distances. This is used to permit the dax/kmem driver to use 'high bandwidth memory' in addition to Optane Data Center Persistent Memory Modules (DCPMM). The series is named 'memory tiering: calculate abstract distance based on ACPI HMAT' - In the series 'Smart scanning mode for KSM' Stefan Roesch has optimized KSM by teaching it to retain and use some historical information from previous scans - Yosry Ahmed has fixed some inconsistencies in memcg statistics in the series 'mm: memcg: fix tracking of pending stats updates values' - In the series 'Implement IOCTL to get and optionally clear info about PTEs' Peter Xu has added an ioctl to /proc/<pid>/pagemap which permits us to atomically read-then-clear page softdirty state. This is mainly used by CRIU - Hugh Dickins contributed the series 'shmem,tmpfs: general maintenance', a bunch of relatively minor maintenance tweaks to this code - Matthew Wilcox has increased the use of the VMA lock over file-backed page faults in the series 'Handle more faults under the VMA lock'. Some rationalizations of the fault path became possible as a result - In the series 'mm/rmap: convert page_move_anon_rmap() to folio_move_anon_rmap()' David Hildenbrand has implemented some cleanups and folio conversions - In the series 'various improvements to the GUP interface' Lorenzo Stoakes has simplified and improved the GUP interface with an eye to providing groundwork for future improvements - Andrey Konovalov has sent along the series 'kasan: assorted fixes and improvements' which does those things - Some page allocator maintenance work from Kemeng Shi in the series 'Two minor cleanups to break_down_buddy_pages' - In thes series 'New selftest for mm' Breno Leitao has developed another MM self test which tickles a race we had between madvise() and page faults - In the series 'Add folio_end_read' Matthew Wilcox provides cleanups and an optimization to the core pagecache code - Nhat Pham has added memcg accounting for hugetlb memory in the series 'hugetlb memcg accounting' - Cleanups and rationalizations to the pagemap code from Lorenzo Stoakes, in the series 'Abstract vma_merge() and split_vma()' - Audra Mitchell has fixed issues in the procfs page_owner code's new timestamping feature which was causing some misbehaviours. In the series 'Fix page_owner's use of free timestamps' - Lorenzo Stoakes has fixed the handling of new mappings of sealed files in the series 'permit write-sealed memfd read-only shared mappings' - Mike Kravetz has optimized the hugetlb vmemmap optimization in the series 'Batch hugetlb vmemmap modification operations' - Some buffer_head folio conversions and cleanups from Matthew Wilcox in the series 'Finish the create_empty_buffers() transition' - As a page allocator performance optimization Huang Ying has added automatic tuning to the allocator's per-cpu-pages feature, in the series 'mm: PCP high auto-tuning' - Roman Gushchin has contributed the patchset 'mm: improve performance of accounted kernel memory allocations' which improves their performance by ~30% as measured by a micro-benchmark - folio conversions from Kefeng Wang in the series 'mm: convert page cpupid functions to folios' - Some kmemleak fixups in Liu Shixin's series 'Some bugfix about kmemleak' - Qi Zheng has improved our handling of memoryless nodes by keeping them off the allocation fallback list. This is done in the series 'handle memoryless nodes more appropriately' - khugepaged conversions from Vishal Moola in the series 'Some khugepaged folio conversions'" [ bcachefs conflicts with the dynamically allocated shrinkers have been resolved as per Stephen Rothwell in https://lore.kernel.org/all/20230913093553.4290421e@canb.auug.org.au/ with help from Qi Zheng. The clone3 test filtering conflict was half-arsed by yours truly ] * tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (406 commits) mm/damon/sysfs: update monitoring target regions for online input commit mm/damon/sysfs: remove requested targets when online-commit inputs selftests: add a sanity check for zswap Documentation: maple_tree: fix word spelling error mm/vmalloc: fix the unchecked dereference warning in vread_iter() zswap: export compression failure stats Documentation: ubsan: drop "the" from article title mempolicy: migration attempt to match interleave nodes mempolicy: mmap_lock is not needed while migrating folios mempolicy: alloc_pages_mpol() for NUMA policy without vma mm: add page_rmappable_folio() wrapper mempolicy: remove confusing MPOL_MF_LAZY dead code mempolicy: mpol_shared_policy_init() without pseudo-vma mempolicy trivia: use pgoff_t in shared mempolicy tree mempolicy trivia: slightly more consistent naming mempolicy trivia: delete those ancient pr_debug()s mempolicy: fix migrate_pages(2) syscall return nr_failed kernfs: drop shared NUMA mempolicy hooks hugetlbfs: drop shared NUMA mempolicy pretence mm/damon/sysfs-test: add a unit test for damon_sysfs_set_targets() ...
2023-10-25buffer: remove folio_create_empty_buffers()Matthew Wilcox (Oracle)3-4/+4
With all users converted, remove the old create_empty_buffers() and rename folio_create_empty_buffers() to create_empty_buffers(). Link: https://lkml.kernel.org/r/20231016201114.1928083-28-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-25nilfs2: convert nilfs_lookup_dirty_data_buffers to use ↵Matthew Wilcox (Oracle)1-4/+3
folio_create_empty_buffers This function was already using a folio, so this update to the new API removes a single folio->page->folio conversion. Link: https://lkml.kernel.org/r/20231016201114.1928083-17-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-25nilfs2: remove nilfs_page_get_nth_blockMatthew Wilcox (Oracle)1-6/+0
All users have now been converted to get_nth_block(). Link: https://lkml.kernel.org/r/20231016201114.1928083-16-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-25nilfs2: convert nilfs_mdt_get_frozen_buffer to use a folioMatthew Wilcox (Oracle)1-7/+9
Remove a number of folio->page->folio conversions. Link: https://lkml.kernel.org/r/20231016201114.1928083-15-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-25nilfs2: convert nilfs_mdt_forget_block() to use a folioMatthew Wilcox (Oracle)1-16/+14
Remove a number of folio->page->folio conversions. Link: https://lkml.kernel.org/r/20231016201114.1928083-14-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-25nilfs2: convert nilfs_copy_page() to nilfs_copy_folio()Matthew Wilcox (Oracle)1-24/+26
Both callers already have a folio, so pass it in and use it directly. Removes a lot of hidden calls to compound_head(). Link: https://lkml.kernel.org/r/20231016201114.1928083-13-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-25nilfs2: convert nilfs_grab_buffer() to use a folioMatthew Wilcox (Oracle)1-13/+13
Remove a number of folio->page->folio conversions. Link: https://lkml.kernel.org/r/20231016201114.1928083-12-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-25nilfs2: convert nilfs_mdt_freeze_buffer to use a folioMatthew Wilcox (Oracle)1-9/+11
Remove a number of folio->page->folio conversions. Link: https://lkml.kernel.org/r/20231016201114.1928083-11-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-25buffer: add get_nth_bh()Matthew Wilcox (Oracle)1-6/+1
Extract this useful helper from nilfs_page_get_nth_block() Link: https://lkml.kernel.org/r/20231016201114.1928083-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18nilfs2: convert to new timestamp accessorsJeff Layton2-13/+13
Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-51-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-09-29nilfs2: fix potential use after free in nilfs_gccache_submit_read_data()Pan Bian1-3/+3
In nilfs_gccache_submit_read_data(), brelse(bh) is called to drop the reference count of bh when the call to nilfs_dat_translate() fails. If the reference count hits 0 and its owner page gets unlocked, bh may be freed. However, bh->b_page is dereferenced to put the page after that, which may result in a use-after-free bug. This patch moves the release operation after unlocking and putting the page. NOTE: The function in question is only called in GC, and in combination with current userland tools, address translation using DAT does not occur in that function, so the code path that causes this issue will not be executed. However, it is possible to run that code path by intentionally modifying the userland GC library or by calling the GC ioctl directly. [konishi.ryusuke@gmail.com: NOTE added to the commit log] Link: https://lkml.kernel.org/r/1543201709-53191-1-git-send-email-bianpan2016@163.com Link: https://lkml.kernel.org/r/20230921141731.10073-1-konishi.ryusuke@gmail.com Fixes: a3d93f709e89 ("nilfs2: block cache for garbage collection") Signed-off-by: Pan Bian <bianpan2016@163.com> Reported-by: Ferry Meng <mengferry@linux.alibaba.com> Closes: https://lkml.kernel.org/r/20230818092022.111054-1-mengferry@linux.alibaba.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-29Merge tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linuxLinus Torvalds2-1/+2
Pull block updates from Jens Axboe: "Pretty quiet round for this release. This contains: - Add support for zoned storage to ublk (Andreas, Ming) - Series improving performance for drivers that mark themselves as needing a blocking context for issue (Bart) - Cleanup the flush logic (Chengming) - sed opal keyring support (Greg) - Fixes and improvements to the integrity support (Jinyoung) - Add some exports for bcachefs that we can hopefully delete again in the future (Kent) - deadline throttling fix (Zhiguo) - Series allowing building the kernel without buffer_head support (Christoph) - Sanitize the bio page adding flow (Christoph) - Write back cache fixes (Christoph) - MD updates via Song: - Fix perf regression for raid0 large sequential writes (Jan) - Fix split bio iostat for raid0 (David) - Various raid1 fixes (Heinz, Xueshi) - raid6test build fixes (WANG) - Deprecate bitmap file support (Christoph) - Fix deadlock with md sync thread (Yu) - Refactor md io accounting (Yu) - Various non-urgent fixes (Li, Yu, Jack) - Various fixes and cleanups (Arnd, Azeem, Chengming, Damien, Li, Ming, Nitesh, Ruan, Tejun, Thomas, Xu)" * tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux: (113 commits) block: use strscpy() to instead of strncpy() block: sed-opal: keyring support for SED keys block: sed-opal: Implement IOC_OPAL_REVERT_LSP block: sed-opal: Implement IOC_OPAL_DISCOVERY blk-mq: prealloc tags when increase tagset nr_hw_queues blk-mq: delete redundant tagset map update when fallback blk-mq: fix tags leak when shrink nr_hw_queues ublk: zoned: support REQ_OP_ZONE_RESET_ALL md: raid0: account for split bio in iostat accounting md/raid0: Fix performance regression for large sequential writes md/raid0: Factor out helper for mapping and submitting a bio md raid1: allow writebehind to work on any leg device set WriteMostly md/raid1: hold the barrier until handle_read_error() finishes md/raid1: free the r1bio before waiting for blocked rdev md/raid1: call free_r1bio() before allow_barrier() in raid_end_bio_io() blk-cgroup: Fix NULL deref caused by blkg_policy_data being installed before init drivers/rnbd: restore sysfs interface to rnbd-client md/raid5-cache: fix null-ptr-deref for r5l_flush_stripe_to_raid() raid6: test: only check for Altivec if building on powerpc hosts raid6: test: make sure all intermediate and artifact files are .gitignored ...
2023-08-29Merge tag 'mm-nonmm-stable-2023-08-28-22-48' of ↵Linus Torvalds2-3/+7
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - An extensive rework of kexec and crash Kconfig from Eric DeVolder ("refactor Kconfig to consolidate KEXEC and CRASH options") - kernel.h slimming work from Andy Shevchenko ("kernel.h: Split out a couple of macros to args.h") - gdb feature work from Kuan-Ying Lee ("Add GDB memory helper commands") - vsprintf inclusion rationalization from Andy Shevchenko ("lib/vsprintf: Rework header inclusions") - Switch the handling of kdump from a udev scheme to in-kernel handling, by Eric DeVolder ("crash: Kernel handling of CPU and memory hot un/plug") - Many singleton patches to various parts of the tree * tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (81 commits) document while_each_thread(), change first_tid() to use for_each_thread() drivers/char/mem.c: shrink character device's devlist[] array x86/crash: optimize CPU changes crash: change crash_prepare_elf64_headers() to for_each_possible_cpu() crash: hotplug support for kexec_load() x86/crash: add x86 crash hotplug support crash: memory and CPU hotplug sysfs attributes kexec: exclude elfcorehdr from the segment digest crash: add generic infrastructure for crash hotplug support crash: move a few code bits to setup support of crash hotplug kstrtox: consistently use _tolower() kill do_each_thread() nilfs2: fix WARNING in mark_buffer_dirty due to discarded buffer reuse scripts/bloat-o-meter: count weak symbol sizes treewide: drop CONFIG_EMBEDDED lockdep: fix static memory detection even more lib/vsprintf: declare no_hash_pointers in sprintf.h lib/vsprintf: split out sprintf() and friends kernel/fork: stop playing lockless games for exe_file replacement adfs: delete unused "union adfs_dirtail" definition ...
2023-08-28Merge tag 'v6.6-vfs.super' of ↵Linus Torvalds1-51/+30
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull superblock updates from Christian Brauner: "This contains the super rework that was ready for this cycle. The first part changes the order of how we open block devices and allocate superblocks, contains various cleanups, simplifications, and a new mechanism to wait on superblock state changes. This unblocks work to ultimately limit the number of writers to a block device. Jan has already scheduled follow-up work that will be ready for v6.7 and allows us to restrict the number of writers to a given block device. That series builds on this work right here. The second part contains filesystem freezing updates. Overview: The generic superblock changes are rougly organized as follows (ignoring additional minor cleanups): (1) Removal of the bd_super member from struct block_device. This was a very odd back pointer to struct super_block with unclear rules. For all relevant places we have other means to get the same information so just get rid of this. (2) Simplify rules for superblock cleanup. Roughly, everything that is allocated during fs_context initialization and that's stored in fs_context->s_fs_info needs to be cleaned up by the fs_context->free() implementation before the superblock allocation function has been called successfully. After sget_fc() returned fs_context->s_fs_info has been transferred to sb->s_fs_info at which point sb->kill_sb() if fully responsible for cleanup. Adhering to these rules means that cleanup of sb->s_fs_info in fill_super() is to be avoided as it's brittle and inconsistent. Cleanup shouldn't be duplicated between sb->put_super() as sb->put_super() is only called if sb->s_root has been set aka when the filesystem has been successfully born (SB_BORN). That complexity should be avoided. This also means that block devices are to be closed in sb->kill_sb() instead of sb->put_super(). More details in the lower section. (3) Make it possible to lookup or create a superblock before opening block devices There's a subtle dependency on (2) as some filesystems did rely on fill_super() to be called in order to correctly clean up sb->s_fs_info. All these filesystems have been fixed. (4) Switch most filesystem to follow the same logic as the generic mount code now does as outlined in (3). (5) Use the superblock as the holder of the block device. We can now easily go back from block device to owning superblock. (6) Export and extend the generic fs_holder_ops and use them as holder ops everywhere and remove the filesystem specific holder ops. (7) Call from the block layer up into the filesystem layer when the block device is removed, allowing to shut down the filesystem without risk of deadlocks. (8) Get rid of get_super(). We can now easily go back from the block device to owning superblock and can call up from the block layer into the filesystem layer when the device is removed. So no need to wade through all registered superblock to find the owning superblock anymore" Link: https://lore.kernel.org/lkml/20230824-prall-intakt-95dbffdee4a0@brauner/ * tag 'v6.6-vfs.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (47 commits) super: use higher-level helper for {freeze,thaw} super: wait until we passed kill super super: wait for nascent superblocks super: make locking naming consistent super: use locking helpers fs: simplify invalidate_inodes fs: remove get_super block: call into the file system for ioctl BLKFLSBUF block: call into the file system for bdev_mark_dead block: consolidate __invalidate_device and fsync_bdev block: drop the "busy inodes on changed media" log message dasd: also call __invalidate_device when setting the device offline amiflop: don't call fsync_bdev in FDFMTBEG floppy: call disk_force_media_change when changing the format block: simplify the disk_force_media_change interface nbd: call blk_mark_disk_dead in nbd_clear_sock_ioctl xfs use fs_holder_ops for the log and RT devices xfs: drop s_umount over opening the log and RT devices ext4: use fs_holder_ops for the log device ext4: drop s_umount over opening the log device ...
2023-08-28Merge tag 'v6.6-vfs.ctime' of ↵Linus Torvalds4-14/+14
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs timestamp updates from Christian Brauner: "This adds VFS support for multi-grain timestamps and converts tmpfs, xfs, ext4, and btrfs to use them. This carries acks from all relevant filesystems. The VFS always uses coarse-grained timestamps when updating the ctime and mtime after a change. This has the benefit of allowing filesystems to optimize away a lot of metadata updates, down to around 1 per jiffy, even when a file is under heavy writes. Unfortunately, this has always been an issue when we're exporting via NFSv3, which relies on timestamps to validate caches. A lot of changes can happen in a jiffy, so timestamps aren't sufficient to help the client decide to invalidate the cache. Even with NFSv4, a lot of exported filesystems don't properly support a change attribute and are subject to the same problems with timestamp granularity. Other applications have similar issues with timestamps (e.g., backup applications). If we were to always use fine-grained timestamps, that would improve the situation, but that becomes rather expensive, as the underlying filesystem would have to log a lot more metadata updates. This introduces fine-grained timestamps that are used when they are actively queried. This uses the 31st bit of the ctime tv_nsec field to indicate that something has queried the inode for the mtime or ctime. When this flag is set, on the next mtime or ctime update, the kernel will fetch a fine-grained timestamp instead of the usual coarse-grained one. As POSIX generally mandates that when the mtime changes, the ctime must also change the kernel always stores normalized ctime values, so only the first 30 bits of the tv_nsec field are ever used. Filesytems can opt into this behavior by setting the FS_MGTIME flag in the fstype. Filesystems that don't set this flag will continue to use coarse-grained timestamps. Various preparatory changes, fixes and cleanups are included: - Fixup all relevant places where POSIX requires updating ctime together with mtime. This is a wide-range of places and all maintainers provided necessary Acks. - Add new accessors for inode->i_ctime directly and change all callers to rely on them. Plain accesses to inode->i_ctime are now gone and it is accordingly rename to inode->__i_ctime and commented as requiring accessors. - Extend generic_fillattr() to pass in a request mask mirroring in a sense the statx() uapi. This allows callers to pass in a request mask to only get a subset of attributes filled in. - Rework timestamp updates so it's possible to drop the @now parameter the update_time() inode operation and associated helpers. - Add inode_update_timestamps() and convert all filesystems to it removing a bunch of open-coding" * tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (107 commits) btrfs: convert to multigrain timestamps ext4: switch to multigrain timestamps xfs: switch to multigrain timestamps tmpfs: add support for multigrain timestamps fs: add infrastructure for multigrain timestamps fs: drop the timespec64 argument from update_time xfs: have xfs_vn_update_time gets its own timestamp fat: make fat_update_time get its own timestamp fat: remove i_version handling from fat_update_time ubifs: have ubifs_update_time use inode_update_timestamps btrfs: have it use inode_update_timestamps fs: drop the timespec64 arg from generic_update_time fs: pass the request_mask to generic_fillattr fs: remove silly warning from current_time gfs2: fix timestamp handling on quota inodes fs: rename i_ctime field to __i_ctime selinux: convert to ctime accessor functions security: convert to ctime accessor functions apparmor: convert to ctime accessor functions sunrpc: convert to ctime accessor functions ...
2023-08-21nilfs2: fix WARNING in mark_buffer_dirty due to discarded buffer reuseRyusuke Konishi2-3/+7
A syzbot stress test using a corrupted disk image reported that mark_buffer_dirty() called from __nilfs_mark_inode_dirty() or nilfs_palloc_commit_alloc_entry() may output a kernel warning, and can panic if the kernel is booted with panic_on_warn. This is because nilfs2 keeps buffer pointers in local structures for some metadata and reuses them, but such buffers may be forcibly discarded by nilfs_clear_dirty_page() in some critical situations. This issue is reported to appear after commit 28a65b49eb53 ("nilfs2: do not write dirty data after degenerating to read-only"), but the issue has potentially existed before. Fix this issue by checking the uptodate flag when attempting to reuse an internally held buffer, and reloading the metadata instead of reusing the buffer if the flag was lost. Link: https://lkml.kernel.org/r/20230818131804.7758-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+cdfcae656bac88ba0e2d@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/0000000000003da75f05fdeffd12@google.com Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> # 3.10+ Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21nilfs2: fix general protection fault in nilfs_lookup_dirty_data_buffers()Ryusuke Konishi1-0/+5
A syzbot stress test reported that create_empty_buffers() called from nilfs_lookup_dirty_data_buffers() can cause a general protection fault. Analysis using its reproducer revealed that the back reference "mapping" from a page/folio has been changed to NULL after dirty page/folio gang lookup in nilfs_lookup_dirty_data_buffers(). Fix this issue by excluding pages/folios from being collected if, after acquiring a lock on each page/folio, its back reference "mapping" differs from the pointer to the address space struct that held the page/folio. Link: https://lkml.kernel.org/r/20230805132038.6435-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+0ad741797f4565e7e2d2@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/0000000000002930a705fc32b231@google.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-10nilfs2: use setup_bdev_super to de-duplicate the mount codeChristoph Hellwig1-51/+30
Use the generic setup_bdev_super helper to open the main block device and do various bits of superblock setup instead of duplicating the logic. This includes moving to the new scheme implemented in common code that only opens the block device after the superblock has allocated. It does not yet convert nilfs2 to the new mount API, but doing so will become a bit simpler after this first step. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Message-Id: <20230802154131.2221419-3-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-04nilfs2: fix use-after-free of nilfs_root in dirtying inodes via iputRyusuke Konishi3-0/+12
During unmount process of nilfs2, nothing holds nilfs_root structure after nilfs2 detaches its writer in nilfs_detach_log_writer(). Previously, nilfs_evict_inode() could cause use-after-free read for nilfs_root if inodes are left in "garbage_list" and released by nilfs_dispose_list at the end of nilfs_detach_log_writer(), and this bug was fixed by commit 9b5a04ac3ad9 ("nilfs2: fix use-after-free bug of nilfs_root in nilfs_evict_inode()"). However, it turned out that there is another possibility of UAF in the call path where mark_inode_dirty_sync() is called from iput(): nilfs_detach_log_writer() nilfs_dispose_list() iput() mark_inode_dirty_sync() __mark_inode_dirty() nilfs_dirty_inode() __nilfs_mark_inode_dirty() nilfs_load_inode_block() --> causes UAF of nilfs_root struct This can happen after commit 0ae45f63d4ef ("vfs: add support for a lazytime mount option"), which changed iput() to call mark_inode_dirty_sync() on its final reference if i_state has I_DIRTY_TIME flag and i_nlink is non-zero. This issue appears after commit 28a65b49eb53 ("nilfs2: do not write dirty data after degenerating to read-only") when using the syzbot reproducer, but the issue has potentially existed before. Fix this issue by adding a "purging flag" to the nilfs structure, setting that flag while disposing the "garbage_list" and checking it in __nilfs_mark_inode_dirty(). Unlike commit 9b5a04ac3ad9 ("nilfs2: fix use-after-free bug of nilfs_root in nilfs_evict_inode()"), this patch does not rely on ns_writer to determine whether to skip operations, so as not to break recovery on mount. The nilfs_salvage_orphan_logs routine dirties the buffer of salvaged data before attaching the log writer, so changing __nilfs_mark_inode_dirty() to skip the operation when ns_writer is NULL will cause recovery write to fail. The purpose of using the cleanup-only flag is to allow for narrowing of such conditions. Link: https://lkml.kernel.org/r/20230728191318.33047-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+74db8b3087f293d3a13a@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/000000000000b4e906060113fd63@google.com Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option") Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> # 4.0+ Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-02fs: add CONFIG_BUFFER_HEADChristoph Hellwig1-0/+1
Add a new config option that controls building the buffer_head code, and select it from all file systems and stacking drivers that need it. For the block device nodes and alternative iomap based buffered I/O path is provided when buffer_head support is not enabled, and iomap needs a a small tweak to define the IOMAP_F_BUFFER_HEAD flag to 0 to not call into the buffer_head code when it doesn't exist. Otherwise this is just Kconfig and ifdef changes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20230801172201.1923299-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-02fs: rename and move block_page_mkwrite_returnChristoph Hellwig1-1/+1
block_page_mkwrite_return is neither block nor mkwrite specific, and should not be under CONFIG_BLOCK. Move it to mm.h and rename it to vmf_fs_error. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230801172201.1923299-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-07-24nilfs2: convert to ctime accessor functionsJeff Layton4-14/+14
In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-57-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-06-26Merge tag 'for-6.5/block-2023-06-23' of git://git.kernel.dk/linuxLinus Torvalds1-8/+4
Pull block updates from Jens Axboe: - NVMe pull request via Keith: - Various cleanups all around (Irvin, Chaitanya, Christophe) - Better struct packing (Christophe JAILLET) - Reduce controller error logs for optional commands (Keith) - Support for >=64KiB block sizes (Daniel Gomez) - Fabrics fixes and code organization (Max, Chaitanya, Daniel Wagner) - bcache updates via Coly: - Fix a race at init time (Mingzhe Zou) - Misc fixes and cleanups (Andrea, Thomas, Zheng, Ye) - use page pinning in the block layer for dio (David) - convert old block dio code to page pinning (David, Christoph) - cleanups for pktcdvd (Andy) - cleanups for rnbd (Guoqing) - use the unchecked __bio_add_page() for the initial single page additions (Johannes) - fix overflows in the Amiga partition handling code (Michael) - improve mq-deadline zoned device support (Bart) - keep passthrough requests out of the IO schedulers (Christoph, Ming) - improve support for flush requests, making them less special to deal with (Christoph) - add bdev holder ops and shutdown methods (Christoph) - fix the name_to_dev_t() situation and use cases (Christoph) - decouple the block open flags from fmode_t (Christoph) - ublk updates and cleanups, including adding user copy support (Ming) - BFQ sanity checking (Bart) - convert brd from radix to xarray (Pankaj) - constify various structures (Thomas, Ivan) - more fine grained persistent reservation ioctl capability checks (Jingbo) - misc fixes and cleanups (Arnd, Azeem, Demi, Ed, Hengqi, Hou, Jan, Jordy, Li, Min, Yu, Zhong, Waiman) * tag 'for-6.5/block-2023-06-23' of git://git.kernel.dk/linux: (266 commits) scsi/sg: don't grab scsi host module reference ext4: Fix warning in blkdev_put() block: don't return -EINVAL for not found names in devt_from_devname cdrom: Fix spectre-v1 gadget block: Improve kernel-doc headers blk-mq: don't insert passthrough request into sw queue bsg: make bsg_class a static const structure ublk: make ublk_chr_class a static const structure aoe: make aoe_class a static const structure block/rnbd: make all 'class' structures const block: fix the exclusive open mask in disk_scan_partitions block: add overflow checks for Amiga partition support block: change all __u32 annotations to __be32 in affs_hardblocks.h block: fix signed int overflow in Amiga partition support block: add capacity validation in bdev_add_partition() block: fine-granular CAP_SYS_ADMIN for Persistent Reservation block: disallow Persistent Reservation on partitions reiserfs: fix blkdev_put() warning from release_journal_dev() block: fix wrong mode for blkdev_get_by_dev() from disk_scan_partitions() block: document the holder argument to blkdev_get_by_path ...
2023-06-26Merge tag 'for-6.5/splice-2023-06-23' of git://git.kernel.dk/linuxLinus Torvalds1-1/+1
Pull splice updates from Jens Axboe: "This kills off ITER_PIPE to avoid a race between truncate, iov_iter_revert() on the pipe and an as-yet incomplete DMA to a bio with unpinned/unref'ed pages from an O_DIRECT splice read. This causes memory corruption. Instead, we either use (a) filemap_splice_read(), which invokes the buffered file reading code and splices from the pagecache into the pipe; (b) copy_splice_read(), which bulk-allocates a buffer, reads into it and then pushes the filled pages into the pipe; or (c) handle it in filesystem-specific code. Summary: - Rename direct_splice_read() to copy_splice_read() - Simplify the calculations for the number of pages to be reclaimed in copy_splice_read() - Turn do_splice_to() into a helper, vfs_splice_read(), so that it can be used by overlayfs and coda to perform the checks on the lower fs - Make vfs_splice_read() jump to copy_splice_read() to handle direct-I/O and DAX - Provide shmem with its own splice_read to handle non-existent pages in the pagecache. We don't want a ->read_folio() as we don't want to populate holes, but filemap_get_pages() requires it - Provide overlayfs with its own splice_read to call down to a lower layer as overlayfs doesn't provide ->read_folio() - Provide coda with its own splice_read to call down to a lower layer as coda doesn't provide ->read_folio() - Direct ->splice_read to copy_splice_read() in tty, procfs, kernfs and random files as they just copy to the output buffer and don't splice pages - Provide wrappers for afs, ceph, ecryptfs, ext4, f2fs, nfs, ntfs3, ocfs2, orangefs, xfs and zonefs to do locking and/or revalidation - Make cifs use filemap_splice_read() - Replace pointers to generic_file_splice_read() with pointers to filemap_splice_read() as DIO and DAX are handled in the caller; filesystems can still provide their own alternate ->splice_read() op - Remove generic_file_splice_read() - Remove ITER_PIPE and its paraphernalia as generic_file_splice_read was the only user" * tag 'for-6.5/splice-2023-06-23' of git://git.kernel.dk/linux: (31 commits) splice: kdoc for filemap_splice_read() and copy_splice_read() iov_iter: Kill ITER_PIPE splice: Remove generic_file_splice_read() splice: Use filemap_splice_read() instead of generic_file_splice_read() cifs: Use filemap_splice_read() trace: Convert trace/seq to use copy_splice_read() zonefs: Provide a splice-read wrapper xfs: Provide a splice-read wrapper orangefs: Provide a splice-read wrapper ocfs2: Provide a splice-read wrapper ntfs3: Provide a splice-read wrapper nfs: Provide a splice-read wrapper f2fs: Provide a splice-read wrapper ext4: Provide a splice-read wrapper ecryptfs: Provide a splice-read wrapper ceph: Provide a splice-read wrapper afs: Provide a splice-read wrapper 9p: Add splice_read wrapper net: Make sock_splice_read() use copy_splice_read() by default tty, proc, kernfs, random: Use copy_splice_read() ...
2023-06-19nilfs2: prevent general protection fault in nilfs_clear_dirty_page()Ryusuke Konishi1-1/+9
In a syzbot stress test that deliberately causes file system errors on nilfs2 with a corrupted disk image, it has been reported that nilfs_clear_dirty_page() called from nilfs_clear_dirty_pages() can cause a general protection fault. In nilfs_clear_dirty_pages(), when looking up dirty pages from the page cache and calling nilfs_clear_dirty_page() for each dirty page/folio retrieved, the back reference from the argument page to "mapping" may have been changed to NULL (and possibly others). It is necessary to check this after locking the page/folio. So, fix this issue by not calling nilfs_clear_dirty_page() on a page/folio after locking it in nilfs_clear_dirty_pages() if the back reference "mapping" from the page/folio is different from the "mapping" that held the page/folio just before. Link: https://lkml.kernel.org/r/20230612021456.3682-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+53369d11851d8f26735c@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/000000000000da4f6b05eb9bf593@google.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19nilfs2: fix buffer corruption due to concurrent device readsRyusuke Konishi3-1/+35
As a result of analysis of a syzbot report, it turned out that in three cases where nilfs2 allocates block device buffers directly via sb_getblk, concurrent reads to the device can corrupt the allocated buffers. Nilfs2 uses sb_getblk for segment summary blocks, that make up a log header, and the super root block, that is the trailer, and when moving and writing the second super block after fs resize. In any of these, since the uptodate flag is not set when storing metadata to be written in the allocated buffers, the stored metadata will be overwritten if a device read of the same block occurs concurrently before the write. This causes metadata corruption and misbehavior in the log write itself, causing warnings in nilfs_btree_assign() as reported. Fix these issues by setting an uptodate flag on the buffer head on the first or before modifying each buffer obtained with sb_getblk, and clearing the flag on failure. When setting the uptodate flag, the lock_buffer/unlock_buffer pair is used to perform necessary exclusive control, and the buffer is filled to ensure that uninitialized bytes are not mixed into the data read from others. As for buffers for segment summary blocks, they are filled incrementally, so if the uptodate flag was unset on their allocation, set the flag and zero fill the buffer once at that point. Also, regarding the superblock move routine, the starting point of the memset call to zerofill the block is incorrectly specified, which can cause a buffer overflow on file systems with block sizes greater than 4KiB. In addition, if the superblock is moved within a large block, it is necessary to assume the possibility that the data in the superblock will be destroyed by zero-filling before copying. So fix these potential issues as well. Link: https://lkml.kernel.org/r/20230609035732.20426-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+31837fe952932efc8fb9@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/00000000000030000a05e981f475@google.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12nilfs2: reject devices with insufficient block countRyusuke Konishi1-1/+42
The current sanity check for nilfs2 geometry information lacks checks for the number of segments stored in superblocks, so even for device images that have been destructively truncated or have an unusually high number of segments, the mount operation may succeed. This causes out-of-bounds block I/O on file system block reads or log writes to the segments, the latter in particular causing "a_ops->writepages" to repeatedly fail, resulting in sync_inodes_sb() to hang. Fix this issue by checking the number of segments stored in the superblock and avoiding mounting devices that can cause out-of-bounds accesses. To eliminate the possibility of overflow when calculating the number of blocks required for the device from the number of segments, this also adds a helper function to calculate the upper bound on the number of segments and inserts a check using it. Link: https://lkml.kernel.org/r/20230526021332.3431-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+7d50f1e54a12ba3aeae2@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=7d50f1e54a12ba3aeae2 Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12nilfs2: fix possible out-of-bounds segment allocation in resize ioctlRyusuke Konishi1-0/+9
Syzbot reports that in its stress test for resize ioctl, the log writing function nilfs_segctor_do_construct hits a WARN_ON in nilfs_segctor_truncate_segments(). It turned out that there is a problem with the current implementation of the resize ioctl, which changes the writable range on the device (the range of allocatable segments) at the end of the resize process. This order is necessary for file system expansion to avoid corrupting the superblock at trailing edge. However, in the case of a file system shrink, if log writes occur after truncating out-of-bounds trailing segments and before the resize is complete, segments may be allocated from the truncated space. The userspace resize tool was fine as it limits the range of allocatable segments before performing the resize, but it can run into this issue if the resize ioctl is called alone. Fix this issue by changing nilfs_sufile_resize() to update the range of allocatable segments immediately after successful truncation of segment space in case of file system shrink. Link: https://lkml.kernel.org/r/20230524094348.3784-1-konishi.ryusuke@gmail.com Fixes: 4e33f9eab07e ("nilfs2: implement resize ioctl") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+33494cd0df2ec2931851@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/0000000000005434c405fbbafdc5@google.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12nilfs2: fix incomplete buffer cleanup in nilfs_btnode_abort_change_key()Ryusuke Konishi1-2/+10
A syzbot fault injection test reported that nilfs_btnode_create_block, a helper function that allocates a new node block for b-trees, causes a kernel BUG for disk images where the file system block size is smaller than the page size. This was due to unexpected flags on the newly allocated buffer head, and it turned out to be because the buffer flags were not cleared by nilfs_btnode_abort_change_key() after an error occurred during a b-tree update operation and the buffer was later reused in that state. Fix this issue by using nilfs_btnode_delete() to abandon the unused preallocated buffer in nilfs_btnode_abort_change_key(). Link: https://lkml.kernel.org/r/20230513102428.10223-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+b0a35a5c1f7e846d3b09@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/000000000000d1d6c205ebc4d512@google.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12fs: remove sb->s_modeChristoph Hellwig1-1/+0
There is no real need to store the open mode in the super_block now. It is only used by f2fs, which can easily recalculate it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-18-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12block: add a sb_open_mode helperChristoph Hellwig1-5/+2
Add a helper to return the open flags for blkdev_get_by* for passed in super block flags instead of open coding the logic in many places. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12block: use the holder as indication for exclusive opensChristoph Hellwig1-3/+3
The current interface for exclusive opens is rather confusing as it requires both the FMODE_EXCL flag and a holder. Remove the need to pass FMODE_EXCL and just key off the exclusive open off a non-NULL holder. For blkdev_put this requires adding the holder argument, which provides better debug checking that only the holder actually releases the hold, but at the same time allows removing the now superfluous mode argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Link: https://lore.kernel.org/r/20230608110258.189493-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05block: introduce holder opsChristoph Hellwig1-1/+1
Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and installed in the block_device for exclusive claims. It will be used to allow the block layer to call back into the user of the block device for thing like notification of a removed device or a device resize. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-24splice: Use filemap_splice_read() instead of generic_file_splice_read()David Howells1-1/+1
Replace pointers to generic_file_splice_read() with calls to filemap_splice_read(). Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Al Viro <viro@zeniv.linux.org.uk> cc: David Hildenbrand <david@redhat.com> cc: John Hubbard <jhubbard@nvidia.com> cc: linux-mm@kvack.org cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20230522135018.2742245-29-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-17nilfs2: fix use-after-free bug of nilfs_root in nilfs_evict_inode()Ryusuke Konishi1-0/+18
During unmount process of nilfs2, nothing holds nilfs_root structure after nilfs2 detaches its writer in nilfs_detach_log_writer(). However, since nilfs_evict_inode() uses nilfs_root for some cleanup operations, it may cause use-after-free read if inodes are left in "garbage_list" and released by nilfs_dispose_list() at the end of nilfs_detach_log_writer(). Fix this issue by modifying nilfs_evict_inode() to only clear inode without additional metadata changes that use nilfs_root if the file system is degraded to read-only or the writer is detached. Link: https://lkml.kernel.org/r/20230509152956.8313-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+78d4495558999f55d1da@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/00000000000099e5ac05fb1c3b85@google.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-06nilfs2: do not write dirty data after degenerating to read-onlyRyusuke Konishi1-1/+4
According to syzbot's report, mark_buffer_dirty() called from nilfs_segctor_do_construct() outputs a warning with some patterns after nilfs2 detects metadata corruption and degrades to read-only mode. After such read-only degeneration, page cache data may be cleared through nilfs_clear_dirty_page() which may also clear the uptodate flag for their buffer heads. However, even after the degeneration, log writes are still performed by unmount processing etc., which causes mark_buffer_dirty() to be called for buffer heads without the "uptodate" flag and causes the warning. Since any writes should not be done to a read-only file system in the first place, this fixes the warning in mark_buffer_dirty() by letting nilfs_segctor_do_construct() abort early if in read-only mode. This also changes the retry check of nilfs_segctor_write_out() to avoid unnecessary log write retries if it detects -EROFS that nilfs_segctor_do_construct() returned. Link: https://lkml.kernel.org/r/20230427011526.13457-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+2af3bc9585be7f23f290@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=2af3bc9585be7f23f290 Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-06nilfs2: fix infinite loop in nilfs_mdt_get_block()Ryusuke Konishi1-4/+12
If the disk image that nilfs2 mounts is corrupted and a virtual block address obtained by block lookup for a metadata file is invalid, nilfs_bmap_lookup_at_level() may return the same internal return code as -ENOENT, meaning the block does not exist in the metadata file. This duplication of return codes confuses nilfs_mdt_get_block(), causing it to read and create a metadata block indefinitely. In particular, if this happens to the inode metadata file, ifile, semaphore i_rwsem can be left held, causing task hangs in lock_mount. Fix this issue by making nilfs_bmap_lookup_at_level() treat virtual block address translation failures with -ENOENT as metadata corruption instead of returning the error code. Link: https://lkml.kernel.org/r/20230430193046.6769-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+221d75710bde87fa0e97@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=221d75710bde87fa0e97 Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18sync mm-stable with mm-hotfixes-stable to pick up depended-upon upstream changesAndrew Morton1-0/+20
2023-04-18nilfs2: initialize unused bytes in segment summary blocksRyusuke Konishi1-0/+20
Syzbot still reports uninit-value in nilfs_add_checksums_on_logs() for KMSAN enabled kernels after applying commit 7397031622e0 ("nilfs2: initialize "struct nilfs_binfo_dat"->bi_pad field"). This is because the unused bytes at the end of each block in segment summaries are not initialized. So this fixes the issue by padding the unused bytes with null bytes. Link: https://lkml.kernel.org/r/20230417173513.12598-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+048585f3f4227bb2b49b@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=048585f3f4227bb2b49b Cc: Alexander Potapenko <glider@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-16sync mm-stable with mm-hotfixes-stable to pick up depended-upon upstream changesAndrew Morton5-7/+12
2023-04-05mm: return an ERR_PTR from __filemap_get_folioChristoph Hellwig1-3/+3
Instead of returning NULL for all errors, distinguish between: - no entry found and not asked to allocated (-ENOENT) - failed to allocate memory (-ENOMEM) - would block (-EAGAIN) so that callers don't have to guess the error based on the passed in flags. Also pass through the error through the direct callers: filemap_get_folio, filemap_lock_folio filemap_grab_folio and filemap_get_incore_folio. [hch@lst.de: fix null-pointer deref] Link: https://lkml.kernel.org/r/20230310070023.GA13563@lst.de Link: https://lkml.kernel.org/r/20230310043137.GA1624890@u2004 Link: https://lkml.kernel.org/r/20230307143410.28031-8-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> [nilfs2] Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05nilfs2: fix sysfs interface lifetimeRyusuke Konishi2-5/+9
The current nilfs2 sysfs support has issues with the timing of creation and deletion of sysfs entries, potentially leading to null pointer dereferences, use-after-free, and lockdep warnings. Some of the sysfs attributes for nilfs2 per-filesystem instance refer to metadata file "cpfile", "sufile", or "dat", but nilfs_sysfs_create_device_group that creates those attributes is executed before the inodes for these metadata files are loaded, and nilfs_sysfs_delete_device_group which deletes these sysfs entries is called after releasing their metadata file inodes. Therefore, access to some of these sysfs attributes may occur outside of the lifetime of these metadata files, resulting in inode NULL pointer dereferences or use-after-free. In addition, the call to nilfs_sysfs_create_device_group() is made during the locking period of the semaphore "ns_sem" of nilfs object, so the shrinker call caused by the memory allocation for the sysfs entries, may derive lock dependencies "ns_sem" -> (shrinker) -> "locks acquired in nilfs_evict_inode()". Since nilfs2 may acquire "ns_sem" deep in the call stack holding other locks via its error handler __nilfs_error(), this causes lockdep to report circular locking. This is a false positive and no circular locking actually occurs as no inodes exist yet when nilfs_sysfs_create_device_group() is called. Fortunately, the lockdep warnings can be resolved by simply moving the call to nilfs_sysfs_create_device_group() out of "ns_sem". This fixes these sysfs issues by revising where the device's sysfs interface is created/deleted and keeping its lifetime within the lifetime of the metadata files above. Link: https://lkml.kernel.org/r/20230330205515.6167-1-konishi.ryusuke@gmail.com Fixes: dd70edbde262 ("nilfs2: integrate sysfs support into driver") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+979fa7f9c0d086fdc282@syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/0000000000003414b505f7885f7e@google.com Reported-by: syzbot+5b7d542076d9bddc3c6a@syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/0000000000006ac86605f5f44eb9@google.com Cc: Viacheslav Dubeyko <slava@dubeyko.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05nilfs2: initialize "struct nilfs_binfo_dat"->bi_pad fieldTetsuo Handa2-0/+2
nilfs_btree_assign_p() and nilfs_direct_assign_p() are not initializing "struct nilfs_binfo_dat"->bi_pad field, causing uninit-value reports when being passed to CRC function. Link: https://lkml.kernel.org/r/20230326152146.15872-1-konishi.ryusuke@gmail.com Reported-by: syzbot <syzbot+048585f3f4227bb2b49b@syzkaller.appspotmail.com> Link: https://syzkaller.appspot.com/bug?extid=048585f3f4227bb2b49b Reported-by: Dipanjan Das <mail.dipanjan.das@gmail.com> Link: https://lkml.kernel.org/r/CANX2M5bVbzRi6zH3PTcNE_31TzerstOXUa9Bay4E6y6dX23_pg@mail.gmail.com Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()Ryusuke Konishi1-2/+1
The finalization of nilfs_segctor_thread() can race with nilfs_segctor_kill_thread() which terminates that thread, potentially causing a use-after-free BUG as KASAN detected. At the end of nilfs_segctor_thread(), it assigns NULL to "sc_task" member of "struct nilfs_sc_info" to indicate the thread has finished, and then notifies nilfs_segctor_kill_thread() of this using waitqueue "sc_wait_task" on the struct nilfs_sc_info. However, here, immediately after the NULL assignment to "sc_task", it is possible that nilfs_segctor_kill_thread() will detect it and return to continue the deallocation, freeing the nilfs_sc_info structure before the thread does the notification. This fixes the issue by protecting the NULL assignment to "sc_task" and its notification, with spinlock "sc_state_lock" of the struct nilfs_sc_info. Since nilfs_segctor_kill_thread() does a final check to see if "sc_task" is NULL with "sc_state_lock" locked, this can eliminate the race. Link: https://lkml.kernel.org/r/20230327175318.8060-1-konishi.ryusuke@gmail.com Reported-by: syzbot+b08ebcc22f8f3e6be43a@syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/00000000000000660d05f7dfa877@google.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-23nilfs2: fix kernel-infoleak in nilfs_ioctl_wrap_copy()Ryusuke Konishi1-1/+1
The ioctl helper function nilfs_ioctl_wrap_copy(), which exchanges a metadata array to/from user space, may copy uninitialized buffer regions to user space memory for read-only ioctl commands NILFS_IOCTL_GET_SUINFO and NILFS_IOCTL_GET_CPINFO. This can occur when the element size of the user space metadata given by the v_size member of the argument nilfs_argv structure is larger than the size of the metadata element (nilfs_suinfo structure or nilfs_cpinfo structure) on the file system side. KMSAN-enabled kernels detect this issue as follows: BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:121 [inline] BUG: KMSAN: kernel-infoleak in _copy_to_user+0xc0/0x100 lib/usercopy.c:33 instrument_copy_to_user include/linux/instrumented.h:121 [inline] _copy_to_user+0xc0/0x100 lib/usercopy.c:33 copy_to_user include/linux/uaccess.h:169 [inline] nilfs_ioctl_wrap_copy+0x6fa/0xc10 fs/nilfs2/ioctl.c:99 nilfs_ioctl_get_info fs/nilfs2/ioctl.c:1173 [inline] nilfs_ioctl+0x2402/0x4450 fs/nilfs2/ioctl.c:1290 nilfs_compat_ioctl+0x1b8/0x200 fs/nilfs2/ioctl.c:1343 __do_compat_sys_ioctl fs/ioctl.c:968 [inline] __se_compat_sys_ioctl+0x7dd/0x1000 fs/ioctl.c:910 __ia32_compat_sys_ioctl+0x93/0xd0 fs/ioctl.c:910 do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline] __do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178 do_fast_syscall_32+0x37/0x80 arch/x86/entry/common.c:203 do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:246 entry_SYSENTER_compat_after_hwframe+0x70/0x82 Uninit was created at: __alloc_pages+0x9f6/0xe90 mm/page_alloc.c:5572 alloc_pages+0xab0/0xd80 mm/mempolicy.c:2287 __get_free_pages+0x34/0xc0 mm/page_alloc.c:5599 nilfs_ioctl_wrap_copy+0x223/0xc10 fs/nilfs2/ioctl.c:74 nilfs_ioctl_get_info fs/nilfs2/ioctl.c:1173 [inline] nilfs_ioctl+0x2402/0x4450 fs/nilfs2/ioctl.c:1290 nilfs_compat_ioctl+0x1b8/0x200 fs/nilfs2/ioctl.c:1343 __do_compat_sys_ioctl fs/ioctl.c:968 [inline] __se_compat_sys_ioctl+0x7dd/0x1000 fs/ioctl.c:910 __ia32_compat_sys_ioctl+0x93/0xd0 fs/ioctl.c:910 do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline] __do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178 do_fast_syscall_32+0x37/0x80 arch/x86/entry/common.c:203 do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:246 entry_SYSENTER_compat_after_hwframe+0x70/0x82 Bytes 16-127 of 3968 are uninitialized ... This eliminates the leak issue by initializing the page allocated as buffer using get_zeroed_page(). Link: https://lkml.kernel.org/r/20230307085548.6290-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+132fdd2f1e1805fdc591@syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/000000000000a5bd2d05f63f04ae@google.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-23Merge tag 'mm-nonmm-stable-2023-02-20-15-29' of ↵Linus Torvalds1-10/+28
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "There is no particular theme here - mainly quick hits all over the tree. Most notable is a set of zlib changes from Mikhail Zaslonko which enhances and fixes zlib's use of S390 hardware support: 'lib/zlib: Set of s390 DFLTCC related patches for kernel zlib'" * tag 'mm-nonmm-stable-2023-02-20-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (55 commits) Update CREDITS file entry for Jesper Juhl sparc: allow PM configs for sparc32 COMPILE_TEST hung_task: print message when hung_task_warnings gets down to zero. arch/Kconfig: fix indentation scripts/tags.sh: fix the Kconfig tags generation when using latest ctags nilfs2: prevent WARNING in nilfs_dat_commit_end() lib/zlib: remove redundation assignement of avail_in dfltcc_gdht() lib/Kconfig.debug: do not enable DEBUG_PREEMPT by default lib/zlib: DFLTCC always switch to software inflate for Z_PACKET_FLUSH option lib/zlib: DFLTCC support inflate with small window lib/zlib: Split deflate and inflate states for DFLTCC lib/zlib: DFLTCC not writing header bits when avail_out == 0 lib/zlib: fix DFLTCC ignoring flush modes when avail_in == 0 lib/zlib: fix DFLTCC not flushing EOBS when creating raw streams lib/zlib: implement switching between DFLTCC and software lib/zlib: adjust offset calculation for dfltcc_state nilfs2: replace WARN_ONs for invalid DAT metadata block requests scripts/spelling.txt: add "exsits" pattern and fix typo instances fs: gracefully handle ->get_block not mapping bh in __mpage_writepage cramfs: Kconfig: fix spelling & punctuation ...
2023-02-23Merge tag 'mm-stable-2023-02-20-13-37' of ↵Linus Torvalds6-63/+66
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Daniel Verkamp has contributed a memfd series ("mm/memfd: add F_SEAL_EXEC") which permits the setting of the memfd execute bit at memfd creation time, with the option of sealing the state of the X bit. - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare") which addresses a rare race condition related to PMD unsharing. - Several folioification patch serieses from Matthew Wilcox, Vishal Moola, Sidhartha Kumar and Lorenzo Stoakes - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which does perform some memcg maintenance and cleanup work. - SeongJae Park has added DAMOS filtering to DAMON, with the series "mm/damon/core: implement damos filter". These filters provide users with finer-grained control over DAMOS's actions. SeongJae has also done some DAMON cleanup work. - Kairui Song adds a series ("Clean up and fixes for swap"). - Vernon Yang contributed the series "Clean up and refinement for maple tree". - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It adds to MGLRU an LRU of memcgs, to improve the scalability of global reclaim. - David Hildenbrand has added some userfaultfd cleanup work in the series "mm: uffd-wp + change_protection() cleanups". - Christoph Hellwig has removed the generic_writepages() library function in the series "remove generic_writepages". - Baolin Wang has performed some maintenance on the compaction code in his series "Some small improvements for compaction". - Sidhartha Kumar is doing some maintenance work on struct page in his series "Get rid of tail page fields". - David Hildenbrand contributed some cleanup, bugfixing and generalization of pte management and of pte debugging in his series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs". - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation flag in the series "Discard __GFP_ATOMIC". - Sergey Senozhatsky has improved zsmalloc's memory utilization with his series "zsmalloc: make zspage chain size configurable". - Joey Gouly has added prctl() support for prohibiting the creation of writeable+executable mappings. The previous BPF-based approach had shortcomings. See "mm: In-kernel support for memory-deny-write-execute (MDWE)". - Waiman Long did some kmemleak cleanup and bugfixing in the series "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF". - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series "mm: multi-gen LRU: improve". - Jiaqi Yan has provided some enhancements to our memory error statistics reporting, mainly by presenting the statistics on a per-node basis. See the series "Introduce per NUMA node memory error statistics". - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog regression in compaction via his series "Fix excessive CPU usage during compaction". - Christoph Hellwig does some vmalloc maintenance work in the series "cleanup vfree and vunmap". - Christoph Hellwig has removed block_device_operations.rw_page() in ths series "remove ->rw_page". - We get some maple_tree improvements and cleanups in Liam Howlett's series "VMA tree type safety and remove __vma_adjust()". - Suren Baghdasaryan has done some work on the maintainability of our vm_flags handling in the series "introduce vm_flags modifier functions". - Some pagemap cleanup and generalization work in Mike Rapoport's series "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and "fixups for generic implementation of pfn_valid()" - Baoquan He has done some work to make /proc/vmallocinfo and /proc/kcore better represent the real state of things in his series "mm/vmalloc.c: allow vread() to read out vm_map_ram areas". - Jason Gunthorpe rationalized the GUP system's interface to the rest of the kernel in the series "Simplify the external interface for GUP". - SeongJae Park wishes to migrate people from DAMON's debugfs interface over to its sysfs interface. To support this, we'll temporarily be printing warnings when people use the debugfs interface. See the series "mm/damon: deprecate DAMON debugfs interface". - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes and clean-ups" series. - Huang Ying has provided a dramatic reduction in migration's TLB flush IPI rates with the series "migrate_pages(): batch TLB flushing". - Arnd Bergmann has some objtool fixups in "objtool warning fixes". * tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits) include/linux/migrate.h: remove unneeded externs mm/memory_hotplug: cleanup return value handing in do_migrate_range() mm/uffd: fix comment in handling pte markers mm: change to return bool for isolate_movable_page() mm: hugetlb: change to return bool for isolate_hugetlb() mm: change to return bool for isolate_lru_page() mm: change to return bool for folio_isolate_lru() objtool: add UACCESS exceptions for __tsan_volatile_read/write kmsan: disable ftrace in kmsan core code kasan: mark addr_has_metadata __always_inline mm: memcontrol: rename memcg_kmem_enabled() sh: initialize max_mapnr m68k/nommu: add missing definition of ARCH_PFN_OFFSET mm: percpu: fix incorrect size in pcpu_obj_full_size() maple_tree: reduce stack usage with gcc-9 and earlier mm: page_alloc: call panic() when memoryless node allocation fails mm: multi-gen LRU: avoid futile retries migrate_pages: move THP/hugetlb migration support check to simplify code migrate_pages: batch flushing TLB migrate_pages: share more code between _unmap and _move ...
2023-02-20Merge tag 'for-6.3/dio-2023-02-16' of git://git.kernel.dk/linuxLinus Torvalds1-0/+1
Pull legacy dio update from Jens Axboe: "We only have a few file systems that use the old dio code, make them select it rather than build it unconditionally" * tag 'for-6.3/dio-2023-02-16' of git://git.kernel.dk/linux: fs: build the legacy direct I/O code conditionally fs: move sb_init_dio_done_wq out of direct-io.c
2023-02-20Merge tag 'fs.idmapped.v6.3' of ↵Linus Torvalds4-15/+15
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping Pull vfs idmapping updates from Christian Brauner: - Last cycle we introduced the dedicated struct mnt_idmap type for mount idmapping and the required infrastucture in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). As promised in last cycle's pull request message this converts everything to rely on struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevant on the mount level. Especially for non-vfs developers without detailed knowledge in this area this was a potential source for bugs. This finishes the conversion. Instead of passing the plain namespace around this updates all places that currently take a pointer to a mnt_userns with a pointer to struct mnt_idmap. Now that the conversion is done all helpers down to the really low-level helpers only accept a struct mnt_idmap argument instead of two namespace arguments. Conflating mount and other idmappings will now cause the compiler to complain loudly thus eliminating the possibility of any bugs. This makes it impossible for filesystem developers to mix up mount and filesystem idmappings as they are two distinct types and require distinct helpers that cannot be used interchangeably. Everything associated with struct mnt_idmap is moved into a single separate file. With that change no code can poke around in struct mnt_idmap. It can only be interacted with through dedicated helpers. That means all filesystems are and all of the vfs is completely oblivious to the actual implementation of idmappings. We are now also able to extend struct mnt_idmap as we see fit. For example, we can decouple it completely from namespaces for users that don't require or don't want to use them at all. We can also extend the concept of idmappings so we can cover filesystem specific requirements. In combination with the vfs{g,u}id_t work we finished in v6.2 this makes this feature substantially more robust and thus difficult to implement wrong by a given filesystem and also protects the vfs. - Enable idmapped mounts for tmpfs and fulfill a longstanding request. A long-standing request from users had been to make it possible to create idmapped mounts for tmpfs. For example, to share the host's tmpfs mount between multiple sandboxes. This is a prerequisite for some advanced Kubernetes cases. Systemd also has a range of use-cases to increase service isolation. And there are more users of this. However, with all of the other work going on this was way down on the priority list but luckily someone other than ourselves picked this up. As usual the patch is tiny as all the infrastructure work had been done multiple kernel releases ago. In addition to all the tests that we already have I requested that Rodrigo add a dedicated tmpfs testsuite for idmapped mounts to xfstests. It is to be included into xfstests during the v6.3 development cycle. This should add a slew of additional tests. * tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (26 commits) shmem: support idmapped mounts for tmpfs fs: move mnt_idmap fs: port vfs{g,u}id helpers to mnt_idmap fs: port fs{g,u}id helpers to mnt_idmap fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap fs: port i_{g,u}id_{needs_}update() to mnt_idmap quota: port to mnt_idmap fs: port privilege checking helpers to mnt_idmap fs: port inode_owner_or_capable() to mnt_idmap fs: port inode_init_owner() to mnt_idmap fs: port acl to mnt_idmap fs: port xattr to mnt_idmap fs: port ->permission() to pass mnt_idmap fs: port ->fileattr_set() to pass mnt_idmap fs: port ->set_acl() to pass mnt_idmap fs: port ->get_acl() to pass mnt_idmap fs: port ->tmpfile() to pass mnt_idmap fs: port ->rename() to pass mnt_idmap fs: port ->mknod() to pass mnt_idmap fs: port ->mkdir() to pass mnt_idmap ...
2023-02-17nilfs2: fix underflow in second superblock position calculationsRyusuke Konishi3-1/+23
Macro NILFS_SB2_OFFSET_BYTES, which computes the position of the second superblock, underflows when the argument device size is less than 4096 bytes. Therefore, when using this macro, it is necessary to check in advance that the device size is not less than a lower limit, or at least that underflow does not occur. The current nilfs2 implementation lacks this check, causing out-of-bound block access when mounting devices smaller than 4096 bytes: I/O error, dev loop0, sector 36028797018963960 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 NILFS (loop0): unable to read secondary superblock (blocksize = 1024) In addition, when trying to resize the filesystem to a size below 4096 bytes, this underflow occurs in nilfs_resize_fs(), passing a huge number of segments to nilfs_sufile_resize(), corrupting parameters such as the number of segments in superblocks. This causes excessive loop iterations in nilfs_sufile_resize() during a subsequent resize ioctl, causing semaphore ns_segctor_sem to block for a long time and hang the writer thread: INFO: task segctord:5067 blocked for more than 143 seconds. Not tainted 6.2.0-rc8-syzkaller-00015-gf6feea56f66d #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:segctord state:D stack:23456 pid:5067 ppid:2 flags:0x00004000 Call Trace: <TASK> context_switch kernel/sched/core.c:5293 [inline] __schedule+0x1409/0x43f0 kernel/sched/core.c:6606 schedule+0xc3/0x190 kernel/sched/core.c:6682 rwsem_down_write_slowpath+0xfcf/0x14a0 kernel/locking/rwsem.c:1190 nilfs_transaction_lock+0x25c/0x4f0 fs/nilfs2/segment.c:357 nilfs_segctor_thread_construct fs/nilfs2/segment.c:2486 [inline] nilfs_segctor_thread+0x52f/0x1140 fs/nilfs2/segment.c:2570 kthread+0x270/0x300 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 </TASK> ... Call Trace: <TASK> folio_mark_accessed+0x51c/0xf00 mm/swap.c:515 __nilfs_get_page_block fs/nilfs2/page.c:42 [inline] nilfs_grab_buffer+0x3d3/0x540 fs/nilfs2/page.c:61 nilfs_mdt_submit_block+0xd7/0x8f0 fs/nilfs2/mdt.c:121 nilfs_mdt_read_block+0xeb/0x430 fs/nilfs2/mdt.c:176 nilfs_mdt_get_block+0x12d/0xbb0 fs/nilfs2/mdt.c:251 nilfs_sufile_get_segment_usage_block fs/nilfs2/sufile.c:92 [inline] nilfs_sufile_truncate_range fs/nilfs2/sufile.c:679 [inline] nilfs_sufile_resize+0x7a3/0x12b0 fs/nilfs2/sufile.c:777 nilfs_resize_fs+0x20c/0xed0 fs/nilfs2/super.c:422 nilfs_ioctl_resize fs/nilfs2/ioctl.c:1033 [inline] nilfs_ioctl+0x137c/0x2440 fs/nilfs2/ioctl.c:1301 ... This fixes these issues by inserting appropriate minimum device size checks or anti-underflow checks, depending on where the macro is used. Link: https://lkml.kernel.org/r/0000000000004e1dfa05f4a48e6b@google.com Link: https://lkml.kernel.org/r/20230214224043.24141-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: <syzbot+f0c4082ce5ebebdac63b@syzkaller.appspotmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02nilfs2: prevent WARNING in nilfs_dat_commit_end()Ryusuke Konishi1-0/+11
If nilfs2 reads a corrupted disk image and its DAT metadata file contains invalid lifetime data for a virtual block number, a kernel warning can be generated by the WARN_ON check in nilfs_dat_commit_end() and can panic if the kernel is booted with panic_on_warn. This patch avoids the issue with a sanity check that treats it as an error. Since error return is not allowed in the execution phase of nilfs_dat_commit_end(), this inserts that sanity check in nilfs_dat_prepare_end(), which prepares for nilfs_dat_commit_end(). As the error code, -EINVAL is returned to notify bmap layer of the metadata corruption. When the bmap layer sees this code, it handles the abnormal situation and replaces the return code with -EIO as it should. Link: https://lkml.kernel.org/r/000000000000154d2c05e9ec7df6@google.com Link: https://lkml.kernel.org/r/20230127132202.6083-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: <syzbot+cbff7a52b6f99059e67f@syzkaller.appspotmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02nilfs2: replace WARN_ONs for invalid DAT metadata block requestsRyusuke Konishi1-10/+17
If DAT metadata file block access fails due to corruption of the DAT file or abnormal virtual block numbers held by b-trees or inodes, a kernel warning is generated. This replaces the WARN_ONs by error output, so that a kernel, booted with panic_on_warn, does not panic. This patch also replaces the detected return code -ENOENT with another internal code -EINVAL to notify the bmap layer of metadata corruption. When the bmap layer sees -EINVAL, it handles the abnormal situation with nilfs_bmap_convert_error() and finally returns code -EIO as it should. Link: https://lkml.kernel.org/r/0000000000005cc3d205ea23ddcf@google.com Link: https://lkml.kernel.org/r/20230126164114.6911-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: <syzbot+5d5d25f90f195a3cfcb4@syzkaller.appspotmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02nilfs2: convert nilfs_clear_dirty_pages() to use filemap_get_folios_tag()Vishal Moola (Oracle)1-10/+10
Convert function to use folios throughout. This is in preparation for the removal of find_get_pages_range_tag(). This change removes 2 calls to compound_head(). Link: https://lkml.kernel.org/r/20230104211448.4804-23-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02nilfs2: convert nilfs_copy_dirty_pages() to use filemap_get_folios_tag()Vishal Moola (Oracle)1-19/+20
Convert function to use folios throughout. This is in preparation for the removal of find_get_pages_range_tag(). This change removes 8 calls to compound_head(). Link: https://lkml.kernel.org/r/20230104211448.4804-22-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02nilfs2: convert nilfs_btree_lookup_dirty_buffers() to use ↵Vishal Moola (Oracle)1-7/+7
filemap_get_folios_tag() Convert function to use folios throughout. This is in preparation for the removal of find_get_pages_range_tag(). This change removes 1 call to compound_head(). Link: https://lkml.kernel.org/r/20230104211448.4804-21-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02nilfs2: convert nilfs_lookup_dirty_node_buffers() to use ↵Vishal Moola (Oracle)1-8/+7
filemap_get_folios_tag() Convert function to use folios throughout. This is in preparation for the removal of find_get_pages_range_tag(). This change removes 1 call to compound_head(). Link: https://lkml.kernel.org/r/20230104211448.4804-20-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02nilfs2: convert nilfs_lookup_dirty_data_buffers() to use ↵Vishal Moola (Oracle)1-13/+16
filemap_get_folios_tag() Convert function to use folios throughout. This is in preparation for the removal of find_get_pages_range_tag(). This change removes 4 calls to compound_head(). Link: https://lkml.kernel.org/r/20230104211448.4804-19-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-26fs: build the legacy direct I/O code conditionallyChristoph Hellwig1-0/+1
Add a new LEGACY_DIRECT_IO config symbol that is only selected by the file systems that still use the legacy blockdev_direct_IO code, so that kernels without support for those file systems don't need to build the code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20230125065839.191256-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-19fs: port inode_init_owner() to mnt_idmapChristian Brauner1-1/+1
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->permission() to pass mnt_idmapChristian Brauner2-3/+3
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->fileattr_set() to pass mnt_idmapChristian Brauner2-2/+2
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->rename() to pass mnt_idmapChristian Brauner1-1/+1
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->mknod() to pass mnt_idmapChristian Brauner1-1/+1
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->mkdir() to pass mnt_idmapChristian Brauner1-1/+1
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->symlink() to pass mnt_idmapChristian Brauner1-1/+1
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->create() to pass mnt_idmapChristian Brauner1-1/+1
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->setattr() to pass mnt_idmapChristian Brauner2-4/+4
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-18nilfs2: replace obvious uses of b_page with b_folioMatthew Wilcox (Oracle)5-6/+6
These places just use b_page to get to the buffer's address_space or the index of the page the buffer is in. Link: https://lkml.kernel.org/r/20221215214402.3522366-11-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-11nilfs2: fix general protection fault in nilfs_btree_insert()Ryusuke Konishi1-3/+12
If nilfs2 reads a corrupted disk image and tries to reads a b-tree node block by calling __nilfs_btree_get_block() against an invalid virtual block address, it returns -ENOENT because conversion of the virtual block address to a disk block address fails. However, this return value is the same as the internal code that b-tree lookup routines return to indicate that the block being searched does not exist, so functions that operate on that b-tree may misbehave. When nilfs_btree_insert() receives this spurious 'not found' code from nilfs_btree_do_lookup(), it misunderstands that the 'not found' check was successful and continues the insert operation using incomplete lookup path data, causing the following crash: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f] ... RIP: 0010:nilfs_btree_get_nonroot_node fs/nilfs2/btree.c:418 [inline] RIP: 0010:nilfs_btree_prepare_insert fs/nilfs2/btree.c:1077 [inline] RIP: 0010:nilfs_btree_insert+0x6d3/0x1c10 fs/nilfs2/btree.c:1238 Code: bc 24 80 00 00 00 4c 89 f8 48 c1 e8 03 42 80 3c 28 00 74 08 4c 89 ff e8 4b 02 92 fe 4d 8b 3f 49 83 c7 28 4c 89 f8 48 c1 e8 03 <42> 80 3c 28 00 74 08 4c 89 ff e8 2e 02 92 fe 4d 8b 3f 49 83 c7 02 ... Call Trace: <TASK> nilfs_bmap_do_insert fs/nilfs2/bmap.c:121 [inline] nilfs_bmap_insert+0x20d/0x360 fs/nilfs2/bmap.c:147 nilfs_get_block+0x414/0x8d0 fs/nilfs2/inode.c:101 __block_write_begin_int+0x54c/0x1a80 fs/buffer.c:1991 __block_write_begin fs/buffer.c:2041 [inline] block_write_begin+0x93/0x1e0 fs/buffer.c:2102 nilfs_write_begin+0x9c/0x110 fs/nilfs2/inode.c:261 generic_perform_write+0x2e4/0x5e0 mm/filemap.c:3772 __generic_file_write_iter+0x176/0x400 mm/filemap.c:3900 generic_file_write_iter+0xab/0x310 mm/filemap.c:3932 call_write_iter include/linux/fs.h:2186 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x7dc/0xc50 fs/read_write.c:584 ksys_write+0x177/0x2a0 fs/read_write.c:637 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd ... </TASK> This patch fixes the root cause of this problem by replacing the error code that __nilfs_btree_get_block() returns on block address conversion failure from -ENOENT to another internal code -EINVAL which means that the b-tree metadata is corrupted. By returning -EINVAL, it propagates without glitches, and for all relevant b-tree operations, functions in the upper bmap layer output an error message indicating corrupted b-tree metadata via nilfs_bmap_convert_error(), and code -EIO will be eventually returned as it should be. Link: https://lkml.kernel.org/r/000000000000bd89e205f0e38355@google.com Link: https://lkml.kernel.org/r/20230105055356.8811-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+ede796cecd5296353515@syzkaller.appspotmail.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-25treewide: Convert del_timer*() to timer_shutdown*()Steven Rostedt (Google)1-1/+1
Due to several bugs caused by timers being re-armed after they are shutdown and just before they are freed, a new state of timers was added called "shutdown". After a timer is set to this state, then it can no longer be re-armed. The following script was run to find all the trivial locations where del_timer() or del_timer_sync() is called in the same function that the object holding the timer is freed. It also ignores any locations where the timer->function is modified between the del_timer*() and the free(), as that is not considered a "trivial" case. This was created by using a coccinelle script and the following commands: $ cat timer.cocci @@ expression ptr, slab; identifier timer, rfield; @@ ( - del_timer(&ptr->timer); + timer_shutdown(&ptr->timer); | - del_timer_sync(&ptr->timer); + timer_shutdown_sync(&ptr->timer); ) ... when strict when != ptr->timer ( kfree_rcu(ptr, rfield); | kmem_cache_free(slab, ptr); | kfree(ptr); ) $ spatch timer.cocci . > /tmp/t.patch $ patch -p1 < /tmp/t.patch Link: https://lore.kernel.org/lkml/20221123201306.823305113@linutronix.de/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Pavel Machek <pavel@ucw.cz> [ LED ] Acked-by: Kalle Valo <kvalo@kernel.org> [ wireless ] Acked-by: Paolo Abeni <pabeni@redhat.com> [ networking ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-12-12Merge tag 'mm-nonmm-stable-2022-12-12' of ↵Linus Torvalds1-8/+65
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - A ptrace API cleanup series from Sergey Shtylyov - Fixes and cleanups for kexec from ye xingchen - nilfs2 updates from Ryusuke Konishi - squashfs feature work from Xiaoming Ni: permit configuration of the filesystem's compression concurrency from the mount command line - A series from Akinobu Mita which addresses bound checking errors when writing to debugfs files - A series from Yang Yingliang to address rapidio memory leaks - A series from Zheng Yejian to address possible overflow errors in encode_comp_t() - And a whole shower of singleton patches all over the place * tag 'mm-nonmm-stable-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (79 commits) ipc: fix memory leak in init_mqueue_fs() hfsplus: fix bug causing custom uid and gid being unable to be assigned with mount rapidio: devices: fix missing put_device in mport_cdev_open kcov: fix spelling typos in comments hfs: Fix OOB Write in hfs_asc2mac hfs: fix OOB Read in __hfs_brec_find relay: fix type mismatch when allocating memory in relay_create_buf() ocfs2: always read both high and low parts of dinode link count io-mapping: move some code within the include guarded section kernel: kcsan: kcsan_test: build without structleak plugin mailmap: update email for Iskren Chernev eventfd: change int to __u64 in eventfd_signal() ifndef CONFIG_EVENTFD rapidio: fix possible UAF when kfifo_alloc() fails relay: use strscpy() is more robust and safer cpumask: limit visibility of FORCE_NR_CPUS acct: fix potential integer overflow in encode_comp_t() acct: fix accuracy loss for input value of encode_comp_t() linux/init.h: include <linux/build_bug.h> and <linux/stringify.h> rapidio: rio: fix possible name leak in rio_register_mport() rapidio: fix possible name leaks when rio_add_device() fails ...
2022-11-30nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry()ZhangPeng1-0/+7
Syzbot reported a null-ptr-deref bug: NILFS (loop0): segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017] CPU: 1 PID: 3603 Comm: segctord Not tainted 6.1.0-rc2-syzkaller-00105-gb229b6ca5abb #0 Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 10/11/2022 RIP: 0010:nilfs_palloc_commit_free_entry+0xe5/0x6b0 fs/nilfs2/alloc.c:608 Code: 00 00 00 00 fc ff df 80 3c 02 00 0f 85 cd 05 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 73 08 49 8d 7e 10 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 26 05 00 00 49 8b 46 10 be a6 00 00 00 48 c7 c7 RSP: 0018:ffffc90003dff830 EFLAGS: 00010212 RAX: dffffc0000000000 RBX: ffff88802594e218 RCX: 000000000000000d RDX: 0000000000000002 RSI: 0000000000002000 RDI: 0000000000000010 RBP: ffff888071880222 R08: 0000000000000005 R09: 000000000000003f R10: 000000000000000d R11: 0000000000000000 R12: ffff888071880158 R13: ffff88802594e220 R14: 0000000000000000 R15: 0000000000000004 FS: 0000000000000000(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fb1c08316a8 CR3: 0000000018560000 CR4: 0000000000350ee0 Call Trace: <TASK> nilfs_dat_commit_free fs/nilfs2/dat.c:114 [inline] nilfs_dat_commit_end+0x464/0x5f0 fs/nilfs2/dat.c:193 nilfs_dat_commit_update+0x26/0x40 fs/nilfs2/dat.c:236 nilfs_btree_commit_update_v+0x87/0x4a0 fs/nilfs2/btree.c:1940 nilfs_btree_commit_propagate_v fs/nilfs2/btree.c:2016 [inline] nilfs_btree_propagate_v fs/nilfs2/btree.c:2046 [inline] nilfs_btree_propagate+0xa00/0xd60 fs/nilfs2/btree.c:2088 nilfs_bmap_propagate+0x73/0x170 fs/nilfs2/bmap.c:337 nilfs_collect_file_data+0x45/0xd0 fs/nilfs2/segment.c:568 nilfs_segctor_apply_buffers+0x14a/0x470 fs/nilfs2/segment.c:1018 nilfs_segctor_scan_file+0x3f4/0x6f0 fs/nilfs2/segment.c:1067 nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1197 [inline] nilfs_segctor_collect fs/nilfs2/segment.c:1503 [inline] nilfs_segctor_do_construct+0x12fc/0x6af0 fs/nilfs2/segment.c:2045 nilfs_segctor_construct+0x8e3/0xb30 fs/nilfs2/segment.c:2379 nilfs_segctor_thread_construct fs/nilfs2/segment.c:2487 [inline] nilfs_segctor_thread+0x3c3/0xf30 fs/nilfs2/segment.c:2570 kthread+0x2e4/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 </TASK> ... If DAT metadata file is corrupted on disk, there is a case where req->pr_desc_bh is NULL and blocknr is 0 at nilfs_dat_commit_end() during a b-tree operation that cascadingly updates ancestor nodes of the b-tree, because nilfs_dat_commit_alloc() for a lower level block can initialize the blocknr on the same DAT entry between nilfs_dat_prepare_end() and nilfs_dat_commit_end(). If this happens, nilfs_dat_commit_end() calls nilfs_dat_commit_free() without valid buffer heads in req->pr_desc_bh and req->pr_bitmap_bh, and causes the NULL pointer dereference above in nilfs_palloc_commit_free_entry() function, which leads to a crash. Fix this by adding a NULL check on req->pr_desc_bh and req->pr_bitmap_bh before nilfs_palloc_commit_free_entry() in nilfs_dat_commit_free(). This also calls nilfs_error() in that case to notify that there is a fatal flaw in the filesystem metadata and prevent further operations. Link: https://lkml.kernel.org/r/00000000000097c20205ebaea3d6@google.com Link: https://lkml.kernel.org/r/20221114040441.1649940-1-zhangpeng362@huawei.com Link: https://lkml.kernel.org/r/20221119120542.17204-1-konishi.ryusuke@gmail.com Signed-off-by: ZhangPeng <zhangpeng362@huawei.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+ebe05ee8e98f755f61d0@syzkaller.appspotmail.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-22nilfs2: fix nilfs_sufile_mark_dirty() not set segment usage as dirtyChen Zhongjin1-0/+8
When extending segments, nilfs_sufile_alloc() is called to get an unassigned segment, then mark it as dirty to avoid accidentally allocating the same segment in the future. But for some special cases such as a corrupted image it can be unreliable. If such corruption of the dirty state of the segment occurs, nilfs2 may reallocate a segment that is in use and pick the same segment for writing twice at the same time. This will cause the problem reported by syzkaller: https://syzkaller.appspot.com/bug?id=c7c4748e11ffcc367cef04f76e02e931833cbd24 This case started with segbuf1.segnum = 3, nextnum = 4 when constructed. It supposed segment 4 has already been allocated and marked as dirty. However the dirty state was corrupted and segment 4 usage was not dirty. For the first time nilfs_segctor_extend_segments() segment 4 was allocated again, which made segbuf2 and next segbuf3 had same segment 4. sb_getblk() will get same bh for segbuf2 and segbuf3, and this bh is added to both buffer lists of two segbuf. It makes the lists broken which causes NULL pointer dereference. Fix the problem by setting usage as dirty every time in nilfs_sufile_mark_dirty(), which is called during constructing current segment to be written out and before allocating next segment. [chenzhongjin@huawei.com: add lock protection per Ryusuke] Link: https://lkml.kernel.org/r/20221121091141.214703-1-chenzhongjin@huawei.com Link: https://lkml.kernel.org/r/20221118063304.140187-1-chenzhongjin@huawei.com Fixes: 9ff05123e3bf ("nilfs2: segment constructor") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Reported-by: <syzbot+77e4f0...@syzkaller.appspotmail.com> Reported-by: Liu Shixin <liushixin2@huawei.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-18nilfs2: fix shift-out-of-bounds due to too large exponent of block sizeRyusuke Konishi1-4/+38
If field s_log_block_size of superblock data is corrupted and too large, init_nilfs() and load_nilfs() still can trigger a shift-out-of-bounds warning followed by a kernel panic (if panic_on_warn is set): shift exponent 38973 is too large for 32-bit type 'int' Call Trace: <TASK> dump_stack_lvl+0xcd/0x134 ubsan_epilogue+0xb/0x50 __ubsan_handle_shift_out_of_bounds.cold.12+0x17b/0x1f5 init_nilfs.cold.11+0x18/0x1d [nilfs2] nilfs_mount+0x9b5/0x12b0 [nilfs2] ... This fixes the issue by adding and using a new helper function for getting block size with sanity check. Link: https://lkml.kernel.org/r/20221027044306.42774-3-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-18nilfs2: fix shift-out-of-bounds/overflow in nilfs_sb2_bad_offset()Ryusuke Konishi1-4/+27
Patch series "nilfs2: fix UBSAN shift-out-of-bounds warnings on mount time". The first patch fixes a bug reported by syzbot, and the second one fixes the remaining bug of the same kind. Although they are triggered by the same super block data anomaly, I divided it into the above two because the details of the issues and how to fix it are different. Both are required to eliminate the shift-out-of-bounds issues at mount time. This patch (of 2): If the block size exponent information written in an on-disk superblock is corrupted, nilfs_sb2_bad_offset helper function can trigger shift-out-of-bounds warning followed by a kernel panic (if panic_on_warn is set): shift exponent 38983 is too large for 64-bit type 'unsigned long long' Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106 ubsan_epilogue lib/ubsan.c:151 [inline] __ubsan_handle_shift_out_of_bounds+0x33d/0x3b0 lib/ubsan.c:322 nilfs_sb2_bad_offset fs/nilfs2/the_nilfs.c:449 [inline] nilfs_load_super_block+0xdf5/0xe00 fs/nilfs2/the_nilfs.c:523 init_nilfs+0xb7/0x7d0 fs/nilfs2/the_nilfs.c:577 nilfs_fill_super+0xb1/0x5d0 fs/nilfs2/super.c:1047 nilfs_mount+0x613/0x9b0 fs/nilfs2/super.c:1317 ... In addition, since nilfs_sb2_bad_offset() performs multiplication without considering the upper bound, the computation may overflow if the disk layout parameters are not normal. This fixes these issues by inserting preliminary sanity checks for those parameters and by converting the comparison from one involving multiplication and left bit-shifting to one using division and right bit-shifting. Link: https://lkml.kernel.org/r/20221027044306.42774-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20221027044306.42774-2-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+e91619dd4c11c4960706@syzkaller.appspotmail.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-08nilfs2: fix use-after-free bug of ns_writer on remountRyusuke Konishi2-9/+8
If a nilfs2 filesystem is downgraded to read-only due to metadata corruption on disk and is remounted read/write, or if emergency read-only remount is performed, detaching a log writer and synchronizing the filesystem can be done at the same time. In these cases, use-after-free of the log writer (hereinafter nilfs->ns_writer) can happen as shown in the scenario below: Task1 Task2 -------------------------------- ------------------------------ nilfs_construct_segment nilfs_segctor_sync init_wait init_waitqueue_entry add_wait_queue schedule nilfs_remount (R/W remount case) nilfs_attach_log_writer nilfs_detach_log_writer nilfs_segctor_destroy kfree finish_wait _raw_spin_lock_irqsave __raw_spin_lock_irqsave do_raw_spin_lock debug_spin_lock_before <-- use-after-free While Task1 is sleeping, nilfs->ns_writer is freed by Task2. After Task1 waked up, Task1 accesses nilfs->ns_writer which is already freed. This scenario diagram is based on the Shigeru Yoshida's post [1]. This patch fixes the issue by not detaching nilfs->ns_writer on remount so that this UAF race doesn't happen. Along with this change, this patch also inserts a few necessary read-only checks with superblock instance where only the ns_writer pointer was used to check if the filesystem is read-only. Link: https://syzkaller.appspot.com/bug?id=79a4c002e960419ca173d55e863bd09e8112df8b Link: https://lkml.kernel.org/r/20221103141759.1836312-1-syoshida@redhat.com [1] Link: https://lkml.kernel.org/r/20221104142959.28296-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+f816fa82f8783f7a02bb@syzkaller.appspotmail.com Reported-by: Shigeru Yoshida <syoshida@redhat.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-08nilfs2: fix deadlock in nilfs_count_free_blocks()Ryusuke Konishi1-2/+0
A semaphore deadlock can occur if nilfs_get_block() detects metadata corruption while locating data blocks and a superblock writeback occurs at the same time: task 1 task 2 ------ ------ * A file operation * nilfs_truncate() nilfs_get_block() down_read(rwsem A) <-- nilfs_bmap_lookup_contig() ... generic_shutdown_super() nilfs_put_super() * Prepare to write superblock * down_write(rwsem B) <-- nilfs_cleanup_super() * Detect b-tree corruption * nilfs_set_log_cursor() nilfs_bmap_convert_error() nilfs_count_free_blocks() __nilfs_error() down_read(rwsem A) <-- nilfs_set_error() down_write(rwsem B) <-- *** DEADLOCK *** Here, nilfs_get_block() readlocks rwsem A (= NILFS_MDT(dat_inode)->mi_sem) and then calls nilfs_bmap_lookup_contig(), but if it fails due to metadata corruption, __nilfs_error() is called from nilfs_bmap_convert_error() inside the lock section. Since __nilfs_error() calls nilfs_set_error() unless the filesystem is read-only and nilfs_set_error() attempts to writelock rwsem B (= nilfs->ns_sem) to write back superblock exclusively, hierarchical lock acquisition occurs in the order rwsem A -> rwsem B. Now, if another task starts updating the superblock, it may writelock rwsem B during the lock sequence above, and can deadlock trying to readlock rwsem A in nilfs_count_free_blocks(). However, there is actually no need to take rwsem A in nilfs_count_free_blocks() because it, within the lock section, only reads a single integer data on a shared struct with nilfs_sufile_get_ncleansegs(). This has been the case after commit aa474a220180 ("nilfs2: add local variable to cache the number of clean segments"), that is, even before this bug was introduced. So, this resolves the deadlock problem by just not taking the semaphore in nilfs_count_free_blocks(). Link: https://lkml.kernel.org/r/20221029044912.9139-1-konishi.ryusuke@gmail.com Fixes: e828949e5b42 ("nilfs2: call nilfs_error inside bmap routines") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+45d6ce7b7ad7ef455d03@syzkaller.appspotmail.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> [2.6.38+ Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-12Merge tag 'mm-hotfixes-stable-2022-10-11' of ↵Linus Torvalds2-5/+21
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "Five hotfixes - three for nilfs2, two for MM. For are cc:stable, one is not" * tag 'mm-hotfixes-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: nilfs2: fix leak of nilfs_root in case of writer thread creation failure nilfs2: fix NULL pointer dereference at nilfs_bmap_lookup_at_level() nilfs2: fix use-after-free bug of struct nilfs_root mm/damon/core: initialize damon_target->list in damon_new_target() mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page
2022-10-12Merge tag 'mm-nonmm-stable-2022-10-11' of ↵Linus Torvalds2-11/+13
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - hfs and hfsplus kmap API modernization (Fabio Francesco) - make crash-kexec work properly when invoked from an NMI-time panic (Valentin Schneider) - ntfs bugfixes (Hawkins Jiawei) - improve IPC msg scalability by replacing atomic_t's with percpu counters (Jiebin Sun) - nilfs2 cleanups (Minghao Chi) - lots of other single patches all over the tree! * tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits) include/linux/entry-common.h: remove has_signal comment of arch_do_signal_or_restart() prototype proc: test how it holds up with mapping'less process mailmap: update Frank Rowand email address ia64: mca: use strscpy() is more robust and safer init/Kconfig: fix unmet direct dependencies ia64: update config files nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure fork: remove duplicate included header files init/main.c: remove unnecessary (void*) conversions proc: mark more files as permanent nilfs2: remove the unneeded result variable nilfs2: delete unnecessary checks before brelse() checkpatch: warn for non-standard fixes tag style usr/gen_init_cpio.c: remove unnecessary -1 values from int file ipc/msg: mitigate the lock contention with percpu counter percpu: add percpu_counter_add_local and percpu_counter_sub_local fs/ocfs2: fix repeated words in comments relay: use kvcalloc to alloc page array in relay_alloc_page_array proc: make config PROC_CHILDREN depend on PROC_FS fs: uninline inode_maybe_inc_iversion() ...
2022-10-11nilfs2: fix leak of nilfs_root in case of writer thread creation failureRyusuke Konishi1-4/+3
If nilfs_attach_log_writer() failed to create a log writer thread, it frees a data structure of the log writer without any cleanup. After commit e912a5b66837 ("nilfs2: use root object to get ifile"), this causes a leak of struct nilfs_root, which started to leak an ifile metadata inode and a kobject on that struct. In addition, if the kernel is booted with panic_on_warn, the above ifile metadata inode leak will cause the following panic when the nilfs2 kernel module is removed: kmem_cache_destroy nilfs2_inode_cache: Slab cache still has objects when called from nilfs_destroy_cachep+0x16/0x3a [nilfs2] WARNING: CPU: 8 PID: 1464 at mm/slab_common.c:494 kmem_cache_destroy+0x138/0x140 ... RIP: 0010:kmem_cache_destroy+0x138/0x140 Code: 00 20 00 00 e8 a9 55 d8 ff e9 76 ff ff ff 48 8b 53 60 48 c7 c6 20 70 65 86 48 c7 c7 d8 69 9c 86 48 8b 4c 24 28 e8 ef 71 c7 00 <0f> 0b e9 53 ff ff ff c3 48 81 ff ff 0f 00 00 77 03 31 c0 c3 53 48 ... Call Trace: <TASK> ? nilfs_palloc_freev.cold.24+0x58/0x58 [nilfs2] nilfs_destroy_cachep+0x16/0x3a [nilfs2] exit_nilfs_fs+0xa/0x1b [nilfs2] __x64_sys_delete_module+0x1d9/0x3a0 ? __sanitizer_cov_trace_pc+0x1a/0x50 ? syscall_trace_enter.isra.19+0x119/0x190 do_syscall_64+0x34/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd ... </TASK> Kernel panic - not syncing: panic_on_warn set ... This patch fixes these issues by calling nilfs_detach_log_writer() cleanup function if spawning the log writer thread fails. Link: https://lkml.kernel.org/r/20221007085226.57667-1-konishi.ryusuke@gmail.com Fixes: e912a5b66837 ("nilfs2: use root object to get ifile") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+7381dc4ad60658ca4c05@syzkaller.appspotmail.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-11nilfs2: fix NULL pointer dereference at nilfs_bmap_lookup_at_level()Ryusuke Konishi1-0/+2
If the i_mode field in inode of metadata files is corrupted on disk, it can cause the initialization of bmap structure, which should have been called from nilfs_read_inode_common(), not to be called. This causes a lockdep warning followed by a NULL pointer dereference at nilfs_bmap_lookup_at_level(). This patch fixes these issues by adding a missing sanitiy check for the i_mode field of metadata file's inode. Link: https://lkml.kernel.org/r/20221002030804.29978-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+2b32eb36c1a825b7a74c@syzkaller.appspotmail.com Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-11nilfs2: fix use-after-free bug of struct nilfs_rootRyusuke Konishi1-1/+16
If the beginning of the inode bitmap area is corrupted on disk, an inode with the same inode number as the root inode can be allocated and fail soon after. In this case, the subsequent call to nilfs_clear_inode() on that bogus root inode will wrongly decrement the reference counter of struct nilfs_root, and this will erroneously free struct nilfs_root, causing kernel oopses. This fixes the problem by changing nilfs_new_inode() to skip reserved inode numbers while repairing the inode bitmap. Link: https://lkml.kernel.org/r/20221003150519.39789-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+b8c672b0e22615c80fe0@syzkaller.appspotmail.com Reported-by: Khalid Masum <khalid.masum.92@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-11nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failureRyusuke Konishi1-4/+10
If creation or finalization of a checkpoint fails due to anomalies in the checkpoint metadata on disk, a kernel warning is generated. This patch replaces the WARN_ONs by nilfs_error, so that a kernel, booted with panic_on_warn, does not panic. A nilfs_error is appropriate here to handle the abnormal filesystem condition. This also replaces the detected error codes with an I/O error so that neither of the internal error codes is returned to callers. Link: https://lkml.kernel.org/r/20220929123330.19658-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+fbb3e0b24e8dae5a16ee@syzkaller.appspotmail.com Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03nilfs2: remove the unneeded result variableye xingchen1-3/+1
Return the value nilfs_segctor_sync() directly instead of storing it in another redundant variable. Link: https://lkml.kernel.org/r/20220831033403.302184-1-ye.xingchen@zte.com.cn Link: https://lkml.kernel.org/r/20220921034803.2476-3-konishi.ryusuke@gmail.com Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Minghao Chi <chi.minghao@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03nilfs2: delete unnecessary checks before brelse()Minghao Chi1-4/+2
Patch series "nilfs2 minor amendments". This patch (of 2): The brelse() inline function tests whether its argument is NULL and then returns immediately. Thus remove the tests which are not needed around the shown calls. Link: https://lkml.kernel.org/r/20220921034803.2476-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20220819081700.96279-1-chi.minghao@zte.com.cn Link: https://lkml.kernel.org/r/20220921034803.2476-2-konishi.ryusuke@gmail.com Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: ye xingchen <ye.xingchen@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11nilfs2: convert nilfs_find_uncommited_extent() to use ↵Vishal Moola (Oracle)1-27/+18
filemap_get_folios_contig() Convert function to use folios throughout. This is in preparation for the removal of find_get_pages_contig(). Now also supports large folios. Also clean up an unnecessary if statement - pvec.pages[0]->index > index will always evaluate to false, and filemap_get_folios_contig() returns 0 if there is no folio found at index. Link: https://lkml.kernel.org/r/20220824004023.77310-6-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <clm@fb.com> Cc: David Sterba <dsterba@suse.com> Cc: David Sterba <dsterb@suse.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-03Merge tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecacheLinus Torvalds2-31/+31
Pull folio updates from Matthew Wilcox: - Fix an accounting bug that made NR_FILE_DIRTY grow without limit when running xfstests - Convert more of mpage to use folios - Remove add_to_page_cache() and add_to_page_cache_locked() - Convert find_get_pages_range() to filemap_get_folios() - Improvements to the read_cache_page() family of functions - Remove a few unnecessary checks of PageError - Some straightforward filesystem conversions to use folios - Split PageMovable users out from address_space_operations into their own movable_operations - Convert aops->migratepage to aops->migrate_folio - Remove nobh support (Christoph Hellwig) * tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache: (78 commits) fs: remove the NULL get_block case in mpage_writepages fs: don't call ->writepage from __mpage_writepage fs: remove the nobh helpers jfs: stop using the nobh helper ext2: remove nobh support ntfs3: refactor ntfs_writepages mm/folio-compat: Remove migration compatibility functions fs: Remove aops->migratepage() secretmem: Convert to migrate_folio hugetlb: Convert to migrate_folio aio: Convert to migrate_folio f2fs: Convert to filemap_migrate_folio() ubifs: Convert to filemap_migrate_folio() btrfs: Convert btrfs_migratepage to migrate_folio mm/migrate: Add filemap_migrate_folio() mm/migrate: Convert migrate_page() to migrate_folio() nfs: Convert to migrate_folio btrfs: Convert btree_migratepage to migrate_folio mm/migrate: Convert expected_page_refs() to folio_expected_refs() mm/migrate: Convert buffer_migrate_page() to buffer_migrate_folio() ...
2022-08-02Merge tag 'for-5.20/block-2022-07-29' of git://git.kernel.dk/linux-blockLinus Torvalds5-22/+22
Pull block updates from Jens Axboe: - Improve the type checking of request flags (Bart) - Ensure queue mapping for a single queues always picks the right queue (Bart) - Sanitize the io priority handling (Jan) - rq-qos race fix (Jinke) - Reserved tags handling improvements (John) - Separate memory alignment from file/disk offset aligment for O_DIRECT (Keith) - Add new ublk driver, userspace block driver using io_uring for communication with the userspace backend (Ming) - Use try_cmpxchg() to cleanup the code in various spots (Uros) - Finally remove bdevname() (Christoph) - Clean up the zoned device handling (Christoph) - Clean up independent access range support (Christoph) - Clean up and improve block sysfs handling (Christoph) - Clean up and improve teardown of block devices. This turns the usual two step process into something that is simpler to implement and handle in block drivers (Christoph) - Clean up chunk size handling (Christoph) - Misc cleanups and fixes (Bart, Bo, Dan, GuoYong, Jason, Keith, Liu, Ming, Sebastian, Yang, Ying) * tag 'for-5.20/block-2022-07-29' of git://git.kernel.dk/linux-block: (178 commits) ublk_drv: fix double shift bug ublk_drv: make sure that correct flags(features) returned to userspace ublk_drv: fix error handling of ublk_add_dev ublk_drv: fix lockdep warning block: remove __blk_get_queue block: call blk_mq_exit_queue from disk_release for never added disks blk-mq: fix error handling in __blk_mq_alloc_disk ublk: defer disk allocation ublk: rewrite ublk_ctrl_get_queue_affinity to not rely on hctx->cpumask ublk: fold __ublk_create_dev into ublk_ctrl_add_dev ublk: cleanup ublk_ctrl_uring_cmd ublk: simplify ublk_ch_open and ublk_ch_release ublk: remove the empty open and release block device operations ublk: remove UBLK_IO_F_PREFLUSH ublk: add a MAINTAINERS entry block: don't allow the same type rq_qos add more than once mmc: fix disk/queue leak in case of adding disk failure ublk_drv: fix an IS_ERR() vs NULL check ublk: remove UBLK_IO_F_INTEGRITY ublk_drv: remove unneeded semicolon ...
2022-07-14fs/nilfs2: Use the enum req_op and blk_opf_t typesBart Van Assche5-21/+21
Improve static type checking by using the enum req_op type for variables that represent a request operation and the new blk_opf_t type for variables that represent request flags. Combine the 'mode' and 'mode_flags' arguments of nilfs_btnode_submit_block into a single argument 'opf'. Reviewed-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-59-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14fs/buffer: Combine two submit_bh() and ll_rw_block() argumentsBart Van Assche3-3/+3
Both submit_bh() and ll_rw_block() accept a request operation type and request flags as their first two arguments. Micro-optimize these two functions by combining these first two arguments into a single argument. This patch does not change the behavior of any of the modified code. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jan Kara <jack@suse.cz> Acked-by: Song Liu <song@kernel.org> (for the md changes) Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-48-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-03nilfs2: fix incorrect masking of permission flags for symlinksRyusuke Konishi1-0/+3
The permission flags of newly created symlinks are wrongly dropped on nilfs2 with the current umask value even though symlinks should have 777 (rwxrwxrwx) permissions: $ umask 0022 $ touch file && ln -s file symlink; ls -l file symlink -rw-r--r--. 1 root root 0 Jun 23 16:29 file lrwxr-xr-x. 1 root root 4 Jun 23 16:29 symlink -> file This fixes the bug by inserting a missing check that excludes symlinks. Link: https://lkml.kernel.org/r/1655974441-5612-1-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: Tommy Pettersson <ptp@lysator.liu.se> Reported-by: Ciprian Craciun <ciprian.craciun@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-29nilfs2: Remove check for PageErrorMatthew Wilcox (Oracle)1-1/+1
If read_mapping_page() encounters an error, it returns an errno, not a page with PageError set, so this test is not needed. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-06-29nilfs2: Convert nilfs_copy_back_pages() to use filemap_get_folios()Matthew Wilcox (Oracle)1-30/+30
Use folios throughout. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-05-24Merge tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecacheLinus Torvalds2-15/+14
Pull page cache updates from Matthew Wilcox: - Appoint myself page cache maintainer - Fix how scsicam uses the page cache - Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS - Remove the AOP flags entirely - Remove pagecache_write_begin() and pagecache_write_end() - Documentation updates - Convert several address_space operations to use folios: - is_dirty_writeback - readpage becomes read_folio - releasepage becomes release_folio - freepage becomes free_folio - Change filler_t to require a struct file pointer be the first argument like ->read_folio * tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache: (107 commits) nilfs2: Fix some kernel-doc comments Appoint myself page cache maintainer fs: Remove aops->freepage secretmem: Convert to free_folio nfs: Convert to free_folio orangefs: Convert to free_folio fs: Add free_folio address space operation fs: Convert drop_buffers() to use a folio fs: Change try_to_free_buffers() to take a folio jbd2: Convert release_buffer_page() to use a folio jbd2: Convert jbd2_journal_try_to_free_buffers to take a folio reiserfs: Convert release_buffer_page() to use a folio fs: Remove last vestiges of releasepage ubifs: Convert to release_folio reiserfs: Convert to release_folio orangefs: Convert to release_folio ocfs2: Convert to release_folio nilfs2: Remove comment about releasepage nfs: Convert to release_folio jfs: Convert to release_folio ...
2022-05-12nilfs2: Fix some kernel-doc commentsYang Li1-6/+7
The description of @flags in nilfs_dirty_inode() kernel-doc comment is missing, and some functions had kernel-doc that used a hash instead of a colon to separate the parameter name from the one line description. Fix them to remove some warnings found by running scripts/kernel-doc, which is caused by using 'make W=1'. fs/nilfs2/inode.c:73: warning: Function parameter or member 'inode' not described in 'nilfs_get_block' fs/nilfs2/inode.c:73: warning: Function parameter or member 'blkoff' not described in 'nilfs_get_block' fs/nilfs2/inode.c:73: warning: Function parameter or member 'bh_result' not described in 'nilfs_get_block' fs/nilfs2/inode.c:73: warning: Function parameter or member 'create' not described in 'nilfs_get_block' fs/nilfs2/inode.c:145: warning: Function parameter or member 'file' not described in 'nilfs_readpage' fs/nilfs2/inode.c:145: warning: Function parameter or member 'page' not described in 'nilfs_readpage' fs/nilfs2/inode.c:968: warning: Function parameter or member 'flags' not described in 'nilfs_dirty_inode' Link: https://lkml.kernel.org/r/20220324024215.63479-1-yang.lee@linux.alibaba.com Link: https://lkml.kernel.org/r/1652276316-7791-1-git-send-email-konishi.ryusuke@gmail.com Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-05-09nilfs2: Remove comment about releasepageMatthew Wilcox (Oracle)1-1/+0
If we need a release_folio, we can add it back. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-05-09fs: Convert mpage_readpage to mpage_read_folioMatthew Wilcox (Oracle)1-5/+5
mpage_readpage still works in terms of pages, and has not been audited for correctness with large folios, so include an assertion that the filesystem is not passing it large folios. Convert all the filesystems to call mpage_read_folio() instead of mpage_readpage(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-05-08fs: Remove flags parameter from aops->write_beginMatthew Wilcox (Oracle)1-1/+1
There are no more aop flags left, so remove the parameter. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-05-08fs: Remove aop flags parameter from block_write_begin()Matthew Wilcox (Oracle)2-3/+2
There are no more aop flags left, so remove the parameter. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-04-17block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARDChristoph Hellwig2-4/+4
Secure erase is a very different operation from discard in that it is a data integrity operation vs hint. Fully split the limits and helper infrastructure to make the separation more clear. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> [nifs2] Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> [f2fs] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Chao Yu <chao@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-27-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17block: add a bdev_discard_granularity helperChristoph Hellwig1-2/+2
Abstract away implementation details from file systems by providing a block_device based helper to retrieve the discard granularity. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Link: https://lore.kernel.org/r/20220415045258.199825-26-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17block: remove QUEUE_FLAG_DISCARDChristoph Hellwig1-1/+1
Just use a non-zero max_discard_sectors as an indicator for discard support, similar to what is done for write zeroes. The only places where needs special attention is the RAID5 driver, which must clear discard support for security reasons by default, even if the default stacking rules would allow for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Jan Höppner <hoeppner@linux.ibm.com> [s390] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-25-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-01nilfs2: get rid of nilfs_mapping_init()Ryusuke Konishi2-10/+0
After applying the lockdep warning fixes, nilfs_mapping_init() is no longer used, so delete it. Link: https://lkml.kernel.org/r/1647867427-30498-4-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hao Sun <sunhao.th@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-01nilfs2: fix lockdep warnings during disk space reclamationRyusuke Konishi5-21/+92
During disk space reclamation, nilfs2 still emits the following lockdep warning due to page/folio operations on shadowed page caches that nilfs2 uses to get a snapshot of DAT file in memory: WARNING: CPU: 0 PID: 2643 at include/linux/backing-dev.h:272 __folio_mark_dirty+0x645/0x670 ... RIP: 0010:__folio_mark_dirty+0x645/0x670 ... Call Trace: filemap_dirty_folio+0x74/0xd0 __set_page_dirty_nobuffers+0x85/0xb0 nilfs_copy_dirty_pages+0x288/0x510 [nilfs2] nilfs_mdt_save_to_shadow_map+0x50/0xe0 [nilfs2] nilfs_clean_segments+0xee/0x5d0 [nilfs2] nilfs_ioctl_clean_segments.isra.19+0xb08/0xf40 [nilfs2] nilfs_ioctl+0xc52/0xfb0 [nilfs2] __x64_sys_ioctl+0x11d/0x170 This fixes the remaining warning by using inode objects to hold those page caches. Link: https://lkml.kernel.org/r/1647867427-30498-3-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Hao Sun <sunhao.th@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-01nilfs2: fix lockdep warnings in page operations for btree nodesRyusuke Konishi10-50/+154
Patch series "nilfs2 lockdep warning fixes". The first two are to resolve the lockdep warning issue, and the last one is the accompanying cleanup and low priority. Based on your comment, this series solves the issue by separating inode object as needed. Since I was worried about the impact of the object composition changes, I tested the series carefully not to cause regressions especially for delicate functions such like disk space reclamation and snapshots. This patch (of 3): If CONFIG_LOCKDEP is enabled, nilfs2 hits lockdep warnings at inode_to_wb() during page/folio operations for btree nodes: WARNING: CPU: 0 PID: 6575 at include/linux/backing-dev.h:269 inode_to_wb include/linux/backing-dev.h:269 [inline] WARNING: CPU: 0 PID: 6575 at include/linux/backing-dev.h:269 folio_account_dirtied mm/page-writeback.c:2460 [inline] WARNING: CPU: 0 PID: 6575 at include/linux/backing-dev.h:269 __folio_mark_dirty+0xa7c/0xe30 mm/page-writeback.c:2509 Modules linked in: ... RIP: 0010:inode_to_wb include/linux/backing-dev.h:269 [inline] RIP: 0010:folio_account_dirtied mm/page-writeback.c:2460 [inline] RIP: 0010:__folio_mark_dirty+0xa7c/0xe30 mm/page-writeback.c:2509 ... Call Trace: __set_page_dirty include/linux/pagemap.h:834 [inline] mark_buffer_dirty+0x4e6/0x650 fs/buffer.c:1145 nilfs_btree_propagate_p fs/nilfs2/btree.c:1889 [inline] nilfs_btree_propagate+0x4ae/0xea0 fs/nilfs2/btree.c:2085 nilfs_bmap_propagate+0x73/0x170 fs/nilfs2/bmap.c:337 nilfs_collect_dat_data+0x45/0xd0 fs/nilfs2/segment.c:625 nilfs_segctor_apply_buffers+0x14a/0x470 fs/nilfs2/segment.c:1009 nilfs_segctor_scan_file+0x47a/0x700 fs/nilfs2/segment.c:1048 nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1224 [inline] nilfs_segctor_collect fs/nilfs2/segment.c:1494 [inline] nilfs_segctor_do_construct+0x14f3/0x6c60 fs/nilfs2/segment.c:2036 nilfs_segctor_construct+0x7a7/0xb30 fs/nilfs2/segment.c:2372 nilfs_segctor_thread_construct fs/nilfs2/segment.c:2480 [inline] nilfs_segctor_thread+0x3c3/0xf90 fs/nilfs2/segment.c:2563 kthread+0x405/0x4f0 kernel/kthread.c:327 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 This is because nilfs2 uses two page caches for each inode and inode->i_mapping never points to one of them, the btree node cache. This causes inode_to_wb(inode) to refer to a different page cache than the caller page/folio operations such like __folio_start_writeback(), __folio_end_writeback(), or __folio_mark_dirty() acquired the lock. This patch resolves the issue by allocating and using an additional inode to hold the page cache of btree nodes. The inode is attached one-to-one to the traditional nilfs2 inode if it requires a block mapping with b-tree. This setup change is in memory only and does not affect the disk format. Link: https://lkml.kernel.org/r/1647867427-30498-1-git-send-email-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/1647867427-30498-2-git-send-email-konishi.ryusuke@gmail.com Link: https://lore.kernel.org/r/YXrYvIo8YRnAOJCj@casper.infradead.org Link: https://lore.kernel.org/r/9a20b33d-b38f-b4a2-4742-c1eb5b8e4d6c@redhat.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+0d5b462a6f07447991b3@syzkaller.appspotmail.com Reported-by: syzbot+34ef28bb2aeb28724aa0@syzkaller.appspotmail.com Reported-by: Hao Sun <sunhao.th@gmail.com> Reported-by: David Hildenbrand <david@redhat.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22Merge tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecacheLinus Torvalds2-22/+21
Pull filesystem folio updates from Matthew Wilcox: "Primarily this series converts some of the address_space operations to take a folio instead of a page. Notably: - a_ops->is_partially_uptodate() takes a folio instead of a page and changes the type of the 'from' and 'count' arguments to make it obvious they're bytes. - a_ops->invalidatepage() becomes ->invalidate_folio() and has a similar type change. - a_ops->launder_page() becomes ->launder_folio() - a_ops->set_page_dirty() becomes ->dirty_folio() and adds the address_space as an argument. There are a couple of other misc changes up front that weren't worth separating into their own pull request" * tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache: (53 commits) fs: Remove aops ->set_page_dirty fb_defio: Use noop_dirty_folio() fs: Convert __set_page_dirty_no_writeback to noop_dirty_folio fs: Convert __set_page_dirty_buffers to block_dirty_folio nilfs: Convert nilfs_set_page_dirty() to nilfs_dirty_folio() mm: Convert swap_set_page_dirty() to swap_dirty_folio() ubifs: Convert ubifs_set_page_dirty to ubifs_dirty_folio f2fs: Convert f2fs_set_node_page_dirty to f2fs_dirty_node_folio f2fs: Convert f2fs_set_data_page_dirty to f2fs_dirty_data_folio f2fs: Convert f2fs_set_meta_page_dirty to f2fs_dirty_meta_folio afs: Convert afs_dir_set_page_dirty() to afs_dir_dirty_folio() btrfs: Convert extent_range_redirty_for_io() to use folios fs: Convert trivial uses of __set_page_dirty_nobuffers to filemap_dirty_folio btrfs: Convert from set_page_dirty to dirty_folio fscache: Convert fscache_set_page_dirty() to fscache_dirty_folio() fs: Add aops->dirty_folio fs: Remove aops->launder_page orangefs: Convert launder_page to launder_folio nfs: Convert from launder_page to launder_folio fuse: Convert from launder_page to launder_folio ...
2022-03-22Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-17/+1
Merge updates from Andrew Morton: - A few misc subsystems: kthread, scripts, ntfs, ocfs2, block, and vfs - Most the MM patches which precede the patches in Willy's tree: kasan, pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap, sparsemem, vmalloc, pagealloc, memory-failure, mlock, hugetlb, userfaultfd, vmscan, compaction, mempolicy, oom-kill, migration, thp, cma, autonuma, psi, ksm, page-poison, madvise, memory-hotplug, rmap, zswap, uaccess, ioremap, highmem, cleanups, kfence, hmm, and damon. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (227 commits) mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release() Docs/ABI/testing: add DAMON sysfs interface ABI document Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface selftests/damon: add a test for DAMON sysfs interface mm/damon/sysfs: support DAMOS stats mm/damon/sysfs: support DAMOS watermarks mm/damon/sysfs: support schemes prioritization mm/damon/sysfs: support DAMOS quotas mm/damon/sysfs: support DAMON-based Operation Schemes mm/damon/sysfs: support the physical address space monitoring mm/damon/sysfs: link DAMON for virtual address spaces monitoring mm/damon: implement a minimal stub for sysfs-based DAMON interface mm/damon/core: add number of each enum type values mm/damon/core: allow non-exclusive DAMON start/stop Docs/damon: update outdated term 'regions update interval' Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling Docs/vm/damon: call low level monitoring primitives the operations mm/damon: remove unnecessary CONFIG_DAMON option mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}() mm/damon/dbgfs-test: fix is_target_id() change ...
2022-03-22fs: allocate inode by using alloc_inode_sb()Muchun Song1-1/+1
The inode allocation is supposed to use alloc_inode_sb(), so convert kmem_cache_alloc() of all filesystems to alloc_inode_sb(). Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Theodore Ts'o <tytso@mit.edu> [ext4] Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Alex Shi <alexs@kernel.org> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kari Argillander <kari.argillander@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22remove bdi_congested() and wb_congested() and related functionsNeilBrown1-16/+0
These functions are no longer useful as no BDIs report congestions any more. Removing the test on bdi_write_contested() in current_may_throttle() could cause a small change in behaviour, but only when PF_LOCAL_THROTTLE is set. So replace the calls by 'false' and simplify the code - and remove the functions. [akpm@linux-foundation.org: fix build] Link: https://lkml.kernel.org/r/164549983742.9187.2570198746005819592.stgit@noble.brown Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> [nilfs] Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Lars Ellenberg <lars.ellenberg@linbit.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Paolo Valente <paolo.valente@linaro.org> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-16fs: Convert __set_page_dirty_buffers to block_dirty_folioMatthew Wilcox (Oracle)1-2/+2
Convert all callers; mostly this is just changing the aops to point at it, but a few implementations need a little more work. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs
2022-03-16nilfs: Convert nilfs_set_page_dirty() to nilfs_dirty_folio()Matthew Wilcox (Oracle)1-20/+18
The comment about the page always being locked is wrong, so copy the locking protection from __set_page_dirty_buffers(). That means moving the call to nilfs_set_file_dirty() down the function so as to not acquire a new dependency between the mapping->private_lock and the ns_inode_lock. That might be a harmless dependency to add, but it's not necessary. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs
2022-03-15fs: Turn block_invalidatepage into block_invalidate_folioMatthew Wilcox (Oracle)2-1/+2
Remove special-casing of a NULL invalidatepage, since there is no more block_invalidatepage. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs
2022-02-27nilfs2: pass the operation to bio_allocChristoph Hellwig1-11/+9
Refactor the segbuf write code to pass the op to bio_alloc instead of setting it just before the submission. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Link: https://lore.kernel.org/r/20220222154634.597067-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02block: pass a block_device and opf to bio_allocChristoph Hellwig1-2/+2
Pass the block_device and operation that we plan to use this bio for to bio_alloc to optimize the assignment. NULL/0 can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Also move the gfp_mask argument after the nr_vecs argument for a much more logical calling convention matching what most of the kernel does. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-18-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02nilfs2: remove nilfs_alloc_seg_bioChristoph Hellwig1-27/+4
bio_alloc will never fail when it can sleep. Remove the now simple nilfs_alloc_seg_bio helper and open code it in the only caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220124091107.642561-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-20Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-2/+2
Merge more updates from Andrew Morton: "55 patches. Subsystems affected by this patch series: percpu, procfs, sysctl, misc, core-kernel, get_maintainer, lib, checkpatch, binfmt, nilfs2, hfs, fat, adfs, panic, delayacct, kconfig, kcov, and ubsan" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (55 commits) lib: remove redundant assignment to variable ret ubsan: remove CONFIG_UBSAN_OBJECT_SIZE kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KB btrfs: use generic Kconfig option for 256kB page size limit arch/Kconfig: split PAGE_SIZE_LESS_THAN_256KB from PAGE_SIZE_LESS_THAN_64KB configs: introduce debug.config for CI-like setup delayacct: track delays from memory compact Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct compact delayacct: cleanup flags in struct task_delay_info and functions use it delayacct: fix incomplete disable operation when switch enable to disable delayacct: support swapin delay accounting for swapping without blkio panic: remove oops_id panic: use error_report_end tracepoint on warnings fs/adfs: remove unneeded variable make code cleaner FAT: use io_schedule_timeout() instead of congestion_wait() hfsplus: use struct_group_attr() for memcpy() region nilfs2: remove redundant pointer sbufs fs/binfmt_elf: use PT_LOAD p_align values for static PIE const_structs.checkpatch: add frequently used ops structs ...
2022-01-20nilfs2: remove redundant pointer sbufsColin Ian King1-2/+2
Pointer sbufs is being assigned a value but it's not being used later on. The pointer is redundant and can be removed. Cleans up scan-build static analysis warning: fs/nilfs2/page.c:203:8: warning: Although the value stored to 'sbufs' is used in the enclosing expression, the value is never actually read from 'sbufs' [deadcode.DeadStores] sbh = sbufs = page_buffers(src); Link: https://lkml.kernel.org/r/20211211180955.550380-1-colin.i.king@gmail.com Link: https://lkml.kernel.org/r/1640712476-15136-1-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>