Age | Commit message (Collapse) | Author | Files | Lines |
|
Update all the necessary files for a 6.7.0 release.
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
HDIO_GETGEO has been around longer than XFS.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
SG_IO has been around longer than XFS.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
fstatat has been supported since Linux 2.6.16 and glibc 2.4.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
openat has been supported since Linux 2.6.16 and glibc 2.4.
Note that xfs_db already uses it without the ifdef.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
The f_flags field has been supported since Linux 2.6.36.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
fsetxattr has been supported since Linux 2.4 and glibc 2.3.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
mremap has been around since before the dawn of it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
preadv and pwritev have been supported since Linux 2.6.30.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
syncfs has been supported since Linux 2.6.39.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
fallocate has been supported since Linux 2.6.23 and glibc 2.10.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
fls should never be provided by system headers. It seems like on MacOS
it did, but as we're not supporting MacOS anymore there is no need to
check for it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
readdir has been part of Posix since the very beginning.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
sync_file_range has been supported since Linux 2.6.17.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
fiemap has been supported since Linux 2.6.28.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
mincore has been supported since Linux 2.3.99pre1 and glibc 2.2.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
madvise has been supported since before the dawn of it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
sendfile has been supported since Linux 2.2 and glibc 2.1.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
fadvise has been supported since Linux 2.5.60 and glibc 2.2.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
We can't support block device access (which is the reason for xfsprogs
to exist) without blkid. Make it a hard requirement and remove the
stubs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
getmntent always exists on Linux (and always has), so don't bother
checking for it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
HAVE_SEEK_DATA is never defined, so the code in xfs_io just
unconditionally redefines SEEK_DATA and SEEK_HOLE. Switch to the
system version instead, which has been around since Linux 3.1 and
glibc of similar vintage.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Now that the sizeof checks are gone, we can stop generating platform_defs.h.
The only caveat is that we need to stop undefining ENABLE_GETTEXT, which the
generation process had removed before. The actual ENABLE_GETTEXT will be
passd on the compiler command line, just like other ENABLE or HAVE values
from autoconf.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
SIZEOF_LONG together with the unused SIZEOF_CHAR_P is the last thing that
really needs a generated configuration header. Switch to just using
sizeof(long) so that we can stop generating platform_defs.h.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Check the 32-bit limits using sizeof instead of cpp ifdefs so that we
can get rid of BITS_PER_LONG.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
No system or kernel uapi header defines umode_t, so just define it
unconditionally.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Neither struct filldir, nor filldir_t is used anywhere in xfsprogs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
On a disk with 4096-byte LBAs, the xfs_db 'type data' subcommand doesn't
work:
# xfs_io -c 'sb' -c 'type data' /dev/sda
xfs_db: read failed: Invalid argument
no current object
The cause of this is the hardcoded initialization of bb_count when we're
setting type data -- it should be the filesystem sector size, not just 1.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Section 4.9 of the Debian Policy reads:
"The package build should be as verbose as reasonably possible,
except where the terse tag is included in DEB_BUILD_OPTIONS".
Implement such behavior for xfsprogs by passing V=1 to make by default.
Link: https://www.debian.org/doc/debian-policy/ch-source.html#main-building-script-debian-rules
Link: https://bugs.debian.org/1063774
Reported-by: Emanuele Rocca <ema@debian.org>
Signed-off-by: Bastian Germann <bage@debian.org>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Suggested by Darrick during LFS review. We take the same approach as in
5c0599b721d1d232d2e400f357abdf2736f24a97 ('Fix building xfsprogs on 32-bit platforms')
to avoid autoconf hell - just take the tried & tested approach which is working
fine for us with LFS already.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sam James <sam@gentoo.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
We now require (at least) 64-bit time_t, so we need to adjust some printf
specifiers accordingly.
Unfortunately, we've stumbled upon a ridiculous C mmoment whereby there's
no neat format specifier (not even one of the inttypes ones) for time_t, so
we cast to intmax_t and use %jd.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sam James <sam@gentoo.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
LFS64 interfaces are non-standard and are being removed in the upcoming musl
1.2.5. Setting _FILE_OFFSET_BITS=64 (which is currently being done) makes all
interfaces on glibc 64-bit by default, so using the LFS64 interfaces is
redundant. This commit replaces all occurences of off64_t with off_t,
stat64 with stat, and fstat64 with fstat.
Link: https://bugs.gentoo.org/907039
Cc: Felix Janda <felix.janda@posteo.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Violet Purcell <vimproved@inventati.org>
Signed-off-by: Sam James <sam@gentoo.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 038ca189c0d2c1570b4d922f25b524007c85cf94
Discovered when trying to track down a weird recovery corruption
issue that wasn't detected at recovery time.
The specific corruption was a zero extent count field when big
extent counts are in use, and it turns out the dinode verifier
doesn't detect that specific corruption case, either. So fix it too.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: f63a5b3769ad7659da4c0420751d78958ab97675
We've been seeing XFS errors like the following:
XFS: Internal error i != 1 at line 3526 of file fs/xfs/libxfs/xfs_btree.c. Caller xfs_btree_insert+0x1ec/0x280
...
Call Trace:
xfs_corruption_error+0x94/0xa0
xfs_btree_insert+0x221/0x280
xfs_alloc_fixup_trees+0x104/0x3e0
xfs_alloc_ag_vextent_size+0x667/0x820
xfs_alloc_fix_freelist+0x5d9/0x750
xfs_free_extent_fix_freelist+0x65/0xa0
__xfs_free_extent+0x57/0x180
...
This is the XFS_IS_CORRUPT() check in xfs_btree_insert() when
xfs_btree_insrec() fails.
After converting this into a panic and dissecting the core dump, I found
that xfs_btree_insrec() is failing because it's trying to split a leaf
node in the cntbt when the AG free list is empty. In particular, it's
failing to get a block from the AGFL _while trying to refill the AGFL_.
If a single operation splits every level of the bnobt and the cntbt (and
the rmapbt if it is enabled) at once, the free list will be empty. Then,
when the next operation tries to refill the free list, it allocates
space. If the allocation does not use a full extent, it will need to
insert records for the remaining space in the bnobt and cntbt. And if
those new records go in full leaves, the leaves (and potentially more
nodes up to the old root) need to be split.
Fix it by accounting for the additional splits that may be required to
refill the free list in the calculation for the minimum free list size.
P.S. As far as I can tell, this bug has existed for a long time -- maybe
back to xfs-history commit afdf80ae7405 ("Add XFS_AG_MAXLEVELS macros
...") in April 1994! It requires a very unlucky sequence of events, and
in fact we didn't hit it until a particular sparse mmap workload updated
from 5.12 to 5.19. But this bug existed in 5.12, so it must've been
exposed by some other change in allocation or writeback patterns. It's
also much less likely to be hit with the rmapbt enabled, since that
increases the minimum free list size and is unlikely to split at the
same time as the bnobt and cntbt.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: f8f9d952e42dd49ae534f61f2fa7ca0876cb9848
When recovering intents, we capture newly created intent items as part of
point, we forget to remove those newly created intent items from the AIL
and hang:
[root@localhost ~]# cat /proc/539/stack
[<0>] xfs_ail_push_all_sync+0x174/0x230
[<0>] xfs_unmount_flush_inodes+0x8d/0xd0
[<0>] xfs_mountfs+0x15f7/0x1e70
[<0>] xfs_fs_fill_super+0x10ec/0x1b20
[<0>] get_tree_bdev+0x3c8/0x730
[<0>] vfs_get_tree+0x89/0x2c0
[<0>] path_mount+0xecf/0x1800
[<0>] do_mount+0xf3/0x110
[<0>] __x64_sys_mount+0x154/0x1f0
[<0>] do_syscall_64+0x39/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
When newly created intent items fail to commit via transaction, intent
recovery hasn't created done items for these newly created intent items,
so the capture structure is the sole owner of the captured intent items.
We must release them explicitly or else they leak:
unreferenced object 0xffff888016719108 (size 432):
comm "mount", pid 529, jiffies 4294706839 (age 144.463s)
hex dump (first 32 bytes):
08 91 71 16 80 88 ff ff 08 91 71 16 80 88 ff ff ..q.......q.....
18 91 71 16 80 88 ff ff 18 91 71 16 80 88 ff ff ..q.......q.....
backtrace:
[<ffffffff8230c68f>] xfs_efi_init+0x18f/0x1d0
[<ffffffff8230c720>] xfs_extent_free_create_intent+0x50/0x150
[<ffffffff821b671a>] xfs_defer_create_intents+0x16a/0x340
[<ffffffff821bac3e>] xfs_defer_ops_capture_and_commit+0x8e/0xad0
[<ffffffff82322bb9>] xfs_cui_item_recover+0x819/0x980
[<ffffffff823289b6>] xlog_recover_process_intents+0x246/0xb70
[<ffffffff8233249a>] xlog_recover_finish+0x8a/0x9a0
[<ffffffff822eeafb>] xfs_log_mount_finish+0x2bb/0x4a0
[<ffffffff822c0f4f>] xfs_mountfs+0x14bf/0x1e70
[<ffffffff822d1f80>] xfs_fs_fill_super+0x10d0/0x1b20
[<ffffffff81a21fa2>] get_tree_bdev+0x3d2/0x6d0
[<ffffffff81a1ee09>] vfs_get_tree+0x89/0x2c0
[<ffffffff81a9f35f>] path_mount+0xecf/0x1800
[<ffffffff81a9fd83>] do_mount+0xf3/0x110
[<ffffffff81aa00e4>] __x64_sys_mount+0x154/0x1f0
[<ffffffff83968739>] do_syscall_64+0x39/0x80
Fix the problem above by abort intent items that don't have a done item
when recovery intents fail.
Fixes: e6fff81e4870 ("xfs: proper replay of deferred ops queued during log recovery")
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 2a5db859c6825b5d50377dda9c3cc729c20cad43
Factor out xfs_defer_pending_abort() from xfs_defer_trans_abort(), which
not use transaction parameter, so it can be used after the transaction
life cycle.
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: e23aaf450de733044a74bc95528f728478b61c2a
In commit 355e3532132b ("xfs: cache minimum realtime summary level"), I
added a cache of the minimum level of the realtime summary that has any
free extents. However, it turns out that the _maximum_ level is more
useful for upcoming optimizations, and basically equivalent for the
existing usage. So, let's change the meaning of the cache to be the
maximum level + 1, or 0 if there are no free extents.
For example, if the cache contains:
{0, 4}
then there are no free extents starting in realtime bitmap block 0, and
there are no free extents larger than or equal to 2^4 blocks starting in
realtime bitmap block 1. The cache is a loose upper bound, so there may
or may not be free extents smaller than 2^4 blocks in realtime bitmap
block 1.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: e2cf427c91494ea0d1173a911090c39665c5fdef
Simplify the calling convention of these functions since the
xfs_rtalloc_args structure contains the parameters we need.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 5b1d0ae9753f0654ab56c1e06155b3abf2919d71
Now that xfs_rtalloc_args holds references to the last-read bitmap and
summary blocks, we don't need to pass the buffer pointer out of
xfs_rtbuf_get.
Callers no longer have to xfs_trans_brelse on their own, though they are
required to call xfs_rtbuf_cache_relse before the xfs_rtalloc_args goes
out of scope.
While we're at it, create some trivial helpers so that we don't have to
remember if "0" means "bitmap" and "1" means "summary".
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: e94b53ff699c2674a9ec083342a5254866210ade
Profiling a workload on a highly fragmented realtime device showed a ton
of CPU cycles being spent in xfs_trans_read_buf() called by
xfs_rtbuf_get(). Further tracing showed that much of that was repeated
calls to xfs_rtbuf_get() for the same block of the realtime bitmap.
These come from xfs_rtallocate_extent_block(): as it walks through
ranges of free bits in the bitmap, each call to xfs_rtcheck_range() and
xfs_rtfind_{forw,back}() gets the same bitmap block. If the bitmap block
is very fragmented, then this is _a lot_ of buffer lookups.
The realtime allocator already passes around a cache of the last used
realtime summary block to avoid repeated reads (the parameters rbpp and
rsb). We can do the same for the realtime bitmap.
This replaces rbpp and rsb with a struct xfs_rtbuf_cache, which caches
the most recently used block for both the realtime bitmap and summary.
xfs_rtbuf_get() now handles the caching instead of the callers, which
requires plumbing xfs_rtbuf_cache to more functions but also makes sure
we don't miss anything.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 41f33d82cfd310e344fc9183f02cc9e0d2d27663
Consolidate the arguments passed around the rt allocator into a
struct xfs_rtalloc_arg similar to how the btree allocator arguments
are consolidated in a struct xfs_alloc_arg....
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 663b8db7b0256b81152b2f786e45ecf12bdf265f
Create get and set functions for rtsummary words so that we can redefine
the ondisk format with a specific endianness. Note that this requires
the definition of a distinct type for ondisk summary info words so that
the compiler can perform proper typechecking.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: bd85af280de66a946022775a876edf0c553e3f35
Create helper functions that compute the number of blocks or words
necessary to store the rt summary file.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 97e993830a1cdd86ad7d207308b9f55a00660edd
Create get and set functions for rtbitmap words so that we can redefine
the ondisk format with a specific endianness. Note that this requires
the definition of a distinct type for ondisk rtbitmap words so that the
compiler can perform proper typechecking as we go back and forth.
In the upcoming rtgroups feature, we're going to fix the problem that
rtwords are written in host endian order, which means we'll need the
distinct rtword/rtword_raw types.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 312d61021b8947446aa9ec80b78b9230e8cb3691
Create an explicit helper function to log parts of rt bitmap and summary
blocks. While we're at it, fix an off-by-one error in two of the
rtbitmap logging calls that led to unnecessarily large log items but was
otherwise benign.
Note that the upcoming rtgroups patchset will add block headers to the
rtbitmap and rtsummary files. The helpers in this and the next few
patches take a less than direct route through xfs_rbmblock_wordptr and
xfs_rsumblock_infoptr to avoid helper churn in that patchset.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: d0448fe76ac1a9ccbce574577a4c82246d17eec4
Create helper functions that compute the number of blocks or words
necessary to store the rt bitmap.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 75d1e312bbbd175fa27ffdd4c4fe9e8cc7d047ec
Convert to using the new inode timestamp accessor functions.
[Carlos: Also partially port 077c212f0344ae and 12cd4402365166]
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20231004185347.80880-75-jlayton@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 097b4b7b64ef67a4703b89fd4064480b61557fd5
Convert the realtime summary file macros to helper functions so that we
can improve type checking.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: a9948626849c2c65dfd201b5e9d855e62937de61
There are a bunch of places where we use open-coded logic to find a
pointer to an xfs_rtword_t within a rt bitmap buffer. Convert all that
to helper functions for better type safety.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: add3cddaea509071d01bf1d34df0d05db1a93a07
Remove these trivial macros since they're not even part of the ondisk
format.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 90d98a6ada1da0f8797ff3f5adafd175dd8c0a81
Replace these macros with typechecked helper functions. Eventually
we're going to add more logic to the helpers and it'll be easier if we
don't have to macro it up.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: ef5a83b7e597038d1c734ddb4bc00638082c2bf1
Avoid the costs of integer division (32-bit and 64-bit) if the realtime
extent size is a power of two.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 5f57f7309d9ab9d24d50c5707472b1ed8af4eabc
Create a pair of functions to round rtblock numbers up or down to the
nearest rt extent.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 055641248f649b52620a5fe8774bea253690e057
Convert these calls to use the helpers, and clean up all these places
where the same variable can have different units depending on where it
is in the function.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 5dc3a80d46a450481df7f7e9fe673ba3eb4514c3
Create helpers to do unit conversions of rt block numbers to rt extent
numbers. There are three variations -- one to compute the rt extent
number from an rt block number; one to compute the offset of an rt block
within an rt extent; and one to extract both.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 2c2b981b737a519907429f62148bbd9e40e01132
Create a helper to compute the realtime extent (xfs_rtxlen_t) from an
extent length (xfs_extlen_t) value.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 68db60bf01c131c09bbe35adf43bd957a4c124bc
Create a helper to compute the misalignment between a file extent
(xfs_extlen_t) and a realtime extent.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: fa5a387230861116c2434c20d29fc4b3fd077d24
Create a helper to convert a realtime extent to a realtime block. Later
on we'll change the helper to use bit shifts when possible.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 2d5f216b77e33f9b503bd42998271da35d4b7055
Further disambiguate the xfs_rtblock_t uses by creating a new type,
xfs_rtxnum_t, to store the position of an extent within the realtime
section, in units of rtextents.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 3d2b6d034f0feb7741b313f978a2fe45e917e1be
This helper function validates that a range of *blocks* in the
realtime section is completely contained within the realtime section.
It does /not/ validate ranges of *rtextents*. Rename the function to
avoid suggesting that it does, and change the type of the @len parameter
since xfs_rtblock_t is a position unit, not a length unit.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: f29c3e745dc253bf9d9d06ddc36af1a534ba1dd0
XFS uses xfs_rtblock_t for many different uses, which makes it much more
difficult to perform a unit analysis on the codebase. One of these
(ab)uses is when we need to store the length of a free space extent as
stored in the realtime bitmap. Because there can be up to 2^64 realtime
extents in a filesystem, we need a new type that is larger than
xfs_rtxlen_t for callers that are querying the bitmap directly. This
means scrub and growfs.
Create this type as "xfs_rtbxlen_t" and use it to store 64-bit rtx
lengths. 'b' stands for 'bitmap' or 'big'; reader's choice.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 03f4de332e2e79db36ed2156fb2350480f142bec
We should use xfs_fileoff_t to store the file block offset of any
location within the realtime bitmap or summary files.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: a684c538bc14410565e8939393089670fa1e19dd
In most of the filesystem, we use xfs_extlen_t to store the length of a
file (or AG) space mapping in units of fs blocks. Unfortunately, the
realtime allocator also uses it to store the length of a rt space
mapping in units of rt extents. This is confusing, since one rt extent
can consist of many fs blocks.
Separate the two by introducing a new type (xfs_rtxlen_t) to store the
length of a space mapping (in units of realtime extents) that would be
found in a file.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 13928113fc5b5e79c91796290a99ed991ac0efe2
Move all the declarations for functionality in xfs_rtbitmap.c into a
separate xfs_rtbitmap.h header file.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: ddd98076d5c075c8a6c49d9e6e8ee12844137f23
The unit conversions in this function do not make sense. First we
convert a block count to bytes, then divide that bytes value by
rextsize, which is in blocks, to get an rt extent count. You can't
divide bytes by blocks to get a (possibly multiblock) extent value.
Fortunately nobody uses delalloc on the rt volume so this hasn't
mattered.
Fixes: fa5c836ca8eb5 ("xfs: refactor xfs_bunmapi_cow")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 6c664484337b37fa0cf6e958f4019623e30d40f7
Currently, xfs_bmap_del_extent_real contains a bunch of code to convert
the physical extent of a data fork mapping for a realtime file into rt
extents and pass that to the rt extent freeing function. Since the
details of this aren't needed when CONFIG_XFS_REALTIME=n, move it to
xfs_rtbitmap.c to reduce code size when realtime isn't enabled.
This will (one day) enable realtime EFIs to reuse the same
unit-converting call with less code duplication.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 9488062805943c2d63350d3ef9e4dc093799789a
The latest version of the fs geometry structure is v5. Bump this
constant so that xfs_db and mkfs calls to libxfs_fs_geometry will fill
out all the fields.
IOWs, this commit is a no-op for the kernel, but will be useful for
userspace reporting in later changes.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Update all the necessary files for a 6.6.0 release.
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev into for-next
xfs_scrub_all: fixes for systemd services [v28.3 5/6]
This patchset ties up some problems in the xfs_scrub_all program and
service, which are essential for finding mounted filesystems to scrub
and creating the background service instances that do the scrub.
First, we need to fix various errors in pathname escaping, because
systemd does /not/ like slashes in service names. Then, teach
xfs_scrub_all to deal with systemd restarts causing it to think that a
scrub has finished before the service actually finishes. Finally,
implement a signal handler so that SIGINT (console ^C) and SIGTERM
(systemd stopping the service) shut down the xfs_scrub@ services
correctly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev into for-next
xfs_scrub: fixes for systemd services [v28.3 4/6]
This series fixes deficiencies in the systemd services that were created
to manage background scans. First, improve the debian packaging so that
services get installed at package install time. Next, fix copyright and
spdx header omissions.
Finally, fix bugs in the mailer scripts so that scrub failures are
reported effectively. Finally, fix xfs_scrub_all to deal with systemd
restarts causing it to think that a scrub has finished before the
service actually finishes.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev into for-next
xfs_scrub: fixes to the repair code [v28.3 3/6]
Now that we've landed the new kernel code, it's time to reorganize the
xfs_scrub code that handles repairs. Clean up various naming warts and
misleading error messages. Move the repair code to scrub/repair.c as
the first step. Then, fix various issues in the repair code before we
start reorganizing things.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev into for-next
xfs_scrub: fix licensing and copyright notices [v28.3 2/6]
Fix various attribution problems in the xfs_scrub source code, such as
the author's contact information, out of date SPDX tags, and a rough
estimate of when the feature was under heavy development. The most
egregious parts are the files that are missing license information
completely.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev into for-next
xfsprogs: various bug fixes for 6.6 [1/6]
This series fixes a couple of bugs that I found in the userspace support
libraries.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev into for-next
xfsprogs: various bug fixes for 6.6 [1/8]
This series fixes a couple of bugs that I found in the userspace support
libraries.
This has been running on the djcloud for months with no problems. Enjoy!
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Currently, xfs_scrub_all does not handle termination signals well.
SIGTERM and SIGINT are left to their default handlers, which are
immediate termination of the process group in the case of SIGTERM and
raising KeyboardInterrupt in the case of SIGINT.
Terminating the process group is fine when the xfs_scrub processes are
direct children, but this completely doesn't work if we're farming the
work out to systemd services since we don't terminate the child service.
Instead, they keep going.
Raising KeyboardInterrupt doesn't work because once the main thread
calls sys.exit at the bottom of main(), it blocks in the python runtime
waiting for child threads to terminate. There's no longer any context
to handle an exception, so the signal is ignored and no child processes
are killed.
In other words, if you try to kill a running xfs_scrub_all, chances are
good it won't kill the child xfs_scrub processes. This is undesirable
and egregious since we actually have the ability to track and kill all
the subprocesses that we create.
Solve the subproblem of getting stuck in the python runtime by calling
it repeatedly until we no longer have subprocesses. This means that the
main thread loops until all threads have exited.
Solve the subproblem of the signals doing the wrong thing by setting up
our own signal handler that can wake up the main thread and initiate
subprocess shutdown, no matter whether the subprocesses are systemd
services or directly fork/exec'd.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
cron jobs don't belong in /usr/lib. Since the cron job is also
secondary to the systemd timer, it's really only provided as a courtesy
for distributions that don't use systemd. Move it to @datadir@, aka
/usr/share/xfsprogs.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Neal Gompa <neal@gompa.dev>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Get rid of the nested lambda functions to simplify the code.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Per FHS 3.0, non-PATH executable binaries are supposed to live under
/usr/libexec, not /usr/lib. xfs_scrub_fail is an executable script,
so move it to libexec in case some distro some day tries to mount
/usr/lib as noexec or something.
Link: https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch04s07.html
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Neal Gompa <neal@gompa.dev>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
If xfs_scrub_all detects a running systemd, it will use it to invoke
xfs_scrub subprocesses in a sandboxed and resource-controlled
environment. Unfortunately, if you happen to restart dbus or systemd
while it's running, you get this:
systemd[1]: Reexecuting.
xfs_scrub_all[9958]: Warning! D-Bus connection terminated.
xfs_scrub_all[9956]: Warning! D-Bus connection terminated.
xfs_scrub_all[9956]: Failed to wait for response: Connection reset by peer
xfs_scrub_all[9958]: Failed to wait for response: Connection reset by peer
xfs_scrub_all[9930]: Scrubbing / done, (err=1)
xfs_scrub_all[9930]: Scrubbing /storage done, (err=1)
The xfs_scrub units themselves are still running, it's just that the
`systemctl start' command that xfs_scrub_all uses to start and wait for
the unit lost its connection to dbus and hence is no longer monitoring
sub-services.
When this happens, we don't have great options -- systemctl doesn't have
a command to wait on an activating (aka running) unit. Emulate the
functionality we normally get by polling the failed/active statuses.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Advise recipients of the service failure emails that they should not try
to reply to the automated service message.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Currently, xfs_scrub_all will try to invoke xfs_scrub with argv[1] being
"-n -x". This of course is recognized by C getopt as a weird looking
string, not two individual arguments, and causes the child process to
exit with complaints about CLI usage.
What we really want is to split the string into a proper array and then
add them to the xfs_scrub command line. The code here isn't strictly
correct, but as @scrub_args@ is controlled by us in the Makefile, it'll
do for now.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Add content type and encoding metadata so that these emails display
correctly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
We should return the exit code of the mailer program sending the scrub
failure reports, since that's much more important to anyone watching the
system.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
On filesystems that don't have the reverse mapping feature enabled, the
GETFSMAP call cannot tell us much about the owner of a space extent --
we're limited to static fs metadata, free space, or "unknown". In this
case, nothing is corrupt, so str_corrupt is not an appropriate logging
function. Relax this to str_info so that the user sees a notice that
media errors have been found so that the user knows something bad
happened even if the directory tree walker cannot find the file owning
the space where the media error was found.
Filesystems with rmap enabled are never supposed to return OWN_UNKNOWN
from a GETFSMAP report, so continue to report that as a corruption.
This fixes a regression in xfs/556.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Update the copyright years in the scrub/ source code files. This isn't
required, but it's helpful to remind myself just how long it's taken to
develop this feature.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
This script emails the results of failed scrub runs to root. We
shouldn't be hardcoding the path to the mailer program because distros
can change the path according to their whim. Modify this script to use
command -v to find the program.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Make sure we flush stdout after printf'ing to it, especially before we
start any operation that could take a while to complete. Most of scrub
already does this, but we missed a couple of spots.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
These files are missing the required SPDX license and copyright
information. Add them.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
systemd services provide an "instance name" that can be associated with
a particular invocation of a service. This allows service users to
invoke multiple copies of a service, each with a unique string. For
xfs_scrub, we pass the mountpoint of the filesystem as the instance
name. However, systemd services aren't supposed to have slashes in
them, so we're supposed to escape them.
The canonical escaping scheme for pathnames is defined by the
systemd-escape --path command. Unfortunately, we've been adding our own
opinionated sauce for years, to work around the fact that --path didn't
exist in systemd before January 2017. The special sauce is incorrect,
and we no longer care about systemd of 7 years past.
Clean up this mess by following the systemd escaping scheme throughout
the service units. Now we can use the '%f' specifier in them, which
makes things a lot less complicated.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Fix the spdx tags to match current practice, and update the author
contact information.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
This program is not consistent as to whether or not it escapes the
pathname that is being used as the xfs_scrub service instance name.
Fix it to be consistent, and to fall back to direct invocation if
escaping doesn't work. The escaping itself is also broken, but we'll
fix that in the next patch.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Use dh_installsystemd to handle the installation and activation of the
scrub systemd services. This requires bumping the compat version to 11.
Note that the services are /not/ activated on installation.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
A recent refactoring to xfs_idata_realloc in the kernel made it depend
on krealloc returning NULL if the new size is zero. The xfsprogs
wrapper instead aborts, so we need to make it follow the kernel
behavior.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
|
|
When db is reporting on an io cursor, have it print out the device
that the cursor is pointing to.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Now that we have a FORCE_REBUILD flag to the scrub ioctl, try to use
that over the (much noisier) error injection knob, which may or may not
even be enabled in the kernel config.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Not sure why block device targets don't get O_DIRECT in !buffered mode,
but it's misleading when the copy completes instantly only to stall
forever due to fsync-on-close. Adjust the "write last sector" code to
allocate a properly aligned buffer.
In removing the onstack buffer for EOD writes, this also corrects the
buffer being larger than necessary -- the old code declared an array of
32768 pointers, whereas all we really need is an aligned 32768-byte
buffer.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
If the kernel says it doesn't support optimizing a data structure, we
should mark it done and move on. This is much better than requeuing the
repair, in which case it will likely keep failing. Eventually these
requeued repairs end up in the single-threaded last resort at the end of
phase 4, which makes things /very/ slow.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Detect short writes to the end of the destination device and report
them.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Coverity reminded me that the pthread_cond_wait can wake up and return
without the predicate variable (sft.nr_dirs > 0) actually changing.
Therefore, one has to retest the condition after each wakeup.
Coverity-id: 1554280
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Add CLI options to the scrubv and repair commands so that the user can
pass FORCE_REBUILD to force the kernel to rebuild metadata.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Break out the parts of parse_args that extract control numbers from the
CLI arguments, so that the function isn't as long. This isn't all that
exciting now, but the scrub vectorization speedups will introduce a new
ioctl. For the new command that comes with that, we'll want the control
number parsing helpers.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Simply the call chain by having parse_args set the scrub ioctl
parameters in the caller's object. The parse_args callers can then
invoke the ioctl directly, eliminating one function and one indirect
call.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Set exitcode to 1 if there is an error parsing the CLI arguments to the
scrub or repair commands, like we do most other places in xfs_io.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Now that we've fixed the dissimilarities between the two progress
printing callsites, refactor them into helpers. Do the same for the
duplicate code that clears sb_inprogress from the primary superblock
after the copy succeeds.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Currently, the progress reporting only triggers when the number of bytes
read is exactly a multiple of a megabyte. This isn't always guaranteed,
since AG headers can be 512 bytes in size. Fix the algorithm by
recording the number of megabytes we've reported as being read, and emit
a new report any time the bytes_read count, once converted to megabytes,
doesn't match.
Fix the v2 code to emit one final status message in case the last
extent restored is more than a megabyte.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chandan Babu R <chandanbabu@kernel.org>
|
|
Fix this check to look at the correct header field.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chandan Babu R <chandanbabu@kernel.org>
|
|
Spit out a newline after a fatal error message.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chandan Babu R <chandanbabu@kernel.org>
|
|
Coverity complained about the "is fd a file?" flags being uninitialized.
Clean this up.
Coverity-id: 1554270
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chandan Babu R <chandanbabu@kernel.org>
|
|
Update the documentation to reflect that we can metadump external log
device contents.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chandan Babu R <chandanbabu@kernel.org>
|
|
In the kernel, commit 8ebbf262d4684 ("xfs: don't block in busy flushing
when freeing extents") changed the allocator behavior such that AGFL
fixing can return -EAGAIN in response to detection of a deadlock with
the transaction busy extent list. If this happens, we're supposed to
requeue the EFI so that we can roll the transaction and try the item
again.
If a requeue happens, we should not free the xefi pointer in
xfs_extent_free_finish_item or else the retry will walk off a dangling
pointer. There is no extent busy list in userspace so this should
never happen, but let's fix the logic bomb anyway.
We should have ported kernel commit 0853b5de42b47 ("xfs: allow extent
free intents to be retried") to userspace, but neither Carlos nor I
noticed this fine detail. :(
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chandan Babu R <chandanbabu@kernel.org>
|
|
We want to keep the rtgroup unit conversion functions as static inlines,
so share the div64 functions via libfrog instead of libxfs_priv.h.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chandan Babu R <chandanbabu@kernel.org>
|
|
Most of the content of libxfs_init is members duplicated for each of the
data, log and RT devices. Split those members into a separate
libxfs_dev structure.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Cache the open file descriptor for each device in the buftarg
structure and remove the now unused dev_map infrastructure.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
A few places in xfs_repair call libxfs_device_to_fd to get the data
device fd from the data device dev_t stored in the libxfs_init
structure. Just use the file descriptor stored right there directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
No need to do a dev_t to fd lookup when the caller already has the fd.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
So that the caller can stash it away without having to call
xfs_device_to_fd.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
libxfs_device_open and libxfs_device_close are only used in init.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
libxfs_init initializes the device size to 0 at the start of the function
and libxfs_open_device never sets the size to a negativ value. Remove
these checks as they are dead code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
platform_set_blocksize has a fatal argument that is currently only
used to change the printed message. Make it actually fatal similar to
other libfrog platform helpers to simplify the caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
All callers of libxfs_init always pass an actual sector size or zero in
the setblksize member. Remove the unreachable setblksize == 1 case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
The libxfs_xinit stucture has four different ways to pass flags to
libxfs_init:
- the isreadonly argument despite it's name contains various LIBXFS_
flags that go beyond just the readonly flag
- the isdirect flag contains a single LIBXFS_ flag from the same name
- the usebuflock is an integer used as bool
- the bcache_flags member is used to pass flags directly to cache_init()
for the buffer cache
While there is good arguments for keeping the last one separate, all the
others are rather confusing. Consolidate them into a single flags member
using flags in the LIBXFS_* namespace.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
The only special handling for an XFS device on a regular file is that
we skip the checks in check_open. Simplify perform those conditionally
instead of duplicating the entire sequence.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Pass a libxfs_init structure to libxfs_alloc_buftarg instead of three
separate dev_t values.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Pass a libxfs_init structure to libxfs_mount instead of three separate
dev_t values.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Make the struct name more usual, and remove the libxfs_init_t typedef.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
There is no need to export a libxfs_xinit with the somewhat unsuitable
name x from libxlog. Move it into the tools linking against libxlog
that actually need it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
xlog_init currently requires a libxfs_args structure to be passed in,
and then clobbers various log-related arguments to it. There is no
good reason for that as all the required information can be calculated
without it.
Remove the x argument to xlog_init and xlog_is_dirty and the now unused
logBBstart member in struct libxfs_xinit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
xfsprogs has three copies of a code sequence to initialize an xlog
structure from a libxfs_init structure. Factor the code into a helper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
No caller passes a non-zero verbose argument to xlog_is_dirty.
Remove the argument the code keyed off by it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Isolate the code that sets up the fake xlog into the logstat() helper to
prepare for upcoming changes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
IRIX has the concept of a volume that has data/log/rt subvolumes (that's
where the subvolume name in Linux comes from), but in the current
Linux-only xfsprogs version trying to pretend we do anything with that
it is just utterly confusing. The volname is basically just a very
obsfucated second way to pass the data device name, so get rid of it
in the libxfs and progs internals.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Stop pretending we try to distinguish between the legacy Unix raw and
block devices nodes. Linux as the only currently support platform never
had them, but other modern Unix variants like FreeBSD also got rid of
this distinction years ago.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
These variables are only initialized, and then unlink is called if they
were changed from the initial value, which can't happen. Remove the
variables and the conditional unlink calls.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Add an '-s' option to the 'set_encpolicy' command of xfs_io to allow
exercising the log2_data_unit_size field that is being added to struct
fscrypt_policy_v2 (kernel patch:
https://lore.kernel.org/linux-fscrypt/20230925055451.59499-6-ebiggers@kernel.org).
The xfs_io support is needed for xfstests
(https://lore.kernel.org/fstests/20231013061403.138425-1-ebiggers@kernel.org),
which currently relies on xfs_io to access the encryption ioctls.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Carlos Maiolino <cem@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
metadump v2 format allows dumping metadata from external log devices. This
commit allows passing the device file to which log data must be restored from
the corresponding metadump file.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
This commit adds functionality to restore metadump stored in v2 format.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
A future commit will need to perform the device size verification on an
external log device. In preparation for this, this commit extracts the
relevant portions into a new function. No functional changes have been
introduced.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
In order to indicate the version of metadump files that they can work with,
this commit renames read_header(), show_info() and restore() functions to
read_header_v1(), show_info_v1() and restore_v1() respectively.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
We will need two variants of read_header(), show_info() and restore() helper
functions to support two versions of metadump formats. To this end, A future
commit will introduce a vector of function pointers to work with the two
metadump formats. To have a common function signature for the function
pointers, this commit replaces the first argument of the previously listed
function pointers from "struct xfs_metablock *" with "union
mdrestore_headers *".
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
This commit moves functionality associated with opening the target device,
reading metadump header information and printing information about the
metadump into their respective functions. There are no functional changes made
by this commit.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
In order to support both v1 and v2 versions of metadump, mdrestore will have
to detect the format in which the metadump file has been stored on the disk
and then read the ondisk structures accordingly. In a step in that direction,
this commit splits the work of reading the metadump header from disk into two
parts,
1. Read the first 4 bytes containing the metadump magic code.
2. Read the remaining part of the header.
A future commit will take appropriate action based on the value of the magic
code.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
This commit collects all state tracking variables in a new "struct mdrestore"
structure. This is done to collect all the global variables in one place
rather than having them spread across the file. A new structure member of type
"struct mdrestore_ops *" will be added by a future commit to support the two
versions of metadump.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
This commit introduces a new function set_log_cur() allowing xfs_db to read
from an external log device. This is required by a future commit which will
add the ability to dump metadata from external log devices.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
This commit adds functionality to dump metadata from an XFS filesystem in
newly introduced v2 format.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
The corresponding metadump file's disk layout is as shown below,
|------------------------------|
| struct xfs_metadump_header |
|------------------------------|
| struct xfs_meta_extent 0 |
| Extent 0's data |
| struct xfs_meta_extent 1 |
| Extent 1's data |
| ... |
| struct xfs_meta_extent (n-1) |
| Extent (n-1)'s data |
|------------------------------|
The "struct xfs_metadump_header" is followed by alternating series of "struct
xfs_meta_extent" and the extent itself.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
This commit moves functionality associated with writing metadump to disk into
a new function. It also renames metadump initialization, write and release
functions to reflect the fact that they work with v1 metadump files.
The metadump initialization, write and release functions are now invoked via
metadump_ops->init(), metadump_ops->write() and metadump_ops->release()
respectively.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
We will need two sets of functions to implement two versions of metadump. This
commit adds the definition for 'struct metadump_ops' to hold pointers to
version specific metadump functions.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
The metadump v2 initialization function (introduced in a later commit) writes
the header structure into the metadump file. This will require the program to
open the metadump file before the initialization function has been invoked.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Move metadump initialization and release functionality into corresponding
functions. There are no functional changes made in this commit.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
This commit collects all state tracking variables in a new "struct metadump"
structure. This is done to collect all the global variables in one place
rather than having them spread across the file. A new structure member of type
"struct metadump_ops *" will be added by a future commit to support the two
versions of metadump.
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
The device size verification code should be writing XFS_MAX_SECTORSIZE bytes
to the end of the device rather than "sizeof(char *) * XFS_MAX_SECTORSIZE"
bytes.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
search_rt_dup_extent expects an RT extent number and not a fsbno.
Convert the units before the call. Without this we are unlikely
to ever found a legit duplicate extent on the RT subvolume because
the search will always be off the end.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
When user have mounted an XFS volume, and defined project in
/etc/projects file that points to a directory on a different volume,
then:
`xfs_quota -xc "report -a" $path_to_mounted_volume'
complains with:
"xfs_quota: cannot find mount point for path \
`directory_from_projects': Invalid argument"
unlike `xfs_quota -xc "report -a"' which works as expected and no
warning is printed.
This is happening because in the 1st call we pass to xfs_quota command
the $path_to_mounted_volume argument which says to xfs_quota not to
look for all mounted volumes on the system, but use only those passed
to the command and ignore all others (This behavior is intended as an
optimization for systems with huge number of mounted volumes). After
that, while projects are initialized, the project's directories on
other volumes are obviously not in searched subset of volumes and
warning is printed.
I propose to fix this behavior by conditioning the printing of warning
only if all mounted volumes are searched.
Signed-off-by: Pavel Reichl <preichl@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Clean up the code in hash.c to use the normal char type for all
high-level code, only casting to uint8_t when calling into low-level
code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Signed-off-by: Jakub Bogusz <qboosh@pld-linux.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 6868b8505c807ad9397d78cc4e07cb1cb3582152
If we reduce the number of blocks in an AG, we must update the incore
geometry values as well.
Fixes: 0800169e3e2c9 ("xfs: Pre-calculate per-AG agbno geometry")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: f798accd5987dc2280e0ba9055edf1124af46a5f
This reverts commit e44df2664746aed8b6dd5245eb711a0ce33c5cf5.
Users reported regressions due to enabling multi-grained timestamps
unconditionally. As no clear consensus on a solution has come up and the
discussion has gone back to the drawing board revert the infrastructure
changes for. If it isn't code that's here to stay, make it go away.
Message-ID: <20230920-keine-eile-c9755b5825db@brauner>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 74ad4693b6473950e971b3dc525b5ee7570e05d0
Log recovery has always run on read only mounts, even where the primary
superblock advertises unknown rocompat bits. Due to a misunderstanding
between Eric and Darrick back in 2018, we accidentally changed the
superblock write verifier to shutdown the fs over that exact scenario.
As a result, the log cleaning that occurs at the end of the mounting
process fails if there are unknown rocompat bits set.
As we now allow writing of the superblock if there are unknown rocompat
bits set on a RO mount, we no longer want to turn off RO state to allow
log recovery to succeed on a RO mount. Hence we also remove all the
(now unnecessary) RO state toggling from the log recovery path.
Fixes: 9e037cb7972f ("xfs: check for unknown v5 feature bits in superblock write verifier"
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: e44df2664746aed8b6dd5245eb711a0ce33c5cf5
Enable multigrain timestamps, which should ensure that there is an
apparent change to the timestamp whenever it has been written after
being actively observed via getattr.
Also, anytime the mtime changes, the ctime must also change, and those
are now the only two options for xfs_trans_ichgtime. Have that function
unconditionally bump the ctime, and ASSERT that XFS_ICHGTIME_CHG is
always set.
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Message-Id: <20230807-mgctime-v7-11-d1dec143a704@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 5c83df2e54b6af870e3e02ccd2a8ecd54e36668c
Add a new (superuser-only) flag to the online metadata repair ioctl to
force it to rebuild structures, even if they're not broken. We will use
this to move metadata structures out of the way during a free space
defragmentation operation.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: a0a415e34b57368acd262e1172720252c028b936
In later patches, we're going to change how the inode's ctime field is
used. Switch to using accessor functions instead of raw accesses of
inode->i_ctime.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230705190309.579783-80-jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Update all the necessary files for a 6.5.0 release.
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
CRC selftests running on a build host were useful long time ago, when
CRC support was added to the on-disk support. Now it's purpose is
replaced by fstests. Also mkfs.xfs and xfs_repair have their own
selftests.
On top of that, it adds a dependency on liburcu on the build host for
no reason - liburcu is not used by the crc32 selftest.
Signed-off-by: Krzesimir Nowak <knowak@microsoft.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
xfsprogs during compilation tries to detect if liburcu supports atomic
64-bit ops on the platform it is being compiled on, and if not it falls
back to using pthread mutex locks.
The detection logic for that fallback relies on _uatomic_link_error()
which is a link-time trick used by liburcu that will cause compilation
errors on archs that lack the required support. That only works for the
generic liburcu code though, and it is not implemented for the
x86-specific code.
In practice this means that when xfsprogs is compiled on 32-bit x86
archs will successfully link to liburcu for atomic ops, but liburcu does
not support atomic64_t on those archs. It indicates this during runtime
by generating an illegal instruction that aborts execution, and thus
causes various xfsprogs utils to be segfaulting.
Fix this by requiring that unsigned longs are at least 64 bits in size,
which /usually/ means that 64-bit atomic counters are supported. We
can't simply execute the liburcu atomic64_t detection code during
configure instead of only relying on the linker error because that
doesn't work for cross-compiled packages.
Fixes: 7448af588a2e ("libxfs: fix atomic64_t poorly for 32-bit architectures")
Reported-by: Anthony Iliopoulos <ailiop@suse.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Ever since commit b42db0860e130 ("xfs: enhance dinode verifier"), we've
required that inodes with zero di_forkoff must also have di_aformat ==
EXTENTS and di_naextents == 0. clear_dinode_attr actually does this,
but then both callers inexplicably set di_format = LOCAL. That in turn
causes a verifier failure the next time the xattrs of that file are
read by the kernel. Get rid of the bogus field write.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Actually return the error code when extended attribute checks fail.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Use this flag to check that newly allocated inodes are, in fact,
unallocated. This matches the kernel, and prevents userspace programs
from making latent corruptions worse by unintentionally crosslinking
files.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
This flag is perfectly acceptable for bulkstatting a single file;
there's no reason not to allow it.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
I discovered the following bad behavior in the workqueue code when I
noticed that xfs_scrub was running single-threaded despite having 4
virtual CPUs allocated to the VM. I observed this sequence:
Thread 1 WQ1 WQ2...N
workqueue_create
<start up>
pthread_cond_wait
<start up>
pthread_cond_wait
workqueue_add
next_item == NULL
pthread_cond_signal
workqueue_add
next_item != NULL
<do not pthread_cond_signal>
<receives wakeup>
<run first item>
workqueue_add
next_item != NULL
<do not pthread_cond_signal>
<run second item>
<run third item>
pthread_cond_wait
workqueue_terminate
pthread_cond_broadcast
<receives wakeup>
<nothing to do, exits>
<wakes up again>
<nothing to do, exits>
Notice how threads WQ2...N are completely idle while WQ1 ends up doing
all the work! That wasn't the point of a worker pool! Observe that
thread 1 manages to queue two work items before WQ1 pulls the first item
off the queue. When thread 1 queues the third item, it sees that
next_item is not NULL, so it doesn't wake a worker. If thread 1 queues
all the N work that it has before WQ1 empties the queue, then none of
the other thread get woken up.
Fix this by maintaining a count of the number of active threads, and
using that to wake either the sole idle thread, or all the threads if
there are many that are idle. This dramatically improves startup
behavior of the workqueue and eliminates the collapse case.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
XFS and tools (mkfs, copy, repair) don't generally rely on the block
device page cache, preferring instead to use directio. For whatever
reason, the debugger was never made to do this, but let's do that now.
This should eliminate the weird fstests failures resulting from
udev/blkid pinning a cache page while the unmounting filesystem writes
to the superblock such that xfs_db finds the stale pagecache instead of
the post-unmount superblock.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
If we're accessing the block device with directio (and hence bypassing
the page cache), then don't fail on BLKBSZSET not working. We don't
care what happens to the pagecache bufferheads.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Enable 64-bit extent counts and reverse mapping for 6.6.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Now that online fsck is feature complete, there's actually a compelling
story for having the reverse mappings enabled. Turn it on by default.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Format filesystems with the large extent counter feature turned on.
We shall now support 64-bit extent counts for the data fork and 32-bit
extent counts for the attr fork.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Create an expert-mode debugger command to create unlinked inodes.
This will hopefully aid in simulation of leaked unlinked inode handling
in the kernel and elsewhere.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Create a new command to dump the resource usage of files in the unlinked
buckets.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: f6250e205691a58c81be041b1809a2e706852641
As of 6.5-rc1, UBSAN trips over the ondisk extended attribute shortform
definitions using an array length of 1 to pretend to be a flex array.
Kernel compilers have to support unbounded array declarations, so let's
correct this.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: a49bbce58ea90b14d4cb1d00681023a8606955f2
As of 6.5-rc1, UBSAN trips over the ondisk extended attribute leaf block
definitions using an array length of 1 to pretend to be a flex array.
Kernel compilers have to support unbounded array declarations, so let's
correct this.
================================================================================
UBSAN: array-index-out-of-bounds in fs/xfs/libxfs/xfs_attr_leaf.c:2535:24
index 2 is out of range for type '__u8 [1]'
Call Trace:
<TASK>
dump_stack_lvl+0x33/0x50
__ubsan_handle_out_of_bounds+0x9c/0xd0
xfs_attr3_leaf_getvalue+0x2ce/0x2e0 [xfs 4a986a89a77bb77402ab8a87a37da369ef6a3f09]
xfs_attr_leaf_get+0x148/0x1c0 [xfs 4a986a89a77bb77402ab8a87a37da369ef6a3f09]
xfs_attr_get_ilocked+0xae/0x110 [xfs 4a986a89a77bb77402ab8a87a37da369ef6a3f09]
xfs_attr_get+0xee/0x150 [xfs 4a986a89a77bb77402ab8a87a37da369ef6a3f09]
xfs_xattr_get+0x7d/0xc0 [xfs 4a986a89a77bb77402ab8a87a37da369ef6a3f09]
__vfs_getxattr+0xa3/0x100
vfs_getxattr+0x87/0x1d0
do_getxattr+0x17a/0x220
getxattr+0x89/0xf0
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 371baf5c9750a258fee21d0cb8c8d683bb057429
As of 6.5-rc1, UBSAN trips over the attrlist ioctl definitions using an
array length of 1 to pretend to be a flex array. Kernel compilers have
to support unbounded array declarations, so let's correct this. This
may cause friction with userspace header declarations, but suck is life.
================================================================================
UBSAN: array-index-out-of-bounds in fs/xfs/xfs_ioctl.c:345:18
index 1 is out of range for type '__s32 [1]'
Call Trace:
<TASK>
dump_stack_lvl+0x33/0x50
__ubsan_handle_out_of_bounds+0x9c/0xd0
xfs_ioc_attr_put_listent+0x413/0x420 [xfs 4a986a89a77bb77402ab8a87a37da369ef6a3f09]
xfs_attr_list_ilocked+0x170/0x850 [xfs 4a986a89a77bb77402ab8a87a37da369ef6a3f09]
xfs_attr_list+0xb7/0x120 [xfs 4a986a89a77bb77402ab8a87a37da369ef6a3f09]
xfs_ioc_attr_list+0x13b/0x2e0 [xfs 4a986a89a77bb77402ab8a87a37da369ef6a3f09]
xfs_attrlist_by_handle+0xab/0x120 [xfs 4a986a89a77bb77402ab8a87a37da369ef6a3f09]
xfs_file_ioctl+0x1ff/0x15e0 [xfs 4a986a89a77bb77402ab8a87a37da369ef6a3f09]
vfs_ioctl+0x1f/0x60
The kernel and xfsprogs code that uses these structures will not have
problems, but the long tail of external user programs might.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 2d7d1e7ea321b0b2810eb00183e21332ee9c4b6f
Similar to the recent patch strengthening the AGF agf_length
verification, the AGI verifier does not check that the AGI length field
is within known good bounds. This isn't currently checked by runtime
kernel code, yet we assume in many places that it is correct and verify
other metadata against it.
Add length verification to the AGI verifier. Just like the AGF length
checking, the length of the AGI must be equal to the size of the AG
specified in the superblock, unless it is the last AG in the filesystem.
In that case, it must be less than or equal to sb->sb_agblocks and
greater than XFS_MIN_AG_BLOCKS, which is the smallest AG a growfs
operation will allow to exist.
There's only one place in the filesystem that actually uses agi_length,
but let's not leave it vulnerable to the same weird nonsense that
generates syzbot bugs, eh?
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 75dc0345312221971903b2e28279b7e24b7dbb1b
Use struct initializers to ensure that the xfs_btree_irecs passed into
the query_range function are completely initialized. No functional
changes, just closing some sloppy hygiene.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 2bed0d82c2f78b91a0a9a5a73da57ee883a0c070
Need to happen before we allocate and then leak the xefi. Found by
coverity via an xfsprogs libxfs scan.
[djwong: This also fixes the type of the @agbno argument.]
Fixes: 7dfee17b13e5 ("xfs: validate block number being freed before adding to xefi")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: edd8276dd70279c29d412d99b99c2c0cac1b2cdd
The AGF verifier does not check that the AGF length field is within
known good bounds. This has never been checked by runtime kernel
code (i.e. the lack of verification goes back to 1993) yet we assume
in many places that it is correct and verify other metdata against
it.
Add length verification to the AGF verifier. The length of the AGF
must be equal to the size of the AG specified in the superblock,
unless it is the last AG in the filesystem. In that case, it must be
less than or equal to sb->sb_agblocks and greater than
XFS_MIN_AG_BLOCKS, which is the smallest AG a growfs operation will
allow to exist.
This requires a bit of rework of the verifier function. We want to
verify metadata before we use it to verify other metadata. Hence
we need to verify the AGF sequence numbers before using them to
verify the length of the AGF. Then we can verify the AGF length
before we verify AGFL fields. Then we can verifier other fields that
are bounds limited by the AGF length.
And, finally, by calculating agf_length only once into a local
variable, we can collapse repeated "if (xfs_has_foo() &&"
conditionaly checks into single checks. This makes the code much
easier to follow as all the checks for a given feature are obviously
in the same place.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: f1e1765aad7de7a8b8102044fc6a44684bc36180
If the journal geometry results in a sector or log stripe unit
validation problem, it indicates that we cannot set the log up to
safely write to the the journal. In these cases, we must abort the
mount because the corruption needs external intervention to resolve.
Similarly, a journal that is too large cannot be written to safely,
either, so we shouldn't allow those geometries to mount, either.
If the log is too small, we risk having transaction reservations
overruning the available log space and the system hanging waiting
for space it can never provide. This is purely a runtime hang issue,
not a corruption issue as per the first cases listed above. We abort
mounts of the log is too small for V5 filesystems, but we must allow
v4 filesystems to mount because, historically, there was no log size
validity checking and so some systems may still be out there with
undersized logs.
The problem is that on V4 filesystems, when we discover a log
geometry problem, we skip all the remaining checks and then allow
the log to continue mounting. This mean that if one of the log size
checks fails, we skip the log stripe unit check. i.e. we allow the
mount because a "non-fatal" geometry is violated, and then fail to
check the hard fail geometries that should fail the mount.
Move all these fatal checks to the superblock verifier, and add a
new check for the two log sector size geometry variables having the
same values. This will prevent any attempt to mount a log that has
invalid or inconsistent geometries long before we attempt to mount
the log.
However, for the minimum log size checks, we can only do that once
we've setup up the log and calculated all the iclog sizes and
roundoffs. Hence this needs to remain in the log mount code after
the log has been initialised. It is also the only case where we
should allow a v4 filesystem to continue running, so leave that
handling in place, too.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 8ebbf262d4684e035af5e7aa2a71cab636673a9b
If the current transaction holds a busy extent and we are trying to
allocate a new extent to fix up the free list, we can deadlock if
the AG is entirely empty except for the busy extent held by the
transaction.
This can occur at runtime processing an XEFI with multiple extents
in this path:
__schedule+0x22f at ffffffff81f75e8f
schedule+0x46 at ffffffff81f76366
xfs_extent_busy_flush+0x69 at ffffffff81477d99
xfs_alloc_ag_vextent_size+0x16a at ffffffff8141711a
xfs_alloc_ag_vextent+0x19b at ffffffff81417edb
xfs_alloc_fix_freelist+0x22f at ffffffff8141896f
xfs_free_extent_fix_freelist+0x6a at ffffffff8141939a
__xfs_free_extent+0x99 at ffffffff81419499
xfs_trans_free_extent+0x3e at ffffffff814a6fee
xfs_extent_free_finish_item+0x24 at ffffffff814a70d4
xfs_defer_finish_noroll+0x1f7 at ffffffff81441407
xfs_defer_finish+0x11 at ffffffff814417e1
xfs_itruncate_extents_flags+0x13d at ffffffff8148b7dd
xfs_inactive_truncate+0xb9 at ffffffff8148bb89
xfs_inactive+0x227 at ffffffff8148c4f7
xfs_fs_destroy_inode+0xb8 at ffffffff81496898
destroy_inode+0x3b at ffffffff8127d2ab
do_unlinkat+0x1d1 at ffffffff81270df1
do_syscall_64+0x40 at ffffffff81f6b5f0
entry_SYSCALL_64_after_hwframe+0x44 at ffffffff8200007c
This can also happen in log recovery when processing an EFI
with multiple extents through this path:
context_switch() kernel/sched/core.c:3881
__schedule() kernel/sched/core.c:5111
schedule() kernel/sched/core.c:5186
xfs_extent_busy_flush() fs/xfs/xfs_extent_busy.c:598
xfs_alloc_ag_vextent_size() fs/xfs/libxfs/xfs_alloc.c:1641
xfs_alloc_ag_vextent() fs/xfs/libxfs/xfs_alloc.c:828
xfs_alloc_fix_freelist() fs/xfs/libxfs/xfs_alloc.c:2362
xfs_free_extent_fix_freelist() fs/xfs/libxfs/xfs_alloc.c:3029
__xfs_free_extent() fs/xfs/libxfs/xfs_alloc.c:3067
xfs_trans_free_extent() fs/xfs/xfs_extfree_item.c:370
xfs_efi_recover() fs/xfs/xfs_extfree_item.c:626
xlog_recover_process_efi() fs/xfs/xfs_log_recover.c:4605
xlog_recover_process_intents() fs/xfs/xfs_log_recover.c:4893
xlog_recover_finish() fs/xfs/xfs_log_recover.c:5824
xfs_log_mount_finish() fs/xfs/xfs_log.c:764
xfs_mountfs() fs/xfs/xfs_mount.c:978
xfs_fs_fill_super() fs/xfs/xfs_super.c:1908
mount_bdev() fs/super.c:1417
xfs_fs_mount() fs/xfs/xfs_super.c:1985
legacy_get_tree() fs/fs_context.c:647
vfs_get_tree() fs/super.c:1547
do_new_mount() fs/namespace.c:2843
do_mount() fs/namespace.c:3163
ksys_mount() fs/namespace.c:3372
__do_sys_mount() fs/namespace.c:3386
__se_sys_mount() fs/namespace.c:3383
__x64_sys_mount() fs/namespace.c:3383
do_syscall_64() arch/x86/entry/common.c:296
entry_SYSCALL_64() arch/x86/entry/entry_64.S:180
To avoid this deadlock, we should not block in
xfs_extent_busy_flush() if we hold a busy extent in the current
transaction.
Now that the EFI processing code can handle requeuing a partially
completed EFI, we can detect this situation in
xfs_extent_busy_flush() and return -EAGAIN rather than going to
sleep forever. The -EAGAIN get propagated back out to the
xfs_trans_free_extent() context, where the EFD is populated and the
transaction is rolled, thereby moving the busy extents into the CIL.
At this point, we can retry the extent free operation again with a
clean transaction. If we hit the same "all free extents are busy"
situation when trying to fix up the free list, we can safely call
xfs_extent_busy_flush() and wait for the busy extents to resolve
and wake us. At this point, the allocation search can make progress
again and we can fix up the free list.
This deadlock was first reported by Chandan in mid-2021, but I
couldn't make myself understood during review, and didn't have time
to fix it myself.
It was reported again in March 2023, and again I have found myself
unable to explain the complexities of the solution needed during
review.
As such, I don't have hours more time to waste trying to get the
fix written the way it needs to be written, so I'm just doing it
myself. This patchset is largely based on Wengang Wang's last patch,
but with all the unnecessary stuff removed, split up into multiple
patches and cleaned up somewhat.
Reported-by: Chandan Babu R <chandanrlinux@gmail.com>
Reported-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 6a2a9d776c4ae24a797e25eed2b9f7f33f756296
To avoid blocking in xfs_extent_busy_flush() when freeing extents
and the only busy extents are held by the current transaction, we
need to pass the XFS_ALLOC_FLAG_FREEING flag context all the way
into xfs_extent_busy_flush().
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: b742d7b4f0e03df25c2a772adcded35044b625ca
Btrees that aren't freespace management trees use the normal extent
allocation and freeing routines for their blocks. Hence when a btree
block is freed, a direct call to xfs_free_extent() is made and the
extent is immediately freed. This puts the entire free space
management btrees under this path, so we are stacking btrees on
btrees in the call stack. The inobt, finobt and refcount btrees
all do this.
However, the bmap btree does not do this - it calls
xfs_free_extent_later() to defer the extent free operation via an
XEFI and hence it gets processed in deferred operation processing
during the commit of the primary transaction (i.e. via intent
chaining).
We need to change xfs_free_extent() to behave in a non-blocking
manner so that we can avoid deadlocks with busy extents near ENOSPC
in transactions that free multiple extents. Inserting or removing a
record from a btree can cause a multi-level tree merge operation and
that will free multiple blocks from the btree in a single
transaction. i.e. we can call xfs_free_extent() multiple times, and
hence the btree manipulation transaction is vulnerable to this busy
extent deadlock vector.
To fix this, convert all the remaining callers of xfs_free_extent()
to use xfs_free_extent_later() to queue XEFIs and hence defer
processing of the extent frees to a context that can be safely
restarted if a deadlock condition is detected.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: 347eb95b27eb97bebdc3ea7de23558216f4e2c90
Pointers drop_leaf and save_leaf are initialized with values that are never
read, they are being re-assigned later on just before they are used. Remove
the redundant early initializations and keep the later assignments at the
point where they are used. Cleans up two clang scan build warnings:
fs/xfs/libxfs/xfs_attr_leaf.c:2288:29: warning: Value stored to 'drop_leaf'
during its initialization is never read [deadcode.DeadStores]
fs/xfs/libxfs/xfs_attr_leaf.c:2289:29: warning: Value stored to 'save_leaf'
during its initialization is never read [deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: c3b880acadc95d6e019eae5d669e072afda24f1b
I found a corruption during growfs:
XFS (loop0): Internal error agbno >= mp->m_sb.sb_agblocks at line 3661 of
file fs/xfs/libxfs/xfs_alloc.c. Caller __xfs_free_extent+0x28e/0x3c0
CPU: 0 PID: 573 Comm: xfs_growfs Not tainted 6.3.0-rc7-next-20230420-00001-gda8c95746257
Call Trace:
<TASK>
dump_stack_lvl+0x50/0x70
xfs_corruption_error+0x134/0x150
__xfs_free_extent+0x2c1/0x3c0
xfs_ag_extend_space+0x291/0x3e0
xfs_growfs_data+0xd72/0xe90
xfs_file_ioctl+0x5f9/0x14a0
__x64_sys_ioctl+0x13e/0x1c0
do_syscall_64+0x39/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
XFS (loop0): Corruption detected. Unmount and run xfs_repair
XFS (loop0): Internal error xfs_trans_cancel at line 1097 of file
fs/xfs/xfs_trans.c. Caller xfs_growfs_data+0x691/0xe90
CPU: 0 PID: 573 Comm: xfs_growfs Not tainted 6.3.0-rc7-next-20230420-00001-gda8c95746257
Call Trace:
<TASK>
dump_stack_lvl+0x50/0x70
xfs_error_report+0x93/0xc0
xfs_trans_cancel+0x2c0/0x350
xfs_growfs_data+0x691/0xe90
xfs_file_ioctl+0x5f9/0x14a0
__x64_sys_ioctl+0x13e/0x1c0
do_syscall_64+0x39/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f2d86706577
The bug can be reproduced with the following sequence:
# truncate -s 1073741824 xfs_test.img
# mkfs.xfs -f -b size=1024 -d agcount=4 xfs_test.img
# truncate -s 2305843009213693952 xfs_test.img
# mount -o loop xfs_test.img /mnt/test
# xfs_growfs -D 1125899907891200 /mnt/test
The root cause is that during growfs, user space passed in a large value
of newblcoks to xfs_growfs_data_private(), due to current sb_agblocks is
too small, new AG count will exceed UINT_MAX. Because of AG number type
is unsigned int and it would overflow, that caused nagcount much smaller
than the actual value. During AG extent space, delta blocks in
xfs_resizefs_init_new_ags() will much larger than the actual value due to
incorrect nagcount, even exceed UINT_MAX. This will cause corruption and
be detected in __xfs_free_extent. Fix it by growing the filesystem to up
to the maximally allowed AGs and not return EINVAL when new AG count
overflow.
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Source kernel commit: d67790ddf0219aa0ad3e13b53ae0a7619b3425a2
While struct_size() is normally used in situations where the structure
type already has a pointer instance, there are places where no variable
is available. In the past, this has been worked around by using a typed
NULL first argument, but this is a bit ugly. Add a helper to do this,
and replace the handful of instances of the code pattern with it.
Instances were found with this Coccinelle script:
@struct_size_t@
identifier STRUCT, MEMBER;
expression COUNT;
@@
- struct_size((struct STRUCT *)\(0\|NULL\),
+ struct_size_t(struct STRUCT,
MEMBER, COUNT)
Suggested-by: Christoph Hellwig <hch@infradead.org>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: HighPoint Linux Team <linux@highpoint-tech.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumit Saxena <sumit.saxena@broadcom.com>
Cc: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Cc: Don Brace <don.brace@microchip.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Guo Xuenan <guoxuenan@huawei.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: kernel test robot <lkp@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org
Cc: linux-nvme@lists.infradead.org
Cc: linux-scsi@vger.kernel.org
Cc: megaraidlinux.pdl@broadcom.com
Cc: storagedev@microchip.com
Cc: linux-xfs@vger.kernel.org
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20230522211810.never.421-kees@kernel.org
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
The unending stream of syzbot bug reports and overwrought filing of CVEs
for corner case handling (i.e. things that distract from actual user
complaints) in XFS has generated all sorts of of overheated rhetoric
about how every bug is a Serious Security Issue(tm) because anyone can
craft a malicious filesystem on a USB stick, insert the stick into a
victim machine, and mount will trigger a bug in the kernel driver that
leads to some compromise or DoS or something.
I thought that nobody would be foolish enough to automount an XFS
filesystem. What a fool I was! It turns out that udisks can be told
that it's okay to automount things, and then GNOME will do exactly that.
Including mounting mangled XFS filesystems!
<delete angry rant about poor decisionmaking and armchair fs developers
blasting us on X while not actually doing any of the work>
Turn off /this/ idiocy by adding a udev rule to tell udisks not to
automount XFS filesystems.
This will not stop a logged in user from unwittingly inserting a
malicious storage device and pressing [mount] and getting breached.
This is not a substitute for a thorough audit. This is not a substitute
for lklfuse. This does not solve the general problem of in-kernel fs
drivers being a huge attack surface. I just want a vacation from the
sh*tstorm of bad ideas and threat models that I never agreed to support.
v2: Add external logs to the list too, and document the var we set
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
|
|
abnormally set on buffer
We found an issue where repair failed in the fault injection.
$ xfs_repair test.img
...
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
Metadata CRC error detected at 0x55a30e420c7d, xfs_bmbt block 0x51d68/0x1000
- agno = 3
Metadata CRC error detected at 0x55a30e420c7d, xfs_bmbt block 0x51d68/0x1000
btree block 0/41901 is suspect, error -74
bad magic # 0x58534c4d in inode 3306572 (data fork) bmbt block 41901
bad data fork in inode 3306572
cleared inode 3306572
...
Phase 7 - verify and correct link counts...
Metadata corruption detected at 0x55a30e420b58, xfs_bmbt block 0x51d68/0x1000
libxfs_bwrite: write verifier failed on xfs_bmbt bno 0x51d68/0x8
xfs_repair: Releasing dirty buffer to free list!
xfs_repair: Refusing to write a corrupt buffer to the data device!
xfs_repair: Lost a write to the data device!
fatal error -- File system metadata writeout failed, err=117. Re-run xfs_repair.
$ xfs_db test.img
xfs_db> inode 3306572
xfs_db> p
core.magic = 0x494e
core.mode = 0100666 // regular file
core.version = 3
core.format = 3 (btree)
...
u3.bmbt.keys[1] = [startoff]
1:[6]
u3.bmbt.ptrs[1] = 41901 // btree root
...
$ hexdump -C -n 4096 41901.img
00000000 58 53 4c 4d 00 00 00 00 00 00 01 e8 d6 f4 03 14 |XSLM............|
00000010 09 f3 a6 1b 0a 3c 45 5a 96 39 41 ac 09 2f 66 99 |.....<EZ.9A../f.|
00000020 00 00 00 00 00 05 1f fb 00 00 00 00 00 05 1d 68 |...............h|
...
The block data associated with inode 3306572 is abnormal, but check the CRC first
when reading. If the CRC check fails, badcrc will be set. Then the dirty flag
will be set on bp when badcrc is set. In the final stage of repair, the dirty bp
will be refreshed in batches. When refresh to the disk, the data in bp will be
verified. At this time, if the data verification fails, resulting in a repair
error.
After scan_bmapbt returns an error, the inode will be cleaned up. Then bp
doesn't need to set dirty flag, so that it won't trigger writeback verification
failure.
Signed-off-by: Wu Guanghao <wuguanghao3@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
|
Merged early in 2023: Commit 480017957d638 xfs: remove restrictions for fsdax
and reflink. There needs to be a corresponding change to the mkfs.xfs manpage
to remove the incompatiblity statement.
Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
|
|
Update all the necessary files for a 6.4.0 release.
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Teach the debugger to expose the "unwritten" flag in rmapbt keys so that
we can simulate an old filesystem writing out bad keys for testing.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|