Age | Commit message (Collapse) | Author | Files | Lines |
|
Adding information on fields in e2fsck_struct
on how they are used when running parallel fsck.
Signed-off-by: Saranya Muruganandam <saranyamohan@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Sometimes, such as in orphan_inode case, e2fsck_pass1
is called after reading the block bitmaps. This results in
reading the block bitmap sequentially and multithreading
only gets kicked in later. Fix the thread count earlier
while setting up the file system.
Signed-off-by: Saranya Muruganandam <saranyamohan@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Add -m option description to e2fsck.8 man page.
Rename e2fsck_struct fs_num_threads to pfs_num_threads to avoid
confusion with the ext2_filsys fs_num_threads field, and move
thread_info to be together with the other HAVE_PTHREAD fields.
Move ext2_filsys fs_num_threads to fit into the __u16 "pad" field
to avoid consuming one of the few remaining __u32 reserved fields.
Fix a few print format warnings.
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
valgrind detected two memory leaks:
1) quota context is not released after merging.
2) @refcount_orig should be released
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
pfsck run on a clean fs should not return any errors.
Generate an image with possible features enabled,
especially EA shared blocks etc.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Verify multiple thread on a corrupted images hit following bug:
pass1.c:2902: e2fsck_pass1_thread_prepare:
Assertion `global_ctx->inodes_to_rebuild == NULL' failed.
Signal (6) SIGABRT si_code=SI_TKILL
./e2fsck/e2fsck[0x43829e]
/lib64/libpthread.so.0(+0x14b20)[0x7f3b45135b20]
/lib64/libc.so.6(gsignal+0x145)[0x7f3b44f2c625]
/lib64/libc.so.6(abort+0x12b)[0x7f3b44f158d9]
/lib64/libc.so.6(+0x257a9)[0x7f3b44f157a9]
/lib64/libc.so.6(+0x34a66)[0x7f3b44f24a66]
./e2fsck/e2fsck(e2fsck_pass1+0x1662)[0x423572]
./e2fsck/e2fsck(e2fsck_run+0x5a)[0x41611a]
./e2fsck/e2fsck(main+0x1608)[0x4121b8]
/lib64/libc.so.6(__libc_start_main+0xf3)[0x7f3b44f171a3]
./e2fsck/e2fsck(_start+0x2e)[0x413dde]
@inodes_to_rebuild could be not NULL after we restart pass1
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
For multiple threads, different threads will try to
update mmp block at the same time, only allow one
thread to update it.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
If we have a smaller inodes per group, default ra size could
be very small(etc 128KiB), this hurts performances.
Tune above 128K to 1M, i see pass1 time drop down from
677.12 seconds to 246 secons with 32 threads.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
e2fsck init memory according to filesystem inodes/dir numbers
recorded in the superblock, this should be aware of filesystem
number of threads, otherwise, oom happen.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Use e2fsck_reset_context() to free memory to simpify
codes.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Before proceeding next inodes, waitting existed
fixing finished.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Saranya Muruganandam <saranyamohan@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Only flag ALLOC OK after all threads finished without problem.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
We tried to copy thread context to global context directly
and then copy back some saved variables before merging.
Since we have finished almost all necessary variables
in the e2fsck context, we could simplify codes, and
this could help us understand what is missing rather
than hide problems.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
tests covered by f_extent_htree.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
This should not be kept, the reaons is similar to what
e2fsck_pass1 has done before.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
It will be possible that threads might append E2F_OPT_YES,
so we need merge options to global, test f_yesall cover this.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Saranya Muruganandam <saranyamohan@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Several improvments for this patch:
1) move readahead_kb detection to preparing phase.
2) inode readahead should be aware of thread block group
boundary.
3) make readahead_kb aware of multiple threads.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
number of threads should not exceed flex bg numbers,
and output messages if we adjust threads number.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
-m option is introduced to specify number of threads for pfsck.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Now @block_found_map is no longer shared by multiple threads,
and @block_dup_map need be checked again after threads finish.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Saranya Muruganandam <saranyamohan@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
EA blocks might be shared, merge them carefully.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Saranya Muruganandam <saranyamohan@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Invalid bitmaps are splitted per thread, and we
should merge them after thread finish.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
We could only use @found_map_block to find free blocks
after we have collectd all used blocks, so something like
handle_fs_bad_blocks(), ext2fs_create_resize_inode(),
e2fsck_pass1_dupblocks() really should be handled after
all threads has been finished.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Saranya Muruganandam <saranyamohan@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Allow different threads to fix at the same time could
be dangerous and error-prone now, and most of time
parallel scanning and checking is important.
So this patch adds a mutex to serialize
fix operations during pass1.
And the good benefit of this, we don't need block
allocations and free, superblock updates protection
any more, since only fix operations during pass1
could touch them.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Saranya Muruganandam <saranyamohan@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Every threads calculate its own quota accounting,
merge them after threads finish.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
e2fsck might restart after pass1, so we should keep
flags if possible, this patch try to fix f_illitable_flexbg failure
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Saranya Muruganandam <saranyamohan@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
@dirs_to_hash list need be merged after threads finish,
test covered by t_dangerous.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Merge properly.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
merge fs flags properly.
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Merge counts properly.
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Saranya Muruganandam <saranyamohan@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
These debug codes are added to run the multiple pass1 check
thread one by one in order. If all the codes are correct,
fsck of multiple threads should have exactly the same outcome
with single thread.
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Merge dblist properly.
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Merge inode_count and inode_link_info properly after
threads finish.
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Badblocks should be merged properly after threads finish.
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Only rbtree support merge operation now, use it for bitmaps.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
dir_info need be merged after thread finish.
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Binary search is now used when inserting an dir info to the array.
Memmove is now used when moving array. Both of them improves
the performance of inserting.
This patch is also a prepartion for the merging of two dir db
arrays.
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Global variables used in pass1 check are changed to local variables
in this patch. This will avoid conflict between threads.
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
A new method merge_bmap has been added to bitmap operations. But
only red-black bitmap has that operation now.
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When multi-thread fsck is enabled, logs printed from multiple
threads could overlap with each other. The overlap sometimes
makes the logs unreadable because log_out() is used multiple times
for a single line.
This patch adds leading [Thread XXX] to each logs if multi-thread
is enabed by -m option.
This patch also adds message to show the group ranges and inode
numbers for each thread, which is useful for debuging multi-thread
check.
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The start/end groups of a thread is calculated according to the
thread number. But still, only one thread is used to check.
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When multi-threads are used for check, each thread needs to jump
to different group in pass1 check. This patch adds the group
jumping support. But still, only one thread is used to check.
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
This patch creates only one thread to do pass1 check if pthreads are
enabled. The same codes can be used to create multiple threads, but
other functions need to be modified to get ready for that.
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When multi-threads are used, different logs should be created
for different threads. Each thread has log files with suffix
of ".$THREAD_INDEX".
And this patch adds f_multithread_logfile test case.
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
This patch also add writethrough flag to the thread io-channel.
When multiple threads write the same disk, we don't want the
data being saved in memory cache. This will be useful in the
future, but even without that flag, the tests can be passed too.
This patch also cleanup the io channel cache of the global
context. Otherwise, after pass1 step, the next steps would use
old data saved in the cache. And the cached data might have
already been overwritten in pass1.
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Saranya Muruganandam <saranyamohan@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
This patch copies badblocks when the copying fs.
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
This patch copies bitmap when the copying context. In the
multi-thread fsck, each thread use different bitmap that copied
from the glboal bitmap. And Bitmaps from multiple threads will
be merged into a global one after the pass1 finishes.
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Adding the assert would simplify the copying of context.
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
icache of fs will be rebuilt when needed, so after copying
fs, icache can be inited to NULL.
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
This patch only copy the fs to a new one when -m is enabled.
It doesn't actually start any thread. When pass1 test finishes,
the new fs is copied back to the original context.
This patch handles the fs fields in dblist, inode_map and block_map
properly.
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
This patch only copy the context to a new one when -m is enabled.
It doesn't actually start any thread. When pass1 test finishes,
the new context is copied back to the original context.
Since the signal handler only changes the original context, so
add global_ctx in "struct e2fsck_struct" and use that to check
whether there is any signal of canceling.
This patch handles the long jump properly so that all the existing
tests can be passed even the context has been copied. Otherwise,
test f_expisize_ea_del would fail when aborting.
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
-m option is added but no actual functionality is added. This
patch only adds the logic that when -m is specified, one of
-p/-y/-n options should be specified. And when -m is specified,
-C shouldn't be specified and the completion progress report won't
be triggered by sending SIGUSR1/SIGUSR2 signals. This simplifies
the implementation of multi-thread fsck in the future.
Completion progress support with multi-thread fsck will be added
back after multi-thread fsck implementation is finished. Right
now, disable it to simplify the implementation of multi-thread fsck.
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Saranya Muruganandam <saranyamohan@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
|
|
PTHREAD_CFLAGS is set by AX_PTHREADS, and these flags need to be
included when linking executables.
Fixes: bdcd5f22203f ("Add configure and build support for the pthreads library")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Clang gets unhappy when passing an unsigned char to string functions.
For better or for worse we use __u8[] in the definition of the
superblock. So cast them these to "char *" to prevent clang
build-time warnings.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
This avoids UBSAN sanitizer warnings, since &(0->member) is
technically undefined per the C standard.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Fixes: 93df80d2409d ("Teach makefiles... the target all-static")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Left shifting the pid by 16 bits can cause a UBSAN warning if the pid
is greater than or equal to 2**16. It doesn't matter since we're just
using the pid to seed for a pseudo-random number generator, but
silence the warning by just swapping the high and low 16 bits of the
pid instead.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Explain which valid block sizes mke2fs supports in more detail.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The $(SYSLIBS) was missing when linking the e4crypt application. This is
available in the e4crypt.profiled variant, so I assume this was just
missing in the normal variant and is not left out intentionally.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
We were not checking the return value of check_fsck_needed() when
checking to clear the dir_index feature. As a result, tune2fs would
print that the file system needed to be checked first, but then go
ahead and clear the dir_index flag.
Addresses-Coverity-Bug: 1467671
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
params.num_journal_blocks is an unsigned value so it can never be less
than zero.
Addresses-Coverity-Bug: 1472250
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Addresses-Coverity-Bug: 1472249
Addresses-Coverity-Bug: 1472253
Addresses-Coverity-Bug: 1472254
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Addresses-Coverity-Bug: 1472252
Addresses-Coverity-Bug: 1472253
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Addresses-Coverity-Bug: 1467672
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Fixes-Coverity-Bug: 1464575
Fixes-Coverity-Bug: 1464571
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Fixes: e2e58d312804 ("ext2fs: parallel bitmap loading")
Fixes-Coverity-Bug: 147255
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Add fast commit support for debugfs logdump.
This commit also adds fast_commit.h that contains the necessary
helpers needed for fast commit replay. Note that this file is also
byte by byte identical with kernel's fast_commit.h.
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
dumpe2fs tool now is capable of reporting number of fast commit
blocks. There were slight changes in the output of dumpe2fs outside of
fast commits. This patch fixes the regression tests to expect the new
output.
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
This patch makes number of fast commit blocks configurable. Also, the
number of fast commit blocks can now be seen in dumpe2fs output.
$ ./misc/mke2fs -O fast_commit -t ext4 image
mke2fs 1.46-WIP (20-Mar-2020)
Discarding device blocks: done
Creating filesystem with 5120 1k blocks and 1280 inodes
Allocating group tables: done
Writing inode tables: done
Creating journal (1040 blocks): done
Writing superblocks and filesystem accounting information: done
$ ./misc/dumpe2fs image
dumpe2fs 1.46-WIP (20-Mar-2020)
...
Journal features: (none)
Total journal size: 1040k
Total journal blocks: 1040
Max transaction length: 1024
Fast commit length: 16
Journal sequence: 0x00000001
Journal start: 0
$ ./misc/mke2fs -O fast_commit -t ext4 image -J fast_commit_size=256,size=1
mke2fs 1.46-WIP (20-Mar-2020)
Creating filesystem with 5120 1k blocks and 1280 inodes
Allocating group tables: done
Writing inode tables: done
Creating journal (1280 blocks): done
Writing superblocks and filesystem accounting information: done
$ ./misc/dumpe2fs image
dumpe2fs 1.46-WIP (20-Mar-2020)
...
Journal features: (none)
Total journal size: 1280k
Total journal blocks: 1280
Max transaction length: 1024
Fast commit length: 256
Journal sequence: 0x00000001
Journal start: 0
This patch also adds information about fast commit feature in mke2fs
and tune2fs man pages.
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
This patch adds new libext2fs that allow configuring number of fast
commit blocks in journal superblock. We also add a struct
ext2fs_journal_params which contains number of fast commit blocks and
number of normal journal blocks. With this patch, the preferred way
for configuring number of blocks with and without fast commits is:
struct ext2fs_journal_params params;
ext2fs_get_journal_params(¶ms, ...);
params.num_journal_blocks = ...;
params.num_fc_blocks = ...;
ext2fs_create_journal_superblock2(..., ¶ms, ...);
OR
ext2fs_add_journal_inode3(..., ¶ms, ...);
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
This patch makes recovery.c identical with fast commit kernel changes.
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
In order to make recovery.c identical with kernel, we need endianness
conversion macros (such as cpu_to_be32 and friends) defined in
e2fsprogs. This patch defines these macros and also fixes recovery.c
to use these. These macros are also needed for fast commit recovery
patches later in this series.
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The function calculate_summary_stats sets the global metadata of the
file system. Tune2fs had this function defined statically in
tune2fs.c. Fast commit replay needs this function to set global
metadata at the end of the replay phase. So, move this function to
libext2fs.
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
In our benchmarking for PiB size filesystem, pass5 takes
10446s to finish and 99.5% of time takes on reading bitmaps.
It makes sense to reading bitmaps using multiple threads,
a quickly benchmark show 10446s to 626s with 64 threads.
[ This has all of many bug fixes for rw_bitmaps.c from the original
luster patch set collapsed into a single commit. In addition it has
the new ext2fs_rw_bitmaps() api proposed by Ted. ]
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Saranya Muruganandam <saranyamohan@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Add initial implementation support for the unix_io manager.
Applications which want to use threading should pass in
IO_FLAG_THREADS when opening the channel. Channels which support
threading (which as of this commit is unix_io and test_io if the
backing io_manager supports threading) will set the
CHANNEL_FLAGS_THREADS bit in io->flags. Library code or applications
can test if threading is enabled by checking this flag.
Applications using libext2fs can pass in EXT2_FLAG_THREADS to
ext2fs_open() or ext2fs_open2() to request threading support.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Support for pthreads can be forcibly disabled by passing
"--without-pthread" to the configure script.
The actual changes in this commit are in configure.ac and MCONFIG.in;
the other files were generated as a result of running aclocal,
autoconf, and autoheader on a Debian testing system.
Note: the autoconf-archive package must now be installed before
rerunning aclocal, to supply the AX_PTHREAD macro.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
|
|
Currently, when constructing the <default> configuration pseudo-file using
the profile-to-c.awk script we will just pass the double quotes as they
appear in the mke2fs.conf.
This is problematic, because the resulting default_profile.c will either
fail to compile because of syntax error, or leave the resulting
configuration invalid.
It can be reproduced by adding the following line somewhere into
mke2fs.conf configuration and forcing mke2fs to use the <default>
configuration by specifying nonexistent mke2fs.conf
MKE2FS_CONFIG="nonexistent" ./misc/mke2fs -T ext4 /dev/device
default_mntopts = "acl,user_xattr"
^ this will fail to compile
default_mntopts = ""
^ this will result in invalid config file
Syntax error in mke2fs config file (<default>, line #4)
Unknown code prof 17
Fix it by escaping the double quotes with a backslash in
profile-to-c.awk script.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The support of setting (and reading) of passive translators from
GNU/Linux has been added to the Linux kernel by the commit [1].
The name index '10' has been reserved for GNU/Hurd.
Hurd passive translators are stored as a xattr value with name
"gnu.translator" [2].
If "gnu.translator" xattr value has been set before calling
mkfs.ext2, it will segfault since "gnu." is not present in
ea_names[].
$ setfattr -n gnu.translator -v "/hurd/exec\0" ${TARGET_DIR}/servers/exec
$ mkfs.ext2 -d ${TARGET_DIR} -o hurd -O ext_attr rootfs.ext2 "1G"
Adding "gnu." to ea_names[], allow to create ext2 filesystem
for GNU/Hurd with passive translator already set.
[1] https://git.savannah.gnu.org/cgit/hurd/hurd.git/commit/?id=a04c7bf83172faa7cb080fbe3b6c04a8415ca645
[2] https://lists.gnu.org/archive/html/bug-hurd/2016-08/msg00075.html
Signed-off-by: Romain Naour <romain.naour@gmail.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
It is possible to crash filefrag with a "Floating point exception" in
two different scenarios:
1. When fstat() returns a device ID set to 0
2. When FIGETBSZ ioctl returns a blocksize of 0
In both scenarios a divide-by-zero will occur in frag_report() because
variable blksize will be set to zero.
I've managed to trigger this crash with an old CephFS kernel client,
using xfstest generic/519. The first scenario has been fixed by kernel
commit 75c9627efb72 ("ceph: map snapid to anonymous bdev ID"). The
second scenario is also fixed with commit 8f97d1e99149 ("vfs: fix
FIGETBSZ ioctl on an overlayfs file").
However, it is desirable to handle these two scenarios gracefully by
checking these conditions explicitly.
Signed-off-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
populate_fs do copy the xattrs for all files and directories, but the
root directory is skipped and as a result its extended attributes aren't
set. This is an issue when using mkfs to build a full system image that
can be used with SElinux in enforcing mode without making any runtime
fix at first boot.
This patch adds logic to set the root directory's extended attributes.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
These are the changes which are needed after running gettextize to
update to gettext 0.19.8 in the previous commit.
* Add support for maintainer mode (which doesn't do as much given
that gettext now has settings in Makevars which allows us to
suppress automatic updates of the po and gmo files)
* Add support to expand the '@' abbreviations in e2fsck/problem.c
and give an explanation of how they work for translators
* Add support for configure --enable-verbose-makecmds and default to
"kernel-style" quieter make output --- this makes it easier
to see warnings and errors by suppressing the distracting
details.
* Teach the makefile where to find the generated error table C files
in the build directory.
* Add make targets (e.g., all-static, fullcheck, coverage.txt) which
are required by the top-level Makefile.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
This also removes the built-in "intl" directory as this is now
considered deprecated by the gettext package. This means that we
won't try to use an internal version of gettext if it's not installed
on the build system. We will simply disable NLS support in that case.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
They are pure duplicates.
Signed-off-by: Benno Schulenberg <bensberg@telfort.nl>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The logic for handling 64-bit structure elements was reversed, which
caused attempts to set fields like kbytes_written to fail:
% debugfs -w /tmp/foo.img
debugfs 1.45.6 (20-Mar-2020)
debugfs: set_super_value kbytes_written 1024
64-bit field kbytes_written has a second 64-bit field
defined; BUG?!?
https://github.com/tytso/e2fsprogs/issues/36
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
In the case where mkdir -p is not thread-safe (for example, if the
build environment is using busybox's mkdir) the configure script will
fall back to the slow (but safe) install-sh script. In that case
MKDIR_P will be using a relative pathname; so we can't use speed
optimization of defining configure substitutions in MCONFIG.in, since
the substitution will be different depending on depth of the
subdirectory in the Makefile.in file.
https://github.com/tytso/e2fsprogs/issues/51
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
For 1k block file systems, resizing a file system larger than
1073610752 blocks will result in the size of the block group
descriptors to be so large that it will overlap with the backup
superblock in block group #1. This problem can be reproduced via:
mke2fs -t ext4 /tmp/foo.img 200M
resize2fs /tmp/foo.img 1T
e2fsck -f /tmp/foo.img
https://github.com/tytso/e2fsprogs/issues/50
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Comparing unsigned int with ULONG_MAX is always false.
Signed-off-by: Yi Kong <yikong@google.com>
Change-Id: Iae02aad1bcb271d3468828977be288ad04333821
From AOSP commit: 757a4d672dae1a15c57f5f0705ba90ed007da7e6
|
|
This is no longer a transitive include of android_filesystem_config.h
Bug: 149785767
Test: build
Change-Id: I954adf7879fa811eb2b290a0983c84d47ecae26c
From AOSP commit: 9fa6bed983d5ddd7226af647a0c4c0d922227be4
|
|
|
|
Use the letter 'x' to set/get dax attribute on a directory/file.
Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
While file has a large inode number (> 0xFFFFFFFF), mkfs.ext4 could
not parse hardlink correctly.
Prepare three hardlink files for mkfs.ext4
$ ls -il rootfs_ota/a rootfs_ota/boot/b rootfs_ota/boot/c
11026675846 -rw-r--r-- 3 hjia users 0 Jul 20 17:44 rootfs_ota/a
11026675846 -rw-r--r-- 3 hjia users 0 Jul 20 17:44 rootfs_ota/boot/b
11026675846 -rw-r--r-- 3 hjia users 0 Jul 20 17:44 rootfs_ota/boot/c
$ truncate -s 1M rootfs_ota.ext4
$ mkfs.ext4 -F -i 8192 rootfs_ota.ext4 -L otaroot -U fd5f8768-c779-4dc9-adde-165a3d863349 -d rootfs_ota
$ mkdir mnt && sudo mount rootfs_ota.ext4 mnt
$ ls -il mnt/a mnt/boot/b mnt/boot/c
12 -rw-r--r-- 1 hjia users 0 Jul 20 17:44 mnt/a
14 -rw-r--r-- 1 hjia users 0 Jul 20 17:44 mnt/boot/b
15 -rw-r--r-- 1 hjia users 0 Jul 20 17:44 mnt/boot/c
After applying this fix
$ ls -il mnt/a mnt/boot/b mnt/boot/c
12 -rw-r--r-- 3 hjia users 0 Jul 20 17:44 mnt/a
12 -rw-r--r-- 3 hjia users 0 Jul 20 17:44 mnt/boot/b
12 -rw-r--r-- 3 hjia users 0 Jul 20 17:44 mnt/boot/c
Since commit [382ed4a1 e2fsck: use proper types for variables][1]
applied, it used ext2_ino_t instead of ino_t for referencing inode
numbers, but the type of is_hardlink's `ino' should not be instead,
The ext2_ino_t is 32bit, if inode > 0xFFFFFFFF, its value will be
truncated.
Add a debug printf to show the value of inode, when it check for hardlink
files, it will always return false if inode > 0xFFFFFFFF
|--- a/misc/create_inode.c
|+++ b/misc/create_inode.c
|@@ -605,6 +605,7 @@ static int is_hardlink(struct hdlinks_s *hdlinks, dev_t dev, ext2_ino_t ino)
| {
| int i;
|
|+ printf("%s %d, %lX, %lX\n", __FUNCTION__, __LINE__, hdlinks->hdl[i].src_ino, ino);
| for (i = 0; i < hdlinks->count; i++) {
| if (hdlinks->hdl[i].src_dev == dev &&
| hdlinks->hdl[i].src_ino == ino)
Here is debug message:
is_hardlink 608, 2913DB886, 913DB886
The length of ext2_ino_t is 32bit (typedef __u32 __bitwise ext2_ino_t;),
and ino_t is 64bit on 64bit system (such as x86-64), recover `ino' to ino_t.
[1] https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=382ed4a1c2b60acb9db7631e86dda207bde6076e
Signed-off-by: Hongxu Jia <hongxu.jia@windriver.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
If we are creating filesystem on DAX capable device, warn if set block
size is incompatible with DAX to give admin some hint why DAX might not
be available.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Providing -S and a path to 'add_key' previously exhibited an
unintuitive behavior: instead of using the salt explicitly provided by
the user, e4crypt would use the salt obtained via
EXT4_IOC_GET_ENCRYPTION_PWSALT on the path. This was because
set_policy() was still called with NULL as salt.
With this change we now remember the explicitly provided salt (if any)
and use it as argument for set_policy().
Eventually
e4crypt add_key -S s:my-spicy-salt /foo
will now actually use 'my-spicy-salt' and not something else as salt
for the policy set on /foo.
Signed-off-by: Florian Schmaus <flo@geekplace.eu>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
If tune2fs cannot perform the requested change, ensure that the MMP
block is reset to the unused state before exiting. Otherwise, the
filesystem will be left with mmp_seq = EXT4_MMP_SEQ_FSCK set, which
prevents it from being mounted afterward:
EXT4-fs warning (device dm-9): ext4_multi_mount_protect:311:
fsck is running on the filesystem
Add a test to try some failed tune2fs operations and verify that the
MMP block is left in a clean state afterward.
Lustre-bug-id: https://jira.whamcloud.com/browse/LU-13672
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
len argument in string_copy() is int, but it is used with malloc(),
strlen(), strncpy() and some callers use sizeof() to pass value in. So
it really ought to be size_t rather than int. Fix it.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
If the file system is corrupted, there is a potential of a read-only
buffer overrun. Fortunately, we don't actually use the result of that
pointer dereference, and the overrun is at most 64k.
Google-Bug-Id: #158564737
Fixes: eb88b751745b ("libext2fs: make ext2fs_dirent_has_tail() more strict")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Fix bitrot for building a restricted version of debugfs, which does
not require read/write access to the file system, and which only
allows access to the file system metadata.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When opening a file system which is mounted, it's possible that when
ext2fs_open2() is racing with the kernel modifying the orphaned inode
list, the superblock's checksum could be incorrect. So retry reading
the superblock in the hopes that the problem will self-correct.
Google-Bug-Id: 151453112
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When allocating blocks for an indirect block mapped file, accumulate
blocks to be zero'ed and then call ext2fs_zero_blocks2() to zero them
in large chunks instead of block by block.
This significantly speeds up mkfs.ext3 since we don't send a large
number of ZERO_RANGE requests to the kernel, and while the kernel does
batch write requests, it is not batching ZERO_RANGE requests. It's
more efficient to batch in userspace in any case, since it avoids
unnecessary system calls.
Reported-by: Mario Schuknecht <mario.schuknecht@dresearch-fe.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Fixes: 3f0cf6475399 ("e2fsprogs: add support for 3-level htree")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The pointer operand to the binary `+` operator must be to a complete
object type.
Signed-off-by: Michael Forney <mforney@mforney.org>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Similar to encrypt and verity, once the stable_inodes feature has been
enabled there may be files anywhere on the filesystem that require this
feature. Therefore, in general it's unsafe to allow clearing it. Don't
allow tune2fs to do so. Like encrypt and verity, it can still be
cleared with debugfs if someone really knows what they're doing.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The stable_inodes feature is intended to indicate that it's safe to use
IV_INO_LBLK_64 encryption policies, where the encryption depends on the
inode numbers and thus filesystem shrinking is not allowed. However
since inode numbers are not unique across filesystems, the encryption
also depends on the filesystem UUID, and I missed that there is a
supported way to change the filesystem UUID (tune2fs -U).
So, make 'tune2fs -U' report an error if stable_inodes is set.
We could add a separate stable_uuid feature flag, but it seems unlikely
it would be useful enough on its own to warrant another flag.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
There is an off-by-one error in dx_grow_tree() when checking whether we
can add another level to the tree. Thus we can grow tree too much
leading to possible crashes in the library or corrupted filesystem. Fix
the bug.
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
dx_lookup() uses errcode_t return values. As such anything non-zero is
an error, not values less than zero. Fix the error checking to avoid
crashes on corrupted filesystems.
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The error code allows the kernel to bucket the possible cause of an
ext4 corruption by Unix errno codes (e.g., EIO, EFSBADCRC, ESHUTDOWN,
etc.)
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Since r_inline_xattr is using an extended regexp, we need grep -E on
some implementations of grep.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The loff_t type is a glibc'ism and is not fully portable. Use
ext2_loff_t instead.
Fixes: 382ed4a1c2b6 ("e2fsck: use proper types for variables")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Matthias Andree <matthias.andree@gmx.de>
|
|
v1.45.6
|
|
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Bug: 149391799
Change-Id: I5183755915710e28a603e3f718f16813ea9991a0
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
From AOSP commit: a13a88d0d557a396f63702fb8db008487e2384d7
|
|
Currently, basefs_allocator will iterate through blocks owned by an
inode until it finds a block that is free. This effectively ignores the
logical to physical block mapping, which can lead to a bigger delta in
the final image.
An example of how this can happen is if the BaseFS has a deduplicated
block (D), that is not deduplicated in the new image:
Old image: 1 2 3 D 4 5
New image: 1 2 3 ? 4 5
If the allocator sees that "D" is not usable, and skips to block "4",
we will have a non-ideal assignment.
Bad image: 1 2 3 4 5 ?
This patch refactors get_next_block() to acquire at most one block. It's
called a single time, and then only called in a loop if absolutely no
blocks can be acquired from anywhere else.
In a Virtual A/B simulation, this reduces the COW snapshot size by about
90MB.
Bug: 139201772
Test: manual test
Change-Id: I354f0dee1ee191dba0e1f90491ed591dba388f7f
From AOSP commit: a495b54f89b2ec0e46be8e3564e4852c6434687c
|
|
By iterating over blocks to write BaseFS, holes in the extent tree are
skipped. This is problematic because the purpose of BaseFS is to
preserve the logical to physical block assignment between builds. By not
preserving the location of holes, the assignment can be incorrect.
For example, consider the following block list for a file:
1 2 3 0 4 5
If this is recorded as:
1 2 3 4 5
If the first block changes to a hole, the intended mapping will not be
preserved at all:
0 1 2 0 3
This patch makes two changes to e2fsdroid to fix this. The first change
is that holes are now recorded in BaseFS, by iterating over the extent
tree rather than the block list, and inserting zeroes where appropriate.
The second change is that the block allocator now recognizes when blocks
have been skipped (either to deduplication or to holes), and skips the
same number of logical blocks in BaseFS as well.
In a Virtual A/B simulation, this reduces the COW snapshot size by
approximately 100MB.
Bug: 139201772
Test: m target-files-package, inspect .map files
From AOSP commit: d391f3bf38cbe51718d5c3c0bb3e72b1a9978625
|
|
When BaseFS specifies the same block for two files, it gets added to a
separate "dedup" bitmap, and removed from the free block bitmap. If the
new build does not use every block in this bitmap, there will be an
inconsistency: the block bitmap marks blocks as in-use when they are
actually free. Although this doesn't matter for AOSP's read-only file
systems, it does cause e2fsck to complain, which breaks the build.
Fix the inconsistency by properly freeing all unused blocks within the
dedup block set.
Bug: 139201772
Test: build AOSP using BaseFS
Change-Id: I6b6511eb713a56fec932f1d5668f1766d64d9479
From AOSP commit: 346bee6f8b97aefe7714688f738606c116099fbc
|
|
Enable the build of e2freefrag to monitor the free space fragmentation
in ext2/3/4 filesystems.
Bug: 146078546
Test: m + e2freefrag on device
Change-Id: Ia56e443a789ae881a03bdaeae1093567e1736c4c
Signed-off-by: Alessio Balsini <balsini@google.com>
From AOSP commit: ab77f6c79f3dab697cd56ad3b793e7d555ac9415
|
|
We want to start shipping the toybox chattr and lsattr on the device all
the time, so the build system rightly complains that then we'd have two
modules with the same name.
I went with a suffix rather than a prefix so that tab completion works
for folks still wanting to use the e2fsprogs versions.
Bug: http://b/147769529
Test: builds
Change-Id: Ib904fa6c709d29ce709302c61e452383c02cb9a3
From AOSP commit: 8525a455e7410461560a99a42feb0dbfabab5c8e
|
|
Test: pass
Bug: 147347110
Change-Id: Ie800ba1b56773dcc1b6563c4f19c27eccb9ffc1a
From AOSP commit: f5a8e8fdefd78deae971a475a7fa43734eef205e
|
|
blkid_types.h and ext_types.h having the exact same content results in
mismatches in remote RBE builds. Given blkid_types.h is actually
supposed to be different, changing this to remove the mismatch.
Test: Ran a build, and all e2fsprogs mismatches went away between
local/remote.
Change-Id: I63ab1719ee1d0ccd28907f0bc99531260251fa99
From AOSP commit: ec10b513c283706f984edeec47301b0661f7d283
|
|
Bug: 144477678
Test: BUILD_HOST_static=1 m resize2fs; m resize2fs
Change-Id: I0986deccfe496153e662dcc3cc2fb1ffb6973c21
From AOSP commit: 2c767b86591c64cd7b84c5619e8d8b8a0afd557e
|
|
Bug: 144477678
Test: m debugfs_static
Change-Id: I7c360a2a381f8508578d14c32bbb280f386dd925
From AOSP commit: 742bb05a401eb2feb6caaee1c8d66fc1c37eef77
|
|
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Addresses-Debian-Bug: #953493
Addresses-Debian-Bug: #953494
Addresses-Debian-Bug: #951808
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
This commit adds the function e2p_feature_to_string() which allows the
caller to pass in a preallocated buffer.
Google-Bug-Id: 16978603
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The bitmap array's set/get bitmap_range functions were not subtracting
out bitmap->start. This doesn't matter for normal file systems, since
the bitmap->start is zero or one, and the passed-in starting range is
a multiple of eight, and the starting range is then divided by 8.
But with a non-standard/fuzzed file system, bitmap->start could be
significantly larger, and this could then lead to a array out of
bounds memory reference.
Google-Bug-Id: 147849134
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When directory link count is set to overflow value (1) but during pass 4
we find out the exact link count would fit, we either silently fix this
(which is not great because e2fsck then reports the fs was modified but
output doesn't indicate why in any way), or we report that link count is
wrong and ask whether we should fix it (in case -n option was
specified). The second case is even more misleading because it suggests
non-trivial fs corruption which then gets silently fixed on the next
run. Similarly to how we fix up other non-problems, just create a new
error message for the case directory link count is not overflown anymore
and always report it to clarify what is going on.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 4ebce13292f54c96f43dcb1bd1d5b8df5dc8749d)
|
|
|
|
When clearing dir_index feature while metadata_csum is enabled, we have
to rewrite checksums of all indexed directories to update checksums of
internal tree nodes.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Indexed directories have somewhat different format when metadata_csum is
enabled. Add test to excercise linking in indexed directories and e2fsck
rehash code in this case.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Modify f_large_dir test to create indexed directory and create entries
in it. That way the new code in ext2fs_link() for addition of entries
into indexed directories gets executed including various special cases
when growing htree.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Implement proper creation of new directory entries in htree directories
in ext2fs_link(). So far we just cleared EXT2_INDEX_FL and treated
directory as unindexed however this results in mismatched checksums if
metadata checksums are in use because checksums are placed in different
places depending on htree node type.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Currently, ext2fs_mkdir() and ext2fs_symlink() update allocation bitmaps
and other information only close to the end of the function, in
particular after calling to ext2fs_link(). When ext2fs_link() will
support indexed directories, it will also need to allocate blocks and
that would cause filesystem corruption in case allocation info isn't
properly updated. So make sure ext2fs_mkdir() and ext2fs_symlink()
update allocation info before calling into ext2fs_link().
[ Added error handling so the calls to ext2fs_{block,inode}_alloc_stats()
can be undone if the newly created directory or symlink can not be linked
into the directory. -- TYT ]
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The libattr has stopped providing attr/xattr.h; we now use
sys/xattr.h. So there is no longer any reason to require that the
libattr1-dev package be present when building e2fsprogs, so drop it.
Addresses-Debian-Bug: #953926
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Fixes: 70303df16ca6 ("e2fsck: consistently use ext2fs_get_mem()")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Previously ext2fs_dirent_has_tail() would return true if the directory
was corrupted. If the directory is corrupted, then by definition it
doesn't have a valid checksum tail.
(This fixes a big-endian failure on the master branch.)
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Plural form "directories" should be used along with "files".
"id's" should be "ids" (i.e., plural form, not apostrophe).
"much" should "must".
Signed-off-by: Sawood Alam <ibnesayeed@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Introduced with commit e7236a9476cd1fa5296fbc4aa573b36426901a08,
it was later renamed to encoding, and turned into a fs_type-only
option with commit 28887533bb64db318e74c38cd9c0ad6d0bb2ced2.
Hence, remove an option that does not exist in the default
configuration.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
E2fsck directory rehashing code can fail with ENOSPC due to a bug in
ext2fs_htree_intnode_maxrecs() which fails to take metadata checksum
into account and thus e.g. e2fsck can decide to create 1 indirect level
of index tree when two are actually needed. Fix the logic to account for
metadata checksum.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When directory link count is set to overflow value (1) but during pass 4
we find out the exact link count would fit, we either silently fix this
(which is not great because e2fsck then reports the fs was modified but
output doesn't indicate why in any way), or we report that link count is
wrong and ask whether we should fix it (in case -n option was
specified). The second case is even more misleading because it suggests
non-trivial fs corruption which then gets silently fixed on the next
run. Similarly to how we fix up other non-problems, just create a new
error message for the case directory link count is not overflown anymore
and always report it to clarify what is going on.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
|
|
Currently the ext2fs_check_mount_point() will use the open(O_EXCL) check
on linux after all the other checks are done. However it is not
necessary to check mntent if open(O_EXCL) succeeds because it means that
the device is not mounted.
Moreover the commit ea4d53b7 introduced a regression where a following
set of commands fails:
vgcreate mygroup /dev/sda
lvcreate -L 1G -n lvol0 mygroup
mkfs.ext4 /dev/mygroup/lvol0
mount /dev/mygroup/lvol0 /mnt
lvrename /dev/mygroup/lvol0 /dev/mygroup/lvol1
lvcreate -L 1G -n lvol0 mygroup
mkfs.ext4 /dev/mygroup/lvol0 <<<--- This fails
It fails because it thinks that /dev/mygroup/lvol0 is mounted because
the device name in /proc/mounts is not updated following the lvrename.
Move the open(O_EXCL) check before the mntent check and return
immediatelly if the device is not busy.
Fixes: ea4d53b7 ("libext2fs/ismounted.c: check device id in advance to skip false device names")
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Reported-by: Karel Zak <kzak@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Set the directory for directories in cases where the owner permissions
is not rwx. This was reported[1] by Robert Yang but we are using a
different approach to fixing the issue.
[1] https://lore.kernel.org/r/1582542522-97508-1-git-send-email-liezhi.yang@windriver.com
Also set the permissions in a more portable way by making a
distinction between the host OS's permissions stats and Linux's
permissions. We still assume the low 12 bits are the historical Unix
assignments, but we don't assume ST_IFMT bits are the same as Linux's.
Reported-by: Robert Yang <liezhi.yang@windriver.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
If a filesystem image is on tmpfs, opening it with O_DIRECT for
reading the MMP will fail. This is unnecessary, since the image
file can't really be open on another node at this point. If the
open with O_DIRECT fails, retry without it when plausible.
Remove the special-casing of tmpfs from the mmp test cases.
Change-Id: I41f4b31657b06f62f10be8d6e524d303dd36a321
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
In alloc_size_dir() it multiples signed ints when allocating the
buffer for rehashing an htree-indexed directory. This will overflow
when the directory size is above 4GB, which is possible with largedir
directories having about 100M entries, assuming an average 3/4 leaf
fullness and 24-byte filenames, or fewer with longer filenames.
The same problem exisgs in get_next_block().
Similarly, the out_dir struct used a signed int for the number of
blocks in the directory, which may result in a negative size if the
directory is over 2GB (about 50M entries or fewer).
Use appropriate unsigned variables for block counts, and use larger
types for calculating the byte count for memory offsets/sizes.
Such large directories not been seen yet, but are not too far away.
The ext2fs_get_array() function will properly calculate the needed
memory allocation, and detect overflow on 32-bit systems.
Add ext2fs_resize_array() to do the same for array resize.
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Lustre-bug-id: https://jira.whamcloud.com/browse/LU-13197
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Avoid overflowing the column-width calc printing files over 4B blocks.
Document the [KMG] suffixes for the "-b <blocksize>" option.
The blocksize is limited to at most 1GiB blocksize to avoid shifting
all extents down to zero GB in size. Even the use of 1GB blocksize
is unlikely, but non-ext4 filesystems may use multi-GB extents.
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Lustre-bug-id: https://jira.whamcloud.com/browse/LU-13197
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Consistently use ext2fs_get_mem() and ext2fs_free_mem() instead of
calling malloc() and free() directly in e2fsck. In several places
it is possible to use ext2fs_get_memzero() instead of explicitly
calling memset() on the memory afterward.
This is just a code cleanup, and does not fix any specific bugs.
[ Fix up library dependencies in e2fsck/Makefile.in to fix "make
check" breakages. -- TYT ]
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Lustre-bug-id: https://jira.whamcloud.com/browse/LU-13197
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Even though we don't have support for filesystems with over 4B inodes
in the current e2fsprogs, this may happen in the future. There are
latent overflow bugs when calculating the number of inodes in the
filesystem that can trivially be fixed now, rather than waiting for
them to be hit at some point in the future. The block number calcs
are already correct in this code.
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Lustre-bug-id: https://jira.whamcloud.com/browse/LU-13197
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Print inode numbers as unsigned values, to avoid printing negative
numbers for inodes above 2B.
Flags should be printed as hex instead of signed decimal values.
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Lustre-bug-id: https://jira.whamcloud.com/browse/LU-13197
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Allow comment lines with '#' at the start of the line in the command
file passed in to debugfs via the "-f" option or from standard input.
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Lustre-bug-id: https://jira.whamcloud.com/browse/LU-13197
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Pack struct dx_dir_info and dx_dirblock_info properly in memory, to
avoid holes, and fields are not larger than necessary. This reduces
the memory needed for each hashed dir, according to pahole(1) from:
struct dx_dir_info {
/* size: 32, cachelines: 1, members: 6 */
/* sum members: 26, holes: 1, sum holes: 2 */
/* padding: 4 */
};
struct dx_dirblock_info {
/* size: 56, cachelines: 1, members: 9 */
/* sum members: 48, holes: 2, sum holes: 8 */
/* last cacheline: 56 bytes */
};
to 8 bytes less for each directory and directory block, and leaves
space for future use if needed (e.g. larger numblocks):
struct dx_dir_info {
/* size: 24, cachelines: 1, members: 6 */
/* sum members: 20, holes: 1, sum holes: 4 */
/* bit holes: 1, sum bit holes: 7 bits */
};
struct dx_dirblock_info {
/* size: 48, cachelines: 1, members: 9 */
};
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Lustre-bug-id: https://jira.whamcloud.com/browse/LU-13197
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Don't use mallinfo() for determining the amount of memory used if it
is over 2GB. Otherwise, the signed ints used by this interface can
can overflow and return garbage values. This makes the actual amount
of memory used by e2fsck misleading and hard to determine.
Instead, use brk() to get the total amount of memory allocated, and print
this if the more detailed mallinfo() information is not suitable for use.
There does not appear to be a mallinfo64() variant of this function.
There does appear to be an abomination named malloc_info() that writes
XML-formatted malloc stats to a FILE stream that would need to be read
and parsed in order to get these stats, but that doesn't seem worthwhile.
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shilong Wang <wshilong@ddn.com>
Lustre-bug-id: https://jira.whamcloud.com/browse/LU-13197
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Use ext2_ino_t instead of ino_t for referencing inode numbers.
Use loff_t for for file offsets, and dgrp_t for group numbers.
Cast products to ssize_t before multiplication to avoid overflow.
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Shilong Wang <wshilong@ddn.com>
Lustre-bug-id: https://jira.whamcloud.com/browse/LU-13197
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
e2fsck_allocate_memory() takes an "unsigned int size" argument, which
will overflow for allocations above 4GB. This happens for dir_info
and dx_dir_info arrays when there are more than 350M directories in a
filesystem, and for the dblist array above 180M directories.
There is also a risk of overflow during the binary search in both
e2fsck_get_dir_info() and e2fsck_get_dx_dir_info() when the midpoint
of the array is calculated, if there would be more than 2B directories
in the filesystem and working above the half way point.
Also, in some places inode numbers are "int" instead of "ext2_ino_t",
which can also cause problems with the array size calculations, and
makes it hard to identify where inode numbers are used.
Fix e2fsck_allocate_memory() to take an "unsigned long" argument to
match ext2fs_get_mem(), so that it can do single memory allocations
over 4GB.
Fix e2fsck_get_dir_info() and e2fsck_get_dx_dir_info() to temporarily
use an unsigned long long value to calculate the midpoint (which will
always fit into an ext2_ino_t again afterward).
Change variables that hold inode numbers to be ext2_ino_t, and print
them as unsigned values instead of printing negative inode numbers.
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shilong Wang <wshilong@ddn.com>
Lustre-bug-id: https://jira.whamcloud.com/browse/LU-13197
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
gcc version 10 changed the default from -fcommon to -fno-common and as a
result e2fsprogs make check tests fail because tst_libext2fs.c end up
with a build error.
This is because it defines two global variables debug_prog_name and
extra_cmds that are already defined in debugfs/debugfs.c. With -fcommon
linker was able to resolve those into the same object, however with
-fno-common it's no longer able to do it and we end up with multiple
definition errors.
Fix the problem by using SKIP_GLOBDEFS macro to skip the variables
definition in debugfs.c. Note that debug_prog_name is also defined in
lib/ext2fs/extent.c when DEBUG macro is used, but this does not work even
with older gcc versions and is never used regardless so I am not going to
bother with it.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
By convention, lists of options in man pages use a label followed by an
indented description, such as this example from the Options section:
-R Recursively change attributes of directories and
their contents.
But the Attributes section places the available attributes mid-sentence,
which makes it visually more difficult to parse:
A file with the 'a' attribute set can only be opened
in append mode for writing. [...]
When a file with the 'A' attribute set is accessed, its
atime record is not modified. [...]
This patch places a label beside each attribute description, which (in
my opinion) improves readability, especially when visually skimming the
list. For example:
a A file with the 'a' attribute set can only be
opened in append mode for writing.
A When a file with the 'A' attribute set is accessed,
its atime record is not modified.
Signed-off-by: Jeremy Visser <jeremyvisser@google.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Reported-by: canardo909@gmx.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
If the bad block list has been reset in the middle of an inode scan,
it's possible for bb->list[scan->bad_blocks_ptr] to result in an
out-of-bounds read access.
This is highly unlikely to happen under normal circumstances; in
particular, we generally don't use bad block inodes any more. In
addition, this would only happen if the bad block inode itself is
corrupt so e2fsck needs to wipe it out. This might cause e2fsck to
crash, but it will more likely cause a part of the inode table to be
wrongly considered invalid, causing file system to be incorrectly
fixed.
This was reported by TALOS as TALOS-2020-0974 and CVE-2020-6057, but
after closer examination, we don't believe this can be used in any way
to exploit the system or release information about the system, since
all this can do is to cause part of the inode table to be skipped when
it shouldn't be, and this can't be leveraged since any information
about the ASLR of the process is obsolete once e2fsck exits.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
If overhead is not recorded in the super block, it is calculated
during mount in kernel, for bigalloc file systems it takes
O(groups**2) in time. For a 1PB device with 32K cluster size it takes
~12 mins to mount, with most of the time spent on figuring out
overhead.
While we can not improve the overhead algorithm in kernel due to the
nature of bigalloc, we can work out the overhead during mke2fs and set
it in the super block, avoiding calculating it every time when it
mounts.
Overhead is s_first_data_block plus internal journal blocks plus the
block and inode bitmaps, inode table, super block backups and group
descriptor blocks for every group. This patch introduces
ext2fs_count_used_clusters(), which calculates the clusters used in
the block bitmap for the given range.
When bad blocks are involved, it gets tricky because the blocks
counted as overhead and the bad blocks can end up in the same
allocation cluster. In this case we will unmark the bad blocks from
the block bitmap, convert to cluster bitmap and get the overhead, then
mark the bad blocks back in the cluster bitmap.
Reset the overhead to zero when resizing, we can not simply count the
used blocks as overhead like we do when mke2fs. The overhead can be
calculated by kernel side during mount.
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Rename s_overhead_blocks field from struct ext2_super_block to
make it consistent with the kernel counterpart.
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
For a bigalloc filesystem, converting the block bitmap from blocks
to chunks in ext2fs_convert_subcluster_bitmap() can take a long time
when the device is huge, because we test the bitmap
bit-by-bit using ext2fs_test_block_bitmap2().
Use ext2fs_find_first_set_block_bitmap2() which is more efficient
for mke2fs when the fs is mostly empty.
e2fsck can also benefit from this during pass1 block scanning.
Time taken for "mke2fs -O bigalloc,extent -C 131072 -b 4096" on a 1PB
device:
without patch:
real 27m49.457s
user 21m36.474s
sys 6m9.514s
with patch:
real 6m31.908s
user 0m1.806s
sys 6m29.697s
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
|
|
The printf("%.*s") format requires both the buffer size and buffer
pointer to be specified for each use. Since this is repeatedly given
as "(int)sizeof(buf), (char *)buf" for mmp_nodename and mmp_bdevname
fields, with typecasts to avoid compiler warnings.
Add a helper macro EXT2_LEN_STR() to avoid repeated boilerplate code.
This can also be used for other superblock buffer fields that may not
have NUL-terminated strings (e.g. s_volume_name, s_last_mounted,
s_{first,last}_error_func, s_mount_opts) to simplify code and avoid
the need for temporary buffers for NUL-termination.
Annotate the superblock string fields that may not be NUL-terminated.
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Don't assume that mmp_nodename and mmp_bdevname are NUL terminated,
since very long node/device names may completely fill the buffers.
Limit string printing to the maximum buffer size for safety, and
change the field definitions to __u8 to make it more clear that
they are not NUL-terminated strings, as is done with other strings
in the superblock that do not have NUL termination.
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Ext4 has an extent status cache; add the fiemap extensions so we can
query the kernel for the extent status cache information.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Previously, we just cleared the bad block list and restarted the inode
scan, but we didn't do a full reset of all of e2fsck's state. When
code handling this case; we didn't have the framework to do a
restarted run. Now that we do, we can simply the code and make it
more correct.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
We now restart the full e2fsck instead of unwinding and restarting
pass1. So most of what used to be in unwind_pass1() has been moved
elsewhere. Let's git rid of it entirely, which simplifies and shrinks
pass1.c slightly.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
If the EXT2_FLAG_SUPER_ONLY is set, there's no reason to allocate the
shadow block group descriptors and byte swap the group descriptors.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
If the EXT2_FLAG_SUPER_ONLY is not set, and the group descriptors are
not loaded, ext2fs_flush[2]() will return EXT2_ET_NO_GDESC.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
|
|
This is really only needed in the 1.46+ where the EXT2_FLAG_SUPER_ONLY
is honored by ext2fs_open to only read the superblock, so that
fs->group_desc can be NULL. We define it in the maint branch so that
we can be sure the error tables are kept in sync (in the unlikely case
that a new error code needs to be assigned in the maint branch).
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
This is a similar fix as c9a8c53b17cc ("libext2fs: fix crash in
ext2fs_open2() on Big Endian systems").
Commit e6069a05: ("Teach ext2fs_open2() to honor the
EXT2_FLAG_SUPER_ONLY flag") changed how the function
ext2fs_group_desc() handled a request for a gdp pointer for a group
larger than the number of groups in the file system; it now returns
NULL, instead of returning a pointer beyond the end of the array.
Previously, the ext2fs_imager_super_write() function would swap all of
the block group descriptors in a block, even if they are beyond the
end of the file system. This was OK, since we were not overrunning
the allocated memory, since it was rounded to a block boundary. But
now that ext2fs_group_desc() would return NULL for those gdp, it would
cause ext2fs_open2(), when it was byte swapping the block group
descriptors on Big Endian systems, to dereference a null pointer and
crash.
This commit adds a NULL pointer check to avoid byte swapping those
block group descriptors in a bg descriptor block, but which are beyond
the end of the file system, to address this crash.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Anatoly Pugachev <matorola@gmail.com>
|
|
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
v1.45.5
|
|
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
By only compiling the ext2fs_swap_* functions on big-endian systems,
it causes debian/libext2fs2.symbols to need to be different on
different little-endian vs big-endian architectures. Including the
ext2fS_swap_* functions increases the size of the library by ~6k,
which is not a big deal.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The two second sleep is only needed in e2scrub, and when there is a
failure, so that systemd has a chance to gather the log output before
e2scrub exits. It's not needed if the script is exiting successfully,
and it's never needed for e2scrub_all ever.
Addresses-Debian-Bug: #948193
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Previously we would scan /etc/mtab if the device is not found in
/proc/mounts. This is because previously, /etc/mtab would have the
filename for a loopback mount, while /proc/mounts would only have
something like /dev/loop0. Since on many systems /etc/mtab is now a
symlink to /proc/mounts, ismounted.c has a special function,
check_loop_mounted.
For this reason, it's not necessary to fall back to trying to scan
/etc/mtab if a device / filename is not found from scanning
/proc/mounts. This also prevents failures if the file /etc/mtab does
not exist but /proc/mounts does exist when checking to see if a device
is mounted when it isn't.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
|
|
We are no longer enabling periodic file system checks by default in
mke2fs. The only reason why we force file system checks if the last
mount time or last write time in the superblock is if this might
bypass the periodic file systme checks. So if the checkinterval is
zero, skip the last mount/write time checks since there's no reason to
force a check just because the system clock is incorrect.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
With newer versions of gcc -pedantic is *super* pedantic, and
generates way too much noise. So we drop it, and thus we don't need
util/gcc-wall-cleanup and util/static-analysis-cleanup.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Enable the use of files > 2GB when using the inode_io manager.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|