sphinx.addnodesdocument)}( rawsourcechildren]( translations LanguagesNode)}(hhh](h pending_xref)}(hhh]docutils.nodesTextChinese (Simplified)}parenthsba attributes}(ids]classes]names]dupnames]backrefs] refdomainstdreftypedoc reftarget-/translations/zh_CN/filesystems/ext4/overviewmodnameN classnameN refexplicitutagnamehhh ubh)}(hhh]hChinese (Traditional)}hh2sbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftarget-/translations/zh_TW/filesystems/ext4/overviewmodnameN classnameN refexplicituh1hhh ubh)}(hhh]hItalian}hhFsbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftarget-/translations/it_IT/filesystems/ext4/overviewmodnameN classnameN refexplicituh1hhh ubh)}(hhh]hJapanese}hhZsbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftarget-/translations/ja_JP/filesystems/ext4/overviewmodnameN classnameN refexplicituh1hhh ubh)}(hhh]hKorean}hhnsbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftarget-/translations/ko_KR/filesystems/ext4/overviewmodnameN classnameN refexplicituh1hhh ubh)}(hhh]hSpanish}hhsbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftarget-/translations/sp_SP/filesystems/ext4/overviewmodnameN classnameN refexplicituh1hhh ubeh}(h]h ]h"]h$]h&]current_languageEnglishuh1h hh _documenthsourceNlineNubhcomment)}(h SPDX-License-Identifier: GPL-2.0h]h SPDX-License-Identifier: GPL-2.0}hhsbah}(h]h ]h"]h$]h&] xml:spacepreserveuh1hhhhhhG/var/lib/git/docbuild/linux/Documentation/filesystems/ext4/overview.rsthKubhsection)}(hhh](htitle)}(hHigh Level Designh]hHigh Level Design}(hhhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhhhhhKubh paragraph)}(hX0An ext4 file system is split into a series of block groups. To reduce performance difficulties due to fragmentation, the block allocator tries very hard to keep each file's blocks within the same group, thereby reducing seek times. The size of a block group is specified in ``sb.s_blocks_per_group`` blocks, though it can also calculated as 8 * ``block_size_in_bytes``. With the default block size of 4KiB, each group will contain 32,768 blocks, for a length of 128MiB. 
The number of block groups is the size of the device divided by the size of a block group.h](hXAn ext4 file system is split into a series of block groups. To reduce performance difficulties due to fragmentation, the block allocator tries very hard to keep each file’s blocks within the same group, thereby reducing seek times. The size of a block group is specified in }(hhhhhNhNubhliteral)}(h``sb.s_blocks_per_group``h]hsb.s_blocks_per_group}(hhhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhubh. blocks, though it can also calculated as 8 * }(hhhhhNhNubh)}(h``block_size_in_bytes``h]hblock_size_in_bytes}(hhhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhubh. With the default block size of 4KiB, each group will contain 32,768 blocks, for a length of 128MiB. The number of block groups is the size of the device divided by the size of a block group.}(hhhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhhhhubh)}(hAll fields in ext4 are written to disk in little-endian order. HOWEVER, all fields in jbd2 (the journal) are written to disk in big-endian order.h]hAll fields in ext4 are written to disk in little-endian order. HOWEVER, all fields in jbd2 (the journal) are written to disk in big-endian order.}(hhhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhhhhubh)}(h SPDX-License-Identifier: GPL-2.0h]h SPDX-License-Identifier: GPL-2.0}hj sbah}(h]h ]h"]h$]h&]hhuh1hhhhhh)Documentation/filesystems/ext4/blocks.rsthKubh)}(hhh](h)}(hBlocksh]hBlocks}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhjhKubh)}(hXext4 allocates storage space in units of “blocks”. A block is a group of sectors between 1KiB and 64KiB, and the number of sectors must be an integral power of 2. Blocks are in turn grouped into larger units called block groups. Block size is specified at mkfs time and typically is 4KiB. You may experience mounting problems if block size is greater than page size (i.e. 64KiB blocks on a i386 which only has 4KiB memory pages). By default a filesystem can contain 2^32 blocks; if the '64bit' feature is enabled, then a filesystem can have 2^64 blocks. 
The location of structures is stored in terms of the block number the structure lives in and not the absolute offset on disk.h]hXext4 allocates storage space in units of “blocks”. A block is a group of sectors between 1KiB and 64KiB, and the number of sectors must be an integral power of 2. Blocks are in turn grouped into larger units called block groups. Block size is specified at mkfs time and typically is 4KiB. You may experience mounting problems if block size is greater than page size (i.e. 64KiB blocks on a i386 which only has 4KiB memory pages). By default a filesystem can contain 2^32 blocks; if the ‘64bit’ feature is enabled, then a filesystem can have 2^64 blocks. The location of structures is stored in terms of the block number the structure lives in and not the absolute offset on disk.}(hj-hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhjhhubh)}(h.For 32-bit filesystems, limits are as follows:h]h.For 32-bit filesystems, limits are as follows:}(hj;hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhjhhubhtable)}(hhh]htgroup)}(hhh](hcolspec)}(hhh]h}(h]h ]h"]h$]h&]colwidthKuh1jShjPubjT)}(hhh]h}(h]h ]h"]h$]h&]j^Kuh1jShjPubjT)}(hhh]h}(h]h ]h"]h$]h&]j^Kuh1jShjPubjT)}(hhh]h}(h]h ]h"]h$]h&]j^Kuh1jShjPubjT)}(hhh]h}(h]h ]h"]h$]h&]j^Kuh1jShjPubhthead)}(hhh]hrow)}(hhh](hentry)}(hhh]h)}(hItemh]hItem}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h1KiBh]h1KiB}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h2KiBh]h2KiB}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h4KiBh]h4KiB}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h64KiBh]h64KiB}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubah}(h]h ]h"]h$]h&]uh1jhjPubhtbody)}(hhh](j)}(hhh](j)}(hhh]h)}(hBlocksh]hBlocks}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h2^32h]h2^32}(hj0hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhj-ubah}(h]h 
]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h2^32h]h2^32}(hjGhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhjDubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h2^32h]h2^32}(hj^hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhj[ubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h2^32h]h2^32}(hjuhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK hjrubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(hInodesh]hInodes}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK!hjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h2^32h]h2^32}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK"hjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h2^32h]h2^32}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK#hjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h2^32h]h2^32}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK$hjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h2^32h]h2^32}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK%hjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(hFile System Sizeh]hFile System Size}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK&hjubah}(h]h ]h"]h$]h&]uh1jhj ubj)}(hhh]h)}(h4TiBh]h4TiB}(hj(hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK'hj%ubah}(h]h ]h"]h$]h&]uh1jhj ubj)}(hhh]h)}(h8TiBh]h8TiB}(hj?hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK(hj<ubah}(h]h ]h"]h$]h&]uh1jhj ubj)}(hhh]h)}(h16TiBh]h16TiB}(hjVhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK)hjSubah}(h]h ]h"]h$]h&]uh1jhj ubj)}(hhh]h)}(h256TiBh]h256TiB}(hjmhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK*hjjubah}(h]h ]h"]h$]h&]uh1jhj ubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(hBlocks Per Block Grouph]hBlocks Per Block Group}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK+hjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h8,192h]h8,192}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK,hjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h16,384h]h16,384}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK-hjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h32,768h]h32,768}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK.hjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h524,288h]h524,288}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK/hjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(hInodes Per Block Grouph]hInodes Per Block Group}(hj 
hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK0hjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h8,192h]h8,192}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK1hjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h16,384h]h16,384}(hj7hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK2hj4ubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h32,768h]h32,768}(hjNhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK3hjKubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h524,288h]h524,288}(hjehhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK4hjbubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(hBlock Group Sizeh]hBlock Group Size}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK5hjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h8MiBh]h8MiB}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK6hjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h32MiBh]h32MiB}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK7hjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h128MiBh]h128MiB}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK8hjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h32GiBh]h32GiB}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK9hjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(hBlocks Per File, Extentsh]hBlocks Per File, Extents}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK:hjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h2^32h]h2^32}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK;hjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h2^32h]h2^32}(hj/hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhjZubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(hBlocks Per File, Block Mapsh]hBlocks Per File, Block Maps}(hj}hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK?hjzubah}(h]h ]h"]h$]h&]uh1jhjwubj)}(hhh]h)}(h 16,843,020h]h 16,843,020}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK@hjubah}(h]h ]h"]h$]h&]uh1jhjwubj)}(hhh]h)}(h 134,480,396h]h 134,480,396}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKAhjubah}(h]h ]h"]h$]h&]uh1jhjwubj)}(hhh]h)}(h 1,074,791,436h]h 1,074,791,436}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKBhjubah}(h]h ]h"]h$]h&]uh1jhjwubj)}(hhh]h)}(h=4,398,314,962,956 (really 2^32 due to field size limitations)h]h=4,398,314,962,956 (really 2^32 due to field size 
limitations)}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKChjubah}(h]h ]h"]h$]h&]uh1jhjwubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(hFile Size, Extentsh]hFile Size, Extents}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKDhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h4TiBh]h4TiB}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKEhj ubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h8TiBh]h8TiB}(hj'hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKFhj$ubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h16TiBh]h16TiB}(hj>hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKGhj;ubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h256TiBh]h256TiB}(hjUhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKHhjRubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(hFile Size, Block Mapsh]hFile Size, Block Maps}(hjuhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKIhjrubah}(h]h ]h"]h$]h&]uh1jhjoubj)}(hhh]h)}(h16GiBh]h16GiB}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKJhjubah}(h]h ]h"]h$]h&]uh1jhjoubj)}(hhh]h)}(h256GiBh]h256GiB}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKKhjubah}(h]h ]h"]h$]h&]uh1jhjoubj)}(hhh]h)}(h4TiBh]h4TiB}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKLhjubah}(h]h ]h"]h$]h&]uh1jhjoubj)}(hhh]h)}(h256TiBh]h256TiB}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKMhjubah}(h]h ]h"]h$]h&]uh1jhjoubeh}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjPubeh}(h]h ]h"]h$]h&]colsKuh1jNhjKubah}(h]h ]colwidths-givenah"]h$]h&]uh1jIhjhhhNhNubh)}(h.For 64-bit filesystems, limits are as follows:h]h.For 64-bit filesystems, limits are as follows:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKOhjhhubjJ)}(hhh]jO)}(hhh](jT)}(hhh]h}(h]h ]h"]h$]h&]j^Kuh1jShjubjT)}(hhh]h}(h]h ]h"]h$]h&]j^Kuh1jShjubjT)}(hhh]h}(h]h ]h"]h$]h&]j^Kuh1jShjubjT)}(hhh]h}(h]h ]h"]h$]h&]j^Kuh1jShjubjT)}(hhh]h}(h]h ]h"]h$]h&]j^Kuh1jShjubj)}(hhh]j)}(hhh](j)}(hhh]h)}(hItemh]hItem}(hjIhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKUhjFubah}(h]h ]h"]h$]h&]uh1jhjCubj)}(hhh]h)}(h1KiBh]h1KiB}(hj`hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKVhj]ubah}(h]h ]h"]h$]h&]uh1jhjCubj)}(hhh]h)}(h2KiBh]h2KiB}(hjwhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKWhjtubah}(h]h 
]h"]h$]h&]uh1jhjCubj)}(hhh]h)}(h4KiBh]h4KiB}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKXhjubah}(h]h ]h"]h$]h&]uh1jhjCubj)}(hhh]h)}(h64KiBh]h64KiB}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKYhjubah}(h]h ]h"]h$]h&]uh1jhjCubeh}(h]h ]h"]h$]h&]uh1jhj@ubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh](j)}(hhh]h)}(hBlocksh]hBlocks}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKZhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h2^64h]h2^64}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK[hjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h2^64h]h2^64}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK\hjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h2^64h]h2^64}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK]hjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h2^64h]h2^64}(hj*hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK^hj'ubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(hInodesh]hInodes}(hjJhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK_hjGubah}(h]h ]h"]h$]h&]uh1jhjDubj)}(hhh]h)}(h2^32h]h2^32}(hjahhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK`hj^ubah}(h]h ]h"]h$]h&]uh1jhjDubj)}(hhh]h)}(h2^32h]h2^32}(hjxhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKahjuubah}(h]h ]h"]h$]h&]uh1jhjDubj)}(hhh]h)}(h2^32h]h2^32}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKbhjubah}(h]h ]h"]h$]h&]uh1jhjDubj)}(hhh]h)}(h2^32h]h2^32}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKchjubah}(h]h ]h"]h$]h&]uh1jhjDubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(hFile System Sizeh]hFile System Size}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKdhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h16ZiBh]h16ZiB}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKehjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h32ZiBh]h32ZiB}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKfhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h64ZiBh]h64ZiB}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKghj ubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h1YiBh]h1YiB}(hj" hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhhj ubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(hBlocks Per Block Grouph]hBlocks Per Block Group}(hjB hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKihj? 
ubah}(h]h ]h"]h$]h&]uh1jhj< ubj)}(hhh]h)}(h8,192h]h8,192}(hjY hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKjhjV ubah}(h]h ]h"]h$]h&]uh1jhj< ubj)}(hhh]h)}(h16,384h]h16,384}(hjp hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKkhjm ubah}(h]h ]h"]h$]h&]uh1jhj< ubj)}(hhh]h)}(h32,768h]h32,768}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKlhj ubah}(h]h ]h"]h$]h&]uh1jhj< ubj)}(hhh]h)}(h524,288h]h524,288}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKmhj ubah}(h]h ]h"]h$]h&]uh1jhj< ubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(hInodes Per Block Grouph]hInodes Per Block Group}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKnhj ubah}(h]h ]h"]h$]h&]uh1jhj ubj)}(hhh]h)}(h8,192h]h8,192}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKohj ubah}(h]h ]h"]h$]h&]uh1jhj ubj)}(hhh]h)}(h16,384h]h16,384}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKphj ubah}(h]h ]h"]h$]h&]uh1jhj ubj)}(hhh]h)}(h32,768h]h32,768}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKqhj ubah}(h]h ]h"]h$]h&]uh1jhj ubj)}(hhh]h)}(h524,288h]h524,288}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKrhj ubah}(h]h ]h"]h$]h&]uh1jhj ubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(hBlock Group Sizeh]hBlock Group Size}(hj: hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKshj7 ubah}(h]h ]h"]h$]h&]uh1jhj4 ubj)}(hhh]h)}(h8MiBh]h8MiB}(hjQ hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKthjN ubah}(h]h ]h"]h$]h&]uh1jhj4 ubj)}(hhh]h)}(h32MiBh]h32MiB}(hjh hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKuhje ubah}(h]h ]h"]h$]h&]uh1jhj4 ubj)}(hhh]h)}(h128MiBh]h128MiB}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKvhj| ubah}(h]h ]h"]h$]h&]uh1jhj4 ubj)}(hhh]h)}(h32GiBh]h32GiB}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKwhj ubah}(h]h ]h"]h$]h&]uh1jhj4 ubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(hBlocks Per File, Extentsh]hBlocks Per File, Extents}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKxhj ubah}(h]h ]h"]h$]h&]uh1jhj ubj)}(hhh]h)}(h2^32h]h2^32}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKyhj ubah}(h]h ]h"]h$]h&]uh1jhj ubj)}(hhh]h)}(h2^32h]h2^32}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKzhj ubah}(h]h ]h"]h$]h&]uh1jhj ubj)}(hhh]h)}(h2^32h]h2^32}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK{hj 
ubah}(h]h ]h"]h$]h&]uh1jhj ubj)}(hhh]h)}(h2^32h]h2^32}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK|hj ubah}(h]h ]h"]h$]h&]uh1jhj ubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(hBlocks Per File, Block Mapsh]hBlocks Per File, Block Maps}(hj2 hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK}hj/ ubah}(h]h ]h"]h$]h&]uh1jhj, ubj)}(hhh]h)}(h 16,843,020h]h 16,843,020}(hjI hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK~hjF ubah}(h]h ]h"]h$]h&]uh1jhj, ubj)}(hhh]h)}(h 134,480,396h]h 134,480,396}(hj` hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhj] ubah}(h]h ]h"]h$]h&]uh1jhj, ubj)}(hhh]h)}(h 1,074,791,436h]h 1,074,791,436}(hjw hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhjt ubah}(h]h ]h"]h$]h&]uh1jhj, ubj)}(hhh]h)}(h=4,398,314,962,956 (really 2^32 due to field size limitations)h]h=4,398,314,962,956 (really 2^32 due to field size limitations)}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhj ubah}(h]h ]h"]h$]h&]uh1jhj, ubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(hFile Size, Extentsh]hFile Size, Extents}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhj ubah}(h]h ]h"]h$]h&]uh1jhj ubj)}(hhh]h)}(h4TiBh]h4TiB}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhj ubah}(h]h ]h"]h$]h&]uh1jhj ubj)}(hhh]h)}(h8TiBh]h8TiB}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhj ubah}(h]h ]h"]h$]h&]uh1jhj ubj)}(hhh]h)}(h16TiBh]h16TiB}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhj ubah}(h]h ]h"]h$]h&]uh1jhj ubj)}(hhh]h)}(h256TiBh]h256TiB}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhj ubah}(h]h ]h"]h$]h&]uh1jhj ubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(hFile Size, Block Mapsh]hFile Size, Block Maps}(hj* hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhj' ubah}(h]h ]h"]h$]h&]uh1jhj$ ubj)}(hhh]h)}(h16GiBh]h16GiB}(hjA hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhj> ubah}(h]h ]h"]h$]h&]uh1jhj$ ubj)}(hhh]h)}(h256GiBh]h256GiB}(hjX hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhjU ubah}(h]h ]h"]h$]h&]uh1jhj$ ubj)}(hhh]h)}(h4TiBh]h4TiB}(hjo hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhjl ubah}(h]h ]h"]h$]h&]uh1jhj$ ubj)}(hhh]h)}(h256TiBh]h256TiB}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhj ubah}(h]h ]h"]h$]h&]uh1jhj$ ubeh}(h]h 
]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]colsKuh1jNhj ubah}(h]h ]jah"]h$]h&]uh1jIhjhhhNhNubh)}(hNote: Files not using extents (i.e. files using block maps) must be placed within the first 2^32 blocks of a filesystem. Files with extents must be placed within the first 2^48 blocks of a filesystem. It's not clear what happens with larger filesystems.h]hNote: Files not using extents (i.e. files using block maps) must be placed within the first 2^32 blocks of a filesystem. Files with extents must be placed within the first 2^48 blocks of a filesystem. It’s not clear what happens with larger filesystems.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhjhhubh)}(h SPDX-License-Identifier: GPL-2.0h]h SPDX-License-Identifier: GPL-2.0}hj sbah}(h]h ]h"]h$]h&]hhuh1hhjhhh-Documentation/filesystems/ext4/blockgroup.rsthKubeh}(h]blocksah ]h"]blocksah$]h&]uh1hhhhhhjhKubh)}(hhh](h)}(hLayouth]hLayout}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hhhj hKubh)}(hThe layout of a standard block group is approximately as follows (each of these fields is discussed in a separate section below):h]hThe layout of a standard block group is approximately as follows (each of these fields is discussed in a separate section below):}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hKhj hhubjJ)}(hhh]jO)}(hhh](jT)}(hhh]h}(h]h ]h"]h$]h&]j^Kuh1jShj ubjT)}(hhh]h}(h]h ]h"]h$]h&]j^Kuh1jShj ubjT)}(hhh]h}(h]h ]h"]h$]h&]j^Kuh1jShj ubjT)}(hhh]h}(h]h ]h"]h$]h&]j^Kuh1jShj ubjT)}(hhh]h}(h]h ]h"]h$]h&]j^Kuh1jShj ubjT)}(hhh]h}(h]h ]h"]h$]h&]j^Kuh1jShj ubjT)}(hhh]h}(h]h ]h"]h$]h&]j^Kuh1jShj ubjT)}(hhh]h}(h]h ]h"]h$]h&]j^Kuh1jShj ubj)}(hhh]j)}(hhh](j)}(hhh]h)}(hGroup 0 Paddingh]hGroup 0 Padding}(hjN hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hK hjK ubah}(h]h ]h"]h$]h&]uh1jhjH ubj)}(hhh]h)}(hext4 Super Blockh]hext4 Super Block}(hje hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hKhjb ubah}(h]h ]h"]h$]h&]uh1jhjH ubj)}(hhh]h)}(hGroup Descriptorsh]hGroup Descriptors}(hj| hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hKhjy ubah}(h]h ]h"]h$]h&]uh1jhjH 
ubj)}(hhh]h)}(hReserved GDT Blocksh]hReserved GDT Blocks}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hKhj ubah}(h]h ]h"]h$]h&]uh1jhjH ubj)}(hhh]h)}(hData Block Bitmaph]hData Block Bitmap}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hKhj ubah}(h]h ]h"]h$]h&]uh1jhjH ubj)}(hhh]h)}(h inode Bitmaph]h inode Bitmap}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hKhj ubah}(h]h ]h"]h$]h&]uh1jhjH ubj)}(hhh]h)}(h inode Tableh]h inode Table}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hKhj ubah}(h]h ]h"]h$]h&]uh1jhjH ubj)}(hhh]h)}(h Data Blocksh]h Data Blocks}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hKhj ubah}(h]h ]h"]h$]h&]uh1jhjH ubeh}(h]h ]h"]h$]h&]uh1jhjE ubah}(h]h ]h"]h$]h&]uh1jhj ubj)}(hhh]j)}(hhh](j)}(hhh]h)}(h 1024 bytesh]h 1024 bytes}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h1 blockh]h1 block}(hj/hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hKhj,ubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h many blocksh]h many blocks}(hjFhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hKhjCubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h many blocksh]h many blocks}(hj]hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hKhjZubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h1 blockh]h1 block}(hjthhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hKhjqubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h1 blockh]h1 block}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h many blocksh]h many blocks}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(hmany more blocksh]hmany more blocks}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hKhjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubah}(h]h ]h"]h$]h&]uh1jhj ubeh}(h]h ]h"]h$]h&]colsKuh1jNhj ubah}(h]h ]jah"]h$]h&]uh1jIhj hhhNhNubh)}(hXFor the special case of block group 0, the first 1024 bytes are unused, to allow for the installation of x86 boot sectors and other oddities. The superblock will start at offset 1024 bytes, whichever block that happens to be (usually 0). 
However, if for some reason the block size = 1024, then block 0 is marked in use and the superblock goes in block 1. For all other block groups, there is no padding.h]hXFor the special case of block group 0, the first 1024 bytes are unused, to allow for the installation of x86 boot sectors and other oddities. The superblock will start at offset 1024 bytes, whichever block that happens to be (usually 0). However, if for some reason the block size = 1024, then block 0 is marked in use and the superblock goes in block 1. For all other block groups, there is no padding.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hKhj hhubh)}(hX0The ext4 driver primarily works with the superblock and the group descriptors that are found in block group 0. Redundant copies of the superblock and group descriptors are written to some of the block groups across the disk in case the beginning of the disk gets trashed, though not all block groups necessarily host a redundant copy (see following paragraph for more details). If the group does not have a redundant copy, the block group begins with the data block bitmap. Note also that when the filesystem is freshly formatted, mkfs will allocate “reserve GDT block” space after the block group descriptors and before the start of the block bitmaps to allow for future expansion of the filesystem. By default, a filesystem is allowed to increase in size by a factor of 1024x over the original filesystem size.h]hX0The ext4 driver primarily works with the superblock and the group descriptors that are found in block group 0. Redundant copies of the superblock and group descriptors are written to some of the block groups across the disk in case the beginning of the disk gets trashed, though not all block groups necessarily host a redundant copy (see following paragraph for more details). If the group does not have a redundant copy, the block group begins with the data block bitmap. 
Note also that when the filesystem is freshly formatted, mkfs will allocate “reserve GDT block” space after the block group descriptors and before the start of the block bitmaps to allow for future expansion of the filesystem. By default, a filesystem is allowed to increase in size by a factor of 1024x over the original filesystem size.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hK%hj hhubh)}(hThe location of the inode table is given by ``grp.bg_inode_table_*``. It is continuous range of blocks large enough to contain ``sb.s_inodes_per_group * sb.s_inode_size`` bytes.h](h,The location of the inode table is given by }(hjhhhNhNubh)}(h``grp.bg_inode_table_*``h]hgrp.bg_inode_table_*}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjubh;. It is continuous range of blocks large enough to contain }(hjhhhNhNubh)}(h+``sb.s_inodes_per_group * sb.s_inode_size``h]h'sb.s_inodes_per_group * sb.s_inode_size}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjubh bytes.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhj hK2hj hhubh)}(hXAs for the ordering of items in a block group, it is generally established that the super block and the group descriptor table, if present, will be at the beginning of the block group. The bitmaps and the inode table can be anywhere, and it is quite possible for the bitmaps to come after the inode table, or for both to be in different groups (flex_bg). Leftover space is used for file data blocks, indirect block maps, extent tree blocks, and extended attributes.h]hXAs for the ordering of items in a block group, it is generally established that the super block and the group descriptor table, if present, will be at the beginning of the block group. The bitmaps and the inode table can be anywhere, and it is quite possible for the bitmaps to come after the inode table, or for both to be in different groups (flex_bg). 
Leftover space is used for file data blocks, indirect block maps, extent tree blocks, and extended attributes.}(hj4hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hK6hj hhubeh}(h]layoutah ]h"]layoutah$]h&]uh1hhhhhhj hKubh)}(hhh](h)}(hFlexible Block Groupsh]hFlexible Block Groups}(hjMhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjJhhhj hK?ubh)}(hXStarting in ext4, there is a new feature called flexible block groups (flex_bg). In a flex_bg, several block groups are tied together as one logical block group; the bitmap spaces and the inode table space in the first block group of the flex_bg are expanded to include the bitmaps and inode tables of all other block groups in the flex_bg. For example, if the flex_bg size is 4, then group 0 will contain (in order) the superblock, group descriptors, data block bitmaps for groups 0-3, inode bitmaps for groups 0-3, inode tables for groups 0-3, and the remaining space in group 0 is for file data. The effect of this is to group the block group metadata close together for faster loading, and to enable large files to be continuous on disk. Backup copies of the superblock and group descriptors are always at the beginning of block groups, even if flex_bg is enabled. The number of block groups that make up a flex_bg is given by 2 ^ ``sb.s_log_groups_per_flex``.h](hXStarting in ext4, there is a new feature called flexible block groups (flex_bg). In a flex_bg, several block groups are tied together as one logical block group; the bitmap spaces and the inode table space in the first block group of the flex_bg are expanded to include the bitmaps and inode tables of all other block groups in the flex_bg. For example, if the flex_bg size is 4, then group 0 will contain (in order) the superblock, group descriptors, data block bitmaps for groups 0-3, inode bitmaps for groups 0-3, inode tables for groups 0-3, and the remaining space in group 0 is for file data. 
The effect of this is to group the block group metadata close together for faster loading, and to enable large files to be continuous on disk. Backup copies of the superblock and group descriptors are always at the beginning of block groups, even if flex_bg is enabled. The number of block groups that make up a flex_bg is given by 2 ^ }(hj[hhhNhNubh)}(h``sb.s_log_groups_per_flex``h]hsb.s_log_groups_per_flex}(hjchhhNhNubah}(h]h ]h"]h$]h&]uh1hhj[ubh.}(hj[hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhj hKAhjJhhubeh}(h]flexible-block-groupsah ]h"]flexible block groupsah$]h&]uh1hhhhhhj hK?ubh)}(hhh](h)}(hMeta Block Groupsh]hMeta Block Groups}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhj hKQubh)}(hXKWithout the option META_BG, for safety concerns, all block group descriptors copies are kept in the first block group. Given the default 128MiB(2^27 bytes) block group size and 64-byte group descriptors, ext4 can have at most 2^27/64 = 2^21 block groups. This limits the entire filesystem size to 2^21 * 2^27 = 2^48bytes or 256TiB.h]hXKWithout the option META_BG, for safety concerns, all block group descriptors copies are kept in the first block group. Given the default 128MiB(2^27 bytes) block group size and 64-byte group descriptors, ext4 can have at most 2^27/64 = 2^21 block groups. This limits the entire filesystem size to 2^21 * 2^27 = 2^48bytes or 256TiB.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hKShjhhubh)}(hXBThe solution to this problem is to use the metablock group feature (META_BG), which is already in ext3 for all 2.6 releases. With the META_BG feature, ext4 filesystems are partitioned into many metablock groups. Each metablock group is a cluster of block groups whose group descriptor structures can be stored in a single disk block. For ext4 filesystems with 4 KB block size, a single metablock group partition includes 64 block groups, or 8 GiB of disk space. 
The metablock group feature moves the location of the group descriptors from the congested first block group of the whole filesystem into the first group of each metablock group itself. The backups are in the second and last group of each metablock group. This increases the 2^21 maximum block groups limit to the hard limit 2^32, allowing support for a 512PiB filesystem.h]hXBThe solution to this problem is to use the metablock group feature (META_BG), which is already in ext3 for all 2.6 releases. With the META_BG feature, ext4 filesystems are partitioned into many metablock groups. Each metablock group is a cluster of block groups whose group descriptor structures can be stored in a single disk block. For ext4 filesystems with 4 KB block size, a single metablock group partition includes 64 block groups, or 8 GiB of disk space. The metablock group feature moves the location of the group descriptors from the congested first block group of the whole filesystem into the first group of each metablock group itself. The backups are in the second and last group of each metablock group. This increases the 2^21 maximum block groups limit to the hard limit 2^32, allowing support for a 512PiB filesystem.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hKYhjhhubh)}(hXeThe change in the filesystem format replaces the current scheme where the superblock is followed by a variable-length set of block group descriptors. Instead, the superblock and a single block group descriptor block is placed at the beginning of the first, second, and last block groups in a meta-block group. A meta-block group is a collection of block groups which can be described by a single block group descriptor block. Since the size of the block group descriptor structure is 64 bytes, a meta-block group contains 16 block groups for filesystems with a 1KB block size, and 64 block groups for filesystems with a 4KB blocksize. 
Filesystems can either be created using this new block group descriptor layout, or existing filesystems can be resized on-line, and the field s_first_meta_bg in the superblock will indicate the first block group using this new layout.h]hXeThe change in the filesystem format replaces the current scheme where the superblock is followed by a variable-length set of block group descriptors. Instead, the superblock and a single block group descriptor block is placed at the beginning of the first, second, and last block groups in a meta-block group. A meta-block group is a collection of block groups which can be described by a single block group descriptor block. Since the size of the block group descriptor structure is 64 bytes, a meta-block group contains 16 block groups for filesystems with a 1KB block size, and 64 block groups for filesystems with a 4KB blocksize. Filesystems can either be created using this new block group descriptor layout, or existing filesystems can be resized on-line, and the field s_first_meta_bg in the superblock will indicate the first block group using this new layout.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hKfhjhhubh)}(haPlease see an important note about ``BLOCK_UNINIT`` in the section about block and inode bitmaps.h](h#Please see an important note about }(hjhhhNhNubh)}(h``BLOCK_UNINIT``h]h BLOCK_UNINIT}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjubh. in the section about block and inode bitmaps.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhj hKthjhhubeh}(h]meta-block-groupsah ]h"]meta block groupsah$]h&]uh1hhhhhhj hKQubh)}(hhh](h)}(hLazy Block Group Initializationh]hLazy Block Group Initialization}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhj hKxubh)}(hXjA new feature for ext4 are three block group descriptor flags that enable mkfs to skip initializing other parts of the block group metadata. 
Specifically, the INODE_UNINIT and BLOCK_UNINIT flags mean that the inode and block bitmaps for that group can be calculated and therefore the on-disk bitmap blocks are not initialized. This is generally the case for an empty block group or a block group containing only fixed-location block group metadata. The INODE_ZEROED flag means that the inode table has been initialized; mkfs will unset this flag and rely on the kernel to initialize the inode tables in the background.h]hXjA new feature for ext4 are three block group descriptor flags that enable mkfs to skip initializing other parts of the block group metadata. Specifically, the INODE_UNINIT and BLOCK_UNINIT flags mean that the inode and block bitmaps for that group can be calculated and therefore the on-disk bitmap blocks are not initialized. This is generally the case for an empty block group or a block group containing only fixed-location block group metadata. The INODE_ZEROED flag means that the inode table has been initialized; mkfs will unset this flag and rely on the kernel to initialize the inode tables in the background.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hKzhjhhubh)}(hBy not writing zeroes to the bitmaps and inode table, mkfs time is reduced considerably. Note the feature flag is RO_COMPAT_GDT_CSUM, but the dumpe2fs output prints this as “uninit_bg”. They are the same thing.h]hBy not writing zeroes to the bitmaps and inode table, mkfs time is reduced considerably. Note the feature flag is RO_COMPAT_GDT_CSUM, but the dumpe2fs output prints this as “uninit_bg”. 
They are the same thing.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hKhjhhubh)}(h SPDX-License-Identifier: GPL-2.0h]h SPDX-License-Identifier: GPL-2.0}hjsbah}(h]h ]h"]h$]h&]hhuh1hhjhhh1Documentation/filesystems/ext4/special_inodes.rsthKubeh}(h]lazy-block-group-initializationah ]h"]lazy block group initializationah$]h&]uh1hhhhhhj hKxubh)}(hhh](h)}(hSpecial inodesh]hSpecial inodes}(hj-hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj*hhhj!hKubh)}(h:ext4 reserves some inode for special features, as follows:h]h:ext4 reserves some inode for special features, as follows:}(hj;hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hKhj*hhubjJ)}(hhh]jO)}(hhh](jT)}(hhh]h}(h]h ]h"]h$]h&]j^Kuh1jShjLubjT)}(hhh]h}(h]h ]h"]h$]h&]j^KFuh1jShjLubj)}(hhh]j)}(hhh](j)}(hhh]h)}(h inode Numberh]h inode Number}(hjjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hK hjgubah}(h]h ]h"]h$]h&]uh1jhjdubj)}(hhh]h)}(hPurposeh]hPurpose}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hK hj~ubah}(h]h ]h"]h$]h&]uh1jhjdubeh}(h]h ]h"]h$]h&]uh1jhjaubah}(h]h ]h"]h$]h&]uh1jhjLubj)}(hhh](j)}(hhh](j)}(hhh]h)}(h0h]h0}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h#Doesn't exist; there is no inode 0.h]h%Doesn’t exist; there is no inode 0.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hKhjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(h1h]h1}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(hList of defective blocks.h]hList of defective blocks.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hKhjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(h2h]h2}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(hRoot directory.h]hRoot directory.}(hj/hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hKhj,ubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(h3h]h3}(hjOhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hKhjLubah}(h]h ]h"]h$]h&]uh1jhjIubj)}(hhh]h)}(h User quota.h]h User quota.}(hjfhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hKhjcubah}(h]h ]h"]h$]h&]uh1jhjIubeh}(h]h 
]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(h4h]h4}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h Group quota.h]h Group quota.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hKhjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(h5h]h5}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h Boot loader.h]h Boot loader.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hKhjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(h6h]h6}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(hUndelete directory.h]hUndelete directory.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hKhjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(h7h]h7}(hj+hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hKhj(ubah}(h]h ]h"]h$]h&]uh1jhj%ubj)}(hhh]h)}(h6Reserved group descriptors inode. (“resize inode”)h]h6Reserved group descriptors inode. (“resize inode”)}(hjBhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hKhj?ubah}(h]h ]h"]h$]h&]uh1jhj%ubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(h8h]h8}(hjbhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hKhj_ubah}(h]h ]h"]h$]h&]uh1jhj\ubj)}(hhh]h)}(hJournal inode.h]hJournal inode.}(hjyhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hKhjvubah}(h]h ]h"]h$]h&]uh1jhj\ubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(h9h]h9}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hK hjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h)The “exclude” inode, for snapshots(?)h]h)The “exclude” inode, for snapshots(?)}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hK!hjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(h10h]h10}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hK"hjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h2Replica inode, used for some non-upstream feature?h]h2Replica inode, used for some non-upstream feature?}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hK#hjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(h11h]h11}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hK$hjubah}(h]h 
]h"]h$]h&]uh1jhjubj)}(hhh]h)}(hrTraditional first non-reserved inode. Usually this is the lost+found directory. See s_first_ino in the superblock.h]hrTraditional first non-reserved inode. Usually this is the lost+found directory. See s_first_ino in the superblock.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hK%hjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjLubeh}(h]h ]h"]h$]h&]colsKuh1jNhjIubah}(h]h ]jah"]h$]h&]uh1jIhj*hhhNhNubh)}(hNote that there are also some inodes allocated from non-reserved inode numbers for other filesystem features which are not referenced from standard directory hierarchy. These are generally reference from the superblock. They are:h]hNote that there are also some inodes allocated from non-reserved inode numbers for other filesystem features which are not referenced from standard directory hierarchy. These are generally reference from the superblock. They are:}(hjKhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hK'hj*hhubjJ)}(hhh]jO)}(hhh](jT)}(hhh]h}(h]h ]h"]h$]h&]j^Kuh1jShj\ubjT)}(hhh]h}(h]h ]h"]h$]h&]j^K2uh1jShj\ubj)}(hhh]j)}(hhh](j)}(hhh]h)}(hSuperblock fieldh]hSuperblock field}(hjzhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hK/hjwubah}(h]h ]h"]h$]h&]uh1jhjtubj)}(hhh]h)}(h Descriptionh]h Description}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hK0hjubah}(h]h ]h"]h$]h&]uh1jhjtubeh}(h]h ]h"]h$]h&]uh1jhjqubah}(h]h ]h"]h$]h&]uh1jhj\ubj)}(hhh](j)}(hhh](j)}(hhh]h)}(h s_lpf_inoh]h s_lpf_ino}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hK2hjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h%Inode number of lost+found directory.h]h%Inode number of lost+found directory.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hK3hjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(hs_prj_quota_inumh]hs_prj_quota_inum}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hK4hjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h2Inode number of quota file tracking project quotash]h2Inode number of quota file tracking project quotas}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hK5hjubah}(h]h 
]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(hs_orphan_file_inumh]hs_orphan_file_inum}(hj(hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hK6hj%ubah}(h]h ]h"]h$]h&]uh1jhj"ubj)}(hhh]h)}(h,Inode number of file tracking orphan inodes.h]h,Inode number of file tracking orphan inodes.}(hj?hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj!hK7hj<ubah}(h]h ]h"]h$]h&]uh1jhj"ubeh}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhj\ubeh}(h]h ]h"]h$]h&]colsKuh1jNhjYubah}(h]h ]jah"]h$]h&]uh1jIhj*hhhNhNubh)}(h SPDX-License-Identifier: GPL-2.0h]h SPDX-License-Identifier: GPL-2.0}hjlsbah}(h]h ]h"]h$]h&]hhuh1hhj*hhh-Documentation/filesystems/ext4/allocators.rsthKubeh}(h]special-inodesah ]h"]special inodesah$]h&]uh1hhhhhhj!hKubh)}(hhh](h)}(h!Block and Inode Allocation Policyh]h!Block and Inode Allocation Policy}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhjzhKubh)}(hXext4 recognizes (better than ext3, anyway) that data locality is generally a desirably quality of a filesystem. On a spinning disk, keeping related blocks near each other reduces the amount of movement that the head actuator and disk must perform to access a data block, thus speeding up disk IO. On an SSD there of course are no moving parts, but locality can increase the size of each transfer request while reducing the total number of requests. This locality may also have the effect of concentrating writes on a single erase block, which can speed up file rewrites significantly. Therefore, it is useful to reduce fragmentation whenever possible.h]hXext4 recognizes (better than ext3, anyway) that data locality is generally a desirably quality of a filesystem. On a spinning disk, keeping related blocks near each other reduces the amount of movement that the head actuator and disk must perform to access a data block, thus speeding up disk IO. On an SSD there of course are no moving parts, but locality can increase the size of each transfer request while reducing the total number of requests. 
This locality may also have the effect of concentrating writes on a single erase block, which can speed up file rewrites significantly. Therefore, it is useful to reduce fragmentation whenever possible.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjzhKhjhhubh)}(hXThe first tool that ext4 uses to combat fragmentation is the multi-block allocator. When a file is first created, the block allocator speculatively allocates 8KiB of disk space to the file on the assumption that the space will get written soon. When the file is closed, the unused speculative allocations are of course freed, but if the speculation is correct (typically the case for full writes of small files) then the file data gets written out in a single multi-block extent. A second related trick that ext4 uses is delayed allocation. Under this scheme, when a file needs more blocks to absorb file writes, the filesystem defers deciding the exact placement on the disk until all the dirty buffers are being written out to disk. By not committing to a particular placement until it's absolutely necessary (the commit timeout is hit, or sync() is called, or the kernel runs out of memory), the hope is that the filesystem can make better location decisions.h]hXThe first tool that ext4 uses to combat fragmentation is the multi-block allocator. When a file is first created, the block allocator speculatively allocates 8KiB of disk space to the file on the assumption that the space will get written soon. When the file is closed, the unused speculative allocations are of course freed, but if the speculation is correct (typically the case for full writes of small files) then the file data gets written out in a single multi-block extent. A second related trick that ext4 uses is delayed allocation. Under this scheme, when a file needs more blocks to absorb file writes, the filesystem defers deciding the exact placement on the disk until all the dirty buffers are being written out to disk. 
By not committing to a particular placement until it’s absolutely necessary (the commit timeout is hit, or sync() is called, or the kernel runs out of memory), the hope is that the filesystem can make better location decisions.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjzhKhjhhubh)}(hXFThe third trick that ext4 (and ext3) uses is that it tries to keep a file's data blocks in the same block group as its inode. This cuts down on the seek penalty when the filesystem first has to read a file's inode to learn where the file's data blocks live and then seek over to the file's data blocks to begin I/O operations.h]hXNThe third trick that ext4 (and ext3) uses is that it tries to keep a file’s data blocks in the same block group as its inode. This cuts down on the seek penalty when the filesystem first has to read a file’s inode to learn where the file’s data blocks live and then seek over to the file’s data blocks to begin I/O operations.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjzhK hjhhubh)}(hXThe fourth trick is that all the inodes in a directory are placed in the same block group as the directory, when feasible. The working assumption here is that all the files in a directory might be related, therefore it is useful to try to keep them all together.h]hXThe fourth trick is that all the inodes in a directory are placed in the same block group as the directory, when feasible. The working assumption here is that all the files in a directory might be related, therefore it is useful to try to keep them all together.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjzhK&hjhhubh)}(hXThe fifth trick is that the disk volume is cut up into 128MB block groups; these mini-containers are used as outlined above to try to maintain data locality. However, there is a deliberate quirk -- when a directory is created in the root directory, the inode allocator scans the block groups and puts that directory into the least heavily loaded block group that it can find. 
This encourages directories to spread out over a disk; as the top-level directory/file blobs fill up one block group, the allocators simply move on to the next block group. Allegedly this scheme evens out the loading on the block groups, though the author suspects that the directories which are so unlucky as to land towards the end of a spinning drive get a raw deal performance-wise.h]hXThe fifth trick is that the disk volume is cut up into 128MB block groups; these mini-containers are used as outlined above to try to maintain data locality. However, there is a deliberate quirk -- when a directory is created in the root directory, the inode allocator scans the block groups and puts that directory into the least heavily loaded block group that it can find. This encourages directories to spread out over a disk; as the top-level directory/file blobs fill up one block group, the allocators simply move on to the next block group. Allegedly this scheme evens out the loading on the block groups, though the author suspects that the directories which are so unlucky as to land towards the end of a spinning drive get a raw deal performance-wise.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjzhK+hjhhubh)}(h[Of course if all of these mechanisms fail, one can always use e4defrag to defragment files.h]h[Of course if all of these mechanisms fail, one can always use e4defrag to defragment files.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjzhK7hjhhubh)}(h SPDX-License-Identifier: GPL-2.0h]h SPDX-License-Identifier: GPL-2.0}hjsbah}(h]h ]h"]h$]h&]hhuh1hhjhhh,Documentation/filesystems/ext4/checksums.rsthKubeh}(h]!block-and-inode-allocation-policyah ]h"]!block and inode allocation policyah$]h&]uh1hhhhhhjzhKubh)}(hhh](h)}(h Checksumsh]h Checksums}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhjhKubh)}(hXStarting in early 2012, metadata checksums were added to all major ext4 and jbd2 data structures. The associated feature flag is metadata_csum. 
The desired checksum algorithm is indicated in the superblock, though as of October 2012 the only supported algorithm is crc32c. Some data structures did not have space to fit a full 32-bit checksum, so only the lower 16 bits are stored. Enabling the 64bit feature increases the data structure size so that full 32-bit checksums can be stored for many data structures. However, existing 32-bit filesystems cannot be extended to enable 64bit mode, at least not without the experimental resize2fs patches to do so.~h]hXStarting in early 2012, metadata checksums were added to all major ext4 and jbd2 data structures. The associated feature flag is metadata_csum. The desired checksum algorithm is indicated in the superblock, though as of October 2012 the only supported algorithm is crc32c. Some data structures did not have space to fit a full 32-bit checksum, so only the lower 16 bits are stored. Enabling the 64bit feature increases the data structure size so that full 32-bit checksums can be stored for many data structures. However, existing 32-bit filesystems cannot be extended to enable 64bit mode, at least not without the experimental resize2fs patches to do so.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhjhhubh)}(hXExisting filesystems can have checksumming added by running ``tune2fs -O metadata_csum`` against the underlying device. If tune2fs encounters directory blocks that lack sufficient empty space to add a checksum, it will request that you run ``e2fsck -D`` to have the directories rebuilt with checksums. This has the added benefit of removing slack space from the directory files and rebalancing the htree indexes. 
If you _ignore_ this step, your directories will not be protected by a checksum!h](hhjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(hInodesh]hInodes}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhK@hjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h__le32h]h__le32}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKAhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h)}(h~UUID + inode number + inode generation + the entire inode. The checksum field is set to zero. Each inode has its own checksum.h]h~UUID + inode number + inode generation + the entire inode. The checksum field is set to zero. Each inode has its own checksum.}(hj/hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKBhj,ubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(hGroup Descriptorsh]hGroup Descriptors}(hjOhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKDhjLubah}(h]h ]h"]h$]h&]uh1jhjIubj)}(hhh]h)}(h__le16h]h__le16}(hjfhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKEhjcubah}(h]h ]h"]h$]h&]uh1jhjIubj)}(hhh]h)}(hIf metadata_csum, then UUID + group number + the entire descriptor; else if gdt_csum, then crc16(UUID + group number + the entire descriptor). In all cases, only the lower 16 bits are stored.h]hIf metadata_csum, then UUID + group number + the entire descriptor; else if gdt_csum, then crc16(UUID + group number + the entire descriptor). In all cases, only the lower 16 bits are stored.}(hj}hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKFhjzubah}(h]h ]h"]h$]h&]uh1jhjIubeh}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjaubeh}(h]h ]h"]h$]h&]colsKuh1jNhj^ubah}(h]h ]jah"]h$]h&]uh1jIhjhhhNhNubh)}(h SPDX-License-Identifier: GPL-2.0h]h SPDX-License-Identifier: GPL-2.0}hjsbah}(h]h ]h"]h$]h&]hhuh1hhjhhh+Documentation/filesystems/ext4/bigalloc.rsthKubeh}(h] checksumsah ]h"] checksumsah$]h&]uh1hhhhhhjhKubh)}(hhh](h)}(hBigalloch]hBigalloc}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhjhKubh)}(hXAt the moment, the default size of a block is 4KiB, which is a commonly supported page size on most MMU-capable hardware. 
This is fortunate, as ext4 code is not prepared to handle the case where the block size exceeds the page size. However, for a filesystem of mostly huge files, it is desirable to be able to allocate disk blocks in units of multiple blocks to reduce both fragmentation and metadata overhead. The bigalloc feature provides exactly this ability.h]hXAt the moment, the default size of a block is 4KiB, which is a commonly supported page size on most MMU-capable hardware. This is fortunate, as ext4 code is not prepared to handle the case where the block size exceeds the page size. However, for a filesystem of mostly huge files, it is desirable to be able to allocate disk blocks in units of multiple blocks to reduce both fragmentation and metadata overhead. The bigalloc feature provides exactly this ability.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhjhhubh)}(hXThe bigalloc feature (EXT4_FEATURE_RO_COMPAT_BIGALLOC) changes ext4 to use clustered allocation, so that each bit in the ext4 block allocation bitmap addresses a power of two number of blocks. For example, if the file system is mainly going to be storing large files in the 4-32 megabyte range, it might make sense to set a cluster size of 1 megabyte. This means that each bit in the block allocation bitmap now addresses 256 4k blocks. This shrinks the total size of the block allocation bitmaps for a 2T file system from 64 megabytes to 256 kilobytes. It also means that a block group addresses 32 gigabytes instead of 128 megabytes, also shrinking the amount of file system overhead for metadata.h]hXThe bigalloc feature (EXT4_FEATURE_RO_COMPAT_BIGALLOC) changes ext4 to use clustered allocation, so that each bit in the ext4 block allocation bitmap addresses a power of two number of blocks. For example, if the file system is mainly going to be storing large files in the 4-32 megabyte range, it might make sense to set a cluster size of 1 megabyte. This means that each bit in the block allocation bitmap now addresses 256 4k blocks. 
This shrinks the total size of the block allocation bitmaps for a 2T file system from 64 megabytes to 256 kilobytes. It also means that a block group addresses 32 gigabytes instead of 128 megabytes, also shrinking the amount of file system overhead for metadata.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhjhhubh)}(hXqThe administrator can set a block cluster size at mkfs time (which is stored in the s_log_cluster_size field in the superblock); from then on, the block bitmaps track clusters, not individual blocks. This means that block groups can be several gigabytes in size (instead of just 128MiB); however, the minimum allocation unit becomes a cluster, not a block, even for directories. TaoBao had a patchset to extend the “use units of clusters instead of blocks” to the extent tree, though it is not clear where those patches went-- they eventually morphed into “extent tree v2” but that code has not landed as of May 2015.h]hXqThe administrator can set a block cluster size at mkfs time (which is stored in the s_log_cluster_size field in the superblock); from then on, the block bitmaps track clusters, not individual blocks. This means that block groups can be several gigabytes in size (instead of just 128MiB); however, the minimum allocation unit becomes a cluster, not a block, even for directories. 
TaoBao had a patchset to extend the “use units of clusters instead of blocks” to the extent tree, though it is not clear where those patches went-- they eventually morphed into “extent tree v2” but that code has not landed as of May 2015.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhKhjhhubh)}(h SPDX-License-Identifier: GPL-2.0h]h SPDX-License-Identifier: GPL-2.0}hjsbah}(h]h ]h"]h$]h&]hhuh1hhjhhh-Documentation/filesystems/ext4/inlinedata.rsthKubeh}(h]bigallocah ]h"]bigallocah$]h&]uh1hhhhhhjhKubh)}(hhh](h)}(h Inline Datah]h Inline Data}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhj hKubh)}(hXThe inline data feature was designed to handle the case that a file's data is so tiny that it readily fits inside the inode, which (theoretically) reduces disk block consumption and reduces seeks. If the file is smaller than 60 bytes, then the data are stored inline in ``inode.i_block``. If the rest of the file would fit inside the extended attribute space, then it might be found as an extended attribute “system.data” within the inode body (“ibody EA”). This of course constrains the amount of extended attributes one can attach to an inode. If the data size increases beyond i_block + ibody EA, a regular block is allocated and the contents moved to that block.h](hXThe inline data feature was designed to handle the case that a file’s data is so tiny that it readily fits inside the inode, which (theoretically) reduces disk block consumption and reduces seeks. If the file is smaller than 60 bytes, then the data are stored inline in }(hj$hhhNhNubh)}(h``inode.i_block``h]h inode.i_block}(hj,hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj$ubhX. If the rest of the file would fit inside the extended attribute space, then it might be found as an extended attribute “system.data” within the inode body (“ibody EA”). This of course constrains the amount of extended attributes one can attach to an inode. 
If the data size increases beyond i_block + ibody EA, a regular block is allocated and the contents moved to that block.}(hj$hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhj hKhjhhubh)}(hXPending a change to compact the extended attribute key used to store inline data, one ought to be able to store 160 bytes of data in a 256-byte inode (as of June 2015, when i_extra_isize is 28). Prior to that, the limit was 156 bytes due to inefficient use of inode space.h]hXPending a change to compact the extended attribute key used to store inline data, one ought to be able to store 160 bytes of data in a 256-byte inode (as of June 2015, when i_extra_isize is 28). Prior to that, the limit was 156 bytes due to inefficient use of inode space.}(hjDhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hKhjhhubh)}(hThe inline data feature requires the presence of an extended attribute for “system.data”, even if the attribute value is zero length.h]hThe inline data feature requires the presence of an extended attribute for “system.data”, even if the attribute value is zero length.}(hjRhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hKhjhhubh)}(hhh](h)}(hInline Directoriesh]hInline Directories}(hjchhhNhNubah}(h]h ]h"]h$]h&]uh1hhj`hhhj hKubh)}(hXThe first four bytes of i_block are the inode number of the parent directory. Following that is a 56-byte space for an array of directory entries; see ``struct ext4_dir_entry``. If there is a “system.data” attribute in the inode body, the EA value is an array of ``struct ext4_dir_entry`` as well. Note that for inline directories, the i_block and EA space are treated as separate dirent blocks; directory entries cannot span the two.h](hThe first four bytes of i_block are the inode number of the parent directory. Following that is a 56-byte space for an array of directory entries; see }(hjqhhhNhNubh)}(h``struct ext4_dir_entry``h]hstruct ext4_dir_entry}(hjyhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjqubh[. 
If there is a “system.data” attribute in the inode body, the EA value is an array of }(hjqhhhNhNubh)}(h``struct ext4_dir_entry``h]hstruct ext4_dir_entry}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjqubh as well. Note that for inline directories, the i_block and EA space are treated as separate dirent blocks; directory entries cannot span the two.}(hjqhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhj hKhj`hhubh)}(hlInline directory entries are not checksummed, as the inode checksum should protect all inline data contents.h]hlInline directory entries are not checksummed, as the inode checksum should protect all inline data contents.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hK$hj`hhubh)}(h SPDX-License-Identifier: GPL-2.0h]h SPDX-License-Identifier: GPL-2.0}hjsbah}(h]h ]h"]h$]h&]hhuh1hhj`hhh*Documentation/filesystems/ext4/eainode.rsthKubeh}(h]inline-directoriesah ]h"]inline directoriesah$]h&]uh1hhjhhhj hKubeh}(h] inline-dataah ]h"] inline dataah$]h&]uh1hhhhhhj hKubh)}(hhh](h)}(hLarge Extended Attribute Valuesh]hLarge Extended Attribute Values}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhjhKubh)}(hXeTo enable ext4 to store extended attribute values that do not fit in the inode or in the single extended attribute block attached to an inode, the EA_INODE feature allows us to store the value in the data blocks of a regular file inode. This “EA inode” is linked only from the extended attribute name index and must not appear in a directory entry. The inode's i_atime field is used to store a checksum of the xattr value; and i_ctime/i_version store a 64-bit reference count, which enables sharing of large xattr values between multiple owning inodes. 
For backward compatibility with older versions of this feature, the i_mtime/i_generation *may* store a back-reference to the inode number and i_generation of the **one** owning inode (in cases where the EA inode is not referenced by multiple inodes) to verify that the EA inode is the correct one being accessed.h](hXTo enable ext4 to store extended attribute values that do not fit in the inode or in the single extended attribute block attached to an inode, the EA_INODE feature allows us to store the value in the data blocks of a regular file inode. This “EA inode” is linked only from the extended attribute name index and must not appear in a directory entry. The inode’s i_atime field is used to store a checksum of the xattr value; and i_ctime/i_version store a 64-bit reference count, which enables sharing of large xattr values between multiple owning inodes. For backward compatibility with older versions of this feature, the i_mtime/i_generation }(hjhhhNhNubhemphasis)}(h*may*h]hmay}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubhD store a back-reference to the inode number and i_generation of the }(hjhhhNhNubhstrong)}(h**one**h]hone}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh owning inode (in cases where the EA inode is not referenced by multiple inodes) to verify that the EA inode is the correct one being accessed.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhjhKhjhhubh)}(h SPDX-License-Identifier: GPL-2.0h]h SPDX-License-Identifier: GPL-2.0}hjsbah}(h]h ]h"]h$]h&]hhuh1hhjhhh)Documentation/filesystems/ext4/verity.rsthKubeh}(h]large-extended-attribute-valuesah ]h"]large extended attribute valuesah$]h&]uh1hhhhhhjhKubh)}(hhh](h)}(h Verity filesh]h Verity files}(hj1hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj.hhhj%hKubh)}(hXext4 supports fs-verity, which is a filesystem feature that provides Merkle tree based hashing for individual readonly files. Most of fs-verity is common to all filesystems that support it; see :ref:`Documentation/filesystems/fsverity.rst ` for the fs-verity documentation. 
However, the on-disk layout of the verity metadata is filesystem-specific. On ext4, the verity metadata is stored after the end of the file data itself, in the following format:h](hext4 supports fs-verity, which is a filesystem feature that provides Merkle tree based hashing for individual readonly files. Most of fs-verity is common to all filesystems that support it; see }(hj?hhhNhNubh)}(h8:ref:`Documentation/filesystems/fsverity.rst `h]hinline)}(hjIh]h&Documentation/filesystems/fsverity.rst}(hjMhhhNhNubah}(h]h ](xrefstdstd-refeh"]h$]h&]uh1jKhjGubah}(h]h ]h"]h$]h&]refdocfilesystems/ext4/overview refdomainjXreftyperef refexplicitrefwarn reftargetfsverityuh1hhj%hKhj?ubh for the fs-verity documentation. However, the on-disk layout of the verity metadata is filesystem-specific. On ext4, the verity metadata is stored after the end of the file data itself, in the following format:}(hj?hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhj%hKhj.hhubh bullet_list)}(hhh](h list_item)}(h{Zero-padding to the next 65536-byte boundary. This padding need not actually be allocated on-disk, i.e. it may be a hole. h]h)}(hzZero-padding to the next 65536-byte boundary. This padding need not actually be allocated on-disk, i.e. it may be a hole.h]hzZero-padding to the next 65536-byte boundary. This padding need not actually be allocated on-disk, i.e. it may be a hole.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj%hKhj}ubah}(h]h ]h"]h$]h&]uh1j{hjxhhhj%hNubj|)}(hThe Merkle tree, as documented in :ref:`Documentation/filesystems/fsverity.rst `, with the tree levels stored in order from root to leaf, and the tree blocks within each level stored in their natural order. 
h]h)}(hThe Merkle tree, as documented in :ref:`Documentation/filesystems/fsverity.rst `, with the tree levels stored in order from root to leaf, and the tree blocks within each level stored in their natural order.h](h"The Merkle tree, as documented in }(hjhhhNhNubh)}(hD:ref:`Documentation/filesystems/fsverity.rst `h]jL)}(hjh]h&Documentation/filesystems/fsverity.rst}(hjhhhNhNubah}(h]h ](jWstdstd-refeh"]h$]h&]uh1jKhjubah}(h]h ]h"]h$]h&]refdocjd refdomainjreftyperef refexplicitrefwarnjjfsverity_merkle_treeuh1hhj%hKhjubh~, with the tree levels stored in order from root to leaf, and the tree blocks within each level stored in their natural order.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhj%hKhjubah}(h]h ]h"]h$]h&]uh1j{hjxhhhj%hNubj|)}(h4Zero-padding to the next filesystem block boundary. h]h)}(h3Zero-padding to the next filesystem block boundary.h]h3Zero-padding to the next filesystem block boundary.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj%hKhjubah}(h]h ]h"]h$]h&]uh1j{hjxhhhj%hNubj|)}(hThe verity descriptor, as documented in :ref:`Documentation/filesystems/fsverity.rst `, with optionally appended signature blob. h]h)}(hThe verity descriptor, as documented in :ref:`Documentation/filesystems/fsverity.rst `, with optionally appended signature blob.h](h(The verity descriptor, as documented in }(hjhhhNhNubh)}(hC:ref:`Documentation/filesystems/fsverity.rst `h]jL)}(hjh]h&Documentation/filesystems/fsverity.rst}(hjhhhNhNubah}(h]h ](jWstdstd-refeh"]h$]h&]uh1jKhjubah}(h]h ]h"]h$]h&]refdocjd refdomainjreftyperef refexplicitrefwarnjjfsverity_descriptoruh1hhj%hKhjubh*, with optionally appended signature blob.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhj%hKhjubah}(h]h ]h"]h$]h&]uh1j{hjxhhhj%hNubj|)}(hTZero-padding to the next offset that is 4 bytes before a filesystem block boundary. 
h]h)}(hSZero-padding to the next offset that is 4 bytes before a filesystem block boundary.h]hSZero-padding to the next offset that is 4 bytes before a filesystem block boundary.}(hj)hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj%hKhj%ubah}(h]h ]h"]h$]h&]uh1j{hjxhhhj%hNubj|)}(hOThe size of the verity descriptor in bytes, as a 4-byte little endian integer. h]h)}(hNThe size of the verity descriptor in bytes, as a 4-byte little endian integer.h]hNThe size of the verity descriptor in bytes, as a 4-byte little endian integer.}(hjAhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj%hK hj=ubah}(h]h ]h"]h$]h&]uh1j{hjxhhhj%hNubeh}(h]h ]h"]h$]h&]bullet-uh1jvhj%hKhj.hhubh)}(hVerity inodes have EXT4_VERITY_FL set, and they must use extents, i.e. EXT4_EXTENTS_FL must be set and EXT4_INLINE_DATA_FL must be clear. They can have EXT4_ENCRYPT_FL set, in which case the verity metadata is encrypted as well as the data itself.h]hVerity inodes have EXT4_VERITY_FL set, and they must use extents, i.e. EXT4_EXTENTS_FL must be set and EXT4_INLINE_DATA_FL must be clear. 
They can have EXT4_ENCRYPT_FL set, in which case the verity metadata is encrypted as well as the data itself.}(hj]hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj%hK#hj.hhubh)}(hNVerity files cannot have blocks allocated past the end of the verity metadata.h]hNVerity files cannot have blocks allocated past the end of the verity metadata.}(hjkhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj%hK(hj.hhubh)}(h^Verity and DAX are not compatible and attempts to set both of these flags on a file will fail.h]h^Verity and DAX are not compatible and attempts to set both of these flags on a file will fail.}(hjyhhhNhNubah}(h]h ]h"]h$]h&]uh1hhj%hK+hj.hhubeh}(h] verity-filesah ]h"] verity filesah$]h&]uh1hhhhhhj%hKubeh}(h]high-level-designah ]h"]high level designah$]h&]uh1hhhhhhhhKubeh}(h]h ]h"]h$]h&]sourcehuh1hcurrent_sourceN current_lineNsettingsdocutils.frontendValues)}(hN generatorN datestampN source_linkN source_urlN toc_backlinksjfootnote_backlinksK sectnum_xformKstrip_commentsNstrip_elements_with_classesN strip_classesN report_levelK halt_levelKexit_status_levelKdebugNwarning_streamN tracebackinput_encoding utf-8-siginput_encoding_error_handlerstrictoutput_encodingutf-8output_encoding_error_handlerjerror_encodingutf-8error_encoding_error_handlerbackslashreplace language_codeenrecord_dependenciesNconfigN id_prefixhauto_id_prefixid dump_settingsNdump_internalsNdump_transformsNdump_pseudo_xmlNexpose_internalsNstrict_visitorN_disable_configN_sourceh _destinationN _config_files]7/var/lib/git/docbuild/linux/Documentation/docutils.confafile_insertion_enabled raw_enabledKline_length_limitM'pep_referencesN pep_base_urlhttps://peps.python.org/pep_file_url_templatepep-%04drfc_referencesN rfc_base_url&https://datatracker.ietf.org/doc/html/ tab_widthKtrim_footnote_reference_spacesyntax_highlightlong smart_quotessmartquotes_locales]character_level_inline_markupdoctitle_xform docinfo_xformKsectsubtitle_xform 
image_loadinglinkembed_stylesheetcloak_email_addressessection_self_linkenvNubreporterNindirect_targets]substitution_defs}substitution_names}refnames}refids}nameids}(jjj j jGjDjj}jjj'j$jj}jjjjjj jjjjj+j(jju nametypes}(jj jGjjj'jjjjjjj+juh}(jhj jjDj j}jJjjj$jj}j*jjjjj jjjjj`j(jjj.u footnote_refs} citation_refs} autofootnotes]autofootnote_refs]symbol_footnotes]symbol_footnote_refs] footnotes] citations]autofootnote_startKsymbol_footnote_startK id_counter collectionsCounter}Rparse_messages]transform_messages] transformerN include_log]+Documentation/filesystems/ext4/overview.rst(NNNNta decorationNhhub.
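The verity metadata layout listed in the "Verity files" section can be sketched as offset arithmetic. This is a minimal illustration, not the kernel's implementation: the function and key names (`verity_layout`, `tree_pos`, and so on) are hypothetical, and it assumes that "padding to the next boundary" means rounding up, with no padding added when the offset is already aligned.

```python
import struct

VERITY_ALIGN = 65536  # the 65536-byte boundary named in the layout list


def round_up(value, boundary):
    """Round value up to a multiple of boundary (unchanged if already aligned)."""
    return (value + boundary - 1) // boundary * boundary


def verity_layout(file_size, tree_size, desc_size, block_size=4096):
    """Return byte offsets of each verity metadata item (hypothetical helper).

    file_size: size of the file data in bytes
    tree_size: size of the Merkle tree in bytes
    desc_size: size of the verity descriptor plus any appended signature
    """
    # Item 1: zero-padding from end of data to the next 65536-byte boundary.
    tree_pos = round_up(file_size, VERITY_ALIGN)
    # Items 2-3: Merkle tree, then padding to the next filesystem block boundary.
    desc_pos = round_up(tree_pos + tree_size, block_size)
    # Item 4: verity descriptor (with optional signature blob).
    desc_end = desc_pos + desc_size
    # Item 5: padding so the size field ends exactly on a block boundary.
    size_pos = round_up(desc_end + 4, block_size) - 4
    return {
        "tree_pos": tree_pos,
        "desc_pos": desc_pos,
        "desc_size_pos": size_pos,
        "metadata_end": size_pos + 4,
        # Item 6: descriptor size as a 4-byte little-endian integer.
        "desc_size_field": struct.pack("<I", desc_size),
    }


layout = verity_layout(file_size=100000, tree_size=8192, desc_size=256)
print(layout)
```

With these example inputs and a 4096-byte block size, the Merkle tree starts at offset 131072 and the 4-byte size field occupies the last four bytes of the final metadata block, so a reader can locate the descriptor by reading that field first and working backwards.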