.. SPDX-License-Identifier: GPL-2.0
.. _atomic_writes:

Atomic Block Writes
-------------------

Introduction
~~~~~~~~~~~~

Atomic (untorn) block writes ensure that either the entire write is committed
to disk or none of it is. This prevents "torn writes" during power loss or
system crashes. The ext4 filesystem supports atomic writes (only with Direct
I/O) on regular files with extents, provided the underlying storage device
supports hardware atomic writes. This is supported in the following two ways:

1. **Single-fsblock Atomic Writes**:
   EXT4 has supported atomic write operations on a single filesystem block
   since v6.13. In this mode the atomic write unit minimum and maximum sizes
   are both set to the filesystem block size. For example, an atomic write of
   16KB with a 16KB filesystem block size is possible on a system with a 64KB
   page size.

2. **Multi-fsblock Atomic Writes with Bigalloc**:
   EXT4 also supports atomic writes spanning multiple filesystem blocks using
   a feature known as bigalloc. The atomic write unit's minimum and maximum
   sizes are determined by the filesystem block size and cluster size, based
   on the underlying device's supported atomic write unit limits.

Requirements
~~~~~~~~~~~~

Basic requirements for atomic writes in ext4:

1. The extents feature must be enabled (default for ext4)
2. The underlying block device must support atomic writes
3. For single-fsblock atomic writes:

   1. A filesystem with appropriate block size (up to the page size)

4. For multi-fsblock atomic writes:

   1. The bigalloc feature must be enabled
   2. The cluster size must be appropriately configured

NOTE: EXT4 does not support software or COW based atomic writes, which means
atomic writes on ext4 are only supported if the underlying storage device
supports them.

Multi-fsblock Implementation Details
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The bigalloc feature changes ext4 to allocate in units of multiple filesystem
blocks, also known as clusters. With bigalloc, each bit in the block bitmap
represents a cluster (a power-of-2 number of blocks) rather than an individual
filesystem block.
EXT4 supports multi-fsblock atomic writes with bigalloc, subject to the
following constraints. The minimum atomic write size is the larger of the fs
block size and the minimum hardware atomic write unit; the maximum atomic
write size is the smaller of the bigalloc cluster size and the maximum
hardware atomic write unit. Bigalloc ensures that all allocations are aligned
to the cluster size, which satisfies the LBA alignment requirements of the
hardware device if the start of the partition/logical volume is itself aligned
correctly.

Here is the block allocation strategy in bigalloc for atomic writes:

* For regions with fully mapped extents, no additional work is needed
* For append writes, a new mapped extent is allocated
* For regions that are entirely holes, an unwritten extent is created
* For large unwritten extents, the extent is split into two unwritten
  extents of the requested size
* For mixed mapping regions (combinations of holes, unwritten extents, or
  mapped extents), ext4_map_blocks() is called in a loop with the
  EXT4_GET_BLOCKS_ZERO flag to convert the region into a single contiguous
  mapped extent by writing zeroes to it and converting any unwritten extents
  found within the range to written.

Note: Writing to a single contiguous underlying extent, whether mapped or
unwritten, is not inherently problematic. However, writing to a mixed mapping
region (i.e. one containing a combination of mapped and unwritten extents)
must be avoided when performing atomic writes.

The reason is that an atomic write issued via pwritev2() with the RWF_ATOMIC
flag requires that either all data is written or none at all. In the event of
a system crash or unexpected power loss during the write operation, the
affected region (when later read) must reflect either the complete old data or
the complete new data, but never a mix of both.

To enforce this guarantee, we ensure that the write target is backed by
a single, contiguous extent before any data is written. This is critical
because ext4 defers the conversion of unwritten extents to written extents
until the I/O completion path (typically in ->end_io()). If a write is allowed
to proceed over a mixed mapping region (with mapped and unwritten extents) and
a failure occurs mid-write, the system could observe partially updated regions
after reboot, i.e. new data over mapped areas, and stale (old) data over
unwritten extents that were never marked written. This violates the atomicity
and torn write prevention guarantee.

To prevent such torn writes, ext4 proactively allocates a single contiguous
extent for the entire requested region in ``ext4_iomap_alloc`` via
``ext4_map_blocks_atomic()``. EXT4 also force-commits the current journalling
transaction if the allocation is done over a mixed mapping. This ensures that
any pending metadata updates (such as unwritten to written extent conversion)
in this range are consistent with the file data blocks before the actual write
I/O is performed. If the commit fails, the whole I/O must be aborted to
prevent any possible torn writes.
Only after this step is the actual data write operation performed by iomap.

Handling Split Extents Across Leaf Blocks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is a special edge case where logically and physically contiguous
extents are stored in separate leaf nodes of the on-disk extent tree. This
occurs because on-disk extent tree merges only happen within a leaf block,
except for the case of a 2-level tree, which can be merged and collapsed
entirely into the inode.
If such a layout exists and, in the worst case, the extent status cache
entries are reclaimed due to memory pressure, ``ext4_map_blocks()`` may never
return a single contiguous extent for these split leaf extents.

To address this edge case, a new get block flag,
``EXT4_GET_BLOCKS_QUERY_LEAF_BLOCKS``, is added to enhance the
``ext4_map_query_blocks()`` lookup behavior.

This new get block flag allows ``ext4_map_blocks()`` to first check if there
is an entry in the extent status cache for the full range.
If not present, it consults the on-disk extent tree using
``ext4_map_query_blocks()``.
If the located extent is at the end of a leaf node, it probes the next logical
block (lblk) to detect a contiguous extent in the adjacent leaf.

For now only one additional leaf block is queried to maintain efficiency, as
atomic writes are typically constrained to small sizes
(e.g. [blocksize, clustersize]).

Handling Journal transactions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To support multi-fsblock atomic writes, we ensure enough journal credits are
reserved during:

1. Block allocation time in ``ext4_iomap_alloc()``. We first query if there
   could be a mixed mapping for the underlying requested range. If yes, then
   we reserve credits of up to ``m_len``, assuming every alternate block can
   be an unwritten extent followed by a hole.

2. The ``->end_io()`` call, where we make sure a single transaction is started
   for the unwritten-to-written conversion. The conversion loop is mainly
   required to handle a split extent across leaf blocks.

How to
------

Creating Filesystems with Atomic Write Support
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

First check the atomic write units supported by the block device.
See :ref:`atomic_write_bdev_support` for more details.

For single-fsblock atomic writes with a larger block size
(on systems with block size < page size):

.. code-block:: bash

    # Create an ext4 filesystem with a 16KB block size
    # (requires page size >= 16KB)
    mkfs.ext4 -b 16384 /dev/device

For multi-fsblock atomic writes with bigalloc:

.. code-block:: bash

    # Create an ext4 filesystem with bigalloc and 64KB cluster size
    mkfs.ext4 -F -O bigalloc -b 4096 -C 65536 /dev/device

Where ``-b`` specifies the block size, ``-C`` specifies the cluster size in
bytes, and ``-O bigalloc`` enables the bigalloc feature.

Application Interface
~~~~~~~~~~~~~~~~~~~~~

Applications can use the ``pwritev2()`` system call with the ``RWF_ATOMIC``
flag to perform atomic writes:

.. code-block:: c

    pwritev2(fd, iov, iovcnt, offset, RWF_ATOMIC);

The write must be aligned to the filesystem's block size and must not exceed
the filesystem's maximum atomic write unit size.
See ``generic_atomic_write_valid()`` for more details.

The ``statx()`` system call with the ``STATX_WRITE_ATOMIC`` flag provides the
following details:

* ``stx_atomic_write_unit_min``: Minimum size of an atomic write request.
* ``stx_atomic_write_unit_max``: Maximum size of an atomic write request.
* ``stx_atomic_write_segments_max``: Upper limit for segments. The number of
  separate memory buffers that can be gathered into a write operation
  (e.g., the iovcnt parameter for IOV_ITER). Currently, this is always set
  to one.

The ``STATX_ATTR_WRITE_ATOMIC`` flag in ``statx->attributes`` is set if atomic
writes are supported.

.. _atomic_write_bdev_support:

Hardware Support
----------------

The underlying storage device must support atomic write operations.
Modern NVMe and SCSI devices often provide this capability.
The Linux kernel exposes this information through sysfs:

* ``/sys/block//queue/atomic_write_unit_min`` - Minimum atomic write size
* ``/sys/block//queue/atomic_write_unit_max`` - Maximum atomic write size

Nonzero values for these attributes indicate that the device supports
atomic writes.

See Also
--------

* :doc:`bigalloc` - Documentation on the bigalloc feature
* :doc:`allocators` - Documentation on block allocation in ext4
* Support for atomic block writes in 6.13:
  https://lwn.net/Articles/1009298/
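
Example: Estimating Atomic Write Limits
---------------------------------------

The sizing rule from the multi-fsblock implementation details (minimum is the
larger of the fs block size and the hardware minimum; maximum is the smaller
of the cluster size and the hardware maximum) can be sketched in userspace C
as below. This is only an illustration, not kernel code: ``struct awu_limits``,
``awu_compute()`` and ``awu_request_ok()`` are hypothetical names introduced
here. The power-of-two and natural-alignment checks are an assumption about
what the kernel's generic validity check enforces; the values reported by
``statx()`` remain the authoritative source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type for this sketch; not a kernel structure. */
struct awu_limits {
    uint32_t unit_min; /* 0 means atomic writes are not possible */
    uint32_t unit_max;
};

static bool is_pow2(uint64_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Apply the documented rule:
 *   min = larger of (fs block size, hardware minimum unit)
 *   max = smaller of (cluster size, hardware maximum unit)
 * Without bigalloc, pass cluster == fs_block (single-fsblock case).
 * All sizes are in bytes.
 */
static struct awu_limits awu_compute(uint32_t fs_block, uint32_t cluster,
                                     uint32_t hw_min, uint32_t hw_max)
{
    struct awu_limits lim = { 0, 0 };

    /* Zero sysfs values mean the device has no atomic write support. */
    if (hw_min == 0 || hw_max == 0)
        return lim;

    uint32_t lo = fs_block > hw_min ? fs_block : hw_min;
    uint32_t hi = cluster < hw_max ? cluster : hw_max;

    /* The ranges do not overlap, so no atomic write size is usable. */
    if (lo > hi)
        return lim;

    lim.unit_min = lo;
    lim.unit_max = hi;
    return lim;
}

/*
 * Pre-check a request before issuing pwritev2(..., RWF_ATOMIC).
 * Assumption: length must be a power of two within [min, max] and the
 * file offset must be aligned to the length.
 */
static bool awu_request_ok(const struct awu_limits *lim,
                           uint64_t offset, uint64_t len)
{
    if (lim->unit_min == 0)
        return false;
    if (len < lim->unit_min || len > lim->unit_max)
        return false;
    if (!is_pow2(len))
        return false;
    return (offset & (len - 1)) == 0;
}
```

For instance, a 4KB block size with a 64KB cluster on a device advertising a
512-byte minimum and 16KB maximum would yield a 4KB to 16KB range under this
rule, while a 16KB block size without bigalloc yields exactly 16KB for both
limits.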