.. SPDX-License-Identifier: GPL-2.0

=================
Process Addresses
=================

.. toctree::
   :maxdepth: 3


Userland memory ranges are tracked by the kernel via Virtual Memory Areas or
'VMA's of type :c:struct:`!struct vm_area_struct`.

Each VMA describes a virtually contiguous memory range with identical
attributes, each described by a :c:struct:`!struct vm_area_struct`
object. Userland access outside of VMAs is invalid except in the case where an
adjacent stack VMA could be extended to contain the accessed address.

All VMAs are contained within one and only one virtual address space, described
by a :c:struct:`!struct mm_struct` object which is referenced by all tasks (that
is, threads) which share the virtual address space. We refer to this as the
:c:struct:`!mm`.

Each mm object contains a maple tree data structure which describes all VMAs
within the virtual address space.

.. note:: An exception to this is the 'gate' VMA which is provided by
          architectures which use :c:struct:`!vsyscall` and is a global static
          object which does not belong to any specific mm.

Locking
-------

The kernel is designed to be highly scalable against concurrent read operations
on VMA **metadata**, so a complicated set of locks is required to ensure memory
corruption does not occur.

.. note:: Locking VMAs for their metadata does not have any impact on the
          memory they describe nor the page tables that map them.
Terminology
~~~~~~~~~~~

* **mmap locks** - Each MM has a read/write semaphore :c:member:`!mmap_lock`
  which locks at a process address space granularity and which can be acquired
  via :c:func:`!mmap_read_lock`, :c:func:`!mmap_write_lock` and variants.
* **VMA locks** - The VMA lock is at VMA granularity (of course) which behaves
  as a read/write semaphore in practice. A VMA read lock is obtained via
  :c:func:`!lock_vma_under_rcu` (and unlocked via :c:func:`!vma_end_read`) and
  a write lock via :c:func:`!vma_start_write` or
  :c:func:`!vma_start_write_killable` (all VMA write locks are unlocked
  automatically when the mmap write lock is released). To take a VMA write
  lock you **must** have already acquired an :c:func:`!mmap_write_lock`.
* **rmap locks** - When trying to access VMAs through the reverse mapping via
  a :c:struct:`!struct address_space` or :c:struct:`!struct anon_vma` object
  (reachable from a folio via :c:member:`!folio->mapping`), VMAs must be
  stabilised via :c:func:`!anon_vma_[try]lock_read` or
  :c:func:`!anon_vma_[try]lock_write` for anonymous memory and
  :c:func:`!i_mmap_[try]lock_read` or :c:func:`!i_mmap_[try]lock_write` for
  file-backed memory. We refer to these locks as the reverse mapping locks, or
  'rmap locks' for brevity.
We discuss page table locks separately in the dedicated section below.

The first thing **any** of these locks achieves is to **stabilise** the VMA
within the MM tree. That is, guaranteeing that the VMA object will not be
deleted from under you nor modified (except for some specific fields described
below).

Stabilising a VMA also keeps the address space described by it around.

Lock usage
~~~~~~~~~~

If you want to **read** VMA metadata fields or just keep the VMA stable, you
must do one of the following:

* Obtain an mmap read lock at the MM granularity via :c:func:`!mmap_read_lock`
  (or a suitable variant), unlocking it with a matching
  :c:func:`!mmap_read_unlock` when you're done with the VMA, *or*
* Try to obtain a VMA read lock via :c:func:`!lock_vma_under_rcu`. This tries
  to acquire the lock atomically so might fail, in which case fall-back logic
  is required to instead obtain an mmap read lock if this returns
  :c:macro:`!NULL`, *or*
* Acquire an rmap lock before traversing the locked interval tree (whether
  anonymous or file-backed) to obtain the required VMA.
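The second of these options can be sketched as follows. This is an
illustrative kernel-style fragment rather than a literal excerpt: the
surrounding function, the :c:func:`!vma_lookup` fallback and how the VMA is
consumed are assumptions, and only the locking pattern itself is drawn from
the rules above.

.. code-block:: c

	struct vm_area_struct *vma;

	/* Fast path - take only the VMA read lock, looked up under RCU. */
	vma = lock_vma_under_rcu(mm, addr);
	if (vma) {
		/* ... read VMA metadata fields ... */
		vma_end_read(vma);
		return;
	}

	/* Slow path - fall back to the coarser mmap read lock. */
	mmap_read_lock(mm);
	vma = vma_lookup(mm, addr);
	if (vma) {
		/* ... read VMA metadata fields ... */
	}
	mmap_read_unlock(mm);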
If you want to **write** VMA metadata fields, then things vary depending on
the field (we explore each VMA field in detail below). For the majority you
must:

* Obtain an mmap write lock at the MM granularity via
  :c:func:`!mmap_write_lock` (or a suitable variant), unlocking it with a
  matching :c:func:`!mmap_write_unlock` when you're done with the VMA, *and*
* Obtain a VMA write lock via :c:func:`!vma_start_write` for each VMA you wish
  to modify, which will be released automatically when
  :c:func:`!mmap_write_unlock` is called.
* If you want to be able to write to **any** field, you must also hide the VMA
  from the reverse mapping by obtaining an **rmap write lock**.
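Putting the write-side rules together, a typical 'majority case' modification
might look like the following sketch. Again this is illustrative only - the
field being modified, the :c:func:`!vma_lookup` call and the error handling
are assumptions.

.. code-block:: c

	mmap_write_lock(mm);

	vma = vma_lookup(mm, addr);
	if (vma) {
		/*
		 * VMA write lock; released automatically by
		 * mmap_write_unlock() below.
		 */
		vma_start_write(vma);
		/* ... modify VMA metadata fields ... */
	}

	mmap_write_unlock(mm);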
VMA locks are special in that you must obtain an mmap **write** lock **first**
in order to obtain a VMA **write** lock. A VMA **read** lock however can be
obtained without any other lock (:c:func:`!lock_vma_under_rcu` will acquire
then release an RCU lock to look up the VMA for you).

This constrains the impact of writers on readers, as a writer can interact
with one VMA while a reader interacts with another simultaneously.

.. note:: The primary users of VMA read locks are page fault handlers, which
          means that without a VMA write lock, page faults will run
          concurrently with whatever you are doing.

Examining all valid lock states:

========= ======== ========= ======= ===== =========== ==========
mmap lock VMA lock rmap lock Stable? Read? Write most? Write all?
========= ======== ========= ======= ===== =========== ==========
\-        \-       \-        N       N     N           N
\-        R        \-        Y       Y     N           N
\-        \-       R/W       Y       Y     N           N
R/W       \-/R     \-/R/W    Y       Y     N           N
W         W        \-/R      Y       Y     Y           N
W         W        W         Y       Y     Y           Y
========= ======== ========= ======= ===== =========== ==========

.. warning:: While it's possible to obtain a VMA lock while holding an mmap
             read lock, attempting to do the reverse is invalid as it can
             result in deadlock - if another task already holds an mmap write
             lock and attempts to acquire a VMA write lock that will deadlock
             on the VMA read lock.

All of these locks behave as read/write semaphores in practice, so you can
obtain either a read or a write lock for each of these.
.. note:: Generally speaking, a read/write semaphore is a class of lock which
          permits concurrent readers. However, a write lock can only be
          obtained once all readers have left the critical region (and pending
          readers are made to wait).

          This renders read locks on a read/write semaphore concurrent with
          other readers, and write locks exclusive against all others holding
          the semaphore.

VMA fields
^^^^^^^^^^

We can subdivide :c:struct:`!struct vm_area_struct` fields by their purpose,
which makes it easier to explore their locking characteristics:

.. note:: We exclude VMA lock-specific fields here to avoid confusion, as
          these are in effect an internal implementation detail.
.. table:: Virtual layout fields

   ===================== ======================================== ===========
   Field                 Description                              Write lock
   ===================== ======================================== ===========
   :c:member:`!vm_start` Inclusive start virtual address of range mmap write,
                         VMA describes.                           VMA write,
                                                                  rmap write.
   :c:member:`!vm_end`   Exclusive end virtual address of range   mmap write,
                         VMA describes.                           VMA write,
                                                                  rmap write.
   :c:member:`!vm_pgoff` Describes the page offset into the file, mmap write,
                         the original page offset within the      VMA write,
                         virtual address space (prior to any      rmap write.
                         :c:func:`!mremap`), or PFN if a PFN map
                         and the architecture does not support
                         :c:macro:`!CONFIG_ARCH_HAS_PTE_SPECIAL`.
   ===================== ======================================== ===========

These fields describe the size, start and end of the VMA, and as such cannot
be modified without first being hidden from the reverse mapping, since these
fields are used to locate VMAs within the reverse mapping interval trees.
ubj)}(hhh]j)}(hhh](j)}(hhh]j)}(hFieldh]hField}(hj!hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(h Descriptionh]h Description}(hj8hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhj5ubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(h Write lockh]h Write lock}(hjOhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhjLubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubah}(h]h ]h"]h$]h&]uh1jhj ubjy)}(hhh](j)}(hhh](j)}(hhh]j)}(h:c:member:`!vm_mm`h]j )}(hjzh]hvm_mm}(hj|hhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hjxubah}(h]h ]h"]h$]h&]uh1jhhhKhjuubah}(h]h ]h"]h$]h&]uh1jhjrubj)}(hhh]j)}(hContaining mm_struct.h]hContaining mm_struct.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjrubj)}(hhh]j)}(h#None - written once on initial map.h]h#None - written once on initial map.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjrubeh}(h]h ]h"]h$]h&]uh1jhjoubj)}(hhh](j)}(hhh]j)}(h:c:member:`!vm_page_prot`h]j )}(hjh]h vm_page_prot}(hjhhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hjubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(hKArchitecture-specific page table protection bits determined from VMA flags.h]hKArchitecture-specific page table protection bits determined from VMA flags.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(hmmap write, VMA write.h]hmmap write, VMA write.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjoubj)}(hhh](j)}(hhh]j)}(h:c:member:`!vm_flags`h]j )}(hj*h]hvm_flags}(hj,hhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hj(ubah}(h]h ]h"]h$]h&]uh1jhhhKhj%ubah}(h]h ]h"]h$]h&]uh1jhj"ubj)}(hhh]j)}(hwRead-only access to VMA flags describing attributes of the VMA, in union with private writable :c:member:`!__vm_flags`.h](h_Read-only access to VMA flags describing attributes of the VMA, in union with private writable }(hjIhhhNhNubj )}(h:c:member:`!__vm_flags`h]h __vm_flags}(hjQhhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hjIubh.}(hjIhhhNhNubeh}(h]h 
]h"]h$]h&]uh1jhhhKhjFubah}(h]h ]h"]h$]h&]uh1jhj"ubj)}(hhh]j)}(hN/Ah]hN/A}(hjshhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhjpubah}(h]h ]h"]h$]h&]uh1jhj"ubeh}(h]h ]h"]h$]h&]uh1jhjoubj)}(hhh](j)}(hhh]j)}(h:c:member:`!__vm_flags`h]j )}(hjh]h __vm_flags}(hjhhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hjubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(hXPrivate, writable access to VMA flags field, updated by :c:func:`!vm_flags_*` functions.h](h8Private, writable access to VMA flags field, updated by }(hjhhhNhNubj )}(h:c:func:`!vm_flags_*`h]h vm_flags_*()}(hjhhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjubh functions.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(hmmap write, VMA write.h]hmmap write, VMA write.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjoubj)}(hhh](j)}(hhh]j)}(h:c:member:`!vm_file`h]j )}(hjh]hvm_file}(hjhhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hjubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(h}If the VMA is file-backed, points to a struct file object describing the underlying file, if anonymous then :c:macro:`!NULL`.h](hlIf the VMA is file-backed, points to a struct file object describing the underlying file, if anonymous then }(hjhhhNhNubj )}(h:c:macro:`!NULL`h]hNULL}(hj'hhhNhNubah}(h]h ](jjc-macroeh"]h$]h&]uh1j hjubh.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(h#None - written once on initial map.h]h#None - written once on initial map.}(hjIhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhjFubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjoubj)}(hhh](j)}(hhh]j)}(h:c:member:`!vm_ops`h]j )}(hjkh]hvm_ops}(hjmhhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hjiubah}(h]h ]h"]h$]h&]uh1jhhhKhjfubah}(h]h ]h"]h$]h&]uh1jhjcubj)}(hhh]j)}(hIf the VMA is file-backed, then either the driver or file-system provides a :c:struct:`!struct vm_operations_struct` object describing callbacks to be invoked on VMA lifetime events.h](hLIf the VMA is 
file-backed, then either the driver or file-system provides a }(hjhhhNhNubj )}(h(:c:struct:`!struct vm_operations_struct`h]hstruct vm_operations_struct}(hjhhhNhNubah}(h]h ](jjc-structeh"]h$]h&]uh1j hjubhB object describing callbacks to be invoked on VMA lifetime events.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjcubj)}(hhh]j)}(h?None - Written once on initial map by :c:func:`!f_ops->mmap()`.h](h&None - Written once on initial map by }(hjhhhNhNubj )}(h:c:func:`!f_ops->mmap()`h]h f_ops->mmap()}(hjhhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjubh.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjcubeh}(h]h ]h"]h$]h&]uh1jhjoubj)}(hhh](j)}(hhh]j)}(h:c:member:`!vm_private_data`h]j )}(hjh]hvm_private_data}(hjhhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hjubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(h9A :c:member:`!void *` field for driver-specific metadata.h](hA }(hjhhhNhNubj )}(h:c:member:`!void *`h]hvoid *}(hjhhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hjubh$ field for driver-specific metadata.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(hHandled by driver.h]hHandled by driver.}(hj2hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhj/ubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjoubeh}(h]h ]h"]h$]h&]uh1jxhj ubeh}(h]h ]h"]h$]h&]colsKuh1jrhj ubeh}(h]id2ah ]h"]h$]h&]uh1jmhj hhhhhNubj)}(hVThese are the core fields which describe the MM the VMA belongs to and its attributes.h]hVThese are the core fields which describe the MM the VMA belongs to and its attributes.}(hj`hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhj hhubjn)}(hhh](h)}(hConfig-specific fieldsh]hConfig-specific fields}(hjqhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjnubjs)}(hhh](jx)}(hhh]h}(h]h ]h"]h$]h&]colwidthK!uh1jwhjubjx)}(hhh]h}(h]h ]h"]h$]h&]colwidthKuh1jwhjubjx)}(hhh]h}(h]h ]h"]h$]h&]colwidthK(uh1jwhjubjx)}(hhh]h}(h]h ]h"]h$]h&]colwidthKuh1jwhjubj)}(hhh]j)}(hhh](j)}(hhh]j)}(hFieldh]hField}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h 
]h"]h$]h&]uh1jhjubj)}(hhh]j)}(hConfiguration optionh]hConfiguration option}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(h Descriptionh]h Description}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(h Write lockh]h Write lock}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubah}(h]h ]h"]h$]h&]uh1jhjubjy)}(hhh](j)}(hhh](j)}(hhh]j)}(h:c:member:`!anon_name`h]j )}(hj#h]h anon_name}(hj%hhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hj!ubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(hCONFIG_ANON_VMA_NAMEh]hCONFIG_ANON_VMA_NAME}(hjBhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhj?ubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(hXA field for storing a :c:struct:`!struct anon_vma_name` object providing a name for anonymous mappings, or :c:macro:`!NULL` if none is set or the VMA is file-backed. The underlying object is reference counted and can be shared across multiple VMAs for scalability.h](hA field for storing a }(hjYhhhNhNubj )}(h!:c:struct:`!struct anon_vma_name`h]hstruct anon_vma_name}(hjahhhNhNubah}(h]h ](jjc-structeh"]h$]h&]uh1j hjYubh4 object providing a name for anonymous mappings, or }(hjYhhhNhNubj )}(h:c:macro:`!NULL`h]hNULL}(hjthhhNhNubah}(h]h ](jjc-macroeh"]h$]h&]uh1j hjYubh if none is set or the VMA is file-backed. 
The underlying object is reference counted and can be shared across multiple VMAs for scalability.}(hjYhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhKhjVubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(hmmap write, VMA write.h]hmmap write, VMA write.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]j)}(h :c:member:`!swap_readahead_info`h]j )}(hjh]hswap_readahead_info}(hjhhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hjubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(h CONFIG_SWAPh]h CONFIG_SWAP}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(h\Metadata used by the swap mechanism to perform readahead. This field is accessed atomically.h]h\Metadata used by the swap mechanism to perform readahead. This field is accessed atomically.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(hmmap read, swap-specific lock.h]hmmap read, swap-specific lock.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]j)}(h:c:member:`!vm_policy`h]j )}(hj'h]h vm_policy}(hj)hhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hj%ubah}(h]h ]h"]h$]h&]uh1jhhhKhj"ubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(h CONFIG_NUMAh]h CONFIG_NUMA}(hjFhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhjCubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(hv:c:type:`!mempolicy` object which describes the NUMA behaviour of the VMA. The underlying object is reference counted.h](j )}(h:c:type:`!mempolicy`h]h mempolicy}(hjahhhNhNubah}(h]h ](jjc-typeeh"]h$]h&]uh1j hj]ubhb object which describes the NUMA behaviour of the VMA. 
The underlying object is reference counted.}(hj]hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhKhjZubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(hmmap write, VMA write.h]hmmap write, VMA write.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]j)}(h:c:member:`!numab_state`h]j )}(hjh]h numab_state}(hjhhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hjubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(hCONFIG_NUMA_BALANCINGh]hCONFIG_NUMA_BALANCING}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(h:c:type:`!vma_numab_state` object which describes the current state of NUMA balancing in relation to this VMA. Updated under mmap read lock by :c:func:`!task_numa_work`.h](j )}(h:c:type:`!vma_numab_state`h]hvma_numab_state}(hjhhhNhNubah}(h]h ](jjc-typeeh"]h$]h&]uh1j hjubhu object which describes the current state of NUMA balancing in relation to this VMA. Updated under mmap read lock by }(hjhhhNhNubj )}(h:c:func:`!task_numa_work`h]htask_numa_work()}(hjhhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjubh.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(hmmap read, numab-specific lock.h]hmmap read, numab-specific lock.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]j)}(h:c:member:`!vm_userfaultfd_ctx`h]j )}(hj6h]hvm_userfaultfd_ctx}(hj8hhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hj4ubah}(h]h ]h"]h$]h&]uh1jhhhKhj1ubah}(h]h ]h"]h$]h&]uh1jhj.ubj)}(hhh]j)}(hCONFIG_USERFAULTFDh]hCONFIG_USERFAULTFD}(hjUhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhjRubah}(h]h ]h"]h$]h&]uh1jhj.ubj)}(hhh]j)}(hUserfaultfd context wrapper object of type :c:type:`!vm_userfaultfd_ctx`, either of zero size if userfaultfd is disabled, or containing a pointer to an underlying :c:type:`!userfaultfd_ctx` object which describes userfaultfd metadata.h](h+Userfaultfd context wrapper object of type }(hjlhhhNhNubj 
)}(h:c:type:`!vm_userfaultfd_ctx`h]hvm_userfaultfd_ctx}(hjthhhNhNubah}(h]h ](jjc-typeeh"]h$]h&]uh1j hjlubh[, either of zero size if userfaultfd is disabled, or containing a pointer to an underlying }(hjlhhhNhNubj )}(h:c:type:`!userfaultfd_ctx`h]huserfaultfd_ctx}(hjhhhNhNubah}(h]h ](jjc-typeeh"]h$]h&]uh1j hjlubh- object which describes userfaultfd metadata.}(hjlhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhKhjiubah}(h]h ]h"]h$]h&]uh1jhj.ubj)}(hhh]j)}(hmmap write, VMA write.h]hmmap write, VMA write.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhj.ubeh}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jxhjubeh}(h]h ]h"]h$]h&]colsKuh1jrhjnubeh}(h]id3ah ]h"]h$]h&]uh1jmhj hhhhhNubj)}(heThese fields are present or not depending on whether the relevant kernel configuration option is set.h]heThese fields are present or not depending on whether the relevant kernel configuration option is set.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhj hhubjn)}(hhh](h)}(hReverse mapping fieldsh]hReverse mapping fields}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjubjs)}(hhh](jx)}(hhh]h}(h]h ]h"]h$]h&]colwidthK#uh1jwhjubjx)}(hhh]h}(h]h ]h"]h$]h&]colwidthK)uh1jwhjubjx)}(hhh]h}(h]h ]h"]h$]h&]colwidthKuh1jwhjubj)}(hhh]j)}(hhh](j)}(hhh]j)}(hFieldh]hField}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(h Descriptionh]h Description}(hj7hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhj4ubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(h Write lockh]h Write lock}(hjNhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhjKubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubah}(h]h ]h"]h$]h&]uh1jhjubjy)}(hhh](j)}(hhh](j)}(hhh]j)}(h:c:member:`!shared.rb`h]j )}(hjyh]h shared.rb}(hj{hhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hjwubah}(h]h ]h"]h$]h&]uh1jhhhKhjtubah}(h]h ]h"]h$]h&]uh1jhjqubj)}(hhh]j)}(hA red/black tree node used, if the mapping is file-backed, to place the VMA in the :c:member:`!struct address_space->i_mmap` red/black interval tree.h](hSA red/black tree node used, if the mapping is file-backed, to place the VMA 
in the }(hjhhhNhNubj )}(h):c:member:`!struct address_space->i_mmap`h]hstruct address_space->i_mmap}(hjhhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hjubh red/black interval tree.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjqubj)}(hhh]j)}(h$mmap write, VMA write, i_mmap write.h]h$mmap write, VMA write, i_mmap write.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjqubeh}(h]h ]h"]h$]h&]uh1jhjnubj)}(hhh](j)}(hhh]j)}(h#:c:member:`!shared.rb_subtree_last`h]j )}(hjh]hshared.rb_subtree_last}(hjhhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hjubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(hLMetadata used for management of the interval tree if the VMA is file-backed.h]hLMetadata used for management of the interval tree if the VMA is file-backed.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(h$mmap write, VMA write, i_mmap write.h]h$mmap write, VMA write, i_mmap write.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjnubj)}(hhh](j)}(hhh]j)}(h:c:member:`!anon_vma_chain`h]j )}(hj<h]hanon_vma_chain}(hj>hhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hj:ubah}(h]h ]h"]h$]h&]uh1jhhhMhj7ubah}(h]h ]h"]h$]h&]uh1jhj4ubj)}(hhh]j)}(hList of pointers to both forked/CoW’d :c:type:`!anon_vma` objects and :c:member:`!vma->anon_vma` if it is non-:c:macro:`!NULL`.h](h(List of pointers to both forked/CoW’d }(hj[hhhNhNubj )}(h:c:type:`!anon_vma`h]hanon_vma}(hjchhhNhNubah}(h]h ](jjc-typeeh"]h$]h&]uh1j hj[ubh objects and }(hj[hhhNhNubj )}(h:c:member:`!vma->anon_vma`h]h vma->anon_vma}(hjvhhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hj[ubh if it is non-}(hj[hhhNhNubj )}(h:c:macro:`!NULL`h]hNULL}(hjhhhNhNubah}(h]h ](jjc-macroeh"]h$]h&]uh1j hj[ubh.}(hj[hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhjXubah}(h]h ]h"]h$]h&]uh1jhj4ubj)}(hhh]j)}(hmmap read, anon_vma write.h]hmmap read, anon_vma write.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhjubah}(h]h ]h"]h$]h&]uh1jhj4ubeh}(h]h 
]h"]h$]h&]uh1jhjnubj)}(hhh](j)}(hhh]j)}(h:c:member:`!anon_vma`h]j )}(hjh]hanon_vma}(hjhhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hjubah}(h]h ]h"]h$]h&]uh1jhhhMhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(h:c:type:`!anon_vma` object used by anonymous folios mapped exclusively to this VMA. Initially set by :c:func:`!anon_vma_prepare` serialised by the :c:macro:`!page_table_lock`. This is set as soon as any page is faulted in.h](j )}(h:c:type:`!anon_vma`h]hanon_vma}(hjhhhNhNubah}(h]h ](jjc-typeeh"]h$]h&]uh1j hjubhR object used by anonymous folios mapped exclusively to this VMA. Initially set by }(hjhhhNhNubj )}(h:c:func:`!anon_vma_prepare`h]hanon_vma_prepare()}(hjhhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjubh serialised by the }(hjhhhNhNubj )}(h:c:macro:`!page_table_lock`h]hpage_table_lock}(hjhhhNhNubah}(h]h ](jjc-macroeh"]h$]h&]uh1j hjubh0. This is set as soon as any page is faulted in.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hQWhen :c:macro:`NULL` and setting non-:c:macro:`NULL`: mmap read, page_table_lock.h](hWhen }(hj8hhhNhNubh)}(h:c:macro:`NULL`h]j )}(hjBh]hNULL}(hjDhhhNhNubah}(h]h ](jjc-macroeh"]h$]h&]uh1j hj@ubah}(h]h ]h"]h$]h&]refdoch refdomainjreftypemacro refexplicitrefwarn reftargetNULLuh1hhhhMhj8ubh and setting non-}(hj8hhhNhNubh)}(h:c:macro:`NULL`h]j )}(hjfh]hNULL}(hjhhhhNhNubah}(h]h ](jjc-macroeh"]h$]h&]uh1j hjdubah}(h]h ]h"]h$]h&]refdoch refdomainjreftypemacro refexplicitrefwarnj^NULLuh1hhhhMhj8ubh: mmap read, page_table_lock.}(hj8hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhj5ubj)}(h\When non-:c:macro:`NULL` and setting :c:macro:`NULL`: mmap write, VMA write, anon_vma write.h](h When non-}(hjhhhNhNubh)}(h:c:macro:`NULL`h]j )}(hjh]hNULL}(hjhhhNhNubah}(h]h ](jjc-macroeh"]h$]h&]uh1j hjubah}(h]h ]h"]h$]h&]refdoch refdomainjreftypemacro refexplicitrefwarnj^NULLuh1hhhhM hjubh and setting }(hjhhhNhNubh)}(h:c:macro:`NULL`h]j )}(hjh]hNULL}(hjhhhNhNubah}(h]h ](jjc-macroeh"]h$]h&]uh1j hjubah}(h]h ]h"]h$]h&]refdoch refdomainjreftypemacro 
refexplicitrefwarnj^NULLuh1hhhhM hjubh(: mmap write, VMA write, anon_vma write.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhj5ubeh}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjnubeh}(h]h ]h"]h$]h&]uh1jxhjubeh}(h]h ]h"]h$]h&]colsKuh1jrhjubeh}(h]id4ah ]h"]h$]h&]uh1jmhj hhhhhNubj)}(hX These fields are used to both place the VMA within the reverse mapping, and for anonymous mappings, to be able to access both related :c:struct:`!struct anon_vma` objects and the :c:struct:`!struct anon_vma` in which folios mapped exclusively to this VMA should reside.h](hThese fields are used to both place the VMA within the reverse mapping, and for anonymous mappings, to be able to access both related }(hjhhhNhNubj )}(h:c:struct:`!struct anon_vma`h]hstruct anon_vma}(hj hhhNhNubah}(h]h ](jjc-structeh"]h$]h&]uh1j hjubh objects and the }(hjhhhNhNubj )}(h:c:struct:`!struct anon_vma`h]hstruct anon_vma}(hjhhhNhNubah}(h]h ](jjc-structeh"]h$]h&]uh1j hjubh> in which folios mapped exclusively to this VMA should reside.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhj hhubj)}(hIf a file-backed mapping is mapped with :c:macro:`!MAP_PRIVATE` set then it can be in both the :c:type:`!anon_vma` and :c:type:`!i_mmap` trees at the same time, so all of these fields might be utilised at once.h]j)}(hIf a file-backed mapping is mapped with :c:macro:`!MAP_PRIVATE` set then it can be in both the :c:type:`!anon_vma` and :c:type:`!i_mmap` trees at the same time, so all of these fields might be utilised at once.h](h(If a file-backed mapping is mapped with }(hj9hhhNhNubj )}(h:c:macro:`!MAP_PRIVATE`h]h MAP_PRIVATE}(hjAhhhNhNubah}(h]h ](jjc-macroeh"]h$]h&]uh1j hj9ubh set then it can be in both the }(hj9hhhNhNubj )}(h:c:type:`!anon_vma`h]hanon_vma}(hjThhhNhNubah}(h]h ](jjc-typeeh"]h$]h&]uh1j hj9ubh and }(hj9hhhNhNubj )}(h:c:type:`!i_mmap`h]hi_mmap}(hjghhhNhNubah}(h]h ](jjc-typeeh"]h$]h&]uh1j hj9ubhJ trees at the same time, so all of these fields might be utilised at once.}(hj9hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhj5ubah}(h]h 
]h"]h$]h&]uh1jhj hhhhhNubeh}(h] vma-fieldsah ]h"] vma fieldsah$]h&]uh1hhjhhhhhKubeh}(h] lock-usageah ]h"] lock usageah$]h&]uh1hhjhhhhhKIubh)}(hhh](h)}(h Page tablesh]h Page tables}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhhhMubj)}(hXWe won't speak exhaustively on the subject but broadly speaking, page tables map virtual addresses to physical ones through a series of page tables, each of which contain entries with physical addresses for the next page table level (along with flags), and at the leaf level the physical addresses of the underlying physical data pages or a special entry such as a swap entry, migration entry or other special marker. Offsets into these pages are provided by the virtual address itself.h]hXWe won’t speak exhaustively on the subject but broadly speaking, page tables map virtual addresses to physical ones through a series of page tables, each of which contain entries with physical addresses for the next page table level (along with flags), and at the leaf level the physical addresses of the underlying physical data pages or a special entry such as a swap entry, migration entry or other special marker. Offsets into these pages are provided by the virtual address itself.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhjhhubj)}(hIn Linux these are divided into five levels - PGD, P4D, PUD, PMD and PTE. Huge pages might eliminate one or two of these levels, but when this is the case we typically refer to the leaf level as the PTE level regardless.h]hIn Linux these are divided into five levels - PGD, P4D, PUD, PMD and PTE. Huge pages might eliminate one or two of these levels, but when this is the case we typically refer to the leaf level as the PTE level regardless.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhM#hjhhubj)}(hXSIn instances where the architecture supports fewer page tables than five the kernel cleverly 'folds' page table levels, that is stubbing out functions related to the skipped levels. 
This allows us to conceptually act as if there were always five levels, even if the compiler might, in practice, eliminate any code relating to missing ones.h]j)}(hXSIn instances where the architecture supports fewer page tables than five the kernel cleverly 'folds' page table levels, that is stubbing out functions related to the skipped levels. This allows us to conceptually act as if there were always five levels, even if the compiler might, in practice, eliminate any code relating to missing ones.h]hXWIn instances where the architecture supports fewer page tables than five the kernel cleverly ‘folds’ page table levels, that is stubbing out functions related to the skipped levels. This allows us to conceptually act as if there were always five levels, even if the compiler might, in practice, eliminate any code relating to missing ones.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhM'hjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(hAThere are four key operations typically performed on page tables:h]hAThere are four key operations typically performed on page tables:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhM.hjhhubhenumerated_list)}(hhh](j)}(hX**Traversing** page tables - Simply reading page tables in order to traverse them. This only requires that the VMA is kept stable, so a lock which establishes this suffices for traversal (there are also lockless variants which eliminate even this requirement, such as :c:func:`!gup_fast`). There is also a special case of page table traversal for non-VMA regions which we consider separately below.h]j)}(hX**Traversing** page tables - Simply reading page tables in order to traverse them. This only requires that the VMA is kept stable, so a lock which establishes this suffices for traversal (there are also lockless variants which eliminate even this requirement, such as :c:func:`!gup_fast`). 
There is also a special case of page table traversal for non-VMA regions which we consider separately below.h](j)}(h**Traversing**h]h Traversing}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh page tables - Simply reading page tables in order to traverse them. This only requires that the VMA is kept stable, so a lock which establishes this suffices for traversal (there are also lockless variants which eliminate even this requirement, such as }(hjhhhNhNubj )}(h:c:func:`!gup_fast`h]h gup_fast()}(hjhhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjubho). There is also a special case of page table traversal for non-VMA regions which we consider separately below.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhM0hjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(h**Installing** page table mappings - Whether creating a new mapping or modifying an existing one in such a way as to change its identity. This requires that the VMA is kept stable via an mmap or VMA lock (explicitly not rmap locks).h]j)}(h**Installing** page table mappings - Whether creating a new mapping or modifying an existing one in such a way as to change its identity. This requires that the VMA is kept stable via an mmap or VMA lock (explicitly not rmap locks).h](j)}(h**Installing**h]h Installing}(hj/hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj+ubh page table mappings - Whether creating a new mapping or modifying an existing one in such a way as to change its identity. This requires that the VMA is kept stable via an mmap or VMA lock (explicitly not rmap locks).}(hj+hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhM6hj'ubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(hX**Zapping/unmapping** page table entries - This is what the kernel calls clearing page table mappings at the leaf level only, whilst leaving all page tables in place. This is a very common operation in the kernel performed on file truncation, the :c:macro:`!MADV_DONTNEED` operation via :c:func:`!madvise`, and others. 
This is performed by a number of functions including :c:func:`!unmap_mapping_range` and :c:func:`!unmap_mapping_pages`. The VMA need only be kept stable for this operation.h]j)}(hX**Zapping/unmapping** page table entries - This is what the kernel calls clearing page table mappings at the leaf level only, whilst leaving all page tables in place. This is a very common operation in the kernel performed on file truncation, the :c:macro:`!MADV_DONTNEED` operation via :c:func:`!madvise`, and others. This is performed by a number of functions including :c:func:`!unmap_mapping_range` and :c:func:`!unmap_mapping_pages`. The VMA need only be kept stable for this operation.h](j)}(h**Zapping/unmapping**h]hZapping/unmapping}(hjUhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjQubh page table entries - This is what the kernel calls clearing page table mappings at the leaf level only, whilst leaving all page tables in place. This is a very common operation in the kernel performed on file truncation, the }(hjQhhhNhNubj )}(h:c:macro:`!MADV_DONTNEED`h]h MADV_DONTNEED}(hjghhhNhNubah}(h]h ](jjc-macroeh"]h$]h&]uh1j hjQubh operation via }(hjQhhhNhNubj )}(h:c:func:`!madvise`h]h madvise()}(hjzhhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjQubhC, and others. This is performed by a number of functions including }(hjQhhhNhNubj )}(h:c:func:`!unmap_mapping_range`h]hunmap_mapping_range()}(hjhhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjQubh and }(hjQhhhNhNubj )}(h:c:func:`!unmap_mapping_pages`h]hunmap_mapping_pages()}(hjhhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjQubh6. 
The VMA need only be kept stable for this operation.}(hjQhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhM:hjMubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(hX**Freeing** page tables - When finally the kernel removes page tables from a userland process (typically via :c:func:`!free_pgtables`) extreme care must be taken to ensure this is done safely, as this logic finally frees all page tables in the specified range, ignoring existing leaf entries (it assumes the caller has both zapped the range and prevented any further faults or modifications within it). h]j)}(hX**Freeing** page tables - When finally the kernel removes page tables from a userland process (typically via :c:func:`!free_pgtables`) extreme care must be taken to ensure this is done safely, as this logic finally frees all page tables in the specified range, ignoring existing leaf entries (it assumes the caller has both zapped the range and prevented any further faults or modifications within it).h](j)}(h **Freeing**h]hFreeing}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubhb page tables - When finally the kernel removes page tables from a userland process (typically via }(hjhhhNhNubj )}(h:c:func:`!free_pgtables`h]hfree_pgtables()}(hjhhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjubhX ) extreme care must be taken to ensure this is done safely, as this logic finally frees all page tables in the specified range, ignoring existing leaf entries (it assumes the caller has both zapped the range and prevented any further faults or modifications within it).}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMAhjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubeh}(h]h ]h"]h$]h&]enumtypearabicprefixhsuffix.uh1jhjhhhhhM0ubj)}(hModifying mappings for reclaim or migration is performed under rmap lock as it, like zapping, does not fundamentally modify the identity of what is being mapped.h]j)}(hModifying mappings for reclaim or migration is performed under rmap lock as it, like zapping, does not fundamentally modify the identity of what is being mapped.h]hModifying mappings for 
reclaim or migration is performed under rmap lock as it, like zapping, does not fundamentally modify the identity of what is being mapped.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMHhjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(h**Traversing** and **zapping** ranges can be performed holding any one of the locks described in the terminology section above - that is the mmap lock, the VMA lock or either of the reverse mapping locks.h](j)}(h**Traversing**h]h Traversing}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh and }(hjhhhNhNubj)}(h **zapping**h]hzapping}(hj1hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh ranges can be performed holding any one of the locks described in the terminology section above - that is the mmap lock, the VMA lock or either of the reverse mapping locks.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMLhjhhubj)}(hX4That is - as long as you keep the relevant VMA **stable** - you are good to go ahead and perform these operations on page tables (though internally, kernel operations that perform writes also acquire internal page table locks to serialise - see the page table implementation detail section for more details).h](h/That is - as long as you keep the relevant VMA }(hjIhhhNhNubj)}(h **stable**h]hstable}(hjQhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjIubh - you are good to go ahead and perform these operations on page tables (though internally, kernel operations that perform writes also acquire internal page table locks to serialise - see the page table implementation detail section for more details).}(hjIhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMPhjhhubj)}(hWe free empty PTE tables on zap under the RCU lock - this does not change the aforementioned locking requirements around zapping.h]j)}(hWe free empty PTE tables on zap under the RCU lock - this does not change the aforementioned locking requirements around zapping.h]hWe free empty PTE tables on zap under the RCU lock - this does not change the aforementioned locking requirements around zapping.}(hjmhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMUhjiubah}(h]h 
]h"]h$]h&]uh1jhjhhhhhNubj)}(hWhen **installing** page table entries, the mmap or VMA lock must be held to keep the VMA stable. We explore why this is in the page table locking details section below.h](hWhen }(hjhhhNhNubj)}(h**installing**h]h installing}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh page table entries, the mmap or VMA lock must be held to keep the VMA stable. We explore why this is in the page table locking details section below.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMXhjhhubj)}(h**Freeing** page tables is an entirely internal memory management operation and has special requirements (see the page freeing section below for more details).h](j)}(h **Freeing**h]hFreeing}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh page tables is an entirely internal memory management operation and has special requirements (see the page freeing section below for more details).}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhM\hjhhubjs )}(hXBWhen **freeing** page tables, it must not be possible for VMAs containing the ranges those page tables map to be accessible via the reverse mapping. 
The :c:func:`!free_pgtables` function removes the relevant VMAs from the reverse mappings, but no other VMAs can be permitted to be accessible and span the specified range.h](j)}(hWhen **freeing** page tables, it must not be possible for VMAs containing the ranges those page tables map to be accessible via the reverse mapping.h](hWhen }(hjhhhNhNubj)}(h **freeing**h]hfreeing}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh page tables, it must not be possible for VMAs containing the ranges those page tables map to be accessible via the reverse mapping.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhM_hjubj)}(hThe :c:func:`!free_pgtables` function removes the relevant VMAs from the reverse mappings, but no other VMAs can be permitted to be accessible and span the specified range.h](hThe }(hjhhhNhNubj )}(h:c:func:`!free_pgtables`h]hfree_pgtables()}(hjhhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjubh function removes the relevant VMAs from the reverse mappings, but no other VMAs can be permitted to be accessible and span the specified range.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMchjubeh}(h]h ]h"]h$]h&]uh1jr hjhhhhhNubeh}(h] page-tablesah ]h"] page tablesah$]h&]uh1hhjhhhhhMubh)}(hhh](h)}(hTraversing non-VMA page tablesh]hTraversing non-VMA page tables}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhhhMhubj)}(hWe've focused above on traversal of page tables belonging to VMAs. It is also possible to traverse page tables which are not represented by VMAs.h]hWe’ve focused above on traversal of page tables belonging to VMAs. 
It is also possible to traverse page tables which are not represented by VMAs.}(hj!hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMjhjhhubj)}(hXKernel page table mappings themselves are generally managed by whatever part of the kernel established them, and the aforementioned locking rules do not apply - for instance vmalloc has its own set of locks which are utilised for establishing and tearing down its page tables.h]hXKernel page table mappings themselves are generally managed by whatever part of the kernel established them, and the aforementioned locking rules do not apply - for instance vmalloc has its own set of locks which are utilised for establishing and tearing down its page tables.}(hj/hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMmhjhhubj)}(hHowever, for convenience we provide the :c:func:`!walk_kernel_page_table_range` function which is synchronised via the mmap lock on the :c:macro:`!init_mm` kernel instantiation of the :c:struct:`!struct mm_struct` metadata object.h](h(However, for convenience we provide the }(hj=hhhNhNubj )}(h':c:func:`!walk_kernel_page_table_range`h]hwalk_kernel_page_table_range()}(hjEhhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj=ubh9 function which is synchronised via the mmap lock on the }(hj=hhhNhNubj )}(h:c:macro:`!init_mm`h]hinit_mm}(hjXhhhNhNubah}(h]h ](jjc-macroeh"]h$]h&]uh1j hj=ubh kernel instantiation of the }(hj=hhhNhNubj )}(h:c:struct:`!struct mm_struct`h]hstruct mm_struct}(hjkhhhNhNubah}(h]h ](jjc-structeh"]h$]h&]uh1j hj=ubh metadata object.}(hj=hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMrhjhhubj)}(hIf an operation requires exclusive access, a write lock is used, but if not, a read lock suffices - we assert only that at least a read lock has been acquired.h]hIf an operation requires exclusive access, a write lock is used, but if not, a read lock suffices - we assert only that at least a read lock has been acquired.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMvhjhhubj)}(hSince, aside from vmalloc and memory hot plug, kernel page tables are not torn down 
all that often - this usually suffices; however, any caller of this functionality must ensure that any additionally required locks are acquired in advance.h]hSince, aside from vmalloc and memory hot plug, kernel page tables are not torn down all that often - this usually suffices; however, any caller of this functionality must ensure that any additionally required locks are acquired in advance.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMyhjhhubj)}(hWe also permit a truly unusual case: the traversal of non-VMA ranges in **userland**, as provided for by :c:func:`!walk_page_range_debug`.h](hJWe also permit a truly unusual case: the traversal of non-VMA ranges in }(hjhhhNhNubj)}(h **userland**h]huserland}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh, as provided for by }(hjhhhNhNubj )}(h :c:func:`!walk_page_range_debug`h]hwalk_page_range_debug()}(hjhhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjubh.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhM~hjhhubj)}(hThis has only one user - the general page table dumping logic (implemented in :c:macro:`!mm/ptdump.c`) - which seeks to expose all mappings for debug purposes even if they are highly unusual (possibly architecture-specific) and are not backed by a VMA.h](hNThis has only one user - the general page table dumping logic (implemented in }(hjhhhNhNubj )}(h:c:macro:`!mm/ptdump.c`h]h mm/ptdump.c}(hjhhhNhNubah}(h]h ](jjc-macroeh"]h$]h&]uh1j hjubh) - which seeks to expose all mappings for debug purposes even if they are highly unusual (possibly architecture-specific) and are not backed by a VMA.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhjhhubj)}(hWe must take great care in this case, as the :c:func:`!munmap` implementation detaches VMAs under an mmap write lock before tearing down page tables under a downgraded mmap read lock.h](h-We must take great care in this case, as the }(hjhhhNhNubj )}(h:c:func:`!munmap`h]hmunmap()}(hjhhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjubhy implementation detaches VMAs under an mmap write lock before tearing 
down page tables under a downgraded mmap read lock.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhjhhubj)}(h_This means such an operation could race with this, and thus an mmap **write** lock is required.h](hDThis means such an operation could race with this, and thus an mmap }(hjhhhNhNubj)}(h **write**h]hwrite}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh lock is required.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhjhhubeh}(h]traversing-non-vma-page-tablesah ]h"]traversing non-vma page tablesah$]h&]uh1hhjhhhhhMhubh)}(hhh](h)}(h Lock orderingh]h Lock ordering}(hj@hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj=hhhhhMubj)}(hAs we have multiple locks across the kernel which may or may not be taken at the same time as explicit mm or VMA locks, we have to be wary of lock inversion, and the **order** in which locks are acquired and released becomes very important.h](hAs we have multiple locks across the kernel which may or may not be taken at the same time as explicit mm or VMA locks, we have to be wary of lock inversion, and the }(hjNhhhNhNubj)}(h **order**h]horder}(hjVhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjNubhA in which locks are acquired and released becomes very important.}(hjNhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhj=hhubj)}(hXLock inversion occurs when two threads need to acquire multiple locks, but in doing so inadvertently cause a mutual deadlock. For example, consider thread 1 which holds lock A and tries to acquire lock B, while thread 2 holds lock B and tries to acquire lock A. Both threads are now deadlocked on each other. 
However, had they attempted to acquire locks in the same order, one would have waited for the other to complete its work and no deadlock would have occurred.h](j)}(h}Lock inversion occurs when two threads need to acquire multiple locks, but in doing so inadvertently cause a mutual deadlock.h]h}Lock inversion occurs when two threads need to acquire multiple locks, but in doing so inadvertently cause a mutual deadlock.}(hjrhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhjnubj)}(hFor example, consider thread 1 which holds lock A and tries to acquire lock B, while thread 2 holds lock B and tries to acquire lock A.h]hFor example, consider thread 1 which holds lock A and tries to acquire lock B, while thread 2 holds lock B and tries to acquire lock A.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhjnubj)}(hBoth threads are now deadlocked on each other. However, had they attempted to acquire locks in the same order, one would have waited for the other to complete its work and no deadlock would have occurred.h]hBoth threads are now deadlocked on each other. 
However, had they attempted to acquire locks in the same order, one would have waited for the other to complete its work and no deadlock would have occurred.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhjnubeh}(h]h ]h"]h$]h&]uh1jhj=hhhhhNubj)}(h~The opening comment in :c:macro:`!mm/rmap.c` describes in detail the required ordering of locks within memory management code:h](hThe opening comment in }(hjhhhNhNubj )}(h:c:macro:`!mm/rmap.c`h]h mm/rmap.c}(hjhhhNhNubah}(h]h ](jjc-macroeh"]h$]h&]uh1j hjubhR describes in detail the required ordering of locks within memory management code:}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhj=hhubh literal_block)}(hXpinode->i_rwsem (while writing or truncating, not reading or faulting) mm->mmap_lock mapping->invalidate_lock (in filemap_fault) folio_lock hugetlbfs_i_mmap_rwsem_key (in huge_pmd_share, see hugetlbfs below) vma_start_write mapping->i_mmap_rwsem anon_vma->rwsem mm->page_table_lock or pte_lock swap_lock (in swap_duplicate, swap_info_get) mmlist_lock (in mmput, drain_mmlist and others) mapping->private_lock (in block_dirty_folio) i_pages lock (widely used) lruvec->lru_lock (in folio_lruvec_lock_irq) inode->i_lock (in set_page_dirty's __mark_inode_dirty) bdi.wb->list_lock (in set_page_dirty's __mark_inode_dirty) sb_lock (within inode_lock in fs/fs-writeback.c) i_pages lock (widely used, in set_page_dirty, in arch-dependent flush_dcache_mmap_lock, within bdi.wb->list_lock in __sync_single_inode)h]hXpinode->i_rwsem (while writing or truncating, not reading or faulting) mm->mmap_lock mapping->invalidate_lock (in filemap_fault) folio_lock hugetlbfs_i_mmap_rwsem_key (in huge_pmd_share, see hugetlbfs below) vma_start_write mapping->i_mmap_rwsem anon_vma->rwsem mm->page_table_lock or pte_lock swap_lock (in swap_duplicate, swap_info_get) mmlist_lock (in mmput, drain_mmlist and others) mapping->private_lock (in block_dirty_folio) i_pages lock (widely used) lruvec->lru_lock (in folio_lruvec_lock_irq) inode->i_lock (in set_page_dirty's 
__mark_inode_dirty) bdi.wb->list_lock (in set_page_dirty's __mark_inode_dirty) sb_lock (within inode_lock in fs/fs-writeback.c) i_pages lock (widely used, in set_page_dirty, in arch-dependent flush_dcache_mmap_lock, within bdi.wb->list_lock in __sync_single_inode)}hjsbah}(h]h ]h"]h$]h&]hhƌforcelanguagenonehighlight_args}uh1jhhhMhj=hhubj)}(hjThere is also a file-system specific lock ordering comment located at the top of :c:macro:`!mm/filemap.c`:h](hQThere is also a file-system specific lock ordering comment located at the top of }(hjhhhNhNubj )}(h:c:macro:`!mm/filemap.c`h]h mm/filemap.c}(hjhhhNhNubah}(h]h ](jjc-macroeh"]h$]h&]uh1j hjubh:}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhj=hhubj)}(hX->i_mmap_rwsem (truncate_pagecache) ->private_lock (__free_pte->block_dirty_folio) ->swap_lock (exclusive_swap_page, others) ->i_pages lock ->i_rwsem ->invalidate_lock (acquired by fs in truncate path) ->i_mmap_rwsem (truncate->unmap_mapping_range) ->mmap_lock ->i_mmap_rwsem ->page_table_lock or pte_lock (various, mainly in memory.c) ->i_pages lock (arch-dependent flush_dcache_mmap_lock) ->mmap_lock ->invalidate_lock (filemap_fault) ->lock_page (filemap_fault, access_process_vm) ->i_rwsem (generic_perform_write) ->mmap_lock (fault_in_readable->do_page_fault) bdi->wb.list_lock sb_lock (fs/fs-writeback.c) ->i_pages lock (__sync_single_inode) ->i_mmap_rwsem ->anon_vma.lock (vma_merge) ->anon_vma.lock ->page_table_lock or pte_lock (anon_vma_prepare and various) ->page_table_lock or pte_lock ->swap_lock (try_to_unmap_one) ->private_lock (try_to_unmap_one) ->i_pages lock (try_to_unmap_one) ->lruvec->lru_lock (follow_page_mask->mark_page_accessed) ->lruvec->lru_lock (check_pte_range->folio_isolate_lru) ->private_lock (folio_remove_rmap_pte->set_page_dirty) ->i_pages lock (folio_remove_rmap_pte->set_page_dirty) bdi.wb->list_lock (folio_remove_rmap_pte->set_page_dirty) ->inode->i_lock (folio_remove_rmap_pte->set_page_dirty) bdi.wb->list_lock (zap_pte_range->set_page_dirty) ->inode->i_lock 
(zap_pte_range->set_page_dirty) ->private_lock (zap_pte_range->block_dirty_folio)h]hX->i_mmap_rwsem (truncate_pagecache) ->private_lock (__free_pte->block_dirty_folio) ->swap_lock (exclusive_swap_page, others) ->i_pages lock ->i_rwsem ->invalidate_lock (acquired by fs in truncate path) ->i_mmap_rwsem (truncate->unmap_mapping_range) ->mmap_lock ->i_mmap_rwsem ->page_table_lock or pte_lock (various, mainly in memory.c) ->i_pages lock (arch-dependent flush_dcache_mmap_lock) ->mmap_lock ->invalidate_lock (filemap_fault) ->lock_page (filemap_fault, access_process_vm) ->i_rwsem (generic_perform_write) ->mmap_lock (fault_in_readable->do_page_fault) bdi->wb.list_lock sb_lock (fs/fs-writeback.c) ->i_pages lock (__sync_single_inode) ->i_mmap_rwsem ->anon_vma.lock (vma_merge) ->anon_vma.lock ->page_table_lock or pte_lock (anon_vma_prepare and various) ->page_table_lock or pte_lock ->swap_lock (try_to_unmap_one) ->private_lock (try_to_unmap_one) ->i_pages lock (try_to_unmap_one) ->lruvec->lru_lock (follow_page_mask->mark_page_accessed) ->lruvec->lru_lock (check_pte_range->folio_isolate_lru) ->private_lock (folio_remove_rmap_pte->set_page_dirty) ->i_pages lock (folio_remove_rmap_pte->set_page_dirty) bdi.wb->list_lock (folio_remove_rmap_pte->set_page_dirty) ->inode->i_lock (folio_remove_rmap_pte->set_page_dirty) bdi.wb->list_lock (zap_pte_range->set_page_dirty) ->inode->i_lock (zap_pte_range->set_page_dirty) ->private_lock (zap_pte_range->block_dirty_folio)}hjsbah}(h]h ]h"]h$]h&]hhjjjj}uh1jhhhMhj=hhubj)}(hsPlease check the current state of these comments which may have changed since the time of writing of this document.h]hsPlease check the current state of these comments which may have changed since the time of writing of this document.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhj=hhubeh}(h] lock-orderingah ]h"] lock orderingah$]h&]uh1hhjhhhhhMubeh}(h]lockingah ]h"]lockingah$]h&]uh1hhhhhhhhK!ubh)}(hhh](h)}(hLocking Implementation Detailsh]hLocking Implementation 
Details}(hj)hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj&hhhhhMubjs )}(hnLocking rules for PTE-level page tables are very different from locking rules for page tables at other levels.h]j)}(hnLocking rules for PTE-level page tables are very different from locking rules for page tables at other levels.h]hnLocking rules for PTE-level page tables are very different from locking rules for page tables at other levels.}(hj;hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhj7ubah}(h]h ]h"]h$]h&]uh1jr hj&hhhhhNubh)}(hhh](h)}(hPage table locking detailsh]hPage table locking details}(hjRhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjOhhhhhMubj)}(hThis section explores page table locking requirements for page tables encompassed by a VMA. See the above section on non-VMA page table traversal for details on how we handle that case.h]j)}(hThis section explores page table locking requirements for page tables encompassed by a VMA. See the above section on non-VMA page table traversal for details on how we handle that case.h]hThis section explores page table locking requirements for page tables encompassed by a VMA. See the above section on non-VMA page table traversal for details on how we handle that case.}(hjdhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhj`ubah}(h]h ]h"]h$]h&]uh1jhjOhhhhhNubj)}(hwIn addition to the locks described in the terminology section above, we have additional locks dedicated to page tables:h]hwIn addition to the locks described in the terminology section above, we have additional locks dedicated to page tables:}(hjxhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhjOhhubj)}(hhh](j)}(h**Higher level page table locks** - Higher level page tables, that is PGD, P4D and PUD each make use of the process address space granularity :c:member:`!mm->page_table_lock` lock when modified. 
h]j)}(h**Higher level page table locks** - Higher level page tables, that is PGD, P4D and PUD each make use of the process address space granularity :c:member:`!mm->page_table_lock` lock when modified.h](j)}(h!**Higher level page table locks**h]hHigher level page table locks}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubhm - Higher level page tables, that is PGD, P4D and PUD each make use of the process address space granularity }(hjhhhNhNubj )}(h :c:member:`!mm->page_table_lock`h]hmm->page_table_lock}(hjhhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hjubh lock when modified.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(hX**Fine-grained page table locks** - PMDs and PTEs each have fine-grained locks either kept within the folios describing the page tables or allocated separated and pointed at by the folios if :c:macro:`!ALLOC_SPLIT_PTLOCKS` is set. The PMD spin lock is obtained via :c:func:`!pmd_lock`, however PTEs are mapped into higher memory (if a 32-bit system) and carefully locked via :c:func:`!pte_offset_map_lock`. h]j)}(hX**Fine-grained page table locks** - PMDs and PTEs each have fine-grained locks either kept within the folios describing the page tables or allocated separated and pointed at by the folios if :c:macro:`!ALLOC_SPLIT_PTLOCKS` is set. The PMD spin lock is obtained via :c:func:`!pmd_lock`, however PTEs are mapped into higher memory (if a 32-bit system) and carefully locked via :c:func:`!pte_offset_map_lock`.h](j)}(h!**Fine-grained page table locks**h]hFine-grained page table locks}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh - PMDs and PTEs each have fine-grained locks either kept within the folios describing the page tables or allocated separated and pointed at by the folios if }(hjhhhNhNubj )}(h:c:macro:`!ALLOC_SPLIT_PTLOCKS`h]hALLOC_SPLIT_PTLOCKS}(hjhhhNhNubah}(h]h ](jjc-macroeh"]h$]h&]uh1j hjubh+ is set. 
The PMD spin lock is obtained via }(hjhhhNhNubj )}(h:c:func:`!pmd_lock`h]h pmd_lock()}(hjhhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjubh[, however PTEs are mapped into higher memory (if a 32-bit system) and carefully locked via }(hjhhhNhNubj )}(h:c:func:`!pte_offset_map_lock`h]hpte_offset_map_lock()}(hjhhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjubh.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubeh}(h]h ]h"]h$]h&]jjuh1jhhhMhjOhhubj)}(hvThese locks represent the minimum required to interact with each page table level, but there are further requirements.h]hvThese locks represent the minimum required to interact with each page table level, but there are further requirements.}(hj'hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhM hjOhhubj)}(hImportantly, note that on a **traversal** of page tables, sometimes no such locks are taken. However, at the PTE level, at least concurrent page table deletion must be prevented (using RCU) and the page table must be mapped into high memory, see below.h](hImportantly, note that on a }(hj5hhhNhNubj)}(h **traversal**h]h traversal}(hj=hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj5ubh of page tables, sometimes no such locks are taken. 
However, at the PTE level, at least concurrent page table deletion must be prevented (using RCU) and the page table must be mapped into high memory, see below.}(hj5hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhM hjOhhubj)}(hxWhether care is taken on reading the page table entries depends on the architecture, see the section on atomicity below.h]hxWhether care is taken on reading the page table entries depends on the architecture, see the section on atomicity below.}(hjUhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhjOhhubh)}(hhh](h)}(h Locking rulesh]h Locking rules}(hjfhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjchhhhhMubj)}(hCWe establish basic locking rules when interacting with page tables:h]hCWe establish basic locking rules when interacting with page tables:}(hjthhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhjchhubj)}(hhh](j)}(hWhen changing a page table entry the page table lock for that page table **must** be held, except if you can safely assume nobody can access the page tables concurrently (such as on invocation of :c:func:`!free_pgtables`).h]j)}(hWhen changing a page table entry the page table lock for that page table **must** be held, except if you can safely assume nobody can access the page tables concurrently (such as on invocation of :c:func:`!free_pgtables`).h](hIWhen changing a page table entry the page table lock for that page table }(hjhhhNhNubj)}(h**must**h]hmust}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubhs be held, except if you can safely assume nobody can access the page tables concurrently (such as on invocation of }(hjhhhNhNubj )}(h:c:func:`!free_pgtables`h]hfree_pgtables()}(hjhhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjubh).}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(h{Reads from and writes to page table entries must be *appropriately* atomic. See the section on atomicity below for details.h]j)}(h{Reads from and writes to page table entries must be *appropriately* atomic. 
See the section on atomicity below for details.h](h4Reads from and writes to page table entries must be }(hjhhhNhNubj\)}(h*appropriately*h]h appropriately}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j[hjubh8 atomic. See the section on atomicity below for details.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(hPopulating previously empty entries requires that the mmap or VMA locks are held (read or write), doing so with only rmap locks would be dangerous (see the warning below).h]j)}(hPopulating previously empty entries requires that the mmap or VMA locks are held (read or write), doing so with only rmap locks would be dangerous (see the warning below).h]hPopulating previously empty entries requires that the mmap or VMA locks are held (read or write), doing so with only rmap locks would be dangerous (see the warning below).}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(hAs mentioned previously, zapping can be performed while simply keeping the VMA stable, that is holding any one of the mmap, VMA or rmap locks. h]j)}(hAs mentioned previously, zapping can be performed while simply keeping the VMA stable, that is holding any one of the mmap, VMA or rmap locks.h]hAs mentioned previously, zapping can be performed while simply keeping the VMA stable, that is holding any one of the mmap, VMA or rmap locks.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhM!hjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubeh}(h]h ]h"]h$]h&]jjuh1jhhhMhjchhubjs )}(hXPopulating previously empty entries is dangerous as, when unmapping VMAs, :c:func:`!vms_clear_ptes` has a window of time between zapping (via :c:func:`!unmap_vmas`) and freeing page tables (via :c:func:`!free_pgtables`), where the VMA is still visible in the rmap tree. 
:c:func:`!free_pgtables` assumes that the zap has already been performed and removes PTEs unconditionally (along with all other page tables in the freed range), so installing new PTE entries could leak memory and also cause other unexpected and dangerous behaviour.h]j)}(hXPopulating previously empty entries is dangerous as, when unmapping VMAs, :c:func:`!vms_clear_ptes` has a window of time between zapping (via :c:func:`!unmap_vmas`) and freeing page tables (via :c:func:`!free_pgtables`), where the VMA is still visible in the rmap tree. :c:func:`!free_pgtables` assumes that the zap has already been performed and removes PTEs unconditionally (along with all other page tables in the freed range), so installing new PTE entries could leak memory and also cause other unexpected and dangerous behaviour.h](hJPopulating previously empty entries is dangerous as, when unmapping VMAs, }(hj&hhhNhNubj )}(h:c:func:`!vms_clear_ptes`h]hvms_clear_ptes()}(hj.hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj&ubh+ has a window of time between zapping (via }(hj&hhhNhNubj )}(h:c:func:`!unmap_vmas`h]h unmap_vmas()}(hjAhhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj&ubh) and freeing page tables (via }(hj&hhhNhNubj )}(h:c:func:`!free_pgtables`h]hfree_pgtables()}(hjThhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj&ubh4), where the VMA is still visible in the rmap tree. 
}(hj&hhhNhNubj )}(h:c:func:`!free_pgtables`h]hfree_pgtables()}(hjghhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj&ubh assumes that the zap has already been performed and removes PTEs unconditionally (along with all other page tables in the freed range), so installing new PTE entries could leak memory and also cause other unexpected and dangerous behaviour.}(hj&hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhM$hj"ubah}(h]h ]h"]h$]h&]uh1jr hjchhhhhNubj)}(hsThere are additional rules applicable when moving page tables, which we discuss in the section on this topic below.h]hsThere are additional rules applicable when moving page tables, which we discuss in the section on this topic below.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhM.hjchhubj)}(hzPTE-level page tables are different from page tables at other levels, and there are extra requirements for accessing them:h]hzPTE-level page tables are different from page tables at other levels, and there are extra requirements for accessing them:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhM1hjchhubj)}(hhh](j)}(hyOn 32-bit architectures, they may be in high memory (meaning they need to be mapped into kernel memory to be accessible).h]j)}(hyOn 32-bit architectures, they may be in high memory (meaning they need to be mapped into kernel memory to be accessible).h]hyOn 32-bit architectures, they may be in high memory (meaning they need to be mapped into kernel memory to be accessible).}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhM4hjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(hXWhen empty, they can be unlinked and RCU-freed while holding an mmap lock or rmap lock for reading in combination with the PTE and PMD page table locks. In particular, this happens in :c:func:`!retract_page_tables` when handling :c:macro:`!MADV_COLLAPSE`. 
So accessing PTE-level page tables requires at least holding an RCU read lock; but that only suffices for readers that can tolerate racing with concurrent page table updates such that an empty PTE is observed (in a page table that has actually already been detached and marked for RCU freeing) while another new page table has been installed in the same location and filled with entries. Writers normally need to take the PTE lock and revalidate that the PMD entry still refers to the same PTE-level page table. If the writer does not care whether it is the same PTE-level page table, it can take the PMD lock and revalidate that the contents of pmd entry still meet the requirements. In particular, this also happens in :c:func:`!retract_page_tables` when handling :c:macro:`!MADV_COLLAPSE`. h]j)}(hXWhen empty, they can be unlinked and RCU-freed while holding an mmap lock or rmap lock for reading in combination with the PTE and PMD page table locks. In particular, this happens in :c:func:`!retract_page_tables` when handling :c:macro:`!MADV_COLLAPSE`. So accessing PTE-level page tables requires at least holding an RCU read lock; but that only suffices for readers that can tolerate racing with concurrent page table updates such that an empty PTE is observed (in a page table that has actually already been detached and marked for RCU freeing) while another new page table has been installed in the same location and filled with entries. Writers normally need to take the PTE lock and revalidate that the PMD entry still refers to the same PTE-level page table. If the writer does not care whether it is the same PTE-level page table, it can take the PMD lock and revalidate that the contents of pmd entry still meet the requirements. 
In particular, this also happens in :c:func:`!retract_page_tables` when handling :c:macro:`!MADV_COLLAPSE`.h](hWhen empty, they can be unlinked and RCU-freed while holding an mmap lock or rmap lock for reading in combination with the PTE and PMD page table locks. In particular, this happens in }(hjhhhNhNubj )}(h:c:func:`!retract_page_tables`h]hretract_page_tables()}(hjhhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjubh when handling }(hjhhhNhNubj )}(h:c:macro:`!MADV_COLLAPSE`h]h MADV_COLLAPSE}(hjhhhNhNubah}(h]h ](jjc-macroeh"]h$]h&]uh1j hjubhX. So accessing PTE-level page tables requires at least holding an RCU read lock; but that only suffices for readers that can tolerate racing with concurrent page table updates such that an empty PTE is observed (in a page table that has actually already been detached and marked for RCU freeing) while another new page table has been installed in the same location and filled with entries. Writers normally need to take the PTE lock and revalidate that the PMD entry still refers to the same PTE-level page table. If the writer does not care whether it is the same PTE-level page table, it can take the PMD lock and revalidate that the contents of pmd entry still meet the requirements. In particular, this also happens in }(hjhhhNhNubj )}(h:c:func:`!retract_page_tables`h]hretract_page_tables()}(hjhhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjubh when handling }(hjhhhNhNubj )}(h:c:macro:`!MADV_COLLAPSE`h]h MADV_COLLAPSE}(hj hhhNhNubah}(h]h ](jjc-macroeh"]h$]h&]uh1j hjubh.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhM6hjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubeh}(h]h ]h"]h$]h&]jjuh1jhhhM4hjchhubj)}(hX^To access PTE-level page tables, a helper like :c:func:`!pte_offset_map_lock` or :c:func:`!pte_offset_map` can be used depending on stability requirements. These map the page table into kernel memory if required, take the RCU lock, and depending on variant, may also look up or acquire the PTE lock. 
See the comment on :c:func:`!pte_offset_map_lock`.h](h/To access PTE-level page tables, a helper like }(hj' hhhNhNubj )}(h:c:func:`!pte_offset_map_lock`h]hpte_offset_map_lock()}(hj/ hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj' ubh or }(hj' hhhNhNubj )}(h:c:func:`!pte_offset_map`h]hpte_offset_map()}(hjB hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj' ubh can be used depending on stability requirements. These map the page table into kernel memory if required, take the RCU lock, and depending on variant, may also look up or acquire the PTE lock. See the comment on }(hj' hhhNhNubj )}(h:c:func:`!pte_offset_map_lock`h]hpte_offset_map_lock()}(hjU hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj' ubh.}(hj' hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMFhjchhubeh}(h] locking-rulesah ]h"] locking rulesah$]h&]uh1hhjOhhhhhMubh)}(hhh](h)}(h Atomicityh]h Atomicity}(hjy hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjv hhhhhMMubj)}(hX`Regardless of page table locks, the MMU hardware concurrently updates accessed and dirty bits (perhaps more, depending on architecture). Additionally, page table traversal operations in parallel (though holding the VMA stable) and functionality like GUP-fast locklessly traverses (that is reads) page tables, without even keeping the VMA stable at all.h]hX`Regardless of page table locks, the MMU hardware concurrently updates accessed and dirty bits (perhaps more, depending on architecture). 
Additionally, page table traversal operations in parallel (though holding the VMA stable) and functionality like GUP-fast locklessly traverses (that is reads) page tables, without even keeping the VMA stable at all.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMOhjv hhubj)}(hWhen performing a page table traversal and keeping the VMA stable, whether a read must be performed once and only once or not depends on the architecture (for instance x86-64 does not require any special precautions).h]hWhen performing a page table traversal and keeping the VMA stable, whether a read must be performed once and only once or not depends on the architecture (for instance x86-64 does not require any special precautions).}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMUhjv hhubj)}(hXaIf a write is being performed, or if a read informs whether a write takes place (on an installation of a page table entry say, for instance in :c:func:`!__pud_install`), special care must always be taken. In these cases we can never assume that page table locks give us entirely exclusive access, and must retrieve page table entries once and only once.h](hIf a write is being performed, or if a read informs whether a write takes place (on an installation of a page table entry say, for instance in }(hj hhhNhNubj )}(h:c:func:`!__pud_install`h]h__pud_install()}(hj hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj ubh), special care must always be taken. In these cases we can never assume that page table locks give us entirely exclusive access, and must retrieve page table entries once and only once.}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMYhjv hhubj)}(hXIf we are reading page table entries, then we need only ensure that the compiler does not rearrange our loads. This is achieved via :c:func:`!pXXp_get` functions - :c:func:`!pgdp_get`, :c:func:`!p4dp_get`, :c:func:`!pudp_get`, :c:func:`!pmdp_get`, and :c:func:`!ptep_get`.h](hIf we are reading page table entries, then we need only ensure that the compiler does not rearrange our loads. 
This is achieved via }(hj hhhNhNubj )}(h:c:func:`!pXXp_get`h]h pXXp_get()}(hj hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj ubh functions - }(hj hhhNhNubj )}(h:c:func:`!pgdp_get`h]h pgdp_get()}(hj hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj ubh, }(hj hhhNhNubj )}(h:c:func:`!p4dp_get`h]h p4dp_get()}(hj hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj ubh, }hj sbj )}(h:c:func:`!pudp_get`h]h pudp_get()}(hj!hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj ubh, }(hj hhhNhNubj )}(h:c:func:`!pmdp_get`h]h pmdp_get()}(hj!hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj ubh, and }(hj hhhNhNubj )}(h:c:func:`!ptep_get`h]h ptep_get()}(hj+!hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj ubh.}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhM_hjv hhubj)}(hlEach of these uses :c:func:`!READ_ONCE` to guarantee that the compiler reads the page table entry only once.h](hEach of these uses }(hjD!hhhNhNubj )}(h:c:func:`!READ_ONCE`h]h READ_ONCE()}(hjL!hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjD!ubhE to guarantee that the compiler reads the page table entry only once.}(hjD!hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMdhjv hhubj)}(hHowever, if we wish to manipulate an existing page table entry and care about the previously stored data, we must go further and use an hardware atomic operation as, for example, in :c:func:`!ptep_get_and_clear`.h](hHowever, if we wish to manipulate an existing page table entry and care about the previously stored data, we must go further and use an hardware atomic operation as, for example, in }(hje!hhhNhNubj )}(h:c:func:`!ptep_get_and_clear`h]hptep_get_and_clear()}(hjm!hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hje!ubh.}(hje!hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMghjv hhubj)}(hXYEqually, operations that do not rely on the VMA being held stable, such as GUP-fast (see :c:func:`!gup_fast` and its various page table level handlers like :c:func:`!gup_fast_pte_range`), must very carefully interact with page table entries, using functions such as :c:func:`!ptep_get_lockless` and equivalent for higher level page 
table levels.h](hYEqually, operations that do not rely on the VMA being held stable, such as GUP-fast (see }(hj!hhhNhNubj )}(h:c:func:`!gup_fast`h]h gup_fast()}(hj!hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj!ubh0 and its various page table level handlers like }(hj!hhhNhNubj )}(h:c:func:`!gup_fast_pte_range`h]hgup_fast_pte_range()}(hj!hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj!ubhQ), must very carefully interact with page table entries, using functions such as }(hj!hhhNhNubj )}(h:c:func:`!ptep_get_lockless`h]hptep_get_lockless()}(hj!hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj!ubh3 and equivalent for higher level page table levels.}(hj!hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMkhjv hhubj)}(hWrites to page table entries must also be appropriately atomic, as established by :c:func:`!set_pXX` functions - :c:func:`!set_pgd`, :c:func:`!set_p4d`, :c:func:`!set_pud`, :c:func:`!set_pmd`, and :c:func:`!set_pte`.h](hRWrites to page table entries must also be appropriately atomic, as established by }(hj!hhhNhNubj )}(h:c:func:`!set_pXX`h]h set_pXX()}(hj!hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj!ubh functions - }(hj!hhhNhNubj )}(h:c:func:`!set_pgd`h]h set_pgd()}(hj!hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj!ubh, }(hj!hhhNhNubj )}(h:c:func:`!set_p4d`h]h set_p4d()}(hj!hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj!ubh, }(hj!hhhNhNubj )}(h:c:func:`!set_pud`h]h set_pud()}(hj"hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj!ubh, }hj!sbj )}(h:c:func:`!set_pmd`h]h set_pmd()}(hj!"hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj!ubh, and }(hj!hhhNhNubj )}(h:c:func:`!set_pte`h]h set_pte()}(hj4"hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj!ubh.}(hj!hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMqhjv hhubj)}(hEqually functions which clear page table entries must be appropriately atomic, as in :c:func:`!pXX_clear` functions - :c:func:`!pgd_clear`, :c:func:`!p4d_clear`, :c:func:`!pud_clear`, :c:func:`!pmd_clear`, and :c:func:`!pte_clear`.h](hUEqually functions which clear page table entries must be appropriately atomic, as 
in }(hjM"hhhNhNubj )}(h:c:func:`!pXX_clear`h]h pXX_clear()}(hjU"hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjM"ubh functions - }(hjM"hhhNhNubj )}(h:c:func:`!pgd_clear`h]h pgd_clear()}(hjh"hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjM"ubh, }(hjM"hhhNhNubj )}(h:c:func:`!p4d_clear`h]h p4d_clear()}(hj{"hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjM"ubh, }(hjM"hhhNhNubj )}(h:c:func:`!pud_clear`h]h pud_clear()}(hj"hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjM"ubh, }hjM"sbj )}(h:c:func:`!pmd_clear`h]h pmd_clear()}(hj"hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjM"ubh, and }(hjM"hhhNhNubj )}(h:c:func:`!pte_clear`h]h pte_clear()}(hj"hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjM"ubh.}(hjM"hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMuhjv hhubeh}(h] atomicityah ]h"] atomicityah$]h&]uh1hhjOhhhhhMMubh)}(hhh](h)}(hPage table installationh]hPage table installation}(hj"hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj"hhhhhM{ubj)}(hPage table installation is performed with the VMA held stable explicitly by an mmap or VMA lock in read or write mode (see the warning in the locking rules section for details as to why).h]hPage table installation is performed with the VMA held stable explicitly by an mmap or VMA lock in read or write mode (see the warning in the locking rules section for details as to why).}(hj"hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhM}hj"hhubj)}(hWhen allocating a P4D, PUD or PMD and setting the relevant entry in the above PGD, P4D or PUD, the :c:member:`!mm->page_table_lock` must be held. This is acquired in :c:func:`!__p4d_alloc`, :c:func:`!__pud_alloc` and :c:func:`!__pmd_alloc` respectively.h](hcWhen allocating a P4D, PUD or PMD and setting the relevant entry in the above PGD, P4D or PUD, the }(hj"hhhNhNubj )}(h :c:member:`!mm->page_table_lock`h]hmm->page_table_lock}(hj"hhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hj"ubh# must be held. 
This is acquired in }(hj"hhhNhNubj )}(h:c:func:`!__p4d_alloc`h]h __p4d_alloc()}(hj#hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj"ubh, }(hj"hhhNhNubj )}(h:c:func:`!__pud_alloc`h]h __pud_alloc()}(hj"#hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj"ubh and }(hj"hhhNhNubj )}(h:c:func:`!__pmd_alloc`h]h __pmd_alloc()}(hj5#hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj"ubh respectively.}(hj"hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhj"hhubj)}(h:c:func:`!__pmd_alloc` actually invokes :c:func:`!pud_lock` and :c:func:`!pud_lockptr` in turn, however at the time of writing it ultimately references the :c:member:`!mm->page_table_lock`.h]j)}(h:c:func:`!__pmd_alloc` actually invokes :c:func:`!pud_lock` and :c:func:`!pud_lockptr` in turn, however at the time of writing it ultimately references the :c:member:`!mm->page_table_lock`.h](j )}(h:c:func:`!__pmd_alloc`h]h __pmd_alloc()}(hjV#hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjR#ubh actually invokes }(hjR#hhhNhNubj )}(h:c:func:`!pud_lock`h]h pud_lock()}(hji#hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjR#ubh and }(hjR#hhhNhNubj )}(h:c:func:`!pud_lockptr`h]h pud_lockptr()}(hj|#hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjR#ubhF in turn, however at the time of writing it ultimately references the }(hjR#hhhNhNubj )}(h :c:member:`!mm->page_table_lock`h]hmm->page_table_lock}(hj#hhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hjR#ubh.}(hjR#hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhjN#ubah}(h]h ]h"]h$]h&]uh1jhj"hhhhhNubj)}(hXBAllocating a PTE will either use the :c:member:`!mm->page_table_lock` or, if :c:macro:`!USE_SPLIT_PMD_PTLOCKS` is defined, a lock embedded in the PMD physical page metadata in the form of a :c:struct:`!struct ptdesc`, acquired by :c:func:`!pmd_ptdesc` called from :c:func:`!pmd_lock` and ultimately :c:func:`!__pte_alloc`.h](h%Allocating a PTE will either use the }(hj#hhhNhNubj )}(h :c:member:`!mm->page_table_lock`h]hmm->page_table_lock}(hj#hhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hj#ubh or, if }(hj#hhhNhNubj 
)}(h!:c:macro:`!USE_SPLIT_PMD_PTLOCKS`h]hUSE_SPLIT_PMD_PTLOCKS}(hj#hhhNhNubah}(h]h ](jjc-macroeh"]h$]h&]uh1j hj#ubhP is defined, a lock embedded in the PMD physical page metadata in the form of a }(hj#hhhNhNubj )}(h:c:struct:`!struct ptdesc`h]h struct ptdesc}(hj#hhhNhNubah}(h]h ](jjc-structeh"]h$]h&]uh1j hj#ubh, acquired by }(hj#hhhNhNubj )}(h:c:func:`!pmd_ptdesc`h]h pmd_ptdesc()}(hj#hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj#ubh called from }(hj#hhhNhNubj )}(h:c:func:`!pmd_lock`h]h pmd_lock()}(hj$hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj#ubh and ultimately }(hj#hhhNhNubj )}(h:c:func:`!__pte_alloc`h]h __pte_alloc()}(hj$hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj#ubh.}(hj#hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhj"hhubj)}(hFinally, modifying the contents of the PTE requires special treatment, as the PTE page table lock must be acquired whenever we want stable and exclusive access to entries contained within a PTE, especially when we wish to modify them.h]hFinally, modifying the contents of the PTE requires special treatment, as the PTE page table lock must be acquired whenever we want stable and exclusive access to entries contained within a PTE, especially when we wish to modify them.}(hj.$hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhj"hhubj)}(hXfThis is performed via :c:func:`!pte_offset_map_lock` which carefully checks to ensure that the PTE hasn't changed from under us, ultimately invoking :c:func:`!pte_lockptr` to obtain a spin lock at PTE granularity contained within the :c:struct:`!struct ptdesc` associated with the physical PTE page. The lock must be released via :c:func:`!pte_unmap_unlock`.h](hThis is performed via }(hj<$hhhNhNubj )}(h:c:func:`!pte_offset_map_lock`h]hpte_offset_map_lock()}(hjD$hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj<$ubhc which carefully checks to ensure that the PTE hasn’t changed from under us, ultimately invoking }(hj<$hhhNhNubj )}(h:c:func:`!pte_lockptr`h]h pte_lockptr()}(hjW$hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj<$ubh? 
to obtain a spin lock at PTE granularity contained within the }(hj<$hhhNhNubj )}(h:c:struct:`!struct ptdesc`h]h struct ptdesc}(hjj$hhhNhNubah}(h]h ](jjc-structeh"]h$]h&]uh1j hj<$ubhF associated with the physical PTE page. The lock must be released via }(hj<$hhhNhNubj )}(h:c:func:`!pte_unmap_unlock`h]hpte_unmap_unlock()}(hj}$hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj<$ubh.}(hj<$hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhj"hhubj)}(hThere are some variants on this, such as :c:func:`!pte_offset_map_rw_nolock` when we know we hold the PTE stable but for brevity we do not explore this. See the comment for :c:func:`!pte_offset_map_lock` for more details.h]j)}(hThere are some variants on this, such as :c:func:`!pte_offset_map_rw_nolock` when we know we hold the PTE stable but for brevity we do not explore this. See the comment for :c:func:`!pte_offset_map_lock` for more details.h](h)There are some variants on this, such as }(hj$hhhNhNubj )}(h#:c:func:`!pte_offset_map_rw_nolock`h]hpte_offset_map_rw_nolock()}(hj$hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj$ubhb when we know we hold the PTE stable but for brevity we do not explore this. 
See the comment for }(hj$hhhNhNubj )}(h:c:func:`!pte_offset_map_lock`h]hpte_offset_map_lock()}(hj$hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj$ubh for more details.}(hj$hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhj$ubah}(h]h ]h"]h$]h&]uh1jhj"hhhhhNubj)}(hWhen modifying data in ranges we typically only wish to allocate higher page tables as necessary, using these locks to avoid races or overwriting anything, and set/clear data at the PTE level as required (for instance when page faulting or zapping).h]hWhen modifying data in ranges we typically only wish to allocate higher page tables as necessary, using these locks to avoid races or overwriting anything, and set/clear data at the PTE level as required (for instance when page faulting or zapping).}(hj$hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhj"hhubj)}(hXA typical pattern taken when traversing page table entries to install a new mapping is to optimistically determine whether the page table entry in the table above is empty, if so, only then acquiring the page table lock and checking again to see if it was allocated underneath us.h]hXA typical pattern taken when traversing page table entries to install a new mapping is to optimistically determine whether the page table entry in the table above is empty, if so, only then acquiring the page table lock and checking again to see if it was allocated underneath us.}(hj$hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhj"hhubj)}(hThis allows for a traversal with page table locks only being taken when required. An example of this is :c:func:`!__pud_alloc`.h](hhThis allows for a traversal with page table locks only being taken when required. 
An example of this is }(hj$hhhNhNubj )}(h:c:func:`!__pud_alloc`h]h __pud_alloc()}(hj$hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj$ubh.}(hj$hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhj"hhubj)}(hAt the leaf page table, that is the PTE, we can't entirely rely on this pattern as we have separate PMD and PTE locks and a THP collapse for instance might have eliminated the PMD entry as well as the PTE from under us.h]hAt the leaf page table, that is the PTE, we can’t entirely rely on this pattern as we have separate PMD and PTE locks and a THP collapse for instance might have eliminated the PMD entry as well as the PTE from under us.}(hj%hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhj"hhubj)}(hThis is why :c:func:`!pte_offset_map_lock` locklessly retrieves the PMD entry for the PTE, carefully checking it is as expected, before acquiring the PTE-specific lock, and then *again* checking that the PMD entry is as expected.h](h This is why }(hj%hhhNhNubj )}(h:c:func:`!pte_offset_map_lock`h]hpte_offset_map_lock()}(hj'%hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj%ubh locklessly retrieves the PMD entry for the PTE, carefully checking it is as expected, before acquiring the PTE-specific lock, and then }(hj%hhhNhNubj\)}(h*again*h]hagain}(hj:%hhhNhNubah}(h]h ]h"]h$]h&]uh1j[hj%ubh, checking that the PMD entry is as expected.}(hj%hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhj"hhubj)}(hIf a THP collapse (or similar) were to occur then the lock on both pages would be acquired, so we can ensure this is prevented while the PTE lock is held.h]hIf a THP collapse (or similar) were to occur then the lock on both pages would be acquired, so we can ensure this is prevented while the PTE lock is held.}(hjR%hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhj"hhubj)}(h>Installing entries this way ensures mutual exclusion on write.h]h>Installing entries this way ensures mutual exclusion on write.}(hj`%hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhj"hhubeh}(h]page-table-installationah ]h"]page table installationah$]h&]uh1hhjOhhhhhM{ubh)}(hhh](h)}(hPage 
table freeingh]hPage table freeing}(hjy%hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjv%hhhhhMubj)}(hTearing down page tables themselves is something that requires significant care. There must be no way that page tables designated for removal can be traversed or referenced by concurrent tasks.h]hTearing down page tables themselves is something that requires significant care. There must be no way that page tables designated for removal can be traversed or referenced by concurrent tasks.}(hj%hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhjv%hhubj)}(hIt is insufficient to simply hold an mmap write lock and VMA lock (which will prevent racing faults, and rmap operations), as a file-backed mapping can be truncated under the :c:struct:`!struct address_space->i_mmap_rwsem` alone.h](hIt is insufficient to simply hold an mmap write lock and VMA lock (which will prevent racing faults, and rmap operations), as a file-backed mapping can be truncated under the }(hj%hhhNhNubj )}(h/:c:struct:`!struct address_space->i_mmap_rwsem`h]h"struct address_space->i_mmap_rwsem}(hj%hhhNhNubah}(h]h ](jjc-structeh"]h$]h&]uh1j hj%ubh alone.}(hj%hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhjv%hhubj)}(hAs a result, no VMA which can be accessed via the reverse mapping (either through the :c:struct:`!struct anon_vma->rb_root` or the :c:member:`!struct address_space->i_mmap` interval trees) can have its page tables torn down.h](hVAs a result, no VMA which can be accessed via the reverse mapping (either through the }(hj%hhhNhNubj )}(h%:c:struct:`!struct anon_vma->rb_root`h]hstruct anon_vma->rb_root}(hj%hhhNhNubah}(h]h ](jjc-structeh"]h$]h&]uh1j hj%ubh or the }(hj%hhhNhNubj )}(h):c:member:`!struct address_space->i_mmap`h]hstruct address_space->i_mmap}(hj%hhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hj%ubh4 interval trees) can have its page tables torn down.}(hj%hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhjv%hhubj)}(hThe operation is typically performed via :c:func:`!free_pgtables`, which assumes either the mmap write lock has been taken (as 
specified by its :c:member:`!mm_wr_locked` parameter), or that the VMA is already unreachable.h](h)The operation is typically performed via }(hj%hhhNhNubj )}(h:c:func:`!free_pgtables`h]hfree_pgtables()}(hj%hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj%ubhO, which assumes either the mmap write lock has been taken (as specified by its }(hj%hhhNhNubj )}(h:c:member:`!mm_wr_locked`h]h mm_wr_locked}(hj&hhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hj%ubh4 parameter), or that the VMA is already unreachable.}(hj%hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhjv%hhubj)}(hIt carefully removes the VMA from all reverse mappings, however it's important that no new ones overlap these or any route remain to permit access to addresses within the range whose page tables are being torn down.h]hIt carefully removes the VMA from all reverse mappings, however it’s important that no new ones overlap these or any route remain to permit access to addresses within the range whose page tables are being torn down.}(hj&hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhjv%hhubj)}(hAdditionally, it assumes that a zap has already been performed and steps have been taken to ensure that no further page table entries can be installed between the zap and the invocation of :c:func:`!free_pgtables`.h](hAdditionally, it assumes that a zap has already been performed and steps have been taken to ensure that no further page table entries can be installed between the zap and the invocation of }(hj,&hhhNhNubj )}(h:c:func:`!free_pgtables`h]hfree_pgtables()}(hj4&hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj,&ubh.}(hj,&hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhjv%hhubj)}(hSince it is assumed that all such steps have been taken, page table entries are cleared without page table locks (in the :c:func:`!pgd_clear`, :c:func:`!p4d_clear`, :c:func:`!pud_clear`, and :c:func:`!pmd_clear` functions).h](hySince it is assumed that all such steps have been taken, page table entries are cleared without page table locks (in the }(hjM&hhhNhNubj 
)}(h:c:func:`!pgd_clear`h]h pgd_clear()}(hjU&hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjM&ubh, }(hjM&hhhNhNubj )}(h:c:func:`!p4d_clear`h]h p4d_clear()}(hjh&hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjM&ubh, }(hjM&hhhNhNubj )}(h:c:func:`!pud_clear`h]h pud_clear()}(hj{&hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjM&ubh, and }(hjM&hhhNhNubj )}(h:c:func:`!pmd_clear`h]h pmd_clear()}(hj&hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjM&ubh functions).}(hjM&hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhjv%hhubj)}(hIt is possible for leaf page tables to be torn down independent of the page tables above it as is done by :c:func:`!retract_page_tables`, which is performed under the i_mmap read lock, PMD, and PTE page table locks, without this level of care.h]j)}(hIt is possible for leaf page tables to be torn down independent of the page tables above it as is done by :c:func:`!retract_page_tables`, which is performed under the i_mmap read lock, PMD, and PTE page table locks, without this level of care.h](hjIt is possible for leaf page tables to be torn down independent of the page tables above it as is done by }(hj&hhhNhNubj )}(h:c:func:`!retract_page_tables`h]hretract_page_tables()}(hj&hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj&ubhk, which is performed under the i_mmap read lock, PMD, and PTE page table locks, without this level of care.}(hj&hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhj&ubah}(h]h ]h"]h$]h&]uh1jhjv%hhhhhNubeh}(h]page-table-freeingah ]h"]page table freeingah$]h&]uh1hhjOhhhhhMubh)}(hhh](h)}(hPage table movingh]hPage table moving}(hj&hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj&hhhhhMubj)}(hSome functions manipulate page table levels above PMD (that is PUD, P4D and PGD page tables). Most notable of these is :c:func:`!mremap`, which is capable of moving higher level page tables.h](hwSome functions manipulate page table levels above PMD (that is PUD, P4D and PGD page tables). 
Most notable of these is }(hj&hhhNhNubj )}(h:c:func:`!mremap`h]hmremap()}(hj&hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj&ubh6, which is capable of moving higher level page tables.}(hj&hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhj&hhubj)}(hIn these instances, it is required that **all** locks are taken, that is the mmap lock, the VMA lock and the relevant rmap locks.h](h(In these instances, it is required that }(hj 'hhhNhNubj)}(h**all**h]hall}(hj'hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj 'ubhR locks are taken, that is the mmap lock, the VMA lock and the relevant rmap locks.}(hj 'hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhj&hhubj)}(hYou can observe this in the :c:func:`!mremap` implementation in the functions :c:func:`!take_rmap_locks` and :c:func:`!drop_rmap_locks` which perform the rmap side of lock acquisition, invoked ultimately by :c:func:`!move_page_tables`.h](hYou can observe this in the }(hj,'hhhNhNubj )}(h:c:func:`!mremap`h]hmremap()}(hj4'hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj,'ubh! implementation in the functions }(hj,'hhhNhNubj )}(h:c:func:`!take_rmap_locks`h]htake_rmap_locks()}(hjG'hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj,'ubh and }(hj,'hhhNhNubj )}(h:c:func:`!drop_rmap_locks`h]hdrop_rmap_locks()}(hjZ'hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj,'ubhH which perform the rmap side of lock acquisition, invoked ultimately by }(hj,'hhhNhNubj )}(h:c:func:`!move_page_tables`h]hmove_page_tables()}(hjm'hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj,'ubh.}(hj,'hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhj&hhubeh}(h]page-table-movingah ]h"]page table movingah$]h&]uh1hhjOhhhhhMubeh}(h]page-table-locking-detailsah ]h"]page table locking detailsah$]h&]uh1hhj&hhhhhMubh)}(hhh](h)}(hVMA lock internalsh]hVMA lock internals}(hj'hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj'hhhhhMubh)}(hhh](h)}(hOverviewh]hOverview}(hj'hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj'hhhhhMubj)}(hVMA read locking is entirely optimistic - if the lock is contended or a competing write has started, then we do not obtain a read lock.h]hVMA read locking 
is entirely optimistic - if the lock is contended or a competing write has started, then we do not obtain a read lock.}(hj'hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhj'hhubj)}(hX&A VMA **read** lock is obtained by :c:func:`!lock_vma_under_rcu`, which first calls :c:func:`!rcu_read_lock` to ensure that the VMA is looked up in an RCU critical section, then attempts to VMA lock it via :c:func:`!vma_start_read`, before releasing the RCU lock via :c:func:`!rcu_read_unlock`.h](hA VMA }(hj'hhhNhNubj)}(h**read**h]hread}(hj'hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj'ubh lock is obtained by }(hj'hhhNhNubj )}(h:c:func:`!lock_vma_under_rcu`h]hlock_vma_under_rcu()}(hj'hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj'ubh, which first calls }(hj'hhhNhNubj )}(h:c:func:`!rcu_read_lock`h]hrcu_read_lock()}(hj'hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj'ubhb to ensure that the VMA is looked up in an RCU critical section, then attempts to VMA lock it via }(hj'hhhNhNubj )}(h:c:func:`!vma_start_read`h]hvma_start_read()}(hj(hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj'ubh$, before releasing the RCU lock via }(hj'hhhNhNubj )}(h:c:func:`!rcu_read_unlock`h]hrcu_read_unlock()}(hj(hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj'ubh.}(hj'hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhj'hhubj)}(hXIn cases when the user already holds mmap read lock, :c:func:`!vma_start_read_locked` and :c:func:`!vma_start_read_locked_nested` can be used. These functions do not fail due to lock contention but the caller should still check their return values in case they fail for other reasons.h](h5In cases when the user already holds mmap read lock, }(hj2(hhhNhNubj )}(h :c:func:`!vma_start_read_locked`h]hvma_start_read_locked()}(hj:(hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj2(ubh and }(hj2(hhhNhNubj )}(h':c:func:`!vma_start_read_locked_nested`h]hvma_start_read_locked_nested()}(hjM(hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj2(ubh can be used. 
These functions do not fail due to lock contention but the caller should still check their return values in case they fail for other reasons.}(hj2(hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhj'hhubj)}(hVMA read locks increment :c:member:`!vma.vm_refcnt` reference counter for their duration and the caller of :c:func:`!lock_vma_under_rcu` must drop it via :c:func:`!vma_end_read`.h](hVMA read locks increment }(hjf(hhhNhNubj )}(h:c:member:`!vma.vm_refcnt`h]h vma.vm_refcnt}(hjn(hhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hjf(ubh8 reference counter for their duration and the caller of }(hjf(hhhNhNubj )}(h:c:func:`!lock_vma_under_rcu`h]hlock_vma_under_rcu()}(hj(hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjf(ubh must drop it via }(hjf(hhhNhNubj )}(h:c:func:`!vma_end_read`h]hvma_end_read()}(hj(hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjf(ubh.}(hjf(hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhj'hhubj)}(hX{VMA **write** locks are acquired via :c:func:`!vma_start_write` in instances where a VMA is about to be modified, unlike :c:func:`!vma_start_read` the lock is always acquired. An mmap write lock **must** be held for the duration of the VMA write lock, releasing or downgrading the mmap write lock also releases the VMA write lock so there is no :c:func:`!vma_end_write` function.h](hVMA }(hj(hhhNhNubj)}(h **write**h]hwrite}(hj(hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj(ubh locks are acquired via }(hj(hhhNhNubj )}(h:c:func:`!vma_start_write`h]hvma_start_write()}(hj(hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj(ubh: in instances where a VMA is about to be modified, unlike }(hj(hhhNhNubj )}(h:c:func:`!vma_start_read`h]hvma_start_read()}(hj(hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj(ubh1 the lock is always acquired. 
An mmap write lock }(hj(hhhNhNubj)}(h**must**h]hmust}(hj(hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj(ubh be held for the duration of the VMA write lock, releasing or downgrading the mmap write lock also releases the VMA write lock so there is no }(hj(hhhNhNubj )}(h:c:func:`!vma_end_write`h]hvma_end_write()}(hj(hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj(ubh function.}(hj(hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhj'hhubj)}(hNote that when write-locking a VMA lock, the :c:member:`!vma.vm_refcnt` is temporarily modified so that readers can detect the presence of a writer. The reference counter is restored once the vma sequence number used for serialisation is updated.h](h-Note that when write-locking a VMA lock, the }(hj)hhhNhNubj )}(h:c:member:`!vma.vm_refcnt`h]h vma.vm_refcnt}(hj )hhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hj)ubh is temporarily modified so that readers can detect the presence of a writer. The reference counter is restored once the vma sequence number used for serialisation is updated.}(hj)hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhM hj'hhubj)}(hbThis ensures the semantics we require - VMA write locks provide exclusive write access to the VMA.h]hbThis ensures the semantics we require - VMA write locks provide exclusive write access to the VMA.}(hj9)hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhM hj'hhubeh}(h]overviewah ]h"]overviewah$]h&]uh1hhj'hhhhhMubh)}(hhh](h)}(hImplementation detailsh]hImplementation details}(hjR)hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjO)hhhhhMubj)}(hX The VMA lock mechanism is designed to be a lightweight means of avoiding the use of the heavily contended mmap lock. It is implemented using a combination of a reference counter and sequence numbers belonging to the containing :c:struct:`!struct mm_struct` and the VMA.h](hThe VMA lock mechanism is designed to be a lightweight means of avoiding the use of the heavily contended mmap lock. 
It is implemented using a combination of a reference counter and sequence numbers belonging to the containing }(hj`)hhhNhNubj )}(h:c:struct:`!struct mm_struct`h]hstruct mm_struct}(hjh)hhhNhNubah}(h]h ](jjc-structeh"]h$]h&]uh1j hj`)ubh and the VMA.}(hj`)hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhjO)hhubj)}(hX Read locks are acquired via :c:func:`!vma_start_read`, which is an optimistic operation, i.e. it tries to acquire a read lock but returns false if it is unable to do so. At the end of the read operation, :c:func:`!vma_end_read` is called to release the VMA read lock.h](hRead locks are acquired via }(hj)hhhNhNubj )}(h:c:func:`!vma_start_read`h]hvma_start_read()}(hj)hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj)ubh, which is an optimistic operation, i.e. it tries to acquire a read lock but returns false if it is unable to do so. At the end of the read operation, }(hj)hhhNhNubj )}(h:c:func:`!vma_end_read`h]hvma_end_read()}(hj)hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj)ubh( is called to release the VMA read lock.}(hj)hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhjO)hhubj)}(hXaInvoking :c:func:`!vma_start_read` requires that :c:func:`!rcu_read_lock` has been called first, establishing that we are in an RCU critical section upon VMA read lock acquisition. Once acquired, the RCU lock can be released as it is only required for lookup. This is abstracted by :c:func:`!lock_vma_under_rcu` which is the interface a user should use.h](h Invoking }(hj)hhhNhNubj )}(h:c:func:`!vma_start_read`h]hvma_start_read()}(hj)hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj)ubh requires that }(hj)hhhNhNubj )}(h:c:func:`!rcu_read_lock`h]hrcu_read_lock()}(hj)hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj)ubh has been called first, establishing that we are in an RCU critical section upon VMA read lock acquisition. Once acquired, the RCU lock can be released as it is only required for lookup. 
This is abstracted by }(hj)hhhNhNubj )}(h:c:func:`!lock_vma_under_rcu`h]hlock_vma_under_rcu()}(hj)hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj)ubh* which is the interface a user should use.}(hj)hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhjO)hhubj)}(hWriting requires the mmap to be write-locked and the VMA lock to be acquired via :c:func:`!vma_start_write`, however the write lock is released by the termination or downgrade of the mmap write lock so no :c:func:`!vma_end_write` is required.h](hQWriting requires the mmap to be write-locked and the VMA lock to be acquired via }(hj)hhhNhNubj )}(h:c:func:`!vma_start_write`h]hvma_start_write()}(hj*hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj)ubhb, however the write lock is released by the termination or downgrade of the mmap write lock so no }(hj)hhhNhNubj )}(h:c:func:`!vma_end_write`h]hvma_end_write()}(hj*hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj)ubh is required.}(hj)hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhM#hjO)hhubj)}(hAll this is achieved by the use of per-mm and per-VMA sequence counts, which are used in order to reduce complexity, especially for operations which write-lock multiple VMAs at once.h]hAll this is achieved by the use of per-mm and per-VMA sequence counts, which are used in order to reduce complexity, especially for operations which write-lock multiple VMAs at once.}(hj0*hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhM'hjO)hhubj)}(hIf the mm sequence count, :c:member:`!mm->mm_lock_seq` is equal to the VMA sequence count :c:member:`!vma->vm_lock_seq` then the VMA is write-locked. If they differ, then it is not.h](hIf the mm sequence count, }(hj>*hhhNhNubj )}(h:c:member:`!mm->mm_lock_seq`h]hmm->mm_lock_seq}(hjF*hhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hj>*ubh$ is equal to the VMA sequence count }(hj>*hhhNhNubj )}(h:c:member:`!vma->vm_lock_seq`h]hvma->vm_lock_seq}(hjY*hhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hj>*ubh> then the VMA is write-locked. 
If they differ, then it is not.}(hj>*hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhM+hjO)hhubj)}(hEach time the mmap write lock is released in :c:func:`!mmap_write_unlock` or :c:func:`!mmap_write_downgrade`, :c:func:`!vma_end_write_all` is invoked which also increments :c:member:`!mm->mm_lock_seq` via :c:func:`!mm_lock_seqcount_end`.h](h-Each time the mmap write lock is released in }(hjr*hhhNhNubj )}(h:c:func:`!mmap_write_unlock`h]hmmap_write_unlock()}(hjz*hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjr*ubh or }(hjr*hhhNhNubj )}(h:c:func:`!mmap_write_downgrade`h]hmmap_write_downgrade()}(hj*hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjr*ubh, }(hjr*hhhNhNubj )}(h:c:func:`!vma_end_write_all`h]hvma_end_write_all()}(hj*hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjr*ubh" is invoked which also increments }(hjr*hhhNhNubj )}(h:c:member:`!mm->mm_lock_seq`h]hmm->mm_lock_seq}(hj*hhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hjr*ubh via }(hjr*hhhNhNubj )}(h:c:func:`!mm_lock_seqcount_end`h]hmm_lock_seqcount_end()}(hj*hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjr*ubh.}(hjr*hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhM/hjO)hhubj)}(hThis way, we ensure that, regardless of the VMA's sequence number, a write lock is never incorrectly indicated and that when we release an mmap write lock we efficiently release **all** VMA write locks contained within the mmap at the same time.h](hThis way, we ensure that, regardless of the VMA’s sequence number, a write lock is never incorrectly indicated and that when we release an mmap write lock we efficiently release }(hj*hhhNhNubj)}(h**all**h]hall}(hj*hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj*ubh< VMA write locks contained within the mmap at the same time.}(hj*hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhM4hjO)hhubj)}(hXSince the mmap write lock is exclusive against others who hold it, the automatic release of any VMA locks on its release makes sense, as you would never want to keep VMAs locked across entirely separate write operations. 
It also maintains correct lock ordering.h]hXSince the mmap write lock is exclusive against others who hold it, the automatic release of any VMA locks on its release makes sense, as you would never want to keep VMAs locked across entirely separate write operations. It also maintains correct lock ordering.}(hj*hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhM9hjO)hhubj)}(hEach time a VMA read lock is acquired, we increment :c:member:`!vma.vm_refcnt` reference counter and check that the sequence count of the VMA does not match that of the mm.h](h4Each time a VMA read lock is acquired, we increment }(hj +hhhNhNubj )}(h:c:member:`!vma.vm_refcnt`h]h vma.vm_refcnt}(hj+hhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hj +ubh^ reference counter and check that the sequence count of the VMA does not match that of the mm.}(hj +hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhM>hjO)hhubj)}(hIf it does, the read lock fails and :c:member:`!vma.vm_refcnt` is dropped. If it does not, we keep the reference counter raised, excluding writers, but permitting other readers, who can also obtain this lock under RCU.h](h$If it does, the read lock fails and }(hj.+hhhNhNubj )}(h:c:member:`!vma.vm_refcnt`h]h vma.vm_refcnt}(hj6+hhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hj.+ubh is dropped. 
If it does not, we keep the reference counter raised, excluding writers, but permitting other readers, who can also obtain this lock under RCU.}(hj.+hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMBhjO)hhubj)}(hImportantly, maple tree operations performed in :c:func:`!lock_vma_under_rcu` are also RCU safe, so the whole read lock operation is guaranteed to function correctly.h](h0Importantly, maple tree operations performed in }(hjO+hhhNhNubj )}(h:c:func:`!lock_vma_under_rcu`h]hlock_vma_under_rcu()}(hjW+hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hjO+ubhY are also RCU safe, so the whole read lock operation is guaranteed to function correctly.}(hjO+hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMFhjO)hhubj)}(hX#On the write side, we set a bit in :c:member:`!vma.vm_refcnt` which can't be modified by readers and wait for all readers to drop their reference count. Once there are no readers, the VMA's sequence number is set to match that of the mm. During this entire operation mmap write lock is held.h](h#On the write side, we set a bit in }(hjp+hhhNhNubj )}(h:c:member:`!vma.vm_refcnt`h]h vma.vm_refcnt}(hjx+hhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hjp+ubh which can’t be modified by readers and wait for all readers to drop their reference count. Once there are no readers, the VMA’s sequence number is set to match that of the mm. During this entire operation mmap write lock is held.}(hjp+hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMJhjO)hhubj)}(hThis way, if any read locks are in effect, :c:func:`!vma_start_write` will sleep until these are finished and mutual exclusion is achieved.h](h+This way, if any read locks are in effect, }(hj+hhhNhNubj )}(h:c:func:`!vma_start_write`h]hvma_start_write()}(hj+hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj+ubhF will sleep until these are finished and mutual exclusion is achieved.}(hj+hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMOhjO)hhubj)}(hAfter setting the VMA's sequence number, the bit in :c:member:`!vma.vm_refcnt` indicating a writer is cleared. 
From this point on, VMA's sequence number will indicate VMA's write-locked state until mmap write lock is dropped or downgraded.h](h6After setting the VMA’s sequence number, the bit in }(hj+hhhNhNubj )}(h:c:member:`!vma.vm_refcnt`h]h vma.vm_refcnt}(hj+hhhNhNubah}(h]h ](jjc-membereh"]h$]h&]uh1j hj+ubh indicating a writer is cleared. From this point on, VMA’s sequence number will indicate VMA’s write-locked state until mmap write lock is dropped or downgraded.}(hj+hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMRhjO)hhubj)}(hThis clever combination of a reference counter and sequence count allows for fast RCU-based per-VMA lock acquisition (especially on page fault, though utilised elsewhere) with minimal complexity around lock ordering.h]hThis clever combination of a reference counter and sequence count allows for fast RCU-based per-VMA lock acquisition (especially on page fault, though utilised elsewhere) with minimal complexity around lock ordering.}(hj+hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMVhjO)hhubeh}(h]implementation-detailsah ]h"]implementation detailsah$]h&]uh1hhj'hhhhhMubeh}(h]vma-lock-internalsah ]h"]vma lock internalsah$]h&]uh1hhj&hhhhhMubh)}(hhh](h)}(hmmap write lock downgradingh]hmmap write lock downgrading}(hj+hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj+hhhhhM[ubj)}(hWhen an mmap write lock is held one has exclusive access to resources within the mmap (with the usual caveats about requiring VMA write locks to avoid races with tasks holding VMA read locks).h]hWhen an mmap write lock is held one has exclusive access to resources within the mmap (with the usual caveats about requiring VMA write locks to avoid races with tasks holding VMA read locks).}(hj,hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhM]hj+hhubj)}(hXeIt is then possible to **downgrade** from a write lock to a read lock via :c:func:`!mmap_write_downgrade` which, similar to :c:func:`!mmap_write_unlock`, implicitly terminates all VMA write locks via :c:func:`!vma_end_write_all`, but importantly does not relinquish the mmap lock while 
downgrading, therefore keeping the locked virtual address space stable.h](hIt is then possible to }(hj,hhhNhNubj)}(h **downgrade**h]h downgrade}(hj,hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj,ubh& from a write lock to a read lock via }(hj,hhhNhNubj )}(h:c:func:`!mmap_write_downgrade`h]hmmap_write_downgrade()}(hj*,hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj,ubh which, similar to }(hj,hhhNhNubj )}(h:c:func:`!mmap_write_unlock`h]hmmap_write_unlock()}(hj=,hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj,ubh0, implicitly terminates all VMA write locks via }(hj,hhhNhNubj )}(h:c:func:`!vma_end_write_all`h]hvma_end_write_all()}(hjP,hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj,ubh, but importantly does not relinquish the mmap lock while downgrading, therefore keeping the locked virtual address space stable.}(hj,hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMahj+hhubj)}(hX8An interesting consequence of this is that downgraded locks are exclusive against any other task possessing a downgraded lock (since a racing task would have to acquire a write lock first to downgrade it, and the downgraded lock prevents a new write lock from being obtained until the original lock is released).h]hX8An interesting consequence of this is that downgraded locks are exclusive against any other task possessing a downgraded lock (since a racing task would have to acquire a write lock first to downgrade it, and the downgraded lock prevents a new write lock from being obtained until the original lock is released).}(hji,hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMghj+hhubj)}(h}For clarity, we map read (R)/downgraded write (D)/write (W) locks against one another showing which locks exclude the others:h]h}For clarity, we map read (R)/downgraded write (D)/write (W) locks against one another showing which locks exclude the others:}(hjw,hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMmhj+hhubjn)}(hhh](h)}(hLock exclusivityh]hLock exclusivity}(hj,hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMphj,ubjs)}(hhh](jx)}(hhh]h}(h]h 
]h"]h$]h&]colwidthKstubKuh1jwhj,ubjx)}(hhh]h}(h]h ]h"]h$]h&]j,Kuh1jwhj,ubjx)}(hhh]h}(h]h ]h"]h$]h&]j,Kuh1jwhj,ubjx)}(hhh]h}(h]h ]h"]h$]h&]j,Kuh1jwhj,ubj)}(hhh]j)}(hhh](j)}(hhh]h}(h]h ]h"]h$]h&]uh1jhj,ubj)}(hhh]j)}(hjCh]hR}(hj,hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMvhj,ubah}(h]h ]h"]h$]h&]uh1jhj,ubj)}(hhh]j)}(hDh]hD}(hj,hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMwhj,ubah}(h]h ]h"]h$]h&]uh1jhj,ubj)}(hhh]j)}(hj h]hW}(hj,hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMxhj,ubah}(h]h ]h"]h$]h&]uh1jhj,ubeh}(h]h ]h"]h$]h&]uh1jhj,ubah}(h]h ]h"]h$]h&]uh1jhj,ubjy)}(hhh](j)}(hhh](j)}(hhh]j)}(hjCh]hR}(hj&-hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMyhj#-ubah}(h]h ]h"]h$]h&]uh1jhj -ubj)}(hhh]j)}(hjh]hN}(hj<-hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMzhj9-ubah}(h]h ]h"]h$]h&]uh1jhj -ubj)}(hhh]j)}(hjh]hN}(hjR-hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhM{hjO-ubah}(h]h ]h"]h$]h&]uh1jhj -ubj)}(hhh]j)}(hjqh]hY}(hjh-hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhM|hje-ubah}(h]h ]h"]h$]h&]uh1jhj -ubeh}(h]h ]h"]h$]h&]uh1jhj-ubj)}(hhh](j)}(hhh]j)}(hj,h]hD}(hj-hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhM}hj-ubah}(h]h ]h"]h$]h&]uh1jhj-ubj)}(hhh]j)}(hjh]hN}(hj-hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhM~hj-ubah}(h]h ]h"]h$]h&]uh1jhj-ubj)}(hhh]j)}(hjqh]hY}(hj-hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhj-ubah}(h]h ]h"]h$]h&]uh1jhj-ubj)}(hhh]j)}(hjqh]hY}(hj-hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhj-ubah}(h]h ]h"]h$]h&]uh1jhj-ubeh}(h]h ]h"]h$]h&]uh1jhj-ubj)}(hhh](j)}(hhh]j)}(hj h]hW}(hj-hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhj-ubah}(h]h ]h"]h$]h&]uh1jhj-ubj)}(hhh]j)}(hjqh]hY}(hj-hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhj-ubah}(h]h ]h"]h$]h&]uh1jhj-ubj)}(hhh]j)}(hjqh]hY}(hj.hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhj.ubah}(h]h ]h"]h$]h&]uh1jhj-ubj)}(hhh]j)}(hjqh]hY}(hj*.hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhj'.ubah}(h]h ]h"]h$]h&]uh1jhj-ubeh}(h]h ]h"]h$]h&]uh1jhj-ubeh}(h]h ]h"]h$]h&]uh1jxhj,ubeh}(h]h ]h"]h$]h&]colsKuh1jrhj,ubeh}(h]id5ah ]colwidths-givenah"]h$]h&]uh1jmhj+hhhNhNubj)}(hrHere a Y indicates the locks in the matching row/column are mutually exclusive, and N indicates that they are not.h]hrHere a Y 
indicates the locks in the matching row/column are mutually exclusive, and N indicates that they are not.}(hjX.hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhj+hhubeh}(h]mmap-write-lock-downgradingah ]h"]mmap write lock downgradingah$]h&]uh1hhj&hhhhhM[ubh)}(hhh](h)}(hStack expansionh]hStack expansion}(hjq.hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjn.hhhhhMubj)}(hStack expansion throws up additional complexities in that we cannot permit there to be racing page faults, as a result we invoke :c:func:`!vma_start_write` to prevent this in :c:func:`!expand_downwards` or :c:func:`!expand_upwards`.h](hStack expansion throws up additional complexities in that we cannot permit there to be racing page faults, as a result we invoke }(hj.hhhNhNubj )}(h:c:func:`!vma_start_write`h]hvma_start_write()}(hj.hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj.ubh to prevent this in }(hj.hhhNhNubj )}(h:c:func:`!expand_downwards`h]hexpand_downwards()}(hj.hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj.ubh or }(hj.hhhNhNubj )}(h:c:func:`!expand_upwards`h]hexpand_upwards()}(hj.hhhNhNubah}(h]h ](jjc-funceh"]h$]h&]uh1j hj.ubh.}(hj.hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhjn.hhubeh}(h]stack-expansionah ]h"]stack expansionah$]h&]uh1hhj&hhhhhMubeh}(h]locking-implementation-detailsah ]h"]locking implementation detailsah$]h&]uh1hhhhhhhhMubh)}(hhh](h)}(hFunctions and structuresh]hFunctions and structures}(hj.hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj.hhhhhMubhindex)}(hhh]h}(h]h ]h"]h$]h&]entries](singlevma_refcount_put (C function)c.vma_refcount_puthNtauh1j.hj.hhhNhNubhdesc)}(hhh](hdesc_signature)}(h2void vma_refcount_put (struct vm_area_struct *vma)h]hdesc_signature_line)}(h1void vma_refcount_put(struct vm_area_struct *vma)h](hdesc_sig_keyword_type)}(hvoidh]hvoid}(hj /hhhNhNubah}(h]h ]ktah"]h$]h&]uh1j /hj/hhh[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhKubhdesc_sig_space)}(h h]h }(hj/hhhNhNubah}(h]h ]wah"]h$]h&]uh1j/hj/hhhj/hKubh desc_name)}(hvma_refcount_puth]h 
desc_sig_name)}(hvma_refcount_puth]hvma_refcount_put}(hj4/hhhNhNubah}(h]h ]nah"]h$]h&]uh1j2/hj./ubah}(h]h ](sig-namedescnameeh"]h$]h&]hhuh1j,/hj/hhhj/hKubhdesc_parameterlist)}(h(struct vm_area_struct *vma)h]hdesc_parameter)}(hstruct vm_area_struct *vmah](hdesc_sig_keyword)}(hstructh]hstruct}(hjY/hhhNhNubah}(h]h ]kah"]h$]h&]uh1jW/hjS/ubj/)}(h h]h }(hjh/hhhNhNubah}(h]h ]j(/ah"]h$]h&]uh1j/hjS/ubh)}(hhh]j3/)}(hvm_area_structh]hvm_area_struct}(hjy/hhhNhNubah}(h]h ]j?/ah"]h$]h&]uh1j2/hjv/ubah}(h]h ]h"]h$]h&] refdomainjreftype identifier reftargetj{/modnameN classnameN c:parent_keysphinx.domains.c LookupKey)}data]j/ ASTIdentifier)}j/j6/sbc.vma_refcount_putasbuh1hhjS/ubj/)}(h h]h }(hj/hhhNhNubah}(h]h ]j(/ah"]h$]h&]uh1j/hjS/ubhdesc_sig_punctuation)}(hjh]h*}(hj/hhhNhNubah}(h]h ]pah"]h$]h&]uh1j/hjS/ubj3/)}(hvmah]hvma}(hj/hhhNhNubah}(h]h ]j?/ah"]h$]h&]uh1j2/hjS/ubeh}(h]h ]h"]h$]h&]noemphhhuh1jQ/hjM/ubah}(h]h ]h"]h$]h&]hhuh1jK/hj/hhhj/hKubeh}(h]h ]h"]h$]h&]hhƌ add_permalinkuh1j/sphinx_line_type declaratorhj.hhhj/hKubah}(h]j.ah ](sig sig-objecteh"]h$]h&] is_multiline _toc_parts) _toc_namehuh1j.hj/hKhj.hhubh desc_content)}(hhh]j)}(hMDrop reference count in VMA vm_refcnt field due to a read-lock being dropped.h]hMDrop reference count in VMA vm_refcnt field due to a read-lock being dropped.}(hj/hhhNhNubah}(h]h ]h"]h$]h&]uh1jh[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhKhj/hhubah}(h]h ]h"]h$]h&]uh1j/hj.hhhj/hKubeh}(h]h ](jfunctioneh"]h$]h&]domainjobjtypej 0desctypej 0noindex noindexentrynocontentsentryuh1j.hhhj.hNhNubh container)}(h**Parameters** ``struct vm_area_struct *vma`` The VMA whose reference count we wish to decrement. 
**Description** If we were the last reader, wake up threads waiting to obtain an exclusive lock.h](j)}(h**Parameters**h]j)}(hj0h]h Parameters}(hj0hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj0ubah}(h]h ]h"]h$]h&]uh1jh[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhKhj0ubhdefinition_list)}(hhh]hdefinition_list_item)}(hS``struct vm_area_struct *vma`` The VMA whose reference count we wish to decrement. h](hterm)}(h``struct vm_area_struct *vma``h]j )}(hjB0h]hstruct vm_area_struct *vma}(hjD0hhhNhNubah}(h]h ]h"]h$]h&]uh1j hj@0ubah}(h]h ]h"]h$]h&]uh1j>0h[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhKhj:0ubh definition)}(hhh]j)}(h3The VMA whose reference count we wish to decrement.h]h3The VMA whose reference count we wish to decrement.}(hj]0hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjW0hKhjZ0ubah}(h]h ]h"]h$]h&]uh1jX0hj:0ubeh}(h]h ]h"]h$]h&]uh1j80hjW0hKhj50ubah}(h]h ]h"]h$]h&]uh1j30hj0ubj)}(h**Description**h]j)}(hj0h]h Description}(hj0hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj}0ubah}(h]h ]h"]h$]h&]uh1jh[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhKhj0ubj)}(hPIf we were the last reader, wake up threads waiting to obtain an exclusive lock.h]hPIf we were the last reader, wake up threads waiting to obtain an exclusive lock.}(hj0hhhNhNubah}(h]h ]h"]h$]h&]uh1jh[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhKhj0ubeh}(h]h ] kernelindentah"]h$]h&]uh1j0hj.hhhNhNubj.)}(hhh]h}(h]h ]h"]h$]h&]entries](j.%vma_start_write_killable (C function)c.vma_start_write_killablehNtauh1j.hj.hhhNhNubj.)}(hhh](j.)}(h9int vma_start_write_killable (struct vm_area_struct *vma)h]j/)}(h8int vma_start_write_killable(struct vm_area_struct *vma)h](j /)}(hinth]hint}(hj0hhhNhNubah}(h]h ]j/ah"]h$]h&]uh1j /hj0hhh[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhM3ubj/)}(h h]h }(hj0hhhNhNubah}(h]h 
]j(/ah"]h$]h&]uh1j/hj0hhhj0hM3ubj-/)}(hvma_start_write_killableh]j3/)}(hvma_start_write_killableh]hvma_start_write_killable}(hj0hhhNhNubah}(h]h ]j?/ah"]h$]h&]uh1j2/hj0ubah}(h]h ](jF/jG/eh"]h$]h&]hhuh1j,/hj0hhhj0hM3ubjL/)}(h(struct vm_area_struct *vma)h]jR/)}(hstruct vm_area_struct *vmah](jX/)}(hj[/h]hstruct}(hj1hhhNhNubah}(h]h ]jd/ah"]h$]h&]uh1jW/hj0ubj/)}(h h]h }(hj1hhhNhNubah}(h]h ]j(/ah"]h$]h&]uh1j/hj0ubh)}(hhh]j3/)}(hvm_area_structh]hvm_area_struct}(hj1hhhNhNubah}(h]h ]j?/ah"]h$]h&]uh1j2/hj1ubah}(h]h ]h"]h$]h&] refdomainjreftypej/ reftargetj!1modnameN classnameNj/j/)}j/]j/)}j/j0sbc.vma_start_write_killableasbuh1hhj0ubj/)}(h h]h }(hj?1hhhNhNubah}(h]h ]j(/ah"]h$]h&]uh1j/hj0ubj/)}(hjh]h*}(hjM1hhhNhNubah}(h]h ]j/ah"]h$]h&]uh1j/hj0ubj3/)}(hvmah]hvma}(hjZ1hhhNhNubah}(h]h ]j?/ah"]h$]h&]uh1j2/hj0ubeh}(h]h ]h"]h$]h&]noemphhhuh1jQ/hj0ubah}(h]h ]h"]h$]h&]hhuh1jK/hj0hhhj0hM3ubeh}(h]h ]h"]h$]h&]hhj/uh1j/j/j/hj0hhhj0hM3ubah}(h]j0ah ](j/j/eh"]h$]h&]j/j/)j/huh1j.hj0hM3hj0hhubj/)}(hhh]j)}(hBegin writing to a VMA.h]hBegin writing to a VMA.}(hj1hhhNhNubah}(h]h ]h"]h$]h&]uh1jh[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhM3hj1hhubah}(h]h ]h"]h$]h&]uh1j/hj0hhhj0hM3ubeh}(h]h ](jfunctioneh"]h$]h&]j0jj0j1j0j1j0j0j0uh1j.hhhj.hNhNubj0)}(hX**Parameters** ``struct vm_area_struct *vma`` The VMA we are going to modify. **Description** Exclude concurrent readers under the per-VMA lock until the currently write-locked mmap_lock is dropped or downgraded. **Context** May sleep while waiting for readers to drop the vma read lock. Caller must already hold the mmap_lock for write. **Return** 0 for a successful acquisition. -EINTR if a fatal signal was received.h](j)}(h**Parameters**h]j)}(hj1h]h Parameters}(hj1hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj1ubah}(h]h ]h"]h$]h&]uh1jh[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhM7hj1ubj40)}(hhh]j90)}(h?``struct vm_area_struct *vma`` The VMA we are going to modify. 
h](j?0)}(h``struct vm_area_struct *vma``h]j )}(hj1h]hstruct vm_area_struct *vma}(hj1hhhNhNubah}(h]h ]h"]h$]h&]uh1j hj1ubah}(h]h ]h"]h$]h&]uh1j>0h[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhM4hj1ubjY0)}(hhh]j)}(hThe VMA we are going to modify.h]hThe VMA we are going to modify.}(hj1hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj1hM4hj1ubah}(h]h ]h"]h$]h&]uh1jX0hj1ubeh}(h]h ]h"]h$]h&]uh1j80hj1hM4hj1ubah}(h]h ]h"]h$]h&]uh1j30hj1ubj)}(h**Description**h]j)}(hj2h]h Description}(hj2hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj1ubah}(h]h ]h"]h$]h&]uh1jh[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhM6hj1ubj)}(hvExclude concurrent readers under the per-VMA lock until the currently write-locked mmap_lock is dropped or downgraded.h]hvExclude concurrent readers under the per-VMA lock until the currently write-locked mmap_lock is dropped or downgraded.}(hj2hhhNhNubah}(h]h ]h"]h$]h&]uh1jh[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhM5hj1ubj)}(h **Context**h]j)}(hj'2h]hContext}(hj)2hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj%2ubah}(h]h ]h"]h$]h&]uh1jh[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhM8hj1ubj)}(hpMay sleep while waiting for readers to drop the vma read lock. Caller must already hold the mmap_lock for write.h]hpMay sleep while waiting for readers to drop the vma read lock. Caller must already hold the mmap_lock for write.}(hj=2hhhNhNubah}(h]h ]h"]h$]h&]uh1jh[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhM9hj1ubj)}(h **Return**h]j)}(hjN2h]hReturn}(hjP2hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjL2ubah}(h]h ]h"]h$]h&]uh1jh[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhM<hj1ubj)}(hG0 for a successful acquisition. -EINTR if a fatal signal was received.h]hG0 for a successful acquisition. 
-EINTR if a fatal signal was received.}(hjd2hhhNhNubah}(h]h ]h"]h$]h&]uh1jh[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhM<hj1ubeh}(h]h ] kernelindentah"]h$]h&]uh1j0hj.hhhNhNubj.)}(hhh]h}(h]h ]h"]h$]h&]entries](j.$vma_assert_write_locked (C function)c.vma_assert_write_lockedhNtauh1j.hj.hhhNhNubj.)}(hhh](j.)}(h9void vma_assert_write_locked (struct vm_area_struct *vma)h]j/)}(h8void vma_assert_write_locked(struct vm_area_struct *vma)h](j /)}(hvoidh]hvoid}(hj2hhhNhNubah}(h]h ]j/ah"]h$]h&]uh1j /hj2hhh[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhMIubj/)}(h h]h }(hj2hhhNhNubah}(h]h ]j(/ah"]h$]h&]uh1j/hj2hhhj2hMIubj-/)}(hvma_assert_write_lockedh]j3/)}(hvma_assert_write_lockedh]hvma_assert_write_locked}(hj2hhhNhNubah}(h]h ]j?/ah"]h$]h&]uh1j2/hj2ubah}(h]h ](jF/jG/eh"]h$]h&]hhuh1j,/hj2hhhj2hMIubjL/)}(h(struct vm_area_struct *vma)h]jR/)}(hstruct vm_area_struct *vmah](jX/;)}(hj[/h]hstruct}(hj2hhhNhNubah}(h]h ]jd/ah"]h$]h&]uh1jW/hj2ubj/)}(h h]h }(hj2hhhNhNubah}(h]h ]j(/ah"]h$]h&]uh1j/hj2ubh)}(hhh]j3/)}(hvm_area_structh]hvm_area_struct}(hj2hhhNhNubah}(h]h ]j?/ah"]h$]h&]uh1j2/hj2ubah}(h]h ]h"]h$]h&] refdomainjreftypej/ reftargetj2modnameN classnameNj/j/)}j/]j/)}j/j2sbc.vma_assert_write_lockedasbuh1hhj2ubj/)}(h h]h }(hj3hhhNhNubah}(h]h ]j(/ah"]h$]h&]uh1j/hj2ubj/)}(hjh]h*}(hj3hhhNhNubah}(h]h ]j/ah"]h$]h&]uh1j/hj2ubj3/)}(hvmah]hvma}(hj)3hhhNhNubah}(h]h ]j?/ah"]h$]h&]uh1j2/hj2ubeh}(h]h ]h"]h$]h&]noemphhhuh1jQ/hj2ubah}(h]h ]h"]h$]h&]hhuh1jK/hj2hhhj2hMIubeh}(h]h ]h"]h$]h&]hhj/uh1j/j/j/hj2hhhj2hMIubah}(h]j2ah ](j/j/eh"]h$]h&]j/j/)j/huh1j.hj2hMIhj2hhubj/)}(hhh]j)}(h+assert that **vma** holds a VMA write lock.h](h assert that }(hjS3hhhNhNubj)}(h**vma**h]hvma}(hj[3hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjS3ubh holds a VMA write lock.}(hjS3hhhNhNubeh}(h]h ]h"]h$]h&]uh1jh[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhMIhjP3hhubah}(h]h 
]h"]h$]h&]uh1j/hj2hhhj2hMIubeh}(h]h ](jfunctioneh"]h$]h&]j0jj0j}3j0j}3j0j0j0uh1j.hhhj.hNhNubj0)}(hC**Parameters** ``struct vm_area_struct *vma`` The VMA to assert.h](j)}(h**Parameters**h]j)}(hj3h]h Parameters}(hj3hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj3ubah}(h]h ]h"]h$]h&]uh1jh[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhMMhj3ubj40)}(hhh]j90)}(h1``struct vm_area_struct *vma`` The VMA to assert.h](j?0)}(h``struct vm_area_struct *vma``h]j )}(hj3h]hstruct vm_area_struct *vma}(hj3hhhNhNubah}(h]h ]h"]h$]h&]uh1j hj3ubah}(h]h ]h"]h$]h&]uh1j>0h[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhMOhj3ubjY0)}(hhh]j)}(hThe VMA to assert.h]hThe VMA to assert.}(hj3hhhNhNubah}(h]h ]h"]h$]h&]uh1jh[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhMJhj3ubah}(h]h ]h"]h$]h&]uh1jX0hj3ubeh}(h]h ]h"]h$]h&]uh1j80hj3hMOhj3ubah}(h]h ]h"]h$]h&]uh1j30hj3ubeh}(h]h ] kernelindentah"]h$]h&]uh1j0hj.hhhNhNubj.)}(hhh]h}(h]h ]h"]h$]h&]entries](j.vma_assert_locked (C function)c.vma_assert_lockedhNtauh1j.hj.hhhNhNubj.)}(hhh](j.)}(h3void vma_assert_locked (struct vm_area_struct *vma)h]j/)}(h2void vma_assert_locked(struct vm_area_struct *vma)h](j /)}(hvoidh]hvoid}(hj4hhhNhNubah}(h]h ]j/ah"]h$]h&]uh1j /hj3hhh[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhMRubj/)}(h h]h }(hj4hhhNhNubah}(h]h ]j(/ah"]h$]h&]uh1j/hj3hhhj4hMRubj-/)}(hvma_assert_lockedh]j3/)}(hvma_assert_lockedh]hvma_assert_locked}(hj!4hhhNhNubah}(h]h ]j?/ah"]h$]h&]uh1j2/hj4ubah}(h]h ](jF/jG/eh"]h$]h&]hhuh1j,/hj3hhhj4hMRubjL/)}(h(struct vm_area_struct *vma)h]jR/)}(hstruct vm_area_struct *vmah](jX/)}(hj[/h]hstruct}(hj=4hhhNhNubah}(h]h ]jd/ah"]h$]h&]uh1jW/hj94ubj/)}(h h]h }(hjJ4hhhNhNubah}(h]h ]j(/ah"]h$]h&]uh1j/hj94ubh)}(hhh]j3/)}(hvm_area_structh]hvm_area_struct}(hj[4hhhNhNubah}(h]h ]j?/ah"]h$]h&]uh1j2/hjX4ubah}(h]h ]h"]h$]h&] refdomainjreftypej/ reftargetj]4modnameN 
classnameNj/j/)}j/]j/)}j/j#4sbc.vma_assert_lockedasbuh1hhj94ubj/)}(h h]h }(hj{4hhhNhNubah}(h]h ]j(/ah"]h$]h&]uh1j/hj94ubj/)}(hjh]h*}(hj4hhhNhNubah}(h]h ]j/ah"]h$]h&]uh1j/hj94ubj3/)}(hvmah]hvma}(hj4hhhNhNubah}(h]h ]j?/ah"]h$]h&]uh1j2/hj94ubeh}(h]h ]h"]h$]h&]noemphhhuh1jQ/hj54ubah}(h]h ]h"]h$]h&]hhuh1jK/hj3hhhj4hMRubeh}(h]h ]h"]h$]h&]hhj/uh1j/j/j/hj3hhhj4hMRubah}(h]j3ah ](j/j/eh"]h$]h&]j/j/)j/huh1j.hj4hMRhj3hhubj/)}(hhh]j)}(hTassert that **vma** holds either a VMA read or a VMA write lock and is not detached.h](h assert that }(hj4hhhNhNubj)}(h**vma**h]hvma}(hj4hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj4ubhA holds either a VMA read or a VMA write lock and is not detached.}(hj4hhhNhNubeh}(h]h ]h"]h$]h&]uh1jh[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhMRhj4hhubah}(h]h ]h"]h$]h&]uh1j/hj3hhhj4hMRubeh}(h]h ](jfunctioneh"]h$]h&]j0jj0j4j0j4j0j0j0uh1j.hhhj.hNhNubj0)}(hC**Parameters** ``struct vm_area_struct *vma`` The VMA to assert.h](j)}(h**Parameters**h]j)}(hj4h]h Parameters}(hj4hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj4ubah}(h]h ]h"]h$]h&]uh1jh[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhMVhj4ubj40)}(hhh]j90)}(h1``struct vm_area_struct *vma`` The VMA to assert.h](j?0)}(h``struct vm_area_struct *vma``h]j )}(hj5h]hstruct vm_area_struct *vma}(hj5hhhNhNubah}(h]h ]h"]h$]h&]uh1j hj5ubah}(h]h ]h"]h$]h&]uh1j>0h[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhMXhj 5ubjY0)}(hhh]j)}(hThe VMA to assert.h]hThe VMA to assert.}(hj,5hhhNhNubah}(h]h ]h"]h$]h&]uh1jh[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhMThj)5ubah}(h]h ]h"]h$]h&]uh1jX0hj 5ubeh}(h]h ]h"]h$]h&]uh1j80hj(5hMXhj 5ubah}(h]h ]h"]h$]h&]uh1j30hj4ubeh}(h]h ] kernelindentah"]h$]h&]uh1j0hj.hhhNhNubj.)}(hhh]h}(h]h ]h"]h$]h&]entries](j."vma_assert_stabilised (C function)c.vma_assert_stabilisedhNtauh1j.hj.hhhNhNubj.)}(hhh](j.)}(h7void vma_assert_stabilised (struct 
vm_area_struct *vma)h]j/)}(h6void vma_assert_stabilised(struct vm_area_struct *vma)h](j /)}(hvoidh]hvoid}(hjm5hhhNhNubah}(h]h ]j/ah"]h$]h&]uh1j /hji5hhh[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhMzubj/)}(h h]h }(hj|5hhhNhNubah}(h]h ]j(/ah"]h$]h&]uh1j/hji5hhhj{5hMzubj-/)}(hvma_assert_stabilisedh]j3/)}(hvma_assert_stabilisedh]hvma_assert_stabilised}(hj5hhhNhNubah}(h]h ]j?/ah"]h$]h&]uh1j2/hj5ubah}(h]h ](jF/jG/eh"]h$]h&]hhuh1j,/hji5hhhj{5hMzubjL/)}(h(struct vm_area_struct *vma)h]jR/)}(hstruct vm_area_struct *vmah](jX/)}(hj[/h]hstruct}(hj5hhhNhNubah}(h]h ]jd/ah"]h$]h&]uh1jW/hj5ubj/)}(h h]h }(hj5hhhNhNubah}(h]h ]j(/ah"]h$]h&]uh1j/hj5ubh)}(hhh]j3/)}(hvm_area_structh]hvm_area_struct}(hj5hhhNhNubah}(h]h ]j?/ah"]h$]h&]uh1j2/hj5ubah}(h]h ]h"]h$]h&] refdomainjreftypej/ reftargetj5modnameN classnameNj/j/)}j/]j/)}j/j5sbc.vma_assert_stabilisedasbuh1hhj5ubj/)}(h h]h }(hj5hhhNhNubah}(h]h ]j(/ah"]h$]h&]uh1j/hj5ubj/)}(hjh]h*}(hj5hhhNhNubah}(h]h ]j/ah"]h$]h&]uh1j/hj5ubj3/)}(hvmah]hvma}(hj6hhhNhNubah}(h]h ]j?/ah"]h$]h&]uh1j2/hj5ubeh}(h]h ]h"]h$]h&]noemphhhuh1jQ/hj5ubah}(h]h ]h"]h$]h&]hhuh1jK/hji5hhhj{5hMzubeh}(h]h ]h"]h$]h&]hhj/uh1j/j/j/hje5hhhj{5hMzubah}(h]j`5ah ](j/j/eh"]h$]h&]j/j/)j/huh1j.hj{5hMzhjb5hhubj/)}(hhh]j)}(hcassert that this VMA cannot be changed from underneath us either by having a VMA or mmap lock held.h]hcassert that this VMA cannot be changed from underneath us either by having a VMA or mmap lock held.}(hj-6hhhNhNubah}(h]h ]h"]h$]h&]uh1jh[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhMzhj*6hhubah}(h]h ]h"]h$]h&]uh1j/hjb5hhhj{5hMzubeh}(h]h ](jfunctioneh"]h$]h&]j0jj0jE6j0jE6j0j0j0uh1j.hhhj.hNhNubj0)}(hXc**Parameters** ``struct vm_area_struct *vma`` The VMA whose stability we wish to assess. **Description** If lockdep is enabled we can precisely ensure stability via either an mmap lock owned by us or a specific VMA lock. 
With lockdep disabled we may sometimes race with other threads acquiring the mmap read lock simultaneous with our VMA read lock.h](j)}(h**Parameters**h]j)}(hjO6h]h Parameters}(hjQ6hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjM6ubah}(h]h ]h"]h$]h&]uh1jh[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhM~hjI6ubj40)}(hhh]j90)}(hJ``struct vm_area_struct *vma`` The VMA whose stability we wish to assess. h](j?0)}(h``struct vm_area_struct *vma``h]j )}(hjn6h]hstruct vm_area_struct *vma}(hjp6hhhNhNubah}(h]h ]h"]h$]h&]uh1j hjl6ubah}(h]h ]h"]h$]h&]uh1j>0h[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhM|hjh6ubjY0)}(hhh]j)}(h*The VMA whose stability we wish to assess.h]h*The VMA whose stability we wish to assess.}(hj6hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj6hM|hj6ubah}(h]h ]h"]h$]h&]uh1jX0hjh6ubeh}(h]h ]h"]h$]h&]uh1j80hj6hM|hje6ubah}(h]h ]h"]h$]h&]uh1j30hjI6ubj)}(h**Description**h]j)}(hj6h]h Description}(hj6hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj6ubah}(h]h ]h"]h$]h&]uh1jh[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhM~hjI6ubj)}(hsIf lockdep is enabled we can precisely ensure stability via either an mmap lock owned by us or a specific VMA lock.h]hsIf lockdep is enabled we can precisely ensure stability via either an mmap lock owned by us or a specific VMA lock.}(hj6hhhNhNubah}(h]h ]h"]h$]h&]uh1jh[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhM}hjI6ubj)}(hWith lockdep disabled we may sometimes race with other threads acquiring the mmap read lock simultaneous with our VMA read lock.h]hWith lockdep disabled we may sometimes race with other threads acquiring the mmap read lock simultaneous with our VMA read lock.}(hj6hhhNhNubah}(h]h ]h"]h$]h&]uh1jh[/var/lib/git/docbuild/linux/Documentation/mm/process_addrs:916: ./include/linux/mmap_lock.hhMhjI6ubeh}(h]h ] kernelindentah"]h$]h&]uh1j0hj.hhhNhNubeh}(h]functions-and-structuresah 
]h"]functions and structuresah$]h&]uh1hhhhhhhhMubeh}(h]process-addressesah ]h"]process addressesah$]h&]uh1hhhhhhhhKubeh}(h]h ]h"]h$]h&]sourcehuh1hcurrent_sourceN current_lineNsettingsdocutils.frontendValues)}(hN generatorN datestampN source_linkN source_urlN toc_backlinksjfootnote_backlinksK sectnum_xformKstrip_commentsNstrip_elements_with_classesN strip_classesN report_levelK halt_levelKexit_status_levelKdebugNwarning_streamN tracebackinput_encoding utf-8-siginput_encoding_error_handlerstrictoutput_encodingutf-8output_encoding_error_handlerj7error_encodingutf-8error_encoding_error_handlerbackslashreplace language_codeenrecord_dependenciesNconfigN id_prefixhauto_id_prefixid dump_settingsNdump_internalsNdump_transformsNdump_pseudo_xmlNexpose_internalsNstrict_visitorN_disable_configN_sourcehnj _destinationN _config_files]7/var/lib/git/docbuild/linux/Documentation/docutils.confafile_insertion_enabled raw_enabledKline_length_limitM'pep_referencesN pep_base_urlhttps://peps.python.org/pep_file_url_templatepep-%04drfc_referencesN rfc_base_url&https://datatracker.ietf.org/doc/html/ tab_widthKtrim_footnote_reference_spacesyntax_highlightlong smart_quotessmartquotes_locales]character_level_inline_markupdoctitle_xform docinfo_xformKsectsubtitle_xform image_loadinglinkembed_stylesheetcloak_email_addressessection_self_linkenvNubreporterNindirect_targets]substitution_defs}substitution_names}refnames}refids}nameids}(j6j6j#j jjjjjjj j j:j7jjj.j.j'j'js jp j"j"js%jp%j&j&j'j'j+j+jL)jI)j+j+jk.jh.j.j.j6j6u nametypes}(j6j#jjjj j:jj.j'js j"js%j&j'j+jL)j+jk.j.j6uh}(j6hj jjjjjjj j jj7jjj=j.j&j'jOjp jcj"jv jp%j"j&jv%j'j&j+j'jI)j'j+jO)jh.j+j.jn.j6j.j.j.j0j0j2j2j3j3j`5je5j j j[j jjnjjjR.j,u footnote_refs} citation_refs} autofootnotes]autofootnote_refs]symbol_footnotes]symbol_footnote_refs] footnotes] citations]autofootnote_startKsymbol_footnote_startK id_counter collectionsCounter}j$7KsRparse_messages]transform_messages] transformerN include_log] decorationNhhub.