==============================
Unevictable LRU Infrastructure
==============================

.. contents:: :local:


Introduction
============

This document describes the Linux memory manager's "Unevictable LRU"
infrastructure and the use of this to manage several types of "unevictable"
folios.

The document attempts to provide the overall rationale behind this mechanism
and the rationale for some of the design decisions that drove the
implementation.  The latter design rationale is discussed in the context of an
implementation description.  Admittedly, one can obtain the implementation
details - the "what does it do?" - by reading the code.  One hopes that the
descriptions below add value by providing the answer to "why does it do that?".


The Unevictable LRU
===================

The Unevictable LRU facility adds an additional LRU list to track unevictable
folios and to hide these folios from vmscan.  This mechanism is based on a
patch by Larry Woodman of Red Hat to address several scalability problems with
folio reclaim in Linux.
The problems have been observed at customer sites on large memory x86_64
systems.

To illustrate this with an example, a non-NUMA x86_64 platform with 128GB of
main memory will have over 32 million 4k pages in a single node.  When a large
fraction of these pages are not evictable for any reason [see below], vmscan
will spend a lot of time scanning the LRU lists looking for the small fraction
of pages that are evictable.  This can result in a situation where all CPUs
are spending 100% of their time in vmscan for hours or days on end, with the
system completely unresponsive.

The unevictable list addresses the following classes of unevictable pages:

 * Those owned by ramfs.

 * Those owned by tmpfs with the noswap mount option.

 * Those mapped into SHM_LOCK'd shared memory regions.

 * Those mapped into VM_LOCKED [mlock()ed] VMAs.

The infrastructure may also be able to handle other conditions that make pages
unevictable, either by definition or by circumstance, in the future.


The Unevictable LRU Folio List
------------------------------

The Unevictable LRU folio list is a lie.  It was never an LRU-ordered
list, but a companion to the LRU-ordered anonymous and file, active and
inactive folio lists; and now it is not even a folio list.
But following familiar convention, here in this document and in the source, we
often imagine it as a fifth LRU folio list.

The Unevictable LRU infrastructure consists of an additional, per-node, LRU
list called the "unevictable" list and an associated folio flag,
PG_unevictable, to indicate that the folio is being managed on the unevictable
list.

The PG_unevictable flag is analogous to, and mutually exclusive with, the
PG_active flag in that it indicates on which LRU list a folio resides when
PG_lru is set.

The Unevictable LRU infrastructure maintains unevictable folios as if they
were on an additional LRU list for a few reasons:

 (1) We get to "treat unevictable folios just like we treat other folios in
     the system - which means we get to use the same code to manipulate them,
     the same code to isolate them (for migrate, etc.), the same code to keep
     track of the statistics, etc..." [Rik van Riel]

 (2) We want to be able to migrate unevictable folios between nodes for memory
     defragmentation, workload management and memory hotplug.
     The Linux kernel can only migrate folios that it can successfully isolate
     from the LRU lists (or "Movable" folios: outside of consideration here).
     If we were to maintain folios elsewhere than on an LRU-like list, where
     they can be detected by folio_isolate_lru(), we would prevent their
     migration.

The unevictable list does not differentiate between file-backed and anonymous,
swap-backed folios.  This differentiation is only important while the folios
are, in fact, evictable.

The unevictable list benefits from the "arrayification" of the per-node LRU
lists and statistics originally proposed and posted by Christoph Lameter.


Memory Control Group Interaction
--------------------------------

The unevictable LRU facility interacts with the memory control group [aka
memory controller; see Documentation/admin-guide/cgroup-v1/memory.rst] by
extending the lru_list enum.
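
Concretely, each node (and each memory cgroup's per-node lruvec) keeps its LRU
lists in an array indexed by this enum, so the unevictable list is just one
more slot.  The following is a simplified sketch of the definitions involved;
the authoritative versions live in include/linux/mmzone.h and change between
kernel releases::

	#define LRU_BASE	0
	#define LRU_ACTIVE	1
	#define LRU_FILE	2

	enum lru_list {
		LRU_INACTIVE_ANON = LRU_BASE,
		LRU_ACTIVE_ANON = LRU_BASE + LRU_ACTIVE,
		LRU_INACTIVE_FILE = LRU_BASE + LRU_FILE,
		LRU_ACTIVE_FILE = LRU_BASE + LRU_FILE + LRU_ACTIVE,
		LRU_UNEVICTABLE,	/* the additional, "fifth" list */
		NR_LRU_LISTS
	};

	struct lruvec {
		/* one list head per lru_list entry - the "arrayification" */
		struct list_head	lists[NR_LRU_LISTS];
		/* ... per-list statistics, lru_lock, etc. elided ... */
	};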

The memory controller data structure automatically gets a per-node unevictable
list as a result of the "arrayification" of the per-node LRU lists (one per
lru_list enum element).  The memory controller tracks the movement of pages to
and from the unevictable list.

When a memory control group comes under memory pressure, the controller will
not attempt to reclaim pages on the unevictable list.  This has a couple of
effects:

 (1) Because the pages are "hidden" from reclaim on the unevictable list, the
     reclaim process can be more efficient, dealing only with pages that have
     a chance of being reclaimed.

 (2) On the other hand, if too many of the pages charged to the control group
     are unevictable, the evictable portion of the working set of the tasks in
     the control group may not fit into the available memory.  This can cause
     the control group to thrash or to OOM-kill tasks.


.. _mark_addr_space_unevict:

Marking Address Spaces Unevictable
----------------------------------

For facilities such as ramfs none of the pages attached to the address space
may be evicted.  To prevent eviction of any such pages, the AS_UNEVICTABLE
address space flag is provided, and this can be manipulated by a filesystem
using a number of wrapper functions:

 * ``void mapping_set_unevictable(struct address_space *mapping);``

	Mark the address space as being completely unevictable.

 * ``void mapping_clear_unevictable(struct address_space *mapping);``

	Mark the address space as being evictable.

 * ``int mapping_unevictable(struct address_space *mapping);``

	Query the address space, and return true if it is completely
	unevictable.

These are currently used in three places in the kernel:

 (1) By ramfs to mark the address spaces of its inodes when they are created,
     and this mark remains for the life of the inode.

 (2) By SYSV SHM to mark SHM_LOCK'd address spaces until SHM_UNLOCK is called.
     Note that SHM_LOCK is not required to page in the locked pages if they're
     swapped out; the application must touch the pages manually if it wants to
     ensure they're in memory.

 (3) By the i915 driver to mark pinned address space until it's unpinned.  The
     amount of unevictable memory marked by the i915 driver is roughly the
     bounded object size in debugfs/dri/0/i915_gem_objects.
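
As an illustration of use (1), here is a minimal sketch of what a ramfs-like
in-memory filesystem might do when it creates an inode.  Only
mapping_set_unevictable() (and new_inode()) are the real kernel APIs; the
surrounding helper, example_fs_get_inode(), is hypothetical and heavily
abbreviated::

	struct inode *example_fs_get_inode(struct super_block *sb, umode_t mode)
	{
		struct inode *inode = new_inode(sb);

		if (inode) {
			inode->i_mode = mode;
			/*
			 * None of this mapping's pages may ever be reclaimed:
			 * set AS_UNEVICTABLE so that vmscan keeps them on the
			 * unevictable list.
			 */
			mapping_set_unevictable(inode->i_mapping);
		}
		return inode;
	}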


Detecting Unevictable Pages
---------------------------

The function folio_evictable() in mm/internal.h determines whether a folio is
evictable or not using the query function outlined above [see section
:ref:`Marking address spaces unevictable <mark_addr_space_unevict>`]
to check the AS_UNEVICTABLE flag.

For address spaces that are so marked after being populated (as SHM regions
might be), the lock action (e.g. SHM_LOCK) can be lazy, and need not populate
the page tables for the region as does, for example, mlock(), nor need it make
any special effort to push any pages in the SHM_LOCK'd area to the unevictable
list.  Instead, vmscan will do this if and when it encounters the folios
during a reclamation scan.

On an unlock action (such as SHM_UNLOCK), the unlocker (e.g. shmctl()) must
scan the pages in the region and "rescue" them from the unevictable list if no
other condition is keeping them unevictable.  If an unevictable region is
destroyed, the pages are also "rescued" from the unevictable list in the
process of freeing them.

folio_evictable() also checks for mlocked folios by calling
folio_test_mlocked(); the mlocked flag is set when a folio is faulted into a
VM_LOCKED VMA, or found in a VMA being VM_LOCKED.
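
Putting those two checks together, the test is conceptually just the
following.  This is a simplified sketch of the mm/internal.h helper as found
in recent kernels; consult the source for the authoritative version::

	static inline bool folio_evictable(struct folio *folio)
	{
		bool ret;

		/* Prevent the folio's address_space from being freed under us. */
		rcu_read_lock();
		ret = !mapping_unevictable(folio_mapping(folio)) &&
		      !folio_test_mlocked(folio);
		rcu_read_unlock();
		return ret;
	}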


Vmscan's Handling of Unevictable Folios
---------------------------------------

If unevictable folios are culled in the fault path, or moved to the
unevictable list at mlock() or mmap() time, vmscan will not encounter the
folios until they have become evictable again (via munlock() for example) and
have been "rescued" from the unevictable list.  However, there may be
situations where we decide, for the sake of expediency, to leave an
unevictable folio on one of the regular active/inactive LRU lists for vmscan
to deal with.  vmscan checks for such folios in all of the
shrink_{active|inactive|folio}_list() functions and will "cull" such folios
that it encounters: that is, it diverts those folios to the unevictable list
for the memory cgroup and node being scanned.

There may be situations where a folio is mapped into a VM_LOCKED VMA, but the
folio does not have the mlocked flag set.  Such folios will make it all the
way to shrink_active_list() or shrink_folio_list() where they will be detected
when vmscan walks the reverse map in folio_referenced() or try_to_unmap().
The folio is culled to the unevictable list when it is released by the
shrinker.

To "cull" an unevictable folio, vmscan simply puts the folio back on the LRU
list using folio_putback_lru() - the inverse operation to folio_isolate_lru()
- after dropping the folio lock.  Because the condition which makes the folio
unevictable may change once the folio is unlocked, __pagevec_lru_add_fn() will
recheck the unevictable state of a folio before placing it on the unevictable
list.


MLOCKED Pages
=============

The unevictable folio list is also useful for mlock(), in addition to ramfs
and SYSV SHM.  Note that mlock() is only available in CONFIG_MMU=y situations;
in NOMMU situations, all mappings are effectively mlocked.


History
-------

The "Unevictable mlocked Pages" infrastructure is based on work originally
posted by Nick Piggin in an RFC patch entitled "mm: mlocked pages off LRU".
Nick posted his patch as an alternative to a patch posted by Christoph Lameter
to achieve the same objective: hiding mlocked pages from vmscan.

In Nick's patch, he used one of the struct page LRU list link fields as a
count of VM_LOCKED VMAs that map the page (Rik van Riel had the same idea
three years earlier).  But this use of the link field for a count prevented
the management of the pages on an LRU list, and thus mlocked pages were not
migratable as folio_isolate_lru() could not detect them, and the LRU list link
field was not available to the migration subsystem.

Nick resolved this by putting mlocked pages back on the LRU list before
attempting to isolate them, thus abandoning the count of VM_LOCKED VMAs.  When
Nick's patch was integrated with the Unevictable LRU work, the count was
replaced by walking the reverse map when munlocking, to determine whether any
other VM_LOCKED VMAs still mapped the page.

However, walking the reverse map for each page when munlocking was ugly and
inefficient, and could lead to catastrophic contention on a file's rmap lock,
when many processes which had it mlocked were trying to exit.  In 5.18, the
idea of keeping mlock_count in the Unevictable LRU list link field was revived
and put to work, without preventing the migration of mlocked pages.  This is
why the "Unevictable LRU list" cannot be a linked list of pages now; but there
was no use for that linked list anyway - though its size is maintained for
meminfo.
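
Today that reuse of the list link looks roughly like the following in struct
folio.  This is a simplified sketch; the authoritative layout is in
include/linux/mm_types.h and varies between kernel versions::

	struct folio {
		/* ... flags, mapping, index, etc. elided ... */
		union {
			/* LRU threading, used while the folio is evictable */
			struct list_head lru;
			/* reused while the folio sits on the "unevictable LRU" */
			struct {
				void *__filler;		  /* overlays lru.next */
				unsigned int mlock_count; /* VM_LOCKED references */
			};
		};
		/* ... */
	};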

Basic Management
----------------

mlocked pages - pages mapped into a VM_LOCKED VMA - are a class of unevictable
pages.  When such a page has been "noticed" by the memory management
subsystem, the folio is marked with the PG_mlocked flag.  This can be
manipulated using the folio_set_mlocked() and folio_clear_mlocked() functions.

A PG_mlocked page will be placed on the unevictable list when it is added to
the LRU.  Such pages can be "noticed" by memory management in several places:

 (1) in the mlock()/mlock2()/mlockall() system call handlers;

 (2) in the mmap() system call handler when mmapping a region with the
     MAP_LOCKED flag;

 (3) mmapping a region in a task that has called mlockall() with the
     MCL_FUTURE flag;

 (4) in the fault path and when a VM_LOCKED stack segment is expanded; or

 (5) as mentioned above, in vmscan:shrink_folio_list() when attempting to
     reclaim a page in a VM_LOCKED VMA by folio_referenced() or try_to_unmap().

mlocked pages become unlocked and rescued from the unevictable list when:

 (1) mapped in a range unlocked via the munlock()/munlockall() system calls;

 (2) munmap()'d out of the last VM_LOCKED VMA that maps the page, including
     unmapping at task exit;

 (3) when the page is truncated from the last VM_LOCKED VMA of an mmapped
     file; or

 (4) before a page is COW'd in a VM_LOCKED VMA.


mlock()/mlock2()/mlockall() System Call Handling
------------------------------------------------

mlock(), mlock2() and mlockall() system call handlers proceed to mlock_fixup()
for each VMA in the range specified by the call.  In the case of mlockall(),
this is the entire active address space of the task.  Note that mlock_fixup()
is used for both mlocking and munlocking a range of memory.  A call to mlock()
an already VM_LOCKED VMA, or to munlock() a VMA that is not VM_LOCKED, is
treated as a no-op and mlock_fixup() simply returns.

If the VMA passes some filtering as described in "Filtering Special VMAs"
below, mlock_fixup() will attempt to merge the VMA with its neighbors or split
off a subset of the VMA if the range does not cover the entire VMA.  Any pages
already present in the VMA are then marked as mlocked by mlock_folio() via
mlock_pte_range() via walk_page_range() via mlock_vma_pages_range().

Before returning from the system call, do_mlock() or mlockall() will call
__mm_populate() to fault in the remaining pages via get_user_pages() and to
mark those pages as mlocked as they are faulted.

Note that the VMA being mlocked might be mapped with PROT_NONE.  In this case,
get_user_pages() will be unable to fault in the pages.  That's okay.  If pages
do end up getting faulted into this VM_LOCKED VMA, they will be handled in the
fault path - which is also how mlock2()'s MLOCK_ONFAULT areas are handled.
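
For reference, the two userspace flavours referred to above look like this.
This is a minimal sketch assuming an already-mapped buffer buf of len bytes;
the wrapper function lock_examples() is hypothetical, and the mlock2() wrapper
needs _GNU_SOURCE and a sufficiently recent glibc (2.27+)::

	#define _GNU_SOURCE
	#include <sys/mman.h>
	#include <stdio.h>

	static void lock_examples(void *buf, size_t len)
	{
		/* mlock(): fault in all pages now and mark them mlocked. */
		if (mlock(buf, len))
			perror("mlock");

		/*
		 * mlock2() with MLOCK_ONFAULT: the VMA becomes VM_LOCKED,
		 * but pages are only faulted in - and mlocked - as they
		 * are first touched.
		 */
		if (mlock2(buf, len, MLOCK_ONFAULT))
			perror("mlock2");
	}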

For each PTE (or PMD) being faulted into a VMA, the page add rmap function
calls mlock_vma_folio(), which calls mlock_folio() when the VMA is VM_LOCKED
(unless it is a PTE mapping of a part of a transparent huge page).  Or when it
is a newly allocated anonymous page, folio_add_lru_vma() calls
mlock_new_folio() instead: similar to mlock_folio(), but can make better
judgments, since this page is held exclusively and known not to be on LRU yet.

mlock_folio() sets PG_mlocked immediately, then places the page on the CPU's
mlock folio batch, to batch up the rest of the work to be done under lru_lock
by __mlock_folio().  __mlock_folio() sets PG_unevictable, initializes
mlock_count and moves the page to unevictable state ("the unevictable LRU",
but with mlock_count in place of LRU threading).  Or if the page was already
PG_lru and PG_unevictable and PG_mlocked, it simply increments the
mlock_count.

But in practice that may not work ideally: the page may not yet be on an LRU,
or it may have been temporarily isolated from LRU.  In such cases the
mlock_count field cannot be touched, but will be set to 0 later when
__munlock_folio() returns the page to "LRU".  Races prohibit mlock_count from
being set to 1 then: rather than risk stranding a page indefinitely as
unevictable, always err with mlock_count on the low side, so that when
munlocked the page will be rescued to an evictable LRU, then perhaps be
mlocked again later if vmscan finds it in a VM_LOCKED VMA.


Filtering Special VMAs
----------------------

mlock_fixup() filters several classes of "special" VMAs:


munmap()/exit()/exec() System Call Handling
-------------------------------------------

munlock_folio() uses the mlock pagevec to batch up work to be done under
lru_lock by __munlock_folio().  __munlock_folio() decrements the folio's
mlock_count, and when that reaches 0 it clears the mlocked flag and clears the
unevictable flag, moving the folio from unevictable state to the inactive LRU.

But in practice that may not work ideally: the folio may not yet have reached
"the unevictable LRU", or it may have been temporarily isolated from it.  In
those cases its mlock_count field is unusable and must be assumed to be 0: so
that the folio will be rescued to an evictable LRU, then perhaps be mlocked
again later if vmscan finds it in a VM_LOCKED VMA.


Truncating MLOCKED Pages
------------------------

File truncation or hole punching forcibly unmaps the deleted pages from
userspace; truncation even unmaps and deletes any private anonymous pages
which had been Copied-On-Write from the file pages now being truncated.

Mlocked pages can be munlocked and deleted in this way: like with munmap(),
for each PTE (or PMD) being unmapped from a VMA, folio_remove_rmap_*() calls
munlock_vma_folio(), which calls munlock_folio() when the VMA is VM_LOCKED
(unless it was a PTE mapping of a part of a transparent huge page).

However, if there is a racing munlock(), since mlock_vma_pages_range() starts
munlocking by clearing VM_LOCKED from a VMA, before munlocking all the pages
present, if one of those pages were unmapped by truncation or hole punch
before mlock_pte_range() reached it, it would not be recognized as mlocked by
this VMA, and would not be counted out of mlock_count.  In this rare case, a
page may still appear as PG_mlocked after it has been fully unmapped: and it
is left to release_pages() (or __page_cache_release()) to clear it and update
statistics before freeing (this event is counted in /proc/vmstat
unevictable_pgs_cleared, which is usually 0).


Page Reclaim in shrink_*_list()
-------------------------------

vmscan's shrink_active_list() culls any obviously unevictable pages -
i.e. !page_evictable(page) pages - diverting those to the unevictable list.
However, shrink_active_list() only sees unevictable pages that made it onto
the active/inactive LRU lists.  Note that these pages do not have
PG_unevictable set - otherwise they would be on the unevictable list and
shrink_active_list() would never see them.
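
The cull itself is conceptually a short check on each folio taken off the
list.  The helper below, cull_if_unevictable(), is hypothetical and only
summarizes what the mm/vmscan.c code does inline; folio_evictable() and
folio_putback_lru() are the real functions described above::

	static bool cull_if_unevictable(struct folio *folio)
	{
		if (likely(folio_evictable(folio)))
			return false;

		/*
		 * Put the folio back on an LRU; since it is unevictable it
		 * lands on the unevictable list of its memory cgroup and node.
		 */
		folio_putback_lru(folio);
		return true;
	}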

Some examples of these unevictable pages on the LRU lists are:

 (1) ramfs pages that have been placed on the LRU lists when first allocated.

 (2) SHM_LOCK'd shared memory pages.  shmctl(SHM_LOCK) does not attempt to
     allocate or fault in the pages in the shared memory region.  This happens
     when an application accesses the page the first time after SHM_LOCK'ing
     the segment.

 (3) pages still mapped into VM_LOCKED VMAs, which should be marked mlocked,
     but events left mlock_count too low, so they were munlocked too early.

vmscan's shrink_inactive_list() and shrink_folio_list() also divert obviously
unevictable pages found on the inactive lists to the appropriate memory cgroup
and node unevictable list.

rmap's folio_referenced_one(), called via vmscan's shrink_active_list() or
shrink_folio_list(), and rmap's try_to_unmap_one() called via
shrink_folio_list(), check for (3) pages still mapped into VM_LOCKED VMAs, and
call mlock_vma_folio() to correct them.  Such pages are culled to the
unevictable list when released by the shrinker.