.. SPDX-License-Identifier: GPL-2.0

=====================
Physical Memory Model
=====================

Physical memory in a system may be addressed in different ways. The
simplest case is when the physical memory starts at address 0 and
spans a contiguous range up to the maximal address. This range may,
however, contain small holes that are not accessible to the CPU.
There may also be several contiguous ranges at completely distinct
addresses. And don't forget about NUMA, where different memory banks
are attached to different CPUs.

Linux abstracts this diversity using one of two memory models:
FLATMEM and SPARSEMEM. Each architecture defines which memory models
it supports, what the default memory model is, and whether that
default can be overridden manually.
All the memory models track the status of physical page frames using
struct page arranged in one or more arrays.

Regardless of the selected memory model, there exists a one-to-one
mapping between the physical page frame number (PFN) and the
corresponding `struct page`.

Each memory model defines :c:func:`pfn_to_page` and :c:func:`page_to_pfn`
helpers that allow the conversion from PFN to `struct page` and vice
versa.

FLATMEM
=======

The simplest memory model is FLATMEM. This model is suitable for
non-NUMA systems with contiguous, or mostly contiguous, physical
memory.

In the FLATMEM memory model, there is a global `mem_map` array that
maps the entire physical memory. For most architectures, the holes
have entries in the `mem_map` array. The `struct page` objects
corresponding to the holes are never fully initialized.

To allocate the `mem_map` array, architecture specific setup code
should call the :c:func:`free_area_init` function. However, the
`mem_map` array is not usable until :c:func:`memblock_free_all` is
called, which hands all the memory over to the page allocator.
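The ordering matters: :c:func:`free_area_init` creates the memory map,
while :c:func:`memblock_free_all` makes the underlying pages available
to the rest of the kernel. A minimal sketch of how an architecture's
setup path might sequence these calls; the surrounding functions and
the single-zone layout are illustrative, not taken from any particular
architecture:

.. code-block:: c

   #include <linux/init.h>
   #include <linux/memblock.h>
   #include <linux/mm.h>

   /*
    * Hypothetical arch setup fragment: compute per-zone PFN limits,
    * build the memory map, then release memblock memory to the buddy
    * allocator. Real architectures do this from their setup_arch()
    * and mem_init() paths.
    */
   static void __init example_paging_init(void)
   {
           unsigned long max_zone_pfns[MAX_NR_ZONES] = { 0 };

           /* Illustrative: put all memory in ZONE_NORMAL. */
           max_zone_pfns[ZONE_NORMAL] = max_low_pfn;

           /* Allocates and initializes mem_map (FLATMEM) or sections. */
           free_area_init(max_zone_pfns);
   }

   static void __init example_mem_init(void)
   {
           /* Only after this is the memory fully usable. */
           memblock_free_all();
   }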
An architecture may free parts of the `mem_map` array that do not
cover the actual physical pages. In such a case, the architecture
specific :c:func:`pfn_valid` implementation should take the holes in
the `mem_map` into account.

With FLATMEM, the conversion between a PFN and the `struct page` is
straightforward: `PFN - ARCH_PFN_OFFSET` is an index into the
`mem_map` array.

The `ARCH_PFN_OFFSET` defines the first page frame number for systems
with physical memory starting at an address different from 0.
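Conceptually, the FLATMEM helpers boil down to pointer arithmetic on
`mem_map`. A simplified sketch of what the generic definitions amount
to (the actual helpers in include/asm-generic/memory_model.h carry
additional casts and configuration):

.. code-block:: c

   /* Simplified FLATMEM conversion, modeled on asm-generic code. */
   extern struct page *mem_map;

   /* A PFN indexes mem_map after subtracting the first PFN... */
   #define __pfn_to_page(pfn)   (mem_map + ((pfn) - ARCH_PFN_OFFSET))

   /* ...and the array offset of a page recovers its PFN. */
   #define __page_to_pfn(page)  \
           ((unsigned long)((page) - mem_map) + ARCH_PFN_OFFSET)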
SPARSEMEM
=========

SPARSEMEM is the most versatile memory model available in Linux and it
is the only memory model that supports several advanced features such
as hot-plug and hot-remove of the physical memory, alternative memory
maps for non-volatile memory devices and deferred initialization of
the memory map for larger systems.

The SPARSEMEM model presents the physical memory as a collection of
sections. A section is represented with struct mem_section that
contains `section_mem_map` that is, logically, a pointer to an array
of struct pages. However, it is stored with some other magic that aids
the section management. The section size and the maximal number of
sections are specified using the `SECTION_SIZE_BITS` and
`MAX_PHYSMEM_BITS` constants defined by each architecture that
supports SPARSEMEM. While `MAX_PHYSMEM_BITS` is the actual width of a
physical address that an architecture supports, `SECTION_SIZE_BITS`
is an arbitrary value.

The maximal number of sections is denoted `NR_MEM_SECTIONS` and
defined as

.. math::

   NR\_MEM\_SECTIONS = 2 ^ {(MAX\_PHYSMEM\_BITS - SECTION\_SIZE\_BITS)}

The `mem_section` objects are arranged in a two-dimensional array
called `mem_sections`. The size and placement of this array depend on
`CONFIG_SPARSEMEM_EXTREME` and the maximal possible number of
sections:

* When `CONFIG_SPARSEMEM_EXTREME` is disabled, the `mem_sections`
  array is static and has `NR_MEM_SECTIONS` rows. Each row holds a
  single `mem_section` object.

* When `CONFIG_SPARSEMEM_EXTREME` is enabled, the `mem_sections`
  array is dynamically allocated. Each row contains PAGE_SIZE worth of
  `mem_section` objects and the number of rows is calculated to fit
  all the memory sections.
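To make the sizing above concrete, here is a small worked example. The
constants match an x86-64 configuration with 4-level page tables, but
treat them as illustrative values rather than a normative reference:

.. code-block:: c

   /*
    * Illustrative SPARSEMEM sizing (values resemble x86-64 with
    * 4-level page tables; assumption for this example only).
    */
   #define SECTION_SIZE_BITS  27  /* each section covers 2^27 = 128 MiB */
   #define MAX_PHYSMEM_BITS   46  /* up to 64 TiB of physical addresses */
   #define PAGE_SHIFT         12  /* 4 KiB pages */

   /* 2^(46 - 27) = 524288 sections at most */
   #define NR_MEM_SECTIONS \
           (1UL << (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS))

   /* A PFN's section number is simply its high bits. */
   #define PFN_SECTION_SHIFT       (SECTION_SIZE_BITS - PAGE_SHIFT)
   #define pfn_to_section_nr(pfn)  ((pfn) >> PFN_SECTION_SHIFT)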
The architecture setup code should call sparse_init() to initialize
the memory sections and the memory maps.

With SPARSEMEM there are two possible ways to convert a PFN to the
corresponding `struct page` - the "classic sparse" and "sparse
vmemmap". The selection is made at build time and is determined by the
value of `CONFIG_SPARSEMEM_VMEMMAP`.

The classic sparse encodes the section number of a page in page->flags
and uses high bits of a PFN to access the section that maps that page
frame. Inside a section, the PFN is the index into the array of pages.
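A simplified sketch of the classic sparse conversions, modeled on
include/asm-generic/memory_model.h. The helper names are real, but the
encoding tricks hidden in `section_mem_map` and all error handling are
omitted here:

.. code-block:: c

   /* Simplified classic sparse conversion (sketch, not kernel code). */
   static inline struct page *example_pfn_to_page(unsigned long pfn)
   {
           /* The high PFN bits select the section... */
           struct mem_section *sec = __pfn_to_section(pfn);

           /*
            * ...whose decoded section_mem_map is pre-adjusted so that
            * indexing it with the full PFN yields the right page.
            */
           return __section_mem_map_addr(sec) + pfn;
   }

   static inline unsigned long example_page_to_pfn(const struct page *page)
   {
           /* The section number is recovered from page->flags. */
           struct mem_section *sec = __nr_to_section(page_to_section(page));

           return page - __section_mem_map_addr(sec);
   }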
The sparse vmemmap uses a virtually mapped memory map to optimize the
pfn_to_page and page_to_pfn operations. There is a global `struct page
*vmemmap` pointer that points to a virtually contiguous array of
`struct page` objects. A PFN is an index into that array and the
offset of the `struct page` from `vmemmap` is the PFN of that page.
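With vmemmap the conversions collapse to a single addition or
subtraction, which is the whole point of the scheme. This is
essentially what include/asm-generic/memory_model.h defines for this
case:

.. code-block:: c

   /* The memory map is virtually contiguous. */
   #define __pfn_to_page(pfn)   (vmemmap + (pfn))
   #define __page_to_pfn(page)  ((unsigned long)((page) - vmemmap))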
To use vmemmap, an architecture has to reserve a range of virtual
addresses that will map the physical pages containing the memory map
and make sure that `vmemmap` points to that range. In addition, the
architecture should implement the :c:func:`vmemmap_populate` method
that will allocate the physical memory and create page tables for the
virtual memory map. If an architecture does not have any special
requirements for the vmemmap mappings, it can use the default
:c:func:`vmemmap_populate_basepages` provided by the generic memory
management.

The virtually mapped memory map allows storing `struct page` objects
for persistent memory devices in pre-allocated storage on those
devices. This storage is represented with struct vmem_altmap that is
eventually passed to vmemmap_populate() through a long chain of
function calls. The vmemmap_populate() implementation may use the
`vmem_altmap` along with the :c:func:`vmemmap_alloc_block_buf` helper
to allocate the memory map on the persistent memory device.
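For an architecture with no special vmemmap requirements, the
implementation can simply delegate to the generic helper. A minimal
sketch, assuming the four-argument signature used by recent kernels
(older kernels differ):

.. code-block:: c

   #include <linux/mm.h>

   /*
    * Minimal vmemmap_populate() for a hypothetical architecture with
    * no special mapping requirements: delegate everything, including
    * any vmem_altmap-backed allocation, to the generic base-page
    * helper.
    */
   int __meminit vmemmap_populate(unsigned long start, unsigned long end,
                                  int node, struct vmem_altmap *altmap)
   {
           return vmemmap_populate_basepages(start, end, node, altmap);
   }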
ZONE_DEVICE
===========

The `ZONE_DEVICE` facility builds upon `SPARSEMEM_VMEMMAP` to offer
`struct page` `mem_map` services for device driver identified physical
address ranges. The "device" aspect of `ZONE_DEVICE` relates to the
fact that the page objects for these address ranges are never marked
online, and that a reference must be taken against the device, not
just the page, to keep the memory pinned for active use.
`ZONE_DEVICE`, via :c:func:`devm_memremap_pages`, performs just enough
memory hotplug to turn on :c:func:`pfn_to_page`, :c:func:`page_to_pfn`,
and :c:func:`get_user_pages` services for the given range of PFNs.
Since the page reference count never drops below 1, the page is never
tracked as free memory and the page's `struct list_head lru` space is
repurposed for back referencing to the host device / driver that
mapped the memory.

While `SPARSEMEM` presents memory as a collection of sections,
optionally collected into memory blocks, `ZONE_DEVICE` users need a
smaller granularity for populating the `mem_map`. Given that
`ZONE_DEVICE` memory is never marked online, it is never subject to
its memory ranges being exposed through the sysfs memory hotplug API
on memory block boundaries. The implementation relies on this lack of
user-API constraint to allow sub-section sized memory ranges to be
specified to :c:func:`arch_add_memory`, the top-half of memory
hotplug. Sub-section support allows for 2MB as the cross-arch common
alignment granularity for :c:func:`devm_memremap_pages`.
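A hedged sketch of how a driver might hand a device-owned physical
range to `ZONE_DEVICE`. This interface has changed across kernel
releases; the `struct dev_pagemap` layout below follows recent
kernels, and the device, resource, and pagemap type are illustrative
assumptions:

.. code-block:: c

   #include <linux/device.h>
   #include <linux/memremap.h>

   /*
    * Illustrative only: create struct pages for a device-owned
    * physical range via devm_memremap_pages().
    */
   static void *example_map_device_memory(struct device *dev,
                                          struct resource *res)
   {
           struct dev_pagemap *pgmap;

           pgmap = devm_kzalloc(dev, sizeof(*pgmap), GFP_KERNEL);
           if (!pgmap)
                   return ERR_PTR(-ENOMEM);

           pgmap->type = MEMORY_DEVICE_GENERIC;
           pgmap->range.start = res->start;
           pgmap->range.end = res->end;
           pgmap->nr_range = 1;

           /* Performs the hotplug; returns the mapped address. */
           return devm_memremap_pages(dev, pgmap);
   }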
The users of `ZONE_DEVICE` are:

* pmem: Map platform persistent memory to be used as a direct-I/O
  target via DAX mappings.

* hmm: Extend `ZONE_DEVICE` with `->page_fault()` and `->page_free()`
  event callbacks to allow a device driver to coordinate memory
  management events related to device memory, typically GPU memory.
  See Documentation/mm/hmm.rst.

* p2pdma: Create `struct page` objects to allow peer devices in a
  PCI/-E topology to coordinate direct-DMA operations between
  themselves, i.e. bypass host memory.