sphinx.addnodesdocument)}( rawsourcechildren]( translations LanguagesNode)}(hhh](h pending_xref)}(hhh]docutils.nodesTextChinese (Simplified)}parenthsba attributes}(ids]classes]names]dupnames]backrefs] refdomainstdreftypedoc reftarget1/translations/zh_CN/admin-guide/mm/memory-hotplugmodnameN classnameN refexplicitutagnamehhh ubh)}(hhh]hChinese (Traditional)}hh2sbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftarget1/translations/zh_TW/admin-guide/mm/memory-hotplugmodnameN classnameN refexplicituh1hhh ubh)}(hhh]hItalian}hhFsbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftarget1/translations/it_IT/admin-guide/mm/memory-hotplugmodnameN classnameN refexplicituh1hhh ubh)}(hhh]hJapanese}hhZsbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftarget1/translations/ja_JP/admin-guide/mm/memory-hotplugmodnameN classnameN refexplicituh1hhh ubh)}(hhh]hKorean}hhnsbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftarget1/translations/ko_KR/admin-guide/mm/memory-hotplugmodnameN classnameN refexplicituh1hhh ubh)}(hhh]hSpanish}hhsbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftarget1/translations/sp_SP/admin-guide/mm/memory-hotplugmodnameN classnameN refexplicituh1hhh ubeh}(h]h ]h"]h$]h&]current_languageEnglishuh1h hh _documenthsourceNlineNubhsection)}(hhh](htitle)}(hMemory Hot(Un)Plugh]hMemory Hot(Un)Plug}(hhhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhhhK/var/lib/git/docbuild/linux/Documentation/admin-guide/mm/memory-hotplug.rsthKubh paragraph)}(hThis document describes generic Linux support for memory hot(un)plug with a focus on System RAM, including ZONE_MOVABLE support.h]hThis document describes generic Linux support for memory hot(un)plug with a focus on System RAM, including ZONE_MOVABLE support.}(hhhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhhhhubhtopic)}(hhh]h bullet_list)}(hhh](h list_item)}(hhh](h)}(hhh]h reference)}(hhh]h Introduction}(hhhhhNhNubah}(h]id1ah ]h"]h$]h&]refid introductionuh1hhhubah}(h]h ]h"]h$]h&]uh1hhhubh)}(hhh](h)}(hhh]h)}(hhh]h)}(hhh]hMemory Hot(Un)Plug Granularity}(hhhhhNhNubah}(h]id2ah 
]h"]h$]h&]refidmemory-hot-un-plug-granularityuh1hhhubah}(h]h ]h"]h$]h&]uh1hhhubah}(h]h ]h"]h$]h&]uh1hhhubh)}(hhh]h)}(hhh]h)}(hhh]hPhases of Memory Hotplug}(hjhhhNhNubah}(h]id3ah ]h"]h$]h&]refidphases-of-memory-hotpluguh1hhjubah}(h]h ]h"]h$]h&]uh1hhjubah}(h]h ]h"]h$]h&]uh1hhhubh)}(hhh]h)}(hhh]h)}(hhh]hPhases of Memory Hotunplug}(hj>hhhNhNubah}(h]id4ah ]h"]h$]h&]refidphases-of-memory-hotunpluguh1hhj;ubah}(h]h ]h"]h$]h&]uh1hhj8ubah}(h]h ]h"]h$]h&]uh1hhhubeh}(h]h ]h"]h$]h&]uh1hhhubeh}(h]h ]h"]h$]h&]uh1hhhubh)}(hhh](h)}(hhh]h)}(hhh]hMemory Hotplug Notifications}(hjlhhhNhNubah}(h]id5ah ]h"]h$]h&]refidmemory-hotplug-notificationsuh1hhjiubah}(h]h ]h"]h$]h&]uh1hhjfubh)}(hhh](h)}(hhh]h)}(hhh]h)}(hhh]hACPI Notifications}(hjhhhNhNubah}(h]id6ah ]h"]h$]h&]refidacpi-notificationsuh1hhjubah}(h]h ]h"]h$]h&]uh1hhjubah}(h]h ]h"]h$]h&]uh1hhjubh)}(hhh]h)}(hhh]h)}(hhh]hManual Probing}(hjhhhNhNubah}(h]id7ah ]h"]h$]h&]refidmanual-probinguh1hhjubah}(h]h ]h"]h$]h&]uh1hhjubah}(h]h ]h"]h$]h&]uh1hhjubeh}(h]h ]h"]h$]h&]uh1hhjfubeh}(h]h ]h"]h$]h&]uh1hhhubh)}(hhh](h)}(hhh]h)}(hhh]h$Onlining and Offlining Memory Blocks}(hjhhhNhNubah}(h]id8ah ]h"]h$]h&]refid$onlining-and-offlining-memory-blocksuh1hhjubah}(h]h ]h"]h$]h&]uh1hhjubh)}(hhh](h)}(hhh]h)}(hhh]h)}(hhh]hOnlining Memory Blocks Manually}(hjhhhNhNubah}(h]id9ah ]h"]h$]h&]refidonlining-memory-blocks-manuallyuh1hhjubah}(h]h ]h"]h$]h&]uh1hhjubah}(h]h ]h"]h$]h&]uh1hhjubh)}(hhh]h)}(hhh]h)}(hhh]h$Onlining Memory Blocks Automatically}(hjhhhNhNubah}(h]id10ah ]h"]h$]h&]refid$onlining-memory-blocks-automaticallyuh1hhjubah}(h]h ]h"]h$]h&]uh1hhjubah}(h]h ]h"]h$]h&]uh1hhjubh)}(hhh]h)}(hhh]h)}(hhh]hOfflining Memory Blocks}(hj>hhhNhNubah}(h]id11ah ]h"]h$]h&]refidofflining-memory-blocksuh1hhj;ubah}(h]h ]h"]h$]h&]uh1hhj8ubah}(h]h ]h"]h$]h&]uh1hhjubh)}(hhh]h)}(hhh]h)}(hhh]h$Observing the State of Memory Blocks}(hj`hhhNhNubah}(h]id12ah ]h"]h$]h&]refid$observing-the-state-of-memory-blocksuh1hhj]ubah}(h]h ]h"]h$]h&]uh1hhjZubah}(h]h ]h"]h$]h&]uh1hhjubeh}(h]h 
]h"]h$]h&]uh1hhjubeh}(h]h ]h"]h$]h&]uh1hhhubh)}(hhh](h)}(hhh]h)}(hhh]hConfiguring Memory Hot(Un)Plug}(hjhhhNhNubah}(h]id13ah ]h"]h$]h&]refidconfiguring-memory-hot-un-pluguh1hhjubah}(h]h ]h"]h$]h&]uh1hhjubh)}(hhh](h)}(hhh]h)}(hhh]h)}(hhh]h*Memory Hot(Un)Plug Configuration via Sysfs}(hjhhhNhNubah}(h]id14ah ]h"]h$]h&]refid*memory-hot-un-plug-configuration-via-sysfsuh1hhjubah}(h]h ]h"]h$]h&]uh1hhjubah}(h]h ]h"]h$]h&]uh1hhjubh)}(hhh]h)}(hhh]h)}(hhh]h$Memory Block Configuration via Sysfs}(hjhhhNhNubah}(h]id15ah ]h"]h$]h&]refid$memory-block-configuration-via-sysfsuh1hhjubah}(h]h ]h"]h$]h&]uh1hhjubah}(h]h ]h"]h$]h&]uh1hhjubh)}(hhh]h)}(hhh]h)}(hhh]hCommand Line Parameters}(hjhhhNhNubah}(h]id16ah ]h"]h$]h&]refidcommand-line-parametersuh1hhjubah}(h]h ]h"]h$]h&]uh1hhjubah}(h]h ]h"]h$]h&]uh1hhjubh)}(hhh]h)}(hhh]h)}(hhh]hModule Parameters}(hjhhhNhNubah}(h]id17ah ]h"]h$]h&]refidmodule-parametersuh1hhjubah}(h]h ]h"]h$]h&]uh1hhj ubah}(h]h ]h"]h$]h&]uh1hhjubeh}(h]h ]h"]h$]h&]uh1hhjubeh}(h]h ]h"]h$]h&]uh1hhhubh)}(hhh](h)}(hhh]h)}(hhh]h ZONE_MOVABLE}(hjAhhhNhNubah}(h]id18ah ]h"]h$]h&]refid zone-movableuh1hhj>ubah}(h]h ]h"]h$]h&]uh1hhj;ubh)}(hhh](h)}(hhh]h)}(hhh]h)}(hhh]hZone Imbalances}(hj`hhhNhNubah}(h]id19ah ]h"]h$]h&]refidzone-imbalancesuh1hhj]ubah}(h]h ]h"]h$]h&]uh1hhjZubah}(h]h ]h"]h$]h&]uh1hhjWubh)}(hhh]h)}(hhh]h)}(hhh]h"ZONE_MOVABLE Sizing Considerations}(hjhhhNhNubah}(h]id20ah ]h"]h$]h&]refid"zone-movable-sizing-considerationsuh1hhjubah}(h]h ]h"]h$]h&]uh1hhj|ubah}(h]h ]h"]h$]h&]uh1hhjWubh)}(hhh]h)}(hhh]h)}(hhh]h!Memory Offlining and ZONE_MOVABLE}(hjhhhNhNubah}(h]id21ah ]h"]h$]h&]refid!memory-offlining-and-zone-movableuh1hhjubah}(h]h ]h"]h$]h&]uh1hhjubah}(h]h ]h"]h$]h&]uh1hhjWubeh}(h]h ]h"]h$]h&]uh1hhj;ubeh}(h]h ]h"]h$]h&]uh1hhhubeh}(h]h ]h"]h$]h&]uh1hhhhhhNhNubah}(h]contentsah ](contentslocaleh"]contentsah$]h&]uh1hhhhKhhhhubh)}(hhh](h)}(h Introductionh]h Introduction}(hjhhhNhNubah}(h]h ]h"]h$]h&]refidhuh1hhjhhhhhK ubh)}(hMemory hot(un)plug allows for increasing and decreasing 
the size of physical memory available to a machine at runtime. In the simplest case, it consists of physically plugging or unplugging a DIMM at runtime, coordinated with the operating system.h]hMemory hot(un)plug allows for increasing and decreasing the size of physical memory available to a machine at runtime. In the simplest case, it consists of physically plugging or unplugging a DIMM at runtime, coordinated with the operating system.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK hjhhubh)}(h0Memory hot(un)plug is used for various purposes:h]h0Memory hot(un)plug is used for various purposes:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hhh](h)}(hThe physical memory available to a machine can be adjusted at runtime, up- or downgrading the memory capacity. This dynamic memory resizing, sometimes referred to as "capacity on demand", is frequently used with virtual machines and logical partitions. h]h)}(hThe physical memory available to a machine can be adjusted at runtime, up- or downgrading the memory capacity. This dynamic memory resizing, sometimes referred to as "capacity on demand", is frequently used with virtual machines and logical partitions.h]hXThe physical memory available to a machine can be adjusted at runtime, up- or downgrading the memory capacity. This dynamic memory resizing, sometimes referred to as “capacity on demand”, is frequently used with virtual machines and logical partitions.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhj ubah}(h]h ]h"]h$]h&]uh1hhj hhhhhNubh)}(hzReplacing hardware, such as DIMMs or whole NUMA nodes, without downtime. One example is replacing failing memory modules. h]h)}(hyReplacing hardware, such as DIMMs or whole NUMA nodes, without downtime. One example is replacing failing memory modules.h]hyReplacing hardware, such as DIMMs or whole NUMA nodes, without downtime. 
One example is replacing failing memory modules.}(hj)hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhj%ubah}(h]h ]h"]h$]h&]uh1hhj hhhhhNubh)}(hReducing energy consumption either by physically unplugging memory modules or by logically unplugging (parts of) memory modules from Linux. h]h)}(hReducing energy consumption either by physically unplugging memory modules or by logically unplugging (parts of) memory modules from Linux.h]hReducing energy consumption either by physically unplugging memory modules or by logically unplugging (parts of) memory modules from Linux.}(hjAhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhj=ubah}(h]h ]h"]h$]h&]uh1hhj hhhhhNubeh}(h]h ]h"]h$]h&]bullet-uh1hhhhKhjhhubh)}(hFurther, the basic memory hot(un)plug infrastructure in Linux is nowadays also used to expose persistent memory, other performance-differentiated memory and reserved memory regions as ordinary system RAM to Linux.h]hFurther, the basic memory hot(un)plug infrastructure in Linux is nowadays also used to expose persistent memory, other performance-differentiated memory and reserved memory regions as ordinary system RAM to Linux.}(hj]hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hpLinux only supports memory hot(un)plug on selected 64 bit architectures, such as x86_64, arm64, ppc64 and s390x.h]hpLinux only supports memory hot(un)plug on selected 64 bit architectures, such as x86_64, arm64, ppc64 and s390x.}(hjkhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK#hjhhubh)}(hhh](h)}(hMemory Hot(Un)Plug Granularityh]hMemory Hot(Un)Plug Granularity}(hj|hhhNhNubah}(h]h ]h"]h$]h&]jjuh1hhjyhhhhhK'ubh)}(hX Memory hot(un)plug in Linux uses the SPARSEMEM memory model, which divides the physical memory address space into chunks of the same size: memory sections. The size of a memory section is architecture dependent. For example, x86_64 uses 128 MiB and ppc64 uses 16 MiB.h]hX Memory hot(un)plug in Linux uses the SPARSEMEM memory model, which divides the physical memory address space into chunks of the same size: memory sections. 
The size of a memory section is architecture dependent. For example, x86_64 uses 128 MiB and ppc64 uses 16 MiB.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK)hjyhhubh)}(hX8Memory sections are combined into chunks referred to as "memory blocks". The size of a memory block is architecture dependent and corresponds to the smallest granularity that can be hot(un)plugged. The default size of a memory block is the same as memory section size, unless an architecture specifies otherwise.h]hX<Memory sections are combined into chunks referred to as “memory blocks”. The size of a memory block is architecture dependent and corresponds to the smallest granularity that can be hot(un)plugged. The default size of a memory block is the same as memory section size, unless an architecture specifies otherwise.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK.hjyhhubh)}(h%All memory blocks have the same size.h]h%All memory blocks have the same size.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK3hjyhhubeh}(h]j ah ]h"]memory hot(un)plug granularityah$]h&]uh1hhjhhhhhK'ubh)}(hhh](h)}(hPhases of Memory Hotplugh]hPhases of Memory Hotplug}(hjhhhNhNubah}(h]h ]h"]h$]h&]jj%uh1hhjhhhhhK6ubh)}(h&Memory hotplug consists of two phases:h]h&Memory hotplug consists of two phases:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK8hjhhubhenumerated_list)}(hhh](h)}(hAdding the memory to Linuxh]h)}(hjh]hAdding the memory to Linux}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK:hjubah}(h]h ]h"]h$]h&]uh1hhjhhhhhNubh)}(hOnlining memory blocks h]h)}(hOnlining memory blocksh]hOnlining memory blocks}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK;hjubah}(h]h ]h"]h$]h&]uh1hhjhhhhhNubeh}(h]h ]h"]h$]h&]enumtypearabicprefix(suffix)uh1jhjhhhhhK:ubh)}(hIn the first phase, metadata, such as the memory map ("memmap") and page tables for the direct mapping, is allocated and initialized, and memory blocks are created; the latter also creates sysfs files for managing newly created memory blocks.h]hIn the first phase, metadata, such as the memory map (“memmap”) and page tables for 
the direct mapping, is allocated and initialized, and memory blocks are created; the latter also creates sysfs files for managing newly created memory blocks.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK=hjhhubh)}(hIn the second phase, added memory is exposed to the page allocator. After this phase, the memory is visible in memory statistics, such as free and total memory, of the system.h]hIn the second phase, added memory is exposed to the page allocator. After this phase, the memory is visible in memory statistics, such as free and total memory, of the system.}(hj(hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKBhjhhubeh}(h]j+ah ]h"]phases of memory hotplugah$]h&]uh1hhjhhhhhK6ubh)}(hhh](h)}(hPhases of Memory Hotunplugh]hPhases of Memory Hotunplug}(hj@hhhNhNubah}(h]h ]h"]h$]h&]jjGuh1hhj=hhhhhKGubh)}(h(Memory hotunplug consists of two phases:h]h(Memory hotunplug consists of two phases:}(hjNhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKIhj=hhubj)}(hhh](h)}(hOfflining memory blocksh]h)}(hjah]hOfflining memory blocks}(hjchhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKKhj_ubah}(h]h ]h"]h$]h&]uh1hhj\hhhhhNubh)}(hRemoving the memory from Linux h]h)}(hRemoving the memory from Linuxh]hRemoving the memory from Linux}(hjzhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKLhjvubah}(h]h ]h"]h$]h&]uh1hhj\hhhhhNubeh}(h]h ]h"]h$]h&]jjjjjjuh1jhj=hhhhhKKubh)}(hXIn the first phase, memory is "hidden" from the page allocator again, for example, by migrating busy memory to other memory locations and removing all relevant free pages from the page allocator After this phase, the memory is no longer visible in memory statistics of the system.h]hXIn the first phase, memory is “hidden” from the page allocator again, for example, by migrating busy memory to other memory locations and removing all relevant free pages from the page allocator After this phase, the memory is no longer visible in memory statistics of the system.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKNhj=hhubh)}(hIIn the second phase, the memory blocks are removed and metadata is freed.h]hIIn 
the second phase, the memory blocks are removed and metadata is freed.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKShj=hhubeh}(h]jMah ]h"]phases of memory hotunplugah$]h&]uh1hhjhhhhhKGubeh}(h]hah ]h"] introductionah$]h&]uh1hhhhhhhhK ubh)}(hhh](h)}(hMemory Hotplug Notificationsh]hMemory Hotplug Notifications}(hjhhhNhNubah}(h]h ]h"]h$]h&]jjuuh1hhjhhhhhKVubh)}(hX There are various ways how Linux is notified about memory hotplug events such that it can start adding hotplugged memory. This description is limited to systems that support ACPI; mechanisms specific to other firmware interfaces or virtual machines are not described.h]hX There are various ways how Linux is notified about memory hotplug events such that it can start adding hotplugged memory. This description is limited to systems that support ACPI; mechanisms specific to other firmware interfaces or virtual machines are not described.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKXhjhhubh)}(hhh](h)}(hACPI Notificationsh]hACPI Notifications}(hjhhhNhNubah}(h]h ]h"]h$]h&]jjuh1hhjhhhhhK^ubh)}(h_Platforms that support ACPI, such as x86_64, can support memory hotplug notifications via ACPI.h]h_Platforms that support ACPI, such as x86_64, can support memory hotplug notifications via ACPI.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK`hjhhubh)}(hIn general, a firmware supporting memory hotplug defines a memory class object HID "PNP0C80". When notified about hotplug of a new memory device, the ACPI driver will hotplug the memory to Linux.h]hIn general, a firmware supporting memory hotplug defines a memory class object HID “PNP0C80”. When notified about hotplug of a new memory device, the ACPI driver will hotplug the memory to Linux.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKchjhhubh)}(hIf the firmware supports hotplug of NUMA nodes, it defines an object _HID "ACPI0004", "PNP0A05", or "PNP0A06". 
When notified about an hotplug event, all assigned memory devices are added to Linux by the ACPI driver.h]hIf the firmware supports hotplug of NUMA nodes, it defines an object _HID “ACPI0004”, “PNP0A05”, or “PNP0A06”. When notified about an hotplug event, all assigned memory devices are added to Linux by the ACPI driver.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKghjhhubh)}(hSimilarly, Linux can be notified about requests to hotunplug a memory device or a NUMA node via ACPI. The ACPI driver will try offlining all relevant memory blocks, and, if successful, hotunplug the memory from Linux.h]hSimilarly, Linux can be notified about requests to hotunplug a memory device or a NUMA node via ACPI. The ACPI driver will try offlining all relevant memory blocks, and, if successful, hotunplug the memory from Linux.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKkhjhhubeh}(h]jah ]h"]acpi notificationsah$]h&]uh1hhjhhhhhK^ubh)}(hhh](h)}(hManual Probingh]hManual Probing}(hj0hhhNhNubah}(h]h ]h"]h$]h&]jjuh1hhj-hhhhhKpubh)}(hOn some architectures, the firmware may not be able to notify the operating system about a memory hotplug event. Instead, the memory has to be manually probed from user space.h]hOn some architectures, the firmware may not be able to notify the operating system about a memory hotplug event. Instead, the memory has to be manually probed from user space.}(hj>hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKrhj-hhubh)}(h#The probe interface is located at::h]h"The probe interface is located at:}(hjLhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKvhj-hhubh literal_block)}(h /sys/devices/system/memory/probeh]h /sys/devices/system/memory/probe}hj\sbah}(h]h ]h"]h$]h&] xml:spacepreserveuh1jZhhhKxhj-hhubh)}(hOnly complete memory blocks can be probed. Individual memory blocks are probed by providing the physical start address of the memory block::h]hOnly complete memory blocks can be probed. 
Individual memory blocks are probed by providing the physical start address of the memory block:}(hjlhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKzhj-hhubj[)}(h.% echo addr > /sys/devices/system/memory/probeh]h.% echo addr > /sys/devices/system/memory/probe}hjzsbah}(h]h ]h"]h$]h&]jjjkuh1jZhhhK}hj-hhubh)}(h]Which results in a memory block for the range [addr, addr + memory_block_size) being created.h]h]Which results in a memory block for the range [addr, addr + memory_block_size) being created.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhj-hhubhnote)}(hUsing the probe interface is discouraged as it is easy to crash the kernel, because Linux cannot validate user input; this interface might be removed in the future.h]h)}(hUsing the probe interface is discouraged as it is easy to crash the kernel, because Linux cannot validate user input; this interface might be removed in the future.h]hUsing the probe interface is discouraged as it is easy to crash the kernel, because Linux cannot validate user input; this interface might be removed in the future.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjubah}(h]h ]h"]h$]h&]uh1jhj-hhhhhNubeh}(h]jah ]h"]manual probingah$]h&]uh1hhjhhhhhKpubeh}(h]j{ah ]h"]memory hotplug notificationsah$]h&]uh1hhhhhhhhKVubh)}(hhh](h)}(h$Onlining and Offlining Memory Blocksh]h$Onlining and Offlining Memory Blocks}(hjhhhNhNubah}(h]h ]h"]h$]h&]jjuh1hhjhhhhhKubh)}(hAfter a memory block has been created, Linux has to be instructed to actually make use of that memory: the memory block has to be "online".h]hAfter a memory block has been created, Linux has to be instructed to actually make use of that memory: the memory block has to be “online”.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hBefore a memory block can be removed, Linux has to stop using any memory part of the memory block: the memory block has to be "offlined".h]hBefore a memory block can be removed, Linux has to stop using any memory part of the memory block: the memory block has to be 
“offlined”.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hX?The Linux kernel can be configured to automatically online added memory blocks and drivers automatically trigger offlining of memory blocks when trying hotunplug of memory. Memory blocks can only be removed once offlining succeeded and drivers may trigger offlining of memory blocks when attempting hotunplug of memory.h]hX?The Linux kernel can be configured to automatically online added memory blocks and drivers automatically trigger offlining of memory blocks when trying hotunplug of memory. Memory blocks can only be removed once offlining succeeded and drivers may trigger offlining of memory blocks when attempting hotunplug of memory.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hhh](h)}(hOnlining Memory Blocks Manuallyh]hOnlining Memory Blocks Manually}(hjhhhNhNubah}(h]h ]h"]h$]h&]jjuh1hhjhhhhhKubh)}(hIf auto-onlining of memory blocks isn't enabled, user-space has to manually trigger onlining of memory blocks. Often, udev rules are used to automate this task in user space.h]hIf auto-onlining of memory blocks isn’t enabled, user-space has to manually trigger onlining of memory blocks. 
Often, udev rules are used to automate this task in user space.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(h1Onlining of a memory block can be triggered via::h]h0Onlining of a memory block can be triggered via:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubj[)}(h:% echo online > /sys/devices/system/memory/memoryXXX/stateh]h:% echo online > /sys/devices/system/memory/memoryXXX/state}hj&sbah}(h]h ]h"]h$]h&]jjjkuh1jZhhhKhjhhubh)}(hOr alternatively::h]hOr alternatively:}(hj4hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubj[)}(h6% echo 1 > /sys/devices/system/memory/memoryXXX/onlineh]h6% echo 1 > /sys/devices/system/memory/memoryXXX/online}hjBsbah}(h]h ]h"]h$]h&]jjjkuh1jZhhhKhjhhubh)}(hdThe kernel will select the target zone automatically, depending on the configured ``online_policy``.h](hRThe kernel will select the target zone automatically, depending on the configured }(hjPhhhNhNubhliteral)}(h``online_policy``h]h online_policy}(hjZhhhNhNubah}(h]h ]h"]h$]h&]uh1jXhjPubh.}(hjPhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hVOne can explicitly request to associate an offline memory block with ZONE_MOVABLE by::h]hUOne can explicitly request to associate an offline memory block with ZONE_MOVABLE by:}(hjrhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubj[)}(hB% echo online_movable > /sys/devices/system/memory/memoryXXX/stateh]hB% echo online_movable > /sys/devices/system/memory/memoryXXX/state}hjsbah}(h]h ]h"]h$]h&]jjjkuh1jZhhhKhjhhubh)}(hFOr one can explicitly request a kernel zone (usually ZONE_NORMAL) by::h]hEOr one can explicitly request a kernel zone (usually ZONE_NORMAL) by:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubj[)}(hA% echo online_kernel > /sys/devices/system/memory/memoryXXX/stateh]hA% echo online_kernel > /sys/devices/system/memory/memoryXXX/state}hjsbah}(h]h ]h"]h$]h&]jjjkuh1jZhhhKhjhhubh)}(hIn any case, if onlining succeeds, the state of the memory block is changed to be "online". 
If it fails, the state of the memory block will remain unchanged and the above commands will fail.h]hIn any case, if onlining succeeds, the state of the memory block is changed to be “online”. If it fails, the state of the memory block will remain unchanged and the above commands will fail.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubeh}(h]j ah ]h"]onlining memory blocks manuallyah$]h&]uh1hhjhhhhhKubh)}(hhh](h)}(h$Onlining Memory Blocks Automaticallyh]h$Onlining Memory Blocks Automatically}(hjhhhNhNubah}(h]h ]h"]h$]h&]jj%uh1hhjhhhhhKubh)}(hThe kernel can be configured to try auto-onlining of newly added memory blocks. If this feature is disabled, the memory blocks will stay offline until explicitly onlined from user space.h]hThe kernel can be configured to try auto-onlining of newly added memory blocks. If this feature is disabled, the memory blocks will stay offline until explicitly onlined from user space.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(h9The configured auto-online behavior can be observed via::h]h8The configured auto-online behavior can be observed via:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubj[)}(h3% cat /sys/devices/system/memory/auto_online_blocksh]h3% cat /sys/devices/system/memory/auto_online_blocks}hjsbah}(h]h ]h"]h$]h&]jjjkuh1jZhhhKhjhhubh)}(hpAuto-onlining can be enabled by writing ``online``, ``online_kernel`` or ``online_movable`` to that file, like::h](h(Auto-onlining can be enabled by writing }(hjhhhNhNubjY)}(h ``online``h]honline}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jXhjubh, }(hjhhhNhNubjY)}(h``online_kernel``h]h online_kernel}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jXhjubh or }(hjhhhNhNubjY)}(h``online_movable``h]honline_movable}(hj&hhhNhNubah}(h]h ]h"]h$]h&]uh1jXhjubh to that file, like:}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhjhhubj[)}(h=% echo online > /sys/devices/system/memory/auto_online_blocksh]h=% echo online > /sys/devices/system/memory/auto_online_blocks}hj>sbah}(h]h ]h"]h$]h&]jjjkuh1jZhhhKhjhhubh)}(hSimilarly to manual onlining, with 
``online`` the kernel will select the target zone automatically, depending on the configured ``online_policy``.h](h#Similarly to manual onlining, with }(hjLhhhNhNubjY)}(h ``online``h]honline}(hjThhhNhNubah}(h]h ]h"]h$]h&]uh1jXhjLubhS the kernel will select the target zone automatically, depending on the configured }(hjLhhhNhNubjY)}(h``online_policy``h]h online_policy}(hjfhhhNhNubah}(h]h ]h"]h$]h&]uh1jXhjLubh.}(hjLhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(h^Modifying the auto-online behavior will only affect all subsequently added memory blocks only.h]h^Modifying the auto-online behavior will only affect all subsequently added memory blocks only.}(hj~hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubj)}(hIn corner cases, auto-onlining can fail. The kernel won't retry. Note that auto-onlining is not expected to fail in default configurations.h]h)}(hIn corner cases, auto-onlining can fail. The kernel won't retry. Note that auto-onlining is not expected to fail in default configurations.h]hIn corner cases, auto-onlining can fail. The kernel won’t retry. 
Note that auto-onlining is not expected to fail in default configurations.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(hDLPAR on ppc64 ignores the ``offline`` setting and will still online added memory blocks; if onlining fails, memory blocks are removed again.h]h)}(hDLPAR on ppc64 ignores the ``offline`` setting and will still online added memory blocks; if onlining fails, memory blocks are removed again.h](hDLPAR on ppc64 ignores the }(hjhhhNhNubjY)}(h ``offline``h]hoffline}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jXhjubhg setting and will still online added memory blocks; if onlining fails, memory blocks are removed again.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubeh}(h]j+ah ]h"]$onlining memory blocks automaticallyah$]h&]uh1hhjhhhhhKubh)}(hhh](h)}(hOfflining Memory Blocksh]hOfflining Memory Blocks}(hjhhhNhNubah}(h]h ]h"]h$]h&]jjGuh1hhjhhhhhKubh)}(hX In the current implementation, Linux's memory offlining will try migrating all movable pages off the affected memory block. As most kernel allocations, such as page tables, are unmovable, page migration can fail and, therefore, inhibit memory offlining from succeeding.h]hXIn the current implementation, Linux’s memory offlining will try migrating all movable pages off the affected memory block. 
As most kernel allocations, such as page tables, are unmovable, page migration can fail and, therefore, inhibit memory offlining from succeeding.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hHaving the memory provided by memory block managed by ZONE_MOVABLE significantly increases memory offlining reliability; still, memory offlining can fail in some corner cases.h]hHaving the memory provided by memory block managed by ZONE_MOVABLE significantly increases memory offlining reliability; still, memory offlining can fail in some corner cases.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hcFurther, memory offlining might retry for a long time (or even forever), until aborted by the user.h]hcFurther, memory offlining might retry for a long time (or even forever), until aborted by the user.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(h2Offlining of a memory block can be triggered via::h]h1Offlining of a memory block can be triggered via:}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubj[)}(h;% echo offline > /sys/devices/system/memory/memoryXXX/stateh]h;% echo offline > /sys/devices/system/memory/memoryXXX/state}hj sbah}(h]h ]h"]h$]h&]jjjkuh1jZhhhKhjhhubh)}(hOr alternatively::h]hOr alternatively:}(hj, hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubj[)}(h6% echo 0 > /sys/devices/system/memory/memoryXXX/onlineh]h6% echo 0 > /sys/devices/system/memory/memoryXXX/online}hj: sbah}(h]h ]h"]h$]h&]jjjkuh1jZhhhKhjhhubh)}(hIf offlining succeeds, the state of the memory block is changed to be "offline". If it fails, the state of the memory block will remain unchanged and the above commands will fail, for example, via::h]hIf offlining succeeds, the state of the memory block is changed to be “offline”. 
If it fails, the state of the memory block will remain unchanged and the above commands will fail, for example, via:}(hjH hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubj[)}(h0bash: echo: write error: Device or resource busyh]h0bash: echo: write error: Device or resource busy}hjV sbah}(h]h ]h"]h$]h&]jjjkuh1jZhhhKhjhhubh)}(hor via::h]hor via:}(hjd hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubj[)}(h)bash: echo: write error: Invalid argumenth]h)bash: echo: write error: Invalid argument}hjr sbah}(h]h ]h"]h$]h&]jjjkuh1jZhhhKhjhhubeh}(h]jMah ]h"]offlining memory blocksah$]h&]uh1hhjhhhhhKubh)}(hhh](h)}(h$Observing the State of Memory Blocksh]h$Observing the State of Memory Blocks}(hj hhhNhNubah}(h]h ]h"]h$]h&]jjiuh1hhj hhhhhKubh)}(hWThe state (online/offline/going-offline) of a memory block can be observed either via::h]hVThe state (online/offline/going-offline) of a memory block can be observed either via:}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhj hhubj[)}(h0% cat /sys/devices/system/memory/memoryXXX/stateh]h0% cat /sys/devices/system/memory/memoryXXX/state}hj sbah}(h]h ]h"]h$]h&]jjjkuh1jZhhhKhj hhubh)}(hOr alternatively (1/0) via::h]hOr alternatively (1/0) via:}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhj hhubj[)}(h1% cat /sys/devices/system/memory/memoryXXX/onlineh]h1% cat /sys/devices/system/memory/memoryXXX/online}hj sbah}(h]h ]h"]h$]h&]jjjkuh1jZhhhMhj hhubh)}(hCFor an online memory block, the managing zone can be observed via::h]hBFor an online memory block, the managing zone can be observed via:}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj hhubj[)}(h6% cat /sys/devices/system/memory/memoryXXX/valid_zonesh]h6% cat /sys/devices/system/memory/memoryXXX/valid_zones}hj sbah}(h]h ]h"]h$]h&]jjjkuh1jZhhhMhj hhubeh}(h]joah ]h"]$observing the state of memory blocksah$]h&]uh1hhjhhhhhKubeh}(h]jah ]h"]$onlining and offlining memory blocksah$]h&]uh1hhhhhhhhKubh)}(hhh](h)}(hConfiguring Memory Hot(Un)Plugh]hConfiguring Memory Hot(Un)Plug}(hj hhhNhNubah}(h]h ]h"]h$]h&]jjuh1hhj hhhhhMubh)}(hThere are 
various ways how system administrators can configure memory hot(un)plug and interact with memory blocks, especially, to online them.h]hThere are various ways how system administrators can configure memory hot(un)plug and interact with memory blocks, especially, to online them.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM hj hhubh)}(hhh](h)}(h*Memory Hot(Un)Plug Configuration via Sysfsh]h*Memory Hot(Un)Plug Configuration via Sysfs}(hj hhhNhNubah}(h]h ]h"]h$]h&]jjuh1hhj hhhhhMubh)}(hPSome memory hot(un)plug properties can be configured or inspected via sysfs in::h]hOSome memory hot(un)plug properties can be configured or inspected via sysfs in:}(hj* hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj hhubj[)}(h/sys/devices/system/memory/h]h/sys/devices/system/memory/}hj8 sbah}(h]h ]h"]h$]h&]jjjkuh1jZhhhMhj hhubh)}(h*The following files are currently defined:h]h*The following files are currently defined:}(hjF hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj hhubhtable)}(hhh]htgroup)}(hhh](hcolspec)}(hhh]h}(h]h ]h"]h$]h&]colwidthKuh1j^ hj[ ubj_ )}(hhh]h}(h]h ]h"]h$]h&]colwidthK9uh1j^ hj[ ubhtbody)}(hhh](hrow)}(hhh](hentry)}(hhh]h)}(h``auto_online_blocks``h]jY)}(hj h]hauto_online_blocks}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jXhj ubah}(h]h ]h"]h$]h&]uh1hhhhMhj ubah}(h]h ]h"]h$]h&]uh1j~ hj{ ubj )}(hhh](h)}(hWread-write: set or get the default state of new memory blocks; configure auto-onlining.h]hWread-write: set or get the default state of new memory blocks; configure auto-onlining.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj ubh)}(h]The default value depends on the CONFIG_MHP_DEFAULT_ONLINE_TYPE kernel configuration options.h]h]The default value depends on the CONFIG_MHP_DEFAULT_ONLINE_TYPE kernel configuration options.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj ubh)}(h8See the ``state`` property of memory blocks for details.h](hSee the }(hj hhhNhNubjY)}(h ``state``h]hstate}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jXhj ubh' property of memory blocks for details.}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhj ubeh}(h]h 
]h"]h$]h&]uh1j~ hj{ ubeh}(h]h ]h"]h$]h&]uh1jy hjv ubjz )}(hhh](j )}(hhh]h)}(h``block_size_bytes``h]jY)}(hj h]hblock_size_bytes}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jXhj ubah}(h]h ]h"]h$]h&]uh1hhhhMhj ubah}(h]h ]h"]h$]h&]uh1j~ hj ubj )}(hhh]h)}(h/read-only: the size in bytes of a memory block.h]h/read-only: the size in bytes of a memory block.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj ubah}(h]h ]h"]h$]h&]uh1j~ hj ubeh}(h]h ]h"]h$]h&]uh1jy hjv ubjz )}(hhh](j )}(hhh]h)}(h ``probe``h]jY)}(hj3 h]hprobe}(hj5 hhhNhNubah}(h]h ]h"]h$]h&]uh1jXhj1 ubah}(h]h ]h"]h$]h&]uh1hhhhM hj. ubah}(h]h ]h"]h$]h&]uh1j~ hj+ ubj )}(hhh](h)}(hpwrite-only: add (probe) selected memory blocks manually from user space by supplying the physical start address.h]hpwrite-only: add (probe) selected memory blocks manually from user space by supplying the physical start address.}(hjQ hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM hjN ubh)}(hQAvailability depends on the CONFIG_ARCH_MEMORY_PROBE kernel configuration option.h]hQAvailability depends on the CONFIG_ARCH_MEMORY_PROBE kernel configuration option.}(hj_ hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM#hjN ubeh}(h]h ]h"]h$]h&]uh1j~ hj+ ubeh}(h]h ]h"]h$]h&]uh1jy hjv ubjz )}(hhh](j )}(hhh]h)}(h ``uevent``h]jY)}(hj h]huevent}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jXhj ubah}(h]h ]h"]h$]h&]uh1hhhhM%hj| ubah}(h]h ]h"]h$]h&]uh1j~ hjy ubj )}(hhh]h)}(h4read-write: generic udev file for device subsystems.h]h4read-write: generic udev file for device subsystems.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM%hj ubah}(h]h ]h"]h$]h&]uh1j~ hjy ubeh}(h]h ]h"]h$]h&]uh1jy hjv ubjz )}(hhh](j )}(hhh]h)}(h``crash_hotplug``h]jY)}(hj h]h crash_hotplug}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jXhj ubah}(h]h ]h"]h$]h&]uh1hhhhM&hj ubah}(h]h ]h"]h$]h&]uh1j~ hj ubj )}(hhh](h)}(hX%read-only: when changes to the system memory map occur due to hot un/plug of memory, this file contains '1' if the kernel updates the kdump capture kernel memory map itself (via elfcorehdr and other relevant kexec segments), or '0' if userspace must 
update the kdump capture kernel memory map.h]hX-read-only: when changes to the system memory map occur due to hot un/plug of memory, this file contains ‘1’ if the kernel updates the kdump capture kernel memory map itself (via elfcorehdr and other relevant kexec segments), or ‘0’ if userspace must update the kdump capture kernel memory map.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM&hj ubh)}(hNAvailability depends on the CONFIG_MEMORY_HOTPLUG kernel configuration option.h]hNAvailability depends on the CONFIG_MEMORY_HOTPLUG kernel configuration option.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM-hj ubeh}(h]h ]h"]h$]h&]uh1j~ hj ubeh}(h]h ]h"]h$]h&]uh1jy hjv ubeh}(h]h ]h"]h$]h&]uh1jt hj[ ubeh}(h]h ]h"]h$]h&]colsKuh1jY hjV ubah}(h]h ]h"]h$]h&]uh1jT hj hhhhhNubj)}(hXLWhen the CONFIG_MEMORY_FAILURE kernel configuration option is enabled, two additional files ``hard_offline_page`` and ``soft_offline_page`` are available to trigger hwpoisoning of pages, for example, for testing purposes. Note that this functionality is not really related to memory hot(un)plug or actual offlining of memory blocks.h]h)}(hXLWhen the CONFIG_MEMORY_FAILURE kernel configuration option is enabled, two additional files ``hard_offline_page`` and ``soft_offline_page`` are available to trigger hwpoisoning of pages, for example, for testing purposes. Note that this functionality is not really related to memory hot(un)plug or actual offlining of memory blocks.h](h\When the CONFIG_MEMORY_FAILURE kernel configuration option is enabled, two additional files }(hj hhhNhNubjY)}(h``hard_offline_page``h]hhard_offline_page}(hj& hhhNhNubah}(h]h ]h"]h$]h&]uh1jXhj ubh and }(hj hhhNhNubjY)}(h``soft_offline_page``h]hsoft_offline_page}(hj8 hhhNhNubah}(h]h ]h"]h$]h&]uh1jXhj ubh are available to trigger hwpoisoning of pages, for example, for testing purposes. 
Note that this functionality is not really related to memory hot(un)plug or actual offlining of memory blocks.}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM3hj ubah}(h]h ]h"]h$]h&]uh1jhj hhhhhNubeh}(h]jah ]h"]*memory hot(un)plug configuration via sysfsah$]h&]uh1hhj hhhhhMubh)}(hhh](h)}(h$Memory Block Configuration via Sysfsh]h$Memory Block Configuration via Sysfs}(hj` hhhNhNubah}(h]h ]h"]h$]h&]jjuh1hhj] hhhhhM:ubh)}(hEach memory block is represented as a memory block device that can be onlined or offlined. All memory blocks have their device information located in sysfs. Each present memory block is listed under ``/sys/devices/system/memory`` as::h](hEach memory block is represented as a memory block device that can be onlined or offlined. All memory blocks have their device information located in sysfs. Each present memory block is listed under }(hjn hhhNhNubjY)}(h``/sys/devices/system/memory``h]h/sys/devices/system/memory}(hjv hhhNhNubah}(h]h ]h"]h$]h&]uh1jXhjn ubh as:}(hjn hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM<hj] hhubj[)}(h$/sys/devices/system/memory/memoryXXXh]h$/sys/devices/system/memory/memoryXXX}hj sbah}(h]h ]h"]h$]h&]jjjkuh1jZhhhMAhj] hhubh)}(hCwhere XXX is the memory block id; the number of digits is variable.h]hCwhere XXX is the memory block id; the number of digits is variable.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMChj] hhubh)}(hA present memory block indicates that some memory in the range is present; however, a memory block might span memory holes. A memory block spanning memory holes cannot be offlined.h]hA present memory block indicates that some memory in the range is present; however, a memory block might span memory holes. A memory block spanning memory holes cannot be offlined.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMEhj] hhubh)}(hFor example, assume 1 GiB memory block size. A device for a memory starting at 0x100000000 is ``/sys/devices/system/memory/memory4``::h](h^For example, assume 1 GiB memory block size. 
A device for a memory starting at 0x100000000 is }(hj hhhNhNubjY)}(h&``/sys/devices/system/memory/memory4``h]h"/sys/devices/system/memory/memory4}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jXhj ubh:}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMIhj] hhubj[)}(h(0x100000000 / 1Gib = 4)h]h(0x100000000 / 1Gib = 4)}hj sbah}(h]h ]h"]h$]h&]jjjkuh1jZhhhMLhj] hhubh)}(h>This device covers address range [0x100000000 ... 0x140000000)h]h>This device covers address range [0x100000000 ... 0x140000000)}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMNhj] hhubh)}(h*The following files are currently defined:h]h*The following files are currently defined:}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMPhj] hhubjU )}(hhh]jZ )}(hhh](j_ )}(hhh]h}(h]h ]h"]h$]h&]colwidthKuh1j^ hj ubj_ )}(hhh]h}(h]h ]h"]h$]h&]colwidthK ../../memory/memory9 A backlink will also be created:: /sys/devices/system/memory/memory9/node0 -> ../../node/node0h](h)}(hIf the CONFIG_NUMA kernel configuration option is enabled, the memoryXXX/ directories can also be accessed via symbolic links located in the ``/sys/devices/system/node/node*`` directories.h](hIf the CONFIG_NUMA kernel configuration option is enabled, the memoryXXX/ directories can also be accessed via symbolic links located in the }(hjhhhNhNubjY)}(h"``/sys/devices/system/node/node*``h]h/sys/devices/system/node/node*}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jXhjubh directories.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubh)}(h For example::h]h For example:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubj[)}(h>/sys/devices/system/node/node0/memory9 -> ../../memory/memory9h]h>/sys/devices/system/node/node0/memory9 -> ../../memory/memory9}hjsbah}(h]h ]h"]h$]h&]jjjkuh1jZhhhMhjubh)}(h!A backlink will also be created::h]h A backlink will also be created:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubj[)}(h ../../node/node0h]h ../../node/node0}hjsbah}(h]h ]h"]h$]h&]jjjkuh1jZhhhMhjubeh}(h]h ]h"]h$]h&]uh1jhj] hhhhhNubeh}(h]jah ]h"]$memory block configuration via sysfsah$]h&]uh1hhj hhhhhM:ubh)}(hhh](h)}(hCommand Line 
Parametersh]hCommand Line Parameters}(hjhhhNhNubah}(h]h ]h"]h$]h&]jjuh1hhjhhhhhMubh)}(htSome command line parameters affect memory hot(un)plug handling. The following command line parameters are relevant:h]htSome command line parameters affect memory hot(un)plug handling. The following command line parameters are relevant:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjhhubjU )}(hhh]jZ )}(hhh](j_ )}(hhh]h}(h]h ]h"]h$]h&]colwidthKuh1j^ hjubj_ )}(hhh]h}(h]h ]h"]h$]h&]colwidthK7uh1j^ hjubju )}(hhh](jz )}(hhh](j )}(hhh]h)}(h``memhp_default_state``h]jY)}(hj3h]hmemhp_default_state}(hj5hhhNhNubah}(h]h ]h"]h$]h&]uh1jXhj1ubah}(h]h ]h"]h$]h&]uh1hhhhMhj.ubah}(h]h ]h"]h$]h&]uh1j~ hj+ubj )}(hhh]h)}(haconfigure auto-onlining by essentially setting ``/sys/devices/system/memory/auto_online_blocks``.h](h/configure auto-onlining by essentially setting }(hjQhhhNhNubjY)}(h1``/sys/devices/system/memory/auto_online_blocks``h]h-/sys/devices/system/memory/auto_online_blocks}(hjYhhhNhNubah}(h]h ]h"]h$]h&]uh1jXhjQubh.}(hjQhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjNubah}(h]h ]h"]h$]h&]uh1j~ hj+ubeh}(h]h ]h"]h$]h&]uh1jy hj(ubjz )}(hhh](j )}(hhh]h)}(h``movable_node``h]jY)}(hjh]h movable_node}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jXhjubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1j~ hj}ubj )}(hhh]h)}(hconfigure automatic zone selection in the kernel when using the ``contig-zones`` online policy. When set, the kernel will default to ZONE_MOVABLE when onlining a memory block, unless other zones can be kept contiguous.h](h@configure automatic zone selection in the kernel when using the }(hjhhhNhNubjY)}(h``contig-zones``h]h contig-zones}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jXhjubh online policy. 
When set, the kernel will default to ZONE_MOVABLE when onlining a memory block, unless other zones can be kept contiguous.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1j~ hj}ubeh}(h]h ]h"]h$]h&]uh1jy hj(ubeh}(h]h ]h"]h$]h&]uh1jt hjubeh}(h]h ]h"]h$]h&]colsKuh1jY hjubah}(h]h ]h"]h$]h&]uh1jT hjhhhhhNubh)}(htSee Documentation/admin-guide/kernel-parameters.txt for a more generic description of these command line parameters.h]htSee Documentation/admin-guide/kernel-parameters.txt for a more generic description of these command line parameters.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjhhubeh}(h]jah ]h"]command line parametersah$]h&]uh1hhj hhhhhMubh)}(hhh](h)}(hModule Parametersh]hModule Parameters}(hjhhhNhNubah}(h]h ]h"]h$]h&]jjuh1hhjhhhhhMubh)}(hXInstead of additional command line parameters or sysfs files, the ``memory_hotplug`` subsystem now provides a dedicated namespace for module parameters. Module parameters can be set via the command line by predicating them with ``memory_hotplug.`` such as::h](hBInstead of additional command line parameters or sysfs files, the }(hjhhhNhNubjY)}(h``memory_hotplug``h]hmemory_hotplug}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jXhjubh subsystem now provides a dedicated namespace for module parameters. 
Module parameters can be set via the command line by predicating them with }(hjhhhNhNubjY)}(h``memory_hotplug.``h]hmemory_hotplug.}(hj"hhhNhNubah}(h]h ]h"]h$]h&]uh1jXhjubh such as:}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjhhubj[)}(h!memory_hotplug.memmap_on_memory=1h]h!memory_hotplug.memmap_on_memory=1r}hj:sbah}(h]h ]h"]h$]h&]jjjkuh1jZhhhMhjhhubh)}(hBand they can be observed (and some even modified at runtime) via::h]hAand they can be observed (and some even modified at runtime) via:}(hjHhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjhhubj[)}(h&/sys/module/memory_hotplug/parameters/h]h&/sys/module/memory_hotplug/parameters/}hjVsbah}(h]h ]h"]h$]h&]jjjkuh1jZhhhMhjhhubh)}(h6The following module parameters are currently defined:h]h6The following module parameters are currently defined:}(hjdhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjhhubjU )}(hhh]jZ )}(hhh](j_ )}(hhh]h}(h]h ]h"]h$]h&]colwidthK uh1j^ hjuubj_ )}(hhh]h}(h]h ]h"]h$]h&]colwidthK/uh1j^ hjuubju )}(hhh](jz )}(hhh](j )}(hhh]h)}(h``memmap_on_memory``h]jY)}(hjh]hmemmap_on_memory}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jXhjubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1j~ hjubj )}(hhh](h)}(hread-write: Allocate memory for the memmap from the added memory block itself. Even if enabled, actual support depends on various other system properties and should only be regarded as a hint whether the behavior would be desired.h]hread-write: Allocate memory for the memmap from the added memory block itself. 
Even if enabled, actual support depends on various other system properties and should only be regarded as a hint whether the behavior would be desired.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubh)}(hXWhile allocating the memmap from the memory block itself makes memory hotplug less likely to fail and keeps the memmap on the same NUMA node in any case, it can fragment physical memory in a way that huge pages in bigger granularity cannot be formed on hotplugged memory.h]hXWhile allocating the memmap from the memory block itself makes memory hotplug less likely to fail and keeps the memmap on the same NUMA node in any case, it can fragment physical memory in a way that huge pages in bigger granularity cannot be formed on hotplugged memory.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubh)}(hXWith value "force" it could result in memory wastage due to memmap size limitations. For example, if the memmap for a memory block requires 1 MiB, but the pageblock size is 2 MiB, 1 MiB of hotplugged memory will be wasted. Note that there are still cases where the feature cannot be enforced: for example, if the memmap is smaller than a single page, or if the architecture does not support the forced mode in all configurations.h]hXWith value “force” it could result in memory wastage due to memmap size limitations. For example, if the memmap for a memory block requires 1 MiB, but the pageblock size is 2 MiB, 1 MiB of hotplugged memory will be wasted. 
Note that there are still cases where the feature cannot be enforced: for example, if the memmap is smaller than a single page, or if the architecture does not support the forced mode in all configurations.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubeh}(h]h ]h"]h$]h&]uh1j~ hjubeh}(h]h ]h"]h$]h&]uh1jy hjubjz )}(hhh](j )}(hhh]h)}(h``online_policy``h]jY)}(hjh]h online_policy}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jXhjubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1j~ hjubj )}(hhh](h)}(hX5read-write: Set the basic policy used for automatic zone selection when onlining memory blocks without specifying a target zone. ``contig-zones`` has been the kernel default before this parameter was added. After an online policy was configured and memory was online, the policy should not be changed anymore.h](hread-write: Set the basic policy used for automatic zone selection when onlining memory blocks without specifying a target zone. }(hjhhhNhNubjY)}(h``contig-zones``h]h contig-zones}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jXhjubh has been the kernel default before this parameter was added. After an online policy was configured and memory was online, the policy should not be changed anymore.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubh)}(hX7When set to ``contig-zones``, the kernel will try keeping zones contiguous. If a memory block intersects multiple zones or no zone, the behavior depends on the ``movable_node`` kernel command line parameter: default to ZONE_MOVABLE if set, default to the applicable kernel zone (usually ZONE_NORMAL) if not set.h](h When set to }(hj1hhhNhNubjY)}(h``contig-zones``h]h contig-zones}(hj9hhhNhNubah}(h]h ]h"]h$]h&]uh1jXhj1ubh, the kernel will try keeping zones contiguous. 
If a memory block intersects multiple zones or no zone, the behavior depends on the }(hj1hhhNhNubjY)}(h``movable_node``h]h movable_node}(hjKhhhNhNubah}(h]h ]h"]h$]h&]uh1jXhj1ubh kernel command line parameter: default to ZONE_MOVABLE if set, default to the applicable kernel zone (usually ZONE_NORMAL) if not set.}(hj1hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubh)}(hXWhen set to ``auto-movable``, the kernel will try onlining memory blocks to ZONE_MOVABLE if possible according to the configuration and memory device details. With this policy, one can avoid zone imbalances when eventually hotplugging a lot of memory later and still wanting to be able to hotunplug as much as possible reliably, very desirable in virtualized environments. This policy ignores the ``movable_node`` kernel command line parameter and isn't really applicable in environments that require it (e.g., bare metal with hotunpluggable nodes) where hotplugged memory might be exposed via the firmware-provided memory map early during boot to the system instead of getting detected, added and onlined later during boot (such as done by virtio-mem or by some hypervisors implementing emulated DIMMs). As one example, a hotplugged DIMM will be onlined either completely to ZONE_MOVABLE or completely to ZONE_NORMAL, not a mixture. As another example, as many memory blocks belonging to a virtio-mem device will be onlined to ZONE_MOVABLE as possible, special-casing units of memory blocks that can only get hotunplugged together. *This policy does not protect from setups that are problematic with ZONE_MOVABLE and does not change the zone of memory blocks dynamically after they were onlined.*h](h When set to }(hjchhhNhNubjY)}(h``auto-movable``h]h auto-movable}(hjkhhhNhNubah}(h]h ]h"]h$]h&]uh1jXhjcubhXq, the kernel will try onlining memory blocks to ZONE_MOVABLE if possible according to the configuration and memory device details. 
With this policy, one can avoid zone imbalances when eventually hotplugging a lot of memory later and still wanting to be able to hotunplug as much as possible reliably, very desirable in virtualized environments. This policy ignores the }(hjchhhNhNubjY)}(h``movable_node``h]h movable_node}(hj}hhhNhNubah}(h]h ]h"]h$]h&]uh1jXhjcubhX kernel command line parameter and isn’t really applicable in environments that require it (e.g., bare metal with hotunpluggable nodes) where hotplugged memory might be exposed via the firmware-provided memory map early during boot to the system instead of getting detected, added and onlined later during boot (such as done by virtio-mem or by some hypervisors implementing emulated DIMMs). As one example, a hotplugged DIMM will be onlined either completely to ZONE_MOVABLE or completely to ZONE_NORMAL, not a mixture. As another example, as many memory blocks belonging to a virtio-mem device will be onlined to ZONE_MOVABLE as possible, special-casing units of memory blocks that can only get hotunplugged together. }(hjchhhNhNubhemphasis)}(h*This policy does not protect from setups that are problematic with ZONE_MOVABLE and does not change the zone of memory blocks dynamically after they were onlined.*h]hThis policy does not protect from setups that are problematic with ZONE_MOVABLE and does not change the zone of memory blocks dynamically after they were onlined.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjcubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubeh}(h]h ]h"]h$]h&]uh1j~ hjubeh}(h]h ]h"]h$]h&]uh1jy hjubjz )}(hhh](j )}(hhh]h)}(h``auto_movable_ratio``h]jY)}(hjh]hauto_movable_ratio}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jXhjubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1j~ hjubj )}(hhh](h)}(hread-write: Set the maximum MOVABLE:KERNEL memory ratio in % for the ``auto-movable`` online policy. 
Whether the ratio applies only for the system across all NUMA nodes or also per NUMA nodes depends on the ``auto_movable_numa_aware`` configuration.h](hEread-write: Set the maximum MOVABLE:KERNEL memory ratio in % for the }(hjhhhNhNubjY)}(h``auto-movable``h]h auto-movable}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jXhjubhz online policy. Whether the ratio applies only for the system across all NUMA nodes or also per NUMA nodes depends on the }(hjhhhNhNubjY)}(h``auto_movable_numa_aware``h]hauto_movable_numa_aware}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jXhjubh configuration.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubh)}(hX!All accounting is based on present memory pages in the zones combined with accounting per memory device. Memory dedicated to the CMA allocator is accounted as MOVABLE, although residing on one of the kernel zones. The possible ratio depends on the actual workload. The kernel default is "301" %, for example, allowing for hotplugging 24 GiB to a 8 GiB VM and automatically onlining all hotplugged memory to ZONE_MOVABLE in many setups. The additional 1% deals with some pages being not present, for example, because of some firmware allocations.h]hX%All accounting is based on present memory pages in the zones combined with accounting per memory device. Memory dedicated to the CMA allocator is accounted as MOVABLE, although residing on one of the kernel zones. The possible ratio depends on the actual workload. The kernel default is “301” %, for example, allowing for hotplugging 24 GiB to a 8 GiB VM and automatically onlining all hotplugged memory to ZONE_MOVABLE in many setups. The additional 1% deals with some pages being not present, for example, because of some firmware allocations.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubh)}(hXNote that ZONE_NORMAL memory provided by one memory device does not allow for more ZONE_MOVABLE memory for a different memory device. 
As one example, onlining memory of a hotplugged DIMM to ZONE_NORMAL will not allow for another hotplugged DIMM to get onlined to ZONE_MOVABLE automatically. In contrast, memory hotplugged by a virtio-mem device that got onlined to ZONE_NORMAL will allow for more ZONE_MOVABLE memory within *the same* virtio-mem device.h](hXNote that ZONE_NORMAL memory provided by one memory device does not allow for more ZONE_MOVABLE memory for a different memory device. As one example, onlining memory of a hotplugged DIMM to ZONE_NORMAL will not allow for another hotplugged DIMM to get onlined to ZONE_MOVABLE automatically. In contrast, memory hotplugged by a virtio-mem device that got onlined to ZONE_NORMAL will allow for more ZONE_MOVABLE memory within }(hjhhhNhNubj)}(h *the same*h]hthe same}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh virtio-mem device.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubeh}(h]h ]h"]h$]h&]uh1j~ hjubeh}(h]h ]h"]h$]h&]uh1jy hjubjz )}(hhh](j )}(hhh]h)}(h``auto_movable_numa_aware``h]jY)}(hjKh]hauto_movable_numa_aware}(hjMhhhNhNubah}(h]h ]h"]h$]h&]uh1jXhjIubah}(h]h ]h"]h$]h&]uh1hhhhMhjFubah}(h]h ]h"]h$]h&]uh1j~ hjCubj )}(hhh](h)}(hread-write: Configure whether the ``auto_movable_ratio`` in the ``auto-movable`` online policy also applies per NUMA node in addition to the whole system across all NUMA nodes. The kernel default is "Y".h](h"read-write: Configure whether the }(hjihhhNhNubjY)}(h``auto_movable_ratio``h]hauto_movable_ratio}(hjqhhhNhNubah}(h]h ]h"]h$]h&]uh1jXhjiubh in the }(hjihhhNhNubjY)}(h``auto-movable``h]h auto-movable}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jXhjiubh online policy also applies per NUMA node in addition to the whole system across all NUMA nodes. 
The kernel default is “Y”.}(hjihhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjfubh)}(hDisabling NUMA awareness can be helpful when dealing with NUMA nodes that should be completely hotunpluggable, onlining the memory completely to ZONE_MOVABLE automatically if possible.h]hDisabling NUMA awareness can be helpful when dealing with NUMA nodes that should be completely hotunpluggable, onlining the memory completely to ZONE_MOVABLE automatically if possible.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjfubh)}(h.Parameter availability depends on CONFIG_NUMA.h]h.Parameter availability depends on CONFIG_NUMA.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM%hjfubeh}(h]h ]h"]h$]h&]uh1j~ hjCubeh}(h]h ]h"]h$]h&]uh1jy hjubeh}(h]h ]h"]h$]h&]uh1jt hjuubeh}(h]h ]h"]h$]h&]colsKuh1jY hjrubah}(h]h ]h"]h$]h&]uh1jT hjhhhhhNubeh}(h]j"ah ]h"]module parametersah$]h&]uh1hhj hhhhhMubeh}(h]jah ]h"]configuring memory hot(un)plugah$]h&]uh1hhhhhhhhMubh)}(hhh](h)}(h ZONE_MOVABLEh]h ZONE_MOVABLE}(hjhhhNhNubah}(h]h ]h"]h$]h&]jjJuh1hhjhhhhhM)ubh)}(hXZONE_MOVABLE is an important mechanism for more reliable memory offlining. Further, having system RAM managed by ZONE_MOVABLE instead of one of the kernel zones can increase the number of possible transparent huge pages and dynamically allocated huge pages.h]hXZONE_MOVABLE is an important mechanism for more reliable memory offlining. Further, having system RAM managed by ZONE_MOVABLE instead of one of the kernel zones can increase the number of possible transparent huge pages and dynamically allocated huge pages.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM+hjhhubh)}(hMost kernel allocations are unmovable. Important examples include the memory map (usually 1/64ths of memory), page tables, and kmalloc(). Such allocations can only be served from the kernel zones.h]hMost kernel allocations are unmovable. Important examples include the memory map (usually 1/64ths of memory), page tables, and kmalloc(). 
Such allocations can only be served from the kernel zones.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM0hjhhubh)}(hMost user space pages, such as anonymous memory, and page cache pages are movable. Such allocations can be served from ZONE_MOVABLE and the kernel zones.h]hMost user space pages, such as anonymous memory, and page cache pages are movable. Such allocations can be served from ZONE_MOVABLE and the kernel zones.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM4hjhhubh)}(hOnly movable allocations are served from ZONE_MOVABLE, resulting in unmovable allocations being limited to the kernel zones. Without ZONE_MOVABLE, there is absolutely no guarantee whether a memory block can be offlined successfully.h]hOnly movable allocations are served from ZONE_MOVABLE, resulting in unmovable allocations being limited to the kernel zones. Without ZONE_MOVABLE, there is absolutely no guarantee whether a memory block can be offlined successfully.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM7hjhhubh)}(hhh](h)}(hZone Imbalancesh]hZone Imbalances}(hj0hhhNhNubah}(h]h ]h"]h$]h&]jjiuh1hhj-hhhhhM<ubh)}(hX)Having too much system RAM managed by ZONE_MOVABLE is called a zone imbalance, which can harm the system or degrade performance. As one example, the kernel might crash because it runs out of free memory for unmovable allocations, although there is still plenty of free memory left in ZONE_MOVABLE.h]hX)Having too much system RAM managed by ZONE_MOVABLE is called a zone imbalance, which can harm the system or degrade performance. As one example, the kernel might crash because it runs out of free memory for unmovable allocations, although there is still plenty of free memory left in ZONE_MOVABLE.}(hj>hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM>hj-hhubh)}(hUsually, MOVABLE:KERNEL ratios of up to 3:1 or even 4:1 are fine. Ratios of 63:1 are definitely impossible due to the overhead for the memory map.h]hUsually, MOVABLE:KERNEL ratios of up to 3:1 or even 4:1 are fine. 
Ratios of 63:1 are definitely impossible due to the overhead for the memory map.}(hjLhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMChj-hhubh)}(hActual safe zone ratios depend on the workload. Extreme cases, like excessive long-term pinning of pages, might not be able to deal with ZONE_MOVABLE at all.h]hActual safe zone ratios depend on the workload. Extreme cases, like excessive long-term pinning of pages, might not be able to deal with ZONE_MOVABLE at all.}(hjZhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMFhj-hhubj)}(hCMA memory part of a kernel zone essentially behaves like memory in ZONE_MOVABLE and similar considerations apply, especially when combining CMA with ZONE_MOVABLE.h]h)}(hCMA memory part of a kernel zone essentially behaves like memory in ZONE_MOVABLE and similar considerations apply, especially when combining CMA with ZONE_MOVABLE.h]hCMA memory part of a kernel zone essentially behaves like memory in ZONE_MOVABLE and similar considerations apply, especially when combining CMA with ZONE_MOVABLE.}(hjlhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMKhjhubah}(h]h ]h"]h$]h&]uh1jhj-hhhhhNubeh}(h]joah ]h"]zone imbalancesah$]h&]uh1hhjhhhhhM<ubh)}(hhh](h)}(h"ZONE_MOVABLE Sizing Considerationsh]h"ZONE_MOVABLE Sizing Considerations}(hjhhhNhNubah}(h]h ]h"]h$]h&]jjuh1hhjhhhhhMPubh)}(hWe usually expect that a large portion of available system RAM will actually be consumed by user space, either directly or indirectly via the page cache. In the normal case, ZONE_MOVABLE can be used when allocating such pages just fine.h]hWe usually expect that a large portion of available system RAM will actually be consumed by user space, either directly or indirectly via the page cache. In the normal case, ZONE_MOVABLE can be used when allocating such pages just fine.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMRhjhhubh)}(hWith that in mind, it makes sense that we can have a big portion of system RAM managed by ZONE_MOVABLE. 
However, there are some things to consider when using ZONE_MOVABLE, especially when fine-tuning zone ratios:h]hWith that in mind, it makes sense that we can have a big portion of system RAM managed by ZONE_MOVABLE. However, there are some things to consider when using ZONE_MOVABLE, especially when fine-tuning zone ratios:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMVhjhhubh)}(hhh](h)}(hHaving a lot of offline memory blocks. Even offline memory blocks consume memory for metadata and page tables in the direct map; having a lot of offline memory blocks is not a typical case, though. h]h)}(hHaving a lot of offline memory blocks. Even offline memory blocks consume memory for metadata and page tables in the direct map; having a lot of offline memory blocks is not a typical case, though.h]hHaving a lot of offline memory blocks. Even offline memory blocks consume memory for metadata and page tables in the direct map; having a lot of offline memory blocks is not a typical case, though.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMZhjubah}(h]h ]h"]h$]h&]uh1hhjhhhhhNubh)}(hXMemory ballooning without balloon compaction is incompatible with ZONE_MOVABLE. Only some implementations, such as virtio-balloon and pseries CMM, fully support balloon compaction. Further, the CONFIG_BALLOON_COMPACTION kernel configuration option might be disabled. In that case, balloon inflation will only perform unmovable allocations and silently create a zone imbalance, usually triggered by inflation requests from the hypervisor. h](h)}(hMemory ballooning without balloon compaction is incompatible with ZONE_MOVABLE. Only some implementations, such as virtio-balloon and pseries CMM, fully support balloon compaction.h]hMemory ballooning without balloon compaction is incompatible with ZONE_MOVABLE. 
Only some implementations, such as virtio-balloon and pseries CMM, fully support balloon compaction.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM^hjubh)}(hXFurther, the CONFIG_BALLOON_COMPACTION kernel configuration option might be disabled. In that case, balloon inflation will only perform unmovable allocations and silently create a zone imbalance, usually triggered by inflation requests from the hypervisor.h]hXFurther, the CONFIG_BALLOON_COMPACTION kernel configuration option might be disabled. In that case, balloon inflation will only perform unmovable allocations and silently create a zone imbalance, usually triggered by inflation requests from the hypervisor.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMbhjubeh}(h]h ]h"]h$]h&]uh1hhjhhhhhNubh)}(h[Gigantic pages are unmovable, resulting in user space consuming a lot of unmovable memory. h]h)}(hZGigantic pages are unmovable, resulting in user space consuming a lot of unmovable memory.h]hZGigantic pages are unmovable, resulting in user space consuming a lot of unmovable memory.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMghjubah}(h]h ]h"]h$]h&]uh1hhjhhhhhNubh)}(hHuge pages are unmovable when an architectures does not support huge page migration, resulting in a similar issue as with gigantic pages. h]h)}(hHuge pages are unmovable when an architectures does not support huge page migration, resulting in a similar issue as with gigantic pages.h]hHuge pages are unmovable when an architectures does not support huge page migration, resulting in a similar issue as with gigantic pages.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMjhj ubah}(h]h ]h"]h$]h&]uh1hhjhhhhhNubh)}(hXxPage tables are unmovable. Excessive swapping, mapping extremely large files or ZONE_DEVICE memory can be problematic, although only really relevant in corner cases. When we manage a lot of user space memory that has been swapped out or is served from a file/persistent memory/... we still need a lot of page tables to manage that memory once user space accessed that memory. 
h]h)}(hXwPage tables are unmovable. Excessive swapping, mapping extremely large files or ZONE_DEVICE memory can be problematic, although only really relevant in corner cases. When we manage a lot of user space memory that has been swapped out or is served from a file/persistent memory/... we still need a lot of page tables to manage that memory once user space accessed that memory.h]hXwPage tables are unmovable. Excessive swapping, mapping extremely large files or ZONE_DEVICE memory can be problematic, although only really relevant in corner cases. When we manage a lot of user space memory that has been swapped out or is served from a file/persistent memory/... we still need a lot of page tables to manage that memory once user space accessed that memory.}(hj)hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMmhj%ubah}(h]h ]h"]h$]h&]uh1hhjhhhhhNubh)}(hlIn certain DAX configurations the memory map for the device memory will be allocated from the kernel zones. h]h)}(hkIn certain DAX configurations the memory map for the device memory will be allocated from the kernel zones.h]hkIn certain DAX configurations the memory map for the device memory will be allocated from the kernel zones.}(hjAhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMshj=ubah}(h]h ]h"]h$]h&]uh1hhjhhhhhNubh)}(hKASAN can have a significant memory overhead, for example, consuming 1/8th of the total system memory size as (unmovable) tracking metadata. h]h)}(hKASAN can have a significant memory overhead, for example, consuming 1/8th of the total system memory size as (unmovable) tracking metadata.h]hKASAN can have a significant memory overhead, for example, consuming 1/8th of the total system memory size as (unmovable) tracking metadata.}(hjYhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMvhjUubah}(h]h ]h"]h$]h&]uh1hhjhhhhhNubh)}(hXLong-term pinning of pages. Techniques that rely on long-term pinnings (especially, RDMA and vfio/mdev) are fundamentally problematic with ZONE_MOVABLE, and therefore, memory offlining. 
Pinned pages cannot reside on ZONE_MOVABLE as that would turn these pages unmovable. Therefore, they have to be migrated off that zone while pinning. Pinning a page can fail even if there is plenty of free memory in ZONE_MOVABLE. In addition, using ZONE_MOVABLE might make page pinning more expensive, because of the page migration overhead. h](h)}(hXLong-term pinning of pages. Techniques that rely on long-term pinnings (especially, RDMA and vfio/mdev) are fundamentally problematic with ZONE_MOVABLE, and therefore, memory offlining. Pinned pages cannot reside on ZONE_MOVABLE as that would turn these pages unmovable. Therefore, they have to be migrated off that zone while pinning. Pinning a page can fail even if there is plenty of free memory in ZONE_MOVABLE.h]hXLong-term pinning of pages. Techniques that rely on long-term pinnings (especially, RDMA and vfio/mdev) are fundamentally problematic with ZONE_MOVABLE, and therefore, memory offlining. Pinned pages cannot reside on ZONE_MOVABLE as that would turn these pages unmovable. Therefore, they have to be migrated off that zone while pinning. 
Pinning a page can fail even if there is plenty of free memory in ZONE_MOVABLE.}(hjqhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMyhjmubh)}(hoIn addition, using ZONE_MOVABLE might make page pinning more expensive, because of the page migration overhead.h]hoIn addition, using ZONE_MOVABLE might make page pinning more expensive, because of the page migration overhead.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjmubeh}(h]h ]h"]h$]h&]uh1hhjhhhhhNubeh}(h]h ]h"]h$]h&]j[j\uh1hhhhMZhjhhubh)}(hoBy default, all the memory configured at boot time is managed by the kernel zones and ZONE_MOVABLE is not used.h]hoBy default, all the memory configured at boot time is managed by the kernel zones and ZONE_MOVABLE is not used.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjhhubh)}(hX To enable ZONE_MOVABLE to include the memory present at boot and to control the ratio between movable and kernel zones there are two command line options: ``kernelcore=`` and ``movablecore=``. See Documentation/admin-guide/kernel-parameters.rst for their description.h](hTo enable ZONE_MOVABLE to include the memory present at boot and to control the ratio between movable and kernel zones there are two command line options: }(hjhhhNhNubjY)}(h``kernelcore=``h]h kernelcore=}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jXhjubh and }(hjhhhNhNubjY)}(h``movablecore=``h]h movablecore=}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jXhjubhL. 
See Documentation/admin-guide/kernel-parameters.rst for their description.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjhhubeh}(h]jah ]h"]"zone_movable sizing considerationsah$]h&]uh1hhjhhhhhMPubh)}(hhh](h)}(h!Memory Offlining and ZONE_MOVABLEh]h!Memory Offlining and ZONE_MOVABLE}(hjhhhNhNubah}(h]h ]h"]h$]h&]jjuh1hhjhhhhhMubh)}(h^Even with ZONE_MOVABLE, there are some corner cases where offlining a memory block might fail:h]h^Even with ZONE_MOVABLE, there are some corner cases where offlining a memory block might fail:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjhhubh)}(hhh](h)}(hMemory blocks with memory holes; this applies to memory blocks present during boot and can apply to memory blocks hotplugged via the XEN balloon and the Hyper-V balloon. h]h)}(hMemory blocks with memory holes; this applies to memory blocks present during boot and can apply to memory blocks hotplugged via the XEN balloon and the Hyper-V balloon.h]hMemory blocks with memory holes; this applies to memory blocks present during boot and can apply to memory blocks hotplugged via the XEN balloon and the Hyper-V balloon.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1hhjhhhhhNubh)}(hMixed NUMA nodes and mixed zones within a single memory block prevent memory offlining; this applies to memory blocks present during boot only. h]h)}(hMixed NUMA nodes and mixed zones within a single memory block prevent memory offlining; this applies to memory blocks present during boot only.h]hMixed NUMA nodes and mixed zones within a single memory block prevent memory offlining; this applies to memory blocks present during boot only.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1hhjhhhhhNubh)}(hSpecial memory blocks prevented by the system from getting offlined. Examples include any memory available during boot on arm64 or memory blocks spanning the crashkernel area on s390x; this usually applies to memory blocks present during boot only. 
h]h)}(hSpecial memory blocks prevented by the system from getting offlined. Examples include any memory available during boot on arm64 or memory blocks spanning the crashkernel area on s390x; this usually applies to memory blocks present during boot only.h]hSpecial memory blocks prevented by the system from getting offlined. Examples include any memory available during boot on arm64 or memory blocks spanning the crashkernel area on s390x; this usually applies to memory blocks present during boot only.}(hj6hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj2ubah}(h]h ]h"]h$]h&]uh1hhjhhhhhNubh)}(huMemory blocks overlapping with CMA areas cannot be offlined; this applies to memory blocks present during boot only. h]h)}(htMemory blocks overlapping with CMA areas cannot be offlined; this applies to memory blocks present during boot only.h]htMemory blocks overlapping with CMA areas cannot be offlined; this applies to memory blocks present during boot only.}(hjNhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjJubah}(h]h ]h"]h$]h&]uh1hhjhhhhhNubh)}(hConcurrent activity that operates on the same physical memory area, such as allocating gigantic pages, can result in temporary offlining failures. h]h)}(hConcurrent activity that operates on the same physical memory area, such as allocating gigantic pages, can result in temporary offlining failures.h]hConcurrent activity that operates on the same physical memory area, such as allocating gigantic pages, can result in temporary offlining failures.}(hjfhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjbubah}(h]h ]h"]h$]h&]uh1hhjhhhhhNubh)}(hXIOut of memory when dissolving huge pages, especially when HugeTLB Vmemmap Optimization (HVO) is enabled. Offlining code may be able to migrate huge page contents, but may not be able to dissolve the source huge page because it fails allocating (unmovable) pages for the vmemmap, because the system might not have free memory in the kernel zones left. 
Users that depend on memory offlining to succeed for movable zones should carefully consider whether the memory savings gained from this feature are worth the risk of possibly not being able to offline memory in certain situations. h](h)}(hhOut of memory when dissolving huge pages, especially when HugeTLB Vmemmap Optimization (HVO) is enabled.h]hhOut of memory when dissolving huge pages, especially when HugeTLB Vmemmap Optimization (HVO) is enabled.}(hj~hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjzubh)}(hOfflining code may be able to migrate huge page contents, but may not be able to dissolve the source huge page because it fails allocating (unmovable) pages for the vmemmap, because the system might not have free memory in the kernel zones left.h]hOfflining code may be able to migrate huge page contents, but may not be able to dissolve the source huge page because it fails allocating (unmovable) pages for the vmemmap, because the system might not have free memory in the kernel zones left.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjzubh)}(hUsers that depend on memory offlining to succeed for movable zones should carefully consider whether the memory savings gained from this feature are worth the risk of possibly not being able to offline memory in certain situations.h]hUsers that depend on memory offlining to succeed for movable zones should carefully consider whether the memory savings gained from this feature are worth the risk of possibly not being able to offline memory in certain situations.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjzubeh}(h]h ]h"]h$]h&]uh1hhjhhhhhNubeh}(h]h ]h"]h$]h&]j[j\uh1hhhhMhjhhubh)}(hFurther, when running into out of memory situations while migrating pages, or when still encountering permanently unmovable pages within ZONE_MOVABLE (-> BUG), memory offlining will keep retrying until it eventually succeeds.h]hFurther, when running into out of memory situations while migrating pages, or when still encountering permanently unmovable pages within 
ZONE_MOVABLE (-> BUG), memory offlining will keep retrying until it eventually succeeds.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjhhubh)}(hWhen offlining is triggered from user space, the offlining context can be terminated by sending a signal. A timeout-based offlining can easily be implemented via::h]hWhen offlining is triggered from user space, the offlining context can be terminated by sending a signal. A timeout-based offlining can easily be implemented via:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjhhubj[)}(h3% timeout $TIMEOUT offline_block | failure_handlingh]h3% timeout $TIMEOUT offline_block | failure_handling}hjsbah}(h]h ]h"]h$]h&]jjjkuh1jZhhhMhjhhubeh}(h]jah ]h"]!memory offlining and zone_movableah$]h&]uh1hhjhhhhhMubeh}(h]jPah ]h"] zone_movableah$]h&]uh1hhhhhhhhM)ubeh}(h]memory-hot-un-plugah ]h"]memory hot(un)plugah$]h&]uh1hhhhhhhhKubeh}(h]h ]h"]h$]h&]sourcehuh1hcurrent_sourceN current_lineNsettingsdocutils.frontendValues)}(hN generatorN datestampN source_linkN source_urlN toc_backlinksj~ footnote_backlinksK sectnum_xformKstrip_commentsNstrip_elements_with_classesN strip_classesN report_levelK halt_levelKexit_status_levelKdebugNwarning_streamN tracebackinput_encoding utf-8-siginput_encoding_error_handlerstrictoutput_encodingutf-8output_encoding_error_handlerjerror_encodingutf-8error_encoding_error_handlerbackslashreplace language_codeenrecord_dependenciesNconfigN id_prefixhauto_id_prefixid dump_settingsNdump_internalsNdump_transformsNdump_pseudo_xmlNexpose_internalsNstrict_visitorN_disable_configN_sourceh _destinationN _config_files]7/var/lib/git/docbuild/linux/Documentation/docutils.confafile_insertion_enabled raw_enabledKline_length_limitM'pep_referencesN pep_base_urlhttps://peps.python.org/pep_file_url_templatepep-%04drfc_referencesN rfc_base_url&https://datatracker.ietf.org/doc/html/ tab_widthKtrim_footnote_reference_spacesyntax_highlightlong smart_quotessmartquotes_locales]character_level_inline_markupdoctitle_xform docinfo_xformKsectsubtitle_xform 
image_loadinglinkembed_stylesheetcloak_email_addressessection_self_linkenvNubreporterNindirect_targets]substitution_defs}substitution_names}refnames}refids}nameids}(jjjjjhjj j:j+jjMjj{j*jjjj jjj jj+j jMj jojjjZ jjjjjjj"jjPjjojjjju nametypes}(jjjjj:jjj*jj jjj j jjZ jjjjjjjuh}(jhjhhjj jyj+jjMj=j{jjjjj-jjj jj+jjMjjoj jj jj jj] jjj"jjPjjoj-jjjjhhjhj%jjGj>jujljjjjjjjjj%jjGj>jij`jjjjjjjjjjjJjAjij`jjjju footnote_refs} citation_refs} autofootnotes]autofootnote_refs]symbol_footnotes]symbol_footnote_refs] footnotes] citations]autofootnote_startKsymbol_footnote_startK id_counter collectionsCounter}j$KsRparse_messages]transform_messages] transformerN include_log] decorationNhhub.