Boot time memory management
Early system initialization cannot use “normal” memory management simply because it is not set up yet. But there is still a need to allocate memory for various data structures, for instance for the physical page allocator. To address this, a specialized allocator called the Boot Memory Allocator, or bootmem, was introduced. Several years later PowerPC developers added a “Logical Memory Blocks” allocator, which was later adopted by other architectures and renamed to memblock. There is also a compatibility layer called nobootmem that translates bootmem allocation interfaces to memblock calls.
The selection of the early allocator is done using the CONFIG_NO_BOOTMEM and CONFIG_HAVE_MEMBLOCK kernel configuration options. These options are enabled or disabled statically by the architectures’ Kconfig files.
- Architectures that rely only on bootmem select CONFIG_NO_BOOTMEM=n && CONFIG_HAVE_MEMBLOCK=n.
- The users of memblock with the nobootmem compatibility layer set CONFIG_NO_BOOTMEM=y && CONFIG_HAVE_MEMBLOCK=y.
- And for those that use both memblock and bootmem the configuration includes CONFIG_NO_BOOTMEM=n && CONFIG_HAVE_MEMBLOCK=y.
Whichever allocator is used, it is the responsibility of the architecture specific initialization to set it up in the setup_arch() function and tear it down in the mem_init() function.
Once the early memory management is available it offers a variety of functions and macros for memory allocations. The allocation request may be directed to the first (and probably the only) node or to a particular node in a NUMA system. There are API variants that panic when an allocation fails and those that don’t. The more recent and more advanced memblock also allows its own behaviour to be controlled.
Bootmem
(mostly stolen from Mel Gorman’s “Understanding the Linux Virtual Memory Manager” book)
Bootmem is a boot-time physical memory allocator and configurator.
It is used early in the boot process before the page allocator is set up.
Bootmem is based on the most basic of allocators, a First Fit allocator which uses a bitmap to represent memory. If a bit is 1, the page is allocated and 0 if unallocated. To satisfy allocations of sizes smaller than a page, the allocator records the Page Frame Number (PFN) of the last allocation and the offset the allocation ended at. Subsequent small allocations are merged together and stored on the same page.
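The bitmap and small-allocation merging described above can be sketched in user space. This is only a toy model under stated assumptions: the names toy_bootmem, toy_alloc, TOY_PAGE_SIZE and TOY_NPAGES are hypothetical, alignment and goal handling are left out, and the code is not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096UL   /* assumed page size for the sketch */
#define TOY_NPAGES    64UL     /* assumed size of the managed area */

/* Hypothetical toy state; not the kernel's struct bootmem_data. */
struct toy_bootmem {
    uint8_t map[TOY_NPAGES / 8];  /* one bit per page: 1 = allocated */
    unsigned long last_pfn;       /* page used by the last allocation */
    unsigned long last_end_off;   /* offset where it ended; 0 = page full */
};

static int toy_test_bit(const struct toy_bootmem *b, unsigned long pfn)
{
    return (b->map[pfn / 8] >> (pfn % 8)) & 1;
}

static void toy_set_bit(struct toy_bootmem *b, unsigned long pfn)
{
    b->map[pfn / 8] |= (uint8_t)(1u << (pfn % 8));
}

/* Returns the byte offset from the start of the managed area, or -1. */
static long toy_alloc(struct toy_bootmem *b, unsigned long size)
{
    unsigned long pages, start, i;

    assert(size > 0);

    /* Small request: merge it into the partially used last page. */
    if (b->last_end_off && size <= TOY_PAGE_SIZE - b->last_end_off) {
        long off = (long)(b->last_pfn * TOY_PAGE_SIZE + b->last_end_off);

        b->last_end_off = (b->last_end_off + size) % TOY_PAGE_SIZE;
        return off;
    }

    pages = (size + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;

    /* First fit: take the first run of `pages` clear bits. */
    for (start = 0; start + pages <= TOY_NPAGES; start++) {
        for (i = 0; i < pages; i++)
            if (toy_test_bit(b, start + i))
                break;
        if (i < pages)
            continue;
        for (i = 0; i < pages; i++)
            toy_set_bit(b, start + i);
        b->last_pfn = start + pages - 1;
        b->last_end_off = size % TOY_PAGE_SIZE;
        return (long)(start * TOY_PAGE_SIZE);
    }
    return -1;
}
```

With a zero-initialized struct toy_bootmem, a 100 byte request followed by a 200 byte request lands on the same page, while a full-page request moves on to the next free page.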
The information used by the bootmem allocator is represented by struct bootmem_data. An array to hold up to MAX_NUMNODES such structures is statically allocated and then it is discarded when the system initialization completes. Each entry in this array corresponds to a node with memory. For UMA systems only entry 0 is used.
The bootmem allocator is initialized during early architecture specific setup. Each architecture is required to supply a setup_arch() function which, among other tasks, is responsible for acquiring the necessary parameters to initialise the boot memory allocator. These parameters define limits of usable physical memory:
- min_low_pfn - the lowest PFN that is available in the system
- max_low_pfn - the highest PFN that may be addressed by low memory (ZONE_NORMAL)
- max_pfn - the last PFN available to the system.
After those limits are determined, the init_bootmem() or init_bootmem_node() function should be called to initialize the bootmem allocator. The UMA case should use the init_bootmem() function. It will initialize the contig_page_data structure that represents the only memory node in the system. In the NUMA case the init_bootmem_node() function should be called to initialize the bootmem allocator for each node.
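As a rough illustration of how the three PFN limits relate to each other, the sketch below derives them from byte addresses. Everything in it is an assumption made for the example: the sk_ names, the 4 KiB page size, and the idea that an architecture knows a single RAM range plus a low memory limit. Real architectures obtain these values in their own ways.

```c
#include <assert.h>

#define SK_PAGE_SHIFT 12  /* assumed 4 KiB pages */
#define SK_PFN_UP(x)   (((x) + (1ULL << SK_PAGE_SHIFT) - 1) >> SK_PAGE_SHIFT)
#define SK_PFN_DOWN(x) ((x) >> SK_PAGE_SHIFT)

/* Hypothetical container for the three limits named in the text. */
struct sk_pfn_limits {
    unsigned long long min_low_pfn;
    unsigned long long max_low_pfn;
    unsigned long long max_pfn;
};

/*
 * ram_start/ram_end: byte range of usable RAM, end exclusive.
 * lowmem_limit: first byte that low memory can no longer address.
 */
static struct sk_pfn_limits sk_find_limits(unsigned long long ram_start,
                                           unsigned long long ram_end,
                                           unsigned long long lowmem_limit)
{
    struct sk_pfn_limits l;

    assert(ram_start < ram_end);

    l.min_low_pfn = SK_PFN_UP(ram_start);   /* first whole page */
    l.max_pfn = SK_PFN_DOWN(ram_end);       /* end of the last whole page */
    l.max_low_pfn = SK_PFN_DOWN(lowmem_limit);
    if (l.max_low_pfn > l.max_pfn)          /* less RAM than low memory */
        l.max_low_pfn = l.max_pfn;
    return l;
}
```

On a machine with less RAM than the low memory limit, max_low_pfn and max_pfn come out equal in this sketch.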
Once the allocator is set up, it is possible to use either single node or NUMA variant of the allocation APIs.
Memblock
Memblock is a method of managing memory regions during the early boot period when the usual kernel memory allocators are not up and running.
Memblock views the system memory as collections of contiguous regions. There are several types of these collections:
memory
- describes the physical memory available to the kernel; this may differ from the actual physical memory installed in the system, for instance when the memory is restricted with the mem= command line parameter
reserved
- describes the regions that were allocated
physmap
- describes the actual physical memory regardless of the possible restrictions; the physmap type is only available on some architectures.
Each region is represented by struct memblock_region that defines the region extents, its attributes and NUMA node id on NUMA systems. Every memory type is described by the struct memblock_type which contains an array of memory regions along with the allocator metadata. The memory types are nicely wrapped with struct memblock. This structure is statically initialized at build time. The region arrays for the “memory” and “reserved” types are initially sized to INIT_MEMBLOCK_REGIONS and for the “physmap” type to INIT_PHYSMEM_REGIONS.
The memblock_allow_resize() function enables automatic resizing of the region arrays during addition of new regions. This feature should be used with care so that memory allocated for the region array will not overlap with areas that should be reserved, for example initrd.
The early architecture setup should tell memblock what the physical memory layout is by using memblock_add() or memblock_add_node() functions. The first function does not assign the region to a NUMA node and it is appropriate for UMA systems. Yet, it is possible to use it on NUMA systems as well and assign the region to a NUMA node later in the setup process using memblock_set_node(). The memblock_add_node() function performs such an assignment directly.
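The difference between adding a region without a node, adding it with a node, and assigning the node later can be modelled with a small user-space sketch. The toy_ names, the fixed-size array and the TOY_NO_NODE marker are all hypothetical, and the toy only relabels regions that lie entirely inside the requested range, which is a simplification made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_REGIONS 8
#define TOY_NO_NODE     (-1)  /* hypothetical "not assigned yet" marker */

struct toy_region {
    unsigned long long base;
    unsigned long long size;
    int nid;
};

struct toy_type {
    size_t cnt;
    struct toy_region regions[TOY_MAX_REGIONS];
};

/* Record a region together with its node; returns 0 or -1 if full. */
static int toy_add_node(struct toy_type *t, unsigned long long base,
                        unsigned long long size, int nid)
{
    if (t->cnt == TOY_MAX_REGIONS)
        return -1;
    t->regions[t->cnt++] = (struct toy_region){ base, size, nid };
    return 0;
}

/* Record a region without a node, as a UMA system would. */
static int toy_add(struct toy_type *t, unsigned long long base,
                   unsigned long long size)
{
    return toy_add_node(t, base, size, TOY_NO_NODE);
}

/* Assign nid to every region fully inside [base, base + size). */
static void toy_set_node(struct toy_type *t, unsigned long long base,
                         unsigned long long size, int nid)
{
    size_t i;

    for (i = 0; i < t->cnt; i++) {
        struct toy_region *r = &t->regions[i];

        if (r->base >= base && r->base + r->size <= base + size)
            r->nid = nid;
    }
}
```

A region added with toy_add() keeps TOY_NO_NODE until toy_set_node() covers it, while a region added with toy_add_node() carries its node from the start.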
Once memblock is set up the memory can be allocated using either memblock or bootmem APIs.
As the system boot progresses, the architecture specific mem_init() function frees all the memory to the buddy page allocator.
If an architecture enables CONFIG_ARCH_DISCARD_MEMBLOCK, the memblock data structures will be discarded after the system initialization completes.
Functions and structures
Common API
The functions that are described in this section are available regardless of what early memory manager is enabled.
- void free_bootmem_late(unsigned long addr, unsigned long size)
free bootmem pages directly to page allocator
Parameters
unsigned long addr
- starting address of the range
unsigned long size
- size of the range in bytes
Description
This is only useful when the bootmem allocator has already been torn down, but we are still initializing the system. Pages are given directly to the page allocator, no bootmem metadata is updated because it is gone.
- unsigned long free_all_bootmem(void)
release free pages to the buddy allocator
Parameters
void
- no arguments
Return
the number of pages actually released.
- void free_bootmem_node(pg_data_t * pgdat, unsigned long physaddr, unsigned long size)
mark a page range as usable
Parameters
pg_data_t * pgdat
- node the range resides on
unsigned long physaddr
- starting physical address of the range
unsigned long size
- size of the range in bytes
Description
Partial pages will be considered reserved and left as they are.
The range must reside completely on the specified node.
- void free_bootmem(unsigned long addr, unsigned long size)
mark a page range as usable
Parameters
unsigned long addr
- starting physical address of the range
unsigned long size
- size of the range in bytes
Description
Partial pages will be considered reserved and left as they are.
The range must be contiguous but may span node boundaries.
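The statement that partial pages stay reserved amounts to rounding the start of the range up and the end of the range down to page boundaries. A minimal sketch of that arithmetic follows; the sk_ names and the 4 KiB page size are assumptions of the example, not kernel identifiers.

```c
#include <assert.h>

#define SK_PAGE_SIZE 4096UL  /* assumed page size */

/* Page frame range [start, end) that can actually be freed. */
struct sk_pfn_range {
    unsigned long start;
    unsigned long end;
};

static struct sk_pfn_range sk_freeable_pages(unsigned long addr,
                                             unsigned long size)
{
    struct sk_pfn_range r;

    /* A page that is only partly covered at the front is skipped. */
    r.start = (addr + SK_PAGE_SIZE - 1) / SK_PAGE_SIZE;
    /* A page that is only partly covered at the back is skipped too. */
    r.end = (addr + size) / SK_PAGE_SIZE;
    /* A range that covers no whole page yields an empty result. */
    if (r.end < r.start)
        r.end = r.start;
    return r;
}
```

For example, freeing 0x2000 bytes starting at 0x1800 touches three pages but only page 2 is fully covered, so only that page becomes usable in this model.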
- void *__alloc_bootmem_nopanic(unsigned long size, unsigned long align, unsigned long goal)
allocate boot memory without panicking
Parameters
unsigned long size
- size of the request in bytes
unsigned long align
- alignment of the region
unsigned long goal
- preferred starting address of the region
Description
The goal is dropped if it can not be satisfied and the allocation will fall back to memory below goal.
Allocation may happen on any node in the system.
Return
address of the allocated region or NULL on failure.
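The goal fallback shared by the allocation functions in this section can be pictured as a two-pass search: first at or above the goal, then below it. The sketch below works on single pages in a plain array of booleans; the sk_ names are hypothetical and size and alignment handling are omitted on purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Index of the first free page in [from, to), or -1 if there is none. */
static long sk_first_free(const bool *used, size_t from, size_t to)
{
    size_t i;

    for (i = from; i < to; i++)
        if (!used[i])
            return (long)i;
    return -1;
}

/*
 * Allocate one page, preferring pages at or above `goal`.
 * Returns the page index, or -1 when no page is free (the
 * non-panicking behaviour).
 */
static long sk_alloc_page_goal(bool *used, size_t npages, size_t goal)
{
    long pfn = -1;

    if (goal < npages)
        pfn = sk_first_free(used, goal, npages);
    /* The goal could not be satisfied: drop it and search below it. */
    if (pfn < 0)
        pfn = sk_first_free(used, 0, goal < npages ? goal : npages);
    if (pfn >= 0)
        used[pfn] = true;
    return pfn;
}
```

A panicking variant would differ only in what happens when -1 would be returned.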
- void *__alloc_bootmem(unsigned long size, unsigned long align, unsigned long goal)
allocate boot memory
Parameters
unsigned long size
- size of the request in bytes
unsigned long align
- alignment of the region
unsigned long goal
- preferred starting address of the region
Description
The goal is dropped if it can not be satisfied and the allocation will fall back to memory below goal.
Allocation may happen on any node in the system.
The function panics if the request can not be satisfied.
Return
address of the allocated region.
- void *__alloc_bootmem_node(pg_data_t * pgdat, unsigned long size, unsigned long align, unsigned long goal)
allocate boot memory from a specific node
Parameters
pg_data_t * pgdat
- node to allocate from
unsigned long size
- size of the request in bytes
unsigned long align
- alignment of the region
unsigned long goal
- preferred starting address of the region
Description
The goal is dropped if it can not be satisfied and the allocation will fall back to memory below goal.
Allocation may fall back to any node in the system if the specified node can not hold the requested memory.
The function panics if the request can not be satisfied.
Return
address of the allocated region.
- void *__alloc_bootmem_low(unsigned long size, unsigned long align, unsigned long goal)
allocate low boot memory
Parameters
unsigned long size
- size of the request in bytes
unsigned long align
- alignment of the region
unsigned long goal
- preferred starting address of the region
Description
The goal is dropped if it can not be satisfied and the allocation will fall back to memory below goal.
Allocation may happen on any node in the system.
The function panics if the request can not be satisfied.
Return
address of the allocated region.
- void *__alloc_bootmem_low_node(pg_data_t * pgdat, unsigned long size, unsigned long align, unsigned long goal)
allocate low boot memory from a specific node
Parameters
pg_data_t * pgdat
- node to allocate from
unsigned long size
- size of the request in bytes
unsigned long align
- alignment of the region
unsigned long goal
- preferred starting address of the region
Description
The goal is dropped if it can not be satisfied and the allocation will fall back to memory below goal.
Allocation may fall back to any node in the system if the specified node can not hold the requested memory.
The function panics if the request can not be satisfied.
Return
address of the allocated region.
Bootmem specific API
These interfaces are available only with bootmem, i.e. when CONFIG_NO_BOOTMEM=n.
- struct bootmem_data
per-node information used by the bootmem allocator
Definition
struct bootmem_data {
    unsigned long node_min_pfn;
    unsigned long node_low_pfn;
    void *node_bootmem_map;
    unsigned long last_end_off;
    unsigned long hint_idx;
    struct list_head list;
};
Members
node_min_pfn
- the PFN of the start of the node’s memory
node_low_pfn
- the PFN of the end of the directly addressable memory
node_bootmem_map
- is a bitmap pointer - the bits represent all physical memory pages (including holes) on the node.
last_end_off
- the offset within the page of the end of the last allocation; if 0, the page used is full
hint_idx
- the PFN of the page used with the last allocation; together with the last_end_off field, it allows testing whether an allocation can be merged with the page used for the last allocation rather than using up a full new page.
list
- list entry in the linked list ordered by the memory addresses
Memblock specific API
Here is the description of memblock data structures, functions and macros. Some of them are actually internal, but since they are documented it would be silly to omit them. Besides, reading the descriptions for the internal functions can help to understand what really happens under the hood.
- enum memblock_flags
definition of memory region attributes
Constants
MEMBLOCK_NONE
- no special request
MEMBLOCK_HOTPLUG
- hotpluggable region
MEMBLOCK_MIRROR
- mirrored region
MEMBLOCK_NOMAP
- don’t add to kernel direct mapping
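Region attributes of this kind are typically stored as bits that can be combined and tested with a mask. The sketch below shows that pattern in user space; the sk_ names are hypothetical and the numeric values are an assumption chosen so that the attributes can be OR-ed together.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit values for the sketch; one bit per attribute. */
enum sk_memblock_flags {
    SK_MEMBLOCK_NONE    = 0x0,  /* no special request */
    SK_MEMBLOCK_HOTPLUG = 0x1,  /* hotpluggable region */
    SK_MEMBLOCK_MIRROR  = 0x2,  /* mirrored region */
    SK_MEMBLOCK_NOMAP   = 0x4,  /* don't add to kernel direct mapping */
};

struct sk_region {
    unsigned long long base;
    unsigned long long size;
    unsigned int flags;         /* OR of sk_memblock_flags values */
};

/* True when the region carries the given attribute bit. */
static bool sk_region_has(const struct sk_region *r, unsigned int flag)
{
    return (r->flags & flag) != 0;
}

/* True when the region would be part of the direct mapping. */
static bool sk_region_is_mapped(const struct sk_region *r)
{
    return !sk_region_has(r, SK_MEMBLOCK_NOMAP);
}
```

A region can carry several attributes at once, for instance hotpluggable and excluded from the direct mapping.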
- struct memblock_region
represents a memory region
Definition
struct memblock_region {
    phys_addr_t base;
    phys_addr_t size;
    enum memblock_flags flags;
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
    int nid;
#endif
};
Members
base
- physical address of the region
size
- size of the region
flags
- memory region attributes
nid
- NUMA node id
- struct memblock_type
collection of memory regions of certain type
Definition
struct memblock_type {
    unsigned long cnt;
    unsigned long max;
    phys_addr_t total_size;
    struct memblock_region *regions;
    char *name;
};
Members
cnt
- number of regions
max
- size of the allocated array
total_size
- size of all regions
regions
- array of regions
name
- the memory type symbolic name
- struct memblock
memblock allocator metadata
Definition
struct memblock {
    bool bottom_up;
    phys_addr_t current_limit;
    struct memblock_type memory;
    struct memblock_type reserved;
#ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP
    struct memblock_type physmem;
#endif
};
Members
bottom_up
- whether allocations proceed in the bottom-up direction
current_limit
- physical address of the current allocation limit
memory
- usable memory regions
reserved
- reserved memory regions
physmem
- all physical memory
- for_each_mem_range(i, type_a, type_b, nid, flags, p_start, p_end, p_nid)
iterate through memblock areas from type_a and not included in type_b. Or just type_a if type_b is NULL.
Parameters
i
- u64 used as loop variable
type_a
- ptr to memblock_type to iterate
type_b
- ptr to memblock_type which excludes from the iteration
nid
- node selector, NUMA_NO_NODE for all nodes
flags
- pick from blocks based on memory attributes
p_start
- ptr to phys_addr_t for start address of the range, can be NULL
p_end
- ptr to phys_addr_t for end address of the range, can be NULL
p_nid
- ptr to int for nid of the range, can be NULL
- for_each_mem_range_rev(i, type_a, type_b, nid, flags, p_start, p_end, p_nid)
reverse iterate through memblock areas from type_a and not included in type_b. Or just type_a if type_b is NULL.
Parameters
i
- u64 used as loop variable
type_a
- ptr to memblock_type to iterate
type_b
- ptr to memblock_type which excludes from the iteration
nid
- node selector, NUMA_NO_NODE for all nodes
flags
- pick from blocks based on memory attributes
p_start
- ptr to phys_addr_t for start address of the range, can be NULL
p_end
- ptr to phys_addr_t for end address of the range, can be NULL
p_nid
- ptr to int for nid of the range, can be NULL
- for_each_reserved_mem_region(i, p_start, p_end)
iterate over all reserved memblock areas
Parameters
i
- u64 used as loop variable
p_start
- ptr to phys_addr_t for start address of the range, can be NULL
p_end
- ptr to phys_addr_t for end address of the range, can be NULL
Description
Walks over reserved areas of memblock. Available as soon as memblock is initialized.
- for_each_mem_pfn_range(i, nid, p_start, p_end, p_nid)
early memory pfn range iterator
Parameters
i
- an integer used as loop variable
nid
- node selector, MAX_NUMNODES for all nodes
p_start
- ptr to ulong for start pfn of the range, can be NULL
p_end
- ptr to ulong for end pfn of the range, can be NULL
p_nid
- ptr to int for nid of the range, can be NULL
Description
Walks over configured memory ranges.
- for_each_free_mem_range(i, nid, flags, p_start, p_end, p_nid)
iterate through free memblock areas
Parameters
i
- u64 used as loop variable
nid
- node selector, NUMA_NO_NODE for all nodes
flags
- pick from blocks based on memory attributes
p_start
- ptr to phys_addr_t for start address of the range, can be NULL
p_end
- ptr to phys_addr_t for end address of the range, can be NULL
p_nid
- ptr to int for nid of the range, can be NULL
Description
Walks over free (memory && !reserved) areas of memblock. Available as soon as memblock is initialized.
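The "memory && !reserved" walk can be illustrated with a user-space sketch that subtracts one sorted list of ranges from another. The sk_ names are hypothetical, node and flag filtering are left out, and results are collected into an array instead of being produced by an iterator macro, so this is a model of the idea rather than of the kernel macro itself.

```c
#include <assert.h>
#include <stddef.h>

struct sk_range {
    unsigned long long start;  /* inclusive */
    unsigned long long end;    /* exclusive */
};

/*
 * Both inputs must be sorted by address and free of overlaps.
 * Stores up to `max` free ranges in `out` and returns how many
 * were stored.
 */
static size_t sk_free_ranges(const struct sk_range *mem, size_t nmem,
                             const struct sk_range *resv, size_t nresv,
                             struct sk_range *out, size_t max)
{
    size_t n = 0, i, j = 0, k;

    for (i = 0; i < nmem; i++) {
        unsigned long long cur = mem[i].start;

        /* Skip reserved ranges that end before this memory range. */
        while (j < nresv && resv[j].end <= cur)
            j++;

        /* Emit the gaps between reserved ranges inside mem[i]. */
        k = j;
        while (cur < mem[i].end) {
            unsigned long long stop = mem[i].end;

            if (k < nresv && resv[k].start < stop)
                stop = resv[k].start;
            if (stop > cur && n < max) {
                out[n].start = cur;
                out[n].end = stop;
                n++;
            }
            if (k >= nresv || resv[k].start >= mem[i].end)
                break;
            if (resv[k].end > cur)
                cur = resv[k].end;
            k++;
        }
    }
    return n;
}
```

A reserved range that straddles two memory ranges trims the tail of the first and the head of the second, which is why the inner walk restarts from index j for every memory range.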
- for_each_free_mem_range_reverse(i, nid, flags, p_start, p_end, p_nid)
rev-iterate through free memblock areas
Parameters
i
- u64 used as loop variable
nid
- node selector, NUMA_NO_NODE for all nodes
flags
- pick from blocks based on memory attributes
p_start
- ptr to phys_addr_t for start address of the range, can be NULL
p_end
- ptr to phys_addr_t for end address of the range, can be NULL
p_nid
- ptr to int for nid of the range, can be NULL
Description
Walks over free (memory && !reserved) areas of memblock in reverse order. Available as soon as memblock is initialized.
iterate through reserved and unavailable memory
Parameters
i
- u64 used as loop variable
p_start
- ptr to phys_addr_t for start address of the range, can be NULL
p_end
- ptr to phys_addr_t for end address of the range, can be NULL
Description
Walks over unavailable but reserved (reserved && !memory) areas of memblock. Available as soon as memblock is initialized.
Note
Because this memory does not belong to any physical node, the flags and nid arguments do not make sense and are thus not exported as arguments.
- void memblock_set_current_limit(phys_addr_t limit)
Set the current allocation limit to allow limiting allocations to what is currently accessible during boot
Parameters
phys_addr_t limit
- New limit value (physical address)
- unsigned long memblock_region_memory_base_pfn(const struct memblock_region * reg)
get the lowest pfn of the memory region
Parameters
const struct memblock_region * reg
- memblock_region structure
Return
the lowest pfn intersecting with the memory region
- unsigned long memblock_region_memory_end_pfn(const struct memblock_region * reg)
get the end pfn of the memory region
Parameters
const struct memblock_region * reg
- memblock_region structure
Return
the end_pfn of the memory region
- unsigned long memblock_region_reserved_base_pfn(const struct memblock_region * reg)
get the lowest pfn of the reserved region
Parameters
const struct memblock_region * reg
- memblock_region structure
Return
the lowest pfn intersecting with the reserved region
- unsigned long memblock_region_reserved_end_pfn(const struct memblock_region * reg)
get the end pfn of the reserved region
Parameters
const struct memblock_region * reg
- memblock_region structure
Return
the end_pfn of the reserved region
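One way to read the four PFN helpers is that a memory region is rounded inward, so only whole pages count as memory, while a reserved region is rounded outward, so every page it touches counts as reserved. The sketch below encodes that reading; the rounding directions, the sk_ names and the 4 KiB page size are assumptions of this example.

```c
#include <assert.h>

#define SK_PAGE_SHIFT 12  /* assumed 4 KiB pages */
#define SK_PFN_UP(x)   (((x) + (1ULL << SK_PAGE_SHIFT) - 1) >> SK_PAGE_SHIFT)
#define SK_PFN_DOWN(x) ((x) >> SK_PAGE_SHIFT)

struct sk_region {
    unsigned long long base;  /* physical address of the region */
    unsigned long long size;  /* size of the region in bytes */
};

/* Memory regions: keep only pages that lie fully inside the region. */
static unsigned long long sk_memory_base_pfn(const struct sk_region *r)
{
    return SK_PFN_UP(r->base);
}

static unsigned long long sk_memory_end_pfn(const struct sk_region *r)
{
    return SK_PFN_DOWN(r->base + r->size);
}

/* Reserved regions: cover every page the region touches. */
static unsigned long long sk_reserved_base_pfn(const struct sk_region *r)
{
    return SK_PFN_DOWN(r->base);
}

static unsigned long long sk_reserved_end_pfn(const struct sk_region *r)
{
    return SK_PFN_UP(r->base + r->size);
}
```

For a region that starts at 0x1800 and is 0x2000 bytes long, the memory view is the single page 2, whereas the reserved view spans pages 1 through 3 (end pfn 4, exclusive).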