Boot time memory management

Early system initialization cannot use “normal” memory management simply because it is not set up yet. But there is still a need to allocate memory for various data structures, for instance for the physical page allocator.

A specialized allocator called memblock performs the boot time memory management. The architecture specific initialization must set it up in the setup_arch() function and tear it down in the mem_init() function.

Once the early memory management is available it offers a variety of functions and macros for memory allocations. The allocation request may be directed to the first (and probably the only) node or to a particular node in a NUMA system. There are API variants that panic when an allocation fails and those that don’t.

Memblock also offers a variety of APIs that control its own behaviour.

Memblock Overview

Memblock is a method of managing memory regions during the early boot period when the usual kernel memory allocators are not up and running.

Memblock views the system memory as collections of contiguous regions. There are several types of these collections:

  • memory - describes the physical memory available to the kernel; this may differ from the actual physical memory installed in the system, for instance when the memory is restricted with the mem= command line parameter
  • reserved - describes the regions that were allocated
  • physmap - describes the actual physical memory regardless of the possible restrictions; the physmap type is only available on some architectures.

Each region is represented by struct memblock_region, which defines the region extents, its attributes and, on NUMA systems, its NUMA node id. Every memory type is described by struct memblock_type, which contains an array of memory regions along with the allocator metadata. The memory types are wrapped together in struct memblock. This structure is statically initialized at build time. The region arrays for the “memory” and “reserved” types are initially sized to INIT_MEMBLOCK_REGIONS and the array for the “physmap” type to INIT_PHYSMEM_REGIONS. memblock_allow_resize() enables automatic resizing of the region arrays when new regions are added. This feature should be used with care so that the memory allocated for a region array does not overlap with areas that should be reserved, for example the initrd.

The early architecture setup should tell memblock what the physical memory layout is by using memblock_add() or memblock_add_node() functions. The first function does not assign the region to a NUMA node and it is appropriate for UMA systems. Yet, it is possible to use it on NUMA systems as well and assign the region to a NUMA node later in the setup process using memblock_set_node(). The memblock_add_node() performs such an assignment directly.

Once memblock is set up, memory can be allocated using either the memblock or the bootmem APIs.

As the system boot progresses, the architecture specific mem_init() function frees all the memory to the buddy page allocator.

If an architecture enables CONFIG_ARCH_DISCARD_MEMBLOCK, the memblock data structures will be discarded after the system initialization completes.

Functions and structures

Here is the description of memblock data structures, functions and macros. Some of them are actually internal, but since they are documented it would be silly to omit them. Besides, reading the descriptions for the internal functions can help to understand what really happens under the hood.

enum memblock_flags

definition of memory region attributes

Constants

MEMBLOCK_NONE
no special request
MEMBLOCK_HOTPLUG
hotpluggable region
MEMBLOCK_MIRROR
mirrored region
MEMBLOCK_NOMAP
don’t add to kernel direct mapping

struct memblock_region

represents a memory region

Definition

struct memblock_region {
  phys_addr_t base;
  phys_addr_t size;
  enum memblock_flags flags;
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
  int nid;
#endif
};

Members

base
physical address of the region
size
size of the region
flags
memory region attributes
nid
NUMA node id

struct memblock_type

collection of memory regions of certain type

Definition

struct memblock_type {
  unsigned long cnt;
  unsigned long max;
  phys_addr_t total_size;
  struct memblock_region *regions;
  char *name;
};

Members

cnt
number of regions
max
size of the allocated array
total_size
size of all regions
regions
array of regions
name
the memory type symbolic name

struct memblock

memblock allocator metadata

Definition

struct memblock {
  bool bottom_up;
  phys_addr_t current_limit;
  struct memblock_type memory;
  struct memblock_type reserved;
#ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP
  struct memblock_type physmem;
#endif
};

Members

bottom_up
is bottom up direction?
current_limit
physical address of the current allocation limit
memory
usable memory regions
reserved
reserved memory regions
physmem
all physical memory

for_each_mem_range(i, type_a, type_b, nid, flags, p_start, p_end, p_nid)

iterate through memblock areas that are in type_a and not included in type_b, or through all of type_a if type_b is NULL.

Parameters

i
u64 used as loop variable
type_a
ptr to memblock_type to iterate
type_b
ptr to memblock_type whose regions are excluded from the iteration
nid
node selector, NUMA_NO_NODE for all nodes
flags
pick from blocks based on memory attributes
p_start
ptr to phys_addr_t for start address of the range, can be NULL
p_end
ptr to phys_addr_t for end address of the range, can be NULL
p_nid
ptr to int for nid of the range, can be NULL

for_each_mem_range_rev(i, type_a, type_b, nid, flags, p_start, p_end, p_nid)

iterate in reverse through memblock areas that are in type_a and not included in type_b, or through all of type_a if type_b is NULL.

Parameters

i
u64 used as loop variable
type_a
ptr to memblock_type to iterate
type_b
ptr to memblock_type whose regions are excluded from the iteration
nid
node selector, NUMA_NO_NODE for all nodes
flags
pick from blocks based on memory attributes
p_start
ptr to phys_addr_t for start address of the range, can be NULL
p_end
ptr to phys_addr_t for end address of the range, can be NULL
p_nid
ptr to int for nid of the range, can be NULL

for_each_reserved_mem_region(i, p_start, p_end)

iterate over all reserved memblock areas

Parameters

i
u64 used as loop variable
p_start
ptr to phys_addr_t for start address of the range, can be NULL
p_end
ptr to phys_addr_t for end address of the range, can be NULL

Description

Walks over reserved areas of memblock. Available as soon as memblock is initialized.

for_each_mem_pfn_range(i, nid, p_start, p_end, p_nid)

early memory pfn range iterator

Parameters

i
an integer used as loop variable
nid
node selector, MAX_NUMNODES for all nodes
p_start
ptr to ulong for start pfn of the range, can be NULL
p_end
ptr to ulong for end pfn of the range, can be NULL
p_nid
ptr to int for nid of the range, can be NULL

Description

Walks over configured memory ranges.

for_each_free_mem_range(i, nid, flags, p_start, p_end, p_nid)

iterate through free memblock areas

Parameters

i
u64 used as loop variable
nid
node selector, NUMA_NO_NODE for all nodes
flags
pick from blocks based on memory attributes
p_start
ptr to phys_addr_t for start address of the range, can be NULL
p_end
ptr to phys_addr_t for end address of the range, can be NULL
p_nid
ptr to int for nid of the range, can be NULL

Description

Walks over free (memory && !reserved) areas of memblock. Available as soon as memblock is initialized.
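
The “memory && !reserved” walk can be pictured as subtracting one sorted range array from another. The following is a minimal user-space sketch of that idea, not the kernel implementation: struct range, free_ranges() and the uint64_t stand-in for phys_addr_t are names assumed here for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;              /* stand-in for the kernel type */

struct range { phys_addr_t start, end; };  /* half-open: [start, end) */

/*
 * Store the parts of mem[] that are not covered by rsv[] in out[].
 * Both inputs must be sorted by start and free of internal overlaps.
 * Returns the number of free ranges found; at most max are stored.
 */
static size_t free_ranges(const struct range *mem, size_t nmem,
                          const struct range *rsv, size_t nrsv,
                          struct range *out, size_t max)
{
    size_t n = 0, j = 0;

    for (size_t i = 0; i < nmem; i++) {
        phys_addr_t cur = mem[i].start;

        /* Skip reserved ranges that end before this memory range. */
        while (j < nrsv && rsv[j].end <= cur)
            j++;

        for (size_t k = j; k < nrsv && rsv[k].start < mem[i].end; k++) {
            if (rsv[k].start > cur) {
                if (n < max)
                    out[n] = (struct range){ cur, rsv[k].start };
                n++;
            }
            if (rsv[k].end > cur)
                cur = rsv[k].end;   /* continue after the reserved part */
        }
        if (cur < mem[i].end) {
            if (n < max)
                out[n] = (struct range){ cur, mem[i].end };
            n++;
        }
    }
    return n;
}
```

With memory ranges [0, 100) and [200, 300) and reserved ranges [10, 20) and [90, 210), the sketch reports [0, 10), [20, 90) and [210, 300) as free.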

for_each_free_mem_range_reverse(i, nid, flags, p_start, p_end, p_nid)

rev-iterate through free memblock areas

Parameters

i
u64 used as loop variable
nid
node selector, NUMA_NO_NODE for all nodes
flags
pick from blocks based on memory attributes
p_start
ptr to phys_addr_t for start address of the range, can be NULL
p_end
ptr to phys_addr_t for end address of the range, can be NULL
p_nid
ptr to int for nid of the range, can be NULL

Description

Walks over free (memory && !reserved) areas of memblock in reverse order. Available as soon as memblock is initialized.

void memblock_set_current_limit(phys_addr_t limit)

Set the current allocation limit to allow limiting allocations to what is currently accessible during boot

Parameters

phys_addr_t limit
New limit value (physical address)

unsigned long memblock_region_memory_base_pfn(const struct memblock_region * reg)

get the lowest pfn of the memory region

Parameters

const struct memblock_region * reg
memblock_region structure

Return

the lowest pfn intersecting with the memory region

unsigned long memblock_region_memory_end_pfn(const struct memblock_region * reg)

get the end pfn of the memory region

Parameters

const struct memblock_region * reg
memblock_region structure

Return

the end_pfn of the memory region

unsigned long memblock_region_reserved_base_pfn(const struct memblock_region * reg)

get the lowest pfn of the reserved region

Parameters

const struct memblock_region * reg
memblock_region structure

Return

the lowest pfn intersecting with the reserved region

unsigned long memblock_region_reserved_end_pfn(const struct memblock_region * reg)

get the end pfn of the reserved region

Parameters

const struct memblock_region * reg
memblock_region structure

Return

the end_pfn of the reserved region
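
The four pfn helpers differ only in the direction in which a region boundary that is not page aligned gets rounded. The sketch below is a user-space illustration under two assumptions that should be checked against the kernel headers in use: 4 KiB pages, and the convention that memory regions are rounded inward (whole pages only) while reserved regions are rounded outward (any page that is touched). The helper names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;      /* stand-in for the kernel type */

#define PAGE_SHIFT 12              /* assumption: 4 KiB pages */
#define PAGE_SIZE  ((phys_addr_t)1 << PAGE_SHIFT)

struct region { phys_addr_t base, size; };

/* Convert an address to a page frame number, rounding up or down. */
static unsigned long pfn_up(phys_addr_t a)
{
    return (unsigned long)((a + PAGE_SIZE - 1) >> PAGE_SHIFT);
}

static unsigned long pfn_down(phys_addr_t a)
{
    return (unsigned long)(a >> PAGE_SHIFT);
}

/* Memory regions: count only pages that lie completely inside. */
static unsigned long memory_base_pfn(const struct region *r)
{
    return pfn_up(r->base);
}

static unsigned long memory_end_pfn(const struct region *r)
{
    return pfn_down(r->base + r->size);
}

/* Reserved regions: count every page the region touches. */
static unsigned long reserved_base_pfn(const struct region *r)
{
    return pfn_down(r->base);
}

static unsigned long reserved_end_pfn(const struct region *r)
{
    return pfn_up(r->base + r->size);
}
```

For a region covering [0x1800, 0x3800), the memory view is pfns [2, 3) and the reserved view is pfns [1, 4). In both cases the end pfn is exclusive.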

phys_addr_t __init_memblock __memblock_find_range_bottom_up(phys_addr_t start, phys_addr_t end, phys_addr_t size, phys_addr_t align, int nid, enum memblock_flags flags)

find free area utility in bottom-up

Parameters

phys_addr_t start
start of candidate range
phys_addr_t end
end of candidate range, can be MEMBLOCK_ALLOC_ANYWHERE or MEMBLOCK_ALLOC_ACCESSIBLE
phys_addr_t size
size of free area to find
phys_addr_t align
alignment of free area to find
int nid
nid of the free area to find, NUMA_NO_NODE for any node
enum memblock_flags flags
pick from blocks based on memory attributes

Description

Utility called from memblock_find_in_range_node(), find free area bottom-up.

Return

Found address on success, 0 on failure.

phys_addr_t __init_memblock __memblock_find_range_top_down(phys_addr_t start, phys_addr_t end, phys_addr_t size, phys_addr_t align, int nid, enum memblock_flags flags)

find free area utility, in top-down

Parameters

phys_addr_t start
start of candidate range
phys_addr_t end
end of candidate range, can be MEMBLOCK_ALLOC_ANYWHERE or MEMBLOCK_ALLOC_ACCESSIBLE
phys_addr_t size
size of free area to find
phys_addr_t align
alignment of free area to find
int nid
nid of the free area to find, NUMA_NO_NODE for any node
enum memblock_flags flags
pick from blocks based on memory attributes

Description

Utility called from memblock_find_in_range_node(), find free area top-down.

Return

Found address on success, 0 on failure.
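
The difference between the bottom-up and top-down utilities is easiest to see side by side. The following user-space sketch searches a sorted list of free ranges in each direction. It is an illustration rather than the kernel code: struct range, find_bottom_up(), find_top_down() and the small rounding helpers are hypothetical names, align is assumed to be a power of two and size is assumed to be non-zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;               /* stand-in for the kernel type */
struct range { phys_addr_t start, end; };   /* half-open free range */

/* align must be a power of two. */
static phys_addr_t round_up_to(phys_addr_t x, phys_addr_t align)
{
    return (x + align - 1) & ~(align - 1);
}

static phys_addr_t round_down_to(phys_addr_t x, phys_addr_t align)
{
    return x & ~(align - 1);
}

static phys_addr_t clamp_to(phys_addr_t x, phys_addr_t lo, phys_addr_t hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Lowest suitable address within [start, end), or 0 if there is none. */
static phys_addr_t find_bottom_up(const struct range *avail, size_t n,
                                  phys_addr_t start, phys_addr_t end,
                                  phys_addr_t size, phys_addr_t align)
{
    for (size_t i = 0; i < n; i++) {
        phys_addr_t lo = clamp_to(avail[i].start, start, end);
        phys_addr_t hi = clamp_to(avail[i].end, start, end);
        phys_addr_t cand = round_up_to(lo, align);

        if (cand < hi && hi - cand >= size)
            return cand;
    }
    return 0;
}

/* Highest suitable address within [start, end), or 0 if there is none. */
static phys_addr_t find_top_down(const struct range *avail, size_t n,
                                 phys_addr_t start, phys_addr_t end,
                                 phys_addr_t size, phys_addr_t align)
{
    for (size_t i = n; i-- > 0; ) {
        phys_addr_t lo = clamp_to(avail[i].start, start, end);
        phys_addr_t hi = clamp_to(avail[i].end, start, end);
        phys_addr_t cand;

        if (hi < size)
            continue;               /* avoid underflow in hi - size */
        cand = round_down_to(hi - size, align);
        if (cand >= lo)
            return cand;
    }
    return 0;
}
```

Bottom-up rounds the candidate up from the low end of each free range, while top-down rounds down from the high end, so the two searches generally return different addresses for the same request.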

phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size, phys_addr_t align, phys_addr_t start, phys_addr_t end, int nid, enum memblock_flags flags)

find free area in given range and node

Parameters

phys_addr_t size
size of free area to find
phys_addr_t align
alignment of free area to find
phys_addr_t start
start of candidate range
phys_addr_t end
end of candidate range, can be MEMBLOCK_ALLOC_ANYWHERE or MEMBLOCK_ALLOC_ACCESSIBLE
int nid
nid of the free area to find, NUMA_NO_NODE for any node
enum memblock_flags flags
pick from blocks based on memory attributes

Description

Find a free area of size bytes, aligned to align, in the specified range and node.

When allocation direction is bottom-up, the start should be greater than the end of the kernel image. Otherwise, it will be trimmed. The reason is that we want the bottom-up allocation just near the kernel image so it is highly likely that the allocated memory and the kernel will reside in the same node.

If the bottom-up allocation fails, the function falls back to a top-down allocation.

Return

Found address on success, 0 on failure.

phys_addr_t __init_memblock memblock_find_in_range(phys_addr_t start, phys_addr_t end, phys_addr_t size, phys_addr_t align)

find free area in given range

Parameters

phys_addr_t start
start of candidate range
phys_addr_t end
end of candidate range, can be MEMBLOCK_ALLOC_ANYWHERE or MEMBLOCK_ALLOC_ACCESSIBLE
phys_addr_t size
size of free area to find
phys_addr_t align
alignment of free area to find

Description

Find a free area of size bytes, aligned to align, in the specified range.

Return

Found address on success, 0 on failure.

void memblock_discard(void)

discard memory and reserved arrays if they were allocated

Parameters

void
no arguments

int __init_memblock memblock_double_array(struct memblock_type * type, phys_addr_t new_area_start, phys_addr_t new_area_size)

double the size of the memblock regions array

Parameters

struct memblock_type * type
memblock type of the regions array being doubled
phys_addr_t new_area_start
starting address of memory range to avoid overlap with
phys_addr_t new_area_size
size of memory range to avoid overlap with

Description

Double the size of the type regions array. If memblock is being used to allocate memory for a new reserved regions array and there is a previously allocated memory range [new_area_start, new_area_start + new_area_size] waiting to be reserved, ensure the memory used by the new array does not overlap.

Return

0 on success, -1 on failure.

void __init_memblock memblock_merge_regions(struct memblock_type * type)

merge neighboring compatible regions

Parameters

struct memblock_type * type
memblock type to scan

Description

Scan type and merge neighboring compatible regions.
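
As an illustration of the merge pass, here is a small user-space sketch that works on a sorted array. It assumes that “compatible” means the two regions are directly adjacent and carry the same node id and flags; struct region and merge_regions() are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t phys_addr_t;      /* stand-in for the kernel type */

struct region {
    phys_addr_t base, size;
    int nid;
    unsigned int flags;
};

/*
 * Merge neighbouring entries of a sorted array in place when one ends
 * exactly where the next begins and both have the same nid and flags.
 * Returns the new number of regions.
 */
static size_t merge_regions(struct region *r, size_t cnt)
{
    size_t i = 0;

    while (i + 1 < cnt) {
        struct region *cur = &r[i], *next = &r[i + 1];

        if (cur->base + cur->size != next->base ||
            cur->nid != next->nid || cur->flags != next->flags) {
            i++;
            continue;
        }
        cur->size += next->size;
        /* Close the gap left by the absorbed entry. */
        memmove(next, next + 1, (cnt - (i + 2)) * sizeof(*r));
        cnt--;
    }
    return cnt;
}
```

The index is not advanced after a merge, so a run of three or more adjacent compatible regions collapses into a single entry.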

void __init_memblock memblock_insert_region(struct memblock_type * type, int idx, phys_addr_t base, phys_addr_t size, int nid, enum memblock_flags flags)

insert new memblock region

Parameters

struct memblock_type * type
memblock type to insert into
int idx
index for the insertion point
phys_addr_t base
base address of the new region
phys_addr_t size
size of the new region
int nid
node id of the new region
enum memblock_flags flags
flags of the new region

Description

Insert new memblock region [base, base + size) into type at idx. type must already have extra room to accommodate the new region.

int __init_memblock memblock_add_range(struct memblock_type * type, phys_addr_t base, phys_addr_t size, int nid, enum memblock_flags flags)

add new memblock region

Parameters

struct memblock_type * type
memblock type to add new region into
phys_addr_t base
base address of the new region
phys_addr_t size
size of the new region
int nid
nid of the new region
enum memblock_flags flags
flags of the new region

Description

Add new memblock region [base, base + size) into type. The new region is allowed to overlap with existing ones - overlaps don’t affect already existing regions. type is guaranteed to be minimal (all neighbouring compatible regions are merged) after the addition.

Return

0 on success, -errno on failure.
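
To make the overlap rule concrete, the sketch below adds a range to a sorted, fixed-size list and inserts only the parts that no existing region already covers. It is a simplified user-space model, not the kernel code: struct region_list, MAX_REGIONS, insert_at() and add_range() are hypothetical, node ids and flags are left out, and the final pass that merges touching neighbours is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t phys_addr_t;      /* stand-in for the kernel type */

#define MAX_REGIONS 16             /* hypothetical fixed capacity */

struct region { phys_addr_t base, size; };
struct region_list { size_t cnt; struct region r[MAX_REGIONS]; };

/* Insert [base, base + size) at position idx, shifting later entries. */
static void insert_at(struct region_list *l, size_t idx,
                      phys_addr_t base, phys_addr_t size)
{
    assert(l->cnt < MAX_REGIONS && idx <= l->cnt);
    memmove(&l->r[idx + 1], &l->r[idx], (l->cnt - idx) * sizeof(l->r[0]));
    l->r[idx] = (struct region){ base, size };
    l->cnt++;
}

/* Add [base, base + size); existing regions are never modified. */
static void add_range(struct region_list *l, phys_addr_t base, phys_addr_t size)
{
    phys_addr_t end = base + size;
    size_t i;

    for (i = 0; i < l->cnt && base < end; i++) {
        phys_addr_t rbase = l->r[i].base;
        phys_addr_t rend = rbase + l->r[i].size;

        if (rend <= base)
            continue;              /* region lies below the new range */
        if (rbase >= end)
            break;                 /* region lies above the new range */
        if (rbase > base)          /* fill the hole below this region */
            insert_at(l, i++, base, rbase - base);
        base = rend;               /* skip the part already covered */
    }
    if (base < end)                /* whatever remains above the last overlap */
        insert_at(l, i, base, end - base);
}
```

Adding [150, 350) to a list holding [100, 200) and [300, 400) inserts only [200, 300); a subsequent merge pass would then collapse the three touching regions into one.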

int __init_memblock memblock_add_node(phys_addr_t base, phys_addr_t size, int nid)

add new memblock region within a NUMA node

Parameters

phys_addr_t base
base address of the new region
phys_addr_t size
size of the new region
int nid
nid of the new region

Description

Add new memblock region [base, base + size) to the “memory” type. See the memblock_add_range() description for more details.

Return

0 on success, -errno on failure.

int __init_memblock memblock_add(phys_addr_t base, phys_addr_t size)

add new memblock region

Parameters

phys_addr_t base
base address of the new region
phys_addr_t size
size of the new region

Description

Add new memblock region [base, base + size) to the “memory” type. See the memblock_add_range() description for more details.

Return

0 on success, -errno on failure.

int __init_memblock memblock_isolate_range(struct memblock_type * type, phys_addr_t base, phys_addr_t size, int * start_rgn, int * end_rgn)

isolate given range into disjoint memblocks

Parameters

struct memblock_type * type
memblock type to isolate range for
phys_addr_t base
base of range to isolate
phys_addr_t size
size of range to isolate
int * start_rgn
out parameter for the start of isolated region
int * end_rgn
out parameter for the end of isolated region

Description

Walk type and ensure that regions don’t cross the boundaries defined by [base, base + size). Crossing regions are split at the boundaries, which may create at most two more regions. The index of the first region inside the range is returned in *start_rgn and end in *end_rgn.

Return

0 on success, -errno on failure.
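
The splitting step can be sketched in user space as follows. This is an illustrative model only: struct region_list, MAX_REGIONS, split_region() and isolate_range() are invented names, the list has a fixed capacity, and *end_rgn is assumed to be exclusive, that is, one past the last region inside the range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t phys_addr_t;      /* stand-in for the kernel type */

#define MAX_REGIONS 16             /* hypothetical fixed capacity */

struct region { phys_addr_t base, size; };
struct region_list { size_t cnt; struct region r[MAX_REGIONS]; };

/* Split entry idx at address at; the upper half becomes entry idx + 1. */
static void split_region(struct region_list *l, size_t idx, phys_addr_t at)
{
    struct region *r = &l->r[idx];
    phys_addr_t end = r->base + r->size;

    assert(l->cnt < MAX_REGIONS && at > r->base && at < end);
    memmove(r + 2, r + 1, (l->cnt - idx - 1) * sizeof(*r));
    r[1] = (struct region){ at, end - at };
    r->size = at - r->base;
    l->cnt++;
}

/*
 * Make sure no region crosses base or base + size, then report the
 * regions inside the range as the index interval [*start_rgn, *end_rgn).
 */
static void isolate_range(struct region_list *l, phys_addr_t base,
                          phys_addr_t size, size_t *start_rgn, size_t *end_rgn)
{
    phys_addr_t end = base + size;

    *start_rgn = *end_rgn = 0;
    for (size_t i = 0; i < l->cnt; i++) {
        phys_addr_t rbase = l->r[i].base;
        phys_addr_t rend = rbase + l->r[i].size;

        if (rbase >= end)
            break;
        if (rend <= base)
            continue;
        if (rbase < base) {
            split_region(l, i, base);   /* lower half stays outside */
            continue;                   /* upper half is visited next */
        }
        if (rend > end)
            split_region(l, i, end);    /* upper half stays outside */
        if (*end_rgn == 0)
            *start_rgn = i;             /* first region inside the range */
        *end_rgn = i + 1;
    }
}
```

A single region that crosses both boundaries is split twice, which matches the “at most two more regions” bound stated above.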

int __init_memblock memblock_free(phys_addr_t base, phys_addr_t size)

free boot memory block

Parameters

phys_addr_t base
phys starting address of the boot memory block
phys_addr_t size
size of the boot memory block in bytes

Description

Free a boot memory block previously allocated by the memblock_alloc_xx() API. The freed memory is not released to the buddy allocator.

int __init_memblock memblock_setclr_flag(phys_addr_t base, phys_addr_t size, int set, int flag)

set or clear flag for a memory region

Parameters

phys_addr_t base
base address of the region
phys_addr_t size
size of the region
int set
set or clear the flag
int flag
the flag to update

Description

This function isolates region [base, base + size) and sets or clears flag on it.

Return

0 on success, -errno on failure.
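
The set-or-clear step itself is a plain bit operation on each affected region. The sketch below shows it in user space under stated assumptions: the flag bit values are chosen for this example and are not taken from the text, the names are hypothetical, and regions are expected to be isolated already, so any region that crosses a boundary of the range is simply skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;      /* stand-in for the kernel type */

/* Bit values are assumptions chosen for this sketch. */
enum region_flags {
    REGION_NONE    = 0x0,
    REGION_HOTPLUG = 0x1,
    REGION_MIRROR  = 0x2,
    REGION_NOMAP   = 0x4,
};

struct region { phys_addr_t base, size; unsigned int flags; };

/*
 * Set (set != 0) or clear (set == 0) flag on every region that lies
 * completely inside [base, base + size). Regions that cross a boundary
 * are left untouched here. Returns the number of regions updated.
 */
static size_t setclr_flag(struct region *r, size_t cnt, phys_addr_t base,
                          phys_addr_t size, int set, unsigned int flag)
{
    size_t n = 0;

    for (size_t i = 0; i < cnt; i++) {
        if (r[i].base < base || r[i].base + r[i].size > base + size)
            continue;
        if (set)
            r[i].flags |= flag;
        else
            r[i].flags &= ~flag;
        n++;
    }
    return n;
}
```

Other flags on a region are preserved, so marking a mirrored region as hotpluggable leaves the mirror bit in place.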

int __init_memblock memblock_mark_hotplug(phys_addr_t base, phys_addr_t size)

Mark hotpluggable memory with flag MEMBLOCK_HOTPLUG.

Parameters

phys_addr_t base
the base phys addr of the region
phys_addr_t size
the size of the region

Return

0 on success, -errno on failure.

int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size)

Clear flag MEMBLOCK_HOTPLUG for a specified region.

Parameters

phys_addr_t base
the base phys addr of the region
phys_addr_t size
the size of the region

Return

0 on success, -errno on failure.

int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)

Mark mirrored memory with flag MEMBLOCK_MIRROR.

Parameters

phys_addr_t base
the base phys addr of the region
phys_addr_t size
the size of the region

Return

0 on success, -errno on failure.

int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)

Mark a memory region with flag MEMBLOCK_NOMAP.

Parameters

phys_addr_t base
the base phys addr of the region
phys_addr_t size
the size of the region

Return

0 on success, -errno on failure.

int __init_memblock memblock_clear_nomap(phys_addr_t base, phys_addr_t size)

Clear flag MEMBLOCK_NOMAP for a specified region.

Parameters

phys_addr_t base
the base phys addr of the region
phys_addr_t size
the size of the region

Return

0 on success, -errno on failure.

void __init_memblock __next_reserved_mem_region(u64 * idx, phys_addr_t * out_start, phys_addr_t * out_end)

next function for for_each_reserved_mem_region()

Parameters

u64 * idx
pointer to u64 loop variable
phys_addr_t * out_start
ptr to phys_addr_t for start address of the region, can be NULL
phys_addr_t * out_end
ptr to phys_addr_t for end address of the region, can be NULL

Description

Iterate over all reserved memory regions.

void __init_memblock __next_mem_range(u64 * idx, int nid, enum memblock_flags flags, struct memblock_type * type_a, struct memblock_type * type_b, phys_addr_t * out_start, phys_addr_t * out_end, int * out_nid)

next function for for_each_free_mem_range() etc.

Parameters

u64 * idx
pointer to u64 loop variable
int nid
node selector, NUMA_NO_NODE for all nodes
enum memblock_flags flags
pick from blocks based on memory attributes
struct memblock_type * type_a
pointer to memblock_type from where the range is taken
struct memblock_type * type_b
pointer to memblock_type which excludes memory from being taken
phys_addr_t * out_start
ptr to phys_addr_t for start address of the range, can be NULL
phys_addr_t * out_end
ptr to phys_addr_t for end address of the range, can be NULL
int * out_nid
ptr to int for nid of the range, can be NULL

Description

Find the first area from *idx which matches nid, fill the out parameters, and update *idx for the next iteration. The lower 32bit of *idx contains index into type_a and the upper 32bit indexes the areas before each region in type_b. For example, if type_b regions look like the following,

0:[0-16), 1:[32-48), 2:[128-130)

The upper 32bit indexes the following regions.

0:[0-0), 1:[16-32), 2:[48-128), 3:[130-MAX)

As both region arrays are sorted, the function advances the two indices in lockstep and returns each intersection.
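
The index encoding and the “areas before each region” view of type_b can be modelled in a few lines. This user-space sketch uses invented helper names (pack_idx(), idx_a_of(), idx_b_of(), gap_before()) and represents MAX as UINT64_MAX, which is an assumption of the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;               /* stand-in for the kernel type */
struct range { phys_addr_t start, end; };   /* half-open: [start, end) */

/* Pack and unpack the two 32-bit indices carried in the loop variable. */
static uint64_t pack_idx(uint32_t idx_a, uint32_t idx_b)
{
    return (uint64_t)idx_a | ((uint64_t)idx_b << 32);
}

static uint32_t idx_a_of(uint64_t idx)
{
    return (uint32_t)(idx & 0xffffffffu);
}

static uint32_t idx_b_of(uint64_t idx)
{
    return (uint32_t)(idx >> 32);
}

/*
 * Return the area in front of entry i of the sorted array b[].
 * i == n names the area after the last entry, which runs to the maximum.
 */
static struct range gap_before(const struct range *b, size_t n, size_t i)
{
    struct range g;

    assert(i <= n);
    g.start = i ? b[i - 1].end : 0;
    g.end = i < n ? b[i].start : UINT64_MAX;
    return g;
}
```

Applied to the type_b regions [0-16), [32-48) and [128-130), gap_before() yields exactly the four areas listed above for indices 0 to 3.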

void __init_memblock __next_mem_range_rev(u64 * idx, int nid, enum memblock_flags flags, struct memblock_type * type_a, struct memblock_type * type_b, phys_addr_t * out_start, phys_addr_t * out_end, int * out_nid)

generic next function for for_each_*_range_rev()

Parameters

u64 * idx
pointer to u64 loop variable
int nid
node selector, NUMA_NO_NODE for all nodes
enum memblock_flags flags
pick from blocks based on memory attributes
struct memblock_type * type_a
pointer to memblock_type from where the range is taken
struct memblock_type * type_b
pointer to memblock_type which excludes memory from being taken
phys_addr_t * out_start
ptr to phys_addr_t for start address of the range, can be NULL
phys_addr_t * out_end
ptr to phys_addr_t for end address of the range, can be NULL
int * out_nid
ptr to int for nid of the range, can be NULL

Description

Finds the next range from type_a which is not marked as unsuitable in type_b.

Reverse of __next_mem_range().

int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size, struct memblock_type * type, int nid)

set node ID on memblock regions

Parameters

phys_addr_t base
base of area to set node ID for
phys_addr_t size
size of area to set node ID for
struct memblock_type * type
memblock type to set node ID for
int nid
node ID to set

Description

Set the nid of memblock type regions in [base, base + size) to nid. Regions which cross the area boundaries are split as necessary.

Return

0 on success, -errno on failure.

void * memblock_alloc_internal(phys_addr_t size, phys_addr_t align, phys_addr_t min_addr, phys_addr_t max_addr, int nid)

allocate boot memory block

Parameters

phys_addr_t size
size of memory block to be allocated in bytes
phys_addr_t align
alignment of the region and block’s size
phys_addr_t min_addr
the lower bound of the memory region to allocate (phys address)
phys_addr_t max_addr
the upper bound of the memory region to allocate (phys address)
int nid
nid of the free area to find, NUMA_NO_NODE for any node

Description

The min_addr limit is dropped if it can not be satisfied and the allocation will fall back to memory below min_addr. Also, allocation may fall back to any node in the system if the specified node can not hold the requested memory.

The allocation is performed from memory region limited by memblock.current_limit if max_addr == MEMBLOCK_ALLOC_ACCESSIBLE.

The physical address of the allocated boot memory block is converted to a virtual address and the allocated memory is zeroed.

In addition, the function registers the allocated boot memory block with kmemleak_alloc() using a min_count of 0, so that the block is never reported as a leak.

Return

Virtual address of allocated memory block on success, NULL on failure.

void * memblock_alloc_try_nid_raw(phys_addr_t size, phys_addr_t align, phys_addr_t min_addr, phys_addr_t max_addr, int nid)

allocate boot memory block without zeroing memory and without panicking

Parameters

phys_addr_t size
size of memory block to be allocated in bytes
phys_addr_t align
alignment of the region and block’s size
phys_addr_t min_addr
the lower bound of the memory region from where the allocation is preferred (phys address)
phys_addr_t max_addr
the upper bound of the memory region from where the allocation is preferred (phys address), or MEMBLOCK_ALLOC_ACCESSIBLE to allocate only from memory limited by memblock.current_limit value
int nid
nid of the free area to find, NUMA_NO_NODE for any node

Description

Public function, provides additional debug information (including caller info), if enabled. Does not zero allocated memory, does not panic if request cannot be satisfied.

Return

Virtual address of allocated memory block on success, NULL on failure.

void * memblock_alloc_try_nid_nopanic(phys_addr_t size, phys_addr_t align, phys_addr_t min_addr, phys_addr_t max_addr, int nid)

allocate boot memory block

Parameters

phys_addr_t size
size of memory block to be allocated in bytes
phys_addr_t align
alignment of the region and block’s size
phys_addr_t min_addr
the lower bound of the memory region from where the allocation is preferred (phys address)
phys_addr_t max_addr
the upper bound of the memory region from where the allocation is preferred (phys address), or MEMBLOCK_ALLOC_ACCESSIBLE to allocate only from memory limited by memblock.current_limit value
int nid
nid of the free area to find, NUMA_NO_NODE for any node

Description

Public function, provides additional debug information (including caller info), if enabled. This function zeroes the allocated memory.

Return

Virtual address of allocated memory block on success, NULL on failure.

void * memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, phys_addr_t min_addr, phys_addr_t max_addr, int nid)

allocate boot memory block with panicking

Parameters

phys_addr_t size
size of memory block to be allocated in bytes
phys_addr_t align
alignment of the region and block’s size
phys_addr_t min_addr
the lower bound of the memory region from where the allocation is preferred (phys address)
phys_addr_t max_addr
the upper bound of the memory region from where the allocation is preferred (phys address), or MEMBLOCK_ALLOC_ACCESSIBLE to allocate only from memory limited by memblock.current_limit value
int nid
nid of the free area to find, NUMA_NO_NODE for any node

Description

Public panicking version of memblock_alloc_try_nid_nopanic() which provides debug information (including caller info), if enabled, and panics if the request can not be satisfied.

Return

Virtual address of allocated memory block on success, NULL on failure.

void __memblock_free_late(phys_addr_t base, phys_addr_t size)

free bootmem block pages directly to buddy allocator

Parameters

phys_addr_t base
phys starting address of the boot memory block
phys_addr_t size
size of the boot memory block in bytes

Description

This is only useful when the bootmem allocator has already been torn down, but we are still initializing the system. Pages are released directly to the buddy allocator, no bootmem metadata is updated because it is gone.

bool __init_memblock memblock_is_region_memory(phys_addr_t base, phys_addr_t size)

check if a region is a subset of memory

Parameters

phys_addr_t base
base of region to check
phys_addr_t size
size of region to check

Description

Check if the region [base, base + size) is a subset of a memory block.

Return

0 if false, non-zero if true

bool __init_memblock memblock_is_region_reserved(phys_addr_t base, phys_addr_t size)

check if a region intersects reserved memory

Parameters

phys_addr_t base
base of region to check
phys_addr_t size
size of region to check

Description

Check if the region [base, base + size) intersects a reserved memory block.

Return

True if they intersect, false if not.
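
The two checks answer different questions: memblock_is_region_memory() asks whether a range is fully contained in a memory block, while memblock_is_region_reserved() asks whether a range overlaps reserved memory at all. A minimal user-space sketch of both predicates follows; struct region, region_is_subset() and region_intersects() are names assumed for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;      /* stand-in for the kernel type */
struct region { phys_addr_t base, size; };

/* True if [base, base + size) lies entirely within a single region. */
static bool region_is_subset(const struct region *r, size_t cnt,
                             phys_addr_t base, phys_addr_t size)
{
    for (size_t i = 0; i < cnt; i++)
        if (base >= r[i].base && base + size <= r[i].base + r[i].size)
            return true;
    return false;
}

/* True if [base, base + size) overlaps any region by at least one byte. */
static bool region_intersects(const struct region *r, size_t cnt,
                              phys_addr_t base, phys_addr_t size)
{
    for (size_t i = 0; i < cnt; i++)
        if (base < r[i].base + r[i].size && r[i].base < base + size)
            return true;
    return false;
}
```

Because the ranges are half-open, a range that merely touches the end of a region does not count as intersecting it.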

unsigned long memblock_free_all(void)

release free pages to the buddy allocator

Parameters

void
no arguments

Return

the number of pages actually released.