
Appendix E  Boot Memory Allocator

E.1  Initialising the Boot Memory Allocator

The functions in this section are responsible for bootstrapping the boot memory allocator. The process starts with the architecture specific function setup_memory() (See Section B.1.1), but all architectures cover the same basic tasks in their architecture specific code before calling the architecture independent function init_bootmem().

E.1.1  Function: init_bootmem

Source: mm/bootmem.c

This is called by UMA architectures to initialise their boot memory allocator structures.

304 unsigned long __init init_bootmem (unsigned long start, 
                               unsigned long pages)
305 {
306       max_low_pfn = pages;
307       min_low_pfn = start;
308       return(init_bootmem_core(&contig_page_data, start, 0, pages));
309 }
304 Confusingly, the pages parameter is actually the end PFN of the memory addressable by this node, not the number of pages as the name implies
306 Set the max PFN addressable by this node in case the architecture dependent code did not
307 Set the min PFN addressable by this node in case the architecture dependent code did not
308 Call init_bootmem_core() (See Section E.1.3) which does the real work of initialising the bootmem_data

E.1.2  Function: init_bootmem_node

Source: mm/bootmem.c

This is called by NUMA architectures to initialise boot memory allocator data for a given node.

284 unsigned long __init init_bootmem_node (pg_data_t *pgdat, 
                                  unsigned long freepfn, 
                                  unsigned long startpfn, 
                                  unsigned long endpfn)
285 {
286       return(init_bootmem_core(pgdat, freepfn, startpfn, endpfn));
287 }
286 Just call init_bootmem_core() (See Section E.1.3) directly

E.1.3  Function: init_bootmem_core

Source: mm/bootmem.c

Initialises the appropriate struct bootmem_data_t and inserts the node into the linked list of nodes pgdat_list.

 46 static unsigned long __init init_bootmem_core (pg_data_t *pgdat,
 47       unsigned long mapstart, unsigned long start, unsigned long end)
 48 {
 49       bootmem_data_t *bdata = pgdat->bdata;
 50       unsigned long mapsize = ((end - start)+7)/8;
 51 
 52       pgdat->node_next = pgdat_list;
 53       pgdat_list = pgdat;
 54 
 55       mapsize = (mapsize + (sizeof(long) - 1UL)) & 
                    ~(sizeof(long) - 1UL);
 56       bdata->node_bootmem_map = phys_to_virt(mapstart << PAGE_SHIFT);
 57       bdata->node_boot_start = (start << PAGE_SHIFT);
 58       bdata->node_low_pfn = end;
 59 
 60       /*
 61        * Initially all pages are reserved - setup_arch() has to
 62        * register free RAM areas explicitly.
 63        */
 64       memset(bdata->node_bootmem_map, 0xff, mapsize);
 65 
 66       return mapsize;
 67 }
46 The parameters are:
pgdat is the node descriptor being initialised
mapstart is the beginning of the memory that will be usable
start is the beginning PFN of the node
end is the end PFN of the node
50 Each page requires one bit to represent it so the size of the map required is the number of pages in this node rounded up to the nearest multiple of 8 and then divided by 8 to give the number of bytes required (illustrated in the sketch after this list)
52-53 As the node will shortly be considered initialised, insert it into the global pgdat_list
55 Round the mapsize up to the closest word boundary
56 Convert the mapstart to a virtual address and store it in bdata->node_bootmem_map
57 Convert the starting PFN to a physical address and store it in node_boot_start
58 Store the end PFN of ZONE_NORMAL in node_low_pfn
64 Fill the full map with 1s, marking all pages as allocated. It is up to the architecture dependent code to mark the usable pages
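To make the bitmap sizing on lines 50 and 55 concrete, here is a minimal userspace sketch. The node values are invented for illustration; a real node's start and end PFNs come from the architecture specific code.

    #include <stdio.h>

    int main(void)
    {
        /* Illustrative node: 4096 pages starting at PFN 256 */
        unsigned long start = 256, end = 256 + 4096;

        /* Line 50: one bit per page, rounded up to whole bytes */
        unsigned long mapsize = ((end - start) + 7) / 8;

        /* Line 55: round up to the closest word boundary */
        mapsize = (mapsize + (sizeof(long) - 1UL)) & ~(sizeof(long) - 1UL);

        /* 4096 pages need a 512 byte map */
        printf("%lu pages need a %lu byte map\n", end - start, mapsize);
        return 0;
    }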

E.2  Allocating Memory

E.2.1  Reserving Large Regions of Memory

E.2.1.1  Function: reserve_bootmem

Source: mm/bootmem.c

311 void __init reserve_bootmem (unsigned long addr, unsigned long size)
312 {
313     reserve_bootmem_core(contig_page_data.bdata, addr, size);
314 }
313 Just call reserve_bootmem_core() (See Section E.2.1.3). As this is for a non-NUMA architecture, the node to allocate from is the static contig_page_data node.

E.2.1.2  Function: reserve_bootmem_node

Source: mm/bootmem.c

289 void __init reserve_bootmem_node (pg_data_t *pgdat, 
                    unsigned long physaddr,
                    unsigned long size)
290 {
291     reserve_bootmem_core(pgdat->bdata, physaddr, size);
292 }
291 Just call reserve_bootmem_core() (See Section E.2.1.3) passing it the bootmem data of the requested node

E.2.1.3  Function: reserve_bootmem_core

Source: mm/bootmem.c

 74 static void __init reserve_bootmem_core(bootmem_data_t *bdata, 
                        unsigned long addr, 
                        unsigned long size)
 75 {
 76     unsigned long i;
 77     /*
 78      * round up, partially reserved pages are considered
 79      * fully reserved.
 80      */
 81     unsigned long sidx = (addr - bdata->node_boot_start)/PAGE_SIZE;
 82     unsigned long eidx = (addr + size - bdata->node_boot_start + 
 83                PAGE_SIZE-1)/PAGE_SIZE;
 84     unsigned long end = (addr + size + PAGE_SIZE-1)/PAGE_SIZE;
 85 
 86     if (!size) BUG();
 87 
 88     if (sidx < 0)
 89         BUG();
 90     if (eidx < 0)
 91         BUG();
 92     if (sidx >= eidx)
 93         BUG();
 94     if ((addr >> PAGE_SHIFT) >= bdata->node_low_pfn)
 95         BUG();
 96     if (end > bdata->node_low_pfn)
 97         BUG();
 98     for (i = sidx; i < eidx; i++)
 99         if (test_and_set_bit(i, bdata->node_bootmem_map))
100             printk("hm, page %08lx reserved twice.\n",
                   i*PAGE_SIZE);
101 }
81 The sidx is the starting index to reserve pages from. The value is obtained by subtracting the starting address from the requested address and dividing by the size of a page
82-83 A similar calculation is made for the ending index eidx except that the allocation is rounded up to the nearest page. This means that requests to partially reserve a page will result in the full page being reserved (illustrated in the sketch after this list)
84 end is the last PFN that is affected by this reservation
86 Check that a non-zero value has been given
88-89 Check that the starting index is not before the start of the node
90-91 Check that the end index is not before the start of the node
92-93 Check that the starting index is not after the end index
94-95 Check that the starting address is not beyond the memory this bootmem node represents
96-97 Check that the ending address is not beyond the memory this bootmem node represents
98-100 Starting with sidx and finishing with eidx, test and set the bit in the bootmem map that represents the page, marking it as allocated. If the bit was already set to 1, print out a message saying it was reserved twice
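The rounding on lines 81-83 can be seen with a quick userspace sketch. PAGE_SIZE and the addresses are illustrative and the node is assumed to start at physical address 0.

    #include <stdio.h>

    #define PAGE_SIZE 4096UL

    int main(void)
    {
        unsigned long node_boot_start = 0;
        /* Reserve one page plus a single byte, starting at page 2 */
        unsigned long addr = 2 * PAGE_SIZE, size = PAGE_SIZE + 1;

        unsigned long sidx = (addr - node_boot_start) / PAGE_SIZE;
        unsigned long eidx = (addr + size - node_boot_start +
                              PAGE_SIZE - 1) / PAGE_SIZE;

        /* Prints bits 2 to 3: the extra byte costs a full extra page */
        printf("reserving map bits %lu to %lu\n", sidx, eidx - 1);
        return 0;
    }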

E.2.2  Allocating Memory at Boot Time

E.2.2.1  Function: alloc_bootmem

Source: mm/bootmem.c

The callgraph for these macros is shown in Figure 5.1.

 38 #define alloc_bootmem(x) \
 39     __alloc_bootmem((x), SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS))
 40 #define alloc_bootmem_low(x) \
 41     __alloc_bootmem((x), SMP_CACHE_BYTES, 0)
 42 #define alloc_bootmem_pages(x) \
 43     __alloc_bootmem((x), PAGE_SIZE, __pa(MAX_DMA_ADDRESS))
 44 #define alloc_bootmem_low_pages(x) \
 45     __alloc_bootmem((x), PAGE_SIZE, 0)
39 alloc_bootmem() will align to the L1 hardware cache and start searching for a page after the maximum address usable for DMA
40 alloc_bootmem_low() will align to the L1 hardware cache and start searching from page 0
42 alloc_bootmem_pages() will align the allocation to a page size so that full pages will be allocated starting from the maximum address usable for DMA
44 alloc_bootmem_low_pages() will align the allocation to a page size so that full pages will be allocated starting from physical address 0 (a hypothetical usage sketch follows this list)
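As a hedged illustration of how these macros are called, the following sketch shows a hypothetical boot-time caller; my_table and my_setup() are invented names, not kernel functions. alloc_bootmem_low_pages() suits tables that must be page aligned and physically low enough for ISA DMA.

    static unsigned long *my_table;

    void __init my_setup(void)
    {
        /* Returns page-aligned, zero-filled memory, beginning the
         * search at physical address 0; panics if the request
         * cannot be satisfied */
        my_table = alloc_bootmem_low_pages(PAGE_SIZE);
    }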

E.2.2.2  Function: __alloc_bootmem

Source: mm/bootmem.c

326 void * __init __alloc_bootmem (unsigned long size, 
                   unsigned long align, unsigned long goal)
327 {
328     pg_data_t *pgdat;
329     void *ptr;
330 
331     for_each_pgdat(pgdat)
332         if ((ptr = __alloc_bootmem_core(pgdat->bdata, size,
333                         align, goal)))
334             return(ptr);
335 
336     /*
337      * Whoops, we cannot satisfy the allocation request.
338      */
339     printk(KERN_ALERT "bootmem alloc of %lu bytes failed!\n", size);
340     panic("Out of memory");
341     return NULL;
342 }
326 The parameters are:
size is the size of the requested allocation
align is the desired alignment and must be a power of 2. Currently either SMP_CACHE_BYTES or PAGE_SIZE
goal is the starting address to begin searching from
331-334 Cycle through all available nodes and try allocating from each in turn. In the UMA case, this will just allocate from the contig_page_data node
339-340 If the allocation fails, the system is not going to be able to boot so the kernel panics

E.2.2.3  Function: alloc_bootmem_node

Source: mm/bootmem.c

 53 #define alloc_bootmem_node(pgdat, x) \
 54     __alloc_bootmem_node((pgdat), (x), SMP_CACHE_BYTES,
                 __pa(MAX_DMA_ADDRESS))
 55 #define alloc_bootmem_pages_node(pgdat, x) \
 56     __alloc_bootmem_node((pgdat), (x), PAGE_SIZE,
                 __pa(MAX_DMA_ADDRESS))
 57 #define alloc_bootmem_low_pages_node(pgdat, x) \
 58     __alloc_bootmem_node((pgdat), (x), PAGE_SIZE, 0)
53-54 alloc_bootmem_node() will allocate from the requested node and align to the L1 hardware cache and start searching for a page beginning with ZONE_NORMAL (i.e. at the end of ZONE_DMA which is at MAX_DMA_ADDRESS)
55-56 alloc_bootmem_pages_node() will allocate from the requested node and align the allocation to a page size so that full pages will be allocated starting from ZONE_NORMAL
57-58 alloc_bootmem_low_pages_node() will allocate from the requested node and align the allocation to a page size so that full pages will be allocated starting from physical address 0 so that ZONE_DMA will be used

E.2.2.4  Function: __alloc_bootmem_node

Source: mm/bootmem.c

344 void * __init __alloc_bootmem_node (pg_data_t *pgdat, 
                    unsigned long size, 
                    unsigned long align, 
                    unsigned long goal)
345 {
346     void *ptr;
347 
348     ptr = __alloc_bootmem_core(pgdat->bdata, size, align, goal);
349     if (ptr)
350         return (ptr);
351 
352     /*
353      * Whoops, we cannot satisfy the allocation request.
354      */
355     printk(KERN_ALERT "bootmem alloc of %lu bytes failed!\n", size);
356     panic("Out of memory");
357     return NULL;
358 }
344 The parameters are the same as for __alloc_bootmem() (See Section E.2.2.2) except the node to allocate from is specified
348 Call the core function __alloc_bootmem_core() (See Section E.2.2.5) to perform the allocation
349-350 Return a pointer if it was successful
355-356 Otherwise print out a message and panic the kernel as the system will not boot if memory can not be allocated even now

E.2.2.5  Function: __alloc_bootmem_core

Source: mm/bootmem.c

This is the core function for allocating memory from a specified node with the boot memory allocator. It is quite large and is broken up into the following tasks:

144 static void * __init __alloc_bootmem_core (bootmem_data_t *bdata, 
145     unsigned long size, unsigned long align, unsigned long goal)
146 {
147     unsigned long i, start = 0;
148     void *ret;
149     unsigned long offset, remaining_size;
150     unsigned long areasize, preferred, incr;
151     unsigned long eidx = bdata->node_low_pfn - 
152                (bdata->node_boot_start >> PAGE_SHIFT);
153 
154     if (!size) BUG();
155 
156     if (align & (align-1))
157         BUG();
158 
159     offset = 0;
160     if (align &&
161         (bdata->node_boot_start & (align - 1UL)) != 0)
162         offset = (align - (bdata->node_boot_start & 
                    (align - 1UL)));
163     offset >>= PAGE_SHIFT;

Function preamble: make sure the parameters are sane.

144 The parameters are:
bdata is the bootmem_data struct being allocated from
size is the size of the requested allocation
align is the desired alignment for the allocation. Must be a power of 2
goal is the preferred address to allocate above if possible
151 Calculate the ending bit index eidx which returns the highest page index that may be used for the allocation
154 Call BUG() if a request size of 0 is specified
156-157 If the alignment is not a power of 2, call BUG()
159 The default offset for alignments is 0
160-161 If an alignment has been specified and the start of the node is not already aligned to it, then calculate the offset to use
162 The offset to use is the requested alignment minus the misaligned low bits of node_boot_start, converted to pages on line 163 (illustrated in the sketch after this list)
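The offset calculation is easier to see with concrete numbers. This userspace sketch assumes a node starting at physical 0x101000 (page aligned but not 64KiB aligned) and a 64KiB alignment request; both values are made up for illustration.

    #include <stdio.h>

    #define PAGE_SHIFT 12

    int main(void)
    {
        unsigned long node_boot_start = 0x101000;
        unsigned long align = 1UL << 16;        /* 64KiB */
        unsigned long offset = 0;

        if (align && (node_boot_start & (align - 1UL)) != 0)
            offset = align - (node_boot_start & (align - 1UL));
        offset >>= PAGE_SHIFT;

        /* 15 pages must be skipped before a page index corresponds to
         * a 64KiB aligned physical address
         * (0x101000 + 15 * 4096 = 0x110000) */
        printf("offset = %lu pages\n", offset);
        return 0;
    }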
169     if (goal && (goal >= bdata->node_boot_start) && 
170             ((goal >> PAGE_SHIFT) < bdata->node_low_pfn)) {
171         preferred = goal - bdata->node_boot_start;
172     } else
173         preferred = 0;
174 
175     preferred = ((preferred + align - 1) & ~(align - 1)) 
                >> PAGE_SHIFT;
176     preferred += offset;
177     areasize = (size+PAGE_SIZE-1)/PAGE_SIZE;
178     incr = align >> PAGE_SHIFT ? : 1;

Calculate the starting PFN to start scanning from based on the goal parameter.

169-170 If a goal has been specified, the goal is after the starting address for this node, and the PFN of the goal is less than the last PFN addressable by this node, then...
171 The preferred offset to start from is the goal minus the beginning of the memory addressable by this node
173 Else the preferred offset is 0
175-176 Adjust the preferred address to take the offset into account so that the address will be correctly aligned
177 The number of pages that will be affected by this allocation is stored in areasize
178 incr is the number of pages that have to be skipped to satisfy alignment requirements that are over one page (the sketch after this list works through these calculations)
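A short sketch of the same arithmetic, using PAGE_SIZE 4096, a node starting at physical 0 and the values a typical alloc_bootmem() call would produce (a 1KiB, cache-aligned request with a goal at 16MiB). All values are illustrative.

    #include <stdio.h>

    #define PAGE_SIZE  4096UL
    #define PAGE_SHIFT 12

    int main(void)
    {
        unsigned long node_boot_start = 0, offset = 0;
        unsigned long goal = 16UL << 20;        /* e.g. __pa(MAX_DMA_ADDRESS) */
        unsigned long size = 1024, align = 32;  /* e.g. SMP_CACHE_BYTES */
        unsigned long preferred, areasize, incr;

        preferred = goal - node_boot_start;
        preferred = ((preferred + align - 1) & ~(align - 1)) >> PAGE_SHIFT;
        preferred += offset;

        areasize = (size + PAGE_SIZE - 1) / PAGE_SIZE;
        incr = align >> PAGE_SHIFT;
        if (!incr)
            incr = 1;   /* sub-page alignment: scan one page at a time */

        /* preferred=4096 areasize=1 incr=1: scan from page 4096 for a
         * single free page */
        printf("preferred=%lu areasize=%lu incr=%lu\n",
               preferred, areasize, incr);
        return 0;
    }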
179 
180 restart_scan:
181     for (i = preferred; i < eidx; i += incr) {
182         unsigned long j;
183         if (test_bit(i, bdata->node_bootmem_map))
184             continue;
185         for (j = i + 1; j < i + areasize; ++j) {
186             if (j >= eidx)
187                 goto fail_block;
188             if (test_bit (j, bdata->node_bootmem_map))
189                 goto fail_block;
190         }
191         start = i;
192         goto found;
193     fail_block:;
194     }
195     if (preferred) {
196         preferred = offset;
197         goto restart_scan;
198     }
199     return NULL;

Scan through memory looking for a block large enough to satisfy this request

180 If the allocation could not be satisfied starting from goal, this label is jumped to so that the map will be rescanned
181-194 Starting from preferred, scan linearly searching for a free block large enough to satisfy the request. Walk the address space in incr steps to satisfy alignments greater than one page. If the alignment is less than a page, incr will just be 1 (a simplified version of the scan is sketched after this list)
183-184 Test the bit. If it is already 1, it is not free so move to the next page
185-190 Scan the next areasize number of pages and see if they are also free. It fails if the end of the addressable space is reached (eidx) or one of the pages is already in use
191-192 A free block is found so record the start and jump to the found block
195-198 The allocation from goal failed so start again from the beginning of the node
199 If that also failed, return NULL which will result in a kernel panic
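Stripped of the goal handling and kernel bitops, the scan is a plain first-fit search. This standalone sketch uses a byte array in place of the bitmap and a fixed incr of 1; the map contents are invented.

    #include <stdio.h>

    #define NPAGES 16

    /* 1 = allocated, 0 = free; the first free run of 3 pages is at
     * index 5 */
    static int map[NPAGES] = { 1,1,1,0,1,0,0,0,1,0,0,0,0,1,1,1 };

    static long scan(unsigned long preferred, unsigned long areasize)
    {
        unsigned long i, j;

        for (i = preferred; i < NPAGES; i++) {
            if (map[i])
                continue;       /* page in use, move on */
            for (j = i + 1; j < i + areasize; j++)
                if (j >= NPAGES || map[j])
                    goto fail_block;
            return i;           /* found areasize free pages at i */
    fail_block: ;
        }
        return -1;              /* would lead to a kernel panic */
    }

    int main(void)
    {
        printf("3 pages -> index %ld\n", scan(0, 3));
        return 0;
    }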
200 found:
201     if (start >= eidx)
202         BUG();
203 
209     if (align <= PAGE_SIZE
210         && bdata->last_offset && bdata->last_pos+1 == start) {
211         offset = (bdata->last_offset+align-1) & ~(align-1);
212         if (offset > PAGE_SIZE)
213             BUG();
214         remaining_size = PAGE_SIZE-offset;
215         if (size < remaining_size) {
216             areasize = 0;
217             // last_pos unchanged
218             bdata->last_offset = offset+size;
219             ret = phys_to_virt(bdata->last_pos*PAGE_SIZE + offset +
220                         bdata->node_boot_start);
221         } else {
222             remaining_size = size - remaining_size;
223             areasize = (remaining_size+PAGE_SIZE-1)/PAGE_SIZE;
224             ret = phys_to_virt(bdata->last_pos*PAGE_SIZE +
225                         offset + 
                            bdata->node_boot_start);
226             bdata->last_pos = start+areasize-1;
227             bdata->last_offset = remaining_size;
228         }
229         bdata->last_offset &= ~PAGE_MASK;
230     } else {
231         bdata->last_pos = start + areasize - 1;
232         bdata->last_offset = size & ~PAGE_MASK;
233         ret = phys_to_virt(start * PAGE_SIZE +
                     bdata->node_boot_start);
234     }

Test to see if this allocation may be merged with the previous allocation.

201-202 Check that the start of the allocation is not after the addressable memory. This check was just made so it is redundant
209-230 Try to merge with the previous allocation if the alignment is less than PAGE_SIZE, the previously used page has space in it (last_offset != 0) and the previously used page is adjacent to the page found for this allocation (a simplified sketch of the merge follows this list)
231-234 Else record the page and offset used for this allocation for merging with the next allocation
211 Update the offset so that it is aligned correctly for the requested align
212-213 If the offset now goes over the edge of a page, BUG() is called. This condition would require a very poor choice of alignment to be used. As the only alignment commonly used is a factor of PAGE_SIZE, it is impossible for normal usage
214 remaining_size is the remaining free space in the previously used page
215-221 If there is enough space left in the old page then use the old page entirely and update the bootmem_data struct to reflect it
221-228 Else calculate how many pages in addition to this one will be required and update the bootmem_data
216 The number of pages used by this allocation is now 0
218 Update last_offset to be the end of this allocation
219 Calculate the virtual address to return for the successful allocation
222 remaining_size is how much space will be used in the last page used to satisfy the allocation
223 Calculate how many more pages are needed to satisfy the allocation
224 Record the address the allocation starts from
226 The last page used is the start page plus the number of additional pages required to satisfy this allocation, areasize
227 The end of the allocation has already been calculated
229 If the offset is at the end of the page, make it 0
231 No merging took place so record the last page used to satisfy this allocation
232 Record how much of the last page was used
233 Record the starting virtual address of the allocation
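The merging is the subtlest part of the allocator, so here is a deliberately simplified sketch of the small-allocation case: the new block fits in the partial page left by the previous allocation and last_pos is the page the previous allocation ended in. The spill-over and non-adjacent paths are omitted and all values are illustrative.

    #include <stdio.h>

    #define PAGE_SIZE 4096UL

    static unsigned long last_pos, last_offset;

    /* Returns the byte offset of the allocation within the node,
     * standing in for the phys_to_virt() result in the kernel code.
     * start is the page index the bitmap scan found */
    static unsigned long alloc(unsigned long start, unsigned long size,
                               unsigned long align)
    {
        unsigned long offset;

        if (last_offset && last_pos + 1 == start) {
            /* Adjacent to the previous allocation: pack into the
             * partially used page instead of taking a fresh one */
            offset = (last_offset + align - 1) & ~(align - 1);
            last_offset = offset + size;
            return last_pos * PAGE_SIZE + offset;
        }
        last_pos = start;
        last_offset = size;
        return start * PAGE_SIZE;
    }

    int main(void)
    {
        /* The second request is handed byte 128 of page 3 rather
         * than all of page 4 */
        printf("a = %lu\n", alloc(3, 100, 32));
        printf("b = %lu\n", alloc(4, 200, 32));
        return 0;
    }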
238     for (i = start; i < start+areasize; i++)
239         if (test_and_set_bit(i, bdata->node_bootmem_map))
240             BUG();
241     memset(ret, 0, size);
242     return ret;
243 }

Mark the pages allocated as 1 in the bitmap and zero out the contents of the pages

238-240 Cycle through all pages used for this allocation and set the bit to 1 in the bitmap. If any of them are already 1, then a double allocation took place so call BUG()
241 Zero fill the pages
242 Return the address of the allocation

E.3  Freeing Memory

E.3.1  Function: free_bootmem

Source: mm/bootmem.c


Figure E.1: Call Graph: free_bootmem()

294 void __init free_bootmem_node (pg_data_t *pgdat, 
                           unsigned long physaddr, unsigned long size)
295 {
296       return(free_bootmem_core(pgdat->bdata, physaddr, size));
297 }

316 void __init free_bootmem (unsigned long addr, unsigned long size)
317 {
318       return(free_bootmem_core(contig_page_data.bdata, addr, size));
319 }
296 Call the core function with the corresponding bootmem data for the requested node
318 Call the core function with the bootmem data for contig_page_data

E.3.2  Function: free_bootmem_core

Source: mm/bootmem.c

103 static void __init free_bootmem_core(bootmem_data_t *bdata, 
                               unsigned long addr, 
                               unsigned long size)
104 {
105       unsigned long i;
106       unsigned long start;
111       unsigned long sidx;
112       unsigned long eidx = (addr + size -
                          bdata->node_boot_start)/PAGE_SIZE;
113       unsigned long end = (addr + size)/PAGE_SIZE;
114 
115       if (!size) BUG();
116       if (end > bdata->node_low_pfn)
117             BUG();
118 
119       /*
120        * Round up the beginning of the address.
121        */
122       start = (addr + PAGE_SIZE-1) / PAGE_SIZE;
123       sidx = start - (bdata->node_boot_start/PAGE_SIZE);
124 
125       for (i = sidx; i < eidx; i++) {
126             if (!test_and_clear_bit(i, bdata->node_bootmem_map))
127                   BUG();
128       }
129 }
112 Calculate the end index affected as eidx
113 The end address is the end of the affected area rounded down to the nearest page if it is not already page aligned
115 If a size of 0 is freed, call BUG()
116-117 If the end PFN is after the memory addressable by this node, call BUG()
122 Round the starting address up to the nearest page if it is not already page aligned
123 Calculate the starting index to free
125-127 For all full pages that are freed by this action, clear the bit in the boot bitmap. If it is already 0, it is a double free or is memory that was never used, so call BUG(). Note that only full pages are freed (the rounding is worked through in the sketch after this list)
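The opposite rounding to reserve_bootmem_core() is easy to check in a userspace sketch. PAGE_SIZE and the addresses are illustrative and the node is assumed to start at physical address 0.

    #include <stdio.h>

    #define PAGE_SIZE 4096UL

    int main(void)
    {
        unsigned long node_boot_start = 0;
        /* Free 8KiB starting halfway into page 2 */
        unsigned long addr = 0x2800, size = 0x2000;

        unsigned long eidx = (addr + size - node_boot_start) / PAGE_SIZE;
        unsigned long start = (addr + PAGE_SIZE - 1) / PAGE_SIZE;
        unsigned long sidx = start - node_boot_start / PAGE_SIZE;

        /* Prints "clearing map bits 3 to 3": pages 2 and 4 are only
         * partially covered so they stay reserved */
        printf("clearing map bits %lu to %lu\n", sidx, eidx - 1);
        return 0;
    }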

E.4  Retiring the Boot Memory Allocator

Once the system is started, the boot memory allocator is no longer needed so these functions are responsible for removing unnecessary boot memory allocator structures and passing the remaining pages to the normal physical page allocator.

E.4.1  Function: mem_init

Source: arch/i386/mm/init.c

The call graph for this function is shown in Figure 5.2. The important part of this function for the boot memory allocator is that it calls free_pages_init() (See Section E.4.2). The function is broken up into the following tasks:

507 void __init mem_init(void)
508 {
509     int codesize, reservedpages, datasize, initsize;
510 
511     if (!mem_map)
512         BUG();
513     
514     set_max_mapnr_init();
515 
516     high_memory = (void *) __va(max_low_pfn * PAGE_SIZE);
517 
518     /* clear the zero-page */
519     memset(empty_zero_page, 0, PAGE_SIZE);
514 This function records the PFN that high memory starts at in mem_map (highmem_start_page), the maximum number of pages in the system (max_mapnr and num_physpages) and finally the maximum number of pages that may be mapped by the kernel (num_mappedpages)
516 high_memory is the virtual address where high memory begins
519 Zero out the system wide zero page
520 
521     reservedpages = free_pages_init();
522 
521 Call free_pages_init() (See Section E.4.2) which tells the boot memory allocator to retire itself as well as initialising all pages in high memory for use with the buddy allocator
523     codesize =  (unsigned long) &_etext - (unsigned long) &_text;
524     datasize =  (unsigned long) &_edata - (unsigned long) &_etext;
525     initsize =  (unsigned long) &__init_end - (unsigned long)
                            &__init_begin;
526 
527     printk(KERN_INFO "Memory: %luk/%luk available (%dk kernel code, 
            %dk reserved, %dk data, %dk init, %ldk highmem)\n",
528         (unsigned long) nr_free_pages() << (PAGE_SHIFT-10),
529         max_mapnr << (PAGE_SHIFT-10),
530         codesize >> 10,
531         reservedpages << (PAGE_SHIFT-10),
532         datasize >> 10,
533         initsize >> 10,
534         (unsigned long) (totalhigh_pages << (PAGE_SHIFT-10))
535        );

Print out an informational message

523-525 Calculate the size of the code segment, data segment and memory used by initialisation code and data (all functions marked __init will be in this section)
527-535 Print out a message showing the availability of memory and the amount of memory consumed by the kernel
536 
537 #if CONFIG_X86_PAE
538     if (!cpu_has_pae)
539         panic("cannot execute a PAE-enabled kernel on a PAE-less
CPU!");
540 #endif
541     if (boot_cpu_data.wp_works_ok < 0)
542         test_wp_bit();
543 
538-539 If PAE is enabled but the processor does not support it, panic
541-542 Test for the availability of the WP bit
550 #ifndef CONFIG_SMP
551     zap_low_mappings();
552 #endif
553 
554 }
551 Cycle through each PGD used by the userspace portion of swapper_pg_dir and map the zero page to it

E.4.2  Function: free_pages_init

Source: arch/i386/mm/init.c

This function has two important tasks: calling free_all_bootmem() (See Section E.4.4) to retire the boot memory allocator and freeing all high memory pages to the buddy allocator.

481 static int __init free_pages_init(void)
482 {
483     extern int ppro_with_ram_bug(void);
484     int bad_ppro, reservedpages, pfn;
485 
486     bad_ppro = ppro_with_ram_bug();
487 
488     /* this will put all low memory onto the freelists */
489     totalram_pages += free_all_bootmem();
490 
491     reservedpages = 0;
492     for (pfn = 0; pfn < max_low_pfn; pfn++) {
493         /*
494          * Only count reserved RAM pages
495          */
496         if (page_is_ram(pfn) && PageReserved(mem_map+pfn))
497             reservedpages++;
498     }
499 #ifdef CONFIG_HIGHMEM
500     for (pfn = highend_pfn-1; pfn >= highstart_pfn; pfn--)
501         one_highpage_init((struct page *) (mem_map + pfn), pfn,
bad_ppro);
502     totalram_pages += totalhigh_pages;
503 #endif
504     return reservedpages;
505 }
486 There is a bug in some Pentium Pro processors that prevents certain pages in high memory being used. The function ppro_with_ram_bug() checks for its existence
489 Call free_all_bootmem() to retire the boot memory allocator
491-498 Cycle through all of low memory and count the number of reserved pages that were left over by the boot memory allocator
500-501 For each page in high memory, call one_highpage_init() (See Section E.4.3). This function clears the PG_reserved bit, sets the PG_highmem bit, sets the count to 1, calls __free_page() to give the page to the buddy allocator and increments the totalhigh_pages count. Pages which kill buggy Pentium Pros are skipped

E.4.3  Function: one_highpage_init

Source: arch/i386/mm/init.c

This function initialises the information for one page in high memory and checks to make sure that the page will not trigger a bug with some Pentium Pros. It only exists if CONFIG_HIGHMEM is specified at compile time.

449 #ifdef CONFIG_HIGHMEM
450 void __init one_highpage_init(struct page *page, int pfn, 
                                  int bad_ppro)
451 {
452     if (!page_is_ram(pfn)) {
453         SetPageReserved(page);
454         return;
455     }
456         
457     if (bad_ppro && page_kills_ppro(pfn)) {
458         SetPageReserved(page);
459         return;
460     }
461         
462     ClearPageReserved(page);
463     set_bit(PG_highmem, &page->flags);
464     atomic_set(&page->count, 1);
465     __free_page(page);
466     totalhigh_pages++;
467 }
468 #endif /* CONFIG_HIGHMEM */
452-455 If a page does not exist at the PFN, then mark the struct page as reserved so it will not be used
457-460 If the running CPU is susceptible to the Pentium Pro bug and this page is a page that would cause a crash (page_kills_ppro() performs the check), then mark the page as reserved so it will never be allocated
462 From here on, the page is a high memory page that should be used so first clear the reserved bit so it will be given to the buddy allocator later
463 Set the PG_highmem bit to show it is a high memory page
464 Initialise the usage count of the page to 1 which will be set to 0 by the buddy allocator
465 Free the page with __free_page() (See Section F.4.2) so that the buddy allocator will add the high memory page to its free lists
466 Increment the total number of available high memory pages (totalhigh_pages)

E.4.4  Function: free_all_bootmem

Source: mm/bootmem.c

299 unsigned long __init free_all_bootmem_node (pg_data_t *pgdat)
300 {
301     return(free_all_bootmem_core(pgdat));
302 }

321 unsigned long __init free_all_bootmem (void)
322 {
323     return(free_all_bootmem_core(&contig_page_data));
324 }
299-302 For NUMA, simply call the core function with the specified pgdat
321-324 For UMA, call the core function with the only node, contig_page_data

E.4.5  Function: free_all_bootmem_core

Source: mm/bootmem.c

This is the core function which “retires” the boot memory allocator. It is divided into two major tasks: giving all unallocated low memory pages to the buddy allocator and then freeing the pages used by the bitmap itself.

245 static unsigned long __init free_all_bootmem_core(pg_data_t *pgdat)
246 {
247     struct page *page = pgdat->node_mem_map;
248     bootmem_data_t *bdata = pgdat->bdata;
249     unsigned long i, count, total = 0;
250     unsigned long idx;
251 
252     if (!bdata->node_bootmem_map) BUG();
253 
254     count = 0;
255     idx = bdata->node_low_pfn - 
              (bdata->node_boot_start >> PAGE_SHIFT);
256     for (i = 0; i < idx; i++, page++) {
257         if (!test_bit(i, bdata->node_bootmem_map)) {
258             count++;
259             ClearPageReserved(page);
260             set_page_count(page, 1);
261             __free_page(page);
262         }
263     }
264     total += count;
252 If no map is available, it means that this node has already been freed and something woeful is wrong with the architecture dependent code so call BUG()
254 A running count of the number of pages given to the buddy allocator
255 idx is the last index that is addressable by this node
256-263 Cycle through all pages addressable by this node
257 If the page is marked free then...
258 Increase the running count of pages given to the buddy allocator
259 Clear the PG_reserved flag
260 Set the count to 1 so that the buddy allocator will think this is the last user of the page and place it in its free lists
261 Call the buddy allocator free function so the page will be added to its free lists
264 total will become the total number of pages given over by this function
270     page = virt_to_page(bdata->node_bootmem_map);
271     count = 0;
272     for (i = 0; 
        i < ((bdata->node_low_pfn - (bdata->node_boot_start >> PAGE_SHIFT)
                          )/8 + PAGE_SIZE-1)/PAGE_SIZE; 
        i++,page++) {
273         count++;
274         ClearPageReserved(page);
275         set_page_count(page, 1);
276         __free_page(page);
277     }
278     total += count;
279     bdata->node_bootmem_map = NULL;
280 
281     return total;
282 }

Free the allocator bitmap and return

270 Get the struct page that is at the beginning of the bootmem map
271 Count of pages freed by the bitmap
272-277 For all pages used by the bitmap, free them to the buddy allocator in the same way the previous block of code did (the page count is worked through in the sketch below)
279 Set the bootmem map to NULL to prevent it being freed a second time by accident
281 Return the total number of pages freed by this function, or in other words, return the number of pages that were added to the buddy allocator's free lists
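To see how many pages the bitmap itself hands back, the loop bound on line 272 can be evaluated for an illustrative node of 131072 pages (512MiB with 4KiB pages) starting at physical address 0.

    #include <stdio.h>

    #define PAGE_SIZE  4096UL
    #define PAGE_SHIFT 12

    int main(void)
    {
        unsigned long node_boot_start = 0;
        unsigned long node_low_pfn = 131072;

        unsigned long pfns = node_low_pfn -
                             (node_boot_start >> PAGE_SHIFT);
        unsigned long map_pages = (pfns / 8 + PAGE_SIZE - 1) / PAGE_SIZE;

        /* 131072 bits -> 16384 bytes -> 4 pages returned to the
         * buddy allocator */
        printf("bitmap occupies %lu page(s)\n", map_pages);
        return 0;
    }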

