
Appendix H  Slab Allocator

H.1  Cache Manipulation

H.1.1  Cache Creation

H.1.1.1  Function: kmem_cache_create

Source: mm/slab.c

The call graph for this function is shown in Figure 8.3. This function is responsible for the creation of a new cache and will be dealt with in chunks due to its size. Each chunk roughly corresponds to one of the following tasks: performing basic sanity checks for bad usage, performing debugging checks if CONFIG_SLAB_DEBUG is set, allocating a kmem_cache_t from the cache_cache slab cache, aligning the object size, calculating how many objects will fit on a slab, aligning the slab size to the hardware cache, calculating colour offsets, initialising the remaining fields in the cache descriptor and adding the new cache to the cache chain.

621 kmem_cache_t *
622 kmem_cache_create (const char *name, size_t size, 
623     size_t offset, unsigned long flags, 
        void (*ctor)(void*, kmem_cache_t *, unsigned long),
624     void (*dtor)(void*, kmem_cache_t *, unsigned long))
625 {
626     const char *func_nm = KERN_ERR "kmem_create: ";
627     size_t left_over, align, slab_size;
628     kmem_cache_t *cachep = NULL;
629 
633     if ((!name) ||
634         ((strlen(name) >= CACHE_NAMELEN - 1)) ||
635         in_interrupt() ||
636         (size < BYTES_PER_WORD) ||
637         (size > (1<<MAX_OBJ_ORDER)*PAGE_SIZE) ||
638         (dtor && !ctor) ||
639         (offset < 0 || offset > size))
640             BUG();
641 

Perform basic sanity checks for bad usage

622The parameters of the function are
name The human readable name of the cache
size The size of an object
offset This is used to specify a specific alignment for objects in the cache but it is usually left as 0
flags Static cache flags
ctor A constructor function to call for each object during slab creation
dtor The corresponding destructor function. It is expected the destructor function leaves an object in an initialised state
633-640 These are all serious usage bugs that prevent the cache from even attempting to be created
634If the length of the human readable name is greater than the maximum size for a cache name (CACHE_NAMELEN)
635An interrupt handler cannot create a cache as access to interrupt-safe spinlocks and semaphores is needed
636The object size must be at least a word in size. The slab allocator is not suitable for objects whose size is measured in individual bytes
637The largest possible slab that can be created is 2^MAX_OBJ_ORDER pages, which provides 32 pages
638A destructor cannot be used if no constructor is available
639The offset cannot be before the slab or beyond the boundary of the first page
640Call BUG() to exit
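As an informal illustration of these parameters, the following is a minimal sketch of how a 2.4-era module might create a cache with this interface. struct my_object, my_ctor and my_init are invented names, the snippet assumes a 2.4 kernel build environment and it is not taken from the kernel source.

#include <linux/slab.h>
#include <linux/string.h>
#include <linux/init.h>
#include <linux/errno.h>

struct my_object {                      /* hypothetical object type */
        int id;
        char payload[60];
};

static kmem_cache_t *my_cachep;

/* Constructor: expected to leave the object in its initialised state. */
static void my_ctor(void *obj, kmem_cache_t *cachep, unsigned long flags)
{
        memset(obj, 0, sizeof(struct my_object));
}

static int __init my_init(void)
{
        my_cachep = kmem_cache_create("my_object_cache",
                                      sizeof(struct my_object),
                                      0,                  /* offset */
                                      SLAB_HWCACHE_ALIGN, /* flags */
                                      my_ctor,            /* ctor */
                                      NULL);              /* no dtor */
        return my_cachep ? 0 : -ENOMEM;
}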
642 #if DEBUG
643     if ((flags & SLAB_DEBUG_INITIAL) && !ctor) {
645         printk("%sNo con, but init state check 
                requested - %s\n", func_nm, name);
646         flags &= ~SLAB_DEBUG_INITIAL;
647     }
648 
649     if ((flags & SLAB_POISON) && ctor) {
651         printk("%sPoisoning requested, but con given - %s\n",
                                                  func_nm, name);
652         flags &= ~SLAB_POISON;
653     }
654 #if FORCED_DEBUG
655     if ((size < (PAGE_SIZE>>3)) && 
        !(flags & SLAB_MUST_HWCACHE_ALIGN))
660         flags |= SLAB_RED_ZONE;
661     if (!ctor)
662         flags |= SLAB_POISON;
663 #endif
664 #endif
670     BUG_ON(flags & ~CREATE_MASK);

This block performs debugging checks if CONFIG_SLAB_DEBUG is set

643-646The flag SLAB_DEBUG_INITIAL requests that the constructor check the objects to make sure they are in an initialised state. For this, a constructor must exist. If it does not, the flag is cleared
649-653A slab can be poisoned with a known pattern to make sure an object wasn't used before it was allocated, but a constructor would destroy this pattern and falsely report a bug. If a constructor exists, remove the SLAB_POISON flag if set
655-660Only small objects will be red zoned for debugging. Red zoning large objects would cause severe fragmentation
661-662If there is no constructor, set the poison bit
670The CREATE_MASK is set with all the allowable flags kmem_cache_create() (See Section H.1.1.1) can be called with. This prevents callers from using debugging flags when they are not available and calls BUG() instead
673     cachep = 
           (kmem_cache_t *) kmem_cache_alloc(&cache_cache,
                         SLAB_KERNEL);
674     if (!cachep)
675         goto opps;
676     memset(cachep, 0, sizeof(kmem_cache_t));

Allocate a kmem_cache_t from the cache_cache slab cache.

673Allocate a cache descriptor object from the cache_cache with kmem_cache_alloc() (See Section H.3.2.1)
674-675If out of memory goto opps which handles the oom situation
676Zero fill the object to prevent surprises with uninitialised data
682     if (size & (BYTES_PER_WORD-1)) {
683         size += (BYTES_PER_WORD-1);
684         size &= ~(BYTES_PER_WORD-1);
685         printk("%sForcing size word alignment 
               - %s\n", func_nm, name);
686     }
687
688 #if DEBUG
689     if (flags & SLAB_RED_ZONE) {
694         flags &= ~SLAB_HWCACHE_ALIGN;
695         size += 2*BYTES_PER_WORD;
696     }
697 #endif
698     align = BYTES_PER_WORD;
699     if (flags & SLAB_HWCACHE_ALIGN)
700         align = L1_CACHE_BYTES;
701
703     if (size >= (PAGE_SIZE>>3))
708         flags |= CFLGS_OFF_SLAB;
709 
710     if (flags & SLAB_HWCACHE_ALIGN) {
714         while (size < align/2)
715             align /= 2;
716         size = (size+align-1)&(~(align-1));
717     }

Align the object size to some word-sized boundary.

682If the size is not aligned to the size of a word then...
683-684Increase the object size by the size of a word then mask out the lower bits. This will effectively round the object size up to the next word boundary
685Print out an informational message for debugging purposes
688-697If debugging is enabled then the alignments have to change slightly
694Do not bother trying to align things to the hardware cache if the slab will be red zoned. The red zoning of the object is going to offset it by moving the object one word away from the cache boundary
695The size of the object increases by two BYTES_PER_WORD to store the red zone mark at either end of the object
698Initialise the alignment to be to a word boundary. This will change if the caller has requested a CPU cache alignment
699-700If requested, align the objects to the L1 CPU cache
703If the objects are large, store the slab descriptors off-slab. This will allow better packing of objects into the slab
710If hardware cache alignment is requested, the size of the objects must be adjusted to align themselves to the hardware cache
714-715Try to pack objects into one cache line if they fit while still keeping the alignment. This is important to arches (e.g. Alpha or Pentium 4) with large L1 cache lines. align will be adjusted to be the smallest that will give hardware cache alignment. For machines with large L1 cache lines, two or more small objects may fit into each line. For example, two objects from the size-32 cache will fit on one cache line from a Pentium 4
716Round the cache size up to the hardware cache alignment
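The rounding arithmetic described above can be modelled in ordinary userspace C. The sketch below assumes a 4-byte word and a 32-byte L1 cache line and mirrors lines 682-716 for a cache created with SLAB_HWCACHE_ALIGN and no red zoning; the constants are assumptions, not values taken from any particular architecture.

#include <stdio.h>

#define BYTES_PER_WORD 4UL
#define L1_CACHE_BYTES 32UL

int main(void)
{
        unsigned long size = 10;                 /* requested object size */
        unsigned long align = BYTES_PER_WORD;    /* default alignment */

        /* Round the object size up to a word boundary (lines 682-686). */
        if (size & (BYTES_PER_WORD - 1))
                size = (size + BYTES_PER_WORD - 1) & ~(BYTES_PER_WORD - 1);

        /* SLAB_HWCACHE_ALIGN assumed: start from the L1 line size, halve the
         * alignment while two objects would still fit in one line, then
         * round the object size up to that alignment (lines 698-716). */
        align = L1_CACHE_BYTES;
        while (size < align / 2)
                align /= 2;
        size = (size + align - 1) & ~(align - 1);

        printf("requested 10 bytes -> object size %lu, alignment %lu\n",
               size, align);
        return 0;
}

With these assumed constants, a 10 byte request is first rounded to 12 bytes and then to 16 bytes, so two objects share each 32 byte cache line.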
724     do {
725         unsigned int break_flag = 0;
726 cal_wastage:
727         kmem_cache_estimate(cachep->gfporder, 
                    size, flags,
728                     &left_over, 
                    &cachep->num);
729         if (break_flag)
730             break;
731         if (cachep->gfporder >= MAX_GFP_ORDER)
732             break;
733         if (!cachep->num)
734             goto next;
735         if (flags & CFLGS_OFF_SLAB && 
            cachep->num > offslab_limit) {
737             cachep->gfporder--;
738             break_flag++;
739             goto cal_wastage;
740         }
741 
746         if (cachep->gfporder >= slab_break_gfp_order)
747             break;
748 
749         if ((left_over*8) <= (PAGE_SIZE<<cachep->gfporder))
750             break;  
751 next:
752         cachep->gfporder++;
753     } while (1);
754
755     if (!cachep->num) {
756         printk("kmem_cache_create: couldn't 
                create cache %s.\n", name);
757         kmem_cache_free(&cache_cache, cachep);
758         cachep = NULL;
759         goto opps;
760     }

Calculate how many objects will fit on a slab and adjust the slab size as necessary

727-728kmem_cache_estimate() (see Section H.1.2.1) calculates the number of objects that can fit on a slab at the current gfp order and what the amount of leftover bytes will be
729-730The break_flag is set if the number of objects fitting on the slab exceeds the number that can be kept when offslab slab descriptors are used
731-732The order number of pages used must not exceed MAX_GFP_ORDER (5)
733-734If not even one object fits on the slab, goto next: which will increase the gfporder used for the cache
735If the slab descriptor is kept off-cache but the number of objects exceeds the number that can be tracked with bufctl's off-slab then ...
737Reduce the order number of pages used
738Set the break_flag so the loop will exit
739Calculate the new wastage figures
746-747The slab_break_gfp_order is the order to not exceed unless 0 objects fit on the slab. This check ensures the order is not exceeded
749-750This is a rough check for internal fragmentation. If the wastage as a fraction of the total size of the cache is less than one eighth, it is acceptable
752If the fragmentation is too high, increase the gfp order and recalculate the number of objects that can be stored and the wastage
755If after adjustments, objects still do not fit in the cache, it cannot be created
757-758Free the cache descriptor and set the pointer to NULL
759Goto opps which simply returns the NULL pointer
761     slab_size = L1_CACHE_ALIGN(
              cachep->num*sizeof(kmem_bufctl_t) + 
              sizeof(slab_t));
762 
767     if (flags & CFLGS_OFF_SLAB && left_over >= slab_size) {
768         flags &= ~CFLGS_OFF_SLAB;
769         left_over -= slab_size;
770     }

Align the slab size to the hardware cache

761slab_size is the total size of the slab descriptor, not the size of the slab itself. It is the size of the slab_t struct plus the space needed to store a kmem_bufctl_t for each object (number of objects * size of kmem_bufctl_t)
767-769If there is enough left over space for the slab descriptor and it was specified to place the descriptor off-slab, remove the flag and update the amount of left_over bytes. This will impact the cache colouring but with the large objects associated with off-slab descriptors, this is not a problem
773     offset += (align-1);
774     offset &= ~(align-1);
775     if (!offset)
776         offset = L1_CACHE_BYTES;
777     cachep->colour_off = offset;
778     cachep->colour = left_over/offset;

Calculate colour offsets.

773-774offset is the offset within the page the caller requested. This will make sure the offset requested is at the correct alignment for cache usage
775-776If somehow the offset is 0, then set it to be aligned for the CPU cache
777This is the offset to use to keep objects on different cache lines. Each slab created will be given a different colour offset
778This is the number of different offsets that can be used
781     if (!cachep->gfporder && !(flags & CFLGS_OFF_SLAB))
782         flags |= CFLGS_OPTIMIZE;
783 
784     cachep->flags = flags;
785     cachep->gfpflags = 0;
786     if (flags & SLAB_CACHE_DMA)
787         cachep->gfpflags |= GFP_DMA;
788     spin_lock_init(&cachep->spinlock);
789     cachep->objsize = size;
790     INIT_LIST_HEAD(&cachep->slabs_full);
791     INIT_LIST_HEAD(&cachep->slabs_partial);
792     INIT_LIST_HEAD(&cachep->slabs_free);
793 
794     if (flags & CFLGS_OFF_SLAB)
795         cachep->slabp_cache =
               kmem_find_general_cachep(slab_size,0);
796     cachep->ctor = ctor;
797     cachep->dtor = dtor;
799     strcpy(cachep->name, name);
800 
801 #ifdef CONFIG_SMP
802     if (g_cpucache_up)
803         enable_cpucache(cachep);
804 #endif

Initialise remaining fields in cache descriptor

781-782For caches with slabs of only 1 page, the CFLGS_OPTIMIZE flag is set. In reality it makes no difference as the flag is unused
784Set the cache static flags
785Zero out the gfpflags. This is a defunct operation as the memset() after the cache descriptor was allocated has already done this
786-787If the slab is for DMA use, set the GFP_DMA flag so the buddy allocator will use ZONE_DMA
788Initialise the spinlock for accessing the cache
789Copy in the object size, which now takes hardware cache alignment if necessary
790-792Initialise the slab lists
794-795If the descriptor is kept off-slab, find the sizes cache that slab descriptors will be allocated from and store it in slabp_cache. See Section H.2.1.2
796-797Set the pointers to the constructor and destructor functions
799Copy in the human readable name
802-803If per-cpu caches are enabled, create a set for this cache. See Section 8.5
806     down(&cache_chain_sem);
807     {
808         struct list_head *p;
809 
810         list_for_each(p, &cache_chain) {
811             kmem_cache_t *pc = list_entry(p, 
                    kmem_cache_t, next);
812 
814             if (!strcmp(pc->name, name))
815                 BUG();
816         }
817     }
818 
822     list_add(&cachep->next, &cache_chain);
823     up(&cache_chain_sem);
824 opps:
825     return cachep;
826 }

Add the new cache to the cache chain

806Acquire the semaphore used to synchronise access to the cache chain
810-816Check every cache on the cache chain and make sure there is no other cache with the same name. If there is, it means two caches of the same type are being created which is a serious bug
811Get the cache from the list
814-815Compare the names and if they match, BUG(). It is worth noting that the new cache is not deleted, but this error is the result of sloppy programming during development and not a normal scenario
822Link the cache into the chain.
823Release the cache chain semaphore.
825Return the new cache pointer

H.1.2  Calculating the Number of Objects on a Slab

H.1.2.1  Function: kmem_cache_estimate

Source: mm/slab.c

During cache creation, it is determined how many objects can be stored in a slab and how much wastage there will be. The following function calculates how many objects may be stored, taking into account if the slab and bufctls must be stored on-slab.

388 static void kmem_cache_estimate (unsigned long gfporder, 
             size_t size,
389          int flags, size_t *left_over, unsigned int *num)
390 {
391     int i;
392     size_t wastage = PAGE_SIZE<<gfporder;
393     size_t extra = 0;
394     size_t base = 0;
395 
396     if (!(flags & CFLGS_OFF_SLAB)) {
397         base = sizeof(slab_t);
398         extra = sizeof(kmem_bufctl_t);
399     }
400     i = 0;
401     while (i*size + L1_CACHE_ALIGN(base+i*extra) <= wastage)
402         i++;
403     if (i > 0)
404         i--;
405 
406     if (i > SLAB_LIMIT)
407         i = SLAB_LIMIT;
408 
409     *num = i;
410     wastage -= i*size;
411     wastage -= L1_CACHE_ALIGN(base+i*extra);
412     *left_over = wastage;
413 }
388The parameters of the function are as follows
gfporder 2^gfporder is the number of pages to allocate for each slab
size The size of each object
flags The cache flags
left_over The number of bytes left over in the slab. Returned to caller
num The number of objects that will fit in a slab. Returned to caller
392wastage is decremented through the function. It starts with the maximum possible amount of wastage.
393extra is the number of bytes needed to store kmem_bufctl_t
394base is where usable memory in the slab starts
396If the slab descriptor is kept on cache, the base begins at the end of the slab_t struct and the number of bytes needed to store the bufctl is the size of kmem_bufctl_t
400i becomes the number of objects the slab can hold
401-402This counts up the number of objects that the cache can store. i*size is the size of the objects themselves. L1_CACHE_ALIGN(base+i*extra) is slightly trickier. This is calculating the amount of memory needed to store the kmem_bufctl_t needed for every object in the slab. As it is at the beginning of the slab, it is L1 cache aligned so that the first object in the slab will be aligned to the hardware cache. i*extra will calculate the amount of space needed to hold a kmem_bufctl_t for this object. As wastage starts out as the size of the slab, its use is overloaded here.
403-404Because the previous loop counts until the slab overflows, the number of objects that can be stored is i-1.
406-407SLAB_LIMIT is the absolute largest number of objects a slab can store. It is defined as 0xffffFFFE as this is the largest number kmem_bufctl_t, which is an unsigned integer, can hold
409num is now the number of objects a slab can hold
410Take away the space taken up by all the objects from wastage
411Take away the space taken up by the kmem_bufctl_t
412Wastage has now been calculated as the left over space in the slab
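The calculation can be modelled outside the kernel. The following userspace sketch mirrors the loop above for the on-slab descriptor case, assuming a 4096 byte page, a 32 byte L1 cache line, a 32 byte stand-in for slab_t and a 4 byte kmem_bufctl_t; all of these sizes are assumptions for illustration only.

#include <stdio.h>
#include <stddef.h>

#define PAGE_SIZE       4096UL
#define L1_CACHE_BYTES  32UL
#define L1_CACHE_ALIGN(x) (((x) + L1_CACHE_BYTES - 1) & ~(L1_CACHE_BYTES - 1))

#define SLAB_T_SIZE     32UL    /* stand-in for sizeof(slab_t) */
#define BUFCTL_SIZE     4UL     /* stand-in for sizeof(kmem_bufctl_t) */

/* Model of kmem_cache_estimate() for on-slab descriptors. */
static void estimate(unsigned long gfporder, size_t size,
                     size_t *left_over, unsigned int *num)
{
        size_t wastage = PAGE_SIZE << gfporder;
        size_t base = SLAB_T_SIZE;
        size_t extra = BUFCTL_SIZE;
        unsigned int i = 0;

        /* Count objects until the slab overflows, then step back one. */
        while (i * size + L1_CACHE_ALIGN(base + i * extra) <= wastage)
                i++;
        if (i > 0)
                i--;

        *num = i;
        *left_over = wastage - i * size - L1_CACHE_ALIGN(base + i * extra);
}

int main(void)
{
        size_t left;
        unsigned int num;

        estimate(0, 256, &left, &num);
        printf("256 byte objects in an order-0 slab: %u, %zu bytes left over\n",
               num, left);
        return 0;
}

With these assumed sizes it reports 15 objects and 160 left over bytes for an order-0 slab of 256 byte objects; that left over space is what the colouring code in kmem_cache_create() divides into colour offsets.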

H.1.3  Cache Shrinking

The call graph for kmem_cache_shrink() is shown in Figure 8.5. Two varieties of shrink functions are provided. kmem_cache_shrink() removes all slabs from slabs_free and returns the number of pages freed as a result. __kmem_cache_shrink() frees all slabs from slabs_free and then verifies that slabs_partial and slabs_full are empty. This is important during cache destruction when it doesn't matter how many pages are freed, just that the cache is empty.

H.1.3.1  Function: kmem_cache_shrink

Source: mm/slab.c

This function performs basic debugging checks and then acquires the cache descriptor lock before freeing slabs. At one time, it also used to call drain_cpu_caches() to free up objects on the per-cpu cache. It is curious that this was removed as it is possible slabs could not be freed due to an object being allocated on a per-cpu cache but not in use.

966 int kmem_cache_shrink(kmem_cache_t *cachep)
967 {
968     int ret;
969 
970     if (!cachep || in_interrupt() || 
        !is_chained_kmem_cache(cachep))
971         BUG();
972 
973     spin_lock_irq(&cachep->spinlock);
974     ret = __kmem_cache_shrink_locked(cachep);
975     spin_unlock_irq(&cachep->spinlock);
976 
977     return ret << cachep->gfporder;
978 }
966The parameter is the cache being shrunk
970-971Check that the cache pointer is not NULL, that the caller is not in interrupt context and that the cache is on the cache chain. If any check fails, call BUG()
973Acquire the cache descriptor lock and disable interrupts
974Shrink the cache
975Release the cache lock and enable interrupts
977This returns the number of pages freed but does not take into account the objects freed by draining the CPU.

H.1.3.2  Function: __kmem_cache_shrink

Source: mm/slab.c

This function is identical to kmem_cache_shrink() except that it returns whether the cache is empty or not. This is important during cache destruction when it is not important how much memory was freed, just that it is safe to delete the cache and not leak memory.

945 static int __kmem_cache_shrink(kmem_cache_t *cachep)
946 {
947     int ret;
948 
949     drain_cpu_caches(cachep);
950 
951     spin_lock_irq(&cachep->spinlock);
952     __kmem_cache_shrink_locked(cachep);
953     ret = !list_empty(&cachep->slabs_full) ||
954         !list_empty(&cachep->slabs_partial);
955     spin_unlock_irq(&cachep->spinlock);
956     return ret;
957 }
949Remove all objects from the per-CPU objects cache
951Acquire the cache descriptor lock and disable interrupts
952Free all slabs in the slabs_free list
953-954Check the slabs_partial and slabs_full lists are empty
955Release the cache descriptor lock and re-enable interrupts
956Return if the cache has all its slabs free or not

H.1.3.3  Function: __kmem_cache_shrink_locked

Source: mm/slab.c

This does the dirty work of freeing slabs. It will keep destroying them until the growing flag gets set, indicating the cache is in use, or until there are no more slabs in slabs_free.

917 static int __kmem_cache_shrink_locked(kmem_cache_t *cachep)
918 {
919     slab_t *slabp;
920     int ret = 0;
921 
923     while (!cachep->growing) {
924         struct list_head *p;
925 
926         p = cachep->slabs_free.prev;
927         if (p == &cachep->slabs_free)
928             break;
929 
930         slabp = list_entry(cachep->slabs_free.prev, 
                       slab_t, list);
931 #if DEBUG
932         if (slabp->inuse)
933             BUG();
934 #endif
935         list_del(&slabp->list);
936 
937         spin_unlock_irq(&cachep->spinlock);
938         kmem_slab_destroy(cachep, slabp);
939         ret++;
940         spin_lock_irq(&cachep->spinlock);
941     }
942     return ret;
943 }
923While the cache is not growing, free slabs
926-930Get the last slab on the slabs_free list
932-933If debugging is available, make sure it is not in use. If it is in use, it should not be on the slabs_free list in the first place
935Remove the slab from the list
937Release the cache spinlock and re-enable interrupts. This function is called with interrupts disabled and the lock is dropped here so that interrupts are re-enabled as quickly as possible while the slab is destroyed.
938Delete the slab with kmem_slab_destroy() (See Section H.2.3.1)
939Record the number of slabs freed
940Acquire the cache descriptor lock and disable interrupts

H.1.4  Cache Destroying

When a module is unloaded, it is responsible for destroying any cache it has created, as during module loading it is ensured there are not two caches of the same name. Core kernel code often does not destroy its caches as their existence persists for the life of the system. The steps taken to destroy a cache are to remove it from the cache chain, shrink it to free all its slabs, free any per-CPU caches and finally free the cache descriptor itself, as shown in the function below.

H.1.4.1  Function: kmem_cache_destroy

Source: mm/slab.c

The call graph for this function is shown in Figure 8.7.

 997 int kmem_cache_destroy (kmem_cache_t * cachep)
 998 {
 999     if (!cachep || in_interrupt() || cachep->growing)
 1000        BUG();
 1001 
1002     /* Find the cache in the chain of caches. */
1003     down(&cache_chain_sem);
1004     /* the chain is never empty, cache_cache is never destroyed */
1005     if (clock_searchp == cachep)
1006         clock_searchp = list_entry(cachep->next.next,
1007                         kmem_cache_t, next);
1008     list_del(&cachep->next);
1009     up(&cache_chain_sem);
1010 
1011     if (__kmem_cache_shrink(cachep)) {
1012         printk(KERN_ERR 
                "kmem_cache_destroy: Can't free all objects %p\n",
1013            cachep);
1014         down(&cache_chain_sem);
1015         list_add(&cachep->next,&cache_chain);
1016         up(&cache_chain_sem);
1017         return 1;
1018     }
1019 #ifdef CONFIG_SMP
1020     {
1021         int i;
1022         for (i = 0; i < NR_CPUS; i++)
1023             kfree(cachep->cpudata[i]);
1024     }
1025 #endif
1026     kmem_cache_free(&cache_cache, cachep);
1027 
1028     return 0;
1029 }
999-1000Sanity check. Make sure the cachep is not null, that an interrupt is not trying to do this and that the cache has not been marked as growing, indicating it is in use
1003Acquire the semaphore for accessing the cache chain
1005-1007If clock_searchp, used by kmem_cache_reap() (See Section H.1.5.1), is pointing at this cache, move it to the next entry in the cache chain so no dangling pointer is left behind
1008Delete this cache from the cache chain
1009Release the cache chain semaphore
1011Shrink the cache to free all slabs with __kmem_cache_shrink() (See Section H.1.3.2)
1012-1017The shrink function returns true if there are still slabs in the cache. If there are, the cache cannot be destroyed so it is added back into the cache chain and the error reported
1022-1023If SMP is enabled, the per-cpu data structures are deleted with kfree() (See Section H.4.3.1)
1026Delete the cache descriptor from the cache_cache with kmem_cache_free() (See Section H.3.3.1)
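Continuing the hypothetical my_object_cache sketch from Section H.1.1.1, a module exit path might destroy its cache as follows. The names are invented and the snippet assumes a 2.4 kernel build environment.

#include <linux/slab.h>
#include <linux/init.h>
#include <linux/kernel.h>

/* my_cachep is the hypothetical cache created in the earlier sketch. */
extern kmem_cache_t *my_cachep;

static void __exit my_exit(void)
{
        /* A non-zero return means objects were still in use and the
         * cache was put back on the cache chain instead of being freed. */
        if (kmem_cache_destroy(my_cachep))
                printk(KERN_ERR "my_object_cache: objects still in use\n");
}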

H.1.5  Cache Reaping

H.1.5.1  Function: kmem_cache_reap

Source: mm/slab.c

The call graph for this function is shown in Figure 8.4. Because of the size of this function, it will be broken up into three separate sections. The first is simple function preamble. The second is the selection of a cache to reap and the third is the freeing of the slabs. The basic tasks were described in Section 8.1.7.

1738 int kmem_cache_reap (int gfp_mask)
1739 {
1740     slab_t *slabp;
1741     kmem_cache_t *searchp;
1742     kmem_cache_t *best_cachep;
1743     unsigned int best_pages;
1744     unsigned int best_len;
1745     unsigned int scan;
1746     int ret = 0;
1747 
1748     if (gfp_mask & __GFP_WAIT)
1749         down(&cache_chain_sem);
1750     else
1751         if (down_trylock(&cache_chain_sem))
1752             return 0;
1753 
1754     scan = REAP_SCANLEN;
1755     best_len = 0;
1756     best_pages = 0;
1757     best_cachep = NULL;
1758     searchp = clock_searchp;
1738The only parameter is the GFP flag. The only check made is against the __GFP_WAIT flag. As the only caller, kswapd, can sleep, this parameter is virtually worthless
1748-1749Can the caller sleep? If yes, then acquire the semaphore
1751-1752Else, try and acquire the semaphore and if not available, return
1754REAP_SCANLEN (10) is the number of caches to examine.
1758Set searchp to be the last cache that was examined at the last reap
1759     do {
1760         unsigned int pages;
1761         struct list_head* p;
1762         unsigned int full_free;
1763 
1765         if (searchp->flags & SLAB_NO_REAP)
1766             goto next;
1767         spin_lock_irq(&searchp->spinlock);
1768         if (searchp->growing)
1769             goto next_unlock;
1770         if (searchp->dflags & DFLGS_GROWN) {
1771             searchp->dflags &= ~DFLGS_GROWN;
1772             goto next_unlock;
1773         }
1774 #ifdef CONFIG_SMP
1775         {
1776             cpucache_t *cc = cc_data(searchp);
1777             if (cc && cc->avail) {
1778                 __free_block(searchp, cc_entry(cc),
                          cc->avail);
1779                 cc->avail = 0;
1780             }
1781         }
1782 #endif
1783 
1784         full_free = 0;
1785         p = searchp->slabs_free.next;
1786         while (p != &searchp->slabs_free) {
1787             slabp = list_entry(p, slab_t, list);
1788 #if DEBUG
1789             if (slabp->inuse)
1790                 BUG();
1791 #endif
1792             full_free++;
1793             p = p->next;
1794         }
1795 
1801         pages = full_free * (1<<searchp->gfporder);
1802         if (searchp->ctor)
1803             pages = (pages*4+1)/5;
1804         if (searchp->gfporder)
1805             pages = (pages*4+1)/5;
1806         if (pages > best_pages) {
1807             best_cachep = searchp;
1808             best_len = full_free;
1809             best_pages = pages;
1810             if (pages >= REAP_PERFECT) {
1811                 clock_searchp =
                      list_entry(searchp->next.next,
1812                      kmem_cache_t,next);
1813                 goto perfect;
1814             }
1815         }
1816 next_unlock:
1817         spin_unlock_irq(&searchp->spinlock);
1818 next:
1819         searchp =
               list_entry(searchp->next.next,kmem_cache_t,next);
1820     } while (--scan && searchp != clock_searchp);

This block examines REAP_SCANLEN number of caches to select one to free

1767Acquire an interrupt safe lock to the cache descriptor
1768-1769If the cache is growing, skip it
1770-1773If the cache has grown recently, skip it and clear the flag
1775-1781Free any per CPU objects to the global pool
1786-1794Count the number of slabs in the slabs_free list
1801Calculate the number of pages all the slabs hold
1802-1803If the objects have constructors, reduce the page count by one fifth to make it less likely to be selected for reaping
1804-1805If the slabs consist of more than one page, reduce the page count by one fifth. This is because high order pages are hard to acquire
1806If this is the best candidate found for reaping so far, check if it is perfect for reaping
1807-1809Record the new maximums
1808best_len is recorded so that it is easy to know how many slabs make up half of the slabs in the free list
1810If this cache is perfect for reaping then
1811Update clock_searchp
1813Goto perfect where half the slabs will be freed
1816This label is reached if it was found the cache was growing after acquiring the lock
1817Release the cache descriptor lock
1818Move to the next entry in the cache chain
1820Scan while REAP_SCANLEN has not been reached and we have not cycled around the whole cache chain
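The weighting applied in lines 1801-1805 can be shown with a small userspace sketch. The slab counts and orders below are arbitrary examples; only the arithmetic mirrors the code above.

#include <stdio.h>

/* Weight a candidate cache for reaping: its count of free pages is reduced
 * by one fifth if it has a constructor and again if its slabs are of a
 * higher order, making such caches less likely to be selected. */
static unsigned int reap_weight(unsigned int full_free, unsigned int gfporder,
                                int has_ctor)
{
        unsigned int pages = full_free * (1u << gfporder);

        if (has_ctor)
                pages = (pages * 4 + 1) / 5;
        if (gfporder)
                pages = (pages * 4 + 1) / 5;
        return pages;
}

int main(void)
{
        printf("10 free order-0 slabs, no ctor : weight %u\n",
               reap_weight(10, 0, 0));
        printf("10 free order-1 slabs, ctor    : weight %u\n",
               reap_weight(10, 1, 1));
        return 0;
}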
1822     clock_searchp = searchp;
1823 
1824     if (!best_cachep)
1826         goto out;
1827 
1828     spin_lock_irq(&best_cachep->spinlock);
1829 perfect:
1830     /* free only 50% of the free slabs */
1831     best_len = (best_len + 1)/2;
1832     for (scan = 0; scan < best_len; scan++) {
1833         struct list_head *p;
1834 
1835         if (best_cachep->growing)
1836             break;
1837         p = best_cachep->slabs_free.prev;
1838         if (p == &best_cachep->slabs_free)
1839             break;
1840         slabp = list_entry(p,slab_t,list);
1841 #if DEBUG
1842         if (slabp->inuse)
1843             BUG();
1844 #endif
1845         list_del(&slabp->list);
1846         STATS_INC_REAPED(best_cachep);
1847 
1848         /* Safe to drop the lock. The slab is no longer 
1849          * lined to the cache.
1850          */
1851         spin_unlock_irq(&best_cachep->spinlock);
1852         kmem_slab_destroy(best_cachep, slabp);
1853         spin_lock_irq(&best_cachep->spinlock);
1854     }
1855     spin_unlock_irq(&best_cachep->spinlock);
1856     ret = scan * (1 << best_cachep->gfporder);
1857 out:
1858     up(&cache_chain_sem);
1859     return ret;
1860 }

This block will free half of the slabs from the selected cache

1822Update clock_searchp for the next cache reap
1824-1826If a cache was not found, goto out to release the cache chain semaphore and exit
1828Acquire the cache descriptor spinlock and disable interrupts. The cache descriptor has to be held by an interrupt safe lock as some caches may be used from interrupt context. The slab allocator has no way to differentiate between interrupt safe and unsafe caches
1831Adjust best_len to be the number of slabs to free
1832-1854Free best_len number of slabs
1835-1836If the cache is growing, exit
1837Get a slab from the list
1838-1839If there are no slabs left in the list, exit
1840Get the slab pointer
1842-1843If debugging is enabled, make sure there are no active objects in the slab
1845Remove the slab from the slabs_free list
1846Update statistics if enabled
1851Release the cache descriptor spinlock and enable interrupts
1852Destroy the slab. See Section 8.2.8
1853Re-acquire the cache descriptor spinlock and disable interrupts
1855Release the cache descriptor spinlock and enable interrupts
1856ret is the number of pages that were freed
1858-1859Release the cache chain semaphore and return the number of pages freed

H.2  Slabs

H.2.1  Storing the Slab Descriptor

H.2.1.1  Function: kmem_cache_slabmgmt

Source: mm/slab.c

This function will either allocate space to keep the slab descriptor off cache or reserve enough space at the beginning of the slab for the descriptor and the bufctls.

1032 static inline slab_t * kmem_cache_slabmgmt (
                 kmem_cache_t *cachep,
1033             void *objp, 
                 int colour_off, 
                 int local_flags)
1034 {
1035     slab_t *slabp;
1036     
1037     if (OFF_SLAB(cachep)) {
1039         slabp = kmem_cache_alloc(cachep->slabp_cache,
                          local_flags);
1040         if (!slabp)
1041             return NULL;
1042     } else {
1047         slabp = objp+colour_off;
1048         colour_off += L1_CACHE_ALIGN(cachep->num *
1049                 sizeof(kmem_bufctl_t) + 
                     sizeof(slab_t));
1050     }
1051     slabp->inuse = 0;
1052     slabp->colouroff = colour_off;
1053     slabp->s_mem = objp+colour_off;
1054 
1055     return slabp;
1056 }
1032 The parameters of the function are
cachep The cache the slab is to be allocated to
objp When the function is called, this points to the beginning of the slab
colour_off The colour offset for this slab
local_flags These are the flags for the cache
1037-1042 If the slab descriptor is kept off cache....
1039 Allocate memory from the sizes cache. During cache creation, slabp_cache is set to the appropriate size cache to allocate from.
1040 If the allocation failed, return
1042-1050 Reserve space at the beginning of the slab
1047 The address of the slab will be the beginning of the slab (objp) plus the colour offset
1048 colour_off is calculated to be the offset where the first object will be placed. The address is L1 cache aligned. cachep->num * sizeof(kmem_bufctl_t) is the amount of space needed to hold the bufctls for each object in the slab and sizeof(slab_t) is the size of the slab descriptor. This effectively has reserved the space at the beginning of the slab
1051The number of objects in use on the slab is 0
1052The colouroff is updated for placement of the new object
1053The address of the first object is calculated as the address of the beginning of the slab plus the offset
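The on-slab layout arithmetic can be sketched in userspace. The sizes below for slab_t, kmem_bufctl_t, the colour offset and the object count are assumptions for illustration; the point is that the descriptor and bufctl array occupy an L1-aligned region at the coloured start of the slab, with s_mem immediately after it.

#include <stdio.h>

#define L1_CACHE_BYTES 32UL
#define L1_CACHE_ALIGN(x) (((x) + L1_CACHE_BYTES - 1) & ~(L1_CACHE_BYTES - 1))

int main(void)
{
        unsigned long colour_off  = 32;  /* this slab's colour offset */
        unsigned long num         = 15;  /* objects per slab (assumed) */
        unsigned long slab_t_size = 32;  /* stand-in for sizeof(slab_t) */
        unsigned long bufctl_size = 4;   /* stand-in for sizeof(kmem_bufctl_t) */

        unsigned long slabp = colour_off;                  /* descriptor offset */
        unsigned long mgmt  = L1_CACHE_ALIGN(num * bufctl_size + slab_t_size);
        unsigned long s_mem = colour_off + mgmt;           /* first object */

        printf("descriptor at offset %lu, first object (s_mem) at offset %lu\n",
               slabp, s_mem);
        return 0;
}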

H.2.1.2  Function: kmem_find_general_cachep

Source: mm/slab.c

If the slab descriptor is to be kept off-slab, this function, called during cache creation, will find the appropriate sizes cache to use. The result is stored within the cache descriptor in the field slabp_cache.

1620 kmem_cache_t * kmem_find_general_cachep (size_t size, 
                          int gfpflags)
1621 {
1622     cache_sizes_t *csizep = cache_sizes;
1623 
1628     for ( ; csizep->cs_size; csizep++) {
1629         if (size > csizep->cs_size)
1630             continue;
1631         break;
1632     }
1633     return (gfpflags & GFP_DMA) ? csizep->cs_dmacachep :
                       csizep->cs_cachep;
1634 }
1620 size is the size of the slab descriptor. gfpflags is always 0 as DMA memory is not needed for a slab descriptor
1628-1632 Starting with the smallest size, keep increasing the size until a cache is found with buffers large enough to store the slab descriptor
1633 Return either a normal or DMA sized cache depending on the gfpflags passed in. In reality, only the cs_cachep is ever passed back
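The lookup is a simple walk of an ascending table. The userspace sketch below models it with an illustrative table of sizes; the real cache_sizes table and the returned kmem_cache_t pointers are replaced by plain integers.

#include <stdio.h>

/* Illustrative, zero-terminated table of sizes cache buffer sizes. */
static const unsigned int cache_sizes_model[] =
        { 32, 64, 128, 256, 512, 1024, 2048, 4096, 0 };

/* Return the first size large enough to hold the request, or 0. */
static unsigned int find_general_cachep(unsigned int size)
{
        const unsigned int *csizep = cache_sizes_model;

        for (; *csizep; csizep++)
                if (size <= *csizep)
                        break;
        return *csizep;
}

int main(void)
{
        printf("a 100-byte slab descriptor would use the size-%u cache\n",
               find_general_cachep(100));
        return 0;
}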

H.2.2  Slab Creation

H.2.2.1  Function: kmem_cache_grow

Source: mm/slab.c

The call graph for this function is shown in Figure 8.11. The basic tasks for this function are to perform basic sanity checks to guard against bad usage, to calculate the colour offset for objects in the new slab, to allocate memory for the slab and acquire a slab descriptor, to link the pages used for the slab to the slab and cache descriptors, to initialise the objects in the slab and to add the slab to the cache.

1105 static int kmem_cache_grow (kmem_cache_t * cachep, int flags)
1106 {
1107     slab_t  *slabp;
1108     struct page     *page;
1109     void        *objp;
1110     size_t       offset;
1111     unsigned int     i, local_flags;
1112     unsigned long    ctor_flags;
1113     unsigned long    save_flags;

Basic declarations. The parameters of the function are

cachep The cache to allocate a new slab to
flags The flags for a slab creation
1118     if (flags & ~(SLAB_DMA|SLAB_LEVEL_MASK|SLAB_NO_GROW))
1119         BUG();
1120     if (flags & SLAB_NO_GROW)
1121         return 0;
1122 
1129     if (in_interrupt() && 
             (flags & SLAB_LEVEL_MASK) != SLAB_ATOMIC)
1130         BUG();
1131 
1132     ctor_flags = SLAB_CTOR_CONSTRUCTOR;
1133     local_flags = (flags & SLAB_LEVEL_MASK);
1134     if (local_flags == SLAB_ATOMIC)
1139         ctor_flags |= SLAB_CTOR_ATOMIC;

Perform basic sanity checks to guard against bad usage. The checks are made here rather than kmem_cache_alloc() to protect the speed-critical path. There is no point checking the flags every time an object needs to be allocated.

1118-1119Make sure only allowable flags are used for allocation
1120-1121Do not grow the cache if this is set. In reality, it is never set
1129-1130If this is called within interrupt context, make sure the ATOMIC flag is set so we don't sleep when kmem_getpages() (See Section H.7.0.3) is called
1132This flag tells the constructor it is to init the object
1133The local_flags are just those relevant to the page allocator
1134-1139If the SLAB_ATOMIC flag is set, the constructor needs to know about it in case it wants to make new allocations
1142     spin_lock_irqsave(&cachep->spinlock, save_flags);
1143 
1145     offset = cachep->colour_next;
1146     cachep->colour_next++;
1147     if (cachep->colour_next >= cachep->colour)
1148         cachep->colour_next = 0;
1149     offset *= cachep->colour_off;
1150     cachep->dflags |= DFLGS_GROWN;
1151 
1152     cachep->growing++;
1153     spin_unlock_irqrestore(&cachep->spinlock, save_flags);

Calculate colour offset for objects in this slab

1142Acquire an interrupt safe lock for accessing the cache descriptor
1145Get the offset for objects in this slab
1146Move to the next colour offset
1147-1148If colour has been reached, there are no more offsets available, so reset colour_next to 0
1149colour_off is the size of each offset, so offset * colour_off will give how many bytes to offset the objects by
1150Mark the cache as growing so that kmem_cache_reap() (See Section H.1.5.1) will ignore this cache
1152Increase the count for callers growing this cache
1153Free the spinlock and re-enable interrupts
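The cycling of colour_next described in this block can be illustrated with a small userspace sketch. The 32 byte colour_off and 160 bytes of left over space are assumptions carried over from the earlier estimate example, giving five possible colours.

#include <stdio.h>

int main(void)
{
        unsigned int colour_off  = 32;              /* assumed L1 line size */
        unsigned int colour      = 160 / colour_off; /* 160 left-over bytes -> 5 colours */
        unsigned int colour_next = 0;

        for (int slab = 0; slab < 8; slab++) {
                /* offset = colour_next; colour_next++; wrap; offset *= colour_off */
                unsigned int offset = colour_next * colour_off;

                colour_next++;
                if (colour_next >= colour)
                        colour_next = 0;
                printf("slab %d starts its objects %u bytes into the slab\n",
                       slab, offset);
        }
        return 0;
}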
1165     if (!(objp = kmem_getpages(cachep, flags)))
1166         goto failed;
1167 
1169     if (!(slabp = kmem_cache_slabmgmt(cachep, 
                           objp, offset,
                           local_flags)))
1160         goto opps1;

Allocate memory for slab and acquire a slab descriptor

1165-1166Allocate pages from the page allocator for the slab with kmem_getpages() (See Section H.7.0.3)
1169Acquire a slab descriptor with kmem_cache_slabmgmt() (See Section H.2.1.1)
1173     i = 1 << cachep->gfporder;
1174     page = virt_to_page(objp);
1175     do {
1176         SET_PAGE_CACHE(page, cachep);
1177         SET_PAGE_SLAB(page, slabp);
1178         PageSetSlab(page);
1179         page++;
1180     } while (--i);

Link the pages for the slab used to the slab and cache descriptors

1173i is the number of pages used for the slab. Each page has to be linked to the slab and cache descriptors.
1174objp is a pointer to the beginning of the slab. The macro virt_to_page() will give the struct page for that address
1175-1180Link each page's list field to the slab and cache descriptors
1176SET_PAGE_CACHE() links the page to the cache descriptor using the page->list.next field
1177SET_PAGE_SLAB() links the page to the slab descriptor using the page->list.prev field
1178Set the PG_slab page flag. The full set of PG_ flags is listed in Table 2.1
1179Move to the next page for this slab to be linked
1182     kmem_cache_init_objs(cachep, slabp, ctor_flags);
1182Initialise all objects (See Section H.3.1.1)
1184     spin_lock_irqsave(&cachep->spinlock, save_flags);
1185     cachep->growing--;
1186 
1188     list_add_tail(&slabp->list, &cachep->slabs_free);
1189     STATS_INC_GROWN(cachep);
1190     cachep->failures = 0;
1191 
1192     spin_unlock_irqrestore(&cachep->spinlock, save_flags);
1193     return 1;

Add the slab to the cache

1184Acquire the cache descriptor spinlock in an interrupt safe fashion
1185Decrease the growing count
1188Add the slab to the end of the slabs_free list
1189If STATS is set, increase the cachep->grown field with STATS_INC_GROWN()
1190Set failures to 0. This field is never used elsewhere
1192Unlock the spinlock in an interrupt safe fashion
1193Return success
1194 opps1:
1195     kmem_freepages(cachep, objp);
1196 failed:
1197     spin_lock_irqsave(&cachep->spinlock, save_flags);
1198     cachep->growing--;
1199     spin_unlock_irqrestore(&cachep->spinlock, save_flags);
1300     return 0;
1301 }

Error handling

1194-1195opps1 is reached if the pages for the slab were allocated but the slab descriptor could not be acquired. The pages must be freed
1197Acquire the spinlock for accessing the cache descriptor
1198Reduce the growing count
1199Release the spinlock
1300Return failure

H.2.3  Slab Destroying

H.2.3.1  Function: kmem_slab_destroy

Source: mm/slab.c

The call graph for this function is shown at Figure 8.13. For readability, the debugging sections have been omitted from this function, but they are almost identical to the debugging sections during object allocation. See Section H.3.1.1 for how the markers and poison pattern are checked.

555 static void kmem_slab_destroy (kmem_cache_t *cachep, slab_t *slabp)
556 {
557     if (cachep->dtor
561     ) {
562         int i;
563         for (i = 0; i < cachep->num; i++) {
564             void* objp = slabp->s_mem+cachep->objsize*i;

565-574 DEBUG: Check red zone markers

575             if (cachep->dtor)
576                 (cachep->dtor)(objp, cachep, 0);

577-584 DEBUG: Check poison pattern

585         }
586     }
587 
588     kmem_freepages(cachep, slabp->s_mem-slabp->colouroff);
589     if (OFF_SLAB(cachep))
590         kmem_cache_free(cachep->slabp_cache, slabp);
591 }
557-586If a destructor is available, call it for each object in the slab
563-585Cycle through each object in the slab
564Calculate the address of the object to destroy
575-576Call the destructor
588Free the pages being used for the slab
589If the slab descriptor is kept off-slab, then free the memory being used for it

H.3  Objects

This section will cover how objects are managed. At this point, most of the real hard work has been completed by either the cache or slab managers.

H.3.1  Initialising Objects in a Slab

H.3.1.1  Function: kmem_cache_init_objs

Source: mm/slab.c

The vast majority of this function is concerned with debugging, so we will start with the function without the debugging and explain that in detail before handling the debugging part. The two sections that are debugging are marked in the code excerpt below as Part 1 and Part 2.

1058 static inline void kmem_cache_init_objs (kmem_cache_t * cachep,
1059             slab_t * slabp, unsigned long ctor_flags)
1060 {
1061     int i;
1062 
1063     for (i = 0; i < cachep->num; i++) {
1064         void* objp = slabp->s_mem+cachep->objsize*i;

1065-1072        /* Debugging Part 1 */

1079         if (cachep->ctor)
1080             cachep->ctor(objp, cachep, ctor_flags);

1081-1094        /* Debugging Part 2 */

1095         slab_bufctl(slabp)[i] = i+1;
1096     }
1097     slab_bufctl(slabp)[i-1] = BUFCTL_END;
1098     slabp->free = 0;
1099 }
1058The parameters of the function are
cachep The cache the objects are being initialised for
slabp The slab the objects are in
ctor_flags Flags the constructor needs to know whether this is an atomic allocation or not
1063Initialise cachep->num number of objects
1064The base address for objects in the slab is s_mem. The address of the object to initialise is then i * (size of a single object) from s_mem
1079-1080If a constructor is available, call it
1095The macro slab_bufctl() casts slabp to a slab_t slab descriptor and adds one to it. This brings the pointer to the end of the slab descriptor and then casts it back to a kmem_bufctl_t effectively giving the beginning of the bufctl array.
1098The index of the first free object is 0 in the bufctl array
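The index chain set up here is the same one that kmem_cache_alloc_one_tail() (See Section H.3.2.6) pops from and kmem_cache_free_one() (See Section H.3.3.4) pushes back onto. The userspace sketch below models that behaviour for a slab of four objects; NUM_OBJS and the use of a plain unsigned int for the bufctl type are assumptions for illustration.

#include <stdio.h>

#define NUM_OBJS   4
#define BUFCTL_END 0xffffFFFEu

static unsigned int bufctl[NUM_OBJS];   /* bufctl[i] holds the next free index */
static unsigned int free_idx;           /* models slabp->free */

static void init_objs(void)             /* models kmem_cache_init_objs() */
{
        for (unsigned int i = 0; i < NUM_OBJS; i++)
                bufctl[i] = i + 1;
        bufctl[NUM_OBJS - 1] = BUFCTL_END;
        free_idx = 0;
}

static unsigned int alloc_obj(void)     /* models kmem_cache_alloc_one_tail() */
{
        unsigned int obj = free_idx;

        free_idx = bufctl[obj];
        return obj;
}

static void free_obj(unsigned int obj)  /* models kmem_cache_free_one() */
{
        bufctl[obj] = free_idx;
        free_idx = obj;
}

int main(void)
{
        init_objs();
        unsigned int a = alloc_obj(), b = alloc_obj();
        printf("allocated objects %u and %u\n", a, b);
        free_obj(a);
        printf("next allocation reuses object %u\n", alloc_obj());
        return 0;
}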

That covers the core of initialising objects. Next the first debugging part will be covered

1065 #if DEBUG
1066         if (cachep->flags & SLAB_RED_ZONE) {
1067             *((unsigned long*)(objp)) = RED_MAGIC1;
1068             *((unsigned long*)(objp + cachep->objsize -
1069                     BYTES_PER_WORD)) = RED_MAGIC1;
1070             objp += BYTES_PER_WORD;
1071         }
1072 #endif
1066If the cache is to be red zoned then place a marker at either end of the object
1067Place the marker at the beginning of the object
1068Place the marker at the end of the object. Remember that the size of the object takes into account the size of the red markers when red zoning is enabled
1070Increase the objp pointer by the size of the marker for the benefit of the constructor which is called after this debugging block
1081 #if DEBUG
1082         if (cachep->flags & SLAB_RED_ZONE)
1083             objp -= BYTES_PER_WORD;
1084         if (cachep->flags & SLAB_POISON)
1086             kmem_poison_obj(cachep, objp);
1087         if (cachep->flags & SLAB_RED_ZONE) {
1088             if (*((unsigned long*)(objp)) != RED_MAGIC1)
1089                 BUG();
1090             if (*((unsigned long*)(objp + cachep->objsize -
1091                     BYTES_PER_WORD)) != RED_MAGIC1)
1092                 BUG();
1093         }
1094 #endif

This is the debugging block that takes place after the constructor, if it exists, has been called.

1082-1083The objp pointer was increased by the size of the red marker in the previous debugging block so move it back again
1084-1086If there was no constructor, poison the object with a known pattern that can be examined later to trap uninitialised writes
1088Check to make sure the red marker at the beginning of the object was preserved to trap writes before the object
1090-1091Check to make sure writes didn't take place past the end of the object

H.3.2  Object Allocation

H.3.2.1  Function: kmem_cache_alloc

Source: mm/slab.c

The call graph for this function is shown in Figure 8.14. This trivial function simply calls __kmem_cache_alloc().

1529 void * kmem_cache_alloc (kmem_cache_t *cachep, int flags)
1531 {
1532     return __kmem_cache_alloc(cachep, flags);
1533 }

H.3.2.2  Function: __kmem_cache_alloc (UP Case)

Source: mm/slab.c

This covers the parts of the function specific to the UP case. The SMP case will be dealt with in the next section.

1338 static inline void * __kmem_cache_alloc (kmem_cache_t *cachep, 
                                              int flags)
1339 {
1340     unsigned long save_flags;
1341     void* objp;
1342 
1343     kmem_cache_alloc_head(cachep, flags);
1344 try_again:
1345     local_irq_save(save_flags);

1367     objp = kmem_cache_alloc_one(cachep);

1369     local_irq_restore(save_flags);
1370     return objp;
1371 alloc_new_slab:

1376     local_irq_restore(save_flags);
1377     if (kmem_cache_grow(cachep, flags))
1381         goto try_again;
1382     return NULL;
1383 }
1338The parameters are the cache to allocate from and allocation specific flags
1343This function makes sure the appropriate combination of DMA flags are in use
1345Disable interrupts and save the flags. This function is used by interrupts so this is the only way to provide synchronisation in the UP case
1367kmem_cache_alloc_one() (see Section H.3.2.5) allocates an object from one of the lists and returns it. If no objects are free, this macro (note it isn't a function) will goto alloc_new_slab at the end of this function
1369-1370Restore interrupts and return
1376At this label, no objects were free in slabs_partial and slabs_free is empty so a new slab is needed
1377Allocate a new slab (see Section 8.2.2)
1381A new slab is available so try again
1382No slabs could be allocated so return failure

H.3.2.3  Function: __kmem_cache_alloc (SMP Case)

Source: mm/slab.c

This is what the function looks like in the SMP case

1338 static inline void * __kmem_cache_alloc (kmem_cache_t *cachep, 
                                              int flags)
1339 {
1340     unsigned long save_flags;
1341     void* objp;
1342 
1343     kmem_cache_alloc_head(cachep, flags);
1344 try_again:
1345     local_irq_save(save_flags);
1347     {
1348         cpucache_t *cc = cc_data(cachep);
1349 
1350         if (cc) {
1351             if (cc->avail) {
1352                 STATS_INC_ALLOCHIT(cachep);
1353                 objp = cc_entry(cc)[--cc->avail];
1354             } else {
1355                 STATS_INC_ALLOCMISS(cachep);
1356                 objp =
                  kmem_cache_alloc_batch(cachep,cc,flags);
1357                 if (!objp)
1358                   goto alloc_new_slab_nolock;
1359             }
1360         } else {
1361             spin_lock(&cachep->spinlock);
1362             objp = kmem_cache_alloc_one(cachep);
1363             spin_unlock(&cachep->spinlock);
1364         }
1365     }
1366     local_irq_restore(save_flags);
1370     return objp;
1371 alloc_new_slab:
1373     spin_unlock(&cachep->spinlock);
1374 alloc_new_slab_nolock:
1375     local_irq_restore(save_flags);
1377     if (kmem_cache_grow(cachep, flags))
1381         goto try_again;
1382     return NULL;
1383 }
1338-1347Same as UP case
1349Obtain the per CPU data for this cpu
1350-1360If a per CPU cache is available then ....
1351If there is an object available then ....
1352Update statistics for this cache if enabled
1353Get an object and update the avail figure
1354Else an object is not available so ....
1355Update statistics for this cache if enabled
1356Allocate batchcount number of objects, place all but one of them in the per CPU cache and return the last one to objp
1357-1358The allocation failed, so goto alloc_new_slab_nolock to grow the cache and allocate a new slab
1360-1364If a per CPU cache is not available, take out the cache spinlock and allocate one object in the same way the UP case does. This is the case during the initialisation for the cache_cache for example
1363Object was successfully assigned, release cache spinlock
1366-1370Re-enable interrupts and return the allocated object
1371-1373If kmem_cache_alloc_one() failed to allocate an object, it will goto here with the spinlock still held so it must be released
1375-1383Same as the UP case

H.3.2.4  Function: kmem_cache_alloc_head

Source: mm/slab.c

This simple function ensures the right combination of slab and GFP flags are used for allocation from a slab. If a cache is for DMA use, this function will make sure the caller does not accidentally request normal memory and vice versa

1231 static inline void kmem_cache_alloc_head(kmem_cache_t *cachep, 
                                              int flags)
1232 {
1233     if (flags & SLAB_DMA) {
1234         if (!(cachep->gfpflags & GFP_DMA))
1235             BUG();
1236     } else {
1237         if (cachep->gfpflags & GFP_DMA)
1238             BUG();
1239     }
1240 }
1231The parameters are the cache we are allocating from and the flags requested for the allocation
1233If the caller has requested memory for DMA use and ....
1234The cache is not using DMA memory then BUG()
1237Else if the caller has not requested DMA memory and this cache is for DMA use, BUG()

H.3.2.5  Function: kmem_cache_alloc_one

Source: mm/slab.c

This is a preprocessor macro. It may seem strange to not make this an inline function but it is a preprocessor macro for a goto optimisation in __kmem_cache_alloc() (see Section H.3.2.2)

1283 #define kmem_cache_alloc_one(cachep)              \
1284 ({                                                \ 
1285     struct list_head * slabs_partial, * entry;    \
1286     slab_t *slabp;                                \
1287                                                   \
1288     slabs_partial = &(cachep)->slabs_partial;     \
1289     entry = slabs_partial->next;                  \
1290     if (unlikely(entry == slabs_partial)) {       \
1291         struct list_head * slabs_free;            \
1292         slabs_free = &(cachep)->slabs_free;       \
1293         entry = slabs_free->next;                 \
1294         if (unlikely(entry == slabs_free))        \
1295             goto alloc_new_slab;                  \
1296         list_del(entry);                          \
1297         list_add(entry, slabs_partial);           \
1298     }                                             \
1299                                                   \
1300     slabp = list_entry(entry, slab_t, list);      \
1301     kmem_cache_alloc_one_tail(cachep, slabp);     \
1302 })
1288-1289Get the first slab from the slabs_partial list
1290-1298If a slab is not available from this list, execute this block
1291-1293Get the first slab from the slabs_free list
1294-1295If there are no slabs on slabs_free, then goto alloc_new_slab. This goto label is in __kmem_cache_alloc() and it will grow the cache by one slab
1296-1297Else remove the slab from the free list and place it on the slabs_partial list because an object is about to be removed from it
1300Obtain the slab from the list
1301Allocate one object from the slab

H.3.2.6  Function: kmem_cache_alloc_one_tail

Source: mm/slab.c

This function is responsible for the allocation of one object from a slab. Much of it is debugging code.

1242 static inline void * kmem_cache_alloc_one_tail (
                             kmem_cache_t *cachep,
1243                         slab_t *slabp)
1244 {
1245     void *objp;
1246 
1247     STATS_INC_ALLOCED(cachep);
1248     STATS_INC_ACTIVE(cachep);
1249     STATS_SET_HIGH(cachep);
1250 
1252     slabp->inuse++;
1253     objp = slabp->s_mem + slabp->free*cachep->objsize;
1254     slabp->free=slab_bufctl(slabp)[slabp->free];
1255 
1256     if (unlikely(slabp->free == BUFCTL_END)) {
1257         list_del(&slabp->list);
1258         list_add(&slabp->list, &cachep->slabs_full);
1259     }
1260 #if DEBUG
1261     if (cachep->flags & SLAB_POISON)
1262         if (kmem_check_poison_obj(cachep, objp))
1263             BUG();
1264     if (cachep->flags & SLAB_RED_ZONE) {
1266         if (xchg((unsigned long *)objp, RED_MAGIC2) !=
1267                           RED_MAGIC1)
1268             BUG();
1269         if (xchg((unsigned long *)(objp+cachep->objsize -
1270             BYTES_PER_WORD), RED_MAGIC2) != RED_MAGIC1)
1271             BUG();
1272         objp += BYTES_PER_WORD;
1273     }
1274 #endif
1275     return objp;
1276 }
1242The parameters are the cache and slab being allocated from
1247-1249If stats are enabled, this will set three statistics. ALLOCED is the total number of objects that have been allocated. ACTIVE is the number of active objects in the cache. HIGH is the maximum number of objects that were active at a single time
1252inuse is the number of objects active on this slab
1253Get a pointer to a free object. s_mem is a pointer to the first object on the slab. free is an index of a free object in the slab. index * object size gives an offset within the slab
1254This updates the free pointer to be an index of the next free object
1256-1259If the slab is full, remove it from the slabs_partial list and place it on the slabs_full.
1260-1274Debugging code
1275Without debugging, the object is returned to the caller
1261-1263If the object was poisoned with a known pattern, check it to guard against uninitialised access
1266-1267If red zoning was enabled, check the marker at the beginning of the object and confirm it is safe. Change the red marker to check for writes before the object later
1269-1271Check the marker at the end of the object and change it to check for writes after the object later
1272Update the object pointer to point to after the red marker
1275Return the object

H.3.2.7  Function: kmem_cache_alloc_batch

Source: mm/slab.c

This function allocates a batch of objects for a per-CPU cache of objects. It is only used in the SMP case. In many ways it is very similar to kmem_cache_alloc_one() (See Section H.3.2.5).

1305 void* kmem_cache_alloc_batch(kmem_cache_t* cachep, 
                  cpucache_t* cc, int flags)
1306 {
1307     int batchcount = cachep->batchcount;
1308 
1309     spin_lock(&cachep->spinlock);
1310     while (batchcount--) {
1311         struct list_head * slabs_partial, * entry;
1312         slab_t *slabp;
1313         /* Get slab alloc is to come from. */
1314         slabs_partial = &(cachep)->slabs_partial;
1315         entry = slabs_partial->next;
1316         if (unlikely(entry == slabs_partial)) {
1317             struct list_head * slabs_free;
1318             slabs_free = &(cachep)->slabs_free;
1319             entry = slabs_free->next;
1320             if (unlikely(entry == slabs_free))
1321                 break;
1322             list_del(entry);
1323             list_add(entry, slabs_partial);
1324         }
1325 
1326         slabp = list_entry(entry, slab_t, list);
1327         cc_entry(cc)[cc->avail++] =
1328                kmem_cache_alloc_one_tail(cachep, slabp);
1329     }
1330     spin_unlock(&cachep->spinlock);
1331 
1332     if (cc->avail)
1333         return cc_entry(cc)[--cc->avail];
1334     return NULL;
1335 }
1305The parameters are the cache to allocate from, the per CPU cache to fill and allocation flags
1307batchcount is the number of objects to allocate
1309Obtain the spinlock for access to the cache descriptor
1310-1329Loop batchcount times
1311-1324This is essentially the same as kmem_cache_alloc_one() (See Section H.3.2.5). It selects a slab from either slabs_partial or slabs_free to allocate from. If none are available, break out of the loop
1326-1327Call kmem_cache_alloc_one_tail() (See Section H.3.2.6) and place the returned object in the per-CPU cache
1330Release the cache descriptor lock
1332-1333Take one of the objects allocated in this batch and return it
1334If no object was allocated, return. __kmem_cache_alloc() (See Section H.3.2.2) will grow the cache by one slab and try again

H.3.3  Object Freeing

H.3.3.1  Function: kmem_cache_free

Source: mm/slab.c

The call graph for this function is shown in Figure 8.15.

1576 void kmem_cache_free (kmem_cache_t *cachep, void *objp)
1577 {
1578     unsigned long flags;
1579 #if DEBUG
1580     CHECK_PAGE(virt_to_page(objp));
1581     if (cachep != GET_PAGE_CACHE(virt_to_page(objp)))
1582         BUG();
1583 #endif
1584 
1585     local_irq_save(flags);
1586     __kmem_cache_free(cachep, objp);
1587     local_irq_restore(flags);
1588 }
1576The parameters are the cache the object is being freed from and the object itself
1579-1583If debugging is enabled, the page will first be checked with CHECK_PAGE() to make sure it is a slab page. Secondly the page list will be examined to make sure it belongs to this cache (See Figure 8.8)
1585Interrupts are disabled to protect the path
1586__kmem_cache_free() (See Section H.3.3.2) will free the object to the per-CPU cache for the SMP case and to the global pool in the normal case
1587Re-enable interrupts

H.3.3.2  Function: __kmem_cache_free (UP Case)

Source: mm/slab.c

This covers what the function looks like in the UP case. Clearly, it simply releases the object to the slab.

1493 static inline void __kmem_cache_free (kmem_cache_t *cachep, 
                                           void* objp)
1494 {
1517     kmem_cache_free_one(cachep, objp);
1519 }

H.3.3.3  Function: __kmem_cache_free (SMP Case)

Source: mm/slab.c

This case is slightly more interesting. In this case, the object is released to the per-cpu cache if it is available.

1493 static inline void __kmem_cache_free (kmem_cache_t *cachep, 
                                          void* objp)
1494 {
1496     cpucache_t *cc = cc_data(cachep);
1497 
1498     CHECK_PAGE(virt_to_page(objp));
1499     if (cc) {
1500         int batchcount;
1501         if (cc->avail < cc->limit) {
1502             STATS_INC_FREEHIT(cachep);
1503             cc_entry(cc)[cc->avail++] = objp;
1504             return;
1505         }
1506         STATS_INC_FREEMISS(cachep);
1507         batchcount = cachep->batchcount;
1508         cc->avail -= batchcount;
1509         free_block(cachep,
1510             &cc_entry(cc)[cc->avail],batchcount);
1511         cc_entry(cc)[cc->avail++] = objp;
1512         return;
1513     } else {
1514         free_block(cachep, &objp, 1);
1515     }
1519 }
1496Get the data for this per CPU cache (See Section 8.5.1)
1498Make sure the page is a slab page
1499-1513If a per-CPU cache is available, try to use it. This is not always available. During cache destruction for instance, the per CPU caches are already gone
1501-1505If the number of objects available in the per-CPU cache is below the limit, place the object in the per-CPU cache and return
1506Update statistics if enabled
1507The pool has overflowed so batchcount number of objects are going to be freed to the global pool
1508Update the number of available (avail) objects
1509-1510Free a block of objects to the global cache
1511Place the object being freed in the per-CPU pool
1513If the per-CPU cache is not available, then free this object to the global pool

H.3.3.4  Function: kmem_cache_free_one

Source: mm/slab.c

1414 static inline void kmem_cache_free_one(kmem_cache_t *cachep, 
                                            void *objp)
1415 {
1416     slab_t* slabp;
1417 
1418     CHECK_PAGE(virt_to_page(objp));
1425     slabp = GET_PAGE_SLAB(virt_to_page(objp));
1426 
1427 #if DEBUG
1428     if (cachep->flags & SLAB_DEBUG_INITIAL)
1433         cachep->ctor(objp, cachep,
            SLAB_CTOR_CONSTRUCTOR|SLAB_CTOR_VERIFY);
1434 
1435     if (cachep->flags & SLAB_RED_ZONE) {
1436         objp -= BYTES_PER_WORD;
1437         if (xchg((unsigned long *)objp, RED_MAGIC1) !=
                             RED_MAGIC2)
1438             BUG();
1440         if (xchg((unsigned long *)(objp+cachep->objsize -
1441                 BYTES_PER_WORD), RED_MAGIC1) !=
                              RED_MAGIC2)
1443             BUG();
1444     }
1445     if (cachep->flags & SLAB_POISON)
1446         kmem_poison_obj(cachep, objp);
1447     if (kmem_extra_free_checks(cachep, slabp, objp))
1448         return;
1449 #endif
1450     {
1451         unsigned int objnr = (objp-slabp->s_mem)/cachep->objsize;
1452 
1453         slab_bufctl(slabp)[objnr] = slabp->free;
1454         slabp->free = objnr;
1455     }
1456     STATS_DEC_ACTIVE(cachep);
1457     
1459     {
1460         int inuse = slabp->inuse;
1461         if (unlikely(!--slabp->inuse)) {
1462             /* Was partial or full, now empty. */
1463             list_del(&slabp->list);
1464             list_add(&slabp->list, &cachep->slabs_free);
1465         } else if (unlikely(inuse == cachep->num)) {
1466             /* Was full. */
1467             list_del(&slabp->list);
1468             list_add(&slabp->list, &cachep->slabs_partial);
1469         }
1470     }
1471 }
1418Make sure the page is a slab page
1425Get the slab descriptor for the page
1427-1449Debugging material. Discussed at end of section
1451Calculate the index for the object being freed
1454As this object is now free, update the bufctl to reflect that
1456If statistics are enabled, decrement the number of active objects in the cache
1461-1464If inuse reaches 0, the slab is free and is moved to the slabs_free list
1465-1468If the number that were in use equals the total number of objects in the slab, the slab was previously full, so move it to the slabs_partial list now that one object has been freed
1471End of function
1428-1433If SLAB_DEBUG_INITIAL is set, the constructor is called to verify the object is in an initialised state
1435-1444Verify the red marks at either end of the object are still there. This will check for writes beyond the boundaries of the object and for double frees
1445-1446Poison the freed object with a known pattern
1447-1448This function will confirm the object is a part of this slab and cache. It will then check the free list (bufctl) to make sure this is not a double free
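
The heart of kmem_cache_free_one() is the bufctl manipulation at lines 1453-1454: each slab keeps an array of indices threaded into a singly linked free list whose head is slabp→free. The following self-contained user-space sketch shows just that push/pop behaviour; all toy_ names and the four-object slab are made up for illustration.

#include <stdio.h>

#define OBJS_PER_SLAB 4
#define TOY_BUFCTL_END 0xFFFFFFFF

struct toy_slab {
        unsigned int free;                  /* index of first free object  */
        unsigned int bufctl[OBJS_PER_SLAB]; /* next-free index per object  */
        unsigned int inuse;
};

static void toy_free_one(struct toy_slab *s, unsigned int objnr)
{
        s->bufctl[objnr] = s->free;  /* push onto the free list ...   */
        s->free = objnr;             /* ... and make it the new head  */
        s->inuse--;
}

static unsigned int toy_alloc_one(struct toy_slab *s)
{
        unsigned int objnr = s->free;

        s->free = s->bufctl[objnr];  /* pop the head of the free list */
        s->inuse++;
        return objnr;
}

int main(void)
{
        struct toy_slab s = { .free = 0, .inuse = 0 };
        unsigned int i;

        for (i = 0; i < OBJS_PER_SLAB; i++)   /* initially every object is free */
                s.bufctl[i] = i + 1;
        s.bufctl[OBJS_PER_SLAB - 1] = TOY_BUFCTL_END;

        i = toy_alloc_one(&s);
        printf("allocated object %u\n", i);
        toy_free_one(&s, i);
        printf("head of free list is %u again\n", s.free);
        return 0;
}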

H.3.3.5  Function: free_block

Source: mm/slab.c

This function is only used in the SMP case when the per-CPU cache gets too full. It is used to free a batch of objects to the global pool in bulk.

1481 static void free_block (kmem_cache_t* cachep, void** objpp, 
                             int len)
1482 {
1483     spin_lock(&cachep->spinlock);
1484     __free_block(cachep, objpp, len);
1485     spin_unlock(&cachep->spinlock);
1486 }
1481The parameters are;
cachep The cache that objects are being freed from
objpp Pointer to the first object to free
len The number of objects to free
1483Acquire a lock to the cache descriptor
1484__free_block() (See Section H.3.3.6) performs the actual task of freeing each of the objects
1485Release the lock

H.3.3.6  Function: __free_block

Source: mm/slab.c

This function is responsible for freeing each of the objects in the per-CPU array objpp.

1474 static inline void __free_block (kmem_cache_t* cachep,
1475                 void** objpp, int len)
1476 {
1477     for ( ; len > 0; len--, objpp++)
1478         kmem_cache_free_one(cachep, *objpp);
1479 }
1474The parameters are the cache (cachep) the objects belong to, the list of objects (objpp) and the number of objects to free (len)
1477Loop len number of times
1478Free an object from the array

H.4  Sizes Cache

H.4.1  Initialising the Sizes Cache

H.4.1.1  Function: kmem_cache_sizes_init

Source: mm/slab.c

This function is responsible for creating pairs of caches for small memory buffers suitable for either normal or DMA memory.

436 void __init kmem_cache_sizes_init(void)
437 {
438     cache_sizes_t *sizes = cache_sizes;
439     char name[20];
440
444     if (num_physpages > (32 << 20) >> PAGE_SHIFT)
445         slab_break_gfp_order = BREAK_GFP_ORDER_HI;
446     do {
452         snprintf(name, sizeof(name), "size-%Zd",
                 sizes->cs_size);
453         if (!(sizes->cs_cachep =
454             kmem_cache_create(name, sizes->cs_size,
455                       0, SLAB_HWCACHE_ALIGN, NULL, NULL))) {
456             BUG();
457         }
458 
460         if (!(OFF_SLAB(sizes->cs_cachep))) {
461             offslab_limit = sizes->cs_size-sizeof(slab_t);
462             offslab_limit /= 2;
463         }
464         snprintf(name, sizeof(name), "size-%Zd(DMA)",
                         sizes->cs_size);
465         sizes->cs_dmacachep = kmem_cache_create(name, 
                  sizes->cs_size, 0,
466                   SLAB_CACHE_DMA|SLAB_HWCACHE_ALIGN, 
                  NULL, NULL);
467         if (!sizes->cs_dmacachep)
468             BUG();
469         sizes++;
470     } while (sizes->cs_size);
471 }
438Get a pointer to the cache_sizes array
439The buffer for the human readable name of the cache. It should be sized CACHE_NAMELEN, which is defined to be 20 bytes long
444-445slab_break_gfp_order determines how many pages a slab may use unless 0 objects fit into the slab. It is statically initialised to BREAK_GFP_ORDER_LO (1). This check sees if more than 32MiB of memory is available and if it is, allow BREAK_GFP_ORDER_HI number of pages to be used because internal fragmentation is more acceptable when more memory is available.
446-470Create two caches for each size of memory allocation needed
452Store the human readable cache name in name
453-454Create the cache, aligned to the L1 cache
460-463Calculate the off-slab bufctl limit which determines the number of objects that can be stored in a cache when the slab descriptor is kept off-slab
464The human readable name for the cache for DMA use
465-466Create the cache, aligned to the L1 cache and suitable for DMA use
467If the cache could not be created, it is a bug. If memory is unavailable this early, the machine will not boot
469Move to the next element in the cache_sizes array
470The array is terminated with a 0 as the last element
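
The cache_sizes array walked by this function is a simple table of size/cache pairs terminated by a zero size. Its shape is approximately the following; the exact set of sizes present depends on PAGE_SIZE, so the values shown are illustrative only.

typedef struct cache_sizes {
        size_t        cs_size;      /* object size this pair of caches serves */
        kmem_cache_t *cs_cachep;    /* "size-N" cache for normal memory       */
        kmem_cache_t *cs_dmacachep; /* "size-N(DMA)" cache for DMA memory     */
} cache_sizes_t;

static cache_sizes_t cache_sizes[] = {
        {    32, NULL, NULL },
        {    64, NULL, NULL },
        {   128, NULL, NULL },
        /* ... sizes keep doubling up to the largest supported allocation ... */
        {     0, NULL, NULL }       /* terminator tested at line 470 */
};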

H.4.2  kmalloc()

H.4.2.1  Function: kmalloc

Source: mm/slab.c

The call graph for this function is shown in Figure 8.16.

1555 void * kmalloc (size_t size, int flags)
1556 {
1557     cache_sizes_t *csizep = cache_sizes;
1558 
1559     for (; csizep->cs_size; csizep++) {
1560         if (size > csizep->cs_size)
1561             continue;
1562         return __kmem_cache_alloc(flags & GFP_DMA ?
1563              csizep->cs_dmacachep : 
                  csizep->cs_cachep, flags);
1564     }
1565     return NULL;
1566 }
1557cache_sizes is the array of caches for each size (See Section 8.4)
1559-1564Starting with the smallest cache, examine the size of each cache until one large enough to satisfy the request is found
1562If the allocation is for use with DMA, allocate an object from cs_dmacachep else use the cs_cachep
1565If a sizes cache of sufficient size was not available or an object could not be allocated, return failure
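
A short usage sketch ties this to the sizes caches created in Section H.4.1. The buffer size and flag combinations are only examples; a request of 512 bytes is served by the size-512 cache, and adding GFP_DMA steers it to the size-512(DMA) cache instead.

#include <linux/slab.h>

void kmalloc_example(void)
{
        void *buf = kmalloc(512, GFP_KERNEL);           /* from size-512      */
        void *dma = kmalloc(512, GFP_KERNEL | GFP_DMA); /* from size-512(DMA) */

        if (buf)
                kfree(buf);
        if (dma)
                kfree(dma);
}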

H.4.3  kfree()

H.4.3.1  Function: kfree

Source: mm/slab.c

The call graph for this function is shown in Figure 8.17. It is worth noting that the work this function does is almost identical to the function kmem_cache_free() with debugging enabled (See Section H.3.3.1).

1597 void kfree (const void *objp)
1598 {
1599     kmem_cache_t *c;
1600     unsigned long flags;
1601 
1602     if (!objp)
1603         return;
1604     local_irq_save(flags);
1605     CHECK_PAGE(virt_to_page(objp));
1606     c = GET_PAGE_CACHE(virt_to_page(objp));
1607     __kmem_cache_free(c, (void*)objp);
1608     local_irq_restore(flags);
1609 }
1602Return if the pointer is NULL. This is possible if a caller used kmalloc() and had a catch-all failure routine which called kfree() immediately
1604Disable interrupts
1605Make sure the page this object is in is a slab page
1606Get the cache this pointer belongs to (See Section 8.2)
1607Free the memory object
1608Re-enable interrupts

H.5  Per-CPU Object Cache

The structure of the Per-CPU object cache and how objects are added or removed from them is covered in detail in Sections 8.5.1 and 8.5.2.

H.5.1  Enabling Per-CPU Caches

H.5.1.1  Function: enable_all_cpucaches

Source: mm/slab.c


Figure H.1: Call Graph: enable_all_cpucaches()

This function locks the cache chain and enables the cpucache for every cache. This is important after the cache_cache and sizes cache have been enabled.

1714 static void enable_all_cpucaches (void)
1715 {
1716     struct list_head* p;
1717 
1718     down(&cache_chain_sem);
1719 
1720     p = &cache_cache.next;
1721     do {
1722         kmem_cache_t* cachep = list_entry(p, kmem_cache_t, next);
1723 
1724         enable_cpucache(cachep);
1725         p = cachep->next.next;
1726     } while (p != &cache_cache.next);
1727 
1728     up(&cache_chain_sem);
1729 }
1718Obtain the semaphore to the cache chain
1720Initialise p to the head of the cache chain, which is embedded in cache_cache
1721-1726Cycle through the whole chain
1722Get a cache from the chain. As p starts at &cache_cache.next and list_entry() subtracts the offset of the next field, the first iteration returns cache_cache itself, so every cache on the chain, including cache_cache, has its cpucache enabled
1724Enable the cpucache
1725Move to the next cache on the chain
1728Release the cache chain semaphore
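
The reason the first iteration yields cache_cache is the pointer arithmetic hidden in list_entry(): it subtracts the offset of the embedded list member to recover the containing structure. The following user-space sketch of that arithmetic uses made-up toy_ names and a plain pointer in place of a list_head.

#include <stdio.h>
#include <stddef.h>

struct toy_cache {
        const char       *name;
        struct toy_cache *chain_next;  /* stands in for the embedded list_head */
};

/* recover the containing structure from a pointer to its member */
#define toy_entry(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

int main(void)
{
        struct toy_cache c = { "size-32", NULL };
        struct toy_cache **p = &c.chain_next;   /* pointer to the embedded member */
        struct toy_cache *recovered = toy_entry(p, struct toy_cache, chain_next);

        printf("recovered cache: %s\n", recovered->name);
        return 0;
}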

H.5.1.2  Function: enable_cpucache

Source: mm/slab.c

This function calculates what the size of a cpucache should be based on the size of the objects the cache contains before calling kmem_tune_cpucache() which does the actual allocation.

1693 static void enable_cpucache (kmem_cache_t *cachep)
1694 {
1695     int err;
1696     int limit;
1697 
1699     if (cachep->objsize > PAGE_SIZE)
1700         return;
1701     if (cachep->objsize > 1024)
1702         limit = 60;
1703     else if (cachep->objsize > 256)
1704         limit = 124;
1705     else
1706         limit = 252;
1707 
1708     err = kmem_tune_cpucache(cachep, limit, limit/2);
1709     if (err)
1710         printk(KERN_ERR 
            "enable_cpucache failed for %s, error %d.\n",
1711                     cachep->name, -err);
1712 }
1699-1700If an object is larger than a page, do not create a per-CPU cache as they are too expensive
1701-1702If an object is larger than 1KiB, the limit is set to 60 objects to keep the amount of memory held in the per-CPU cache down
1703-1706For smaller objects, progressively larger limits of 124 and 252 objects are used as the per-CPU cache still remains a reasonable size
1708Allocate the memory for the cpucache
1710-1711Print out an error message if the allocation failed
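
The threshold selection above can be restated as a small helper, which may make the three bands easier to see at a glance. The cut-off values mirror lines 1699-1706 exactly; the function name is invented for the example.

static int pick_cpucache_limit(size_t objsize, size_t page_size)
{
        if (objsize > page_size)
                return 0;     /* no per-CPU cache for very large objects */
        if (objsize > 1024)
                return 60;
        if (objsize > 256)
                return 124;
        return 252;
}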

H.5.1.3  Function: kmem_tune_cpucache

Source: mm/slab.c

This function is responsible for allocating memory for the cpucaches. For each CPU on the system, kmalloc() provides a block of memory large enough for one cpucache and fills in a ccupdate_struct_t struct. The function smp_call_function_all_cpus() then calls do_ccupdate_local() which swaps the new information with the old information in the cache descriptor.

1639 static int kmem_tune_cpucache (kmem_cache_t* cachep, 
                    int limit, int batchcount)
1640 {
1641     ccupdate_struct_t new;
1642     int i;
1643 
1644     /*
1645      * These are admin-provided, so we are more graceful.
1646      */
1647     if (limit < 0)
1648         return -EINVAL;
1649     if (batchcount < 0)
1650         return -EINVAL;
1651     if (batchcount > limit)
1652         return -EINVAL;
1653     if (limit != 0 && !batchcount)
1654         return -EINVAL;
1655 
1656     memset(&new.new,0,sizeof(new.new));
1657     if (limit) {
1658         for (i = 0; i< smp_num_cpus; i++) {
1659             cpucache_t* ccnew;
1660 
1661             ccnew = kmalloc(sizeof(void*)*limit+
1662                     sizeof(cpucache_t), 
                         GFP_KERNEL);
1663             if (!ccnew)
1664                 goto oom;
1665             ccnew->limit = limit;
1666             ccnew->avail = 0;
1667             new.new[cpu_logical_map(i)] = ccnew;
1668         }
1669     }
1670     new.cachep = cachep;
1671     spin_lock_irq(&cachep->spinlock);
1672     cachep->batchcount = batchcount;
1673     spin_unlock_irq(&cachep->spinlock);
1674 
1675     smp_call_function_all_cpus(do_ccupdate_local, (void *)&new);
1676 
1677     for (i = 0; i < smp_num_cpus; i++) {
1678         cpucache_t* ccold = new.new[cpu_logical_map(i)];
1679         if (!ccold)
1680             continue;
1681         local_irq_disable();
1682         free_block(cachep, cc_entry(ccold), ccold->avail);
1683         local_irq_enable();
1684         kfree(ccold);
1685     }
1686     return 0;
1687 oom:
1688     for (i--; i >= 0; i--)
1689         kfree(new.new[cpu_logical_map(i)]);
1690     return -ENOMEM;
1691 }
1639The parameters of the function are
cachep The cache this cpucache is being allocated for
limit The total number of objects that can exist in the cpucache
batchcount The number of objects to allocate in one batch when the cpucache is empty
1647The number of objects in the cache cannot be negative
1649A negative number of objects cannot be allocated in batch
1651A batch of objects greater than the limit cannot be allocated
1653A batchcount must be provided if the limit is positive
1656Zero fill the update struct
1657If a limit is provided, allocate memory for the cpucache
1658-1668For every CPU, allocate a cpucache
1661The amount of memory needed is limit number of pointers and the size of the cpucache descriptor
1663If out of memory, clean up and exit
1665-1666Fill in the fields for the cpucache descriptor
1667Fill in this CPU's entry in the ccupdate_struct_t struct
1670Record in the ccupdate_struct_t struct which cache is being updated
1671-1673Acquire an interrupt safe lock to the cache descriptor and set its batchcount
1675Get each CPU to update its cpucache information for itself. This swaps the old cpucaches in the cache descriptor with the new ones in new using do_ccupdate_local() (See Section H.5.2.2)
1677-1685After smp_call_function_all_cpus() (See Section H.5.2.1), the old cpucaches are in new. This block of code cycles through them all, frees any objects in them and deletes the old cpucache
1686Return success
1688In the event there is no memory, delete all cpucaches that have been allocated up until this point and return failure
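
The allocation size at line 1661 makes sense once the layout of a cpucache is recalled: the small descriptor holding avail and limit is immediately followed in memory by the array of object pointers that cc_entry() returns. A user-space illustration of that layout, with invented toy_ names, is below.

#include <stdio.h>
#include <stdlib.h>

typedef struct toy_cpucache {
        unsigned int avail;
        unsigned int limit;
} toy_cpucache_t;

/* the object pointer array starts directly after the descriptor */
#define toy_cc_entry(cc) ((void **)((toy_cpucache_t *)(cc) + 1))

int main(void)
{
        unsigned int limit = 4;
        toy_cpucache_t *cc = malloc(sizeof(*cc) + sizeof(void *) * limit);

        if (!cc)
                return 1;
        cc->limit = limit;
        cc->avail = 0;
        toy_cc_entry(cc)[cc->avail++] = &limit;  /* cache any pointer */
        printf("cached %u of %u entries\n", cc->avail, cc->limit);
        free(cc);
        return 0;
}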

H.5.2  Updating Per-CPU Information

H.5.2.1  Function: smp_call_function_all_cpus

Source: mm/slab.c

This calls the function func() on all CPUs. In the context of the slab allocator, func is do_ccupdate_local() and the argument is a ccupdate_struct_t.

859 static void smp_call_function_all_cpus(void (*func) (void *arg), 
                       void *arg)
860 {
861     local_irq_disable();
862     func(arg);
863     local_irq_enable();
864 
865     if (smp_call_function(func, arg, 1, 1))
866         BUG();
867 }
861-863Disable interrupts locally and call the function for this CPU
865For all other CPUs, call the function. smp_call_function() is an architecture specific function and will not be discussed further here

H.5.2.2  Function: do_ccupdate_local

Source: mm/slab.c

This function swaps the cpucache information in the cache descriptor with the information in info for this CPU.

874 static void do_ccupdate_local(void *info)
875 {
876     ccupdate_struct_t *new = (ccupdate_struct_t *)info;
877     cpucache_t *old = cc_data(new->cachep);
878     
879     cc_data(new->cachep) = new->new[smp_processor_id()];
880     new->new[smp_processor_id()] = old;
881 }
876info is a pointer to the ccupdate_struct_t that was passed to smp_call_function_all_cpus() (See Section H.5.2.1)
877Part of the ccupdate_struct_t is a pointer to the cache this cpucache belongs to. cc_data() returns the cpucache_t for this processor
879Place the new cpucache in the cache descriptor. cc_data() returns the pointer to the cpucache for this CPU
880Replace the pointer in new with the old cpucache so it can be deleted later by the caller of smp_call_function_all_cpus(), kmem_tune_cpucache() for example

H.5.3  Draining a Per-CPU Cache

This function is called to drain all objects in a per-cpu cache. It is called when a cache needs to be shrunk for the freeing up of slabs. A slab would not be freeable if an object was in the per-cpu cache even though it is not in use.

H.5.3.1  Function: drain_cpu_caches

Source: mm/slab.c

885 static void drain_cpu_caches(kmem_cache_t *cachep)
886 {
887     ccupdate_struct_t new;
888     int i;
889 
890     memset(&new.new,0,sizeof(new.new));
891 
892     new.cachep = cachep;
893 
894     down(&cache_chain_sem);
895     smp_call_function_all_cpus(do_ccupdate_local, (void *)&new);
896 
897     for (i = 0; i < smp_num_cpus; i++) {
898         cpucache_t* ccold = new.new[cpu_logical_map(i)];
899         if (!ccold || (ccold->avail == 0))
900             continue;
901         local_irq_disable();
902         free_block(cachep, cc_entry(ccold), ccold->avail);
903         local_irq_enable();
904         ccold->avail = 0;
905     }
906     smp_call_function_all_cpus(do_ccupdate_local, (void *)&new);
907     up(&cache_chain_sem);
908 }
890Zero fill the update structure. Its NULL pointers are what will be swapped into the cache descriptor, clearing the per-CPU cache data while it is drained
892Set new.cachep to cachep so that smp_call_function_all_cpus() knows what cache it is affecting
894Acquire the cache chain semaphore
895do_ccupdate_local()(See Section H.5.2.2) swaps the cpucache_t information in the cache descriptor with the ones in new so they can be altered here
897-905For each CPU in the system ....
898Get the cpucache descriptor for this CPU
899If the structure does not exist for some reason or there are no objects available in it, move to the next CPU
901Disable interrupts on this processor. It is possible an allocation from an interrupt handler elsewhere would try to access the per CPU cache
902Free the block of objects with free_block() (See Section H.3.3.5)
903Re-enable interrupts
904Show that no objects are available
906The information for each CPU has been updated so call do_ccupdate_local() (See Section H.5.2.2) for each CPU to put the information back into the cache descriptor
907Release the semaphore for the cache chain

H.6  Slab Allocator Initialisation

H.6.0.2  Function: kmem_cache_init

Source: mm/slab.c

This function initialises the cache chain semaphore and list and completes the initialisation of the statically declared cache_cache by calculating how many objects fit on its slabs and how many cache colours are available.

416 void __init kmem_cache_init(void)
417 {
418     size_t left_over;
419 
420     init_MUTEX(&cache_chain_sem);
421     INIT_LIST_HEAD(&cache_chain);
422 
423     kmem_cache_estimate(0, cache_cache.objsize, 0,
424             &left_over, &cache_cache.num);
425     if (!cache_cache.num)
426         BUG();
427 
428     cache_cache.colour = left_over/cache_cache.colour_off;
429     cache_cache.colour_next = 0;
430 }
420Initialise the semaphore for accessing the cache chain
421Initialise the cache chain linked list
423kmem_cache_estimate() (See Section H.1.2.1) calculates the number of objects that fit on a slab of cache_cache and the number of bytes wasted
425If even one kmem_cache_t cannot be stored in a page, there is something seriously wrong
428colour is the number of different cache lines that can be used while still keeping L1 cache alignment
429colour_next indicates which line to use next. Start at 0
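
A small worked example of the colour calculation at line 428, using assumed numbers: with 96 bytes left over per slab and a colour offset equal to a 32-byte L1 cache line, three different starting offsets are available for objects.

#include <stdio.h>

int main(void)
{
        unsigned int left_over = 96;    /* assumed waste per slab     */
        unsigned int colour_off = 32;   /* assumed L1 cache line size */
        unsigned int colour = left_over / colour_off;
        unsigned int c;

        printf("colour = %u\n", colour);
        for (c = 0; c < colour; c++)
                printf("a slab may start its objects at offset %u\n",
                       c * colour_off);
        return 0;
}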

H.7  Interfacing with the Buddy Allocator

H.7.0.3  Function: kmem_getpages

Source: mm/slab.c

This allocates pages for the slab allocator

486 static inline void * kmem_getpages (kmem_cache_t *cachep, 
                                        unsigned long flags)
487 {
488     void    *addr;
495     flags |= cachep->gfpflags;
496     addr = (void*) __get_free_pages(flags, cachep->gfporder);
503     return addr;
504 }
495Whatever flags were requested for the allocation, append the cache flags to it. The only flag it may append is GFP_DMA if the cache requires DMA memory
496Allocate from the buddy allocator with __get_free_pages() (See Section F.2.3)
503Return the pages or NULL if it failed

H.7.0.4  Function: kmem_freepages

Source: mm/slab.c

This frees pages for the slab allocator. Before calling the buddy allocator API, it clears the PG_slab bit on each page's flags.

507 static inline void kmem_freepages (kmem_cache_t *cachep, void *addr)
508 {
509     unsigned long i = (1<<cachep->gfporder);
510     struct page *page = virt_to_page(addr);
511 
517     while (i--) {
518         PageClearSlab(page);
519         page++;
520     }
521     free_pages((unsigned long)addr, cachep->gfporder);
522 }
509Calculate the number of pages that were allocated for the slab, which is 1<<gfporder
510Get the struct page for the address
517-520Clear the PG_slab bit on each page
521Free the pages to the buddy allocator with free_pages() (See Section F.4.1)
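
A caller of these two functions must use the same order for both the allocation and the free. A minimal sketch of that pairing is below; order 1 (two pages) is an arbitrary example and the function name is invented.

#include <linux/mm.h>

void order_example(void)
{
        unsigned long addr = __get_free_pages(GFP_KERNEL, 1); /* 1<<1 = 2 pages */

        if (addr)
                free_pages(addr, 1);    /* must be freed with the same order */
}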

