Maple Tree¶
- Author: Liam R. Howlett
Overview¶
The Maple Tree is a B-Tree data type which is optimized for storing non-overlapping ranges, including ranges of size 1. The tree was designed to be simple to use and does not require a user-written search method. It supports iterating over a range of entries and going to the previous or next entry in a cache-efficient manner. The tree can also be put into an RCU-safe mode of operation which allows reading and writing concurrently. Writers must synchronize on a lock, which can be the default spinlock, or the user can set the lock to an external lock of a different type.
The Maple Tree maintains a small memory footprint and was designed to use modern processor cache efficiently. The majority of the users will be able to use the normal API. An Advanced API exists for more complex scenarios. The most important usage of the Maple Tree is the tracking of the virtual memory areas.
The Maple Tree can store values between 0 and ULONG_MAX.  The Maple
Tree reserves values with the bottom two bits set to '10' which are below 4096
(i.e. 2, 6, 10 .. 4094) for internal use.  If the entries may use reserved
values, then the users can convert the entries using xa_mk_value() and convert
them back by calling xa_to_value().  The normal API rejects reserved values;
a user who needs to store one can convert the value when using the
Advanced API.
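The reserved-value rule and the value conversion can be checked with a little arithmetic. The following is a userspace sketch, not kernel code: the demo_* helpers are hypothetical stand-ins that mirror how xa_mk_value() and xa_to_value() encode an integer (shift left by one and set bit 0), and demo_is_reserved() simply restates the rule described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace mirror of xa_mk_value(): shift left, set bit 0. */
static inline void *demo_mk_value(unsigned long v)
{
	return (void *)(uintptr_t)((v << 1) | 1);
}

/* Hypothetical userspace mirror of xa_to_value(): undo the shift. */
static inline unsigned long demo_to_value(const void *entry)
{
	return (unsigned long)((uintptr_t)entry >> 1);
}

/* Restates the rule in the text: low two bits are '10' and value < 4096. */
static inline bool demo_is_reserved(const void *entry)
{
	uintptr_t v = (uintptr_t)entry;

	return v < 4096 && (v & 3) == 2;
}

/* The raw value 2 is reserved; its converted form (5) is not. */
static inline void demo_check(void)
{
	assert(demo_is_reserved((void *)(uintptr_t)2));
	assert(!demo_is_reserved(demo_mk_value(2)));
	assert(demo_to_value(demo_mk_value(2)) == 2);
}
```

Because a converted value always has bit 0 set, it can never match the '10' pattern, which is why the conversion is a safe way to store small integers.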
The Maple Tree can also be configured to support searching for a gap of a given size (or larger).
Pre-allocation of nodes is also supported using the Advanced API. This is useful for users who must guarantee a successful store operation within a given code segment when allocating cannot be done. Allocations of nodes are relatively small at around 256 bytes.
Normal API¶
Start by initialising a maple tree, either with DEFINE_MTREE() for statically
allocated maple trees or mt_init() for dynamically allocated ones.  A
freshly-initialised maple tree contains a NULL pointer for the range 0 -
ULONG_MAX.  There are currently two types of maple trees supported: the
allocation tree and the regular tree.  The regular tree has a higher branching
factor for internal nodes.  The allocation tree has a lower branching factor
but allows the user to search for a gap of a given size or larger from either
0 upwards or ULONG_MAX down.  An allocation tree can be used by
passing in the MT_FLAGS_ALLOC_RANGE flag when initialising the tree.
You can then set entries using mtree_store() or mtree_store_range().
mtree_store() will overwrite any entry with the new entry and return 0 on
success or an error code otherwise.  mtree_store_range() works in the same way
but takes a range.  mtree_load() is used to retrieve the entry stored at a
given index.  You can use mtree_erase() to erase an entire range by only
knowing one value within that range.  Alternatively, an mtree_store() call with
an entry of NULL may be used to partially erase a range or many ranges at once.
If you want to only store a new entry to a range (or index) if that range is
currently NULL, you can use mtree_insert_range() or mtree_insert() which
return -EEXIST if the range is not empty.
You can search for an entry from an index upwards by using mt_find().
You can walk each entry within a range by calling mt_for_each().  You must
provide a temporary variable to store a cursor.  If you want to walk each
element of the tree then 0 and ULONG_MAX may be used as the range.  If
the caller is going to hold the lock for the duration of the walk then it is
worth looking at the mas_for_each() API in the Advanced API
section.
Sometimes it is necessary to ensure the next call to store to a maple tree does not allocate memory. Please see the Advanced API for this use case.
Finally, you can remove all entries from a maple tree by calling
mtree_destroy().  If the maple tree entries are pointers, you may wish to free
the entries first.
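The calls described in this section can be strung together as in the following minimal sketch. It is a kernel-context fragment that builds only inside a kernel tree; demo_tree, demo_a, demo_b and demo_normal_api() are hypothetical names chosen for illustration.

```c
#include <linux/maple_tree.h>
#include <linux/gfp.h>
#include <linux/errno.h>
#include <linux/limits.h>
#include <linux/printk.h>

/* Statically allocated regular tree; every index starts out NULL. */
static DEFINE_MTREE(demo_tree);

/* Hypothetical payloads; any non-reserved pointer would do. */
static int demo_a, demo_b;

static int demo_normal_api(void)
{
	unsigned long index = 0;	/* cursor used by mt_for_each() */
	void *entry;
	int ret;

	/* Store &demo_a over the whole range 10-19. */
	ret = mtree_store_range(&demo_tree, 10, 19, &demo_a, GFP_KERNEL);
	if (ret)
		return ret;

	/* Index 15 is occupied, so an insert there is refused. */
	ret = mtree_insert(&demo_tree, 15, &demo_b, GFP_KERNEL);
	if (ret == -EEXIST)
		pr_info("index 15 is already occupied\n");

	/* Any index inside the range loads the same entry. */
	if (mtree_load(&demo_tree, 15) == &demo_a)
		pr_info("index 15 maps to demo_a\n");

	/* Walk every non-NULL entry in the tree. */
	mt_for_each(&demo_tree, entry, index, ULONG_MAX)
		pr_info("found entry %p\n", entry);

	/* Erase the whole 10-19 range by naming one index within it. */
	mtree_erase(&demo_tree, 12);

	/* Free the tree's nodes; the payloads here are static. */
	mtree_destroy(&demo_tree);
	return 0;
}
```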
Allocating Nodes¶
The allocations are handled by the internal tree code. See Advanced Allocating Nodes for other options.
Locking¶
You do not have to worry about locking. See Advanced Locking for other options.
The Maple Tree uses RCU and an internal spinlock to synchronise access:
- Takes RCU read lock:
- Takes ma_lock internally:
If you want to take advantage of the internal lock to protect the data
structures that you are storing in the Maple Tree, you can call mtree_lock()
before calling mtree_load(), then take a reference count on the object you
have found before calling mtree_unlock().  This will prevent stores from
removing the object from the tree between looking up the object and
incrementing the refcount.  You can also use RCU to avoid dereferencing
freed memory, but an explanation of that is beyond the scope of this
document.
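The lookup-then-pin pattern described above might look like the following sketch. It is a kernel-context fragment; struct demo_obj and demo_obj_get() are hypothetical, and the tree is assumed to use the default internal spinlock.

```c
#include <linux/maple_tree.h>
#include <linux/kref.h>

/* Hypothetical refcounted object stored in the tree. */
struct demo_obj {
	struct kref ref;
};

static struct demo_obj *demo_obj_get(struct maple_tree *mt,
				     unsigned long index)
{
	struct demo_obj *obj;

	mtree_lock(mt);			/* keeps stores out while we look */
	obj = mtree_load(mt, index);
	if (obj)
		kref_get(&obj->ref);	/* pin the object before unlocking */
	mtree_unlock(mt);

	return obj;			/* caller drops the reference later */
}
```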
Advanced API¶
The advanced API offers more flexibility and better performance at the cost of an interface which can be harder to use and has fewer safeguards. You must take care of your own locking while using the advanced API. You can use the ma_lock, RCU or an external lock for protection. You can mix advanced and normal operations on the same tree, as long as the locking is compatible. The Normal API is implemented in terms of the advanced API.
The advanced API is based around the ma_state; this is where the 'mas' prefix originates. The ma_state struct keeps track of tree operations to make life easier for both internal and external tree users.
Initialising the maple tree is the same as in the Normal API. Please see above.
The maple state keeps track of the range start and end in mas->index and mas->last, respectively.
mas_walk() will walk the tree to the location of mas->index and set the
mas->index and mas->last according to the range for the entry.
You can set entries using mas_store().  mas_store() will overwrite any entry
with the new entry and return the first existing entry that is overwritten.
The range is passed in as members of the maple state: index and last.
You can use mas_erase() to erase an entire range by setting index and
last of the maple state to the desired range to erase.  This will erase
the first range that is found in that range, set the maple state index
and last as the range that was erased and return the entry that existed
at that location.
You can walk each entry within a range by using mas_for_each().  If you want
to walk each element of the tree then 0 and ULONG_MAX may be used as
the range.  If the lock needs to be periodically dropped, see mas_pause() in
the locking section.
Using a maple state allows mas_next() and mas_prev() to function as if the
tree was a linked list.  With such a high branching factor the amortized
performance penalty is outweighed by cache optimization.  mas_next() will
return the next entry which occurs after the entry at index.  mas_prev()
will return the previous entry which occurs before the entry at index.
mas_find() will find the first entry which exists at or above index on
the first call, and the next entry on every subsequent call.
mas_find_rev() will find the first entry which exists at or below the last on
the first call, and the previous entry on every subsequent call.
If the user needs to yield the lock during an operation, then the maple state
must be paused using mas_pause().
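As a rough sketch of the calls above, the fragment below stores a range through a maple state and then walks the tree, pausing whenever the lock has to be dropped. It is kernel-context code that builds only inside a kernel tree. The MA_STATE() declaration macro and the mas_lock()/mas_unlock() helpers come from the maple tree header and are not described in this section; the demo_* names are hypothetical, and the walk assumes the tree is in RCU mode if writers may run concurrently.

```c
#include <linux/maple_tree.h>
#include <linux/gfp.h>
#include <linux/limits.h>
#include <linux/rcupdate.h>
#include <linux/sched.h>

/* Store @entry over the range 100-199 using the tree's own spinlock. */
static int demo_mas_store(struct maple_tree *mt, void *entry)
{
	MA_STATE(mas, mt, 100, 199);	/* mas.index = 100, mas.last = 199 */
	int ret;

	mas_lock(&mas);
	ret = mas_store_gfp(&mas, entry, GFP_KERNEL);
	mas_unlock(&mas);

	return ret;
}

/* Count the entries in the tree, yielding the CPU when asked to. */
static unsigned long demo_mas_count(struct maple_tree *mt)
{
	MA_STATE(mas, mt, 0, 0);
	unsigned long count = 0;
	void *entry;

	rcu_read_lock();
	mas_for_each(&mas, entry, ULONG_MAX) {
		count++;
		if (need_resched()) {
			mas_pause(&mas);	/* make the state safe to resume */
			rcu_read_unlock();
			cond_resched();
			rcu_read_lock();
		}
	}
	rcu_read_unlock();

	return count;
}
```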
There are a few extra interfaces provided when using an allocation tree. If you wish to search for a gap within a range, then mas_empty_area() or mas_empty_area_rev() can be used. mas_empty_area() searches for a gap starting at the lowest index given up to the maximum of the range. mas_empty_area_rev() searches for a gap starting at the highest index given and continues downward to the lower bound of the range.
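A gap search might be sketched as follows. This is a kernel-context fragment with hypothetical demo_* names. It assumes the tree was initialised with MT_FLAGS_ALLOC_RANGE, and that mas_empty_area() takes the maple state, the lower and upper bounds and the gap size, returns 0 on success, and leaves the start of the gap in mas.index; check the header for your kernel version before relying on these details.

```c
#include <linux/maple_tree.h>
#include <linux/limits.h>

/*
 * Find the lowest gap of at least @size indices anywhere in the tree.
 * On success the first index of the gap is written to @start.
 */
static int demo_find_gap(struct maple_tree *mt, unsigned long size,
			 unsigned long *start)
{
	MA_STATE(mas, mt, 0, 0);
	int ret;

	mas_lock(&mas);
	ret = mas_empty_area(&mas, 0, ULONG_MAX, size);
	if (!ret)
		*start = mas.index;	/* assumed: gap begins at mas.index */
	mas_unlock(&mas);

	return ret;
}
```

Searching from the top of the range downward would use mas_empty_area_rev() in the same position.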
Advanced Allocating Nodes¶
Allocations are usually handled internally to the tree. However, if allocations need to occur before a write occurs, then calling mas_expected_entries() will allocate the worst-case number of needed nodes to insert the provided number of ranges. This also causes the tree to enter mass insertion mode. Once insertions are complete, calling mas_destroy() on the maple state will free the unused allocations.
Advanced Locking¶
The maple tree uses a spinlock by default, but external locks can be used for
tree updates as well.  To use an external lock, the tree must be initialized
with the MT_FLAGS_LOCK_EXTERN flag.  This is usually done with the
MTREE_INIT_EXT() #define, which takes an external lock as an argument.
Functions and structures¶
Maple tree flags
- MT_FLAGS_ALLOC_RANGE - Track gaps in this tree 
- MT_FLAGS_USE_RCU - Operate in RCU mode 
- MT_FLAGS_HEIGHT_OFFSET - The position of the tree height in the flags 
- MT_FLAGS_HEIGHT_MASK - The mask for the maple tree height value 
- MT_FLAGS_LOCK_MASK - How the mt_lock is used 
- MT_FLAGS_LOCK_IRQ - Acquired irq-safe 
- MT_FLAGS_LOCK_BH - Acquired bh-safe 
- MT_FLAGS_LOCK_EXTERN - mt_lock is not used 
MAPLE_HEIGHT_MAX The largest height that can be stored
- 
MTREE_INIT¶
MTREE_INIT (name, __flags)
Initialize a maple tree
Parameters
- name
- The maple tree name 
- __flags
- The maple tree flags 
- 
MTREE_INIT_EXT¶
MTREE_INIT_EXT (name, __flags, __lock)
Initialize a maple tree with an external lock.
Parameters
- name
- The tree name 
- __flags
- The maple tree flags 
- __lock
- The external lock 
- 
bool mtree_empty(const struct maple_tree *mt)¶
- Determine if a tree has any present entries. 
Parameters
- const struct maple_tree *mt
- Maple Tree. 
Context
Any context.
Return
true if the tree contains only NULL pointers.
- 
void mas_reset(struct ma_state *mas)¶
- Reset a Maple Tree operation state. 
Parameters
- struct ma_state *mas
- Maple Tree operation state. 
Description
Resets the error or walk state of the mas so future walks of the tree will start from the root. Use this if you have dropped the lock and want to reuse the ma_state.
Context
Any context.
- 
mas_for_each¶
mas_for_each (__mas, __entry, __max)
Iterate over a range of the maple tree.
Parameters
- __mas
- Maple Tree operation state (maple_state) 
- __entry
- Entry retrieved from the tree 
- __max
- maximum index to retrieve from the tree 
Description
When returned, mas->index and mas->last will hold the entire range for the entry.
Note
may return the zero entry.
- 
void __mas_set_range(struct ma_state *mas, unsigned long start, unsigned long last)¶
- Set up Maple Tree operation state to a sub-range of the current location. 
Parameters
- struct ma_state *mas
- Maple Tree operation state. 
- unsigned long start
- New start of range in the Maple Tree. 
- unsigned long last
- New end of range in the Maple Tree. 
Description
Set the internal maple state values to a sub-range.
Please use mas_set_range() if you do not know where you are in the tree.
- 
void mas_set_range(struct ma_state *mas, unsigned long start, unsigned long last)¶
- Set up Maple Tree operation state for a different range. 
Parameters
- struct ma_state *mas
- Maple Tree operation state. 
- unsigned long start
- New start of range in the Maple Tree. 
- unsigned long last
- New end of range in the Maple Tree. 
Description
Move the operation state to refer to a different range.  This will
have the effect of starting a walk from the top; see mas_next()
to move to an adjacent index.
- 
void mas_set(struct ma_state *mas, unsigned long index)¶
- Set up Maple Tree operation state for a different index. 
Parameters
- struct ma_state *mas
- Maple Tree operation state. 
- unsigned long index
- New index into the Maple Tree. 
Description
Move the operation state to refer to a different index.  This will
have the effect of starting a walk from the top; see mas_next()
to move to an adjacent index.
- 
void mt_init_flags(struct maple_tree *mt, unsigned int flags)¶
- Initialise an empty maple tree with flags. 
Parameters
- struct maple_tree *mt
- Maple Tree 
- unsigned int flags
- maple tree flags. 
Description
If you need to initialise a Maple Tree with special flags (eg, an allocation tree), use this function.
Context
Any context.
- 
void mt_init(struct maple_tree *mt)¶
- Initialise an empty maple tree. 
Parameters
- struct maple_tree *mt
- Maple Tree 
Description
An empty Maple Tree.
Context
Any context.
- 
void mt_clear_in_rcu(struct maple_tree *mt)¶
- Switch the tree to non-RCU mode. 
Parameters
- struct maple_tree *mt
- The Maple Tree 
- 
void mt_set_in_rcu(struct maple_tree *mt)¶
- Switch the tree to RCU safe mode. 
Parameters
- struct maple_tree *mt
- The Maple Tree 
- 
mt_for_each¶
mt_for_each (__tree, __entry, __index, __max)
Iterate over each entry starting at index until max.
Parameters
- __tree
- The Maple Tree 
- __entry
- The current entry 
- __index
- The index to start the search from. Subsequently used as iterator. 
- __max
- The maximum limit for index 
Description
This iterator skips all entries that resolve to a NULL pointer, e.g. entries which have been reserved with XA_ZERO_ENTRY.
- 
void *mas_insert(struct ma_state *mas, void *entry)¶
- Internal call to insert a value 
Parameters
- struct ma_state *mas
- The maple state 
- void *entry
- The entry to store 
Return
NULL, or the contents that already exist at the requested index
otherwise.  The maple state needs to be checked for error conditions.
- 
void *mas_walk(struct ma_state *mas)¶
- Search for mas->index in the tree. 
Parameters
- struct ma_state *mas
- The maple state. 
Description
mas->index and mas->last will be set to the range if there is a value. If mas->node is MAS_NONE, reset to MAS_START.
Return
the entry at the location or NULL.
- 
void __rcu **mte_dead_walk(struct maple_enode **enode, unsigned char offset)¶
- Walk down a dead tree to just before the leaves 
Parameters
- struct maple_enode **enode
- The maple encoded node 
- unsigned char offset
- The starting offset 
Note
This can only be used from the RCU callback context.
- 
void mt_free_walk(struct rcu_head *head)¶
- Walk & free a tree in the RCU callback context 
Parameters
- struct rcu_head *head
- The RCU head that's within the node. 
Note
This can only be used from the RCU callback context.
- 
void *mas_store(struct ma_state *mas, void *entry)¶
- Store an entry. 
Parameters
- struct ma_state *mas
- The maple state. 
- void *entry
- The entry to store. 
Description
The mas->index and mas->last are used to set the range for the entry.
Note
The mas should have pre-allocated entries to ensure there is memory to store the entry. Please see mas_expected_entries()/mas_destroy() for more details.
Return
the first entry between mas->index and mas->last or NULL.
- 
int mas_store_gfp(struct ma_state *mas, void *entry, gfp_t gfp)¶
- Store a value into the tree. 
Parameters
- struct ma_state *mas
- The maple state 
- void *entry
- The entry to store 
- gfp_t gfp
- The GFP_FLAGS to use for allocations if necessary. 
Return
0 on success, -EINVAL on invalid request, -ENOMEM if memory could not be allocated.
- 
void mas_store_prealloc(struct ma_state *mas, void *entry)¶
- Store a value into the tree using memory preallocated in the maple state. 
Parameters
- struct ma_state *mas
- The maple state 
- void *entry
- The entry to store. 
- 
int mas_preallocate(struct ma_state *mas, void *entry, gfp_t gfp)¶
- Preallocate enough nodes for a store operation 
Parameters
- struct ma_state *mas
- The maple state 
- void *entry
- The entry that will be stored 
- gfp_t gfp
- The GFP_FLAGS to use for allocations. 
Return
0 on success, -ENOMEM if memory could not be allocated.
- 
void *mas_next(struct ma_state *mas, unsigned long max)¶
- Get the next entry. 
Parameters
- struct ma_state *mas
- The maple state 
- unsigned long max
- The maximum index to check. 
Description
Returns the next entry after mas->index. Must hold rcu_read_lock or the write lock. Can return the zero entry.
Return
The next entry or NULL
- 
void *mas_next_range(struct ma_state *mas, unsigned long max)¶
- Advance the maple state to the next range 
Parameters
- struct ma_state *mas
- The maple state 
- unsigned long max
- The maximum index to check. 
Description
Sets mas->index and mas->last to the range. Must hold rcu_read_lock or the write lock. Can return the zero entry.
Return
The next entry or NULL
- 
void *mt_next(struct maple_tree *mt, unsigned long index, unsigned long max)¶
- get the next value in the maple tree 
Parameters
- struct maple_tree *mt
- The maple tree 
- unsigned long index
- The start index 
- unsigned long max
- The maximum index to check 
Description
Takes RCU read lock internally to protect the search, which does not protect the returned pointer after dropping RCU read lock. See also: Maple Tree
Return
The entry higher than index or NULL if nothing is found.
- 
void *mas_prev(struct ma_state *mas, unsigned long min)¶
- Get the previous entry 
Parameters
- struct ma_state *mas
- The maple state 
- unsigned long min
- The minimum value to check. 
Description
Must hold rcu_read_lock or the write lock. Will reset mas to MAS_START if the node is MAS_NONE. Will stop on not searchable nodes.
Return
the previous value or NULL.
- 
void *mas_prev_range(struct ma_state *mas, unsigned long min)¶
- Advance to the previous range 
Parameters
- struct ma_state *mas
- The maple state 
- unsigned long min
- The minimum value to check. 
Description
Sets mas->index and mas->last to the range. Must hold rcu_read_lock or the write lock. Will reset mas to MAS_START if the node is MAS_NONE. Will stop on not searchable nodes.
Return
the previous value or NULL.
- 
void *mt_prev(struct maple_tree *mt, unsigned long index, unsigned long min)¶
- get the previous value in the maple tree 
Parameters
- struct maple_tree *mt
- The maple tree 
- unsigned long index
- The start index 
- unsigned long min
- The minimum index to check 
Description
Takes RCU read lock internally to protect the search, which does not protect the returned pointer after dropping RCU read lock. See also: Maple Tree
Return
The entry before index or NULL if nothing is found.
- 
void mas_pause(struct ma_state *mas)¶
- Pause a mas_find/mas_for_each to drop the lock. 
Parameters
- struct ma_state *mas
- The maple state to pause 
Description
Some users need to pause a walk and drop the lock they're holding in
order to yield to a higher priority thread or carry out an operation
on an entry.  Those users should call this function before they drop
the lock.  It resets the mas to be suitable for the next iteration
of the loop after the user has reacquired the lock.  If most entries
found during a walk require you to call mas_pause(), the mt_for_each()
iterator may be more appropriate.
- 
bool mas_find_setup(struct ma_state *mas, unsigned long max, void **entry)¶
- Internal function to set up mas_find*(). 
Parameters
- struct ma_state *mas
- The maple state 
- unsigned long max
- The maximum index 
- void **entry
- Pointer to the entry 
Return
True if entry is the answer, false otherwise.
- 
void *mas_find(struct ma_state *mas, unsigned long max)¶
- On the first call, find the entry at or after mas->index up to max. Otherwise, find the entry after mas->index.
Parameters
- struct ma_state *mas
- The maple state 
- unsigned long max
- The maximum value to check. 
Description
Must hold rcu_read_lock or the write lock. If an entry exists, last and index are updated accordingly. May set mas->node to MAS_NONE.
Return
The entry or NULL.
- 
void *mas_find_range(struct ma_state *mas, unsigned long max)¶
- On the first call, find the entry at or after mas->index up to max. Otherwise, advance to the next slot mas->index.
Parameters
- struct ma_state *mas
- The maple state 
- unsigned long max
- The maximum value to check. 
Description
Must hold rcu_read_lock or the write lock. If an entry exists, last and index are updated accordingly. May set mas->node to MAS_NONE.
Return
The entry or NULL.
- 
bool mas_find_rev_setup(struct ma_state *mas, unsigned long min, void **entry)¶
- Internal function to set up mas_find_*_rev() 
Parameters
- struct ma_state *mas
- The maple state 
- unsigned long min
- The minimum index 
- void **entry
- Pointer to the entry 
Return
True if entry is the answer, false otherwise.
- 
void *mas_find_rev(struct ma_state *mas, unsigned long min)¶
- On the first call, find the first non-null entry at or below mas->index down to min. Otherwise find the first non-null entry below mas->index down to min.
Parameters
- struct ma_state *mas
- The maple state 
- unsigned long min
- The minimum value to check. 
Description
Must hold rcu_read_lock or the write lock. If an entry exists, last and index are updated accordingly. May set mas->node to MAS_NONE.
Return
The entry or NULL.
- 
void *mas_find_range_rev(struct ma_state *mas, unsigned long min)¶
- On the first call, find the first non-null entry at or below mas->index down to min. Otherwise advance to the previous slot after mas->index down to min.
Parameters
- struct ma_state *mas
- The maple state 
- unsigned long min
- The minimum value to check. 
Description
Must hold rcu_read_lock or the write lock. If an entry exists, last and index are updated accordingly. May set mas->node to MAS_NONE.
Return
The entry or NULL.
- 
void *mas_erase(struct ma_state *mas)¶
- Find the range in which index resides and erase the entire range. 
Parameters
- struct ma_state *mas
- The maple state 
Description
Must hold the write lock. Searches for mas->index, sets mas->index and mas->last to the range and erases that range.
Return
the entry that was erased or NULL, mas->index and mas->last are updated.
- 
bool mas_nomem(struct ma_state *mas, gfp_t gfp)¶
- Check if there was an error allocating and do the allocation if necessary. If there are allocations, then free them. 
Parameters
- struct ma_state *mas
- The maple state 
- gfp_t gfp
- The GFP_FLAGS to use for allocations 
Return
true on allocation, false otherwise.
- 
void *mtree_load(struct maple_tree *mt, unsigned long index)¶
- Load a value stored in a maple tree 
Parameters
- struct maple_tree *mt
- The maple tree 
- unsigned long index
- The index to load 
Return
the entry or NULL
- 
int mtree_store_range(struct maple_tree *mt, unsigned long index, unsigned long last, void *entry, gfp_t gfp)¶
- Store an entry at a given range. 
Parameters
- struct maple_tree *mt
- The maple tree 
- unsigned long index
- The start of the range 
- unsigned long last
- The end of the range 
- void *entry
- The entry to store 
- gfp_t gfp
- The GFP_FLAGS to use for allocations 
Return
0 on success, -EINVAL on invalid request, -ENOMEM if memory could not be allocated.
- 
int mtree_store(struct maple_tree *mt, unsigned long index, void *entry, gfp_t gfp)¶
- Store an entry at a given index. 
Parameters
- struct maple_tree *mt
- The maple tree 
- unsigned long index
- The index to store the value 
- void *entry
- The entry to store 
- gfp_t gfp
- The GFP_FLAGS to use for allocations 
Return
0 on success, -EINVAL on invalid request, -ENOMEM if memory could not be allocated.
- 
int mtree_insert_range(struct maple_tree *mt, unsigned long first, unsigned long last, void *entry, gfp_t gfp)¶
- Insert an entry at a given range if there is no value. 
Parameters
- struct maple_tree *mt
- The maple tree 
- unsigned long first
- The start of the range 
- unsigned long last
- The end of the range 
- void *entry
- The entry to store 
- gfp_t gfp
- The GFP_FLAGS to use for allocations. 
Return
0 on success, -EEXIST if the range is occupied, -EINVAL on invalid request, -ENOMEM if memory could not be allocated.
- 
int mtree_insert(struct maple_tree *mt, unsigned long index, void *entry, gfp_t gfp)¶
- Insert an entry at a given index if there is no value. 
Parameters
- struct maple_tree *mt
- The maple tree 
- unsigned long index
- The index to store the value 
- void *entry
- The entry to store 
- gfp_t gfp
- The GFP_FLAGS to use for allocations. 
Return
0 on success, -EEXIST if the range is occupied, -EINVAL on invalid request, -ENOMEM if memory could not be allocated.
- 
void *mtree_erase(struct maple_tree *mt, unsigned long index)¶
- Find an index and erase the entire range. 
Parameters
- struct maple_tree *mt
- The maple tree 
- unsigned long index
- The index to erase 
Description
Erasing is the same as a walk to an entry then a store of a NULL to that ENTIRE range. In fact, it is implemented as such using the advanced API.
Return
The entry stored at the index or NULL
- 
void __mt_destroy(struct maple_tree *mt)¶
- Walk and free all nodes of a locked maple tree. 
Parameters
- struct maple_tree *mt
- The maple tree 
Note
Does not handle locking.
- 
void mtree_destroy(struct maple_tree *mt)¶
- Destroy a maple tree 
Parameters
- struct maple_tree *mt
- The maple tree 
Description
Frees all resources used by the tree. Handles locking.
- 
void *mt_find(struct maple_tree *mt, unsigned long *index, unsigned long max)¶
- Search from the start up until an entry is found. 
Parameters
- struct maple_tree *mt
- The maple tree 
- unsigned long *index
- Pointer which contains the start location of the search 
- unsigned long max
- The maximum value of the search range 
Description
Takes RCU read lock internally to protect the search, which does not protect the returned pointer after dropping RCU read lock. See also: Maple Tree
If an entry is found, index is updated to point to the next possible entry, regardless of whether the found entry occupies a single index or a range of indices.
Return
The entry at or after the index or NULL
- 
void *mt_find_after(struct maple_tree *mt, unsigned long *index, unsigned long max)¶
- Search from the start up until an entry is found. 
Parameters
- struct maple_tree *mt
- The maple tree 
- unsigned long *index
- Pointer which contains the start location of the search 
- unsigned long max
- The maximum value to check 
Description
Same as mt_find() except that it checks index for 0 before
searching. If index == 0, the search is aborted. This covers a wrap
around of index to 0 in an iterator loop.
Return
The entry at or after the index or NULL