=sphinx.addnodesdocument)}( rawsourcechildren]( translations LanguagesNode)}(hhh](h pending_xref)}(hhh]docutils.nodesTextChinese (Simplified)}parenthsba attributes}(ids]classes]names]dupnames]backrefs] refdomainstdreftypedoc reftarget%/translations/zh_CN/bpf/graph_ds_implmodnameN classnameN refexplicitutagnamehhh ubh)}(hhh]hChinese (Traditional)}hh2sbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftarget%/translations/zh_TW/bpf/graph_ds_implmodnameN classnameN refexplicituh1hhh ubh)}(hhh]hItalian}hhFsbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftarget%/translations/it_IT/bpf/graph_ds_implmodnameN classnameN refexplicituh1hhh ubh)}(hhh]hJapanese}hhZsbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftarget%/translations/ja_JP/bpf/graph_ds_implmodnameN classnameN refexplicituh1hhh ubh)}(hhh]hKorean}hhnsbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftarget%/translations/ko_KR/bpf/graph_ds_implmodnameN classnameN refexplicituh1hhh ubh)}(hhh]hSpanish}hhsbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftarget%/translations/sp_SP/bpf/graph_ds_implmodnameN classnameN refexplicituh1hhh ubeh}(h]h ]h"]h$]h&]current_languageEnglishuh1h hh _documenthsourceNlineNubhsection)}(hhh](htitle)}(hBPF Graph Data Structuresh]hBPF Graph Data Structures}(hhhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhhh?/var/lib/git/docbuild/linux/Documentation/bpf/graph_ds_impl.rsthKubh paragraph)}(hThis document describes implementation details of new-style "graph" data structures (linked_list, rbtree), with particular focus on the verifier's implementation of semantics specific to those data structures.h]hThis document describes implementation details of new-style “graph” data structures (linked_list, rbtree), with particular focus on the verifier’s implementation of semantics specific to those data structures.}(hhhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhhhhubh)}(hAlthough no specific verifier code is referred to in this document, the document assumes that the reader has general knowledge of BPF verifier internals, BPF maps, and BPF program writing.h]hAlthough no specific verifier code is referred to in this document, the document assumes that the reader has general knowledge of BPF verifier internals, BPF maps, and BPF program writing.}(hhhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK hhhhubh)}(hNote that the intent of this document is to describe the current state of these graph data structures. **No guarantees** of stability for either semantics or APIs are made or implied here.h](hgNote that the intent of this document is to describe the current state of these graph data structures. }(hhhhhNhNubhstrong)}(h**No guarantees**h]h No guarantees}(hhhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhubhD of stability for either semantics or APIs are made or implied here.}(hhhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhK hhhhubhtopic)}(hhh]h bullet_list)}(hhh](h list_item)}(hhh]h)}(hhh]h reference)}(hhh]h Introduction}(hj hhhNhNubah}(h]id1ah ]h"]h$]h&]refid introductionuh1j hjubah}(h]h ]h"]h$]h&]uh1hhjubah}(h]h ]h"]h$]h&]uh1jhhubj)}(hhh]h)}(hhh]j )}(hhh]h Unstable API}(hj-hhhNhNubah}(h]id2ah ]h"]h$]h&]refid unstable-apiuh1j hj*ubah}(h]h ]h"]h$]h&]uh1hhj'ubah}(h]h ]h"]h$]h&]uh1jhhubj)}(hhh]h)}(hhh]j )}(hhh]hLocking}(hjOhhhNhNubah}(h]id3ah ]h"]h$]h&]refidlockinguh1j hjLubah}(h]h ]h"]h$]h&]uh1hhjIubah}(h]h ]h"]h$]h&]uh1jhhubj)}(hhh]h)}(hhh]j )}(hhh]hNon-owning references}(hjqhhhNhNubah}(h]id4ah ]h"]h$]h&]refidnon-owning-referencesuh1j hjnubah}(h]h ]h"]h$]h&]uh1hhjkubah}(h]h ]h"]h$]h&]uh1jhhubeh}(h]h ]h"]h$]h&]uh1hhhhhhNhNubah}(h]contentsah ](contentslocaleh"]contentsah$]h&]uh1hhhhKhhhhubh)}(hhh](h)}(h Introductionh]h Introduction}(hjhhhNhNubah}(h]h ]h"]h$]h&]refidjuh1hhjhhhhhKubh)}(hXfThe BPF map API has historically been the main way to expose data structures of various types for use within BPF programs. Some data structures fit naturally with the map API (HASH, ARRAY), others less so. Consequently, programs interacting with the latter group of data structures can be hard to parse for kernel programmers without previous BPF experience.h]hXfThe BPF map API has historically been the main way to expose data structures of various types for use within BPF programs. Some data structures fit naturally with the map API (HASH, ARRAY), others less so. Consequently, programs interacting with the latter group of data structures can be hard to parse for kernel programmers without previous BPF experience.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hX5Luckily, some restrictions which necessitated the use of BPF map semantics are no longer relevant. With the introduction of kfuncs, kptrs, and the any-context BPF allocator, it is now possible to implement BPF data structures whose API and semantics more closely match those exposed to the rest of the kernel.h]hX5Luckily, some restrictions which necessitated the use of BPF map semantics are no longer relevant. With the introduction of kfuncs, kptrs, and the any-context BPF allocator, it is now possible to implement BPF data structures whose API and semantics more closely match those exposed to the rest of the kernel.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hXTwo such data structures - linked_list and rbtree - have many verification details in common. Because both have "root"s ("head" for linked_list) and "node"s, the verifier code and this document refer to common functionality as "graph_api", "graph_root", "graph_node", etc.h]hX(Two such data structures - linked_list and rbtree - have many verification details in common. Because both have “root”s (“head” for linked_list) and “node”s, the verifier code and this document refer to common functionality as “graph_api”, “graph_root”, “graph_node”, etc.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK#hjhhubh)}(hZUnless otherwise stated, examples and semantics below apply to both graph data structures.h]hZUnless otherwise stated, examples and semantics below apply to both graph data structures.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK(hjhhubeh}(h]jah ]h"] introductionah$]h&]uh1hhhhhhhhKubh)}(hhh](h)}(h Unstable APIh]h Unstable API}(hjhhhNhNubah}(h]h ]h"]h$]h&]jj6uh1hhjhhhhhK,ubh)}(hXData structures implemented using the BPF map API have historically used BPF helper functions - either standard map API helpers like ``bpf_map_update_elem`` or map-specific helpers. The new-style graph data structures instead use kfuncs to define their manipulation helpers. Because there are no stability guarantees for kfuncs, the API and semantics for these data structures can be evolved in a way that breaks backwards compatibility if necessary.h](hData structures implemented using the BPF map API have historically used BPF helper functions - either standard map API helpers like }(hjhhhNhNubhliteral)}(h``bpf_map_update_elem``h]hbpf_map_update_elem}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubhX& or map-specific helpers. The new-style graph data structures instead use kfuncs to define their manipulation helpers. Because there are no stability guarantees for kfuncs, the API and semantics for these data structures can be evolved in a way that breaks backwards compatibility if necessary.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhK.hjhhubh)}(hhRoot and node types for the new data structures are opaquely defined in the ``uapi/linux/bpf.h`` header.h](hLRoot and node types for the new data structures are opaquely defined in the }(hj!hhhNhNubj)}(h``uapi/linux/bpf.h``h]huapi/linux/bpf.h}(hj)hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj!ubh header.}(hj!hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhK5hjhhubeh}(h]j<ah ]h"] unstable apiah$]h&]uh1hhhhhhhhK,ubh)}(hhh](h)}(hLockingh]hLocking}(hjKhhhNhNubah}(h]h ]h"]h$]h&]jjXuh1hhjHhhhhhK9ubh)}(hkThe new-style data structures are intrusive and are defined similarly to their vanilla kernel counterparts:h]hkThe new-style data structures are intrusive and are defined similarly to their vanilla kernel counterparts:}(hjYhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK;hjHhhubh literal_block)}(hstruct node_data { long key; long data; struct bpf_rb_node node; }; struct bpf_spin_lock glock; struct bpf_rb_root groot __contains(node_data, node);h]hstruct node_data { long key; long data; struct bpf_rb_node node; }; struct bpf_spin_lock glock; struct bpf_rb_root groot __contains(node_data, node);}hjisbah}(h]h ]h"]h$]h&] xml:spacepreserveforcelanguagechighlight_args}uh1jghhhK>hjHhhubh)}(hXThe "root" type for both linked_list and rbtree expects to be in a map_value which also contains a ``bpf_spin_lock`` - in the above example both global variables are placed in a single-value arraymap. The verifier considers this spin_lock to be associated with the ``bpf_rb_root`` by virtue of both being in the same map_value and will enforce that the correct lock is held when verifying BPF programs that manipulate the tree. Since this lock checking happens at verification time, there is no runtime penalty.h](hgThe “root” type for both linked_list and rbtree expects to be in a map_value which also contains a }(hj~hhhNhNubj)}(h``bpf_spin_lock``h]h bpf_spin_lock}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhj~ubh - in the above example both global variables are placed in a single-value arraymap. The verifier considers this spin_lock to be associated with the }(hj~hhhNhNubj)}(h``bpf_rb_root``h]h bpf_rb_root}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhj~ubh by virtue of both being in the same map_value and will enforce that the correct lock is held when verifying BPF programs that manipulate the tree. Since this lock checking happens at verification time, there is no runtime penalty.}(hj~hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKIhjHhhubeh}(h]j^ah ]h"]lockingah$]h&]uh1hhhhhhhhK9ubh)}(hhh](h)}(hNon-owning referencesh]hNon-owning references}(hjhhhNhNubah}(h]h ]h"]h$]h&]jjzuh1hhjhhhhhKRubh)}(h**Motivation**h]h)}(hjh]h Motivation}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjubah}(h]h ]h"]h$]h&]uh1hhhhKThjhhubh)}(h Consider the following BPF code:h]h Consider the following BPF code:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKVhjhhubjh)}(hstruct node_data *n = bpf_obj_new(typeof(*n)); /* ACQUIRED */ bpf_spin_lock(&lock); bpf_rbtree_add(&tree, n); /* PASSED */ bpf_spin_unlock(&lock);h]hstruct node_data *n = bpf_obj_new(typeof(*n)); /* ACQUIRED */ bpf_spin_lock(&lock); bpf_rbtree_add(&tree, n); /* PASSED */ bpf_spin_unlock(&lock);}hjsbah}(h]h ]h"]h$]h&]jwjxjyjzj{j|}uh1jghhhKXhjhhubh)}(hXFrom the verifier's perspective, the pointer ``n`` returned from ``bpf_obj_new`` has type ``PTR_TO_BTF_ID | MEM_ALLOC``, with a ``btf_id`` of ``struct node_data`` and a nonzero ``ref_obj_id``. Because it holds ``n``, the program has ownership of the pointee's (object pointed to by ``n``) lifetime. The BPF program must pass off ownership before exiting - either via ``bpf_obj_drop``, which ``free``'s the object, or by adding it to ``tree`` with ``bpf_rbtree_add``.h](h/From the verifier’s perspective, the pointer }(hjhhhNhNubj)}(h``n``h]hn}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh returned from }(hjhhhNhNubj)}(h``bpf_obj_new``h]h bpf_obj_new}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh has type }(hjhhhNhNubj)}(h``PTR_TO_BTF_ID | MEM_ALLOC``h]hPTR_TO_BTF_ID | MEM_ALLOC}(hj(hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh , with a }(hjhhhNhNubj)}(h ``btf_id``h]hbtf_id}(hj:hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh of }(hjhhhNhNubj)}(h``struct node_data``h]hstruct node_data}(hjLhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh and a nonzero }(hjhhhNhNubj)}(h``ref_obj_id``h]h ref_obj_id}(hj^hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh. Because it holds }(hjhhhNhNubj)}(h``n``h]hn}(hjphhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubhE, the program has ownership of the pointee’s (object pointed to by }(hjhhhNhNubj)}(h``n``h]hn}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubhP) lifetime. The BPF program must pass off ownership before exiting - either via }(hjhhhNhNubj)}(h``bpf_obj_drop``h]h bpf_obj_drop}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh, which }(hjhhhNhNubj)}(h``free``h]hfree}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh$’s the object, or by adding it to }(hjhhhNhNubj)}(h``tree``h]htree}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh with }(hjhhhNhNubj)}(h``bpf_rbtree_add``h]hbpf_rbtree_add}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKbhjhhubh)}(h(``ACQUIRED`` and ``PASSED`` comments in the example denote statements where "ownership is acquired" and "ownership is passed", respectively)h](h(}(hjhhhNhNubj)}(h ``ACQUIRED``h]hACQUIRED}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh and }(hjhhhNhNubj)}(h ``PASSED``h]hPASSED}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubhy comments in the example denote statements where “ownership is acquired” and “ownership is passed”, respectively)}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKjhjhhubh)}(hX_What should the verifier do with ``n`` after ownership is passed off? If the object was ``free``'d with ``bpf_obj_drop`` the answer is obvious: the verifier should reject programs which attempt to access ``n`` after ``bpf_obj_drop`` as the object is no longer valid. The underlying memory may have been reused for some other allocation, unmapped, etc.h](h!What should the verifier do with }(hjhhhNhNubj)}(h``n``h]hn}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh2 after ownership is passed off? If the object was }(hjhhhNhNubj)}(h``free``h]hfree}(hj.hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh ’d with }(hjhhhNhNubj)}(h``bpf_obj_drop``h]h bpf_obj_drop}(hj@hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubhT the answer is obvious: the verifier should reject programs which attempt to access }(hjhhhNhNubj)}(h``n``h]hn}(hjRhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh after }(hjhhhNhNubj)}(h``bpf_obj_drop``h]h bpf_obj_drop}(hjdhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubhw as the object is no longer valid. The underlying memory may have been reused for some other allocation, unmapped, etc.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKmhjhhubh)}(hWhen ownership is passed to ``tree`` via ``bpf_rbtree_add`` the answer is less obvious. The verifier could enforce the same semantics as for ``bpf_obj_drop``, but that would result in programs with useful, common coding patterns being rejected, e.g.:h](hWhen ownership is passed to }(hj|hhhNhNubj)}(h``tree``h]htree}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhj|ubh via }(hj|hhhNhNubj)}(h``bpf_rbtree_add``h]hbpf_rbtree_add}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhj|ubhR the answer is less obvious. The verifier could enforce the same semantics as for }(hj|hhhNhNubj)}(h``bpf_obj_drop``h]h bpf_obj_drop}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhj|ubh], but that would result in programs with useful, common coding patterns being rejected, e.g.:}(hj|hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKshjhhubjh)}(hint x; struct node_data *n = bpf_obj_new(typeof(*n)); /* ACQUIRED */ bpf_spin_lock(&lock); bpf_rbtree_add(&tree, n); /* PASSED */ x = n->data; n->data = 42; bpf_spin_unlock(&lock);h]hint x; struct node_data *n = bpf_obj_new(typeof(*n)); /* ACQUIRED */ bpf_spin_lock(&lock); bpf_rbtree_add(&tree, n); /* PASSED */ x = n->data; n->data = 42; bpf_spin_unlock(&lock);}hjsbah}(h]h ]h"]h$]h&]jwjxjyjzj{j|}uh1jghhhKxhjhhubh)}(hBoth the read from and write to ``n->data`` would be rejected. The verifier can do better, though, by taking advantage of two details:h](h Both the read from and write to }(hjhhhNhNubj)}(h ``n->data``h]hn->data}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh[ would be rejected. The verifier can do better, though, by taking advantage of two details:}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh block_quote)}(hX* Graph data structure APIs can only be used when the ``bpf_spin_lock`` associated with the graph root is held * Both graph data structures have pointer stability * Because graph nodes are allocated with ``bpf_obj_new`` and adding / removing from the root involves fiddling with the ``bpf_{list,rb}_node`` field of the node struct, a graph node will remain at the same address after either operation. h]h)}(hhh](j)}(hmGraph data structure APIs can only be used when the ``bpf_spin_lock`` associated with the graph root is held h]h)}(hlGraph data structure APIs can only be used when the ``bpf_spin_lock`` associated with the graph root is heldh](h4Graph data structure APIs can only be used when the }(hjhhhNhNubj)}(h``bpf_spin_lock``h]h bpf_spin_lock}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh' associated with the graph root is held}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hX+Both graph data structures have pointer stability * Because graph nodes are allocated with ``bpf_obj_new`` and adding / removing from the root involves fiddling with the ``bpf_{list,rb}_node`` field of the node struct, a graph node will remain at the same address after either operation. h](h)}(h1Both graph data structures have pointer stabilityh]h1Both graph data structures have pointer stability}(hj&hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhj"ubj)}(h* Because graph nodes are allocated with ``bpf_obj_new`` and adding / removing from the root involves fiddling with the ``bpf_{list,rb}_node`` field of the node struct, a graph node will remain at the same address after either operation. h]h)}(hhh]j)}(hBecause graph nodes are allocated with ``bpf_obj_new`` and adding / removing from the root involves fiddling with the ``bpf_{list,rb}_node`` field of the node struct, a graph node will remain at the same address after either operation. h]h)}(hBecause graph nodes are allocated with ``bpf_obj_new`` and adding / removing from the root involves fiddling with the ``bpf_{list,rb}_node`` field of the node struct, a graph node will remain at the same address after either operation.h](h'Because graph nodes are allocated with }(hj?hhhNhNubj)}(h``bpf_obj_new``h]h bpf_obj_new}(hjGhhhNhNubah}(h]h ]h"]h$]h&]uh1jhj?ubh@ and adding / removing from the root involves fiddling with the }(hj?hhhNhNubj)}(h``bpf_{list,rb}_node``h]hbpf_{list,rb}_node}(hjYhhhNhNubah}(h]h ]h"]h$]h&]uh1jhj?ubh_ field of the node struct, a graph node will remain at the same address after either operation.}(hj?hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhj;ubah}(h]h ]h"]h$]h&]uh1jhj8ubah}(h]h ]h"]h$]h&]bullet*uh1hhhhKhj4ubah}(h]h ]h"]h$]h&]uh1jhhhKhj"ubeh}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]j}j~uh1hhhhKhjubah}(h]h ]h"]h$]h&]uh1jhhhKhjhhubh)}(hXBecause the associated ``bpf_spin_lock`` must be held by any program adding or removing, if we're in the critical section bounded by that lock, we know that no other program can add or remove until the end of the critical section. This combined with pointer stability means that, until the critical section ends, we can safely access the graph node through ``n`` even after it was used to pass ownership.h](hBecause the associated }(hjhhhNhNubj)}(h``bpf_spin_lock``h]h bpf_spin_lock}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubhX? must be held by any program adding or removing, if we’re in the critical section bounded by that lock, we know that no other program can add or remove until the end of the critical section. This combined with pointer stability means that, until the critical section ends, we can safely access the graph node through }(hjhhhNhNubj)}(h``n``h]hn}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh* even after it was used to pass ownership.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hThe verifier considers such a reference a *non-owning reference*. The ref returned by ``bpf_obj_new`` is accordingly considered an *owning reference*. Both terms currently only have meaning in the context of graph nodes and API.h](h*The verifier considers such a reference a }(hjhhhNhNubhemphasis)}(h*non-owning reference*h]hnon-owning reference}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh. The ref returned by }(hjhhhNhNubj)}(h``bpf_obj_new``h]h bpf_obj_new}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh is accordingly considered an }(hjhhhNhNubj)}(h*owning reference*h]howning reference}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubhO. Both terms currently only have meaning in the context of graph nodes and API.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(h **Details**h]h)}(hjh]hDetails}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(h;Let's enumerate the properties of both types of references.h]h=Let’s enumerate the properties of both types of references.}(hj&hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(h*owning reference*h]j)}(hj6h]howning reference}(hj8hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj4ubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubj)}(hXC* This reference controls the lifetime of the pointee * Ownership of pointee must be 'released' by passing it to some graph API kfunc, or via ``bpf_obj_drop``, which ``free``'s the pointee * If not released before program ends, verifier considers program invalid * Access to the pointee's memory will not page fault h]h)}(hhh](j)}(h4This reference controls the lifetime of the pointee h]h)}(h3This reference controls the lifetime of the pointeeh]h3This reference controls the lifetime of the pointee}(hjVhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjRubah}(h]h ]h"]h$]h&]uh1jhjOubj)}(hOwnership of pointee must be 'released' by passing it to some graph API kfunc, or via ``bpf_obj_drop``, which ``free``'s the pointee * If not released before program ends, verifier considers program invalid h](h)}(hOwnership of pointee must be 'released' by passing it to some graph API kfunc, or via ``bpf_obj_drop``, which ``free``'s the pointeeh](hZOwnership of pointee must be ‘released’ by passing it to some graph API kfunc, or via }(hjnhhhNhNubj)}(h``bpf_obj_drop``h]h bpf_obj_drop}(hjvhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjnubh, which }(hjnhhhNhNubj)}(h``free``h]hfree}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjnubh’s the pointee}(hjnhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhjjubh)}(hhh]j)}(hHIf not released before program ends, verifier considers program invalid h]h)}(hGIf not released before program ends, verifier considers program invalidh]hGIf not released before program ends, verifier considers program invalid}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubah}(h]h ]h"]h$]h&]j}j~uh1hhhhKhjjubeh}(h]h ]h"]h$]h&]uh1jhjOubj)}(h3Access to the pointee's memory will not page fault h]h)}(h2Access to the pointee's memory will not page faulth]h4Access to the pointee’s memory will not page fault}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjOubeh}(h]h ]h"]h$]h&]j}j~uh1hhhhKhjKubah}(h]h ]h"]h$]h&]uh1jhhhKhjhhubh)}(h*non-owning reference*h]j)}(hjh]hnon-owning reference}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubj)}(hXE* This reference does not own the pointee * It cannot be used to add the graph node to a graph root, nor ``free``'d via ``bpf_obj_drop`` * No explicit control of lifetime, but can infer valid lifetime based on non-owning ref existence (see explanation below) * Access to the pointee's memory will not page fault h]h)}(hhh](j)}(hThis reference does not own the pointee * It cannot be used to add the graph node to a graph root, nor ``free``'d via ``bpf_obj_drop`` h](h)}(h'This reference does not own the pointeeh]h'This reference does not own the pointee}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhj ubj)}(ha* It cannot be used to add the graph node to a graph root, nor ``free``'d via ``bpf_obj_drop`` h]h)}(hhh]j)}(h]It cannot be used to add the graph node to a graph root, nor ``free``'d via ``bpf_obj_drop`` h]h)}(h\It cannot be used to add the graph node to a graph root, nor ``free``'d via ``bpf_obj_drop``h](h=It cannot be used to add the graph node to a graph root, nor }(hj&hhhNhNubj)}(h``free``h]hfree}(hj.hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj&ubh ’d via }(hj&hhhNhNubj)}(h``bpf_obj_drop``h]h bpf_obj_drop}(hj@hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj&ubeh}(h]h ]h"]h$]h&]uh1hhhhKhj"ubah}(h]h ]h"]h$]h&]uh1jhjubah}(h]h ]h"]h$]h&]j}j~uh1hhhhKhjubah}(h]h ]h"]h$]h&]uh1jhhhKhj ubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hxNo explicit control of lifetime, but can infer valid lifetime based on non-owning ref existence (see explanation below) h]h)}(hwNo explicit control of lifetime, but can infer valid lifetime based on non-owning ref existence (see explanation below)h]hwNo explicit control of lifetime, but can infer valid lifetime based on non-owning ref existence (see explanation below)}(hjphhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjlubah}(h]h ]h"]h$]h&]uh1jhjubj)}(h3Access to the pointee's memory will not page fault h]h)}(h2Access to the pointee's memory will not page faulth]h4Access to the pointee’s memory will not page fault}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]j}j~uh1hhhhKhjubah}(h]h ]h"]h$]h&]uh1jhhhKhjhhubh)}(hXFrom verifier's perspective non-owning references can only exist between spin_lock and spin_unlock. Why? After spin_unlock another program can do arbitrary operations on the data structure like removing and ``free``-ing via bpf_obj_drop. A non-owning ref to some chunk of memory that was remove'd, ``free``'d, and reused via bpf_obj_new would point to an entirely different thing. Or the memory could go away.h](hFrom verifier’s perspective non-owning references can only exist between spin_lock and spin_unlock. Why? After spin_unlock another program can do arbitrary operations on the data structure like removing and }(hjhhhNhNubj)}(h``free``h]hfree}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubhU-ing via bpf_obj_drop. A non-owning ref to some chunk of memory that was remove’d, }(hjhhhNhNubj)}(h``free``h]hfree}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubhi’d, and reused via bpf_obj_new would point to an entirely different thing. Or the memory could go away.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hX+To prevent this logic violation all non-owning references are invalidated by the verifier after a critical section ends. This is necessary to ensure the "will not page fault" property of non-owning references. So if the verifier hasn't invalidated a non-owning ref, accessing it will not page fault.h]hX1To prevent this logic violation all non-owning references are invalidated by the verifier after a critical section ends. This is necessary to ensure the “will not page fault” property of non-owning references. So if the verifier hasn’t invalidated a non-owning ref, accessing it will not page fault.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hCurrently ``bpf_obj_drop`` is not allowed in the critical section, so if there's a valid non-owning ref, we must be in a critical section, and can conclude that the ref's memory hasn't been dropped-and- ``free``'d or dropped-and-reused.h](h Currently }(hjhhhNhNubj)}(h``bpf_obj_drop``h]h bpf_obj_drop}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh is not allowed in the critical section, so if there’s a valid non-owning ref, we must be in a critical section, and can conclude that the ref’s memory hasn’t been dropped-and- }(hjhhhNhNubj)}(h``free``h]hfree}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh’d or dropped-and-reused.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hXAny reference to a node that is in an rbtree _must_ be non-owning, since the tree has control of the pointee's lifetime. Similarly, any ref to a node that isn't in rbtree _must_ be owning. This results in a nice property: graph API add / remove implementations don't need to check if a node has already been added (or already removed), as the ownership model allows the verifier to prevent such a state from being valid by simply checking types.h]hXAny reference to a node that is in an rbtree _must_ be non-owning, since the tree has control of the pointee’s lifetime. Similarly, any ref to a node that isn’t in rbtree _must_ be owning. This results in a nice property: graph API add / remove implementations don’t need to check if a node has already been added (or already removed), as the ownership model allows the verifier to prevent such a state from being valid by simply checking types.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hgHowever, pointer aliasing poses an issue for the above "nice property". Consider the following example:h]hkHowever, pointer aliasing poses an issue for the above “nice property”. Consider the following example:}(hj(hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubjh)}(hXJstruct node_data *n, *m, *o, *p; n = bpf_obj_new(typeof(*n)); /* 1 */ bpf_spin_lock(&lock); bpf_rbtree_add(&tree, n); /* 2 */ m = bpf_rbtree_first(&tree); /* 3 */ o = bpf_rbtree_remove(&tree, n); /* 4 */ p = bpf_rbtree_remove(&tree, m); /* 5 */ bpf_spin_unlock(&lock); bpf_obj_drop(o); bpf_obj_drop(p); /* 6 */h]hXJstruct node_data *n, *m, *o, *p; n = bpf_obj_new(typeof(*n)); /* 1 */ bpf_spin_lock(&lock); bpf_rbtree_add(&tree, n); /* 2 */ m = bpf_rbtree_first(&tree); /* 3 */ o = bpf_rbtree_remove(&tree, n); /* 4 */ p = bpf_rbtree_remove(&tree, m); /* 5 */ bpf_spin_unlock(&lock); bpf_obj_drop(o); bpf_obj_drop(p); /* 6 */}hj6sbah}(h]h ]h"]h$]h&]jwjxjyjzj{j|}uh1jghhhKhjhhubh)}(h{Assume the tree is empty before this program runs. If we track verifier state changes here using numbers in above comments:h]h{Assume the tree is empty before this program runs. If we track verifier state changes here using numbers in above comments:}(hjEhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubj)}(hX1) n is an owning reference 2) n is a non-owning reference, it's been added to the tree 3) n and m are non-owning references, they both point to the same node 4) o is an owning reference, n and m non-owning, all point to same node 5) o and p are owning, n and m non-owning, all point to the same node 6) a double-free has occurred, since o and p point to same node and o was ``free``'d in previous statement h]henumerated_list)}(hhh](j)}(hn is an owning reference h]h)}(hn is an owning referenceh]hn is an owning reference}(hj`hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhj\ubah}(h]h ]h"]h$]h&]uh1jhjYubj)}(h9n is a non-owning reference, it's been added to the tree h]h)}(h8n is a non-owning reference, it's been added to the treeh]h:n is a non-owning reference, it’s been added to the tree}(hjxhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjtubah}(h]h ]h"]h$]h&]uh1jhjYubj)}(hDn and m are non-owning references, they both point to the same node h]h)}(hCn and m are non-owning references, they both point to the same nodeh]hCn and m are non-owning references, they both point to the same node}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjYubj)}(hEo is an owning reference, n and m non-owning, all point to same node h]h)}(hDo is an owning reference, n and m non-owning, all point to same nodeh]hDo is an owning reference, n and m non-owning, all point to same node}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjYubj)}(hCo and p are owning, n and m non-owning, all point to the same node h]h)}(hBo and p are owning, n and m non-owning, all point to the same nodeh]hBo and p are owning, n and m non-owning, all point to the same node}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjYubj)}(hha double-free has occurred, since o and p point to same node and o was ``free``'d in previous statement h]h)}(hga double-free has occurred, since o and p point to same node and o was ``free``'d in previous statementh](hGa double-free has occurred, since o and p point to same node and o was }(hjhhhNhNubj)}(h``free``h]hfree}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh’d in previous statement}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjYubeh}(h]h ]h"]h$]h&]enumtypearabicprefixhsuffix)uh1jWhjSubah}(h]h ]h"]h$]h&]uh1jhhhKhjhhubh)}(hStates 4 and 5 violate our "nice property", as there are non-owning refs to a node which is not in an rbtree. Statement 5 will try to remove a node which has already been removed as a result of this violation. State 6 is a dangerous double-free.h]hStates 4 and 5 violate our “nice property”, as there are non-owning refs to a node which is not in an rbtree. Statement 5 will try to remove a node which has already been removed as a result of this violation. State 6 is a dangerous double-free.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hAt a minimum we should prevent state 6 from being possible. If we can't also prevent state 5 then we must abandon our "nice property" and check whether a node has already been removed at runtime.h]hAt a minimum we should prevent state 6 from being possible. If we can’t also prevent state 5 then we must abandon our “nice property” and check whether a node has already been removed at runtime.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hWe prevent both by generalizing the "invalidate non-owning references" behavior of ``bpf_spin_unlock`` and doing similar invalidation after ``bpf_rbtree_remove``. The logic here being that any graph API kfunc which:h](hWWe prevent both by generalizing the “invalidate non-owning references” behavior of }(hj+ hhhNhNubj)}(h``bpf_spin_unlock``h]hbpf_spin_unlock}(hj3 hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj+ ubh& and doing similar invalidation after }(hj+ hhhNhNubj)}(h``bpf_rbtree_remove``h]hbpf_rbtree_remove}(hjE hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj+ ubh6. The logic here being that any graph API kfunc which:}(hj+ hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhjhhubj)}(h|* takes an arbitrary node argument * removes it from the data structure * returns an owning reference to the removed node h]h)}(hhh](j)}(h!takes an arbitrary node argument h]h)}(h takes an arbitrary node argumenth]h takes an arbitrary node argument}(hjh hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjd ubah}(h]h ]h"]h$]h&]uh1jhja ubj)}(h#removes it from the data structure h]h)}(h"removes it from the data structureh]h"removes it from the data structure}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj| ubah}(h]h ]h"]h$]h&]uh1jhja ubj)}(h0returns an owning reference to the removed node h]h)}(h/returns an owning reference to the removed nodeh]h/returns an owning reference to the removed node}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj ubah}(h]h ]h"]h$]h&]uh1jhja ubeh}(h]h ]h"]h$]h&]j}j~uh1hhhhMhj] ubah}(h]h ]h"]h$]h&]uh1jhhhMhjhhubh)}(hMay result in a state where some other non-owning reference points to the same node. So ``remove``-type kfuncs must be considered a non-owning reference invalidation point as well.h](hXMay result in a state where some other non-owning reference points to the same node. So }(hj hhhNhNubj)}(h ``remove``h]hremove}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubhR-type kfuncs must be considered a non-owning reference invalidation point as well.}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM hjhhubeh}(h]jah ]h"]non-owning referencesah$]h&]uh1hhhhhhhhKRubeh}(h]bpf-graph-data-structuresah ]h"]bpf graph data structuresah$]h&]uh1hhhhhhhhKubeh}(h]h ]h"]h$]h&]sourcehuh1hcurrent_sourceN current_lineNsettingsdocutils.frontendValues)}(hN generatorN datestampN source_linkN source_urlN toc_backlinksentryfootnote_backlinksK sectnum_xformKstrip_commentsNstrip_elements_with_classesN strip_classesN report_levelK halt_levelKexit_status_levelKdebugNwarning_streamN tracebackinput_encoding utf-8-siginput_encoding_error_handlerstrictoutput_encodingutf-8output_encoding_error_handlerj error_encodingutf-8error_encoding_error_handlerbackslashreplace language_codeenrecord_dependenciesNconfigN id_prefixhauto_id_prefixid dump_settingsNdump_internalsNdump_transformsNdump_pseudo_xmlNexpose_internalsNstrict_visitorN_disable_configN_sourceh _destinationN _config_files]7/var/lib/git/docbuild/linux/Documentation/docutils.confafile_insertion_enabled raw_enabledKline_length_limitM'pep_referencesN pep_base_urlhttps://peps.python.org/pep_file_url_templatepep-%04drfc_referencesN rfc_base_url&https://datatracker.ietf.org/doc/html/ tab_widthKtrim_footnote_reference_spacesyntax_highlightlong smart_quotessmartquotes_locales]character_level_inline_markupdoctitle_xform docinfo_xformKsectsubtitle_xform image_loadinglinkembed_stylesheetcloak_email_addressessection_self_linkenvNubreporterNindirect_targets]substitution_defs}substitution_names}refnames}refids}nameids}(j j jjjjjEj<jj^j ju nametypes}(j jjjEjj uh}(j hjhjjj<jj^jHjjjj j6j-jXjOjzjqu footnote_refs} citation_refs} autofootnotes]autofootnote_refs]symbol_footnotes]symbol_footnote_refs] footnotes] citations]autofootnote_startKsymbol_footnote_startK id_counter collectionsCounter}j KsRparse_messages]transform_messages] transformerN include_log] decorationNhhub.