.. SPDX-License-Identifier: GPL-2.0

======
AF_XDP
======

Overview
========

AF_XDP is an address family that is optimized for high performance
packet processing.

This document assumes that the reader is familiar with BPF and XDP. If
not, the Cilium project has an excellent reference guide at
http://cilium.readthedocs.io/en/latest/bpf/.

Using the XDP_REDIRECT action from an XDP program, the program can
redirect ingress frames to other XDP enabled netdevs, using the
bpf_redirect_map() function. AF_XDP sockets enable the possibility for
XDP programs to redirect frames to a memory buffer in a user-space
application.

An AF_XDP socket (XSK) is created with the normal socket()
syscall. Associated with each XSK are two rings: the RX ring and the
TX ring. A socket can receive packets on the RX ring and it can send
packets on the TX ring. These rings are registered and sized with the
setsockopts XDP_RX_RING and XDP_TX_RING, respectively. It is mandatory
to have at least one of these rings for each socket. An RX or TX
descriptor ring points to a data buffer in a memory area called a
UMEM. RX and TX can share the same UMEM so that a packet does not have
to be copied between RX and TX. Moreover, if a packet needs to be kept
for a while due to a possible retransmit, the descriptor that points
to that packet can be changed to point to another and reused right
away. This again avoids copying data.

The UMEM consists of a number of equally sized chunks. A descriptor in
one of the rings references a frame by referencing its addr. The addr
is simply an offset within the entire UMEM region. The user space
allocates memory for this UMEM using whatever means it feels is most
appropriate (malloc, mmap, huge pages, etc). This memory area is then
registered with the kernel using the new setsockopt XDP_UMEM_REG. The
UMEM also has two rings: the FILL ring and the COMPLETION ring. The
FILL ring is used by the application to send down addrs for the kernel
to fill in with RX packet data. References to these frames will then
appear in the RX ring once each packet has been received. The
COMPLETION ring, on the other hand, contains frame addrs that the
kernel has transmitted completely and can now be used again by user
space, for either TX or RX. Thus, the frame addrs appearing in the
COMPLETION ring are addrs that were previously transmitted using the
TX ring. In summary, the RX and FILL rings are used for the RX path
and the TX and COMPLETION rings are used for the TX path.

The socket is then finally bound with a bind() call to a device and a
specific queue id on that device, and it is not until bind is
completed that traffic starts to flow.

The UMEM can be shared between processes, if desired. If a process
wants to do this, it simply skips the registration of the UMEM and its
corresponding two rings, sets the XDP_SHARED_UMEM flag in the bind
call and submits the XSK of the process it would like to share UMEM
with as well as its own newly created XSK socket. The new process will
then receive frame addr references in its own RX ring that point to
this shared UMEM. Note that since the ring structures are
single-consumer / single-producer (for performance reasons), the new
process has to create its own socket with associated RX and TX rings,
since it cannot share this with the other process. This is also the
reason that there is only one set of FILL and COMPLETION rings per
UMEM. It is the responsibility of a single process to handle the UMEM.

How are packets then distributed from an XDP program to the XSKs? There
is a BPF map called XSKMAP (or BPF_MAP_TYPE_XSKMAP in full). The
user-space application can place an XSK at an arbitrary place in this
map. The XDP program can then redirect a packet to a specific index in
this map and at this point XDP validates that the XSK in that map was
indeed bound to that device and ring number. If not, the packet is
dropped. If the map is empty at that index, the packet is also
dropped. This also means that it is currently mandatory to have an XDP
program loaded (and one XSK in the XSKMAP) to be able to get any
traffic to user space through the XSK.

AF_XDP can operate in two different modes: XDP_SKB and XDP_DRV. If the
driver does not have support for XDP, or XDP_SKB is explicitly chosen
when loading the XDP program, XDP_SKB mode is employed that uses SKBs
together with the generic XDP support and copies out the data to user
space. This is a fallback mode that works for any network device. On
the other hand, if the driver has support for XDP, it will be used by
the AF_XDP code to provide better performance, but there is still a
copy of the data into user space.

Concepts
========

In order to use an AF_XDP socket, a number of associated objects need
to be set up. These objects and their options are explained in the
following sections.

For an overview on how AF_XDP works, you can also take a look at the
Linux Plumbers paper from 2018 on the subject:
http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf. Do
NOT consult the paper from 2017 on "AF_PACKET v4", the first attempt
at AF_XDP. Nearly everything has changed since then. Jonathan Corbet
has also written an excellent article on LWN, "Accelerating networking
with AF_XDP". It can be found at https://lwn.net/Articles/750845/.

UMEM
----

UMEM is a region of virtual contiguous memory, divided into
equal-sized frames. A UMEM is associated with a netdev and a specific
queue id of that netdev. It is created and configured (chunk size,
headroom, start address and size) by using the XDP_UMEM_REG setsockopt
system call. A UMEM is bound to a netdev and queue id via the bind()
system call.

An AF_XDP socket is linked to a single UMEM, but one UMEM can have
multiple AF_XDP sockets. To share a UMEM created via one socket A,
the next socket B can do this by setting the XDP_SHARED_UMEM flag in
struct sockaddr_xdp member sxdp_flags, and passing the file descriptor
of A to struct sockaddr_xdp member sxdp_shared_umem_fd.

The UMEM has two single-producer/single-consumer rings that are used
to transfer ownership of UMEM frames between the kernel and the
user-space application.

Rings
-----

There are four different kinds of rings: FILL, COMPLETION, RX and
TX. All rings are single-producer/single-consumer, so the user-space
application needs explicit synchronization if multiple
processes/threads are reading/writing to them.

The UMEM uses two rings: FILL and COMPLETION. Each socket associated
with the UMEM must have an RX queue, TX queue or both. Say that there
is a setup with four sockets (all doing TX and RX). Then there will be
one FILL ring, one COMPLETION ring, four TX rings and four RX rings.

The rings are head(producer)/tail(consumer) based rings. A producer
writes the data ring at the index pointed out by struct xdp_ring
producer member, and then increases the producer index. A consumer
reads the data ring at the index pointed out by struct xdp_ring
consumer member, and then increases the consumer index.

The rings are configured and created via the _RING setsockopt system
calls and mmapped to user-space using the appropriate offset to mmap()
(XDP_PGOFF_RX_RING, XDP_PGOFF_TX_RING, XDP_UMEM_PGOFF_FILL_RING and
XDP_UMEM_PGOFF_COMPLETION_RING).

The size of each ring must be a power of two.

UMEM Fill Ring
~~~~~~~~~~~~~~

The FILL ring is used to transfer ownership of UMEM frames from
user-space to kernel-space. The UMEM addrs are passed in the ring. As
an example, if the UMEM is 64k and each chunk is 4k, then the UMEM has
16 chunks and can pass addrs between 0 and 64k.

Frames passed to the kernel are used for the ingress path (RX rings).

The user application produces UMEM addrs to this ring. Note that, if
running the application with aligned chunk mode, the kernel will mask
the incoming addr. E.g. for a chunk size of 2k, the log2(2048) = 11
least significant bits of the addr will be masked off, meaning that
2048, 2050 and 3000 refer to the same chunk. If the user application
is run in the unaligned chunks mode, then the incoming addr will be
left untouched.

UMEM Completion Ring
~~~~~~~~~~~~~~~~~~~~

The COMPLETION ring is used to transfer ownership of UMEM frames from
kernel-space to user-space. Just like the FILL ring, UMEM addrs are
used.

Frames passed from the kernel to user-space are frames that have been
sent (TX ring) and can be used by user-space again.

The user application consumes UMEM addrs from this ring.

RX Ring
~~~~~~~

The RX ring is the receiving side of a socket. Each entry in the ring
is a struct xdp_desc descriptor. The descriptor contains UMEM offset
(addr) and the length of the data (len).

If no frames have been passed to kernel via the FILL ring, no
descriptors will (or can) appear on the RX ring.

The user application consumes struct xdp_desc descriptors from this
ring.

TX Ring
~~~~~~~

The TX ring is used to send frames. The struct xdp_desc descriptor is
filled (index, length and offset) and passed into the ring.

To start the transfer a sendmsg() system call is required. This might
be relaxed in the future.

The user application produces struct xdp_desc descriptors to this
ring.

Libbpf
======

Libbpf is a helper library for eBPF and XDP that makes using these
technologies a lot simpler. It also contains specific helper functions
in tools/lib/bpf/xsk.h for facilitating the use of AF_XDP. It
contains two types of functions: those that can be used to make the
setup of an AF_XDP socket easier and ones that can be used in the data
plane to access the rings safely and quickly. To see an example of how
to use this API, please take a look at the sample application in
samples/bpf/xdpsock_usr.c which uses libbpf for both setup and data
plane operations.

We recommend that you use this library unless you have become a power
user. It will make your program a lot simpler.

XSKMAP / BPF_MAP_TYPE_XSKMAP
============================

On the XDP side there is a BPF map type BPF_MAP_TYPE_XSKMAP (XSKMAP)
that is used in conjunction with bpf_redirect_map() to pass the
ingress frame to a socket.

The user application inserts the socket into the map via the bpf()
system call.

Note that if an XDP program tries to redirect to a socket that does
not match the queue configuration and netdev, the frame will be
dropped. E.g. an AF_XDP socket is bound to netdev eth0 and
queue 17. Only the XDP program executing for eth0 and queue 17 will
successfully pass data to the socket. Please refer to the sample
application (samples/bpf/) for an example.

Configuration Flags and Socket Options
======================================

These are the various configuration flags that can be used to control
and monitor the behavior of AF_XDP sockets.

XDP_COPY and XDP_ZEROCOPY bind flags
------------------------------------

When you bind to a socket, the kernel will first try to use zero-copy
mode. If zero-copy is not supported, it will fall back on using copy
mode, i.e. copying all packets out to user space. But if you would
like to force a certain mode, you can use the following flags. If you
pass the XDP_COPY flag to the bind call, the kernel will force the
socket into copy mode. If it cannot use copy mode, the bind call will
fail with an error. Conversely, the XDP_ZEROCOPY flag will force the
socket into zero-copy mode or fail.

XDP_SHARED_UMEM bind flag
-------------------------

This flag enables you to bind multiple sockets to the same UMEM. It
works on the same queue id, between queue ids and between
netdevs/devices. In this mode, each socket has its own RX and TX
rings as usual, but you are going to have one or more FILL and
COMPLETION ring pairs. You have to create one of these pairs per
unique netdev and queue id tuple that you bind to.

Starting with the case where we would like to share a UMEM between
sockets bound to the same netdev and queue id. The UMEM (tied to the
first socket created) will only have a single FILL ring and a single
COMPLETION ring as there is only one unique netdev,queue_id tuple that
we have bound to. To use this mode, create the first socket and bind
it in the normal way. Create a second socket and create an RX and a TX
ring, or at least one of them, but no FILL or COMPLETION rings as the
ones from the first socket will be used. In the bind call, set the
XDP_SHARED_UMEM option and provide the initial socket's fd in the
sxdp_shared_umem_fd field. You can attach an arbitrary number of extra
sockets this way.

What socket will then a packet arrive on? This is decided by the XDP
program. Put all the sockets in the XSKMAP and just indicate which
index in the array you would like to send each packet to. A simple
round-robin example of distributing packets is shown below:

.. code-block:: c

   #include <linux/bpf.h>
   #include "bpf_helpers.h"

   #define MAX_SOCKS 16

   /* One slot per AF_XDP socket; user space fills the slots in. */
   struct {
       __uint(type, BPF_MAP_TYPE_XSKMAP);
       __uint(max_entries, MAX_SOCKS);
       __uint(key_size, sizeof(int));
       __uint(value_size, sizeof(int));
   } xsks_map SEC(".maps");

   static unsigned int rr;

   SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
   {
       /* Advance the round-robin index; the mask wraps it because
        * MAX_SOCKS is a power of two.
        */
       rr = (rr + 1) & (MAX_SOCKS - 1);

       /* Redirect to the socket at index rr, or drop the packet if
        * the lookup fails.
        */
       return bpf_redirect_map(&xsks_map, rr, XDP_DROP);
   }

Note that since there is only a single set of FILL and COMPLETION
rings, and they are single producer, single consumer rings, you need
to make sure that multiple processes or threads do not use these rings
concurrently. There are no synchronization primitives in the
libbpf code that protect multiple users at this point in time.

Libbpf uses this mode if you create more than one socket tied to the
same UMEM. However, note that you need to supply the
XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD libbpf_flag with the
xsk_socket__create calls and load your own XDP program as there is no
built-in one in libbpf that will route the traffic for you.

The second case is when you share a UMEM between sockets that are
bound to different queue ids and/or netdevs. In this case you have to
create one FILL ring and one COMPLETION ring for each unique
netdev,queue_id pair. Let us say you want to create two sockets bound
to two different queue ids on the same netdev. Create the first socket
and bind it in the normal way. Create a second socket and create an RX
and a TX ring, or at least one of them, and then one FILL and
COMPLETION ring for this socket. Then in the bind call, set the
XDP_SHARED_UMEM option and provide the initial socket's fd in the
sxdp_shared_umem_fd field as you registered the UMEM on that
socket. These two sockets will now share one and the same UMEM.

There is no need to supply an XDP program like the one in the previous
case where sockets were bound to the same queue id and
device. Instead, use the NIC's packet steering capabilities to steer
the packets to the right queue. In the previous example, there is only
one queue shared among sockets, so the NIC cannot do this steering. It
can only steer between queues.

In libbpf, you need to use the xsk_socket__create_shared() API as it
takes a reference to a FILL ring and a COMPLETION ring that will be
created for you and bound to the shared UMEM. You can use this
function for all the sockets you create, or you can use it for the
second and following ones and use xsk_socket__create() for the first
one. Both methods yield the same result.

Note that a UMEM can be shared between sockets on the same queue id
and device, as well as between queues on the same device and between
devices at the same time.

XDP_USE_NEED_WAKEUP bind flag
-----------------------------

This option adds support for a new flag called need_wakeup that is
present in the FILL ring and the TX ring, the rings for which user
space is a producer. When this option is set in the bind call, the
need_wakeup flag will be set if the kernel needs to be explicitly
woken up by a syscall to continue processing packets. If the flag is
zero, no syscall is needed.

If the flag is set on the FILL ring, the application needs to call
poll() to be able to continue to receive packets on the RX ring. This
can happen, for example, when the kernel has detected that there are no
more buffers on the FILL ring and no buffers left on the RX HW ring of
the NIC.
In this case, interrupts are turned off as the NIC cannot receive any packets (as there are no buffers to put them in), and the need_wakeup flag is set so that user space can put buffers on the FILL ring and then call poll() so that the kernel driver can put these buffers on the HW ring and start to receive packets.h]hX^If the flag is set on the FILL ring, the application needs to call poll() to be able to continue to receive packets on the RX ring. This can happen, for example, when the kernel has detected that there are no more buffers on the FILL ring and no buffers left on the RX HW ring of the NIC. In this case, interrupts are turned off as the NIC cannot receive any packets (as there are no buffers to put them in), and the need_wakeup flag is set so that user space can put buffers on the FILL ring and then call poll() so that the kernel driver can put these buffers on the HW ring and start to receive packets.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhhjhhubh)}(hIf the flag is set for the TX ring, it means that the application needs to explicitly notify the kernel to send any packets put on the TX ring. This can be accomplished either by a poll() call, as in the RX path, or by calling sendto().h]hIf the flag is set for the TX ring, it means that the application needs to explicitly notify the kernel to send any packets put on the TX ring. This can be accomplished either by a poll() call, as in the RX path, or by calling sendto().}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMrhjhhubh)}(hAn example of how to use this flag can be found in samples/bpf/xdpsock_user.c. An example with the use of libbpf helpers would look like this for the TX path:h]hAn example of how to use this flag can be found in samples/bpf/xdpsock_user.c. 
An example with the use of libbpf helpers would look like this for the TX path:}(hj%hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMwhjhhubjz)}(huif (xsk_ring_prod__needs_wakeup(&my_tx_ring)) sendto(xsk_socket__fd(xsk_handle), NULL, 0, MSG_DONTWAIT, NULL, 0);h]huif (xsk_ring_prod__needs_wakeup(&my_tx_ring)) sendto(xsk_socket__fd(xsk_handle), NULL, 0, MSG_DONTWAIT, NULL, 0);}hj3sbah}(h]h ]h"]h$]h&]hhjjjj}uh1jyhhhM{hjhhubh)}(h.I.e., only use the syscall if the flag is set.h]h.I.e., only use the syscall if the flag is set.}(hjBhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjhhubh)}(hX*We recommend that you always enable this mode as it usually leads to better performance especially if you run the application and the driver on the same core, but also if you use different cores for the application and the kernel driver, as it reduces the number of syscalls needed for the TX path.h]hX*We recommend that you always enable this mode as it usually leads to better performance especially if you run the application and the driver on the same core, but also if you use different cores for the application and the kernel driver, as it reduces the number of syscalls needed for the TX path.}(hjPhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjhhubeh}(h]xdp-use-need-wakeup-bind-flagah ]h"]xdp_use_need_wakeup bind flagah$]h&]uh1hhjhhhhhM_ubh)}(hhh](h)}(h6XDP_{RX|TX|UMEM_FILL|UMEM_COMPLETION}_RING setsockoptsh]h6XDP_{RX|TX|UMEM_FILL|UMEM_COMPLETION}_RING setsockopts}(hjihhhNhNubah}(h]h ]h"]h$]h&]uh1hhjfhhhhhMubh)}(hXTThese setsockopts sets the number of descriptors that the RX, TX, FILL, and COMPLETION rings respectively should have. It is mandatory to set the size of at least one of the RX and TX rings. If you set both, you will be able to both receive and send traffic from your application, but if you only want to do one of them, you can save resources by only setting up one of them. Both the FILL ring and the COMPLETION ring are mandatory as you need to have a UMEM tied to your socket. 
But if the XDP_SHARED_UMEM flag is used, any socket after the first one does not have a UMEM and should in that case not have any FILL or COMPLETION rings created as the ones from the shared UMEM will be used. Note, that the rings are single-producer single-consumer, so do not try to access them from multiple processes at the same time. See the XDP_SHARED_UMEM section.h]hXTThese setsockopts sets the number of descriptors that the RX, TX, FILL, and COMPLETION rings respectively should have. It is mandatory to set the size of at least one of the RX and TX rings. If you set both, you will be able to both receive and send traffic from your application, but if you only want to do one of them, you can save resources by only setting up one of them. Both the FILL ring and the COMPLETION ring are mandatory as you need to have a UMEM tied to your socket. But if the XDP_SHARED_UMEM flag is used, any socket after the first one does not have a UMEM and should in that case not have any FILL or COMPLETION rings created as the ones from the shared UMEM will be used. Note, that the rings are single-producer single-consumer, so do not try to access them from multiple processes at the same time. See the XDP_SHARED_UMEM section.}(hjwhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjfhhubh)}(hIn libbpf, you can create Rx-only and Tx-only sockets by supplying NULL to the rx and tx arguments, respectively, to the xsk_socket__create function.h]hIn libbpf, you can create Rx-only and Tx-only sockets by supplying NULL to the rx and tx arguments, respectively, to the xsk_socket__create function.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjfhhubh)}(hIf you create a Tx-only socket, we recommend that you do not put any packets on the fill ring. If you do this, drivers might think you are going to receive something when you in fact will not, and this can negatively impact performance.h]hIf you create a Tx-only socket, we recommend that you do not put any packets on the fill ring. 
If you do this, drivers might think you are going to receive something when you in fact will not, and this can negatively impact performance.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjfhhubeh}(h]4xdp-rx-tx-umem-fill-umem-completion-ring-setsockoptsah ]h"]6xdp_{rx|tx|umem_fill|umem_completion}_ring setsockoptsah$]h&]uh1hhjhhhhhMubh)}(hhh](h)}(hXDP_UMEM_REG setsockopth]hXDP_UMEM_REG setsockopt}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhhhMubh)}(hXThis setsockopt registers a UMEM to a socket. This is the area that contains all the buffers that packets can reside in. The call takes a pointer to the beginning of this area and the size of it. Moreover, it also has a parameter called chunk_size that is the size that the UMEM is divided into. It can only be 2K or 4K at the moment. If you have a UMEM area that is 128K and a chunk size of 2K, this means that you will be able to hold a maximum of 128K / 2K = 64 packets in your UMEM area and that your largest packet size can be 2K.h]hXThis setsockopt registers a UMEM to a socket. This is the area that contains all the buffers that packets can reside in. The call takes a pointer to the beginning of this area and the size of it. Moreover, it also has a parameter called chunk_size that is the size that the UMEM is divided into. It can only be 2K or 4K at the moment. If you have a UMEM area that is 128K and a chunk size of 2K, this means that you will be able to hold a maximum of 128K / 2K = 64 packets in your UMEM area and that your largest packet size can be 2K.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjhhubh)}(hXDThere is also an option to set the headroom of each single buffer in the UMEM. If you set this to N bytes, it means that the packet will start N bytes into the buffer leaving the first N bytes for the application to use. The final option is the flags field, but it will be dealt with in separate sections for each UMEM flag.h]hXDThere is also an option to set the headroom of each single buffer in the UMEM. 
If you set this to N bytes, it means that the packet will start N bytes into the buffer leaving the first N bytes for the application to use. The final option is the flags field, but it will be dealt with in separate sections for each UMEM flag.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjhhubeh}(h]xdp-umem-reg-setsockoptah ]h"]xdp_umem_reg setsockoptah$]h&]uh1hhjhhhhhMubh)}(hhh](h)}(hSO_BINDTODEVICE setsockopth]hSO_BINDTODEVICE setsockopt}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhhhMubh)}(hX[This is a generic SOL_SOCKET option that can be used to tie AF_XDP socket to a particular network interface. It is useful when a socket is created by a privileged process and passed to a non-privileged one. Once the option is set, kernel will refuse attempts to bind that socket to a different interface. Updating the value requires CAP_NET_RAW.h]hX[This is a generic SOL_SOCKET option that can be used to tie AF_XDP socket to a particular network interface. It is useful when a socket is created by a privileged process and passed to a non-privileged one. Once the option is set, kernel will refuse attempts to bind that socket to a different interface. Updating the value requires CAP_NET_RAW.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjhhubeh}(h]so-bindtodevice-setsockoptah ]h"]so_bindtodevice setsockoptah$]h&]uh1hhjhhhhhMubh)}(hhh](h)}(hXDP_STATISTICS getsockopth]hXDP_STATISTICS getsockopt}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhhhMubh)}(hqGets drop statistics of a socket that can be useful for debug purposes. The supported statistics are shown below:h]hqGets drop statistics of a socket that can be useful for debug purposes. 
The supported statistics are shown below:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjhhubjz)}(hstruct xdp_statistics { __u64 rx_dropped; /* Dropped for reasons other than invalid desc */ __u64 rx_invalid_descs; /* Dropped due to invalid descriptor */ __u64 tx_invalid_descs; /* Dropped due to invalid descriptor */ };h]hstruct xdp_statistics { __u64 rx_dropped; /* Dropped for reasons other than invalid desc */ __u64 rx_invalid_descs; /* Dropped due to invalid descriptor */ __u64 tx_invalid_descs; /* Dropped due to invalid descriptor */ };}hj$sbah}(h]h ]h"]h$]h&]hhjjjj}uh1jyhhhMhjhhubeh}(h]xdp-statistics-getsockoptah ]h"]xdp_statistics getsockoptah$]h&]uh1hhjhhhhhMubh)}(hhh](h)}(hXDP_OPTIONS getsockopth]hXDP_OPTIONS getsockopt}(hj>hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj;hhhhhMubh)}(hGets options from an XDP socket. The only one supported so far is XDP_OPTIONS_ZEROCOPY which tells you if zero-copy is on or not.h]hGets options from an XDP socket. The only one supported so far is XDP_OPTIONS_ZEROCOPY which tells you if zero-copy is on or not.}(hjLhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj;hhubeh}(h]xdp-options-getsockoptah ]h"]xdp_options getsockoptah$]h&]uh1hhjhhhhhMubeh}(h]&configuration-flags-and-socket-optionsah ]h"]&configuration flags and socket optionsah$]h&]uh1hhhhhhhhKubh)}(hhh](h)}(hMulti-Buffer Supporth]hMulti-Buffer Support}(hjmhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjjhhhhhMubh)}(hX]With multi-buffer support, programs using AF_XDP sockets can receive and transmit packets consisting of multiple buffers both in copy and zero-copy mode. For example, a packet can consist of two frames/buffers, one with the header and the other one with the data, or a 9K Ethernet jumbo frame can be constructed by chaining together three 4K frames.h]hX]With multi-buffer support, programs using AF_XDP sockets can receive and transmit packets consisting of multiple buffers both in copy and zero-copy mode. 
For example, a packet can consist of two frames/buffers, one with the header and the other one with the data, or a 9K Ethernet jumbo frame can be constructed by chaining together three 4K frames.}(hj{hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjjhhubh)}(hSome definitions:h]hSome definitions:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjjhhubh bullet_list)}(hhh](h list_item)}(h(A packet consists of one or more frames h]h)}(h'A packet consists of one or more framesh]h'A packet consists of one or more frames}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(hA descriptor in one of the AF_XDP rings always refers to a single frame. In the case the packet consists of a single frame, the descriptor refers to the whole packet. h]h)}(hA descriptor in one of the AF_XDP rings always refers to a single frame. In the case the packet consists of a single frame, the descriptor refers to the whole packet.h]hA descriptor in one of the AF_XDP rings always refers to a single frame. In the case the packet consists of a single frame, the descriptor refers to the whole packet.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubeh}(h]h ]h"]h$]h&]bullet*uh1jhhhMhjjhhubh)}(hXKTo enable multi-buffer support for an AF_XDP socket, use the new bind flag XDP_USE_SG. If this is not provided, all multi-buffer packets will be dropped just as before. Note that the XDP program loaded also needs to be in multi-buffer mode. This can be accomplished by using "xdp.frags" as the section name of the XDP program used.h]hXOTo enable multi-buffer support for an AF_XDP socket, use the new bind flag XDP_USE_SG. If this is not provided, all multi-buffer packets will be dropped just as before. Note that the XDP program loaded also needs to be in multi-buffer mode. 
This can be accomplished by using “xdp.frags” as the section name of the XDP program used.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjjhhubh)}(hXMTo represent a packet consisting of multiple frames, a new flag called XDP_PKT_CONTD is introduced in the options field of the Rx and Tx descriptors. If it is true (1) the packet continues with the next descriptor and if it is false (0) it means this is the last descriptor of the packet. Why the reverse logic of end-of-packet (eop) flag found in many NICs? Just to preserve compatibility with non-multi-buffer applications that have this bit set to false for all packets on Rx, and the apps set the options field to zero for Tx, as anything else will be treated as an invalid descriptor.h]hXMTo represent a packet consisting of multiple frames, a new flag called XDP_PKT_CONTD is introduced in the options field of the Rx and Tx descriptors. If it is true (1) the packet continues with the next descriptor and if it is false (0) it means this is the last descriptor of the packet. Why the reverse logic of end-of-packet (eop) flag found in many NICs? Just to preserve compatibility with non-multi-buffer applications that have this bit set to false for all packets on Rx, and the apps set the options field to zero for Tx, as anything else will be treated as an invalid descriptor.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjjhhubh)}(h`These are the semantics for producing packets onto AF_XDP Tx ring consisting of multiple frames:h]h`These are the semantics for producing packets onto AF_XDP Tx ring consisting of multiple frames:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjjhhubj)}(hhh](j)}(hX`When an invalid descriptor is found, all the other descriptors/frames of this packet are marked as invalid and not completed. The next descriptor is treated as the start of a new packet, even if this was not the intent (because we cannot guess the intent). As before, if your program is producing invalid descriptors you have a bug that must be fixed. 
h]h)}(hX_When an invalid descriptor is found, all the other descriptors/frames of this packet are marked as invalid and not completed. The next descriptor is treated as the start of a new packet, even if this was not the intent (because we cannot guess the intent). As before, if your program is producing invalid descriptors you have a bug that must be fixed.h]hX_When an invalid descriptor is found, all the other descriptors/frames of this packet are marked as invalid and not completed. The next descriptor is treated as the start of a new packet, even if this was not the intent (because we cannot guess the intent). As before, if your program is producing invalid descriptors you have a bug that must be fixed.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(hrx_queue_index; // A set entry here means that the corresponding queue_id // has an active AF_XDP socket bound to it. if (bpf_map_lookup_elem(&xsks_map, &index)) return bpf_redirect_map(&xsks_map, index, 0); return XDP_PASS; }h]hXISEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx) { int index = ctx->rx_queue_index; // A set entry here means that the corresponding queue_id // has an active AF_XDP socket bound to it. 
if (bpf_map_lookup_elem(&xsks_map, &index)) return bpf_redirect_map(&xsks_map, index, 0); return XDP_PASS; }}hjsbah}(h]h ]h"]h$]h&]hhjjjj}uh1jyhhhM/hjhhubh)}(hNA simple but not too performant ring dequeue and enqueue could look like this:h]hNA simple but not too performant ring dequeue and enqueue could look like this:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM=hjhhubjz)}(hX// struct xdp_rxtx_ring { // __u32 *producer; // __u32 *consumer; // struct xdp_desc *desc; // }; // struct xdp_umem_ring { // __u32 *producer; // __u32 *consumer; // __u64 *desc; // }; // typedef struct xdp_rxtx_ring RING; // typedef struct xdp_umem_ring RING; // typedef struct xdp_desc RING_TYPE; // typedef __u64 RING_TYPE; int dequeue_one(RING *ring, RING_TYPE *item) { __u32 entries = *ring->producer - *ring->consumer; if (entries == 0) return -1; // read-barrier! *item = ring->desc[*ring->consumer & (RING_SIZE - 1)]; (*ring->consumer)++; return 0; } int enqueue_one(RING *ring, const RING_TYPE *item) { u32 free_entries = RING_SIZE - (*ring->producer - *ring->consumer); if (free_entries == 0) return -1; ring->desc[*ring->producer & (RING_SIZE - 1)] = *item; // write-barrier! (*ring->producer)++; return 0; }h]hX// struct xdp_rxtx_ring { // __u32 *producer; // __u32 *consumer; // struct xdp_desc *desc; // }; // struct xdp_umem_ring { // __u32 *producer; // __u32 *consumer; // __u64 *desc; // }; // typedef struct xdp_rxtx_ring RING; // typedef struct xdp_umem_ring RING; // typedef struct xdp_desc RING_TYPE; // typedef __u64 RING_TYPE; int dequeue_one(RING *ring, RING_TYPE *item) { __u32 entries = *ring->producer - *ring->consumer; if (entries == 0) return -1; // read-barrier! *item = ring->desc[*ring->consumer & (RING_SIZE - 1)]; (*ring->consumer)++; return 0; } int enqueue_one(RING *ring, const RING_TYPE *item) { u32 free_entries = RING_SIZE - (*ring->producer - *ring->consumer); if (free_entries == 0) return -1; ring->desc[*ring->producer & (RING_SIZE - 1)] = *item; // write-barrier! 
(*ring->producer)++; return 0; }}hjsbah}(h]h ]h"]h$]h&]hhjjjj}uh1jyhhhM@hjhhubh)}(hgBut please use the libbpf functions as they are optimized and ready to use. Will make your life easier.h]hgBut please use the libbpf functions as they are optimized and ready to use. Will make your life easier.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMqhjhhubeh}(h]usageah ]h"]usageah$]h&]uh1hhjjhhhhhM&ubh)}(hhh](h)}(hUsage Multi-Buffer Rxh]hUsage Multi-Buffer Rx}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhhhMuubh)}(hHere is a simple Rx path pseudo-code example (using libxdp interfaces for simplicity). Error paths have been excluded to keep it short:h]hHere is a simple Rx path pseudo-code example (using libxdp interfaces for simplicity). Error paths have been excluded to keep it short:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMwhjhhubjz)}(hX\void rx_packets(struct xsk_socket_info *xsk) { static bool new_packet = true; u32 idx_rx = 0, idx_fq = 0; static char *pkt; int rcvd = xsk_ring_cons__peek(&xsk->rx, opt_batch_size, &idx_rx); xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq); for (int i = 0; i < rcvd; i++) { struct xdp_desc *desc = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++); char *frag = xsk_umem__get_data(xsk->umem->buffer, desc->addr); bool eop = !(desc->options & XDP_PKT_CONTD); if (new_packet) pkt = frag; else add_frag_to_pkt(pkt, frag); if (eop) process_pkt(pkt); new_packet = eop; *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = desc->addr; } xsk_ring_prod__submit(&xsk->umem->fq, rcvd); xsk_ring_cons__release(&xsk->rx, rcvd); }h]hX\void rx_packets(struct xsk_socket_info *xsk) { static bool new_packet = true; u32 idx_rx = 0, idx_fq = 0; static char *pkt; int rcvd = xsk_ring_cons__peek(&xsk->rx, opt_batch_size, &idx_rx); xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq); for (int i = 0; i < rcvd; i++) { struct xdp_desc *desc = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++); char *frag = xsk_umem__get_data(xsk->umem->buffer, desc->addr); bool eop = !(desc->options & XDP_PKT_CONTD); if 
(new_packet) pkt = frag; else add_frag_to_pkt(pkt, frag); if (eop) process_pkt(pkt); new_packet = eop; *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = desc->addr; } xsk_ring_prod__submit(&xsk->umem->fq, rcvd); xsk_ring_cons__release(&xsk->rx, rcvd); }}hj!sbah}(h]h ]h"]h$]h&]hhjjjj}uh1jyhhhMzhjhhubeh}(h]usage-multi-buffer-rxah ]h"]usage multi-buffer rxah$]h&]uh1hhjjhhhhhMuubh)}(hhh](h)}(hUsage Multi-Buffer Txh]hUsage Multi-Buffer Tx}(hj;hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj8hhhhhMubh)}(hHere is an example Tx path pseudo-code (using libxdp interfaces for simplicity) ignoring that the umem is finite in size, and that we eventually will run out of packets to send. Also assumes pkts.addr points to a valid location in the umem.h]hHere is an example Tx path pseudo-code (using libxdp interfaces for simplicity) ignoring that the umem is finite in size, and that we eventually will run out of packets to send. Also assumes pkts.addr points to a valid location in the umem.}(hjIhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj8hhubjz)}(hX&void tx_packets(struct xsk_socket_info *xsk, struct pkt *pkts, int batch_size) { u32 idx, i, pkt_nb = 0; xsk_ring_prod__reserve(&xsk->tx, batch_size, &idx); for (i = 0; i < batch_size;) { u64 addr = pkts[pkt_nb].addr; u32 len = pkts[pkt_nb].size; do { struct xdp_desc *tx_desc; tx_desc = xsk_ring_prod__tx_desc(&xsk->tx, idx + i++); tx_desc->addr = addr; if (len > xsk_frame_size) { tx_desc->len = xsk_frame_size; tx_desc->options = XDP_PKT_CONTD; } else { tx_desc->len = len; tx_desc->options = 0; pkt_nb++; } len -= tx_desc->len; addr += xsk_frame_size; if (i == batch_size) { /* Remember len, addr, pkt_nb for next iteration. * Skipped for simplicity. 
*/ break; } } while (len); } xsk_ring_prod__submit(&xsk->tx, i); }h]hX&void tx_packets(struct xsk_socket_info *xsk, struct pkt *pkts, int batch_size) { u32 idx, i, pkt_nb = 0; xsk_ring_prod__reserve(&xsk->tx, batch_size, &idx); for (i = 0; i < batch_size;) { u64 addr = pkts[pkt_nb].addr; u32 len = pkts[pkt_nb].size; do { struct xdp_desc *tx_desc; tx_desc = xsk_ring_prod__tx_desc(&xsk->tx, idx + i++); tx_desc->addr = addr; if (len > xsk_frame_size) { tx_desc->len = xsk_frame_size; tx_desc->options = XDP_PKT_CONTD; } else { tx_desc->len = len; tx_desc->options = 0; pkt_nb++; } len -= tx_desc->len; addr += xsk_frame_size; if (i == batch_size) { /* Remember len, addr, pkt_nb for next iteration. * Skipped for simplicity. */ break; } } while (len); } xsk_ring_prod__submit(&xsk->tx, i); }}hjWsbah}(h]h ]h"]h$]h&]hhjjjj}uh1jyhhhMhj8hhubeh}(h]usage-multi-buffer-txah ]h"]usage multi-buffer txah$]h&]uh1hhjjhhhhhMubh)}(hhh](h)}(h Probing for Multi-Buffer Supporth]h Probing for Multi-Buffer Support}(hjqhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjnhhhhhMubh)}(hXSTo discover if a driver supports multi-buffer AF_XDP in SKB or DRV mode, use the XDP_FEATURES feature of netlink in linux/netdev.h to query for NETDEV_XDP_ACT_RX_SG support. This is the same flag as for querying for XDP multi-buffer support. If XDP supports multi-buffer in a driver, then AF_XDP will also support that in SKB and DRV mode.h]hXSTo discover if a driver supports multi-buffer AF_XDP in SKB or DRV mode, use the XDP_FEATURES feature of netlink in linux/netdev.h to query for NETDEV_XDP_ACT_RX_SG support. This is the same flag as for querying for XDP multi-buffer support. If XDP supports multi-buffer in a driver, then AF_XDP will also support that in SKB and DRV mode.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjnhhubh)}(hXTo discover if a driver supports multi-buffer AF_XDP in zero-copy mode, use XDP_FEATURES and first check the NETDEV_XDP_ACT_XSK_ZEROCOPY flag. 
If it is set, it means that at least zero-copy is supported and you should go and check the netlink attribute NETDEV_A_DEV_XDP_ZC_MAX_SEGS in linux/netdev.h. An unsigned integer value will be returned stating the max number of frags that are supported by this device in zero-copy mode. These are the possible return values:h]hXTo discover if a driver supports multi-buffer AF_XDP in zero-copy mode, use XDP_FEATURES and first check the NETDEV_XDP_ACT_XSK_ZEROCOPY flag. If it is set, it means that at least zero-copy is supported and you should go and check the netlink attribute NETDEV_A_DEV_XDP_ZC_MAX_SEGS in linux/netdev.h. An unsigned integer value will be returned stating the max number of frags that are supported by this device in zero-copy mode. These are the possible return values:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjnhhubhdefinition_list)}(hhh](hdefinition_list_item)}(h1: Multi-buffer for zero-copy is not supported by this device, as max one fragment supported means that multi-buffer is not possible. h](hterm)}(hE1: Multi-buffer for zero-copy is not supported by this device, as maxh]hE1: Multi-buffer for zero-copy is not supported by this device, as max}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhjubh definition)}(hhh]h)}(h?one fragment supported means that multi-buffer is not possible.h]h?one fragment supported means that multi-buffer is not possible.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhhhMhjubj)}(h>=2: Multi-buffer is supported in zero-copy mode for this device. The returned number signifies the max number of frags supported. h](j)}(hE>=2: Multi-buffer is supported in zero-copy mode for this device. Theh]hE>=2: Multi-buffer is supported in zero-copy mode for this device. The}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhMhjubj)}(hhh]h)}(h combined 1 If you want to only see part of the traffic, you can program the NIC through ethtool to filter out your traffic to a single queue id that you can bind your XDP socket to. 
Here is one example in which UDP traffic to and from port 4242 are sent to queue 2:: sudo ethtool -N rx-flow-hash udp4 fn sudo ethtool -N flow-type udp4 src-port 4242 dst-port \ 4242 action 2 A number of other ways are possible all up to the capabilities of the NIC you have. h](j)}(h@A: When a netdev of a physical NIC is initialized, Linux usuallyh]h@A: When a netdev of a physical NIC is initialized, Linux usually}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhhhM+hj ubj)}(hhh](h)}(hX.allocates one RX and TX queue pair per core. So on an 8-core system, queue ids 0 to 7 will be allocated, one per core. In the AF_XDP bind call or the xsk_socket__create libbpf function call, you specify a specific queue id to bind to and it is only the traffic towards that queue you are going to get on your socket. So in the example above, if you bind to queue 0, you are NOT going to get any traffic that is distributed to queues 1 through 7. If you are lucky, you will see the traffic, but usually it will end up on one of the queues you have not bound to.h]hX.allocates one RX and TX queue pair per core. So on an 8-core system, queue ids 0 to 7 will be allocated, one per core. In the AF_XDP bind call or the xsk_socket__create libbpf function call, you specify a specific queue id to bind to and it is only the traffic towards that queue you are going to get on your socket. So in the example above, if you bind to queue 0, you are NOT going to get any traffic that is distributed to queues 1 through 7. If you are lucky, you will see the traffic, but usually it will end up on one of the queues you have not bound to.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj ubh)}(hXThere are a number of ways to solve the problem of getting the traffic you want to the queue id you bound to. If you want to see all the traffic, you can force the netdev to only have 1 queue, queue id 0, and then bind to queue 0. 
You can use ethtool to do this::h]hXThere are a number of ways to solve the problem of getting the traffic you want to the queue id you bound to. If you want to see all the traffic, you can force the netdev to only have 1 queue, queue id 0, and then bind to queue 0. You can use ethtool to do this:}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj ubjz)}(h&sudo ethtool -L combined 1h]h&sudo ethtool -L combined 1}hj sbah}(h]h ]h"]h$]h&]hhuh1jyhhhMhj ubh)}(hIf you want to only see part of the traffic, you can program the NIC through ethtool to filter out your traffic to a single queue id that you can bind your XDP socket to. Here is one example in which UDP traffic to and from port 4242 are sent to queue 2::h]hIf you want to only see part of the traffic, you can program the NIC through ethtool to filter out your traffic to a single queue id that you can bind your XDP socket to. Here is one example in which UDP traffic to and from port 4242 are sent to queue 2:}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM!hj ubjz)}(hsudo ethtool -N rx-flow-hash udp4 fn sudo ethtool -N flow-type udp4 src-port 4242 dst-port \ 4242 action 2h]hsudo ethtool -N rx-flow-hash udp4 fn sudo ethtool -N flow-type udp4 src-port 4242 dst-port \ 4242 action 2}hj+ sbah}(h]h ]h"]h$]h&]hhuh1jyhhhM&hj ubh)}(hSA number of other ways are possible all up to the capabilities of the NIC you have.h]hSA number of other ways are possible all up to the capabilities of the NIC you have.}(hj9 hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM*hj ubeh}(h]h ]h"]h$]h&]uh1jhj ubeh}(h]h ]h"]h$]h&]uh1jhhhM+hj ubj)}(hTQ: Can I use the XSKMAP to implement a switch between different umems in copy mode? 
Q: Can I use the XSKMAP to implement a switch between different umems
   in copy mode?

A: The short answer is no, that is not supported at the moment. The
   XSKMAP can only be used to switch traffic coming in on queue id X
   to sockets bound to the same queue id X. The XSKMAP can contain
   sockets bound to different queue ids, for example X and Y, but only
   traffic coming in from queue id Y can be directed to sockets bound
   to the same queue id Y.
   In zero-copy mode, you should use the switch, or other
   distribution mechanism, in your NIC to direct traffic to the
   correct queue id and socket.

Q: My packets are sometimes corrupted. What is wrong?

A: Care has to be taken not to feed the same buffer in the UMEM into
   more than one ring at the same time. If you, for example, feed the
   same buffer into the FILL ring and the TX ring at the same time, the
   NIC might receive data into the buffer at the same time it is
   sending it. This will cause some packets to become corrupted.
   The same goes for feeding the same buffer into the FILL rings
   belonging to different queue ids or netdevs bound with the
   XDP_SHARED_UMEM flag.

Credits
=======

- Björn Töpel (AF_XDP core)
- Magnus Karlsson (AF_XDP core)
- Alexander Duyck
- Alexei Starovoitov
- Daniel Borkmann
- Jesper Dangaard Brouer
- John Fastabend
- Jonathan Corbet (LWN coverage)
- Michael S. Tsirkin
- Qi Z Zhang
- Willem de Bruijn