XDP RX Metadata

This document describes how an eXpress Data Path (XDP) program can access hardware metadata related to a packet using a set of helper functions, and how it can pass that metadata on to other consumers.

General Design

XDP has access to a set of kfuncs to manipulate the metadata in an XDP frame. Every device driver that wishes to expose additional packet metadata can implement these kfuncs. The set of kfuncs is declared in include/net/xdp.h via XDP_METADATA_KFUNC_xxx.

Currently, the following kfuncs are supported. In the future, as more metadata is supported, this set will grow:

__bpf_kfunc int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)

Read XDP frame RX timestamp.

Parameters

const struct xdp_md *ctx

XDP context pointer.

u64 *timestamp

Pointer that receives the RX timestamp.

Return

  • Returns 0 on success or -errno on error.

  • -EOPNOTSUPP: the device driver does not implement this kfunc

  • -ENODATA: no RX timestamp is available for this frame
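The two error codes differ in scope: -EOPNOTSUPP describes the device and therefore applies to every frame it receives, while -ENODATA describes a single frame. The following userspace C sketch (not BPF code; enum rx_meta_status and rx_meta_classify() are hypothetical names) shows one way a program might tell the cases apart after calling bpf_xdp_metadata_rx_timestamp().

```c
#include <errno.h>

/* Hypothetical classification of the int returned by an RX metadata
 * kfunc such as bpf_xdp_metadata_rx_timestamp(). Userspace model only. */
enum rx_meta_status {
	RX_META_OK,          /* 0: the output pointer was written */
	RX_META_UNSUPPORTED, /* -EOPNOTSUPP: driver lacks this kfunc (per device) */
	RX_META_NO_DATA,     /* -ENODATA: nothing available for this frame */
	RX_META_OTHER_ERR,   /* any other negative errno */
};

static enum rx_meta_status rx_meta_classify(int ret)
{
	if (ret == 0)
		return RX_META_OK;
	if (ret == -EOPNOTSUPP)
		return RX_META_UNSUPPORTED;
	if (ret == -ENODATA)
		return RX_META_NO_DATA;
	return RX_META_OTHER_ERR;
}
```

A consumer could, for example, stop expecting timestamps from a device once it sees RX_META_UNSUPPORTED, while treating RX_META_NO_DATA as a per-packet condition.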

__bpf_kfunc int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash, enum xdp_rss_hash_type *rss_type)

Read XDP frame RX hash.

Parameters

const struct xdp_md *ctx

XDP context pointer.

u32 *hash

Pointer that receives the RX hash value.

enum xdp_rss_hash_type *rss_type

Pointer that receives the RSS hash type.

Description

The RSS hash type (rss_type) specifies which packet headers the NIC hardware used when calculating the RSS hash value. The RSS type can be decoded via enum xdp_rss_hash_type, either by matching on the individual L3/L4 bits (XDP_RSS_L*) or on the combined traditional RSS hashing types (XDP_RSS_TYPE_L*).

Return

  • Returns 0 on success or -errno on error.

  • -EOPNOTSUPP: the device driver does not implement this kfunc

  • -ENODATA: no RX hash is available for this frame
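The two decoding styles for rss_type can be sketched in plain C. The EX_RSS_* constants below are local copies assumed to mirror enum xdp_rss_hash_bits and enum xdp_rss_hash_type in include/net/xdp.h; a real BPF program would normally use the kernel's own enum definitions (for example via vmlinux.h) instead of redefining them, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: local copies meant to mirror enum xdp_rss_hash_bits and
 * enum xdp_rss_hash_type in include/net/xdp.h. */
#define EX_RSS_L3_IPV4   (1u << 0)
#define EX_RSS_L3_IPV6   (1u << 1)
#define EX_RSS_L3_DYNHDR (1u << 2) /* IPv4 options / IPv6 extension headers */
#define EX_RSS_L4        (1u << 3) /* L4-based hash */
#define EX_RSS_L4_TCP    (1u << 4)

/* Combined "traditional" type: IPv4 + TCP. */
#define EX_RSS_TYPE_L4_IPV4_TCP (EX_RSS_L3_IPV4 | EX_RSS_L4 | EX_RSS_L4_TCP)

/* Style 1: test individual bits. */
static bool rss_hash_covers_l4(uint32_t rss_type)
{
	return (rss_type & EX_RSS_L4) != 0;
}

static bool rss_hash_over_ipv6(uint32_t rss_type)
{
	return (rss_type & EX_RSS_L3_IPV6) != 0;
}

/* Style 2: compare against a combined type (exact match). */
static bool rss_hash_is_ipv4_tcp(uint32_t rss_type)
{
	return rss_type == EX_RSS_TYPE_L4_IPV4_TCP;
}
```

Matching individual bits is the more forgiving style: in this sketch, an exact comparison against EX_RSS_TYPE_L4_IPV4_TCP fails for a value that also carries EX_RSS_L3_DYNHDR, whereas rss_hash_covers_l4() still reports an L4 hash.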

__bpf_kfunc int bpf_xdp_metadata_rx_vlan_tag(const struct xdp_md *ctx, __be16 *vlan_proto, u16 *vlan_tci)

Get the outermost VLAN tag of the XDP packet.

Parameters

const struct xdp_md *ctx

XDP context pointer.

__be16 *vlan_proto

Destination pointer for VLAN Tag protocol identifier (TPID).

u16 *vlan_tci

Destination pointer for VLAN TCI (VID + DEI + PCP).

Description

On success, vlan_proto contains the Tag Protocol Identifier (TPID), usually ETH_P_8021Q or ETH_P_8021AD, although some networks use custom TPIDs. vlan_proto is stored in network byte order (big-endian) and should be compared accordingly: if (vlan_proto == bpf_htons(ETH_P_8021Q)) do_something();
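Because vlan_proto is in network byte order, the comparison converts the constant rather than the field. A small userspace sketch, using htons() where a BPF program would use bpf_htons(); the EX_ETH_P_* constants are local copies of the standard TPID values that the kernel names ETH_P_8021Q (0x8100) and ETH_P_8021AD (0x88A8), and the function names are hypothetical.

```c
#include <arpa/inet.h> /* htons() */
#include <stdbool.h>
#include <stdint.h>

/* Standard TPIDs; the kernel calls them ETH_P_8021Q and ETH_P_8021AD. */
#define EX_ETH_P_8021Q  0x8100 /* 802.1Q customer VLAN tag */
#define EX_ETH_P_8021AD 0x88A8 /* 802.1ad service VLAN tag */

/* vlan_proto arrives in network byte order (__be16), so convert the
 * constant, not the field. A BPF program would use bpf_htons() here. */
static bool vlan_proto_is_8021q(uint16_t vlan_proto)
{
	return vlan_proto == htons(EX_ETH_P_8021Q);
}

static bool vlan_proto_is_8021ad(uint16_t vlan_proto)
{
	return vlan_proto == htons(EX_ETH_P_8021AD);
}
```

Converting the constant keeps byte swaps out of the per-packet comparison, since the conversion of a constant can be evaluated at compile time.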

vlan_tci contains the remaining 16 bits of the VLAN tag. The driver is expected to provide them in host byte order (usually little-endian), so the BPF program should not perform any byte-order conversion. According to the 802.1Q standard, the VLAN TCI (Tag Control Information) is a bit field that contains: the VLAN identifier (VID, 12 bits), which can be read with vlan_tci & 0xfff; the drop eligible indicator (DEI, 1 bit); and the priority code point (PCP, 3 bits). For the detailed meaning of DEI and PCP, refer to the IEEE 802.1Q standard.
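The TCI fields can be extracted with shifts and masks. This sketch assumes the standard 802.1Q bit positions (PCP in bits 15 to 13, DEI in bit 12, VID in bits 11 to 0) and that vlan_tci is already in host byte order, as the kfunc provides it; struct vlan_tci_fields and vlan_tci_decode() are hypothetical names.

```c
#include <stdint.h>

/* Fields of an 802.1Q TCI word in host byte order:
 *   bits 15..13  PCP (priority code point, 3 bits)
 *   bit  12      DEI (drop eligible indicator, 1 bit)
 *   bits 11..0   VID (VLAN identifier, 12 bits) */
struct vlan_tci_fields {
	uint16_t vid;
	uint8_t dei;
	uint8_t pcp;
};

static struct vlan_tci_fields vlan_tci_decode(uint16_t vlan_tci)
{
	struct vlan_tci_fields f = {
		.vid = vlan_tci & 0x0fff,
		.dei = (vlan_tci >> 12) & 0x1,
		.pcp = (vlan_tci >> 13) & 0x7,
	};
	return f;
}
```

For example, a TCI of 0xA064 decodes to PCP 5, DEI 0 and VID 100.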

Return

  • Returns 0 on success or -errno on error.

  • -EOPNOTSUPP: the device driver does not implement this kfunc

  • -ENODATA: the VLAN tag was not stripped or is not available

An XDP program can use these kfuncs to read the metadata into stack variables for its own consumption. Alternatively, to pass the metadata on to other consumers, an XDP program can store it in the metadata area carried ahead of the packet. Not every packet will necessarily have the requested metadata available; in that case the driver returns -ENODATA.

Not all kfuncs have to be implemented by the device driver. For any kfunc a driver does not implement, a default version that returns -EOPNOTSUPP is used, signalling that the device driver lacks support for it.
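Since any kfunc may fail with -ENODATA or -EOPNOTSUPP, a consumer of stored metadata needs to know which fields are valid. One possible approach, sketched in userspace C with a hypothetical struct rx_meta and META_F_* flags (none of this is a kernel-defined format), is to record a validity bit per field:

```c
#include <errno.h> /* ENODATA, EOPNOTSUPP */
#include <stdint.h>

/* Hypothetical per-field validity flags for a custom metadata record. */
#define META_F_TIMESTAMP (1u << 0)
#define META_F_HASH      (1u << 1)

struct rx_meta {
	uint64_t rx_timestamp;
	uint32_t rx_hash;
	uint32_t flags; /* which of the fields above are valid */
};

/* ret_ts / ret_hash are the kfunc return codes; ts / hash are the values
 * the kfuncs would have written. A value is kept only when its kfunc
 * returned 0; -ENODATA and -EOPNOTSUPP both leave the field zero and its
 * flag clear. */
static void rx_meta_fill(struct rx_meta *m, int ret_ts, uint64_t ts,
			 int ret_hash, uint32_t hash)
{
	m->rx_timestamp = 0;
	m->rx_hash = 0;
	m->flags = 0;

	if (ret_ts == 0) {
		m->rx_timestamp = ts;
		m->flags |= META_F_TIMESTAMP;
	}
	if (ret_hash == 0) {
		m->rx_hash = hash;
		m->flags |= META_F_HASH;
	}
}
```

Keeping the flag separate from the value matters because 0 can be a legitimate hash, so the value alone cannot signal absence.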

Within an XDP frame, the metadata layout (accessed via xdp_buff) is as follows:

+----------+-----------------+------+
| headroom | custom metadata | data |
+----------+-----------------+------+
           ^                 ^
           |                 |
 xdp_buff->data_meta   xdp_buff->data

An XDP program can store individual metadata items in this data_meta area in whatever format it chooses. Later consumers of the metadata have to agree on that format through an out-of-band contract (as in the AF_XDP use case, see below).
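The layout can be modelled in userspace to show the pointer arithmetic involved. In this sketch, struct frame_model and its helpers are hypothetical; meta_adjust() loosely imitates bpf_xdp_adjust_meta(), where a negative delta moves data_meta down and grows the metadata area. The real helper performs further checks (for example on alignment, and on space it keeps reserved at the start of the headroom) that this model omits.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model of an XDP buffer:
 *   buf .. data_meta   headroom
 *   data_meta .. data  custom metadata
 *   data ..            packet data */
struct frame_model {
	unsigned char *buf;
	unsigned char *data_meta;
	unsigned char *data;
};

/* The metadata area is exactly the bytes between data_meta and data. */
static size_t meta_len(const struct frame_model *f)
{
	return (size_t)(f->data - f->data_meta);
}

/* Loose imitation of bpf_xdp_adjust_meta(): a negative delta grows the
 * metadata area into the headroom, a positive delta shrinks it. The move
 * is refused if data_meta would leave the [buf, data] range. */
static int meta_adjust(struct frame_model *f, ptrdiff_t delta)
{
	ptrdiff_t headroom = f->data_meta - f->buf;
	ptrdiff_t metalen = f->data - f->data_meta;

	if (delta < -headroom || delta > metalen)
		return -1;
	f->data_meta += delta;
	return 0;
}

/* Consumer side: copy out a record only if the area has exactly the
 * size that producer and consumer agreed on. */
static int meta_read(const struct frame_model *f, void *dst, size_t want)
{
	if (meta_len(f) != want)
		return -1;
	memcpy(dst, f->data_meta, want);
	return 0;
}
```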

AF_XDP

The AF_XDP use case implies a contract between the BPF program that redirects XDP frames into the AF_XDP socket (XSK) and the final consumer. The BPF program therefore reserves a fixed number of bytes for metadata via bpf_xdp_adjust_meta and calls a subset of the kfuncs to populate them. The userspace XSK consumer computes xsk_umem__get_data() - METADATA_SIZE to locate that metadata. Note that xsk_umem__get_data is defined in libxdp, and METADATA_SIZE is an application-specific constant (the AF_XDP receive descriptor does _not_ explicitly carry the size of the metadata).
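A userspace sketch of the consumer-side computation. umem_get_data() is a local stand-in written to behave like libxdp's xsk_umem__get_data() (returning umem_area + addr); struct my_xsk_meta and the resulting METADATA_SIZE are hypothetical and must match whatever the XDP program actually wrote.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record agreed with the XDP program; its size is the
 * application-specific METADATA_SIZE (the descriptor does not carry it). */
struct my_xsk_meta {
	uint64_t rx_timestamp;
	uint32_t rx_hash;
	uint32_t rss_type;
};
#define METADATA_SIZE sizeof(struct my_xsk_meta)

/* Stand-in behaving like libxdp's xsk_umem__get_data(): the packet data
 * of a descriptor starts at umem_area + addr. */
static void *umem_get_data(void *umem_area, uint64_t addr)
{
	return (char *)umem_area + addr;
}

/* The metadata sits immediately before the packet data. */
static void *xsk_meta_ptr(void *umem_area, uint64_t rx_desc_addr)
{
	return (char *)umem_get_data(umem_area, rx_desc_addr) - METADATA_SIZE;
}

/* Copy the record out; memcpy avoids assumptions about alignment. */
static struct my_xsk_meta xsk_read_meta(void *umem_area, uint64_t rx_desc_addr)
{
	struct my_xsk_meta m;

	memcpy(&m, xsk_meta_ptr(umem_area, rx_desc_addr), sizeof(m));
	return m;
}
```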

Here is the AF_XDP consumer layout (note that there is no data_meta pointer):

+----------+-----------------+------+
| headroom | custom metadata | data |
+----------+-----------------+------+
                             ^
                             |
                       rx_desc->addr

XDP_PASS

This is the path where the packets processed by the XDP program are passed into the kernel. The kernel creates the skb out of the xdp_buff contents. Currently, every driver has custom kernel code to parse the descriptors and populate skb metadata when doing this xdp_buff->skb conversion, and the XDP metadata is not used by the kernel when building skbs. However, TC-BPF programs can access the XDP metadata area using the data_meta pointer.

In the future, we’d like to support a case where an XDP program can override some of the metadata used for building skbs.

bpf_redirect_map

bpf_redirect_map can redirect the frame to a different device. Some devices (like virtual ethernet links) support running a second XDP program after the redirect. However, the final consumer doesn’t have access to the original hardware descriptor and can’t access any of the original metadata. The same applies to XDP programs installed into devmaps and cpumaps.

This means that for redirected packets only custom metadata is currently supported, and it has to be prepared by the initial XDP program before the redirect. If the frame is eventually passed to the kernel, the skb created from such a frame won't have any hardware metadata populated. If such a packet is later redirected into an XSK, the XSK consumer will likewise only have access to the custom metadata.

bpf_tail_call

Adding programs that access metadata kfuncs to the BPF_MAP_TYPE_PROG_ARRAY is currently not supported.

Supported Devices

It is possible to query via netlink which kfuncs a particular netdev implements. See the xdp-rx-metadata-features attribute in Documentation/netlink/specs/netdev.yaml.
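Once the attribute has been fetched, its value is a bitmask with one bit per metadata kfunc. The sketch below decodes such a value into names. The EX_RX_META_* values and names are assumed to mirror the xdp-rx-metadata flags in netdev.yaml (timestamp, hash, vlan-tag, exposed as NETDEV_XDP_RX_METADATA_* in <linux/netdev.h>); check the spec of the running kernel. The netlink query itself is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* ASSUMPTION: mirrors the xdp-rx-metadata flags in netdev.yaml
 * (NETDEV_XDP_RX_METADATA_* in <linux/netdev.h>). */
#define EX_RX_META_TIMESTAMP (1u << 0)
#define EX_RX_META_HASH      (1u << 1)
#define EX_RX_META_VLAN_TAG  (1u << 2)

static const struct {
	uint64_t bit;
	const char *name;
} rx_meta_names[] = {
	{ EX_RX_META_TIMESTAMP, "timestamp" },
	{ EX_RX_META_HASH, "hash" },
	{ EX_RX_META_VLAN_TAG, "vlan-tag" },
};

/* Render the xdp-rx-metadata-features value of one netdev as a
 * comma-separated list; the output is truncated (but NUL-terminated)
 * if `out` is too small. */
static void rx_meta_features_str(uint64_t features, char *out, size_t n)
{
	if (n == 0)
		return;
	out[0] = '\0';
	for (size_t i = 0; i < sizeof(rx_meta_names) / sizeof(rx_meta_names[0]); i++) {
		if (!(features & rx_meta_names[i].bit))
			continue;
		size_t used = strlen(out);
		snprintf(out + used, n - used, "%s%s", used ? "," : "",
			 rx_meta_names[i].name);
	}
}
```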

Example

See tools/testing/selftests/bpf/progs/xdp_metadata.c and tools/testing/selftests/bpf/prog_tests/xdp_metadata.c for an example of a BPF program that handles XDP metadata.