eBPF Instruction Set
Registers and calling convention
eBPF has 10 general purpose registers and a read-only frame pointer register, all of which are 64 bits wide.
The eBPF calling convention is defined as:
R0: return value from function calls, and exit value for eBPF programs
R1 - R5: arguments for function calls
R6 - R9: callee saved registers that function calls will preserve
R10: read-only frame pointer to access stack
R0 - R5 are scratch registers, and eBPF programs need to spill/fill them if necessary across calls.
Instruction encoding
eBPF uses 64-bit instructions with the following encoding:
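=============  =======  ===============  ====================  ============
32 bits (MSB)  16 bits  4 bits           4 bits                8 bits (LSB)
=============  =======  ===============  ====================  ============
immediate      offset   source register  destination register  opcode
=============  =======  ===============  ====================  ============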
Note that most instructions do not use all of the fields. Unused fields shall be cleared to zero.
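As an illustration of the layout above, the following minimal sketch packs the five fields into a single 64-bit value. The helper name `bpf_pack_insn` is hypothetical and not part of any kernel API; the sketch only models the bit positions in the table, and how those bits map to bytes in memory on a given host is not covered here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: pack the instruction fields into one 64-bit value,
 * with the opcode in the least significant byte and the immediate in the
 * most significant 32 bits, following the table above.
 */
static inline uint64_t bpf_pack_insn(uint8_t opcode, uint8_t dst_reg,
                                     uint8_t src_reg, int16_t off, int32_t imm)
{
    /* Each register field is only 4 bits wide. */
    assert(dst_reg <= 0x0f && src_reg <= 0x0f);

    return ((uint64_t)(uint32_t)imm << 32) |   /* bits 63..32: immediate   */
           ((uint64_t)(uint16_t)off << 16) |   /* bits 31..16: offset      */
           ((uint64_t)src_reg << 12) |         /* bits 15..12: source      */
           ((uint64_t)dst_reg << 8) |          /* bits 11..8:  destination */
           (uint64_t)opcode;                   /* bits  7..0:  opcode      */
}
```

Fields that an instruction does not use are simply passed as zero, which matches the requirement that unused fields be cleared.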
Instruction classes
The three LSB bits of the ‘opcode’ field store the instruction class:
=========  =====  ===============================
class      value  description
=========  =====  ===============================
BPF_LD     0x00   non-standard load operations
BPF_LDX    0x01   load into register operations
BPF_ST     0x02   store from immediate operations
BPF_STX    0x03   store from register operations
BPF_ALU    0x04   32-bit arithmetic operations
BPF_JMP    0x05   64-bit jump operations
BPF_JMP32  0x06   32-bit jump operations
BPF_ALU64  0x07   64-bit arithmetic operations
=========  =====  ===============================
Arithmetic and jump instructions
For arithmetic and jump instructions (BPF_ALU, BPF_ALU64, BPF_JMP and BPF_JMP32), the 8-bit ‘opcode’ field is divided into three parts:
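==============  ======  =================
4 bits (MSB)    1 bit   3 bits (LSB)
==============  ======  =================
operation code  source  instruction class
==============  ======  =================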
The 4th bit encodes the source operand:
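======  =====  ========================================
source  value  description
======  =====  ========================================
BPF_K   0x00   use 32-bit immediate as source operand
BPF_X   0x08   use 'src_reg' register as source operand
======  =====  ========================================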
The four MSB bits store the operation code.
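The split described above can be sketched with three masks. The function names below are hypothetical, chosen for illustration only; the mask values follow directly from the field widths (3, 1 and 4 bits).

```c
#include <assert.h>
#include <stdint.h>

/* Instruction class: the three least significant bits of the opcode. */
static inline uint8_t insn_class(uint8_t opcode)
{
    return opcode & 0x07;
}

/* Source selector (BPF_K = 0x00, BPF_X = 0x08): the 4th bit. */
static inline uint8_t insn_source(uint8_t opcode)
{
    /* Only meaningful for BPF_ALU, BPF_JMP, BPF_JMP32 and BPF_ALU64. */
    assert(insn_class(opcode) >= 0x04);
    return opcode & 0x08;
}

/* Operation code: the four most significant bits. */
static inline uint8_t insn_op(uint8_t opcode)
{
    assert(insn_class(opcode) >= 0x04);
    return opcode & 0xf0;
}
```

For example, the opcode byte 0x0f decodes to class 0x07 (BPF_ALU64), source 0x08 (BPF_X) and operation code 0x00.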
Arithmetic instructions
BPF_ALU uses 32-bit wide operands while BPF_ALU64 uses 64-bit wide operands for otherwise identical operations. The code field encodes the operation as below:
========  =====  ==========================
code      value  description
========  =====  ==========================
BPF_ADD   0x00   dst += src
BPF_SUB   0x10   dst -= src
BPF_MUL   0x20   dst \*= src
BPF_DIV   0x30   dst /= src
BPF_OR    0x40   dst \|= src
BPF_AND   0x50   dst &= src
BPF_LSH   0x60   dst <<= src
BPF_RSH   0x70   dst >>= src
BPF_NEG   0x80   dst = -dst
BPF_MOD   0x90   dst %= src
BPF_XOR   0xa0   dst ^= src
BPF_MOV   0xb0   dst = src
BPF_ARSH  0xc0   sign extending shift right
BPF_END   0xd0   endianness conversion
========  =====  ==========================
BPF_ADD | BPF_X | BPF_ALU means:
  dst_reg = (u32) dst_reg + (u32) src_reg
BPF_ADD | BPF_X | BPF_ALU64 means:
  dst_reg = dst_reg + src_reg
BPF_XOR | BPF_K | BPF_ALU means:
  dst_reg = (u32) dst_reg ^ (u32) imm32
BPF_XOR | BPF_K | BPF_ALU64 means:
  dst_reg = dst_reg ^ imm32
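The difference between the 32-bit and 64-bit forms can be sketched in plain C. The function names are hypothetical. Two details in this sketch are assumptions that the text above does not spell out: the 32-bit result is zero-extended into the 64-bit destination register, and the 32-bit immediate is sign-extended before a 64-bit operation.

```c
#include <assert.h>
#include <stdint.h>

/* BPF_ADD | BPF_X | BPF_ALU: add the low 32 bits, result zero-extended. */
static inline uint64_t alu32_add(uint64_t dst, uint64_t src)
{
    return (uint32_t)((uint32_t)dst + (uint32_t)src);
}

/* BPF_ADD | BPF_X | BPF_ALU64: full 64-bit add. */
static inline uint64_t alu64_add(uint64_t dst, uint64_t src)
{
    return dst + src;
}

/* BPF_XOR | BPF_K | BPF_ALU: xor the low 32 bits with the immediate. */
static inline uint64_t alu32_xor_imm(uint64_t dst, int32_t imm32)
{
    return (uint32_t)dst ^ (uint32_t)imm32;
}

/* BPF_XOR | BPF_K | BPF_ALU64: immediate sign-extended to 64 bits (assumed). */
static inline uint64_t alu64_xor_imm(uint64_t dst, int32_t imm32)
{
    return dst ^ (uint64_t)(int64_t)imm32;
}

/* Usage example: the same operands give different results per width. */
static inline void alu_width_demo(void)
{
    assert(alu32_add(0xffffffffu, 1) == 0);               /* wraps at 32 bits */
    assert(alu64_add(0xffffffffu, 1) == 0x100000000ULL);  /* carries into bit 32 */
}
```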
Jump instructions
BPF_JMP32 uses 32-bit wide operands while BPF_JMP uses 64-bit wide operands for otherwise identical operations. The code field encodes the operation as below:
========  =====  =========================  ============
code      value  description                notes
========  =====  =========================  ============
BPF_JA    0x00   PC += off                  BPF_JMP only
BPF_JEQ   0x10   PC += off if dst == src
BPF_JGT   0x20   PC += off if dst > src     unsigned
BPF_JGE   0x30   PC += off if dst >= src    unsigned
BPF_JSET  0x40   PC += off if dst & src
BPF_JNE   0x50   PC += off if dst != src
BPF_JSGT  0x60   PC += off if dst > src     signed
BPF_JSGE  0x70   PC += off if dst >= src    signed
BPF_CALL  0x80   function call
BPF_EXIT  0x90   function / program return  BPF_JMP only
BPF_JLT   0xa0   PC += off if dst < src     unsigned
BPF_JLE   0xb0   PC += off if dst <= src    unsigned
BPF_JSLT  0xc0   PC += off if dst < src     signed
BPF_JSLE  0xd0   PC += off if dst <= src    signed
========  =====  =========================  ============
The eBPF program needs to store the return value into register R0 before doing a BPF_EXIT.
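The signed and unsigned comparisons in the table differ only in how the same 64 bits are interpreted. The sketch below evaluates the branch condition for the 64-bit (BPF_JMP) conditional jumps; the function name `jmp64_taken` is hypothetical, and the sketch covers only the condition, not how PC is updated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a BPF_JMP conditional jump with this code is taken. */
static inline bool jmp64_taken(uint8_t code, uint64_t dst, uint64_t src)
{
    switch (code) {
    case 0x10: return dst == src;                    /* BPF_JEQ  */
    case 0x20: return dst > src;                     /* BPF_JGT  (unsigned) */
    case 0x30: return dst >= src;                    /* BPF_JGE  (unsigned) */
    case 0x40: return (dst & src) != 0;              /* BPF_JSET */
    case 0x50: return dst != src;                    /* BPF_JNE  */
    case 0x60: return (int64_t)dst > (int64_t)src;   /* BPF_JSGT (signed) */
    case 0x70: return (int64_t)dst >= (int64_t)src;  /* BPF_JSGE (signed) */
    case 0xa0: return dst < src;                     /* BPF_JLT  (unsigned) */
    case 0xb0: return dst <= src;                    /* BPF_JLE  (unsigned) */
    case 0xc0: return (int64_t)dst < (int64_t)src;   /* BPF_JSLT (signed) */
    case 0xd0: return (int64_t)dst <= (int64_t)src;  /* BPF_JSLE (signed) */
    default:
        /* BPF_JA, BPF_CALL and BPF_EXIT are not conditional jumps. */
        assert(!"not a conditional jump code");
        return false;
    }
}
```

With dst holding all ones, BPF_JGT against 1 is taken because the value is the largest unsigned number, while BPF_JSGT is not taken because the same bits read as -1.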
Load and store instructions
For load and store instructions (BPF_LD, BPF_LDX, BPF_ST and BPF_STX), the 8-bit ‘opcode’ field is divided as:
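============  ======  =================
3 bits (MSB)  2 bits  3 bits (LSB)
============  ======  =================
mode          size    instruction class
============  ======  =================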
The size modifier is one of:
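=============  =====  =====================
size modifier  value  description
=============  =====  =====================
BPF_W          0x00   word (4 bytes)
BPF_H          0x08   half word (2 bytes)
BPF_B          0x10   byte
BPF_DW         0x18   double word (8 bytes)
=============  =====  =====================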
The mode modifier is one of:
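=============  =====  ====================================
mode modifier  value  description
=============  =====  ====================================
BPF_IMM        0x00   used for 64-bit mov
BPF_ABS        0x20   legacy BPF packet access
BPF_IND        0x40   legacy BPF packet access
BPF_MEM        0x60   all normal load and store operations
BPF_ATOMIC     0xc0   atomic operations
=============  =====  ====================================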
BPF_MEM | <size> | BPF_STX means:
*(size *) (dst_reg + off) = src_reg
BPF_MEM | <size> | BPF_ST means:
*(size *) (dst_reg + off) = imm32
BPF_MEM | <size> | BPF_LDX means:
dst_reg = *(size *) (src_reg + off)
Where size is one of: BPF_B or BPF_H or BPF_W or BPF_DW.
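To show how the mode and size fields are read back out of a load/store opcode, here is a minimal decoding sketch. The function names are hypothetical; the masks follow from the 3/2/3-bit split and the values in the tables above.

```c
#include <assert.h>
#include <stdint.h>

/* Access width in bytes, taken from the 2-bit size field. */
static inline unsigned ldst_size_bytes(uint8_t opcode)
{
    /* Only BPF_LD, BPF_LDX, BPF_ST and BPF_STX use this layout. */
    assert((opcode & 0x07) <= 0x03);

    switch (opcode & 0x18) {
    case 0x00: return 4;  /* BPF_W  */
    case 0x08: return 2;  /* BPF_H  */
    case 0x10: return 1;  /* BPF_B  */
    default:   return 8;  /* BPF_DW (0x18) */
    }
}

/* Mode modifier, taken from the three most significant bits. */
static inline uint8_t ldst_mode(uint8_t opcode)
{
    assert((opcode & 0x07) <= 0x03);
    return opcode & 0xe0;
}
```

For example, BPF_MEM | BPF_DW | BPF_LDX is 0x60 | 0x18 | 0x01 = 0x79, which decodes to an 8-byte access in BPF_MEM mode.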
Atomic operations
eBPF includes atomic operations, which use the immediate field for extra encoding:
.imm = BPF_ADD, .code = BPF_ATOMIC | BPF_W | BPF_STX: lock xadd *(u32 *)(dst_reg + off16) += src_reg
.imm = BPF_ADD, .code = BPF_ATOMIC | BPF_DW | BPF_STX: lock xadd *(u64 *)(dst_reg + off16) += src_reg
The basic atomic operations supported are:
BPF_ADD
BPF_AND
BPF_OR
BPF_XOR
Each has semantics equivalent to the BPF_ADD example: the memory location addressed by dst_reg + off is atomically modified, with src_reg as the other operand. If the BPF_FETCH flag is set in the immediate, these operations also overwrite src_reg with the value that was in memory before it was modified.
The more special operations are:
BPF_XCHG
  This atomically exchanges src_reg with the value addressed by dst_reg + off.
BPF_CMPXCHG
  This atomically compares the value addressed by dst_reg + off with R0. If they match, it is replaced with src_reg. In either case, the value that was there before is zero-extended and loaded back to R0.
Note that 1 and 2 byte atomic operations are not supported.
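The register-level effect of these operations can be modelled with C11 atomics. This is a sketch of the semantics only, for the 64-bit case; the function names are hypothetical, registers are modelled as plain `uint64_t` variables, and the BPF_FETCH flag is modelled as a boolean parameter rather than as a bit in the immediate.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* BPF_ADD, optionally with BPF_FETCH: memory += src_reg. */
static inline void atomic_add64(_Atomic uint64_t *mem, uint64_t *src_reg,
                                bool fetch)
{
    assert(mem && src_reg);
    uint64_t old = atomic_fetch_add(mem, *src_reg);
    if (fetch)
        *src_reg = old;  /* src_reg receives the pre-modification value */
}

/* BPF_XCHG: swap src_reg with the value in memory. */
static inline void atomic_xchg64(_Atomic uint64_t *mem, uint64_t *src_reg)
{
    assert(mem && src_reg);
    *src_reg = atomic_exchange(mem, *src_reg);
}

/* BPF_CMPXCHG: if memory == R0, store src_reg; R0 gets the old value. */
static inline void atomic_cmpxchg64(_Atomic uint64_t *mem, uint64_t *r0,
                                    uint64_t src_reg)
{
    assert(mem && r0);
    uint64_t expected = *r0;
    /* On failure, 'expected' is overwritten with the current value;
     * on success it already equals the old value. */
    atomic_compare_exchange_strong(mem, &expected, src_reg);
    *r0 = expected;
}
```

BPF_AND, BPF_OR and BPF_XOR would follow the same pattern as `atomic_add64`, using `atomic_fetch_and`, `atomic_fetch_or` and `atomic_fetch_xor`.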
Clang can generate atomic instructions by default when -mcpu=v3 is enabled. If a lower version for -mcpu is set, the only atomic instruction Clang can generate is BPF_ADD without BPF_FETCH. If you need to enable the atomics features while keeping a lower -mcpu version, you can use -Xclang -target-feature -Xclang +alu32.
You may encounter BPF_XADD. This is a legacy name for BPF_ATOMIC, referring to the exclusive-add operation encoded when the immediate field is zero.
16-byte instructions
eBPF has one 16-byte instruction: BPF_LD | BPF_DW | BPF_IMM, which consists of two consecutive struct bpf_insn 8-byte blocks and is interpreted as a single instruction that loads a 64-bit immediate value into dst_reg.
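A minimal sketch of how a 64-bit immediate might be spread over the two blocks follows. The struct `insn_sketch` and both function names are hypothetical stand-ins, not the kernel's struct bpf_insn. The text above does not say which half goes where; this sketch assumes the low 32 bits are carried in the first block's immediate, the high 32 bits in the second block's immediate, and all other fields of the second block are zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one 8-byte instruction block. */
struct insn_sketch {
    uint8_t opcode;
    uint8_t dst_reg;
    uint8_t src_reg;
    int16_t off;
    int32_t imm;
};

/* Emit BPF_LD | BPF_DW | BPF_IMM (0x00 | 0x18 | 0x00 = 0x18) as two blocks. */
static inline void emit_ld_imm64(struct insn_sketch out[2], uint8_t dst_reg,
                                 uint64_t imm64)
{
    assert(dst_reg < 10);  /* R10 is the read-only frame pointer */

    out[0] = (struct insn_sketch){
        .opcode = 0x18,
        .dst_reg = dst_reg,
        .imm = (int32_t)(uint32_t)imm64,          /* assumed: low half */
    };
    out[1] = (struct insn_sketch){
        .imm = (int32_t)(uint32_t)(imm64 >> 32),  /* assumed: high half */
    };
}

/* Recombine the two immediates into the 64-bit value. */
static inline uint64_t read_ld_imm64(const struct insn_sketch in[2])
{
    return ((uint64_t)(uint32_t)in[1].imm << 32) | (uint32_t)in[0].imm;
}
```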
Packet access instructions
eBPF has two non-generic instructions: (BPF_ABS | <size> | BPF_LD) and (BPF_IND | <size> | BPF_LD) which are used to access packet data.
They had to be carried over from classic BPF to preserve the strong performance of socket filters running in the eBPF interpreter. These instructions can only be used when the interpreter context is a pointer to struct sk_buff, and they have seven implicit operands. Register R6 is an implicit input that must contain a pointer to the sk_buff. Register R0 is an implicit output which contains the data fetched from the packet. Registers R1-R5 are scratch registers and must not be used to store data across BPF_ABS | BPF_LD or BPF_IND | BPF_LD instructions.
These instructions have an implicit program exit condition as well. When an eBPF program tries to access data beyond the packet boundary, the interpreter aborts execution of the program. JIT compilers therefore must preserve this property. The src_reg and imm32 fields are explicit inputs to these instructions.
For example, BPF_IND | BPF_W | BPF_LD means:
R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32))
and R1 - R5 are clobbered.
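The combination of the load and the implicit exit condition can be sketched as follows. This is a simplified model, not the kernel implementation: `struct packet_sketch` is a hypothetical stand-in for the packet data reachable through the sk_buff in R6, the offset is assumed to be the low 32 bits of src_reg plus imm32, negative offsets are simply rejected, and a `false` return stands for the program being aborted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the packet data behind the sk_buff in R6. */
struct packet_sketch {
    const uint8_t *data;
    size_t len;
};

/*
 * Model of BPF_IND | BPF_W | BPF_LD. Returns false when the 4-byte read
 * would fall outside the packet, which stands for the implicit exit.
 */
static inline bool ld_ind_w(const struct packet_sketch *pkt, uint64_t src_reg,
                            int32_t imm32, uint64_t *r0)
{
    assert(pkt && r0);

    int64_t pos = (int64_t)(int32_t)(uint32_t)src_reg + (int64_t)imm32;
    if (pos < 0 || (uint64_t)pos + 4 > pkt->len)
        return false;  /* access beyond the packet boundary */

    const uint8_t *p = pkt->data + pos;
    /* Packet data is in network byte order; assemble the word big-endian,
     * which is the effect of ntohl() on the loaded value. */
    *r0 = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
          ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    return true;
}
```

A real implementation would also treat R1 - R5 as clobbered after the instruction, which this sketch does not model.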