=============================================== Memory Tagging Extension (MTE) in AArch64 Linux =============================================== Authors: Vincenzo Frascino Catalin Marinas Date: 2020-02-25 This document describes the provision of the Memory Tagging Extension functionality in AArch64 Linux. Introduction ============ ARMv8.5 based processors introduce the Memory Tagging Extension (MTE) feature. MTE is built on top of the ARMv8.0 virtual address tagging TBI (Top Byte Ignore) feature and allows software to access a 4-bit allocation tag for each 16-byte granule in the physical address space. Such memory range must be mapped with the Normal-Tagged memory attribute. A logical tag is derived from bits 59-56 of the virtual address used for the memory access. A CPU with MTE enabled will compare the logical tag against the allocation tag and potentially raise an exception on mismatch, subject to system registers configuration. Userspace Support ================= When ``CONFIG_ARM64_MTE`` is selected and Memory Tagging Extension is supported by the hardware, the kernel advertises the feature to userspace via ``HWCAP2_MTE``. PROT_MTE -------- To access the allocation tags, a user process must enable the Tagged memory attribute on an address range using a new ``prot`` flag for ``mmap()`` and ``mprotect()``: ``PROT_MTE`` - Pages allow access to the MTE allocation tags. The allocation tag is set to 0 when such pages are first mapped in the user address space and preserved on copy-on-write. ``MAP_SHARED`` is supported and the allocation tags can be shared between processes. **Note**: ``PROT_MTE`` is only supported on ``MAP_ANONYMOUS`` and RAM-based file mappings (``tmpfs``, ``memfd``). Passing it to other types of mapping will result in ``-EINVAL`` returned by these system calls. **Note**: The ``PROT_MTE`` flag (and corresponding memory type) cannot be cleared by ``mprotect()``. **Note**: ``madvise()`` memory ranges with ``MADV_DONTNEED`` and ``MADV_FREE`` may have the allocation tags cleared (set to 0) at any point after the system call. Tag Check Faults ---------------- When ``PROT_MTE`` is enabled on an address range and a mismatch between the logical and allocation tags occurs on access, there are three configurable behaviours: - *Ignore* - This is the default mode. The CPU (and kernel) ignores the tag check fault. - *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with ``.si_code = SEGV_MTESERR`` and ``.si_addr = ``. The memory access is not performed. If ``SIGSEGV`` is ignored or blocked by the offending thread, the containing process is terminated with a ``coredump``. - *Asynchronous* - The kernel raises a ``SIGSEGV``, in the offending thread, asynchronously following one or multiple tag check faults, with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0`` (the faulting address is unknown). The user can select the above modes, per thread, using the ``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where ``flags`` contain one of the following values in the ``PR_MTE_TCF_MASK`` bit-field: - ``PR_MTE_TCF_NONE`` - *Ignore* tag check faults - ``PR_MTE_TCF_SYNC`` - *Synchronous* tag check fault mode - ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode The current tag check fault mode can be read using the ``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call. Tag checking can also be disabled for a user thread by setting the ``PSTATE.TCO`` bit with ``MSR TCO, #1``. **Note**: Signal handlers are always invoked with ``PSTATE.TCO = 0``, irrespective of the interrupted context. ``PSTATE.TCO`` is restored on ``sigreturn()``. **Note**: There are no *match-all* logical tags available for user applications. **Note**: Kernel accesses to the user address space (e.g. ``read()`` system call) are not checked if the user thread tag checking mode is ``PR_MTE_TCF_NONE`` or ``PR_MTE_TCF_ASYNC``. If the tag checking mode is ``PR_MTE_TCF_SYNC``, the kernel makes a best effort to check its user address accesses, however it cannot always guarantee it. Kernel accesses to user addresses are always performed with an effective ``PSTATE.TCO`` value of zero, regardless of the user configuration. Excluding Tags in the ``IRG``, ``ADDG`` and ``SUBG`` instructions ----------------------------------------------------------------- The architecture allows excluding certain tags to be randomly generated via the ``GCR_EL1.Exclude`` register bit-field. By default, Linux excludes all tags other than 0. A user thread can enable specific tags in the randomly generated set using the ``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where ``flags`` contains the tags bitmap in the ``PR_MTE_TAG_MASK`` bit-field. **Note**: The hardware uses an exclude mask but the ``prctl()`` interface provides an include mask. An include mask of ``0`` (exclusion mask ``0xffff``) results in the CPU always generating tag ``0``. Initial process state --------------------- On ``execve()``, the new process has the following configuration: - ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled) - Tag checking mode set to ``PR_MTE_TCF_NONE`` - ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded) - ``PSTATE.TCO`` set to 0 - ``PROT_MTE`` not set on any of the initial memory maps On ``fork()``, the new process inherits the parent's configuration and memory map attributes with the exception of the ``madvise()`` ranges with ``MADV_WIPEONFORK`` which will have the data and tags cleared (set to 0). The ``ptrace()`` interface -------------------------- ``PTRACE_PEEKMTETAGS`` and ``PTRACE_POKEMTETAGS`` allow a tracer to read the tags from or set the tags to a tracee's address space. The ``ptrace()`` system call is invoked as ``ptrace(request, pid, addr, data)`` where: - ``request`` - one of ``PTRACE_PEEKMTETAGS`` or ``PTRACE_POKEMTETAGS``. - ``pid`` - the tracee's PID. - ``addr`` - address in the tracee's address space. - ``data`` - pointer to a ``struct iovec`` where ``iov_base`` points to a buffer of ``iov_len`` length in the tracer's address space. The tags in the tracer's ``iov_base`` buffer are represented as one 4-bit tag per byte and correspond to a 16-byte MTE tag granule in the tracee's address space. **Note**: If ``addr`` is not aligned to a 16-byte granule, the kernel will use the corresponding aligned address. ``ptrace()`` return value: - 0 - tags were copied, the tracer's ``iov_len`` was updated to the number of tags transferred. This may be smaller than the requested ``iov_len`` if the requested address range in the tracee's or the tracer's space cannot be accessed or does not have valid tags. - ``-EPERM`` - the specified process cannot be traced. - ``-EIO`` - the tracee's address range cannot be accessed (e.g. invalid address) and no tags copied. ``iov_len`` not updated. - ``-EFAULT`` - fault on accessing the tracer's memory (``struct iovec`` or ``iov_base`` buffer) and no tags copied. ``iov_len`` not updated. - ``-EOPNOTSUPP`` - the tracee's address does not have valid tags (never mapped with the ``PROT_MTE`` flag). ``iov_len`` not updated. **Note**: There are no transient errors for the requests above, so user programs should not retry in case of a non-zero system call return. ``PTRACE_GETREGSET`` and ``PTRACE_SETREGSET`` with ``addr == ``NT_ARM_TAGGED_ADDR_CTRL`` allow ``ptrace()`` access to the tagged address ABI control and MTE configuration of a process as per the ``prctl()`` options described in Documentation/arm64/tagged-address-abi.rst and above. The corresponding ``regset`` is 1 element of 8 bytes (``sizeof(long))``). Example of correct usage ======================== *MTE Example code* .. code-block:: c /* * To be compiled with -march=armv8.5-a+memtag */ #include #include #include #include #include #include #include #include /* * From arch/arm64/include/uapi/asm/hwcap.h */ #define HWCAP2_MTE (1 << 18) /* * From arch/arm64/include/uapi/asm/mman.h */ #define PROT_MTE 0x20 /* * From include/uapi/linux/prctl.h */ #define PR_SET_TAGGED_ADDR_CTRL 55 #define PR_GET_TAGGED_ADDR_CTRL 56 # define PR_TAGGED_ADDR_ENABLE (1UL << 0) # define PR_MTE_TCF_SHIFT 1 # define PR_MTE_TCF_NONE (0UL << PR_MTE_TCF_SHIFT) # define PR_MTE_TCF_SYNC (1UL << PR_MTE_TCF_SHIFT) # define PR_MTE_TCF_ASYNC (2UL << PR_MTE_TCF_SHIFT) # define PR_MTE_TCF_MASK (3UL << PR_MTE_TCF_SHIFT) # define PR_MTE_TAG_SHIFT 3 # define PR_MTE_TAG_MASK (0xffffUL << PR_MTE_TAG_SHIFT) /* * Insert a random logical tag into the given pointer. */ #define insert_random_tag(ptr) ({ \ uint64_t __val; \ asm("irg %0, %1" : "=r" (__val) : "r" (ptr)); \ __val; \ }) /* * Set the allocation tag on the destination address. */ #define set_tag(tagged_addr) do { \ asm volatile("stg %0, [%0]" : : "r" (tagged_addr) : "memory"); \ } while (0) int main() { unsigned char *a; unsigned long page_sz = sysconf(_SC_PAGESIZE); unsigned long hwcap2 = getauxval(AT_HWCAP2); /* check if MTE is present */ if (!(hwcap2 & HWCAP2_MTE)) return EXIT_FAILURE; /* * Enable the tagged address ABI, synchronous MTE tag check faults and * allow all non-zero tags in the randomly generated set. */ if (prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC | (0xfffe << PR_MTE_TAG_SHIFT), 0, 0, 0)) { perror("prctl() failed"); return EXIT_FAILURE; } a = mmap(0, page_sz, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); if (a == MAP_FAILED) { perror("mmap() failed"); return EXIT_FAILURE; } /* * Enable MTE on the above anonymous mmap. The flag could be passed * directly to mmap() and skip this step. */ if (mprotect(a, page_sz, PROT_READ | PROT_WRITE | PROT_MTE)) { perror("mprotect() failed"); return EXIT_FAILURE; } /* access with the default tag (0) */ a[0] = 1; a[1] = 2; printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]); /* set the logical and allocation tags */ a = (unsigned char *)insert_random_tag(a); set_tag(a); printf("%p\n", a); /* non-zero tag access */ a[0] = 3; printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]); /* * If MTE is enabled correctly the next instruction will generate an * exception. */ printf("Expecting SIGSEGV...\n"); a[16] = 0xdd; /* this should not be printed in the PR_MTE_TCF_SYNC mode */ printf("...haven't got one\n"); return EXIT_FAILURE; }