:Author:
    Daniel Verkamp <dverkamp@chromium.org>
    Jeff Xu <jeffxu@chromium.org>

:Contributor:
    Aleksa Sarai <cyphar@cyphar.com>

Since Linux introduced the memfd feature, memfds have always had their
execute bit set, and the memfd_create() syscall doesn't allow setting
it differently.

However, in a secure-by-default system such as ChromeOS (where all
executables should come from the rootfs, which is protected by verified
boot), this executable nature of memfd opens a door for NoExec bypass
and enables a "confused deputy attack". For example, in VRP bug [1], the
cros_vm process created a memfd to share content with an external
process; the memfd was then overwritten and used to execute arbitrary
code and escalate to root. [2] lists more VRP bugs of this kind.

On the other hand, executable memfd has legitimate uses: runc uses
memfd's seal and executable features to copy the contents of a binary
and then execute it. For such a system, we need a solution that
differentiates runc's use of executable memfds from an attacker's [3].

To address those above:
 - Let memfd_create() set the X bit at creation time.
 - Let memfd be sealed against modifying the X bit when NX is set.
 - Add a new pid namespace sysctl: vm.memfd_noexec to help applications in
   migrating to and enforcing non-executable MFD.

User API
========
``int memfd_create(const char *name, unsigned int flags)``

``MFD_NOEXEC_SEAL``
    When the MFD_NOEXEC_SEAL bit is set in the ``flags``, the memfd is
    created with NX. F_SEAL_EXEC is set and the memfd can't be modified to
    add X later. MFD_ALLOW_SEALING is also implied.
    This is the most common case for an application using memfd.

``MFD_EXEC``
    When the MFD_EXEC bit is set in the ``flags``, the memfd is created
    with X.

Note:
    ``MFD_NOEXEC_SEAL`` implies ``MFD_ALLOW_SEALING``. If an app doesn't
    want sealing, it can add F_SEAL_SEAL after creation.


Sysctl:
========
``pid namespaced sysctl vm.memfd_noexec``

The new pid namespaced sysctl vm.memfd_noexec has 3 values:

 - 0: MEMFD_NOEXEC_SCOPE_EXEC
    memfd_create() with neither MFD_EXEC nor MFD_NOEXEC_SEAL acts as if
    MFD_EXEC was set.

 - 1: MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL
    memfd_create() with neither MFD_EXEC nor MFD_NOEXEC_SEAL acts as if
    MFD_NOEXEC_SEAL was set.

 - 2: MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED
    memfd_create() without MFD_NOEXEC_SEAL will be rejected.
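
To make the flag semantics above concrete, here is a minimal sketch of an application requesting a non-executable, exec-sealed memfd and then checking the resulting seal. The helper names ``create_noexec_memfd()`` and ``memfd_is_exec_sealed()`` are hypothetical and exist only for illustration. The fallback ``#define`` values are assumed to match the kernel UAPI headers and are only used when the installed libc headers do not provide them.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallbacks for older libc headers (assumed to match the kernel UAPI). */
#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U
#endif
#ifndef MFD_EXEC
#define MFD_EXEC 0x0010U
#endif
#ifndef F_SEAL_EXEC
#define F_SEAL_EXEC 0x0020
#endif

/*
 * Hypothetical helper: create a memfd with NX and F_SEAL_EXEC applied.
 * Returns the new fd, or -1 with errno set (for example EINVAL on a
 * kernel that does not know MFD_NOEXEC_SEAL).
 */
static int create_noexec_memfd(const char *name)
{
    return memfd_create(name, MFD_CLOEXEC | MFD_NOEXEC_SEAL);
}

/*
 * Hypothetical helper: returns 1 if F_SEAL_EXEC is set on fd,
 * 0 if it is not, and -1 if the seals cannot be queried.
 */
static int memfd_is_exec_sealed(int fd)
{
    int seals = fcntl(fd, F_GET_SEALS);

    if (seals < 0)
        return -1;
    return (seals & F_SEAL_EXEC) != 0;
}
```

A caller would typically invoke ``create_noexec_memfd("name")`` and treat a return value of -1 as "not supported or not permitted here". Passing ``MFD_NOEXEC_SEAL`` explicitly, rather than relying on the default, means the call keeps the same meaning for every value of vm.memfd_noexec listed above, including 2, where calls without the flag are rejected.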