€•P=Œsphinx.addnodes”Œdocument”“”)”}”(Œ rawsource”Œ”Œchildren”]”(Œ translations”Œ LanguagesNode”“”)”}”(hhh]”(hŒ pending_xref”“”)”}”(hhh]”Œdocutils.nodes”ŒText”“”ŒChinese (Simplified)”…””}”Œparent”hsbaŒ attributes”}”(Œids”]”Œclasses”]”Œnames”]”Œdupnames”]”Œbackrefs”]”Œ refdomain”Œstd”Œreftype”Œdoc”Œ reftarget”Œ5/translations/zh_CN/admin-guide/syscall-user-dispatch”Œmodname”NŒ classname”NŒ refexplicit”ˆuŒtagname”hhh ubh)”}”(hhh]”hŒChinese (Traditional)”…””}”hh2sbah}”(h]”h ]”h"]”h$]”h&]”Œ refdomain”h)Œreftype”h+Œ reftarget”Œ5/translations/zh_TW/admin-guide/syscall-user-dispatch”Œmodname”NŒ classname”NŒ refexplicit”ˆuh1hhh ubh)”}”(hhh]”hŒItalian”…””}”hhFsbah}”(h]”h ]”h"]”h$]”h&]”Œ refdomain”h)Œreftype”h+Œ reftarget”Œ5/translations/it_IT/admin-guide/syscall-user-dispatch”Œmodname”NŒ classname”NŒ refexplicit”ˆuh1hhh ubh)”}”(hhh]”hŒJapanese”…””}”hhZsbah}”(h]”h ]”h"]”h$]”h&]”Œ refdomain”h)Œreftype”h+Œ reftarget”Œ5/translations/ja_JP/admin-guide/syscall-user-dispatch”Œmodname”NŒ classname”NŒ refexplicit”ˆuh1hhh ubh)”}”(hhh]”hŒKorean”…””}”hhnsbah}”(h]”h ]”h"]”h$]”h&]”Œ refdomain”h)Œreftype”h+Œ reftarget”Œ5/translations/ko_KR/admin-guide/syscall-user-dispatch”Œmodname”NŒ classname”NŒ refexplicit”ˆuh1hhh ubh)”}”(hhh]”hŒSpanish”…””}”hh‚sbah}”(h]”h ]”h"]”h$]”h&]”Œ refdomain”h)Œreftype”h+Œ reftarget”Œ5/translations/sp_SP/admin-guide/syscall-user-dispatch”Œmodname”NŒ classname”NŒ refexplicit”ˆuh1hhh ubeh}”(h]”h ]”h"]”h$]”h&]”Œcurrent_language”ŒEnglish”uh1h hhŒ _document”hŒsource”NŒline”NubhŒcomment”“”)”}”(hŒ SPDX-License-Identifier: GPL-2.0”h]”hŒ SPDX-License-Identifier: GPL-2.0”…””}”hh£sbah}”(h]”h ]”h"]”h$]”h&]”Œ xml:space”Œpreserve”uh1h¡hhhžhhŸŒO/var/lib/git/docbuild/linux/Documentation/admin-guide/syscall-user-dispatch.rst”h KubhŒsection”“”)”}”(hhh]”(hŒtitle”“”)”}”(hŒSyscall User Dispatch”h]”hŒSyscall User Dispatch”…””}”(hh»hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h¹hh¶hžhhŸh³h Kubhµ)”}”(hhh]”(hº)”}”(hŒ Background”h]”hŒ Background”…””}”(hhÌhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h¹hhÉhžhhŸh³h KubhŒ paragraph”“”)”}”(hXÕCompatibility layers like Wine need a way to efficiently emulate system calls of only a part of their process - the part that has the incompatible code - while being able to execute native syscalls without a high performance penalty on the native part of the process. Seccomp falls short on this task, since it has limited support to efficiently filter syscalls based on memory regions, and it doesn't support removing filters. Therefore a new mechanism is necessary.”h]”hX×Compatibility layers like Wine need a way to efficiently emulate system calls of only a part of their process - the part that has the incompatible code - while being able to execute native syscalls without a high performance penalty on the native part of the process. Seccomp falls short on this task, since it has limited support to efficiently filter syscalls based on memory regions, and it doesn’t support removing filters. Therefore a new mechanism is necessary.”…””}”(hhÜhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÚhŸh³h K hhÉhžhubhÛ)”}”(hXáSyscall User Dispatch brings the filtering of the syscall dispatcher address back to userspace. The application is in control of a flip switch, indicating the current personality of the process. A multiple-personality application can then flip the switch without invoking the kernel, when crossing the compatibility layer API boundaries, to enable/disable the syscall redirection and execute syscalls directly (disabled) or send them to be emulated in userspace through a SIGSYS.”h]”hXáSyscall User Dispatch brings the filtering of the syscall dispatcher address back to userspace. The application is in control of a flip switch, indicating the current personality of the process. A multiple-personality application can then flip the switch without invoking the kernel, when crossing the compatibility layer API boundaries, to enable/disable the syscall redirection and execute syscalls directly (disabled) or send them to be emulated in userspace through a SIGSYS.”…””}”(hhêhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÚhŸh³h KhhÉhžhubhÛ)”}”(hXqThe goal of this design is to provide very quick compatibility layer boundary crosses, which is achieved by not executing a syscall to change personality every time the compatibility layer executes. Instead, a userspace memory region exposed to the kernel indicates the current personality, and the application simply modifies that variable to configure the mechanism.”h]”hXqThe goal of this design is to provide very quick compatibility layer boundary crosses, which is achieved by not executing a syscall to change personality every time the compatibility layer executes. Instead, a userspace memory region exposed to the kernel indicates the current personality, and the application simply modifies that variable to configure the mechanism.”…””}”(hhøhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÚhŸh³h KhhÉhžhubhÛ)”}”(hXThere is a relatively high cost associated with handling signals on most architectures, like x86, but at least for Wine, syscalls issued by native Windows code are currently not known to be a performance problem, since they are quite rare, at least for modern gaming applications.”h]”hXThere is a relatively high cost associated with handling signals on most architectures, like x86, but at least for Wine, syscalls issued by native Windows code are currently not known to be a performance problem, since they are quite rare, at least for modern gaming applications.”…””}”(hjhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÚhŸh³h K"hhÉhžhubhÛ)”}”(hXPSince this mechanism is designed to capture syscalls issued by non-native applications, it must function on syscalls whose invocation ABI is completely unexpected to Linux. Syscall User Dispatch, therefore doesn't rely on any of the syscall ABI to make the filtering. It uses only the syscall dispatcher address and the userspace key.”h]”hXRSince this mechanism is designed to capture syscalls issued by non-native applications, it must function on syscalls whose invocation ABI is completely unexpected to Linux. Syscall User Dispatch, therefore doesn’t rely on any of the syscall ABI to make the filtering. It uses only the syscall dispatcher address and the userspace key.”…””}”(hjhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÚhŸh³h K'hhÉhžhubhÛ)”}”(hŒŠAs the ABI of these intercepted syscalls is unknown to Linux, these syscalls are not instrumentable via ptrace or the syscall tracepoints.”h]”hŒŠAs the ABI of these intercepted syscalls is unknown to Linux, these syscalls are not instrumentable via ptrace or the syscall tracepoints.”…””}”(hj"hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÚhŸh³h K-hhÉhžhubeh}”(h]”Œ background”ah ]”h"]”Œ background”ah$]”h&]”uh1h´hh¶hžhhŸh³h Kubhµ)”}”(hhh]”(hº)”}”(hŒ Interface”h]”hŒ Interface”…””}”(hj;hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h¹hj8hžhhŸh³h K1ubhÛ)”}”(hŒXA thread can setup this mechanism on supported kernels by executing the following prctl:”h]”hŒXA thread can setup this mechanism on supported kernels by executing the following prctl:”…””}”(hjIhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÚhŸh³h K3hj8hžhubhŒ block_quote”“”)”}”(hŒJprctl(PR_SET_SYSCALL_USER_DISPATCH, , , , [selector]) ”h]”hÛ)”}”(hŒIprctl(PR_SET_SYSCALL_USER_DISPATCH, , , , [selector])”h]”hŒIprctl(PR_SET_SYSCALL_USER_DISPATCH, , , , [selector])”…””}”(hj]hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÚhŸh³h K6hjYubah}”(h]”h ]”h"]”h$]”h&]”uh1jWhŸh³h K6hj8hžhubhÛ)”}”(hŒà is either PR_SYS_DISPATCH_EXCLUSIVE_ON/PR_SYS_DISPATCH_INCLUSIVE_ON or PR_SYS_DISPATCH_OFF, to enable and disable the mechanism globally for that thread. When PR_SYS_DISPATCH_OFF is used, the other fields must be zero.”h]”hŒà is either PR_SYS_DISPATCH_EXCLUSIVE_ON/PR_SYS_DISPATCH_INCLUSIVE_ON or PR_SYS_DISPATCH_OFF, to enable and disable the mechanism globally for that thread. When PR_SYS_DISPATCH_OFF is used, the other fields must be zero.”…””}”(hjqhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÚhŸh³h K8hj8hžhubhÛ)”}”(hXˆFor PR_SYS_DISPATCH_EXCLUSIVE_ON [, +) delimit a memory region interval from which syscalls are always executed directly, regardless of the userspace selector. This provides a fast path for the C library, which includes the most common syscall dispatchers in the native code applications, and also provides a way for the signal handler to return without triggering a nested SIGSYS on (rt\_)sigreturn. Users of this interface should make sure that at least the signal trampoline code is included in this region. In addition, for syscalls that implement the trampoline code on the vDSO, that trampoline is never intercepted.”h]”hXˆFor PR_SYS_DISPATCH_EXCLUSIVE_ON [, +) delimit a memory region interval from which syscalls are always executed directly, regardless of the userspace selector. This provides a fast path for the C library, which includes the most common syscall dispatchers in the native code applications, and also provides a way for the signal handler to return without triggering a nested SIGSYS on (rt_)sigreturn. Users of this interface should make sure that at least the signal trampoline code is included in this region. In addition, for syscalls that implement the trampoline code on the vDSO, that trampoline is never intercepted.”…””}”(hjhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÚhŸh³h K, +) delimit a memory region interval from which syscalls are dispatched based on the userspace selector. Syscalls from outside of the range are always executed directly.”h]”hŒäFor PR_SYS_DISPATCH_INCLUSIVE_ON [, +) delimit a memory region interval from which syscalls are dispatched based on the userspace selector. Syscalls from outside of the range are always executed directly.”…””}”(hjhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÚhŸh³h KFhj8hžhubhÛ)”}”(hX][selector] is a pointer to a char-sized region in the process memory region, that provides a quick way to enable disable syscall redirection thread-wide, without the need to invoke the kernel directly. selector can be set to SYSCALL_DISPATCH_FILTER_ALLOW or SYSCALL_DISPATCH_FILTER_BLOCK. Any other value should terminate the program with a SIGSYS.”h]”hX][selector] is a pointer to a char-sized region in the process memory region, that provides a quick way to enable disable syscall redirection thread-wide, without the need to invoke the kernel directly. selector can be set to SYSCALL_DISPATCH_FILTER_ALLOW or SYSCALL_DISPATCH_FILTER_BLOCK. Any other value should terminate the program with a SIGSYS.”…””}”(hj›hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÚhŸh³h KKhj8hžhubhÛ)”}”(hŒÈAdditionally, a tasks syscall user dispatch configuration can be peeked and poked via the PTRACE_(GET|SET)_SYSCALL_USER_DISPATCH_CONFIG ptrace requests. This is useful for checkpoint/restart software.”h]”hŒÈAdditionally, a tasks syscall user dispatch configuration can be peeked and poked via the PTRACE_(GET|SET)_SYSCALL_USER_DISPATCH_CONFIG ptrace requests. This is useful for checkpoint/restart software.”…””}”(hj©hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÚhŸh³h KQhj8hžhubeh}”(h]”Œ interface”ah ]”h"]”Œ interface”ah$]”h&]”uh1h´hh¶hžhhŸh³h K1ubhµ)”}”(hhh]”(hº)”}”(hŒSecurity Notes”h]”hŒSecurity Notes”…””}”(hjÂhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h¹hj¿hžhhŸh³h KVubhÛ)”}”(hXbSyscall User Dispatch provides functionality for compatibility layers to quickly capture system calls issued by a non-native part of the application, while not impacting the Linux native regions of the process. It is not a mechanism for sandboxing system calls, and it should not be seen as a security mechanism, since it is trivial for a malicious application to subvert the mechanism by jumping to an allowed dispatcher region prior to executing the syscall, or to discover the address and modify the selector value. If the use case requires any kind of security sandboxing, Seccomp should be used instead.”h]”hXbSyscall User Dispatch provides functionality for compatibility layers to quickly capture system calls issued by a non-native part of the application, while not impacting the Linux native regions of the process. It is not a mechanism for sandboxing system calls, and it should not be seen as a security mechanism, since it is trivial for a malicious application to subvert the mechanism by jumping to an allowed dispatcher region prior to executing the syscall, or to discover the address and modify the selector value. If the use case requires any kind of security sandboxing, Seccomp should be used instead.”…””}”(hjÐhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÚhŸh³h KXhj¿hžhubhÛ)”}”(hŒUAny fork or exec of the existing process resets the mechanism to PR_SYS_DISPATCH_OFF.”h]”hŒUAny fork or exec of the existing process resets the mechanism to PR_SYS_DISPATCH_OFF.”…””}”(hjÞhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÚhŸh³h Kbhj¿hžhubeh}”(h]”Œsecurity-notes”ah ]”h"]”Œsecurity notes”ah$]”h&]”uh1h´hh¶hžhhŸh³h KVubeh}”(h]”Œsyscall-user-dispatch”ah ]”h"]”Œsyscall user dispatch”ah$]”h&]”uh1h´hhhžhhŸh³h Kubeh}”(h]”h ]”h"]”h$]”h&]”Œsource”h³uh1hŒcurrent_source”NŒ current_line”NŒsettings”Œdocutils.frontend”ŒValues”“”)”}”(h¹NŒ generator”NŒ datestamp”NŒ source_link”NŒ source_url”NŒ toc_backlinks”Œentry”Œfootnote_backlinks”KŒ sectnum_xform”KŒstrip_comments”NŒstrip_elements_with_classes”NŒ strip_classes”NŒ report_level”KŒ halt_level”KŒexit_status_level”KŒdebug”NŒwarning_stream”NŒ traceback”ˆŒinput_encoding”Œ utf-8-sig”Œinput_encoding_error_handler”Œstrict”Œoutput_encoding”Œutf-8”Œoutput_encoding_error_handler”jŒerror_encoding”Œutf-8”Œerror_encoding_error_handler”Œbackslashreplace”Œ language_code”Œen”Œrecord_dependencies”NŒconfig”NŒ id_prefix”hŒauto_id_prefix”Œid”Œ dump_settings”NŒdump_internals”NŒdump_transforms”NŒdump_pseudo_xml”NŒexpose_internals”NŒstrict_visitor”NŒ_disable_config”NŒ_source”h³Œ _destination”NŒ _config_files”]”Œ7/var/lib/git/docbuild/linux/Documentation/docutils.conf”aŒfile_insertion_enabled”ˆŒ raw_enabled”KŒline_length_limit”M'Œpep_references”NŒ pep_base_url”Œhttps://peps.python.org/”Œpep_file_url_template”Œpep-%04d”Œrfc_references”NŒ rfc_base_url”Œ&https://datatracker.ietf.org/doc/html/”Œ tab_width”KŒtrim_footnote_reference_space”‰Œsyntax_highlight”Œlong”Œ smart_quotes”ˆŒsmartquotes_locales”]”Œcharacter_level_inline_markup”‰Œdoctitle_xform”‰Œ docinfo_xform”KŒsectsubtitle_xform”‰Œ image_loading”Œlink”Œembed_stylesheet”‰Œcloak_email_addresses”ˆŒsection_self_link”‰Œenv”NubŒreporter”NŒindirect_targets”]”Œsubstitution_defs”}”Œsubstitution_names”}”Œrefnames”}”Œrefids”}”Œnameids”}”(jùjöj5j2j¼j¹jñjîuŒ nametypes”}”(jù‰j5‰j¼‰jñ‰uh}”(jöh¶j2hÉj¹j8jîj¿uŒ footnote_refs”}”Œ citation_refs”}”Œ autofootnotes”]”Œautofootnote_refs”]”Œsymbol_footnotes”]”Œsymbol_footnote_refs”]”Œ footnotes”]”Œ citations”]”Œautofootnote_start”KŒsymbol_footnote_start”KŒ id_counter”Œ collections”ŒCounter”“”}”…”R”Œparse_messages”]”Œtransform_messages”]”Œ transformer”NŒ include_log”]”Œ decoration”Nhžhub.