€•ÄžŒsphinx.addnodes”Œdocument”“”)”}”(Œ rawsource”Œ”Œchildren”]”(Œ translations”Œ LanguagesNode”“”)”}”(hhh]”(hŒ pending_xref”“”)”}”(hhh]”Œdocutils.nodes”ŒText”“”ŒChinese (Simplified)”…””}”Œparent”hsbaŒ attributes”}”(Œids”]”Œclasses”]”Œnames”]”Œdupnames”]”Œbackrefs”]”Œ refdomain”Œstd”Œreftype”Œdoc”Œ reftarget”Œ /translations/zh_CN/arch/x86/pti”Œmodname”NŒ classname”NŒ refexplicit”ˆuŒtagname”hhh ubh)”}”(hhh]”hŒChinese (Traditional)”…””}”hh2sbah}”(h]”h ]”h"]”h$]”h&]”Œ refdomain”h)Œreftype”h+Œ reftarget”Œ /translations/zh_TW/arch/x86/pti”Œmodname”NŒ classname”NŒ refexplicit”ˆuh1hhh ubh)”}”(hhh]”hŒItalian”…””}”hhFsbah}”(h]”h ]”h"]”h$]”h&]”Œ refdomain”h)Œreftype”h+Œ reftarget”Œ /translations/it_IT/arch/x86/pti”Œmodname”NŒ classname”NŒ refexplicit”ˆuh1hhh ubh)”}”(hhh]”hŒJapanese”…””}”hhZsbah}”(h]”h ]”h"]”h$]”h&]”Œ refdomain”h)Œreftype”h+Œ reftarget”Œ /translations/ja_JP/arch/x86/pti”Œmodname”NŒ classname”NŒ refexplicit”ˆuh1hhh ubh)”}”(hhh]”hŒKorean”…””}”hhnsbah}”(h]”h ]”h"]”h$]”h&]”Œ refdomain”h)Œreftype”h+Œ reftarget”Œ /translations/ko_KR/arch/x86/pti”Œmodname”NŒ classname”NŒ refexplicit”ˆuh1hhh ubh)”}”(hhh]”hŒPortuguese (Brazilian)”…””}”hh‚sbah}”(h]”h ]”h"]”h$]”h&]”Œ refdomain”h)Œreftype”h+Œ reftarget”Œ /translations/pt_BR/arch/x86/pti”Œmodname”NŒ classname”NŒ refexplicit”ˆuh1hhh ubh)”}”(hhh]”hŒSpanish”…””}”hh–sbah}”(h]”h ]”h"]”h$]”h&]”Œ refdomain”h)Œreftype”h+Œ reftarget”Œ /translations/sp_SP/arch/x86/pti”Œmodname”NŒ classname”NŒ refexplicit”ˆuh1hhh ubeh}”(h]”h ]”h"]”h$]”h&]”Œcurrent_language”ŒEnglish”uh1h hhŒ _document”hŒsource”NŒline”NubhŒcomment”“”)”}”(hŒ SPDX-License-Identifier: GPL-2.0”h]”hŒ SPDX-License-Identifier: GPL-2.0”…””}”hh·sbah}”(h]”h ]”h"]”h$]”h&]”Œ xml:space”Œpreserve”uh1hµhhh²hh³Œ:/var/lib/git/docbuild/linux/Documentation/arch/x86/pti.rst”h´KubhŒsection”“”)”}”(hhh]”(hŒtitle”“”)”}”(hŒPage Table Isolation (PTI)”h]”hŒPage Table Isolation (PTI)”…””}”(hhÏh²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÍhhÊh²hh³hÇh´KubhÉ)”}”(hhh]”(hÎ)”}”(hŒOverview”h]”hŒOverview”…””}”(hhàh²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÍhhÝh²hh³hÇh´KubhŒ paragraph”“”)”}”(hŒ­Page Table Isolation (pti, previously known as KAISER [1]_) is a countermeasure against attacks on the shared user/kernel address space such as the "Meltdown" approach [2]_.”h]”(hŒ6Page Table Isolation (pti, previously known as KAISER ”…””}”(hhðh²hh³Nh´NubhŒfootnote_reference”“”)”}”(hŒ[1]_”h]”hŒ1”…””}”(hhúh²hh³Nh´Nubah}”(h]”Œid1”ah ]”h"]”h$]”h&]”Œrefid”Œid3”Œdocname”Œ arch/x86/pti”uh1høhhðŒresolved”KubhŒr) is a countermeasure against attacks on the shared user/kernel address space such as the “Meltdown†approach ”…””}”(hhðh²hh³Nh´Nubhù)”}”(hŒ[2]_”h]”hŒ2”…””}”(hjh²hh³Nh´Nubah}”(h]”Œid2”ah ]”h"]”h$]”h&]”j Œid4”j j uh1høhhðj KubhŒ.”…””}”(hhðh²hh³Nh´Nubeh}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´K hhÝh²hubhï)”}”(hXFTo mitigate this class of attacks, we create an independent set of page tables for use only when running userspace applications. When the kernel is entered via syscalls, interrupts or exceptions, the page tables are switched to the full "kernel" copy. When the system switches back to user mode, the user copy is used again.”h]”hXJTo mitigate this class of attacks, we create an independent set of page tables for use only when running userspace applications. When the kernel is entered via syscalls, interrupts or exceptions, the page tables are switched to the full “kernel†copy. When the system switches back to user mode, the user copy is used again.”…””}”(hj,h²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´KhhÝh²hubhï)”}”(hXXThe userspace page tables contain only a minimal amount of kernel data: only what is needed to enter/exit the kernel such as the entry/exit functions themselves and the interrupt descriptor table (IDT). There are a few strictly unnecessary things that get mapped such as the first C function when entering an interrupt (see comments in pti.c).”h]”hXXThe userspace page tables contain only a minimal amount of kernel data: only what is needed to enter/exit the kernel such as the entry/exit functions themselves and the interrupt descriptor table (IDT). There are a few strictly unnecessary things that get mapped such as the first C function when entering an interrupt (see comments in pti.c).”…””}”(hj:h²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´KhhÝh²hubhï)”}”(hXYThis approach helps to ensure that side-channel attacks leveraging the paging structures do not function when PTI is enabled. It can be enabled by setting CONFIG_MITIGATION_PAGE_TABLE_ISOLATION=y at compile time. Once enabled at compile-time, it can be disabled at boot with the 'nopti' or 'pti=' kernel parameters (see kernel-parameters.txt).”h]”hXaThis approach helps to ensure that side-channel attacks leveraging the paging structures do not function when PTI is enabled. It can be enabled by setting CONFIG_MITIGATION_PAGE_TABLE_ISOLATION=y at compile time. Once enabled at compile-time, it can be disabled at boot with the ‘nopti’ or ‘pti=’ kernel parameters (see kernel-parameters.txt).”…””}”(hjHh²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´KhhÝh²hubeh}”(h]”Œoverview”ah ]”h"]”Œoverview”ah$]”h&]”uh1hÈhhÊh²hh³hÇh´KubhÉ)”}”(hhh]”(hÎ)”}”(hŒPage Table Management”h]”hŒPage Table Management”…””}”(hjah²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÍhj^h²hh³hÇh´K"ubhï)”}”(hXWhen PTI is enabled, the kernel manages two sets of page tables. The first set is very similar to the single set which is present in kernels without PTI. This includes a complete mapping of userspace that the kernel can use for things like copy_to_user().”h]”hXWhen PTI is enabled, the kernel manages two sets of page tables. The first set is very similar to the single set which is present in kernels without PTI. This includes a complete mapping of userspace that the kernel can use for things like copy_to_user().”…””}”(hjoh²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´K$hj^h²hubhï)”}”(hŒðAlthough _complete_, the user portion of the kernel page tables is crippled by setting the NX bit in the top level. This ensures that any missed kernel->user CR3 switch will immediately crash userspace upon executing its first instruction.”h]”hŒðAlthough _complete_, the user portion of the kernel page tables is crippled by setting the NX bit in the top level. This ensures that any missed kernel->user CR3 switch will immediately crash userspace upon executing its first instruction.”…””}”(hj}h²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´K)hj^h²hubhï)”}”(hX The userspace page tables map only the kernel data needed to enter and exit the kernel. This data is entirely contained in the 'struct cpu_entry_area' structure which is placed in the fixmap which gives each CPU's copy of the area a compile-time-fixed virtual address.”h]”hXThe userspace page tables map only the kernel data needed to enter and exit the kernel. This data is entirely contained in the ‘struct cpu_entry_area’ structure which is placed in the fixmap which gives each CPU’s copy of the area a compile-time-fixed virtual address.”…””}”(hj‹h²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´K.hj^h²hubhï)”}”(hXFor new userspace mappings, the kernel makes the entries in its page tables like normal. The only difference is when the kernel makes entries in the top (PGD) level. In addition to setting the entry in the main kernel PGD, a copy of the entry is made in the userspace page tables' PGD.”h]”hX!For new userspace mappings, the kernel makes the entries in its page tables like normal. The only difference is when the kernel makes entries in the top (PGD) level. In addition to setting the entry in the main kernel PGD, a copy of the entry is made in the userspace page tables’ PGD.”…””}”(hj™h²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´K3hj^h²hubhï)”}”(hŒâThis sharing at the PGD level also inherently shares all the lower layers of the page tables. This leaves a single, shared set of userspace page tables to manage. One PTE to lock, one set of accessed bits, dirty bits, etc...”h]”hŒâThis sharing at the PGD level also inherently shares all the lower layers of the page tables. This leaves a single, shared set of userspace page tables to manage. One PTE to lock, one set of accessed bits, dirty bits, etc...”…””}”(hj§h²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´K9hj^h²hubeh}”(h]”Œpage-table-management”ah ]”h"]”Œpage table management”ah$]”h&]”uh1hÈhhÊh²hh³hÇh´K"ubhÉ)”}”(hhh]”(hÎ)”}”(hŒOverhead”h]”hŒOverhead”…””}”(hjÀh²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÍhj½h²hh³hÇh´K?ubhï)”}”(hŒ\Protection against side-channel attacks is important. But, this protection comes at a cost:”h]”hŒ\Protection against side-channel attacks is important. But, this protection comes at a cost:”…””}”(hjÎh²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´KAhj½h²hubhŒenumerated_list”“”)”}”(hhh]”hŒ list_item”“”)”}”(hŒIncreased Memory Use ”h]”hï)”}”(hŒIncreased Memory Use”h]”hŒIncreased Memory Use”…””}”(hjçh²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´KDhjãubah}”(h]”h ]”h"]”h$]”h&]”uh1jáhjÞh²hh³hÇh´Nubah}”(h]”h ]”h"]”h$]”h&]”Œenumtype”Œarabic”Œprefix”hŒsuffix”Œ.”uh1jÜhj½h²hh³hÇh´KDubhŒ block_quote”“”)”}”(hX[a. Each process now needs an order-1 PGD instead of order-0. (Consumes an additional 4k per process). b. The 'cpu_entry_area' structure must be 2MB in size and 2MB aligned so that it can be mapped by setting a single PMD entry. This consumes nearly 2MB of RAM once the kernel is decompressed, but no space in the kernel image itself. ”h]”jÝ)”}”(hhh]”(jâ)”}”(hŒbEach process now needs an order-1 PGD instead of order-0. (Consumes an additional 4k per process).”h]”hï)”}”(hŒbEach process now needs an order-1 PGD instead of order-0. (Consumes an additional 4k per process).”h]”hŒbEach process now needs an order-1 PGD instead of order-0. (Consumes an additional 4k per process).”…””}”(hjh²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´KFhjubah}”(h]”h ]”h"]”h$]”h&]”uh1jáhj ubjâ)”}”(hŒæThe 'cpu_entry_area' structure must be 2MB in size and 2MB aligned so that it can be mapped by setting a single PMD entry. This consumes nearly 2MB of RAM once the kernel is decompressed, but no space in the kernel image itself. ”h]”hï)”}”(hŒåThe 'cpu_entry_area' structure must be 2MB in size and 2MB aligned so that it can be mapped by setting a single PMD entry. This consumes nearly 2MB of RAM once the kernel is decompressed, but no space in the kernel image itself.”h]”hŒéThe ‘cpu_entry_area’ structure must be 2MB in size and 2MB aligned so that it can be mapped by setting a single PMD entry. This consumes nearly 2MB of RAM once the kernel is decompressed, but no space in the kernel image itself.”…””}”(hj+h²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´KHhj'ubah}”(h]”h ]”h"]”h$]”h&]”uh1jáhj ubeh}”(h]”h ]”h"]”h$]”h&]”jŒ loweralpha”jhjjuh1jÜhjubah}”(h]”h ]”h"]”h$]”h&]”uh1jh³hÇh´KFhj½h²hubjÝ)”}”(hhh]”jâ)”}”(hŒ Runtime Cost ”h]”hï)”}”(hŒ Runtime Cost”h]”hŒ Runtime Cost”…””}”(hjSh²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´KMhjOubah}”(h]”h ]”h"]”h$]”h&]”uh1jáhjLh²hh³hÇh´Nubah}”(h]”h ]”h"]”h$]”h&]”jjjhjjŒstart”Kuh1jÜhj½h²hh³hÇh´KMubj)”}”(hXI a. CR3 manipulation to switch between the page table copies must be done at interrupt, syscall, and exception entry and exit (it can be skipped when the kernel is interrupted, though.) Moves to CR3 are on the order of a hundred cycles, and are required at every entry and exit. b. Percpu TSS is mapped into the user page tables to allow SYSCALL64 path to work under PTI. This doesn't have a direct runtime cost but it can be argued it opens certain timing attack scenarios. c. Global pages are disabled for all kernel structures not mapped into both kernel and userspace page tables. This feature of the MMU allows different processes to share TLB entries mapping the kernel. Losing the feature means more TLB misses after a context switch. The actual loss of performance is very small, however, never exceeding 1%. d. Process Context IDentifiers (PCID) is a CPU feature that allows us to skip flushing the entire TLB when switching page tables by setting a special bit in CR3 when the page tables are changed. This makes switching the page tables (at context switch, or kernel entry/exit) cheaper. But, on systems with PCID support, the context switch code must flush both the user and kernel entries out of the TLB. The user PCID TLB flush is deferred until the exit to userspace, minimizing the cost. See intel.com/sdm for the gory PCID/INVPCID details. e. The userspace page tables must be populated for each new process. Even without PTI, the shared kernel mappings are created by copying top-level (PGD) entries into each new process. But, with PTI, there are now *two* kernel mappings: one in the kernel page tables that maps everything and one for the entry/exit structures. At fork(), we need to copy both. f. In addition to the fork()-time copying, there must also be an update to the userspace PGD any time a set_pgd() is done on a PGD used to map userspace. This ensures that the kernel and userspace copies always map the same userspace memory. g. On systems without PCID support, each CR3 write flushes the entire TLB. That means that each syscall, interrupt or exception flushes the TLB. h. INVPCID is a TLB-flushing instruction which allows flushing of TLB entries for non-current PCIDs. Some systems support PCIDs, but do not support INVPCID. On these systems, addresses can only be flushed from the TLB for the current PCID. When flushing a kernel address, we need to flush all PCIDs, so a single kernel address flush will require a TLB-flushing CR3 write upon the next use of every PCID. ”h]”jÝ)”}”(hhh]”(jâ)”}”(hXCR3 manipulation to switch between the page table copies must be done at interrupt, syscall, and exception entry and exit (it can be skipped when the kernel is interrupted, though.) Moves to CR3 are on the order of a hundred cycles, and are required at every entry and exit.”h]”hï)”}”(hXCR3 manipulation to switch between the page table copies must be done at interrupt, syscall, and exception entry and exit (it can be skipped when the kernel is interrupted, though.) Moves to CR3 are on the order of a hundred cycles, and are required at every entry and exit.”h]”hXCR3 manipulation to switch between the page table copies must be done at interrupt, syscall, and exception entry and exit (it can be skipped when the kernel is interrupted, though.) Moves to CR3 are on the order of a hundred cycles, and are required at every entry and exit.”…””}”(hjyh²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´KOhjuubah}”(h]”h ]”h"]”h$]”h&]”uh1jáhjrubjâ)”}”(hŒÀPercpu TSS is mapped into the user page tables to allow SYSCALL64 path to work under PTI. This doesn't have a direct runtime cost but it can be argued it opens certain timing attack scenarios.”h]”hï)”}”(hŒÀPercpu TSS is mapped into the user page tables to allow SYSCALL64 path to work under PTI. This doesn't have a direct runtime cost but it can be argued it opens certain timing attack scenarios.”h]”hŒÂPercpu TSS is mapped into the user page tables to allow SYSCALL64 path to work under PTI. This doesn’t have a direct runtime cost but it can be argued it opens certain timing attack scenarios.”…””}”(hj‘h²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´KThjubah}”(h]”h ]”h"]”h$]”h&]”uh1jáhjrubjâ)”}”(hXUGlobal pages are disabled for all kernel structures not mapped into both kernel and userspace page tables. This feature of the MMU allows different processes to share TLB entries mapping the kernel. Losing the feature means more TLB misses after a context switch. The actual loss of performance is very small, however, never exceeding 1%.”h]”hï)”}”(hXUGlobal pages are disabled for all kernel structures not mapped into both kernel and userspace page tables. This feature of the MMU allows different processes to share TLB entries mapping the kernel. Losing the feature means more TLB misses after a context switch. The actual loss of performance is very small, however, never exceeding 1%.”h]”hXUGlobal pages are disabled for all kernel structures not mapped into both kernel and userspace page tables. This feature of the MMU allows different processes to share TLB entries mapping the kernel. Losing the feature means more TLB misses after a context switch. The actual loss of performance is very small, however, never exceeding 1%.”…””}”(hj©h²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´KWhj¥ubah}”(h]”h ]”h"]”h$]”h&]”uh1jáhjrubjâ)”}”(hXProcess Context IDentifiers (PCID) is a CPU feature that allows us to skip flushing the entire TLB when switching page tables by setting a special bit in CR3 when the page tables are changed. This makes switching the page tables (at context switch, or kernel entry/exit) cheaper. But, on systems with PCID support, the context switch code must flush both the user and kernel entries out of the TLB. The user PCID TLB flush is deferred until the exit to userspace, minimizing the cost. See intel.com/sdm for the gory PCID/INVPCID details.”h]”hï)”}”(hXProcess Context IDentifiers (PCID) is a CPU feature that allows us to skip flushing the entire TLB when switching page tables by setting a special bit in CR3 when the page tables are changed. This makes switching the page tables (at context switch, or kernel entry/exit) cheaper. But, on systems with PCID support, the context switch code must flush both the user and kernel entries out of the TLB. The user PCID TLB flush is deferred until the exit to userspace, minimizing the cost. See intel.com/sdm for the gory PCID/INVPCID details.”h]”hXProcess Context IDentifiers (PCID) is a CPU feature that allows us to skip flushing the entire TLB when switching page tables by setting a special bit in CR3 when the page tables are changed. This makes switching the page tables (at context switch, or kernel entry/exit) cheaper. But, on systems with PCID support, the context switch code must flush both the user and kernel entries out of the TLB. The user PCID TLB flush is deferred until the exit to userspace, minimizing the cost. See intel.com/sdm for the gory PCID/INVPCID details.”…””}”(hjÁh²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´K]hj½ubah}”(h]”h ]”h"]”h$]”h&]”uh1jáhjrubjâ)”}”(hXfThe userspace page tables must be populated for each new process. Even without PTI, the shared kernel mappings are created by copying top-level (PGD) entries into each new process. But, with PTI, there are now *two* kernel mappings: one in the kernel page tables that maps everything and one for the entry/exit structures. At fork(), we need to copy both.”h]”hï)”}”(hXfThe userspace page tables must be populated for each new process. Even without PTI, the shared kernel mappings are created by copying top-level (PGD) entries into each new process. But, with PTI, there are now *two* kernel mappings: one in the kernel page tables that maps everything and one for the entry/exit structures. At fork(), we need to copy both.”h]”(hŒÔThe userspace page tables must be populated for each new process. Even without PTI, the shared kernel mappings are created by copying top-level (PGD) entries into each new process. But, with PTI, there are now ”…””}”(hjÙh²hh³Nh´NubhŒemphasis”“”)”}”(hŒ*two*”h]”hŒtwo”…””}”(hjãh²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1jáhjÙubhŒ kernel mappings: one in the kernel page tables that maps everything and one for the entry/exit structures. At fork(), we need to copy both.”…””}”(hjÙh²hh³Nh´Nubeh}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´KfhjÕubah}”(h]”h ]”h"]”h$]”h&]”uh1jáhjrubjâ)”}”(hŒïIn addition to the fork()-time copying, there must also be an update to the userspace PGD any time a set_pgd() is done on a PGD used to map userspace. This ensures that the kernel and userspace copies always map the same userspace memory.”h]”hï)”}”(hŒïIn addition to the fork()-time copying, there must also be an update to the userspace PGD any time a set_pgd() is done on a PGD used to map userspace. This ensures that the kernel and userspace copies always map the same userspace memory.”h]”hŒïIn addition to the fork()-time copying, there must also be an update to the userspace PGD any time a set_pgd() is done on a PGD used to map userspace. This ensures that the kernel and userspace copies always map the same userspace memory.”…””}”(hjh²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´Kmhjubah}”(h]”h ]”h"]”h$]”h&]”uh1jáhjrubjâ)”}”(hŒŽOn systems without PCID support, each CR3 write flushes the entire TLB. That means that each syscall, interrupt or exception flushes the TLB.”h]”hï)”}”(hŒŽOn systems without PCID support, each CR3 write flushes the entire TLB. That means that each syscall, interrupt or exception flushes the TLB.”h]”hŒŽOn systems without PCID support, each CR3 write flushes the entire TLB. That means that each syscall, interrupt or exception flushes the TLB.”…””}”(hjh²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´Krhjubah}”(h]”h ]”h"]”h$]”h&]”uh1jáhjrubjâ)”}”(hX”INVPCID is a TLB-flushing instruction which allows flushing of TLB entries for non-current PCIDs. Some systems support PCIDs, but do not support INVPCID. On these systems, addresses can only be flushed from the TLB for the current PCID. When flushing a kernel address, we need to flush all PCIDs, so a single kernel address flush will require a TLB-flushing CR3 write upon the next use of every PCID. ”h]”hï)”}”(hX“INVPCID is a TLB-flushing instruction which allows flushing of TLB entries for non-current PCIDs. Some systems support PCIDs, but do not support INVPCID. On these systems, addresses can only be flushed from the TLB for the current PCID. When flushing a kernel address, we need to flush all PCIDs, so a single kernel address flush will require a TLB-flushing CR3 write upon the next use of every PCID.”h]”hX“INVPCID is a TLB-flushing instruction which allows flushing of TLB entries for non-current PCIDs. Some systems support PCIDs, but do not support INVPCID. On these systems, addresses can only be flushed from the TLB for the current PCID. When flushing a kernel address, we need to flush all PCIDs, so a single kernel address flush will require a TLB-flushing CR3 write upon the next use of every PCID.”…””}”(hj5h²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´Kuhj1ubah}”(h]”h ]”h"]”h$]”h&]”uh1jáhjrubeh}”(h]”h ]”h"]”h$]”h&]”jjEjhjjuh1jÜhjnubah}”(h]”h ]”h"]”h$]”h&]”uh1jh³hÇh´KOhj½h²hubeh}”(h]”Œoverhead”ah ]”h"]”Œoverhead”ah$]”h&]”uh1hÈhhÊh²hh³hÇh´K?ubhÉ)”}”(hhh]”(hÎ)”}”(hŒPossible Future Work”h]”hŒPossible Future Work”…””}”(hj`h²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÍhj]h²hh³hÇh´K~ubjÝ)”}”(hhh]”(jâ)”}”(hŒ^We can be more careful about not actually writing to CR3 unless its value is actually changed.”h]”hï)”}”(hŒ^We can be more careful about not actually writing to CR3 unless its value is actually changed.”h]”hŒ^We can be more careful about not actually writing to CR3 unless its value is actually changed.”…””}”(hjuh²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´Khjqubah}”(h]”h ]”h"]”h$]”h&]”uh1jáhjnh²hh³hÇh´Nubjâ)”}”(hŒTAllow PTI to be enabled/disabled at runtime in addition to the boot-time switching. ”h]”hï)”}”(hŒSAllow PTI to be enabled/disabled at runtime in addition to the boot-time switching.”h]”hŒSAllow PTI to be enabled/disabled at runtime in addition to the boot-time switching.”…””}”(hjh²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´Khj‰ubah}”(h]”h ]”h"]”h$]”h&]”uh1jáhjnh²hh³hÇh´Nubeh}”(h]”h ]”h"]”h$]”h&]”jjjhjjuh1jÜhj]h²hh³hÇh´Kubeh}”(h]”Œpossible-future-work”ah ]”h"]”Œpossible future work”ah$]”h&]”uh1hÈhhÊh²hh³hÇh´K~ubhÉ)”}”(hhh]”(hÎ)”}”(hŒTesting”h]”hŒTesting”…””}”(hj²h²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÍhj¯h²hh³hÇh´K…ubhï)”}”(hŒnTo test stability of PTI, the following test procedure is recommended, ideally doing all of these in parallel:”h]”hŒnTo test stability of PTI, the following test procedure is recommended, ideally doing all of these in parallel:”…””}”(hjÀh²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´K‡hj¯h²hubjÝ)”}”(hhh]”(jâ)”}”(hŒSet CONFIG_DEBUG_ENTRY=y”h]”hï)”}”(hjÓh]”hŒSet CONFIG_DEBUG_ENTRY=y”…””}”(hjÕh²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´KŠhjÑubah}”(h]”h ]”h"]”h$]”h&]”uh1jáhjÎh²hh³hÇh´Nubjâ)”}”(hXLRun several copies of all of the tools/testing/selftests/x86/ tests (excluding MPX and protection_keys) in a loop on multiple CPUs for several minutes. These tests frequently uncover corner cases in the kernel entry code. In general, old kernels might cause these tests themselves to crash, but they should never crash the kernel.”h]”hï)”}”(hXLRun several copies of all of the tools/testing/selftests/x86/ tests (excluding MPX and protection_keys) in a loop on multiple CPUs for several minutes. These tests frequently uncover corner cases in the kernel entry code. In general, old kernels might cause these tests themselves to crash, but they should never crash the kernel.”h]”hXLRun several copies of all of the tools/testing/selftests/x86/ tests (excluding MPX and protection_keys) in a loop on multiple CPUs for several minutes. These tests frequently uncover corner cases in the kernel entry code. In general, old kernels might cause these tests themselves to crash, but they should never crash the kernel.”…””}”(hjìh²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´K‹hjèubah}”(h]”h ]”h"]”h$]”h&]”uh1jáhjÎh²hh³hÇh´Nubjâ)”}”(hX Run the 'perf' tool in a mode (top or record) that generates many frequent performance monitoring non-maskable interrupts (see "NMI" in /proc/interrupts). This exercises the NMI entry/exit code which is known to trigger bugs in code paths that did not expect to be interrupted, including nested NMIs. Using "-c" boosts the rate of NMIs, and using two -c with separate counters encourages nested NMIs and less deterministic behavior. :: while true; do perf record -c 10000 -e instructions,cycles -a sleep 10; done ”h]”(hï)”}”(hXµRun the 'perf' tool in a mode (top or record) that generates many frequent performance monitoring non-maskable interrupts (see "NMI" in /proc/interrupts). This exercises the NMI entry/exit code which is known to trigger bugs in code paths that did not expect to be interrupted, including nested NMIs. Using "-c" boosts the rate of NMIs, and using two -c with separate counters encourages nested NMIs and less deterministic behavior. ::”h]”hX¾Run the ‘perf’ tool in a mode (top or record) that generates many frequent performance monitoring non-maskable interrupts (see “NMI†in /proc/interrupts). This exercises the NMI entry/exit code which is known to trigger bugs in code paths that did not expect to be interrupted, including nested NMIs. Using “-c†boosts the rate of NMIs, and using two -c with separate counters encourages nested NMIs and less deterministic behavior.”…””}”(hjh²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´KhjubhŒ literal_block”“”)”}”(hŒLwhile true; do perf record -c 10000 -e instructions,cycles -a sleep 10; done”h]”hŒLwhile true; do perf record -c 10000 -e instructions,cycles -a sleep 10; done”…””}”hjsbah}”(h]”h ]”h"]”h$]”h&]”hÅhÆuh1jh³hÇh´K™hjubeh}”(h]”h ]”h"]”h$]”h&]”uh1jáhjÎh²hh³hÇh´Nubjâ)”}”(hŒLaunch a KVM virtual machine.”h]”hï)”}”(hj*h]”hŒLaunch a KVM virtual machine.”…””}”(hj,h²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´K›hj(ubah}”(h]”h ]”h"]”h$]”h&]”uh1jáhjÎh²hh³hÇh´Nubjâ)”}”(hŒ†Run 32-bit binaries on systems supporting the SYSCALL instruction. This has been a lightly-tested code path and needs extra scrutiny. ”h]”hï)”}”(hŒ…Run 32-bit binaries on systems supporting the SYSCALL instruction. This has been a lightly-tested code path and needs extra scrutiny.”h]”hŒ…Run 32-bit binaries on systems supporting the SYSCALL instruction. This has been a lightly-tested code path and needs extra scrutiny.”…””}”(hjCh²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´Kœhj?ubah}”(h]”h ]”h"]”h$]”h&]”uh1jáhjÎh²hh³hÇh´Nubeh}”(h]”h ]”h"]”h$]”h&]”jjjhjjuh1jÜhj¯h²hh³hÇh´KŠubeh}”(h]”Œtesting”ah ]”h"]”Œtesting”ah$]”h&]”uh1hÈhhÊh²hh³hÇh´K…ubhÉ)”}”(hhh]”(hÎ)”}”(hŒ Debugging”h]”hŒ Debugging”…””}”(hjhh²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÍhjeh²hh³hÇh´K ubhï)”}”(hŒSBugs in PTI cause a few different signatures of crashes that are worth noting here.”h]”hŒSBugs in PTI cause a few different signatures of crashes that are worth noting here.”…””}”(hjvh²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´K¢hjeh²hubj)”}”(hX§* Failures of the selftests/x86 code. Usually a bug in one of the more obscure corners of entry_64.S * Crashes in early boot, especially around CPU bringup. Bugs in the mappings cause these. * Crashes at the first interrupt. Caused by bugs in entry_64.S, like screwing up a page table switch. Also caused by incorrectly mapping the IRQ handler entry code. * Crashes at the first NMI. The NMI code is separate from main interrupt handlers and can have bugs that do not affect normal interrupts. Also caused by incorrectly mapping NMI code. NMIs that interrupt the entry code must be very careful and can be the cause of crashes that show up when running perf. * Kernel crashes at the first exit to userspace. entry_64.S bugs, or failing to map some of the exit code. * Crashes at first interrupt that interrupts userspace. The paths in entry_64.S that return to userspace are sometimes separate from the ones that return to the kernel. * Double faults: overflowing the kernel stack because of page faults upon page faults. Caused by touching non-pti-mapped data in the entry code, or forgetting to switch to kernel CR3 before calling into C functions which are not pti-mapped. * Userspace segfaults early in boot, sometimes manifesting as mount(8) failing to mount the rootfs. These have tended to be TLB invalidation issues. Usually invalidating the wrong PCID, or otherwise missing an invalidation. ”h]”hŒ bullet_list”“”)”}”(hhh]”(jâ)”}”(hŒcFailures of the selftests/x86 code. Usually a bug in one of the more obscure corners of entry_64.S”h]”hï)”}”(hŒcFailures of the selftests/x86 code. Usually a bug in one of the more obscure corners of entry_64.S”h]”hŒcFailures of the selftests/x86 code. Usually a bug in one of the more obscure corners of entry_64.S”…””}”(hj‘h²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´K¥hjubah}”(h]”h ]”h"]”h$]”h&]”uh1jáhjŠubjâ)”}”(hŒXCrashes in early boot, especially around CPU bringup. Bugs in the mappings cause these.”h]”hï)”}”(hŒXCrashes in early boot, especially around CPU bringup. Bugs in the mappings cause these.”h]”hŒXCrashes in early boot, especially around CPU bringup. Bugs in the mappings cause these.”…””}”(hj©h²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´K§hj¥ubah}”(h]”h ]”h"]”h$]”h&]”uh1jáhjŠubjâ)”}”(hŒ¤Crashes at the first interrupt. Caused by bugs in entry_64.S, like screwing up a page table switch. Also caused by incorrectly mapping the IRQ handler entry code.”h]”hï)”}”(hŒ¤Crashes at the first interrupt. Caused by bugs in entry_64.S, like screwing up a page table switch. Also caused by incorrectly mapping the IRQ handler entry code.”h]”hŒ¤Crashes at the first interrupt. Caused by bugs in entry_64.S, like screwing up a page table switch. Also caused by incorrectly mapping the IRQ handler entry code.”…””}”(hjÁh²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´K©hj½ubah}”(h]”h ]”h"]”h$]”h&]”uh1jáhjŠubjâ)”}”(hX/Crashes at the first NMI. The NMI code is separate from main interrupt handlers and can have bugs that do not affect normal interrupts. Also caused by incorrectly mapping NMI code. NMIs that interrupt the entry code must be very careful and can be the cause of crashes that show up when running perf.”h]”hï)”}”(hX/Crashes at the first NMI. The NMI code is separate from main interrupt handlers and can have bugs that do not affect normal interrupts. Also caused by incorrectly mapping NMI code. NMIs that interrupt the entry code must be very careful and can be the cause of crashes that show up when running perf.”h]”hX/Crashes at the first NMI. The NMI code is separate from main interrupt handlers and can have bugs that do not affect normal interrupts. Also caused by incorrectly mapping NMI code. NMIs that interrupt the entry code must be very careful and can be the cause of crashes that show up when running perf.”…””}”(hjÙh²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´K¬hjÕubah}”(h]”h ]”h"]”h$]”h&]”uh1jáhjŠubjâ)”}”(hŒiKernel crashes at the first exit to userspace. entry_64.S bugs, or failing to map some of the exit code.”h]”hï)”}”(hŒiKernel crashes at the first exit to userspace. entry_64.S bugs, or failing to map some of the exit code.”h]”hŒiKernel crashes at the first exit to userspace. entry_64.S bugs, or failing to map some of the exit code.”…””}”(hjñh²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´K²hjíubah}”(h]”h ]”h"]”h$]”h&]”uh1jáhjŠubjâ)”}”(hŒ¦Crashes at first interrupt that interrupts userspace. The paths in entry_64.S that return to userspace are sometimes separate from the ones that return to the kernel.”h]”hï)”}”(hŒ¦Crashes at first interrupt that interrupts userspace. The paths in entry_64.S that return to userspace are sometimes separate from the ones that return to the kernel.”h]”hŒ¦Crashes at first interrupt that interrupts userspace. The paths in entry_64.S that return to userspace are sometimes separate from the ones that return to the kernel.”…””}”(hj h²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´K´hjubah}”(h]”h ]”h"]”h$]”h&]”uh1jáhjŠubjâ)”}”(hŒïDouble faults: overflowing the kernel stack because of page faults upon page faults. Caused by touching non-pti-mapped data in the entry code, or forgetting to switch to kernel CR3 before calling into C functions which are not pti-mapped.”h]”hï)”}”(hŒïDouble faults: overflowing the kernel stack because of page faults upon page faults. Caused by touching non-pti-mapped data in the entry code, or forgetting to switch to kernel CR3 before calling into C functions which are not pti-mapped.”h]”hŒïDouble faults: overflowing the kernel stack because of page faults upon page faults. Caused by touching non-pti-mapped data in the entry code, or forgetting to switch to kernel CR3 before calling into C functions which are not pti-mapped.”…””}”(hj!h²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´K·hjubah}”(h]”h ]”h"]”h$]”h&]”uh1jáhjŠubjâ)”}”(hŒàUserspace segfaults early in boot, sometimes manifesting as mount(8) failing to mount the rootfs. These have tended to be TLB invalidation issues. Usually invalidating the wrong PCID, or otherwise missing an invalidation. ”h]”hï)”}”(hŒßUserspace segfaults early in boot, sometimes manifesting as mount(8) failing to mount the rootfs. These have tended to be TLB invalidation issues. Usually invalidating the wrong PCID, or otherwise missing an invalidation.”h]”hŒßUserspace segfaults early in boot, sometimes manifesting as mount(8) failing to mount the rootfs. These have tended to be TLB invalidation issues. Usually invalidating the wrong PCID, or otherwise missing an invalidation.”…””}”(hj9h²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´K»hj5ubah}”(h]”h ]”h"]”h$]”h&]”uh1jáhjŠubeh}”(h]”h ]”h"]”h$]”h&]”Œbullet”Œ*”uh1jˆh³hÇh´K¥hj„ubah}”(h]”h ]”h"]”h$]”h&]”uh1jh³hÇh´K¥hjeh²hubhŒfootnote”“”)”}”(hŒ!https://gruss.cc/files/kaiser.pdf”h]”(hŒlabel”“”)”}”(hŒ1”h]”hŒ1”…””}”(hjch²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1jahj]ubhï)”}”(hj_h]”hŒ reference”“”)”}”(hj_h]”hŒ!https://gruss.cc/files/kaiser.pdf”…””}”(hjvh²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”Œrefuri”j_uh1jthjqubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´KÀhj]ubeh}”(h]”j ah ]”h"]”Œ1”ah$]”h&]”jaj j uh1j[h³hÇh´KÀhjeh²hj Kubj\)”}”(hŒ'https://meltdownattack.com/meltdown.pdf”h]”(jb)”}”(hŒ2”h]”hŒ2”…””}”(hj•h²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1jahj‘ubhï)”}”(hj“h]”ju)”}”(hj“h]”hŒ'https://meltdownattack.com/meltdown.pdf”…””}”(hj¦h²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”Œrefuri”j“uh1jthj£ubah}”(h]”h ]”h"]”h$]”h&]”uh1hîh³hÇh´KÁhj‘ubeh}”(h]”j!ah ]”h"]”Œ2”ah$]”h&]”jaj j uh1j[h³hÇh´KÁhjeh²hj Kubeh}”(h]”Œ debugging”ah ]”h"]”Œ debugging”ah$]”h&]”uh1hÈhhÊh²hh³hÇh´K ubeh}”(h]”Œpage-table-isolation-pti”ah ]”h"]”Œpage table isolation (pti)”ah$]”h&]”uh1hÈhhh²hh³hÇh´Kubeh}”(h]”h ]”h"]”h$]”h&]”Œsource”hÇuh1hŒcurrent_source”NŒ current_line”NŒsettings”Œdocutils.frontend”ŒValues”“”)”}”(hÍNŒ generator”NŒ datestamp”NŒ source_link”NŒ source_url”NŒ toc_backlinks”Œentry”Œfootnote_backlinks”KŒ sectnum_xform”KŒstrip_comments”NŒstrip_elements_with_classes”NŒ strip_classes”NŒ report_level”KŒ halt_level”KŒexit_status_level”KŒdebug”NŒwarning_stream”NŒ traceback”ˆŒinput_encoding”Œ utf-8-sig”Œinput_encoding_error_handler”Œstrict”Œoutput_encoding”Œutf-8”Œoutput_encoding_error_handler”jôŒerror_encoding”Œutf-8”Œerror_encoding_error_handler”Œbackslashreplace”Œ language_code”Œen”Œrecord_dependencies”NŒconfig”NŒ id_prefix”hŒauto_id_prefix”Œid”Œ dump_settings”NŒdump_internals”NŒdump_transforms”NŒdump_pseudo_xml”NŒexpose_internals”NŒstrict_visitor”NŒ_disable_config”NŒ_source”hÇŒ _destination”NŒ _config_files”]”Œ7/var/lib/git/docbuild/linux/Documentation/docutils.conf”aŒfile_insertion_enabled”ˆŒ raw_enabled”KŒline_length_limit”M'Œpep_references”NŒ pep_base_url”Œhttps://peps.python.org/”Œpep_file_url_template”Œpep-%04d”Œrfc_references”NŒ rfc_base_url”Œ&https://datatracker.ietf.org/doc/html/”Œ tab_width”KŒtrim_footnote_reference_space”‰Œsyntax_highlight”Œlong”Œ smart_quotes”ˆŒsmartquotes_locales”]”Œcharacter_level_inline_markup”‰Œdoctitle_xform”‰Œ docinfo_xform”KŒsectsubtitle_xform”‰Œ image_loading”Œlink”Œembed_stylesheet”‰Œcloak_email_addresses”ˆŒsection_self_link”‰Œenv”NubŒreporter”NŒindirect_targets”]”Œsubstitution_defs”}”Œsubstitution_names”}”Œrefnames”}”(Œ1”]”húaŒ2”]”jauŒrefids”}”Œnameids”}”(jÎjËj[jXjºj·jZjWj¬j©jbj_jÆjÃjŽj j¾j!uŒ nametypes”}”(jΉj[‰jº‰jZ‰j¬‰jb‰jƉjŽˆj¾ˆuh}”(jËhÊjXhÝjhújjj·j^jWj½j©j]j_j¯jÃjej j]j!j‘uŒ footnote_refs”}”(j4]”húaj6]”jauŒ citation_refs”}”Œ autofootnotes”]”Œautofootnote_refs”]”Œsymbol_footnotes”]”Œsymbol_footnote_refs”]”Œ footnotes”]”(j]j‘eŒ citations”]”Œautofootnote_start”KŒsymbol_footnote_start”KŒ id_counter”Œ collections”ŒCounter”“”}”jKs…”R”Œparse_messages”]”hŒsystem_message”“”)”}”(hhh]”hï)”}”(hŒ:Enumerated list start value not ordinal-1: "2" (ordinal 2)”h]”hŒ>Enumerated list start value not ordinal-1: “2†(ordinal 2)”…””}”(hjah²hh³Nh´Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hîhj^ubah}”(h]”h ]”h"]”h$]”h&]”Œlevel”KŒtype”ŒINFO”Œsource”hÇŒline”Kuh1j\hj½h²hh³hÇh´KMubaŒtransform_messages”]”Œ transformer”NŒ include_log”]”Œ decoration”Nh²hub.