.. SPDX-License-Identifier: GPL-2.0

=================
KVM Lock Overview
=================

1. Acquisition Orders
---------------------

The acquisition orders for mutexes are as follows:

- cpus_read_lock() is taken outside kvm_lock

- kvm_usage_lock is taken outside cpus_read_lock()

- kvm->lock is taken outside vcpu->mutex

- kvm->lock is taken outside kvm->slots_lock and kvm->irq_lock

- kvm->slots_lock is taken outside kvm->irq_lock, though acquiring
  them together is quite rare.

- kvm->mn_active_invalidate_count ensures that pairs of
  invalidate_range_start() and invalidate_range_end() callbacks
  use the same memslots array.  kvm->slots_lock and kvm->slots_arch_lock
  are taken on the waiting side when modifying memslots, so MMU notifiers
  must not take either kvm->slots_lock or kvm->slots_arch_lock.
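As a purely illustrative sketch (not an actual KVM code path), the mutex part
of this ordering corresponds to nesting acquisitions as follows; the
``struct kvm`` fields are the real ones, but the function itself is
hypothetical::

    #include <linux/kvm_host.h>

    /* Hypothetical helper: takes the documented chain in order. */
    static void example_take_vm_mutexes(struct kvm *kvm)
    {
            mutex_lock(&kvm->lock);         /* kvm->lock first ...          */
            mutex_lock(&kvm->slots_lock);   /* ... then kvm->slots_lock ... */
            mutex_lock(&kvm->irq_lock);     /* ... then kvm->irq_lock       */

            /* ... work that needs all three locks ... */

            mutex_unlock(&kvm->irq_lock);
            mutex_unlock(&kvm->slots_lock);
            mutex_unlock(&kvm->lock);
    }

vcpu->mutex nests inside kvm->lock in the same way.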
cpus_read_lock() vs kvm_lock:

- Taking cpus_read_lock() outside of kvm_lock is problematic, despite that
  being the official ordering, as it is quite easy to unknowingly trigger
  cpus_read_lock() while holding kvm_lock.  Use caution when walking vm_list,
  e.g. avoid complex operations when possible.

For SRCU:

- ``synchronize_srcu(&kvm->srcu)`` is called inside critical sections
  for kvm->lock, vcpu->mutex and kvm->slots_lock.  These locks _cannot_
  be taken inside a kvm->srcu read-side critical section; that is, the
  following is broken::

      srcu_read_lock(&kvm->srcu);
      mutex_lock(&kvm->slots_lock);

- kvm->slots_arch_lock instead is released before the call to
  ``synchronize_srcu()``.  It _can_ therefore be taken inside a
  kvm->srcu read-side critical section, for example while processing
  a vmexit.
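To make the SRCU rules above concrete, here is a minimal sketch; the
``struct kvm`` fields and the SRCU primitives are real, but the two functions
are hypothetical examples rather than actual KVM code::

    #include <linux/kvm_host.h>

    static void example_update_memslots(struct kvm *kvm)
    {
            /*
             * Correct: kvm->slots_lock is taken *outside* any kvm->srcu
             * read-side critical section, because synchronize_srcu(&kvm->srcu)
             * can be called while it is held.
             */
            mutex_lock(&kvm->slots_lock);
            /* ... install a new memslots array ... */
            mutex_unlock(&kvm->slots_lock);
    }

    static void example_vmexit_handler(struct kvm_vcpu *vcpu)
    {
            struct kvm *kvm = vcpu->kvm;
            int idx;

            idx = srcu_read_lock(&kvm->srcu);

            /*
             * Correct: kvm->slots_arch_lock may be taken here because it is
             * dropped before synchronize_srcu() is called.
             *
             * Broken (can deadlock against a memslot update):
             *      mutex_lock(&kvm->slots_lock);
             */
            mutex_lock(&kvm->slots_arch_lock);
            /* ... modify arch-specific memslot fields ... */
            mutex_unlock(&kvm->slots_arch_lock);

            srcu_read_unlock(&kvm->srcu, idx);
    }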
On x86:

- vcpu->mutex is taken outside kvm->arch.hyperv.hv_lock and kvm->arch.xen.xen_lock

- kvm->arch.mmu_lock is an rwlock; critical sections for
  kvm->arch.tdp_mmu_pages_lock and kvm->arch.mmu_unsync_pages_lock must
  also take kvm->arch.mmu_lock

Everything else is a leaf: no other lock is taken inside the critical
sections.

2. Exception
------------

Fast page fault:

Fast page fault is the fast path which fixes the guest page fault out of
the mmu-lock on x86.  Currently, the page fault can be fast in one of the
following two cases:

1. Access Tracking: The SPTE is not present, but it is marked for access
   tracking.  That means we need to restore the saved R/X bits.  This is
   described in more detail later below.

2. Write-Protection: The SPTE is present and the fault is caused by
   write-protect.  That means we just need to change the W bit of the spte.

What we use to avoid all the races is the Host-writable bit and MMU-writable
bit on the spte:

- Host-writable means the gfn is writable in the host kernel page tables and
  in its KVM memslot.

- MMU-writable means the gfn is writable in the guest's mmu and it is not
  write-protected by shadow page write-protection.

On the fast page fault path, we use cmpxchg to atomically set the spte W bit
if spte.HOST_WRITEABLE = 1 and spte.WRITE_PROTECT = 1, to restore the saved
R/X bits for an access-tracked spte, or both.  This is safe because any
concurrent change to these bits is detected by the cmpxchg.
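The shape of that lockless fixup is roughly the following sketch.  The
``EXAMPLE_*`` masks, their bit positions, and the helper name are placeholder
assumptions for illustration only; the real logic lives in KVM's fast page
fault and SPTE code::

    /* Placeholder bit positions, not the actual SPTE layout. */
    #define EXAMPLE_HOST_WRITABLE   BIT_ULL(57)   /* stand-in for Host-writable */
    #define EXAMPLE_MMU_WRITABLE    BIT_ULL(58)   /* stand-in for MMU-writable  */
    #define EXAMPLE_W_BIT           BIT_ULL(1)    /* stand-in for the W bit     */

    static bool example_fast_pf_fix(u64 *sptep)
    {
            u64 old_spte = READ_ONCE(*sptep);
            u64 new_spte = old_spte;

            /* Case 2 above: write-protected but writable from KVM's viewpoint. */
            if ((old_spte & EXAMPLE_HOST_WRITABLE) &&
                (old_spte & EXAMPLE_MMU_WRITABLE) &&
                !(old_spte & EXAMPLE_W_BIT))
                    new_spte |= EXAMPLE_W_BIT;

            /*
             * The cmpxchg succeeds only if the SPTE was not changed in the
             * meantime, so a concurrent update to these bits is detected and
             * the fast path gives up instead of silently losing it.
             */
            return try_cmpxchg64(sptep, &old_spte, new_spte);
    }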
But we need to carefully check these cases:

1) The mapping from gfn to pfn

The mapping from gfn to pfn may be changed since we can only ensure the pfn
is not changed during cmpxchg.  This is an ABA problem; for example, the
following can happen::

    At the beginning:
        gpte = gfn1
        gfn1 is mapped to pfn1 on host
        spte is the shadow page table entry corresponding with gpte and
        spte = pfn1

    On fast page fault path:

    CPU 0:                                    CPU 1:
    old_spte = *spte;
                                              pfn1 is swapped out:
                                                  spte = 0;
                                              pfn1 is re-allocated for gfn2.
                                              gpte is changed to point to
                                              gfn2 by the guest:
                                                  spte = pfn1;
    if (cmpxchg(spte, old_spte, old_spte+W))
        mark_page_dirty(vcpu->kvm, gfn1)
        OOPS!!!

We dirty-log for gfn1, which means gfn2 is lost in the dirty bitmap.

For direct sp, we can easily avoid it since the spte of direct sp is fixed
to gfn.  For indirect sp, we disabled fast page fault for simplicity.

A solution for indirect sp could be to pin the gfn before the cmpxchg.  After
the pinning:

- We have held the refcount of pfn; that means the pfn can not be freed and
  be reused for another gfn.

- The pfn is writable and therefore it cannot be shared between different
  gfns by KSM.

Then, we can ensure the dirty bitmap is correctly set for the gfn.
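A hypothetical sketch of that pinning approach might look like the following;
KVM does not actually implement it (fast page fault is simply disabled for
indirect sp), and the function name and control flow are illustrative
assumptions, although the kernel helpers called are real::

    static bool example_fixup_with_pin(struct kvm_vcpu *vcpu, gfn_t gfn,
                                       u64 *sptep, u64 old_spte, u64 new_spte)
    {
            struct page *page;
            bool ret;

            /* Pin the pfn so it can be neither freed nor merged by KSM. */
            if (!get_user_page_fast_only(gfn_to_hva(vcpu->kvm, gfn),
                                         FOLL_WRITE, &page))
                    return false;

            ret = try_cmpxchg64(sptep, &old_spte, new_spte);
            if (ret)
                    mark_page_dirty(vcpu->kvm, gfn);  /* gfn still maps to this pfn */

            put_page(page);
            return ret;
    }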
2) Dirty bit tracking

In the original code, the spte can be fast updated (non-atomically) if the
spte is read-only and the Accessed bit has already been set, since the
Accessed bit and Dirty bit can not be lost.

But that no longer holds after fast page fault, because the spte can be
marked writable between reading and updating it, as in the case below::

    At the beginning:
        spte.W = 0
        spte.Accessed = 1

    CPU 0:                                    CPU 1:
    In mmu_spte_update():
        old_spte = *spte;

        /* 'if' condition is satisfied. */
        if (old_spte.Accessed == 1 &&
            old_spte.W == 0)
            spte = new_spte;
                                              on fast page fault path:
                                                  spte.W = 1
                                              memory write on the spte:
                                                  spte.Dirty = 1
        else
            old_spte = xchg(spte, new_spte);
            if (old_spte.Accessed &&
                !new_spte.Accessed)
                flush = true;
            if (old_spte.Dirty &&
                !new_spte.Dirty)
                flush = true;
                OOPS!!!

The Dirty bit is lost in this case.

In order to avoid this kind of issue, we always treat the spte as "volatile"
if it can be updated out of mmu-lock [see spte_needs_atomic_update()]; it
means the spte is always atomically updated in this case.

3) flush tlbs due to spte updated

If the spte is updated from writable to read-only, we should flush all TLBs,
otherwise rmap_write_protect will find a read-only spte, even though the
writable spte might be cached on a CPU's TLB.

As mentioned before, the spte can be updated to writable out of mmu-lock on
the fast page fault path.  In order to easily audit the path, we check in
mmu_spte_update() whether a TLB flush is needed for this reason, since it is
the common function used to update an spte (present -> present).

Since the spte is "volatile" if it can be updated out of mmu-lock, we always
atomically update the spte and the race caused by fast page fault can be
avoided.  See the comments in spte_needs_atomic_update() and
mmu_spte_update().
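The resulting update rule can be sketched as follows; this is a simplified
illustration loosely modelled on mmu_spte_update(), not the exact code, and
the function name is a stand-in (spte_needs_atomic_update() is the real
predicate)::

    static u64 example_update_spte(u64 *sptep, u64 old_spte, u64 new_spte)
    {
            if (!spte_needs_atomic_update(old_spte)) {
                    /* Nobody can change the SPTE behind our back: plain store. */
                    WRITE_ONCE(*sptep, new_spte);
            } else {
                    /*
                     * The SPTE may be changed out of mmu_lock (fast page fault,
                     * hardware A/D updates), so exchange it atomically and use
                     * the value that was actually there, preserving any
                     * Accessed/Dirty bits that were set concurrently.
                     */
                    old_spte = xchg(sptep, new_spte);
            }
            return old_spte;
    }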
Lockless Access Tracking:

This is used for Intel CPUs that are using EPT but do not support the EPT A/D
bits.  In this case, PTEs are tagged as A/D disabled (using ignored bits), and
when the KVM MMU notifier is called to track accesses to a page (via
kvm_mmu_notifier_clear_flush_young), it marks the PTE not-present in hardware
by clearing the RWX bits in the PTE and storing the original R & X bits in
more unused/ignored bits.  When the VM tries to access the page later on, a
fault is generated and the fast page fault mechanism described above is used
to atomically restore the PTE to a Present state.  The W bit is not saved when
the PTE is marked for access tracking; during restoration to the Present
state, the W bit is set depending on whether or not it was a write access.  If
it wasn't, then the W bit will remain clear until a write access happens, at
which time it will be set using the Dirty tracking mechanism described above.
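A hypothetical sketch of the "mark for access tracking" step described above
follows; the mask names and bit positions are illustrative placeholders, not
the actual SPTE layout used by KVM::

    #define EXAMPLE_RWX_MASK     0x7ULL  /* assume R, W, X in the low three bits */
    #define EXAMPLE_RX_MASK      0x5ULL  /* just R and X                         */
    #define EXAMPLE_SAVED_SHIFT  54      /* placeholder "ignored bits" position  */

    static u64 example_mark_spte_for_access_track(u64 spte)
    {
            /* Save the original R and X bits in otherwise-ignored bits ... */
            spte |= (spte & EXAMPLE_RX_MASK) << EXAMPLE_SAVED_SHIFT;

            /* ... then clear RWX so the next access faults and can be tracked. */
            spte &= ~EXAMPLE_RWX_MASK;

            return spte;
    }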
3. Reference
------------

``kvm_lock``
^^^^^^^^^^^^

:Type:		mutex
:Arch:		any
:Protects:	- vm_list

``kvm_usage_lock``
^^^^^^^^^^^^^^^^^^

:Type:		mutex
:Arch:		any
:Protects:	- kvm_usage_count
		- hardware virtualization enable/disable
:Comment:	Exists to allow taking cpus_read_lock() while kvm_usage_count
		is protected, which simplifies the virtualization enabling
		logic.

``kvm->mn_invalidate_lock``
^^^^^^^^^^^^^^^^^^^^^^^^^^^

:Type:		spinlock_t
:Arch:		any
:Protects:	mn_active_invalidate_count, mn_memslots_update_rcuwait

``kvm_arch::tsc_write_lock``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:Type:		raw_spinlock_t
:Arch:		x86
:Protects:	- kvm_arch::{last_tsc_write,last_tsc_nsec,last_tsc_offset}
		- tsc offset in vmcb
:Comment:	'raw' because updating the tsc offsets must not be preempted.

``kvm->mmu_lock``
^^^^^^^^^^^^^^^^^

:Type:		spinlock_t or rwlock_t
:Arch:		any
:Protects:	- shadow page/shadow tlb entry
:Comment:	it is a spinlock since it is used in the mmu notifier.

``kvm->srcu``
^^^^^^^^^^^^^

:Type:		srcu lock
:Arch:		any
:Protects:	- kvm->memslots
		- kvm->buses
:Comment:	The srcu read lock must be held while accessing memslots
		(e.g. when using gfn_to_* functions) and while accessing
		the in-kernel MMIO/PIO address->device structure mapping
		(kvm->buses).  The srcu index can be stored in
		kvm_vcpu->srcu_idx per vcpu if it is needed by multiple
		functions.

``kvm->slots_arch_lock``
^^^^^^^^^^^^^^^^^^^^^^^^

:Type:		mutex
:Arch:		any (only needed on x86 though)
:Protects:	any arch-specific fields of memslots that have to be modified
		in a ``kvm->srcu`` read-side critical section.
:Comment:	must be held before reading the pointer to the current memslots,
		until after all changes to the memslots are complete.

``wakeup_vcpus_on_cpu_lock``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:Type:		spinlock_t
:Arch:		x86
:Protects:	wakeup_vcpus_on_cpu
:Comment:	This is a per-CPU lock and it is used for VT-d posted-interrupts.
		When VT-d posted-interrupts are supported and the VM has assigned
		devices, we put the blocked vCPU on the list blocked_vcpu_on_cpu
		protected by blocked_vcpu_on_cpu_lock.  When VT-d hardware issues
		a wakeup notification event because an external interrupt arrives
		from an assigned device, we find the vCPU on the list and wake it
		up.

``vendor_module_lock``
^^^^^^^^^^^^^^^^^^^^^^

:Type:		mutex
:Arch:		x86
:Protects:	loading a vendor module (kvm_amd or kvm_intel)
:Comment:	Exists because using kvm_lock leads to deadlock.  kvm_lock is taken
		in notifiers, e.g. __kvmclock_cpufreq_notifier(), that may be invoked
		while cpu_hotplug_lock is held, e.g. from cpufreq_boost_trigger_state(),
		and many operations need to take cpu_hotplug_lock when loading a vendor
		module, e.g. updating static calls.