sphinx.addnodesdocument)}( rawsourcechildren]( translations LanguagesNode)}(hhh](h pending_xref)}(hhh]docutils.nodesTextChinese (Simplified)}parenthsba attributes}(ids]classes]names]dupnames]backrefs] refdomainstdreftypedoc reftarget$/translations/zh_CN/virt/kvm/lockingmodnameN classnameN refexplicitutagnamehhh ubh)}(hhh]hChinese (Traditional)}hh2sbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftarget$/translations/zh_TW/virt/kvm/lockingmodnameN classnameN refexplicituh1hhh ubh)}(hhh]hItalian}hhFsbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftarget$/translations/it_IT/virt/kvm/lockingmodnameN classnameN refexplicituh1hhh ubh)}(hhh]hJapanese}hhZsbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftarget$/translations/ja_JP/virt/kvm/lockingmodnameN classnameN refexplicituh1hhh ubh)}(hhh]hKorean}hhnsbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftarget$/translations/ko_KR/virt/kvm/lockingmodnameN classnameN refexplicituh1hhh ubh)}(hhh]hPortuguese (Brazilian)}hhsbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftarget$/translations/pt_BR/virt/kvm/lockingmodnameN classnameN refexplicituh1hhh ubh)}(hhh]hSpanish}hhsbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftarget$/translations/sp_SP/virt/kvm/lockingmodnameN classnameN refexplicituh1hhh ubeh}(h]h ]h"]h$]h&]current_languageEnglishuh1h hh _documenthsourceNlineNubhcomment)}(h SPDX-License-Identifier: GPL-2.0h]h SPDX-License-Identifier: GPL-2.0}hhsbah}(h]h ]h"]h$]h&] xml:spacepreserveuh1hhhhhh>/var/lib/git/docbuild/linux/Documentation/virt/kvm/locking.rsthKubhsection)}(hhh](htitle)}(hKVM Lock Overviewh]hKVM Lock Overview}(hhhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhhhhhKubh)}(hhh](h)}(h1. Acquisition Ordersh]h1. Acquisition Orders}(hhhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhhhhhKubh paragraph)}(h2The acquisition orders for mutexes are as follows:h]h2The acquisition orders for mutexes are as follows:}(hhhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK hhhhubh bullet_list)}(hhh](h list_item)}(h+cpus_read_lock() is taken outside kvm_lock h]h)}(h*cpus_read_lock() is taken outside kvm_lockh]h*cpus_read_lock() is taken outside kvm_lock}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK hjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(h1kvm_usage_lock is taken outside cpus_read_lock() h]h)}(h0kvm_usage_lock is taken outside cpus_read_lock()h]h0kvm_usage_lock is taken outside cpus_read_lock()}(hj!hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(h'kvm->lock is taken outside vcpu->mutex h]h)}(h&kvm->lock is taken outside vcpu->mutexh]h&kvm->lock is taken outside vcpu->mutex}(hj9hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhj5ubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(h=kvm->lock is taken outside kvm->slots_lock and kvm->irq_lock h]h)}(hlock is taken outside kvm->slots_lock and kvm->irq_lockh]hlock is taken outside kvm->slots_lock and kvm->irq_lock}(hjQhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjMubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(hFvcpu->mutex is taken outside kvm->slots_lock and kvm->slots_arch_lock h]h)}(hEvcpu->mutex is taken outside kvm->slots_lock and kvm->slots_arch_lockh]hEvcpu->mutex is taken outside kvm->slots_lock and kvm->slots_arch_lock}(hjihhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjeubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(h^kvm->slots_lock is taken outside kvm->irq_lock, though acquiring them together is quite rare. h]h)}(h]kvm->slots_lock is taken outside kvm->irq_lock, though acquiring them together is quite rare.h]h]kvm->slots_lock is taken outside kvm->irq_lock, though acquiring them together is quite rare.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhj}ubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(hXAkvm->mn_active_invalidate_count ensures that pairs of invalidate_range_start() and invalidate_range_end() callbacks use the same memslots array. kvm->slots_lock and kvm->slots_arch_lock are taken on the waiting side when modifying memslots, so MMU notifiers must not take either kvm->slots_lock or kvm->slots_arch_lock. h]h)}(hX@kvm->mn_active_invalidate_count ensures that pairs of invalidate_range_start() and invalidate_range_end() callbacks use the same memslots array. kvm->slots_lock and kvm->slots_arch_lock are taken on the waiting side when modifying memslots, so MMU notifiers must not take either kvm->slots_lock or kvm->slots_arch_lock.h]hX@kvm->mn_active_invalidate_count ensures that pairs of invalidate_range_start() and invalidate_range_end() callbacks use the same memslots array. kvm->slots_lock and kvm->slots_arch_lock are taken on the waiting side when modifying memslots, so MMU notifiers must not take either kvm->slots_lock or kvm->slots_arch_lock.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubeh}(h]h ]h"]h$]h&]bullet-uh1hhhhK hhhhubh)}(hcpus_read_lock() vs kvm_lock:h]hcpus_read_lock() vs kvm_lock:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhhhhubh)}(hhh]j)}(hX Taking cpus_read_lock() outside of kvm_lock is problematic, despite that being the official ordering, as it is quite easy to unknowingly trigger cpus_read_lock() while holding kvm_lock. Use caution when walking vm_list, e.g. avoid complex operations when possible. h]h)}(hX Taking cpus_read_lock() outside of kvm_lock is problematic, despite that being the official ordering, as it is quite easy to unknowingly trigger cpus_read_lock() while holding kvm_lock. Use caution when walking vm_list, e.g. avoid complex operations when possible.h]hX Taking cpus_read_lock() outside of kvm_lock is problematic, despite that being the official ordering, as it is quite easy to unknowingly trigger cpus_read_lock() while holding kvm_lock. Use caution when walking vm_list, e.g. avoid complex operations when possible.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK!hjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubah}(h]h ]h"]h$]h&]jjuh1hhhhK!hhhhubh)}(h For SRCU:h]h For SRCU:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK&hhhhubh)}(hhh](j)}(hX(``synchronize_srcu(&kvm->srcu)`` is called inside critical sections for kvm->lock, vcpu->mutex and kvm->slots_lock. These locks _cannot_ be taken inside a kvm->srcu read-side critical section; that is, the following is broken:: srcu_read_lock(&kvm->srcu); mutex_lock(&kvm->slots_lock); h](h)}(h``synchronize_srcu(&kvm->srcu)`` is called inside critical sections for kvm->lock, vcpu->mutex and kvm->slots_lock. These locks _cannot_ be taken inside a kvm->srcu read-side critical section; that is, the following is broken::h](hliteral)}(h ``synchronize_srcu(&kvm->srcu)``h]hsynchronize_srcu(&kvm->srcu)}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh is called inside critical sections for kvm->lock, vcpu->mutex and kvm->slots_lock. These locks _cannot_ be taken inside a kvm->srcu read-side critical section; that is, the following is broken:}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhK(hjubh literal_block)}(h9srcu_read_lock(&kvm->srcu); mutex_lock(&kvm->slots_lock);h]h9srcu_read_lock(&kvm->srcu); mutex_lock(&kvm->slots_lock);}hjsbah}(h]h ]h"]h$]h&]hhuh1jhhhK-hjubeh}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(hkvm->slots_arch_lock instead is released before the call to ``synchronize_srcu()``. It _can_ therefore be taken inside a kvm->srcu read-side critical section, for example while processing a vmexit. h]h)}(hkvm->slots_arch_lock instead is released before the call to ``synchronize_srcu()``. It _can_ therefore be taken inside a kvm->srcu read-side critical section, for example while processing a vmexit.h](hslots_arch_lock instead is released before the call to }(hj1hhhNhNubj)}(h``synchronize_srcu()``h]hsynchronize_srcu()}(hj9hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj1ubht. It _can_ therefore be taken inside a kvm->srcu read-side critical section, for example while processing a vmexit.}(hj1hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhK0hj-ubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubeh}(h]h ]h"]h$]h&]jjuh1hhhhK(hhhhubh)}(hOn x86:h]hOn x86:}(hj]hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK5hhhhubh)}(hhh](j)}(hQvcpu->mutex is taken outside kvm->arch.hyperv.hv_lock and kvm->arch.xen.xen_lock h]h)}(hPvcpu->mutex is taken outside kvm->arch.hyperv.hv_lock and kvm->arch.xen.xen_lockh]hPvcpu->mutex is taken outside kvm->arch.hyperv.hv_lock and kvm->arch.xen.xen_lock}(hjrhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK7hjnubah}(h]h ]h"]h$]h&]uh1jhjkhhhhhNubj)}(hkvm->arch.mmu_lock is an rwlock; critical sections for kvm->arch.tdp_mmu_pages_lock and kvm->arch.mmu_unsync_pages_lock must also take kvm->arch.mmu_lock h]h)}(hkvm->arch.mmu_lock is an rwlock; critical sections for kvm->arch.tdp_mmu_pages_lock and kvm->arch.mmu_unsync_pages_lock must also take kvm->arch.mmu_lockh]hkvm->arch.mmu_lock is an rwlock; critical sections for kvm->arch.tdp_mmu_pages_lock and kvm->arch.mmu_unsync_pages_lock must also take kvm->arch.mmu_lock}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK9hjubah}(h]h ]h"]h$]h&]uh1jhjkhhhhhNubeh}(h]h ]h"]h$]h&]jjuh1hhhhK7hhhhubh)}(hOEverything else is a leaf: no other lock is taken inside the critical sections.h]hOEverything else is a leaf: no other lock is taken inside the critical sections.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK=hhhhubeh}(h]acquisition-ordersah ]h"]1. acquisition ordersah$]h&]uh1hhhhhhhhKubh)}(hhh](h)}(h 2. Exceptionh]h 2. Exception}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhhhKAubh)}(hFast page fault:h]hFast page fault:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKChjhhubh)}(hFast page fault is the fast path which fixes the guest page fault out of the mmu-lock on x86. Currently, the page fault can be fast in one of the following two cases:h]hFast page fault is the fast path which fixes the guest page fault out of the mmu-lock on x86. Currently, the page fault can be fast in one of the following two cases:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKEhjhhubhenumerated_list)}(hhh](j)}(hAccess Tracking: The SPTE is not present, but it is marked for access tracking. That means we need to restore the saved R/X bits. This is described in more detail later below. h]h)}(hAccess Tracking: The SPTE is not present, but it is marked for access tracking. That means we need to restore the saved R/X bits. This is described in more detail later below.h]hAccess Tracking: The SPTE is not present, but it is marked for access tracking. That means we need to restore the saved R/X bits. This is described in more detail later below.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKIhjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(hWrite-Protection: The SPTE is present and the fault is caused by write-protect. That means we just need to change the W bit of the spte. h]h)}(hWrite-Protection: The SPTE is present and the fault is caused by write-protect. That means we just need to change the W bit of the spte.h]hWrite-Protection: The SPTE is present and the fault is caused by write-protect. That means we just need to change the W bit of the spte.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKMhjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubeh}(h]h ]h"]h$]h&]enumtypearabicprefixhsuffix.uh1jhjhhhhhKIubh)}(h]What we use to avoid all the races is the Host-writable bit and MMU-writable bit on the spte:h]h]What we use to avoid all the races is the Host-writable bit and MMU-writable bit on the spte:}(hj'hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKPhjhhubh)}(hhh](j)}(h^Host-writable means the gfn is writable in the host kernel page tables and in its KVM memslot.h]h)}(h^Host-writable means the gfn is writable in the host kernel page tables and in its KVM memslot.h]h^Host-writable means the gfn is writable in the host kernel page tables and in its KVM memslot.}(hj<hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKShj8ubah}(h]h ]h"]h$]h&]uh1jhj5hhhhhNubj)}(hyMMU-writable means the gfn is writable in the guest's mmu and it is not write-protected by shadow page write-protection. h]h)}(hxMMU-writable means the gfn is writable in the guest's mmu and it is not write-protected by shadow page write-protection.h]hzMMU-writable means the gfn is writable in the guest’s mmu and it is not write-protected by shadow page write-protection.}(hjThhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKUhjPubah}(h]h ]h"]h$]h&]uh1jhj5hhhhhNubeh}(h]h ]h"]h$]h&]jjuh1hhhhKShjhhubh)}(hXOn fast page fault path, we will use cmpxchg to atomically set the spte W bit if spte.HOST_WRITEABLE = 1 and spte.WRITE_PROTECT = 1, to restore the saved R/X bits if for an access-traced spte, or both. This is safe because whenever changing these bits can be detected by cmpxchg.h]hXOn fast page fault path, we will use cmpxchg to atomically set the spte W bit if spte.HOST_WRITEABLE = 1 and spte.WRITE_PROTECT = 1, to restore the saved R/X bits if for an access-traced spte, or both. This is safe because whenever changing these bits can be detected by cmpxchg.}(hjnhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKXhjhhubh)}(h(But we need carefully check these cases:h]h(But we need carefully check these cases:}(hj|hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK]hjhhubj)}(hhh]j)}(hThe mapping from gfn to pfn h]h)}(hThe mapping from gfn to pfnh]hThe mapping from gfn to pfn}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK_hjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubah}(h]h ]h"]h$]h&]j"j#j$hj%)uh1jhjhhhhhK_ubh)}(hThe mapping from gfn to pfn may be changed since we can only ensure the pfn is not changed during cmpxchg. This is a ABA problem, for example, below case will happen:h]hThe mapping from gfn to pfn may be changed since we can only ensure the pfn is not changed during cmpxchg. This is a ABA problem, for example, below case will happen:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKahjhhubhtable)}(hhh]htgroup)}(hhh](hcolspec)}(hhh]h}(h]h ]h"]h$]h&]colwidthK$uh1jhjubj)}(hhh]h}(h]h ]h"]h$]h&]colwidthK#uh1jhjubhtbody)}(hhh](hrow)}(hhh]hentry)}(hhh](h)}(hAt the beginning::h]hAt the beginning:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKfhjubj)}(hvgpte = gfn1 gfn1 is mapped to pfn1 on host spte is the shadow page table entry corresponding with gpte and spte = pfn1h]hvgpte = gfn1 gfn1 is mapped to pfn1 on host spte is the shadow page table entry corresponding with gpte and spte = pfn1}hjsbah}(h]h ]h"]h$]h&]hhuh1jhhhKhhjubeh}(h]h ]h"]h$]h&]morecolsKuh1jhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(hhh]h)}(hOn fast page fault path:h]hOn fast page fault path:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKmhjubah}(h]h ]h"]h$]h&]morecolsKuh1jhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(hCPU 0:h]hCPU 0:}(hj9hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKohj6ubah}(h]h ]h"]h$]h&]uh1jhj3ubj)}(hhh]h)}(hCPU 1:h]hCPU 1:}(hjPhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKohjMubah}(h]h ]h"]h$]h&]uh1jhj3ubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]j)}(hold_spte = *spte;h]hold_spte = *spte;}hjpsbah}(h]h ]h"]h$]h&]hhuh1jhhhKshjmubah}(h]h ]h"]h$]h&]uh1jhjjubj)}(hhh]h}(h]h ]h"]h$]h&]uh1jhjjubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](h)}(hpfn1 is swapped out::h]hpfn1 is swapped out:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKuhjubj)}(h spte = 0;h]h spte = 0;}hjsbah}(h]h ]h"]h$]h&]hhuh1jhhhKwhjubh)}(hpfn1 is re-alloced for gfn2.h]hpfn1 is re-alloced for gfn2.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKyhjubh)}(h/gpte is changed to point to gfn2 by the guest::h]h.gpte is changed to point to gfn2 by the guest:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK{hjubj)}(h spte = pfn1;h]h spte = pfn1;}hjsbah}(h]h ]h"]h$]h&]hhuh1jhhhK~hjubeh}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]j)}(hhh]j)}(h]if (cmpxchg(spte, old_spte, old_spte+W) mark_page_dirty(vcpu->kvm, gfn1) OOPS!!!h]h]if (cmpxchg(spte, old_spte, old_spte+W) mark_page_dirty(vcpu->kvm, gfn1) OOPS!!!}hjsbah}(h]h ]h"]h$]h&]hhuh1jhhhKhjubah}(h]h ]h"]h$]h&]morecolsKuh1jhjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]colsKuh1jhjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubh)}(h?We dirty-log for gfn1, that means gfn2 is lost in dirty-bitmap.h]h?We dirty-log for gfn1, that means gfn2 is lost in dirty-bitmap.}(hj(hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hFor direct sp, we can easily avoid it since the spte of direct sp is fixed to gfn. For indirect sp, we disabled fast page fault for simplicity.h]hFor direct sp, we can easily avoid it since the spte of direct sp is fixed to gfn. For indirect sp, we disabled fast page fault for simplicity.}(hj6hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hZA solution for indirect sp could be to pin the gfn before the cmpxchg. After the pinning:h]hZA solution for indirect sp could be to pin the gfn before the cmpxchg. After the pinning:}(hjDhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hhh](j)}(hdWe have held the refcount of pfn; that means the pfn can not be freed and be reused for another gfn.h]h)}(hdWe have held the refcount of pfn; that means the pfn can not be freed and be reused for another gfn.h]hdWe have held the refcount of pfn; that means the pfn can not be freed and be reused for another gfn.}(hjYhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjUubah}(h]h ]h"]h$]h&]uh1jhjRhhhhhNubj)}(hUThe pfn is writable and therefore it cannot be shared between different gfns by KSM. h]h)}(hTThe pfn is writable and therefore it cannot be shared between different gfns by KSM.h]hTThe pfn is writable and therefore it cannot be shared between different gfns by KSM.}(hjqhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjmubah}(h]h ]h"]h$]h&]uh1jhjRhhhhhNubeh}(h]h ]h"]h$]h&]jjuh1hhhhKhjhhubh)}(hAThen, we can ensure the dirty bitmaps is correctly set for a gfn.h]hAThen, we can ensure the dirty bitmaps is correctly set for a gfn.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubj)}(hhh]j)}(hDirty bit tracking h]h)}(hDirty bit trackingh]hDirty bit tracking}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubah}(h]h ]h"]h$]h&]j"j#j$hj%jstartKuh1jhjhhhhhKubh)}(hIn the original code, the spte can be fast updated (non-atomically) if the spte is read-only and the Accessed bit has already been set since the Accessed bit and Dirty bit can not be lost.h]hIn the original code, the spte can be fast updated (non-atomically) if the spte is read-only and the Accessed bit has already been set since the Accessed bit and Dirty bit can not be lost.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hBut it is not true after fast page fault since the spte can be marked writable between reading spte and updating spte. Like below case:h]hBut it is not true after fast page fault since the spte can be marked writable between reading spte and updating spte. Like below case:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubj)}(hhh]j)}(hhh](j)}(hhh]h}(h]h ]h"]h$]h&]colwidthK%uh1jhjubj)}(hhh]h}(h]h ]h"]h$]h&]colwidthK#uh1jhjubj)}(hhh](j)}(hhh]j)}(hhh](h)}(hAt the beginning::h]hAt the beginning:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjubj)}(hspte.W = 0 spte.Accessed = 1h]hspte.W = 0 spte.Accessed = 1}hjsbah}(h]h ]h"]h$]h&]hhuh1jhhhKhjubeh}(h]h ]h"]h$]h&]morecolsKuh1jhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h)}(hCPU 0:h]hCPU 0:}(hj)hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhj&ubah}(h]h ]h"]h$]h&]uh1jhj#ubj)}(hhh]h)}(hCPU 1:h]hCPU 1:}(hj@hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhj=ubah}(h]h ]h"]h$]h&]uh1jhj#ubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh](h)}(hIn mmu_spte_update()::h]hIn mmu_spte_update():}(hj`hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhj]ubj)}(h~old_spte = *spte; /* 'if' condition is satisfied. */ if (old_spte.Accessed == 1 && old_spte.W == 0) spte = new_spte;h]h~old_spte = *spte; /* 'if' condition is satisfied. */ if (old_spte.Accessed == 1 && old_spte.W == 0) spte = new_spte;}hjnsbah}(h]h ]h"]h$]h&]hhuh1jhhhKhj]ubeh}(h]h ]h"]h$]h&]uh1jhjZubj)}(hhh]h}(h]h ]h"]h$]h&]uh1jhjZubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]h}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](h)}(hon fast page fault path::h]hon fast page fault path:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjubj)}(h spte.W = 1h]h spte.W = 1}hjsbah}(h]h ]h"]h$]h&]hhuh1jhhhKhjubh)}(hmemory write on the spte::h]hmemory write on the spte:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjubj)}(hspte.Dirty = 1h]hspte.Dirty = 1}hjsbah}(h]h ]h"]h$]h&]hhuh1jhhhKhjubeh}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh](j)}(hhh]j)}(helse old_spte = xchg(spte, new_spte); if (old_spte.Accessed && !new_spte.Accessed) flush = true; if (old_spte.Dirty && !new_spte.Dirty) flush = true; OOPS!!!h]helse old_spte = xchg(spte, new_spte); if (old_spte.Accessed && !new_spte.Accessed) flush = true; if (old_spte.Dirty && !new_spte.Dirty) flush = true; OOPS!!!}hjsbah}(h]h ]h"]h$]h&]hhuh1jhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hhh]h}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]colsKuh1jhjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubh)}(h#The Dirty bit is lost in this case.h]h#The Dirty bit is lost in this case.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hIn order to avoid this kind of issue, we always treat the spte as "volatile" if it can be updated out of mmu-lock [see spte_needs_atomic_update()]; it means the spte is always atomically updated in this case.h]hIn order to avoid this kind of issue, we always treat the spte as “volatile” if it can be updated out of mmu-lock [see spte_needs_atomic_update()]; it means the spte is always atomically updated in this case.}(hj.hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubj)}(hhh]j)}(hflush tlbs due to spte updated h]h)}(hflush tlbs due to spte updatedh]hflush tlbs due to spte updated}(hjChhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhj?ubah}(h]h ]h"]h$]h&]uh1jhj<hhhhhNubah}(h]h ]h"]h$]h&]j"j#j$hj%jjKuh1jhjhhhhhKubh)}(hIf the spte is updated from writable to read-only, we should flush all TLBs, otherwise rmap_write_protect will find a read-only spte, even though the writable spte might be cached on a CPU's TLB.h]hIf the spte is updated from writable to read-only, we should flush all TLBs, otherwise rmap_write_protect will find a read-only spte, even though the writable spte might be cached on a CPU’s TLB.}(hj]hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hXAs mentioned before, the spte can be updated to writable out of mmu-lock on fast page fault path. In order to easily audit the path, we see if TLBs needing to be flushed caused this reason in mmu_spte_update() since this is a common function to update spte (present -> present).h]hXAs mentioned before, the spte can be updated to writable out of mmu-lock on fast page fault path. In order to easily audit the path, we see if TLBs needing to be flushed caused this reason in mmu_spte_update() since this is a common function to update spte (present -> present).}(hjkhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hSince the spte is "volatile" if it can be updated out of mmu-lock, we always atomically update the spte and the race caused by fast page fault can be avoided. See the comments in spte_needs_atomic_update() and mmu_spte_update().h]hSince the spte is “volatile” if it can be updated out of mmu-lock, we always atomically update the spte and the race caused by fast page fault can be avoided. See the comments in spte_needs_atomic_update() and mmu_spte_update().}(hjyhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hLockless Access Tracking:h]hLockless Access Tracking:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hXThis is used for Intel CPUs that are using EPT but do not support the EPT A/D bits. In this case, PTEs are tagged as A/D disabled (using ignored bits), and when the KVM MMU notifier is called to track accesses to a page (via kvm_mmu_notifier_clear_flush_young), it marks the PTE not-present in hardware by clearing the RWX bits in the PTE and storing the original R & X bits in more unused/ignored bits. When the VM tries to access the page later on, a fault is generated and the fast page fault mechanism described above is used to atomically restore the PTE to a Present state. The W bit is not saved when the PTE is marked for access tracking and during restoration to the Present state, the W bit is set depending on whether or not it was a write access. If it wasn't, then the W bit will remain clear until a write access happens, at which time it will be set using the Dirty tracking mechanism described above.h]hXThis is used for Intel CPUs that are using EPT but do not support the EPT A/D bits. In this case, PTEs are tagged as A/D disabled (using ignored bits), and when the KVM MMU notifier is called to track accesses to a page (via kvm_mmu_notifier_clear_flush_young), it marks the PTE not-present in hardware by clearing the RWX bits in the PTE and storing the original R & X bits in more unused/ignored bits. When the VM tries to access the page later on, a fault is generated and the fast page fault mechanism described above is used to atomically restore the PTE to a Present state. The W bit is not saved when the PTE is marked for access tracking and during restoration to the Present state, the W bit is set depending on whether or not it was a write access. If it wasn’t, then the W bit will remain clear until a write access happens, at which time it will be set using the Dirty tracking mechanism described above.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubeh}(h] exceptionah ]h"] 2. exceptionah$]h&]uh1hhhhhhhhKAubh)}(hhh](h)}(h 3. Referenceh]h 3. Reference}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhhhKubh)}(hhh](h)}(h ``kvm_lock``h]j)}(hjh]hkvm_lock}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubah}(h]h ]h"]h$]h&]uh1hhjhhhhhKubh field_list)}(hhh](hfield)}(hhh](h field_name)}(hTypeh]hType}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjhhhKubh field_body)}(hmutexh]h)}(hjh]hmutex}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhhhKhjhhubj)}(hhh](j)}(hArchh]hArch}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjhhhKubj)}(hanyh]h)}(hj"h]hany}(hj$hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhj ubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhhhKhjhhubj)}(hhh](j)}(hProtectsh]hProtects}(hj@hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj=hhhKubj)}(h - vm_list h]h)}(hhh]j)}(hvm_list h]h)}(hvm_listh]hvm_list}(hjYhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjUubah}(h]h ]h"]h$]h&]uh1jhjRubah}(h]h ]h"]h$]h&]jjuh1hhhhKhjNubah}(h]h ]h"]h$]h&]uh1jhj=ubeh}(h]h ]h"]h$]h&]uh1jhhhKhjhhubeh}(h]h ]h"]h$]h&]uh1jhjhhhhhKubeh}(h]kvm-lockah ]h"]kvm_lockah$]h&]uh1hhjhhhhhKubh)}(hhh](h)}(h``kvm_usage_lock``h]j)}(hjh]hkvm_usage_lock}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubah}(h]h ]h"]h$]h&]uh1hhjhhhhhKubj)}(hhh](j)}(hhh](j)}(hTypeh]hType}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjhhhKubj)}(hmutexh]h)}(hjh]hmutex}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhhhKhjhhubj)}(hhh](j)}(hArchh]hArch}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjhhhKubj)}(hanyh]h)}(hjh]hany}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhhhKhjhhubj)}(hhh](j)}(hProtectsh]hProtects}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj hhhKubj)}(h:- kvm_usage_count - hardware virtualization enable/disableh]h)}(hhh](j)}(hkvm_usage_counth]h)}(hj h]hkvm_usage_count}(hj" hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhj ubah}(h]h ]h"]h$]h&]uh1jhj ubj)}(h&hardware virtualization enable/disableh]h)}(hj7 h]h&hardware virtualization enable/disable}(hj9 hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhj5 ubah}(h]h ]h"]h$]h&]uh1jhj ubeh}(h]h ]h"]h$]h&]jjuh1hhhhKhj ubah}(h]h ]h"]h$]h&]uh1jhj ubeh}(h]h ]h"]h$]h&]uh1jhhhKhjhhubj)}(hhh](j)}(hCommenth]hComment}(hja hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj^ hhhKubj)}(hExists to allow taking cpus_read_lock() while kvm_usage_count is protected, which simplifies the virtualization enabling logic. h]h)}(hExists to allow taking cpus_read_lock() while kvm_usage_count is protected, which simplifies the virtualization enabling logic.h]hExists to allow taking cpus_read_lock() while kvm_usage_count is protected, which simplifies the virtualization enabling logic.}(hjs hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjo ubah}(h]h ]h"]h$]h&]uh1jhj^ ubeh}(h]h ]h"]h$]h&]uh1jhhhKhjhhubeh}(h]h ]h"]h$]h&]uh1jhjhhhhhKubeh}(h]kvm-usage-lockah ]h"]kvm_usage_lockah$]h&]uh1hhjhhhhhKubh)}(hhh](h)}(h``kvm->mn_invalidate_lock``h]j)}(hj h]hkvm->mn_invalidate_lock}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubah}(h]h ]h"]h$]h&]uh1hhj hhhhhKubj)}(hhh](j)}(hhh](j)}(hTypeh]hType}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj hhhKubj)}(h spinlock_th]h)}(hj h]h spinlock_t}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj ubah}(h]h ]h"]h$]h&]uh1jhj ubeh}(h]h ]h"]h$]h&]uh1jhhhMhj hhubj)}(hhh](j)}(hArchh]hArch}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj hhhKubj)}(hanyh]h)}(hj h]hany}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj ubah}(h]h ]h"]h$]h&]uh1jhj ubeh}(h]h ]h"]h$]h&]uh1jhhhMhj hhubj)}(hhh](j)}(hProtectsh]hProtects}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj hhhKubj)}(h7mn_active_invalidate_count, mn_memslots_update_rcuwait h]h)}(h6mn_active_invalidate_count, mn_memslots_update_rcuwaith]h6mn_active_invalidate_count, mn_memslots_update_rcuwait}(hj) hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj% ubah}(h]h ]h"]h$]h&]uh1jhj ubeh}(h]h ]h"]h$]h&]uh1jhhhMhj hhubeh}(h]h ]h"]h$]h&]uh1jhj hhhhhMubeh}(h]kvm-mn-invalidate-lockah ]h"]kvm->mn_invalidate_lockah$]h&]uh1hhjhhhhhKubh)}(hhh](h)}(h``kvm_arch::tsc_write_lock``h]j)}(hjV h]hkvm_arch::tsc_write_lock}(hjX hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjT ubah}(h]h ]h"]h$]h&]uh1hhjQ hhhhhMubj)}(hhh](j)}(hhh](j)}(hTypeh]hType}(hjq hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjn hhhKubj)}(hraw_spinlock_th]h)}(hj h]hraw_spinlock_t}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj ubah}(h]h ]h"]h$]h&]uh1jhjn ubeh}(h]h ]h"]h$]h&]uh1jhhhMhjk hhubj)}(hhh](j)}(hArchh]hArch}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj hhhKubj)}(hx86h]h)}(hj h]hx86}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM hj ubah}(h]h ]h"]h$]h&]uh1jhj ubeh}(h]h ]h"]h$]h&]uh1jhhhM hjk hhubj)}(hhh](j)}(hProtectsh]hProtects}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj hhhKubj)}(hO- kvm_arch::{last_tsc_write,last_tsc_nsec,last_tsc_offset} - tsc offset in vmcbh]h)}(hhh](j)}(h8kvm_arch::{last_tsc_write,last_tsc_nsec,last_tsc_offset}h]h)}(hj h]h8kvm_arch::{last_tsc_write,last_tsc_nsec,last_tsc_offset}}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM hj ubah}(h]h ]h"]h$]h&]uh1jhj ubj)}(htsc offset in vmcbh]h)}(hj h]htsc offset in vmcb}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM hj ubah}(h]h ]h"]h$]h&]uh1jhj ubeh}(h]h ]h"]h$]h&]jjuh1hhhhM hj ubah}(h]h ]h"]h$]h&]uh1jhj ubeh}(h]h ]h"]h$]h&]uh1jhhhM hjk hhubj)}(hhh](j)}(hCommenth]hComment}(hj% hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj" hhhKubj)}(h>'raw' because updating the tsc offsets must not be preempted. h]h)}(h='raw' because updating the tsc offsets must not be preempted.h]hA‘raw’ because updating the tsc offsets must not be preempted.}(hj7 hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM hj3 ubah}(h]h ]h"]h$]h&]uh1jhj" ubeh}(h]h ]h"]h$]h&]uh1jhhhM hjk hhubeh}(h]h ]h"]h$]h&]uh1jhjQ hhhhhMubeh}(h]kvm-arch-tsc-write-lockah ]h"]kvm_arch::tsc_write_lockah$]h&]uh1hhjhhhhhMubh)}(hhh](h)}(h``kvm->mmu_lock``h]j)}(hjd h]h kvm->mmu_lock}(hjf hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjb ubah}(h]h ]h"]h$]h&]uh1hhj_ hhhhhMubj)}(hhh](j)}(hhh](j)}(hTypeh]hType}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj| hhhKubj)}(hspinlock_t or rwlock_th]h)}(hj h]hspinlock_t or rwlock_t}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj ubah}(h]h ]h"]h$]h&]uh1jhj| ubeh}(h]h ]h"]h$]h&]uh1jhhhMhjy hhubj)}(hhh](j)}(hArchh]hArch}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj hhhKubj)}(hanyh]h)}(hj h]hany}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj ubah}(h]h ]h"]h$]h&]uh1jhj ubeh}(h]h ]h"]h$]h&]uh1jhhhMhjy hhubj)}(hhh](j)}(hProtectsh]hProtects}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj hhhKubj)}(h-shadow page/shadow tlb entryh]h)}(hj h]h-shadow page/shadow tlb entry}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj ubah}(h]h ]h"]h$]h&]uh1jhj ubeh}(h]h ]h"]h$]h&]uh1jhhhMhjy hhubj)}(hhh](j)}(hCommenth]hComment}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj hhhKubj)}(h3it is a spinlock since it is used in mmu notifier. h]h)}(h2it is a spinlock since it is used in mmu notifier.h]h2it is a spinlock since it is used in mmu notifier.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj ubah}(h]h ]h"]h$]h&]uh1jhj ubeh}(h]h ]h"]h$]h&]uh1jhhhMhjy hhubeh}(h]h ]h"]h$]h&]uh1jhj_ hhhhhMubeh}(h] kvm-mmu-lockah ]h"] kvm->mmu_lockah$]h&]uh1hhjhhhhhMubh)}(hhh](h)}(h ``kvm->srcu``h]j)}(hjH h]h kvm->srcu}(hjJ hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjF ubah}(h]h ]h"]h$]h&]uh1hhjC hhhhhMubj)}(hhh](j)}(hhh](j)}(hTypeh]hType}(hjc hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj` hhhKubj)}(h srcu lockh]h)}(hjs h]h srcu lock}(hju hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjq ubah}(h]h ]h"]h$]h&]uh1jhj` ubeh}(h]h ]h"]h$]h&]uh1jhhhMhj] hhubj)}(hhh](j)}(hArchh]hArch}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj hhhKubj)}(hanyh]h)}(hj h]hany}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj ubah}(h]h ]h"]h$]h&]uh1jhj ubeh}(h]h ]h"]h$]h&]uh1jhhhMhj] hhubj)}(hhh](j)}(hProtectsh]hProtects}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj hhhKubj)}(h- kvm->memslots - kvm->busesh]h)}(hhh](j)}(h kvm->memslotsh]h)}(hj h]h kvm->memslots}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj ubah}(h]h ]h"]h$]h&]uh1jhj ubj)}(h kvm->busesh]h)}(hj h]h kvm->buses}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj ubah}(h]h ]h"]h$]h&]uh1jhj ubeh}(h]h ]h"]h$]h&]jjuh1hhhhMhj ubah}(h]h ]h"]h$]h&]uh1jhj ubeh}(h]h ]h"]h$]h&]uh1jhhhMhj] hhubj)}(hhh](j)}(hCommenth]hComment}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj hhhKubj)}(hXThe srcu read lock must be held while accessing memslots (e.g. when using gfn_to_* functions) and while accessing in-kernel MMIO/PIO address->device structure mapping (kvm->buses). The srcu index can be stored in kvm_vcpu->srcu_idx per vcpu if it is needed by multiple functions. h]h)}(hXThe srcu read lock must be held while accessing memslots (e.g. when using gfn_to_* functions) and while accessing in-kernel MMIO/PIO address->device structure mapping (kvm->buses). The srcu index can be stored in kvm_vcpu->srcu_idx per vcpu if it is needed by multiple functions.h]hXThe srcu read lock must be held while accessing memslots (e.g. when using gfn_to_* functions) and while accessing in-kernel MMIO/PIO address->device structure mapping (kvm->buses). The srcu index can be stored in kvm_vcpu->srcu_idx per vcpu if it is needed by multiple functions.}(hj) hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj% ubah}(h]h ]h"]h$]h&]uh1jhj ubeh}(h]h ]h"]h$]h&]uh1jhhhMhj] hhubeh}(h]h ]h"]h$]h&]uh1jhjC hhhhhMubeh}(h]kvm-srcuah ]h"] kvm->srcuah$]h&]uh1hhjhhhhhMubh)}(hhh](h)}(h``kvm->slots_arch_lock``h]j)}(hjV h]hkvm->slots_arch_lock}(hjX hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjT ubah}(h]h ]h"]h$]h&]uh1hhjQ hhhhhM"ubj)}(hhh](j)}(hhh](j)}(hTypeh]hType}(hjq hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjn hhhKubj)}(hmutexh]h)}(hj h]hmutex}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM#hj ubah}(h]h ]h"]h$]h&]uh1jhjn ubeh}(h]h ]h"]h$]h&]uh1jhhhM#hjk hhubj)}(hhh](j)}(hArchh]hArch}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj hhhKubj)}(hany (only needed on x86 though)h]h)}(hj h]hany (only needed on x86 though)}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM$hj ubah}(h]h ]h"]h$]h&]uh1jhj ubeh}(h]h ]h"]h$]h&]uh1jhhhM$hjk hhubj)}(hhh](j)}(hProtectsh]hProtects}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj hhhKubj)}(hlany arch-specific fields of memslots that have to be modified in a ``kvm->srcu`` read-side critical section.h]h)}(hlany arch-specific fields of memslots that have to be modified in a ``kvm->srcu`` read-side critical section.h](hCany arch-specific fields of memslots that have to be modified in a }(hj hhhNhNubj)}(h ``kvm->srcu``h]h kvm->srcu}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubh read-side critical section.}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM%hj ubah}(h]h ]h"]h$]h&]uh1jhj ubeh}(h]h ]h"]h$]h&]uh1jhhhM%hjk hhubj)}(hhh](j)}(hCommenth]hComment}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhj hhhKubj)}(hvmust be held before reading the pointer to the current memslots, until after all changes to the memslots are complete h]h)}(humust be held before reading the pointer to the current memslots, until after all changes to the memslots are completeh]humust be held before reading the pointer to the current memslots, until after all changes to the memslots are complete}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM'hjubah}(h]h ]h"]h$]h&]uh1jhj ubeh}(h]h ]h"]h$]h&]uh1jhhhM'hjk hhubeh}(h]h ]h"]h$]h&]uh1jhjQ hhhhhM#ubeh}(h]kvm-slots-arch-lockah ]h"]kvm->slots_arch_lockah$]h&]uh1hhjhhhhhM"ubh)}(hhh](h)}(h``wakeup_vcpus_on_cpu_lock``h]j)}(hjMh]hwakeup_vcpus_on_cpu_lock}(hjOhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjKubah}(h]h ]h"]h$]h&]uh1hhjHhhhhhM+ubj)}(hhh](j)}(hhh](j)}(hTypeh]hType}(hjhhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjehhhKubj)}(h spinlock_th]h)}(hjxh]h spinlock_t}(hjzhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM,hjvubah}(h]h ]h"]h$]h&]uh1jhjeubeh}(h]h ]h"]h$]h&]uh1jhhhM,hjbhhubj)}(hhh](j)}(hArchh]hArch}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjhhhKubj)}(hx86h]h)}(hjh]hx86}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM-hjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhhhM-hjbhhubj)}(hhh](j)}(hProtectsh]hProtects}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjhhhKubj)}(hwakeup_vcpus_on_cpuh]h)}(hjh]hwakeup_vcpus_on_cpu}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM.hjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhhhM.hjbhhubj)}(hhh](j)}(hCommenth]hComment}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjhhhKubj)}(hXThis is a per-CPU lock and it is used for VT-d posted-interrupts. When VT-d posted-interrupts are supported and the VM has assigned devices, we put the blocked vCPU on the list blocked_vcpu_on_cpu protected by blocked_vcpu_on_cpu_lock. When VT-d hardware issues wakeup notification event since external interrupts from the assigned devices happens, we will find the vCPU on the list to wakeup. h]h)}(hXThis is a per-CPU lock and it is used for VT-d posted-interrupts. When VT-d posted-interrupts are supported and the VM has assigned devices, we put the blocked vCPU on the list blocked_vcpu_on_cpu protected by blocked_vcpu_on_cpu_lock. When VT-d hardware issues wakeup notification event since external interrupts from the assigned devices happens, we will find the vCPU on the list to wakeup.h]hXThis is a per-CPU lock and it is used for VT-d posted-interrupts. When VT-d posted-interrupts are supported and the VM has assigned devices, we put the blocked vCPU on the list blocked_vcpu_on_cpu protected by blocked_vcpu_on_cpu_lock. When VT-d hardware issues wakeup notification event since external interrupts from the assigned devices happens, we will find the vCPU on the list to wakeup.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM/hjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhhhM/hjbhhubeh}(h]h ]h"]h$]h&]uh1jhjHhhhhhM,ubeh}(h]wakeup-vcpus-on-cpu-lockah ]h"]wakeup_vcpus_on_cpu_lockah$]h&]uh1hhjhhhhhM+ubh)}(hhh](h)}(h``vendor_module_lock``h]j)}(hj1h]hvendor_module_lock}(hj3hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj/ubah}(h]h ]h"]h$]h&]uh1hhj,hhhhhM8ubj)}(hhh](j)}(hhh](j)}(hTypeh]hType}(hjLhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjIhhhKubj)}(hmutexh]h)}(hj\h]hmutex}(hj^hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM9hjZubah}(h]h ]h"]h$]h&]uh1jhjIubeh}(h]h ]h"]h$]h&]uh1jhhhM9hjFhhubj)}(hhh](j)}(hArchh]hArch}(hjzhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjwhhhKubj)}(hx86h]h)}(hjh]hx86}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM:hjubah}(h]h ]h"]h$]h&]uh1jhjwubeh}(h]h ]h"]h$]h&]uh1jhhhM:hjFhhubj)}(hhh](j)}(hProtectsh]hProtects}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjhhhKubj)}(h.loading a vendor module (kvm_amd or kvm_intel)h]h)}(hjh]h.loading a vendor module (kvm_amd or kvm_intel)}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM;hjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhhhM;hjFhhubj)}(hhh](j)}(hCommenth]hComment}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjhhhKubj)}(hX>Exists because using kvm_lock leads to deadlock. kvm_lock is taken in notifiers, e.g. __kvmclock_cpufreq_notifier(), that may be invoked while cpu_hotplug_lock is held, e.g. from cpufreq_boost_trigger_state(), and many operations need to take cpu_hotplug_lock when loading a vendor module, e.g. updating static calls.h]h)}(hX>Exists because using kvm_lock leads to deadlock. kvm_lock is taken in notifiers, e.g. __kvmclock_cpufreq_notifier(), that may be invoked while cpu_hotplug_lock is held, e.g. from cpufreq_boost_trigger_state(), and many operations need to take cpu_hotplug_lock when loading a vendor module, e.g. updating static calls.h]hX>Exists because using kvm_lock leads to deadlock. kvm_lock is taken in notifiers, e.g. __kvmclock_cpufreq_notifier(), that may be invoked while cpu_hotplug_lock is held, e.g. from cpufreq_boost_trigger_state(), and many operations need to take cpu_hotplug_lock when loading a vendor module, e.g. updating static calls.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM<hjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhhhM<hjFhhubeh}(h]h ]h"]h$]h&]uh1jhj,hhhhhM9ubeh}(h]vendor-module-lockah ]h"]vendor_module_lockah$]h&]uh1hhjhhhhhM8ubeh}(h] referenceah ]h"] 3. referenceah$]h&]uh1hhhhhhhhKubeh}(h]kvm-lock-overviewah ]h"]kvm lock overviewah$]h&]uh1hhhhhhhhKubeh}(h]h ]h"]h$]h&]sourcehuh1hcurrent_sourceN current_lineNsettingsdocutils.frontendValues)}(hN generatorN datestampN source_linkN source_urlN toc_backlinksjfootnote_backlinksK sectnum_xformKstrip_commentsNstrip_elements_with_classesN strip_classesN report_levelK halt_levelKexit_status_levelKdebugNwarning_streamN tracebackinput_encoding utf-8-siginput_encoding_error_handlerstrictoutput_encodingutf-8output_encoding_error_handlerjBerror_encodingutf-8error_encoding_error_handlerbackslashreplace language_codeenrecord_dependenciesNconfigN id_prefixhauto_id_prefixid dump_settingsNdump_internalsNdump_transformsNdump_pseudo_xmlNexpose_internalsNstrict_visitorN_disable_configN_sourcehnj _destinationN _config_files]7/var/lib/git/docbuild/linux/Documentation/docutils.confafile_insertion_enabled raw_enabledKline_length_limitM'pep_referencesN pep_base_urlhttps://peps.python.org/pep_file_url_templatepep-%04drfc_referencesN rfc_base_url&https://datatracker.ietf.org/doc/html/ tab_widthKtrim_footnote_reference_spacesyntax_highlightlong smart_quotessmartquotes_locales]character_level_inline_markupdoctitle_xform docinfo_xformKsectsubtitle_xform image_loadinglinkembed_stylesheetcloak_email_addressessection_self_linkenvNubreporterNindirect_targets]substitution_defs}substitution_names}refnames}refids}nameids}(jjjjjjjjjjj j jN jK j\ jY j@ j= jN jK jEjBj)j&j j u nametypes}(jjjjjj jN j\ j@ jN jEj)j uh}(jhjhjjjjjjj jjK j jY jQ j= j_ jK jC jBjQ j&jHj j,u footnote_refs} citation_refs} autofootnotes]autofootnote_refs]symbol_footnotes]symbol_footnote_refs] footnotes] citations]autofootnote_startKsymbol_footnote_startK id_counter collectionsCounter}Rparse_messages](hsystem_message)}(hhh]h)}(h:Enumerated list start value not ordinal-1: "2" (ordinal 2)h]h>Enumerated list start value not ordinal-1: “2” (ordinal 2)}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjubah}(h]h ]h"]h$]h&]levelKtypeINFOsourcehnjlineKuh1jhjhhhhhKubj)}(hhh]h)}(h:Enumerated list start value not ordinal-1: "3" (ordinal 3)h]h>Enumerated list start value not ordinal-1: “3” (ordinal 3)}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjubah}(h]h ]h"]h$]h&]levelKtypejsourcehnjlineKuh1jhjhhhhhKubetransform_messages] transformerN include_log] decorationNhhub.