.. SPDX-License-Identifier: GPL-2.0

PCI pass-thru devices
=====================
In a Hyper-V guest VM, PCI pass-thru devices (also called
virtual PCI devices, or vPCI devices) are physical PCI devices
that are mapped directly into the VM's physical address space.
Guest device drivers can interact directly with the hardware
without intermediation by the host hypervisor.  This approach
provides higher bandwidth access to the device with lower
latency, compared with devices that are virtualized by the
hypervisor.  The device should appear to the guest just as it
would when running on bare metal, so no changes are required
to the Linux device drivers for the device.

Hyper-V terminology for vPCI devices is "Discrete Device
Assignment" (DDA).  Public documentation for Hyper-V DDA is
available here: `DDA`_

.. _DDA: https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/plan/plan-for-deploying-devices-using-discrete-device-assignment
DDA is typically used for storage controllers, such as NVMe,
and for GPUs.  A similar mechanism for NICs is called SR-IOV
and produces the same benefits by allowing a guest device
driver to interact directly with the hardware.  See Hyper-V
public documentation here: `SR-IOV`_

.. _SR-IOV: https://learn.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-single-root-i-o-virtualization--sr-iov-

This discussion of vPCI devices includes DDA and SR-IOV
devices.

Device Presentation
-------------------
Hyper-V provides full PCI functionality for a vPCI device when
it is operating, so the Linux device driver for the device can
be used unchanged, provided it uses the correct Linux kernel
APIs for accessing PCI config space and for other integration
with Linux.  But the initial detection of the PCI device and
its integration with the Linux PCI subsystem must use Hyper-V
specific mechanisms.  Consequently, vPCI devices on Hyper-V
have a dual identity.  They are initially presented to Linux
guests as VMBus devices via the standard VMBus "offer"
mechanism, so they have a VMBus identity and appear under
/sys/bus/vmbus/devices.  The VMBus vPCI driver in Linux at
drivers/pci/controller/pci-hyperv.c handles a newly introduced
vPCI device by fabricating a PCI bus topology and creating all
the normal PCI device data structures in Linux that would
exist if the PCI device were discovered via ACPI on a
bare-metal system.  Once those data structures are set up, the
device also has a normal PCI identity in Linux, and the normal
Linux device driver for the vPCI device can function as if it
were running in Linux on bare metal.  Because vPCI devices are
presented dynamically through the VMBus offer mechanism, they
do not appear in the Linux guest's ACPI tables.  vPCI devices
may be added to a VM or removed from a VM at any time during
the life of the VM, and not just during initial boot.

With this approach, the vPCI device is a VMBus device and a
PCI device at the same time.  In response to the VMBus offer
message, the hv_pci_probe() function runs and establishes a
VMBus connection to the vPCI VSP on the Hyper-V host.  That
connection has a single VMBus channel.  The channel is used to
exchange messages with the vPCI VSP for the purpose of setting
up and configuring the vPCI device in Linux.  Once the device
is fully configured in Linux as a PCI device, the VMBus
channel is used only if Linux changes the vCPU to be
interrupted in the guest, or if the vPCI device is removed
from the VM while the VM is running.  The ongoing operation of
the device happens directly between the Linux device driver
for the device and the hardware, with VMBus and the VMBus
channel playing no role.
PCI Device Setup
----------------
PCI device setup follows a sequence that Hyper-V originally
created for Windows guests, and that can be ill-suited for
Linux guests due to differences in the overall structure of
the Linux PCI subsystem compared with Windows.  Nonetheless,
with a bit of hackery in the Hyper-V virtual PCI driver for
Linux, the virtual PCI device is set up in Linux so that
generic Linux PCI subsystem code and the Linux driver for the
device "just work".

Each vPCI device is set up in Linux to be in its own PCI
domain with a host bridge.  The PCI domainID is derived from
bytes 4 and 5 of the instance GUID assigned to the VMBus vPCI
device.  The Hyper-V host does not guarantee that these bytes
are unique, so hv_pci_probe() has an algorithm to resolve
collisions.  The collision resolution is intended to be stable
across reboots of the same VM so that the PCI domainIDs don't
change, as the domainID appears in the user space
configuration of some devices.

hv_pci_probe() allocates a guest MMIO range to be used as PCI
config space for the device.  This MMIO range is communicated
to the Hyper-V host over the VMBus channel as part of telling
the host that the device is ready to enter d0.  See
hv_pci_enter_d0().  When the guest subsequently accesses this
MMIO range, the Hyper-V host intercepts the accesses and maps
them to the physical device PCI config space.

hv_pci_probe() also gets BAR information for the device from
the Hyper-V host, and uses this information to allocate MMIO
space for the BARs.  That MMIO space is then set up to be
associated with the host bridge so that it works when generic
PCI subsystem code in Linux processes the BARs.

Finally, hv_pci_probe() creates the root PCI bus.  At this
point the Hyper-V virtual PCI driver hackery is done, and the
normal Linux PCI machinery for scanning the root bus works to
detect the device, to perform driver matching, and to
initialize the driver and device.
PCI Device Removal
------------------
A Hyper-V host may initiate removal of a vPCI device from a
guest VM at any time during the life of the VM.  The removal
is instigated by an admin action taken on the Hyper-V host and
is not under the control of the guest OS.

A guest VM is notified of the removal by an unsolicited
"Eject" message sent from the host to the guest over the VMBus
channel associated with the vPCI device.  Upon receipt of such
a message, the Hyper-V virtual PCI driver in Linux
asynchronously invokes Linux kernel PCI subsystem calls to
shut down and remove the device.  When those calls are
complete, an "Ejection Complete" message is sent back to
Hyper-V over the VMBus channel indicating that the device has
been removed.  At this point, Hyper-V sends a VMBus rescind
message to the Linux guest, which the VMBus driver in Linux
processes by removing the VMBus identity for the device.  Once
that processing is complete, all vestiges of the device having
been present are gone from the Linux kernel.  The rescind
message also indicates to the guest that Hyper-V has stopped
providing support for the vPCI device in the guest.  If the
guest were to attempt to access that device's MMIO space, it
would be an invalid reference.  Hypercalls affecting the
device return errors, and any further messages sent in the
VMBus channel are ignored.

After sending the Eject message, Hyper-V allows the guest VM
60 seconds to cleanly shut down the device and respond with
Ejection Complete before sending the VMBus rescind message.
If for any reason the Eject steps don't complete within the
allowed 60 seconds, the Hyper-V host forcibly performs the
rescind steps, which will likely result in cascading errors in
the guest because the device is now no longer present from the
guest standpoint and accessing the device MMIO space will
fail.

Because ejection is asynchronous and can happen at any point
during the guest VM lifecycle, proper synchronization in the
Hyper-V virtual PCI driver is very tricky.  Ejection has been
observed even before a newly offered vPCI device has been
fully set up.  The Hyper-V virtual PCI driver has been updated
several times over the years to fix race conditions when
ejections happen at inopportune times.  Care must be taken
when modifying this code to prevent re-introducing such
problems.  See comments in the code.
Interrupt Assignment
--------------------
The Hyper-V virtual PCI driver supports vPCI devices using
MSI, multi-MSI, or MSI-X.  Assigning the guest vCPU that will
receive the interrupt for a particular MSI or MSI-X message is
complex because of the way the Linux setup of IRQs maps onto
the Hyper-V interfaces.  For the single-MSI and MSI-X cases,
Linux calls hv_compose_msi_msg() twice, with the first call
containing a dummy vCPU and the second call containing the
real vCPU.  Furthermore, hv_irq_unmask() is finally called (on
x86) or the GICD registers are set (on arm64) to specify the
real vCPU again.  Each of these three calls interacts with
Hyper-V, which must decide which physical CPU should receive
the interrupt before it is forwarded to the guest VM.
Unfortunately, the Hyper-V decision-making process is a bit
limited, and can result in concentrating the physical
interrupts on a single CPU, causing a performance bottleneck.
See details about how this is resolved in the extensive
comment above the function hv_compose_msi_req_get_cpu().

The Hyper-V virtual PCI driver implements the
irq_chip.irq_compose_msi_msg function as hv_compose_msi_msg().
Unfortunately, on Hyper-V the implementation requires sending
a VMBus message to the Hyper-V host and awaiting an interrupt
indicating receipt of a reply message.  Since
irq_chip.irq_compose_msi_msg can be called with IRQ locks
held, it doesn't work to do the normal sleep until awakened by
the interrupt.  Instead hv_compose_msi_msg() must send the
VMBus message, and then poll for the completion message.  As
further complexity, the vPCI device could be ejected/rescinded
while the polling is in progress, so this scenario must be
detected as well.  See comments in the code regarding this
very tricky area.
Most of the code in the Hyper-V virtual PCI driver
(pci-hyperv.c) applies to Hyper-V and Linux guests running on
x86 and on arm64 architectures.  But there are differences in
how interrupt assignments are managed.  On x86, the Hyper-V
virtual PCI driver in the guest must make a hypercall to tell
Hyper-V which guest vCPU should be interrupted by each
MSI/MSI-X interrupt, and the x86 interrupt vector number that
the x86_vector IRQ domain has picked for the interrupt.  This
hypercall is made by hv_arch_irq_unmask().  On arm64, the
Hyper-V virtual PCI driver manages the allocation of an SPI
for each MSI/MSI-X interrupt.  The Hyper-V virtual PCI driver
stores the allocated SPI in the architectural GICD registers,
which Hyper-V emulates, so no hypercall is necessary as with
x86.  Hyper-V does not support using LPIs for vPCI devices in
arm64 guest VMs because it does not emulate a GICv3 ITS.

The Hyper-V virtual PCI driver in Linux supports vPCI devices
whose drivers create managed or unmanaged Linux IRQs.  If the
smp_affinity for an unmanaged IRQ is updated via the /proc/irq
interface, the Hyper-V virtual PCI driver is called to tell
the Hyper-V host to change the interrupt targeting and
everything works properly.  However, on x86 if the x86_vector
IRQ domain needs to reassign an interrupt vector due to
running out of vectors on a CPU, there's no path to inform the
Hyper-V host of the change, and things break.  Fortunately,
guest VMs operate in a constrained device environment where
using all the vectors on a CPU doesn't happen.  Since such a
problem is only a theoretical concern rather than a practical
concern, it has been left unaddressed.
DMA
---
By default, Hyper-V pins all guest VM memory in the host when
the VM is created, and programs the physical IOMMU to allow
the VM to have DMA access to all its memory.  Hence it is safe
to assign PCI devices to the VM, and allow the guest operating
system to program the DMA transfers.  The physical IOMMU
prevents a malicious guest from initiating DMA to memory
belonging to the host or to other VMs on the host.  From the
Linux guest standpoint, such DMA transfers are in "direct"
mode since Hyper-V does not provide a virtual IOMMU in the
guest.

Hyper-V assumes that physical PCI devices always perform
cache-coherent DMA.  When running on x86, this behavior is
required by the architecture.  When running on arm64, the
architecture allows for both cache-coherent and
non-cache-coherent devices, with the behavior of each device
specified in the ACPI DSDT.  But when a PCI device is assigned
to a guest VM, that device does not appear in the DSDT, so the
Hyper-V VMBus driver propagates cache-coherency information
from the VMBus node in the ACPI DSDT to all VMBus devices,
including vPCI devices (since they have a dual identity as a
VMBus device and as a PCI device).  See vmbus_dma_configure().
Current Hyper-V versions always indicate that the VMBus is
cache coherent, so vPCI devices on arm64 always get marked as
cache coherent and the CPU does not perform any sync
operations as part of dma_map/unmap_*() calls.
vPCI protocol versions
----------------------
As previously described, during vPCI device setup and
teardown, messages are passed over a VMBus channel between the
Hyper-V host and the Hyper-V vPCI driver in the Linux guest.
Some messages have been revised in newer versions of Hyper-V,
so the guest and host must agree on the vPCI protocol version
to be used.  The version is negotiated when communication over
the VMBus channel is first established.  See
hv_pci_protocol_negotiation().  Newer versions of the protocol
extend support to VMs with more than 64 vCPUs, and provide
additional information about the vPCI device, such as the
guest virtual NUMA node to which it is most closely affined in
the underlying hardware.
Unfortunately it is not possible to distinguish the two cases from the
guest side.

PCI config space access in a CoCo VM
------------------------------------

Linux PCI device drivers access PCI config space using a standard set of
functions provided by the Linux PCI subsystem. In Hyper-V guests, these
standard functions map to hv_pcifront_read_config() and
hv_pcifront_write_config() in the Hyper-V virtual PCI driver. In normal
VMs, these hv_pcifront_*() functions directly access the PCI config
space, and the accesses trap to Hyper-V to be handled.
But in CoCo VMs, memory encryption prevents Hyper-V from reading the
guest instruction stream to emulate the access, so the hv_pcifront_*()
functions must invoke hypercalls with explicit arguments describing the
access to be made.

Config Block back-channel
-------------------------

The Hyper-V host and the Hyper-V virtual PCI driver in Linux together
implement a non-standard back-channel communication path between the
host and the guest. The back-channel path uses messages sent over the
VMBus channel associated with the vPCI device. The functions
hyperv_read_cfg_blk() and hyperv_write_cfg_blk() are the primary
interfaces provided to other parts of the Linux kernel. As of this
writing, these interfaces are used only by the Mellanox mlx5 driver to
pass diagnostic data to a Hyper-V host running in the Azure public
cloud.
The functions hyperv_read_cfg_blk() and hyperv_write_cfg_blk() are
implemented in a separate module (pci-hyperv-intf.c, under
CONFIG_PCI_HYPERV_INTERFACE) that effectively stubs them out when
running in non-Hyper-V environments.