€•ƒ¼Œsphinx.addnodes”Œdocument”“”)”}”(Œ rawsource”Œ”Œchildren”]”(Œ translations”Œ LanguagesNode”“”)”}”(hhh]”(hŒ pending_xref”“”)”}”(hhh]”Œdocutils.nodes”ŒText”“”ŒChinese (Simplified)”…””}”Œparent”hsbaŒ attributes”}”(Œids”]”Œclasses”]”Œnames”]”Œdupnames”]”Œbackrefs”]”Œ refdomain”Œstd”Œreftype”Œdoc”Œ reftarget”Œ$/translations/zh_CN/core-api/swiotlb”Œmodname”NŒ classname”NŒ refexplicit”ˆuŒtagname”hhh ubh)”}”(hhh]”hŒChinese (Traditional)”…””}”hh2sbah}”(h]”h ]”h"]”h$]”h&]”Œ refdomain”h)Œreftype”h+Œ reftarget”Œ$/translations/zh_TW/core-api/swiotlb”Œmodname”NŒ classname”NŒ refexplicit”ˆuh1hhh ubh)”}”(hhh]”hŒItalian”…””}”hhFsbah}”(h]”h ]”h"]”h$]”h&]”Œ refdomain”h)Œreftype”h+Œ reftarget”Œ$/translations/it_IT/core-api/swiotlb”Œmodname”NŒ classname”NŒ refexplicit”ˆuh1hhh ubh)”}”(hhh]”hŒJapanese”…””}”hhZsbah}”(h]”h ]”h"]”h$]”h&]”Œ refdomain”h)Œreftype”h+Œ reftarget”Œ$/translations/ja_JP/core-api/swiotlb”Œmodname”NŒ classname”NŒ refexplicit”ˆuh1hhh ubh)”}”(hhh]”hŒKorean”…””}”hhnsbah}”(h]”h ]”h"]”h$]”h&]”Œ refdomain”h)Œreftype”h+Œ reftarget”Œ$/translations/ko_KR/core-api/swiotlb”Œmodname”NŒ classname”NŒ refexplicit”ˆuh1hhh ubh)”}”(hhh]”hŒSpanish”…””}”hh‚sbah}”(h]”h ]”h"]”h$]”h&]”Œ refdomain”h)Œreftype”h+Œ reftarget”Œ$/translations/sp_SP/core-api/swiotlb”Œmodname”NŒ classname”NŒ refexplicit”ˆuh1hhh ubeh}”(h]”h ]”h"]”h$]”h&]”Œcurrent_language”ŒEnglish”uh1h hhŒ _document”hŒsource”NŒline”NubhŒcomment”“”)”}”(hŒ SPDX-License-Identifier: GPL-2.0”h]”hŒ SPDX-License-Identifier: GPL-2.0”…””}”hh£sbah}”(h]”h ]”h"]”h$]”h&]”Œ xml:space”Œpreserve”uh1h¡hhhžhhŸŒ>/var/lib/git/docbuild/linux/Documentation/core-api/swiotlb.rst”h KubhŒsection”“”)”}”(hhh]”(hŒtitle”“”)”}”(hŒDMA and swiotlb”h]”hŒDMA and swiotlb”…””}”(hh»hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h¹hh¶hžhhŸh³h KubhŒ paragraph”“”)”}”(hXZswiotlb is a memory buffer allocator used by the Linux kernel DMA layer. It is typically used when a device doing DMA can't directly access the target memory buffer because of hardware limitations or other requirements. In such a case, the DMA layer calls swiotlb to allocate a temporary memory buffer that conforms to the limitations. The DMA is done to/from this temporary memory buffer, and the CPU copies the data between the temporary buffer and the original target memory buffer. This approach is generically called "bounce buffering", and the temporary memory buffer is called a "bounce buffer".”h]”hXdswiotlb is a memory buffer allocator used by the Linux kernel DMA layer. It is typically used when a device doing DMA can’t directly access the target memory buffer because of hardware limitations or other requirements. In such a case, the DMA layer calls swiotlb to allocate a temporary memory buffer that conforms to the limitations. The DMA is done to/from this temporary memory buffer, and the CPU copies the data between the temporary buffer and the original target memory buffer. This approach is generically called “bounce bufferingâ€, and the temporary memory buffer is called a “bounce bufferâ€.”…””}”(hhËhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h Khh¶hžhubhÊ)”}”(hXDevice drivers don't interact directly with swiotlb. Instead, drivers inform the DMA layer of the DMA attributes of the devices they are managing, and use the normal DMA map, unmap, and sync APIs when programming a device to do DMA. These APIs use the device DMA attributes and kernel-wide settings to determine if bounce buffering is necessary. If so, the DMA layer manages the allocation, freeing, and sync'ing of bounce buffers. Since the DMA attributes are per device, some devices in a system may use bounce buffering while others do not.”h]”hX#Device drivers don’t interact directly with swiotlb. Instead, drivers inform the DMA layer of the DMA attributes of the devices they are managing, and use the normal DMA map, unmap, and sync APIs when programming a device to do DMA. These APIs use the device DMA attributes and kernel-wide settings to determine if bounce buffering is necessary. If so, the DMA layer manages the allocation, freeing, and sync’ing of bounce buffers. Since the DMA attributes are per device, some devices in a system may use bounce buffering while others do not.”…””}”(hhÙhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h Khh¶hžhubhÊ)”}”(hXBecause the CPU copies data between the bounce buffer and the original target memory buffer, doing bounce buffering is slower than doing DMA directly to the original memory buffer, and it consumes more CPU resources. So it is used only when necessary for providing DMA functionality.”h]”hXBecause the CPU copies data between the bounce buffer and the original target memory buffer, doing bounce buffering is slower than doing DMA directly to the original memory buffer, and it consumes more CPU resources. So it is used only when necessary for providing DMA functionality.”…””}”(hhçhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h Khh¶hžhubhµ)”}”(hhh]”(hº)”}”(hŒUsage Scenarios”h]”hŒUsage Scenarios”…””}”(hhøhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h¹hhõhžhhŸh³h KubhÊ)”}”(hX7swiotlb was originally created to handle DMA for devices with addressing limitations. As physical memory sizes grew beyond 4 GiB, some devices could only provide 32-bit DMA addresses. By allocating bounce buffer memory below the 4 GiB line, these devices with addressing limitations could still work and do DMA.”h]”hX7swiotlb was originally created to handle DMA for devices with addressing limitations. As physical memory sizes grew beyond 4 GiB, some devices could only provide 32-bit DMA addresses. By allocating bounce buffer memory below the 4 GiB line, these devices with addressing limitations could still work and do DMA.”…””}”(hjhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h KhhõhžhubhÊ)”}”(hX0More recently, Confidential Computing (CoCo) VMs have the guest VM's memory encrypted by default, and the memory is not accessible by the host hypervisor and VMM. For the host to do I/O on behalf of the guest, the I/O must be directed to guest memory that is unencrypted. CoCo VMs set a kernel-wide option to force all DMA I/O to use bounce buffers, and the bounce buffer memory is set up as unencrypted. The host does DMA I/O to/from the bounce buffer memory, and the Linux kernel DMA layer does "sync" operations to cause the CPU to copy the data to/from the original target memory buffer. The CPU copying bridges between the unencrypted and the encrypted memory. This use of bounce buffers allows device drivers to "just work" in a CoCo VM, with no modifications needed to handle the memory encryption complexity.”h]”hX:More recently, Confidential Computing (CoCo) VMs have the guest VM’s memory encrypted by default, and the memory is not accessible by the host hypervisor and VMM. For the host to do I/O on behalf of the guest, the I/O must be directed to guest memory that is unencrypted. CoCo VMs set a kernel-wide option to force all DMA I/O to use bounce buffers, and the bounce buffer memory is set up as unencrypted. The host does DMA I/O to/from the bounce buffer memory, and the Linux kernel DMA layer does “sync†operations to cause the CPU to copy the data to/from the original target memory buffer. The CPU copying bridges between the unencrypted and the encrypted memory. This use of bounce buffers allows device drivers to “just work†in a CoCo VM, with no modifications needed to handle the memory encryption complexity.”…””}”(hjhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h K%hhõhžhubhÊ)”}”(hX†Other edge case scenarios arise for bounce buffers. For example, when IOMMU mappings are set up for a DMA operation to/from a device that is considered "untrusted", the device should be given access only to the memory containing the data being transferred. But if that memory occupies only part of an IOMMU granule, other parts of the granule may contain unrelated kernel data. Since IOMMU access control is per-granule, the untrusted device can gain access to the unrelated kernel data. This problem is solved by bounce buffering the DMA operation and ensuring that unused portions of the bounce buffers do not contain any unrelated kernel data.”h]”hXŠOther edge case scenarios arise for bounce buffers. For example, when IOMMU mappings are set up for a DMA operation to/from a device that is considered “untrustedâ€, the device should be given access only to the memory containing the data being transferred. But if that memory occupies only part of an IOMMU granule, other parts of the granule may contain unrelated kernel data. Since IOMMU access control is per-granule, the untrusted device can gain access to the unrelated kernel data. This problem is solved by bounce buffering the DMA operation and ensuring that unused portions of the bounce buffers do not contain any unrelated kernel data.”…””}”(hj"hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h K1hhõhžhubeh}”(h]”Œusage-scenarios”ah ]”h"]”Œusage scenarios”ah$]”h&]”uh1h´hh¶hžhhŸh³h Kubhµ)”}”(hhh]”(hº)”}”(hŒCore Functionality”h]”hŒCore Functionality”…””}”(hj;hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h¹hj8hžhhŸh³h K/queue/max_sectors_kb. For example, if a device does not use swiotlb, max_sectors_kb might be 512 KiB or larger. If a device might use swiotlb, max_sectors_kb will be 256 KiB. When min_align_mask is non-zero, max_sectors_kb might be even smaller, such as 252 KiB.”h]”hXsA key device DMA setting is “min_align_maskâ€, which is a power of 2 minus 1 so that some number of low order bits are set, or it may be zero. swiotlb allocations ensure these min_align_mask bits of the physical address of the bounce buffer match the same bits in the address of the original buffer. When min_align_mask is non-zero, it may produce an “alignment offset†in the address of the bounce buffer that slightly reduces the maximum size of an allocation. This potential alignment offset is reflected in the value returned by swiotlb_max_mapping_size(), which can show up in places like /sys/block//queue/max_sectors_kb. For example, if a device does not use swiotlb, max_sectors_kb might be 512 KiB or larger. If a device might use swiotlb, max_sectors_kb will be 256 KiB. When min_align_mask is non-zero, max_sectors_kb might be even smaller, such as 252 KiB.”…””}”(hj¶hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h Krhj{hžhubhÊ)”}”(hX>swiotlb_tbl_map_single() also takes an "alloc_align_mask" parameter. This parameter specifies the allocation of bounce buffer space must start at a physical address with the alloc_align_mask bits set to zero. But the actual bounce buffer might start at a larger address if min_align_mask is non-zero. Hence there may be pre-padding space that is allocated prior to the start of the bounce buffer. Similarly, the end of the bounce buffer is rounded up to an alloc_align_mask boundary, potentially resulting in post-padding space. Any pre-padding or post-padding space is not initialized by swiotlb code. The "alloc_align_mask" parameter is used by IOMMU code when mapping for untrusted devices. It is set to the granule size - 1 so that the bounce buffer is allocated entirely from granules that are not used for any other purpose.”h]”hXFswiotlb_tbl_map_single() also takes an “alloc_align_mask†parameter. This parameter specifies the allocation of bounce buffer space must start at a physical address with the alloc_align_mask bits set to zero. But the actual bounce buffer might start at a larger address if min_align_mask is non-zero. Hence there may be pre-padding space that is allocated prior to the start of the bounce buffer. Similarly, the end of the bounce buffer is rounded up to an alloc_align_mask boundary, potentially resulting in post-padding space. Any pre-padding or post-padding space is not initialized by swiotlb code. The “alloc_align_mask†parameter is used by IOMMU code when mapping for untrusted devices. It is set to the granule size - 1 so that the bounce buffer is allocated entirely from granules that are not used for any other purpose.”…””}”(hjÄhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h Khj{hžhubeh}”(h]”Œcore-functionality-constraints”ah ]”h"]”Œcore functionality constraints”ah$]”h&]”uh1h´hh¶hžhhŸh³h KUubhµ)”}”(hhh]”(hº)”}”(hŒData structures concepts”h]”hŒData structures concepts”…””}”(hjÝhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h¹hjÚhžhhŸh³h KŒubhÊ)”}”(hX7Memory used for swiotlb bounce buffers is allocated from overall system memory as one or more "pools". The default pool is allocated during system boot with a default size of 64 MiB. The default pool size may be modified with the "swiotlb=" kernel boot line parameter. The default size may also be adjusted due to other conditions, such as running in a CoCo VM, as described above. If CONFIG_SWIOTLB_DYNAMIC is enabled, additional pools may be allocated later in the life of the system. Each pool must be a contiguous range of physical memory. The default pool is allocated below the 4 GiB physical address line so it works for devices that can only address 32-bits of physical memory (unless architecture-specific code provides the SWIOTLB_ANY flag). In a CoCo VM, the pool memory must be decrypted before swiotlb is used.”h]”hX?Memory used for swiotlb bounce buffers is allocated from overall system memory as one or more “poolsâ€. The default pool is allocated during system boot with a default size of 64 MiB. The default pool size may be modified with the “swiotlb=†kernel boot line parameter. The default size may also be adjusted due to other conditions, such as running in a CoCo VM, as described above. If CONFIG_SWIOTLB_DYNAMIC is enabled, additional pools may be allocated later in the life of the system. Each pool must be a contiguous range of physical memory. The default pool is allocated below the 4 GiB physical address line so it works for devices that can only address 32-bits of physical memory (unless architecture-specific code provides the SWIOTLB_ANY flag). In a CoCo VM, the pool memory must be decrypted before swiotlb is used.”…””}”(hjëhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h KhjÚhžhubhÊ)”}”(hXFEach pool is divided into "slots" of size IO_TLB_SIZE, which is 2 KiB with current definitions. IO_TLB_SEGSIZE contiguous slots (128 slots) constitute what might be called a "slot set". When a bounce buffer is allocated, it occupies one or more contiguous slots. A slot is never shared by multiple bounce buffers. Furthermore, a bounce buffer must be allocated from a single slot set, which leads to the maximum bounce buffer size being IO_TLB_SIZE * IO_TLB_SEGSIZE. Multiple smaller bounce buffers may co-exist in a single slot set if the alignment and size constraints can be met.”h]”hXNEach pool is divided into “slots†of size IO_TLB_SIZE, which is 2 KiB with current definitions. IO_TLB_SEGSIZE contiguous slots (128 slots) constitute what might be called a “slot setâ€. When a bounce buffer is allocated, it occupies one or more contiguous slots. A slot is never shared by multiple bounce buffers. Furthermore, a bounce buffer must be allocated from a single slot set, which leads to the maximum bounce buffer size being IO_TLB_SIZE * IO_TLB_SEGSIZE. Multiple smaller bounce buffers may co-exist in a single slot set if the alignment and size constraints can be met.”…””}”(hjùhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h K™hjÚhžhubhÊ)”}”(hXdSlots are also grouped into "areas", with the constraint that a slot set exists entirely in a single area. Each area has its own spin lock that must be held to manipulate the slots in that area. The division into areas avoids contending for a single global spin lock when swiotlb is heavily used, such as in a CoCo VM. The number of areas defaults to the number of CPUs in the system for maximum parallelism, but since an area can't be smaller than IO_TLB_SEGSIZE slots, it might be necessary to assign multiple CPUs to the same area. The number of areas can also be set via the "swiotlb=" kernel boot parameter.”h]”hXnSlots are also grouped into “areasâ€, with the constraint that a slot set exists entirely in a single area. Each area has its own spin lock that must be held to manipulate the slots in that area. The division into areas avoids contending for a single global spin lock when swiotlb is heavily used, such as in a CoCo VM. The number of areas defaults to the number of CPUs in the system for maximum parallelism, but since an area can’t be smaller than IO_TLB_SEGSIZE slots, it might be necessary to assign multiple CPUs to the same area. The number of areas can also be set via the “swiotlb=†kernel boot parameter.”…””}”(hjhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h K¢hjÚhžhubhÊ)”}”(hX•When allocating a bounce buffer, if the area associated with the calling CPU does not have enough free space, areas associated with other CPUs are tried sequentially. For each area tried, the area's spin lock must be obtained before trying an allocation, so contention may occur if swiotlb is relatively busy overall. But an allocation request does not fail unless all areas do not have enough free space.”h]”hX—When allocating a bounce buffer, if the area associated with the calling CPU does not have enough free space, areas associated with other CPUs are tried sequentially. For each area tried, the area’s spin lock must be obtained before trying an allocation, so contention may occur if swiotlb is relatively busy overall. But an allocation request does not fail unless all areas do not have enough free space.”…””}”(hjhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h K«hjÚhžhubhÊ)”}”(hŒïIO_TLB_SIZE, IO_TLB_SEGSIZE, and the number of areas must all be powers of 2 as the code uses shifting and bit masking to do many of the calculations. The number of areas is rounded up to a power of 2 if necessary to meet this requirement.”h]”hŒïIO_TLB_SIZE, IO_TLB_SEGSIZE, and the number of areas must all be powers of 2 as the code uses shifting and bit masking to do many of the calculations. The number of areas is rounded up to a power of 2 if necessary to meet this requirement.”…””}”(hj#hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h K²hjÚhžhubhÊ)”}”(hXThe default pool is allocated with PAGE_SIZE alignment. If an alloc_align_mask argument to swiotlb_tbl_map_single() specifies a larger alignment, one or more initial slots in each slot set might not meet the alloc_align_mask criterium. Because a bounce buffer allocation can't cross a slot set boundary, eliminating those initial slots effectively reduces the max size of a bounce buffer. Currently, there's no problem because alloc_align_mask is set based on IOMMU granule size, and granules cannot be larger than PAGE_SIZE. But if that were to change in the future, the initial pool allocation might need to be done with alignment larger than PAGE_SIZE.”h]”hX“The default pool is allocated with PAGE_SIZE alignment. If an alloc_align_mask argument to swiotlb_tbl_map_single() specifies a larger alignment, one or more initial slots in each slot set might not meet the alloc_align_mask criterium. Because a bounce buffer allocation can’t cross a slot set boundary, eliminating those initial slots effectively reduces the max size of a bounce buffer. Currently, there’s no problem because alloc_align_mask is set based on IOMMU granule size, and granules cannot be larger than PAGE_SIZE. But if that were to change in the future, the initial pool allocation might need to be done with alignment larger than PAGE_SIZE.”…””}”(hj1hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h K·hjÚhžhubeh}”(h]”Œdata-structures-concepts”ah ]”h"]”Œdata structures concepts”ah$]”h&]”uh1h´hh¶hžhhŸh³h KŒubhµ)”}”(hhh]”(hº)”}”(hŒDynamic swiotlb”h]”hŒDynamic swiotlb”…””}”(hjJhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h¹hjGhžhhŸh³h KÂubhÊ)”}”(hX#When CONFIG_SWIOTLB_DYNAMIC is enabled, swiotlb can do on-demand expansion of the amount of memory available for allocation as bounce buffers. If a bounce buffer request fails due to lack of available space, an asynchronous background task is kicked off to allocate memory from general system memory and turn it into an swiotlb pool. Creating an additional pool must be done asynchronously because the memory allocation may block, and as noted above, swiotlb requests are not allowed to block. Once the background task is kicked off, the bounce buffer request creates a "transient pool" to avoid returning an "swiotlb full" error. A transient pool has the size of the bounce buffer request, and is deleted when the bounce buffer is freed. Memory for this transient pool comes from the general system memory atomic pool so that creation does not block. Creating a transient pool has relatively high cost, particularly in a CoCo VM where the memory must be decrypted, so it is done only as a stopgap until the background task can add another non-transient pool.”h]”hX+When CONFIG_SWIOTLB_DYNAMIC is enabled, swiotlb can do on-demand expansion of the amount of memory available for allocation as bounce buffers. If a bounce buffer request fails due to lack of available space, an asynchronous background task is kicked off to allocate memory from general system memory and turn it into an swiotlb pool. Creating an additional pool must be done asynchronously because the memory allocation may block, and as noted above, swiotlb requests are not allowed to block. Once the background task is kicked off, the bounce buffer request creates a “transient pool†to avoid returning an “swiotlb full†error. A transient pool has the size of the bounce buffer request, and is deleted when the bounce buffer is freed. Memory for this transient pool comes from the general system memory atomic pool so that creation does not block. Creating a transient pool has relatively high cost, particularly in a CoCo VM where the memory must be decrypted, so it is done only as a stopgap until the background task can add another non-transient pool.”…””}”(hjXhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h KÃhjGhžhubhÊ)”}”(hXÑAdding a dynamic pool has limitations. Like with the default pool, the memory must be physically contiguous, so the size is limited to MAX_PAGE_ORDER pages (e.g., 4 MiB on a typical x86 system). Due to memory fragmentation, a max size allocation may not be available. The dynamic pool allocator tries smaller sizes until it succeeds, but with a minimum size of 1 MiB. Given sufficient system memory fragmentation, dynamically adding a pool might not succeed at all.”h]”hXÑAdding a dynamic pool has limitations. Like with the default pool, the memory must be physically contiguous, so the size is limited to MAX_PAGE_ORDER pages (e.g., 4 MiB on a typical x86 system). Due to memory fragmentation, a max size allocation may not be available. The dynamic pool allocator tries smaller sizes until it succeeds, but with a minimum size of 1 MiB. Given sufficient system memory fragmentation, dynamically adding a pool might not succeed at all.”…””}”(hjfhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h KÒhjGhžhubhÊ)”}”(hXœThe number of areas in a dynamic pool may be different from the number of areas in the default pool. Because the new pool size is typically a few MiB at most, the number of areas will likely be smaller. For example, with a new pool size of 4 MiB and the 256 KiB minimum area size, only 16 areas can be created. If the system has more than 16 CPUs, multiple CPUs must share an area, creating more lock contention.”h]”hXœThe number of areas in a dynamic pool may be different from the number of areas in the default pool. Because the new pool size is typically a few MiB at most, the number of areas will likely be smaller. For example, with a new pool size of 4 MiB and the 256 KiB minimum area size, only 16 areas can be created. If the system has more than 16 CPUs, multiple CPUs must share an area, creating more lock contention.”…””}”(hjthžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h KÙhjGhžhubhÊ)”}”(hX9New pools added via dynamic swiotlb are linked together in a linear list. swiotlb code frequently must search for the pool containing a particular swiotlb physical address, so that search is linear and not performant with a large number of dynamic pools. The data structures could be improved for faster searches.”h]”hX9New pools added via dynamic swiotlb are linked together in a linear list. swiotlb code frequently must search for the pool containing a particular swiotlb physical address, so that search is linear and not performant with a large number of dynamic pools. The data structures could be improved for faster searches.”…””}”(hj‚hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h KàhjGhžhubhÊ)”}”(hX2Overall, dynamic swiotlb works best for small configurations with relatively few CPUs. It allows the default swiotlb pool to be smaller so that memory is not wasted, with dynamic pools making more space available if needed (as long as fragmentation isn't an obstacle). It is less useful for large CoCo VMs.”h]”hX4Overall, dynamic swiotlb works best for small configurations with relatively few CPUs. It allows the default swiotlb pool to be smaller so that memory is not wasted, with dynamic pools making more space available if needed (as long as fragmentation isn’t an obstacle). It is less useful for large CoCo VMs.”…””}”(hjhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h KæhjGhžhubeh}”(h]”Œdynamic-swiotlb”ah ]”h"]”Œdynamic swiotlb”ah$]”h&]”uh1h´hh¶hžhhŸh³h KÂubhµ)”}”(hhh]”(hº)”}”(hŒData Structure Details”h]”hŒData Structure Details”…””}”(hj©hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h¹hj¦hžhhŸh³h KìubhÊ)”}”(hXºswiotlb is managed with four primary data structures: io_tlb_mem, io_tlb_pool, io_tlb_area, and io_tlb_slot. io_tlb_mem describes a swiotlb memory allocator, which includes the default memory pool and any dynamic or transient pools linked to it. Limited statistics on swiotlb usage are kept per memory allocator and are stored in this data structure. These statistics are available under /sys/kernel/debug/swiotlb when CONFIG_DEBUG_FS is set.”h]”hXºswiotlb is managed with four primary data structures: io_tlb_mem, io_tlb_pool, io_tlb_area, and io_tlb_slot. io_tlb_mem describes a swiotlb memory allocator, which includes the default memory pool and any dynamic or transient pools linked to it. Limited statistics on swiotlb usage are kept per memory allocator and are stored in this data structure. These statistics are available under /sys/kernel/debug/swiotlb when CONFIG_DEBUG_FS is set.”…””}”(hj·hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h Kíhj¦hžhubhÊ)”}”(hX:io_tlb_pool describes a memory pool, either the default pool, a dynamic pool, or a transient pool. The description includes the start and end addresses of the memory in the pool, a pointer to an array of io_tlb_area structures, and a pointer to an array of io_tlb_slot structures that are associated with the pool.”h]”hX:io_tlb_pool describes a memory pool, either the default pool, a dynamic pool, or a transient pool. The description includes the start and end addresses of the memory in the pool, a pointer to an array of io_tlb_area structures, and a pointer to an array of io_tlb_slot structures that are associated with the pool.”…””}”(hjÅhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h Kôhj¦hžhubhÊ)”}”(hXJio_tlb_area describes an area. The primary field is the spin lock used to serialize access to slots in the area. The io_tlb_area array for a pool has an entry for each area, and is accessed using a 0-based area index derived from the calling processor ID. Areas exist solely to allow parallel access to swiotlb from multiple CPUs.”h]”hXJio_tlb_area describes an area. The primary field is the spin lock used to serialize access to slots in the area. The io_tlb_area array for a pool has an entry for each area, and is accessed using a 0-based area index derived from the calling processor ID. Areas exist solely to allow parallel access to swiotlb from multiple CPUs.”…””}”(hjÓhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h Kùhj¦hžhubhÊ)”}”(hXOio_tlb_slot describes an individual memory slot in the pool, with size IO_TLB_SIZE (2 KiB currently). The io_tlb_slot array is indexed by the slot index computed from the bounce buffer address relative to the starting memory address of the pool. The size of struct io_tlb_slot is 24 bytes, so the overhead is about 1% of the slot size.”h]”hXOio_tlb_slot describes an individual memory slot in the pool, with size IO_TLB_SIZE (2 KiB currently). The io_tlb_slot array is indexed by the slot index computed from the bounce buffer address relative to the starting memory address of the pool. The size of struct io_tlb_slot is 24 bytes, so the overhead is about 1% of the slot size.”…””}”(hjáhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h Kÿhj¦hžhubhÊ)”}”(hX¢The io_tlb_slot array is designed to meet several requirements. First, the DMA APIs and the corresponding swiotlb APIs use the bounce buffer address as the identifier for a bounce buffer. This address is returned by swiotlb_tbl_map_single(), and then passed as an argument to swiotlb_tbl_unmap_single() and the swiotlb_sync_*() functions. The original memory buffer address obviously must be passed as an argument to swiotlb_tbl_map_single(), but it is not passed to the other APIs. Consequently, swiotlb data structures must save the original memory buffer address so that it can be used when doing sync operations. This original address is saved in the io_tlb_slot array.”h]”hX¢The io_tlb_slot array is designed to meet several requirements. First, the DMA APIs and the corresponding swiotlb APIs use the bounce buffer address as the identifier for a bounce buffer. This address is returned by swiotlb_tbl_map_single(), and then passed as an argument to swiotlb_tbl_unmap_single() and the swiotlb_sync_*() functions. The original memory buffer address obviously must be passed as an argument to swiotlb_tbl_map_single(), but it is not passed to the other APIs. Consequently, swiotlb data structures must save the original memory buffer address so that it can be used when doing sync operations. This original address is saved in the io_tlb_slot array.”…””}”(hjïhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h Mhj¦hžhubhÊ)”}”(hXSecond, the io_tlb_slot array must handle partial sync requests. In such cases, the argument to swiotlb_sync_*() is not the address of the start of the bounce buffer but an address somewhere in the middle of the bounce buffer, and the address of the start of the bounce buffer isn't known to swiotlb code. But swiotlb code must be able to calculate the corresponding original memory buffer address to do the CPU copy dictated by the "sync". So an adjusted original memory buffer address is populated into the struct io_tlb_slot for each slot occupied by the bounce buffer. An adjusted "alloc_size" of the bounce buffer is also recorded in each struct io_tlb_slot so a sanity check can be performed on the size of the "sync" operation. The "alloc_size" field is not used except for the sanity check.”h]”hX0Second, the io_tlb_slot array must handle partial sync requests. In such cases, the argument to swiotlb_sync_*() is not the address of the start of the bounce buffer but an address somewhere in the middle of the bounce buffer, and the address of the start of the bounce buffer isn’t known to swiotlb code. But swiotlb code must be able to calculate the corresponding original memory buffer address to do the CPU copy dictated by the “syncâ€. So an adjusted original memory buffer address is populated into the struct io_tlb_slot for each slot occupied by the bounce buffer. An adjusted “alloc_size†of the bounce buffer is also recorded in each struct io_tlb_slot so a sanity check can be performed on the size of the “sync†operation. The “alloc_size†field is not used except for the sanity check.”…””}”(hjýhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h Mhj¦hžhubhÊ)”}”(hXThird, the io_tlb_slot array is used to track available slots. The "list" field in struct io_tlb_slot records how many contiguous available slots exist starting at that slot. A "0" indicates that the slot is occupied. A value of "1" indicates only the current slot is available. A value of "2" indicates the current slot and the next slot are available, etc. The maximum value is IO_TLB_SEGSIZE, which can appear in the first slot in a slot set, and indicates that the entire slot set is available. These values are used when searching for available slots to use for a new bounce buffer. They are updated when allocating a new bounce buffer and when freeing a bounce buffer. At pool creation time, the "list" field is initialized to IO_TLB_SEGSIZE down to 1 for the slots in every slot set.”h]”hX*Third, the io_tlb_slot array is used to track available slots. The “list†field in struct io_tlb_slot records how many contiguous available slots exist starting at that slot. A “0†indicates that the slot is occupied. A value of “1†indicates only the current slot is available. A value of “2†indicates the current slot and the next slot are available, etc. The maximum value is IO_TLB_SEGSIZE, which can appear in the first slot in a slot set, and indicates that the entire slot set is available. These values are used when searching for available slots to use for a new bounce buffer. They are updated when allocating a new bounce buffer and when freeing a bounce buffer. At pool creation time, the “list†field is initialized to IO_TLB_SEGSIZE down to 1 for the slots in every slot set.”…””}”(hj hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h Mhj¦hžhubhÊ)”}”(hXºFourth, the io_tlb_slot array keeps track of any "padding slots" allocated to meet alloc_align_mask requirements described above. When swiotlb_tbl_map_single() allocates bounce buffer space to meet alloc_align_mask requirements, it may allocate pre-padding space across zero or more slots. But when swiotlb_tbl_unmap_single() is called with the bounce buffer address, the alloc_align_mask value that governed the allocation, and therefore the allocation of any padding slots, is not known. The "pad_slots" field records the number of padding slots so that swiotlb_tbl_unmap_single() can free them. The "pad_slots" value is recorded only in the first non-padding slot allocated to the bounce buffer.”h]”hXÆFourth, the io_tlb_slot array keeps track of any “padding slots†allocated to meet alloc_align_mask requirements described above. When swiotlb_tbl_map_single() allocates bounce buffer space to meet alloc_align_mask requirements, it may allocate pre-padding space across zero or more slots. But when swiotlb_tbl_unmap_single() is called with the bounce buffer address, the alloc_align_mask value that governed the allocation, and therefore the allocation of any padding slots, is not known. The “pad_slots†field records the number of padding slots so that swiotlb_tbl_unmap_single() can free them. The “pad_slots†value is recorded only in the first non-padding slot allocated to the bounce buffer.”…””}”(hjhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h M(hj¦hžhubeh}”(h]”Œdata-structure-details”ah ]”h"]”Œdata structure details”ah$]”h&]”uh1h´hh¶hžhhŸh³h Kìubhµ)”}”(hhh]”(hº)”}”(hŒRestricted pools”h]”hŒRestricted pools”…””}”(hj2hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h¹hj/hžhhŸh³h M4ubhÊ)”}”(hX%The swiotlb machinery is also used for "restricted pools", which are pools of memory separate from the default swiotlb pool, and that are dedicated for DMA use by a particular device. Restricted pools provide a level of DMA memory protection on systems with limited hardware protection capabilities, such as those lacking an IOMMU. Such usage is specified by DeviceTree entries and requires that CONFIG_DMA_RESTRICTED_POOL is set. Each restricted pool is based on its own io_tlb_mem data structure that is independent of the main swiotlb io_tlb_mem.”h]”hX)The swiotlb machinery is also used for “restricted poolsâ€, which are pools of memory separate from the default swiotlb pool, and that are dedicated for DMA use by a particular device. Restricted pools provide a level of DMA memory protection on systems with limited hardware protection capabilities, such as those lacking an IOMMU. Such usage is specified by DeviceTree entries and requires that CONFIG_DMA_RESTRICTED_POOL is set. Each restricted pool is based on its own io_tlb_mem data structure that is independent of the main swiotlb io_tlb_mem.”…””}”(hj@hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h M5hj/hžhubhÊ)”}”(hX Restricted pools add swiotlb_alloc() and swiotlb_free() APIs, which are called from the dma_alloc_*() and dma_free_*() APIs. The swiotlb_alloc/free() APIs allocate/free slots from/to the restricted pool directly and do not go through swiotlb_tbl_map/unmap_single().”h]”hX Restricted pools add swiotlb_alloc() and swiotlb_free() APIs, which are called from the dma_alloc_*() and dma_free_*() APIs. The swiotlb_alloc/free() APIs allocate/free slots from/to the restricted pool directly and do not go through swiotlb_tbl_map/unmap_single().”…””}”(hjNhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÉhŸh³h M>hj/hžhubeh}”(h]”Œrestricted-pools”ah ]”h"]”Œrestricted pools”ah$]”h&]”uh1h´hh¶hžhhŸh³h M4ubeh}”(h]”Œdma-and-swiotlb”ah ]”h"]”Œdma and swiotlb”ah$]”h&]”uh1h´hhhžhhŸh³h Kubeh}”(h]”h ]”h"]”h$]”h&]”Œsource”h³uh1hŒcurrent_source”NŒ current_line”NŒsettings”Œdocutils.frontend”ŒValues”“”)”}”(h¹NŒ generator”NŒ datestamp”NŒ source_link”NŒ source_url”NŒ toc_backlinks”Œentry”Œfootnote_backlinks”KŒ sectnum_xform”KŒstrip_comments”NŒstrip_elements_with_classes”NŒ strip_classes”NŒ report_level”KŒ halt_level”KŒexit_status_level”KŒdebug”NŒwarning_stream”NŒ traceback”ˆŒinput_encoding”Œ utf-8-sig”Œinput_encoding_error_handler”Œstrict”Œoutput_encoding”Œutf-8”Œoutput_encoding_error_handler”jŒerror_encoding”Œutf-8”Œerror_encoding_error_handler”Œbackslashreplace”Œ language_code”Œen”Œrecord_dependencies”NŒconfig”NŒ id_prefix”hŒauto_id_prefix”Œid”Œ dump_settings”NŒdump_internals”NŒdump_transforms”NŒdump_pseudo_xml”NŒexpose_internals”NŒstrict_visitor”NŒ_disable_config”NŒ_source”h³Œ _destination”NŒ _config_files”]”Œ7/var/lib/git/docbuild/linux/Documentation/docutils.conf”aŒfile_insertion_enabled”ˆŒ raw_enabled”KŒline_length_limit”M'Œpep_references”NŒ pep_base_url”Œhttps://peps.python.org/”Œpep_file_url_template”Œpep-%04d”Œrfc_references”NŒ rfc_base_url”Œ&https://datatracker.ietf.org/doc/html/”Œ tab_width”KŒtrim_footnote_reference_space”‰Œsyntax_highlight”Œlong”Œ smart_quotes”ˆŒsmartquotes_locales”]”Œcharacter_level_inline_markup”‰Œdoctitle_xform”‰Œ docinfo_xform”KŒsectsubtitle_xform”‰Œ image_loading”Œlink”Œembed_stylesheet”‰Œcloak_email_addresses”ˆŒsection_self_link”‰Œenv”NubŒreporter”NŒindirect_targets”]”Œsubstitution_defs”}”Œsubstitution_names”}”Œrefnames”}”Œrefids”}”Œnameids”}”(jijfj5j2jxjuj×jÔjDjAj£j j,j)jaj^uŒ nametypes”}”(ji‰j5‰jx‰j׉jD‰j£‰j,‰ja‰uh}”(jfh¶j2hõjuj8jÔj{jAjÚj jGj)j¦j^j/uŒ footnote_refs”}”Œ citation_refs”}”Œ autofootnotes”]”Œautofootnote_refs”]”Œsymbol_footnotes”]”Œsymbol_footnote_refs”]”Œ footnotes”]”Œ citations”]”Œautofootnote_start”KŒsymbol_footnote_start”KŒ id_counter”Œ collections”ŒCounter”“”}”…”R”Œparse_messages”]”Œtransform_messages”]”Œ transformer”NŒ include_log”]”Œ decoration”Nhžhub.