DRM Driver uAPI
drm/i915 uAPI
uevents generated by i915 on its device node
- I915_L3_PARITY_UEVENT - Generated when the driver receives a parity mismatch event from the GPU L3 cache. Additional information supplied is ROW, BANK, SUBBANK, SLICE of the affected cacheline. Userspace should keep track of these events, and if a specific cacheline seems to have a persistent error, remap it with the L3 remapping tool supplied in intel-gpu-tools. The value supplied with the event is always 1.
- I915_ERROR_UEVENT - Generated upon error detection, currently only via hangcheck. The error detection event is a good indicator of when things began to go badly. The value supplied with the event is a 1 upon error detection, and a 0 upon reset completion, signifying no more error exists. NOTE: Disabling hangcheck or reset via module parameter will cause the related events to not be seen.
- I915_RESET_UEVENT - Generated just before an attempt to reset the GPU. The value supplied with the event is always 1. NOTE: Disabling reset via module parameter will cause this event to not be seen.
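The bookkeeping suggested for I915_L3_PARITY_UEVENT could be sketched roughly as follows. This is only an illustration of counting events per cacheline: the table size and every name here (`l3_tracker`, `l3_location`, `l3_tracker_record`) are hypothetical, and how the ROW, BANK, SUBBANK and SLICE values are read from the uevent is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace bookkeeping; not part of the i915 uAPI. */
#define L3_TRACKER_SLOTS 64

struct l3_location {
	unsigned int row, bank, subbank, slice;
};

struct l3_tracker {
	struct l3_location loc[L3_TRACKER_SLOTS];
	unsigned int hits[L3_TRACKER_SLOTS];
	size_t used;
};

static bool l3_same(const struct l3_location *a, const struct l3_location *b)
{
	return a->row == b->row && a->bank == b->bank &&
	       a->subbank == b->subbank && a->slice == b->slice;
}

/*
 * Record one parity event and return how many times this cacheline has
 * been reported so far. Returns 0 if the table is full.
 */
static unsigned int l3_tracker_record(struct l3_tracker *t,
				      struct l3_location where)
{
	size_t i;

	assert(t != NULL);
	for (i = 0; i < t->used; i++)
		if (l3_same(&t->loc[i], &where))
			return ++t->hits[i];

	if (t->used == L3_TRACKER_SLOTS)
		return 0;

	t->loc[t->used] = where;
	t->hits[t->used] = 1;
	t->used++;
	return 1;
}
```

A caller would compare the returned count against a threshold of its own choosing before deciding that a cacheline has a persistent error.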
struct i915_user_extension
Base class for defining a chain of extensions
Definition
struct i915_user_extension {
__u64 next_extension;
__u32 name;
__u32 flags;
__u32 rsvd[4];
};
Members
next_extension
Pointer to the next struct i915_user_extension, or zero if the end.
name
Name of the extension.
Note that the name here is just some integer.
Also note that the name space for this is not global for the whole driver, but rather its scope/meaning is limited to the specific piece of uAPI which has embedded the struct i915_user_extension.
flags
MBZ
All undefined bits must be zero.
rsvd
MBZ
Reserved for future use; must be zero.
Description
Many interfaces need to grow over time. In most cases we can simply extend the struct and have userspace pass in more data. Another option, as demonstrated by Vulkan’s approach to providing extensions for forward and backward compatibility, is to use a list of optional structs to provide those extra details.
The key advantage to using an extension chain is that it allows us to redefine the interface more easily than an ever growing struct of increasing complexity, and for large parts of that interface to be entirely optional. The downside is more pointer chasing; chasing across the __user boundary with pointers encapsulated inside u64.
Example chaining:
struct i915_user_extension ext3 = {
.next_extension = 0, // end
.name = ...,
};
struct i915_user_extension ext2 = {
.next_extension = (uintptr_t)&ext3,
.name = ...,
};
struct i915_user_extension ext1 = {
.next_extension = (uintptr_t)&ext2,
.name = ...,
};
Typically the struct i915_user_extension would be embedded in some uAPI struct, and in this case we would feed it the head of the chain (i.e. ext1), which would then apply all of the above extensions.
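To make the pointer chasing concrete, the following sketch walks such a chain from userspace. It uses a local mirror of the struct (`user_extension`) so that it builds without the kernel headers; the helper name `count_extensions` and the `max_links` guard are illustrative assumptions, not part of the uAPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the layout shown above (hypothetical name). */
struct user_extension {
	uint64_t next_extension;
	uint32_t name;
	uint32_t flags;
	uint32_t rsvd[4];
};

/*
 * Walk a chain and count its links. Each link stores the address of the
 * next one inside a 64-bit integer, so it is converted back through
 * uintptr_t. Returns -1 if the chain is longer than max_links, which
 * guards against accidental cycles.
 */
static int count_extensions(const struct user_extension *head, int max_links)
{
	int n = 0;

	assert(max_links >= 0);
	while (head) {
		if (n == max_links)
			return -1;
		n++;
		head = (const struct user_extension *)(uintptr_t)head->next_extension;
	}
	return n;
}
```

Storing pointers as 64-bit integers keeps the struct layout identical for 32-bit and 64-bit userspace, which is why the cast through `uintptr_t` is needed in both directions.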
perf_events exposed by i915 through /sys/bus/event_source/devices/i915
struct drm_i915_gem_mmap_offset
Retrieve an offset so we can mmap this buffer object.
Definition
struct drm_i915_gem_mmap_offset {
__u32 handle;
__u32 pad;
__u64 offset;
__u64 flags;
#define I915_MMAP_OFFSET_GTT 0
#define I915_MMAP_OFFSET_WC 1
#define I915_MMAP_OFFSET_WB 2
#define I915_MMAP_OFFSET_UC 3
#define I915_MMAP_OFFSET_FIXED 4
__u64 extensions;
};
Members
handle
Handle for the object being mapped.
pad
Must be zero
offset
The fake offset to use for subsequent mmap call
This is a fixed-size type for 32/64 compatibility.
flags
Flags for extended behaviour.
It is mandatory that one of the MMAP_OFFSET types should be included:
I915_MMAP_OFFSET_GTT: Use mmap with the object bound to GTT. (Write-Combined)
I915_MMAP_OFFSET_WC: Use Write-Combined caching.
I915_MMAP_OFFSET_WB: Use Write-Back caching.
I915_MMAP_OFFSET_FIXED: Use object placement to determine caching.
On devices with local memory I915_MMAP_OFFSET_FIXED is the only valid type. On devices without local memory, this caching mode is invalid.
When I915_MMAP_OFFSET_FIXED is specified, the caching mode will be WC or WB, depending on the object placement at creation. WB will be used when the object can only exist in system memory, WC otherwise.
extensions
Zero-terminated chain of extensions.
No current extensions defined; mbz.
Description
This struct is passed as argument to the DRM_IOCTL_I915_GEM_MMAP_OFFSET ioctl, and is used to retrieve the fake offset to mmap an object specified by handle.
The legacy way of using DRM_IOCTL_I915_GEM_MMAP is removed on gen12+.
DRM_IOCTL_I915_GEM_MMAP_GTT is an older supported alias to this struct, but will behave as setting the extensions to 0, and flags to I915_MMAP_OFFSET_GTT.
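The rule that I915_MMAP_OFFSET_FIXED is the only valid type with local memory, and invalid without it, can be captured in a small helper. This is a sketch under assumptions: the `MMAP_OFFSET_*` macros are local copies of the values in the definition above, `pick_mmap_offset_type` is a hypothetical name, and falling back to WB when FIXED is requested without local memory is merely one possible userspace policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the values from the struct definition above. */
#define MMAP_OFFSET_GTT   0
#define MMAP_OFFSET_WC    1
#define MMAP_OFFSET_WB    2
#define MMAP_OFFSET_UC    3
#define MMAP_OFFSET_FIXED 4

/*
 * Pick the value for drm_i915_gem_mmap_offset.flags. Whether the device
 * has local memory is assumed to be known to the caller already.
 */
static uint64_t pick_mmap_offset_type(bool has_local_memory, uint64_t preferred)
{
	assert(preferred <= MMAP_OFFSET_FIXED);

	if (has_local_memory)
		return MMAP_OFFSET_FIXED;	/* the only valid type */

	/* FIXED is invalid without local memory; fall back to WB here. */
	if (preferred == MMAP_OFFSET_FIXED)
		return MMAP_OFFSET_WB;

	return preferred;
}
```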
struct drm_i915_gem_set_domain
Adjust the object’s write or read domain, in preparation for accessing the pages via some CPU domain.
Definition
struct drm_i915_gem_set_domain {
__u32 handle;
__u32 read_domains;
__u32 write_domain;
};
Members
handle
Handle for the object.
read_domains
New read domains.
write_domain
New write domain.
Note that having something in the write domain implies it’s in the read domain, and only that read domain.
Description
Specifying a new write or read domain will flush the object out of the previous domain (if required), before then updating the object’s domain tracking with the new domain.
Note this might involve waiting for the object first if it is still active on the GPU.
Supported values for read_domains and write_domain:
I915_GEM_DOMAIN_WC: Uncached write-combined domain
I915_GEM_DOMAIN_CPU: CPU cache domain
I915_GEM_DOMAIN_GTT: Mappable aperture domain
All other domains are rejected.
Note that for discrete, starting from DG1, this is no longer supported, and is instead rejected. On such platforms the CPU domain is effectively static, where we also only support a single drm_i915_gem_mmap_offset cache mode, which can’t be set explicitly and instead depends on the object placements, as per the below.
Implicit caching rules, starting from DG1:
- If any of the object placements (see drm_i915_gem_create_ext_memory_regions) contain I915_MEMORY_CLASS_DEVICE then the object will be allocated and mapped as write-combined only.
- Everything else is always allocated and mapped as write-back, with the guarantee that everything is also coherent with the GPU.
Note that this is likely to change in the future again, where we might need more flexibility on future devices, so making this all explicit as part of a new drm_i915_gem_create_ext extension is probable.
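The implicit caching rules can be written down as a small decision function. The sketch below only models the rule as stated; the enums and the name `implicit_caching` are hypothetical stand-ins, not the kernel’s types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the memory classes and CPU caching modes. */
enum mem_class { MEM_CLASS_SYSTEM, MEM_CLASS_DEVICE };
enum cpu_caching { CPU_CACHING_WB, CPU_CACHING_WC };

/*
 * Any device-local placement makes the object write-combined;
 * everything else is write-back.
 */
static enum cpu_caching implicit_caching(const enum mem_class *placements,
					 size_t count)
{
	size_t i;

	assert(placements != NULL || count == 0);
	for (i = 0; i < count; i++)
		if (placements[i] == MEM_CLASS_DEVICE)
			return CPU_CACHING_WC;

	return CPU_CACHING_WB;
}
```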
struct drm_i915_gem_caching
Set or get the caching for given object handle.
Definition
struct drm_i915_gem_caching {
__u32 handle;
#define I915_CACHING_NONE 0
#define I915_CACHING_CACHED 1
#define I915_CACHING_DISPLAY 2
__u32 caching;
};
Members
handle
Handle of the buffer to set/get the caching level.
caching
The GTT caching level to apply or possible return value.
The supported caching values:
I915_CACHING_NONE:
GPU access is not coherent with CPU caches. Default for machines without an LLC. This means manual flushing might be needed, if we want GPU access to be coherent.
I915_CACHING_CACHED:
GPU access is coherent with CPU caches and furthermore the data is cached in last-level caches shared between CPU cores and the GPU GT.
I915_CACHING_DISPLAY:
Special GPU caching mode which is coherent with the scanout engines. Transparently falls back to I915_CACHING_NONE on platforms where no special cache mode (like write-through or gfdt flushing) is available. The kernel automatically sets this mode when using a buffer as a scanout target. Userspace can manually set this mode to avoid a costly stall and clflush in the hotpath of drawing the first frame.
Description
Allow userspace to control the GTT caching bits for a given object when the object is later mapped through the ppGTT (or GGTT on older platforms lacking ppGTT support, or if the object is used for scanout). Note that this might require unbinding the object from the GTT first, if its current caching value doesn’t match.
Note that this all changes on discrete platforms: starting from DG1, the set/get caching is no longer supported, and is now rejected. Instead the CPU caching attributes (WB vs WC) will become an immutable creation time property for the object, along with the GTT caching level. For now we don’t expose any new uAPI for this; instead on DG1 this is all implicit, although this largely shouldn’t matter since DG1 is coherent by default (without any way of controlling it).
Implicit caching rules, starting from DG1:
- If any of the object placements (see drm_i915_gem_create_ext_memory_regions) contain I915_MEMORY_CLASS_DEVICE then the object will be allocated and mapped as write-combined only.
- Everything else is always allocated and mapped as write-back, with the guarantee that everything is also coherent with the GPU.
Note that this is likely to change in the future again, where we might need more flexibility on future devices, so making this all explicit as part of a new drm_i915_gem_create_ext extension is probable.
Side note: Part of the reason for this is that changing the at-allocation-time CPU caching attributes for the pages might be required (and is expensive) if we need to then CPU map the pages later with different caching attributes. This inconsistent caching behaviour, while supported on x86, is not universally supported on other architectures. So for simplicity we opt for setting everything at creation time, whilst also making it immutable, on discrete platforms.
Virtual Engine uAPI
Virtual engine is a concept where userspace is able to configure a set of physical engines, submit a batch buffer, and let the driver execute it on any engine from the set as it sees fit.
This is primarily useful on parts which have multiple instances of the same engine class, for example GT3+ Skylake parts with their two VCS engines.
For instance, userspace can enumerate all engines of a certain class using the previously described Engine Discovery uAPI. After that, userspace can create a GEM context with a placeholder slot for the virtual engine (using I915_ENGINE_CLASS_INVALID and I915_ENGINE_CLASS_INVALID_NONE for class and instance respectively) and finally, using the I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE extension, place a virtual engine in the same reserved slot.
Example of creating a virtual engine and submitting a batch buffer to it:
I915_DEFINE_CONTEXT_ENGINES_LOAD_BALANCE(virtual, 2) = {
.base.name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
.engine_index = 0, // Place this virtual engine into engine map slot 0
.num_siblings = 2,
.engines = { { I915_ENGINE_CLASS_VIDEO, 0 },
{ I915_ENGINE_CLASS_VIDEO, 1 }, },
};
I915_DEFINE_CONTEXT_PARAM_ENGINES(engines, 1) = {
.engines = { { I915_ENGINE_CLASS_INVALID,
I915_ENGINE_CLASS_INVALID_NONE } },
.extensions = to_user_pointer(&virtual), // Chains after load_balance extension
};
struct drm_i915_gem_context_create_ext_setparam p_engines = {
.base = {
.name = I915_CONTEXT_CREATE_EXT_SETPARAM,
},
.param = {
.param = I915_CONTEXT_PARAM_ENGINES,
.value = to_user_pointer(&engines),
.size = sizeof(engines),
},
};
struct drm_i915_gem_context_create_ext create = {
.flags = I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS,
.extensions = to_user_pointer(&p_engines),
};
ctx_id = gem_context_create_ext(drm_fd, &create);
// Now we have created a GEM context with its engine map containing a
// single virtual engine. Submissions to this slot can go either to
// vcs0 or vcs1, depending on the load balancing algorithm used inside
// the driver. The load balancing is dynamic from one batch buffer to
// another and transparent to userspace.
...
execbuf.rsvd1 = ctx_id;
execbuf.flags = 0; // Submits to index 0 which is the virtual engine
gem_execbuf(drm_fd, &execbuf);
struct i915_context_engines_parallel_submit
Configure engine for parallel submission.
Definition
struct i915_context_engines_parallel_submit {
struct i915_user_extension base;
__u16 engine_index;
__u16 width;
__u16 num_siblings;
__u16 mbz16;
__u64 flags;
__u64 mbz64[3];
struct i915_engine_class_instance engines[0];
};
Members
base
base user extension.
engine_index
slot for parallel engine
width
number of contexts per parallel engine or in other words the number of batches in each submission
num_siblings
number of siblings per context or in other words the number of possible placements for each submission
mbz16
reserved for future use; must be zero
flags
all undefined flags must be zero; currently no flags are defined
mbz64
reserved for future use; must be zero
engines
2-d array of engine instances to configure parallel engine
length = width (i) * num_siblings (j)
index = j + i * num_siblings
Description
Set up a slot in the context engine map to allow multiple BBs to be submitted in a single execbuf IOCTL. Those BBs will then be scheduled to run on the GPU in parallel. Multiple hardware contexts are created internally in the i915 to run these BBs. Once a slot is configured for N BBs, only N BBs can be submitted in each execbuf IOCTL. This is implicit behavior: the user doesn’t tell the execbuf IOCTL there are N BBs; the execbuf IOCTL knows how many BBs there are based on the slot’s configuration. The N BBs are the last N buffer objects, or the first N if I915_EXEC_BATCH_FIRST is set.
The default placement behavior is to create implicit bonds between each context if each context maps to more than 1 physical engine (e.g. context is a virtual engine). Also we only allow contexts of the same engine class, and these contexts must be in logically contiguous order. Examples of the placement behavior are described below. Lastly, the default is to not allow BBs to be preempted mid-batch. Instead, coordinated preemption points are inserted on all hardware contexts between each set of BBs. Flags could be added in the future to change both of these default behaviors.
Returns -EINVAL if hardware context placement configuration is invalid or if the placement configuration isn’t supported on the platform / submission interface. Returns -ENODEV if extension isn’t supported on the platform / submission interface.
Examples syntax:
CS[X] = generic engine of same class, logical instance X
INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
Example 1 pseudo code:
set_engines(INVALID)
set_parallel(engine_index=0, width=2, num_siblings=1,
engines=CS[0],CS[1])
Results in the following valid placement:
CS[0], CS[1]
Example 2 pseudo code:
set_engines(INVALID)
set_parallel(engine_index=0, width=2, num_siblings=2,
engines=CS[0],CS[2],CS[1],CS[3])
Results in the following valid placements:
CS[0], CS[1]
CS[2], CS[3]
This can be thought of as two virtual engines, each containing two
engines thereby making a 2D array. However, there are bonds tying the
entries together and placing restrictions on how they can be scheduled.
Specifically, the scheduler can choose only vertical columns from the 2D
array. That is, CS[0] is bonded to CS[1] and CS[2] to CS[3]. So if the
scheduler wants to submit to CS[0], it must also choose CS[1] and vice
versa. Likewise, choosing CS[2] requires also using CS[3].
VE[0] = CS[0], CS[2]
VE[1] = CS[1], CS[3]
Example 3 pseudo code:
set_engines(INVALID)
set_parallel(engine_index=0, width=2, num_siblings=2,
engines=CS[0],CS[1],CS[1],CS[3])
Results in the following valid and invalid placements:
CS[0], CS[1]
CS[1], CS[3] - Not logically contiguous, return -EINVAL
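The indexing formula for the engines array and the "logically contiguous" requirement from the examples can be sketched as below. This is a simplified model for illustration only: the helper names are hypothetical, the `logical` array is assumed to hold the logical instance number of each engines[] entry, and the driver’s actual validation may check more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Index into the flattened engines[] array: i selects the batch
 * (0..width-1), j selects the sibling (0..num_siblings-1).
 */
static size_t parallel_engine_index(size_t i, size_t j, size_t num_siblings)
{
	assert(j < num_siblings);
	return j + i * num_siblings;
}

/*
 * For every sibling j, check that consecutive batches use consecutive
 * logical instances: CS[n], CS[n + 1], ...
 */
static bool placements_contiguous(const unsigned int *logical,
				  size_t width, size_t num_siblings)
{
	size_t i, j;

	for (j = 0; j < num_siblings; j++) {
		for (i = 1; i < width; i++) {
			unsigned int prev =
				logical[parallel_engine_index(i - 1, j, num_siblings)];
			unsigned int cur =
				logical[parallel_engine_index(i, j, num_siblings)];

			if (cur != prev + 1)
				return false;
		}
	}
	return true;
}
```

With the engine lists from the examples above, Example 1 and Example 2 pass this check, while Example 3 fails on the CS[1], CS[3] placement.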
Context Engine Map uAPI
Context engine map is a new way of addressing engines when submitting batch buffers, replacing the existing way of using identifiers like I915_EXEC_BLT inside the flags field of struct drm_i915_gem_execbuffer2.
To use it, created GEM contexts need to be configured with a list of engines the user is intending to submit to. This is accomplished using the I915_CONTEXT_PARAM_ENGINES parameter and struct i915_context_param_engines.
For such contexts the I915_EXEC_RING_MASK field becomes an index into the configured map.
Example of creating such context and submitting against it:
I915_DEFINE_CONTEXT_PARAM_ENGINES(engines, 2) = {
.engines = { { I915_ENGINE_CLASS_RENDER, 0 },
{ I915_ENGINE_CLASS_COPY, 0 } }
};
struct drm_i915_gem_context_create_ext_setparam p_engines = {
.base = {
.name = I915_CONTEXT_CREATE_EXT_SETPARAM,
},
.param = {
.param = I915_CONTEXT_PARAM_ENGINES,
.value = to_user_pointer(&engines),
.size = sizeof(engines),
},
};
struct drm_i915_gem_context_create_ext create = {
.flags = I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS,
.extensions = to_user_pointer(&p_engines),
};
ctx_id = gem_context_create_ext(drm_fd, &create);
// We have now created a GEM context with two engines in the map:
// Index 0 points to rcs0 while index 1 points to bcs0. Other engines
// will not be accessible from this context.
...
execbuf.rsvd1 = ctx_id;
execbuf.flags = 0; // Submits to index 0, which is rcs0 for this context
gem_execbuf(drm_fd, &execbuf);
...
execbuf.rsvd1 = ctx_id;
execbuf.flags = 1; // Submits to index 1, which is bcs0 for this context
gem_execbuf(drm_fd, &execbuf);
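Since the engine map index lives in the I915_EXEC_RING_MASK bits of the execbuf flags, selecting a slot while keeping other flag bits intact might look like the sketch below. The mask value 0x3f is an assumption taken from i915_drm.h and should be checked against the header in use; `EXEC_RING_MASK` and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of I915_EXEC_RING_MASK; verify against i915_drm.h. */
#define EXEC_RING_MASK 0x3fu

/* Replace the engine-map index in flags, leaving other bits alone. */
static uint64_t execbuf_flags_for_slot(uint64_t other_flags, unsigned int slot)
{
	assert(slot <= EXEC_RING_MASK);
	return (other_flags & ~(uint64_t)EXEC_RING_MASK) | slot;
}

/* Read back which engine-map index a flags value selects. */
static unsigned int execbuf_slot(uint64_t flags)
{
	return (unsigned int)(flags & EXEC_RING_MASK);
}
```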
struct drm_i915_gem_userptr
Create GEM object from user allocated memory.
Definition
struct drm_i915_gem_userptr {
__u64 user_ptr;
__u64 user_size;
__u32 flags;
#define I915_USERPTR_READ_ONLY 0x1
#define I915_USERPTR_PROBE 0x2
#define I915_USERPTR_UNSYNCHRONIZED 0x80000000
__u32 handle;
};
Members
user_ptr
The pointer to the allocated memory.
Needs to be aligned to PAGE_SIZE.
user_size
The size in bytes for the allocated memory. This will also become the object size.
Needs to be aligned to PAGE_SIZE, and should be at least PAGE_SIZE, or larger.
flags
Supported flags:
I915_USERPTR_READ_ONLY:
Mark the object as readonly, this also means GPU access can only be readonly. This is only supported on HW which supports readonly access through the GTT. If the HW can’t support readonly access, an error is returned.
I915_USERPTR_PROBE:
Probe the provided user_ptr range and validate that the user_ptr is indeed pointing to normal memory and that the range is also valid. For example if some garbage address is given to the kernel, then this should complain.
Returns -EFAULT if the probe failed.
Note that this doesn’t populate the backing pages, and also doesn’t guarantee that the object will remain valid when the object is eventually used.
The kernel supports this feature if I915_PARAM_HAS_USERPTR_PROBE returns a non-zero value.
I915_USERPTR_UNSYNCHRONIZED:
NOT USED. Setting this flag will result in an error.
handle
Returned handle for the object.
Object handles are nonzero.
Description
Userptr objects have several restrictions on what ioctls can be used with the object handle.
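The alignment requirements on user_ptr and user_size can be checked in userspace before calling the ioctl. The helper below is a sketch with a hypothetical name; the page size is passed in by the caller (for example from sysconf(_SC_PAGESIZE)) rather than assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if user_ptr and user_size satisfy the documented
 * constraints: both page aligned, and the size at least one page.
 * page_size must be a power of two.
 */
static bool userptr_args_ok(uint64_t user_ptr, uint64_t user_size,
			    uint64_t page_size)
{
	uint64_t mask;

	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	mask = page_size - 1;

	if (user_size < page_size)
		return false;

	return (user_ptr & mask) == 0 && (user_size & mask) == 0;
}
```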
struct drm_i915_query_item
An individual query for the kernel to process.
Definition
struct drm_i915_query_item {
__u64 query_id;
#define DRM_I915_QUERY_TOPOLOGY_INFO 1
#define DRM_I915_QUERY_ENGINE_INFO 2
#define DRM_I915_QUERY_PERF_CONFIG 3
#define DRM_I915_QUERY_MEMORY_REGIONS 4
__s32 length;
__u32 flags;
#define DRM_I915_QUERY_PERF_CONFIG_LIST 1
#define DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID 2
#define DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_ID 3
__u64 data_ptr;
};
Members
query_id
The id for this query
length
When set to zero by userspace, this is filled with the size of the data to be written at the data_ptr pointer. The kernel sets this value to a negative value to signal an error on a particular query item.
flags
When query_id == DRM_I915_QUERY_TOPOLOGY_INFO, must be 0.
When query_id == DRM_I915_QUERY_PERF_CONFIG, must be one of the following:
DRM_I915_QUERY_PERF_CONFIG_LIST
DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID
DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_ID
data_ptr
Data will be written at the location pointed by data_ptr when the value of length matches the length of the data to be written by the kernel.
Description
The behaviour is determined by the query_id. Note that exactly what data_ptr points to also depends on the specific query_id.
struct drm_i915_query
Supply an array of struct drm_i915_query_item for the kernel to fill out.
Definition
struct drm_i915_query {
__u32 num_items;
__u32 flags;
__u64 items_ptr;
};
Members
num_items
The number of elements in the items_ptr array
flags
Unused for now. Must be cleared to zero.
items_ptr
Pointer to an array of struct drm_i915_query_item. The number of array elements is num_items.
Description
Note that this is generally a two step process for each struct drm_i915_query_item in the array:
1) Call the DRM_IOCTL_I915_QUERY, giving it our array of struct drm_i915_query_item, with drm_i915_query_item.length set to zero. The kernel will then fill in the size, in bytes, which tells userspace how much memory it needs to allocate for the blob (say for an array of properties).
2) Next we call DRM_IOCTL_I915_QUERY again, this time with the drm_i915_query_item.data_ptr equal to our newly allocated blob. Note that the drm_i915_query_item.length should still be the same as what the kernel previously set. At this point the kernel can fill in the blob.
Note that for some query items it can make sense for userspace to just pass in a buffer/blob equal to or larger than the required size. In this case only a single ioctl call is needed. For some smaller query items this can work quite well.
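The two-pass protocol, including the check for a negative length, can be exercised without a GPU by substituting a mock for the kernel side. Everything in this sketch is a stand-in: `query_item` mirrors the documented layout, `mock_query` only imitates the behaviour described above (a real query goes through DRM_IOCTL_I915_QUERY), and the blob contents are made up.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirror of struct drm_i915_query_item (hypothetical name). */
struct query_item {
	uint64_t query_id;
	int32_t length;
	uint32_t flags;
	uint64_t data_ptr;
};

static const char mock_blob[] = "engine-data";

/* Mock of the kernel side: report the size, or fill the blob. */
static void mock_query(struct query_item *item)
{
	int32_t need = (int32_t)sizeof(mock_blob);

	if (item->length == 0) {
		item->length = need;		/* pass 1: size only */
		return;
	}
	if (item->length < need) {
		item->length = -EINVAL;		/* per-item error */
		return;
	}
	memcpy((void *)(uintptr_t)item->data_ptr, mock_blob, (size_t)need);
	item->length = need;
}

/* Two-pass helper: returns a heap-allocated blob, or NULL on failure. */
static void *query_alloc(struct query_item *item)
{
	void *blob;

	item->length = 0;
	mock_query(item);
	if (item->length <= 0)
		return NULL;

	blob = calloc(1, (size_t)item->length);
	if (!blob)
		return NULL;

	item->data_ptr = (uintptr_t)blob;	/* the blob itself, not &blob */
	mock_query(item);
	if (item->length < 0) {
		free(blob);
		return NULL;
	}
	return blob;
}
```

Note that a successful ioctl return does not imply every item succeeded, so the per-item length should be inspected after each pass.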
Engine Discovery uAPI
Engine discovery uAPI is a way of enumerating physical engines present in a GPU associated with an open i915 DRM file descriptor. This supersedes the old way of using DRM_IOCTL_I915_GETPARAM and engine identifiers like I915_PARAM_HAS_BLT.
The need for this interface arose with Icelake and newer GPUs, which started to establish a pattern of having multiple engines of the same class, where not all instances were always completely functionally equivalent.
Entry point for this uAPI is DRM_IOCTL_I915_QUERY with the DRM_I915_QUERY_ENGINE_INFO as the queried item id.
Example for getting the list of engines:
struct drm_i915_query_engine_info *info;
struct drm_i915_query_item item = {
.query_id = DRM_I915_QUERY_ENGINE_INFO,
};
struct drm_i915_query query = {
.num_items = 1,
.items_ptr = (uintptr_t)&item,
};
int err, i;
// First query the size of the blob we need, this needs to be large
// enough to hold our array of engines. The kernel will fill out the
// item.length for us, which is the number of bytes we need.
//
// Alternatively a large buffer can be allocated straight away enabling
// querying in one pass, in which case item.length should contain the
// length of the provided buffer.
err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
if (err) ...
info = calloc(1, item.length);
// Now that we allocated the required number of bytes, we call the ioctl
// again, this time with the data_ptr pointing to our newly allocated
// blob, which the kernel can then populate with info on all engines.
item.data_ptr = (uintptr_t)info;
err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
if (err) ...
// We can now access each engine in the array
for (i = 0; i < info->num_engines; i++) {
struct drm_i915_engine_info einfo = info->engines[i];
__u16 class = einfo.engine.engine_class;
__u16 instance = einfo.engine.engine_instance;
....
}
free(info);
Each of the enumerated engines, apart from being defined by its class and instance (see struct i915_engine_class_instance), also can have flags and capabilities defined as documented in i915_drm.h.
For instance video engines which support HEVC encoding will have the I915_VIDEO_CLASS_CAPABILITY_HEVC capability bit set.
Engine discovery only fully comes into its own when combined with the new way of addressing engines when submitting batch buffers using contexts with engine maps configured.
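Filtering the enumerated engines by capability might look like the following sketch. It uses a trimmed-down local struct (`engine_desc`) instead of struct drm_i915_engine_info; the value 2 for the video engine class is an assumption based on i915_drm.h, and `count_hevc_engines` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENGINE_CLASS_VIDEO 2		/* assumed I915_ENGINE_CLASS_VIDEO */
#define VIDEO_CAP_HEVC (1ull << 0)	/* I915_VIDEO_CLASS_CAPABILITY_HEVC */

/* Trimmed-down stand-in for struct drm_i915_engine_info. */
struct engine_desc {
	uint16_t engine_class;
	uint16_t engine_instance;
	uint64_t capabilities;
};

/*
 * Count video engines with the HEVC capability bit set. Capability
 * bits are class specific, so the class is checked first.
 */
static size_t count_hevc_engines(const struct engine_desc *engines, size_t n)
{
	size_t i, count = 0;

	assert(engines != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (engines[i].engine_class == ENGINE_CLASS_VIDEO &&
		    (engines[i].capabilities & VIDEO_CAP_HEVC))
			count++;

	return count;
}
```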
struct drm_i915_engine_info
Definition
struct drm_i915_engine_info {
struct i915_engine_class_instance engine;
__u32 rsvd0;
__u64 flags;
#define I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE (1 << 0)
__u64 capabilities;
#define I915_VIDEO_CLASS_CAPABILITY_HEVC (1 << 0)
#define I915_VIDEO_AND_ENHANCE_CLASS_CAPABILITY_SFC (1 << 1)
__u16 logical_instance;
__u16 rsvd1[3];
__u64 rsvd2[3];
};
Members
engine
Engine class and instance.
rsvd0
Reserved field.
flags
Engine flags.
capabilities
Capabilities of this engine.
logical_instance
Logical instance of engine
rsvd1
Reserved fields.
rsvd2
Reserved fields.
Description
Describes one engine and its capabilities as known to the driver.
struct drm_i915_query_engine_info
Definition
struct drm_i915_query_engine_info {
__u32 num_engines;
__u32 rsvd[3];
struct drm_i915_engine_info engines[];
};
Members
num_engines
Number of struct drm_i915_engine_info structs following.
rsvd
MBZ
engines
Marker for drm_i915_engine_info structures.
Description
Engine info query enumerates all engines known to the driver by filling in an array of struct drm_i915_engine_info structures.
enum drm_i915_gem_memory_class
Supported memory classes
Constants
I915_MEMORY_CLASS_SYSTEM
System memory
I915_MEMORY_CLASS_DEVICE
Device local-memory
struct drm_i915_gem_memory_class_instance
Identify particular memory region
Definition
struct drm_i915_gem_memory_class_instance {
__u16 memory_class;
__u16 memory_instance;
};
Members
memory_class
See enum drm_i915_gem_memory_class
memory_instance
Which instance
struct drm_i915_memory_region_info
Describes one region as known to the driver.
Definition
struct drm_i915_memory_region_info {
struct drm_i915_gem_memory_class_instance region;
__u32 rsvd0;
__u64 probed_size;
__u64 unallocated_size;
__u64 rsvd1[8];
};
Members
region
The class:instance pair encoding
rsvd0
MBZ
probed_size
Memory probed by the driver (-1 = unknown)
unallocated_size
Estimate of memory remaining (-1 = unknown)
rsvd1
MBZ
Description
Note that we reserve some stuff here for potential future work. As an example we might want to expose the capabilities for a given region, which could include things like if the region is CPU mappable/accessible, what are the supported mapping types etc.
Note that to extend struct drm_i915_memory_region_info and struct drm_i915_query_memory_regions in the future the plan is to do the following:
struct drm_i915_memory_region_info {
struct drm_i915_gem_memory_class_instance region;
union {
__u32 rsvd0;
__u32 new_thing1;
};
...
union {
__u64 rsvd1[8];
struct {
__u64 new_thing2;
__u64 new_thing3;
...
};
};
};
With this, things should remain source compatible between versions for userspace, even as we add new fields.
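The claim that wrapping reserved fields in anonymous unions keeps the layout stable can be checked mechanically. In this sketch, `region_info_v1` mirrors the layout as documented today and `region_info_v2` is a hypothetical future layout following the plan above; the `new_thing*` fields are the placeholders from the plan, not real fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct region_id {
	uint16_t memory_class;
	uint16_t memory_instance;
};

/* Layout as documented today. */
struct region_info_v1 {
	struct region_id region;
	uint32_t rsvd0;
	uint64_t probed_size;
	uint64_t unallocated_size;
	uint64_t rsvd1[8];
};

/* Hypothetical future layout using anonymous unions (C11). */
struct region_info_v2 {
	struct region_id region;
	union {
		uint32_t rsvd0;
		uint32_t new_thing1;
	};
	uint64_t probed_size;
	uint64_t unallocated_size;
	union {
		uint64_t rsvd1[8];
		struct {
			uint64_t new_thing2;
			uint64_t new_thing3;
		};
	};
};

/* True if size and field offsets are unchanged between versions. */
static bool region_layouts_match(void)
{
	return sizeof(struct region_info_v1) == sizeof(struct region_info_v2) &&
	       offsetof(struct region_info_v1, rsvd0) ==
	       offsetof(struct region_info_v2, new_thing1) &&
	       offsetof(struct region_info_v1, probed_size) ==
	       offsetof(struct region_info_v2, probed_size) &&
	       offsetof(struct region_info_v1, rsvd1) ==
	       offsetof(struct region_info_v2, new_thing2);
}
```

Because the old member names remain visible through the anonymous unions, code written against the earlier layout keeps compiling unchanged.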
Note this is using both struct drm_i915_query_item and struct drm_i915_query.
For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS at drm_i915_query_item.query_id.
struct drm_i915_query_memory_regions
Definition
struct drm_i915_query_memory_regions {
__u32 num_regions;
__u32 rsvd[3];
struct drm_i915_memory_region_info regions[];
};
Members
num_regions
Number of supported regions
rsvd
MBZ
regions
Info about each supported region
Description
The region info query enumerates all regions known to the driver by filling in an array of struct drm_i915_memory_region_info structures.
Example for getting the list of supported regions:
struct drm_i915_query_memory_regions *info;
struct drm_i915_query_item item = {
.query_id = DRM_I915_QUERY_MEMORY_REGIONS,
};
struct drm_i915_query query = {
.num_items = 1,
.items_ptr = (uintptr_t)&item,
};
int err, i;
// First query the size of the blob we need, this needs to be large
// enough to hold our array of regions. The kernel will fill out the
// item.length for us, which is the number of bytes we need.
err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
if (err) ...
info = calloc(1, item.length);
// Now that we allocated the required number of bytes, we call the ioctl
// again, this time with the data_ptr pointing to our newly allocated
// blob, which the kernel can then populate with the all the region info.
item.data_ptr = (uintptr_t)info;
err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
if (err) ...
// We can now access each region in the array
for (i = 0; i < info->num_regions; i++) {
struct drm_i915_memory_region_info mr = info->regions[i];
__u16 class = mr.region.memory_class;
__u16 instance = mr.region.memory_instance;
....
}
free(info);
struct drm_i915_gem_create_ext
Existing gem_create behaviour, with added extension support using struct i915_user_extension.
Definition
struct drm_i915_gem_create_ext {
__u64 size;
__u32 handle;
__u32 flags;
#define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
#define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1
__u64 extensions;
};
Members
size
Requested size for the object.
The (page-aligned) allocated size for the object will be returned.
DG2 64K min page size implications:
On discrete platforms, starting from DG2, we have to contend with GTT page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE objects. Specifically the hardware only supports 64K or larger GTT page sizes for such memory. The kernel will already ensure that all I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page sizes underneath.
Note that the returned size here will always reflect any required rounding up done by the kernel, i.e. 4K will now become 64K on devices such as DG2.
Special DG2 GTT address alignment requirement:
The GTT alignment will also need to be at least 2M for such objects.
Note that due to how the hardware implements 64K GTT page support, we have some further complications:
1) The entire PDE (which covers a 2MB virtual address range), must contain only 64K PTEs, i.e. mixing 4K and 64K PTEs in the same PDE is forbidden by the hardware.
2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM objects.
To keep things simple for userland, we mandate that any GTT mappings must be aligned to and rounded up to 2MB. The kernel will internally pad them out to the next 2MB boundary. As this only wastes virtual address space and avoids userland having to copy any needlessly complicated PDE sharing scheme (coloring) and only affects DG2, this is deemed to be a good compromise.
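The size and alignment arithmetic described for DG2 device-local objects can be sketched as below. The helper names are hypothetical, and the functions only reproduce the rounding stated in the text (64K object granularity, 2MB GTT alignment and padding); they do not query the device.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_64K (64ull * 1024)
#define SZ_2M  (2ull * 1024 * 1024)

/* Round value up to a multiple of align, which must be a power of two. */
static uint64_t round_up_pow2(uint64_t value, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (value + align - 1) & ~(align - 1);
}

/* Object size the kernel would report for a device-local object. */
static uint64_t dg2_lmem_object_size(uint64_t requested)
{
	return round_up_pow2(requested, SZ_64K);
}

/* Virtual address span to reserve for the GTT mapping of that object. */
static uint64_t dg2_lmem_va_span(uint64_t requested)
{
	return round_up_pow2(requested, SZ_2M);
}
```

A userspace allocator would then place such mappings at 2MB-aligned GTT addresses and reserve the span returned by `dg2_lmem_va_span`.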
handle
Returned handle for the object.
Object handles are nonzero.
flags
MBZ
extensions
The chain of extensions to apply to this object.
This will be useful in the future when we need to support several different extensions, and we need to apply more than one when creating the object. See struct i915_user_extension.
If we don’t supply any extensions then we get the same old gem_create behaviour.
For I915_GEM_CREATE_EXT_MEMORY_REGIONS usage see struct drm_i915_gem_create_ext_memory_regions.
For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see struct drm_i915_gem_create_ext_protected_content.
Description
Note that in the future we want to have our buffer flags here, at least for the stuff that is immutable. Previously we would have two ioctls, one to create the object with gem_create, and another to apply various parameters, however this creates some ambiguity for the params which are considered immutable. Also in general we’re phasing out the various SET/GET ioctls.
struct drm_i915_gem_create_ext_memory_regions
The I915_GEM_CREATE_EXT_MEMORY_REGIONS extension.
Definition
struct drm_i915_gem_create_ext_memory_regions {
struct i915_user_extension base;
__u32 pad;
__u32 num_regions;
__u64 regions;
};
Members
base
Extension link. See struct i915_user_extension.
pad
MBZ
num_regions
Number of elements in the regions array.
regions
The regions/placements array.
An array of struct drm_i915_gem_memory_class_instance.
Description
Set the object with the desired set of placements/regions in priority order. Each entry must be unique and supported by the device.
This is provided as an array of struct drm_i915_gem_memory_class_instance, or an equivalent layout of class:instance pair encodings. See struct drm_i915_query_memory_regions and DRM_I915_QUERY_MEMORY_REGIONS for how to query the supported regions for a device.
As an example, on discrete devices, if we wish to set the placement as device local-memory we can do something like:
struct drm_i915_gem_memory_class_instance region_lmem = {
.memory_class = I915_MEMORY_CLASS_DEVICE,
.memory_instance = 0,
};
struct drm_i915_gem_create_ext_memory_regions regions = {
.base = { .name = I915_GEM_CREATE_EXT_MEMORY_REGIONS },
.regions = (uintptr_t)&region_lmem,
.num_regions = 1,
};
struct drm_i915_gem_create_ext create_ext = {
.size = 16 * PAGE_SIZE,
.extensions = (uintptr_t)&regions,
};
int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
if (err) ...
At which point we get the object handle in drm_i915_gem_create_ext.handle, along with the final object size in drm_i915_gem_create_ext.size, which should account for any rounding up, if required.
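Since each entry in the placement list must be unique, userspace may want to validate the list before submitting it. The following is a sketch with a local mirror struct (`mem_region`) and a hypothetical helper name; it checks uniqueness only, not whether the device supports each region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct drm_i915_gem_memory_class_instance. */
struct mem_region {
	uint16_t memory_class;
	uint16_t memory_instance;
};

/* Return true if no class:instance pair appears more than once. */
static bool regions_unique(const struct mem_region *regions, size_t n)
{
	size_t i, j;

	assert(regions != NULL || n == 0);
	for (i = 0; i < n; i++)
		for (j = i + 1; j < n; j++)
			if (regions[i].memory_class == regions[j].memory_class &&
			    regions[i].memory_instance == regions[j].memory_instance)
				return false;

	return true;
}
```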
struct drm_i915_gem_create_ext_protected_content
The I915_GEM_CREATE_EXT_PROTECTED_CONTENT extension.
Definition
struct drm_i915_gem_create_ext_protected_content {
struct i915_user_extension base;
__u32 flags;
};
Members
base
Extension link. See struct i915_user_extension.
flags
reserved for future usage, currently MBZ
Description
If this extension is provided, buffer contents are expected to be protected by PXP encryption and require decryption for scan out and processing. This is only possible on platforms that have PXP enabled; in all other scenarios using this extension will cause the ioctl to fail and return -ENODEV. The flags parameter is reserved for future expansion and must currently be set to zero.
The buffer contents are considered invalid after a PXP session teardown.
The encryption is guaranteed to be processed correctly only if the object is submitted with a context created using the I915_CONTEXT_PARAM_PROTECTED_CONTENT flag. This will also enable extra checks at submission time on the validity of the objects involved.
Below is an example on how to create a protected object:
struct drm_i915_gem_create_ext_protected_content protected_ext = {
.base = { .name = I915_GEM_CREATE_EXT_PROTECTED_CONTENT },
.flags = 0,
};
struct drm_i915_gem_create_ext create_ext = {
.size = PAGE_SIZE,
.extensions = (uintptr_t)&protected_ext,
};
int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
if (err) ...