Buffer Sharing and Synchronization (dma-buf)
The dma-buf subsystem provides the framework for sharing buffers for hardware (DMA) access across multiple device drivers and subsystems, and for synchronizing asynchronous hardware access.
This is used, for example, by drm "prime" multi-GPU support, but is of course not limited to GPU use cases.
The three main components of this are: (1) dma-buf, representing a sg_table and exposed to userspace as a file descriptor to allow passing between devices, (2) fence, which provides a mechanism to signal when one device has finished access, and (3) reservation, which manages the shared or exclusive fence(s) associated with the buffer.
Reservation Objects
The reservation object provides a mechanism to manage a container of dma_fence objects associated with a resource. A reservation object can have any number of fences attached to it. Each fence carries a usage parameter determining how the operation represented by the fence is using the resource. The RCU mechanism is used to protect read access to fences from locked write-side updates.
See struct dma_resv for more details.
Parameters
struct dma_resv *obj
the reservation object
Parameters
struct dma_resv *obj
the reservation object
-
int dma_resv_reserve_fences(struct dma_resv *obj, unsigned int num_fences)¶
Reserve space to add fences to a dma_resv object.
Parameters
struct dma_resv *obj
reservation object
unsigned int num_fences
number of fences we want to add
Description
Should be called before dma_resv_add_fence(). Must be called with obj locked through dma_resv_lock().
Note that the preallocated slots need to be re-reserved if obj is unlocked at any time before calling dma_resv_add_fence(). This is validated when CONFIG_DEBUG_MUTEXES is enabled.
RETURNS
Zero for success, or -errno
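The reserve-then-add rule can be illustrated with a small userspace model. This is only a sketch: struct model_resv and the model_* helpers are hypothetical stand-ins, not kernel types, and the model assumes that reservations never survive an unlock (the rule that CONFIG_DEBUG_MUTEXES validates).

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: hypothetical types, not the kernel's dma_resv. */
struct model_resv {
	bool locked;
	unsigned int num_fences; /* slots in use */
	unsigned int max_fences; /* slots in use plus reserved slots */
};

/* Stands in for dma_resv_lock(). */
static void model_lock(struct model_resv *obj)
{
	obj->locked = true;
}

/* Stands in for dma_resv_unlock(); unused reservations are dropped. */
static void model_unlock(struct model_resv *obj)
{
	obj->locked = false;
	obj->max_fences = obj->num_fences;
}

/* Stands in for dma_resv_reserve_fences(): the step that may fail. */
static int model_reserve_fences(struct model_resv *obj, unsigned int n)
{
	if (!obj->locked)
		return -1;
	if (obj->num_fences + n > obj->max_fences)
		obj->max_fences = obj->num_fences + n;
	return 0;
}

/*
 * Stands in for dma_resv_add_fence(). The real function returns void;
 * the model returns false when the locking or reservation rule is broken.
 */
static bool model_add_fence(struct model_resv *obj)
{
	if (!obj->locked || obj->num_fences >= obj->max_fences)
		return false;
	obj->num_fences++;
	return true;
}
```

In the model, a slot reserved before an unlock is gone after the next lock, so adding a fence without reserving again is reported as a rule violation.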
Parameters
struct dma_resv *obj
the dma_resv object to reset
Description
Reset the number of pre-reserved fence slots to test that drivers do correct slot allocation using dma_resv_reserve_fences(). See also dma_resv_list.max_fences.
-
void dma_resv_add_fence(struct dma_resv *obj, struct dma_fence *fence, enum dma_resv_usage usage)¶
Add a fence to the dma_resv obj
Parameters
struct dma_resv *obj
the reservation object
struct dma_fence *fence
the fence to add
enum dma_resv_usage usage
how the fence is used, see enum dma_resv_usage
Description
Add a fence to a slot. obj must be locked with dma_resv_lock(), and dma_resv_reserve_fences() must have been called.
See also dma_resv.fence for a discussion of the semantics.
-
void dma_resv_replace_fences(struct dma_resv *obj, uint64_t context, struct dma_fence *replacement, enum dma_resv_usage usage)¶
replace fences in the dma_resv obj
Parameters
struct dma_resv *obj
the reservation object
uint64_t context
the context of the fences to replace
struct dma_fence *replacement
the new fence to use instead
enum dma_resv_usage usage
how the new fence is used, see enum dma_resv_usage
Description
Replace fences with a specified context with a new fence. Only valid if the operation represented by the original fence no longer has access to the resources represented by the dma_resv object when the new fence completes.
An example for using this is replacing a preemption fence with a page table update fence which makes the resource inaccessible.
-
struct dma_fence *dma_resv_iter_first_unlocked(struct dma_resv_iter *cursor)¶
first fence in an unlocked dma_resv obj.
Parameters
struct dma_resv_iter *cursor
the cursor with the current position
Description
Subsequent fences are iterated with dma_resv_iter_next_unlocked().
Beware that the iterator can be restarted. Code which accumulates statistics or similar needs to check for this with dma_resv_iter_is_restarted(). For this reason prefer the locked dma_resv_iter_first() whenever possible.
Returns the first fence from an unlocked dma_resv obj.
-
struct dma_fence *dma_resv_iter_next_unlocked(struct dma_resv_iter *cursor)¶
next fence in an unlocked dma_resv obj.
Parameters
struct dma_resv_iter *cursor
the cursor with the current position
Description
Beware that the iterator can be restarted. Code which accumulates statistics or similar needs to check for this with dma_resv_iter_is_restarted(). For this reason prefer the locked dma_resv_iter_next() whenever possible.
Returns the next fence from an unlocked dma_resv obj.
-
struct dma_fence *dma_resv_iter_first(struct dma_resv_iter *cursor)¶
first fence from a locked dma_resv object
Parameters
struct dma_resv_iter *cursor
cursor to record the current position
Description
Subsequent fences are iterated with dma_resv_iter_next().
Return the first fence in the dma_resv object while holding the dma_resv.lock.
-
struct dma_fence *dma_resv_iter_next(struct dma_resv_iter *cursor)¶
next fence from a locked dma_resv object
Parameters
struct dma_resv_iter *cursor
cursor to record the current position
Description
Return the next fence from the dma_resv object while holding the dma_resv.lock.
-
int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src)¶
Copy all fences from src to dst.
Parameters
struct dma_resv *dst
the destination reservation object
struct dma_resv *src
the source reservation object
Description
Copy all fences from src to dst. dst-lock must be held.
-
int dma_resv_get_fences(struct dma_resv *obj, enum dma_resv_usage usage, unsigned int *num_fences, struct dma_fence ***fences)¶
Get an object's fences without the update side lock held
Parameters
struct dma_resv *obj
the reservation object
enum dma_resv_usage usage
controls which fences to include, see enum dma_resv_usage.
unsigned int *num_fences
the number of fences returned
struct dma_fence ***fences
the array of fence ptrs returned (array is krealloc'd to the required size, and must be freed by caller)
Description
Retrieve all fences from the reservation object. Returns either zero or -ENOMEM.
-
int dma_resv_get_singleton(struct dma_resv *obj, enum dma_resv_usage usage, struct dma_fence **fence)¶
Get a single fence for all the fences
Parameters
struct dma_resv *obj
the reservation object
enum dma_resv_usage usage
controls which fences to include, see enum dma_resv_usage.
struct dma_fence **fence
the resulting fence
Description
Get a single fence representing all the fences inside the resv object. Returns either 0 for success or -ENOMEM.
Warning: This can't be used like this when adding the fence back to the resv object since that can lead to stack corruption when finalizing the dma_fence_array.
Returns 0 on success and negative error values on failure.
-
long dma_resv_wait_timeout(struct dma_resv *obj, enum dma_resv_usage usage, bool intr, unsigned long timeout)¶
Wait on a reservation object's fences
Parameters
struct dma_resv *obj
the reservation object
enum dma_resv_usage usage
controls which fences to include, see enum dma_resv_usage.
bool intr
if true, do interruptible wait
unsigned long timeout
timeout value in jiffies or zero to return immediately
Description
Callers are not required to hold specific locks, but may hold dma_resv_lock() already.
RETURNS
Returns -ERESTARTSYS if interrupted, 0 if the wait timed out, or greater than zero on success.
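The three documented outcomes of the wait can be told apart as in the following userspace sketch. The enum and function names are hypothetical. ERESTARTSYS is kernel-internal and not available in userspace errno.h, so the model defines its own constant with the value 512 used in include/linux/errno.h.

```c
#include <assert.h>

/* Assumption: mirrors the kernel-internal ERESTARTSYS value (512). */
#define MODEL_ERESTARTSYS 512

/* Hypothetical classification of a dma_resv_wait_timeout() return value. */
enum model_wait_result {
	MODEL_WAIT_DONE,        /* all selected fences signaled in time */
	MODEL_WAIT_TIMED_OUT,   /* timeout elapsed first */
	MODEL_WAIT_INTERRUPTED, /* interruptible wait was interrupted */
	MODEL_WAIT_ERROR,       /* any other negative value */
};

static enum model_wait_result model_classify_wait(long ret)
{
	if (ret > 0)
		return MODEL_WAIT_DONE;
	if (ret == 0)
		return MODEL_WAIT_TIMED_OUT;
	if (ret == -MODEL_ERESTARTSYS)
		return MODEL_WAIT_INTERRUPTED;
	return MODEL_WAIT_ERROR;
}
```

Note that zero means the wait timed out, not that it succeeded, so callers that test the result with a plain "if (ret)" would treat a timeout as the only non-failure case.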
-
void dma_resv_set_deadline(struct dma_resv *obj, enum dma_resv_usage usage, ktime_t deadline)¶
Set a deadline on a reservation object's fences
Parameters
struct dma_resv *obj
the reservation object
enum dma_resv_usage usage
controls which fences to include, see enum dma_resv_usage.
ktime_t deadline
the requested deadline (MONOTONIC)
Description
May be called without holding the dma_resv lock. Sets deadline on all fences filtered by usage.
-
bool dma_resv_test_signaled(struct dma_resv *obj, enum dma_resv_usage usage)¶
Test if a reservation object's fences have been signaled.
Parameters
struct dma_resv *obj
the reservation object
enum dma_resv_usage usage
controls which fences to include, see enum dma_resv_usage.
Description
Callers are not required to hold specific locks, but may hold dma_resv_lock() already.
RETURNS
True if all fences signaled, else false.
-
void dma_resv_describe(struct dma_resv *obj, struct seq_file *seq)¶
Dump description of the resv object into seq_file
Parameters
struct dma_resv *obj
the reservation object
struct seq_file *seq
the seq_file to dump the description into
Description
Dump a textual description of the fences inside a dma_resv object into the seq_file.
-
enum dma_resv_usage¶
how the fences from a dma_resv obj are used
Constants
DMA_RESV_USAGE_KERNEL
For in kernel memory management only.
This should only be used for things like copying or clearing memory with a DMA hardware engine for the purpose of kernel memory management.
Drivers always must wait for those fences before accessing the resource protected by the dma_resv object. The only exception for that is when the resource is known to be locked down in place by pinning it previously.
DMA_RESV_USAGE_WRITE
Implicit write synchronization.
This should only be used for userspace command submissions which add an implicit write dependency.
DMA_RESV_USAGE_READ
Implicit read synchronization.
This should only be used for userspace command submissions which add an implicit read dependency.
DMA_RESV_USAGE_BOOKKEEP
No implicit sync.
This should be used by submissions which don't want to participate in any implicit synchronization.
The most common cases are preemption fences, page table updates, TLB flushes as well as explicitly synced user submissions.
Explicitly synced user submissions can be promoted to DMA_RESV_USAGE_READ or DMA_RESV_USAGE_WRITE as needed using dma_buf_import_sync_file() when implicit synchronization becomes necessary after the fence was initially added.
Description
This enum describes the different use cases for a dma_resv object and controls which fences are returned when queried.
An important fact is that there is the order KERNEL<WRITE<READ<BOOKKEEP and when the dma_resv object is asked for fences for one use case the fences for the lower use case are returned as well.
For example, when asking for WRITE fences the KERNEL fences are returned as well. Similarly, when asking for READ fences both WRITE and KERNEL fences are returned as well.
Already used fences can be promoted in the sense that a fence with DMA_RESV_USAGE_BOOKKEEP could become DMA_RESV_USAGE_READ by adding it again with this usage. But fences can never be degraded in the sense that a fence with DMA_RESV_USAGE_WRITE could become DMA_RESV_USAGE_READ.
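The ordering and promotion rules described above can be sketched as a userspace model. The enum model_usage and both helpers are hypothetical and only mirror the documented order KERNEL<WRITE<READ<BOOKKEEP. They are not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of enum dma_resv_usage, in the documented order. */
enum model_usage {
	MODEL_KERNEL,
	MODEL_WRITE,
	MODEL_READ,
	MODEL_BOOKKEEP,
};

/* A query for one usage also returns the fences of every lower usage. */
static bool model_usage_included(enum model_usage fence_usage,
				 enum model_usage query)
{
	return fence_usage <= query;
}

/*
 * Re-adding a fence can only promote it towards the lower (stricter)
 * end of the order; a request that would degrade it is ignored.
 */
static enum model_usage model_usage_readd(enum model_usage current,
					  enum model_usage requested)
{
	return requested < current ? requested : current;
}
```

With this model a WRITE query includes KERNEL fences but not READ fences, a BOOKKEEP fence re-added as READ becomes READ, and a WRITE fence re-added as READ stays WRITE.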
-
enum dma_resv_usage dma_resv_usage_rw(bool write)¶
helper for implicit sync
Parameters
bool write
true if we create a new implicit sync write
Description
This returns the implicit synchronization usage for write or read accesses, see enum dma_resv_usage and dma_buf.resv.
-
struct dma_resv¶
a reservation object manages fences for a buffer
Definition:
struct dma_resv {
struct ww_mutex lock;
struct dma_resv_list __rcu *fences;
};
Members
lock
Update side lock. Don't use directly, instead use the wrapper functions like dma_resv_lock() and dma_resv_unlock().
Drivers which use the reservation object to manage memory dynamically also use this lock to protect buffer object state like placement, allocation policies or throughout command submission.
fences
Array of fences which were added to the dma_resv object
A new fence is added by calling dma_resv_add_fence(). Since this often needs to be done past the point of no return in command submission it cannot fail, and therefore sufficient slots need to be reserved by calling dma_resv_reserve_fences().
Description
This is a container for dma_fence objects which needs to handle multiple use cases.
One use is to synchronize cross-driver access to a struct dma_buf, either for dynamic buffer management or just to handle implicit synchronization between different users of the buffer in userspace. See dma_buf.resv for a more in-depth discussion.
The other major use is to manage access and locking within a driver in a buffer based memory manager. struct ttm_buffer_object is the canonical example here, since this is where reservation objects originated from. But use in drivers is spreading and some drivers also manage struct drm_gem_object with the same scheme.
-
struct dma_resv_iter¶
current position into the dma_resv fences
Definition:
struct dma_resv_iter {
struct dma_resv *obj;
enum dma_resv_usage usage;
struct dma_fence *fence;
enum dma_resv_usage fence_usage;
unsigned int index;
struct dma_resv_list *fences;
unsigned int num_fences;
bool is_restarted;
};
Members
obj
The dma_resv object we iterate over
usage
Return fences with this usage or lower.
fence
the currently handled fence
fence_usage
the usage of the current fence
index
index into the shared fences
fences
the shared fences; private, MUST not dereference
num_fences
number of fences
is_restarted
true if this is the first returned fence
Description
Don't touch this directly in the driver, use the accessor function instead.
IMPORTANT
When using the lockless iterators like dma_resv_iter_next_unlocked() or dma_resv_for_each_fence_unlocked() beware that the iterator can be restarted. Code which accumulates statistics or similar needs to check for this with dma_resv_iter_is_restarted().
-
void dma_resv_iter_begin(struct dma_resv_iter *cursor, struct dma_resv *obj, enum dma_resv_usage usage)¶
initialize a dma_resv_iter object
Parameters
struct dma_resv_iter *cursor
The dma_resv_iter object to initialize
struct dma_resv *obj
The dma_resv object which we want to iterate over
enum dma_resv_usage usage
controls which fences to include, see enum dma_resv_usage.
-
void dma_resv_iter_end(struct dma_resv_iter *cursor)¶
cleanup a dma_resv_iter object
Parameters
struct dma_resv_iter *cursor
the dma_resv_iter object which should be cleaned up
Description
Make sure that the reference to the fence in the cursor is properly dropped.
-
enum dma_resv_usage dma_resv_iter_usage(struct dma_resv_iter *cursor)¶
Return the usage of the current fence
Parameters
struct dma_resv_iter *cursor
the cursor of the current position
Description
Returns the usage of the currently processed fence.
-
bool dma_resv_iter_is_restarted(struct dma_resv_iter *cursor)¶
test if this is the first fence after a restart
Parameters
struct dma_resv_iter *cursor
the cursor with the current position
Description
Return true if this is the first fence in an iteration after a restart.
-
dma_resv_for_each_fence_unlocked¶
dma_resv_for_each_fence_unlocked (cursor, fence)
unlocked fence iterator
Parameters
cursor
a struct dma_resv_iter pointer
fence
the current fence
Description
Iterate over the fences in a struct dma_resv object without holding the dma_resv.lock and using RCU instead. The cursor needs to be initialized with dma_resv_iter_begin() and cleaned up with dma_resv_iter_end(). Inside the iterator a reference to the dma_fence is held and the RCU lock dropped.
Beware that the iterator can be restarted when the struct dma_resv for cursor is modified. Code which accumulates statistics or similar needs to check for this with dma_resv_iter_is_restarted(). For this reason prefer the locked iterator dma_resv_for_each_fence() whenever possible.
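What "checking for a restart" means for accumulating code can be shown with a userspace model. struct model_step and model_count_fences() are hypothetical: each step stands for one fence handed out by the unlocked iterator, together with the value dma_resv_iter_is_restarted() would report for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One fence returned by a hypothetical unlocked walk. */
struct model_step {
	bool is_restarted; /* first fence of a fresh or restarted walk */
};

/* Count fences, discarding everything gathered before a restart. */
static unsigned int model_count_fences(const struct model_step *steps,
				       size_t n)
{
	unsigned int count = 0;

	for (size_t i = 0; i < n; i++) {
		if (steps[i].is_restarted)
			count = 0; /* earlier results are stale */
		count++;
	}
	return count;
}
```

Without the reset, a walk that restarts after three fences and then sees two more would report five fences although only two are part of the final, consistent pass.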
-
dma_resv_for_each_fence¶
dma_resv_for_each_fence (cursor, obj, usage, fence)
fence iterator
Parameters
cursor
a struct dma_resv_iter pointer
obj
a dma_resv object pointer
usage
controls which fences to return
fence
the current fence
Description
Iterate over the fences in a struct dma_resv object while holding the dma_resv.lock. usage controls which fences are returned. The cursor initialisation is part of the iterator and the fence stays valid as long as the lock is held and so no extra reference to the fence is taken.
Parameters
struct dma_resv *obj
the reservation object
struct ww_acquire_ctx *ctx
the locking context
Description
Locks the reservation object for exclusive access and modification. Note, that the lock is only against other writers, readers will run concurrently with a writer under RCU. The seqlock is used to notify readers if they overlap with a writer.
As the reservation object may be locked by multiple parties in an undefined order, a ww_acquire_ctx is passed to unwind if a cycle is detected. See ww_mutex_lock() and ww_acquire_init(). A reservation object may be locked by itself by passing NULL as ctx.
When a die situation is indicated by returning -EDEADLK all locks held by ctx must be unlocked and then dma_resv_lock_slow() called on obj.
Unlocked by calling dma_resv_unlock().
See also dma_resv_lock_interruptible() for the interruptible variant.
-
int dma_resv_lock_interruptible(struct dma_resv *obj, struct ww_acquire_ctx *ctx)¶
lock the reservation object
Parameters
struct dma_resv *obj
the reservation object
struct ww_acquire_ctx *ctx
the locking context
Description
Locks the reservation object interruptible for exclusive access and modification. Note, that the lock is only against other writers, readers will run concurrently with a writer under RCU. The seqlock is used to notify readers if they overlap with a writer.
As the reservation object may be locked by multiple parties in an undefined order, a ww_acquire_ctx is passed to unwind if a cycle is detected. See ww_mutex_lock() and ww_acquire_init(). A reservation object may be locked by itself by passing NULL as ctx.
When a die situation is indicated by returning -EDEADLK all locks held by ctx must be unlocked and then dma_resv_lock_slow_interruptible() called on obj.
Unlocked by calling dma_resv_unlock().
-
void dma_resv_lock_slow(struct dma_resv *obj, struct ww_acquire_ctx *ctx)¶
slowpath lock the reservation object
Parameters
struct dma_resv *obj
the reservation object
struct ww_acquire_ctx *ctx
the locking context
Description
Acquires the reservation object after a die case. This function will sleep until the lock becomes available. See dma_resv_lock() as well.
See also dma_resv_lock_slow_interruptible() for the interruptible variant.
-
int dma_resv_lock_slow_interruptible(struct dma_resv *obj, struct ww_acquire_ctx *ctx)¶
slowpath lock the reservation object, interruptible
Parameters
struct dma_resv *obj
the reservation object
struct ww_acquire_ctx *ctx
the locking context
Description
Acquires the reservation object interruptible after a die case. This function will sleep until the lock becomes available. See dma_resv_lock_interruptible() as well.
Parameters
struct dma_resv *obj
the reservation object
Description
Tries to lock the reservation object for exclusive access and modification. Note, that the lock is only against other writers, readers will run concurrently with a writer under RCU. The seqlock is used to notify readers if they overlap with a writer.
Also note that since no context is provided, no deadlock protection is possible, which is also not needed for a trylock.
Returns true if the lock was acquired, false otherwise.
Parameters
struct dma_resv *obj
the reservation object
Description
Returns true if the mutex is locked, false if unlocked.
-
struct ww_acquire_ctx *dma_resv_locking_ctx(struct dma_resv *obj)¶
returns the context used to lock the object
Parameters
struct dma_resv *obj
the reservation object
Description
Returns the context used to lock a reservation object or NULL if no context was used or the object is not locked at all.
WARNING: This interface is pretty horrible, but TTM needs it because it doesn't pass the struct ww_acquire_ctx around in some very long callchains. Everyone else just uses it to check whether they're holding a reservation or not.
Parameters
struct dma_resv *obj
the reservation object
Description
Unlocks the reservation object following exclusive access.
DMA Fences
DMA fences, represented by struct dma_fence, are the kernel internal synchronization primitive for DMA operations like GPU rendering, video encoding/decoding, or displaying buffers on a screen.
A fence is initialized using dma_fence_init() and completed using dma_fence_signal(). Fences are associated with a context, allocated through dma_fence_context_alloc(), and all fences on the same context are fully ordered.
Since the purpose of fences is to facilitate cross-device and cross-application synchronization, there are multiple ways to use one:
Individual fences can be exposed as a sync_file, accessed as a file descriptor from userspace, created by calling sync_file_create(). This is called explicit fencing, since userspace passes around explicit synchronization points.
Some subsystems also have their own explicit fencing primitives, like drm_syncobj. Compared to sync_file, a drm_syncobj allows the underlying fence to be updated.
Then there's also implicit fencing, where the synchronization points are implicitly passed around as part of shared dma_buf instances. Such implicit fences are stored in struct dma_resv through the dma_buf.resv pointer.
DMA Fence Cross-Driver Contract
Since dma_fence provides a cross driver contract, all drivers must follow the same rules:
Fences must complete in a reasonable time. Fences which represent kernels and shaders submitted by userspace, which could run forever, must be backed up by timeout and gpu hang recovery code. Minimally that code must prevent further command submission and force complete all in-flight fences, e.g. when the driver or hardware do not support gpu reset, or if the gpu reset failed for some reason. Ideally the driver supports gpu recovery which only affects the offending userspace context, and no other userspace submissions.
Drivers may have different ideas of what completion within a reasonable time means. Some hang recovery code uses a fixed timeout, others a mix between observing forward progress and increasingly strict timeouts. Drivers should not try to second guess timeout handling of fences from other drivers.
To ensure there are no deadlocks of dma_fence_wait() against other locks, drivers should annotate all code required to reach dma_fence_signal(), which completes the fences, with dma_fence_begin_signalling() and dma_fence_end_signalling().
Drivers are allowed to call dma_fence_wait() while holding dma_resv_lock(). This means any code required for fence completion cannot acquire a dma_resv lock. Note that this also pulls in the entire established locking hierarchy around dma_resv_lock() and dma_resv_unlock().
Drivers are allowed to call dma_fence_wait() from their shrinker callbacks. This means any code required for fence completion cannot allocate memory with GFP_KERNEL.
Drivers are allowed to call dma_fence_wait() from their mmu_notifier respectively mmu_interval_notifier callbacks. This means any code required for fence completion cannot allocate memory with GFP_NOFS or GFP_NOIO. Only GFP_ATOMIC is permissible, which might fail.
Note that only GPU drivers have a reasonable excuse for both requiring mmu_interval_notifier and shrinker callbacks at the same time as having to track asynchronous compute work using dma_fence. No driver outside of drivers/gpu should ever call dma_fence_wait() in such contexts.
DMA Fence Signalling Annotations
Proving correctness of all the kernel code around dma_fence through code review and testing is tricky for a few reasons:
It is a cross-driver contract, and therefore all drivers must follow the same rules for lock nesting order, calling contexts for various functions and anything else significant for in-kernel interfaces. But it is also impossible to test all drivers in a single machine, hence brute-force N vs. N testing of all combinations is impossible. Even just limiting to the possible combinations is infeasible.
There is an enormous amount of driver code involved. For render drivers there's the tail of command submission, after fences are published, scheduler code, interrupt and workers to process job completion, and timeout, gpu reset and gpu hang recovery code. Plus for integration with core mm we have mmu_notifier, respectively mmu_interval_notifier, and shrinker. For modesetting drivers there's the commit tail functions between when fences for an atomic modeset are published, and when the corresponding vblank completes, including any interrupt processing and related workers. Auditing all that code, across all drivers, is not feasible.
Due to how many other subsystems are involved and the locking hierarchies this pulls in there is extremely thin wiggle-room for driver-specific differences. dma_fence interacts with almost all of the core memory handling through page fault handlers via dma_resv, dma_resv_lock() and dma_resv_unlock(). On the other side it also interacts through all allocation sites through mmu_notifier and shrinker.
Furthermore lockdep does not handle cross-release dependencies, which means any deadlocks between dma_fence_wait() and dma_fence_signal() can't be caught at runtime with some quick testing. The simplest example is one thread waiting on a dma_fence while holding a lock:
lock(A);
dma_fence_wait(B);
unlock(A);
while the other thread is stuck trying to acquire the same lock, which prevents it from signalling the fence the previous thread is stuck waiting on:
lock(A);
unlock(A);
dma_fence_signal(B);
By manually annotating all code relevant to signalling a dma_fence we can teach lockdep about these dependencies, which also helps with the validation headache since now lockdep can check all the rules for us:
cookie = dma_fence_begin_signalling();
lock(A);
unlock(A);
dma_fence_signal(B);
dma_fence_end_signalling(cookie);
For using dma_fence_begin_signalling() and dma_fence_end_signalling() to annotate critical sections the following rules need to be observed:
All code necessary to complete a dma_fence must be annotated, from the point where a fence is accessible to other threads, to the point where dma_fence_signal() is called. Un-annotated code can contain deadlock issues, and due to the very strict rules and many corner cases it is infeasible to catch these just with review or normal stress testing.
struct dma_resv deserves a special note, since the readers are only protected by rcu. This means the signalling critical section starts as soon as the new fences are installed, even before dma_resv_unlock() is called.
The only exception are fast paths and opportunistic signalling code, which calls dma_fence_signal() purely as an optimization, but is not required to guarantee completion of a dma_fence. The usual example is a wait IOCTL which calls dma_fence_signal(), while the mandatory completion path goes through a hardware interrupt and possible job completion worker.
To aid composability of code, the annotations can be freely nested, as long as the overall locking hierarchy is consistent. The annotations also work both in interrupt and process context. Due to implementation details this requires that callers pass an opaque cookie from dma_fence_begin_signalling() to dma_fence_end_signalling().
Validation against the cross driver contract is implemented by priming lockdep with the relevant hierarchy at boot-up. This means even just testing with a single device is enough to validate a driver, at least as far as deadlocks with dma_fence_wait() against dma_fence_signal() are concerned.
DMA Fence Deadline Hints
In an ideal world, it would be possible to pipeline a workload sufficiently that a utilization based device frequency governor could arrive at a minimum frequency that meets the requirements of the use-case, in order to minimize power consumption. But in the real world there are many workloads which defy this ideal. For example, but not limited to:
Workloads that ping-pong between device and CPU, with alternating periods of CPU waiting for device, and device waiting on CPU. This can result in devfreq and cpufreq seeing idle time in their respective domains and, as a result, reducing frequency.
Workloads that interact with a periodic time based deadline, such as double buffered GPU rendering vs vblank sync'd page flipping. In this scenario, missing a vblank deadline results in an increase in idle time on the GPU (since it has to wait an additional vblank period), sending a signal to the GPU's devfreq to reduce frequency, when in fact the opposite is what is needed.
To this end, deadline hint(s) can be set on a dma_fence via dma_fence_set_deadline.
The deadline hint provides a way for the waiting driver, or userspace, to convey an appropriate sense of urgency to the signaling driver.
A deadline hint is given in absolute ktime (CLOCK_MONOTONIC for userspace facing APIs). The time could either be some point in the future (such as the vblank based deadline for page-flipping, or the start of a compositor's composition cycle), or the current time to indicate an immediate deadline hint (i.e. forward progress cannot be made until this fence is signaled).
Multiple deadlines may be set on a given fence, even in parallel. See the documentation for dma_fence_ops.set_deadline.
The deadline hint is just that, a hint. The driver that created the fence may react by increasing frequency, making different scheduling choices, etc. Or doing nothing at all.
DMA Fences Functions Reference
Parameters
void
no arguments
Description
Return a stub fence which is already signaled. The fence's timestamp corresponds to the first time after boot this function is called.
Parameters
void
no arguments
Description
Return a newly allocated and signaled stub fence.
-
u64 dma_fence_context_alloc(unsigned num)¶
allocate an array of fence contexts
Parameters
unsigned num
number of contexts to allocate
Description
This function will return the first index of the number of fence contexts allocated. The fence context is used for setting dma_fence.context to a unique number by passing the context to dma_fence_init().
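"Returning the first index" means the caller owns a contiguous range of num context numbers starting at the returned value. A userspace sketch of that behaviour, assuming a single global atomic counter, could look as follows. The names and the starting value of 1 are assumptions made for illustration and say nothing authoritative about the kernel's implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumption: one global counter; the start value is illustrative. */
static _Atomic uint64_t model_context_counter = 1;

/*
 * Hand out num consecutive context numbers and return the first one.
 * atomic_fetch_add() returns the value before the addition, so
 * concurrent callers receive non-overlapping ranges.
 */
static uint64_t model_context_alloc(unsigned int num)
{
	return atomic_fetch_add(&model_context_counter, num);
}
```

A caller that allocates three contexts at base value c would use c, c + 1 and c + 2 for its fences, for example one context per hardware ring.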
-
bool dma_fence_begin_signalling(void)¶
begin a critical DMA fence signalling section
Parameters
void
no arguments
Description
Drivers should use this to annotate the beginning of any code section required to eventually complete dma_fence by calling dma_fence_signal().
The end of these critical sections is annotated with dma_fence_end_signalling().
Return
Opaque cookie needed by the implementation, which needs to be passed to dma_fence_end_signalling().
-
void dma_fence_end_signalling(bool cookie)¶
end a critical DMA fence signalling section
Parameters
bool cookie
opaque cookie from dma_fence_begin_signalling()
Description
Closes a critical section annotation opened by dma_fence_begin_signalling().
-
int dma_fence_signal_timestamp_locked(struct dma_fence *fence, ktime_t timestamp)¶
signal completion of a fence
Parameters
struct dma_fence *fence
the fence to signal
ktime_t timestamp
fence signal timestamp in kernel's CLOCK_MONOTONIC time domain
Description
Signal completion for software callbacks on a fence, this will unblock dma_fence_wait() calls and run all the callbacks added with dma_fence_add_callback(). Can be called multiple times, but since a fence can only go from the unsignaled to the signaled state and not back, it will only be effective the first time. Set the timestamp provided as the fence signal timestamp.
Unlike dma_fence_signal_timestamp(), this function must be called with dma_fence.lock held.
Returns 0 on success and a negative error value when fence has been signalled already.
-
int dma_fence_signal_timestamp(struct dma_fence *fence, ktime_t timestamp)¶
signal completion of a fence
Parameters
struct dma_fence *fence
the fence to signal
ktime_t timestamp
fence signal timestamp in kernel's CLOCK_MONOTONIC time domain
Description
Signal completion for software callbacks on a fence, this will unblock dma_fence_wait() calls and run all the callbacks added with dma_fence_add_callback(). Can be called multiple times, but since a fence can only go from the unsignaled to the signaled state and not back, it will only be effective the first time. Set the timestamp provided as the fence signal timestamp.
Returns 0 on success and a negative error value when fence has been signalled already.
Parameters
struct dma_fence *fence
the fence to signal
Description
Signal completion for software callbacks on a fence, this will unblock dma_fence_wait() calls and run all the callbacks added with dma_fence_add_callback(). Can be called multiple times, but since a fence can only go from the unsignaled to the signaled state and not back, it will only be effective the first time.
Unlike dma_fence_signal(), this function must be called with dma_fence.lock held.
Returns 0 on success and a negative error value when fence has been signalled already.
-
int dma_fence_signal(struct dma_fence *fence)¶
signal completion of a fence
Parameters
struct dma_fence *fence
the fence to signal
Description
Signal completion for software callbacks on a fence. This unblocks dma_fence_wait() calls and runs all the callbacks added with dma_fence_add_callback(). It can be called multiple times, but since a fence can only go from the unsignaled to the signaled state and not back, only the first call has any effect.
Returns 0 on success and a negative error value when the fence has already been signalled.
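The one-shot behaviour of signalling can be illustrated with a small userspace model. This is only a sketch, not kernel code: model_fence, model_add_callback(), model_fence_signal() and model_count_cb() are hypothetical names, locking and reference counting are left out, and the choice of -EINVAL for a repeated signal is an assumption (the text above only promises "a negative error value").

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_CBS 4

struct model_fence;
typedef void (*model_cb_fn)(struct model_fence *fence, void *data);

/* Hypothetical stand-in for a fence: a flag plus a fixed callback table. */
struct model_fence {
	bool signaled;
	size_t num_cbs;
	model_cb_fn cbs[MODEL_MAX_CBS];
	void *cb_data[MODEL_MAX_CBS];
};

/* Register a callback; refused once the fence has signaled. */
static int model_add_callback(struct model_fence *f, model_cb_fn fn, void *data)
{
	assert(f && fn);
	if (f->signaled)
		return -ENOENT;
	if (f->num_cbs == MODEL_MAX_CBS)
		return -EINVAL;
	f->cbs[f->num_cbs] = fn;
	f->cb_data[f->num_cbs] = data;
	f->num_cbs++;
	return 0;
}

/* Signal the fence; only the first call runs the callbacks. */
static int model_fence_signal(struct model_fence *f)
{
	size_t i;

	assert(f);
	if (f->signaled)
		return -EINVAL;	/* assumed error code for "already signaled" */
	f->signaled = true;
	for (i = 0; i < f->num_cbs; i++)
		f->cbs[i](f, f->cb_data[i]);
	f->num_cbs = 0;
	return 0;
}

/* Example callback: counts how many times it was invoked. */
static void model_count_cb(struct model_fence *fence, void *data)
{
	(void)fence;
	++*(int *)data;
}
```

In this model a second model_fence_signal() call leaves the callback count unchanged, and a callback added after signalling is rejected with -ENOENT rather than run.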
-
signed long dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)¶
sleep until the fence gets signaled or until timeout elapses
Parameters
struct dma_fence *fence
the fence to wait on
bool intr
if true, do an interruptible wait
signed long timeout
timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT
Description
Returns -ERESTARTSYS if interrupted, 0 if the wait timed out, or the remaining timeout in jiffies on success. Other error values may be returned on custom implementations.
Performs a synchronous wait on this fence. The caller is assumed to hold a reference to the fence, directly or indirectly (buf-mgr between reservation and committing); otherwise the fence might be freed before return, resulting in undefined behavior.
See also dma_fence_wait() and dma_fence_wait_any_timeout().
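The three-way return convention of the timeout waits can be decoded as in the following sketch. The enum and classify_wait() are hypothetical helpers, and MODEL_ERESTARTSYS is defined locally because ERESTARTSYS is a kernel-internal value (512) that userspace errno.h does not provide.

```c
#include <assert.h>

/* Kernel-internal errno value; not exported through userspace <errno.h>. */
#define MODEL_ERESTARTSYS 512

enum wait_outcome {
	WAIT_SIGNALED,		/* positive return: remaining jiffies */
	WAIT_TIMED_OUT,		/* zero return */
	WAIT_INTERRUPTED,	/* -ERESTARTSYS */
	WAIT_OTHER_ERROR	/* any other negative value (custom implementations) */
};

/* Map a dma_fence_wait_timeout()-style return value to an outcome. */
static enum wait_outcome classify_wait(long ret)
{
	if (ret > 0)
		return WAIT_SIGNALED;
	if (ret == 0)
		return WAIT_TIMED_OUT;
	if (ret == -MODEL_ERESTARTSYS)
		return WAIT_INTERRUPTED;
	return WAIT_OTHER_ERROR;
}
```

Checking for a positive value first matters: a return of 0 means the timeout elapsed, not that the wait succeeded.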
-
void dma_fence_release(struct kref *kref)¶
default release function for fences
Parameters
struct kref *kref
Description
This is the default release function for dma_fence. Drivers shouldn't call this directly, but instead call dma_fence_put().
-
void dma_fence_free(struct dma_fence *fence)¶
default release function for dma_fence
Parameters
struct dma_fence *fence
fence to release
Description
This is the default implementation for dma_fence_ops.release. It calls kfree_rcu() on fence.
-
void dma_fence_enable_sw_signaling(struct dma_fence *fence)¶
enable signaling on fence
Parameters
struct dma_fence *fence
the fence to enable
Description
This requests that software signaling be enabled, so that the fence completes as soon as possible. It calls dma_fence_ops.enable_signaling internally.
-
int dma_fence_add_callback(struct dma_fence *fence, struct dma_fence_cb *cb, dma_fence_func_t func)¶
add a callback to be called when the fence is signaled
Parameters
struct dma_fence *fence
the fence to wait on
struct dma_fence_cb *cb
the callback to register
dma_fence_func_t func
the function to call
Description
Add a software callback to the fence. The caller should keep a reference to the fence.
cb will be initialized by dma_fence_add_callback(); no initialization by the caller is required. Any number of callbacks can be registered to a fence, but a callback can only be registered to one fence at a time.
If fence is already signaled, this function returns -ENOENT and does not call the callback.
Note that the callback can be called from an atomic context or irq context.
Returns 0 in case of success, -ENOENT if the fence is already signaled and -EINVAL in case of error.
-
int dma_fence_get_status(struct dma_fence *fence)¶
returns the status upon completion
Parameters
struct dma_fence *fence
the dma_fence to query
Description
This wraps dma_fence_get_status_locked() to return the error status condition on a signaled fence. See dma_fence_get_status_locked() for more details.
Returns 0 if the fence has not yet been signaled, 1 if the fence has been signaled without an error condition, or a negative error code if the fence has completed with an error.
-
bool dma_fence_remove_callback(struct dma_fence *fence, struct dma_fence_cb *cb)¶
remove a callback from the signaling list
Parameters
struct dma_fence *fence
the fence to wait on
struct dma_fence_cb *cb
the callback to remove
Description
Remove a previously queued callback from the fence. This function returns true if the callback is successfully removed, or false if the fence has already been signaled.
WARNING: Cancelling a callback should only be done if you really know what you're doing, since deadlocks and race conditions could occur all too easily. For this reason, it should only ever be done on hardware lockup recovery, with a reference held to the fence.
Behaviour is undefined if cb has not been added to fence using dma_fence_add_callback() beforehand.
-
signed long dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)¶
default sleep until the fence gets signaled or until timeout elapses
Parameters
struct dma_fence *fence
the fence to wait on
bool intr
if true, do an interruptible wait
signed long timeout
timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT
Description
Returns -ERESTARTSYS if interrupted, 0 if the wait timed out, or the remaining timeout in jiffies on success. If timeout is zero and the fence is already signaled, the value one is returned, for consistency with other functions taking a jiffies timeout.
-
signed long dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count, bool intr, signed long timeout, uint32_t *idx)¶
sleep until any fence gets signaled or until timeout elapses
Parameters
struct dma_fence **fences
array of fences to wait on
uint32_t count
number of fences to wait on
bool intr
if true, do an interruptible wait
signed long timeout
timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT
uint32_t *idx
used to store the first signaled fence index, meaningful only on positive return
Description
Returns -EINVAL on custom fence wait implementation, -ERESTARTSYS if interrupted, 0 if the wait timed out, or the remaining timeout in jiffies on success.
Synchronously waits for the first fence in the array to be signaled. The caller needs to hold a reference to all fences in the array, otherwise a fence might be freed before return, resulting in undefined behavior.
See also dma_fence_wait() and dma_fence_wait_timeout().
-
void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)¶
set desired fence-wait deadline hint
Parameters
struct dma_fence *fence
the fence that is to be waited on
ktime_t deadline
the time by which the waiter hopes for the fence to be signaled
Description
Give the fence signaler a hint about an upcoming deadline, such as vblank, by which point the waiter would prefer the fence to be signaled. This is intended to give feedback to the fence signaler to aid in power management decisions, such as boosting GPU frequency if a periodic vblank deadline is approaching but the fence is not yet signaled.
-
void dma_fence_describe(struct dma_fence *fence, struct seq_file *seq)¶
Dump fence description into seq_file
Parameters
struct dma_fence *fence
the fence to describe
struct seq_file *seq
the seq_file to put the textual description into
Description
Dump a textual description of the fence and its state into the seq_file.
-
void dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops, spinlock_t *lock, u64 context, u64 seqno)¶
Initialize a custom fence.
Parameters
struct dma_fence *fence
the fence to initialize
const struct dma_fence_ops *ops
the dma_fence_ops for operations on this fence
spinlock_t *lock
the irqsafe spinlock to use for locking this fence
u64 context
the execution context this fence is run on
u64 seqno
a linear increasing sequence number for this context
Description
Initializes an allocated fence. The caller doesn't have to keep its refcount after committing with this fence, but it will need to hold a refcount again if dma_fence_ops.enable_signaling gets called.
context and seqno are used for easy comparison between fences, allowing to check which fence is later by simply using dma_fence_later().
-
struct dma_fence¶
software synchronization primitive
Definition:
struct dma_fence {
    spinlock_t *lock;
    const struct dma_fence_ops *ops;
    union {
        struct list_head cb_list;
        ktime_t timestamp;
        struct rcu_head rcu;
    };
    u64 context;
    u64 seqno;
    unsigned long flags;
    struct kref refcount;
    int error;
};
Members
lock
spin_lock_irqsave used for locking
ops
dma_fence_ops associated with this fence
{unnamed_union}
anonymous
cb_list
list of all callbacks to call
timestamp
Timestamp when the fence was signaled.
rcu
used for releasing fence with kfree_rcu
context
execution context this fence belongs to, returned by dma_fence_context_alloc()
seqno
the sequence number of this fence inside the execution context, can be compared to decide which fence would be signaled later.
flags
A mask of DMA_FENCE_FLAG_* defined below
refcount
refcount for this fence
error
Optional, only valid if < 0, must be set before calling dma_fence_signal, indicates that the fence has completed with an error.
Description
the flags member must be manipulated and read using the appropriate atomic ops (bit_*), so taking the spinlock will not be needed most of the time.
DMA_FENCE_FLAG_SIGNALED_BIT - fence is already signaled
DMA_FENCE_FLAG_TIMESTAMP_BIT - timestamp recorded for fence signaling
DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT - enable_signaling might have been called (see the note below)
DMA_FENCE_FLAG_USER_BITS - start of the unused bits, can be used by the implementer of the fence for its own purposes. Can be used in different ways by different fence implementers, so do not rely on this.
Note on DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT: since atomic bitops are used, a set bit does not guarantee that enable_signaling has been called. In particular, if dma_fence_signal was called right before this bit was set, it would have been able to set DMA_FENCE_FLAG_SIGNALED_BIT before enable_signaling was called. Adding a check for DMA_FENCE_FLAG_SIGNALED_BIT after setting DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT closes this race, and makes sure that after dma_fence_signal was called, any enable_signaling call will have either been completed, or never called at all.
-
struct dma_fence_cb¶
callback for dma_fence_add_callback()
Definition:
struct dma_fence_cb {
    struct list_head node;
    dma_fence_func_t func;
};
Members
node
used by dma_fence_add_callback() to append this struct to fence::cb_list
func
dma_fence_func_t to call
Description
This struct will be initialized by dma_fence_add_callback(). Additional data can be passed along by embedding dma_fence_cb in another struct.
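Passing data by embedding the callback struct can be sketched in plain C as follows. This is a simplified userspace model: model_cb stands in for struct dma_fence_cb without its list linkage, the callback takes only the cb pointer (the kernel's dma_fence_func_t also receives the fence), and model_container_of, model_job, model_job_done() and model_run_cb() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct model_cb;
typedef void (*model_func_t)(struct model_cb *cb);

/* Stand-in for struct dma_fence_cb (list linkage omitted). */
struct model_cb {
	model_func_t func;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Driver-private wrapper: the callback struct plus the data it needs. */
struct model_job {
	int id;
	int completed;
	struct model_cb cb;
};

/* Callback: only receives the embedded cb, then finds its job from it. */
static void model_job_done(struct model_cb *cb)
{
	struct model_job *job = model_container_of(cb, struct model_job, cb);

	job->completed = 1;
}

/* What the signaling side does: it knows nothing about model_job. */
static void model_run_cb(struct model_cb *cb)
{
	assert(cb && cb->func);
	cb->func(cb);
}
```

The signaling side never needs to know the wrapper type, which is why no separate user-data pointer is required.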
-
struct dma_fence_ops¶
operations implemented for fence
Definition:
struct dma_fence_ops {
    bool use_64bit_seqno;
    const char * (*get_driver_name)(struct dma_fence *fence);
    const char * (*get_timeline_name)(struct dma_fence *fence);
    bool (*enable_signaling)(struct dma_fence *fence);
    bool (*signaled)(struct dma_fence *fence);
    signed long (*wait)(struct dma_fence *fence, bool intr, signed long timeout);
    void (*release)(struct dma_fence *fence);
    void (*fence_value_str)(struct dma_fence *fence, char *str, int size);
    void (*timeline_value_str)(struct dma_fence *fence, char *str, int size);
    void (*set_deadline)(struct dma_fence *fence, ktime_t deadline);
};
Members
use_64bit_seqno
True if this dma_fence implementation uses 64bit seqno, false otherwise.
get_driver_name
Returns the driver name. This is a callback to allow drivers to compute the name at runtime, without having to store it permanently for each fence, or build a cache of some sort.
This callback is mandatory.
get_timeline_name
Return the name of the context this fence belongs to. This is a callback to allow drivers to compute the name at runtime, without having to store it permanently for each fence, or build a cache of some sort.
This callback is mandatory.
enable_signaling
Enable software signaling of fence.
For fence implementations that have the capability for hw->hw signaling, they can implement this op to enable the necessary interrupts, or insert commands into cmdstream, etc, to avoid these costly operations for the common case where only hw->hw synchronization is required. This is called in the first dma_fence_wait() or dma_fence_add_callback() path to let the fence implementation know that there is another driver waiting on the signal (i.e. hw->sw case).
This function can be called from atomic context, but not from irq context, so normal spinlocks can be used.
A return value of false indicates the fence already passed, or some failure occurred that made it impossible to enable signaling. True indicates successful enabling.
dma_fence.error may be set in enable_signaling, but only when false is returned.
Since many implementations can call dma_fence_signal() even before enable_signaling has been called, there is a race window where the dma_fence_signal() might result in the final fence reference being released and its memory freed. To avoid this, implementations of this callback should grab their own reference using dma_fence_get(), to be released when the fence is signalled (through e.g. the interrupt handler).
This callback is optional. If this callback is not present, then the driver must always have signaling enabled.
signaled
Peek whether the fence is signaled, as a fastpath optimization for e.g. dma_fence_wait() or dma_fence_add_callback(). Note that this callback does not need to make any guarantees beyond that a fence once indicated as signalled must always return true from this callback. This callback may return false even if the fence has completed already; in this case information hasn't propagated through the system yet. See also dma_fence_is_signaled().
May set dma_fence.error if returning true.
This callback is optional.
wait
Custom wait implementation, defaults to dma_fence_default_wait() if not set.
Deprecated and should not be used by new implementations. Only used by existing implementations which need special handling for their hardware reset procedure.
Must return -ERESTARTSYS if intr = true and the wait was interrupted, the remaining jiffies if the fence has signaled, or 0 if the wait timed out. Can also return other error values on custom implementations, which should be treated as if the fence is signaled. For example a hardware lockup could be reported like that.
release
Called on destruction of fence to release additional resources. Can be called from irq context. This callback is optional. If it is NULL, then dma_fence_free() is instead called as the default implementation.
fence_value_str
Callback to fill in free-form debug info specific to this fence, like the sequence number.
This callback is optional.
timeline_value_str
Fills in the current value of the timeline as a string, like the sequence number. Note that the specific fence passed to this function should not matter, drivers should only use it to look up the corresponding timeline structures.
set_deadline
Callback to allow a fence waiter to inform the fence signaler of an upcoming deadline, such as vblank, by which point the waiter would prefer the fence to be signaled by. This is intended to give feedback to the fence signaler to aid in power management decisions, such as boosting GPU frequency.
This is called without dma_fence.lock held, it can be called multiple times and from any context. Locking is up to the callee if it has some state to manage. If multiple deadlines are set, the expectation is to track the soonest one. If the deadline is before the current time, it should be interpreted as an immediate deadline.
This callback is optional.
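The "track the soonest one" expectation for set_deadline could look like the sketch below. It is a userspace model under stated assumptions: model_deadline and model_set_deadline() are hypothetical, deadlines are plain nanosecond integers instead of ktime_t, and the locking a real callee would need is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-fence deadline state kept by a fence implementation. */
struct model_deadline {
	bool has_deadline;
	int64_t deadline_ns;
};

/*
 * Record a new deadline hint, keeping only the soonest one seen so far.
 * Returns true if the stored deadline changed.
 */
static bool model_set_deadline(struct model_deadline *d, int64_t deadline_ns)
{
	assert(d);
	if (d->has_deadline && d->deadline_ns <= deadline_ns)
		return false;	/* an equal or earlier deadline is already tracked */
	d->has_deadline = true;
	d->deadline_ns = deadline_ns;
	return true;
}
```

Returning whether the value changed lets a caller skip re-evaluating power management decisions when a later hint arrives.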
-
void dma_fence_put(struct dma_fence *fence)¶
decreases refcount of the fence
Parameters
struct dma_fence *fence
fence to reduce refcount of
-
struct dma_fence *dma_fence_get(struct dma_fence *fence)¶
increases refcount of the fence
Parameters
struct dma_fence *fence
fence to increase refcount of
Description
Returns the same fence, with refcount increased by 1.
-
struct dma_fence *dma_fence_get_rcu(struct dma_fence *fence)¶
get a fence from a dma_resv_list with rcu read lock
Parameters
struct dma_fence *fence
fence to increase refcount of
Description
Function returns NULL if no refcount could be obtained, or the fence.
-
struct dma_fence *dma_fence_get_rcu_safe(struct dma_fence __rcu **fencep)¶
acquire a reference to an RCU tracked fence
Parameters
struct dma_fence __rcu **fencep
pointer to fence to increase refcount of
Description
Function returns NULL if no refcount could be obtained, or the fence. This function handles acquiring a reference to a fence that may be reallocated within the RCU grace period (such as with SLAB_TYPESAFE_BY_RCU), so long as the caller is using RCU on the pointer to the fence.
An alternative mechanism is to employ a seqlock to protect a bunch of fences, such as used by struct dma_resv. When using a seqlock, the seqlock must be taken before and checked after a reference to the fence is acquired (as shown here).
The caller is required to hold the RCU read lock.
-
bool dma_fence_is_signaled_locked(struct dma_fence *fence)¶
Return an indication if the fence is signaled yet.
Parameters
struct dma_fence *fence
the fence to check
Description
Returns true if the fence was already signaled, false if not. Since this function doesn't enable signaling, it is not guaranteed to ever return true if dma_fence_add_callback(), dma_fence_wait() or dma_fence_enable_sw_signaling() haven't been called before.
This function requires dma_fence.lock to be held.
See also dma_fence_is_signaled().
-
bool dma_fence_is_signaled(struct dma_fence *fence)¶
Return an indication if the fence is signaled yet.
Parameters
struct dma_fence *fence
the fence to check
Description
Returns true if the fence was already signaled, false if not. Since this function doesn't enable signaling, it is not guaranteed to ever return true if dma_fence_add_callback(), dma_fence_wait() or dma_fence_enable_sw_signaling() haven't been called before.
It is recommended that seqno fences call dma_fence_signal() when the operation is complete. This makes it possible to prevent issues from wraparound between time of issue and time of use by checking the return value of this function before calling hardware-specific wait instructions.
See also dma_fence_is_signaled_locked().
-
bool __dma_fence_is_later(u64 f1, u64 f2, const struct dma_fence_ops *ops)¶
return if f1 is chronologically later than f2
Parameters
u64 f1
the first fence's seqno
u64 f2
the second fence's seqno from the same context
const struct dma_fence_ops *ops
dma_fence_ops associated with the seqno
Description
Returns true if f1 is chronologically later than f2. Both fences must be from the same context, since a seqno is not common across contexts.
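A wraparound-tolerant sequence number comparison can be sketched as below. The text above does not spell out how the ops argument affects the result; the sketch assumes that the use_64bit_seqno flag selects between a plain 64-bit comparison and a signed 32-bit difference, and model_seqno_is_later() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if seqno f1 is later than f2 on the same timeline.
 * With 32-bit timelines the signed difference is used, so a counter that
 * wrapped from 0xffffffff to 0 still compares as "later".
 */
static bool model_seqno_is_later(uint64_t f1, uint64_t f2, bool use_64bit_seqno)
{
	if (use_64bit_seqno)
		return f1 > f2;
	return (int32_t)((uint32_t)f1 - (uint32_t)f2) > 0;
}
```

The 32-bit branch only gives meaningful answers while the two numbers are less than 2^31 apart, which is one reason the comparison is restricted to fences from the same context.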
-
bool dma_fence_is_later(struct dma_fence *f1, struct dma_fence *f2)¶
return if f1 is chronologically later than f2
Parameters
struct dma_fence *f1
the first fence from the same context
struct dma_fence *f2
the second fence from the same context
Description
Returns true if f1 is chronologically later than f2. Both fences must be from the same context, since a seqno is not re-used across contexts.
-
struct dma_fence *dma_fence_later(struct dma_fence *f1, struct dma_fence *f2)¶
return the chronologically later fence
Parameters
struct dma_fence *f1
the first fence from the same context
struct dma_fence *f2
the second fence from the same context
Description
Returns NULL if both fences are signaled, otherwise the fence that would be signaled last. Both fences must be from the same context, since a seqno is not re-used across contexts.
-
int dma_fence_get_status_locked(struct dma_fence *fence)¶
returns the status upon completion
Parameters
struct dma_fence *fence
the dma_fence to query
Description
Drivers can supply an optional error status condition before they signal the fence (to indicate whether the fence was completed due to an error rather than success). The value of the status condition is only valid if the fence has been signaled; dma_fence_get_status_locked() first checks the signal state before reporting the error status.
Returns 0 if the fence has not yet been signaled, 1 if the fence has been signaled without an error condition, or a negative error code if the fence has completed with an error.
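The three-valued status described above reduces to a few lines. The following is a userspace sketch with a hypothetical model_fence_status() helper that takes the signal state and error field as plain arguments instead of reading them from a fence under its lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * 0: not yet signaled (the error value is ignored, it is not valid yet)
 * 1: signaled without an error
 * negative: signaled, completed with that error
 */
static int model_fence_status(bool signaled, int error)
{
	assert(error <= 0);	/* the error field is only meaningful when < 0 */
	if (!signaled)
		return 0;
	return error < 0 ? error : 1;
}
```

Note that an error stored on an unsignaled fence is not reported; the signal state is checked first.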
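-
void dma_fence_set_error(struct dma_fence *fence, int error)¶
flag an error condition on the fence
Parameters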
struct dma_fence *fence
the dma_fence
int error
the error to store
Description
Drivers can supply an optional error status condition before they signal the fence, to indicate that the fence was completed due to an error rather than success. This must be set before signaling (so that the value is visible before any waiters on the signal callback are woken). This helper exists to help catch erroneous setting of dma_fence.error.
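-
signed long dma_fence_wait(struct dma_fence *fence, bool intr)¶
sleep until the fence gets signaled
Parameters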
struct dma_fence *fence
the fence to wait on
bool intr
if true, do an interruptible wait
Description
This function will return -ERESTARTSYS if interrupted by a signal, or 0 if the fence was signaled. Other error values may be returned on custom implementations.
Performs a synchronous wait on this fence. It is assumed the caller directly or indirectly holds a reference to the fence, otherwise the fence might be freed before return, resulting in undefined behavior.
See also dma_fence_wait_timeout() and dma_fence_wait_any_timeout().
-
bool dma_fence_is_array(struct dma_fence *fence)¶
check if a fence is from the array subclass
Parameters
struct dma_fence *fence
the fence to test
Description
Return true if it is a dma_fence_array and false otherwise.
-
bool dma_fence_is_chain(struct dma_fence *fence)¶
check if a fence is from the chain subclass
Parameters
struct dma_fence *fence
the fence to test
Description
Return true if it is a dma_fence_chain and false otherwise.
-
bool dma_fence_is_container(struct dma_fence *fence)¶
check if a fence is a container for other fences
Parameters
struct dma_fence *fence
the fence to test
Description
Return true if this fence is a container for other fences, false otherwise. This is important since we can't build up large fence structures; operations on them would otherwise run into recursion.
DMA Fence Array¶
-
struct dma_fence_array *dma_fence_array_create(int num_fences, struct dma_fence **fences, u64 context, unsigned seqno, bool signal_on_any)¶
Create a custom fence array
Parameters
int num_fences
[in] number of fences to add in the array
struct dma_fence **fences
[in] array containing the fences
u64 context
[in] fence context to use
unsigned seqno
[in] sequence number to use
bool signal_on_any
[in] signal on any fence in the array
Description
Allocate a dma_fence_array object and initialize the base fence with dma_fence_init().
In case of error it returns NULL.
The caller should allocate the fences array with num_fences size and fill it with the fences it wants to add to the object. Ownership of this array is taken and dma_fence_put() is used on each fence on release.
If signal_on_any is true the fence array signals if any fence in the array signals, otherwise it signals when all fences in the array signal.
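The effect of signal_on_any can be modelled with a pending counter, in the spirit of the num_pending member of struct dma_fence_array. This is a userspace sketch under assumptions: model_array, model_array_init() and model_array_member_signaled() are hypothetical, a plain counter replaces the atomic one, and initialising the counter to 1 for signal_on_any is one way to get the described behaviour rather than a statement about the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

struct model_array {
	unsigned int num_fences;
	unsigned int num_pending;	/* member signals still needed */
	bool signaled;
};

/* signal_on_any: one member is enough; otherwise all members are needed. */
static void model_array_init(struct model_array *a, unsigned int num_fences,
			     bool signal_on_any)
{
	assert(a && num_fences > 0);
	a->num_fences = num_fences;
	a->num_pending = signal_on_any ? 1 : num_fences;
	a->signaled = false;
}

/*
 * Called once for each member fence that signals.
 * Returns true if the array as a whole is signaled after this call.
 */
static bool model_array_member_signaled(struct model_array *a)
{
	assert(a);
	if (a->signaled)
		return true;	/* already done, later members change nothing */
	if (--a->num_pending == 0)
		a->signaled = true;
	return a->signaled;
}
```

With three members, the "any" variant reports completion on the first member signal, while the "all" variant needs all three.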
-
bool dma_fence_match_context(struct dma_fence *fence, u64 context)¶
Check if all fences are from the given context
Parameters
struct dma_fence *fence
[in] fence or fence array
u64 context
[in] fence context to check all fences against
Description
Checks the provided fence or, for a fence array, all fences in the array against the given context. Returns false if any fence is from a different context.
-
struct dma_fence_array_cb¶
callback helper for fence array
Definition:
struct dma_fence_array_cb {
    struct dma_fence_cb cb;
    struct dma_fence_array *array;
};
Members
cb
fence callback structure for signaling
array
reference to the parent fence array object
-
struct dma_fence_array¶
fence to represent an array of fences
Definition:
struct dma_fence_array {
    struct dma_fence base;
    spinlock_t lock;
    unsigned num_fences;
    atomic_t num_pending;
    struct dma_fence **fences;
    struct irq_work work;
};
Members
base
fence base class
lock
spinlock for fence handling
num_fences
number of fences in the array
num_pending
fences in the array still pending
fences
array of the fences
work
internal irq_work function
-
struct dma_fence_array *to_dma_fence_array(struct dma_fence *fence)¶
cast a fence to a dma_fence_array
Parameters
struct dma_fence *fence
fence to cast to a dma_fence_array
Description
Returns NULL if the fence is not a dma_fence_array, or the dma_fence_array otherwise.
-
dma_fence_array_for_each¶
dma_fence_array_for_each (fence, index, head)
iterate over all fences in array
Parameters
fence
current fence
index
index into the array
head
potential dma_fence_array object
Description
Test if head is a dma_fence_array object and, if so, iterate over all fences in the array. If not, just iterate over the fence in head itself.
For a deep dive iterator see dma_fence_unwrap_for_each().
DMA Fence Chain¶
-
struct dma_fence *dma_fence_chain_walk(struct dma_fence *fence)¶
chain walk function
Parameters
struct dma_fence *fence
current chain node
Description
Walk the chain to the next node. Returns the next fence or NULL if we are at the end of the chain. Garbage collects chain nodes which are already signaled.
-
int dma_fence_chain_find_seqno(struct dma_fence **pfence, uint64_t seqno)¶
find fence chain node by seqno
Parameters
struct dma_fence **pfence
pointer to the chain node where to start
uint64_t seqno
the sequence number to search for
Description
Advance the fence pointer to the chain node which will signal this sequence number. If no sequence number is provided then this is a no-op.
Returns -EINVAL if the fence is not a chain node or the sequence number has not yet advanced far enough.
-
void dma_fence_chain_init(struct dma_fence_chain *chain, struct dma_fence *prev, struct dma_fence *fence, uint64_t seqno)¶
initialize a fence chain
Parameters
struct dma_fence_chain *chain
the chain node to initialize
struct dma_fence *prev
the previous fence
struct dma_fence *fence
the current fence
uint64_t seqno
the sequence number to use for the fence chain
Description
Initialize a new chain node and either start a new chain or add the node to the existing chain of the previous fence.
-
struct dma_fence_chain¶
fence to represent a node of a fence chain
Definition:
struct dma_fence_chain {
    struct dma_fence base;
    struct dma_fence __rcu *prev;
    u64 prev_seqno;
    struct dma_fence *fence;
    union {
        struct dma_fence_cb cb;
        struct irq_work work;
    };
    spinlock_t lock;
};
Members
base
fence base class
prev
previous fence of the chain
prev_seqno
original previous seqno before garbage collection
fence
encapsulated fence
{unnamed_union}
anonymous
cb
callback for signaling
This is used to add the callback for signaling the completion of the fence chain. Never used at the same time as the irq work.
work
irq work item for signaling
Irq work structure to allow us to add the callback without running into lock inversion. Never used at the same time as the callback.
lock
spinlock for fence handling
-
struct dma_fence_chain *to_dma_fence_chain(struct dma_fence *fence)¶
cast a fence to a dma_fence_chain
Parameters
struct dma_fence *fence
fence to cast to a dma_fence_chain
Description
Returns NULL if the fence is not a dma_fence_chain, or the dma_fence_chain otherwise.
-
struct dma_fence *dma_fence_chain_contained(struct dma_fence *fence)¶
return the contained fence
Parameters
struct dma_fence *fence
the fence to test
Description
If the fence is a dma_fence_chain the function returns the fence contained inside the chain object, otherwise it returns the fence itself.
-
struct dma_fence_chain *dma_fence_chain_alloc(void)¶
Parameters
void
no arguments
Description
Returns a new struct dma_fence_chain object or NULL on failure.
-
void dma_fence_chain_free(struct dma_fence_chain *chain)¶
Parameters
struct dma_fence_chain *chain
chain node to free
Description
Frees up an allocated but not used struct dma_fence_chain object. This doesn't need an RCU grace period since the fence was never initialized nor published. After dma_fence_chain_init() has been called the fence must be released by calling dma_fence_put(), and not through this function.
-
dma_fence_chain_for_each¶
dma_fence_chain_for_each (iter, head)
iterate over all fences in chain
Parameters
iter
current fence
head
starting point
Description
Iterate over all fences in the chain. A reference to the current fence is held while inside the loop; it must be dropped when breaking out.
For a deep dive iterator see dma_fence_unwrap_for_each().
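The reference rule for breaking out of the loop can be demonstrated with a small userspace model. It is a sketch only: model_node, model_get(), model_put(), model_walk() and model_chain_for_each() are hypothetical stand-ins, and the model assumes the iterator takes a reference on the current node and hands it over to the next one on each step.

```c
#include <assert.h>
#include <stddef.h>

struct model_node {
	int refcount;
	int seqno;
	struct model_node *prev;
};

static struct model_node *model_get(struct model_node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void model_put(struct model_node *n)
{
	if (n) {
		assert(n->refcount > 0);
		n->refcount--;
	}
}

/* Step to the previous node: reference it, then drop the reference on n. */
static struct model_node *model_walk(struct model_node *n)
{
	struct model_node *prev = model_get(n->prev);

	model_put(n);
	return prev;
}

#define model_chain_for_each(iter, head) \
	for ((iter) = model_get(head); (iter); (iter) = model_walk(iter))

/* Return the seqno of the first node with seqno <= wanted, or -1. */
static int model_find(struct model_node *head, int wanted)
{
	struct model_node *iter;

	model_chain_for_each(iter, head) {
		if (iter->seqno <= wanted) {
			int found = iter->seqno;

			model_put(iter);	/* breaking out: drop the loop's reference */
			return found;
		}
	}
	return -1;	/* ran to the end: the iterator dropped everything itself */
}
```

Leaving out the model_put() before the early return would leak one reference on the node where the search stopped, which is the mistake the description warns about.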
DMA Fence unwrap¶
-
struct dma_fence_unwrap¶
cursor into the container structure
Definition:
struct dma_fence_unwrap {
    struct dma_fence *chain;
    struct dma_fence *array;
    unsigned int index;
};
Members
chain
potential dma_fence_chain, but can be other fence as well
array
potential dma_fence_array, but can be other fence as well
index
last returned index if array is really a dma_fence_array
Description
Should be used with the dma_fence_unwrap_for_each() iterator macro.
-
dma_fence_unwrap_for_each¶
dma_fence_unwrap_for_each (fence, cursor, head)
iterate over all fences in containers
Parameters
fence
current fence
cursor
current position inside the containers
head
starting point for the iterator
Description
Unwrap dma_fence_chain and dma_fence_array containers and deep dive into all potential fences in them. If head is just a normal fence only that one is returned.
-
dma_fence_unwrap_merge¶
dma_fence_unwrap_merge (...)
unwrap and merge fences
Parameters
...
variable arguments
Description
All fences given as parameters are unwrapped and merged back together as a flat dma_fence_array. Useful if multiple containers need to be merged together.
Implemented as a macro to allocate the necessary arrays on the stack and account the stack frame size to the caller.
Returns NULL on memory allocation failure, a dma_fence object representing all the given fences otherwise.
DMA Fence Sync File¶
-
struct sync_file *sync_file_create(struct dma_fence *fence)¶
creates a sync file
Parameters
struct dma_fence *fence
fence to add to the sync_fence
Description
Creates a sync_file containing fence. This function acquires an additional reference of fence for the newly-created sync_file, if it succeeds. The sync_file can be released with fput(sync_file->file). Returns the sync_file or NULL in case of error.
-
struct dma_fence *sync_file_get_fence(int fd)¶
get the fence related to the sync_file fd
Parameters
int fd
sync_file fd to get the fence from
Description
Ensures fd references a valid sync_file and returns a fence that represents all fences in the sync_file. On error NULL is returned.
-
struct sync_file¶
sync file to export to the userspace
Definition:
struct sync_file {
    struct file *file;
    char user_name[32];
#ifdef CONFIG_DEBUG_FS
    struct list_head sync_file_list;
#endif
    wait_queue_head_t wq;
    unsigned long flags;
    struct dma_fence *fence;
    struct dma_fence_cb cb;
};
Members
file
file representing this fence
user_name
Name of the sync file provided by userspace, for merged fences. Otherwise generated through driver callbacks (in which case the entire array is 0).
sync_file_list
membership in global file list
wq
wait queue for fence signaling
flags
flags for the sync_file
fence
fence with the fences in the sync_file
cb
fence callback information
Description
flags: POLL_ENABLED: whether userspace is currently poll()'ing or not
DMA Fence Sync File uABI¶
-
struct sync_merge_data¶
SYNC_IOC_MERGE: merge two fences
Definition:
struct sync_merge_data {
    char name[32];
    __s32 fd2;
    __s32 fence;
    __u32 flags;
    __u32 pad;
};
Members
name
name of new fence
fd2
file descriptor of second fence
fence
returns the fd of the new fence to userspace
flags
merge_data flags
pad
padding for 64-bit alignment, should always be zero
Description
Creates a new fence containing copies of the sync_pts in both the calling fd and sync_merge_data.fd2. Returns the new fence's fd in sync_merge_data.fence.
-
struct sync_fence_info¶
detailed fence information
Definition:
struct sync_fence_info {
    char obj_name[32];
    char driver_name[32];
    __s32 status;
    __u32 flags;
    __u64 timestamp_ns;
};
Members
obj_name
name of parent sync_timeline
driver_name
name of driver implementing the parent
status
status of the fence. 0: active, 1: signaled, <0: error
flags
fence_info flags
timestamp_ns
timestamp of status change in nanoseconds
-
struct sync_file_info¶
SYNC_IOC_FILE_INFO: get detailed information on a sync_file
Definition:
struct sync_file_info {
    char name[32];
    __s32 status;
    __u32 flags;
    __u32 num_fences;
    __u32 pad;
    __u64 sync_fence_info;
};
Members
name
name of fence
status
status of the fence. 1: signaled, 0: active, <0: error
flags
sync_file_info flags
num_fences
number of fences in the sync_file
pad
padding for 64-bit alignment, should always be zero
sync_fence_info
pointer to array of struct sync_fence_info with all fences in the sync_file
Description
Takes a struct sync_file_info. If num_fences is 0, the field is updated with the actual number of fences. If num_fences is > 0, the system will use the pointer provided on sync_fence_info to return up to num_fences of struct sync_fence_info, with detailed fence information.
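From userspace, the two-step use of num_fences described above might look like the following sketch. It assumes a Linux system where the uAPI header linux/sync_file.h provides struct sync_file_info, struct sync_fence_info and the SYNC_IOC_FILE_INFO request; query_sync_fences() is a hypothetical helper and error reporting is reduced to a NULL return.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/sync_file.h>

/*
 * Query all fences in a sync_file. On success returns a malloc()ed array
 * (NULL if the file reports no fences) and stores the count in *count.
 * On failure returns NULL and sets *count to -1.
 */
static struct sync_fence_info *query_sync_fences(int fd, int *count)
{
	struct sync_file_info info;
	struct sync_fence_info *fences;

	assert(count);
	*count = -1;

	/* First call: num_fences == 0, so only the count is filled in. */
	memset(&info, 0, sizeof(info));
	if (ioctl(fd, SYNC_IOC_FILE_INFO, &info) < 0)
		return NULL;
	if (info.num_fences == 0) {
		*count = 0;
		return NULL;
	}

	fences = calloc(info.num_fences, sizeof(*fences));
	if (!fences)
		return NULL;

	/* Second call: pass the buffer so the per-fence details are returned. */
	info.sync_fence_info = (uint64_t)(uintptr_t)fences;
	if (ioctl(fd, SYNC_IOC_FILE_INFO, &info) < 0) {
		free(fences);
		return NULL;
	}

	*count = (int)info.num_fences;
	return fences;
}
```

The caller owns the returned array and releases it with free(). The structure is zeroed before the first call so that flags and pad are zero, as the member descriptions require for pad.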
Indefinite DMA Fences¶
At various times struct dma_fence objects with an indefinite time until dma_fence_wait() finishes have been proposed. Examples include:
Future fences, used in HWC1 to signal when a buffer isn't used by the display any longer, and created with the screen update that makes the buffer visible. The time this fence completes is entirely under userspace's control.
Proxy fences, proposed to handle drm_syncobj for which the fence has not yet been set. Used to asynchronously delay command submission.
Userspace fences or gpu futexes, fine-grained locking within a command buffer that userspace uses for synchronization across engines or with the CPU, which are then imported as a DMA fence for integration into existing winsys protocols.
Long-running compute command buffers, while still using traditional end of batch DMA fences for memory management instead of context preemption DMA fences which get reattached when the compute job is rescheduled.
Common to all these schemes is that userspace controls the dependencies of these fences and controls when they fire. Mixing indefinite fences with normal in-kernel DMA fences does not work, even when a fallback timeout is included to protect against malicious userspace:
Only the kernel knows about all DMA fence dependencies; userspace is not aware of dependencies injected due to memory management or scheduler decisions.
Only userspace knows about all dependencies in indefinite fences and when exactly they will complete; the kernel has no visibility.
Furthermore the kernel has to be able to hold up userspace command submission for memory management needs, which means we must support indefinite fences being dependent upon DMA fences. If the kernel also supports indefinite fences in the kernel like a DMA fence, as any of the above proposals would, there is the potential for deadlocks.
This means that the kernel might accidentally create deadlocks through memory management dependencies which userspace is unaware of, which randomly hangs workloads until the timeout kicks in, even though these workloads do not contain a deadlock from userspace's perspective. In such a mixed fencing architecture there is no single entity with knowledge of all dependencies. Therefore preventing such deadlocks from within the kernel is not possible.
The only solution to avoid dependency loops is to not allow indefinite fences in the kernel. This means:
No future fences, proxy fences or userspace fences imported as DMA fences, with or without a timeout.
No DMA fences that signal end of batchbuffer for command submission where userspace is allowed to use userspace fencing or long running compute workloads. This also means no implicit fencing for shared buffers in these cases.
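The deadlock argument above can be illustrated with a toy dependency graph. This is a sketch for illustration only and not kernel code: the graph type, the three nodes and all function names are hypothetical. Node D is an in-kernel DMA fence, node P is an indefinite fence imported as a DMA fence, and node W is the userspace work that would eventually signal P.

```c
#include <stdbool.h>

#define MAX_NODES 8

/* Toy model: waits_on[a][b] means a cannot complete before b does. */
struct fence_graph {
	int n;
	bool waits_on[MAX_NODES][MAX_NODES];
};

enum { DMA_FENCE_D, IMPORTED_FENCE_P, USER_WORK_W, NUM_NODES };

/* Depth-first search; state 1 = on the current path, 2 = fully explored. */
static bool visit(const struct fence_graph *g, int node, int *state)
{
	if (state[node] == 1)
		return true;	/* reached a node on the current path: loop */
	if (state[node] == 2)
		return false;
	state[node] = 1;
	for (int next = 0; next < g->n; next++)
		if (g->waits_on[node][next] && visit(g, next, state))
			return true;
	state[node] = 2;
	return false;
}

static bool has_dependency_loop(const struct fence_graph *g)
{
	int state[MAX_NODES] = { 0 };

	for (int i = 0; i < g->n; i++)
		if (visit(g, i, state))
			return true;
	return false;
}

/* Kernel view: knows D waits on P, and that it holds W back behind D. */
static struct fence_graph kernel_view(void)
{
	struct fence_graph g = { .n = NUM_NODES };

	g.waits_on[DMA_FENCE_D][IMPORTED_FENCE_P] = true;
	g.waits_on[USER_WORK_W][DMA_FENCE_D] = true;	/* memory management */
	return g;
}

/* Userspace view: knows D waits on P, and that only W will signal P. */
static struct fence_graph userspace_view(void)
{
	struct fence_graph g = { .n = NUM_NODES };

	g.waits_on[DMA_FENCE_D][IMPORTED_FENCE_P] = true;
	g.waits_on[IMPORTED_FENCE_P][USER_WORK_W] = true;
	return g;
}

/* What actually happens: the union of both sets of dependencies. */
static struct fence_graph combined_view(void)
{
	struct fence_graph g = kernel_view();

	g.waits_on[IMPORTED_FENCE_P][USER_WORK_W] = true;
	return g;
}
```

In this model has_dependency_loop() finds no loop in either partial view, but does find one in the combined graph (D waits on P, P on W, W on D). Neither side can detect the loop from the edges it knows about, which is why the rules above exclude indefinite fences from the kernel altogether.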
Recoverable Hardware Page Faults Implications¶
Modern hardware supports recoverable page faults, which has a lot of implications for DMA fences.
First, a pending page fault obviously holds up the work that's running on the accelerator and a memory allocation is usually required to resolve the fault. But memory allocations are not allowed to gate completion of DMA fences, which means any workload using recoverable page faults cannot use DMA fences for synchronization. Synchronization fences controlled by userspace must be used instead.
On GPUs this poses a problem, because current desktop compositor protocols on Linux rely on DMA fences, which means without an entirely new userspace stack built on top of userspace fences, they cannot benefit from recoverable page faults. Specifically this means implicit synchronization will not be possible. The exception is when page faults are only used as migration hints and never to fill a memory request on demand. For now this means recoverable page faults on GPUs are limited to pure compute workloads.
Furthermore GPUs usually have shared resources between the 3D rendering and compute side, like compute units or command submission engines. If both a 3D job with a DMA fence and a compute workload using recoverable page faults are pending they could deadlock:
The 3D workload might need to wait for the compute job to finish and release hardware resources first.
The compute workload might be stuck in a page fault, because the memory allocation is waiting for the DMA fence of the 3D workload to complete.
There are a few options to prevent this problem, and drivers need to ensure one of them:
Compute workloads can always be preempted, even when a page fault is pending and not yet repaired. Not all hardware supports this.
DMA fence workloads and workloads which need page fault handling have independent hardware resources to guarantee forward progress. This could be achieved e.g. through dedicated engines and minimal compute unit reservations for DMA fence workloads.
The reservation approach could be further refined by only reserving the hardware resources for DMA fence workloads when they are in-flight. This must cover the time from when the DMA fence is visible to other threads up to the moment when the fence is completed through dma_fence_signal().
As a last resort, if the hardware provides no useful reservation mechanics, all workloads must be flushed from the GPU when switching between jobs requiring DMA fences and jobs requiring page fault handling. This means all DMA fences must complete before a compute job with page fault handling can be inserted into the scheduler queue. And vice versa, before a DMA fence can be made visible anywhere in the system, all compute workloads must be preempted to guarantee all pending GPU page faults are flushed.
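The last-resort option can be modelled as an admission rule that never lets the two job classes be in flight at the same time. The following is a toy sketch under that assumption and not a real scheduler: struct toy_sched and all function names are hypothetical, and "complete" for a page fault job stands for either finishing or being preempted.

```c
#include <stdbool.h>

enum job_class { JOB_DMA_FENCE, JOB_PAGE_FAULT };

/* Toy scheduler state: number of in-flight jobs per class. */
struct toy_sched {
	unsigned int dma_fence_jobs;	/* fence visible, not yet signaled */
	unsigned int page_fault_jobs;	/* compute jobs that may page fault */
};

/* A job may only be queued once the other class is fully flushed. */
static bool toy_sched_can_admit(const struct toy_sched *s, enum job_class c)
{
	if (c == JOB_DMA_FENCE)
		return s->page_fault_jobs == 0;
	return s->dma_fence_jobs == 0;
}

/* Returns false if the caller must first flush the other job class. */
static bool toy_sched_submit(struct toy_sched *s, enum job_class c)
{
	if (!toy_sched_can_admit(s, c))
		return false;
	if (c == JOB_DMA_FENCE)
		s->dma_fence_jobs++;
	else
		s->page_fault_jobs++;
	return true;
}

/* Called when a fence signals, or a compute job finishes or is preempted. */
static void toy_sched_complete(struct toy_sched *s, enum job_class c)
{
	if (c == JOB_DMA_FENCE && s->dma_fence_jobs > 0)
		s->dma_fence_jobs--;
	else if (c == JOB_PAGE_FAULT && s->page_fault_jobs > 0)
		s->page_fault_jobs--;
}
```

With one DMA fence job in flight, toy_sched_submit() refuses a page fault job until toy_sched_complete() has retired the fence job, and the same holds in the opposite direction. A real driver would additionally have to trigger the flush or preemption itself rather than just reject the submission.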
A further, fairly theoretical option would be to untangle these dependencies when allocating memory to repair hardware page faults, either through separate memory blocks or runtime tracking of the full dependency graph of all DMA fences. This would have a very wide impact on the kernel, since resolving the page fault on the CPU side can itself involve a page fault. It is much more feasible and robust to limit the impact of handling hardware page faults to the specific driver.
Note that workloads that run on independent hardware like copy engines or other GPUs do not have any impact. This allows us to keep using DMA fences internally in the kernel even for resolving hardware page faults, e.g. by using copy engines to clear or copy memory needed to resolve the page fault.
In some ways this page fault problem is a special case of the Indefinite DMA Fences discussion: indefinite fences from compute workloads are allowed to depend on DMA fences, but not the other way around. And not even the page fault problem is new, because some other CPU thread in userspace might hit a page fault which holds up a userspace fence. Supporting page faults on GPUs doesn't add anything fundamentally new.