drm/i915 Intel GFX Driver¶

The drm/i915 driver supports all integrated GFX chipsets with both Intel display and rendering blocks, with the exception of some very early models. This excludes a set of SoC platforms with an SGX rendering unit; those have basic support through the gma500 drm driver.

Core Driver Infrastructure¶

This section covers core driver infrastructure used by both the display and the GEM parts of the driver.

Runtime Power Management¶

The i915 driver supports dynamic enabling and disabling of entire hardware blocks at runtime. This is especially important on the display side, where software is supposed to control many power gates manually on recent hardware; on the GT side much of the power management is done by the hardware itself. But even there some manual control at the device level is required.

Since i915 supports a diverse set of platforms with a unified codebase, and hardware engineers just love to shuffle functionality around between power domains, a sizeable amount of indirection is required. This file provides generic functions to the driver for grabbing and releasing references for abstract power domains. It then maps those to the actual power wells present for a given platform.

bool __intel_display_power_is_enabled(struct drm_i915_private * dev_priv, enum intel_display_power_domain domain)¶: unlocked check for a power domain

Parameters

struct drm_i915_private * dev_priv: i915 device instance
enum intel_display_power_domain domain: power domain to check

Description

This is the unlocked version of intel_display_power_is_enabled() and should only be used from error capture and recovery code where deadlocks are possible.

Return

True when the power domain is enabled, false otherwise.

bool intel_display_power_is_enabled(struct drm_i915_private * dev_priv, enum intel_display_power_domain domain)¶: check for a power domain

Parameters

struct drm_i915_private * dev_priv: i915 device instance
enum intel_display_power_domain domain: power domain to check

Description

This function can be used to check the hw power domain state. It is mostly used in hardware state readout functions. Everywhere else code should rely upon explicit power domain reference counting to ensure that the hardware block is powered up before accessing it.

Callers must hold the relevant modesetting locks to ensure that concurrent threads can’t disable the power well while the caller tries to read a few registers.

Return

True when the power domain is enabled, false otherwise.

void intel_display_set_init_power(struct drm_i915_private * dev_priv, bool enable)¶: set the initial power domain state

Parameters

struct drm_i915_private * dev_priv: i915 device instance
bool enable: whether to enable or disable the initial power domain state

Description

For simplicity our driver load/unload and system suspend/resume code assumes that all power domains are always enabled. This function controls the state of this little hack. While the initial power domain state is enabled, runtime pm is effectively disabled.

void intel_display_power_get(struct drm_i915_private * dev_priv, enum intel_display_power_domain domain)¶: grab a power domain reference

Parameters

struct drm_i915_private * dev_priv: i915 device instance
enum intel_display_power_domain domain: power domain to reference

Description

This function grabs a power domain reference for domain and ensures that the power domain and all its parents are powered up. Therefore users should only grab a reference to the innermost power domain they need.

Any power domain reference obtained by this function must have a symmetric call to intel_display_power_put() to release the reference again.

bool intel_display_power_get_if_enabled(struct drm_i915_private * dev_priv, enum intel_display_power_domain domain)¶: grab a reference for an enabled display power domain

Parameters

struct drm_i915_private * dev_priv: i915 device instance
enum intel_display_power_domain domain: power domain to reference

Description

This function grabs a power domain reference for domain only if the domain, including all its parents, is already powered up. It never powers up any hardware itself. It returns true if a reference was taken and false otherwise.

Any power domain reference obtained by this function must have a symmetric call to intel_display_power_put() to release the reference again.

void intel_display_power_put(struct drm_i915_private * dev_priv, enum intel_display_power_domain domain)¶: release a power domain reference

Parameters

struct drm_i915_private * dev_priv: i915 device instance
enum intel_display_power_domain domain: power domain to reference

Description

This function drops the power domain reference obtained by intel_display_power_get() and might power down the corresponding hardware block right away if this is the last reference.
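The get/put contract described above can be sketched as a small userspace model. This is only an illustration under stated assumptions: every `toy_*` name is hypothetical, the two-well layout is invented, and locking and real hardware access are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model: none of these names exist in i915. */
enum toy_domain { TOY_DOMAIN_PIPE_A, TOY_DOMAIN_PIPE_B, TOY_DOMAIN_COUNT };

struct toy_power_well {
	unsigned int domains;	/* bitmask of domains this well serves */
	int count;		/* number of references currently held */
	bool hw_enabled;	/* stand-in for the real hardware state */
};

/* Wells are ordered from outermost (parent) to innermost. */
static struct toy_power_well toy_wells[] = {
	{ .domains = (1u << TOY_DOMAIN_PIPE_A) | (1u << TOY_DOMAIN_PIPE_B) },
	{ .domains = (1u << TOY_DOMAIN_PIPE_B) },
};
#define TOY_NUM_WELLS (sizeof(toy_wells) / sizeof(toy_wells[0]))

static int toy_domain_use_count[TOY_DOMAIN_COUNT];

/* Take a reference: power up every well serving the domain, parents first. */
static void toy_power_get(enum toy_domain domain)
{
	for (unsigned int i = 0; i < TOY_NUM_WELLS; i++) {
		struct toy_power_well *well = &toy_wells[i];

		if (!(well->domains & (1u << domain)))
			continue;
		if (well->count++ == 0)
			well->hw_enabled = true;
	}
	toy_domain_use_count[domain]++;
}

/* A domain is enabled only if every well serving it is enabled. */
static bool toy_power_is_enabled(enum toy_domain domain)
{
	for (unsigned int i = 0; i < TOY_NUM_WELLS; i++) {
		const struct toy_power_well *well = &toy_wells[i];

		if ((well->domains & (1u << domain)) && !well->hw_enabled)
			return false;
	}
	return true;
}

/* Take a reference only if the domain is already powered up. */
static bool toy_power_get_if_enabled(enum toy_domain domain)
{
	if (!toy_power_is_enabled(domain))
		return false;
	toy_power_get(domain);
	return true;
}

/* Drop a reference: walk the wells in reverse, innermost first. */
static void toy_power_put(enum toy_domain domain)
{
	assert(toy_domain_use_count[domain] > 0);
	toy_domain_use_count[domain]--;

	for (unsigned int i = TOY_NUM_WELLS; i-- > 0; ) {
		struct toy_power_well *well = &toy_wells[i];

		if (!(well->domains & (1u << domain)))
			continue;
		assert(well->count > 0);
		if (--well->count == 0)
			well->hw_enabled = false;
	}
}
```

In this model a reference on the innermost domain (pipe B) keeps the outer well alive as well, which is why callers only need to name the innermost domain they depend on.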

int intel_power_domains_init(struct drm_i915_private * dev_priv)¶: initializes the power domain structures

Parameters

struct drm_i915_private * dev_priv: i915 device instance

Description

Initializes the power domain structures for dev_priv depending upon the supported platform.

void intel_power_domains_fini(struct drm_i915_private * dev_priv)¶: finalizes the power domain structures

Parameters

struct drm_i915_private * dev_priv: i915 device instance

Description

Finalizes the power domain structures for dev_priv depending upon the supported platform. This function also disables runtime pm and ensures that the device stays powered up so that the driver can be reloaded.

void intel_power_domains_init_hw(struct drm_i915_private * dev_priv, bool resume)¶: initialize hardware power domain state

Parameters

struct drm_i915_private * dev_priv: i915 device instance
bool resume: Called from resume code paths or not

Description

This function initializes the hardware power domain state and enables all power domains using intel_display_set_init_power().

void intel_power_domains_suspend(struct drm_i915_private * dev_priv)¶: suspend power domain state

Parameters

struct drm_i915_private * dev_priv: i915 device instance

Description

This function prepares the hardware power domain state before entering system suspend. It must be paired with intel_power_domains_init_hw().

void intel_runtime_pm_get(struct drm_i915_private * dev_priv)¶: grab a runtime pm reference

Parameters

struct drm_i915_private * dev_priv: i915 device instance

Description

This function grabs a device-level runtime pm reference (mostly used for GEM code to ensure the GTT or GT is on) and ensures that it is powered up.

Any runtime pm reference obtained by this function must have a symmetric call to intel_runtime_pm_put() to release the reference again.

bool intel_runtime_pm_get_if_in_use(struct drm_i915_private * dev_priv)¶: grab a runtime pm reference if device in use

Parameters

struct drm_i915_private * dev_priv: i915 device instance

Description

This function grabs a device-level runtime pm reference if the device is already in use and ensures that it is powered up.

Any runtime pm reference obtained by this function must have a symmetric call to intel_runtime_pm_put() to release the reference again.

void intel_runtime_pm_get_noresume(struct drm_i915_private * dev_priv)¶: grab a runtime pm reference

Parameters

struct drm_i915_private * dev_priv: i915 device instance

Description

This function grabs a device-level runtime pm reference (mostly used for GEM code to ensure the GTT or GT is on).

It will _not_ power up the device but instead only check that it’s powered on. Therefore it is only valid to call this function from contexts where the device is known to be powered up and where trying to power it up would result in hilarity and deadlocks. That pretty much means only the system suspend/resume code, where this is used to grab runtime pm references for delayed setup down in work items.

Any runtime pm reference obtained by this function must have a symmetric call to intel_runtime_pm_put() to release the reference again.

void intel_runtime_pm_put(struct drm_i915_private * dev_priv)¶: release a runtime pm reference

Parameters

struct drm_i915_private * dev_priv: i915 device instance

Description

This function drops the device-level runtime pm reference obtained by intel_runtime_pm_get() and might power down the corresponding hardware block right away if this is the last reference.
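The difference between the three "get" variants can be illustrated with a toy reference counter. This is a sketch under simplifying assumptions: the `toy_rpm` type and functions are hypothetical, and the model powers down immediately on the last put, whereas the text above only says the hardware "might" power down right away.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device's runtime pm state. */
struct toy_rpm {
	int usage_count;	/* outstanding references */
	bool powered;		/* whether the device is currently awake */
};

/* Plain get: take a reference and wake the device if needed. */
static void toy_rpm_get(struct toy_rpm *rpm)
{
	rpm->usage_count++;
	rpm->powered = true;
}

/* Only take a reference if somebody else already holds one. */
static bool toy_rpm_get_if_in_use(struct toy_rpm *rpm)
{
	if (rpm->usage_count == 0)
		return false;
	rpm->usage_count++;
	return true;
}

/* Never wakes the device: the caller must know it is already awake. */
static void toy_rpm_get_noresume(struct toy_rpm *rpm)
{
	assert(rpm->powered);
	rpm->usage_count++;
}

/* Drop a reference; the last one lets the device power down. */
static void toy_rpm_put(struct toy_rpm *rpm)
{
	assert(rpm->usage_count > 0);
	if (--rpm->usage_count == 0)
		rpm->powered = false;
}
```

Every successful get, whichever variant was used, is balanced by exactly one put.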

void intel_runtime_pm_enable(struct drm_i915_private * dev_priv)¶: enable runtime pm

Parameters

struct drm_i915_private * dev_priv: i915 device instance

Description

This function enables runtime pm at the end of the driver load sequence.

Note that this function does not currently enable runtime pm for the subordinate display power domains. That is only done on the first modeset using intel_display_set_init_power().

void intel_uncore_forcewake_get(struct drm_i915_private * dev_priv, enum forcewake_domains fw_domains)¶: grab forcewake domain references

Parameters

struct drm_i915_private * dev_priv: i915 device instance
enum forcewake_domains fw_domains: forcewake domains to get reference on

Description

This function can be used to get the GT’s forcewake domain references. Normal register access will handle the forcewake domains automatically. However, if some sequence requires the GT to not power down particular forcewake domains, this function should be called at the beginning of the sequence, and the reference should subsequently be dropped by a symmetric call to intel_uncore_forcewake_put(). Usually the caller wants all the domains to be kept awake, so fw_domains would then be FORCEWAKE_ALL.

void intel_uncore_forcewake_get__locked(struct drm_i915_private * dev_priv, enum forcewake_domains fw_domains)¶: grab forcewake domain references

Parameters

struct drm_i915_private * dev_priv: i915 device instance
enum forcewake_domains fw_domains: forcewake domains to get reference on

Description

See intel_uncore_forcewake_get(). This variant places the onus on the caller to explicitly handle the dev_priv->uncore.lock spinlock.

void intel_uncore_forcewake_put(struct drm_i915_private * dev_priv, enum forcewake_domains fw_domains)¶: release a forcewake domain reference

Parameters

struct drm_i915_private * dev_priv: i915 device instance
enum forcewake_domains fw_domains: forcewake domains to put references

Description

This function drops the device-level forcewakes for specified domains obtained by intel_uncore_forcewake_get().

void intel_uncore_forcewake_put__locked(struct drm_i915_private * dev_priv, enum forcewake_domains fw_domains)¶: release forcewake domain references

Parameters

struct drm_i915_private * dev_priv: i915 device instance
enum forcewake_domains fw_domains: forcewake domains to put references

Description

See intel_uncore_forcewake_put(). This variant places the onus on the caller to explicitly handle the dev_priv->uncore.lock spinlock.
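As a rough illustration of per-domain forcewake references selected by a bitmask, consider the following userspace model. The `TOY_FW_*` bits and the `toy_uncore` structure are hypothetical; the sketch assumes one counter per domain and ignores the spinlock and the actual hardware handshake.

```c
#include <assert.h>

/* Hypothetical domain bits, mirroring the idea of a FORCEWAKE_ALL mask. */
#define TOY_FW_RENDER	(1u << 0)
#define TOY_FW_BLITTER	(1u << 1)
#define TOY_FW_MEDIA	(1u << 2)
#define TOY_FW_ALL	(TOY_FW_RENDER | TOY_FW_BLITTER | TOY_FW_MEDIA)
#define TOY_FW_DOMAIN_COUNT 3

struct toy_uncore {
	int wake_count[TOY_FW_DOMAIN_COUNT];	/* references per domain */
	unsigned int awake;			/* domains currently held awake */
};

/* Take one reference on every domain selected in the mask. */
static void toy_forcewake_get(struct toy_uncore *u, unsigned int domains)
{
	for (int i = 0; i < TOY_FW_DOMAIN_COUNT; i++) {
		if (!(domains & (1u << i)))
			continue;
		if (u->wake_count[i]++ == 0)
			u->awake |= 1u << i;	/* first user wakes the domain */
	}
}

/* Drop one reference on every domain selected in the mask. */
static void toy_forcewake_put(struct toy_uncore *u, unsigned int domains)
{
	for (int i = 0; i < TOY_FW_DOMAIN_COUNT; i++) {
		if (!(domains & (1u << i)))
			continue;
		assert(u->wake_count[i] > 0);
		if (--u->wake_count[i] == 0)
			u->awake &= ~(1u << i);	/* last user lets it sleep */
	}
}
```

A sequence that must keep the GT awake would bracket its register accesses with a get and a matching put on the same mask.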

int gen6_reset_engines(struct drm_i915_private * dev_priv, unsigned engine_mask)¶: reset individual engines

Parameters

struct drm_i915_private * dev_priv: i915 device
unsigned engine_mask: mask of intel_ring_flag() engines or ALL_ENGINES for full reset

Description

This function will reset the individual engines that are set in engine_mask. If you provide ALL_ENGINES as the mask, a full global domain reset will be issued.

Note

It is the responsibility of the caller to handle the difference between asking for a full domain reset and asking for a reset of all available individual engines.

Returns 0 on success, nonzero on error.

int intel_wait_for_register_fw(struct drm_i915_private * dev_priv, i915_reg_t reg, const u32 mask, const u32 value, const unsigned long timeout_ms)¶: wait until register matches expected state

Parameters

struct drm_i915_private * dev_priv: the i915 device
i915_reg_t reg: the register to read
const u32 mask: mask to apply to register value
const u32 value: expected value
const unsigned long timeout_ms: timeout in milliseconds

Description

This routine waits until the target register reg contains the expected value after applying the mask, i.e. it waits until

(I915_READ_FW(reg) & mask) == value

Otherwise, the wait will time out after timeout_ms milliseconds.

Note that this routine assumes the caller holds forcewake asserted, so it is not suitable for very long waits. See intel_wait_for_register() if you wish to wait without holding forcewake for the duration (i.e. you expect the wait to be slow).

Returns 0 if the register matches the desired condition, or -ETIMEDOUT.
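The masked-compare wait can be sketched in userspace as a polling loop with a deadline. This is a minimal illustration, not the driver's implementation: the `toy_*` names, the callback-based register read, and the busy-polling strategy are all assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical register accessor standing in for a raw mmio read. */
typedef uint32_t (*toy_read_fn)(void *ctx);

static int64_t toy_now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Poll until (read & mask) == value, or give up after timeout_ms. */
static int toy_wait_for_register(toy_read_fn read_reg, void *ctx,
				 uint32_t mask, uint32_t value,
				 unsigned long timeout_ms)
{
	const int64_t deadline = toy_now_ms() + (int64_t)timeout_ms;

	assert(read_reg != NULL);

	for (;;) {
		/* Sample expiry first so one final read happens after it. */
		const int expired = toy_now_ms() > deadline;

		if ((read_reg(ctx) & mask) == value)
			return 0;
		if (expired)
			return -ETIMEDOUT;
	}
}

/* Toy register whose top bit becomes set after a number of polls. */
struct toy_reg {
	unsigned int polls_until_ready;
};

static uint32_t toy_reg_read(void *ctx)
{
	struct toy_reg *reg = ctx;

	if (reg->polls_until_ready == 0)
		return 0x80000001u;
	reg->polls_until_ready--;
	return 0x00000001u;
}
```

Checking the condition once more after the deadline has passed avoids reporting a timeout when the register changed just before the final check.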

int intel_wait_for_register(struct drm_i915_private * dev_priv, i915_reg_t reg, const u32 mask, const u32 value, const unsigned long timeout_ms)¶: wait until register matches expected state

Parameters

struct drm_i915_private * dev_priv: the i915 device
i915_reg_t reg: the register to read
const u32 mask: mask to apply to register value
const u32 value: expected value
const unsigned long timeout_ms: timeout in milliseconds

Description

This routine waits until the target register reg contains the expected value after applying the mask, i.e. it waits until

(I915_READ(reg) & mask) == value

Otherwise, the wait will time out after timeout_ms milliseconds.

Returns 0 if the register matches the desired condition, or -ETIMEDOUT.

enum forcewake_domains intel_uncore_forcewake_for_reg(struct drm_i915_private * dev_priv, i915_reg_t reg, unsigned int op)¶: which forcewake domains are needed to access a register

Parameters

struct drm_i915_private * dev_priv: pointer to struct drm_i915_private
i915_reg_t reg: register in question
unsigned int op: operation bitmask of FW_REG_READ and/or FW_REG_WRITE

Description

Returns the set of forcewake domains that must be taken, for example with intel_uncore_forcewake_get(), for the specified register to be accessible in the specified mode (read, write or read/write) with raw mmio accessors.

NOTE

On Gen6 and Gen7 the write forcewake domain (FORCEWAKE_RENDER) requires the callers to do FIFO management on their own or risk losing writes.

Interrupt Handling¶

These functions provide the basic support for enabling and disabling the interrupt handling support. There’s a lot more functionality in i915_irq.c and related files, but that will be described in separate chapters.

void intel_irq_init(struct drm_i915_private * dev_priv)¶: initializes irq support

Parameters

struct drm_i915_private * dev_priv: i915 device instance

Description

This function initializes all the irq support including work items, timers and all the vtables. It does not set up the interrupt itself though.

void intel_runtime_pm_disable_interrupts(struct drm_i915_private * dev_priv)¶: runtime interrupt disabling

Parameters

struct drm_i915_private * dev_priv: i915 device instance

Description

This function is used to disable interrupts at runtime, both in the runtime pm and the system suspend/resume code.

void intel_runtime_pm_enable_interrupts(struct drm_i915_private * dev_priv)¶: runtime interrupt enabling

Parameters

struct drm_i915_private * dev_priv: i915 device instance

Description

This function is used to enable interrupts at runtime, both in the runtime pm and the system suspend/resume code.

Intel GVT-g Guest Support(vGPU)¶

Intel GVT-g is a graphics virtualization technology which shares the GPU among multiple virtual machines on a time-sharing basis. Each virtual machine is presented with a virtual GPU (vGPU), which has features equivalent to the underlying physical GPU (pGPU), so the i915 driver can run seamlessly in a virtual machine. This file provides vGPU-specific optimizations for running in a virtual machine, to reduce the complexity of vGPU emulation and to improve overall performance.

A primary function introduced here is the so-called “address space ballooning” technique. Intel GVT-g partitions global graphics memory among multiple VMs, so each VM can directly access a portion of the memory without the hypervisor’s intervention, e.g. when filling textures or queuing commands. However, with the partitioning an unmodified i915 driver would assume a smaller graphics memory starting from address zero, which would require the vGPU emulation module to translate graphics addresses between the ‘guest view’ and the ‘host view’ for all registers and command opcodes which contain a graphics memory address. To reduce this complexity, Intel GVT-g introduces “address space ballooning”: each guest i915 driver is told the exact partitioning, and then reserves the portions not allocated to it so that nothing is allocated from them. Thus the vGPU emulation module only needs to scan and validate graphics addresses, without the complexity of address translation.

void i915_check_vgpu(struct drm_i915_private * dev_priv)¶: detect virtual GPU

Parameters

struct drm_i915_private * dev_priv: i915 device private

Description

This function is called at the initialization stage, to detect whether running on a vGPU.

void intel_vgt_deballoon(struct drm_i915_private * dev_priv)¶: deballoon reserved graphics address trunks

Parameters

struct drm_i915_private * dev_priv: i915 device private

Description

This function is called to deallocate the ballooned-out graphics memory when the driver is unloaded or when ballooning fails.

int intel_vgt_balloon(struct drm_i915_private * dev_priv)¶: balloon out reserved graphics address trunks

Parameters

struct drm_i915_private * dev_priv: i915 device private

Description

This function is called at the initialization stage to balloon out the graphics address space allocated to other vGPUs, by marking these spaces as reserved. The ballooning-related knowledge (starting address and size of the mappable/unmappable graphics memory) is described in the vgt_if structure in a reserved mmio range.

To give an example, the drawing below depicts one typical scenario after ballooning. Here vGPU1 has two pieces of graphics address space ballooned out for each of the mappable and the non-mappable parts. From the vGPU1 point of view, the total size is the same as the physical one, with the start address of its graphics space being zero. Yet some portions are ballooned out (the shaded parts, which are marked as reserved by the drm allocator). From the host point of view, the graphics address space is partitioned among multiple vGPUs in different VMs.

                  vGPU1 view         Host view
       0 ------> +-----------+     +-----------+
         ^       |###########|     |   vGPU3   |
         |       |###########|     +-----------+
         |       |###########|     |   vGPU2   |
         |       +-----------+     +-----------+
  mappable GM    | available | ==> |   vGPU1   |
         |       +-----------+     +-----------+
         |       |###########|     |           |
         v       |###########|     |   Host    |
         +=======+===========+     +===========+
         ^       |###########|     |   vGPU3   |
         |       |###########|     +-----------+
         |       |###########|     |   vGPU2   |
         |       +-----------+     +-----------+
unmappable GM    | available | ==> |   vGPU1   |
         |       +-----------+     +-----------+
         |       |###########|     |           |
         |       |###########|     |   Host    |
         v       |###########|     |           |
total GM size -> +-----------+     +-----------+

Return

zero on success, non-zero if configuration invalid or ballooning failed
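The arithmetic behind the drawing can be sketched as follows: in each half of the address space, everything outside the range owned by this vGPU becomes a reserved range. This is an illustrative model only. The `toy_*` names are hypothetical, ranges are half-open, and nothing here reads the vgt_if structure or talks to the drm allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_range { uint64_t start, end; };	/* half-open: [start, end) */

/*
 * Append the parts of region that lie outside owned to out[].
 * Returns the number of ranges appended (0, 1 or 2).
 */
static size_t toy_balloon_region(struct toy_range region,
				 struct toy_range owned,
				 struct toy_range *out)
{
	size_t n = 0;

	/* The owned range must lie inside the region it belongs to. */
	assert(region.start <= owned.start && owned.start <= owned.end &&
	       owned.end <= region.end);

	if (owned.start > region.start)
		out[n++] = (struct toy_range){ region.start, owned.start };
	if (owned.end < region.end)
		out[n++] = (struct toy_range){ owned.end, region.end };
	return n;
}

/*
 * Compute up to four reserved ranges: two around the owned mappable
 * range and two around the owned unmappable range.
 */
static size_t toy_balloon(uint64_t mappable_end, uint64_t total_size,
			  struct toy_range owned_mappable,
			  struct toy_range owned_unmappable,
			  struct toy_range out[4])
{
	size_t n;

	n = toy_balloon_region((struct toy_range){ 0, mappable_end },
			       owned_mappable, out);
	n += toy_balloon_region((struct toy_range){ mappable_end, total_size },
				owned_unmappable, out + n);
	return n;
}
```

For a vGPU whose slices sit in the middle of both halves, as vGPU1 does in the drawing, this yields four reserved ranges.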

Display Hardware Handling¶

This section covers everything related to the display hardware including the mode setting infrastructure, plane, sprite and cursor handling and display, output probing and related topics.

Mode Setting Infrastructure¶

The i915 driver is thus far the only DRM driver which doesn’t use the common DRM helper code to implement mode setting sequences. Thus it has its own tailor-made infrastructure for executing a display configuration change.

Frontbuffer Tracking¶

Many features require us to track changes to the currently active frontbuffer, especially rendering targeted at the frontbuffer.

To be able to do so GEM tracks frontbuffers using a bitmask for all possible frontbuffer slots through i915_gem_track_fb(). The functions in this file are then called when the contents of the frontbuffer are invalidated, when frontbuffer rendering has stopped again to flush out all the changes, and when the frontbuffer is exchanged with a flip. Subsystems interested in frontbuffer changes (e.g. PSR, FBC, DRRS) should directly put their callbacks into the relevant places and filter for the frontbuffer slots that they are interested in.

On a high level there are two types of powersaving features. The first type works like a special cache (FBC and PSR) and is interested in when it should stop caching and when it can restart caching. This is done by placing callbacks into the invalidate and the flush functions: at invalidate time the caching must be stopped, and at flush time it can be restarted. Such features may also need to know when the frontbuffer changes (e.g. when the hw doesn’t initiate an invalidate and flush on its own), which can be achieved by placing callbacks into the flip functions.

The other type of display power saving feature only cares about busyness (e.g. DRRS). In that case all three (invalidate, flush and flip) indicate busyness. There is no direct way to detect idleness. Instead, an idle timer delayed work should be started from the flush and flip functions and cancelled as soon as busyness is detected.

Note that there’s also an older frontbuffer activity tracking scheme which just tracks general activity. This is done by the various mark_busy and mark_idle functions. For display power management features using these functions is deprecated and should be avoided.
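For the cache-like kind of feature, the invalidate/flush callbacks described above could look roughly like this. The sketch is a hypothetical userspace model: `toy_fb_cache` and its functions are invented names, and it assumes caching may restart as soon as none of the watched slots is busy.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cache-like feature (think FBC or PSR) watching some slots. */
struct toy_fb_cache {
	unsigned int watched_bits;	/* frontbuffer slots this feature cares about */
	unsigned int busy_bits;		/* watched slots with rendering in flight */
	bool active;			/* caching currently enabled */
};

/* Invalidate callback: stop caching if a watched slot is being written. */
static void toy_cache_invalidate(struct toy_fb_cache *c,
				 unsigned int frontbuffer_bits)
{
	frontbuffer_bits &= c->watched_bits;	/* filter for our slots */
	if (!frontbuffer_bits)
		return;

	c->busy_bits |= frontbuffer_bits;
	c->active = false;
}

/* Flush callback: restart caching once no watched slot is busy. */
static void toy_cache_flush(struct toy_fb_cache *c,
			    unsigned int frontbuffer_bits)
{
	/* Invariant of this sketch: only watched slots are ever marked busy. */
	assert((c->busy_bits & ~c->watched_bits) == 0);

	frontbuffer_bits &= c->watched_bits;
	if (!frontbuffer_bits)
		return;

	c->busy_bits &= ~frontbuffer_bits;
	if (c->busy_bits == 0)
		c->active = true;
}
```

Filtering on `watched_bits` first is what lets several independent features hang off the same invalidate and flush entry points.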

void intel_fb_obj_invalidate(struct drm_i915_gem_object * obj, enum fb_op_origin origin)¶: invalidate frontbuffer object

Parameters

struct drm_i915_gem_object * obj: GEM object to invalidate
enum fb_op_origin origin: which operation caused the invalidation

Description

This function gets called every time rendering on the given object starts and frontbuffer caching (fbc, low refresh rate for DRRS, panel self refresh) must be invalidated. For ORIGIN_CS any subsequent invalidation will be delayed until the rendering completes or a flip on this frontbuffer plane is scheduled.

void intel_frontbuffer_flush(struct drm_device * dev, unsigned frontbuffer_bits, enum fb_op_origin origin)¶: flush frontbuffer

Parameters

struct drm_device * dev: DRM device
unsigned frontbuffer_bits: frontbuffer plane tracking bits
enum fb_op_origin origin: which operation caused the flush

Description

This function gets called every time rendering on the given planes has completed and frontbuffer caching can be started again. Flushes will get delayed if they’re blocked by some outstanding asynchronous rendering.

Can be called without any locks held.

void intel_fb_obj_flush(struct drm_i915_gem_object * obj, bool retire, enum fb_op_origin origin)¶: flush frontbuffer object

Parameters

struct drm_i915_gem_object * obj: GEM object to flush
bool retire: set when retiring asynchronous rendering
enum fb_op_origin origin: which operation caused the flush

Description

This function gets called every time rendering on the given object has completed and frontbuffer caching can be started again. If retire is true then any delayed flushes will be unblocked.

void intel_frontbuffer_flip_prepare(struct drm_device * dev, unsigned frontbuffer_bits)¶: prepare asynchronous frontbuffer flip

Parameters

struct drm_device * dev: DRM device
unsigned frontbuffer_bits: frontbuffer plane tracking bits

Description

This function gets called after scheduling a flip on the planes given by frontbuffer_bits. The actual frontbuffer flushing will be delayed until completion is signalled with intel_frontbuffer_flip_complete(). If an invalidate happens in between, this flush will be cancelled.

Can be called without any locks held.

void intel_frontbuffer_flip_complete(struct drm_device * dev, unsigned frontbuffer_bits)¶: complete asynchronous frontbuffer flip

Parameters

struct drm_device * dev: DRM device
unsigned frontbuffer_bits: frontbuffer plane tracking bits

Description

This function gets called after the flip has been latched and will complete on the next vblank. It will execute the flush if it hasn’t been cancelled yet.

Can be called without any locks held.

void intel_frontbuffer_flip(struct drm_device * dev, unsigned frontbuffer_bits)¶: synchronous frontbuffer flip

Parameters

struct drm_device * dev: DRM device
unsigned frontbuffer_bits: frontbuffer plane tracking bits

Description

This function gets called after scheduling a flip on the planes given by frontbuffer_bits. This is for synchronous plane updates which will happen on the next vblank and which will not get delayed by pending gpu rendering.

Can be called without any locks held.

void i915_gem_track_fb(struct drm_i915_gem_object * old, struct drm_i915_gem_object * new, unsigned frontbuffer_bits)¶: update frontbuffer tracking

Parameters

struct drm_i915_gem_object * old: current GEM buffer for the frontbuffer slots
struct drm_i915_gem_object * new: new GEM buffer for the frontbuffer slots
unsigned frontbuffer_bits: bitmask of frontbuffer slots

Description

This updates the frontbuffer tracking bits frontbuffer_bits by clearing them from old and setting them in new. Both old and new can be NULL.
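The bit bookkeeping itself is small. A minimal userspace sketch, using a hypothetical `toy_gem_object` and assuming that a slot should never be tracked by two objects at once:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a GEM object's frontbuffer bookkeeping. */
struct toy_gem_object {
	unsigned int frontbuffer_bits;	/* slots this object currently backs */
};

/* Move the given slots from old to new_obj; either pointer may be NULL. */
static void toy_track_fb(struct toy_gem_object *old,
			 struct toy_gem_object *new_obj,
			 unsigned int frontbuffer_bits)
{
	if (old)
		old->frontbuffer_bits &= ~frontbuffer_bits;

	if (new_obj) {
		/* Assumption of this sketch: the slots are not already set. */
		assert((new_obj->frontbuffer_bits & frontbuffer_bits) == 0);
		new_obj->frontbuffer_bits |= frontbuffer_bits;
	}
}
```

Passing NULL for the old object covers the first framebuffer on a plane, and NULL for the new one covers disabling the plane.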

Display FIFO Underrun Reporting¶

The i915 driver checks for display fifo underruns using the interrupt signals provided by the hardware. This is enabled by default and fairly useful to debug display issues, especially watermark settings.

If an underrun is detected this is logged into dmesg. To avoid flooding logs and occupying the cpu underrun interrupts are disabled after the first occurrence until the next modeset on a given pipe.

Note that underrun detection on gmch platforms is a bit uglier since there is no interrupt (even though the signalling bit is in the PIPESTAT pipe interrupt register). Also, on some other platforms underrun interrupts are shared, which means that if we detect an underrun we need to disable underrun reporting on all pipes.

The code also supports underrun detection on the PCH transcoder.
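The "report once, then stay quiet until the next modeset" policy can be modelled with a per-pipe flag. This is a hypothetical userspace sketch: the `toy_*` names and the fixed pipe count are assumptions, and the shared-interrupt case is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NUM_PIPES 3

struct toy_underrun_state {
	bool reporting_disabled[TOY_NUM_PIPES];
	unsigned int logged;	/* number of underruns written to the log */
};

/* Set the reporting state for a pipe and return the previous state. */
static bool toy_set_underrun_reporting(struct toy_underrun_state *s,
				       int pipe, bool enable)
{
	bool old;

	assert(pipe >= 0 && pipe < TOY_NUM_PIPES);

	old = !s->reporting_disabled[pipe];
	s->reporting_disabled[pipe] = !enable;
	return old;
}

/* Interrupt path: disable reporting, log only if it was enabled before. */
static void toy_underrun_irq(struct toy_underrun_state *s, int pipe)
{
	if (toy_set_underrun_reporting(s, pipe, false))
		s->logged++;
}
```

The modeset code would re-enable reporting for the pipe, after which the next underrun is logged again.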

bool intel_set_cpu_fifo_underrun_reporting(struct drm_i915_private * dev_priv, enum pipe pipe, bool enable)¶: set cpu fifo underrun reporting state

Parameters

struct drm_i915_private * dev_priv: i915 device instance
enum pipe pipe: (CPU) pipe to set state for
bool enable: whether underruns should be reported or not

Description

This function sets the fifo underrun state for pipe. It is used in the modeset code to avoid false positives since on many platforms underruns are expected when disabling or enabling the pipe.

Notice that on some platforms disabling underrun reports for one pipe disables for all due to shared interrupts. Actual reporting is still per-pipe though.

Returns the previous state of underrun reporting.

bool intel_set_pch_fifo_underrun_reporting(struct drm_i915_private * dev_priv, enum transcoder pch_transcoder, bool enable)¶: set PCH fifo underrun reporting state

Parameters

struct drm_i915_private * dev_priv: i915 device instance
enum transcoder pch_transcoder: the PCH transcoder (same as pipe on IVB and older)
bool enable: whether underruns should be reported or not

Description

This function makes us disable or enable PCH fifo underruns for a specific PCH transcoder. Notice that on some PCHs (e.g. CPT/PPT), disabling FIFO underrun reporting for one transcoder may also disable all the other PCH error interrupts for the other transcoders, due to the fact that there’s just one interrupt mask/enable bit for all the transcoders.

Returns the previous state of underrun reporting.

void intel_cpu_fifo_underrun_irq_handler(struct drm_i915_private * dev_priv, enum pipe pipe)¶: handle CPU fifo underrun interrupt

Parameters

struct drm_i915_private * dev_priv: i915 device instance
enum pipe pipe: (CPU) pipe to set state for

Description

This handles a CPU fifo underrun interrupt, generating an underrun warning into dmesg if underrun reporting is enabled and then disables the underrun interrupt to avoid an irq storm.

void intel_pch_fifo_underrun_irq_handler(struct drm_i915_private * dev_priv, enum transcoder pch_transcoder)¶: handle PCH fifo underrun interrupt

Parameters

struct drm_i915_private * dev_priv: i915 device instance
enum transcoder pch_transcoder: the PCH transcoder (same as pipe on IVB and older)

Description

This handles a PCH fifo underrun interrupt, generating an underrun warning into dmesg if underrun reporting is enabled and then disables the underrun interrupt to avoid an irq storm.

void intel_check_cpu_fifo_underruns(struct drm_i915_private * dev_priv)¶: check for CPU fifo underruns immediately

Parameters

struct drm_i915_private * dev_priv: i915 device instance

Description

Check for CPU fifo underruns immediately. Useful on IVB/HSW where the shared error interrupt may have been disabled, and so CPU fifo underruns won’t necessarily raise an interrupt, and on GMCH platforms where underruns never raise an interrupt.

void intel_check_pch_fifo_underruns(struct drm_i915_private * dev_priv)¶: check for PCH fifo underruns immediately

Parameters

struct drm_i915_private * dev_priv: i915 device instance

Description

Check for PCH fifo underruns immediately. Useful on CPT/PPT where the shared error interrupt may have been disabled, and so PCH fifo underruns won’t necessarily raise an interrupt.

Plane Configuration¶

This section covers plane configuration and composition with the primary plane, sprites, cursors and overlays. This includes the infrastructure to do atomic vsync’ed updates of all this state and also tightly coupled topics like watermark setup and computation, framebuffer compression and panel self refresh.

Atomic Plane Helpers¶

The functions here are used by the atomic plane helper functions to implement legacy plane updates (i.e., drm_plane->update_plane() and drm_plane->disable_plane()). This allows plane updates to use the atomic state infrastructure and perform plane updates as separate prepare/check/commit/cleanup steps.

struct intel_plane_state * intel_create_plane_state(struct drm_plane * plane)¶: create plane state object

Parameters

struct drm_plane * plane: drm plane

Description

Allocates a fresh plane state for the given plane and sets some of the state values to sensible initial values.

Return

A newly allocated plane state, or NULL on failure

struct drm_plane_state * intel_plane_duplicate_state(struct drm_plane * plane)¶: duplicate plane state

Parameters

struct drm_plane * plane: drm plane

Description

Allocates and returns a copy of the plane state (both common and Intel-specific) for the specified plane.

Return

The newly allocated plane state, or NULL on failure.

void intel_plane_destroy_state(struct drm_plane * plane, struct drm_plane_state * state)¶: destroy plane state

Parameters

struct drm_plane * plane: drm plane
struct drm_plane_state * state: state object to destroy

Description

Destroys the plane state (both common and Intel-specific) for the specified plane.

int intel_plane_atomic_get_property(struct drm_plane * plane, const struct drm_plane_state * state, struct drm_property * property, uint64_t * val)¶: fetch plane property value

Parameters

struct drm_plane * plane: plane to fetch property for
const struct drm_plane_state * state: state containing the property value
struct drm_property * property: property to look up
uint64_t * val: pointer to write property value into

Description

The DRM core does not store shadow copies of properties for atomic-capable drivers. This entrypoint is used to fetch the current value of a driver-specific plane property.

int intel_plane_atomic_set_property(struct drm_plane * plane, struct drm_plane_state * state, struct drm_property * property, uint64_t val)¶: set plane property value

Parameters

struct drm_plane * plane: plane to set property for
struct drm_plane_state * state: state to update property value in
struct drm_property * property: property to set
uint64_t val: value to set property to

Description

Writes the specified property value for a plane into the provided atomic state object.

Returns 0 on success, -EINVAL on unrecognized properties

Output Probing¶

This section covers output probing and related infrastructure like the hotplug interrupt storm detection and mitigation code. Note that the i915 driver still uses most of the common DRM helper code for output probing, so those sections fully apply.

Hotplug¶

Simply put, hotplug occurs when a display is connected to or disconnected from the system. However, there may be adapters and docking stations and Display Port short pulses and MST devices involved, complicating matters.

Hotplug in i915 is handled at many different levels of abstraction.

The platform dependent interrupt handling code in i915_irq.c enables, disables, and does preliminary handling of the interrupts. The interrupt handlers gather the hotplug detect (HPD) information from relevant registers into a platform independent mask of hotplug pins that have fired.

The platform independent interrupt handler intel_hpd_irq_handler() in intel_hotplug.c does hotplug irq storm detection and mitigation, and passes further processing to appropriate bottom halves (Display Port specific and regular hotplug).

The Display Port work function i915_digport_work_func() calls into intel_dp_hpd_pulse() via hooks, which handles DP short pulses and DP MST long pulses, with failures and non-MST long pulses triggering regular hotplug processing on the connector.

The regular hotplug work function i915_hotplug_work_func() calls connector detect hooks, and, if connector status changes, triggers sending of hotplug uevent to userspace via drm_kms_helper_hotplug_event().

Finally, the userspace is responsible for triggering a modeset upon receiving the hotplug uevent, disabling or enabling the crtc as needed.

The hotplug interrupt storm detection and mitigation code keeps track of the number of interrupts per hotplug pin over a period of time, and if the number of interrupts exceeds a certain threshold, the interrupt is disabled for a while before being re-enabled. The intention is to mitigate issues arising from broken hardware triggering massive amounts of interrupts and grinding the system to a halt.

The current implementation expects that a hotplug interrupt storm will not be seen when a Display Port sink is connected. Hence, on platforms whose DP callback is handled by i915_digport_work_func(), re-enabling of HPD is not performed (it was never expected to be disabled in the first place). This is specific to DP sinks handled by this routine; any other display such as HDMI or DVI enabled on the same port will have proper logic, since it will use i915_hotplug_work_func(), where this logic is handled.

bool intel_hpd_irq_storm_detect(struct drm_i915_private * dev_priv, enum hpd_pin pin)¶: gather stats and detect HPD irq storm on a pin

Parameters

struct drm_i915_private * dev_priv: private driver data pointer
enum hpd_pin pin: the pin to gather stats on

Description

Gather stats about HPD irqs from the specified pin, and detect irq storms. Only the pin specific stats and state are changed, the caller is responsible for further action.

HPD_STORM_THRESHOLD irqs are allowed within HPD_STORM_DETECT_PERIOD ms, otherwise it’s considered an irq storm, and the irq state is set to HPD_MARK_DISABLED.

Return true if an irq storm was detected on pin.
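The threshold-per-period rule can be modelled in a few lines. This is a simplified userspace sketch, not the driver's implementation: the structure, the function name and the two constant values are assumptions chosen for illustration, and time is passed in as plain milliseconds instead of jiffies.

```c
#include <stdbool.h>

/* Illustrative values only; the real constants live in the driver. */
#define STORM_DETECT_PERIOD_MS 1000
#define STORM_THRESHOLD        5

enum pin_state { PIN_ENABLED, PIN_MARK_DISABLED };

/* Hypothetical per-pin bookkeeping. */
struct pin_stats {
	unsigned long window_start_ms;
	unsigned int count;
	enum pin_state state;
};

/*
 * Record one irq on a pin at time now_ms. Returns true when more than
 * STORM_THRESHOLD irqs arrived inside one detection window. Only the
 * per-pin stats and state are updated; the caller decides what to do next.
 */
static bool storm_detect(struct pin_stats *s, unsigned long now_ms)
{
	if (now_ms - s->window_start_ms > STORM_DETECT_PERIOD_MS) {
		/* Window expired: start a new one with this irq. */
		s->window_start_ms = now_ms;
		s->count = 1;
		return false;
	}

	if (++s->count > STORM_THRESHOLD) {
		s->state = PIN_MARK_DISABLED;
		return true;
	}
	return false;
}
```

With these values, five irqs inside one second pass unnoticed and the sixth marks the pin as disabled.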

void intel_hpd_irq_handler(struct drm_i915_private * dev_priv, u32 pin_mask, u32 long_mask)¶: main hotplug irq handler

Parameters

struct drm_i915_private * dev_priv: drm_i915_private
u32 pin_mask: a mask of hpd pins that have triggered the irq
u32 long_mask: a mask of hpd pins that may be long hpd pulses

Description

This is the main hotplug irq handler for all platforms. The platform specific irq handlers call the platform specific hotplug irq handlers, which read and decode the appropriate registers into bitmasks about hpd pins that have triggered (pin_mask), and which of those pins may be long pulses (long_mask). The long_mask is ignored if the port corresponding to the pin is not a digital port.

Here, we do hotplug irq storm detection and mitigation, and pass further processing to appropriate bottom halves.
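How pin_mask and long_mask combine can be sketched as a pure function. This is a simplified model under stated assumptions: struct hpd_dispatch, hpd_dispatch_pins(), NUM_PINS and the digital_mask parameter are hypothetical, and storm detection is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_PINS 8 /* hypothetical pin count for this sketch */

/* Hypothetical result of sorting one irq's pins into bottom-half queues. */
struct hpd_dispatch {
	uint32_t dig_long;   /* digital port pins with a long pulse */
	uint32_t dig_short;  /* digital port pins with a short pulse */
	uint32_t regular;    /* pins for regular hotplug processing */
};

/*
 * Split pin_mask into work for the two bottom halves. digital_mask is an
 * assumed per-platform mask of pins that belong to digital ports;
 * long_mask is ignored for every other pin.
 */
static struct hpd_dispatch hpd_dispatch_pins(uint32_t pin_mask,
					     uint32_t long_mask,
					     uint32_t digital_mask)
{
	struct hpd_dispatch d = { 0, 0, 0 };
	unsigned int pin;

	for (pin = 0; pin < NUM_PINS; pin++) {
		uint32_t bit = UINT32_C(1) << pin;

		if (!(pin_mask & bit))
			continue;
		if (!(digital_mask & bit))
			d.regular |= bit;
		else if (long_mask & bit)
			d.dig_long |= bit;
		else
			d.dig_short |= bit;
	}
	return d;
}
```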

void intel_hpd_init(struct drm_i915_private * dev_priv)¶: initializes and enables hpd support

Parameters

struct drm_i915_private * dev_priv: i915 device instance

Description

This function enables the hotplug support. It requires that interrupts have already been enabled with intel_irq_init_hw(). From this point on hotplug and poll requests can run concurrently with other code, so locking rules must be obeyed.

This is a separate step from interrupt enabling to simplify the locking rules in the driver load and resume code.

Also see: intel_hpd_poll_init(), which enables connector polling

void intel_hpd_poll_init(struct drm_i915_private * dev_priv)¶: enables/disables polling for connectors with hpd

Parameters

struct drm_i915_private * dev_priv: i915 device instance

Description

This function enables polling for all connectors, regardless of whether or not they support hotplug detection. Under certain conditions HPD may not be functional. On most Intel GPUs, this happens when we enter runtime suspend. On Valleyview and Cherryview systems, this also happens when we shut off all of the powerwells.

Since this function can get called in contexts where we’re already holding dev->mode_config.mutex, we do the actual hotplug enabling in a separate worker.

Also see: intel_hpd_init(), which restores hpd handling.

High Definition Audio¶

The graphics and audio drivers together support High Definition Audio over HDMI and Display Port. The audio programming sequences are divided into audio codec and controller enable and disable sequences. The graphics driver handles the audio codec sequences, while the audio driver handles the audio controller sequences.

The disable sequences must be performed before disabling the transcoder or port. The enable sequences may only be performed after enabling the transcoder and port, and after completed link training. Therefore the audio enable/disable sequences are part of the modeset sequence.

The codec and controller sequences could be done either in parallel or serially, but generally the ELDV/PD change in the codec sequence indicates to the audio driver that the controller sequence should start. Indeed, most of the co-operation between the graphics and audio drivers is handled via audio related registers. (The notable exception is the power management, not covered here.)

The struct i915_audio_component is used for interaction between the graphics and audio drivers. The struct i915_audio_component_ops *ops in it is defined in the graphics driver and called from the audio driver. The struct i915_audio_component_audio_ops *audio_ops is called from the i915 driver.

void intel_audio_codec_enable(struct intel_encoder * intel_encoder)¶: Enable the audio codec for HD audio

Parameters

struct intel_encoder * intel_encoder: encoder on which to enable audio

Description

The enable sequences may only be performed after enabling the transcoder and port, and after completed link training.

void intel_audio_codec_disable(struct intel_encoder * intel_encoder)¶: Disable the audio codec for HD audio

Parameters

struct intel_encoder * intel_encoder: encoder on which to disable audio

Description

The disable sequences must be performed before disabling the transcoder or port.

void intel_init_audio_hooks(struct drm_i915_private * dev_priv)¶: Set up chip specific audio hooks

Parameters

struct drm_i915_private * dev_priv: device private

void i915_audio_component_init(struct drm_i915_private * dev_priv)¶: initialize and register the audio component

Parameters

struct drm_i915_private * dev_priv: i915 device instance

Description

This will register with the component framework a child component which will bind dynamically to the snd_hda_intel driver’s corresponding master component when the latter is registered. During binding the child initializes an instance of struct i915_audio_component which it receives from the master. The master can then start to use the interface defined by this struct. Each side can break the binding at any point by deregistering its own component after which each side’s component unbind callback is called.

We ignore any error during registration and continue with reduced functionality (i.e. without HDMI audio).

void i915_audio_component_cleanup(struct drm_i915_private * dev_priv)¶: deregister the audio component

Parameters

struct drm_i915_private * dev_priv: i915 device instance

Description

Deregisters the audio component, breaking any existing binding to the corresponding snd_hda_intel driver’s master component.

struct i915_audio_component_ops¶: Ops implemented by i915 driver, called by hda driver

Definition

struct i915_audio_component_ops {
  struct module * owner;
  void (* get_power) (struct device *);
  void (* put_power) (struct device *);
  void (* codec_wake_override) (struct device *, bool enable);
  int (* get_cdclk_freq) (struct device *);
  int (* sync_audio_rate) (struct device *, int port, int rate);
  int (* get_eld) (struct device *, int port, bool *enabled, unsigned char *buf, int max_bytes);
};

Members

struct module * owner

i915 module

void (*)(struct device *) get_power

get the POWER_DOMAIN_AUDIO power well

Request the power well to be turned on.

void (*)(struct device *) put_power

put the POWER_DOMAIN_AUDIO power well

Allow the power well to be turned off.

void (*)(struct device *, bool enable) codec_wake_override

Enable/disable codec wake signal

int (*)(struct device *) get_cdclk_freq

Get the Core Display Clock in kHz

int (*)(struct device *, int port, int rate) sync_audio_rate

set n/cts based on the sample rate

Called from audio driver. After audio driver sets the sample rate, it will call this function to set n/cts

int (*)(struct device *, int port, bool *enabled, unsigned char *buf, int max_bytes) get_eld

fill the audio state and ELD bytes for the given port

Called from audio driver to get the HDMI/DP audio state of the given digital port, and also fetch ELD bytes to the given pointer.

It returns the byte size of the original ELD (not the actually copied size), zero for an invalid ELD, or a negative error code.

Note that the returned size may be larger than max_bytes, which implies that only a part of the ELD has been copied to the buffer.
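The return convention of get_eld means the caller has to compare the result with its own buffer size. The following sketch illustrates that convention in isolation; copy_eld() and eld_is_complete() are hypothetical helpers, not part of the component interface.

```c
#include <string.h>

/*
 * Hypothetical get_eld-style helper: copies at most max_bytes of the ELD
 * into buf but always returns the full ELD size, so the caller can detect
 * truncation.
 */
static int copy_eld(const unsigned char *eld, int eld_size,
		    unsigned char *buf, int max_bytes)
{
	int n = eld_size < max_bytes ? eld_size : max_bytes;

	if (n > 0)
		memcpy(buf, eld, (size_t)n);
	return eld_size;
}

/* Caller side: returns 1 if the ELD in buf is valid and complete. */
static int eld_is_complete(int ret, int max_bytes)
{
	return ret > 0 && ret <= max_bytes;
}
```

A caller that sees a return value above max_bytes can retry with a larger buffer or work with the partial data it has.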

struct i915_audio_component_audio_ops¶: Ops implemented by hda driver, called by i915 driver

Definition

struct i915_audio_component_audio_ops {
  void * audio_ptr;
  void (* pin_eld_notify) (void *audio_ptr, int port);
};

Members

void * audio_ptr

Pointer to be used in call to pin_eld_notify

void (*)(void *audio_ptr, int port) pin_eld_notify

Notify the HDA driver that pin sense and/or ELD information has changed

Called when the i915 driver has set up the audio pipeline or has just begun to tear it down. This allows the HDA driver to update its status accordingly (even when the HDA controller is in power save mode).

struct i915_audio_component¶: Used for direct communication between i915 and hda drivers

Definition

struct i915_audio_component {
  struct device * dev;
  int aud_sample_rate[MAX_PORTS];
  const struct i915_audio_component_ops * ops;
  const struct i915_audio_component_audio_ops * audio_ops;
};

Members

struct device * dev: i915 device, used as parameter for ops
int aud_sample_rate[MAX_PORTS]: the array of audio sample rate per port
const struct i915_audio_component_ops * ops: Ops implemented by i915 driver, called by hda driver
const struct i915_audio_component_audio_ops * audio_ops: Ops implemented by hda driver, called by i915 driver

Panel Self Refresh PSR (PSR/SRD)¶

Since Haswell, the display controller supports Panel Self-Refresh on display panels which have a remote frame buffer (RFB) implemented according to the PSR spec in eDP 1.3. The PSR feature allows the display to go to lower standby states when the system is idle but the display is on, as it eliminates display refresh requests to DDR memory completely as long as the frame buffer for that display is unchanged.

Panel Self Refresh must be supported by both Hardware (source) and Panel (sink).

PSR saves power by caching the framebuffer in the panel RFB, which allows us to power down the link and memory controller. For DSI panels the same idea is called “manual mode”.

The implementation uses the hardware-based PSR support which automatically enters/exits self-refresh mode. The hardware takes care of sending the required DP aux message and could even retrain the link (that part isn’t enabled yet though). The hardware also keeps track of any frontbuffer changes to know when to exit self-refresh mode again. Unfortunately that part doesn’t work too well, which is why the i915 PSR support uses the software frontbuffer tracking to make sure it doesn’t miss a screen update. For this integration intel_psr_invalidate() and intel_psr_flush() get called by the frontbuffer tracking code. Note that because of locking issues the self-refresh re-enable code is done from a work queue, which must be correctly synchronized/cancelled when shutting down the pipe.

void intel_psr_enable(struct intel_dp * intel_dp)¶: Enable PSR

Parameters

struct intel_dp * intel_dp: Intel DP

Description

This function can only be called after the pipe is fully trained and enabled.

void intel_psr_disable(struct intel_dp * intel_dp)¶: Disable PSR

Parameters

struct intel_dp * intel_dp: Intel DP

Description

This function needs to be called before disabling pipe.

void intel_psr_single_frame_update(struct drm_device * dev, unsigned frontbuffer_bits)¶: Single Frame Update

Parameters

struct drm_device * dev: DRM device
unsigned frontbuffer_bits: frontbuffer plane tracking bits

Description

Some platforms support a single frame update feature that is used to send and update only one frame on Remote Frame Buffer. So far it is only implemented for Valleyview and Cherryview because hardware requires this to be done before a page flip.

void intel_psr_invalidate(struct drm_device * dev, unsigned frontbuffer_bits)¶: Invalidate PSR

Parameters

struct drm_device * dev: DRM device
unsigned frontbuffer_bits: frontbuffer plane tracking bits

Description

Since the hardware frontbuffer tracking has gaps we need to integrate with the software frontbuffer tracking. This function gets called every time frontbuffer rendering starts and a buffer gets dirtied. PSR must be disabled if the frontbuffer mask contains a buffer relevant to PSR.

Dirty frontbuffers relevant to PSR are tracked in busy_frontbuffer_bits.

void intel_psr_flush(struct drm_device * dev, unsigned frontbuffer_bits, enum fb_op_origin origin)¶: Flush PSR

Parameters

struct drm_device * dev: DRM device
unsigned frontbuffer_bits: frontbuffer plane tracking bits
enum fb_op_origin origin: which operation caused the flush

Description

Since the hardware frontbuffer tracking has gaps we need to integrate with the software frontbuffer tracking. This function gets called every time frontbuffer rendering has completed and flushed out to memory. PSR can be enabled again if no other frontbuffer relevant to PSR is dirty.

Dirty frontbuffers relevant to PSR are tracked in busy_frontbuffer_bits.
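The interplay of invalidate, flush and busy_frontbuffer_bits can be modelled as follows. This is a sketch under stated assumptions: struct psr_model and both helpers are hypothetical, and PSR is re-enabled immediately in the flush helper, whereas the driver defers that step to a work queue.

```c
#include <stdbool.h>

/* Hypothetical PSR bookkeeping for one pipe. */
struct psr_model {
	unsigned int psr_bits;   /* frontbuffer bits relevant to PSR */
	unsigned int busy_bits;  /* dirty frontbuffers relevant to PSR */
	bool active;             /* is self-refresh currently allowed? */
};

/* Frontbuffer rendering starts: remember dirty bits and leave PSR. */
static void psr_model_invalidate(struct psr_model *p, unsigned int fb_bits)
{
	fb_bits &= p->psr_bits;
	p->busy_bits |= fb_bits;
	if (fb_bits)
		p->active = false;
}

/* Rendering was flushed: clear bits, re-enable PSR once nothing is dirty. */
static void psr_model_flush(struct psr_model *p, unsigned int fb_bits)
{
	fb_bits &= p->psr_bits;
	p->busy_bits &= ~fb_bits;
	if (!p->busy_bits)
		p->active = true;
}
```

Masking with psr_bits first keeps frontbuffers on unrelated pipes from disturbing the PSR state.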

void intel_psr_init(struct drm_device * dev)¶: Init basic PSR work and mutex.

Parameters

struct drm_device * dev: DRM device

Description

This function is called only once at driver load to initialize basic PSR stuff.

Frame Buffer Compression (FBC)¶

FBC tries to save memory bandwidth (and so power consumption) by compressing the amount of memory used by the display. It is totally transparent to user space and completely handled in the kernel.

The benefits of FBC are mostly visible with solid backgrounds and variation-less patterns. It comes from keeping the memory footprint small and having fewer memory pages opened and accessed for refreshing the display.

i915 is responsible for reserving stolen memory for FBC and configuring its offset in the proper registers. The hardware takes care of all compression and decompression. However, there are many known cases where we have to forcibly disable it to allow proper screen updates.

bool intel_fbc_is_active(struct drm_i915_private * dev_priv)¶: Is FBC active?

Parameters

struct drm_i915_private * dev_priv: i915 device instance

Description

This function is used to verify the current state of FBC.

FIXME: This should be tracked in the plane config eventually instead of queried at runtime for most callers.

void intel_fbc_choose_crtc(struct drm_i915_private * dev_priv, struct drm_atomic_state * state)¶: select a CRTC to enable FBC on

Parameters

struct drm_i915_private * dev_priv: i915 device instance
struct drm_atomic_state * state: the atomic state structure

Description

This function looks at the proposed state for CRTCs and planes, then chooses which pipe is going to have FBC by setting intel_crtc_state->enable_fbc to true.

Later, intel_fbc_enable is going to look for state->enable_fbc and then maybe enable FBC for the chosen CRTC. If it does, it will set dev_priv->fbc.crtc.

void intel_fbc_enable(struct intel_crtc * crtc, struct intel_crtc_state * crtc_state, struct intel_plane_state * plane_state)¶

Parameters

struct intel_crtc * crtc: the CRTC
struct intel_crtc_state * crtc_state: undescribed
struct intel_plane_state * plane_state: undescribed

Description

This function checks if the given CRTC was chosen for FBC, then enables it if possible. Notice that it doesn’t activate FBC. It is valid to call intel_fbc_enable multiple times for the same pipe without an intel_fbc_disable in the middle, as long as it is deactivated.

void __intel_fbc_disable(struct drm_i915_private * dev_priv)¶: disable FBC

Parameters

struct drm_i915_private * dev_priv: i915 device instance

Description

This is the low level function that actually disables FBC. Callers should grab the FBC lock.

void intel_fbc_disable(struct intel_crtc * crtc)¶: disable FBC if it’s associated with crtc

Parameters

struct intel_crtc * crtc: the CRTC

Description

This function disables FBC if it’s associated with the provided CRTC.

void intel_fbc_global_disable(struct drm_i915_private * dev_priv)¶: globally disable FBC

Parameters

struct drm_i915_private * dev_priv: i915 device instance

Description

This function disables FBC regardless of which CRTC is associated with it.

void intel_fbc_init_pipe_state(struct drm_i915_private * dev_priv)¶: initialize FBC’s CRTC visibility tracking

Parameters

struct drm_i915_private * dev_priv: i915 device instance

Description

The FBC code needs to track CRTC visibility since the older platforms can’t have FBC enabled while multiple pipes are used. This function does the initial setup at driver load to make sure FBC is matching the real hardware.

void intel_fbc_init(struct drm_i915_private * dev_priv)¶: Initialize FBC

Parameters

struct drm_i915_private * dev_priv: the i915 device

Description

This function might be called during PM init process.

Display Refresh Rate Switching (DRRS)¶

Display Refresh Rate Switching (DRRS) is a power conservation feature which enables switching between low and high refresh rates, dynamically, based on the usage scenario. This feature is applicable for internal panels.

Indication that the panel supports DRRS is given by the panel EDID, which would list multiple refresh rates for one resolution.

There are two types of DRRS: static and seamless. Static DRRS involves changing the refresh rate (RR) by doing a full modeset (which may appear as a blink on screen) and is used in dock-undock scenarios. Seamless DRRS involves changing the RR without any visual effect to the user and can be used during normal system usage. This is done by programming certain registers.

Support for static/seamless DRRS may be indicated in the VBT based on inputs from the panel spec.

DRRS saves power by switching to low RR based on usage scenarios.

The implementation is based on frontbuffer tracking implementation. When there is a disturbance on the screen triggered by user activity or a periodic system activity, DRRS is disabled (RR is changed to high RR). When there is no movement on screen, after a timeout of 1 second, a switch to low RR is made.

For integration with frontbuffer tracking code, intel_edp_drrs_invalidate() and intel_edp_drrs_flush() are called.
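The idleness scheme can be sketched as a small state machine. This is an illustrative model only: struct drrs_model and the three helpers are hypothetical, and a polled tick function stands in for whatever delayed mechanism performs the downclock in the driver.

```c
#include <stdbool.h>

#define DRRS_IDLE_TIMEOUT_MS 1000 /* the 1 second timeout described above */

enum rr_type { RR_HIGH, RR_LOW };

/* Hypothetical DRRS bookkeeping. */
struct drrs_model {
	unsigned int busy_bits;      /* dirty frontbuffers */
	unsigned long idle_since_ms; /* when the screen last became idle */
	enum rr_type rr;
};

/* Rendering starts: upclock and remember which frontbuffers are dirty. */
static void drrs_model_invalidate(struct drrs_model *d, unsigned int fb_bits)
{
	d->busy_bits |= fb_bits;
	d->rr = RR_HIGH;
}

/* Rendering done: upclock, restart idleness detection if nothing is dirty. */
static void drrs_model_flush(struct drrs_model *d, unsigned int fb_bits,
			     unsigned long now_ms)
{
	d->busy_bits &= ~fb_bits;
	d->rr = RR_HIGH;
	if (!d->busy_bits)
		d->idle_since_ms = now_ms;
}

/* Periodic check standing in for the delayed work: downclock when idle. */
static void drrs_model_tick(struct drrs_model *d, unsigned long now_ms)
{
	if (!d->busy_bits && now_ms - d->idle_since_ms >= DRRS_IDLE_TIMEOUT_MS)
		d->rr = RR_LOW;
}
```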

DRRS can be further extended to support other internal panels and also the scenario of video playback wherein RR is set based on the rate requested by userspace.

void intel_dp_set_drrs_state(struct drm_device * dev, int refresh_rate)¶: program registers for RR switch to take effect

Parameters

struct drm_device * dev: DRM device
int refresh_rate: RR to be programmed

Description

This function gets called when refresh rate (RR) has to be changed from one frequency to another. Switches can be between high and low RR supported by the panel or to any other RR based on media playback (in this case, RR value needs to be passed from user space).

The caller of this function needs to take a lock on dev_priv->drrs.

void intel_edp_drrs_enable(struct intel_dp * intel_dp)¶: init drrs struct if supported

Parameters

struct intel_dp * intel_dp: DP struct

Description

Initializes frontbuffer_bits and drrs.dp

void intel_edp_drrs_disable(struct intel_dp * intel_dp)¶: Disable DRRS

Parameters

struct intel_dp * intel_dp: DP struct

void intel_edp_drrs_invalidate(struct drm_device * dev, unsigned frontbuffer_bits)¶: Disable Idleness DRRS

Parameters

struct drm_device * dev: DRM device
unsigned frontbuffer_bits: frontbuffer plane tracking bits

Description

This function gets called every time rendering on the given planes starts. Hence DRRS needs to be upclocked, i.e. (LOW_RR -> HIGH_RR).

Dirty frontbuffers relevant to DRRS are tracked in busy_frontbuffer_bits.

void intel_edp_drrs_flush(struct drm_device * dev, unsigned frontbuffer_bits)¶: Restart Idleness DRRS

Parameters

struct drm_device * dev: DRM device
unsigned frontbuffer_bits: frontbuffer plane tracking bits

Description

This function gets called every time rendering on the given planes has completed or a flip on a crtc is completed. So DRRS should be upclocked (LOW_RR -> HIGH_RR). Idleness detection should also be started again if no other planes are dirty.

Dirty frontbuffers relevant to DRRS are tracked in busy_frontbuffer_bits.

struct drm_display_mode * intel_dp_drrs_init(struct intel_connector * intel_connector, struct drm_display_mode * fixed_mode)¶: Init basic DRRS work and mutex.

Parameters

struct intel_connector * intel_connector: eDP connector
struct drm_display_mode * fixed_mode: preferred mode of panel

Description

This function is called only once at driver load to initialize basic DRRS stuff.

Return

Downclock mode if panel supports it, else return NULL. DRRS support is determined by the presence of downclock mode (apart from VBT setting).

DPIO¶

VLV, CHV and BXT have slightly peculiar display PHYs for driving DP/HDMI ports. DPIO is the name given to such a display PHY. These PHYs don’t follow the standard programming model using direct MMIO registers, and instead their registers must be accessed through IOSF sideband. VLV has one such PHY for driving ports B and C, and CHV adds another PHY for driving port D. Each PHY responds to a specific IOSF-SB port.

Each display PHY is made up of one or two channels. Each channel houses a common lane part which contains the PLL and other common logic. CH0 common lane also contains the IOSF-SB logic for the Common Register Interface (CRI) ie. the DPIO registers. CRI clock must be running when any DPIO registers are accessed.

In addition to having their own registers, the PHYs are also controlled through some dedicated signals from the display controller. These include PLL reference clock enable, PLL enable, and CRI clock selection, for example.

Each channel also has two splines (also called data lanes), and each spline is made up of one Physical Access Coding Sub-Layer (PCS) block and two TX lanes. So each channel has two PCS blocks and four TX lanes. The TX lanes are used as DP lanes or TMDS data/clock pairs depending on the output type.

Additionally the PHY also contains an AUX lane with AUX blocks for each channel. This is used for DP AUX communication, but this fact isn’t really relevant for the driver since AUX is controlled from the display controller side. No DPIO registers need to be accessed during AUX communication.

Generally on VLV/CHV the common lane corresponds to the pipe and the spline (PCS/TX) corresponds to the port.

For dual channel PHY (VLV/CHV):

pipe A == CMN/PLL/REF CH0

pipe B == CMN/PLL/REF CH1

port B == PCS/TX CH0

port C == PCS/TX CH1

This is especially important when we cross the streams ie. drive port B with pipe B, or port C with pipe A.

For single channel PHY (CHV):

pipe C == CMN/PLL/REF CH0

port D == PCS/TX CH0
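The VLV/CHV tables above can be encoded as small lookup helpers. The enums and function names below are hypothetical and exist only for this sketch; they are not the driver's definitions, and BXT is not covered.

```c
/* Hypothetical enums for this sketch; not the driver's definitions. */
enum port { PORT_B, PORT_C, PORT_D };
enum pipe { PIPE_A, PIPE_B, PIPE_C };
enum dpio_phy { PHY_DUAL, PHY_SINGLE };
enum dpio_channel { CH0, CH1 };

/* Port (PCS/TX, i.e. spline) side: ports B and C share the dual PHY. */
static enum dpio_phy port_to_phy(enum port port)
{
	return port == PORT_D ? PHY_SINGLE : PHY_DUAL;
}

static enum dpio_channel port_to_channel(enum port port)
{
	return port == PORT_C ? CH1 : CH0;
}

/* Pipe (CMN/PLL/REF, i.e. common lane) side. */
static enum dpio_phy pipe_to_phy(enum pipe pipe)
{
	return pipe == PIPE_C ? PHY_SINGLE : PHY_DUAL;
}

static enum dpio_channel pipe_to_channel(enum pipe pipe)
{
	return pipe == PIPE_B ? CH1 : CH0;
}

/* "Crossing the streams": PLL and PCS/TX sit in different channels. */
static int streams_crossed(enum pipe pipe, enum port port)
{
	return pipe_to_phy(pipe) == port_to_phy(port) &&
	       pipe_to_channel(pipe) != port_to_channel(port);
}
```

Driving port B with pipe B, for example, uses the PLL in CH1 together with the PCS/TX blocks in CH0, which streams_crossed() reports as a crossed configuration.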

On BXT the entire PHY channel corresponds to the port. That means the PLL is also now associated with the port rather than the pipe, and so the clock needs to be routed to the appropriate transcoder. Port A PLL is directly connected to transcoder EDP and port B/C PLLs can be routed to any transcoder A/B/C.

Note: DDI0 is digital port B, DDI1 is digital port C, and DDI2 is digital port D (CHV) or port A (BXT).

Dual channel PHY (VLV/CHV/BXT)
---------------------------------
|      CH0      |      CH1      |
|  CMN/PLL/REF  |  CMN/PLL/REF  |
|---------------|---------------| Display PHY
| PCS01 | PCS23 | PCS01 | PCS23 |
|-------|-------|-------|-------|
|TX0|TX1|TX2|TX3|TX0|TX1|TX2|TX3|
---------------------------------
|     DDI0      |     DDI1      | DP/HDMI ports
---------------------------------

Single channel PHY (CHV/BXT)
-----------------
|      CH0      |
|  CMN/PLL/REF  |
|---------------| Display PHY
| PCS01 | PCS23 |
|-------|-------|
|TX0|TX1|TX2|TX3|
-----------------
|     DDI2      | DP/HDMI port
-----------------

CSR firmware support for DMC¶

Display Context Save and Restore (CSR) firmware support was added from gen9 onwards to drive the newly added DMC (Display Microcontroller) in the display engine, which saves and restores the state of the display engine when it enters a low-power state and comes back to normal.

Firmware loading status will be one of the below states: FW_UNINITIALIZED, FW_LOADED, FW_FAILED.

Once the firmware is written into the registers, the status moves from FW_UNINITIALIZED to FW_LOADED; on any error condition, the status moves to FW_FAILED.

void intel_csr_load_program(struct drm_i915_private * dev_priv)¶: write the firmware from memory to register.

Parameters

struct drm_i915_private * dev_priv: i915 drm device.

Description

CSR firmware is read from a .bin file once and kept in internal memory. Every time the display comes back from a low-power state, this function is called to copy the firmware from internal memory to the registers.

void intel_csr_ucode_init(struct drm_i915_private * dev_priv)¶: initialize the firmware loading.

Parameters

struct drm_i915_private * dev_priv: i915 drm device.

Description

This function is called at the time of loading the display driver to read the firmware from a .bin file and copy it into internal memory.

void intel_csr_ucode_suspend(struct drm_i915_private * dev_priv)¶: prepare CSR firmware before system suspend

Parameters

struct drm_i915_private * dev_priv: i915 drm device

Description

Prepare the DMC firmware before entering system suspend. This includes flushing pending work items and releasing any resources acquired during init.

void intel_csr_ucode_resume(struct drm_i915_private * dev_priv)¶: init CSR firmware during system resume

Parameters

struct drm_i915_private * dev_priv: i915 drm device

Description

Reinitialize the DMC firmware during system resume, reacquiring any resources released in intel_csr_ucode_suspend().

void intel_csr_ucode_fini(struct drm_i915_private * dev_priv)¶: unload the CSR firmware.

Parameters

struct drm_i915_private * dev_priv: i915 drm device.

Description

Firmware unloading includes freeing the internal memory and resetting the firmware loading status.

Video BIOS Table (VBT)¶

The Video BIOS Table, or VBT, provides platform and board specific configuration information to the driver that is not discoverable or available through other means. The configuration is mostly related to display hardware. The VBT is available via the ACPI OpRegion or, on older systems, in the PCI ROM.

The VBT consists of a VBT Header (defined as struct vbt_header), a BDB Header (struct bdb_header), and a number of BIOS Data Blocks (BDB) that contain the actual configuration information. The VBT Header, and thus the VBT, begins with the “$VBT” signature. The VBT Header contains the offset of the BDB Header. The data blocks are concatenated after the BDB Header. The data blocks have a 1-byte Block ID, 2-byte Block Size, and Block Size bytes of data. (Block 53, the MIPI Sequence Block, is an exception.)
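The layout described above is enough to walk the data blocks of a VBT image. The following is a minimal userspace sketch, not the driver's parser: vbt_find_block() and the helper names are hypothetical, the byte offsets assume the header structures are packed with little-endian fields, and the MIPI Sequence Block exception is not handled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offsets derived from the struct layouts, assuming packed fields. */
#define VBT_HDR_SIZE        48 /* 20 + 2 + 2 + 2 + 1 + 1 + 4 + 4 * 4 */
#define VBT_BDB_OFFSET_OFF  28 /* offset of bdb_offset in the VBT Header */
#define BDB_HDR_SIZE        22 /* 16 + 2 + 2 + 2 */
#define BDB_HDR_SIZE_OFF    18 /* offset of header_size in the BDB Header */
#define BDB_SIZE_OFF        20 /* offset of bdb_size in the BDB Header */

/* Read little-endian fields byte by byte to avoid alignment issues. */
static uint16_t rd16(const uint8_t *p)
{
	return (uint16_t)(p[0] | p[1] << 8);
}

static uint32_t rd32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Return a pointer to the data of the block with the given ID and store
 * its size in *size, or return NULL if the block is missing or the VBT is
 * malformed.
 */
static const uint8_t *vbt_find_block(const uint8_t *vbt, size_t len,
				     uint8_t id, uint16_t *size)
{
	size_t bdb, pos, end;

	if (len < VBT_HDR_SIZE || memcmp(vbt, "$VBT", 4) != 0)
		return NULL;

	bdb = rd32(vbt + VBT_BDB_OFFSET_OFF);
	if (bdb > len || len - bdb < BDB_HDR_SIZE)
		return NULL;

	/* Blocks start right after the BDB Header and end at bdb_size. */
	pos = bdb + rd16(vbt + bdb + BDB_HDR_SIZE_OFF);
	end = bdb + rd16(vbt + bdb + BDB_SIZE_OFF);
	if (end > len)
		return NULL;

	while (pos + 3 <= end) {
		uint16_t blk_size = rd16(vbt + pos + 1);

		if (blk_size > end - (pos + 3))
			return NULL;
		if (vbt[pos] == id) {
			*size = blk_size;
			return vbt + pos + 3;
		}
		pos += 3 + (size_t)blk_size;
	}
	return NULL;
}
```

Every offset is checked against the buffer length before it is dereferenced, since the VBT comes from firmware and should not be trusted to be well formed.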

The driver parses the VBT during load. The relevant information is stored in driver private data for ease of use, and the actual VBT is not read after that.

bool intel_bios_is_valid_vbt(const void * buf, size_t size)¶: does the given buffer contain a valid VBT

Parameters

const void * buf: pointer to a buffer to validate
size_t size: size of the buffer

Description

Returns true on valid VBT.

int intel_bios_init(struct drm_i915_private * dev_priv)¶: find VBT and initialize settings from the BIOS

Parameters

struct drm_i915_private * dev_priv: i915 device instance

Description

Loads the Video BIOS and checks that the VBT exists. Sets scratch registers to appropriate values.

Returns 0 on success, nonzero on failure.

bool intel_bios_is_tv_present(struct drm_i915_private * dev_priv)¶: is integrated TV present in VBT

Parameters

struct drm_i915_private * dev_priv: i915 device instance

Description

Return true if TV is present. If no child devices were parsed from VBT, assume TV is present.

bool intel_bios_is_lvds_present(struct drm_i915_private * dev_priv, u8 * i2c_pin)¶: is LVDS present in VBT

Parameters

struct drm_i915_private * dev_priv: i915 device instance
u8 * i2c_pin: i2c pin for LVDS if present

Description

Return true if LVDS is present. If no child devices were parsed from VBT, assume LVDS is present.

bool intel_bios_is_port_present(struct drm_i915_private * dev_priv, enum port port)¶: is the specified digital port present

Parameters

struct drm_i915_private * dev_priv: i915 device instance
enum port port: port to check

Description

Return true if the device in port is present.

bool intel_bios_is_port_edp(struct drm_i915_private * dev_priv, enum port port)¶: is the device in given port eDP

Parameters

struct drm_i915_private * dev_priv: i915 device instance
enum port port: port to check

Description

Return true if the device in port is eDP.

bool intel_bios_is_dsi_present(struct drm_i915_private * dev_priv, enum port * port)¶: is DSI present in VBT

Parameters

struct drm_i915_private * dev_priv: i915 device instance
enum port * port: port for DSI if present

Description

Return true if DSI is present, and return the port in port.

bool intel_bios_is_port_hpd_inverted(struct drm_i915_private * dev_priv, enum port port)¶: is HPD inverted for port

Parameters

struct drm_i915_private * dev_priv: i915 device instance
enum port port: port to check

Description

Return true if HPD should be inverted for port.

struct vbt_header¶: VBT Header structure

Definition

struct vbt_header {
  u8 signature[20];
  u16 version;
  u16 header_size;
  u16 vbt_size;
  u8 vbt_checksum;
  u8 reserved0;
  u32 bdb_offset;
  u32 aim_offset[4];
};

Members

u8 signature[20]: VBT signature, always starts with “$VBT”
u16 version: Version of this structure
u16 header_size: Size of this structure
u16 vbt_size: Size of VBT (VBT Header, BDB Header and data blocks)
u8 vbt_checksum: Checksum
u8 reserved0: Reserved
u32 bdb_offset: Offset of struct bdb_header from beginning of VBT
u32 aim_offset[4]: Offsets of add-in data blocks from beginning of VBT

struct bdb_header¶: BDB Header structure

Definition

struct bdb_header {
  u8 signature[16];
  u16 version;
  u16 header_size;
  u16 bdb_size;
};

Members

u8 signature[16]: BDB signature “BIOS_DATA_BLOCK”
u16 version: Version of the data block definitions
u16 header_size: Size of this structure
u16 bdb_size: Size of BDB (BDB Header and data blocks)
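
To illustrate how the two headers relate, the following is a minimal userspace sketch that locates the BDB header inside a raw VBT blob through bdb_offset. The helper name sketch_find_bdb, the packed userspace mirrors of the structures and the particular checks are assumptions made for this example. It is not the driver's own validation code, and it assumes a little-endian host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirrors of the structures above. Packed, on the assumption
 * that the VBT is a raw firmware blob with no padding between fields. */
struct vbt_header {
  uint8_t signature[20];
  uint16_t version;
  uint16_t header_size;
  uint16_t vbt_size;
  uint8_t vbt_checksum;
  uint8_t reserved0;
  uint32_t bdb_offset;
  uint32_t aim_offset[4];
} __attribute__((packed));

struct bdb_header {
  uint8_t signature[16];
  uint16_t version;
  uint16_t header_size;
  uint16_t bdb_size;
} __attribute__((packed));

/* Hypothetical helper: return the BDB header inside buf, or NULL if the
 * blob does not look like a VBT. */
const struct bdb_header *sketch_find_bdb(const void *buf, size_t size)
{
  const struct vbt_header *vbt = buf;
  const struct bdb_header *bdb;

  assert(buf != NULL);

  if (size < sizeof(*vbt))
    return NULL;
  /* The VBT signature always starts with "$VBT". */
  if (memcmp(vbt->signature, "$VBT", 4) != 0)
    return NULL;
  /* bdb_offset is relative to the beginning of the VBT. */
  if (vbt->bdb_offset > size || size - vbt->bdb_offset < sizeof(*bdb))
    return NULL;

  bdb = (const struct bdb_header *)((const uint8_t *)buf + vbt->bdb_offset);
  if (memcmp(bdb->signature, "BIOS_DATA_BLOCK", 15) != 0)
    return NULL;
  return bdb;
}
```

A caller would pass the mapped VBT and its size, and treat a NULL return as "no usable VBT".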

Memory Management and Command Submission¶

This section covers all things related to the GEM implementation in the i915 driver.

Batchbuffer Parsing¶

Motivation: Certain OpenGL features (e.g. transform feedback, performance monitoring) require userspace code to submit batches containing commands such as MI_LOAD_REGISTER_IMM to access various registers. Unfortunately, some generations of the hardware will noop these commands in “unsecure” batches (which includes all userspace batches submitted via i915) even though the commands may be safe and represent the intended programming model of the device.

The software command parser is similar in operation to the command parsing done in hardware for unsecure batches. However, the software parser allows some operations that would be noop’d by hardware, if the parser determines the operation is safe, and submits the batch as “secure” to prevent hardware parsing.

Threats: At a high level, the hardware (and software) checks attempt to prevent granting userspace undue privileges. There are three categories of privilege.

First, commands which are explicitly defined as privileged or which should only be used by the kernel driver. The parser generally rejects such commands, though it may allow some from the drm master process.

Second, commands which access registers. To support correct/enhanced userspace functionality, particularly certain OpenGL extensions, the parser provides a whitelist of registers which userspace may safely access (for both normal and drm master processes).

Third, commands which access privileged memory (i.e. GGTT, HWS page, etc). The parser always rejects such commands.

The majority of the problematic commands fall in the MI_* range, with only a few specific commands on each ring (e.g. PIPE_CONTROL and MI_FLUSH_DW).

Implementation: Each ring maintains tables of commands and registers which the parser uses in scanning batch buffers submitted to that ring.

Since the set of commands that the parser must check for is significantly smaller than the number of commands supported, the parser tables contain only those commands required by the parser. This generally works because command opcode ranges have standard command length encodings. So for commands that the parser does not need to check, it can easily skip them. This is implemented via a per-ring length decoding vfunc.

Unfortunately, there are a number of commands that do not follow the standard length encoding for their opcode range, primarily amongst the MI_* commands. To handle this, the parser provides a way to define explicit “skip” entries in the per-ring command tables.

Other command table entries map fairly directly to high level categories mentioned above: rejected, master-only, register whitelist. The parser implements a number of checks, including the privileged memory checks, via a general bitmasking mechanism.
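
The table-driven scan described above can be sketched in userspace C as follows. Everything here is hypothetical: the sketch_* names, the flag values and the "standard" length encoding (low 8 bits of the header holding length minus two) are invented for illustration and do not reflect the driver's real tables or the hardware's command formats. Register whitelists and bitmask checks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flags for command table entries. */
#define SKETCH_CMD_REJECT (1u << 0) /* never allowed from userspace */
#define SKETCH_CMD_MASTER (1u << 1) /* allowed only for the drm master */

struct sketch_cmd_desc {
  uint32_t mask;      /* header bits that identify the command */
  uint32_t value;     /* expected value of those bits */
  uint32_t flags;     /* SKETCH_CMD_* or 0 for a plain "skip" entry */
  uint32_t fixed_len; /* length in dwords, or 0 for the standard encoding */
};

/* Invented "standard" length encoding: low 8 bits hold (length - 2). */
static uint32_t sketch_default_len(uint32_t header)
{
  return (header & 0xff) + 2;
}

static const struct sketch_cmd_desc *
sketch_find_cmd(const struct sketch_cmd_desc *table, size_t n, uint32_t header)
{
  for (size_t i = 0; i < n; i++)
    if ((header & table[i].mask) == table[i].value)
      return &table[i];
  return NULL;
}

/* Returns 0 if the batch passes, -1 on a violation or malformed batch. */
int sketch_scan_batch(const uint32_t *batch, size_t ndw,
                      const struct sketch_cmd_desc *table, size_t ntable,
                      bool is_master)
{
  size_t pos = 0;

  assert(batch != NULL && table != NULL);

  while (pos < ndw) {
    uint32_t header = batch[pos];
    const struct sketch_cmd_desc *desc = sketch_find_cmd(table, ntable, header);
    /* Commands without a table entry are skipped by decoded length. */
    uint32_t len = (desc && desc->fixed_len) ? desc->fixed_len
                                             : sketch_default_len(header);

    if (len > ndw - pos)
      return -1; /* command runs past the end of the batch */
    if (desc && (desc->flags & SKETCH_CMD_REJECT))
      return -1;
    if (desc && (desc->flags & SKETCH_CMD_MASTER) && !is_master)
      return -1;
    pos += len;
  }
  return 0;
}
```

An entry with no flags and a non-zero fixed_len plays the role of an explicit "skip" entry: it only corrects the length for a command that does not follow the standard encoding.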

int i915_cmd_parser_init_ring(struct intel_engine_cs * engine)¶: set cmd parser related fields for a ringbuffer

Parameters

struct intel_engine_cs * engine: the engine to initialize

Description

Optionally initializes fields related to batch buffer command parsing in the struct intel_engine_cs based on whether the platform requires software command parsing.

Return

non-zero if initialization fails

void i915_cmd_parser_fini_ring(struct intel_engine_cs * engine)¶: clean up cmd parser related fields

Parameters

struct intel_engine_cs * engine: the engine to clean up

Description

Releases any resources related to command parsing that may have been initialized for the specified ring.

bool i915_needs_cmd_parser(struct intel_engine_cs * engine)¶: should a given ring use software command parsing?

Parameters

struct intel_engine_cs * engine: the engine in question

Description

Only certain platforms require software batch buffer command parsing, and only when enabled via module parameter.

Return

true if the ring requires software command parsing

int i915_parse_cmds(struct intel_engine_cs * engine, struct drm_i915_gem_object * batch_obj, struct drm_i915_gem_object * shadow_batch_obj, u32 batch_start_offset, u32 batch_len, bool is_master)¶: parse a submitted batch buffer for privilege violations

Parameters

struct intel_engine_cs * engine: the engine on which the batch is to execute
struct drm_i915_gem_object * batch_obj: the batch buffer in question
struct drm_i915_gem_object * shadow_batch_obj: copy of the batch buffer in question
u32 batch_start_offset: byte offset in the batch at which execution starts
u32 batch_len: length of the commands in batch_obj
bool is_master: is the submitting process the drm master?

Description

Parses the specified batch buffer looking for privilege violations as described in the overview.

Return

non-zero if the parser finds violations or otherwise fails; -EACCES if the batch appears legal but should use hardware parsing

int i915_cmd_parser_get_version(struct drm_i915_private * dev_priv)¶: get the cmd parser version number

Parameters

struct drm_i915_private * dev_priv: i915 device private

Description

The cmd parser maintains a simple increasing integer version number suitable for passing to userspace clients to determine what operations are permitted.

Return

the current version number of the cmd parser

Batchbuffer Pools¶

In order to submit batch buffers as ‘secure’, the software command parser must ensure that a batch buffer cannot be modified after parsing. It does this by copying the user provided batch buffer contents to a kernel owned buffer from which the hardware will actually execute, and by carefully managing the address space bindings for such buffers.

The batch pool framework provides a mechanism for the driver to manage a set of scratch buffers to use for this purpose. The framework can be extended to support other use cases should they arise.
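
The idea of a pool of reusable scratch buffers can be sketched in plain userspace C. This is only a model under stated assumptions: the sketch_* names are hypothetical, malloc() stands in for GEM object allocation, an explicit sketch_pool_put() stands in for the buffer becoming inactive, and neither locking nor page pinning is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_buf {
  struct sketch_buf *next;
  size_t size;
  bool active; /* true while the buffer is handed out */
  void *mem;
};

struct sketch_pool {
  struct sketch_buf *head;
};

void sketch_pool_init(struct sketch_pool *pool)
{
  assert(pool != NULL);
  pool->head = NULL;
}

/* Return an inactive buffer of at least size bytes, allocating a new one
 * if no existing buffer fits. Returns NULL on allocation failure. */
struct sketch_buf *sketch_pool_get(struct sketch_pool *pool, size_t size)
{
  struct sketch_buf *buf;

  assert(pool != NULL);

  for (buf = pool->head; buf; buf = buf->next) {
    if (!buf->active && buf->size >= size) {
      buf->active = true;
      return buf;
    }
  }

  buf = malloc(sizeof(*buf));
  if (!buf)
    return NULL;
  buf->mem = malloc(size ? size : 1);
  if (!buf->mem) {
    free(buf);
    return NULL;
  }
  buf->size = size;
  buf->active = true;
  buf->next = pool->head;
  pool->head = buf;
  return buf;
}

/* Mark a buffer as reusable again. */
void sketch_pool_put(struct sketch_buf *buf)
{
  assert(buf != NULL);
  buf->active = false;
}

void sketch_pool_fini(struct sketch_pool *pool)
{
  assert(pool != NULL);
  while (pool->head) {
    struct sketch_buf *buf = pool->head;

    pool->head = buf->next;
    free(buf->mem);
    free(buf);
  }
}
```

Reusing an existing buffer that is large enough avoids a fresh allocation for every parsed batch, which is the point of keeping a pool.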

void i915_gem_batch_pool_init(struct drm_device * dev, struct i915_gem_batch_pool * pool)¶: initialize a batch buffer pool

Parameters

struct drm_device * dev: the drm device
struct i915_gem_batch_pool * pool: the batch buffer pool

void i915_gem_batch_pool_fini(struct i915_gem_batch_pool * pool)¶: clean up a batch buffer pool

Parameters

struct i915_gem_batch_pool * pool: the pool to clean up

Note

Callers must hold the struct_mutex.

struct drm_i915_gem_object * i915_gem_batch_pool_get(struct i915_gem_batch_pool * pool, size_t size)¶: allocate a buffer from the pool

Parameters

struct i915_gem_batch_pool * pool: the batch buffer pool
size_t size: the minimum desired size of the returned buffer

Description

Returns an inactive buffer from pool with at least size bytes, with the pages pinned. The caller must call i915_gem_object_unpin_pages() on the returned object.

Note

Callers must hold the struct_mutex.

Return

the buffer object or an error pointer

Logical Rings, Logical Ring Contexts and Execlists¶

Motivation: GEN8 brings an expansion of the HW contexts: “Logical Ring Contexts”. These expanded contexts enable a number of new abilities, especially “Execlists” (also implemented in this file).

One of the main differences with the legacy HW contexts is that logical ring contexts incorporate many more things to the context’s state, like PDPs or ringbuffer control registers:

The reason why PDPs are included in the context is straightforward: as PPGTTs (per-process GTTs) are actually per-context, having the PDPs contained there means you don’t need to do a ppgtt->switch_mm yourself; instead, the GPU will do it for you on the context switch.

But what about the ringbuffer control registers (head, tail, etc.)? Shouldn’t we just need a set of those per engine command streamer? This is where the name “Logical Rings” starts to make sense: by virtualizing the rings, the engine cs shifts to a new “ring buffer” with every context switch. When you want to submit a workload to the GPU you: A) choose your context, B) find its appropriate virtualized ring, C) write commands to it and then, finally, D) tell the GPU to switch to that context.

Instead of the legacy MI_SET_CONTEXT, the way you tell the GPU to switch to a context is via a context execution list, ergo “Execlists”.

LRC implementation: Regarding the creation of contexts, we have:

One global default context.
One local default context for each opened fd.
One local extra context for each context create ioctl call.

Now that ringbuffers belong per-context (and not per-engine, like before) and that contexts are uniquely tied to a given engine (and not reusable, like before) we need:

One ringbuffer per-engine inside each context.
One backing object per-engine inside each context.

The global default context starts its life with these new objects fully allocated and populated. The local default context for each opened fd is more complex, because we don’t know at creation time which engine is going to use them. To handle this, we have implemented a deferred creation of LR contexts:

The local context starts its life as a hollow or blank holder, that only gets populated for a given engine once we receive an execbuffer. If later on we receive another execbuffer ioctl for the same context but a different engine, we allocate/populate a new ringbuffer and context backing object and so on.
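
The deferred creation scheme can be modelled with a small userspace sketch: per-engine state stays empty until the first submission for that engine. The sketch_* names, the engine count and the buffer sizes are assumptions made for this example, and calloc() merely stands in for allocating the ringbuffer and the context backing object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { SKETCH_NUM_ENGINES = 4, SKETCH_DWORDS = 1024 };

struct sketch_engine_state {
  uint32_t *ring;  /* stands in for the per-engine ringbuffer */
  uint32_t *image; /* stands in for the context backing object */
};

struct sketch_context {
  /* All entries start out empty ("hollow" context). */
  struct sketch_engine_state engine[SKETCH_NUM_ENGINES];
};

/* Populate the state for one engine on first use. Safe to call again for
 * the same engine. Returns 0 on success, -1 on error. */
int sketch_deferred_alloc(struct sketch_context *ctx, unsigned int id)
{
  struct sketch_engine_state *st;

  assert(ctx != NULL);

  if (id >= SKETCH_NUM_ENGINES)
    return -1;

  st = &ctx->engine[id];
  if (st->ring)
    return 0; /* already populated by an earlier submission */

  st->ring = calloc(SKETCH_DWORDS, sizeof(*st->ring));
  st->image = calloc(SKETCH_DWORDS, sizeof(*st->image));
  if (!st->ring || !st->image) {
    free(st->ring);
    free(st->image);
    st->ring = NULL;
    st->image = NULL;
    return -1;
  }
  return 0;
}
```

In this model a submission path would call sketch_deferred_alloc() before writing commands, so engines a context never uses cost no memory.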

Finally, regarding local contexts created using the ioctl call: as they are only allowed with the render ring, we can allocate & populate them right away (no need to defer anything, at least for now).

Execlists implementation: Execlists are the new method by which, on gen8+ hardware, workloads are submitted for execution (as opposed to the legacy, ringbuffer-based, method). This method works as follows:

When a request is committed, its commands (the BB start and any leading or trailing commands, like the seqno breadcrumbs) are placed in the ringbuffer for the appropriate context. The tail pointer in the hardware context is not updated at this time, but instead, kept by the driver in the ringbuffer structure. A structure representing this request is added to a request queue for the appropriate engine: this structure contains a copy of the context’s tail after the request was written to the ring buffer and a pointer to the context itself.

If the engine’s request queue was empty before the request was added, the queue is processed immediately. Otherwise the queue will be processed during a context switch interrupt. In any case, elements on the queue will get sent (in pairs) to the GPU’s ExecLists Submit Port (ELSP, for short) with a globally unique 20-bit submission ID.

When execution of a request completes, the GPU updates the context status buffer with a context complete event and generates a context switch interrupt. During the interrupt handling, the driver examines the events in the buffer: for each context complete event, if the announced ID matches that on the head of the request queue, then that request is retired and removed from the queue.

After processing, if any requests were retired and the queue is not empty then a new execution list can be submitted. The two requests at the front of the queue are next to be submitted but since a context may not occur twice in an execution list, if subsequent requests have the same ID as the first then the two requests must be combined. This is done simply by discarding requests at the head of the queue until either only one request is left (in which case we use a NULL second context) or the first two requests have unique IDs.

By always executing the first two requests in the queue the driver ensures that the GPU is kept as busy as possible. In the case where a single context completes but a second context is still executing, the request for this second context will be at the head of the queue when we remove the first one. This request will then be resubmitted along with a new request for a different context, which will cause the hardware to continue executing the second request and queue the new request (the GPU detects the condition of a context getting preempted with the same context and optimizes the context switch flow by not doing preemption, but just sampling the new tail pointer).
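
The rule for choosing the next pair of requests can be sketched as follows. The queue is modelled as a plain array and the sketch_* names are hypothetical. The sketch relies on the later request for a context carrying the newer tail, so dropping the earlier one loses no work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_request {
  uint32_t ctx_id; /* ID of the context the request belongs to */
  uint32_t tail;   /* ring tail after this request was written */
};

/* Pick up to two requests for submission, starting at q[*head]. Requests at
 * the head that share a context with their successor are discarded by
 * advancing *head. Returns the number of ports filled (0, 1 or 2). An unused
 * port is set to NULL. */
size_t sketch_pick_pair(const struct sketch_request *q, size_t count,
                        size_t *head, const struct sketch_request *ports[2])
{
  assert(q != NULL && head != NULL && *head <= count);

  ports[0] = NULL;
  ports[1] = NULL;

  /* A context may not occur twice in one execution list. */
  while (count - *head >= 2 && q[*head].ctx_id == q[*head + 1].ctx_id)
    (*head)++;

  if (*head < count)
    ports[0] = &q[*head];
  if (count - *head >= 2)
    ports[1] = &q[*head + 1];

  return (size_t)(ports[0] != NULL) + (size_t)(ports[1] != NULL);
}
```

For a queue holding contexts 1, 1, 2, 3 the first request is discarded and the pair (1, 2) is selected, with the surviving request for context 1 carrying the later tail.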

int intel_sanitize_enable_execlists(struct drm_i915_private * dev_priv, int enable_execlists)¶: sanitize i915.enable_execlists

Parameters

struct drm_i915_private * dev_priv: i915 device private
int enable_execlists: value of i915.enable_execlists module parameter.

Description

Only certain platforms support Execlists (the prerequisites being support for Logical Ring Contexts and Aliasing PPGTT or better).

Return

1 if Execlists is supported and has to be enabled.

void intel_lr_context_descriptor_update(struct i915_gem_context * ctx, struct intel_engine_cs * engine)¶: calculate & cache the descriptor for a pinned context

Parameters

struct i915_gem_context * ctx: Context to work on
struct intel_engine_cs * engine: Engine the descriptor will be used with

Description

The context descriptor encodes various attributes of a context, including its GTT address and some flags. Because it’s fairly expensive to calculate, we’ll just do it once and cache the result, which remains valid until the context is unpinned.

This is what a descriptor looks like, from LSB to MSB:

bits 0-11: flags, GEN8_CTX_* (cached in ctx_desc_template)
bits 12-31: LRCA, GTT address of (the HWSP of) this context
bits 32-52: ctx ID, a globally unique tag
bits 53-54: mbz, reserved for use by hardware
bits 55-63: group ID, currently unused and set to 0
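
As a worked example of the bit layout above, the following sketch packs the three non-zero fields into a 64-bit value. The function name sketch_ctx_descriptor and its parameters are hypothetical, and the sketch assumes the LRCA is a page-aligned 32-bit GTT address so that its low 12 bits are free for the flags.

```c
#include <assert.h>
#include <stdint.h>

/* Pack flags, LRCA and context ID following the layout described above.
 * Bits 53-63 (mbz and group ID) are left as zero. */
uint64_t sketch_ctx_descriptor(uint32_t flags, uint32_t lrca, uint32_t ctx_id)
{
  uint64_t desc;

  /* Assumption: the LRCA is page aligned. */
  assert((lrca & 0xfffu) == 0);

  desc = flags & 0xfffu;                         /* bits 0-11 */
  desc |= lrca & 0xfffff000u;                    /* bits 12-31 */
  desc |= (uint64_t)(ctx_id & 0x1fffffu) << 32;  /* bits 32-52 */
  return desc;
}
```

With flags 0x1, an LRCA of 0x00123000 and context ID 5, the packed value is 0x0000000500123001.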

void intel_lrc_irq_handler(unsigned long data)¶: handle Context Switch interrupts

Parameters

unsigned long data: tasklet handler passed in unsigned long

Description

Check the unread Context Status Buffers and manage the submission of new contexts to the ELSP accordingly.

int intel_execlists_submission(struct i915_execbuffer_params * params, struct drm_i915_gem_execbuffer2 * args, struct list_head * vmas)¶: submit a batchbuffer for execution, Execlists style

Parameters

struct i915_execbuffer_params * params: execbuffer call parameters.
struct drm_i915_gem_execbuffer2 * args: execbuffer call arguments.
struct list_head * vmas: list of vmas.

Description

This is the evil twin version of i915_gem_ringbuffer_submission. It abstracts away the submission details of the execbuffer ioctl call.

Return

non-zero if the submission fails.

int gen8_init_indirectctx_bb(struct intel_engine_cs * engine, struct i915_wa_ctx_bb * wa_ctx, uint32_t *const batch, uint32_t * offset)¶: initialize indirect ctx batch with WA

Parameters

struct intel_engine_cs * engine: only applicable for RCS
struct i915_wa_ctx_bb * wa_ctx: structure representing wa_ctx. offset: specifies start of the batch, should be cache-aligned. This is updated with the offset value received as input. size: size of the batch in DWORDS but HW expects in terms of cachelines
uint32_t *const batch: page in which WA are loaded
uint32_t * offset: This field specifies the start of the batch. It should be cache-aligned, otherwise it is adjusted accordingly. Typically we only have one indirect_ctx and per_ctx batch buffer which are initialized at the beginning and shared across all contexts, but this field helps us to have multiple batches at different offsets and select them based on some criteria. At the moment this batch always starts at the beginning of the page and at this point we don’t have multiple wa_ctx batch buffers.

Description

The number of WA applied is not known at the beginning; we use this field to return the number of DWORDS written.

It is to be noted that this batch does not contain MI_BATCH_BUFFER_END so it adds NOOPs as padding to make it cacheline aligned. MI_BATCH_BUFFER_END will be added to perctx batch and both of them together make a complete batch buffer.

Return

non-zero if we exceed the PAGE_SIZE limit.
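
The NOOP padding step mentioned in the description can be sketched as below. The sketch_* names are hypothetical, and both the 64-byte cacheline size and the use of 0 as the NOOP encoding are assumptions of this example. The caller is assumed to have checked that the padded batch still fits in the page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHELINE_BYTES 64 /* assumed cacheline size */
#define SKETCH_CACHELINE_DWORDS (SKETCH_CACHELINE_BYTES / sizeof(uint32_t))
#define SKETCH_NOOP 0u            /* assumed NOOP encoding */

/* Append NOOPs to batch starting at dword index until index is a multiple
 * of the cacheline size. Returns the new (aligned) dword index. */
size_t sketch_pad_to_cacheline(uint32_t *batch, size_t index)
{
  assert(batch != NULL);

  while (index % SKETCH_CACHELINE_DWORDS)
    batch[index++] = SKETCH_NOOP;
  return index;
}
```

Because the first batch ends on a cacheline boundary, a second batch written immediately after it starts aligned without any further adjustment.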

int gen8_init_perctx_bb(struct intel_engine_cs * engine, struct i915_wa_ctx_bb * wa_ctx, uint32_t *const batch, uint32_t * offset)¶: initialize per ctx batch with WA

Parameters

struct intel_engine_cs * engine: only applicable for RCS
struct i915_wa_ctx_bb * wa_ctx: structure representing wa_ctx offset: specifies start of the batch, should be cache-aligned. size: size of the batch in DWORDS but HW expects in terms of cachelines
uint32_t *const batch: page in which WA are loaded
uint32_t * offset: This field specifies the start of this batch. This batch is started immediately after indirect_ctx batch. Since we ensure that indirect_ctx ends on a cacheline this batch is aligned automatically.

Description

The number of DWORDS written is returned using this field.

This batch is terminated with MI_BATCH_BUFFER_END and so we need not add padding to align it with cacheline as padding after MI_BATCH_BUFFER_END is redundant.

void intel_logical_ring_cleanup(struct intel_engine_cs * engine)¶: deallocate the Engine Command Streamer

Parameters

struct intel_engine_cs * engine: Engine Command Streamer.

int intel_logical_rings_init(struct drm_device * dev)¶: allocate, populate and init the Engine Command Streamers

Parameters

struct drm_device * dev: DRM device.

Description

This function inits the engines for an Execlists submission style (the equivalent in the legacy ringbuffer submission world would be i915_gem_init_engines). It does it only for those engines that are present in the hardware.

Return

non-zero if the initialization failed.

uint32_t intel_lr_context_size(struct intel_engine_cs * engine)¶: return the size of the context for an engine

Parameters

struct intel_engine_cs * engine: which engine to find the context size for

Description

Each engine may require a different amount of space for a context image, so when allocating (or copying) an image, this function can be used to find the right size for the specific engine.

Return

size (in bytes) of an engine-specific context image

Note

this size includes the HWSP, which is part of the context image in LRC mode, but does not include the “shared data page” used with GuC submission. The caller should account for this if using the GuC.

int execlists_context_deferred_alloc(struct i915_gem_context * ctx, struct intel_engine_cs * engine)¶: create the LRC specific bits of a context

Parameters

struct i915_gem_context * ctx: LR context to create.
struct intel_engine_cs * engine: engine to be used with the context.

Description

This function can be called more than once, with different engines, if we plan to use the context with them. The context backing objects and the ringbuffers (especially the ringbuffer backing objects) suck a lot of memory up, and that’s why the creation is a deferred call: it’s better to make sure first that we need to use a given ring with the context.

Return

non-zero on error.

Global GTT views¶

Background and previous state

Historically objects could exist (be bound) in global GTT space only as singular instances with a view representing all of the object’s backing pages in a linear fashion. This view will be called a normal view.

To support multiple views of the same object, where the number of mapped pages is not equal to the backing store, or where the layout of the pages is not linear, the concept of a GGTT view was added.

One example of an alternative view is a stereo display driven by a single image. In this case we would have a framebuffer looking like this (2x2 pages):

12
34

Above would represent a normal GGTT view as normally mapped for GPU or CPU rendering. In contrast, fed to the display engine would be an alternative view which could look something like this:

1212
3434

In this example both the size and layout of pages in the alternative view are different from the normal view.
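
The stereo example can be sketched as a remapping of page indices: each row of the normal view is emitted twice. The function name sketch_stereo_view is hypothetical and the sketch works on bare page numbers rather than on the scatter-gather tables a real view would use.

```c
#include <assert.h>
#include <stddef.h>

/* Build the page order of the alternative view from a rows x cols normal
 * view by repeating every row twice. out must have room for
 * 2 * rows * cols entries. Returns the number of entries written. */
size_t sketch_stereo_view(const unsigned int *pages, size_t rows, size_t cols,
                          unsigned int *out)
{
  size_t n = 0;

  assert(pages != NULL && out != NULL);

  for (size_t r = 0; r < rows; r++)
    for (size_t copy = 0; copy < 2; copy++)
      for (size_t c = 0; c < cols; c++)
        out[n++] = pages[r * cols + c];
  return n;
}
```

For the 2x2 framebuffer with pages 1 to 4 this yields the order 1, 2, 1, 2, 3, 4, 3, 4, which is twice the size of the backing store and not a linear walk over it.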

Implementation and usage

GGTT views are implemented using VMAs and are distinguished via enum i915_ggtt_view_type and struct i915_ggtt_view.

A new flavour of core GEM functions which work with GGTT bound objects was added with the _ggtt_ infix, and sometimes with the _view postfix, to avoid renaming in large amounts of code. They take the struct i915_ggtt_view parameter encapsulating all metadata required to implement a view.

As a helper for callers which are only interested in the normal view, a globally const i915_ggtt_view_normal singleton instance exists. All old core GEM API functions, the ones not taking the view parameter, operate on, or with, the normal GGTT view.

Code wanting to add or use a new GGTT view needs to:

Add a new enum with a suitable name.
Extend the metadata in the i915_ggtt_view structure if required.
Add support to i915_get_vma_pages().

New views are required to build a scatter-gather table from within the i915_get_vma_pages function. This table is stored in the vma.ggtt_view and exists for the lifetime of a VMA.

The core API is designed to have copy semantics, which means that a passed-in struct i915_ggtt_view does not need to be persistent (left around after calling the core API functions).

int gen8_ppgtt_alloc_pagetabs(struct i915_address_space * vm, struct i915_page_directory * pd, uint64_t start, uint64_t length, unsigned long * new_pts)¶: Allocate page tables for VA range.

Parameters

struct i915_address_space * vm: Master vm structure.
struct i915_page_directory * pd: Page directory for this address range.
uint64_t start: Starting virtual address to begin allocations.
uint64_t length: Size of the allocations.
unsigned long * new_pts: Bitmap set by function with new allocations. Likely used by the caller to free on error.

Description

Allocate the required number of page tables. Extremely similar to gen8_ppgtt_alloc_page_directories(). The main difference is here we are limited by the page directory boundary (instead of the page directory pointer). That boundary is 1GB virtual. Therefore, unlike gen8_ppgtt_alloc_page_directories(), it is possible, and likely that the caller will need to use multiple calls of this function to achieve the appropriate allocation.

Return

0 if success; negative error code otherwise.

int gen8_ppgtt_alloc_page_directories(struct i915_address_space * vm, struct i915_page_directory_pointer * pdp, uint64_t start, uint64_t length, unsigned long * new_pds)¶: Allocate page directories for VA range.

Parameters

struct i915_address_space * vm: Master vm structure.
struct i915_page_directory_pointer * pdp: Page directory pointer for this address range.
uint64_t start: Starting virtual address to begin allocations.
uint64_t length: Size of the allocations.
unsigned long * new_pds: Bitmap set by function with new allocations. Likely used by the caller to free on error.

Description

Allocate the required number of page directories starting at the pde index of start, and ending at the pde index start + length. This function will skip over already allocated page directories within the range, and only allocate new ones, setting the appropriate pointer within the pdp as well as the correct position in the bitmap new_pds.

The function will only allocate the pages within the range for a given page directory pointer. In other words, if start + length straddles a virtually addressed PDP boundary (512GB for 4k pages), there will be more allocations required by the caller. This is not currently possible, and the BUG in the code will prevent it.

Return

0 if success; negative error code otherwise.

int gen8_ppgtt_alloc_page_dirpointers(struct i915_address_space * vm, struct i915_pml4 * pml4, uint64_t start, uint64_t length, unsigned long * new_pdps)¶: Allocate pdps for VA range.

Parameters

struct i915_address_space * vm: Master vm structure.
struct i915_pml4 * pml4: Page map level 4 for this address range.
uint64_t start: Starting virtual address to begin allocations.
uint64_t length: Size of the allocations.
unsigned long * new_pdps: Bitmap set by function with new allocations. Likely used by the caller to free on error.

Description

Allocate the required number of page directory pointers. Extremely similar to gen8_ppgtt_alloc_page_directories() and gen8_ppgtt_alloc_pagetabs(). The main difference is here we are limited by the pml4 boundary (instead of the page directory pointer).

Return

0 if success; negative error code otherwise.
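
The boundaries quoted above (1GB per page directory, 512GB per page directory pointer with 4k pages) correspond to 9 index bits per level. The following sketch derives the per-level indices for a virtual address and counts how many entries of a level a range touches. The sketch_* names and shift macros are hypothetical and only illustrate the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PTE_SHIFT   12 /* 4 KiB pages */
#define SKETCH_PDE_SHIFT   21 /* one page table maps 2 MiB */
#define SKETCH_PDPE_SHIFT  30 /* one page directory maps 1 GiB */
#define SKETCH_PML4E_SHIFT 39 /* one page directory pointer maps 512 GiB */
#define SKETCH_INDEX_MASK  0x1ffu /* 512 entries per level */

/* Index of va within the table selected by shift. */
unsigned int sketch_index(uint64_t va, unsigned int shift)
{
  return (unsigned int)((va >> shift) & SKETCH_INDEX_MASK);
}

/* Number of entries at the level selected by shift that the range
 * [start, start + length) touches. */
uint64_t sketch_span(uint64_t start, uint64_t length, unsigned int shift)
{
  /* Reject empty ranges and ranges that wrap around. */
  assert(length > 0 && start + length > start);

  return ((start + length - 1) >> shift) - (start >> shift) + 1;
}
```

A result greater than one from sketch_span() with SKETCH_PDPE_SHIFT means the range straddles a 1GB page directory boundary, which is the case where page tables have to be allocated with more than one call.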

void i915_gem_init_ggtt(struct drm_device * dev)¶: Initialize GEM for Global GTT

Parameters

struct drm_device * dev: DRM device

void i915_ggtt_cleanup_hw(struct drm_device * dev)¶: Clean up GGTT hardware initialization

Parameters

struct drm_device * dev: DRM device

int i915_ggtt_init_hw(struct drm_device * dev)¶: Initialize GGTT hardware

Parameters

struct drm_device * dev: DRM device

int i915_vma_bind(struct i915_vma * vma, enum i915_cache_level cache_level, u32 flags)¶: Sets up PTEs for a VMA in its corresponding address space.

Parameters

struct i915_vma * vma: VMA to map
enum i915_cache_level cache_level: mapping cache level
u32 flags: flags like global or local mapping

Description

DMA addresses are taken from the scatter-gather table of this object (or of this VMA in case of non-default GGTT views) and PTE entries set up. Note that DMA addresses are also the only part of the SG table we care about.

size_t i915_ggtt_view_size(struct drm_i915_gem_object * obj, const struct i915_ggtt_view * view)¶: Get the size of a GGTT view.

Parameters

struct drm_i915_gem_object * obj: Object the view is of.
const struct i915_ggtt_view * view: The view in question.

Return

The size of the GGTT view in bytes.

GTT Fences and Swizzling¶

int i915_gem_object_put_fence(struct drm_i915_gem_object * obj)¶: force-remove fence for an object

Parameters

struct drm_i915_gem_object * obj: object whose fence should be removed

Description

This function force-removes any fence from the given object, which is useful if the kernel wants to do untiled GTT access.

Return

0 on success, negative error code on failure.

int i915_gem_object_get_fence(struct drm_i915_gem_object * obj)¶: set up fencing for an object

Parameters

struct drm_i915_gem_object * obj: object to map through a fence reg

Description

When mapping objects through the GTT, userspace wants to be able to write to them without having to worry about swizzling if the object is tiled. This function walks the fence regs looking for a free one for obj, stealing one if it can’t find any.

It then sets up the reg based on the object’s properties: address, pitch and tiling format.

For an untiled surface, this removes any existing fence.

Return

0 on success, negative error code on failure.

bool i915_gem_object_pin_fence(struct drm_i915_gem_object * obj)¶: pin fencing state

Parameters

struct drm_i915_gem_object * obj: object to pin fencing for

Description

This pins the fencing state (whether tiled or untiled) to make sure the object is ready to be used as a scanout target. Fencing status must be synchronized first by calling i915_gem_object_get_fence().

The resulting fence pin reference must be released again with i915_gem_object_unpin_fence().

Return

True if the object has a fence, false otherwise.

void i915_gem_object_unpin_fence(struct drm_i915_gem_object * obj)¶: unpin fencing state

Parameters

struct drm_i915_gem_object * obj: object to unpin fencing for

Description

This releases the fence pin reference acquired through i915_gem_object_pin_fence(). It handles objects both with and without an attached fence correctly, so callers do not need to distinguish the two cases.

void i915_gem_restore_fences(struct drm_device * dev)¶: restore fence state

Parameters

struct drm_device * dev: DRM device

Description

Restore the hw fence state to match the software tracking again, to be called after a gpu reset and on resume.

void i915_gem_detect_bit_6_swizzle(struct drm_device * dev)¶: detect bit 6 swizzling pattern

Parameters

struct drm_device * dev: DRM device

Description

Detects bit 6 swizzling of address lookup between IGD access and CPU access through main memory.

void i915_gem_object_do_bit_17_swizzle(struct drm_i915_gem_object * obj)¶: fixup bit 17 swizzling

Parameters

struct drm_i915_gem_object * obj: i915 GEM buffer object

Description

This function fixes up the swizzling in case any page frame number for this object has changed in bit 17 since that state has been saved with i915_gem_object_save_bit_17_swizzle().

This is called when pinning backing storage again, since the kernel is free to move unpinned backing storage around (either by directly moving pages or by swapping them out and back in again).

void i915_gem_object_save_bit_17_swizzle(struct drm_i915_gem_object * obj)¶: save bit 17 swizzling

Parameters

struct drm_i915_gem_object * obj: i915 GEM buffer object

Description

This function saves the bit 17 of each page frame number so that swizzling can be fixed up later on with i915_gem_object_do_bit_17_swizzle(). This must be called before the backing storage can be unpinned.
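
The save and fix-up pair can be modelled in userspace as follows. The sketch_* names are hypothetical, a plain 4096-byte array stands in for a backing page, and the swap of adjacent 64-byte halves is an assumption of this example about what a change in bit 17 requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_BYTES 4096

/* Value of bit 17 of a physical address: the state to save per page. */
bool sketch_bit17(uint64_t phys)
{
  return (phys >> 17) & 1;
}

/* Swap every 64-byte chunk of the page with its neighbour. */
static void sketch_swizzle_page(uint8_t *page)
{
  uint8_t tmp[64];

  for (size_t i = 0; i < SKETCH_PAGE_BYTES; i += 128) {
    memcpy(tmp, &page[i], 64);
    memcpy(&page[i], &page[i + 64], 64);
    memcpy(&page[i + 64], tmp, 64);
  }
}

/* Fix up a page whose physical address may have changed since its bit 17
 * state was saved. Nothing happens if bit 17 is unchanged. */
void sketch_fixup_page(uint8_t *page, uint64_t new_phys, bool saved_bit17)
{
  assert(page != NULL);

  if (sketch_bit17(new_phys) != saved_bit17)
    sketch_swizzle_page(page);
}
```

In this model the saved bit is recorded just before the page is released, and sketch_fixup_page() runs once the page has been brought back at a possibly different address.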

Global GTT Fence Handling¶

Important to avoid confusion: “fences” in the i915 driver are not execution fences used to track command completion but hardware detiler objects which wrap a given range of the global GTT. Each platform has only a fairly limited set of these objects.

Fences are used to detile GTT memory mappings. They’re also connected to the hardware frontbuffer render tracking and hence interact with frontbuffer compression. Furthermore on older platforms fences are required for tiled objects used by the display engine. They can also be used by the render engine - they’re required for blitter commands and are optional for render commands. But on gen4+ both display (with the exception of fbc) and rendering have their own tiling state bits and don’t need fences.

Also note that fences only support X and Y tiling and hence can’t be used for the fancier new tiling formats like W, Ys and Yf.

Finally note that because fences are such a restricted resource they’re dynamically associated with objects. Furthermore fence state is committed to the hardware lazily to avoid unnecessary stalls on gen2/3. Therefore code must explicitly call i915_gem_object_get_fence() to synchronize fencing status for cpu access. Also note that some code wants an unfenced view, for those cases the fence can be removed forcefully with i915_gem_object_put_fence().

Internally these functions will synchronize with userspace access by removing CPU ptes into GTT mmaps (not the GTT ptes themselves) as needed.

Hardware Tiling and Swizzling Details¶

The idea behind tiling is to increase cache hit rates by rearranging pixel data so that a group of pixel accesses are in the same cacheline. Performance improvements from doing this on the back/depth buffer are on the order of 30%.

Intel architectures make this somewhat more complicated, though, by adjustments made to addressing of data when the memory is in interleaved mode (matched pairs of DIMMs) to improve memory bandwidth. For interleaved memory, the CPU sends every sequential 64 bytes to an alternate memory channel so it can get the bandwidth from both.

The GPU also rearranges its accesses for increased bandwidth to interleaved memory, and it matches what the CPU does for non-tiled. However, when tiled it does it a little differently, since one walks addresses not just in the X direction but also Y. So, along with alternating channels when bit 6 of the address flips, it also alternates when other bits flip – Bits 9 (every 512 bytes, an X tile scanline) and 10 (every two X tile scanlines) are common to both the 915 and 965-class hardware.

The CPU also sometimes XORs in higher bits as well, to improve bandwidth doing strided access like we do so frequently in graphics. This is called “Channel XOR Randomization” in the MCH documentation. The result is that the CPU is XORing in either bit 11 or bit 17 to bit 6 of its address decode.

All of this bit 6 XORing has an effect on our memory management, as we need to make sure that the 3d driver can correctly address object contents.

If we don’t have interleaved memory, all tiling is safe and no swizzling is required.

When bit 17 is XORed in, we simply refuse to tile at all. Bit 17 is not just a page offset, so as we page an object out and back in, individual pages in it will have different bit 17 addresses, resulting in each 64 bytes being swapped with its neighbor!

Otherwise, if interleaved, we have to tell the 3d driver what address swizzling it needs to do, since it’s writing with the CPU to the pages (bit 6 and potentially bit 11 XORed in), and the GPU is reading from the pages (bit 6, 9, and 10 XORed in). The cumulative swizzling required of the CPU is therefore XORing in bits 6, 9, 10, and potentially 11, in order to match what the GPU expects.
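The bit 6 swizzling described above can be sketched as a small address transform. This is only an illustration under stated assumptions: the enum and function names are hypothetical and are not the driver's own definitions, and only the two interleaved cases named in the text (bits 9 and 10, optionally bit 11) are modelled.

```c
#include <stdint.h>

/* Hypothetical names for this sketch; not the driver's definitions. */
enum toy_bit6_swizzle {
	TOY_SWIZZLE_NONE,    /* no interleaving: offsets are used as-is */
	TOY_SWIZZLE_9_10,    /* XOR bits 9 and 10 into bit 6 */
	TOY_SWIZZLE_9_10_11, /* additionally XOR bit 11 into bit 6 */
};

/* Return the byte offset a CPU access must use inside a tiled object. */
static uint64_t toy_swizzle_offset(uint64_t offset, enum toy_bit6_swizzle mode)
{
	uint64_t flip = 0;

	if (mode != TOY_SWIZZLE_NONE) {
		flip ^= (offset >> 9) & 1;
		flip ^= (offset >> 10) & 1;
	}
	if (mode == TOY_SWIZZLE_9_10_11)
		flip ^= (offset >> 11) & 1;

	/* Toggle bit 6 when an odd number of the selected bits is set. */
	return offset ^ (flip << 6);
}
```

With TOY_SWIZZLE_9_10, an offset with only bit 9 set has its bit 6 toggled, while an offset with both bits 9 and 10 set is left unchanged because the two XORs cancel.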

Object Tiling IOCTLs¶

int i915_gem_set_tiling(struct drm_device * dev, void * data, struct drm_file * file)¶: IOCTL handler to set tiling mode

Parameters

struct drm_device * dev: DRM device
void * data: data pointer for the ioctl
struct drm_file * file: DRM file for the ioctl call

Description

Sets the tiling mode of an object, returning the required swizzling of bit 6 of addresses in the object.

Called by the user via ioctl.

Return

Zero on success, negative errno on failure.

int i915_gem_get_tiling(struct drm_device * dev, void * data, struct drm_file * file)¶: IOCTL handler to get tiling mode

Parameters

struct drm_device * dev: DRM device
void * data: data pointer for the ioctl
struct drm_file * file: DRM file for the ioctl call

Description

Returns the current tiling mode and required bit 6 swizzling for the object.

Called by the user via ioctl.

Return

Zero on success, negative errno on failure.

i915_gem_set_tiling() and i915_gem_get_tiling() are the userspace interface to declare fence register requirements.

In principle GEM doesn’t care at all about the internal data layout of an object, and hence it also doesn’t care about tiling or swizzling. There are two exceptions:

For X and Y tiling the hardware provides detilers for CPU access, so called fences. Since there’s only a limited amount of them the kernel must manage these, and therefore userspace must tell the kernel the object tiling if it wants to use fences for detiling.
Gen3 and gen4 platforms have a swizzling pattern for tiled objects which depends upon the physical page frame number. When swapping such objects the page frame number might change, so the kernel must be able to fix this up and hence must know the tiling. Note that on a subset of platforms with asymmetric memory channel population the swizzling pattern changes in an unknown way, and for those the kernel simply forbids swapping completely.

Since neither of these applies to new tiling layouts on modern platforms like W, Ys and Yf tiling, GEM only allows object tiling to be set to X or Y tiled. Anything else can be handled in userspace entirely without the kernel’s involvement.
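The swap fix-up mentioned for gen3 and gen4 can be sketched as follows, assuming the swizzle depends on bit 17 of the physical address as described in the swizzling section. All names are hypothetical, and the page size and 64-byte cacheline size are assumptions of this sketch: if bit 17 differs after a page is swapped back in, each 64-byte cacheline is exchanged with its neighbour.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_BYTES 4096 /* assumed page size */
#define TOY_CACHELINE  64   /* assumed cacheline size */

/* Swap each 64-byte cacheline with its neighbour across the whole page. */
static void toy_swizzle_page(uint8_t *page)
{
	uint8_t tmp[TOY_CACHELINE];

	for (size_t i = 0; i < TOY_PAGE_BYTES; i += 2 * TOY_CACHELINE) {
		memcpy(tmp, &page[i], TOY_CACHELINE);
		memcpy(&page[i], &page[i + TOY_CACHELINE], TOY_CACHELINE);
		memcpy(&page[i + TOY_CACHELINE], tmp, TOY_CACHELINE);
	}
}

/* Fix up a page after swap-in; returns true if its contents were swapped. */
static bool toy_fixup_after_swapin(uint8_t *page, uint64_t old_phys,
				   uint64_t new_phys)
{
	bool old_bit17 = (old_phys >> 17) & 1;
	bool new_bit17 = (new_phys >> 17) & 1;

	if (old_bit17 == new_bit17)
		return false; /* same swizzle pattern, nothing to do */

	toy_swizzle_page(page);
	return true;
}
```

The kernel can only apply such a fix-up if it has recorded that the object is tiled, which is why userspace has to declare the tiling mode.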

Buffer Object Eviction¶

This section documents the interface functions for evicting buffer objects to make space available in the virtual gpu address spaces. Note that this is mostly orthogonal to shrinking buffer object caches, whose goal is to make main memory (shared with the gpu through the unified memory architecture) available.

int i915_gem_evict_something(struct drm_device * dev, struct i915_address_space * vm, int min_size, unsigned alignment, unsigned cache_level, unsigned long start, unsigned long end, unsigned flags)¶: Evict vmas to make room for binding a new one

Parameters

struct drm_device * dev: drm_device
struct i915_address_space * vm: address space to evict from
int min_size: size of the desired free space
unsigned alignment: alignment constraint of the desired free space
unsigned cache_level: cache_level for the desired space
unsigned long start: start (inclusive) of the range from which to evict objects
unsigned long end: end (exclusive) of the range from which to evict objects
unsigned flags: additional flags to control the eviction algorithm

Description

This function will try to evict vmas until a free space satisfying the requirements is found. Callers must check first whether any such hole exists already before calling this function.

This function is used by the object/vma binding code.

Since this function is only used to free up virtual address space it only ignores pinned vmas, and not objects where the backing storage itself is pinned. Hence obj->pages_pin_count does not protect against eviction.

To clarify: This is for freeing up virtual address space, not for freeing memory in e.g. the shrinker.
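The distinction between the two kinds of pins can be illustrated with a toy model. This is a sketch only: the structure and function are hypothetical, and the model ignores contiguity, alignment, range and cache-level constraints, which the real function has to honour.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model; these are not the driver's structures. */
struct toy_vma {
	bool bound;               /* occupies GPU virtual address space */
	unsigned vma_pin_count;   /* per-vma address space pin */
	unsigned pages_pin_count; /* pin on the object's backing storage */
	size_t size;              /* bytes of address space used */
};

/*
 * Unbind unpinned vmas until at least min_size bytes of address space
 * have been freed. Returns the number of bytes freed.
 */
static size_t toy_evict(struct toy_vma *vmas, size_t count, size_t min_size)
{
	size_t freed = 0;

	for (size_t i = 0; i < count && freed < min_size; i++) {
		if (!vmas[i].bound)
			continue;
		if (vmas[i].vma_pin_count > 0)
			continue; /* a pinned vma is never evicted */

		/* pages_pin_count is deliberately not consulted here. */
		vmas[i].bound = false;
		freed += vmas[i].size;
	}

	return freed;
}
```

In this model a vma whose object has pinned backing storage is still unbound, matching the statement that obj->pages_pin_count does not protect against eviction.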

int i915_gem_evict_vm(struct i915_address_space * vm, bool do_idle)¶: Evict all idle vmas from a vm

Parameters

struct i915_address_space * vm: Address space to cleanse
bool do_idle: Boolean directing whether to idle first.

Description

This function evicts all idle vmas from a vm. If all unpinned vmas should be evicted, do_idle needs to be set to true.

This is used by the execbuf code as a last-ditch effort to defragment the address space.

To clarify: This is for freeing up virtual address space, not for freeing memory in e.g. the shrinker.

Buffer Object Memory Shrinking¶

This section documents the interface function for shrinking memory usage of buffer object caches. Shrinking is used to make main memory available. Note that this is mostly orthogonal to evicting buffer objects, which has the goal to make space in gpu virtual address spaces.

unsigned long i915_gem_shrink(struct drm_i915_private * dev_priv, unsigned long target, unsigned flags)¶: Shrink buffer object caches

Parameters

struct drm_i915_private * dev_priv: i915 device
unsigned long target: amount of memory to make available, in pages
unsigned flags: control flags for selecting cache types

Description

This function is the main interface to the shrinker. It will try to release up to target pages of main memory backing storage from buffer objects. Selection of the specific caches can be done with flags. This is e.g. useful when purgeable objects should be removed from caches preferentially.

Note that it’s not guaranteed that the released amount is actually available as free system memory: the pages might still be in use for other reasons (like cpu mmaps), or the mm core might have reused them before we could grab them. Therefore code that needs to explicitly shrink buffer object caches (e.g. to avoid deadlocks in memory reclaim) must fall back to i915_gem_shrink_all().

Also note that any kind of pinning (both per-vma address space pins and backing storage pins at the buffer object level) results in the shrinker code having to skip the object.

Return

The number of pages of backing storage actually released.
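The shrinking behaviour described for i915_gem_shrink() can be sketched with a toy model. The structure, the flag and the function below are hypothetical, and the model releases whole objects until the target is met, so it may release more than the target. It is meant only to show how pins and flags affect which objects are considered.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_SHRINK_PURGEABLE_ONLY 0x1u /* hypothetical flag */

/* Toy model; these are not the driver's structures. */
struct toy_obj {
	size_t pages;             /* pages of backing storage still held */
	unsigned vma_pin_count;   /* per-vma address space pin */
	unsigned pages_pin_count; /* backing storage pin */
	bool purgeable;           /* userspace marked contents as disposable */
};

/* Release backing storage until target pages are freed; returns pages freed. */
static size_t toy_shrink(struct toy_obj *objs, size_t count, size_t target,
			 unsigned flags)
{
	size_t released = 0;

	for (size_t i = 0; i < count && released < target; i++) {
		struct toy_obj *obj = &objs[i];

		if (obj->pages == 0)
			continue;
		/* Any kind of pin forces the shrinker to skip the object. */
		if (obj->vma_pin_count > 0 || obj->pages_pin_count > 0)
			continue;
		if ((flags & TOY_SHRINK_PURGEABLE_ONLY) && !obj->purgeable)
			continue;

		released += obj->pages;
		obj->pages = 0;
	}

	return released;
}
```

A caller can run a first pass restricted to purgeable objects and a second unrestricted pass if the first did not release enough pages.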

unsigned long i915_gem_shrink_all(struct drm_i915_private * dev_priv)¶: Shrink buffer object caches completely

Parameters

struct drm_i915_private * dev_priv: i915 device

Description

This is a simple wrapper around i915_gem_shrink() to aggressively shrink all caches completely. It also first waits for and retires all outstanding requests to also be able to release backing storage for active objects.

This should only be used in code to intentionally quiesce the gpu or as a last-ditch effort when memory seems to have run out.

Return

The number of pages of backing storage actually released.

void i915_gem_shrinker_init(struct drm_i915_private * dev_priv)¶: Initialize i915 shrinker

Parameters

struct drm_i915_private * dev_priv: i915 device

Description

This function registers and sets up the i915 shrinker and OOM handler.

void i915_gem_shrinker_cleanup(struct drm_i915_private * dev_priv)¶: Clean up i915 shrinker

Parameters

struct drm_i915_private * dev_priv: i915 device

Description

This function unregisters the i915 shrinker and OOM handler.

GuC¶

GuC-specific firmware loader¶

intel_guc: Top level structure of guc. It handles firmware loading and manages client pool and doorbells. intel_guc owns an i915_guc_client to replace the legacy ExecList submission.

Firmware versioning: The firmware build process will generate a version header file with major and minor version defined. The versions are built into the CSS header of the firmware. The i915 kernel driver sets the minimal firmware version required per platform. The firmware installation package will install (symbolic link) the proper version of the firmware.

GuC address space: GuC does not allow any gfx GGTT address that falls into range [0, WOPCM_TOP), which is reserved for Boot ROM, SRAM and WOPCM. Currently this top address is 512K. In order to exclude the 0-512K address space from GGTT, all gfx objects used by GuC are pinned with PIN_OFFSET_BIAS along with the size of WOPCM.

Firmware log: Firmware log is enabled by setting i915.guc_log_level to a non-negative level. Log data is printed out by reading debugfs i915_guc_log_dump. Reading from i915_guc_load_status will print out the firmware loading status and the scratch register values.
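The GuC address space restriction above amounts to a lower bound on GGTT offsets. A minimal sketch, assuming the 512K top address quoted in the text and a power-of-two alignment; the macro and function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed from the text: the reserved range is [0, 512K). */
#define TOY_WOPCM_TOP (512u * 1024u)

/* True if the GuC is allowed to use this GGTT offset. */
static bool toy_guc_can_address(uint64_t ggtt_offset)
{
	return ggtt_offset >= TOY_WOPCM_TOP;
}

/*
 * Lowest usable offset for an object with the given alignment, which
 * must be a power of two. This mimics what pinning with an offset bias
 * equal to the WOPCM size achieves.
 */
static uint64_t toy_lowest_guc_offset(uint64_t alignment)
{
	return (TOY_WOPCM_TOP + alignment - 1) & ~(alignment - 1);
}
```

With a 4K alignment the first usable offset is exactly 512K. With a 1M alignment it moves up to 1M.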

int intel_guc_setup(struct drm_device * dev)¶: finish preparing the GuC for activity

Parameters

struct drm_device * dev: drm device

Description

Called from gem_init_hw() during driver loading and also after a GPU reset.

The main action required here is to load the GuC uCode into the device. The firmware image should have already been fetched into memory by the earlier call to intel_guc_init(), so here we need only check that worked, and then transfer the image to the h/w.

Return

non-zero code on error

void intel_guc_init(struct drm_device * dev)¶: define parameters and fetch firmware

Parameters

struct drm_device * dev: drm device

Description

Called early during driver load, but after GEM is initialised.

The firmware will be transferred to the GuC’s memory later, when intel_guc_setup() is called.

void intel_guc_fini(struct drm_device * dev)¶: clean up all allocated resources

Parameters

struct drm_device * dev: drm device

GuC-based command submission¶

i915_guc_client: We use the term client to avoid confusion with contexts. An i915_guc_client is equivalent to the GuC object guc_context_desc. This context descriptor is allocated from a pool of 1024 entries. The kernel driver allocates a doorbell and a workqueue for it, along with the process descriptor (guc_process_desc), which is mapped to client space. The client can then write a Work Item and ring the doorbell.

To simplify the implementation, we allocate one gem object that contains all pages for doorbell, process descriptor and workqueue.

The Scratch registers: There are 16 MMIO-based registers starting from 0xC180. The kernel driver writes a value to the action register (SOFT_SCRATCH_0) along with any data. It then triggers an interrupt on the GuC via another register write (0xC4C8). The firmware writes a success/fail code back to the action register after it processes the request. The kernel driver polls waiting for this update and then proceeds. See host2guc_action().

Doorbells: Doorbells are interrupts to uKernel. A doorbell is a single cache line (QW) mapped into process space.

Work Items: There are several types of work items that the host may place into a workqueue, each with its own requirements and limitations. Currently only WQ_TYPE_INORDER, which represents an in-order queue, is needed to support legacy submission via GuC. The kernel driver packs the ring tail pointer and an ELSP context descriptor dword into the Work Item. See guc_add_workqueue_item().
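The scratch register handshake described above can be sketched against a simulated register block. Everything here is an assumption made for illustration: the action and response codes are invented, responses are assumed to be recognisable by their top nibble, and a plain function call stands in for both the interrupt write and the firmware.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SCRATCH_COUNT 16          /* 16 scratch registers, per the text */
#define TOY_ACTION_PING   0x10u       /* hypothetical action code */
#define TOY_RESP_OK       0xF0000000u /* hypothetical success code */
#define TOY_RESP_FAIL     0xF0000001u /* hypothetical failure code */

struct toy_guc {
	uint32_t scratch[TOY_SCRATCH_COUNT]; /* stands in for MMIO at 0xC180 */
};

/* Assumption of this sketch: responses have 0xF in the top nibble. */
static int toy_is_response(uint32_t value)
{
	return (value >> 28) == 0xF;
}

/* Stand-in for the firmware handling the interrupt. */
static void toy_firmware_irq(struct toy_guc *guc)
{
	guc->scratch[0] = (guc->scratch[0] == TOY_ACTION_PING)
				? TOY_RESP_OK : TOY_RESP_FAIL;
}

/* Send an action plus data, then poll the action register for the result. */
static int toy_host2guc_action(struct toy_guc *guc, const uint32_t *data,
			       size_t len)
{
	if (len == 0 || len > TOY_SCRATCH_COUNT)
		return -EINVAL;

	for (size_t i = 0; i < len; i++)
		guc->scratch[i] = data[i];

	/* The driver would write the trigger register (0xC4C8) here. */
	toy_firmware_irq(guc);

	for (int tries = 0; tries < 10; tries++) {
		uint32_t status = guc->scratch[0];

		if (!toy_is_response(status))
			continue;
		return status == TOY_RESP_OK ? 0 : -EIO;
	}

	return -ETIMEDOUT;
}
```

The action register doubles as the status register, so the host has to distinguish "request still pending" from "response written", which is why the sketch reserves a recognisable pattern for responses.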

int i915_guc_wq_check_space(struct drm_i915_gem_request * request)¶: check that the GuC can accept a request

Parameters

struct drm_i915_gem_request * request: request associated with the commands

Return

0 if space is available, -EAGAIN if space is not currently available.

This function must be called (and must return 0) before a request is submitted to the GuC via i915_guc_submit() below. Once a result of 0 has been returned, it remains valid until (but only until) the next call to submit().

This precheck allows the caller to determine in advance that space will be available for the next submission before committing resources to it, and helps avoid late failures with complicated recovery paths.

int i915_guc_submit(struct drm_i915_gem_request * rq)¶: Submit commands through GuC

Parameters

struct drm_i915_gem_request * rq: request associated with the commands

Return

0 on success, otherwise an errno. (Note: nonzero really shouldn’t happen!)

The caller must have already called i915_guc_wq_check_space() above with a result of 0 (success) since the last request submission. This guarantees that there is space in the work queue for the new request, so enqueuing the item cannot fail.

Bad Things Will Happen if the caller violates this protocol e.g. calls submit() when check() says there’s no space, or calls submit() multiple times with no intervening check().

The only error here arises if the doorbell hardware isn’t functioning as expected, which really shouldn’t happen.
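The check-then-submit protocol can be sketched with a small circular queue. This is a toy model under stated assumptions: the structure, the slot count and the reserved flag are hypothetical, and the toy reports a protocol violation with -EINVAL, whereas the text only says that bad things will happen in the real driver.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_WQ_SIZE 8u /* slots; must be a power of two */

struct toy_wq {
	uint32_t items[TOY_WQ_SIZE];
	uint32_t head; /* advanced by the consumer (the GuC) */
	uint32_t tail; /* advanced by the producer (the driver) */
	bool reserved; /* set by a successful check, consumed by submit */
};

/* Free slots; one slot stays empty so that full and empty differ. */
static uint32_t toy_wq_space(const struct toy_wq *wq)
{
	return (wq->head - wq->tail - 1) & (TOY_WQ_SIZE - 1);
}

/* Returns 0 if the next submit is guaranteed to fit, else -EAGAIN. */
static int toy_wq_check_space(struct toy_wq *wq)
{
	if (toy_wq_space(wq) == 0)
		return -EAGAIN;

	wq->reserved = true;
	return 0;
}

/* Enqueue one item; valid only after a successful check. */
static int toy_submit(struct toy_wq *wq, uint32_t item)
{
	if (!wq->reserved)
		return -EINVAL; /* the toy flags the protocol violation */

	wq->items[wq->tail] = item;
	wq->tail = (wq->tail + 1) & (TOY_WQ_SIZE - 1);
	wq->reserved = false;
	return 0;
}
```

Because the reservation is cleared on every submit, calling submit twice without an intervening check is caught in this model. That is the same misuse the text warns against.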

struct drm_i915_gem_object * gem_allocate_guc_obj(struct drm_i915_private * dev_priv, u32 size)¶: Allocate gem object for GuC usage

Parameters

struct drm_i915_private * dev_priv: driver private data structure
u32 size: size of object

Description

This is a wrapper to create a gem obj. In order to use it inside GuC, the object needs to stay pinned for its whole lifetime. Also we must pin it to gtt space other than [0, GUC_WOPCM_TOP) because this range is reserved inside GuC.

Return

A drm_i915_gem_object if successful, otherwise NULL.

void gem_release_guc_obj(struct drm_i915_gem_object * obj)¶: Release gem object allocated for GuC usage

Parameters

struct drm_i915_gem_object * obj: gem obj to be released

struct i915_guc_client * guc_client_alloc(struct drm_i915_private * dev_priv, uint32_t priority, struct i915_gem_context * ctx)¶: Allocate an i915_guc_client

Parameters

struct drm_i915_private * dev_priv: driver private data structure
uint32_t priority: one of four priority levels: _CRITICAL, _HIGH, _NORMAL and _LOW. The kernel client to replace ExecList submission is created with NORMAL priority. Priority of a client for scheduler can be HIGH, while a preemption context can use CRITICAL.
struct i915_gem_context * ctx: the context that owns the client (we use the default render context)

Return

An i915_guc_client object on success, otherwise NULL.

int intel_guc_suspend(struct drm_device * dev)¶: notify GuC entering suspend state

Parameters

struct drm_device * dev: drm device

int intel_guc_resume(struct drm_device * dev)¶: notify GuC resuming from suspend state

Parameters

struct drm_device * dev: drm device

GuC Firmware Layout¶

The GuC firmware layout looks like this:

guc_css_header (contains major/minor version)
uCode
RSA signature
modulus key
exponent val

The firmware may or may not have modulus key and exponent data. The header, uCode and RSA signature are must-have components that will be used by the driver. The length of each component, always in dwords, can be found in the header. In the case that modulus and exponent are not present in the fw (a.k.a. a truncated image), the length values still appear in the header.

Driver will do some basic fw size validation based on the following rules:

1. Header, uCode and RSA are must-have components.
2. All firmware components, if present, are in the sequence illustrated in the layout table above.
3. Length info of each component can be found in the header, in dwords.
4. Modulus and exponent key are not required by the driver. They may not appear in the fw, so the driver will load a truncated firmware in this case.
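The size rules above can be sketched as a single check. The header structure here is a hypothetical simplification with one length field per component, not the real guc_css_header. The sketch assumes the check is simply that the file is large enough to hold the three mandatory components.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified header; all lengths are in dwords. */
struct toy_css_header {
	uint32_t header_dw;
	uint32_t ucode_dw;
	uint32_t rsa_dw;
	uint32_t modulus_dw;  /* may be nonzero even if the data is absent */
	uint32_t exponent_dw; /* may be nonzero even if the data is absent */
};

/* True if file_bytes can hold the header, uCode and RSA signature. */
static bool toy_fw_size_ok(const struct toy_css_header *h, size_t file_bytes)
{
	uint64_t required;

	/* Rule 1: the three mandatory components must be present. */
	if (h->header_dw == 0 || h->ucode_dw == 0 || h->rsa_dw == 0)
		return false;

	/* Rule 3: lengths are in dwords, so convert to bytes. */
	required = 4ull * ((uint64_t)h->header_dw + h->ucode_dw + h->rsa_dw);

	/* Rule 4: modulus and exponent are optional and not checked. */
	return file_bytes >= required;
}
```

A truncated image, which ends right after the RSA signature, passes this check even though the header still advertises modulus and exponent lengths.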

Tracing¶

This section covers all things related to the tracepoints implemented in the i915 driver.

i915_ppgtt_create and i915_ppgtt_release¶

With full ppgtt enabled each process using drm will allocate at least one translation table. With these traces it is possible to keep track of the allocation and of the lifetime of the tables; this can be used during testing/debug to verify that we are not leaking ppgtts. These traces identify the ppgtt through the vm pointer, which is also printed by the i915_vma_bind and i915_vma_unbind tracepoints.

i915_context_create and i915_context_free¶

These tracepoints are used to track creation and deletion of contexts. If full ppgtt is enabled, they also print the address of the vm assigned to the context.

switch_mm¶

This tracepoint allows tracking of the mm switch, which is an important point in the lifetime of the vm in the legacy submission path. This tracepoint is called only if full ppgtt is enabled.