Porting an architecture to support PREEMPT_RT

Author:

Sebastian Andrzej Siewior <bigeasy@linutronix.de>

This list outlines the architecture specific requirements that must be implemented in order to enable PREEMPT_RT. Once all required features are implemented, ARCH_SUPPORTS_RT can be selected in architecture’s Kconfig to make PREEMPT_RT selectable. Many prerequisites (genirq support for example) are enforced by the common code and are omitted here.

The optional features are not strictly required but it is worth to consider them.

Requirements

Forced threaded interrupts

CONFIG_IRQ_FORCED_THREADING must be selected. Any interrupts that must remain in hard-IRQ context must be marked with IRQF_NO_THREAD. This requirement applies for instance to clocksource event interrupts, perf interrupts and cascading interrupt-controller handlers.

PREEMPTION support

Kernel preemption must be supported and requires that CONFIG_ARCH_NO_PREEMPT remain unselected. Scheduling requests, such as those issued from an interrupt or other exception handler, must be processed immediately.

POSIX CPU timers and KVM

POSIX CPU timers must expire from thread context rather than directly within the timer interrupt. This behavior is enabled by setting the configuration option CONFIG_HAVE_POSIX_CPU_TIMERS_TASK_WORK. When KVM is enabled, CONFIG_KVM_XFER_TO_GUEST_WORK must also be set to ensure that any pending work, such as POSIX timer expiration, is handled before transitioning into guest mode.

Hard-IRQ and Soft-IRQ stacks

Soft interrupts are handled in the thread context in which they are raised. If a soft interrupt is triggered from hard-IRQ context, its execution is deferred to the ksoftirqd thread. Preemption is never disabled during soft interrupt handling, which makes soft interrupts preemptible. If an architecture provides a custom __do_softirq() implementation that uses a separate stack, it must select CONFIG_HAVE_SOFTIRQ_ON_OWN_STACK. The functionality should only be enabled when CONFIG_SOFTIRQ_ON_OWN_STACK is set.

FPU and SIMD access in kernel mode

FPU and SIMD registers are typically not used in kernel mode and are therefore not saved during kernel preemption. As a result, any kernel code that uses these registers must be enclosed within a kernel_fpu_begin() and kernel_fpu_end() section. The kernel_fpu_begin() function usually invokes local_bh_disable() to prevent interruptions from softirqs and to disable regular preemption. This allows the protected code to run safely in both thread and softirq contexts. On PREEMPT_RT kernels, however, kernel_fpu_begin() must not call local_bh_disable(). Instead, it should use preempt_disable(), since softirqs are always handled in thread context under PREEMPT_RT. In this case, disabling preemption alone is sufficient. The crypto subsystem operates on memory pages and requires users to “walk and map” these pages while processing a request. This operation must occur outside the kernel_fpu_begin()/ kernel_fpu_end() section because it requires preemption to be enabled. These preemption points are generally sufficient to avoid excessive scheduling latency.

Exception handlers

Exception handlers, such as the page fault handler, typically enable interrupts early, before invoking any generic code to process the exception. This is necessary because handling a page fault may involve operations that can sleep. Enabling interrupts is especially important on PREEMPT_RT, where certain locks, such as spinlock_t, become sleepable. For example, handling an invalid opcode may result in sending a SIGILL signal to the user task. A debug excpetion will send a SIGTRAP signal. In both cases, if the exception occurred in user space, it is safe to enable interrupts early. Sending a signal requires both interrupts and kernel preemption to be enabled.

Optional features

Timer and clocksource

A high-resolution clocksource and clockevents device are recommended. The clockevents device should support the CLOCK_EVT_FEAT_ONESHOT feature for optimal timer behavior. In most cases, microsecond-level accuracy is sufficient

Lazy preemption

This mechanism allows an in-kernel scheduling request for non-real-time tasks to be delayed until the task is about to return to user space. It helps avoid preempting a task that holds a sleeping lock at the time of the scheduling request. With CONFIG_GENERIC_IRQ_ENTRY enabled, supporting this feature requires defining a bit for TIF_NEED_RESCHED_LAZY, preferably near TIF_NEED_RESCHED.

Serial console with NBCON

With PREEMPT_RT enabled, all console output is handled by a dedicated thread rather than directly from the context in which printk() is invoked. This design allows printk() to be safely used in atomic contexts. However, this also means that if the kernel crashes and cannot switch to the printing thread, no output will be visible preventing the system from printing its final messages. There are exceptions for immediate output, such as during panic() handling. To support this, the console driver must implement new-style lock handling. This involves setting the CON_NBCON flag in console::flags and providing implementations for the write_atomic, write_thread, device_lock, and device_unlock callbacks.