.. _sched-ext:

==========================
Extensible Scheduler Class
==========================

sched_ext is a scheduler class whose behavior can be defined by a set of BPF
programs - the BPF scheduler.

* sched_ext exports a full scheduling interface so that any scheduling
  algorithm can be implemented on top.

* The BPF scheduler can group CPUs however it sees fit and schedule them
  together, as tasks aren't tied to specific CPUs at the time of wakeup.

* The BPF scheduler can be turned on and off dynamically anytime.

* The system integrity is maintained no matter what the BPF scheduler does.
  The default scheduling behavior is restored anytime an error is detected, a
  runnable task stalls, or on invoking the SysRq key sequence `SysRq-S`.

* When the BPF scheduler triggers an error, debug information is dumped to
  aid debugging. The debug dump is passed to and printed out by the scheduler
  binary. The debug dump can also be accessed through the `sched_ext_dump`
  tracepoint. The SysRq key sequence `SysRq-D` triggers a debug dump. This
  doesn't terminate the BPF scheduler and can only be read through the
  tracepoint.
Switching to and from sched_ext
===============================

``CONFIG_SCHED_CLASS_EXT`` is the config option to enable sched_ext and
``tools/sched_ext`` contains the example schedulers. The following config
options should be enabled to use sched_ext:

.. code-block:: none

    CONFIG_BPF=y
    CONFIG_SCHED_CLASS_EXT=y
    CONFIG_BPF_SYSCALL=y
    CONFIG_BPF_JIT=y
    CONFIG_DEBUG_INFO_BTF=y
    CONFIG_BPF_JIT_ALWAYS_ON=y
    CONFIG_BPF_JIT_DEFAULT_ON=y

sched_ext is used only when the BPF scheduler is loaded and running.

If a task explicitly sets its scheduling policy to ``SCHED_EXT``, it will be
treated as ``SCHED_NORMAL`` and scheduled by the fair-class scheduler until
the BPF scheduler is loaded.
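As a quick sanity check, a script along the following lines can report whether
a kernel config file carries the options listed above. This is only a sketch:
the ``check_scx_config`` helper name is ours, and the config path is whatever
your system provides (e.g. a gunzipped copy of ``/proc/config.gz`` or a
distribution config file). The demo below runs against a sample fragment
rather than a live system:

.. code-block:: shell

    #!/bin/sh
    # Sketch: report whether a kernel config file carries the sched_ext
    # prerequisites listed above. Helper name and paths are illustrative.
    check_scx_config() {
        cfg="$1"
        for opt in CONFIG_BPF CONFIG_SCHED_CLASS_EXT CONFIG_BPF_SYSCALL \
                   CONFIG_BPF_JIT CONFIG_DEBUG_INFO_BTF \
                   CONFIG_BPF_JIT_ALWAYS_ON CONFIG_BPF_JIT_DEFAULT_ON; do
            if grep -q "^${opt}=y" "$cfg"; then
                echo "${opt}: ok"
            else
                echo "${opt}: MISSING"
            fi
        done
    }

    # Demo against a sample fragment rather than a live system:
    cat > /tmp/scx_sample_config <<'EOF'
    CONFIG_BPF=y
    CONFIG_SCHED_CLASS_EXT=y
    CONFIG_BPF_SYSCALL=y
    CONFIG_BPF_JIT=y
    CONFIG_DEBUG_INFO_BTF=y
    CONFIG_BPF_JIT_ALWAYS_ON=y
    CONFIG_BPF_JIT_DEFAULT_ON=y
    EOF
    check_scx_config /tmp/scx_sample_config

Any option reported as ``MISSING`` means the kernel must be reconfigured and
rebuilt before a BPF scheduler can be loaded.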
When the BPF scheduler is loaded and ``SCX_OPS_SWITCH_PARTIAL`` is not set in
``ops->flags``, all ``SCHED_NORMAL``, ``SCHED_BATCH``, ``SCHED_IDLE``, and
``SCHED_EXT`` tasks are scheduled by sched_ext.

However, when the BPF scheduler is loaded and ``SCX_OPS_SWITCH_PARTIAL`` is
set in ``ops->flags``, only tasks with the ``SCHED_EXT`` policy are scheduled
by sched_ext, while tasks with ``SCHED_NORMAL``, ``SCHED_BATCH`` and
``SCHED_IDLE`` policies are scheduled by the fair-class scheduler, which has
higher sched_class precedence than ``SCHED_EXT``.
Terminating the sched_ext scheduler program, triggering `SysRq-S`, or
detection of any internal error including stalled runnable tasks aborts the
BPF scheduler and reverts all tasks back to the fair-class scheduler.

.. code-block:: none

    # make -j16 -C tools/sched_ext
    # tools/sched_ext/build/bin/scx_simple
    local=0 global=3
    local=5 global=24
    local=9 global=44
    local=13 global=56
    local=17 global=72
    ^CEXIT: BPF scheduler unregistered

The current status of the BPF scheduler can be determined as follows:

.. code-block:: none

    # cat /sys/kernel/sched_ext/state
    enabled
    # cat /sys/kernel/sched_ext/root/ops
    simple

You can check if any BPF scheduler has ever been loaded since boot by
examining this monotonically incrementing counter (a value of zero indicates
that no BPF scheduler has been loaded):
.. code-block:: none

    # cat /sys/kernel/sched_ext/enable_seq
    1

Each running scheduler also exposes a per-scheduler ``events`` file under
``/sys/kernel/sched_ext//events`` that tracks diagnostic counters. Each
counter occupies one ``name value`` line:

.. code-block:: none

    # cat /sys/kernel/sched_ext/simple/events
    SCX_EV_SELECT_CPU_FALLBACK 0
    SCX_EV_DISPATCH_LOCAL_DSQ_OFFLINE 0
    SCX_EV_DISPATCH_KEEP_LAST 123
    SCX_EV_ENQ_SKIP_EXITING 0
    SCX_EV_ENQ_SKIP_MIGRATION_DISABLED 0
    SCX_EV_REENQ_IMMED 0
    SCX_EV_REENQ_LOCAL_REPEAT 0
    SCX_EV_REFILL_SLICE_DFL 456789
    SCX_EV_BYPASS_DURATION 0
    SCX_EV_BYPASS_DISPATCH 0
    SCX_EV_BYPASS_ACTIVATE 0
    SCX_EV_INSERT_NOT_OWNED 0
    SCX_EV_SUB_BYPASS_DISPATCH 0

The counters are described in ``kernel/sched/ext_internal.h``; briefly:
* ``SCX_EV_SELECT_CPU_FALLBACK``: ops.select_cpu() returned a CPU unusable by
  the task and the core scheduler silently picked a fallback CPU.

* ``SCX_EV_DISPATCH_LOCAL_DSQ_OFFLINE``: a local-DSQ dispatch was redirected
  to the global DSQ because the target CPU went offline.

* ``SCX_EV_DISPATCH_KEEP_LAST``: a task continued running because no other
  task was available (only when ``SCX_OPS_ENQ_LAST`` is not set).

* ``SCX_EV_ENQ_SKIP_EXITING``: an exiting task was dispatched to the local
  DSQ directly, bypassing ops.enqueue() (only when ``SCX_OPS_ENQ_EXITING``
  is not set).

* ``SCX_EV_ENQ_SKIP_MIGRATION_DISABLED``: a migration-disabled task was
  dispatched to its local DSQ directly (only when
  ``SCX_OPS_ENQ_MIGRATION_DISABLED`` is not set).

* ``SCX_EV_REENQ_IMMED``: a task dispatched with ``SCX_ENQ_IMMED`` was
  re-enqueued because the target CPU was not available for immediate
  execution.
* ``SCX_EV_REENQ_LOCAL_REPEAT``: a reenqueue of the local DSQ triggered
  another reenqueue; recurring counts indicate incorrect ``SCX_ENQ_REENQ``
  handling in the BPF scheduler.

* ``SCX_EV_REFILL_SLICE_DFL``: a task's time slice was refilled with the
  default value (``SCX_SLICE_DFL``).

* ``SCX_EV_BYPASS_DURATION``: total nanoseconds spent in bypass mode.

* ``SCX_EV_BYPASS_DISPATCH``: number of tasks dispatched while in bypass
  mode.

* ``SCX_EV_BYPASS_ACTIVATE``: number of times bypass mode was activated.

* ``SCX_EV_INSERT_NOT_OWNED``: attempted to insert a task not owned by this
  scheduler into a DSQ; such attempts are silently ignored.

* ``SCX_EV_SUB_BYPASS_DISPATCH``: tasks dispatched from sub-scheduler bypass
  DSQs (only relevant with ``CONFIG_EXT_SUB_SCHED``).
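Because every counter occupies one ``name value`` line, the events file is
easy to consume from scripts. As a sketch, a small helper like the following
(the ``scx_event`` function name is ours, not part of the kernel interface)
pulls out a single counter; the demo runs against a sample copied from the
listing above rather than a live file such as
``/sys/kernel/sched_ext/simple/events``:

.. code-block:: shell

    #!/bin/sh
    # Sketch: extract one counter from "name value" formatted events
    # output. The helper name is illustrative only.
    scx_event() {
        awk -v name="$1" '$1 == name { print $2 }' "$2"
    }

    # Demo input mirroring the listing above:
    cat > /tmp/scx_sample_events <<'EOF'
    SCX_EV_DISPATCH_KEEP_LAST 123
    SCX_EV_REFILL_SLICE_DFL 456789
    EOF
    scx_event SCX_EV_REFILL_SLICE_DFL /tmp/scx_sample_events

Sampling a counter like this before and after a workload run gives the delta
attributable to that workload.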
``tools/sched_ext/scx_show_state.py`` is a drgn script which shows more
detailed information:

.. code-block:: none

    # tools/sched_ext/scx_show_state.py
    ops           : simple
    enabled       : 1
    switching_all : 1
    switched_all  : 1
    enable_state  : enabled (2)
    bypass_depth  : 0
    nr_rejected   : 0
    enable_seq    : 1

Whether a given task is on sched_ext can be determined as follows:

.. code-block:: none

    # grep ext /proc/self/sched
    ext.enabled : 1

The Basics
==========

Userspace can implement an arbitrary BPF scheduler by loading a set of BPF
programs that implement ``struct sched_ext_ops``. The only mandatory field is
``ops.name`` which must be a valid BPF object name. All operations are
optional. The following modified excerpt is from
``tools/sched_ext/scx_simple.bpf.c`` showing a minimal global FIFO scheduler.

.. code-block:: c

    /*
     * Decide which CPU a task should be migrated to before being
     * enqueued (either at wakeup, fork time, or exec time). If an
     * idle core is found by the default ops.select_cpu() implementation,
     * then insert the task directly into SCX_DSQ_LOCAL and skip the
     * ops.enqueue() callback.
     *
     * Note that this implementation has exactly the same behavior as the
     * default ops.select_cpu implementation. The behavior of the scheduler
     * would be exactly same if the implementation just didn't define the
     * simple_select_cpu() struct_ops prog.
     */
    s32 BPF_STRUCT_OPS(simple_select_cpu, struct task_struct *p,
                       s32 prev_cpu, u64 wake_flags)
    {
            s32 cpu;
            /* Need to initialize or the BPF verifier will reject the program */
            bool direct = false;

            cpu = scx_bpf_select_cpu_dfl(p, prev_cpu, wake_flags, &direct);
            if (direct)
                    scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);

            return cpu;
    }

    /*
     * Do a direct insertion of a task to the global DSQ. This ops.enqueue()
     * callback will only be invoked if we failed to find a core to insert
     * into in ops.select_cpu() above.
     *
     * Note that this implementation has exactly the same behavior as the
     * default ops.enqueue implementation, which just dispatches the task
     * to SCX_DSQ_GLOBAL. The behavior of the scheduler would be exactly same
     * if the implementation just didn't define the simple_enqueue struct_ops
     * prog.
     */
    void BPF_STRUCT_OPS(simple_enqueue, struct task_struct *p, u64 enq_flags)
    {
            scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags);
    }

    s32 BPF_STRUCT_OPS_SLEEPABLE(simple_init)
    {
            /*
             * By default, all SCHED_EXT, SCHED_OTHER, SCHED_IDLE, and
             * SCHED_BATCH tasks should use sched_ext.
             */
            return 0;
    }

    void BPF_STRUCT_OPS(simple_exit, struct scx_exit_info *ei)
    {
            exit_type = ei->type;
    }

    SEC(".struct_ops")
    struct sched_ext_ops simple_ops = {
            .select_cpu     = (void *)simple_select_cpu,
            .enqueue        = (void *)simple_enqueue,
            .init           = (void *)simple_init,
            .exit           = (void *)simple_exit,
            .name           = "simple",
    };

Dispatch Queues
---------------

To match the impedance between the scheduler core and the BPF scheduler,
sched_ext uses DSQs (dispatch queues) which can operate as both a FIFO and a
priority queue. By default, there is one global FIFO (``SCX_DSQ_GLOBAL``),
and one local DSQ per CPU (``SCX_DSQ_LOCAL``). The BPF scheduler can manage
an arbitrary number of DSQs using ``scx_bpf_create_dsq()`` and
``scx_bpf_destroy_dsq()``.

A CPU always executes a task from its local DSQ. A task is "inserted" into a
DSQ. A task in a non-local DSQ is "move"d into the target CPU's local DSQ.
When a CPU is looking for the next task to run, if the local DSQ is not
empty, the first task is picked. Otherwise, the CPU tries to move a task from
the global DSQ. If that doesn't yield a runnable task either,
``ops.dispatch()`` is invoked.

Scheduling Cycle
----------------

The following briefly shows how a waking task is scheduled and executed.

1. When a task is waking up, ``ops.select_cpu()`` is the first operation
   invoked. This serves two purposes. First, it provides a CPU selection
   optimization hint. Second, it wakes up the selected CPU if it is idle.

   The CPU selected by ``ops.select_cpu()`` is an optimization hint and not
   binding. The actual decision is made at the last step of scheduling.
   However, there is a small performance gain if the CPU
   ``ops.select_cpu()`` returns matches the CPU the task eventually runs on.

   A side-effect of selecting a CPU is waking it up from idle. While a BPF
   scheduler can wake up any cpu using the ``scx_bpf_kick_cpu()`` helper,
   using ``ops.select_cpu()`` judiciously can be simpler and more efficient.

   Note that the scheduler core will ignore an invalid CPU selection, for
   example, if it's outside the allowed cpumask of the task.

   A task can be immediately inserted into a DSQ from ``ops.select_cpu()``
   by calling ``scx_bpf_dsq_insert()`` or ``scx_bpf_dsq_insert_vtime()``. If
   the task is inserted into ``SCX_DSQ_LOCAL`` from ``ops.select_cpu()``, it
   will be added to the local DSQ of whichever CPU is returned from
   ``ops.select_cpu()``. Additionally, inserting directly from
   ``ops.select_cpu()`` will cause the ``ops.enqueue()`` callback to be
   skipped.

   Any other attempt to store a task in BPF-internal data structures from
   ``ops.select_cpu()`` does not prevent ``ops.enqueue()`` from being
   invoked. This is discouraged, as it can introduce racy behavior or
   inconsistent state.
While a BPF scheduler can wake up any CPU using the ``scx_bpf_kick_cpu()`` helper, using ``ops.select_cpu()`` judiciously can be simpler and more efficient.h](hpA side-effect of selecting a CPU is waking it up from idle. While a BPF scheduler can wake up any CPU using the }(hjfhhhNhNubj)}(h``scx_bpf_kick_cpu()``h]hscx_bpf_kick_cpu()}(hjnhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjfubh helper, using }(hjfhhhNhNubj)}(h``ops.select_cpu()``h]hops.select_cpu()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjfubh/ judiciously can be simpler and more efficient.}(hjfhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubh)}(hNote that the scheduler core will ignore an invalid CPU selection, for example, if it's outside the allowed cpumask of the task.h]hNote that the scheduler core will ignore an invalid CPU selection, for example, if it’s outside the allowed cpumask of the task.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubh)}(hA task can be immediately inserted into a DSQ from ``ops.select_cpu()`` by calling ``scx_bpf_dsq_insert()`` or ``scx_bpf_dsq_insert_vtime()``.h](h3A task can be immediately inserted into a DSQ from }(hjhhhNhNubj)}(h``ops.select_cpu()``h]hops.select_cpu()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh by calling }(hjhhhNhNubj)}(h``scx_bpf_dsq_insert()``h]hscx_bpf_dsq_insert()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh or }(hjhhhNhNubj)}(h``scx_bpf_dsq_insert_vtime()``h]hscx_bpf_dsq_insert_vtime()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubh)}(hXIf the task is inserted into ``SCX_DSQ_LOCAL`` from ``ops.select_cpu()``, it will be added to the local DSQ of whichever CPU is returned from ``ops.select_cpu()``. 
Additionally, inserting directly from ``ops.select_cpu()`` will cause the ``ops.enqueue()`` callback to be skipped.h](hIf the task is inserted into }(hjhhhNhNubj)}(h``SCX_DSQ_LOCAL``h]h SCX_DSQ_LOCAL}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh from }(hjhhhNhNubj)}(h``ops.select_cpu()``h]hops.select_cpu()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubhF, it will be added to the local DSQ of whichever CPU is returned from }(hjhhhNhNubj)}(h``ops.select_cpu()``h]hops.select_cpu()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh(. Additionally, inserting directly from }(hjhhhNhNubj)}(h``ops.select_cpu()``h]hops.select_cpu()}(hj( hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh will cause the }(hjhhhNhNubj)}(h``ops.enqueue()``h]h ops.enqueue()}(hj: hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh callback to be skipped.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubh)}(hAny other attempt to store a task in BPF-internal data structures from ``ops.select_cpu()`` does not prevent ``ops.enqueue()`` from being invoked. This is discouraged, as it can introduce racy behavior or inconsistent state.h](hGAny other attempt to store a task in BPF-internal data structures from }(hjR hhhNhNubj)}(h``ops.select_cpu()``h]hops.select_cpu()}(hjZ hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjR ubh does not prevent }(hjR hhhNhNubj)}(h``ops.enqueue()``h]h ops.enqueue()}(hjl hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjR ubhb from being invoked. This is discouraged, as it can introduce racy behavior or inconsistent state.}(hjR hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM$hjubeh}(h]h ]h"]h$]h&]uh1hhj hhhhhNubh)}(hX Once the target CPU is selected, ``ops.enqueue()`` is invoked (unless the task was inserted directly from ``ops.select_cpu()``). ``ops.enqueue()`` can make one of the following decisions: * Immediately insert the task into either the global or a local DSQ by calling ``scx_bpf_dsq_insert()`` with one of the following options: ``SCX_DSQ_GLOBAL``, ``SCX_DSQ_LOCAL``, or ``SCX_DSQ_LOCAL_ON | cpu``. 
* Immediately insert the task into a custom DSQ by calling ``scx_bpf_dsq_insert()`` with a DSQ ID which is smaller than 2^63. * Queue the task on the BPF side. **Task State Tracking and ops.dequeue() Semantics** A task is in the "BPF scheduler's custody" when the BPF scheduler is responsible for managing its lifecycle. A task enters custody when it is dispatched to a user DSQ or stored in the BPF scheduler's internal data structures. Custody is entered only from ``ops.enqueue()`` for those operations. The only exception is dispatching to a user DSQ from ``ops.select_cpu()``: although the task is not yet technically in BPF scheduler custody at that point, the dispatch has the same semantic effect as dispatching from ``ops.enqueue()`` for custody-related purposes. Once ``ops.enqueue()`` is called, the task may or may not enter custody depending on what the scheduler does: * **Directly dispatched to terminal DSQs** (``SCX_DSQ_LOCAL``, ``SCX_DSQ_LOCAL_ON | cpu``, or ``SCX_DSQ_GLOBAL``): the BPF scheduler is done with the task - it either goes straight to a CPU's local run queue or to the global DSQ as a fallback. The task never enters (or exits) BPF custody, and ``ops.dequeue()`` will not be called. * **Dispatch to user-created DSQs** (custom DSQs): the task enters the BPF scheduler's custody. When the task later leaves BPF custody (dispatched to a terminal DSQ, picked by core-sched, or dequeued for sleep/property changes), ``ops.dequeue()`` will be called exactly once. * **Stored in BPF data structures** (e.g., internal BPF queues): the task is in BPF custody. ``ops.dequeue()`` will be called when it leaves (e.g., when ``ops.dispatch()`` moves it to a terminal DSQ, or on property change / sleep). When a task leaves BPF scheduler custody, ``ops.dequeue()`` is invoked. The dequeue can happen for different reasons, distinguished by flags: 1. 
**Regular dispatch**: when a task in BPF custody is dispatched to a terminal DSQ from ``ops.dispatch()`` (leaving BPF custody for execution), ``ops.dequeue()`` is triggered without any special flags. 2. **Core scheduling pick**: when ``CONFIG_SCHED_CORE`` is enabled and core scheduling picks a task for execution while it's still in BPF custody, ``ops.dequeue()`` is called with the ``SCX_DEQ_CORE_SCHED_EXEC`` flag. 3. **Scheduling property change**: when a task property changes (via operations like ``sched_setaffinity()``, ``sched_setscheduler()``, priority changes, CPU migrations, etc.) while the task is still in BPF custody, ``ops.dequeue()`` is called with the ``SCX_DEQ_SCHED_CHANGE`` flag set in ``deq_flags``. **Important**: Once a task has left BPF custody (e.g., after being dispatched to a terminal DSQ), property changes will not trigger ``ops.dequeue()``, since the task is no longer managed by the BPF scheduler. h](h)}(hOnce the target CPU is selected, ``ops.enqueue()`` is invoked (unless the task was inserted directly from ``ops.select_cpu()``). ``ops.enqueue()`` can make one of the following decisions:h](h!Once the target CPU is selected, }(hj hhhNhNubj)}(h``ops.enqueue()``h]h ops.enqueue()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubh8 is invoked (unless the task was inserted directly from }(hj hhhNhNubj)}(h``ops.select_cpu()``h]hops.select_cpu()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubh). }(hj hhhNhNubj)}(h``ops.enqueue()``h]h ops.enqueue()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubh) can make one of the following decisions:}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM)hj ubh)}(hhh](h)}(hImmediately insert the task into either the global or a local DSQ by calling ``scx_bpf_dsq_insert()`` with one of the following options: ``SCX_DSQ_GLOBAL``, ``SCX_DSQ_LOCAL``, or ``SCX_DSQ_LOCAL_ON | cpu``. 
h]h)}(hImmediately insert the task into either the global or a local DSQ by calling ``scx_bpf_dsq_insert()`` with one of the following options: ``SCX_DSQ_GLOBAL``, ``SCX_DSQ_LOCAL``, or ``SCX_DSQ_LOCAL_ON | cpu``.h](hMImmediately insert the task into either the global or a local DSQ by calling }(hj hhhNhNubj)}(h``scx_bpf_dsq_insert()``h]hscx_bpf_dsq_insert()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubh$ with one of the following options: }(hj hhhNhNubj)}(h``SCX_DSQ_GLOBAL``h]hSCX_DSQ_GLOBAL}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubh, }(hj hhhNhNubj)}(h``SCX_DSQ_LOCAL``h]h SCX_DSQ_LOCAL}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubh, or }(hj hhhNhNubj)}(h``SCX_DSQ_LOCAL_ON | cpu``h]hSCX_DSQ_LOCAL_ON | cpu}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubh.}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM-hj ubah}(h]h ]h"]h$]h&]uh1hhj ubh)}(h|Immediately insert the task into a custom DSQ by calling ``scx_bpf_dsq_insert()`` with a DSQ ID which is smaller than 2^63. h]h)}(h{Immediately insert the task into a custom DSQ by calling ``scx_bpf_dsq_insert()`` with a DSQ ID which is smaller than 2^63.h](h9Immediately insert the task into a custom DSQ by calling }(hj9 hhhNhNubj)}(h``scx_bpf_dsq_insert()``h]hscx_bpf_dsq_insert()}(hjA hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj9 ubh* with a DSQ ID which is smaller than 2^63.}(hj9 hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM1hj5 ubah}(h]h ]h"]h$]h&]uh1hhj ubh)}(h Queue the task on the BPF side. h]h)}(hQueue the task on the BPF side.h]hQueue the task on the BPF side.}(hjc hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM4hj_ ubah}(h]h ]h"]h$]h&]uh1hhj ubeh}(h]h ]h"]h$]h&]jjuh1hhhhM-hj ubh)}(h3**Task State Tracking and ops.dequeue() Semantics**h]hstrong)}(hj h]h/Task State Tracking and ops.dequeue() Semantics}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j hj} ubah}(h]h ]h"]h$]h&]uh1hhhhM6hj ubh)}(hX0A task is in the "BPF scheduler's custody" when the BPF scheduler is responsible for managing its lifecycle. 
A task enters custody when it is dispatched to a user DSQ or stored in the BPF scheduler's internal data structures. Custody is entered only from ``ops.enqueue()`` for those operations. The only exception is dispatching to a user DSQ from ``ops.select_cpu()``: although the task is not yet technically in BPF scheduler custody at that point, the dispatch has the same semantic effect as dispatching from ``ops.enqueue()`` for custody-related purposes.h](hXA task is in the “BPF scheduler’s custody” when the BPF scheduler is responsible for managing its lifecycle. A task enters custody when it is dispatched to a user DSQ or stored in the BPF scheduler’s internal data structures. Custody is entered only from }(hj hhhNhNubj)}(h``ops.enqueue()``h]h ops.enqueue()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubhL for those operations. The only exception is dispatching to a user DSQ from }(hj hhhNhNubj)}(h``ops.select_cpu()``h]hops.select_cpu()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubh: although the task is not yet technically in BPF scheduler custody at that point, the dispatch has the same semantic effect as dispatching from }(hj hhhNhNubj)}(h``ops.enqueue()``h]h ops.enqueue()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubh for custody-related purposes.}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM8hj ubh)}(hmOnce ``ops.enqueue()`` is called, the task may or may not enter custody depending on what the scheduler does:h](hOnce }(hj hhhNhNubj)}(h``ops.enqueue()``h]h ops.enqueue()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubhW is called, the task may or may not enter custody depending on what the scheduler does:}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMBhj ubh)}(hhh](h)}(hXJ**Directly dispatched to terminal DSQs** (``SCX_DSQ_LOCAL``, ``SCX_DSQ_LOCAL_ON | cpu``, or ``SCX_DSQ_GLOBAL``): the BPF scheduler is done with the task - it either goes straight to a CPU's local run queue or to the global DSQ as a fallback. The task never enters (or exits) BPF custody, and ``ops.dequeue()`` will not be called. 
h]h)}(hXI**Directly dispatched to terminal DSQs** (``SCX_DSQ_LOCAL``, ``SCX_DSQ_LOCAL_ON | cpu``, or ``SCX_DSQ_GLOBAL``): the BPF scheduler is done with the task - it either goes straight to a CPU's local run queue or to the global DSQ as a fallback. The task never enters (or exits) BPF custody, and ``ops.dequeue()`` will not be called.h](j )}(h(**Directly dispatched to terminal DSQs**h]h$Directly dispatched to terminal DSQs}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j hj ubh (}(hj hhhNhNubj)}(h``SCX_DSQ_LOCAL``h]h SCX_DSQ_LOCAL}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubh, }(hj hhhNhNubj)}(h``SCX_DSQ_LOCAL_ON | cpu``h]hSCX_DSQ_LOCAL_ON | cpu}(hj) hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubh, or }(hj hhhNhNubj)}(h``SCX_DSQ_GLOBAL``h]hSCX_DSQ_GLOBAL}(hj; hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubh): the BPF scheduler is done with the task - it either goes straight to a CPU’s local run queue or to the global DSQ as a fallback. The task never enters (or exits) BPF custody, and }(hj hhhNhNubj)}(h``ops.dequeue()``h]h ops.dequeue()}(hjM hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubh will not be called.}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMEhj ubah}(h]h ]h"]h$]h&]uh1hhj ubh)}(hX**Dispatch to user-created DSQs** (custom DSQs): the task enters the BPF scheduler's custody. When the task later leaves BPF custody (dispatched to a terminal DSQ, picked by core-sched, or dequeued for sleep/property changes), ``ops.dequeue()`` will be called exactly once. h]h)}(hX**Dispatch to user-created DSQs** (custom DSQs): the task enters the BPF scheduler's custody. When the task later leaves BPF custody (dispatched to a terminal DSQ, picked by core-sched, or dequeued for sleep/property changes), ``ops.dequeue()`` will be called exactly once.h](j )}(h!**Dispatch to user-created DSQs**h]hDispatch to user-created DSQs}(hjs hhhNhNubah}(h]h ]h"]h$]h&]uh1j hjo ubh (custom DSQs): the task enters the BPF scheduler’s custody. 
When the task later leaves BPF custody (dispatched to a terminal DSQ, picked by core-sched, or dequeued for sleep/property changes), }(hjo hhhNhNubj)}(h``ops.dequeue()``h]h ops.dequeue()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjo ubh will be called exactly once.}(hjo hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMKhjk ubah}(h]h ]h"]h$]h&]uh1hhj ubh)}(h**Stored in BPF data structures** (e.g., internal BPF queues): the task is in BPF custody. ``ops.dequeue()`` will be called when it leaves (e.g., when ``ops.dispatch()`` moves it to a terminal DSQ, or on property change / sleep). h]h)}(h**Stored in BPF data structures** (e.g., internal BPF queues): the task is in BPF custody. ``ops.dequeue()`` will be called when it leaves (e.g., when ``ops.dispatch()`` moves it to a terminal DSQ, or on property change / sleep).h](j )}(h!**Stored in BPF data structures**h]hStored in BPF data structures}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j hj ubh: (e.g., internal BPF queues): the task is in BPF custody. }(hj hhhNhNubj)}(h``ops.dequeue()``h]h ops.dequeue()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubh+ will be called when it leaves (e.g., when }(hj hhhNhNubj)}(h``ops.dispatch()``h]hops.dispatch()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubh< moves it to a terminal DSQ, or on property change / sleep).}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMQhj ubah}(h]h ]h"]h$]h&]uh1hhj ubeh}(h]h ]h"]h$]h&]jjuh1hhhhMEhj ubh)}(hWhen a task leaves BPF scheduler custody, ``ops.dequeue()`` is invoked. The dequeue can happen for different reasons, distinguished by flags:h](h*When a task leaves BPF scheduler custody, }(hj hhhNhNubj)}(h``ops.dequeue()``h]h ops.dequeue()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubhR is invoked. 
The dequeue can happen for different reasons, distinguished by flags:}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMVhj ubj )}(hhh](h)}(h**Regular dispatch**: when a task in BPF custody is dispatched to a terminal DSQ from ``ops.dispatch()`` (leaving BPF custody for execution), ``ops.dequeue()`` is triggered without any special flags. h]h)}(h**Regular dispatch**: when a task in BPF custody is dispatched to a terminal DSQ from ``ops.dispatch()`` (leaving BPF custody for execution), ``ops.dequeue()`` is triggered without any special flags.h](j )}(h**Regular dispatch**h]hRegular dispatch}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j hj ubhB: when a task in BPF custody is dispatched to a terminal DSQ from }(hj hhhNhNubj)}(h``ops.dispatch()``h]hops.dispatch()}(hj0 hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubh& (leaving BPF custody for execution), }(hj hhhNhNubj)}(h``ops.dequeue()``h]h ops.dequeue()}(hjB hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubh( is triggered without any special flags.}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMYhj ubah}(h]h ]h"]h$]h&]uh1hhj ubh)}(h**Core scheduling pick**: when ``CONFIG_SCHED_CORE`` is enabled and core scheduling picks a task for execution while it's still in BPF custody, ``ops.dequeue()`` is called with the ``SCX_DEQ_CORE_SCHED_EXEC`` flag. 
h]h)}(h**Core scheduling pick**: when ``CONFIG_SCHED_CORE`` is enabled and core scheduling picks a task for execution while it's still in BPF custody, ``ops.dequeue()`` is called with the ``SCX_DEQ_CORE_SCHED_EXEC`` flag.h](j )}(h**Core scheduling pick**h]hCore scheduling pick}(hjh hhhNhNubah}(h]h ]h"]h$]h&]uh1j hjd ubh: when }(hjd hhhNhNubj)}(h``CONFIG_SCHED_CORE``h]hCONFIG_SCHED_CORE}(hjz hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjd ubh^ is enabled and core scheduling picks a task for execution while it’s still in BPF custody, }(hjd hhhNhNubj)}(h``ops.dequeue()``h]h ops.dequeue()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjd ubh is called with the }(hjd hhhNhNubj)}(h``SCX_DEQ_CORE_SCHED_EXEC``h]hSCX_DEQ_CORE_SCHED_EXEC}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjd ubh flag.}(hjd hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM]hj` ubah}(h]h ]h"]h$]h&]uh1hhj ubh)}(hX.**Scheduling property change**: when a task property changes (via operations like ``sched_setaffinity()``, ``sched_setscheduler()``, priority changes, CPU migrations, etc.) while the task is still in BPF custody, ``ops.dequeue()`` is called with the ``SCX_DEQ_SCHED_CHANGE`` flag set in ``deq_flags``. h]h)}(hX-**Scheduling property change**: when a task property changes (via operations like ``sched_setaffinity()``, ``sched_setscheduler()``, priority changes, CPU migrations, etc.) while the task is still in BPF custody, ``ops.dequeue()`` is called with the ``SCX_DEQ_SCHED_CHANGE`` flag set in ``deq_flags``.h](j )}(h**Scheduling property change**h]hScheduling property change}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j hj ubh4: when a task property changes (via operations like }(hj hhhNhNubj)}(h``sched_setaffinity()``h]hsched_setaffinity()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubh, }(hj hhhNhNubj)}(h``sched_setscheduler()``h]hsched_setscheduler()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubhR, priority changes, CPU migrations, etc.) 
while the task is still in BPF custody, }(hj hhhNhNubj)}(h``ops.dequeue()``h]h ops.dequeue()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubh is called with the }(hj hhhNhNubj)}(h``SCX_DEQ_SCHED_CHANGE``h]hSCX_DEQ_SCHED_CHANGE}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubh flag set in }(hj hhhNhNubj)}(h ``deq_flags``h]h deq_flags}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubh.}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMbhj ubah}(h]h ]h"]h$]h&]uh1hhj ubeh}(h]h ]h"]h$]h&]enumtypearabicprefixhsuffix.uh1j hj ubh)}(h**Important**: Once a task has left BPF custody (e.g., after being dispatched to a terminal DSQ), property changes will not trigger ``ops.dequeue()``, since the task is no longer managed by the BPF scheduler.h](j )}(h **Important**h]h Important}(hjK hhhNhNubah}(h]h ]h"]h$]h&]uh1j hjG ubhw: Once a task has left BPF custody (e.g., after being dispatched to a terminal DSQ), property changes will not trigger }(hjG hhhNhNubj)}(h``ops.dequeue()``h]h ops.dequeue()}(hj] hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjG ubh;, since the task is no longer managed by the BPF scheduler.}(hjG hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhhj ubeh}(h]h ]h"]h$]h&]uh1hhj hhhhhNubh)}(hXWhen a CPU is ready to schedule, it first looks at its local DSQ. If empty, it then looks at the global DSQ. If there still isn't a task to run, ``ops.dispatch()`` is invoked which can use the following two functions to populate the local DSQ. * ``scx_bpf_dsq_insert()`` inserts a task to a DSQ. Any target DSQ can be used - ``SCX_DSQ_LOCAL``, ``SCX_DSQ_LOCAL_ON | cpu``, ``SCX_DSQ_GLOBAL`` or a custom DSQ. While ``scx_bpf_dsq_insert()`` currently can't be called with BPF locks held, this is being worked on and will be supported. ``scx_bpf_dsq_insert()`` schedules insertion rather than performing them immediately. There can be up to ``ops.dispatch_max_batch`` pending tasks. * ``scx_bpf_dsq_move_to_local()`` moves a task from the specified non-local DSQ to the dispatching DSQ. This function cannot be called with any BPF locks held. 
``scx_bpf_dsq_move_to_local()`` flushes the pending insertions before trying to move from the specified DSQ. h](h)}(hWhen a CPU is ready to schedule, it first looks at its local DSQ. If empty, it then looks at the global DSQ. If there still isn't a task to run, ``ops.dispatch()`` is invoked which can use the following two functions to populate the local DSQ.h](hWhen a CPU is ready to schedule, it first looks at its local DSQ. If empty, it then looks at the global DSQ. If there still isn’t a task to run, }(hj hhhNhNubj)}(h``ops.dispatch()``h]hops.dispatch()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubhP is invoked which can use the following two functions to populate the local DSQ.}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMmhj{ ubh)}(hhh](h)}(hX``scx_bpf_dsq_insert()`` inserts a task into a DSQ. Any target DSQ can be used - ``SCX_DSQ_LOCAL``, ``SCX_DSQ_LOCAL_ON | cpu``, ``SCX_DSQ_GLOBAL`` or a custom DSQ. While ``scx_bpf_dsq_insert()`` currently can't be called with BPF locks held, this is being worked on and will be supported. ``scx_bpf_dsq_insert()`` schedules insertions rather than performing them immediately. There can be up to ``ops.dispatch_max_batch`` pending tasks. h]h)}(hX``scx_bpf_dsq_insert()`` inserts a task into a DSQ. Any target DSQ can be used - ``SCX_DSQ_LOCAL``, ``SCX_DSQ_LOCAL_ON | cpu``, ``SCX_DSQ_GLOBAL`` or a custom DSQ. While ``scx_bpf_dsq_insert()`` currently can't be called with BPF locks held, this is being worked on and will be supported. ``scx_bpf_dsq_insert()`` schedules insertions rather than performing them immediately. There can be up to ``ops.dispatch_max_batch`` pending tasks.h](j)}(h``scx_bpf_dsq_insert()``h]hscx_bpf_dsq_insert()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubh7 inserts a task into a DSQ. 
Any target DSQ can be used - }(hj hhhNhNubj)}(h``SCX_DSQ_LOCAL``h]h SCX_DSQ_LOCAL}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubh, }(hj hhhNhNubj)}(h``SCX_DSQ_LOCAL_ON | cpu``h]hSCX_DSQ_LOCAL_ON | cpu}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubh, }(hj hhhNhNubj)}(h``SCX_DSQ_GLOBAL``h]hSCX_DSQ_GLOBAL}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubh or a custom DSQ. While }(hj hhhNhNubj)}(h``scx_bpf_dsq_insert()``h]hscx_bpf_dsq_insert()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubha currently can’t be called with BPF locks held, this is being worked on and will be supported. }(hj hhhNhNubj)}(h``scx_bpf_dsq_insert()``h]hscx_bpf_dsq_insert()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubhQ schedules insertions rather than performing them immediately. There can be up to }(hj hhhNhNubj)}(h``ops.dispatch_max_batch``h]hops.dispatch_max_batch}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubh pending tasks.}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMrhj ubah}(h]h ]h"]h$]h&]uh1hhj ubh)}(hX``scx_bpf_dsq_move_to_local()`` moves a task from the specified non-local DSQ to the dispatching DSQ. This function cannot be called with any BPF locks held. ``scx_bpf_dsq_move_to_local()`` flushes the pending insertions before trying to move from the specified DSQ. h]h)}(hX``scx_bpf_dsq_move_to_local()`` moves a task from the specified non-local DSQ to the dispatching DSQ. This function cannot be called with any BPF locks held. ``scx_bpf_dsq_move_to_local()`` flushes the pending insertions before trying to move from the specified DSQ.h](j)}(h``scx_bpf_dsq_move_to_local()``h]hscx_bpf_dsq_move_to_local()}(hj<hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj8ubh moves a task from the specified non-local DSQ to the dispatching DSQ. This function cannot be called with any BPF locks held. 
}(hj8hhhNhNubj)}(h``scx_bpf_dsq_move_to_local()``h]hscx_bpf_dsq_move_to_local()}(hjNhhhNhNubah}(h]h ]h"]h$]h&]uh1jhj8ubhS flushes the pending insertions before trying to move from the specified DSQ.}(hj8hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMzhj4ubah}(h]h ]h"]h$]h&]uh1hhj ubeh}(h]h ]h"]h$]h&]jjuh1hhhhMrhj{ ubeh}(h]h ]h"]h$]h&]uh1hhj hhhNhNubh)}(hXAfter ``ops.dispatch()`` returns, if there are tasks in the local DSQ, the CPU runs the first one. If empty, the following steps are taken: * Try to move from the global DSQ. If successful, run the task. * If ``ops.dispatch()`` has dispatched any tasks, retry #3. * If the previous task is an SCX task and still runnable, keep executing it (see ``SCX_OPS_ENQ_LAST``). * Go idle. h](h)}(hAfter ``ops.dispatch()`` returns, if there are tasks in the local DSQ, the CPU runs the first one. If empty, the following steps are taken:h](hAfter }(hj|hhhNhNubj)}(h``ops.dispatch()``h]hops.dispatch()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhj|ubhs returns, if there are tasks in the local DSQ, the CPU runs the first one. If empty, the following steps are taken:}(hj|hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjxubh)}(hhh](h)}(h>Try to move from the global DSQ. If successful, run the task. h]h)}(h=Try to move from the global DSQ. If successful, run the task.h]h=Try to move from the global DSQ. If successful, run the task.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1hhjubh)}(h:If ``ops.dispatch()`` has dispatched any tasks, retry #3. h]h)}(h9If ``ops.dispatch()`` has dispatched any tasks, retry #3.h](hIf }(hjhhhNhNubj)}(h``ops.dispatch()``h]hops.dispatch()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh$ has dispatched any tasks, retry #3.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1hhjubh)}(hfIf the previous task is an SCX task and still runnable, keep executing it (see ``SCX_OPS_ENQ_LAST``). 
h]h)}(heIf the previous task is an SCX task and still runnable, keep executing it (see ``SCX_OPS_ENQ_LAST``).h](hOIf the previous task is an SCX task and still runnable, keep executing it (see }(hjhhhNhNubj)}(h``SCX_OPS_ENQ_LAST``h]hSCX_OPS_ENQ_LAST}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh).}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1hhjubh)}(h Go idle. h]h)}(hGo idle.h]hGo idle.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj ubah}(h]h ]h"]h$]h&]uh1hhjubeh}(h]h ]h"]h$]h&]jjuh1hhhhMhjxubeh}(h]h ]h"]h$]h&]uh1hhj hhhNhNubeh}(h]h ]h"]h$]h&]jB jC jD hjE jF uh1j hjhhhhhM ubh)}(hXONote that the BPF scheduler can always choose to dispatch tasks immediately in ``ops.enqueue()`` as illustrated in the above simple example. If only the built-in DSQs are used, there is no need to implement ``ops.dispatch()`` as a task is never queued on the BPF scheduler and both the local and global DSQs are executed automatically.h](hONote that the BPF scheduler can always choose to dispatch tasks immediately in }(hj5hhhNhNubj)}(h``ops.enqueue()``h]h ops.enqueue()}(hj=hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj5ubho as illustrated in the above simple example. If only the built-in DSQs are used, there is no need to implement }(hj5hhhNhNubj)}(h``ops.dispatch()``h]hops.dispatch()}(hjOhhhNhNubah}(h]h ]h"]h$]h&]uh1jhj5ubhn as a task is never queued on the BPF scheduler and both the local and global DSQs are executed automatically.}(hj5hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjhhubh)}(hX``scx_bpf_dsq_insert()`` inserts the task on the FIFO of the target DSQ. Use ``scx_bpf_dsq_insert_vtime()`` for the priority queue. Internal DSQs such as ``SCX_DSQ_LOCAL`` and ``SCX_DSQ_GLOBAL`` do not support priority-queue dispatching, and must be dispatched to with ``scx_bpf_dsq_insert()``. 
See the function documentation and usage in ``tools/sched_ext/scx_simple.bpf.c`` for more information.h](j)}(h``scx_bpf_dsq_insert()``h]hscx_bpf_dsq_insert()}(hjkhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjgubh5 inserts the task on the FIFO of the target DSQ. Use }(hjghhhNhNubj)}(h``scx_bpf_dsq_insert_vtime()``h]hscx_bpf_dsq_insert_vtime()}(hj}hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjgubh/ for the priority queue. Internal DSQs such as }(hjghhhNhNubj)}(h``SCX_DSQ_LOCAL``h]h SCX_DSQ_LOCAL}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjgubh and }(hjghhhNhNubj)}(h``SCX_DSQ_GLOBAL``h]hSCX_DSQ_GLOBAL}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjgubhK do not support priority-queue dispatching, and must be dispatched to with }(hjghhhNhNubj)}(h``scx_bpf_dsq_insert()``h]hscx_bpf_dsq_insert()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjgubh.. See the function documentation and usage in }(hjghhhNhNubj)}(h$``tools/sched_ext/scx_simple.bpf.c``h]h tools/sched_ext/scx_simple.bpf.c}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjgubh for more information.}(hjghhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjhhubeh}(h]scheduling-cycleah ]h"]scheduling cycleah$]h&]uh1hhjhhhhhMubh)}(hhh](h)}(hTask Lifecycleh]hTask Lifecycle}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhhhMubh)}(hwThe following pseudo-code presents a rough overview of the entire lifecycle of a task managed by a sched_ext scheduler:h]hwThe following pseudo-code presents a rough overview of the entire lifecycle of a task managed by a sched_ext scheduler:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjhhubj)}(hXUops.init_task(); /* A new task is created */ ops.enable(); /* Enable BPF scheduling for the task */ while (task in SCHED_EXT) { if (task can migrate) ops.select_cpu(); /* Called on wakeup (optimization) */ ops.runnable(); /* Task becomes ready to run */ while (task_is_runnable(task)) { if (task is not in a DSQ || task->scx.slice == 0) { ops.enqueue(); /* Task can be added to a DSQ */ /* Task property change (i.e., affinity, nice, etc.)? 
*/ if (sched_change(task)) { ops.dequeue(); /* Exiting BPF scheduler custody */ ops.quiescent(); /* Property change callback, e.g. ops.set_weight() */ ops.runnable(); continue; } /* Any usable CPU becomes available */ ops.dispatch(); /* Task is moved to a local DSQ */ ops.dequeue(); /* Exiting BPF scheduler custody */ } ops.running(); /* Task starts running on its assigned CPU */ while (task_is_runnable(task) && task->scx.slice > 0) { ops.tick(); /* Called every 1/HZ seconds */ if (task->scx.slice == 0) ops.dispatch(); /* task->scx.slice can be refilled */ } ops.stopping(); /* Task stops running (time slice expires or wait) */ } ops.quiescent(); /* Task releases its assigned CPU (wait) */ } ops.disable(); /* Disable BPF scheduling for the task */ ops.exit_task(); /* Task is destroyed */h]hXUops.init_task(); /* A new task is created */ ops.enable(); /* Enable BPF scheduling for the task */ while (task in SCHED_EXT) { if (task can migrate) ops.select_cpu(); /* Called on wakeup (optimization) */ ops.runnable(); /* Task becomes ready to run */ while (task_is_runnable(task)) { if (task is not in a DSQ || task->scx.slice == 0) { ops.enqueue(); /* Task can be added to a DSQ */ /* Task property change (i.e., affinity, nice, etc.)? */ if (sched_change(task)) { ops.dequeue(); /* Exiting BPF scheduler custody */ ops.quiescent(); /* Property change callback, e.g. 
ops.set_weight() */ ops.runnable(); continue; } /* Any usable CPU becomes available */ ops.dispatch(); /* Task is moved to a local DSQ */ ops.dequeue(); /* Exiting BPF scheduler custody */ } ops.running(); /* Task starts running on its assigned CPU */ while (task_is_runnable(task) && task->scx.slice > 0) { ops.tick(); /* Called every 1/HZ seconds */ if (task->scx.slice == 0) ops.dispatch(); /* task->scx.slice can be refilled */ } ops.stopping(); /* Task stops running (time slice expires or wait) */ } ops.quiescent(); /* Task releases its assigned CPU (wait) */ } ops.disable(); /* Disable BPF scheduling for the task */ ops.exit_task(); /* Task is destroyed */}hjsbah}(h]h ]h"]h$]h&]jjjjjMj}uh1jhhhMhjhhubh)}(huNote that the above pseudo-code does not cover all possible state transitions and edge cases, to name a few examples:h]huNote that the above pseudo-code does not cover all possible state transitions and edge cases, to name a few examples:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjhhubh)}(hhh](h)}(h``ops.dispatch()`` may fail to move the task to a local DSQ due to a racing property change on that task, in which case ``ops.dispatch()`` will be retried. h]h)}(h``ops.dispatch()`` may fail to move the task to a local DSQ due to a racing property change on that task, in which case ``ops.dispatch()`` will be retried.h](j)}(h``ops.dispatch()``h]hops.dispatch()}(hj,hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj(ubhf may fail to move the task to a local DSQ due to a racing property change on that task, in which case }(hj(hhhNhNubj)}(h``ops.dispatch()``h]hops.dispatch()}(hj>hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj(ubh will be retried.}(hj(hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhj$ubah}(h]h ]h"]h$]h&]uh1hhj!hhhhhNubh)}(hThe task may be direct-dispatched to a local DSQ from ``ops.enqueue()``, in which case ``ops.dispatch()`` and ``ops.dequeue()`` are skipped and we go straight to ``ops.running()``. 
h]h)}(hThe task may be direct-dispatched to a local DSQ from ``ops.enqueue()``, in which case ``ops.dispatch()`` and ``ops.dequeue()`` are skipped and we go straight to ``ops.running()``.h](h6The task may be direct-dispatched to a local DSQ from }(hj`hhhNhNubj)}(h``ops.enqueue()``h]h ops.enqueue()}(hjhhhhNhNubah}(h]h ]h"]h$]h&]uh1jhj`ubh, in which case }(hj`hhhNhNubj)}(h``ops.dispatch()``h]hops.dispatch()}(hjzhhhNhNubah}(h]h ]h"]h$]h&]uh1jhj`ubh and }(hj`hhhNhNubj)}(h``ops.dequeue()``h]h ops.dequeue()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhj`ubh# are skipped and we go straight to }(hj`hhhNhNubj)}(h``ops.running()``h]h ops.running()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhj`ubh.}(hj`hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhj\ubah}(h]h ]h"]h$]h&]uh1hhj!hhhhhNubh)}(hXXProperty changes may occur at virtually any point during the task's lifecycle, not just when the task is queued and waiting to be dispatched. For example, changing a property of a running task will lead to the callback sequence ``ops.stopping()`` -> ``ops.quiescent()`` -> (property change callback) -> ``ops.runnable()`` -> ``ops.running()``. h]h)}(hXWProperty changes may occur at virtually any point during the task's lifecycle, not just when the task is queued and waiting to be dispatched. For example, changing a property of a running task will lead to the callback sequence ``ops.stopping()`` -> ``ops.quiescent()`` -> (property change callback) -> ``ops.runnable()`` -> ``ops.running()``.h](hProperty changes may occur at virtually any point during the task’s lifecycle, not just when the task is queued and waiting to be dispatched. 
For example, changing a property of a running task will lead to the callback sequence }(hjhhhNhNubj)}(h``ops.stopping()``h]hops.stopping()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh -> }(hjhhhNhNubj)}(h``ops.quiescent()``h]hops.quiescent()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh" -> (property change callback) -> }(hjhhhNhNubj)}(h``ops.runnable()``h]hops.runnable()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh -> }hjsbj)}(h``ops.running()``h]h ops.running()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1hhj!hhhhhNubh)}(hA sched_ext task can be preempted by a task from a higher-priority scheduling class, in which case it will exit the tick-dispatch loop even though it is runnable and has a non-zero slice. h]h)}(hA sched_ext task can be preempted by a task from a higher-priority scheduling class, in which case it will exit the tick-dispatch loop even though it is runnable and has a non-zero slice.h]hA sched_ext task can be preempted by a task from a higher-priority scheduling class, in which case it will exit the tick-dispatch loop even though it is runnable and has a non-zero slice.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1hhj!hhhhhNubeh}(h]h ]h"]h$]h&]jjuh1hhhhMhjhhubh)}(hpSee the "Scheduling Cycle" section for a more detailed description of how a freshly woken up task gets on a CPU.h]htSee the “Scheduling Cycle” section for a more detailed description of how a freshly woken up task gets on a CPU.}(hj:hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjhhubeh}(h]task-lifecycleah ]h"]task lifecycleah$]h&]uh1hhjhhhhhMubeh}(h] the-basicsah ]h"] the basicsah$]h&]uh1hhhhhhhhKubh)}(hhh](h)}(h Where to Lookh]h Where to Look}(hj[hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjXhhhhhMubh)}(hhh](h)}(hY``include/linux/sched/ext.h`` defines the core data structures, ops table and constants. 
h]h)}(hX``include/linux/sched/ext.h`` defines the core data structures, ops table and constants.h](j)}(h``include/linux/sched/ext.h``h]hinclude/linux/sched/ext.h}(hjthhhNhNubah}(h]h ]h"]h$]h&]uh1jhjpubh; defines the core data structures, ops table and constants.}(hjphhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjlubah}(h]h ]h"]h$]h&]uh1hhjihhhhhNubh)}(h``kernel/sched/ext.c`` contains sched_ext core implementation and helpers. The functions prefixed with ``scx_bpf_`` can be called from the BPF scheduler. h]h)}(h``kernel/sched/ext.c`` contains sched_ext core implementation and helpers. The functions prefixed with ``scx_bpf_`` can be called from the BPF scheduler.h](j)}(h``kernel/sched/ext.c``h]hkernel/sched/ext.c}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubhQ contains sched_ext core implementation and helpers. The functions prefixed with }(hjhhhNhNubj)}(h ``scx_bpf_``h]hscx_bpf_}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh& can be called from the BPF scheduler.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1hhjihhhhhNubh)}(hM``kernel/sched/ext_idle.c`` contains the built-in idle CPU selection policy. h]h)}(hL``kernel/sched/ext_idle.c`` contains the built-in idle CPU selection policy.h](j)}(h``kernel/sched/ext_idle.c``h]hkernel/sched/ext_idle.c}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh1 contains the built-in idle CPU selection policy.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1hhjihhhhhNubh)}(hX,``tools/sched_ext/`` hosts example BPF scheduler implementations. * ``scx_simple[.bpf].c``: Minimal global FIFO scheduler example using a custom DSQ. * ``scx_qmap[.bpf].c``: A multi-level FIFO scheduler supporting five levels of priority implemented with ``BPF_MAP_TYPE_QUEUE``. * ``scx_central[.bpf].c``: A central FIFO scheduler where all scheduling decisions are made on one CPU, demonstrating ``LOCAL_ON`` dispatching, tickless operation, and kthread preemption. 
* ``scx_cpu0[.bpf].c``: A scheduler that queues all tasks to a shared DSQ and only dispatches them on CPU0 in FIFO order. Useful for testing bypass behavior. * ``scx_flatcg[.bpf].c``: A flattened cgroup hierarchy scheduler implementing hierarchical weight-based cgroup CPU control by compounding each cgroup's share at every level into a single flat scheduling layer. * ``scx_pair[.bpf].c``: A core-scheduling example that always makes sibling CPU pairs execute tasks from the same CPU cgroup. * ``scx_sdt[.bpf].c``: A variation of ``scx_simple`` demonstrating BPF arena memory management for per-task data. * ``scx_userland[.bpf].c``: A minimal scheduler demonstrating user space scheduling. Tasks with CPU affinity are direct-dispatched in FIFO order; all others are scheduled in user space by a simple vruntime scheduler. h](h)}(hA``tools/sched_ext/`` hosts example BPF scheduler implementations.h](j)}(h``tools/sched_ext/``h]htools/sched_ext/}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh- hosts example BPF scheduler implementations.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubh)}(hhh](h)}(hR``scx_simple[.bpf].c``: Minimal global FIFO scheduler example using a custom DSQ. h]h)}(hQ``scx_simple[.bpf].c``: Minimal global FIFO scheduler example using a custom DSQ.h](j)}(h``scx_simple[.bpf].c``h]hscx_simple[.bpf].c}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh;: Minimal global FIFO scheduler example using a custom DSQ.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1hhjubh)}(h``scx_qmap[.bpf].c``: A multi-level FIFO scheduler supporting five levels of priority implemented with ``BPF_MAP_TYPE_QUEUE``. 
h]h)}(h~``scx_qmap[.bpf].c``: A multi-level FIFO scheduler supporting five levels of priority implemented with ``BPF_MAP_TYPE_QUEUE``.h](j)}(h``scx_qmap[.bpf].c``h]hscx_qmap[.bpf].c}(hjAhhhNhNubah}(h]h ]h"]h$]h&]uh1jhj=ubhS: A multi-level FIFO scheduler supporting five levels of priority implemented with }(hj=hhhNhNubj)}(h``BPF_MAP_TYPE_QUEUE``h]hBPF_MAP_TYPE_QUEUE}(hjShhhNhNubah}(h]h ]h"]h$]h&]uh1jhj=ubh.}(hj=hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhj9ubah}(h]h ]h"]h$]h&]uh1hhjubh)}(h``scx_central[.bpf].c``: A central FIFO scheduler where all scheduling decisions are made on one CPU, demonstrating ``LOCAL_ON`` dispatching, tickless operation, and kthread preemption. h]h)}(h``scx_central[.bpf].c``: A central FIFO scheduler where all scheduling decisions are made on one CPU, demonstrating ``LOCAL_ON`` dispatching, tickless operation, and kthread preemption.h](j)}(h``scx_central[.bpf].c``h]hscx_central[.bpf].c}(hjyhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjuubh]: A central FIFO scheduler where all scheduling decisions are made on one CPU, demonstrating }(hjuhhhNhNubj)}(h ``LOCAL_ON``h]hLOCAL_ON}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjuubh9 dispatching, tickless operation, and kthread preemption.}(hjuhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjqubah}(h]h ]h"]h$]h&]uh1hhjubh)}(h``scx_cpu0[.bpf].c``: A scheduler that queues all tasks to a shared DSQ and only dispatches them on CPU0 in FIFO order. Useful for testing bypass behavior. h]h)}(h``scx_cpu0[.bpf].c``: A scheduler that queues all tasks to a shared DSQ and only dispatches them on CPU0 in FIFO order. Useful for testing bypass behavior.h](j)}(h``scx_cpu0[.bpf].c``h]hscx_cpu0[.bpf].c}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh: A scheduler that queues all tasks to a shared DSQ and only dispatches them on CPU0 in FIFO order. 
Useful for testing bypass behavior.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1hhjubh)}(h``scx_flatcg[.bpf].c``: A flattened cgroup hierarchy scheduler implementing hierarchical weight-based cgroup CPU control by compounding each cgroup's share at every level into a single flat scheduling layer. h]h)}(h``scx_flatcg[.bpf].c``: A flattened cgroup hierarchy scheduler implementing hierarchical weight-based cgroup CPU control by compounding each cgroup's share at every level into a single flat scheduling layer.h](j)}(h``scx_flatcg[.bpf].c``h]hscx_flatcg[.bpf].c}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh: A flattened cgroup hierarchy scheduler implementing hierarchical weight-based cgroup CPU control by compounding each cgroup’s share at every level into a single flat scheduling layer.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1hhjubh)}(h|``scx_pair[.bpf].c``: A core-scheduling example that always makes sibling CPU pairs execute tasks from the same CPU cgroup. h]h)}(h{``scx_pair[.bpf].c``: A core-scheduling example that always makes sibling CPU pairs execute tasks from the same CPU cgroup.h](j)}(h``scx_pair[.bpf].c``h]hscx_pair[.bpf].c}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubhg: A core-scheduling example that always makes sibling CPU pairs execute tasks from the same CPU cgroup.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1hhjubh)}(hp``scx_sdt[.bpf].c``: A variation of ``scx_simple`` demonstrating BPF arena memory management for per-task data. 
h]h)}(ho``scx_sdt[.bpf].c``: A variation of ``scx_simple`` demonstrating BPF arena memory management for per-task data.h](j)}(h``scx_sdt[.bpf].c``h]hscx_sdt[.bpf].c}(hj#hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh: A variation of }(hjhhhNhNubj)}(h``scx_simple``h]h scx_simple}(hj5hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh= demonstrating BPF arena memory management for per-task data.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM hjubah}(h]h ]h"]h$]h&]uh1hhjubh)}(h``scx_userland[.bpf].c``: A minimal scheduler demonstrating user space scheduling. Tasks with CPU affinity are direct-dispatched in FIFO order; all others are scheduled in user space by a simple vruntime scheduler. h]h)}(h``scx_userland[.bpf].c``: A minimal scheduler demonstrating user space scheduling. Tasks with CPU affinity are direct-dispatched in FIFO order; all others are scheduled in user space by a simple vruntime scheduler.h](j)}(h``scx_userland[.bpf].c``h]hscx_userland[.bpf].c}(hj[hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjWubh: A minimal scheduler demonstrating user space scheduling. Tasks with CPU affinity are direct-dispatched in FIFO order; all others are scheduled in user space by a simple vruntime scheduler.}(hjWhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjSubah}(h]h ]h"]h$]h&]uh1hhjubeh}(h]h ]h"]h$]h&]jjuh1hhhhMhjubeh}(h]h ]h"]h$]h&]uh1hhjihhhNhNubeh}(h]h ]h"]h$]h&]jjuh1hhhhMhjXhhubeh}(h] where-to-lookah ]h"] where to lookah$]h&]uh1hhhhhhhhMubh)}(hhh](h)}(hModule Parametersh]hModule Parameters}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhhhMubh)}(hX6sched_ext exposes two module parameters under the ``sched_ext.`` prefix that control bypass-mode behaviour. These knobs are primarily for debugging; there is usually no reason to change them during normal operation. They can be read and written at runtime (mode 0600) via ``/sys/module/sched_ext/parameters/``.h](h2sched_ext exposes two module parameters under the }(hjhhhNhNubj)}(h``sched_ext.``h]h sched_ext.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh prefix that control bypass-mode behaviour. 
These knobs are primarily for debugging; there is usually no reason to change them during normal operation. They can be read and written at runtime (mode 0600) via }(hjhhhNhNubj)}(h%``/sys/module/sched_ext/parameters/``h]h!/sys/module/sched_ext/parameters/}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjhhubhdefinition_list)}(hhh](hdefinition_list_item)}(h``sched_ext.slice_bypass_us`` (default: 5000 µs) The time slice assigned to all tasks when the scheduler is in bypass mode, i.e. during BPF scheduler load, unload, and error recovery. Valid range is 100 µs to 100 ms. h](hterm)}(h1``sched_ext.slice_bypass_us`` (default: 5000 µs)h](j)}(h``sched_ext.slice_bypass_us``h]hsched_ext.slice_bypass_us}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh (default: 5000 µs)}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhMhjubh definition)}(hhh]h)}(hThe time slice assigned to all tasks when the scheduler is in bypass mode, i.e. during BPF scheduler load, unload, and error recovery. Valid range is 100 µs to 100 ms.h]hThe time slice assigned to all tasks when the scheduler is in bypass mode, i.e. during BPF scheduler load, unload, and error recovery. Valid range is 100 µs to 100 ms.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhhhMhjubj)}(h``sched_ext.bypass_lb_intv_us`` (default: 500000 µs) The interval at which the bypass-mode load balancer redistributes tasks across CPUs. Set to 0 to disable load balancing during bypass mode. Valid range is 0 to 10 s. h](j)}(h5``sched_ext.bypass_lb_intv_us`` (default: 500000 µs)h](j)}(h``sched_ext.bypass_lb_intv_us``h]hsched_ext.bypass_lb_intv_us}(hj&hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj"ubh (default: 500000 µs)}(hj"hhhNhNubeh}(h]h ]h"]h$]h&]uh1jhhhM#hjubj)}(hhh]h)}(hThe interval at which the bypass-mode load balancer redistributes tasks across CPUs. Set to 0 to disable load balancing during bypass mode. 
Valid range is 0 to 10 s.h]hThe interval at which the bypass-mode load balancer redistributes tasks across CPUs. Set to 0 to disable load balancing during bypass mode. Valid range is 0 to 10 s.}(hjAhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM!hj>ubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1jhhhM#hjhhubeh}(h]h ]h"]h$]h&]uh1jhjhhhhhNubeh}(h]module-parametersah ]h"]module parametersah$]h&]uh1hhhhhhhhMubh)}(hhh](h)}(hABI Instabilityh]hABI Instability}(hjlhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjihhhhhM&ubh)}(hXThe APIs provided by sched_ext to BPF scheduler programs have no stability guarantees. This includes the ops table callbacks and constants defined in ``include/linux/sched/ext.h``, as well as the ``scx_bpf_`` kfuncs defined in ``kernel/sched/ext.c`` and ``kernel/sched/ext_idle.c``.h](hThe APIs provided by sched_ext to BPF scheduler programs have no stability guarantees. This includes the ops table callbacks and constants defined in }(hjzhhhNhNubj)}(h``include/linux/sched/ext.h``h]hinclude/linux/sched/ext.h}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjzubh, as well as the }(hjzhhhNhNubj)}(h ``scx_bpf_``h]hscx_bpf_}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjzubh kfuncs defined in }(hjzhhhNhNubj)}(h``kernel/sched/ext.c``h]hkernel/sched/ext.c}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjzubh and }(hjzhhhNhNubj)}(h``kernel/sched/ext_idle.c``h]hkernel/sched/ext_idle.c}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjzubh.}(hjzhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM(hjihhubh)}(hWhile we will attempt to provide a relatively stable API surface when possible, they are subject to change without warning between kernel versions.h]hWhile we will attempt to provide a relatively stable API surface when possible, they are subject to change without warning between kernel versions.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM-hjihhubeh}(h]abi-instabilityah ]h"]abi instabilityah$]h&]uh1hhhhhhhhM&ubeh}(h](extensible-scheduler-classheh ]h"](extensible scheduler class 
sched-exteh$]h&]uh1hhhhhhhhKexpect_referenced_by_name}jhsexpect_referenced_by_id}hhsubeh}(h]h ]h"]h$]h&]sourcehuh1hcurrent_sourceN current_lineNsettingsdocutils.frontendValues)}(hN generatorN datestampN source_linkN source_urlN toc_backlinksentryfootnote_backlinksK sectnum_xformKstrip_commentsNstrip_elements_with_classesN strip_classesN report_levelK halt_levelKexit_status_levelKdebugNwarning_streamN tracebackinput_encoding utf-8-siginput_encoding_error_handlerstrictoutput_encodingutf-8output_encoding_error_handlerjerror_encodingutf-8error_encoding_error_handlerbackslashreplace language_codeenrecord_dependenciesNconfigN id_prefixhauto_id_prefixid dump_settingsNdump_internalsNdump_transformsNdump_pseudo_xmlNexpose_internalsNstrict_visitorN_disable_configN_sourcehÌ _destinationN _config_files]7/var/lib/git/docbuild/linux/Documentation/docutils.confafile_insertion_enabled raw_enabledKline_length_limitM'pep_referencesN pep_base_urlhttps://peps.python.org/pep_file_url_templatepep-%04drfc_referencesN rfc_base_url&https://datatracker.ietf.org/doc/html/ tab_widthKtrim_footnote_reference_spacesyntax_highlightlong smart_quotessmartquotes_locales]character_level_inline_markupdoctitle_xform docinfo_xformKsectsubtitle_xform image_loadinglinkembed_stylesheetcloak_email_addressessection_self_linkenvNubreporterNindirect_targets]substitution_defs}substitution_names}refnames}refids}h]hasnameids}(jhjjjjjUjRjjjjjMjJjjjfjcjju nametypes}(jjjjUjjjMjjfjuh}(hhjhjjjRjjjOjjjJjjjXjcjjjiu footnote_refs} citation_refs} autofootnotes]autofootnote_refs]symbol_footnotes]symbol_footnote_refs] footnotes] citations]autofootnote_startKsymbol_footnote_startK id_counter collectionsCounter}Rparse_messages]transform_messages]hsystem_message)}(hhh]h)}(hhh]h/Hyperlink target "sched-ext" is not referenced.}hjsbah}(h]h ]h"]h$]h&]uh1hhj}ubah}(h]h ]h"]h$]h&]levelKtypeINFOsourcehÌlineKuh1j{uba transformerN include_log] decorationNhhub.