=========
Schedutil
=========

.. note::

   All this assumes a linear relation between frequency and work capacity;
   we know this is flawed, but it is the best workable approximation.

PELT (Per Entity Load Tracking)
===============================

With PELT we track some metrics across the various scheduler entities, from
individual tasks to task-group slices to CPU runqueues. As the basis for this
we use an Exponentially Weighted Moving Average (EWMA); each period (1024us)
is decayed such that y^32 = 0.5. That is, the most recent 32ms contribute
half, while the rest of history contributes the other half.

Specifically:

  ewma_sum(u) := u_0 + u_1*y + u_2*y^2 + ...

  ewma(u) = ewma_sum(u) / ewma_sum(1)

Since this is essentially a progression of an infinite geometric series, the
results are composable, that is ewma(A) + ewma(B) = ewma(A+B). This property
is key, since it gives the ability to recompose the averages when tasks move
around.
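As a rough sketch of the above (plain floating point, not the kernel's fixed-point implementation in kernel/sched/pelt.c; the sample histories ``A`` and ``B`` are made-up numbers), the decay and the composability property look like this:

```python
# y is chosen such that 32 periods halve a sample's contribution: y^32 = 0.5.
y = 0.5 ** (1 / 32)

def ewma_sum(u):
    # ewma_sum(u) := u_0 + u_1*y + u_2*y^2 + ...   (u[0] is the most recent)
    return sum(u_i * y**i for i, u_i in enumerate(u))

def ewma(u):
    # Normalize against an always-contributing history of the same length.
    return ewma_sum(u) / ewma_sum([1.0] * len(u))

# Composability: because ewma_sum() is linear, ewma(A) + ewma(B) equals
# ewma(A+B) (element-wise sum), which is what lets the scheduler recompose
# the averages when tasks move between runqueues.
A = [1.0, 0.0, 1.0, 0.5]   # made-up per-period utilization samples
B = [0.0, 1.0, 0.5, 0.5]
AB = [a + b for a, b in zip(A, B)]
assert abs(ewma(A) + ewma(B) - ewma(AB)) < 1e-12
```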
Note that blocked tasks still contribute to the aggregates (task-group slices
and CPU runqueues), which reflects their expected contribution when they
resume running.

Using this we track two key metrics: 'running' and 'runnable'. 'Running'
reflects the time an entity spends on the CPU, while 'runnable' reflects the
time an entity spends on the runqueue. When there is only a single task these
two metrics are the same, but once there is contention for the CPU, 'running'
will decrease to reflect the fraction of time each task spends on the CPU,
while 'runnable' will increase to reflect the amount of contention.
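A toy numerical illustration (hypothetical numbers, not actual scheduler output) of how the two metrics diverge under contention:

```python
# Hypothetical 100ms observation window on one CPU.
window_ms = 100

# A single task runs whenever it is runnable, so the two metrics agree.
running_alone = window_ms      # on the CPU 100% of the window
runnable_alone = window_ms     # on the runqueue 100% of the window

# Two always-runnable tasks contending for the same CPU: each one runs
# only half the time, but sits on the runqueue for the whole window.
tasks = 2
running = window_ms / tasks    # 'running' drops to 50%
runnable = window_ms           # 'runnable' stays at 100%
```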
For more detail see: kernel/sched/pelt.c


Frequency / CPU Invariance
==========================

Because consuming the CPU for 50% at 1GHz is not the same as consuming the
CPU for 50% at 2GHz, nor is running 50% on a LITTLE CPU the same as running
50% on a big CPU, we allow architectures to scale the time delta with two
ratios, one Dynamic Voltage and Frequency Scaling (DVFS) ratio and one
microarch ratio.

For simple DVFS architectures (where software is in full control) we trivially
compute the ratio as::

            f_cur
  r_dvfs := -----
            f_max
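Under the simple-DVFS formula above, scaling a measured time delta can be sketched as follows (hypothetical frequencies; the kernel does this in fixed-point capacity units rather than floats):

```python
# Hypothetical frequencies for a software-controlled DVFS system.
f_cur = 1_000_000_000   # current frequency: 1 GHz
f_max = 2_000_000_000   # maximum frequency: 2 GHz

r_dvfs = f_cur / f_max  # r_dvfs := f_cur / f_max

# 10ms of runtime at half the maximum frequency represents only 5ms worth
# of work at f_max, so the tracked time delta is scaled down accordingly.
delta_ns = 10_000_000
invariant_delta_ns = delta_ns * r_dvfs
```

The microarch ratio is applied the same way, as a second multiplicative scale factor on the delta.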
For more dynamic systems where the hardware is in control of DVFS we use
hardware counters (Intel APERF/MPERF, ARMv8.4-AMU) to provide us this ratio.
For Intel specifically, we use::