=======================================================
A Tour Through TREE_RCU's Grace-Period Memory Ordering
=======================================================

August 8, 2017

This article was contributed by Paul E. McKenney

Introduction
============

This document gives a rough visual overview of how Tree RCU's
grace-period memory ordering guarantee is provided.

What Is Tree RCU's Grace Period Memory Ordering Guarantee?
==========================================================

RCU grace periods provide extremely strong memory-ordering guarantees
for non-idle non-offline code. Any code that happens after the end of
a given RCU grace period is guaranteed to see the effects of all
accesses prior to the beginning of that grace period that are within
RCU read-side critical sections. Similarly, any code that happens
before the beginning of a given RCU grace period is guaranteed to not
see the effects of all accesses following the end of that grace period
that are within RCU read-side critical sections.

Tree RCU Grace Period Memory Ordering Building Blocks
=====================================================

The workhorse for RCU's grace-period memory ordering is the critical
section for the ``rcu_node`` structure's ``->lock``. These critical
sections use helper functions for lock acquisition, including
``raw_spin_lock_rcu_node()``, ``raw_spin_lock_irq_rcu_node()``, and
``raw_spin_lock_irqsave_rcu_node()``. Their lock-release counterparts
are ``raw_spin_unlock_rcu_node()``, ``raw_spin_unlock_irq_rcu_node()``,
and ``raw_spin_unlock_irqrestore_rcu_node()``, respectively. For
completeness, a ``raw_spin_trylock_rcu_node()`` is also provided. The
key point is that the lock-acquisition functions, including
``raw_spin_trylock_rcu_node()``, all invoke
``smp_mb__after_unlock_lock()`` immediately after successful
acquisition of the lock.
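The acquisition side of this pattern can be sketched as follows. This
is a simplification of the actual helpers in ``kernel/rcu/rcu.h``,
which also provide the ``_irq``, ``_irqsave``, and ``trylock``
variants and use private-field accessors::

   /*
    * Simplified sketch of the lock-acquisition workhorse: an ordinary
    * raw spinlock acquisition followed immediately by the barrier that
    * upgrades the resulting unlock+lock sequences to full ordering.
    */
   #define raw_spin_lock_rcu_node(rnp)           \
   do {                                          \
           raw_spin_lock(&(rnp)->lock);          \
           smp_mb__after_unlock_lock();          \
   } while (0)

   #define raw_spin_unlock_rcu_node(rnp) raw_spin_unlock(&(rnp)->lock)

On architectures whose unlock+lock sequences already provide full
ordering, ``smp_mb__after_unlock_lock()`` compiles to nothing; on
others (PowerPC being the canonical example) it emits a full memory
barrier.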
Therefore, for any given ``rcu_node`` structure, any access happening
before one of the above lock-release functions will be seen by all
CPUs as happening before any access happening after a later one of the
above lock-acquisition functions. Furthermore, any access happening
before one of the above lock-release functions on any given CPU will
be seen by all CPUs as happening before any access happening after a
later one of the above lock-acquisition functions executing on that
same CPU, even if the lock-release and lock-acquisition functions are
operating on different ``rcu_node`` structures. Tree RCU uses these
two ordering guarantees to form an ordering network among all CPUs
that were in any way involved in the grace period, including any CPUs
that came online or went offline during the grace period in question.

The following litmus test exhibits the ordering effects of these
lock-acquisition and lock-release functions::

   int x, y, z;

   void task0(void)
   {
           raw_spin_lock_rcu_node(rnp);
           WRITE_ONCE(x, 1);
           r1 = READ_ONCE(y);
           raw_spin_unlock_rcu_node(rnp);
   }

   void task1(void)
   {
           raw_spin_lock_rcu_node(rnp);
           WRITE_ONCE(y, 1);
           r2 = READ_ONCE(z);
           raw_spin_unlock_rcu_node(rnp);
   }

   void task2(void)
   {
           WRITE_ONCE(z, 1);
           smp_mb();
           r3 = READ_ONCE(x);
   }

   WARN_ON(r1 == 0 && r2 == 0 && r3 == 0);

The ``WARN_ON()`` is evaluated at "the end of time", after all changes
have propagated throughout the system. Without the
``smp_mb__after_unlock_lock()`` provided by the acquisition functions,
this ``WARN_ON()`` could trigger, for example on PowerPC. The
``smp_mb__after_unlock_lock()`` invocations prevent this ``WARN_ON()``
from triggering.
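Readers who want to experiment with this style of litmus test can use
the herd7 tool from the kernel's tools/memory-model directory (after
mapping the helpers onto ``spin_lock()`` plus
``smp_mb__after_unlock_lock()``), or can run a userspace
approximation. The sketch below models the ``rcu_node`` lock with a
``pthread_mutex_t`` and models both ``smp_mb()`` and
``smp_mb__after_unlock_lock()`` with sequentially consistent fences;
a single run only spot-checks the forbidden outcome, of course::

   #include <assert.h>
   #include <pthread.h>
   #include <stdatomic.h>

   static atomic_int x, y, z;
   static int r1, r2, r3;
   static pthread_mutex_t rnp_lock = PTHREAD_MUTEX_INITIALIZER;

   /* Model of raw_spin_lock_rcu_node(): lock plus a full fence
    * standing in for smp_mb__after_unlock_lock(). */
   static void lock_rcu_node(void)
   {
           pthread_mutex_lock(&rnp_lock);
           atomic_thread_fence(memory_order_seq_cst);
   }

   static void *task0(void *arg)
   {
           lock_rcu_node();
           atomic_store(&x, 1);
           r1 = atomic_load(&y);
           pthread_mutex_unlock(&rnp_lock);
           return NULL;
   }

   static void *task1(void *arg)
   {
           lock_rcu_node();
           atomic_store(&y, 1);
           r2 = atomic_load(&z);
           pthread_mutex_unlock(&rnp_lock);
           return NULL;
   }

   static void *task2(void *arg)
   {
           atomic_store(&z, 1);
           atomic_thread_fence(memory_order_seq_cst); /* smp_mb() */
           r3 = atomic_load(&x);
           return NULL;
   }

   int main(void)
   {
           pthread_t t[3];

           pthread_create(&t[0], NULL, task0, NULL);
           pthread_create(&t[1], NULL, task1, NULL);
           pthread_create(&t[2], NULL, task2, NULL);
           for (int i = 0; i < 3; i++)
                   pthread_join(t[i], NULL);
           assert(!(r1 == 0 && r2 == 0 && r3 == 0)); /* forbidden */
           return 0;
   }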
+-----------------------------------------------------------------------+
| **Quick Quiz**:                                                       |
+-----------------------------------------------------------------------+
| But the chain of rcu_node-structure lock acquisitions guarantees     |
| that new readers will see all of the updater's pre-grace-period      |
| accesses and also guarantees that the updater's post-grace-period    |
| accesses will see all of the old reader's accesses. So why do we     |
| need all of those calls to smp_mb__after_unlock_lock()?              |
+-----------------------------------------------------------------------+
| **Answer**:                                                           |
+-----------------------------------------------------------------------+
| Because we must provide ordering for RCU's polling grace-period      |
| primitives, for example, get_state_synchronize_rcu() and             |
| poll_state_synchronize_rcu(). Consider this code::                   |
|                                                                       |
|    CPU 0                                    CPU 1                    |
|    ----                                     ----                     |
|    WRITE_ONCE(X, 1)                         WRITE_ONCE(Y, 1)         |
|    g = get_state_synchronize_rcu()          smp_mb()                 |
|    while (!poll_state_synchronize_rcu(g))   r1 = READ_ONCE(X)        |
|            continue;                                                 |
|    r0 = READ_ONCE(Y)                                                 |
|                                                                       |
| RCU guarantees that the outcome r0 == 0 && r1 == 0 will not          |
| happen, even if CPU 1 is in an RCU extended quiescent state          |
| (idle or offline) and thus won't interact directly with the RCU      |
| core processing at all.                                              |
+-----------------------------------------------------------------------+
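Outside of litmus tests, these polling primitives are used in a
cookie-based pattern along the following lines; the function and its
control flow are illustrative, though ``cond_synchronize_rcu()`` is a
real companion API::

   static void example_poll_gp(void)    /* illustrative function */
   {
           unsigned long cookie;

           cookie = get_state_synchronize_rcu(); /* snapshot GP state */

           /* ... updater work that pre-existing readers might see ... */

           if (!poll_state_synchronize_rcu(cookie))
                   cond_synchronize_rcu(cookie);  /* block if needed */

           /* A full grace period has now elapsed since the snapshot,
            * with the same ordering guarantees as synchronize_rcu(). */
   }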
This approach must be extended to include idle CPUs, which need RCU's
grace-period memory ordering guarantee to extend to any RCU read-side
critical sections preceding and following the current idle sojourn.
This case is handled by calls to the strongly ordered
``atomic_add_return()`` read-modify-write atomic operation that is
invoked within ``ct_kernel_exit_state()`` at idle-entry time and
within ``ct_kernel_enter_state()`` at idle-exit time. The grace-period
kthread invokes first ``ct_rcu_watching_cpu_acquire()`` (preceded by a
full memory barrier) and ``rcu_watching_snap_stopped_since()`` (both
of which rely on acquire semantics) to detect idle CPUs.

+-----------------------------------------------------------------------+
| **Quick Quiz**:                                                       |
+-----------------------------------------------------------------------+
| But what about CPUs that remain offline for the entire grace         |
| period?                                                              |
+-----------------------------------------------------------------------+
| **Answer**:                                                           |
+-----------------------------------------------------------------------+
| Such CPUs will be offline at the beginning of the grace period,      |
| so the grace period won't expect quiescent states from them.         |
| Races between grace-period start and CPU-hotplug operations are      |
| mediated by the CPU's leaf ``rcu_node`` structure's ``->lock`` as    |
| described above.                                                     |
+-----------------------------------------------------------------------+
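Returning to the idle case: although the kernel's context-tracking
code is more involved, the counter-based scheme described above can be
modeled in userspace with C11 atomics. Everything below, names
included, is an illustrative model rather than kernel code::

   #include <stdatomic.h>
   #include <stdbool.h>

   /* Model of the idle-sojourn counter: even means idle, odd means
    * non-idle.  Each transition is a fully ordered read-modify-write,
    * standing in for the atomic_add_return() calls described above. */
   static atomic_long watching = 1;        /* CPU starts out non-idle */

   static void model_idle_enter(void)      /* idle-entry time */
   {
           atomic_fetch_add(&watching, 1);  /* odd -> even: idle */
   }

   static void model_idle_exit(void)       /* idle-exit time */
   {
           atomic_fetch_add(&watching, 1);  /* even -> odd: non-idle */
   }

   /* Grace-period side: snapshot the counter, then later decide
    * whether the CPU was idle at snapshot time or has since passed
    * through an idle period. */
   static long model_watching_snap(void)
   {
           atomic_thread_fence(memory_order_seq_cst); /* full barrier */
           return atomic_load_explicit(&watching, memory_order_acquire);
   }

   static bool model_in_eqs(long snap)
   {
           return !(snap & 1);             /* even: idle when sampled */
   }

   static bool model_stopped_since(long snap)
   {
           return model_in_eqs(snap) ||
                  atomic_load_explicit(&watching,
                                       memory_order_acquire) != snap;
   }

Either outcome permits a quiescent state to be reported on the idle
CPU's behalf, as described in the `Forcing Quiescent States`_ section
below.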
The approach must be extended to handle one final case, that of waking
a task blocked in ``synchronize_rcu()``. This task might be affined to
a CPU that is not yet aware that the grace period has ended, and thus
might not yet be subject to the grace period's memory ordering.
Therefore, there is an ``smp_mb()`` after the return from
``wait_for_completion()`` in the ``synchronize_rcu()`` code path.

+-----------------------------------------------------------------------+
| **Quick Quiz**:                                                       |
+-----------------------------------------------------------------------+
| What? Where??? I don't see any ``smp_mb()`` after the return         |
| from ``wait_for_completion()``!!!                                    |
+-----------------------------------------------------------------------+
| **Answer**:                                                           |
+-----------------------------------------------------------------------+
| That would be because I spotted the need for that ``smp_mb()``       |
| during the creation of this documentation, and it is therefore       |
| unlikely to hit mainline before v4.14. Kudos to Lance Roy, Will      |
| Deacon, Peter Zijlstra, and Jonathan Cameron for asking questions    |
| that sensitized me to the rather elaborate sequence of events        |
| that demonstrate the need for this memory barrier.                   |
+-----------------------------------------------------------------------+

Tree RCU's grace-period memory-ordering guarantees rely most heavily
on the ``rcu_node`` structure's ``->lock`` field, so much so that it
is necessary to abbreviate this pattern in the diagrams in the next
section. For example, consider the ``rcu_prepare_for_idle()`` function
shown below, which is one of several functions that enforce ordering
of newly arrived RCU callbacks against future grace periods::

    1 static void rcu_prepare_for_idle(void)
    2 {
    3   bool needwake;
    4   struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
    5   struct rcu_node *rnp;
    6   int tne;
    7
    8   lockdep_assert_irqs_disabled();
    9   if (rcu_rdp_is_offloaded(rdp))
   10     return;
   11
   12   /* Handle nohz enablement switches conservatively. */
   13   tne = READ_ONCE(tick_nohz_active);
   14   if (tne != rdp->tick_nohz_enabled_snap) {
   15     if (!rcu_segcblist_empty(&rdp->cblist))
   16       invoke_rcu_core(); /* force nohz to see update. */
   17     rdp->tick_nohz_enabled_snap = tne;
   18     return;
   19   }
   20   if (!tne)
   21     return;
   22
   23   /*
   24    * If we have not yet accelerated this jiffy, accelerate all
   25    * callbacks on this CPU.
   26    */
   27   if (rdp->last_accelerate == jiffies)
   28     return;
   29   rdp->last_accelerate = jiffies;
   30   if (rcu_segcblist_pend_cbs(&rdp->cblist)) {
   31     rnp = rdp->mynode;
   32     raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */
   33     needwake = rcu_accelerate_cbs(rnp, rdp);
   34     raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
   35     if (needwake)
   36       rcu_gp_kthread_wake();
   37   }
   38 }

But the only part of ``rcu_prepare_for_idle()`` that really matters
for this discussion is lines 32-34. We will therefore abbreviate this
function as follows:

.. kernel-figure:: rcu_node-lock.svg

The box represents the ``rcu_node`` structure's ``->lock`` critical
section, with the double line on top representing the additional
``smp_mb__after_unlock_lock()``.

Tree RCU Grace Period Memory Ordering Components
------------------------------------------------

Tree RCU's grace-period memory-ordering guarantee is provided by a
number of RCU components:

1. `Callback Registry`_
2. `Grace-Period Initialization`_
3. `Self-Reported Quiescent States`_
4. `Dynamic Tick Interface`_
5. `CPU-Hotplug Interface`_
6. `Forcing Quiescent States`_
7. `Grace-Period Cleanup`_
8. `Callback Invocation`_

Each of the following sections looks at the corresponding component
in detail.

Callback Registry
~~~~~~~~~~~~~~~~~

If RCU's grace-period guarantee is to mean anything at all, any access
that happens before a given invocation of ``call_rcu()`` must also
happen before the corresponding grace period. The implementation of
this portion of RCU's grace period guarantee is shown in the following
figure:

.. kernel-figure:: TreeRCU-callback-registry.svg
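For concreteness, the registration step being diagrammed is the usual
``call_rcu()`` idiom; ``struct foo``, its list membership, and
``foo_reclaim()`` are illustrative::

   struct foo {
           struct list_head list;
           struct rcu_head rh;
           int data;
   };

   /* Invoked by RCU after a full grace period has elapsed. */
   static void foo_reclaim(struct rcu_head *rhp)
   {
           struct foo *fp = container_of(rhp, struct foo, rh);

           kfree(fp);
   }

   /* Phase one of the update (removal), followed by callback
    * registration.  All accesses making up the removal must be seen
    * by all CPUs as preceding the grace period that eventually
    * invokes foo_reclaim(). */
   static void foo_remove(struct foo *fp)
   {
           list_del_rcu(&fp->list);
           call_rcu(&fp->rh, foo_reclaim);
   }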
Because ``call_rcu()`` normally acts only on CPU-local state, it
provides no ordering guarantees, either for itself or for phase one of
the update (which again will usually be removal of an element from an
RCU-protected data structure). It simply enqueues the ``rcu_head``
structure on a per-CPU list, which cannot become associated with a
grace period until a later call to ``rcu_accelerate_cbs()``, as shown
in the diagram above.

One set of code paths shown on the left invokes
``rcu_accelerate_cbs()`` via ``note_gp_changes()``, either directly
from ``call_rcu()`` (if the current CPU is inundated with queued
``rcu_head`` structures) or more likely from an ``RCU_SOFTIRQ``
handler. Another code path in the middle is taken only in kernels
built with ``CONFIG_RCU_FAST_NO_HZ=y``, which invokes
``rcu_accelerate_cbs()`` via ``rcu_prepare_for_idle()``. The final
code path on the right is taken only in kernels built with
``CONFIG_HOTPLUG_CPU=y``, which invokes ``rcu_accelerate_cbs()`` via
``rcu_advance_cbs()``, ``rcu_migrate_callbacks()``,
``rcutree_migrate_callbacks()``, and ``takedown_cpu()``, which in turn
is invoked on a surviving CPU after the outgoing CPU has been
completely offlined.

There are a few other code paths within grace-period processing that
opportunistically invoke ``rcu_accelerate_cbs()``. However, either
way, all of the CPU's recently queued ``rcu_head`` structures are
associated with a future grace-period number under the protection of
the CPU's leaf ``rcu_node`` structure's ``->lock``. In all cases,
there is full ordering against any prior critical section for that
same ``rcu_node`` structure's ``->lock``, and also full ordering
against any of the current task's or CPU's prior critical sections
for any ``rcu_node`` structure's ``->lock``.

The next section will show how this ordering ensures that any accesses
prior to the ``call_rcu()`` (particularly including phase one of the
update) happen before the start of the corresponding grace period.

+-----------------------------------------------------------------------+
| **Quick Quiz**:                                                       |
+-----------------------------------------------------------------------+
| But what about ``synchronize_rcu()``?                                |
+-----------------------------------------------------------------------+
| **Answer**:                                                           |
+-----------------------------------------------------------------------+
| The ``synchronize_rcu()`` passes ``call_rcu()`` to                   |
| ``wait_rcu_gp()``, which invokes it. So either way, it eventually    |
| comes down to ``call_rcu()``.                                        |
+-----------------------------------------------------------------------+

Grace-Period Initialization
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Grace-period initialization is carried out by the grace-period kernel
thread, which makes several passes over the ``rcu_node`` tree within
the ``rcu_gp_init()`` function. This means that showing the full flow
of ordering through the grace-period computation will require
duplicating this tree. If you find this confusing, please note that
the state of the ``rcu_node`` changes over time, just like
Heraclitus's river. However, to keep the ``rcu_node`` river tractable,
the grace-period kernel thread's traversals are presented in multiple
parts, starting in this section with the various phases of
grace-period initialization.

The first ordering-related grace-period initialization action is to
advance the ``rcu_state`` structure's ``->gp_seq``
grace-period-number counter, as shown below:

.. kernel-figure:: TreeRCU-gp-init-1.svg

The actual increment is carried out using ``smp_store_release()``,
which helps reject false-positive RCU CPU stall detection. Note that
only the root ``rcu_node`` structure is touched.
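The counter advance and its observers can be modeled as a release
store paired with acquire loads. The userspace sketch below is an
illustrative model only: it ignores the state information that the
kernel packs into the low-order bits of ``->gp_seq``, and the pairing
shown is an assumption chosen to mirror the ``smp_store_release()``
described above::

   #include <stdatomic.h>

   static atomic_ulong gp_seq;     /* models the ->gp_seq counter */

   /* Grace-period kthread: the only writer of the counter.  The
    * release store makes everything written beforehand visible to
    * any thread observing the new value with an acquire load. */
   static void model_gp_advance(void)
   {
           unsigned long old;

           old = atomic_load_explicit(&gp_seq, memory_order_relaxed);
           atomic_store_explicit(&gp_seq, old + 1, memory_order_release);
   }

   /* Sampling side, for example stall detection or a CPU noticing a
    * new grace period: the acquire load pairs with the release store. */
   static unsigned long model_gp_sample(void)
   {
           return atomic_load_explicit(&gp_seq, memory_order_acquire);
   }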
The first pass through the ``rcu_node`` tree updates bitmasks based on
CPUs having come online or gone offline since the start of the
previous grace period. In the common case where the number of online
CPUs for this ``rcu_node`` structure has not transitioned to or from
zero, this pass will scan only the leaf ``rcu_node`` structures.
However, if the number of online CPUs for a given leaf ``rcu_node``
structure has transitioned from zero, ``rcu_init_new_rnp()`` will be
invoked for the first incoming CPU. Similarly, if the number of online
CPUs for a given leaf ``rcu_node`` structure has transitioned to zero,
``rcu_cleanup_dead_rnp()`` will be invoked for the last outgoing CPU.
The diagram below shows the path of ordering if the leftmost
``rcu_node`` structure onlines its first CPU and if the next
``rcu_node`` structure has no online CPUs (or, alternatively, if the
leftmost ``rcu_node`` structure offlines its last CPU and if the next
``rcu_node`` structure has no online CPUs).

.. kernel-figure:: TreeRCU-gp-init-2.svg

The final ``rcu_gp_init()`` pass through the ``rcu_node`` tree
traverses breadth-first, setting each ``rcu_node`` structure's
``->gp_seq`` field to the newly advanced value from the ``rcu_state``
structure, as shown in the following diagram.

.. kernel-figure:: TreeRCU-gp-init-3.svg

This change will also cause each CPU's next call to
``__note_gp_changes()`` to notice that a new grace period has started,
as described in the next section. But because the grace-period kthread
started the grace period at the root (with the advancing of the
``rcu_state`` structure's ``->gp_seq`` field) before setting each leaf
``rcu_node`` structure's ``->gp_seq`` field, each CPU's observation of
the start of the grace period will happen after the actual start of
the grace period.

+-----------------------------------------------------------------------+
| **Quick Quiz**:                                                       |
+-----------------------------------------------------------------------+
| But what about the CPU that started the grace period? Why            |
| wouldn't it see the start of the grace period right when it          |
| started that grace period?                                           |
+-----------------------------------------------------------------------+
| **Answer**:                                                           |
+-----------------------------------------------------------------------+
| In some deep philosophical and overly anthropomorphized sense,       |
| yes, the CPU starting the grace period is immediately aware of       |
| having done so. However, if we instead assume that RCU is not        |
| self-aware, then even the CPU starting the grace period does not     |
| really become aware of the start of this grace period until its      |
| first call to ``__note_gp_changes()``. On the other hand, this       |
| CPU potentially gets early notification because it invokes           |
| ``__note_gp_changes()`` during its last ``rcu_gp_init()`` pass       |
| through its leaf ``rcu_node`` structure.                             |
+-----------------------------------------------------------------------+
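The noticing step itself can be sketched as a grace-period-number
comparison carried out under the leaf ``rcu_node`` structure's
``->lock``, and therefore fully ordered after the initialization pass
above. This is a heavily simplified sketch of what
``__note_gp_changes()`` does, omitting callback advancement and
quiescent-state setup::

   static void sketch_note_gp_changes(struct rcu_node *rnp,
                                      struct rcu_data *rdp)
   {
           raw_spin_lock_rcu_node(rnp); /* + smp_mb__after_unlock_lock() */
           if (rdp->gp_seq != rnp->gp_seq) {
                   /* A grace period has started and/or ended.  Record
                    * the new number; everything this CPU does from
                    * here on is ordered after the grace period's
                    * initialization-time lock critical sections. */
                   rdp->gp_seq = rnp->gp_seq;
           }
           raw_spin_unlock_rcu_node(rnp);
   }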
Self-Reported Quiescent States
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When all entities that might block the grace period have reported
quiescent states (or as described in a later section, had quiescent
states reported on their behalf), the grace period can end. Online
non-idle CPUs report their own quiescent states, as shown in the
following diagram:

.. kernel-figure:: TreeRCU-qs.svg

This is for the last CPU to report a quiescent state, which signals
the end of the grace period. Earlier quiescent states would push up
the ``rcu_node`` tree only until they encountered an ``rcu_node``
structure that is waiting for additional quiescent states. However,
ordering is nevertheless preserved because some later quiescent state
will acquire that ``rcu_node`` structure's ``->lock``.

Any number of events can lead up to a CPU invoking
``note_gp_changes()`` (or alternatively, directly invoking
``__note_gp_changes()``), at which point that CPU will notice the
start of a new grace period while holding its leaf ``rcu_node`` lock.
Therefore, all execution shown in this diagram happens after the start
of the grace period. In addition, this CPU will consider any RCU
read-side critical section that started before the invocation of
``__note_gp_changes()`` to have started before the grace period, and
thus a critical section that the grace period must wait on.

+-----------------------------------------------------------------------+
| **Quick Quiz**:                                                       |
+-----------------------------------------------------------------------+
| But an RCU read-side critical section might have started after       |
| the beginning of the grace period (the advancing of ``->gp_seq``     |
| from earlier), so why should the grace period wait on such a         |
| critical section?                                                    |
+-----------------------------------------------------------------------+
| **Answer**:                                                           |
+-----------------------------------------------------------------------+
| It is indeed not necessary for the grace period to wait on such      |
| a critical section. However, it is permissible to wait on it.        |
| And it is furthermore important to wait on it, as this lazy          |
| approach is far more scalable than a "big bang" all-at-once          |
| grace-period start could possibly be.                                |
+-----------------------------------------------------------------------+

If the CPU does a context switch, a quiescent state will be noted by
``rcu_note_context_switch()`` on the left. On the other hand, if the
CPU takes a scheduler-clock interrupt while executing in usermode, a
quiescent state will be noted by ``rcu_sched_clock_irq()`` on the
right. Either way, the passage through a quiescent state will be noted
in a per-CPU variable.

The next time an ``RCU_SOFTIRQ`` handler executes on this CPU (for
example, after the next scheduler-clock interrupt), ``rcu_core()``
will invoke ``rcu_check_quiescent_state()``, which will notice the
recorded quiescent state, and invoke ``rcu_report_qs_rdp()``. If
``rcu_report_qs_rdp()`` verifies that the quiescent state really does
apply to the current grace period, it invokes ``rcu_report_rnp()``
which traverses up the ``rcu_node`` tree as shown at the bottom of the
diagram, clearing bits from each ``rcu_node`` structure's ``->qsmask``
field, and propagating up the tree when the result is zero.

Note that traversal passes upwards out of a given ``rcu_node``
structure only if the current CPU is reporting the last quiescent
state for the subtree headed by that ``rcu_node`` structure. A key
point is that if a CPU's traversal stops at a given ``rcu_node``
structure, then there will be a later traversal by another CPU (or
perhaps the same one) that proceeds upwards from that point, and the
``rcu_node`` ``->lock`` guarantees that the first CPU's quiescent
state happens before the remainder of the second CPU's traversal.
Applying this line of thought repeatedly shows that all CPUs'
quiescent states happen before the last CPU traverses through the root
``rcu_node`` structure, the "last CPU" being the one that clears the
last bit in the root ``rcu_node`` structure's ``->qsmask`` field.
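The upward propagation can be sketched as bit-clearing under each
level's ``->lock``, stopping as soon as some other CPU's quiescent
state remains outstanding. This is a simplified sketch of the
reporting walk described above, with locking reduced to one node at a
time and with grace-period-number checks, error checking, and wakeups
all omitted::

   static void sketch_report_qs(struct rcu_node *rnp, unsigned long mask)
   {
           while (rnp) {
                   raw_spin_lock_rcu_node(rnp);
                   rnp->qsmask &= ~mask;    /* quiescent state(s) seen */
                   if (rnp->qsmask) {
                           /* Others still owe quiescent states; some
                            * later reporter resumes the climb here. */
                           raw_spin_unlock_rcu_node(rnp);
                           return;
                   }
                   mask = rnp->grpmask;     /* subtree's bit in parent */
                   raw_spin_unlock_rcu_node(rnp);
                   rnp = rnp->parent;
           }
           /* The root ->qsmask is now zero: the grace period can end. */
   }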
CPUs are therefore required to notify RCU when entering or leaving idle state, which they do via fully ordered value-returning atomic operations on a per-CPU variable. The ordering effects are as shown below:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjvhhubj)}(hhh]j)}(hhh]j)}(h'.. kernel-figure:: TreeRCU-dyntick.svg h]h}(h]h ]h"]h$]h&]uri.RCU/Design/Memory-Ordering/TreeRCU-dyntick.svgj%}j'jsuh1jhjhhhKubah}(h]h ]h"]h$]h&]uh1jhjubah}(h]h ]h"]h$]h&]uh1j hjvhhhhhMubh)}(hXThe RCU grace-period kernel thread samples the per-CPU idleness variable while holding the corresponding CPU's leaf ``rcu_node`` structure's ``->lock``. This means that any RCU read-side critical sections that precede the idle period (the oval near the top of the diagram above) will happen before the end of the current grace period. Similarly, the beginning of the current grace period will happen before any RCU read-side critical sections that follow the idle period (the oval near the bottom of the diagram above).h](hvThe RCU grace-period kernel thread samples the per-CPU idleness variable while holding the corresponding CPU’s leaf }(hjhhhNhNubj$)}(h ``rcu_node``h]hrcu_node}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j#hjubh structure’s }(hjhhhNhNubj$)}(h ``->lock``h]h->lock}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j#hjubhXp. This means that any RCU read-side critical sections that precede the idle period (the oval near the top of the diagram above) will happen before the end of the current grace period. Similarly, the beginning of the current grace period will happen before any RCU read-side critical sections that follow the idle period (the oval near the bottom of the diagram above).}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjvhhubh)}(hfPlumbing this into the full grace-period execution is described `below `__.h](h@Plumbing this into the full grace-period execution is described }(hjhhhNhNubj)}(h%`below `__h]hbelow}(hjhhhNhNubah}(h]h ]h"]h$]h&]namebelowjjguh1jhjjKubh.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjvhhubeh}(h]j!ah ]h"]dynamic tick interfaceah$]h&]uh1hhjxhhhhhMjS Kubh)}(hhh](h)}(hCPU-Hotplug Interfaceh]hCPU-Hotplug Interface}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhhhMubh)}(hXRCU is also forbidden from disturbing offline CPUs, which might well be powered off and removed from the system completely. CPUs are therefore required to notify RCU of their comings and goings as part of the corresponding CPU hotplug operations. The ordering effects are shown below:h]hXRCU is also forbidden from disturbing offline CPUs, which might well be powered off and removed from the system completely. CPUs are therefore required to notify RCU of their comings and goings as part of the corresponding CPU hotplug operations. The ordering effects are shown below:}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjhhubj)}(hhh]j)}(hhh]j)}(h'.. kernel-figure:: TreeRCU-hotplug.svg h]h}(h]h ]h"]h$]h&]uri.RCU/Design/Memory-Ordering/TreeRCU-hotplug.svgj%}j'j?suh1jhj1hhhKubah}(h]h ]h"]h$]h&]uh1jhj.ubah}(h]h ]h"]h$]h&]uh1j hjhhhhhMubh)}(hX]Because CPU hotplug operations are much less frequent than idle transitions, they are heavier weight, and thus acquire the CPU's leaf ``rcu_node`` structure's ``->lock`` and update this structure's ``->qsmaskinitnext``. 
CPU-Hotplug Interface
^^^^^^^^^^^^^^^^^^^^^

RCU is also forbidden from disturbing offline CPUs, which might well be
powered off and removed from the system completely. CPUs are therefore
required to notify RCU of their comings and goings as part of the
corresponding CPU hotplug operations. The ordering effects are shown
below:

.. kernel-figure:: TreeRCU-hotplug.svg

Because CPU hotplug operations are much less frequent than idle
transitions, they are heavier weight, and thus acquire the CPU's leaf
``rcu_node`` structure's ``->lock`` and update this structure's
``->qsmaskinitnext``. The RCU grace-period kernel thread samples this
mask to detect CPUs having gone offline since the beginning of this
grace period.

Plumbing this into the full grace-period execution is described
`below <Putting It All Together_>`__.

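The mask manipulation itself is simple, as the following sketch shows.
A pthread mutex again stands in for the leaf ``rcu_node`` structure's
``->lock``, the function names are invented for this example, and
everything else about real CPU-hotplug processing is omitted::

   #include <pthread.h>
   #include <stdint.h>

   struct leaf_rcu_node {
           pthread_mutex_t lock;     /* Models the leaf ->lock. */
           uint64_t qsmaskinitnext;  /* CPUs the *next* GP must wait on. */
   };

   static void cpu_coming_online(struct leaf_rcu_node *rnp, uint64_t cpubit)
   {
           pthread_mutex_lock(&rnp->lock);
           rnp->qsmaskinitnext |= cpubit;   /* Future GPs wait on this CPU. */
           pthread_mutex_unlock(&rnp->lock);
   }

   static void cpu_going_offline(struct leaf_rcu_node *rnp, uint64_t cpubit)
   {
           pthread_mutex_lock(&rnp->lock);
           rnp->qsmaskinitnext &= ~cpubit;  /* Future GPs must not wait on it. */
           pthread_mutex_unlock(&rnp->lock);
   }
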
Forcing Quiescent States
^^^^^^^^^^^^^^^^^^^^^^^^

As noted above, idle and offline CPUs cannot report their own
quiescent states, and therefore the grace-period kernel thread must do
the reporting on their behalf. This process is called "forcing
quiescent states"; it is repeated every few jiffies, and its ordering
effects are shown below:

.. kernel-figure:: TreeRCU-gp-fqs.svg

Each pass of quiescent-state forcing is guaranteed to traverse the
leaf ``rcu_node`` structures, and if there are no new quiescent states
due to recently idled and/or offlined CPUs, then only the leaves are
traversed. However, if there is a newly offlined CPU as illustrated on
the left or a newly idled CPU as illustrated on the right, the
corresponding quiescent state will be driven up towards the root. As
with self-reported quiescent states, the upwards driving stops once it
reaches an ``rcu_node`` structure that has quiescent states
outstanding from other CPUs.

+-----------------------------------------------------------------------+
| **Quick Quiz**:                                                       |
+-----------------------------------------------------------------------+
| The leftmost drive to root stopped before it reached the root         |
| ``rcu_node`` structure, which means that there are still CPUs         |
| subordinate to that structure on which the current grace period is    |
| waiting. Given that, how is it possible that the rightmost drive to   |
| root ended the grace period?                                          |
+-----------------------------------------------------------------------+
| **Answer**:                                                           |
+-----------------------------------------------------------------------+
| Good analysis! It is in fact impossible in the absence of bugs in     |
| RCU. But this diagram is complex enough as it is, so simplicity       |
| overrode accuracy. You can think of it as poetic license, or you      |
| can think of it as misdirection that is resolved in the               |
| `stitched-together diagram <Putting It All Together_>`__.             |
+-----------------------------------------------------------------------+

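The sketch below shows one such forcing pass over a single leaf,
reusing ``struct rcu_node`` and ``report_qs_rnp()`` from the earlier
propagation sketch and ``cpu_is_idle()`` from the dynamic-tick sketch.
The kernel's actual forcing code also handles offline CPUs, rechecks
state under the lock, and scans all leaves; none of that is shown
here::

   /* One forcing pass over a single leaf rcu_node structure; cpu_of[i]
    * maps this leaf's bit i to the corresponding CPU's state. */
   static void force_qs_leaf(struct rcu_node *rnp,
                             struct cpu_state *cpu_of[], int ncpus)
   {
           uint64_t mask = 0;
           int i;

           pthread_mutex_lock(&rnp->lock);
           for (i = 0; i < ncpus; i++)
                   if ((rnp->qsmask & (1ULL << i)) && cpu_is_idle(cpu_of[i]))
                           mask |= 1ULL << i;  /* QS on the idle CPU's behalf. */
           pthread_mutex_unlock(&rnp->lock);

           if (mask)
                   report_qs_rnp(rnp, mask);   /* May drive toward the root. */
   }
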
Grace-Period Cleanup
^^^^^^^^^^^^^^^^^^^^

Grace-period cleanup first scans the ``rcu_node`` tree breadth-first,
advancing all the ``->gp_seq`` fields, then it advances the
``rcu_state`` structure's ``->gp_seq`` field. The ordering effects are
shown below:

.. kernel-figure:: TreeRCU-gp-cleanup.svg

As indicated by the oval at the bottom of the diagram, once
grace-period cleanup is complete, the next grace period can begin.

+-----------------------------------------------------------------------+
| **Quick Quiz**:                                                       |
+-----------------------------------------------------------------------+
| But when precisely does the grace period end?                         |
+-----------------------------------------------------------------------+
| **Answer**:                                                           |
+-----------------------------------------------------------------------+
| There is no useful single point at which the grace period can be      |
| said to end. The earliest reasonable candidate is as soon as the      |
| last CPU has reported its quiescent state, but it may be some         |
| milliseconds before RCU becomes aware of this. The latest             |
| reasonable candidate is once the ``rcu_state`` structure's            |
| ``->gp_seq`` field has been updated, but it is quite possible that    |
| some CPUs have already completed phase two of their updates by        |
| that time. In short, if you are going to work with RCU, you need      |
| to learn to embrace uncertainty.                                      |
+-----------------------------------------------------------------------+

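The breadth-first scan can be modeled as a linear walk over an array of
nodes laid out in breadth-first order (as the kernel's ``rcu_node``
array in fact is), followed by the update of the global sequence
number. The structures below are illustrative stand-ins, not the
kernel's::

   #include <pthread.h>

   struct gp_node {
           pthread_mutex_t lock;   /* Models rcu_node ->lock. */
           unsigned long gp_seq;   /* Models rcu_node ->gp_seq. */
   };

   struct gp_state {
           struct gp_node *node;   /* Nodes in breadth-first order. */
           int nnodes;
           unsigned long gp_seq;   /* Models rcu_state ->gp_seq. */
   };

   static void gp_cleanup(struct gp_state *rsp, unsigned long newseq)
   {
           int i;

           /* Root first, leaves last: parents always before children. */
           for (i = 0; i < rsp->nnodes; i++) {
                   pthread_mutex_lock(&rsp->node[i].lock);
                   rsp->node[i].gp_seq = newseq;  /* Publish the GP's end. */
                   pthread_mutex_unlock(&rsp->node[i].lock);
           }
           /* Only then advance the global sequence number (done under
            * the root node's lock in the kernel). */
           rsp->gp_seq = newseq;
   }
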
Callback Invocation
^^^^^^^^^^^^^^^^^^^

Once a given CPU's leaf ``rcu_node`` structure's ``->gp_seq`` field has
been updated, that CPU can begin invoking its RCU callbacks that were
waiting for this grace period to end. These callbacks are identified by
``rcu_advance_cbs()``, which is usually invoked by
``__note_gp_changes()``. As shown in the diagram below, this invocation
can be triggered by the scheduling-clock interrupt
(``rcu_sched_clock_irq()`` on the left) or by idle entry
(``rcu_cleanup_after_idle()`` on the right, but only for kernels built
with ``CONFIG_RCU_FAST_NO_HZ=y``). Either way, ``RCU_SOFTIRQ`` is
raised, which results in ``rcu_do_batch()`` invoking the callbacks,
which in turn allows those callbacks to carry out (either directly or
indirectly via wakeup) the needed phase-two processing for each update.

.. kernel-figure:: TreeRCU-callback-invocation.svg

Please note that callback invocation can also be prompted by any number
of corner-case code paths, for example, when a CPU notes that it has
excessive numbers of callbacks queued. In all cases, the CPU acquires
its leaf ``rcu_node`` structure's ``->lock`` before invoking callbacks,
which preserves the required ordering against the newly completed grace
period.

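The following sketch models this advancement and invocation, reusing
``struct gp_node`` from the cleanup sketch. The crucial detail for the
ordering argument above is that ready callbacks are selected while
holding the leaf's lock; the kernel's ``rcu_advance_cbs()`` and
``rcu_do_batch()`` are far more elaborate, using segmented callback
lists and wraparound-safe sequence comparisons::

   struct rcu_cb {
           struct rcu_cb *next;
           unsigned long wait_seq;          /* GP this callback waits for. */
           void (*func)(struct rcu_cb *);
   };

   static void do_batch(struct gp_node *leaf, struct rcu_cb **cblist)
   {
           struct rcu_cb *done = NULL, **tail = &done, *cb;

           /* Taking the leaf's lock orders us after grace-period
            * cleanup's update of ->gp_seq under this same lock.  (A
            * real implementation needs wraparound-safe comparisons.) */
           pthread_mutex_lock(&leaf->lock);
           while ((cb = *cblist) && cb->wait_seq <= leaf->gp_seq) {
                   *cblist = cb->next;      /* Detach a ready callback... */
                   cb->next = NULL;
                   *tail = cb;              /* ...and queue it in FIFO order. */
                   tail = &cb->next;
           }
           pthread_mutex_unlock(&leaf->lock);

           while ((cb = done) != NULL) {    /* Invoke outside the lock. */
                   done = cb->next;
                   cb->func(cb);            /* Phase-two processing runs here. */
           }
   }
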
However, if the callback function communicates to other CPUs, for
example, by doing a wakeup, then it is that function's responsibility
to maintain ordering. For example, if the callback function wakes up a
task that runs on some other CPU, proper ordering must be in place in
both the callback function and the task being awakened. To see why
this is important, consider the top half of the
`grace-period cleanup`_ diagram. The callback might be running on a
CPU corresponding to the leftmost leaf ``rcu_node`` structure, and
awaken a task that is to run on a CPU corresponding to the rightmost
leaf ``rcu_node`` structure, and the grace-period kernel thread might
not yet have reached the rightmost leaf. In this case, the grace
period's memory ordering might not yet have reached that CPU, so again
the callback function and the awakened task must supply proper
ordering.

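One way for a callback and the task it awakens to supply that ordering
is an explicit release-acquire pair, as in the illustrative C11 sketch
below. In the kernel, the wakeup primitives themselves provide
comparable ordering, but the division of responsibility is the point:
it belongs to the callback and the awakened task, not to RCU::

   #include <stdatomic.h>
   #include <stdbool.h>

   struct handoff {
           int payload;            /* Phase-two data for the other task. */
           atomic_bool ready;
   };

   /* Runs in the RCU callback, perhaps on the "leftmost-leaf" CPU. */
   static void callback_side(struct handoff *h, int value)
   {
           h->payload = value;     /* Plain store... */
           atomic_store_explicit(&h->ready, true,
                                 memory_order_release);  /* ...then publish. */
           /* A real callback would now wake the consumer task. */
   }

   /* Runs in the awakened task, perhaps on the "rightmost-leaf" CPU. */
   static int awakened_side(struct handoff *h)
   {
           while (!atomic_load_explicit(&h->ready, memory_order_acquire))
                   ;               /* A real task would sleep, not spin. */
           return h->payload;      /* Ordered after the release store. */
   }
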
Putting It All Together
~~~~~~~~~~~~~~~~~~~~~~~

A stitched-together diagram is here:

.. kernel-figure:: TreeRCU-gp.svg

Legal Statement
~~~~~~~~~~~~~~~

This work represents the view of the author and does not necessarily
represent the view of IBM.

Linux is a registered trademark of Linus Torvalds.

Other company, product, and service names may be trademarks or service
marks of others.