sphinx.addnodesdocument)}( rawsourcechildren]( translations LanguagesNode)}(hhh](h pending_xref)}(hhh]docutils.nodesTextChinese (Simplified)}parenthsba attributes}(ids]classes]names]dupnames]backrefs] refdomainstdreftypedoc reftargetG/translations/zh_CN/RCU/Design/Memory-Ordering/Tree-RCU-Memory-OrderingmodnameN classnameN refexplicitutagnamehhh ubh)}(hhh]hChinese (Traditional)}hh2sbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftargetG/translations/zh_TW/RCU/Design/Memory-Ordering/Tree-RCU-Memory-OrderingmodnameN classnameN refexplicituh1hhh ubh)}(hhh]hItalian}hhFsbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftargetG/translations/it_IT/RCU/Design/Memory-Ordering/Tree-RCU-Memory-OrderingmodnameN classnameN refexplicituh1hhh ubh)}(hhh]hJapanese}hhZsbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftargetG/translations/ja_JP/RCU/Design/Memory-Ordering/Tree-RCU-Memory-OrderingmodnameN classnameN refexplicituh1hhh ubh)}(hhh]hKorean}hhnsbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftargetG/translations/ko_KR/RCU/Design/Memory-Ordering/Tree-RCU-Memory-OrderingmodnameN classnameN refexplicituh1hhh ubh)}(hhh]hPortuguese (Brazilian)}hhsbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftargetG/translations/pt_BR/RCU/Design/Memory-Ordering/Tree-RCU-Memory-OrderingmodnameN classnameN refexplicituh1hhh ubh)}(hhh]hSpanish}hhsbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftargetG/translations/sp_SP/RCU/Design/Memory-Ordering/Tree-RCU-Memory-OrderingmodnameN classnameN refexplicituh1hhh ubeh}(h]h ]h"]h$]h&]current_languageEnglishuh1h hh _documenthsourceNlineNubhsection)}(hhh](htitle)}(h6A Tour Through TREE_RCU's Grace-Period Memory Orderingh]h8A Tour Through TREE_RCU’s Grace-Period Memory Ordering}(hhhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhhha/var/lib/git/docbuild/linux/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rsthKubh paragraph)}(hAugust 8, 2017h]hAugust 8, 2017}(hhhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhhhhubh)}(h0This article was contributed by Paul E. 
McKenneyh]h0This article was contributed by Paul E. McKenney}(hhhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhhhhubh)}(hhh](h)}(h Introductionh]h Introduction}(hhhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhhhhhK ubh)}(hqThis document gives a rough visual overview of how Tree RCU's grace-period memory ordering guarantee is provided.h]hsThis document gives a rough visual overview of how Tree RCU’s grace-period memory ordering guarantee is provided.}(hhhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK hhhhubeh}(h] introductionah ]h"] introductionah$]h&]uh1hhhhhhhhK ubh)}(hhh](h)}(h:What Is Tree RCU's Grace Period Memory Ordering Guarantee?h]hlock``. These critical sections use helper functions for lock acquisition, including ``raw_spin_lock_rcu_node()``, ``raw_spin_lock_irq_rcu_node()``, and ``raw_spin_lock_irqsave_rcu_node()``. Their lock-release counterparts are ``raw_spin_unlock_rcu_node()``, ``raw_spin_unlock_irq_rcu_node()``, and ``raw_spin_unlock_irqrestore_rcu_node()``, respectively. For completeness, a ``raw_spin_trylock_rcu_node()`` is also provided. The key point is that the lock-acquisition functions, including ``raw_spin_trylock_rcu_node()``, all invoke ``smp_mb__after_unlock_lock()`` immediately after successful acquisition of the lock.h](hWThe workhorse for RCU’s grace-period memory ordering is the critical section for the }(hjhhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh structure’s }(hjhhhNhNubj8)}(h ``->lock``h]h->lock}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubhO. These critical sections use helper functions for lock acquisition, including }(hjhhhNhNubj8)}(h``raw_spin_lock_rcu_node()``h]hraw_spin_lock_rcu_node()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh, }(hjhhhNhNubj8)}(h ``raw_spin_lock_irq_rcu_node()``h]hraw_spin_lock_irq_rcu_node()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh, and }(hjhhhNhNubj8)}(h$``raw_spin_lock_irqsave_rcu_node()``h]h raw_spin_lock_irqsave_rcu_node()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh&. 
Their lock-release counterparts are }(hjhhhNhNubj8)}(h``raw_spin_unlock_rcu_node()``h]hraw_spin_unlock_rcu_node()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh, }hjsbj8)}(h"``raw_spin_unlock_irq_rcu_node()``h]hraw_spin_unlock_irq_rcu_node()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh, and }(hjhhhNhNubj8)}(h)``raw_spin_unlock_irqrestore_rcu_node()``h]h%raw_spin_unlock_irqrestore_rcu_node()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh$, respectively. For completeness, a }(hjhhhNhNubj8)}(h``raw_spin_trylock_rcu_node()``h]hraw_spin_trylock_rcu_node()}(hj0hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubhS is also provided. The key point is that the lock-acquisition functions, including }(hjhhhNhNubj8)}(h``raw_spin_trylock_rcu_node()``h]hraw_spin_trylock_rcu_node()}(hjBhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh , all invoke }(hjhhhNhNubj8)}(h``smp_mb__after_unlock_lock()``h]hsmp_mb__after_unlock_lock()}(hjThhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh6 immediately after successful acquisition of the lock.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhK1hjhhubh)}(hX9Therefore, for any given ``rcu_node`` structure, any access happening before one of the above lock-release functions will be seen by all CPUs as happening before any access happening after a later one of the above lock-acquisition functions. Furthermore, any access happening before one of the above lock-release function on any given CPU will be seen by all CPUs as happening before any access happening after a later one of the above lock-acquisition functions executing on that same CPU, even if the lock-release and lock-acquisition functions are operating on different ``rcu_node`` structures. 
Tree RCU uses these two ordering guarantees to form an ordering network among all CPUs that were in any way involved in the grace period, including any CPUs that came online or went offline during the grace period in question.h](hTherefore, for any given }(hjlhhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjthhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjlubhX structure, any access happening before one of the above lock-release functions will be seen by all CPUs as happening before any access happening after a later one of the above lock-acquisition functions. Furthermore, any access happening before one of the above lock-release function on any given CPU will be seen by all CPUs as happening before any access happening after a later one of the above lock-acquisition functions executing on that same CPU, even if the lock-release and lock-acquisition functions are operating on different }(hjlhhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjlubh structures. Tree RCU uses these two ordering guarantees to form an ordering network among all CPUs that were in any way involved in the grace period, including any CPUs that came online or went offline during the grace period in question.}(hjlhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhK>hjhhubh)}(hnThe following litmus test exhibits the ordering effects of these lock-acquisition and lock-release functions::h]hmThe following litmus test exhibits the ordering effects of these lock-acquisition and lock-release functions:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKMhjhhubh literal_block)}(hX 1 int x, y, z; 2 3 void task0(void) 4 { 5 raw_spin_lock_rcu_node(rnp); 6 WRITE_ONCE(x, 1); 7 r1 = READ_ONCE(y); 8 raw_spin_unlock_rcu_node(rnp); 9 } 10 11 void task1(void) 12 { 13 raw_spin_lock_rcu_node(rnp); 14 WRITE_ONCE(y, 1); 15 r2 = READ_ONCE(z); 16 raw_spin_unlock_rcu_node(rnp); 17 } 18 19 void task2(void) 20 { 21 WRITE_ONCE(z, 1); 22 smp_mb(); 23 r3 = READ_ONCE(x); 24 } 25 26 WARN_ON(r1 == 0 && r2 == 0 && r3 == 0);h]hX 1 int x, y, z; 2 3 void 
task0(void) 4 { 5 raw_spin_lock_rcu_node(rnp); 6 WRITE_ONCE(x, 1); 7 r1 = READ_ONCE(y); 8 raw_spin_unlock_rcu_node(rnp); 9 } 10 11 void task1(void) 12 { 13 raw_spin_lock_rcu_node(rnp); 14 WRITE_ONCE(y, 1); 15 r2 = READ_ONCE(z); 16 raw_spin_unlock_rcu_node(rnp); 17 } 18 19 void task2(void) 20 { 21 WRITE_ONCE(z, 1); 22 smp_mb(); 23 r3 = READ_ONCE(x); 24 } 25 26 WARN_ON(r1 == 0 && r2 == 0 && r3 == 0);}hjsbah}(h]h ]h"]h$]h&] xml:spacepreserveuh1jhhhKPhjhhubh)}(hXVThe ``WARN_ON()`` is evaluated at "the end of time", after all changes have propagated throughout the system. Without the ``smp_mb__after_unlock_lock()`` provided by the acquisition functions, this ``WARN_ON()`` could trigger, for example on PowerPC. The ``smp_mb__after_unlock_lock()`` invocations prevent this ``WARN_ON()`` from triggering.h](hThe }(hjhhhNhNubj8)}(h ``WARN_ON()``h]h WARN_ON()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubhm is evaluated at “the end of time”, after all changes have propagated throughout the system. Without the }(hjhhhNhNubj8)}(h``smp_mb__after_unlock_lock()``h]hsmp_mb__after_unlock_lock()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh- provided by the acquisition functions, this }(hjhhhNhNubj8)}(h ``WARN_ON()``h]h WARN_ON()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh, could trigger, for example on PowerPC. 
The }(hjhhhNhNubj8)}(h``smp_mb__after_unlock_lock()``h]hsmp_mb__after_unlock_lock()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh invocations prevent this }(hjhhhNhNubj8)}(h ``WARN_ON()``h]h WARN_ON()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh from triggering.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKkhjhhubhtable)}(hhh]htgroup)}(hhh](hcolspec)}(hhh]h}(h]h ]h"]h$]h&]colwidthKGuh1j0hj-ubhtbody)}(hhh](hrow)}(hhh]hentry)}(hhh]h)}(h**Quick Quiz**:h](hstrong)}(h**Quick Quiz**h]h Quick Quiz}(hjQhhhNhNubah}(h]h ]h"]h$]h&]uh1jOhjKubh:}(hjKhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKthjHubah}(h]h ]h"]h$]h&]uh1jFhjCubah}(h]h ]h"]h$]h&]uh1jAhj>ubjB)}(hhh]jG)}(hhh]h)}(hX<But the chain of rcu_node-structure lock acquisitions guarantees that new readers will see all of the updater's pre-grace-period accesses and also guarantees that the updater's post-grace-period accesses will see all of the old reader's accesses. So why do we need all of those calls to smp_mb__after_unlock_lock()?h]hXBBut the chain of rcu_node-structure lock acquisitions guarantees that new readers will see all of the updater’s pre-grace-period accesses and also guarantees that the updater’s post-grace-period accesses will see all of the old reader’s accesses. So why do we need all of those calls to smp_mb__after_unlock_lock()?}(hj{hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKvhjxubah}(h]h ]h"]h$]h&]uh1jFhjuubah}(h]h ]h"]h$]h&]uh1jAhj>ubjB)}(hhh]jG)}(hhh]h)}(h **Answer**:h](jP)}(h **Answer**h]hAnswer}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jOhjubh:}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhK|hjubah}(h]h ]h"]h$]h&]uh1jFhjubah}(h]h ]h"]h$]h&]uh1jAhj>ubjB)}(hhh]jG)}(hhh](h)}(hBecause we must provide ordering for RCU's polling grace-period primitives, for example, get_state_synchronize_rcu() and poll_state_synchronize_rcu(). Consider this code::h]hBecause we must provide ordering for RCU’s polling grace-period primitives, for example, get_state_synchronize_rcu() and poll_state_synchronize_rcu(). 
Consider this code:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK~hjubj)}(hX,CPU 0 CPU 1 ---- ---- WRITE_ONCE(X, 1) WRITE_ONCE(Y, 1) g = get_state_synchronize_rcu() smp_mb() while (!poll_state_synchronize_rcu(g)) r1 = READ_ONCE(X) continue; r0 = READ_ONCE(Y)h]hX,CPU 0 CPU 1 ---- ---- WRITE_ONCE(X, 1) WRITE_ONCE(Y, 1) g = get_state_synchronize_rcu() smp_mb() while (!poll_state_synchronize_rcu(g)) r1 = READ_ONCE(X) continue; r0 = READ_ONCE(Y)}hjsbah}(h]h ]h"]h$]h&]jjuh1jhhhKhjubh)}(hRCU guarantees that the outcome r0 == 0 && r1 == 0 will not happen, even if CPU 1 is in an RCU extended quiescent state (idle or offline) and thus won't interact directly with the RCU core processing at all.h]hRCU guarantees that the outcome r0 == 0 && r1 == 0 will not happen, even if CPU 1 is in an RCU extended quiescent state (idle or offline) and thus won’t interact directly with the RCU core processing at all.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjubeh}(h]h ]h"]h$]h&]uh1jFhjubah}(h]h ]h"]h$]h&]uh1jAhj>ubeh}(h]h ]h"]h$]h&]uh1j<hj-ubeh}(h]h ]h"]h$]h&]colsKuh1j+hj(ubah}(h]h ]h"]h$]h&]uh1j&hjhhhhhNubh)}(hXThis approach must be extended to include idle CPUs, which need RCU's grace-period memory ordering guarantee to extend to any RCU read-side critical sections preceding and following the current idle sojourn. This case is handled by calls to the strongly ordered ``atomic_add_return()`` read-modify-write atomic operation that is invoked within ``ct_kernel_exit_state()`` at idle-entry time and within ``ct_kernel_enter_state()`` at idle-exit time. The grace-period kthread invokes first ``ct_rcu_watching_cpu_acquire()`` (preceded by a full memory barrier) and ``rcu_watching_snap_stopped_since()`` (both of which rely on acquire semantics) to detect idle CPUs.h](hXThis approach must be extended to include idle CPUs, which need RCU’s grace-period memory ordering guarantee to extend to any RCU read-side critical sections preceding and following the current idle sojourn. 
This case is handled by calls to the strongly ordered }(hjhhhNhNubj8)}(h``atomic_add_return()``h]hatomic_add_return()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh; read-modify-write atomic operation that is invoked within }(hjhhhNhNubj8)}(h``ct_kernel_exit_state()``h]hct_kernel_exit_state()}(hj,hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh at idle-entry time and within }(hjhhhNhNubj8)}(h``ct_kernel_enter_state()``h]hct_kernel_enter_state()}(hj>hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh; at idle-exit time. The grace-period kthread invokes first }(hjhhhNhNubj8)}(h!``ct_rcu_watching_cpu_acquire()``h]hct_rcu_watching_cpu_acquire()}(hjPhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh) (preceded by a full memory barrier) and }(hjhhhNhNubj8)}(h%``rcu_watching_snap_stopped_since()``h]h!rcu_watching_snap_stopped_since()}(hjbhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh? (both of which rely on acquire semantics) to detect idle CPUs.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhjhhubj')}(hhh]j,)}(hhh](j1)}(hhh]h}(h]h ]h"]h$]h&]colwidthKGuh1j0hj}ubj=)}(hhh](jB)}(hhh]jG)}(hhh]h)}(h**Quick Quiz**:h](jP)}(h**Quick Quiz**h]h Quick Quiz}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jOhjubh:}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhjubah}(h]h ]h"]h$]h&]uh1jFhjubah}(h]h ]h"]h$]h&]uh1jAhjubjB)}(hhh]jG)}(hhh]h)}(hDBut what about CPUs that remain offline for the entire grace period?h]hDBut what about CPUs that remain offline for the entire grace period?}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjubah}(h]h ]h"]h$]h&]uh1jFhjubah}(h]h ]h"]h$]h&]uh1jAhjubjB)}(hhh]jG)}(hhh]h)}(h **Answer**:h](jP)}(h **Answer**h]hAnswer}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jOhjubh:}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhjubah}(h]h ]h"]h$]h&]uh1jFhjubah}(h]h ]h"]h$]h&]uh1jAhjubjB)}(hhh]jG)}(hhh]h)}(hXSuch CPUs will be offline at the beginning of the grace period, so the grace period won't expect quiescent states from them. 
Races between grace-period start and CPU-hotplug operations are mediated by the CPU's leaf ``rcu_node`` structure's ``->lock`` as described above.h](hSuch CPUs will be offline at the beginning of the grace period, so the grace period won’t expect quiescent states from them. Races between grace-period start and CPU-hotplug operations are mediated by the CPU’s leaf }(hjhhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh structure’s }(hjhhhNhNubj8)}(h ``->lock``h]h->lock}(hj)hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh as described above.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhj ubah}(h]h ]h"]h$]h&]uh1jFhj ubah}(h]h ]h"]h$]h&]uh1jAhjubeh}(h]h ]h"]h$]h&]uh1j<hj}ubeh}(h]h ]h"]h$]h&]colsKuh1j+hjzubah}(h]h ]h"]h$]h&]uh1j&hjhhhhhNubh)}(hXThe approach must be extended to handle one final case, that of waking a task blocked in ``synchronize_rcu()``. This task might be affined to a CPU that is not yet aware that the grace period has ended, and thus might not yet be subject to the grace period's memory ordering. Therefore, there is an ``smp_mb()`` after the return from ``wait_for_completion()`` in the ``synchronize_rcu()`` code path.h](hYThe approach must be extended to handle one final case, that of waking a task blocked in }(hj`hhhNhNubj8)}(h``synchronize_rcu()``h]hsynchronize_rcu()}(hjhhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj`ubh. This task might be affined to a CPU that is not yet aware that the grace period has ended, and thus might not yet be subject to the grace period’s memory ordering. 
Therefore, there is an }(hj`hhhNhNubj8)}(h ``smp_mb()``h]hsmp_mb()}(hjzhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj`ubh after the return from }(hj`hhhNhNubj8)}(h``wait_for_completion()``h]hwait_for_completion()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj`ubh in the }(hj`hhhNhNubj8)}(h``synchronize_rcu()``h]hsynchronize_rcu()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj`ubh code path.}(hj`hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhjhhubj')}(hhh]j,)}(hhh](j1)}(hhh]h}(h]h ]h"]h$]h&]colwidthKGuh1j0hjubj=)}(hhh](jB)}(hhh]jG)}(hhh]h)}(h**Quick Quiz**:h](jP)}(h**Quick Quiz**h]h Quick Quiz}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jOhjubh:}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhjubah}(h]h ]h"]h$]h&]uh1jFhjubah}(h]h ]h"]h$]h&]uh1jAhjubjB)}(hhh]jG)}(hhh]h)}(h^What? Where??? I don't see any ``smp_mb()`` after the return from ``wait_for_completion()``!!!h](h!What? Where??? I don’t see any }(hjhhhNhNubj8)}(h ``smp_mb()``h]hsmp_mb()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh after the return from }(hjhhhNhNubj8)}(h``wait_for_completion()``h]hwait_for_completion()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh!!!}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhjubah}(h]h ]h"]h$]h&]uh1jFhjubah}(h]h ]h"]h$]h&]uh1jAhjubjB)}(hhh]jG)}(hhh]h)}(h **Answer**:h](jP)}(h **Answer**h]hAnswer}(hjEhhhNhNubah}(h]h ]h"]h$]h&]uh1jOhjAubh:}(hjAhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhj>ubah}(h]h ]h"]h$]h&]uh1jFhj;ubah}(h]h ]h"]h$]h&]uh1jAhjubjB)}(hhh]jG)}(hhh]h)}(hXpThat would be because I spotted the need for that ``smp_mb()`` during the creation of this documentation, and it is therefore unlikely to hit mainline before v4.14. Kudos to Lance Roy, Will Deacon, Peter Zijlstra, and Jonathan Cameron for asking questions that sensitized me to the rather elaborate sequence of events that demonstrate the need for this memory barrier.h](h2That would be because I spotted the need for that }(hjohhhNhNubj8)}(h ``smp_mb()``h]hsmp_mb()}(hjwhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjoubhX2 during the creation of this documentation, and it is therefore unlikely to hit mainline before v4.14. 
Kudos to Lance Roy, Will Deacon, Peter Zijlstra, and Jonathan Cameron for asking questions that sensitized me to the rather elaborate sequence of events that demonstrate the need for this memory barrier.}(hjohhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhjlubah}(h]h ]h"]h$]h&]uh1jFhjiubah}(h]h ]h"]h$]h&]uh1jAhjubeh}(h]h ]h"]h$]h&]uh1j<hjubeh}(h]h ]h"]h$]h&]colsKuh1j+hjubah}(h]h ]h"]h$]h&]uh1j&hjhhhhhNubh)}(hXTree RCU's grace--period memory-ordering guarantees rely most heavily on the ``rcu_node`` structure's ``->lock`` field, so much so that it is necessary to abbreviate this pattern in the diagrams in the next section. For example, consider the ``rcu_prepare_for_idle()`` function shown below, which is one of several functions that enforce ordering of newly arrived RCU callbacks against future grace periods:h](hOTree RCU’s grace--period memory-ordering guarantees rely most heavily on the }(hjhhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh structure’s }(hjhhhNhNubj8)}(h ``->lock``h]h->lock}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh field, so much so that it is necessary to abbreviate this pattern in the diagrams in the next section. For example, consider the }(hjhhhNhNubj8)}(h``rcu_prepare_for_idle()``h]hrcu_prepare_for_idle()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh function shown below, which is one of several functions that enforce ordering of newly arrived RCU callbacks against future grace periods:}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhjhhubj)}(hXj 1 static void rcu_prepare_for_idle(void) 2 { 3 bool needwake; 4 struct rcu_data *rdp = this_cpu_ptr(&rcu_data); 5 struct rcu_node *rnp; 6 int tne; 7 8 lockdep_assert_irqs_disabled(); 9 if (rcu_rdp_is_offloaded(rdp)) 10 return; 11 12 /* Handle nohz enablement switches conservatively. */ 13 tne = READ_ONCE(tick_nohz_active); 14 if (tne != rdp->tick_nohz_enabled_snap) { 15 if (!rcu_segcblist_empty(&rdp->cblist)) 16 invoke_rcu_core(); /* force nohz to see update. 
*/ 17 rdp->tick_nohz_enabled_snap = tne; 18 return; 19 } 20 if (!tne) 21 return; 22 23 /* 24 * If we have not yet accelerated this jiffy, accelerate all 25 * callbacks on this CPU. 26 */ 27 if (rdp->last_accelerate == jiffies) 28 return; 29 rdp->last_accelerate = jiffies; 30 if (rcu_segcblist_pend_cbs(&rdp->cblist)) { 31 rnp = rdp->mynode; 32 raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */ 33 needwake = rcu_accelerate_cbs(rnp, rdp); 34 raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */ 35 if (needwake) 36 rcu_gp_kthread_wake(); 37 } 38 }h]hXj 1 static void rcu_prepare_for_idle(void) 2 { 3 bool needwake; 4 struct rcu_data *rdp = this_cpu_ptr(&rcu_data); 5 struct rcu_node *rnp; 6 int tne; 7 8 lockdep_assert_irqs_disabled(); 9 if (rcu_rdp_is_offloaded(rdp)) 10 return; 11 12 /* Handle nohz enablement switches conservatively. */ 13 tne = READ_ONCE(tick_nohz_active); 14 if (tne != rdp->tick_nohz_enabled_snap) { 15 if (!rcu_segcblist_empty(&rdp->cblist)) 16 invoke_rcu_core(); /* force nohz to see update. */ 17 rdp->tick_nohz_enabled_snap = tne; 18 return; 19 } 20 if (!tne) 21 return; 22 23 /* 24 * If we have not yet accelerated this jiffy, accelerate all 25 * callbacks on this CPU. 26 */ 27 if (rdp->last_accelerate == jiffies) 28 return; 29 rdp->last_accelerate = jiffies; 30 if (rcu_segcblist_pend_cbs(&rdp->cblist)) { 31 rnp = rdp->mynode; 32 raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */ 33 needwake = rcu_accelerate_cbs(rnp, rdp); 34 raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */ 35 if (needwake) 36 rcu_gp_kthread_wake(); 37 } 38 }}hjsbah}(h]h ]h"]h$]h&]jjuh1jhhhKhjhhubh)}(hBut the only part of ``rcu_prepare_for_idle()`` that really matters for this discussion are lines 32–34. We will therefore abbreviate this function as follows:h](hBut the only part of }(hjhhhNhNubj8)}(h``rcu_prepare_for_idle()``h]hrcu_prepare_for_idle()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubhs that really matters for this discussion are lines 32–34. 
We will therefore abbreviate this function as follows:}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhjhhubkfigure kernel_figure)}(hhh]hfigure)}(hhh]himage)}(h%.. kernel-figure:: rcu_node-lock.svg h]h}(h]h ]h"]h$]h&]uri,RCU/Design/Memory-Ordering/rcu_node-lock.svg candidates}*j8suh1j+hj(hhhKubah}(h]h ]h"]h$]h&]uh1j&hj#ubah}(h]h ]h"]h$]h&]uh1j!hjhhhhhKubh)}(hThe box represents the ``rcu_node`` structure's ``->lock`` critical section, with the double line on top representing the additional ``smp_mb__after_unlock_lock()``.h](hThe box represents the }(hjHhhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjPhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjHubh structure’s }(hjHhhhNhNubj8)}(h ``->lock``h]h->lock}(hjbhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjHubhK critical section, with the double line on top representing the additional }(hjHhhhNhNubj8)}(h``smp_mb__after_unlock_lock()``h]hsmp_mb__after_unlock_lock()}(hjthhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjHubh.}(hjHhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hhh](h)}(h0Tree RCU Grace Period Memory Ordering Componentsh]h0Tree RCU Grace Period Memory Ordering Components}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhhhKubh)}(h\Tree RCU's grace-period memory-ordering guarantee is provided by a number of RCU components:h]h^Tree RCU’s grace-period memory-ordering guarantee is provided by a number of RCU components:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubhenumerated_list)}(hhh](h list_item)}(h`Callback Registry`_h]h)}(hjh]h reference)}(hjh]hCallback Registry}(hjhhhNhNubah}(h]h ]h"]h$]h&]nameCallback Registryrefidcallback-registryuh1jhjresolvedKubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(h`Grace-Period Initialization`_h]h)}(hjh]j)}(hjh]hGrace-Period Initialization}(hjhhhNhNubah}(h]h ]h"]h$]h&]nameGrace-Period Initializationjgrace-period-initializationuh1jhjjKubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(h!`Self-Reported Quiescent States`_h]h)}(hjh]j)}(hjh]hSelf-Reported Quiescent States}(hjhhhNhNubah}(h]h ]h"]h$]h&]nameSelf-Reported 
Quiescent Statesjself-reported-quiescent-statesuh1jhjjKubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(h`Dynamic Tick Interface`_h]h)}(hj!h]j)}(hj!h]hDynamic Tick Interface}(hj&hhhNhNubah}(h]h ]h"]h$]h&]nameDynamic Tick Interfacejdynamic-tick-interfaceuh1jhj#jKubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(h`CPU-Hotplug Interface`_h]h)}(hjDh]j)}(hjDh]hCPU-Hotplug Interface}(hjIhhhNhNubah}(h]h ]h"]h$]h&]nameCPU-Hotplug Interfacejcpu-hotplug-interfaceuh1jhjFjKubah}(h]h ]h"]h$]h&]uh1hhhhMhjBubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(h`Forcing Quiescent States`_h]h)}(hjgh]j)}(hjgh]hForcing Quiescent States}(hjlhhhNhNubah}(h]h ]h"]h$]h&]nameForcing Quiescent Statesjforcing-quiescent-statesuh1jhjijKubah}(h]h ]h"]h$]h&]uh1hhhhMhjeubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(h`Grace-Period Cleanup`_h]h)}(hjh]j)}(hjh]hGrace-Period Cleanup}(hjhhhNhNubah}(h]h ]h"]h$]h&]nameGrace-Period Cleanupjgrace-period-cleanupuh1jhjjKubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(h`Callback Invocation`_ h]h)}(h`Callback Invocation`_h]j)}(hjh]hCallback Invocation}(hjhhhNhNubah}(h]h ]h"]h$]h&]nameCallback Invocationjcallback-invocationuh1jhjjKubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubeh}(h]h ]h"]h$]h&]enumtypearabicprefixhsuffix.uh1jhjhhhhhMubh)}(hMEach of the following section looks at the corresponding component in detail.h]hMEach of the following section looks at the corresponding component in detail.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM hjhhubh)}(hhh](h)}(hCallback Registryh]hCallback Registry}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhhhMubh)}(hXIf RCU's grace-period guarantee is to mean anything at all, any access that happens before a given invocation of ``call_rcu()`` must also happen before the corresponding grace period. 
The implementation of this portion of RCU's grace period guarantee is shown in the following figure:h](hsIf RCU’s grace-period guarantee is to mean anything at all, any access that happens before a given invocation of }(hjhhhNhNubj8)}(h``call_rcu()``h]h call_rcu()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh must also happen before the corresponding grace period. The implementation of this portion of RCU’s grace period guarantee is shown in the following figure:}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjhhubj")}(hhh]j')}(hhh]j,)}(h1.. kernel-figure:: TreeRCU-callback-registry.svg h]h}(h]h ]h"]h$]h&]uri8RCU/Design/Memory-Ordering/TreeRCU-callback-registry.svgj9}j;j* suh1j+hj hhhKubah}(h]h ]h"]h$]h&]uh1j&hj ubah}(h]h ]h"]h$]h&]uh1j!hjhhhhhMubh)}(hXBecause ``call_rcu()`` normally acts only on CPU-local state, it provides no ordering guarantees, either for itself or for phase one of the update (which again will usually be removal of an element from an RCU-protected data structure). It simply enqueues the ``rcu_head`` structure on a per-CPU list, which cannot become associated with a grace period until a later call to ``rcu_accelerate_cbs()``, as shown in the diagram above.h](hBecause }(hj8 hhhNhNubj8)}(h``call_rcu()``h]h call_rcu()}(hj@ hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj8 ubh normally acts only on CPU-local state, it provides no ordering guarantees, either for itself or for phase one of the update (which again will usually be removal of an element from an RCU-protected data structure). 
It simply enqueues the }(hj8 hhhNhNubj8)}(h ``rcu_head``h]hrcu_head}(hjR hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj8 ubhg structure on a per-CPU list, which cannot become associated with a grace period until a later call to }(hj8 hhhNhNubj8)}(h``rcu_accelerate_cbs()``h]hrcu_accelerate_cbs()}(hjd hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj8 ubh , as shown in the diagram above.}(hj8 hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjhhubh)}(hXOne set of code paths shown on the left invokes ``rcu_accelerate_cbs()`` via ``note_gp_changes()``, either directly from ``call_rcu()`` (if the current CPU is inundated with queued ``rcu_head`` structures) or more likely from an ``RCU_SOFTIRQ`` handler. Another code path in the middle is taken only in kernels built with ``CONFIG_RCU_FAST_NO_HZ=y``, which invokes ``rcu_accelerate_cbs()`` via ``rcu_prepare_for_idle()``. The final code path on the right is taken only in kernels built with ``CONFIG_HOTPLUG_CPU=y``, which invokes ``rcu_accelerate_cbs()`` via ``rcu_advance_cbs()``, ``rcu_migrate_callbacks``, ``rcutree_migrate_callbacks()``, and ``takedown_cpu()``, which in turn is invoked on a surviving CPU after the outgoing CPU has been completely offlined.h](h0One set of code paths shown on the left invokes }(hj| hhhNhNubj8)}(h``rcu_accelerate_cbs()``h]hrcu_accelerate_cbs()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj| ubh via }(hj| hhhNhNubj8)}(h``note_gp_changes()``h]hnote_gp_changes()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj| ubh, either directly from }(hj| hhhNhNubj8)}(h``call_rcu()``h]h call_rcu()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj| ubh. (if the current CPU is inundated with queued }(hj| hhhNhNubj8)}(h ``rcu_head``h]hrcu_head}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj| ubh$ structures) or more likely from an }(hj| hhhNhNubj8)}(h``RCU_SOFTIRQ``h]h RCU_SOFTIRQ}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj| ubhN handler. 
Another code path in the middle is taken only in kernels built with }(hj| hhhNhNubj8)}(h``CONFIG_RCU_FAST_NO_HZ=y``h]hCONFIG_RCU_FAST_NO_HZ=y}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj| ubh, which invokes }(hj| hhhNhNubj8)}(h``rcu_accelerate_cbs()``h]hrcu_accelerate_cbs()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj| ubh via }(hj| hhhNhNubj8)}(h``rcu_prepare_for_idle()``h]hrcu_prepare_for_idle()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj| ubhG. The final code path on the right is taken only in kernels built with }(hj| hhhNhNubj8)}(h``CONFIG_HOTPLUG_CPU=y``h]hCONFIG_HOTPLUG_CPU=y}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj| ubh, which invokes }(hj| hhhNhNubj8)}(h``rcu_accelerate_cbs()``h]hrcu_accelerate_cbs()}(hj& hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj| ubh via }(hj| hhhNhNubj8)}(h``rcu_advance_cbs()``h]hrcu_advance_cbs()}(hj8 hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj| ubh, }(hj| hhhNhNubj8)}(h``rcu_migrate_callbacks``h]hrcu_migrate_callbacks}(hjJ hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj| ubh, }(hj| hhhNhNubj8)}(h``rcutree_migrate_callbacks()``h]hrcutree_migrate_callbacks()}(hj\ hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj| ubh, and }(hj| hhhNhNubj8)}(h``takedown_cpu()``h]htakedown_cpu()}(hjn hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj| ubhb, which in turn is invoked on a surviving CPU after the outgoing CPU has been completely offlined.}(hj| hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM hjhhubh)}(hXDThere are a few other code paths within grace-period processing that opportunistically invoke ``rcu_accelerate_cbs()``. However, either way, all of the CPU's recently queued ``rcu_head`` structures are associated with a future grace-period number under the protection of the CPU's lead ``rcu_node`` structure's ``->lock``. 
In all cases, there is full ordering against any prior critical section for that same ``rcu_node`` structure's ``->lock``, and also full ordering against any of the current task's or CPU's prior critical sections for any ``rcu_node`` structure's ``->lock``.h](h^There are a few other code paths within grace-period processing that opportunistically invoke }(hj hhhNhNubj8)}(h``rcu_accelerate_cbs()``h]hrcu_accelerate_cbs()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj ubh:. However, either way, all of the CPU’s recently queued }(hj hhhNhNubj8)}(h ``rcu_head``h]hrcu_head}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj ubhf structures are associated with a future grace-period number under the protection of the CPU’s lead }(hj hhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj ubh structure’s }(hj hhhNhNubj8)}(h ``->lock``h]h->lock}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj ubhX. In all cases, there is full ordering against any prior critical section for that same }(hj hhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj ubh structure’s }(hj hhhNhNubj8)}(h ``->lock``h]h->lock}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj ubhh, and also full ordering against any of the current task’s or CPU’s prior critical sections for any }(hj hhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj ubh structure’s }(hj hhhNhNubj8)}(h ``->lock``h]h->lock}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj ubh.}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM-hjhhubh)}(hThe next section will show how this ordering ensures that any accesses prior to the ``call_rcu()`` (particularly including phase one of the update) happen before the start of the corresponding grace period.h](hTThe next section will show how this ordering ensures that any accesses prior to the }(hj$ hhhNhNubj8)}(h``call_rcu()``h]h call_rcu()}(hj, hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj$ ubhl (particularly including phase one of the update) happen before the start of the corresponding grace period.}(hj$ 
hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM7hjhhubj')}(hhh]j,)}(hhh](j1)}(hhh]h}(h]h ]h"]h$]h&]colwidthKGuh1j0hjG ubj=)}(hhh](jB)}(hhh]jG)}(hhh]h)}(h**Quick Quiz**:h](jP)}(h**Quick Quiz**h]h Quick Quiz}(hja hhhNhNubah}(h]h ]h"]h$]h&]uh1jOhj] ubh:}(hj] hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM<hjZ ubah}(h]h ]h"]h$]h&]uh1jFhjW ubah}(h]h ]h"]h$]h&]uh1jAhjT ubjB)}(hhh]jG)}(hhh]h)}(h%But what about ``synchronize_rcu()``?h](hBut what about }(hj hhhNhNubj8)}(h``synchronize_rcu()``h]hsynchronize_rcu()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj ubh?}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM>hj ubah}(h]h ]h"]h$]h&]uh1jFhj ubah}(h]h ]h"]h$]h&]uh1jAhjT ubjB)}(hhh]jG)}(hhh]h)}(h **Answer**:h](jP)}(h **Answer**h]hAnswer}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jOhj ubh:}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM@hj ubah}(h]h ]h"]h$]h&]uh1jFhj ubah}(h]h ]h"]h$]h&]uh1jAhjT ubjB)}(hhh]jG)}(hhh]h)}(hThe ``synchronize_rcu()`` passes ``call_rcu()`` to ``wait_rcu_gp()``, which invokes it. So either way, it eventually comes down to ``call_rcu()``.h](hThe }(hj hhhNhNubj8)}(h``synchronize_rcu()``h]hsynchronize_rcu()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj ubh passes }(hj hhhNhNubj8)}(h``call_rcu()``h]h call_rcu()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj ubh to }(hj hhhNhNubj8)}(h``wait_rcu_gp()``h]h wait_rcu_gp()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj ubh?, which invokes it. 
So either way, it eventually comes down to }(hj hhhNhNubj8)}(h``call_rcu()``h]h call_rcu()}(hj) hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj ubh.}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMBhj ubah}(h]h ]h"]h$]h&]uh1jFhj ubah}(h]h ]h"]h$]h&]uh1jAhjT ubeh}(h]h ]h"]h$]h&]uh1j<hjG ubeh}(h]h ]h"]h$]h&]colsKuh1j+hjD ubah}(h]h ]h"]h$]h&]uh1j&hjhhhhhNubeh}(h]jah ]h"]callback registryah$]h&]uh1hhjhhhhhM referencedKubh)}(hhh](h)}(hGrace-Period Initializationh]hGrace-Period Initialization}(hjk hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjh hhhhhMHubh)}(hXrGrace-period initialization is carried out by the grace-period kernel thread, which makes several passes over the ``rcu_node`` tree within the ``rcu_gp_init()`` function. This means that showing the full flow of ordering through the grace-period computation will require duplicating this tree. If you find this confusing, please note that the state of the ``rcu_node`` changes over time, just like Heraclitus's river. However, to keep the ``rcu_node`` river tractable, the grace-period kernel thread's traversals are presented in multiple parts, starting in this section with the various phases of grace-period initialization.h](hrGrace-period initialization is carried out by the grace-period kernel thread, which makes several passes over the }(hjy hhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjy ubh tree within the }(hjy hhhNhNubj8)}(h``rcu_gp_init()``h]h rcu_gp_init()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjy ubh function. This means that showing the full flow of ordering through the grace-period computation will require duplicating this tree. If you find this confusing, please note that the state of the }(hjy hhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjy ubhI changes over time, just like Heraclitus’s river. 
However, to keep the }(hjy hhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjy ubh river tractable, the grace-period kernel thread’s traversals are presented in multiple parts, starting in this section with the various phases of grace-period initialization.}(hjy hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMJhjh hhubh)}(hThe first ordering-related grace-period initialization action is to advance the ``rcu_state`` structure's ``->gp_seq`` grace-period-number counter, as shown below:h](hPThe first ordering-related grace-period initialization action is to advance the }(hj hhhNhNubj8)}(h ``rcu_state``h]h rcu_state}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj ubh structure’s }(hj hhhNhNubj8)}(h ``->gp_seq``h]h->gp_seq}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj ubh- grace-period-number counter, as shown below:}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMThjh hhubj")}(hhh]j')}(hhh]j,)}(h).. kernel-figure:: TreeRCU-gp-init-1.svg h]h}(h]h ]h"]h$]h&]uri0RCU/Design/Memory-Ordering/TreeRCU-gp-init-1.svgj9}j;j suh1j+hj hhhKubah}(h]h ]h"]h$]h&]uh1j&hj ubah}(h]h ]h"]h$]h&]uh1j!hjh hhhhhMYubh)}(hThe actual increment is carried out using ``smp_store_release()``, which helps reject false-positive RCU CPU stall detection. Note that only the root ``rcu_node`` structure is touched.h](h*The actual increment is carried out using }(hj hhhNhNubj8)}(h``smp_store_release()``h]hsmp_store_release()}(hj( hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj ubhU, which helps reject false-positive RCU CPU stall detection. Note that only the root }(hj hhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hj: hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj ubh structure is touched.}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMZhjh hhubh)}(hXThe first pass through the ``rcu_node`` tree updates bitmasks based on CPUs having come online or gone offline since the start of the previous grace period. 
In the common case where the number of online CPUs for this ``rcu_node`` structure has not transitioned to or from zero, this pass will scan only the leaf ``rcu_node`` structures. However, if the number of online CPUs for a given leaf ``rcu_node`` structure has transitioned from zero, ``rcu_init_new_rnp()`` will be invoked for the first incoming CPU. Similarly, if the number of online CPUs for a given leaf ``rcu_node`` structure has transitioned to zero, ``rcu_cleanup_dead_rnp()`` will be invoked for the last outgoing CPU. The diagram below shows the path of ordering if the leftmost ``rcu_node`` structure onlines its first CPU and if the next ``rcu_node`` structure has no online CPUs (or, alternatively if the leftmost ``rcu_node`` structure offlines its last CPU and if the next ``rcu_node`` structure has no online CPUs).h](hThe first pass through the }(hjR hhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjZ hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjR ubh tree updates bitmasks based on CPUs having come online or gone offline since the start of the previous grace period. In the common case where the number of online CPUs for this }(hjR hhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjl hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjR ubhS structure has not transitioned to or from zero, this pass will scan only the leaf }(hjR hhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hj~ hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjR ubhD structures. However, if the number of online CPUs for a given leaf }(hjR hhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjR ubh' structure has transitioned from zero, }(hjR hhhNhNubj8)}(h``rcu_init_new_rnp()``h]hrcu_init_new_rnp()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjR ubhf will be invoked for the first incoming CPU. 
Similarly, if the number of online CPUs for a given leaf }(hjR hhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjR ubh% structure has transitioned to zero, }(hjR hhhNhNubj8)}(h``rcu_cleanup_dead_rnp()``h]hrcu_cleanup_dead_rnp()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjR ubhi will be invoked for the last outgoing CPU. The diagram below shows the path of ordering if the leftmost }(hjR hhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjR ubh1 structure onlines its first CPU and if the next }(hjR hhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjR ubhA structure has no online CPUs (or, alternatively if the leftmost }(hjR hhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjR ubh1 structure offlines its last CPU and if the next }(hjR hhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjR ubh structure has no online CPUs).}(hjR hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM^hjh hhubj")}(hhh]j')}(hhh]j,)}(h).. 
kernel-figure:: TreeRCU-gp-init-2.svg h]h}(h]h ]h"]h$]h&]uri0RCU/Design/Memory-Ordering/TreeRCU-gp-init-2.svgj9}j;j7suh1j+hj)hhhKubah}(h]h ]h"]h$]h&]uh1j&hj&ubah}(h]h ]h"]h$]h&]uh1j!hjh hhhhhMoubh)}(hThe final ``rcu_gp_init()`` pass through the ``rcu_node`` tree traverses breadth-first, setting each ``rcu_node`` structure's ``->gp_seq`` field to the newly advanced value from the ``rcu_state`` structure, as shown in the following diagram.h](h The final }(hjEhhhNhNubj8)}(h``rcu_gp_init()``h]h rcu_gp_init()}(hjMhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjEubh pass through the }(hjEhhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hj_hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjEubh, tree traverses breadth-first, setting each }(hjEhhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjqhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjEubh structure’s }(hjEhhhNhNubj8)}(h ``->gp_seq``h]h->gp_seq}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjEubh, field to the newly advanced value from the }(hjEhhhNhNubj8)}(h ``rcu_state``h]h rcu_state}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjEubh. structure, as shown in the following diagram.}(hjEhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMphjh hhubj")}(hhh]j')}(hhh]j,)}(h).. kernel-figure:: TreeRCU-gp-init-3.svg h]h}(h]h ]h"]h$]h&]uri0RCU/Design/Memory-Ordering/TreeRCU-gp-init-3.svgj9}j;jsuh1j+hjhhhKubah}(h]h ]h"]h$]h&]uh1j&hjubah}(h]h ]h"]h$]h&]uh1j!hjh hhhhhMvubh)}(hXThis change will also cause each CPU's next call to ``__note_gp_changes()`` to notice that a new grace period has started, as described in the next section. 
But because the grace-period kthread started the grace period at the root (with the advancing of the ``rcu_state`` structure's ``->gp_seq`` field) before setting each leaf ``rcu_node`` structure's ``->gp_seq`` field, each CPU's observation of the start of the grace period will happen after the actual start of the grace period.h](h6This change will also cause each CPU’s next call to }(hjhhhNhNubj8)}(h``__note_gp_changes()``h]h__note_gp_changes()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh to notice that a new grace period has started, as described in the next section. But because the grace-period kthread started the grace period at the root (with the advancing of the }(hjhhhNhNubj8)}(h ``rcu_state``h]h rcu_state}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh structure’s }(hjhhhNhNubj8)}(h ``->gp_seq``h]h->gp_seq}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh! field) before setting each leaf }(hjhhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh structure’s }(hjhhhNhNubj8)}(h ``->gp_seq``h]h->gp_seq}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubhy field, each CPU’s observation of the start of the grace period will happen after the actual start of the grace period.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMwhjh hhubj')}(hhh]j,)}(hhh](j1)}(hhh]h}(h]h ]h"]h$]h&]colwidthKGuh1j0hj7ubj=)}(hhh](jB)}(hhh]jG)}(hhh]h)}(h**Quick Quiz**:h](jP)}(h**Quick Quiz**h]h Quick Quiz}(hjQhhhNhNubah}(h]h ]h"]h$]h&]uh1jOhjMubh:}(hjMhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjJubah}(h]h ]h"]h$]h&]uh1jFhjGubah}(h]h ]h"]h$]h&]uh1jAhjDubjB)}(hhh]jG)}(hhh]h)}(hBut what about the CPU that started the grace period? Why wouldn't it see the start of the grace period right when it started that grace period?h]hBut what about the CPU that started the grace period? 
Why wouldn’t it see the start of the grace period right when it started that grace period?}(hj{hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjxubah}(h]h ]h"]h$]h&]uh1jFhjuubah}(h]h ]h"]h$]h&]uh1jAhjDubjB)}(hhh]jG)}(hhh]h)}(h **Answer**:h](jP)}(h **Answer**h]hAnswer}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jOhjubh:}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1jFhjubah}(h]h ]h"]h$]h&]uh1jAhjDubjB)}(hhh]jG)}(hhh]h)}(hXIn some deep philosophical and overly anthropomorphized sense, yes, the CPU starting the grace period is immediately aware of having done so. However, if we instead assume that RCU is not self-aware, then even the CPU starting the grace period does not really become aware of the start of this grace period until its first call to ``__note_gp_changes()``. On the other hand, this CPU potentially gets early notification because it invokes ``__note_gp_changes()`` during its last ``rcu_gp_init()`` pass through its leaf ``rcu_node`` structure.h](hXIIn some deep philosophical and overly anthropomorphized sense, yes, the CPU starting the grace period is immediately aware of having done so. However, if we instead assume that RCU is not self-aware, then even the CPU starting the grace period does not really become aware of the start of this grace period until its first call to }(hjhhhNhNubj8)}(h``__note_gp_changes()``h]h__note_gp_changes()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubhU. 
On the other hand, this CPU potentially gets early notification because it invokes }(hjhhhNhNubj8)}(h``__note_gp_changes()``h]h__note_gp_changes()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh during its last }(hjhhhNhNubj8)}(h``rcu_gp_init()``h]h rcu_gp_init()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh pass through its leaf }(hjhhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh structure.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1jFhjubah}(h]h ]h"]h$]h&]uh1jAhjDubeh}(h]h ]h"]h$]h&]uh1j<hj7ubeh}(h]h ]h"]h$]h&]colsKuh1j+hj4ubah}(h]h ]h"]h$]h&]uh1j&hjh hhhhhNubeh}(h]jah ]h"]grace-period initializationah$]h&]uh1hhjhhhhhMHjg Kubh)}(hhh](h)}(hSelf-Reported Quiescent Statesh]hSelf-Reported Quiescent States}(hjHhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjEhhhhhMubh)}(hXWhen all entities that might block the grace period have reported quiescent states (or as described in a later section, had quiescent states reported on their behalf), the grace period can end. Online non-idle CPUs report their own quiescent states, as shown in the following diagram:h]hXWhen all entities that might block the grace period have reported quiescent states (or as described in a later section, had quiescent states reported on their behalf), the grace period can end. Online non-idle CPUs report their own quiescent states, as shown in the following diagram:}(hjVhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjEhhubj")}(hhh]j')}(hhh]j,)}(h".. kernel-figure:: TreeRCU-qs.svg h]h}(h]h ]h"]h$]h&]uri)RCU/Design/Memory-Ordering/TreeRCU-qs.svgj9}j;jusuh1j+hjghhhKubah}(h]h ]h"]h$]h&]uh1j&hjdubah}(h]h ]h"]h$]h&]uh1j!hjEhhhhhMubh)}(hXThis is for the last CPU to report a quiescent state, which signals the end of the grace period. Earlier quiescent states would push up the ``rcu_node`` tree only until they encountered an ``rcu_node`` structure that is waiting for additional quiescent states. 
However, ordering is nevertheless preserved because some later quiescent state will acquire that ``rcu_node`` structure's ``->lock``.h](hThis is for the last CPU to report a quiescent state, which signals the end of the grace period. Earlier quiescent states would push up the }(hjhhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh% tree only until they encountered an }(hjhhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh structure that is waiting for additional quiescent states. However, ordering is nevertheless preserved because some later quiescent state will acquire that }(hjhhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh structure’s }(hjhhhNhNubj8)}(h ``->lock``h]h->lock}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjEhhubh)}(hXAny number of events can lead up to a CPU invoking ``note_gp_changes`` (or alternatively, directly invoking ``__note_gp_changes()``), at which point that CPU will notice the start of a new grace period while holding its leaf ``rcu_node`` lock. 
Therefore, all execution shown in this diagram happens after the start of the grace period. In addition, this CPU will consider any RCU read-side critical section that started before the invocation of }(hjhhhNhNubj8)}(h``__note_gp_changes()``h]h__note_gp_changes()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubhi to have started before the grace period, and thus a critical section that the grace period must wait on.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjEhhubj')}(hhh]j,)}(hhh](j1)}(hhh]h}(h]h ]h"]h$]h&]colwidthKGuh1j0hj2ubj=)}(hhh](jB)}(hhh]jG)}(hhh]h)}(h**Quick Quiz**:h](jP)}(h**Quick Quiz**h]h Quick Quiz}(hjLhhhNhNubah}(h]h ]h"]h$]h&]uh1jOhjHubh:}(hjHhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjEubah}(h]h ]h"]h$]h&]uh1jFhjBubah}(h]h ]h"]h$]h&]uh1jAhj?ubjB)}(hhh]jG)}(hhh]h)}(hBut a RCU read-side critical section might have started after the beginning of the grace period (the advancing of ``->gp_seq`` from earlier), so why should the grace period wait on such a critical section?h](hrBut a RCU read-side critical section might have started after the beginning of the grace period (the advancing of }(hjvhhhNhNubj8)}(h ``->gp_seq``h]h->gp_seq}(hj~hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjvubhO from earlier), so why should the grace period wait on such a critical section?}(hjvhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjsubah}(h]h ]h"]h$]h&]uh1jFhjpubah}(h]h ]h"]h$]h&]uh1jAhj?ubjB)}(hhh]jG)}(hhh]h)}(h **Answer**:h](jP)}(h **Answer**h]hAnswer}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jOhjubh:}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1jFhjubah}(h]h ]h"]h$]h&]uh1jAhj?ubjB)}(hhh]jG)}(hhh]h)}(hXIt is indeed not necessary for the grace period to wait on such a critical section. However, it is permissible to wait on it. And it is furthermore important to wait on it, as this lazy approach is far more scalable than a “big bang” all-at-once grace-period start could possibly be.h]hXIt is indeed not necessary for the grace period to wait on such a critical section. However, it is permissible to wait on it. 
And it is furthermore important to wait on it, as this lazy approach is far more scalable than a “big bang” all-at-once grace-period start could possibly be.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1jFhjubah}(h]h ]h"]h$]h&]uh1jAhj?ubeh}(h]h ]h"]h$]h&]uh1j<hj2ubeh}(h]h ]h"]h$]h&]colsKuh1j+hj/ubah}(h]h ]h"]h$]h&]uh1j&hjEhhhhhNubh)}(hXnIf the CPU does a context switch, a quiescent state will be noted by ``rcu_note_context_switch()`` on the left. On the other hand, if the CPU takes a scheduler-clock interrupt while executing in usermode, a quiescent state will be noted by ``rcu_sched_clock_irq()`` on the right. Either way, the passage through a quiescent state will be noted in a per-CPU variable.h](hEIf the CPU does a context switch, a quiescent state will be noted by }(hjhhhNhNubj8)}(h``rcu_note_context_switch()``h]hrcu_note_context_switch()}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh on the left. On the other hand, if the CPU takes a scheduler-clock interrupt while executing in usermode, a quiescent state will be noted by }(hjhhhNhNubj8)}(h``rcu_sched_clock_irq()``h]hrcu_sched_clock_irq()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubhe on the right. Either way, the passage through a quiescent state will be noted in a per-CPU variable.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjEhhubh)}(hX\The next time an ``RCU_SOFTIRQ`` handler executes on this CPU (for example, after the next scheduler-clock interrupt), ``rcu_core()`` will invoke ``rcu_check_quiescent_state()``, which will notice the recorded quiescent state, and invoke ``rcu_report_qs_rdp()``. 
If ``rcu_report_qs_rdp()`` verifies that the quiescent state really does apply to the current grace period, it invokes ``rcu_report_qs_rnp()`` which traverses up the ``rcu_node`` tree as shown at the bottom of the diagram, clearing bits from each ``rcu_node`` structure's ``->qsmask`` field, and propagating up the tree when the result is zero.h](hThe next time an }(hj5hhhNhNubj8)}(h``RCU_SOFTIRQ``h]h RCU_SOFTIRQ}(hj=hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj5ubhW handler executes on this CPU (for example, after the next scheduler-clock interrupt), }(hj5hhhNhNubj8)}(h``rcu_core()``h]h rcu_core()}(hjOhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj5ubh will invoke }(hj5hhhNhNubj8)}(h``rcu_check_quiescent_state()``h]hrcu_check_quiescent_state()}(hjahhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj5ubh=, which will notice the recorded quiescent state, and invoke }(hj5hhhNhNubj8)}(h``rcu_report_qs_rdp()``h]hrcu_report_qs_rdp()}(hjshhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj5ubh. If }(hj5hhhNhNubj8)}(h``rcu_report_qs_rdp()``h]hrcu_report_qs_rdp()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj5ubh] verifies that the quiescent state really does apply to the current grace period, it invokes }(hj5hhhNhNubj8)}(h``rcu_report_qs_rnp()``h]hrcu_report_qs_rnp()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj5ubh which traverses up the }(hj5hhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj5ubhE tree as shown at the bottom of the diagram, clearing bits from each }(hj5hhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj5ubh structure’s }(hj5hhhNhNubj8)}(h ``->qsmask``h]h->qsmask}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj5ubh< field, and propagating up the tree when the result is zero.}(hj5hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjEhhubh)}(hX Note that traversal passes upwards out of a given ``rcu_node`` structure only if the current CPU is reporting the last quiescent state for the subtree headed by that ``rcu_node`` structure. 
A key point is that if a CPU's traversal stops at a given ``rcu_node`` structure, then there will be a later traversal by another CPU (or perhaps the same one) that proceeds upwards from that point, and the ``rcu_node`` ``->lock`` guarantees that the first CPU's quiescent state happens before the remainder of the second CPU's traversal. Applying this line of thought repeatedly shows that all CPUs' quiescent states happen before the last CPU traverses through the root ``rcu_node`` structure, the “last CPU” being the one that clears the last bit in the root ``rcu_node`` structure's ``->qsmask`` field.h](h2Note that traversal passes upwards out of a given }(hjhhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubhh structure only if the current CPU is reporting the last quiescent state for the subtree headed by that }(hjhhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubhH structure. A key point is that if a CPU’s traversal stops at a given }(hjhhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh structure, then there will be a later traversal by another CPU (or perhaps the same one) that proceeds upwards from that point, and the }(hjhhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hj#hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh }(hjhhhNhNubj8)}(h ``->lock``h]h->lock}(hj5hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh guarantees that the first CPU’s quiescent state happens before the remainder of the second CPU’s traversal. 
Applying this line of thought repeatedly shows that all CPUs’ quiescent states happen before the last CPU traverses through the root }(hjhhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjGhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubhR structure, the “last CPU” being the one that clears the last bit in the root }(hjhhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjYhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh structure’s }(hjhhhNhNubj8)}(h ``->qsmask``h]h->qsmask}(hjkhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh field.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjEhhubeh}(h]jah ]h"]self-reported quiescent statesah$]h&]uh1hhjhhhhhMjg Kubh)}(hhh](h)}(hDynamic Tick Interfaceh]hDynamic Tick Interface}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhhhMubh)}(hX%Due to energy-efficiency considerations, RCU is forbidden from disturbing idle CPUs. CPUs are therefore required to notify RCU when entering or leaving idle state, which they do via fully ordered value-returning atomic operations on a per-CPU variable. The ordering effects are as shown below:h]hX%Due to energy-efficiency considerations, RCU is forbidden from disturbing idle CPUs. CPUs are therefore required to notify RCU when entering or leaving idle state, which they do via fully ordered value-returning atomic operations on a per-CPU variable. The ordering effects are as shown below:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjhhubj")}(hhh]j')}(hhh]j,)}(h'.. kernel-figure:: TreeRCU-dyntick.svg h]h}(h]h ]h"]h$]h&]uri.RCU/Design/Memory-Ordering/TreeRCU-dyntick.svgj9}j;jsuh1j+hjhhhKubah}(h]h ]h"]h$]h&]uh1j&hjubah}(h]h ]h"]h$]h&]uh1j!hjhhhhhMubh)}(hXThe RCU grace-period kernel thread samples the per-CPU idleness variable while holding the corresponding CPU's leaf ``rcu_node`` structure's ``->lock``. This means that any RCU read-side critical sections that precede the idle period (the oval near the top of the diagram above) will happen before the end of the current grace period. 
Similarly, the beginning of the current grace period will happen before any RCU read-side critical sections that follow the idle period (the oval near the bottom of the diagram above).h](hvThe RCU grace-period kernel thread samples the per-CPU idleness variable while holding the corresponding CPU’s leaf }(hjhhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh structure’s }(hjhhhNhNubj8)}(h ``->lock``h]h->lock}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubhXp. This means that any RCU read-side critical sections that precede the idle period (the oval near the top of the diagram above) will happen before the end of the current grace period. Similarly, the beginning of the current grace period will happen before any RCU read-side critical sections that follow the idle period (the oval near the bottom of the diagram above).}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjhhubh)}(hfPlumbing this into the full grace-period execution is described `below `__.h](h@Plumbing this into the full grace-period execution is described }(hjhhhNhNubj)}(h%`below `__h]hbelow}(hjhhhNhNubah}(h]h ]h"]h$]h&]namebelowjj{uh1jhjjKubh.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjhhubeh}(h]j5ah ]h"]dynamic tick interfaceah$]h&]uh1hhjhhhhhMjg Kubh)}(hhh](h)}(hCPU-Hotplug Interfaceh]hCPU-Hotplug Interface}(hj&hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj#hhhhhMubh)}(hXRCU is also forbidden from disturbing offline CPUs, which might well be powered off and removed from the system completely. CPUs are therefore required to notify RCU of their comings and goings as part of the corresponding CPU hotplug operations. The ordering effects are shown below:h]hXRCU is also forbidden from disturbing offline CPUs, which might well be powered off and removed from the system completely. CPUs are therefore required to notify RCU of their comings and goings as part of the corresponding CPU hotplug operations. The ordering effects are shown below:}(hj4hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj#hhubj")}(hhh]j')}(hhh]j,)}(h'.. 
kernel-figure:: TreeRCU-hotplug.svg h]h}(h]h ]h"]h$]h&]uri.RCU/Design/Memory-Ordering/TreeRCU-hotplug.svgj9}j;jSsuh1j+hjEhhhKubah}(h]h ]h"]h$]h&]uh1j&hjBubah}(h]h ]h"]h$]h&]uh1j!hj#hhhhhMubh)}(hX]Because CPU hotplug operations are much less frequent than idle transitions, they are heavier weight, and thus acquire the CPU's leaf ``rcu_node`` structure's ``->lock`` and update this structure's ``->qsmaskinitnext``. The RCU grace-period kernel thread samples this mask to detect CPUs having gone offline since the beginning of this grace period.h](hBecause CPU hotplug operations are much less frequent than idle transitions, they are heavier weight, and thus acquire the CPU’s leaf }(hjahhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjihhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjaubh structure’s }(hjahhhNhNubj8)}(h ``->lock``h]h->lock}(hj{hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjaubh and update this structure’s }(hjahhhNhNubj8)}(h``->qsmaskinitnext``h]h->qsmaskinitnext}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjaubh. The RCU grace-period kernel thread samples this mask to detect CPUs having gone offline since the beginning of this grace period.}(hjahhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhj#hhubh)}(hfPlumbing this into the full grace-period execution is described `below `__.h](h@Plumbing this into the full grace-period execution is described }(hjhhhNhNubj)}(h%`below `__h]hbelow}(hjhhhNhNubah}(h]h ]h"]h$]h&]namebelowjj{uh1jhjjKubh.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhj#hhubeh}(h]jXah ]h"]cpu-hotplug interfaceah$]h&]uh1hhjhhhhhMjg Kubh)}(hhh](h)}(hForcing Quiescent Statesh]hForcing Quiescent States}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhhhM ubh)}(hX&As noted above, idle and offline CPUs cannot report their own quiescent states, and therefore the grace-period kernel thread must do the reporting on their behalf. 
This process is called “forcing quiescent states”; it is repeated every few jiffies, and its ordering effects are shown below:h]hX&As noted above, idle and offline CPUs cannot report their own quiescent states, and therefore the grace-period kernel thread must do the reporting on their behalf. This process is called “forcing quiescent states”; it is repeated every few jiffies, and its ordering effects are shown below:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjhhubj")}(hhh]j')}(hhh]j,)}(h&.. kernel-figure:: TreeRCU-gp-fqs.svg h]h}(h]h ]h"]h$]h&]uri-RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svgj9}j;jsuh1j+hjhhhKubah}(h]h ]h"]h$]h&]uh1j&hjubah}(h]h ]h"]h$]h&]uh1j!hjhhhhhMubh)}(hX9Each pass of quiescent state forcing is guaranteed to traverse the leaf ``rcu_node`` structures, and if there are no new quiescent states due to recently idled and/or offlined CPUs, then only the leaves are traversed. However, if there is a newly offlined CPU as illustrated on the left or a newly idled CPU as illustrated on the right, the corresponding quiescent state will be driven up towards the root. As with self-reported quiescent states, the upwards driving stops once it reaches an ``rcu_node`` structure that has quiescent states outstanding from other CPUs.h](hHEach pass of quiescent state forcing is guaranteed to traverse the leaf }(hj hhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj ubhX structures, and if there are no new quiescent states due to recently idled and/or offlined CPUs, then only the leaves are traversed. However, if there is a newly offlined CPU as illustrated on the left or a newly idled CPU as illustrated on the right, the corresponding quiescent state will be driven up towards the root. 
As with self-reported quiescent states, the upwards driving stops once it reaches an }(hj hhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hj&hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj ubhA structure that has quiescent states outstanding from other CPUs.}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjhhubj')}(hhh]j,)}(hhh](j1)}(hhh]h}(h]h ]h"]h$]h&]colwidthKGuh1j0hjAubj=)}(hhh](jB)}(hhh]jG)}(hhh]h)}(h**Quick Quiz**:h](jP)}(h**Quick Quiz**h]h Quick Quiz}(hj[hhhNhNubah}(h]h ]h"]h$]h&]uh1jOhjWubh:}(hjWhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM!hjTubah}(h]h ]h"]h$]h&]uh1jFhjQubah}(h]h ]h"]h$]h&]uh1jAhjNubjB)}(hhh]jG)}(hhh]h)}(hXThe leftmost drive to root stopped before it reached the root ``rcu_node`` structure, which means that there are still CPUs subordinate to that structure on which the current grace period is waiting. Given that, how is it possible that the rightmost drive to root ended the grace period?h](h>The leftmost drive to root stopped before it reached the root }(hjhhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh structure, which means that there are still CPUs subordinate to that structure on which the current grace period is waiting. Given that, how is it possible that the rightmost drive to root ended the grace period?}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM#hjubah}(h]h ]h"]h$]h&]uh1jFhjubah}(h]h ]h"]h$]h&]uh1jAhjNubjB)}(hhh]jG)}(hhh]h)}(h **Answer**:h](jP)}(h **Answer**h]hAnswer}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jOhjubh:}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM)hjubah}(h]h ]h"]h$]h&]uh1jFhjubah}(h]h ]h"]h$]h&]uh1jAhjNubjB)}(hhh]jG)}(hhh]h)}(hX4Good analysis! It is in fact impossible in the absence of bugs in RCU. But this diagram is complex enough as it is, so simplicity overrode accuracy. You can think of it as poetic license, or you can think of it as misdirection that is resolved in the `stitched-together diagram `__.h](hGood analysis! It is in fact impossible in the absence of bugs in RCU. 
But this diagram is complex enough as it is, so simplicity overrode accuracy. You can think of it as poetic license, or you can think of it as misdirection that is resolved in the }(hjhhhNhNubj)}(h8`stitched-together diagram `__h]hstitched-together diagram}(hjhhhNhNubah}(h]h ]h"]h$]h&]namestitched-together diagramjputting-it-all-togetheruh1jhjjKubh.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM+hjubah}(h]h ]h"]h$]h&]uh1jFhjubah}(h]h ]h"]h$]h&]uh1jAhjNubeh}(h]h ]h"]h$]h&]uh1j<hjAubeh}(h]h ]h"]h$]h&]colsKuh1j+hj>ubah}(h]h ]h"]h$]h&]uh1j&hjhhhhhNubeh}(h]j{ah ]h"]forcing quiescent statesah$]h&]uh1hhjhhhhhM jg Kubh)}(hhh](h)}(hGrace-Period Cleanuph]hGrace-Period Cleanup}(hj1hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj.hhhhhM3ubh)}(hGrace-period cleanup first scans the ``rcu_node`` tree breadth-first advancing all the ``->gp_seq`` fields, then it advances the ``rcu_state`` structure's ``->gp_seq`` field. The ordering effects are shown below:h](h%Grace-period cleanup first scans the }(hj?hhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjGhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj?ubh& tree breadth-first advancing all the }(hj?hhhNhNubj8)}(h ``->gp_seq``h]h->gp_seq}(hjYhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj?ubh fields, then it advances the }(hj?hhhNhNubj8)}(h ``rcu_state``h]h rcu_state}(hjkhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj?ubh structure’s }(hj?hhhNhNubj8)}(h ``->gp_seq``h]h->gp_seq}(hj}hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hj?ubh- field. The ordering effects are shown below:}(hj?hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM5hj.hhubj")}(hhh]j')}(hhh]j,)}(h*.. 
kernel-figure:: TreeRCU-gp-cleanup.svg h]h}(h]h ]h"]h$]h&]uri1RCU/Design/Memory-Ordering/TreeRCU-gp-cleanup.svgj9}j;jsuh1j+hjhhhKubah}(h]h ]h"]h$]h&]uh1j&hjubah}(h]h ]h"]h$]h&]uh1j!hj.hhhhhM;ubh)}(h~As indicated by the oval at the bottom of the diagram, once grace-period cleanup is complete, the next grace period can begin.h]h~As indicated by the oval at the bottom of the diagram, once grace-period cleanup is complete, the next grace period can begin.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM<hj.hhubj')}(hhh]j,)}(hhh](j1)}(hhh]h}(h]h ]h"]h$]h&]colwidthKGuh1j0hjubj=)}(hhh](jB)}(hhh]jG)}(hhh]h)}(h**Quick Quiz**:h](jP)}(h**Quick Quiz**h]h Quick Quiz}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jOhjubh:}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM@hjubah}(h]h ]h"]h$]h&]uh1jFhjubah}(h]h ]h"]h$]h&]uh1jAhjubjB)}(hhh]jG)}(hhh]h)}(h-But when precisely does the grace period end?h]h-But when precisely does the grace period end?}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMBhjubah}(h]h ]h"]h$]h&]uh1jFhjubah}(h]h ]h"]h$]h&]uh1jAhjubjB)}(hhh]jG)}(hhh]h)}(h **Answer**:h](jP)}(h **Answer**h]hAnswer}(hj-hhhNhNubah}(h]h ]h"]h$]h&]uh1jOhj)ubh:}(hj)hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMDhj&ubah}(h]h ]h"]h$]h&]uh1jFhj#ubah}(h]h ]h"]h$]h&]uh1jAhjubjB)}(hhh]jG)}(hhh]h)}(hXThere is no useful single point at which the grace period can be said to end. The earliest reasonable candidate is as soon as the last CPU has reported its quiescent state, but it may be some milliseconds before RCU becomes aware of this. The latest reasonable candidate is once the ``rcu_state`` structure's ``->gp_seq`` field has been updated, but it is quite possible that some CPUs have already completed phase two of their updates by that time. In short, if you are going to work with RCU, you need to learn to embrace uncertainty.h](hXThere is no useful single point at which the grace period can be said to end. 
The earliest reasonable candidate is as soon as the last CPU has reported its quiescent state, but it may be some milliseconds before RCU becomes aware of this. The latest reasonable candidate is once the }(hjWhhhNhNubj8)}(h ``rcu_state``h]h rcu_state}(hj_hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjWubh structure’s }(hjWhhhNhNubj8)}(h ``->gp_seq``h]h->gp_seq}(hjqhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjWubh field has been updated, but it is quite possible that some CPUs have already completed phase two of their updates by that time. In short, if you are going to work with RCU, you need to learn to embrace uncertainty.}(hjWhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMFhjTubah}(h]h ]h"]h$]h&]uh1jFhjQubah}(h]h ]h"]h$]h&]uh1jAhjubeh}(h]h ]h"]h$]h&]uh1j<hjubeh}(h]h ]h"]h$]h&]colsKuh1j+hjubah}(h]h ]h"]h$]h&]uh1j&hj.hhhhhNubeh}(h]jah ]h"]grace-period cleanupah$]h&]uh1hhjhhhhhM3jg Kubh)}(hhh](h)}(hCallback Invocationh]hCallback Invocation}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhhhMQubh)}(hXOnce a given CPU's leaf ``rcu_node`` structure's ``->gp_seq`` field has been updated, that CPU can begin invoking its RCU callbacks that were waiting for this grace period to end. These callbacks are identified by ``rcu_advance_cbs()``, which is usually invoked by ``__note_gp_changes()``. As shown in the diagram below, this invocation can be triggered by the scheduling-clock interrupt (``rcu_sched_clock_irq()`` on the left) or by idle entry (``rcu_cleanup_after_idle()`` on the right, but only for kernels build with ``CONFIG_RCU_FAST_NO_HZ=y``). 
Either way, ``RCU_SOFTIRQ`` is raised, which results in ``rcu_do_batch()`` invoking the callbacks, which in turn allows those callbacks to carry out (either directly or indirectly via wakeup) the needed phase-two processing for each update.h](hOnce a given CPU’s leaf }(hjhhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh structure’s }(hjhhhNhNubj8)}(h ``->gp_seq``h]h->gp_seq}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh field has been updated, that CPU can begin invoking its RCU callbacks that were waiting for this grace period to end. These callbacks are identified by }(hjhhhNhNubj8)}(h``rcu_advance_cbs()``h]hrcu_advance_cbs()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh, which is usually invoked by }(hjhhhNhNubj8)}(h``__note_gp_changes()``h]h__note_gp_changes()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubhe. As shown in the diagram below, this invocation can be triggered by the scheduling-clock interrupt (}(hjhhhNhNubj8)}(h``rcu_sched_clock_irq()``h]hrcu_sched_clock_irq()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh on the left) or by idle entry (}(hjhhhNhNubj8)}(h``rcu_cleanup_after_idle()``h]hrcu_cleanup_after_idle()}(hj"hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh/ on the right, but only for kernels build with }(hjhhhNhNubj8)}(h``CONFIG_RCU_FAST_NO_HZ=y``h]hCONFIG_RCU_FAST_NO_HZ=y}(hj4hhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh). Either way, }(hjhhhNhNubj8)}(h``RCU_SOFTIRQ``h]h RCU_SOFTIRQ}(hjFhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh is raised, which results in }(hjhhhNhNubj8)}(h``rcu_do_batch()``h]hrcu_do_batch()}(hjXhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh invoking the callbacks, which in turn allows those callbacks to carry out (either directly or indirectly via wakeup) the needed phase-two processing for each update.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMShjhhubj")}(hhh]j')}(hhh]j,)}(h3.. 
kernel-figure:: TreeRCU-callback-invocation.svg h]h}(h]h ]h"]h$]h&]uri:RCU/Design/Memory-Ordering/TreeRCU-callback-invocation.svgj9}j;jsuh1j+hjshhhKubah}(h]h ]h"]h$]h&]uh1j&hjpubah}(h]h ]h"]h$]h&]uh1j!hjhhhhhMaubh)}(hXjPlease note that callback invocation can also be prompted by any number of corner-case code paths, for example, when a CPU notes that it has excessive numbers of callbacks queued. In all cases, the CPU acquires its leaf ``rcu_node`` structure's ``->lock`` before invoking callbacks, which preserves the required ordering against the newly completed grace period.h](hPlease note that callback invocation can also be prompted by any number of corner-case code paths, for example, when a CPU notes that it has excessive numbers of callbacks queued. In all cases, the CPU acquires its leaf }(hjhhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubh structure’s }(hjhhhNhNubj8)}(h ``->lock``h]h->lock}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubhk before invoking callbacks, which preserves the required ordering against the newly completed grace period.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMbhjhhubh)}(hXdHowever, if the callback function communicates to other CPUs, for example, doing a wakeup, then it is that function's responsibility to maintain ordering. For example, if the callback function wakes up a task that runs on some other CPU, proper ordering must in place in both the callback function and the task being awakened. To see why this is important, consider the top half of the `grace-period cleanup`_ diagram. The callback might be running on a CPU corresponding to the leftmost leaf ``rcu_node`` structure, and awaken a task that is to run on a CPU corresponding to the rightmost leaf ``rcu_node`` structure, and the grace-period kernel thread might not yet have reached the rightmost leaf. 
In this case, the grace period's memory ordering might not yet have reached that CPU, so again the callback function and the awakened task must supply proper ordering.h](hXHowever, if the callback function communicates to other CPUs, for example, doing a wakeup, then it is that function’s responsibility to maintain ordering. For example, if the callback function wakes up a task that runs on some other CPU, proper ordering must in place in both the callback function and the task being awakened. To see why this is important, consider the top half of the }(hjhhhNhNubj)}(h`grace-period cleanup`_h]hgrace-period cleanup}(hjhhhNhNubah}(h]h ]h"]h$]h&]namegrace-period cleanupjjuh1jhjjKubhT diagram. The callback might be running on a CPU corresponding to the leftmost leaf }(hjhhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubhZ structure, and awaken a task that is to run on a CPU corresponding to the rightmost leaf }(hjhhhNhNubj8)}(h ``rcu_node``h]hrcu_node}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j7hjubhX structure, and the grace-period kernel thread might not yet have reached the rightmost leaf. In this case, the grace period’s memory ordering might not yet have reached that CPU, so again the callback function and the awakened task must supply proper ordering.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMihjhhubeh}(h]jah ]h"]callback invocationah$]h&]uh1hhjhhhhhMQjg Kubeh}(h]0tree-rcu-grace-period-memory-ordering-componentsah ]h"]0tree rcu grace period memory ordering componentsah$]h&]uh1hhjhhhhhKubh)}(hhh](h)}(hPutting It All Togetherh]hPutting It All Together}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhhhMyubh)}(h$A stitched-together diagram is here:h]h$A stitched-together diagram is here:}(hj'hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM{hjhhubj")}(hhh]j')}(hhh]j,)}(h".. 
kernel-figure:: TreeRCU-gp.svg h]h}(h]h ]h"]h$]h&]uri)RCU/Design/Memory-Ordering/TreeRCU-gp.svgj9}j;jFsuh1j+hj8hhhKubah}(h]h ]h"]h$]h&]uh1j&hj5ubah}(h]h ]h"]h$]h&]uh1j!hjhhhhhM~ubeh}(h]jah ]h"]putting it all togetherah$]h&]uh1hhjhhhhhMyjg Kubh)}(hhh](h)}(hLegal Statementh]hLegal Statement}(hj^hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj[hhhhhMubh)}(h_This work represents the view of the author and does not necessarily represent the view of IBM.h]h_This work represents the view of the author and does not necessarily represent the view of IBM.}(hjlhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj[hhubh)}(h2Linux is a registered trademark of Linus Torvalds.h]h2Linux is a registered trademark of Linus Torvalds.}(hjzhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj[hhubh)}(hWOther company, product, and service names may be trademarks or service marks of others.h]hWOther company, product, and service names may be trademarks or service marks of others.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj[hhubeh}(h]legal-statementah ]h"]legal statementah$]h&]uh1hhjhhhhhMubeh}(h]5tree-rcu-grace-period-memory-ordering-building-blocksah ]h"]5tree rcu grace period memory ordering building blocksah$]h&]uh1hhhhhhhhK/ubeh}(h]6a-tour-through-tree-rcu-s-grace-period-memory-orderingah ]h"]6a tour through tree_rcu's grace-period memory orderingah$]h&]uh1hhhhhhhhKubeh}(h]h ]h"]h$]h&]sourcehuh1hcurrent_sourceN current_lineNsettingsdocutils.frontendValues)}(hN generatorN datestampN source_linkN source_urlN toc_backlinksjFfootnote_backlinksK sectnum_xformKstrip_commentsNstrip_elements_with_classesN strip_classesN report_levelK halt_levelKexit_status_levelKdebugNwarning_streamN tracebackinput_encoding utf-8-siginput_encoding_error_handlerstrictoutput_encodingutf-8output_encoding_error_handlerjerror_encodingutf-8error_encoding_error_handlerbackslashreplace language_codeenrecord_dependenciesNconfigN id_prefixhauto_id_prefixid 
dump_settingsNdump_internalsNdump_transformsNdump_pseudo_xmlNexpose_internalsNstrict_visitorN_disable_configN_sourcehʌ _destinationN _config_files]7/var/lib/git/docbuild/linux/Documentation/docutils.confafile_insertion_enabled raw_enabledKline_length_limitM'pep_referencesN pep_base_urlhttps://peps.python.org/pep_file_url_templatepep-%04drfc_referencesN rfc_base_url&https://datatracker.ietf.org/doc/html/ tab_widthKtrim_footnote_reference_spacesyntax_highlightlong smart_quotessmartquotes_locales]character_level_inline_markupdoctitle_xform docinfo_xformKsectsubtitle_xform image_loadinglinkembed_stylesheetcloak_email_addressessection_self_linkenvNubreporterNindirect_targets]substitution_defs}substitution_names}refnames}(callback registry]jagrace-period initialization]jaself-reported quiescent states]jadynamic tick interface]j&acpu-hotplug interface]jIaforcing quiescent states](jljjegrace-period cleanup](jjecallback invocation]japutting it all together]jaurefids}nameids}(jjj j jjjjjjjd jjBjjjj j5jjXj+j{jjj jjXjjju nametypes}(jj jjjjd jBjj jj+jj jXjuh}(jhj hjjjjjjjjjjh jjEj5jjXj#j{jjj.jjjjjj[u footnote_refs} citation_refs} autofootnotes]autofootnote_refs]symbol_footnotes]symbol_footnote_refs] footnotes] citations]autofootnote_startKsymbol_footnote_startK id_counter collectionsCounter}Rparse_messages]transform_messages] transformerN include_log] decorationNhhub.