author: Paul E. McKenney <paulmck@kernel.org> 2023-07-25 07:11:28 -0700
committer: Paul E. McKenney <paulmck@kernel.org> 2023-07-25 07:11:28 -0700
commit: c87850702eb77bdc60268e97abb32ba679b17688
tree: 452c354436d5807fdfb49c023dcb24b62035ba4d
parent: c7fbec0fe88fef6c12680b9853c273b1a8081b55
debugging: Expand on making rare events less rare

Add some examples and make the section header more compelling

Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Nhat Pham <hoangnhatp@meta.com>
Reported-by: Mykola Lysenko <mykolal@meta.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
-rw-r--r-- debugging/debugging.tex 25
1 files changed, 18 insertions, 7 deletions
diff --git a/debugging/debugging.tex b/debugging/debugging.tex
index 18fc21bb..f9c4c998 100644
--- a/debugging/debugging.tex
+++ b/debugging/debugging.tex
@@ -1697,7 +1697,8 @@ do just that:
(\cref{sec:debugging:Increase Workload Intensity}).
\item Isolate suspicious subsystems
(\cref{sec:debugging:Isolate Suspicious Subsystems}).
-\item Simulate unusual events (\cref{sec:debugging:Simulate Unusual Events}).
+\item Make rare events less rare
+ (\cref{sec:debugging:Make Rare Events Less Rare}).
\item Count near misses (\cref{sec:debugging:Count Near Misses}).
\end{enumerate}
@@ -1857,15 +1858,25 @@ Creating such component-level stress tests can seem like a waste of time,
but a little bit of component-level testing can save a huge amount
of system-level debugging.
-\subsubsection{Simulate Unusual Events}
-\label{sec:debugging:Simulate Unusual Events}
+\subsubsection{Make Rare Events Less Rare}
+\label{sec:debugging:Make Rare Events Less Rare}
-Heisenbugs are sometimes due to unusual events, such as
+Heisenbugs are sometimes due to rare events, such as
memory-allocation failure, conditional-lock-acquisition failure,
-CPU-hotplug operations, timeouts, packet losses, and so on.
-One way to construct an anti-heisenbug for this class of heisenbug
+CPU-hotplug operations, timeouts, packet losses, large-scale changes
+in state, and so on.
+The corresponding anti-heisenbug is thus simply to make these rare events
+happen much more frequently.
+For example, the TREE03 rcutorture scenario waits only 200~milliseconds
+between CPU-hotplug operations.
+For another example, most of the rcutorture scenarios emulate RCU
+callback flooding every minute.
+For a final example, a memory-management stress test for x86 CPUs might
+frequently transition an aligned 2\,MB block of memory back and forth
+between 2\,MB and 4\,KB pages.
+
+Another way to construct an anti-heisenbug for this class of heisenbug
is to introduce spurious failures.
-
For example, instead of invoking \co{malloc()} directly, invoke
a wrapper function that uses a random number to decide whether
to return \co{NULL} unconditionally on the one hand, or to actually