author | Paul E. McKenney <paulmck@kernel.org> | 2023-07-25 07:11:28 -0700 |
---|---|---|
committer | Paul E. McKenney <paulmck@kernel.org> | 2023-07-25 07:11:28 -0700 |
commit | c87850702eb77bdc60268e97abb32ba679b17688 (patch) | |
tree | 452c354436d5807fdfb49c023dcb24b62035ba4d | |
parent | c7fbec0fe88fef6c12680b9853c273b1a8081b55 (diff) | |
debugging: Expand on making rare events less rare
Add some examples and make the section header more compelling
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Nhat Pham <hoangnhatp@meta.com>
Reported-by: Mykola Lysenko <mykolal@meta.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
-rw-r--r-- | debugging/debugging.tex | 25 |
1 files changed, 18 insertions, 7 deletions
diff --git a/debugging/debugging.tex b/debugging/debugging.tex
index 18fc21bb..f9c4c998 100644
--- a/debugging/debugging.tex
+++ b/debugging/debugging.tex
@@ -1697,7 +1697,8 @@ do just that:
  (\cref{sec:debugging:Increase Workload Intensity}).
 \item Isolate suspicious subsystems
  (\cref{sec:debugging:Isolate Suspicious Subsystems}).
-\item Simulate unusual events (\cref{sec:debugging:Simulate Unusual Events}).
+\item Make rare events less rare
+ (\cref{sec:debugging:Make Rare Events Less Rare}).
 \item Count near misses (\cref{sec:debugging:Count Near Misses}).
 \end{enumerate}
 
@@ -1857,15 +1858,25 @@ Creating such component-level stress tests can seem like a waste of time,
 but a little bit of component-level testing can save a huge amount
 of system-level debugging.
 
-\subsubsection{Simulate Unusual Events}
-\label{sec:debugging:Simulate Unusual Events}
+\subsubsection{Make Rare Events Less Rare}
+\label{sec:debugging:Make Rare Events Less Rare}
 
-Heisenbugs are sometimes due to unusual events, such as
+Heisenbugs are sometimes due to rare events, such as
 memory-allocation failure, conditional-lock-acquisition failure,
-CPU-hotplug operations, timeouts, packet losses, and so on.
-One way to construct an anti-heisenbug for this class of heisenbug
+CPU-hotplug operations, timeouts, packet losses, large-scale changes
+in state, and so on.
+The corresponding anti-heisenbug is thus simply to make these rare events
+happen much more frequently.
+For example, the TREE03 rcutorture scenario waits only 200~milliseconds
+between CPU-hotplug operations.
+For another example, most of the rcutorture scenarios emulate RCU
+callback flooding every minute.
+For a final example, a memory-management stress test for x86 CPUs might
+frequently transition an aligned 2\,MB block of memory back and forth
+between 2\,MB and 4\,KB pages.
+
+Another way to construct an anti-heisenbug for this class of heisenbug
 is to introduce spurious failures.
-
 For example, instead of invoking \co{malloc()} directly, invoke a wrapper
 function that uses a random number to decide whether to return
 \co{NULL} unconditionally on the one hand, or to actually
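
The spurious-failure wrapper mentioned at the end of the diff can be sketched roughly as follows. This is a minimal illustration and not code from perfbook: the names fi_malloc(), fi_malloc_configure(), fi_malloc_period, and fi_malloc_injected are hypothetical, and the sketch assumes that the non-failing path simply calls the real malloc().

```c
#include <stdlib.h>

/*
 * Hypothetical fault-injection state (not from perfbook).
 * fi_malloc_period == 0 disables injection; otherwise roughly one
 * call in fi_malloc_period returns NULL without allocating.
 */
static unsigned int fi_malloc_period;
static unsigned long fi_malloc_injected;	/* Injected failures so far. */

/* Set the failure period and seed the generator for repeatable runs. */
static void fi_malloc_configure(unsigned int period, unsigned int seed)
{
	fi_malloc_period = period;
	fi_malloc_injected = 0;
	srand(seed);
}

/*
 * Wrapper to be called in place of malloc().  Uses a random number to
 * decide whether to fail spuriously or to attempt a real allocation.
 * Note: rand() and the counters are not thread-safe, so a multithreaded
 * stress test would need per-thread state instead.
 */
static void *fi_malloc(size_t size)
{
	if (fi_malloc_period != 0 && rand() % fi_malloc_period == 0) {
		fi_malloc_injected++;
		return NULL;	/* Spurious failure: the rare event, made common. */
	}
	return malloc(size);	/* Assumed normal path. */
}
```

A test harness might call fi_malloc_configure(10, seed) before the run so that about one allocation in ten fails, which exercises the caller's NULL-handling paths far more often than real memory exhaustion would. Logging the seed makes a failing run easier to reproduce.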