debugging: Add "Proactive Hunting Techniques" section

Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Nhat Pham <hoangnhatp@meta.com>
Reported-by: Mykola Lysenko <mykolal@meta.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
author: Paul E. McKenney <paulmck@kernel.org> 2023-07-25 07:36:05 -0700
committer: Paul E. McKenney <paulmck@kernel.org> 2023-07-25 07:36:05 -0700
commit: 0639550f571618158c768f2d5c6d229ad20f0ee9 (patch)
tree: c4526d4df84c41749c70e9500c247ecb7776831c
parent: c87850702eb77bdc60268e97abb32ba679b17688 (diff)
download: perfbook-0639550f571618158c768f2d5c6d229ad20f0ee9.tar.gz
1 files changed, 41 insertions, 2 deletions
diff --git a/debugging/debugging.tex b/debugging/debugging.tex
index f9c4c998..3916f659 100644
--- a/debugging/debugging.tex
+++ b/debugging/debugging.tex
@@ -1700,6 +1700,8 @@ do just that:
 \item	Make rare events less rare
 	(\cref{sec:debugging:Make Rare Events Less Rare}).
 \item	Count near misses (\cref{sec:debugging:Count Near Misses}).
+\item	Proactive hunting techniques
+	(\cref{sec:debugging:Proactive Hunting Techniques}).
 \end{enumerate}
 
 These are followed by discussion in
@@ -1872,8 +1874,8 @@ between CPU-hotplug operations.
 For another example, most of the rcutorture scenarios emulate RCU
 callback flooding every minute.
 For a final example, a memory-management stress test for x86 CPUs might
-frequently transition an aligned 2\,MB block of memory back and forth
-between 2\,MB and 4\,KB pages.
+do well to frequently transition an aligned 2\,MB block of memory back
+and forth between 2\,MB and 4\,KB pages.
 
 Another way to construct an anti-heisenbug for this class of heisenbug
 is to introduce spurious failures.
@@ -1987,6 +1989,43 @@ heisenbug because the near-misses, being more frequent, are likely to
 be more robust in the face of changes to your code, for example, the
 changes you make to add debugging code.
 
+\subsubsection{Proactive Hunting Techniques}
+\label{sec:debugging:Proactive Hunting Techniques}
+
+Most of the anti-heisenbug techniques discussed in the preceding sections
+are backwards looking.
+After all, prior experience is the best guide to knowing which regions
+of code are prone to race conditions, what aspects of the workload
+can most profitably be increased in intensity, which subsystems are
+deserving of suspicion, which rare events are important, and what near
+misses are good proxies for actual failures.
+
+What can you do to get ahead of the game?
+
+Getting ahead of the anti-heisenbug game is even more of an art than
+constructing an anti-heisenbug for a specific situation, but here
+are some techniques that can be helpful:
+
+\begin{enumerate}
+\item	Add delay to sections of concurrent code that required the
+	most analysis, that needed formal verification, or that
+	deviated the most from common concurrency practice.
+\item	Analyze trends in workload intensity, and use the results to
+	guide increasing the intensity of your testing.
+\item	Be most suspicious of new code, especially if it is your new
+	code.
+\item	Instrument your workload, looking for complex operations that
+	occur frequently enough to be an uptime problem but rarely
+	enough to avoid much exposure in your current testing.
+\item	Look for near misses in failure-recovery code and on slowpaths.
+\end{enumerate}
+
+Finally, and most importantly, pay special attention to code that people
+are the most proud of.
+After all, people are most likely to be proud of code that is unusual,
+which means that its bugs (and the bugs in the code that it uses) are
+likely to escape your usual testing efforts.
+
 \subsubsection{Heisenbug Discussion}
 \label{sec:debugging:Heisenbug Discussion}
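
The instrumentation item in the list added by this patch (operations frequent enough to be an uptime problem but too rare to get much exposure in testing) might be sketched roughly as follows. This is only an illustration under stated assumptions: the `op_stats` structure, the function names, and both thresholds are hypothetical and do not come from the patch or from perfbook's CodeSamples.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-operation counters gathered by instrumentation. */
struct op_stats {
	const char *name;
	double prod_per_day;	/* Occurrences per day in production. */
	double test_per_day;	/* Occurrences per day in test runs. */
};

/*
 * Assumed thresholds, for illustration only:  Tune them to your own
 * uptime goals and test budget.
 */
#define PROD_MIN_PER_DAY	1.0	/* Frequent enough to threaten uptime. */
#define TEST_MAX_PER_DAY	10.0	/* Too rare for testing to exercise. */

/* Is this operation common in production yet rare in testing? */
bool op_is_undertested(const struct op_stats *op)
{
	return op->prod_per_day >= PROD_MIN_PER_DAY &&
	       op->test_per_day < TEST_MAX_PER_DAY;
}

/*
 * Store pointers to the under-tested operations in out[], which must
 * have room for n entries, and return the number stored.
 */
size_t find_undertested(const struct op_stats *ops, size_t n,
			const struct op_stats **out)
{
	size_t i;
	size_t nfound = 0;

	for (i = 0; i < n; i++)
		if (op_is_undertested(&ops[i]))
			out[nfound++] = &ops[i];
	return nfound;
}
```

Operations flagged by such a filter are candidates for the techniques the diff refers to, such as making rare events less rare or counting near misses. The filter only ranks candidates. It does not judge how complex each operation is, which remains a matter for human review.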