author Paul E. McKenney <paulmck@kernel.org> 2023-07-25 07:36:05 -0700
committer Paul E. McKenney <paulmck@kernel.org> 2023-07-25 07:36:05 -0700
commit 0639550f571618158c768f2d5c6d229ad20f0ee9 (patch)
tree c4526d4df84c41749c70e9500c247ecb7776831c
parent c87850702eb77bdc60268e97abb32ba679b17688 (diff)
debugging: Add "Proactive Hunting Techniques" section
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Nhat Pham <hoangnhatp@meta.com>
Reported-by: Mykola Lysenko <mykolal@meta.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
-rw-r--r-- debugging/debugging.tex 43
1 files changed, 41 insertions, 2 deletions
diff --git a/debugging/debugging.tex b/debugging/debugging.tex
index f9c4c998..3916f659 100644
--- a/debugging/debugging.tex
+++ b/debugging/debugging.tex
@@ -1700,6 +1700,8 @@ do just that:
\item Make rare events less rare
(\cref{sec:debugging:Make Rare Events Less Rare}).
\item Count near misses (\cref{sec:debugging:Count Near Misses}).
+\item Proactive hunting techniques
+ (\cref{sec:debugging:Proactive Hunting Techniques}).
\end{enumerate}
These are followed by discussion in
@@ -1872,8 +1874,8 @@ between CPU-hotplug operations.
For another example, most of the rcutorture scenarios emulate RCU
callback flooding every minute.
For a final example, a memory-management stress test for x86 CPUs might
-frequently transition an aligned 2\,MB block of memory back and forth
-between 2\,MB and 4\,KB pages.
+do well to frequently transition an aligned 2\,MB block of memory back
+and forth between 2\,MB and 4\,KB pages.
Another way to construct an anti-heisenbug for this class of heisenbug
is to introduce spurious failures.
@@ -1987,6 +1989,43 @@ heisenbug because the near-misses, being more frequent, are likely to
be more robust in the face of changes to your code, for example, the
changes you make to add debugging code.
+\subsubsection{Proactive Hunting Techniques}
+\label{sec:debugging:Proactive Hunting Techniques}
+
+Most of the anti-heisenbug techniques discussed in the preceding sections
+are backward looking.
+After all, prior experience is the best guide to knowing which regions
+of code are prone to race conditions, what aspects of the workload
+can most profitably be increased in intensity, which subsystems are
+deserving of suspicion, which rare events are important, and what near
+misses are good proxies for actual failures.
+
+What can you do to get ahead of the game?
+
+Getting ahead of the anti-heisenbug game is even more of an art than
+constructing an anti-heisenbug for a specific situation, but here
+are some techniques that can be helpful:
+
+\begin{enumerate}
+\item Add delay to sections of concurrent code that required the
+ most analysis, that needed formal verification, or that
+ deviated the most from common concurrency practice.
+\item Analyze trends in workload intensity, and use the results to
+ guide increasing the intensity of your testing.
+\item Be most suspicious of new code, especially if it is your new
+ code.
+\item Instrument your workload, looking for complex operations that
+ occur frequently enough to be an uptime problem but rarely
+ enough to avoid much exposure in your current testing.
+\item Look for near misses in failure-recovery code and on slowpaths.
+\end{enumerate}
+
+Finally, and most importantly, pay special attention to code that people
+are the most proud of.
+After all, people are most likely to be proud of code that is unusual,
+which means that its bugs (and the bugs in the code that it uses) are
+likely to escape your usual testing efforts.
+
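The first technique in the list above, adding delay to suspect concurrent code, can be illustrated with a minimal sketch.
Everything in it is hypothetical and is not taken from the patch: the \co{race_delay()} helper, the \co{race_delay_us} knob, and the deliberately buggy load-then-store increment are assumptions chosen only to show how an injected delay widens a race window.
It assumes a POSIX system with pthreads and C11 atomics.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical knob: microseconds of delay injected into the race window. */
static int race_delay_us;

/* Hypothetical helper: widen a suspected race window when enabled. */
static void race_delay(void)
{
	if (race_delay_us > 0) {
		struct timespec ts = {
			.tv_sec = race_delay_us / 1000000,
			.tv_nsec = (race_delay_us % 1000000) * 1000L,
		};

		nanosleep(&ts, NULL);
	}
}

static atomic_int counter;

/* Deliberately buggy increment: the load and the store are separate steps. */
static void *worker(void *arg)
{
	int iters = *(int *)arg;

	for (int i = 0; i < iters; i++) {
		int old = atomic_load_explicit(&counter, memory_order_relaxed);

		race_delay();	/* Stretch the vulnerable window. */
		atomic_store_explicit(&counter, old + 1, memory_order_relaxed);
	}
	return NULL;
}

/* Run nthreads workers (at most 8) and return the number of lost increments. */
int run_trial(int nthreads, int iters, int delay_us)
{
	pthread_t tid[8];

	assert(nthreads >= 1 && nthreads <= 8);
	race_delay_us = delay_us;	/* Set before any worker is created. */
	atomic_store(&counter, 0);
	for (int i = 0; i < nthreads; i++) {
		int ret = pthread_create(&tid[i], NULL, worker, &iters);

		assert(ret == 0);
	}
	for (int i = 0; i < nthreads; i++)
		pthread_join(tid[i], NULL);
	return nthreads * iters - atomic_load(&counter);
}
```

With a delay of zero, the window between the load and the store is only a few instructions wide, so a call such as \co{run_trial(2, 1000, 0)} might report few or no lost increments.
A nonzero delay, for example \co{run_trial(2, 50, 100)}, would be expected to lose increments far more often, although the exact count depends on scheduling and is not guaranteed.
A single-threaded run loses nothing regardless of the delay, which makes it a useful sanity check on the instrumentation itself.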
\subsubsection{Heisenbug Discussion}
\label{sec:debugging:Heisenbug Discussion}