author | Paul E. McKenney <paulmck@kernel.org> | 2023-07-25 07:36:05 -0700 |
---|---|---|
committer | Paul E. McKenney <paulmck@kernel.org> | 2023-07-25 07:36:05 -0700 |
commit | 0639550f571618158c768f2d5c6d229ad20f0ee9 (patch) | |
tree | c4526d4df84c41749c70e9500c247ecb7776831c | |
parent | c87850702eb77bdc60268e97abb32ba679b17688 (diff) | |
download | perfbook-0639550f571618158c768f2d5c6d229ad20f0ee9.tar.gz |
debugging: Add "Proactive Hunting Techniques" section
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Nhat Pham <hoangnhatp@meta.com>
Reported-by: Mykola Lysenko <mykolal@meta.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
-rw-r--r-- | debugging/debugging.tex | 43 |
1 files changed, 41 insertions, 2 deletions
diff --git a/debugging/debugging.tex b/debugging/debugging.tex
index f9c4c998..3916f659 100644
--- a/debugging/debugging.tex
+++ b/debugging/debugging.tex
@@ -1700,6 +1700,8 @@ do just that:
 \item	Make rare events less rare
 	(\cref{sec:debugging:Make Rare Events Less Rare}).
 \item	Count near misses (\cref{sec:debugging:Count Near Misses}).
+\item	Proactive hunting techniques
+	(\cref{sec:debugging:Proactive Hunting Techniques}).
 \end{enumerate}
 
 These are followed by discussion in
@@ -1872,8 +1874,8 @@ between CPU-hotplug operations.
 For another example, most of the rcutorture scenarios emulate RCU
 callback flooding every minute.
 For a final example, a memory-management stress test for x86 CPUs might
-frequently transition an aligned 2\,MB block of memory back and forth
-between 2\,MB and 4\,KB pages.
+do well to frequently transition an aligned 2\,MB block of memory back
+and forth between 2\,MB and 4\,KB pages.
 
 Another way to construct an anti-heisenbug for this class of heisenbug
 is to introduce spurious failures.
@@ -1987,6 +1989,43 @@ heisenbug because the near-misses, being more frequent, are likely to be
 more robust in the face of changes to your code, for example, the
 changes you make to add debugging code.
 
+\subsubsection{Proactive Hunting Techniques}
+\label{sec:debugging:Proactive Hunting Techniques}
+
+Most of the anti-heisenbug techniques discussed in the preceding sections
+are backwards looking.
+After all, prior experience is the best guide to knowing which regions
+of code are prone to race conditions, what aspects of the workload
+can most profitably be increased in intensity, which subsystems are
+deserving of suspicion, which rare events are important, and what near
+misses are good proxies for actual failures.
+
+What can you do to get ahead of the game?
+
+Getting ahead of the anti-heisenbug game is even more of an art than
+constructing an anti-heisenbug for a specific situation, but here
+are some techniques that can be helpful:
+
+\begin{enumerate}
+\item	Add delay to sections of concurrent code that required the
+	most analysis, that needed formal verification, or that
+	deviated the most from common concurrency practice.
+\item	Analyze trends in workload intensity, and use the results to
+	guide increasing the intensity of your testing.
+\item	Be most suspicious of new code, especially if it is your new
+	code.
+\item	Instrument your workload, looking for complex operations that
+	occur frequently enough to be an uptime problem but rarely
+	enough to avoid much exposure in your current testing.
+\item	Look for near misses in failure-recovery code and on slowpaths.
+\end{enumerate}
+
+Finally, and most importantly, pay special attention to code that people
+are the most proud of.
+After all, people are most likely to be proud of code that is unusual,
+which means that its bugs (and the bugs in the code that it uses) are
+likely to escape your usual testing efforts.
+
 \subsubsection{Heisenbug Discussion}
 \label{sec:debugging:Heisenbug Discussion}
 
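
The first technique in the new enumerated list, adding delay to suspicious concurrent code, can be illustrated with a small sketch. This is not part of the commit or of perfbook: the `Counter` class, the `run()` helper, and the 10-millisecond delay are hypothetical names and values chosen only for illustration. The sketch assumes a shared counter whose read-modify-write sequence is unprotected, and injects a sleep between the read and the write so that the lost-update race, normally rare, becomes very likely.

```python
import threading
import time


class Counter:
    """Hypothetical shared counter with an unprotected read-modify-write."""

    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

    def increment(self, delay=0.0, use_lock=False):
        if use_lock:
            # Correct version: the whole read-modify-write is serialized.
            with self.lock:
                self._read_modify_write(delay)
        else:
            # Buggy version: other threads may run between read and write.
            self._read_modify_write(delay)

    def _read_modify_write(self, delay):
        snapshot = self.value       # read
        if delay:
            time.sleep(delay)       # injected delay widens the race window
        self.value = snapshot + 1   # write, possibly clobbering an update


def run(nthreads, iterations, delay, use_lock):
    """Run nthreads workers, each incrementing the counter iterations times.

    Returns the final counter value; nthreads * iterations means that
    no update was lost.
    """
    counter = Counter()

    def worker():
        for _ in range(iterations):
            counter.increment(delay=delay, use_lock=use_lock)

    threads = [threading.Thread(target=worker) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    expected = 4 * 5
    print("expected:", expected)
    print("unlocked, with delay:", run(4, 5, 0.01, use_lock=False))
    print("locked, with delay:", run(4, 5, 0.01, use_lock=True))
```

With the delay in place, the unlocked run will typically report far fewer than the expected number of increments, while the locked run always reports the full count. The delay does not guarantee a failure on any given run; it only raises the probability, which is the point of the technique.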