author     Paul E. McKenney <paulmck@kernel.org>  2023-08-03 13:54:02 -0700
committer  Paul E. McKenney <paulmck@kernel.org>  2023-08-03 13:54:02 -0700
commit     ed8ea422d8b460f6131685f8ca1705d07041e11a
tree       76c6466660b149981441443896e902fd62918d34
parent     6dc354d48f3166efa21e75b24c4b3d0a6c30f523
download   perfbook-ed8ea422d8b460f6131685f8ca1705d07041e11a.tar.gz
count,seqlock: More feedback from Yariv Aridor
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
 count/count.tex   |  4 ++--
 defer/seqlock.tex | 33 ++++++++++++++++++++++++-----------
 2 files changed, 24 insertions(+), 13 deletions(-)
diff --git a/count/count.tex b/count/count.tex
index 451938d7..e2c5885d 100644
--- a/count/count.tex
+++ b/count/count.tex
@@ -417,8 +417,8 @@ avoids the delays inherent in such circulation.
This results in instruction latency that varies as $\O{\log N}$,
where $N$ is the number of CPUs, as shown in
\cref{fig:count:Data Flow For Global Combining-Tree Atomic Increment}.
- And CPUs with this sort of hardware optimization started to
- appear in 2011.
+ Some say that a few CPUs with this sort of hardware optimization
+ were in production use in the 1990s and started to reappear in 2011.
This is a great improvement over the $\O{N}$ performance
of current hardware shown in
diff --git a/defer/seqlock.tex b/defer/seqlock.tex
index 222cb1e5..5be8ae56 100644
--- a/defer/seqlock.tex
+++ b/defer/seqlock.tex
@@ -131,21 +131,32 @@ will pass to a later call to \co{read_seqretry()}.
\QuickQuiz{
Why not have \co{read_seqbegin()} in
\cref{lst:defer:Sequence-Locking Implementation}
- check for the low-order bit being set, and retry
- internally, rather than allowing a doomed read to start?
+ check whether the sequence-number value is odd, and, if so,
+ retry internally rather than entering a doomed read-side critical
+ section?
}\QuickQuizAnswer{
- That would be a legitimate implementation.
- However, if the workload is read-mostly, this added check would
- increase the overhead of the common-case successful read,
- which could be counter-productive.
- On the other hand, given a sufficiently large fraction of updates
- and sufficiently high-overhead readers, having this
- internal-to-\co{read_seqbegin()} check might be preferable.
+ This would be a legitimate implementation.
+
+ But please keep in mind that
+ \begin{enumerate*}[(1)]
+ \item This added check is a relatively expensive conditional branch,
+ \item It cannot be substituted for the later check done by
+ \co{read_seqretry()}, which must happen after the
+ critical section completes, and
+ \item Sequence locking is intended for read-mostly workloads,
+ which means that this extra check would slow down the
+ common case.
+ \end{enumerate*}
+
+ On the other hand, in an alternate universe having a sufficiently
+ large fraction of updates and sufficiently high-overhead readers,
+ having this internal-to-\co{read_seqbegin()} check might be
+ preferable.
\begin{fcvref}[ln:defer:seqlock:impl]
Of course, the full memory barriers
on \clnref{read_seqbegin:mb,read_seqretry:mb} of
- \cref{lst:defer:Sequence-Locked Pre-BSD Routing Table Lookup}
+ \cref{lst:defer:Sequence-Locking Implementation}
are quite heavyweight as instructions go, which suggests that the
overhead of the added check might be negligible.
Except that, in userspace code, the \co{membarrier()} system
@@ -163,7 +174,7 @@ will pass to a later call to \co{read_seqretry()}.
This same trick may be applied to Linux-kernel code using tools
such as \co{smp_call_function()}, at least in non-realtime builds
- of the Linux kernel..
+ of the Linux kernel.
}\QuickQuizEnd
\begin{fcvref}[ln:defer:seqlock:impl:read_seqretry]
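
Note: the Quick Quiz above contrasts two read-side entry strategies, and a
minimal C11 sketch of both follows. It is an illustration under stated
assumptions, not perfbook's listing: the seqlock_t layout shown here, the
C11 acquire ordering (the book's implementation instead uses the full
smp_mb() barriers that the answer calls heavyweight), and the sched_yield()
backoff are all choices made for this sketch.

#include <stdatomic.h>
#include <sched.h>

/* Assumed layout.  An updater increments ->seq to an odd value before
 * modifying the protected data and to an even value afterward, with
 * release ordering on the final increment. */
typedef struct {
	atomic_ulong seq;	/* Odd exactly while an update is in progress. */
} seqlock_t;

/* Standard strategy: just snapshot the counter.  Clearing the low-order
 * bit guarantees that a snapshot taken mid-update can never equal the
 * then-odd counter, so read_seqretry() will force a retry. */
static inline unsigned long read_seqbegin(seqlock_t *slp)
{
	/* Acquire ordering keeps critical-section loads after this load. */
	return atomic_load_explicit(&slp->seq, memory_order_acquire) & ~1UL;
}

/* Quick-Quiz alternative: additionally spin while the counter is odd,
 * adding a conditional branch to the update-free common case in
 * exchange for never entering a doomed read-side critical section. */
static inline unsigned long read_seqbegin_spin(seqlock_t *slp)
{
	unsigned long s;

	while ((s = atomic_load_explicit(&slp->seq,
	                                 memory_order_acquire)) & 1UL)
		sched_yield();	/* Update in flight, get out of its way. */
	return s;
}

/* Both variants pair with the same validation, whose acquire fence
 * orders the critical section's loads before the counter recheck. */
static inline int read_seqretry(seqlock_t *slp, unsigned long oldseq)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&slp->seq,
	                            memory_order_relaxed) != oldseq;
}

/* Typical retry loop, identical for either begin() variant.  (Strictly
 * conforming C11 would also access the protected data with relaxed
 * atomics to avoid a formal data race with a concurrent update.) */
static inline int read_protected(seqlock_t *slp, int *src)
{
	unsigned long s;
	int v;

	do {
		s = read_seqbegin(slp);
		v = *src;	/* Read-side critical section. */
	} while (read_seqretry(slp, s));
	return v;
}

Which variant wins is exactly the answer's trade-off: because sequence
locking targets read-mostly workloads, the extra branch in
read_seqbegin_spin() slows the common case, but a sufficiently large
update fraction or sufficiently expensive readers could tip the balance
the other way.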