author     Paul E. McKenney <paulmck@kernel.org>  2023-08-03 13:54:02 -0700
committer  Paul E. McKenney <paulmck@kernel.org>  2023-08-03 13:54:02 -0700
commit     ed8ea422d8b460f6131685f8ca1705d07041e11a (patch)
tree       76c6466660b149981441443896e902fd62918d34
parent     6dc354d48f3166efa21e75b24c4b3d0a6c30f523 (diff)
download   perfbook-ed8ea422d8b460f6131685f8ca1705d07041e11a.tar.gz

count,seqlock: More feedback from Yariv Aridor

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
-rw-r--r--  count/count.tex    |  4
-rw-r--r--  defer/seqlock.tex  | 33
2 files changed, 24 insertions, 13 deletions
diff --git a/count/count.tex b/count/count.tex
index 451938d7..e2c5885d 100644
--- a/count/count.tex
+++ b/count/count.tex
@@ -417,8 +417,8 @@ avoids the delays inherent in such circulation.
 	This results in instruction latency that varies as $\O{\log N}$,
 	where $N$ is the number of CPUs, as shown in
 	\cref{fig:count:Data Flow For Global Combining-Tree Atomic Increment}.
-	And CPUs with this sort of hardware optimization started to
-	appear in 2011.
+	Some say that a few CPUs with this sort of hardware optimization
+	were in production use in the 1990s and started to reappear in 2011.
 
 	This is a great improvement over the $\O{N}$ performance
 	of current hardware shown in
diff --git a/defer/seqlock.tex b/defer/seqlock.tex
index 222cb1e5..5be8ae56 100644
--- a/defer/seqlock.tex
+++ b/defer/seqlock.tex
@@ -131,21 +131,32 @@ will pass to a later call to \co{read_seqretry()}.
 \QuickQuiz{
 	Why not have \co{read_seqbegin()} in
 	\cref{lst:defer:Sequence-Locking Implementation}
-	check for the low-order bit being set, and retry
-	internally, rather than allowing a doomed read to start?
+	check whether the sequence-number value is odd, and, if so,
+	retry internally rather than entering a doomed read-side critical
+	section?
 }\QuickQuizAnswer{
-	That would be a legitimate implementation.
-	However, if the workload is read-mostly, this added check would
-	increase the overhead of the common-case successful read,
-	which could be counter-productive.
-	On the other hand, given a sufficiently large fraction of updates
-	and sufficiently high-overhead readers, having this
-	internal-to-\co{read_seqbegin()} check might be preferable.
+	This would be a legitimate implementation.
+
+	But please keep in mind that
+	\begin{enumerate*}[(1)]
+	\item	This added check is a relatively expensive conditional branch,
+	\item	It cannot be substituted for the later check done by
+		\co{read_seqretry()}, which must happen after the
+		critical section completes, and
+	\item	Sequence locking is intended for read-mostly workloads,
+		which means that this extra check would slow down the
+		common case.
+	\end{enumerate*}
+
+	On the other hand, in an alternate universe having a sufficiently
+	large fraction of updates and sufficiently high-overhead readers,
+	having this internal-to-\co{read_seqbegin()} check might be
+	preferable.
 
 	\begin{fcvref}[ln:defer:seqlock:impl]
 	Of course, the full memory barriers on
 	\clnref{read_seqbegin:mb,read_seqretry:mb} of
-	\cref{lst:defer:Sequence-Locked Pre-BSD Routing Table Lookup}
+	\cref{lst:defer:Sequence-Locking Implementation}
 	are quite heavyweight as instructions go, which suggests that
 	the overhead of the added check might be negligible.
 	Except that, in userspace code, the \co{membarrier()} system
@@ -163,7 +174,7 @@ will pass to a later call to \co{read_seqretry()}.
 	This same trick may be applied to Linux-kernel code using
 	tools such as \co{smp_call_function()},
 	at least in non-realtime builds
-	of the Linux kernel..
+	of the Linux kernel.
 }\QuickQuizEnd
 
 \begin{fcvref}[ln:defer:seqlock:impl:read_seqretry]
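To make the tradeoff discussed in the Quick Quiz concrete, below is a minimal C11 sketch of the two reader-entry strategies. This is not the perfbook CodeSamples implementation: the myseqlock_t type, the my_-prefixed names, and the simplified memory ordering (including the formally racy read of the protected data) are assumptions made purely for illustration.

/* Minimal C11 seqlock-reader sketch; illustrative only, not from perfbook. */
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
	atomic_ulong seq;		/* Odd while an update is in progress. */
} myseqlock_t;

/* Stock strategy: snapshot the counter with its low-order bit cleared,
 * so that a reader overlapping an update is forced to retry later.
 * No conditional branch in the common read-mostly case. */
static inline unsigned long my_read_seqbegin(myseqlock_t *slp)
{
	return atomic_load_explicit(&slp->seq, memory_order_acquire) & ~0x1UL;
}

/* Quick-Quiz alternative: spin until the counter is even before entering
 * the critical section.  Every reader now pays for the extra branch, and
 * the my_read_seqretry() check is still required afterward. */
static inline unsigned long my_read_seqbegin_waitodd(myseqlock_t *slp)
{
	unsigned long s;

	do {
		s = atomic_load_explicit(&slp->seq, memory_order_acquire);
	} while (s & 0x1UL);
	return s;
}

/* Post-critical-section check.  The acquire fence is a rough stand-in for
 * the heavyweight memory barrier discussed in the answer; a rigorous C11
 * treatment of seqlock ordering is beyond this sketch. */
static inline bool my_read_seqretry(myseqlock_t *slp, unsigned long oldseq)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&slp->seq, memory_order_relaxed) != oldseq;
}

/* Typical reader: retry until a consistent snapshot is obtained. */
static unsigned long my_reader(myseqlock_t *slp, unsigned long *datap)
{
	unsigned long s, snapshot;

	do {
		s = my_read_seqbegin(slp);
		snapshot = *datap;	/* Read-side critical section (racy in strict C11). */
	} while (my_read_seqretry(slp, s));
	return snapshot;
}

Either way, the post-critical-section my_read_seqretry() check remains mandatory; when updates are rare, the spinning variant merely adds a conditional branch to the common case, which is the point of the amended answer.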