.. SPDX-License-Identifier: GPL-2.0

=============
False Sharing
=============

What is False Sharing
=====================
False sharing is related to the cache mechanism that maintains the coherence
of one cache line stored in multiple CPUs' caches; an academic definition of
it is given in [1]_.
Consider a struct with a refcount and a string::

	struct foo {
		refcount_t refcount;
		...
		char name[16];
	} ____cacheline_internodealigned_in_smp;

The members 'refcount' (A) and 'name' (B) *share* one cache line, as shown
below::

    +-----------+                        +-----------+
    |   CPU 0   |                        |   CPU 1   |
    +-----------+                        +-----------+
             /                                     |
            /                                      |
           V                                       V
    +----------------------+          +----------------------+
    | A      B             | Cache 0  | A      B             | Cache 1
    +----------------------+          +----------------------+
                   |                                 |
  -------------+---------------+-----------------+------------------
                               |
                    +----------------------+
       Main Memory  | A      B             |
                    +----------------------+

'refcount' is modified frequently, but 'name' is set once at object creation
time and is never modified.
When many CPUs access 'foo' at the same time, with 'refcount' being bumped
frequently by only one CPU and 'name' being read by the other CPUs, all those
reading CPUs have to reload the whole cache line over and over because of the
'sharing', even though 'name' is never changed.

There are many real-world cases of performance regressions caused by false
sharing. One of them involves the rw_semaphore 'mmap_lock' inside struct
mm_struct, whose cache line layout change triggered a regression that Linus
analyzed in [2]_.

There are two key factors for harmful false sharing:

* A global datum accessed (shared) by many CPUs
* In the concurrent accesses to the data, there is at least one write
  operation: the write/write or write/read cases.

The sharing could come from totally unrelated kernel components, or from
different code paths of the same kernel component.


False Sharing Pitfalls
======================
In the past, when a platform had only one or a few CPUs, hot data members
could be purposely put in the same cache line to make them cache hot and to
save cache lines and TLB entries, like a lock and the data protected by it.
But on recent large systems with hundreds of CPUs, this may not work when
the lock is heavily contended, as the lock owner CPU could write to the data
while other CPUs are busy spinning on the lock.

Looking at past cases, there are several frequently occurring patterns for
false sharing:

* A lock (spinlock/mutex/semaphore) and the data protected by it are
  purposely put in one cache line.
* Global data is put together in one cache line. Some kernel subsystems
  have many small global parameters (4 bytes each), which can easily be
  grouped together and put into one cache line.
* Data members of a big data structure sit together in one cache line
  without anyone noticing (a cache line is usually 64 bytes or more), as
  in struct mem_cgroup.