€•á>Œsphinx.addnodes”Œdocument”“”)”}”(Œ rawsource”Œ”Œchildren”]”(Œ translations”Œ LanguagesNode”“”)”}”(hhh]”(hŒ pending_xref”“”)”}”(hhh]”Œdocutils.nodes”ŒText”“”ŒChinese (Simplified)”…””}”Œparent”hsbaŒ attributes”}”(Œids”]”Œclasses”]”Œnames”]”Œdupnames”]”Œbackrefs”]”Œ refdomain”Œstd”Œreftype”Œdoc”Œ reftarget”Œ,/translations/zh_CN/locking/futex-requeue-pi”Œmodname”NŒ classname”NŒ refexplicit”ˆuŒtagname”hhh ubh)”}”(hhh]”hŒChinese (Traditional)”…””}”hh2sbah}”(h]”h ]”h"]”h$]”h&]”Œ refdomain”h)Œreftype”h+Œ reftarget”Œ,/translations/zh_TW/locking/futex-requeue-pi”Œmodname”NŒ classname”NŒ refexplicit”ˆuh1hhh ubh)”}”(hhh]”hŒItalian”…””}”hhFsbah}”(h]”h ]”h"]”h$]”h&]”Œ refdomain”h)Œreftype”h+Œ reftarget”Œ,/translations/it_IT/locking/futex-requeue-pi”Œmodname”NŒ classname”NŒ refexplicit”ˆuh1hhh ubh)”}”(hhh]”hŒJapanese”…””}”hhZsbah}”(h]”h ]”h"]”h$]”h&]”Œ refdomain”h)Œreftype”h+Œ reftarget”Œ,/translations/ja_JP/locking/futex-requeue-pi”Œmodname”NŒ classname”NŒ refexplicit”ˆuh1hhh ubh)”}”(hhh]”hŒKorean”…””}”hhnsbah}”(h]”h ]”h"]”h$]”h&]”Œ refdomain”h)Œreftype”h+Œ reftarget”Œ,/translations/ko_KR/locking/futex-requeue-pi”Œmodname”NŒ classname”NŒ refexplicit”ˆuh1hhh ubh)”}”(hhh]”hŒSpanish”…””}”hh‚sbah}”(h]”h ]”h"]”h$]”h&]”Œ refdomain”h)Œreftype”h+Œ reftarget”Œ,/translations/sp_SP/locking/futex-requeue-pi”Œmodname”NŒ classname”NŒ refexplicit”ˆuh1hhh ubeh}”(h]”h ]”h"]”h$]”h&]”Œcurrent_language”ŒEnglish”uh1h hhŒ _document”hŒsource”NŒline”NubhŒsection”“”)”}”(hhh]”(hŒtitle”“”)”}”(hŒFutex Requeue PI”h]”hŒFutex Requeue PI”…””}”(hh¨hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h¦hh£hžhhŸŒF/var/lib/git/docbuild/linux/Documentation/locking/futex-requeue-pi.rst”h KubhŒ paragraph”“”)”}”(hX‘Requeueing of tasks from a non-PI futex to a PI futex requires special handling in order to ensure the underlying rt_mutex is never left without an owner if it has waiters; doing so would break the PI boosting logic [see rt-mutex-design.rst] For the purposes of brevity, this action will be referred to as "requeue_pi" throughout this document. Priority inheritance is abbreviated throughout as "PI".”h]”hX™Requeueing of tasks from a non-PI futex to a PI futex requires special handling in order to ensure the underlying rt_mutex is never left without an owner if it has waiters; doing so would break the PI boosting logic [see rt-mutex-design.rst] For the purposes of brevity, this action will be referred to as “requeue_pi†throughout this document. Priority inheritance is abbreviated throughout as “PIâ€.”…””}”(hh¹hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h·hŸh¶h Khh£hžhubh¢)”}”(hhh]”(h§)”}”(hŒ Motivation”h]”hŒ Motivation”…””}”(hhÊhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h¦hhÇhžhhŸh¶h Kubh¸)”}”(hX”Without requeue_pi, the glibc implementation of pthread_cond_broadcast() must resort to waking all the tasks waiting on a pthread_condvar and letting them try to sort out which task gets to run first in classic thundering-herd formation. An ideal implementation would wake the highest-priority waiter, and leave the rest to the natural wakeup inherent in unlocking the mutex associated with the condvar.”h]”hX”Without requeue_pi, the glibc implementation of pthread_cond_broadcast() must resort to waking all the tasks waiting on a pthread_condvar and letting them try to sort out which task gets to run first in classic thundering-herd formation. An ideal implementation would wake the highest-priority waiter, and leave the rest to the natural wakeup inherent in unlocking the mutex associated with the condvar.”…””}”(hhØhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h·hŸh¶h KhhÇhžhubh¸)”}”(hŒ%Consider the simplified glibc calls::”h]”hŒ$Consider the simplified glibc calls:”…””}”(hhæhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h·hŸh¶h KhhÇhžhubhŒ literal_block”“”)”}”(hXé/* caller must lock mutex */ pthread_cond_wait(cond, mutex) { lock(cond->__data.__lock); unlock(mutex); do { unlock(cond->__data.__lock); futex_wait(cond->__data.__futex); lock(cond->__data.__lock); } while(...) unlock(cond->__data.__lock); lock(mutex); } pthread_cond_broadcast(cond) { lock(cond->__data.__lock); unlock(cond->__data.__lock); futex_requeue(cond->data.__futex, cond->mutex); }”h]”hXé/* caller must lock mutex */ pthread_cond_wait(cond, mutex) { lock(cond->__data.__lock); unlock(mutex); do { unlock(cond->__data.__lock); futex_wait(cond->__data.__futex); lock(cond->__data.__lock); } while(...) unlock(cond->__data.__lock); lock(mutex); } pthread_cond_broadcast(cond) { lock(cond->__data.__lock); unlock(cond->__data.__lock); futex_requeue(cond->data.__futex, cond->mutex); }”…””}”hhösbah}”(h]”h ]”h"]”h$]”h&]”Œ xml:space”Œpreserve”uh1hôhŸh¶h KhhÇhžhubh¸)”}”(hX0Once pthread_cond_broadcast() requeues the tasks, the cond->mutex has waiters. Note that pthread_cond_wait() attempts to lock the mutex only after it has returned to user space. This will leave the underlying rt_mutex with waiters, and no owner, breaking the previously mentioned PI-boosting algorithms.”h]”hX0Once pthread_cond_broadcast() requeues the tasks, the cond->mutex has waiters. Note that pthread_cond_wait() attempts to lock the mutex only after it has returned to user space. This will leave the underlying rt_mutex with waiters, and no owner, breaking the previously mentioned PI-boosting algorithms.”…””}”(hjhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h·hŸh¶h K/hhÇhžhubh¸)”}”(hX-In order to support PI-aware pthread_condvar's, the kernel needs to be able to requeue tasks to PI futexes. This support implies that upon a successful futex_wait system call, the caller would return to user space already holding the PI futex. The glibc implementation would be modified as follows::”h]”hX.In order to support PI-aware pthread_condvar’s, the kernel needs to be able to requeue tasks to PI futexes. This support implies that upon a successful futex_wait system call, the caller would return to user space already holding the PI futex. The glibc implementation would be modified as follows:”…””}”(hjhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h·hŸh¶h K5hhÇhžhubhõ)”}”(hX/* caller must lock mutex */ pthread_cond_wait_pi(cond, mutex) { lock(cond->__data.__lock); unlock(mutex); do { unlock(cond->__data.__lock); futex_wait_requeue_pi(cond->__data.__futex); lock(cond->__data.__lock); } while(...) unlock(cond->__data.__lock); /* the kernel acquired the mutex for us */ } pthread_cond_broadcast_pi(cond) { lock(cond->__data.__lock); unlock(cond->__data.__lock); futex_requeue_pi(cond->data.__futex, cond->mutex); }”h]”hX/* caller must lock mutex */ pthread_cond_wait_pi(cond, mutex) { lock(cond->__data.__lock); unlock(mutex); do { unlock(cond->__data.__lock); futex_wait_requeue_pi(cond->__data.__futex); lock(cond->__data.__lock); } while(...) unlock(cond->__data.__lock); /* the kernel acquired the mutex for us */ } pthread_cond_broadcast_pi(cond) { lock(cond->__data.__lock); unlock(cond->__data.__lock); futex_requeue_pi(cond->data.__futex, cond->mutex); }”…””}”hj"sbah}”(h]”h ]”h"]”h$]”h&]”jjuh1hôhŸh¶h Kuser interface to requeue_pi: FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI.”h]”hXxThe solution involves two new rt_mutex helper routines, rt_mutex_start_proxy_lock() and rt_mutex_finish_proxy_lock(), which allow the requeue code to acquire an uncontended rt_mutex on behalf of the waiter and to enqueue the waiter on a contended rt_mutex. Two new system calls provide the kernel<->user interface to requeue_pi: FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI.”…””}”(hjehžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h·hŸh¶h KahjFhžhubh¸)”}”(hXVFUTEX_WAIT_REQUEUE_PI is called by the waiter (pthread_cond_wait() and pthread_cond_timedwait()) to block on the initial futex and wait to be requeued to a PI-aware futex. The implementation is the result of a high-speed collision between futex_wait() and futex_lock_pi(), with some extra logic to check for the additional wake-up scenarios.”h]”hXVFUTEX_WAIT_REQUEUE_PI is called by the waiter (pthread_cond_wait() and pthread_cond_timedwait()) to block on the initial futex and wait to be requeued to a PI-aware futex. The implementation is the result of a high-speed collision between futex_wait() and futex_lock_pi(), with some extra logic to check for the additional wake-up scenarios.”…””}”(hjshžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h·hŸh¶h KhhjFhžhubh¸)”}”(hXáFUTEX_CMP_REQUEUE_PI is called by the waker (pthread_cond_broadcast() and pthread_cond_signal()) to requeue and possibly wake the waiting tasks. Internally, this system call is still handled by futex_requeue (by passing requeue_pi=1). Before requeueing, futex_requeue() attempts to acquire the requeue target PI futex on behalf of the top waiter. If it can, this waiter is woken. futex_requeue() then proceeds to requeue the remaining nr_wake+nr_requeue tasks to the PI futex, calling rt_mutex_start_proxy_lock() prior to each requeue to prepare the task as a waiter on the underlying rt_mutex. It is possible that the lock can be acquired at this stage as well, if so, the next waiter is woken to finish the acquisition of the lock.”h]”hXáFUTEX_CMP_REQUEUE_PI is called by the waker (pthread_cond_broadcast() and pthread_cond_signal()) to requeue and possibly wake the waiting tasks. Internally, this system call is still handled by futex_requeue (by passing requeue_pi=1). Before requeueing, futex_requeue() attempts to acquire the requeue target PI futex on behalf of the top waiter. If it can, this waiter is woken. futex_requeue() then proceeds to requeue the remaining nr_wake+nr_requeue tasks to the PI futex, calling rt_mutex_start_proxy_lock() prior to each requeue to prepare the task as a waiter on the underlying rt_mutex. It is possible that the lock can be acquired at this stage as well, if so, the next waiter is woken to finish the acquisition of the lock.”…””}”(hjhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h·hŸh¶h KohjFhžhubh¸)”}”(hX)FUTEX_CMP_REQUEUE_PI accepts nr_wake and nr_requeue as arguments, but their sum is all that really matters. futex_requeue() will wake or requeue up to nr_wake + nr_requeue tasks. It will wake only as many tasks as it can acquire the lock for, which in the majority of cases should be 0 as good programming practice dictates that the caller of either pthread_cond_broadcast() or pthread_cond_signal() acquire the mutex prior to making the call. FUTEX_CMP_REQUEUE_PI requires that nr_wake=1. nr_requeue should be INT_MAX for broadcast and 0 for signal.”h]”hX)FUTEX_CMP_REQUEUE_PI accepts nr_wake and nr_requeue as arguments, but their sum is all that really matters. futex_requeue() will wake or requeue up to nr_wake + nr_requeue tasks. It will wake only as many tasks as it can acquire the lock for, which in the majority of cases should be 0 as good programming practice dictates that the caller of either pthread_cond_broadcast() or pthread_cond_signal() acquire the mutex prior to making the call. FUTEX_CMP_REQUEUE_PI requires that nr_wake=1. nr_requeue should be INT_MAX for broadcast and 0 for signal.”…””}”(hjhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h·hŸh¶h K|hjFhžhubeh}”(h]”Œimplementation”ah ]”h"]”Œimplementation”ah$]”h&]”uh1h¡hh£hžhhŸh¶h KWubeh}”(h]”Œfutex-requeue-pi”ah ]”h"]”Œfutex requeue pi”ah$]”h&]”uh1h¡hhhžhhŸh¶h Kubeh}”(h]”h ]”h"]”h$]”h&]”Œsource”h¶uh1hŒcurrent_source”NŒ current_line”NŒsettings”Œdocutils.frontend”ŒValues”“”)”}”(h¦NŒ generator”NŒ datestamp”NŒ source_link”NŒ source_url”NŒ toc_backlinks”Œentry”Œfootnote_backlinks”KŒ sectnum_xform”KŒstrip_comments”NŒstrip_elements_with_classes”NŒ strip_classes”NŒ report_level”KŒ halt_level”KŒexit_status_level”KŒdebug”NŒwarning_stream”NŒ traceback”ˆŒinput_encoding”Œ utf-8-sig”Œinput_encoding_error_handler”Œstrict”Œoutput_encoding”Œutf-8”Œoutput_encoding_error_handler”jÐŒerror_encoding”Œutf-8”Œerror_encoding_error_handler”Œbackslashreplace”Œ language_code”Œen”Œrecord_dependencies”NŒconfig”NŒ id_prefix”hŒauto_id_prefix”Œid”Œ dump_settings”NŒdump_internals”NŒdump_transforms”NŒdump_pseudo_xml”NŒexpose_internals”NŒstrict_visitor”NŒ_disable_config”NŒ_source”h¶Œ _destination”NŒ _config_files”]”Œ7/var/lib/git/docbuild/linux/Documentation/docutils.conf”aŒfile_insertion_enabled”ˆŒ raw_enabled”KŒline_length_limit”M'Œpep_references”NŒ pep_base_url”Œhttps://peps.python.org/”Œpep_file_url_template”Œpep-%04d”Œrfc_references”NŒ rfc_base_url”Œ&https://datatracker.ietf.org/doc/html/”Œ tab_width”KŒtrim_footnote_reference_space”‰Œsyntax_highlight”Œlong”Œ smart_quotes”ˆŒsmartquotes_locales”]”Œcharacter_level_inline_markup”‰Œdoctitle_xform”‰Œ docinfo_xform”KŒsectsubtitle_xform”‰Œ image_loading”Œlink”Œembed_stylesheet”‰Œcloak_email_addresses”ˆŒsection_self_link”‰Œenv”NubŒreporter”NŒindirect_targets”]”Œsubstitution_defs”}”Œsubstitution_names”}”Œrefnames”}”Œrefids”}”Œnameids”}”(jªj§jCj@j¢jŸuŒ nametypes”}”(jª‰jC‰j¢‰uh}”(j§h£j@hÇjŸjFuŒ footnote_refs”}”Œ citation_refs”}”Œ autofootnotes”]”Œautofootnote_refs”]”Œsymbol_footnotes”]”Œsymbol_footnote_refs”]”Œ footnotes”]”Œ citations”]”Œautofootnote_start”KŒsymbol_footnote_start”KŒ id_counter”Œ collections”ŒCounter”“”}”…”R”Œparse_messages”]”Œtransform_messages”]”Œ transformer”NŒ include_log”]”Œ decoration”Nhžhub.