׾sphinx.addnodesdocument)}( rawsourcechildren]( translations LanguagesNode)}(hhh](h pending_xref)}(hhh]docutils.nodesTextChinese (Simplified)}parenthsba attributes}(ids]classes]names]dupnames]backrefs] refdomainstdreftypedoc reftarget+/translations/zh_CN/core-api/pin_user_pagesmodnameN classnameN refexplicitutagnamehhh ubh)}(hhh]hChinese (Traditional)}hh2sbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftarget+/translations/zh_TW/core-api/pin_user_pagesmodnameN classnameN refexplicituh1hhh ubh)}(hhh]hItalian}hhFsbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftarget+/translations/it_IT/core-api/pin_user_pagesmodnameN classnameN refexplicituh1hhh ubh)}(hhh]hJapanese}hhZsbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftarget+/translations/ja_JP/core-api/pin_user_pagesmodnameN classnameN refexplicituh1hhh ubh)}(hhh]hKorean}hhnsbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftarget+/translations/ko_KR/core-api/pin_user_pagesmodnameN classnameN refexplicituh1hhh ubh)}(hhh]hSpanish}hhsbah}(h]h ]h"]h$]h&] refdomainh)reftypeh+ reftarget+/translations/sp_SP/core-api/pin_user_pagesmodnameN classnameN refexplicituh1hhh ubeh}(h]h ]h"]h$]h&]current_languageEnglishuh1h hh _documenthsourceNlineNubhcomment)}(h SPDX-License-Identifier: GPL-2.0h]h SPDX-License-Identifier: GPL-2.0}hhsbah}(h]h ]h"]h$]h&] xml:spacepreserveuh1hhhhhhE/var/lib/git/docbuild/linux/Documentation/core-api/pin_user_pages.rsthKubhsection)}(hhh](htitle)}(h"pin_user_pages() and related callsh]h"pin_user_pages() and related calls}(hhhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhhhhhKubhtopic)}(hhh]h bullet_list)}(hhh](h list_item)}(hhh]h paragraph)}(hhh]h reference)}(hhh]hOverview}(hhhhhNhNubah}(h]id1ah ]h"]h$]h&]refidoverviewuh1hhhubah}(h]h ]h"]h$]h&]uh1hhhubah}(h]h ]h"]h$]h&]uh1hhhubh)}(hhh]h)}(hhh]h)}(hhh]hBasic description of FOLL_PIN}(hjhhhNhNubah}(h]id2ah ]h"]h$]h&]refidbasic-description-of-foll-pinuh1hhhubah}(h]h ]h"]h$]h&]uh1hhhubah}(h]h ]h"]h$]h&]uh1hhhubh)}(hhh]h)}(hhh]h)}(hhh]h#Which flags are set by each 
wrapper}(hj#hhhNhNubah}(h]id3ah ]h"]h$]h&]refid#which-flags-are-set-by-each-wrapperuh1hhj ubah}(h]h ]h"]h$]h&]uh1hhjubah}(h]h ]h"]h$]h&]uh1hhhubh)}(hhh]h)}(hhh]h)}(hhh]hTracking dma-pinned pages}(hjEhhhNhNubah}(h]id4ah ]h"]h$]h&]refidtracking-dma-pinned-pagesuh1hhjBubah}(h]h ]h"]h$]h&]uh1hhj?ubah}(h]h ]h"]h$]h&]uh1hhhubh)}(hhh](h)}(hhh]h)}(hhh]h:FOLL_PIN, FOLL_GET, FOLL_LONGTERM: when to use which flags}(hjghhhNhNubah}(h]id5ah ]h"]h$]h&]refid7foll-pin-foll-get-foll-longterm-when-to-use-which-flagsuh1hhjdubah}(h]h ]h"]h$]h&]uh1hhjaubh)}(hhh](h)}(hhh]h)}(hhh]h)}(hhh]hCASE 1: Direct IO (DIO)}(hjhhhNhNubah}(h]id6ah ]h"]h$]h&]refidcase-1-direct-io-diouh1hhjubah}(h]h ]h"]h$]h&]uh1hhjubah}(h]h ]h"]h$]h&]uh1hhj}ubh)}(hhh]h)}(hhh]h)}(hhh]h CASE 2: RDMA}(hjhhhNhNubah}(h]id7ah ]h"]h$]h&]refid case-2-rdmauh1hhjubah}(h]h ]h"]h$]h&]uh1hhjubah}(h]h ]h"]h$]h&]uh1hhj}ubh)}(hhh]h)}(hhh]h)}(hhh]hICASE 3: MMU notifier registration, with or without page faulting hardware}(hjhhhNhNubah}(h]id8ah ]h"]h$]h&]refidGcase-3-mmu-notifier-registration-with-or-without-page-faulting-hardwareuh1hhjubah}(h]h ]h"]h$]h&]uh1hhjubah}(h]h ]h"]h$]h&]uh1hhj}ubh)}(hhh]h)}(hhh]h)}(hhh]h1CASE 4: Pinning for struct page manipulation only}(hjhhhNhNubah}(h]id9ah ]h"]h$]h&]refid0case-4-pinning-for-struct-page-manipulation-onlyuh1hhjubah}(h]h ]h"]h$]h&]uh1hhjubah}(h]h ]h"]h$]h&]uh1hhj}ubh)}(hhh]h)}(hhh]h)}(hhh]h=CASE 5: Pinning in order to write to the data within the page}(hjhhhNhNubah}(h]id10ah ]h"]h$]h&]refidpin_user_pages() pin_user_pages_fast() pin_user_pages_remote()h]h>pin_user_pages() pin_user_pages_fast() pin_user_pages_remote()}hjsbah}(h]h ]h"]h$]h&]hhuh1jhhhKhjhhubeh}(h]hah ]h"]overviewah$]h&]uh1hhhhhhhhK ubh)}(hhh](h)}(hBasic description of FOLL_PINh]hBasic description of FOLL_PIN}(hj*hhhNhNubah}(h]h ]h"]h$]h&]jj uh1hhj'hhhhhKubh)}(hFOLL_PIN and FOLL_LONGTERM are flags that can be passed to the get_user_pages*() ("gup") family of functions. 
FOLL_PIN has significant interactions and interdependencies with FOLL_LONGTERM, so both are covered here.h]hFOLL_PIN and FOLL_LONGTERM are flags that can be passed to the get_user_pages*() (“gup”) family of functions. FOLL_PIN has significant interactions and interdependencies with FOLL_LONGTERM, so both are covered here.}(hj8hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhj'hhubh)}(hFOLL_PIN is internal to gup, meaning that it should not appear at the gup call sites. This allows the associated wrapper functions (pin_user_pages*() and others) to set the correct combination of these flags, and to check for problems as well.h]hFOLL_PIN is internal to gup, meaning that it should not appear at the gup call sites. This allows the associated wrapper functions (pin_user_pages*() and others) to set the correct combination of these flags, and to check for problems as well.}(hjFhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhj'hhubh)}(hXEFOLL_LONGTERM, on the other hand, *is* allowed to be set at the gup call sites. This is in order to avoid creating a large number of wrapper functions to cover all combinations of get*(), pin*(), FOLL_LONGTERM, and more. Also, the pin_user_pages*() APIs are clearly distinct from the get_user_pages*() APIs, so that's a natural dividing line, and a good point to make separate wrapper calls. In other words, use pin_user_pages*() for DMA-pinned pages, and get_user_pages*() for other cases. There are five cases described later on in this document, to further clarify that concept.h](h"FOLL_LONGTERM, on the other hand, }(hjThhhNhNubhemphasis)}(h*is*h]his}(hj^hhhNhNubah}(h]h ]h"]h$]h&]uh1j\hjTubhX! allowed to be set at the gup call sites. This is in order to avoid creating a large number of wrapper functions to cover all combinations of get*(), pin*(), FOLL_LONGTERM, and more. Also, the pin_user_pages*() APIs are clearly distinct from the get_user_pages*() APIs, so that’s a natural dividing line, and a good point to make separate wrapper calls. 
In other words, use pin_user_pages*() for DMA-pinned pages, and get_user_pages*() for other cases. There are five cases described later on in this document, to further clarify that concept.}(hjThhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhj'hhubh)}(hX FOLL_PIN and FOLL_GET are mutually exclusive for a given gup call. However, multiple threads and call sites are free to pin the same struct pages, via both FOLL_PIN and FOLL_GET. It's just the call site that needs to choose one or the other, not the struct page(s).h]hX FOLL_PIN and FOLL_GET are mutually exclusive for a given gup call. However, multiple threads and call sites are free to pin the same struct pages, via both FOLL_PIN and FOLL_GET. It’s just the call site that needs to choose one or the other, not the struct page(s).}(hjvhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK'hj'hhubh)}(hThe FOLL_PIN implementation is nearly the same as FOLL_GET, except that FOLL_PIN uses a different reference counting technique.h]hThe FOLL_PIN implementation is nearly the same as FOLL_GET, except that FOLL_PIN uses a different reference counting technique.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK,hj'hhubh)}(hFOLL_PIN is a prerequisite to FOLL_LONGTERM. Another way of saying that is, FOLL_LONGTERM is a specific case, more restrictive case of FOLL_PIN.h]hFOLL_PIN is a prerequisite to FOLL_LONGTERM. Another way of saying that is, FOLL_LONGTERM is a specific case, more restrictive case of FOLL_PIN.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK/hj'hhubeh}(h]jah ]h"]basic description of foll_pinah$]h&]uh1hhhhhhhhKubh)}(hhh](h)}(h#Which flags are set by each wrapperh]h#Which flags are set by each wrapper}(hjhhhNhNubah}(h]h ]h"]h$]h&]jj,uh1hhjhhhhhK3ubh)}(hX For these pin_user_pages*() functions, FOLL_PIN is OR'd in with whatever gup flags the caller provides. 
The caller is required to pass in a non-null struct pages* array, and the function then pins pages by incrementing each by a special value: GUP_PIN_COUNTING_BIAS.h]hX For these pin_user_pages*() functions, FOLL_PIN is OR’d in with whatever gup flags the caller provides. The caller is required to pass in a non-null struct pages* array, and the function then pins pages by incrementing each by a special value: GUP_PIN_COUNTING_BIAS.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK5hjhhubh)}(hFor large folios, the GUP_PIN_COUNTING_BIAS scheme is not used. Instead, the extra space available in the struct folio is used to store the pincount directly.h]hFor large folios, the GUP_PIN_COUNTING_BIAS scheme is not used. Instead, the extra space available in the struct folio is used to store the pincount directly.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK:hjhhubh)}(hXcThis approach for large folios avoids the counting upper limit problems that are discussed below. Those limitations would have been aggravated severely by huge pages, because each tail page adds a refcount to the head page. And in fact, testing revealed that, without a separate pincount field, refcount overflows were seen in some huge page stress tests.h]hXcThis approach for large folios avoids the counting upper limit problems that are discussed below. Those limitations would have been aggravated severely by huge pages, because each tail page adds a refcount to the head page. 
And in fact, testing revealed that, without a separate pincount field, refcount overflows were seen in some huge page stress tests.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK>hjhhubh)}(hzThis also means that huge pages and large folios do not suffer from the false positives problem that is mentioned below.::h]hyThis also means that huge pages and large folios do not suffer from the false positives problem that is mentioned below.:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKDhjhhubj)}(hFunction -------- pin_user_pages FOLL_PIN is always set internally by this function. pin_user_pages_fast FOLL_PIN is always set internally by this function. pin_user_pages_remote FOLL_PIN is always set internally by this function.h]hFunction -------- pin_user_pages FOLL_PIN is always set internally by this function. pin_user_pages_fast FOLL_PIN is always set internally by this function. pin_user_pages_remote FOLL_PIN is always set internally by this function.}hjsbah}(h]h ]h"]h$]h&]hhuh1jhhhKGhjhhubh)}(hXJFor these get_user_pages*() functions, FOLL_GET might not even be specified. Behavior is a little more complex than above. If FOLL_GET was *not* specified, but the caller passed in a non-null struct pages* array, then the function sets FOLL_GET for you, and proceeds to pin pages by incrementing the refcount of each page by +1.::h](hFor these get_user_pages*() functions, FOLL_GET might not even be specified. Behavior is a little more complex than above. If FOLL_GET was }(hjhhhNhNubj])}(h*not*h]hnot}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j\hjubh specified, but the caller passed in a non-null struct pages* array, then the function sets FOLL_GET for you, and proceeds to pin pages by incrementing the refcount of each page by +1.:}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKMhjhhubj)}(hXFunction -------- get_user_pages FOLL_GET is sometimes set internally by this function. get_user_pages_fast FOLL_GET is sometimes set internally by this function. 
get_user_pages_remote FOLL_GET is sometimes set internally by this function.h]hXFunction -------- get_user_pages FOLL_GET is sometimes set internally by this function. get_user_pages_fast FOLL_GET is sometimes set internally by this function. get_user_pages_remote FOLL_GET is sometimes set internally by this function.}hjsbah}(h]h ]h"]h$]h&]hhuh1jhhhKShjhhubeh}(h]j2ah ]h"]#which flags are set by each wrapperah$]h&]uh1hhhhhhhhK3ubh)}(hhh](h)}(hTracking dma-pinned pagesh]hTracking dma-pinned pages}(hj6hhhNhNubah}(h]h ]h"]h$]h&]jjNuh1hhj3hhhhhKZubh)}(hQSome of the key design constraints, and solutions, for tracking dma-pinned pages:h]hQSome of the key design constraints, and solutions, for tracking dma-pinned pages:}(hjDhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK\hj3hhubh)}(hhh](h)}(hvAn actual reference count, per struct page, is required. This is because multiple processes may pin and unpin a page. h]h)}(huAn actual reference count, per struct page, is required. This is because multiple processes may pin and unpin a page.h]huAn actual reference count, per struct page, is required. This is because multiple processes may pin and unpin a page.}(hjYhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhK_hjUubah}(h]h ]h"]h$]h&]uh1hhjRhhhhhNubh)}(h{False positives (reporting that a page is dma-pinned, when in fact it is not) are acceptable, but false negatives are not. h]h)}(hzFalse positives (reporting that a page is dma-pinned, when in fact it is not) are acceptable, but false negatives are not.h]hzFalse positives (reporting that a page is dma-pinned, when in fact it is not) are acceptable, but false negatives are not.}(hjqhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKbhjmubah}(h]h ]h"]h$]h&]uh1hhjRhhhhhNubh)}(hTstruct page may not be increased in size for this, and all fields are already used. 
h]h)}(hSstruct page may not be increased in size for this, and all fields are already used.h]hSstruct page may not be increased in size for this, and all fields are already used.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKehjubah}(h]h ]h"]h$]h&]uh1hhjRhhhhhNubh)}(hXGiven the above, we can overload the page->_refcount field by using, sort of, the upper bits in that field for a dma-pinned count. "Sort of", means that, rather than dividing page->_refcount into bit fields, we simply add a medium- large value (GUP_PIN_COUNTING_BIAS, initially chosen to be 1024: 10 bits) to page->_refcount. This provides fuzzy behavior: if a page has get_page() called on it 1024 times, then it will appear to have a single dma-pinned count. And again, that's acceptable. h]h)}(hXGiven the above, we can overload the page->_refcount field by using, sort of, the upper bits in that field for a dma-pinned count. "Sort of", means that, rather than dividing page->_refcount into bit fields, we simply add a medium- large value (GUP_PIN_COUNTING_BIAS, initially chosen to be 1024: 10 bits) to page->_refcount. This provides fuzzy behavior: if a page has get_page() called on it 1024 times, then it will appear to have a single dma-pinned count. And again, that's acceptable.h]hXGiven the above, we can overload the page->_refcount field by using, sort of, the upper bits in that field for a dma-pinned count. “Sort of”, means that, rather than dividing page->_refcount into bit fields, we simply add a medium- large value (GUP_PIN_COUNTING_BIAS, initially chosen to be 1024: 10 bits) to page->_refcount. This provides fuzzy behavior: if a page has get_page() called on it 1024 times, then it will appear to have a single dma-pinned count. 
And again, that’s acceptable.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhhjubah}(h]h ]h"]h$]h&]uh1hhjRhhhhhNubeh}(h]h ]h"]h$]h&]bullet*uh1hhhhK_hj3hhubh)}(hxThis also leads to limitations: there are only 31-10==21 bits available for a counter that increments 10 bits at a time.h]hxThis also leads to limitations: there are only 31-10==21 bits available for a counter that increments 10 bits at a time.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKphj3hhubh)}(hhh](h)}(hX?Because of that limitation, special handling is applied to the zero pages when using FOLL_PIN. We only pretend to pin a zero page - we don't alter its refcount or pincount at all (it is permanent, so there's no need). The unpinning functions also don't do anything to a zero page. This is transparent to the caller. h]h)}(hX>Because of that limitation, special handling is applied to the zero pages when using FOLL_PIN. We only pretend to pin a zero page - we don't alter its refcount or pincount at all (it is permanent, so there's no need). The unpinning functions also don't do anything to a zero page. This is transparent to the caller.h]hXDBecause of that limitation, special handling is applied to the zero pages when using FOLL_PIN. We only pretend to pin a zero page - we don’t alter its refcount or pincount at all (it is permanent, so there’s no need). The unpinning functions also don’t do anything to a zero page. This is transparent to the caller.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKshjubah}(h]h ]h"]h$]h&]uh1hhjhhhhhNubh)}(hCallers must specifically request "dma-pinned tracking of pages". In other words, just calling get_user_pages() will not suffice; a new set of functions, pin_user_page() and related, must be used. h]h)}(hCallers must specifically request "dma-pinned tracking of pages". In other words, just calling get_user_pages() will not suffice; a new set of functions, pin_user_page() and related, must be used.h]hCallers must specifically request “dma-pinned tracking of pages”. 
In other words, just calling get_user_pages() will not suffice; a new set of functions, pin_user_page() and related, must be used.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKyhjubah}(h]h ]h"]h$]h&]uh1hhjhhhhhNubeh}(h]h ]h"]h$]h&]jjuh1hhhhKshj3hhubeh}(h]jTah ]h"]tracking dma-pinned pagesah$]h&]uh1hhhhhhhhKZubh)}(hhh](h)}(h:FOLL_PIN, FOLL_GET, FOLL_LONGTERM: when to use which flagsh]h:FOLL_PIN, FOLL_GET, FOLL_LONGTERM: when to use which flags}(hjhhhNhNubah}(h]h ]h"]h$]h&]jjpuh1hhj hhhhhK~ubh)}(hbThanks to Jan Kara, Vlastimil Babka and several other -mm people, for describing these categories:h]hbThanks to Jan Kara, Vlastimil Babka and several other -mm people, for describing these categories:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhj hhubh)}(hhh](h)}(hCASE 1: Direct IO (DIO)h]hCASE 1: Direct IO (DIO)}(hj-hhhNhNubah}(h]h ]h"]h$]h&]jjuh1hhj*hhhhhKubh)}(hXThere are GUP references to pages that are serving as DIO buffers. These buffers are needed for a relatively short time (so they are not "long term"). No special synchronization with folio_mkclean() or munmap() is provided. Therefore, flags to set at the call site are: ::h]hXThere are GUP references to pages that are serving as DIO buffers. These buffers are needed for a relatively short time (so they are not “long term”). No special synchronization with folio_mkclean() or munmap() is provided. 
Therefore, flags to set at the call site are:}(hj;hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhj*hhubj)}(hFOLL_PINh]hFOLL_PIN}hjIsbah}(h]h ]h"]h$]h&]hhuh1jhhhKhj*hhubh)}(h|...but rather than setting FOLL_PIN directly, call sites should use one of the pin_user_pages*() routines that set FOLL_PIN.h]h|...but rather than setting FOLL_PIN directly, call sites should use one of the pin_user_pages*() routines that set FOLL_PIN.}(hjWhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhj*hhubeh}(h]jah ]h"]case 1: direct io (dio)ah$]h&]uh1hhj hhhhhKubh)}(hhh](h)}(h CASE 2: RDMAh]h CASE 2: RDMA}(hjohhhNhNubah}(h]h ]h"]h$]h&]jjuh1hhjlhhhhhKubh)}(hThere are GUP references to pages that are serving as DMA buffers. These buffers are needed for a long time ("long term"). No special synchronization with folio_mkclean() or munmap() is provided. Therefore, flags to set at the call site are: ::h]hThere are GUP references to pages that are serving as DMA buffers. These buffers are needed for a long time (“long term”). No special synchronization with folio_mkclean() or munmap() is provided. Therefore, flags to set at the call site are:}(hj}hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjlhhubj)}(hFOLL_PIN | FOLL_LONGTERMh]hFOLL_PIN | FOLL_LONGTERM}hjsbah}(h]h ]h"]h$]h&]hhuh1jhhhKhjlhhubh)}(hNOTE: Some pages, such as DAX pages, cannot be pinned with longterm pins. That's because DAX pages do not have a separate page cache, and so "pinning" implies locking down file system blocks, which is not (yet) supported in that way.h]hNOTE: Some pages, such as DAX pages, cannot be pinned with longterm pins. That’s because DAX pages do not have a separate page cache, and so “pinning” implies locking down file system blocks, which is not (yet) supported in that way.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjlhhubhtarget)}(h#.. 
_mmu-notifier-registration-case:h]h}(h]h ]h"]h$]h&]jmmu-notifier-registration-caseuh1jhKhjlhhhhubeh}(h]jah ]h"] case 2: rdmaah$]h&]uh1hhj hhhhhKubh)}(hhh](h)}(hICASE 3: MMU notifier registration, with or without page faulting hardwareh]hICASE 3: MMU notifier registration, with or without page faulting hardware}(hjhhhNhNubah}(h]h ]h"]h$]h&]jjuh1hhjhhhhhKubh)}(hX~Device drivers can pin pages via get_user_pages*(), and register for mmu notifier callbacks for the memory range. Then, upon receiving a notifier "invalidate range" callback , stop the device from using the range, and unpin the pages. There may be other possible schemes, such as for example explicitly synchronizing against pending IO, that accomplish approximately the same thing.h]hXDevice drivers can pin pages via get_user_pages*(), and register for mmu notifier callbacks for the memory range. Then, upon receiving a notifier “invalidate range” callback , stop the device from using the range, and unpin the pages. There may be other possible schemes, such as for example explicitly synchronizing against pending IO, that accomplish approximately the same thing.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hX*Or, if the hardware supports replayable page faults, then the device driver can avoid pinning entirely (this is ideal), as follows: register for mmu notifier callbacks as above, but instead of stopping the device and unpinning in the callback, simply remove the range from the device's page tables.h]hX,Or, if the hardware supports replayable page faults, then the device driver can avoid pinning entirely (this is ideal), as follows: register for mmu notifier callbacks as above, but instead of stopping the device and unpinning in the callback, simply remove the range from the device’s page tables.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hEither way, as long as the driver unpins the pages upon mmu notifier callback, then there is proper synchronization with both filesystem and mm (folio_mkclean(), 
munmap(), etc). Therefore, neither flag needs to be set.h]hEither way, as long as the driver unpins the pages upon mmu notifier callback, then there is proper synchronization with both filesystem and mm (folio_mkclean(), munmap(), etc). Therefore, neither flag needs to be set.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubeh}(h](jjeh ]h"](Icase 3: mmu notifier registration, with or without page faulting hardwaremmu-notifier-registration-caseeh$]h&]uh1hhj hhhhhKexpect_referenced_by_name}jjsexpect_referenced_by_id}jjsubh)}(hhh](h)}(h1CASE 4: Pinning for struct page manipulation onlyh]h1CASE 4: Pinning for struct page manipulation only}(hjhhhNhNubah}(h]h ]h"]h$]h&]jjuh1hhjhhhhhKubh)}(hIf only struct page data (as opposed to the actual memory contents that a page is tracking) is affected, then normal GUP calls are sufficient, and neither flag needs to be set.h]hIf only struct page data (as opposed to the actual memory contents that a page is tracking) is affected, then normal GUP calls are sufficient, and neither flag needs to be set.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubeh}(h]jah ]h"]1case 4: pinning for struct page manipulation onlyah$]h&]uh1hhj hhhhhKubh)}(hhh](h)}(h=CASE 5: Pinning in order to write to the data within the pageh]h=CASE 5: Pinning in order to write to the data within the page}(hj+hhhNhNubah}(h]h ]h"]h$]h&]jjuh1hhj(hhhhhKubh)}(hXXEven though neither DMA nor Direct IO is involved, just a simple case of "pin, write to a page's data, unpin" can cause a problem. Case 5 may be considered a superset of Case 1, plus Case 2, plus anything that invokes that pattern. In other words, if the code is neither Case 1 nor Case 2, it may still require FOLL_PIN, for patterns like this:h]hX^Even though neither DMA nor Direct IO is involved, just a simple case of “pin, write to a page’s data, unpin” can cause a problem. Case 5 may be considered a superset of Case 1, plus Case 2, plus anything that invokes that pattern. 
In other words, if the code is neither Case 1 nor Case 2, it may still require FOLL_PIN, for patterns like this:}(hj9hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhj(hhubhdefinition_list)}(hhh](hdefinition_list_item)}(hfCorrect (uses FOLL_PIN calls): pin_user_pages() write to the data within the pages unpin_user_pages() h](hterm)}(hCorrect (uses FOLL_PIN calls):h]hCorrect (uses FOLL_PIN calls):}(hjThhhNhNubah}(h]h ]h"]h$]h&]uh1jRhhhKhjNubh definition)}(hhh]h)}(hFpin_user_pages() write to the data within the pages unpin_user_pages()h]hFpin_user_pages() write to the data within the pages unpin_user_pages()}(hjghhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjdubah}(h]h ]h"]h$]h&]uh1jbhjNubeh}(h]h ]h"]h$]h&]uh1jLhhhKhjIubjM)}(h`INCORRECT (uses FOLL_GET calls): get_user_pages() write to the data within the pages put_page() h](jS)}(h INCORRECT (uses FOLL_GET calls):h]h INCORRECT (uses FOLL_GET calls):}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jRhhhKhjubjc)}(hhh]h)}(h>get_user_pages() write to the data within the pages put_page()h]h>get_user_pages() write to the data within the pages put_page()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjubah}(h]h ]h"]h$]h&]uh1jbhjubeh}(h]h ]h"]h$]h&]uh1jLhhhKhjIhhubeh}(h]h ]h"]h$]h&]uh1jGhj(hhhhhNubeh}(h]jah ]h"]=case 5: pinning in order to write to the data within the pageah$]h&]uh1hhj hhhhhKubeh}(h]jvah ]h"]:foll_pin, foll_get, foll_longterm: when to use which flagsah$]h&]uh1hhhhhhhhK~ubh)}(hhh](h)}(h4folio_maybe_dma_pinned(): the whole point of pinningh]h4folio_maybe_dma_pinned(): the whole point of pinning}(hjhhhNhNubah}(h]h ]h"]h$]h&]jjEuh1hhjhhhhhKubh)}(hX(The whole point of marking folios as "DMA-pinned" or "gup-pinned" is to be able to query, "is this folio DMA-pinned?" 
That allows code such as folio_mkclean() (and file system writeback code in general) to make informed decisions about what to do when a folio cannot be unmapped due to such pins.h]hX4The whole point of marking folios as “DMA-pinned” or “gup-pinned” is to be able to query, “is this folio DMA-pinned?” That allows code such as folio_mkclean() (and file system writeback code in general) to make informed decisions about what to do when a folio cannot be unmapped due to such pins.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubh)}(hX What to do in those cases is the subject of a years-long series of discussions and debates (see the References at the end of this document). It's a TODO item here: fill in the details once that's worked out. Meanwhile, it's safe to say that having this available: ::h]hX What to do in those cases is the subject of a years-long series of discussions and debates (see the References at the end of this document). It’s a TODO item here: fill in the details once that’s worked out. Meanwhile, it’s safe to say that having this available:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubj)}(h>static inline bool folio_maybe_dma_pinned(struct folio *folio)h]h>static inline bool folio_maybe_dma_pinned(struct folio *folio)}hjsbah}(h]h ]h"]h$]h&]hhuh1jhhhKhjhhubh)}(hA...is a prerequisite to solving the long-running gup+DMA problem.h]hA...is a prerequisite to solving the long-running gup+DMA problem.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjhhubeh}(h]jKah ]h"]4folio_maybe_dma_pinned(): the whole point of pinningah$]h&]uh1hhhhhhhhKubh)}(hhh](h)}(hCAnother way of thinking about FOLL_GET, FOLL_PIN, and FOLL_LONGTERMh]hCAnother way of thinking about FOLL_GET, FOLL_PIN, and FOLL_LONGTERM}(hjhhhNhNubah}(h]h ]h"]h$]h&]jjguh1hhjhhhhhKubh)}(hX Another way of thinking about these flags is as a progression of restrictions: FOLL_GET is for struct page manipulation, without affecting the data that the struct page refers to. 
FOLL_PIN is a *replacement* for FOLL_GET, and is for short term pins on pages whose data *will* get accessed. As such, FOLL_PIN is a "more severe" form of pinning. And finally, FOLL_LONGTERM is an even more restrictive case that has FOLL_PIN as a prerequisite: this is for pages that will be pinned longterm, and whose data will be accessed.h](hAnother way of thinking about these flags is as a progression of restrictions: FOLL_GET is for struct page manipulation, without affecting the data that the struct page refers to. FOLL_PIN is a }(hj%hhhNhNubj])}(h *replacement*h]h replacement}(hj-hhhNhNubah}(h]h ]h"]h$]h&]uh1j\hj%ubh> for FOLL_GET, and is for short term pins on pages whose data }(hj%hhhNhNubj])}(h*will*h]hwill}(hj?hhhNhNubah}(h]h ]h"]h$]h&]uh1j\hj%ubh get accessed. As such, FOLL_PIN is a “more severe” form of pinning. And finally, FOLL_LONGTERM is an even more restrictive case that has FOLL_PIN as a prerequisite: this is for pages that will be pinned longterm, and whose data will be accessed.}(hj%hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhjhhubeh}(h]jmah ]h"]Canother way of thinking about foll_get, foll_pin, and foll_longtermah$]h&]uh1hhhhhhhhKubh)}(hhh](h)}(h Unit testingh]h Unit testing}(hjahhhNhNubah}(h]h ]h"]h$]h&]jjuh1hhj^hhhhhKubh)}(h This file::h]h This file:}(hjohhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhj^hhubj)}(h%tools/testing/selftests/mm/gup_test.ch]h%tools/testing/selftests/mm/gup_test.c}hj}sbah}(h]h ]h"]h$]h&]hhuh1jhhhKhj^hhubh)}(hIhas the following new calls to exercise the new pin*() wrapper functions:h]hIhas the following new calls to exercise the new pin*() wrapper functions:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhj^hhubh)}(hhh](h)}(h"PIN_FAST_BENCHMARK (./gup_test -a)h]h)}(hjh]h"PIN_FAST_BENCHMARK (./gup_test -a)}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjubah}(h]h ]h"]h$]h&]uh1hhjhhhhhNubh)}(hPIN_BASIC_TEST (./gup_test -b) h]h)}(hPIN_BASIC_TEST (./gup_test -b)h]hPIN_BASIC_TEST (./gup_test -b)}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjubah}(h]h 
]h"]h$]h&]uh1hhjhhhhhNubeh}(h]h ]h"]h$]h&]jjuh1hhhhKhj^hhubh)}(hYou can monitor how many total dma-pinned pages have been acquired and released since the system was booted, via two new /proc/vmstat entries: ::h]hYou can monitor how many total dma-pinned pages have been acquired and released since the system was booted, via two new /proc/vmstat entries:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhj^hhubj)}(hC/proc/vmstat/nr_foll_pin_acquired /proc/vmstat/nr_foll_pin_releasedh]hC/proc/vmstat/nr_foll_pin_acquired /proc/vmstat/nr_foll_pin_released}hjsbah}(h]h ]h"]h$]h&]hhuh1jhhhKhj^hhubh)}(hUnder normal conditions, these two values will be equal unless there are any long-term [R]DMA pins in place, or during pin/unpin transitions.h]hUnder normal conditions, these two values will be equal unless there are any long-term [R]DMA pins in place, or during pin/unpin transitions.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhj^hhubh)}(hhh](h)}(hXnr_foll_pin_acquired: This is the number of logical pins that have been acquired since the system was powered on. For huge pages, the head page is pinned once for each page (head page and each tail page) within the huge page. This follows the same sort of behavior that get_user_pages() uses for huge pages: the head page is refcounted once for each tail or head page in the huge page, when get_user_pages() is applied to a huge page. h]h)}(hXnr_foll_pin_acquired: This is the number of logical pins that have been acquired since the system was powered on. For huge pages, the head page is pinned once for each page (head page and each tail page) within the huge page. This follows the same sort of behavior that get_user_pages() uses for huge pages: the head page is refcounted once for each tail or head page in the huge page, when get_user_pages() is applied to a huge page.h]hXnr_foll_pin_acquired: This is the number of logical pins that have been acquired since the system was powered on. 
For huge pages, the head page is pinned once for each page (head page and each tail page) within the huge page. This follows the same sort of behavior that get_user_pages() uses for huge pages: the head page is refcounted once for each tail or head page in the huge page, when get_user_pages() is applied to a huge page.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjubah}(h]h ]h"]h$]h&]uh1hhjhhhhhNubh)}(hXnr_foll_pin_released: The number of logical pins that have been released since the system was powered on. Note that pages are released (unpinned) on a PAGE_SIZE granularity, even if the original pin was applied to a huge page. Becaused of the pin count behavior described above in "nr_foll_pin_acquired", the accounting balances out, so that after doing this:: pin_user_pages(huge_page); for (each page in huge_page) unpin_user_page(page); h](h)}(hXhnr_foll_pin_released: The number of logical pins that have been released since the system was powered on. Note that pages are released (unpinned) on a PAGE_SIZE granularity, even if the original pin was applied to a huge page. Becaused of the pin count behavior described above in "nr_foll_pin_acquired", the accounting balances out, so that after doing this::h]hXknr_foll_pin_released: The number of logical pins that have been released since the system was powered on. Note that pages are released (unpinned) on a PAGE_SIZE granularity, even if the original pin was applied to a huge page. 
Becaused of the pin count behavior described above in “nr_foll_pin_acquired”, the accounting balances out, so that after doing this:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjubj)}(hRpin_user_pages(huge_page); for (each page in huge_page) unpin_user_page(page);h]hRpin_user_pages(huge_page); for (each page in huge_page) unpin_user_page(page);}hj(sbah}(h]h ]h"]h$]h&]hhuh1jhhhMhjubeh}(h]h ]h"]h$]h&]uh1hhjhhhhhNubeh}(h]h ]h"]h$]h&]jjuh1hhhhKhj^hhubh)}(h...the following is expected::h]h...the following is expected:}(hjBhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj^hhubj)}(h,nr_foll_pin_released == nr_foll_pin_acquiredh]h,nr_foll_pin_released == nr_foll_pin_acquired}hjPsbah}(h]h ]h"]h$]h&]hhuh1jhhhM hj^hhubh)}(hU(...unless it was already out of balance due to a long-term RDMA pin being in place.)h]hU(...unless it was already out of balance due to a long-term RDMA pin being in place.)}(hj^hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM hj^hhubeh}(h]jah ]h"] unit testingah$]h&]uh1hhhhhhhhKubh)}(hhh](h)}(hOther diagnosticsh]hOther diagnostics}(hjvhhhNhNubah}(h]h ]h"]h$]h&]jjuh1hhjshhhhhMubh)}(hdump_page() has been enhanced slightly to handle these new counting fields, and to better report on large folios in general. Specifically, for large folios, the exact pincount is reported.h]hdump_page() has been enhanced slightly to handle these new counting fields, and to better report on large folios in general. 
Specifically, for large folios, the exact pincount is reported.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjshhubeh}(h]jah ]h"]other diagnosticsah$]h&]uh1hhhhhhhhMubh)}(hhh](h)}(h Referencesh]h References}(hjhhhNhNubah}(h]h ]h"]h$]h&]jjuh1hhjhhhhhMubh)}(hhh](h)}(hZ`Some slow progress on get_user_pages() (Apr 2, 2019) `_h]h)}(hjh](h)}(hjh]h4Some slow progress on get_user_pages() (Apr 2, 2019)}(hjhhhNhNubah}(h]h ]h"]h$]h&]name4Some slow progress on get_user_pages() (Apr 2, 2019)refuri https://lwn.net/Articles/784574/uh1hhjubj)}(h# h]h}(h]/some-slow-progress-on-get-user-pages-apr-2-2019ah ]h"]4some slow progress on get_user_pages() (apr 2, 2019)ah$]h&]refurijuh1j referencedKhjubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1hhjhhhhhNubh)}(hR`DMA and get_user_pages() (LPC: Dec 12, 2018) `_h]h)}(hjh](h)}(hjh]h,DMA and get_user_pages() (LPC: Dec 12, 2018)}(hjhhhNhNubah}(h]h ]h"]h$]h&]name,DMA and get_user_pages() (LPC: Dec 12, 2018)j https://lwn.net/Articles/774411/uh1hhjubj)}(h# h]h}(h]&dma-and-get-user-pages-lpc-dec-12-2018ah ]h"],dma and get_user_pages() (lpc: dec 12, 2018)ah$]h&]refurijuh1jjKhjubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1hhjhhhhhNubh)}(hV`The trouble with get_user_pages() (Apr 30, 2018) `_h]h)}(hj h](h)}(hj h]h0The trouble with get_user_pages() (Apr 30, 2018)}(hj hhhNhNubah}(h]h ]h"]h$]h&]name0The trouble with get_user_pages() (Apr 30, 2018)j https://lwn.net/Articles/753027/uh1hhj ubj)}(h# h]h}(h]+the-trouble-with-get-user-pages-apr-30-2018ah ]h"]0the trouble with get_user_pages() (apr 30, 2018)ah$]h&]refurij% uh1jjKhj ubeh}(h]h ]h"]h$]h&]uh1hhhhMhj ubah}(h]h ]h"]h$]h&]uh1hhjhhhhhNubh)}(hg`LWN kernel index: get_user_pages() `_ h]h)}(hf`LWN kernel index: get_user_pages() `_h](h)}(hjE h]h"LWN kernel index: get_user_pages()}(hjG hhhNhNubah}(h]h ]h"]h$]h&]name"LWN kernel index: get_user_pages()j>https://lwn.net/Kernel/Index/#Memory_management-get_user_pagesuh1hhjC ubj)}(hA h]h}(h]lwn-kernel-index-get-user-pagesah ]h"]"lwn kernel index: 
get_user_pages()ah$]h&]refurijV uh1jjKhjC ubeh}(h]h ]h"]h$]h&]uh1hhhhMhj? ubah}(h]h ]h"]h$]h&]uh1hhjhhhhhNubeh}(h]h ]h"]h$]h&]jjuh1hhhhMhjhhubh)}(hJohn Hubbard, October, 2019h]hJohn Hubbard, October, 2019}(hjv hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjhhubeh}(h]jah ]h"] referencesah$]h&]uh1hhhhhhhhMubeh}(h] pin-user-pages-and-related-callsah ]h"]"pin_user_pages() and related callsah$]h&]uh1hhhhhhhhKubeh}(h]h ]h"]h$]h&]sourcehuh1hcurrent_sourceN current_lineNsettingsdocutils.frontendValues)}(hN generatorN datestampN source_linkN source_urlN toc_backlinksentryfootnote_backlinksK sectnum_xformKstrip_commentsNstrip_elements_with_classesN strip_classesN report_levelK halt_levelKexit_status_levelKdebugNwarning_streamN tracebackinput_encoding utf-8-siginput_encoding_error_handlerstrictoutput_encodingutf-8output_encoding_error_handlerj error_encodingutf-8error_encoding_error_handlerbackslashreplace language_codeenrecord_dependenciesNconfigN id_prefixhauto_id_prefixid dump_settingsNdump_internalsNdump_transformsNdump_pseudo_xmlNexpose_internalsNstrict_visitorN_disable_configN_sourceh _destinationN _config_files]7/var/lib/git/docbuild/linux/Documentation/docutils.confafile_insertion_enabled raw_enabledKline_length_limitM'pep_referencesN pep_base_urlhttps://peps.python.org/pep_file_url_templatepep-%04drfc_referencesN rfc_base_url&https://datatracker.ietf.org/doc/html/ tab_widthKtrim_footnote_reference_spacesyntax_highlightlong smart_quotessmartquotes_locales]character_level_inline_markupdoctitle_xform docinfo_xformKsectsubtitle_xform image_loadinglinkembed_stylesheetcloak_email_addressessection_self_linkenvNubreporterNindirect_targets]substitution_defs}substitution_names}refnames}refids}j]jasnameids}(j j jjj$hjjj0j2jjTjjvjijjjjjjjj%jjjjjKj[jmjpjjjj jjjjjj/ j, j` j] u nametypes}(j jj$jj0jjjijjjj%jjj[jpjj jjj/ j` uh}(j hjhhjjj'j2jjTj3jvj jj*jjljjjjjjjj(jKjjmjjj^jjsjjjjjjj, j& j] jW hhj jj,j#jNjEjpjgjjjjjjjjjjjEj<jgj^jjjjjju footnote_refs} citation_refs} 
autofootnotes]autofootnote_refs]symbol_footnotes]symbol_footnote_refs] footnotes] citations]autofootnote_startKsymbol_footnote_startK id_counter collectionsCounter}j KsRparse_messages]transform_messages]hsystem_message)}(hhh]h)}(hhh]hDHyperlink target "mmu-notifier-registration-case" is not referenced.}hj sbah}(h]h ]h"]h$]h&]uh1hhj ubah}(h]h ]h"]h$]h&]levelKtypeINFOsourcehlineKuh1j uba transformerN include_log] decorationNhhub.