.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)

.. _napi:

====
NAPI
====

NAPI is the event handling mechanism used by the Linux networking stack.
The name NAPI no longer stands for anything in particular [#]_.

In basic operation the device notifies the host about new events
via an interrupt.
The host then schedules a NAPI instance to process the events.
The device may also be polled for events via NAPI without receiving
interrupts first (:ref:`busy polling<poll>`).

NAPI processing usually happens in the software interrupt context,
but there is an option to use :ref:`separate kernel threads<threaded>`
for NAPI processing.

All in all, NAPI abstracts away from the drivers the context and configuration
of event (packet Rx and Tx) processing.

Driver API
==========

The two most important elements of NAPI are the struct napi_struct
and the associated poll method. struct napi_struct holds the state
of the NAPI instance while the method is the driver-specific event
handler. The method will typically free Tx packets that have been
transmitted and process newly received packets.

.. _drv_ctrl:

Control API
-----------

netif_napi_add() and netif_napi_del() add/remove a NAPI instance
from the system. The instances are attached to the netdevice passed
as argument (and will be deleted automatically when the netdevice is
unregistered). Instances are added in a disabled state.

napi_enable() and napi_disable() manage the disabled state.
A disabled NAPI can't be scheduled and its poll method is guaranteed
to not be invoked. napi_disable() waits for ownership of the NAPI
instance to be released.

The control APIs are not idempotent. Control API calls are safe against
concurrent use of datapath APIs but an incorrect sequence of control API
calls may result in crashes, deadlocks, or race conditions. For example,
calling napi_disable() multiple times in a row will deadlock.

Datapath API
------------

napi_schedule() is the basic method of scheduling a NAPI poll.
Drivers should call this function in their interrupt handler
(see :ref:`drv_sched` for more info). A successful call to napi_schedule()
will take ownership of the NAPI instance.

Later, after NAPI is scheduled, the driver's poll method will be
called to process the events/packets. The method takes a ``budget``
argument - drivers can process completions for any number of Tx
packets but should only process up to ``budget`` number of
Rx packets. Rx processing is usually much more expensive.

In other words, for Rx processing the ``budget`` argument limits how many
packets the driver can process in a single poll. Rx specific APIs like page
pool or XDP cannot be used at all when ``budget`` is 0.
skb Tx processing should happen regardless of the ``budget``, but if
the argument is 0 the driver cannot call any XDP (or page pool) APIs.

.. warning::

   The ``budget`` argument may be 0 if the core tries to only process
   skb Tx completions and no Rx or XDP packets.

The poll method returns the amount of work done. If the driver still
has outstanding work to do (e.g. ``budget`` was exhausted)
the poll method should return exactly ``budget``. In that case,
the NAPI instance will be serviced/polled again (without the
need to be scheduled).

If event processing has been completed (all outstanding packets
processed) the poll method should call napi_complete_done()
before returning. napi_complete_done() releases the ownership
of the instance.

.. warning::

   The case of finishing all events and using exactly ``budget``
   must be handled carefully. There is no way to report this
   (rare) condition to the stack, so the driver must either
   not call napi_complete_done() and wait to be called again,
   or return ``budget - 1``.

   If the ``budget`` is 0, napi_complete_done() should never be called.

Call sequence
-------------

Drivers should not make assumptions about the exact sequencing
of calls. The poll method may be called without the driver scheduling
the instance (unless the instance is disabled). Similarly,
it's not guaranteed that the poll method will be called, even
if napi_schedule() succeeded (e.g. if the instance gets disabled).

As mentioned in the :ref:`drv_ctrl` section, napi_disable() and subsequent
calls to the poll method only wait for the ownership of the instance
to be released, not for the poll method to exit. This means that
drivers should avoid accessing any data structures after calling
napi_complete_done().

.. _drv_sched:

Scheduling and IRQ masking
--------------------------

Drivers should keep the interrupts masked after scheduling
the NAPI instance - until NAPI polling finishes any further
interrupts are unnecessary.

Drivers which have to mask the interrupts explicitly (as opposed
to IRQ being auto-masked by the device) should use the napi_schedule_prep()
and __napi_schedule() calls:

.. code-block:: c

  if (napi_schedule_prep(&v->napi)) {
      mydrv_mask_rxtx_irq(v->idx);
      /* schedule after masking to avoid races */
      __napi_schedule(&v->napi);
  }

IRQ should only be unmasked after a successful call to napi_complete_done():

.. code-block:: c

  if (budget && napi_complete_done(&v->napi, work_done)) {
    mydrv_unmask_rxtx_irq(v->idx);
    return min(work_done, budget - 1);
  }

napi_schedule_irqoff() is a variant of napi_schedule() which takes advantage
of guarantees given by being invoked in IRQ context (no need to
mask interrupts). napi_schedule_irqoff() will fall back to napi_schedule() if
IRQs are threaded (such as if ``PREEMPT_RT`` is enabled).

Instance to queue mapping
-------------------------

Modern devices have multiple NAPI instances (struct napi_struct) per
interface. There is no strong requirement on how the instances are
mapped to queues and interrupts. NAPI is primarily a polling/processing
abstraction without specific user-facing semantics. That said, most networking
devices end up using NAPI in fairly similar ways.

NAPI instances most often correspond 1:1:1 to interrupts and queue pairs
(a queue pair is a set of a single Rx and a single Tx queue).

In less common cases a NAPI instance may be used for multiple queues
or Rx and Tx queues can be serviced by separate NAPI instances on a single
core. Regardless of the queue assignment, however, there is usually still
a 1:1 mapping between NAPI instances and interrupts.

It's worth noting that the ethtool API uses a "channel" terminology where
each channel can be either ``rx``, ``tx`` or ``combined``. It's not clear
what constitutes a channel; the recommended interpretation is to understand
a channel as an IRQ/NAPI which services queues of a given type. For example,
a configuration of 1 ``rx``, 1 ``tx`` and 1 ``combined`` channel is expected
to utilize 3 interrupts, 2 Rx and 2 Tx queues.

Persistent NAPI config
----------------------

Drivers often allocate and free NAPI instances dynamically. This leads to loss
of NAPI-related user configuration each time NAPI instances are reallocated.
The netif_napi_add_config() API prevents this loss of configuration by
associating each NAPI instance with a persistent NAPI configuration based on
a driver-defined index value, like a queue number.

Using this API allows for persistent NAPI IDs (among other settings), which can
be beneficial to userspace programs using ``SO_INCOMING_NAPI_ID``. See the
sections below for other NAPI configuration settings.

Drivers should try to use netif_napi_add_config() whenever possible.

User API
========

User interactions with NAPI depend on the NAPI instance ID. The instance IDs
are only visible to the user through the ``SO_INCOMING_NAPI_ID`` socket option.

Users can query NAPI IDs for a device or device queue using netlink. This can
be done programmatically in a user application or by using a script included in
the kernel source tree: ``tools/net/ynl/pyynl/cli.py``.

For example, using the script to dump all of the queues for a device (which
will reveal each queue's NAPI ID):

.. code-block:: bash

   $ kernel-source/tools/net/ynl/pyynl/cli.py \
             --spec Documentation/netlink/specs/netdev.yaml \
             --dump queue-get \
             --json='{"ifindex": 2}'

See ``Documentation/netlink/specs/netdev.yaml`` for more details on
available operations and attributes.

Software IRQ coalescing
-----------------------

NAPI does not perform any explicit event coalescing by default.
In most scenarios batching happens due to IRQ coalescing which is done
by the device. There are cases where software coalescing is helpful.

NAPI can be configured to arm a repoll timer instead of unmasking
the hardware interrupts as soon as all packets are processed.
The ``gro_flush_timeout`` sysfs configuration of the netdevice
is reused to control the delay of the timer, while
``napi_defer_hard_irqs`` controls the number of consecutive empty polls
before NAPI gives up and goes back to using hardware IRQs.

The above parameters can also be set on a per-NAPI basis using netlink via
netdev-genl. When used with netlink and configured on a per-NAPI basis, the
parameters mentioned above use hyphens instead of underscores:
``gro-flush-timeout`` and ``napi-defer-hard-irqs``.

Per-NAPI configuration can be done programmatically in a user application
or by using a script included in the kernel source tree:
``tools/net/ynl/pyynl/cli.py``.

For example, using the script:

.. code-block:: bash

  $ kernel-source/tools/net/ynl/pyynl/cli.py \
            --spec Documentation/netlink/specs/netdev.yaml \
            --do napi-set \
            --json='{"id": 345,
                     "defer-hard-irqs": 111,
                     "gro-flush-timeout": 11111}'

Similarly, the parameter ``irq-suspend-timeout`` can be set using netlink
via netdev-genl. There is no global sysfs parameter for this value.

``irq-suspend-timeout`` is used to determine how long an application can
completely suspend IRQs. It is used in combination with SO_PREFER_BUSY_POLL,
which can be set on a per-epoll context basis with the ``EPIOCSPARAMS`` ioctl.

.. _poll:

Busy polling
------------

Busy polling allows a user process to check for incoming packets before
the device interrupt fires. As is the case with any busy polling it trades
off CPU cycles for lower latency (production uses of NAPI busy polling
are not well known).

Busy polling is enabled by either setting ``SO_BUSY_POLL`` on
selected sockets or using the global ``net.core.busy_poll`` and
``net.core.busy_read`` sysctls. An io_uring API for NAPI busy polling
also exists.

epoll-based busy polling
------------------------

It is possible to trigger packet processing directly from calls to
``epoll_wait``. In order to use this feature, a user application must ensure
all file descriptors which are added to an epoll context have the same NAPI ID.

If the application uses a dedicated acceptor thread, the application can obtain
the NAPI ID of the incoming connection using SO_INCOMING_NAPI_ID and then
distribute that file descriptor to a worker thread. The worker thread would add
the file descriptor to its epoll context. This would ensure each worker thread
has an epoll context with FDs that have the same NAPI ID.

Alternatively, if the application uses SO_REUSEPORT, a BPF or eBPF program can
be inserted to distribute incoming connections to threads such that each thread
is only given incoming connections with the same NAPI ID. Care must be taken to
handle cases where a system may have multiple NICs.

In order to enable busy polling, there are two choices:

1. ``/proc/sys/net/core/busy_poll`` can be set with a time in microseconds to
   busy loop waiting for events. This is a system-wide setting and will cause
   all epoll-based applications to busy poll when they call epoll_wait. This
   may not be desirable as many applications may not have the need to busy
   poll.

2. Applications using recent kernels can issue an ioctl on the epoll context
   file descriptor to set (``EPIOCSPARAMS``) or get (``EPIOCGPARAMS``)
   ``struct epoll_params``, which user programs can define as follows:

.. code-block:: c

  struct epoll_params {
      uint32_t busy_poll_usecs;
      uint16_t busy_poll_budget;
      uint8_t prefer_busy_poll;

      /* pad the struct to a multiple of 64bits */
      uint8_t __pad;
  };

IRQ mitigation
--------------

While busy polling is supposed to be used by low latency applications,
a similar mechanism can be used for IRQ mitigation.

Very high request-per-second applications (especially routing/forwarding
applications and especially applications using AF_XDP sockets) may not
want to be interrupted until they finish processing a request or a batch
of packets.

Such applications can pledge to the kernel that they will perform a busy
polling operation periodically, and the driver should keep the device IRQs
permanently masked. This mode is enabled by using the ``SO_PREFER_BUSY_POLL``
socket option. To avoid system misbehavior the pledge is revoked
if ``gro_flush_timeout`` passes without any busy poll call. For epoll-based
busy polling applications, the ``prefer_busy_poll`` field of ``struct
epoll_params`` can be set to 1 and the ``EPIOCSPARAMS`` ioctl can be issued to
enable this mode. See the above section for more details.
For epoll-based busy polling applications, the }(hjhhhNhNubj5)}(h``prefer_busy_poll``h]hprefer_busy_poll}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j4hjubh field of }(hjhhhNhNubj5)}(h``struct epoll_params``h]hstruct epoll_params}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j4hjubh can be set to 1 and the }(hjhhhNhNubj5)}(h``EPIOCSPARAMS``h]h EPIOCSPARAMS}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hjubhQ ioctl can be issued to enable this mode. See the above section for more details.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM?hjhhubh)}(hXThe NAPI budget for busy polling is lower than the default (which makes sense given the low latency intention of normal busy polling). This is not the case with IRQ mitigation, however, so the budget can be adjusted with the ``SO_BUSY_POLL_BUDGET`` socket option. For epoll-based busy polling applications, the ``busy_poll_budget`` field can be adjusted to the desired value in ``struct epoll_params`` and set on a specific epoll context using the ``EPIOCSPARAMS`` ioctl. See the above section for more details.h](hThe NAPI budget for busy polling is lower than the default (which makes sense given the low latency intention of normal busy polling). This is not the case with IRQ mitigation, however, so the budget can be adjusted with the }(hj( hhhNhNubj5)}(h``SO_BUSY_POLL_BUDGET``h]hSO_BUSY_POLL_BUDGET}(hj0 hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj( ubh? socket option. For epoll-based busy polling applications, the }(hj( hhhNhNubj5)}(h``busy_poll_budget``h]hbusy_poll_budget}(hjB hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj( ubh/ field can be adjusted to the desired value in }(hj( hhhNhNubj5)}(h``struct epoll_params``h]hstruct epoll_params}(hjT hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj( ubh/ and set on a specific epoll context using the }(hj( hhhNhNubj5)}(h``EPIOCSPARAMS``h]h EPIOCSPARAMS}(hjf hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj( ubh/ ioctl. 
See the above section for more details.}(hj( hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMHhjhhubh)}(hX?It is important to note that choosing a large value for ``gro_flush_timeout`` will defer IRQs to allow for better batch processing, but will induce latency when the system is not fully loaded. Choosing a small value for ``gro_flush_timeout`` can cause interference of the user application which is attempting to busy poll by device IRQs and softirq processing. This value should be chosen carefully with these tradeoffs in mind. epoll-based busy polling applications may be able to mitigate how much user processing happens by choosing an appropriate value for ``maxevents``.h](h8It is important to note that choosing a large value for }(hj~ hhhNhNubj5)}(h``gro_flush_timeout``h]hgro_flush_timeout}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj~ ubh will defer IRQs to allow for better batch processing, but will induce latency when the system is not fully loaded. Choosing a small value for }(hj~ hhhNhNubj5)}(h``gro_flush_timeout``h]hgro_flush_timeout}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj~ ubhX@ can cause interference of the user application which is attempting to busy poll by device IRQs and softirq processing. This value should be chosen carefully with these tradeoffs in mind. 
epoll-based busy polling applications may be able to mitigate how much user processing happens by choosing an appropriate value for }(hj~ hhhNhNubj5)}(h ``maxevents``h]h maxevents}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj~ ubh.}(hj~ hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMPhjhhubh)}(hdUsers may want to consider an alternate approach, IRQ suspension, to help deal with these tradeoffs.h]hdUsers may want to consider an alternate approach, IRQ suspension, to help deal with these tradeoffs.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMYhjhhubeh}(h]irq-mitigationah ]h"]irq mitigationah$]h&]uh1hhj|hhhhhM5ubh)}(hhh](h)}(hIRQ suspensionh]hIRQ suspension}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hhhhhM]ubh)}(hiIRQ suspension is a mechanism wherein device IRQs are masked while epoll triggers NAPI packet processing.h]hiIRQ suspension is a mechanism wherein device IRQs are masked while epoll triggers NAPI packet processing.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM_hj hhubh)}(hXCWhile application calls to epoll_wait successfully retrieve events, the kernel will defer the IRQ suspension timer. If the kernel does not retrieve any events while busy polling (for example, because network traffic levels subsided), IRQ suspension is disabled and the IRQ mitigation strategies described above are engaged.h]hXCWhile application calls to epoll_wait successfully retrieve events, the kernel will defer the IRQ suspension timer. If the kernel does not retrieve any events while busy polling (for example, because network traffic levels subsided), IRQ suspension is disabled and the IRQ mitigation strategies described above are engaged.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMbhj hhubh)}(hPThis allows users to balance CPU consumption with network processing efficiency.h]hPThis allows users to balance CPU consumption with network processing efficiency.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhhj hhubh)}(hTo use this mechanism:h]hTo use this mechanism:}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMkhj hhubh block_quote)}(hX1. 
The per-NAPI config parameter ``irq-suspend-timeout`` should be set to the maximum time (in nanoseconds) the application can have its IRQs suspended. This is done using netlink, as described above. This timeout serves as a safety mechanism to restart IRQ driver interrupt processing if the application has stalled. This value should be chosen so that it covers the amount of time the user application needs to process data from its call to epoll_wait, noting that applications can control how much data they retrieve by setting ``max_events`` when calling epoll_wait. 2. The sysfs parameter or per-NAPI config parameters ``gro_flush_timeout`` and ``napi_defer_hard_irqs`` can be set to low values. They will be used to defer IRQs after busy poll has found no data. 3. The ``prefer_busy_poll`` flag must be set to true. This can be done using the ``EPIOCSPARAMS`` ioctl as described above. 4. The application uses epoll as described above to trigger NAPI packet processing. h]j)}(hhh](j)}(hX8The per-NAPI config parameter ``irq-suspend-timeout`` should be set to the maximum time (in nanoseconds) the application can have its IRQs suspended. This is done using netlink, as described above. This timeout serves as a safety mechanism to restart IRQ driver interrupt processing if the application has stalled. This value should be chosen so that it covers the amount of time the user application needs to process data from its call to epoll_wait, noting that applications can control how much data they retrieve by setting ``max_events`` when calling epoll_wait. h]h)}(hX7The per-NAPI config parameter ``irq-suspend-timeout`` should be set to the maximum time (in nanoseconds) the application can have its IRQs suspended. This is done using netlink, as described above. This timeout serves as a safety mechanism to restart IRQ driver interrupt processing if the application has stalled. 
This value should be chosen so that it covers the amount of time the user application needs to process data from its call to epoll_wait, noting that applications can control how much data they retrieve by setting ``max_events`` when calling epoll_wait.h](hThe per-NAPI config parameter }(hj. hhhNhNubj5)}(h``irq-suspend-timeout``h]hirq-suspend-timeout}(hj6 hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj. ubhX should be set to the maximum time (in nanoseconds) the application can have its IRQs suspended. This is done using netlink, as described above. This timeout serves as a safety mechanism to restart IRQ driver interrupt processing if the application has stalled. This value should be chosen so that it covers the amount of time the user application needs to process data from its call to epoll_wait, noting that applications can control how much data they retrieve by setting }(hj. hhhNhNubj5)}(h``max_events``h]h max_events}(hjH hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj. ubh when calling epoll_wait.}(hj. hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMmhj* ubah}(h]h ]h"]h$]h&]uh1jhj' ubj)}(hThe sysfs parameter or per-NAPI config parameters ``gro_flush_timeout`` and ``napi_defer_hard_irqs`` can be set to low values. They will be used to defer IRQs after busy poll has found no data. h]h)}(hThe sysfs parameter or per-NAPI config parameters ``gro_flush_timeout`` and ``napi_defer_hard_irqs`` can be set to low values. They will be used to defer IRQs after busy poll has found no data.h](h2The sysfs parameter or per-NAPI config parameters }(hjj hhhNhNubj5)}(h``gro_flush_timeout``h]hgro_flush_timeout}(hjr hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hjj ubh and }(hjj hhhNhNubj5)}(h``napi_defer_hard_irqs``h]hnapi_defer_hard_irqs}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hjj ubh] can be set to low values. They will be used to defer IRQs after busy poll has found no data.}(hjj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMvhjf ubah}(h]h ]h"]h$]h&]uh1jhj' ubj)}(hyThe ``prefer_busy_poll`` flag must be set to true. 
This can be done using the ``EPIOCSPARAMS`` ioctl as described above. h]h)}(hxThe ``prefer_busy_poll`` flag must be set to true. This can be done using the ``EPIOCSPARAMS`` ioctl as described above.h](hThe }(hj hhhNhNubj5)}(h``prefer_busy_poll``h]hprefer_busy_poll}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj ubh6 flag must be set to true. This can be done using the }(hj hhhNhNubj5)}(h``EPIOCSPARAMS``h]h EPIOCSPARAMS}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj ubh ioctl as described above.}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMzhj ubah}(h]h ]h"]h$]h&]uh1jhj' ubj)}(hQThe application uses epoll as described above to trigger NAPI packet processing. h]h)}(hPThe application uses epoll as described above to trigger NAPI packet processing.h]hPThe application uses epoll as described above to trigger NAPI packet processing.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM}hj ubah}(h]h ]h"]h$]h&]uh1jhj' ubeh}(h]h ]h"]h$]h&]jwjxjyhjzj{uh1jhj# ubah}(h]h ]h"]h$]h&]uh1j! hhhMmhj hhubh)}(hAs mentioned above, as long as subsequent calls to epoll_wait return events to userland, the ``irq-suspend-timeout`` is deferred and IRQs are disabled. This allows the application to process data without interference.h](h]As mentioned above, as long as subsequent calls to epoll_wait return events to userland, the }(hj hhhNhNubj5)}(h``irq-suspend-timeout``h]hirq-suspend-timeout}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj ubhe is deferred and IRQs are disabled. 
This allows the application to process data without interference.}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhj hhubh)}(hOnce a call to epoll_wait results in no events being found, IRQ suspension is automatically disabled and the ``gro_flush_timeout`` and ``napi_defer_hard_irqs`` mitigation mechanisms take over.h](hmOnce a call to epoll_wait results in no events being found, IRQ suspension is automatically disabled and the }(hj" hhhNhNubj5)}(h``gro_flush_timeout``h]hgro_flush_timeout}(hj* hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj" ubh and }(hj" hhhNhNubj5)}(h``napi_defer_hard_irqs``h]hnapi_defer_hard_irqs}(hj< hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj" ubh! mitigation mechanisms take over.}(hj" hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhj hhubh)}(hIt is expected that ``irq-suspend-timeout`` will be set to a value much larger than ``gro_flush_timeout`` as ``irq-suspend-timeout`` should suspend IRQs for the duration of one userland processing cycle.h](hIt is expected that }(hjT hhhNhNubj5)}(h``irq-suspend-timeout``h]hirq-suspend-timeout}(hj\ hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hjT ubh) will be set to a value much larger than }(hjT hhhNhNubj5)}(h``gro_flush_timeout``h]hgro_flush_timeout}(hjn hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hjT ubh as }(hjT hhhNhNubj5)}(h``irq-suspend-timeout``h]hirq-suspend-timeout}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hjT ubhG should suspend IRQs for the duration of one userland processing cycle.}(hjT hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhj hhubh)}(hWhile it is not strictly necessary to use ``napi_defer_hard_irqs`` and ``gro_flush_timeout`` to use IRQ suspension, their use is strongly recommended.h](h*While it is not strictly necessary to use }(hj hhhNhNubj5)}(h``napi_defer_hard_irqs``h]hnapi_defer_hard_irqs}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj ubh and }(hj hhhNhNubj5)}(h``gro_flush_timeout``h]hgro_flush_timeout}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj ubh: to use IRQ suspension, their use is strongly recommended.}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhj hhubh)}(hXMIRQ suspension causes the 
system to alternate between polling mode and irq-driven packet delivery. During busy periods, ``irq-suspend-timeout`` overrides ``gro_flush_timeout`` and keeps the system busy polling, but when epoll finds no events, the setting of ``gro_flush_timeout`` and ``napi_defer_hard_irqs`` determine the next step.h](hxIRQ suspension causes the system to alternate between polling mode and irq-driven packet delivery. During busy periods, }(hj hhhNhNubj5)}(h``irq-suspend-timeout``h]hirq-suspend-timeout}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj ubh overrides }(hj hhhNhNubj5)}(h``gro_flush_timeout``h]hgro_flush_timeout}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj ubhS and keeps the system busy polling, but when epoll finds no events, the setting of }(hj hhhNhNubj5)}(h``gro_flush_timeout``h]hgro_flush_timeout}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj ubh and }(hj hhhNhNubj5)}(h``napi_defer_hard_irqs``h]hnapi_defer_hard_irqs}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj ubh determine the next step.}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhj hhubh)}(hVThere are essentially three possible loops for network processing and packet delivery:h]hVThere are essentially three possible loops for network processing and packet delivery:}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj hhubj)}(hhh](j)}(h9hardirq -> softirq -> napi poll; basic interrupt deliveryh]h)}(hj3 h]h9hardirq -> softirq -> napi poll; basic interrupt delivery}(hj5 hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj1 ubah}(h]h ]h"]h$]h&]uh1jhj. hhhhhNubj)}(h6timer -> softirq -> napi poll; deferred irq processingh]h)}(hjJ h]h6timer -> softirq -> napi poll; deferred irq processing}(hjL hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjH ubah}(h]h ]h"]h$]h&]uh1jhj. hhhhhNubj)}(h.epoll -> busy-poll -> napi poll; busy looping h]h)}(h-epoll -> busy-poll -> napi poll; busy loopingh]h-epoll -> busy-poll -> napi poll; busy looping}(hjc hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj_ ubah}(h]h ]h"]h$]h&]uh1jhj. 
hhhhhNubeh}(h]h ]h"]h$]h&]jwjxjyhjz)uh1jhj hhhhhMubh)}(hcLoop 2 can take control from Loop 1, if ``gro_flush_timeout`` and ``napi_defer_hard_irqs`` are set.h](h(Loop 2 can take control from Loop 1, if }(hj~ hhhNhNubj5)}(h``gro_flush_timeout``h]hgro_flush_timeout}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj~ ubh and }(hj~ hhhNhNubj5)}(h``napi_defer_hard_irqs``h]hnapi_defer_hard_irqs}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj~ ubh are set.}(hj~ hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhj hhubh)}(hsIf ``gro_flush_timeout`` and ``napi_defer_hard_irqs`` are set, Loops 2 and 3 "wrestle" with each other for control.h](hIf }(hj hhhNhNubj5)}(h``gro_flush_timeout``h]hgro_flush_timeout}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj ubh and }(hj hhhNhNubj5)}(h``napi_defer_hard_irqs``h]hnapi_defer_hard_irqs}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj ubhB are set, Loops 2 and 3 “wrestle” with each other for control.}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhj hhubh)}(hDuring busy periods, ``irq-suspend-timeout`` is used as timer in Loop 2, which essentially tilts network processing in favour of Loop 3.h](hDuring busy periods, }(hj hhhNhNubj5)}(h``irq-suspend-timeout``h]hirq-suspend-timeout}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj ubh\ is used as timer in Loop 2, which essentially tilts network processing in favour of Loop 3.}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhj hhubh)}(hjIf ``gro_flush_timeout`` and ``napi_defer_hard_irqs`` are not set, Loop 3 cannot take control from Loop 1.h](hIf }(hj hhhNhNubj5)}(h``gro_flush_timeout``h]hgro_flush_timeout}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj ubh and }(hj hhhNhNubj5)}(h``napi_defer_hard_irqs``h]hnapi_defer_hard_irqs}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj ubh5 are not set, Loop 3 cannot take control from Loop 1.}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhj hhubh)}(hTherefore, setting ``gro_flush_timeout`` and ``napi_defer_hard_irqs`` is the recommended usage, because otherwise setting ``irq-suspend-timeout`` might not have any discernible effect.h](hTherefore, setting }(hj4 
hhhNhNubj5)}(h``gro_flush_timeout``h]hgro_flush_timeout}(hj< hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj4 ubh and }(hj4 hhhNhNubj5)}(h``napi_defer_hard_irqs``h]hnapi_defer_hard_irqs}(hjN hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj4 ubh5 is the recommended usage, because otherwise setting }(hj4 hhhNhNubj5)}(h``irq-suspend-timeout``h]hirq-suspend-timeout}(hj` hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj4 ubh' might not have any discernible effect.}(hj4 hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhj hhubh)}(h .. _threaded:h]h}(h]h ]h"]h$]h&]hthreadeduh1hhMhj hhhhubeh}(h]irq-suspensionah ]h"]irq suspensionah$]h&]uh1hhj|hhhhhM]ubh)}(hhh](h)}(h Threaded NAPIh]h Threaded NAPI}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hhhhhMubh)}(hX+Threaded NAPI is an operating mode that uses dedicated kernel threads rather than software IRQ context for NAPI processing. The configuration is per netdevice and will affect all NAPI instances of that device. Each NAPI instance will spawn a separate thread (called ``napi/${ifc-name}-${napi-id}``).h](hX Threaded NAPI is an operating mode that uses dedicated kernel threads rather than software IRQ context for NAPI processing. The configuration is per netdevice and will affect all NAPI instances of that device. Each NAPI instance will spawn a separate thread (called }(hj hhhNhNubj5)}(h``napi/${ifc-name}-${napi-id}``h]hnapi/${ifc-name}-${napi-id}}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj ubh).}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhj hhubh)}(hX?It is recommended to pin each kernel thread to a single CPU, the same CPU as the CPU which services the interrupt. Note that the mapping between IRQs and NAPI instances may not be trivial (and is driver dependent). The NAPI instance IDs will be assigned in the opposite order than the process IDs of the kernel threads.2h]hX?It is recommended to pin each kernel thread to a single CPU, the same CPU as the CPU which services the interrupt. Note that the mapping between IRQs and NAPI instances may not be trivial (and is driver dependent). 
The NAPI instance IDs will be assigned in the opposite order than the process IDs of the kernel threads.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj hhubh)}(h`Threaded NAPI is controlled by writing 0/1 to the ``threaded`` file in netdev's sysfs directory.h](h2Threaded NAPI is controlled by writing 0/1 to the }(hj hhhNhNubj5)}(h ``threaded``h]hthreaded}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j4hj ubh$ file in netdev’s sysfs directory.}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhj hhubhrubric)}(h Footnotesh]h Footnotes}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j hj hhhhhMubhfootnote)}(h8NAPI was originally referred to as New API in 2.4 Linux.h](hlabel)}(hhh]h1}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhj hhhNhNubh)}(hj h]h8NAPI was originally referred to as New API in 2.4 Linux.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj ubeh}(h]hah ]h"]1ah$]h&]hahKhhuh1j hhhMhj hhubeh}(h]( threaded-napij eh ]h"]( threaded napithreadedeh$]h&]uh1hhj|hhhhhMj}j)jx sj}j jx subeh}(h]user-apiah ]h"]user apiah$]h&]uh1hhhhhhhhKubeh}(h](hid1eh ]h"]napiah$]napiah&]uh1hhhhhhhhK referencedKj}j;hsj}hhsubeh}(h]h ]h"]h$]h&]sourcehuh1hcurrent_sourceN current_lineNsettingsdocutils.frontendValues)}(hN generatorN datestampN source_linkN source_urlN toc_backlinksentryfootnote_backlinksK sectnum_xformKstrip_commentsNstrip_elements_with_classesN strip_classesN report_levelK halt_levelKexit_status_levelKdebugNwarning_streamN tracebackinput_encoding utf-8-siginput_encoding_error_handlerstrictoutput_encodingutf-8output_encoding_error_handlerjeerror_encodingutf-8error_encoding_error_handlerbackslashreplace language_codeenrecord_dependenciesNconfigN id_prefixhauto_id_prefixid dump_settingsNdump_internalsNdump_transformsNdump_pseudo_xmlNexpose_internalsNstrict_visitorN_disable_configN_sourceh _destinationN _config_files]7/var/lib/git/docbuild/linux/Documentation/docutils.confafile_insertion_enabled raw_enabledKline_length_limitM'pep_referencesN pep_base_urlhttps://peps.python.org/pep_file_url_templatepep-%04drfc_referencesN 
rfc_base_url&https://datatracker.ietf.org/doc/html/ tab_widthKtrim_footnote_reference_spacesyntax_highlightlong smart_quotessmartquotes_locales]character_level_inline_markupdoctitle_xform docinfo_xformKsectsubtitle_xform image_loadinglinkembed_stylesheetcloak_email_addressessection_self_linkenvNubreporterNindirect_targets]substitution_defs}substitution_names}refnames}refids}(h]haj]jaj]jaj$]jaj ]jx ah]haunameids}(j;hjyjvjjjjjqjnjjj]jj\jYjjjqjnj3j0j*j'jj$jjjjj j j j j)j j(j%j hu nametypes}(j;jyjjjqjj]j\jjqj3j*jjjj j j)j(j uh}(hhj8hhhjvjwjjjjjnjjjtjjjYjjjbjnjj0j|j'j j$j-jj-jjj jj j j j j%j hj u footnote_refs} citation_refs} autofootnotes]j aautofootnote_refs]hasymbol_footnotes]symbol_footnote_refs] footnotes] citations]autofootnote_startKsymbol_footnote_startK id_counter collectionsCounter}jsKsRparse_messages]hsystem_message)}(hhh]h)}(h'Duplicate implicit target name: "napi".h]h+Duplicate implicit target name: “napi”.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjubah}(h]h ]h"]h$]h&]j8alevelKtypeINFOsourcehlineKuh1jhhhhhhhKubatransform_messages](j)}(hhh]h)}(hhh]h*Hyperlink target "napi" is not referenced.}hjsbah}(h]h ]h"]h$]h&]uh1hhjubah}(h]h ]h"]h$]h&]levelKtypejsourcehlineKuh1jubj)}(hhh]h)}(hhh]h.Hyperlink target "drv-ctrl" is not referenced.}hj sbah}(h]h ]h"]h$]h&]uh1hhjubah}(h]h ]h"]h$]h&]levelKtypejsourcehlineK"uh1jubj)}(hhh]h)}(hhh]h/Hyperlink target "drv-sched" is not referenced.}hj$sbah}(h]h ]h"]h$]h&]uh1hhj!ubah}(h]h ]h"]h$]h&]levelKtypejsourcehlineKsuh1jubj)}(hhh]h)}(hhh]h*Hyperlink target "poll" is not referenced.}hj>sbah}(h]h ]h"]h$]h&]uh1hhj;ubah}(h]h ]h"]h$]h&]levelKtypejsourcehlineKuh1jubj)}(hhh]h)}(hhh]h.Hyperlink target "threaded" is not referenced.}hjXsbah}(h]h ]h"]h$]h&]uh1hhjUubah}(h]h ]h"]h$]h&]levelKtypejsourcehlineMuh1jube transformerN include_log] decorationNhhub.