€•“wŒsphinx.addnodes”Œdocument”“”)”}”(Œ rawsource”Œ”Œchildren”]”(Œ translations”Œ LanguagesNode”“”)”}”(hhh]”(hŒ pending_xref”“”)”}”(hhh]”Œdocutils.nodes”ŒText”“”ŒChinese (Simplified)”…””}”Œparent”hsbaŒ attributes”}”(Œids”]”Œclasses”]”Œnames”]”Œdupnames”]”Œbackrefs”]”Œ refdomain”Œstd”Œreftype”Œdoc”Œ reftarget”Œ+/translations/zh_CN/networking/msg_zerocopy”Œmodname”NŒ classname”NŒ refexplicit”ˆuŒtagname”hhh ubh)”}”(hhh]”hŒChinese (Traditional)”…””}”hh2sbah}”(h]”h ]”h"]”h$]”h&]”Œ refdomain”h)Œreftype”h+Œ reftarget”Œ+/translations/zh_TW/networking/msg_zerocopy”Œmodname”NŒ classname”NŒ refexplicit”ˆuh1hhh ubh)”}”(hhh]”hŒItalian”…””}”hhFsbah}”(h]”h ]”h"]”h$]”h&]”Œ refdomain”h)Œreftype”h+Œ reftarget”Œ+/translations/it_IT/networking/msg_zerocopy”Œmodname”NŒ classname”NŒ refexplicit”ˆuh1hhh ubh)”}”(hhh]”hŒJapanese”…””}”hhZsbah}”(h]”h ]”h"]”h$]”h&]”Œ refdomain”h)Œreftype”h+Œ reftarget”Œ+/translations/ja_JP/networking/msg_zerocopy”Œmodname”NŒ classname”NŒ refexplicit”ˆuh1hhh ubh)”}”(hhh]”hŒKorean”…””}”hhnsbah}”(h]”h ]”h"]”h$]”h&]”Œ refdomain”h)Œreftype”h+Œ reftarget”Œ+/translations/ko_KR/networking/msg_zerocopy”Œmodname”NŒ classname”NŒ refexplicit”ˆuh1hhh ubh)”}”(hhh]”hŒSpanish”…””}”hh‚sbah}”(h]”h ]”h"]”h$]”h&]”Œ refdomain”h)Œreftype”h+Œ reftarget”Œ+/translations/sp_SP/networking/msg_zerocopy”Œmodname”NŒ classname”NŒ refexplicit”ˆuh1hhh ubeh}”(h]”h ]”h"]”h$]”h&]”Œcurrent_language”ŒEnglish”uh1h hhŒ _document”hŒsource”NŒline”NubhŒsection”“”)”}”(hhh]”(hŒtitle”“”)”}”(hŒ MSG_ZEROCOPY”h]”hŒ MSG_ZEROCOPY”…””}”(hh¨hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h¦hh£hžhhŸŒE/var/lib/git/docbuild/linux/Documentation/networking/msg_zerocopy.rst”h Kubh¢)”}”(hhh]”(h§)”}”(hŒIntro”h]”hŒIntro”…””}”(hhºhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h¦hh·hžhhŸh¶h KubhŒ paragraph”“”)”}”(hŒ The MSG_ZEROCOPY flag enables copy avoidance for socket send calls. The feature is currently implemented for TCP, UDP and VSOCK (with virtio transport) sockets.”h]”hŒ The MSG_ZEROCOPY flag enables copy avoidance for socket send calls. The feature is currently implemented for TCP, UDP and VSOCK (with virtio transport) sockets.”…””}”(hhÊhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÈhŸh¶h K hh·hžhubh¢)”}”(hhh]”(h§)”}”(hŒOpportunity and Caveats”h]”hŒOpportunity and Caveats”…””}”(hhÛhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h¦hhØhžhhŸh¶h KubhÉ)”}”(hŒþCopying large buffers between user process and kernel can be expensive. Linux supports various interfaces that eschew copying, such as sendfile and splice. The MSG_ZEROCOPY flag extends the underlying copy avoidance mechanism to common socket send calls.”h]”hŒþCopying large buffers between user process and kernel can be expensive. Linux supports various interfaces that eschew copying, such as sendfile and splice. The MSG_ZEROCOPY flag extends the underlying copy avoidance mechanism to common socket send calls.”…””}”(hhéhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÈhŸh¶h KhhØhžhubhÉ)”}”(hŒóCopy avoidance is not a free lunch. As implemented, with page pinning, it replaces per byte copy cost with page accounting and completion notification overhead. As a result, MSG_ZEROCOPY is generally only effective at writes over around 10 KB.”h]”hŒóCopy avoidance is not a free lunch. As implemented, with page pinning, it replaces per byte copy cost with page accounting and completion notification overhead. As a result, MSG_ZEROCOPY is generally only effective at writes over around 10 KB.”…””}”(hh÷hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÈhŸh¶h KhhØhžhubhÉ)”}”(hXePage pinning also changes system call semantics. It temporarily shares the buffer between process and network stack. Unlike with copying, the process cannot immediately overwrite the buffer after system call return without possibly modifying the data in flight. Kernel integrity is not affected, but a buggy program can possibly corrupt its own data stream.”h]”hXePage pinning also changes system call semantics. It temporarily shares the buffer between process and network stack. Unlike with copying, the process cannot immediately overwrite the buffer after system call return without possibly modifying the data in flight. Kernel integrity is not affected, but a buggy program can possibly corrupt its own data stream.”…””}”(hjhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÈhŸh¶h KhhØhžhubhÉ)”}”(hŒ­The kernel returns a notification when it is safe to modify data. Converting an existing application to MSG_ZEROCOPY is not always as trivial as just passing the flag, then.”h]”hŒ­The kernel returns a notification when it is safe to modify data. Converting an existing application to MSG_ZEROCOPY is not always as trivial as just passing the flag, then.”…””}”(hjhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÈhŸh¶h K"hhØhžhubeh}”(h]”Œopportunity-and-caveats”ah ]”h"]”Œopportunity and caveats”ah$]”h&]”uh1h¡hh·hžhhŸh¶h Kubh¢)”}”(hhh]”(h§)”}”(hŒ More Info”h]”hŒ More Info”…””}”(hj,hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h¦hj)hžhhŸh¶h K(ubhÉ)”}”(hŒÈMuch of this document was derived from a longer paper presented at netdev 2.1. For more in-depth information see that paper and talk, the excellent reporting over at LWN.net or read the original code.”h]”hŒÈMuch of this document was derived from a longer paper presented at netdev 2.1. For more in-depth information see that paper and talk, the excellent reporting over at LWN.net or read the original code.”…””}”(hj:hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÈhŸh¶h K*hj)hžhubhŒ block_quote”“”)”}”(hXpaper, slides, video https://netdevconf.org/2.1/session.html?debruijn LWN article https://lwn.net/Articles/726917/ patchset [PATCH net-next v4 0/9] socket sendmsg MSG_ZEROCOPY https://lore.kernel.org/netdev/20170803202945.70750-1-willemdebruijn.kernel@gmail.com ”h]”hŒdefinition_list”“”)”}”(hhh]”(hŒdefinition_list_item”“”)”}”(hŒFpaper, slides, video https://netdevconf.org/2.1/session.html?debruijn ”h]”(hŒterm”“”)”}”(hŒpaper, slides, video”h]”hŒpaper, slides, video”…””}”(hj[hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1jYhŸh¶h K/hjUubhŒ definition”“”)”}”(hhh]”hÉ)”}”(hŒ0https://netdevconf.org/2.1/session.html?debruijn”h]”hŒ reference”“”)”}”(hjph]”hŒ0https://netdevconf.org/2.1/session.html?debruijn”…””}”(hjthžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”Œrefuri”jpuh1jrhjnubah}”(h]”h ]”h"]”h$]”h&]”uh1hÈhŸh¶h K/hjkubah}”(h]”h ]”h"]”h$]”h&]”uh1jihjUubeh}”(h]”h ]”h"]”h$]”h&]”uh1jShŸh¶h K/hjPubjT)”}”(hŒ-LWN article https://lwn.net/Articles/726917/ ”h]”(jZ)”}”(hŒ LWN article”h]”hŒ LWN article”…””}”(hj˜hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1jYhŸh¶h K2hj”ubjj)”}”(hhh]”hÉ)”}”(hŒ https://lwn.net/Articles/726917/”h]”js)”}”(hj«h]”hŒ https://lwn.net/Articles/726917/”…””}”(hj­hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”Œrefuri”j«uh1jrhj©ubah}”(h]”h ]”h"]”h$]”h&]”uh1hÈhŸh¶h K2hj¦ubah}”(h]”h ]”h"]”h$]”h&]”uh1jihj”ubeh}”(h]”h ]”h"]”h$]”h&]”uh1jShŸh¶h K2hjPubjT)”}”(hŒ”patchset [PATCH net-next v4 0/9] socket sendmsg MSG_ZEROCOPY https://lore.kernel.org/netdev/20170803202945.70750-1-willemdebruijn.kernel@gmail.com ”h]”(jZ)”}”(hŒpatchset”h]”hŒpatchset”…””}”(hjÑhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1jYhŸh¶h K7hjÍubjj)”}”(hhh]”hÉ)”}”(hŒ‰[PATCH net-next v4 0/9] socket sendmsg MSG_ZEROCOPY https://lore.kernel.org/netdev/20170803202945.70750-1-willemdebruijn.kernel@gmail.com”h]”(hŒ4[PATCH net-next v4 0/9] socket sendmsg MSG_ZEROCOPY ”…””}”(hjâhžhhŸNh Nubjs)”}”(hŒUhttps://lore.kernel.org/netdev/20170803202945.70750-1-willemdebruijn.kernel@gmail.com”h]”hŒUhttps://lore.kernel.org/netdev/20170803202945.70750-1-willemdebruijn.kernel@gmail.com”…””}”(hjêhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”Œrefuri”jìuh1jrhjâubeh}”(h]”h ]”h"]”h$]”h&]”uh1hÈhŸh¶h K5hjßubah}”(h]”h ]”h"]”h$]”h&]”uh1jihjÍubeh}”(h]”h ]”h"]”h$]”h&]”uh1jShŸh¶h K7hjPubeh}”(h]”h ]”h"]”h$]”h&]”uh1jNhjJubah}”(h]”h ]”h"]”h$]”h&]”uh1jHhŸh¶h K.hj)hžhubeh}”(h]”Œ more-info”ah ]”h"]”Œ more info”ah$]”h&]”uh1h¡hh·hžhhŸh¶h K(ubeh}”(h]”Œintro”ah ]”h"]”Œintro”ah$]”h&]”uh1h¡hh£hžhhŸh¶h Kubh¢)”}”(hhh]”(h§)”}”(hŒ Interface”h]”hŒ Interface”…””}”(hj*hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h¦hj'hžhhŸh¶h K:ubhÉ)”}”(hŒfPassing the MSG_ZEROCOPY flag is the most obvious step to enable copy avoidance, but not the only one.”h]”hŒfPassing the MSG_ZEROCOPY flag is the most obvious step to enable copy avoidance, but not the only one.”…””}”(hj8hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÈhŸh¶h Kcmsg_level != SOL_IP && cm->cmsg_type != IP_RECVERR) error(1, 0, "cmsg"); serr = (void *) CMSG_DATA(cm); if (serr->ee_errno != 0 || serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) error(1, 0, "serr"); printf("completed: %u..%u\n", serr->ee_info, serr->ee_data);”h]”hXostruct sock_extended_err *serr; struct cmsghdr *cm; cm = CMSG_FIRSTHDR(msg); if (cm->cmsg_level != SOL_IP && cm->cmsg_type != IP_RECVERR) error(1, 0, "cmsg"); serr = (void *) CMSG_DATA(cm); if (serr->ee_errno != 0 || serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) error(1, 0, "serr"); printf("completed: %u..%u\n", serr->ee_info, serr->ee_data);”…””}”hjsbah}”(h]”h ]”h"]”h$]”h&]”jujvuh1jehŸh¶h KÀhjÆhžhubeh}”(h]”Œnotification-parsing”ah ]”h"]”Œnotification parsing”ah$]”h&]”uh1h¡hjéhžhhŸh¶h K«ubh¢)”}”(hhh]”(h§)”}”(hŒDeferred copies”h]”hŒDeferred copies”…””}”(hj(hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h¦hj%hžhhŸh¶h KÑubhÉ)”}”(hŒ¾Passing flag MSG_ZEROCOPY is a hint to the kernel to apply copy avoidance, and a contract that the kernel will queue a completion notification. It is not a guarantee that the copy is elided.”h]”hŒ¾Passing flag MSG_ZEROCOPY is a hint to the kernel to apply copy avoidance, and a contract that the kernel will queue a completion notification. It is not a guarantee that the copy is elided.”…””}”(hj6hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÈhŸh¶h KÓhj%hžhubhÉ)”}”(hXCopy avoidance is not always feasible. Devices that do not support scatter-gather I/O cannot send packets made up of kernel generated protocol headers plus zerocopy user data. A packet may need to be converted to a private copy of data deep in the stack, say to compute a checksum.”h]”hXCopy avoidance is not always feasible. Devices that do not support scatter-gather I/O cannot send packets made up of kernel generated protocol headers plus zerocopy user data. A packet may need to be converted to a private copy of data deep in the stack, say to compute a checksum.”…””}”(hjDhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÈhŸh¶h K×hj%hžhubhÉ)”}”(hXIn all these cases, the kernel returns a completion notification when it releases its hold on the shared pages. That notification may arrive before the (copied) data is fully transmitted. A zerocopy completion notification is not a transmit completion notification, therefore.”h]”hXIn all these cases, the kernel returns a completion notification when it releases its hold on the shared pages. That notification may arrive before the (copied) data is fully transmitted. A zerocopy completion notification is not a transmit completion notification, therefore.”…””}”(hjRhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÈhŸh¶h KÝhj%hžhubhÉ)”}”(hX»Deferred copies can be more expensive than a copy immediately in the system call, if the data is no longer warm in the cache. The process also incurs notification processing cost for no benefit. For this reason, the kernel signals if data was completed with a copy, by setting flag SO_EE_CODE_ZEROCOPY_COPIED in field ee_code on return. A process may use this signal to stop passing flag MSG_ZEROCOPY on subsequent requests on the same socket.”h]”hX»Deferred copies can be more expensive than a copy immediately in the system call, if the data is no longer warm in the cache. The process also incurs notification processing cost for no benefit. For this reason, the kernel signals if data was completed with a copy, by setting flag SO_EE_CODE_ZEROCOPY_COPIED in field ee_code on return. A process may use this signal to stop passing flag MSG_ZEROCOPY on subsequent requests on the same socket.”…””}”(hj`hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÈhŸh¶h Kâhj%hžhubeh}”(h]”Œdeferred-copies”ah ]”h"]”Œdeferred copies”ah$]”h&]”uh1h¡hjéhžhhŸh¶h KÑubeh}”(h]”Œ notifications”ah ]”h"]”Œ notifications”ah$]”h&]”uh1h¡hj'hžhhŸh¶h Keubeh}”(h]”Œ interface”ah ]”h"]”Œ interface”ah$]”h&]”uh1h¡hh£hžhhŸh¶h K:ubh¢)”}”(hhh]”(h§)”}”(hŒImplementation”h]”hŒImplementation”…””}”(hj‰hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h¦hj†hžhhŸh¶h Kìubh¢)”}”(hhh]”(h§)”}”(hŒLoopback”h]”hŒLoopback”…””}”(hjšhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h¦hj—hžhhŸh¶h KïubhÉ)”}”(hXhFor TCP and UDP: Data sent to local sockets can be queued indefinitely if the receive process does not read its socket. Unbound notification latency is not acceptable. For this reason all packets generated with MSG_ZEROCOPY that are looped to a local socket will incur a deferred copy. This includes looping onto packet sockets (e.g., tcpdump) and tun devices.”h]”hXhFor TCP and UDP: Data sent to local sockets can be queued indefinitely if the receive process does not read its socket. Unbound notification latency is not acceptable. For this reason all packets generated with MSG_ZEROCOPY that are looped to a local socket will incur a deferred copy. This includes looping onto packet sockets (e.g., tcpdump) and tun devices.”…””}”(hj¨hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÈhŸh¶h Kñhj—hžhubhÉ)”}”(hŒPFor VSOCK: Data path sent to local sockets is the same as for non-local sockets.”h]”hŒPFor VSOCK: Data path sent to local sockets is the same as for non-local sockets.”…””}”(hj¶hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÈhŸh¶h Køhj—hžhubeh}”(h]”Œloopback”ah ]”h"]”Œloopback”ah$]”h&]”uh1h¡hj†hžhhŸh¶h Kïubeh}”(h]”Œimplementation”ah ]”h"]”Œimplementation”ah$]”h&]”uh1h¡hh£hžhhŸh¶h Kìubh¢)”}”(hhh]”(h§)”}”(hŒTesting”h]”hŒTesting”…””}”(hj×hžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1h¦hjÔhžhhŸh¶h KüubhÉ)”}”(hŒoMore realistic example code can be found in the kernel source under tools/testing/selftests/net/msg_zerocopy.c.”h]”hŒoMore realistic example code can be found in the kernel source under tools/testing/selftests/net/msg_zerocopy.c.”…””}”(hjåhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÈhŸh¶h KþhjÔhžhubhÉ)”}”(hX{Be cognizant of the loopback constraint. The test can be run between a pair of hosts. But if run between a local pair of processes, for instance when run with msg_zerocopy.sh between a veth pair across namespaces, the test will not show any improvement. For testing, the loopback restriction can be temporarily relaxed by making skb_orphan_frags_rx identical to skb_orphan_frags.”h]”hX{Be cognizant of the loopback constraint. The test can be run between a pair of hosts. But if run between a local pair of processes, for instance when run with msg_zerocopy.sh between a veth pair across namespaces, the test will not show any improvement. For testing, the loopback restriction can be temporarily relaxed by making skb_orphan_frags_rx identical to skb_orphan_frags.”…””}”(hjóhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÈhŸh¶h MhjÔhžhubhÉ)”}”(hŒ[For VSOCK type of socket example can be found in tools/testing/vsock/vsock_test_zerocopy.c.”h]”hŒ[For VSOCK type of socket example can be found in tools/testing/vsock/vsock_test_zerocopy.c.”…””}”(hjhžhhŸNh Nubah}”(h]”h ]”h"]”h$]”h&]”uh1hÈhŸh¶h MhjÔhžhubeh}”(h]”Œtesting”ah ]”h"]”Œtesting”ah$]”h&]”uh1h¡hh£hžhhŸh¶h Küubeh}”(h]”Œ msg-zerocopy”ah ]”h"]”Œ msg_zerocopy”ah$]”h&]”uh1h¡hhhžhhŸh¶h Kubeh}”(h]”h ]”h"]”h$]”h&]”Œsource”h¶uh1hŒcurrent_source”NŒ current_line”NŒsettings”Œdocutils.frontend”ŒValues”“”)”}”(h¦NŒ generator”NŒ datestamp”NŒ source_link”NŒ source_url”NŒ toc_backlinks”Œentry”Œfootnote_backlinks”KŒ sectnum_xform”KŒstrip_comments”NŒstrip_elements_with_classes”NŒ strip_classes”NŒ report_level”KŒ halt_level”KŒexit_status_level”KŒdebug”NŒwarning_stream”NŒ traceback”ˆŒinput_encoding”Œ utf-8-sig”Œinput_encoding_error_handler”Œstrict”Œoutput_encoding”Œutf-8”Œoutput_encoding_error_handler”jBŒerror_encoding”Œutf-8”Œerror_encoding_error_handler”Œbackslashreplace”Œ language_code”Œen”Œrecord_dependencies”NŒconfig”NŒ id_prefix”hŒauto_id_prefix”Œid”Œ dump_settings”NŒdump_internals”NŒdump_transforms”NŒdump_pseudo_xml”NŒexpose_internals”NŒstrict_visitor”NŒ_disable_config”NŒ_source”h¶Œ _destination”NŒ _config_files”]”Œ7/var/lib/git/docbuild/linux/Documentation/docutils.conf”aŒfile_insertion_enabled”ˆŒ raw_enabled”KŒline_length_limit”M'Œpep_references”NŒ pep_base_url”Œhttps://peps.python.org/”Œpep_file_url_template”Œpep-%04d”Œrfc_references”NŒ rfc_base_url”Œ&https://datatracker.ietf.org/doc/html/”Œ tab_width”KŒtrim_footnote_reference_space”‰Œsyntax_highlight”Œlong”Œ smart_quotes”ˆŒsmartquotes_locales”]”Œcharacter_level_inline_markup”‰Œdoctitle_xform”‰Œ docinfo_xform”KŒsectsubtitle_xform”‰Œ image_loading”Œlink”Œembed_stylesheet”‰Œcloak_email_addresses”ˆŒsection_self_link”‰Œenv”NubŒreporter”NŒindirect_targets”]”Œsubstitution_defs”}”Œsubstitution_names”}”Œrefnames”}”Œrefids”}”Œnameids”}”(jjj$j!j&j#jjjƒj€j|jyjæjãjÞjÛj{jxjrjojÃjÀj"jjsjpjÑjÎjÉjÆjjuŒ nametypes”}”(j‰j$‰j&‰j‰jƒ‰j|‰jæ‰jÞ‰j{‰jr‰jÉj"‰js‰jщjɉj‰uh}”(jh£j!h·j#hØjj)j€j'jyjFjãjjÛjºjxjéjojjÀjujjÆjpj%jÎj†jÆj—jjÔuŒ footnote_refs”}”Œ citation_refs”}”Œ autofootnotes”]”Œautofootnote_refs”]”Œsymbol_footnotes”]”Œsymbol_footnote_refs”]”Œ footnotes”]”Œ citations”]”Œautofootnote_start”KŒsymbol_footnote_start”KŒ id_counter”Œ collections”ŒCounter”“”}”…”R”Œparse_messages”]”Œtransform_messages”]”Œ transformer”NŒ include_log”]”Œ decoration”Nhžhub.