.. SPDX-License-Identifier: GPL-2.0

===========================================
Userspace block device driver (ublk driver)
===========================================

Overview
========

ublk is a generic framework for implementing block device logic from
userspace. The motivation behind it is that moving virtual block drivers
into userspace, such as loop, nbd and similar, can be very helpful.
It can help to implement new virtual block devices such as ublk-qcow2
(there have been several attempts at implementing a qcow2 driver in the
kernel).

Userspace block devices are attractive because:

- They can be written in many programming languages.
- They can be installed and updated independently of the kernel.
- They can be used to simulate block devices easily with user-specified
  parameters/settings for test/debug purposes.

The ublk block device (``/dev/ublkb*``) is added by the ublk driver. Any
IO request on the device will be forwarded to the ublk userspace
program. For convenience, in this document, ``ublk server`` refers to a
generic ublk userspace program. ``ublksrv`` [#userspace]_ is one such
implementation.
It provides the ``libublksrv`` [#userspace_lib]_ library for developing
specific userspace block devices conveniently, and generic block device
types, such as loop and null, are included as well. Richard W.M. Jones
wrote the userspace nbd device ``nbdublk`` [#userspace_nbdublk]_ based
on ``libublksrv`` [#userspace_lib]_.

After the IO is handled by userspace, the result is committed back to
the driver, thus completing the request cycle.
This way, any specific IO handling logic is done entirely in userspace,
such as loop's IO handling, NBD's IO communication, or qcow2's IO
mapping.

``/dev/ublkb*`` is driven by the blk-mq request-based driver. Each
request is assigned a queue-wide unique tag. The ublk server assigns a
unique tag to each IO too, which is 1:1 mapped with the IO of
``/dev/ublkb*``.

Both the IO request forwarding and the committing of IO handling results
are done via ``io_uring`` passthrough commands; that is why ublk is also
an io_uring based block driver.
It has been observed that io_uring passthrough commands can give better
IOPS than regular block IO, which is why ublk is a high-performance
implementation of userspace block devices: not only is the IO request
communication done via io_uring, but the preferred IO handling approach
in the ublk server is io_uring based too.

ublk provides a control interface to set/get ublk block device
parameters. The interface is extendable and kabi compatible: basically
any parameter of the ublk request queue, or any ublk generic feature
parameter, can be set/get via the interface. Thus, ublk is a generic
userspace block device framework.
For example, it is easy to set up a ublk device with specified block
parameters from userspace.

Using ublk
==========

ublk requires a userspace ublk server to handle the real block device
logic.

Below is an example of using ``ublksrv`` to provide a ublk-based loop
device.

- add a device::

     ublk add -t loop -f ublk-loop.img

- format with xfs, then use it::

     mkfs.xfs /dev/ublkb0
     mount /dev/ublkb0 /mnt
     # do anything. all IOs are handled by io_uring
     ...
     umount /mnt

- list the devices with their info::

     ublk list

- delete the device::

     ublk del -a
     ublk del -n $ublk_dev_id

See usage details in the README of ``ublksrv`` [#userspace_readme]_.

Design
======

Control plane
-------------

The ublk driver provides a global misc device node
(``/dev/ublk-control``) for managing and controlling ublk devices, with
the help of several control commands:

- ``UBLK_CMD_ADD_DEV``

  Add a ublk char device (``/dev/ublkc*``) which is used by the ublk
  server for IO command communication. Basic device info is sent
  together with this command.
  It sets the UAPI structure of ``ublksrv_ctrl_dev_info``, such as
  ``nr_hw_queues``, ``queue_depth``, and the max IO request buffer size,
  for which the info is negotiated with the driver and sent back to the
  server. When this command is completed, the basic device info is
  immutable.

- ``UBLK_CMD_SET_PARAMS`` / ``UBLK_CMD_GET_PARAMS``

  Set or get parameters of the device, which can be either generic
  feature related or request queue limit related, but can't be IO logic
  specific, because the driver does not handle any IO logic. This
  command has to be sent before sending ``UBLK_CMD_START_DEV``.
- ``UBLK_CMD_START_DEV``

  After the server prepares userspace resources (such as creating I/O
  handler threads & io_uring for handling ublk IO), this command is sent
  to the driver for allocating & exposing ``/dev/ublkb*``.
  Parameters set via ``UBLK_CMD_SET_PARAMS`` are applied when creating
  the device.

- ``UBLK_CMD_STOP_DEV``

  Halt IO on ``/dev/ublkb*`` and remove the device. When this command
  returns, the ublk server will release resources (such as destroying
  I/O handler threads & io_uring).

- ``UBLK_CMD_DEL_DEV``

  Remove ``/dev/ublkc*``. When this command returns, the allocated ublk
  device number can be reused.

- ``UBLK_CMD_GET_QUEUE_AFFINITY``

  When ``/dev/ublkc`` is added, the driver creates a block layer tagset,
  so that each queue's affinity info is available. The server sends
  ``UBLK_CMD_GET_QUEUE_AFFINITY`` to retrieve queue affinity info.
  With this info, the server can set up the per-queue context
  efficiently, such as binding affine CPUs to the IO pthread and trying
  to allocate buffers in the IO thread context.

- ``UBLK_CMD_GET_DEV_INFO``

  For retrieving device info via ``ublksrv_ctrl_dev_info``.
  It is the server's responsibility to save IO target specific info in
  userspace.

- ``UBLK_CMD_GET_DEV_INFO2``

  Same purpose as ``UBLK_CMD_GET_DEV_INFO``, but the ublk server has to
  provide the path of the char device ``/dev/ublkc*`` for the kernel to
  run a permission check. This command was added to support unprivileged
  ublk devices, and was introduced together with
  ``UBLK_F_UNPRIVILEGED_DEV``. Only the user owning the requested device
  can retrieve the device info.

  How to deal with userspace/kernel compatibility:

  1) if the kernel is capable of handling ``UBLK_F_UNPRIVILEGED_DEV``

     If the ublk server supports ``UBLK_F_UNPRIVILEGED_DEV``:

     the ublk server should send ``UBLK_CMD_GET_DEV_INFO2``, given that
     an unprivileged application may need to query devices owned by the
     current user at any time, while having no idea whether
     ``UBLK_F_UNPRIVILEGED_DEV`` is set, since the capability info is
     stateless; the application should always retrieve it via
     ``UBLK_CMD_GET_DEV_INFO2``

     If the ublk server doesn't support ``UBLK_F_UNPRIVILEGED_DEV``:

     ``UBLK_CMD_GET_DEV_INFO`` is always sent to the kernel, and the
     ``UBLK_F_UNPRIVILEGED_DEV`` feature isn't available for the user

  2) if the kernel isn't capable of handling ``UBLK_F_UNPRIVILEGED_DEV``

     If the ublk server supports ``UBLK_F_UNPRIVILEGED_DEV``:

     ``UBLK_CMD_GET_DEV_INFO2`` is tried first, and will fail; then
     ``UBLK_CMD_GET_DEV_INFO`` needs to be retried, given
     ``UBLK_F_UNPRIVILEGED_DEV`` can't be set

     If the ublk server doesn't support ``UBLK_F_UNPRIVILEGED_DEV``:

     ``UBLK_CMD_GET_DEV_INFO`` is always sent to the kernel, and the
     ``UBLK_F_UNPRIVILEGED_DEV`` feature isn't available for the user
Only the user owning the requested device can retrieve the device info.h](j)}(h``UBLK_CMD_GET_DEV_INFO2``h]hUBLK_CMD_GET_DEV_INFO2}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh Same purpose with }(hjhhhNhNubj)}(h``UBLK_CMD_GET_DEV_INFO``h]hUBLK_CMD_GET_DEV_INFO}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh<, but ublk server has to provide path of the char device of }(hjhhhNhNubj)}(h``/dev/ublkc*``h]h /dev/ublkc*}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh| for kernel to run permission check, and this command is added for supporting unprivileged ublk device, and introduced with }(hjhhhNhNubj)}(h``UBLK_F_UNPRIVILEGED_DEV``h]hUBLK_F_UNPRIVILEGED_DEV}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubhR together. Only the user owning the requested device can retrieve the device info.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhjubh)}(h0How to deal with userspace/kernel compatibility:h]h0How to deal with userspace/kernel compatibility:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhKhjubhenumerated_list)}(hhh]j)}(h=if kernel is capable of handling ``UBLK_F_UNPRIVILEGED_DEV`` h]h)}(h hhhNhNubj)}(h``/dev/ublkb*``h]h /dev/ublkb*}(hjF hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj> ubh.}(hj> hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhKhj hhubh)}(hXDUAPI structure of ``ublksrv_io_desc`` is defined for describing each IO from the driver. A fixed mmapped area (array) on ``/dev/ublkc*`` is provided for exporting IO info to the server; such as IO offset, length, OP/flags and buffer address. Each ``ublksrv_io_desc`` instance can be indexed via queue id and IO tag directly.h](hUAPI structure of }(hj^ hhhNhNubj)}(h``ublksrv_io_desc``h]hublksrv_io_desc}(hjf hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj^ ubhT is defined for describing each IO from the driver. A fixed mmapped area (array) on }(hj^ hhhNhNubj)}(h``/dev/ublkc*``h]h /dev/ublkc*}(hjx hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj^ ubho is provided for exporting IO info to the server; such as IO offset, length, OP/flags and buffer address. 
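The fixed mmapped descriptor area can be pictured as a flat array with
one slot per (queue id, tag) pair. Below is a minimal userspace sketch
of locating one descriptor; the struct layout and names here are
simplified stand-ins for illustration, not the exact UAPI definition
from ``ublk_cmd.h``::

    #include <assert.h>
    #include <stdint.h>

    /* Simplified stand-in for the UAPI ublksrv_io_desc (illustrative). */
    struct io_desc_mock {
            uint32_t op_flags;      /* op and flags of this IO */
            uint32_t nr_sectors;    /* IO length */
            uint64_t start_sector;  /* IO offset */
            uint64_t addr;          /* buffer address */
    };

    /* Flat index of (qid, tag): all of queue 0, then queue 1, ... */
    static unsigned int io_desc_index(uint16_t queue_depth, uint16_t qid,
                                      uint16_t tag)
    {
            return (unsigned int)qid * queue_depth + tag;
    }

    static struct io_desc_mock *io_desc(struct io_desc_mock *base,
                                        uint16_t queue_depth,
                                        uint16_t qid, uint16_t tag)
    {
            return &base[io_desc_index(queue_depth, qid, tag)];
    }

    int main(void)
    {
            struct io_desc_mock descs[4 * 64] = {0}; /* 4 queues, depth 64 */

            descs[2 * 64 + 7].nr_sectors = 8;
            assert(io_desc(descs, 64, 2, 7)->nr_sectors == 8);
            assert(io_desc(descs, 64, 0, 0) == &descs[0]);
            return 0;
    }

The real array is obtained by mmapping ``/dev/ublkc*``; only the
indexing scheme is shown here.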
Each ``ublksrv_io_desc`` instance can be indexed via queue id and IO tag
directly.

The following IO commands are communicated via the io_uring passthrough
command, and each command is only for forwarding the IO and committing
the result with the specified IO tag in the command data:

Traditional Per-I/O Commands
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- ``UBLK_U_IO_FETCH_REQ``

  Sent from the server I/O pthread for fetching future incoming I/O
  requests destined to ``/dev/ublkb*``.
  This command is sent only once from the server IO pthread for the ublk
  driver to set up the IO forwarding environment.

  Once a thread issues this command against a given (qid,tag) pair, the
  thread registers itself as that I/O's daemon. In the future, only that
  I/O's daemon is allowed to issue commands against the I/O. If any
  other thread attempts to issue a command against a (qid,tag) pair for
  which the thread is not the daemon, the command will fail. Daemons can
  be reset only by going through recovery.

  The ability for every (qid,tag) pair to have its own independent
  daemon task is indicated by the ``UBLK_F_PER_IO_DAEMON`` feature. If
  this feature is not supported by the driver, daemons must be per-queue
  instead - i.e.
  all I/Os associated with a single qid must be handled by the same
  task.

- ``UBLK_U_IO_COMMIT_AND_FETCH_REQ``

  When an IO request is destined to ``/dev/ublkb*``, the driver stores
  the IO's ``ublksrv_io_desc`` to the specified mapped area; then the
  previously received IO command with this IO tag (either
  ``UBLK_IO_FETCH_REQ`` or ``UBLK_IO_COMMIT_AND_FETCH_REQ``) is
  completed, so the server gets the IO notification via io_uring.

  After the server handles the IO, its result is committed back to the
  driver by sending ``UBLK_IO_COMMIT_AND_FETCH_REQ`` back. Once ublkdrv
  receives this command, it parses the result, completes the request to
  ``/dev/ublkb*``, and in the meantime sets up the environment for
  fetching future requests with the same IO tag. That is,
  ``UBLK_IO_COMMIT_AND_FETCH_REQ`` is reused for both fetching a request
  and committing back the IO result.

- ``UBLK_U_IO_NEED_GET_DATA``

  With ``UBLK_F_NEED_GET_DATA`` enabled, a WRITE request is first issued
  to the ublk server without data copy. Then the IO backend of the ublk
  server receives the request, allocates a data buffer, and embeds its
  address inside this new io command. After the kernel driver gets the
  command, the data copy is done from the request pages to this
  backend's buffer. Finally, the backend receives the request again with
  the data to be written and can truly handle the request.

  ``UBLK_IO_NEED_GET_DATA`` adds one additional round-trip and one
  io_uring_enter() syscall. Any user who thinks that it may lower
  performance should not enable ``UBLK_F_NEED_GET_DATA``.
  The ublk server pre-allocates an IO buffer for each IO by default. Any
  new project should try to use this buffer to communicate with the ublk
  driver.
However, existing projects may break or may not be able to consume the new buffer interface; that's why this command is added for backwards compatibility so that existing projects can still consume existing buffers.h](j)}(h``UBLK_IO_NEED_GET_DATA``h]hUBLK_IO_NEED_GET_DATA}(hj- hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj) ubhX adds one additional round-trip and one io_uring_enter() syscall. Any user who thinks that it may lower performance should not enable UBLK_F_NEED_GET_DATA. The ublk server pre-allocates an IO buffer for each IO by default. Any new project should try to use this buffer to communicate with the ublk driver. However, existing projects may break or may not be able to consume the new buffer interface; that’s why this command is added for backwards compatibility so that existing projects can still consume existing buffers.}(hj) hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM5hj ubeh}(h]h ]h"]h$]h&]uh1jhj hhhhhNubj)}(hXdata copy between ublk server IO buffer and ublk block IO request The driver needs to copy the block IO request pages into the server buffer (pages) first for WRITE before notifying the server of the coming IO, so that the server can handle the WRITE request. When the server handles a READ request and sends ``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the driver, ublkdrv needs to copy the server buffer (pages) read to the IO request pages. 
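The fetch/commit cycle of the traditional per-I/O commands can be sketched in pseudocode (illustrative only; the real uring_cmd setup lives in the ublk server implementation such as ublksrv):

```
# arm one uring_cmd per IO tag of the queue
for tag in 0..queue_depth-1:
    submit UBLK_U_IO_FETCH_REQ(tag)

loop:
    io_uring_enter()                        # wait for incoming requests
    for each completed uring_cmd with tag:
        iod = io descriptor of tag          # op code, sectors, buffer address
        res = handle_io(iod)                # serve READ/WRITE from the backend
        submit UBLK_U_IO_COMMIT_AND_FETCH_REQ(tag, res)
        # commits the result and re-arms fetching for the same tag
```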
h](h)}(hAdata copy between ublk server IO buffer and ublk block IO requesth]hAdata copy between ublk server IO buffer and ublk block IO request}(hjO hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM>hjK ubh)}(hThe driver needs to copy the block IO request pages into the server buffer (pages) first for WRITE before notifying the server of the coming IO, so that the server can handle the WRITE request.h]hThe driver needs to copy the block IO request pages into the server buffer (pages) first for WRITE before notifying the server of the coming IO, so that the server can handle the WRITE request.}(hj] hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM@hjK ubh)}(hWhen the server handles a READ request and sends ``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the driver, ublkdrv needs to copy the server buffer (pages) read to the IO request pages.h](h/When the server handles a READ request and sends }(hjk hhhNhNubj)}(h ``UBLK_IO_COMMIT_AND_FETCH_REQ``h]hUBLK_IO_COMMIT_AND_FETCH_REQ}(hjs hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjk ubh] to the driver, ublkdrv needs to copy the server buffer (pages) read to the IO request pages.}(hjk hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMDhjK ubeh}(h]h ]h"]h$]h&]uh1jhj hhhhhNubeh}(h]h ]h"]h$]h&]jjuh1j hhhM hj hhubeh}(h]traditional-per-i-o-commandsah ]h"]traditional per-i/o commandsah$]h&]uh1hhj hhhhhMubh)}(hhh](h)}(h$Batch I/O Commands (UBLK_F_BATCH_IO)h]h$Batch I/O Commands (UBLK_F_BATCH_IO)}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hhhhhMIubh)}(hXThe ``UBLK_F_BATCH_IO`` feature provides an alternative high-performance I/O handling model that replaces the traditional per-I/O commands with per-queue batch commands. This significantly reduces communication overhead and enables better load balancing across multiple server tasks.h](hThe }(hj hhhNhNubj)}(h``UBLK_F_BATCH_IO``h]hUBLK_F_BATCH_IO}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubhX feature provides an alternative high-performance I/O handling model that replaces the traditional per-I/O commands with per-queue batch commands. 
This significantly reduces communication overhead and enables better load balancing across multiple server tasks.}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMKhj hhubh)}(h&Key differences from traditional mode:h]h&Key differences from traditional mode:}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMPhj hhubj )}(hhh](j)}(hP**Per-queue vs Per-I/O**: Commands operate on queues rather than individual I/Osh]h)}(hj h](hstrong)}(h**Per-queue vs Per-I/O**h]hPer-queue vs Per-I/O}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1j hj ubh8: Commands operate on queues rather than individual I/Os}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMRhj ubah}(h]h ]h"]h$]h&]uh1jhj hhhhhNubj)}(hD**Batch processing**: Multiple I/Os are handled in single operationsh]h)}(hj h](j )}(h**Batch processing**h]hBatch processing}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j hj ubh0: Multiple I/Os are handled in single operations}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMShjubah}(h]h ]h"]h$]h&]uh1jhj hhhhhNubj)}(hN**Multishot commands**: Use io_uring multishot for reduced submission overheadh]h)}(hj/h](j )}(h**Multishot commands**h]hMultishot commands}(hj4hhhNhNubah}(h]h ]h"]h$]h&]uh1j hj1ubh8: Use io_uring multishot for reduced submission overhead}(hj1hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMThj-ubah}(h]h ]h"]h$]h&]uh1jhj hhhhhNubj)}(hN**Flexible task assignment**: Any task can handle any I/O (no per-I/O daemons)h]h)}(hjTh](j )}(h**Flexible task assignment**h]hFlexible task assignment}(hjYhhhNhNubah}(h]h ]h"]h$]h&]uh1j hjVubh2: Any task can handle any I/O (no per-I/O daemons)}(hjVhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMUhjRubah}(h]h ]h"]h$]h&]uh1jhj hhhhhNubj)}(hG**Better load balancing**: Tasks can adjust their workload dynamically h]h)}(hF**Better load balancing**: Tasks can adjust their workload dynamicallyh](j )}(h**Better load balancing**h]hBetter load balancing}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j hj{ubh-: Tasks can adjust their workload dynamically}(hj{hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMVhjwubah}(h]h ]h"]h$]h&]uh1jhj hhhhhNubeh}(h]h ]h"]h$]h&]jjuh1j hhhMRhj 
hhubh)}(hBatch I/O Commands:h]hBatch I/O Commands:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMXhj hhubj )}(hhh](j)}(h``UBLK_U_IO_PREP_IO_CMDS`` Prepares multiple I/O commands in batch. The server provides a buffer containing multiple I/O descriptors that will be processed together. This reduces the number of individual command submissions required. h](h)}(h``UBLK_U_IO_PREP_IO_CMDS``h]j)}(hjh]hUBLK_U_IO_PREP_IO_CMDS}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubah}(h]h ]h"]h$]h&]uh1hhhhMZhjubh)}(hPrepares multiple I/O commands in batch. The server provides a buffer containing multiple I/O descriptors that will be processed together. This reduces the number of individual command submissions required.h]hPrepares multiple I/O commands in batch. The server provides a buffer containing multiple I/O descriptors that will be processed together. This reduces the number of individual command submissions required.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM\hjubeh}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(hX ``UBLK_U_IO_COMMIT_IO_CMDS`` Commits results for multiple I/O operations in batch, and prepares the I/O descriptors to accept new requests. The server provides a buffer containing the results of multiple completed I/Os, allowing efficient bulk completion of requests. h](h)}(h``UBLK_U_IO_COMMIT_IO_CMDS``h]j)}(hjh]hUBLK_U_IO_COMMIT_IO_CMDS}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubah}(h]h ]h"]h$]h&]uh1hhhhM`hjubh)}(hCommits results for multiple I/O operations in batch, and prepares the I/O descriptors to accept new requests. The server provides a buffer containing the results of multiple completed I/Os, allowing efficient bulk completion of requests.h]hCommits results for multiple I/O operations in batch, and prepares the I/O descriptors to accept new requests. 
The server provides a buffer containing the results of multiple completed I/Os, allowing efficient bulk completion of requests.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMbhjubeh}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(hXQ``UBLK_U_IO_FETCH_IO_CMDS`` **Multishot command** for fetching I/O commands in batch. This is the key command that enables high-performance batch processing: * Uses io_uring multishot capability for reduced submission overhead * Single command can fetch multiple I/O requests over time * Buffer size determines maximum batch size per operation * Multiple fetch commands can be submitted for load balancing * Only one fetch command is active at any time per queue * Supports dynamic load balancing across multiple server tasks It is one typical multishot io_uring request with provided buffer, and it won't be completed until any failure is triggered. Each task can submit ``UBLK_U_IO_FETCH_IO_CMDS`` with different buffer sizes to control how much work it handles. This enables sophisticated load balancing strategies in multi-threaded servers. h](h)}(h``UBLK_U_IO_FETCH_IO_CMDS``h]j)}(hjh]hUBLK_U_IO_FETCH_IO_CMDS}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubah}(h]h ]h"]h$]h&]uh1hhhhMghjubh)}(h**Multishot command** for fetching I/O commands in batch. This is the key command that enables high-performance batch processing:h](j )}(h**Multishot command**h]hMultishot command}(hj1hhhNhNubah}(h]h ]h"]h$]h&]uh1j hj-ubhl for fetching I/O commands in batch. 
This is the key command that enables high-performance batch processing:}(hj-hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMihjubj )}(hhh](j)}(hBUses io_uring multishot capability for reduced submission overheadh]h)}(hjNh]hBUses io_uring multishot capability for reduced submission overhead}(hjPhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMlhjLubah}(h]h ]h"]h$]h&]uh1jhjIubj)}(h8Single command can fetch multiple I/O requests over timeh]h)}(hjeh]h8Single command can fetch multiple I/O requests over time}(hjghhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMmhjcubah}(h]h ]h"]h$]h&]uh1jhjIubj)}(h7Buffer size determines maximum batch size per operationh]h)}(hj|h]h7Buffer size determines maximum batch size per operation}(hj~hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMnhjzubah}(h]h ]h"]h$]h&]uh1jhjIubj)}(h;Multiple fetch commands can be submitted for load balancingh]h)}(hjh]h;Multiple fetch commands can be submitted for load balancing}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMohjubah}(h]h ]h"]h$]h&]uh1jhjIubj)}(h6Only one fetch command is active at any time per queueh]h)}(hjh]h6Only one fetch command is active at any time per queue}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMphjubah}(h]h ]h"]h$]h&]uh1jhjIubj)}(h=Supports dynamic load balancing across multiple server tasks h]h)}(hThe ublk server must create a sparse buffer table on the same }(hj1hhhNhNubj)}(h``io_ring_ctx``h]h io_ring_ctx}(hj9hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj1ubh used for }(hj1hhhNhNubj)}(h``UBLK_IO_FETCH_REQ``h]hUBLK_IO_FETCH_REQ}(hjKhhhNhNubah}(h]h ]h"]h$]h&]uh1jhj1ubh and }(hj1hhhNhNubj)}(h ``UBLK_IO_COMMIT_AND_FETCH_REQ``h]hUBLK_IO_COMMIT_AND_FETCH_REQ}(hj]hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj1ubh(. 
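A pseudocode sketch of the batch-mode flow built from these three commands (buffer layouts here are illustrative, not the actual wire format):

```
submit UBLK_U_IO_PREP_IO_CMDS(queue, prep_buf)      # prepare many I/Os at once
submit UBLK_U_IO_FETCH_IO_CMDS(queue, fetch_buf)    # multishot; stays armed

loop:
    io_uring_enter()
    for each CQE posted by the fetch command:       # one CQE may carry many tags
        for tag in tags(fetch_buf, cqe):
            results.add(tag, handle_io(iod[tag]))
        submit UBLK_U_IO_COMMIT_IO_CMDS(queue, results)
        # commits results in bulk and re-arms the descriptors
```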
If uring_cmd is issued on a different }(hj1hhhNhNubj)}(h``io_ring_ctx``h]h io_ring_ctx}(hjohhhNhNubah}(h]h ]h"]h$]h&]uh1jhj1ubh+, manual buffer unregistration is required.}(hj1hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhj-ubah}(h]h ]h"]h$]h&]uh1jhj*hhhhhNubj)}(hXBuffer registration data must be passed via uring_cmd's ``sqe->addr`` with the following structure:: struct ublk_auto_buf_reg { __u16 index; /* Buffer index for registration */ __u8 flags; /* Registration flags */ __u8 reserved0; /* Reserved for future use */ __u32 reserved1; /* Reserved for future use */ }; ublk_auto_buf_reg_to_sqe_addr() is for converting the above structure into ``sqe->addr``. h](h)}(hdBuffer registration data must be passed via uring_cmd's ``sqe->addr`` with the following structure::h](h:Buffer registration data must be passed via uring_cmd’s }(hjhhhNhNubj)}(h ``sqe->addr``h]h sqe->addr}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh with the following structure:}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubjW)}(hstruct ublk_auto_buf_reg { __u16 index; /* Buffer index for registration */ __u8 flags; /* Registration flags */ __u8 reserved0; /* Reserved for future use */ __u32 reserved1; /* Reserved for future use */ };h]hstruct ublk_auto_buf_reg { __u16 index; /* Buffer index for registration */ __u8 flags; /* Registration flags */ __u8 reserved0; /* Reserved for future use */ __u32 reserved1; /* Reserved for future use */ };}hjsbah}(h]h ]h"]h$]h&]hhuh1jVhhhMhjubh)}(hYublk_auto_buf_reg_to_sqe_addr() is for converting the above structure into ``sqe->addr``.h](hKublk_auto_buf_reg_to_sqe_addr() is for converting the above structure into }(hjhhhNhNubj)}(h ``sqe->addr``h]h sqe->addr}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubeh}(h]h ]h"]h$]h&]uh1jhj*hhhhhNubj)}(h=All reserved fields in ``ublk_auto_buf_reg`` must be zeroed. h]h)}(hOptional flags can be passed via ``ublk_auto_buf_reg.flags``. 
h]h)}(h=Optional flags can be passed via ``ublk_auto_buf_reg.flags``.h](h!Optional flags can be passed via }(hjhhhNhNubj)}(h``ublk_auto_buf_reg.flags``h]hublk_auto_buf_reg.flags}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1jhj*hhhhhNubeh}(h]h ]h"]h$]h&]j.j/j0hj1.uh1jhjhhhhhMubeh}(h]usage-requirementsah ]h"]usage requirementsah$]h&]uh1hhjohhhhhMubh)}(hhh](h)}(hFallback Behaviorh]hFallback Behavior}(hjKhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjHhhhhhMubh)}(h"If auto buffer registration fails:h]h"If auto buffer registration fails:}(hjYhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjHhhubj)}(hhh](j)}(hX<When ``UBLK_AUTO_BUF_REG_FALLBACK`` is enabled: - The uring_cmd is completed - ``UBLK_IO_F_NEED_REG_BUF`` is set in ``ublksrv_io_desc.op_flags`` - The ublk server must manually deal with the failure, such as, register the buffer manually, or using user copy feature for retrieving the data for handling ublk IO h](h)}(h/When ``UBLK_AUTO_BUF_REG_FALLBACK`` is enabled:h](hWhen }(hjnhhhNhNubj)}(h``UBLK_AUTO_BUF_REG_FALLBACK``h]hUBLK_AUTO_BUF_REG_FALLBACK}(hjvhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjnubh is enabled:}(hjnhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjjubj )}(hhh](j)}(hThe uring_cmd is completedh]h)}(hjh]hThe uring_cmd is completed}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hA``UBLK_IO_F_NEED_REG_BUF`` is set in ``ublksrv_io_desc.op_flags``h]h)}(hjh](j)}(h``UBLK_IO_F_NEED_REG_BUF``h]hUBLK_IO_F_NEED_REG_BUF}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh is set in }(hjhhhNhNubj)}(h``ublksrv_io_desc.op_flags``h]hublksrv_io_desc.op_flags}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(hThe ublk server must manually deal with the failure, such as, register the buffer manually, or using user copy feature for retrieving the data for handling ublk IO h]h)}(hThe ublk server must manually deal with the failure, such as, register the buffer manually, or using user copy feature for 
retrieving the data for handling ublk IOh]hThe ublk server must manually deal with the failure, such as, register the buffer manually, or using user copy feature for retrieving the data for handling ublk IO}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]jjuh1j hhhMhjjubeh}(h]h ]h"]h$]h&]uh1jhjghhhNhNubj)}(hfIf fallback is not enabled: - The ublk I/O request fails silently - The uring_cmd won't be completed h](h)}(hIf fallback is not enabled:h]hIf fallback is not enabled:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubj )}(hhh](j)}(h#The ublk I/O request fails silentlyh]h)}(hjh]h#The ublk I/O request fails silently}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1jhjubj)}(h!The uring_cmd won't be completed h]h)}(h The uring_cmd won't be completedh]h"The uring_cmd won’t be completed}(hj/hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj+ubah}(h]h ]h"]h$]h&]uh1jhjubeh}(h]h ]h"]h$]h&]jjuh1j hhhMhjubeh}(h]h ]h"]h$]h&]uh1jhjghhhNhNubeh}(h]h ]h"]h$]h&]j.j/j0hj1j?uh1jhjHhhhhhMubeh}(h]fallback-behaviorah ]h"]fallback behaviorah$]h&]uh1hhjohhhhhMubh)}(hhh](h)}(h Limitationsh]h Limitations}(hj`hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj]hhhhhMubj )}(hhh](j)}(h0Requires same ``io_ring_ctx`` for all operationsh]h)}(hjsh](hRequires same }(hjuhhhNhNubj)}(h``io_ring_ctx``h]h io_ring_ctx}(hj|hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjuubh for all operations}(hjuhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjqubah}(h]h ]h"]h$]h&]uh1jhjnhhhhhNubj)}(h6May require manual buffer management in fallback casesh]h)}(hjh]h6May require manual buffer management in fallback cases}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1jhjnhhhhhNubj)}(hio_ring_ctx buffer table has a max size of 16K, which may not be enough in case that too many ublk devices are handled by this single io_ring_ctx and each one has very large queue depth h]h)}(hio_ring_ctx buffer table has a max size of 16K, which may not be enough in case that too many ublk devices are handled by this single io_ring_ctx and 
each one has very large queue depthh]hio_ring_ctx buffer table has a max size of 16K, which may not be enough in case that too many ublk devices are handled by this single io_ring_ctx and each one has very large queue depth}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1jhjnhhhhhNubeh}(h]h ]h"]h$]h&]jjuh1j hhhMhj]hhubeh}(h] limitationsah ]h"]h$] limitationsah&]uh1hhjohhhhhM referencedKubeh}(h]auto-buffer-registrationah ]h"]auto buffer registrationah$]h&]uh1hhj hhhhhMubh)}(hhh](h)}(h)Shared Memory Zero Copy (UBLK_F_SHMEM_ZC)h]h)Shared Memory Zero Copy (UBLK_F_SHMEM_ZC)}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhhhMubh)}(hXThe ``UBLK_F_SHMEM_ZC`` feature provides an alternative zero-copy path that works by sharing physical memory pages between the client application and the ublk server. Unlike the io_uring fixed buffer approach above, shared memory zero copy does not require io_uring buffer registration per I/O — instead, it relies on the kernel matching physical pages at I/O time. This allows the ublk server to access the shared buffer directly, which is unlikely for the io_uring fixed buffer approach.h](hThe }(hjhhhNhNubj)}(h``UBLK_F_SHMEM_ZC``h]hUBLK_F_SHMEM_ZC}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubhX feature provides an alternative zero-copy path that works by sharing physical memory pages between the client application and the ublk server. Unlike the io_uring fixed buffer approach above, shared memory zero copy does not require io_uring buffer registration per I/O — instead, it relies on the kernel matching physical pages at I/O time. This allows the ublk server to access the shared buffer directly, which is unlikely for the io_uring fixed buffer approach.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjhhubh)}(hhh](h)}(h Motivationh]h Motivation}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhhhMubh)}(hXShared memory zero copy takes a different approach: if the client application and the ublk server both map the same physical memory, there is nothing to copy. 
The kernel detects the shared pages automatically and tells the server where the data already lives.h]hXShared memory zero copy takes a different approach: if the client application and the ublk server both map the same physical memory, there is nothing to copy. The kernel detects the shared pages automatically and tells the server where the data already lives.}(hj"hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjhhubh)}(h``UBLK_F_SHMEM_ZC`` can be thought of as a supplement for optimized client applications — when the client is willing to allocate I/O buffers from shared memory, the entire data path becomes zero-copy.h](j)}(h``UBLK_F_SHMEM_ZC``h]hUBLK_F_SHMEM_ZC}(hj4hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj0ubh can be thought of as a supplement for optimized client applications — when the client is willing to allocate I/O buffers from shared memory, the entire data path becomes zero-copy.}(hj0hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjhhubeh}(h] motivationah ]h"] motivationah$]h&]uh1hhjhhhhhMubh)}(hhh](h)}(h Use Casesh]h Use Cases}(hjWhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjThhhhhMubh)}(hThis feature is useful when the client application can be configured to use a specific shared memory region for its I/O buffers:h]hThis feature is useful when the client application can be configured to use a specific shared memory region for its I/O buffers:}(hjehhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjThhubj )}(hhh](j)}(h**Custom storage clients** that allocate I/O buffers from shared memory (memfd, hugetlbfs) and issue direct I/O to the ublk deviceh]h)}(h**Custom storage clients** that allocate I/O buffers from shared memory (memfd, hugetlbfs) and issue direct I/O to the ublk deviceh](j )}(h**Custom storage clients**h]hCustom storage clients}(hj~hhhNhNubah}(h]h ]h"]h$]h&]uh1j hjzubhh that allocate I/O buffers from shared memory (memfd, hugetlbfs) and issue direct I/O to the ublk device}(hjzhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjvubah}(h]h ]h"]h$]h&]uh1jhjshhhhhNubj)}(hG**Database engines** that use pre-allocated buffer 
pools with O_DIRECT h]h)}(hF**Database engines** that use pre-allocated buffer pools with O_DIRECTh](j )}(h**Database engines**h]hDatabase engines}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j hjubh2 that use pre-allocated buffer pools with O_DIRECT}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1jhjshhhhhNubeh}(h]h ]h"]h$]h&]jjuh1j hhhMhjThhubeh}(h] use-casesah ]h"] use casesah$]h&]uh1hhjhhhhhMubh)}(hhh](h)}(h How It Worksh]h How It Works}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhhhM ubj)}(hhh](j)}(hThe ublk server and client both ``mmap()`` the same file (memfd or hugetlbfs) with ``MAP_SHARED``. This gives both processes access to the same physical pages. h]h)}(hThe ublk server and client both ``mmap()`` the same file (memfd or hugetlbfs) with ``MAP_SHARED``. This gives both processes access to the same physical pages.h](h The ublk server and client both }(hjhhhNhNubj)}(h ``mmap()``h]hmmap()}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh) the same file (memfd or hugetlbfs) with }(hjhhhNhNubj)}(h``MAP_SHARED``h]h MAP_SHARED}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh>. This gives both processes access to the same physical pages.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM hjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(hThe ublk server registers its mapping with the kernel:: struct ublk_shmem_buf_reg buf = { .addr = mmap_va, .len = size }; ublk_ctrl_cmd(UBLK_U_CMD_REG_BUF, .addr = &buf); The kernel pins the pages and builds a PFN lookup tree. 
h](h)}(h7The ublk server registers its mapping with the kernel::h]h6The ublk server registers its mapping with the kernel:}(hj$hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj ubjW)}(hrstruct ublk_shmem_buf_reg buf = { .addr = mmap_va, .len = size }; ublk_ctrl_cmd(UBLK_U_CMD_REG_BUF, .addr = &buf);h]hrstruct ublk_shmem_buf_reg buf = { .addr = mmap_va, .len = size }; ublk_ctrl_cmd(UBLK_U_CMD_REG_BUF, .addr = &buf);}hj2sbah}(h]h ]h"]h$]h&]hhuh1jVhhhMhj ubh)}(h7The kernel pins the pages and builds a PFN lookup tree.h]h7The kernel pins the pages and builds a PFN lookup tree.}(hj@hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj ubeh}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(hWhen the client issues direct I/O (``O_DIRECT``) to ``/dev/ublkb*``, the kernel checks whether the I/O buffer pages match any registered pages by comparing PFNs. h]h)}(hWhen the client issues direct I/O (``O_DIRECT``) to ``/dev/ublkb*``, the kernel checks whether the I/O buffer pages match any registered pages by comparing PFNs.h](h#When the client issues direct I/O (}(hjXhhhNhNubj)}(h ``O_DIRECT``h]hO_DIRECT}(hj`hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjXubh) to }(hjXhhhNhNubj)}(h``/dev/ublkb*``h]h /dev/ublkb*}(hjrhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjXubh^, the kernel checks whether the I/O buffer pages match any registered pages by comparing PFNs.}(hjXhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjTubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(hXOn a match, the kernel sets ``UBLK_IO_F_SHMEM_ZC`` in the I/O descriptor and encodes the buffer index and offset in ``addr``:: if (iod->op_flags & UBLK_IO_F_SHMEM_ZC) { /* Data is already in our shared mapping — zero copy */ index = ublk_shmem_zc_index(iod->addr); offset = ublk_shmem_zc_offset(iod->addr); buf = shmem_table[index].mmap_base + offset; } h](h)}(h~On a match, the kernel sets ``UBLK_IO_F_SHMEM_ZC`` in the I/O descriptor and encodes the buffer index and offset in ``addr``::h](hOn a match, the kernel sets }(hjhhhNhNubj)}(h``UBLK_IO_F_SHMEM_ZC``h]hUBLK_IO_F_SHMEM_ZC}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubhB in the 
I/O descriptor and encodes the buffer index and offset in }(hjhhhNhNubj)}(h``addr``h]haddr}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh:}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMhjubjW)}(hif (iod->op_flags & UBLK_IO_F_SHMEM_ZC) { /* Data is already in our shared mapping — zero copy */ index = ublk_shmem_zc_index(iod->addr); offset = ublk_shmem_zc_offset(iod->addr); buf = shmem_table[index].mmap_base + offset; }h]hif (iod->op_flags & UBLK_IO_F_SHMEM_ZC) { /* Data is already in our shared mapping — zero copy */ index = ublk_shmem_zc_index(iod->addr); offset = ublk_shmem_zc_offset(iod->addr); buf = shmem_table[index].mmap_base + offset; }}hjsbah}(h]h ]h"]h$]h&]hhuh1jVhhhMhjubeh}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(hxIf pages do not match (e.g., the client used a non-shared buffer), the I/O falls back to the normal copy path silently. h]h)}(hwIf pages do not match (e.g., the client used a non-shared buffer), the I/O falls back to the normal copy path silently.h]hwIf pages do not match (e.g., the client used a non-shared buffer), the I/O falls back to the normal copy path silently.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM&hjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubeh}(h]h ]h"]h$]h&]j.j/j0hj1j?uh1jhjhhhhhM ubh)}(h0The shared memory can be set up via two methods:h]h0The shared memory can be set up via two methods:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM)hjhhubj )}(hhh](j)}(h**Socket-based**: the client sends a memfd to the ublk server via ``SCM_RIGHTS`` on a unix socket. The server mmaps and registers it.h]h)}(h**Socket-based**: the client sends a memfd to the ublk server via ``SCM_RIGHTS`` on a unix socket. The server mmaps and registers it.h](j )}(h**Socket-based**h]h Socket-based}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j hj ubh2: the client sends a memfd to the ublk server via }(hj hhhNhNubj)}(h``SCM_RIGHTS``h]h SCM_RIGHTS}(hj#hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj ubh5 on a unix socket. 
The server mmaps and registers it.}(hj hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM+hj ubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(h**Hugetlbfs-based**: both processes ``mmap(MAP_SHARED)`` the same hugetlbfs file. No IPC needed — same file gives same physical pages. h]h)}(h**Hugetlbfs-based**: both processes ``mmap(MAP_SHARED)`` the same hugetlbfs file. No IPC needed — same file gives same physical pages.h](j )}(h**Hugetlbfs-based**h]hHugetlbfs-based}(hjIhhhNhNubah}(h]h ]h"]h$]h&]uh1j hjEubh: both processes }(hjEhhhNhNubj)}(h``mmap(MAP_SHARED)``h]hmmap(MAP_SHARED)}(hj[hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjEubhP the same hugetlbfs file. No IPC needed — same file gives same physical pages.}(hjEhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM-hjAubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubeh}(h]h ]h"]h$]h&]jjuh1j hhhM+hjhhubeh}(h] how-it-worksah ]h"] how it worksah$]h&]uh1hhjhhhhhM ubh)}(hhh](h)}(h Advantagesh]h Advantages}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhhhM1ubj )}(hhh](j)}(h**Simple**: no per-I/O buffer registration or unregistration commands. Once the shared buffer is registered, all matching I/O is zero-copy automatically.h]h)}(h**Simple**: no per-I/O buffer registration or unregistration commands. Once the shared buffer is registered, all matching I/O is zero-copy automatically.h](j )}(h **Simple**h]hSimple}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j hjubh: no per-I/O buffer registration or unregistration commands. Once the shared buffer is registered, all matching I/O is zero-copy automatically.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM3hjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(h**Direct buffer access**: the ublk server can read and write the shared buffer directly via its own mmap, without going through io_uring fixed buffer operations. This is more friendly for server implementations.h]h)}(h**Direct buffer access**: the ublk server can read and write the shared buffer directly via its own mmap, without going through io_uring fixed buffer operations. 
This is more friendly for server implementations.h](j )}(h**Direct buffer access**h]hDirect buffer access}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j hjubh: the ublk server can read and write the shared buffer directly via its own mmap, without going through io_uring fixed buffer operations. This is more friendly for server implementations.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM6hjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(hu**Fast**: PFN matching is a single maple tree lookup per bvec. No io_uring command round-trips for buffer management.h]h)}(hu**Fast**: PFN matching is a single maple tree lookup per bvec. No io_uring command round-trips for buffer management.h](j )}(h**Fast**h]hFast}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j hjubhm: PFN matching is a single maple tree lookup per bvec. No io_uring command round-trips for buffer management.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM9hjubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubj)}(h**Compatible**: non-matching I/O silently falls back to the copy path. The device works normally for any client, with zero-copy as an optimization when shared memory is available. h]h)}(h**Compatible**: non-matching I/O silently falls back to the copy path. The device works normally for any client, with zero-copy as an optimization when shared memory is available.h](j )}(h**Compatible**h]h Compatible}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j hjubh: non-matching I/O silently falls back to the copy path. The device works normally for any client, with zero-copy as an optimization when shared memory is available.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM;hj ubah}(h]h ]h"]h$]h&]uh1jhjhhhhhNubeh}(h]h ]h"]h$]h&]jjuh1j hhhM3hjhhubeh}(h] advantagesah ]h"] advantagesah$]h&]uh1hhjhhhhhM1ubh)}(hhh](h)}(h Limitationsh]h Limitations}(hjDhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjAhhhhhM@ubj )}(hhh](j)}(h**Requires client cooperation**: the client must allocate its I/O buffers from the shared memory region. 
This requires a custom or configured client — standard applications using their own buffers will not benefit.h]h)}(h**Requires client cooperation**: the client must allocate its I/O buffers from the shared memory region. This requires a custom or configured client — standard applications using their own buffers will not benefit.h](j )}(h**Requires client cooperation**h]hRequires client cooperation}(hj]hhhNhNubah}(h]h ]h"]h$]h&]uh1j hjYubh: the client must allocate its I/O buffers from the shared memory region. This requires a custom or configured client — standard applications using their own buffers will not benefit.}(hjYhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMBhjUubah}(h]h ]h"]h$]h&]uh1jhjRhhhhhNubj)}(hX**Direct I/O only**: buffered I/O (without ``O_DIRECT``) goes through the page cache, which allocates its own pages. These kernel-allocated pages will never match the registered shared buffer. Only ``O_DIRECT`` puts the client's buffer pages directly into the block I/O.h]h)}(hX**Direct I/O only**: buffered I/O (without ``O_DIRECT``) goes through the page cache, which allocates its own pages. These kernel-allocated pages will never match the registered shared buffer. Only ``O_DIRECT`` puts the client's buffer pages directly into the block I/O.h](j )}(h**Direct I/O only**h]hDirect I/O only}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j hjubh: buffered I/O (without }(hjhhhNhNubj)}(h ``O_DIRECT``h]hO_DIRECT}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh) goes through the page cache, which allocates its own pages. These kernel-allocated pages will never match the registered shared buffer. Only }(hjhhhNhNubj)}(h ``O_DIRECT``h]hO_DIRECT}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubh> puts the client’s buffer pages directly into the block I/O.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMFhj{ubah}(h]h ]h"]h$]h&]uh1jhjRhhhhhNubj)}(h**Contiguous data only**: each I/O request's data must be contiguous within a single registered buffer. 
Scatter/gather I/O that spans multiple non-adjacent registered buffers cannot use the zero-copy path. h]h)}(h**Contiguous data only**: each I/O request's data must be contiguous within a single registered buffer. Scatter/gather I/O that spans multiple non-adjacent registered buffers cannot use the zero-copy path.h](j )}(h**Contiguous data only**h]hContiguous data only}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1j hjubh: each I/O request’s data must be contiguous within a single registered buffer. Scatter/gather I/O that spans multiple non-adjacent registered buffers cannot use the zero-copy path.}(hjhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMJhjubah}(h]h ]h"]h$]h&]uh1jhjRhhhhhNubeh}(h]h ]h"]h$]h&]jjuh1j hhhMBhjAhhubeh}(h]id6ah ]h"]h$]jah&]uh1hhjhhhhhM@jKubh)}(hhh](h)}(hControl Commandsh]hControl Commands}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhhhMOubj )}(hhh](j)}(hX4``UBLK_U_CMD_REG_BUF`` Register a shared memory buffer. ``ctrl_cmd.addr`` points to a ``struct ublk_shmem_buf_reg`` containing the buffer virtual address and size. Returns the assigned buffer index (>= 0) on success. The kernel pins pages and builds the PFN lookup tree. Queue freeze is handled internally. h](h)}(h``UBLK_U_CMD_REG_BUF``h]j)}(hjh]hUBLK_U_CMD_REG_BUF}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjubah}(h]h ]h"]h$]h&]uh1hhhhMQhj ubh)}(hXRegister a shared memory buffer. ``ctrl_cmd.addr`` points to a ``struct ublk_shmem_buf_reg`` containing the buffer virtual address and size. Returns the assigned buffer index (>= 0) on success. The kernel pins pages and builds the PFN lookup tree. Queue freeze is handled internally.h](h!Register a shared memory buffer. }(hj'hhhNhNubj)}(h``ctrl_cmd.addr``h]h ctrl_cmd.addr}(hj/hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj'ubh points to a }(hj'hhhNhNubj)}(h``struct ublk_shmem_buf_reg``h]hstruct ublk_shmem_buf_reg}(hjAhhhNhNubah}(h]h ]h"]h$]h&]uh1jhj'ubh containing the buffer virtual address and size. Returns the assigned buffer index (>= 0) on success. The kernel pins pages and builds the PFN lookup tree. 
Queue freeze is handled internally.}(hj'hhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhMShj ubeh}(h]h ]h"]h$]h&]uh1jhj hhhhhNubj)}(h``UBLK_U_CMD_UNREG_BUF`` Unregister a previously registered buffer. ``ctrl_cmd.data[0]`` is the buffer index. Unpins pages and removes PFN entries from the lookup tree. h](h)}(h``UBLK_U_CMD_UNREG_BUF``h]j)}(hjeh]hUBLK_U_CMD_UNREG_BUF}(hjghhhNhNubah}(h]h ]h"]h$]h&]uh1jhjcubah}(h]h ]h"]h$]h&]uh1hhhhMYhj_ubh)}(hUnregister a previously registered buffer. ``ctrl_cmd.data[0]`` is the buffer index. Unpins pages and removes PFN entries from the lookup tree.h](h+Unregister a previously registered buffer. }(hjzhhhNhNubj)}(h``ctrl_cmd.data[0]``h]hctrl_cmd.data[0]}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjzubhP is the buffer index. Unpins pages and removes PFN entries from the lookup tree.}(hjzhhhNhNubeh}(h]h ]h"]h$]h&]uh1hhhhM[hj_ubeh}(h]h ]h"]h$]h&]uh1jhj hhhhhNubeh}(h]h ]h"]h$]h&]jjuh1j hhhMQhjhhubeh}(h]control-commandsah ]h"]control commandsah$]h&]uh1hhjhhhhhMOubeh}(h]'shared-memory-zero-copy-ublk-f-shmem-zcah ]h"])shared memory zero copy (ublk_f_shmem_zc)ah$]h&]uh1hhj hhhhhMubeh}(h]designah ]h"]designah$]h&]uh1hhhhhhhhK\ubh)}(hhh](h)}(h Referencesh]h References}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjhhhhhM`ubhfootnote)}(h https://github.com/ming1/ubdsrv h](hlabel)}(hhh]h1}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1jhjhhhNhNubh)}(hhttps://github.com/ming1/ubdsrvh]h reference)}(hjh]hhttps://github.com/ming1/ubdsrv}(hjhhhNhNubah}(h]h ]h"]h$]h&]refurijuh1jhjubah}(h]h ]h"]h$]h&]uh1hhhhMbhjubeh}(h]jah ]h"] userspaceah$]h&]j ajKjjuh1jhhhMbhjhhubj)}(h0https://github.com/ming1/ubdsrv/tree/master/lib h](j)}(hhh]h2}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1jhjhhhNhNubh)}(h/https://github.com/ming1/ubdsrv/tree/master/libh]j)}(hjh]h/https://github.com/ming1/ubdsrv/tree/master/lib}(hjhhhNhNubah}(h]h ]h"]h$]h&]refurijuh1jhjubah}(h]h ]h"]h$]h&]uh1hhhhMdhjubeh}(h]j:ah ]h"] userspace_libah$]h&](j5jejKjjuh1jhhhMdhjhhubj)}(h2https://gitlab.com/rwmjones/libnbd/-/tree/nbdublk 
h](j)}(hhh]h3}(hj9hhhNhNubah}(h]h ]h"]h$]h&]uh1jhj5hhhNhNubh)}(h1https://gitlab.com/rwmjones/libnbd/-/tree/nbdublkh]j)}(hjHh]h1https://gitlab.com/rwmjones/libnbd/-/tree/nbdublk}(hjJhhhNhNubah}(h]h ]h"]h$]h&]refurijHuh1jhjFubah}(h]h ]h"]h$]h&]uh1hhhhMfhj5ubeh}(h]j`ah ]h"]userspace_nbdublkah$]h&]j[ajKjjuh1jhhhMfhjhhubj)}(h2https://github.com/ming1/ubdsrv/blob/master/READMEh](j)}(hhh]h4}(hjihhhNhNubah}(h]h ]h"]h$]h&]uh1jhjehhhNhNubh)}(hjgh]j)}(hjgh]h2https://github.com/ming1/ubdsrv/blob/master/README}(hjyhhhNhNubah}(h]h ]h"]h$]h&]refurijguh1jhjvubah}(h]h ]h"]h$]h&]uh1hhhhMhhjeubeh}(h]j ah ]h"]userspace_readmeah$]h&]jajKjjuh1jhhhMhhjhhubeh}(h] referencesah ]h"] referencesah$]h&]uh1hhhhhhhhM`ubeh}(h])userspace-block-device-driver-ublk-driverah ]h"]+userspace block device driver (ublk driver)ah$]h&]uh1hhhhhhhhKubeh}(h]h ]h"]h$]h&]sourcehuh1hcurrent_sourceN current_lineNsettingsdocutils.frontendValues)}(hN generatorN datestampN source_linkN source_urlN toc_backlinksentryfootnote_backlinksK sectnum_xformKstrip_commentsNstrip_elements_with_classesN strip_classesN report_levelK halt_levelKexit_status_levelKdebugNwarning_streamN tracebackinput_encoding utf-8-siginput_encoding_error_handlerstrictoutput_encodingutf-8output_encoding_error_handlerjerror_encodingutf-8error_encoding_error_handlerbackslashreplace language_codeenrecord_dependenciesNconfigN id_prefixhauto_id_prefixid dump_settingsNdump_internalsNdump_transformsNdump_pseudo_xmlNexpose_internalsNstrict_visitorN_disable_configN_sourcehnj _destinationN _config_files]7/var/lib/git/docbuild/linux/Documentation/docutils.confafile_insertion_enabled raw_enabledKline_length_limitM'pep_referencesN pep_base_urlhttps://peps.python.org/pep_file_url_templatepep-%04drfc_referencesN rfc_base_url&https://datatracker.ietf.org/doc/html/ tab_widthKtrim_footnote_reference_spacesyntax_highlightlong smart_quotessmartquotes_locales]character_level_inline_markupdoctitle_xform docinfo_xformKsectsubtitle_xform 
image_loadinglinkembed_stylesheetcloak_email_addressessection_self_linkenvNubreporterNindirect_targets]substitution_defs}substitution_names}refnames}( userspace]ja userspace_lib](j+jweuserspace_nbdublk]jQauserspace_readme]jaurefids}(j]jaj:](j+jwej`]jQaj ]jaunameids}(jjjjjjjjj j jWjTj j jOjLjljijjjjjEjBjZjW limitationsNjjjQjNjjjjj>j;jjjjjjj2j:jbj`jj u nametypes}(jjjjj jWj jOjljjjEjZjjjQjjj>jjjj2jbjuh}(jhjhj jj5j+j[jQjjwjjjjjj j j1jTj j j jLj jijZjjojjjBjjWjHjj]jjjNjjjTjjj;jjjAjjjjjjj:jj`j5j jeu footnote_refs}(j]jaj ](j+jwej ]jQaj ]jau citation_refs} autofootnotes](jjj5jeeautofootnote_refs](jj+jQjwjesymbol_footnotes]symbol_footnote_refs] footnotes] citations]autofootnote_startKsymbol_footnote_startK id_counter collectionsCounter}jKsRparse_messages](hsystem_message)}(hhh]h)}(h:Enumerated list start value not ordinal-1: "2" (ordinal 2)h]h>Enumerated list start value not ordinal-1: “2” (ordinal 2)}(hj?hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj<ubah}(h]h ]h"]h$]h&]levelKtypeINFOsourcehnjlineKuh1j:hjubj;)}(hhh]h)}(h.Duplicate implicit target name: "limitations".h]h2Duplicate implicit target name: “limitations”.}(hj[hhhNhNubah}(h]h ]h"]h$]h&]uh1hhjXubah}(h]h ]h"]h$]h&]jalevelKtypejUsourcehnjlineM@uh1j:hjAhhhhhM@ubetransform_messages] transformerN include_log] decorationNhhub.