aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/vdpa
AgeCommit message (Collapse)AuthorFilesLines
2024-04-22vDPA: code clean for vhost_vdpa uapiZhu Lingshan1-3/+3
This commit cleans up the uapi for vhost_vdpa by better naming some of the enums which report blk information to user space, and they are not in any official releases yet. Fixes: 1ac61ddfee93 ("vDPA: report virtio-blk flush info to user space") Fixes: ae1374b7f72c ("vDPA: report virtio-block read-only info to user space") Fixes: 330b8aea6924 ("vDPA: report virtio-block max segment size to user space") Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240415111047.1047774-1-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vDPA: report virtio-blk flush info to user spaceZhu Lingshan1-0/+14
This commit reports whether a virtio-blk device support cache flush command to user space Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-11-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vDPA: report virtio-block read-only info to user spaceZhu Lingshan1-0/+14
This commit report read-only information of virtio-blk devices to user space. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-10-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vDPA: report virtio-block write zeroes configuration to user spaceZhu Lingshan1-0/+23
This commits reports write zeroes configuration of virtio-block devices to user space, includes: 1)maximum write zeroes sectors size 2)maximum write zeroes segment number Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-9-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vDPA: report virtio-block discarding configuration to user spaceZhu Lingshan1-0/+26
This commit reports virtio-blk discarding configuration to user space,includes: 1) the maximum discard sectors 2) maximum number of discard segments for the block driver to use 3) the alignment for splitting a discarding request Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-8-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vDPA: report virtio-block topology info to user spaceZhu Lingshan1-0/+32
This commit allows vDPA reporting topology information of virtio-blk devices to user space, includes: 1) the number of logical blocks per physical block 2) offset of first aligned logical block 3) suggested minimum I/O size in blocks 4) optimal (suggested maximum) I/O size in blocks Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-7-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vDPA: report virtio-block MQ info to user spaceZhu Lingshan1-0/+16
This commits allows vDPA reporting virtio-block multi-queue configuration to user sapce. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-6-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vDPA: report virtio-block max segments in a request to user spaceZhu Lingshan1-0/+17
This commit allows vDPA reporting the maximum number of segments in a request of virtio-block devices to user space. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-5-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vDPA: report virtio-block block-size to user spaceZhu Lingshan1-0/+18
This commit allows reporting the block size of a virtio-block device to user space. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-4-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vDPA: report virtio-block max segment size to user spaceZhu Lingshan1-0/+17
This commit allows reporting the max size of any single segment of virtio-block devices to user space. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-3-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vDPA: report virtio-block capacity to user spaceZhu Lingshan1-0/+35
This commit allows userspace to query capacity of a virtio-block device. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-2-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vdpa: make vdpa_bus constRicardo B. Marliere1-1/+1
Now that the driver core can properly handle constant struct bus_type, move the vdpa_bus variable to be a constant structure as well, placing it into read-only memory which can not be modified at runtime. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Message-Id: <20240204-bus_cleanup-vdpa-v1-1-1745eccb0a5c@marliere.net> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-19vDPA/ifcvf: implement vdpa_config_ops.get_vq_num_minZhu Lingshan2-0/+7
IFCVF HW supports operation with vq size less than the max size, as the spec required. This commit implements vdpa_config_ops.get_vq_num_min to report the minimal size of the virtqueues, which gives vDPA framework a chance to reduce the vring size. We need at least one descriptor to be functional, but it is better no less than 64 to meet ceratin performance requirements. Actually the framework would allocate at least a PAGE for the vq. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240202163905.8834-11-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vDPA/ifcvf: get_max_vq_size to return max sizeZhu Lingshan1-5/+1
Since we already implemented vdpa_config_ops.get_vq_size, so get_max_vq_size can return the acutal max size of the virtqueues other than the max allowed safe size. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240202163905.8834-10-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vduse: implement vdpa_config_ops.get_vq_size for vduseZhu Lingshan1-0/+12
This commit implements get_vq_size for vdpa_config_ops. This new interface is used to report per vq size. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240202163905.8834-8-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vdpa_sim: implement vdpa_config_ops.get_vq_size for vDPA simulatorZhu Lingshan1-0/+12
This commit implements vdpa_config_ops.get_vq_size for vDPA simulator, this new interface can help report per vq size. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240202163905.8834-7-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19eni_vdpa: implement vdpa_config_ops.get_vq_sizeZhu Lingshan1-0/+8
This commit implements get_vq_size which report per vq size in vdpa_config_ops Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240202163905.8834-6-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vp_vdpa: implement vdpa_config_ops.get_vq_sizeZhu Lingshan1-0/+8
This commit implements vdpa_config_ops.get_vq_size in vp_vdpa, which reports per virtqueue size. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240202163905.8834-5-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vDPA/ifcvf: implement vdpa_config_ops.get_vq_sizeZhu Lingshan3-1/+14
This commit implements vdpa_ops.get_vq_size to report the size of a specific virtqueue. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240202163905.8834-4-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vdpa/pds: fixes for VF vdpa flr-aer handlingShannon Nelson3-4/+19
This addresses a couple of things found while testing the FLR and AER handling with the VFs. - release irqs before calling vp_modern_remove() - make sure we have a valid struct pointer before using it to release irqs - make sure the FW is alive before trying to add a new device Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20240220011050.30913-1-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vduse: implement DMA sync callbacksMaxime Coquelin3-3/+54
Since commit 295525e29a5b ("virtio_net: merge dma operations when filling mergeable buffers"), VDUSE device require support for DMA's .sync_single_for_cpu() operation as the memory is non-coherent between the device and CPU because of the use of a bounce buffer. This patch implements both .sync_single_for_cpu() and .sync_single_for_device() callbacks, and also skip bounce buffer copies during DMA map and unmap operations if the DMA_ATTR_SKIP_CPU_SYNC attribute is set to avoid extra copies of the same buffer. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Message-Id: <20240219170606.587290-1-maxime.coquelin@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vdpa/mlx5: Allow CVQ size changesJonah Palmer1-4/+9
The MLX driver was not updating its control virtqueue size at set_vq_num and instead always initialized to MLX5_CVQ_MAX_ENT (16) at setup_cvq_vring. Qemu would try to set the size to 64 by default, however, because the CVQ size always was initialized to 16, an error would be thrown when sending >16 control messages (as used-ring entry 17 is initialized to 0). For example, starting a guest with x-svq=on and then executing the following command would produce the error below: # for i in {1..20}; do ifconfig eth0 hw ether XX:xx:XX:xx:XX:XX; done qemu-system-x86_64: Insufficient written data (0) [ 435.331223] virtio_net virtio0: Failed to set mac address by vq command. SIOCSIFHWADDR: Invalid argument Acked-by: Dragos Tatulea <dtatulea@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com> Message-Id: <20240216142502.78095-1-jonah.palmer@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Fixes: 5262912ef3cf ("vdpa/mlx5: Add support for control VQ and MAC setting")
2024-03-19vdpa_sim: reset must not runSteve Sistare1-1/+2
vdpasim_do_reset sets running to true, which is wrong, as it allows vdpasim_kick_vq to post work requests before the device has been configured. To fix, do not set running until VIRTIO_CONFIG_S_DRIVER_OK is set. Fixes: 0c89e2a3a9d0 ("vdpa_sim: Implement suspend vdpa op") Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <1707517807-137331-1-git-send-email-steven.sistare@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-01-18Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds5-41/+257
Pull virtio updates from Michael Tsirkin: - vdpa/mlx5: support for resumable vqs - virtio_scsi: mq_poll support - 3virtio_pmem: support SHMEM_REGION - virtio_balloon: stay awake while adjusting balloon - virtio: support for no-reset virtio PCI PM - Fixes, cleanups * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vdpa/mlx5: Add mkey leak detection vdpa/mlx5: Introduce reference counting to mrs vdpa/mlx5: Use vq suspend/resume during .set_map vdpa/mlx5: Mark vq state for modification in hw vq vdpa/mlx5: Mark vq addrs for modification in hw vq vdpa/mlx5: Introduce per vq and device resume vdpa/mlx5: Allow modifying multiple vq fields in one modify command vdpa/mlx5: Expose resumable vq capability vdpa: Block vq property changes in DRIVER_OK vdpa: Track device suspended state scsi: virtio_scsi: Add mq_poll support virtio_pmem: support feature SHMEM_REGION virtio_balloon: stay awake while adjusting balloon vdpa: Remove usage of the deprecated ida_simple_xx() API virtio: Add support for no-reset virtio PCI PM virtio_net: fix missing dma unmap for resize vhost-vdpa: account iommu allocations vdpa: Fix an error handling path in eni_vdpa_probe()
2024-01-10vdpa/mlx5: Add mkey leak detectionDragos Tatulea3-0/+27
Track allocated mrs in a list and show warning when leaks are detected on device free or reset. Reviewed-by: Gal Pressman <gal@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231225151203.152687-9-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-01-10vdpa/mlx5: Introduce reference counting to mrsDragos Tatulea3-25/+78
Deleting the old mr during mr update (.set_map) and then modifying the vqs with the new mr is not a good flow for firmware. The firmware expects that mkeys are deleted after there are no more vqs referencing them. Introduce reference counting for mrs to fix this. It is the only way to make sure that mkeys are not in use by vqs. An mr reference is taken when the mr is associated to the mr asid table and when the mr is linked to the vq on create/modify. The reference is released when the mkey is unlinked from the vq (trough modify/destroy) and from the mr asid table. To make things consistent, get rid of mlx5_vdpa_destroy_mr and use get/put semantics everywhere. Reviewed-by: Gal Pressman <gal@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231225151203.152687-8-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-01-10vdpa/mlx5: Use vq suspend/resume during .set_mapDragos Tatulea1-8/+38
Instead of tearing down and setting up vq resources, use vq suspend/resume during .set_map to speed things up a bit. The vq mr is updated with the new mapping while the vqs are suspended. If the device doesn't support resumable vqs, do the old teardown and setup dance. Reviewed-by: Gal Pressman <gal@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231225151203.152687-7-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-01-10vdpa/mlx5: Mark vq state for modification in hw vqDragos Tatulea1-0/+8
.set_vq_state will set the indices and mark the fields to be modified in the hw vq. Advertise that the device supports changing the vq state when the device is in DRIVER_OK state and suspended. Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20231225151203.152687-6-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-01-10vdpa/mlx5: Mark vq addrs for modification in hw vqDragos Tatulea1-0/+9
Addresses get set by .set_vq_address. hw vq addresses will be updated on next modify_virtqueue. Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231225151203.152687-5-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-01-10vdpa/mlx5: Introduce per vq and device resumeDragos Tatulea1-7/+62
Implement vdpa vq and device resume if capability detected. Add support for suspend -> ready state change. Reviewed-by: Gal Pressman <gal@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231225151203.152687-4-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-01-10vdpa/mlx5: Allow modifying multiple vq fields in one modify commandDragos Tatulea1-8/+40
Add a bitmask variable that tracks hw vq field changes that are supposed to be modified on next hw vq change command. This will be useful to set multiple vq fields when resuming the vq. Reviewed-by: Gal Pressman <gal@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231225151203.152687-3-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-01-08Merge tag 'vfs-6.8.misc' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual miscellaneous features, cleanups, and fixes for vfs and individual fses. Features: - Add Jan Kara as VFS reviewer - Show correct device and inode numbers in proc/<pid>/maps for vma files on stacked filesystems. This is now easily doable thanks to the backing file work from the last cycles. This comes with selftests Cleanups: - Remove a redundant might_sleep() from wait_on_inode() - Initialize pointer with NULL, not 0 - Clarify comment on access_override_creds() - Rework and simplify eventfd_signal() and eventfd_signal_mask() helpers - Process aio completions in batches to avoid needless wakeups - Completely decouple struct mnt_idmap from namespaces. We now only keep the actual idmapping around and don't stash references to namespaces - Reformat maintainer entries to indicate that a given subsystem belongs to fs/ - Simplify fput() for files that were never opened - Get rid of various pointless file helpers - Rename various file helpers - Rename struct file members after SLAB_TYPESAFE_BY_RCU switch from last cycle - Make relatime_need_update() return bool - Use GFP_KERNEL instead of GFP_USER when allocating superblocks - Replace deprecated ida_simple_*() calls with their current ida_*() counterparts Fixes: - Fix comments on user namespace id mapping helpers. They aren't kernel doc comments so they shouldn't be using /** - s/Retuns/Returns/g in various places - Add missing parameter documentation on can_move_mount_beneath() - Rename i_mapping->private_data to i_mapping->i_private_data - Fix a false-positive lockdep warning in pipe_write() for watch queues - Improve __fget_files_rcu() code generation to improve performance - Only notify writer that pipe resizing has finished after setting pipe->max_usage otherwise writers are never notified that the pipe has been resized and hang - Fix some kernel docs in hfsplus - s/passs/pass/g in various places - Fix kernel docs in ntfs - Fix kcalloc() arguments order reported by gcc 14 - Fix uninitialized value in reiserfs" * tag 'vfs-6.8.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (36 commits) reiserfs: fix uninit-value in comp_keys watch_queue: fix kcalloc() arguments order ntfs: dir.c: fix kernel-doc function parameter warnings fs: fix doc comment typo fs tree wide selftests/overlayfs: verify device and inode numbers in /proc/pid/maps fs/proc: show correct device and inode numbers in /proc/pid/maps eventfd: Remove usage of the deprecated ida_simple_xx() API fs: super: use GFP_KERNEL instead of GFP_USER for super block allocation fs/hfsplus: wrapper.c: fix kernel-doc warnings fs: add Jan Kara as reviewer fs/inode: Make relatime_need_update return bool pipe: wakeup wr_wait after setting max_usage file: remove __receive_fd() file: stop exposing receive_fd_user() fs: replace f_rcuhead with f_task_work file: remove pointless wrapper file: s/close_fd_get_file()/file_close_fd()/g Improve __fget_files_rcu() code generation (and thus __fget_light()) file: massage cleanup of files that failed to open fs/pipe: Fix lockdep false-positive in watchqueue pipe_write() ...
2023-12-27vdpa: Remove usage of the deprecated ida_simple_xx() APIChristophe JAILLET1-2/+2
ida_alloc() and ida_free() should be preferred to the deprecated ida_simple_get() and ida_simple_remove(). This is less verbose. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Message-Id: <d7534cc4caf4ff9d6b072744352c1b69487779ea.1702230703.git.christophe.jaillet@wanadoo.fr> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-12-25vdpa: Fix an error handling path in eni_vdpa_probe()Christophe JAILLET1-2/+4
After a successful vp_legacy_probe() call, vp_legacy_remove() should be called in the error handling path, as already done in the remove function. Add the missing call. Fixes: e85087beedca ("eni_vdpa: add vDPA driver for Alibaba ENI") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Message-Id: <a7b0ef1eabd081f1c7c894e9b11de01678e85dee.1666293559.git.christophe.jaillet@wanadoo.fr> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-12-21Merge branch 'vfs.file'Christian Brauner1-1/+1
Bring in the changes to the file infrastructure for this cycle. Mostly cleanups and some performance tweaks. * file: remove __receive_fd() * file: stop exposing receive_fd_user() * fs: replace f_rcuhead with f_task_work * file: remove pointless wrapper * file: s/close_fd_get_file()/file_close_fd()/g * Improve __fget_files_rcu() code generation (and thus __fget_light()) * file: massage cleanup of files that failed to open Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-12-12file: remove __receive_fd()Christian Brauner1-1/+1
Honestly, there's little value in having a helper with and without that int __user *ufd argument. It's just messy and doesn't really give us anything. Just expose receive_fd() with that argument and get rid of that helper. Link: https://lore.kernel.org/r/20231130-vfs-files-fixes-v1-5-e73ca6f4ea83@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-12-01pds_vdpa: set features orderShannon Nelson1-2/+1
Fix up the order that the device and negotiated features are checked to get a more reliable difference when things get changed. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20231110221802.46841-4-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-12-01pds_vdpa: clear config callback when status goes to 0Shannon Nelson1-1/+3
If the client driver is setting status to 0, something is getting shutdown and possibly removed. Make sure we clear the config_cb so that it doesn't end up crashing when trying to call a bogus callback. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20231110221802.46841-3-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-12-01pds_vdpa: fix up format-truncation complaintShannon Nelson1-1/+1
Our friendly kernel test robot has recently been pointing out some format-truncation issues. Here's a fix for one of them. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202311040109.RfgJoE7L-lkp@intel.com/ Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20231110221802.46841-2-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-12-01vdpa/mlx5: preserve CVQ vringh indexSteve Sistare1-1/+6
mlx5_vdpa does not preserve userland's view of vring base for the control queue in the following sequence: ioctl VHOST_SET_VRING_BASE ioctl VHOST_VDPA_SET_STATUS VIRTIO_CONFIG_S_DRIVER_OK mlx5_vdpa_set_status() setup_cvq_vring() vringh_init_iotlb() vringh_init_kern() vrh->last_avail_idx = 0; ioctl VHOST_GET_VRING_BASE To fix, restore the value of cvq->vring.last_avail_idx after calling vringh_init_iotlb. Fixes: 5262912ef3cf ("vdpa/mlx5: Add support for control VQ and MAC setting") Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <1699014387-194368-1-git-send-email-steven.sistare@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-11-28eventfd: simplify eventfd_signal()Christian Brauner1-3/+3
Ever since the eventfd type was introduced back in 2007 in commit e1ad7468c77d ("signal/timer/event: eventfd core") the eventfd_signal() function only ever passed 1 as a value for @n. There's no point in keeping that additional argument. Link: https://lore.kernel.org/r/20231122-vfs-eventfd-signal-v2-2-bd549b14ce0c@kernel.org Acked-by: Xu Yilun <yilun.xu@intel.com> Acked-by: Andrew Donnellan <ajd@linux.ibm.com> # ocxl Acked-by: Eric Farman <farman@linux.ibm.com> # s390 Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-11-01vdpa_sim_blk: allocate the buffer zeroedStefano Garzarella1-2/+2
Deleting and recreating a device can lead to having the same content as the old device, so let's always allocate buffers completely zeroed out. Fixes: abebb16254b3 ("vdpa_sim_blk: support shared backend") Suggested-by: Qing Wang <qinwang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20231031144339.121453-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-11-01vdpa_sim: implement .reset_map supportSi-Wei Liu1-9/+43
In order to reduce excessive memory mapping cost in live migration and VM reboot, it is desirable to decouple the vhost-vdpa IOTLB abstraction from the virtio device life cycle, i.e. mappings can be kept intact across virtio device reset. Leverage the .reset_map callback, which is meant to destroy the iotlb on the given ASID and recreate the 1:1 passthrough/identity mapping. To be consistent, the mapping on device creation is initiailized to passthrough/identity with PA 1:1 mapped as IOVA. With this the device .reset op doesn't have to maintain and clean up memory mappings by itself. Additionally, implement .compat_reset to cater for older userspace, which may wish to see mapping to be cleared during reset. Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <1697880319-4937-8-git-send-email-si-wei.liu@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-11-01vdpa/mlx5: implement .reset_map driver opSi-Wei Liu3-3/+42
Since commit 6f5312f80183 ("vdpa/mlx5: Add support for running with virtio_vdpa"), mlx5_vdpa starts with preallocate 1:1 DMA MR at device creation time. This 1:1 DMA MR will be implicitly destroyed while the first .set_map call is invoked, in which case callers like vhost-vdpa will start to set up custom mappings. When the .reset callback is invoked, the custom mappings will be cleared and the 1:1 DMA MR will be re-created. In order to reduce excessive memory mapping cost in live migration, it is desirable to decouple the vhost-vdpa IOTLB abstraction from the virtio device life cycle, i.e. mappings can be kept around intact across virtio device reset. Leverage the .reset_map callback, which is meant to destroy the regular MR (including cvq mapping) on the given ASID and recreate the initial DMA mapping. That way, the device .reset op runs free from having to maintain and clean up memory mappings by itself. Additionally, implement .compat_reset to cater for older userspace, which may wish to see mapping to be cleared during reset. Co-developed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Message-Id: <1697880319-4937-7-git-send-email-si-wei.liu@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-11-01vduse: make vduse_class constantGreg Kroah-Hartman1-19/+21
Now that the driver core allows for struct class to be in read-only memory, we should make all 'class' structures declared at build time placing them into read-only memory, instead of having to be dynamically allocated at runtime. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Xie Yongji <xieyongji@bytedance.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Message-Id: <2023100643-tricolor-citizen-6c2d@gregkh> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-11-01mlx5_vdpa: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OKEugenio Pérez1-0/+7
Offer this backend feature as mlx5 is compatible with it. It allows it to do live migration with CVQ, dynamically switching between passthrough and shadow virtqueue. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230703142514.363256-1-eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-11-01vdpa/mlx5: Update cvq iotlb mapping on ASID changeDragos Tatulea3-1/+36
For the following sequence: - cvq group is in ASID 0 - .set_map(1, cvq_iotlb) - .set_group_asid(cvq_group, 1) ... the cvq mapping from ASID 0 will be used. This is not always correct behaviour. This patch adds support for the above mentioned flow by saving the iotlb on each .set_map and updating the cvq iotlb with it on a cvq group change. Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231018171456.1624030-18-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-11-01vdpa/mlx5: Make iotlb helper functions more genericDragos Tatulea1-8/+11
They will be used in a follow-up patch. For dup_iotlb, avoid the src == dst case. This is an error. Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231018171456.1624030-17-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-11-01vdpa/mlx5: Enable hw support for vq descriptor mappingDragos Tatulea1-1/+23
Vq descriptor mappings are supported in hardware by filling in an additional mkey which contains the descriptor mappings to the hw vq. A previous patch in this series added support for hw mkey (mr) creation for ASID 1. This patch fills in both the vq data and vq descriptor mkeys based on group ASID mapping. The feature is signaled to the vdpa core through the presence of the .get_vq_desc_group op. Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231018171456.1624030-16-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-11-01vdpa/mlx5: Introduce mr for vq descriptorDragos Tatulea3-14/+25
Introduce the vq descriptor group and mr per ASID. Until now .set_map on ASID 1 was only updating the cvq iotlb. From now on it also creates a mkey for it. The current patch doesn't use it but follow-up patches will add hardware support for mapping the vq descriptors. Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231018171456.1624030-15-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-11-01vdpa/mlx5: Improve mr update flowDragos Tatulea3-72/+82
The current flow for updating an mr works directly on mvdev->mr which makes it cumbersome to handle multiple new mr structs. This patch makes the flow more straightforward by having mlx5_vdpa_create_mr return a new mr which will update the old mr (if any). The old mr will be deleted and unlinked from mvdev. For the case when the iotlb is empty (not NULL), the old mr will be cleared. This change paves the way for adding mrs for different ASIDs. The initialized bool is no longer needed as mr is now a pointer in the mlx5_vdpa_dev struct which will be NULL when not initialized. Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231018171456.1624030-14-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-11-01vdpa/mlx5: Move mr mutex out of mr structDragos Tatulea3-11/+12
The mutex is named like it is supposed to protect only the mkey but in reality it is a global lock for all mr resources. Shift the mutex to it's rightful location (struct mlx5_vdpa_dev) and give it a more appropriate name. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20231018171456.1624030-13-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-11-01vdpa/mlx5: Allow creation/deletion of any given mr structDragos Tatulea3-35/+36
This patch adapts the mr creation/deletion code to be able to work with any given mr struct pointer. All the APIs are adapted to take an extra parameter for the mr. mlx5_vdpa_create/delete_mr doesn't need a ASID parameter anymore. The check is done in the caller instead (mlx5_set_map). This change is needed for a followup patch which will introduce an additional mr for the vq descriptor data. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20231018171456.1624030-12-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-11-01vdpa/mlx5: Rename mr destroy functionsDragos Tatulea3-11/+11
Make mlx5_destroy_mr symmetric to mlx5_create_mr. Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231018171456.1624030-11-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-11-01vdpa/mlx5: Collapse "dvq" mr add/delete functionsDragos Tatulea1-11/+5
Now that the cvq code is out of mlx5_vdpa_create/destroy_mr, the "dvq" functions can be folded into their callers. Having "dvq" in the naming will no longer be accurate in the downstream patches. Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231018171456.1624030-10-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-11-01vdpa/mlx5: Take cvq iotlb lock during refreshDragos Tatulea1-1/+9
The reslock is taken while refresh is called but iommu_lock is more specific to this resource. So take the iommu_lock during cvq iotlb refresh. Based on Eugenio's patch [0]. [0] https://lore.kernel.org/lkml/20230112142218.725622-4-eperezma@redhat.com/ Acked-by: Jason Wang <jasowang@redhat.com> Suggested-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231018171456.1624030-9-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-11-01vdpa/mlx5: Decouple cvq iotlb handling from hw mapping codeDragos Tatulea3-39/+28
The handling of the cvq iotlb is currently coupled with the creation and destruction of the hardware mkeys (mr). This patch moves cvq iotlb handling into its own function and shifts it to a scope that is not related to mr handling. As cvq handling is just a prune_iotlb + dup_iotlb cycle, put it all in the same "update" function. Finally, the destruction path is handled by directly pruning the iotlb. After this move is done the ASID mr code can be collapsed into a single function. Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231018171456.1624030-8-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-11-01vdpa/mlx5: Create helper function for dma mappingsDragos Tatulea3-2/+8
Necessary for upcoming cvq separation from mr allocation. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231018171456.1624030-7-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-10-18vdpa/mlx5: Fix firmware error on creation of 1k VQsDragos Tatulea2-9/+63
A firmware error is triggered when configuring a 9k MTU on the PF after switching to switchdev mode and then using a vdpa device with larger (1k) rings: mlx5_cmd_out_err: CREATE_GENERAL_OBJECT(0xa00) op_mod(0xd) failed, status bad resource(0x5), syndrome (0xf6db90), err(-22) This is due to the fact that the hw VQ size parameters are computed based on the umem_1/2/3_buffer_param_a/b capabilities and all device capabilities are read only when the driver is moved to switchdev mode. The problematic configuration flow looks like this: 1) Create VF 2) Unbind VF 3) Switch PF to switchdev mode. 4) Bind VF 5) Set PF MTU to 9k 6) create vDPA device 7) Start VM with vDPA device and 1K queue size Note that setting the MTU before step 3) doesn't trigger this issue. This patch reads the forementioned umem parameters at the latest point possible before the VQs of the device are created. v2: - Allocate output with kmalloc to reduce stack frame size. - Removed stable from cc. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20230831155702.1080754-1-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-10-18vdpa/mlx5: Fix double release of debugfs entryDragos Tatulea3-8/+6
The error path in setup_driver deletes the debugfs entry but doesn't clear the pointer. During .dev_del the invalid pointer will be released again causing a crash. This patch fixes the issue by always clearing the debugfs entry in mlx5_vdpa_remove_debugfs. Also, stop removing the debugfs entry in .dev_del op: the debugfs entry is already handled within the setup_driver/teardown_driver scope. Cc: stable@vger.kernel.org Fixes: f0417e72add5 ("vdpa/mlx5: Add and remove debugfs in setup/teardown driver") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Message-Id: <20230829174014.928189-2-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-10-18vdpa_sim_blk: Fix the potential leak of mgmt_devShawn.Shao1-2/+3
If the shared_buffer allocation fails, need to unregister mgmt_dev first. Cc: stable@vger.kernel.org Fixes: abebb16254b36 ("vdpa_sim_blk: support shared backend") Signed-off-by: Shawn.Shao <shawn.shao@jaguarmicro.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230821060333.1155-1-shawn.shao@jaguarmicro.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-04Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2-3/+8
Pull virtio updates from Michael Tsirkin: "A small pull request this time around, mostly because the vduse network got postponed to next relase so we can be sure we got the security store right" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_ring: fix avail_wrap_counter in virtqueue_add_packed virtio_vdpa: build affinity masks conditionally virtio_net: merge dma operations when filling mergeable buffers virtio_ring: introduce dma sync api for virtqueue virtio_ring: introduce dma map api for virtqueue virtio_ring: introduce virtqueue_reset() virtio_ring: separate the logic of reset/enable from virtqueue_resize virtio_ring: correct the expression of the description of virtqueue_resize() virtio_ring: skip unmap for premapped virtio_ring: introduce virtqueue_dma_dev() virtio_ring: support add premapped buf virtio_ring: introduce virtqueue_set_dma_premapped() virtio_ring: put mapping error check in vring_map_one_sg virtio_ring: check use_dma_api before unmap desc for indirect vdpa_sim: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK vdpa: add get_backend_features vdpa operation vdpa: accept VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK backend feature vdpa: add VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK flag vdpa/mlx5: Remove unused function declarations
2023-09-03vdpa_sim: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OKEugenio Pérez1-0/+8
Start offering the feature in the simulator. Other parent drivers can follow this code to offer it too. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20230609092127.170673-5-eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03vdpa/mlx5: Remove unused function declarationsYue Haibing1-3/+0
Commit 29064bfdabd5 ("vdpa/mlx5: Add support library for mlx5 VDPA implementation") declared but never implemented these. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Message-Id: <20230803143041.23388-1-yuehaibing@huawei.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-08-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski9-116/+225
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/sfc/tc.c fa165e194997 ("sfc: don't unregister flow_indr if it was never registered") 3bf969e88ada ("sfc: add MAE table machinery for conntrack table") https://lore.kernel.org/all/20230818112159.7430e9b4@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-10pds_vdpa: fix up debugfs feature bit printingShannon Nelson1-7/+6
Make clearer in debugfs output the difference between the hw feature bits, the features supported through the driver, and the features that have been negotiated. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20230711042437.69381-6-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-08-10pds_vdpa: alloc irq vectors on DRIVER_OKAllen Hubbe1-29/+81
We were allocating irq vectors at the time the aux dev was probed, but that is before the PCI VF is assigned to a separate iommu domain by vhost_vdpa. Because vhost_vdpa later changes the iommu domain the interrupts do not work. Instead, we can allocate the irq vectors later when we see DRIVER_OK and know that the reassignment of the PCI VF to an iommu domain has already happened. Fixes: 151cc834f3dd ("pds_vdpa: add support for vdpa and vdpamgmt interfaces") Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230711042437.69381-5-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-08-10pds_vdpa: clean and reset vqs entriesShannon Nelson1-7/+17
Make sure that we initialize the vqs[] entries the same way both for initial setup and after a vq reset. Fixes: 151cc834f3dd ("pds_vdpa: add support for vdpa and vdpamgmt interfaces") Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20230711042437.69381-4-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-08-10pds_vdpa: always allow offering VIRTIO_NET_F_MACShannon Nelson3-13/+23
Our driver sets a mac if the HW is 00:..:00 so we need to be sure to advertise VIRTIO_NET_F_MAC even if the HW doesn't. We also need to be sure that virtio_net sees the VIRTIO_NET_F_MAC and doesn't rewrite the mac address that a user may have set with the vdpa utility. After reading the hw_feature bits, add the VIRTIO_NET_F_MAC to the driver's supported_features and use that for reporting what is available. If the HW is not advertising it, be sure to strip the VIRTIO_NET_F_MAC before finishing the feature negotiation. If the user specifies a device_features bitpattern in the vdpa utility without the VIRTIO_NET_F_MAC set, then don't set the mac. Fixes: 151cc834f3dd ("pds_vdpa: add support for vdpa and vdpamgmt interfaces") Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20230711042437.69381-3-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-08-10pds_vdpa: reset to vdpa specified macAllen Hubbe2-8/+9
When the vdpa device is reset, also reinitialize it with the mac address that was assigned when the device was added. Fixes: 151cc834f3dd ("pds_vdpa: add support for vdpa and vdpamgmt interfaces") Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20230711042437.69381-2-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-08-10vdpa/mlx5: Fix crash on shutdown for when no ndev existsDragos Tatulea1-12/+0
The ndev was accessed on shutdown without a check if it actually exists. This triggered the crash pasted below. Instead of doing the ndev check, delete the shutdown handler altogether. The irqs will be released at the parent VF level (mlx5_core). BUG: kernel NULL pointer dereference, address: 0000000000000300 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.5.0-rc2_for_upstream_min_debug_2023_07_17_15_05 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:mlx5v_shutdown+0xe/0x50 [mlx5_vdpa] RSP: 0018:ffff8881003bfdc0 EFLAGS: 00010286 RAX: ffff888103befba0 RBX: ffff888109d28008 RCX: 0000000000000017 RDX: 0000000000000001 RSI: 0000000000000212 RDI: ffff888109d28000 RBP: 0000000000000000 R08: 0000000d3a3a3882 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: ffff888109d28000 R13: ffff888109d28080 R14: 00000000fee1dead R15: 0000000000000000 FS: 00007f4969e0be40(0000) GS:ffff88852c800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000300 CR3: 00000001051cd006 CR4: 0000000000370eb0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __die+0x20/0x60 ? page_fault_oops+0x14c/0x3c0 ? exc_page_fault+0x75/0x140 ? asm_exc_page_fault+0x22/0x30 ? mlx5v_shutdown+0xe/0x50 [mlx5_vdpa] device_shutdown+0x13e/0x1e0 kernel_restart+0x36/0x90 __do_sys_reboot+0x141/0x210 ? vfs_writev+0xcd/0x140 ? handle_mm_fault+0x161/0x260 ? do_writev+0x6b/0x110 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7f496990fb56 RSP: 002b:00007fffc7bdde88 EFLAGS: 00000206 ORIG_RAX: 00000000000000a9 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f496990fb56 RDX: 0000000001234567 RSI: 0000000028121969 RDI: fffffffffee1dead RBP: 00007fffc7bde1d0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000 R13: 00007fffc7bddf10 R14: 0000000000000000 R15: 00007fffc7bde2b8 </TASK> CR2: 0000000000000300 ---[ end trace 0000000000000000 ]--- Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt bypassing") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20230803152648.199297-1-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-08-10vdpa/mlx5: Delete control vq iotlb in destroy_mr only when necessaryEugenio Pérez3-3/+4
mlx5_vdpa_destroy_mr can be called from .set_map with data ASID after the control virtqueue ASID iotlb has been populated. The control vq iotlb must not be cleared, since it will not be populated again. So call the ASID aware destroy function which makes sure that the right vq resource is destroyed. Fixes: 8fcd20c30704 ("vdpa/mlx5: Support different address spaces for control and data") Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Message-Id: <20230802171231.11001-5-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-08-10vdpa/mlx5: Fix mr->initialized semanticsDragos Tatulea2-27/+71
The mr->initialized flag is shared between the control vq and data vq part of the mr init/uninit. But if the control vq and data vq get placed in different ASIDs, it can happen that initializing the control vq will prevent the data vq mr from being initialized. This patch consolidates the control and data vq init parts into their own init functions. The mr->initialized will now be used for the data vq only. The control vq currently doesn't need a flag. The uninitializing part is also taken care of: mlx5_vdpa_destroy_mr got split into data and control vq functions which are now also ASID aware. Fixes: 8fcd20c30704 ("vdpa/mlx5: Support different address spaces for control and data") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Message-Id: <20230802171231.11001-3-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-08-10vdpa/mlx5: Correct default number of queues when MQ is onDragos Tatulea1-1/+9
The standard specifies that the initial number of queues is the default, which is 1 (1 tx, 1 rx). Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230727172354.68243-2-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-08-10vduse: Use proper spinlock for IRQ injectionMaxime Coquelin1-4/+4
The IRQ injection work used spin_lock_irq() to protect the scheduling of the softirq, but spin_lock_bh() should be used. With spin_lock_irq(), we noticed delay of more than 6 seconds between the time a NAPI polling work is scheduled and the time it is executed. Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace") Cc: xieyongji@bytedance.com Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Message-Id: <20230705114505.63274-1-maxime.coquelin@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
2023-08-10vdpa: Enable strict validation for netlinks opsDragos Tatulea1-6/+0
The previous patches added the missing nla policies that were required for validation to work. Now strict validation on netlink ops can be enabled. This patch does it. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Cc: stable@vger.kernel.org Message-Id: <20230727175757.73988-9-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-08-10vdpa: Add max vqp attr to vdpa_nl_policy for nlattr length checkLin Ma1-0/+1
The vdpa_nl_policy structure is used to validate the nlattr when parsing the incoming nlmsg. It will ensure the attribute being described produces a valid nlattr pointer in info->attrs before entering into each handler in vdpa_nl_ops. That is to say, the missing part in vdpa_nl_policy may lead to illegal nlattr after parsing, which could lead to OOB read just like CVE-2023-3773. This patch adds the missing nla_policy for vdpa max vqp attr to avoid such bugs. Fixes: ad69dd0bf26b ("vdpa: Introduce query of device config layout") Signed-off-by: Lin Ma <linma@zju.edu.cn> Cc: stable@vger.kernel.org Message-Id: <20230727175757.73988-7-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-08-10vdpa: Add queue index attr to vdpa_nl_policy for nlattr length checkLin Ma1-0/+1
The vdpa_nl_policy structure is used to validate the nlattr when parsing the incoming nlmsg. It will ensure the attribute being described produces a valid nlattr pointer in info->attrs before entering into each handler in vdpa_nl_ops. That is to say, the missing part in vdpa_nl_policy may lead to illegal nlattr after parsing, which could lead to OOB read just like CVE-2023-3773. This patch adds the missing nla_policy for vdpa queue index attr to avoid such bugs. Fixes: 13b00b135665 ("vdpa: Add support for querying vendor statistics") Signed-off-by: Lin Ma <linma@zju.edu.cn> Cc: stable@vger.kernelorg Message-Id: <20230727175757.73988-5-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-08-10vdpa: Add features attr to vdpa_nl_policy for nlattr length checkLin Ma1-0/+1
The vdpa_nl_policy structure is used to validate the nlattr when parsing the incoming nlmsg. It will ensure the attribute being described produces a valid nlattr pointer in info->attrs before entering into each handler in vdpa_nl_ops. That is to say, the missing part in vdpa_nl_policy may lead to illegal nlattr after parsing, which could lead to OOB read just like CVE-2023-3773. This patch adds the missing nla_policy for vdpa features attr to avoid such bugs. Fixes: 90fea5a800c3 ("vdpa: device feature provisioning") Signed-off-by: Lin Ma <linma@zju.edu.cn> Cc: stable@vger.kernel.org Message-Id: <20230727175757.73988-3-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-08-10pds_vdpa: protect Makefile from unconfigured debugfsShannon Nelson1-2/+1
debugfs.h protects itself from an undefined DEBUG_FS, so it is not necessary to check it in the driver code or the Makefile. The driver code had been updated for this, but the Makefile had missed the update. Link: https://lore.kernel.org/linux-next/fec68c3c-8249-7af4-5390-0495386a76f9@infradead.org/ Fixes: a16291b5bcbb ("pds_vdpa: Add new vDPA driver for AMD/Pensando DSC") Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20230706231718.54198-1-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Acked-by: Jason Wang <jasowang@redhat.com>
2023-08-07net/mlx5: Allocate completion EQs dynamicallyMaher Sanalla1-1/+1
This commit enables the dynamic allocation of EQs at runtime, allowing for more flexibility in managing completion EQs and reducing the memory overhead of driver load. Whenever a CQ is created for a given vector index, the driver will lookup to see if there is an already mapped completion EQ for that vector, if so, utilize it. Otherwise, allocate a new EQ on demand and then utilize it for the CQ completion events. Add a protection lock to the EQ table to protect from concurrent EQ creation attempts. While at it, replace mlx5_vector2irqn()/mlx5_vector2eqn() with mlx5_comp_eqn_get() and mlx5_comp_irqn_get() which will allocate an EQ on demand if no EQ is found for the given vector. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-03Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds21-201/+1880
Pull virtio updates from Michael Tsirkin: - resume support in vdpa/solidrun - structure size optimizations in virtio_pci - new pds_vdpa driver - immediate initialization mechanism for vdpa/ifcvf - interrupt bypass for vdpa/mlx5 - multiple worker support for vhost - viirtio net in Intel F2000X-PL support for vdpa/ifcvf - fixes, cleanups all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (48 commits) vhost: Make parameter name match of vhost_get_vq_desc() vduse: fix NULL pointer dereference vhost: Allow worker switching while work is queueing vhost_scsi: add support for worker ioctls vhost: allow userspace to create workers vhost: replace single worker pointer with xarray vhost: add helper to parse userspace vring state/file vhost: remove vhost_work_queue vhost_scsi: flush IO vqs then send TMF rsp vhost_scsi: convert to vhost_vq_work_queue vhost_scsi: make SCSI cmd completion per vq vhost_sock: convert to vhost_vq_work_queue vhost: convert poll work to be vq based vhost: take worker or vq for flushing vhost: take worker or vq instead of dev for queueing vhost, vhost_net: add helper to check if vq has work vhost: add vhost_worker pointer to vhost_virtqueue vhost: dynamically allocate vhost_worker vhost: create worker at end of vhost_dev_set_owner virtio_bt: call scheduler when we free unused buffs ...
2023-07-03vduse: fix NULL pointer dereferenceMaxime Coquelin1-1/+5
vduse_vdpa_set_vq_affinity callback can be called with NULL value as cpu_mask when deleting the vduse device. This patch resets virtqueue's IRQ affinity mask value to set all CPUs instead of dereferencing NULL cpu_mask. [ 4760.952149] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 4760.959110] #PF: supervisor read access in kernel mode [ 4760.964247] #PF: error_code(0x0000) - not-present page [ 4760.969385] PGD 0 P4D 0 [ 4760.971927] Oops: 0000 [#1] PREEMPT SMP PTI [ 4760.976112] CPU: 13 PID: 2346 Comm: vdpa Not tainted 6.4.0-rc6+ #4 [ 4760.982291] Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 2.8.1 06/26/2020 [ 4760.989769] RIP: 0010:memcpy_orig+0xc5/0x130 [ 4760.994049] Code: 16 f8 4c 89 07 4c 89 4f 08 4c 89 54 17 f0 4c 89 5c 17 f8 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 83 fa 08 72 1b <4c> 8b 06 4c 8b 4c 16 f8 4c 89 07 4c 89 4c 17 f8 c3 cc cc cc cc 66 [ 4761.012793] RSP: 0018:ffffb1d565abb830 EFLAGS: 00010246 [ 4761.018020] RAX: ffff9f4bf6b27898 RBX: ffff9f4be23969c0 RCX: ffff9f4bcadf6400 [ 4761.025152] RDX: 0000000000000008 RSI: 0000000000000000 RDI: ffff9f4bf6b27898 [ 4761.032286] RBP: 0000000000000000 R08: 0000000000000008 R09: 0000000000000000 [ 4761.039416] R10: 0000000000000000 R11: 0000000000000600 R12: 0000000000000000 [ 4761.046549] R13: 0000000000000000 R14: 0000000000000080 R15: ffffb1d565abbb10 [ 4761.053680] FS: 00007f64c2ec2740(0000) GS:ffff9f635f980000(0000) knlGS:0000000000000000 [ 4761.061765] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4761.067513] CR2: 0000000000000000 CR3: 0000001875270006 CR4: 00000000007706e0 [ 4761.074645] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 4761.081775] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 4761.088909] PKRU: 55555554 [ 4761.091620] Call Trace: [ 4761.094074] <TASK> [ 4761.096180] ? __die+0x1f/0x70 [ 4761.099238] ? page_fault_oops+0x171/0x4f0 [ 4761.103340] ? exc_page_fault+0x7b/0x180 [ 4761.107265] ? asm_exc_page_fault+0x22/0x30 [ 4761.111460] ? memcpy_orig+0xc5/0x130 [ 4761.115126] vduse_vdpa_set_vq_affinity+0x3e/0x50 [vduse] [ 4761.120533] virtnet_clean_affinity.part.0+0x3d/0x90 [virtio_net] [ 4761.126635] remove_vq_common+0x1a4/0x250 [virtio_net] [ 4761.131781] virtnet_remove+0x5d/0x70 [virtio_net] [ 4761.136580] virtio_dev_remove+0x3a/0x90 [ 4761.140509] device_release_driver_internal+0x19b/0x200 [ 4761.145742] bus_remove_device+0xc2/0x130 [ 4761.149755] device_del+0x158/0x3e0 [ 4761.153245] ? kernfs_find_ns+0x35/0xc0 [ 4761.157086] device_unregister+0x13/0x60 [ 4761.161010] unregister_virtio_device+0x11/0x20 [ 4761.165543] device_release_driver_internal+0x19b/0x200 [ 4761.170770] bus_remove_device+0xc2/0x130 [ 4761.174782] device_del+0x158/0x3e0 [ 4761.178276] ? __pfx_vdpa_name_match+0x10/0x10 [vdpa] [ 4761.183336] device_unregister+0x13/0x60 [ 4761.187260] vdpa_nl_cmd_dev_del_set_doit+0x63/0xe0 [vdpa] Fixes: 28f6288eb63d ("vduse: Support set_vq_affinity callback") Cc: xieyongji@bytedance.com Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Message-Id: <20230622204851.318125-1-maxime.coquelin@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
2023-07-03vDPA/ifcvf: implement new accessors for vq_stateZhu Lingshan3-33/+17
This commit implements a better layout of the live migration bar, therefore the accessors for virtqueue state have been refactored. This commit also add a comment to the probing-ids list, indicating this driver drives F2000X-PL virtio-net Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20230612151420.1019504-4-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-07-03vDPA/ifcvf: detect and report max allowed vq sizeZhu Lingshan3-2/+35
Rather than a hardcode, this commit detects and reports the max value of allowed size of the virtqueues Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20230612151420.1019504-3-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-07-03vDPA/ifcvf: dynamic allocate vq data storesZhu Lingshan3-1/+6
This commit dynamically allocates the data stores for the virtqueues based on virtio_pci_common_cfg.num_queues. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20230612151420.1019504-2-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-28Merge tag 'mm-stable-2023-06-24-19-15' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm updates from Andrew Morton: - Yosry Ahmed brought back some cgroup v1 stats in OOM logs - Yosry has also eliminated cgroup's atomic rstat flushing - Nhat Pham adds the new cachestat() syscall. It provides userspace with the ability to query pagecache status - a similar concept to mincore() but more powerful and with improved usability - Mel Gorman provides more optimizations for compaction, reducing the prevalence of page rescanning - Lorenzo Stoakes has done some maintanance work on the get_user_pages() interface - Liam Howlett continues with cleanups and maintenance work to the maple tree code. Peng Zhang also does some work on maple tree - Johannes Weiner has done some cleanup work on the compaction code - David Hildenbrand has contributed additional selftests for get_user_pages() - Thomas Gleixner has contributed some maintenance and optimization work for the vmalloc code - Baolin Wang has provided some compaction cleanups, - SeongJae Park continues maintenance work on the DAMON code - Huang Ying has done some maintenance on the swap code's usage of device refcounting - Christoph Hellwig has some cleanups for the filemap/directio code - Ryan Roberts provides two patch series which yield some rationalization of the kernel's access to pte entries - use the provided APIs rather than open-coding accesses - Lorenzo Stoakes has some fixes to the interaction between pagecache and directio access to file mappings - John Hubbard has a series of fixes to the MM selftesting code - ZhangPeng continues the folio conversion campaign - Hugh Dickins has been working on the pagetable handling code, mainly with a view to reducing the load on the mmap_lock - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment from 128 to 8 - Domenico Cerasuolo has improved the zswap reclaim mechanism by reorganizing the LRU management - Matthew Wilcox provides some fixups to make gfs2 work better with the buffer_head code - Vishal Moola also has done some folio conversion work - Matthew Wilcox has removed the remnants of the pagevec code - their functionality is migrated over to struct folio_batch * tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits) mm/hugetlb: remove hugetlb_set_page_subpool() mm: nommu: correct the range of mmap_sem_read_lock in task_mem() hugetlb: revert use of page_cache_next_miss() Revert "page cache: fix page_cache_next/prev_miss off by one" mm/vmscan: fix root proactive reclaim unthrottling unbalanced node mm: memcg: rename and document global_reclaim() mm: kill [add|del]_page_to_lru_list() mm: compaction: convert to use a folio in isolate_migratepages_block() mm: zswap: fix double invalidate with exclusive loads mm: remove unnecessary pagevec includes mm: remove references to pagevec mm: rename invalidate_mapping_pagevec to mapping_try_invalidate mm: remove struct pagevec net: convert sunrpc from pagevec to folio_batch i915: convert i915_gpu_error to use a folio_batch pagevec: rename fbatch_count() mm: remove check_move_unevictable_pages() drm: convert drm_gem_put_pages() to use a folio_batch i915: convert shmem_sg_free_table() to use a folio_batch scatterlist: add sg_set_folio() ...
2023-06-27vdpa/mlx5: Support interrupt bypassingEli Cohen2-9/+171
Add support for generation of interrupts from the device directly to the VM to the VCPU thus avoiding the overhead on the host CPU. When supported, the driver will attempt to allocate vectors for each data virtqueue. If a vector for a virtqueue cannot be provided it will use the QP mode where notifications go through the driver. In addition, we add a shutdown callback to make sure allocated interrupts are released in case of shutdown to allow clean shutdown. Signed-off-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Message-Id: <20230607190007.290505-1-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-27pds_vdpa: pds_vdps.rst and KconfigShannon Nelson1-0/+10
Add the documentation and Kconfig entry for pds_vdpa driver. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230519215632.12343-12-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-27pds_vdpa: subscribe to the pds_core eventsShannon Nelson2-1/+59
Register for the pds_core's notification events, primarily to find out when the FW has been reset so we can pass this on back up the chain. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230519215632.12343-11-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-27pds_vdpa: add support for vdpa and vdpamgmt interfacesShannon Nelson6-2/+892
This is the vDPA device support, where we advertise that we can support the virtio queues and deal with the configuration work through the pds_core's adminq. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20230519215632.12343-10-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-06-27pds_vdpa: add vdpa config client commandsShannon Nelson4-1/+236
These are the adminq commands that will be needed for setting up and using the vDPA device. There are a number of commands defined in the FW's API, but by making use of the FW's virtio BAR we only need a few of these commands for vDPA support. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230519215632.12343-9-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-27pds_vdpa: virtio bar setup for vdpaShannon Nelson2-0/+28
Prep and use the "modern" virtio bar utilities to get our virtio config space ready. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230519215632.12343-8-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-27pds_vdpa: get vdpa management infoShannon Nelson6-1/+150
Find the vDPA management information from the DSC in order to advertise it to the vdpa subsystem. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230519215632.12343-7-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-27pds_vdpa: Add new vDPA driver for AMD/Pensando DSCShannon Nelson6-0/+144
This is the initial auxiliary driver framework for a new vDPA device driver, an auxiliary_bus client of the pds_core driver. The pds_core driver supplies the PCI services for the VF device and for accessing the adminq in the PF device. This patch adds the very basics of registering for the auxiliary device and setting up debugfs entries. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230519215632.12343-4-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-27vdpa/snet: implement the resume vDPA callbackAlvaro Karsz3-0/+22
The callback sends a resume command to the DPU through the control mechanism. Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Message-Id: <20230502131048.61134-1-alvaro.karsz@solid-run.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-06-27vdpa: solidrun: constify pointers to hwmon_channel_infoKrzysztof Kozlowski1-1/+1
Statically allocated array of pointers to hwmon_channel_info can be made const for safety. Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20230511175451.282096-1-krzysztof.kozlowski@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-06-27vDPA/ifcvf: a vendor driver should not set _CONFIG_S_FAILEDZhu Lingshan1-3/+1
VIRTIO_CONFIG_S_FAILED indicates the guest driver has given up the device due to fatal errors. So it is the guest decision, the vendor driver should not set this status to the device. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230526145254.39537-6-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-27vDPA/ifcvf: synchronize irqs in the reset routineZhu Lingshan3-52/+40
This commit synchronize irqs of the virtqueues and config space in the reset routine. Thus ifcvf_stop() and reset() are refactored as well. This commit renames ifcvf_stop_hw() to ifcvf_stop() Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230526145254.39537-5-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-27vDPA/ifcvf: retire ifcvf_start_datapath and ifcvf_add_statusZhu Lingshan3-43/+0
Rather than former lazy-initialization mechanism, now the virtqueue operations and driver_features related ops access the virtio registers directly to take immediate actions. So ifcvf_start_datapath() should retire. ifcvf_add_status() is retierd because we should not change device status by a vendor driver's decision, this driver should only set device status which is from virito drivers upon vdpa_ops.set_status() Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230526145254.39537-4-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-27vDPA/ifcvf: get_driver_features from virtio registersZhu Lingshan3-23/+29
This commit implements a new function ifcvf_get_driver_feature() which read driver_features from virtio registers. To be less ambiguous, ifcvf_set_features() is renamed to ifcvf_set_driver_features(), and ifcvf_get_features() is renamed to ifcvf_get_dev_features() which returns the provisioned vDPA device features. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230526145254.39537-3-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-27vDPA/ifcvf: virt queue ops take immediate actionsZhu Lingshan3-39/+45
In this commit, virtqueue operations including: set_vq_num(), set_vq_address(), set_vq_ready() and get_vq_ready() access PCI registers directly to take immediate actions. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230526145254.39537-2-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-09mm/gup: remove vmas parameter from pin_user_pages()Lorenzo Stoakes1-1/+1
We are now in a position where no caller of pin_user_pages() requires the vmas parameter at all, so eliminate this parameter from the function and all callers. This clears the way to removing the vmas parameter from GUP altogether. Link: https://lkml.kernel.org/r/195a99ae949c9f5cb589d2222b736ced96ec199a.1684350871.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> [qib] Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> [drivers/media] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian König <christian.koenig@amd.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-08vdpa/mlx5: Fix hang when cvq commands are triggered during device unregisterDragos Tatulea1-1/+1
Currently the vdpa device is unregistered after the workqueue that processes vq commands is disabled. However, the device unregister process can still send commands to the cvq (a vlan delete for example) which leads to a hang because the handing workqueue has been disabled and the command never finishes: [ 2263.095764] rcu: INFO: rcu_sched self-detected stall on CPU [ 2263.096307] rcu: 9-....: (5250 ticks this GP) idle=dac4/1/0x4000000000000000 softirq=111009/111009 fqs=2544 [ 2263.097154] rcu: (t=5251 jiffies g=393549 q=347 ncpus=10) [ 2263.097648] CPU: 9 PID: 94300 Comm: kworker/u20:2 Not tainted 6.3.0-rc6_for_upstream_min_debug_2023_04_14_00_02 #1 [ 2263.098535] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 2263.099481] Workqueue: mlx5_events mlx5_vhca_state_work_handler [mlx5_core] [ 2263.100143] RIP: 0010:virtnet_send_command+0x109/0x170 [ 2263.100621] Code: 1d df f5 ff 85 c0 78 5c 48 8b 7b 08 e8 d0 c5 f5 ff 84 c0 75 11 eb 22 48 8b 7b 08 e8 01 b7 f5 ff 84 c0 75 15 f3 90 48 8b 7b 08 <48> 8d 74 24 04 e8 8d c5 f5 ff 48 85 c0 74 de 48 8b 83 f8 00 00 00 [ 2263.102148] RSP: 0018:ffff888139cf36e8 EFLAGS: 00000246 [ 2263.102624] RAX: 0000000000000000 RBX: ffff888166bea940 RCX: 0000000000000001 [ 2263.103244] RDX: 0000000000000000 RSI: ffff888139cf36ec RDI: ffff888146763800 [ 2263.103864] RBP: ffff888139cf3710 R08: ffff88810d201000 R09: 0000000000000000 [ 2263.104473] R10: 0000000000000002 R11: 0000000000000003 R12: 0000000000000002 [ 2263.105082] R13: 0000000000000002 R14: ffff888114528400 R15: ffff888166bea000 [ 2263.105689] FS: 0000000000000000(0000) GS:ffff88852cc80000(0000) knlGS:0000000000000000 [ 2263.106404] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2263.106925] CR2: 00007f31f394b000 CR3: 000000010615b006 CR4: 0000000000370ea0 [ 2263.107542] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2263.108163] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 2263.108769] Call Trace: [ 2263.109059] <TASK> [ 2263.109320] ? check_preempt_wakeup+0x11f/0x230 [ 2263.109750] virtnet_vlan_rx_kill_vid+0x5a/0xa0 [ 2263.110180] vlan_vid_del+0x9c/0x170 [ 2263.110546] vlan_device_event+0x351/0x760 [8021q] [ 2263.111004] raw_notifier_call_chain+0x41/0x60 [ 2263.111426] dev_close_many+0xcb/0x120 [ 2263.111808] unregister_netdevice_many_notify+0x130/0x770 [ 2263.112297] ? wq_worker_running+0xa/0x30 [ 2263.112688] unregister_netdevice_queue+0x89/0xc0 [ 2263.113128] unregister_netdev+0x18/0x20 [ 2263.113512] virtnet_remove+0x4f/0x230 [ 2263.113885] virtio_dev_remove+0x31/0x70 [ 2263.114273] device_release_driver_internal+0x18f/0x1f0 [ 2263.114746] bus_remove_device+0xc6/0x130 [ 2263.115146] device_del+0x173/0x3c0 [ 2263.115502] ? kernfs_find_ns+0x35/0xd0 [ 2263.115895] device_unregister+0x1a/0x60 [ 2263.116279] unregister_virtio_device+0x11/0x20 [ 2263.116706] device_release_driver_internal+0x18f/0x1f0 [ 2263.117182] bus_remove_device+0xc6/0x130 [ 2263.117576] device_del+0x173/0x3c0 [ 2263.117929] ? vdpa_dev_remove+0x20/0x20 [vdpa] [ 2263.118364] device_unregister+0x1a/0x60 [ 2263.118752] mlx5_vdpa_dev_del+0x4c/0x80 [mlx5_vdpa] [ 2263.119232] vdpa_match_remove+0x21/0x30 [vdpa] [ 2263.119663] bus_for_each_dev+0x71/0xc0 [ 2263.120054] vdpa_mgmtdev_unregister+0x57/0x70 [vdpa] [ 2263.120520] mlx5v_remove+0x12/0x20 [mlx5_vdpa] [ 2263.120953] auxiliary_bus_remove+0x18/0x30 [ 2263.121356] device_release_driver_internal+0x18f/0x1f0 [ 2263.121830] bus_remove_device+0xc6/0x130 [ 2263.122223] device_del+0x173/0x3c0 [ 2263.122581] ? devl_param_driverinit_value_get+0x29/0x90 [ 2263.123070] mlx5_rescan_drivers_locked+0xc4/0x2d0 [mlx5_core] [ 2263.123633] mlx5_unregister_device+0x54/0x80 [mlx5_core] [ 2263.124169] mlx5_uninit_one+0x54/0x150 [mlx5_core] [ 2263.124656] mlx5_sf_dev_remove+0x45/0x90 [mlx5_core] [ 2263.125153] auxiliary_bus_remove+0x18/0x30 [ 2263.125560] device_release_driver_internal+0x18f/0x1f0 [ 2263.126052] bus_remove_device+0xc6/0x130 [ 2263.126451] device_del+0x173/0x3c0 [ 2263.126815] mlx5_sf_dev_remove+0x39/0xf0 [mlx5_core] [ 2263.127318] mlx5_sf_dev_state_change_handler+0x178/0x270 [mlx5_core] [ 2263.127920] blocking_notifier_call_chain+0x5a/0x80 [ 2263.128379] mlx5_vhca_state_work_handler+0x151/0x200 [mlx5_core] [ 2263.128951] process_one_work+0x1bb/0x3c0 [ 2263.129355] ? process_one_work+0x3c0/0x3c0 [ 2263.129766] worker_thread+0x4d/0x3c0 [ 2263.130140] ? process_one_work+0x3c0/0x3c0 [ 2263.130548] kthread+0xb9/0xe0 [ 2263.130895] ? kthread_complete_and_exit+0x20/0x20 [ 2263.131349] ret_from_fork+0x1f/0x30 [ 2263.131717] </TASK> The fix is to disable and destroy the workqueue after the device unregister. It is expected that vhost will not trigger kicks after the unregister. But even if it would, the wq is disabled already by setting the pointer to NULL (done so in the referenced commit). Fixes: ad6dc1daaf29 ("vdpa/mlx5: Avoid processing works if workqueue was destroyed") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20230516095800.3549932-1-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-06-08vduse: avoid empty string for dev nameSheng Zhao1-0/+3
Syzkaller hits a kernel WARN when the first character of the dev name provided is NULL. Solution is to add a NULL check before calling cdev_device_add() in vduse_create_dev(). kobject: (0000000072042169): attempted to be registered with empty name! WARNING: CPU: 0 PID: 112695 at lib/kobject.c:236 Call Trace: kobject_add_varg linux/src/lib/kobject.c:390 [inline] kobject_add+0xf6/0x150 linux/src/lib/kobject.c:442 device_add+0x28f/0xc20 linux/src/drivers/base/core.c:2167 cdev_device_add+0x83/0xc0 linux/src/fs/char_dev.c:546 vduse_create_dev linux/src/drivers/vdpa/vdpa_user/vduse_dev.c:2254 [inline] vduse_ioctl+0x7b5/0xf30 linux/src/drivers/vdpa/vdpa_user/vduse_dev.c:2316 vfs_ioctl linux/src/fs/ioctl.c:47 [inline] file_ioctl linux/src/fs/ioctl.c:510 [inline] do_vfs_ioctl+0x14b/0xa80 linux/src/fs/ioctl.c:697 ksys_ioctl+0x7c/0xa0 linux/src/fs/ioctl.c:714 __do_sys_ioctl linux/src/fs/ioctl.c:721 [inline] __se_sys_ioctl linux/src/fs/ioctl.c:719 [inline] __x64_sys_ioctl+0x42/0x50 linux/src/fs/ioctl.c:719 do_syscall_64+0x94/0x330 linux/src/arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace") Cc: "Xie Yongji" <xieyongji@bytedance.com> Reported-by: Xianjun Zeng <zengxianjun@bytedance.com> Signed-off-by: Sheng Zhao <sheng.zhao@bytedance.com> Message-Id: <20230530033626.1266794-1-sheng.zhao@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com> Cc: "Michael S. Tsirkin"<mst@redhat.com>, "Jason Wang"<jasowang@redhat.com>, Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
2023-04-27Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds11-309/+1178
Pull virtio updates from Michael Tsirkin: "virtio,vhost,vdpa: features, fixes, and cleanups: - reduction in interrupt rate in virtio - perf improvement for VDUSE - scalability for vhost-scsi - non power of 2 ring support for packed rings - better management for mlx5 vdpa - suspend for snet - VIRTIO_F_NOTIFICATION_DATA - shared backend with vdpa-sim-blk - user VA support in vdpa-sim - better struct packing for virtio and fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (52 commits) vhost_vdpa: fix unmap process in no-batch mode MAINTAINERS: make me a reviewer of VIRTIO CORE AND NET DRIVERS tools/virtio: fix build caused by virtio_ring changes virtio_ring: add a struct device forward declaration vdpa_sim_blk: support shared backend vdpa_sim: move buffer allocation in the devices vdpa/snet: use likely/unlikely macros in hot functions vdpa/snet: implement kick_vq_with_data callback virtio-vdpa: add VIRTIO_F_NOTIFICATION_DATA feature support virtio: add VIRTIO_F_NOTIFICATION_DATA feature support vdpa/snet: support the suspend vDPA callback vdpa/snet: support getting and setting VQ state MAINTAINERS: add vringh.h to Virtio Core and Net Drivers vringh: address kdoc warnings vdpa: address kdoc warnings virtio_ring: don't update event idx on get_buf vdpa_sim: add support for user VA vdpa_sim: replace the spinlock with a mutex to protect the state vdpa_sim: use kthread worker vdpa_sim: make devices agnostic for work management ...
2023-04-27Merge tag 'driver-core-6.4-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the large set of driver core changes for 6.4-rc1. Once again, a busy development cycle, with lots of changes happening in the driver core in the quest to be able to move "struct bus" and "struct class" into read-only memory, a task now complete with these changes. This will make the future rust interactions with the driver core more "provably correct" as well as providing more obvious lifetime rules for all busses and classes in the kernel. The changes required for this did touch many individual classes and busses as many callbacks were changed to take const * parameters instead. All of these changes have been submitted to the various subsystem maintainers, giving them plenty of time to review, and most of them actually did so. Other than those changes, included in here are a small set of other things: - kobject logging improvements - cacheinfo improvements and updates - obligatory fw_devlink updates and fixes - documentation updates - device property cleanups and const * changes - firwmare loader dependency fixes. All of these have been in linux-next for a while with no reported problems" * tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits) device property: make device_property functions take const device * driver core: update comments in device_rename() driver core: Don't require dynamic_debug for initcall_debug probe timing firmware_loader: rework crypto dependencies firmware_loader: Strip off \n from customized path zram: fix up permission for the hot_add sysfs file cacheinfo: Add use_arch[|_cache]_info field/function arch_topology: Remove early cacheinfo error message if -ENOENT cacheinfo: Check cache properties are present in DT cacheinfo: Check sib_leaf in cache_leaves_are_shared() cacheinfo: Allow early level detection when DT/ACPI info is missing/broken cacheinfo: Add arm64 early level initializer implementation cacheinfo: Add arch specific early level initializer tty: make tty_class a static const structure driver core: class: remove struct class_interface * from callbacks driver core: class: mark the struct class in struct class_interface constant driver core: class: make class_register() take a const * driver core: class: mark class_release() as taking a const * driver core: remove incorrect comment for device_create* MIPS: vpe-cmp: remove module owner pointer from struct class usage. ...
2023-04-21vdpa_sim_blk: support shared backendStefano Garzarella1-7/+50
The vdpa_sim_blk simulator uses a ramdisk as the backend. To test live migration, we need two devices that share the backend to have the data synchronized with each other. Add a new module parameter to make the buffer shared between all devices. The shared_buffer_mutex is used just to ensure that each operation is atomic, but it is up to the user to use the devices knowing that the underlying ramdisk is shared. For example, when we do a migration, the VMM (e.g., QEMU) will guarantee to write to the destination device, only after completing operations with the source device. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230407133658.66339-3-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vdpa_sim: move buffer allocation in the devicesStefano Garzarella4-21/+57
Currently, the vdpa_sim core does not use the buffer, but only allocates it. The buffer is used by devices differently, and some future devices may not use it. So let's move all its management inside the devices. Add a new `free` device callback called to clean up the resources allocated by the device. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230407133658.66339-2-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vdpa/snet: use likely/unlikely macros in hot functionsAlvaro Karsz1-3/+3
- kick callback: most likely that the VQ is ready. - interrupt handlers: most likely that the callback is not NULL. Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Message-Id: <20230409120242.3460074-1-alvaro.karsz@solid-run.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vdpa/snet: implement kick_vq_with_data callbackAlvaro Karsz1-0/+13
Implement the kick_vq_with_data vDPA callback. On kick, we pass the next available data to the DPU by writing it in the kick offset. Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Message-Id: <20230417083853.375076-1-alvaro.karsz@solid-run.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vdpa/snet: support the suspend vDPA callbackAlvaro Karsz3-0/+22
When suspend is called, the driver sends a suspend command to the DPU through the control mechanism. Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Message-Id: <20230413073337.31367-3-alvaro.karsz@solid-run.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-21vdpa/snet: support getting and setting VQ stateAlvaro Karsz5-71/+387
This patch adds the get_vq_state and set_vq_state vDPA callbacks. In order to get the VQ state, the state needs to be read from the DPU. In order to allow that, the old messaging mechanism is replaced with a new, flexible control mechanism. This mechanism allows to read data from the DPU. The mechanism can be used if the negotiated config version is 2 or higher. If the new mechanism is used when the config version is 1, it will call snet_send_ctrl_msg_old, which is config 1 compatible. Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Message-Id: <20230413073337.31367-2-alvaro.karsz@solid-run.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-21vdpa_sim: add support for user VAStefano Garzarella2-8/+90
The new "use_va" module parameter (default: true) is used in vdpa_alloc_device() to inform the vDPA framework that the device supports VA. vringh is initialized to use VA only when "use_va" is true and the user's mm has been bound. So, only when the bus supports user VA (e.g. vhost-vdpa). vdpasim_mm_work_fn work is used to serialize the binding to a new address space when the .bind_mm callback is invoked, and unbinding when the .unbind_mm callback is invoked. Call mmget_not_zero()/kthread_use_mm() inside the worker function to pin the address space only as long as needed, following the documentation of mmget() in include/linux/sched/mm.h: * Never use this function to pin this address space for an * unbounded/indefinite amount of time. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230404131734.45943-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vdpa_sim: replace the spinlock with a mutex to protect the stateStefano Garzarella4-23/+23
The spinlock we use to protect the state of the simulator is sometimes held for a long time (for example, when devices handle requests). This also prevents us from calling functions that might sleep (such as kthread_flush_work() in the next patch), and thus having to release and retake the lock. For these reasons, let's replace the spinlock with a mutex that gives us more flexibility. Suggested-by: Jason Wang <jasowang@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230404131730.45920-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vdpa_sim: use kthread workerStefano Garzarella2-7/+15
Let's use our own kthread to run device jobs. This allows us more flexibility, especially we can attach the kthread to the user address space when vDPA uses user's VA. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230404131725.45908-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vdpa_sim: make devices agnostic for work managementStefano Garzarella4-11/+21
Let's move work management inside the vdpa_sim core. This way we can easily change how we manage the works, without having to change the devices each time. Acked-by: Eugenio Pérez Martin <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230404131721.45886-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vduse: Support specifying bounce buffer size via sysfsXie Yongji1-1/+44
As discussed in [1], this adds sysfs interface to support specifying bounce buffer size in virtio-vdpa case. It would be a performance tuning parameter for high throughput workloads. [1] https://lore.kernel.org/netdev/e8f25a35-9d45-69f9-795d-bdbbb90337a3@redhat.com/ Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230323053043.35-12-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vduse: Delay iova domain creationXie Yongji1-22/+53
Delay creating iova domain until the vduse device is registered to vdpa bus. This is a preparation for adding sysfs interface to support specifying bounce buffer size for the iova domain. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230323053043.35-11-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vduse: Signal vq trigger eventfd directly if possibleXie Yongji1-4/+26
Now the vdpa callback will associate an trigger eventfd in some cases. For performance reasons, VDUSE can signal it directly during irq injection. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230323053043.35-10-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vduse: Add sysfs interface for irq callback affinityXie Yongji1-11/+113
Add sysfs interface for each vduse virtqueue to get/set the affinity for irq callback. This might be useful for performance tuning when the irq callback affinity mask contains more than one CPU. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20230323053043.35-8-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-21vduse: Support get_vq_affinity callbackXie Yongji1-0/+9
This implements get_vq_affinity callback so that the virtio-blk driver can build the blk-mq queues based on the irq callback affinity. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20230323053043.35-7-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-21vduse: Support set_vq_affinity callbackXie Yongji1-7/+54
Since virtio-vdpa bus driver already support interrupt affinity spreading mechanism, let's implement the set_vq_affinity callback to bring it to vduse device. After we get the virtqueue's affinity, we can spread IRQs between CPUs in the affinity mask, in a round-robin manner, to run the irq callback. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20230323053043.35-6-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-21vduse: Refactor allocation for vduse virtqueuesXie Yongji1-32/+66
Allocate memory for vduse virtqueues one by one instead of doing one allocation for all of them. This is a preparation for adding sysfs interface for virtqueues. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230323053043.35-5-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vdpa/mlx5: Extend driver support for new featuresEli Cohen1-16/+40
Extend the possible list for features that can be supported by firmware. Note that different versions of firmware may or may not support these features. The driver is made aware of them by querying the firmware. While doing this, improve the code so we use enum names instead of hard coded numerical values. The new features supported by the driver are the following: VIRTIO_NET_F_MRG_RXBUF VIRTIO_NET_F_HOST_ECN VIRTIO_NET_F_GUEST_ECN VIRTIO_NET_F_GUEST_TSO6 VIRTIO_NET_F_GUEST_TSO4 Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20230321112809.221432-3-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eugenio Pérez Martin <eperezma@redhat.com>
2023-04-21vdpa/mlx5: Make VIRTIO_NET_F_MRG_RXBUF off by defaultEli Cohen1-0/+2
Following patch adds driver support for VIRTIO_NET_F_MRG_RXBUF. Current firmware versions show degradation in packet rate when using MRG_RXBUF. Users who favor memory saving over packet rate could enable this feature but we want to keep it off by default. One can still enable it when creating the vdpa device using vdpa tool by providing features that include it. For example: $ vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 device_features 0x300cb982b Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20230321112809.221432-2-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
2023-04-21vdpa/mlx5: Avoid losing link state updatesEli Cohen1-89/+114
Current code ignores link state updates if VIRTIO_NET_F_STATUS was not negotiated. However, link state updates could be received before feature negotiation was completed , therefore causing link state events to be lost, possibly leaving the link state down. Modify the code so link state notifier is registered after DRIVER_OK was negotiated and carry the registration only if VIRTIO_NET_F_STATUS was negotiated. Unregister the notifier when the device is reset. Fixes: 033779a708f0 ("vdpa/mlx5: make MTU/STATUS presence conditional on feature bits") Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20230417110343.138319-1-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-04vdpa_sim_net: complete the initialization before register the deviceStefano Garzarella1-4/+9
Initialization must be completed before calling _vdpa_register_device() since it can connect the device to the vDPA bus, so requests can arrive after that call. So for example vdpasim_net_work(), which uses the net->*_stats variables, can be scheduled before they are initialized. Let's move _vdpa_register_device() to the end of vdpasim_net_dev_add() and add a comment to avoid future issues. Fixes: 0899774cb360 ("vdpa_sim_net: vendor satistics") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230329160321.187176-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-04vdpa/mlx5: Add and remove debugfs in setup/teardown driverEli Cohen1-2/+6
The right place to add the debugfs create is in setup_driver() and remove it in teardown_driver(). Current code adds the debugfs when creating the device but resetting a device will remove the debugfs subtree and subsequent set_driver will not be able to create the files since the debugfs pointer is NULL. Fixes: 294221004322 ("vdpa/mlx5: Add debugfs subtree") Signed-off-by: Eli Cohen <elic@nvidia.com> v3 -> v4: Fix error flow in setup_driver() Message-Id: <20230403114039.11102-1-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-03Merge 6.3-rc5 into driver-core-nextGreg Kroah-Hartman4-2/+18
We need the fixes in here for testing, as well as the driver core changes for documentation updates to build on. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17driver core: class: remove module * from class_create()Greg Kroah-Hartman1-1/+1
The module pointer in class_create() never actually did anything, and it shouldn't have been requred to be set as a parameter even if it did something. So just remove it and fix up all callers of the function in the kernel tree at the same time. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Link: https://lore.kernel.org/r/20230313181843.1207845-4-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-13vdpa_sim: set last_used_idx as last_avail_idx in vdpasim_queue_readyEugenio Pérez1-0/+11
Starting from an used_idx different than 0 is needed in use cases like virtual machine migration. Not doing so and letting the caller set an avail idx different than 0 causes destination device to try to use old buffers that source driver already recover and are not available anymore. Since vdpa_sim does not support receive inflight descriptors as a destination of a migration, let's set both avail_idx and used_idx the same at vq start. This is how vhost-user works in a VHOST_SET_VRING_BASE call. Although the simple fix is to set last_used_idx at vdpasim_set_vq_state, it would be reset at vdpasim_queue_ready. The last_avail_idx case is fixed with commit 0e84f918fac8 ("vdpa_sim: not reset state in vdpasim_queue_ready"). Since the only option is to make it equal to last_avail_idx, adding the only change needed here. This was discovered and tested live migrating the vdpa_sim_net device. Fixes: 2c53d0f64c06 ("vdpasim: vDPA device simulator") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230302181857.925374-1-eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-10vdpa/mlx5: should not activate virtq object when suspendedSi-Wei Liu2-1/+6
Otherwise the virtqueue object to instate could point to invalid address that was unmapped from the MTT: mlx5_core 0000:41:04.2: mlx5_cmd_out_err:782:(pid 8321): CREATE_GENERAL_OBJECT(0xa00) op_mod(0xd) failed, status bad parameter(0x3), syndrome (0x5fa1c), err(-22) Fixes: cae15c2ed8e6 ("vdpa/mlx5: Implement susupend virtqueue callback") Cc: Eli Cohen <elic@nvidia.com> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Message-Id: <1676424640-11673-1-git-send-email-si-wei.liu@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-03-10vp_vdpa: fix the crash in hot unplug with vp_vdpaCindy Lu1-1/+1
While unplugging the vp_vdpa device, it triggers a kernel panic The root cause is: vdpa_mgmtdev_unregister() will accesses modern devices which will cause a use after free. So need to change the sequence in vp_vdpa_remove [ 195.003359] BUG: unable to handle page fault for address: ff4e8beb80199014 [ 195.004012] #PF: supervisor read access in kernel mode [ 195.004486] #PF: error_code(0x0000) - not-present page [ 195.004960] PGD 100000067 P4D 1001b6067 PUD 1001b7067 PMD 1001b8067 PTE 0 [ 195.005578] Oops: 0000 1 PREEMPT SMP PTI [ 195.005968] CPU: 13 PID: 164 Comm: kworker/u56:10 Kdump: loaded Not tainted 5.14.0-252.el9.x86_64 #1 [ 195.006792] Hardware name: Red Hat KVM/RHEL, BIOS edk2-20221207gitfff6d81270b5-2.el9 unknown [ 195.007556] Workqueue: kacpi_hotplug acpi_hotplug_work_fn [ 195.008059] RIP: 0010:ioread8+0x31/0x80 [ 195.008418] Code: 77 28 48 81 ff 00 00 01 00 76 0b 89 fa ec 0f b6 c0 c3 cc cc cc cc 8b 15 ad 72 93 01 b8 ff 00 00 00 85 d2 75 0f c3 cc cc cc cc <8a> 07 0f b6 c0 c3 cc cc cc cc 83 ea 01 48 83 ec 08 48 89 fe 48 c7 [ 195.010104] RSP: 0018:ff4e8beb8067bab8 EFLAGS: 00010292 [ 195.010584] RAX: ffffffffc05834a0 RBX: ffffffffc05843c0 RCX: ff4e8beb8067bae0 [ 195.011233] RDX: ff1bcbd580f88000 RSI: 0000000000000246 RDI: ff4e8beb80199014 [ 195.011881] RBP: ff1bcbd587e39000 R08: ffffffff916fa2d0 R09: ff4e8beb8067ba68 [ 195.012527] R10: 000000000000001c R11: 0000000000000000 R12: ff1bcbd5a3de9120 [ 195.013179] R13: ffffffffc062d000 R14: 0000000000000080 R15: ff1bcbe402bc7805 [ 195.013826] FS: 0000000000000000(0000) GS:ff1bcbe402740000(0000) knlGS:0000000000000000 [ 195.014564] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 195.015093] CR2: ff4e8beb80199014 CR3: 0000000107dea002 CR4: 0000000000771ee0 [ 195.015741] PKRU: 55555554 [ 195.016001] Call Trace: [ 195.016233] <TASK> [ 195.016434] vp_modern_get_status+0x12/0x20 [ 195.016823] vp_vdpa_reset+0x1b/0x50 [vp_vdpa] [ 195.017238] virtio_vdpa_reset+0x3c/0x48 [virtio_vdpa] [ 195.017709] remove_vq_common+0x1f/0x3a0 [virtio_net] [ 195.018178] virtnet_remove+0x5d/0x70 [virtio_net] [ 195.018618] virtio_dev_remove+0x3d/0x90 [ 195.018986] device_release_driver_internal+0x1aa/0x230 [ 195.019466] bus_remove_device+0xd8/0x150 [ 195.019841] device_del+0x18b/0x3f0 [ 195.020167] ? kernfs_find_ns+0x35/0xd0 [ 195.020526] device_unregister+0x13/0x60 [ 195.020894] unregister_virtio_device+0x11/0x20 [ 195.021311] device_release_driver_internal+0x1aa/0x230 [ 195.021790] bus_remove_device+0xd8/0x150 [ 195.022162] device_del+0x18b/0x3f0 [ 195.022487] device_unregister+0x13/0x60 [ 195.022852] ? vdpa_dev_remove+0x30/0x30 [vdpa] [ 195.023270] vp_vdpa_dev_del+0x12/0x20 [vp_vdpa] [ 195.023694] vdpa_match_remove+0x2b/0x40 [vdpa] [ 195.024115] bus_for_each_dev+0x78/0xc0 [ 195.024471] vdpa_mgmtdev_unregister+0x65/0x80 [vdpa] [ 195.024937] vp_vdpa_remove+0x23/0x40 [vp_vdpa] [ 195.025353] pci_device_remove+0x36/0xa0 [ 195.025719] device_release_driver_internal+0x1aa/0x230 [ 195.026201] pci_stop_bus_device+0x6c/0x90 [ 195.026580] pci_stop_and_remove_bus_device+0xe/0x20 [ 195.027039] disable_slot+0x49/0x90 [ 195.027366] acpiphp_disable_and_eject_slot+0x15/0x90 [ 195.027832] hotplug_event+0xea/0x210 [ 195.028171] ? hotplug_event+0x210/0x210 [ 195.028535] acpiphp_hotplug_notify+0x22/0x80 [ 195.028942] ? hotplug_event+0x210/0x210 [ 195.029303] acpi_device_hotplug+0x8a/0x1d0 [ 195.029690] acpi_hotplug_work_fn+0x1a/0x30 [ 195.030077] process_one_work+0x1e8/0x3c0 [ 195.030451] worker_thread+0x50/0x3b0 [ 195.030791] ? rescuer_thread+0x3a0/0x3a0 [ 195.031165] kthread+0xd9/0x100 [ 195.031459] ? kthread_complete_and_exit+0x20/0x20 [ 195.031899] ret_from_fork+0x22/0x30 [ 195.032233] </TASK> Fixes: ffbda8e9df10 ("vdpa/vp_vdpa : add vdpa tool support in vp_vdpa") Tested-by: Lei Yang <leiyang@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Cindy Lu <lulu@redhat.com> Message-Id: <20230214080924.131462-1-lulu@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-02-25Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds20-382/+2433
Pull virtio updates from Michael Tsirkin: - device feature provisioning in ifcvf, mlx5 - new SolidNET driver - support for zoned block device in virtio blk - numa support in virtio pmem - VIRTIO_F_RING_RESET support in vhost-net - more debugfs entries in mlx5 - resume support in vdpa - completion batching in virtio blk - cleanup of dma api use in vdpa - now simulating more features in vdpa-sim - documentation, features, fixes all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (64 commits) vdpa/mlx5: support device features provisioning vdpa/mlx5: make MTU/STATUS presence conditional on feature bits vdpa: validate device feature provisioning against supported class vdpa: validate provisioned device features against specified attribute vdpa: conditionally read STATUS in config space vdpa: fix improper error message when adding vdpa dev vdpa/mlx5: Initialize CVQ iotlb spinlock vdpa/mlx5: Don't clear mr struct on destroy MR vdpa/mlx5: Directly assign memory key tools/virtio: enable to build with retpoline vringh: fix a typo in comments for vringh_kiov vhost-vdpa: print warning when vhost_vdpa_alloc_domain fails scsi: virtio_scsi: fix handling of kmalloc failure vdpa: Fix a couple of spelling mistakes in some messages vhost-net: support VIRTIO_F_RING_RESET vhost-scsi: convert sysfs snprintf and sprintf to sysfs_emit vdpa: mlx5: support per virtqueue dma device vdpa: set dma mask for vDPA device virtio-vdpa: support per vq dma device vdpa: introduce get_vq_dma_device() ...
2023-02-23Merge tag 'mm-stable-2023-02-20-13-37' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Daniel Verkamp has contributed a memfd series ("mm/memfd: add F_SEAL_EXEC") which permits the setting of the memfd execute bit at memfd creation time, with the option of sealing the state of the X bit. - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare") which addresses a rare race condition related to PMD unsharing. - Several folioification patch serieses from Matthew Wilcox, Vishal Moola, Sidhartha Kumar and Lorenzo Stoakes - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which does perform some memcg maintenance and cleanup work. - SeongJae Park has added DAMOS filtering to DAMON, with the series "mm/damon/core: implement damos filter". These filters provide users with finer-grained control over DAMOS's actions. SeongJae has also done some DAMON cleanup work. - Kairui Song adds a series ("Clean up and fixes for swap"). - Vernon Yang contributed the series "Clean up and refinement for maple tree". - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It adds to MGLRU an LRU of memcgs, to improve the scalability of global reclaim. - David Hildenbrand has added some userfaultfd cleanup work in the series "mm: uffd-wp + change_protection() cleanups". - Christoph Hellwig has removed the generic_writepages() library function in the series "remove generic_writepages". - Baolin Wang has performed some maintenance on the compaction code in his series "Some small improvements for compaction". - Sidhartha Kumar is doing some maintenance work on struct page in his series "Get rid of tail page fields". - David Hildenbrand contributed some cleanup, bugfixing and generalization of pte management and of pte debugging in his series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs". - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation flag in the series "Discard __GFP_ATOMIC". - Sergey Senozhatsky has improved zsmalloc's memory utilization with his series "zsmalloc: make zspage chain size configurable". - Joey Gouly has added prctl() support for prohibiting the creation of writeable+executable mappings. The previous BPF-based approach had shortcomings. See "mm: In-kernel support for memory-deny-write-execute (MDWE)". - Waiman Long did some kmemleak cleanup and bugfixing in the series "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF". - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series "mm: multi-gen LRU: improve". - Jiaqi Yan has provided some enhancements to our memory error statistics reporting, mainly by presenting the statistics on a per-node basis. See the series "Introduce per NUMA node memory error statistics". - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog regression in compaction via his series "Fix excessive CPU usage during compaction". - Christoph Hellwig does some vmalloc maintenance work in the series "cleanup vfree and vunmap". - Christoph Hellwig has removed block_device_operations.rw_page() in ths series "remove ->rw_page". - We get some maple_tree improvements and cleanups in Liam Howlett's series "VMA tree type safety and remove __vma_adjust()". - Suren Baghdasaryan has done some work on the maintainability of our vm_flags handling in the series "introduce vm_flags modifier functions". - Some pagemap cleanup and generalization work in Mike Rapoport's series "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and "fixups for generic implementation of pfn_valid()" - Baoquan He has done some work to make /proc/vmallocinfo and /proc/kcore better represent the real state of things in his series "mm/vmalloc.c: allow vread() to read out vm_map_ram areas". - Jason Gunthorpe rationalized the GUP system's interface to the rest of the kernel in the series "Simplify the external interface for GUP". - SeongJae Park wishes to migrate people from DAMON's debugfs interface over to its sysfs interface. To support this, we'll temporarily be printing warnings when people use the debugfs interface. See the series "mm/damon: deprecate DAMON debugfs interface". - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes and clean-ups" series. - Huang Ying has provided a dramatic reduction in migration's TLB flush IPI rates with the series "migrate_pages(): batch TLB flushing". - Arnd Bergmann has some objtool fixups in "objtool warning fixes". * tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits) include/linux/migrate.h: remove unneeded externs mm/memory_hotplug: cleanup return value handing in do_migrate_range() mm/uffd: fix comment in handling pte markers mm: change to return bool for isolate_movable_page() mm: hugetlb: change to return bool for isolate_hugetlb() mm: change to return bool for isolate_lru_page() mm: change to return bool for folio_isolate_lru() objtool: add UACCESS exceptions for __tsan_volatile_read/write kmsan: disable ftrace in kmsan core code kasan: mark addr_has_metadata __always_inline mm: memcontrol: rename memcg_kmem_enabled() sh: initialize max_mapnr m68k/nommu: add missing definition of ARCH_PFN_OFFSET mm: percpu: fix incorrect size in pcpu_obj_full_size() maple_tree: reduce stack usage with gcc-9 and earlier mm: page_alloc: call panic() when memoryless node allocation fails mm: multi-gen LRU: avoid futile retries migrate_pages: move THP/hugetlb migration support check to simplify code migrate_pages: batch flushing TLB migrate_pages: share more code between _unmap and _move ...
2023-02-20Merge tag 'locking-core-2023-02-20' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: - rwsem micro-optimizations - spinlock micro-optimizations - cleanups, simplifications * tag 'locking-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: vduse: Remove include of rwlock.h locking/lockdep: Remove lockdep_init_map_crosslock. x86/ACPI/boot: Use try_cmpxchg() in __acpi_{acquire,release}_global_lock() x86/PAT: Use try_cmpxchg() in set_page_memtype() locking/rwsem: Disable preemption in all down_write*() and up_write() code paths locking/rwsem: Disable preemption in all down_read*() and up_read() code paths locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath locking/qspinlock: Micro-optimize pending state waiting for unlock
2023-02-20vdpa/mlx5: support device features provisioningSi-Wei Liu1-8/+45
This patch implements features provisioning for mlx5_vdpa. 1) Validate the provisioned features are a subset of the parent features. 2) Clearing features that are not wanted by userspace. For example: # vdpa mgmtdev show pci/0000:41:04.2: supported_classes net max_supported_vqs 65 dev_features CSUM GUEST_CSUM MTU MAC HOST_TSO4 HOST_TSO6 STATUS CTRL_VQ CTRL_VLAN MQ CTRL_MAC_ADDR VERSION_1 ACCESS_PLATFORM 1) Provision vDPA device with all features derived from the parent # vdpa dev add name vdpa1 mgmtdev pci/0000:41:04.2 # vdpa dev config show vdpa1: mac e4:11:c6:d3:45:f0 link up link_announce false max_vq_pairs 1 mtu 1500 negotiated_features CSUM GUEST_CSUM MTU HOST_TSO4 HOST_TSO6 STATUS CTRL_VQ CTRL_VLAN MQ CTRL_MAC_ADDR VERSION_1 ACCESS_PLATFORM 2) Provision vDPA device with a subset of parent features # vdpa dev add name vdpa1 mgmtdev pci/0000:41:04.2 device_features 0x300020000 # vdpa dev config show vdpa1: negotiated_features CTRL_VQ VERSION_1 ACCESS_PLATFORM Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Message-Id: <1675725124-7375-7-git-send-email-si-wei.liu@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20vdpa/mlx5: make MTU/STATUS presence conditional on feature bitsSi-Wei Liu1-8/+14
The spec says: mtu only exists if VIRTIO_NET_F_MTU is set status only exists if VIRTIO_NET_F_STATUS is set We should only present MTU and STATUS conditionally depending on the feature bits. Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Message-Id: <1675725124-7375-6-git-send-email-si-wei.liu@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20vdpa: validate device feature provisioning against supported classSi-Wei Liu1-9/+50
Today when device features are explicitly provisioned, the features user supplied may contain device class specific features that are not supported by the parent management device. On the other hand, when parent management device supports more than one class, the device features to provision may be ambiguous if none of the class specific attributes is provided at the same time. Validate these cases and prompt appropriate user errors accordingly. Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Message-Id: <1675725124-7375-5-git-send-email-si-wei.liu@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20vdpa: validate provisioned device features against specified attributeSi-Wei Liu1-0/+18
With device feature provisioning, there's a chance for misconfiguration that the vdpa feature attribute supplied in 'vdpa dev add' command doesn't get selected on the device_features to be provisioned. For instance, when a @mac attribute is specified, the corresponding feature bit _F_MAC in device_features should be set for consistency. If there's conflict on provisioned features against the attribute, it should be treated as an error to fail the ambiguous command. Noted the opposite is not necessarily true, for e.g. it's okay to have _F_MAC set in device_features without providing a corresponding @mac attribute, in which case the vdpa vendor driver could load certain default value for attribute that is not explicitly specified. Generalize this check in vdpa core so that there's no duplicate code in each vendor driver. Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Message-Id: <1675725124-7375-4-git-send-email-si-wei.liu@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20vdpa: conditionally read STATUS in config spaceSi-Wei Liu1-5/+15
The spec says: status only exists if VIRTIO_NET_F_STATUS is set Similar to MAC and MTU, vdpa_dev_net_config_fill() should read STATUS conditionally depending on the feature bits. Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Message-Id: <1675725124-7375-3-git-send-email-si-wei.liu@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20vdpa: fix improper error message when adding vdpa devSi-Wei Liu1-2/+4
In below example, before the fix, mtu attribute is supported by the parent mgmtdev, but the error message showing "All provided are not supported" is just misleading. $ vdpa mgmtdev show vdpasim_net: supported_classes net max_supported_vqs 3 dev_features MTU MAC CTRL_VQ CTRL_MAC_ADDR ANY_LAYOUT VERSION_1 ACCESS_PLATFORM $ vdpa dev add mgmtdev vdpasim_net name vdpasim0 mtu 5000 max_vqp 2 Error: vdpa: All provided attributes are not supported. kernel answers: Operation not supported After fix, the relevant error message will be like: $ vdpa dev add mgmtdev vdpasim_net name vdpasim0 mtu 5000 max_vqp 2 Error: vdpa: Some provided attributes are not supported: 0x1000. kernel answers: Operation not supported Fixes: d8ca2fa5be1b ("vdpa: Enable user to set mac and mtu of vdpa device") Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Message-Id: <1675725124-7375-2-git-send-email-si-wei.liu@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20vdpa/mlx5: Initialize CVQ iotlb spinlockEli Cohen1-0/+1
Initialize itolb spinlock. Fixes: 5262912ef3cf ("vdpa/mlx5: Add support for control VQ and MAC setting") Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20230206122016.1149373-1-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20vdpa/mlx5: Don't clear mr struct on destroy MREli Cohen1-1/+0
Clearing the mr struct erases the lock owner and causes warnings to be emitted. It is not required to clear the mr so remove the memset call. Fixes: 94abbccdf291 ("vdpa/mlx5: Add shared memory registration code") Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20230206121956.1149356-1-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20vdpa/mlx5: Directly assign memory keyEli Cohen1-1/+1
When creating a memory key, the key value should be assigned to the passed pointer and not or'ed to. No functional issue was observed due to this bug. Fixes: 29064bfdabd5 ("vdpa/mlx5: Add support library for mlx5 VDPA implementation") Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20230205072906.1108194-1-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20vdpa: Fix a couple of spelling mistakes in some messagesColin Ian King2-2/+2
There are two spelling mistakes in some literal strings. Fix them. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Message-Id: <20230130092644.37002-1-colin.i.king@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-02-20vdpa: mlx5: support per virtqueue dma deviceJason Wang1-0/+11
This patch implements per virtqueue dma device for mlx5_vdpa. This is needed for virtio_vdpa to work for CVQ which is backed by vringh but not DMA. We simply advertise the vDPA device itself as the DMA device for CVQ then DMA API can simply use PA so the identical mapping for CVQ can still be used. Otherwise the identical (1:1) mapping won't work when platform IOMMU is enabled since the IOVA is allocated on demand which is not necessarily the PA. This fixes the following crash when mlx5 vDPA device is bound to virtio-vdpa with platform IOMMU enabled but not in passthrough mode: BUG: unable to handle page fault for address: ff2fb3063deb1002 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 1393001067 P4D 1393002067 PUD 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 55 PID: 8923 Comm: kworker/u112:3 Kdump: loaded Not tainted 6.1.0+ #7 Hardware name: Dell Inc. PowerEdge R750/0PJ80M, BIOS 1.5.4 12/17/2021 Workqueue: mlx5_vdpa_wq mlx5_cvq_kick_handler [mlx5_vdpa] RIP: 0010:vringh_getdesc_iotlb+0x93/0x1d0 [vringh] Code: 14 25 40 ef 01 00 83 82 c0 0a 00 00 01 48 2b 05 93 5a 1b ea 8b 4c 24 14 48 c1 f8 06 48 c1 e0 0c 48 03 05 90 5a 1b ea 48 01 c8 <0f> b7 00 83 aa c0 0a 00 00 01 65 ff 0d bc e4 41 3f 0f 84 05 01 00 RSP: 0018:ff46821ba664fdf8 EFLAGS: 00010282 RAX: ff2fb3063deb1002 RBX: 0000000000000a20 RCX: 0000000000000002 RDX: ff2fb318d2f94380 RSI: 0000000000000002 RDI: 0000000000000001 RBP: ff2fb3065e832410 R08: ff46821ba664fe00 R09: 0000000000000001 R10: 0000000000000000 R11: 000000000000000d R12: ff2fb3065e832488 R13: ff2fb3065e8324a8 R14: ff2fb3065e8324c8 R15: ff2fb3065e8324a8 FS: 0000000000000000(0000) GS:ff2fb3257fac0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ff2fb3063deb1002 CR3: 0000001392010006 CR4: 0000000000771ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> mlx5_cvq_kick_handler+0x89/0x2b0 [mlx5_vdpa] process_one_work+0x1e2/0x3b0 ? rescuer_thread+0x390/0x390 worker_thread+0x50/0x3a0 ? rescuer_thread+0x390/0x390 kthread+0xd6/0x100 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Reviewed-by: Eli Cohen <elic@nvidia.com> Tested-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230119061525.75068-6-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20vdpa: set dma mask for vDPA deviceJason Wang1-0/+5
Setting DMA mask for vDPA device in case that there are virtqueue that is not backed by DMA so the vDPA device could be advertised as the DMA device that is used by DMA API for software emulated virtqueues. Reviewed-by: Eli Cohen <elic@nvidia.com> Tested-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230119061525.75068-5-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20vdpa_sim: get rid of DMA opsJason Wang2-150/+22
We used to (ab)use the DMA ops for setting up identical mappings in the IOTLB. This patch tries to get rid of the those unnecessary DMA ops by maintaining a simple identical/passthrough mappings by default. When bound to virtio_vdpa driver, DMA API will simply use PA as the IOVA and we will be all fine. When the vDPA bus tries to setup customized mapping (e.g when bound to vhost-vDPA), the identical/passthrough mapping will be removed. Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20221223060021.28011-1-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Christoph Hellwig <hch@lst.de>
2023-02-20vdpa_sim_net: vendor satisticsJason Wang1-6/+213
This patch adds support for basic vendor stats that include counters for tx, rx and cvq. Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20221223055548.27810-5-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2023-02-20vdpa_sim: support vendor statisticsJason Wang2-0/+17
This patch adds a new config ops callback to allow individual simulator to implement the vendor stats callback. Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20221223055548.27810-4-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2023-02-20vdpasim: customize allocation sizeJason Wang4-1/+7
Allow individual simulator to customize the allocation size. Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20221223055548.27810-3-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20vdpa_sim: switch to use __vdpa_alloc_device()Jason Wang1-5/+8
This allows us to control the allocation size of the structure. Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20221223055548.27810-2-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20vdpa_sim: use weak barriersJason Wang1-1/+1
vDPA simulators are software emulated device, so let's switch to use weak barriers to avoid extra overhead in the driver. Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20221221062146.15356-1-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com>
2023-02-20vdpa_sim: Implement resume vdpa opSebastien Boeuf2-0/+30
Implement resume operation for vdpa_sim devices, so vhost-vdpa will offer that backend feature and userspace can effectively resume the device. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com> Message-Id: <15a4566826033c5dd9a2167e5cfb0ef4d90cea49.1672742878.git.sebastien.boeuf@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-02-20virtio: vdpa: new SolidNET DPU driver.Alvaro Karsz6-0/+1518
This commit includes: 1) The driver to manage the controlplane over vDPA bus. 2) A HW monitor device to read health values from the DPU. Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230110165638.123745-4-alvaro.karsz@solid-run.com> Message-Id: <20230209075128.78915-1-alvaro.karsz@solid-run.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20vdpa_sim_net: Offer VIRTIO_NET_F_STATUSEugenio Pérez1-0/+1
VIRTIO_NET_S_LINK_UP is already returned in config reads since vdpasim creation, but the feature bit was not offered to the driver. Tested modifying VIRTIO_NET_S_LINK_UP and different values of "status" in qemu virtio-net options, using vhost_vdpa. Not considering as a fix, because there should be no driver trusting in this config read before the feature flag. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20221117155502.1394700-1-eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-02-20vDPA/ifcvf: implement features provisioningZhu Lingshan3-1/+17
This commit implements features provisioning for ifcvf, that means: 1)checkk whether the provisioned features are supported by the management device 2)vDPA device only presents selected feature bits Examples: a)The management device supported features: $ vdpa mgmtdev show pci/0000:01:00.5 pci/0000:01:00.5: supported_classes net max_supported_vqs 9 dev_features MTU MAC MRG_RXBUF CTRL_VQ MQ ANY_LAYOUT VERSION_1 ACCESS_PLATFORM b)Provision a vDPA device with all supported features: $ vdpa dev add name vdpa0 mgmtdev pci/0000:01:00.5 $ vdpa/vdpa dev config show vdpa0 vdpa0: mac 00:e8:ca:11:be:05 link up link_announce false max_vq_pairs 4 mtu 1500 negotiated_features MRG_RXBUF CTRL_VQ MQ VERSION_1 ACCESS_PLATFORM c)Provision a vDPA device with a subset of the supported features: $ vdpa dev add name vdpa0 mgmtdev pci/0000:01:00.5 device_features 0x300020020 $ vdpa dev config show vdpa0 mac 00:e8:ca:11:be:05 link up link_announce false negotiated_features CTRL_VQ VERSION_1 ACCESS_PLATFORM Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20221125145724.1129962-13-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20vDPA/ifcvf: retire ifcvf_private_to_vfZhu Lingshan2-8/+5
This commit retires ifcvf_private_to_vf, because the vf is already a member of the adapter, so it could be easily addressed by adapter->vf. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20221125145724.1129962-12-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20vDPA/ifcvf: allocate the adapter in dev_add()Zhu Lingshan1-21/+13
The adapter is the container of the vdpa_device, this commits allocate the adapter in dev_add() rather than in probe(). So that the vdpa_device() could be re-created when the userspace creates the vdpa device, and free-ed in dev_del() Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Cc: stable@vger.kernel.org Message-Id: <20221125145724.1129962-11-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20vDPA/ifcvf: manage ifcvf_hw in the mgmt_devZhu Lingshan2-5/+7
This commit allocates the hw structure in the management device structure. So the hardware can be initialized once the management device is allocated in probe. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Cc: stable@vger.kernel.org Message-Id: <20221125145724.1129962-10-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20vDPA/ifcvf: ifcvf_request_irq works on ifcvf_hwZhu Lingshan1-3/+2
All ifcvf_request_irq's callees are refactored to work on ifcvf_hw, so it should be decoupled from the adapter as well Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Cc: stable@vger.kernel.org Message-Id: <20221125145724.1129962-9-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20vDPA/ifcvf: decouple config/dev IRQ requester and vectors allocator from the ↵Zhu Lingshan1-12/+9
adapter This commit decouples the config irq requester, the device shared irq requester and the MSI vectors allocator from the adapter. So they can be safely invoked since probe before the adapter is allocated. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Cc: stable@vger.kernel.org Message-Id: <20221125145724.1129962-8-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20vDPA/ifcvf: decouple vq irq requester from the adapterZhu Lingshan1-11/+8
This commit decouples the vq irq requester from the adapter, so that these functions can be invoked since probe. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Cc: stable@vger.kernel.org Message-Id: <20221125145724.1129962-7-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20vDPA/ifcvf: decouple config IRQ releaser from the adapterZhu Lingshan1-12/+10
This commit decouples config IRQ releaser from the adapter, so that it could be invoked once probe or in err handlers. ifcvf_free_irq() works on ifcvf_hw in this commit Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Cc: stable@vger.kernel.org Message-Id: <20221125145724.1129962-6-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20vDPA/ifcvf: decouple vq IRQ releasers from the adapterZhu Lingshan1-12/+9
This commit decouples the IRQ releasers from the adapter, so that these functions could be safely invoked once probe Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Cc: stable@vger.kernel.org Message-Id: <20221125145724.1129962-5-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20vDPA/ifcvf: alloc the mgmt_dev before the adapterZhu Lingshan1-17/+14
This commit reverses the order of allocating the management device and the adapter. So that it would be possible to move the allocation of the adapter to dev_add(). Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Cc: stable@vger.kernel.org Message-Id: <20221125145724.1129962-4-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20vDPA/ifcvf: decouple config space ops from the adapterZhu Lingshan1-16/+5
This commit decopules the config space ops from the adapter layer, so these functions can be invoked once the device is probed. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Cc: stable@vger.kernel.org Message-Id: <20221125145724.1129962-3-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20vDPA/ifcvf: decouple hw features manipulators from the adapterZhu Lingshan3-7/+4
This commit gets rid of ifcvf_adapter in hw features related functions in ifcvf_base. Then these functions are more rubust and de-coupling from the ifcvf_adapter layer. So these functions could be invoded once the device is probed, even before the adapter is allocaed. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Cc: stable@vger.kernel.org Message-Id: <20221125145724.1129962-2-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20vdpa/mlx5: Add RX counters to debugfsEli Cohen4-30/+219
For each interface, either VLAN tagged or untagged, add two hardware counters: one for unicast and another for multicast. The counters count RX packets and bytes and can be read through debugfs: $ cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/untagged/mcast/packets $ cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/untagged/ucast/bytes This feature is controlled via the config option MLX5_VDPA_STEERING_DEBUG. It is off by default as it may have some impact on performance. includes a fixup By Yang Yingliang <yangyingliang@huawei.com>: vdpa/mlx5: fix check wrong pointer in mlx5_vdpa_add_mac_vlan_rules() The local variable 'rule' is not used anymore, fix return value check after calling mlx5_add_flow_rules(). Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20221114131759.57883-9-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Message-Id: <20230104074418.1737510-1-yangyingliang@huawei.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eli Cohen <elic@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-02-20vdpa/mlx5: Add debugfs subtreeEli Cohen4-1/+87
Add debugfs subtree and expose flow table ID and TIR number. This information can be used by external tools to do extended troubleshooting. The information can be retrieved like so: $ cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/table_id $ cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/tirn Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20221114131759.57883-8-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20vdpa/mlx5: Move some definitions to a new header fileEli Cohen2-44/+56
Move some definitions from mlx5_vnet.c to newly added header file mlx5_vnet.h. We need these definitions for the following patches that add debugfs tree to expose information vital for debug. Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20221114131759.57883-7-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-13vdpa_sim: not reset state in vdpasim_queue_readyEugenio Pérez1-0/+2
vdpasim_queue_ready calls vringh_init_iotlb, which resets split indexes. But it can be called after setting a ring base with vdpasim_set_vq_state. Fix it by stashing them. They're still resetted in vdpasim_vq_reset. This was discovered and tested live migrating the vdpa_sim_net device. Fixes: 2c53d0f64c06 ("vdpasim: vDPA device simulator") Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230118164359.1523760-2-eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-02-09mm: replace vma->vm_flags direct modifications with modifier callsSuren Baghdasaryan1-1/+1
Replace direct modifications to vma->vm_flags with calls to modifier functions to be able to track flag changes and to keep vma locking correctness. [akpm@linux-foundation.org: fix drivers/misc/open-dice.c, per Hyeonggon Yoo] Link: https://lkml.kernel.org/r/20230126193752.297968-5-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Oskolkov <posk@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-06vduse: Remove include of rwlock.hSebastian Andrzej Siewior1-1/+0
rwlock.h should not be included directly. Instead linux/splinlock.h should be included. Including it directly will break the RT build. Remove the rwlock.h include. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael S. Tsirkin <mst@redhat.com>
2023-01-27vdpa: ifcvf: Do proper cleanup if IFCVF init failsTanmay Bhushan1-1/+1
ifcvf_mgmt_dev leaks memory if it is not freed before returning. Call is made to correct return statement so memory does not leak. ifcvf_init_hw does not take care of this so it is needed to do it here. Signed-off-by: Tanmay Bhushan <007047221b@gmail.com> Message-Id: <772e9fe133f21fa78fb98a2ebe8969efbbd58e3c.camel@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Zhu Lingshan <lingshan.zhu@intel.com>
2022-12-28vdpa_sim_net: should not drop the multicast/broadcast packetCindy Lu1-0/+3
In the receive_filter(), should not drop the packet with the broadcast/multicast address. Add the check for this Signed-off-by: Cindy Lu <lulu@redhat.com> Message-Id: <20221214054306.24145-1-lulu@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-12-28vdpasim: fix memory leak when freeing IOTLBsJason Wang1-1/+3
After commit bda324fd037a ("vdpasim: control virtqueue support"), vdpasim->iommu became an array of IOTLB, so we should clean the mappings of each free one by one instead of just deleting the ranges in the first IOTLB which may leak maps. Fixes: bda324fd037a ("vdpasim: control virtqueue support") Cc: Gautam Dawar <gautam.dawar@xilinx.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20221213090717.61529-1-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Gautam Dawar <gautam.dawar@amd.com>
2022-12-28vdpa: conditionally fill max max queue pair for statsJason Wang1-5/+4
For the device without multiqueue feature, we will read 0 as max_virtqueue_pairs from the config. So if we fill VDPA_ATTR_DEV_NET_CFG_MAX_VQP with the value we read from the config we will confuse the user. Fixing this by only filling the value when multiqueue is offered by the device so userspace can assume 1 when the attr is not provided. Fixes: 13b00b135665c("vdpa: Add support for querying vendor statistics") Cc: Eli Cohen <elic@nvidia.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220907060110.4511-1-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eli Cohen <elic@nvidia.com>
2022-12-28vdpa/vp_vdpa: fix kfree a wrong pointer in vp_vdpa_removeRong Wang1-1/+1
In vp_vdpa_remove(), the code kfree(&vp_vdpa_mgtdev->mgtdev.id_table) uses a reference of pointer as the argument of kfree, which is the wrong pointer and then may hit crash like this: Unable to handle kernel paging request at virtual address 00ffff003363e30c Internal error: Oops: 96000004 [#1] SMP Call trace: rb_next+0x20/0x5c ext4_readdir+0x494/0x5c4 [ext4] iterate_dir+0x168/0x1b4 __se_sys_getdents64+0x68/0x170 __arm64_sys_getdents64+0x24/0x30 el0_svc_common.constprop.0+0x7c/0x1bc do_el0_svc+0x2c/0x94 el0_svc+0x20/0x30 el0_sync_handler+0xb0/0xb4 el0_sync+0x160/0x180 Code: 54000220 f9400441 b4000161 aa0103e0 (f9400821) SMP: stopping secondary CPUs Starting crashdump kernel... Fixes: ffbda8e9df10 ("vdpa/vp_vdpa : add vdpa tool support in vp_vdpa") Signed-off-by: Rong Wang <wangrong68@huawei.com> Signed-off-by: Nanyong Sun <sunnanyong@huawei.com> Message-Id: <20221207120813.2837529-1-sunnanyong@huawei.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cindy Lu <lulu@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-12-28vduse: Validate vq_num in vduse_validate_config()Harshit Mogalapalli1-0/+3
Add a limit to 'config->vq_num' which is user controlled data which comes from an vduse_ioctl to prevent large memory allocations. Micheal says - This limit is somewhat arbitrary. However, currently virtio pci and ccw are limited to a 16 bit vq number. While MMIO isn't it is also isn't used with lots of VQs due to current lack of support for per-vq interrupts. Thus, the 0xffff limit on number of VQs corresponding to a 16-bit VQ number seems sufficient for now. This is found using static analysis with smatch. Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Message-Id: <20221128155717.2579992-1-harshit.m.mogalapalli@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-12-28vdpa_sim: fix vringh initialization in vdpasim_queue_ready()Stefano Garzarella1-2/+1
When we initialize vringh, we should pass the features and the number of elements in the virtqueue negotiated with the driver, otherwise operations with vringh may fail. This was discovered in a case where the driver sets a number of elements in the virtqueue different from the value returned by .get_vq_num_max(). In vdpasim_vq_reset() is safe to initialize the vringh with default values, since the virtqueue will not be used until vdpasim_queue_ready() is called again. Fixes: 2c53d0f64c06 ("vdpasim: vDPA device simulator") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20221110141335.62171-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com>
2022-12-28vdpa_sim: fix possible memory leak in vdpasim_net_init() and vdpasim_blk_init()ruanjinjie2-2/+6
Inject fault while probing module, if device_register() fails in vdpasim_net_init() or vdpasim_blk_init(), but the refcount of kobject is not decreased to 0, the name allocated in dev_set_name() is leaked. Fix this by calling put_device(), so that name can be freed in callback function kobject_cleanup(). (vdpa_sim_net) unreferenced object 0xffff88807eebc370 (size 16): comm "modprobe", pid 3848, jiffies 4362982860 (age 18.153s) hex dump (first 16 bytes): 76 64 70 61 73 69 6d 5f 6e 65 74 00 6b 6b 6b a5 vdpasim_net.kkk. backtrace: [<ffffffff8174f19e>] __kmalloc_node_track_caller+0x4e/0x150 [<ffffffff81731d53>] kstrdup+0x33/0x60 [<ffffffff83a5d421>] kobject_set_name_vargs+0x41/0x110 [<ffffffff82d87aab>] dev_set_name+0xab/0xe0 [<ffffffff82d91a23>] device_add+0xe3/0x1a80 [<ffffffffa0270013>] 0xffffffffa0270013 [<ffffffff81001c27>] do_one_initcall+0x87/0x2e0 [<ffffffff813739cb>] do_init_module+0x1ab/0x640 [<ffffffff81379d20>] load_module+0x5d00/0x77f0 [<ffffffff8137bc40>] __do_sys_finit_module+0x110/0x1b0 [<ffffffff83c4d505>] do_syscall_64+0x35/0x80 [<ffffffff83e0006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 (vdpa_sim_blk) unreferenced object 0xffff8881070c1250 (size 16): comm "modprobe", pid 6844, jiffies 4364069319 (age 17.572s) hex dump (first 16 bytes): 76 64 70 61 73 69 6d 5f 62 6c 6b 00 6b 6b 6b a5 vdpasim_blk.kkk. backtrace: [<ffffffff8174f19e>] __kmalloc_node_track_caller+0x4e/0x150 [<ffffffff81731d53>] kstrdup+0x33/0x60 [<ffffffff83a5d421>] kobject_set_name_vargs+0x41/0x110 [<ffffffff82d87aab>] dev_set_name+0xab/0xe0 [<ffffffff82d91a23>] device_add+0xe3/0x1a80 [<ffffffffa0220013>] 0xffffffffa0220013 [<ffffffff81001c27>] do_one_initcall+0x87/0x2e0 [<ffffffff813739cb>] do_init_module+0x1ab/0x640 [<ffffffff81379d20>] load_module+0x5d00/0x77f0 [<ffffffff8137bc40>] __do_sys_finit_module+0x110/0x1b0 [<ffffffff83c4d505>] do_syscall_64+0x35/0x80 [<ffffffff83e0006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 Fixes: 899c4d187f6a ("vdpa_sim_blk: add support for vdpa management tool") Fixes: a3c06ae158dd ("vdpa_sim_net: Add support for user supported devices") Signed-off-by: ruanjinjie <ruanjinjie@huawei.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20221110082348.4105476-1-ruanjinjie@huawei.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-12-28RDMA/mlx5: remove variable iColin Ian King1-2/+0
Variable i is just being incremented and it's never used anywhere else. The variable and the increment are redundant so remove it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Message-Id: <20221024133756.2158497-1-colin.i.king@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-12-28vdpa/mlx5: Avoid overwriting CVQ iotlbEli Cohen3-59/+39
When qemu uses different address spaces for data and control virtqueues, the current code would overwrite the control virtqueue iotlb through the dup_iotlb call. Fix this by referring to the address space identifier and the group to asid mapping to determine which mapping needs to be updated. We also move the address space logic from mlx5 net to core directory. Reported-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20221114131759.57883-6-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com>
2022-12-28vdpa/mlx5: Avoid using reslock in event_handlerEli Cohen1-12/+4
event_handler runs under atomic context and may not acquire reslock. We can still guarantee that the handler won't be called after suspend by clearing nb_registered, unregistering the handler and flushing the workqueue. Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20221114131759.57883-5-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-12-28vdpa/mlx5: Fix wrong mac address deletionEli Cohen1-1/+1
Delete the old MAC from the table and not the new one which is not there yet. Fixes: baf2ad3f6a98 ("vdpa/mlx5: Add RX MAC VLAN filter support") Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20221114131759.57883-4-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-12-28vdpa/mlx5: Return error on vlan ctrl commands if not supportedEli Cohen1-0/+3
Check if VIRTIO_NET_F_CTRL_VLAN is negotiated and return error if control VQ command is received. Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20221114131759.57883-3-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com>
2022-12-28vdpa/mlx5: Fix rule forwarding VLAN to TIREli Cohen1-3/+5
Set the VLAN id to the header values field instead of overwriting the headers criteria field. Before this fix, VLAN filtering would not really work and tagged packets would be forwarded unfiltered to the TIR. Fixes: baf2ad3f6a98 ("vdpa/mlx5: Add RX MAC VLAN filter support") Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20221114131759.57883-2-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-12-28vdpa: merge functionally duplicated dev_features attributesSi-Wei Liu1-1/+1
We can merge VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES with VDPA_ATTR_DEV_FEATURES which is functionally equivalent. While at it, tweak the comment in header file to make user provioned device features distinguished from those supported by the parent mgmtdev device: the former of which can be inherited as a whole from the latter, or can be a subset of the latter if explicitly specified. Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Message-Id: <1665422823-18364-1-git-send-email-si-wei.liu@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-11-24driver core: make struct class.devnode() take a const *Greg Kroah-Hartman1-1/+1
The devnode() in struct class should not be modifying the device that is passed into it, so mark it as a const * and propagate the function signature changes out into all relevant subsystems that use this callback. Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Jens Axboe <axboe@kernel.dk> Cc: Justin Sanders <justin@coraid.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Benjamin Gaignard <benjamin.gaignard@collabora.com> Cc: Liam Mark <lmark@codeaurora.org> Cc: Laura Abbott <labbott@redhat.com> Cc: Brian Starkey <Brian.Starkey@arm.com> Cc: John Stultz <jstultz@google.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Sean Young <sean@mess.org> Cc: Frank Haverkamp <haver@linux.ibm.com> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.com> Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Xie Yongji <xieyongji@bytedance.com> Cc: Gautam Dawar <gautam.dawar@xilinx.com> Cc: Dan Carpenter <error27@gmail.com> Cc: Eli Cohen <elic@nvidia.com> Cc: Parav Pandit <parav@nvidia.com> Cc: Maxime Coquelin <maxime.coquelin@redhat.com> Cc: alsa-devel@alsa-project.org Cc: dri-devel@lists.freedesktop.org Cc: kvm@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-block@vger.kernel.org Cc: linux-input@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: linux-rdma@vger.kernel.org Cc: linux-scsi@vger.kernel.org Cc: linux-usb@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Link: https://lore.kernel.org/r/20221123122523.1332370-2-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-10Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds6-23/+94
Pull virtio updates from Michael Tsirkin: - 9k mtu perf improvements - vdpa feature provisioning - virtio blk SECURE ERASE support - fixes and cleanups all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_pci: don't try to use intxif pin is zero vDPA: conditionally read MTU and MAC in dev cfg space vDPA: fix spars cast warning in vdpa_dev_net_mq_config_fill vDPA: check virtio device features to detect MQ vDPA: check VIRTIO_NET_F_RSS for max_virtqueue_paris's presence vDPA: only report driver features if FEATURES_OK is set vDPA: allow userspace to query features of a vDPA device virtio_blk: add SECURE ERASE command support vp_vdpa: support feature provisioning vdpa_sim_net: support feature provisioning vdpa: device feature provisioning virtio-net: use mtu size as buffer length for big packets virtio-net: introduce and use helper function for guest gso support checks virtio: drop vp_legacy_set_queue_size virtio_ring: make vring_alloc_queue_packed prettier virtio_ring: split: Operators use unified style vhost: add __init/__exit annotations to module init/exit funcs
2022-10-07vDPA: conditionally read MTU and MAC in dev cfg spaceZhu Lingshan1-8/+29
The spec says: mtu only exists if VIRTIO_NET_F_MTU is set The mac address field always exists (though is only valid if VIRTIO_NET_F_MAC is set) So vdpa_dev_net_config_fill() should read MTU and MAC conditionally on the feature bits. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220929014555.112323-7-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-10-07vDPA: fix spars cast warning in vdpa_dev_net_mq_config_fillZhu Lingshan1-1/+2
This commit fixes spars warnings: cast to restricted __le16 in function vdpa_dev_net_mq_config_fill() Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220929014555.112323-6-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-10-07vDPA: check virtio device features to detect MQZhu Lingshan1-1/+1
vdpa_dev_net_mq_config_fill() should checks device features for MQ than driver features. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220929014555.112323-5-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-10-07vDPA: check VIRTIO_NET_F_RSS for max_virtqueue_paris's presenceZhu Lingshan1-3/+3
virtio 1.2 spec says: max_virtqueue_pairs only exists if VIRTIO_NET_F_MQ or VIRTIO_NET_F_RSS is set. So when reporint MQ to userspace, it should check both VIRTIO_NET_F_MQ and VIRTIO_NET_F_RSS. unused parameter struct vdpa_device *vdev is removed Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220929014555.112323-4-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-10-07vDPA: only report driver features if FEATURES_OK is setZhu Lingshan1-6/+14
This commit reports driver features to user space only after FEATURES_OK is features negotiation is done. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20220929014555.112323-3-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-10-07vDPA: allow userspace to query features of a vDPA deviceZhu Lingshan1-5/+11
This commit adds a new vDPA netlink attribution VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES. Userspace can query features of vDPA devices through this new attr. This commit invokes vdpa_config_ops.get_config() rather than vdpa_get_config_unlocked() to read the device config spcae, so no races in vdpa_set_features_unlocked() Userspace tool iproute2 example: $ vdpa dev config show vdpa0 vdpa0: mac 00:e8:ca:11:be:05 link up link_announce false max_vq_pairs 4 mtu 1500 negotiated_features MRG_RXBUF CTRL_VQ MQ VERSION_1 ACCESS_PLATFORM dev_features MTU MAC MRG_RXBUF CTRL_VQ MQ ANY_LAYOUT VERSION_1 ACCESS_PLATFORM Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20220929014555.112323-2-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-10-07vp_vdpa: support feature provisioningJason Wang1-2/+20
This patch allows the device features to be provisioned via netlink. This is done by: 1) validating the provisioned features to be a subset of the parent features. 2) clearing the features that is not wanted by the userspace For example: # vdpa mgmtdev show pci/0000:02:00.0: supported_classes net max_supported_vqs 3 dev_features CSUM GUEST_CSUM CTRL_GUEST_OFFLOADS MAC GUEST_TSO4 GUEST_TSO6 GUEST_ECN GUEST_UFO HOST_TSO4 HOST_TSO6 HOST_ECN HOST_UFO MRG_RXBUF STATUS CTRL_VQ CTRL_RX CTRL_VLAN CTRL_RX_EXTRA GUEST_ANNOUNCE CTRL_MAC_ADDR RING_INDIRECT_DESC RING_EVENT_IDX VERSION_1 ACCESS_PLATFORM 1) provision vDPA device with all features that are supported by the virtio-pci # vdpa dev add name dev1 mgmtdev pci/0000:02:00.0 # vdpa dev config show dev1: mac 52:54:00:12:34:56 link up link_announce false mtu 65535 negotiated_features CSUM GUEST_CSUM CTRL_GUEST_OFFLOADS MAC GUEST_TSO4 GUEST_TSO6 GUEST_ECN GUEST_UFO HOST_TSO4 HOST_TSO6 HOST_ECN HOST_UFO MRG_RXBUF STATUS CTRL_VQ CTRL_RX CTRL_VLAN GUEST_ANNOUNCE CTRL_MAC_ADDR RING_INDIRECT_DESC RING_EVENT_IDX VERSION_1 ACCESS_PLATFORM 2) provision vDPA device with a subset of the features # vdpa dev add name dev1 mgmtdev pci/0000:02:00.0 device_features 0x300020000 # dev1: mac 52:54:00:12:34:56 link up link_announce false mtu 65535 negotiated_features CTRL_VQ VERSION_1 ACCESS_PLATFORM Reviewed-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220927074810.28627-4-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>