Age  Commit message  Author  Files  Lines
7 daystcp: socket option to check for MPTCP fallback to TCPMatthieu Baerts (NGI0)3-0/+7
A way for an application to know if an MPTCP connection fell back to TCP is to use getsockopt(MPTCP_INFO) and look for errors. The issue with this technique is that the same errors -- EOPNOTSUPP (IPv4) and ENOPROTOOPT (IPv6) -- are returned if there was a fallback, *or* if the kernel doesn't support this socket option. The userspace then has to look at the kernel version to understand what the errors mean. It is not clean, and it doesn't take into account older kernels where the socket option has been backported. A cleaner way would be to expose this info to the TCP socket level. In case of MPTCP socket where no fallback happened, the socket options for the TCP level will be handled in MPTCP code, in mptcp_getsockopt_sol_tcp(). If not, that will be in TCP code, in do_tcp_getsockopt(). So MPTCP simply has to set the value 1, while TCP has to set 0. If the socket option is not supported, one of these two errors will be reported: - EOPNOTSUPP (95 - Operation not supported) for MPTCP sockets - ENOPROTOOPT (92 - Protocol not available) for TCP sockets, e.g. on the socket received after an 'accept()', when the client didn't request to use MPTCP: this socket will be a TCP one, even if the listen socket was an MPTCP one. With this new option, the kernel can return a clear answer to both "Is this kernel new enough to tell me the fallback status?" and "If it is new enough, is it currently a TCP or MPTCP socket?" questions, while not breaking the previous method. Acked-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240509-upstream-net-next-20240509-mptcp-tcp_is_mptcp-v1-1-f846df999202@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
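The check described above could be sketched from userspace roughly as follows. This is only an illustration: the option name `TCP_IS_MPTCP` and its fallback value 43 are assumptions taken from the upstream UAPI header and should be verified against the installed `linux/tcp.h`, and the `mptcp_state` enum and both helper functions are hypothetical names invented for this sketch.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

#ifndef TCP_IS_MPTCP
#define TCP_IS_MPTCP 43	/* assumed value; verify against linux/tcp.h */
#endif

/* Hypothetical result type for this sketch. */
enum mptcp_state {
	MPTCP_STATE_ACTIVE,	/* option supported, socket is still MPTCP */
	MPTCP_STATE_FALLBACK,	/* option supported, socket is plain TCP */
	MPTCP_STATE_UNKNOWN,	/* kernel does not support the option */
	MPTCP_STATE_ERROR,	/* any other failure */
};

/* Map a getsockopt() outcome onto the cases listed in the commit message. */
enum mptcp_state classify_is_mptcp(int ret, int err, int value)
{
	if (ret == 0)
		return value ? MPTCP_STATE_ACTIVE : MPTCP_STATE_FALLBACK;
	if (err == EOPNOTSUPP || err == ENOPROTOOPT)
		return MPTCP_STATE_UNKNOWN;
	return MPTCP_STATE_ERROR;
}

/* Query a connected socket at the TCP level. */
enum mptcp_state query_is_mptcp(int fd)
{
	int value = 0;
	socklen_t len = sizeof(value);
	int ret = getsockopt(fd, IPPROTO_TCP, TCP_IS_MPTCP, &value, &len);

	return classify_is_mptcp(ret, ret ? errno : 0, value);
}
```

Keeping the classification separate from the system call means the error mapping can be exercised without a kernel that supports the option.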
7 dayssmb: smb2pdu.h: Avoid -Wflex-array-member-not-at-end warningsGustavo A. R. Silva3-30/+33
-Wflex-array-member-not-at-end is coming in GCC-14, and we are getting ready to enable it globally. So, in order to avoid ending up with a flexible-array member in the middle of multiple other structs, we use the `__struct_group()` helper to separate the flexible array from the rest of the members in the flexible structure, and use the tagged `struct create_context_hdr` instead of `struct create_context`. So, with these changes, fix 51 of the following warnings[1]: fs/smb/client/../common/smb2pdu.h:1225:31: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Link: https://gist.github.com/GustavoARSilva/772526a39be3dd4db39e71497f0a9893 [1] Link: https://github.com/KSPP/linux/issues/202 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
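The pattern can be illustrated with a simplified userspace stand-in. `DEMO_STRUCT_GROUP` below is a hypothetical, reduced imitation of the kernel's `__struct_group()` helper (the real macro takes additional arguments), and the `demo_*` structs and their fields are invented for the example rather than copied from smb2pdu.h.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified imitation of __struct_group(): the members are
 * reachable directly and also through a tagged, fixed-size sub-struct.
 */
#define DEMO_STRUCT_GROUP(TAG, NAME, ...) \
	union { \
		struct { __VA_ARGS__ }; \
		struct TAG { __VA_ARGS__ } NAME; \
	}

/* Flexible structure: fixed header group plus a trailing flexible array. */
struct demo_ctx {
	DEMO_STRUCT_GROUP(demo_ctx_hdr, hdr,
		uint32_t next;
		uint16_t name_offset;
		uint16_t name_length;
	);
	uint8_t buffer[];	/* flexible array stays at the very end */
};

/*
 * Other structs embed only the tagged header type, so no flexible array
 * ends up in the middle of the enclosing structure.
 */
struct demo_request {
	struct demo_ctx_hdr ctx;	/* fixed size, safe mid-struct */
	uint32_t trailer;
};
```

Code that needs the variable-length payload keeps using `struct demo_ctx`, while containers switch to `struct demo_ctx_hdr`, which is the same substitution the commit makes with `struct create_context_hdr`.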
7 daysMerge branch ↵Jakub Kicinski8-95/+224
'net-gro-remove-network_header-use-move-p-flush-flush_id-calculations-to-l4' Richard Gobert says: ==================== net: gro: remove network_header use, move p->{flush/flush_id} calculations to L4 The cb fields network_offset and inner_network_offset are used instead of skb->network_header throughout GRO. These fields are then leveraged in the next commit to remove flush_id state from napi_gro_cb, and stateful code in {ipv6,inet}_gro_receive which may be unnecessarily complicated due to encapsulation support in GRO. These fields are checked in L4 instead. 3rd patch adds tests for different flush_id flows in GRO. ==================== Link: https://lore.kernel.org/r/20240509190819.2985-1-richardbgobert@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysselftests/net: add flush id selftestsRichard Gobert1-0/+138
Added flush id selftests to test different cases where DF flag is set or unset and id value changes in the following packets. All cases where the packets should coalesce or should not coalesce are tested. Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240509190819.2985-4-richardbgobert@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysnet: gro: move L3 flush checks to tcp_gro_receive and udp_gro_receive_segmentRichard Gobert6-84/+73
{inet,ipv6}_gro_receive functions perform flush checks (ttl, flags, iph->id, ...) against all packets in a loop. These flush checks are used in all merging UDP and TCP flows. These checks need to be done only once and only against the found p skb, since they only affect flush and not same_flow. This patch leverages correct network header offsets from the cb for both outer and inner network headers - allowing these checks to be done only once, in tcp_gro_receive and udp_gro_receive_segment. As a result, NAPI_GRO_CB(p)->flush is not used at all. In addition, flush_id checks are more declarative and contained in inet_gro_flush, thus removing the need for flush_id in napi_gro_cb. This results in less parsing code for non-loop flush tests for TCP and UDP flows. To make sure results are not within noise range - I've made netfilter drop all TCP packets, and measured CPU performance in GRO (in this case GRO is responsible for about 50% of the CPU utilization). perf top while replaying 64 parallel IP/TCP streams merging in GRO: (gro_receive_network_flush is compiled inline to tcp_gro_receive) net-next: 6.94% [kernel] [k] inet_gro_receive 3.02% [kernel] [k] tcp_gro_receive patch applied: 4.27% [kernel] [k] tcp_gro_receive 4.22% [kernel] [k] inet_gro_receive perf top while replaying 64 parallel IP/IP/TCP streams merging in GRO (same results for any encapsulation, in this case inet_gro_receive is top offender in net-next) net-next: 10.09% [kernel] [k] inet_gro_receive 2.08% [kernel] [k] tcp_gro_receive patch applied: 6.97% [kernel] [k] inet_gro_receive 3.68% [kernel] [k] tcp_gro_receive Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240509190819.2985-3-richardbgobert@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysnet: gro: use cb instead of skb->network_headerRichard Gobert5-11/+13
This patch converts references of skb->network_header to napi_gro_cb's network_offset and inner_network_offset. Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240509190819.2985-2-richardbgobert@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysMerge branch 'ena-driver-changes-may-2024'Jakub Kicinski7-28/+73
David Arinzon says: ==================== ENA driver changes May 2024 This patchset contains several misc and minor changes to the ENA driver. ==================== Link: https://lore.kernel.org/r/20240512134637.25299-1-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysnet: ena: Change initial rx_usec intervalDavid Arinzon1-1/+1
For the purpose of obtaining better CPU utilization, minimum rx moderation interval is set to 20 usec. Signed-off-by: Osama Abboud <osamaabb@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240512134637.25299-6-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysnet: ena: Changes around strscpy callsDavid Arinzon2-8/+25
strscpy copies as much of the string as possible, meaning that the destination string will be truncated if there is not enough space. As this is a non-critical error in our case, a debug-level print is added as an indication. This patch also removes a -1 which was added to ensure enough space for NUL; since the strscpy destination string is guaranteed to be NUL-terminated, the -1 is not needed. Signed-off-by: David Arinzon <darinzon@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240512134637.25299-5-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
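The contract this commit relies on can be shown with a userspace stand-in. `demo_strscpy` is a hypothetical reimplementation written for this illustration, not the kernel function; it only mirrors the documented behaviour of returning the copied length, or -E2BIG when the source had to be truncated.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical userspace stand-in for strscpy(): copy up to size - 1
 * characters and always NUL-terminate the destination when size > 0.
 */
long demo_strscpy(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -E2BIG;

	for (i = 0; i < size; i++) {
		dst[i] = src[i];
		if (src[i] == '\0')
			return (long)i;	/* whole string fit */
	}

	dst[size - 1] = '\0';	/* truncated: still NUL-terminated */
	return -E2BIG;
}
```

Because the terminator is guaranteed, a caller passes the full `sizeof(dst)` rather than `sizeof(dst) - 1`, and a negative return value is the point where a debug-level message about truncation would be logged.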
7 daysnet: ena: Add validation for completion descriptors consistencyDavid Arinzon3-10/+30
Validate that `first` flag is set only for the first descriptor in multi-buffer packets. In case of an invalid descriptor, a reset will occur. A new reset reason for RX data corruption has been added. Signed-off-by: Shahar Itzko <itzko@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240512134637.25299-4-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
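A minimal sketch of such a consistency check follows. The `demo_cdesc` type and the function name are hypothetical; the real ENA completion descriptor layout and reset handling are not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced completion descriptor: only the flag of interest. */
struct demo_cdesc {
	bool first;
};

/*
 * Return true when no descriptor after index 0 of a multi-buffer packet
 * carries the `first` flag. A false result is the case where the driver
 * would trigger a reset with an RX data corruption reason.
 */
bool demo_first_flag_consistent(const struct demo_cdesc *descs, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		if (descs[i].first)
			return false;
	}
	return true;
}
```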
7 daysnet: ena: Reduce holes in ena_com structuresDavid Arinzon2-3/+3
This patch makes two changes in order to fill holes and reduce the overall size of the structures ena_com_dev and ena_com_rx_ctx. Signed-off-by: Shahar Itzko <itzko@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240512134637.25299-3-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
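The general effect of filling holes can be seen with two invented structs; they are not the ENA structures, only an illustration of how member order changes padding.

```c
#include <stdint.h>

/* Small members on both sides of a wide one leave padding holes. */
struct demo_holey {
	uint8_t a;	/* followed by padding up to b's alignment */
	uint64_t b;
	uint8_t c;	/* followed by tail padding */
};

/* Same members, reordered so the small ones share one padded region. */
struct demo_tight {
	uint64_t b;
	uint8_t a;
	uint8_t c;
};
```

On a typical x86-64 build the first layout occupies 24 bytes and the second 16; tools such as pahole report these holes directly.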
7 daysnet: ena: Add a counter for driver's reset failuresDavid Arinzon3-6/+14
This patch adds a counter to the ena_adapter struct in order to keep track of reset failures. The counter is incremented every time either ena_restore_device() or ena_destroy_device() fail. Signed-off-by: Osama Abboud <osamaabb@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240512134637.25299-2-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysselftests: netfilter: nft_flowtable.sh: bump socat timeout to 1mFlorian Westphal1-2/+3
Now that this test runs in netdev CI it looks like 10s isn't enough for debug kernels: selftests: net/netfilter: nft_flowtable.sh 2024/05/10 20:33:08 socat[12204] E write(7, 0x563feb16a000, 8192): Broken pipe FAIL: file mismatch for ns1 -> ns2 -rw------- 1 root root 37345280 May 10 20:32 /tmp/tmp.Am0yEHhNqI ... Looks like socat gets zapped too quickly, so increase timeout to 1m. Could also reduce tx file size for KSFT_MACHINE_SLOW, but it's preferable to have the same test for both debug and nondebug. Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20240511064814.561525-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysMerge tag 'hardening-6.10-rc1' of ↵Linus Torvalds27-477/+768
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: "The bulk of the changes here are related to refactoring and expanding the KUnit tests for string helper and fortify behavior. Some trivial strncpy replacements in fs/ were carried in my tree. Also some fixes to SCSI string handling were carried in my tree since the helper for those was introduced here. Beyond that, just little fixes all around: objtool getting confused about LKDTM+KCFI, preparing for future refactors (constification of sysctl tables, additional __counted_by annotations), a Clang UBSAN+i386 crash fix, and adding more options in the hardening.config Kconfig fragment. Summary: - selftests: Add str*cmp tests (Ivan Orlov) - __counted_by: provide UAPI for _le/_be variants (Erick Archer) - Various strncpy deprecation refactors (Justin Stitt) - stackleak: Use a copy of soon-to-be-const sysctl table (Thomas Weißschuh) - UBSAN: Work around i386 -regparm=3 bug with Clang prior to version 19 - Provide helper to deal with non-NUL-terminated string copying - SCSI: Fix older string copying bugs (with new helper) - selftests: Consolidate string helper behavioral tests - selftests: add memcpy() fortify tests - string: Add additional __realloc_size() annotations for "dup" helpers - LKDTM: Fix KCFI+rodata+objtool confusion - hardening.config: Enable KCFI" * tag 'hardening-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (29 commits) uapi: stddef.h: Provide UAPI macros for __counted_by_{le, be} stackleak: Use a copy of the ctl_table argument string: Add additional __realloc_size() annotations for "dup" helpers kunit/fortify: Fix replaced failure path to unbreak __alloc_size hardening: Enable KCFI and some other options lkdtm: Disable CFI checking for perms functions kunit/fortify: Add memcpy() tests kunit/fortify: Do not spam logs with fortify WARNs kunit/fortify: Rename tests to use recommended conventions init: replace deprecated strncpy with 
strscpy_pad kunit/fortify: Fix mismatched kvalloc()/vfree() usage scsi: qla2xxx: Avoid possible run-time warning with long model_num scsi: mpi3mr: Avoid possible run-time warning with long manufacturer strings scsi: mptfusion: Avoid possible run-time warning with long manufacturer strings fs: ecryptfs: replace deprecated strncpy with strscpy hfsplus: refactor copy_name to not use strncpy reiserfs: replace deprecated strncpy with scnprintf virt: acrn: replace deprecated strncpy with strscpy ubsan: Avoid i386 UBSAN handler crashes with Clang ubsan: Remove 1-element array usage in debug reporting ...
7 daysMerge tag 'execve-6.10-rc1' of ↵Linus Torvalds10-51/+120
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull execve updates from Kees Cook: - Provide knob to change (previously fixed) coredump NOTES size (Allen Pais) - Add sched_prepare_exec tracepoint (Marco Elver) - Make /proc/$pid/auxv work under binfmt_elf_fdpic (Max Filippov) - Convert ARCH_HAVE_EXTRA_ELF_NOTES to proper Kconfig (Vignesh Balasubramanian) - Leave a gap between .bss and brk * tag 'execve-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: fs/coredump: Enable dynamic configuration of max file note size binfmt_elf_fdpic: fix /proc/<pid>/auxv binfmt_elf: Leave a gap between .bss and brk Replace macro "ARCH_HAVE_EXTRA_ELF_NOTES" with kconfig tracing: Add sched_prepare_exec tracepoint
7 daysMerge tag 'seccomp-6.10-rc1' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp update from Kees Cook: - Prepare for sysctl table constification * tag 'seccomp-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: seccomp: Constify sysctl subhelpers
7 daysselftests: net: use upstream mtoolsVladimir Oltean1-2/+17
Joachim kindly merged the IPv6 support in https://github.com/troglobit/mtools/pull/2, so we can just use his version now. A few more fixes subsequently came in for IPv6, so even better. Check that the deployed mtools version is 3.0 or above. Note that the version check breaks compatibility with my fork where I didn't bump the version, but I assume that won't be a problem. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20240510112856.1262901-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysselftest: epoll_busy_poll: Fix spelling mistake "couldnt" -> "couldn't"Colin Ian King1-1/+1
There is a spelling mistake in a TH_LOG message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240510084811.3299685-1-colin.i.king@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysinet: fix inet_fill_ifaddr() flags truncationEric Dumazet1-3/+10
I missed that (struct ifaddrmsg)->ifa_flags was only 8bits, while (struct in_ifaddr)->ifa_flags is 32bits. Use a temporary 32bit variable as I did in set_ifa_lifetime() and check_lifetime(). Fixes: 3ddc2231c810 ("inet: annotate data-races around ifa->ifa_flags") Reported-by: Yu Watanabe <watanabe.yu@gmail.com> Dianosed-by: Yu Watanabe <watanabe.yu@gmail.com> Closes: https://github.com/systemd/systemd/pull/32666#issuecomment-2103977928 Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240510072932.2678952-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
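The truncation at issue can be reproduced with a small sketch. The struct, the flag values and the function name are invented for illustration and do not match the real netlink structures; the point is only that reading the flags once into a 32-bit temporary lets the full value be reported alongside the legacy 8-bit field.

```c
#include <stdint.h>

/* Hypothetical flag values: one fits in 8 bits, one does not. */
#define DEMO_FLAG_LOW	0x80u
#define DEMO_FLAG_HIGH	0x200u

/* Hypothetical message with a legacy 8-bit field and a full-width field. */
struct demo_msg {
	uint8_t flags8;		/* legacy field: low 8 bits only */
	uint32_t flags32;	/* carries the complete flag set */
};

/* Read the flags once into a 32-bit temporary and fill both fields. */
void demo_fill(struct demo_msg *m, uint32_t ifa_flags)
{
	uint32_t flags = ifa_flags;

	m->flags8 = (uint8_t)flags;	/* intentionally truncated view */
	m->flags32 = flags;		/* nothing lost here */
}
```

If the temporary were 8 bits wide instead, `DEMO_FLAG_HIGH` would already be lost before the full-width field is written, which is the kind of bug the commit describes.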
7 daysnet: phy: air_en8811h: reset netdev rules when LED is set manuallyDaniel Golle1-0/+4
Setting LED_OFF via brightness_set should deactivate hw control, so make sure netdev trigger rules also get cleared in that case. This fixes unwanted restoration of the default netdev trigger rules and matches the behaviour when using the 'netdev' trigger without any hardware offloading. Fixes: 71e79430117d ("net: phy: air_en8811h: Add the Airoha EN8811H PHY driver") Signed-off-by: Daniel Golle <daniel@makrotopia.org> Link: https://lore.kernel.org/r/5ed8ea615890a91fa4df59a7ae8311bbdf63cdcf.1715248281.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysKEYS: asymmetric: Add missing dependencies of FIPS_SIGNATURE_SELFTESTEric Biggers1-0/+2
Since the signature self-test uses RSA and SHA-256, it must only be enabled when those algorithms are enabled. Otherwise it fails and panics the kernel on boot-up. Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202404221528.51d75177-lkp@intel.com Fixes: 3cde3174eb91 ("certs: Add FIPS selftests") Cc: stable@vger.kernel.org Cc: Simo Sorce <simo@redhat.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
7 daysKEYS: asymmetric: Add missing dependency on CRYPTO_SIGEric Biggers1-0/+1
Make ASYMMETRIC_PUBLIC_KEY_SUBTYPE select CRYPTO_SIG to avoid build errors like the following, which were possible with CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y && CONFIG_CRYPTO_SIG=n: ld: vmlinux.o: in function `public_key_verify_signature': (.text+0x306280): undefined reference to `crypto_alloc_sig' ld: (.text+0x306300): undefined reference to `crypto_sig_set_pubkey' ld: (.text+0x306324): undefined reference to `crypto_sig_verify' ld: (.text+0x30636c): undefined reference to `crypto_sig_set_privkey' Fixes: 63ba4d67594a ("KEYS: asymmetric: Use new crypto interface without scatterlists") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
7 daysMerge tag 'nf-next-24-05-12' of ↵Jakub Kicinski28-175/+639
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: Patch #1 skips transaction if object type provides no .update interface. Patch #2 skips NETDEV_CHANGENAME which is unused. Patch #3 enables conntrack to handle Multicast Router Advertisements and Multicast Router Solicitations from the Multicast Router Discovery protocol (RFC4286) as untracked as opposed to invalid packets. From Linus Luessing. Patch #4 updates DCCP conntracker to mark invalid packets as invalid, instead of dropping them, from Jason Xing. Patch #5 uses NF_DROP instead of -NF_DROP since NF_DROP is 0, also from Jason. Patch #6 removes reference in netfilter's sysctl documentation on pickup entries which were already removed by Florian Westphal. Patch #7 removes check for IPS_OFFLOAD flag to disable early drop, which allows evicting entries from the conntrack table, also from Florian. Patches #8 to #16 update the nf_tables pipapo set backend to allocate the datastructure copy on-demand from the preparation phase, to better deal with OOM situations where the .commit step is too late to fail. Series from Florian Westphal. Patch #17 adds a selftest with packetdrill to cover conntrack TCP state transitions, also from Florian. Patch #18 uses GFP_KERNEL to clone elements from control plane to avoid quick atomic reserves exhaustion with large sets; the reporter refers to sets on the order of a million entries. 
* tag 'nf-next-24-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nf_tables: allow clone callbacks to sleep selftests: netfilter: add packetdrill based conntrack tests netfilter: nft_set_pipapo: remove dirty flag netfilter: nft_set_pipapo: move cloning of match info to insert/removal path netfilter: nft_set_pipapo: prepare pipapo_get helper for on-demand clone netfilter: nft_set_pipapo: merge deactivate helper into caller netfilter: nft_set_pipapo: prepare walk function for on-demand clone netfilter: nft_set_pipapo: prepare destroy function for on-demand clone netfilter: nft_set_pipapo: make pipapo_clone helper return NULL netfilter: nft_set_pipapo: move prove_locking helper around netfilter: conntrack: remove flowtable early-drop test netfilter: conntrack: documentation: remove reference to non-existent sysctl netfilter: use NF_DROP instead of -NF_DROP netfilter: conntrack: dccp: try not to drop skb in conntrack netfilter: conntrack: fix ct-state for ICMPv6 Multicast Router Discovery netfilter: nf_tables: remove NETDEV_CHANGENAME from netdev chain event handler netfilter: nf_tables: skip transaction if update object is not implemented ==================== Link: https://lore.kernel.org/r/20240512161436.168973-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysMerge tag 'for-netdev' of ↵Jakub Kicinski2-8/+62
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-05-13 We've added 3 non-merge commits during the last 2 day(s) which contain a total of 2 files changed, 62 insertions(+), 8 deletions(-). The main changes are: 1) Fix a case where syzkaller found that it's unexpectedly possible to attach a cgroup_skb program to the sockopt hooks. The fix adds missing attach_type enforcement for the link_create case along with selftests, from Stanislav Fomichev. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Add sockopt case to verify prog_type selftests/bpf: Extend sockopt tests to use BPF_LINK_CREATE bpf: Add BPF_PROG_TYPE_CGROUP_SKB attach type enforcement in BPF_LINK_CREATE ==================== Link: https://lore.kernel.org/r/20240513041845.31040-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysMerge tag 'for-6.10/block-20240511' of git://git.kernel.dk/linuxLinus Torvalds72-2949/+2555
Pull block updates from Jens Axboe: - Add a partscan attribute in sysfs, fixing an issue with systemd relying on an internal interface that went away. - Attempt #2 at making long running discards interruptible. The previous attempt went into 6.9, but we ended up mostly reverting it as it had issues. - Remove old ida_simple API in bcache - Support for zoned write plugging, greatly improving the performance on zoned devices. - Remove the old throttle low interface, which has been experimental since 2017 and never made it beyond that and isn't being used. - Remove page->index debugging checks in brd, as it hasn't caught anything and prepares us for removing in struct page. - MD pull request from Song - Don't schedule block workers on isolated CPUs * tag 'for-6.10/block-20240511' of git://git.kernel.dk/linux: (84 commits) blk-throttle: delay initialization until configuration blk-throttle: remove CONFIG_BLK_DEV_THROTTLING_LOW block: fix that util can be greater than 100% block: support to account io_ticks precisely block: add plug while submitting IO bcache: fix variable length array abuse in btree_iter bcache: Remove usage of the deprecated ida_simple_xx() API md: Revert "md: Fix overflow in is_mddev_idle" blk-lib: check for kill signal in ioctl BLKDISCARD block: add a bio_await_chain helper block: add a blk_alloc_discard_bio helper block: add a bio_chain_and_submit helper block: move discard checks into the ioctl handler block: remove the discard_granularity check in __blkdev_issue_discard block/ioctl: prefer different overflow check null_blk: Fix the WARNING: modpost: missing MODULE_DESCRIPTION() block: fix and simplify blkdevparts= cmdline parsing block: refine the EOF check in blkdev_iomap_begin block: add a partscan sysfs attribute for disks block: add a disk_has_partscan helper ...
7 daysMerge tag 'for-6.10/io_uring-20240511' of git://git.kernel.dk/linuxLinus Torvalds51-1762/+2050
Pull io_uring updates from Jens Axboe: - Greatly improve send zerocopy performance, by enabling coalescing of sent buffers. MSG_ZEROCOPY already does this with send(2) and sendmsg(2), but the io_uring side did not. In local testing, the crossover point for send zerocopy being faster is now around 3000 byte packets, and it performs better than the sync syscall variants as well. This feature relies on a shared branch with net-next, which was pulled into both branches. - Unification of how async preparation is done across opcodes. Previously, opcodes that required extra memory for async retry would allocate that as needed, using on-stack state until that was the case. If async retry was needed, the on-stack state was adjusted appropriately for a retry and then copied to the allocated memory. This led to some fragile and ugly code, particularly for read/write handling, and made storage retries more difficult than they needed to be. Allocate the memory upfront, as it's cheap from our pools, and use that state consistently both initially and also from the retry side. - Move away from using remap_pfn_range() for mapping the rings. This is really not the right interface to use and can cause lifetime issues or leaks. Additionally, it means the ring sq/cq arrays need to be physically contiguous, which can cause problems in production with larger rings when services are restarted, as memory can be very fragmented at that point. Move to using vm_insert_page(s) for the ring sq/cq arrays, and apply the same treatment to mapped ring provided buffers. This also helps unify the code we have dealing with allocating and mapping memory. Hard to see in the diffstat as we're adding a few features as well, but this removes about 400 lines of code from the codebase. - Add support for bundles for send/recv. 
When used with provided buffers, bundles support sending or receiving more than one buffer at the time, improving the efficiency by only needing to call into the networking stack once for multiple sends or receives. - Tweaks for our accept operations, supporting both a DONTWAIT flag for skipping poll arm and retry if we can, and a POLLFIRST flag that the application can use to skip the initial accept attempt and rely purely on poll for triggering the operation. Both of these have identical flags on the receive side already. - Make the task_work ctx locking unconditional. We had various code paths here that would do a mix of lock/trylock and set the task_work state to whether or not it was locked. All of that goes away, we lock it unconditionally and get rid of the state flag indicating whether it's locked or not. The state struct still exists as an empty type, can go away in the future. - Add support for specifying NOP completion values, allowing it to be used for error handling testing. - Use set/test bit for io-wq worker flags. Not strictly needed, but also doesn't hurt and helps silence a KCSAN warning. - Cleanups for io-wq locking and work assignments, closing a tiny race where cancelations would not be able to find the work item reliably. 
- Misc fixes, cleanups, and improvements * tag 'for-6.10/io_uring-20240511' of git://git.kernel.dk/linux: (97 commits) io_uring: support to inject result for NOP io_uring: fail NOP if non-zero op flags is passed in io_uring/net: add IORING_ACCEPT_POLL_FIRST flag io_uring/net: add IORING_ACCEPT_DONTWAIT flag io_uring/filetable: don't unnecessarily clear/reset bitmap io_uring/io-wq: Use set_bit() and test_bit() at worker->flags io_uring/msg_ring: cleanup posting to IOPOLL vs !IOPOLL ring io_uring: Require zeroed sqe->len on provided-buffers send io_uring/notif: disable LAZY_WAKE for linked notifs io_uring/net: fix sendzc lazy wake polling io_uring/msg_ring: reuse ctx->submitter_task read using READ_ONCE instead of re-reading it io_uring/rw: reinstate thread check for retries io_uring/notif: implement notification stacking io_uring/notif: simplify io_notif_flush() net: add callback for setting a ubuf_info to skb net: extend ubuf_info callback to ops structure io_uring/net: support bundles for recv io_uring/net: support bundles for send io_uring/kbuf: add helpers for getting/peeking multiple buffers io_uring/net: add provided buffer support for IORING_OP_SEND ...
7 daysMerge tag 'vfs-6.10.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds5-50/+93
Pull vfs rw iterator updates from Christian Brauner: "The core fs signalfd, userfaultfd, and timerfd subsystems did still use f_op->read() instead of f_op->read_iter(). Convert them over since we should aim to get rid of f_op->read() at some point. Aside from that io_uring and others want to mark files as FMODE_NOWAIT so it can make use of per-IO nonblocking hints to enable more efficient IO. Converting those users to f_op->read_iter() allows them to be marked with FMODE_NOWAIT" * tag 'vfs-6.10.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: signalfd: convert to ->read_iter() userfaultfd: convert to ->read_iter() timerfd: convert to ->read_iter() new helper: copy_to_iter_full()
7 daysMerge tag 'vfs-6.10.netfs' of ↵Linus Torvalds49-4588/+3298
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull netfs updates from Christian Brauner: "This reworks the netfslib writeback implementation so that pages read from the cache are written to the cache through ->writepages(), thereby allowing the fscache page flag to be retired. The reworking also: - builds on top of the new writeback_iter() infrastructure - makes it possible to use vectored write RPCs as discontiguous streams of pages can be accommodated - makes it easier to do simultaneous content crypto and stream division - provides support for retrying writes and re-dividing a stream - replaces the ->launder_folio() op, so that ->writepages() is used instead - uses mempools to allocate the netfs_io_request and netfs_io_subrequest structs to avoid allocation failure in the writeback path Some code that uses the fscache page flag is retained for compatibility purposes with nfs and ceph. The code is switched to using the synonymous private_2 label instead and marked with deprecation comments. The merge commit contains additional details on the new algorithm that I've left out of here as it would probably be excessively detailed. 
On top of the netfslib infrastructure this contains the work to convert cifs over to netfslib" * tag 'vfs-6.10.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (38 commits) cifs: Enable large folio support cifs: Remove some code that's no longer used, part 3 cifs: Remove some code that's no longer used, part 2 cifs: Remove some code that's no longer used, part 1 cifs: Cut over to using netfslib cifs: Implement netfslib hooks cifs: Make add_credits_and_wake_if() clear deducted credits cifs: Add mempools for cifs_io_request and cifs_io_subrequest structs cifs: Set zero_point in the copy_file_range() and remap_file_range() cifs: Move cifs_loose_read_iter() and cifs_file_write_iter() to file.c cifs: Replace the writedata replay bool with a netfs sreq flag cifs: Make wait_mtu_credits take size_t args cifs: Use more fields from netfs_io_subrequest cifs: Replace cifs_writedata with a wrapper around netfs_io_subrequest cifs: Replace cifs_readdata with a wrapper around netfs_io_subrequest cifs: Use alternative invalidation to using launder_folio netfs, afs: Use writeback retry to deal with alternate keys netfs: Miscellaneous tidy ups netfs: Remove the old writeback code netfs: Cut over to using new writeback code ...
7 daysMerge tag 'vfs-6.10.mount' of ↵Linus Torvalds6-312/+330
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs mount API conversions from Christian Brauner: "This converts qnx6, minix, debugfs, tracefs, freevxfs, and openpromfs to the new mount api, further reducing the number of filesystems relying on the legacy mount api" * tag 'vfs-6.10.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: minix: convert minix to use the new mount api vfs: Convert tracefs to use the new mount API vfs: Convert debugfs to use the new mount API openpromfs: finish conversion to the new mount API freevxfs: Convert freevxfs to the new mount API. qnx6: convert qnx6 to use the new mount api
7 daysMerge branches 'acpi-tools', 'acpi-docs' and 'pnp'Rafael J. Wysocki3-3/+5
Merge an ACPI pfrut utility update, an ACPI documentation update and a PNP update for 6.10: - Fix a typo in the ACPI documentation regarding the layout of sysfs subdirectory representing the ACPI namespace (John Watts). - Make the ACPI pfrut utility print the update_cap field during capability query (Chen Yu). - Add HAS_IOPORT dependencies to PNP (Niklas Schnelle). * acpi-tools: ACPI: tools: pfrut: Print the update_cap field during capability query * acpi-docs: Documentation: firmware-guide: ACPI: Fix namespace typo * pnp: PNP: add HAS_IOPORT dependencies
7 daysMerge branches 'acpi-x86', 'acpi-dptf' and 'acpi-apei'Rafael J. Wysocki16-23/+61
Merge x86-specific ACPI updates, an ACPI DPTF driver update adding new platform support to it, and an ACPI APEI update: - Add a num-cs device property to specify the number of chip selects for Intel Braswell to the ACPI LPSS (Intel SoC) driver and remove a nested CONFIG_PM #ifdef from it (Andy Shevchenko). - Move three x86-specific ACPI files to the x86 directory (Andy Shevchenko). - Mark SMO8810 accel on Dell XPS 15 9550 as always present and add a PNP_UART1_SKIP quirk for Lenovo Blade2 tablets (Hans de Goede). - Move acpi_blacklisted() declaration to asm/acpi.h (Kuppuswamy Sathyanarayanan). - Add Lunar Lake support to the ACPI DPTF driver (Sumeet Pawnikar). - Mark the einj_driver driver's remove callback as __exit because it cannot get unbound via sysfs (Uwe Kleine-König). * acpi-x86: ACPI: Move acpi_blacklisted() declaration to asm/acpi.h ACPI: x86: Add PNP_UART1_SKIP quirk for Lenovo Blade2 tablets ACPI: x86: utils: Mark SMO8810 accel on Dell XPS 15 9550 as always present ACPI: x86: Move LPSS to x86 folder ACPI: x86: Move blacklist to x86 folder ACPI: x86: Move acpi_cmos_rtc to x86 folder ACPI: x86: Introduce a Makefile ACPI: LPSS: Remove nested ifdeffery for CONFIG_PM ACPI: LPSS: Advertise number of chip selects via property * acpi-dptf: ACPI: DPTF: Add Lunar Lake support * acpi-apei: ACPI: APEI: EINJ: mark remove callback as __exit
7 daysMerge tag 'vfs-6.10.misc' of ↵Linus Torvalds50-249/+438
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual miscellaneous features, cleanups, and fixes for vfs and individual fses. Features: - Free up FMODE_* bits. I've freed up bits 6, 7, 8, and 24. That means we now have six free FMODE_* bits in total (but bit #6 already got used for FMODE_WRITE_RESTRICTED) - Add FOP_HUGE_PAGES flag (follow-up to FMODE_* cleanup) - Add fd_raw cleanup class so we can make use of automatic cleanup provided by CLASS(fd_raw, f)(fd) for O_PATH fds as well - Optimize seq_puts() - Simplify __seq_puts() - Add new anon_inode_getfile_fmode() api to allow specifying f_mode instead of open-coding it in multiple places - Annotate struct file_handle with __counted_by() and use struct_size() - Warn in get_file() whether f_count resurrection from zero is attempted (epoll/drm discussion) - Folio-sophize aio - Export the subvolume id in statx() for both btrfs and bcachefs - Relax linkat(AT_EMPTY_PATH) requirements - Add F_DUPFD_QUERY fcntl() allowing to compare two file descriptors for dup*() equality replacing kcmp() Cleanups: - Compile out swapfile inode checks when swap isn't enabled - Use (1 << n) notation for FMODE_* bitshifts for clarity - Remove redundant variable assignment in fs/direct-io - Cleanup uses of strncpy in orangefs - Speed up and cleanup writeback - Move fsparam_string_empty() helper into header since it's currently open-coded in multiple places - Add kernel-doc comments to proc_create_net_data_write() - Don't needlessly read dentry->d_flags twice Fixes: - Fix out-of-range warning in nilfs2 - Fix ecryptfs overflow due to wrong encryption packet size calculation - Fix overly long line in xfs file_operations (follow-up to FMODE_* cleanup) - Don't raise FOP_BUFFER_{R,W}ASYNC for directories in xfs (follow-up to FMODE_* cleanup) - Don't call xfs_file_open from xfs_dir_open (follow-up to FMODE_* cleanup) - Fix stable offset api to prevent endless loops - Fix afs 
file server rotations - Prevent xattr node from overflowing the eraseblock in jffs2 - Move fdinfo PTRACE_MODE_READ procfs check into the .permission() operation instead of .open() operation since this caused userspace regressions" * tag 'vfs-6.10.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits) afs: Fix fileserver rotation getting stuck selftests: add F_DUPDFD_QUERY selftests fcntl: add F_DUPFD_QUERY fcntl() file: add fd_raw cleanup class fs: WARN when f_count resurrection is attempted seq_file: Simplify __seq_puts() seq_file: Optimize seq_puts() proc: Move fdinfo PTRACE_MODE_READ check into the inode .permission operation fs: Create anon_inode_getfile_fmode() xfs: don't call xfs_file_open from xfs_dir_open xfs: drop fop_flags for directories xfs: fix overly long line in the file_operations shmem: Fix shmem_rename2() libfs: Add simple_offset_rename() API libfs: Fix simple_offset_rename_exchange() jffs2: prevent xattr node from overflowing the eraseblock vfs, swap: compile out IS_SWAPFILE() on swapless configs vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements fs/direct-io: remove redundant assignment to variable retval fs/dcache: Re-use value stored to dentry->d_flags instead of re-reading ...
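The F_DUPFD_QUERY item above can be illustrated with a minimal user-space sketch. The helper name same_open_file() is hypothetical, and the fallback value of F_DUPFD_QUERY (F_LINUX_SPECIFIC_BASE + 3, i.e. 1027) is an assumption about the 6.10 uapi header for toolchains whose headers do not define it yet. Kernels without the command are expected to fail the call with EINVAL.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Assumption: value taken from the 6.10 uapi header, used only if libc lacks it. */
#ifndef F_DUPFD_QUERY
#define F_DUPFD_QUERY (1024 + 3)
#endif

/*
 * Hypothetical helper: ask the kernel whether fd1 and fd2 refer to the
 * same open file (as after dup()).
 * Returns 1 if they do, 0 if they do not, and -1 with errno set on error
 * (for example EINVAL on kernels that predate F_DUPFD_QUERY).
 */
int same_open_file(int fd1, int fd2)
{
    return fcntl(fd1, F_DUPFD_QUERY, fd2);
}
```

A caller that needs to run on older kernels would treat -1/EINVAL as "unknown" and fall back to kcmp() or another method.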
7 daysMerge tag 'vfs-6.10.iomap' of ↵Linus Torvalds1-54/+65
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs iomap updates from Christian Brauner: "This contains a few cleanups to the iomap code. Nothing particularly stands out" * tag 'vfs-6.10.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iomap: do some small logical cleanup in buffered write iomap: make iomap_write_end() return a boolean iomap: use a new variable to handle the written bytes in iomap_write_iter() iomap: don't increase i_size if it's not a write operation iomap: drop the write failure handles when unsharing and zeroing iomap: convert iomap_writepages to writeack_iter
7 daysMerge branches 'pm-em' and 'pm-docs'Rafael J. Wysocki6-26/+122
Merge an Energy Model update and a power management documentation update for 6.10: - Make the Samsung exynos-asv driver update the Energy Model after adjusting voltage on top of some preliminary changes of the OPP and Energy Model generic code (Lukasz Luba). - Remove a reference to a function that has been dropped from the power management documentation (Bjorn Helgaas). * pm-em: soc: samsung: exynos-asv: Update Energy Model after adjusting voltage PM: EM: Add em_dev_update_chip_binning() PM: EM: Refactor em_adjust_new_capacity() OPP: OF: Export dev_opp_pm_calc_power() for usage from EM * pm-docs: Documentation: PM: Update platform_pci_wakeup_init() reference
7 daysMerge branches 'pm-cpuidle', 'pm-sleep' and 'pm-powercap'Rafael J. Wysocki11-37/+652
Merge cpuidle updates, changes related to system sleep and power capping updates for 6.10: - Fix kerneldoc description of ladder_do_selection() (Jeff Johnson). - Convert the cpuidle kirkwood driver to platform remove callback returning void (Yangtao Li). - Replace deprecated strncpy() with strscpy() in the hibernation core code (Justin Stitt). - Use %ps to simplify debug output in the core system-wide suspend and resume code (Len Brown). - Remove unnecessary else from device_init_wakeup() and make device_wakeup_disable() return void (Dhruva Gole). - Enable PMU support in the Intel TPMI RAPL driver (Zhang Rui). - Add support for ArrowLake-H platform to the Intel RAPL driver (Zhang Rui). - Avoid explicit cpumask allocation on stack in DTPM (Dawei Li). * pm-cpuidle: cpuidle: ladder: fix ladder_do_selection() kernel-doc cpuidle: kirkwood: Convert to platform remove callback returning void * pm-sleep: PM: hibernate: replace deprecated strncpy() with strscpy() PM: sleep: Take advantage of %ps to simplify debug output PM: wakeup: Remove unnecessary else from device_init_wakeup() PM: wakeup: make device_wakeup_disable() return void * pm-powercap: powercap: intel_rapl_tpmi: Enable PMU support powercap: intel_rapl: Introduce APIs for PMU support powercap: intel_rapl: Sort header files powercap: intel_rapl: Add support for ArrowLake-H platform powercap: DTPM: Avoid explicit cpumask allocation on stack
7 daysMerge branch 'pm-cpufreq'Rafael J. Wysocki27-349/+694
Merge cpufreq updates for 6.10: - Rework the handling of disabled turbo in the intel_pstate driver and make it update the maximum CPU frequency consistently regardless of the reason on top of a number of cleanups (Rafael Wysocki). - Add missing checks for NULL .exit() cpufreq driver callback to the cpufreq core (Viresh Kumar). - Prevent policy->max from going above the frequency QoS maximum value when cpufreq_frequency_table_verify() is used (Xuewen Yan). - Prevent a negative CPU number or frequency value from being printed if they are really large (Joshua Yeong). - Update MAINTAINERS entry for amd-pstate to add two new submaintainers and a designated reviewer (Huang Rui). - Clean up the amd-pstate driver and update its documentation (Gautham Shenoy). - Fix the highest frequency issue in the amd-pstate driver which limits performance (Perry Yuan). - Enable CPPC v2 for certain processors in the family 17H, as requested by TR40 processor users who expect improved performance and lower system temperature (Perry Yuan). - Change latency and delay values to be read from platform firmware first for more accurate timing (Perry Yuan). - A new quirk is introduced for supporting amd-pstate on legacy processors which either lack CPPC capability, or only have CPPC v2 capability (Perry Yuan). - Sun50i: Add support for opp_supported_hw, H616 platform and general cleanups (Andre Przywara, Martin Botka, Brandon Cheo Fusi, Dan Carpenter, Viresh Kumar). - CPPC: Fix possible null pointer dereference (Aleksandr Mishin). - Eliminate uses of of_node_put() (Javier Carrasco, and Shivani Gupta). - brcmstb-avs: ISO C90 forbids mixed declarations (Portia Stephens). - mediatek: Add support for MT7988A (Sam Shih). - cpufreq-qcom-hw: Add SM4450 compatibles in DT bindings (Tengfei Fan). - Fix struct cpudata::epp_cached kernel-doc in the intel_pstate cpufreq driver (Jeff Johnson). 
* pm-cpufreq: (46 commits) cpufreq: amd-pstate: fix the highest frequency issue which limits performance cpufreq: intel_pstate: fix struct cpudata::epp_cached kernel-doc cpufreq: Fix up printing large CPU numbers and frequency values MAINTAINERS: cpufreq: amd-pstate: Add co-maintainers and reviewer cpufreq: amd-pstate: remove unused variable lowest_nonlinear_freq cpufreq: amd-pstate: fix code format problems cpufreq: amd-pstate: Add quirk for the pstate CPPC capabilities missing cppc_acpi: print error message if CPPC is unsupported cpufreq: amd-pstate: get transition delay and latency value from ACPI tables cpufreq: amd-pstate: Bail out if min/max/nominal_freq is 0 cpufreq: amd-pstate: Remove amd_get_{min,max,nominal,lowest_nonlinear}_freq() cpufreq: amd-pstate: Unify computation of {max,min,nominal,lowest_nonlinear}_freq cpufreq: amd-pstate: Document the units for freq variables in amd_cpudata cpufreq: amd-pstate: Document *_limit_* fields in struct amd_cpudata dt-bindings: cpufreq: cpufreq-qcom-hw: Add SM4450 compatibles cpufreq: sun50i: fix error returns in dt_has_supported_hw() cpufreq: brcmstb-avs-cpufreq: ISO C90 forbids mixed declarations cpufreq: dt-platdev: eliminate uses of of_node_put() cpufreq: dt: eliminate uses of of_node_put() cpufreq: ti: Implement scope-based cleanup in ti_cpufreq_match_node() ...
7 daysMerge tag 'docs-6.10' of git://git.lwn.net/linuxLinus Torvalds88-274/+2299
Pull documentation updates from Jonathan Corbet: "Another not-too-busy cycle for documentation, including: - Some build-system changes to detect the variable fonts installed by some distributions that can break the PDF build. - Various updates and additions to the Spanish, Chinese, Italian, and Japanese translations. - Update the stable-kernel rules to match modern practice ... and the usual array of corrections, updates, and typo fixes" * tag 'docs-6.10' of git://git.lwn.net/linux: (42 commits) cgroup: Add documentation for missing zswap memory.stat kernel-doc: Added "*" in $type_constants2 to fix 'make htmldocs' warning. docs:core-api: fixed typos and grammar in printk-index page Documentation: tracing: Fix spelling mistakes docs/zh_CN/rust: Update the translation of quick-start to 6.9-rc4 docs/zh_CN/rust: Update the translation of general-information to 6.9-rc4 docs/zh_CN/rust: Update the translation of coding-guidelines to 6.9-rc4 docs/zh_CN/rust: Update the translation of arch-support to 6.9-rc4 docs: stable-kernel-rules: fix typo sent->send docs/zh_CN: remove two inconsistent spaces docs: scripts/check-variable-fonts.sh: Improve commands for detection docs: stable-kernel-rules: create special tag to flag 'no backporting' docs: stable-kernel-rules: explain use of stable@kernel.org (w/o @vger.) docs: stable-kernel-rules: remove code-labels tags and a indention level docs: stable-kernel-rules: call mainline by its name and change example docs: stable-kernel-rules: reduce redundancy docs, kprobes: Add riscv as supported architecture Docs: typos/spelling docs: kernel_include.py: Cope with docutils 0.21 docs: ja_JP/howto: Catch up update in v6.8 ...
7 daysMerge tag 'keys-next-6.10-rc1' of ↵Linus Torvalds3-24/+30
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull keys updates from Jarkko Sakkinen: - do not overwrite the key expiration once it is set - move key quota updates earlier into key_put(), instead of updating them in key_gc_unused_keys() * tag 'keys-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: keys: Fix overwrite of key expiration on instantiation keys: update key quotas in key_put()
7 daysMerge tag 'tpmdd-next-6.10-rc1' of ↵Linus Torvalds25-212/+2519
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull TPM updates from Jarkko Sakkinen: "These are the changes for the TPM driver with a single major new feature: TPM bus encryption and integrity protection. The key pair on the TPM side is generated from the so-called null random seed on each power-on of the machine [1]. This supports the TPM encryption of the hard drive by adding a layer of protection against bus interposer attacks. Other than that, a few minor fixes and documentation for tpm_tis to clarify basics of TPM localities for future patch review discussions (will be extended and refined over time, just a seed)" Link: https://lore.kernel.org/linux-integrity/20240429202811.13643-1-James.Bottomley@HansenPartnership.com/ [1] * tag 'tpmdd-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: (28 commits) Documentation: tpm: Add TPM security docs toctree entry tpm: disable the TPM if NULL name changes Documentation: add tpm-security.rst tpm: add the null key name as a sysfs export KEYS: trusted: Add session encryption protection to the seal/unseal path tpm: add session encryption protection to tpm2_get_random() tpm: add hmac checks to tpm2_pcr_extend() tpm: Add the rest of the session HMAC API tpm: Add HMAC session name/handle append tpm: Add HMAC session start and end functions tpm: Add TCG mandated Key Derivation Functions (KDFs) tpm: Add NULL primary creation tpm: export the context save and load commands tpm: add buffer function to point to returned parameters crypto: lib - implement library version of AES in CFB mode KEYS: trusted: tpm2: Use struct tpm_buf for sized buffers tpm: Add tpm_buf_read_{u8,u16,u32} tpm: TPM2B formatted buffers tpm: Store the length of the tpm_buf data separately. tpm: Update struct tpm_buf documentation comments ...
7 daysMerge tag 'keys-trusted-next-6.10-rc1' of ↵Linus Torvalds10-14/+554
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull trusted keys updates from Jarkko Sakkinen: "This contains a new key type for the Data Co-Processor (DCP), which is an IP core built into many NXP SoCs such as i.mx6ull" * tag 'keys-trusted-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: docs: trusted-encrypted: add DCP as new trust source docs: document DCP-backed trusted keys kernel params MAINTAINERS: add entry for DCP-based trusted keys KEYS: trusted: Introduce NXP DCP-backed trusted keys KEYS: trusted: improve scalability of trust source config crypto: mxs-dcp: Add support for hardware-bound keys
7 daysMerge branches 'acpi-resource', 'acpi-property' and 'acpi-numa'Rafael J. Wysocki7-68/+68
Merge ACPI resource management quirks, a documentation update related to the ACPI handling of device properties and ACPI NUMA handling changes for 6.10: - Add ACPI IRQ override quirks for Asus Vivobook Pro N6506MV, TongFang GXxHRXx and GMxHGxx, and XMG APEX 17 M23 (Guenter Schafranek, Tamim Khan, Christoffer Sandberg). - Add reference to UEFI DSD Guide to the documentation related to the ACPI handling of device properties (Sakari Ailus). - Fix SRAT lookup of CFMWS ranges with numa_fill_memblks(), remove leftover architecture-dependent code from the ACPI NUMA handling code and simplify it on top of that (Robert Richter). * acpi-resource: ACPI: resource: Skip IRQ override on Asus Vivobook Pro N6506MV ACPI: resource: Do IRQ override on TongFang GXxHRXx and GMxHGxx ACPI: resource: Do IRQ override on GMxBGxx (XMG APEX 17 M23) * acpi-property: ACPI: property: Add reference to UEFI DSD Guide * acpi-numa: ACPI/NUMA: Squash acpi_numa_memory_affinity_init() into acpi_parse_memory_affinity() ACPI/NUMA: Squash acpi_numa_slit_init() into acpi_parse_slit() ACPI/NUMA: Remove architecture dependent remainings x86/numa: Fix SRAT lookup of CFMWS ranges with numa_fill_memblks()
7 daysMerge branches 'acpi-scan' and 'acpi-tables'Rafael J. Wysocki9-238/+652
Merge ACPI device enumeration changes and ACPI data-only tables support updates for 6.10: - Rearrange fields in several structures to effectively eliminate computations from container_of() in some cases (Andy Shevchenko). - Do some assorted cleanups of the ACPI device enumeration code (Andy Shevchenko). - Make the ACPI device enumeration code skip devices with _STA values clearly identified by the specification as invalid (Rafael Wysocki). - Rework the handling of the NHLT table to simplify and clarify it and drop some obsolete pieces (Cezary Rojewski). * acpi-scan: ACPI: scan: Avoid enumerating devices with clearly invalid _STA values ACPI: scan: Introduce typedef:s for struct acpi_hotplug_context members ACPI: scan: Use standard error checking pattern ACPI: scan: Move misleading comment to acpi_dma_configure_id() ACPI: scan: Use list_first_entry_or_null() in acpi_device_hid() ACPI: bus: Don't use "proxy" headers ACPI: bus: Make container_of() no-op where it makes sense * acpi-tables: ACPI: NHLT: Streamline struct naming ACPI: NHLT: Drop redundant types ACPI: NHLT: Introduce API for the table ACPI: NHLT: Reintroduce types the table consists of
7 daysMerge tag 'slab-for-6.10' of ↵Linus Torvalds4-54/+96
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab updates from Vlastimil Babka: "This time it's mostly random cleanups and fixes, with two performance fixes that might have significant impact, but limited to systems experiencing particular bad corner case scenarios rather than general performance improvements. The memcg hook changes are going through the mm tree due to dependencies. - Prevent stalls when reading /proc/slabinfo (Jianfeng Wang) This fixes the long-standing problem that can happen with workloads that have alloc/free patterns resulting in many partially used slabs (in e.g. dentry cache). Reading /proc/slabinfo will traverse the long partial slab list under spinlock with disabled irqs and thus can stall other processes or even trigger the lockup detection. The traversal is only done to count free objects so that the <active_objs> column can be reported along with <num_objs>. To avoid affecting fast paths with another shared counter (attempted in the past) or complex partial list traversal schemes that allow rescheduling, the chosen solution resorts to approximation - when the partial list is over 10000 slabs long, we will only traverse the first 5000 slabs from head and tail each and use the average of those to estimate the whole list. Both head and tail are used as the slabs near head tend to have more free objects than the slabs towards the tail. It is expected the approximation should not break existing /proc/slabinfo consumers. The <num_objs> field is still accurate and reflects the overall kmem_cache footprint. The <active_objs> was already imprecise due to cpu and percpu-partial slabs, so can't be relied upon to determine exact cache usage. The difference between <active_objs> and <num_objs> is mainly useful to determine the slab fragmentation, and that will be possible even with the approximation in place. 
- Prevent allocating many slabs when a NUMA node is full (Chen Jun) Currently, on NUMA systems with a node under significantly bigger pressure than other nodes, the fallback strategy may cause each kmalloc_node() that can't be satisfied from the preferred node to allocate a new slab on a fallback node instead of reusing the slabs already on that node's partial list. This is now fixed and partial lists of fallback nodes are checked even for kmalloc_node() allocations. It's still preferred to allocate a new slab on the requested node before a fallback, but only with a GFP_NOWAIT attempt, which will fail quickly when the node is under a significant memory pressure. - More SLAB removal related cleanups (Xiu Jianfeng, Hyunmin Lee) - Fix slub_kunit self-test with hardened freelists (Guenter Roeck) - Mark racy accesses for KCSAN (linke li) - Misc cleanups (Xiongwei Song, Haifeng Xu, Sangyun Kim)" * tag 'slab-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm/slub: remove the check for NULL kmalloc_caches mm/slub: create kmalloc 96 and 192 caches regardless cache size order mm/slub: mark racy access on slab->freelist slub: use count_partial_free_approx() in slab_out_of_memory() slub: introduce count_partial_free_approx() slub: Set __GFP_COMP in kmem_cache by default mm/slub: remove duplicate initialization for early_kmem_cache_node_alloc() mm/slub: correct comment in do_slab_free() mm/slub, kunit: Use inverted data to corrupt kmem cache mm/slub: simplify get_partial_node() mm/slub: add slub_get_cpu_partial() helper mm/slub: remove the check of !kmem_cache_has_cpu_partial() mm/slub: Reduce memory consumption in extreme scenarios mm/slub: mark racy accesses on slab->slabs mm/slub: remove dummy slabinfo functions
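The /proc/slabinfo approximation described in the slab pull message can be sketched in user space as follows. This is not the kernel's count_partial_free_approx(): the partial list is modelled as a plain array, the function name and the limit parameter are hypothetical, and the threshold quoted in the message (10000 slabs, 5000 from each end) would be passed as limit = 10000.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: estimate the number of free objects on a partial
 * list. free_objs[i] is the free-object count of the i-th slab, n is the
 * list length, limit is the length above which sampling kicks in.
 */
unsigned long long count_free_approx(const unsigned int *free_objs,
                                     size_t n, size_t limit)
{
    unsigned long long sum = 0;
    size_t half = limit / 2;
    size_t i;

    /* Short list (or degenerate limit): walk everything, exact result. */
    if (n <= limit || half == 0) {
        for (i = 0; i < n; i++)
            sum += free_objs[i];
        return sum;
    }

    /* Sample the first `half` slabs from the head ... */
    for (i = 0; i < half; i++)
        sum += free_objs[i];
    /* ... and the last `half` slabs from the tail. */
    for (i = n - half; i < n; i++)
        sum += free_objs[i];

    /* Scale the sampled average per slab up to the whole list. */
    return sum * n / (2 * half);
}
```

Sampling both ends matters because, as the message notes, slabs near the head tend to have more free objects than slabs near the tail; sampling only one end would bias the estimate.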
7 daysMerge branch 'acpi-bus'Rafael J. Wysocki20-29/+21
Merge changes related to _OSC handling and updates eliminating the owner field from struct acpi_driver: - Make the kernel indicate support for several ACPI features that are in fact supported to the platform firmware through _OSC and fix the Generic Initiator Affinity _OSC bit (Armin Wolf). - Make the ACPI core set the owner value for ACPI drivers, drop the owner setting from a number of drivers and eliminate the owner field from struct acpi_driver (Krzysztof Kozlowski). * acpi-bus: (24 commits) ACPI: drop redundant owner from acpi_driver virt: vmgenid: drop owner assignment ptp: vmw: drop owner assignment platform/x86/wireless-hotkey: drop owner assignment platform/x86/toshiba_haps: drop owner assignment platform/x86/toshiba_bluetooth: drop owner assignment platform/x86/toshiba_acpi: drop owner assignment platform/x86/sony-laptop: drop owner assignment platform/x86/lg-laptop: drop owner assignment platform/x86/intel/smartconnect: drop owner assignment platform/x86/intel/rst: drop owner assignment platform/x86/eeepc: drop owner assignment platform/x86/dell: drop owner assignment platform: classmate-laptop: drop owner assignment platform: asus-laptop: drop owner assignment platform/chrome: wilco_ec: drop owner assignment net: fjes: drop owner assignment Input: atlas - drop owner assignment ACPI: store owner from modules with acpi_bus_register_driver() ACPI: bus: Indicate support for IRQ ResourceSource thru _OSC ...
7 daysMerge tag 'kcsan.2024.05.10a' of ↵Linus Torvalds3-0/+34
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull kcsan update from Paul McKenney: "Introduce __data_racy type qualifier This adds a __data_racy type qualifier that enables kernel developers to inform KCSAN that a given variable is a shared variable without needing to mark each and every access. This allows pre-KCSAN code to be correctly (if approximately) instrumented with very little effort, and also provides people reading the code a clear indication that the variable is in fact shared. In addition, it permits incremental transition to per-access KCSAN marking, so that (for example) a given subsystem can be transitioned one variable at a time, while avoiding large numbers of KCSAN warnings during this transition" * tag 'kcsan.2024.05.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: kcsan, compiler_types: Introduce __data_racy type qualifier
7 daysMerge tag 'lkmm.2024.05.10a' of ↵Linus Torvalds6-2/+176
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull LKMM documentation updates from Paul McKenney: "This upgrades LKMM documentation, perhaps most notably adding a number of litmus tests illustrating cmpxchg() ordering properties. TL;DR: Failing cmpxchg() operations provide no ordering" * tag 'lkmm.2024.05.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: Documentation/litmus-tests: Make cmpxchg() tests safe for klitmus Documentation/atomic_t: Emphasize that failed atomic operations give no ordering Documentation/litmus-tests: Demonstrate unordered failing cmpxchg Documentation/litmus-tests: Add locking tests to README
7 daysMerge tag 'cmpxchg.2024.05.11a' of ↵Linus Torvalds11-83/+133
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull cmpxchg updates from Paul McKenney: "Provide one-byte and two-byte cmpxchg() support on sparc32, parisc, and csky This provides native one-byte and two-byte cmpxchg() support for sparc32 and parisc, courtesy of Al Viro. This support is provided by the same hashed-array-of-locks technique used for the other atomic operations provided for these two platforms. There is also emulated one-byte cmpxchg() support for csky using a new cmpxchg_emu_u8() function that uses a four-byte cmpxchg() to emulate the one-byte variant. Similar patches for emulation of one-byte cmpxchg() for arc, sh, and xtensa have not yet received maintainer acks, so they are slated for the v6.11 merge window" * tag 'cmpxchg.2024.05.11a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: csky: Emulate one-byte cmpxchg lib: Add one-byte emulation function parisc: add u16 support to cmpxchg() parisc: add missing export of __cmpxchg_u8() parisc: unify implementations of __cmpxchg_u{8,32,64} parisc: __cmpxchg_u32(): lift conversion into the callers sparc32: add __cmpxchg_u{8,16}() and teach __cmpxchg() to handle those sizes sparc32: unify __cmpxchg_u{32,64} sparc32: make the first argument of __cmpxchg_u64() volatile u64 * sparc32: make __cmpxchg_u32() return u32
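The idea of emulating a one-byte cmpxchg() with a four-byte one can be sketched in user space as below. This is not the kernel's cmpxchg_emu_u8(): the function name is hypothetical, and the GCC/Clang __atomic builtins stand in for the kernel's own primitives. The byte must live inside a naturally aligned, accessible 32-bit word.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: if *p == old, atomically store new_val into *p.
 * Returns the value that was found in *p (equal to old on success).
 */
uint8_t cmpxchg_u8_sketch(volatile uint8_t *p, uint8_t old, uint8_t new_val)
{
    uintptr_t addr = (uintptr_t)p;
    /* Aligned 32-bit word that contains the target byte. */
    volatile uint32_t *word = (volatile uint32_t *)(addr & ~(uintptr_t)3);
    unsigned int byte = (unsigned int)(addr & 3);
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    unsigned int shift = byte * 8;
#else
    unsigned int shift = (3 - byte) * 8;
#endif
    uint32_t mask = (uint32_t)0xff << shift;
    uint32_t cur = __atomic_load_n(word, __ATOMIC_RELAXED);

    for (;;) {
        uint8_t found = (uint8_t)((cur & mask) >> shift);
        uint32_t want;

        /* The byte differs from old: fail without writing anything. */
        if (found != old)
            return found;

        /* Same word with only the target byte replaced. */
        want = (cur & ~mask) | ((uint32_t)new_val << shift);

        /* On failure cur is refreshed with the current word; retry. */
        if (__atomic_compare_exchange_n(word, &cur, want, 0,
                                        __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
            return old;
    }
}
```

The retry loop is needed because the four-byte compare-and-swap can fail when a neighbouring byte in the same word changes, even though the target byte still holds the expected value.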
7 daysselftests/cgroup: Drop define _GNU_SOURCEEdward Liaw7-15/+0
_GNU_SOURCE is provided by lib.mk, so it should be dropped to prevent redefinition warnings. Signed-off-by: Edward Liaw <edliaw@google.com> Signed-off-by: Tejun Heo <tj@kernel.org>
7 daysdocs: cgroup-v1: Update page cache removal functionsIllia Ostapyshyn1-1/+1
Commit 452e9e6992fe ("filemap: Add filemap_remove_folio and __filemap_remove_folio") reimplemented __delete_from_page_cache() as __filemap_remove_folio() and delete_from_page_cache() as filemap_remove_folio(). The compatibility wrappers were finally removed in ece62684dcfb ("hugetlbfs: convert hugetlb_delete_from_page_cache() to use folios") and 6ffcd825e7d0 ("mm: Remove __delete_from_page_cache()"). Update the remaining references to dead functions in the memcg implementation memo. Signed-off-by: Illia Ostapyshyn <illia@yshyn.com> Signed-off-by: Tejun Heo <tj@kernel.org>
7 daysMerge tag 'rcu.next.v6.10' of https://github.com/urezki/linuxLinus Torvalds29-137/+663
Pull RCU updates from Uladzislau Rezki: - Fix a lockdep complaint for lazy-preemptible kernels, remove redundant BH disable for TINY_RCU, remove redundant READ_ONCE() in tree.c, fix a false-positive KCSAN splat and fix a buffer overflow in print_cpu_stall_info(). - Misc updates related to bpf, tracing and update the MAINTAINERS file. - An improvement of a normal synchronize_rcu() call in terms of latency. It maintains a separate track for sync. users only. This approach bypasses per-cpu nocb-lists thus sync-users do not depend on nocb-list length and how fast regular callbacks are processed. - RCU tasks: switch tasks RCU grace periods to sleep at TASK_IDLE priority, fix some comments, add a diagnostic warning to exit_tasks_rcu_start() and fix a buffer overflow in show_rcu_tasks_trace_gp_kthread(). - RCU torture: Increase memory to guest OS, fix Tasks Rude RCU testing, some updates for TREE09, dump mode information to debug GP kthread state, remove redundant READ_ONCE(), fix some comments about RCU_TORTURE_PIPE_LEN and pipe_count, remove some redundant pointer initialization, fix a hung-task splat when the rcutorture tests start to exit, fix invalid context warning, add '--do-kvfree' parameter to torture test and use the slow register/unregister callbacks only for the rcutype test. 
* tag 'rcu.next.v6.10' of https://github.com/urezki/linux: (48 commits) rcutorture: Use rcu_gp_slow_register/unregister() only for rcutype test torture: Scale --do-kvfree test time rcutorture: Fix invalid context warning when enable srcu barrier testing rcutorture: Make stall-tasks directly exit when rcutorture tests end rcutorture: Removing redundant function pointer initialization rcutorture: Make rcutorture support print rcu-tasks gp state rcutorture: Use the gp_kthread_dbg operation specified by cur_ops rcutorture: Re-use value stored to ->rtort_pipe_count instead of re-reading rcutorture: Fix rcu_torture_one_read() pipe_count overflow comment rcutorture: Remove extraneous rcu_torture_pipe_update_one() READ_ONCE() rcu: Allocate WQ with WQ_MEM_RECLAIM bit set rcu: Support direct wake-up of synchronize_rcu() users rcu: Add a trace event for synchronize_rcu_normal() rcu: Reduce synchronize_rcu() latency rcu: Fix buffer overflow in print_cpu_stall_info() rcu: Mollify sparse with RCU guard rcu-tasks: Fix show_rcu_tasks_trace_gp_kthread buffer overflow rcu-tasks: Fix the comments for tasks_rcu_exit_srcu_stall_timer rcu-tasks: Replace exit_tasks_rcu_start() initialization with WARN_ON_ONCE() rcu: Remove redundant CONFIG_PROVE_RCU #if condition ...
7 daysselftests/user_events: Add non-spacing separator checkBeau Belgrave1-0/+8
The ABI documentation indicates that field separators do not need a space between them, only a ';'. When no spacing is used, the register must work. Any subsequent register, with or without spaces, must match and not return -EADDRINUSE. Add a non-spacing separator case to our self-test register case to ensure it works going forward. Link: https://lore.kernel.org/linux-trace-kernel/20240423162338.292-3-beaub@linux.microsoft.com Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
7 daystracing/user_events: Fix non-spaced field matchingBeau Belgrave1-1/+75
When the ABI was updated to prevent same name w/different args, it missed an important corner case when fields don't end with a space. Typically, space is used for fields to help separate them, like "u8 field1; u8 field2". If no spaces are used, like "u8 field1;u8 field2", then the parsing works for the first time. However, the match check fails on a subsequent register, leading to confusion. This is because the match check uses argv_split() and assumes that all fields will be split upon the space. When spaces are used, we get back { "u8", "field1;" }, without spaces we get back { "u8", "field1;u8" }. This causes a mismatch, and the user program gets back -EADDRINUSE. Add a method to detect this case before calling argv_split(). If found force a space after the field separator character ';'. This ensures all cases work properly for matching. With this fix, the following are all treated as matching: u8 field1;u8 field2 u8 field1; u8 field2 u8 field1;\tu8 field2 u8 field1;\nu8 field2 Link: https://lore.kernel.org/linux-trace-kernel/20240423162338.292-2-beaub@linux.microsoft.com Fixes: ba470eebc2f6 ("tracing/user_events: Prevent same name but different args event") Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
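The normalization step described in this commit message (force a space after ';' before splitting on whitespace) can be sketched in user space as below. This is not the kernel's implementation: the helper name is hypothetical and malloc() stands in for kernel allocation.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a newly allocated copy of `fields` in which
 * every ';' that is directly followed by a non-whitespace character gets
 * one space inserted after it. The caller frees the result.
 * Returns NULL on allocation failure.
 */
char *insert_space_after_semicolons(const char *fields)
{
    size_t len = strlen(fields);
    /* Worst case: every character is a ';' that needs one extra space. */
    char *out = malloc(2 * len + 1);
    size_t i, j = 0;

    if (!out)
        return NULL;

    for (i = 0; i < len; i++) {
        out[j++] = fields[i];
        /* fields[i + 1] is at worst the terminating NUL, so in bounds. */
        if (fields[i] == ';' && fields[i + 1] != '\0' &&
            !isspace((unsigned char)fields[i + 1]))
            out[j++] = ' ';
    }
    out[j] = '\0';
    return out;
}
```

After this step a whitespace split of "u8 field1;u8 field2" yields the same tokens as "u8 field1; u8 field2", so the later match check compares like with like.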
7 daysMerge tag 'asm-generic-alpha' of ↵Linus Torvalds72-4541/+166
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull alpha updates from Arnd Bergmann:
"I had investigated dropping support for alpha EV5 and earlier a while ago after noticing that this is the only supported CPU family in the kernel without native byte access and that Debian has already dropped support for this generation last year [1] in order to improve performance for the newer machines.

This topic came up again when Paul McKenney noticed that parts of the RCU code already rely on byte access and do not work on alpha EV5 reliably, so we decided on using my series to avoid the problem entirely.

Al Viro did another series for alpha to address all the known build issues. I rebased his patches without any further changes and included it as a baseline for my work here to avoid conflicts and allow backporting the fixes to stable kernels for the now removed hardware support as well"

[ I dearly loved alpha back in the days, but the lack of byte and word operations was a horrible mistake and made everything worse - including very much the crazy IO contortions that resulted from it.

It certainly wasn't the only mistake in the architecture, but it's the first-order issue.

So while it's a bit sad to see the support for my first alpha go away, if you want to run museum hardware, maybe you should use museum kernels.. - Linus ]

* tag 'asm-generic-alpha' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  alpha: drop pre-EV56 support
  alpha: cabriolet: remove EV5 CPU support
  alpha: remove LCA and APECS based machines
  alpha: sable: remove early machine support
  alpha: remove DECpc AXP150 (Jensen) support
  alpha: trim the unused stuff from asm-offsets.c
  alpha: jensen, t2 - make __EXTERN_INLINE same as for the rest
  alpha: core_lca: take the unused functions out
  alpha: missing includes
  alpha: sys_sio: fix misspelled ifdefs
  alpha: don't make functions public without a reason
  alpha: add clone3() support
  alpha: fix modversions for strcpy() et.al.
  alpha: sort scr_mem{cpy,move}w() out
7 daysMerge tag 'soc-defconfig-6.10' of ↵Linus Torvalds4-1/+27
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC defconfig updates from Arnd Bergmann: "Most of the changes enable additional device driver modules and arm64 platforms. In addition, the usb onboard-device support and ext4 security labels are turned on" * tag 'soc-defconfig-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (23 commits) arm64: defconfig: enable Airoha platform arm64: defconfig: enable Khadas TS050 panel as module arm64: defconfig: select INTERCONNECT_QCOM_SM6115 as built-in arm64: defconfig: Enable Tegra Security Engine arm64: defconfig: enable REGULATOR_QCOM_USB_VBUS ARM: imx_v6_v7_defconfig: Update ONBOARD_USB_HUB to ONBOAD_USB_DEV arm64: defconfig: enable ext4 security labels arm64: defconfig: qcom: enable X1E80100 sound card ARM: configs: sunxi: Enable DRM_DW_HDMI arm64: defconfig: build snd_bcm2835 as module arm64: defconfig: enable Rockchip Samsung USBDP PHY ARM: shmobile: defconfig: Refresh for v6.9-rc1 arm64: defconfig: build ath12k as a module arm64: defconfig: Enable sc7280 display and gpu clock controllers ARM: imx_v6_v7_defconfig: Select CONFIG_USB_ONBOARD_HUB arm64: defconfig: Enable DRM_IMX8MP_DW_HDMI_BRIDGE as module arm64: defconfig: support Mali CSF-based GPUs arm64: defconfig: enable Rockchip RK3308 internal audio codec driver arm64: defconfig: Enable R9A09G057 SoC arm64: defconfig: Enable Renesas DA9062 PMIC ...
7 daysMerge tag 'soc-arm-6.10' of ↵Linus Torvalds7-38/+124
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC code changes from Arnd Bergmann: "The code changes are fairly minimal, there is a bit of conversion of the old orion5x platform to modern gpio descriptors, the Kconfig entry for the added EN7581 platform and a sysfs change for the i.MX PMU device" * tag 'soc-arm-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: arm64: add Airoha EN7581 platform ARM: orion5x: Convert TS409 board to GPIO descriptors for LEDs ARM: orion5x: Convert Net2big board to GPIO descriptors for LEDs ARM: orion5x: Convert MV2120 board to GPIO descriptors for LEDs ARM: orion5x: Convert DNS323 board to GPIO descriptors for LEDs ARM: orion5x: Convert D2Net board to GPIO descriptors for LEDs ARM: imx: Assign parents for mmdc event_source devices
7 daysMerge tag 'soc-drivers-6.10' of ↵Linus Torvalds125-819/+5116
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull SoC driver updates from Arnd Bergmann:
"As usual, these are updates for drivers that are specific to certain SoCs or firmware running on them.

Notable updates include
- The new STMicroelectronics STM32 "firewall" bus driver that is used to provide a barrier between different parts of an SoC
- Lots of updates for the Qualcomm platform drivers, in particular SCM, which gets a rewrite of its initialization code
- Firmware driver updates for Arm FF-A notification interrupts and indirect messaging, SCMI firmware support for pin control and vendor specific interfaces, and TEE firmware interface changes across multiple TEE drivers
- A larger cleanup of the Mediatek CMDQ driver and some related bits
- Kconfig changes for riscv drivers to prepare for adding Canaan K230 support
- Multiple minor updates for the TI sysc bus driver, memory controllers, hisilicon hccs and more"

* tag 'soc-drivers-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (103 commits)
  firmware: qcom: uefisecapp: Allow on sc8180x Primus and Flex 5G
  soc: qcom: pmic_glink: Make client-lock non-sleeping
  dt-bindings: soc: qcom,wcnss: fix bluetooth address example
  soc/tegra: pmc: Add EQOS wake event for Tegra194 and Tegra234
  bus: stm32_firewall: fix off by one in stm32_firewall_get_firewall()
  bus: etzpc: introduce ETZPC firewall controller driver
  firmware: arm_ffa: Avoid queuing work when running on the worker queue
  bus: ti-sysc: Drop legacy idle quirk handling
  bus: ti-sysc: Drop legacy quirk handling for smartreflex
  bus: ti-sysc: Drop legacy quirk handling for uarts
  bus: ti-sysc: Add a description and copyrights
  bus: ti-sysc: Move check for no-reset-on-init
  soc: hisilicon: kunpeng_hccs: replace MAILBOX dependency with PCC
  soc: hisilicon: kunpeng_hccs: Add the check for obtaining complete port attribute
  firmware: arm_ffa: Fix memory corruption in ffa_msg_send2()
  bus: rifsc: introduce RIFSC firewall controller driver
  of: property: fw_devlink: Add support for "access-controller"
  soc: mediatek: mtk-socinfo: Correct the marketing name for MT8188GV
  soc: mediatek: mtk-socinfo: Add entry for MT8395AV/ZA Genio 1200
  soc: mediatek: mtk-mutex: Add support for MT8188 VPPSYS
  ...
7 daysMerge tag 'soc-dt-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/socLinus Torvalds565-5083/+23890
Pull SoC devicetree updates from Arnd Bergmann:
"The updates this time are a bit smaller than most times, mainly because it is not totally dominated by new Qualcomm hardware support. Instead, we have larger than average updates for Rockchips, NXP, Allwinner and TI.

The only two new SoCs this time are both from NXP and are minor variants of already supported ones.

The updates for aspeed, amlogic and mediatek came a little late, so I'm saving those for part 2 in a few days if everything turns out fine.

New machines this time contain:
- two Broadcom SoC based wireless routers from Asus
- Five allwinner based consumer devices for gaming, set-top-box and ebook reader applications
- Three older phones based on Qualcomm chips, plus the more recent Sony Xperia 1 V
- 14 industrial and embedded boards based on NXP i.MX6, i.MX8, layerscape and s32g3 SoCs
- six rockchips boards including another handheld game console and a few single-board computers

On top of these, we have the usual cleanups for dtc warnings and updates to add more features to already merged machines"

* tag 'soc-dt-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (612 commits)
  arm64: dts: marvell: espressobin-ultra: fix Ethernet Switch unit address
  arm64: dts: marvell: turris-mox: drop unneeded flash address/size-cells
  arm64: dts: marvell: eDPU: drop redundant address/size-cells
  arm64: dts: qcom: pm6150: correct USB VBUS regulator compatible
  arm64: dts: rockchip: add rk3588 pcie and php IOMMUs
  arm64: dts: rockchip: enable onboard spi flash for rock-3a
  arm64: dts: rockchip: add USB-C support to rk3588s-orangepi-5
  arm64: dts: rockchip: Enable GPU on Orange Pi 5
  arm64: dts: rockchip: enable GPU on khadas-edge2
  arm64: dts: rockchip: Add USB3 on Edgeble NCM6A-IO board
  arm64: dts: rockchip: Support poweroff on Edgeble Neural Compute Module
  arm64: dts: rockchip: Add Radxa ROCK 3C
  dt-bindings: arm: rockchip: add Radxa ROCK 3C
  arm64: dts: exynos: gs101: specify empty clocks for remaining pinctrl
  arm64: dts: exynos: gs101: specify bus clock for pinctrl_hsi2
  arm64: dts: exynos: gs101: specify bus clock for pinctrl_peric[01]
  arm64: dts: exynos: gs101: specify bus clock for pinctrl (far) alive
  arm64: dts: Add/fix /memory node unit-addresses
  arm64: dts: qcom: qcs404: fix bluetooth device address
  arm64: dts: qcom: sc8280xp-x13s: enable USB MP and fingerprint reader
  ...
7 daysMerge tag 's390-6.10-1' of ↵Linus Torvalds68-703/+1453
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Alexander Gordeev:
- Store AP Query Configuration Information in a static buffer
- Rework the AP initialization and add missing cleanups to the error path
- Swap IRQ and AP bus/device registration to avoid race conditions
- Export prot_virt_guest symbol
- Introduce AP configuration changes notifier interface to facilitate modularization of the AP bus
- Add CONFIG_AP kernel configuration option to allow modularization of the AP bus
- Rework CONFIG_ZCRYPT_DEBUG kernel configuration option description and dependency and rename it to CONFIG_AP_DEBUG
- Convert sprintf() and snprintf() to sysfs_emit() in CIO code
- Adjust indentation of RELOCS command build step
- Make crypto performance counters upward compatible
- Convert make_page_secure() and gmap_make_secure() to use folio
- Rework channel-utilization-block (CUB) handling in preparation of introducing additional CUBs
- Use attribute groups to simplify registration, removal and extension of measurement-related channel-path sysfs attributes
- Add a per-channel-path binary "ext_measurement" sysfs attribute that provides access to extended channel-path measurement data
- Export measurement data for all channel-measurement-groups (CMG), not only for specific ones. This enables support of new CMG data formats in userspace without the need for kernel changes
- Add a per-channel-path sysfs attribute "speed_bps" that provides the operating speed in bits per second or 0 if the operating speed is not available
- The CIO tracepoint subchannel-type field "st" is incorrectly set to the value of subchannel-enabled SCHIB "ena" field. Fix that
- Do not forcefully limit vmemmap starting address to MAX_PHYSMEM_BITS
- Consider the maximum physical address available to a DCSS segment (512GB) when memory layout is set up
- Simplify the virtual memory layout setup by reducing the size of identity mapping vs vmemmap overlap
- Swap vmalloc and Lowcore/Real Memory Copy areas in virtual memory. This allows placing the kernel image next to kernel modules
- Move everything KASLR related from <asm/setup.h> to <asm/page.h>
- Put virtual memory layout information into a structure to improve code generation
- Currently __kaslr_offset is the kernel offset in both physical and virtual memory spaces. Uncouple these offsets to allow uncoupling of the address spaces
- Currently the identity mapping base address is implicit and is always set to zero. Make it explicit by putting it into the __identity_base persistent boot variable and use it in proper context
- Introduce .amode31 section start and end macros AMODE31_START and AMODE31_END
- Introduce OS_INFO entries that do not reference any data in memory, but rather provide only values
- Store virtual memory layout in OS_INFO. It is read out by makedumpfile, crash and other tools
- Store virtual memory layout in VMCORE_INFO. It is read out by crash and other tools when /proc/kcore device is used
- Create additional PT_LOAD ELF program header that covers kernel image only, so that vmcore tools could locate kernel text and data when virtual and physical memory spaces are uncoupled
- Uncouple physical and virtual address spaces
- Map kernel at fixed location when KASLR mode is disabled. The location is defined by CONFIG_KERNEL_IMAGE_BASE kernel configuration value.
- Rework deployment of kernel image for both compressed and uncompressed variants as defined by CONFIG_KERNEL_UNCOMPRESSED kernel configuration value
- Move .vmlinux.relocs section in front of the compressed kernel. The interim section rescue step is avoided as a result
- Correct modules thunk offset calculation when branch target is more than 2GB away
- Kernel modules contain their own set of expoline thunks. Now that the kernel modules area is less than 4GB away from kernel expoline thunks, make modules use kernel expolines. Also make EXPOLINE_EXTERN the default if the compiler supports it
- userfaultfd can insert shared zeropages into processes running VMs, but that is not allowed for s390. Fall back to allocating a fresh zeroed anonymous folio and inserting that instead
- Re-enable shared zeropages for non-PV and non-skeys KVM guests
- Rename hex2bitmap() to ap_hex2bitmap() and export it for external use
- Add ap_config sysfs attribute to provide the means for setting or displaying adapters, domains and control domains assigned to a vfio-ap mediated device in a single operation
- Make vfio_ap_mdev_link_queue() ignore duplicate link requests
- Add write support to ap_config sysfs attribute to allow atomic update of a vfio-ap mediated device state
- Document ap_config sysfs attribute
- Function os_info_old_init() is expected to be called only from a regular kdump kernel. Enable it to be called from a stand-alone dump kernel
- Address gcc -Warray-bounds warning and fix array size in struct os_info
- s390 does not support SMBIOS, so drop unneeded CONFIG_DMI checks
- Use unwinder instead of __builtin_return_address() with ftrace to prevent returning of undefined values
- Sections .hash and .gnu.hash are only created when CONFIG_PIE_BUILD kernel is enabled. Drop these for the case CONFIG_PIE_BUILD is disabled
- Compile kernel with -fPIC and link with -no-pie to allow the kpatch feature to always succeed and drop the whole CONFIG_PIE_BUILD option-enabled code
- Add missing virt_to_phys() converter for VSIE facility and crypto control blocks

* tag 's390-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (54 commits)
  Revert "s390: Relocate vmlinux ELF data to virtual address space"
  KVM: s390: vsie: Use virt_to_phys for crypto control block
  s390: Relocate vmlinux ELF data to virtual address space
  s390: Compile kernel with -fPIC and link with -no-pie
  s390: vmlinux.lds.S: Drop .hash and .gnu.hash for !CONFIG_PIE_BUILD
  s390/ftrace: Use unwinder instead of __builtin_return_address()
  s390/pci: Drop unneeded reference to CONFIG_DMI
  s390/os_info: Fix array size in struct os_info
  s390/os_info: Initialize old os_info in standalone dump kernel
  docs: Update s390 vfio-ap doc for ap_config sysfs attribute
  s390/vfio-ap: Add write support to sysfs attr ap_config
  s390/vfio-ap: Ignore duplicate link requests in vfio_ap_mdev_link_queue
  s390/vfio-ap: Add sysfs attr, ap_config, to export mdev state
  s390/ap: Externalize AP bus specific bitmap reading function
  s390/mm: Re-enable the shared zeropage for !PV and !skeys KVM guests
  mm/userfaultfd: Do not place zeropages when zeropages are disallowed
  s390/expoline: Make modules use kernel expolines
  s390/nospec: Correct modules thunk offset calculation
  s390/boot: Do not rescue .vmlinux.relocs section
  s390/boot: Rework deployment of the kernel image
  ...
7 daysof: property: Add fw_devlink support for interrupt-map propertyAnup Patel1-0/+52
Some PCI host controllers (such as the generic PCI host controller) use the "interrupt-map" DT property to describe the mapping between PCI endpoints and PCI interrupt pins. This is the only case where the interrupts are not described in DT. Currently, no fw_devlink is created based on the "interrupt-map" DT property, so the interrupt controller is not guaranteed to be probed before the PCI host controller. This affects every platform where both the PCI host controller and the interrupt controllers are probed as regular platform devices. This patch creates a fw_devlink between the consumer (PCI host controller) and the supplier (interrupt controller) based on the "interrupt-map" DT property. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20240509120820.1430587-1-apatel@ventanamicro.com Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
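The ordering effect of such a consumer/supplier link can be illustrated with a deliberately simplified model. This is a sketch only, not how the driver core or fw_devlink is implemented: `probe_order`, the device names and the link table are all hypothetical, and the model simply defers a device until every supplier it links to has been probed.

```python
def probe_order(devices, links):
    """Return the order in which devices get probed.

    devices: device names in registration order (hypothetical names).
    links:   dict mapping a consumer to the set of suppliers it depends on.
    A device is deferred until all of its suppliers have been probed.
    """
    probed = []
    pending = list(devices)
    while pending:
        progressed = False
        for dev in list(pending):          # iterate over a copy while removing
            if links.get(dev, set()) <= set(probed):
                probed.append(dev)
                pending.remove(dev)
                progressed = True
        if not progressed:
            raise RuntimeError("dependency cycle between devices")
    return probed


devices = ["pci-host", "intc"]

# No link: devices are probed in registration order.
print(probe_order(devices, {}))

# Link derived from "interrupt-map": the interrupt controller goes first.
print(probe_order(devices, {"pci-host": {"intc"}}))
```

In this model the link is the only thing that changes the outcome, which mirrors the commit's point that the ordering is not guaranteed unless the dependency is recorded.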
7 daysdt-bindings: display: panel: constrain 'reg' in DSI panelsKrzysztof Kozlowski40-41/+120
DSI-attached devices could respond to more than one virtual channel number, thus their bindings are supposed to constrain the 'reg' property to match hardware. Add the missing 'reg' constraint for DSI-attached display panels, based on DTS sources in the Linux kernel (assume all devices take only one channel number). Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20240509-dt-bindings-dsi-panel-reg-v1-3-8b2443705be0@linaro.org Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
7 daysdt-bindings: display: panel: constrain 'reg' in SPI panelsKrzysztof Kozlowski22-16/+60
SPI-attached devices could have more than one chip-select, thus their bindings are supposed to constrain the 'reg' property to match hardware. Add the missing 'reg' constraint for SPI-attached display panels. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20240509-dt-bindings-dsi-panel-reg-v1-2-8b2443705be0@linaro.org Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
7 daysdt-bindings: display: samsung,ams495qa01: add missing SPI properties refKrzysztof Kozlowski1-0/+1
Samsung AMS495QA01 panel is a SPI device, so it should reference spi-peripheral-props.yaml schema to allow and validate the SPI device properties. Fixes: 92be07c65b22 ("dt-bindings: display: panel: Add Samsung AMS495QA01") Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20240509-dt-bindings-dsi-panel-reg-v1-1-8b2443705be0@linaro.org Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
7 daysMerge branch 'i2c/for-current' into i2c/for-mergewindowWolfram Sang1-4/+15
I missed the last chance to send this in for 6.9, so it now goes into the 6.10 queue
7 daysi2c: mux: Remove class argument from i2c_mux_add_adapter()Heiner Kallweit23-49/+23
99a741aa7a2d ("i2c: mux: gpio: remove support for class-based device instantiation") removed the last call to i2c_mux_add_adapter() with a non-null class argument. Therefore the class argument can be removed. Note: Class-based device instantiation is a legacy mechanism which shouldn't be used in new code, so we can rule out that this argument may be needed again in the future. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Peter Rosin <peda@axentia.se> Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
7 daysdmaengine: idxd: add a write() method for applications to submit workNikhil Rao2-2/+90
After the patch to restrict the use of mmap() to CAP_SYS_RAWIO for the currently existing devices, most applications can no longer make use of the accelerators as in production "you don't run things as root". To keep the DSA and IAA accelerators usable, hook up a write() method so that applications can still submit work. In the write method, sufficient input validation is performed to avoid the security issue that required the mmap CAP_SYS_RAWIO check. One complication is that the DSA device allows for indirect ("batched") descriptors. There is no reasonable way to do the input validation on these indirect descriptors so the write() method will not allow these to be submitted to the hardware on affected hardware, and the sysfs enumeration of support for the opcode is also removed. Early performance data shows that the performance delta for most common cases is within the noise. Signed-off-by: Nikhil Rao <nikhil.rao@intel.com> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
7 daysdmaengine: idxd: add a new security check to deal with a hardware erratumArjan van de Ven3-0/+19
On Sapphire Rapids and related platforms, the DSA and IAA devices have an erratum that causes direct access (for example, by using the ENQCMD or MOVDIR64B instructions) from untrusted applications to be a security problem. To solve this, add a flag to the PCI device enumeration and device structures to indicate the presence/absence of this security exposure. In the mmap() method of the device, this flag is then used to enforce that the user has the CAP_SYS_RAWIO capability. In a future patch, a write() based method will be added that allows untrusted applications to submit work to the accelerator, where the kernel can do sanity checking on the user input to ensure secure operation of the accelerator. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
7 daysVFIO: Add the SPR_DSA and SPR_IAX devices to the denylistArjan van de Ven3-3/+4
Due to an erratum with the SPR_DSA and SPR_IAX devices, it is not secure to assign these devices to virtual machines. Add the PCI IDs of these devices to the VFIO denylist to ensure that this is handled appropriately by the VFIO subsystem. The SPR_DSA and SPR_IAX devices are on-SOC devices for the Sapphire Rapids (and related) family of products that perform data movement and compression. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
7 daysMerge tag 'i2c-host-6.10' of ↵Wolfram Sang55-733/+1295
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-mergewindow

Code cleanup:

A substantial code cleanup from Wolfram affects many drivers:
- Removed dev_err() in case of timeout during i2c transfers, as timeouts are not considered errors and should not be treated as such.
- For the same reason, 'timeout' variables have been renamed to 'time_left'.

Other cleanups:
- The viperboard driver now omits the "owner = THIS_MODULE" assignment.
- Finally, we have eliminated the last remnants of I2C_CLASS_SPD: support for class-based devices has been completely removed from the mux-gpio driver.
- In the ocore devices, a more standard use of ioport_map() for 8-bit I/O read/write operations has been implemented.
- The mpc driver will be among the first i2c drivers and one of the first in the kernel to use the __free auto cleanup routine.
- The designware driver now uses MODULE_DEVICE_TABLE() instead of MODULE_ALIAS() for better consistency with the ID table.
- Added prefixes to the octeon register macros.
- Fixed some checkpatch errors in the newly created i2c-viai2c-common.c file.

Code refactoring:
- The riic driver has refactored read/write operations to more flexibly support new platforms, laying the foundation for new SoC peculiarities.
- In the i801 driver, a notifier callback has been created for muxed child segments.
- The lpi2c driver now sets a clock rate during probe instead of continuously calling clk_get_rate().
- Improvements in the clock divisor logic to accommodate other clock frequencies.
- Combined some common functionalities during initialization for the wmt driver and separated others that can be independently used by different drivers. Now, all the common functionalities are grouped in the i2c-viai2c-common.c file.
- Improved the clock stretching mechanism in the newly created i2c-viai2c-common.c file, inherited from the previous i2c-wmt.c.

Features added:
- The octeon driver now includes watchdog timeout handling.
- Added high-speed support for the octeon driver.

Added support for:
- R9A09G057 SoC in the riic driver.
- Rapids-D I2C controller in the designware driver.
- Cadence driver now also supports RISC-V architectures.
- Added support to the WMT device as a separate driver using the newly created i2c-viai2c-common.c functionalities.
- Added support for the Zhaoxin I2C controller.

Some improvements in the bindings:
- The pnx driver is converted to dtschema.
- Added documentation for the Qualcomm SC8280XP.
7 daysMerge tag 'i2c-host-fixes-6.8-rc8' of ↵Wolfram Sang606-3102/+6006
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-mergewindow This tag includes two fixes. The first one, in the Cadence driver seen in Qemu, prevents unintentional FIFO clearing at the beginning of a transaction. The second fix, in the SynQuacer, ensures proper error handling during clock get, prepare, and enable operations by using the devm_clk_get_enabled() helper.
7 dayssh: setup: Add missing forward declaration for sh_fdt_init()Geert Uytterhoeven1-0/+1
arch/sh/kernel/setup.c:244:12: warning: no previous prototype for 'sh_fdt_init' [-Wmissing-prototypes] Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/7e3ea09e706a075bceb6bfd172990676e79be1c2.1715606232.git.geert+renesas@glider.be Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
7 dayssh: smp: Protect setup_profiling_timer() by CONFIG_PROFILINGGeert Uytterhoeven1-0/+2
arch/sh/kernel/smp.c:326:5: warning: no previous prototype for 'setup_profiling_timer' [-Wmissing-prototypes] The function is unconditionally defined in smp.c, but conditionally declared in <linux/profile.h>. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/effa5eecbd2389c6661974e91bb834db210989ea.1715606232.git.geert+renesas@glider.be Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
7 dayssh: of-generic: Add missing #include <asm/clock.h>Geert Uytterhoeven1-0/+2
arch/sh/boards/of-generic.c:146:20: warning: no previous prototype for 'arch_init_clk_ops' [-Wmissing-prototypes] Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/942621553ed82e3331e2e91485b643892d2d08bc.1715606232.git.geert+renesas@glider.be Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
7 daysMerge branch 'topic/kdump-hotplug' into nextMichael Ellerman19-359/+713
Merge our topic branch containing kdump hotplug changes, more detail from the original cover letter:

Commit 247262756121 ("crash: add generic infrastructure for crash hotplug support") added a generic infrastructure that allows architectures to selectively update the kdump image component during CPU or memory add/remove events within the kernel itself.

This patch series adds a crash hotplug handler for PowerPC and enables support to update the kdump image on CPU/Memory add/remove events.

Among the 6 patches in this series, the first two patches make changes to the generic crash hotplug handler to assist PowerPC in adding support for this feature. The last four patches add support for this feature.

The following section outlines the problem addressed by this patch series, along with the current solution, its shortcomings, and the proposed resolution.

Problem:
========
Due to CPU/Memory hotplug or online/offline events, the elfcorehdr (which describes the CPUs and memory of the crashed kernel) and FDT (Flattened Device Tree) of the kdump image become outdated. Consequently, attempting dump collection with an outdated elfcorehdr or FDT can lead to failed or inaccurate dump collection.

Going forward, CPU/Memory hotplug or online/offline events are referred to as CPU/Memory add/remove events.

Existing solution and its shortcoming:
======================================
The current solution to address the above issue involves monitoring the CPU/memory add/remove events in userspace using udev rules and, whenever there are changes in CPU and memory resources, the entire kdump image is loaded again. The kdump image includes kernel, initrd, elfcorehdr, FDT, purgatory. Given that only elfcorehdr and FDT get outdated due to CPU/Memory add/remove events, reloading the entire kdump image is inefficient. More importantly, kdump remains inactive for a substantial amount of time until the kdump reload completes.

Proposed solution:
==================
Instead of initiating a full kdump image reload from userspace on CPU/Memory hotplug and online/offline events, the proposed solution aims to update only the necessary kdump image component within the kernel itself.
7 daysMerge branch 'topic/ppc-kvm' into nextMichael Ellerman6-14/+10
Merge our KVM topic branch.
7 daysMerge branches 'arm/renesas', 'arm/smmu', 'x86/amd', 'core' and 'x86/vt-d' ↵Joerg Roedel72-1598/+3536
into next
7 daysALSA: hda/realtek: Drop doubly quirk entry for 103c:8a2eTakashi Iwai1-1/+0
There are two quirk entries for SSID 103c:8a2e. Drop the latter one, which isn't applied anyway. As both point to the same quirk action, there is no actual behavior change. Fixes: aa8e3ef4fe53 ("ALSA: hda/realtek: Add quirks for various HP ENVY models") Link: https://lore.kernel.org/r/20240513064010.17546-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
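Why the later duplicate is never applied can be shown with a small model of a first-match table lookup. This is a sketch under the assumption that the quirk table is scanned from the top and the first matching SSID wins; `lookup_quirk`, the table contents other than 103c:8a2e, and the action names are hypothetical.

```python
def lookup_quirk(table, vendor, device):
    """Return the action of the first entry matching (vendor, device),
    or None when nothing matches (first match wins)."""
    for entry_vendor, entry_device, action in table:
        if (entry_vendor, entry_device) == (vendor, device):
            return action
    return None


# Hypothetical table with a duplicated SSID; action names are made up.
QUIRKS = [
    (0x103C, 0x8A2E, "FIXUP_A"),
    (0x103C, 0x8A30, "FIXUP_B"),
    (0x103C, 0x8A2E, "FIXUP_A"),   # duplicate: never reached by the scan
]

# The same table with the later duplicate dropped.
DEDUPED = QUIRKS[:2]

print(lookup_quirk(QUIRKS, 0x103C, 0x8A2E))
print(lookup_quirk(DEDUPED, 0x103C, 0x8A2E))
```

Both lookups return the same action, so removing the second entry only shrinks the table, which matches the commit's statement that there is no behavior change.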
7 daysALSA: hda/realtek - fixed headset Mic not showKailang Yang1-0/+22
When the ALC256 runs in SOF mode and the system boots with a headset plugged in, the Headset Mic is missing. When the headset is plugged in after boot, Headset Mic detection partially fails. Setting spec->en_3kpull_low = false solves both issues. Signed-off-by: Kailang Yang <kailang@realtek.com> Link: https://lore.kernel.org/r/c8b638590c5f45a6a5c6aeb20c31fd5b@realtek.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
7 daysASoC: SOF: amd: Fix build error with built-in configTakashi Iwai1-1/+1
Makefile in AMD ACP driver has a line substitution with "=" instead of "+="; this overrides the preexisting item, hence it broke the build after the recent change to replace *-objs with *-y. This patch corrects the line. Fixes: 1a74b21ce59f ("ASoC: SOF: amd: Add Probe functionality support for amd platforms.") Fixes: 9c2f5b6eb8b7 ("ASoC: SOF: Use *-y instead of *-objs in Makefile") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/r/20240510170305.03b67d9f@canb.auug.org.au Link: https://lore.kernel.org/r/20240510073656.23491-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
7 daysMerge tag 'asoc-v6.10' of ↵Takashi Iwai1365-16625/+28545
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Updates for v6.10

This is a very big update, in large part due to extensive work the Intel people have been doing in their drivers, though it's also been busy elsewhere. There's also a big overhaul of the DAPM documentation from Luca Ceresoli arising from the work he did putting together his recent ELC talk, and he also contributed a new tool for visualising the DAPM state.

- A new tool dapm-graph for visualising the DAPM state.
- Substantial fixes and clarifications for the DAPM documentation.
- Very large updates throughout the Intel audio drivers.
- Cleanups of accessors for driver data, module labelling, and for constification.
- Modernisation and cleanup work in the Mediatek drivers.
- Several fixes and features for the DaVinci I2S driver.
- New drivers for several AMD and Intel platforms, Nuvoton NAU8325, Rockchip RK3308 and Texas Instruments PCM6240.
7 dayssh: dreamcast: Fix GAPS PCI bridge addressingArtur Rojek2-1/+5
The G2-to-PCI bridge chip found in SEGA Dreamcast assumes P2 area relative addresses. Set the appropriate IOPORT base offset. Tested-by: Paul Cercueil <paul@crapouillou.net> Signed-off-by: Artur Rojek <contact@artur-rojek.eu> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/20240511191614.68561-2-contact@artur-rojek.eu Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
8 daysMAINTAINERS: Add Günther Noack as Landlock reviewerMickaël Salaün1-0/+1
Günther is a major contributor to Landlock, both on the kernel and user space sides, and he is already reviewing Landlock changes. Thanks! Cc: James Morris <jmorris@namei.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Serge E. Hallyn <serge@hallyn.com> Acked-by: Günther Noack <gnoack@google.com> Link: https://lore.kernel.org/r/20240425092126.975830-1-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
8 daysfs/ioctl: Add a comment to keep the logic in sync with LSM policiesGünther Noack1-0/+3
Landlock's IOCTL support needs to partially replicate the list of IOCTLs from do_vfs_ioctl(). The list of commands implemented in do_vfs_ioctl() should be kept in sync with Landlock's IOCTL policies. Suggested-by: Paul Moore <paul@paul-moore.com> Suggested-by: Mickaël Salaün <mic@digikod.net> Signed-off-by: Günther Noack <gnoack@google.com> Link: https://lore.kernel.org/r/20240419161122.2023765-12-gnoack@google.com Signed-off-by: Mickaël Salaün <mic@digikod.net>
8 daysMAINTAINERS: Notify Landlock maintainers about changes to fs/ioctl.cGünther Noack1-0/+1
Landlock needs to track changes to do_vfs_ioctl() when new IOCTL implementations are added to it. Suggested-by: Mickaël Salaün <mic@digikod.net> Signed-off-by: Günther Noack <gnoack@google.com> Link: https://lore.kernel.org/r/20240419161122.2023765-11-gnoack@google.com Signed-off-by: Mickaël Salaün <mic@digikod.net>
8 dayslandlock: Document IOCTL supportGünther Noack1-16/+62
In the paragraph above the fallback logic, use the shorter phrasing from the landlock(7) man page. Signed-off-by: Günther Noack <gnoack@google.com> Link: https://lore.kernel.org/r/20240419161122.2023765-10-gnoack@google.com [mic: Update date, and fix redundant "access"] Signed-off-by: Mickaël Salaün <mic@digikod.net>
8 dayssamples/landlock: Add support for LANDLOCK_ACCESS_FS_IOCTL_DEVGünther Noack1-3/+10
Add IOCTL support to the Landlock sample tool. The IOCTL right is grouped with the read-write rights in the sample tool, as some IOCTL requests provide features that mutate state. Signed-off-by: Günther Noack <gnoack@google.com> Link: https://lore.kernel.org/r/20240419161122.2023765-9-gnoack@google.com Signed-off-by: Mickaël Salaün <mic@digikod.net>
8 daysselftests/landlock: Exhaustive test for the IOCTL allow-listGünther Noack1-0/+114
This test checks all IOCTL commands implemented in do_vfs_ioctl(). Test coverage for security/landlock is 90.9% of 722 lines according to gcc/gcov-13. Suggested-by: Mickaël Salaün <mic@digikod.net> Signed-off-by: Günther Noack <gnoack@google.com> Link: https://lore.kernel.org/r/20240419161122.2023765-8-gnoack@google.com [mic: Add test coverage] Signed-off-by: Mickaël Salaün <mic@digikod.net>
8 daysselftests/landlock: Check IOCTL restrictions for named UNIX domain socketsGünther Noack1-0/+53
The LANDLOCK_ACCESS_FS_IOCTL_DEV right should have no effect on the use of named UNIX domain sockets. Suggested-by: Mickaël Salaün <mic@digikod.net> Signed-off-by: Günther Noack <gnoack@google.com> Link: https://lore.kernel.org/r/20240419161122.2023765-7-gnoack@google.com [mic: Add missing stddef.h for offsetof()] Signed-off-by: Mickaël Salaün <mic@digikod.net>
8 daysselftests/landlock: Test IOCTLs on named pipesGünther Noack1-0/+43
Named pipes should behave like pipes created with pipe(2), so we don't want to restrict IOCTLs on them. Suggested-by: Mickaël Salaün <mic@digikod.net> Signed-off-by: Günther Noack <gnoack@google.com> Link: https://lore.kernel.org/r/20240419161122.2023765-6-gnoack@google.com Signed-off-by: Mickaël Salaün <mic@digikod.net>
8 daysselftests/landlock: Test ioctl(2) and ftruncate(2) with open(O_PATH)Günther Noack1-0/+40
ioctl(2) and ftruncate(2) operations on files opened with O_PATH should always return EBADF, independent of the LANDLOCK_ACCESS_FS_TRUNCATE and LANDLOCK_ACCESS_FS_IOCTL_DEV access rights in that file hierarchy. Suggested-by: Mickaël Salaün <mic@digikod.net> Signed-off-by: Günther Noack <gnoack@google.com> Link: https://lore.kernel.org/r/20240419161122.2023765-5-gnoack@google.com Signed-off-by: Mickaël Salaün <mic@digikod.net>
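The behaviour this test relies on can be observed from user space without any Landlock policy in place. The following is a minimal sketch, assuming a Linux system with glibc; the helper names ftruncate_errno_on_opath() and ioctl_errno_on_opath() are hypothetical and do not come from the selftest itself.

```c
#define _GNU_SOURCE /* for O_PATH */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Open `path` with O_PATH and try to truncate it through that descriptor.
 * Returns the errno of the failed ftruncate(2), 0 if it unexpectedly
 * succeeded, or -1 if the path could not be opened at all.
 */
int ftruncate_errno_on_opath(const char *path)
{
	int fd, err = 0;

	assert(path != NULL);
	fd = open(path, O_PATH | O_CLOEXEC);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, 0) < 0)
		err = errno;
	close(fd);
	return err;
}

/* Same idea for ioctl(2), using the generic FIONBIO command. */
int ioctl_errno_on_opath(const char *path)
{
	int fd, err = 0, flag = 0;

	assert(path != NULL);
	fd = open(path, O_PATH | O_CLOEXEC);
	if (fd < 0)
		return -1;
	if (ioctl(fd, FIONBIO, &flag) < 0)
		err = errno;
	close(fd);
	return err;
}
```

Both helpers are expected to report EBADF for any path that can be opened, because an O_PATH descriptor does not refer to an open file description that supports these operations.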
8 daysselftests/landlock: Test IOCTL with memfdsGünther Noack1-8/+36
Because the LANDLOCK_ACCESS_FS_IOCTL_DEV right is associated with the opened file during open(2), IOCTLs are supposed to work with files which are opened by means other than open(2). Signed-off-by: Günther Noack <gnoack@google.com> Link: https://lore.kernel.org/r/20240419161122.2023765-4-gnoack@google.com Signed-off-by: Mickaël Salaün <mic@digikod.net>
8 daysselftests/landlock: Test IOCTL supportGünther Noack1-3/+189
Exercises Landlock's IOCTL feature in different combinations of handling and permitting the LANDLOCK_ACCESS_FS_IOCTL_DEV right, and in different combinations of using files and directories. Signed-off-by: Günther Noack <gnoack@google.com> Link: https://lore.kernel.org/r/20240419161122.2023765-3-gnoack@google.com Signed-off-by: Mickaël Salaün <mic@digikod.net>
8 dayslandlock: Add IOCTL access right for character and block devicesGünther Noack6-16/+258
Introduces the LANDLOCK_ACCESS_FS_IOCTL_DEV right and increments the Landlock ABI version to 5. This access right applies to device-custom IOCTL commands when they are invoked on block or character device files. Like the truncate right, this right is associated with a file descriptor at the time of open(2), and gets respected even when the file descriptor is used outside of the thread which it was originally opened in. Therefore, a newly enabled Landlock policy does not apply to file descriptors which are already open. If the LANDLOCK_ACCESS_FS_IOCTL_DEV right is handled, only a small number of safe IOCTL commands will be permitted on newly opened device files. These include FIOCLEX, FIONCLEX, FIONBIO and FIOASYNC, as well as other IOCTL commands for regular files which are implemented in fs/ioctl.c. Noteworthy scenarios which require special attention: TTY devices are often passed into a process from the parent process, and so a newly enabled Landlock policy does not retroactively apply to them automatically. In the past, TTY devices have often supported IOCTL commands like TIOCSTI and some TIOCLINUX subcommands, which were letting callers control the TTY input buffer (and simulate keypresses). This should be restricted to CAP_SYS_ADMIN programs on modern kernels though. Known limitations: The LANDLOCK_ACCESS_FS_IOCTL_DEV access right is a coarse-grained control over IOCTL commands. Landlock users may use path-based restrictions in combination with their knowledge about the file system layout to control what IOCTLs can be done. Cc: Paul Moore <paul@paul-moore.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Günther Noack <gnoack@google.com> Link: https://lore.kernel.org/r/20240419161122.2023765-2-gnoack@google.com Signed-off-by: Mickaël Salaün <mic@digikod.net>
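Because the new right only exists from ABI version 5 on, a user-space program would typically mask it out when running on an older kernel. The following is a minimal sketch of that best-effort selection. The SKETCH_* names and sketch_handled_fs_rights() are hypothetical, and the bit positions are assumed to match linux/landlock.h; a real program would include that header and obtain the ABI version from landlock_create_ruleset() with LANDLOCK_CREATE_RULESET_VERSION.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the UAPI values; defined locally so old headers suffice. */
#define SKETCH_ACCESS_FS_REFER     (1ULL << 13) /* added in ABI 2 */
#define SKETCH_ACCESS_FS_TRUNCATE  (1ULL << 14) /* added in ABI 3 */
#define SKETCH_ACCESS_FS_IOCTL_DEV (1ULL << 15) /* added in ABI 5 */

/*
 * Return the filesystem rights a ruleset can handle on a kernel that
 * reports Landlock ABI version `abi`, dropping rights the kernel does
 * not know about yet.
 */
uint64_t sketch_handled_fs_rights(int abi)
{
	/* All filesystem rights defined up to ABI 5 (bits 0..15). */
	uint64_t rights = (1ULL << 16) - 1;

	assert(abi >= 1);
	if (abi < 2)
		rights &= ~SKETCH_ACCESS_FS_REFER;
	if (abi < 3)
		rights &= ~SKETCH_ACCESS_FS_TRUNCATE;
	if (abi < 5)
		rights &= ~SKETCH_ACCESS_FS_IOCTL_DEV;
	return rights;
}
```

The returned mask would be stored in the handled_access_fs field of the ruleset attribute, so the same binary restricts device IOCTLs on new kernels and still works on old ones.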
8 dayssamples/landlock: Fix incorrect free in populate_ruleset_netIvanov Mikhail1-2/+3
Pointer env_port_name changes after strsep(). Memory allocated via strdup() will not be freed if landlock_add_rule() returns non-zero value. Fixes: 5e990dcef12e ("samples/landlock: Support TCP restrictions") Signed-off-by: Ivanov Mikhail <ivanov.mikhail1@huawei-partners.com> Reviewed-by: Konstantin Meskhidze <konstantin.meskhidze@huawei.com> Link: https://lore.kernel.org/r/20240326095625.3576164-1-ivanov.mikhail1@huawei-partners.com Signed-off-by: Mickaël Salaün <mic@digikod.net>
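The underlying pitfall is that strsep() advances the pointer it is given, so that pointer can no longer be passed to free(). A minimal sketch of the usual remedy, keeping a second pointer to the start of the allocation, is shown below; count_ports() and the ':' separator are illustrative assumptions and not the sample tool's actual code.

```c
#define _DEFAULT_SOURCE /* for strsep() and strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Count the non-empty ':'-separated tokens in `spec`.
 * Returns -1 if the working copy could not be allocated.
 */
int count_ports(const char *spec)
{
	char *copy, *cursor, *tok;
	int n = 0;

	assert(spec != NULL);
	copy = strdup(spec);
	if (!copy)
		return -1;

	/* strsep() moves `cursor`; `copy` keeps the address malloc returned. */
	cursor = copy;
	while ((tok = strsep(&cursor, ":")) != NULL) {
		if (*tok != '\0')
			n++;
	}

	/* Free the original allocation, not `cursor` (which is NULL by now). */
	free(copy);
	return n;
}
```

Any early return added inside the loop would also have to free `copy`, which is the second half of the fix described above.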
8 daysbpf: make list_for_each_entry portableJose E. Marchesi4-10/+38
[Changes from V1: - The __compat_break has been abandoned in favor of a more readable can_loop macro that can be used anywhere, including loop conditions.] The macro list_for_each_entry is defined in bpf_arena_list.h as follows: #define list_for_each_entry(pos, head, member) \ for (void * ___tmp = (pos = list_entry_safe((head)->first, \ typeof(*(pos)), member), \ (void *)0); \ pos && ({ ___tmp = (void *)pos->member.next; 1; }); \ cond_break, \ pos = list_entry_safe((void __arena *)___tmp, typeof(*(pos)), member)) The macro cond_break, in turn, expands to a statement expression that contains a `break' statement. Compound statement expressions, and the subsequent ability of placing statements in the header of a `for' loop, are GNU extensions. Unfortunately, clang implements this GNU extension differently than GCC: - In GCC the `break' statement is bound to the containing "breakable" context in which the defining `for' appears. If there is no such context, GCC emits a warning: break statement without enclosing `for' or `switch' statement. - In clang the `break' statement is bound to the defining `for'. If the defining `for' is itself inside some breakable construct, then clang emits a -Wgcc-compat warning. This patch adds a new macro can_loop to bpf_experimental that implements the same logic as cond_break but evaluates to a boolean expression. The patch also changes all current uses of cond_break within loop headers accordingly. Tested in bpf-next master. No regressions. Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com> Cc: david.faust@oracle.com Cc: cupertino.miranda@oracle.com Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Link: https://lore.kernel.org/r/20240511212243.23477-1-jose.marchesi@oracle.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 daysbpf: ignore expected GCC warning in test_global_func10.cJose E. Marchesi1-0/+4
The BPF selftest global_func10 in progs/test_global_func10.c contains: struct Small { long x; }; struct Big { long x; long y; }; [...] __noinline int foo(const struct Big *big) { if (!big) return 0; return bpf_get_prandom_u32() < big->y; } [...] SEC("cgroup_skb/ingress") __failure __msg("invalid indirect access to stack") int global_func10(struct __sk_buff *skb) { const struct Small small = {.x = skb->len }; return foo((struct Big *)&small) ? 1 : 0; } GCC emits a "maybe uninitialized" warning for the code above, because it knows `foo' accesses `big->y'. Since the purpose of this selftest is to check that the verifier will fail on this sort of invalid memory access, this patch just silences the compiler warning. Tested in bpf-next master. No regressions. Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com> Cc: david.faust@oracle.com Cc: cupertino.miranda@oracle.com Cc: Yonghong Song <yonghong.song@linux.dev> Cc: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240511212349.23549-1-jose.marchesi@oracle.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 daysbpf: disable strict aliasing in test_global_func9.cJose E. Marchesi1-0/+1
The BPF selftest test_global_func9.c performs type punning and breaks strict-aliasing rules. In particular, given: int global_func9(struct __sk_buff *skb) { int result = 0; [...] { const struct C c = {.x = skb->len, .y = skb->family }; result |= foo((const struct S *)&c); } } When building with strict-aliasing enabled (the default) the initialization of `c' gets optimized away in its entirety: [... no initialization of `c' ...] r1 = r10 r1 += -40 call foo w0 |= w6 Since GCC knows that `foo' accesses s->x, we get a "maybe uninitialized" warning. On the other hand, when strict-aliasing is disabled GCC only optimizes away the store to `.y': r1 = *(u32 *) (r6+0) *(u32 *) (r10+-40) = r1 ; This is .x = skb->len in `c' r1 = r10 r1 += -40 call foo w0 |= w6 In this case the warning is not emitted, because s->x is initialized. This patch disables strict aliasing in this test when building with GCC. clang seems to not optimize this particular code even when strict aliasing is enabled. Tested in bpf-next master. Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com> Cc: david.faust@oracle.com Cc: cupertino.miranda@oracle.com Cc: Yonghong Song <yonghong.song@linux.dev> Cc: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240511212213.23418-1-jose.marchesi@oracle.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 daysselftests/bpf: Free strdup memory in xdp_hw_metadataGeliang Tang1-0/+2
The strdup() function returns a pointer to a new string which is a duplicate of the string "ifname". Memory for the new string is obtained with malloc(), and needs to be freed with free(). This patch adds the missing "free(saved_hwtstamp_ifname)" in cleanup() to avoid a potential memory leak in xdp_hw_metadata.c. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Link: https://lore.kernel.org/r/af9bcccb96655e82de5ce2b4510b88c9c8ed5ed0.1715417367.git.tanggeliang@kylinos.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 daysselftests/bpf: Fix a few tests for GCC related warnings.Cupertino Miranda4-29/+37
This patch corrects a few warnings to allow selftests to compile for GCC. -- progs/cpumask_failure.c -- progs/bpf_misc.h:136:22: error: ‘cpumask’ is used uninitialized [-Werror=uninitialized] 136 | #define __sink(expr) asm volatile("" : "+g"(expr)) | ^~~ progs/cpumask_failure.c:68:9: note: in expansion of macro ‘__sink’ 68 | __sink(cpumask); The macro __sink(cpumask) with the '+' constraint modifier forces the compiler to expect a read and write from cpumask. GCC detects that cpumask is never initialized and reports an error. This patch removes the spurious, unneeded definitions of cpumask. -- progs/dynptr_fail.c -- progs/dynptr_fail.c:1444:9: error: ‘ptr1’ may be used uninitialized [-Werror=maybe-uninitialized] 1444 | bpf_dynptr_clone(&ptr1, &ptr2); Many of the tests in the file are related to the detection of uninitialized pointers by the verifier. GCC is able to detect possible uninitialized values, and reports this as an error. The patch initializes all of the previously uninitialized structs. -- progs/test_tunnel_kern.c -- progs/test_tunnel_kern.c:590:9: error: array subscript 1 is outside array bounds of ‘struct geneve_opt[1]’ [-Werror=array-bounds=] 590 | *(int *) &gopt.opt_data = bpf_htonl(0xdeadbeef); | ^~~~~~~~~~~~~~~~~~~~~~~ progs/test_tunnel_kern.c:575:27: note: at offset 4 into object ‘gopt’ of size 4 575 | struct geneve_opt gopt; This test accesses memory beyond the defined data of struct geneve_opt, whose last field is "u8 opt_data[0]", for which no stack space is reserved in the function. This pattern is repeated in the ip6geneve_set_tunnel and geneve_set_tunnel functions. GCC is able to see this and emits a warning. The patch introduces a local struct that allocates enough space to safely allow the write to the opt_data field.
-- progs/jeq_infer_not_null_fail.c -- progs/jeq_infer_not_null_fail.c:21:40: error: array subscript ‘struct bpf_map[0]’ is partly outside array bounds of ‘struct <anonymous>[1]’ [-Werror=array-bounds=] 21 | struct bpf_map *inner_map = map->inner_map_meta; | ^~ progs/jeq_infer_not_null_fail.c:14:3: note: object ‘m_hash’ of size 32 14 | } m_hash SEC(".maps"); This example defines m_hash in the context of the compilation unit and casts it to struct bpf_map, although m_hash is much smaller than struct bpf_map. GCC errors out when the code attempts to access a struct bpf_map member that lies outside of the defined limits of m_hash. This patch disables the warning through a GCC pragma. These changes were tested in bpf-next master selftests without any regressions. Signed-off-by: Cupertino Miranda <cupertino.miranda@oracle.com> Cc: jose.marchesi@oracle.com Cc: david.faust@oracle.com Cc: Yonghong Song <yonghong.song@linux.dev> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Link: https://lore.kernel.org/r/20240510183850.286661-2-cupertino.miranda@oracle.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 daysbpf: avoid gcc overflow warning in test_xdp_vlan.cDavid Faust1-1/+1
This patch fixes an integer overflow warning raised by GCC in xdp_prognum1 of progs/test_xdp_vlan.c: GCC-BPF [test_maps] test_xdp_vlan.bpf.o progs/test_xdp_vlan.c: In function 'xdp_prognum1': progs/test_xdp_vlan.c:163:25: error: integer overflow in expression '(short int)(((__builtin_constant_p((int)vlan_hdr->h_vlan_TCI)) != 0 ? (int)(short unsigned int)((short int)((int)vlan_hdr->h_vlan_TCI << 8 >> 8) << 8 | (short int)((int)vlan_hdr->h_vlan_TCI << 0 >> 8 << 0)) & 61440 : (int)__builtin_bswap16(vlan_hdr->h_vlan_TCI) & 61440) << 8 >> 8) << 8' of type 'short int' results in '0' [-Werror=overflow] 163 | bpf_htons((bpf_ntohs(vlan_hdr->h_vlan_TCI) & 0xf000) | ^~~~~~~~~ The problem lies with the expansion of the bpf_htons macro and the expression passed into it. The bpf_htons macro (and similarly the bpf_ntohs macro) expands to a ternary operation using either __builtin_bswap16 or ___bpf_swab16 to swap the bytes, depending on whether the expression is constant. For an expression, with 'value' as a u16, like: bpf_htons (value & 0xf000) The entire (value & 0xf000) is 'x' in the expansion of ___bpf_swab16 and we get as one part of the expanded swab16: ((__u16)(value & 0xf000) << 8 >> 8 << 8 This will always evaluate to 0, which is intentional since this subexpression deals with the byte guaranteed to be 0 by the mask. However, GCC warns because the precise reason this always evaluates to 0 is an overflow. Specifically, the plain 0xf000 in the expression is a signed 32-bit integer, which causes 'value' to also be promoted to a signed 32-bit integer, and the combination of the 8-bit left shift and down-cast back to __u16 results in a signed overflow (really a 'warning: overflow in conversion from int to __u16' which is propagated up through the rest of the expression leading to the ultimate overflow warning above), which is a valid warning despite being the intended result of this code.
Clang does not warn on this case, likely because it performs constant folding later in the compilation process relative to GCC. It seems that by the time clang does constant folding for this expression, the side of the ternary with this overflow has already been discarded. Fortunately, this warning is easily silenced by simply making the 0xf000 mask explicitly unsigned. This has no impact on the result. Signed-off-by: David Faust <david.faust@oracle.com> Cc: jose.marchesi@oracle.com Cc: cupertino.miranda@oracle.com Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240508193512.152759-1-david.faust@oracle.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
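The promotion rule behind the fix can be checked with a C11 _Generic selection. The following is a minimal sketch, assuming a 32-bit int as on BPF targets. SKETCH_SWAB16 is a simplified stand-in for a 16-bit byte swap and is not the libbpf macro, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(int) == 4, "the sketch assumes a 32-bit int");

/* Simplified 16-bit byte swap; NOT the ___bpf_swab16 expansion. */
#define SKETCH_SWAB16(x) ((uint16_t)(			\
	(((uint16_t)(x) & 0x00ffU) << 8) |		\
	(((uint16_t)(x) & 0xff00U) >> 8)))

/* 1 if `v & 0xf000` has type int, i.e. the u16 was promoted to signed. */
int plain_mask_is_signed(uint16_t v)
{
	return _Generic((v & 0xf000), int: 1, default: 0);
}

/* 1 if `v & 0xf000U` has type unsigned int, so no signed arithmetic occurs. */
int suffixed_mask_is_unsigned(uint16_t v)
{
	return _Generic((v & 0xf000U), unsigned int: 1, default: 0);
}

/* Byte-swap the top four bits of a host-order value using the unsigned mask. */
uint16_t swab16_masked(uint16_t v)
{
	return SKETCH_SWAB16(v & 0xf000U);
}
```

The value computed is the same with either mask; the U suffix only changes the type in which the intermediate shifts are evaluated, which is why the fix has no effect on the result.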
8 daystools: remove redundant ethtool.h from tooling infraTushar Vyavahare1-2271/+0
Remove the redundant ethtool.h header file from tools/include/uapi/linux. The file is unnecessary as the system uses the kernel's include/uapi/linux/ethtool.h directly. Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20240508104123.434769-1-tushar.vyavahare@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 daysMerge branch 'retire-progs-test_sock_addr'Alexei Starovoitov19-1274/+1992
Jordan Rife says: ==================== Retire progs/test_sock_addr.c This patch series migrates remaining tests from bpf/test_sock_addr.c to prog_tests/sock_addr.c and progs/verifier_sock_addr.c in order to fully retire the old-style test program and expands test coverage to test previously untested scenarios related to sockaddr hooks. This is a continuation of the work started recently during the expansion of prog_tests/sock_addr.c. Link: https://lore.kernel.org/bpf/20240429214529.2644801-1-jrife@google.com/T/#u ======= Patches ======= * Patch 1 moves tests that check valid return values for recvmsg hooks into progs/verifier_sock_addr.c, a new addition to the verifier test suite. * Patches 2-5 lay the groundwork for test migration, enabling prog_tests/sock_addr.c to handle more test dimensions. * Patches 6-11 move existing tests to prog_tests/sock_addr.c. * Patch 12 removes some redundant test cases. * Patches 14-17 expand on existing test coverage. ==================== Link: https://lore.kernel.org/r/20240510190246.3247730-1-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 daysselftests/bpf: Expand ATTACH_REJECT testsJordan Rife1-0/+187
This expands coverage for ATTACH_REJECT tests to include connect_unix, sendmsg_unix, recvmsg*, getsockname*, and getpeername*. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-18-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 daysselftests/bpf: Expand getsockname and getpeername testsJordan Rife5-2/+412
This expands coverage for getsockname and getpeername hooks to include getsockname4, getsockname6, getpeername4, and getpeername6. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-17-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 dayssefltests/bpf: Expand sockaddr hook deny testsJordan Rife7-0/+378
This patch expands test coverage for EPERM tests to include connect and bind calls and rounds out the coverage for sendmsg by adding tests for sendmsg_unix. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-16-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 daysselftests/bpf: Expand sockaddr program return value testsJordan Rife1-0/+294
This patch expands verifier coverage for program return values to cover bind, connect, sendmsg, getsockname, and getpeername hooks. It also rounds out the recvmsg coverage by adding test cases for recvmsg_unix hooks. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-15-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 daysselftests/bpf: Retire test_sock_addr.(c|sh)Jordan Rife4-636/+1
Fully remove test_sock_addr.c and test_sock_addr.sh, as test coverage has been fully moved to prog_tests/sock_addr.c. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-14-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 daysselftests/bpf: Remove redundant sendmsg test casesJordan Rife1-161/+0
Remove these test cases completely, as the same behavior is already covered by other sendmsg* test cases in prog_tests/sock_addr.c. This just rewrites the destination address similar to sendmsg_v4_prog and sendmsg_v6_prog. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-13-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 daysselftests/bpf: Migrate ATTACH_REJECT test casesJordan Rife2-146/+102
Migrate test case from bpf/test_sock_addr.c ensuring that program attachment fails when using an inappropriate attach type. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-12-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 daysselftests/bpf: Migrate expected_attach_type testsJordan Rife2-84/+96
Migrates tests from progs/test_sock_addr.c ensuring that programs fail to load when the expected attach type does not match. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-11-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 daysselftests/bpf: Migrate wildcard destination rewrite testJordan Rife3-20/+37
Migrate test case from bpf/test_sock_addr.c ensuring that sendmsg respects when sendmsg6 hooks rewrite the destination IP with the IPv6 wildcard IP, [::]. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-10-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 daysselftests/bpf: Migrate sendmsg6 v4 mapped address testsJordan Rife3-20/+42
Migrate test case from bpf/test_sock_addr.c ensuring that sendmsg returns -ENOTSUPP when sending to an IPv4-mapped IPv6 address to prog_tests/sock_addr.c. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-9-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 daysselftests/bpf: Migrate sendmsg deny test casesJordan Rife4-45/+110
This set of tests checks that sendmsg calls are rejected (return -EPERM) when the sendmsg* hook returns 0. Replace those in bpf/test_sock_addr.c with corresponding tests in prog_tests/sock_addr.c. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-8-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 daysselftests/bpf: Migrate WILDCARD_IP testJordan Rife3-20/+56
Move wildcard IP sendmsg test case out of bpf/test_sock_addr.c into prog_tests/sock_addr.c. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-7-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 daysselftests/bpf: Handle SYSCALL_EPERM and SYSCALL_ENOTSUPP test casesJordan Rife1-20/+58
In preparation to move test cases from bpf/test_sock_addr.c that expect system calls to return ENOTSUPP or EPERM, this patch propagates errno from relevant system calls up to test_sock_addr() where the result can be checked. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-6-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 daysselftests/bpf: Handle ATTACH_REJECT test casesJordan Rife1-1/+34
In preparation to move test cases from bpf/test_sock_addr.c that expect ATTACH_REJECT, this patch adds BPF_SKEL_FUNCS_RAW to generate load and destroy functions that use bpf_prog_attach() to control the attach_type. The normal load functions use bpf_program__attach_cgroup which does not have the same degree of control over the attach type, as bpf_program_attach_fd() calls bpf_link_create() with the attach type extracted from prog using bpf_program__expected_attach_type(). It is currently not possible to modify the attach type before bpf_program__attach_cgroup() is called, since bpf_program__set_expected_attach_type() has no effect after the program is loaded. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-5-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 daysselftests/bpf: Handle LOAD_REJECT test casesJordan Rife1-5/+98
In preparation to move test cases from bpf/test_sock_addr.c that expect LOAD_REJECT, this patch adds expected_attach_type and extends load_fn to accept an expected attach type and a flag indicating whether or not rejection is expected. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-4-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 daysselftests/bpf: Use program name for skel load/destroy functionsJordan Rife1-46/+50
In preparation to migrate tests from bpf/test_sock_addr.c to sock_addr.c, update BPF_SKEL_FUNCS so that it generates functions based on prog_name instead of skel_name. This allows us to differentiate between programs in the same skeleton. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-3-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 daysselftests/bpf: Migrate recvmsg* return code tests to verifier_sock_addr.cJordan Rife3-70/+39
This set of tests checks that the BPF verifier rejects programs with invalid return codes (recvmsg4 and recvmsg6 hooks can only return 1). This patch replaces the tests in test_sock_addr.c with verifier_sock_addr.c, a new verifier prog_tests for sockaddr hooks, in a step towards fully retiring test_sock_addr.c. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-2-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 daysriscv, bpf: make some atomic operations fully orderedPuranjay Mohan1-10/+10
The BPF atomic operations with the BPF_FETCH modifier along with BPF_XCHG and BPF_CMPXCHG are fully ordered but the RISC-V JIT implements all atomic operations except BPF_CMPXCHG with relaxed ordering. Section 8.1 of the "The RISC-V Instruction Set Manual Volume I: Unprivileged ISA" [1], titled, "Specifying Ordering of Atomic Instructions" says: | To provide more efficient support for release consistency [5], each | atomic instruction has two bits, aq and rl, used to specify additional | memory ordering constraints as viewed by other RISC-V harts. and | If only the aq bit is set, the atomic memory operation is treated as | an acquire access. | If only the rl bit is set, the atomic memory operation is treated as a | release access. | | If both the aq and rl bits are set, the atomic memory operation is | sequentially consistent. Fix this by setting both aq and rl bits as 1 for operations with BPF_FETCH and BPF_XCHG. [1] https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf Fixes: dd642ccb45ec ("riscv, bpf: Implement more atomic operations for RV64") Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Reviewed-by: Pu Lehui <pulehui@huawei.com> Link: https://lore.kernel.org/r/20240505201633.123115-1-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 daysriscv, bpf: Fix typo in commentXiao Wang1-2/+2
We can use either "instruction" or "insn" in the comment. Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Reviewed-by: Pu Lehui <pulehui@huawei.com> Link: https://lore.kernel.org/r/20240507111618.437121-1-xiao.w.wang@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 dayss390/bpf: Emit a barrier for BPF_FETCH instructionsIlya Leoshkevich1-2/+6
BPF_ATOMIC_OP() macro documentation states that "BPF_ADD | BPF_FETCH" should be the same as atomic_fetch_add(), which is currently not the case on s390x: the serialization instruction "bcr 14,0" is missing. This applies to "and", "or" and "xor" variants too. s390x is allowed to reorder stores with subsequent fetches from different addresses, so code relying on BPF_FETCH acting as a barrier, for example: stw [%r0], 1 afadd [%r1], %r2 ldxw %r3, [%r4] may be broken. Fix it by emitting "bcr 14,0". Note that a separate serialization instruction is not needed for BPF_XCHG and BPF_CMPXCHG, because COMPARE AND SWAP performs serialization itself. Fixes: ba3b86b9cef0 ("s390/bpf: Implement new atomic ops") Reported-by: Puranjay Mohan <puranjay12@gmail.com> Closes: https://lore.kernel.org/bpf/mb61p34qvq3wf.fsf@kernel.org/ Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Puranjay Mohan <puranjay@kernel.org> Link: https://lore.kernel.org/r/20240507000557.12048-1-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 daysMerge branch 'bpf-inline-helpers-in-arm64-and-riscv-jits'Alexei Starovoitov8-0/+132
Puranjay Mohan says: ==================== bpf: Inline helpers in arm64 and riscv JITs Changes in v5 -> v6: arm64 v5: https://lore.kernel.org/all/20240430234739.79185-1-puranjay@kernel.org/ riscv v2: https://lore.kernel.org/all/20240430175834.33152-1-puranjay@kernel.org/ - Combine riscv and arm64 changes in single series - Some coding style fixes Changes in v4 -> v5: v4: https://lore.kernel.org/all/20240429131647.50165-1-puranjay@kernel.org/ - Implement the inlining of the bpf_get_smp_processor_id() in the JIT. NOTE: This needs to be based on: https://lore.kernel.org/all/20240430175834.33152-1-puranjay@kernel.org/ to be built. Manual run of bpf-ci with this series rebased on above: https://github.com/kernel-patches/bpf/pull/6929 Changes in v3 -> v4: v3: https://lore.kernel.org/all/20240426121349.97651-1-puranjay@kernel.org/ - Fix coding style issue related to C89 standards. Changes in v2 -> v3: v2: https://lore.kernel.org/all/20240424173550.16359-1-puranjay@kernel.org/ - Fixed the xlated dump of percpu mov to "r0 = &(void __percpu *)(r0)" - Made ARM64 and x86-64 use the same code for inlining. The only difference that remains is the per-cpu address of the cpu_number. Changes in v1 -> v2: v1: https://lore.kernel.org/all/20240405091707.66675-1-puranjay12@gmail.com/ - Add a patch to inline bpf_get_smp_processor_id() - Fix an issue in MRS instruction encoding as pointed out by Will - Remove CONFIG_SMP check because arm64 kernel always compiles with CONFIG_SMP This series adds the support of internal only per-CPU instructions and inlines the bpf_get_smp_processor_id() helper call for ARM64 and RISC-V BPF JITs. Here is an example of calls to bpf_get_smp_processor_id() and percpu_array_map_lookup_elem() before and after this series on ARM64. 
                                          BPF
                                         =====

              BEFORE                                       AFTER
             --------                                     -------

int cpu = bpf_get_smp_processor_id();           int cpu = bpf_get_smp_processor_id();
(85) call bpf_get_smp_processor_id#229032       (85) call bpf_get_smp_processor_id#8

p = bpf_map_lookup_elem(map, &zero);            p = bpf_map_lookup_elem(map, &zero);
(18) r1 = map[id:78]                            (18) r1 = map[id:153]
(18) r2 = map[id:82][0]+65536                   (18) r2 = map[id:157][0]+65536
(85) call percpu_array_map_lookup_elem#313512   (07) r1 += 496
                                                (61) r0 = *(u32 *)(r2 +0)
                                                (35) if r0 >= 0x1 goto pc+5
                                                (67) r0 <<= 3
                                                (0f) r0 += r1
                                                (79) r0 = *(u64 *)(r0 +0)
                                                (bf) r0 = &(void __percpu *)(r0)
                                                (05) goto pc+1
                                                (b7) r0 = 0

                                       ARM64 JIT
                                      ===========

              BEFORE                                       AFTER
             --------                                     -------

int cpu = bpf_get_smp_processor_id();           int cpu = bpf_get_smp_processor_id();
mov     x10, #0xfffffffffffff4d0                mrs     x10, sp_el0
movk    x10, #0x802b, lsl #16                   ldr     w7, [x10, #24]
movk    x10, #0x8000, lsl #32
blr     x10
add     x7, x0, #0x0

p = bpf_map_lookup_elem(map, &zero);            p = bpf_map_lookup_elem(map, &zero);
mov     x0, #0xffff0003ffffffff                 mov     x0, #0xffff0003ffffffff
movk    x0, #0xce5c, lsl #16                    movk    x0, #0xe0f3, lsl #16
movk    x0, #0xca00                             movk    x0, #0x7c00
mov     x1, #0xffff8000ffffffff                 mov     x1, #0xffff8000ffffffff
movk    x1, #0x8bdb, lsl #16                    movk    x1, #0xb0c7, lsl #16
movk    x1, #0x6000                             movk    x1, #0xe000
mov     x10, #0xffffffffffff3ed0                add     x0, x0, #0x1f0
movk    x10, #0x802d, lsl #16                   ldr     w7, [x1]
movk    x10, #0x8000, lsl #32                   cmp     x7, #0x1
blr     x10                                     b.cs    0x0000000000000090
add     x7, x0, #0x0                            lsl     x7, x7, #3
                                                add     x7, x7, x0
                                                ldr     x7, [x7]
                                                mrs     x10, tpidr_el1
                                                add     x7, x7, x10
                                                b       0x0000000000000094
                                                mov     x7, #0x0

Performance improvement found using benchmark[1]

./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc

+---------------+-------------------+-------------------+--------------+
|      Name     |      Before       |       After       |   % change   |
|---------------+-------------------+-------------------+--------------|
| glob-arr-inc  | 23.380 ± 1.675M/s | 25.893 ± 0.026M/s |   + 10.74%   |
| arr-inc       | 23.928 ± 0.034M/s | 25.213 ± 0.063M/s |   + 5.37%    |
| hash-inc      | 12.352 ± 0.005M/s | 12.609 ± 0.013M/s |   + 2.08%    |
+---------------+-------------------+-------------------+--------------+

[1] https://github.com/anakryiko/linux/commit/8dec900975ef

RISCV64 JIT output for `call bpf_get_smp_processor_id`
=======================================================

          Before                           After
         --------                         -------

   auipc   t1,0x848c                  ld    a5,32(tp)
   jalr    604(t1)
   mv      a5,a0

Benchmark using [1] on Qemu.

./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc

+---------------+------------------+------------------+--------------+
|      Name     |      Before      |      After       |   % change   |
|---------------+------------------+------------------+--------------|
| glob-arr-inc  | 1.077 ± 0.006M/s | 1.336 ± 0.010M/s |   + 24.04%   |
| arr-inc       | 1.078 ± 0.002M/s | 1.332 ± 0.015M/s |   + 23.56%   |
| hash-inc      | 0.494 ± 0.004M/s | 0.653 ± 0.001M/s |   + 32.18%   |
+---------------+------------------+------------------+--------------+
====================

Link: https://lore.kernel.org/r/20240502151854.9810-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 daysbpf, arm64: inline bpf_get_smp_processor_id() helperPuranjay Mohan3-0/+28
Inline calls to bpf_get_smp_processor_id() helper in the JIT by emitting a read from struct thread_info. The SP_EL0 system register holds the pointer to the task_struct and thread_info is the first member of this struct. We can read the cpu number from the thread_info.

Here is how the ARM64 JITed assembly changes after this commit:

                                       ARM64 JIT
                                      ===========

              BEFORE                                       AFTER
             --------                                     -------

int cpu = bpf_get_smp_processor_id();           int cpu = bpf_get_smp_processor_id();
mov     x10, #0xfffffffffffff4d0                mrs     x10, sp_el0
movk    x10, #0x802b, lsl #16                   ldr     w7, [x10, #24]
movk    x10, #0x8000, lsl #32
blr     x10
add     x7, x0, #0x0

Performance improvement using benchmark[1]

./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc

+---------------+-------------------+-------------------+--------------+
|      Name     |      Before       |       After       |   % change   |
|---------------+-------------------+-------------------+--------------|
| glob-arr-inc  | 23.380 ± 1.675M/s | 25.893 ± 0.026M/s |   + 10.74%   |
| arr-inc       | 23.928 ± 0.034M/s | 25.213 ± 0.063M/s |   + 5.37%    |
| hash-inc      | 12.352 ± 0.005M/s | 12.609 ± 0.013M/s |   + 2.08%    |
+---------------+-------------------+-------------------+--------------+

[1] https://github.com/anakryiko/linux/commit/8dec900975ef

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-5-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 daysarm64, bpf: add internal-only MOV instruction to resolve per-CPU addrsPuranjay Mohan4-0/+38
Support an instruction for resolving absolute addresses of per-CPU data from their per-CPU offsets. This instruction is internal-only and users are not allowed to use it directly. For now, it will only be used for internal inlining optimizations between the BPF verifier and the BPF JITs.

Since commit 7158627686f0 ("arm64: percpu: implement optimised pcpu access using tpidr_el1"), the per-cpu offset for the CPU is stored in the tpidr_el1/2 register of that CPU.

To support this BPF instruction in the ARM64 JIT, the following ARM64 instructions are emitted:

mov dst, src          // Move src to dst, if src != dst
mrs tmp, tpidr_el1/2  // Move per-cpu offset of the current cpu in tmp.
add dst, dst, tmp     // Add the per cpu offset to the dst.

To measure the performance improvement provided by this change, the benchmark in [1] was used:

Before:
glob-arr-inc   :   23.597 ± 0.012M/s
arr-inc        :   23.173 ± 0.019M/s
hash-inc       :   12.186 ± 0.028M/s

After:
glob-arr-inc   :   23.819 ± 0.034M/s
arr-inc        :   23.285 ± 0.017M/s
hash-inc       :   12.419 ± 0.011M/s

[1] https://github.com/anakryiko/linux/commit/8dec900975ef

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-4-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
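The effect of the three emitted instructions can be sketched in plain C. This is an illustrative userspace model, not kernel code: `percpu_offset_model[]`, `read_percpu_offset()` and `resolve_percpu_addr()` are hypothetical names, and the offsets are made-up values. In the real JIT the offset comes from the tpidr_el1/2 register of the running CPU, not from an array.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_MODEL 4 /* hypothetical CPU count for the model */

/* Made-up per-CPU offsets; on arm64 each CPU keeps its own in tpidr_el1/2. */
static const uint64_t percpu_offset_model[NR_CPUS_MODEL] = {
	0x0, 0x10000, 0x20000, 0x30000
};

/* Models "mrs tmp, tpidr_el1/2" for the given CPU. */
static uint64_t read_percpu_offset(unsigned int cpu)
{
	return percpu_offset_model[cpu];
}

/* Turns a per-CPU offset-style pointer into an absolute address. */
static uint64_t resolve_percpu_addr(uint64_t src, unsigned int cpu)
{
	uint64_t dst = src;                     /* mov dst, src      */
	uint64_t tmp = read_percpu_offset(cpu); /* mrs tmp, tpidr_*  */

	dst += tmp;                             /* add dst, dst, tmp */
	return dst;
}
```

For instance, in this model a per-CPU pointer value of 0x1f0 resolves to 0x201f0 on CPU 2 and stays 0x1f0 on CPU 0.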
8 daysriscv, bpf: inline bpf_get_smp_processor_id()Puranjay Mohan4-0/+42
Inline the calls to bpf_get_smp_processor_id() in the riscv bpf jit.

RISCV saves the pointer to the CPU's task_struct in the TP (thread pointer) register. This makes it trivial to get the CPU's processor id. As thread_info is the first member of task_struct, we can read the processor id from TP + offsetof(struct thread_info, cpu).

RISCV64 JIT output for `call bpf_get_smp_processor_id`
======================================================

          Before                           After
         --------                         -------

   auipc   t1,0x848c                  ld    a5,32(tp)
   jalr    604(t1)
   mv      a5,a0

Benchmark using [1] on Qemu.

./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc

+---------------+------------------+------------------+--------------+
|      Name     |      Before      |      After       |   % change   |
|---------------+------------------+------------------+--------------|
| glob-arr-inc  | 1.077 ± 0.006M/s | 1.336 ± 0.010M/s |   + 24.04%   |
| arr-inc       | 1.078 ± 0.002M/s | 1.332 ± 0.015M/s |   + 23.56%   |
| hash-inc      | 0.494 ± 0.004M/s | 0.653 ± 0.001M/s |   + 32.18%   |
+---------------+------------------+------------------+--------------+

NOTE: This benchmark includes changes from this patch and the previous patch that implemented the per-cpu insn.

[1] https://github.com/anakryiko/linux/commit/8dec900975ef

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
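The "TP + offsetof(struct thread_info, cpu)" read can be illustrated with mock structures. This is a sketch under stated assumptions: `thread_info_model` and `task_struct_model` are hypothetical layouts invented for the example and do not match the real kernel structures or their offsets. Only the technique carries over: because thread_info is the first member, a pointer to the task is also a pointer to its thread_info.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout; the real struct thread_info differs. */
struct thread_info_model {
	unsigned long flags;
	int preempt_count;
	long kernel_sp;
	long user_sp;
	int cpu;
};

/* Hypothetical task structure with thread_info as its first member. */
struct task_struct_model {
	struct thread_info_model thread_info;
	int other_state;
};

/*
 * Models the single inlined load: read an int located at
 * tp + offsetof(struct thread_info_model, cpu).
 */
static int read_cpu_via_tp(const void *tp)
{
	int cpu;

	memcpy(&cpu,
	       (const unsigned char *)tp +
		       offsetof(struct thread_info_model, cpu),
	       sizeof(cpu));
	return cpu;
}
```

The JIT can bake the offset in as an immediate because it is a compile-time constant, which is why the helper call collapses to one load.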
8 daysriscv, bpf: add internal-only MOV instruction to resolve per-CPU addrsPuranjay Mohan1-0/+24
Support an instruction for resolving absolute addresses of per-CPU data from their per-CPU offsets. This instruction is internal-only and users are not allowed to use it directly. For now, it will only be used for internal inlining optimizations between the BPF verifier and the BPF JITs.

RISC-V uses the generic per-cpu implementation, where the offsets for CPUs are kept in an array called __per_cpu_offset[], indexed by cpu_number. RISCV stores the address of the task_struct in the TP register. The first element in task_struct is struct thread_info, and we can get the cpu number by reading from the TP register + offsetof(struct thread_info, cpu).

Once we have the cpu number in a register, we read the offset for that cpu from the address &__per_cpu_offset + (cpu_number << 3). Then we add this offset to the destination register.

To measure the improvement from this change, the benchmark in [1] was used on Qemu:

Before:
glob-arr-inc   :    1.127 ± 0.013M/s
arr-inc        :    1.121 ± 0.004M/s
hash-inc       :    0.681 ± 0.052M/s

After:
glob-arr-inc   :    1.138 ± 0.011M/s
arr-inc        :    1.366 ± 0.006M/s
hash-inc       :    0.676 ± 0.001M/s

[1] https://github.com/anakryiko/linux/commit/8dec900975ef

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-2-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
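The address computation in this message reads as "base of __per_cpu_offset plus the cpu number shifted left by 3", i.e. an index into an array of 8-byte entries. Note that in C `+` binds tighter than `<<`, so the shift needs its own parentheses. A minimal userspace model follows; `per_cpu_offset_model[]`, `load_cpu_offset()` and `resolve_addr()` are hypothetical names and the offsets are made-up values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Made-up stand-in for the kernel's __per_cpu_offset[] array. */
static const uint64_t per_cpu_offset_model[4] = {
	0x0, 0x1000, 0x2000, 0x3000
};

/* Loads the 8-byte entry at base + (cpu << 3), as the JIT sequence would. */
static uint64_t load_cpu_offset(unsigned int cpu)
{
	const unsigned char *base =
		(const unsigned char *)per_cpu_offset_model;
	uint64_t off;

	memcpy(&off, base + ((size_t)cpu << 3), sizeof(off));
	return off;
}

/* Adds the current CPU's offset to the destination register value. */
static uint64_t resolve_addr(uint64_t dst, unsigned int cpu)
{
	return dst + load_cpu_offset(cpu);
}
```

The byte-offset form is equivalent to ordinary array indexing, which the model can confirm by comparing `load_cpu_offset(cpu)` with `per_cpu_offset_model[cpu]`.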
8 daysARC: Add eBPF JIT supportShahab Vahedi9-2/+4611
This will add eBPF JIT support to the 32-bit ARCv2 processors. The implementation is qualified by running the BPF tests on a Synopsys HSDK board with "ARC HS38 v2.1c at 500 MHz" as the 4-core CPU. The test_bpf.ko reports 2-10 fold improvements in execution time of its tests. For instance: test_bpf: #33 tcpdump port 22 jited:0 704 1766 2104 PASS test_bpf: #33 tcpdump port 22 jited:1 120 224 260 PASS test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:0 238 PASS test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:1 23 PASS test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:0 2034681 PASS test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:1 1020022 PASS Deployment and structure ------------------------ The related codes are added to "arch/arc/net": - bpf_jit.h -- The interface that a back-end translator must provide - bpf_jit_core.c -- Knows how to handle the input eBPF byte stream - bpf_jit_arcv2.c -- The back-end code that knows the translation logic The bpf_int_jit_compile() at the end of bpf_jit_core.c is the entrance to the whole process. Normally, the translation is done in one pass, namely the "normal pass". In case some relocations are not known during this pass, some data (arc_jit_data) is allocated for the next pass to come. This possible next (and last) pass is called the "extra pass". 1. Normal pass # The necessary pass 1a. Dry run # Get the whole JIT length, epilogue offset, etc. 1b. Emit phase # Allocate memory and start emitting instructions 2. Extra pass # Only needed if there are relocations to be fixed 2a. Patch relocations Support status -------------- The JIT compiler supports BPF instructions up to "cpu=v4". 
However, it does not yet provide support for: - Tail calls - Atomic operations - 64-bit division/remainder - BPF_PROBE_MEM* (exception table) The result of "test_bpf" test suite on an HSDK board is: hsdk-lnx# insmod test_bpf.ko test_suite=test_bpf test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed] All the failing test cases are due to the ones that were not JIT'ed. Categorically, they can be represented as: .-----------.------------.-------------. | test type | opcodes | # of cases | |-----------+------------+-------------| | atomic | 0xC3, 0xDB | 149 | | div64 | 0x37, 0x3F | 22 | | mod64 | 0x97, 0x9F | 15 | `-----------^------------+-------------| | (total) 186 | `-------------' Setup: build config ------------------- The following configs must be set to have a working JIT test: CONFIG_BPF_JIT=y CONFIG_BPF_JIT_ALWAYS_ON=y CONFIG_TEST_BPF=m The following options are not necessary for the tests module, but are good to have: CONFIG_DEBUG_INFO=y # prerequisite for below CONFIG_DEBUG_INFO_BTF=y # so bpftool can generate vmlinux.h CONFIG_FTRACE=y # CONFIG_BPF_SYSCALL=y # all these options lead to CONFIG_KPROBE_EVENTS=y # having CONFIG_BPF_EVENTS=y CONFIG_PERF_EVENTS=y # Some BPF programs provide data through /sys/kernel/debug: CONFIG_DEBUG_FS=y arc# mount -t debugfs debugfs /sys/kernel/debug Setup: elfutils --------------- The libdw.{so,a} library that is used by pahole for processing the final binary must come from elfutils 0.189 or newer. The support for ARCv2 [1] has been added since that version. [1] https://sourceware.org/git/?p=elfutils.git;a=commit;h=de3d46b3e7 Setup: pahole ------------- The line below in linux/scripts/Makefile.btf must be commented out: pahole-flags-$(call test-ge, $(pahole-ver), 121) += --btf_gen_floats Or else, the build will fail: $ make V=1 ... 
BTF .btf.vmlinux.bin.o pahole -J --btf_gen_floats \ -j --lang_exclude=rust \ --skip_encoding_btf_inconsistent_proto \ --btf_gen_optimized .tmp_vmlinux.btf Complex, interval and imaginary float types are not supported Encountered error while encoding BTF. ... BTFIDS vmlinux ./tools/bpf/resolve_btfids/resolve_btfids vmlinux libbpf: failed to find '.BTF' ELF section in vmlinux FAILED: load BTF from vmlinux: No data available This is due to the fact that the ARC toolchains generate "complex float" DIE entries in libgcc and at the moment, pahole can't handle such entries. Running the tests ----------------- host$ scp /bld/linux/lib/test_bpf.ko arc: arc # sysctl net.core.bpf_jit_enable=1 arc # insmod test_bpf.ko test_suite=test_bpf ... test_bpf: #1048 Staggered jumps: JMP32_JSLE_X jited:1 697811 PASS test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed] Acknowledgments --------------- - Claudiu Zissulescu for his unwavering support - Yuriy Kolerov for testing and troubleshooting - Vladimir Isaev for the pahole workaround - Sergey Matyukevich for paving the road by adding the interpreter support Signed-off-by: Shahab Vahedi <shahab@synopsys.com> Link: https://lore.kernel.org/r/20240430145604.38592-1-list+bpf@vahedi.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
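The "dry run, then emit" structure of the normal pass described in this commit is a common JIT pattern and can be sketched generically. The code below is a toy model, not the ARC JIT: `struct emit_ctx`, `emit_byte()`, `translate()` and `jit_compile()` are hypothetical names, the "translation" rule is invented, and the relocation-fixing extra pass is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical emitter state: buf == NULL means "dry run, only count". */
struct emit_ctx {
	uint8_t *buf;
	size_t len;
};

static void emit_byte(struct emit_ctx *ctx, uint8_t b)
{
	if (ctx->buf)
		ctx->buf[ctx->len] = b;
	ctx->len++;
}

/* Toy translation: each input opcode expands to (opcode % 3) + 1 bytes. */
static void translate(struct emit_ctx *ctx, const uint8_t *prog, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		size_t width = (size_t)(prog[i] % 3) + 1;

		for (size_t j = 0; j < width; j++)
			emit_byte(ctx, prog[i]);
	}
}

/* Returns a malloc'ed image and its length, or NULL on allocation failure. */
static uint8_t *jit_compile(const uint8_t *prog, size_t n, size_t *out_len)
{
	struct emit_ctx ctx = { .buf = NULL, .len = 0 };
	size_t total;

	translate(&ctx, prog, n); /* dry run: learn the image length */
	total = ctx.len;

	ctx.buf = malloc(total ? total : 1);
	if (!ctx.buf)
		return NULL;

	ctx.len = 0;
	translate(&ctx, prog, n); /* emit phase: same walk, real stores */
	assert(ctx.len == total); /* both walks must agree on the size */

	*out_len = total;
	return ctx.buf;
}
```

Running the same translation routine in both phases keeps the size calculation and the emitted code from drifting apart, which is the main reason for structuring a JIT this way.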
8 dayshwmon: (nzxt-kraken3) Bail out for unsupported device variantsGuenter Roeck1-1/+2
Dan Carpenter reports:

Commit cbeb479ff4cd ("hwmon: (nzxt-kraken3) Decouple device names from kinds") from Apr 28, 2024 (linux-next), leads to the following Smatch static checker warning:

drivers/hwmon/nzxt-kraken3.c:957 kraken3_probe() error: uninitialized symbol 'device_name'.

Indeed, 'device_name' will be uninitialized if an unknown product is encountered. In practice this should not matter because the driver should not instantiate on unknown products, but let's play safe and bail out if that happens.

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/linux-hwmon/b1738c50-db42-40f0-a899-9c027c131ffb@moroto.mountain/
Cc: Jonas Malaco <jonas@protocubo.io>
Cc: Aleksa Savic <savicaleksa83@gmail.com>
Fixes: cbeb479ff4cd ("hwmon: (nzxt-kraken3) Decouple device names from kinds")
Acked-by: Jonas Malaco <jonas@protocubo.io>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
8 daysLinux 6.9v6.9Linus Torvalds1-1/+1
8 daysMerge tag 'kselftest-fix-vfork-2024-05-12' of ↵Linus Torvalds4-67/+147
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux

Pull Kselftest fixes from Mickaël Salaün:
"Fix Kselftest's vfork() side effects.

As reported by Kernel Test Robot and Sean Christopherson, some tests fail since v6.9-rc1. This is due to the use of vfork() which introduced some side effects. Similarly, while making it more generic, a previous commit made some Landlock file system tests flaky, and subject to the host's file system mount configuration.

This fixes all these side effects by replacing vfork() with clone3() and CLONE_VFORK, which is cleaner (no arbitrary shared memory) and makes the Kselftest framework more robust"

Link: https://lore.kernel.org/oe-lkp/202403291015.1fcfa957-oliver.sang@intel.com
Link: https://lore.kernel.org/r/ZjPelW6-AbtYvslu@google.com
Link: https://lore.kernel.org/r/20240511171445.904356-1-mic@digikod.net

* tag 'kselftest-fix-vfork-2024-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
  selftests/harness: Handle TEST_F()'s explicit exit codes
  selftests/harness: Fix vfork() side effects
  selftests/harness: Share _metadata between forked processes
  selftests/pidfd: Fix wrong expectation
  selftests/harness: Constify fixture variants
  selftests/landlock: Do not allocate memory in fixture data
  selftests/harness: Fix interleaved scheduling leading to race conditions
  selftests/harness: Fix fixture teardown
  selftests/landlock: Fix FS tests when run on a private mount point
  selftests/pidfd: Fix config for pidfd_setns_test
8 daysMerge tag 'for-linus-6.9' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-1/+1
Pull kvm fix from Paolo Bonzini: - Fix NULL pointer read on s390 in ioctl(KVM_CHECK_EXTENSION) for /dev/kvm * tag 'for-linus-6.9' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: s390: Check kvm pointer when testing KVM_CAP_S390_HPAGE_1M
8 daysMerge tag 'edac_urgent_for_v6.9' of ↵Linus Torvalds1-13/+37
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC fix from Borislav Petkov:

 - Fix a race condition when clearing error count bits and toggling the error interrupt through the same register, in synopsys_edac

* tag 'edac_urgent_for_v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/synopsys: Fix ECC status and IRQ control race condition
8 dayshwmon: (emc1403) Add support for EMC1428 and EMC1438.Lars Petter Mostad2-14/+124
EMC1428 and EMC1438 are similar to EMC14xx, but have eight temperature channels, as well as signed data and limit registers. Chips currently supported by this driver have unsigned registers only. Signed-off-by: Lars Petter Mostad <larspm@gmail.com> Link: https://lore.kernel.org/r/20240510142824.824332-1-lars.petter.mostad@appear.net Signed-off-by: Guenter Roeck <linux@roeck-us.net>
8 daysMerge tag 'x86_urgent_for_v6.9' of ↵Linus Torvalds3-9/+9
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Add a new PCI ID which belongs to a new AMD CPU family 0x1a

 - Ensure that the last level cache ID is set in all cases, in the AMD CPU topology parsing code, in order to prevent invalid scheduling domain CPU masks

* tag 'x86_urgent_for_v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/topology/amd: Ensure that LLC ID is initialized
  x86/amd_nb: Add new PCI IDs for AMD family 0x1a
8 daysRDMA/cma: Fix kmemleak in rdma_core observed during blktests nvme/rdma use siwZhu Yanjun1-1/+3
When running blktests nvme/rdma, the following kmemleak issue will appear. kmemleak: Kernel memory leak detector initialized (mempool available:36041) kmemleak: Automatic memory scanning thread started kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak) kmemleak: 8 new suspected memory leaks (see /sys/kernel/debug/kmemleak) kmemleak: 17 new suspected memory leaks (see /sys/kernel/debug/kmemleak) kmemleak: 4 new suspected memory leaks (see /sys/kernel/debug/kmemleak) unreferenced object 0xffff88855da53400 (size 192): comm "rdma", pid 10630, jiffies 4296575922 hex dump (first 32 bytes): 37 00 00 00 00 00 00 00 c0 ff ff ff 1f 00 00 00 7............... 10 34 a5 5d 85 88 ff ff 10 34 a5 5d 85 88 ff ff .4.].....4.].... backtrace (crc 47f66721): [<ffffffff911251bd>] kmalloc_trace+0x30d/0x3b0 [<ffffffffc2640ff7>] alloc_gid_entry+0x47/0x380 [ib_core] [<ffffffffc2642206>] add_modify_gid+0x166/0x930 [ib_core] [<ffffffffc2643468>] ib_cache_update.part.0+0x6d8/0x910 [ib_core] [<ffffffffc2644e1a>] ib_cache_setup_one+0x24a/0x350 [ib_core] [<ffffffffc263949e>] ib_register_device+0x9e/0x3a0 [ib_core] [<ffffffffc2a3d389>] 0xffffffffc2a3d389 [<ffffffffc2688cd8>] nldev_newlink+0x2b8/0x520 [ib_core] [<ffffffffc2645fe3>] rdma_nl_rcv_msg+0x2c3/0x520 [ib_core] [<ffffffffc264648c>] rdma_nl_rcv_skb.constprop.0.isra.0+0x23c/0x3a0 [ib_core] [<ffffffff9270e7b5>] netlink_unicast+0x445/0x710 [<ffffffff9270f1f1>] netlink_sendmsg+0x761/0xc40 [<ffffffff9249db29>] __sys_sendto+0x3a9/0x420 [<ffffffff9249dc8c>] __x64_sys_sendto+0xdc/0x1b0 [<ffffffff92db0ad3>] do_syscall_64+0x93/0x180 [<ffffffff92e00126>] entry_SYSCALL_64_after_hwframe+0x71/0x79 The root cause: rdma_put_gid_attr is not called when sgid_attr is set to ERR_PTR(-ENODEV). 
Reported-and-tested-by: Yi Zhang <yi.zhang@redhat.com> Closes: https://lore.kernel.org/all/19bf5745-1b3b-4b8a-81c2-20d945943aaf@linux.dev/T/ Fixes: f8ef1be816bf ("RDMA/cma: Avoid GID lookups on iWARP devices") Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://lore.kernel.org/r/20240510211247.31345-1-yanjun.zhu@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
8 daysALSA: scarlett2: Increase mixer range to +12dBGeoffrey D. Bennett1-4/+5
The values loaded into the mixer are 16-bit values, with 8192 representing 0dB, going up to a current maximum of 16345 (+6dB). All supported interfaces have no problem going up to 32612 (+12dB), so update SCARLETT2_MIXER_MAX_DB and scarlett2_mixer_values[] to allow for this. Tested with: - Scarlett 2nd Gen 6i6, 18i8, 18i20 - Scarlett 3rd Gen 4i4, 8i6, 18i8, 18i20 - Scarlett 4th Gen Solo, 2i2, 4i4 - Clarett+ 2Pre, 4Pre, 8Pre - Vocaster One and Two Signed-off-by: Geoffrey D. Bennett <g@b4.vu> Link: https://lore.kernel.org/r/Zj+gYT4F2XeKTD93@m.b4.vu Signed-off-by: Takashi Iwai <tiwai@suse.de>
8 daysALSA: scarlett2: Add S/PDIF source selection controlsGeoffrey D. Bennett1-0/+179
Add S/PDIF Source/Digital I/O Mode selection controls for the Scarlett 3rd Gen 18i8/18i20 and Clarett 4Pre/8Pre interfaces. These models have both coax S/PDIF and optical inputs, and the optical inputs are switchable between being used as S/PDIF and ADAT inputs. The Scarlett 3rd Gen 18i20 also has a "Dual ADAT" mode for 8-channel audio at 88.2/96kHz. Signed-off-by: Geoffrey D. Bennett <g@b4.vu> Link: https://lore.kernel.org/r/Zj8zCTjzPsTDENN+@m.b4.vu Signed-off-by: Takashi Iwai <tiwai@suse.de>
8 daysRDMA/IPoIB: Fix format truncation compilation errorsLeon Romanovsky1-2/+6
Truncate the device name to store IPoIB VLAN name. [leonro@5b4e8fba4ddd kernel]$ make -s -j 20 allmodconfig [leonro@5b4e8fba4ddd kernel]$ make -s -j 20 W=1 drivers/infiniband/ulp/ipoib/ drivers/infiniband/ulp/ipoib/ipoib_vlan.c: In function ‘ipoib_vlan_add’: drivers/infiniband/ulp/ipoib/ipoib_vlan.c:187:52: error: ‘%04x’ directive output may be truncated writing 4 bytes into a region of size between 0 and 15 [-Werror=format-truncation=] 187 | snprintf(intf_name, sizeof(intf_name), "%s.%04x", | ^~~~ drivers/infiniband/ulp/ipoib/ipoib_vlan.c:187:48: note: directive argument in the range [0, 65535] 187 | snprintf(intf_name, sizeof(intf_name), "%s.%04x", | ^~~~~~~~~ drivers/infiniband/ulp/ipoib/ipoib_vlan.c:187:9: note: ‘snprintf’ output between 6 and 21 bytes into a destination of size 16 187 | snprintf(intf_name, sizeof(intf_name), "%s.%04x", | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 188 | ppriv->dev->name, pkey); | ~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors make[6]: *** [scripts/Makefile.build:244: drivers/infiniband/ulp/ipoib/ipoib_vlan.o] Error 1 make[6]: *** Waiting for unfinished jobs.... Fixes: 9baa0b036410 ("IB/ipoib: Add rtnl_link_ops support") Link: https://lore.kernel.org/r/e9d3e1fef69df4c9beaf402cc3ac342bad680791.1715240029.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
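The arithmetic behind the warning is that a 16-byte destination must hold the parent name, a dot, four hex digits and the terminating NUL, which leaves at most 10 bytes for the name. One way to make that bound explicit is a precision on `%s`, sketched below as a userspace example. The names `IFNAMSIZ_MODEL` and `format_vlan_name()` are hypothetical, and the snippet is not claimed to be the exact change made by this commit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the 16-byte destination size reported by the compiler. */
#define IFNAMSIZ_MODEL 16

/*
 * Builds "<parent>.<pkey>" while capping the parent name at 10 characters:
 * 10 name bytes + '.' + 4 hex digits + NUL = 16 bytes.
 * Returns the number of characters written, excluding the NUL.
 */
static int format_vlan_name(char out[IFNAMSIZ_MODEL], const char *parent,
			    unsigned int pkey)
{
	return snprintf(out, IFNAMSIZ_MODEL, "%.10s.%04x", parent,
			pkey & 0xffffu);
}
```

With the precision in place the compiler can prove the worst-case output length, so the name is truncated deliberately instead of the suffix being cut off.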
8 daysMerge tag 'kvm-x86-misc-6.10' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini7-31/+53
KVM x86 misc changes for 6.10:

 - Advertise the max mappable GPA in the "guest MAXPHYADDR" CPUID field, which is unused by hardware, so that KVM can communicate its inability to map GPAs that set bits 51:48 due to lack of 5-level paging. Guest firmware is expected to use the information to safely remap BARs in the uppermost GPA space, i.e. to avoid placing a BAR at a legal, but unmappable, GPA.

 - Use vfree() instead of kvfree() for allocations that always use vcalloc() or __vcalloc().

 - Don't completely ignore same-value writes to immutable feature MSRs, as doing so results in KVM failing to reject accesses to MSRs that aren't supposed to exist given the vCPU model and/or KVM configuration.

 - Don't mark APICv as being inhibited due to ABSENT if APICv is disabled KVM-wide to avoid confusing debuggers (KVM will never bother clearing the ABSENT inhibit, even if userspace enables in-kernel local APIC).
8 daysMerge tag 'kvm-x86-mmu-6.10' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini2-29/+66
KVM x86 MMU changes for 6.10:

 - Process TDP MMU SPTEs that are zapped while holding mmu_lock for read after replacing REMOVED_SPTE with '0' and flushing remote TLBs, which allows vCPU tasks to repopulate the zapped region while the zapper finishes tearing down the old, defunct page tables.

 - Fix a longstanding, likely benign-in-practice race where KVM could fail to detect a write from kvm_mmu_track_write() to a shadowed GPTE if the GPTE is the first page table being shadowed.
8 daysMerge tag 'kvm-x86-selftests_utils-6.10' of https://github.com/kvm-x86/linux ↵Paolo Bonzini86-447/+1420
into HEAD

KVM selftests treewide updates for 6.10:

 - Define _GNU_SOURCE for all selftests to fix a warning that was introduced by a change to kselftest_harness.h late in the 6.9 cycle, and because forcing every test to #define _GNU_SOURCE is painful.

 - Provide a global pseudo-RNG instance for all tests, so that library code can generate random, but deterministic numbers.

 - Use the global pRNG to randomly force emulation of select writes from guest code on x86, e.g. to help validate KVM's emulation of locked accesses.

 - Rename kvm_util_base.h back to kvm_util.h, as the weird layer of indirection was added purely to avoid manually #including ucall_common.h in a handful of locations.

 - Allocate and initialize x86's GDT, IDT, TSS, segments, and default exception handlers at VM creation, instead of forcing tests to manually trigger the related setup.
8 daysMerge tag 'kvm-x86-vmx-6.10' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini5-16/+34
KVM VMX changes for 6.10: - Clear vmcs.EXIT_QUALIFICATION when synthesizing an EPT Misconfig VM-Exit to L1, as per the SDM. - Move kvm_vcpu_arch's exit_qualification into x86_exception, as the field is used only when synthesizing nested EPT violation, i.e. it's not the vCPU's "real" exit_qualification, which is tracked elsewhere. - Add a sanity check to assert that EPT Violations are the only sources of nested PML Full VM-Exits.
8 daysMerge tag 'kvm-x86-selftests-6.10' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini10-137/+282
KVM selftests cleanups and fixes for 6.10: - Enhance the demand paging test to allow for better reporting and stressing of UFFD performance. - Convert the steal time test to generate TAP-friendly output. - Fix a flaky false positive in the xen_shinfo_test due to comparing elapsed time across two different clock domains. - Skip the MONITOR/MWAIT test if the host doesn't actually support MWAIT. - Avoid unnecessary use of "sudo" in the NX hugepage test to play nice with running in a minimal userspace environment. - Allow skipping the RSEQ test's sanity check that the vCPU was able to complete a reasonable number of KVM_RUNs, as the assert can fail on a completely valid setup. If the test is run on a large-ish system that is otherwise idle, and the test isn't affined to a low-ish number of CPUs, the vCPU task can be repeatedly migrated to CPUs that are in deep sleep states, which results in the vCPU having very little net runtime before the next migration due to high wakeup latencies.
8 daysMerge tag 'kvm-x86-generic-6.10' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini3-22/+9
KVM cleanups for 6.10: - Misc cleanups extracted from the "exit on missing userspace mapping" series, which has been put on hold in anticipation of a "KVM Userfault" approach, which should provide a superset of functionality. - Remove kvm_make_all_cpus_request_except(), which got added to hack around an AVIC bug, and then became dead code when a more robust fix came along. - Fix a goof in the KVM_CREATE_GUEST_MEMFD documentation.
8 daysMerge tag 'kvmarm-6.10-1' of ↵Paolo Bonzini74-1056/+2971
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for Linux 6.10

 - Move a lot of state that was previously stored on a per vcpu basis into a per-CPU area, because it is only pertinent to the host while the vcpu is loaded. This results in better state tracking, and a smaller vcpu structure.

 - Add full handling of the ERET/ERETAA/ERETAB instructions in nested virtualisation. The last two instructions also require emulating part of the pointer authentication extension. As a result, the trap handling of pointer authentication has been greatly simplified.

 - Turn the global (and not very scalable) LPI translation cache into a per-ITS, scalable cache, making non directly injected LPIs much cheaper to make visible to the vcpu.

 - A batch of pKVM patches, mostly fixes and cleanups, as the upstreaming process seems to be resuming. Fingers crossed!

 - Allocate PPIs and SGIs outside of the vcpu structure, allowing for smaller EL2 mapping and some flexibility in implementing more or less than 32 private IRQs.

 - Purge stale mpidr_data if a vcpu is created after the MPIDR map has been created.

 - Preserve vcpu-specific ID registers across a vcpu reset.

 - Various minor cleanups and improvements.
9 daysfs/proc: fix softlockup in __read_vmcoreRik van Riel1-0/+2
While taking a kernel core dump with makedumpfile on a larger system, softlockup messages often appear. While softlockup warnings can be harmless, they can also interfere with things like RCU freeing memory, which can be problematic when the kdump kexec image is configured with as little memory as possible. Avoid the softlockup, and give things like work items and RCU a chance to do their thing during __read_vmcore by adding a cond_resched. Link: https://lkml.kernel.org/r/20240507091858.36ff767f@imladris.surriel.com Signed-off-by: Rik van Riel <riel@surriel.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 daysnilfs2: convert BUG_ON() in nilfs_finish_roll_forward() to WARN_ON()Ryusuke Konishi1-1/+3
The BUG_ON check performed on the return value of __getblk() in nilfs_finish_roll_forward() assumes that a buffer that has been successfully read once is retrieved with the same parameters and does not fail (__getblk() does not return an error due to memory allocation failure). Also, nilfs_finish_roll_forward() is called at most once during mount. Taking these into consideration, rewrite the check to use WARN_ON() to avoid using BUG_ON(). Link: https://lkml.kernel.org/r/20240508221429.7559-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 daysscripts: checkpatch: check unused parameters for function-like macroXining Xu2-0/+20
If function-like macros do not utilize a parameter, it might result in a build warning. In our coding style guidelines, we advocate for utilizing static inline functions to replace such macros. This patch verifies compliance with the new rule. For a macro such as the one below, #define test(a) do { } while (0) The test result is as follows. WARNING: Argument 'a' is not used in function-like macro #21: FILE: mm/init-mm.c:20: +#define test(a) do { } while (0) total: 0 errors, 1 warnings, 8 lines checked Link: https://lkml.kernel.org/r/20240507032757.146386-3-21cnbao@gmail.com Signed-off-by: Xining Xu <mac.xxn@outlook.com> Tested-by: Barry Song <v-songbaohua@oppo.com> Signed-off-by: Barry Song <v-songbaohua@oppo.com> Acked-by: Joe Perches <joe@perches.com> Cc: Chris Zankel <chris@zankel.net> Cc: Huacai Chen <chenhuacai@loongson.cn> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Mark Brown <broonie@kernel.org> Cc: Andy Whitcroft <apw@canonical.com> Cc: Dwaipayan Ray <dwaipayanray1@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Jeff Johnson <quic_jjohnson@quicinc.com> Cc: Charlemagne Lasse <charlemagnelasse@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 daysDocumentation: coding-style: ask function-like macros to evaluate parametersBarry Song1-0/+23
Patch series "codingstyle: avoid unused parameters for a function-like macro", v7. A function-like macro could result in build warnings such as "unused variable." This patchset updates the guidance to recommend always using a static inline function instead and also provides checkpatch support for this new rule. This patch (of 2): Recent commit 77292bb8ca69c80 ("crypto: scomp - remove memcpy if sg_nents is 1 and pages are lowmem") leads to warnings on xtensa and loongarch, In file included from crypto/scompress.c:12: include/crypto/scatterwalk.h: In function 'scatterwalk_pagedone': include/crypto/scatterwalk.h:76:30: warning: variable 'page' set but not used [-Wunused-but-set-variable] 76 | struct page *page; | ^~~~ crypto/scompress.c: In function 'scomp_acomp_comp_decomp': >> crypto/scompress.c:174:38: warning: unused variable 'dst_page' [-Wunused-variable] 174 | struct page *dst_page = sg_page(req->dst); | The reason is that flush_dcache_page() is implemented as a noop macro on these platforms as below, #define flush_dcache_page(page) do { } while (0) The driver code, by itself, seems to be quite innocent and placing maybe_unused seems pointless, struct page *dst_page = sg_page(req->dst); for (i = 0; i < nr_pages; i++) flush_dcache_page(dst_page + i); And it should be independent of architectural implementation differences. Let's provide guidance on coding style for requesting parameter evaluation or proposing the migration to a static inline function.
Link: https://lkml.kernel.org/r/20240507032757.146386-1-21cnbao@gmail.com Link: https://lkml.kernel.org/r/20240507032757.146386-2-21cnbao@gmail.com Signed-off-by: Barry Song <v-songbaohua@oppo.com> Suggested-by: Max Filippov <jcmvbkbc@gmail.com> Reviewed-by: Mark Brown <broonie@kernel.org> Acked-by: Joe Perches <joe@perches.com> Cc: Chris Zankel <chris@zankel.net> Cc: Huacai Chen <chenhuacai@loongson.cn> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Andy Whitcroft <apw@canonical.com> Cc: Dwaipayan Ray <dwaipayanray1@gmail.com> Cc: Joe Perches <joe@perches.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: Xining Xu <mac.xxn@outlook.com> Cc: Charlemagne Lasse <charlemagnelasse@gmail.com> Cc: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
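The difference between the three noop forms discussed above can be sketched in plain C. This is only an illustration: struct page is a stand-in type, and the _macro, _eval and _inline suffixes and the flush_all() helper are hypothetical names, not kernel symbols.

```c
#include <stddef.h>

/* Stand-in type for illustration only; not the kernel's struct page. */
struct page { int dirty; };

/* Noop macro that drops its argument: a caller's variable may look unused. */
#define flush_dcache_page_macro(page) do { } while (0)

/* Noop macro that still evaluates its argument, so the variable counts as used. */
#define flush_dcache_page_eval(page) do { (void)(page); } while (0)

/* Static inline noop: the argument is type-checked and counts as used. */
static inline void flush_dcache_page_inline(struct page *page)
{
	(void)page;
}

/* Hypothetical caller mirroring the loop quoted in the commit message. */
static inline int flush_all(struct page *pages, size_t nr_pages)
{
	struct page *dst_page = pages;
	size_t i;

	for (i = 0; i < nr_pages; i++)
		flush_dcache_page_inline(dst_page + i);

	return (int)i; /* number of pages visited */
}
```

With the first macro, dst_page in such a caller would be flagged as unused on architectures where the flush is a noop; the other two forms keep the caller warning-free without annotations.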
9 daysnilfs2: use __field_struct() for a bitwise fieldBart Van Assche1-1/+5
As one can see in include/trace/stages/stage4_event_fields.h, the implementation of __field() uses the is_signed_type() macro. As one can see in commit dcf8e5633e2e ("tracing: Define the is_signed_type() macro once"), there has been an attempt to not make is_signed_type() trigger sparse warnings for bitwise types. Despite that change, sparse complains when passing a bitwise type to is_signed_type(). The reason is that in its definition below, an inequality comparison will be made against bitwise types, which are random collections of bits (the casts to bitwise types themselves are semantically valid and not problematic): #define is_signed_type(type) (((type)(-1)) < (__force type)1) So, as a workaround, follow the example of <trace/events/initcall.h> and suppress the following sparse warnings by changing __field() into __field_struct() that doesn't use is_signed_type(): fs/nilfs2/segment.c: note: in included file (through include/trace/trace_events.h, include/trace/define_trace.h, include/trace/events/nilfs2.h): ./include/trace/events/nilfs2.h:191:1: warning: cast to restricted blk_opf_t ./include/trace/events/nilfs2.h:191:1: warning: restricted blk_opf_t degrades to integer ./include/trace/events/nilfs2.h:191:1: warning: restricted blk_opf_t degrades to integer [konishi.ryusuke: describe the reason for the warnings per Linus's explanation] Link: https://lkml.kernel.org/r/20240507222041.4876-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20240507142454.3344-1-konishi.ryusuke@gmail.com Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202401092241.I4mm9OWl-lkp@intel.com/ Reported-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Closes: https://lore.kernel.org/all/20240430080019.4242-2-konishi.ryusuke@gmail.com/ Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
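For ordinary integer types, the comparison inside is_signed_type() behaves as in the sketch below. It assumes a userspace build, so the sparse-only __force annotation is dropped; the two wrapper functions are hypothetical and exist only to make the result easy to check.

```c
#include <stdbool.h>

/*
 * Simplified userspace copy of the macro quoted above. The kernel version
 * adds a sparse-only __force annotation to the cast of 1.
 */
#define is_signed_type(type) (((type)(-1)) < (type)1)

/* -1 stays negative when cast to a signed type, so the comparison is true. */
static inline bool int_is_signed(void)
{
	return is_signed_type(int);
}

/* Cast to an unsigned type, -1 becomes the maximum value, so it is false. */
static inline bool uint_is_signed(void)
{
	return is_signed_type(unsigned int);
}
```

The "less than" comparison is meaningful for arithmetic types, which is why sparse objects when the same expression is applied to a bitwise type such as blk_opf_t.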
9 daysselftests/kcmp: remove unused open modeEdward Liaw1-1/+1
Android bionic warns that open modes are ignored if O_CREAT or O_TMPFILE aren't specified. The permissions for the file are set above: fd1 = open(kpath, O_RDWR | O_CREAT | O_TRUNC, 0644); Link: https://lkml.kernel.org/r/20240429234610.191144-1-edliaw@google.com Fixes: d97b46a64674 ("syscalls, x86: add __NR_kcmp syscall") Signed-off-by: Edward Liaw <edliaw@google.com> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
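A minimal sketch of the open() rule behind this warning, assuming a POSIX system: the mode argument is only consulted when O_CREAT or O_TMPFILE is among the flags. The helper name and any path passed to it are hypothetical and not taken from the kcmp selftest.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a file with an explicit mode, then reopen it
 * without one. Returns 0 on success and -1 on failure.
 */
static inline int create_then_reopen(const char *path)
{
	int fd1, fd2;

	/* O_CREAT is present, so the third argument sets the permissions. */
	fd1 = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (fd1 < 0)
		return -1;

	/* No O_CREAT or O_TMPFILE: a mode argument here would be ignored. */
	fd2 = open(path, O_RDWR);
	if (fd2 < 0) {
		close(fd1);
		unlink(path);
		return -1;
	}

	close(fd2);
	close(fd1);
	unlink(path);
	return 0;
}
```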
9 daysnilfs2: remove calls to folio_set_error() and folio_clear_error()Matthew Wilcox (Oracle)2-8/+1
Nobody checks this flag on nilfs2 folios, stop setting and clearing it. That lets us simplify nilfs_end_folio_io() slightly. Link: https://lkml.kernel.org/r/20240420025029.2166544-17-willy@infradead.org Link: https://lkml.kernel.org/r/20240430050901.3239-1-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: kernel test robot <lkp@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 daysmemcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_orderXiu Jianfeng2-4/+0
Since commit 857f21397f71 ("memcg, oom: remove unnecessary check in mem_cgroup_oom_synchronize()"), memcg_oom_gfp_mask and memcg_oom_order are no longer used any more. Link: https://lkml.kernel.org/r/20240509032628.1217652-1-xiujianfeng@huawei.com Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Benjamin Segall <bsegall@google.com> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 daysselftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtimeDev Jain1-7/+9
Currently, the size used in mmap() is statically defined, leading to skipping of the test on a hugepage size other than 2 MB, since munmap() won't free the hugepage for a size greater than 2 MB. Hence, query the size at runtime. Also, there is no reason why a hugepage allocation should fail, since we are using a simple mmap() using MAP_HUGETLB; hence, instead of skipping the test, make it fail. Link: https://lkml.kernel.org/r/20240509095447.3791573-1-dev.jain@arm.com Signed-off-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 daysmm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wpOscar Salvador1-1/+1
commit 1cb9dc4b475c ("mm: hwpoison: support recovery from HugePage copy-on-write faults") added support to use the mc variants when copying hugetlb pages on CoW faults. Add the missing VM_FAULT_SET_HINDEX, so the right si_addr_lsb will be passed to userspace to report the extension of the faulty area. Link: https://lkml.kernel.org/r/20240509100148.22384-3-osalvador@suse.de Signed-off-by: Oscar Salvador <osalvador@suse.de> Acked-by: Peter Xu <peterx@redhat.com> Acked-by: Axel Rasmussen <axelrasmussen@google.com> Cc: Liu Shixin <liushixin2@huawei.com> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 daysmm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_faultOscar Salvador1-1/+2
Patch series "Minor fixups for hugetlb fault path". This series contains a couple of fixups for hugetlb_fault and hugetlb_wp respectively, where a VM_FAULT_SET_HINDEX call was missing. I did not bother with a Fixes tag because the missing piece here is that we will not report to userspace the right extension of the faulty area by adjusting struct kernel_siginfo.si_addr_lsb, but I do not consider that to be a big issue because I assume that userspace already knows the size of the mapping anyway. This patch (of 2): commit af19487f00f3 ("mm: make PTE_MARKER_SWAPIN_ERROR more general") added the code to handle pte_markers in hugetlb faulting path. In case of an UFFD_POISON event, a PTE_MARKER_POISONED will be created and we will return VM_FAULT_HWPOISON_LARGE upon detecting that in the fault path. Add the missing VM_FAULT_SET_HINDEX, so the right si_addr_lsb will be passed to userspace to report the extension of the faulty area. Link: https://lkml.kernel.org/r/20240509100148.22384-1-osalvador@suse.de Link: https://lkml.kernel.org/r/20240509100148.22384-2-osalvador@suse.de Signed-off-by: Oscar Salvador <osalvador@suse.de> Acked-by: Peter Xu <peterx@redhat.com> Acked-by: Axel Rasmussen <axelrasmussen@google.com> Cc: Liu Shixin <liushixin2@huawei.com> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 daysselftests: cgroup: add tests to verify the zswap writeback pathUsama Arif1-1/+129
Attempt writeback with the below steps and check using memory.stat.zswpwb if zswap writeback occurred: 1. Allocate memory. 2. Reclaim memory equal to the amount that was allocated in step 1. This will move it into zswap. 3. Save current zswap usage. 4. Move the memory allocated in step 1 back in from zswap. 5. Set zswap.max to half the amount that was recorded in step 3. 6. Attempt to reclaim memory equal to the amount that was allocated, this will either trigger writeback if it's enabled, or reclamation will fail if writeback is disabled as there isn't enough zswap space. Link: https://lkml.kernel.org/r/20240508171359.1545744-1-usamaarif642@gmail.com Signed-off-by: Usama Arif <usamaarif642@gmail.com> Suggested-by: Nhat Pham <nphamcs@gmail.com> Acked-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Nhat Pham <nphamcs@gmail.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 daysmm: memcg: make alloc_mem_cgroup_per_node_info() return boolXiu Jianfeng1-5/+5
alloc_mem_cgroup_per_node_info() returns int that doesn't map to any errno error code. The only existing caller doesn't really need an error code so change the function to return bool (true on success) because this is slightly less confusing and more consistent with the other code. Link: https://lkml.kernel.org/r/20240507132324.1158510-1-xiujianfeng@huawei.com Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 daysmm/damon/core: fix return value from damos_wmark_metric_valueAlex Rusuf1-4/+5
damos_wmark_metric_value's return value is 'unsigned long', so returning -EINVAL as 'unsigned long' may turn out to be very different from the expected one (using 2's complement) and be treated as a usual metric's value. So, fix that by checking if the returned value is not 0. Link: https://lkml.kernel.org/r/20240506180238.53842-1-sj@kernel.org Fixes: ee801b7dd782 ("mm/damon/schemes: activate schemes based on a watermarks mechanism") Signed-off-by: Alex Rusuf <yorha.op@gmail.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
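The pitfall can be reproduced in a few lines of userspace C. Both function names and the metric numbering are hypothetical, and the second function is just one common way to separate the status from the value; it is not claimed to match the actual DAMON change.

```c
#include <errno.h>

/*
 * Problematic pattern: -EINVAL is converted to unsigned long and becomes a
 * huge positive number that looks like a valid metric value to the caller.
 */
static inline unsigned long metric_value_buggy(int metric)
{
	if (metric != 1)	/* only metric 1 is "known" in this sketch */
		return -EINVAL;
	return 42;
}

/*
 * One possible fix: return the status as an int and pass the value back
 * through an out-parameter, so the caller checks for a non-zero return.
 */
static inline int metric_value(int metric, unsigned long *value)
{
	if (metric != 1)
		return -EINVAL;
	*value = 42;
	return 0;
}
```

A caller comparing the result of metric_value_buggy() against watermarks would silently treat the wrapped error as a very large metric, whereas the second form makes the failure explicit.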
9 daysmm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPEDYosry Ahmed1-6/+9
Previously, all NR_VM_EVENT_ITEMS stats were maintained per-memcg, although some of those fields are not exposed anywhere. Commit 14e0f6c957e39 ("memcg: reduce memory for the lruvec and memcg stats") changed this such that we only maintain the stats we actually expose per-memcg via a translation table. Additionally, commit 514462bbe927b ("memcg: warn for unexpected events and stats") added a warning if a per-memcg stat update is attempted for a stat that is not in the translation table. The warning started firing for the NR_{FILE/SHMEM}_PMDMAPPED stat updates in the rmap code. These stats are not maintained per-memcg, and hence are not in the translation table. Do not use __lruvec_stat_mod_folio() when updating NR_FILE_PMDMAPPED and NR_SHMEM_PMDMAPPED. Use __mod_node_page_state() instead, which updates the global per-node stats only. Link: https://lkml.kernel.org/r/20240506192924.271999-1-yosryahmed@google.com Fixes: 514462bbe927 ("memcg: warn for unexpected events and stats") Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Reported-by: syzbot+9319a4268a640e26b72b@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/0000000000001b9d500617c8b23c@google.com Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 daysselftests: cgroup: remove redundant enabling of memory controllerUsama Arif1-2/+0
Memory controller is already enabled in main which invokes the test, hence this does not need to be done in test_no_kmem_bypass. Link: https://lkml.kernel.org/r/20240502200529.4193651-2-usamaarif642@gmail.com Signed-off-by: Usama Arif <usamaarif642@gmail.com> Acked-by: Yosry Ahmed <yosryahmed@google.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Nhat Pham <nphamcs@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 daysDocs/mm/damon/maintainer-profile: allow posting patches based on damon/next treeSeongJae Park1-3/+4
The document mentions any patches for review should be based on mm-unstable instead of damon/next. It should be the recommended process, but sometimes patches based on damon/next could be posted for some reasons. Actually, the DAMON-based tiered memory management patchset[1] was written on top of 'young page' DAMOS filter patchset, which was in damon/next tree as of the writing. Allow such case and just ask such things to be clearly specified. [1] https://lore.kernel.org/20240405060858.2818-1-honggyu.kim@sk.com Link: https://lkml.kernel.org/r/20240503180318.72798-11-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 daysDocs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PTSeongJae Park1-3/+3
The document says the maintainer is working on only PST. The maintainer respects daylight saving system, though. Update the time zone to PT. Link: https://lkml.kernel.org/r/20240503180318.72798-10-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 daysDocs/mm/damon/design: use a list for supported filtersSeongJae Park1-20/+26
Filters section is listing currently supported filter types in a normal paragraph. Since the number of types is higher than four, it is not easy to read for only specific types. Use a list for easier finding of specific types. [sj@kernel.org: fix build warning] Link: https://lkml.kernel.org/r/20240507161747.52430-1-sj@kernel.org Link: https://lkml.kernel.org/r/20240503180318.72798-9-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 daysDocs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update commandSeongJae Park1-2/+2
To update effective size quota of DAMOS schemes on DAMON sysfs file interface, user should write 'update_schemes_effective_quotas' to the kdamond 'state' file. But the document is mistakenly saying the input string as 'update_schemes_effective_bytes'. Fix it (s/bytes/quotas/). Link: https://lkml.kernel.org/r/20240503180318.72798-8-sj@kernel.org Fixes: a6068d6dfa2f ("Docs/admin-guide/mm/damon/usage: document effective_bytes file") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> [6.9.x] Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 daysDocs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs fileSeongJae Park1-1/+1
The example usage of DAMOS filter sysfs files, specifically the part of 'matching' file writing for memcg type filter, is wrong. The intention is to exclude pages of a memcg that is already getting enough care from a given scheme, but the example is setting the filter to apply the scheme to only the pages of the memcg. Fix it. Link: https://lkml.kernel.org/r/20240503180318.72798-7-sj@kernel.org Fixes: 9b7f9322a530 ("Docs/admin-guide/mm/damon/usage: document DAMOS filters of sysfs") Closes: https://lore.kernel.org/r/20240317191358.97578-1-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> [6.3.x] Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 daysselftests/damon: classify tests for functionalities and regressionsSeongJae Park1-4/+9
DAMON selftests can be classified into two categories: functionalities and regressions. Functionality tests are for checking if the function is working as specified, while the regression tests are basically reproducers of previously reported and fixed bugs. The tests of the categories are mixed in the selftests Makefile. Separate those for easier understanding of the types of tests. Link: https://lkml.kernel.org/r/20240503180318.72798-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 daysselftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'SeongJae Park1-40/+40
_damon_sysfs.py is using '==' or '!=' for 'None'. Since 'None' is a singleton, using 'is' or 'is not' is more efficient. Use the more efficient one. Link: https://lkml.kernel.org/r/20240503180318.72798-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 daysselftests/damon/_damon_sysfs: find sysfs mount point from /proc/mountsSeongJae Park1-1/+12
_damon_sysfs.py assumes sysfs is mounted at /sys. In some systems, that might not be true. Find the mount point from /proc/mounts file content. Link: https://lkml.kernel.org/r/20240503180318.72798-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
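The lookup described here amounts to scanning /proc/mounts-style text for an entry whose filesystem type is sysfs. The selftest wrapper itself is a Python script; the sketch below shows the same idea in C, with a hypothetical function name and fixed buffer sizes chosen only for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan /proc/mounts-style text, where each line is
 * "device mountpoint fstype options dump pass", and copy the mount point of
 * the first sysfs entry into out. Returns 0 if found, -1 otherwise.
 */
static inline int find_sysfs_mount(FILE *mounts, char *out, size_t out_len)
{
	char line[512], dev[128], dir[256], type[64];

	while (fgets(line, sizeof(line), mounts)) {
		/* Skip lines that do not have at least three fields. */
		if (sscanf(line, "%127s %255s %63s", dev, dir, type) != 3)
			continue;
		if (strcmp(type, "sysfs") == 0) {
			snprintf(out, out_len, "%s", dir);
			return 0;
		}
	}
	return -1;
}
```

In real use the stream would come from fopen("/proc/mounts", "r"); taking a FILE pointer keeps the parsing testable against canned input.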
9 daysselftests/damon/_damon_sysfs: check errors from nr_schemes file readsSeongJae Park1-0/+2
DAMON context staging method in _damon_sysfs.py is not checking the returned error from nr_schemes file read. Check it. Link: https://lkml.kernel.org/r/20240503180318.72798-3-sj@kernel.org Fixes: f5f0e5a2bef9 ("selftests/damon/_damon_sysfs: implement kdamonds start function") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 daysmm/damon/core: initialize ->esz_bp from damos_quota_init_priv()SeongJae Park1-0/+1
Patch series "mm/damon: misc fixes and improvements". Add miscellaneous and non-urgent fixes and improvements for DAMON code, selftests, and documents. This patch (of 10): damos_quota_init_priv() function should initialize all private fields of struct damos_quota. However, it is not initializing ->esz_bp field. This could result in use of uninitialized variable from damon_feed_loop_next_input() function. There is no such issue at the moment because every caller of the function is passing a damos_quota object that already has the field set to zero. But we cannot guarantee the future, and the function is not doing what it is promising. A bug is a bug. This fix is for preventing possible future issues. Link: https://lkml.kernel.org/r/20240503180318.72798-1-sj@kernel.org Link: https://lkml.kernel.org/r/20240503180318.72798-2-sj@kernel.org Fixes: 9294a037c015 ("mm/damon/core: implement goal-oriented feedback-driven quota auto-tuning") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 daysselftests/damon: add a test for DAMOS quota goalSeongJae Park2-1/+78
Add a selftest for DAMOS quota goal. It tests the feature by setting a user_input metric based goal, changing the current feedback, and checking if the effective quota size is increased and decreased as expected. Link: https://lkml.kernel.org/r/20240502172718.74166-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 daysselftests/damon/_damon_sysfs: support quota goalsSeongJae Park1-1/+83
Patch series "selftests/damon: add DAMOS quota goal test". Extend DAMON selftest-purpose sysfs wrapper to support DAMOS quota goal, and implement a simple selftest for the feature using it. This patch (of 2): The DAMON sysfs test purpose wrapper, _damon_sysfs.py, is not supporting quota goals. Implement the support for testing the feature. The test will be implemented and added by the following commit. Link: https://lkml.kernel.org/r/20240502172718.74166-1-sj@kernel.org Link: https://lkml.kernel.org/r/20240502172718.74166-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 daysselftests/harness: Handle TEST_F()'s explicit exit codesMickaël Salaün1-1/+5
If TEST_F() explicitly calls exit(code) with code different than 0, then _metadata->exit_code is set to this code (e.g. KVM_ONE_VCPU_TEST()). We need to keep in mind that _metadata->exit_code can be KSFT_SKIP while the process exit code is 0. Cc: Jakub Kicinski <kuba@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Brown <broonie@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Will Drewry <wad@chromium.org> Reported-by: Sean Christopherson <seanjc@google.com> Tested-by: Sean Christopherson <seanjc@google.com> Closes: https://lore.kernel.org/r/ZjPelW6-AbtYvslu@google.com Fixes: 0710a1a73fb4 ("selftests/harness: Merge TEST_F_FORK() into TEST_F()") Link: https://lore.kernel.org/r/20240511171445.904356-11-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
9 daysselftests/harness: Fix vfork() side effectsMickaël Salaün2-25/+57
Setting the time namespace with CLONE_NEWTIME returns -EUSERS if the calling thread shares memory with another thread (because of the shared vDSO), which is the case when it is created with vfork(). Fix pidfd_setns_test by replacing test harness's vfork() call with a clone3() call with CLONE_VFORK, and an explicit sharing of the _metadata and self objects. Replace _metadata->teardown_parent with a new FIXTURE_TEARDOWN_PARENT() helper that can replace FIXTURE_TEARDOWN(). This is a cleaner approach and it makes it possible to selectively share the fixture data between the child process running tests and the parent process running the fixture teardown. This also avoids updating several tests to not rely on the self object's copy-on-write property (e.g. storing the returned value of a fork() call). Cc: Christian Brauner <brauner@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: Günther Noack <gnoack@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Will Drewry <wad@chromium.org> Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202403291015.1fcfa957-oliver.sang@intel.com Fixes: 0710a1a73fb4 ("selftests/harness: Merge TEST_F_FORK() into TEST_F()") Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240511171445.904356-10-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
9 daysselftests/harness: Share _metadata between forked processesMickaël Salaün1-11/+15
Unconditionally share _metadata between all forked processes, which makes it possible to actually catch errors which were previously ignored. This is required for a following commit replacing vfork() with clone3() and CLONE_VFORK (i.e. not sharing the full memory). It should also be useful to share _metadata to extend expectations to test process's forks. For instance, this change identified a wrong expectation in pidfd_setns_test. Because this _metadata is used by the new XFAIL_ADD(), use a global pointer initialized in TEST_F(). This is OK because only XFAIL_ADD() uses it, and XFAIL_ADD() already depends on TEST_F(). Cc: Jakub Kicinski <kuba@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Will Drewry <wad@chromium.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240511171445.904356-9-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
9 daysselftests/pidfd: Fix wrong expectationMickaël Salaün1-1/+1
Replace a wrong EXPECT_GT(self->child_pid_exited, 0) with EXPECT_GE(), which will be actually tested on the parent and child sides with a following commit. Cc: Shuah Khan <skhan@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20240511171445.904356-8-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
9 daysselftests/harness: Constify fixture variantsMickaël Salaün1-2/+2
FIXTURE_VARIANT_ADD() types are passed as const pointers to FIXTURE_TEARDOWN(). Make that explicit by constifying the variants declarations. Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Will Drewry <wad@chromium.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240511171445.904356-7-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
9 daysselftests/landlock: Do not allocate memory in fixture dataMickaël Salaün1-22/+35
Do not allocate self->dir_path in the test process because this would not be visible in the FIXTURE_TEARDOWN() process when relying on fork()/clone3() instead of vfork(). This change is required for a following commit removing vfork() call to not break the layout3_fs.* test cases. Cc: Günther Noack <gnoack@google.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240511171445.904356-6-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
9 daysselftests/harness: Fix interleaved scheduling leading to race conditionsMickaël Salaün1-1/+14
Fix a race condition when running several FIXTURE_TEARDOWN() managing the same resource. This fixes a race condition in the Landlock file system tests when creating or unmounting the same directory. Using clone3() with CLONE_VFORK guarantees that the child and grandchild test processes are sequentially scheduled. This is implemented with a new clone3_vfork() helper replacing the fork() call. This avoids triggering this error in __wait_for_test(): Test ended in some other way [127] Cc: Christian Brauner <brauner@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: Günther Noack <gnoack@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Will Drewry <wad@chromium.org> Fixes: 41cca0542d7c ("selftests/harness: Fix TEST_F()'s vfork handling") Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240511171445.904356-5-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
9 daysselftests/harness: Fix fixture teardownMickaël Salaün1-5/+9
Make sure fixture teardowns are run when test cases failed, including when _metadata->teardown_parent is set to true. Make sure only one fixture teardown is run per test case, handling the case where the test child forks. Cc: Jakub Kicinski <kuba@kernel.org> Cc: Shengyu Li <shengyu.li.evgeny@gmail.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Fixes: 72d7cb5c190b ("selftests/harness: Prevent infinite loop due to Assert in FIXTURE_TEARDOWN") Fixes: 0710a1a73fb4 ("selftests/harness: Merge TEST_F_FORK() into TEST_F()") Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240511171445.904356-4-mic@digikod.net Rule: add Link: https://lore.kernel.org/stable/20240506165518.474504-4-mic%40digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
9 daysselftests/landlock: Fix FS tests when run on a private mount pointMickaël Salaün1-1/+9
According to the test environment, the mount point of the test's working directory may be shared or not, which changes the visibility of the nested "tmp" mount point for the test's parent process calling umount("tmp"). This was spotted while running tests in containers [1], where mount points are private. Cc: Günther Noack <gnoack@google.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Link: https://github.com/landlock-lsm/landlock-test-tools/pull/4 [1] Fixes: 41cca0542d7c ("selftests/harness: Fix TEST_F()'s vfork handling") Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240511171445.904356-3-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
9 daysselftests/pidfd: Fix config for pidfd_setns_testMickaël Salaün1-0/+2
Required by switch_timens() to open /proc/self/ns/time_for_children. CONFIG_GENERIC_VDSO_TIME_NS is not available on UML, so pidfd_setns_test cannot be run successfully on this architecture. Cc: Shuah Khan <skhan@linuxfoundation.org> Fixes: 2b40c5db73e2 ("selftests/pidfd: add pidfd setns tests") Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20240511171445.904356-2-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
9 daysMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next into net-accept-moreJens Axboe1760-33155/+78281
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1557 commits) net: qede: use extack in qede_parse_actions() net: qede: propagate extack through qede_flow_spec_validate() net: qede: use faked extack in qede_flow_spec_to_rule() net: qede: use extack in qede_parse_flow_attr() net: qede: add extack in qede_add_tc_flower_fltr() net: qede: use extack in qede_flow_parse_udp_v4() net: qede: use extack in qede_flow_parse_udp_v6() net: qede: use extack in qede_flow_parse_tcp_v4() net: qede: use extack in qede_flow_parse_tcp_v6() net: qede: use extack in qede_flow_parse_v4_common() net: qede: use extack in qede_flow_parse_v6_common() net: qede: use extack in qede_set_v4_tuple_to_profile() net: qede: use extack in qede_set_v6_tuple_to_profile() net: qede: use extack in qede_flow_parse_ports() net: usb: smsc95xx: stop lying about skb->truesize net: dsa: microchip: Fix spellig mistake "configur" -> "configure" af_unix: Add dead flag to struct scm_fp_list. net: ethernet: adi: adin1110: Replace linux/gpio.h by proper one octeontx2-pf: Reuse Transmit queue/Send queue index of HTB class gve: Use ethtool_sprintf/puts() to fill stats strings ...
9 daysMerge branch 'for-6.10/io_uring' into net-accept-moreJens Axboe51-1762/+2050
* for-6.10/io_uring: (97 commits) io_uring: support to inject result for NOP io_uring: fail NOP if non-zero op flags is passed in io_uring/net: add IORING_ACCEPT_POLL_FIRST flag io_uring/net: add IORING_ACCEPT_DONTWAIT flag io_uring/filetable: don't unnecessarily clear/reset bitmap io_uring/io-wq: Use set_bit() and test_bit() at worker->flags io_uring/msg_ring: cleanup posting to IOPOLL vs !IOPOLL ring io_uring: Require zeroed sqe->len on provided-buffers send io_uring/notif: disable LAZY_WAKE for linked notifs io_uring/net: fix sendzc lazy wake polling io_uring/msg_ring: reuse ctx->submitter_task read using READ_ONCE instead of re-reading it io_uring/rw: reinstate thread check for retries io_uring/notif: implement notification stacking io_uring/notif: simplify io_notif_flush() net: add callback for setting a ubuf_info to skb net: extend ubuf_info callback to ops structure io_uring/net: support bundles for recv io_uring/net: support bundles for send io_uring/kbuf: add helpers for getting/peeking multiple buffers io_uring/net: add provided buffer support for IORING_OP_SEND ...
9 dayscsky: Emulate one-byte cmpxchgPaul E. McKenney2-0/+11
Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on csky. [ paulmck: Apply kernel test robot feedback. ] [ paulmck: Drop two-byte support per Arnd Bergmann feedback. ] Co-developed-by: Yujie Liu <yujie.liu@intel.com> Signed-off-by: Yujie Liu <yujie.liu@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Yujie Liu <yujie.liu@intel.com> Reviewed-by: Guo Ren <guoren@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: <linux-csky@vger.kernel.org>
9 daysMerge branch 'acpica'Rafael J. Wysocki9-47/+314
Merge ACPICA material for v6.10. This is mostly new material included in the 20240322 upstream ACPICA release. - Disable -Wstringop-truncation for some ACPICA code in the kernel to avoid a compiler warning that is not very useful (Arnd Bergmann). - Add EINJ CXL error types to actbl1.h (Ben Cheatham). - Add support for RAS2 table to ACPICA (Shiju Jose). - Fix various spelling mistakes in text files and code comments in ACPICA (Colin Ian King). - Fix spelling and typos in ACPICA (Saket Dumbre). - Modify ACPI_OBJECT_COMMON_HEADER (lijun). - Add RISC-V RINTC affinity structure support to ACPICA (Haibo Xu). - Fix CXL 3.0 structure (RDPAS) in the CEDT table (Hojin Nam). - Add missing increment of registered GPE count to ACPICA (Daniil Tatianin). - Mark new ACPICA release 20240322 (Saket Dumbre). - Add support for the AEST V2 table to ACPICA (Ruidong Tian). * acpica: ACPICA: AEST: Add support for the AEST V2 table ACPICA: Update acpixf.h for new ACPICA release 20240322 ACPICA: events/evgpeinit: don't forget to increment registered GPE count ACPICA: Fix CXL 3.0 structure (RDPAS) in the CEDT table ACPICA: SRAT: Add dump and compiler support for RINTC affinity structure ACPICA: SRAT: Add RISC-V RINTC affinity structure ACPICA: Modify ACPI_OBJECT_COMMON_HEADER ACPICA: Fix spelling and typos ACPICA: Clean up the fix for Issue #900 ACPICA: Fix various spelling mistakes in text files and code comments ACPICA: Attempt 1 to fix issue #900 ACPICA: ACPI 6.5: RAS2: Add support for RAS2 table ACPICA: actbl1.h: Add EINJ CXL error types ACPI: disable -Wstringop-truncation
9 dayswatchdog: LENOVO_SE10_WDT should depend on X86 && DMIGeert Uytterhoeven1-8/+9
The Lenovo SE10 watchdog is only present on Lenovo ThinkEdge SE10 platforms, which are based on Intel Atom SoCs, and its driver relies on DMI tables. Hence add dependencies on X86 && DMI, to prevent asking the user about this driver when configuring a kernel without Intel Atom or DMI support. While at it, fix the odd indentation (spaces instead of TABs). Fixes: 1f6602c8ed1eccac ("watchdog: lenovo_se10_wdt: Watchdog driver for Lenovo SE10 platform") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/58005595a05ef803b454b78d3ae9b8ee0675bd5d.1715076440.git.geert+renesas@glider.be Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
10 daysMerge branch 'mlx5-misc-fixes'Jakub Kicinski9-51/+79
Tariq Toukan says: ==================== mlx5 misc fixes This patchset provides bug fixes to mlx5 driver. Patch 1 by Shay fixes the error flow in mlx5e_suspend(). Patch 2 by Shay aligns the peer devlink set logic with the register devlink flow. Patch 3 by Maher solves a deadlock in lag enable/disable. Patches 4 and 5 by Akiva address issues in command interface corner cases. Series generated against: commit 393ceeb9211e ("Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'") ==================== Link: https://lore.kernel.org/r/20240509112951.590184-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 daysnet/mlx5: Discard command completions in internal errorAkiva Goldberger1-0/+3
Fix use after free when FW completion arrives while device is in internal error state. Avoid calling completion handler in this case, since the device will flush the command interface and trigger all completions manually. Kernel log: ------------[ cut here ]------------ refcount_t: underflow; use-after-free. ... RIP: 0010:refcount_warn_saturate+0xd8/0xe0 ... Call Trace: <IRQ> ? __warn+0x79/0x120 ? refcount_warn_saturate+0xd8/0xe0 ? report_bug+0x17c/0x190 ? handle_bug+0x3c/0x60 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? refcount_warn_saturate+0xd8/0xe0 cmd_ent_put+0x13b/0x160 [mlx5_core] mlx5_cmd_comp_handler+0x5f9/0x670 [mlx5_core] cmd_comp_notifier+0x1f/0x30 [mlx5_core] notifier_call_chain+0x35/0xb0 atomic_notifier_call_chain+0x16/0x20 mlx5_eq_async_int+0xf6/0x290 [mlx5_core] notifier_call_chain+0x35/0xb0 atomic_notifier_call_chain+0x16/0x20 irq_int_handler+0x19/0x30 [mlx5_core] __handle_irq_event_percpu+0x4b/0x160 handle_irq_event+0x2e/0x80 handle_edge_irq+0x98/0x230 __common_interrupt+0x3b/0xa0 common_interrupt+0x7b/0xa0 </IRQ> <TASK> asm_common_interrupt+0x22/0x40 Fixes: 51d138c2610a ("net/mlx5: Fix health error state handling") Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240509112951.590184-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 daysnet/mlx5: Add a timeout to acquire the command queue semaphoreAkiva Goldberger2-9/+33
Prevent forced completion handling on an entry that has not yet been assigned an index, causing an out of bounds access on idx = -22. Instead of waiting indefinitely for the sem, blocking flow now waits for index to be allocated or a sem acquisition timeout before beginning the timer for FW completion. Kernel log example: mlx5_core 0000:06:00.0: wait_func_handle_exec_timeout:1128:(pid 185911): cmd[-22]: CREATE_UCTX(0xa04) No done completion Fixes: 8e715cd613a1 ("net/mlx5: Set command entry semaphore up once got index free") Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240509112951.590184-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
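The idea of bounding a semaphore wait, rather than blocking indefinitely, can be sketched in userspace C. This is only an analogy, assuming POSIX semaphores; it is not the driver's code, and `acquire_slot_timeout()` is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <time.h>

/*
 * Hypothetical helper: try to take one slot from a counting semaphore,
 * giving up after timeout_ms milliseconds.
 * Returns 0 on success or a negative errno value (-ETIMEDOUT on timeout).
 */
static int acquire_slot_timeout(sem_t *sem, long timeout_ms)
{
    struct timespec deadline;

    assert(timeout_ms >= 0);

    /* sem_timedwait() takes an absolute CLOCK_REALTIME deadline. */
    if (clock_gettime(CLOCK_REALTIME, &deadline) != 0)
        return -errno;

    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    while (sem_timedwait(sem, &deadline) != 0) {
        if (errno == EINTR)
            continue;       /* interrupted by a signal: retry */
        return -errno;      /* deadline passed or other failure */
    }
    return 0;
}
```

A caller that gets a negative return knows no slot was ever assigned, so it can bail out before starting any completion timer instead of acting on an invalid index.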
10 daysnet/mlx5: Reload only IB representors upon lag disable/enableMaher Sanalla4-17/+25
On lag disable, the bond IB device along with all of its representors are destroyed, and then the slaves' representors get reloaded. In case the slave IB representor load fails, the eswitch error flow unloads all representors, including ethernet representors, where the netdevs get detached and removed from lag bond. Such flow is inaccurate as the lag driver is not responsible for loading/unloading ethernet representors. Furthermore, the flow described above begins by holding lag lock to prevent bond changes during disable flow. However, when reaching the ethernet representors detachment from lag, the lag lock is required again, triggering the following deadlock: Call trace: __switch_to+0xf4/0x148 __schedule+0x2c8/0x7d0 schedule+0x50/0xe0 schedule_preempt_disabled+0x18/0x28 __mutex_lock.isra.13+0x2b8/0x570 __mutex_lock_slowpath+0x1c/0x28 mutex_lock+0x4c/0x68 mlx5_lag_remove_netdev+0x3c/0x1a0 [mlx5_core] mlx5e_uplink_rep_disable+0x70/0xa0 [mlx5_core] mlx5e_detach_netdev+0x6c/0xb0 [mlx5_core] mlx5e_netdev_change_profile+0x44/0x138 [mlx5_core] mlx5e_netdev_attach_nic_profile+0x28/0x38 [mlx5_core] mlx5e_vport_rep_unload+0x184/0x1b8 [mlx5_core] mlx5_esw_offloads_rep_load+0xd8/0xe0 [mlx5_core] mlx5_eswitch_reload_reps+0x74/0xd0 [mlx5_core] mlx5_disable_lag+0x130/0x138 [mlx5_core] mlx5_lag_disable_change+0x6c/0x70 [mlx5_core] // hold ldev->lock mlx5_devlink_eswitch_mode_set+0xc0/0x410 [mlx5_core] devlink_nl_cmd_eswitch_set_doit+0xdc/0x180 genl_family_rcv_msg_doit.isra.17+0xe8/0x138 genl_rcv_msg+0xe4/0x220 netlink_rcv_skb+0x44/0x108 genl_rcv+0x40/0x58 netlink_unicast+0x198/0x268 netlink_sendmsg+0x1d4/0x418 sock_sendmsg+0x54/0x60 __sys_sendto+0xf4/0x120 __arm64_sys_sendto+0x30/0x40 el0_svc_common+0x8c/0x120 do_el0_svc+0x30/0xa0 el0_svc+0x20/0x30 el0_sync_handler+0x90/0xb8 el0_sync+0x160/0x180 Thus, upon lag enable/disable, load and unload only the IB representors of the slaves preventing the deadlock mentioned above. 
While at it, refactor the mlx5_esw_offloads_rep_load() function to have a static helper method for its internal logic, in symmetry with the representor unload design. Fixes: 598fe77df855 ("net/mlx5: Lag, Create shared FDB when in switchdev mode") Co-developed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240509112951.590184-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
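The self-deadlock pattern described above (a path that already holds a non-recursive lock reaches code that takes the same lock again) can be reproduced in a small userspace sketch. It assumes POSIX threads and uses an error-checking mutex so that the second acquisition returns `EDEADLK` instead of hanging; all `toy_*` names are hypothetical and do not mirror the driver.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Toy stand-in for the lag device; only the lock matters here. */
struct toy_lag {
    pthread_mutex_t lock;
};

static int toy_lag_init(struct toy_lag *ldev)
{
    pthread_mutexattr_t attr;
    int err = pthread_mutexattr_init(&attr);

    if (err)
        return err;
    /* ERRORCHECK makes a relock by the owner fail with EDEADLK. */
    err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (!err)
        err = pthread_mutex_init(&ldev->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return err;
}

/* Models a netdev-detach path that takes the lag lock on its own. */
static int toy_lag_remove_netdev(struct toy_lag *ldev)
{
    int err = pthread_mutex_lock(&ldev->lock);

    if (err)
        return err;
    pthread_mutex_unlock(&ldev->lock);
    return 0;
}

/*
 * Models the disable flow, which holds the lag lock throughout.
 * unload_eth_reps != 0 walks into the path that relocks.
 */
static int toy_lag_disable(struct toy_lag *ldev, int unload_eth_reps)
{
    int err;

    assert(ldev != NULL);
    err = pthread_mutex_lock(&ldev->lock);
    if (err)
        return err;
    if (unload_eth_reps)
        err = toy_lag_remove_netdev(ldev);  /* relock attempt */
    pthread_mutex_unlock(&ldev->lock);
    return err;
}
```

In this model, `toy_lag_disable(&ldev, 1)` reports `EDEADLK`, while `toy_lag_disable(&ldev, 0)`, which never touches the relocking path, succeeds.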
10 daysnet/mlx5: Fix peer devlink set for SF representor devlink portShay Drory2-20/+13
The cited patch changed the register devlink flow, but neglected to reflect the changes in the peer devlink set logic. Peer devlink set triggers a call trace if done after devl_register [1]. Hence, align peer devlink set logic with register devlink flow. [1] WARNING: CPU: 4 PID: 3394 at net/devlink/core.c:155 devlink_rel_nested_in_add+0x177/0x180 CPU: 4 PID: 3394 Comm: kworker/u40:1 Not tainted 6.9.0-rc4_for_linust_min_debug_2024_04_16_14_08 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: mlx5_vhca_event0 mlx5_vhca_state_work_handler [mlx5_core] RIP: 0010:devlink_rel_nested_in_add+0x177/0x180 Call Trace: <TASK> ? __warn+0x78/0x120 ? devlink_rel_nested_in_add+0x177/0x180 ? report_bug+0x16d/0x180 ? handle_bug+0x3c/0x60 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? devlink_port_init+0x30/0x30 ? devlink_port_type_clear+0x50/0x50 ? devlink_rel_nested_in_add+0x177/0x180 ? devlink_rel_nested_in_add+0xdd/0x180 mlx5_sf_mdev_event+0x74/0xb0 [mlx5_core] notifier_call_chain+0x35/0xb0 blocking_notifier_call_chain+0x3d/0x60 mlx5_blocking_notifier_call_chain+0x22/0x30 [mlx5_core] mlx5_sf_dev_probe+0x185/0x3e0 [mlx5_core] auxiliary_bus_probe+0x38/0x80 ? driver_sysfs_add+0x51/0x80 really_probe+0xc5/0x3a0 ? driver_probe_device+0x90/0x90 __driver_probe_device+0x80/0x160 driver_probe_device+0x1e/0x90 __device_attach_driver+0x7d/0x100 bus_for_each_drv+0x80/0xd0 __device_attach+0xbc/0x1f0 bus_probe_device+0x86/0xa0 device_add+0x64f/0x860 __auxiliary_device_add+0x3b/0xa0 mlx5_sf_dev_add+0x139/0x330 [mlx5_core] mlx5_sf_dev_state_change_handler+0x1e4/0x250 [mlx5_core] notifier_call_chain+0x35/0xb0 blocking_notifier_call_chain+0x3d/0x60 mlx5_vhca_state_work_handler+0x151/0x200 [mlx5_core] process_one_work+0x13f/0x2e0 worker_thread+0x2bd/0x3c0 ? rescuer_thread+0x410/0x410 kthread+0xc4/0xf0 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x2d/0x50 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork_asm+0x11/0x20 </TASK> Fixes: bf729988303a ("net/mlx5: Restore mistakenly dropped parts in register devlink flow") Fixes: c6e77aa9dd82 ("net/mlx5: Register devlink first under devlink lock") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240509112951.590184-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 daysnet/mlx5e: Fix netif state handlingShay Drory1-5/+5
mlx5e_suspend cleans resources only if netif_device_present() returns true. However, mlx5e_resume changes the state of netif, via mlx5e_nic_enable, only if reg_state == NETREG_REGISTERED. In the below case, the above leads to NULL-ptr Oops[1] and memory leaks: mlx5e_probe _mlx5e_resume mlx5e_attach_netdev mlx5e_nic_enable <-- netdev not reg, not calling netif_device_attach() register_netdev <-- failed for some reason. ERROR_FLOW: _mlx5e_suspend <-- netif_device_present return false, resources aren't freed :( Hence, clean resources in this case as well. [1] BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 0 P4D 0 Oops: 0010 [#1] SMP CPU: 2 PID: 9345 Comm: test-ovs-ct-gen Not tainted 6.5.0_for_upstream_min_debug_2023_09_05_16_01 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:0x0 Code: Unable to access opcode bytes at 0xffffffffffffffd6. RSP: 0018:ffff888178aaf758 EFLAGS: 00010246 Call Trace: <TASK> ? __die+0x20/0x60 ? page_fault_oops+0x14c/0x3c0 ? exc_page_fault+0x75/0x140 ? asm_exc_page_fault+0x22/0x30 notifier_call_chain+0x35/0xb0 blocking_notifier_call_chain+0x3d/0x60 mlx5_blocking_notifier_call_chain+0x22/0x30 [mlx5_core] mlx5_core_uplink_netdev_event_replay+0x3e/0x60 [mlx5_core] mlx5_mdev_netdev_track+0x53/0x60 [mlx5_ib] mlx5_ib_roce_init+0xc3/0x340 [mlx5_ib] __mlx5_ib_add+0x34/0xd0 [mlx5_ib] mlx5r_probe+0xe1/0x210 [mlx5_ib] ? auxiliary_match_id+0x6a/0x90 auxiliary_bus_probe+0x38/0x80 ? driver_sysfs_add+0x51/0x80 really_probe+0xc9/0x3e0 ? driver_probe_device+0x90/0x90 __driver_probe_device+0x80/0x160 driver_probe_device+0x1e/0x90 __device_attach_driver+0x7d/0x100 bus_for_each_drv+0x80/0xd0 __device_attach+0xbc/0x1f0 bus_probe_device+0x86/0xa0 device_add+0x637/0x840 __auxiliary_device_add+0x3b/0xa0 add_adev+0xc9/0x140 [mlx5_core] mlx5_rescan_drivers_locked+0x22a/0x310 [mlx5_core] mlx5_register_device+0x53/0xa0 [mlx5_core] mlx5_init_one_devl_locked+0x5c4/0x9c0 [mlx5_core] mlx5_init_one+0x3b/0x60 [mlx5_core] probe_one+0x44c/0x730 [mlx5_core] local_pci_probe+0x3e/0x90 pci_device_probe+0xbf/0x210 ? kernfs_create_link+0x5d/0xa0 ? sysfs_do_create_link_sd+0x60/0xc0 really_probe+0xc9/0x3e0 ? driver_probe_device+0x90/0x90 __driver_probe_device+0x80/0x160 driver_probe_device+0x1e/0x90 __device_attach_driver+0x7d/0x100 bus_for_each_drv+0x80/0xd0 __device_attach+0xbc/0x1f0 pci_bus_add_device+0x54/0x80 pci_iov_add_virtfn+0x2e6/0x320 sriov_enable+0x208/0x420 mlx5_core_sriov_configure+0x9e/0x200 [mlx5_core] sriov_numvfs_store+0xae/0x1a0 kernfs_fop_write_iter+0x10c/0x1a0 vfs_write+0x291/0x3c0 ksys_write+0x5f/0xe0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 CR2: 0000000000000000 ---[ end trace 0000000000000000 ]--- Fixes: 2c3b5beec46a ("net/mlx5e: More generic netdev management API") Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240509112951.590184-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
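The suspend/resume asymmetry described in this commit message can be illustrated with a toy state model. This is a sketch under stated assumptions, not the driver's code: every `toy_*` name and the `pre_netdev_reg` flag are hypothetical, and "resources" is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a netdev; all fields are illustrative only. */
struct toy_netdev {
    bool registered;  /* models reg_state == NETREG_REGISTERED */
    bool present;     /* models netif_device_present() */
    int  resources;   /* number of resource sets currently held */
};

/* Resume always allocates, but attaches only a registered netdev. */
static void toy_resume(struct toy_netdev *nd)
{
    nd->resources++;
    if (nd->registered)
        nd->present = true;
}

/* Old behaviour: cleanup is skipped whenever the device is not present. */
static void toy_suspend_old(struct toy_netdev *nd)
{
    if (!nd->present)
        return;  /* resources stay allocated: leak */
    nd->present = false;
    nd->resources--;
}

/*
 * One possible fix: the error path tells suspend that registration had
 * not completed, so cleanup runs even though the device never became
 * "present".
 */
static void toy_suspend_fixed(struct toy_netdev *nd, bool pre_netdev_reg)
{
    if (!pre_netdev_reg && !nd->present)
        return;
    assert(nd->resources > 0);
    nd->present = false;
    nd->resources--;
}
```

Running `toy_resume()` on an unregistered device and then `toy_suspend_old()` leaves `resources` at 1, while `toy_suspend_fixed(nd, true)` brings it back to 0.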
10 daysMerge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queueJakub Kicinski8-6/+22
Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-05-08 (most Intel drivers) This series contains updates to i40e, iavf, ice, igb, igc, e1000e, and ixgbe drivers. Asbjørn Sloth Tønnesen adds checks against supported flower control flags for i40e, iavf, ice, and igb drivers. Michal corrects filters removed during eswitch release for ice. Corinna Vinschen defers PTP initialization to later in probe so that netdev log entry is initialized on igc. Ilpo Järvinen removes a couple of unused, duplicate defines on e1000e and ixgbe. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: net: e1000e & ixgbe: Remove PCI_HEADER_TYPE_MFD duplicates igc: fix a log entry using uninitialized netdev ice: remove correct filters during eswitch release igb: flower: validate control flags ice: flower: validate control flags iavf: flower: validate control flags i40e: flower: validate control flags ==================== Link: https://lore.kernel.org/r/20240508173342.2760994-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 daysMerge branch 'net-qede-convert-filter-code-to-use-extack'Jakub Kicinski1-51/+63
Asbjørn Sloth Tønnesen says: ==================== net: qede: convert filter code to use extack This series converts the filter code in the qede driver to use NL_SET_ERR_MSG_*(extack, ...) for error handling. Patches 1-12 convert qede_parse_flow_attr() to use extack, along with all its static helper functions. qede_parse_flow_attr() is used in two places: - qede_add_tc_flower_fltr() - qede_flow_spec_to_rule() In the latter call site extack is faked in the same way as is done in mlxsw (patch 12). While the conversion is going on, some error messages are silenced in between patches 1-12. If wanted, I could squash patches 1-12 in a v3, but I felt that it would be easier to review as 12 more trivial patches. Patches 13 and 14 finish up by converting qede_parse_actions() and ensuring that extack is propagated to it in both call contexts. v1: https://lore.kernel.org/netdev/20240507104421.1628139-1-ast@fiberby.net/ ==================== Link: https://lore.kernel.org/r/20240508143404.95901-1-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 daysnet: qede: use extack in qede_parse_actions()Asbjørn Sloth Tønnesen1-2/+3
Convert DP_NOTICE/DP_INFO to NL_SET_ERR_MSG_MOD. Keep edev around for use with QEDE_RSS_COUNT(). Only compile tested. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240508143404.95901-15-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 daysnet: qede: propagate extack through qede_flow_spec_validate()Asbjørn Sloth Tønnesen1-3/+4
Pass extack to qede_flow_spec_validate() when called in qede_flow_spec_to_rule(). Pass extack to qede_parse_actions(). Not converting qede_flow_spec_validate() to use extack for errors, as it's only called from qede_flow_spec_to_rule(), where extack is faked into a DP_NOTICE anyway, so opting to keep DP_VERBOSE/DP_NOTICE usage. Only compile tested. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240508143404.95901-14-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 daysnet: qede: use faked extack in qede_flow_spec_to_rule()Asbjørn Sloth Tønnesen1-1/+4
Since qede_parse_flow_attr() now does error reporting through extack, give it a fake extack and extract the error message afterwards if one was set. The extracted error message is then passed on through DP_NOTICE(), including messages that were earlier issued with DP_INFO(). This fake extack approach is already used by mlxsw_env_linecard_modules_power_mode_apply() in drivers/net/ethernet/mellanox/mlxsw/core_env.c Only compile tested. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240508143404.95901-13-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
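The "fake extack" pattern described above can be sketched in plain userspace C: a caller with no real extended-ack object passes a zeroed local one to the parser and forwards any message that was set to its own log. The sketch assumes a simplified stand-in struct; `struct toy_extack`, `TOY_SET_ERR_MSG()` and the `toy_*` functions are hypothetical and are not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in for an extended-ack object: just a message slot. */
struct toy_extack {
    const char *msg;
};

/* Record an error message if the caller supplied an extack. */
#define TOY_SET_ERR_MSG(extack, text)       \
    do {                                    \
        if (extack)                         \
            (extack)->msg = (text);         \
    } while (0)

/* Parser that reports errors only through the extack it is given. */
static int toy_parse_flow_attr(int ip_version, struct toy_extack *extack)
{
    if (ip_version != 4 && ip_version != 6) {
        TOY_SET_ERR_MSG(extack, "unsupported IP version");
        return -EINVAL;
    }
    return 0;
}

/*
 * Caller without a real extack: hand the parser a local (fake) one,
 * then copy any message into the caller's own log buffer.
 */
static int toy_flow_spec_to_rule(int ip_version, char *log, size_t log_len)
{
    struct toy_extack extack = { NULL };
    int err;

    assert(log != NULL && log_len > 0);
    log[0] = '\0';

    err = toy_parse_flow_attr(ip_version, &extack);
    if (extack.msg)
        snprintf(log, log_len, "notice: %s", extack.msg);
    return err;
}
```

The design keeps the parser free of logging calls: it only fills in the message, and each call site decides whether that message goes back to userspace or into a local log.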
10 daysnet: qede: use extack in qede_parse_flow_attr()Asbjørn Sloth Tønnesen1-12/+14
Convert qede_parse_flow_attr() to take extack, and drop the edev argument. Convert DP_NOTICE calls to use NL_SET_ERR_MSG_* instead. Pass extack in calls to qede_flow_parse_{tcp,udp}_v{4,6}(). In calls to qede_parse_flow_attr(), if extack is unavailable, then use NULL for now, until a subsequent patch makes extack available. Only compile tested. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240508143404.95901-12-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>