aboutsummaryrefslogtreecommitdiffstats
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
3 daysMerge tag 'powerpc-6.9-4' of ↵Linus Torvalds3-8/+15
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix incorrect delay handling in the plpks (keystore) code - Fix a panic when an LPAR boots with a frozen PE Thanks to Andrew Donnellan, Gaurav Batra, Nageswara R Sastry, and Nayna Jain. * tag 'powerpc-6.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/pseries/iommu: LPAR panics during boot up with a frozen PE powerpc/pseries: make max polling consistent for longer H_CALLs
3 daysMerge tag 'x86-urgent-2024-05-05' of ↵Linus Torvalds9-67/+64
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Remove the broken vsyscall emulation code from the page fault code - Fix kexec crash triggered by certain SEV RMP table layouts - Fix unchecked MSR access error when disabling the x2APIC via iommu=off * tag 'x86-urgent-2024-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Remove broken vsyscall emulation code from the page fault code x86/apic: Don't access the APIC when disabling x2APIC x86/sev: Add callback to apply RMP table fixups for kexec x86/e820: Add a new e820 table update helper
5 daysMerge tag 'for-linus-6.9a-rc7-tag' of ↵Linus Torvalds2-3/+12
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Two fixes when running as Xen PV guests for issues introduced in the 6.9 merge window, both related to apic id handling" * tag 'for-linus-6.9a-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: return a sane initial apic id when running as PV guest x86/xen/smp_pv: Register the boot CPU APIC properly
6 daysMerge tag 's390-6.9-6' of ↵Linus Torvalds5-4/+18
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Alexander Gordeev: - The function __storage_key_init_range() expects the end address to be the first byte outside the range to be initialized. Fix the callers that provide the last byte within the range instead. - 3270 Channel Command Word (CCW) may contain zero data address in case there is no data in the request. Add data availability check to avoid erroneous non-zero value as result of virt_to_dma32(NULL) application in cases there is no data - Add missing CFI directives for an unwinder to restore the return address in the vDSO assembler code - NUL-terminate kernel buffer when duplicating user space memory region on Channel IO (CIO) debugfs write inject - Fix wrong format string in zcrypt debug output - Return -EBUSY code when a CCA card is temporarily unavailabile - Restore a loop that retries derivation of a protected key from a secure key in cases the low level reports temporarily unavailability with -EBUSY code * tag 's390-6.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/paes: Reestablish retry loop in paes s390/zcrypt: Use EBUSY to indicate temp unavailability s390/zcrypt: Handle ep11 cprb return code s390/zcrypt: Fix wrong format string in debug feature printout s390/cio: Ensure the copied buf is NUL terminated s390/vdso: Add CFI for RA register to asm macro vdso_func s390/3270: Fix buffer assignment s390/mm: Fix clearing storage keys for huge pages s390/mm: Fix storage key clearing for guest huge pages
6 daysMerge tag 'xtensa-20240502' of https://github.com/jcmvbkbc/linux-xtensaLinus Torvalds6-28/+20
Pull xtensa fixes from Max Filippov: - fix unused variable warning caused by empty flush_dcache_page() definition - fix stack unwinding on windowed noMMU XIP configurations - fix Coccinelle warning 'opportunity for min()' in xtensa ISS platform code * tag 'xtensa-20240502' of https://github.com/jcmvbkbc/linux-xtensa: xtensa: remove redundant flush_dcache_page and ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE macros tty: xtensa/iss: Use min() to fix Coccinelle warning xtensa: fix MAKE_PC_FROM_RA second argument
6 daysx86/xen: return a sane initial apic id when running as PV guestJuergen Gross1-1/+10
With recent sanity checks for topology information added, there are now warnings issued for APs when running as a Xen PV guest: [Firmware Bug]: CPU 1: APIC ID mismatch. CPUID: 0x0000 APIC: 0x0001 This is due to the initial APIC ID obtained via CPUID for PV guests is always 0. Avoid the warnings by synthesizing the CPUID data to contain the same initial APIC ID as xen_pv_smp_config() is using for registering the APIC IDs of all CPUs. Fixes: 52128a7a21f7 ("86/cpu/topology: Make the APIC mismatch warnings complete") Signed-off-by: Juergen Gross <jgross@suse.com>
6 daysx86/xen/smp_pv: Register the boot CPU APIC properlyThomas Gleixner1-2/+2
The topology core expects the boot APIC to be registered from earhy APIC detection first and then again when the firmware tables are evaluated. This is used for detecting the real BSP CPU on a kexec kernel. The recent conversion of XEN/PV to register fake APIC IDs failed to register the boot CPU APIC correctly as it only registers it once. This causes the BSP detection mechanism to trigger wrongly: CPU topo: Boot CPU APIC ID not the first enumerated APIC ID: 0 > 1 Additionally this results in one CPU being ignored. Register the boot CPU APIC twice so that the XEN/PV fake enumeration behaves like real firmware. Reported-by: Juergen Gross <jgross@suse.com> Fixes: e75307023466 ("x86/xen/smp_pv: Register fake APICs") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/87a5l8s2fg.ffs@tglx Signed-off-by: Juergen Gross <jgross@suse.com>
6 daysMerge tag 'net-6.9-rc7' of ↵Linus Torvalds4-51/+80
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf. Relatively calm week, likely due to public holiday in most places. No known outstanding regressions. Current release - regressions: - rxrpc: fix wrong alignmask in __page_frag_alloc_align() - eth: e1000e: change usleep_range to udelay in PHY mdic access Previous releases - regressions: - gro: fix udp bad offset in socket lookup - bpf: fix incorrect runtime stat for arm64 - tipc: fix UAF in error path - netfs: fix a potential infinite loop in extract_user_to_sg() - eth: ice: ensure the copied buf is NUL terminated - eth: qeth: fix kernel panic after setting hsuid Previous releases - always broken: - bpf: - verifier: prevent userspace memory access - xdp: use flags field to disambiguate broadcast redirect - bridge: fix multicast-to-unicast with fraglist GSO - mptcp: ensure snd_nxt is properly initialized on connect - nsh: fix outer header access in nsh_gso_segment(). - eth: bcmgenet: fix racing registers access - eth: vxlan: fix stats counters. Misc: - a bunch of MAINTAINERS file updates" * tag 'net-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits) MAINTAINERS: mark MYRICOM MYRI-10G as Orphan MAINTAINERS: remove Ariel Elior net: gro: add flush check in udp_gro_receive_segment net: gro: fix udp bad offset in socket lookup by adding {inner_}network_offset to napi_gro_cb ipv4: Fix uninit-value access in __ip_make_skb() s390/qeth: Fix kernel panic after setting hsuid vxlan: Pull inner IP header in vxlan_rcv(). tipc: fix a possible memleak in tipc_buf_append tipc: fix UAF in error path rxrpc: Clients must accept conn from any address net: core: reject skb_copy(_expand) for fraglist GSO skbs net: bridge: fix multicast-to-unicast with fraglist GSO mptcp: ensure snd_nxt is properly initialized on connect e1000e: change usleep_range to udelay in PHY mdic access net: dsa: mv88e6xxx: Fix number of databases for 88E6141 / 88E6341 cxgb4: Properly lock TX queue for the selftest. rxrpc: Fix using alignmask being zero for __page_frag_alloc_align() vxlan: Add missing VNI filter counter update in arp_reduce(). vxlan: Fix racy device stats updates. net: qede: use return from qede_parse_actions() ...
7 dayss390/paes: Reestablish retry loop in paesHarald Freudenberger1-2/+13
With commit ed6776c96c60 ("s390/crypto: remove retry loop with sleep from PAES pkey invocation") the retry loop to retry derivation of a protected key from a secure key has been removed. This was based on the assumption that theses retries are not needed any more as proper retries are done in the zcrypt layer. However, tests have revealed that there exist some cases with master key change in the HSM and immediately (< 1 second) attempt to derive a protected key from a secure key with exact this HSM may eventually fail. The low level functions in zcrypt_ccamisc.c and zcrypt_ep11misc.c detect and report this temporary failure and report it to the caller as -EBUSY. The re-established retry loop in the paes implementation catches exactly this -EBUSY and eventually may run some retries. Fixes: ed6776c96c60 ("s390/crypto: remove retry loop with sleep from PAES pkey invocation") Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
7 daysx86/mm: Remove broken vsyscall emulation code from the page fault codeLinus Torvalds3-59/+3
The syzbot-reported stack trace from hell in this discussion thread actually has three nested page faults: https://lore.kernel.org/r/000000000000d5f4fc0616e816d4@google.com ... and I think that's actually the important thing here: - the first page fault is from user space, and triggers the vsyscall emulation. - the second page fault is from __do_sys_gettimeofday(), and that should just have caused the exception that then sets the return value to -EFAULT - the third nested page fault is due to _raw_spin_unlock_irqrestore() -> preempt_schedule() -> trace_sched_switch(), which then causes a BPF trace program to run, which does that bpf_probe_read_compat(), which causes that page fault under pagefault_disable(). It's quite the nasty backtrace, and there's a lot going on. The problem is literally the vsyscall emulation, which sets current->thread.sig_on_uaccess_err = 1; and that causes the fixup_exception() code to send the signal *despite* the exception being caught. And I think that is in fact completely bogus. It's completely bogus exactly because it sends that signal even when it *shouldn't* be sent - like for the BPF user mode trace gathering. In other words, I think the whole "sig_on_uaccess_err" thing is entirely broken, because it makes any nested page-faults do all the wrong things. Now, arguably, I don't think anybody should enable vsyscall emulation any more, but this test case clearly does. I think we should just make the "send SIGSEGV" be something that the vsyscall emulation does on its own, not this broken per-thread state for something that isn't actually per thread. The x86 page fault code actually tried to deal with the "incorrect nesting" by having that: if (in_interrupt()) return; which ignores the sig_on_uaccess_err case when it happens in interrupts, but as shown by this example, these nested page faults do not need to be about interrupts at all. IOW, I think the only right thing is to remove that horrendously broken code. The attached patch looks like the ObviouslyCorrect(tm) thing to do. NOTE! This broken code goes back to this commit in 2011: 4fc3490114bb ("x86-64: Set siginfo and context on vsyscall emulation faults") ... and back then the reason was to get all the siginfo details right. Honestly, I do not for a moment believe that it's worth getting the siginfo details right here, but part of the commit says: This fixes issues with UML when vsyscall=emulate. ... and so my patch to remove this garbage will probably break UML in this situation. I do not believe that anybody should be running with vsyscall=emulate in 2024 in the first place, much less if you are doing things like UML. But let's see if somebody screams. Reported-and-tested-by: syzbot+83e7f982ca045ab4405c@syzkaller.appspotmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/CAHk-=wh9D6f7HUkDgZHKmDCHUQmp+Co89GP+b8+z+G56BKeyNg@mail.gmail.com
8 daysMerge tag 'kvmarm-fixes-6.9-2' of ↵Paolo Bonzini1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.9, part #2 - Fix + test for a NULL dereference resulting from unsanitised user input in the vgic-v2 device attribute accessors
8 daysx86/apic: Don't access the APIC when disabling x2APICThomas Gleixner1-5/+11
With 'iommu=off' on the kernel command line and x2APIC enabled by the BIOS the code which disables the x2APIC triggers an unchecked MSR access error: RDMSR from 0x802 at rIP: 0xffffffff94079992 (native_apic_msr_read+0x12/0x50) This is happens because default_acpi_madt_oem_check() selects an x2APIC driver before the x2APIC is disabled. When the x2APIC is disabled because interrupt remapping cannot be enabled due to 'iommu=off' on the command line, x2apic_disable() invokes apic_set_fixmap() which in turn tries to read the APIC ID. This triggers the MSR warning because x2APIC is disabled, but the APIC driver is still x2APIC based. Prevent that by adding an argument to apic_set_fixmap() which makes the APIC ID read out conditional and set it to false from the x2APIC disable path. That's correct as the APIC ID has already been read out during early discovery. Fixes: d10a904435fa ("x86/apic: Consolidate boot_cpu_physical_apicid initialization sites") Reported-by: Adrian Huang <ahuang12@lenovo.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Adrian Huang <ahuang12@lenovo.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/875xw5t6r7.ffs@tglx
9 daysx86/sev: Add callback to apply RMP table fixups for kexecAshish Kalra3-0/+45
Handle cases where the RMP table placement in the BIOS is not 2M aligned and the kexec-ed kernel could try to allocate from within that chunk which then causes a fatal RMP fault. The kexec failure is illustrated below: SEV-SNP: RMP table physical range [0x0000007ffe800000 - 0x000000807f0fffff] BIOS-provided physical RAM map: BIOS-e820: [mem 0x0000000000000000-0x000000000008efff] usable BIOS-e820: [mem 0x000000000008f000-0x000000000008ffff] ACPI NVS ... BIOS-e820: [mem 0x0000004080000000-0x0000007ffe7fffff] usable BIOS-e820: [mem 0x0000007ffe800000-0x000000807f0fffff] reserved BIOS-e820: [mem 0x000000807f100000-0x000000807f1fefff] usable As seen here in the e820 memory map, the end range of the RMP table is not aligned to 2MB and not reserved but it is usable as RAM. Subsequently, kexec -s (KEXEC_FILE_LOAD syscall) loads it's purgatory code and boot_param, command line and other setup data into this RAM region as seen in the kexec logs below, which leads to fatal RMP fault during kexec boot. Loaded purgatory at 0x807f1fa000 Loaded boot_param, command line and misc at 0x807f1f8000 bufsz=0x1350 memsz=0x2000 Loaded 64bit kernel at 0x7ffae00000 bufsz=0xd06200 memsz=0x3894000 Loaded initrd at 0x7ff6c89000 bufsz=0x4176014 memsz=0x4176014 E820 memmap: 0000000000000000-000000000008efff (1) 000000000008f000-000000000008ffff (4) 0000000000090000-000000000009ffff (1) ... 0000004080000000-0000007ffe7fffff (1) 0000007ffe800000-000000807f0fffff (2) 000000807f100000-000000807f1fefff (1) 000000807f1ff000-000000807fffffff (2) nr_segments = 4 segment[0]: buf=0x00000000e626d1a2 bufsz=0x4000 mem=0x807f1fa000 memsz=0x5000 segment[1]: buf=0x0000000029c67bd6 bufsz=0x1350 mem=0x807f1f8000 memsz=0x2000 segment[2]: buf=0x0000000045c60183 bufsz=0xd06200 mem=0x7ffae00000 memsz=0x3894000 segment[3]: buf=0x000000006e54f08d bufsz=0x4176014 mem=0x7ff6c89000 memsz=0x4177000 kexec_file_load: type:0, start:0x807f1fa150 head:0x1184d0002 flags:0x0 Check if RMP table start and end physical range in the e820 tables are not aligned to 2MB and in that case map this range to reserved in all the three e820 tables. [ bp: Massage. ] Fixes: c3b86e61b756 ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature") Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/df6e995ff88565262c2c7c69964883ff8aa6fc30.1714090302.git.ashish.kalra@amd.com
9 daysx86/e820: Add a new e820 table update helperAshish Kalra2-3/+5
Add a new API helper e820__range_update_table() with which to update an arbitrary e820 table. Move all current users of e820__range_update_kexec() to this new helper. [ bp: Massage. ] Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/b726af213ad55053f8a7a1e793b01bb3f1ca9dd5.1714090302.git.ashish.kalra@amd.com
9 daysxtensa: remove redundant flush_dcache_page and ↵Barry Song1-16/+8
ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE macros xtensa's flush_dcache_page() can be a no-op sometimes. There is a generic implementation for this case in include/asm-generic/ cacheflush.h. #ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE static inline void flush_dcache_page(struct page *page) { } #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 0 #endif So remove the superfluous flush_dcache_page() definition, which also helps silence potential build warnings complaining the page variable passed to flush_dcache_page() is not used. In file included from crypto/scompress.c:12: include/crypto/scatterwalk.h: In function 'scatterwalk_pagedone': include/crypto/scatterwalk.h:76:30: warning: variable 'page' set but not used [-Wunused-but-set-variable] 76 | struct page *page; | ^~~~ crypto/scompress.c: In function 'scomp_acomp_comp_decomp': >> crypto/scompress.c:174:38: warning: unused variable 'dst_page' [-Wunused-variable] 174 | struct page *dst_page = sg_page(req->dst); | The issue was originally reported on LoongArch by kernel test robot (Huacai fixed it on LoongArch), then reported by Guenter and me on xtensa. This patch also removes lots of redundant macros which have been defined by asm-generic/cacheflush.h. Cc: Huacai Chen <chenhuacai@loongson.cn> Cc: Herbert Xu <herbert@gondor.apana.org.au> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202403091614.NeUw5zcv-lkp@intel.com/ Reported-by: Barry Song <v-songbaohua@oppo.com> Closes: https://lore.kernel.org/all/CAGsJ_4yDk1+axbte7FKQEwD7X2oxUCFrEc9M5YOS1BobfDFXPA@mail.gmail.com/ Reported-by: Guenter Roeck <linux@roeck-us.net> Closes: https://lore.kernel.org/all/aaa8b7d7-5abe-47bf-93f6-407942436472@roeck-us.net/ Fixes: 77292bb8ca69 ("crypto: scomp - remove memcpy if sg_nents is 1 and pages are lowmem") Signed-off-by: Barry Song <v-songbaohua@oppo.com> Message-Id: <20240319010920.125192-1-21cnbao@gmail.com> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
10 daysMerge tag 'x86-urgent-2024-04-28' of ↵Linus Torvalds7-13/+29
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - Make the CPU_MITIGATIONS=n interaction with conflicting mitigation-enabling boot parameters a bit saner. - Re-enable CPU mitigations by default on non-x86 - Fix TDX shared bit propagation on mprotect() - Fix potential show_regs() system hang when PKE initialization is not fully finished yet. - Add the 0x10-0x1f model IDs to the Zen5 range - Harden #VC instruction emulation some more * tag 'x86-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu: Ignore "mitigations" kernel parameter if CPU_MITIGATIONS=n cpu: Re-enable CPU mitigations by default for !X86 architectures x86/tdx: Preserve shared bit on mprotect() x86/cpu: Fix check for RDPKRU in __show_regs() x86/CPU/AMD: Add models 0x10-0x1f to the Zen5 range x86/sev: Check for MWAITX and MONITORX opcodes in the #VC handler
11 daysMerge tag 'riscv-for-linus-6.9-rc6' of ↵Linus Torvalds7-27/+33
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - A fix for TASK_SIZE on rv64/NOMMU, to reflect the lack of user/kernel separation - A fix to avoid loading rv64/NOMMU kernel past the start of RAM - A fix for RISCV_HWPROBE_EXT_ZVFHMIN on ilp32 to avoid signed integer overflow in the bitmask - The sud_test kselftest has been fixed to properly swizzle the syscall number into the return register, which are not the same on RISC-V - A fix for a build warning in the perf tools on rv32 - A fix for the CBO selftests, to avoid non-constants leaking into the inline asm - A pair of fixes for T-Head PBMT errata probing, which has been renamed MAE by the vendor * tag 'riscv-for-linus-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: RISC-V: selftests: cbo: Ensure asm operands match constraints, take 2 perf riscv: Fix the warning due to the incompatible type riscv: T-Head: Test availability bit before enabling MAE errata riscv: thead: Rename T-Head PBMT to MAE selftests: sud_test: return correct emulated syscall value on RISC-V riscv: hwprobe: fix invalid sign extension for RISCV_HWPROBE_EXT_ZVFHMIN riscv: Fix loading 64-bit NOMMU kernels past the start of RAM riscv: Fix TASK_SIZE on 64-bit NOMMU
11 daysMerge tag 'for-netdev' of ↵Jakub Kicinski4-51/+80
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-04-26 We've added 12 non-merge commits during the last 22 day(s) which contain a total of 14 files changed, 168 insertions(+), 72 deletions(-). The main changes are: 1) Fix BPF_PROBE_MEM in verifier and JIT to skip loads from vsyscall page, from Puranjay Mohan. 2) Fix a crash in XDP with devmap broadcast redirect when the latter map is in process of being torn down, from Toke Høiland-Jørgensen. 3) Fix arm64 and riscv64 BPF JITs to properly clear start time for BPF program runtime stats, from Xu Kuohai. 4) Fix a sockmap KCSAN-reported data race in sk_psock_skb_ingress_enqueue, from Jason Xing. 5) Fix BPF verifier error message in resolve_pseudo_ldimm64, from Anton Protopopov. 6) Fix missing DEBUG_INFO_BTF_MODULES Kconfig menu item, from Andrii Nakryiko. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Test PROBE_MEM of VSYSCALL_ADDR on x86-64 bpf, x86: Fix PROBE_MEM runtime load check bpf: verifier: prevent userspace memory access xdp: use flags field to disambiguate broadcast redirect arm32, bpf: Reimplement sign-extension mov instruction riscv, bpf: Fix incorrect runtime stats bpf, arm64: Fix incorrect runtime stats bpf: Fix a verifier verbose message bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue MAINTAINERS: bpf: Add Lehui and Puranjay as riscv64 reviewers MAINTAINERS: Update email address for Puranjay Mohan bpf, kconfig: Fix DEBUG_INFO_BTF_MODULES Kconfig definition ==================== Link: https://lore.kernel.org/r/20240426224248.26197-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 daysMerge tag 'soc-fixes-6.9-2' of ↵Linus Torvalds37-105/+171
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "There are a lot of minor DT fixes for Mediatek, Rockchip, Qualcomm and Microchip and NXP, addressing both build-time warnings and bugs found during runtime testing. Most of these changes are machine specific fixups, but there are a few notable regressions that affect an entire SoC: - The Qualcomm MSI support that was improved for 6.9 ended up being wrong on some chips and now gets fixed. - The i.MX8MP camera interface broke due to a typo and gets updated again. The main driver fix is also for Qualcomm platforms, rewriting an interface in the QSEECOM firmware support that could lead to crashing the kernel from a trusted application. The only other code changes are minor fixes for Mediatek SoC drivers" * tag 'soc-fixes-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (50 commits) ARM: dts: imx6ull-tarragon: fix USB over-current polarity soc: mediatek: mtk-socinfo: depends on CONFIG_SOC_BUS soc: mediatek: mtk-svs: Append "-thermal" to thermal zone names arm64: dts: imx8mp: Fix assigned-clocks for second CSI2 ARM: dts: microchip: at91-sama7g54_curiosity: Replace regulator-suspend-voltage with the valid property ARM: dts: microchip: at91-sama7g5ek: Replace regulator-suspend-voltage with the valid property arm64: dts: rockchip: Fix USB interface compatible string on kobol-helios64 arm64: dts: qcom: sc8180x: Fix ss_phy_irq for secondary USB controller arm64: dts: qcom: sm8650: Fix the msi-map entries arm64: dts: qcom: sm8550: Fix the msi-map entries arm64: dts: qcom: sm8450: Fix the msi-map entries arm64: dts: qcom: sc8280xp: add missing PCIe minimum OPP arm64: dts: qcom: x1e80100: Fix the compatible for cluster idle states arm64: dts: qcom: Fix type of "wdog" IRQs for remoteprocs arm64: dts: rockchip: regulator for sd needs to be always on for BPI-R2Pro dt-bindings: rockchip: grf: Add missing type to 'pcie-phy' node arm64: dts: rockchip: drop redundant disable-gpios in Lubancat 2 arm64: dts: rockchip: drop redundant disable-gpios in Lubancat 1 arm64: dts: rockchip: drop redundant pcie-reset-suspend in Scarlet Dumo arm64: dts: rockchip: mark system power controller and fix typo on orangepi-5-plus ...
12 daysMerge tag 'arc-6.9-fixes' of ↵Linus Torvalds30-59/+50
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC fixes from Vineet Gupta: - Incorrect VIPT aliasing assumption - Misc build warning fixes and some typos * tag 'arc-6.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARC: [plat-hsdk]: Remove misplaced interrupt-cells property ARC: Fix typos ARC: mm: fix new code about cache aliasing ARC: Fix -Wmissing-prototypes warnings
12 daysMerge patch series "RISC-V: Test th.sxstatus.MAEE bit before enabling MAEE"Palmer Dabbelt3-23/+29
Christoph Müllner <christoph.muellner@vrull.eu> says: Currently, the Linux kernel suffers from a boot regression when running on the c906 QEMU emulation. Details have been reported here by Björn Töpel: https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg04766.html The main issue is, that Linux enables XTheadMae for CPUs that have a T-Head mvendorid but QEMU maintainers don't want to emulate a CPU that uses reserved bits in PTEs. See also the following discussion for more context: https://lists.gnu.org/archive/html/qemu-devel/2024-02/msg00775.html This series renames "T-Head PBMT" to "MAE"/"XTheadMae" and only enables it if the th.sxstatus.MAEE bit is set. The th.sxstatus CSR is documented here: https://github.com/T-head-Semi/thead-extension-spec/blob/master/xtheadsxstatus.adoc XTheadMae is documented here: https://github.com/T-head-Semi/thead-extension-spec/blob/master/xtheadmae.adoc The QEMU patch to emulate th.sxstatus with the MAEE bit not set is here: https://lore.kernel.org/all/20240329120427.684677-1-christoph.muellner@vrull.eu/ After applying the referenced QEMU patch, this patchset allows to successfully boot a C906 QEMU system emulation ("-cpu thead-c906"). * b4-shazam-lts: riscv: T-Head: Test availability bit before enabling MAE errata riscv: thead: Rename T-Head PBMT to MAE Link: https://lore.kernel.org/r/20240407213236.2121592-1-christoph.muellner@vrull.eu Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
12 daysbpf, x86: Fix PROBE_MEM runtime load checkPuranjay Mohan1-32/+25
When a load is marked PROBE_MEM - e.g. due to PTR_UNTRUSTED access - the address being loaded from is not necessarily valid. The BPF jit sets up exception handlers for each such load which catch page faults and 0 out the destination register. If the address for the load is outside kernel address space, the load will escape the exception handling and crash the kernel. To prevent this from happening, the emits some instruction to verify that addr is > end of userspace addresses. x86 has a legacy vsyscall ABI where a page at address 0xffffffffff600000 is mapped with user accessible permissions. The addresses in this page are considered userspace addresses by the fault handler. Therefore, a BPF program accessing this page will crash the kernel. This patch fixes the runtime checks to also check that the PROBE_MEM address is below VSYSCALL_ADDR. Example BPF program: SEC("fentry/tcp_v4_connect") int BPF_PROG(fentry_tcp_v4_connect, struct sock *sk) { *(volatile unsigned long *)&sk->sk_tsq_flags; return 0; } BPF Assembly: 0: (79) r1 = *(u64 *)(r1 +0) 1: (79) r1 = *(u64 *)(r1 +344) 2: (b7) r0 = 0 3: (95) exit x86-64 JIT ========== BEFORE AFTER ------ ----- 0: nopl 0x0(%rax,%rax,1) 0: nopl 0x0(%rax,%rax,1) 5: xchg %ax,%ax 5: xchg %ax,%ax 7: push %rbp 7: push %rbp 8: mov %rsp,%rbp 8: mov %rsp,%rbp b: mov 0x0(%rdi),%rdi b: mov 0x0(%rdi),%rdi ------------------------------------------------------------------------------- f: movabs $0x100000000000000,%r11 f: movabs $0xffffffffff600000,%r10 19: add $0x2a0,%rdi 19: mov %rdi,%r11 20: cmp %r11,%rdi 1c: add $0x2a0,%r11 23: jae 0x0000000000000029 23: sub %r10,%r11 25: xor %edi,%edi 26: movabs $0x100000000a00000,%r10 27: jmp 0x000000000000002d 30: cmp %r10,%r11 29: mov 0x0(%rdi),%rdi 33: ja 0x0000000000000039 --------------------------------\ 35: xor %edi,%edi 2d: xor %eax,%eax \ 37: jmp 0x0000000000000040 2f: leave \ 39: mov 0x2a0(%rdi),%rdi 30: ret \-------------------------------------------- 40: xor %eax,%eax 42: leave 43: ret Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Link: https://lore.kernel.org/r/20240424100210.11982-3-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
12 daysbpf: verifier: prevent userspace memory accessPuranjay Mohan1-0/+6
With BPF_PROBE_MEM, BPF allows de-referencing an untrusted pointer. To thwart invalid memory accesses, the JITs add an exception table entry for all such accesses. But in case the src_reg + offset is a userspace address, the BPF program might read that memory if the user has mapped it. Make the verifier add guard instructions around such memory accesses and skip the load if the address falls into the userspace region. The JITs need to implement bpf_arch_uaddress_limit() to define where the userspace addresses end for that architecture or TASK_SIZE is taken as default. The implementation is as follows: REG_AX = SRC_REG if(offset) REG_AX += offset; REG_AX >>= 32; if (REG_AX <= (uaddress_limit >> 32)) DST_REG = 0; else DST_REG = *(size *)(SRC_REG + offset); Comparing just the upper 32 bits of the load address with the upper 32 bits of uaddress_limit implies that the values are being aligned down to a 4GB boundary before comparison. The above means that all loads with address <= uaddress_limit + 4GB are skipped. This is acceptable because there is a large hole (much larger than 4GB) between userspace and kernel space memory, therefore a correctly functioning BPF program should not access this 4GB memory above the userspace. Let's analyze what this patch does to the following fentry program dereferencing an untrusted pointer: SEC("fentry/tcp_v4_connect") int BPF_PROG(fentry_tcp_v4_connect, struct sock *sk) { *(volatile long *)sk; return 0; } BPF Program before | BPF Program after ------------------ | ----------------- 0: (79) r1 = *(u64 *)(r1 +0) 0: (79) r1 = *(u64 *)(r1 +0) ----------------------------------------------------------------------- 1: (79) r1 = *(u64 *)(r1 +0) --\ 1: (bf) r11 = r1 ----------------------------\ \ 2: (77) r11 >>= 32 2: (b7) r0 = 0 \ \ 3: (b5) if r11 <= 0x8000 goto pc+2 3: (95) exit \ \-> 4: (79) r1 = *(u64 *)(r1 +0) \ 5: (05) goto pc+1 \ 6: (b7) r1 = 0 \-------------------------------------- 7: (b7) r0 = 0 8: (95) exit As you can see from above, in the best case (off=0), 5 extra instructions are emitted. Now, we analyze the same program after it has gone through the JITs of ARM64 and RISC-V architectures. We follow the single load instruction that has the untrusted pointer and see what instrumentation has been added around it. x86-64 JIT ========== JIT's Instrumentation (upstream) --------------------- 0: nopl 0x0(%rax,%rax,1) 5: xchg %ax,%ax 7: push %rbp 8: mov %rsp,%rbp b: mov 0x0(%rdi),%rdi --------------------------------- f: movabs $0x800000000000,%r11 19: cmp %r11,%rdi 1c: jb 0x000000000000002a 1e: mov %rdi,%r11 21: add $0x0,%r11 28: jae 0x000000000000002e 2a: xor %edi,%edi 2c: jmp 0x0000000000000032 2e: mov 0x0(%rdi),%rdi --------------------------------- 32: xor %eax,%eax 34: leave 35: ret The x86-64 JIT already emits some instructions to protect against user memory access. This patch doesn't make any changes for the x86-64 JIT. ARM64 JIT ========= No Intrumentation Verifier's Instrumentation (upstream) (This patch) ----------------- -------------------------- 0: add x9, x30, #0x0 0: add x9, x30, #0x0 4: nop 4: nop 8: paciasp 8: paciasp c: stp x29, x30, [sp, #-16]! c: stp x29, x30, [sp, #-16]! 10: mov x29, sp 10: mov x29, sp 14: stp x19, x20, [sp, #-16]! 14: stp x19, x20, [sp, #-16]! 18: stp x21, x22, [sp, #-16]! 18: stp x21, x22, [sp, #-16]! 1c: stp x25, x26, [sp, #-16]! 1c: stp x25, x26, [sp, #-16]! 20: stp x27, x28, [sp, #-16]! 20: stp x27, x28, [sp, #-16]! 24: mov x25, sp 24: mov x25, sp 28: mov x26, #0x0 28: mov x26, #0x0 2c: sub x27, x25, #0x0 2c: sub x27, x25, #0x0 30: sub sp, sp, #0x0 30: sub sp, sp, #0x0 34: ldr x0, [x0] 34: ldr x0, [x0] -------------------------------------------------------------------------------- 38: ldr x0, [x0] ----------\ 38: add x9, x0, #0x0 -----------------------------------\\ 3c: lsr x9, x9, #32 3c: mov x7, #0x0 \\ 40: cmp x9, #0x10, lsl #12 40: mov sp, sp \\ 44: b.ls 0x0000000000000050 44: ldp x27, x28, [sp], #16 \\--> 48: ldr x0, [x0] 48: ldp x25, x26, [sp], #16 \ 4c: b 0x0000000000000054 4c: ldp x21, x22, [sp], #16 \ 50: mov x0, #0x0 50: ldp x19, x20, [sp], #16 \--------------------------------------- 54: ldp x29, x30, [sp], #16 54: mov x7, #0x0 58: add x0, x7, #0x0 58: mov sp, sp 5c: autiasp 5c: ldp x27, x28, [sp], #16 60: ret 60: ldp x25, x26, [sp], #16 64: nop 64: ldp x21, x22, [sp], #16 68: ldr x10, 0x0000000000000070 68: ldp x19, x20, [sp], #16 6c: br x10 6c: ldp x29, x30, [sp], #16 70: add x0, x7, #0x0 74: autiasp 78: ret 7c: nop 80: ldr x10, 0x0000000000000088 84: br x10 There are 6 extra instructions added in ARM64 in the best case. This will become 7 in the worst case (off != 0). RISC-V JIT (RISCV_ISA_C Disabled) ========== No Intrumentation Verifier's Instrumentation (upstream) (This patch) ----------------- -------------------------- 0: nop 0: nop 4: nop 4: nop 8: li a6, 33 8: li a6, 33 c: addi sp, sp, -16 c: addi sp, sp, -16 10: sd s0, 8(sp) 10: sd s0, 8(sp) 14: addi s0, sp, 16 14: addi s0, sp, 16 18: ld a0, 0(a0) 18: ld a0, 0(a0) --------------------------------------------------------------- 1c: ld a0, 0(a0) --\ 1c: mv t0, a0 --------------------------\ \ 20: srli t0, t0, 32 20: li a5, 0 \ \ 24: lui t1, 4096 24: ld s0, 8(sp) \ \ 28: sext.w t1, t1 28: addi sp, sp, 16 \ \ 2c: bgeu t1, t0, 12 2c: sext.w a0, a5 \ \--> 30: ld a0, 0(a0) 30: ret \ 34: j 8 \ 38: li a0, 0 \------------------------------ 3c: li a5, 0 40: ld s0, 8(sp) 44: addi sp, sp, 16 48: sext.w a0, a5 4c: ret There are 7 extra instructions added in RISC-V. Fixes: 800834285361 ("bpf, arm64: Add BPF exception tables") Reported-by: Breno Leitao <leitao@debian.org> Suggested-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Link: https://lore.kernel.org/r/20240424100210.11982-2-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
12 daysMerge tag 'imx-fixes-6.9-2' of ↵Arnd Bergmann2-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into for-next i.MX fixes for 6.9, round 2: - Fix i.MX8MP the second CSI2 assigned-clock property which got wrong by commit f78835d1e616 ("arm64: dts: imx8mp: reparent MEDIA_MIPI_PHY1_REF to CLK_24M") - Correct USB over-current polarity for imx6ull-tarragon board * tag 'imx-fixes-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: ARM: dts: imx6ull-tarragon: fix USB over-current polarity arm64: dts: imx8mp: Fix assigned-clocks for second CSI2 Link: https://lore.kernel.org/r/ZioopqscxwUOwQkf@dragon Signed-off-by: Arnd Bergmann <arnd@arndb.de>
12 daysMerge tag 'mtk-dts64-fixes-for-v6.9' of ↵Arnd Bergmann12-41/+70
https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux into for-next MediaTek ARM64 DTS fixes for v6.9 This fixes some dts validation issues against bindings for multiple SoCs, GPU voltage constraints for Chromebook devices, missing gce-client-reg on various nodes (performance issues) on MT8183/92/95, and also fixes boot issues on MT8195 when SPMI is built as module. * tag 'mtk-dts64-fixes-for-v6.9' of https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux: arm64: dts: mediatek: mt2712: fix validation errors arm64: dts: mediatek: mt7986: prefix BPI-R3 cooling maps with "map-" arm64: dts: mediatek: mt7986: drop invalid thermal block clock arm64: dts: mediatek: mt7986: drop "#reset-cells" from Ethernet controller arm64: dts: mediatek: mt7986: drop invalid properties from ethsys arm64: dts: mediatek: mt7622: drop "reset-names" from thermal block arm64: dts: mediatek: mt7622: fix ethernet controller "compatible" arm64: dts: mediatek: mt7622: fix IR nodename arm64: dts: mediatek: mt7622: fix clock controllers arm64: dts: mediatek: mt8186-corsola: Update min voltage constraint for Vgpu arm64: dts: mediatek: mt8183-kukui: Use default min voltage for MT6358 arm64: dts: mediatek: mt8195-cherry: Update min voltage constraint for MT6315 arm64: dts: mediatek: mt8192-asurada: Update min voltage constraint for MT6315 arm64: dts: mediatek: cherry: Describe CPU supplies arm64: dts: mediatek: mt8195: Add missing gce-client-reg to mutex1 arm64: dts: mediatek: mt8195: Add missing gce-client-reg to mutex arm64: dts: mediatek: mt8195: Add missing gce-client-reg to vpp/vdosys arm64: dts: mediatek: mt8192: Add missing gce-client-reg to mutex arm64: dts: mediatek: mt8183: Add power-domains properity to mfgcfg
12 daysMerge tag 'at91-fixes-6.9' of ↵Arnd Bergmann2-8/+8
https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into for-next AT91 fixes for 6.9 It contains: - fixes for regulator nodes on SAMA7G5 based boards: proper DT property is used to setup regulators suspend voltage. * tag 'at91-fixes-6.9' of https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux: ARM: dts: microchip: at91-sama7g54_curiosity: Replace regulator-suspend-voltage with the valid property ARM: dts: microchip: at91-sama7g5ek: Replace regulator-suspend-voltage with the valid property Link: https://lore.kernel.org/r/20240421124824.960096-1-claudiu.beznea@tuxon.dev Signed-off-by: Arnd Bergmann <arnd@arndb.de>
12 daysMerge tag 'qcom-arm64-fixes-for-6.9' of ↵Arnd Bergmann10-38/+31
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into for-next Qualcomm Arm64 DeviceTree fixes for v6.9 This corrects the watchdog IRQ flags for a number of remoteproc instances, which otherwise prevents the driver from probe in the face of a probe deferral. Improvements in other areas, such as USB, have made it possible for CX rail voltage on SC8280XP to be lowered, no longer meeting requirements of active PCIe controllers. Necessary votes are added to these controllers. The MSI definitions for PCIe controllers in SM8450, SM8550, and SM8650 was incorrect, due to a bug in the driver. As this has now been fixed the definition needs to be corrected. Lastly, the SuperSpeed PHY irq of the second USB controller in SC8180x, and the compatible string for X1 Elite domain idle states are corrected. * tag 'qcom-arm64-fixes-for-6.9' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: arm64: dts: qcom: sc8180x: Fix ss_phy_irq for secondary USB controller arm64: dts: qcom: sm8650: Fix the msi-map entries arm64: dts: qcom: sm8550: Fix the msi-map entries arm64: dts: qcom: sm8450: Fix the msi-map entries arm64: dts: qcom: sc8280xp: add missing PCIe minimum OPP arm64: dts: qcom: x1e80100: Fix the compatible for cluster idle states arm64: dts: qcom: Fix type of "wdog" IRQs for remoteprocs Link: https://lore.kernel.org/r/20240420161002.1132240-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
12 daysMerge branch 'v6.9-armsoc/dtsfixes' of ↵Arnd Bergmann11-17/+60
git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into for-next * 'v6.9-armsoc/dtsfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: arm64: dts: rockchip: Fix USB interface compatible string on kobol-helios64 arm64: dts: rockchip: regulator for sd needs to be always on for BPI-R2Pro dt-bindings: rockchip: grf: Add missing type to 'pcie-phy' node arm64: dts: rockchip: drop redundant disable-gpios in Lubancat 2 arm64: dts: rockchip: drop redundant disable-gpios in Lubancat 1 arm64: dts: rockchip: drop redundant pcie-reset-suspend in Scarlet Dumo arm64: dts: rockchip: mark system power controller and fix typo on orangepi-5-plus arm64: dts: rockchip: Designate the system power controller on QuartzPro64 arm64: dts: rockchip: drop panel port unit address in GRU Scarlet arm64: dts: rockchip: Remove unsupported node from the Pinebook Pro dts arm64: dts: rockchip: Fix the i2c address of es8316 on Cool Pi CM5 arm64: dts: rockchip: add regulators for PCIe on RK3399 Puma Haikou arm64: dts: rockchip: enable internal pull-up on PCIE_WAKE# for RK3399 Puma arm64: dts: rockchip: enable internal pull-up on Q7_USB_ID for RK3399 Puma arm64: dts: rockchip: fix alphabetical ordering RK3399 puma arm64: dts: rockchip: enable internal pull-up for Q7_THRM# on RK3399 Puma arm64: dts: rockchip: set PHY address of MT7531 switch to 0x1f Link: https://lore.kernel.org/r/3413596.CbtlEUcBR6@phil Signed-off-by: Arnd Bergmann <arnd@arndb.de>
12 dayss390/vdso: Add CFI for RA register to asm macro vdso_funcJens Remus2-0/+3
The return-address (RA) register r14 is specified as volatile in the s390x ELF ABI [1]. Nevertheless proper CFI directives must be provided for an unwinder to restore the return address, if the RA register value is changed from its value at function entry, as it is the case. [1]: s390x ELF ABI, https://github.com/IBM/s390x-abi/releases Fixes: 4bff8cb54502 ("s390: convert to GENERIC_VDSO") Signed-off-by: Jens Remus <jremus@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
13 daysriscv: T-Head: Test availability bit before enabling MAE errataChristoph Müllner1-4/+10
T-Head's memory attribute extension (XTheadMae) (non-compatible equivalent of RVI's Svpbmt) is currently assumed for all T-Head harts. However, QEMU recently decided to drop acceptance of guests that write reserved bits in PTEs. As XTheadMae uses reserved bits in PTEs and Linux applies the MAE errata for all T-Head harts, this broke the Linux startup on QEMU emulations of the C906 emulation. This patch attempts to address this issue by testing the MAE-enable bit in the th.sxstatus CSR. This CSR is available in HW and can be emulated in QEMU. This patch also makes the XTheadMae probing mechanism reliable, because a test for the right combination of mvendorid, marchid, and mimpid is not sufficient to enable MAE. Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Link: https://lore.kernel.org/r/20240407213236.2121592-3-christoph.muellner@vrull.eu Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
13 daysriscv: thead: Rename T-Head PBMT to MAEChristoph Müllner3-19/+19
T-Head's vendor extension to set page attributes has the name MAE (memory attribute extension). Let's rename it, so it is clear what this referes to. Link: https://github.com/T-head-Semi/thead-extension-spec/blob/master/xtheadmae.adoc Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Link: https://lore.kernel.org/r/20240407213236.2121592-2-christoph.muellner@vrull.eu Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
13 daysLoongArch: Lately init pmu after smp is onlineBibo Mao1-1/+1
There is an smp function call named reset_counters() to init PMU registers of every CPU in PMU initialization state. It requires that all CPUs are online. However there is an early_initcall() wrapper for the PMU init funciton init_hw_perf_events(), so that pmu init funciton is called in do_pre_smp_initcalls() which before function smp_init(). Function reset_counters() cannot work on other CPUs since they haven't boot up still. Here replace the wrapper early_initcall() with pure_initcall(), so that the PMU init function is called after every cpu is online. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
13 dayscpu: Ignore "mitigations" kernel parameter if CPU_MITIGATIONS=nSean Christopherson1-2/+6
Explicitly disallow enabling mitigations at runtime for kernels that were built with CONFIG_CPU_MITIGATIONS=n, as some architectures may omit code entirely if mitigations are disabled at compile time. E.g. on x86, a large pile of Kconfigs are buried behind CPU_MITIGATIONS, and trying to provide sane behavior for retroactively enabling mitigations is extremely difficult, bordering on impossible. E.g. page table isolation and call depth tracking require build-time support, BHI mitigations will still be off without additional kernel parameters, etc. [ bp: Touchups. ] Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240420000556.2645001-3-seanjc@google.com
13 dayscpu: Re-enable CPU mitigations by default for !X86 architecturesSean Christopherson2-5/+14
Rename x86's to CPU_MITIGATIONS, define it in generic code, and force it on for all architectures exception x86. A recent commit to turn mitigations off by default if SPECULATION_MITIGATIONS=n kinda sorta missed that "cpu_mitigations" is completely generic, whereas SPECULATION_MITIGATIONS is x86-specific. Rename x86's SPECULATIVE_MITIGATIONS instead of keeping both and have it select CPU_MITIGATIONS, as having two configs for the same thing is unnecessary and confusing. This will also allow x86 to use the knob to manage mitigations that aren't strictly related to speculative execution. Use another Kconfig to communicate to common code that CPU_MITIGATIONS is already defined instead of having x86's menu depend on the common CPU_MITIGATIONS. This allows keeping a single point of contact for all of x86's mitigations, and it's not clear that other architectures *want* to allow disabling mitigations at compile-time. Fixes: f337a6a21e2f ("x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n") Closes: https://lkml.kernel.org/r/20240413115324.53303a68%40canb.auug.org.au Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240420000556.2645001-2-seanjc@google.com
13 daysARM: dts: imx6ull-tarragon: fix USB over-current polarityMichael Heimpold1-0/+1
Our Tarragon platform uses a active-low signal to inform the i.MX6ULL about the over-current detection. Fixes: 5e4f393ccbf0 ("ARM: dts: imx6ull: Add chargebyte Tarragon support") Signed-off-by: Michael Heimpold <michael.heimpold@chargebyte.com> Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
14 daysKVM: arm64: vgic-v2: Check for non-NULL vCPU in vgic_v2_parse_attr()Oliver Upton1-4/+4
vgic_v2_parse_attr() is responsible for finding the vCPU that matches the user-provided CPUID, which (of course) may not be valid. If the ID is invalid, kvm_get_vcpu_by_id() returns NULL, which isn't handled gracefully. Similar to the GICv3 uaccess flow, check that kvm_get_vcpu_by_id() actually returns something and fail the ioctl if not. Cc: stable@vger.kernel.org Fixes: 7d450e282171 ("KVM: arm/arm64: vgic-new: Add userland access to VGIC dist registers") Reported-by: Alexander Potapenko <glider@google.com> Tested-by: Alexander Potapenko <glider@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240424173959.3776798-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
14 daysx86/tdx: Preserve shared bit on mprotect()Kirill A. Shutemov2-1/+3
The TDX guest platform takes one bit from the physical address to indicate if the page is shared (accessible by VMM). This bit is not part of the physical_mask and is not preserved during mprotect(). As a result, the 'shared' bit is lost during mprotect() on shared mappings. _COMMON_PAGE_CHG_MASK specifies which PTE bits need to be preserved during modification. AMD includes 'sme_me_mask' in the define to preserve the 'encrypt' bit. To cover both Intel and AMD cases, include 'cc_mask' in _COMMON_PAGE_CHG_MASK instead of 'sme_me_mask'. Reported-and-tested-by: Chris Oo <cho@microsoft.com> Fixes: 41394e33f3a0 ("x86/tdx: Extend the confidential computing API to support TDX guests") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20240424082035.4092071-1-kirill.shutemov%40linux.intel.com
14 daysx86/cpu: Fix check for RDPKRU in __show_regs()David Kaplan1-1/+1
cpu_feature_enabled(X86_FEATURE_OSPKE) does not necessarily reflect whether CR4.PKE is set on the CPU. In particular, they may differ on non-BSP CPUs before setup_pku() is executed. In this scenario, RDPKRU will #UD causing the system to hang. Fix by checking CR4 for PKE enablement which is always correct for the current CPU. The scenario happens by inserting a WARN* before setup_pku() in identiy_cpu() or some other diagnostic which would lead to calling __show_regs(). [ bp: Massage commit message. ] Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240421191728.32239-1-bp@kernel.org
14 daysx86/CPU/AMD: Add models 0x10-0x1f to the Zen5 rangeWenkuan Wang1-2/+1
Add some more Zen5 models. Fixes: 3e4147f33f8b ("x86/CPU/AMD: Add X86_FEATURE_ZEN5") Signed-off-by: Wenkuan Wang <Wenkuan.Wang@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240423144111.1362-1-bp@kernel.org
2024-04-24LoongArch: Fix callchain parse error with kernel tracepoint eventsHuacai Chen1-0/+8
In order to fix perf's callchain parse error for LoongArch, we implement perf_arch_fetch_caller_regs() which fills several necessary registers used for callchain unwinding, including sp, fp, and era. This is similar to the following commits. commit b3eac0265bf6: ("arm: perf: Fix callchain parse error with kernel tracepoint events") commit 5b09a094f2fb: ("arm64: perf: Fix callchain parse error with kernel tracepoint events") commit 9a7e8ec0d4cc: ("riscv: perf: Fix callchain parse error with kernel tracepoint events") Test with commands: perf record -e sched:sched_switch -g --call-graph dwarf perf report Without this patch: Children Self Command Shared Object Symbol ........ ........ ............. ................. .................... 43.41% 43.41% swapper [unknown] [k] 0000000000000000 10.94% 10.94% loong-container [unknown] [k] 0000000000000000 | |--5.98%--0x12006ba38 | |--2.56%--0x12006bb84 | --2.40%--0x12006b6b8 With this patch, callchain can be parsed correctly: Children Self Command Shared Object Symbol ........ ........ ............. ................. .................... 47.57% 47.57% swapper [kernel.vmlinux] [k] __schedule | ---__schedule 26.76% 26.76% loong-container [kernel.vmlinux] [k] __schedule | |--13.78%--0x12006ba38 | | | |--9.19%--__schedule | | | --4.59%--handle_syscall | do_syscall | sys_futex | do_futex | futex_wait | futex_wait_queue_me | hrtimer_start_range_ns | __schedule | |--8.38%--0x12006bb84 | handle_syscall | do_syscall | sys_epoll_pwait | do_epoll_wait | schedule_hrtimeout_range_clock | hrtimer_start_range_ns | __schedule | --4.59%--0x12006b6b8 handle_syscall do_syscall sys_nanosleep hrtimer_nanosleep do_nanosleep hrtimer_start_range_ns __schedule Cc: stable@vger.kernel.org Fixes: b37042b2bb7cd751f0 ("LoongArch: Add perf events support") Reported-by: Youling Tang <tangyouling@kylinos.cn> Suggested-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-04-24LoongArch: Fix access error when read fault on a write-only VMAJiantao Shan1-2/+2
As with most architectures, allow handling of read faults in VMAs that have VM_WRITE but without VM_READ (WRITE implies READ). Otherwise, reading before writing a write-only memory will error while reading after writing everything is fine. BTW, move the VM_EXEC judgement before VM_READ/VM_WRITE to make logic a little clearer. Cc: stable@vger.kernel.org Fixes: 09cfefb7fa70c3af01 ("LoongArch: Add memory management") Signed-off-by: Jiantao Shan <shanjiantao@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-04-24LoongArch: Fix a build error due to __tlb_remove_tlb_entry()David Hildenbrand1-2/+0
With LLVM=1 and W=1 we get: ./include/asm-generic/tlb.h:629:10: error: parameter 'ptep' set but not used [-Werror,-Wunused-but-set-parameter] We fixed a similar issue via Arnd in the introducing commit, missed the LoongArch variant. Turns out, there is no need for LoongArch to have a custom variant, so let's just drop it and rely on the asm-generic one. Fixes: 4d5bf0b6183f ("mm/mmu_gather: add tlb_remove_tlb_entries()") Closes: https://lkml.kernel.org/r/CANiq72mQh3O9S4umbvrKBgMMorty48UMwS01U22FR0mRyd3cyQ@mail.gmail.com Reported-by: Miguel Ojeda <ojeda@kernel.org> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Tested-by: Miguel Ojeda <ojeda@kernel.org> Tested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-04-24LoongArch: Fix Kconfig item and left code related to CRASH_COREBaoquan He2-3/+3
In commit 85fcde402db191b5 ("kexec: split crashkernel reservation code out from crash_core.c"), crashkernel reservation code is split out from crash_core.c, and add CRASH_RESERVE to control it. And also rename each ARCH's <asm/crash_core.h> to <asm/crash_reserve.h> accordingly. But the relevant part in LoongArch is missed. Do it now. Fixes: 85fcde402db1 ("kexec: split crashkernel reservation code out from crash_core.c") Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-04-23riscv: hwprobe: fix invalid sign extension for RISCV_HWPROBE_EXT_ZVFHMINClément Léger1-1/+1
The current definition yields a negative 32bits signed value which result in a mask with is obviously incorrect. Replace it by using a 1ULL bit shift value to obtain a single set bit mask. Fixes: 5dadda5e6a59 ("riscv: hwprobe: export Zvfh[min] ISA extensions") Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240409143839.558784-1-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-23powerpc/pseries/iommu: LPAR panics during boot up with a frozen PEGaurav Batra1-0/+8
At the time of LPAR boot up, partition firmware provides Open Firmware property ibm,dma-window for the PE. This property is provided on the PCI bus the PE is attached to. There are execptions where the partition firmware might not provide this property for the PE at the time of LPAR boot up. One of the scenario is where the firmware has frozen the PE due to some error condition. This PE is frozen for 24 hours or unless the whole system is reinitialized. Within this time frame, if the LPAR is booted, the frozen PE will be presented to the LPAR but ibm,dma-window property could be missing. Today, under these circumstances, the LPAR oopses with NULL pointer dereference, when configuring the PCI bus the PE is attached to. BUG: Kernel NULL pointer dereference on read at 0x000000c8 Faulting instruction address: 0xc0000000001024c0 Oops: Kernel access of bad area, sig: 7 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: Supported: Yes CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.4.0-150600.9-default #1 Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_023) hv:phyp pSeries NIP: c0000000001024c0 LR: c0000000001024b0 CTR: c000000000102450 REGS: c0000000037db5c0 TRAP: 0300 Not tainted (6.4.0-150600.9-default) MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 28000822 XER: 00000000 CFAR: c00000000010254c DAR: 00000000000000c8 DSISR: 00080000 IRQMASK: 0 ... NIP [c0000000001024c0] pci_dma_bus_setup_pSeriesLP+0x70/0x2a0 LR [c0000000001024b0] pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 Call Trace: pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 (unreliable) pcibios_setup_bus_self+0x1c0/0x370 __of_scan_bus+0x2f8/0x330 pcibios_scan_phb+0x280/0x3d0 pcibios_init+0x88/0x12c do_one_initcall+0x60/0x320 kernel_init_freeable+0x344/0x3e4 kernel_init+0x34/0x1d0 ret_from_kernel_user_thread+0x14/0x1c Fixes: b1fc44eaa9ba ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window") Signed-off-by: Gaurav Batra <gbatra@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240422205141.10662-1-gbatra@linux.ibm.com
2024-04-22x86/sev: Check for MWAITX and MONITORX opcodes in the #VC handlerTom Lendacky1-2/+4
The MWAITX and MONITORX instructions generate the same #VC error code as the MWAIT and MONITOR instructions, respectively. Update the #VC handler opcode checking to also support the MWAITX and MONITORX opcodes. Fixes: e3ef461af35a ("x86/sev: Harden #VC instruction emulation somewhat") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/453d5a7cfb4b9fe818b6fb67f93ae25468bc9e23.1713793161.git.thomas.lendacky@amd.com
2024-04-22arm32, bpf: Reimplement sign-extension mov instructionPuranjay Mohan1-13/+43
The current implementation of the mov instruction with sign extension has the following problems: 1. It clobbers the source register if it is not stacked because it sign extends the source and then moves it to the destination. 2. If the dst_reg is stacked, the current code doesn't write the value back in case of 64-bit mov. 3. There is room for improvement by emitting fewer instructions. The steps for fixing this and the instructions emitted by the JIT are explained below with examples in all combinations: Case A: offset == 32: ===================== Case A.1: src and dst are stacked registers: -------------------------------------------- 1. Load src_lo into tmp_lo 2. Store tmp_lo into dst_lo 3. Sign extend tmp_lo into tmp_hi 4. Store tmp_hi to dst_hi Example: r3 = (s32)r3 r3 is a stacked register ldr r6, [r11, #-16] // Load r3_lo into tmp_lo // str to dst_lo is not emitted because src_lo == dst_lo asr r7, r6, #31 // Sign extend tmp_lo into tmp_hi str r7, [r11, #-12] // Store tmp_hi into r3_hi Case A.2: src is stacked but dst is not: ---------------------------------------- 1. Load src_lo into dst_lo 2. Sign extend dst_lo into dst_hi Example: r6 = (s32)r3 r6 maps to {ARM_R5, ARM_R4} and r3 is stacked ldr r4, [r11, #-16] // Load r3_lo into r6_lo asr r5, r4, #31 // Sign extend r6_lo into r6_hi Case A.3: src is not stacked but dst is stacked: ------------------------------------------------ 1. Store src_lo into dst_lo 2. Sign extend src_lo into tmp_hi 3. Store tmp_hi to dst_hi Example: r3 = (s32)r6 r3 is stacked and r6 maps to {ARM_R5, ARM_R4} str r4, [r11, #-16] // Store r6_lo to r3_lo asr r7, r4, #31 // Sign extend r6_lo into tmp_hi str r7, [r11, #-12] // Store tmp_hi to dest_hi Case A.4: Both src and dst are not stacked: ------------------------------------------- 1. Mov src_lo into dst_lo 2. Sign extend src_lo into dst_hi Example: (bf) r6 = (s32)r6 r6 maps to {ARM_R5, ARM_R4} // Mov not emitted because dst == src asr r5, r4, #31 // Sign extend r6_lo into r6_hi Case B: offset != 32: ===================== Case B.1: src and dst are stacked registers: -------------------------------------------- 1. Load src_lo into tmp_lo 2. Sign extend tmp_lo according to offset. 3. Store tmp_lo into dst_lo 4. Sign extend tmp_lo into tmp_hi 5. Store tmp_hi to dst_hi Example: r9 = (s8)r3 r9 and r3 are both stacked registers ldr r6, [r11, #-16] // Load r3_lo into tmp_lo lsl r6, r6, #24 // Sign extend tmp_lo asr r6, r6, #24 // .. str r6, [r11, #-56] // Store tmp_lo to r9_lo asr r7, r6, #31 // Sign extend tmp_lo to tmp_hi str r7, [r11, #-52] // Store tmp_hi to r9_hi Case B.2: src is stacked but dst is not: ---------------------------------------- 1. Load src_lo into dst_lo 2. Sign extend dst_lo according to offset. 3. Sign extend tmp_lo into dst_hi Example: r6 = (s8)r3 r6 maps to {ARM_R5, ARM_R4} and r3 is stacked ldr r4, [r11, #-16] // Load r3_lo to r6_lo lsl r4, r4, #24 // Sign extend r6_lo asr r4, r4, #24 // .. asr r5, r4, #31 // Sign extend r6_lo into r6_hi Case B.3: src is not stacked but dst is stacked: ------------------------------------------------ 1. Sign extend src_lo into tmp_lo according to offset. 2. Store tmp_lo into dst_lo. 3. Sign extend src_lo into tmp_hi. 4. Store tmp_hi to dst_hi. Example: r3 = (s8)r1 r3 is stacked and r1 maps to {ARM_R3, ARM_R2} lsl r6, r2, #24 // Sign extend r1_lo to tmp_lo asr r6, r6, #24 // .. str r6, [r11, #-16] // Store tmp_lo to r3_lo asr r7, r6, #31 // Sign extend tmp_lo to tmp_hi str r7, [r11, #-12] // Store tmp_hi to r3_hi Case B.4: Both src and dst are not stacked: ------------------------------------------- 1. Sign extend src_lo into dst_lo according to offset. 2. Sign extend dst_lo into dst_hi. Example: r6 = (s8)r1 r6 maps to {ARM_R5, ARM_R4} and r1 maps to {ARM_R3, ARM_R2} lsl r4, r2, #24 // Sign extend r1_lo to r6_lo asr r4, r4, #24 // .. asr r5, r4, #31 // Sign extend r6_lo to r6_hi Fixes: fc832653fa0d ("arm32, bpf: add support for sign-extension mov instruction") Reported-by: syzbot+186522670e6722692d86@syzkaller.appspotmail.com Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Closes: https://lore.kernel.org/all/000000000000e9a8d80615163f2a@google.com Link: https://lore.kernel.org/bpf/20240419182832.27707-1-puranjay@kernel.org
2024-04-22powerpc/pseries: make max polling consistent for longer H_CALLsNayna Jain2-8/+7
Currently, plpks_confirm_object_flushed() function polls for 5msec in total instead of 5sec. Keep max polling time consistent for all the H_CALLs, which take longer than expected, to be 5sec. Also, make use of fsleep() everywhere to insert delay. Reported-by: Nageswara R Sastry <rnsastry@linux.ibm.com> Fixes: 2454a7af0f2a ("powerpc/pseries: define driver for Platform KeyStore") Signed-off-by: Nayna Jain <nayna@linux.ibm.com> Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240418031230.170954-1-nayna@linux.ibm.com
2024-04-22s390/mm: Fix clearing storage keys for huge pagesClaudio Imbrenda1-1/+1
The function __storage_key_init_range() expects the end address to be the first byte outside the range to be initialized. I.e. end - start should be the size of the area to be initialized. The current code works because __storage_key_init_range() will still loop over every page in the range, but it is slower than using sske_frame(). Fixes: 3afdfca69870 ("s390/mm: Clear skeys for newly mapped huge guest pmds") Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20240416114220.28489-3-imbrenda@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-04-22s390/mm: Fix storage key clearing for guest huge pagesClaudio Imbrenda1-1/+1
The function __storage_key_init_range() expects the end address to be the first byte outside the range to be initialized. I.e. end - start should be the size of the area to be initialized. The current code works because __storage_key_init_range() will still loop over every page in the range, but it is slower than using sske_frame(). Fixes: 964c2c05c9f3 ("s390/mm: Clear huge page storage keys on enable_skey") Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20240416114220.28489-2-imbrenda@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-04-22arm64: dts: imx8mp: Fix assigned-clocks for second CSI2Marek Vasut1-1/+1
The first CSI2 pixel clock are supplied from IMX8MP_CLK_MEDIA_CAM1_PIX_ROOT, the second CSI2 pixel clock are supplied from IMX8MP_CLK_MEDIA_CAM2_PIX_ROOT, both clock are supplied from SYS_PLL2 and configured using assigned-clock DT properties. Each CSI2 DT node configures its IMX8MP_CLK_MEDIA_CAMn_PIX_ROOT clock. This used to be the case until likely a copy-paste error in commit f78835d1e616 ("arm64: dts: imx8mp: reparent MEDIA_MIPI_PHY1_REF to CLK_24M") which changed the second CSI2 node to configure IMX8MP_CLK_MEDIA_CAM1_PIX_ROOT using its assigned-clocks property. Fix the second CSI2 assigned-clock property back to the original correct IMX8MP_CLK_MEDIA_CAM2_PIX_ROOT . Fixes: f78835d1e616 ("arm64: dts: imx8mp: reparent MEDIA_MIPI_PHY1_REF to CLK_24M") Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2024-04-21Merge tag 'sched_urgent_for_v6.9_rc5' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Borislav Petkov: - Add a missing memory barrier in the concurrency ID mm switching * tag 'sched_urgent_for_v6.9_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Add missing memory barrier in switch_mm_cid
2024-04-21Merge tag 'x86_urgent_for_v6.9_rc5' of ↵Linus Torvalds5-12/+87
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Fix CPU feature dependencies of GFNI, VAES, and VPCLMULQDQ - Print the correct error code when FRED reports a bad event type - Add a FRED-specific INT80 handler without the special dances that need to happen in the current one - Enable the using-the-default-return-thunk-but-you-should-not warning only on configs which actually enable those special return thunks - Check the proper feature flags when selecting BHI retpoline mitigation * tag 'x86_urgent_for_v6.9_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpufeatures: Fix dependencies for GFNI, VAES, and VPCLMULQDQ x86/fred: Fix incorrect error code printout in fred_bad_type() x86/fred: Fix INT80 emulation for FRED x86/retpolines: Enable the default thunk warning only on relevant configs x86/bugs: Fix BHI retpoline check
2024-04-21ARM: dts: microchip: at91-sama7g54_curiosity: Replace ↵Andrei Simion1-4/+4
regulator-suspend-voltage with the valid property By checking the pmic node with microchip,mcp16502.yaml# 'regulator-suspend-voltage' does not match any of the regexes 'pinctrl-[0-9]+' from schema microchip,mcp16502.yaml# which inherits regulator.yaml#. So replace regulator-suspend-voltage with regulator-suspend-microvolt to avoid the inconsitency. Fixes: ebd6591f8ddb ("ARM: dts: microchip: sama7g54_curiosity: Add initial device tree of the board") Signed-off-by: Andrei Simion <andrei.simion@microchip.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Link: https://lore.kernel.org/r/20240404123824.19182-3-andrei.simion@microchip.com Signed-off-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
2024-04-21ARM: dts: microchip: at91-sama7g5ek: Replace regulator-suspend-voltage with ↵Andrei Simion1-4/+4
the valid property By checking the pmic node with microchip,mcp16502.yaml# 'regulator-suspend-voltage' does not match any of the regexes 'pinctrl-[0-9]+' from schema microchip,mcp16502.yaml# which inherits regulator.yaml#. So replace regulator-suspend-voltage with regulator-suspend-microvolt to avoid the inconsitency. Fixes: 85b1304b9daa ("ARM: dts: at91: sama7g5ek: set regulator voltages for standby state") Signed-off-by: Andrei Simion <andrei.simion@microchip.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Link: https://lore.kernel.org/r/20240404123824.19182-2-andrei.simion@microchip.com [claudiu.beznea: added a dot before starting the last sentence in commit description] Signed-off-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
2024-04-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds18-113/+157
Pull kvm fixes from Paolo Bonzini: "This is a bit on the large side, mostly due to two changes: - Changes to disable some broken PMU virtualization (see below for details under "x86 PMU") - Clean up SVM's enter/exit assembly code so that it can be compiled without OBJECT_FILES_NON_STANDARD. This fixes a warning "Unpatched return thunk in use. This should not happen!" when running KVM selftests. Everything else is small bugfixes and selftest changes: - Fix a mostly benign bug in the gfn_to_pfn_cache infrastructure where KVM would allow userspace to refresh the cache with a bogus GPA. The bug has existed for quite some time, but was exposed by a new sanity check added in 6.9 (to ensure a cache is either GPA-based or HVA-based). - Drop an unused param from gfn_to_pfn_cache_invalidate_start() that got left behind during a 6.9 cleanup. - Fix a math goof in x86's hugepage logic for KVM_SET_MEMORY_ATTRIBUTES that results in an array overflow (detected by KASAN). - Fix a bug where KVM incorrectly clears root_role.direct when userspace sets guest CPUID. - Fix a dirty logging bug in the where KVM fails to write-protect SPTEs used by a nested guest, if KVM is using Page-Modification Logging and the nested hypervisor is NOT using EPT. x86 PMU: - Drop support for virtualizing adaptive PEBS, as KVM's implementation is architecturally broken without an obvious/easy path forward, and because exposing adaptive PEBS can leak host LBRs to the guest, i.e. can leak host kernel addresses to the guest. - Set the enable bits for general purpose counters in PERF_GLOBAL_CTRL at RESET time, as done by both Intel and AMD processors. - Disable LBR virtualization on CPUs that don't support LBR callstacks, as KVM unconditionally uses PERF_SAMPLE_BRANCH_CALL_STACK when creating the perf event, and would fail on such CPUs. Tests: - Fix a flaw in the max_guest_memory selftest that results in it exhausting the supply of ucall structures when run with more than 256 vCPUs. - Mark KVM_MEM_READONLY as supported for RISC-V in set_memory_region_test" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (30 commits) KVM: Drop unused @may_block param from gfn_to_pfn_cache_invalidate_start() KVM: selftests: Add coverage of EPT-disabled to vmx_dirty_log_test KVM: x86/mmu: Fix and clarify comments about clearing D-bit vs. write-protecting KVM: x86/mmu: Remove function comments above clear_dirty_{gfn_range,pt_masked}() KVM: x86/mmu: Write-protect L2 SPTEs in TDP MMU when clearing dirty status KVM: x86/mmu: Precisely invalidate MMU root_role during CPUID update KVM: VMX: Disable LBR virtualization if the CPU doesn't support LBR callstacks perf/x86/intel: Expose existence of callback support to KVM KVM: VMX: Snapshot LBR capabilities during module initialization KVM: x86/pmu: Do not mask LVTPC when handling a PMI on AMD platforms KVM: x86: Snapshot if a vCPU's vendor model is AMD vs. Intel compatible KVM: x86: Stop compiling vmenter.S with OBJECT_FILES_NON_STANDARD KVM: SVM: Create a stack frame in __svm_sev_es_vcpu_run() KVM: SVM: Save/restore args across SEV-ES VMRUN via host save area KVM: SVM: Save/restore non-volatile GPRs in SEV-ES VMRUN via host save area KVM: SVM: Clobber RAX instead of RBX when discarding spec_ctrl_intercepted KVM: SVM: Drop 32-bit "support" from __svm_sev_es_vcpu_run() KVM: SVM: Wrap __svm_sev_es_vcpu_run() with #ifdef CONFIG_KVM_AMD_SEV KVM: SVM: Create a stack frame in __svm_vcpu_run() for unwinding KVM: SVM: Remove a useless zeroing of allocated memory ...
2024-04-20Merge tag 'powerpc-6.9-3' of ↵Linus Torvalds2-5/+10
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix wireguard loading failure on pre-Power10 due to Power10 crypto routines - Fix papr-vpd selftest failure due to missing variable initialization - Avoid unnecessary get/put in spapr_tce_platform_iommu_attach_dev() Thanks to Geetika Moolchandani, Jason Gunthorpe, Michal Suchánek, Nathan Lynch, and Shivaprasad G Bhat. * tag 'powerpc-6.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: selftests/powerpc/papr-vpd: Fix missing variable initialization powerpc/crypto/chacha-p10: Fix failure on non Power10 powerpc/iommu: Refactor spapr_tce_platform_iommu_attach_dev()
2024-04-19Merge tag 'arm64-fixes' of ↵Linus Torvalds3-6/+9
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - Fix a kernel fault during page table walking in huge_pte_alloc() with PTABLE_LEVELS=5 due to using p4d_offset() instead of p4d_alloc() - head.S fix and cleanup to disable the MMU before toggling the HCR_EL2.E2H bit when entering the kernel with the MMU on from the EFI stub. Changing this bit (currently from VHE to nVHE) causes some system registers as well as page table descriptors to be interpreted differently, potentially resulting in spurious MMU faults - Fix translation fault in swsusp_save() accessing MEMBLOCK_NOMAP memory ranges due to kernel_page_present() returning true in most configurations other than rodata_full == true, CONFIG_DEBUG_PAGEALLOC=y or CONFIG_KFENCE=y * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: hibernate: Fix level3 translation fault in swsusp_save() arm64/head: Disable MMU at EL2 before clearing HCR_EL2.E2H arm64/head: Drop unnecessary pre-disable-MMU workaround arm64/hugetlb: Fix page table walk in huge_pte_alloc()
2024-04-19Merge tag 's390-6.9-4' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Alexander Gordeev: - Fix NULL pointer dereference in program check handler - Fake IRBs are important events relevant for problem analysis. Add traces when queueing and delivering - Fix a race condition in ccw_device_set_online() that can cause the online process to fail - Deferred condition code 1 response indicates that I/O was not started and should be retried. The current QDIO implementation handles a cc1 response as an error, resulting in a failed QDIO setup. Fix that by retrying the setup when a cc1 response is received * tag 's390-6.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/mm: Fix NULL pointer dereference s390/cio: log fake IRB events s390/cio: fix race condition during online processing s390/qdio: handle deferred cc1
2024-04-19arm64: hibernate: Fix level3 translation fault in swsusp_save()Yaxiong Tian1-3/+0
On arm64 machines, swsusp_save() faults if it attempts to access MEMBLOCK_NOMAP memory ranges. This can be reproduced in QEMU using UEFI when booting with rodata=off debug_pagealloc=off and CONFIG_KFENCE=n: Unable to handle kernel paging request at virtual address ffffff8000000000 Mem abort info: ESR = 0x0000000096000007 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x07: level 3 translation fault Data abort info: ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 swapper pgtable: 4k pages, 39-bit VAs, pgdp=00000000eeb0b000 [ffffff8000000000] pgd=180000217fff9803, p4d=180000217fff9803, pud=180000217fff9803, pmd=180000217fff8803, pte=0000000000000000 Internal error: Oops: 0000000096000007 [#1] SMP Internal error: Oops: 0000000096000007 [#1] SMP Modules linked in: xt_multiport ipt_REJECT nf_reject_ipv4 xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter bpfilter rfkill at803x snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg dwmac_generic stmmac_platform snd_hda_codec stmmac joydev pcs_xpcs snd_hda_core phylink ppdev lp parport ramoops reed_solomon ip_tables x_tables nls_iso8859_1 vfat multipath linear amdgpu amdxcp drm_exec gpu_sched drm_buddy hid_generic usbhid hid radeon video drm_suballoc_helper drm_ttm_helper ttm i2c_algo_bit drm_display_helper cec drm_kms_helper drm CPU: 0 PID: 3663 Comm: systemd-sleep Not tainted 6.6.2+ #76 Source Version: 4e22ed63a0a48e7a7cff9b98b7806d8d4add7dc0 Hardware name: Greatwall GW-XXXXXX-XXX/GW-XXXXXX-XXX, BIOS KunLun BIOS V4.0 01/19/2021 pstate: 600003c5 (nZCv DAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : swsusp_save+0x280/0x538 lr : swsusp_save+0x280/0x538 sp : ffffffa034a3fa40 x29: ffffffa034a3fa40 x28: ffffff8000001000 x27: 0000000000000000 x26: ffffff8001400000 x25: ffffffc08113e248 x24: 0000000000000000 x23: 0000000000080000 x22: ffffffc08113e280 x21: 00000000000c69f2 x20: ffffff8000000000 x19: ffffffc081ae2500 x18: 0000000000000000 x17: 6666662074736420 x16: 3030303030303030 x15: 3038666666666666 x14: 0000000000000b69 x13: ffffff9f89088530 x12: 00000000ffffffea x11: 00000000ffff7fff x10: 00000000ffff7fff x9 : ffffffc08193f0d0 x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 : 0000000000000001 x5 : ffffffa0fff09dc8 x4 : 0000000000000000 x3 : 0000000000000027 x2 : 0000000000000000 x1 : 0000000000000000 x0 : 000000000000004e Call trace: swsusp_save+0x280/0x538 swsusp_arch_suspend+0x148/0x190 hibernation_snapshot+0x240/0x39c hibernate+0xc4/0x378 state_store+0xf0/0x10c kobj_attr_store+0x14/0x24 The reason is swsusp_save() -> copy_data_pages() -> page_is_saveable() -> kernel_page_present() assuming that a page is always present when can_set_direct_map() is false (all of rodata_full, debug_pagealloc_enabled() and arm64_kfence_can_set_direct_map() false), irrespective of the MEMBLOCK_NOMAP ranges. Such MEMBLOCK_NOMAP regions should not be saved during hibernation. This problem was introduced by changes to the pfn_valid() logic in commit a7d9f306ba70 ("arm64: drop pfn_valid_within() and simplify pfn_valid()"). Similar to other architectures, drop the !can_set_direct_map() check in kernel_page_present() so that page_is_savable() skips such pages. Fixes: a7d9f306ba70 ("arm64: drop pfn_valid_within() and simplify pfn_valid()") Cc: <stable@vger.kernel.org> # 5.14.x Suggested-by: Mike Rapoport <rppt@kernel.org> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Co-developed-by: xiongxin <xiongxin@kylinos.cn> Signed-off-by: xiongxin <xiongxin@kylinos.cn> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Link: https://lore.kernel.org/r/20240417025248.386622-1-tianyaxiong@kylinos.cn [catalin.marinas@arm.com: rework commit message] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-04-18arm64/head: Disable MMU at EL2 before clearing HCR_EL2.E2HArd Biesheuvel1-0/+5
Even though the boot protocol stipulates otherwise, an exception has been made for the EFI stub, and entering the core kernel with the MMU enabled is permitted. This allows a substantial amount of cache maintenance to be elided, wich is significant when fast boot times are critical (e.g., for booting micro-VMs) Once the initial ID map has been populated, the MMU is disabled as part of the logic sequence that puts all system registers into a known state. Any code that needs to execute within the window where the MMU is off is cleaned to the PoC explicitly, which includes all of HYP text when entering at EL2. However, the current sequence of initializing the EL2 system registers is not safe: HCR_EL2 is set to its nVHE initial state before SCTLR_EL2 is reprogrammed, and this means that a VHE-to-nVHE switch may occur while the MMU is enabled. This switch causes some system registers as well as page table descriptors to be interpreted in a different way, potentially resulting in spurious exceptions relating to MMU translation. So disable the MMU explicitly first when entering in EL2 with the MMU and caches enabled. Fixes: 617861703830 ("efi: arm64: enter with MMU and caches enabled") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Cc: <stable@vger.kernel.org> # 6.3.x Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240415075412.2347624-6-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-04-18arm64/head: Drop unnecessary pre-disable-MMU workaroundArd Biesheuvel1-2/+0
The Falkor erratum that results in the need for an ISB before clearing the M bit in SCTLR_ELx only applies to execution at exception level x, and so the workaround is not needed when disabling the EL1 MMU while running at EL2. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20240415075412.2347624-5-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-04-18x86/cpufeatures: Fix dependencies for GFNI, VAES, and VPCLMULQDQEric Biggers1-3/+3
Fix cpuid_deps[] to list the correct dependencies for GFNI, VAES, and VPCLMULQDQ. These features don't depend on AVX512, and there exist CPUs that support these features but not AVX512. GFNI actually doesn't even depend on AVX. This prevents GFNI from being unnecessarily disabled if AVX is disabled to mitigate the GDS vulnerability. This also prevents all three features from being unnecessarily disabled if AVX512VL (or its dependency AVX512F) were to be disabled, but it looks like there isn't any case where this happens anyway. Fixes: c128dbfa0f87 ("x86/cpufeatures: Enable new SSE/AVX/AVX512 CPU features") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20240417060434.47101-1-ebiggers@kernel.org
2024-04-18x86/fred: Fix incorrect error code printout in fred_bad_type()Hou Wenlong1-4/+4
regs->orig_ax has been set to -1 on entry so in the printout, fred_bad_type() should use the passed parameter error_code. Fixes: 14619d912b65 ("x86/fred: FRED entry/exit and dispatch code") Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com> Link: https://lore.kernel.org/r/b2a8f0a41449d25240e314a2ddfbf6549511fb04.1713353612.git.houwenlong.hwl@antgroup.com
2024-04-18x86/fred: Fix INT80 emulation for FREDXin Li (Intel)2-1/+66
Add a FRED-specific INT80 handler and document why it differs from the current one. Eventually, the common bits will be unified once FRED hw is available and it turns out that no further changes are needed but for now, keep the handlers separate for everyone's sanity's sake. [ bp: Zap duplicated commit message, massage. ] Fixes: 55617fb991df ("x86/entry: Do not allow external 0x80 interrupts") Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240417174731.4189592-1-xin@zytor.com
2024-04-17arm64: dts: rockchip: Fix USB interface compatible string on kobol-helios64Rob Herring1-1/+1
The correct compatible string for a USB interface node begins with "usbif", not "usb". Fix the Rockchip RK3399 based Kobol Helios64 board. Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20240412204405.3703638-1-robh@kernel.org Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2024-04-17x86/retpolines: Enable the default thunk warning only on relevant configsBorislav Petkov (AMD)1-0/+7
The using-default-thunk warning check makes sense only with configurations which actually enable the special return thunks. Otherwise, it fires on unrelated 32-bit configs on which the special return thunks won't even work (they're 64-bit only) and, what is more, those configs even go off into the weeds when booting in the alternatives patching code, leading to a dead machine. Fixes: 4461438a8405 ("x86/retpoline: Ensure default return thunk isn't used at runtime") Reported-by: Klara Modin <klarasmodin@gmail.com> Reported-by: Erhard Furtner <erhard_f@mailbox.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Klara Modin <klarasmodin@gmail.com> Link: https://lore.kernel.org/r/78e0d19c-b77a-4169-a80f-2eef91f4a1d6@gmail.com Link: https://lore.kernel.org/r/20240413024956.488d474e@yea
2024-04-17Merge branch 'svm' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini5-67/+57
Clean up SVM's enter/exit assembly code so that it can be compiled without OBJECT_FILES_NON_STANDARD. The "standard" __svm_vcpu_run() can't be made 100% bulletproof, as RBP isn't restored on #VMEXIT, but that's also the case for __vmx_vcpu_run(), and getting "close enough" is better than not even trying. As for SEV-ES, after yet another refresher on swap types, I realized KVM can simply let the hardware restore registers after #VMEXIT, all that's missing is storing the current values to the host save area (they are swap type B). This should provide 100% accuracy when using stack frames for unwinding, and requires less assembly. In between, build the SEV-ES code iff CONFIG_KVM_AMD_SEV=y, and yank out "support" for 32-bit kernels in __svm_sev_es_vcpu_run, which was unnecessarily polluting the code for a configuration that is disabled at build time. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-17s390/mm: Fix NULL pointer dereferenceSven Schnelle1-1/+2
The recently added check to figure out if a fault happened on gmap ASCE dereferences the gmap pointer in lowcore without checking that it is not NULL. For all non-KVM processes the pointer is NULL, so that some value from lowcore will be read. With the current layouts of struct gmap and struct lowcore the read value (aka ASCE) is zero, so that this doesn't lead to any observable bug; at least currently. Fix this by adding the missing NULL pointer check. Fixes: 64c3431808bd ("s390/entry: compare gmap asce to determine guest/host fault") Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-04-16ARC: [plat-hsdk]: Remove misplaced interrupt-cells propertyAlexey Brodkin1-1/+0
"gmac" node stands for just an ordinary Ethernet controller, which is by no means a provider of interrupts, i.e. it doesn't serve as an interrupt controller, thus "#interrupt-cells" property doesn't belong to it and so we remove it. Fixes: ------------>8------------ DTC arch/arc/boot/dts/hsdk.dtb arch/arc/boot/dts/hsdk.dts:207.23-235.5: Warning (interrupt_provider): /soc/ethernet@8000: '#interrupt-cells' found, but node is not an interrupt provider arch/arc/boot/dts/hsdk.dtb: Warning (interrupt_map): Failed prerequisite 'interrupt_provider' ------------>8------------ Reported-by: Vineet Gupta <vgupta@kernel.org> Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@kernel.org>
2024-04-16Merge tag 'kvm-x86-fixes-6.9-rcN' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini8-43/+84
- Fix a mostly benign bug in the gfn_to_pfn_cache infrastructure where KVM would allow userspace to refresh the cache with a bogus GPA. The bug has existed for quite some time, but was exposed by a new sanity check added in 6.9 (to ensure a cache is either GPA-based or HVA-based). - Drop an unused param from gfn_to_pfn_cache_invalidate_start() that got left behind during a 6.9 cleanup. - Disable support for virtualizing adaptive PEBS, as KVM's implementation is architecturally broken and can leak host LBRs to the guest. - Fix a bug where KVM neglects to set the enable bits for general purpose counters in PERF_GLOBAL_CTRL when initializing the virtual PMU. Both Intel and AMD architectures require the bits to be set at RESET in order for v2 PMUs to be backwards compatible with software that was written for v1 PMUs, i.e. for software that will never manually set the global enables. - Disable LBR virtualization on CPUs that don't support LBR callstacks, as KVM unconditionally uses PERF_SAMPLE_BRANCH_CALL_STACK when creating the virtual LBR perf event, i.e. KVM will always fail to create LBR events on such CPUs. - Fix a math goof in x86's hugepage logic for KVM_SET_MEMORY_ATTRIBUTES that results in an array overflow (detected by KASAN). - Fix a flaw in the max_guest_memory selftest that results in it exhausting the supply of ucall structures when run with more than 256 vCPUs. - Mark KVM_MEM_READONLY as supported for RISC-V in set_memory_region_test. - Fix a bug where KVM incorrectly thinks a TDP MMU root is an indirect shadow root due KVM unnecessarily clobbering root_role.direct when userspace sets guest CPUID. - Fix a dirty logging bug in the where KVM fails to write-protect TDP MMU SPTEs used for L2 if Page-Modification Logging is enabled for L1 and the L1 hypervisor is NOT using EPT (if nEPT is enabled, KVM doesn't use the TDP MMU to run L2). For simplicity, KVM always disables PML when running L2, but the TDP MMU wasn't accounting for root-specific conditions that force write- protect based dirty logging.
2024-04-16riscv, bpf: Fix incorrect runtime statsXu Kuohai1-3/+3
When __bpf_prog_enter() returns zero, the s1 register is not set to zero, resulting in incorrect runtime stats. Fix it by setting s1 immediately upon the return of __bpf_prog_enter(). Fixes: 49b5e77ae3e2 ("riscv, bpf: Add bpf trampoline support for RV64") Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Pu Lehui <pulehui@huawei.com> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20240416064208.2919073-3-xukuohai@huaweicloud.com
2024-04-16bpf, arm64: Fix incorrect runtime statsXu Kuohai1-3/+3
When __bpf_prog_enter() returns zero, the arm64 register x20 that stores prog start time is not assigned to zero, causing incorrect runtime stats. To fix it, assign the return value of bpf_prog_enter() to x20 register immediately upon its return. Fixes: efc9909fdce0 ("bpf, arm64: Add bpf trampoline for arm64") Reported-by: Ivan Babrou <ivan@cloudflare.com> Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Ivan Babrou <ivan@cloudflare.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240416064208.2919073-2-xukuohai@huaweicloud.com
2024-04-16sched: Add missing memory barrier in switch_mm_cidMathieu Desnoyers1-0/+3
Many architectures' switch_mm() (e.g. arm64) do not have an smp_mb() which the core scheduler code has depended upon since commit: commit 223baf9d17f25 ("sched: Fix performance regression introduced by mm_cid") If switch_mm() doesn't call smp_mb(), sched_mm_cid_remote_clear() can unset the actively used cid when it fails to observe active task after it sets lazy_put. There *is* a memory barrier between storing to rq->curr and _return to userspace_ (as required by membarrier), but the rseq mm_cid has stricter requirements: the barrier needs to be issued between store to rq->curr and switch_mm_cid(), which happens earlier than: - spin_unlock(), - switch_to(). So it's fine when the architecture switch_mm() happens to have that barrier already, but less so when the architecture only provides the full barrier in switch_to() or spin_unlock(). It is a bug in the rseq switch_mm_cid() implementation. All architectures that don't have memory barriers in switch_mm(), but rather have the full barrier either in finish_lock_switch() or switch_to() have them too late for the needs of switch_mm_cid(). Introduce a new smp_mb__after_switch_mm(), defined as smp_mb() in the generic barrier.h header, and use it in switch_mm_cid() for scheduler transitions where switch_mm() is expected to provide a memory barrier. Architectures can override smp_mb__after_switch_mm() if their switch_mm() implementation provides an implicit memory barrier. Override it with a no-op on x86 which implicitly provide this memory barrier by writing to CR3. Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid") Reported-by: levi.yun <yeoreum.yun@arm.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> # for arm64 Acked-by: Dave Hansen <dave.hansen@linux.intel.com> # for x86 Cc: <stable@vger.kernel.org> # 6.4.x Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20240415152114.59122-2-mathieu.desnoyers@efficios.com
2024-04-15arm64/hugetlb: Fix page table walk in huge_pte_alloc()Anshuman Khandual1-1/+4
Currently normal HugeTLB fault ends up crashing the kernel, as p4dp derived from p4d_offset() is an invalid address when PGTABLE_LEVEL = 5. A p4d level entry needs to be allocated when not available while walking the page table during HugeTLB faults. Let's call p4d_alloc() to allocate such entries when required instead of current p4d_offset(). Unable to handle kernel paging request at virtual address ffffffff80000000 Mem abort info: ESR = 0x0000000096000005 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x05: level 1 translation fault Data abort info: ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 swapper pgtable: 4k pages, 52-bit VAs, pgdp=0000000081da9000 [ffffffff80000000] pgd=1000000082cec003, p4d=0000000082c32003, pud=0000000000000000 Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP Modules linked in: CPU: 1 PID: 108 Comm: high_addr_hugep Not tainted 6.9.0-rc4 #48 Hardware name: Foundation-v8A (DT) pstate: 01402005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) pc : huge_pte_alloc+0xd4/0x334 lr : hugetlb_fault+0x1b8/0xc68 sp : ffff8000833bbc20 x29: ffff8000833bbc20 x28: fff000080080cb58 x27: ffff800082a7cc58 x26: 0000000000000000 x25: fff0000800378e40 x24: fff00008008d6c60 x23: 00000000de9dbf07 x22: fff0000800378e40 x21: 0004000000000000 x20: 0004000000000000 x19: ffffffff80000000 x18: 1ffe00010011d7a1 x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000001 x14: 0000000000000000 x13: ffff8000816120d0 x12: ffffffffffffffff x11: 0000000000000000 x10: fff00008008ebd0c x9 : 0004000000000000 x8 : 0000000000001255 x7 : fff00008003e2000 x6 : 00000000061d54b0 x5 : 0000000000001000 x4 : ffffffff80000000 x3 : 0000000000200000 x2 : 0000000000000004 x1 : 0000000080000000 x0 : 0000000000000000 Call trace: huge_pte_alloc+0xd4/0x334 hugetlb_fault+0x1b8/0xc68 handle_mm_fault+0x260/0x29c do_page_fault+0xfc/0x47c do_translation_fault+0x68/0x74 do_mem_abort+0x44/0x94 el0_da+0x2c/0x9c el0t_64_sync_handler+0x70/0xc4 el0t_64_sync+0x190/0x194 Code: aa000084 cb010084 b24c2c84 8b130c93 (f9400260) ---[ end trace 0000000000000000 ]--- Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Fixes: a6bbf5d4d9d1 ("arm64: mm: Add definitions to support 5 levels of paging") Reported-by: Dev Jain <dev.jain@arm.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/20240415094003.1812018-1-anshuman.khandual@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-04-14Merge tag 'x86-urgent-2024-04-14' of ↵Linus Torvalds8-119/+115
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Follow up fixes for the BHI mitigations code - Fix !SPECULATION_MITIGATIONS bug not turning off mitigations as expected - Work around an APIC emulation bug when the kernel is built with Clang and run as a SEV guest - Follow up x86 topology fixes * tag 'x86-urgent-2024-04-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/amd: Move TOPOEXT enablement into the topology parser x86/cpu/amd: Make the NODEID_MSR union actually work x86/cpu/amd: Make the CPUID 0x80000008 parser correct x86/bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with CONFIG_MITIGATION_SPECTRE_BHI x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto x86/bugs: Clarify that syscall hardening isn't a BHI mitigation x86/bugs: Fix BHI handling of RRSBA x86/bugs: Rename various 'ia32_cap' variables to 'x86_arch_cap_msr' x86/bugs: Cache the value of MSR_IA32_ARCH_CAPABILITIES x86/bugs: Fix BHI documentation x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n x86/topology: Don't update cpu_possible_map in topo_set_cpuids() x86/bugs: Fix return type of spectre_bhi_state() x86/apic: Force native_apic_mem_read() to use the MOV instruction
2024-04-14Merge tag 'perf-urgent-2024-04-14' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf event fix from Ingo Molnar: "Fix the x86 PMU multi-counter code returning invalid data in certain circumstances" * tag 'perf-urgent-2024-04-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Fix out of range data
2024-04-14x86/bugs: Fix BHI retpoline checkJosh Poimboeuf1-4/+7
Confusingly, X86_FEATURE_RETPOLINE doesn't mean retpolines are enabled, as it also includes the original "AMD retpoline" which isn't a retpoline at all. Also replace cpu_feature_enabled() with boot_cpu_has() because this is before alternatives are patched and cpu_feature_enabled()'s fallback path is slower than plain old boot_cpu_has(). Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/ad3807424a3953f0323c011a643405619f2a4927.1712944776.git.jpoimboe@kernel.org
2024-04-12Merge tag 'arm64-fixes' of ↵Linus Torvalds1-9/+11
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Catalin Marinas: "Fix the TLBI RANGE operand calculation causing live migration under KVM/arm64 to miss dirty pages due to stale TLB entries" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: tlb: Fix TLBI RANGE operand
2024-04-12Merge tag 'soc-fixes-6.9-1' of ↵Linus Torvalds9-56/+54
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC fixes from Arnd Bergmann: "The device tree changes this time are all for NXP i.MX platforms, addressing issues with clocks and regulators on i.MX7 and i.MX8. The old OMAP2 based Nokia N8x0 tablet get a couple of code fixes for regressions that came in. The ARM SCMI and FF-A firmware interfaces get a couple of minor bug fixes. A regression fix for RISC-V cache management addresses a problem with probe order on Sifive cores" * tag 'soc-fixes-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (23 commits) MAINTAINERS: Change Krzysztof Kozlowski's email address arm64: dts: imx8qm-ss-dma: fix can lpcg indices arm64: dts: imx8-ss-dma: fix can lpcg indices arm64: dts: imx8-ss-dma: fix adc lpcg indices arm64: dts: imx8-ss-dma: fix pwm lpcg indices arm64: dts: imx8-ss-dma: fix spi lpcg indices arm64: dts: imx8-ss-conn: fix usb lpcg indices arm64: dts: imx8-ss-lsio: fix pwm lpcg indices ARM: dts: imx7s-warp: Pass OV2680 link-frequencies ARM: dts: imx7-mba7: Use 'no-mmc' property arm64: dts: imx8-ss-conn: fix usdhc wrong lpcg clock order arm64: dts: freescale: imx8mp-venice-gw73xx-2x: fix USB vbus regulator arm64: dts: freescale: imx8mp-venice-gw72xx-2x: fix USB vbus regulator cache: sifive_ccache: Partially convert to a platform driver firmware: arm_scmi: Make raw debugfs entries non-seekable firmware: arm_scmi: Fix wrong fastchannel initialization firmware: arm_ffa: Fix the partition ID check in ffa_notification_info_get() ARM: OMAP2+: fix USB regression on Nokia N8x0 mmc: omap: restore original power up/down steps mmc: omap: fix deferred probe ...
2024-04-12arm64: dts: qcom: sc8180x: Fix ss_phy_irq for secondary USB controllerMaximilian Luz1-1/+1
The ACPI DSDT of the Surface Pro X (SQ2) specifies the interrupts for the secondary UBS controller as Name (_CRS, ResourceTemplate () { Interrupt (ResourceConsumer, Level, ActiveHigh, Shared, ,, ) { 0x000000AA, } Interrupt (ResourceConsumer, Level, ActiveHigh, SharedAndWake, ,, ) { 0x000000A7, // hs_phy_irq: &intc GIC_SPI 136 } Interrupt (ResourceConsumer, Level, ActiveHigh, SharedAndWake, ,, ) { 0x00000228, // ss_phy_irq: &pdc 40 } Interrupt (ResourceConsumer, Edge, ActiveHigh, SharedAndWake, ,, ) { 0x0000020A, // dm_hs_phy_irq: &pdc 10 } Interrupt (ResourceConsumer, Edge, ActiveHigh, SharedAndWake, ,, ) { 0x0000020B, // dp_hs_phy_irq: &pdc 11 } }) Generally, the interrupts above 0x200 map to the PDC interrupts (as used in the devicetree) as ACPI_NUMBER - 0x200. Note that this lines up with dm_hs_phy_irq and dp_hs_phy_irq (as well as the interrupts for the primary USB controller). Based on the snippet above, ss_phy_irq should therefore be PDC 40 (= 0x28) and not PDC 7. The latter is according to ACPI instead used as ss_phy_irq for port 0 of the multiport USB controller). Fix this by setting ss_phy_irq to '&pdc 40'. Fixes: b080f53a8f44 ("arm64: dts: qcom: sc8180x: Add remoteprocs, wifi and usb nodes") Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com> Reviewed-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20240328022224.336938-1-luzmaximilian@gmail.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2024-04-12arm64: dts: qcom: sm8650: Fix the msi-map entriesManivannan Sadhasivam1-6/+4
While adding the GIC ITS MSI support, it was found that the msi-map entries needed to be swapped to receive MSIs from the endpoint. But later it was identified that the swapping was needed due to a bug in the Qualcomm PCIe controller driver. And since the bug is now fixed with commit bf79e33cdd89 ("PCI: qcom: Enable BDF to SID translation properly"), let's fix the msi-map entries also to reflect the actual mapping in the hardware. Fixes: a33a532b3b1e ("arm64: dts: qcom: sm8650: Use GIC-ITS for PCIe0 and PCIe1") Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Acked-by: Neil Armstrong <neil.armstrong@linaro.org> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD Link: https://lore.kernel.org/r/20240318-pci-bdf-sid-fix-v1-3-acca6c5d9cf1@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2024-04-12arm64: dts: qcom: sm8550: Fix the msi-map entriesManivannan Sadhasivam1-6/+4
While adding the GIC ITS MSI support, it was found that the msi-map entries needed to be swapped to receive MSIs from the endpoint. But later it was identified that the swapping was needed due to a bug in the Qualcomm PCIe controller driver. And since the bug is now fixed with commit bf79e33cdd89 ("PCI: qcom: Enable BDF to SID translation properly"), let's fix the msi-map entries also to reflect the actual mapping in the hardware. Fixes: 114990ce3edf ("arm64: dts: qcom: sm8550: Use GIC-ITS for PCIe0 and PCIe1") Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Acked-by: Neil Armstrong <neil.armstrong@linaro.org> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD Link: https://lore.kernel.org/r/20240318-pci-bdf-sid-fix-v1-2-acca6c5d9cf1@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2024-04-12arm64: dts: qcom: sm8450: Fix the msi-map entriesManivannan Sadhasivam1-12/+4
While adding the GIC ITS MSI support, it was found that the msi-map entries needed to be swapped to receive MSIs from the endpoint. But later it was identified that the swapping was needed due to a bug in the Qualcomm PCIe controller driver. And since the bug is now fixed with commit bf79e33cdd89 ("PCI: qcom: Enable BDF to SID translation properly"), let's fix the msi-map entries also to reflect the actual mapping in the hardware. Cc: stable@vger.kernel.org # 6.3: bf79e33cdd89 ("PCI: qcom: Enable BDF to SID translation properly") Fixes: ff384ab56f16 ("arm64: dts: qcom: sm8450: Use GIC-ITS for PCIe0 and PCIe1") Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20240318-pci-bdf-sid-fix-v1-1-acca6c5d9cf1@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2024-04-12arm64: dts: qcom: sc8280xp: add missing PCIe minimum OPPJohan Hovold1-0/+5
Add the missing PCIe CX performance level votes to avoid relying on other drivers (e.g. USB or UFS) to maintain the nominal performance level required for Gen3 speeds. Fixes: 813e83157001 ("arm64: dts: qcom: sc8280xp/sa8540p: add PCIe2-4 nodes") Cc: stable@vger.kernel.org # 6.2 Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Link: https://lore.kernel.org/r/20240306095651.4551-5-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2024-04-12arm64: dts: qcom: x1e80100: Fix the compatible for cluster idle statesRajendra Nayak1-2/+2
The compatible's for the cluster/domain idle states of x1e80100 are wrong, fix it. Fixes: af16b00578a7 ("arm64: dts: qcom: Add base X1E80100 dtsi and the QCP dts") Signed-off-by: Rajendra Nayak <quic_rjendra@quicinc.com> Reviewed-by: Abel Vesa <abel.vesa@linaro.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Link: https://lore.kernel.org/r/20240317132918.1068817-1-quic_rjendra@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2024-04-12arm64: dts: qcom: Fix type of "wdog" IRQs for remoteprocsLuca Weiss5-11/+11
The code in qcom_q6v5_init() requests the "wdog" IRQ as IRQF_TRIGGER_RISING. If dt defines the interrupt type as LEVEL_HIGH then the driver will have issues getting the IRQ again after probe deferral with an error like: irq: type mismatch, failed to map hwirq-14 for interrupt-controller@b220000! Fix that by updating the devicetrees to use IRQ_TYPE_EDGE_RISING for these interrupts, as is already used in most dt's. Also the driver was already using the interrupts with that type. Fixes: 3658e411efcb ("arm64: dts: qcom: sc7280: Add ADSP node") Fixes: df62402e5ff9 ("arm64: dts: qcom: sc7280: Add CDSP node") Fixes: 152d1faf1e2f ("arm64: dts: qcom: add SC8280XP platform") Fixes: 8eb5287e8a42 ("arm64: dts: qcom: sm6350: Add CDSP nodes") Fixes: efc33c969f23 ("arm64: dts: qcom: sm6350: Add ADSP nodes") Fixes: fe6fd26aeddf ("arm64: dts: qcom: sm6375: Add ADSP&CDSP") Fixes: 23a8903785b9 ("arm64: dts: qcom: sm8250: Add remoteprocs") Signed-off-by: Luca Weiss <luca.weiss@fairphone.com> Link: https://lore.kernel.org/r/20240219-remoteproc-irqs-v1-1-c5aeb02334bd@fairphone.com [bjorn: Added fixes references] Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2024-04-12Kconfig: add some hidden tabs on purposeLinus Torvalds1-6/+6
Commit d96c36004e31 ("tracing: Fix FTRACE_RECORD_RECURSION_SIZE Kconfig entry") removed a hidden tab because it apparently showed breakage in some third-party kernel config parsing tool. It wasn't clear what tool it was, but let's make sure it gets fixed. Because if you can't parse tabs as whitespace, you should not be parsing the kernel Kconfig files. In fact, let's make such breakage more obvious than some esoteric ftrace record size option. If you can't parse tabs, you can't have page sizes. Yes, tab-vs-space confusion is sadly a traditional Unix thing, and 'make' is famous for being broken in this regard. But no, that does not mean that it's ok. I'd add more random tabs to our Kconfig files, but I don't want to make things uglier than necessary. But it *might* bbe necessary if it turns out we see more of this kind of silly tooling. Fixes: d96c36004e31 ("tracing: Fix FTRACE_RECORD_RECURSION_SIZE Kconfig entry") Link: https://lore.kernel.org/lkml/CAHk-=wj-hLLN_t_m5OL4dXLaxvXKy_axuoJYXif7iczbfgAevQ@mail.gmail.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-04-12Merge tag 'mips-fixes_6.9_1' of ↵Linus Torvalds7-38/+42
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fix from Thomas Bogendoerfer: "Fix for syscall_get_nr() to make it work even if tracing is disabled" * tag 'mips-fixes_6.9_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: scall: Save thread_info.syscall unconditionally on entry
2024-04-12x86/cpu/amd: Move TOPOEXT enablement into the topology parserThomas Gleixner2-15/+21
The topology rework missed that early_init_amd() tries to re-enable the Topology Extensions when the BIOS disabled them. The new parser is invoked before early_init_amd() so the re-enable attempt happens too late. Move it into the AMD specific topology parser code where it belongs. Fixes: f7fb3b2dd92c ("x86/cpu: Provide an AMD/HYGON specific topology parser") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/878r1j260l.ffs@tglx
2024-04-12x86/cpu/amd: Make the NODEID_MSR union actually workThomas Gleixner1-3/+3
A system with NODEID_MSR was reported to crash during early boot without any output. The reason is that the union which is used for accessing the bitfields in the MSR is written wrongly and the resulting executable code accesses the wrong part of the MSR data. As a consequence a later division by that value results in 0 and that result is used for another division as divisor, which obviously does not work well. The magic world of C, unions and bitfields: union { u64 bita : 3, bitb : 3; u64 all; } x; x.all = foo(); a = x.bita; b = x.bitb; results in the effective executable code of: a = b = x.bita; because bita and bitb are treated as union members and therefore both end up at bit offset 0. Wrapping the bitfield into an anonymous struct: union { struct { u64 bita : 3, bitb : 3; }; u64 all; } x; works like expected. Rework the NODEID_MSR union in exactly that way to cure the problem. Fixes: f7fb3b2dd92c ("x86/cpu: Provide an AMD/HYGON specific topology parser") Reported-by: "kernelci.org bot" <bot@kernelci.org> Reported-by: Laura Nao <laura.nao@collabora.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Laura Nao <laura.nao@collabora.com> Link: https://lore.kernel.org/r/20240410194311.596282919@linutronix.de Closes: https://lore.kernel.org/all/20240322175210.124416-1-laura.nao@collabora.com/
2024-04-12x86/cpu/amd: Make the CPUID 0x80000008 parser correctThomas Gleixner1-6/+18
CPUID 0x80000008 ECX.cpu_nthreads describes the number of threads in the package. The parser uses this value to initialize the SMT domain level. That's wrong because cpu_nthreads does not describe the number of threads per physical core. So this needs to set the CORE domain level and let the later parsers set the SMT shift if available. Preset the SMT domain level with the assumption of one thread per core, which is correct ifrt here are no other CPUID leafs to parse, and propagate cpu_nthreads and the core level APIC bitwidth into the CORE domain. Fixes: f7fb3b2dd92c ("x86/cpu: Provide an AMD/HYGON specific topology parser") Reported-by: "kernelci.org bot" <bot@kernelci.org> Reported-by: Laura Nao <laura.nao@collabora.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Laura Nao <laura.nao@collabora.com> Link: https://lore.kernel.org/r/20240410194311.535206450@linutronix.de
2024-04-12x86/bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with CONFIG_MITIGATION_SPECTRE_BHIJosh Poimboeuf2-15/+4
For consistency with the other CONFIG_MITIGATION_* options, replace the CONFIG_SPECTRE_BHI_{ON,OFF} options with a single CONFIG_MITIGATION_SPECTRE_BHI option. [ mingo: Fix ] Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/r/3833812ea63e7fdbe36bf8b932e63f70d18e2a2a.1712813475.git.jpoimboe@kernel.org
2024-04-12x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=autoJosh Poimboeuf2-14/+1
Unlike most other mitigations' "auto" options, spectre_bhi=auto only mitigates newer systems, which is confusing and not particularly useful. Remove it. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/412e9dc87971b622bbbaf64740ebc1f140bff343.1712813475.git.jpoimboe@kernel.org
2024-04-11Merge tag 'hyperv-fixes-signed-20240411' of ↵Linus Torvalds2-26/+12
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - Some cosmetic changes (Erni Sri Satya Vennela, Li Zhijian) - Introduce hv_numa_node_to_pxm_info() (Nuno Das Neves) - Fix KVP daemon to handle IPv4 and IPv6 combination for keyfile format (Shradha Gupta) - Avoid freeing decrypted memory in a confidential VM (Rick Edgecombe and Michael Kelley) * tag 'hyperv-fixes-signed-20240411' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: Drivers: hv: vmbus: Don't free ring buffers that couldn't be re-encrypted uio_hv_generic: Don't free decrypted memory hv_netvsc: Don't free decrypted memory Drivers: hv: vmbus: Track decrypted status in vmbus_gpadl Drivers: hv: vmbus: Leak pages if set_memory_encrypted() fails hv/hv_kvp_daemon: Handle IPv4 and Ipv6 combination for keyfile format hv: vmbus: Convert sprintf() family to sysfs_emit() family mshyperv: Introduce hv_numa_node_to_pxm_info() x86/hyperv: Cosmetic changes for hv_apic.c
2024-04-11KVM: x86/mmu: Fix and clarify comments about clearing D-bit vs. write-protectingDavid Matlack1-10/+6
Drop the "If AD bits are enabled/disabled" verbiage from the comments above kvm_tdp_mmu_clear_dirty_{slot,pt_masked}() since TDP MMU SPTEs may need to be write-protected even when A/D bits are enabled. i.e. These comments aren't technically correct. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20240315230541.1635322-4-dmatlack@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-11KVM: x86/mmu: Remove function comments above clear_dirty_{gfn_range,pt_masked}()David Matlack1-14/+0
Drop the comments above clear_dirty_gfn_range() and clear_dirty_pt_masked(), since each is word-for-word identical to the comment above their parent function. Leave the comment on the parent functions since they are APIs called by the KVM/x86 MMU. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20240315230541.1635322-3-dmatlack@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-11KVM: x86/mmu: Write-protect L2 SPTEs in TDP MMU when clearing dirty statusDavid Matlack1-5/+16
Check kvm_mmu_page_ad_need_write_protect() when deciding whether to write-protect or clear D-bits on TDP MMU SPTEs, so that the TDP MMU accounts for any role-specific reasons for disabling D-bit dirty logging. Specifically, TDP MMU SPTEs must be write-protected when the TDP MMU is being used to run an L2 (i.e. L1 has disabled EPT) and PML is enabled. KVM always disables PML when running L2, even when L1 and L2 GPAs are in the some domain, so failing to write-protect TDP MMU SPTEs will cause writes made by L2 to not be reflected in the dirty log. Reported-by: syzbot+900d58a45dcaab9e4821@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=900d58a45dcaab9e4821 Fixes: 5982a5392663 ("KVM: x86/mmu: Use kvm_ad_enabled() to determine if TDP MMU SPTEs need wrprot") Cc: stable@vger.kernel.org Cc: Vipin Sharma <vipinsh@google.com> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20240315230541.1635322-2-dmatlack@google.com [sean: massage shortlog and changelog, tweak ternary op formatting] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-11KVM: x86/mmu: Precisely invalidate MMU root_role during CPUID updateSean Christopherson1-3/+3
Set kvm_mmu_page_role.invalid to mark the various MMU root_roles invalid during CPUID update in order to force a refresh, instead of zeroing out the entire role. This fixes a bug where kvm_mmu_free_roots() incorrectly thinks a root is indirect, i.e. not a TDP MMU, due to "direct" being zeroed, which in turn causes KVM to take mmu_lock for write instead of read. Note, paving over the entire role was largely unintentional, commit 7a458f0e1ba1 ("KVM: x86/mmu: remove extended bits from mmu_role, rename field") simply missed that "invalid" could be set. Fixes: 576a15de8d29 ("KVM: x86/mmu: Free TDP MMU roots while holding mmy_lock for read") Reported-by: syzbot+dc308fcfcd53f987de73@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/0000000000009b38080614c49bdb@google.com Cc: Phi Nguyen <phind.uet@gmail.com> Link: https://lore.kernel.org/r/20240408231115.1387279-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-11KVM: VMX: Disable LBR virtualization if the CPU doesn't support LBR callstacksSean Christopherson1-1/+9
Disable LBR virtualization if the CPU doesn't support callstacks, which were introduced in HSW (see commit e9d7f7cd97c4 ("perf/x86/intel: Add basic Haswell LBR call stack support"), as KVM unconditionally configures the perf LBR event with PERF_SAMPLE_BRANCH_CALL_STACK, i.e. LBR virtualization always fails on pre-HSW CPUs. Simply disable LBR support on such CPUs, as it has never worked, i.e. there is no risk of breaking an existing setup, and figuring out a way to performantly context switch LBRs on old CPUs is not worth the effort. Fixes: be635e34c284 ("KVM: vmx/pmu: Expose LBR_FMT in the MSR_IA32_PERF_CAPABILITIES") Cc: Mingwei Zhang <mizhang@google.com> Cc: Jim Mattson <jmattson@google.com> Tested-by: Mingwei Zhang <mizhang@google.com> Link: https://lore.kernel.org/r/20240307011344.835640-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-11perf/x86/intel: Expose existence of callback support to KVMSean Christopherson2-0/+2
Add a "has_callstack" field to the x86_pmu_lbr structure used to pass information to KVM, and set it accordingly in x86_perf_get_lbr(). KVM will use has_callstack to avoid trying to create perf LBR events with PERF_SAMPLE_BRANCH_CALL_STACK on CPUs that don't support callstacks. Reviewed-by: Mingwei Zhang <mizhang@google.com> Link: https://lore.kernel.org/r/20240307011344.835640-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-11KVM: VMX: Snapshot LBR capabilities during module initializationSean Christopherson3-5/+8
Snapshot VMX's LBR capabilities once during module initialization instead of calling into perf every time a vCPU reconfigures its vPMU. This will allow massaging the LBR capabilities, e.g. if the CPU doesn't support callstacks, without having to remember to update multiple locations. Opportunistically tag vmx_get_perf_capabilities() with __init, as it's only called from vmx_set_cpu_caps(). Reviewed-by: Mingwei Zhang <mizhang@google.com> Link: https://lore.kernel.org/r/20240307011344.835640-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-11Merge tag 'loongarch-fixes-6.9-1' of ↵Linus Torvalds9-12/+116
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch fixes from Huacai Chen: - make {virt, phys, page, pfn} translation work with KFENCE for LoongArch (otherwise NVMe and virtio-blk cannot work with KFENCE enabled) - update dts files for Loongson-2K series to make devices work correctly - fix a build error * tag 'loongarch-fixes-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: Include linux/sizes.h in addrspace.h to prevent build errors LoongArch: Update dts for Loongson-2K2000 to support GMAC/GNET LoongArch: Update dts for Loongson-2K2000 to support PCI-MSI LoongArch: Update dts for Loongson-2K2000 to support ISA/LPC LoongArch: Update dts for Loongson-2K1000 to support ISA/LPC LoongArch: Make virt_addr_valid()/__virt_addr_valid() work with KFENCE LoongArch: Make {virt, phys, page, pfn} translation work with KFENCE mm: Move lowmem_page_address() a little later
2024-04-11KVM: x86/pmu: Do not mask LVTPC when handling a PMI on AMD platformsSandipan Das1-1/+2
On AMD and Hygon platforms, the local APIC does not automatically set the mask bit of the LVTPC register when handling a PMI and there is no need to clear it in the kernel's PMI handler. For guests, the mask bit is currently set by kvm_apic_local_deliver() and unless it is cleared by the guest kernel's PMI handler, PMIs stop arriving and break use-cases like sampling with perf record. This does not affect non-PerfMonV2 guests because PMIs are handled in the guest kernel by x86_pmu_handle_irq() which always clears the LVTPC mask bit irrespective of the vendor. Before: $ perf record -e cycles:u true [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.001 MB perf.data (1 samples) ] After: $ perf record -e cycles:u true [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.002 MB perf.data (19 samples) ] Fixes: a16eb25b09c0 ("KVM: x86: Mask LVTPC when handling a PMI") Cc: stable@vger.kernel.org Signed-off-by: Sandipan Das <sandipan.das@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> [sean: use is_intel_compatible instead of !is_amd_or_hygon()] Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240405235603.1173076-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-11KVM: x86: Snapshot if a vCPU's vendor model is AMD vs. Intel compatibleSean Christopherson5-2/+14
Add kvm_vcpu_arch.is_amd_compatible to cache if a vCPU's vendor model is compatible with AMD, i.e. if the vCPU vendor is AMD or Hygon, along with helpers to check if a vCPU is compatible AMD vs. Intel. To handle Intel vs. AMD behavior related to masking the LVTPC entry, KVM will need to check for vendor compatibility on every PMI injection, i.e. querying for AMD will soon be a moderately hot path. Note! This subtly (or maybe not-so-subtly) makes "Intel compatible" KVM's default behavior, both if userspace omits (or never sets) CPUID 0x0 and if userspace sets a completely unknown vendor. One could argue that KVM should treat such vCPUs as not being compatible with Intel *or* AMD, but that would add useless complexity to KVM. KVM needs to do *something* in the face of vendor specific behavior, and so unless KVM conjured up a magic third option, choosing to treat unknown vendors as neither Intel nor AMD means that checks on AMD compatibility would yield Intel behavior, and checks for Intel compatibility would yield AMD behavior. And that's far worse as it would effectively yield random behavior depending on whether KVM checked for AMD vs. Intel vs. !AMD vs. !Intel. And practically speaking, all x86 CPUs follow either Intel or AMD architecture, i.e. "supporting" an unknown third architecture adds no value. Deliberately don't convert any of the existing guest_cpuid_is_intel() checks, as the Intel side of things is messier due to some flows explicitly checking for exactly vendor==Intel, versus some flows assuming anything that isn't "AMD compatible" gets Intel behavior. The Intel code will be cleaned up in the future. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240405235603.1173076-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-11x86/bugs: Clarify that syscall hardening isn't a BHI mitigationJosh Poimboeuf1-3/+3
While syscall hardening helps prevent some BHI attacks, there's still other low-hanging fruit remaining. Don't classify it as a mitigation and make it clear that the system may still be vulnerable if it doesn't have a HW or SW mitigation enabled. Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/b5951dae3fdee7f1520d5136a27be3bdfe95f88b.1712813475.git.jpoimboe@kernel.org
2024-04-11x86/bugs: Fix BHI handling of RRSBAJosh Poimboeuf1-12/+18
The ARCH_CAP_RRSBA check isn't correct: RRSBA may have already been disabled by the Spectre v2 mitigation (or can otherwise be disabled by the BHI mitigation itself if needed). In that case retpolines are fine. Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/6f56f13da34a0834b69163467449be7f58f253dc.1712813475.git.jpoimboe@kernel.org
2024-04-11x86/bugs: Rename various 'ia32_cap' variables to 'x86_arch_cap_msr'Ingo Molnar3-42/+42
So we are using the 'ia32_cap' value in a number of places, which got its name from MSR_IA32_ARCH_CAPABILITIES MSR register. But there's very little 'IA32' about it - this isn't 32-bit only code, nor does it originate from there, it's just a historic quirk that many Intel MSR names are prefixed with IA32_. This is already clear from the helper method around the MSR: x86_read_arch_cap_msr(), which doesn't have the IA32 prefix. So rename 'ia32_cap' to 'x86_arch_cap_msr' to be consistent with its role and with the naming of the helper function. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nikolay Borisov <nik.borisov@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/9592a18a814368e75f8f4b9d74d3883aa4fd1eaf.1712813475.git.jpoimboe@kernel.org
2024-04-11x86/bugs: Cache the value of MSR_IA32_ARCH_CAPABILITIESJosh Poimboeuf1-15/+7
There's no need to keep reading MSR_IA32_ARCH_CAPABILITIES over and over. It's even read in the BHI sysfs function which is a big no-no. Just read it once and cache it. Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/9592a18a814368e75f8f4b9d74d3883aa4fd1eaf.1712813475.git.jpoimboe@kernel.org
2024-04-10arm64: tlb: Fix TLBI RANGE operandGavin Shan1-9/+11
KVM/arm64 relies on TLBI RANGE feature to flush TLBs when the dirty pages are collected by VMM and the page table entries become write protected during live migration. Unfortunately, the operand passed to the TLBI RANGE instruction isn't correctly sorted out due to the commit 117940aa6e5f ("KVM: arm64: Define kvm_tlb_flush_vmid_range()"). It leads to crash on the destination VM after live migration because TLBs aren't flushed completely and some of the dirty pages are missed. For example, I have a VM where 8GB memory is assigned, starting from 0x40000000 (1GB). Note that the host has 4KB as the base page size. In the middile of migration, kvm_tlb_flush_vmid_range() is executed to flush TLBs. It passes MAX_TLBI_RANGE_PAGES as the argument to __kvm_tlb_flush_vmid_range() and __flush_s2_tlb_range_op(). SCALE#3 and NUM#31, corresponding to MAX_TLBI_RANGE_PAGES, isn't supported by __TLBI_RANGE_NUM(). In this specific case, -1 has been returned from __TLBI_RANGE_NUM() for SCALE#3/2/1/0 and rejected by the loop in the __flush_tlb_range_op() until the variable @scale underflows and becomes -9, 0xffff708000040000 is set as the operand. The operand is wrong since it's sorted out by __TLBI_VADDR_RANGE() according to invalid @scale and @num. Fix it by extending __TLBI_RANGE_NUM() to support the combination of SCALE#3 and NUM#31. With the changes, [-1 31] instead of [-1 30] can be returned from the macro, meaning the TLBs for 0x200000 pages in the above example can be flushed in one shoot with SCALE#3 and NUM#31. The macro TLBI_RANGE_MASK is dropped since no one uses it any more. The comments are also adjusted accordingly. Fixes: 117940aa6e5f ("KVM: arm64: Define kvm_tlb_flush_vmid_range()") Cc: stable@kernel.org # v6.6+ Reported-by: Yihuang Yu <yihyu@redhat.com> Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Link: https://lore.kernel.org/r/20240405035852.1532010-2-gshan@redhat.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-04-10x86/topology: Don't update cpu_possible_map in topo_set_cpuids()Thomas Gleixner1-2/+5
topo_set_cpuids() updates cpu_present_map and cpu_possible map. It is invoked during enumeration and "physical hotplug" operations. In the latter case this results in a kernel crash because cpu_possible_map is marked read only after init completes. There is no reason to update cpu_possible_map in that function. During enumeration cpu_possible_map is not relevant and gets fully initialized after enumeration completed. On "physical hotplug" the bit is already set because the kernel allows only CPUs to be plugged which have been enumerated and associated to a CPU number during early boot. Remove the bogus update of cpu_possible_map. Fixes: 0e53e7b656cf ("x86/cpu/topology: Sanitize the APIC admission logic") Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/87ttkc6kwx.ffs@tglx
2024-04-10LoongArch: Include linux/sizes.h in addrspace.h to prevent build errorsRandy Dunlap1-0/+1
LoongArch's include/asm/addrspace.h uses SZ_32M and SZ_16K, so add <linux/sizes.h> to provide those macros to prevent build errors: In file included from ../arch/loongarch/include/asm/io.h:11, from ../include/linux/io.h:13, from ../include/linux/io-64-nonatomic-lo-hi.h:5, from ../drivers/cxl/pci.c:4: ../include/asm-generic/io.h: In function 'ioport_map': ../arch/loongarch/include/asm/addrspace.h:124:25: error: 'SZ_32M' undeclared (first use in this function); did you mean 'PS_32M'? 124 | #define PCI_IOSIZE SZ_32M Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-04-10LoongArch: Update dts for Loongson-2K2000 to support GMAC/GNETHuacai Chen2-3/+42
Current dts file for Loongson-2K2000's GMAC/GNET is incomplete, both irq and phy descriptions are missing. Add them to make GMAC/GNET work. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-04-10LoongArch: Update dts for Loongson-2K2000 to support PCI-MSIHuacai Chen1-0/+3
Current dts file for Loongson-2K2000 misses the interrupt-controller & interrupt-cells descriptions in the msi-controller node, and misses the msi-parent link in the pci root node. Add them to support PCI-MSI. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-04-10LoongArch: Update dts for Loongson-2K2000 to support ISA/LPCHuacai Chen1-1/+8
Some Loongson-2K2000 platforms have ISA/LPC devices such as Super-IO, define an ISA node in the dts file to avoid access error. Also adjust the PCI io resource range to avoid confliction. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-04-10LoongArch: Update dts for Loongson-2K1000 to support ISA/LPCHuacai Chen1-0/+7
Some Loongson-2K1000 platforms have ISA/LPC devices such as Super-IO, define an ISA node in the dts file to avoid access error. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-04-10LoongArch: Make virt_addr_valid()/__virt_addr_valid() work with KFENCEHuacai Chen1-0/+4
When enabling both CONFIG_KFENCE and CONFIG_DEBUG_SG, I get the following backtraces when running LongArch kernels. [ 2.496257] kernel BUG at include/linux/scatterlist.h:187! ... [ 2.501925] Call Trace: [ 2.501950] [<9000000004ad59c4>] sg_init_one+0xac/0xc0 [ 2.502204] [<9000000004a438f8>] do_test_kpp+0x278/0x6e4 [ 2.502353] [<9000000004a43dd4>] alg_test_kpp+0x70/0xf4 [ 2.502494] [<9000000004a41b48>] alg_test+0x128/0x690 [ 2.502631] [<9000000004a3d898>] cryptomgr_test+0x20/0x40 [ 2.502775] [<90000000041b4508>] kthread+0x138/0x158 [ 2.502912] [<9000000004161c48>] ret_from_kernel_thread+0xc/0xa4 The backtrace is always similar but not exactly the same. It is always triggered from cryptomgr_test, but not always from the same test. Analysis shows that with CONFIG_KFENCE active, the address returned from kmalloc() and friends is not always below vm_map_base. It is allocated by kfence_alloc() which at least sometimes seems to get its memory from an address space above vm_map_base. This causes __virt_addr_valid() to return false for the affected objects. Let __virt_addr_valid() return 1 for kfence pool addresses, this make virt_addr_valid()/__virt_addr_valid() work with KFENCE. Reported-by: Guenter Roeck <linux@roeck-us.net> Suggested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-04-10LoongArch: Make {virt, phys, page, pfn} translation work with KFENCEHuacai Chen4-8/+51
KFENCE changes virt_to_page() to be able to translate tlb mapped virtual addresses, but forget to change virt_to_phys()/phys_to_virt() and other translation functions as well. This patch fix it, otherwise some drivers (such as nvme and virtio-blk) cannot work with KFENCE. All {virt, phys, page, pfn} translation functions are updated: 1, virt_to_pfn()/pfn_to_virt(); 2, virt_to_page()/page_to_virt(); 3, virt_to_phys()/phys_to_virt(). DMW/TLB mapped addresses are distinguished by comparing the vaddress with vm_map_base in virt_to_xyz(), and we define WANT_PAGE_VIRTUAL in the KFENCE case for the reverse translations, xyz_to_virt(). Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-04-10arm64: dts: rockchip: regulator for sd needs to be always on for BPI-R2ProJose Ignacio Tornos Martinez1-0/+2
With default dts configuration for BPI-R2Pro, the regulator for sd card is powered off when reboot is commanded, and the only solution to detect the sd card again, and therefore, allow rebooting from there, is to do a hardware reset. Configure the regulator for sd to be always on for BPI-R2Pro in order to avoid this issue. Fixes: f901aaadaa2a ("arm64: dts: rockchip: Add Bananapi R2 Pro") Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com> Link: https://lore.kernel.org/r/20240305143222.189413-1-jtornosm@redhat.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2024-04-10x86/bugs: Fix return type of spectre_bhi_state()Daniel Sneddon1-1/+1
The definition of spectre_bhi_state() incorrectly returns a const char * const. This causes the a compiler warning when building with W=1: warning: type qualifiers ignored on function return type [-Wignored-qualifiers] 2812 | static const char * const spectre_bhi_state(void) Remove the const qualifier from the pointer. Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob") Reported-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20240409230806.1545822-1-daniel.sneddon@linux.intel.com
2024-04-10Merge branch 'linus' into x86/urgent, to pick up dependent commitsIngo Molnar17-43/+317
Prepare to fix aspects of the new BHI code. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-04-10arm64: dts: rockchip: drop redundant disable-gpios in Lubancat 2Krzysztof Kozlowski1-1/+0
There is no "disable-gpios" property in the PCI bindings or Linux driver, so assume this was copied from downstream. This property looks like some real hardware, just described wrongly. Rockchip PCIe controller (DesignWare based) does not define any other GPIO-s property, except reset-gpios which is already there, so not sure what would be the real property for this GPIO. This fixes dtbs_check warning: rk3568-lubancat-2.dtb: pcie@fe260000: Unevaluated properties are not allowed ('disable-gpios' was unexpected) Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240407102854.38672-4-krzysztof.kozlowski@linaro.org Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2024-04-10arm64: dts: rockchip: drop redundant disable-gpios in Lubancat 1Krzysztof Kozlowski1-1/+0
There is no "disable-gpios" property in the PCI bindings or Linux driver, so assume this was copied from downstream. This property looks like some real hardware, just described wrongly. Rockchip PCIe controller (DesignWare based) does not define any other GPIO-s property, except reset-gpios which is already there, so not sure what would be the real property for this GPIO. This fixes dtbs_check warning: rk3566-lubancat-1.dtb: pcie@fe260000: Unevaluated properties are not allowed ('disable-gpios' was unexpected) Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240407102854.38672-3-krzysztof.kozlowski@linaro.org Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2024-04-10arm64: dts: rockchip: drop redundant pcie-reset-suspend in Scarlet DumoKrzysztof Kozlowski1-1/+0
There is no "pcie-reset-suspend" property in the PCI bindings or Linux driver, so assume this was copied from downstream. Drop the property, but leave the comment, because it might be useful for someone. This fixes dtbs_check warning: rk3399-gru-scarlet-dumo.dtb: pcie@f8000000: Unevaluated properties are not allowed ('pcie-reset-suspend' was unexpected) Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240407102854.38672-1-krzysztof.kozlowski@linaro.org Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2024-04-10arm64: dts: rockchip: mark system power controller and fix typo on ↵Muhammed Efe Cetin1-1/+2
orangepi-5-plus Mark the PMIC as system power controller, so the board will shut-down properly and fix the typo on rk806_dvs1_null pins property. Fixes: 236d225e1ee7 ("arm64: dts: rockchip: Add board device tree for rk3588-orangepi-5-plus") Signed-off-by: Muhammed Efe Cetin <efectn@protonmail.com> Reviewed-by: Dragan Simic <dsimic@manjaro.org> Link: https://lore.kernel.org/r/20240407173210.372585-1-efectn@6tel.net Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2024-04-10arm64: dts: rockchip: Designate the system power controller on QuartzPro64Dragan Simic1-0/+1
Designate the primary RK806 PMIC on the Pine64 QuartzPro64 as the system power controller, so the board shuts down properly on poweroff(8). Fixes: 152d3d070a9c ("arm64: dts: rockchip: Add QuartzPro64 SBC device tree") Signed-off-by: Dragan Simic <dsimic@manjaro.org> Link: https://lore.kernel.org/r/c602dfb3972a0844f2a87b6245bdc5c3378c5989.1712512497.git.dsimic@manjaro.org Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2024-04-10perf/x86: Fix out of range dataNamhyung Kim1-0/+1
On x86 each struct cpu_hw_events maintains a table for counter assignment but it missed to update one for the deleted event in x86_pmu_del(). This can make perf_clear_dirty_counters() reset used counter if it's called before event scheduling or enabling. Then it would return out of range data which doesn't make sense. The following code can reproduce the problem. $ cat repro.c #include <pthread.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <linux/perf_event.h> #include <sys/ioctl.h> #include <sys/mman.h> #include <sys/syscall.h> struct perf_event_attr attr = { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES, .disabled = 1, }; void *worker(void *arg) { int cpu = (long)arg; int fd1 = syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0); int fd2 = syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0); void *p; do { ioctl(fd1, PERF_EVENT_IOC_ENABLE, 0); p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd1, 0); ioctl(fd2, PERF_EVENT_IOC_ENABLE, 0); ioctl(fd2, PERF_EVENT_IOC_DISABLE, 0); munmap(p, 4096); ioctl(fd1, PERF_EVENT_IOC_DISABLE, 0); } while (1); return NULL; } int main(void) { int i; int n = sysconf(_SC_NPROCESSORS_ONLN); pthread_t *th = calloc(n, sizeof(*th)); for (i = 0; i < n; i++) pthread_create(&th[i], NULL, worker, (void *)(long)i); for (i = 0; i < n; i++) pthread_join(th[i], NULL); free(th); return 0; } And you can see the out of range data using perf stat like this. Probably it'd be easier to see on a large machine. $ gcc -o repro repro.c -pthread $ ./repro & $ sudo perf stat -A -I 1000 2>&1 | awk '{ if (length($3) > 15) print }' 1.001028462 CPU6 196,719,295,683,763 cycles # 194290.996 GHz (71.54%) 1.001028462 CPU3 396,077,485,787,730 branch-misses # 15804359784.80% of all branches (71.07%) 1.001028462 CPU17 197,608,350,727,877 branch-misses # 14594186554.56% of all branches (71.22%) 2.020064073 CPU4 198,372,472,612,140 cycles # 194681.113 GHz (70.95%) 2.020064073 CPU6 199,419,277,896,696 cycles # 195720.007 GHz (70.57%) 2.020064073 CPU20 198,147,174,025,639 cycles # 194474.654 GHz (71.03%) 2.020064073 CPU20 198,421,240,580,145 stalled-cycles-frontend # 100.14% frontend cycles idle (70.93%) 3.037443155 CPU4 197,382,689,923,416 cycles # 194043.065 GHz (71.30%) 3.037443155 CPU20 196,324,797,879,414 cycles # 193003.773 GHz (71.69%) 3.037443155 CPU5 197,679,956,608,205 stalled-cycles-backend # 1315606428.66% backend cycles idle (71.19%) 3.037443155 CPU5 198,571,860,474,851 instructions # 13215422.58 insn per cycle It should move the contents in the cpuc->assign as well. Fixes: 5471eea5d3bf ("perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240306061003.1894224-1-namhyung@kernel.org
2024-04-09Merge patch the fixes from "riscv: 64-bit NOMMU fixes and enhancements"Palmer Dabbelt3-3/+3
These two patches are fixes that the feature depends on, but they also fix generic issues. So I'm picking them up for fixes as well as for-next. * commit 'aea702dde7e9876fb00571a2602f25130847bf0f': riscv: Fix loading 64-bit NOMMU kernels past the start of RAM riscv: Fix TASK_SIZE on 64-bit NOMMU Link: https://lore.kernel.org/r/20240227003630.3634533-1-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-09riscv: Fix loading 64-bit NOMMU kernels past the start of RAMSamuel Holland2-2/+2
commit 3335068f8721 ("riscv: Use PUD/P4D/PGD pages for the linear mapping") added logic to allow using RAM below the kernel load address. However, this does not work for NOMMU, where PAGE_OFFSET is fixed to the kernel load address. Since that range of memory corresponds to PFNs below ARCH_PFN_OFFSET, mm initialization runs off the beginning of mem_map and corrupts adjacent kernel memory. Fix this by restoring the previous behavior for NOMMU kernels. Fixes: 3335068f8721 ("riscv: Use PUD/P4D/PGD pages for the linear mapping") Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20240227003630.3634533-3-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-09riscv: Fix TASK_SIZE on 64-bit NOMMUSamuel Holland1-1/+1
On NOMMU, userspace memory can come from anywhere in physical RAM. The current definition of TASK_SIZE is wrong if any RAM exists above 4G, causing spurious failures in the userspace access routines. Fixes: 6bd33e1ece52 ("riscv: add nommu support") Fixes: c3f896dcf1e4 ("mm: switch the test_vmalloc module to use __vmalloc_node") Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Bo Gan <ganboing@gmail.com> Link: https://lore.kernel.org/r/20240227003630.3634533-2-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-09KVM: x86: Stop compiling vmenter.S with OBJECT_FILES_NON_STANDARDSean Christopherson1-5/+0
Stop compiling vmenter.S with OBJECT_FILES_NON_STANDARD to skip objtool's stack validation now that __svm_vcpu_run() and __svm_sev_es_vcpu_run() create stack frames (though the former's effectiveness is dubious). Note, due to a quirk in how OBJECT_FILES_NON_STANDARD was handled by the build system prior to commit bf48d9b756b9 ("kbuild: change tool coverage variables to take the path relative to $(obj)"), vmx/vmenter.S got lumped in with svm/vmenter.S. __vmx_vcpu_run() already plays nice with frame pointers, i.e. it was collateral damage when commit 7f4b5cde2409 ("kvm: Disable objtool frame pointer checking for vmenter.S") added the OBJECT_FILES_NON_STANDARD hack-a-fix. Link: https://lore.kernel.org/all/20240217055504.2059803-1-masahiroy@kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20240223204233.3337324-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-09KVM: SVM: Create a stack frame in __svm_sev_es_vcpu_run()Sean Christopherson1-0/+4
Now that KVM uses the host save area to context switch RBP, i.e. preserves RBP for the entirety of __svm_sev_es_vcpu_run(), create a stack frame using the standared FRAME_{BEGIN,END} macros. Note, __svm_sev_es_vcpu_run() is subtly not a leaf function as it can call into ibpb_feature() via UNTRAIN_RET_VM. Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20240223204233.3337324-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-09KVM: SVM: Save/restore args across SEV-ES VMRUN via host save areaSean Christopherson1-16/+13
Use the host save area to preserve volatile registers that are used in __svm_sev_es_vcpu_run() to access function parameters after #VMEXIT. Like saving/restoring non-volatile registers, there's no reason not to take advantage of hardware restoring registers on #VMEXIT, as doing so shaves a few instructions and the save area is going to be accessed no matter what. Converting all register save/restore code to use the host save area also make it easier to follow the SEV-ES VMRUN flow in its entirety, as opposed to having a mix of stack-based versus host save area save/restore. Add a parameter to RESTORE_HOST_SPEC_CTRL_BODY so that the SEV-ES path doesn't need to write @spec_ctrl_intercepted to memory just to play nice with the common macro. Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20240223204233.3337324-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-09KVM: SVM: Save/restore non-volatile GPRs in SEV-ES VMRUN via host save areaSean Christopherson3-26/+35
Use the host save area to save/restore non-volatile (callee-saved) registers in __svm_sev_es_vcpu_run() to take advantage of hardware loading all registers from the save area on #VMEXIT. KVM still needs to save the registers it wants restored, but the loads are handled automatically by hardware. Aside from less assembly code, letting hardware do the restoration means stack frames are preserved for the entirety of __svm_sev_es_vcpu_run(). Opportunistically add a comment to call out why @svm needs to be saved across VMRUN->#VMEXIT, as it's not easy to decipher that from the macro hell. Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Michael Roth <michael.roth@amd.com> Cc: Alexey Kardashevskiy <aik@amd.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20240223204233.3337324-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-09KVM: SVM: Clobber RAX instead of RBX when discarding spec_ctrl_interceptedSean Christopherson1-2/+2
POP @spec_ctrl_intercepted into RAX instead of RBX when discarding it from the stack so that __svm_sev_es_vcpu_run() doesn't modify any non-volatile registers. __svm_sev_es_vcpu_run() doesn't return a value, and RAX is already are clobbered multiple times in the #VMEXIT path. This will allowing using the host save area to save/restore non-volatile registers in __svm_sev_es_vcpu_run(). Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20240223204233.3337324-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-09KVM: SVM: Drop 32-bit "support" from __svm_sev_es_vcpu_run()Sean Christopherson1-31/+13
Drop 32-bit "support" from __svm_sev_es_vcpu_run(), as SEV/SEV-ES firmly 64-bit only. The "support" was purely the result of bad copy+paste from __svm_vcpu_run(), which in turn was slightly less bad copy+paste from __vmx_vcpu_run(). Opportunistically convert to unadulterated register accesses so that it's easier (but still not easy) to follow which registers hold what arguments, and when. Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20240223204233.3337324-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-09KVM: SVM: Wrap __svm_sev_es_vcpu_run() with #ifdef CONFIG_KVM_AMD_SEVSean Christopherson1-0/+2
Compile (and link) __svm_sev_es_vcpu_run() if and only if SEV support is actually enabled. This will allow dropping non-existent 32-bit "support" from __svm_sev_es_vcpu_run() without causing undue confusion. Intentionally don't provide a stub (but keep the declaration), as any sane compiler, even with things like KASAN enabled, should eliminate the call to __svm_sev_es_vcpu_run() since sev_es_guest() unconditionally returns "false" if CONFIG_KVM_AMD_SEV=n. Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20240223204233.3337324-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-09KVM: SVM: Create a stack frame in __svm_vcpu_run() for unwindingSean Christopherson1-0/+1
Unconditionally create a stack frame in __svm_vcpu_run() to play nice with unwinding via frame pointers, at least until the point where RBP is loaded with the guest's value. Don't bother conditioning the code on CONFIG_FRAME_POINTER=y, as RBP needs to be saved and restored anyways (due to it being clobbered with the guest's value); omitting the "MOV RSP, RBP" is not worth the extra #ifdef. Creating a stack frame will allow removing the OBJECT_FILES_NON_STANDARD tag from vmenter.S once __svm_sev_es_vcpu_run() is fixed to not stomp all over RBP for no reason. Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20240223204233.3337324-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-09KVM: SVM: Remove a useless zeroing of allocated memoryChristophe JAILLET1-1/+1
Remove KVM's unnecessary zeroing of memory when allocating the pages array in sev_pin_memory() via __vmalloc(), as the array is only used to hold kernel pointers. The kmalloc() path for "small" regions doesn't zero the array, and if KVM leaks state and/or accesses uninitialized data, then the kernel has bigger problems. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/c7619a3d3cbb36463531a7c73ccbde9db587986c.1710004509.git.christophe.jaillet@wanadoo.fr [sean: massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-09MIPS: scall: Save thread_info.syscall unconditionally on entryJiaxun Yang7-38/+42
thread_info.syscall is used by syscall_get_nr to supply syscall nr over a thread stack frame. Previously, thread_info.syscall is only saved at syscall_trace_enter when syscall tracing is enabled. However rest of the kernel code do expect syscall_get_nr to be available without syscall tracing. The previous design breaks collect_syscall. Move saving process to syscall entry to fix it. Reported-by: Xi Ruoyao <xry111@xry111.site> Link: https://github.com/util-linux/util-linux/issues/2867 Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2024-04-09Merge tag 'imx-fixes-6.9' of ↵Arnd Bergmann8-43/+44
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes i.MX fixes for 6.9: - A couple of i.MX7 board fixes from Fabio Estevam that use correct 'no-mmc' property and pass 'link-frequencies' for OV2680. - A series from Frank Li to fix LPCG clock indices for i.MX8 subsystems. - A couple of changes from Tim Harvey that fix USB VBUS regulator for imx8mp-venice board. * tag 'imx-fixes-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: arm64: dts: imx8qm-ss-dma: fix can lpcg indices arm64: dts: imx8-ss-dma: fix can lpcg indices arm64: dts: imx8-ss-dma: fix adc lpcg indices arm64: dts: imx8-ss-dma: fix pwm lpcg indices arm64: dts: imx8-ss-dma: fix spi lpcg indices arm64: dts: imx8-ss-conn: fix usb lpcg indices arm64: dts: imx8-ss-lsio: fix pwm lpcg indices ARM: dts: imx7s-warp: Pass OV2680 link-frequencies ARM: dts: imx7-mba7: Use 'no-mmc' property arm64: dts: imx8-ss-conn: fix usdhc wrong lpcg clock order arm64: dts: freescale: imx8mp-venice-gw73xx-2x: fix USB vbus regulator arm64: dts: freescale: imx8mp-venice-gw72xx-2x: fix USB vbus regulator Link: https://lore.kernel.org/r/Zg5rfaVVvD9egoBK@dragon Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-04-09Merge tag 'omap-for-v6.9/n8x0-fixes-signed' of ↵Arnd Bergmann1-13/+10
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes GPIO regression fixes for n8x0 A series of fixes for n8x0 GPIO regressions caused by the changes to use GPIO descriptors. * tag 'omap-for-v6.9/n8x0-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: OMAP2+: fix USB regression on Nokia N8x0 mmc: omap: restore original power up/down steps mmc: omap: fix deferred probe mmc: omap: fix broken slot switch lookup ARM: OMAP2+: fix N810 MMC gpiod table ARM: OMAP2+: fix bogus MMC GPIO labels on Nokia N8x0 Link: https://lore.kernel.org/r/pull-1712135932-125424@atomide.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-04-08Merge tag 'nativebhi' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds17-43/+317
Pull x86 mitigations from Thomas Gleixner: "Mitigations for the native BHI hardware vulnerabilty: Branch History Injection (BHI) attacks may allow a malicious application to influence indirect branch prediction in kernel by poisoning the branch history. eIBRS isolates indirect branch targets in ring0. The BHB can still influence the choice of indirect branch predictor entry, and although branch predictor entries are isolated between modes when eIBRS is enabled, the BHB itself is not isolated between modes. Add mitigations against it either with the help of microcode or with software sequences for the affected CPUs" [ This also ends up enabling the full mitigation by default despite the system call hardening, because apparently there are other indirect calls that are still sufficiently reachable, and the 'auto' case just isn't hardened enough. We'll have some more inevitable tweaking in the future - Linus ] * tag 'nativebhi' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: KVM: x86: Add BHI_NO x86/bhi: Mitigate KVM by default x86/bhi: Add BHI mitigation knob x86/bhi: Enumerate Branch History Injection (BHI) bug x86/bhi: Define SPEC_CTRL_BHI_DIS_S x86/bhi: Add support for clearing branch history at syscall entry x86/syscall: Don't force use of indirect calls for system calls x86/bugs: Change commas to semicolons in 'spectre_v2' sysfs file
2024-04-08KVM: VMX: Ignore MKTME KeyID bits when intercepting #PF for ↵Tao Su1-1/+3
allow_smaller_maxphyaddr Use the raw/true host.MAXPHYADDR when deciding whether or not KVM must intercept #PFs when allow_smaller_maxphyaddr is enabled, as any adjustments the kernel makes to boot_cpu_data.x86_phys_bits to account for MKTME KeyID bits do not apply to the guest physical address space. I.e. the KeyID are off-limits for host physical addresses, but are not reserved for GPAs as far as hardware is concerned. Signed-off-by: Tao Su <tao1.su@linux.intel.com> Link: https://lore.kernel.org/r/20240319031111.495006-1-tao1.su@linux.intel.com [sean: massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-08KVM: x86/pmu: Set enable bits for GP counters in PERF_GLOBAL_CTRL at "RESET"Sean Christopherson1-2/+14
Set the enable bits for general purpose counters in IA32_PERF_GLOBAL_CTRL when refreshing the PMU to emulate the MSR's architecturally defined post-RESET behavior. Per Intel's SDM: IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits. and Where "n" is the number of general-purpose counters available in the processor. AMD also documents this behavior for PerfMonV2 CPUs in one of AMD's many PPRs. Do not set any PERF_GLOBAL_CTRL bits if there are no general purpose counters, although a literal reading of the SDM would require the CPU to set either bits 63:0 or 31:0. The intent of the behavior is to globally enable all GP counters; honor the intent, if not the letter of the law. Leaving PERF_GLOBAL_CTRL '0' effectively breaks PMU usage in guests that haven't been updated to work with PMUs that support PERF_GLOBAL_CTRL. This bug was recently exposed when KVM added supported for AMD's PerfMonV2, i.e. when KVM started exposing a vPMU with PERF_GLOBAL_CTRL to guest software that only knew how to program v1 PMUs (that don't support PERF_GLOBAL_CTRL). Failure to emulate the post-RESET behavior results in such guests unknowingly leaving all general purpose counters globally disabled (the entire reason the post-RESET value sets the GP counter enable bits is to maintain backwards compatibility). The bug has likely gone unnoticed because PERF_GLOBAL_CTRL has been supported on Intel CPUs for as long as KVM has existed, i.e. hardly anyone is running guest software that isn't aware of PERF_GLOBAL_CTRL on Intel PMUs. And because up until v6.0, KVM _did_ emulate the behavior for Intel CPUs, although the old behavior was likely dumb luck. Because (a) that old code was also broken in its own way (the history of this code is a comedy of errors), and (b) PERF_GLOBAL_CTRL was documented as having a value of '0' post-RESET in all SDMs before March 2023. Initial vPMU support in commit f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests") *almost* got it right (again likely by dumb luck), but for some reason only set the bits if the guest PMU was advertised as v1: if (pmu->version == 1) { pmu->global_ctrl = (1 << pmu->nr_arch_gp_counters) - 1; return; } Commit f19a0c2c2e6a ("KVM: PMU emulation: GLOBAL_CTRL MSR should be enabled on reset") then tried to remedy that goof, presumably because guest PMUs were leaving PERF_GLOBAL_CTRL '0', i.e. weren't enabling counters. pmu->global_ctrl = ((1 << pmu->nr_arch_gp_counters) - 1) | (((1ull << pmu->nr_arch_fixed_counters) - 1) << X86_PMC_IDX_FIXED); pmu->global_ctrl_mask = ~pmu->global_ctrl; That was KVM's behavior up until commit c49467a45fe0 ("KVM: x86/pmu: Don't overwrite the pmu->global_ctrl when refreshing") removed *everything*. However, it did so based on the behavior defined by the SDM , which at the time stated that "Global Perf Counter Controls" is '0' at Power-Up and RESET. But then the March 2023 SDM (325462-079US), stealthily changed its "IA-32 and Intel 64 Processor States Following Power-up, Reset, or INIT" table to say: IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits. Note, kvm_pmu_refresh() can be invoked multiple times, i.e. it's not a "pure" RESET flow. But it can only be called prior to the first KVM_RUN, i.e. the guest will only ever observe the final value. Note #2, KVM has always cleared global_ctrl during refresh (see commit f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")), i.e. there is no danger of breaking existing setups by clobbering a value set by userspace. Reported-by: Babu Moger <babu.moger@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Like Xu <like.xu.linux@gmail.com> Cc: Mingwei Zhang <mizhang@google.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: stable@vger.kernel.org Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20240309013641.1413400-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-08KVM: x86/mmu: x86: Don't overflow lpage_info when checking attributesRick Edgecombe1-1/+2
Fix KVM_SET_MEMORY_ATTRIBUTES to not overflow lpage_info array and trigger KASAN splat, as seen in the private_mem_conversions_test selftest. When memory attributes are set on a GFN range, that range will have specific properties applied to the TDP. A huge page cannot be used when the attributes are inconsistent, so they are disabled for those the specific huge pages. For internal KVM reasons, huge pages are also not allowed to span adjacent memslots regardless of whether the backing memory could be mapped as huge. What GFNs support which huge page sizes is tracked by an array of arrays 'lpage_info' on the memslot, of ‘kvm_lpage_info’ structs. Each index of lpage_info contains a vmalloc allocated array of these for a specific supported page size. The kvm_lpage_info denotes whether a specific huge page (GFN and page size) on the memslot is supported. These arrays include indices for unaligned head and tail huge pages. Preventing huge pages from spanning adjacent memslot is covered by incrementing the count in head and tail kvm_lpage_info when the memslot is allocated, but disallowing huge pages for memory that has mixed attributes has to be done in a more complicated way. During the KVM_SET_MEMORY_ATTRIBUTES ioctl KVM updates lpage_info for each memslot in the range that has mismatched attributes. KVM does this a memslot at a time, and marks a special bit, KVM_LPAGE_MIXED_FLAG, in the kvm_lpage_info for any huge page. This bit is essentially a permanently elevated count. So huge pages will not be mapped for the GFN at that page size if the count is elevated in either case: a huge head or tail page unaligned to the memslot or if KVM_LPAGE_MIXED_FLAG is set because it has mixed attributes. To determine whether a huge page has consistent attributes, the KVM_SET_MEMORY_ATTRIBUTES operation checks an xarray to make sure it consistently has the incoming attribute. Since level - 1 huge pages are aligned to level huge pages, it employs an optimization. As long as the level - 1 huge pages are checked first, it can just check these and assume that if each level - 1 huge page contained within the level sized huge page is not mixed, then the level size huge page is not mixed. This optimization happens in the helper hugepage_has_attrs(). Unfortunately, although the kvm_lpage_info array representing page size 'level' will contain an entry for an unaligned tail page of size level, the array for level - 1 will not contain an entry for each GFN at page size level. The level - 1 array will only contain an index for any unaligned region covered by level - 1 huge page size, which can be a smaller region. So this causes the optimization to overflow the level - 1 kvm_lpage_info and perform a vmalloc out of bounds read. In some cases of head and tail pages where an overflow could happen, callers skip the operation completely as KVM_LPAGE_MIXED_FLAG is not required to prevent huge pages as discussed earlier. But for memslots that are smaller than the 1GB page size, it does call hugepage_has_attrs(). In this case the huge page is both the head and tail page. The issue can be observed simply by compiling the kernel with CONFIG_KASAN_VMALLOC and running the selftest “private_mem_conversions_test”, which produces the output like the following: BUG: KASAN: vmalloc-out-of-bounds in hugepage_has_attrs+0x7e/0x110 Read of size 4 at addr ffffc900000a3008 by task private_mem_con/169 Call Trace: dump_stack_lvl print_report ? __virt_addr_valid ? hugepage_has_attrs ? hugepage_has_attrs kasan_report ? hugepage_has_attrs hugepage_has_attrs kvm_arch_post_set_memory_attributes kvm_vm_ioctl It is a little ambiguous whether the unaligned head page (in the bug case also the tail page) should be expected to have KVM_LPAGE_MIXED_FLAG set. It is not functionally required, as the unaligned head/tail pages will already have their kvm_lpage_info count incremented. The comments imply not setting it on unaligned head pages is intentional, so fix the callers to skip trying to set KVM_LPAGE_MIXED_FLAG in this case, and in doing so not call hugepage_has_attrs(). Cc: stable@vger.kernel.org Fixes: 90b4fe17981e ("KVM: x86: Disallow hugepages when memory attributes are mixed") Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Chao Peng <chao.p.peng@linux.intel.com> Link: https://lore.kernel.org/r/20240314212902.2762507-1-rick.p.edgecombe@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-08KVM: x86/pmu: Disable support for adaptive PEBSSean Christopherson1-2/+22
Drop support for virtualizing adaptive PEBS, as KVM's implementation is architecturally broken without an obvious/easy path forward, and because exposing adaptive PEBS can leak host LBRs to the guest, i.e. can leak host kernel addresses to the guest. Bug #1 is that KVM doesn't account for the upper 32 bits of IA32_FIXED_CTR_CTRL when (re)programming fixed counters, e.g fixed_ctrl_field() drops the upper bits, reprogram_fixed_counters() stores local variables as u8s and truncates the upper bits too, etc. Bug #2 is that, because KVM _always_ sets precise_ip to a non-zero value for PEBS events, perf will _always_ generate an adaptive record, even if the guest requested a basic record. Note, KVM will also enable adaptive PEBS in individual *counter*, even if adaptive PEBS isn't exposed to the guest, but this is benign as MSR_PEBS_DATA_CFG is guaranteed to be zero, i.e. the guest will only ever see Basic records. Bug #3 is in perf. intel_pmu_disable_fixed() doesn't clear the upper bits either, i.e. leaves ICL_FIXED_0_ADAPTIVE set, and intel_pmu_enable_fixed() effectively doesn't clear ICL_FIXED_0_ADAPTIVE either. I.e. perf _always_ enables ADAPTIVE counters, regardless of what KVM requests. Bug #4 is that adaptive PEBS *might* effectively bypass event filters set by the host, as "Updated Memory Access Info Group" records information that might be disallowed by userspace via KVM_SET_PMU_EVENT_FILTER. Bug #5 is that KVM doesn't ensure LBR MSRs hold guest values (or at least zeros) when entering a vCPU with adaptive PEBS, which allows the guest to read host LBRs, i.e. host RIPs/addresses, by enabling "LBR Entries" records. Disable adaptive PEBS support as an immediate fix due to the severity of the LBR leak in particular, and because fixing all of the bugs will be non-trivial, e.g. not suitable for backporting to stable kernels. Note! This will break live migration, but trying to make KVM play nice with live migration would be quite complicated, wouldn't be guaranteed to work (i.e. KVM might still kill/confuse the guest), and it's not clear that there are any publicly available VMMs that support adaptive PEBS, let alone live migrate VMs that support adaptive PEBS, e.g. QEMU doesn't support PEBS in any capacity. Link: https://lore.kernel.org/all/20240306230153.786365-1-seanjc@google.com Link: https://lore.kernel.org/all/ZeepGjHCeSfadANM@google.com Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS") Cc: stable@vger.kernel.org Cc: Like Xu <like.xu.linux@gmail.com> Cc: Mingwei Zhang <mizhang@google.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhang Xiong <xiong.y.zhang@intel.com> Cc: Lv Zhiyuan <zhiyuan.lv@intel.com> Cc: Dapeng Mi <dapeng1.mi@intel.com> Cc: Jim Mattson <jmattson@google.com> Acked-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20240307005833.827147-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-08KVM: x86: Add BHI_NODaniel Sneddon1-1/+1
Intel processors that aren't vulnerable to BHI will set MSR_IA32_ARCH_CAPABILITIES[BHI_NO] = 1;. Guests may use this BHI_NO bit to determine if they need to implement BHI mitigations or not. Allow this bit to be passed to the guests. Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
2024-04-08x86/bhi: Mitigate KVM by defaultPawan Gupta4-2/+15
BHI mitigation mode spectre_bhi=auto does not deploy the software mitigation by default. In a cloud environment, it is a likely scenario where userspace is trusted but the guests are not trusted. Deploying system wide mitigation in such cases is not desirable. Update the auto mode to unconditionally mitigate against malicious guests. Deploy the software sequence at VMexit in auto mode also, when hardware mitigation is not available. Unlike the force =on mode, software sequence is not deployed at syscalls in auto mode. Suggested-by: Alexandre Chartre <alexandre.chartre@oracle.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
2024-04-08x86/bhi: Add BHI mitigation knobPawan Gupta3-1/+115
Branch history clearing software sequences and hardware control BHI_DIS_S were defined to mitigate Branch History Injection (BHI). Add cmdline spectre_bhi={on|off|auto} to control BHI mitigation: auto - Deploy the hardware mitigation BHI_DIS_S, if available. on - Deploy the hardware mitigation BHI_DIS_S, if available, otherwise deploy the software sequence at syscall entry and VMexit. off - Turn off BHI mitigation. The default is auto mode which does not deploy the software sequence mitigation. This is because of the hardening done in the syscall dispatch path, which is the likely target of BHI. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
2024-04-08x86/bhi: Enumerate Branch History Injection (BHI) bugPawan Gupta3-8/+21
Mitigation for BHI is selected based on the bug enumeration. Add bits needed to enumerate BHI bug. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
2024-04-08x86/bhi: Define SPEC_CTRL_BHI_DIS_SDaniel Sneddon4-2/+8
Newer processors supports a hardware control BHI_DIS_S to mitigate Branch History Injection (BHI). Setting BHI_DIS_S protects the kernel from userspace BHI attacks without having to manually overwrite the branch history. Define MSR_SPEC_CTRL bit BHI_DIS_S and its enumeration CPUID.BHI_CTRL. Mitigation is enabled later. Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
2024-04-08x86/bhi: Add support for clearing branch history at syscall entryPawan Gupta7-3/+96
Branch History Injection (BHI) attacks may allow a malicious application to influence indirect branch prediction in kernel by poisoning the branch history. eIBRS isolates indirect branch targets in ring0. The BHB can still influence the choice of indirect branch predictor entry, and although branch predictor entries are isolated between modes when eIBRS is enabled, the BHB itself is not isolated between modes. Alder Lake and new processors supports a hardware control BHI_DIS_S to mitigate BHI. For older processors Intel has released a software sequence to clear the branch history on parts that don't support BHI_DIS_S. Add support to execute the software sequence at syscall entry and VMexit to overwrite the branch history. For now, branch history is not cleared at interrupt entry, as malicious applications are not believed to have sufficient control over the registers, since previous register state is cleared at interrupt entry. Researchers continue to poke at this area and it may become necessary to clear at interrupt entry as well in the future. This mitigation is only defined here. It is enabled later. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Co-developed-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
2024-04-08x86/syscall: Don't force use of indirect calls for system callsLinus Torvalds5-16/+50
Make <asm/syscall.h> build a switch statement instead, and the compiler can either decide to generate an indirect jump, or - more likely these days due to mitigations - just a series of conditional branches. Yes, the conditional branches also have branch prediction, but the branch prediction is much more controlled, in that it just causes speculatively running the wrong system call (harmless), rather than speculatively running possibly wrong random less controlled code gadgets. This doesn't mitigate other indirect calls, but the system call indirection is the first and most easily triggered case. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
2024-04-08x86/bugs: Change commas to semicolons in 'spectre_v2' sysfs fileJosh Poimboeuf1-12/+12
Change the format of the 'spectre_v2' vulnerabilities sysfs file slightly by converting the commas to semicolons, so that mitigations for future variants can be grouped together and separated by commas. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2024-04-08x86/apic: Force native_apic_mem_read() to use the MOV instructionAdam Dunlap1-1/+2
When done from a virtual machine, instructions that touch APIC memory must be emulated. By convention, MMIO accesses are typically performed via io.h helpers such as readl() or writeq() to simplify instruction emulation/decoding (ex: in KVM hosts and SEV guests) [0]. Currently, native_apic_mem_read() does not follow this convention, allowing the compiler to emit instructions other than the MOV instruction generated by readl(). In particular, when the kernel is compiled with clang and run as a SEV-ES or SEV-SNP guest, the compiler would emit a TESTL instruction which is not supported by the SEV-ES emulator, causing a boot failure in that environment. It is likely the same problem would happen in a TDX guest as that uses the same instruction emulator as SEV-ES. To make sure all emulators can emulate APIC memory reads via MOV, use the readl() function in native_apic_mem_read(). It is expected that any emulator would support MOV in any addressing mode as it is the most generic and is what is usually emitted currently. The TESTL instruction is emitted when native_apic_mem_read() is inlined into apic_mem_wait_icr_idle(). The emulator comes from insn_decode_mmio() in arch/x86/lib/insn-eval.c. It's not worth it to extend insn_decode_mmio() to support more instructions since, in theory, the compiler could choose to output nearly any instruction for such reads which would bloat the emulator beyond reason. [0] https://lore.kernel.org/all/20220405232939.73860-12-kirill.shutemov@linux.intel.com/ [ bp: Massage commit message, fix typos. ] Signed-off-by: Adam Dunlap <acdunlap@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Kevin Loughlin <kevinloughlin@google.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20240318230927.2191933-1-acdunlap@google.com
2024-04-07Merge tag 'x86-urgent-2024-04-07' of ↵Linus Torvalds14-39/+150
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - Fix MCE timer reinit locking - Fix/improve CoCo guest random entropy pool init - Fix SEV-SNP late disable bugs - Fix false positive objtool build warning - Fix header dependency bug - Fix resctrl CPU offlining bug * tag 'x86-urgent-2024-04-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/retpoline: Add NOENDBR annotation to the SRSO dummy return thunk x86/mce: Make sure to grab mce_sysfs_mutex in set_bank() x86/CPU/AMD: Track SNP host status with cc_platform_*() x86/cc: Add cc_platform_set/_clear() helpers x86/kvm/Kconfig: Have KVM_AMD_SEV select ARCH_HAS_CC_PLATFORM x86/coco: Require seeding RNG with RDRAND on CoCo systems x86/numa/32: Include missing <asm/pgtable_areas.h> x86/resctrl: Fix uninitialized memory read when last CPU of domain goes offline
2024-04-07Merge tag 'timers-urgent-2024-04-07' of ↵Linus Torvalds1-2/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Ingo Molnar: "Fix various timer bugs: - Fix a timer migration bug that may result in missed events - Fix timer migration group hierarchy event updates - Fix a PowerPC64 build warning - Fix a handful of DocBook annotation bugs" * tag 'timers-urgent-2024-04-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers/migration: Return early on deactivation timers/migration: Fix ignored event due to missing CPU update vdso: Use CONFIG_PAGE_SHIFT in vdso/datapage.h timers: Fix text inconsistencies and spelling tick/sched: Fix struct tick_sched doc warnings tick/sched: Fix various kernel-doc warnings timers: Fix kernel-doc format and add Return values time/timekeeping: Fix kernel-doc warnings and typos time/timecounter: Fix inline documentation
2024-04-07Merge tag 'perf-urgent-2024-04-07' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf fix from Ingo Molnar: "Fix a combined PEBS events bug on x86 Intel CPUs" * tag 'perf-urgent-2024-04-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/ds: Don't clear ->pebs_data_cfg for the last PEBS event
2024-04-06x86/retpoline: Add NOENDBR annotation to the SRSO dummy return thunkBorislav Petkov (AMD)1-0/+1
srso_alias_untrain_ret() is special code, even if it is a dummy which is called in the !SRSO case, so annotate it like its real counterpart, to address the following objtool splat: vmlinux.o: warning: objtool: .export_symbol+0x2b290: data relocation to !ENDBR: srso_alias_untrain_ret+0x0 Fixes: 4535e1a4174c ("x86/bugs: Fix the SRSO mitigation on Zen3/4") Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20240405144637.17908-1-bp@kernel.org
2024-04-06Merge branch 'linus' into x86/urgent, to pick up dependent commitIngo Molnar43-203/+331
We want to fix: 0e110732473e ("x86/retpoline: Do the necessary fixup to the Zen3/4 srso return thunk for !SRSO") So merge in Linus's latest into x86/urgent to have it available. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-04-05Merge tag 'devicetree-fixes-for-6.9-1' of ↵Linus Torvalds1-1/+5
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree fixes from Rob Herring: - Fix NIOS2 boot with external DTB - Add missing synchronization needed between fw_devlink and DT overlay removals - Fix some unit-address regex's to be hex only - Drop some 10+ year old "unstable binding" statements - Add new SoCs to QCom UFS binding - Add TPM bindings to TPM maintainers * tag 'devicetree-fixes-for-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: nios2: Only use built-in devicetree blob if configured to do so dt-bindings: timer: narrow regex for unit address to hex numbers dt-bindings: soc: fsl: narrow regex for unit address to hex numbers dt-bindings: remoteproc: ti,davinci: remove unstable remark dt-bindings: clock: ti: remove unstable remark dt-bindings: clock: keystone: remove unstable remark of: module: prevent NULL pointer dereference in vsnprintf() dt-bindings: ufs: qcom: document SM6125 UFS dt-bindings: ufs: qcom: document SC7180 UFS dt-bindings: ufs: qcom: document SC8180X UFS of: dynamic: Synchronize of_changeset_destroy() with the devlink removals driver core: Introduce device_link_wait_removal() docs: dt-bindings: add missing address/size-cells to example MAINTAINERS: Add TPM DT bindings to TPM maintainers
2024-04-05Merge tag 'mm-hotfixes-stable-2024-04-05-11-30' of ↵Linus Torvalds1-14/+35
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "8 hotfixes, 3 are cc:stable There are a couple of fixups for this cycle's vmalloc changes and one for the stackdepot changes. And a fix for a very old x86 PAT issue which can cause a warning splat" * tag 'mm-hotfixes-stable-2024-04-05-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: stackdepot: rename pool_index to pool_index_plus_1 x86/mm/pat: fix VM_PAT handling in COW mappings MAINTAINERS: change vmware.com addresses to broadcom.com selftests/mm: include strings.h for ffsl mm: vmalloc: fix lockdep warning mm: vmalloc: bail out early in find_vmap_area() if vmap is not init init: open output files from cpio unpacking with O_LARGEFILE mm/secretmem: fix GUP-fast succeeding on secretmem folios
2024-04-05Merge tag 'arm64-fixes' of ↵Linus Torvalds1-4/+1
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Catalin Marinas: "arm64/ptrace fix to use the correct SVE layout based on the saved floating point state rather than the TIF_SVE flag. The latter may be left on during syscalls even if the SVE state is discarded" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64/ptrace: Use saved floating point state type to determine SVE layout
2024-04-05Merge tag 'riscv-for-linus-6.9-rc3' of ↵Linus Torvalds12-20/+34
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - A fix for an __{get,put}_kernel_nofault to avoid an uninitialized value causing spurious failures - compat_vdso.so.dbg is now installed to the standard install location - A fix to avoid initializing PERF_SAMPLE_BRANCH_*-related events, as they aren't supported and will just later fail - A fix to make AT_VECTOR_SIZE_ARCH correct now that we're providing AT_MINSIGSTKSZ - pgprot_nx() is now implemented, which fixes vmap W^X protection - A fix for the vector save/restore code, which at least manifests as corrupted vector state when a signal is taken - A fix for a race condition in instruction patching - A fix to avoid leaking the kernel-mode GP to userspace, which is a kernel pointer leak that can be used to defeat KASLR in various ways - A handful of smaller fixes to build warnings, an overzealous printk, and some missing tracing annotations * tag 'riscv-for-linus-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: process: Fix kernel gp leakage riscv: Disable preemption when using patch_map() riscv: Fix warning by declaring arch_cpu_idle() as noinstr riscv: use KERN_INFO in do_trap riscv: Fix vector state restore in rt_sigreturn() riscv: mm: implement pgprot_nx riscv: compat_vdso: align VDSOAS build log RISC-V: Update AT_VECTOR_SIZE_ARCH for new AT_MINSIGSTKSZ riscv: Mark __se_sys_* functions __used drivers/perf: riscv: Disable PERF_SAMPLE_BRANCH_* while not supported riscv: compat_vdso: install compat_vdso.so.dbg to /lib/modules/*/vdso/ riscv: hwprobe: do not produce frtace relocation riscv: Fix spurious errors from __get/put_kernel_nofault riscv: mm: Fix prototype to avoid discarding const
2024-04-05Merge tag 's390-6.9-3' of ↵Linus Torvalds7-58/+67
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Alexander Gordeev: - Fix missing NULL pointer check when determining guest/host fault - Mark all functions in asm/atomic_ops.h, asm/atomic.h and asm/preempt.h as __always_inline to avoid unwanted instrumentation - Fix removal of a Processor Activity Instrumentation (PAI) sampling event in PMU device driver - Align system call table on 8 bytes * tag 's390-6.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/entry: align system call table on 8 bytes s390/pai: fix sampling event removal for PMU device driver s390/preempt: mark all functions __always_inline s390/atomic: mark all functions __always_inline s390/mm: fix NULL pointer dereference
2024-04-05x86/mm/pat: fix VM_PAT handling in COW mappingsDavid Hildenbrand1-14/+35
PAT handling won't do the right thing in COW mappings: the first PTE (or, in fact, all PTEs) can be replaced during write faults to point at anon folios. Reliably recovering the correct PFN and cachemode using follow_phys() from PTEs will not work in COW mappings. Using follow_phys(), we might just get the address+protection of the anon folio (which is very wrong), or fail on swap/nonswap entries, failing follow_phys() and triggering a WARN_ON_ONCE() in untrack_pfn() and track_pfn_copy(), not properly calling free_pfn_range(). In free_pfn_range(), we either wouldn't call memtype_free() or would call it with the wrong range, possibly leaking memory. To fix that, let's update follow_phys() to refuse returning anon folios, and fallback to using the stored PFN inside vma->vm_pgoff for COW mappings if we run into that. We will now properly handle untrack_pfn() with COW mappings, where we don't need the cachemode. We'll have to fail fork()->track_pfn_copy() if the first page was replaced by an anon folio, though: we'd have to store the cachemode in the VMA to make this work, likely growing the VMA size. For now, lets keep it simple and let track_pfn_copy() just fail in that case: it would have failed in the past with swap/nonswap entries already, and it would have done the wrong thing with anon folios. Simple reproducer to trigger the WARN_ON_ONCE() in untrack_pfn(): <--- C reproducer ---> #include <stdio.h> #include <sys/mman.h> #include <unistd.h> #include <liburing.h> int main(void) { struct io_uring_params p = {}; int ring_fd; size_t size; char *map; ring_fd = io_uring_setup(1, &p); if (ring_fd < 0) { perror("io_uring_setup"); return 1; } size = p.sq_off.array + p.sq_entries * sizeof(unsigned); /* Map the submission queue ring MAP_PRIVATE */ map = mmap(0, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, ring_fd, IORING_OFF_SQ_RING); if (map == MAP_FAILED) { perror("mmap"); return 1; } /* We have at least one page. Let's COW it. */ *map = 0; pause(); return 0; } <--- C reproducer ---> On a system with 16 GiB RAM and swap configured: # ./iouring & # memhog 16G # killall iouring [ 301.552930] ------------[ cut here ]------------ [ 301.553285] WARNING: CPU: 7 PID: 1402 at arch/x86/mm/pat/memtype.c:1060 untrack_pfn+0xf4/0x100 [ 301.553989] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_g [ 301.558232] CPU: 7 PID: 1402 Comm: iouring Not tainted 6.7.5-100.fc38.x86_64 #1 [ 301.558772] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebu4 [ 301.559569] RIP: 0010:untrack_pfn+0xf4/0x100 [ 301.559893] Code: 75 c4 eb cf 48 8b 43 10 8b a8 e8 00 00 00 3b 6b 28 74 b8 48 8b 7b 30 e8 ea 1a f7 000 [ 301.561189] RSP: 0018:ffffba2c0377fab8 EFLAGS: 00010282 [ 301.561590] RAX: 00000000ffffffea RBX: ffff9208c8ce9cc0 RCX: 000000010455e047 [ 301.562105] RDX: 07fffffff0eb1e0a RSI: 0000000000000000 RDI: ffff9208c391d200 [ 301.562628] RBP: 0000000000000000 R08: ffffba2c0377fab8 R09: 0000000000000000 [ 301.563145] R10: ffff9208d2292d50 R11: 0000000000000002 R12: 00007fea890e0000 [ 301.563669] R13: 0000000000000000 R14: ffffba2c0377fc08 R15: 0000000000000000 [ 301.564186] FS: 0000000000000000(0000) GS:ffff920c2fbc0000(0000) knlGS:0000000000000000 [ 301.564773] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 301.565197] CR2: 00007fea88ee8a20 CR3: 00000001033a8000 CR4: 0000000000750ef0 [ 301.565725] PKRU: 55555554 [ 301.565944] Call Trace: [ 301.566148] <TASK> [ 301.566325] ? untrack_pfn+0xf4/0x100 [ 301.566618] ? __warn+0x81/0x130 [ 301.566876] ? untrack_pfn+0xf4/0x100 [ 301.567163] ? report_bug+0x171/0x1a0 [ 301.567466] ? handle_bug+0x3c/0x80 [ 301.567743] ? exc_invalid_op+0x17/0x70 [ 301.568038] ? asm_exc_invalid_op+0x1a/0x20 [ 301.568363] ? untrack_pfn+0xf4/0x100 [ 301.568660] ? untrack_pfn+0x65/0x100 [ 301.568947] unmap_single_vma+0xa6/0xe0 [ 301.569247] unmap_vmas+0xb5/0x190 [ 301.569532] exit_mmap+0xec/0x340 [ 301.569801] __mmput+0x3e/0x130 [ 301.570051] do_exit+0x305/0xaf0 ... Link: https://lkml.kernel.org/r/20240403212131.929421-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: Wupeng Ma <mawupeng1@huawei.com> Closes: https://lkml.kernel.org/r/20240227122814.3781907-1-mawupeng1@huawei.com Fixes: b1a86e15dc03 ("x86, pat: remove the dependency on 'vm_pgoff' in track/untrack pfn vma routines") Fixes: 5899329b1910 ("x86: PAT: implement track/untrack of pfnmap regions for x86 - v3") Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-05arm64: dts: mediatek: mt2712: fix validation errorsRafał Miłecki2-5/+6
1. Fixup infracfg clock controller binding It also acts as reset controller so #reset-cells is required. 2. Use -pins suffix for pinctrl This fixes: arch/arm64/boot/dts/mediatek/mt2712-evb.dtb: syscon@10001000: '#reset-cells' is a required property from schema $id: http://devicetree.org/schemas/arm/mediatek/mediatek,infracfg.yaml# arch/arm64/boot/dts/mediatek/mt2712-evb.dtb: pinctrl@1000b000: 'eth_default', 'eth_sleep', 'usb0_iddig', 'usb1_iddig' do not match any of the regexes: 'pinctrl-[0-9]+', 'pins$' from schema $id: http://devicetree.org/schemas/pinctrl/mediatek,mt65xx-pinctrl.yaml# Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20240301074741.8362-1-zajec5@gmail.com [Angelo: Added Fixes tags] Fixes: 5d4839709c8e ("arm64: dts: mt2712: Add clock controller device nodes") Fixes: 1724f4cc5133 ("arm64: dts: Add USB3 related nodes for MT2712") Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2024-04-05arm64: dts: mediatek: mt7986: prefix BPI-R3 cooling maps with "map-"Rafał Miłecki1-3/+3
This fixes: arch/arm64/boot/dts/mediatek/mt7986a-bananapi-bpi-r3.dtb: thermal-zones: cpu-thermal:cooling-maps: 'cpu-active-high', 'cpu-active-low', 'cpu-active-med' do not match any of the regexes: '^map[-a-zA-Z0-9]*$', 'pinctrl-[0-9]+' from schema $id: http://devicetree.org/schemas/thermal/thermal-zones.yaml# Fixes: c26f779a2295 ("arm64: dts: mt7986: add pwm-fan and cooling-maps to BPI-R3 dts") Cc: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20240213061459.17917-1-zajec5@gmail.com Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2024-04-05arm64: dts: mediatek: mt7986: drop invalid thermal block clockRafał Miłecki1-3/+2
Thermal block uses only two clocks. Its binding doesn't document or allow "adc_32k". Also Linux driver doesn't support it. It has been additionally verified by Angelo by his detailed research on MT7981 / MT7986 clocks (thanks!). This fixes: arch/arm64/boot/dts/mediatek/mt7986a-bananapi-bpi-r3.dtb: thermal@1100c800: clocks: [[4, 27], [4, 44], [4, 45]] is too long from schema $id: http://devicetree.org/schemas/thermal/mediatek,thermal.yaml# arch/arm64/boot/dts/mediatek/mt7986a-bananapi-bpi-r3.dtb: thermal@1100c800: clock-names: ['therm', 'auxadc', 'adc_32k'] is too long from schema $id: http://devicetree.org/schemas/thermal/mediatek,thermal.yaml# Fixes: 0a9615d58d04 ("arm64: dts: mt7986: add thermal and efuse") Cc: Daniel Golle <daniel@makrotopia.org> Link: https://lore.kernel.org/linux-devicetree/17d143aa-576e-4d67-a0ea-b79f3518b81c@collabora.com/ Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20240213053739.14387-3-zajec5@gmail.com Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2024-04-05arm64: dts: mediatek: mt7986: drop "#reset-cells" from Ethernet controllerRafał Miłecki1-1/+0
Ethernet block doesn't include or act as a reset controller. Documentation also doesn't document "#reset-cells" for it. This fixes: arch/arm64/boot/dts/mediatek/mt7986a-bananapi-bpi-r3.dtb: ethernet@15100000: Unevaluated properties are not allowed ('#reset-cells' was unexpected) from schema $id: http://devicetree.org/schemas/net/mediatek,net.yaml# Fixes: 082ff36bd5c0 ("arm64: dts: mediatek: mt7986: introduce ethernet nodes") Cc: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20240213053739.14387-2-zajec5@gmail.com Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2024-04-05arm64: dts: mediatek: mt7986: drop invalid properties from ethsysRafał Miłecki1-2/+0
Mediatek ethsys controller / syscon binding doesn't allow any subnodes so "#address-cells" and "#size-cells" are redundant (actually: disallowed). This fixes: arch/arm64/boot/dts/mediatek/mt7986a-bananapi-bpi-r3.dtb: syscon@15000000: '#address-cells', '#size-cells' do not match any of the regexes: 'pinctrl-[0-9]+' from schema $id: http://devicetree.org/schemas/clock/mediatek,ethsys.yaml# Fixes: 1f9986b258c2 ("arm64: dts: mediatek: add clock support for mt7986a") Cc: Sam Shih <sam.shih@mediatek.com> Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20240213053739.14387-1-zajec5@gmail.com Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2024-04-04x86/cpufeatures: Add CPUID_LNX_5 to track recently added Linux-defined wordSean Christopherson2-0/+4
Add CPUID_LNX_5 to track cpufeatures' word 21, and add the appropriate compile-time assert in KVM to prevent direct lookups on the features in CPUID_LNX_5. KVM uses X86_FEATURE_* flags to manage guest CPUID, and so must translate features that are scattered by Linux from the Linux-defined bit to the hardware-defined bit, i.e. should never try to directly access scattered features in guest CPUID. Opportunistically add NR_CPUID_WORDS to enum cpuid_leafs, along with a compile-time assert in KVM's CPUID infrastructure to ensure that future additions update cpuid_leafs along with NCAPINTS. No functional change intended. Fixes: 7f274e609f3d ("x86/cpufeatures: Add new word for scattered features") Cc: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-04-04Merge tag 'net-6.9-rc3' of ↵Linus Torvalds4-15/+14
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter, bluetooth and bpf. Fairly usual collection of driver and core fixes. The large selftest accompanying one of the fixes is also becoming a common occurrence. Current release - regressions: - ipv6: fix infinite recursion in fib6_dump_done() - net/rds: fix possible null-deref in newly added error path Current release - new code bugs: - net: do not consume a full cacheline for system_page_pool - bpf: fix bpf_arena-related file descriptor leaks in the verifier - drv: ice: fix freeing uninitialized pointers, fixing misuse of the newfangled __free() auto-cleanup Previous releases - regressions: - x86/bpf: fixes the BPF JIT with retbleed=stuff - xen-netfront: add missing skb_mark_for_recycle, fix page pool accounting leaks, revealed by recently added explicit warning - tcp: fix bind() regression for v6-only wildcard and v4-mapped-v6 non-wildcard addresses - Bluetooth: - replace "hci_qca: Set BDA quirk bit if fwnode exists in DT" with better workarounds to un-break some buggy Qualcomm devices - set conn encrypted before conn establishes, fix re-connecting to some headsets which use slightly unusual sequence of msgs - mptcp: - prevent BPF accessing lowat from a subflow socket - don't account accept() of non-MPC client as fallback to TCP - drv: mana: fix Rx DMA datasize and skb_over_panic - drv: i40e: fix VF MAC filter removal Previous releases - always broken: - gro: various fixes related to UDP tunnels - netns crossing problems, incorrect checksum conversions, and incorrect packet transformations which may lead to panics - bpf: support deferring bpf_link dealloc to after RCU grace period - nf_tables: - release batch on table validation from abort path - release mutex after nft_gc_seq_end from abort path - flush pending destroy work before exit_net release - drv: r8169: skip DASH fw status checks when DASH is disabled" * tag 'net-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits) netfilter: validate user input for expected length net/sched: act_skbmod: prevent kernel-infoleak net: usb: ax88179_178a: avoid the interface always configured as random address net: dsa: sja1105: Fix parameters order in sja1110_pcs_mdio_write_c45() net: ravb: Always update error counters net: ravb: Always process TX descriptor ring netfilter: nf_tables: discard table flag update with pending basechain deletion netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get() netfilter: nf_tables: reject new basechain after table flag update netfilter: nf_tables: flush pending destroy work before exit_net release netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path netfilter: nf_tables: release batch on table validation from abort path Revert "tg3: Remove residual error handling in tg3_suspend" tg3: Remove residual error handling in tg3_suspend net: mana: Fix Rx DMA datasize and skb_over_panic net/sched: fix lockdep splat in qdisc_tree_reduce_backlog() net: phy: micrel: lan8814: Fix when enabling/disabling 1-step timestamping net: stmmac: fix rx queue priority assignment net: txgbe: fix i2c dev name cannot match clkdev net: fec: Set mac_managed_pm during probe ...
2024-04-04riscv: process: Fix kernel gp leakageStefan O'Rear1-3/+0
childregs represents the registers which are active for the new thread in user context. For a kernel thread, childregs->gp is never used since the kernel gp is not touched by switch_to. For a user mode helper, the gp value can be observed in user space after execve or possibly by other means. [From the email thread] The /* Kernel thread */ comment is somewhat inaccurate in that it is also used for user_mode_helper threads, which exec a user process, e.g. /sbin/init or when /proc/sys/kernel/core_pattern is a pipe. Such threads do not have PF_KTHREAD set and are valid targets for ptrace etc. even before they exec. childregs is the *user* context during syscall execution and it is observable from userspace in at least five ways: 1. kernel_execve does not currently clear integer registers, so the starting register state for PID 1 and other user processes started by the kernel has sp = user stack, gp = kernel __global_pointer$, all other integer registers zeroed by the memset in the patch comment. This is a bug in its own right, but I'm unwilling to bet that it is the only way to exploit the issue addressed by this patch. 2. ptrace(PTRACE_GETREGSET): you can PTRACE_ATTACH to a user_mode_helper thread before it execs, but ptrace requires SIGSTOP to be delivered which can only happen at user/kernel boundaries. 3. /proc/*/task/*/syscall: this is perfectly happy to read pt_regs for user_mode_helpers before the exec completes, but gp is not one of the registers it returns. 4. PERF_SAMPLE_REGS_USER: LOCKDOWN_PERF normally prevents access to kernel addresses via PERF_SAMPLE_REGS_INTR, but due to this bug kernel addresses are also exposed via PERF_SAMPLE_REGS_USER which is permitted under LOCKDOWN_PERF. I have not attempted to write exploit code. 5. Much of the tracing infrastructure allows access to user registers. I have not attempted to determine which forms of tracing allow access to user registers without already allowing access to kernel registers. Fixes: 7db91e57a0ac ("RISC-V: Task implementation") Cc: stable@vger.kernel.org Signed-off-by: Stefan O'Rear <sorear@fastmail.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240327061258.2370291-1-sorear@fastmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-04riscv: Disable preemption when using patch_map()Alexandre Ghiti1-0/+8
patch_map() uses fixmap mappings to circumvent the non-writability of the kernel text mapping. The __set_fixmap() function only flushes the current cpu tlb, it does not emit an IPI so we must make sure that while we use a fixmap mapping, the current task is not migrated on another cpu which could miss the newly introduced fixmap mapping. So in order to avoid any task migration, disable the preemption. Reported-by: Andrea Parri <andrea@rivosinc.com> Closes: https://lore.kernel.org/all/ZcS+GAaM25LXsBOl@andrea/ Reported-by: Andy Chiu <andy.chiu@sifive.com> Closes: https://lore.kernel.org/linux-riscv/CABgGipUMz3Sffu-CkmeUB1dKVwVQ73+7=sgC45-m0AE9RCjOZg@mail.gmail.com/ Fixes: cad539baa48f ("riscv: implement a memset like function for text") Fixes: 0ff7c3b33127 ("riscv: Use text_mutex instead of patch_lock") Co-developed-by: Andy Chiu <andy.chiu@sifive.com> Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Acked-by: Puranjay Mohan <puranjay12@gmail.com> Link: https://lore.kernel.org/r/20240326203017.310422-3-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-04riscv: Fix warning by declaring arch_cpu_idle() as noinstrAlexandre Ghiti1-1/+1
The following warning appears when using ftrace: [89855.443413] RCU not on for: arch_cpu_idle+0x0/0x1c [89855.445640] WARNING: CPU: 5 PID: 0 at include/linux/trace_recursion.h:162 arch_ftrace_ops_list_func+0x208/0x228 [89855.445824] Modules linked in: xt_conntrack(E) nft_chain_nat(E) xt_MASQUERADE(E) nf_conntrack_netlink(E) xt_addrtype(E) nft_compat(E) nf_tables(E) nfnetlink(E) br_netfilter(E) cfg80211(E) nls_iso8859_1(E) ofpart(E) redboot(E) cmdlinepart(E) cfi_cmdset_0001(E) virtio_net(E) cfi_probe(E) cfi_util(E) 9pnet_virtio(E) gen_probe(E) net_failover(E) virtio_rng(E) failover(E) 9pnet(E) physmap(E) map_funcs(E) chipreg(E) mtd(E) uio_pdrv_genirq(E) uio(E) dm_multipath(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) drm(E) efi_pstore(E) backlight(E) ip_tables(E) x_tables(E) raid10(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) xor(E) async_tx(E) raid6_pq(E) raid1(E) raid0(E) virtio_blk(E) [89855.451563] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G E 6.8.0-rc6ubuntu-defconfig #2 [89855.451726] Hardware name: riscv-virtio,qemu (DT) [89855.451899] epc : arch_ftrace_ops_list_func+0x208/0x228 [89855.452016] ra : arch_ftrace_ops_list_func+0x208/0x228 [89855.452119] epc : ffffffff8016b216 ra : ffffffff8016b216 sp : ffffaf808090fdb0 [89855.452171] gp : ffffffff827c7680 tp : ffffaf808089ad40 t0 : ffffffff800c0dd8 [89855.452216] t1 : 0000000000000001 t2 : 0000000000000000 s0 : ffffaf808090fe30 [89855.452306] s1 : 0000000000000000 a0 : 0000000000000026 a1 : ffffffff82cd6ac8 [89855.452423] a2 : ffffffff800458c8 a3 : ffffaf80b1870640 a4 : 0000000000000000 [89855.452646] a5 : 0000000000000000 a6 : 00000000ffffffff a7 : ffffffffffffffff [89855.452698] s2 : ffffffff82766872 s3 : ffffffff80004caa s4 : ffffffff80ebea90 [89855.452743] s5 : ffffaf808089bd40 s6 : 8000000a00006e00 s7 : 0000000000000008 [89855.452787] s8 : 0000000000002000 s9 : 0000000080043700 s10: 0000000000000000 [89855.452831] s11: 0000000000000000 t3 : 0000000000100000 t4 : 0000000000000064 [89855.452874] t5 : 000000000000000c t6 : ffffaf80b182dbfc [89855.452929] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 [89855.453053] [<ffffffff8016b216>] arch_ftrace_ops_list_func+0x208/0x228 [89855.453191] [<ffffffff8000e082>] ftrace_call+0x8/0x22 [89855.453265] [<ffffffff800a149c>] do_idle+0x24c/0x2ca [89855.453357] [<ffffffff8000da54>] return_to_handler+0x0/0x26 [89855.453429] [<ffffffff8000b716>] smp_callin+0x92/0xb6 [89855.453785] ---[ end trace 0000000000000000 ]--- To fix this, mark arch_cpu_idle() as noinstr, like it is done in commit a9cbc1b471d2 ("s390/idle: mark arch_cpu_idle() noinstr"). Reported-by: Evgenii Shatokhin <e.shatokhin@yadro.com> Closes: https://lore.kernel.org/linux-riscv/51f21b87-ebed-4411-afbc-c00d3dea2bab@yadro.com/ Fixes: cfbc4f81c9d0 ("riscv: Select ARCH_WANTS_NO_INSTR") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Andy Chiu <andy.chiu@sifive.com> Tested-by: Andy Chiu <andy.chiu@sifive.com> Acked-by: Puranjay Mohan <puranjay12@gmail.com> Link: https://lore.kernel.org/r/20240326203017.310422-2-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-04riscv: use KERN_INFO in do_trapAndreas Schwab1-1/+1
Print the instruction dump with info instead of emergency level. The unhandled signal message is only for informational purpose. Fixes: b8a03a634129 ("riscv: add userland instruction dump to RISC-V splats") Signed-off-by: Andreas Schwab <schwab@suse.de> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Yunhui Cui <cuiyunhui@bytedance.com> Link: https://lore.kernel.org/r/mvmy1aegrhm.fsf@suse.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-04Merge tag 'for-netdev' of ↵Jakub Kicinski3-15/+12
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-04-04 We've added 7 non-merge commits during the last 5 day(s) which contain a total of 9 files changed, 75 insertions(+), 24 deletions(-). The main changes are: 1) Fix x86 BPF JIT under retbleed=stuff which causes kernel panics due to incorrect destination IP calculation and incorrect IP for relocations, from Uros Bizjak and Joan Bruguera Micó. 2) Fix BPF arena file descriptor leaks in the verifier, from Anton Protopopov. 3) Defer bpf_link deallocation to after RCU grace period as currently running multi-{kprobes,uprobes} programs might still access cookie information from the link, from Andrii Nakryiko. 4) Fix a BPF sockmap lock inversion deadlock in map_delete_elem reported by syzkaller, from Jakub Sitnicki. 5) Fix resolve_btfids build with musl libc due to missing linux/types.h include, from Natanael Copa. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, sockmap: Prevent lock inversion deadlock in map delete elem x86/bpf: Fix IP for relocating call depth accounting x86/bpf: Fix IP after emitting call depth accounting bpf: fix possible file descriptor leaks in verifier tools/resolve_btfids: fix build with musl libc bpf: support deferring bpf_link dealloc to after RCU grace period bpf: put uprobe link's path and task in release callback ==================== Link: https://lore.kernel.org/r/20240404183258.4401-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04x86/mce: Make sure to grab mce_sysfs_mutex in set_bank()Borislav Petkov (AMD)1-1/+3
Modifying a MCA bank's MCA_CTL bits which control which error types to be reported is done over /sys/devices/system/machinecheck/ ├── machinecheck0 │   ├── bank0 │   ├── bank1 │   ├── bank10 │   ├── bank11 ... sysfs nodes by writing the new bit mask of events to enable. When the write is accepted, the kernel deletes all current timers and reinits all banks. Doing that in parallel can lead to initializing a timer which is already armed and in the timer wheel, i.e., in use already: ODEBUG: init active (active state 0) object: ffff888063a28000 object type: timer_list hint: mce_timer_fn+0x0/0x240 arch/x86/kernel/cpu/mce/core.c:2642 WARNING: CPU: 0 PID: 8120 at lib/debugobjects.c:514 debug_print_object+0x1a0/0x2a0 lib/debugobjects.c:514 Fix that by grabbing the sysfs mutex as the rest of the MCA sysfs code does. Reported by: Yue Sun <samsun1006219@gmail.com> Reported by: xingwei lee <xrivendell7@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/CAEkJfYNiENwQY8yV1LYJ9LjJs%2Bx_-PqMv98gKig55=2vbzffRw@mail.gmail.com
2024-04-05powerpc/crypto/chacha-p10: Fix failure on non Power10Michael Ellerman1-1/+7
The chacha-p10-crypto module provides optimised chacha routines for Power10. It also selects CRYPTO_ARCH_HAVE_LIB_CHACHA which says it provides chacha_crypt_arch() to generic code. Notably the module needs to provide chacha_crypt_arch() regardless of whether it is loaded on Power10 or an older CPU. The implementation of chacha_crypt_arch() already has a fallback to chacha_crypt_generic(), however the module as a whole fails to load on pre-Power10, because of the use of module_cpu_feature_match(). This breaks for example loading wireguard: jostaberry-1:~ # modprobe -v wireguard insmod /lib/modules/6.8.0-lp155.8.g7e0e887-default/kernel/arch/powerpc/crypto/chacha-p10-crypto.ko.zst modprobe: ERROR: could not insert 'wireguard': No such device Fix it by removing module_cpu_feature_match(), and instead check the CPU feature manually. If the CPU feature is not found, the module still loads successfully, but doesn't register the Power10 specific algorithms. That allows chacha_crypt_generic() to remain available for use, fixing the problem. [root@fedora ~]# modprobe -v wireguard insmod /lib/modules/6.8.0-00001-g786a790c4d79/kernel/net/ipv4/udp_tunnel.ko insmod /lib/modules/6.8.0-00001-g786a790c4d79/kernel/net/ipv6/ip6_udp_tunnel.ko insmod /lib/modules/6.8.0-00001-g786a790c4d79/kernel/lib/crypto/libchacha.ko insmod /lib/modules/6.8.0-00001-g786a790c4d79/kernel/arch/powerpc/crypto/chacha-p10-crypto.ko insmod /lib/modules/6.8.0-00001-g786a790c4d79/kernel/lib/crypto/libchacha20poly1305.ko insmod /lib/modules/6.8.0-00001-g786a790c4d79/kernel/drivers/net/wireguard/wireguard.ko [ 18.910452][ T721] wireguard: allowedips self-tests: pass [ 18.914999][ T721] wireguard: nonce counter self-tests: pass [ 19.029066][ T721] wireguard: ratelimiter self-tests: pass [ 19.029257][ T721] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information. [ 19.029361][ T721] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved. Reported-by: Michal Suchánek <msuchanek@suse.de> Closes: https://lore.kernel.org/all/20240315122005.GG20665@kitsune.suse.cz/ Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240328130200.3041687-1-mpe@ellerman.id.au
2024-04-04tty: xtensa/iss: Use min() to fix Coccinelle warningThorsten Blum1-4/+2
Inline strlen() and use min() to fix the following Coccinelle/coccicheck warning reported by minmax.cocci: WARNING opportunity for min() Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Message-Id: <20240404075811.6936-3-thorsten.blum@toblux.com> Reviewed-by: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2024-04-04x86/CPU/AMD: Track SNP host status with cc_platform_*()Borislav Petkov (AMD)6-37/+45
The host SNP worthiness can determined later, after alternatives have been patched, in snp_rmptable_init() depending on cmdline options like iommu=pt which is incompatible with SNP, for example. Which means that one cannot use X86_FEATURE_SEV_SNP and will need to have a special flag for that control. Use that newly added CC_ATTR_HOST_SEV_SNP in the appropriate places. Move kdump_sev_callback() to its rightful place, while at it. Fixes: 216d106c7ff7 ("x86/sev: Add SEV-SNP host initialization support") Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Srikanth Aithal <sraithal@amd.com> Link: https://lore.kernel.org/r/20240327154317.29909-6-bp@alien8.de
2024-04-04x86/cc: Add cc_platform_set/_clear() helpersBorislav Petkov (AMD)1-0/+52
Add functionality to set and/or clear different attributes of the machine as a confidential computing platform. Add the first one too: whether the machine is running as a host for SEV-SNP guests. Fixes: 216d106c7ff7 ("x86/sev: Add SEV-SNP host initialization support") Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Srikanth Aithal <sraithal@amd.com> Link: https://lore.kernel.org/r/20240327154317.29909-5-bp@alien8.de
2024-04-04x86/kvm/Kconfig: Have KVM_AMD_SEV select ARCH_HAS_CC_PLATFORMBorislav Petkov (AMD)1-0/+1
The functionality to load SEV-SNP guests by the host will soon rely on cc_platform* helpers because the cpu_feature* API with the early patching is insufficient when SNP support needs to be disabled late. Therefore, pull that functionality in. Fixes: 216d106c7ff7 ("x86/sev: Add SEV-SNP host initialization support") Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Srikanth Aithal <sraithal@amd.com> Link: https://lore.kernel.org/r/20240327154317.29909-4-bp@alien8.de
2024-04-04x86/coco: Require seeding RNG with RDRAND on CoCo systemsJason A. Donenfeld3-0/+45
There are few uses of CoCo that don't rely on working cryptography and hence a working RNG. Unfortunately, the CoCo threat model means that the VM host cannot be trusted and may actively work against guests to extract secrets or manipulate computation. Since a malicious host can modify or observe nearly all inputs to guests, the only remaining source of entropy for CoCo guests is RDRAND. If RDRAND is broken -- due to CPU hardware fault -- the RNG as a whole is meant to gracefully continue on gathering entropy from other sources, but since there aren't other sources on CoCo, this is catastrophic. This is mostly a concern at boot time when initially seeding the RNG, as after that the consequences of a broken RDRAND are much more theoretical. So, try at boot to seed the RNG using 256 bits of RDRAND output. If this fails, panic(). This will also trigger if the system is booted without RDRAND, as RDRAND is essential for a safe CoCo boot. Add this deliberately to be "just a CoCo x86 driver feature" and not part of the RNG itself. Many device drivers and platforms have some desire to contribute something to the RNG, and add_device_randomness() is specifically meant for this purpose. Any driver can call it with seed data of any quality, or even garbage quality, and it can only possibly make the quality of the RNG better or have no effect, but can never make it worse. Rather than trying to build something into the core of the RNG, consider the particular CoCo issue just a CoCo issue, and therefore separate it all out into driver (well, arch/platform) code. [ bp: Massage commit message. ] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Elena Reshetova <elena.reshetova@intel.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240326160735.73531-1-Jason@zx2c4.com
2024-04-04x86/numa/32: Include missing <asm/pgtable_areas.h>Arnd Bergmann1-0/+1
The __vmalloc_start_set declaration is in a header that is not included in numa_32.c in current linux-next: arch/x86/mm/numa_32.c: In function 'initmem_init': arch/x86/mm/numa_32.c:57:9: error: '__vmalloc_start_set' undeclared (first use in this function) 57 | __vmalloc_start_set = true; | ^~~~~~~~~~~~~~~~~~~ arch/x86/mm/numa_32.c:57:9: note: each undeclared identifier is reported only once for each function it appears in Add an explicit #include. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240403202344.3463169-1-arnd@kernel.org
2024-04-03riscv: Fix vector state restore in rt_sigreturn()Björn Töpel1-7/+8
The RISC-V Vector specification states in "Appendix D: Calling Convention for Vector State" [1] that "Executing a system call causes all caller-saved vector registers (v0-v31, vl, vtype) and vstart to become unspecified.". In the RISC-V kernel this is called "discarding the vstate". Returning from a signal handler via the rt_sigreturn() syscall, vector discard is also performed. However, this is not an issue since the vector state should be restored from the sigcontext, and therefore not care about the vector discard. The "live state" is the actual vector register in the running context, and the "vstate" is the vector state of the task. A dirty live state, means that the vstate and live state are not in synch. When vectorized user_from_copy() was introduced, an bug sneaked in at the restoration code, related to the discard of the live state. An example when this go wrong: 1. A userland application is executing vector code 2. The application receives a signal, and the signal handler is entered. 3. The application returns from the signal handler, using the rt_sigreturn() syscall. 4. The live vector state is discarded upon entering the rt_sigreturn(), and the live state is marked as "dirty", indicating that the live state need to be synchronized with the current vstate. 5. rt_sigreturn() restores the vstate, except the Vector registers, from the sigcontext 6. rt_sigreturn() restores the Vector registers, from the sigcontext, and now the vectorized user_from_copy() is used. The dirty live state from the discard is saved to the vstate, making the vstate corrupt. 7. rt_sigreturn() returns to the application, which crashes due to corrupted vstate. Note that the vectorized user_from_copy() is invoked depending on the value of CONFIG_RISCV_ISA_V_UCOPY_THRESHOLD. Default is 768, which means that vlen has to be larger than 128b for this bug to trigger. The fix is simply to mark the live state as non-dirty/clean prior performing the vstate restore. Link: https://github.com/riscv/riscv-isa-manual/releases/download/riscv-isa-release-8abdb41-2024-03-26/unpriv-isa-asciidoc.pdf # [1] Reported-by: Charlie Jenkins <charlie@rivosinc.com> Reported-by: Vineet Gupta <vgupta@kernel.org> Fixes: c2a658d41924 ("riscv: lib: vectorize copy_to_user/copy_from_user") Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Andy Chiu <andy.chiu@sifive.com> Tested-by: Vineet Gupta <vineetg@rivosinc.com> Link: https://lore.kernel.org/r/20240403072638.567446-1-bjorn@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-03vdso: Use CONFIG_PAGE_SHIFT in vdso/datapage.hArnd Bergmann1-2/+1
Both the vdso rework and the CONFIG_PAGE_SHIFT changes were merged during the v6.9 merge window, so it is now possible to use CONFIG_PAGE_SHIFT instead of including asm/page.h in the vdso. This avoids the workaround for arm64 - commit 8b3843ae3634 ("vdso/datapage: Quick fix - use asm/page-def.h for ARM64") and addresses a build warning for powerpc64: In file included from <built-in>:4: In file included from /home/arnd/arm-soc/arm-soc/lib/vdso/gettimeofday.c:5: In file included from ../include/vdso/datapage.h:25: arch/powerpc/include/asm/page.h:230:9: error: result of comparison of constant 13835058055282163712 with expression of type 'unsigned long' is always true [-Werror,-Wtautological-constant-out-of-range-compare] 230 | return __pa(kaddr) >> PAGE_SHIFT; | ^~~~~~~~~~~ arch/powerpc/include/asm/page.h:217:37: note: expanded from macro '__pa' 217 | VIRTUAL_WARN_ON((unsigned long)(x) < PAGE_OFFSET); \ | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~ arch/powerpc/include/asm/page.h:202:73: note: expanded from macro 'VIRTUAL_WARN_ON' 202 | #define VIRTUAL_WARN_ON(x) WARN_ON(IS_ENABLED(CONFIG_DEBUG_VIRTUAL) && (x)) | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~ arch/powerpc/include/asm/bug.h:88:25: note: expanded from macro 'WARN_ON' 88 | int __ret_warn_on = !!(x); \ | ^ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Link: https://lore.kernel.org/r/20240320180228.136371-1-arnd@kernel.org
2024-04-03nios2: Only use built-in devicetree blob if configured to do soGuenter Roeck1-1/+5
Starting with commit 7b937cc243e5 ("of: Create of_root if no dtb provided by firmware"), attempts to boot nios2 images with an external devicetree blob result in a crash. Kernel panic - not syncing: early_init_dt_alloc_memory_arch: Failed to allocate 72 bytes align=0x40 For nios2, a built-in devicetree blob always overrides devicetree blobs provided by ROMMON/BIOS. This includes the new dummy devicetree blob. Result is that the dummy devicetree blob is used even if an external devicetree blob is provided. Since the dummy devicetree blob does not include any memory information, memory allocations fail, resulting in the crash. To fix the problem, only use the built-in devicetree blob if CONFIG_NIOS2_DTB_SOURCE_BOOL is enabled. Fixes: 7b937cc243e5 ("of: Create of_root if no dtb provided by firmware") Cc: Frank Rowand <frowand.list@gmail.com> Cc: Stephen Boyd <sboyd@kernel.org> Cc: Rob Herring <robh@kernel.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20240322065419.162416-1-linux@roeck-us.net Signed-off-by: Rob Herring <robh@kernel.org>
2024-04-03Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds14-90/+167
Pull KVM fixes from Paolo Bonzini: "ARM: - Ensure perf events programmed to count during guest execution are actually enabled before entering the guest in the nVHE configuration - Restore out-of-range handler for stage-2 translation faults - Several fixes to stage-2 TLB invalidations to avoid stale translations, possibly including partial walk caches - Fix early handling of architectural VHE-only systems to ensure E2H is appropriately set - Correct a format specifier warning in the arch_timer selftest - Make the KVM banner message correctly handle all of the possible configurations RISC-V: - Remove redundant semicolon in num_isa_ext_regs() - Fix APLIC setipnum_le/be write emulation - Fix APLIC in_clrip[x] read emulation x86: - Fix a bug in KVM_SET_CPUID{2,} where KVM looks at the wrong CPUID entries (old vs. new) and ultimately neglects to clear PV_UNHALT from vCPUs with HLT-exiting disabled - Documentation fixes for SEV - Fix compat ABI for KVM_MEMORY_ENCRYPT_OP - Fix a 14-year-old goof in a declaration shared by host and guest; the enabled field used by Linux when running as a guest pushes the size of "struct kvm_vcpu_pv_apf_data" from 64 to 68 bytes. This is really unconsequential because KVM never consumes anything beyond the first 64 bytes, but the resulting struct does not match the documentation Selftests: - Fix spelling mistake in arch_timer selftest" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits) KVM: arm64: Rationalise KVM banner output arm64: Fix early handling of FEAT_E2H0 not being implemented KVM: arm64: Ensure target address is granule-aligned for range TLBI KVM: arm64: Use TLBI_TTL_UNKNOWN in __kvm_tlb_flush_vmid_range() KVM: arm64: Don't pass a TLBI level hint when zapping table entries KVM: arm64: Don't defer TLB invalidation when zapping table entries KVM: selftests: Fix __GUEST_ASSERT() format warnings in ARM's arch timer test KVM: arm64: Fix out-of-IPA space translation fault handling KVM: arm64: Fix host-programmed guest events in nVHE RISC-V: KVM: Fix APLIC in_clrip[x] read emulation RISC-V: KVM: Fix APLIC setipnum_le/be write emulation RISC-V: KVM: Remove second semicolon KVM: selftests: Fix spelling mistake "trigged" -> "triggered" Documentation: kvm/sev: clarify usage of KVM_MEMORY_ENCRYPT_OP Documentation: kvm/sev: separate description of firmware KVM: SEV: fix compat ABI for KVM_MEMORY_ENCRYPT_OP KVM: selftests: Check that PV_UNHALT is cleared when HLT exiting is disabled KVM: x86: Use actual kvm_cpuid.base for clearing KVM_FEATURE_PV_UNHALT KVM: x86: Introduce __kvm_get_hypervisor_cpuid() helper KVM: SVM: Return -EINVAL instead of -EBUSY on attempt to re-init SEV/SEV-ES ...
2024-04-03x86/retpoline: Do the necessary fixup to the Zen3/4 srso return thunk for !SRSOBorislav Petkov (AMD)1-1/+4
The srso_alias_untrain_ret() dummy thunk in the !CONFIG_MITIGATION_SRSO case is there only for the altenative in CALL_UNTRAIN_RET to have a symbol to resolve. However, testing with kernels which don't have CONFIG_MITIGATION_SRSO enabled, leads to the warning in patch_return() to fire: missing return thunk: srso_alias_untrain_ret+0x0/0x10-0x0: eb 0e 66 66 2e WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:826 apply_returns (arch/x86/kernel/alternative.c:826 Put in a plain "ret" there so that gcc doesn't put a return thunk in in its place which special and gets checked. In addition: ERROR: modpost: "srso_alias_untrain_ret" [arch/x86/kvm/kvm-amd.ko] undefined! make[2]: *** [scripts/Makefile.modpost:145: Module.symvers] Chyba 1 make[1]: *** [/usr/src/linux-6.8.3/Makefile:1873: modpost] Chyba 2 make: *** [Makefile:240: __sub-make] Chyba 2 since !SRSO builds would use the dummy return thunk as reported by petr.pisar@atlas.cz, https://bugzilla.kernel.org/show_bug.cgi?id=218679. Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202404020901.da75a60f-oliver.sang@intel.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/all/202404020901.da75a60f-oliver.sang@intel.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-04-03arm64/ptrace: Use saved floating point state type to determine SVE layoutMark Brown1-4/+1
The SVE register sets have two different formats, one of which is a wrapped version of the standard FPSIMD register set and another with actual SVE register data. At present we check TIF_SVE to see if full SVE register state should be provided when reading the SVE regset but if we were in a syscall we may have saved only floating point registers even though that is set. Fix this and simplify the logic by checking and using the format which we recorded when deciding if we should use FPSIMD or SVE format. Fixes: 8c845e273104 ("arm64/sve: Leave SVE enabled on syscall if we don't context switch") Cc: <stable@vger.kernel.org> # 6.2.x Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240325-arm64-ptrace-fp-type-v1-1-8dc846caf11f@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-04-03s390/entry: align system call table on 8 bytesSumanth Korikkar1-0/+1
Align system call table on 8 bytes. With sys_call_table entry size of 8 bytes that eliminates the possibility of a system call pointer crossing cache line boundary. Cc: stable@kernel.org Suggested-by: Ulrich Weigand <ulrich.weigand@de.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-04-03s390/pai: fix sampling event removal for PMU device driverThomas Richter2-6/+14
In case of a sampling event, the PAI PMU device drivers need a reference to this event. Currently to PMU device driver reference is removed when a sampling event is destroyed. This may lead to situations where the reference of the PMU device driver is removed while being used by a different sampling event. Reset the event reference pointer of the PMU device driver when a sampling event is deleted and before the next one might be added. Fixes: 39d62336f5c1 ("s390/pai: add support for cryptography counters") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-04-03s390/preempt: mark all functions __always_inlineIlya Leoshkevich1-18/+18
preempt_count-related functions are quite ubiquitous and may be called by noinstr ones, introducing unwanted instrumentation. Here is one example call chain: irqentry_nmi_enter() # noinstr lockdep_hardirqs_enabled() this_cpu_read() __pcpu_size_call_return() this_cpu_read_*() this_cpu_generic_read() __this_cpu_generic_read_nopreempt() preempt_disable_notrace() __preempt_count_inc() __preempt_count_add() They are very small, so there are no significant downsides to force-inlining them. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20240320230007.4782-3-iii@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-04-03s390/atomic: mark all functions __always_inlineIlya Leoshkevich2-33/+33
Atomic functions are quite ubiquitous and may be called by noinstr ones, introducing unwanted instrumentation. They are very small, so there are no significant downsides to force-inlining them. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20240320230007.4782-2-iii@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-04-03s390/mm: fix NULL pointer dereferenceHeiko Carstens1-1/+1
The recently added check to figure out if a fault happened on gmap ASCE dereferences the gmap pointer in lowcore without checking that it is not NULL. For all non-KVM processes the pointer is NULL, so that some value from lowcore will be read. With the current layouts of struct gmap and struct lowcore the read value (aka ASCE) is zero, so that this doesn't lead to any observable bug; at least currently. Fix this by adding the missing NULL pointer check. Fixes: 64c3431808bd ("s390/entry: compare gmap asce to determine guest/host fault") Acked-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-04-03arm64: dts: imx8qm-ss-dma: fix can lpcg indicesFrank Li1-4/+4
can1_lpcg: clock-controller@5ace0000 { ... Col1 Col2 clocks = <&clk IMX_SC_R_CAN_1 IMX_SC_PM_CLK_PER>,// 0 0 <&dma_ipg_clk>, // 1 4 <&dma_ipg_clk>; // 2 5 clock-indices = <IMX_LPCG_CLK_0>, <IMX_LPCG_CLK_4>, <IMX_LPCG_CLK_5>; }; Col1: index, which existing dts try to get. Col2: actual index in lpcg driver &flexcan2 { clocks = <&can1_lpcg 1>, <&can1_lpcg 0>; ^^ ^^ Should be: clocks = <&can1_lpcg IMX_LPCG_CLK_4>, <&can1_lpcg IMX_LPCG_CLK_0>; }; Arg0 is divided by 4 in lpcg driver. So flexcan get IMX_SC_PM_CLK_PER by <&can1_lpcg 1> and <&can1_lpcg 0>. Although function work, code logic is wrong. Fix it by using correct clock indices. Cc: stable@vger.kernel.org Fixes: be85831de020 ("arm64: dts: imx8qm: add can node in devicetree") Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2024-04-03arm64: dts: imx8-ss-dma: fix can lpcg indicesFrank Li1-6/+6
can0_lpcg: clock-controller@5acd0000 { ... Col1 Col2 clocks = <&clk IMX_SC_R_CAN_0 IMX_SC_PM_CLK_PER>, // 0 0 <&dma_ipg_clk>, // 1 4 <&dma_ipg_clk>; // 2 5 clock-indices = <IMX_LPCG_CLK_0>, <IMX_LPCG_CLK_4>, <IMX_LPCG_CLK_5>; } Col1: index, which existing dts try to get. Col2: actual index in lpcg driver. flexcan1: can@5a8d0000 { clocks = <&can0_lpcg 1>, <&can0_lpcg 0>; ^^ ^^ Should be: clocks = <&can0_lpcg IMX_LPCG_CLK_4>, <&can0_lpcg IMX_LPCG_CLK_0>; }; Arg0 is divided by 4 in lpcg driver. flexcan driver get IMX_SC_PM_CLK_PER by <&can0_lpcg 1> and <&can0_lpcg 0>. Although function can work, code logic is wrong. Fix it by using correct clock indices. Cc: stable@vger.kernel.org Fixes: 5e7d5b023e03 ("arm64: dts: imx8qxp: add flexcan in adma") Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>