aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2021-02-28Linux 5.12-rc1HEADmasterLinus Torvalds1-3/+3
2021-02-28Merge tag 'ide-5.11-2021-02-28' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+2
Pull ide fix from Jens Axboe: "This is a leftover fix from 5.11, where I forgot to ship it your way" * tag 'ide-5.11-2021-02-28' of git://git.kernel.dk/linux-block: ide/falconide: Fix module unload
2021-02-28Merge tag 'kbuild-fixes-v5.12' of ↵Linus Torvalds6-21/+31
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Fix UNUSED_KSYMS_WHITELIST for Clang LTO - Make -s builds really silent irrespective of V= option - Fix build error when SUBLEVEL or PATCHLEVEL is empty * tag 'kbuild-fixes-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: Fix <linux/version.h> for empty SUBLEVEL or PATCHLEVEL again kbuild: make -s option take precedence over V=1 ia64: remove redundant READELF from arch/ia64/Makefile kbuild: do not include include/config/auto.conf from adjust_autoksyms.sh kbuild: fix UNUSED_KSYMS_WHITELIST for Clang LTO kbuild: lto: add _mcount to list of used symbols
2021-02-28Merge tag 'csky-for-linus-5.12-rc1' of git://github.com/c-sky/csky-linuxLinus Torvalds94-992/+1347
Pull arch/csky updates from Guo Ren: "Features: - add new memory layout 2.5G(user):1.5G(kernel) - add kmemleak support - reconstruct VDSO framework: add VDSO with GENERIC_GETTIMEOFDAY, GENERIC_TIME_VSYSCALL, HAVE_GENERIC_VDSO - add faulthandler_disabled() check - support (fix) swapon - add (fix) _PAGE_ACCESSED for default pgprot - abort uaccess retries upon fatal signal (from arm) Fixes and optimizations: - fix perf probe failure - fix show_regs doesn't contain regs->usp - remove custom asm/atomic.h implementation - fix barrier design - fix futex SMP implementation - fix asm/cmpxchg.h with correct ordering barrier - cleanup asm/spinlock.h - fix PTE global for 2.5:1.5 virtual memory - remove prologue of page fault handler in entry.S - fix TLB maintenance synchronization problem - add show_tlb for CPU_CK860 debug - fix FAULT_FLAG_XXX param for handle_mm_fault - fix update_mmu_cache called with user io mapping - fix do_page_fault parent irq status - fix a size determination in gpr_get() - pgtable.h: Coding convention - kprobe: Fix code in simulate without 'long' - fix pfn_valid error with wrong max_mapnr - use free_initmem_default() in free_initmem() - fix compile error" * tag 'csky-for-linus-5.12-rc1' of git://github.com/c-sky/csky-linux: (30 commits) csky: Fixup compile error csky: use free_initmem_default() in free_initmem() csky: Fixup pfn_valid error with wrong max_mapnr csky: Add VDSO with GENERIC_GETTIMEOFDAY, GENERIC_TIME_VSYSCALL, HAVE_GENERIC_VDSO csky: kprobe: Fixup code in simulate without 'long' csky: Fixup swapon csky: pgtable.h: Coding convention csky: Fixup _PAGE_ACCESSED for default pgprot csky: remove unused including <linux/version.h> csky: Fix a size determination in gpr_get() csky: Reconstruct VDSO framework csky: mm: abort uaccess retries upon fatal signal csky: Sync riscv mm/fault.c for easy maintenance csky: Fixup do_page_fault parent irq status csky: Add faulthandler_disabled() check csky: Fixup update_mmu_cache called with user io mapping csky: Fixup FAULT_FLAG_XXX param for handle_mm_fault csky: Add show_tlb for CPU_CK860 debug csky: Fix TLB maintenance synchronization problem csky: Add kmemleak support ...
2021-02-28Merge tag 'riscv-for-linus-5.12-mw1' of ↵Linus Torvalds4-19/+5
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Palmer Dabbelt: "A pair of patches that slipped through the cracks: - enable CPU hotplug in the defconfigs - some cleanups to setup_bootmem There's also a single fix for some randconfig build failures: - make NUMA depend on SMP" * tag 'riscv-for-linus-5.12-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Cleanup setup_bootmem() RISC-V: Enable CPU Hotplug in defconfigs RISC-V: Make NUMA depend on SMP
2021-02-28Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds30-316/+746
Pull more SCSI updates from James Bottomley: "This is a few driver updates (iscsi, mpt3sas) that were still in the staging queue when the merge window opened (all committed on or before 8 Feb) and some small bug fixes which came in during the merge window (all committed on 22 Feb)" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (30 commits) scsi: hpsa: Correct dev cmds outstanding for retried cmds scsi: sd: Fix Opal support scsi: target: tcmu: Fix memory leak caused by wrong uio usage scsi: target: tcmu: Move some functions without code change scsi: sd: sd_zbc: Don't pass GFP_NOIO to kvcalloc scsi: aic7xxx: Remove unused function pointer typedef ahc_bus_suspend/resume_t scsi: bnx2fc: Fix Kconfig warning & CNIC build errors scsi: ufs: Fix a duplicate dev quirk number scsi: aic79xx: Fix spelling of version scsi: target: core: Prevent underflow for service actions scsi: target: core: Add cmd length set before cmd complete scsi: iscsi: Drop session lock in iscsi_session_chkready() scsi: qla4xxx: Use iscsi_is_session_online() scsi: libiscsi: Reset max/exp cmdsn during recovery scsi: iscsi_tcp: Fix shost can_queue initialization scsi: libiscsi: Add helper to calculate max SCSI cmds per session scsi: libiscsi: Fix iSCSI host workq destruction scsi: libiscsi: Fix iscsi_task use after free() scsi: libiscsi: Drop taskqueuelock scsi: libiscsi: Fix iscsi_prep_scsi_cmd_pdu() error handling ...
2021-02-28Merge tag 'xfs-5.12-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds8-70/+94
Pull more xfs updates from Darrick Wong: "The most notable fix here prevents premature reuse of freed metadata blocks, and adding the ability to detect accidental nested transactions, which are not allowed here. - Restore a disused sysctl control knob that was inadvertently dropped during the merge window to avoid fstests regressions. - Don't speculatively release freed blocks from the busy list until we're actually allocating them, which fixes a rare log recovery regression. - Don't nest transactions when scanning for free space. - Add an idiot^Wmaintainer light to detect nested transactions. ;)" * tag 'xfs-5.12-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: use current->journal_info for detecting transaction recursion xfs: don't nest transactions when scanning for eofblocks xfs: don't reuse busy extents on extent trim xfs: restore speculative_cow_prealloc_lifetime sysctl
2021-02-28Merge tag 'block-5.12-2021-02-27' of git://git.kernel.dk/linux-blockLinus Torvalds40-180/+173
Pull more block updates from Jens Axboe: "A few stragglers (and one due to me missing it originally), and fixes for changes in this merge window mostly. In particular: - blktrace cleanups (Chaitanya, Greg) - Kill dead blk_pm_* functions (Bart) - Fixes for the bio alloc changes (Christoph) - Fix for the partition changes (Christoph, Ming) - Fix for turning off iopoll with polled IO inflight (Jeffle) - nbd disconnect fix (Josef) - loop fsync error fix (Mauricio) - kyber update depth fix (Yang) - max_sectors alignment fix (Mikulas) - Add bio_max_segs helper (Matthew)" * tag 'block-5.12-2021-02-27' of git://git.kernel.dk/linux-block: (21 commits) block: Add bio_max_segs blktrace: fix documentation for blk_fill_rw() block: memory allocations in bounce_clone_bio must not fail block: remove the gfp_mask argument to bounce_clone_bio block: fix bounce_clone_bio for passthrough bios block-crypto-fallback: use a bio_set for splitting bios block: fix logging on capacity change blk-settings: align max_sectors on "logical_block_size" boundary block: reopen the device in blkdev_reread_part block: don't skip empty device in in disk_uevent blktrace: remove debugfs file dentries from struct blk_trace nbd: handle device refs for DESTROY_ON_DISCONNECT properly kyber: introduce kyber_depth_updated() loop: fix I/O error on fsync() in detached loop devices block: fix potential IO hang when turning off io_poll block: get rid of the trace rq insert wrapper blktrace: fix blk_rq_merge documentation blktrace: fix blk_rq_issue documentation blktrace: add blk_fill_rwbs documentation comment block: remove superfluous param in blk_fill_rwbs() ...
2021-02-28kbuild: Fix <linux/version.h> for empty SUBLEVEL or PATCHLEVEL againMasahiro Yamada1-2/+4
Commit 78d3bb4483ba ("kbuild: Fix <linux/version.h> for empty SUBLEVEL or PATCHLEVEL") fixed the build error for empty SUBLEVEL or PATCHLEVEL by prepending a zero. Commit 9b82f13e7ef3 ("kbuild: clamp SUBLEVEL to 255") re-introduced this issue. This time, we cannot take the same approach because we have C code: #define LINUX_VERSION_PATCHLEVEL $(PATCHLEVEL) #define LINUX_VERSION_SUBLEVEL $(SUBLEVEL) Replace empty SUBLEVEL/PATCHLEVEL with a zero. Fixes: 9b82f13e7ef3 ("kbuild: clamp SUBLEVEL to 255") Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-and-tested-by: Sasha Levin <sashal@kernel.org>
2021-02-28kbuild: make -s option take precedence over V=1Masahiro Yamada1-0/+1
'make -s' should be really silent. However, 'make -s V=1' prints noisy log messages from some shell scripts. Of course, such a combination is odd, but the build system needs to do the right thing even if a user gives strange input. If -s is given, KBUILD_VERBOSE should be forced to 0. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-02-28ia64: remove redundant READELF from arch/ia64/MakefileMasahiro Yamada1-1/+0
READELF is defined by the top Makefile. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-02-28kbuild: do not include include/config/auto.conf from adjust_autoksyms.shMasahiro Yamada1-3/+0
Commit cd195bc4775a ("kbuild: split adjust_autoksyms.sh in two parts") split out the code that needs include/config/auto.conf. This script no longer needs to include include/config/auto.conf. Fixes: cd195bc4775a ("kbuild: split adjust_autoksyms.sh in two parts") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-02-28kbuild: fix UNUSED_KSYMS_WHITELIST for Clang LTOMasahiro Yamada3-16/+26
Commit fbe078d397b4 ("kbuild: lto: add a default list of used symbols") does not work as expected if the .config file has already specified CONFIG_UNUSED_KSYMS_WHITELIST="my/own/white/list" before enabling CONFIG_LTO_CLANG. So, the user-supplied whitelist and LTO-specific white list must be independent of each other. I refactored the shell script so CONFIG_MODVERSIONS and CONFIG_CLANG_LTO handle whitelists in the same way. Fixes: fbe078d397b4 ("kbuild: lto: add a default list of used symbols") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
2021-02-27Merge tag 'io_uring-worker.v3-2021-02-25' of git://git.kernel.dk/linux-blockLinus Torvalds38-1114/+694
Pull io_uring thread rewrite from Jens Axboe: "This converts the io-wq workers to be forked off the tasks in question instead of being kernel threads that assume various bits of the original task identity. This kills > 400 lines of code from io_uring/io-wq, and it's the worst part of the code. We've had several bugs in this area, and the worry is always that we could be missing some pieces for file types doing unusual things (recent /dev/tty example comes to mind, userfaultfd reads installing file descriptors is another fun one... - both of which need special handling, and I bet it's not the last weird oddity we'll find). With these identical workers, we can have full confidence that we're never missing anything. That, in itself, is a huge win. Outside of that, it's also more efficient since we're not wasting space and code on tracking state, or switching between different states. I'm sure we're going to find little things to patch up after this series, but testing has been pretty thorough, from the usual regression suite to production. Any issue that may crop up should be manageable. There's also a nice series of further reductions we can do on top of this, but I wanted to get the meat of it out sooner rather than later. The general worry here isn't that it's fundamentally broken. Most of the little issues we've found over the last week have been related to just changes in how thread startup/exit is done, since that's the main difference between using kthreads and these kinds of threads. In fact, if all goes according to plan, I want to get this into the 5.10 and 5.11 stable branches as well. That said, the changes outside of io_uring/io-wq are: - arch setup, simple one-liner to each arch copy_thread() implementation. - Removal of net and proc restrictions for io_uring, they are no longer needed or useful" * tag 'io_uring-worker.v3-2021-02-25' of git://git.kernel.dk/linux-block: (30 commits) io-wq: remove now unused IO_WQ_BIT_ERROR io_uring: fix SQPOLL thread handling over exec io-wq: improve manager/worker handling over exec io_uring: ensure SQPOLL startup is triggered before error shutdown io-wq: make buffered file write hashed work map per-ctx io-wq: fix race around io_worker grabbing io-wq: fix races around manager/worker creation and task exit io_uring: ensure io-wq context is always destroyed for tasks arch: ensure parisc/powerpc handle PF_IO_WORKER in copy_thread() io_uring: cleanup ->user usage io-wq: remove nr_process accounting io_uring: flag new native workers with IORING_FEAT_NATIVE_WORKERS net: remove cmsg restriction from io_uring based send/recvmsg calls Revert "proc: don't allow async path resolution of /proc/self components" Revert "proc: don't allow async path resolution of /proc/thread-self components" io_uring: move SQPOLL thread io-wq forked worker io-wq: make io_wq_fork_thread() available to other users io-wq: only remove worker from free_list, if it was there io_uring: remove io_identity io_uring: remove any grabbing of context ...
2021-02-27Merge branch 'work.misc' of ↵Linus Torvalds14-65/+53
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted stuff pile - no common topic here" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: whack-a-mole: don't open-code iminor/imajor 9p: fix misuse of sscanf() in v9fs_stat2inode() audit_alloc_mark(): don't open-code ERR_CAST() fs/inode.c: make inode_init_always() initialize i_ino to 0 vfs: don't unnecessarily clone write access for writable fds
2021-02-27Merge branch 'i2c/for-current' of ↵Linus Torvalds5-37/+11
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Three more bugfixes and one revert. I accidently applied one patch too early" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: exynos5: Preserve high speed master code Revert "i2c: i2c-qcom-geni: Add shutdown callback for i2c" i2c: designware: Get right data length i2c: brcmstb: Fix brcmstd_send_i2c_cmd condition
2021-02-27csky: Fixup compile errorGuo Ren52-52/+0
: error: C++ style comments are not allowed in ISO C90 // Copyright (C) 2018 Hangzhou C-SKY Microsystems co.,ltd. ^ error: (this will be reported only once per input file) Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2021-02-27csky: use free_initmem_default() in free_initmem()David Hildenbrand1-16/+1
The existing code is essentially free_initmem_default()->free_reserved_area() without poisoning. Note that existing code missed to update the managed page count of the zone. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Tested-by: Guo Ren <guoren@kernel.org> Signed-off-by: Guo Ren <guoren@kernel.org> Signed-off-by: David Hildenbrand <david@redhat.com>
2021-02-27csky: Fixup pfn_valid error with wrong max_mapnrGuo Ren1-2/+2
The max_mapnr is the number of PFNs, not absolute PFN offset. Using set_max_mapnr API instead of setting the value directly. Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2021-02-27csky: Add VDSO with GENERIC_GETTIMEOFDAY, GENERIC_TIME_VSYSCALL, ↵Guo Ren14-3/+225
HAVE_GENERIC_VDSO It could help to reduce the latency of the time-related functions in user space. We have referenced arm's and riscv's implementation for the patch. Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Cc: Vincent Chen <vincent.chen@sifive.com> Cc: Arnd Bergmann <arnd@arndb.de>
2021-02-27csky: kprobe: Fixup code in simulate without 'long'Guo Ren1-15/+7
The type of 'val' is 'unsigned long' in simulate_blz32, so 'val < 0' can't be true. Cast 'val' to 'long' here to determine branch token or not, Fixup instructions: bnezad32, bhsz32, bhz32, blsz32, blz32 Link: https://lore.kernel.org/linux-csky/CAJF2gTQjKXR9gpo06WAWG1aquiT87mATiMGorXs6ChxOxoe90Q@mail.gmail.com/T/#t Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Co-developed-by: Menglong Dong <dong.menglong@zte.com.cn> Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
2021-02-27csky: Fixup swaponGuo Ren3-9/+52
Current csky's swappon is broken by wrong swap PTE entry format. Now redesign the new format for abiv1 & abiv2 and make swappon + zram work properly on csky machines. C-SKY PTE has VALID, DIRTY to emulate PRESENT, READ, WRITE, EXEC attributes. GLOBAL bit is shared by two pages in the same tlb entry. So we need to keep GLOBAL, VALID, PRESENT zero in swp_pte. To distinguish PAGE_NONE and swp_pte, we need to use an additional bit (abiv1 is _PAGE_READ, abiv2 is _PAGE_WRITE). Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Cc: Arnd Bergmann <arnd@arndb.de>
2021-02-27csky: pgtable.h: Coding conventionGuo Ren3-55/+36
C-SKY page table attributes only have 'Dirty' and 'Valid' to emulate 'PRESENT, READ, WRITE, EXEC, DIRTY, ACCESSED'. This patch cleanup unnecessary definition. Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Cc: Arnd Bergmann <arnd@arndb.de>
2021-02-27kbuild: lto: add _mcount to list of used symbolsArnd Bergmann1-0/+1
Some randconfig builds fail with undefined references to _mcount when CONFIG_TRIM_UNUSED_KSYMS is set: ERROR: modpost: "_mcount" [drivers/tee/optee/optee.ko] undefined! ERROR: modpost: "_mcount" [drivers/fsi/fsi-occ.ko] undefined! ERROR: modpost: "_mcount" [drivers/fpga/dfl-pci.ko] undefined! Since there is already a list of symbols that get generated at link time, add this one as well. Fixes: fbe078d397b4 ("kbuild: lto: add a default list of used symbols") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-02-26riscv: Cleanup setup_bootmem()Kefeng Wang1-19/+2
After the following patches, commit de043da0b9e7 ("RISC-V: Fix usage of memblock_enforce_memory_limit") commit 1bd14a66ee52 ("RISC-V: Remove any memblock representing unusable memory area") commit b10d6bca8720 ("arch, drivers: replace for_each_membock() with for_each_mem_range()") some logic is useless, kill the mem_start/start/end and unneeded code. Reviewed-by: Atish Patra <atish.patra@wdc.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-02-26RISC-V: Enable CPU Hotplug in defconfigsAnup Patel2-0/+2
The CPU hotplug support has been tested on QEMU, Spike, and SiFive Unleashed so let's enable it by default in RV32 and RV64 defconfigs. Signed-off-by: Anup Patel <anup.patel@wdc.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-02-26RISC-V: Make NUMA depend on SMPPalmer Dabbelt1-0/+1
In theory these are orthogonal, but in practice all NUMA systems are SMP. NUMA && !SMP doesn't build, everyone else is coupling them, and I don't really see any value in supporting that configuration. Fixes: 4f0e8eef772e ("riscv: Add numa support for riscv64 platform") Suggested-by: Andrew Morton <akpm@linux-foundation.org> Suggested-by: Atish Patra <atishp@atishpatra.org> Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com> Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-02-26block: Add bio_max_segsMatthew Wilcox (Oracle)20-52/+44
It's often inconvenient to use BIO_MAX_PAGES due to min() requiring the sign to be the same. Introduce bio_max_segs() and change BIO_MAX_PAGES to be unsigned to make it easier for the users. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-26Merge tag 'docs-5.12-2' of git://git.lwn.net/linuxLinus Torvalds53-81/+56
Pull documentation fixes from Jonathan Corbet: "A handful of late-arriving documentation fixes, nothing all that notable" * tag 'docs-5.12-2' of git://git.lwn.net/linux: docs: proc.rst: fix indentation warning Documentation: cgroup-v2: fix path to example BPF program docs: powerpc: Fix tables in syscall64-abi.rst Documentation: features: refresh feature list Documentation: features: remove c6x references docs: ABI: testing: ima_policy: Fixed missing bracket Fix unaesthetic indentation scripts: kernel-doc: fix array element capture in pointer-to-func parsing doc: use KCFLAGS instead of EXTRA_CFLAGS to pass flags from command line Documentation: proc.rst: add more about the 6 fields in loadavg
2021-02-26Merge tag 'for-linus' of git://github.com/openrisc/linuxLinus Torvalds9-141/+211
Pull OpenRISC updates from Stafford Horne: - Update for Litex SoC controller to support wider width registers as well as reset. - Refactor SMP code to use device tree to define possible cpus. - Update build including generating vmlinux.bin * tag 'for-linus' of git://github.com/openrisc/linux: openrisc: Use devicetree to determine present cpus drivers/soc/litex: Add restart handler openrisc: add arch/openrisc/Kbuild drivers/soc/litex: make 'litex_[set|get]_reg()' methods private drivers/soc/litex: support 32-bit subregisters, 64-bit CPUs drivers/soc/litex: s/LITEX_REG_SIZE/LITEX_SUBREG_ALIGN/g drivers/soc/litex: separate MMIO from subregister offset calculation drivers/soc/litex: move generic accessors to litex.h openrisc: restart: Call common handlers before hanging openrisc: Add vmlinux.bin target
2021-02-26Merge tag 's390-5.12-2' of ↵Linus Torvalds12-81/+660
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull more s390 updates from Vasily Gorbik: - Fix physical vs virtual confusion in some basic mm macros and routines. Caused by __pa == __va on s390 currently. - Get rid of on-stack cpu masks. - Add support for complete CPU counter set extraction. - Add arch_irq_work_raise implementation. - virtio-ccw revision and opcode fixes. * tag 's390-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/cpumf: Add support for complete counter set extraction virtio/s390: implement virtio-ccw revision 2 correctly s390/smp: implement arch_irq_work_raise() s390/topology: move cpumasks away from stack s390/smp: smp_emergency_stop() - move cpumask away from stack s390/smp: __smp_rescan_cpus() - move cpumask away from stack s390/smp: consolidate locking for smp_rescan() s390/mm: fix phys vs virt confusion in vmem_*() functions family s390/mm: fix phys vs virt confusion in pgtable allocation routines s390/mm: fix invalid __pa() usage in pfn_pXd() macros s390/mm: make pXd_deref() macros return a pointer s390/opcodes: rename selhhhr to selfhr
2021-02-26Merge tag '5.12-smb3-part1' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds26-414/+883
Pull cifs updates from Steve French: - improvements to mode bit conversion, chmod and chown when using cifsacl mount option - two new mount options for controlling attribute caching - improvements to crediting and reconnect, improved debugging - reconnect fix - add SMB3.1.1 dialect to default dialects for vers=3 * tag '5.12-smb3-part1' of git://git.samba.org/sfrench/cifs-2.6: (27 commits) cifs: update internal version number cifs: use discard iterator to discard unneeded network data more efficiently cifs: introduce helper for finding referral server to improve DFS target resolution cifs: check all path components in resolved dfs target cifs: fix DFS failover cifs: fix nodfs mount option cifs: fix handling of escaped ',' in the password mount argument cifs: Add new parameter "acregmax" for distinct file and directory metadata timeout cifs: convert revalidate of directories to using directory metadata cache timeout cifs: Add new mount parameter "acdirmax" to allow caching directory metadata cifs: If a corrupted DACL is returned by the server, bail out. cifs: minor simplification to smb2_is_network_name_deleted TCON Reconnect during STATUS_NETWORK_NAME_DELETED cifs: cleanup a few le16 vs. le32 uses in cifsacl.c cifs: Change SIDs in ACEs while transferring file ownership. cifs: Retain old ACEs when converting between mode bits and ACL. cifs: Fix cifsacl ACE mask for group and others. cifs: clarify hostname vs ip address in /proc/fs/cifs/DebugData cifs: change confusing field serverName (to ip_addr) cifs: Fix inconsistent IS_ERR and PTR_ERR ...
2021-02-26Merge tag 'for-5.12/io_uring-2021-02-25' of git://git.kernel.dk/linux-blockLinus Torvalds1-340/+346
Pull more io_uring updates from Jens Axboe: "A collection of later fixes that we should get into this release: - Series of submission cleanups (Pavel) - A few fixes for issues from earlier this merge window (Pavel, me) - IOPOLL resubmission fix - task_work locking fix (Hao)" * tag 'for-5.12/io_uring-2021-02-25' of git://git.kernel.dk/linux-block: (25 commits) Revert "io_uring: wait potential ->release() on resurrect" io_uring: fix locked_free_list caches_free() io_uring: don't attempt IO reissue from the ring exit path io_uring: clear request count when freeing caches io_uring: run task_work on io_uring_register() io_uring: fix leaving invalid req->flags io_uring: wait potential ->release() on resurrect io_uring: keep generic rsrc infra generic io_uring: zero ref_node after killing it io_uring: make the !CONFIG_NET helpers a bit more robust io_uring: don't hold uring_lock when calling io_run_task_work* io_uring: fail io-wq submission from a task_work io_uring: don't take uring_lock during iowq cancel io_uring: fail links more in io_submit_sqe() io_uring: don't do async setup for links' heads io_uring: do io_*_prep() early in io_submit_sqe() io_uring: split sqe-prep and async setup io_uring: don't submit link on error io_uring: move req link into submit_state io_uring: move io_init_req() into io_submit_sqe() ...
2021-02-26Merge branch 'stable/for-linus-5.12' of ↵Linus Torvalds6-123/+215
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb Pull swiotlb updates from Konrad Rzeszutek Wilk: "Two memory encryption related patches (SWIOTLB is enabled by default for AMD-SEV): - Add support for alignment so that NVME can properly work - Keep track of requested DMA buffers length, as underlaying hardware devices can trip SWIOTLB to bounce too much and crash the kernel And a tiny fix to use proper APIs in drivers" * 'stable/for-linus-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb: swiotlb: Validate bounce size in the sync/unmap path nvme-pci: set min_align_mask swiotlb: respect min_align_mask swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single swiotlb: refactor swiotlb_tbl_map_single swiotlb: clean up swiotlb_tbl_unmap_single swiotlb: factor out a nr_slots helper swiotlb: factor out an io_tlb_offset helper swiotlb: add a IO_TLB_SIZE define driver core: add a min_align_mask field to struct device_dma_parameters sdhci: stop poking into swiotlb internals
2021-02-26Merge tag 'leds-5.12-rc1' of ↵Linus Torvalds24-169/+1196
git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds Pull LED updates from Pavel Machek: "Besides the usual fixes and new drivers, we are changing CLASS_FLASH to return success to make it easier to work with V4L2 stuff disabled, and we are getting rid of enum that should have been plain integer long time ago. I'm slightly nervous about potential warnings, but it needed to be fixed at some point" * tag 'leds-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds: leds: lp50xx: Get rid of redundant explicit casting leds: lp50xx: Update headers block to reflect reality leds: lp50xx: Get rid of redundant check in lp50xx_enable_disable() leds: lp50xx: Reduce level of dereferences leds: lp50xx: Switch to new style i2c-driver probe function leds: lp50xx: Don't spam logs when probe is deferred leds: apu: extend support for PC Engines APU1 with newer firmware leds: flash: Fix multicolor no-ops registration by return 0 leds: flash: Add flash registration with undefined CONFIG_LEDS_CLASS_FLASH leds: lgm: Add LED controller driver for LGM SoC dt-bindings: leds: Add bindings for Intel LGM SoC leds: led-core: Get rid of enum led_brightness leds: gpio: Set max brightness to 1 leds: lm3533: Switch to using the new API kobj_to_dev() leds: ss4200: simplify the return expression of register_nasgpio_led() leds: Use DEVICE_ATTR_{RW, RO, WO} macros
2021-02-26Merge branch 'pcmcia-next' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux Pull pcmcia update from Dominik Brodowski: "Improve the use of the kobj API in the core of the Linux PCMCIA subsystem" * 'pcmcia-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: pcmcia: Switch to using the new API kobj_to_dev()
2021-02-26Merge tag 'riscv-for-linus-5.12-mw0' of ↵Linus Torvalds121-1040/+7605
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: "A handful of new RISC-V related patches for this merge window: - A check to ensure drivers are properly using uaccess. This isn't manifesting with any of the drivers I'm currently using, but may catch errors in new drivers. - Some preliminary support for the FU740, along with the HiFive Unleashed it will appear on. - NUMA support for RISC-V, which involves making the arm64 code generic. - Support for kasan on the vmalloc region. - A handful of new drivers for the Kendryte K210, along with the DT plumbing required to boot on a handful of K210-based boards. - Support for allocating ASIDs. - Preliminary support for kernels larger than 128MiB. - Various other improvements to our KASAN support, including the utilization of huge pages when allocating the KASAN regions. We may have already found a bug with the KASAN_VMALLOC code, but it's passing my tests. There's a fix in the works, but that will probably miss the merge window. * tag 'riscv-for-linus-5.12-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (75 commits) riscv: Improve kasan population by using hugepages when possible riscv: Improve kasan population function riscv: Use KASAN_SHADOW_INIT define for kasan memory initialization riscv: Improve kasan definitions riscv: Get rid of MAX_EARLY_MAPPING_SIZE soc: canaan: Sort the Makefile alphabetically riscv: Disable KSAN_SANITIZE for vDSO riscv: Remove unnecessary declaration riscv: Add Canaan Kendryte K210 SD card defconfig riscv: Update Canaan Kendryte K210 defconfig riscv: Add Kendryte KD233 board device tree riscv: Add SiPeed MAIXDUINO board device tree riscv: Add SiPeed MAIX GO board device tree riscv: Add SiPeed MAIX DOCK board device tree riscv: Add SiPeed MAIX BiT board device tree riscv: Update Canaan Kendryte K210 device tree dt-bindings: add resets property to dw-apb-timer dt-bindings: fix sifive gpio properties dt-bindings: update sifive uart compatible string dt-bindings: update sifive clint compatible string ...
2021-02-26Merge tag 'arm64-fixes' of ↵Linus Torvalds12-27/+44
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "The big one is a fix for the VHE enabling path during early boot, where the code enabling the MMU wasn't necessarily in the identity map of the new page-tables, resulting in a consistent crash with 64k pages. In fixing that, we noticed some missing barriers too, so we added those for the sake of architectural compliance. Other than that, just the usual merge window trickle. There'll be more to come, too. Summary: - Fix lockdep false alarm on resume-from-cpuidle path - Fix memory leak in kexec_file - Fix module linker script to work with GDB - Fix error code when trying to use uprobes with AArch32 instructions - Fix late VHE enabling with 64k pages - Add missing ISBs after TLB invalidation - Fix seccomp when tracing syscall -1 - Fix stacktrace return code at end of stack - Fix inconsistent whitespace for pointer return values - Fix compiler warnings when building with W=1" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: stacktrace: Report when we reach the end of the stack arm64: ptrace: Fix seccomp of traced syscall -1 (NO_SYSCALL) arm64: Add missing ISB after invalidating TLB in enter_vhe arm64: Add missing ISB after invalidating TLB in __primary_switch arm64: VHE: Enable EL2 MMU from the idmap KVM: arm64: make the hyp vector table entries local arm64/mm: Fixed some coding style issues arm64: uprobe: Return EOPNOTSUPP for AARCH32 instruction probing kexec: move machine_kexec_post_load() to public interface arm64 module: set plt* section addresses to 0x0 arm64: kexec_file: fix memory leakage in create_dtb() when fdt_open_into() fails arm64: spectre: Prevent lockdep splat on v4 mitigation enable path
2021-02-26Merge tag 'm68knommu-for-v5.12' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu Pull m68knommu update from Greg Ungerer: "Only a single change. NULL parameter check in the local ColdFire clocking code" * tag 'm68knommu-for-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: m68k: let clk_enable() return immediately if clk is NULL
2021-02-26Merge tag 'trace-v5.12-2' of ↵Linus Torvalds2-6/+6
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Two fixes: - Fix an unsafe printf string usage in a kmem trace event - Fix spelling in output from the latency-collector tool" * tag 'trace-v5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/tools: fix a couple of spelling mistakes mm, tracing: Fix kmem_cache_free trace event to not print stale pointers
2021-02-26Merge tag 'orphan-handling-v5.12-rc1' of ↵Linus Torvalds1-4/+5
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull orphan handling fix from Kees Cook: "Another case of bogus .eh_frame emission was noticed under CONFIG_GCOV_KERNEL=y. Summary: - Define SANITIZER_DISCARDS with CONFIG_GCOV_KERNEL=y (Nathan Chancellor)" * tag 'orphan-handling-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: vmlinux.lds.h: Define SANITIZER_DISCARDS with CONFIG_GCOV_KERNEL=y
2021-02-26Merge tag 'clang-lto-v5.12-rc1-fix1' of ↵Linus Torvalds2-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull clang LTO fixes from Kees Cook: "This gets parisc building again and moves LTO artifact caching cleanup from the 'distclean' build target to 'clean'. Summary: - Fix parisc build for ftrace vs mcount (Sami Tolvanen) - Move .thinlto-cache remove to "clean" from "distclean" (Masahiro Yamada)" * tag 'clang-lto-v5.12-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: kbuild: Move .thinlto-cache removal to 'make clean' parisc: select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
2021-02-26Merge tag 'for-linus-5.12b-rc1-tag' of ↵Linus Torvalds7-16/+168
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull more xen updates from Juergen Gross: - A small series for Xen event channels adding some sysfs nodes for per pv-device settings and statistics, and two fixes of theoretical problems. - two minor fixes (one for an unlikely error path, one for a comment). * tag 'for-linus-5.12b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen-front-pgdir-shbuf: don't record wrong grant handle upon error xen: Replace lkml.org links with lore xen/evtchn: use READ/WRITE_ONCE() for accessing ring indices xen/evtchn: use smp barriers for user event ring xen/events: add per-xenbus device event statistics and settings
2021-02-26Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds24-423/+533
Pull more KVM updates from Paolo Bonzini: "x86: - take into account HVA before retrying on MMU notifier race - fixes for nested AMD guests without NPT - allow INVPCID in guest without PCID - disable PML in hardware when not in use - MMU code cleanups: * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits) KVM: SVM: Fix nested VM-Exit on #GP interception handling KVM: vmx/pmu: Fix dummy check if lbr_desc->event is created KVM: x86/mmu: Consider the hva in mmu_notifier retry KVM: x86/mmu: Skip mmu_notifier check when handling MMIO page fault KVM: Documentation: rectify rst markup in KVM_GET_SUPPORTED_HV_CPUID KVM: nSVM: prepare guest save area while is_guest_mode is true KVM: x86/mmu: Remove a variety of unnecessary exports KVM: x86: Fold "write-protect large" use case into generic write-protect KVM: x86/mmu: Don't set dirty bits when disabling dirty logging w/ PML KVM: VMX: Dynamically enable/disable PML based on memslot dirty logging KVM: x86: Further clarify the logic and comments for toggling log dirty KVM: x86: Move MMU's PML logic to common code KVM: x86/mmu: Make dirty log size hook (PML) a value, not a function KVM: x86/mmu: Expand on the comment in kvm_vcpu_ad_need_write_protect() KVM: nVMX: Disable PML in hardware when running L2 KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEs KVM: x86/mmu: Pass the memslot to the rmap callbacks KVM: x86/mmu: Split out max mapping level calculation to helper KVM: x86/mmu: Expand collapsible SPTE zap for TDP MMU to ZONE_DEVICE and HugeTLB pages KVM: nVMX: no need to undo inject_page_fault change on nested vmexit ...
2021-02-26Merge branch 'akpm' (patches from Andrew)Linus Torvalds145-1507/+4870
Merge more updates from Andrew Morton: "118 patches: - The rest of MM. Includes kfence - another runtime memory validator. Not as thorough as KASAN, but it has unmeasurable overhead and is intended to be usable in production builds. - Everything else Subsystems affected by this patch series: alpha, procfs, sysctl, misc, core-kernel, MAINTAINERS, lib, bitops, checkpatch, init, coredump, seq_file, gdb, ubsan, initramfs, and mm (thp, cma, vmstat, memory-hotplug, mlock, rmap, zswap, zsmalloc, cleanups, kfence, kasan2, and pagemap2)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits) MIPS: make userspace mapping young by default initramfs: panic with memory information ubsan: remove overflow checks kgdb: fix to kill breakpoints on initmem after boot scripts/gdb: fix list_for_each x86: fix seq_file iteration for pat/memtype.c seq_file: document how per-entry resources are managed. fs/coredump: use kmap_local_page() init/Kconfig: fix a typo in CC_VERSION_TEXT help text init: clean up early_param_on_off() macro init/version.c: remove Version_<LINUX_VERSION_CODE> symbol checkpatch: do not apply "initialise globals to 0" check to BPF progs checkpatch: don't warn about colon termination in linker scripts checkpatch: add kmalloc_array_node to unnecessary OOM message check checkpatch: add warning for avoiding .L prefix symbols in assembly files checkpatch: improve TYPECAST_INT_CONSTANT test message checkpatch: prefer ftrace over function entry/exit printks checkpatch: trivial style fixes checkpatch: ignore warning designated initializers using NR_CPUS checkpatch: improve blank line after declaration test ...
2021-02-26MIPS: make userspace mapping young by defaultHuang Pei3-26/+16
MIPS page fault path(except huge page) takes 3 exceptions (1 TLB Miss + 2 TLB Invalid), butthe second TLB Invalid exception is just triggered by __update_tlb from do_page_fault writing tlb without _PAGE_VALID set. With this patch, user space mapping prot is made young by default (with both _PAGE_VALID and _PAGE_YOUNG set), and it only take 1 TLB Miss + 1 TLB Invalid exception Remove pte_sw_mkyoung without polluting MM code and make page fault delay of MIPS on par with other architecture Link: https://lkml.kernel.org/r/20210204013942.8398-1-huangpei@loongson.cn Signed-off-by: Huang Pei <huangpei@loongson.cn> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: <huangpei@loongson.cn> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: <ambrosehua@gmail.com> Cc: Bibo Mao <maobibo@loongson.cn> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: Paul Burton <paulburton@kernel.org> Cc: Li Xuefeng <lixuefeng@loongson.cn> Cc: Yang Tiezhu <yangtiezhu@loongson.cn> Cc: Gao Juxin <gaojuxin@loongson.cn> Cc: Fuxin Zhang <zhangfx@lemote.com> Cc: Huacai Chen <chenhc@lemote.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26initramfs: panic with memory informationFlorian Fainelli1-4/+15
On systems with large amounts of reserved memory we may fail to successfully complete unpack_to_rootfs() and be left with: Kernel panic - not syncing: write error this is not too helpful to understand what happened, so let's wrap the panic() calls with a surrounding show_mem() such that we have a chance of understanding the memory conditions leading to these allocation failures. [akpm@linux-foundation.org: replace macro with C function] Link: https://lkml.kernel.org/r/20210114231517.1854379-1-f.fainelli@gmail.com Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Cc: Barret Rhoden <brho@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26ubsan: remove overflow checksAndrey Ryabinin4-136/+0
Since GCC 8.0 -fsanitize=signed-integer-overflow doesn't work with -fwrapv. -fwrapv makes signed overflows defines and GCC essentially disables ubsan checks. On GCC < 8.0 -fwrapv doesn't have influence on -fsanitize=signed-integer-overflow setting, so it kinda works but generates false-positves and violates uaccess rules: lib/iov_iter.o: warning: objtool: iovec_from_user()+0x22d: call to __ubsan_handle_add_overflow() with UACCESS enabled Disable signed overflow checks to avoid these problems. Remove unsigned overflow checks as well. Unsigned overflow appeared as side effect of commit cdf8a76fda4a ("ubsan: move cc-option tests into Kconfig"), but it never worked (kernel doesn't boot). And unsigned overflows are allowed by C standard, so it just pointless. Link: https://lkml.kernel.org/r/20210209232348.20510-1-ryabinin.a.a@gmail.com Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26kgdb: fix to kill breakpoints on initmem after bootSumit Garg3-0/+14
Currently breakpoints in kernel .init.text section are not handled correctly while allowing to remove them even after corresponding pages have been freed. Fix it via killing .init.text section breakpoints just prior to initmem pages being freed. Doug: "HW breakpoints aren't handled by this patch but it's probably not such a big deal". Link: https://lkml.kernel.org/r/20210224081652.587785-1-sumit.garg@linaro.org Signed-off-by: Sumit Garg <sumit.garg@linaro.org> Suggested-by: Doug Anderson <dianders@chromium.org> Acked-by: Doug Anderson <dianders@chromium.org> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Tested-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26scripts/gdb: fix list_for_eachGeorge Prekas1-0/+5
If the list is uninitialized (next pointer is NULL), list_for_each gets stuck in an infinite loop. Print a message and treat list as empty. Link: https://lkml.kernel.org/r/4ae23bb1-c333-f669-da2d-fa35c4f49018@amazon.com Signed-off-by: George Prekas <prekageo@amazon.com> Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Kieran Bingham <kbingham@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26x86: fix seq_file iteration for pat/memtype.cNeilBrown1-2/+2
The memtype seq_file iterator allocates a buffer in the ->start and ->next functions and frees it in the ->show function. The preferred handling for such resources is to free them in the subsequent ->next or ->stop function call. Since Commit 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface") there is no guarantee that ->show will be called after ->next, so this function can now leak memory. So move the freeing of the buffer to ->next and ->stop. Link: https://lkml.kernel.org/r/161248539022.21478.13874455485854739066.stgit@noble1 Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface") Signed-off-by: NeilBrown <neilb@suse.de> Cc: Xin Long <lucien.xin@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26seq_file: document how per-entry resources are managed.NeilBrown1-0/+6
Patch series "Fix some seq_file users that were recently broken". A recent change to seq_file broke some users which were using seq_file in a non-"standard" way ... though the "standard" isn't documented, so they can be excused. The result is a possible leak - of memory in one case, of references to a 'transport' in the other. These three patches: 1/ document and explain the problem 2/ fix the problem user in x86 3/ fix the problem user in net/sctp This patch (of 3): Users of seq_file will sometimes find it convenient to take a resource, such as a lock or memory allocation, in the ->start or ->next operations. These are per-entry resources, distinct from per-session resources which are taken in ->start and released in ->stop. The preferred management of these is release the resource on the subsequent call to ->next or ->stop. However prior to Commit 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface") it happened that ->show would always be called after ->start or ->next, and a few users chose to release the resource in ->show. This is no longer reliable. Since the mentioned commit, ->next will always come after a successful ->show (to ensure m->index is updated correctly), so the original ordering cannot be maintained. This patch updates the documentation to clearly state the required behaviour. Other patches will fix the few problematic users. [akpm@linux-foundation.org: fix typo, per Willy] Link: https://lkml.kernel.org/r/161248518659.21478.2484341937387294998.stgit@noble1 Link: https://lkml.kernel.org/r/161248539020.21478.3147971477400875336.stgit@noble1 Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface") Signed-off-by: NeilBrown <neilb@suse.de> Cc: Xin Long <lucien.xin@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26fs/coredump: use kmap_local_page()Ira Weiny1-2/+2
In dump_user_range() there is no reason for the mapping to be global. Use kmap_local_page() rather than kmap. Link: https://lkml.kernel.org/r/20210203223328.558945-1-ira.weiny@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26init/Kconfig: fix a typo in CC_VERSION_TEXT help textBhaskar Chowdhury1-1/+1
s/compier/compiler/ Link: https://lkml.kernel.org/r/20210224223325.29099-1-unixbhaskar@gmail.com Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26init: clean up early_param_on_off() macroMasahiro Yamada1-2/+2
Use early_param() to define early_param_on_off(). Link: https://lkml.kernel.org/r/20210201041532.4025025-1-masahiroy@kernel.org Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Johan Hovold <johan@kernel.org> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Joe Perches <joe@perches.com> Cc: Nick Desaulniers <ndesaulniers@gooogle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26init/version.c: remove Version_<LINUX_VERSION_CODE> symbolMasahiro Yamada1-8/+0
This code hunk creates a Version_<LINUX_VERSION_CODE> symbol if CONFIG_KALLSYMS is disabled. For example, building the kernel v5.10 for allnoconfig creates the following symbol: $ nm vmlinux | grep Version_ c116b028 B Version_330240 There is no in-tree user of this symbol. Commit 197dcffc8ba0 ("init/version.c: define version_string only if CONFIG_KALLSYMS is not defined") mentions that Version_* is only used with ksymoops. However, a commit in the pre-git era [1] had added the statement, "ksymoops is useless on 2.6. Please use the Oops in its original format". That statement existed until commit 4eb9241127a0 ("Documentation: admin-guide: update bug-hunting.rst") finally removed the stale ksymoops information. This symbol is no longer needed. [1] https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=ad68b2f085f5c79e4759ca2d13947b3c885ee831 Link: https://lkml.kernel.org/r/20210120033452.2895170-1-masahiroy@kernel.org Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Daniel Guilak <guilak@linux.vnet.ibm.com> Cc: Lee Revell <rlrevell@joe-job.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26checkpatch: do not apply "initialise globals to 0" check to BPF progsSong Liu1-1/+11
BPF programs explicitly initialise global variables to 0 to make sure clang (v10 or older) do not put the variables in the common section. Skip "initialise globals to 0" check for BPF programs to elimiate error messages like: ERROR: do not initialise globals to 0 #19: FILE: samples/bpf/tracex1_kern.c:21: Link: https://lkml.kernel.org/r/20210209211954.490077-1-songliubraving@fb.com Signed-off-by: Song Liu <songliubraving@fb.com> Acked-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26checkpatch: don't warn about colon termination in linker scriptsChris Down1-1/+1
This check erroneously flags cases like the one in my recent printk enumeration patch[0], where the spaces are syntactic, and `section:' vs. `section :' is syntactically important: ERROR: space prohibited before that ':' (ctx:WxW) #258: FILE: include/asm-generic/vmlinux.lds.h:314: + .printk_fmts : AT(ADDR(.printk_fmts) - LOAD_OFFSET) { 0: https://lore.kernel.org/patchwork/patch/1375749/ Link: https://lkml.kernel.org/r/YBwhqsc2TIVeid3t@chrisdown.name Link: https://lkml.kernel.org/r/YB6UsjCOy1qrrlSD@chrisdown.name Signed-off-by: Chris Down <chris@chrisdown.name> Acked-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26checkpatch: add kmalloc_array_node to unnecessary OOM message checkJoe Perches1-1/+1
commit 5799b255c491 ("include/linux/slab.h: add kmalloc_array_node() and kcalloc_node()") was added in 2017. Update the unnecessary OOM message test to include it. Link: https://lkml.kernel.org/r/b9dc4a808b1518e08ab8761480d9872e5d18e7cd.camel@perches.com Signed-off-by: Joe Perches <joe@perches.com> Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26checkpatch: add warning for avoiding .L prefix symbols in assembly filesAditya Srivastava1-0/+7
objtool requires that all code must be contained in an ELF symbol. Symbol names that have a '.L' prefix do not emit symbol table entries, as they have special meaning for the assembler. '.L' prefixed symbols can be used within a code region, but should be avoided for denoting a range of code via 'SYM_*_START/END' annotations. Add a new check to emit a warning on finding the usage of '.L' symbols for '.S' files, if it denotes range of code via SYM_*_START/END annotation pair. Link: https://lkml.kernel.org/r/20210123190459.9701-1-yashsri421@gmail.com Link: https://lore.kernel.org/lkml/20210112210154.GI4646@sirena.org.uk Signed-off-by: Aditya Srivastava <yashsri421@gmail.com> Suggested-by: Mark Brown <broonie@kernel.org> Acked-by: Joe Perches <joe@perches.com> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Cc: Aditya Srivastava <yashsri421@gmail.com> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: Dwaipayan Ray <dwaipayanray1@gmail.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26checkpatch: improve TYPECAST_INT_CONSTANT test messageJoe Perches1-10/+10
Improve the TYPECAST_INT_CONSTANT test by showing the suggested conversion for various type of uses like (unsigned int)1 to 1U. Link: https://lkml.kernel.org/r/ecefe8dcb93fe7028311b69dd297ba52224233d4.camel@perches.com Signed-off-by: Joe Perches <joe@perches.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26checkpatch: prefer ftrace over function entry/exit printksJoe Perches1-0/+35
Prefer using ftrace over function entry/exit logging messages. Warn with various function entry/exit only logging that only use __func__ with or without descriptive decoration. Link: https://lkml.kernel.org/r/47c01081533a417c99c9a80a4cd537f8c308503f.camel@perches.com Signed-off-by: Joe Perches <joe@perches.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26checkpatch: trivial style fixesDwaipayan Ray1-9/+9
Indentations should use tabs wherever possible. Replace spaces by tabs for indents. Link: https://lkml.kernel.org/r/20210105103044.40282-1-dwaipayanray1@gmail.com Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com> Acked-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26checkpatch: ignore warning designated initializers using NR_CPUSPeng Wang1-1/+3
Some max_length wants to hold as large room as possible to ensure enough size to tackle with the biggest NR_CPUS. An example below: kernel/cgroup/cpuset.c: static struct cftype legacy_files[] = { { .name = "cpus", .seq_show = cpuset_common_seq_show, .write = cpuset_write_resmask, .max_write_len = (100U + 6 * NR_CPUS), .private = FILE_CPULIST, }, ... } Link: https://lkml.kernel.org/r/5d4998aa8a8ac7efada2c7daffa9e73559f8b186.1609331255.git.rocking@linux.alibaba.com Signed-off-by: Peng Wang <rocking@linux.alibaba.com> Acked-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26checkpatch: improve blank line after declaration testJoe Perches1-23/+29
Avoid multiple false positives by ignoring attributes. Various attributes like volatile and ____cacheline_aligned_in_smp cause checkpatch to emit invalid "Missing a blank line after declarations" messages. Use copies of $sline and $prevline, remove $Attribute and $Sparse, and use the existing tests to avoid these false positives. Miscellanea: o Add volatile to $Attribute This also reduces checkpatch runtime a bit by moving the indentation comparison test to the start of the block to avoid multiple unnecessary regex tests. Link: https://lkml.kernel.org/r/9015fd00742bf4e5b824ad6d7fd7189530958548.camel@perches.com Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26include/linux/bitops.h: spelling s/synomyn/synonym/Geert Uytterhoeven1-1/+1
Fix a misspelling of "synonym". Link: https://lkml.kernel.org/r/20210108105305.2028120-1-geert+renesas@glider.be Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26lib/cmdline: remove an unneeded local variable in next_arg()Masahiro Yamada1-4/+3
The local variable 'next' is unneeded because you can simply advance the existing pointer 'args'. Link: https://lkml.kernel.org/r/20210201014707.3828753-1-masahiroy@kernel.org Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26lib: stackdepot: fix ignoring return value warningVijayanand Jitta1-2/+4
Fix the below ignoring return value warning for kstrtobool in is_stack_depot_disabled function. lib/stackdepot.c: In function 'is_stack_depot_disabled': lib/stackdepot.c:154:2: warning: ignoring return value of 'kstrtobool' declared with attribute 'warn_unused_result' [-Wunused-result] Link: https://lkml.kernel.org/r/1612163048-28026-1-git-send-email-vjitta@codeaurora.org Fixes: b9779abb09a8 ("lib: stackdepot: add support to disable stack depot") Signed-off-by: Vijayanand Jitta <vjitta@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26lib: stackdepot: add support to disable stack depotVijayanand Jitta4-4/+45
Add a kernel parameter stack_depot_disable to disable stack depot. So that stack hash table doesn't consume any memory when stack depot is disabled. The use case is CONFIG_PAGE_OWNER without page_owner=on. Without this patch, stackdepot will consume the memory for the hashtable. By default, it's 8M which is never trivial. With this option, in CONFIG_PAGE_OWNER configured system, page_owner=off, stack_depot_disable in kernel command line, we could save the wasted memory for the hashtable. [akpm@linux-foundation.org: fix CONFIG_STACKDEPOT=n build] Link: https://lkml.kernel.org/r/1611749198-24316-2-git-send-email-vjitta@codeaurora.org Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Vijayanand Jitta <vjitta@codeaurora.org> Cc: Alexander Potapenko <glider@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Yogesh Lal <ylal@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26lib: stackdepot: add support to configure STACK_HASH_SIZEYogesh Lal2-2/+10
Use CONFIG_STACK_HASH_ORDER to configure STACK_HASH_SIZE. Aim is to have configurable value for STACK_HASH_SIZE, so depend on use case one can configure it. One example is of Page Owner, CONFIG_PAGE_OWNER works only if page_owner=on via kernel parameter on CONFIG_PAGE_OWNER configured system. Thus, unless admin enable it via command line option, the stackdepot will just waste 8M memory without any customer. Making it configurable and use lower value helps to enable features like CONFIG_PAGE_OWNER without any significant overhead. Link: https://lkml.kernel.org/r/1611749198-24316-1-git-send-email-vjitta@codeaurora.org Signed-off-by: Yogesh Lal <ylal@codeaurora.org> Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Vijayanand Jitta <vjitta@codeaurora.org> Reviewed-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Alexander Potapenko <glider@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26string.h: move fortified functions definitions in a dedicated header.Francis Laniel2-281/+303
This patch adds fortify-string.h to contain fortified functions definitions. Thus, the code is more separated and compile time is approximately 1% faster for people who do not set CONFIG_FORTIFY_SOURCE. Link: https://lkml.kernel.org/r/20210111092141.22946-1-laniel_francis@privacyrequired.com Link: https://lkml.kernel.org/r/20210111092141.22946-2-laniel_francis@privacyrequired.com Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26lib/genalloc.c: change return type to unsigned long for bitmap_set_llHuang Shijie1-1/+2
Just as bitmap_clear_ll(), change return type to unsigned long for bitmap_set_ll to avoid the possible overflow in future. Link: https://lkml.kernel.org/r/20210105031644.2771-1-sjhuang@iluvatar.ai Signed-off-by: Huang Shijie <sjhuang@iluvatar.ai> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26MAINTAINERS: add uapi directories to API/ABI sectionVlastimil Babka1-0/+2
Let's add include/uapi/ and arch/*/include/uapi/ to API/ABI section, so that for patches modifying them, get_maintainers.pl suggests CCing linux-api@ so people don't forget. Link: https://lkml.kernel.org/r/20210217174745.13591-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: David Hildenbrand <david@redhat.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26kernel: delete repeated words in commentsRandy Dunlap7-11/+11
Drop repeated words in kernel/events/. {if, the, that, with, time} Drop repeated words in kernel/locking/. {it, no, the} Drop repeated words in kernel/sched/. {in, not} Link: https://lkml.kernel.org/r/20210127023412.26292-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Will Deacon <will@kernel.org> [kernel/locking/] Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26groups: simplify struct group_info allocationHubert Jasudowicz1-6/+1
Combine kmalloc and vmalloc into a single call. Use struct_size macro instead of direct size calculation. Link: https://lkml.kernel.org/r/ba9ba5beea9a44b7196c41a0d9528abd5f20dd2e.1611620846.git.hubert.jasudowicz@gmail.com Signed-off-by: Hubert Jasudowicz <hubert.jasudowicz@gmail.com> Cc: Gao Xiang <xiang@kernel.org> Cc: Micah Morton <mortonm@chromium.org> Cc: Michael Kelley <mikelley@microsoft.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Thomas Cedeno <thomascedeno@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26groups: use flexible-array member in struct group_infoHubert Jasudowicz1-1/+1
Replace zero-size array with flexible array member, as recommended by the docs. Link: https://lkml.kernel.org/r/155995eed35c3c1bdcc56e69d8997c8e4c46740a.1611620846.git.hubert.jasudowicz@gmail.com Signed-off-by: Hubert Jasudowicz <hubert.jasudowicz@gmail.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Micah Morton <mortonm@chromium.org> Cc: Gao Xiang <xiang@kernel.org> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Thomas Cedeno <thomascedeno@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26treewide: Miguel has movedMiguel Ojeda11-23/+21
Update contact info. Link: https://lkml.kernel.org/r/20210206162524.GA11520@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26include/linux: remove repeated wordsRandy Dunlap4-4/+4
Drop the doubled word "for" in a comment. {firewire-cdev.h} Drop the doubled word "in" in a comment. {input.h} Drop the doubled word "a" in a comment. {mdev.h} Drop the doubled word "the" in a comment. {ptrace.h} Link: https://lkml.kernel.org/r/20210126232444.22861-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Kirti Wankhede <kwankhede@nvidia.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26sysctl.c: fix underflow value setting risk in vm_tableLin Feng1-4/+4
Apart from subsystem specific .proc_handler handler, all ctl_tables with extra1 and extra2 members set should use proc_dointvec_minmax instead of proc_dointvec, or the limit set in extra* never work and potentially echo underflow values(negative numbers) is likely make system unstable. Especially vfs_cache_pressure and zone_reclaim_mode, -1 is apparently not a valid value, but we can set to them. And then kernel may crash. # echo -1 > /proc/sys/vm/vfs_cache_pressure Link: https://lkml.kernel.org/r/20201223105535.2875-1-linf@wangsu.com Signed-off-by: Lin Feng <linf@wangsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26proc: use kvzalloc for our kernel bufferJosef Bacik1-2/+2
Since sysctl: pass kernel pointers to ->proc_handler we have been pre-allocating a buffer to copy the data from the proc handlers into, and then copying that to userspace. The problem is this just blindly kzalloc()'s the buffer size passed in from the read, which in the case of our 'cat' binary was 64kib. Order-4 allocations are not awesome, and since we can potentially allocate up to our maximum order, so use kvzalloc for these buffers. [willy@infradead.org: changelog tweaks] Link: https://lkml.kernel.org/r/6345270a2c1160b89dd5e6715461f388176899d1.1612972413.git.josef@toxicpanda.com Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> CC: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26proc/wchan: use printk format instead of lookup_symbol_name()Helge Deller1-11/+8
To resolve the symbol fuction name for wchan, use the printk format specifier %ps instead of manually looking up the symbol function name via lookup_symbol_name(). Link: https://lkml.kernel.org/r/20201217165413.GA1959@ls3530.fritz.box Signed-off-by: Helge Deller <deller@gmx.de> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26alpha: remove CONFIG_EXPERIMENTAL from defconfigsRandy Dunlap1-1/+0
Since CONFIG_EXPERIMENTAL was removed in 2013, go ahead and drop it from any defconfig files. Link: https://lkml.kernel.org/r/20210115005956.29408-1-rdunlap@infradead.org Fixes: 3d374d09f16f ("final removal of CONFIG_EXPERIMENTAL") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Kees Cook <keescook@chromium.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26kasan: clarify that only first bug is reported in HW_TAGSAndrey Konovalov2-3/+7
Hwardware tag-based KASAN only reports the first found bug. After that MTE tag checking gets disabled. Clarify this in comments and documentation. Link: https://lkml.kernel.org/r/00383ba88a47c3f8342d12263c24bdf95527b07d.1612546384.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26kasan: inline HW_TAGS helper functionsAndrey Konovalov1-6/+7
Mark all static functions in common.c and kasan.h that are used for hardware tag-based KASAN as inline to avoid unnecessary function calls. Link: https://lkml.kernel.org/r/2c94a2af0657f2b95b9337232339ff5ffa643ab5.1612546384.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26arm64: kasan: simplify and inline MTE functionsAndrey Konovalov7-73/+60
This change provides a simpler implementation of mte_get_mem_tag(), mte_get_random_tag(), and mte_set_mem_tag_range(). Simplifications include removing system_supports_mte() checks as these functions are onlye called from KASAN runtime that had already checked system_supports_mte(). Besides that, size and address alignment checks are removed from mte_set_mem_tag_range(), as KASAN now does those. This change also moves these functions into the asm/mte-kasan.h header and implements mte_set_mem_tag_range() via inline assembly to avoid unnecessary functions calls. [vincenzo.frascino@arm.com: fix warning in mte_get_random_tag()] Link: https://lkml.kernel.org/r/20210211152208.23811-1-vincenzo.frascino@arm.com Link: https://lkml.kernel.org/r/a26121b294fdf76e369cb7a74351d1c03a908930.1612546384.git.andreyknvl@google.com Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Marco Elver <elver@google.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26kasan: ensure poisoning size alignmentAndrey Konovalov3-31/+48
A previous changes d99f6a10c161 ("kasan: don't round_up too much") attempted to simplify the code by adding a round_up(size) call into kasan_poison(). While this allows to have less round_up() calls around the code, this results in round_up() being called multiple times. This patch removes round_up() of size from kasan_poison() and ensures that all callers round_up() the size explicitly. This patch also adds WARN_ON() alignment checks for address and size to kasan_poison() and kasan_unpoison(). Link: https://lkml.kernel.org/r/3ffe8d4a246ae67a8b5e91f65bf98cd7cba9d7b9.1612546384.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26kasan, mm: optimize krealloc poisoningAndrey Konovalov2-8/+24
Currently, krealloc() always calls ksize(), which unpoisons the whole object including the redzone. This is inefficient, as kasan_krealloc() repoisons the redzone for objects that fit into the same buffer. This patch changes krealloc() instrumentation to use uninstrumented __ksize() that doesn't unpoison the memory. Instead, kasan_kreallos() is changed to unpoison the memory excluding the redzone. For objects that don't fit into the old allocation, this patch disables KASAN accessibility checks when copying memory into a new object instead of unpoisoning it. Link: https://lkml.kernel.org/r/9bef90327c9cb109d736c40115684fd32f49e6b0.1612546384.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26kasan, mm: fail krealloc on freed objectsAndrey Konovalov2-0/+23
Currently, if krealloc() is called on a freed object with KASAN enabled, it allocates and returns a new object, but doesn't copy any memory from the old one as ksize() returns 0. This makes the caller believe that krealloc() succeeded (KASAN report is printed though). This patch adds an accessibility check into __do_krealloc(). If the check fails, krealloc() returns NULL. This check duplicates the one in ksize(); this is fixed in the following patch. This patch also adds a KASAN-KUnit test to check krealloc() behaviour when it's called on a freed object. Link: https://lkml.kernel.org/r/cbcf7b02be0a1ca11de4f833f2ff0b3f2c9b00c8.1612546384.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26kasan: rework krealloc testsAndrey Konovalov1-10/+81
This patch reworks KASAN-KUnit tests for krealloc() to: 1. Check both slab and page_alloc based krealloc() implementations. 2. Allow at least one full granule to fit between old and new sizes for each KASAN mode, and check accesses to that granule accordingly. Link: https://lkml.kernel.org/r/c707f128a2bb9f2f05185d1eb52192cf179cf4fa.1612546384.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26kasan: unify large kfree checksAndrey Konovalov2-18/+34
Unify checks in kasan_kfree_large() and in kasan_slab_free_mempool() for large allocations as it's done for small kfree() allocations. With this change, kasan_slab_free_mempool() starts checking that the first byte of the memory that's being freed is accessible. Link: https://lkml.kernel.org/r/14ffc4cd867e0b1ed58f7527e3b748a1b4ad08aa.1612546384.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26kasan: clean up setting free info in kasan_slab_freeAndrey Konovalov1-4/+2
Put kasan_stack_collection_enabled() check and kasan_set_free_info() calls next to each other. The way this was previously implemented was a minor optimization that relied of the the fact that kasan_stack_collection_enabled() is always true for generic KASAN. The confusion that this brings outweights saving a few instructions. Link: https://lkml.kernel.org/r/f838e249be5ab5810bf54a36ef5072cfd80e2da7.1612546384.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26kasan: optimize large kmalloc poisoningAndrey Konovalov1-5/+15
Similarly to kasan_kmalloc(), kasan_kmalloc_large() doesn't need to unpoison the object as it as already unpoisoned by alloc_pages() (or by ksize() for krealloc()). This patch changes kasan_kmalloc_large() to only poison the redzone. Link: https://lkml.kernel.org/r/33dee5aac0e550ad7f8e26f590c9b02c6129b4a3.1612546384.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26kasan, mm: optimize kmalloc poisoningAndrey Konovalov4-48/+119
For allocations from kmalloc caches, kasan_kmalloc() always follows kasan_slab_alloc(). Currenly, both of them unpoison the whole object, which is unnecessary. This patch provides separate implementations for both annotations: kasan_slab_alloc() unpoisons the whole object, and kasan_kmalloc() only poisons the redzone. For generic KASAN, the redzone start might not be aligned to KASAN_GRANULE_SIZE. Therefore, the poisoning is split in two parts: kasan_poison_last_granule() poisons the unaligned part, and then kasan_poison() poisons the rest. This patch also clarifies alignment guarantees of each of the poisoning functions and drops the unnecessary round_up() call for redzone_end. With this change, the early SLUB cache annotation needs to be changed to kasan_slab_alloc(), as kasan_kmalloc() doesn't unpoison objects now. The number of poisoned bytes for objects in this cache stays the same, as kmem_cache_node->object_size is equal to sizeof(struct kmem_cache_node). Link: https://lkml.kernel.org/r/7e3961cb52be380bc412860332063f5f7ce10d13.1612546384.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26kasan, mm: don't save alloc stacks twiceAndrey Konovalov3-4/+24
Patch series "kasan: optimizations and fixes for HW_TAGS", v4. This patchset makes the HW_TAGS mode more efficient, mostly by reworking poisoning approaches and simplifying/inlining some internal helpers. With this change, the overhead of HW_TAGS annotations excluding setting and checking memory tags is ~3%. The performance impact caused by tags will be unknown until we have hardware that supports MTE. As a side-effect, this patchset speeds up generic KASAN by ~15%. This patch (of 13): Currently KASAN saves allocation stacks in both kasan_slab_alloc() and kasan_kmalloc() annotations. This patch changes KASAN to save allocation stacks for slab objects from kmalloc caches in kasan_kmalloc() only, and stacks for other slab objects in kasan_slab_alloc() only. This change requires ____kasan_kmalloc() knowing whether the object belongs to a kmalloc cache. This is implemented by adding a flag field to the kasan_info structure. That flag is only set for kmalloc caches via a new kasan_cache_create_kmalloc() annotation. Link: https://lkml.kernel.org/r/cover.1612546384.git.andreyknvl@google.com Link: https://lkml.kernel.org/r/7c673ebca8d00f40a7ad6f04ab9a2bddeeae2097.1612546384.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26kasan: use error_report_end tracepointAlexander Potapenko1-3/+5
Make it possible to trace KASAN error reporting. A good usecase is watching for trace events from the userspace to detect and process memory corruption reports from the kernel. Link: https://lkml.kernel.org/r/20210121131915.1331302-4-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Suggested-by: Marco Elver <elver@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26kfence: use error_report_end tracepointAlexander Potapenko1-0/+2
Make it possible to trace KFENCE error reporting. A good usecase is watching for trace events from the userspace to detect and process memory corruption reports from the kernel. Link: https://lkml.kernel.org/r/20210121131915.1331302-3-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Suggested-by: Marco Elver <elver@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26tracing: add error_report_end trace pointAlexander Potapenko3-0/+86
Patch series "Add error_report_end tracepoint to KFENCE and KASAN", v3. This patchset adds a tracepoint, error_repor_end, that is to be used by KFENCE, KASAN, and potentially other bug detection tools, when they print an error report. One of the possible use cases is userspace collection of kernel error reports: interested parties can subscribe to the tracing event via tracefs, and get notified when an error report occurs. This patch (of 3): Introduce error_report_end tracepoint. It can be used in debugging tools like KASAN, KFENCE, etc. to provide extensions to the error reporting mechanisms (e.g. allow tests hook into error reporting, ease error report collection from production kernels). Another benefit would be making use of ftrace for debugging or benchmarking the tools themselves. Should we need it, the tracepoint name leaves us with the possibility to introduce a complementary error_report_start tracepoint in the future. Link: https://lkml.kernel.org/r/20210121131915.1331302-1-glider@google.com Link: https://lkml.kernel.org/r/20210121131915.1331302-2-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Suggested-by: Marco Elver <elver@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26kfence: report sensitive information based on no_hash_pointersMarco Elver5-27/+18
We cannot rely on CONFIG_DEBUG_KERNEL to decide if we're running a "debug kernel" where we can safely show potentially sensitive information in the kernel log. Instead, simply rely on the newly introduced "no_hash_pointers" to print unhashed kernel pointers, as well as decide if our reports can include other potentially sensitive information such as registers and corrupted bytes. Link: https://lkml.kernel.org/r/20210223082043.1972742-1-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Cc: Timur Tabi <timur@kernel.org> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26MAINTAINERS: add entry for KFENCEMarco Elver1-0/+12
Add entry for KFENCE maintainers. Link: https://lkml.kernel.org/r/20201103175841.3495947-10-elver@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: SeongJae Park <sjpark@amazon.de> Co-developed-by: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christopher Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hillf Danton <hdanton@sina.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joern Engel <joern@purestorage.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26kfence: add test suiteMarco Elver10-25/+915
Add KFENCE test suite, testing various error detection scenarios. Makes use of KUnit for test organization. Since KFENCE's interface to obtain error reports is via the console, the test verifies that KFENCE outputs expected reports to the console. [elver@google.com: fix typo in test] Link: https://lkml.kernel.org/r/X9lHQExmHGvETxY4@elver.google.com [elver@google.com: show access type in report] Link: https://lkml.kernel.org/r/20210111091544.3287013-2-elver@google.com Link: https://lkml.kernel.org/r/20201103175841.3495947-9-elver@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Co-developed-by: Alexander Potapenko <glider@google.com> Reviewed-by: Jann Horn <jannh@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christopher Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hillf Danton <hdanton@sina.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joern Engel <joern@purestorage.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: SeongJae Park <sjpark@amazon.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26kfence, Documentation: add KFENCE documentationMarco Elver3-0/+301
Add KFENCE documentation in dev-tools/kfence.rst, and add to index. [elver@google.com: add missing copyright header to documentation] Link: https://lkml.kernel.org/r/20210118092159.145934-4-elver@google.com Link: https://lkml.kernel.org/r/20201103175841.3495947-8-elver@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Co-developed-by: Alexander Potapenko <glider@google.com> Reviewed-by: Jann Horn <jannh@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christopher Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hillf Danton <hdanton@sina.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joern Engel <joern@purestorage.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: SeongJae Park <sjpark@amazon.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26kfence, kasan: make KFENCE compatible with KASANAlexander Potapenko5-5/+40
Make KFENCE compatible with KASAN. Currently this helps test KFENCE itself, where KASAN can catch potential corruptions to KFENCE state, or other corruptions that may be a result of freepointer corruptions in the main allocators. [akpm@linux-foundation.org: merge fixup] [andreyknvl@google.com: untag addresses for KFENCE] Link: https://lkml.kernel.org/r/9dc196006921b191d25d10f6e611316db7da2efc.1611946152.git.andreyknvl@google.com Link: https://lkml.kernel.org/r/20201103175841.3495947-7-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Jann Horn <jannh@google.com> Co-developed-by: Marco Elver <elver@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christopher Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hillf Danton <hdanton@sina.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joern Engel <joern@purestorage.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: SeongJae Park <sjpark@amazon.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm, kfence: insert KFENCE hooks for SLUBAlexander Potapenko3-14/+51
Inserts KFENCE hooks into the SLUB allocator. To pass the originally requested size to KFENCE, add an argument 'orig_size' to slab_alloc*(). The additional argument is required to preserve the requested original size for kmalloc() allocations, which uses size classes (e.g. an allocation of 272 bytes will return an object of size 512). Therefore, kmem_cache::size does not represent the kmalloc-caller's requested size, and we must introduce the argument 'orig_size' to propagate the originally requested size to KFENCE. Without the originally requested size, we would not be able to detect out-of-bounds accesses for objects placed at the end of a KFENCE object page if that object is not equal to the kmalloc-size class it was bucketed into. When KFENCE is disabled, there is no additional overhead, since slab_alloc*() functions are __always_inline. Link: https://lkml.kernel.org/r/20201103175841.3495947-6-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Alexander Potapenko <glider@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Jann Horn <jannh@google.com> Co-developed-by: Marco Elver <elver@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christopher Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hillf Danton <hdanton@sina.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joern Engel <joern@purestorage.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: SeongJae Park <sjpark@amazon.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm, kfence: insert KFENCE hooks for SLABAlexander Potapenko4-10/+38
Inserts KFENCE hooks into the SLAB allocator. To pass the originally requested size to KFENCE, add an argument 'orig_size' to slab_alloc*(). The additional argument is required to preserve the requested original size for kmalloc() allocations, which uses size classes (e.g. an allocation of 272 bytes will return an object of size 512). Therefore, kmem_cache::size does not represent the kmalloc-caller's requested size, and we must introduce the argument 'orig_size' to propagate the originally requested size to KFENCE. Without the originally requested size, we would not be able to detect out-of-bounds accesses for objects placed at the end of a KFENCE object page if that object is not equal to the kmalloc-size class it was bucketed into. When KFENCE is disabled, there is no additional overhead, since slab_alloc*() functions are __always_inline. Link: https://lkml.kernel.org/r/20201103175841.3495947-5-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Alexander Potapenko <glider@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Co-developed-by: Marco Elver <elver@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hillf Danton <hdanton@sina.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joern Engel <joern@purestorage.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: SeongJae Park <sjpark@amazon.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26kfence: use pt_regs to generate stack trace on faultsMarco Elver8-46/+48
Instead of removing the fault handling portion of the stack trace based on the fault handler's name, just use struct pt_regs directly. Change kfence_handle_page_fault() to take a struct pt_regs, and plumb it through to kfence_report_error() for out-of-bounds, use-after-free, or invalid access errors, where pt_regs is used to generate the stack trace. If the kernel is a DEBUG_KERNEL, also show registers for more information. Link: https://lkml.kernel.org/r/20201105092133.2075331-1-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Suggested-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26arm64, kfence: enable KFENCE for ARM64Marco Elver4-1/+36
Add architecture specific implementation details for KFENCE and enable KFENCE for the arm64 architecture. In particular, this implements the required interface in <asm/kfence.h>. KFENCE requires that attributes for pages from its memory pool can individually be set. Therefore, force the entire linear map to be mapped at page granularity. Doing so may result in extra memory allocated for page tables in case rodata=full is not set; however, currently CONFIG_RODATA_FULL_DEFAULT_ENABLED=y is the default, and the common case is therefore not affected by this change. [elver@google.com: add missing copyright and description header] Link: https://lkml.kernel.org/r/20210118092159.145934-3-elver@google.com Link: https://lkml.kernel.org/r/20201103175841.3495947-4-elver@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Co-developed-by: Alexander Potapenko <glider@google.com> Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christopher Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hillf Danton <hdanton@sina.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joern Engel <joern@purestorage.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: SeongJae Park <sjpark@amazon.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26x86, kfence: enable KFENCE for x86Alexander Potapenko3-0/+76
Add architecture specific implementation details for KFENCE and enable KFENCE for the x86 architecture. In particular, this implements the required interface in <asm/kfence.h> for setting up the pool and providing helper functions for protecting and unprotecting pages. For x86, we need to ensure that the pool uses 4K pages, which is done using the set_memory_4k() helper function. [elver@google.com: add missing copyright and description header] Link: https://lkml.kernel.org/r/20210118092159.145934-2-elver@google.com Link: https://lkml.kernel.org/r/20201103175841.3495947-3-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Alexander Potapenko <glider@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Co-developed-by: Marco Elver <elver@google.com> Reviewed-by: Jann Horn <jannh@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christopher Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hillf Danton <hdanton@sina.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joern Engel <joern@purestorage.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: SeongJae Park <sjpark@amazon.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: add Kernel Electric-Fence infrastructureAlexander Potapenko9-0/+1484
Patch series "KFENCE: A low-overhead sampling-based memory safety error detector", v7. This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a low-overhead sampling-based memory safety error detector of heap use-after-free, invalid-free, and out-of-bounds access errors. This series enables KFENCE for the x86 and arm64 architectures, and adds KFENCE hooks to the SLAB and SLUB allocators. KFENCE is designed to be enabled in production kernels, and has near zero performance overhead. Compared to KASAN, KFENCE trades performance for precision. The main motivation behind KFENCE's design, is that with enough total uptime KFENCE will detect bugs in code paths not typically exercised by non-production test workloads. One way to quickly achieve a large enough total uptime is when the tool is deployed across a large fleet of machines. KFENCE objects each reside on a dedicated page, at either the left or right page boundaries. The pages to the left and right of the object page are "guard pages", whose attributes are changed to a protected state, and cause page faults on any attempted access to them. Such page faults are then intercepted by KFENCE, which handles the fault gracefully by reporting a memory access error. Guarded allocations are set up based on a sample interval (can be set via kfence.sample_interval). After expiration of the sample interval, the next allocation through the main allocator (SLAB or SLUB) returns a guarded allocation from the KFENCE object pool. At this point, the timer is reset, and the next allocation is set up after the expiration of the interval. To enable/disable a KFENCE allocation through the main allocator's fast-path without overhead, KFENCE relies on static branches via the static keys infrastructure. The static branch is toggled to redirect the allocation to KFENCE. The KFENCE memory pool is of fixed size, and if the pool is exhausted no further KFENCE allocations occur. The default config is conservative with only 255 objects, resulting in a pool size of 2 MiB (with 4 KiB pages). We have verified by running synthetic benchmarks (sysbench I/O, hackbench) and production server-workload benchmarks that a kernel with KFENCE (using sample intervals 100-500ms) is performance-neutral compared to a non-KFENCE baseline kernel. KFENCE is inspired by GWP-ASan [1], a userspace tool with similar properties. The name "KFENCE" is a homage to the Electric Fence Malloc Debugger [2]. For more details, see Documentation/dev-tools/kfence.rst added in the series -- also viewable here: https://raw.githubusercontent.com/google/kasan/kfence/Documentation/dev-tools/kfence.rst [1] http://llvm.org/docs/GwpAsan.html [2] https://linux.die.net/man/3/efence This patch (of 9): This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a low-overhead sampling-based memory safety error detector of heap use-after-free, invalid-free, and out-of-bounds access errors. KFENCE is designed to be enabled in production kernels, and has near zero performance overhead. Compared to KASAN, KFENCE trades performance for precision. The main motivation behind KFENCE's design, is that with enough total uptime KFENCE will detect bugs in code paths not typically exercised by non-production test workloads. One way to quickly achieve a large enough total uptime is when the tool is deployed across a large fleet of machines. KFENCE objects each reside on a dedicated page, at either the left or right page boundaries. The pages to the left and right of the object page are "guard pages", whose attributes are changed to a protected state, and cause page faults on any attempted access to them. Such page faults are then intercepted by KFENCE, which handles the fault gracefully by reporting a memory access error. To detect out-of-bounds writes to memory within the object's page itself, KFENCE also uses pattern-based redzones. The following figure illustrates the page layout: ---+-----------+-----------+-----------+-----------+-----------+--- | xxxxxxxxx | O : | xxxxxxxxx | : O | xxxxxxxxx | | xxxxxxxxx | B : | xxxxxxxxx | : B | xxxxxxxxx | | x GUARD x | J : RED- | x GUARD x | RED- : J | x GUARD x | | xxxxxxxxx | E : ZONE | xxxxxxxxx | ZONE : E | xxxxxxxxx | | xxxxxxxxx | C : | xxxxxxxxx | : C | xxxxxxxxx | | xxxxxxxxx | T : | xxxxxxxxx | : T | xxxxxxxxx | ---+-----------+-----------+-----------+-----------+-----------+--- Guarded allocations are set up based on a sample interval (can be set via kfence.sample_interval). After expiration of the sample interval, a guarded allocation from the KFENCE object pool is returned to the main allocator (SLAB or SLUB). At this point, the timer is reset, and the next allocation is set up after the expiration of the interval. To enable/disable a KFENCE allocation through the main allocator's fast-path without overhead, KFENCE relies on static branches via the static keys infrastructure. The static branch is toggled to redirect the allocation to KFENCE. To date, we have verified by running synthetic benchmarks (sysbench I/O, hackbench) that a kernel compiled with KFENCE is performance-neutral compared to the non-KFENCE baseline. For more details, see Documentation/dev-tools/kfence.rst (added later in the series). [elver@google.com: fix parameter description for kfence_object_start()] Link: https://lkml.kernel.org/r/20201106092149.GA2851373@elver.google.com [elver@google.com: avoid stalling work queue task without allocations] Link: https://lkml.kernel.org/r/CADYN=9J0DQhizAGB0-jz4HOBBh+05kMBXb4c0cXMS7Qi5NAJiw@mail.gmail.com Link: https://lkml.kernel.org/r/20201110135320.3309507-1-elver@google.com [elver@google.com: fix potential deadlock due to wake_up()] Link: https://lkml.kernel.org/r/000000000000c0645805b7f982e4@google.com Link: https://lkml.kernel.org/r/20210104130749.1768991-1-elver@google.com [elver@google.com: add option to use KFENCE without static keys] Link: https://lkml.kernel.org/r/20210111091544.3287013-1-elver@google.com [elver@google.com: add missing copyright and description headers] Link: https://lkml.kernel.org/r/20210118092159.145934-1-elver@google.com Link: https://lkml.kernel.org/r/20201103175841.3495947-2-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Alexander Potapenko <glider@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: SeongJae Park <sjpark@amazon.de> Co-developed-by: Marco Elver <elver@google.com> Reviewed-by: Jann Horn <jannh@google.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christopher Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hillf Danton <hdanton@sina.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Joern Engel <joern@purestorage.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/early_ioremap.c: use __func__ instead of function nameStephen Zhang1-6/+6
It is better to use __func__ instead of function name. Link: https://lkml.kernel.org/r/1611385587-4209-1-git-send-email-stephenzhangzsd@gmail.com Signed-off-by: Stephen Zhang <stephenzhangzsd@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/backing-dev.c: use might_alloc()Daniel Vetter1-1/+2
Now that my little helper has landed, use it more. On top of the existing check this also uses lockdep through the fs_reclaim annotations. [akpm@linux-foundation.org: include linux/sched/mm.h] Link: https://lkml.kernel.org/r/20210113135009.3606813-2-daniel.vetter@ffwll.ch Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/dmapool: use might_alloc()Daniel Vetter1-1/+2
Now that my little helper has landed, use it more. On top of the existing check this also uses lockdep through the fs_reclaim annotations. Link: https://lkml.kernel.org/r/20210113135009.3606813-1-daniel.vetter@ffwll.ch Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: page-flags.h: Typo fix (It -> If)Guo Ren1-2/+2
The "If" was wrongly spelled as "It". Link: https://lkml.kernel.org/r/1608959036-91409-1-git-send-email-guoren@kernel.org Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Steven Price <steven.price@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/zsmalloc.c: use page_private() to access page->privateMiaohe Lin1-1/+1
It's recommended to use helper macro page_private() to access the private field of page. Use such helper to eliminate direct access. Link: https://lkml.kernel.org/r/20210203091857.20017-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26zsmalloc: account the number of compacted pages correctlyRokudo Yan3-8/+13
There exists multiple path may do zram compaction concurrently. 1. auto-compaction triggered during memory reclaim 2. userspace utils write zram<id>/compaction node So, multiple threads may call zs_shrinker_scan/zs_compact concurrently. But pages_compacted is a per zsmalloc pool variable and modification of the variable is not serialized(through under class->lock). There are two issues here: 1. the pages_compacted may not equal to total number of pages freed(due to concurrently add). 2. zs_shrinker_scan may not return the correct number of pages freed(issued by current shrinker). The fix is simple: 1. account the number of pages freed in zs_compact locally. 2. use actomic variable pages_compacted to accumulate total number. Link: https://lkml.kernel.org/r/20210202122235.26885-1-wu-yan@tcl.com Fixes: 860c707dca155a56 ("zsmalloc: account the number of compacted pages") Signed-off-by: Rokudo Yan <wu-yan@tcl.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/zsmalloc.c: convert to use kmem_cache_zalloc in cache_alloc_zspage()Miaohe Lin1-2/+1
We always memset the zspage allocated via cache_alloc_zspage. So it's more convenient to use kmem_cache_zalloc in cache_alloc_zspage than caller do it manually. Link: https://lkml.kernel.org/r/20210114120032.25885-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: set the sleep_mapped to true for zbud and z3foldTian Tao2-0/+2
zpool driver adds a flag to indicate whether the zpool driver can enter an atomic context after mapping. This patch sets it true for z3fold and zbud. Link: https://lkml.kernel.org/r/1611035683-12732-3-git-send-email-tiantao6@hisilicon.com Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reported-by: Mike Galbraith <efault@gmx.de> Cc: Seth Jennings <sjenning@redhat.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/zswap: add the flag can_sleep_mappedTian Tao3-5/+62
Patch series "Fix the compatibility of zsmalloc and zswap". Patch #1 adds a flag to zpool, then zswap used to determine if zpool drivers such as zbud/z3fold/zsmalloc will enter an atomic context after mapping. The difference between zbud/z3fold and zsmalloc is that zsmalloc requires an atomic context that since its map function holds a preempt-disabled, but zbud/z3fold don't require an atomic context. So patch #2 sets flag sleep_mapped to true indicating that zbud/z3fold can sleep after mapping. zsmalloc didn't support sleep after mapping, so don't set that flag to true. This patch (of 2): Add a flag to zpool, named is "can_sleep_mapped", and have it set true for zbud/z3fold, not set this flag for zsmalloc, so its default value is false. Then zswap could go the current path if the flag is true; and if it's false, copy data from src to a temporary buffer, then unmap the handle, take the mutex, process the buffer instead of src to avoid sleeping function called from atomic context. [natechancellor@gmail.com: add return value in zswap_frontswap_load] Link: https://lkml.kernel.org/r/20210121214804.926843-1-natechancellor@gmail.com [tiantao6@hisilicon.com: fix potential memory leak] Link: https://lkml.kernel.org/r/1611538365-51811-1-git-send-email-tiantao6@hisilicon.com [colin.king@canonical.com: fix potential uninitialized pointer read on tmp] Link: https://lkml.kernel.org/r/20210128141728.639030-1-colin.king@canonical.com [tiantao6@hisilicon.com: fix variable 'entry' is uninitialized when used] Link: https://lkml.kernel.org/r/1611223030-58346-1-git-send-email-tiantao6@hisilicon.comLink: https://lkml.kernel.org/r/1611035683-12732-1-git-send-email-tiantao6@hisilicon.com Link: https://lkml.kernel.org/r/1611035683-12732-2-git-send-email-tiantao6@hisilicon.com Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reported-by: Mike Galbraith <efault@gmx.de> Cc: Barry Song <song.bao.hua@hisilicon.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: zswap: clean up confusing commentRandy Dunlap1-3/+3
Correct wording and change one duplicated word (it) to "it is". Link: https://lkml.kernel.org/r/20201221042848.13980-1-rdunlap@infradead.org Fixes: 0ab0abcf5115 ("mm/zswap: refactor the get/put routines") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Weijie Yang <weijie.yang@samsung.com> Cc: Seth Jennings <sjennings@variantweb.net> Cc: Seth Jennings <sjenning@redhat.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/rmap: fix potential pte_unmap on an not mapped pteMiaohe Lin1-1/+2
For PMD-mapped page (usually THP), pvmw->pte is NULL. For PTE-mapped THP, pvmw->pte is mapped. But for HugeTLB pages, pvmw->pte is not mapped and set to the relevant page table entry. So in page_vma_mapped_walk_done(), we may do pte_unmap() for HugeTLB pte which is not mapped. Fix this by checking pvmw->page against PageHuge before trying to do pte_unmap(). Link: https://lkml.kernel.org/r/20210127093349.39081-1-linmiaohe@huawei.com Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()") Signed-off-by: Hongxiang Lou <louhongxiang@huawei.com> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michel Lespinasse <walken@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Brian Geffon <bgeffon@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/rmap: correct obsolete comment of page_get_anon_vma()Miaohe Lin1-2/+2
Since commit 746b18d421da ("mm: use refcounts for page_lock_anon_vma()"), page_lock_anon_vma() is renamed to page_get_anon_vma() and converted to return a refcount increased anon_vma. But it forgot to change the relevant comment. Link: https://lkml.kernel.org/r/20210203093215.31990-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/rmap: use page_not_mapped in try_to_unmap()Miaohe Lin1-8/+3
page_mapcount_is_zero() calculates accurately how many mappings a hugepage has in order to check against 0 only. This is a waste of cpu time. We can do this via page_not_mapped() to save some possible atomic_read cycles. Remove the function page_mapcount_is_zero() as it's not used anymore and move page_not_mapped() above try_to_unmap() to avoid identifier undeclared compilation error. Link: https://lkml.kernel.org/r/20210130084904.35307-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/rmap: fix obsolete comment in __page_check_anon_rmap()Miaohe Lin1-2/+1
Commit 21333b2b66b8 ("ksm: no debug in page_dup_rmap()") has reverted page_dup_rmap() to an inline atomic_inc of mapcount. So page_dup_rmap() does not call __page_check_anon_rmap() anymore. Link: https://lkml.kernel.org/r/20210128110209.50857-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/rmap: remove unneeded semicolon in page_not_mapped()Miaohe Lin1-1/+1
Remove extra semicolon without any functional change intended. Link: https://lkml.kernel.org/r/20210127093425.39640-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/rmap: correct some obsolete comments of anon_vmaMiaohe Lin1-2/+2
commit 2b575eb64f7a ("mm: convert anon_vma->lock to a mutex") changed spinlock used to serialize access to vma list to mutex. And further, the commit 5a505085f043 ("mm/rmap: Convert the struct anon_vma::mutex to an rwsem") converted the mutex to an rwsem for solving scalability problem. So replace spinlock with rwsem to make comment uptodate. Link: https://lkml.kernel.org/r/20210123072459.25903-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/mlock: stop counting mlocked pages when none vma is foundMiaohe Lin1-1/+1
There will be no vma satisfies addr < vm_end when find_vma() returns NULL. Thus it's meaningless to traverse the vma list below because we can't find any vma to count mlocked pages. Stop counting mlocked pages in this case to save some vma list traversal cycles. Link: https://lkml.kernel.org/r/20210204110705.17586-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26virtio-mem: check against mhp_get_pluggable_range() which memory we can hotplugDavid Hildenbrand1-14/+27
Right now, we only check against MAX_PHYSMEM_BITS - but turns out there are more restrictions of which memory we can actually hotplug, especially om arm64 or s390x once we support them: we might receive something like -E2BIG or -ERANGE from add_memory_driver_managed(), stopping device operation. So, check right when initializing the device which memory we can add, warning the user. Try only adding actually pluggable ranges: in the worst case, no memory provided by our device is pluggable. In the usual case, we expect all device memory to be pluggable, and in corner cases only some memory at the end of the device-managed memory region to not be pluggable. Link: https://lkml.kernel.org/r/1612149902-7867-5-git-send-email-anshuman.khandual@arm.com Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26s390/mm: define arch_get_mappable_range()Anshuman Khandual2-1/+14
This overrides arch_get_mappabble_range() on s390 platform which will be used with recently added generic framework. It modifies the existing range check in vmem_add_mapping() using arch_get_mappable_range(). It also adds a VM_BUG_ON() check that would ensure that mhp_range_allowed() has already been called on the hotplug path. Link: https://lkml.kernel.org/r/1612149902-7867-4-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26arm64/mm: define arch_get_mappable_range()Anshuman Khandual1-8/+7
This overrides arch_get_mappable_range() on arm64 platform which will be used with recently added generic framework. It drops inside_linear_region() and subsequent check in arch_add_memory() which are no longer required. It also adds a VM_BUG_ON() check that would ensure that mhp_range_allowed() has already been called. Link: https://lkml.kernel.org/r/1612149902-7867-3-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/memory_hotplug: prevalidate the address range being added with platformAnshuman Khandual3-20/+76
Patch series "mm/memory_hotplug: Pre-validate the address range with platform", v5. This series adds a mechanism allowing platforms to weigh in and prevalidate incoming address range before proceeding further with the memory hotplug. This helps prevent potential platform errors for the given address range, down the hotplug call chain, which inevitably fails the hotplug itself. This mechanism was suggested by David Hildenbrand during another discussion with respect to a memory hotplug fix on arm64 platform. https://lore.kernel.org/linux-arm-kernel/1600332402-30123-1-git-send-email-anshuman.khandual@arm.com/ This mechanism focuses on the addressibility aspect and not [sub] section alignment aspect. Hence check_hotplug_memory_range() and check_pfn_span() have been left unchanged. This patch (of 4): This introduces mhp_range_allowed() which can be called in various memory hotplug paths to prevalidate the address range which is being added, with the platform. Then mhp_range_allowed() calls mhp_get_pluggable_range() which provides applicable address range depending on whether linear mapping is required or not. For ranges that require linear mapping, it calls a new arch callback arch_get_mappable_range() which the platform can override. So the new callback, in turn provides the platform an opportunity to configure acceptable memory hotplug address ranges in case there are constraints. This mechanism will help prevent platform specific errors deep down during hotplug calls. This drops now redundant check_hotplug_memory_addressable() check in __add_pages() but instead adds a VM_BUG_ON() check which would ensure that the range has been validated with mhp_range_allowed() earlier in the call chain. Besides mhp_get_pluggable_range() also can be used by potential memory hotplug callers to avail the allowed physical range which would go through on a given platform. This does not really add any new range check in generic memory hotplug but instead compensates for lost checks in arch_add_memory() where applicable and check_hotplug_memory_addressable(), with unified mhp_range_allowed(). [akpm@linux-foundation.org: make pagemap_range() return -EINVAL when mhp_range_allowed() fails] Link: https://lkml.kernel.org/r/1612149902-7867-1-git-send-email-anshuman.khandual@arm.com Link: https://lkml.kernel.org/r/1612149902-7867-2-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Suggested-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> # s390 Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26Documentation: sysfs/memory: clarify some memory block device propertiesDavid Hildenbrand2-28/+41
In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory blocks as removable") we changed the output of the "removable" property of memory devices to return "1" if and only if the kernel supports memory offlining. Let's update documentation, stating that the interface is legacy. Also update documentation of the "state" property and "valid_zones" properties. Link: https://lkml.kernel.org/r/20210201181347.13262-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26drivers/base/memory: don't store phys_device in memory blocksDavid Hildenbrand4-22/+15
No need to store the value for each and every memory block, as we can easily query the value at runtime. Reshuffle the members to optimize the memory layout. Also, let's clarify what the interface once was used for and why it's legacy nowadays. "phys_device" was used on s390x in older versions of lsmem[2]/chmem[3], back when they were still part of s390x-tools. They were later replaced by the variants in linux-utils. For example, RHEL6 and RHEL7 contain lsmem/chmem from s390-utils. RHEL8 switched to versions from util-linux on s390x [4]. "phys_device" was added with sysfs support for memory hotplug in commit 3947be1969a9 ("[PATCH] memory hotplug: sysfs and add/remove functions") in 2005. It always returned 0. s390x started returning something != 0 on some setups (if sclp.rzm is set by HW) in 2010 via commit 57b552ba0b2f ("memory hotplug/s390: set phys_device"). For s390x, it allowed for identifying which memory block devices belong to the same storage increment (RZM). Only if all memory block devices comprising a single storage increment were offline, the memory could actually be removed in the hypervisor. Since commit e5d709bb5fb7 ("s390/memory hotplug: provide memory_block_size_bytes() function") in 2013 a memory block device spans at least one storage increment - which is why the interface isn't really helpful/used anymore (except by old lsmem/chmem tools). There were once RFC patches to make use of "phys_device" in ACPI context; however, the underlying problem could be solved using different interfaces [1]. [1] https://patchwork.kernel.org/patch/2163871/ [2] https://github.com/ibm-s390-tools/s390-tools/blob/v2.1.0/zconf/lsmem [3] https://github.com/ibm-s390-tools/s390-tools/blob/v2.1.0/zconf/chmem [4] https://bugzilla.redhat.com/show_bug.cgi?id=1504134 Link: https://lkml.kernel.org/r/20210201181347.13262-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Vaibhav Jain <vaibhav@linux.ibm.com> Cc: Tom Rix <trix@redhat.com> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/memory_hotplug: use helper function zone_end_pfn() to get end_pfnMiaohe Lin1-5/+4
Commit 108bcc96ef70 ("mm: add & use zone_end_pfn() and zone_spans_pfn()") introduced the helper zone_end_pfn() to calculate the zone end pfn. But update_pgdat_span() forgot to use it. Use this helper and rename local variable zone_end_pfn to end_pfn to avoid a naming conflict with the existing zone_end_pfn(). Link: https://lkml.kernel.org/r/20210127093211.37714-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/memory_hotplug: MEMHP_MERGE_RESOURCE -> MHP_MERGE_RESOURCEDavid Hildenbrand5-5/+5
Let's make "MEMHP_MERGE_RESOURCE" consistent with "MHP_NONE", "mhp_t" and "mhp_flags". As discussed recently [1], "mhp" is our internal acronym for memory hotplug now. [1] https://lore.kernel.org/linux-mm/c37de2d0-28a1-4f7d-f944-cfd7d81c334d@redhat.com/ Link: https://lkml.kernel.org/r/20210126115829.10909-1-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Wei Liu <wei.liu@kernel.org> Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/memory_hotplug: rename all existing 'memhp' into 'mhp'Anshuman Khandual3-13/+13
This renames all 'memhp' instances to 'mhp' except for memhp_default_state for being a kernel command line option. This is just a clean up and should not cause a functional change. Let's make it consistent rater than mixing the two prefixes. In preparation for more users of the 'mhp' terminology. Link: https://lkml.kernel.org/r/1611554093-27316-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Suggested-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: fix memory_failure() handling of dax-namespace metadataDan Williams3-0/+27
Given 'struct dev_pagemap' spans both data pages and metadata pages be careful to consult the altmap if present to delineate metadata. In fact the pfn_first() helper already identifies the first valid data pfn, so export that helper for other code paths via pgmap_pfn_valid(). Other usage of get_dev_pagemap() are not a concern because those are operating on known data pfns having been looked up by get_user_pages(). I.e. metadata pfns are never user mapped. Link: https://lkml.kernel.org/r/161058501758.1840162.4239831989762604527.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: 6100e34b2526 ("mm, memory_failure: Teach memory_failure() about dev_pagemap pages") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: teach pfn_to_online_page() about ZONE_DEVICE section collisionsDan Williams2-7/+65
While pfn_to_online_page() is able to determine pfn_valid() at subsection granularity it is not able to reliably determine if a given pfn is also online if the section is mixes ZONE_{NORMAL,MOVABLE} with ZONE_DEVICE. This means that pfn_to_online_page() may return invalid @page objects. For example with a memory map like: 100000000-1fbffffff : System RAM 142000000-143002e16 : Kernel code 143200000-143713fff : Kernel rodata 143800000-143b15b7f : Kernel data 144227000-144ffffff : Kernel bss 1fc000000-2fbffffff : Persistent Memory (legacy) 1fc000000-2fbffffff : namespace0.0 This command: echo 0x1fc000000 > /sys/devices/system/memory/soft_offline_page ...succeeds when it should fail. When it succeeds it touches an uninitialized page and may crash or cause other damage (see dissolve_free_huge_page()). While the memory map above is contrived via the memmap=ss!nn kernel command line option, the collision happens in practice on shipping platforms. The memory controller resources that decode spans of physical address space are a limited resource. One technique platform-firmware uses to conserve those resources is to share a decoder across 2 devices to keep the address range contiguous. Unfortunately the unit of operation of a decoder is 64MiB while the Linux section size is 128MiB. This results in situations where, without subsection hotplug memory mappings with different lifetimes collide into one object that can only express one lifetime. Update move_pfn_range_to_zone() to flag (SECTION_TAINT_ZONE_DEVICE) a section that mixes ZONE_DEVICE pfns with other online pfns. With SECTION_TAINT_ZONE_DEVICE to delineate, pfn_to_online_page() can fall back to a slow-path check for ZONE_DEVICE pfns in an online section. In the fast path online_section() for a full ZONE_DEVICE section returns false. Because the collision case is rare, and for simplicity, the SECTION_TAINT_ZONE_DEVICE flag is never cleared once set. [dan.j.williams@intel.com: fix CONFIG_ZONE_DEVICE=n build] Link: https://lkml.kernel.org/r/CAPcyv4iX+7LAgAeSqx7Zw-Zd=ZV9gBv8Bo7oTbwCOOqJoZ3+Yg@mail.gmail.com Link: https://lkml.kernel.org/r/161058500675.1840162.7887862152161279354.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Michal Hocko <mhocko@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Reported-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: teach pfn_to_online_page() to consider subsection validityDan Williams1-4/+19
pfn_to_online_page is primarily used to filter out offline or fully uninitialized pages. pfn_valid resp. online_section_nr have a coarse per memory section granularity. If a section shared with a partially offline memory (e.g. part of ZONE_DEVICE) then pfn_to_online_page would lead to a false positive on some pfns. Fix this by adding pfn_section_valid check which is subsection aware. [mhocko@kernel.org: changelog rewrite] Link: https://lkml.kernel.org/r/161058500148.1840162.4365921007820501696.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: b13bc35193d9 ("mm/hotplug: invalid PFNs from pfn_to_online_page()") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Qian Cai <cai@lca.pw> Cc: Oscar Salvador <osalvador@suse.de> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: move pfn_to_online_page() out of lineDan Williams2-16/+17
Patch series "mm: Fix pfn_to_online_page() with respect to ZONE_DEVICE", v4. A pfn-walker that uses pfn_to_online_page() may inadvertently translate a pfn as online and in the page allocator, when it is offline managed by a ZONE_DEVICE mapping (details in Patch 3: ("mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions")). The 2 proposals under consideration are teach pfn_to_online_page() to be precise in the presence of mixed-zone sections, or teach the memory-add code to drop the System RAM associated with ZONE_DEVICE collisions. In order to not regress memory capacity by a few 10s to 100s of MiB the approach taken in this set is to add precision to pfn_to_online_page(). In the course of validating pfn_to_online_page() a couple other fixes fell out: 1/ soft_offline_page() fails to drop the reference taken in the madvise(..., MADV_SOFT_OFFLINE) case. 2/ memory_failure() uses get_dev_pagemap() to lookup ZONE_DEVICE pages, however that mapping may contain data pages and metadata raw pfns. Introduce pgmap_pfn_valid() to delineate the 2 types and fail the handling of raw metadata pfns. This patch (of 4); pfn_to_online_page() is already too large to be a macro or an inline function. In anticipation of further logic changes / growth, move it out of line. No functional change, just code movement. Link: https://lkml.kernel.org/r/161058499000.1840162.702316708443239771.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lkml.kernel.org/r/161058499608.1840162.10165648147615238793.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Michal Hocko <mhocko@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/vmstat.c: erase latency in vmstat_shepherdJiang Biao1-0/+2
Many 100us+ latencies have been deteceted in vmstat_shepherd() on CPX platform which has 208 logic cpus. And vmstat_shepherd is queued every second, which could make the case worse. Add schedule point in vmstat_shepherd() to erase the latency. Link: https://lkml.kernel.org/r/20210111035526.1511-1-benbjiang@tencent.com Signed-off-by: Jiang Biao <benbjiang@tencent.com> Reported-by: Bin Lai <robinlai@tencent.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: vmstat: add some comments on internal storage of byte itemsJohannes Weiner2-0/+18
Byte-accounted items are used for slab object accounting at the cgroup level, because the objects in a slab page can belong to different cgroups. At the global level these items always change in multiples of whole slab pages. The vmstat code exploits this and stores these items as pages internally, which allows for more compact per-cpu data. This optimization isn't self-evident from the asserts and the division in the stat update functions. Provide the reader with some context. Link: https://lkml.kernel.org/r/20210202184411.118614-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: vmstat: fix NOHZ wakeups for node stat changesJohannes Weiner1-6/+9
On NOHZ, the periodic vmstat flushers on each CPU can go to sleep and won't wake up until stat changes are detected in the per-cpu deltas of the zone vmstat counters. In commit 75ef71840539 ("mm, vmstat: add infrastructure for per-node vmstats") per-node counters were introduced, and subsequently most stats were moved from the zone to the node level. However, the node counters weren't added to the NOHZ wakeup detection. In theory this can cause per-cpu errors to remain in the user-reported stats indefinitely. In practice this only affects a handful of sub counters (file_mapped, dirty and writeback e.g.) because other page state changes at the node level likely involve a change at the zone level as well (alloc and free, lru ops). Also, nobody has complained. Fix it up for completeness: wake up vmstat refreshing on node changes. Also remove the BUILD_BUG_ONs that assert counter size; we haven't relied on it since we added sizeof() to the range calculation in commit 13c9aaf7fa01 ("mm/vmstat.c: fix NUMA statistics updates"). Link: https://lkml.kernel.org/r/20210202184342.118513-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: cma: print region name on failurePatrick Daly1-2/+2
Print the name of the CMA region for convenience. This is useful information to have when cma_alloc() fails. [pdaly@codeaurora.org: print the "count" variable] Link: https://lkml.kernel.org/r/20210209142414.12768-1-georgi.djakov@linaro.org Link: https://lkml.kernel.org/r/20210208115200.20286-1-georgi.djakov@linaro.org Signed-off-by: Patrick Daly <pdaly@codeaurora.org> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org> Acked-by: Minchan Kim <minchan@kernel.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfoDavid Hildenbrand3-2/+20
Let's count the number of CMA pages per zone and print them in /proc/zoneinfo. Having access to the total number of CMA pages per zone is helpful for debugging purposes to know where exactly the CMA pages ended up, and to figure out how many pages of a zone might behave differently, even after some of these pages might already have been allocated. As one example, CMA pages part of a kernel zone cannot be used for ordinary kernel allocations but instead behave more like ZONE_MOVABLE. For now, we are only able to get the global nr+free cma pages from /proc/meminfo and the free cma pages per zone from /proc/zoneinfo. Example after this patch when booting a 6 GiB QEMU VM with "hugetlb_cma=2G": # cat /proc/zoneinfo | grep cma cma 0 nr_free_cma 0 cma 0 nr_free_cma 0 cma 524288 nr_free_cma 493016 cma 0 cma 0 # cat /proc/meminfo | grep Cma CmaTotal: 2097152 kB CmaFree: 1972064 kB Note: We print even without CONFIG_CMA, just like "nr_free_cma"; this way, one can be sure when spotting "cma 0", that there are definetly no CMA pages located in a zone. [david@redhat.com: v2] Link: https://lkml.kernel.org/r/20210128164533.18566-1-david@redhat.com [david@redhat.com: v3] Link: https://lkml.kernel.org/r/20210129113451.22085-1-david@redhat.com Link: https://lkml.kernel.org/r/20210127101813.6370-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/cma: expose all pages to the buddy if activation of an area failsDavid Hildenbrand1-22/+21
Right now, if activation fails, we might already have exposed some pages to the buddy for CMA use (although they will never get actually used by CMA), and some pages won't be exposed to the buddy at all. Let's check for "single zone" early and on error, don't expose any pages for CMA use - instead, expose them to the buddy available for any use. Simply call free_reserved_page() on every single page - easier than going via free_reserved_area(), converting back and forth between pfns and virt addresses. In addition, make sure to fixup totalcma_pages properly. Example: 6 GiB QEMU VM with "... hugetlb_cma=2G movablecore=20% ...": [ 0.006891] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node [ 0.006893] cma: Reserved 2048 MiB at 0x0000000100000000 [ 0.006893] hugetlb_cma: reserved 2048 MiB on node 0 ... [ 0.175433] cma: CMA area hugetlb0 could not be activated Before this patch: # cat /proc/meminfo MemTotal: 5867348 kB MemFree: 5692808 kB MemAvailable: 5542516 kB ... CmaTotal: 2097152 kB CmaFree: 1884160 kB After this patch: # cat /proc/meminfo MemTotal: 6077308 kB MemFree: 5904208 kB MemAvailable: 5747968 kB ... CmaTotal: 0 kB CmaFree: 0 kB Note: cma_init_reserved_mem() makes sure that we always cover full pageblocks / MAX_ORDER - 1 pages. Link: https://lkml.kernel.org/r/20210127101813.6370-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: cma: allocate cma areas bottom-upRoman Gushchin1-0/+17
Currently cma areas without a fixed base are allocated close to the end of the node. This placement is sub-optimal because of compaction: it brings pages into the cma area. In particular, it can bring in hot executable pages, even if there is a plenty of free memory on the machine. This results in cma allocation failures. Instead let's place cma areas close to the beginning of a node. In this case the compaction will help to free cma areas, resulting in better cma allocation success rates. If there is enough memory let's try to allocate bottom-up starting with 4GB to exclude any possible interference with DMA32. On smaller machines or in a case of a failure, stick with the old behavior. 16GB vm, 2GB cma area: With this patch: [ 0.000000] Command line: root=/dev/vda3 rootflags=subvol=/root systemd.unified_cgroup_hierarchy=1 enforcing=0 console=ttyS0,115200 hugetlb_cma=2G [ 0.002928] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node [ 0.002930] cma: Reserved 2048 MiB at 0x0000000100000000 [ 0.002931] hugetlb_cma: reserved 2048 MiB on node 0 Without this patch: [ 0.000000] Command line: root=/dev/vda3 rootflags=subvol=/root systemd.unified_cgroup_hierarchy=1 enforcing=0 console=ttyS0,115200 hugetlb_cma=2G [ 0.002930] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node [ 0.002933] cma: Reserved 2048 MiB at 0x00000003c0000000 [ 0.002934] hugetlb_cma: reserved 2048 MiB on node 0 v2: - switched to memblock_set_bottom_up(true), by Mike - start with 4GB, by Mike [guro@fb.com: whitespace fix, per Mike] Link: https://lkml.kernel.org/r/20201221170551.GB3428478@carbon.DHCP.thefacebook.com [guro@fb.com: fix 32-bit warnings] Link: https://lkml.kernel.org/r/20201223163537.GA4011967@carbon.DHCP.thefacebook.com [guro@fb.com: fix 32-bit systems] [akpm@linux-foundation.org: build fix] Link: https://lkml.kernel.org/r/20201217201214.3414100-1-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Wonhyuk Yang <vvghjk1234@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Rik van Riel <riel@surriel.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm,shmem,thp: limit shmem THP allocations to requested zonesRik van Riel1-1/+5
Hugh pointed out that the gma500 driver uses shmem pages, but needs to limit them to the DMA32 zone. Ensure the allocations resulting from the gfp_mask returned by limit_gfp_mask use the zone flags that were originally passed to shmem_getpage_gfp. Link: https://lkml.kernel.org/r/20210224121016.1314ed6d@imladris.surriel.com Signed-off-by: Rik van Riel <riel@surriel.com> Suggested-by: Hugh Dickins <hughd@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xu Yu <xuyu@linux.alibaba.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm,thp,shmem: make khugepaged obey tmpfs mount flagsRik van Riel2-6/+18
Currently if thp enabled=[madvise], mounting a tmpfs filesystem with huge=always and mmapping files from that tmpfs does not result in khugepaged collapsing those mappings, despite the mount flag indicating that it should. Fix that by breaking up the blocks of tests in hugepage_vma_check a little bit, and testing things in the correct order. Link: https://lkml.kernel.org/r/20201124194925.623931-4-riel@surriel.com Fixes: c2231020ea7b ("mm: thp: register mm for khugepaged when merging vma for shmem") Signed-off-by: Rik van Riel <riel@surriel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xu Yu <xuyu@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm,thp,shm: limit gfp mask to no more than specifiedRik van Riel1-0/+21
Matthew Wilcox pointed out that the i915 driver opportunistically allocates tmpfs memory, but will happily reclaim some of its pool if no memory is available. Make sure the gfp mask used to opportunistically allocate a THP is always at least as restrictive as the original gfp mask. Link: https://lkml.kernel.org/r/20201124194925.623931-3-riel@surriel.com Signed-off-by: Rik van Riel <riel@surriel.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xu Yu <xuyu@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm,thp,shmem: limit shmem THP alloc gfp_maskRik van Riel3-6/+10
Patch series "mm,thp,shm: limit shmem THP alloc gfp_mask", v6. The allocation flags of anonymous transparent huge pages can be controlled through the files in /sys/kernel/mm/transparent_hugepage/defrag, which can help the system from getting bogged down in the page reclaim and compaction code when many THPs are getting allocated simultaneously. However, the gfp_mask for shmem THP allocations were not limited by those configuration settings, and some workloads ended up with all CPUs stuck on the LRU lock in the page reclaim code, trying to allocate dozens of THPs simultaneously. This patch applies the same configurated limitation of THPs to shmem hugepage allocations, to prevent that from happening. This way a THP defrag setting of "never" or "defer+madvise" will result in quick allocation failures without direct reclaim when no 2MB free pages are available. With this patch applied, THP allocations for tmpfs will be a little more aggressive than today for files mmapped with MADV_HUGEPAGE, and a little less aggressive for files that are not mmapped or mapped without that flag. This patch (of 4): The allocation flags of anonymous transparent huge pages can be controlled through the files in /sys/kernel/mm/transparent_hugepage/defrag, which can help the system from getting bogged down in the page reclaim and compaction code when many THPs are getting allocated simultaneously. However, the gfp_mask for shmem THP allocations were not limited by those configuration settings, and some workloads ended up with all CPUs stuck on the LRU lock in the page reclaim code, trying to allocate dozens of THPs simultaneously. This patch applies the same configurated limitation of THPs to shmem hugepage allocations, to prevent that from happening. Controlling the gfp_mask of THP allocations through the knobs in sysfs allows users to determine the balance between how aggressively the system tries to allocate THPs at fault time, and how much the application may end up stalling attempting those allocations. This way a THP defrag setting of "never" or "defer+madvise" will result in quick allocation failures without direct reclaim when no 2MB free pages are available. With this patch applied, THP allocations for tmpfs will be a little more aggressive than today for files mmapped with MADV_HUGEPAGE, and a little less aggressive for files that are not mmapped or mapped without that flag. Link: https://lkml.kernel.org/r/20201124194925.623931-1-riel@surriel.com Link: https://lkml.kernel.org/r/20201124194925.623931-2-riel@surriel.com Signed-off-by: Rik van Riel <riel@surriel.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Xu Yu <xuyu@linux.alibaba.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: remove pagevec_lookup_entriesMatthew Wilcox (Oracle)3-39/+4
pagevec_lookup_entries() is now just a wrapper around find_get_entries() so remove it and convert all its callers. Link: https://lkml.kernel.org/r/20201112212641.27837-15-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: pass pvec directly to find_get_entriesMatthew Wilcox (Oracle)4-20/+13
All callers of find_get_entries() use a pvec, so pass it directly instead of manipulating it in the caller. Link: https://lkml.kernel.org/r/20201112212641.27837-14-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: remove nr_entries parameter from pagevec_lookup_entriesMatthew Wilcox (Oracle)3-6/+5
All callers want to fetch the full size of the pvec. Link: https://lkml.kernel.org/r/20201112212641.27837-13-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: add an 'end' parameter to pagevec_lookup_entriesMatthew Wilcox (Oracle)3-38/+16
Simplifies the callers and uses the existing functionality in find_get_entries(). We can also drop the final argument of truncate_exceptional_pvec_entries() and simplify the logic in that function. Link: https://lkml.kernel.org/r/20201112212641.27837-12-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: add an 'end' parameter to find_get_entriesMatthew Wilcox (Oracle)4-15/+10
This simplifies the callers and leads to a more efficient implementation since the XArray has this functionality already. Link: https://lkml.kernel.org/r/20201112212641.27837-11-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: add and use find_lock_entriesMatthew Wilcox (Oracle)4-97/+78
We have three functions (shmem_undo_range(), truncate_inode_pages_range() and invalidate_mapping_pages()) which want exactly this function, so add it to filemap.c. Before this patch, shmem_undo_range() would split any compound page which overlaps either end of the range being punched in both the first and second loops through the address space. After this patch, that functionality is left for the second loop, which is arguably more appropriate since the first loop is supposed to run through all the pages quickly, and splitting a page can sleep. [willy@infradead.org: add assertion] Link: https://lkml.kernel.org/r/20201124041507.28996-3-willy@infradead.org Link: https://lkml.kernel.org/r/20201112212641.27837-10-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26iomap: use mapping_seek_hole_dataMatthew Wilcox (Oracle)2-119/+43
Enhance mapping_seek_hole_data() to handle partially uptodate pages and convert the iomap seek code to call it. Link: https://lkml.kernel.org/r/20201112212641.27837-9-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/filemap: add mapping_seek_hole_dataMatthew Wilcox (Oracle)3-70/+82
Rewrite shmem_seek_hole_data() and move it to filemap.c. [willy@infradead.org: don't put an xa_is_value() page] Link: https://lkml.kernel.org/r/20201124041507.28996-4-willy@infradead.org Link: https://lkml.kernel.org/r/20201112212641.27837-8-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/filemap: add helper for finding pagesMatthew Wilcox (Oracle)1-55/+42
There is a lot of common code in find_get_entries(), find_get_pages_range() and find_get_pages_range_tag(). Factor out find_get_entry() which simplifies all three functions. [willy@infradead.org: remove VM_BUG_ON_PAGE()] Link: https://lkml.kernel.org/r/20201124041507.28996-2-willy@infradead.orgLink: https://lkml.kernel.org/r/20201112212641.27837-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/filemap: rename find_get_entry to mapping_get_entryMatthew Wilcox (Oracle)1-3/+4
find_get_entry doesn't "find" anything. It returns the entry at a particular index. Link: https://lkml.kernel.org/r/20201112212641.27837-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: add FGP_ENTRYMatthew Wilcox (Oracle)5-41/+13
The functionality of find_lock_entry() and find_get_entry() can be provided by pagecache_get_page(), which lets us delete find_lock_entry() and make find_get_entry() static. Link: https://lkml.kernel.org/r/20201112212641.27837-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/swap: optimise get_shadow_from_swap_cacheMatthew Wilcox (Oracle)1-3/+1
There's no need to get a reference to the page, just load the entry and see if it's a shadow entry. Link: https://lkml.kernel.org/r/20201112212641.27837-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/shmem: use pagevec_lookup in shmem_unlock_mappingMatthew Wilcox (Oracle)1-10/+1
The comment shows that the reason for using find_get_entries() is now stale; find_get_pages() will not return 0 if it hits a consecutive run of swap entries, and I don't believe it has since 2011. pagevec_lookup() is a simpler function to use than find_get_pages(), so use it instead. Link: https://lkml.kernel.org/r/20201112212641.27837-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: make pagecache tagged lookups return only head pagesMatthew Wilcox (Oracle)1-5/+6
Patch series "Overhaul multi-page lookups for THP", v4. This THP prep patchset changes several page cache iteration APIs to only return head pages. - It's only possible to tag head pages in the page cache, so only return head pages, not all their subpages. - Factor a lot of common code out of the various batch lookup routines - Add mapping_seek_hole_data() - Unify find_get_entries() and pagevec_lookup_entries() - Make find_get_entries only return head pages, like find_get_entry(). These are only loosely connected, but they seem to make sense together as a series. This patch (of 14): Pagecache tags are used for dirty page writeback. Since dirtiness is tracked on a per-THP basis, we only want to return the head page rather than each subpage of a tagged page. All the filesystems which use huge pages today are in-memory, so there are no tagged huge pages today. Link: https://lkml.kernel.org/r/20201112212641.27837-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Yang Shi <yang.shi@linux.alibaba.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26Merge tag 'nfs-for-5.12-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds22-272/+356
Pull NFS Client Updates from Anna Schumaker: "New Features: - Support for eager writes, and the write=eager and write=wait mount options - Other Bugfixes and Cleanups: - Fix typos in some comments - Fix up fall-through warnings for Clang - Cleanups to the NFS readpage codepath - Remove FMR support in rpcrdma_convert_iovs() - Various other cleanups to xprtrdma - Fix xprtrdma pad optimization for servers that don't support RFC 8797 - Improvements to rpcrdma tracepoints - Fix up nfs4_bitmask_adjust() - Optimize sparse writes past the end of files" * tag 'nfs-for-5.12-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (27 commits) NFS: Support the '-owrite=' option in /proc/self/mounts and mountinfo NFS: Set the stable writes flag when initialising the super block NFS: Add mount options supporting eager writes NFS: Add support for eager writes NFS: 'flags' field should be unsigned in struct nfs_server NFS: Don't set NFS_INO_INVALID_XATTR if there is no xattr cache NFS: Always clear an invalid mapping when attempting a buffered write NFS: Optimise sparse writes past the end of file NFS: Fix documenting comment for nfs_revalidate_file_size() NFSv4: Fixes for nfs4_bitmask_adjust() xprtrdma: Clean up rpcrdma_prepare_readch() rpcrdma: Capture bytes received in Receive completion tracepoints xprtrdma: Pad optimization, revisited rpcrdma: Fix comments about reverse-direction operation xprtrdma: Refactor invocations of offset_in_page() xprtrdma: Simplify rpcrdma_convert_kvec() and frwr_map() xprtrdma: Remove FMR support in rpcrdma_convert_iovs() NFS: Add nfs_pageio_complete_read() and remove nfs_readpage_async() NFS: Call readpage_async_filler() from nfs_readpage_async() NFS: Refactor nfs_readpage() and nfs_readpage_async() to use nfs_readdesc ...
2021-02-26swiotlb: Validate bounce size in the sync/unmap pathMartin Radev1-3/+50
The size of the buffer being bounced is not checked if it happens to be larger than the size of the mapped buffer. Because the size can be controlled by a device, as it's the case with virtio devices, this can lead to memory corruption. This patch saves the remaining buffer memory for each slab and uses that information for validation in the sync/unmap paths before swiotlb_bounce is called. Validating this argument is important under the threat models of AMD SEV-SNP and Intel TDX, where the HV is considered untrusted. Signed-off-by: Martin Radev <martin.b.radev@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2021-02-26nvme-pci: set min_align_maskJianxiong Gao1-0/+1
The PRP addressing scheme requires all PRP entries except for the first one to have a zero offset into the NVMe controller pages (which can be different from the Linux PAGE_SIZE). Use the min_align_mask device parameter to ensure that swiotlb does not change the address of the buffer modulo the device page size to ensure that the PRPs won't be malformed. Signed-off-by: Jianxiong Gao <jxgao@google.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Jianxiong Gao <jxgao@google.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2021-02-26swiotlb: respect min_align_maskChristoph Hellwig1-10/+31
Respect the min_align_mask in struct device_dma_parameters in swiotlb. There are two parts to it: 1) for the lower bits of the alignment inside the io tlb slot, just extent the size of the allocation and leave the start of the slot empty 2) for the high bits ensure we find a slot that matches the high bits of the alignment to avoid wasting too much memory Based on an earlier patch from Jianxiong Gao <jxgao@google.com>. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jianxiong Gao <jxgao@google.com> Tested-by: Jianxiong Gao <jxgao@google.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2021-02-26i2c: exynos5: Preserve high speed master codeMårten Lindahl1-1/+7
When the driver starts to send a message with the MASTER_ID field set (high speed), the whole I2C_ADDR register is overwritten including MASTER_ID as the SLV_ADDR_MAS field is set. This patch preserves already written fields in I2C_ADDR when writing SLV_ADDR_MAS. Fixes: 8a73cd4cfa15 ("i2c: exynos5: add High Speed I2C controller driver") Signed-off-by: Mårten Lindahl <martenli@axis.com> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Tested-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Wolfram Sang <wsa@kernel.org>
2021-02-26Revert "i2c: i2c-qcom-geni: Add shutdown callback for i2c"Wolfram Sang1-34/+0
This reverts commit e0371298ddc51761be257698554ea507ac8bf831. It was accidently applied despite discussion still going on. Signed-off-by: Wolfram Sang <wsa@kernel.org> Acked-by: Stephen Boyd <swboyd@chromium.org>
2021-02-26i2c: designware: Get right data lengthLiguang Zhang2-1/+3
IC_DATA_CMD[11] indicates the first data byte received after the address phase for receive transfer in Master receiver or Slave receiver mode, this bit was set in some transfer flow. IC_DATA_CMD[7:0] contains the data to be transmitted or received on the I2C bus, so we should use the lower 8 bits to get the real data length. Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Wolfram Sang <wsa@kernel.org>
2021-02-26i2c: brcmstb: Fix brcmstd_send_i2c_cmd conditionMaxime Ripard1-1/+1
The brcmstb_send_i2c_cmd currently has a condition that is (CMD_RD || CMD_WR) which always evaluates to true, while the obvious fix is to test whether the cmd variable passed as parameter holds one of these two values. Fixes: dd1aa2524bc5 ("i2c: brcmstb: Add Broadcom settop SoC i2c controller driver") Reported-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Wolfram Sang <wsa@kernel.org>
2021-02-25cifs: update internal version numberSteve French1-1/+1
To 2.31 Signed-off-by: Steve French <stfrench@microsoft.com>
2021-02-25cifs: use discard iterator to discard unneeded network data more efficientlyDavid Howells3-3/+22
The iterator, ITER_DISCARD, that can only be used in READ mode and just discards any data copied to it, was added to allow a network filesystem to discard any unwanted data sent by a server. Convert cifs_discard_from_socket() to use this. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-02-25vmlinux.lds.h: Define SANTIZER_DISCARDS with CONFIG_GCOV_KERNEL=yNathan Chancellor1-4/+5
clang produces .eh_frame sections when CONFIG_GCOV_KERNEL is enabled, even when -fno-asynchronous-unwind-tables is in KBUILD_CFLAGS: $ make CC=clang vmlinux ... ld: warning: orphan section `.eh_frame' from `init/main.o' being placed in section `.eh_frame' ld: warning: orphan section `.eh_frame' from `init/version.o' being placed in section `.eh_frame' ld: warning: orphan section `.eh_frame' from `init/do_mounts.o' being placed in section `.eh_frame' ld: warning: orphan section `.eh_frame' from `init/do_mounts_initrd.o' being placed in section `.eh_frame' ld: warning: orphan section `.eh_frame' from `init/initramfs.o' being placed in section `.eh_frame' ld: warning: orphan section `.eh_frame' from `init/calibrate.o' being placed in section `.eh_frame' ld: warning: orphan section `.eh_frame' from `init/init_task.o' being placed in section `.eh_frame' ... $ rg "GCOV_KERNEL|GCOV_PROFILE_ALL" .config CONFIG_GCOV_KERNEL=y CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y CONFIG_GCOV_PROFILE_ALL=y This was already handled for a couple of other options in commit d812db78288d ("vmlinux.lds.h: Avoid KASAN and KCSAN's unwanted sections") and there is an open LLVM bug for this issue. Take advantage of that section for this config as well so that there are no more orphan warnings. Link: https://bugs.llvm.org/show_bug.cgi?id=46478 Link: https://github.com/ClangBuiltLinux/linux/issues/1069 Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Fangrui Song <maskray@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Fixes: d812db78288d ("vmlinux.lds.h: Avoid KASAN and KCSAN's unwanted sections") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210130004650.2682422-1-nathan@kernel.org
2021-02-25Merge tag 'pwm/for-5.12-rc1' of ↵Linus Torvalds7-374/+65
git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm Pull pwm updates from Thierry Reding: "The ZTE ZX platform is being removed, so the PWM driver is no longer needed and removed as well. Other than that this contains a small set of fixes and cleanups across a couple of drivers" * tag 'pwm/for-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: pwm: lpc18xx-sct: remove unneeded semicolon pwm: iqs620a: Correct a stale state variable pwm: iqs620a: Fix overflow and optimize calculations pwm: rockchip: Enable clock before calling clk_get_rate() pwm: rockchip: Eliminate potential race condition when probing pwm: rockchip: Replace "bus clk" with "PWM clk" pwm: rockchip: rockchip_pwm_probe(): Remove superfluous clk_unprepare() pwm: rockchip: Enable APB clock during register access while probing pwm: Remove ZTE ZX driver
2021-02-25Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds22-507/+1492
Pull virtio updates from Michael Tsirkin: - new vdpa features to allow creation and deletion of new devices - virtio-blk support per-device queue depth - fixes, cleanups all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (31 commits) virtio-input: add multi-touch support virtio_mmio: fix one typo vdpa/mlx5: fix param validation in mlx5_vdpa_get_config() virtio_net: Fix fall-through warnings for Clang virtio_input: Prevent EV_MSC/MSC_TIMESTAMP loop storm for MT. virtio-blk: support per-device queue depth virtio_vdpa: don't warn when fail to disable vq virtio-pci: introduce modern device module virito-pci-modern: rename map_capability() to vp_modern_map_capability() virtio-pci-modern: introduce helper to get notification offset virtio-pci-modern: introduce helper for getting queue nums virtio-pci-modern: introduce helper for setting/geting queue size virtio-pci-modern: introduce helper to set/get queue_enable virtio-pci-modern: introduce vp_modern_queue_address() virtio-pci-modern: introduce vp_modern_set_queue_vector() virtio-pci-modern: introduce vp_modern_generation() virtio-pci-modern: introduce helpers for setting and getting features virtio-pci-modern: introduce helpers for setting and getting status virtio-pci-modern: introduce helper to set config vector virtio-pci-modern: introduce vp_modern_remove() ...
2021-02-25kbuild: Move .thinlto-cache removal to 'make clean'Masahiro Yamada1-2/+2
Instead of 'make distclean', 'make clean' should remove build artifacts unneeded by external module builds. Obviously, you do not need to keep this directory. Fixes: dc5723b02e52 ("kbuild: add support for Clang LTO") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210225193912.3303604-1-masahiroy@kernel.org
2021-02-25parisc: select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRYSami Tolvanen1-0/+1
parisc uses -fpatchable-function-entry with dynamic ftrace, which means we don't need recordmcount. Select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY to tell that to the build system. Reported-by: Guenter Roeck <linux@roeck-us.net> Fixes: 3b15cdc15956 ("tracing: move function tracer options to Kconfig") Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210224225706.2726050-1-samitolvanen@google.com
2021-02-25Merge tag 'mips_5.12_1' of ↵Linus Torvalds8-5/+190
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull more MIPS updates from Thomas Bogendoerfer: - added n64 block driver - fix for ubsan warnings - fix for bcm63xx platform - update of linux-mips mailinglist * tag 'mips_5.12_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: arch: mips: update references to current linux-mips list mips: bmips: init clocks earlier vmlinux.lds.h: catch even more instrumentation symbols into .data n64: store dev instance into disk private data n64: cleanup n64cart_probe() n64: cosmetics changes n64: remove curly brackets n64: use sector SECTOR_SHIFT instead 512 n64: use enums for reg n64: move module param at the top n64: move module info at the end n64: use pr_fmt to avoid duplicate string block: Add n64 cart driver
2021-02-25docs: proc.rst: fix indentation warningRandy Dunlap1-1/+1
Fix indentation snafu in proc.rst as reported by Stephen. next-20210219/Documentation/filesystems/proc.rst:697: WARNING: Unexpected indentation. Fixes: 93ea4a0b8fce ("Documentation: proc.rst: add more about the 6 fields in loadavg") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Link: https://lore.kernel.org/r/20210223060418.21443-1-rdunlap@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-02-25Merge tag 'drm-next-2021-02-26' of git://anongit.freedesktop.org/drm/drmLinus Torvalds68-423/+1223
Pull more drm updates from Dave Airlie: "This is mostly fixes but I missed msm-next pull last week. It's been in drm-next. Otherwise it's a selection of i915, amdgpu and misc fixes, one TTM memory leak, nothing really major stands out otherwise. core: - vblank fence timing improvements dma-buf: - improve error handling ttm: - memory leak fix msm: - a6xx speedbin support - a508, a509, a512 support - various a5xx fixes - various dpu fixes - qseed3lite support for sm8250 - dsi fix for msm8994 - mdp5 fix for framerate bug with cmd mode panels - a6xx GMU OOB race fixes that were showing up in CI - various addition and removal of semicolons - gem submit fix for legacy userspace relocs path amdgpu: - clang warning fix - S0ix platform shutdown/poweroff fix - misc display fixes i915: - color format fix - -Wuninitialised reenabled - GVT ww locking, cmd parser fixes atyfb: - fix build rockchip: - AFBC modifier fix" * tag 'drm-next-2021-02-26' of git://anongit.freedesktop.org/drm/drm: (60 commits) drm/panel: kd35t133: allow using non-continuous dsi clock drm/rockchip: Require the YTR modifier for AFBC drm/ttm: Fix a memory leak drm/drm_vblank: set the dma-fence timestamp during send_vblank_event dma-fence: allow signaling drivers to set fence timestamp dma-buf: heaps: Rework heap allocation hooks to return struct dma_buf instead of fd dma-buf: system_heap: Make sure to return an error if we abort drm/amd/display: Fix system hang after multiple hotplugs (v3) drm/amdgpu: fix shutdown and poweroff process failed with s0ix drm/i915: Nuke INTEL_OUTPUT_FORMAT_INVALID drm/i915: Enable -Wuninitialized drm/amd/display: Remove Assert from dcn10_get_dig_frontend drm/amd/display: Add vupdate_no_lock interrupts for DCN2.1 Revert "drm/amd/display: reuse current context instead of recreating one" drm/amd/pm/swsmu: Avoid using structure_size uninitialized in smu_cmn_init_soft_gpu_metrics fbdev: atyfb: add stubs for aty_{ld,st}_lcd() drm/i915/gvt: Introduce per object locking in GVT scheduler. drm/i915/gvt: Purge dev_priv->gt drm/i915/gvt: Parse default state to update reg whitelist dt-bindings: dp-connector: Drop maxItems from -supply ...
2021-02-25Documentation: cgroup-v2: fix path to example BPF programAntonio Terceiro1-1/+1
This file has been moved into the "progs" subdirectory, together with all test BPF programs. Fixes: bd4aed0ee73c ("selftests: bpf: centre kernel bpf objects under new subdir "progs"") Signed-off-by: Antonio Terceiro <antonio.terceiro@linaro.org> Cc: Tejun Heo <tj@kernel.org> Cc: Zefan Li <lizefan.x@bytedance.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jiong Wang <jiong.wang@netronome.com> Link: https://lore.kernel.org/r/20210224131631.349287-1-antonio.terceiro@linaro.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-02-25Merge tag 'net-5.12-rc1' of ↵Linus Torvalds68-371/+734
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Rather small batch this time. Current release - regressions: - bcm63xx_enet: fix sporadic kernel panic due to queue length mis-accounting Current release - new code bugs: - bcm4908_enet: fix RX path possible mem leak - bcm4908_enet: fix NAPI poll returned value - stmmac: fix missing spin_lock_init in visconti_eth_dwmac_probe() - sched: cls_flower: validate ct_state for invalid and reply flags Previous releases - regressions: - net: introduce CAN specific pointer in the struct net_device to prevent mis-interpreting memory - phy: micrel: set soft_reset callback to genphy_soft_reset for KSZ8081 - psample: fix netlink skb length with tunnel info Previous releases - always broken: - icmp: pass zeroed opts from icmp{,v6}_ndo_send before sending - wireguard: device: do not generate ICMP for non-IP packets - mptcp: provide subflow aware release function to avoid a mem leak - hsr: add support for EntryForgetTime - r8169: fix jumbo packet handling on RTL8168e - octeontx2-af: fix an off by one in rvu_dbg_qsize_write() - i40e: fix flow for IPv6 next header (extension header) - phy: icplus: call phy_restore_page() when phy_select_page() fails - dpaa_eth: fix the access method for the dpaa_napi_portal" * tag 'net-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (55 commits) r8169: fix jumbo packet handling on RTL8168e net: phy: micrel: set soft_reset callback to genphy_soft_reset for KSZ8081 net: psample: Fix netlink skb length with tunnel info net: broadcom: bcm4908_enet: fix NAPI poll returned value net: broadcom: bcm4908_enet: fix RX path possible mem leak net: hsr: add support for EntryForgetTime net: dsa: sja1105: Remove unneeded cast in sja1105_crc32() ibmvnic: fix a race between open and reset net: stmmac: Fix missing spin_lock_init in visconti_eth_dwmac_probe() net: introduce CAN specific pointer in the struct net_device net: usb: qmi_wwan: support ZTE P685M modem wireguard: kconfig: use arm chacha even with no neon wireguard: queueing: get rid of per-peer ring buffers wireguard: device: do not generate ICMP for non-IP packets wireguard: peer: put frequently used members above cache lines wireguard: selftests: test multiple parallel streams wireguard: socket: remove bogus __be32 annotation wireguard: avoid double unlikely() notation when using IS_ERR() net: qrtr: Fix memory leak in qrtr_tun_open vxlan: move debug check after netdev unregister ...
2021-02-25docs: powerpc: Fix tables in syscall64-abi.rstAndrew Donnellan1-19/+32
Commit 209b44c804c ("docs: powerpc: syscall64-abi.rst: fix a malformed table") attempted to fix the formatting of tables in syscall64-abi.rst, but inadvertently changed some register names. Redo the tables with the correct register names, and while we're here, clean things up to separate the registers into different rows and add headings. Fixes: 209b44c804c ("docs: powerpc: syscall64-abi.rst: fix a malformed table") Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Link: https://lore.kernel.org/r/20210225060857.16083-1-ajd@linux.ibm.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-02-25Merge tag 'acpi-5.12-rc1-3' of ↵Linus Torvalds8-27/+334
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more ACPI updates from Rafael Wysocki: "These make additional changes to the platform profile interface merged recently and add support for the FPDT ACPI table. Specifics: - Rearrange Kconfig handling of ACPI_PLATFORM_PROFILE, add "balanced-performance" to the list of supported platform profiles and fix up some file references in a comment (Maximilian Luz). - Add support for parsing the ACPI Firmware Performance Data Table (FPDT) and exposing the data from there via sysfs (Zhang Rui)" * tag 'acpi-5.12-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: platform: Add balanced-performance platform profile ACPI: platform: Fix file references in comment ACPI: platform: Hide ACPI_PLATFORM_PROFILE option ACPI: tables: introduce support for FPDT table
2021-02-26Merge tag 'drm-intel-next-fixes-2021-02-25' of ↵Dave Airlie7-94/+65
git://anongit.freedesktop.org/drm/drm-intel into drm-next A fix for color format check from Ville, plus the re-enable of -Wuninitialized from Nathan, and the GVT fixes including fixes for ww locking, cmd parser and a general cleanup of dev_priv->gt. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/YDe3pBPV5Kx3hpk6@intel.com
2021-02-26Merge tag 'amd-drm-fixes-5.12-2021-02-24' of ↵Dave Airlie11-48/+158
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-fixes-5.12-2021-02-24: amdgpu: - Clang warning fix - S0ix platform shutdown/poweroff fix - Misc display fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210225043853.3880-1-alexander.deucher@amd.com
2021-02-26Merge tag 'drm-misc-next-fixes-2021-02-25' of ↵Dave Airlie14-62/+197
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next tasty fixes for v5.12: - Cherry pick of drm-misc-fixes pull: "here's this week's PR for drm-misc-fixes. One of the patches is a memory leak; the rest is for hardware issues." - Fix dt bindings for dp connector. - Fix build error in atyfb. - Improve error handling for dma-buf heaps. - Make vblank timestamp more correct, by recording timestamp to be set when signaling. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/0f60a68c-d562-7266-0815-ea75ff680b17@linux.intel.com
2021-02-25Documentation: features: refresh feature listArnd Bergmann10-10/+10
Run the update script to document the recent feature additions on riscv, mips and csky. Fixes: c109f42450ec ("csky: Add kmemleak support") Fixes: 8b3165e54566 ("MIPS: Enable GCOV") Fixes: 1ddc96bd42da ("MIPS: kernel: Support extracting off-line stack traces from user-space with perf") Fixes: 74784081aac8 ("riscv: Add uprobes supported") Fixes: 829adda597fe ("riscv: Add KPROBES_ON_FTRACE supported") Fixes: c22b0bcb1dd0 ("riscv: Add kprobes supported") Fixes: dcdc7a53a890 ("RISC-V: Implement ptrace regs and stack API") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20210225142841.3385428-2-arnd@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-02-25Documentation: features: remove c6x referencesArnd Bergmann41-41/+0
The references to arch/c6x are obsolete now that the architecture is gone. Remove them. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20210225142841.3385428-1-arnd@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-02-25cifs: introduce helper for finding referral server to improve DFS target ↵Paulo Alcantara2-17/+51
resolution Some servers seem to mistakenly report different values for capabilities and share flags, so we can't always rely on those values to decide whether the resolved target can handle any new DFS referrals. Add a new helper is_referral_server() to check if all resolved targets can handle new DFS referrals by directly looking at the GET_DFS_REFERRAL.ReferralHeaderFlags value as specified in MS-DFSC 2.2.4 RESP_GET_DFS_REFERRAL in addition to is_tcon_dfs(). Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Cc: stable@vger.kernel.org # 5.11 Signed-off-by: Steve French <stfrench@microsoft.com>
2021-02-25cifs: check all path components in resolved dfs targetPaulo Alcantara1-60/+33
Handle the case where a resolved target share is like //server/users/dir, and the user "foo" has no read permission to access the parent folder "users" but has access to the final path component "dir". is_path_remote() already implements that, so call it directly. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Cc: stable@vger.kernel.org # 5.11 Signed-off-by: Steve French <stfrench@microsoft.com>
2021-02-25Merge tag 'kbuild-v5.12' of ↵Linus Torvalds65-498/+698
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Fix false-positive build warnings for ARCH=ia64 builds - Optimize dictionary size for module compression with xz - Check the compiler and linker versions in Kconfig - Fix misuse of extra-y - Support DWARF v5 debug info - Clamp SUBLEVEL to 255 because stable releases 4.4.x and 4.9.x exceeded the limit - Add generic syscall{tbl,hdr}.sh for cleanups across arches - Minor cleanups of genksyms - Minor cleanups of Kconfig * tag 'kbuild-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (38 commits) initramfs: Remove redundant dependency of RD_ZSTD on BLK_DEV_INITRD kbuild: remove deprecated 'always' and 'hostprogs-y/m' kbuild: parse C= and M= before changing the working directory kbuild: reuse this-makefile to define abs_srctree kconfig: unify rule of config, menuconfig, nconfig, gconfig, xconfig kconfig: omit --oldaskconfig option for 'make config' kconfig: fix 'invalid option' for help option kconfig: remove dead code in conf_askvalue() kconfig: clean up nested if-conditionals in check_conf() kconfig: Remove duplicate call to sym_get_string_value() Makefile: Remove # characters from compiler string Makefile: reuse CC_VERSION_TEXT kbuild: check the minimum linker version in Kconfig kbuild: remove ld-version macro scripts: add generic syscallhdr.sh scripts: add generic syscalltbl.sh arch: syscalls: remove $(srctree)/ prefix from syscall tables arch: syscalls: add missing FORCE and fix 'targets' to make if_changed work gen_compile_commands: prune some directories kbuild: simplify access to the kernel's version ...
2021-02-25cifs: fix DFS failoverPaulo Alcantara1-64/+59
In do_dfs_failover(), the mount_get_conns() function requires the full fs context in order to get new connection to server, so clone the original context and change it accordingly when retrying the DFS targets in the referral. If failover was successful, then update original context with the new UNC, prefix path and ip address. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Cc: stable@vger.kernel.org # 5.11 Signed-off-by: Steve French <stfrench@microsoft.com>
2021-02-25cifs: fix nodfs mount optionPaulo Alcantara1-4/+4
Skip DFS resolving when mounting with 'nodfs' even if CONFIG_CIFS_DFS_UPCALL is enabled. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Cc: stable@vger.kernel.org # 5.11 Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-02-25cifs: fix handling of escaped ',' in the password mount argumentRonnie Sahlberg1-13/+30
Passwords can contain ',' which are also used as the separator between mount options. Mount.cifs will escape all ',' characters as the string ",,". Update parsing of the mount options to detect ",," and treat it as a single 'c' character. Fixes: 24e0a1eff9e2 ("cifs: switch to new mount api") Cc: stable@vger.kernel.org # 5.11 Reported-by: Simon Taylor <simon@simon-taylor.me.uk> Tested-by: Simon Taylor <simon@simon-taylor.me.uk> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-02-25Merge tag 'ext4_for_linus' of ↵Linus Torvalds6-49/+59
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Miscellaneous ext4 cleanups and bug fixes. Pretty boring this cycle..." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: add .kunitconfig fragment to enable ext4-specific tests ext: EXT4_KUNIT_TESTS should depend on EXT4_FS instead of selecting it ext4: reset retry counter when ext4_alloc_file_blocks() makes progress ext4: fix potential htree index checksum corruption ext4: factor out htree rep invariant check ext4: Change list_for_each* to list_for_each_entry* ext4: don't try to processed freed blocks until mballoc is initialized ext4: use DEFINE_MUTEX() for mutex lock
2021-02-25Merge branch 'acpi-tables'Rafael J. Wysocki4-0/+316
* acpi-tables: ACPI: tables: introduce support for FPDT table
2021-02-25Merge tag 'pci-v5.12-changes' of ↵Linus Torvalds73-781/+5543
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Remove unnecessary locking around _OSC (Bjorn Helgaas) - Clarify message about _OSC failure (Bjorn Helgaas) - Remove notification of PCIe bandwidth changes (Bjorn Helgaas) - Tidy checking of syscall user config accessors (Heiner Kallweit) Resource management: - Decline to resize resources if boot config must be preserved (Ard Biesheuvel) - Fix pci_register_io_range() memory leak (Geert Uytterhoeven) Error handling (Keith Busch): - Clear error status from the correct device - Retain error recovery status so drivers can use it after reset - Log the type of Port (Root or Switch Downstream) that we reset - Always request a reset for Downstream Ports in frozen state Endpoint framework and NTB (Kishon Vijay Abraham I): - Make *_get_first_free_bar() take into account 64 bit BAR - Add helper API to get the 'next' unreserved BAR - Make *_free_bar() return error codes on failure - Remove unused pci_epf_match_device() - Add support to associate secondary EPC with EPF - Add support in configfs to associate two EPCs with EPF - Add pci_epc_ops to map MSI IRQ - Add pci_epf_ops to expose function-specific attrs - Allow user to create sub-directory of 'EPF Device' directory - Implement ->msi_map_irq() ops for cadence - Configure LM_EP_FUNC_CFG based on epc->function_num_map for cadence - Add EP function driver to provide NTB functionality - Add support for EPF PCI Non-Transparent Bridge - Add specification for PCI NTB function device - Add PCI endpoint NTB function user guide - Add configfs binding documentation for pci-ntb endpoint function Broadcom STB PCIe controller driver: - Add support for BCM4908 and external PERST# signal controller (Rafał Miłecki) Cadence PCIe controller driver: - Retrain Link to work around Gen2 training defect (Nadeem Athani) - Fix merge botch in cdns_pcie_host_map_dma_ranges() (Krzysztof Wilczyński) Freescale Layerscape PCIe controller driver: - Add LX2160A rev2 EP mode support (Hou Zhiqiang) - Convert to builtin_platform_driver() (Michael Walle) MediaTek PCIe controller driver: - Fix OF node reference leak (Krzysztof Wilczyński) Microchip PolarFlare PCIe controller driver: - Add Microchip PolarFire PCIe controller driver (Daire McNamara) Qualcomm PCIe controller driver: - Use PHY_REFCLK_USE_PAD only for ipq8064 (Ansuel Smith) - Add support for ddrss_sf_tbu clock for sm8250 (Dmitry Baryshkov) Renesas R-Car PCIe controller driver: - Drop PCIE_RCAR config option (Lad Prabhakar) - Always allocate MSI addresses in 32bit space (Marek Vasut) Rockchip PCIe controller driver: - Add FriendlyARM NanoPi M4B DT binding (Chen-Yu Tsai) - Make 'ep-gpios' DT property optional (Chen-Yu Tsai) Synopsys DesignWare PCIe controller driver: - Work around ECRC configuration hardware defect (Vidya Sagar) - Drop support for config space in DT 'ranges' (Rob Herring) - Change size to u64 for EP outbound iATU (Shradha Todi) - Add upper limit address for outbound iATU (Shradha Todi) - Make dw_pcie ops optional (Jisheng Zhang) - Remove unnecessary dw_pcie_ops from al driver (Jisheng Zhang) Xilinx Versal CPM PCIe controller driver: - Fix OF node reference leak (Pan Bian) Miscellaneous: - Remove tango host controller driver (Arnd Bergmann) - Remove IRQ handler & data together (altera-msi, brcmstb, dwc) (Martin Kaiser) - Fix xgene-msi race in installing chained IRQ handler (Martin Kaiser) - Apply CONFIG_PCI_DEBUG to entire drivers/pci hierarchy (Junhao He) - Fix pci-bridge-emul array overruns (Russell King) - Remove obsolete uses of WARN_ON(in_interrupt()) (Sebastian Andrzej Siewior)" * tag 'pci-v5.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (69 commits) PCI: qcom: Use PHY_REFCLK_USE_PAD only for ipq8064 PCI: qcom: Add support for ddrss_sf_tbu clock dt-bindings: PCI: qcom: Document ddrss_sf_tbu clock for sm8250 PCI: al: Remove useless dw_pcie_ops PCI: dwc: Don't assume the ops in dw_pcie always exist PCI: dwc: Add upper limit address for outbound iATU PCI: dwc: Change size to u64 for EP outbound iATU PCI: dwc: Drop support for config space in 'ranges' PCI: layerscape: Convert to builtin_platform_driver() PCI: layerscape: Add LX2160A rev2 EP mode support dt-bindings: PCI: layerscape: Add LX2160A rev2 compatible strings PCI: dwc: Work around ECRC configuration issue PCI/portdrv: Report reset for frozen channel PCI/AER: Specify the type of Port that was reset PCI/ERR: Retain status from error notification PCI/AER: Clear AER status from Root Port when resetting Downstream Port PCI/ERR: Clear status of the reporting device dt-bindings: arm: rockchip: Add FriendlyARM NanoPi M4B PCI: rockchip: Make 'ep-gpios' DT property optional Documentation: PCI: Add PCI endpoint NTB function user guide ...
2021-02-25r8169: fix jumbo packet handling on RTL8168eHeiner Kallweit1-2/+2
Josef reported [0] that using jumbo packets fails on RTL8168e. Aligning the values for register MaxTxPacketSize with the vendor driver fixes the problem. [0] https://bugzilla.kernel.org/show_bug.cgi?id=211827 Fixes: d58d46b5d851 ("r8169: jumbo fixes.") Reported-by: Josef Oškera <joskera@redhat.com> Tested-by: Josef Oškera <joskera@redhat.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/b15ddef7-0d50-4320-18f4-6a3f86fbfd3e@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>