aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/core-api
AgeCommit message (Collapse)AuthorFilesLines
7 daysMerge tag 'docs-6.10' of git://git.lwn.net/linuxLinus Torvalds3-5/+25
Pull documentation updates from Jonathan Corbet: "Another not-too-busy cycle for documentation, including: - Some build-system changes to detect the variable fonts installed by some distributions that can break the PDF build. - Various updates and additions to the Spanish, Chinese, Italian, and Japanese translations. - Update the stable-kernel rules to match modern practice ... and the usual array of corrections, updates, and typo fixes" * tag 'docs-6.10' of git://git.lwn.net/linux: (42 commits) cgroup: Add documentation for missing zswap memory.stat kernel-doc: Added "*" in $type_constants2 to fix 'make htmldocs' warning. docs:core-api: fixed typos and grammar in printk-index page Documentation: tracing: Fix spelling mistakes docs/zh_CN/rust: Update the translation of quick-start to 6.9-rc4 docs/zh_CN/rust: Update the translation of general-information to 6.9-rc4 docs/zh_CN/rust: Update the translation of coding-guidelines to 6.9-rc4 docs/zh_CN/rust: Update the translation of arch-support to 6.9-rc4 docs: stable-kernel-rules: fix typo sent->send docs/zh_CN: remove two inconsistent spaces docs: scripts/check-variable-fonts.sh: Improve commands for detection docs: stable-kernel-rules: create special tag to flag 'no backporting' docs: stable-kernel-rules: explain use of stable@kernel.org (w/o @vger.) docs: stable-kernel-rules: remove code-labels tags and a indention level docs: stable-kernel-rules: call mainline by its name and change example docs: stable-kernel-rules: reduce redundancy docs, kprobes: Add riscv as supported architecture Docs: typos/spelling docs: kernel_include.py: Cope with docutils 0.21 docs: ja_JP/howto: Catch up update in v6.8 ...
13 daysdocs:core-api: fixed typos and grammar in printk-index pageDennis Lam1-2/+2
Signed-off-by: Dennis Lam <dennis.lamerice@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20240502212522.4263-1-dennis.lamerice@gmail.com
2024-05-02Docs: typos/spellingRemington Brasga1-1/+1
Fix spelling and grammar in Docs descriptions Signed-off-by: Remington Brasga <rbrasga@uci.edu> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20240429225527.2329-1-rbrasga@uci.edu
2024-04-03Documentation/core-api: Update events_freezable_power references.Audra Mitchell1-3/+3
Due to commit 8318d6a6362f ("workqueue: Shorten events_freezable_power_efficient name") we now have some stale references in the workqeueue documentation, so updating those references accordingly. Signed-off-by: Audra Mitchell <audra@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-04-02docs: dma: correct dma_set_mask() sample codeFrank Li1-2/+22
There are bunch of codes in driver like if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64))) dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32)) Actually it is wrong because if dma_set_mask_and_coherent(64) fails, dma_set_mask_and_coherent(32) will fail for the same reason. And dma_set_mask_and_coherent(64) never returns failure. According to the definition of dma_set_mask(), it indicates the width of address that device DMA can access. If it can access 64-bit address, it must access 32-bit address inherently. So only need set biggest address width. See below code fragment: dma_set_mask(mask) { mask = (dma_addr_t)mask; if (!dev->dma_mask || !dma_supported(dev, mask)) return -EIO; arch_dma_set_mask(dev, mask); *dev->dma_mask = mask; return 0; } dma_supported() will call dma_direct_supported or iommux's dma_supported call back function. int dma_direct_supported(struct device *dev, u64 mask) { u64 min_mask = (max_pfn - 1) << PAGE_SHIFT; /* * Because 32-bit DMA masks are so common we expect every architecture * to be able to satisfy them - either by not supporting more physical * memory, or by providing a ZONE_DMA32. If neither is the case, the * architecture needs to use an IOMMU instead of the direct mapping. */ if (mask >= DMA_BIT_MASK(32)) return 1; ... } The iommux's dma_supported() actually means iommu requires devices's minimized dma capability. An example: static int sba_dma_supported( struct device *dev, u64 mask)() { ... * check if mask is >= than the current max IO Virt Address * The max IO Virt address will *always* < 30 bits. */ return((int)(mask >= (ioc->ibase - 1 + (ioc->pdir_size / sizeof(u64) * IOVP_SIZE) ))); ... } 1 means supported. 0 means unsupported. Correct document to make it more clear and provide correct sample code. Signed-off-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jonathan Corbet <corbet@lwn.net> [jc: fixed then/than typo] Link: https://lore.kernel.org/r/20240401174159.642998-1-Frank.Li@nxp.com
2024-03-11Merge tag 'wq-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds1-14/+29
Pull workqueue updates from Tejun Heo: "This cycle, a lot of workqueue changes including some that are significant and invasive. - During v6.6 cycle, unbound workqueues were updated so that they are more topology aware and flexible, which among other things improved workqueue behavior on modern multi-L3 CPUs. In the process, commit 636b927eba5b ("workqueue: Make unbound workqueues to use per-cpu pool_workqueues") switched unbound workqueues to use per-CPU frontend pool_workqueues as a part of increasing front-back mapping flexibility. An unwelcome side effect of this change was that this made max concurrency enforcement per-CPU blowing up the maximum number of allowed concurrent executions. I incorrectly assumed that this wouldn't cause practical problems as most unbound workqueue users are self-regulate max concurrency; however, there definitely are which don't (e.g. on IO paths) and the drastic increase in the allowed max concurrency led to noticeable perf regressions in some use cases. This is now addressed by separating out max concurrency enforcement to a separate struct - wq_node_nr_active - which makes @max_active consistently mean system-wide max concurrency regardless of the number of CPUs or (finally) NUMA nodes. This is a rather invasive and, in places, a bit clunky; however, the clunkiness rises from the the inherent requirement to handle the disagreement between the execution locality domain and max concurrency enforcement domain on some modern machines. See commit 5797b1c18919 ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues") for more details. - BH workqueue support is added. They are similar to per-CPU workqueues but execute work items in the softirq context. This is expected to replace tasklet. However, currently, it's missing the ability to disable and enable work items which is needed to convert many tasklet users. To avoid crowding this merge window too much, this will be included in the next merge window. A separate pull request will be sent for the couple conversion patches that are currently pending. - Waiman plugged a long-standing hole in workqueue CPU isolation where ordered workqueues didn't follow wq_unbound_cpumask updates. Ordered workqueues now follow the same rules as other unbound workqueues. - More CPU isolation improvements: Juri fixed another deficit in workqueue isolation where unbound rescuers don't respect wq_unbound_cpumask. Leonardo fixed delayed_work timers firing on isolated CPUs. - Other misc changes" * tag 'wq-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (54 commits) workqueue: Drain BH work items on hot-unplugged CPUs workqueue: Introduce from_work() helper for cleaner callback declarations workqueue: Control intensive warning threshold through cmdline workqueue: Make @flags handling consistent across set_work_data() and friends workqueue: Remove clear_work_data() workqueue: Factor out work_grab_pending() from __cancel_work_sync() workqueue: Clean up enum work_bits and related constants workqueue: Introduce work_cancel_flags workqueue: Use variable name irq_flags for saving local irq flags workqueue: Reorganize flush and cancel[_sync] functions workqueue: Rename __cancel_work_timer() to __cancel_timer_sync() workqueue: Use rcu_read_lock_any_held() instead of rcu_read_lock_held() workqueue: Cosmetic changes workqueue, irq_work: Build fix for !CONFIG_IRQ_WORK workqueue: Fix queue_work_on() with BH workqueues async: Use a dedicated unbound workqueue with raised min_active workqueue: Implement workqueue_set_min_active() workqueue: Fix kernel-doc comment of unplug_oldest_pwq() workqueue: Bind unbound workqueue rescuer to wq_unbound_cpumask kernel/workqueue: Let rescuers follow unbound wq cpumask changes ...
2024-02-05workqueue: Don't implicitly make UNBOUND workqueues w/ @max_active==1 orderedTejun Heo1-9/+5
5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered") automoatically promoted UNBOUND workqueues w/ @max_active==1 to ordered workqueues because UNBOUND workqueues w/ @max_active==1 used to be the way to create ordered workqueues and the new NUMA support broke it. These problems can be subtle and the fact that they can only trigger on NUMA machines made them even more difficult to debug. However, overloading the UNBOUND allocation interface this way creates other issues. It's difficult to tell whether a given workqueue actually needs to be ordered and users that legitimately want a min concurrency level wq unexpectedly gets an ordered one instead. With planned UNBOUND workqueue udpates to improve execution locality and more prevalence of chiplet designs which can benefit from such improvements, this isn't a state we wanna be in forever. There aren't that many UNBOUND w/ @max_active==1 users in the tree and the preceding patches audited all and converted them to alloc_ordered_workqueue() as appropriate. This patch removes the implicit promotion of UNBOUND w/ @max_active==1 workqueues to ordered ones. v2: v1 patch incorrectly dropped !list_empty(&wq->pwqs) condition in apply_workqueue_attrs_locked() which spuriously triggers WARNING and fails workqueue creation. Fix it. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/oe-lkp/202304251050.45a5df1f-oliver.sang@intel.com
2024-02-04workqueue: Implement BH workqueues to eventually replace taskletsTejun Heo1-5/+24
The only generic interface to execute asynchronously in the BH context is tasklet; however, it's marked deprecated and has some design flaws such as the execution code accessing the tasklet item after the execution is complete which can lead to subtle use-after-free in certain usage scenarios and less-developed flush and cancel mechanisms. This patch implements BH workqueues which share the same semantics and features of regular workqueues but execute their work items in the softirq context. As there is always only one BH execution context per CPU, none of the concurrency management mechanisms applies and a BH workqueue can be thought of as a convenience wrapper around softirq. Except for the inability to sleep while executing and lack of max_active adjustments, BH workqueues and work items should behave the same as regular workqueues and work items. Currently, the execution is hooked to tasklet[_hi]. However, the goal is to convert all tasklet users over to BH workqueues. Once the conversion is complete, tasklet can be removed and BH workqueues can directly take over the tasklet softirqs. system_bh[_highpri]_wq are added. As queue-wide flushing doesn't exist in tasklet, all existing tasklet users should be able to use the system BH workqueues without creating their own workqueues. v3: - Add missing interrupt.h include. v2: - Instead of using tasklets, hook directly into its softirq action functions - tasklet[_hi]_action(). This is slightly cheaper and closer to the eventual code structure we want to arrive at. Suggested by Lai. - Lai also pointed out several places which need NULL worker->task handling or can use clarification. Updated. Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/CAHk-=wjDW53w4-YcSmgKC5RruiRLHmJ1sXeYdp_ZgVoBw=5byA@mail.gmail.com Tested-by: Allen Pais <allen.lkml@gmail.com> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
2024-01-17Merge tag 'docs-6.8-2' of git://git.lwn.net/linuxLinus Torvalds1-1/+1
Pull documentation fixes from Jonathan Corbet: "A handful of late-arriving documentation fixes" * tag 'docs-6.8-2' of git://git.lwn.net/linux: docs, kprobes: Add loongarch as supported architecture docs, kprobes: Update email address of Masami Hiramatsu docs: admin-guide: hw_random: update rng-tools website Documentation/core-api: fix spelling mistake in workqueue docs: kernel_feat.py: fix potential command injection Documentation: constrain alabaster package to older versions
2024-01-12Merge tag 'drm-next-2024-01-10' of git://anongit.freedesktop.org/drm/drmLinus Torvalds1-0/+2
Pull drm updates from Dave Airlie: "This contains two major new drivers: - imagination is a first driver for Imagination Technologies devices, it only covers very specific devices, but there is hope to grow it - xe is a reboot of the i915 GPU (shares display) side using a more upstream focused development model, and trying to maximise code sharing. It's not enabled for any hw by default, and will hopefully get switched on for Intel's Lunarlake. This also drops a bunch of the old UMS ioctls. It's been dead long enough. amdgpu has a bunch of new color management code that is being used in the Steam Deck. amdgpu also has a new ACPI WBRF interaction to help avoid radio interference. Otherwise it's the usual lots of changes in lots of places. Detailed summary: new drivers: - imagination - new driver for Imagination Technologies GPU - xe - new driver for Intel GPUs using core drm concepts core: - add CLOSE_FB ioctl - remove old UMS ioctls - increase max objects to accomodate AMD color mgmt encoder: - create per-encoder debugfs directory edid: - split out drm_eld - SAD helpers - drop edid_firmware module parameter format-helper: - cache format conversion buffers sched: - move from kthread to workqueue - rename some internals - implement dynamic job-flow control gpuvm: - provide more features to handle GEM objects client: - don't acquire module reference displayport: - add mst path property documentation fdinfo: - alignment fix dma-buf: - add fence timestamp helper - add fence deadline support bridge: - transparent aux-bridge for DP/USB-C - lt8912b: add suspend/resume support and power regulator support panel: - edp: AUO B116XTN02, BOE NT116WHM-N21,836X2, NV116WHM-N49 - chromebook panel support - elida-kd35t133: rework pm - powkiddy RK2023 panel - himax-hx8394: drop prepare/unprepare and shutdown logic - BOE BP101WX1-100, Powkiddy X55, Ampire AM8001280G - Evervision VGG644804, SDC ATNA45AF01 - nv3052c: register docs, init sequence fixes, fascontek FS035VG158 - st7701: Anbernic RG-ARC support - r63353 panel controller - Ilitek ILI9805 panel controller - AUO G156HAN04.0 simplefb: - support memory regions - support power domains amdgpu: - add new 64-bit sequence number infrastructure - add AMD specific color management - ACPI WBRF support for RF interference handling - GPUVM updates - RAS updates - DCN 3.5 updates - Rework PCIe link speed handling - Document GPU reset types - DMUB fixes - eDP fixes - NBIO 7.9/7.11 updates - SubVP updates - XGMI PCIe state dumping for aqua vanjaram - GFX11 golden register updates - enable tunnelling on high pri compute amdkfd: - Migrate TLB flushing logic to amdgpu - Trap handler fixes - Fix restore workers handling on suspend/resume - Fix possible memory leak in pqm_uninit() - support import/export of dma-bufs using GEM handles radeon: - fix possible overflows in command buffer checking - check for errors in ring_lock i915: - reorg display code for reuse in xe driver - fdinfo memory stats printing - DP MST bandwidth mgmt improvements - DP panel replay enabling - MTL C20 phy state verification - MTL DP DSC fractional bpp support - Audio fastset support - use dma_fence interfaces instead of i915_sw_fence - Separate gem and display code - AUX register macro refactoring - Separate display module/device parameters - Move display capabilities debugfs under display - Makefile cleanups - Register cleanups - Move display lock inits under display/ - VLV/CHV DPIO PHY register and interface refactoring - DSI VBT sequence refactoring - C10/C20 PHY PLL hardware readout - DPLL code cleanups - Cleanup PXP plane protection checks - Improve display debug msgs - PSR selective fetch fixes/improvements - DP MST fixes - Xe2LPD FBC restrictions removed - DGFX uses direct VBT pin mapping - more MTL WAs - fix MTL eDP bug - eliminate use of kmap_atomic habanalabs: - sysfs entry to identify a device minor id with debugfs path - sysfs entry to expose device module id - add signed device info retrieval through INFO ioctl - add Gaudi2C device support - pcie reset prepare/done hooks msm: - Add support for SDM670, SM8650 - Handle the CFG interconnect to fix the obscure hangs / timeouts - Kconfig fix for QMP dependency - use managed allocators - DPU: SDM670, SM8650 support - DPU: Enable SmartDMA on SM8350 and SM8450 - DP: enable runtime PM support - GPU: add metadata UAPI - GPU: move devcoredumps to GPU device - GPU: convert to drm_exec ivpu: - update FW API - new debugfs file - a new NOP job submission test mode - improve suspend/resume - PM improvements - MMU PT optimizations - firmware profile frequency support - support for uncached buffers - switch to gem shmem helpers - replace kthread with threaded irqs rockchip: - rk3066_hdmi: convert to atomic - vop2: support nv20 and nv30 - rk3588 support mediatek: - use devm_platform_ioremap_resource - stop using iommu_present - MT8188 VDOSYS1 display support panfrost: - PM improvements - improve interrupt handling as poweroff qaic: - allow to run with single MSI - support host/device time sync - switch to persistent DRM devices exynos: - fix potential error pointer dereference - fix wrong error checking - add missing call to drm_atomic_helper_shutdown omapdrm: - dma-fence lockdep annotation fix tidss: - dma-fence lockdep annotation fix - support for AM62A7 v3d: - BCM2712 - rpi5 support - fdinfo + gputop support - uapi for CPU job handling virtio-gpu: - add context debug name" * tag 'drm-next-2024-01-10' of git://anongit.freedesktop.org/drm/drm: (2340 commits) drm/amd/display: Allow z8/z10 from driver drm/amd/display: fix bandwidth validation failure on DCN 2.1 drm/amdgpu: apply the RV2 system aperture fix to RN/CZN as well drm/amd/display: Move fixpt_from_s3132 to amdgpu_dm drm/amd/display: Fix recent checkpatch errors in amdgpu_dm Revert "drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole" drm/amd/display: avoid stringop-overflow warnings for dp_decide_lane_settings() drm/amd/display: Fix power_helpers.c codestyle drm/amd/display: Fix hdcp_log.h codestyle drm/amd/display: Fix hdcp2_execution.c codestyle drm/amd/display: Fix hdcp_psp.h codestyle drm/amd/display: Fix freesync.c codestyle drm/amd/display: Fix hdcp_psp.c codestyle drm/amd/display: Fix hdcp1_execution.c codestyle drm/amd/pm/smu7: fix a memleak in smu7_hwmgr_backend_init drm/amdkfd: Fix iterator used outside loop in 'kfd_add_peer_prop()' drm/amdgpu: Drop 'fence' check in 'to_amdgpu_amdkfd_fence()' drm/amdkfd: Confirm list is non-empty before utilizing list_first_entry in kfd_topology.c drm/amdgpu: Fix '*fw' from request_firmware() not released in 'amdgpu_ucode_request()' drm/amdgpu: Fix variable 'mca_funcs' dereferenced before NULL check in 'amdgpu_mca_smu_get_mca_entry()' ...
2024-01-11Merge tag 'docs-6.8' of git://git.lwn.net/linuxLinus Torvalds3-3/+3
Pull documentation update from Jonathan Corbet: "Another moderately busy cycle for documentation, including: - The minimum Sphinx requirement has been raised to 2.4.4, following a warning that was added in 6.2 - Some reworking of the Documentation/process front page to, hopefully, make it more useful - Various kernel-doc tweaks to, for example, make it deal properly with __counted_by annotations - We have also restored a warning for documentation of nonexistent structure members that disappeared a while back. That had the delightful consequence of adding some 600 warnings to the docs build. A sustained effort by Randy, Vegard, and myself has addressed almost all of those, bringing the documentation back into sync with the code. The fixes are going through the appropriate maintainer trees - Various improvements to the HTML rendered docs, including automatic links to Git revisions and a nice new pulldown to make translations easy to access - Speaking of translations, more of those for Spanish and Chinese ... plus the usual stream of documentation updates and typo fixes" * tag 'docs-6.8' of git://git.lwn.net/linux: (57 commits) MAINTAINERS: use tabs for indent of CONFIDENTIAL COMPUTING THREAT MODEL A reworked process/index.rst ring-buffer/Documentation: Add documentation on buffer_percent file Translated the RISC-V architecture boot documentation. Docs: remove mentions of fdformat from util-linux Docs/zh_CN: Fix the meaning of DEBUG to pr_debug() Documentation: move driver-api/dcdbas to userspace-api/ Documentation: move driver-api/isapnp to userspace-api/ Documentation/core-api : fix typo in workqueue Documentation/trace: Fixed typos in the ftrace FLAGS section kernel-doc: handle a void function without producing a warning scripts/get_abi.pl: ignore some temp files docs: kernel_abi.py: fix command injection scripts/get_abi: fix source path leak CREDITS, MAINTAINERS, docs/process/howto: Update man-pages' maintainer docs: translations: add translations links when they exist kernel-doc: Align quick help and the code MAINTAINERS: add reviewer for Spanish translations docs: ignore __counted_by attribute in structure definitions scripts: kernel-doc: Clarify missing struct member description ..
2024-01-11Documentation/core-api: fix spelling mistake in workqueueattreyee-muk1-1/+1
Correct to "following" from "followings" in the sentence "The followings are the read bandwidths and CPU utilizations depending on different affinity scope settings on ``kcryptd`` measured over five runs." Signed-off-by: Attreyee Mukherjee <tintinm2017@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20240110185746.24974-1-tintinm2017@gmail.com
2024-01-09Merge tag 'mm-stable-2024-01-08-15-31' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Many singleton patches against the MM code. The patch series which are included in this merge do the following: - Peng Zhang has done some mapletree maintainance work in the series 'maple_tree: add mt_free_one() and mt_attr() helpers' 'Some cleanups of maple tree' - In the series 'mm: use memmap_on_memory semantics for dax/kmem' Vishal Verma has altered the interworking between memory-hotplug and dax/kmem so that newly added 'device memory' can more easily have its memmap placed within that newly added memory. - Matthew Wilcox continues folio-related work (including a few fixes) in the patch series 'Add folio_zero_tail() and folio_fill_tail()' 'Make folio_start_writeback return void' 'Fix fault handler's handling of poisoned tail pages' 'Convert aops->error_remove_page to ->error_remove_folio' 'Finish two folio conversions' 'More swap folio conversions' - Kefeng Wang has also contributed folio-related work in the series 'mm: cleanup and use more folio in page fault' - Jim Cromie has improved the kmemleak reporting output in the series 'tweak kmemleak report format'. - In the series 'stackdepot: allow evicting stack traces' Andrey Konovalov to permits clients (in this case KASAN) to cause eviction of no longer needed stack traces. - Charan Teja Kalla has fixed some accounting issues in the page allocator's atomic reserve calculations in the series 'mm: page_alloc: fixes for high atomic reserve caluculations'. - Dmitry Rokosov has added to the samples/ dorectory some sample code for a userspace memcg event listener application. See the series 'samples: introduce cgroup events listeners'. - Some mapletree maintanance work from Liam Howlett in the series 'maple_tree: iterator state changes'. - Nhat Pham has improved zswap's approach to writeback in the series 'workload-specific and memory pressure-driven zswap writeback'. - DAMON/DAMOS feature and maintenance work from SeongJae Park in the series 'mm/damon: let users feed and tame/auto-tune DAMOS' 'selftests/damon: add Python-written DAMON functionality tests' 'mm/damon: misc updates for 6.8' - Yosry Ahmed has improved memcg's stats flushing in the series 'mm: memcg: subtree stats flushing and thresholds'. - In the series 'Multi-size THP for anonymous memory' Ryan Roberts has added a runtime opt-in feature to transparent hugepages which improves performance by allocating larger chunks of memory during anonymous page faults. - Matthew Wilcox has also contributed some cleanup and maintenance work against eh buffer_head code int he series 'More buffer_head cleanups'. - Suren Baghdasaryan has done work on Andrea Arcangeli's series 'userfaultfd move option'. UFFDIO_MOVE permits userspace heap compaction algorithms to move userspace's pages around rather than UFFDIO_COPY'a alloc/copy/free. - Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm: Add ksm advisor'. This is a governor which tunes KSM's scanning aggressiveness in response to userspace's current needs. - Chengming Zhou has optimized zswap's temporary working memory use in the series 'mm/zswap: dstmem reuse optimizations and cleanups'. - Matthew Wilcox has performed some maintenance work on the writeback code, both code and within filesystems. The series is 'Clean up the writeback paths'. - Andrey Konovalov has optimized KASAN's handling of alloc and free stack traces for secondary-level allocators, in the series 'kasan: save mempool stack traces'. - Andrey also performed some KASAN maintenance work in the series 'kasan: assorted clean-ups'. - David Hildenbrand has gone to town on the rmap code. Cleanups, more pte batching, folio conversions and more. See the series 'mm/rmap: interface overhaul'. - Kinsey Ho has contributed some maintenance work on the MGLRU code in the series 'mm/mglru: Kconfig cleanup'. - Matthew Wilcox has contributed lruvec page accounting code cleanups in the series 'Remove some lruvec page accounting functions'" * tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits) mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER mm, treewide: introduce NR_PAGE_ORDERS selftests/mm: add separate UFFDIO_MOVE test for PMD splitting selftests/mm: skip test if application doesn't has root privileges selftests/mm: conform test to TAP format output selftests: mm: hugepage-mmap: conform to TAP format output selftests/mm: gup_test: conform test to TAP format output mm/selftests: hugepage-mremap: conform test to TAP format output mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large mm/memcontrol: remove __mod_lruvec_page_state() mm/khugepaged: use a folio more in collapse_file() slub: use a folio in __kmalloc_large_node slub: use folio APIs in free_large_kmalloc() slub: use alloc_pages_node() in alloc_slab_page() mm: remove inc/dec lruvec page state functions mm: ratelimit stat flush from workingset shrinker kasan: stop leaking stack trace handles mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE mm/mglru: add dummy pmd_dirty() ...
2024-01-03Documentation/core-api : fix typo in workqueueattreyee-muk1-1/+1
Correct to “boundaries” from “bounaries” Signed-off-by: Attreyee Mukherjee <tintinm2017@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20231223175316.24951-1-tintinm2017@gmail.com
2023-12-10maple_tree: update the documentation of maple treePeng Zhang1-0/+4
Introduce the new interface mtree_dup() in the documentation. Link: https://lkml.kernel.org/r/20231027033845.90608-7-zhangpeng.00@bytedance.com Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Mike Christie <michael.christie@oracle.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-05mm/slab, docs: switch mm-api docs generation from slab.c to slub.cVlastimil Babka1-1/+1
The SLAB implementation is going to be removed, and mm-api.rst currently uses mm/slab.c to obtain kerneldocs for some API functions. Switch it to mm/slub.c and move the relevant kerneldocs of exported functions from one to the other. The rest of kerneldocs in slab.c is for static SLAB implementation-specific functions that don't have counterparts in slub.c and thus can be simply removed with the implementation. Acked-by: David Rientjes <rientjes@google.com> Tested-by: David Rientjes <rientjes@google.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2023-11-29Documentation/gpu: VM_BIND locking documentThomas Hellström1-0/+2
Add the first version of the VM_BIND locking document which is intended to be part of the xe driver upstreaming agreement. The document describes and discuss the locking used during exec- functions, evicton and for userptr gpu-vmas. Intention is to be using the same nomenclature as the drm-vm-bind-async.rst. v2: - s/gvm/gpu_vm/g (Rodrigo Vivi) - Clarify the userptr seqlock with a pointer to mm/mmu_notifier.c (Rodrigo Vivi) - Adjust commit message accordingly. - Add SPDX license header. v3: - Large update to align with the drm_gpuvm manager locking - Add "Efficient userptr gpu_vma exec function iteration" section - Add "Locking at bind- and unbind time" section. v4: - Fix tabs vs space errors by untabifying (Rodrigo Vivi) - Minor style fixes and typos (Rodrigo Vivi) - Clarify situations where stale GPU mappings are occurring and how access through these mappings are blocked. (Rodrigo Vivi) - Insert into the toctree in implementation_guidelines.rst v5: - Add a section about recoverable page-faults. - Use local references to other documentation where possible (Bagas Sanjaya) - General documentation fixes and typos (Danilo Krummrich and Boris Brezillon) - Improve the documentation around locks that need to be grabbed from the dm-fence critical section (Boris Brezillon) - Add more references to the DRM GPUVM helpers (Danilo Krummrich and Boriz Brezillon) - Update the rfc/xe.rst document. v6: - Rework wording to improve readability (Boris Brezillon, Rodrigo Vivi, Bagas Sanjaya) - Various minor fixes across the document (Boris Brezillon) Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Danilo Krummrich <dakr@redhat.com> Acked-by: John Hubbard <jhubbard@nvidia.com> # Documentation/core-api/pin_user_pages.rst changes Link: https://patchwork.freedesktop.org/patch/msgid/20231129090637.2629-1-thomas.hellstrom@linux.intel.com
2023-11-17docs: dma-api: Fix description of the sync_sg APIBrian Johannesmeyer1-1/+1
Fix the description of the parameters to dma_sync_sg*. They should be the same as the parameters to dma_map_sg(), not dma_map_single(). Signed-off-by: Brian Johannesmeyer <bjohannesmeyer@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Message-ID: <20231103162120.3474026-1-bjohannesmeyer@gmail.com>
2023-11-17docs: dma: update a reference to a moved documentLi Zhijian1-1/+1
Documentation/DMA-API.txt has moved to Documentation/core-api/dma-api.rst Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Message-ID: <20231101070201.4066998-1-lizhijian@fujitsu.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-11-02Merge tag 'mm-stable-2023-11-01-14-33' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Many singleton patches against the MM code. The patch series which are included in this merge do the following: - Kemeng Shi has contributed some compation maintenance work in the series 'Fixes and cleanups to compaction' - Joel Fernandes has a patchset ('Optimize mremap during mutual alignment within PMD') which fixes an obscure issue with mremap()'s pagetable handling during a subsequent exec(), based upon an implementation which Linus suggested - More DAMON/DAMOS maintenance and feature work from SeongJae Park i the following patch series: mm/damon: misc fixups for documents, comments and its tracepoint mm/damon: add a tracepoint for damos apply target regions mm/damon: provide pseudo-moving sum based access rate mm/damon: implement DAMOS apply intervals mm/damon/core-test: Fix memory leaks in core-test mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval - In the series 'Do not try to access unaccepted memory' Adrian Hunter provides some fixups for the recently-added 'unaccepted memory' feature. To increase the feature's checking coverage. 'Plug a few gaps where RAM is exposed without checking if it is unaccepted memory' - In the series 'cleanups for lockless slab shrink' Qi Zheng has done some maintenance work which is preparation for the lockless slab shrinking code - Qi Zheng has redone the earlier (and reverted) attempt to make slab shrinking lockless in the series 'use refcount+RCU method to implement lockless slab shrink' - David Hildenbrand contributes some maintenance work for the rmap code in the series 'Anon rmap cleanups' - Kefeng Wang does more folio conversions and some maintenance work in the migration code. Series 'mm: migrate: more folio conversion and unification' - Matthew Wilcox has fixed an issue in the buffer_head code which was causing long stalls under some heavy memory/IO loads. Some cleanups were added on the way. Series 'Add and use bdev_getblk()' - In the series 'Use nth_page() in place of direct struct page manipulation' Zi Yan has fixed a potential issue with the direct manipulation of hugetlb page frames - In the series 'mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO' has improved our handling of gigantic pages in the hugetlb vmmemmep optimizaton code. This provides significant boot time improvements when significant amounts of gigantic pages are in use - Matthew Wilcox has sent the series 'Small hugetlb cleanups' - code rationalization and folio conversions in the hugetlb code - Yin Fengwei has improved mlock()'s handling of large folios in the series 'support large folio for mlock' - In the series 'Expose swapcache stat for memcg v1' Liu Shixin has added statistics for memcg v1 users which are available (and useful) under memcg v2 - Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable) prctl so that userspace may direct the kernel to not automatically propagate the denial to child processes. The series is named 'MDWE without inheritance' - Kefeng Wang has provided the series 'mm: convert numa balancing functions to use a folio' which does what it says - In the series 'mm/ksm: add fork-exec support for prctl' Stefan Roesch makes is possible for a process to propagate KSM treatment across exec() - Huang Ying has enhanced memory tiering's calculation of memory distances. This is used to permit the dax/kmem driver to use 'high bandwidth memory' in addition to Optane Data Center Persistent Memory Modules (DCPMM). The series is named 'memory tiering: calculate abstract distance based on ACPI HMAT' - In the series 'Smart scanning mode for KSM' Stefan Roesch has optimized KSM by teaching it to retain and use some historical information from previous scans - Yosry Ahmed has fixed some inconsistencies in memcg statistics in the series 'mm: memcg: fix tracking of pending stats updates values' - In the series 'Implement IOCTL to get and optionally clear info about PTEs' Peter Xu has added an ioctl to /proc/<pid>/pagemap which permits us to atomically read-then-clear page softdirty state. This is mainly used by CRIU - Hugh Dickins contributed the series 'shmem,tmpfs: general maintenance', a bunch of relatively minor maintenance tweaks to this code - Matthew Wilcox has increased the use of the VMA lock over file-backed page faults in the series 'Handle more faults under the VMA lock'. Some rationalizations of the fault path became possible as a result - In the series 'mm/rmap: convert page_move_anon_rmap() to folio_move_anon_rmap()' David Hildenbrand has implemented some cleanups and folio conversions - In the series 'various improvements to the GUP interface' Lorenzo Stoakes has simplified and improved the GUP interface with an eye to providing groundwork for future improvements - Andrey Konovalov has sent along the series 'kasan: assorted fixes and improvements' which does those things - Some page allocator maintenance work from Kemeng Shi in the series 'Two minor cleanups to break_down_buddy_pages' - In thes series 'New selftest for mm' Breno Leitao has developed another MM self test which tickles a race we had between madvise() and page faults - In the series 'Add folio_end_read' Matthew Wilcox provides cleanups and an optimization to the core pagecache code - Nhat Pham has added memcg accounting for hugetlb memory in the series 'hugetlb memcg accounting' - Cleanups and rationalizations to the pagemap code from Lorenzo Stoakes, in the series 'Abstract vma_merge() and split_vma()' - Audra Mitchell has fixed issues in the procfs page_owner code's new timestamping feature which was causing some misbehaviours. In the series 'Fix page_owner's use of free timestamps' - Lorenzo Stoakes has fixed the handling of new mappings of sealed files in the series 'permit write-sealed memfd read-only shared mappings' - Mike Kravetz has optimized the hugetlb vmemmap optimization in the series 'Batch hugetlb vmemmap modification operations' - Some buffer_head folio conversions and cleanups from Matthew Wilcox in the series 'Finish the create_empty_buffers() transition' - As a page allocator performance optimization Huang Ying has added automatic tuning to the allocator's per-cpu-pages feature, in the series 'mm: PCP high auto-tuning' - Roman Gushchin has contributed the patchset 'mm: improve performance of accounted kernel memory allocations' which improves their performance by ~30% as measured by a micro-benchmark - folio conversions from Kefeng Wang in the series 'mm: convert page cpupid functions to folios' - Some kmemleak fixups in Liu Shixin's series 'Some bugfix about kmemleak' - Qi Zheng has improved our handling of memoryless nodes by keeping them off the allocation fallback list. This is done in the series 'handle memoryless nodes more appropriately' - khugepaged conversions from Vishal Moola in the series 'Some khugepaged folio conversions'" [ bcachefs conflicts with the dynamically allocated shrinkers have been resolved as per Stephen Rothwell in https://lore.kernel.org/all/20230913093553.4290421e@canb.auug.org.au/ with help from Qi Zheng. The clone3 test filtering conflict was half-arsed by yours truly ] * tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (406 commits) mm/damon/sysfs: update monitoring target regions for online input commit mm/damon/sysfs: remove requested targets when online-commit inputs selftests: add a sanity check for zswap Documentation: maple_tree: fix word spelling error mm/vmalloc: fix the unchecked dereference warning in vread_iter() zswap: export compression failure stats Documentation: ubsan: drop "the" from article title mempolicy: migration attempt to match interleave nodes mempolicy: mmap_lock is not needed while migrating folios mempolicy: alloc_pages_mpol() for NUMA policy without vma mm: add page_rmappable_folio() wrapper mempolicy: remove confusing MPOL_MF_LAZY dead code mempolicy: mpol_shared_policy_init() without pseudo-vma mempolicy trivia: use pgoff_t in shared mempolicy tree mempolicy trivia: slightly more consistent naming mempolicy trivia: delete those ancient pr_debug()s mempolicy: fix migrate_pages(2) syscall return nr_failed kernfs: drop shared NUMA mempolicy hooks hugetlbfs: drop shared NUMA mempolicy pretence mm/damon/sysfs-test: add a unit test for damon_sysfs_set_targets() ...
2023-11-01Merge tag 'asm-generic-6.7' of ↵Linus Torvalds2-9/+3
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull ia64 removal and asm-generic updates from Arnd Bergmann: - The ia64 architecture gets its well-earned retirement as planned, now that there is one last (mostly) working release that will be maintained as an LTS kernel. - The architecture specific system call tables are updated for the added map_shadow_stack() syscall and to remove references to the long-gone sys_lookup_dcookie() syscall. * tag 'asm-generic-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: hexagon: Remove unusable symbols from the ptrace.h uapi asm-generic: Fix spelling of architecture arch: Reserve map_shadow_stack() syscall number for all architectures syscalls: Cleanup references to sys_lookup_dcookie() Documentation: Drop or replace remaining mentions of IA64 lib/raid6: Drop IA64 support Documentation: Drop IA64 from feature descriptions kernel: Drop IA64 support from sig_fault handlers arch: Remove Itanium (IA-64) architecture
2023-11-01Documentation: maple_tree: fix word spelling errorTom Yang1-1/+1
The "first" is spelled "fist". Link: https://lkml.kernel.org/r/20231023095737.21823-1-yangqixiao@inspur.com Signed-off-by: Tom Yang <yangqixiao@inspur.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-12workqueue: doc: Fix function and sysfs path errorsWangJinchao1-2/+2
alloc_ordered_queue -> alloc_ordered_workqueue /sys/devices/virtual/WQ_NAME/ -> /sys/devices/virtual/workqueue/WQ_NAME/ Signed-off-by: WangJinchao <wangjinchao@xfusion.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2023-09-11Documentation: Drop or replace remaining mentions of IA64Ard Biesheuvel1-3/+3
Drop or update mentions of IA64, as appropriate. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-09-11arch: Remove Itanium (IA-64) architectureArd Biesheuvel1-6/+0
The Itanium architecture is obsolete, and an informal survey [0] reveals that any residual use of Itanium hardware in production is mostly HP-UX or OpenVMS based. The use of Linux on Itanium appears to be limited to enthusiasts that occasionally boot a fresh Linux kernel to see whether things are still working as intended, and perhaps to churn out some distro packages that are rarely used in practice. None of the original companies behind Itanium still produce or support any hardware or software for the architecture, and it is listed as 'Orphaned' in the MAINTAINERS file, as apparently, none of the engineers that contributed on behalf of those companies (nor anyone else, for that matter) have been willing to support or maintain the architecture upstream or even be responsible for applying the odd fix. The Intel firmware team removed all IA-64 support from the Tianocore/EDK2 reference implementation of EFI in 2018. (Itanium is the original architecture for which EFI was developed, and the way Linux supports it deviates significantly from other architectures.) Some distros, such as Debian and Gentoo, still maintain [unofficial] ia64 ports, but many have dropped support years ago. While the argument is being made [1] that there is a 'for the common good' angle to being able to build and run existing projects such as the Grid Community Toolkit [2] on Itanium for interoperability testing, the fact remains that none of those projects are known to be deployed on Linux/ia64, and very few people actually have access to such a system in the first place. Even if there were ways imaginable in which Linux/ia64 could be put to good use today, what matters is whether anyone is actually doing that, and this does not appear to be the case. There are no emulators widely available, and so boot testing Itanium is generally infeasible for ordinary contributors. GCC still supports IA-64 but its compile farm [3] no longer has any IA-64 machines. GLIBC would like to get rid of IA-64 [4] too because it would permit some overdue code cleanups. In summary, the benefits to the ecosystem of having IA-64 be part of it are mostly theoretical, whereas the maintenance overhead of keeping it supported is real. So let's rip off the band aid, and remove the IA-64 arch code entirely. This follows the timeline proposed by the Debian/ia64 maintainer [5], which removes support in a controlled manner, leaving IA-64 in a known good state in the most recent LTS release. Other projects will follow once the kernel support is removed. [0] https://lore.kernel.org/all/CAMj1kXFCMh_578jniKpUtx_j8ByHnt=s7S+yQ+vGbKt9ud7+kQ@mail.gmail.com/ [1] https://lore.kernel.org/all/0075883c-7c51-00f5-2c2d-5119c1820410@web.de/ [2] https://gridcf.org/gct-docs/latest/index.html [3] https://cfarm.tetaneutral.net/machines/list/ [4] https://lore.kernel.org/all/87bkiilpc4.fsf@mid.deneb.enyo.de/ [5] https://lore.kernel.org/all/ff58a3e76e5102c94bb5946d99187b358def688a.camel@physik.fu-berlin.de/ Acked-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-09-04Merge tag 'printk-for-6.6' of ↵Linus Torvalds1-4/+5
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Do not try to get the console lock when it is not need or useful in panic() - Replace the global console_suspended state by a per-console flag - Export symbols needed for dumping the raw printk buffer in panic() - Fix documentation of printf formats for integer types - Moved Sergey Senozhatsky to the reviewer role - Misc cleanups * tag 'printk-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: export symbols for debug modules lib: test_scanf: Add explicit type cast to result initialization in test_number_prefix() printk: ringbuffer: Fix truncating buffer size min_t cast printk: Rename abandon_console_lock_in_panic() to other_cpu_in_panic() printk: Add per-console suspended state printk: Consolidate console deferred printing printk: Do not take console lock for console_flush_on_panic() printk: Keep non-panic-CPUs out of console lock printk: Reduce console_unblank() usage in unsafe scenarios kdb: Do not assume write() callback available docs: printk-formats: Treat char as always unsigned docs: printk-formats: Fix hex printing of signed values MAINTAINERS: adjust printk/vsprintf entries
2023-09-01Merge tag 'wq-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds1-19/+337
Pull workqueue updates from Tejun Heo: - Unbound workqueues now support more flexible affinity scopes. The default behavior is to soft-affine according to last level cache boundaries. A work item queued from a given LLC is executed by a worker running on the same LLC but the worker may be moved across cache boundaries as the scheduler sees fit. On machines which multiple L3 caches, which are becoming more popular along with chiplet designs, this improves cache locality while not harming work conservation too much. Unbound workqueues are now also a lot more flexible in terms of execution affinity. Differeing levels of affinity scopes are supported and both the default and per-workqueue affinity settings can be modified dynamically. This should help working around amny of sub-optimal behaviors observed recently with asymmetric ARM CPUs. This involved signficant restructuring of workqueue code. Nothing was reported yet but there's some risk of subtle regressions. Should keep an eye out. - Rescuer workers now has more identifiable comms. - workqueue.unbound_cpus added so that CPUs which can be used by workqueue can be constrained early during boot. - Now that all the in-tree users have been flushed out, trigger warning if system-wide workqueues are flushed. * tag 'wq-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (31 commits) workqueue: fix data race with the pwq->stats[] increment workqueue: Rename rescuer kworker workqueue: Make default affinity_scope dynamically updatable workqueue: Add "Affinity Scopes and Performance" section to documentation workqueue: Implement non-strict affinity scope for unbound workqueues workqueue: Add workqueue_attrs->__pod_cpumask workqueue: Factor out need_more_worker() check and worker wake-up workqueue: Factor out work to worker assignment and collision handling workqueue: Add multiple affinity scopes and interface to select them workqueue: Modularize wq_pod_type initialization workqueue: Add tools/workqueue/wq_dump.py which prints out workqueue configuration workqueue: Generalize unbound CPU pods workqueue: Factor out clearing of workqueue-only attrs fields workqueue: Factor out actual cpumask calculation to reduce subtlety in wq_update_pod() workqueue: Initialize unbound CPU pods later in the boot workqueue: Move wq_pod_init() below workqueue_init() workqueue: Rename NUMA related names to use pod instead workqueue: Rename workqueue_attrs->no_numa to ->ordered workqueue: Make unbound workqueues to use per-cpu pool_workqueues workqueue: Call wq_update_unbound_numa() on all CPUs in NUMA node on CPU hotplug ...
2023-08-30Merge tag 'docs-6.6' of git://git.lwn.net/linuxLinus Torvalds1-1/+1
Pull documentation updates from Jonathan Corbet: "Documentation work keeps chugging along; this includes: - Work from Carlos Bilbao to integrate rustdoc output into the generated HTML documentation. This took some work to figure out how to do it without slowing the docs build and without creating people who don't have Rust installed, but Carlos got there - Move the loongarch and mips architecture documentation under Documentation/arch/ - Some more maintainer documentation from Jakub ... plus the usual assortment of updates, translations, and fixes" * tag 'docs-6.6' of git://git.lwn.net/linux: (56 commits) Docu: genericirq.rst: fix irq-example input: docs: pxrc: remove reference to phoenix-sim Documentation: serial-console: Fix literal block marker docs/mm: remove references to hmm_mirror ops and clean typos docs/zh_CN: correct regi_chg(),regi_add() to region_chg(),region_add() Documentation: Fix typos Documentation/ABI: Fix typos scripts: kernel-doc: fix macro handling in enums scripts: kernel-doc: parse DEFINE_DMA_UNMAP_[ADDR|LEN] Documentation: riscv: Update boot image header since EFI stub is supported Documentation: riscv: Add early boot document Documentation: arm: Add bootargs to the table of added DT parameters docs: kernel-parameters: Refer to the correct bitmap function doc: update params of memhp_default_state= docs: Add book to process/kernel-docs.rst docs: sparse: fix invalid link addresses docs: vfs: clean up after the iterate() removal docs: Add a section on surveys to the researcher guidelines docs: move mips under arch docs: move loongarch under arch ...
2023-08-30Merge tag 'sound-6.6-rc1' of ↵Linus Torvalds1-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound updates from Takashi Iwai: "We've received a fairly wide range of changes at this time, including for ALSA and ASoC core, but all of them are rather small changes. Here are some highlights: ALSA / ASoC Core: - Fixes of inconsistent locking around control API helpers - A few new control API functions and cleanups - Workarounds for potential UAFs by delayed kobj releases - Unified PCM copy ops with iov_iter - Continued efforts for ASoC API cleanups ASoC: - An adaptor to allow use of IIO DACs and ADCs in ASoC which pulls in some IIO changes - Create a library function for intlog10() and use it in the NAU8825 driver - Convert drivers to use the more modern maple tree register cache - Lots of work on the SOF framework, AMD and Intel drivers, including a lot of cleanup and new device support - Standardization of the presentation of jacks from drivers - Provision of some generic sound card DT properties - Support for AMD Van Gogh, AMD machines with MAX98388 and NAU8821, AWInic AW88261, Cirrus Logic CS35L36 and CS42L43, various Intel platforms including AVS machines with ES8336 and RT5663, Mediatek MT7986, NXP i.MX93, RealTek RT1017 and StarFive JH7110 Others: - New test coverage including ASoC and topology tests in KUnit; this also involves enabling UML builds of ALSA since that's the default KUnit test environment which pulls in the addition of some stubs to the driver - More enhancement of pcmtest driver - A few fixes / enhancements of MIDI 2.0 UMP core - Using PCI definitions in allover HD-audio code - Support for Cirrus CS35L56 and TI TAS2781 HD-audio sub-codecs - CS35L41 HD-audio sub-codec improvements - Continued emu10k1 improvements" * tag 'sound-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (693 commits) ALSA: pcm: Fix missing fixup call in compat hw_refine ioctl ASoC: dwc: i2s: Fix unused functions ALSA: usb-audio: Don't try to submit URBs after disconnection ALSA: emu10k1: add separate documentation for E-MU cards ALSA: emu10k1: more documentation updates ALSA: emu10k1: de-duplicate audigy-mixer.rst vs. sb-live-mixer.rst ALSA: ump: Fix -Wformat-truncation warnings ALSA: hda: Add missing dependency on CONFIG_EFI for Cirrus/TI sub-codecs ALSA: doc: Fix missing backquote in midi-2.0.rst ALSA: hda/realtek: Add quirk for mute LEDs on HP ENVY x360 15-eu0xxx ALSA: hda/tas2781: Switch back to use struct i2c_driver's .probe() ASoC: soc-core.c: Do not error if a DAI link component is not found ASoC: codecs: Fix error code in aw88261_i2c_probe() ASoC: audio-graph-card.c: move audio_graph_parse_of() ASoC: cs42l43: Use new-style PM runtime macros ALSA: documentation: Add description for USB MIDI 2.0 gadget driver ALSA: ump: Don't create unused substreams for static blocks ALSA: ump: Fill group names for legacy rawmidi substreams ALSA: usb-audio: Attach legacy rawmidi after probing all UMP EPs ALSA: ac97: Fix possible error value of *rac97 ...
2023-08-29Merge tag 'mm-nonmm-stable-2023-08-28-22-48' of ↵Linus Torvalds1-0/+18
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - An extensive rework of kexec and crash Kconfig from Eric DeVolder ("refactor Kconfig to consolidate KEXEC and CRASH options") - kernel.h slimming work from Andy Shevchenko ("kernel.h: Split out a couple of macros to args.h") - gdb feature work from Kuan-Ying Lee ("Add GDB memory helper commands") - vsprintf inclusion rationalization from Andy Shevchenko ("lib/vsprintf: Rework header inclusions") - Switch the handling of kdump from a udev scheme to in-kernel handling, by Eric DeVolder ("crash: Kernel handling of CPU and memory hot un/plug") - Many singleton patches to various parts of the tree * tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (81 commits) document while_each_thread(), change first_tid() to use for_each_thread() drivers/char/mem.c: shrink character device's devlist[] array x86/crash: optimize CPU changes crash: change crash_prepare_elf64_headers() to for_each_possible_cpu() crash: hotplug support for kexec_load() x86/crash: add x86 crash hotplug support crash: memory and CPU hotplug sysfs attributes kexec: exclude elfcorehdr from the segment digest crash: add generic infrastructure for crash hotplug support crash: move a few code bits to setup support of crash hotplug kstrtox: consistently use _tolower() kill do_each_thread() nilfs2: fix WARNING in mark_buffer_dirty due to discarded buffer reuse scripts/bloat-o-meter: count weak symbol sizes treewide: drop CONFIG_EMBEDDED lockdep: fix static memory detection even more lib/vsprintf: declare no_hash_pointers in sprintf.h lib/vsprintf: split out sprintf() and friends kernel/fork: stop playing lockless games for exe_file replacement adfs: delete unused "union adfs_dirtail" definition ...
2023-08-29Merge tag 'mm-stable-2023-08-28-18-26' of ↵Linus Torvalds2-28/+52
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") - Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which reduces the special-case code for handling hugetlb pages in GUP. It also speeds up GUP handling of transparent hugepages. - Peng Zhang provides some maple tree speedups ("Optimize the fast path of mas_store()"). - Sergey Senozhatsky has improved te performance of zsmalloc during compaction (zsmalloc: small compaction improvements"). - Domenico Cerasuolo has developed additional selftest code for zswap ("selftests: cgroup: add zswap test program"). - xu xin has doe some work on KSM's handling of zero pages. These changes are mainly to enable the user to better understand the effectiveness of KSM's treatment of zero pages ("ksm: support tracking KSM-placed zero-pages"). - Jeff Xu has fixes the behaviour of memfd's MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED"). - David Howells has fixed an fscache optimization ("mm, netfs, fscache: Stop read optimisation when folio removed from pagecache"). - Axel Rasmussen has given userfaultfd the ability to simulate memory poisoning ("add UFFDIO_POISON to simulate memory poisoning with UFFD"). - Miaohe Lin has contributed some routine maintenance work on the memory-failure code ("mm: memory-failure: remove unneeded PageHuge() check"). - Peng Zhang has contributed some maintenance work on the maple tree code ("Improve the validation for maple tree and some cleanup"). - Hugh Dickins has optimized the collapsing of shmem or file pages into THPs ("mm: free retracted page table by RCU"). - Jiaqi Yan has a patch series which permits us to use the healthy subpages within a hardware poisoned huge page for general purposes ("Improve hugetlbfs read on HWPOISON hugepages"). - Kemeng Shi has done some maintenance work on the pagetable-check code ("Remove unused parameters in page_table_check"). - More folioification work from Matthew Wilcox ("More filesystem folio conversions for 6.6"), ("Followup folio conversions for zswap"). And from ZhangPeng ("Convert several functions in page_io.c to use a folio"). - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext"). - Baoquan He has converted some architectures to use the GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert architectures to take GENERIC_IOREMAP way"). - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support batched/deferred tlb shootdown during page reclamation/migration"). - Better maple tree lockdep checking from Liam Howlett ("More strict maple tree lockdep"). Liam also developed some efficiency improvements ("Reduce preallocations for maple tree"). - Cleanup and optimization to the secondary IOMMU TLB invalidation, from Alistair Popple ("Invalidate secondary IOMMU TLB on permission upgrade"). - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes for arm64"). - Kemeng Shi provides some maintenance work on the compaction code ("Two minor cleanups for compaction"). - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle most file-backed faults under the VMA lock"). - Aneesh Kumar contributes code to use the vmemmap optimization for DAX on ppc64, under some circumstances ("Add support for DAX vmemmap optimization for ppc64"). - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client data in page_ext"), ("minor cleanups to page_ext header"). - Some zswap cleanups from Johannes Weiner ("mm: zswap: three cleanups"). - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan"). - VMA handling cleanups from Kefeng Wang ("mm: convert to vma_is_initial_heap/stack()"). - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes: implement DAMOS tried total bytes file"), ("Extend DAMOS filters for address ranges and DAMON monitoring targets"). - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction"). - Liam Howlett has improved the maple tree node replacement code ("maple_tree: Change replacement strategy"). - ZhangPeng has a general code cleanup - use the K() macro more widely ("cleanup with helper macro K()"). - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for memmap on memory feature on ppc64"). - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list in page_alloc"), ("Two minor cleanups for get pageblock migratetype"). - Vishal Moola introduces a memory descriptor for page table tracking, "struct ptdesc" ("Split ptdesc from struct page"). - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups for vm.memfd_noexec"). - MM include file rationalization from Hugh Dickins ("arch: include asm/cacheflush.h in asm/hugetlb.h"). - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text output"). - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use object_cache instead of kmemleak_initialized"). - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor and _folio_order"). - A VMA locking scalability improvement from Suren Baghdasaryan ("Per-VMA lock support for swap and userfaults"). - pagetable handling cleanups from Matthew Wilcox ("New page table range API"). - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop using page->private on tail pages for THP_SWAP + cleanups"). - Cleanups and speedups to the hugetlb fault handling from Matthew Wilcox ("Change calling convention for ->huge_fault"). - Matthew Wilcox has also done some maintenance work on the MM subsystem documentation ("Improve mm documentation"). * tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits) maple_tree: shrink struct maple_tree maple_tree: clean up mas_wr_append() secretmem: convert page_is_secretmem() to folio_is_secretmem() nios2: fix flush_dcache_page() for usage from irq context hugetlb: add documentation for vma_kernel_pagesize() mm: add orphaned kernel-doc to the rst files. mm: fix clean_record_shared_mapping_range kernel-doc mm: fix get_mctgt_type() kernel-doc mm: fix kernel-doc warning from tlb_flush_rmaps() mm: remove enum page_entry_size mm: allow ->huge_fault() to be called without the mmap_lock held mm: move PMD_ORDER to pgtable.h mm: remove checks for pte_index memcg: remove duplication detection for mem_cgroup_uncharge_swap mm/huge_memory: work on folio->swap instead of page->private when splitting folio mm/swap: inline folio_set_swap_entry() and folio_swap_entry() mm/swap: use dedicated entry for swap in folio mm/swap: stop using page->private on tail pages for THP_SWAP selftests/mm: fix WARNING comparing pointer to 0 selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check ...
2023-08-29Merge tag 'net-next-6.6' of ↵Linus Torvalds1-4/+5
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "Core: - Increase size limits for to-be-sent skb frag allocations. This allows tun, tap devices and packet sockets to better cope with large writes operations - Store netdevs in an xarray, to simplify iterating over netdevs - Refactor nexthop selection for multipath routes - Improve sched class lifetime handling - Add backup nexthop ID support for bridge - Implement drop reasons support in openvswitch - Several data races annotations and fixes - Constify the sk parameter of routing functions - Prepend kernel version to netconsole message Protocols: - Implement support for TCP probing the peer being under memory pressure - Remove hard coded limitation on IPv6 specific info placement inside the socket struct - Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated per socket scaling factor - Scaling-up the IPv6 expired route GC via a separated list of expiring routes - In-kernel support for the TLS alert protocol - Better support for UDP reuseport with connected sockets - Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR header size - Get rid of additional ancillary per MPTCP connection struct socket - Implement support for BPF-based MPTCP packet schedulers - Format MPTCP subtests selftests results in TAP - Several new SMC 2.1 features including unique experimental options, max connections per lgr negotiation, max links per lgr negotiation BPF: - Multi-buffer support in AF_XDP - Add multi uprobe BPF links for attaching multiple uprobes and usdt probes, which is significantly faster and saves extra fds - Implement an fd-based tc BPF attach API (TCX) and BPF link support on top of it - Add SO_REUSEPORT support for TC bpf_sk_assign - Support new instructions from cpu v4 to simplify the generated code and feature completeness, for x86, arm64, riscv64 - Support defragmenting IPv(4|6) packets in BPF - Teach verifier actual bounds of bpf_get_smp_processor_id() and fix perf+libbpf issue related to custom section handling - Introduce bpf map element count and enable it for all program types - Add a BPF hook in sys_socket() to change the protocol ID from IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy - Introduce bpf_me_mcache_free_rcu() and fix OOM under stress - Add uprobe support for the bpf_get_func_ip helper - Check skb ownership against full socket - Support for up to 12 arguments in BPF trampoline - Extend link_info for kprobe_multi and perf_event links Netfilter: - Speed-up process exit by aborting ruleset validation if a fatal signal is pending - Allow NLA_POLICY_MASK to be used with BE16/BE32 types Driver API: - Page pool optimizations, to improve data locality and cache usage - Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the need for raw ioctl() handling in drivers - Simplify genetlink dump operations (doit/dumpit) providing them the common information already populated in struct genl_info - Extend and use the yaml devlink specs to [re]generate the split ops - Introduce devlink selective dumps, to allow SF filtering SF based on handle and other attributes - Add yaml netlink spec for netlink-raw families, allow route, link and address related queries via the ynl tool - Remove phylink legacy mode support - Support offload LED blinking to phy - Add devlink port function attributes for IPsec New hardware / drivers: - Ethernet: - Broadcom ASP 2.0 (72165) ethernet controller - MediaTek MT7988 SoC - Texas Instruments AM654 SoC - Texas Instruments IEP driver - Atheros qca8081 phy - Marvell 88Q2110 phy - NXP TJA1120 phy - WiFi: - MediaTek mt7981 support - Can: - Kvaser SmartFusion2 PCI Express devices - Allwinner T113 controllers - Texas Instruments tcan4552/4553 chips - Bluetooth: - Intel Gale Peak - Qualcomm WCN3988 and WCN7850 - NXP AW693 and IW624 - Mediatek MT2925 Drivers: - Ethernet NICs: - nVidia/Mellanox: - mlx5: - support UDP encapsulation in packet offload mode - IPsec packet offload support in eswitch mode - improve aRFS observability by adding new set of counters - extends MACsec offload support to cover RoCE traffic - dynamic completion EQs - mlx4: - convert to use auxiliary bus instead of custom interface logic - Intel - ice: - implement switchdev bridge offload, even for LAG interfaces - implement SRIOV support for LAG interfaces - igc: - add support for multiple in-flight TX timestamps - Broadcom: - bnxt: - use the unified RX page pool buffers for XDP and non-XDP - use the NAPI skb allocation cache - OcteonTX2: - support Round Robin scheduling HTB offload - TC flower offload support for SPI field - Freescale: - add XDP_TX feature support - AMD: - ionic: add support for PCI FLR event - sfc: - basic conntrack offload - introduce eth, ipv4 and ipv6 pedit offloads - ST Microelectronics: - stmmac: maximze PTP timestamping resolution - Virtual NICs: - Microsoft vNIC: - batch ringing RX queue doorbell on receiving packets - add page pool for RX buffers - Virtio vNIC: - add per queue interrupt coalescing support - Google vNIC: - add queue-page-list mode support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add port range matching tc-flower offload - permit enslavement to netdevices with uppers - Ethernet embedded switches: - Marvell (mv88e6xxx): - convert to phylink_pcs - Renesas: - r8A779fx: add speed change support - rzn1: enables vlan support - Ethernet PHYs: - convert mv88e6xxx to phylink_pcs - WiFi: - Qualcomm Wi-Fi 7 (ath12k): - extremely High Throughput (EHT) PHY support - RealTek (rtl8xxxu): - enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU), RTL8192EU and RTL8723BU - RealTek (rtw89): - Introduce Time Averaged SAR (TAS) support - Connector: - support for event filtering" * tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1806 commits) net: ethernet: mtk_wed: minor change in wed_{tx,rx}info_show net: ethernet: mtk_wed: add some more info in wed_txinfo_show handler net: stmmac: clarify difference between "interface" and "phy_interface" r8152: add vendor/device ID pair for D-Link DUB-E250 devlink: move devlink_notify_register/unregister() to dev.c devlink: move small_ops definition into netlink.c devlink: move tracepoint definitions into core.c devlink: push linecard related code into separate file devlink: push rate related code into separate file devlink: push trap related code into separate file devlink: use tracepoint_enabled() helper devlink: push region related code into separate file devlink: push param related code into separate file devlink: push resource related code into separate file devlink: push dpipe related code into separate file devlink: move and rename devlink_dpipe_send_and_alloc_skb() helper devlink: push shared buffer related code into separate file devlink: push port related code into separate file devlink: push object register/unregister notifications into separate helpers inet: fix IP_TRANSPARENT error handling ...
2023-08-28Docu: genericirq.rst: fix irq-examplePhilipp Stanner1-1/+1
A code example was missing the pointer to dereference a variable. Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20230824110109.18844-1-pstanner@redhat.com
2023-08-27doc/netlink: Update genetlink-legacy documentationDonald Hunter1-4/+5
Add documentation for recently added genetlink-legacy schema attributes. Remove statements about 'work in progress' and 'todo'. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20230825122756.7603-4-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-24crash: memory and CPU hotplug sysfs attributesEric DeVolder1-0/+18
Introduce the crash_hotplug attribute for memory and CPUs for use by userspace. These attributes directly facilitate the udev rule for managing userspace re-loading of the crash kernel upon hot un/plug changes. For memory, expose the crash_hotplug attribute to the /sys/devices/system/memory directory. For example: # udevadm info --attribute-walk /sys/devices/system/memory/memory81 looking at device '/devices/system/memory/memory81': KERNEL=="memory81" SUBSYSTEM=="memory" DRIVER=="" ATTR{online}=="1" ATTR{phys_device}=="0" ATTR{phys_index}=="00000051" ATTR{removable}=="1" ATTR{state}=="online" ATTR{valid_zones}=="Movable" looking at parent device '/devices/system/memory': KERNELS=="memory" SUBSYSTEMS=="" DRIVERS=="" ATTRS{auto_online_blocks}=="offline" ATTRS{block_size_bytes}=="8000000" ATTRS{crash_hotplug}=="1" For CPUs, expose the crash_hotplug attribute to the /sys/devices/system/cpu directory. For example: # udevadm info --attribute-walk /sys/devices/system/cpu/cpu0 looking at device '/devices/system/cpu/cpu0': KERNEL=="cpu0" SUBSYSTEM=="cpu" DRIVER=="processor" ATTR{crash_notes}=="277c38600" ATTR{crash_notes_size}=="368" ATTR{online}=="1" looking at parent device '/devices/system/cpu': KERNELS=="cpu" SUBSYSTEMS=="" DRIVERS=="" ATTRS{crash_hotplug}=="1" ATTRS{isolated}=="" ATTRS{kernel_max}=="8191" ATTRS{nohz_full}==" (null)" ATTRS{offline}=="4-7" ATTRS{online}=="0-3" ATTRS{possible}=="0-7" ATTRS{present}=="0-3" With these sysfs attributes in place, it is possible to efficiently instruct the udev rule to skip crash kernel reloading for kernels configured with crash hotplug support. For example, the following is the proposed udev rule change for RHEL system 98-kexec.rules (as the first lines of the rule file): # The kernel updates the crash elfcorehdr for CPU and memory changes SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" When examined in the context of 98-kexec.rules, the above rules test if crash_hotplug is set, and if so, the userspace initiated unload-then-reload of the crash kernel is skipped. CPU and memory checks are separated in accordance with CONFIG_HOTPLUG_CPU and CONFIG_MEMORY_HOTPLUG kernel config options. If an architecture supports, for example, memory hotplug but not CPU hotplug, then the /sys/devices/system/memory/crash_hotplug attribute file is present, but the /sys/devices/system/cpu/crash_hotplug attribute file will NOT be present. Thus the udev rule skips userspace processing of memory hot un/plug events, but the udev rule will evaluate false for CPU events, thus allowing userspace to process CPU hot un/plug events (ie the unload-then-reload of the kdump capture kernel). Link: https://lkml.kernel.org/r/20230814214446.6659-5-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Akhil Raj <lf32.dev@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24mm: add orphaned kernel-doc to the rst files.Matthew Wilcox (Oracle)1-0/+25
There are many files in mm/ that contain kernel-doc which is not currently published on kernel.org. Some of it is easily categorisable, but most of it is going into the miscellaneous documentation section to be organised later. Some files aren't ready to be included; they contain documentation with build errors. Or they're nommu.c which duplicates documentation from "real" MMU systems. Those files are noted with a # mark (although really anything which isn't a recognised directive would do to prevent inclusion) Link: https://lkml.kernel.org/r/20230818200630.2719595-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24mm: remove ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIOMatthew Wilcox (Oracle)1-15/+9
Current best practice is to reuse the name of the function as a define to indicate that the function is implemented by the architecture. Link: https://lkml.kernel.org/r/20230802151406.3735276-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24mm: add generic flush_icache_pages() and documentationMatthew Wilcox (Oracle)1-17/+22
flush_icache_page() is deprecated but not yet removed, so add a range version of it. Change the documentation to refer to update_mmu_cache_range() instead of update_mmu_cache(). Link: https://lkml.kernel.org/r/20230802151406.3735276-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-08Documentation: core-api/cpuhotplug: Fix state namesAnna-Maria Behnsen1-5/+5
Dynamic allocated hotplug states in documentation and the comment above cpuhp_state enum do not match the code. To not get confused by wrong documentation, change to proper state names. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230515162038.62703-1-anna-maria@linutronix.de
2023-08-07workqueue: Make default affinity_scope dynamically updatableTejun Heo1-1/+8
While workqueue.default_affinity_scope is writable, it only affects workqueues which are created afterwards and isn't very useful. Instead, let's introduce explicit "default" scope and update the effective scope dynamically when workqueue.default_affinity_scope is changed. Signed-off-by: Tejun Heo <tj@kernel.org>
2023-08-07workqueue: Add "Affinity Scopes and Performance" section to documentationTejun Heo1-5/+179
With affinity scopes and their strictness setting added, unbound workqueues should now be able to cover wide variety of configurations and use cases. Unfortunately, the performance picture is not entirely straight-forward due to a trade-off between efficiency and work-conservation in some situations necessitating manual configuration. This patch adds "Affinity Scopes and Performance" section to Documentation/core-api/workqueue.rst which illustrates the trade-off with a set of experiments and provides some guidelines. Signed-off-by: Tejun Heo <tj@kernel.org>
2023-08-07workqueue: Implement non-strict affinity scope for unbound workqueuesTejun Heo1-7/+23
An unbound workqueue can be served by multiple worker_pools to improve locality. The segmentation is achieved by grouping CPUs into pods. By default, the cache boundaries according to cpus_share_cache() define the CPUs are grouped. Let's a workqueue is allowed to run on all CPUs and the system has two L3 caches. The workqueue would be mapped to two worker_pools each serving one L3 cache domains. While this improves locality, because the pod boundaries are strict, it limits the total bandwidth a given issuer can consume. For example, let's say there is a thread pinned to a CPU issuing enough work items to saturate the whole machine. With the machine segmented into two pods, no matter how many work items it issues, it can only use half of the CPUs on the system. While this limitation has existed for a very long time, it wasn't very pronounced because the affinity grouping used to be always by NUMA nodes. With cache boundaries as the default and support for even finer grained scopes (smt and cpu), it is now an a lot more pressing problem. This patch implements non-strict affinity scope where the pod boundaries aren't enforced strictly. Going back to the previous example, the workqueue would still be mapped to two worker_pools; however, the affinity enforcement would be soft. The workers in both pools would have their cpus_allowed set to the whole machine thus allowing the scheduler to migrate them anywhere on the machine. However, whenever an idle worker is woken up, the workqueue code asks the scheduler to bring back the task within the pod if the worker is outside. ie. work items start executing within its affinity scope but can be migrated outside as the scheduler sees fit. This removes the hard cap on utilization while maintaining the benefits of affinity scopes. After the earlier ->__pod_cpumask changes, the implementation is pretty simple. When non-strict which is the new default: * pool_allowed_cpus() returns @pool->attrs->cpumask instead of ->__pod_cpumask so that the workers are allowed to run on any CPU that the associated workqueues allow. * If the idle worker task's ->wake_cpu is outside the pod, kick_pool() sets the field to a CPU within the pod. This would be the first use of task_struct->wake_cpu outside scheduler proper, so it isn't clear whether this would be acceptable. However, other methods of migrating tasks are significantly more expensive and are likely prohibitively so if we want to do this on every work item. This needs discussion with scheduler folks. There is also a race window where setting ->wake_cpu wouldn't be effective as the target task is still on CPU. However, the window is pretty small and this being a best-effort optimization, it doesn't seem to warrant more complexity at the moment. While the non-strict cache affinity scopes seem to be the best option, the performance picture interacts with the affinity scope and is a bit complicated to fully discuss in this patch, so the behavior is made easily selectable through wqattrs and sysfs and the next patch will add documentation to discuss performance implications. v2: pool->attrs->affn_strict is set to true for per-cpu worker_pools. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2023-08-07workqueue: Add multiple affinity scopes and interface to select themTejun Heo1-0/+63
Add three more affinity scopes - WQ_AFFN_CPU, SMT and CACHE - and make CACHE the default. The code changes to actually add the additional scopes are trivial. Also add module parameter "workqueue.default_affinity_scope" to override the default scope and "affinity_scope" sysfs file to configure it per workqueue. wq_dump.py and documentations are updated accordingly. This enables significant flexibility in configuring how unbound workqueues behave. If affinity scope is set to "cpu", it'll behave close to a per-cpu workqueue. On the other hand, "system" removes all locality boundaries. Many modern machines have multiple L3 caches often while being mostly uniform in terms of memory access. Thus, workqueue's previous behavior of spreading work items in each NUMA node had negative performance implications from unncessarily crossing L3 boundaries between issue and execution. However, picking a finer grained affinity scope also has a downside in that an issuer in one group can't utilize CPUs in other groups. While dependent on the specifics of workload, there's usually a noticeable penalty in crossing L3 boundaries, so let's default to CACHE. This issue will be further addressed and documented with examples in future patches. Signed-off-by: Tejun Heo <tj@kernel.org>
2023-08-07workqueue: Add tools/workqueue/wq_dump.py which prints out workqueue ↵Tejun Heo1-0/+59
configuration Lack of visibility has always been a pain point for workqueues. While the recently added wq_monitor.py improved the situation, it's still difficult to understand what worker pools are active in the system, how workqueues map to them and why. The lack of visibility into how workqueues are configured is going to become more noticeable as workqueue improves locality awareness and provides more mechanisms to customize locality related behaviors. Now that the basic framework for more flexible locality support is in place, this is a good time to improve the situation. This patch adds tools/workqueues/wq_dump.py which prints out the topology configuration, worker pools and how workqueues are mapped to pools. Read the command's help message for more details. Signed-off-by: Tejun Heo <tj@kernel.org>
2023-08-07workqueue: Make unbound workqueues to use per-cpu pool_workqueuesTejun Heo1-11/+10
A pwq (pool_workqueue) represents an association between a workqueue and a worker_pool. When a work item is queued, the workqueue selects the pwq to use, which in turn determines the pool, and queues the work item to the pool through the pwq. pwq is also what implements the maximum concurrency limit - @max_active. As a per-cpu workqueue should be assocaited with a different worker_pool on each CPU, it always had per-cpu pwq's that are accessed through wq->cpu_pwq. However, unbound workqueues were sharing a pwq within each NUMA node by default. The sharing has several downsides: * Because @max_active is per-pwq, the meaning of @max_active changes depending on the machine configuration and whether workqueue NUMA locality support is enabled. * Makes per-cpu and unbound code deviate. * Gets in the way of making workqueue CPU locality awareness more flexible. This patch makes unbound workqueues use per-cpu pwq's the same way per-cpu workqueues do by making the following changes: * wq->numa_pwq_tbl[] is removed and unbound workqueues now use wq->cpu_pwq just like per-cpu workqueues. wq->cpu_pwq is now RCU protected for unbound workqueues. * numa_pwq_tbl_install() is renamed to install_unbound_pwq() and installs the specified pwq to the target CPU's wq->cpu_pwq. * apply_wqattrs_prepare() now always allocates a separate pwq for each CPU unless the workqueue is ordered. If ordered, all CPUs use wq->dfl_pwq. This makes the return value of wq_calc_node_cpumask() unnecessary. It now returns void. * @max_active now means the same thing for both per-cpu and unbound workqueues. WQ_UNBOUND_MAX_ACTIVE now equals WQ_MAX_ACTIVE and documentation is updated accordingly. WQ_UNBOUND_MAX_ACTIVE is no longer used in workqueue implementation and will be removed later. * All unbound pwq operations which used to be per-numa-node are now per-cpu. For most unbound workqueue users, this shouldn't cause noticeable changes. Work item issue and completion will be a small bit faster, flush_workqueue() would become a bit more expensive, and the total concurrency limit would likely become higher. All @max_active==1 use cases are currently being audited for conversion into alloc_ordered_workqueue() and they shouldn't be affected once the audit and conversion is complete. One area where the behavior change may be more noticeable is workqueue_congested() as the reported congestion state is now per CPU instead of NUMA node. There are only two users of this interface - drivers/infiniband/hw/hfi1 and net/smc. Maintainers of both subsystems are cc'd. Inputs on the behavior change would be very much appreciated. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Karsten Graul <kgraul@linux.ibm.com> Cc: Wenjia Zhang <wenjia@linux.ibm.com> Cc: Jan Karcher <jaka@linux.ibm.com>
2023-07-25Documentation: core-api: Drop :export: for int_log.hAndy Shevchenko1-1/+0
The :export: keyword makes sense only for C-files, where EXPORT_SYMBOL() might appear. Otherwise kernel-doc may not produce anything out of this file. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: f97fa3dcb2db ("lib/math: Move dvb_math.c into lib/math/int_log.c") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20230725104956.47806-1-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-07-18docs: printk-formats: Treat char as always unsignedAndy Shevchenko1-1/+2
The Linux kernel switched to have char be equivalent to usigned char. Reflect this in the printk specifiers. Fixes: 3bc753c06dd0 ("kbuild: treat char as always unsigned") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20230703145839.14248-2-andriy.shevchenko@linux.intel.com
2023-07-18docs: printk-formats: Fix hex printing of signed valuesAndy Shevchenko1-4/+4
The commit cbacb5ab0aa0 ("docs: printk-formats: Stop encouraging use of unnecessary %h[xudi] and %hh[xudi]") obviously missed the point of sign promotion for the signed values lesser than int. In such case %x prints not the same as %h[h]x. Restore back those specifiers for the signed hex cases. Fixes: cbacb5ab0aa0 ("docs: printk-formats: Stop encouraging use of unnecessary %h[xudi] and %hh[xudi]") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20230703145839.14248-1-andriy.shevchenko@linux.intel.com
2023-07-09lib/math: Move dvb_math.c into lib/math/int_log.cAndy Shevchenko1-2/+5
Some existing and new users may benefit from the intlog2() and intlog10() APIs, make them wide available. Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org> Acked-by: Mauro Carvalho Chehab <mchehab@kernel.org> Link: https://lore.kernel.org/r/20230619172019.21457-2-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20230703135211.87416-2-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-06-27Merge tag 'wq-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds1-0/+32
Pull workqueue updates from Tejun Heo: - Concurrency-managed per-cpu work items that hog CPUs and delay the execution of other work items are now automatically detected and excluded from concurrency management. Reporting on such work items can also be enabled through a config option. - Added tools/workqueue/wq_monitor.py which improves visibility into workqueue usages and behaviors. - Arnd's minimal fix for gcc-13 enum warning on 32bit compiles, superseded by commit afa4bb778e48 in mainline. * tag 'wq-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Disable per-cpu CPU hog detection when wq_cpu_intensive_thresh_us is 0 workqueue: Fix WARN_ON_ONCE() triggers in worker_enter_idle() workqueue: fix enum type for gcc-13 workqueue: Track and monitor per-workqueue CPU time usage workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism workqueue: Automatically mark CPU-hogging work items CPU_INTENSIVE workqueue: Improve locking rule description for worker fields workqueue: Move worker_set/clr_flags() upwards workqueue: Re-order struct worker fields workqueue: Add pwq->stats[] and a monitoring script Further upgrade queue_work_on() comment
2023-06-27Merge tag 'locking-core-2023-06-27' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: - Introduce cmpxchg128() -- aka. the demise of cmpxchg_double() The cmpxchg128() family of functions is basically & functionally the same as cmpxchg_double(), but with a saner interface. Instead of a 6-parameter horror that forced u128 - u64/u64-halves layout details on the interface and exposed users to complexity, fragility & bugs, use a natural 3-parameter interface with u128 types. - Restructure the generated atomic headers, and add kerneldoc comments for all of the generic atomic{,64,_long}_t operations. The generated definitions are much cleaner now, and come with documentation. - Implement lock_set_cmp_fn() on lockdep, for defining an ordering when taking multiple locks of the same type. This gets rid of one use of lockdep_set_novalidate_class() in the bcache code. - Fix raw_cpu_generic_try_cmpxchg() bug due to an unintended variable shadowing generating garbage code on Clang on certain ARM builds. * tag 'locking-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits) locking/atomic: scripts: fix ${atomic}_dec_if_positive() kerneldoc percpu: Fix self-assignment of __old in raw_cpu_generic_try_cmpxchg() locking/atomic: treewide: delete arch_atomic_*() kerneldoc locking/atomic: docs: Add atomic operations to the driver basic API documentation locking/atomic: scripts: generate kerneldoc comments docs: scripts: kernel-doc: accept bitwise negation like ~@var locking/atomic: scripts: simplify raw_atomic*() definitions locking/atomic: scripts: simplify raw_atomic_long*() definitions locking/atomic: scripts: split pfx/name/sfx/order locking/atomic: scripts: restructure fallback ifdeffery locking/atomic: scripts: build raw_atomic_long*() directly locking/atomic: treewide: use raw_atomic*_<op>() locking/atomic: scripts: add trivial raw_atomic*_<op>() locking/atomic: scripts: factor out order template generation locking/atomic: scripts: remove leftover "${mult}" locking/atomic: scripts: remove bogus order parameter locking/atomic: xtensa: add preprocessor symbols locking/atomic: x86: add preprocessor symbols locking/atomic: sparc: add preprocessor symbols locking/atomic: sh: add preprocessor symbols ...
2023-06-27Merge tag 'docs-6.5' of git://git.lwn.net/linuxLinus Torvalds1-0/+6
Pull documentation updates from Jonathan Corbet: "It's been a relatively calm cycle in docsland. We do have: - Some initial page-table documentation from Linus (the other Linus) - Regression-handling documentation improvements from Thorsten - Addition of kerneldoc documentation for the ERR_PTR() and related macros from James Seo ... and the usual collection of fixes and updates" * tag 'docs-6.5' of git://git.lwn.net/linux: docs: consolidate storage interfaces Documentation: update git configuration for Link: tag Documentation: KVM: make corrections to vcpu-requests.rst Documentation: KVM: make corrections to ppc-pv.rst Documentation: KVM: make corrections to locking.rst Documentation: KVM: make corrections to halt-polling.rst Documentation: virt: correct location of haltpoll module params Documentation/mm: Initial page table documentation docs: crypto: async-tx-api: fix typo in struct name docs/doc-guide: Clarify how to write tables docs: handling-regressions: rework section about fixing procedures docs: process: fix a typoed cross-reference docs: submitting-patches: Discuss interleaved replies MAINTAINERS: direct process doc changes to a dedicated ML Documentation: core-api: Add error pointer functions to kernel-api err.h: Add missing kerneldocs for error pointer functions Documentation: conf.py: Add __force to c_id_attributes docs: clarify KVM related kernel parameters' descriptions docs: consolidate human interface subsystems docs: admin-guide: Add information about intel_pstate active mode
2023-06-27Merge tag 'rcu.2023.06.22a' of ↵Linus Torvalds1-0/+12
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU updates from Paul McKenney: "Documentation updates Miscellaneous fixes, perhaps most notably: - Remove RCU_NONIDLE(). The new visibility of most of the idle loop to RCU has obsoleted this API. - Make the RCU_SOFTIRQ callback-invocation time limit also apply to the rcuc kthreads that invoke callbacks for CONFIG_PREEMPT_RT. - Add a jiffies-based callback-invocation time limit to handle long-running callbacks. (The local_clock() function is only invoked once per 32 callbacks due to its high overhead.) - Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs, which fixes a bug that can occur on systems with non-contiguous CPU numbering. kvfree_rcu updates: - Eliminate the single-argument variant of k[v]free_rcu() now that all uses have been converted to k[v]free_rcu_mightsleep(). - Add WARN_ON_ONCE() checks for k[v]free_rcu*() freeing callbacks too soon. Yes, this is closing the barn door after the horse has escaped, but Murphy says that there will be more horses. Callback-offloading updates: - Fix a number of bugs involving the shrinker and lazy callbacks. Tasks RCU updates Torture-test updates" * tag 'rcu.2023.06.22a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (32 commits) torture: Remove duplicated argument -enable-kvm for ppc64 doc/rcutorture: Add description of rcutorture.stall_cpu_block rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale rcu/rcuscale: Move rcu_scale_*() after kfree_scale_cleanup() rcutorture: Correct name of use_softirq module parameter locktorture: Add long_hold to adjust lock-hold delays rcu/nocb: Make shrinker iterate only over NOCB CPUs rcu-tasks: Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs rcu: Make rcu_cpu_starting() rely on interrupts being disabled rcu: Mark rcu_cpu_kthread() accesses to ->rcu_cpu_has_work rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp rcu: Employ jiffies-based backstop to callback time limit rcu: Check callback-invocation time limit for rcuc kthreads rcu: Remove RCU_NONIDLE() rcu: Add more RCU files to kernel-api.rst rcu-tasks: Clarify the cblist_init_generic() function's pr_info() output rcu-tasks: Avoid pr_info() with spin lock in cblist_init_generic() rcu/nocb: Recheck lazy callbacks under the ->nocb_lock from shrinker rcu/nocb: Fix shrinker race against callback enqueuer rcu/nocb: Protect lazy shrinker against concurrent (de-)offloading ...
2023-06-26Merge tag 'smp-core-2023-06-26' of ↵Linus Torvalds1-11/+2
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP updates from Thomas Gleixner: "A large update for SMP management: - Parallel CPU bringup The reason why people are interested in parallel bringup is to shorten the (kexec) reboot time of cloud servers to reduce the downtime of the VM tenants. The current fully serialized bringup does the following per AP: 1) Prepare callbacks (allocate, intialize, create threads) 2) Kick the AP alive (e.g. INIT/SIPI on x86) 3) Wait for the AP to report alive state 4) Let the AP continue through the atomic bringup 5) Let the AP run the threaded bringup to full online state There are two significant delays: #3 The time for an AP to report alive state in start_secondary() on x86 has been measured in the range between 350us and 3.5ms depending on vendor and CPU type, BIOS microcode size etc. #4 The atomic bringup does the microcode update. This has been measured to take up to ~8ms on the primary threads depending on the microcode patch size to apply. On a two socket SKL server with 56 cores (112 threads) the boot CPU spends on current mainline about 800ms busy waiting for the APs to come up and apply microcode. That's more than 80% of the actual onlining procedure. This can be reduced significantly by splitting the bringup mechanism into two parts: 1) Run the prepare callbacks and kick the AP alive for each AP which needs to be brought up. The APs wake up, do their firmware initialization and run the low level kernel startup code including microcode loading in parallel up to the first synchronization point. (#1 and #2 above) 2) Run the rest of the bringup code strictly serialized per CPU (#3 - #5 above) as it's done today. Parallelizing that stage of the CPU bringup might be possible in theory, but it's questionable whether required surgery would be justified for a pretty small gain. If the system is large enough the first AP is already waiting at the first synchronization point when the boot CPU finished the wake-up of the last AP. That reduces the AP bringup time on that SKL from ~800ms to ~80ms, i.e. by a factor ~10x. The actual gain varies wildly depending on the system, CPU, microcode patch size and other factors. There are some opportunities to reduce the overhead further, but that needs some deep surgery in the x86 CPU bringup code. For now this is only enabled on x86, but the core functionality obviously works for all SMP capable architectures. - Enhancements for SMP function call tracing so it is possible to locate the scheduling and the actual execution points. That allows to measure IPI delivery time precisely" * tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits) trace,smp: Add tracepoints for scheduling remotelly called functions trace,smp: Add tracepoints around remotelly called functions MAINTAINERS: Add CPU HOTPLUG entry x86/smpboot: Fix the parallel bringup decision x86/realmode: Make stack lock work in trampoline_compat() x86/smp: Initialize cpu_primary_thread_mask late cpu/hotplug: Fix off by one in cpuhp_bringup_mask() x86/apic: Fix use of X{,2}APIC_ENABLE in asm with older binutils x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it x86/smpboot: Support parallel startup of secondary CPUs x86/smpboot: Implement a bit spinlock to protect the realmode stack x86/apic: Save the APIC virtual base address cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE x86/apic: Provide cpu_primary_thread mask x86/smpboot: Enable split CPU startup cpu/hotplug: Provide a split up CPUHP_BRINGUP mechanism cpu/hotplug: Reset task stack state in _cpu_up() cpu/hotplug: Remove unused state functions riscv: Switch to hotplug core state synchronization parisc: Switch to hotplug core state synchronization ...
2023-06-05arch: Remove cmpxchg_doublePeter Zijlstra1-2/+0
No moar users, remove the monster. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230531132323.991907085@infradead.org
2023-05-31mm: Don't pin ZERO_PAGE in pin_user_pages()David Howells1-0/+6
Make pin_user_pages*() leave a ZERO_PAGE unpinned if it extracts a pointer to it from the page tables and make unpin_user_page*() correspondingly ignore a ZERO_PAGE when unpinning. We don't want to risk overrunning a zero page's refcount as we're only allowed ~2 million pins on it - something that userspace can conceivably trigger. Add a pair of functions to test whether a page or a folio is a ZERO_PAGE. Signed-off-by: David Howells <dhowells@redhat.com> cc: Christoph Hellwig <hch@infradead.org> cc: David Hildenbrand <david@redhat.com> cc: Lorenzo Stoakes <lstoakes@gmail.com> cc: Andrew Morton <akpm@linux-foundation.org> cc: Jens Axboe <axboe@kernel.dk> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Matthew Wilcox <willy@infradead.org> cc: Jan Kara <jack@suse.cz> cc: Jeff Layton <jlayton@kernel.org> cc: Jason Gunthorpe <jgg@nvidia.com> cc: Logan Gunthorpe <logang@deltatee.com> cc: Hillf Danton <hdanton@sina.com> cc: Christian Brauner <brauner@kernel.org> cc: Linus Torvalds <torvalds@linux-foundation.org> cc: linux-fsdevel@vger.kernel.org cc: linux-block@vger.kernel.org cc: linux-kernel@vger.kernel.org cc: linux-mm@kvack.org Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20230526214142.958751-2-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-19Documentation: core-api: Add error pointer functions to kernel-apiJames Seo1-0/+6
Bring the error pointer functions (e.g. ERR_PTR(), PTR_ERR()) into the docs build so that they can be cross-referenced elsewhere. List them as kernel library functions in the kernel-api document. Nowhere else seems to fit, and they need to go *somewhere*. Signed-off-by: James Seo <james@equiv.tech> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20230509175543.2065835-4-james@equiv.tech
2023-05-17workqueue: Track and monitor per-workqueue CPU time usageTejun Heo1-19/+19
Now that wq_worker_tick() is there, we can easily track the rough CPU time consumption of each workqueue by charging the whole tick whenever a tick hits an active workqueue. While not super accurate, it provides reasonable visibility into the workqueues that consume a lot of CPU cycles. wq_monitor.py is updated to report the per-workqueue CPU times. v2: wq_monitor.py was using "cputime" as the key when outputting in json format. Use "cpu_time" instead for consistency with other fields. Signed-off-by: Tejun Heo <tj@kernel.org>
2023-05-17workqueue: Automatically mark CPU-hogging work items CPU_INTENSIVETejun Heo1-19/+19
If a per-cpu work item hogs the CPU, it can prevent other work items from starting through concurrency management. A per-cpu workqueue which intends to host such CPU-hogging work items can choose to not participate in concurrency management by setting %WQ_CPU_INTENSIVE; however, this can be error-prone and difficult to debug when missed. This patch adds an automatic CPU usage based detection. If a concurrency-managed work item consumes more CPU time than the threshold (10ms by default) continuously without intervening sleeps, wq_worker_tick() which is called from scheduler_tick() will detect the condition and automatically mark it CPU_INTENSIVE. The mechanism isn't foolproof: * Detection depends on tick hitting the work item. Getting preempted at the right timings may allow a violating work item to evade detection at least temporarily. * nohz_full CPUs may not be running ticks and thus can fail detection. * Even when detection is working, the 10ms detection delays can add up if many CPU-hogging work items are queued at the same time. However, in vast majority of cases, this should be able to detect violations reliably and provide reasonable protection with a small increase in code complexity. If some work items trigger this condition repeatedly, the bigger problem likely is the CPU being saturated with such per-cpu work items and the solution would be making them UNBOUND. The next patch will add a debug mechanism to help spot such cases. v4: Documentation for workqueue.cpu_intensive_thresh_us added to kernel-parameters.txt. v3: Switch to use wq_worker_tick() instead of hooking into preemptions as suggested by Peter. v2: Lai pointed out that wq_worker_stopping() also needs to be called from preemption and rtlock paths and an earlier patch was updated accordingly. This patch adds a comment describing the risk of infinte recursions and how they're avoided. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
2023-05-17workqueue: Add pwq->stats[] and a monitoring scriptTejun Heo1-0/+32
Currently, the only way to peer into workqueue operations is through tracing. While possible, it isn't easy or convenient to monitor per-workqueue behaviors over time this way. Let's add pwq->stats[] that track relevant events and a drgn monitoring script - tools/workqueue/wq_monitor.py. It's arguable whether this needs to be configurable. However, it currently only has several counters and the runtime overhead shouldn't be noticeable given that they're on pwq's which are per-cpu on per-cpu workqueues and per-numa-node on unbound ones. Let's keep it simple for the time being. v2: Patch reordered to earlier with fewer fields. Field will be added back gradually. Help message improved. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
2023-05-15x86/topology: Remove CPU0 hotplug optionThomas Gleixner1-11/+2
This was introduced together with commit e1c467e69040 ("x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI") to eventually support physical hotplug of CPU0: "We'll change this code in the future to wake up hard offlined CPU0 if real platform and request are available." 11 years later this has not happened and physical hotplug is not officially supported. Remove the cruft. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Helge Deller <deller@gmx.de> # parisc Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck Link: https://lore.kernel.org/r/20230512205255.715707999@linutronix.de
2023-05-11rcu: Add more RCU files to kernel-api.rstPaul E. McKenney1-0/+12
Recent changes and additions to RCU have not been reflected in Documentation/core-api/kernel-api.rst, which makes it harder to find the kernel-doc headers in recently added RCU files. Therefore, add those files. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Liam Beguin <liambeguin@gmail.com> Cc: <linux-doc@vger.kernel.org>
2023-04-27Merge tag 'mm-stable-2023-04-27-15-30' of ↵Linus Torvalds1-5/+11
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of switching from a user process to a kernel thread. - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav. - zsmalloc performance improvements from Sergey Senozhatsky. - Yue Zhao has found and fixed some data race issues around the alteration of memcg userspace tunables. - VFS rationalizations from Christoph Hellwig: - removal of most of the callers of write_one_page() - make __filemap_get_folio()'s return value more useful - Luis Chamberlain has changed tmpfs so it no longer requires swap backing. Use `mount -o noswap'. - Qi Zheng has made the slab shrinkers operate locklessly, providing some scalability benefits. - Keith Busch has improved dmapool's performance, making part of its operations O(1) rather than O(n). - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd, permitting userspace to wr-protect anon memory unpopulated ptes. - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather than exclusive, and has fixed a bunch of errors which were caused by its unintuitive meaning. - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature, which causes minor faults to install a write-protected pte. - Vlastimil Babka has done some maintenance work on vma_merge(): cleanups to the kernel code and improvements to our userspace test harness. - Cleanups to do_fault_around() by Lorenzo Stoakes. - Mike Rapoport has moved a lot of initialization code out of various mm/ files and into mm/mm_init.c. - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for DRM, but DRM doesn't use it any more. - Lorenzo has also coverted read_kcore() and vread() to use iterators and has thereby removed the use of bounce buffers in some cases. - Lorenzo has also contributed further cleanups of vma_merge(). - Chaitanya Prakash provides some fixes to the mmap selftesting code. - Matthew Wilcox changes xfs and afs so they no longer take sleeping locks in ->map_page(), a step towards RCUification of pagefaults. - Suren Baghdasaryan has improved mmap_lock scalability by switching to per-VMA locking. - Frederic Weisbecker has reworked the percpu cache draining so that it no longer causes latency glitches on cpu isolated workloads. - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig logic. - Liu Shixin has changed zswap's initialization so we no longer waste a chunk of memory if zswap is not being used. - Yosry Ahmed has improved the performance of memcg statistics flushing. - David Stevens has fixed several issues involving khugepaged, userfaultfd and shmem. - Christoph Hellwig has provided some cleanup work to zram's IO-related code paths. - David Hildenbrand has fixed up some issues in the selftest code's testing of our pte state changing. - Pankaj Raghav has made page_endio() unneeded and has removed it. - Peter Xu contributed some rationalizations of the userfaultfd selftests. - Yosry Ahmed has fixed an issue around memcg's page recalim accounting. - Chaitanya Prakash has fixed some arm-related issues in the selftests/mm code. - Longlong Xia has improved the way in which KSM handles hwpoisoned pages. - Peter Xu fixes a few issues with uffd-wp at fork() time. - Stefan Roesch has changed KSM so that it may now be used on a per-process and per-cgroup basis. * tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm,unmap: avoid flushing TLB in batch if PTE is inaccessible shmem: restrict noswap option to initial user namespace mm/khugepaged: fix conflicting mods to collapse_file() sparse: remove unnecessary 0 values from rc mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area() hugetlb: pte_alloc_huge() to replace huge pte_alloc_map() maple_tree: fix allocation in mas_sparse_area() mm: do not increment pgfault stats when page fault handler retries zsmalloc: allow only one active pool compaction context selftests/mm: add new selftests for KSM mm: add new KSM process and sysfs knobs mm: add new api to enable ksm per process mm: shrinkers: fix debugfs file permissions mm: don't check VMA write permissions if the PTE/PMD indicates write permissions migrate_pages_batch: fix statistics for longterm pin retry userfaultfd: use helper function range_in_vma() lib/show_mem.c: use for_each_populated_zone() simplify code mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list() fs/buffer: convert create_page_buffers to folio_create_buffers fs/buffer: add folio_create_empty_buffers helper ...
2023-04-27Merge tag 'modules-6.4-rc1' of ↵Linus Torvalds1-3/+21
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull module updates from Luis Chamberlain: "The summary of the changes for this pull requests is: - Song Liu's new struct module_memory replacement - Nick Alcock's MODULE_LICENSE() removal for non-modules - My cleanups and enhancements to reduce the areas where we vmalloc module memory for duplicates, and the respective debug code which proves the remaining vmalloc pressure comes from userspace. Most of the changes have been in linux-next for quite some time except the minor fixes I made to check if a module was already loaded prior to allocating the final module memory with vmalloc and the respective debug code it introduces to help clarify the issue. Although the functional change is small it is rather safe as it can only *help* reduce vmalloc space for duplicates and is confirmed to fix a bootup issue with over 400 CPUs with KASAN enabled. I don't expect stable kernels to pick up that fix as the cleanups would have also had to have been picked up. Folks on larger CPU systems with modules will want to just upgrade if vmalloc space has been an issue on bootup. Given the size of this request, here's some more elaborate details: The functional change change in this pull request is the very first patch from Song Liu which replaces the 'struct module_layout' with a new 'struct module_memory'. The old data structure tried to put together all types of supported module memory types in one data structure, the new one abstracts the differences in memory types in a module to allow each one to provide their own set of details. This paves the way in the future so we can deal with them in a cleaner way. If you look at changes they also provide a nice cleanup of how we handle these different memory areas in a module. This change has been in linux-next since before the merge window opened for v6.3 so to provide more than a full kernel cycle of testing. It's a good thing as quite a bit of fixes have been found for it. Jason Baron then made dynamic debug a first class citizen module user by using module notifier callbacks to allocate / remove module specific dynamic debug information. Nick Alcock has done quite a bit of work cross-tree to remove module license tags from things which cannot possibly be module at my request so to: a) help him with his longer term tooling goals which require a deterministic evaluation if a piece a symbol code could ever be part of a module or not. But quite recently it is has been made clear that tooling is not the only one that would benefit. Disambiguating symbols also helps efforts such as live patching, kprobes and BPF, but for other reasons and R&D on this area is active with no clear solution in sight. b) help us inch closer to the now generally accepted long term goal of automating all the MODULE_LICENSE() tags from SPDX license tags In so far as a) is concerned, although module license tags are a no-op for non-modules, tools which would want create a mapping of possible modules can only rely on the module license tag after the commit 8b41fc4454e ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"). Nick has been working on this *for years* and AFAICT I was the only one to suggest two alternatives to this approach for tooling. The complexity in one of my suggested approaches lies in that we'd need a possible-obj-m and a could-be-module which would check if the object being built is part of any kconfig build which could ever lead to it being part of a module, and if so define a new define -DPOSSIBLE_MODULE [0]. A more obvious yet theoretical approach I've suggested would be to have a tristate in kconfig imply the same new -DPOSSIBLE_MODULE as well but that means getting kconfig symbol names mapping to modules always, and I don't think that's the case today. I am not aware of Nick or anyone exploring either of these options. Quite recently Josh Poimboeuf has pointed out that live patching, kprobes and BPF would benefit from resolving some part of the disambiguation as well but for other reasons. The function granularity KASLR (fgkaslr) patches were mentioned but Joe Lawrence has clarified this effort has been dropped with no clear solution in sight [1]. In the meantime removing module license tags from code which could never be modules is welcomed for both objectives mentioned above. Some developers have also welcomed these changes as it has helped clarify when a module was never possible and they forgot to clean this up, and so you'll see quite a bit of Nick's patches in other pull requests for this merge window. I just picked up the stragglers after rc3. LWN has good coverage on the motivation behind this work [2] and the typical cross-tree issues he ran into along the way. The only concrete blocker issue he ran into was that we should not remove the MODULE_LICENSE() tags from files which have no SPDX tags yet, even if they can never be modules. Nick ended up giving up on his efforts due to having to do this vetting and backlash he ran into from folks who really did *not understand* the core of the issue nor were providing any alternative / guidance. I've gone through his changes and dropped the patches which dropped the module license tags where an SPDX license tag was missing, it only consisted of 11 drivers. To see if a pull request deals with a file which lacks SPDX tags you can just use: ./scripts/spdxcheck.py -f \ $(git diff --name-only commid-id | xargs echo) You'll see a core module file in this pull request for the above, but that's not related to his changes. WE just need to add the SPDX license tag for the kernel/module/kmod.c file in the future but it demonstrates the effectiveness of the script. Most of Nick's changes were spread out through different trees, and I just picked up the slack after rc3 for the last kernel was out. Those changes have been in linux-next for over two weeks. The cleanups, debug code I added and final fix I added for modules were motivated by David Hildenbrand's report of boot failing on a systems with over 400 CPUs when KASAN was enabled due to running out of virtual memory space. Although the functional change only consists of 3 lines in the patch "module: avoid allocation if module is already present and ready", proving that this was the best we can do on the modules side took quite a bit of effort and new debug code. The initial cleanups I did on the modules side of things has been in linux-next since around rc3 of the last kernel, the actual final fix for and debug code however have only been in linux-next for about a week or so but I think it is worth getting that code in for this merge window as it does help fix / prove / evaluate the issues reported with larger number of CPUs. Userspace is not yet fixed as it is taking a bit of time for folks to understand the crux of the issue and find a proper resolution. Worst come to worst, I have a kludge-of-concept [3] of how to make kernel_read*() calls for modules unique / converge them, but I'm currently inclined to just see if userspace can fix this instead" Link: https://lore.kernel.org/all/Y/kXDqW+7d71C4wz@bombadil.infradead.org/ [0] Link: https://lkml.kernel.org/r/025f2151-ce7c-5630-9b90-98742c97ac65@redhat.com [1] Link: https://lwn.net/Articles/927569/ [2] Link: https://lkml.kernel.org/r/20230414052840.1994456-3-mcgrof@kernel.org [3] * tag 'modules-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (121 commits) module: add debugging auto-load duplicate module support module: stats: fix invalid_mod_bytes typo module: remove use of uninitialized variable len module: fix building stats for 32-bit targets module: stats: include uapi/linux/module.h module: avoid allocation if module is already present and ready module: add debug stats to help identify memory pressure module: extract patient module check into helper modules/kmod: replace implementation with a semaphore Change DEFINE_SEMAPHORE() to take a number argument module: fix kmemleak annotations for non init ELF sections module: Ignore L0 and rename is_arm_mapping_symbol() module: Move is_arm_mapping_symbol() to module_symbol.h module: Sync code of is_arm_mapping_symbol() scripts/gdb: use mem instead of core_layout to get the module address interconnect: remove module-related code interconnect: remove MODULE_LICENSE in non-modules zswap: remove MODULE_LICENSE in non-modules zpool: remove MODULE_LICENSE in non-modules x86/mm/dump_pagetables: remove MODULE_LICENSE in non-modules ...
2023-04-25Merge tag 'slab-for-6.4' of ↵Linus Torvalds1-4/+13
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab updates from Vlastimil Babka: "The main change is naturally the SLOB removal. Since its deprecation in 6.2 I've seen no complaints so hopefully SLUB_(TINY) works well for everyone and we can proceed. Besides the code cleanup, the main immediate benefit will be allowing kfree() family of function to work on kmem_cache_alloc() objects, which was incompatible with SLOB. This includes kfree_rcu() which had no kmem_cache_free_rcu() counterpart yet and now it shouldn't be necessary anymore. Besides that, there are several small code and comment improvements from Thomas, Thorsten and Vernon" * tag 'slab-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm/slab: document kfree() as allowed for kmem_cache_alloc() objects mm/slob: remove slob.c mm/slab: remove CONFIG_SLOB code from slab common code mm, pagemap: remove SLOB and SLQB from comments and documentation mm, page_flags: remove PG_slob_free mm/slob: remove CONFIG_SLOB mm/slub: fix help comment of SLUB_DEBUG mm: slub: make kobj_type structure constant slab: Adjust comment after refactoring of gfp.h
2023-04-18module: add debug stats to help identify memory pressureLuis Chamberlain1-2/+20
Loading modules with finit_module() can end up using vmalloc(), vmap() and vmalloc() again, for a total of up to 3 separate allocations in the worst case for a single module. We always kernel_read*() the module, that's a vmalloc(). Then vmap() is used for the module decompression, and if so the last read buffer is freed as we use the now decompressed module buffer to stuff data into our copy module. The last allocation is specific to each architectures but pretty much that's generally a series of vmalloc() calls or a variation of vmalloc to handle ELF sections with special permissions. Evaluation with new stress-ng module support [1] with just 100 ops is proving that you can end up using GiBs of data easily even with all care we have in the kernel and userspace today in trying to not load modules which are already loaded. 100 ops seems to resemble the sort of pressure a system with about 400 CPUs can create on module loading. Although issues relating to duplicate module requests due to each CPU inucurring a new module reuest is silly and some of these are being fixed, we currently lack proper tooling to help diagnose easily what happened, when it happened and who likely is to blame -- userspace or kernel module autoloading. Provide an initial set of stats which use debugfs to let us easily scrape post-boot information about failed loads. This sort of information can be used on production worklaods to try to optimize *avoiding* redundant memory pressure using finit_module(). There's a few examples that can be provided: A 255 vCPU system without the next patch in this series applied: Startup finished in 19.143s (kernel) + 7.078s (userspace) = 26.221s graphical.target reached after 6.988s in userspace And 13.58 GiB of virtual memory space lost due to failed module loading: root@big ~ # cat /sys/kernel/debug/modules/stats Mods ever loaded 67 Mods failed on kread 0 Mods failed on decompress 0 Mods failed on becoming 0 Mods failed on load 1411 Total module size 11464704 Total mod text size 4194304 Failed kread bytes 0 Failed decompress bytes 0 Failed becoming bytes 0 Failed kmod bytes 14588526272 Virtual mem wasted bytes 14588526272 Average mod size 171115 Average mod text size 62602 Average fail load bytes 10339140 Duplicate failed modules: module-name How-many-times Reason kvm_intel 249 Load kvm 249 Load irqbypass 8 Load crct10dif_pclmul 128 Load ghash_clmulni_intel 27 Load sha512_ssse3 50 Load sha512_generic 200 Load aesni_intel 249 Load crypto_simd 41 Load cryptd 131 Load evdev 2 Load serio_raw 1 Load virtio_pci 3 Load nvme 3 Load nvme_core 3 Load virtio_pci_legacy_dev 3 Load virtio_pci_modern_dev 3 Load t10_pi 3 Load virtio 3 Load crc32_pclmul 6 Load crc64_rocksoft 3 Load crc32c_intel 40 Load virtio_ring 3 Load crc64 3 Load The following screen shot, of a simple 8vcpu 8 GiB KVM guest with the next patch in this series applied, shows 226.53 MiB are wasted in virtual memory allocations which due to duplicate module requests during boot. It also shows an average module memory size of 167.10 KiB and an an average module .text + .init.text size of 61.13 KiB. The end shows all modules which were detected as duplicate requests and whether or not they failed early after just the first kernel_read*() call or late after we've already allocated the private space for the module in layout_and_allocate(). A system with module decompression would reveal more wasted virtual memory space. We should put effort now into identifying the source of these duplicate module requests and trimming these down as much possible. Larger systems will obviously show much more wasted virtual memory allocations. root@kmod ~ # cat /sys/kernel/debug/modules/stats Mods ever loaded 67 Mods failed on kread 0 Mods failed on decompress 0 Mods failed on becoming 83 Mods failed on load 16 Total module size 11464704 Total mod text size 4194304 Failed kread bytes 0 Failed decompress bytes 0 Failed becoming bytes 228959096 Failed kmod bytes 8578080 Virtual mem wasted bytes 237537176 Average mod size 171115 Average mod text size 62602 Avg fail becoming bytes 2758544 Average fail load bytes 536130 Duplicate failed modules: module-name How-many-times Reason kvm_intel 7 Becoming kvm 7 Becoming irqbypass 6 Becoming & Load crct10dif_pclmul 7 Becoming & Load ghash_clmulni_intel 7 Becoming & Load sha512_ssse3 6 Becoming & Load sha512_generic 7 Becoming & Load aesni_intel 7 Becoming crypto_simd 7 Becoming & Load cryptd 3 Becoming & Load evdev 1 Becoming serio_raw 1 Becoming nvme 3 Becoming nvme_core 3 Becoming t10_pi 3 Becoming virtio_pci 3 Becoming crc32_pclmul 6 Becoming & Load crc64_rocksoft 3 Becoming crc32c_intel 3 Becoming virtio_pci_modern_dev 2 Becoming virtio_pci_legacy_dev 1 Becoming crc64 2 Becoming virtio 2 Becoming virtio_ring 2 Becoming [0] https://github.com/ColinIanKing/stress-ng.git [1] echo 0 > /proc/sys/vm/oom_dump_tasks ./stress-ng --module 100 --module-name xfs Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-04-10dma-api-howto: typo fixMichael S. Tsirkin1-1/+1
Stumbled upon a typo while reading the doc, here's a fix. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/af1505348a67981f63ccff4e3c3d45b686cda43f.1680864874.git.mst@redhat.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-03-30docs: move x86 documentation into Documentation/arch/Jonathan Corbet1-1/+1
Move the x86 documentation under Documentation/arch/ as a way of cleaning up the top-level directory and making the structure of our docs more closely match the structure of the source directories it describes. All in-kernel references to the old paths have been updated. Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-arch@vger.kernel.org Cc: x86@kernel.org Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/20230315211523.108836-1-corbet@lwn.net/ Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-03-29mm/slab: document kfree() as allowed for kmem_cache_alloc() objectsVlastimil Babka1-4/+13
This will make it easier to free objects in situations when they can come from either kmalloc() or kmem_cache_alloc(), and also allow kfree_rcu() for freeing objects from kmem_cache_alloc(). For the SLAB and SLUB allocators this was always possible so with SLOB gone, we can document it as supported. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org>
2023-03-28mm, printk: introduce new format %pGt for page_typeHyeonggon Yoo1-5/+11
%pGp format is used to display 'flags' field of a struct page. However, some page flags (i.e. PG_buddy, see page-flags.h for more details) are stored in page_type field. To display human-readable output of page_type, introduce %pGt format. It is important to note the meaning of bits are different in page_type. if page_type is 0xffffffff, no flags are set. Setting PG_buddy (0x00000080) flag results in a page_type of 0xffffff7f. Clearing a bit actually means setting a flag. Bits in page_type are inverted when displaying type names. Only values for which page_type_has_type() returns true are considered as page_type, to avoid confusion with mapcount values. if it returns false, only raw values are displayed and not page type names. Link: https://lkml.kernel.org/r/20230130042514.2418-3-42.hyeyoo@gmail.com Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Petr Mladek <pmladek@suse.com> [vsprintf part] Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-24Documentation: core-api: update kernel-doc reference to kmod.cBagas Sanjaya1-1/+1
Commit d6f819908f8aac ("module: fold usermode helper kmod into modules directory") moves kmod helper implementation (kmod.c) to kernel/module/ directory but forgets to update its reference on kernel api doc, hence: WARNING: kernel-doc './scripts/kernel-doc -rst -enable-lineno -sphinx-version 2.4.4 -export ./kernel/kmod.c' failed with return code 2 Update the reference. Fixes: d6f819908f8aac ("module: fold usermode helper kmod into modules directory") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lore.kernel.org/linux-next/20230324154413.19cc78be@canb.auug.org.au/ Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Liam Beguin <liambeguin@gmail.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-02-23Merge tag 'mm-stable-2023-02-20-13-37' of ↵Linus Torvalds1-16/+15
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Daniel Verkamp has contributed a memfd series ("mm/memfd: add F_SEAL_EXEC") which permits the setting of the memfd execute bit at memfd creation time, with the option of sealing the state of the X bit. - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare") which addresses a rare race condition related to PMD unsharing. - Several folioification patch serieses from Matthew Wilcox, Vishal Moola, Sidhartha Kumar and Lorenzo Stoakes - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which does perform some memcg maintenance and cleanup work. - SeongJae Park has added DAMOS filtering to DAMON, with the series "mm/damon/core: implement damos filter". These filters provide users with finer-grained control over DAMOS's actions. SeongJae has also done some DAMON cleanup work. - Kairui Song adds a series ("Clean up and fixes for swap"). - Vernon Yang contributed the series "Clean up and refinement for maple tree". - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It adds to MGLRU an LRU of memcgs, to improve the scalability of global reclaim. - David Hildenbrand has added some userfaultfd cleanup work in the series "mm: uffd-wp + change_protection() cleanups". - Christoph Hellwig has removed the generic_writepages() library function in the series "remove generic_writepages". - Baolin Wang has performed some maintenance on the compaction code in his series "Some small improvements for compaction". - Sidhartha Kumar is doing some maintenance work on struct page in his series "Get rid of tail page fields". - David Hildenbrand contributed some cleanup, bugfixing and generalization of pte management and of pte debugging in his series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs". - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation flag in the series "Discard __GFP_ATOMIC". - Sergey Senozhatsky has improved zsmalloc's memory utilization with his series "zsmalloc: make zspage chain size configurable". - Joey Gouly has added prctl() support for prohibiting the creation of writeable+executable mappings. The previous BPF-based approach had shortcomings. See "mm: In-kernel support for memory-deny-write-execute (MDWE)". - Waiman Long did some kmemleak cleanup and bugfixing in the series "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF". - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series "mm: multi-gen LRU: improve". - Jiaqi Yan has provided some enhancements to our memory error statistics reporting, mainly by presenting the statistics on a per-node basis. See the series "Introduce per NUMA node memory error statistics". - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog regression in compaction via his series "Fix excessive CPU usage during compaction". - Christoph Hellwig does some vmalloc maintenance work in the series "cleanup vfree and vunmap". - Christoph Hellwig has removed block_device_operations.rw_page() in ths series "remove ->rw_page". - We get some maple_tree improvements and cleanups in Liam Howlett's series "VMA tree type safety and remove __vma_adjust()". - Suren Baghdasaryan has done some work on the maintainability of our vm_flags handling in the series "introduce vm_flags modifier functions". - Some pagemap cleanup and generalization work in Mike Rapoport's series "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and "fixups for generic implementation of pfn_valid()" - Baoquan He has done some work to make /proc/vmallocinfo and /proc/kcore better represent the real state of things in his series "mm/vmalloc.c: allow vread() to read out vm_map_ram areas". - Jason Gunthorpe rationalized the GUP system's interface to the rest of the kernel in the series "Simplify the external interface for GUP". - SeongJae Park wishes to migrate people from DAMON's debugfs interface over to its sysfs interface. To support this, we'll temporarily be printing warnings when people use the debugfs interface. See the series "mm/damon: deprecate DAMON debugfs interface". - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes and clean-ups" series. - Huang Ying has provided a dramatic reduction in migration's TLB flush IPI rates with the series "migrate_pages(): batch TLB flushing". - Arnd Bergmann has some objtool fixups in "objtool warning fixes". * tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits) include/linux/migrate.h: remove unneeded externs mm/memory_hotplug: cleanup return value handing in do_migrate_range() mm/uffd: fix comment in handling pte markers mm: change to return bool for isolate_movable_page() mm: hugetlb: change to return bool for isolate_hugetlb() mm: change to return bool for isolate_lru_page() mm: change to return bool for folio_isolate_lru() objtool: add UACCESS exceptions for __tsan_volatile_read/write kmsan: disable ftrace in kmsan core code kasan: mark addr_has_metadata __always_inline mm: memcontrol: rename memcg_kmem_enabled() sh: initialize max_mapnr m68k/nommu: add missing definition of ARCH_PFN_OFFSET mm: percpu: fix incorrect size in pcpu_obj_full_size() maple_tree: reduce stack usage with gcc-9 and earlier mm: page_alloc: call panic() when memoryless node allocation fails mm: multi-gen LRU: avoid futile retries migrate_pages: move THP/hugetlb migration support check to simplify code migrate_pages: batch flushing TLB migrate_pages: share more code between _unmap and _move ...
2023-02-22Merge tag 'docs-6.3' of git://git.lwn.net/linuxLinus Torvalds2-3/+3
Pull documentation updates from Jonathan Corbet: "It has been a moderately calm cycle for documentation; the significant changes include: - Some significant additions to the memory-management documentation - Some improvements to navigation in the HTML-rendered docs - More Spanish and Chinese translations ... and the usual set of typo fixes and such" * tag 'docs-6.3' of git://git.lwn.net/linux: (68 commits) Documentation/watchdog/hpwdt: Fix Format Documentation/watchdog/hpwdt: Fix Reference Documentation: core-api: padata: correct spelling docs/mm: Physical Memory: correct spelling in reference to CONFIG_PAGE_EXTENSION docs: Use HTML comments for the kernel-toc SPDX line docs: Add more information to the HTML sidebar Documentation: KVM: Update AMD memory encryption link printk: Document that CONFIG_BOOT_PRINTK_DELAY required for boot_delay= Documentation: userspace-api: correct spelling Documentation: sparc: correct spelling Documentation: driver-api: correct spelling Documentation: admin-guide: correct spelling docs: add workload-tracing document to admin-guide docs/admin-guide/mm: remove useless markup docs/mm: remove useless markup docs/mm: Physical Memory: remove useless markup docs/sp_SP: Add process magic-number translation docs: ftrace: always use canonical ftrace path Doc/damon: fix the data path error dma-buf: Add "dma-buf" to title of documentation ...
2023-02-16Documentation: core-api: padata: correct spellingRandy Dunlap1-1/+1
Correct spelling problems for Documentation/core-api/padata.rst as reported by codespell. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: netdev@vger.kernel.org Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: linux-crypto@vger.kernel.org Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com> Acked-by: Daniel Jordan <daniel.m.jordan@oracle.com> Link: https://lore.kernel.org/r/20230215053744.11716-1-rdunlap@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-02-15Documentation: core-api: packing: correct spellingRandy Dunlap1-1/+1
Correct spelling problems for Documentation/core-api/packing.rst as reported by codespell. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Vladimir Oltean <olteanv@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com> Acked-by: Daniel Jordan <daniel.m.jordan@oracle.com> Link: https://lore.kernel.org/r/20230215053738.11562-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-02mm: remove folio_pincount_ptr() and head_compound_pincount()Matthew Wilcox (Oracle)1-15/+14
We can use folio->_pincount directly, since all users are guarded by tests of compound/large. Link: https://lkml.kernel.org/r/20230111142915.1001531-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02docs/mm: Physical Memory: remove useless markupMike Rapoport (IBM)1-2/+0
Jon says: > +See also :ref:`Page Reclaim <page_reclaim>`. Can also just be "See also Documentation/mm/page_reclaim.rst". The right things will happen in the HTML output, readers of the plain-text will know immediately where to go, and we don't have to add the label clutter. Remove reference markup and unnecessary labes and use plain file names. Fixes: 5d8c5e430a63 ("docs/mm: Physical Memory: add structure, introduction and nodes description") Suggested-by: Jonathan Corbet <corbet@lwn.net> Acked-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Link: https://lore.kernel.org/r/20230201094156.991542-2-rppt@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-01-31docs: ftrace: always use canonical ftrace pathRoss Zwisler1-2/+2
The canonical location for the tracefs filesystem is at /sys/kernel/tracing. But, from Documentation/trace/ftrace.rst: Before 4.1, all ftrace tracing control files were within the debugfs file system, which is typically located at /sys/kernel/debug/tracing. For backward compatibility, when mounting the debugfs file system, the tracefs file system will be automatically mounted at: /sys/kernel/debug/tracing Many parts of Documentation still reference this older debugfs path, so let's update them to avoid confusion. Signed-off-by: Ross Zwisler <zwisler@google.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20230125213251.2013791-1-zwisler@google.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-01-26docs/mm: Physical Memory: add structure, introduction and nodes descriptionMike Rapoport (IBM)1-0/+2
Add structure, introduction and Nodes section to Physical Memory chapter. As the new documentation references core-api/dma-api and mm/page_reclaim, add page labels to those documents. Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Link: https://lore.kernel.org/r/20230125192841.25342-2-rppt@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-01-24docs: add more netlink docs (incl. spec docs)Jakub Kicinski2-0/+102
Add documentation about the upcoming Netlink protocol specs. Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-18selftests/vm: rename selftests/vm to selftests/mmSeongJae Park1-1/+1
Rename selftets/vm to selftests/mm for being more consistent with the code, documentation, and tools directories, and won't be confused with virtual machines. [sj@kernel.org: convert missing vm->mm changes] Link: https://lkml.kernel.org/r/20230107230643.252273-1-sj@kernel.org Link: https://lkml.kernel.org/r/20230103180754.129637-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-14Merge tag 'hardening-v6.2-rc1' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull kernel hardening updates from Kees Cook: - Convert flexible array members, fix -Wstringop-overflow warnings, and fix KCFI function type mismatches that went ignored by maintainers (Gustavo A. R. Silva, Nathan Chancellor, Kees Cook) - Remove the remaining side-effect users of ksize() by converting dma-buf, btrfs, and coredump to using kmalloc_size_roundup(), add more __alloc_size attributes, and introduce full testing of all allocator functions. Finally remove the ksize() side-effect so that each allocation-aware checker can finally behave without exceptions - Introduce oops_limit (default 10,000) and warn_limit (default off) to provide greater granularity of control for panic_on_oops and panic_on_warn (Jann Horn, Kees Cook) - Introduce overflows_type() and castable_to_type() helpers for cleaner overflow checking - Improve code generation for strscpy() and update str*() kern-doc - Convert strscpy and sigphash tests to KUnit, and expand memcpy tests - Always use a non-NULL argument for prepare_kernel_cred() - Disable structleak plugin in FORTIFY KUnit test (Anders Roxell) - Adjust orphan linker section checking to respect CONFIG_WERROR (Xin Li) - Make sure siginfo is cleared for forced SIGKILL (haifeng.xu) - Fix um vs FORTIFY warnings for always-NULL arguments * tag 'hardening-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (31 commits) ksmbd: replace one-element arrays with flexible-array members hpet: Replace one-element array with flexible-array member um: virt-pci: Avoid GCC non-NULL warning signal: Initialize the info in ksignal lib: fortify_kunit: build without structleak plugin panic: Expose "warn_count" to sysfs panic: Introduce warn_limit panic: Consolidate open-coded panic_on_warn checks exit: Allow oops_limit to be disabled exit: Expose "oops_count" to sysfs exit: Put an upper limit on how often we can oops panic: Separate sysctl logic from CONFIG_SMP mm/pgtable: Fix multiple -Wstringop-overflow warnings mm: Make ksize() a reporting-only function kunit/fortify: Validate __alloc_size attribute results drm/sti: Fix return type of sti_{dvo,hda,hdmi}_connector_mode_valid() drm/fsl-dcu: Fix return type of fsl_dcu_drm_connector_mode_valid() driver core: Add __alloc_size hint to devm allocators overflow: Introduce overflows_type() and castable_to_type() coredump: Proactively round up to kmalloc bucket size ...
2022-12-12Merge tag 'docs-6.2' of git://git.lwn.net/linuxLinus Torvalds1-3/+0
Pull documentation updates from Jonathan Corbet: "This was a not-too-busy cycle for documentation; highlights include: - The beginnings of a set of translations into Spanish, headed up by Carlos Bilbao - More Chinese translations - A change to the Sphinx "alabaster" theme by default for HTML generation. Unlike the previous default (Read the Docs), alabaster is shipped with Sphinx by default, reducing the number of other dependencies that need to be installed. It also (IMO) produces a cleaner and more readable result. - The ability to render the documentation into the texinfo format (something Sphinx could always do, we just never wired it up until now) Plus the usual collection of typo fixes, build-warning fixes, and minor updates" * tag 'docs-6.2' of git://git.lwn.net/linux: (67 commits) Documentation/features: Use loongarch instead of loong Documentation/features-refresh.sh: Only sed the beginning "arch" of ARCH_DIR docs/zh_CN: Fix '.. only::' directive's expression docs/sp_SP: Add memory-barriers.txt Spanish translation docs/zh_CN/LoongArch: Update links of LoongArch ISA Vol1 and ELF psABI docs/LoongArch: Update links of LoongArch ISA Vol1 and ELF psABI Documentation/features: Update feature lists for 6.1 Documentation: Fixed a typo in bootconfig.rst docs/sp_SP: Add process coding-style translation docs/sp_SP: Add kernel-docs.rst Spanish translation docs: Create translations/sp_SP/process/, move submitting-patches.rst docs: Add book to process/kernel-docs.rst docs: Retire old resources from kernel-docs.rst docs: Update maintainer of kernel-docs.rst Documentation: riscv: Document the sv57 VM layout Documentation: USB: correct possessive "its" usage math64: fix kernel-doc return value warnings math64: add kernel-doc for DIV64_U64_ROUND_UP math64: favor kernel-doc from header files doc: add texinfodocs and infodocs targets ...
2022-12-12Merge tag 'timers-core-2022-12-10' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Updates for timers, timekeeping and drivers: Core: - The timer_shutdown[_sync]() infrastructure: Tearing down timers can be tedious when there are circular dependencies to other things which need to be torn down. A prime example is timer and workqueue where the timer schedules work and the work arms the timer. What needs to prevented is that pending work which is drained via destroy_workqueue() does not rearm the previously shutdown timer. Nothing in that shutdown sequence relies on the timer being functional. The conclusion was that the semantics of timer_shutdown_sync() should be: - timer is not enqueued - timer callback is not running - timer cannot be rearmed Preventing the rearming of shutdown timers is done by discarding rearm attempts silently. A warning for the case that a rearm attempt of a shutdown timer is detected would not be really helpful because it's entirely unclear how it should be acted upon. The only way to address such a case is to add 'if (in_shutdown)' conditionals all over the place. This is error prone and in most cases of teardown not required all. - The real fix for the bluetooth HCI teardown based on timer_shutdown_sync(). A larger scale conversion to timer_shutdown_sync() is work in progress. - Consolidation of VDSO time namespace helper functions - Small fixes for timer and timerqueue Drivers: - Prevent integer overflow on the XGene-1 TVAL register which causes an never ending interrupt storm. - The usual set of new device tree bindings - Small fixes and improvements all over the place" * tag 'timers-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) dt-bindings: timer: renesas,cmt: Add r8a779g0 CMT support dt-bindings: timer: renesas,tmu: Add r8a779g0 support clocksource/drivers/arm_arch_timer: Use kstrtobool() instead of strtobool() clocksource/drivers/timer-ti-dm: Fix missing clk_disable_unprepare in dmtimer_systimer_init_clock() clocksource/drivers/timer-ti-dm: Clear settings on probe and free clocksource/drivers/timer-ti-dm: Make timer_get_irq static clocksource/drivers/timer-ti-dm: Fix warning for omap_timer_match clocksource/drivers/arm_arch_timer: Fix XGene-1 TVAL register math error clocksource/drivers/timer-npcm7xx: Enable timer 1 clock before use dt-bindings: timer: nuvoton,npcm7xx-timer: Allow specifying all clocks dt-bindings: timer: rockchip: Add rockchip,rk3128-timer clockevents: Repair kernel-doc for clockevent_delta2ns() clocksource/drivers/ingenic-ost: Define pm functions properly in platform_driver struct clocksource/drivers/sh_cmt: Access registers according to spec vdso/timens: Refactor copy-pasted find_timens_vvar_page() helper into one copy Bluetooth: hci_qca: Fix the teardown problem for real timers: Update the documentation to reflect on the new timer_shutdown() API timers: Provide timer_shutdown[_sync]() timers: Add shutdown mechanism to the internal functions timers: Split [try_to_]del_timer[_sync]() to prepare for shutdown mode ...
2022-11-24timers: Update the documentation to reflect on the new timer_shutdown() APISteven Rostedt (Google)1-1/+1
In order to make sure that a timer is not re-armed after it is stopped before freeing, a new shutdown state is added to the timer code. The API timer_shutdown_sync() and timer_shutdown() must be called before the object that holds the timer can be freed. Update the documentation to reflect this new workflow. [ tglx: Updated to the new semantics and updated the zh_CN version ] Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Link: https://lore.kernel.org/r/20221110064147.712934793@goodmis.org Link: https://lore.kernel.org/r/20221123201625.375284489@linutronix.de
2022-11-24Documentation: Replace del_timer/del_timer_sync()Thomas Gleixner1-1/+1
Adjust to the new preferred function names. Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Link: https://lore.kernel.org/r/20221123201625.075320635@linutronix.de
2022-11-21math64: favor kernel-doc from header filesLiam Beguin1-3/+0
Fix the kernel-doc markings for div64 functions to point to the header file instead of the lib/ directory. This avoids having implementation specific comments in generic documentation. Furthermore, given that some kernel-doc comments are identical, drop them from lib/math64 and only keep there comments that add implementation details. Signed-off-by: Liam Beguin <liambeguin@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20221118182309.3824530-1-liambeguin@gmail.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-10-28string: Rewrite and add more kern-doc for the str*() functionsKees Cook1-0/+3
While there were varying degrees of kern-doc for various str*()-family functions, many needed updating and clarification, or to just be entirely written. Update (and relocate) existing kern-doc and add missing functions, sadly shaking my head at how many times I have written "Do not use this function". Include the results in the core kernel API doc. Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Andy Shevchenko <andy@kernel.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-hardening@vger.kernel.org Tested-by: Akira Yokosawa <akiyks@gmail.com> Link: https://lore.kernel.org/lkml/9b0cf584-01b3-3013-b800-1ef59fe82476@gmail.com Signed-off-by: Kees Cook <keescook@chromium.org>
2022-10-25overflow: Fix kern-doc markup for functionsKees Cook1-0/+6
Fix the kern-doc markings for several of the overflow helpers and move their location into the core kernel API documentation, where it belongs (it's not driver-specific). Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: linux-hardening@vger.kernel.org Reviewed-by: Akira Yokosawa <akiyks@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2022-10-10Merge tag 'mm-stable-2022-10-08' of ↵Linus Torvalds3-3/+218
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in linux-next for a couple of months without, to my knowledge, any negative reports (or any positive ones, come to that). - Also the Maple Tree from Liam Howlett. An overlapping range-based tree for vmas. It it apparently slightly more efficient in its own right, but is mainly targeted at enabling work to reduce mmap_lock contention. Liam has identified a number of other tree users in the kernel which could be beneficially onverted to mapletrees. Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat at [1]. This has yet to be addressed due to Liam's unfortunately timed vacation. He is now back and we'll get this fixed up. - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses clang-generated instrumentation to detect used-unintialized bugs down to the single bit level. KMSAN keeps finding bugs. New ones, as well as the legacy ones. - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of memory into THPs. - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support file/shmem-backed pages. - userfaultfd updates from Axel Rasmussen - zsmalloc cleanups from Alexey Romanov - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure - Huang Ying adds enhancements to NUMA balancing memory tiering mode's page promotion, with a new way of detecting hot pages. - memcg updates from Shakeel Butt: charging optimizations and reduced memory consumption. - memcg cleanups from Kairui Song. - memcg fixes and cleanups from Johannes Weiner. - Vishal Moola provides more folio conversions - Zhang Yi removed ll_rw_block() :( - migration enhancements from Peter Xu - migration error-path bugfixes from Huang Ying - Aneesh Kumar added ability for a device driver to alter the memory tiering promotion paths. For optimizations by PMEM drivers, DRM drivers, etc. - vma merging improvements from Jakub Matěn. - NUMA hinting cleanups from David Hildenbrand. - xu xin added aditional userspace visibility into KSM merging activity. - THP & KSM code consolidation from Qi Zheng. - more folio work from Matthew Wilcox. - KASAN updates from Andrey Konovalov. - DAMON cleanups from Kaixu Xia. - DAMON work from SeongJae Park: fixes, cleanups. - hugetlb sysfs cleanups from Muchun Song. - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core. Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1] * tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits) hugetlb: allocate vma lock for all sharable vmas hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer hugetlb: fix vma lock handling during split vma and range unmapping mglru: mm/vmscan.c: fix imprecise comments mm/mglru: don't sync disk for each aging cycle mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol mm: memcontrol: use do_memsw_account() in a few more places mm: memcontrol: deprecate swapaccounting=0 mode mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled mm/secretmem: remove reduntant return value mm/hugetlb: add available_huge_pages() func mm: remove unused inline functions from include/linux/mm_inline.h selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd selftests/vm: add thp collapse shmem testing selftests/vm: add thp collapse file and tmpfs testing selftests/vm: modularize thp collapse memory operations selftests/vm: dedup THP helpers mm/khugepaged: add tracepoint to hpage_collapse_scan_file() mm/madvise: add file and shmem support to MADV_COLLAPSE ...
2022-10-03Merge tag 'rust-v6.1-rc1' of https://github.com/Rust-for-Linux/linuxLinus Torvalds1-0/+10
Pull Rust introductory support from Kees Cook: "The tree has a recent base, but has fundamentally been in linux-next for a year and a half[1]. It's been updated based on feedback from the Kernel Maintainer's Summit, and to gain recent Reviewed-by: tags. Miguel is the primary maintainer, with me helping where needed/wanted. Our plan is for the tree to switch to the standard non-rebasing practice once this initial infrastructure series lands. The contents are the absolute minimum to get Rust code building in the kernel, with many more interfaces[2] (and drivers - NVMe[3], 9p[4], M1 GPU[5]) on the way. The initial support of Rust-for-Linux comes in roughly 4 areas: - Kernel internals (kallsyms expansion for Rust symbols, %pA format) - Kbuild infrastructure (Rust build rules and support scripts) - Rust crates and bindings for initial minimum viable build - Rust kernel documentation and samples Rust support has been in linux-next for a year and a half now, and the short log doesn't do justice to the number of people who have contributed both to the Linux kernel side but also to the upstream Rust side to support the kernel's needs. Thanks to these 173 people, and many more, who have been involved in all kinds of ways: Miguel Ojeda, Wedson Almeida Filho, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Andreas Hindborg, Adam Bratschi-Kaye, Benno Lossin, Maciej Falkowski, Finn Behrens, Sven Van Asbroeck, Asahi Lina, FUJITA Tomonori, John Baublitz, Wei Liu, Geoffrey Thomas, Philip Herron, Arthur Cohen, David Faust, Antoni Boucher, Philip Li, Yujie Liu, Jonathan Corbet, Greg Kroah-Hartman, Paul E. McKenney, Josh Triplett, Kent Overstreet, David Gow, Alice Ryhl, Robin Randhawa, Kees Cook, Nick Desaulniers, Matthew Wilcox, Linus Walleij, Joe Perches, Michael Ellerman, Petr Mladek, Masahiro Yamada, Arnaldo Carvalho de Melo, Andrii Nakryiko, Konstantin Shelekhin, Rasmus Villemoes, Konstantin Ryabitsev, Stephen Rothwell, Andy Shevchenko, Sergey Senozhatsky, John Paul Adrian Glaubitz, David Laight, Nathan Chancellor, Jonathan Cameron, Daniel Latypov, Shuah Khan, Brendan Higgins, Julia Lawall, Laurent Pinchart, Geert Uytterhoeven, Akira Yokosawa, Pavel Machek, David S. Miller, John Hawley, James Bottomley, Arnd Bergmann, Christian Brauner, Dan Robertson, Nicholas Piggin, Zhouyi Zhou, Elena Zannoni, Jose E. Marchesi, Leon Romanovsky, Will Deacon, Richard Weinberger, Randy Dunlap, Paolo Bonzini, Roland Dreier, Mark Brown, Sasha Levin, Ted Ts'o, Steven Rostedt, Jarkko Sakkinen, Michal Kubecek, Marco Elver, Al Viro, Keith Busch, Johannes Berg, Jan Kara, David Sterba, Connor Kuehl, Andy Lutomirski, Andrew Lunn, Alexandre Belloni, Peter Zijlstra, Russell King, Eric W. Biederman, Willy Tarreau, Christoph Hellwig, Emilio Cobos Álvarez, Christian Poveda, Mark Rousskov, John Ericson, TennyZhuang, Xuanwo, Daniel Paoliello, Manish Goregaokar, comex, Josh Stone, Stephan Sokolow, Philipp Krones, Guillaume Gomez, Joshua Nelson, Mats Larsen, Marc Poulhiès, Samantha Miller, Esteban Blanc, Martin Schmidt, Martin Rodriguez Reboredo, Daniel Xu, Viresh Kumar, Bartosz Golaszewski, Vegard Nossum, Milan Landaverde, Dariusz Sosnowski, Yuki Okushi, Matthew Bakhtiari, Wu XiangCheng, Tiago Lam, Boris-Chengbiao Zhou, Sumera Priyadarsini, Viktor Garske, Niklas Mohrin, Nándor István Krácser, Morgan Bartlett, Miguel Cano, Léo Lanteri Thauvin, Julian Merkle, Andreas Reindl, Jiapeng Chong, Fox Chen, Douglas Su, Antonio Terceiro, SeongJae Park, Sergio González Collado, Ngo Iok Ui (Wu Yu Wei), Joshua Abraham, Milan, Daniel Kolsoi, ahomescu, Manas, Luis Gerhorst, Li Hongyu, Philipp Gesang, Russell Currey, Jalil David Salamé Messina, Jon Olson, Raghvender, Angelos, Kaviraj Kanagaraj, Paul Römer, Sladyn Nunes, Mauro Baladés, Hsiang-Cheng Yang, Abhik Jain, Hongyu Li, Sean Nash, Yuheng Su, Peng Hao, Anhad Singh, Roel Kluin, Sara Saa, Geert Stappers, Garrett LeSage, IFo Hancroft, and Linus Torvalds" Link: https://lwn.net/Articles/849849/ [1] Link: https://github.com/Rust-for-Linux/linux/commits/rust [2] Link: https://github.com/metaspace/rust-linux/commit/d88c3744d6cbdf11767e08bad56cbfb67c4c96d0 [3] Link: https://github.com/wedsonaf/linux/commit/9367032607f7670de0ba1537cf09ab0f4365a338 [4] Link: https://github.com/AsahiLinux/linux/commits/gpu/rust-wip [5] * tag 'rust-v6.1-rc1' of https://github.com/Rust-for-Linux/linux: (27 commits) MAINTAINERS: Rust samples: add first Rust examples x86: enable initial Rust support docs: add Rust documentation Kbuild: add Rust support rust: add `.rustfmt.toml` scripts: add `is_rust_module.sh` scripts: add `rust_is_available.sh` scripts: add `generate_rust_target.rs` scripts: add `generate_rust_analyzer.py` scripts: decode_stacktrace: demangle Rust symbols scripts: checkpatch: enable language-independent checks for Rust scripts: checkpatch: diagnose uses of `%pA` in the C side as errors vsprintf: add new `%pA` format specifier rust: export generated symbols rust: add `kernel` crate rust: add `bindings` crate rust: add `macros` crate rust: add `compiler_builtins` crate rust: adapt `alloc` crate to the kernel ...
2022-10-03mm/page_alloc: remove obsolete gfpflags_normal_context()Miaohe Lin1-3/+0
Since commit dacb5d8875cc ("tcp: fix page frag corruption on page fault"), there's no caller of gfpflags_normal_context(). Remove it as this helper is strictly tied to the sk page frag usage and there won't be other user in the future. [linmiaohe@huawei.com: fix htmldocs] Link: https://lkml.kernel.org/r/1bc55727-9b66-0e9e-c306-f10c4716ea89@huawei.com Link: https://lkml.kernel.org/r/20220916072257.9639-16-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-29docs: put atomic*.txt and memory-barriers.txt into the core-api bookJonathan Corbet4-0/+58
These files describe part of the core API, but have never been converted to RST due to ... let's say local oppposition. So, create a set of special-purpose wrappers to ..include these files into a separate page so that they can be a part of the htmldocs build. Then link them into the core-api manual and remove them from the "staging" dumping ground. Acked-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Reviewed-by: David Vernet <void@manifault.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20220927160559.97154-7-corbet@lwn.net Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-09-29docs: move asm-annotations.rst into core-apiJonathan Corbet2-0/+223
This one file should not really be in the top-level documentation directory. core-api/ may not be a perfect fit but seems to be best, so move it there. Adjust a couple of internal document references to make them location-independent, and point checkpatch.pl at the new location. Cc: Jiri Slaby <jirislaby@kernel.org> Cc: Joe Perches <joe@perches.com> Reviewed-by: David Vernet <void@manifault.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Acked-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20220927160559.97154-6-corbet@lwn.net Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-09-28vsprintf: add new `%pA` format specifierGary Guo1-0/+10
This patch adds a format specifier `%pA` to `vsprintf` which formats a pointer as `core::fmt::Arguments`. Doing so allows us to directly format to the internal buffer of `printf`, so we do not have to use a temporary buffer on the stack to pre-assemble the message on the Rust side. This specifier is intended only to be used from Rust and not for C, so `checkpatch.pl` is intentionally unchanged to catch any misuse. Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Co-developed-by: Alex Gaynor <alex.gaynor@gmail.com> Signed-off-by: Alex Gaynor <alex.gaynor@gmail.com> Co-developed-by: Wedson Almeida Filho <wedsonaf@google.com> Signed-off-by: Wedson Almeida Filho <wedsonaf@google.com> Signed-off-by: Gary Guo <gary@garyguo.net> Co-developed-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2022-09-27Remove duplicate words inside documentationAkhil Raj1-1/+1
I have removed repeated `the` inside the documentation Signed-off-by: Akhil Raj <lf32.dev@gmail.com> Link: https://lore.kernel.org/r/20220827145359.32599-1-lf32.dev@gmail.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-09-26Maple Tree: add new data structureLiam R. Howlett2-0/+218
Patch series "Introducing the Maple Tree" The maple tree is an RCU-safe range based B-tree designed to use modern processor cache efficiently. There are a number of places in the kernel that a non-overlapping range-based tree would be beneficial, especially one with a simple interface. If you use an rbtree with other data structures to improve performance or an interval tree to track non-overlapping ranges, then this is for you. The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf nodes. With the increased branching factor, it is significantly shorter than the rbtree so it has fewer cache misses. The removal of the linked list between subsequent entries also reduces the cache misses and the need to pull in the previous and next VMA during many tree alterations. The first user that is covered in this patch set is the vm_area_struct, where three data structures are replaced by the maple tree: the augmented rbtree, the vma cache, and the linked list of VMAs in the mm_struct. The long term goal is to reduce or remove the mmap_lock contention. The plan is to get to the point where we use the maple tree in RCU mode. Readers will not block for writers. A single write operation will be allowed at a time. A reader re-walks if stale data is encountered. VMAs would be RCU enabled and this mode would be entered once multiple tasks are using the mm_struct. Davidlor said : Yes I like the maple tree, and at this stage I don't think we can ask for : more from this series wrt the MM - albeit there seems to still be some : folks reporting breakage. Fundamentally I see Liam's work to (re)move : complexity out of the MM (not to say that the actual maple tree is not : complex) by consolidating the three complimentary data structures very : much worth it considering performance does not take a hit. This was very : much a turn off with the range locking approach, which worst case scenario : incurred in prohibitive overhead. Also as Liam and Matthew have : mentioned, RCU opens up a lot of nice performance opportunities, and in : addition academia[1] has shown outstanding scalability of address spaces : with the foundation of replacing the locked rbtree with RCU aware trees. A similar work has been discovered in the academic press https://pdos.csail.mit.edu/papers/rcuvm:asplos12.pdf Sheer coincidence. We designed our tree with the intention of solving the hardest problem first. Upon settling on a b-tree variant and a rough outline, we researched ranged based b-trees and RCU b-trees and did find that article. So it was nice to find reassurances that we were on the right path, but our design choice of using ranges made that paper unusable for us. This patch (of 70): The maple tree is an RCU-safe range based B-tree designed to use modern processor cache efficiently. There are a number of places in the kernel that a non-overlapping range-based tree would be beneficial, especially one with a simple interface. If you use an rbtree with other data structures to improve performance or an interval tree to track non-overlapping ranges, then this is for you. The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf nodes. With the increased branching factor, it is significantly shorter than the rbtree so it has fewer cache misses. The removal of the linked list between subsequent entries also reduces the cache misses and the need to pull in the previous and next VMA during many tree alterations. The first user that is covered in this patch set is the vm_area_struct, where three data structures are replaced by the maple tree: the augmented rbtree, the vma cache, and the linked list of VMAs in the mm_struct. The long term goal is to reduce or remove the mmap_lock contention. The plan is to get to the point where we use the maple tree in RCU mode. Readers will not block for writers. A single write operation will be allowed at a time. A reader re-walks if stale data is encountered. VMAs would be RCU enabled and this mode would be entered once multiple tasks are using the mm_struct. There is additional BUG_ON() calls added within the tree, most of which are in debug code. These will be replaced with a WARN_ON() call in the future. There is also additional BUG_ON() calls within the code which will also be reduced in number at a later date. These exist to catch things such as out-of-range accesses which would crash anyways. Link: https://lkml.kernel.org/r/20220906194824.2110408-1-Liam.Howlett@oracle.com Link: https://lkml.kernel.org/r/20220906194824.2110408-2-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: David Howells <dhowells@redhat.com> Tested-by: Sven Schnelle <svens@linux.ibm.com> Tested-by: Yu Zhao <yuzhao@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: SeongJae Park <sj@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-18Documentation: irqdomain: Fix typo of "at least once"Eric Lin1-1/+1
Signed-off-by: Eric Lin <dslin1010@gmail.com> Link: https://lore.kernel.org/r/20220811091516.2107908-1-dslin1010@gmail.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-08-07Merge tag 'bitmap-6.0-rc1' of https://github.com/norov/linuxLinus Torvalds1-4/+4
Pull bitmap updates from Yury Norov: - fix the duplicated comments on bitmap_to_arr64() (Qu Wenruo) - optimize out non-atomic bitops on compile-time constants (Alexander Lobakin) - cleanup bitmap-related headers (Yury Norov) - x86/olpc: fix 'logical not is only applied to the left hand side' (Alexander Lobakin) - lib/nodemask: inline wrappers around bitmap (Yury Norov) * tag 'bitmap-6.0-rc1' of https://github.com/norov/linux: (26 commits) lib/nodemask: inline next_node_in() and node_random() powerpc: drop dependency on <asm/machdep.h> in archrandom.h x86/olpc: fix 'logical not is only applied to the left hand side' lib/cpumask: move some one-line wrappers to header file headers/deps: mm: align MANITAINERS and Docs with new gfp.h structure headers/deps: mm: Split <linux/gfp_types.h> out of <linux/gfp.h> headers/deps: mm: Optimize <linux/gfp.h> header dependencies lib/cpumask: move trivial wrappers around find_bit to the header lib/cpumask: change return types to unsigned where appropriate cpumask: change return types to bool where appropriate lib/bitmap: change type of bitmap_weight to unsigned long lib/bitmap: change return types to bool where appropriate arm: align find_bit declarations with generic kernel iommu/vt-d: avoid invalid memory access via node_online(NUMA_NO_NODE) lib/test_bitmap: test the tail after bitmap_to_arr64() lib/bitmap: fix off-by-one in bitmap_to_arr64() lib: test_bitmap: add compile-time optimization/evaluations assertions bitmap: don't assume compiler evaluates small mem*() builtins calls net/ice: fix initializing the bitmap in the switch code bitops: let optimize out non-atomic bitops on compile-time constants ...
2022-08-06Merge tag 'dma-mapping-5.20-2022-08-06' of ↵Linus Torvalds1-0/+14
git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping updates from Christoph Hellwig: - convert arm32 to the common dma-direct code (Arnd Bergmann, Robin Murphy, Christoph Hellwig) - restructure the PCIe peer to peer mapping support (Logan Gunthorpe) - allow the IOMMU code to communicate an optional DMA mapping length and use that in scsi and libata (John Garry) - split the global swiotlb lock (Tianyu Lan) - various fixes and cleanup (Chao Gao, Dan Carpenter, Dongli Zhang, Lukas Bulwahn, Robin Murphy) * tag 'dma-mapping-5.20-2022-08-06' of git://git.infradead.org/users/hch/dma-mapping: (45 commits) swiotlb: fix passing local variable to debugfs_create_ulong() dma-mapping: reformat comment to suppress htmldoc warning PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg() RDMA/rw: drop pci_p2pdma_[un]map_sg() RDMA/core: introduce ib_dma_pci_p2p_dma_supported() nvme-pci: convert to using dma_map_sgtable() nvme-pci: check DMA ops when indicating support for PCI P2PDMA iommu/dma: support PCI P2PDMA pages in dma-iommu map_sg iommu: Explicitly skip bus address marked segments in __iommu_map_sg() dma-mapping: add flags to dma_map_ops to indicate PCI P2PDMA support dma-direct: support PCI P2PDMA pages in dma-direct map_sg dma-mapping: allow EREMOTEIO return code for P2PDMA transfers PCI/P2PDMA: Introduce helpers for dma_map_sg implementations PCI/P2PDMA: Attempt to set map_type if it has not been set lib/scatterlist: add flag for indicating P2PDMA segments in an SGL swiotlb: clean up some coding style and minor issues dma-mapping: update comment after dmabounce removal scsi: sd: Add a comment about limiting max_sectors to shost optimal limit ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit ...
2022-08-05Merge tag 'mm-stable-2022-08-03' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Most of the MM queue. A few things are still pending. Liam's maple tree rework didn't make it. This has resulted in a few other minor patch series being held over for next time. Multi-gen LRU still isn't merged as we were waiting for mapletree to stabilize. The current plan is to merge MGLRU into -mm soon and to later reintroduce mapletree, with a view to hopefully getting both into 6.1-rc1. Summary: - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe Lin, Yang Shi, Anshuman Khandual and Mike Rapoport - Some kmemleak fixes from Patrick Wang and Waiman Long - DAMON updates from SeongJae Park - memcg debug/visibility work from Roman Gushchin - vmalloc speedup from Uladzislau Rezki - more folio conversion work from Matthew Wilcox - enhancements for coherent device memory mapping from Alex Sierra - addition of shared pages tracking and CoW support for fsdax, from Shiyang Ruan - hugetlb optimizations from Mike Kravetz - Mel Gorman has contributed some pagealloc changes to improve latency and realtime behaviour. - mprotect soft-dirty checking has been improved by Peter Xu - Many other singleton patches all over the place" [ XFS merge from hell as per Darrick Wong in https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ] * tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits) tools/testing/selftests/vm/hmm-tests.c: fix build mm: Kconfig: fix typo mm: memory-failure: convert to pr_fmt() mm: use is_zone_movable_page() helper hugetlbfs: fix inaccurate comment in hugetlbfs_statfs() hugetlbfs: cleanup some comments in inode.c hugetlbfs: remove unneeded header file hugetlbfs: remove unneeded hugetlbfs_ops forward declaration hugetlbfs: use helper macro SZ_1{K,M} mm: cleanup is_highmem() mm/hmm: add a test for cross device private faults selftests: add soft-dirty into run_vmtests.sh selftests: soft-dirty: add test for mprotect mm/mprotect: fix soft-dirty check in can_change_pte_writable() mm: memcontrol: fix potential oom_lock recursion deadlock mm/gup.c: fix formatting in check_and_migrate_movable_page() xfs: fail dax mount if reflink is enabled on a partition mm/memcontrol.c: remove the redundant updating of stats_flush_threshold userfaultfd: don't fail on unrecognized features hugetlb_cgroup: fix wrong hugetlb cgroup numa stat ...
2022-08-05Merge tag 'asm-generic-6.0' of ↵Linus Torvalds3-235/+0
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "There are three independent sets of changes: - Sai Prakash Ranjan adds tracing support to the asm-generic version of the MMIO accessors, which is intended to help understand problems with device drivers and has been part of Qualcomm's vendor kernels for many years - A patch from Sebastian Siewior to rework the handling of IRQ stacks in softirqs across architectures, which is needed for enabling PREEMPT_RT - The last patch to remove the CONFIG_VIRT_TO_BUS option and some of the code behind that, after the last users of this old interface made it in through the netdev, scsi, media and staging trees" * tag 'asm-generic-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: uapi: asm-generic: fcntl: Fix typo 'the the' in comment arch/*/: remove CONFIG_VIRT_TO_BUS soc: qcom: geni: Disable MMIO tracing for GENI SE serial: qcom_geni_serial: Disable MMIO tracing for geni serial asm-generic/io: Add logging support for MMIO accessors KVM: arm64: Add a flag to disable MMIO trace for nVHE KVM lib: Add register read/write tracing support drm/meson: Fix overflow implicit truncation warnings irqchip/tegra: Fix overflow implicit truncation warnings coresight: etm4x: Use asm-generic IO memory barriers arm64: io: Use asm-generic high level MMIO accessors arch/*: Disable softirq stacks on PREEMPT_RT.
2022-08-03Merge tag 'xarray-6.0' of git://git.infradead.org/users/willy/xarrayLinus Torvalds1-0/+3
Pull XArray/IDR updates from Matthew Wilcox: - Add appropriate might_alloc() annotations to the XArray APIs - Document that the IDR is deprecated * tag 'xarray-6.0' of git://git.infradead.org/users/willy/xarray: IDR: Note that the IDR API is deprecated XArray: Add calls to might_alloc()
2022-08-01Merge tag 'x86_mm_for_v6.0_rc1' of ↵Linus Torvalds1-23/+21
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Borislav Petkov: - Rename a PKRU macro to make more sense when reading the code - Update pkeys documentation - Avoid reading contended mm's TLB generation var if not absolutely necessary along with fixing a case where arch_tlbbatch_flush() doesn't adhere to the generation scheme and thus violates the conditions for the above avoidance. * tag 'x86_mm_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm/tlb: Ignore f->new_tlb_gen when zero x86/pkeys: Clarify PKRU_AD_KEY macro Documentation/protection-keys: Clean up documentation for User Space pkeys x86/mm/tlb: Avoid reading mm_tlb_gen when possible
2022-07-19dma-mapping: add dma_opt_mapping_size()John Garry1-0/+14
Streaming DMA mapping involving an IOMMU may be much slower for larger total mapping size. This is because every IOMMU DMA mapping requires an IOVA to be allocated and freed. IOVA sizes above a certain limit are not cached, which can have a big impact on DMA mapping performance. Provide an API for device drivers to know this "optimal" limit, such that they may try to produce mapping which don't exceed it. Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-15headers/deps: mm: align MANITAINERS and Docs with new gfp.h structureYury Norov1-4/+4
After moving gfp types out of gfp.h, we have to align MAINTAINERS and Docs, to avoid warnings like this: >> include/linux/gfp.h:1: warning: 'Page mobility and placement hints' not found >> include/linux/gfp.h:1: warning: 'Watermark modifiers' not found >> include/linux/gfp.h:1: warning: 'Reclaim modifiers' not found >> include/linux/gfp.h:1: warning: 'Useful GFP flag combinations' not found Signed-off-by: Yury Norov <yury.norov@gmail.com>
2022-07-10IDR: Note that the IDR API is deprecatedMatthew Wilcox (Oracle)1-0/+3
Some people read the documentation, perhaps. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-07-01doc: module: update file referencesMasahiro Yamada2-3/+3
Adjust documents to the file moves made by commit cfc1d277891e ("module: Move all into module/"). Thanks to Yanteng Si for helping me to update Documentation/translations/zh_CN/core-api/kernel-api.rst Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Yanteng Si <siyanteng@loongson.cn> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2022-06-28arch/*/: remove CONFIG_VIRT_TO_BUSArnd Bergmann3-235/+0
All architecture-independent users of virt_to_bus() and bus_to_virt() have been fixed to use the dma mapping interfaces or have been removed now. This means the definitions on most architectures, and the CONFIG_VIRT_TO_BUS symbol are now obsolete and can be removed. The only exceptions to this are a few network and scsi drivers for m68k Amiga and VME machines and ppc32 Macintosh. These drivers work correctly with the old interfaces and are probably not worth changing. On alpha and parisc, virt_to_bus() were still used in asm/floppy.h. alpha can use isa_virt_to_bus() like x86 does, and parisc can just open-code the virt_to_phys() here, as this is architecture specific code. I tried updating the bus-virt-phys-mapping.rst documentation, which started as an email from Linus to explain some details of the Linux-2.0 driver interfaces. The bits about virt_to_bus() were declared obsolete backin 2000, and the rest is not all that relevant any more, so in the end I just decided to remove the file completely. Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Helge Deller <deller@gmx.de> # parisc Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-06-27docs: rename Documentation/vm to Documentation/mmMike Rapoport1-1/+1
so it will be consistent with code mm directory and with Documentation/admin-guide/mm and won't be confused with virtual machines. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Tested-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Jonathan Corbet <corbet@lwn.net> Acked-by: Wu XiangCheng <bobwxc@email.cn>
2022-06-07Documentation/protection-keys: Clean up documentation for User Space pkeysIra Weiny1-23/+21
The documentation for user space pkeys was a bit dated including things such as Amazon and distribution testing information which is irrelevant now. Update the documentation. This also streamlines adding the Supervisor pkey documentation later on. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20220419170649.1022246-2-ira.weiny@intel.com
2022-05-25Merge tag 'docs-5.19' of git://git.lwn.net/linuxLinus Torvalds2-0/+344
Pull documentation updates from Jonathan Corbet: "It was a moderately busy cycle for documentation; highlights include: - After a long period of inactivity, the Japanese translations are seeing some much-needed maintenance and updating. - Reworked IOMMU documentation - Some new documentation for static-analysis tools - A new overall structure for the memory-management documentation. This is an LSFMM outcome that, it is hoped, will help encourage developers to fill in the many gaps. Optimism is eternal...but hopefully it will work. - More Chinese translations. Plus the usual typo fixes, updates, etc" * tag 'docs-5.19' of git://git.lwn.net/linux: (70 commits) docs: pdfdocs: Add space for chapter counts >= 100 in TOC docs/zh_CN: Add dev-tools/gdb-kernel-debugging.rst Chinese translation input: Docs: correct ntrig.rst typo input: Docs: correct atarikbd.rst typos MAINTAINERS: Become the docs/zh_CN maintainer docs/zh_CN: fix devicetree usage-model translation mm,doc: Add new documentation structure Documentation: drop more IDE boot options and ide-cd.rst Documentation/process: use scripts/get_maintainer.pl on patches MAINTAINERS: Add entry for DOCUMENTATION/JAPANESE docs/trans/ja_JP/howto: Don't mention specific kernel versions docs/ja_JP/SubmittingPatches: Request summaries for commit references docs/ja_JP/SubmittingPatches: Add Suggested-by as a standard signature docs/ja_JP/SubmittingPatches: Randy has moved docs/ja_JP/SubmittingPatches: Suggest the use of scripts/get_maintainer.pl docs/ja_JP/SubmittingPatches: Update GregKH links Documentation/sysctl: document max_rcu_stall_to_panic Documentation: add missing angle bracket in cgroup-v2 doc Documentation: dev-tools: use literal block instead of code-block docs/zh_CN: add vm numa translation ...
2022-05-25Merge tag 'printk-for-5.19' of ↵Linus Torvalds2-0/+138
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Offload writing printk() messages on consoles to per-console kthreads. It prevents soft-lockups when an extensive amount of messages is printed. It was observed, for example, during boot of large systems with a lot of peripherals like disks or network interfaces. It prevents live-lockups that were observed, for example, when messages about allocation failures were reported and a CPU handled consoles instead of reclaiming the memory. It was hard to solve even with rate limiting because it would need to take into account the amount of messages and the speed of all consoles. It is a must to have for real time. Otherwise, any printk() might break latency guarantees. The per-console kthreads allow to handle each console on its own speed. Slow consoles do not longer slow down faster ones. And printk() does not longer unpredictably slows down various code paths. There are situations when the kthreads are either not available or not reliable, for example, early boot, suspend, or panic. In these situations, printk() uses the legacy mode and tries to handle consoles immediately. - Add documentation for the printk index. * tag 'printk-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk, tracing: fix console tracepoint printk: remove @console_locked printk: extend console_lock for per-console locking printk: add kthread console printers printk: add functions to prefer direct printing printk: add pr_flush() printk: move buffer definitions into console_emit_next_record() caller printk: refactor and rework printing logic printk: add con_printk() macro for console details printk: call boot_delay_msec() in printk_delay() printk: get caller_id/timestamp after migration disable printk: wake waiters for safe and NMI contexts printk: wake up all waiters printk: add missing memory barrier to wake_up_klogd() printk: cpu sync always disable interrupts printk: rename cpulock functions printk/index: Printk index feature documentation MAINTAINERS: Add printk indexing maintainers on mention of printk_index
2022-04-22Documentation: move watch_queue to core-apiRandy Dunlap2-0/+344
Move watch_queue documentation to the core-api index and subdirectory. Fixes: c73be61cede5 ("pipe: Add general notification queue support") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-04-14timekeeping: Introduce fast accessor to clock taiKurt Kanzenbach1-0/+1
Introduce fast/NMI safe accessor to clock tai for tracing. The Linux kernel tracing infrastructure has support for using different clocks to generate timestamps for trace events. Especially in TSN networks it's useful to have TAI as trace clock, because the application scheduling is done in accordance to the network time, which is based on TAI. With a tai trace_clock in place, it becomes very convenient to correlate network activity with Linux kernel application traces. Use the same implementation as ktime_get_boot_fast_ns() does by reading the monotonic time and adding the TAI offset. The same limitations as for the fast boot implementation apply. The TAI offset may change at run time e.g., by setting the time or using adjtimex() with an offset. However, these kind of offset changes are rare events. Nevertheless, the user has to be aware and deal with it in post processing. An alternative approach would be to use the same implementation as ktime_get_real_fast_ns() does. However, this requires to add an additional u64 member to the tk_read_base struct. This struct together with a seqcount is designed to fit into a single cache line on 64 bit architectures. Adding a new member would violate this constraint. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20220414091805.89667-2-kurt@linutronix.de
2022-04-13printk/index: Printk index feature documentationPetr Mladek2-0/+138
Document the printk index feature. The primary motivation is to explain that it is not creating KABI from particular printk() calls. Acked-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Chris Down <chris@chrisdown.name> Signed-off-by: Petr Mladek <pmladek@suse.com>
2022-04-01Merge tag 'xarray-5.18' of git://git.infradead.org/users/willy/xarrayLinus Torvalds1-5/+9
Pull XArray updates from Matthew Wilcox: - Documentation update - Fix test-suite build after move of bitmap.h - Fix xas_create_range() when a large entry is already present - Fix xas_split() of a shadow entry * tag 'xarray-5.18' of git://git.infradead.org/users/willy/xarray: XArray: Update the LRU list in xas_split() XArray: Fix xas_create_range() when multi-order entry present XArray: Include bitmap.h from xarray.h XArray: Document the locking requirement for the xa_state
2022-03-28Reinstate some of "swiotlb: rework "fix info leak with DMA_FROM_DEVICE""Linus Torvalds1-8/+0
Halil Pasic points out [1] that the full revert of that commit (revert in bddac7c1e02b), and that a partial revert that only reverts the problematic case, but still keeps some of the cleanups is probably better.  And that partial revert [2] had already been verified by Oleksandr Natalenko to also fix the issue, I had just missed that in the long discussion. So let's reinstate the cleanups from commit aa6f8dcbab47 ("swiotlb: rework "fix info leak with DMA_FROM_DEVICE""), and effectively only revert the part that caused problems. Link: https://lore.kernel.org/all/20220328013731.017ae3e3.pasic@linux.ibm.com/ [1] Link: https://lore.kernel.org/all/20220324055732.GB12078@lst.de/ [2] Link: https://lore.kernel.org/all/4386660.LvFx2qVVIh@natalenko.name/ [3] Suggested-by: Halil Pasic <pasic@linux.ibm.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Cc: Christoph Hellwig" <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-26Revert "swiotlb: rework "fix info leak with DMA_FROM_DEVICE""Linus Torvalds1-0/+8
This reverts commit aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13. It turns out this breaks at least the ath9k wireless driver, and possibly others. What the ath9k driver does on packet receive is to set up the DMA transfer with: int ath_rx_init(..) .. bf->bf_buf_addr = dma_map_single(sc->dev, skb->data, common->rx_bufsize, DMA_FROM_DEVICE); and then the receive logic (through ath_rx_tasklet()) will fetch incoming packets static bool ath_edma_get_buffers(..) .. dma_sync_single_for_cpu(sc->dev, bf->bf_buf_addr, common->rx_bufsize, DMA_FROM_DEVICE); ret = ath9k_hw_process_rxdesc_edma(ah, rs, skb->data); if (ret == -EINPROGRESS) { /*let device gain the buffer again*/ dma_sync_single_for_device(sc->dev, bf->bf_buf_addr, common->rx_bufsize, DMA_FROM_DEVICE); return false; } and it's worth noting how that first DMA sync: dma_sync_single_for_cpu(..DMA_FROM_DEVICE); is there to make sure the CPU can read the DMA buffer (possibly by copying it from the bounce buffer area, or by doing some cache flush). The iommu correctly turns that into a "copy from bounce bufer" so that the driver can look at the state of the packets. In the meantime, the device may continue to write to the DMA buffer, but we at least have a snapshot of the state due to that first DMA sync. But that _second_ DMA sync: dma_sync_single_for_device(..DMA_FROM_DEVICE); is telling the DMA mapping that the CPU wasn't interested in the area because the packet wasn't there. In the case of a DMA bounce buffer, that is a no-op. Note how it's not a sync for the CPU (the "for_device()" part), and it's not a sync for data written by the CPU (the "DMA_FROM_DEVICE" part). Or rather, it _should_ be a no-op. That's what commit aa6f8dcbab47 broke: it made the code bounce the buffer unconditionally, and changed the DMA_FROM_DEVICE to just unconditionally and illogically be DMA_TO_DEVICE. [ Side note: purely within the confines of the swiotlb driver it wasn't entirely illogical: The reason it did that odd DMA_FROM_DEVICE -> DMA_TO_DEVICE conversion thing is because inside the swiotlb driver, it uses just a swiotlb_bounce() helper that doesn't care about the whole distinction of who the sync is for - only which direction to bounce. So it took the "sync for device" to mean that the CPU must have been the one writing, and thought it meant DMA_TO_DEVICE. ] Also note how the commentary in that commit was wrong, probably due to that whole confusion, claiming that the commit makes the swiotlb code "bounce unconditionally (that is, also when dir == DMA_TO_DEVICE) in order do avoid synchronising back stale data from the swiotlb buffer" which is nonsensical for two reasons: - that "also when dir == DMA_TO_DEVICE" is nonsensical, as that was exactly when it always did - and should do - the bounce. - since this is a sync for the device (not for the CPU), we're clearly fundamentally not coping back stale data from the bounce buffers at all, because we'd be copying *to* the bounce buffers. So that commit was just very confused. It confused the direction of the synchronization (to the device, not the cpu) with the direction of the DMA (from the device). Reported-and-bisected-by: Oleksandr Natalenko <oleksandr@natalenko.name> Reported-by: Olha Cherevyk <olha.cherevyk@gmail.com> Cc: Halil Pasic <pasic@linux.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Kalle Valo <kvalo@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Toke Høiland-Jørgensen <toke@toke.dk> Cc: Maxime Bizon <mbizon@freebox.fr> Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22Merge tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecacheLinus Torvalds1-9/+9
Pull folio updates from Matthew Wilcox: - Rewrite how munlock works to massively reduce the contention on i_mmap_rwsem (Hugh Dickins): https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/ - Sort out the page refcount mess for ZONE_DEVICE pages (Christoph Hellwig): https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/ - Convert GUP to use folios and make pincount available for order-1 pages. (Matthew Wilcox) - Convert a few more truncation functions to use folios (Matthew Wilcox) - Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew Wilcox) - Convert rmap_walk to use folios (Matthew Wilcox) - Convert most of shrink_page_list() to use a folio (Matthew Wilcox) - Add support for creating large folios in readahead (Matthew Wilcox) * tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache: (114 commits) mm/damon: minor cleanup for damon_pa_young selftests/vm/transhuge-stress: Support file-backed PMD folios mm/filemap: Support VM_HUGEPAGE for file mappings mm/readahead: Switch to page_cache_ra_order mm/readahead: Align file mappings for non-DAX mm/readahead: Add large folio readahead mm: Support arbitrary THP sizes mm: Make large folios depend on THP mm: Fix READ_ONLY_THP warning mm/filemap: Allow large folios to be added to the page cache mm: Turn can_split_huge_page() into can_split_folio() mm/vmscan: Convert pageout() to take a folio mm/vmscan: Turn page_check_references() into folio_check_references() mm/vmscan: Account large folios correctly mm/vmscan: Optimise shrink_page_list for non-PMD-sized folios mm/vmscan: Free non-shmem folios without splitting them mm/rmap: Constify the rmap_walk_control argument mm/rmap: Convert rmap_walk() to take a folio mm: Turn page_anon_vma() into folio_anon_vma() mm/rmap: Turn page_lock_anon_vma_read() into folio_lock_anon_vma_read() ...
2022-03-22Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-2/+17
Merge updates from Andrew Morton: - A few misc subsystems: kthread, scripts, ntfs, ocfs2, block, and vfs - Most the MM patches which precede the patches in Willy's tree: kasan, pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap, sparsemem, vmalloc, pagealloc, memory-failure, mlock, hugetlb, userfaultfd, vmscan, compaction, mempolicy, oom-kill, migration, thp, cma, autonuma, psi, ksm, page-poison, madvise, memory-hotplug, rmap, zswap, uaccess, ioremap, highmem, cleanups, kfence, hmm, and damon. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (227 commits) mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release() Docs/ABI/testing: add DAMON sysfs interface ABI document Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface selftests/damon: add a test for DAMON sysfs interface mm/damon/sysfs: support DAMOS stats mm/damon/sysfs: support DAMOS watermarks mm/damon/sysfs: support schemes prioritization mm/damon/sysfs: support DAMOS quotas mm/damon/sysfs: support DAMON-based Operation Schemes mm/damon/sysfs: support the physical address space monitoring mm/damon/sysfs: link DAMON for virtual address spaces monitoring mm/damon: implement a minimal stub for sysfs-based DAMON interface mm/damon/core: add number of each enum type values mm/damon/core: allow non-exclusive DAMON start/stop Docs/damon: update outdated term 'regions update interval' Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling Docs/vm/damon: call low level monitoring primitives the operations mm/damon: remove unnecessary CONFIG_DAMON option mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}() mm/damon/dbgfs-test: fix is_target_id() change ...
2022-03-22mm: document and polish read-ahead codeNeilBrown1-2/+17
Add some "big-picture" documentation for read-ahead and polish the code to make it fit this documentation. The meaning of ->async_size is clarified to match its name. i.e. Any request to ->readahead() has a sync part and an async part. The caller will wait for the sync pages to complete, but will not wait for the async pages. The first async page is still marked PG_readahead Note that the current function names page_cache_sync_ra() and page_cache_async_ra() are misleading. All ra request are partly sync and partly async, so either part can be empty. A page_cache_sync_ra() request will usually set ->async_size non-zero, implying it is not all synchronous. When a non-zero req_count is passed to page_cache_async_ra(), the implication is that some prefix of the request is synchronous, though the calculation made there is incorrect - I haven't tried to fix it. Link: https://lkml.kernel.org/r/164549983734.9187.11586890887006601405.stgit@noble.brown Signed-off-by: NeilBrown <neilb@suse.de> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Lars Ellenberg <lars.ellenberg@linbit.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Paolo Valente <paolo.valente@linaro.org> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-21mm: Make compound_pincount always availableMatthew Wilcox (Oracle)1-9/+9
Move compound_pincount from the third page to the second page, which means it's available for all compound pages. That lets us delete hpage_pincount_available(). On 32-bit systems, there isn't enough space for both compound_pincount and compound_nr in the second page (it would collide with page->private, which is in use for pages in the swap cache), so revert the optimisation of storing both compound_order and compound_nr on 32-bit systems. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-03-07swiotlb: rework "fix info leak with DMA_FROM_DEVICE"Halil Pasic1-8/+0
Unfortunately, we ended up merging an old version of the patch "fix info leak with DMA_FROM_DEVICE" instead of merging the latest one. Christoph (the swiotlb maintainer), he asked me to create an incremental fix (after I have pointed this out the mix up, and asked him for guidance). So here we go. The main differences between what we got and what was agreed are: * swiotlb_sync_single_for_device is also required to do an extra bounce * We decided not to introduce DMA_ATTR_OVERWRITE until we have exploiters * The implantation of DMA_ATTR_OVERWRITE is flawed: DMA_ATTR_OVERWRITE must take precedence over DMA_ATTR_SKIP_CPU_SYNC Thus this patch removes DMA_ATTR_OVERWRITE, and makes swiotlb_sync_single_for_device() bounce unconditionally (that is, also when dir == DMA_TO_DEVICE) in order do avoid synchronising back stale data from the swiotlb buffer. Let me note, that if the size used with dma_sync_* API is less than the size used with dma_[un]map_*, under certain circumstances we may still end up with swiotlb not being transparent. In that sense, this is no perfect fix either. To get this bullet proof, we would have to bounce the entire mapping/bounce buffer. For that we would have to figure out the starting address, and the size of the mapping in swiotlb_sync_single_for_device(). While this does seem possible, there seems to be no firm consensus on how things are supposed to work. Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Fixes: ddbd89deb7d3 ("swiotlb: fix info leak with DMA_FROM_DEVICE") Cc: stable@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-14swiotlb: fix info leak with DMA_FROM_DEVICEHalil Pasic1-0/+8
The problem I'm addressing was discovered by the LTP test covering cve-2018-1000204. A short description of what happens follows: 1) The test case issues a command code 00 (TEST UNIT READY) via the SG_IO interface with: dxfer_len == 524288, dxdfer_dir == SG_DXFER_FROM_DEV and a corresponding dxferp. The peculiar thing about this is that TUR is not reading from the device. 2) In sg_start_req() the invocation of blk_rq_map_user() effectively bounces the user-space buffer. As if the device was to transfer into it. Since commit a45b599ad808 ("scsi: sg: allocate with __GFP_ZERO in sg_build_indirect()") we make sure this first bounce buffer is allocated with GFP_ZERO. 3) For the rest of the story we keep ignoring that we have a TUR, so the device won't touch the buffer we prepare as if the we had a DMA_FROM_DEVICE type of situation. My setup uses a virtio-scsi device and the buffer allocated by SG is mapped by the function virtqueue_add_split() which uses DMA_FROM_DEVICE for the "in" sgs (here scatter-gather and not scsi generics). This mapping involves bouncing via the swiotlb (we need swiotlb to do virtio in protected guest like s390 Secure Execution, or AMD SEV). 4) When the SCSI TUR is done, we first copy back the content of the second (that is swiotlb) bounce buffer (which most likely contains some previous IO data), to the first bounce buffer, which contains all zeros. Then we copy back the content of the first bounce buffer to the user-space buffer. 5) The test case detects that the buffer, which it zero-initialized, ain't all zeros and fails. One can argue that this is an swiotlb problem, because without swiotlb we leak all zeros, and the swiotlb should be transparent in a sense that it does not affect the outcome (if all other participants are well behaved). Copying the content of the original buffer into the swiotlb buffer is the only way I can think of to make swiotlb transparent in such scenarios. So let's do just that if in doubt, but allow the driver to tell us that the whole mapped buffer is going to be overwritten, in which case we can preserve the old behavior and avoid the performance impact of the extra bounce. Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-03XArray: Document the locking requirement for the xa_stateMatthew Wilcox (Oracle)1-5/+9
It wasn't obvious to all readers that it's unsafe to reuse an xa_state after dropping the xas_lock() or the rcu_read_lock(). Reported-by: Charan Teja Kalla <charante@codeaurora.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-01-27Documentation: core-api: entry: Add comments about nestingNicolas Saenz Julienne1-0/+18
The topic of nesting and reentrancy in the context of early entry code hasn't been addressed so far. So do it. Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20220110105044.94423-2-nsaenzju@redhat.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-01-27Documentation: Fill the gaps about entry/noinstr constraintsThomas Gleixner2-0/+269
The entry/exit handling for exceptions, interrupts, syscalls and KVM is not really documented except for some comments. Fill the gaps. Signed-off-by: Thomas Gleixner <tglx@linutronix.de Co-developed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> ---- Changes since v3: - s/nointr/noinstr/ Changes since v2: - No big content changes, just style corrections, so it should be pretty clean at this stage. In the light of this, I kept Mark's Reviewed-by. - Paul's style and paragraph re-writes - Randy's style comments - Add links to transition type sections Documentation/core-api/entry.rst | 261 +++++++++++++++++++++++++++++++ Documentation/core-api/index.rst | 8 + 2 files changed, 269 insertions(+) create mode 100644 Documentation/core-api/entry.rst Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20220110105044.94423-1-nsaenzju@redhat.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-01-12Merge tag 'iomap-5.17' of git://git.infradead.org/users/willy/linuxLinus Torvalds1-0/+1
Pull iomap updates from Matthew Wilcox: "Convert xfs/iomap to use folios. This should be all that is needed for XFS to use large folios. There is no code in this pull request to create large folios, but no additional changes should be needed to XFS or iomap once they are created. Usually this would have come from Darrick, and we had intended that it would come that route. Between the holidays and various things which Darrick needed to work on, he asked if I could send things directly. There weren't any other iomap patches pending for this release, which probably also played a role" * tag 'iomap-5.17' of git://git.infradead.org/users/willy/linux: (26 commits) iomap: Inline __iomap_zero_iter into its caller xfs: Support large folios iomap: Support large folios in invalidatepage iomap: Convert iomap_migrate_page() to use folios iomap: Convert iomap_add_to_ioend() to take a folio iomap: Simplify iomap_do_writepage() iomap: Simplify iomap_writepage_map() iomap,xfs: Convert ->discard_page to ->discard_folio iomap: Convert iomap_write_end_inline to take a folio iomap: Convert iomap_write_begin() and iomap_write_end() to folios iomap: Convert __iomap_zero_iter to use a folio iomap: Allow iomap_write_begin() to be called with the full length iomap: Convert iomap_page_mkwrite to use a folio iomap: Convert readahead and readpage to use a folio iomap: Convert iomap_read_inline_data to take a folio iomap: Use folio offsets instead of page offsets iomap: Convert bio completions to use folios iomap: Pass the iomap_page into iomap_set_range_uptodate iomap: Add iomap_invalidate_folio iomap: Convert iomap_releasepage to use a folio ...
2022-01-12Merge tag 'driver-core-5.17-rc1' of ↵Linus Torvalds1-9/+7
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the set of changes for the driver core for 5.17-rc1. Lots of little things here, including: - kobj_type cleanups - auxiliary_bus documentation updates - auxiliary_device conversions for some drivers (relevant subsystems all have provided acks for these) - kernfs lock contention reduction for some workloads - other tiny cleanups and changes. All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (43 commits) kobject documentation: remove default_attrs information drivers/firmware: Add missing platform_device_put() in sysfb_create_simplefb debugfs: lockdown: Allow reading debugfs files that are not world readable driver core: Make bus notifiers in right order in really_probe() driver core: Move driver_sysfs_remove() after driver_sysfs_add() firmware: edd: remove empty default_attrs array firmware: dmi-sysfs: use default_groups in kobj_type qemu_fw_cfg: use default_groups in kobj_type firmware: memmap: use default_groups in kobj_type sh: sq: use default_groups in kobj_type headers/uninline: Uninline single-use function: kobject_has_children() devtmpfs: mount with noexec and nosuid driver core: Simplify async probe test code by using ktime_ms_delta() nilfs2: use default_groups in kobj_type kobject: remove kset from struct kset_uevent_ops callbacks driver core: make kobj_type constant. driver core: platform: document registration-failure requirement vdpa/mlx5: Use auxiliary_device driver data helpers net/mlx5e: Use auxiliary_device driver data helpers soundwire: intel: Use auxiliary_device driver data helpers ...
2022-01-07kobject documentation: remove default_attrs informationGreg Kroah-Hartman1-3/+2
Since commit aa30f47cf666 ("kobject: Add support for default attribute groups to kobj_type") we have been encouraging the use of default_groups instead of default_attrs, so reflect that information in the documentation as well so that no new users get added while the kernel is converted over to not use this field anymore. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20220104105024.1014313-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-28kobject: remove kset from struct kset_uevent_ops callbacksGreg Kroah-Hartman1-4/+3
There is no need to pass the pointer to the kset in the struct kset_uevent_ops callbacks as no one uses it, so just remove that pointer entirely. Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Wedson Almeida Filho <wedsonaf@google.com> Link: https://lore.kernel.org/r/20211227163924.3970661-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-27driver core: make kobj_type constant.Wedson Almeida Filho1-2/+2
This way instances of kobj_type (which contain function pointers) can be stored in .rodata, which means that they cannot be [easily/accidentally] modified at runtime. Signed-off-by: Wedson Almeida Filho <wedsonaf@google.com> Link: https://lore.kernel.org/r/20211224231345.777370-1-wedsonaf@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-16block: Add bio_for_each_folio_all()Matthew Wilcox (Oracle)1-0/+1
Allow callers to iterate over each folio instead of each page. The bio need not have been constructed using folios originally. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-11-29block: remove blk-exec.cChristoph Hellwig1-3/+0
All this code is tightly coupled to the blk-mq core, so move it there. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20211117061404.331732-4-hch@lst.de [axboe: remove doc generation for blk-exec.c] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-06Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-3/+0
Merge misc updates from Andrew Morton: "257 patches. Subsystems affected by this patch series: scripts, ocfs2, vfs, and mm (slab-generic, slab, slub, kconfig, dax, kasan, debug, pagecache, gup, swap, memcg, pagemap, mprotect, mremap, iomap, tracing, vmalloc, pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, tools, memblock, oom-kill, hugetlbfs, migration, thp, readahead, nommu, ksm, vmstat, madvise, memory-hotplug, rmap, zsmalloc, highmem, zram, cleanups, kfence, and damon)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (257 commits) mm/damon: remove return value from before_terminate callback mm/damon: fix a few spelling mistakes in comments and a pr_debug message mm/damon: simplify stop mechanism Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions Docs/admin-guide/mm/damon/start: simplify the content Docs/admin-guide/mm/damon/start: fix a wrong link Docs/admin-guide/mm/damon/start: fix wrong example commands mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on mm/damon: remove unnecessary variable initialization Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM) selftests/damon: support watermarks mm/damon/dbgfs: support watermarks mm/damon/schemes: activate schemes based on a watermarks mechanism tools/selftests/damon: update for regions prioritization of schemes mm/damon/dbgfs: support prioritization weights mm/damon/vaddr,paddr: support pageout prioritization mm/damon/schemes: prioritize regions within the quotas mm/damon/selftests: support schemes quotas mm/damon/dbgfs: support quotas of schemes ...
2021-11-06mm/memory_hotplug: remove HIGHMEM leftoversDavid Hildenbrand1-3/+0
We don't support CONFIG_MEMORY_HOTPLUG on 32 bit and consequently not HIGHMEM. Let's remove any leftover code -- including the unused "status_change_nid_high" field part of the memory notifier. Link: https://lkml.kernel.org/r/20210929143600.49379-5-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Alex Shi <alexs@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-02Merge branch 'for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds1-4/+17
Pull workqueue updates from Tejun Heo: "Nothing too interesting. An optimization to short-circuit noop cpumask updates, debug dump code reorg, and doc update" * 'for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: doc: Call out the non-reentrance conditions workqueue: Introduce show_one_worker_pool and show_one_workqueue. workqueue: make sysfs of unbound kworker cpumask more clever
2021-11-02Merge tag 'printk-for-5.16' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Extend %pGp print format to print hex value of the page flags - Use kvmalloc instead of kmalloc to allocate devkmsg buffers - Misc cleanup and warning fixes * tag 'printk-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: vsprintf: Update %pGp documentation about that it prints hex value lib/vsprintf.c: Amend static asserts for format specifier flags vsprintf: Make %pGp print the hex value test_printf: Append strings more efficiently test_printf: Remove custom appending of '|' test_printf: Remove separate page_flags variable test_printf: Make pft array const ia64: don't do IA64_CMPXCHG_DEBUG without CONFIG_PRINTK printk: use gnu_printf format attribute for printk_sprint() printk: avoid -Wsometimes-uninitialized warning printk: use kvmalloc instead of kmalloc for devkmsg_user
2021-11-01Merge tag 'irq-core-2021-10-31' of ↵Linus Torvalds1-3/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates for the interrupt subsystem: Core changes: - Prevent a potential deadlock when initial priority is assigned to a newly created interrupt thread. A recent change to plug a race between cpuset and __sched_setscheduler() introduced a new lock dependency which is now triggered. Break the lock dependency chain by moving the priority assignment to the thread function. - A couple of small updates to make the irq core RT safe. - Confine the irq_cpu_online/offline() API to the only left unfixable user Cavium Octeon so that it does not grow new usage. - A small documentation update Driver changes: - A large cross architecture rework to move irq_enter/exit() into the architecture code to make addressing the NOHZ_FULL/RCU issues simpler. - The obligatory new irq chip driver for Microchip EIC - Modularize a few irq chip drivers - Expand usage of devm_*() helpers throughout the driver code - The usual small fixes and improvements all over the place" * tag 'irq-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) h8300: Fix linux/irqchip.h include mess dt-bindings: irqchip: renesas-irqc: Document r8a774e1 bindings MIPS: irq: Avoid an unused-variable error genirq: Hide irq_cpu_{on,off}line() behind a deprecated option irqchip/mips-gic: Get rid of the reliance on irq_cpu_online() MIPS: loongson64: Drop call to irq_cpu_offline() irq: remove handle_domain_{irq,nmi}() irq: remove CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY irq: riscv: perform irqentry in entry code irq: openrisc: perform irqentry in entry code irq: csky: perform irqentry in entry code irq: arm64: perform irqentry in entry code irq: arm: perform irqentry in entry code irq: add a (temporary) CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY irq: nds32: avoid CONFIG_HANDLE_DOMAIN_IRQ irq: arc: avoid CONFIG_HANDLE_DOMAIN_IRQ irq: add generic_handle_arch_irq() irq: unexport handle_irq_desc() irq: simplify handle_domain_{irq,nmi}() irq: mips: simplify do_domain_IRQ() ...
2021-11-01vsprintf: Update %pGp documentation about that it prints hex valuePetr Mladek1-1/+1
The commit 23efd0804c0a869dfb1e7 ("vsprintf: Make %pGp print the hex value") changed the behavior of %pGp printk format. Update the documentation accordingly. Fixes: 23efd0804c0a869dfb1e7 ("vsprintf: Make %pGp print the hex value") Reviewed-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/YXlKqCPY9suM4mfT@alley
2021-10-26irq: remove handle_domain_{irq,nmi}()Mark Rutland1-3/+0
Now that entry code handles IRQ entry (including setting the IRQ regs) before calling irqchip code, irqchip code can safely call generic_handle_domain_irq(), and there's no functional reason for it to call handle_domain_irq(). Let's cement this split of responsibility and remove handle_domain_irq() entirely, updating irqchip drivers to call generic_handle_domain_irq(). For consistency, handle_domain_nmi() is similarly removed and replaced with a generic_handle_domain_nmi() function which also does not perform any entry logic. Previously handle_domain_{irq,nmi}() had a WARN_ON() which would fire when they were called in an inappropriate context. So that we can identify similar issues going forward, similar WARN_ON_ONCE() logic is added to the generic_handle_*() functions, and comments are updated for clarity and consistency. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2021-10-25workqueue: doc: Call out the non-reentrance conditionsBoqun Feng1-4/+17
The current doc of workqueue API suggests that work items are non-reentrant: any work item is guaranteed to be executed by at most one worker system-wide at any given time. However this is not true, the following case can cause a work item W executed by two workers at the same time: queue_work_on(0, WQ1, W); // after a worker picks up W and clear the pending bit queue_work_on(1, WQ2, W); // workers on CPU0 and CPU1 will execute W in the same time. , which means the non-reentrance of a work item is conditional, and Lai Jiangshan provided a nice summary[1] of the conditions, therefore use it to describe a work item instance and improve the doc. [1]: https://lore.kernel.org/lkml/CAJhGHyDudet_xyNk=8xnuO2==o-u06s0E0GZVP4Q67nmQ84Ceg@mail.gmail.com/ Suggested-by: Matthew Wilcox <willy@infradead.org> Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2021-10-18mm: Add flush_dcache_folio()Matthew Wilcox (Oracle)1-0/+6
This is a default implementation which calls flush_dcache_page() on each page in the folio. If architectures can do better, they should implement their own version of it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Vlastimil Babka <vbabka@suse.cz>
2021-09-27mm/util: Add folio_mapping() and folio_file_mapping()Matthew Wilcox (Oracle)1-0/+2
These are the folio equivalent of page_mapping() and page_file_mapping(). Add an out-of-line page_mapping() wrapper around folio_mapping() in order to prevent the page_folio() call from bloating every caller of page_mapping(). Adjust page_file_mapping() and page_mapping_file() to use folios internally. Rename __page_file_mapping() to swapcache_mapping() and change it to take a folio. This ends up saving 122 bytes of text overall. folio_mapping() is 45 bytes shorter than page_mapping() was, but the new page_mapping() wrapper is 30 bytes. The major reduction is a few bytes less in dozens of nfs functions (which call page_file_mapping()). Most of these appear to be a slight change in gcc's register allocation decisions, which allow: 48 8b 56 08 mov 0x8(%rsi),%rdx 48 8d 42 ff lea -0x1(%rdx),%rax 83 e2 01 and $0x1,%edx 48 0f 44 c6 cmove %rsi,%rax to become: 48 8b 46 08 mov 0x8(%rsi),%rax 48 8d 78 ff lea -0x1(%rax),%rdi a8 01 test $0x1,%al 48 0f 44 fe cmove %rsi,%rdi for a reduction of a single byte. Once the NFS client is converted to use folios, this entire sequence will disappear. Also add folio_mapping() documentation. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: David Howells <dhowells@redhat.com>
2021-09-27mm/lru: Add folio LRU functionsMatthew Wilcox (Oracle)1-0/+1
Handle arbitrary-order folios being added to the LRU. By definition, all pages being added to the LRU were already head or base pages, but call page_folio() on them anyway to get the type right and avoid the buried calls to compound_head(). Saves 783 bytes of kernel text; no functions grow. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Yu Zhao <yuzhao@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Howells <dhowells@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz>
2021-09-27mm: Add folio reference count functionsMatthew Wilcox (Oracle)1-0/+1
These functions mirror their page reference counterparts. Also add the kernel-doc to the mm-api and correct the return type of page_ref_add_unless() to bool. No change to generated code. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: David Howells <dhowells@redhat.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com>
2021-09-27mm: Introduce struct folioMatthew Wilcox (Oracle)1-0/+1
A struct folio is a new abstraction to replace the venerable struct page. A function which takes a struct folio argument declares that it will operate on the entire (possibly compound) page, not just PAGE_SIZE bytes. In return, the caller guarantees that the pointer it is passing does not point to a tail page. No change to generated code. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Jeff Layton <jlayton@kernel.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Howells <dhowells@redhat.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com>
2021-09-24Merge tag 'irqchip-fixes-5.15-1' of ↵Thomas Gleixner1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent Pull irqchip fixes from Marc Zyngier: - Work around a bad GIC integration on a Renesas platform, where the interconnect cannot deal with byte-sized MMIO accesses - Cleanup another Renesas driver abusing the comma operator - Fix a potential GICv4 memory leak on an error path - Make the type of 'size' consistent with the rest of the code in __irq_domain_add() - Fix a regression in the Armada 370-XP IPI path - Fix the build for the obviously unloved goldfish-pic - Some documentation fixes Link: https://lore.kernel.org/r/20210924090933.2766857-1-maz@kernel.org
2021-09-12Merge tag 'smp-urgent-2021-09-12' of ↵Linus Torvalds1-99/+480
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU hotplug updates from Thomas Gleixner: "Updates for the SMP and CPU hotplug: - Remove DEFINE_SMP_CALL_CACHE_FUNCTION() which is a left over of the original hotplug code and now causing trouble with the ARM64 cache topology setup due to the pointless SMP function call. It's not longer required as the hotplug callbacks are guaranteed to be invoked on the upcoming CPU. - Remove the deprecated and now unused CPU hotplug functions - Rewrite the CPU hotplug API documentation" * tag 'smp-urgent-2021-09-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation: core-api/cpuhotplug: Rewrite the API section cpu/hotplug: Remove deprecated CPU-hotplug functions. thermal: Replace deprecated CPU-hotplug functions. drivers: base: cacheinfo: Get rid of DEFINE_SMP_CALL_CACHE_FUNCTION()
2021-09-11Merge tag 'block-5.15-2021-09-11' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+3
Pull block fixes from Jens Axboe: - NVMe pull request from Christoph: - fix nvmet command set reporting for passthrough controllers (Adam Manzanares) - update a MAINTAINERS email address (Chaitanya Kulkarni) - set QUEUE_FLAG_NOWAIT for nvme-multipth (me) - handle errors from add_disk() (Luis Chamberlain) - update the keep alive interval when kato is modified (Tatsuya Sasaki) - fix a buffer overrun in nvmet_subsys_attr_serial (Hannes Reinecke) - do not reset transport on data digest errors in nvme-tcp (Daniel Wagner) - only call synchronize_srcu when clearing current path (Daniel Wagner) - revalidate paths during rescan (Hannes Reinecke) - Split out the fs/block_dev into block/fops.c and block/bdev.c, which has been long overdue. Do this now before -rc1, to avoid annoying conflicts due to this (Christoph) - blk-throtl use-after-free fix (Li) - Improve plug depth for multi-device plugs, greatly increasing md resync performance (Song) - blkdev_show() locking fix (Tetsuo) - n64cart error check fix (Yang) * tag 'block-5.15-2021-09-11' of git://git.kernel.dk/linux-block: n64cart: fix return value check in n64cart_probe() blk-mq: allow 4x BLK_MAX_REQUEST_COUNT at blk_plug for multiple_queues block: move fs/block_dev.c to block/bdev.c block: split out operations on block special files blk-throttle: fix UAF by deleteing timer in blk_throtl_exit() block: genhd: don't call blkdev_show() with major_names_lock held nvme: update MAINTAINERS email address nvme: add error handling support for add_disk() nvme: only call synchronize_srcu when clearing current path nvme: update keep alive interval when kato is modified nvme-tcp: Do not reset transport on data digest errors nvmet: fixup buffer overrun in nvmet_subsys_attr_serial() nvmet: return bool from nvmet_passthru_ctrl and nvmet_is_passthru_req nvmet: looks at the passthrough controller when initializing CAP nvme: move nvme_multi_css into nvme.h nvme-multipath: revalidate paths during rescan nvme-multipath: set QUEUE_FLAG_NOWAIT
2021-09-11Documentation: core-api/cpuhotplug: Rewrite the API sectionThomas Gleixner1-99/+480
Dave stumbled over the incomplete and confusing documentation of the CPU hotplug API. Rewrite it, add the missing function documentations and correct the existing ones. Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210909123212.489059409@linutronix.de
2021-09-07block: move fs/block_dev.c to block/bdev.cChristoph Hellwig1-0/+3
Move it together with the rest of the block layer. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210907141303.1371844-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-03Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-49/+37
Merge misc updates from Andrew Morton: "173 patches. Subsystems affected by this series: ia64, ocfs2, block, and mm (debug, pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap, bootmem, sparsemem, vmalloc, kasan, pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, compaction, mempolicy, memblock, oom-kill, migration, ksm, percpu, vmstat, and madvise)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (173 commits) mm/madvise: add MADV_WILLNEED to process_madvise() mm/vmstat: remove unneeded return value mm/vmstat: simplify the array size calculation mm/vmstat: correct some wrong comments mm/percpu,c: remove obsolete comments of pcpu_chunk_populated() selftests: vm: add COW time test for KSM pages selftests: vm: add KSM merging time test mm: KSM: fix data type selftests: vm: add KSM merging across nodes test selftests: vm: add KSM zero page merging test selftests: vm: add KSM unmerge test selftests: vm: add KSM merge test mm/migrate: correct kernel-doc notation mm: wire up syscall process_mrelease mm: introduce process_mrelease system call memblock: make memblock_find_in_range method private mm/mempolicy.c: use in_task() in mempolicy_slab_node() mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies mm/mempolicy: advertise new MPOL_PREFERRED_MANY mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY ...
2021-09-03mm: remove flush_kernel_dcache_pageChristoph Hellwig1-49/+37
flush_kernel_dcache_page is a rather confusing interface that implements a subset of flush_dcache_page by not being able to properly handle page cache mapped pages. The only callers left are in the exec code as all other previous callers were incorrect as they could have dealt with page cache pages. Replace the calls to flush_kernel_dcache_page with calls to flush_dcache_page, which for all architectures does either exactly the same thing, can contains one or more of the following: 1) an optimization to defer the cache flush for page cache pages not mapped into userspace 2) additional flushing for mapped page cache pages if cache aliases are possible Link: https://lkml.kernel.org/r/20210712060928.4161649-7-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Cc: Alex Shi <alexs@kernel.org> Cc: Geoff Levand <geoff@infradead.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Guo Ren <guoren@kernel.org> Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Cercueil <paul@crapouillou.net> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Ulf Hansson <ulf.hansson@linaro.org> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Yoshinori Sato <ysato@users.osdn.me> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-03Documentation: Fix irq-domain.rst build warningMarc Zyngier1-2/+3
Correctly escape the * not to be used as emphasis. Also take this opportunity to clarify the fate of the rest of the legacy APIs. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210903085343.923036-1-maz@kernel.org
2021-09-01Merge tag 'docs-5.15' of git://git.lwn.net/linuxLinus Torvalds2-14/+25
Pull documentation updates from Jonathan Corbet: "Yet another set of documentation changes: - A reworking of PDF generation to yield better results for documents using CJK fonts in particular. - A new set of translations into traditional Chinese, a dialect for which I am assured there is a community of interested readers. - A lot more regular Chinese translation work as well. ... plus the usual assortment of updates, fixes, typo tweaks, etc" * tag 'docs-5.15' of git://git.lwn.net/linux: (55 commits) docs: sphinx-requirements: Move sphinx_rtd_theme to top docs: pdfdocs: Enable language-specific font choice of zh_TW translations docs: pdfdocs: Teach xeCJK about character classes of quotation marks docs: pdfdocs: Permit AutoFakeSlant for CJK fonts docs: pdfdocs: One-half spacing for CJK translations docs: pdfdocs: Add conf.py local to translations for ascii-art alignment docs: pdfdocs: Preserve inter-phrase space in Korean translations docs: pdfdocs: Choose Serif font as CJK mainfont if possible docs: pdfdocs: Add CJK-language-specific font settings docs: pdfdocs: Refactor config for CJK document scripts/kernel-doc: Override -Werror from KCFLAGS with KDOC_WERROR docs/zh_CN: Add zh_CN/accounting/psi.rst doc: align Italian translation Documentation/features/vm: riscv supports THP now docs/zh_CN: add infiniband user_verbs translation docs/zh_CN: add infiniband user_mad translation docs/zh_CN: add infiniband tag_matching translation docs/zh_CN: add infiniband sysfs translation docs/zh_CN: add infiniband opa_vnic translation docs/zh_CN: add infiniband ipoib translation ...
2021-09-01Merge tag 'printk-for-5.15' of ↵Linus Torvalds1-4/+1
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Optionally, provide an index of possible printk messages via <debugfs>/printk/index/. It can be used when monitoring important kernel messages on a farm of various hosts. The monitor has to be updated when some messages has changed or are not longer available by a newly deployed kernel. - Add printk.console_no_auto_verbose boot parameter. It allows to generate crash dump even with slow consoles in a reasonable time frame. - Remove printk_safe buffers. The messages are always stored directly to the main logbuffer, even in NMI or recursive context. Also it allows to serialize syslog operations by a mutex instead of a spin lock. - Misc clean up and build fixes. * tag 'printk-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk/index: Fix -Wunused-function warning lib/nmi_backtrace: Serialize even messages about idle CPUs printk: Add printk.console_no_auto_verbose boot parameter printk: Remove console_silent() lib/test_scanf: Handle n_bits == 0 in random tests printk: syslog: close window between wait and read printk: convert @syslog_lock to mutex printk: remove NMI tracking printk: remove safe buffers printk: track/limit recursion lib/nmi_backtrace: explicitly serialize banner and regs printk: Move the printk() kerneldoc comment to its new home printk/index: Fix warning about missing prototypes MIPS/asm/printk: Fix build failure caused by printk printk: index: Add indexing support to dev_printk printk: Userspace format indexing support printk: Rework parse_prefix into printk_parse_prefix printk: Straighten out log_flags into printk_info_flags string_helpers: Escape double quotes in escape_special printk/console: Check consistent sequence number when handling race in console_unlock()
2021-08-30Merge tag 'irq-core-2021-08-30' of ↵Linus Torvalds1-3/+25
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates to the interrupt core and driver subsystems: Core changes: - The usual set of small fixes and improvements all over the place, but nothing stands out MSI changes: - Further consolidation of the PCI/MSI interrupt chip code - Make MSI sysfs code independent of PCI/MSI and expose the MSI interrupts of platform devices in the same way as PCI exposes them. Driver changes: - Support for ARM GICv3 EPPI partitions - Treewide conversion to generic_handle_domain_irq() for all chained interrupt controllers - Conversion to bitmap_zalloc() throughout the irq chip drivers - The usual set of small fixes and improvements" * tag 'irq-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits) platform-msi: Add ABI to show msi_irqs of platform devices genirq/msi: Move MSI sysfs handling from PCI to MSI core genirq/cpuhotplug: Demote debug printk to KERN_DEBUG irqchip/qcom-pdc: Trim unused levels of the interrupt hierarchy irqdomain: Export irq_domain_disconnect_hierarchy() irqchip/gic-v3: Fix priority comparison when non-secure priorities are used irqchip/apple-aic: Fix irq_disable from within irq handlers pinctrl/rockchip: drop the gpio related codes gpio/rockchip: drop irq_gc_lock/irq_gc_unlock for irq set type gpio/rockchip: support next version gpio controller gpio/rockchip: use struct rockchip_gpio_regs for gpio controller gpio/rockchip: add driver for rockchip gpio dt-bindings: gpio: change items restriction of clock for rockchip,gpio-bank pinctrl/rockchip: add pinctrl device to gpio bank struct pinctrl/rockchip: separate struct rockchip_pin_bank to a head file pinctrl/rockchip: always enable clock for gpio controller genirq: Fix kernel doc indentation EDAC/altera: Convert to generic_handle_domain_irq() powerpc: Bulk conversion to generic_handle_domain_irq() nios2: Bulk conversion to generic_handle_domain_irq() ...
2021-08-28Documentation: Replace deprecated CPU-hotplug functions.Sebastian Andrzej Siewior1-1/+1
The functions get_online_cpus() and put_online_cpus() have been deprecated during the CPU hotplug rework. They map directly to cpus_read_lock() and cpus_read_unlock(). Update the documentation accordingly. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210803141621.780504-2-bigeasy@linutronix.de
2021-08-12Documentation: Update irq_domain.rst with new lookup APIsMarc Zyngier1-3/+25
Catch up with the recent irqdomain updates, and document {generic_,}handle_domain_irq(), irq_resolve_mapping() as well as the deprecation of some of the older APIs. Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-07-26printk: Move the printk() kerneldoc comment to its new homeJonathan Corbet1-4/+1
Commit 337015573718 ("printk: Userspace format indexing support") turned printk() into a macro, but left the kerneldoc comment for it with the (now) _printk() function, resulting in this docs-build warning: kernel/printk/printk.c:1: warning: 'printk' not found Move the kerneldoc comment back next to the (now) macro it's meant to describe and have the docs build find it there. Fixes: 337015573718b161 ("printk: Userspace format indexing support") Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/87o8aqt7qn.fsf@meer.lwn.net
2021-07-25docs: printk-formats: fix build warningIoana Ciornei1-0/+1
Add an empty line after the '::' starting the code block so that the following lines are properly interpreted. Without this, the following build warnings are visible. Documentation/core-api/printk-formats.rst:136: WARNING: Unexpected indentation. Documentation/core-api/printk-formats.rst:137: WARNING: Block quote ends without a blank line; unexpected unindent. Fixes: 9294523e3768 ("module: add printk formats to add module build ID to stacktraces") Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Acked-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20210722100356.635078-2-ciorneiioana@gmail.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-07-15docs/core-api: Modify document layoutYanteng Si1-14/+24
Modify the layout of the document and remove unnecessary symbols. Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Reviewed-by: Wu XiangCheng <bobwxc@email.cn> Link: https://lore.kernel.org/r/f151bbc0d1ff6cf24611a698c76b90181f005f8d.1625798719.git.siyanteng@loongson.cn Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-07-08module: add printk formats to add module build ID to stacktracesStephen Boyd1-0/+11
Let's make kernel stacktraces easier to identify by including the build ID[1] of a module if the stacktrace is printing a symbol from a module. This makes it simpler for developers to locate a kernel module's full debuginfo for a particular stacktrace. Combined with scripts/decode_stracktrace.sh, a developer can download the matching debuginfo from a debuginfod[2] server and find the exact file and line number for the functions plus offsets in a stacktrace that match the module. This is especially useful for pstore crash debugging where the kernel crashes are recorded in something like console-ramoops and the recovery kernel/modules are different or the debuginfo doesn't exist on the device due to space concerns (the debuginfo can be too large for space limited devices). Originally, I put this on the %pS format, but that was quickly rejected given that %pS is used in other places such as ftrace where build IDs aren't meaningful. There was some discussions on the list to put every module build ID into the "Modules linked in:" section of the stacktrace message but that quickly becomes very hard to read once you have more than three or four modules linked in. It also provides too much information when we don't expect each module to be traversed in a stacktrace. Having the build ID for modules that aren't important just makes things messy. Splitting it to multiple lines for each module quickly explodes the number of lines printed in an oops too, possibly wrapping the warning off the console. And finally, trying to stash away each module used in a callstack to provide the ID of each symbol printed is cumbersome and would require changes to each architecture to stash away modules and return their build IDs once unwinding has completed. Instead, we opt for the simpler approach of introducing new printk formats '%pS[R]b' for "pointer symbolic backtrace with module build ID" and '%pBb' for "pointer backtrace with module build ID" and then updating the few places in the architecture layer where the stacktrace is printed to use this new format. Before: Call trace: lkdtm_WARNING+0x28/0x30 [lkdtm] direct_entry+0x16c/0x1b4 [lkdtm] full_proxy_write+0x74/0xa4 vfs_write+0xec/0x2e8 After: Call trace: lkdtm_WARNING+0x28/0x30 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9] direct_entry+0x16c/0x1b4 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9] full_proxy_write+0x74/0xa4 vfs_write+0xec/0x2e8 [akpm@linux-foundation.org: fix build with CONFIG_MODULES=n, tweak code layout] [rdunlap@infradead.org: fix build when CONFIG_MODULES is not set] Link: https://lkml.kernel.org/r/20210513171510.20328-1-rdunlap@infradead.org [akpm@linux-foundation.org: make kallsyms_lookup_buildid() static] [cuibixuan@huawei.com: fix build error when CONFIG_SYSFS is disabled] Link: https://lkml.kernel.org/r/20210525105049.34804-1-cuibixuan@huawei.com Link: https://lkml.kernel.org/r/20210511003845.2429846-6-swboyd@chromium.org Link: https://fedoraproject.org/wiki/Releases/FeatureBuildId [1] Link: https://sourceware.org/elfutils/Debuginfod.html [2] Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Bixuan Cui <cuibixuan@huawei.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jessica Yu <jeyu@kernel.org> Cc: Evan Green <evgreen@chromium.org> Cc: Hsin-Yi Wang <hsinyi@chromium.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Matthew Wilcox <willy@infradead.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Young <dyoung@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Sasha Levin <sashal@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-02Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-5/+2
Merge more updates from Andrew Morton: "190 patches. Subsystems affected by this patch series: mm (hugetlb, userfaultfd, vmscan, kconfig, proc, z3fold, zbud, ras, mempolicy, memblock, migration, thp, nommu, kconfig, madvise, memory-hotplug, zswap, zsmalloc, zram, cleanups, kfence, and hmm), procfs, sysctl, misc, core-kernel, lib, lz4, checkpatch, init, kprobes, nilfs2, hfs, signals, exec, kcov, selftests, compress/decompress, and ipc" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits) ipc/util.c: use binary search for max_idx ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock ipc: use kmalloc for msg_queue and shmid_kernel ipc sem: use kvmalloc for sem_undo allocation lib/decompressors: remove set but not used variabled 'level' selftests/vm/pkeys: exercise x86 XSAVE init state selftests/vm/pkeys: refill shadow register after implicit kernel write selftests/vm/pkeys: handle negative sys_pkey_alloc() return code selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random kcov: add __no_sanitize_coverage to fix noinstr for all architectures exec: remove checks in __register_bimfmt() x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned hfsplus: report create_date to kstat.btime hfsplus: remove unnecessary oom message nilfs2: remove redundant continue statement in a while-loop kprobes: remove duplicated strong free_insn_page in x86 and s390 init: print out unknown kernel parameters checkpatch: do not complain about positive return values starting with EPOLL checkpatch: improve the indented label test checkpatch: scripts/spdxcheck.py now requires python3 ...
2021-07-01kernel.h: split out kstrtox() and simple_strtox() to a separate headerAndy Shevchenko1-5/+2
kernel.h is being used as a dump for all kinds of stuff for a long time. Here is the attempt to start cleaning it up by splitting out kstrtox() and simple_strtox() helpers. At the same time convert users in header and lib folders to use new header. Though for time being include new header back to kernel.h to avoid twisted indirected includes for existing users. [andy.shevchenko@gmail.com: fix documentation references] Link: https://lkml.kernel.org/r/20210615220003.377901-1-andy.shevchenko@gmail.com Link: https://lkml.kernel.org/r/20210611185815.44103-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Francis Laniel <laniel_francis@privacyrequired.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Kars Mulder <kerneldev@karsmulder.nl> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Anna Schumaker <anna.schumaker@netapp.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29Merge tag 'irq-core-2021-06-29' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates for the interrupt subsystem: Core changes: - Cleanup and simplification of common code to invoke the low level interrupt flow handlers when this invocation requires irqdomain resolution. Add the necessary core infrastructure. - Provide a proper interface for modular PMU drivers to set the interrupt affinity. - Add a request flag which allows to exclude interrupts from spurious interrupt detection. Useful especially for IPI handlers which always return IRQ_HANDLED which turns the spurious interrupt detection into a pointless waste of CPU cycles. Driver changes: - Bulk convert interrupt chip drivers to the new irqdomain low level flow handler invocation mechanism. - Add device tree bindings for the Renesas R-Car M3-W+ SoC - Enable modular build of the Qualcomm PDC driver - The usual small fixes and improvements" * tag 'irq-core-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits) dt-bindings: interrupt-controller: arm,gic-v3: Describe GICv3 optional properties irqchip: gic-pm: Remove redundant error log of clock bulk irqchip/sun4i: Remove unnecessary oom message irqchip/irq-imx-gpcv2: Remove unnecessary oom message irqchip/imgpdc: Remove unnecessary oom message irqchip/gic-v3-its: Remove unnecessary oom message irqchip/gic-v2m: Remove unnecessary oom message irqchip/exynos-combiner: Remove unnecessary oom message irqchip: Bulk conversion to generic_handle_domain_irq() genirq: Move non-irqdomain handle_domain_irq() handling into ARM's handle_IRQ() genirq: Add generic_handle_domain_irq() helper irqchip/nvic: Convert from handle_IRQ() to handle_domain_irq() irqdesc: Fix __handle_domain_irq() comment genirq: Use irq_resolve_mapping() to implement __handle_domain_irq() and co irqdomain: Introduce irq_resolve_mapping() irqdomain: Protect the linear revmap with RCU irqdomain: Cache irq_data instead of a virq number in the revmap irqdomain: Use struct_size() helper when allocating irqdomain irqdomain: Make normal and nomap irqdomains exclusive powerpc: Move the use of irq_domain_add_nomap() behind a config option ...
2021-06-29Merge tag 'printk-for-5.14' of ↵Linus Torvalds1-1/+6
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Add %pt[RT]s modifier to vsprintf(). It overrides ISO 8601 separator by using ' ' (space). It produces "YYYY-mm-dd HH:MM:SS" instead of "YYYY-mm-ddTHH:MM:SS". - Correctly parse long row of numbers by sscanf() when using the field width. Add extensive sscanf() selftest. - Generalize re-entrant CPU lock that has already been used to serialize dump_stack() output. It is part of the ongoing printk rework. It will allow to remove the obsoleted printk_safe buffers and introduce atomic consoles. - Some code clean up and sparse warning fixes. * tag 'printk-for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: fix cpu lock ordering lib/dump_stack: move cpu lock to printk.c printk: Remove trailing semicolon in macros random32: Fix implicit truncation warning in prandom_seed_state() lib: test_scanf: Remove pointless use of type_min() with unsigned types selftests: lib: Add wrapper script for test_scanf lib: test_scanf: Add tests for sscanf number conversion lib: vsprintf: Fix handling of number field widths in vsscanf lib: vsprintf: scanf: Negative number must have field width > 1 usb: host: xhci-tegra: Switch to use %ptTs nilfs2: Switch to use %ptTs kdb: Switch to use %ptTs lib/vsprintf: Allow to override ISO 8601 date and time separator
2021-06-17docs: core-api: avoid using ReST :doc:`foo` markupMauro Carvalho Chehab4-6/+7
The :doc:`foo` tag is auto-generated via automarkup.py. So, use the filename at the sources, instead of :doc:`foo`. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/d967d490b6655735b7df292f88859b5a1b07d0d7.1623824363.git.mchehab+huawei@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-06-14docs: printk-formats: update size-casting examplesCarlos Llamas1-5/+4
Since commit 72deb455b5ec ("block: remove CONFIG_LBDAF") sector_t and blkcnt_t types are no longer variable in size, making them unsuitable examples for casting to the largest possible type. This patch replaces such examples with cycles_t and blk_status_t types, whose sizes depend on architecture and config options respectively. Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20210609195058.3518943-1-cmllamas@google.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-06-10irqdomain: Kill irq_domain_add_legacy_isaMarc Zyngier1-1/+0
This helper doesn't have a user anymore, let's remove it. Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-05-17lib/vsprintf: Allow to override ISO 8601 date and time separatorAndy Shevchenko1-1/+6
ISO 8601 defines 'T' as a separator between date and time. Though, some ABIs use time and date with ' ' (space) separator instead. Add a flavour to the %pt specifier to override default separator. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20210511153958.34527-1-andriy.shevchenko@linux.intel.com
2021-05-06Merge tag 'docs-5.13-2' of git://git.lwn.net/linuxLinus Torvalds1-13/+13
Pull documentation fixes from Jonathan Corbet: "A few late-arriving documentation fixes, including some oprofile cleanup, a kernel-doc fix, some regression-reporting updates, and the usual minor fixes" * tag 'docs-5.13-2' of git://git.lwn.net/linux: Enlisted oprofile version line removed oprofiled version output line removed from the list Removed the oprofiled version option docs: reporting-issues.rst: CC subsystem and maintainers on regressions docs: correct URL to bios and kernel developer's guide docs/core-api: Consistent code style docs/zh_CN: Adjust order and content of zh_CN/index.rst Documentation: input: joydev file corrections docs: Fix typo in Documentation/x86/x86_64/5level-paging.rst kernel-doc: Add support for __deprecated
2021-05-05Merge tag 'gpio-updates-for-v5.13-v2' of ↵Linus Torvalds1-10/+12
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio updates from Bartosz Golaszewski: - new driver for the Realtek Otto GPIO controller - ACPI support for gpio-mpc8xxx - edge event support for gpio-sch (+ Kconfig fixes) - Kconfig improvements in gpio-ich - fixes to older issues in gpio-mockup - ACPI quirk for ignoring EC wakeups on Dell Venue 10 Pro 5055 - improve the GPIO aggregator code by using more generic interfaces instead of reimplementing them in the driver - convert the DT bindings for gpio-74x164 to yaml - documentation improvements - a slew of other minor fixes and improvements to GPIO drivers * tag 'gpio-updates-for-v5.13-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (34 commits) dt-bindings: gpio: add YAML description for rockchip,gpio-bank gpio: mxs: remove useless function dt-bindings: gpio: fairchild,74hc595: Convert to json-schema gpio: it87: remove unused code gpio: 104-dio-48e: Fix coding style issues gpio: mpc8xxx: Add ACPI support gpio: ich: Switch to be dependent on LPC_ICH gpio: sch: Drop MFD_CORE selection gpio: sch: depends on LPC_SCH gpiolib: acpi: Add quirk to ignore EC wakeups on Dell Venue 10 Pro 5055 gpio: sch: Hook into ACPI GPE handler to catch GPIO edge events gpio: sch: Add edge event support gpio: aggregator: Replace custom get_arg() with a generic next_arg() lib/cmdline: Export next_arg() for being used in modules gpio: omap: Use device_get_match_data() helper gpio: Add Realtek Otto GPIO support dt-bindings: gpio: Binding for Realtek Otto GPIO docs: kernel-parameters: Add gpio_mockup_named_lines docs: kernel-parameters: Move gpio-mockup for alphabetic order lib: bitmap: provide devm_bitmap_alloc() and devm_bitmap_zalloc() ...
2021-05-04Merge tag 'dma-mapping-5.13' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds1-0/+88
Pull dma-mapping updates from Christoph Hellwig: - add a new dma_alloc_noncontiguous API (me, Ricardo Ribalda) - fix a copyright notice (Hao Fang) - add an unlikely annotation to dma_mapping_error (Heiner Kallweit) - remove a pointless empty line (Wang Qing) - add support for multi-pages map/unmap bencharking (Xiang Chen) * tag 'dma-mapping-5.13' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: add unlikely hint to error path in dma_mapping_error dma-mapping: benchmark: Add support for multi-pages map/unmap dma-mapping: benchmark: use the correct HiSilicon copyright dma-mapping: remove a pointless empty line in dma_alloc_coherent media: uvcvideo: Use dma_alloc_noncontiguous API dma-iommu: implement ->alloc_noncontiguous dma-iommu: refactor iommu_dma_alloc_remap dma-mapping: add a dma_alloc_noncontiguous API dma-mapping: refactor dma_{alloc,free}_pages dma-mapping: add a dma_mmap_pages helper
2021-05-03docs/core-api: Consistent code styleYanteng Si1-13/+13
all `example` in this file should be replaced with ``example``. Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Acked-by: Matthias Maennich <maennich@google.com> Link: https://lore.kernel.org/r/20210428100720.1076276-1-siyanteng@loongson.cn Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-04-30mm/mmzone.h: fix existing kernel-doc comments and link them to core-apiMike Rapoport1-0/+1
There are a couple of kernel-doc comments in include/linux/mmzone.h but they have minor formatting issues that would cause kernel-doc warnings. Fix the formatting of those comments, add missing Return: descriptions and link include/linux/mmzone.h to Documentation/core-api/mm-api.rst Link: https://lkml.kernel.org/r/20210426141927.1314326-2-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30mm/mempolicy: fix mpol_misplaced kernel-docMatthew Wilcox (Oracle)1-0/+1
Sphinx interprets the Return section as a list and complains about it. Turn it into a sentence and move it to the end of the kernel-doc to fit the kernel-doc style. Link: https://lkml.kernel.org/r/20210225150642.2582252-8-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30mm/doc: add mm.h and mm_types.h to the mm-api documentMatthew Wilcox (Oracle)1-0/+4
kerneldoc in include/linux/mm.h and include/linux/mm_types.h wasn't being included in the html build. Link: https://lkml.kernel.org/r/20210322195022.2143603-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30mm/vmalloc: remove unmap_kernel_rangeNicholas Piggin1-1/+1
This is a shim around vunmap_range, get rid of it. Move the main API comment from the _noflush variant to the normal variant, and make _noflush internal to mm/. [npiggin@gmail.com: fix nommu builds and a comment bug per sfr] Link: https://lkml.kernel.org/r/1617292598.m6g0knx24s.astroid@bobo.none [akpm@linux-foundation.org: move vunmap_range_noflush() stub inside !CONFIG_MMU, not !CONFIG_NUMA] [npiggin@gmail.com: fix nommu builds] Link: https://lkml.kernel.org/r/1617292497.o1uhq5ipxp.astroid@bobo.none Link: https://lkml.kernel.org/r/20210322021806.892164-5-npiggin@gmail.com Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Cédric Le Goater <clg@kaod.org> Cc: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30mm/vmalloc: remove map_kernel_rangeNicholas Piggin1-1/+1
Patch series "mm/vmalloc: cleanup after hugepage series", v2. Christoph pointed out some overdue cleanups required after the huge vmalloc series, and I had another failure error message improvement as well. This patch (of 5): This is a shim around vmap_pages_range, get rid of it. Move the main API comment from the _noflush variant to the normal variant, and make _noflush internal to mm/. Link: https://lkml.kernel.org/r/20210322021806.892164-1-npiggin@gmail.com Link: https://lkml.kernel.org/r/20210322021806.892164-2-npiggin@gmail.com Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Cédric Le Goater <clg@kaod.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-28Merge tag 'drm-next-2021-04-28' of git://anongit.freedesktop.org/drm/drmLinus Torvalds1-0/+18
Pull drm updates from Dave Airlie: "The usual lots of work all over the place. i915 has gotten some Alderlake work and prelim DG1 code, along with a major locking rework over the GEM code, and brings back the property of timing out long running jobs using a watchdog. amdgpu has some Alderbran support (new GPU), freesync HDMI support along with a lot other fixes. Outside of the drm, there is a new printf specifier added which should have all the correct acks/sobs: - printk fourcc modifier support added %p4cc Summary: core: - drm_crtc_commit_wait - atomic plane state helpers reworked for full state - dma-buf heaps API rework - edid: rework and improvements for displayid dp-mst: - better topology logging bridge: - Chipone ICN6211 - Lontium LT8912B - anx7625 regulator support panel: - fix lt9611 4k panels handling simple-kms: - add plane state helpers ttm: - debugfs support - removal of unused sysfs - ignore signaled moved fences - ioremap buffer according to mem caching i915: - Alderlake S enablement - Conversion to dma_resv_locking - Bring back watchdog timeout support - legacy ioctl cleanups - add GEM TDDO and RFC process - DG1 LMEM preparation work - intel_display.c refactoring - Gen9/TGL PCH combination support - eDP MSO Support - multiple PSR instance support - Link training debug updates - Disable PSR2 support on JSL/EHL - DDR5/LPDDR5 support for bw calcs - LSPCON limited to gen9/10 platforms - HSW/BDW async flip/VTd corruption workaround - SAGV watermark fixes - SNB hard hang on ring resume fix - Limit imported dma-buf size - move to use new tasklet API - refactor KBL/TGL/ADL-S display/gt steppings - refactoring legacy DP/HDMI, FB plane code out amdgpu: - uapi: add ioctl to query video capabilities - Iniital AMD Freesync HDMI support - Initial Adebaran support - 10bpc dithering improvements - DCN secure display support - Drop legacy IO BAR requirements - PCIE/S0ix/RAS/Prime/Reset fixes - Display ASSR support - SMU gfx busy queues for RV/PCO - Initial LTTPR display work amdkfd: - MMU notifier fixes - APU fixes radeon: - debugfs cleanps - fw error handling ifix - Flexible array cleanups msm: - big DSI phy/pll cleanup - sc7280 initial support - commong bandwidth scaling path - shrinker locking contention fixes - unpin/swap support for GEM objcets ast: - cursor plane handling reworked tegra: - don't register DP AUX channels before connectors zynqmp: - fix OOB struct padding memset gma500: - drop ttm and medfield support exynos: - request_irq cleanup function mediatek: - fine tune line time for EOTp - MT8192 dpi support - atomic crtc config updates - don't support HDMI connector creation mxsdb: - imx8mm support panfrost: - MMU IRQ handling rework qxl: - locking fixes - resource deallocation changes sun4i: - add alpha properties to UI/VI layers vc4: - RPi4 CEC support vmwgfx: - doc cleanups arc: - moved to drm/tiny" * tag 'drm-next-2021-04-28' of git://anongit.freedesktop.org/drm/drm: (1390 commits) drm/ttm: Don't count pages in SG BOs against pages_limit drm/ttm: fix return value check drm/bridge: lt8912b: fix incorrect handling of of_* return values drm: bridge: fix LONTIUM use of mipi_dsi_() functions drm: bridge: fix ANX7625 use of mipi_dsi_() functions drm/amdgpu: page retire over debugfs mechanism drm/radeon: Fix a missing check bug in radeon_dp_mst_detect() drm/amd/display: Fix the Wunused-function warning drm/radeon/r600: Fix variables that are not used after assignment drm/amdgpu/smu7: fix CAC setting on TOPAZ drm/amd/display: Update DCN302 SR Exit Latency drm/amdgpu: enable ras eeprom on aldebaran drm/amdgpu: RAS harvest on driver load drm/amdgpu: add ras aldebaran ras eeprom driver drm/amd/pm: increase time out value when sending msg to SMU drm/amdgpu: add DMUB outbox event IRQ source define/complete/debug flag drm/amd/pm: add the callback to get vbios bootup values for vangogh drm/radeon: Fix size overflow drm/amdgpu: Fix size overflow drm/amdgpu: move mmhub ras_func init to ip specific file ...
2021-04-27Merge tag 'printk-for-5.13' of ↵Linus Torvalds1-2/+26
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Stop synchronizing kernel log buffer readers by logbuf_lock. As a result, the access to the buffer is fully lockless now. Note that printk() itself still uses locks because it tries to flush the messages to the console immediately. Also the per-CPU temporary buffers are still there because they prevent infinite recursion and serialize backtraces from NMI. All this is going to change in the future. - kmsg_dump API rework and cleanup as a side effect of the logbuf_lock removal. - Make bstr_printf() aware that %pf and %pF formats could deference the given pointer. - Show also page flags by %pGp format. - Clarify the documentation for plain pointer printing. - Do not show no_hash_pointers warning multiple times. - Update Senozhatsky email address. - Some clean up. * tag 'printk-for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (24 commits) lib/vsprintf.c: remove leftover 'f' and 'F' cases from bstr_printf() printk: clarify the documentation for plain pointer printing kernel/printk.c: Fixed mundane typos printk: rename vprintk_func to vprintk vsprintf: dump full information of page flags in pGp mm, slub: don't combine pr_err with INFO mm, slub: use pGp to print page flags MAINTAINERS: update Senozhatsky email address lib/vsprintf: do not show no_hash_pointers message multiple times printk: console: remove unnecessary safe buffer usage printk: kmsg_dump: remove _nolock() variants printk: remove logbuf_lock printk: introduce a kmsg_dump iterator printk: kmsg_dumper: remove @active field printk: add syslog_lock printk: use atomic64_t for devkmsg_user.seq printk: use seqcount_latch for clear_seq printk: introduce CONSOLE_LOG_MAX printk: consolidate kmsg_dump_get_buffer/syslog_print_all code printk: refactor kmsg_dump_get_buffer() ...
2021-04-27Merge branch 'for-5.13-vsprintf-pgp' into for-linusPetr Mladek1-1/+1
2021-04-07printk: clarify the documentation for plain pointer printingVlastimil Babka1-1/+25
We have several modifiers for plain pointers (%p, %px and %pK) and now also the no_hash_pointers boot parameter. The documentation should help to choose which variant to use. Importantly, we should discourage %px in favor of %p (with the new boot parameter when debugging), and stress that %pK should be only used for procfs and similar files, not dmesg buffer. This patch clarifies the documentation in that regard. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20210225164639.27212-1-vbabka@suse.cz
2021-03-26irqdomain: Introduce irq_domain_create_simple() APIAndy Shevchenko1-10/+12
Linus Walleij pointed out that ird_domain_add_simple() gained additional functionality and can't be anymore replaced with a simple conditional. In preparation to upgrade GPIO library to use fwnode, introduce irq_domain_create_simple() API which is functional equivalent to the existing irq_domain_add_simple(), but takes a pointer to the struct fwnode_handle as a parameter. While at it, amend documentation to mention irq_domain_create_*() functions where it makes sense. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
2021-03-25docs: rbtree.rst: Fix a typoBhaskar Chowdhury1-1/+1
s/maintanence/maintenance/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20210324080046.20709-1-unixbhaskar@gmail.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-03-19vsprintf: dump full information of page flags in pGpYafang Shao1-1/+1
Currently the pGp only shows the names of page flags, rather than the full information including section, node, zone, last cpupid and kasan tag. While it is not easy to parse these information manually because there're so many flavors. Let's interpret them in pGp as well. To be compitable with the existed format of pGp, the new introduced ones also use '|' as the separator, then the user tools parsing pGp won't need to make change, suggested by Matthew. The new information is tracked onto the end of the existed one. On example of the output in mm/slub.c as follows, - Before the patch, [ 6343.396602] Slab 0x000000004382e02b objects=33 used=3 fp=0x000000009ae06ffc flags=0x17ffffc0010200(slab|head) - After the patch, [ 8448.272530] Slab 0x0000000090797883 objects=33 used=3 fp=0x00000000790f1c26 flags=0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff) The documentation and test cases are also updated. The output of the test cases as follows, [68599.816764] test_printf: loaded. [68599.819068] test_printf: all 388 tests passed [68599.830367] test_printf: unloaded. [lkp@intel.com: reported issues in the prev version in test_printf.c] Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: kernel test robot <lkp@intel.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20210319101246.73513-4-laoar.shao@gmail.com
2021-03-16Merge tag 'drm-misc-next-2021-03-03' of ↵Dave Airlie1-0/+18
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for 5.13: UAPI Changes: Cross-subsystem Changes: Core Changes: - %p4cc printk format modifier - atomic: introduce drm_crtc_commit_wait, rework atomic plane state helpers to take the drm_commit_state structure - dma-buf: heaps rework to return a struct dma_buf - simple-kms: Add plate state helpers - ttm: debugfs support, removal of sysfs Driver Changes: - Convert drivers to shadow plane helpers - arc: Move to drm/tiny - ast: cursor plane reworks - gma500: Remove TTM and medfield support - mxsfb: imx8mm support - panfrost: MMU IRQ handling rework - qxl: rework to better handle resources deallocation, locking - sun4i: Add alpha properties for UI and VI layers - vc4: RPi4 CEC support - vmwgfx: doc cleanup Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20210303100600.dgnkadonzuvfnu22@gilmour
2021-03-15dma-mapping: add a dma_alloc_noncontiguous APIChristoph Hellwig1-0/+78
Add a new API that returns a potentiall virtually non-contigous sg_table and a DMA address. This API is only properly implemented for dma-iommu and will simply return a contigious chunk as a fallback. The intent is that drivers can use this API if either: - no kernel mapping or only temporary kernel mappings are required. That is as a better replacement for DMA_ATTR_NO_KERNEL_MAPPING - a kernel mapping is required for cached and DMA mapped pages, but the driver also needs the pages to e.g. map them to userspace. In that sense it is a replacement for some aspects of the recently removed and never fully implemented DMA_ATTR_NON_CONSISTENT Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Tomasz Figa <tfiga@chromium.org> Tested-by: Ricardo Ribalda <ribalda@chromium.org>
2021-03-15dma-mapping: add a dma_mmap_pages helperChristoph Hellwig1-0/+10
Add a helper to map memory allocated using dma_alloc_pages into a user address space, similar to the dma_alloc_attrs function for coherent allocations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Tomasz Figa <tfiga@chromium.org> Tested-by: Ricardo Ribalda <ribalda@chromium.org>
2021-02-24Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-5/+2
Merge misc updates from Andrew Morton: "A few small subsystems and some of MM. 172 patches. Subsystems affected by this patch series: hexagon, scripts, ntfs, ocfs2, vfs, and mm (slab-generic, slab, slub, debug, pagecache, swap, memcg, pagemap, mprotect, mremap, page-reporting, vmalloc, kasan, pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction, mempolicy, oom-kill, hugetlbfs, and migration)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (172 commits) mm/migrate: remove unneeded semicolons hugetlbfs: remove unneeded return value of hugetlb_vmtruncate() hugetlbfs: fix some comment typos hugetlbfs: correct some obsolete comments about inode i_mutex hugetlbfs: make hugepage size conversion more readable hugetlbfs: remove meaningless variable avoid_reserve hugetlbfs: correct obsolete function name in hugetlbfs_read_iter() hugetlbfs: use helper macro default_hstate in init_hugetlbfs_fs hugetlbfs: remove useless BUG_ON(!inode) in hugetlbfs_setattr() hugetlbfs: remove special hugetlbfs_set_page_dirty() mm/hugetlb: change hugetlb_reserve_pages() to type bool mm, oom: fix a comment in dump_task() mm/mempolicy: use helper range_in_vma() in queue_pages_test_walk() numa balancing: migrate on fault among multiple bound nodes mm, compaction: make fast_isolate_freepages() stay within zone mm/compaction: fix misbehaviors of fast_find_migrateblock() mm/compaction: correct deferral logic for proactive compaction mm/compaction: remove duplicated VM_BUG_ON_PAGE !PageLocked mm/compaction: remove rcu_read_lock during page compaction z3fold: simplify the zhdr initialization code in init_z3fold_page() ...
2021-02-24mm/gfp: add kernel-doc for gfp_tMatthew Wilcox (Oracle)1-5/+2
The generated html will link to the definition of the gfp_t automatically once we define it. Move the one-paragraph overview of GFP flags from the documentation directory into gfp.h and pull gfp.h into the documentation. This generates warnings with clang (https://lkml.kernel.org/r/20210219195509.GA59987@24bbad8f3778), so use a #if 0 to hide it from the compiler for now. Link: https://lkml.kernel.org/r/20210215204909.3824509-1-willy@infradead.org Link: https://lkml.kernel.org/r/20210220003049.GZ2858050@casper.infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-17lib/vsprintf: Add support for printing V4L2 and DRM fourccsSakari Ailus1-0/+18
Add a printk modifier %p4cc (for pixel format) for printing V4L2 and DRM pixel formats denoted by fourccs. The fourcc encoding is the same for both so the same implementation can be used. Suggested-by: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20210216155723.17109-2-sakari.ailus@linux.intel.com
2021-02-09dma-mapping: remove the {alloc,free}_noncoherent methodsChristoph Hellwig1-42/+22
It turns out allowing non-contigous allocations here was a rather bad idea, as we'll now need to define ways to get the pages for mmaping or dma_buf sharing. Revert this change and stick to the original concept. A different API for the use case of non-contigous allocations will be added back later. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Tomasz Figa <tfiga@chromium.org> Tested-by: Ricardo Ribalda <ribalda@chromium.org>:wq
2020-12-31atomic: remove further references to atomic_opsLukas Bulwahn1-1/+0
Commit f0400a77ebdc ("atomic: Delete obsolete documentation") removed ./Documentation/core-api/atomic_ops.rst, but missed to remove further references to that file. Hence, make htmldocs warns: Documentation/core-api/index.rst:53: WARNING: toctree contains reference to nonexisting document 'core-api/atomic_ops' Also, ./scripts/get_maintainer.pl --self-test=patterns warns: warning: no file matches F: Documentation/core-api/atomic_ops.rst Remove further references to ./Documentation/core-api/atomic_ops.rst. Fixes: f0400a77ebdc ("atomic: Delete obsolete documentation") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Link: https://lore.kernel.org/r/20201220060927.21582-1-lukas.bulwahn@gmail.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-12-15Merge tag 'irq-core-2020-12-15' of ↵Linus Torvalds1-0/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Generic interrupt and irqchips subsystem updates. Unusually, there is not a single completely new irq chip driver, just new DT bindings and extensions of existing drivers to accomodate new variants! Core: - Consolidation and robustness changes for irq time accounting - Cleanup and consolidation of irq stats - Remove the fasteoi IPI flow which has been proved useless - Provide an interface for converting legacy interrupt mechanism into irqdomains Drivers: - Preliminary support for managed interrupts on platform devices - Correctly identify allocation of MSIs proxyied by another device - Generalise the Ocelot support to new SoCs - Improve GICv4.1 vcpu entry, matching the corresponding KVM optimisation - Work around spurious interrupts on Qualcomm PDC - Random fixes and cleanups" * tag 'irq-core-2020-12-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) irqchip/qcom-pdc: Fix phantom irq when changing between rising/falling driver core: platform: Add devm_platform_get_irqs_affinity() ACPI: Drop acpi_dev_irqresource_disabled() resource: Add irqresource_disabled() genirq/affinity: Add irq_update_affinity_desc() irqchip/gic-v3-its: Flag device allocation as proxied if behind a PCI bridge irqchip/gic-v3-its: Tag ITS device as shared if allocating for a proxy device platform-msi: Track shared domain allocation irqchip/ti-sci-intr: Fix freeing of irqs irqchip/ti-sci-inta: Fix printing of inta id on probe success drivers/irqchip: Remove EZChip NPS interrupt controller Revert "genirq: Add fasteoi IPI flow" irqchip/hip04: Make IPIs use handle_percpu_devid_irq() irqchip/bcm2836: Make IPIs use handle_percpu_devid_irq() irqchip/armada-370-xp: Make IPIs use handle_percpu_devid_irq() irqchip/gic, gic-v3: Make SGIs use handle_percpu_devid_irq() irqchip/ocelot: Add support for Jaguar2 platforms irqchip/ocelot: Add support for Serval platforms irqchip/ocelot: Add support for Luton platforms irqchip/ocelot: prepare to support more SoC ...
2020-12-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-3/+7
Merge misc updates from Andrew Morton: - a few random little subsystems - almost all of the MM patches which are staged ahead of linux-next material. I'll trickle to post-linux-next work in as the dependents get merged up. Subsystems affected by this patch series: kthread, kbuild, ide, ntfs, ocfs2, arch, and mm (slab-generic, slab, slub, dax, debug, pagecache, gup, swap, shmem, memcg, pagemap, mremap, hmm, vmalloc, documentation, kasan, pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction, oom-kill, migration, cma, page-poison, userfaultfd, zswap, zsmalloc, uaccess, zram, and cleanups). * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (200 commits) mm: cleanup kstrto*() usage mm: fix fall-through warnings for Clang mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at mm: shmem: convert shmem_enabled_show to use sysfs_emit_at mm:backing-dev: use sysfs_emit in macro defining functions mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neatening mm: use sysfs_emit for struct kobject * uses mm: fix kernel-doc markups zram: break the strict dependency from lzo zram: add stat to gather incompressible pages since zram set up zram: support page writeback mm/process_vm_access: remove redundant initialization of iov_r mm/zsmalloc.c: rework the list_add code in insert_zspage() mm/zswap: move to use crypto_acomp API for hardware acceleration mm/zswap: fix passing zero to 'PTR_ERR' warning mm/zswap: make struct kernel_param_ops definitions const userfaultfd/selftests: hint the test runner on required privilege userfaultfd/selftests: fix retval check for userfaultfd_open() userfaultfd/selftests: always dump something in modes userfaultfd: selftests: make __{s,u}64 format specifiers portable ...
2020-12-15selftests/vm: only some gup_test items are really benchmarksJohn Hubbard1-1/+1
Therefore, some minor cleanup and improvements are in order: 1. Rename the other items appropriately. 2. Stop reporting timing information on the non-benchmark items. It's still being recorded and is available, but there's no point in cluttering up the report with data that no one reasonably needs to check. 3. Don't do iterations, for non-benchmark items. 4. Print out a shorter, more appropriate report for the non-benchmark tests. 5. Add the command that was run, to the report. This really helps, as there are quite a lot of options now. 6. Use a larger integer type for cmd, now that it's being compared Otherwise it doesn't work, because in this case cmd is about 3 billion, which is the perfect size for problems with signed vs unsigned int. Link: https://lkml.kernel.org/r/20201026064021.3545418-6-jhubbard@nvidia.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>