aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2020-10-21Merge tag 'linux-watchdog-5.10-rc1' of ↵HEADmasterLinus Torvalds15-48/+374
git://www.linux-watchdog.org/linux-watchdog Pull watchdog updates from Wim Van Sebroeck: - Add Toshiba Visconti watchdog driver - it87_wdt: add IT8772 + IT8784 - several fixes and improvements * tag 'linux-watchdog-5.10-rc1' of git://www.linux-watchdog.org/linux-watchdog: watchdog: Add Toshiba Visconti watchdog driver watchdog: bindings: Add binding documentation for Toshiba Visconti watchdog device watchdog: it87_wdt: add IT8784 ID watchdog: sp5100_tco: Enable watchdog on Family 17h devices if disabled watchdog: sp5100: Fix definition of EFCH_PM_DECODEEN3 watchdog: renesas_wdt: support handover from bootloader watchdog: imx7ulp: Watchdog should continue running for wait/stop mode watchdog: rti: Simplify with dev_err_probe() watchdog: davinci: Simplify with dev_err_probe() watchdog: cadence: Simplify with dev_err_probe() watchdog: remove unneeded inclusion of <uapi/linux/sched/types.h> watchdog: Use put_device on error watchdog: Fix memleak in watchdog_cdev_register watchdog: imx7ulp: Strictly follow the sequence for wdog operations watchdog: it87_wdt: add IT8772 ID watchdog: pcwd_usb: Avoid GFP_ATOMIC where it is not needed drivers: watchdog: rdc321x_wdt: Fix race condition bugs
2020-10-21Merge tag 'rtc-5.10' of ↵Linus Torvalds21-368/+1403
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux Pull RTC updates from Alexandre Belloni: "A new driver this cycle is making the bulk of the changes and the rx8010 driver has been rework to use the modern APIs. Summary: Subsystem: - new generic DT properties: aux-voltage-chargeable, trickle-voltage-millivolt New driver: - Microcrystal RV-3032 Drivers: - ds1307: use aux-voltage-chargeable - r9701, rx8010: modernization of the driver - rv3028: fix clock output, trickle resistor values, RAM configuration registers" * tag 'rtc-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (50 commits) rtc: r9701: set range rtc: r9701: convert to devm_rtc_allocate_device rtc: r9701: stop setting RWKCNT rtc: r9701: remove useless memset rtc: r9701: stop setting a default time rtc: r9701: remove leftover comment rtc: rv3032: Add a driver for Microcrystal RV-3032 dt-bindings: rtc: rv3032: add RV-3032 bindings dt-bindings: rtc: add trickle-voltage-millivolt rtc: rv3028: ensure ram configuration registers are saved rtc: rv3028: factorize EERD bit handling rtc: rv3028: fix trickle resistor values rtc: rv3028: fix clock output support rtc: mt6397: Remove unused member dev rtc: rv8803: simplify the return expression of rv8803_nvram_write rtc: meson: simplify the return expression of meson_vrtc_probe rtc: rx8010: rename rx8010_init_client() to rx8010_init() rtc: ds1307: enable rx8130's backup battery, make it chargeable optionally rtc: ds1307: consider aux-voltage-chargeable rtc: ds1307: store previous charge default per chip ...
2020-10-21Merge branch 'dmi-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging Pull dmi update from Jean Delvare. * 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging: Replace HTTP links with HTTPS ones: DMI/SMBIOS SUPPORT
2020-10-21Merge branch 'i2c/for-5.10' of ↵Linus Torvalds41-920/+4009
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c updates from Wolfram Sang: - if a host can be a client, too, the I2C core can now use it to emulate SMBus HostNotify support (STM32 and R-Car added this so far) - also for client mode, a testunit has been added. It can create rare situations on the bus, so host controllers can be tested - a binding has been added to mark the bus as "single-master". This allows for better timeout detections - new driver for Mellanox Bluefield - massive refactoring of the Tegra driver - EEPROMs recognized by the at24 driver can now have custom names - rest is driver updates * 'i2c/for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (80 commits) Documentation: i2c: add testunit docs to index i2c: tegra: Improve driver module description i2c: tegra: Clean up whitespaces, newlines and indentation i2c: tegra: Clean up and improve comments i2c: tegra: Clean up printk messages i2c: tegra: Clean up variable names i2c: tegra: Improve formatting of variables i2c: tegra: Check errors for both positive and negative values i2c: tegra: Factor out hardware initialization into separate function i2c: tegra: Factor out register polling into separate function i2c: tegra: Factor out packet header setup from tegra_i2c_xfer_msg() i2c: tegra: Factor out error recovery from tegra_i2c_xfer_msg() i2c: tegra: Rename wait/poll functions i2c: tegra: Remove "dma" variable from tegra_i2c_xfer_msg() i2c: tegra: Remove redundant check in tegra_i2c_issue_bus_clear() i2c: tegra: Remove likely/unlikely from the code i2c: tegra: Remove outdated barrier() i2c: tegra: Clean up variable types i2c: tegra: Reorder location of functions in the code i2c: tegra: Clean up probe function ...
2020-10-21Merge tag 'ceph-for-5.10-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds26-443/+689
Pull ceph updates from Ilya Dryomov: - a patch that removes crush_workspace_mutex (myself). CRUSH computations are no longer serialized and can run in parallel. - a couple new filesystem client metrics for "ceph fs top" command (Xiubo Li) - a fix for a very old messenger bug that affected the filesystem, marked for stable (myself) - assorted fixups and cleanups throughout the codebase from Jeff and others. * tag 'ceph-for-5.10-rc1' of git://github.com/ceph/ceph-client: (27 commits) libceph: clear con->out_msg on Policy::stateful_server faults libceph: format ceph_entity_addr nonces as unsigned libceph: fix ENTITY_NAME format suggestion libceph: move a dout in queue_con_delay() ceph: comment cleanups and clarifications ceph: break up send_cap_msg ceph: drop separate mdsc argument from __send_cap ceph: promote to unsigned long long before shifting ceph: don't SetPageError on readpage errors ceph: mark ceph_fmt_xattr() as printf-like for better type checking ceph: fold ceph_update_writeable_page into ceph_write_begin ceph: fold ceph_sync_writepages into writepage_nounlock ceph: fold ceph_sync_readpages into ceph_readpage ceph: don't call ceph_update_writeable_page from page_mkwrite ceph: break out writeback of incompatible snap context to separate function ceph: add a note explaining session reject error string libceph: switch to the new "osd blocklist add" command libceph, rbd, ceph: "blacklist" -> "blocklist" ceph: have ceph_writepages_start call pagevec_lookup_range_tag ceph: use kill_anon_super helper ...
2020-10-20Merge tag 'xarray-5.9' of git://git.infradead.org/users/willy/xarrayLinus Torvalds11-35/+116
Pull XArray updates from Matthew Wilcox: - Fix the test suite after introduction of the local_lock - Fix a bug in the IDA spotted by Coverity - Change the API that allows the workingset code to delete a node - Fix xas_reload() when dealing with entries that occupy multiple indices - Add a few more tests to the test suite - Fix an unsigned int being shifted into an unsigned long * tag 'xarray-5.9' of git://git.infradead.org/users/willy/xarray: XArray: Fix xas_create_range for ranges above 4 billion radix-tree: fix the comment of radix_tree_next_slot() XArray: Fix xas_reload for multi-index entries XArray: Add private interface for workingset node deletion XArray: Fix xas_for_each_conflict documentation XArray: Test marked multiorder iterations XArray: Test two more things about xa_cmpxchg ida: Free allocated bitmap in error path radix tree test suite: Fix compilation
2020-10-20Merge tag 'nfs-for-5.10-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds36-502/+989
Pull NFS client updates from Anna Schumaker: "Stable Fixes: - Wait for stateid updates after CLOSE/OPEN_DOWNGRADE # v5.4+ - Fix nfs_path in case of a rename retry - Support EXCHID4_FLAG_SUPP_FENCE_OPS v4.2 EXCHANGE_ID flag New features and improvements: - Replace dprintk() calls with tracepoints - Make cache consistency bitmap dynamic - Added support for the NFS v4.2 READ_PLUS operation - Improvements to net namespace uniquifier Other bugfixes and cleanups: - Remove redundant clnt pointer - Don't update timeout values on connection resets - Remove redundant tracepoints - Various cleanups to comments - Fix oops when trying to use copy_file_range with v4.0 source server - Improvements to flexfiles mirrors - Add missing 'local_lock=posix' mount option" * tag 'nfs-for-5.10-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (55 commits) NFSv4.2: support EXCHGID4_FLAG_SUPP_FENCE_OPS 4.2 EXCHANGE_ID flag NFSv4: Fix up RCU annotations for struct nfs_netns_client NFS: Only reference user namespace from nfs4idmap struct instead of cred nfs: add missing "posix" local_lock constant table definition NFSv4: Use the net namespace uniquifier if it is set NFSv4: Clean up initialisation of uniquified client id strings NFS: Decode a full READ_PLUS reply SUNRPC: Add an xdr_align_data() function NFS: Add READ_PLUS hole segment decoding SUNRPC: Add the ability to expand holes in data pages SUNRPC: Split out _shift_data_right_tail() SUNRPC: Split out xdr_realign_pages() from xdr_align_pages() NFS: Add READ_PLUS data segment support NFS: Use xdr_page_pos() in NFSv4 decode_getacl() SUNRPC: Implement a xdr_page_pos() function SUNRPC: Split out a function for setting current page NFS: fix nfs_path in case of a rename retry fs: nfs: return per memcg count for xattr shrinkers NFSv4: Wait for stateid updates after CLOSE/OPEN_DOWNGRADE nfs: remove incorrect fallthrough label ...
2020-10-20Merge tag 'io_uring-5.10-2020-10-20' of git://git.kernel.dk/linux-blockLinus Torvalds7-301/+475
Pull io_uring updates from Jens Axboe: "A mix of fixes and a few stragglers. In detail: - Revert the bogus __read_mostly that we discussed for the initial pull request. - Fix a merge window regression with fixed file registration error path handling. - Fix io-wq numa node affinities. - Series abstracting out an io_identity struct, making it both easier to see what the personality items are, and also easier to to adopt more. Use this to cover audit logging. - Fix for read-ahead disabled block condition in async buffered reads, and using single page read-ahead to unify what generic_file_buffer_read() path is used. - Series for REQ_F_COMP_LOCKED fix and removal of it (Pavel) - Poll fix (Pavel)" * tag 'io_uring-5.10-2020-10-20' of git://git.kernel.dk/linux-block: (21 commits) io_uring: use blk_queue_nowait() to check if NOWAIT supported mm: use limited read-ahead to satisfy read mm: mark async iocb read as NOWAIT once some data has been copied io_uring: fix double poll mask init io-wq: inherit audit loginuid and sessionid io_uring: use percpu counters to track inflight requests io_uring: assign new io_identity for task if members have changed io_uring: store io_identity in io_uring_task io_uring: COW io_identity on mismatch io_uring: move io identity items into separate struct io_uring: rely solely on work flags to determine personality. io_uring: pass required context in as flags io-wq: assign NUMA node locality if appropriate io_uring: fix error path cleanup in io_sqe_files_register() Revert "io_uring: mark io_uring_fops/io_op_defs as __read_mostly" io_uring: fix REQ_F_COMP_LOCKED by killing it io_uring: dig out COMP_LOCK from deep call chain io_uring: don't put a poll req under spinlock io_uring: don't unnecessarily clear F_LINK_TIMEOUT io_uring: don't set COMP_LOCKED if won't put ...
2020-10-20Merge tag 'for-v5.10' of ↵Linus Torvalds55-1383/+3938
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply Pull power supply and reset updates from Sebastian Reichel: "Power-supply core: - add wireless type - properly document current direction Battery/charger driver changes: - new fuel-gauge/charger driver for RN5T618/RN5T619 - new charger driver for BQ25980, BQ25975 and BQ25960 - bq27xxx-battery: add support for TI bq34z100 - gpio-charger: convert to GPIO descriptors - gpio-charger: add optional support for charge current limiting - max17040: add support for max17041, max17043, max17044 - max17040: add support for max17048, max17049, max17058, max17059 - smb347-charger: add DT support - smb247-charger: add SMB345 and SMB358 support - simple-battery: add temperature properties - lots of minor fixes, cleanups and DT binding YAML conversions Reset drivers: - ocelot: Add support for Sparx5" * tag 'for-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (81 commits) power: reset: POWER_RESET_OCELOT_RESET should depend on Ocelot or Sparx5 power: supply: bq25980: Fix uninitialized wd_reg_val and overrun power: supply: ltc2941: Fix ptr to enum cast power: supply: test-power: revise parameter printing to use sprintf power: supply: charger-manager: fix incorrect check on charging_duration_ms power: supply: max17040: Fix ptr to enum cast power: supply: bq25980: Fix uninitialized wd_reg_val power: supply: bq25980: remove redundant zero check on ret power: reset: ocelot: Add support for Sparx5 dt-bindings: reset: ocelot: Add Sparx5 support power: supply: sbs-battery: keep error code when get_property() fails power: supply: bq25980: Add support for the BQ259xx family dt-binding: bq25980: Add the bq25980 flash charger power: supply: fix spelling mistake "unprecise" -> "imprecise" power: supply: test_power: add missing newlines when printing parameters by sysfs power: supply: pm2301: drop duplicated i2c_device_id power: supply: charger-manager: drop unused charger assignment power: supply: rt9455: skip 'struct acpi_device_id' when !CONFIG_ACPI power: supply: goldfish: skip 'struct acpi_device_id' when !CONFIG_ACPI power: supply: bq25890: skip 'struct acpi_device_id' when !CONFIG_ACPI ...
2020-10-20Merge tag 'drm-next-2020-10-19' of git://anongit.freedesktop.org/drm/drmLinus Torvalds14-34/+72
Pull drm fixes from Dave Airlie: "Some fixes queued up already for i915 and amdgpu, I've also included the fix for the clang warning you've seen. i915: - set all unused color plane offsets to ~0xfff again (Ville) - fix TGL DKL PHY DP vswing handling (Ville) amdgpu: - DCN clang warning fix - eDP fix - BACO fix - kernel documentation fixes - SMU7 mclk fix - VCN1 hw bug workaround amdkfd: - kvfree vs kfree fix" * tag 'drm-next-2020-10-19' of git://anongit.freedesktop.org/drm/drm: drm/amd/display: Fix incorrect dsc force enable logic drm/amdkfd: Use kvfree in destroy_crat_image drm/amdgpu: vcn and jpeg ring synchronization drm/amd/pm: increase mclk switch threshold to 200 us docs: amdgpu: fix a warning when building the documentation drm/amd/display: kernel-doc: document force_timing_sync drm/amdgpu/swsmu: init the baco mutex in early_init drm/amd/display: Fix module load hangs when connected to an eDP drm/i915: Set all unused color plane offsets to ~0xfff again drm/i915: Fix TGL DKL PHY DP vswing handling
2020-10-20Merge tag 'iommu-fix-v5.10' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu fix from Joerg Roedel: "Fix a build regression with !CONFIG_IOMMU_API" * tag 'iommu-fix-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/vt-d: Don't dereference iommu_device if IOMMU_API is not built
2020-10-20Merge tag 'for-linus-5.10b-rc1b-tag' of ↵Linus Torvalds19-165/+707
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull more xen updates from Juergen Gross: - A single patch to fix the Xen security issue XSA-331 (malicious guests can DoS dom0 by triggering NULL-pointer dereferences or access to stale data). - A larger series to fix the Xen security issue XSA-332 (malicious guests can DoS dom0 by sending events at high frequency leading to dom0's vcpus being busy in IRQ handling for elongated times). * tag 'for-linus-5.10b-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/events: block rogue events for some time xen/events: defer eoi in case of excessive number of events xen/events: use a common cpu hotplug hook for event channels xen/events: switch user event channels to lateeoi model xen/pciback: use lateeoi irq binding xen/pvcallsback: use lateeoi irq binding xen/scsiback: use lateeoi irq binding xen/netback: use lateeoi irq binding xen/blkback: use lateeoi irq binding xen/events: add a new "late EOI" evtchn framework xen/events: fix race in evtchn_fifo_unmask() xen/events: add a proper barrier to 2-level uevent unmasking xen/events: avoid removing an event channel while handling it
2020-10-20Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds34-112/+263
Pull ARM updates from Russell King: - handle inexact watchpoint addresses (Douglas Anderson) - decompressor serial debug cleanups (Linus Walleij) - update L2 cache prefetch bits (Guillaume Tucker) - add text offset and malloc size to the decompressor kexec data * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: add malloc size to decompressor kexec size structure ARM: add TEXT_OFFSET to decompressor kexec image structure ARM: 9007/1: l2c: fix prefetch bits init in L2X0_AUX_CTRL using DT values ARM: 9010/1: uncompress: Print the location of appended DTB ARM: 9009/1: uncompress: Enable debug in head.S ARM: 9008/1: uncompress: Drop excess whitespace print ARM: 9006/1: uncompress: Wait for ready and busy in debug prints ARM: 9005/1: debug: Select flow control for all debug UARTs ARM: 9004/1: debug: Split waituart to CTS and TXRDY ARM: 9003/1: uncompress: Delete unused debug macros ARM: 8997/2: hw_breakpoint: Handle inexact watchpoint addresses
2020-10-20Merge tag 'arc-5.10-rc1' of ↵Linus Torvalds36-1362/+15
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC updates from Vineet Gupta: "The bulk of ARC pull request is removal of EZChip NPS platform which was suffering from constant bitrot. In recent years EZChip has gone though multiple successive acquisitions and I guess things and people move on. I would like to take this opportunity to recognize and thank all those good folks (Gilad, Noam, Ofer...) for contributing major bits to ARC port (SMP, Big Endian). Summary: - drop support for EZChip NPS platform - misc other fixes" * tag 'arc-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: arc: include/asm: fix typos of "themselves" ARC: SMP: fix typo and use "come up" instead of "comeup" ARC: [dts] fix the errors detected by dtbs_check arc: plat-hsdk: fix kconfig dependency warning when !RESET_CONTROLLER ARC: [plat-eznps]: Drop support for EZChip NPS platform
2020-10-20xen/events: block rogue events for some timeJuergen Gross2-6/+24
In order to avoid high dom0 load due to rogue guests sending events at high frequency, block those events in case there was no action needed in dom0 to handle the events. This is done by adding a per-event counter, which set to zero in case an EOI without the XEN_EOI_FLAG_SPURIOUS is received from a backend driver, and incremented when this flag has been set. In case the counter is 2 or higher delay the EOI by 1 << (cnt - 2) jiffies, but not more than 1 second. In order not to waste memory shorten the per-event refcnt to two bytes (it should normally never exceed a value of 2). Add an overflow check to evtchn_get() to make sure the 2 bytes really won't overflow. This is part of XSA-332. Cc: stable@vger.kernel.org Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Wei Liu <wl@xen.org>
2020-10-20xen/events: defer eoi in case of excessive number of eventsJuergen Gross5-32/+216
In case rogue guests are sending events at high frequency it might happen that xen_evtchn_do_upcall() won't stop processing events in dom0. As this is done in irq handling a crash might be the result. In order to avoid that, delay further inter-domain events after some time in xen_evtchn_do_upcall() by forcing eoi processing into a worker on the same cpu, thus inhibiting new events coming in. The time after which eoi processing is to be delayed is configurable via a new module parameter "event_loop_timeout" which specifies the maximum event loop time in jiffies (default: 2, the value was chosen after some tests showing that a value of 2 was the lowest with an only slight drop of dom0 network throughput while multiple guests performed an event storm). How long eoi processing will be delayed can be specified via another parameter "event_eoi_delay" (again in jiffies, default 10, again the value was chosen after testing with different delay values). This is part of XSA-332. Cc: stable@vger.kernel.org Reported-by: Julien Grall <julien@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Wei Liu <wl@xen.org>
2020-10-20xen/events: use a common cpu hotplug hook for event channelsJuergen Gross3-21/+47
Today only fifo event channels have a cpu hotplug callback. In order to prepare for more percpu (de)init work move that callback into events_base.c and add percpu_init() and percpu_deinit() hooks to struct evtchn_ops. This is part of XSA-332. Cc: stable@vger.kernel.org Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wl@xen.org>
2020-10-20xen/events: switch user event channels to lateeoi modelJuergen Gross1-4/+3
Instead of disabling the irq when an event is received and enabling it again when handled by the user process use the lateeoi model. This is part of XSA-332. Cc: stable@vger.kernel.org Reported-by: Julien Grall <julien@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com> Tested-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wl@xen.org>
2020-10-20xen/pciback: use lateeoi irq bindingJuergen Gross4-19/+56
In order to reduce the chance for the system becoming unresponsive due to event storms triggered by a misbehaving pcifront use the lateeoi irq binding for pciback and unmask the event channel only just before leaving the event handling function. Restructure the handling to support that scheme. Basically an event can come in for two reasons: either a normal request for a pciback action, which is handled in a worker, or in case the guest has finished an AER request which was requested by pciback. When an AER request is issued to the guest and a normal pciback action is currently active issue an EOI early in order to be able to receive another event when the AER request has been finished by the guest. Let the worker processing the normal requests run until no further request is pending, instead of starting a new worker ion that case. Issue the EOI only just before leaving the worker. This scheme allows to drop calling the generic function xen_pcibk_test_and_schedule_op() after processing of any request as the handling of both request types is now separated more cleanly. This is part of XSA-332. Cc: stable@vger.kernel.org Reported-by: Julien Grall <julien@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wl@xen.org>
2020-10-20xen/pvcallsback: use lateeoi irq bindingJuergen Gross1-30/+46
In order to reduce the chance for the system becoming unresponsive due to event storms triggered by a misbehaving pvcallsfront use the lateeoi irq binding for pvcallsback and unmask the event channel only after handling all write requests, which are the ones coming in via an irq. This requires modifying the logic a little bit to not require an event for each write request, but to keep the ioworker running until no further data is found on the ring page to be processed. This is part of XSA-332. Cc: stable@vger.kernel.org Reported-by: Julien Grall <julien@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Wei Liu <wl@xen.org>
2020-10-20xen/scsiback: use lateeoi irq bindingJuergen Gross1-10/+13
In order to reduce the chance for the system becoming unresponsive due to event storms triggered by a misbehaving scsifront use the lateeoi irq binding for scsiback and unmask the event channel only just before leaving the event handling function. In case of a ring protocol error don't issue an EOI in order to avoid the possibility to use that for producing an event storm. This at once will result in no further call of scsiback_irq_fn(), so the ring_error struct member can be dropped and scsiback_do_cmd_fn() can signal the protocol error via a negative return value. This is part of XSA-332. Cc: stable@vger.kernel.org Reported-by: Julien Grall <julien@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wl@xen.org>
2020-10-20xen/netback: use lateeoi irq bindingJuergen Gross4-14/+86
In order to reduce the chance for the system becoming unresponsive due to event storms triggered by a misbehaving netfront use the lateeoi irq binding for netback and unmask the event channel only just before going to sleep waiting for new events. Make sure not to issue an EOI when none is pending by introducing an eoi_pending element to struct xenvif_queue. When no request has been consumed set the spurious flag when sending the EOI for an interrupt. This is part of XSA-332. Cc: stable@vger.kernel.org Reported-by: Julien Grall <julien@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wl@xen.org>
2020-10-20xen/blkback: use lateeoi irq bindingJuergen Gross2-8/+19
In order to reduce the chance for the system becoming unresponsive due to event storms triggered by a misbehaving blkfront use the lateeoi irq binding for blkback and unmask the event channel only after processing all pending requests. As the thread processing requests is used to do purging work in regular intervals an EOI may be sent only after having received an event. If there was no pending I/O request flag the EOI as spurious. This is part of XSA-332. Cc: stable@vger.kernel.org Reported-by: Julien Grall <julien@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wl@xen.org>
2020-10-20xen/events: add a new "late EOI" evtchn frameworkJuergen Gross2-17/+155
In order to avoid tight event channel related IRQ loops add a new framework of "late EOI" handling: the IRQ the event channel is bound to will be masked until the event has been handled and the related driver is capable to handle another event. The driver is responsible for unmasking the event channel via the new function xen_irq_lateeoi(). This is similar to binding an event channel to a threaded IRQ, but without having to structure the driver accordingly. In order to support a future special handling in case a rogue guest is sending lots of unsolicited events, add a flag to xen_irq_lateeoi() which can be set by the caller to indicate the event was a spurious one. This is part of XSA-332. Cc: stable@vger.kernel.org Reported-by: Julien Grall <julien@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Wei Liu <wl@xen.org>
2020-10-20xen/events: fix race in evtchn_fifo_unmask()Juergen Gross1-4/+9
Unmasking a fifo event channel can result in unmasking it twice, once directly in the kernel and once via a hypercall in case the event was pending. Fix that by doing the local unmask only if the event is not pending. This is part of XSA-332. Cc: stable@vger.kernel.org Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
2020-10-20xen/events: add a proper barrier to 2-level uevent unmaskingJuergen Gross1-0/+2
A follow-up patch will require certain write to happen before an event channel is unmasked. While the memory barrier is not strictly necessary for all the callers, the main one will need it. In order to avoid an extra memory barrier when using fifo event channels, mandate evtchn_unmask() to provide write ordering. The 2-level event handling unmask operation is missing an appropriate barrier, so add it. Fifo event channels are fine in this regard due to using sync_cmpxchg(). This is part of XSA-332. Cc: stable@vger.kernel.org Suggested-by: Julien Grall <julien@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Wei Liu <wl@xen.org>
2020-10-20xen/events: avoid removing an event channel while handling itJuergen Gross1-5/+36
Today it can happen that an event channel is being removed from the system while the event handling loop is active. This can lead to a race resulting in crashes or WARN() splats when trying to access the irq_info structure related to the event channel. Fix this problem by using a rwlock taken as reader in the event handling loop and as writer when deallocating the irq_info structure. As the observed problem was a NULL dereference in evtchn_from_irq() make this function more robust against races by testing the irq_info pointer to be not NULL before dereferencing it. And finally make all accesses to evtchn_to_irq[row][col] atomic ones in order to avoid seeing partial updates of an array element in irq handling. Note that irq handling can be entered only for event channels which have been valid before, so any not populated row isn't a problem in this regard, as rows are only ever added and never removed. This is XSA-331. Cc: stable@vger.kernel.org Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reported-by: Jinoh Kang <luke1337@theori.io> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Wei Liu <wl@xen.org>
2020-10-19Merge tag 'riscv-for-linus-5.10-mw0' of ↵Linus Torvalds31-241/+1212
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: "A handful of cleanups and new features: - A handful of cleanups for our page fault handling - Improvements to how we fill out cacheinfo - Support for EFI-based systems" * tag 'riscv-for-linus-5.10-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (22 commits) RISC-V: Add page table dump support for uefi RISC-V: Add EFI runtime services RISC-V: Add EFI stub support. RISC-V: Add PE/COFF header for EFI stub RISC-V: Implement late mapping page table allocation functions RISC-V: Add early ioremap support RISC-V: Move DT mapping outof fixmap RISC-V: Fix duplicate included thread_info.h riscv/mm/fault: Set FAULT_FLAG_INSTRUCTION flag in do_page_fault() riscv/mm/fault: Fix inline placement in vmalloc_fault() declaration riscv: Add cache information in AUX vector riscv: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO riscv: Set more data to cacheinfo riscv/mm/fault: Move access error check to function riscv/mm/fault: Move FAULT_FLAG_WRITE handling in do_page_fault() riscv/mm/fault: Simplify mm_fault_error() riscv/mm/fault: Move fault error handling to mm_fault_error() riscv/mm/fault: Simplify fault error handling riscv/mm/fault: Move vmalloc fault handling to vmalloc_fault() riscv/mm/fault: Move bad area handling to bad_area() ...
2020-10-19Merge tag 'm68knommu-for-v5.10' of ↵Linus Torvalds6-559/+402
git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu Pull m68knommu updates from Greg Ungerer: "A collection of fixes for 5.10: - switch to using asm-generic uaccess code - fix sparse warnings in signal code - fix compilation of ColdFire MMC support - support sysrq in ColdFire serial driver" * tag 'm68knommu-for-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: serial: mcf: add sysrq capability m68knommu: include SDHC support only when hardware has it m68knommu: fix sparse warnings in signal code m68knommu: switch to using asm-generic/uaccess.h
2020-10-19Merge tag 'xfs-5.10-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds37-509/+1023
Pull more xfs updates from Darrick Wong: "The second large pile of new stuff for 5.10, with changes even more monumental than last week! We are formally announcing the deprecation of the V4 filesystem format in 2030. All users must upgrade to the V5 format, which contains design improvements that greatly strengthen metadata validation, supports reflink and online fsck, and is the intended vehicle for handling timestamps past 2038. We're also deprecating the old Irix behavioral tweaks in September 2025. Coming along for the ride are two design changes to the deferred metadata ops subsystem. One of the improvements is to retain correct logical ordering of tasks and subtasks, which is a more logical design for upper layers of XFS and will become necessary when we add atomic file range swaps and commits. The second improvement to deferred ops improves the scalability of the log by helping the log tail to move forward during long-running operations. This reduces log contention when there are a large number of threads trying to run transactions. In addition to that, this fixes numerous small bugs in log recovery; refactors logical intent log item recovery to remove the last remaining place in XFS where we could have nested transactions; fixes a couple of ways that intent log item recovery could fail in ways that wouldn't have happened in the regular commit paths; fixes a deadlock vector in the GETFSMAP implementation (which improves its performance by 20%); and fixes serious bugs in the realtime growfs, fallocate, and bitmap handling code. Summary: - Deprecate the V4 filesystem format, some disused mount options, and some legacy sysctl knobs now that we can support dates into the 25th century. Note that removal of V4 support will not happen until the early 2030s. - Fix some probles with inode realtime flag propagation. - Fix some buffer handling issues when growing a rt filesystem. - Fix a problem where a BMAP_REMAP unmap call would free rt extents even though the purpose of BMAP_REMAP is to avoid freeing the blocks. - Strengthen the dabtree online scrubber to check hash values on child dabtree blocks. - Actually log new intent items created as part of recovering log intent items. - Fix a bug where quotas weren't attached to an inode undergoing bmap intent item recovery. - Fix a buffer overrun problem with specially crafted log buffer headers. - Various cleanups to type usage and slightly inaccurate comments. - More cleanups to the xattr, log, and quota code. - Don't run the (slower) shared-rmap operations on attr fork mappings. - Fix a bug where we failed to check the LSN of finobt blocks during replay and could therefore overwrite newer data with older data. - Clean up the ugly nested transaction mess that log recovery uses to stage intent item recovery in the correct order by creating a proper data structure to capture recovered chains. - Use the capture structure to resume intent item chains with the same log space and block reservations as when they were captured. - Fix a UAF bug in bmap intent item recovery where we failed to maintain our reference to the incore inode if the bmap operation needed to relog itself to continue. - Rearrange the defer ops mechanism to finish newly created subtasks of a parent task before moving on to the next parent task. - Automatically relog intent items in deferred ops chains if doing so would help us avoid pinning the log tail. This will help fix some log scaling problems now and will facilitate atomic file updates later. - Fix a deadlock in the GETFSMAP implementation by using an internal memory buffer to reduce indirect calls and copies to userspace, thereby improving its performance by ~20%. - Fix various problems when calling growfs on a realtime volume would not fully update the filesystem metadata. - Fix broken Kconfig asking about deprecated XFS when XFS is disabled" * tag 'xfs-5.10-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (48 commits) xfs: fix Kconfig asking about XFS_SUPPORT_V4 when XFS_FS=n xfs: fix high key handling in the rt allocator's query_range function xfs: annotate grabbing the realtime bitmap/summary locks in growfs xfs: make xfs_growfs_rt update secondary superblocks xfs: fix realtime bitmap/summary file truncation when growing rt volume xfs: fix the indent in xfs_trans_mod_dquot xfs: do the ASSERT for the arguments O_{u,g,p}dqpp xfs: fix deadlock and streamline xfs_getfsmap performance xfs: limit entries returned when counting fsmap records xfs: only relog deferred intent items if free space in the log gets low xfs: expose the log push threshold xfs: periodically relog deferred intent items xfs: change the order in which child and parent defer ops are finished xfs: fix an incore inode UAF in xfs_bui_recover xfs: clean up xfs_bui_item_recover iget/trans_alloc/ilock ordering xfs: clean up bmap intent item recovery checking xfs: xfs_defer_capture should absorb remaining transaction reservation xfs: xfs_defer_capture should absorb remaining block reservations xfs: proper replay of deferred ops queued during log recovery xfs: remove XFS_LI_RECOVERED ...
2020-10-19Merge tag 'fuse-update-5.10' of ↵Linus Torvalds20-496/+2689
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Support directly accessing host page cache from virtiofs. This can improve I/O performance for various workloads, as well as reducing the memory requirement by eliminating double caching. Thanks to Vivek Goyal for doing most of the work on this. - Allow automatic submounting inside virtiofs. This allows unique st_dev/ st_ino values to be assigned inside the guest to files residing on different filesystems on the host. Thanks to Max Reitz for the patches. - Fix an old use after free bug found by Pradeep P V K. * tag 'fuse-update-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (25 commits) virtiofs: calculate number of scatter-gather elements accurately fuse: connection remove fix fuse: implement crossmounts fuse: Allow fuse_fill_super_common() for submounts fuse: split fuse_mount off of fuse_conn fuse: drop fuse_conn parameter where possible fuse: store fuse_conn in fuse_req fuse: add submount support to <uapi/linux/fuse.h> fuse: fix page dereference after free virtiofs: add logic to free up a memory range virtiofs: maintain a list of busy elements virtiofs: serialize truncate/punch_hole and dax fault path virtiofs: define dax address space operations virtiofs: add DAX mmap support virtiofs: implement dax read/write operations virtiofs: introduce setupmapping/removemapping commands virtiofs: implement FUSE_INIT map_alignment field virtiofs: keep a list of free dax memory ranges virtiofs: add a mount option to enable dax virtiofs: set up virtio_fs dax_device ...
2020-10-19Merge tag 'zonefs-5.10-rc1' of ↵Linus Torvalds3-13/+233
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs updates from Damien Le Moal: "Add an 'explicit-open' mount option to automatically issue a REQ_OP_ZONE_OPEN command to the device whenever a sequential zone file is open for writing for the first time. This avoids 'insufficient zone resources' errors for write operations on some drives with limited zone resources or on ZNS drives with a limited number of active zones. From Johannes" * tag 'zonefs-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: document the explicit-open mount option zonefs: open/close zone on file open/close zonefs: provide no-lock zonefs_io_error variant zonefs: introduce helper for zone management
2020-10-19rtc: r9701: set rangeAlexandre Belloni1-5/+3
Set range and remove the set_time check. This is a classic BCD RTC. Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201015191135.471249-6-alexandre.belloni@bootlin.com
2020-10-19rtc: r9701: convert to devm_rtc_allocate_deviceAlexandre Belloni1-3/+3
This allows further improvement of the driver. Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201015191135.471249-5-alexandre.belloni@bootlin.com
2020-10-19rtc: r9701: stop setting RWKCNTAlexandre Belloni1-1/+0
tm_wday is never checked for validity and it is not read back in r9701_get_datetime. Avoid setting it to stop tripping static checkers: drivers/rtc/rtc-r9701.c:109 r9701_set_datetime() error: undefined (user controlled) shift '1 << dt->tm_wday' Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201015191135.471249-4-alexandre.belloni@bootlin.com
2020-10-19rtc: r9701: remove useless memsetAlexandre Belloni1-2/+0
The RTC core already sets to zero the struct rtc_tie it passes to the driver, avoid doing it a second time. Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201015191135.471249-3-alexandre.belloni@bootlin.com
2020-10-19rtc: r9701: stop setting a default timeAlexandre Belloni1-22/+0
It doesn't make sense to set the RTC to a default value at probe time. Let the core handle invalid date and time. Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201015191135.471249-2-alexandre.belloni@bootlin.com
2020-10-19rtc: r9701: remove leftover commentAlexandre Belloni1-4/+0
Commit 22652ba72453 ("rtc: stop validating rtc_time in .read_time") removed the code but not the associated comment. Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201015191135.471249-1-alexandre.belloni@bootlin.com
2020-10-19rtc: rv3032: Add a driver for Microcrystal RV-3032Alexandre Belloni3-0/+936
New driver for the Microcrystal RV-3032, including support for: - Date/time - Alarms - Low voltage detection - Trickle charge - Trimming - Clkout - RAM - EEPROM - Temperature sensor Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201013144110.1942218-3-alexandre.belloni@bootlin.com
2020-10-19dt-bindings: rtc: rv3032: add RV-3032 bindingsAlexandre Belloni1-0/+64
Document the Microcrystal RV-3032 device tree bindings Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20201013144110.1942218-2-alexandre.belloni@bootlin.com
2020-10-19dt-bindings: rtc: add trickle-voltage-millivoltAlexandre Belloni1-0/+6
Some RTCs have a trickle charge that is able to output different voltages depending on the type of the connected auxiliary power (battery, supercap, ...). Add a property allowing to specify the necessary voltage. Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20201013144110.1942218-1-alexandre.belloni@bootlin.com
2020-10-19io_uring: use blk_queue_nowait() to check if NOWAIT supportedJeffle Xu1-1/+1
commit 021a24460dc2 ("block: add QUEUE_FLAG_NOWAIT") adds a new helper function blk_queue_nowait() to check if the bdev supports handling of REQ_NOWAIT or not. Since then bio-based dm device can also support REQ_NOWAIT, and currently only dm-linear supports that since commit 6abc49468eea ("dm: add support for REQ_NOWAIT and enable it for linear target"). Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-19iommu/vt-d: Don't dereference iommu_device if IOMMU_API is not builtBartosz Golaszewski1-1/+1
Since commit c40aaaac1018 ("iommu/vt-d: Gracefully handle DMAR units with no supported address widths") dmar.c needs struct iommu_device to be selected. We can drop this dependency by not dereferencing struct iommu_device if IOMMU_API is not selected and by reusing the information stored in iommu->drhd->ignored instead. This fixes the following build error when IOMMU_API is not selected: drivers/iommu/intel/dmar.c: In function ‘free_iommu’: drivers/iommu/intel/dmar.c:1139:41: error: ‘struct iommu_device’ has no member named ‘ops’ 1139 | if (intel_iommu_enabled && iommu->iommu.ops) { ^ Fixes: c40aaaac1018 ("iommu/vt-d: Gracefully handle DMAR units with no supported address widths") Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Acked-by: David Woodhouse <dwmw@amazon.co.uk> Link: https://lore.kernel.org/r/20201013073055.11262-1-brgl@bgdev.pl Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-10-19Merge tag 'drm-intel-next-fixes-2020-10-15' of ↵Dave Airlie2-13/+6
git://anongit.freedesktop.org/drm/drm-intel into drm-next - Set all unused color plane offsets to ~0xfff again (Ville) - Fix TGL DKL PHY DP vswing handling (Ville) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201015181453.GA2905280@intel.com
2020-10-19drm/amd/display: Fix incorrect dsc force enable logicEryk Brol1-1/+1
[Why] Missed removing a '!' which results in incorrect behavior [How] Remove the offending '!' Signed-off-by: Eryk Brol <eryk.brol@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201015194053.355335-1-eryk.brol@amd.com
2020-10-19Merge tag 'amd-drm-fixes-5.10-2020-10-14' of ↵Dave Airlie11-20/+65
git://people.freedesktop.org/~agd5f/linux into drm-next amd-drm-fixes-5.10-2020-10-14: amdgpu: - eDP fix - BACO fix - Kernel documentation fixes - SMU7 mclk fix - VCN1 hw bug workaround amdkfd: - kvfree vs kfree fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201014195403.4558-1-alexander.deucher@amd.com
2020-10-18Merge tag 'linux-kselftest-kunit-5.10-rc1' of ↵Linus Torvalds16-110/+444
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull more Kunit updates from Shuah Khan: - add Kunit to kernel_init() and remove KUnit from init calls entirely. This addresses the concern that Kunit would not work correctly during late init phase. - add a linker section where KUnit can put references to its test suites. This is the first step in transitioning to dispatching all KUnit tests from a centralized executor rather than having each as its own separate late_initcall. - add a centralized executor to dispatch tests rather than relying on late_initcall to schedule each test suite separately. Centralized execution is for built-in tests only; modules will execute tests when loaded. - convert bitfield test to use KUnit framework - Documentation updates for naming guidelines and how kunit_test_suite() works. - add test plan to KUnit TAP format * tag 'linux-kselftest-kunit-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: lib: kunit: Fix compilation test when using TEST_BIT_FIELD_COMPILE lib: kunit: add bitfield test conversion to KUnit Documentation: kunit: add a brief blurb about kunit_test_suite kunit: test: add test plan to KUnit TAP format init: main: add KUnit to kernel init kunit: test: create a single centralized executor for all tests vmlinux.lds.h: add linker section for KUnit test suites Documentation: kunit: Add naming guidelines
2020-10-18Merge tag 'core-rcu-2020-10-12' of ↵Linus Torvalds57-421/+1582
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU changes from Ingo Molnar: - Debugging for smp_call_function() - RT raw/non-raw lock ordering fixes - Strict grace periods for KASAN - New smp_call_function() torture test - Torture-test updates - Documentation updates - Miscellaneous fixes [ This doesn't actually pull the tag - I've dropped the last merge from the RCU branch due to questions about the series. - Linus ] * tag 'core-rcu-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits) smp: Make symbol 'csd_bug_count' static kernel/smp: Provide CSD lock timeout diagnostics smp: Add source and destination CPUs to __call_single_data rcu: Shrink each possible cpu krcp rcu/segcblist: Prevent useless GP start if no CBs to accelerate torture: Add gdb support rcutorture: Allow pointer leaks to test diagnostic code rcutorture: Hoist OOM registry up one level refperf: Avoid null pointer dereference when buf fails to allocate rcutorture: Properly synchronize with OOM notifier rcutorture: Properly set rcu_fwds for OOM handling torture: Add kvm.sh --help and update help message rcutorture: Add CONFIG_PROVE_RCU_LIST to TREE05 torture: Update initrd documentation rcutorture: Replace HTTP links with HTTPS ones locktorture: Make function torture_percpu_rwsem_init() static torture: document --allcpus argument added to the kvm.sh script rcutorture: Output number of elapsed grace periods rcutorture: Remove KCSAN stubs rcu: Remove unused "cpu" parameter from rcu_report_qs_rdp() ...
2020-10-18Merge tag 'mailbox-v5.10' of ↵Linus Torvalds8-57/+506
git://git.linaro.org/landing-teams/working/fujitsu/integration Pull mailbox updates from Jassi Brar: - arm: implementation of mhu as a doorbell driver and conversion of dt-bindings to json-schema - mediatek: fix platform_get_irq error handling - bcm: convert tasklets to use new tasklet_setup api - core: fix race cause by hrtimer starting inappropriately * tag 'mailbox-v5.10' of git://git.linaro.org/landing-teams/working/fujitsu/integration: mailbox: avoid timer start from callback maiblox: mediatek: Fix handling of platform_get_irq() error mailbox: arm_mhu: Add ARM MHU doorbell driver mailbox: arm_mhu: Match only if compatible is "arm,mhu" dt-bindings: mailbox: add doorbell support to ARM MHU dt-bindings: mailbox : arm,mhu: Convert to Json-schema mailbox: bcm: convert tasklets to use new tasklet_setup() API
2020-10-18Merge branch 'for-5.10' of ↵Linus Torvalds11-26/+1118
git://git.kernel.org/pub/scm/linux/kernel/git/jlawall/linux Pull coccinelle updates from Julia Lawall. * 'for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jlawall/linux: coccinelle: api: add kfree_mismatch script coccinelle: iterators: Add for_each_child.cocci script scripts: coccicheck: Change default condition for parallelism scripts: coccicheck: Add quotes to improve portability coccinelle: api: kfree_sensitive: print memset position coccinelle: misc: add flexible_array.cocci script coccinelle: api: add kvmalloc script scripts: coccicheck: Change default value for parallelism coccinelle: misc: add excluded_middle.cocci script scripts: coccicheck: Improve error feedback when coccicheck fails coccinelle: api: update kzfree script to kfree_sensitive coccinelle: misc: add uninitialized_var.cocci script coccinelle: ifnullfree: add vfree(), kvfree*() functions coccinelle: api: add kobj_to_dev.cocci script coccinelle: add patch rule for dma_alloc_coherent scripts: coccicheck: Add chain mode to list of modes
2020-10-18Merge branch 'akpm' (patches from Andrew)Linus Torvalds55-439/+592
Merge yet more updates from Andrew Morton: "Subsystems affected by this patch series: mm (memcg, migration, pagemap, gup, madvise, vmalloc), ia64, and misc" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (31 commits) mm: remove duplicate include statement in mmu.c mm: remove the filename in the top of file comment in vmalloc.c mm: cleanup the gfp_mask handling in __vmalloc_area_node mm: remove alloc_vm_area x86/xen: open code alloc_vm_area in arch_gnttab_valloc xen/xenbus: use apply_to_page_range directly in xenbus_map_ring_pv drm/i915: use vmap in i915_gem_object_map drm/i915: stop using kmap in i915_gem_object_map drm/i915: use vmap in shmem_pin_map zsmalloc: switch from alloc_vm_area to get_vm_area mm: allow a NULL fn callback in apply_to_page_range mm: add a vmap_pfn function mm: add a VM_MAP_PUT_PAGES flag for vmap mm: update the documentation for vfree mm/madvise: introduce process_madvise() syscall: an external memory hinting API pid: move pidfd_get_pid() to pid.c mm/madvise: pass mm to do_madvise selftests/vm: 10x speedup for hmm-tests binfmt_elf: take the mmap lock around find_extend_vma() mm/gup_benchmark: take the mmap lock around GUP ...
2020-10-18Merge tag 'for-linus-5.10-rc1' of ↵Linus Torvalds14-61/+91
git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML updates from Richard Weinberger: - Improve support for non-glibc systems - Vector: Add support for scripting and dynamic tap devices - Various fixes for the vector networking driver - Various fixes for time travel mode * tag 'for-linus-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: um: vector: Add dynamic tap interfaces and scripting um: Clean up stacktrace dump um: Fix incorrect assumptions about max pid length um: Remove dead usage of TIF_IA32 um: Remove redundant NULL check um: change sigio_spinlock to a mutex um: time-travel: Return the sequence number in ACK messages um: time-travel: Fix IRQ handling in time_travel_handle_message() um: Allow static linking for non-glibc implementations um: Some fixes to build UML with musl um: vector: Use GFP_ATOMIC under spin lock um: Fix null pointer dereference in vector_user_bpf
2020-10-18Merge tag 'for-linus-5.10-rc1-part2' of ↵Linus Torvalds7-4/+25
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull more ubi and ubifs updates from Richard Weinberger: "UBI: - Correctly use kthread_should_stop in ubi worker UBIFS: - Fixes for memory leaks while iterating directory entries - Fix for a user triggerable error message - Fix for a space accounting bug in authenticated mode" * tag 'for-linus-5.10-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubifs: journal: Make sure to not dirty twice for auth nodes ubifs: setflags: Don't show error message when vfs_ioc_setflags_prepare() fails ubifs: ubifs_jnl_change_xattr: Remove assertion 'nlink > 0' for host inode ubi: check kthread_should_stop() after the setting of task state ubifs: dent: Fix some potential memory leaks while iterating entries ubifs: xattr: Fix some potential memory leaks while iterating entries
2020-10-18Merge tag 'for-linus-5.10-rc1' of ↵Linus Torvalds5-21/+34
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull ubifs updates from Richard Weinberger: - Kernel-doc fixes - Fixes for memory leaks in authentication option parsing * tag 'for-linus-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubifs: mount_ubifs: Release authentication resource in error handling path ubifs: Don't parse authentication mount options in remount process ubifs: Fix a memleak after dumping authentication mount options ubifs: Fix some kernel-doc warnings in tnc.c ubifs: Fix some kernel-doc warnings in replay.c ubifs: Fix some kernel-doc warnings in gc.c ubifs: Fix 'hash' kernel-doc warning in auth.c
2020-10-18mm: remove duplicate include statement in mmu.cTian Tao1-1/+0
asm/sections.h is included more than once, Remove the one that isn't necessary. Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Link: https://lkml.kernel.org/r/1600088607-17327-1-git-send-email-tiantao6@hisilicon.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18mm: remove the filename in the top of file comment in vmalloc.cChristoph Hellwig1-2/+0
No point in having the filename inside the file. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Link: https://lkml.kernel.org/r/20201002124035.1539300-3-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18mm: cleanup the gfp_mask handling in __vmalloc_area_nodeChristoph Hellwig1-12/+10
Patch series "two small vmalloc cleanups". This patch (of 2): __vmalloc_area_node currently has four different gfp_t variables to just express this simple logic: - use the passed in mask, plus __GFP_NOWARN and __GFP_HIGHMEM (if suitable) for the underlying page allocation - use just the reclaim flags from the passed in mask plus __GFP_ZERO for allocating the page array Simplify this down to just use the pre-existing nested_gfp as-is for the page array allocation, and just the passed in gfp_mask for the page allocation, after conditionally ORing __GFP_HIGHMEM into it. This also makes the allocation warning a little more correct. Also initialize two variables at the time of declaration while touching this area. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Link: https://lkml.kernel.org/r/20201002124035.1539300-1-hch@lst.de Link: https://lkml.kernel.org/r/20201002124035.1539300-2-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18mm: remove alloc_vm_areaChristoph Hellwig3-59/+1
All users are gone now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Juergen Gross <jgross@suse.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Link: https://lkml.kernel.org/r/20201002122204.1534411-12-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18x86/xen: open code alloc_vm_area in arch_gnttab_vallocChristoph Hellwig1-7/+20
Replace the last call to alloc_vm_area with an open coded version using an iterator in struct gnttab_vm_area instead of the triple indirection magic in alloc_vm_area. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Juergen Gross <jgross@suse.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Link: https://lkml.kernel.org/r/20201002122204.1534411-11-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18xen/xenbus: use apply_to_page_range directly in xenbus_map_ring_pvChristoph Hellwig1-14/+16
Replacing alloc_vm_area with get_vm_area_caller + apply_page_range allows to fill put the phys_addr values directly instead of doing another loop over all addresses. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Juergen Gross <jgross@suse.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Link: https://lkml.kernel.org/r/20201002122204.1534411-10-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18drm/i915: use vmap in i915_gem_object_mapChristoph Hellwig2-68/+60
i915_gem_object_map implements fairly low-level vmap functionality in a driver. Split it into two helpers, one for remapping kernel memory which can use vmap, and one for I/O memory that uses vmap_pfn. The only practical difference is that alloc_vm_area prefeaults the vmalloc area PTEs, which doesn't seem to be required here for the kernel memory case (and could be added to vmap using a flag if actually required). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Juergen Gross <jgross@suse.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Link: https://lkml.kernel.org/r/20201002122204.1534411-9-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18drm/i915: stop using kmap in i915_gem_object_mapChristoph Hellwig1-5/+2
kmap for !PageHighmem is just a convoluted way to say page_address, and kunmap is a no-op in that case. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Juergen Gross <jgross@suse.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Link: https://lkml.kernel.org/r/20201002122204.1534411-8-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18drm/i915: use vmap in shmem_pin_mapChristoph Hellwig1-58/+18
shmem_pin_map somewhat awkwardly reimplements vmap using alloc_vm_area and manual pte setup. The only practical difference is that alloc_vm_area prefeaults the vmalloc area PTEs, which doesn't seem to be required here (and could be added to vmap using a flag if actually required). Switch to use vmap, and use vfree to free both the vmalloc mapping and the page array, as well as dropping the references to each page. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Juergen Gross <jgross@suse.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Link: https://lkml.kernel.org/r/20201002122204.1534411-7-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18zsmalloc: switch from alloc_vm_area to get_vm_areaChristoph Hellwig1-2/+8
Just manually pre-fault the PTEs using apply_to_page_range. Co-developed-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Juergen Gross <jgross@suse.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Link: https://lkml.kernel.org/r/20201002122204.1534411-6-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18mm: allow a NULL fn callback in apply_to_page_rangeChristoph Hellwig1-7/+9
Besides calling the callback on each page, apply_to_page_range also has the effect of pre-faulting all PTEs for the range. To support callers that only need the pre-faulting, make the callback optional. Based on a patch from Minchan Kim <minchan@kernel.org>. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Juergen Gross <jgross@suse.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Link: https://lkml.kernel.org/r/20201002122204.1534411-5-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18mm: add a vmap_pfn functionChristoph Hellwig3-0/+49
Add a proper helper to remap PFNs into kernel virtual space so that drivers don't have to abuse alloc_vm_area and open coded PTE manipulation for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Juergen Gross <jgross@suse.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Link: https://lkml.kernel.org/r/20201002122204.1534411-4-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18mm: add a VM_MAP_PUT_PAGES flag for vmapChristoph Hellwig2-2/+8
Add a flag so that vmap takes ownership of the passed in page array. When vfree is called on such an allocation it will put one reference on each page, and free the page array itself. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Juergen Gross <jgross@suse.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Link: https://lkml.kernel.org/r/20201002122204.1534411-3-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18mm: update the documentation for vfreeMatthew Wilcox (Oracle)1-10/+11
Patch series "remove alloc_vm_area", v4. This series removes alloc_vm_area, which was left over from the big vmalloc interface rework. It is a rather arkane interface, basicaly the equivalent of get_vm_area + actually faulting in all PTEs in the allocated area. It was originally addeds for Xen (which isn't modular to start with), and then grew users in zsmalloc and i915 which seems to mostly qualify as abuses of the interface, especially for i915 as a random driver should not set up PTE bits directly. This patch (of 11): * Document that you can call vfree() on an address returned from vmap() * Remove the note about the minimum size -- the minimum size of a vmalloc allocation is one page * Add a Context: section * Fix capitalisation * Reword the prohibition on calling from NMI context to avoid a double negative Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Link: https://lkml.kernel.org/r/20201002122204.1534411-1-hch@lst.de Link: https://lkml.kernel.org/r/20201002122204.1534411-2-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18mm/madvise: introduce process_madvise() syscall: an external memory hinting APIMinchan Kim22-3/+117
There is usecase that System Management Software(SMS) want to give a memory hint like MADV_[COLD|PAGEEOUT] to other processes and in the case of Android, it is the ActivityManagerService. The information required to make the reclaim decision is not known to the app. Instead, it is known to the centralized userspace daemon(ActivityManagerService), and that daemon must be able to initiate reclaim on its own without any app involvement. To solve the issue, this patch introduces a new syscall process_madvise(2). It uses pidfd of an external process to give the hint. It also supports vector address range because Android app has thousands of vmas due to zygote so it's totally waste of CPU and power if we should call the syscall one by one for each vma.(With testing 2000-vma syscall vs 1-vector syscall, it showed 15% performance improvement. I think it would be bigger in real practice because the testing ran very cache friendly environment). Another potential use case for the vector range is to amortize the cost ofTLB shootdowns for multiple ranges when using MADV_DONTNEED; this could benefit users like TCP receive zerocopy and malloc implementations. In future, we could find more usecases for other advises so let's make it happens as API since we introduce a new syscall at this moment. With that, existing madvise(2) user could replace it with process_madvise(2) with their own pid if they want to have batch address ranges support feature. ince it could affect other process's address range, only privileged process(PTRACE_MODE_ATTACH_FSCREDS) or something else(e.g., being the same UID) gives it the right to ptrace the process could use it successfully. The flag argument is reserved for future use if we need to extend the API. I think supporting all hints madvise has/will supported/support to process_madvise is rather risky. Because we are not sure all hints make sense from external process and implementation for the hint may rely on the caller being in the current context so it could be error-prone. Thus, I just limited hints as MADV_[COLD|PAGEOUT] in this patch. If someone want to add other hints, we could hear the usecase and review it for each hint. It's safer for maintenance rather than introducing a buggy syscall but hard to fix it later. So finally, the API is as follows, ssize_t process_madvise(int pidfd, const struct iovec *iovec, unsigned long vlen, int advice, unsigned int flags); DESCRIPTION The process_madvise() system call is used to give advice or directions to the kernel about the address ranges from external process as well as local process. It provides the advice to address ranges of process described by iovec and vlen. The goal of such advice is to improve system or application performance. The pidfd selects the process referred to by the PID file descriptor specified in pidfd. (See pidofd_open(2) for further information) The pointer iovec points to an array of iovec structures, defined in <sys/uio.h> as: struct iovec { void *iov_base; /* starting address */ size_t iov_len; /* number of bytes to be advised */ }; The iovec describes address ranges beginning at address(iov_base) and with size length of bytes(iov_len). The vlen represents the number of elements in iovec. The advice is indicated in the advice argument, which is one of the following at this moment if the target process specified by pidfd is external. MADV_COLD MADV_PAGEOUT Permission to provide a hint to external process is governed by a ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2). The process_madvise supports every advice madvise(2) has if target process is in same thread group with calling process so user could use process_madvise(2) to extend existing madvise(2) to support vector address ranges. RETURN VALUE On success, process_madvise() returns the number of bytes advised. This return value may be less than the total number of requested bytes, if an error occurred. The caller should check return value to determine whether a partial advice occurred. FAQ: Q.1 - Why does any external entity have better knowledge? Quote from Sandeep "For Android, every application (including the special SystemServer) are forked from Zygote. The reason of course is to share as many libraries and classes between the two as possible to benefit from the preloading during boot. After applications start, (almost) all of the APIs end up calling into this SystemServer process over IPC (binder) and back to the application. In a fully running system, the SystemServer monitors every single process periodically to calculate their PSS / RSS and also decides which process is "important" to the user for interactivity. So, because of how these processes start _and_ the fact that the SystemServer is looping to monitor each process, it does tend to *know* which address range of the application is not used / useful. Besides, we can never rely on applications to clean things up themselves. We've had the "hey app1, the system is low on memory, please trim your memory usage down" notifications for a long time[1]. They rely on applications honoring the broadcasts and very few do. So, if we want to avoid the inevitable killing of the application and restarting it, some way to be able to tell the OS about unimportant memory in these applications will be useful. - ssp Q.2 - How to guarantee the race(i.e., object validation) between when giving a hint from an external process and get the hint from the target process? process_madvise operates on the target process's address space as it exists at the instant that process_madvise is called. If the space target process can run between the time the process_madvise process inspects the target process address space and the time that process_madvise is actually called, process_madvise may operate on memory regions that the calling process does not expect. It's the responsibility of the process calling process_madvise to close this race condition. For example, the calling process can suspend the target process with ptrace, SIGSTOP, or the freezer cgroup so that it doesn't have an opportunity to change its own address space before process_madvise is called. Another option is to operate on memory regions that the caller knows a priori will be unchanged in the target process. Yet another option is to accept the race for certain process_madvise calls after reasoning that mistargeting will do no harm. The suggested API itself does not provide synchronization. It also apply other APIs like move_pages, process_vm_write. The race isn't really a problem though. Why is it so wrong to require that callers do their own synchronization in some manner? Nobody objects to write(2) merely because it's possible for two processes to open the same file and clobber each other's writes --- instead, we tell people to use flock or something. Think about mmap. It never guarantees newly allocated address space is still valid when the user tries to access it because other threads could unmap the memory right before. That's where we need synchronization by using other API or design from userside. It shouldn't be part of API itself. If someone needs more fine-grained synchronization rather than process level, there were two ideas suggested - cookie[2] and anon-fd[3]. Both are applicable via using last reserved argument of the API but I don't think it's necessary right now since we have already ways to prevent the race so don't want to add additional complexity with more fine-grained optimization model. To make the API extend, it reserved an unsigned long as last argument so we could support it in future if someone really needs it. Q.3 - Why doesn't ptrace work? Injecting an madvise in the target process using ptrace would not work for us because such injected madvise would have to be executed by the target process, which means that process would have to be runnable and that creates the risk of the abovementioned race and hinting a wrong VMA. Furthermore, we want to act the hint in caller's context, not the callee's, because the callee is usually limited in cpuset/cgroups or even freezed state so they can't act by themselves quick enough, which causes more thrashing/kill. It doesn't work if the target process are ptraced(e.g., strace, debugger, minidump) because a process can have at most one ptracer. [1] https://developer.android.com/topic/performance/memory" [2] process_getinfo for getting the cookie which is updated whenever vma of process address layout are changed - Daniel Colascione - https://lore.kernel.org/lkml/20190520035254.57579-1-minchan@kernel.org/T/#m7694416fd179b2066a2c62b5b139b14e3894e224 [3] anonymous fd which is used for the object(i.e., address range) validation - Michal Hocko - https://lore.kernel.org/lkml/20200120112722.GY18451@dhcp22.suse.cz/ [minchan@kernel.org: fix process_madvise build break for arm64] Link: http://lkml.kernel.org/r/20200303145756.GA219683@google.com [minchan@kernel.org: fix build error for mips of process_madvise] Link: http://lkml.kernel.org/r/20200508052517.GA197378@google.com [akpm@linux-foundation.org: fix patch ordering issue] [akpm@linux-foundation.org: fix arm64 whoops] [minchan@kernel.org: make process_madvise() vlen arg have type size_t, per Florian] [akpm@linux-foundation.org: fix i386 build] [sfr@canb.auug.org.au: fix syscall numbering] Link: https://lkml.kernel.org/r/20200905142639.49fc3f1a@canb.auug.org.au [sfr@canb.auug.org.au: madvise.c needs compat.h] Link: https://lkml.kernel.org/r/20200908204547.285646b4@canb.auug.org.au [minchan@kernel.org: fix mips build] Link: https://lkml.kernel.org/r/20200909173655.GC2435453@google.com [yuehaibing@huawei.com: remove duplicate header which is included twice] Link: https://lkml.kernel.org/r/20200915121550.30584-1-yuehaibing@huawei.com [minchan@kernel.org: do not use helper functions for process_madvise] Link: https://lkml.kernel.org/r/20200921175539.GB387368@google.com [akpm@linux-foundation.org: pidfd_get_pid() gained an argument] [sfr@canb.auug.org.au: fix up for "iov_iter: transparently handle compat iovecs in import_iovec"] Link: https://lkml.kernel.org/r/20200928212542.468e1fef@canb.auug.org.au Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Christian Brauner <christian@brauner.io> Cc: Daniel Colascione <dancol@google.com> Cc: Jann Horn <jannh@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Dias <joaodias@google.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Sandeep Patil <sspatil@google.com> Cc: SeongJae Park <sj38.park@gmail.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Florian Weimer <fw@deneb.enyo.de> Cc: <linux-man@vger.kernel.org> Link: http://lkml.kernel.org/r/20200302193630.68771-3-minchan@kernel.org Link: http://lkml.kernel.org/r/20200508183320.GA125527@google.com Link: http://lkml.kernel.org/r/20200622192900.22757-4-minchan@kernel.org Link: https://lkml.kernel.org/r/20200901000633.1920247-4-minchan@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18pid: move pidfd_get_pid() to pid.cMinchan Kim3-19/+20
process_madvise syscall needs pidfd_get_pid function to translate pidfd to pid so this patch move the function to kernel/pid.c. Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jann Horn <jannh@google.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Daniel Colascione <dancol@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Dias <joaodias@google.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Sandeep Patil <sspatil@google.com> Cc: SeongJae Park <sj38.park@gmail.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Florian Weimer <fw@deneb.enyo.de> Cc: <linux-man@vger.kernel.org> Link: http://lkml.kernel.org/r/20200302193630.68771-5-minchan@kernel.org Link: http://lkml.kernel.org/r/20200622192900.22757-3-minchan@kernel.org Link: https://lkml.kernel.org/r/20200901000633.1920247-3-minchan@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18mm/madvise: pass mm to do_madviseMinchan Kim3-16/+20
Patch series "introduce memory hinting API for external process", v9. Now, we have MADV_PAGEOUT and MADV_COLD as madvise hinting API. With that, application could give hints to kernel what memory range are preferred to be reclaimed. However, in some platform(e.g., Android), the information required to make the hinting decision is not known to the app. Instead, it is known to a centralized userspace daemon(e.g., ActivityManagerService), and that daemon must be able to initiate reclaim on its own without any app involvement. To solve the concern, this patch introduces new syscall - process_madvise(2). Bascially, it's same with madvise(2) syscall but it has some differences. 1. It needs pidfd of target process to provide the hint 2. It supports only MADV_{COLD|PAGEOUT|MERGEABLE|UNMEREABLE} at this moment. Other hints in madvise will be opened when there are explicit requests from community to prevent unexpected bugs we couldn't support. 3. Only privileged processes can do something for other process's address space. For more detail of the new API, please see "mm: introduce external memory hinting API" description in this patchset. This patch (of 3): In upcoming patches, do_madvise will be called from external process context so we shouldn't asssume "current" is always hinted process's task_struct. Furthermore, we must not access mm_struct via task->mm, but obtain it via access_mm() once (in the following patch) and only use that pointer [1], so pass it to do_madvise() as well. Note the vma->vm_mm pointers are safe, so we can use them further down the call stack. And let's pass current->mm as arguments of do_madvise so it shouldn't change existing behavior but prepare next patch to make review easy. [vbabka@suse.cz: changelog tweak] [minchan@kernel.org: use current->mm for io_uring] Link: http://lkml.kernel.org/r/20200423145215.72666-1-minchan@kernel.org [akpm@linux-foundation.org: fix it for upstream changes] [akpm@linux-foundation.org: whoops] [rdunlap@infradead.org: add missing includes] Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jann Horn <jannh@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Daniel Colascione <dancol@google.com> Cc: Sandeep Patil <sspatil@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: John Dias <joaodias@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: SeongJae Park <sj38.park@gmail.com> Cc: Christian Brauner <christian@brauner.io> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Florian Weimer <fw@deneb.enyo.de> Cc: <linux-man@vger.kernel.org> Link: https://lkml.kernel.org/r/20200901000633.1920247-1-minchan@kernel.org Link: http://lkml.kernel.org/r/20200622192900.22757-1-minchan@kernel.org Link: http://lkml.kernel.org/r/20200302193630.68771-2-minchan@kernel.org Link: http://lkml.kernel.org/r/20200622192900.22757-2-minchan@kernel.org Link: https://lkml.kernel.org/r/20200901000633.1920247-2-minchan@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18selftests/vm: 10x speedup for hmm-testsJohn Hubbard1-1/+1
This patch reduces the running time for hmm-tests from about 10+ seconds, to just under 1.0 second, for an approximately 10x speedup. That brings it in line with most of the other tests in selftests/vm, which mostly run in < 1 sec. This is done with a one-line change that simply reduces the number of iterations of several tests, from 256, to 10. Thanks to Ralph Campbell for suggesting changing NTIMES as a way to get the speedup. Suggested-by: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: SeongJae Park <sj38.park@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Link: https://lkml.kernel.org/r/20201003011721.44238-1-jhubbard@nvidia.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18binfmt_elf: take the mmap lock around find_extend_vma()Jann Horn1-0/+3
create_elf_tables() runs after setup_new_exec(), so other tasks can already access our new mm and do things like process_madvise() on it. (At the time I'm writing this commit, process_madvise() is not in mainline yet, but has been in akpm's tree for some time.) While I believe that there are currently no APIs that would actually allow another process to mess up our VMA tree (process_madvise() is limited to MADV_COLD and MADV_PAGEOUT, and uring and userfaultfd cannot reach an mm under which no syscalls have been executed yet), this seems like an accident waiting to happen. Let's make sure that we always take the mmap lock around GUP paths as long as another process might be able to see the mm. (Yes, this diff looks suspicious because we drop the lock before doing anything with `vma`, but that's because we actually don't do anything with it apart from the NULL check.) Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michel Lespinasse <walken@google.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Link: https://lkml.kernel.org/r/CAG48ez1-PBCdv3y8pn-Ty-b+FmBSLwDuVKFSt8h7wARLy0dF-Q@mail.gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18mm/gup_benchmark: take the mmap lock around GUPJann Horn1-3/+12
To be safe against concurrent changes to the VMA tree, we must take the mmap lock around GUP operations (excluding the GUP-fast family of operations, which will take the mmap lock by themselves if necessary). This code is only for testing, and it's only reachable by root through debugfs, so this doesn't really have any impact; however, if we want to add lockdep asserts into the GUP path, we need to have clean locking here. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Acked-by: Michel Lespinasse <walken@google.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Link: https://lkml.kernel.org/r/CAG48ez3SG6ngZLtasxJ6LABpOnqCz5-QHqb0B4k44TQ8F9n6+w@mail.gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18mm/mmap: add inline munmap_vma_range() for code readabilityLiam R. Howlett1-15/+33
There are two locations that have a block of code for munmapping a vma range. Change those two locations to use a function and add meaningful comments about what happens to the arguments, which was unclear in the previous code. Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200818154707.2515169-2-Liam.Howlett@Oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18mm/mmap: add inline vma_next() for readability of mmap codeLiam R. Howlett1-6/+20
There are three places that the next vma is required which uses the same block of code. Replace the block with a function and add comments on what happens in the case where NULL is encountered. Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200818154707.2515169-1-Liam.Howlett@Oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18mm/migrate: avoid possible unnecessary process right check in ↵Miaohe Lin1-28/+43
kernel_move_pages() There is no need to check if this process has the right to modify the specified process when they are same. And we could also skip the security hook call if a process is modifying its own pages. Add helper function to handle these. Suggested-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Hongxiang Lou <louhongxiang@huawei.com> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christopher Lameter <cl@linux.com> Link: https://lkml.kernel.org/r/20200819083331.19012-1-linmiaohe@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18mm/memory_hotplug: remove a wrapper for alloc_migration_target()Joonsoo Kim1-24/+22
To calculate the correct node to migrate the page for hotplug, we need to check node id of the page. Wrapper for alloc_migration_target() exists for this purpose. However, Vlastimil informs that all migration source pages come from a single node. In this case, we don't need to check the node id for each page and we don't need to re-set the target nodemask for each page by using the wrapper. Set up the migration_target_control once and use it for all pages. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Roman Gushchin <guro@fb.com> Link: http://lkml.kernel.org/r/1594622517-20681-10-git-send-email-iamjoonsoo.kim@lge.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18mm/memory-failure: remove a wrapper for alloc_migration_target()Joonsoo Kim1-12/+6
There is a well-defined standard migration target callback. Use it directly. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Hellwig <hch@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Roman Gushchin <guro@fb.com> Link: http://lkml.kernel.org/r/1594622517-20681-9-git-send-email-iamjoonsoo.kim@lge.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18mm: kmem: enable kernel memcg accounting from interrupt contextsRoman Gushchin2-12/+13
If a memcg to charge can be determined (using remote charging API), there are no reasons to exclude allocations made from an interrupt context from the accounting. Such allocations will pass even if the resulting memcg size will exceed the hard limit, but it will affect the application of the memory pressure and an inability to put the workload under the limit will eventually trigger the OOM. To use active_memcg() helper, memcg_kmem_bypass() is moved back to memcontrol.c. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Link: http://lkml.kernel.org/r/20200827225843.1270629-5-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18mm: kmem: prepare remote memcg charging infra for interrupt contextsRoman Gushchin2-16/+45
Remote memcg charging API uses current->active_memcg to store the currently active memory cgroup, which overwrites the memory cgroup of the current process. It works well for normal contexts, but doesn't work for interrupt contexts: indeed, if an interrupt occurs during the execution of a section with an active memcg set, all allocations inside the interrupt will be charged to the active memcg set (given that we'll enable accounting for allocations from an interrupt context). But because the interrupt might have no relation to the active memcg set outside, it's obviously wrong from the accounting prospective. To resolve this problem, let's add a global percpu int_active_memcg variable, which will be used to store an active memory cgroup which will be used from interrupt contexts. set_active_memcg() will transparently use current->active_memcg or int_active_memcg depending on the context. To make the read part simple and transparent for the caller, let's introduce two new functions: - struct mem_cgroup *active_memcg(void), - struct mem_cgroup *get_active_memcg(void). They are returning the active memcg if it's set, hiding all implementation details: where to get it depending on the current context. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Link: http://lkml.kernel.org/r/20200827225843.1270629-4-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18mm: kmem: remove redundant checks from get_obj_cgroup_from_current()Roman Gushchin1-3/+0
There are checks for current->mm and current->active_memcg in get_obj_cgroup_from_current(), but these checks are redundant: memcg_kmem_bypass() called just above performs same checks. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Link: http://lkml.kernel.org/r/20200827225843.1270629-3-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18mm: kmem: move memcg_kmem_bypass() calls to get_mem/obj_cgroup_from_current()Roman Gushchin3-10/+9
Patch series "mm: kmem: kernel memory accounting in an interrupt context". This patchset implements memcg-based memory accounting of allocations made from an interrupt context. Historically, such allocations were passed unaccounted mostly because charging the memory cgroup of the current process wasn't an option. Also performance reasons were likely a reason too. The remote charging API allows to temporarily overwrite the currently active memory cgroup, so that all memory allocations are accounted towards some specified memory cgroup instead of the memory cgroup of the current process. This patchset extends the remote charging API so that it can be used from an interrupt context. Then it removes the fence that prevented the accounting of allocations made from an interrupt context. It also contains a couple of optimizations/code refactorings. This patchset doesn't directly enable accounting for any specific allocations, but prepares the code base for it. The bpf memory accounting will likely be the first user of it: a typical example is a bpf program parsing an incoming network packet, which allocates an entry in hashmap map to store some information. This patch (of 4): Currently memcg_kmem_bypass() is called before obtaining the current memory/obj cgroup using get_mem/obj_cgroup_from_current(). Moving memcg_kmem_bypass() into get_mem/obj_cgroup_from_current() reduces the number of call sites and allows further code simplifications. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Link: http://lkml.kernel.org/r/20200827225843.1270629-1-guro@fb.com Link: http://lkml.kernel.org/r/20200827225843.1270629-2-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18mm, memcg: rework remote charging API to support nestingRoman Gushchin5-30/+22
Currently the remote memcg charging API consists of two functions: memalloc_use_memcg() and memalloc_unuse_memcg(), which set and clear the memcg value, which overwrites the memcg of the current task. memalloc_use_memcg(target_memcg); <...> memalloc_unuse_memcg(); It works perfectly for allocations performed from a normal context, however an attempt to call it from an interrupt context or just nest two remote charging blocks will lead to an incorrect accounting. On exit from the inner block the active memcg will be cleared instead of being restored. memalloc_use_memcg(target_memcg); memalloc_use_memcg(target_memcg_2); <...> memalloc_unuse_memcg(); Error: allocation here are charged to the memcg of the current process instead of target_memcg. memalloc_unuse_memcg(); This patch extends the remote charging API by switching to a single function: struct mem_cgroup *set_active_memcg(struct mem_cgroup *memcg), which sets the new value and returns the old one. So a remote charging block will look like: old_memcg = set_active_memcg(target_memcg); <...> set_active_memcg(old_memcg); This patch is heavily based on the patch by Johannes Weiner, which can be found here: https://lkml.org/lkml/2020/5/28/806 . Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dan Schatzberg <dschatzberg@fb.com> Link: https://lkml.kernel.org/r/20200821212056.3769116-1-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18ia64: fix build error with !COREDUMPKrzysztof Kozlowski1-1/+1
Fix linkage error when CONFIG_BINFMT_ELF is selected but CONFIG_COREDUMP is not: ia64-linux-ld: arch/ia64/kernel/elfcore.o: in function `elf_core_write_extra_phdrs': elfcore.c:(.text+0x172): undefined reference to `dump_emit' ia64-linux-ld: arch/ia64/kernel/elfcore.o: in function `elf_core_write_extra_data': elfcore.c:(.text+0x2b2): undefined reference to `dump_emit' Fixes: 1fcccbac89f5 ("elf coredump: replace ELF_CORE_EXTRA_* macros by functions") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200819064146.12529-1-krzk@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-17coccinelle: api: add kfree_mismatch scriptDenis Efremov1-0/+228
Check that alloc and free types of functions match each other. Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
2020-10-17mm: use limited read-ahead to satisfy readJens Axboe1-6/+14
For the case where read-ahead is disabled on the file, or if the cgroup is congested, ensure that we can at least do 1 page of read-ahead to make progress on the read in an async fashion. This could potentially be larger, but it's not needed in terms of functionality, so let's error on the side of caution as larger counts of pages may run into reclaim issues (particularly if we're congested). This makes sure we're not hitting the potentially sync ->readpage() path for IO that is marked IOCB_WAITQ, which could cause us to block. It also means we'll use the same path for IO, regardless of whether or not read-ahead happens to be disabled on the lower level device. Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reported-by: Hao_Xu <haoxu@linux.alibaba.com> [axboe: updated for new ractl API] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17mm: mark async iocb read as NOWAIT once some data has been copiedJens Axboe1-0/+8
Once we've copied some data for an iocb that is marked with IOCB_WAITQ, we should no longer attempt to async lock a new page. Instead make sure we return the copied amount, and let the caller retry, instead of returning -EIOCBQUEUED for a new page. This should only be possible with read-ahead disabled on the below device, and multiple threads racing on the same file. Haven't been able to reproduce on anything else. Cc: stable@vger.kernel.org # v5.9 Fixes: 1a0a7853b901 ("mm: support async buffered reads in generic_file_buffered_read()") Reported-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17Merge tag 'perf-tools-for-v5.10-2020-10-15' of ↵Linus Torvalds164-5711/+10441
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools updates from Arnaldo Carvalho de Melo: - cgroup improvements for 'perf stat', allowing for compact specification of events and cgroups in the command line. - Support per thread topdown metrics in 'perf stat'. - Support sample-read topdown metric group in 'perf record' - Show start of latency in addition to its start in 'perf sched latency'. - Add min, max to 'perf script' futex-contention output, in addition to avg. - Allow usage of 'perf_event_attr->exclusive' attribute via the new ':e' event modifier. - Add 'snapshot' command to 'perf record --control', using it with Intel PT. - Support FIFO file names as alternative options to 'perf record --control'. - Introduce branch history "streams", to compare 'perf record' runs with 'perf diff' based on branch records and report hot streams. - Support PE executable symbol tables using libbfd, to profile, for instance, wine binaries. - Add filter support for option 'perf ftrace -F/--funcs'. - Allow configuring the 'disassembler_style' 'perf annotate' knob via 'perf config' - Update CascadelakeX and SkylakeX JSON vendor events files. - Add support for parsing perchip/percore JSON vendor events. - Add power9 hv_24x7 core level metric events. - Add L2 prefetch, ITLB instruction fetch hits JSON events for AMD zen1. - Enable Family 19h users by matching Zen2 AMD vendor events. - Use debuginfod in 'perf probe' when required debug files not found locally. - Display negative tid in non-sample events in 'perf script'. - Make GTK2 support opt-in - Add build test with GTK+ - Add missing -lzstd to the fast path feature detection - Add scripts to auto generate 'mmap', 'mremap' string<->id tables for use in 'perf trace'. - Show python test script in verbose mode. - Fix uncore metric expressions - Msan uninitialized use fixes. - Use condition variables in 'perf bench numa' - Autodetect python3 binary in systems without python2. - Support md5 build ids in addition to sha1. - Add build id 'perf test' regression test. - Fix printable strings in python3 scripts. - Fix off by ones in 'perf trace' in arches using libaudit. - Fix JSON event code for events referencing std arch events. - Introduce 'perf test' shell script for Arm CoreSight testing. - Add rdtsc() for Arm64 for used in the PERF_RECORD_TIME_CONV metadata event and in 'perf test tsc'. - 'perf c2c' improvements: Add "RMT Load Hit" metric, "Total Stores", fixes and documentation update. - Fix usage of reloc_sym in 'perf probe' when using both kallsyms and debuginfo files. - Do not print 'Metric Groups:' unnecessarily in 'perf list' - Refcounting fixes in the event parsing code. - Add expand cgroup event 'perf test' entry. - Fix out of bounds CPU map access when handling armv8_pmu events in 'perf stat'. - Add build-id injection 'perf bench' benchmark. - Enter namespace when reading build-id in 'perf inject'. - Do not load map/dso when injecting build-id speeding up the 'perf inject' process. - Add --buildid-all option to avoid processing all samples, just the mmap metadata events. - Add feature test to check if libbfd has buildid support - Add 'perf test' entry for PE binary format support. - Fix typos in power8 PMU vendor events JSON files. - Hide libtraceevent non API functions. * tag 'perf-tools-for-v5.10-2020-10-15' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (113 commits) perf c2c: Update documentation for metrics reorganization perf c2c: Add metrics "RMT Load Hit" perf c2c: Correct LLC load hit metrics perf c2c: Change header for LLC local hit perf c2c: Use more explicit headers for HITM perf c2c: Change header from "LLC Load Hitm" to "Load Hitm" perf c2c: Organize metrics based on memory hierarchy perf c2c: Display "Total Stores" as a standalone metrics perf c2c: Display the total numbers continuously perf bench: Use condition variables in numa. perf jevents: Fix event code for events referencing std arch events perf diff: Support hot streams comparison perf streams: Report hot streams perf streams: Calculate the sum of total streams hits perf streams: Link stream pair perf streams: Compare two streams perf streams: Get the evsel_streams by evsel_idx perf streams: Introduce branch history "streams" perf intel-pt: Improve PT documentation slightly perf tools: Add support for exclusive groups/events ...
2020-10-17Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds224-4703/+5213
Pull rdma updates from Jason Gunthorpe: "A usual cycle for RDMA with a typical mix of driver and core subsystem updates: - Driver minor changes and bug fixes for mlx5, efa, rxe, vmw_pvrdma, hns, usnic, qib, qedr, cxgb4, hns, bnxt_re - Various rtrs fixes and updates - Bug fix for mlx4 CM emulation for virtualization scenarios where MRA wasn't working right - Use tracepoints instead of pr_debug in the CM code - Scrub the locking in ucma and cma to close more syzkaller bugs - Use tasklet_setup in the subsystem - Revert the idea that 'destroy' operations are not allowed to fail at the driver level. This proved unworkable from a HW perspective. - Revise how the umem API works so drivers make fewer mistakes using it - XRC support for qedr - Convert uverbs objects RWQ and MW to new the allocation scheme - Large queue entry sizes for hns - Use hmm_range_fault() for mlx5 On Demand Paging - uverbs APIs to inspect the GID table instead of sysfs - Move some of the RDMA code for building large page SGLs into lib/scatterlist" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (191 commits) RDMA/ucma: Fix use after free in destroy id flow RDMA/rxe: Handle skb_clone() failure in rxe_recv.c RDMA/rxe: Move the definitions for rxe_av.network_type to uAPI RDMA: Explicitly pass in the dma_device to ib_register_device lib/scatterlist: Do not limit max_segment to PAGE_ALIGNED values IB/mlx4: Convert rej_tmout radix-tree to XArray RDMA/rxe: Fix bug rejecting all multicast packets RDMA/rxe: Fix skb lifetime in rxe_rcv_mcast_pkt() RDMA/rxe: Remove duplicate entries in struct rxe_mr IB/hfi,rdmavt,qib,opa_vnic: Update MAINTAINERS IB/rdmavt: Fix sizeof mismatch MAINTAINERS: CISCO VIC LOW LATENCY NIC DRIVER RDMA/bnxt_re: Fix sizeof mismatch for allocation of pbl_tbl. RDMA/bnxt_re: Use rdma_umem_for_each_dma_block() RDMA/umem: Move to allocate SG table from pages lib/scatterlist: Add support in dynamic allocation of SG table from pages tools/testing/scatterlist: Show errors in human readable form tools/testing/scatterlist: Rejuvenate bit-rotten test RDMA/ipoib: Set rtnl_link_ops for ipoib interfaces RDMA/uverbs: Expose the new GID query API to user space ...
2020-10-17Merge tag 'i3c/for-5.10' of ↵Linus Torvalds2-54/+94
git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux Pull i3c updates from Boris Brezillon: - Fix DAA for the pre-reserved address case - Fix an error path in the cadence driver * tag 'i3c/for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux: i3c: master: Fix error return in cdns_i3c_master_probe() i3c: master: fix for SETDASA and DAA process i3c: master add i3c_master_attach_boardinfo to preserve boardinfo
2020-10-17Merge tag 'mtd/for-5.10' of ↵Linus Torvalds121-1076/+2316
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux Pull MTD updates from Richard Weinberger: "NAND core changes: - Drop useless 'depends on' in Kconfig - Add an extra level in the Kconfig hierarchy - Trivial spellings - Dynamic allocation of the interface configurations - Dropping the default ONFI timing mode - Various cleanup (types, structures, naming, comments) - Hide the chip->data_interface indirection - Add the generic rb-gpios property - Add the ->choose_interface_config() hook - Introduce nand_choose_best_sdr_timings() - Use default values for tPROG_max and tBERS_max - Avoid redefining tR_max and tCCS_min - Add a helper to find the closest ONFI mode - bcm63xx MTD parsers: simplify CFE detection Raw NAND controller drivers changes: - fsl-upm: Deprecation of specific DT properties - fsl_upm: Driver rework and cleanup in favor of ->exec_op() - Ingenic: Cleanup ARRAY_SIZE() vs sizeof() use - brcmnand: ECC error handling on EDU transfers - brcmnand: Don't default to EDU transfers - qcom: Set BAM mode only if not set already - qcom: Avoid write to unavailable register - gpio: Driver rework in favor of ->exec_op() - tango: ->exec_op() conversion - mtk: ->exec_op() conversion Raw NAND chip drivers changes: - toshiba: Implement ->choose_interface_config() for TH58NVG2S3HBAI4 - toshiba: Implement ->choose_interface_config() for TC58NVG0S3E - toshiba: Implement ->choose_interface_config() for TC58TEG5DCLTA00 - hynix: Implement ->choose_interface_config() for H27UCG8T2ATR-BC HyperBus changes: - DMA support for TI's AM654 HyperBus controller driver. - HyperBus frontend driver for Renesas RPC-IF driver. SPI NOR core changes: - Support for Winbond w25q64jwm flash - Enable 4K sector support for mx25l12805d SPI NOR controller drivers changes: - intel-spi Add Alder Lake-S PCI ID MTD Core changes: - mtdoops: Don't run panic write twice - mtdconcat: Correctly handle panic write - Use DEFINE_SHOW_ATTRIBUTE" * tag 'mtd/for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (76 commits) mtd: hyperbus: Fix build failure when only RPCIF_HYPERBUS is enabled mtd: hyperbus: add Renesas RPC-IF driver Revert "mtd: spi-nor: Prefer asynchronous probe" mtd: parsers: bcm63xx: Do not make it modular mtd: spear_smi: Enable compile testing mtd: maps: vmu-flash: fix typos for struct memcard mtd: physmap: Add Baikal-T1 physically mapped ROM support mtd: maps: vmu-flash: simplify the return expression of probe_maple_vmu mtd: onenand: simplify the return expression of onenand_transfer_auto_oob mtd: rawnand: cadence: remove a redundant dev_err call mtd: rawnand: ams-delta: Fix non-OF build warning mtd: rawnand: Don't overwrite the error code from nand_set_ecc_soft_ops() mtd: rawnand: Introduce nand_set_ecc_on_host_ops() mtd: rawnand: atmel: Check return values for nand_read_data_op mtd: rawnand: vf610: Remove unused function vf610_nfc_transfer_size() mtd: rawnand: qcom: Simplify with dev_err_probe() mtd: rawnand: marvell: Fix and update kerneldoc mtd: rawnand: marvell: Simplify with dev_err_probe() mtd: rawnand: gpmi: Simplify with dev_err_probe() mtd: rawnand: atmel: Simplify with dev_err_probe() ...
2020-10-17Merge tag 'thermal-v5.10-rc1' of ↵Linus Torvalds22-82/+156
git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux Pull thermal updates from Daniel Lezcano: - Fix Kconfig typo "acces" -> "access" (Colin Ian King) - Use dev_error_probe() to simplify the error handling on imx and imx8 platforms (Anson Huang) - Use dedicated kobj_to_dev() instead of container_of() in the sysfs core code (Tian Tao) - Fix coding style by adding braces to a one line conditional statement on rcar (Geert Uytterhoeven) - Add DT binding documentation for the r8a774e1 platform and update the Kconfig description supporting RZ/G2 SoCs (Lad Prabhakar) - Simplify the return expression of stm_thermal_prepare on the stm32 platform (Qinglang Miao) - Fix the unit in the function documentation for the idle injection cooling device (Zhuguang Qing) - Remove an unecessary mutex_init() in the core code (Qinglang Miao) - Add support for keep alive events in the core code and the specific int340x (Srinivas Pandruvada) - Remove unused thermal zone variable in devfreq and cpufreq cooling devices (Zhuguang Qing) - Add the A100's THS controller support (Yangtao Li) - Add power management on the omap3's bandgap sensor (Adam Ford) - Fix a missing nlmsg_free in the netlink core error path (Jing Xiangfeng) * tag 'thermal-v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: thermal: core: Adding missing nlmsg_free() in thermal_genl_sampling_temp() thermal: ti-soc-thermal: Enable addition power management thermal: sun8i: Add A100's THS controller support thermal: sun8i: add TEMP_CALIB_MASK for calibration data in sun50i_h6_ths_calibrate dt-bindings: thermal: sun8i: Add binding for A100's THS controller thermal: cooling: Remove unused variable *tz thermal: int340x: Add keep alive response method thermal: core: Add new event for sending keep alive notifications thermal: int340x: Provide notification for OEM variable change thermal: core: remove unnecessary mutex_init() thermal/idle_inject: Fix comment of idle_duration_us and name of latency_ns thermal: Kconfig: Update description for RCAR_GEN3_THERMAL config thermal: stm32: simplify the return expression of stm_thermal_prepare() dt-bindings: thermal: rcar-gen3-thermal: Add r8a774e1 support thermal: rcar_thermal: Add missing braces to conditional statement thermal: Use kobj_to_dev() instead of container_of() thermal: imx8mm: Use dev_err_probe() to simplify error handling thermal: imx: Use dev_err_probe() to simplify error handling drivers: thermal: Kconfig: fix spelling mistake "acces" -> "access"
2020-10-17io_uring: fix double poll mask initPavel Begunkov1-1/+3
__io_queue_proc() is used by both, poll reqs and apoll. Don't use req->poll.events to copy poll mask because for apoll it aliases with private data of the request. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17io-wq: inherit audit loginuid and sessionidJens Axboe4-1/+41
Make sure the async io-wq workers inherit the loginuid and sessionid from the original task, and restore them to unset once we're done with the async work item. While at it, disable the ability for kernel threads to write to their own loginuid. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17io_uring: use percpu counters to track inflight requestsJens Axboe2-27/+30
Even though we place the req_issued and req_complete in separate cachelines, there's considerable overhead in doing the atomics particularly on the completion side. Get rid of having the two counters, and just use a percpu_counter for this. That's what it was made for, after all. This considerably reduces the overhead in __io_free_req(). Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17io_uring: assign new io_identity for task if members have changedJens Axboe2-5/+17
This avoids doing a copy for each new async IO, if some parts of the io_identity has changed. We avoid reference counting for the normal fast path of nothing ever changing. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17io_uring: store io_identity in io_uring_taskJens Axboe2-10/+12
This is, by definition, a per-task structure. So store it in the task context, instead of doing carrying it in each io_kiocb. We're being a bit inefficient if members have changed, as that requires an alloc and copy of a new io_identity struct. The next patch will fix that up. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17io_uring: COW io_identity on mismatchJens Axboe2-52/+171
If the io_identity doesn't completely match the task, then create a copy of it and use that. The existing copy remains valid until the last user of it has gone away. This also changes the personality lookup to be indexed by io_identity, instead of creds directly. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17io_uring: move io identity items into separate structJens Axboe4-57/+64
io-wq contains a pointer to the identity, which we just hold in io_kiocb for now. This is in preparation for putting this outside io_kiocb. The only exception is struct files_struct, which we'll need different rules for to avoid a circular dependency. No functional changes in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17io_uring: rely solely on work flags to determine personality.Jens Axboe2-22/+33
We solely rely on work->work_flags now, so use that for proper checking and clearing/dropping of various identity items. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17io_uring: pass required context in as flagsJens Axboe3-64/+52
We have a number of bits that decide what context to inherit. Set up io-wq flags for these instead. This is in preparation for always having the various members set, but not always needing them for all requests. No intended functional changes in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17io-wq: assign NUMA node locality if appropriateJens Axboe1-0/+1
There was an assumption that kthread_create_on_node() would properly set NUMA affinities in terms of CPUs allowed, but it doesn't. Make sure we do this when creating an io-wq context on NUMA. Cc: stable@vger.kernel.org Stefan Metzmacher <metze@samba.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17io_uring: fix error path cleanup in io_sqe_files_register()Jens Axboe1-1/+2
syzbot reports the following crash: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 1 PID: 8927 Comm: syz-executor.3 Not tainted 5.9.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:io_file_from_index fs/io_uring.c:5963 [inline] RIP: 0010:io_sqe_files_register fs/io_uring.c:7369 [inline] RIP: 0010:__io_uring_register fs/io_uring.c:9463 [inline] RIP: 0010:__do_sys_io_uring_register+0x2fd2/0x3ee0 fs/io_uring.c:9553 Code: ec 03 49 c1 ee 03 49 01 ec 49 01 ee e8 57 61 9c ff 41 80 3c 24 00 0f 85 9b 09 00 00 4d 8b af b8 01 00 00 4c 89 e8 48 c1 e8 03 <80> 3c 28 00 0f 85 76 09 00 00 49 8b 55 00 89 d8 c1 f8 09 48 98 4c RSP: 0018:ffffc90009137d68 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc9000ef2a000 RDX: 0000000000040000 RSI: ffffffff81d81dd9 RDI: 0000000000000005 RBP: dffffc0000000000 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffed1012882a37 R13: 0000000000000000 R14: ffffed1012882a38 R15: ffff888094415000 FS: 00007f4266f3c700(0000) GS:ffff8880ae500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000118c000 CR3: 000000008e57d000 CR4: 00000000001506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45de59 Code: 0d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f4266f3bc78 EFLAGS: 00000246 ORIG_RAX: 00000000000001ab RAX: ffffffffffffffda RBX: 00000000000083c0 RCX: 000000000045de59 RDX: 0000000020000280 RSI: 0000000000000002 RDI: 0000000000000005 RBP: 000000000118bf68 R08: 0000000000000000 R09: 0000000000000000 R10: 40000000000000a1 R11: 0000000000000246 R12: 000000000118bf2c R13: 00007fff2fa4f12f R14: 00007f4266f3c9c0 R15: 000000000118bf2c Modules linked in: ---[ end trace 2a40a195e2d5e6e6 ]--- RIP: 0010:io_file_from_index fs/io_uring.c:5963 [inline] RIP: 0010:io_sqe_files_register fs/io_uring.c:7369 [inline] RIP: 0010:__io_uring_register fs/io_uring.c:9463 [inline] RIP: 0010:__do_sys_io_uring_register+0x2fd2/0x3ee0 fs/io_uring.c:9553 Code: ec 03 49 c1 ee 03 49 01 ec 49 01 ee e8 57 61 9c ff 41 80 3c 24 00 0f 85 9b 09 00 00 4d 8b af b8 01 00 00 4c 89 e8 48 c1 e8 03 <80> 3c 28 00 0f 85 76 09 00 00 49 8b 55 00 89 d8 c1 f8 09 48 98 4c RSP: 0018:ffffc90009137d68 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc9000ef2a000 RDX: 0000000000040000 RSI: ffffffff81d81dd9 RDI: 0000000000000005 RBP: dffffc0000000000 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffed1012882a37 R13: 0000000000000000 R14: ffffed1012882a38 R15: ffff888094415000 FS: 00007f4266f3c700(0000) GS:ffff8880ae400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000074a918 CR3: 000000008e57d000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 which is a copy of fget failure condition jumping to cleanup, but the cleanup requires ctx->file_data to be assigned. Assign it when setup, and ensure that we clear it again for the error path exit. Fixes: 5398ae698525 ("io_uring: clean file_data access in files_register") Reported-by: syzbot+f4ebcc98223dafd8991e@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17Revert "io_uring: mark io_uring_fops/io_op_defs as __read_mostly"Jens Axboe1-2/+2
This reverts commit 738277adc81929b3e7c9b63fec6693868cc5f931. This change didn't make a lot of sense, and as Linus reports, it actually fails on clang: /tmp/io_uring-dd40c4.s:26476: Warning: ignoring changed section attributes for .data..read_mostly The arrays are already marked const so, by definition, they are not just read-mostly, they are read-only. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17io_uring: fix REQ_F_COMP_LOCKED by killing itPavel Begunkov1-80/+69
REQ_F_COMP_LOCKED is used and implemented in a buggy way. The problem is that the flag is set before io_put_req() but not cleared after, and if that wasn't the final reference, the request will be freed with the flag set from some other context, which may not hold a spinlock. That means possible races with removing linked timeouts and unsynchronised completion (e.g. access to CQ). Instead of fixing REQ_F_COMP_LOCKED, kill the flag and use task_work_add() to move such requests to a fresh context to free from it, as was done with __io_free_req_finish(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17io_uring: dig out COMP_LOCK from deep call chainPavel Begunkov1-24/+9
io_req_clean_work() checks REQ_F_COMP_LOCK to pass this two layers up. Move the check up into __io_free_req(), so at least it doesn't looks so ugly and would facilitate further changes. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17io_uring: don't put a poll req under spinlockPavel Begunkov1-2/+1
Move io_put_req() in io_poll_task_handler() from under spinlock. This eliminates the need to use REQ_F_COMP_LOCKED, at the expense of potentially having to grab the lock again. That's still a better trade off than relying on the locked flag. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17io_uring: don't unnecessarily clear F_LINK_TIMEOUTPavel Begunkov1-1/+0
If a request had REQ_F_LINK_TIMEOUT it would've been cleared in __io_kill_linked_timeout() by the time of __io_fail_links(), so no need to care about it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17io_uring: don't set COMP_LOCKED if won't putPavel Begunkov1-1/+1
__io_kill_linked_timeout() sets REQ_F_COMP_LOCKED for a linked timeout even if it can't cancel it, e.g. it's already running. It not only races with io_link_timeout_fn() for ->flags field, but also leaves the flag set and so io_link_timeout_fn() may find it and decide that it holds the lock. Hopefully, the second problem is potential. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17io_uring: Fix sizeof() mismatchColin Ian King1-1/+1
An incorrect sizeof() is being used, sizeof(file_data->table) is not correct, it should be sizeof(*file_data->table). Fixes: 5398ae698525 ("io_uring: clean file_data access in files_register") Signed-off-by: Colin Ian King <colin.king@canonical.com> Addresses-Coverity: ("Sizeof not portable (SIZEOF_MISMATCH)") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-16mailbox: avoid timer start from callbackJassi Brar1-5/+7
If the txdone is done by polling, it is possible for msg_submit() to start the timer while txdone_hrtimer() callback is running. If the timer needs recheduling, it could already be enqueued by the time hrtimer_forward_now() is called, leading hrtimer to loudly complain. WARNING: CPU: 3 PID: 74 at kernel/time/hrtimer.c:932 hrtimer_forward+0xc4/0x110 CPU: 3 PID: 74 Comm: kworker/u8:1 Not tainted 5.9.0-rc2-00236-gd3520067d01c-dirty #5 Hardware name: Libre Computer AML-S805X-AC (DT) Workqueue: events_freezable_power_ thermal_zone_device_check pstate: 20000085 (nzCv daIf -PAN -UAO BTYPE=--) pc : hrtimer_forward+0xc4/0x110 lr : txdone_hrtimer+0xf8/0x118 [...] This can be fixed by not starting the timer from the callback path. Which requires the timer reloading as long as any message is queued on the channel, and not just when current tx is not done yet. Fixes: 0cc67945ea59 ("mailbox: switch to hrtimer for tx_complete polling") Reported-by: Da Xue <da@libre.computer> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Jerome Brunet <jbrunet@baylibre.com> Tested-by: Jerome Brunet <jbrunet@baylibre.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2020-10-16xfs: fix Kconfig asking about XFS_SUPPORT_V4 when XFS_FS=nDarrick J. Wong1-0/+1
Pavel Machek complained that the question about supporting deprecated XFS v4 comes up even when XFS is disabled. This clearly makes no sense, so fix Kconfig. Reported-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2020-10-16xfs: fix high key handling in the rt allocator's query_range functionDarrick J. Wong1-7/+4
Fix some off-by-one errors in xfs_rtalloc_query_range. The highest key in the realtime bitmap is always one less than the number of rt extents, which means that the key clamp at the start of the function is wrong. The 4th argument to xfs_rtfind_forw is the highest rt extent that we want to probe, which means that passing 1 less than the high key is wrong. Finally, drop the rem variable that controls the loop because we can compare the iteration point (rtstart) against the high key directly. The sordid history of this function is that the original commit (fb3c3) incorrectly passed (high_rec->ar_startblock - 1) as the 'limit' parameter to xfs_rtfind_forw. This was wrong because the "high key" is supposed to be the largest key for which the caller wants result rows, not the key for the first row that could possibly be outside the range that the caller wants to see. A subsequent attempt (8ad56) to strengthen the parameter checking added incorrect clamping of the parameters to the number of rt blocks in the system (despite the bitmap functions all taking units of rt extents) to avoid querying ranges past the end of rt bitmap file but failed to fix the incorrect _rtfind_forw parameter. The original _rtfind_forw parameter error then survived the conversion of the startblock and blockcount fields to rt extents (a0e5c), and the most recent off-by-one fix (a3a37) thought it was patching a problem when the end of the rt volume is not in use, but none of these fixes actually solved the original problem that the author was confused about the "limit" argument to xfs_rtfind_forw. Sadly, all four of these patches were written by this author and even his own usage of this function and rt testing were inadequate to get this fixed quickly. Original-problem: fb3c3de2f65c ("xfs: add a couple of queries to iterate free extents in the rtbitmap") Not-fixed-by: 8ad560d2565e ("xfs: strengthen rtalloc query range checks") Not-fixed-by: a0e5c435babd ("xfs: fix xfs_rtalloc_rec units") Fixes: a3a374bf1889 ("xfs: fix off-by-one error in xfs_rtalloc_query_range") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
2020-10-16Merge tag 'ovl-update-5.10' of ↵Linus Torvalds12-200/+446
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs updates from Miklos Szeredi: - Improve performance for certain container setups by introducing a "volatile" mode - ioctl improvements - continue preparation for unprivileged overlay mounts * tag 'ovl-update-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: use generic vfs_ioc_setflags_prepare() helper ovl: support [S|G]ETFLAGS and FS[S|G]ETXATTR ioctls for directories ovl: rearrange ovl_can_list() ovl: enumerate private xattrs ovl: pass ovl_fs down to functions accessing private xattrs ovl: drop flags argument from ovl_do_setxattr() ovl: adhere to the vfs_ vs. ovl_do_ conventions for xattrs ovl: use ovl_do_getxattr() for private xattr ovl: fold ovl_getxattr() into ovl_get_redirect_xattr() ovl: clean up ovl_getxattr() in copy_up.c duplicate ovl_getxattr() ovl: provide a mount option "volatile" ovl: check for incompatible features in work dir
2020-10-16Merge tag 'afs-fixes-20201016' of ↵Linus Torvalds12-172/+378
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull afs updates from David Howells: "A collection of fixes to fix afs_cell struct refcounting, thereby fixing a slew of related syzbot bugs: - Fix the cell tree in the netns to use an rwsem rather than RCU. There seem to be some problems deriving from the use of RCU and a seqlock to walk the rbtree, but it's not entirely clear what since there are several different failures being seen. Changing things to use an rwsem instead makes it more robust. The extra performance derived from using RCU isn't necessary in this case since the only time we're looking up a cell is during mount or when cells are being manually added. - Fix the refcounting by splitting the usage counter into a memory refcount and an active users counter. The usage counter was doing double duty, keeping track of whether a cell is still in use and keeping track of when it needs to be destroyed - but this makes the clean up tricky. Separating these out simplifies the logic. - Fix purging a cell that has an alias. A cell alias pins the cell it's an alias of, but the alias is always later in the list. Trying to purge in a single pass causes rmmod to hang in such a case. - Fix cell removal. If a cell's manager is requeued whilst it's removing itself, the manager will run again and re-remove itself, causing problems in various places. Follow Hillf Danton's suggestion to insert a more terminal state that causes the manager to do nothing post-removal. In additional to the above, two other changes: - Add a tracepoint for the cell refcount and active users count. This helped with debugging the above and may be useful again in future. - Downgrade an assertion to a print when a still-active server is seen during purging. This was happening as a consequence of incomplete cell removal before the servers were cleaned up" * tag 'afs-fixes-20201016' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Don't assert on unpurgeable server records afs: Add tracing for cell refcount and active user count afs: Fix cell removal afs: Fix cell purging with aliases afs: Fix cell refcounting by splitting the usage counter afs: Fix rapid cell addition/removal by not using RCU on cells tree
2020-10-16Merge tag 'f2fs-for-5.10-rc1' of ↵Linus Torvalds28-493/+1795
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've added new features such as zone capacity for ZNS and a new GC policy, ATGC, along with in-memory segment management. In addition, we could improve the decompression speed significantly by changing virtual mapping method. Even though we've fixed lots of small bugs in compression support, I feel that it becomes more stable so that I could give it a try in production. Enhancements: - suport zone capacity in NVMe Zoned Namespace devices - introduce in-memory current segment management - add standart casefolding support - support age threshold based garbage collection - improve decompression speed by changing virtual mapping method Bug fixes: - fix condition checks in some ioctl() such as compression, move_range, etc - fix 32/64bits support in data structures - fix memory allocation in zstd decompress - add some boundary checks to avoid kernel panic on corrupted image - fix disallowing compression for non-empty file - fix slab leakage of compressed block writes In addition, it includes code refactoring for better readability and minor bug fixes for compression and zoned device support" * tag 'f2fs-for-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (51 commits) f2fs: code cleanup by removing unnecessary check f2fs: wait for sysfs kobject removal before freeing f2fs_sb_info f2fs: fix writecount false positive in releasing compress blocks f2fs: introduce check_swap_activate_fast() f2fs: don't issue flush in f2fs_flush_device_cache() for nobarrier case f2fs: handle errors of f2fs_get_meta_page_nofail f2fs: fix to set SBI_NEED_FSCK flag for inconsistent inode f2fs: reject CASEFOLD inode flag without casefold feature f2fs: fix memory alignment to support 32bit f2fs: fix slab leak of rpages pointer f2fs: compress: fix to disallow enabling compress on non-empty file f2fs: compress: introduce cic/dic slab cache f2fs: compress: introduce page array slab cache f2fs: fix to do sanity check on segment/section count f2fs: fix to check segment boundary during SIT page readahead f2fs: fix uninit-value in f2fs_lookup f2fs: remove unneeded parameter in find_in_block() f2fs: fix wrong total_sections check and fsmeta check f2fs: remove duplicated code in sanity_check_area_boundary f2fs: remove unused check on version_bitmap ...
2020-10-16Merge tag 'docs/v5.10-1' of ↵Linus Torvalds309-2275/+1952
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull documentation updates from Mauro Carvalho Chehab: "A series of patches addressing warnings produced by make htmldocs. This includes: - kernel-doc markup fixes - ReST fixes - Updates at the build system in order to support newer versions of the docs build toolchain (Sphinx) After this series, the number of html build warnings should reduce significantly, and building with Sphinx 3.1 or later should now be supported (although it is still recommended to use Sphinx 2.4.4). As agreed with Jon, I should be sending you a late pull request by the end of the merge window addressing remaining issues with docs build, as there are a number of warning fixes that depends on pull requests that should be happening along the merge window. The end goal is to have a clean htmldocs build on Kernel 5.10. PS. It should be noticed that Sphinx 3.0 is not currently supported, as it lacks support for C domain namespaces. Such feature, needed in order to document uAPI system calls with Sphinx 3.x, was added only on Sphinx 3.1" * tag 'docs/v5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (75 commits) PM / devfreq: remove a duplicated kernel-doc markup mm/doc: fix a literal block markup workqueue: fix a kernel-doc warning docs: virt: user_mode_linux_howto_v2.rst: fix a literal block markup Input: sparse-keymap: add a description for @sw rcu/tree: docs: document bkvcache new members at struct kfree_rcu_cpu nl80211: docs: add a description for s1g_cap parameter usb: docs: document altmode register/unregister functions kunit: test.h: fix a bad kernel-doc markup drivers: core: fix kernel-doc markup for dev_err_probe() docs: bio: fix a kerneldoc markup kunit: test.h: solve kernel-doc warnings block: bio: fix a warning at the kernel-doc markups docs: powerpc: syscall64-abi.rst: fix a malformed table drivers: net: hamradio: fix document location net: appletalk: Kconfig: Fix docs location dt-bindings: fix references to files converted to yaml memblock: get rid of a :c:type leftover math64.h: kernel-docs: Convert some markups into normal comments media: uAPI: buffer.rst: remove a left-over documentation ...
2020-10-16Merge tag 'trace-v5.10-2' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Fix mismatch section of adding early trace events. Fixes the issue of a mismatch section that was missed due to gcc inlining the offending function, while clang did not (and reported the issue)" * tag 'trace-v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Remove __init from __trace_early_add_new_event()
2020-10-16Merge tag 'printk-for-5.10-fixup' of ↵Linus Torvalds1-1/+4
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk fix from Petr Mladek: "Prevent overflow in the new lockless ringbuffer" * tag 'printk-for-5.10-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: ringbuffer: Wrong data pointer when appending small string
2020-10-16Merge tag 'kgdb-5.10-rc1' of ↵Linus Torvalds10-34/+101
git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux Pull kgdb updates from Daniel Thompson: "A fairly modest set of changes for this cycle. Of particular note are an earlycon fix from Doug Anderson and my own changes to get kgdb/kdb to honour the kprobe blocklist. The later creates a safety rail that strongly encourages developers not to place breakpoints in, for example, arch specific trap handling code. Also included are a couple of small fixes and tweaks: an API update, eliminate a coverity dead code warning, improved handling of search during multi-line printk and a couple of typo corrections" * tag 'kgdb-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux: kdb: Fix pager search for multi-line strings kernel: debug: Centralize dbg_[de]activate_sw_breakpoints kgdb: Add NOKPROBE labels on the trap handler functions kgdb: Honour the kprobe blocklist when setting breakpoints kernel/debug: Fix spelling mistake in debug_core.c kdb: Use newer api for tasklist scanning kgdb: Make "kgdbcon" work properly with "kgdb_earlycon" kdb: remove unnecessary null check of dbg_io_ops
2020-10-16Merge tag 'mips_5.10' of ↵Linus Torvalds165-3681/+1730
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS updates from Thomas Bogendoerfer: - removed support for PNX833x alias NXT_STB22x - included Ingenic SoC support into generic MIPS kernels - added support for new Ingenic SoCs - converted workaround selection to use Kconfig - replaced old boot mem functions by memblock_* - enabled COP2 usage in kernel for Loongson64 to make use of 16byte load/stores possible - cleanups and fixes * tag 'mips_5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (92 commits) MIPS: DEC: Restore bootmem reservation for firmware working memory area MIPS: dec: fix section mismatch bcm963xx_tag.h: fix duplicated word mips: ralink: enable zboot support MIPS: ingenic: Remove CPU_SUPPORTS_HUGEPAGES MIPS: cpu-probe: remove MIPS_CPU_BP_GHIST option bit MIPS: cpu-probe: introduce exclusive R3k CPU probe MIPS: cpu-probe: move fpu probing/handling into its own file MIPS: replace add_memory_region with memblock MIPS: Loongson64: Clean up numa.c MIPS: Loongson64: Select SMP in Kconfig to avoid build error mips: octeon: Add Ubiquiti E200 and E220 boards MIPS: SGI-IP28: disable use of ll/sc in kernel MIPS: tx49xx: move tx4939_add_memory_regions into only user MIPS: pgtable: Remove used PAGE_USERIO define MIPS: alchemy: Share prom_init implementation MIPS: alchemy: Fix build breakage, if TOUCHSCREEN_WM97XX is disabled MIPS: process: include exec.h header in process.c MIPS: process: Add prototype for function arch_dup_task_struct MIPS: idle: Add prototype for function check_wait ...
2020-10-16Merge tag 's390-5.10-1' of ↵Linus Torvalds129-2162/+4094
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Vasily Gorbik: - Remove address space overrides using set_fs() - Convert to generic vDSO - Convert to generic page table dumper - Add ARCH_HAS_DEBUG_WX support - Add leap seconds handling support - Add NVMe firmware-assisted kernel dump support - Extend NVMe boot support with memory clearing control and addition of kernel parameters - AP bus and zcrypt api code rework. Add adapter configure/deconfigure interface. Extend debug features. Add failure injection support - Add ECC secure private keys support - Add KASan support for running protected virtualization host with 4-level paging - Utilize destroy page ultravisor call to speed up secure guests shutdown - Implement ioremap_wc() and ioremap_prot() with MIO in PCI code - Various checksum improvements - Other small various fixes and improvements all over the code * tag 's390-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (85 commits) s390/uaccess: fix indentation s390/uaccess: add default cases for __put_user_fn()/__get_user_fn() s390/zcrypt: fix wrong format specifications s390/kprobes: move insn_page to text segment s390/sie: fix typo in SIGP code description s390/lib: fix kernel doc for memcmp() s390/zcrypt: Introduce Failure Injection feature s390/zcrypt: move ap_msg param one level up the call chain s390/ap/zcrypt: revisit ap and zcrypt error handling s390/ap: Support AP card SCLP config and deconfig operations s390/sclp: Add support for SCLP AP adapter config/deconfig s390/ap: add card/queue deconfig state s390/ap: add error response code field for ap queue devices s390/ap: split ap queue state machine state from device state s390/zcrypt: New config switch CONFIG_ZCRYPT_DEBUG s390/zcrypt: introduce msg tracking in zcrypt functions s390/startup: correct early pgm check info formatting s390: remove orphaned extern variables declarations s390/kasan: make sure int handler always run with DAT on s390/ipl: add support to control memory clearing for nvme re-IPL ...
2020-10-16lib: kunit: Fix compilation test when using TEST_BIT_FIELD_COMPILEVitor Massaru Iha1-4/+2
A build condition was missing around a compilation test, this compilation test comes from the original test_bitfield code. And removed unnecessary code for this test. Fixes: d2585f5164c2 ("lib: kunit: add bitfield test conversion to KUnit") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Vitor Massaru Iha <vitor@massaru.org> Link: https://lore.kernel.org/linux-next/20201015163056.56fcc835@canb.auug.org.au/ Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-10-16Merge tag 'powerpc-5.10-1' of ↵Linus Torvalds223-2491/+3245
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - A series from Nick adding ARCH_WANT_IRQS_OFF_ACTIVATE_MM & selecting it for powerpc, as well as a related fix for sparc. - Remove support for PowerPC 601. - Some fixes for watchpoints & addition of a new ptrace flag for detecting ISA v3.1 (Power10) watchpoint features. - A fix for kernels using 4K pages and the hash MMU on bare metal Power9 systems with > 16TB of RAM, or RAM on the 2nd node. - A basic idle driver for shallow stop states on Power10. - Tweaks to our sched domains code to better inform the scheduler about the hardware topology on Power9/10, where two SMT4 cores can be presented by firmware as an SMT8 core. - A series doing further reworks & cleanups of our EEH code. - Addition of a filter for RTAS (firmware) calls done via sys_rtas(), to prevent root from overwriting kernel memory. - Other smaller features, fixes & cleanups. Thanks to: Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Athira Rajeev, Biwen Li, Cameron Berkenpas, Cédric Le Goater, Christophe Leroy, Christoph Hellwig, Colin Ian King, Daniel Axtens, David Dai, Finn Thain, Frederic Barrat, Gautham R. Shenoy, Greg Kurz, Gustavo Romero, Ira Weiny, Jason Yan, Joel Stanley, Jordan Niethe, Kajol Jain, Konrad Rzeszutek Wilk, Laurent Dufour, Leonardo Bras, Liu Shixin, Luca Ceresoli, Madhavan Srinivasan, Mahesh Salgaonkar, Nathan Lynch, Nicholas Mc Guire, Nicholas Piggin, Nick Desaulniers, Oliver O'Halloran, Pedro Miraglia Franco de Carvalho, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Ravi Bangoria, Russell Currey, Satheesh Rajendran, Scott Cheloha, Segher Boessenkool, Srikar Dronamraju, Stan Johnson, Stephen Kitt, Stephen Rothwell, Thiago Jung Bauermann, Tyrel Datwyler, Vaibhav Jain, Vaidyanathan Srinivasan, Vasant Hegde, Wang Wensheng, Wolfram Sang, Yang Yingliang, zhengbin. * tag 'powerpc-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (228 commits) Revert "powerpc/pci: unmap legacy INTx interrupts when a PHB is removed" selftests/powerpc: Fix eeh-basic.sh exit codes cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_reboot_notifier powerpc/time: Make get_tb() common to PPC32 and PPC64 powerpc/time: Make get_tbl() common to PPC32 and PPC64 powerpc/time: Remove get_tbu() powerpc/time: Avoid using get_tbl() and get_tbu() internally powerpc/time: Make mftb() common to PPC32 and PPC64 powerpc/time: Rename mftbl() to mftb() powerpc/32s: Remove #ifdef CONFIG_PPC_BOOK3S_32 in head_book3s_32.S powerpc/32s: Rename head_32.S to head_book3s_32.S powerpc/32s: Setup the early hash table at all time. powerpc/time: Remove ifdef in get_dec() and set_dec() powerpc: Remove get_tb_or_rtc() powerpc: Remove __USE_RTC() powerpc: Tidy up a bit after removal of PowerPC 601. powerpc: Remove support for PowerPC 601 powerpc: Remove PowerPC 601 powerpc: Drop SYNC_601() ISYNC_601() and SYNC() powerpc: Remove CONFIG_PPC601_SYNC_FIX ...
2020-10-16Merge branch 'akpm' (patches from Andrew)Linus Torvalds161-1724/+2390
Merge more updates from Andrew Morton: "155 patches. Subsystems affected by this patch series: mm (dax, debug, thp, readahead, page-poison, util, memory-hotplug, zram, cleanups), misc, core-kernel, get_maintainer, MAINTAINERS, lib, bitops, checkpatch, binfmt, ramfs, autofs, nilfs, rapidio, panic, relay, kgdb, ubsan, romfs, and fault-injection" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (155 commits) lib, uaccess: add failure injection to usercopy functions lib, include/linux: add usercopy failure capability ROMFS: support inode blocks calculation ubsan: introduce CONFIG_UBSAN_LOCAL_BOUNDS for Clang sched.h: drop in_ubsan field when UBSAN is in trap mode scripts/gdb/tasks: add headers and improve spacing format scripts/gdb/proc: add struct mount & struct super_block addr in lx-mounts command kernel/relay.c: drop unneeded initialization panic: dump registers on panic_on_warn rapidio: fix the missed put_device() for rio_mport_add_riodev rapidio: fix error handling path nilfs2: fix some kernel-doc warnings for nilfs2 autofs: harden ioctl table ramfs: fix nommu mmap with gaps in the page cache mm: remove the now-unnecessary mmget_still_valid() hack mm/gup: take mmap_lock in get_dump_page() binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot coredump: rework elf/elf_fdpic vma_dump_size() into common helper coredump: refactor page range dumping into common helper coredump: let dump_emit() bail out on short writes ...
2020-10-16lib, uaccess: add failure injection to usercopy functionsAlbert van der Linde4-2/+22
To test fault-tolerance of user memory access functions, introduce fault injection to usercopy functions. If a failure is expected return either -EFAULT or the total amount of bytes that were not copied. Signed-off-by: Albert van der Linde <alinde@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Akinobu Mita <akinobu.mita@gmail.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Marco Elver <elver@google.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20200831171733.955393-3-alinde@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16lib, include/linux: add usercopy failure capabilityAlbert van der Linde6-1/+76
Patch series "add fault injection to user memory access", v3. The goal of this series is to improve testing of fault-tolerance in usages of user memory access functions, by adding support for fault injection. syzkaller/syzbot are using the existing fault injection modes and will use this particular feature also. The first patch adds failure injection capability for usercopy functions. The second changes usercopy functions to use this new failure capability (copy_from_user, ...). The third patch adds get/put/clear_user failures to x86. This patch (of 3): Add a failure injection capability to improve testing of fault-tolerance in usages of user memory access functions. Add CONFIG_FAULT_INJECTION_USERCOPY to enable faults in usercopy functions. The should_fail_usercopy function is to be called by these functions (copy_from_user, get_user, ...) in order to fail or not. Signed-off-by: Albert van der Linde <alinde@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Akinobu Mita <akinobu.mita@gmail.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20200831171733.955393-1-alinde@google.com Link: http://lkml.kernel.org/r/20200831171733.955393-2-alinde@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16ROMFS: support inode blocks calculationLibing Zhou1-0/+1
When use 'stat' tool to display file status, the 'Blocks' field always in '0', this is not good for tool 'du'(e.g.: busybox 'du'), it always output '0' size for the files under ROMFS since such tool calculates number of 512B Blocks. This patch calculates approx. number of 512B blocks based on inode size. Signed-off-by: Libing Zhou <libing.zhou@nokia-sbell.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Link: http://lkml.kernel.org/r/20200811052606.4243-1-libing.zhou@nokia-sbell.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16ubsan: introduce CONFIG_UBSAN_LOCAL_BOUNDS for ClangGeorge Popescu2-1/+23
When the kernel is compiled with Clang, -fsanitize=bounds expands to -fsanitize=array-bounds and -fsanitize=local-bounds. Enabling -fsanitize=local-bounds with Clang has the unfortunate side-effect of inserting traps; this goes back to its original intent, which was as a hardening and not a debugging feature [1]. The same feature made its way into -fsanitize=bounds, but the traps remained. For that reason, -fsanitize=bounds was split into 'array-bounds' and 'local-bounds' [2]. Since 'local-bounds' doesn't behave like a normal sanitizer, enable it with Clang only if trapping behaviour was requested by CONFIG_UBSAN_TRAP=y. Add the UBSAN_BOUNDS_LOCAL config to Kconfig.ubsan to enable the 'local-bounds' option by default when UBSAN_TRAP is enabled. [1] http://lists.llvm.org/pipermail/llvm-dev/2012-May/049972.html [2] http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20131021/091536.html Suggested-by: Marco Elver <elver@google.com> Signed-off-by: George Popescu <georgepope@android.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Brazdil <dbrazdil@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Michal Marek <michal.lkml@markovi.net> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: https://lkml.kernel.org/r/20200922074330.2549523-1-georgepope@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16sched.h: drop in_ubsan field when UBSAN is in trap modeElena Petrova1-1/+1
in_ubsan field of task_struct is only used in lib/ubsan.c, which in its turn is used only `ifneq ($(CONFIG_UBSAN_TRAP),y)`. Removing unnecessary field from a task_struct will help preserve the ABI between vanilla and CONFIG_UBSAN_TRAP'ed kernels. In particular, this will help enabling bounds sanitizer transparently for Android's GKI. Signed-off-by: Elena Petrova <lenaptr@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Kees Cook <keescook@chromium.org> Cc: Jann Horn <jannh@google.com> Link: https://lkml.kernel.org/r/20200910134802.3160311-1-lenaptr@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16scripts/gdb/tasks: add headers and improve spacing formatRitesh Harjani1-4/+5
With the patch. <e.g. o/p> TASK PID COMM 0xffffffff82c2b8c0 0 swapper/0 0xffff888a0ba20040 1 systemd 0xffff888a0ba24040 2 kthreadd 0xffff888a0ba28040 3 rcu_gp w/o 0xffffffff82c2b8c0 <init_task> 0 swapper/0 0xffff888a0ba20040 1 systemd 0xffff888a0ba24040 2 kthreadd 0xffff888a0ba28040 3 rcu_gp Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Kieran Bingham <kbingham@kernel.org> Link: http://lkml.kernel.org/r/54c868c79b5fc364a8be7799891934a6fe6d1464.1597742951.git.riteshh@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16scripts/gdb/proc: add struct mount & struct super_block addr in lx-mounts ↵Ritesh Harjani1-8/+7
command This is many times found useful while debugging some FS related issue. <e.g. output> mount super_block devname pathname fstype options 0xffff888a0bfa4b40 0xffff888a0bfc1000 none / rootfs rw 0 0 0xffff888a033f75c0 0xffff8889fcf65000 /dev/root / ext4 rw,relatime 0 0 0xffff8889fc8ce040 0xffff888a0bb51000 devtmpfs /dev devtmpfs rw,relatime 0 0 Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Kieran Bingham <kbingham@kernel.org> Link: http://lkml.kernel.org/r/a3c4177e1597b3e06d66d55e07d72c0c46a03571.1597742951.git.riteshh@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16kernel/relay.c: drop unneeded initializationSudip Mukherjee1-1/+1
The variable 'consumed' is initialized with the consumed count but immediately after that the consumed count is updated and assigned to 'consumed' again thus overwriting the previous value. So, drop the unneeded initialization. Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20201005205727.1147-1-sudipm.mukherjee@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16panic: dump registers on panic_on_warnAlexey Kardashevskiy1-6/+6
Currently we print stack and registers for ordinary warnings but we do not for panic_on_warn which looks as oversight - panic() will reboot the machine but won't print registers. This moves printing of registers and modules earlier. This does not move the stack dumping as panic() dumps it. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Douglas Anderson <dianders@chromium.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Rafael Aquini <aquini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Link: https://lkml.kernel.org/r/20200804095054.68724-1-aik@ozlabs.ru Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16rapidio: fix the missed put_device() for rio_mport_add_riodevJing Xiangfeng1-1/+4
rio_mport_add_riodev() misses to call put_device() when the device already exists. Add the missed function call to fix it. Fixes: e8de370188d0 ("rapidio: add mport char device driver") Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Alexandre Bounine <alex.bou9@gmail.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kees Cook <keescook@chromium.org> Cc: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Link: https://lkml.kernel.org/r/20200922072525.42330-1-jingxiangfeng@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16rapidio: fix error handling pathSouptick Joarder1-6/+7
rio_dma_transfer() attempts to clamp the return value of pin_user_pages_fast() to be >= 0. However, the attempt fails because nr_pages is overridden a few lines later, and restored to the undesirable -ERRNO value. The return value is ultimately stored in nr_pages, which in turn is passed to unpin_user_pages(), which expects nr_pages >= 0, else, disaster. Fix this by fixing the nesting of the assignment to nr_pages: nr_pages should be clamped to zero if pin_user_pages_fast() returns -ERRNO, or set to the return value of pin_user_pages_fast(), otherwise. [jhubbard@nvidia.com: new changelog] Fixes: e8de370188d09 ("rapidio: add mport char device driver") Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Alexandre Bounine <alex.bou9@gmail.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lkml.kernel.org/r/1600227737-20785-1-git-send-email-jrdr.linux@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16nilfs2: fix some kernel-doc warnings for nilfs2Wang Hai4-7/+6
Fixes the following W=1 kernel build warning(s): fs/nilfs2/bmap.c:378: warning: Excess function parameter 'bhp' description in 'nilfs_bmap_assign' fs/nilfs2/cpfile.c:907: warning: Excess function parameter 'status' description in 'nilfs_cpfile_change_cpmode' fs/nilfs2/cpfile.c:946: warning: Excess function parameter 'stat' description in 'nilfs_cpfile_get_stat' fs/nilfs2/page.c:76: warning: Excess function parameter 'inode' description in 'nilfs_forget_buffer' fs/nilfs2/sufile.c:563: warning: Excess function parameter 'stat' description in 'nilfs_sufile_get_stat' Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/1601386269-2423-1-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16autofs: harden ioctl tableMatthew Wilcox1-2/+6
The table of ioctl functions should be marked const in order to put them in read-only memory, and we should use array_index_nospec() to avoid speculation disclosing the contents of kernel memory to userspace. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Ian Kent <raven@themaw.net> Link: https://lkml.kernel.org/r/20200818122203.GO17456@casper.infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16ramfs: fix nommu mmap with gaps in the page cacheMatthew Wilcox (Oracle)1-1/+1
ramfs needs to check that pages are both physically contiguous and contiguous in the file. If the page cache happens to have, eg, page A for index 0 of the file, no page for index 1, and page A+1 for index 2, then an mmap of the first two pages of the file will succeed when it should fail. Fixes: 642fb4d1f1dd ("[PATCH] NOMMU: Provide shared-writable mmap support on ramfs") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: David Howells <dhowells@redhat.com> Link: https://lkml.kernel.org/r/20200914122239.GO6583@casper.infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16mm: remove the now-unnecessary mmget_still_valid() hackJann Horn8-107/+29
The preceding patches have ensured that core dumping properly takes the mmap_lock. Thanks to that, we can now remove mmget_still_valid() and all its users. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200827114932.3572699-8-jannh@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16mm/gup: take mmap_lock in get_dump_page()Jann Horn1-6/+10
Properly take the mmap_lock before calling into the GUP code from get_dump_page(); and play nice, allowing the GUP code to drop the mmap_lock if it has to sleep. As Linus pointed out, we don't actually need the VMA because __get_user_pages() will flush the dcache for us if necessary. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200827114932.3572699-7-jannh@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshotJann Horn4-120/+138
In both binfmt_elf and binfmt_elf_fdpic, use a new helper dump_vma_snapshot() to take a snapshot of the VMA list (including the gate VMA, if we have one) while protected by the mmap_lock, and then use that snapshot instead of walking the VMA list without locking. An alternative approach would be to keep the mmap_lock held across the entire core dumping operation; however, keeping the mmap_lock locked while we may be blocked for an unbounded amount of time (e.g. because we're dumping to a FUSE filesystem or so) isn't really optimal; the mmap_lock blocks things like the ->release handler of userfaultfd, and we don't really want critical system daemons to grind to a halt just because someone "gifted" them SCM_RIGHTS to an eternally-locked userfaultfd, or something like that. Since both the normal ELF code and the FDPIC ELF code need this functionality (and if any other binfmt wants to add coredump support in the future, they'd probably need it, too), implement this with a common helper in fs/coredump.c. A downside of this approach is that we now need a bigger amount of kernel memory per userspace VMA in the normal ELF case, and that we need O(n) kernel memory in the FDPIC ELF case at all; but 40 bytes per VMA shouldn't be terribly bad. There currently is a data race between stack expansion and anything that reads ->vm_start or ->vm_end under the mmap_lock held in read mode; to mitigate that for core dumping, take the mmap_lock in write mode when taking a snapshot of the VMA hierarchy. (If we only took the mmap_lock in read mode, we could end up with a corrupted core dump if someone does get_user_pages_remote() concurrently. Not really a major problem, but taking the mmap_lock either way works here, so we might as well avoid the issue.) (This doesn't do anything about the existing data races with stack expansion in other mm code.) Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200827114932.3572699-6-jannh@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16coredump: rework elf/elf_fdpic vma_dump_size() into common helperJann Horn4-199/+106
At the moment, the binfmt_elf and binfmt_elf_fdpic code have slightly different code to figure out which VMAs should be dumped, and if so, whether the dump should contain the entire VMA or just its first page. Eliminate duplicate code by reworking the binfmt_elf version into a generic core dumping helper in coredump.c. As part of that, change the heuristic for detecting executable/library header pages to check whether the inode is executable instead of looking at the file mode. This is less problematic in terms of locking because it lets us avoid get_user() under the mmap_sem. (And arguably it looks nicer and makes more sense in generic code.) Adjust a little bit based on the binfmt_elf_fdpic version: ->anon_vma is only meaningful under CONFIG_MMU, otherwise we have to assume that the VMA has been written to. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200827114932.3572699-5-jannh@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16coredump: refactor page range dumping into common helperJann Horn4-35/+41
Both fs/binfmt_elf.c and fs/binfmt_elf_fdpic.c need to dump ranges of pages into the coredump file. Extract that logic into a common helper. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200827114932.3572699-4-jannh@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16coredump: let dump_emit() bail out on short writesJann Horn1-11/+11
dump_emit() has a retry loop, but there seems to be no way for that retry logic to actually be used; and it was also buggy, writing the same data repeatedly after a short write. Let's just bail out on a short write. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200827114932.3572699-3-jannh@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16binfmt_elf_fdpic: stop using dump_emit() on user pointers on !MMUJann Horn2-37/+28
Patch series "Fix ELF / FDPIC ELF core dumping, and use mmap_lock properly in there", v5. At the moment, we have that rather ugly mmget_still_valid() helper to work around <https://crbug.com/project-zero/1790>: ELF core dumping doesn't take the mmap_sem while traversing the task's VMAs, and if anything (like userfaultfd) then remotely messes with the VMA tree, fireworks ensue. So at the moment we use mmget_still_valid() to bail out in any writers that might be operating on a remote mm's VMAs. With this series, I'm trying to get rid of the need for that as cleanly as possible. ("cleanly" meaning "avoid holding the mmap_lock across unbounded sleeps".) Patches 1, 2, 3 and 4 are relatively unrelated cleanups in the core dumping code. Patches 5 and 6 implement the main change: Instead of repeatedly accessing the VMA list with sleeps in between, we snapshot it at the start with proper locking, and then later we just use our copy of the VMA list. This ensures that the kernel won't crash, that VMA metadata in the coredump is consistent even in the presence of concurrent modifications, and that any virtual addresses that aren't being concurrently modified have their contents show up in the core dump properly. The disadvantage of this approach is that we need a bit more memory during core dumping for storing metadata about all VMAs. At the end of the series, patch 7 removes the old workaround for this issue (mmget_still_valid()). I have tested: - Creating a simple core dump on X86-64 still works. - The created coredump on X86-64 opens in GDB and looks plausible. - X86-64 core dumps contain the first page for executable mappings at offset 0, and don't contain the first page for non-executable file mappings or executable mappings at offset !=0. - NOMMU 32-bit ARM can still generate plausible-looking core dumps through the FDPIC implementation. (I can't test this with GDB because GDB is missing some structure definition for nommu ARM, but I've poked around in the hexdump and it looked decent.) This patch (of 7): dump_emit() is for kernel pointers, and VMAs describe userspace memory. Let's be tidy here and avoid accessing userspace pointers under KERNEL_DS, even if it probably doesn't matter much on !MMU systems - especially given that it looks like we can just use the same get_dump_page() as on MMU if we move it out of the CONFIG_MMU block. One small change we have to make in get_dump_page() is to use __get_user_pages_locked() instead of __get_user_pages(), since the latter doesn't exist on nommu. On mmu builds, __get_user_pages_locked() will just call __get_user_pages() for us. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200827114932.3572699-1-jannh@google.com Link: http://lkml.kernel.org/r/20200827114932.3572699-2-jannh@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16tools/testing/selftests: add self-test for verifying load alignmentChris Kennelly3-2/+76
This produces a PIE binary with a variety of p_align requirements, suitable for verifying that the load address meets that alignment requirement. Signed-off-by: Chris Kennelly <ckennelly@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Fangrui Song <maskray@google.com> Cc: Hugh Dickens <hughd@google.com> Cc: Ian Rogers <irogers@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Sandeep Patil <sspatil@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Suren Baghdasaryan <surenb@google.com> Link: https://lkml.kernel.org/r/20200820170541.1132271-3-ckennelly@google.com Link: https://lkml.kernel.org/r/20200821233848.3904680-3-ckennelly@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16fs/binfmt_elf: use PT_LOAD p_align values for suitable start addressChris Kennelly1-0/+25
Patch series "Selecting Load Addresses According to p_align", v3. The current ELF loading mechancism provides page-aligned mappings. This can lead to the program being loaded in a way unsuitable for file-backed, transparent huge pages when handling PIE executables. While specifying -z,max-page-size=0x200000 to the linker will generate suitably aligned segments for huge pages on x86_64, the executable needs to be loaded at a suitably aligned address as well. This alignment requires the binary's cooperation, as distinct segments need to be appropriately paddded to be eligible for THP. For binaries built with increased alignment, this limits the number of bits usable for ASLR, but provides some randomization over using fixed load addresses/non-PIE binaries. This patch (of 2): The current ELF loading mechancism provides page-aligned mappings. This can lead to the program being loaded in a way unsuitable for file-backed, transparent huge pages when handling PIE executables. For binaries built with increased alignment, this limits the number of bits usable for ASLR, but provides some randomization over using fixed load addresses/non-PIE binaries. Tested by verifying program with -Wl,-z,max-page-size=0x200000 loading. [akpm@linux-foundation.org: fix max() warning] [ckennelly@google.com: augment comment] Link: https://lkml.kernel.org/r/20200821233848.3904680-2-ckennelly@google.com Signed-off-by: Chris Kennelly <ckennelly@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: David Rientjes <rientjes@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Hugh Dickens <hughd@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sandeep Patil <sspatil@google.com> Cc: Fangrui Song <maskray@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Link: https://lkml.kernel.org/r/20200820170541.1132271-1-ckennelly@google.com Link: https://lkml.kernel.org/r/20200820170541.1132271-2-ckennelly@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16checkpatch: add new warnings to author signoff checks.Dwaipayan Ray1-16/+77
The author signed-off-by checks are currently very vague. Cases like same name or same address are not handled separately. For example, running checkpatch on commit be6577af0cef ("parisc: Add atomic64_set_release() define to avoid CPU soft lockups"), gives: WARNING: Missing Signed-off-by: line by nominal patch author 'John David Anglin <dave.anglin@bell.net>' The signoff line was: "Signed-off-by: Dave Anglin <dave.anglin@bell.net>" Clearly the author has signed off but with a slightly different version of his name. A more appropriate warning would have been to point out at the name mismatch instead. Previously, the values assumed by $authorsignoff were either 0 or 1 to indicate whether a proper sign off by author is present. Extended the checks to handle four new cases. $authorsignoff values now denote the following: 0: Missing sign off by patch author. 1: Sign off present and identical. 2: Addresses and names match, but comments differ. "James Watson(JW) <james@gmail.com>", "James Watson <james@gmail.com>" 3: Addresses match, but names are different. "James Watson <james@gmail.com>", "James <james@gmail.com>" 4: Names match, but addresses are different. "James Watson <james@watson.com>", "James Watson <james@gmail.com>" 5: Names match, addresses excluding subaddress details (RFC 5233) match. "James Watson <james@gmail.com>", "James Watson <james+a@gmail.com>" Also introduced a new message type FROM_SIGN_OFF_MISMATCH for cases 2, 3, 4 and 5. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Joe Perches <joe@perches.com> Link: https://lore.kernel.org/linux-kernel-mentees/c1ca28e77e8e3bfa7aadf3efa8ed70f97a9d369c.camel@perches.com/ Link: https://lkml.kernel.org/r/20201007192029.551744-1-dwaipayanray1@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16checkpatch: fix false positive on empty block comment linesŁukasz Stelmach1-1/+1
To avoid false positives in presence of SPDX-License-Identifier in networking files it is required to increase the leeway for empty block comment lines by one line. For example, checking drivers/net/loopback.c which starts with // SPDX-License-Identifier: GPL-2.0-or-later /* * INET An implementation of the TCP/IP protocol suite for the LINUX rsults in an unnecessary warning WARNING: networking block comments don't use an empty /* line, use /* Comment... +/* + * INET An implementation of the TCP/IP protocol suite for the LINUX Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Joe Perches <joe@perches.com> Cc: Bartłomiej Żolnierkiewicz <b.zolnierkie@samsung.co> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lkml.kernel.org/r/20201006083509.19934-1-l.stelmach@samsung.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16checkpatch: fix multi-statement macro checks for while blocks.Dwaipayan Ray1-3/+4
Checkpatch.pl doesn't have a check for excluding while (...) {...} blocks from MULTISTATEMENT_MACRO_USE_DO_WHILE error. For example, running checkpatch.pl on the file mm/maccess.c in the kernel generates the following error: ERROR: Macros with complex values should be enclosed in parentheses +#define copy_from_kernel_nofault_loop(dst, src, len, type, err_label) \ + while (len >= sizeof(type)) { \ + __get_kernel_nofault(dst, src, type, err_label); \ + dst += sizeof(type); \ + src += sizeof(type); \ + len -= sizeof(type); \ + } The error is misleading for this case. Enclosing it in parentheses doesn't make any sense. Checkpatch already has an exception list for such common macro types. Added a new exception for while (...) {...} style blocks to the same. In addition, the brace flatten logic was modified by changing the substitution characters from "1" to "1u". This was done to ensure that macros in the form "#define foo(bar) while(bar){bar--;}" were also correctly procecssed. Link: https://lore.kernel.org/linux-kernel-mentees/dc985938aa3986702815a0bd68dfca8a03c85447.camel@perches.com/ Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20201001171903.312021-1-dwaipayanray1@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16checkpatch: emit a warning on embedded filenamesJoe Perches1-0/+6
Embedding the complete filename path inside the file isn't particularly useful as often the path is moved around and becomes incorrect. Emit a warning when the source contains the filename. [akpm@linux-foundation.org: remove stray " di"] Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/1fd5f9188a14acdca703ca00301ee323de672a8d.camel@perches.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16checkpatch: extend author Signed-off-by check for split From: headerDwaipayan Ray1-0/+4
Checkpatch did not handle cases where the author From: header was split into multiple lines. The author identity could not be resolved and checkpatch generated a false NO_AUTHOR_SIGN_OFF warning. A typical example is commit e33bcbab16d1 ("tee: add support for session's client UUID generation"). When checkpatch was run on this commit, it displayed: "WARNING:NO_AUTHOR_SIGN_OFF: Missing Signed-off-by: line by nominal patch author ''" This was due to split header lines not being handled properly and the author himself wrote in commit cd2614967d8b ("checkpatch: warn if missing author Signed-off-by"): "Split From: headers are not fully handled: only the first part is compared." Support split From: headers by correctly parsing the header extension lines. RFC 5322, Section-2.2.3 stated that each extended line must start with a WSP character (a space or htab). The solution was therefore to concatenate the lines which start with a WSP to get the correct long header. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Acked-by: Joe Perches <joe@perches.com> Link: https://lore.kernel.org/linux-kernel-mentees/f5d8124e54a50480b0a9fa638787bc29b6e09854.camel@perches.com/ Link: https://lkml.kernel.org/r/20200921085436.63003-1-dwaipayanray1@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16checkpatch: allow not using -f with files that are in gitJoe Perches1-0/+14
If a file exists in git and checkpatch is used without the -f flag for scanning a file, then checkpatch will scan the file assuming it's a patch and emit: ERROR: Does not appear to be a unified-diff format patch Change the behavior to assume the -f flag if the file exists in git. [joe@perches.com: fix git "fatal" warning if file argument outside kernel tree] Link: https://lkml.kernel.org/r/b6afa04112d450c2fc120a308d706acd60cee294.camel@perches.com Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Julia Lawall <julia.lawall@inria.fr> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Link: https://lkml.kernel.org/r/45b81a48e1568bd0126a96f5046eb7aaae9b83c9.camel@perches.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16checkpatch: warn on self-assignmentsJoe Perches1-0/+11
The uninitialized_var() macro was removed recently via commit 63a0895d960a ("compiler: Remove uninitialized_var() macro") as it's not a particularly useful warning and its use can "paper over real bugs". Add a checkpatch test to warn on self-assignments as a means to avoid compiler warnings and as a back-door mechanism to reproduce the old uninitialized_var macro behavior. [akpm@linux-foundation.org: coding style fixes] Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Denis Efremov <efremov@linux.com> Cc: Julia Lawall <julia.lawall@inria.fr> Link: https://lkml.kernel.org/r/afc2cffdd315d3e4394af149278df9e8af7f49f4.camel@perches.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16const_structs.checkpatch: add pinctrl_ops and pinmux_opsRikard Falkeborn1-0/+2
All usages of include/linux of these are const pointers, and all instances in the kernel except one, that are not const can be made const (patches have been posted for those separately). Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: Joe Perches <joe@perches.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Rikard Falkeborn <rikard.falkeborn@gmail.com> Link: https://lkml.kernel.org/r/20200830224352.37114-1-rikard.falkeborn@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16checkpatch: warn if trace_printk and friends are calledNicolas Boichat1-0/+6
trace_printk is meant as a debugging tool, and should not be compiled into production code without specific debug Kconfig options enabled, or source code changes, as indicated by the warning that shows up on boot if any trace_printk is called: ** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE ** ** ** ** trace_printk() being used. Allocating extra memory. ** ** ** ** This means that this is a DEBUG kernel and it is ** ** unsafe for production use. ** Let's warn developers when they try to submit such a change. Signed-off-by: Nicolas Boichat <drinkcat@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Joe Perches <joe@perches.com> Link: https://lkml.kernel.org/r/20200825193600.v2.1.I723c43c155f02f726c97501be77984f1e6bb740a@changeid Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16const_structs.checkpatch: add phy_opsRikard Falkeborn1-0/+1
All usages of phy_ops in include/linux uses const phy_ops * and all instances of phy_ops in the kernel that are not const already can be made const (patches have been posted for those separately). Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Kishon Vijay Abraham I <kishon@ti.com> Cc: Vinod Koul <vkoul@kernel.org> Link: https://lkml.kernel.org/r/20200824214132.9072-1-rikard.falkeborn@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16checkpatch: add test for comma use that should be semicolonJoe Perches1-0/+11
There are commas used as statement terminations that should typically have used semicolons instead. Only direct assignments or use of a single function or value on a single line are detected by this test. e.g.: foo = bar(), /* typical use is semicolon not comma */ bar = baz(); Add an imperfect test to detect these comma uses. No false positives were found in testing, but many types of false negatives are possible. e.g.: foo = bar() + 1, /* comma use, but not direct assignment */ bar = baz(); Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/3bf27caf462007dfa75647b040ab3191374a59de.camel@perches.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16checkpatch: move repeated word testJoe Perches1-36/+36
Currently this test only works on .[ch] files. Move the test to check more file types and the commit log. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/180b3b5677771c902b2e2f7a2b7090ede65fe004.camel@perches.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16checkpatch: add --kconfig-prefixJerome Forissier1-4/+8
Kconfig allows to customize the CONFIG_ prefix via the $CONFIG_ environment variable. Out-of-tree projects may therefore use Kconfig with a different prefix, or they may use a custom configuration tool which does not use the CONFIG_ prefix at all. Such projects may still want to adhere to the Linux kernel coding style and run checkpatch.pl. One example is OP-TEE [1] which does not use Kconfig but does have configuration options prefixed with CFG_. It also mostly follows the kernel coding style and therefore being able to use checkpatch is quite valuable. To make this possible, add the --kconfig-prefix command line option. [1] https://github.com/OP-TEE/optee_os Signed-off-by: Jerome Forissier <jerome@forissier.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Joe Perches <joe@perches.com> Link: http://lkml.kernel.org/r/20200818081732.800449-1-jerome@forissier.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16bitops: use the same mechanism for get_count_order[_long]Wei Yang1-5/+3
These two functions share the same logic. Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lkml.kernel.org/r/20200807085837.11697-3-richard.weiyang@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16bitops: simplify get_count_order_long()Wei Yang1-4/+1
These two cases could be unified into one. Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lkml.kernel.org/r/20200807085837.11697-2-richard.weiyang@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16lib/crc32.c: fix trivial typo in preprocessor conditionTobias Jordan1-1/+1
Whether crc32_be needs a lookup table is chosen based on CRC_LE_BITS. Obviously, the _be function should be governed by the _BE_ define. This probably never pops up as it's hard to come up with a configuration where CRC_BE_BITS isn't the same as CRC_LE_BITS and as nobody is using bitwise CRC anyway. Fixes: 46c5801eaf86 ("crc32: bolt on crc32c") Signed-off-by: Tobias Jordan <kernel@cdqe.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Krzysztof Kozlowski <krzk@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lkml.kernel.org/r/20200923182122.GA3338@agrajag.zerfleddert.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16lib/test_hmm.c: fix an error code in dmirror_allocate_chunk()Dan Carpenter1-1/+1
This is supposed to return false on failure, not a negative error code. Fixes: 170e38548b81 ("mm/hmm/test: use after free in dmirror_allocate_chunk()") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Dan Williams <dan.j.williams@intel.com> Link: https://lkml.kernel.org/r/20201010200812.GA1886610@mwanda Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16include/linux/list.h: add a macro to test if entry is pointing to the headAndy Shevchenko1-10/+19
Add a macro to test if entry is pointing to the head of the list which is useful in cases like: list_for_each_entry(pos, &head, member) { if (cond) break; } if (list_entry_is_head(pos, &head, member)) return -ERRNO; that allows to avoid additional variable to be added to track if loop has not been stopped in the middle. While here, convert list_for_each_entry*() family of macros to use a new one. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Cezary Rojewski <cezary.rojewski@intel.com> Link: https://lkml.kernel.org/r/20200929134342.51489-1-andriy.shevchenko@linux.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16lib/percpu_counter.c: use helper macro abs()Miaohe Lin1-1/+1
Use helper macro abs() to simplify the "x >= t || x <= -t" cmp. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200927122746.5964-1-linmiaohe@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16lib/scatterlist.c: avoid a double memsetChristophe JAILLET1-1/+1
'sgl' is zeroed a few lines below in 'sg_init_table()'. There is no need to clear it twice. Remove the redundant initialization. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200920071544.368841-1-christophe.jaillet@wanadoo.fr Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16lib/idr.c: document that ida_simple_{get,remove}() are deprecatedStephen Boyd1-0/+4
These two functions are deprecated. Users should call ida_alloc() or ida_free() respectively instead. Add documentation to this effect until the macro can be removed. Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Tri Vo <trong@android.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Link: https://lkml.kernel.org/r/20200910055246.2297797-2-swboyd@chromium.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16lib/idr.c: document calling context for IDA APIs mustn't use locksStephen Boyd2-6/+12
The documentation for these functions indicates that callers don't need to hold a lock while calling them, but that documentation is only in one place under "IDA Usage". Let's state the same information on each IDA function so that it's clear what the calling context requires. Furthermore, let's document ida_simple_get() with the same information so that callers know how this API works. Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tri Vo <trong@android.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Link: https://lkml.kernel.org/r/20200910055246.2297797-1-swboyd@chromium.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16lib/mpi/mpi-bit.c: fix spello of "functions"Randy Dunlap1-1/+1
Fix typo/spello of "functions". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/8df15173-a6df-9426-7cad-a2d279bf1170@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16lib: test_sysctl: delete duplicated wordsRandy Dunlap1-1/+1
Drop the repeated word "the". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200823040520.1999-1-rdunlap@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16lib: syscall: delete duplicated wordsRandy Dunlap1-1/+1
Drop the repeated word "the". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200823040514.26136-1-rdunlap@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16lib: radix-tree: delete duplicated wordsRandy Dunlap1-1/+1
Drop the repeated word "be". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200823040508.26086-1-rdunlap@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16lib: earlycpio: delete duplicated wordsRandy Dunlap1-1/+1
Drop the repeated word "the". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200823040455.25995-1-rdunlap@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16lib: dynamic_queue_limits: delete duplicated words + fix typoRandy Dunlap1-2/+2
Drop the repeated word "the". Fix spelling of "excess". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200823040449.25946-1-rdunlap@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16lib: decompress_bunzip2: delete duplicated wordsRandy Dunlap1-1/+1
Drop the repeated word "how". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200823040436.25852-1-rdunlap@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16lib: libcrc32c: delete duplicated wordsRandy Dunlap1-1/+1
Drop the repeated word "the". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200823040430.25807-1-rdunlap@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16lib: bitmap: delete duplicated wordsRandy Dunlap1-1/+1
Drop the repeated word "an". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200823040424.25760-1-rdunlap@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16MAINTAINERS: jarkko.sakkinen@linux.intel.com -> jarkko@kernel.orgJarkko Sakkinen2-3/+4
Use @kernel.org address as the main communications end point. Update the corresponding M-entries and .mailmap (for git shortlog translation). Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Joe Perches <joe@perches.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <keescook@chromium.org> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Rob Herring <robh@kernel.org> Link: https://lkml.kernel.org/r/20201015142710.8371-1-jarkko.sakkinen@linux.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16get_maintainer: exclude MAINTAINERS file(s) from --git-fallbackJoe Perches1-2/+4
MAINTAINERS files generally have no specific maintainer but are updated by individuals for subsystems all over the source tree. Exclude MAINTAINERS file(s) from --git-fallback searches so the unlucky individuals that update the files the most are not shown by default. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Rob Herring <robh@kernel.org> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Link: https://lkml.kernel.org/r/2bacb0a9c06fbb6d56a43bf930e808c74243c908.camel@perches.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16get_maintainer: add test for file in VCSJoe Perches1-0/+3
It's somewhat common for me to ask get_maintainer to tell me who maintains a patch file rather than the files modified by the patch. Emit a warning if using get_maintainer.pl -f <patchfile> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/f63229c051567041819f25e76f49d83c6e4c0f71.camel@perches.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16kernel: acct.c: fix some kernel-doc nitsRandy Dunlap1-5/+3
Fix kernel-doc notation to use the documented Returns: syntax and place the function description for acct_process() on the first line where it should be. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Link: https://lkml.kernel.org/r/b4c33e5d-98e8-0c47-77b6-ac1859f94d7f@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16kernel/: fix repeated words in commentsRandy Dunlap15-15/+15
Fix multiple occurrences of duplicated words in kernel/. Fix one typo/spello on the same line as a duplicate word. Change one instance of "the the" to "that the". Otherwise just drop one of the repeated words. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/98202fa6-8919-ef63-9efe-c0fad5ca7af1@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16kernel/sys.c: replace do_brk with do_brk_flags in comment of prctl_set_mm_map()Liao Pingfang1-1/+1
Replace do_brk with do_brk_flags in comment of prctl_set_mm_map(), since do_brk was removed in following commit. Fixes: bb177a732c4369 ("mm: do not bug_on on incorrect length in __mm_populate()") Signed-off-by: Liao Pingfang <liao.pingfang@zte.com.cn> Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/1600650751-43127-1-git-send-email-wang.yi59@zte.com.cn Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16kernel.h: split out min()/max() et al. helpersAndy Shevchenko12-154/+170
kernel.h is being used as a dump for all kinds of stuff for a long time. Here is the attempt to start cleaning it up by splitting out min()/max() et al. helpers. At the same time convert users in header and lib folder to use new header. Though for time being include new header back to kernel.h to avoid twisted indirected includes for other existing users. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Joe Perches <joe@perches.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20200910164152.GA1891694@smile.fi.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16fs: configfs: delete repeated words in commentsRandy Dunlap2-2/+2
Drop duplicated words {the, that} in comments. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Joel Becker <jlbec@evilplan.org> Cc: Christoph Hellwig <hch@lst.de> Link: https://lkml.kernel.org/r/20200811021826.25032-1-rdunlap@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16mm: rename page_order() to buddy_order()Matthew Wilcox (Oracle)7-29/+29
The current page_order() can only be called on pages in the buddy allocator. For compound pages, you have to use compound_order(). This is confusing and led to a bug, so rename page_order() to buddy_order(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20201001152259.14932-2-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16include/linux/mmzone.h: remove unused early_pfn_valid()Mike Rapoport1-5/+0
The early_pfn_valid() macro is defined but it is never used. Remove it. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: David Hildenbrand <david@redhat.com> Link: https://lkml.kernel.org/r/20200923162915.26935-1-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16mm: use helper function put_write_access()Miaohe Lin2-2/+2
In commit 1da177e4c3f4 ("Linux-2.6.12-rc2"), the helper put_write_access() came with the atomic_dec operation of the i_writecount field. But it forgot to use this helper in __vma_link_file() and dup_mmap(). Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200924115235.5111-1-linmiaohe@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16mm/workingset.c: fix some doc warningsXiaofei Tan1-1/+1
Fix following warnings caused by mismatch bewteen function parameters and comments. mm/workingset.c:228: warning: Function parameter or member 'lruvec' not described in 'workingset_age_nonresident' mm/workingset.c:228: warning: Excess function parameter 'memcg' description in 'workingset_age_nonresident' Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/1600485913-11192-1-git-send-email-tanxiaofei@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16mm: fix some comments formattingChen Tao2-2/+2
Correct one function name "get_partials" with "get_partial". Update the old struct name of list3 with kmem_cache_node. Signed-off-by: Chen Tao <chentao3@hotmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16mm: fix some broken commentsMiaohe Lin2-3/+3
Fix some broken comments including typo, grammar error and wrong function name. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200913095456.54873-1-linmiaohe@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16mm: use self-explanatory macros rather than "2"Yu Zhao3-6/+10
Signed-off-by: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alex Shi <alex.shi@linux.alibaba.com> Link: http://lkml.kernel.org/r/20200831175042.3527153-2-yuzhao@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16mm/highmem.c: clean up endif commentsIra Weiny1-2/+2
The #endif at the end of the file matches up with the '#if defined(HASHED_PAGE_VIRTUAL)' on line 374. Not the CONFIG_HIGHMEM #if earlier. Fix comments on both of the #endif's to indicate the correct end of blocks for each. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Link: https://lkml.kernel.org/r/20200819184635.112579-1-ira.weiny@intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16mm/page_reporting.c: drop stale list head check in page_reporting_cycleWei Yang1-1/+1
list_for_each_entry_safe() guarantees that we will never stumble over the list head; "&page->lru != list" will always evaluate to true. Let's simplify. [david@redhat.com: Changelog refinements] Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Link: http://lkml.kernel.org/r/20200818084448.33969-1-richard.weiyang@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16mm/slab.h: remove duplicate includeYueHaibing1-1/+0
Remove duplicate header which is included twice. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/20200818114323.58156-1-yuehaibing@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16zram: failing to decompress is WARN_ON worthyDouglas Anderson1-1/+1
If we fail to decompress in zram it's a pretty serious problem. We were entrusted to be able to decompress the old data but we failed. Either we've got some crazy bug in the compression code or we've got memory corruption. At the moment, when this happens the log looks like this: ERR kernel: [ 1833.099861] zram: Decompression failed! err=-22, page=336112 ERR kernel: [ 1833.099881] zram: Decompression failed! err=-22, page=336112 ALERT kernel: [ 1833.099886] Read-error on swap-device (253:0:2688896) It is true that we have an "ALERT" level log in there, but (at least to me) it feels like even this isn't enough to impart the seriousness of this error. Let's convert to a WARN_ON. Note that WARN_ON is automatically "unlikely" so we can simply replace the old annotation with the new one. Signed-off-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Sonny Rao <sonnyrao@chromium.org> Cc: Jens Axboe <axboe@kernel.dk> Link: https://lkml.kernel.org/r/20200917174059.1.If09c882545dbe432268f7a67a4d4cfcb6caace4f@changeid Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16mm/memory_hotplug: update comment regarding zone shufflingDavid Hildenbrand1-7/+4
As we no longer shuffle via generic_online_page() and when undoing isolation, we can simplify the comment. We now effectively shuffle only once (properly) when onlining new memory. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Mike Rapoport <rppt@kernel.org> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Scott Cheloha <cheloha@linux.ibm.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Wei Liu <wei.liu@kernel.org> Link: https://lkml.kernel.org/r/20201005121534.15649-6-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>