commit 0d59679df5b53755c00ea0292df696f97bfc950d Author: Greg Kroah-Hartman Date: Tue Jan 2 20:31:17 2018 +0100 Linux 4.14.11 commit 3ade66602bb7ea348ed8cbafcdd56eb1826c2177 Author: Johan Hovold Date: Fri Nov 3 15:18:05 2017 +0100 tty: fix tty_ldisc_receive_buf() documentation commit e7e51dcf3b8a5f65c5653a054ad57eb2492a90d0 upstream. The tty_ldisc_receive_buf() helper returns the number of bytes processed so drop the bogus "not" from the kernel doc comment. Fixes: 8d082cd300ab ("tty: Unify receive_buf() code paths") Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman commit aaa5a91ff744f91fb1d1c91853aa0c8f126be563 Author: Linus Torvalds Date: Wed Dec 20 17:57:06 2017 -0800 n_tty: fix EXTPROC vs ICANON interaction with TIOCINQ (aka FIONREAD) commit 966031f340185eddd05affcf72b740549f056348 upstream. We added support for EXTPROC back in 2010 in commit 26df6d13406d ("tty: Add EXTPROC support for LINEMODE") and the intent was to allow it to override some (all?) ICANON behavior. Quoting from that original commit message: There is a new bit in the termios local flag word, EXTPROC. When this bit is set, several aspects of the terminal driver are disabled. Input line editing, character echo, and mapping of signals are all disabled. This allows the telnetd to turn off these functions when in linemode, but still keep track of what state the user wants the terminal to be in. but the problem turns out that "several aspects of the terminal driver are disabled" is a bit ambiguous, and you can really confuse the n_tty layer by setting EXTPROC and then causing some of the ICANON invariants to no longer be maintained. This fixes at least one such case (TIOCINQ) becoming unhappy because of the confusion over whether ICANON really means ICANON when EXTPROC is set. This basically makes TIOCINQ match the case of read: if EXTPROC is set, we ignore ICANON. Also, make sure to reset the ICANON state ie EXTPROC changes, not just if ICANON changes. Fixes: 26df6d13406d ("tty: Add EXTPROC support for LINEMODE") Reported-by: Tetsuo Handa Reported-by: syzkaller Cc: Jiri Slaby Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 57849de13c7da3385611a1b9aca5c6040ef7dfe2 Author: Thomas Gleixner Date: Sun Dec 31 16:52:15 2017 +0100 x86/ldt: Make LDT pgtable free conditional commit 7f414195b0c3612acd12b4611a5fe75995cf10c7 upstream. Andy prefers to be paranoid about the pagetable free in the error path of write_ldt(). Make it conditional and warn whenever the installment of a secondary LDT fails. Requested-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit 3e133155f22d59b0085ef63cbcbcfbd96c4d0b70 Author: Thomas Gleixner Date: Sun Dec 31 11:24:34 2017 +0100 x86/ldt: Plug memory leak in error path commit a62d69857aab4caa43049e72fe0ed5c4a60518dd upstream. The error path in write_ldt() tries to free 'old_ldt' instead of the newly allocated 'new_ldt', resulting in a memory leak. It also misses to clean up a half populated LDT pagetable, which is not a leak as it gets cleaned up when the process exits. Free both the potentially half populated LDT pagetable and the newly allocated LDT struct. This can be done unconditionally because once an LDT is mapped subsequent maps will succeed, because the PTE page is already populated and the two LDTs fit into that single page. Reported-by: Mathieu Desnoyers Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: Dominik Brodowski Cc: Linus Torvalds Cc: Linus Torvalds Cc: Peter Zijlstra Fixes: f55f0501cbf6 ("x86/pti: Put the LDT in its own PGD if PTI is on") Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1712311121340.1899@nanos Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit cf6c3f7f4b1305596e261fb90201b737ad79e0d6 Author: Andy Lutomirski Date: Tue Dec 12 07:56:36 2017 -0800 x86/espfix/64: Fix espfix double-fault handling on 5-level systems commit c739f930be1dd5fd949030e3475a884fe06dae9b upstream. Using PGDIR_SHIFT to identify espfix64 addresses on 5-level systems was wrong, and it resulted in panics due to unhandled double faults. Use P4D_SHIFT instead, which is correct on 4-level and 5-level machines. This fixes a panic when running x86 selftests on 5-level machines. Signed-off-by: Andy Lutomirski Acked-by: Kirill A. Shutemov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Kees Cook Cc: Kirill A. Shutemov Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Fixes: 1d33b219563f ("x86/espfix: Add support for 5-level paging") Link: http://lkml.kernel.org/r/24c898b4f44fdf8c22d93703850fb384ef87cfdc.1513035461.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 530f5fa1600b6af56b138f9fb87b7f5148c979a0 Author: Linus Torvalds Date: Wed Dec 27 11:48:50 2017 -0800 x86-32: Fix kexec with stack canary (CONFIG_CC_STACKPROTECTOR) commit ac461122c88a10b7d775de2f56467f097c9e627a upstream. Commit e802a51ede91 ("x86/idt: Consolidate IDT invalidation") cleaned up and unified the IDT invalidation that existed in a couple of places. It changed no actual real code. Despite not changing any actual real code, it _did_ change code generation: by implementing the common idt_invalidate() function in archx86/kernel/idt.c, it made the use of the function in arch/x86/kernel/machine_kexec_32.c be a real function call rather than an (accidental) inlining of the function. That, in turn, exposed two issues: - in load_segments(), we had incorrectly reset all the segment registers, which then made the stack canary load (which gcc does using offset of %gs) cause a trap. Instead of %gs pointing to the stack canary, it will be the normal zero-based kernel segment, and the stack canary load will take a page fault at address 0x14. - to make this even harder to debug, we had invalidated the GDT just before calling idt_invalidate(), which meant that the fault happened with an invalid GDT, which in turn causes a triple fault and immediate reboot. Fix this by (a) not reloading the special segments in load_segments(). We currently don't do any percpu accesses (which would require %fs on x86-32) in this area, but there's no reason to think that we might not want to do them, and like %gs, it's pointless to break it. (b) doing idt_invalidate() before invalidating the GDT, to keep things at least _slightly_ more debuggable for a bit longer. Without a IDT, traps will not work. Without a GDT, traps also will not work, but neither will any segment loads etc. So in a very real sense, the GDT is even more core than the IDT. Fixes: e802a51ede91 ("x86/idt: Consolidate IDT invalidation") Reported-and-tested-by: Alexandru Chirvasitu Signed-off-by: Linus Torvalds Signed-off-by: Thomas Gleixner Cc: Denys Vlasenko Cc: Peter Zijlstra Cc: Brian Gerst Cc: Steven Rostedt Cc: Borislav Petkov Cc: Andy Lutomirski Cc: Josh Poimboeuf Link: https://lkml.kernel.org/r/alpine.LFD.2.21.1712271143180.8572@i7.lan Signed-off-by: Greg Kroah-Hartman commit 082b7521a541a66f1fed68b1f76de6a3b910495e Author: Thomas Gleixner Date: Sat Dec 30 22:13:54 2017 +0100 x86/mm: Remove preempt_disable/enable() from __native_flush_tlb() commit decab0888e6e14e11d53cefa85f8b3d3b45ce73c upstream. The preempt_disable/enable() pair in __native_flush_tlb() was added in commit: 5cf0791da5c1 ("x86/mm: Disable preemption during CR3 read+write") ... to protect the UP variant of flush_tlb_mm_range(). That preempt_disable/enable() pair should have been added to the UP variant of flush_tlb_mm_range() instead. The UP variant was removed with commit: ce4a4e565f52 ("x86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP code") ... but the preempt_disable/enable() pair stayed around. The latest change to __native_flush_tlb() in commit: 6fd166aae78c ("x86/mm: Use/Fix PCID to optimize user/kernel switches") ... added an access to a per CPU variable outside the preempt disabled regions, which makes no sense at all. __native_flush_tlb() must always be called with at least preemption disabled. Remove the preempt_disable/enable() pair and add a WARN_ON_ONCE() to catch bad callers independent of the smp_processor_id() debugging. Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: Dominik Brodowski Cc: Linus Torvalds Cc: Linus Torvalds Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20171230211829.679325424@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit b5bef29785ff990a90238d4f59ccbeb656134eeb Author: Thomas Gleixner Date: Sat Dec 30 22:13:53 2017 +0100 x86/smpboot: Remove stale TLB flush invocations commit 322f8b8b340c824aef891342b0f5795d15e11562 upstream. smpboot_setup_warm_reset_vector() and smpboot_restore_warm_reset_vector() invoke local_flush_tlb() for no obvious reason. Digging in history revealed that the original code in the 2.1 era added those because the code manipulated a swapper_pg_dir pagetable entry. The pagetable manipulation was removed long ago in the 2.3 timeframe, but the TLB flush invocations stayed around forever. Remove them along with the pointless pr_debug()s which come from the same 2.1 change. Reported-by: Dominik Brodowski Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: Linus Torvalds Cc: Linus Torvalds Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20171230211829.586548655@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit e798502cfb471b95153de2f75a89501c45ec997a Author: Thomas Gleixner Date: Fri Dec 22 15:51:13 2017 +0100 nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick() commit 5d62c183f9e9df1deeea0906d099a94e8a43047a upstream. The conditions in irq_exit() to invoke tick_nohz_irq_exit() which subsequently invokes tick_nohz_stop_sched_tick() are: if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) If need_resched() is not set, but a timer softirq is pending then this is an indication that the softirq code punted and delegated the execution to softirqd. need_resched() is not true because the current interrupted task takes precedence over softirqd. Invoking tick_nohz_irq_exit() in this case can cause an endless loop of timer interrupts because the timer wheel contains an expired timer, but softirqs are not yet executed. So it returns an immediate expiry request, which causes the timer to fire immediately again. Lather, rinse and repeat.... Prevent that by adding a check for a pending timer soft interrupt to the conditions in tick_nohz_stop_sched_tick() which avoid calling get_next_timer_interrupt(). That keeps the tick sched timer on the tick and prevents a repetitive programming of an already expired timer. Reported-by: Sebastian Siewior Signed-off-by: Thomas Gleixner Acked-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Paul McKenney Cc: Anna-Maria Gleixner Cc: Sebastian Siewior Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272156050.2431@nanos Signed-off-by: Greg Kroah-Hartman commit 7a3ce39c2bcac61abcb4650b33945335a02ef6cd Author: Sushmita Susheelendra Date: Fri Dec 15 13:59:13 2017 -0700 staging: android: ion: Fix dma direction for dma_sync_sg_for_cpu/device commit d6b246bb7a29703f53aa4c050b8b3205d749caee upstream. Use the direction argument passed into begin_cpu_access and end_cpu_access when calling the dma_sync_sg_for_cpu/device. The actual cache primitive called depends on the direction passed in. Signed-off-by: Sushmita Susheelendra Acked-by: Laura Abbott Signed-off-by: Greg Kroah-Hartman commit 2695c0f1f71e6022a6916584a4d19de8968e97c3 Author: Sudeep Holla Date: Fri Nov 17 11:56:41 2017 +0000 drivers: base: cacheinfo: fix cache type for non-architected system cache commit f57ab9a01a36ef3454333251cc57e3a9948b17bf upstream. Commit dfea747d2aba ("drivers: base: cacheinfo: support DT overrides for cache properties") doesn't initialise the cache type if it's present only in DT and the architecture is not aware of it. They are unified system level cache which are generally transparent. This patch check if the cache type is set to NOCACHE but the DT node indicates that it's unified cache and sets the cache type accordingly. Fixes: dfea747d2aba ("drivers: base: cacheinfo: support DT overrides for cache properties") Reported-and-tested-by: Tan Xiaojun Cc: Greg Kroah-Hartman Signed-off-by: Sudeep Holla Signed-off-by: Greg Kroah-Hartman commit f3f5fa872d09109edfd7c10c57865301fee396d4 Author: Johan Hovold Date: Wed Nov 15 10:43:16 2017 +0100 phy: tegra: fix device-tree node lookups commit 046046737bd35bed047460f080ea47e186be731e upstream. Fix child-node lookups during probe, which ended up searching the whole device tree depth-first starting at the parents rather than just matching on their children. To make things worse, some parent nodes could end up being being prematurely freed (by tegra_xusb_pad_register()) as of_find_node_by_name() drops a reference to its first argument. Fixes: 53d2a715c240 ("phy: Add Tegra XUSB pad controller support") Cc: Thierry Reding Signed-off-by: Johan Hovold Signed-off-by: Kishon Vijay Abraham I Signed-off-by: Greg Kroah-Hartman commit d87f1bc7d15b89bd3bcf31020eb7f3b3cd6f84b5 Author: Todd Kjos Date: Mon Nov 27 09:32:33 2017 -0800 binder: fix proc->files use-after-free commit 7f3dc0088b98533f17128058fac73cd8b2752ef1 upstream. proc->files cleanup is initiated by binder_vma_close. Therefore a reference on the binder_proc is not enough to prevent the files_struct from being released while the binder_proc still has a reference. This can lead to an attempt to dereference the stale pointer obtained from proc->files prior to proc->files cleanup. This has been seen once in task_get_unused_fd_flags() when __alloc_fd() is called with a stale "files". The fix is to protect proc->files with a mutex to prevent cleanup while in use. Signed-off-by: Todd Kjos Signed-off-by: Greg Kroah-Hartman commit 6fae6de72ad44e98b5ae58a662d110c58594aad9 Author: Thomas Gleixner Date: Wed Dec 27 21:37:25 2017 +0100 timers: Reinitialize per cpu bases on hotplug commit 26456f87aca7157c057de65c9414b37f1ab881d1 upstream. The timer wheel bases are not (re)initialized on CPU hotplug. That leaves them with a potentially stale clk and next_expiry valuem, which can cause trouble then the CPU is plugged. Add a prepare callback which forwards the clock, sets next_expiry to far in the future and reset the control flags to a known state. Set base->must_forward_clk so the first timer which is queued will try to forward the clock to current jiffies. Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel") Reported-by: Paul E. McKenney Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra Cc: Frederic Weisbecker Cc: Sebastian Siewior Cc: Anna-Maria Gleixner Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272152200.2431@nanos Signed-off-by: Greg Kroah-Hartman commit 8f1aa64ab08621cc3d2b1600ab80da5d3128abb6 Author: Thomas Gleixner Date: Fri Dec 22 15:51:14 2017 +0100 timers: Invoke timer_start_debug() where it makes sense commit fd45bb77ad682be728d1002431d77b8c73342836 upstream. The timer start debug function is called before the proper timer base is set. As a consequence the trace data contains the stale CPU and flags values. Call the debug function after setting the new base and flags. Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel") Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra Cc: Frederic Weisbecker Cc: Sebastian Siewior Cc: rt@linutronix.de Cc: Paul McKenney Cc: Anna-Maria Gleixner Link: https://lkml.kernel.org/r/20171222145337.792907137@linutronix.de Signed-off-by: Greg Kroah-Hartman commit e4fb2e7e92ec638f0bd489c56b4b6d9004fd1f40 Author: Anna-Maria Gleixner Date: Fri Dec 22 15:51:12 2017 +0100 timers: Use deferrable base independent of base::nohz_active commit ced6d5c11d3e7b342f1a80f908e6756ebd4b8ddd upstream. During boot and before base::nohz_active is set in the timer bases, deferrable timers are enqueued into the standard timer base. This works correctly as long as base::nohz_active is false. Once it base::nohz_active is set and a timer which was enqueued before that is accessed the lock selector code choses the lock of the deferred base. This causes unlocked access to the standard base and in case the timer is removed it does not clear the pending flag in the standard base bitmap which causes get_next_timer_interrupt() to return bogus values. To prevent that, the deferrable timers must be enqueued in the deferrable base, even when base::nohz_active is not set. Those deferrable timers also need to be expired unconditional. Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel") Signed-off-by: Anna-Maria Gleixner Signed-off-by: Thomas Gleixner Reviewed-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Sebastian Siewior Cc: rt@linutronix.de Cc: Paul McKenney Link: https://lkml.kernel.org/r/20171222145337.633328378@linutronix.de Signed-off-by: Greg Kroah-Hartman commit f0f18aa8f70171f3e184b4185e673d7d4694300e Author: Daniel Thompson Date: Thu Dec 21 15:06:15 2017 +0200 usb: xhci: Add XHCI_TRUST_TX_LENGTH for Renesas uPD720201 commit da99706689481717998d1d48edd389f339eea979 upstream. When plugging in a USB webcam I see the following message: xhci_hcd 0000:04:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk? handle_tx_event: 913 callbacks suppressed All is quiet again with this patch (and I've done a fair but of soak testing with the camera since). Signed-off-by: Daniel Thompson Acked-by: Ard Biesheuvel Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman commit c7c4e00a66081ad6161f009a0c329b9a36085766 Author: Mathias Nyman Date: Tue Dec 19 11:14:42 2017 +0200 USB: Fix off by one in type-specific length check of BOS SSP capability commit 07b9f12864d16c3a861aef4817eb1efccbc5d0e6 upstream. USB 3.1 devices are not detected as 3.1 capable since 4.15-rc3 due to a off by one in commit 81cf4a45360f ("USB: core: Add type-specific length check of BOS descriptors") It uses USB_DT_USB_SSP_CAP_SIZE() to get SSP capability size which takes the zero based SSAC as argument, not the actual count of sublink speed attributes. USB3 spec 9.6.2.5 says "The number of Sublink Speed Attributes = SSAC + 1." The type-specific length check patch was added to stable and needs to be fixed there as well Fixes: 81cf4a45360f ("USB: core: Add type-specific length check of BOS descriptors") CC: Masakazu Mokuno Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman commit e2f33e5983cb1609805e267a3d2ef46d78257b2c Author: Oliver Neukum Date: Tue Dec 12 16:11:30 2017 +0100 usb: add RESET_RESUME for ELSA MicroLink 56K commit b9096d9f15c142574ebebe8fbb137012bb9d99c2 upstream. This modem needs this quirk to operate. It produces timeouts when resumed without reset. Signed-off-by: Oliver Neukum Signed-off-by: Greg Kroah-Hartman commit b9d02d3c5899b0abcb2df72252d7fc2f57028bfb Author: Dmitry Fleytman Dmitry Fleytman Date: Tue Dec 19 06:02:04 2017 +0200 usb: Add device quirk for Logitech HD Pro Webcam C925e commit 7f038d256c723dd390d2fca942919573995f4cfd upstream. Commit e0429362ab15 ("usb: Add device quirk for Logitech HD Pro Webcams C920 and C930e") introduced quirk to workaround an issue with some Logitech webcams. There is one more model that has the same issue - C925e, so applying the same quirk as well. See aforementioned commit message for detailed explanation of the problem. Signed-off-by: Dmitry Fleytman Signed-off-by: Greg Kroah-Hartman commit 57211f0cf174f8fdf3dffcff99fad25f376bb2b5 Author: SZ Lin (林上智) Date: Tue Dec 19 17:40:32 2017 +0800 USB: serial: option: adding support for YUGA CLM920-NC5 commit 3920bb713038810f25770e7545b79f204685c8f2 upstream. This patch adds support for YUGA CLM920-NC5 PID 0x9625 USB modem to option driver. Interface layout: 0: QCDM/DIAG 1: ADB 2: MODEM 3: AT 4: RMNET Signed-off-by: Taiyi Wu Signed-off-by: SZ Lin (林上智) Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman commit 9fa3c3b5598eec2040fa466941a59bfc15f03f17 Author: Daniele Palmas Date: Thu Dec 14 16:54:45 2017 +0100 USB: serial: option: add support for Telit ME910 PID 0x1101 commit 08933099e6404f588f81c2050bfec7313e06eeaf upstream. This patch adds support for PID 0x1101 of Telit ME910. Signed-off-by: Daniele Palmas Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman commit eb6cc0af22a3b0c982e3a61de63736b2c667a94e Author: Reinhard Speyerer Date: Fri Dec 15 00:39:27 2017 +0100 USB: serial: qcserial: add Sierra Wireless EM7565 commit 92a18a657fb2e2ffbfa0659af32cc18fd2346516 upstream. Sierra Wireless EM7565 devices use the QCSERIAL_SWI layout for their serial ports T: Bus=01 Lev=03 Prnt=29 Port=01 Cnt=02 Dev#= 31 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=1199 ProdID=9091 Rev= 0.06 S: Manufacturer=Sierra Wireless, Incorporated S: Product=Sierra Wireless EM7565 Qualcomm Snapdragon X16 LTE-A S: SerialNumber=xxxxxxxx C:* #Ifs= 4 Cfg#= 1 Atr=a0 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=qcserial E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=qcserial E: Ad=83(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=qcserial E: Ad=85(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 8 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan E: Ad=86(I) Atr=03(Int.) MxPS= 8 Ivl=32ms E: Ad=8e(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=0f(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms but need sendsetup = true for the NMEA port to make it work properly. Simplify the patch compared to v1 as suggested by Bjørn Mork by taking advantage of the fact that existing devices work with sendsetup = true too. Use sendsetup = true for the NMEA interface of QCSERIAL_SWI and add DEVICE_SWI entries for the EM7565 PID 0x9091 and the EM7565 QDL PID 0x9090. Tests with several MC73xx/MC74xx/MC77xx devices have been performed in order to verify backward compatibility. Signed-off-by: Reinhard Speyerer Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman commit ad61ff29f1049448fa389ba9ad464e49bdcdd61c Author: Max Schulze Date: Wed Dec 20 20:47:44 2017 +0100 USB: serial: ftdi_sio: add id for Airbus DS P8GR commit c6a36ad383559a60a249aa6016cebf3cb8b6c485 upstream. Add AIRBUS_DS_P8GR device IDs to ftdi_sio driver. Signed-off-by: Max Schulze Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman commit 1fcd9859a4b3f1df7316ff01a9993695ab929727 Author: Johan Hovold Date: Mon Nov 13 11:12:58 2017 +0100 USB: chipidea: msm: fix ulpi-node lookup commit 964728f9f407eca0b417fdf8e784b7a76979490c upstream. Fix child-node lookup during probe, which ended up searching the whole device tree depth-first starting at the parent rather than just matching on its children. Note that the original premature free of the parent node has already been fixed separately, but that fix was apparently never backported to stable. Fixes: 47654a162081 ("usb: chipidea: msm: Restore wrapper settings after reset") Fixes: b74c43156c0c ("usb: chipidea: msm: ci_hdrc_msm_probe() missing of_node_get()") Cc: Stephen Boyd Cc: Frank Rowand Signed-off-by: Johan Hovold Signed-off-by: Peter Chen Signed-off-by: Greg Kroah-Hartman commit 2d6483bf78f859b827ea97a16b68cf64c3b56e67 Author: Shuah Khan Date: Mon Dec 18 17:24:22 2017 -0700 usbip: vhci: stop printing kernel pointer addresses in messages commit 8272d099d05f7ab2776cf56a2ab9f9443be18907 upstream. Remove and/or change debug, info. and error messages to not print kernel pointer addresses. Signed-off-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman commit ed4db9a7f8cb9e83605e63e7656a6efb56846ebb Author: Shuah Khan Date: Mon Dec 18 17:23:37 2017 -0700 usbip: stub: stop printing kernel pointer addresses in messages commit 248a22044366f588d46754c54dfe29ffe4f8b4df upstream. Remove and/or change debug, info. and error messages to not print kernel pointer addresses. Signed-off-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman commit 6b8335e48ae7171206eda3ae572f0cd0c75ca8a1 Author: Shuah Khan Date: Fri Dec 15 10:50:09 2017 -0700 usbip: prevent leaking socket pointer address in messages commit 90120d15f4c397272aaf41077960a157fc4212bf upstream. usbip driver is leaking socket pointer address in messages. Remove the messages that aren't useful and print sockfd in the ones that are useful for debugging. Signed-off-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman commit 2ee11dcfc9e1c894b058de64fb8b693cf697fbbe Author: Juan Zea Date: Fri Dec 15 10:21:20 2017 +0100 usbip: fix usbip bind writing random string after command in match_busid commit 544c4605acc5ae4afe7dd5914147947db182f2fb upstream. usbip bind writes commands followed by random string when writing to match_busid attribute in sysfs, caused by using full variable size instead of string length. Signed-off-by: Juan Zea Acked-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman commit 38e8981d5490b2908136baea1f9e3b84a87aec01 Author: Jan Engelhardt Date: Mon Dec 25 03:43:53 2017 +0100 sparc64: repair calling incorrect hweight function from stubs [ Upstream commit 59585b4be9ae4dc6506551709bdcd6f5210b8a01 ] Commit v4.12-rc4-1-g9289ea7f952b introduced a mistake that made the 64-bit hweight stub call the 16-bit hweight function. Fixes: 9289ea7f952b ("sparc64: Use indirect calls in hamming weight stubs") Signed-off-by: Jan Engelhardt Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit b9edd6bf0ccb2939a7716b80d3e73a24ead9fe62 Author: Willem de Bruijn Date: Thu Dec 28 12:38:13 2017 -0500 skbuff: in skb_copy_ubufs unclone before releasing zerocopy skb_copy_ubufs must unclone before it is safe to modify its skb_shared_info with skb_zcopy_clear. Commit b90ddd568792 ("skbuff: skb_copy_ubufs must release uarg even without user frags") ensures that all skbs release their zerocopy state, even those without frags. But I forgot an edge case where such an skb arrives that is cloned. The stack does not build such packets. Vhost/tun skbs have their frags orphaned before cloning. TCP skbs only attach zerocopy state when a frag is added. But if TCP packets can be trimmed or linearized, this might occur. Tracing the code I found no instance so far (e.g., skb_linearize ends up calling skb_zcopy_clear if !skb->data_len). Still, it is non-obvious that no path exists. And it is fragile to rely on this. Fixes: b90ddd568792 ("skbuff: skb_copy_ubufs must release uarg even without user frags") Signed-off-by: Willem de Bruijn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 49cd180d4a104159f240a9a896e5f13844227378 Author: Willem de Bruijn Date: Wed Dec 20 17:37:50 2017 -0500 skbuff: skb_copy_ubufs must release uarg even without user frags [ Upstream commit b90ddd568792bcb0054eaf0f61785c8f80c3bd1c ] skb_copy_ubufs creates a private copy of frags[] to release its hold on user frags, then calls uarg->callback to notify the owner. Call uarg->callback even when no frags exist. This edge case can happen when zerocopy_sg_from_iter finds enough room in skb_headlen to copy all the data. Fixes: 3ece782693c4 ("sock: skb_copy_ubufs support for compound pages") Signed-off-by: Willem de Bruijn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 17155ea827b2fd81330a442ed56d0edafd9969e1 Author: Willem de Bruijn Date: Wed Dec 20 17:37:49 2017 -0500 skbuff: orphan frags before zerocopy clone [ Upstream commit 268b790679422a89e9ab0685d9f291edae780c98 ] Call skb_zerocopy_clone after skb_orphan_frags, to avoid duplicate calls to skb_uarg(skb)->callback for the same data. skb_zerocopy_clone associates skb_shinfo(skb)->uarg from frag_skb with each segment. This is only safe for uargs that do refcounting, which is those that pass skb_orphan_frags without dropping their shared frags. For others, skb_orphan_frags drops the user frags and sets the uarg to NULL, after which sock_zerocopy_clone has no effect. Qemu hangs were reported due to duplicate vhost_net_zerocopy_callback calls for the same data causing the vhost_net_ubuf_ref_>refcount to drop below zero. Link: http://lkml.kernel.org/r/ Fixes: 1f8b977ab32d ("sock: enable MSG_ZEROCOPY") Reported-by: Andreas Hartmann Reported-by: David Hill Signed-off-by: Willem de Bruijn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 8f5bbb29b62c818e16de49c0bbc945b501c157cf Author: Saeed Mahameed Date: Fri Nov 10 15:59:52 2017 +0900 Revert "mlx5: move affinity hints assignments to generic code" [ Upstream commit 231243c82793428467524227ae02ca451e6a98e7 ] Before the offending commit, mlx5 core did the IRQ affinity itself, and it seems that the new generic code have some drawbacks and one of them is the lack for user ability to modify irq affinity after the initial affinity values got assigned. The issue is still being discussed and a solution in the new generic code is required, until then we need to revert this patch. This fixes the following issue: echo > /proc/irq//smp_affinity fails with -EIO This reverts commit a435393acafbf0ecff4deb3e3cb554b34f0d0664. Note: kept mlx5_get_vector_affinity in include/linux/mlx5/driver.h since it is used in mlx5_ib driver. Fixes: a435393acafb ("mlx5: move affinity hints assignments to generic code") Cc: Sagi Grimberg Cc: Thomas Gleixner Cc: Jes Sorensen Reported-by: Jes Sorensen Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit f2272d5dce7905d605f26ba8a5f9961d05263e74 Author: Nicolas Dichtel Date: Tue Nov 14 14:21:32 2017 +0100 ipv6: set all.accept_dad to 0 by default [ Upstream commit 094009531612246d9e13f9e0c3ae2205d7f63a0a ] With commits 35e015e1f577 and a2d3f3e33853, the global 'accept_dad' flag is also taken into account (default value is 1). If either global or per-interface flag is non-zero, DAD will be enabled on a given interface. This is not backward compatible: before those patches, the user could disable DAD just by setting the per-interface flag to 0. Now, the user instead needs to set both flags to 0 to actually disable DAD. Restore the previous behaviour by setting the default for the global 'accept_dad' flag to 0. This way, DAD is still enabled by default, as per-interface flags are set to 1 on device creation, but setting them to 0 is enough to disable DAD on a given interface. - Before 35e015e1f57a7 and a2d3f3e33853: global per-interface DAD enabled [default] 1 1 yes X 0 no X 1 yes - After 35e015e1f577 and a2d3f3e33853: global per-interface DAD enabled [default] 1 1 yes 0 0 no 0 1 yes 1 0 yes - After this fix: global per-interface DAD enabled 1 1 yes 0 0 no [default] 0 1 yes 1 0 yes Fixes: 35e015e1f577 ("ipv6: fix net.ipv6.conf.all interface DAD handlers") Fixes: a2d3f3e33853 ("ipv6: fix net.ipv6.conf.all.accept_dad behaviour for real") CC: Stefano Brivio CC: Matteo Croce CC: Erik Kline Signed-off-by: Nicolas Dichtel Acked-by: Stefano Brivio Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 86deaaa0ca2b0657e1f72e1087153d4ff22a312b Author: Phil Sutter Date: Tue Dec 19 15:17:13 2017 +0100 ipv4: fib: Fix metrics match when deleting a route [ Upstream commit d03a45572efa068fa64db211d6d45222660e76c5 ] The recently added fib_metrics_match() causes a regression for routes with both RTAX_FEATURES and RTAX_CC_ALGO if the latter has TCP_CONG_NEEDS_ECN flag set: | # ip link add d0 type dummy | # ip link set d0 up | # ip route add 172.29.29.0/24 dev d0 features ecn congctl dctcp | # ip route del 172.29.29.0/24 dev d0 features ecn congctl dctcp | RTNETLINK answers: No such process During route insertion, fib_convert_metrics() detects that the given CC algo requires ECN and hence sets DST_FEATURE_ECN_CA bit in RTAX_FEATURES. During route deletion though, fib_metrics_match() compares stored RTAX_FEATURES value with that from userspace (which obviously has no knowledge about DST_FEATURE_ECN_CA) and fails. Fixes: 5f9ae3d9e7e4a ("ipv4: do metrics match when looking up and deleting a route") Signed-off-by: Phil Sutter Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 185a3475dee6c14dd67d1a94178b6b5615cf167b Author: Russell King Date: Wed Dec 20 23:21:34 2017 +0000 phylink: ensure AN is enabled [ Upstream commit 74ee0e8c1bf9925c59cc8f1c65c29adf6e4cf603 ] Ensure that we mark AN as enabled at boot time, rather than leaving it disabled. This is noticable if your SFP module is fiber, and it supports faster speeds than 1G with 2.5G support in place. Fixes: 9525ae83959b ("phylink: add phylink infrastructure") Signed-off-by: Russell King Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 39889c293371e826503b61d98050a49742ccaf9c Author: Russell King Date: Wed Dec 20 23:21:28 2017 +0000 phylink: ensure the PHY interface mode is appropriately set [ Upstream commit 182088aa3c6c7f7c20a2c1dcc9ded4a3fc631f38 ] When setting the ethtool settings, ensure that the validated PHY interface mode is propagated to the current link settings, so that 2500BaseX can be selected. Fixes: 9525ae83959b ("phylink: add phylink infrastructure") Signed-off-by: Russell King Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 7f6dcb82d04045f143758f56caf031376f9bf16c Author: Calvin Owens Date: Fri Dec 8 09:05:26 2017 -0800 bnxt_en: Fix sources of spurious netpoll warnings [ Upstream commit 2edbdb3159d6f6bd3a9b6e7f789f2b879699a519 ] After applying 2270bc5da3497945 ("bnxt_en: Fix netpoll handling") and 903649e718f80da2 ("bnxt_en: Improve -ENOMEM logic in NAPI poll loop."), we still see the following WARN fire: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1875170 at net/core/netpoll.c:165 netpoll_poll_dev+0x15a/0x160 bnxt_poll+0x0/0xd0 exceeded budget in poll Call Trace: [] dump_stack+0x4d/0x70 [] __warn+0xd3/0xf0 [] warn_slowpath_fmt+0x4f/0x60 [] netpoll_poll_dev+0x15a/0x160 [] netpoll_send_skb_on_dev+0x168/0x250 [] netpoll_send_udp+0x2dc/0x440 [] write_ext_msg+0x20e/0x250 [] call_console_drivers.constprop.23+0xa5/0x110 [] console_unlock+0x339/0x5b0 [] vprintk_emit+0x2c8/0x450 [] vprintk_default+0x1f/0x30 [] printk+0x48/0x50 [] edac_raw_mc_handle_error+0x563/0x5c0 [edac_core] [] edac_mc_handle_error+0x42b/0x6e0 [edac_core] [] sbridge_mce_output_error+0x410/0x10d0 [sb_edac] [] sbridge_check_error+0xac/0x130 [sb_edac] [] edac_mc_workq_function+0x3c/0x90 [edac_core] [] process_one_work+0x19b/0x480 [] worker_thread+0x6a/0x520 [] kthread+0xe4/0x100 [] ret_from_fork+0x22/0x40 This happens because we increment rx_pkts on -ENOMEM and -EIO, resulting in rx_pkts > 0. Fix this by only bumping rx_pkts if we were actually given a non-zero budget. Signed-off-by: Calvin Owens Acked-by: Michael Chan Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 11295730446fd654aeef28a575d801f3679424fa Author: Jiri Pirko Date: Fri Dec 15 12:40:13 2017 +0100 net: sched: fix static key imbalance in case of ingress/clsact_init error [ Upstream commit b59e6979a86384e68b0ab6ffeab11f0034fba82d ] Move static key increments to the beginning of the init function so they pair 1:1 with decrements in ingress/clsact_destroy, which is called in case ingress/clsact_init fails. Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure") Signed-off-by: Jiri Pirko Acked-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 215b69e208089f42c29f9343ef11f4adc6ea6f6b Author: Alexey Kodanev Date: Thu Dec 14 20:20:00 2017 +0300 vxlan: restore dev->mtu setting based on lower device [ Upstream commit f870c1ff65a6d1f3a083f277280802ee09a5b44d ] Stefano Brivio says: Commit a985343ba906 ("vxlan: refactor verification and application of configuration") introduced a change in the behaviour of initial MTU setting: earlier, the MTU for a link created on top of a given lower device, without an initial MTU specification, was set to the MTU of the lower device minus headroom as a result of this path in vxlan_dev_configure(): if (!conf->mtu) dev->mtu = lowerdev->mtu - (use_ipv6 ? VXLAN6_HEADROOM : VXLAN_HEADROOM); which is now gone. Now, the initial MTU, in absence of a configured value, is simply set by ether_setup() to ETH_DATA_LEN (1500 bytes). This breaks userspace expectations in case the MTU of the lower device is higher than 1500 bytes minus headroom. This patch restores the previous behaviour on newlink operation. Since max_mtu can be negative and we update dev->mtu directly, also check it for valid minimum. Reported-by: Junhan Yan Fixes: a985343ba906 ("vxlan: refactor verification and application of configuration") Signed-off-by: Alexey Kodanev Acked-by: Stefano Brivio Signed-off-by: Stefano Brivio Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit cea58617977baafe4a1be8f55c8e55d21c6cfb20 Author: Kamal Heib Date: Sun Oct 29 04:03:37 2017 +0200 net/mlx5: FPGA, return -EINVAL if size is zero [ Upstream commit bae115a2bb479142605726e6aa130f43f50e801a ] Currently, if a size of zero is passed to mlx5_fpga_mem_{read|write}_i2c() the "err" return value will not be initialized, which triggers gcc warnings: [..]/mlx5/core/fpga/sdk.c:87 mlx5_fpga_mem_read_i2c() error: uninitialized symbol 'err'. [..]/mlx5/core/fpga/sdk.c:115 mlx5_fpga_mem_write_i2c() error: uninitialized symbol 'err'. fix that. Fixes: a9956d35d199 ('net/mlx5: FPGA, Add SBU infrastructure') Signed-off-by: Kamal Heib Reviewed-by: Yevgeny Kliteynik Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit 5504319c6993d51565b582c685c3e09c0e07fded Author: Eric Dumazet Date: Tue Dec 12 18:22:52 2017 -0800 tcp: refresh tcp_mstamp from timers callbacks [ Upstream commit 4688eb7cf3ae2c2721d1dacff5c1384cba47d176 ] Only the retransmit timer currently refreshes tcp_mstamp We should do the same for delayed acks and keepalives. Even if RFC 7323 does not request it, this is consistent to what linux did in the past, when TS values were based on jiffies. Fixes: 385e20706fac ("tcp: use tp->tcp_mstamp in output path") Signed-off-by: Eric Dumazet Cc: Soheil Hassas Yeganeh Cc: Mike Maloney Cc: Neal Cardwell Acked-by: Neal Cardwell Acked-by: Soheil Hassas Yeganeh Acked-by: Mike Maloney Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 333921964046fe8453a90b60aa55dc4725955073 Author: Ido Schimmel Date: Wed Dec 20 12:28:25 2017 +0200 ipv6: Honor specified parameters in fibmatch lookup [ Upstream commit 58acfd714e6b02e8617448b431c2b64a2f1f0792 ] Currently, parameters such as oif and source address are not taken into account during fibmatch lookup. Example (IPv4 for reference) before patch: $ ip -4 route show 192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1 198.51.100.0/24 dev dummy1 proto kernel scope link src 198.51.100.1 $ ip -6 route show 2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium 2001:db8:2::/64 dev dummy1 proto kernel metric 256 pref medium fe80::/64 dev dummy0 proto kernel metric 256 pref medium fe80::/64 dev dummy1 proto kernel metric 256 pref medium $ ip -4 route get fibmatch 192.0.2.2 oif dummy0 192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1 $ ip -4 route get fibmatch 192.0.2.2 oif dummy1 RTNETLINK answers: No route to host $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0 2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1 2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium After: $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0 2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1 RTNETLINK answers: Network is unreachable The problem stems from the fact that the necessary route lookup flags are not set based on these parameters. Instead of duplicating the same logic for fibmatch, we can simply resolve the original route from its copy and dump it instead. Fixes: 18c3a61c4264 ("net: ipv6: RTM_GETROUTE: return matched fib result when requested") Signed-off-by: Ido Schimmel Acked-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 5e255d684d0554893768b4452de404f293c71696 Author: Zhao Qiang Date: Mon Dec 18 10:26:43 2017 +0800 net: phy: marvell: Limit 88m1101 autoneg errata to 88E1145 as well. [ Upstream commit c505873eaece2b4aefd07d339dc7e1400e0235ac ] 88E1145 also need this autoneg errata. Fixes: f2899788353c ("net: phy: marvell: Limit errata to 88m1101") Signed-off-by: Zhao Qiang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 583395a81f00a42f178c1d8ba3c4858d76c9f5b4 Author: Wei Wang Date: Tue Dec 12 16:28:58 2017 -0800 tcp: fix potential underestimation on rcv_rtt [ Upstream commit 9ee11bd03cb1a5c3ca33c2bb70e7ed325f68890f ] When ms timestamp is used, current logic uses 1us in tcp_rcv_rtt_update() when the real rcv_rtt is within 1 - 999us. This could cause rcv_rtt underestimation. Fix it by always using a min value of 1ms if ms timestamp is used. Fixes: 645f4c6f2ebd ("tcp: switch rcv_rtt_est and rcvq_space to high resolution timestamps") Signed-off-by: Wei Wang Signed-off-by: Eric Dumazet Acked-by: Neal Cardwell Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit bcc029ff5dafc68d00a5fceadd93763d2b43e0e3 Author: Yuval Mintz Date: Fri Dec 15 08:44:21 2017 +0100 mlxsw: spectrum: Disable MAC learning for ovs port [ Upstream commit fccff0862838908d21eaf956d57e09c6c189f7c5 ] Learning is currently enabled for ports which are OVS slaves - even though OVS doesn't need this indication. Since we're not associating a fid with the port, HW would continuously notify driver of learned [& aged] MACs which would be logged as errors. Fixes: 2b94e58df58c ("mlxsw: spectrum: Allow ports to work under OVS master") Signed-off-by: Yuval Mintz Reviewed-by: Ido Schimmel Signed-off-by: Jiri Pirko Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 92ae8233467b3a19a50fb02a7ebe065c6de3df17 Author: Parthasarathy Bhuvaragan Date: Thu Dec 28 12:03:06 2017 +0100 tipc: fix hanging poll() for stream sockets [ Upstream commit 517d7c79bdb39864e617960504bdc1aa560c75c6 ] In commit 42b531de17d2f6 ("tipc: Fix missing connection request handling"), we replaced unconditional wakeup() with condtional wakeup for clients with flags POLLIN | POLLRDNORM | POLLRDBAND. This breaks the applications which do a connect followed by poll with POLLOUT flag. These applications are not woken when the connection is ESTABLISHED and hence sleep forever. In this commit, we fix it by including the POLLOUT event for sockets in TIPC_CONNECTING state. Fixes: 42b531de17d2f6 ("tipc: Fix missing connection request handling") Acked-by: Jon Maloy Signed-off-by: Parthasarathy Bhuvaragan Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 201c59bb7ba69fd6a41ac6606d8049b94fb26232 Author: Xin Long Date: Sun Dec 10 15:40:51 2017 +0800 sctp: make sure stream nums can match optlen in sctp_setsockopt_reset_streams [ Upstream commit 2342b8d95bcae5946e1b9b8d58645f37500ef2e7 ] Now in sctp_setsockopt_reset_streams, it only does the check optlen < sizeof(*params) for optlen. But it's not enough, as params->srs_number_streams should also match optlen. If the streams in params->srs_stream_list are less than stream nums in params->srs_number_streams, later when dereferencing the stream list, it could cause a slab-out-of-bounds crash, as reported by syzbot. This patch is to fix it by also checking the stream numbers in sctp_setsockopt_reset_streams to make sure at least it's not greater than the streams in the list. Fixes: 7f9d68ac944e ("sctp: implement sender-side procedures for SSN Reset Request Parameter") Reported-by: Dmitry Vyukov Signed-off-by: Xin Long Acked-by: Marcelo Ricardo Leitner Acked-by: Neil Horman Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f38ffe325b209f367c003bab291cf7e96cd1a6d9 Author: Julian Wiedmann Date: Wed Dec 20 18:07:18 2017 +0100 s390/qeth: fix error handling in checksum cmd callback [ Upstream commit ad3cbf61332914711e5f506972b1dc9af8d62146 ] Make sure to check both return code fields before processing the response. Otherwise we risk operating on invalid data. Fixes: c9475369bd2b ("s390/qeth: rework RX/TX checksum offload") Signed-off-by: Julian Wiedmann Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit ff1ff3815c2483ac19bd6f926a535dd4cea27e2e Author: Florian Fainelli Date: Tue Nov 21 17:37:46 2017 -0800 net: dsa: bcm_sf2: Clear IDDQ_GLOBAL_PWR bit for PHY [ Upstream commit 4b52d010113e11006a389f2a8315167ede9e0b10 ] The PHY on BCM7278 has an additional bit that needs to be cleared: IDDQ_GLOBAL_PWR, without doing this, the PHY remains stuck in reset out of suspend/resume cycles. Fixes: 0fe9933804eb ("net: dsa: bcm_sf2: Add support for BCM7278 integrated switch") Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 701768dc9a1030d3f02b9b3e9b4690d15f637dad Author: Bert Kenward Date: Thu Dec 7 17:18:58 2017 +0000 sfc: pass valid pointers from efx_enqueue_unwind [ Upstream commit d4a7a8893d4cdbc89d79ac4aa704bf8d4b67b368 ] The bytes_compl and pkts_compl pointers passed to efx_dequeue_buffers cannot be NULL. Add a paranoid warning to check this condition and fix the one case where they were NULL. efx_enqueue_unwind() is called very rarely, during error handling. Without this fix it would fail with a NULL pointer dereference in efx_dequeue_buffer, with efx_enqueue_skb in the call stack. Fixes: e9117e5099ea ("sfc: Firmware-Assisted TSO version 2") Reported-by: Jarod Wilson Signed-off-by: Bert Kenward Tested-by: Jarod Wilson Acked-by: Jarod Wilson Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit a6cc63e125ffb3ae9f6b4e2b4642ddea9a932b46 Author: Eric Garver Date: Wed Dec 20 15:09:22 2017 -0500 openvswitch: Fix pop_vlan action for double tagged frames [ Upstream commit c48e74736fccf25fb32bb015426359e1c2016e3b ] skb_vlan_pop() expects skb->protocol to be a valid TPID for double tagged frames. So set skb->protocol to the TPID and let skb_vlan_pop() shift the true ethertype into position for us. Fixes: 5108bbaddc37 ("openvswitch: add processing of L3 packets") Signed-off-by: Eric Garver Reviewed-by: Jiri Benc Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit bf070305213031e1300070d10334ac012116d470 Author: Moni Shoua Date: Mon Dec 4 08:59:25 2017 +0200 net/mlx5: Fix error flow in CREATE_QP command [ Upstream commit dbff26e44dc3ec4de6578733b054a0114652a764 ] In error flow, when DESTROY_QP command should be executed, the wrong mailbox was set with data, not the one that is written to hardware, Fix that. Fixes: 09a7d9eca1a6 '{net,IB}/mlx5: QP/XRCD commands via mlx5 ifc' Signed-off-by: Moni Shoua Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit 999755ec40a6206d0c9f3edfffe2b00a7a66a6e9 Author: Gal Pressman Date: Mon Dec 4 09:57:43 2017 +0200 net/mlx5e: Prevent possible races in VXLAN control flow [ Upstream commit 0c1cc8b2215f5122ca614b5adca60346018758c3 ] When calling add/remove VXLAN port, a lock must be held in order to prevent race scenarios when more than one add/remove happens at the same time. Fix by holding our state_lock (mutex) as done by all other parts of the driver. Note that the spinlock protecting the radix-tree is still needed in order to synchronize radix-tree access from softirq context. Fixes: b3f63c3d5e2c ("net/mlx5e: Add netdev support for VXLAN tunneling") Signed-off-by: Gal Pressman Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit c4d0e614c151f102967ff38bdf7c4372c146a5a6 Author: Gal Pressman Date: Sun Dec 3 13:58:50 2017 +0200 net/mlx5e: Add refcount to VXLAN structure [ Upstream commit 23f4cc2cd9ed92570647220aca60d0197d8c1fa9 ] A refcount mechanism must be implemented in order to prevent unwanted scenarios such as: - Open an IPv4 VXLAN interface - Open an IPv6 VXLAN interface (different socket) - Remove one of the interfaces With current implementation, the UDP port will be removed from our VXLAN database and turn off the offloads for the other interface, which is still active. The reference count mechanism will only allow UDP port removals once all consumers are gone. Fixes: b3f63c3d5e2c ("net/mlx5e: Add netdev support for VXLAN tunneling") Signed-off-by: Gal Pressman Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit 597181622e649b30a4c4d6bec4816d2f8987d25a Author: Gal Pressman Date: Tue Nov 21 17:49:36 2017 +0200 net/mlx5e: Fix features check of IPv6 traffic [ Upstream commit 2989ad1ec03021ee6d2193c35414f1d970a243de ] The assumption that the next header field contains the transport protocol is wrong for IPv6 packets with extension headers. Instead, we should look the inner-most next header field in the buffer. This will fix TSO offload for tunnels over IPv6 with extension headers. Performance testing: 19.25x improvement, cool! Measuring bandwidth of 16 threads TCP traffic over IPv6 GRE tap. CPU: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz NIC: Mellanox Technologies MT28800 Family [ConnectX-5 Ex] TSO: Enabled Before: 4,926.24 Mbps Now : 94,827.91 Mbps Fixes: b3f63c3d5e2c ("net/mlx5e: Add netdev support for VXLAN tunneling") Signed-off-by: Gal Pressman Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit 2dc5654e6fbc46161c56058081258bd3ad8d1ab0 Author: Gal Pressman Date: Thu Nov 23 13:52:28 2017 +0200 net/mlx5e: Fix possible deadlock of VXLAN lock [ Upstream commit 6323514116404cc651df1b7fffa1311ddf8ce647 ] mlx5e_vxlan_lookup_port is called both from mlx5e_add_vxlan_port (user context) and mlx5e_features_check (softirq), but the lock acquired does not disable bottom half and might result in deadlock. Fix it by simply replacing spin_lock() with spin_lock_bh(). While at it, replace all unnecessary spin_lock_irq() to spin_lock_bh(). lockdep's WARNING: inconsistent lock state [ 654.028136] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 654.028229] swapper/5/0 [HC0[0]:SC1[9]:HE1:SE0] takes: [ 654.028321] (&(&vxlan_db->lock)->rlock){+.?.}, at: [] mlx5e_vxlan_lookup_port+0x1e/0x50 [mlx5_core] [ 654.028528] {SOFTIRQ-ON-W} state was registered at: [ 654.028607] _raw_spin_lock+0x3c/0x70 [ 654.028689] mlx5e_vxlan_lookup_port+0x1e/0x50 [mlx5_core] [ 654.028794] mlx5e_vxlan_add_port+0x2e/0x120 [mlx5_core] [ 654.028878] process_one_work+0x1e9/0x640 [ 654.028942] worker_thread+0x4a/0x3f0 [ 654.029002] kthread+0x141/0x180 [ 654.029056] ret_from_fork+0x24/0x30 [ 654.029114] irq event stamp: 579088 [ 654.029174] hardirqs last enabled at (579088): [] ip6_finish_output2+0x49a/0x8c0 [ 654.029309] hardirqs last disabled at (579087): [] ip6_finish_output2+0x44e/0x8c0 [ 654.029446] softirqs last enabled at (579030): [] irq_enter+0x6d/0x80 [ 654.029567] softirqs last disabled at (579031): [] irq_exit+0xb5/0xc0 [ 654.029684] other info that might help us debug this: [ 654.029781] Possible unsafe locking scenario: [ 654.029868] CPU0 [ 654.029908] ---- [ 654.029947] lock(&(&vxlan_db->lock)->rlock); [ 654.030045] [ 654.030090] lock(&(&vxlan_db->lock)->rlock); [ 654.030162] *** DEADLOCK *** Fixes: b3f63c3d5e2c ("net/mlx5e: Add netdev support for VXLAN tunneling") Signed-off-by: Gal Pressman Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit 3ddcb727c71779c23097d440355c4abe27379558 Author: Eran Ben Elisha Date: Mon Nov 13 10:11:27 2017 +0200 net/mlx5: Fix rate limit packet pacing naming and struct [ Upstream commit 37e92a9d4fe38dc3e7308913575983a6a088c8d4 ] In mlx5_ifc, struct size was not complete, and thus driver was sending garbage after the last defined field. Fixed it by adding reserved field to complete the struct size. In addition, rename all set_rate_limit to set_pp_rate_limit to be compliant with the Firmware <-> Driver definition. Fixes: 7486216b3a0b ("{net,IB}/mlx5: mlx5_ifc updates") Fixes: 1466cc5b23d1 ("net/mlx5: Rate limit tables support") Signed-off-by: Eran Ben Elisha Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit f35318b289446015ce2d2cbfd240ba47bda6f187 Author: Yousuk Seung Date: Thu Dec 7 13:41:34 2017 -0800 tcp: invalidate rate samples during SACK reneging [ Upstream commit d4761754b4fb2ef8d9a1e9d121c4bec84e1fe292 ] Mark tcp_sock during a SACK reneging event and invalidate rate samples while marked. Such rate samples may overestimate bw by including packets that were SACKed before reneging. < ack 6001 win 10000 sack 7001:38001 < ack 7001 win 0 sack 8001:38001 // Reneg detected > seq 7001:8001 // RTO, SACK cleared. < ack 38001 win 10000 In above example the rate sample taken after the last ack will count 7001-38001 as delivered while the actual delivery rate likely could be much lower i.e. 7001-8001. This patch adds a new field tcp_sock.sack_reneg and marks it when we declare SACK reneging and entering TCP_CA_Loss, and unmarks it after the last rate sample was taken before moving back to TCP_CA_Open. This patch also invalidates rate samples taken while tcp_sock.is_sack_reneg is set. Fixes: b9f64820fb22 ("tcp: track data delivery rate for a TCP connection") Signed-off-by: Yousuk Seung Signed-off-by: Neal Cardwell Signed-off-by: Yuchung Cheng Acked-by: Soheil Hassas Yeganeh Acked-by: Eric Dumazet Acked-by: Priyaranjan Jha Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 265ba7a046c05f67f80c2540ec3212dafe5b3673 Author: Willem de Bruijn Date: Wed Dec 13 14:41:06 2017 -0500 sock: free skb in skb_complete_tx_timestamp on error [ Upstream commit 35b99dffc3f710cafceee6c8c6ac6a98eb2cb4bf ] skb_complete_tx_timestamp must ingest the skb it is passed. Call kfree_skb if the skb cannot be enqueued. Fixes: b245be1f4db1 ("net-timestamp: no-payload only sysctl") Fixes: 9ac25fc06375 ("net: fix socket refcounting in skb_complete_tx_timestamp()") Reported-by: Richard Cochran Signed-off-by: Willem de Bruijn Reviewed-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 003514ffb447120d1a5997827ad5226d7b5a82e1 Author: Grygorii Strashko Date: Wed Dec 20 18:45:10 2017 -0600 net: phy: micrel: ksz9031: reconfigure autoneg after phy autoneg workaround [ Upstream commit c1a8d0a3accf64a014d605e6806ce05d1c17adf1 ] Under some circumstances driver will perform PHY reset in ksz9031_read_status() to fix autoneg failure case (idle error count = 0xFF). When this happens ksz9031 will not detect link status change any more when connecting to Netgear 1G switch (link can be recovered sometimes by restarting netdevice "ifconfig down up"). Reproduced with TI am572x board equipped with ksz9031 PHY while connecting to Netgear 1G switch. Fix the issue by reconfiguring autonegotiation after PHY reset in ksz9031_read_status(). Fixes: d2fd719bcb0e ("net/phy: micrel: Add workaround for bad autoneg") Signed-off-by: Grygorii Strashko Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit dd9a2648b3e35c2369f580215d916baf7e23253a Author: Eric W. Biederman Date: Tue Dec 19 11:27:56 2017 -0600 net: Fix double free and memory corruption in get_net_ns_by_id() [ Upstream commit 21b5944350052d2583e82dd59b19a9ba94a007f0 ] (I can trivially verify that that idr_remove in cleanup_net happens after the network namespace count has dropped to zero --EWB) Function get_net_ns_by_id() does not check for net::count after it has found a peer in netns_ids idr. It may dereference a peer, after its count has already been finaly decremented. This leads to double free and memory corruption: put_net(peer) rtnl_lock() atomic_dec_and_test(&peer->count) [count=0] ... __put_net(peer) get_net_ns_by_id(net, id) spin_lock(&cleanup_list_lock) list_add(&net->cleanup_list, &cleanup_list) spin_unlock(&cleanup_list_lock) queue_work() peer = idr_find(&net->netns_ids, id) | get_net(peer) [count=1] | ... | (use after final put) v ... cleanup_net() ... spin_lock(&cleanup_list_lock) ... list_replace_init(&cleanup_list, ..) ... spin_unlock(&cleanup_list_lock) ... ... ... ... put_net(peer) ... atomic_dec_and_test(&peer->count) [count=0] ... spin_lock(&cleanup_list_lock) ... list_add(&net->cleanup_list, &cleanup_list) ... spin_unlock(&cleanup_list_lock) ... queue_work() ... rtnl_unlock() rtnl_lock() ... for_each_net(tmp) { ... id = __peernet2id(tmp, peer) ... spin_lock_irq(&tmp->nsid_lock) ... idr_remove(&tmp->netns_ids, id) ... ... ... net_drop_ns() ... net_free(peer) ... } ... | v cleanup_net() ... (Second free of peer) Also, put_net() on the right cpu may reorder with left's cpu list_replace_init(&cleanup_list, ..), and then cleanup_list will be corrupted. Since cleanup_net() is executed in worker thread, while put_net(peer) can happen everywhere, there should be enough time for concurrent get_net_ns_by_id() to pick the peer up, and the race does not seem to be unlikely. The patch fixes the problem in standard way. (Also, there is possible problem in peernet2id_alloc(), which requires check for net::count under nsid_lock and maybe_get_net(peer), but in current stable kernel it's used under rtnl_lock() and it has to be safe. Openswitch begun to use peernet2id_alloc(), and possibly it should be fixed too. While this is not in stable kernel yet, so I'll send a separate message to netdev@ later). Cc: Nicolas Dichtel Signed-off-by: Kirill Tkhai Fixes: 0c7aecd4bde4 "netns: add rtnl cmd to add and get peer netns ids" Reviewed-by: Andrey Ryabinin Reviewed-by: "Eric W. Biederman" Signed-off-by: Eric W. Biederman Reviewed-by: Eric Dumazet Acked-by: Nicolas Dichtel Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 126f42ecfcb48ec50b289124f23dafa499012650 Author: Nikolay Aleksandrov Date: Mon Dec 18 17:35:09 2017 +0200 net: bridge: fix early call to br_stp_change_bridge_id and plug newlink leaks [ Upstream commit 84aeb437ab98a2bce3d4b2111c79723aedfceb33 ] The early call to br_stp_change_bridge_id in bridge's newlink can cause a memory leak if an error occurs during the newlink because the fdb entries are not cleaned up if a different lladdr was specified, also another minor issue is that it generates fdb notifications with ifindex = 0. Another unrelated memory leak is the bridge sysfs entries which get added on NETDEV_REGISTER event, but are not cleaned up in the newlink error path. To remove this special case the call to br_stp_change_bridge_id is done after netdev register and we cleanup the bridge on changelink error via br_dev_delete to plug all leaks. This patch makes netlink bridge destruction on newlink error the same as dellink and ioctl del which is necessary since at that point we have a fully initialized bridge device. To reproduce the issue: $ ip l add br0 address 00:11:22:33:44:55 type bridge group_fwd_mask 1 RTNETLINK answers: Invalid argument $ rmmod bridge [ 1822.142525] ============================================================================= [ 1822.143640] BUG bridge_fdb_cache (Tainted: G O ): Objects remaining in bridge_fdb_cache on __kmem_cache_shutdown() [ 1822.144821] ----------------------------------------------------------------------------- [ 1822.145990] Disabling lock debugging due to kernel taint [ 1822.146732] INFO: Slab 0x0000000092a844b2 objects=32 used=2 fp=0x00000000fef011b0 flags=0x1ffff8000000100 [ 1822.147700] CPU: 2 PID: 13584 Comm: rmmod Tainted: G B O 4.15.0-rc2+ #87 [ 1822.148578] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 1822.150008] Call Trace: [ 1822.150510] dump_stack+0x78/0xa9 [ 1822.151156] slab_err+0xb1/0xd3 [ 1822.151834] ? __kmalloc+0x1bb/0x1ce [ 1822.152546] __kmem_cache_shutdown+0x151/0x28b [ 1822.153395] shutdown_cache+0x13/0x144 [ 1822.154126] kmem_cache_destroy+0x1c0/0x1fb [ 1822.154669] SyS_delete_module+0x194/0x244 [ 1822.155199] ? trace_hardirqs_on_thunk+0x1a/0x1c [ 1822.155773] entry_SYSCALL_64_fastpath+0x23/0x9a [ 1822.156343] RIP: 0033:0x7f929bd38b17 [ 1822.156859] RSP: 002b:00007ffd160e9a98 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0 [ 1822.157728] RAX: ffffffffffffffda RBX: 00005578316ba090 RCX: 00007f929bd38b17 [ 1822.158422] RDX: 00007f929bd9ec60 RSI: 0000000000000800 RDI: 00005578316ba0f0 [ 1822.159114] RBP: 0000000000000003 R08: 00007f929bff5f20 R09: 00007ffd160e8a11 [ 1822.159808] R10: 00007ffd160e9860 R11: 0000000000000202 R12: 00007ffd160e8a80 [ 1822.160513] R13: 0000000000000000 R14: 0000000000000000 R15: 00005578316ba090 [ 1822.161278] INFO: Object 0x000000007645de29 @offset=0 [ 1822.161666] INFO: Object 0x00000000d5df2ab5 @offset=128 Fixes: 30313a3d5794 ("bridge: Handle IFLA_ADDRESS correctly when creating bridge device") Fixes: 5b8d5429daa0 ("bridge: netlink: register netdevice before executing changelink") Signed-off-by: Nikolay Aleksandrov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 27ccace9b982ea796e15ba9dc6af5941539c6be8 Author: Ido Schimmel Date: Wed Dec 20 19:34:19 2017 +0200 ipv4: Fix use-after-free when flushing FIB tables [ Upstream commit b4681c2829e24943aadd1a7bb3a30d41d0a20050 ] Since commit 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse") the local table uses the same trie allocated for the main table when custom rules are not in use. When a net namespace is dismantled, the main table is flushed and freed (via an RCU callback) before the local table. In case the callback is invoked before the local table is iterated, a use-after-free can occur. Fix this by iterating over the FIB tables in reverse order, so that the main table is always freed after the local table. v3: Reworded comment according to Alex's suggestion. v2: Add a comment to make the fix more explicit per Dave's and Alex's feedback. Fixes: 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse") Signed-off-by: Ido Schimmel Reported-by: Fengguang Wu Acked-by: Alexander Duyck Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 44319591ffa28770f12f33b70e4269d8cda8cb5f Author: Alexey Kodanev Date: Wed Dec 20 19:36:03 2017 +0300 ip6_gre: fix device features for ioctl setup [ Upstream commit e5a9336adb317db55eb3fe8200856096f3c71109 ] When ip6gre is created using ioctl, its features, such as scatter-gather, GSO and tx-checksumming will be turned off: # ip -f inet6 tunnel add gre6 mode ip6gre remote fd00::1 # ethtool -k gre6 (truncated output) tx-checksumming: off scatter-gather: off tcp-segmentation-offload: off generic-segmentation-offload: off [requested on] But when netlink is used, they will be enabled: # ip link add gre6 type ip6gre remote fd00::1 # ethtool -k gre6 (truncated output) tx-checksumming: on scatter-gather: on tcp-segmentation-offload: on generic-segmentation-offload: on This results in a loss of performance when gre6 is created via ioctl. The issue was found with LTP/gre tests. Fix it by moving the setup of device features to a separate function and invoke it with ndo_init callback because both netlink and ioctl will eventually call it via register_netdevice(): register_netdevice() - ndo_init() callback -> ip6gre_tunnel_init() or ip6gre_tap_init() - ip6gre_tunnel_init_common() - ip6gre_tnl_init_features() The moved code also contains two minor style fixes: * removed needless tab from GRE6_FEATURES on NETIF_F_HIGHDMA line. * fixed the issue reported by checkpatch: "Unnecessary parentheses around 'nt->encap.type == TUNNEL_ENCAP_NONE'" Fixes: ac4eb009e477 ("ip6gre: Add support for basic offloads offloads excluding GSO") Signed-off-by: Alexey Kodanev Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 6d1c489810bcde9266ccdbedc0861a5dc9778f60 Author: Nikita V. Shirokov Date: Wed Dec 6 17:15:43 2017 -0800 adding missing rcu_read_unlock in ipxip6_rcv [ Upstream commit 74c4b656c3d92ec4c824ea1a4afd726b7b6568c8 ] commit 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels") introduced new exit point in ipxip6_rcv. however rcu_read_unlock is missing there. this diff is fixing this v1->v2: instead of doing rcu_read_unlock in place, we are going to "drop" section (to prevent skb leakage) Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels") Signed-off-by: Nikita V. Shirokov Acked-by: Alexei Starovoitov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit a3927015a4bbcb73d35cdbed24f81efeba1d7a6a Author: Tonghao Zhang Date: Fri Dec 22 10:15:20 2017 -0800 sctp: Replace use of sockets_allocated with specified macro. [ Upstream commit 8cb38a602478e9f806571f6920b0a3298aabf042 ] The patch(180d8cd942ce) replaces all uses of struct sock fields' memory_pressure, memory_allocated, sockets_allocated, and sysctl_mem to accessor macros. But the sockets_allocated field of sctp sock is not replaced at all. Then replace it now for unifying the code. Fixes: 180d8cd942ce ("foundations of per-cgroup memory pressure controlling.") Cc: Glauber Costa Signed-off-by: Tonghao Zhang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 9f49cbc7cd207f27ead81246d9553515aea68131 Author: Tobias Jordan Date: Wed Dec 6 15:23:23 2017 +0100 net: mvmdio: disable/unprepare clocks in EPROBE_DEFER case [ Upstream commit 589bf32f09852041fbd3b7ce1a9e703f95c230ba ] add appropriate calls to clk_disable_unprepare() by jumping to out_mdio in case orion_mdio_probe() returns -EPROBE_DEFER. Found by Linux Driver Verification project (linuxtesting.org). Fixes: 3d604da1e954 ("net: mvmdio: get and enable optional clock") Signed-off-by: Tobias Jordan Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 3bc400bad0e003d40a0a2412411aed7cbae16f96 Author: Mohamed Ghannam Date: Sun Dec 10 03:50:58 2017 +0000 net: ipv4: fix for a race condition in raw_sendmsg [ Upstream commit 8f659a03a0ba9289b9aeb9b4470e6fb263d6f483 ] inet->hdrincl is racy, and could lead to uninitialized stack pointer usage, so its value should be read only once. Fixes: c008ba5bdc9f ("ipv4: Avoid reading user iov twice after raw_probe_proto_opt") Signed-off-by: Mohamed Ghannam Reviewed-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 72b44d0434c1a4688b55aaa59150d74344721306 Author: Julian Wiedmann Date: Wed Dec 13 18:56:32 2017 +0100 s390/qeth: update takeover IPs after configuration change [ Upstream commit 02f510f326501470348a5df341e8232c3497bbbb ] Any modification to the takeover IP-ranges requires that we re-evaluate which IP addresses are takeover-eligible. Otherwise we might do takeover for some addresses when we no longer should, or vice-versa. Signed-off-by: Julian Wiedmann Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 8658408f284efd553a882ee717326845130c51d9 Author: Julian Wiedmann Date: Wed Dec 13 18:56:31 2017 +0100 s390/qeth: lock IP table while applying takeover changes [ Upstream commit 8a03a3692b100d84785ee7a834e9215e304c9e00 ] Modifying the flags of an IP addr object needs to be protected against eg. concurrent removal of the same object from the IP table. Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback") Signed-off-by: Julian Wiedmann Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit e34a43e57c218bf28f8ec73a1ce68fec41c4b20c Author: Julian Wiedmann Date: Wed Dec 13 18:56:30 2017 +0100 s390/qeth: don't apply takeover changes to RXIP [ Upstream commit b22d73d6689fd902a66c08ebe71ab2f3b351e22f ] When takeover is switched off, current code clears the 'TAKEOVER' flag on all IPs. But the flag is also used for RXIP addresses, and those should not be affected by the takeover mode. Fix the behaviour by consistenly applying takover logic to NORMAL addresses only. Signed-off-by: Julian Wiedmann Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 621b5ae0f9f4f9ef91bf441afc086ecf5e752d51 Author: Julian Wiedmann Date: Wed Dec 13 18:56:29 2017 +0100 s390/qeth: apply takeover changes when mode is toggled [ Upstream commit 7fbd9493f0eeae8cef58300505a9ef5c8fce6313 ] Just as for an explicit enable/disable, toggling the takeover mode also requires that the IP addresses get updated. Otherwise all IPs that were added to the table before the mode-toggle, get registered with the old settings. Signed-off-by: Julian Wiedmann Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit c7e9d724785dc9b6a423bddc6b4fa6fb1368692f Author: Neal Cardwell Date: Thu Dec 7 12:43:32 2017 -0500 tcp_bbr: reset long-term bandwidth sampling on loss recovery undo [ Upstream commit 600647d467c6d04b3954b41a6ee1795b5ae00550 ] Fix BBR so that upon notification of a loss recovery undo BBR resets long-term bandwidth sampling. Under high reordering, reordering events can be interpreted as loss. If the reordering and spurious loss estimates are high enough, this can cause BBR to spuriously estimate that we are seeing loss rates high enough to trigger long-term bandwidth estimation. To avoid that problem, this commit resets long-term bandwidth sampling on loss recovery undo events. Signed-off-by: Neal Cardwell Reviewed-by: Yuchung Cheng Acked-by: Soheil Hassas Yeganeh Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit eb710b5f62ad3a18742bae70f91e8664ee23cbe3 Author: Neal Cardwell Date: Thu Dec 7 12:43:31 2017 -0500 tcp_bbr: reset full pipe detection on loss recovery undo [ Upstream commit 2f6c498e4f15d27852c04ed46d804a39137ba364 ] Fix BBR so that upon notification of a loss recovery undo BBR resets the full pipe detection (STARTUP exit) state machine. Under high reordering, reordering events can be interpreted as loss. If the reordering and spurious loss estimates are high enough, this could previously cause BBR to spuriously estimate that the pipe is full. Since spurious loss recovery means that our overall sending will have slowed down spuriously, this commit gives a flow more time to probe robustly for bandwidth and decide the pipe is really full. Signed-off-by: Neal Cardwell Reviewed-by: Yuchung Cheng Acked-by: Soheil Hassas Yeganeh Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d3f3d4134eb700ad087691effbdf26fe8f3bfd90 Author: Brian King Date: Fri Dec 15 15:21:50 2017 -0600 tg3: Fix rx hang on MTU change with 5717/5719 [ Upstream commit 748a240c589824e9121befb1cba5341c319885bc ] This fixes a hang issue seen when changing the MTU size from 1500 MTU to 9000 MTU on both 5717 and 5719 chips. In discussion with Broadcom, they've indicated that these chipsets have the same phy as the 57766 chipset, so the same workarounds apply. This has been tested by IBM on both Power 8 and Power 9 systems as well as by Broadcom on x86 hardware and has been confirmed to resolve the hang issue. Signed-off-by: Brian King Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 4f2963559f29f34a8dbcc35a331f152fa3492224 Author: Christoph Paasch Date: Mon Dec 11 00:05:46 2017 -0800 tcp md5sig: Use skb's saddr when replying to an incoming segment [ Upstream commit 30791ac41927ebd3e75486f9504b6d2280463bf0 ] The MD5-key that belongs to a connection is identified by the peer's IP-address. When we are in tcp_v4(6)_reqsk_send_ack(), we are replying to an incoming segment from tcp_check_req() that failed the seq-number checks. Thus, to find the correct key, we need to use the skb's saddr and not the daddr. This bug seems to have been there since quite a while, but probably got unnoticed because the consequences are not catastrophic. We will call tcp_v4_reqsk_send_ack only to send a challenge-ACK back to the peer, thus the connection doesn't really fail. Fixes: 9501f9722922 ("tcp md5sig: Let the caller pass appropriate key for tcp_v{4,6}_do_calc_md5_hash().") Signed-off-by: Christoph Paasch Reviewed-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit e414e7f03c29a4d3f07e042e18bdfe84269897df Author: Neal Cardwell Date: Thu Dec 7 12:43:30 2017 -0500 tcp_bbr: record "full bw reached" decision in new full_bw_reached bit [ Upstream commit c589e69b508d29ed8e644dfecda453f71c02ec27 ] This commit records the "full bw reached" decision in a new full_bw_reached bit. This is a pure refactor that does not change the current behavior, but enables subsequent fixes and improvements. In particular, this enables simple and clean fixes because the full_bw and full_bw_cnt can be unconditionally zeroed without worrying about forgetting that we estimated we filled the pipe in Startup. And it enables future improvements because multiple code paths can be used for estimating that we filled the pipe in Startup; any new code paths only need to set this bit when they think the pipe is full. Note that this fix intentionally reduces the width of the full_bw_cnt counter, since we have never used the most significant bit. Signed-off-by: Neal Cardwell Reviewed-by: Yuchung Cheng Acked-by: Soheil Hassas Yeganeh Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit e7728247372ca9f7faccca4a37e96964f796c94b Author: Avinash Repaka Date: Thu Dec 21 20:17:04 2017 -0800 RDS: Check cmsg_len before dereferencing CMSG_DATA [ Upstream commit 14e138a86f6347c6199f610576d2e11c03bec5f0 ] RDS currently doesn't check if the length of the control message is large enough to hold the required data, before dereferencing the control message data. This results in following crash: BUG: KASAN: stack-out-of-bounds in rds_rdma_bytes net/rds/send.c:1013 [inline] BUG: KASAN: stack-out-of-bounds in rds_sendmsg+0x1f02/0x1f90 net/rds/send.c:1066 Read of size 8 at addr ffff8801c928fb70 by task syzkaller455006/3157 CPU: 0 PID: 3157 Comm: syzkaller455006 Not tainted 4.15.0-rc3+ #161 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 print_address_description+0x73/0x250 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x25b/0x340 mm/kasan/report.c:409 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430 rds_rdma_bytes net/rds/send.c:1013 [inline] rds_sendmsg+0x1f02/0x1f90 net/rds/send.c:1066 sock_sendmsg_nosec net/socket.c:628 [inline] sock_sendmsg+0xca/0x110 net/socket.c:638 ___sys_sendmsg+0x320/0x8b0 net/socket.c:2018 __sys_sendmmsg+0x1ee/0x620 net/socket.c:2108 SYSC_sendmmsg net/socket.c:2139 [inline] SyS_sendmmsg+0x35/0x60 net/socket.c:2134 entry_SYSCALL_64_fastpath+0x1f/0x96 RIP: 0033:0x43fe49 RSP: 002b:00007fffbe244ad8 EFLAGS: 00000217 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fe49 RDX: 0000000000000001 RSI: 000000002020c000 RDI: 0000000000000003 RBP: 00000000006ca018 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000217 R12: 00000000004017b0 R13: 0000000000401840 R14: 0000000000000000 R15: 0000000000000000 To fix this, we verify that the cmsg_len is large enough to hold the data to be read, before proceeding further. Reported-by: syzbot Signed-off-by: Avinash Repaka Acked-by: Santosh Shilimkar Reviewed-by: Yuval Shaia Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 78ce0e9c4183a34b0910ec36c4ad8df1f7bf7874 Author: Michael S. Tsirkin Date: Tue Dec 5 21:29:37 2017 +0200 ptr_ring: add barriers [ Upstream commit a8ceb5dbfde1092b466936bca0ff3be127ecf38e ] Users of ptr_ring expect that it's safe to give the data structure a pointer and have it be available to consumers, but that actually requires an smb_wmb or a stronger barrier. In absence of such barriers and on architectures that reorder writes, consumer might read an un=initialized value from an skb pointer stored in the skb array. This was observed causing crashes. To fix, add memory barriers. The barrier we use is a wmb, the assumption being that producers do not need to read the value so we do not need to order these reads. Reported-by: George Cherian Suggested-by: Jason Wang Signed-off-by: Michael S. Tsirkin Acked-by: Jason Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 6d0317869c9153d7e6efa769a764b54019d4a3d3 Author: Shaohua Li Date: Wed Dec 20 12:10:21 2017 -0800 net: reevalulate autoflowlabel setting after sysctl setting [ Upstream commit 513674b5a2c9c7a67501506419da5c3c77ac6f08 ] sysctl.ip6.auto_flowlabels is default 1. In our hosts, we set it to 2. If sockopt doesn't set autoflowlabel, outcome packets from the hosts are supposed to not include flowlabel. This is true for normal packet, but not for reset packet. The reason is ipv6_pinfo.autoflowlabel is set in sock creation. Later if we change sysctl.ip6.auto_flowlabels, the ipv6_pinfo.autoflowlabel isn't changed, so the sock will keep the old behavior in terms of auto flowlabel. Reset packet is suffering from this problem, because reset packet is sent from a special control socket, which is created at boot time. Since sysctl.ipv6.auto_flowlabels is 1 by default, the control socket will always have its ipv6_pinfo.autoflowlabel set, even after user set sysctl.ipv6.auto_flowlabels to 1, so reset packset will always have flowlabel. Normal sock created before sysctl setting suffers from the same issue. We can't even turn off autoflowlabel unless we kill all socks in the hosts. To fix this, if IPV6_AUTOFLOWLABEL sockopt is used, we use the autoflowlabel setting from user, otherwise we always call ip6_default_np_autolabel() which has the new settings of sysctl. Note, this changes behavior a little bit. Before commit 42240901f7c4 (ipv6: Implement different admin modes for automatic flow labels), the autoflowlabel behavior of a sock isn't sticky, eg, if sysctl changes, existing connection will change autoflowlabel behavior. After that commit, autoflowlabel behavior is sticky in the whole life of the sock. With this patch, the behavior isn't sticky again. Cc: Martin KaFai Lau Cc: Eric Dumazet Cc: Tom Herbert Signed-off-by: Shaohua Li Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 1bad9c5ea85e7522d403559cc249dd7a0ae705b1 Author: Sebastian Sjoholm Date: Mon Dec 11 21:51:14 2017 +0100 net: qmi_wwan: add Sierra EM7565 1199:9091 [ Upstream commit aceef61ee56898cfa7b6960fb60b9326c3860441 ] Sierra Wireless EM7565 is an Qualcomm MDM9x50 based M.2 modem. The USB id is added to qmi_wwan.c to allow QMI communication with the EM7565. Signed-off-by: Sebastian Sjoholm Acked-by: Bjørn Mork Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit e3fb538e5715250d6a61a26925215229f2e9f52f Author: Kevin Cernekee Date: Wed Dec 6 12:12:27 2017 -0800 netlink: Add netns check on taps [ Upstream commit 93c647643b48f0131f02e45da3bd367d80443291 ] Currently, a nlmon link inside a child namespace can observe systemwide netlink activity. Filter the traffic so that nlmon can only sniff netlink messages from its own netns. Test case: vpnns -- bash -c "ip link add nlmon0 type nlmon; \ ip link set nlmon0 up; \ tcpdump -i nlmon0 -q -w /tmp/nlmon.pcap -U" & sudo ip xfrm state add src 10.1.1.1 dst 10.1.1.2 proto esp \ spi 0x1 mode transport \ auth sha1 0x6162633132330000000000000000000000000000 \ enc aes 0x00000000000000000000000000000000 grep --binary abc123 /tmp/nlmon.pcap Signed-off-by: Kevin Cernekee Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f9c48469278060450e99c3995c5b0717c48d50b8 Author: Kevin Cernekee Date: Mon Dec 11 11:13:45 2017 -0800 net: igmp: Use correct source address on IGMPv3 reports [ Upstream commit a46182b00290839fa3fa159d54fd3237bd8669f0 ] Closing a multicast socket after the final IPv4 address is deleted from an interface can generate a membership report that uses the source IP from a different interface. The following test script, run from an isolated netns, reproduces the issue: #!/bin/bash ip link add dummy0 type dummy ip link add dummy1 type dummy ip link set dummy0 up ip link set dummy1 up ip addr add 10.1.1.1/24 dev dummy0 ip addr add 192.168.99.99/24 dev dummy1 tcpdump -U -i dummy0 & socat EXEC:"sleep 2" \ UDP4-DATAGRAM:239.101.1.68:8889,ip-add-membership=239.0.1.68:10.1.1.1 & sleep 1 ip addr del 10.1.1.1/24 dev dummy0 sleep 5 kill %tcpdump RFC 3376 specifies that the report must be sent with a valid IP source address from the destination subnet, or from address 0.0.0.0. Add an extra check to make sure this is the case. Signed-off-by: Kevin Cernekee Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f55ac6684640a285465cf31a8d5e23bcc3da872d Author: Fugang Duan Date: Fri Dec 22 17:12:09 2017 +0800 net: fec: unmap the xmit buffer that are not transferred by DMA [ Upstream commit 178e5f57a8d8f8fc5799a624b96fc31ef9a29ffa ] The enet IP only support 32 bit, it will use swiotlb buffer to do dma mapping when xmit buffer DMA memory address is bigger than 4G in i.MX platform. After stress suspend/resume test, it will print out: log: [12826.352864] fec 5b040000.ethernet: swiotlb buffer is full (sz: 191 bytes) [12826.359676] DMA: Out of SW-IOMMU space for 191 bytes at device 5b040000.ethernet [12826.367110] fec 5b040000.ethernet eth0: Tx DMA memory map failed The issue is that the ready xmit buffers that are dma mapped but DMA still don't copy them into fifo, once MAC restart, these DMA buffers are not unmapped. So it should check the dma mapping buffer and unmap them. Signed-off-by: Fugang Duan Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 521f4d9625b7b2359dbec6039724b4feb9e8490d Author: Eric Dumazet Date: Mon Dec 11 07:03:38 2017 -0800 ipv6: mcast: better catch silly mtu values [ Upstream commit b9b312a7a451e9c098921856e7cfbc201120e1a7 ] syzkaller reported crashes in IPv6 stack [1] Xin Long found that lo MTU was set to silly values. IPv6 stack reacts to changes to small MTU, by disabling itself under RTNL. But there is a window where threads not using RTNL can see a wrong device mtu. This can lead to surprises, in mld code where it is assumed the mtu is suitable. Fix this by reading device mtu once and checking IPv6 minimal MTU. [1] skbuff: skb_over_panic: text:0000000010b86b8d len:196 put:20 head:000000003b477e60 data:000000000e85441e tail:0xd4 end:0xc0 dev:lo ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:104! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.15.0-rc2-mm1+ #39 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:skb_panic+0x15c/0x1f0 net/core/skbuff.c:100 RSP: 0018:ffff8801db307508 EFLAGS: 00010286 RAX: 0000000000000082 RBX: ffff8801c517e840 RCX: 0000000000000000 RDX: 0000000000000082 RSI: 1ffff1003b660e61 RDI: ffffed003b660e95 RBP: ffff8801db307570 R08: 1ffff1003b660e23 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff85bd4020 R13: ffffffff84754ed2 R14: 0000000000000014 R15: ffff8801c4e26540 FS: 0000000000000000(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000463610 CR3: 00000001c6698000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: skb_over_panic net/core/skbuff.c:109 [inline] skb_put+0x181/0x1c0 net/core/skbuff.c:1694 add_grhead.isra.24+0x42/0x3b0 net/ipv6/mcast.c:1695 add_grec+0xa55/0x1060 net/ipv6/mcast.c:1817 mld_send_cr net/ipv6/mcast.c:1903 [inline] mld_ifc_timer_expire+0x4d2/0x770 net/ipv6/mcast.c:2448 call_timer_fn+0x23b/0x840 kernel/time/timer.c:1320 expire_timers kernel/time/timer.c:1357 [inline] __run_timers+0x7e1/0xb60 kernel/time/timer.c:1660 run_timer_softirq+0x4c/0xb0 kernel/time/timer.c:1686 __do_softirq+0x29d/0xbb2 kernel/softirq.c:285 invoke_softirq kernel/softirq.c:365 [inline] irq_exit+0x1d3/0x210 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:540 [inline] smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052 apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:920 Signed-off-by: Eric Dumazet Reported-by: syzbot Tested-by: Xin Long Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 57dfc3d10e40053e70691260a791bb453f183cfd Author: Eric Dumazet Date: Mon Dec 11 07:17:39 2017 -0800 ipv4: igmp: guard against silly MTU values [ Upstream commit b5476022bbada3764609368f03329ca287528dc8 ] IPv4 stack reacts to changes to small MTU, by disabling itself under RTNL. But there is a window where threads not using RTNL can see a wrong device mtu. This can lead to surprises, in igmp code where it is assumed the mtu is suitable. Fix this by reading device mtu once and checking IPv4 minimal MTU. This patch adds missing IPV4_MIN_MTU define, to not abuse ETH_MIN_MTU anymore. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit aa7f9011bc01b22658d4c255e2feecf0193e092d Author: Linus Torvalds Date: Fri Dec 29 17:34:43 2017 -0800 kbuild: add '-fno-stack-check' to kernel build options commit 3ce120b16cc548472f80cf8644f90eda958cf1b6 upstream. It appears that hardened gentoo enables "-fstack-check" by default for gcc. That doesn't work _at_all_ for the kernel, because the kernel stack doesn't act like a user stack at all: it's much smaller, and it doesn't auto-expand on use. So the extra "probe one page below the stack" code generated by -fstack-check just breaks the kernel in horrible ways, causing infinite double faults etc. [ I have to say, that the particular code gcc generates looks very stupid even for user space where it works, but that's a separate issue. ] Reported-and-tested-by: Alexander Tsoy Reported-and-tested-by: Toralf Förster Cc: Dave Hansen Cc: Jiri Kosina Cc: Andy Lutomirski Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit eaedee932c91120868d42df39a1fc17fa6a313ef Author: Ming Lei Date: Mon Dec 18 15:40:43 2017 +0800 block: don't let passthrough IO go into .make_request_fn() commit 14cb0dc6479dc5ebc63b3a459a5d89a2f1b39fed upstream. Commit a8821f3f3("block: Improvements to bounce-buffer handling") tries to make sure that the bio to .make_request_fn won't exceed BIO_MAX_PAGES, but ignores that passthrough I/O can use blk_queue_bounce() too. Especially, passthrough IO may not be sector-aligned, and the check of 'sectors < bio_sectors(*bio_orig)' inside __blk_queue_bounce() may become true even though the max bvec number doesn't exceed BIO_MAX_PAGES, then cause the bio splitted, and the original passthrough bio is submited to generic_make_request(). This patch fixes this issue by checking if the bio is passthrough IO, and use bio_kmalloc() to allocate the cloned passthrough bio. Cc: NeilBrown Fixes: a8821f3f3("block: Improvements to bounce-buffer handling") Tested-by: Michele Ballabio Signed-off-by: Ming Lei Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit 88da02868f7741cc7c1d895840e236e6df5483ef Author: Jens Axboe Date: Mon Dec 18 15:40:44 2017 +0800 block: fix blk_rq_append_bio commit 0abc2a10389f0c9070f76ca906c7382788036b93 upstream. Commit caa4b02476e3(blk-map: call blk_queue_bounce from blk_rq_append_bio) moves blk_queue_bounce() into blk_rq_append_bio(), but don't consider the fact that the bounced bio becomes invisible to caller since the parameter type is 'struct bio *'. Make it a pointer to a pointer to a bio, so the caller sees the right bio also after a bounce. Fixes: caa4b02476e3 ("blk-map: call blk_queue_bounce from blk_rq_append_bio") Cc: Christoph Hellwig Reported-by: Michele Ballabio (handling failure of blk_rq_append_bio(), only call bio_get() after blk_rq_append_bio() returns OK) Tested-by: Michele Ballabio Signed-off-by: Ming Lei Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit 0c688c288f8e82fdb5f3e0d0f3b2b1f3498ad187 Author: Joel Fernandes Date: Thu Dec 21 02:22:45 2017 +0100 cpufreq: schedutil: Use idle_calls counter of the remote CPU commit 466a2b42d67644447a1765276259a3ea5531ddff upstream. Since the recent remote cpufreq callback work, its possible that a cpufreq update is triggered from a remote CPU. For single policies however, the current code uses the local CPU when trying to determine if the remote sg_cpu entered idle or is busy. This is incorrect. To remedy this, compare with the nohz tick idle_calls counter of the remote CPU. Fixes: 674e75411fc2 (sched: cpufreq: Allow remote cpufreq callbacks) Acked-by: Viresh Kumar Acked-by: Peter Zijlstra (Intel) Signed-off-by: Joel Fernandes Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman commit 9c5ee053a67eb374878b20eab7d16feb2a87a851 Author: Takashi Iwai Date: Wed Dec 27 08:53:59 2017 +0100 ALSA: hda - Fix missing COEF init for ALC225/295/299 commit 44be77c590f381bc629815ac789b8b15ecc4ddcf upstream. There was a long-standing problem on HP Spectre X360 with Kabylake where it lacks of the front speaker output in some situations. Also there are other products showing the similar behavior. The culprit seems to be the missing COEF setup on ALC codecs, ALC225/295/299, which are all compatible. This patch adds the proper COEF setup (to initialize idx 0x67 / bits 0x3000) for addressing the issue. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195457 Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 9fcd2ae2abb5efb67f331f567ed22a7305461a80 Author: Hui Wang Date: Fri Dec 22 11:17:45 2017 +0800 ALSA: hda - fix headset mic detection issue on a Dell machine commit 285d5ddcffafa5d5e68c586f4c9eaa8b24a2897d upstream. It has the codec alc256, and add its pin definition to pin quirk table to let it apply ALC255_FIXUP_DELL1_MIC_NO_PRESENCE. Signed-off-by: Hui Wang Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 3d858b85e376b08db9c9de42aa4e0b9b47fac466 Author: Hui Wang Date: Fri Dec 22 11:17:46 2017 +0800 ALSA: hda - change the location for one mic on a Lenovo machine commit 8da5bbfc7cbba909f4f32d5e1dda3750baa5d853 upstream. There are two front mics on this machine, and current driver assign the same name Mic to both of them, but pulseaudio can't handle them. As a workaround, we change the location for one of them, then the driver will assign "Front Mic" and "Mic" for them. Signed-off-by: Hui Wang Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 2845bbd1ef1f574ce2bbd7a7f6feab04ca137cf3 Author: Hui Wang Date: Fri Dec 22 11:17:44 2017 +0800 ALSA: hda - Add MIC_NO_PRESENCE fixup for 2 HP machines commit 322f74ede933b3e2cb78768b6a6fdbfbf478a0c1 upstream. There is a headset jack on the front panel, when we plug a headset into it, the headset mic can't trigger unsol events, and read_pin_sense() can't detect its presence too. So add this fixup to fix this issue. Signed-off-by: Hui Wang Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 056305595a9940005ff34d0a67e2a2b231d162b7 Author: Takashi Iwai Date: Fri Dec 22 10:45:07 2017 +0100 ALSA: hda: Drop useless WARN_ON() commit a36c2638380c0a4676647a1f553b70b20d3ebce1 upstream. Since the commit 97cc2ed27e5a ("ALSA: hda - Fix yet another i915 pointer leftover in error path") cleared hdac_acomp pointer, the WARN_ON() non-NULL check in snd_hdac_i915_register_notifier() may give a false-positive warning, as the function gets called no matter whether the component is registered or not. For fixing it, let's get rid of the spurious WARN_ON(). Fixes: 97cc2ed27e5a ("ALSA: hda - Fix yet another i915 pointer leftover in error path") Reported-by: Kouta Okamoto Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit af0dc162f644335c50e2384ba485e4aa79b25e28 Author: Moni Shoua Date: Sun Dec 24 13:54:58 2017 +0200 IB/core: Verify that QP is security enabled in create and destroy commit 4a50881bbac309e6f0684816a180bc3c14e1485d upstream. The XRC target QP create flow sets up qp_sec only if there is an IB link with LSM security enabled. However, several other related uAPI entry points blindly follow the qp_sec NULL pointer, resulting in a possible oops. Check for NULL before using qp_sec. Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs") Reviewed-by: Daniel Jurgens Signed-off-by: Moni Shoua Signed-off-by: Leon Romanovsky Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman commit 5f3b36984c7b15e9d6ace5fc836b3474a8bb4213 Author: Moni Shoua Date: Sun Dec 24 13:54:57 2017 +0200 IB/uverbs: Fix command checking as part of ib_uverbs_ex_modify_qp() commit 05d14e7b0c138cb07ba30e464f47b39434f3fdef upstream. If the input command length is larger than the kernel supports an error should be returned in case the unsupported bytes are not cleared, instead of the other way aroudn. This matches what all other callers of ib_is_udata_cleared do and will avoid user ABI problems in the future. Fixes: 189aba99e700 ("IB/uverbs: Extend modify_qp and support packet pacing") Reviewed-by: Yishai Hadas Signed-off-by: Moni Shoua Signed-off-by: Leon Romanovsky Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman commit d471542b9f0706b9d37dba40bc02ef27b05a6bc1 Author: Majd Dibbiny Date: Sun Dec 24 13:54:56 2017 +0200 IB/mlx5: Serialize access to the VMA list commit ad9a3668a434faca1339789ed2f043d679199309 upstream. User-space applications can do mmap and munmap directly at any time. Since the VMA list is not protected with a mutex, concurrent accesses to the VMA list from the mmap and munmap can cause data corruption. Add a mutex around the list. Fixes: 7c2344c3bbf9 ("IB/mlx5: Implements disassociate_ucontext API") Reviewed-by: Yishai Hadas Signed-off-by: Majd Dibbiny Signed-off-by: Leon Romanovsky Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman commit 907145e68e21947ec76d99827ab4eeeecff4a34c Author: Michael J. Ruhl Date: Fri Dec 22 08:47:20 2017 -0800 IB/hfi: Only read capability registers if the capability exists commit 4c009af473b2026caaa26107e34d7cc68dad7756 upstream. During driver init, various registers are saved to allow restoration after an FLR or gen3 bump. Some of these registers are not available in some circumstances (i.e. Virtual machines). This bug makes the driver unusable when the PCI device is passed into a VM, it fails during probe. Delete unnecessary register read/write, and only access register if the capability exists. Fixes: a618b7e40af2 ("IB/hfi1: Move saving PCI values to a separate function") Reviewed-by: Mike Marciniszyn Signed-off-by: Michael J. Ruhl Signed-off-by: Dennis Dalessandro Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman commit 074e2892a4205b0c88742d9d000c0c19e0b98244 Author: Christophe Leroy Date: Fri Dec 15 15:02:33 2017 +0100 gpio: fix "gpio-line-names" property retrieval commit 822703354774ec935169cbbc8d503236bcb54fda upstream. Following commit 9427ecbed46cc ("gpio: Rework of_gpiochip_set_names() to use device property accessors"), "gpio-line-names" DT property is not retrieved anymore when chip->parent is not set by the driver. This is due to OF based property reads having been replaced by device based property reads. This patch fixes that by making use of fwnode_property_read_string_array() instead of device_property_read_string_array() and handing over either of_fwnode_handle(chip->of_node) or dev_fwnode(chip->parent) to that function. Fixes: 9427ecbed46cc ("gpio: Rework of_gpiochip_set_names() to use device property accessors") Signed-off-by: Christophe Leroy Acked-by: Mika Westerberg Signed-off-by: Linus Walleij Signed-off-by: Greg Kroah-Hartman commit 077cb91c9fc35d638e014374491ac6a9f41b19ba Author: Andrew F. Davis Date: Wed Nov 29 15:32:46 2017 -0600 ASoC: tlv320aic31xx: Fix GPIO1 register definition commit 737e0b7b67bdfe24090fab2852044bb283282fc5 upstream. GPIO1 control register is number 51, fix this here. Fixes: bafcbfe429eb ("ASoC: tlv320aic31xx: Make the register values human readable") Signed-off-by: Andrew F. Davis Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman commit 314d9cdf7e0fb83781e8fae836044856a449ffe0 Author: Johan Hovold Date: Mon Nov 13 12:12:56 2017 +0100 ASoC: twl4030: fix child-node lookup commit 15f8c5f2415bfac73f33a14bcd83422bcbfb5298 upstream. Fix child-node lookup during probe, which ended up searching the whole device tree depth-first starting at the parent rather than just matching on its children. To make things worse, the parent codec node was also prematurely freed, while the child node was leaked. Fixes: 2d6d649a2e0f ("ASoC: twl4030: Support for DT booted kernel") Signed-off-by: Johan Hovold Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman commit fe9f7bd45c01b4bc328d2442136c5456608f1637 Author: Maciej S. Szmigiero Date: Mon Nov 20 23:14:55 2017 +0100 ASoC: fsl_ssi: AC'97 ops need regmap, clock and cleaning up on failure commit 695b78b548d8a26288f041e907ff17758df9e1d5 upstream. AC'97 ops (register read / write) need SSI regmap and clock, so they have to be set after them. We also need to set these ops back to NULL if we fail the probe. Signed-off-by: Maciej S. Szmigiero Acked-by: Nicolin Chen Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman commit c7d231ca5e0bf38f6920e92c97feadede0978d1a Author: Johan Hovold Date: Mon Nov 13 12:12:55 2017 +0100 ASoC: da7218: fix fix child-node lookup commit bc6476d6c1edcb9b97621b5131bd169aa81f27db upstream. Fix child-node lookup during probe, which ended up searching the whole device tree depth-first starting at the parent rather than just matching on its children. To make things worse, the parent codec node was also prematurely freed. Fixes: 4d50934abd22 ("ASoC: da7218: Add da7218 codec driver") Signed-off-by: Johan Hovold Acked-by: Adam Thomson Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman commit 308ddf2afe83994719006200204dc819fcbd9e7a Author: Ben Hutchings Date: Fri Dec 8 16:15:20 2017 +0000 ASoC: wm_adsp: Fix validation of firmware and coeff lengths commit 50dd2ea8ef67a1617e0c0658bcbec4b9fb03b936 upstream. The checks for whether another region/block header could be present are subtracting the size from the current offset. Obviously we should instead subtract the offset from the size. The checks for whether the region/block data fit in the file are adding the data size to the current offset and header size, without checking for integer overflow. Rearrange these so that overflow is impossible. Signed-off-by: Ben Hutchings Acked-by: Charles Keepax Tested-by: Charles Keepax Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman commit 23ef17a49f1edfa2c4b437c03b1c0d4da1d116d2 Author: Srinivas Kandagatla Date: Thu Nov 30 10:15:02 2017 +0000 ASoC: codecs: msm8916-wcd: Fix supported formats commit 51f493ae71adc2c49a317a13c38e54e1cdf46005 upstream. This codec is configurable for only 16 bit and 32 bit samples, so reflect this in the supported formats also remove 24bit sample from supported list. Signed-off-by: Srinivas Kandagatla Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman commit 2aec84963e5edf3c463c37da0a94fb4c12c2d2c2 Author: Steve Wise Date: Mon Dec 18 13:10:00 2017 -0800 iw_cxgb4: Only validate the MSN for successful completions commit f55688c45442bc863f40ad678c638785b26cdce6 upstream. If the RECV CQE is in error, ignore the MSN check. This was causing recvs that were flushed into the sw cq to be completed with the wrong status (BAD_MSN instead of FLUSHED). Signed-off-by: Steve Wise Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman commit 0aea6fb0e777bb89437c8883c289e05e1f153ec8 Author: Steven Rostedt (VMware) Date: Fri Dec 22 21:19:29 2017 -0500 ring-buffer: Do no reuse reader page if still in use commit ae415fa4c5248a8cf4faabd5a3c20576cb1ad607 upstream. To free the reader page that is allocated with ring_buffer_alloc_read_page(), ring_buffer_free_read_page() must be called. For faster performance, this page can be reused by the ring buffer to avoid having to free and allocate new pages. The issue arises when the page is used with a splice pipe into the networking code. The networking code may up the page counter for the page, and keep it active while sending it is queued to go to the network. The incrementing of the page ref does not prevent it from being reused in the ring buffer, and this can cause the page that is being sent out to the network to be modified before it is sent by reading new data. Add a check to the page ref counter, and only reuse the page if it is not being used anywhere else. Fixes: 73a757e63114d ("ring-buffer: Return reader page back into existing ring buffer") Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman commit 66f833dbed02d39c44440b6b35ac088655c32edb Author: Steven Rostedt (VMware) Date: Fri Dec 22 20:32:35 2017 -0500 ring-buffer: Mask out the info bits when returning buffer page length commit 45d8b80c2ac5d21cd1e2954431fb676bc2b1e099 upstream. Two info bits were added to the "commit" part of the ring buffer data page when returned to be consumed. This was to inform the user space readers that events have been missed, and that the count may be stored at the end of the page. What wasn't handled, was the splice code that actually called a function to return the length of the data in order to zero out the rest of the page before sending it up to user space. These data bits were returned with the length making the value negative, and that negative value was not checked. It was compared to PAGE_SIZE, and only used if the size was less than PAGE_SIZE. Luckily PAGE_SIZE is unsigned long which made the compare an unsigned compare, meaning the negative size value did not end up causing a large portion of memory to be randomly zeroed out. Fixes: 66a8cb95ed040 ("ring-buffer: Add place holder recording of dropped events") Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman commit e08acdb9620bcd4c55128cad07e625cd1825e533 Author: Thomas Gleixner Date: Fri Dec 15 20:35:11 2017 +0100 x86/ldt: Make the LDT mapping RO commit 9f5cb6b32d9e0a3a7453222baaf15664d92adbf2 upstream. Now that the LDT mapping is in a known area when PAGE_TABLE_ISOLATION is enabled its a primary target for attacks, if a user space interface fails to validate a write address correctly. That can never happen, right? The SDM states: If the segment descriptors in the GDT or an LDT are placed in ROM, the processor can enter an indefinite loop if software or the processor attempts to update (write to) the ROM-based segment descriptors. To prevent this problem, set the accessed bits for all segment descriptors placed in a ROM. Also, remove operating-system or executive code that attempts to modify segment descriptors located in ROM. So its a valid approach to set the ACCESS bit when setting up the LDT entry and to map the table RO. Fixup the selftest so it can handle that new mode. Remove the manual ACCESS bit setter in set_tls_desc() as this is now pointless. Folded the patch from Peter Ziljstra. Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 704cfa04dde3a35cde59a67cafbcaeacbb3a76c8 Author: Thomas Gleixner Date: Mon Dec 4 15:08:06 2017 +0100 x86/mm/dump_pagetables: Allow dumping current pagetables commit a4b51ef6552c704764684cef7e753162dc87c5fa upstream. Add two debugfs files which allow to dump the pagetable of the current task. current_kernel dumps the regular page table. This is the page table which is normally shared between kernel and user space. If kernel page table isolation is enabled this is the kernel space mapping. If kernel page table isolation is enabled the second file, current_user, dumps the user space page table. These files allow to verify the resulting page tables for page table isolation, but even in the normal case its useful to be able to inspect user space page tables of current for debugging purposes. Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 27e16c33bb796cceb92e339998d30145076b43cd Author: Thomas Gleixner Date: Mon Dec 4 15:08:05 2017 +0100 x86/mm/dump_pagetables: Check user space page table for WX pages commit b4bf4f924b1d7bade38fd51b2e401d20d0956e4d upstream. ptdump_walk_pgd_level_checkwx() checks the kernel page table for WX pages, but does not check the PAGE_TABLE_ISOLATION user space page table. Restructure the code so that dmesg output is selected by an explicit argument and not implicit via checking the pgd argument for !NULL. Add the check for the user space page table. Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit dfa58126d7630c0db0b64fa7bd15741293b517b5 Author: Borislav Petkov Date: Mon Dec 4 15:08:04 2017 +0100 x86/mm/dump_pagetables: Add page table directory to the debugfs VFS hierarchy commit 75298aa179d56cd64f54e58a19fffc8ab922b4c0 upstream. The upcoming support for dumping the kernel and the user space page tables of the current process would create more random files in the top level debugfs directory. Add a page table directory and move the existing file to it. Signed-off-by: Borislav Petkov Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 3dfd9fd8d897214b1a880c7fd8ed36b88faa1c02 Author: Dave Hansen Date: Mon Dec 4 15:08:03 2017 +0100 x86/mm/pti: Add Kconfig commit 385ce0ea4c078517fa51c261882c4e72fba53005 upstream. Finally allow CONFIG_PAGE_TABLE_ISOLATION to be enabled. PARAVIRT generally requires that the kernel not manage its own page tables. It also means that the hypervisor and kernel must agree wholeheartedly about what format the page tables are in and what they contain. PAGE_TABLE_ISOLATION, unfortunately, changes the rules and they can not be used together. I've seen conflicting feedback from maintainers lately about whether they want the Kconfig magic to go first or last in a patch series. It's going last here because the partially-applied series leads to kernels that can not boot in a bunch of cases. I did a run through the entire series with CONFIG_PAGE_TABLE_ISOLATION=y to look for build errors, though. [ tglx: Removed SMP and !PARAVIRT dependencies as they not longer exist ] Signed-off-by: Dave Hansen Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 33d9d7836f0fa02777667d72bc815c12fbe61cac Author: Vlastimil Babka Date: Tue Dec 19 22:33:46 2017 +0100 x86/dumpstack: Indicate in Oops whether PTI is configured and enabled commit 5f26d76c3fd67c48806415ef8b1116c97beff8ba upstream. CONFIG_PAGE_TABLE_ISOLATION is relatively new and intrusive feature that may still have some corner cases which could take some time to manifest and be fixed. It would be useful to have Oops messages indicate whether it was enabled for building the kernel, and whether it was disabled during boot. Example of fully enabled: Oops: 0001 [#1] SMP PTI Example of enabled during build, but disabled during boot: Oops: 0001 [#1] SMP NOPTI We can decide to remove this after the feature has been tested in the field long enough. [ tglx: Made it use boot_cpu_has() as requested by Borislav ] Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner Reviewed-by: Eduardo Valentin Acked-by: Dave Hansen Cc: Andy Lutomirski Cc: Andy Lutomirsky Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: bpetkov@suse.de Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: jkosina@suse.cz Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit ef4b38472d6b1bf587554dfc7d5ab7abc835c1a5 Author: Peter Zijlstra Date: Tue Dec 5 13:34:53 2017 +0100 x86/mm: Clarify the whole ASID/kernel PCID/user PCID naming commit 0a126abd576ebc6403f063dbe20cf7416c9d9393 upstream. Ideally we'd also use sparse to enforce this separation so it becomes much more difficult to mess up. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit c5548af97ae98f9d9e6ae5a9a005e605bd3c06b5 Author: Dave Hansen Date: Mon Dec 4 15:08:01 2017 +0100 x86/mm: Use INVPCID for __native_flush_tlb_single() commit 6cff64b86aaaa07f89f50498055a20e45754b0c1 upstream. This uses INVPCID to shoot down individual lines of the user mapping instead of marking the entire user map as invalid. This could/might/possibly be faster. This for sure needs tlb_single_page_flush_ceiling to be redetermined; esp. since INVPCID is _slow_. A detailed performance analysis is available here: https://lkml.kernel.org/r/3062e486-3539-8a1f-5724-16199420be71@intel.com [ Peterz: Split out from big combo patch ] Signed-off-by: Dave Hansen Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 36a72ab52c8d969a7a302082f52731c1be0e9ada Author: Peter Zijlstra Date: Mon Dec 4 15:08:00 2017 +0100 x86/mm: Optimize RESTORE_CR3 commit 21e94459110252d41b45c0c8ba50fd72a664d50c upstream. Most NMI/paranoid exceptions will not in fact change pagetables and would thus not require TLB flushing, however RESTORE_CR3 uses flushing CR3 writes. Restores to kernel PCIDs can be NOFLUSH, because we explicitly flush the kernel mappings and now that we track which user PCIDs need flushing we can avoid those too when possible. This does mean RESTORE_CR3 needs an additional scratch_reg, luckily both sites have plenty available. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit b63812b81349e5d1a35107e2464547187bc25a61 Author: Peter Zijlstra Date: Mon Dec 4 15:07:59 2017 +0100 x86/mm: Use/Fix PCID to optimize user/kernel switches commit 6fd166aae78c0ab738d49bda653cbd9e3b1491cf upstream. We can use PCID to retain the TLBs across CR3 switches; including those now part of the user/kernel switch. This increases performance of kernel entry/exit at the cost of more expensive/complicated TLB flushing. Now that we have two address spaces, one for kernel and one for user space, we need two PCIDs per mm. We use the top PCID bit to indicate a user PCID (just like we use the PFN LSB for the PGD). Since we do TLB invalidation from kernel space, the existing code will only invalidate the kernel PCID, we augment that by marking the corresponding user PCID invalid, and upon switching back to userspace, use a flushing CR3 write for the switch. In order to access the user_pcid_flush_mask we use PER_CPU storage, which means the previously established SWAPGS vs CR3 ordering is now mandatory and required. Having to do this memory access does require additional registers, most sites have a functioning stack and we can spill one (RAX), sites without functional stack need to otherwise provide the second scratch register. Note: PCID is generally available on Intel Sandybridge and later CPUs. Note: Up until this point TLB flushing was broken in this series. Based-on-code-from: Dave Hansen Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 954339c41cceca991c0aec6dd3b7e164c5b9f48b Author: Dave Hansen Date: Mon Dec 4 15:07:58 2017 +0100 x86/mm: Abstract switching CR3 commit 48e111982cda033fec832c6b0592c2acedd85d04 upstream. In preparation to adding additional PCID flushing, abstract the loading of a new ASID into CR3. [ PeterZ: Split out from big combo patch ] Signed-off-by: Dave Hansen Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit c796e2324094f098575e47ec6d19f22cc4a4f9b9 Author: Dave Hansen Date: Mon Dec 4 15:07:57 2017 +0100 x86/mm: Allow flushing for future ASID switches commit 2ea907c4fe7b78e5840c1dc07800eae93248cad1 upstream. If changing the page tables in such a way that an invalidation of all contexts (aka. PCIDs / ASIDs) is required, they can be actively invalidated by: 1. INVPCID for each PCID (works for single pages too). 2. Load CR3 with each PCID without the NOFLUSH bit set 3. Load CR3 with the NOFLUSH bit set for each and do INVLPG for each address. But, none of these are really feasible since there are ~6 ASIDs (12 with PAGE_TABLE_ISOLATION) at the time that invalidation is required. Instead of actively invalidating them, invalidate the *current* context and also mark the cpu_tlbstate _quickly_ to indicate future invalidation to be required. At the next context-switch, look for this indicator ('invalidate_other' being set) invalidate all of the cpu_tlbstate.ctxs[] entries. This ensures that any future context switches will do a full flush of the TLB, picking up the previous changes. [ tglx: Folded more fixups from Peter ] Signed-off-by: Dave Hansen Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 9617ee896217558a2488b6dc71968a4489fb18b6 Author: Andy Lutomirski Date: Tue Dec 12 07:56:42 2017 -0800 x86/pti: Map the vsyscall page if needed commit 85900ea51577e31b186e523c8f4e068c79ecc7d3 upstream. Make VSYSCALLs work fully in PTI mode by mapping them properly to the user space visible page tables. [ tglx: Hide unused functions (Patch by Arnd Bergmann) ] Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Kees Cook Cc: Linus Torvalds Cc: Peter Zijlstra Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 7aef823ee7e9a74de0d2665d116bde557aa134ca Author: Andy Lutomirski Date: Tue Dec 12 07:56:45 2017 -0800 x86/pti: Put the LDT in its own PGD if PTI is on commit f55f0501cbf65ec41cca5058513031b711730b1d upstream. With PTI enabled, the LDT must be mapped in the usermode tables somewhere. The LDT is per process, i.e. per mm. An earlier approach mapped the LDT on context switch into a fixmap area, but that's a big overhead and exhausted the fixmap space when NR_CPUS got big. Take advantage of the fact that there is an address space hole which provides a completely unused pgd. Use this pgd to manage per-mm LDT mappings. This has a down side: the LDT isn't (currently) randomized, and an attack that can write the LDT is instant root due to call gates (thanks, AMD, for leaving call gates in AMD64 but designing them wrong so they're only useful for exploits). This can be mitigated by making the LDT read-only or randomizing the mapping, either of which is strightforward on top of this patch. This will significantly slow down LDT users, but that shouldn't matter for important workloads -- the LDT is only used by DOSEMU(2), Wine, and very old libc implementations. [ tglx: Cleaned it up. ] Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Kees Cook Cc: Kirill A. Shutemov Cc: Linus Torvalds Cc: Peter Zijlstra Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit c125107490107fc4ce6bad0d9b45fd5e33bfa3f3 Author: Andy Lutomirski Date: Tue Dec 12 07:56:44 2017 -0800 x86/mm/64: Make a full PGD-entry size hole in the memory map commit 9f449772a3106bcdd4eb8fdeb281147b0e99fb30 upstream. Shrink vmalloc space from 16384TiB to 12800TiB to enlarge the hole starting at 0xff90000000000000 to be a full PGD entry. A subsequent patch will use this hole for the pagetable isolation LDT alias. Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Kees Cook Cc: Kirill A. Shutemov Cc: Linus Torvalds Cc: Peter Zijlstra Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 8b82023b7fc2c3734aae23bc03ed5937e67f7388 Author: Hugh Dickins Date: Mon Dec 4 15:07:50 2017 +0100 x86/events/intel/ds: Map debug buffers in cpu_entry_area commit c1961a4631daef4aeabee8e368b1b13e8f173c91 upstream. The BTS and PEBS buffers both have their virtual addresses programmed into the hardware. This means that any access to them is performed via the page tables. The times that the hardware accesses these are entirely dependent on how the performance monitoring hardware events are set up. In other words, there is no way for the kernel to tell when the hardware might access these buffers. To avoid perf crashes, place 'debug_store' allocate pages and map them into the cpu_entry_area. The PEBS fixup buffer does not need this treatment. [ tglx: Got rid of the kaiser_add_mapping() complication ] Signed-off-by: Hugh Dickins Signed-off-by: Dave Hansen Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit e0eb34665d2eecddfa7a1810b76fae52313c1286 Author: Thomas Gleixner Date: Mon Dec 4 15:07:49 2017 +0100 x86/cpu_entry_area: Add debugstore entries to cpu_entry_area commit 10043e02db7f8a4161f76434931051e7d797a5f6 upstream. The Intel PEBS/BTS debug store is a design trainwreck as it expects virtual addresses which must be visible in any execution context. So it is required to make these mappings visible to user space when kernel page table isolation is active. Provide enough room for the buffer mappings in the cpu_entry_area so the buffers are available in the user space visible page tables. At the point where the kernel side entry area is populated there is no buffer available yet, but the kernel PMD must be populated. To achieve this set the entries for these buffers to non present. Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit d230c1917f57c3beee2e0204a4c8c58999758b95 Author: Andy Lutomirski Date: Fri Dec 15 22:08:18 2017 +0100 x86/mm/pti: Map ESPFIX into user space commit 4b6bbe95b87966ba08999574db65c93c5e925a36 upstream. Map the ESPFIX pages into user space when PTI is enabled. Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Kees Cook Cc: Linus Torvalds Cc: Peter Zijlstra Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit e08aa2f1988a7d4ded9c9674fe18857ee5c6fc53 Author: Thomas Gleixner Date: Mon Dec 4 15:07:47 2017 +0100 x86/mm/pti: Share entry text PMD commit 6dc72c3cbca0580642808d677181cad4c6433893 upstream. Share the entry text PMD of the kernel mapping with the user space mapping. If large pages are enabled this is a single PMD entry and at the point where it is copied into the user page table the RW bit has not been cleared yet. Clear it right away so the user space visible map becomes RX. Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 088baf5de12eb7660e20f3f4efc1cf270acff5f4 Author: Thomas Gleixner Date: Mon Dec 4 15:07:46 2017 +0100 x86/entry: Align entry text section to PMD boundary commit 2f7412ba9c6af5ab16bdbb4a3fdb1dcd2b4fd3c2 upstream. The (irq)entry text must be visible in the user space page tables. To allow simple PMD based sharing, make the entry text PMD aligned. Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit fb9dfabb6e803801b9e88ee6c0d56d3b54531b95 Author: Andy Lutomirski Date: Mon Dec 4 15:07:45 2017 +0100 x86/mm/pti: Share cpu_entry_area with user space page tables commit f7cfbee91559ca7e3e961a00ffac921208a115ad upstream. Share the cpu entry area so the user space and kernel space page tables have the same P4D page. Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 35531133abf37ea2f00673a0953e397c286f7c7c Author: Thomas Gleixner Date: Mon Dec 4 15:07:43 2017 +0100 x86/mm/pti: Force entry through trampoline when PTI active commit 8d4b067895791ab9fdb1aadfc505f64d71239dd2 upstream. Force the entry through the trampoline only when PTI is active. Otherwise go through the normal entry code. Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 9f006b0247234e6448e5fd04b5bcb0c56c968698 Author: Andy Lutomirski Date: Mon Dec 4 15:07:42 2017 +0100 x86/mm/pti: Add functions to clone kernel PMDs commit 03f4424f348e8be95eb1bbeba09461cd7b867828 upstream. Provide infrastructure to: - find a kernel PMD for a mapping which must be visible to user space for the entry/exit code to work. - walk an address range and share the kernel PMD with it. This reuses a small part of the original KAISER patches to populate the user space page table. [ tglx: Made it universally usable so it can be used for any kind of shared mapping. Add a mechanism to clear specific bits in the user space visible PMD entry. Folded Andys simplifactions ] Originally-by: Dave Hansen Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 1bcd98df0f50b28a5fb40eeb2cb7a17a02820232 Author: Dave Hansen Date: Mon Dec 4 15:07:40 2017 +0100 x86/mm/pti: Populate user PGD commit fc2fbc8512ed08d1de7720936fd7d2e4ce02c3a2 upstream. In clone_pgd_range() copy the init user PGDs which cover the kernel half of the address space, so a process has all the required kernel mappings visible. [ tglx: Split out from the big kaiser dump and folded Andys simplification ] Signed-off-by: Dave Hansen Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 61fd4049e6760eb8832d4e0bec592d8a810b270f Author: Dave Hansen Date: Mon Dec 4 15:07:39 2017 +0100 x86/mm/pti: Allocate a separate user PGD commit d9e9a6418065bb376e5de8d93ce346939b9a37a6 upstream. Kernel page table isolation requires to have two PGDs. One for the kernel, which contains the full kernel mapping plus the user space mapping and one for user space which contains the user space mappings and the minimal set of kernel mappings which are required by the architecture to be able to transition from and to user space. Add the necessary preliminaries. [ tglx: Split out from the big kaiser dump. EFI fixup from Kirill ] Signed-off-by: Dave Hansen Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit ffcb80ad79e8c3d87b43f2c9ee4b9170c7ec12ea Author: Dave Hansen Date: Mon Dec 4 15:07:38 2017 +0100 x86/mm/pti: Allow NX poison to be set in p4d/pgd commit 1c4de1ff4fe50453b968579ee86fac3da80dd783 upstream. With PAGE_TABLE_ISOLATION the user portion of the kernel page tables is poisoned with the NX bit so if the entry code exits with the kernel page tables selected in CR3, userspace crashes. But doing so trips the p4d/pgd_bad() checks. Make sure it does not do that. Signed-off-by: Dave Hansen Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit b9feab7dcf86df222a405df3e0d95b85741a2d73 Author: Dave Hansen Date: Mon Dec 4 15:07:37 2017 +0100 x86/mm/pti: Add mapping helper functions commit 61e9b3671007a5da8127955a1a3bda7e0d5f42e8 upstream. Add the pagetable helper functions do manage the separate user space page tables. [ tglx: Split out from the big combo kaiser patch. Folded Andys simplification and made it out of line as Boris suggested ] Signed-off-by: Dave Hansen Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 8a2533407f4d60a43effadf7b62825d213cd678c Author: Borislav Petkov Date: Tue Dec 12 14:39:52 2017 +0100 x86/pti: Add the pti= cmdline option and documentation commit 41f4c20b57a4890ea7f56ff8717cc83fefb8d537 upstream. Keep the "nopti" optional for traditional reasons. [ tglx: Don't allow force on when running on XEN PV and made 'on' printout conditional ] Requested-by: Linus Torvalds Signed-off-by: Borislav Petkov Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Andy Lutomirsky Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171212133952.10177-1-bp@alien8.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit a4b07fb4e5a6aef3b87a6540cc04cf972525a723 Author: Thomas Gleixner Date: Mon Dec 4 15:07:36 2017 +0100 x86/mm/pti: Add infrastructure for page table isolation commit aa8c6248f8c75acfd610fe15d8cae23cf70d9d09 upstream. Add the initial files for kernel page table isolation, with a minimal init function and the boot time detection for this misfeature. Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit f3d2b767e912b11d146b9c9922bf28efeda0cdc7 Author: Dave Hansen Date: Mon Dec 4 15:07:35 2017 +0100 x86/mm/pti: Prepare the x86/entry assembly code for entry/exit CR3 switching commit 8a09317b895f073977346779df52f67c1056d81d upstream. PAGE_TABLE_ISOLATION needs to switch to a different CR3 value when it enters the kernel and switch back when it exits. This essentially needs to be done before leaving assembly code. This is extra challenging because the switching context is tricky: the registers that can be clobbered can vary. It is also hard to store things on the stack because there is an established ABI (ptregs) or the stack is entirely unsafe to use. Establish a set of macros that allow changing to the user and kernel CR3 values. Interactions with SWAPGS: Previous versions of the PAGE_TABLE_ISOLATION code relied on having per-CPU scratch space to save/restore a register that can be used for the CR3 MOV. The %GS register is used to index into our per-CPU space, so SWAPGS *had* to be done before the CR3 switch. That scratch space is gone now, but the semantic that SWAPGS must be done before the CR3 MOV is retained. This is good to keep because it is not that hard to do and it allows to do things like add per-CPU debugging information. What this does in the NMI code is worth pointing out. NMIs can interrupt *any* context and they can also be nested with NMIs interrupting other NMIs. The comments below ".Lnmi_from_kernel" explain the format of the stack during this situation. Changing the format of this stack is hard. Instead of storing the old CR3 value on the stack, this depends on the *regular* register save/restore mechanism and then uses %r14 to keep CR3 during the NMI. It is callee-saved and will not be clobbered by the C NMI handlers that get called. [ PeterZ: ESPFIX optimization ] Based-on-code-from: Andy Lutomirski Signed-off-by: Dave Hansen Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit acfee9b8e27e7b5d69276e8b804fed7ff5071c10 Author: Dave Hansen Date: Mon Dec 4 15:07:34 2017 +0100 x86/mm/pti: Disable global pages if PAGE_TABLE_ISOLATION=y commit c313ec66317d421fb5768d78c56abed2dc862264 upstream. Global pages stay in the TLB across context switches. Since all contexts share the same kernel mapping, these mappings are marked as global pages so kernel entries in the TLB are not flushed out on a context switch. But, even having these entries in the TLB opens up something that an attacker can use, such as the double-page-fault attack: http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf That means that even when PAGE_TABLE_ISOLATION switches page tables on return to user space the global pages would stay in the TLB cache. Disable global pages so that kernel TLB entries can be flushed before returning to user space. This way, all accesses to kernel addresses from userspace result in a TLB miss independent of the existence of a kernel mapping. Suppress global pages via the __supported_pte_mask. The user space mappings set PAGE_GLOBAL for the minimal kernel mappings which are required for entry/exit. These mappings are set up manually so the filtering does not take place. [ The __supported_pte_mask simplification was written by Thomas Gleixner. ] Signed-off-by: Dave Hansen Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 72a2beddcd3240047552de69ce45a28029c7e56c Author: Thomas Gleixner Date: Mon Dec 4 15:07:33 2017 +0100 x86/cpufeatures: Add X86_BUG_CPU_INSECURE commit a89f040fa34ec9cd682aed98b8f04e3c47d998bd upstream. Many x86 CPUs leak information to user space due to missing isolation of user space and kernel space page tables. There are many well documented ways to exploit that. The upcoming software migitation of isolating the user and kernel space page tables needs a misfeature flag so code can be made runtime conditional. Add the BUG bits which indicates that the CPU is affected and add a feature bit which indicates that the software migitation is enabled. Assume for now that _ALL_ x86 CPUs are affected by this. Exceptions can be made later. Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 98669825616890a96c9fc67c5f983484578e4dd9 Author: Jing Xia Date: Tue Dec 26 15:12:53 2017 +0800 tracing: Fix crash when it fails to alloc ring buffer commit 24f2aaf952ee0b59f31c3a18b8b36c9e3d3c2cf5 upstream. Double free of the ring buffer happens when it fails to alloc new ring buffer instance for max_buffer if TRACER_MAX_TRACE is configured. The root cause is that the pointer is not set to NULL after the buffer is freed in allocate_trace_buffers(), and the freeing of the ring buffer is invoked again later if the pointer is not equal to Null, as: instance_mkdir() |-allocate_trace_buffers() |-allocate_trace_buffer(tr, &tr->trace_buffer...) |-allocate_trace_buffer(tr, &tr->max_buffer...) // allocate fail(-ENOMEM),first free // and the buffer pointer is not set to null |-ring_buffer_free(tr->trace_buffer.buffer) // out_free_tr |-free_trace_buffers() |-free_trace_buffer(&tr->trace_buffer); //if trace_buffer is not null, free again |-ring_buffer_free(buf->buffer) |-rb_free_cpu_buffer(buffer->buffers[cpu]) // ring_buffer_per_cpu is null, and // crash in ring_buffer_per_cpu->pages Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com Fixes: 737223fbca3b1 ("tracing: Consolidate buffer allocation code") Signed-off-by: Jing Xia Signed-off-by: Chunyan Zhang Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman commit 21a9c7346ef696161dacbbd9f47dabb0f062c4c8 Author: Steven Rostedt (VMware) Date: Tue Dec 26 20:07:34 2017 -0500 tracing: Fix possible double free on failure of allocating trace buffer commit 4397f04575c44e1440ec2e49b6302785c95fd2f8 upstream. Jing Xia and Chunyan Zhang reported that on failing to allocate part of the tracing buffer, memory is freed, but the pointers that point to them are not initialized back to NULL, and later paths may try to free the freed memory again. Jing and Chunyan fixed one of the locations that does this, but missed a spot. Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com Fixes: 737223fbca3b1 ("tracing: Consolidate buffer allocation code") Reported-by: Jing Xia Reported-by: Chunyan Zhang Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman commit 234bc12669a362773e84834ace52f1af7510196b Author: Steven Rostedt (VMware) Date: Fri Dec 22 20:38:57 2017 -0500 tracing: Remove extra zeroing out of the ring buffer page commit 6b7e633fe9c24682df550e5311f47fb524701586 upstream. The ring_buffer_read_page() takes care of zeroing out any extra data in the page that it returns. There's no need to zero it out again from the consumer. It was removed from one consumer of this function, but read_buffers_splice_read() did not remove it, and worse, it contained a nasty bug because of it. Fixes: 2711ca237a084 ("ring-buffer: Move zeroing out excess in page to ring buffer code") Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman