commit f82a53b87594f460f2dd9983eeb851a5840e8df8 Author: Greg Kroah-Hartman Date: Wed Jul 5 14:41:57 2017 +0200 Linux 4.11.9 commit f29125639b85dad56dbe95a9f28bb83f6b0c5803 Author: David S. Miller Date: Thu Jun 8 10:16:05 2017 -0400 hsi: Fix build regression due to netdev destructor fix. commit ed66e50d9587fc0bb032e276a2563c0068a5b63a upstream. > ../drivers/hsi/clients/ssi_protocol.c:1069:5: error: 'struct net_device' has no member named 'destructor' Reported-by: Mark Brown Reported-by: Stephen Rothwell Signed-off-by: David S. Miller Cc: Guenter Roeck Signed-off-by: Greg Kroah-Hartman commit a5afcf8553e94d3002e1fbfeda82046dbfeb87e6 Author: Steffen Klassert Date: Fri Jun 9 11:35:46 2017 +0200 esp4: Fix udpencap for local TCP packets. [ Upstream commit 0e78a87306a6f55b1c7bbafad1de62c3975953ca ] Locally generated TCP packets are usually cloned, so we do skb_cow_data() on this packets. After that we need to reload the pointer to the esp header. On udpencap this header has an offset to skb_transport_header, so take this offset into account. This is a backport of: commit 0e78a87306a ("esp4: Fix udpencap for local TCP packets.") Fixes: 67d349ed603 ("net/esp4: Fix invalid esph pointer crash") Fixes: fca11ebde3f0 ("esp4: Reorganize esp_output") Reported-by: Don Bowman Signed-off-by: Steffen Klassert Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit b9a1254c31cc63d1f30fa37b782ebab70e9f0665 Author: Wanpeng Li Date: Mon Jun 5 05:19:09 2017 -0700 KVM: nVMX: Fix exception injection commit d4912215d1031e4fb3d1038d2e1857218dba0d0a upstream. WARNING: CPU: 3 PID: 2840 at arch/x86/kvm/vmx.c:10966 nested_vmx_vmexit+0xdcd/0xde0 [kvm_intel] CPU: 3 PID: 2840 Comm: qemu-system-x86 Tainted: G OE 4.12.0-rc3+ #23 RIP: 0010:nested_vmx_vmexit+0xdcd/0xde0 [kvm_intel] Call Trace: ? kvm_check_async_pf_completion+0xef/0x120 [kvm] ? rcu_read_lock_sched_held+0x79/0x80 vmx_queue_exception+0x104/0x160 [kvm_intel] ? vmx_queue_exception+0x104/0x160 [kvm_intel] kvm_arch_vcpu_ioctl_run+0x1171/0x1ce0 [kvm] ? kvm_arch_vcpu_load+0x47/0x240 [kvm] ? kvm_arch_vcpu_load+0x62/0x240 [kvm] kvm_vcpu_ioctl+0x384/0x7b0 [kvm] ? kvm_vcpu_ioctl+0x384/0x7b0 [kvm] ? __fget+0xf3/0x210 do_vfs_ioctl+0xa4/0x700 ? __fget+0x114/0x210 SyS_ioctl+0x79/0x90 do_syscall_64+0x81/0x220 entry_SYSCALL64_slow_path+0x25/0x25 This is triggered occasionally by running both win7 and win2016 in L2, in addition, EPT is disabled on both L1 and L2. It can't be reproduced easily. Commit 0b6ac343fc (KVM: nVMX: Correct handling of exception injection) mentioned that "KVM wants to inject page-faults which it got to the guest. This function assumes it is called with the exit reason in vmcs02 being a #PF exception". Commit e011c663 (KVM: nVMX: Check all exceptions for intercept during delivery to L2) allows to check all exceptions for intercept during delivery to L2. However, there is no guarantee the exit reason is exception currently, when there is an external interrupt occurred on host, maybe a time interrupt for host which should not be injected to guest, and somewhere queues an exception, then the function nested_vmx_check_exception() will be called and the vmexit emulation codes will try to emulate the "Acknowledge interrupt on exit" behavior, the warning is triggered. Reusing the exit reason from the L2->L0 vmexit is wrong in this case, the reason must always be EXCEPTION_NMI when injecting an exception into L1 as a nested vmexit. Cc: Paolo Bonzini Cc: Radim Krčmář Signed-off-by: Wanpeng Li Fixes: e011c663b9c7 ("KVM: nVMX: Check all exceptions for intercept during delivery to L2") Signed-off-by: Radim Krčmář Signed-off-by: Greg Kroah-Hartman commit bde6903736124e10c685656e40cee06fed938890 Author: Radim Krčmář Date: Thu May 18 19:37:30 2017 +0200 KVM: x86: zero base3 of unusable segments commit f0367ee1d64d27fa08be2407df5c125442e885e3 upstream. Static checker noticed that base3 could be used uninitialized if the segment was not present (useable). Random stack values probably would not pass VMCS entry checks. Reported-by: Dan Carpenter Fixes: 1aa366163b8b ("KVM: x86 emulator: consolidate segment accessors") Reviewed-by: Paolo Bonzini Reviewed-by: David Hildenbrand Signed-off-by: Radim Krčmář Signed-off-by: Greg Kroah-Hartman commit 02800188d72c166ab6d0ebfbe13b83a7bab25a81 Author: Radim Krčmář Date: Thu May 18 19:37:31 2017 +0200 KVM: x86/vPMU: fix undefined shift in intel_pmu_refresh() commit 34b0dadbdf698f9b277a31b2747b625b9a75ea1f upstream. Static analysis noticed that pmu->nr_arch_gp_counters can be 32 (INTEL_PMC_MAX_GENERIC) and therefore cannot be used to shift 'int'. I didn't add BUILD_BUG_ON for it as we have a better checker. Reported-by: Dan Carpenter Fixes: 25462f7f5295 ("KVM: x86/vPMU: Define kvm_pmu_ops to support vPMU function dispatch") Reviewed-by: Paolo Bonzini Reviewed-by: David Hildenbrand Signed-off-by: Radim Krčmář Signed-off-by: Greg Kroah-Hartman commit 3d2a8efd7d0d6e82b793a1783f01cbfb59392302 Author: Ladi Prosek Date: Tue Apr 25 16:42:44 2017 +0200 KVM: x86: fix emulation of RSM and IRET instructions commit 6ed071f051e12cf7baa1b69d3becb8f232fdfb7b upstream. On AMD, the effect of set_nmi_mask called by emulate_iret_real and em_rsm on hflags is reverted later on in x86_emulate_instruction where hflags are overwritten with ctxt->emul_flags (the kvm_set_hflags call). This manifests as a hang when rebooting Windows VMs with QEMU, OVMF, and >1 vcpu. Instead of trying to merge ctxt->emul_flags into vcpu->arch.hflags after an instruction is emulated, this commit deletes emul_flags altogether and makes the emulator access vcpu->arch.hflags using two new accessors. This way all changes, on the emulator side as well as in functions called from the emulator and accessing vcpu state with emul_to_vcpu, are preserved. More details on the bug and its manifestation with Windows and OVMF: It's a KVM bug in the interaction between SMI/SMM and NMI, specific to AMD. I believe that the SMM part explains why we started seeing this only with OVMF. KVM masks and unmasks NMI when entering and leaving SMM. When KVM emulates the RSM instruction in em_rsm, the set_nmi_mask call doesn't stick because later on in x86_emulate_instruction we overwrite arch.hflags with ctxt->emul_flags, effectively reverting the effect of the set_nmi_mask call. The AMD-specific hflag of interest here is HF_NMI_MASK. When rebooting the system, Windows sends an NMI IPI to all but the current cpu to shut them down. Only after all of them are parked in HLT will the initiating cpu finish the restart. If NMI is masked, other cpus never get the memo and the initiating cpu spins forever, waiting for hal!HalpInterruptProcessorsStarted to drop. That's the symptom we observe. Fixes: a584539b24b8 ("KVM: x86: pass the whole hflags field to emulator and back") Signed-off-by: Ladi Prosek Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman commit 51c3bb1d99a31fa5179607eb9434d73f99af9b55 Author: Thomas Petazzoni Date: Tue Mar 21 11:03:53 2017 +0100 mtd: nand: fsmc: fix NAND width handling commit ee56874f23e5c11576540bd695177a5ebc4f4352 upstream. In commit eea628199d5b ("mtd: Add device-tree support to fsmc_nand"), Device Tree support was added to the fmsc_nand driver. However, this code has a bug in how it handles the bank-width DT property to set the bus width. Indeed, in the function fsmc_nand_probe_config_dt() that parses the Device Tree, it sets pdata->width to either 8 or 16 depending on the value of the bank-width DT property. Then, the ->probe() function will test if pdata->width is equal to FSMC_NAND_BW16 (which is 2) to set NAND_BUSWIDTH_16 in nand->options. Therefore, with the DT probing, this condition will never match. This commit fixes that by removing the "width" field from fsmc_nand_platform_data and instead have the fsmc_nand_probe_config_dt() function directly set the appropriate nand->options value. It is worth mentioning that if this commit gets backported to older kernels, prior to the drop of non-DT probing, then non-DT probing will be broken because nand->options will no longer be set to NAND_BUSWIDTH_16. Fixes: eea628199d5b ("mtd: Add device-tree support to fsmc_nand") Signed-off-by: Thomas Petazzoni Reviewed-by: Linus Walleij Signed-off-by: Boris Brezillon Signed-off-by: Greg Kroah-Hartman commit f34b2f6f683caa898fbbbb89bd188c153ff8b94d Author: Kamal Dasu Date: Fri Mar 3 16:16:53 2017 -0500 mtd: nand: brcmnand: Check flash #WP pin status before nand erase/program commit 9d2ee0a60b8bd9bef2a0082c533736d6a7b39873 upstream. On brcmnand controller v6.x and v7.x, the #WP pin is controlled through the NAND_WP bit in CS_SELECT register. The driver currently assumes that toggling the #WP pin is instantaneously enabling/disabling write-protection, but it actually takes some time to propagate the new state to the internal NAND chip logic. This behavior is sometime causing data corruptions when an erase/program operation is executed before write-protection has really been disabled. Fixes: 27c5b17cd1b1 ("mtd: nand: add NAND driver "library" for Broadcom STB NAND controller") Signed-off-by: Kamal Dasu Signed-off-by: Boris Brezillon Signed-off-by: Greg Kroah-Hartman commit 3b422c3c1fb5979a8fd9ca8a83d0bc9a2f67a64b Author: Arnd Bergmann Date: Fri Mar 24 23:02:48 2017 +0100 infiniband: hns: avoid gcc-7.0.1 warning for uninitialized data commit 5b0ff9a00755d4d9c209033a77f1ed8f3186fe5c upstream. hns_roce_v1_cq_set_ci() calls roce_set_bit() on an uninitialized field, which will then change only a few of its bits, causing a warning with the latest gcc: infiniband/hw/hns/hns_roce_hw_v1.c: In function 'hns_roce_v1_cq_set_ci': infiniband/hw/hns/hns_roce_hw_v1.c:1854:23: error: 'doorbell[1]' is used uninitialized in this function [-Werror=uninitialized] roce_set_bit(doorbell[1], ROCEE_DB_OTHERS_H_ROCEE_DB_OTH_HW_SYNS_S, 1); The code is actually correct since we always set all bits of the port_vlan field, but gcc correctly points out that the first access does contain uninitialized data. This initializes the field to zero first before setting the individual bits. Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Signed-off-by: Arnd Bergmann Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman commit 6af65c535d13ae6e43082af727abf408b71a6789 Author: Suravee Suthikulpanit Date: Mon Jun 26 04:28:04 2017 -0500 iommu/amd: Fix interrupt remapping when disable guest_mode commit 84a21dbdef0b96d773599c33c2afbb002198d303 upstream. Pass-through devices to VM guest can get updated IRQ affinity information via irq_set_affinity() when not running in guest mode. Currently, AMD IOMMU driver in GA mode ignores the updated information if the pass-through device is setup to use vAPIC regardless of guest_mode. This could cause invalid interrupt remapping. Also, the guest_mode bit should be set and cleared only when SVM updates posted-interrupt interrupt remapping information. Signed-off-by: Suravee Suthikulpanit Cc: Joerg Roedel Fixes: d98de49a53e48 ('iommu/amd: Enable vAPIC interrupt remapping mode by default') Signed-off-by: Joerg Roedel Signed-off-by: Greg Kroah-Hartman commit fb6237c7332fcd8062bdf093bbf7a77276f38069 Author: Pan Bian Date: Sun Apr 23 18:23:21 2017 +0800 iommu/amd: Fix incorrect error handling in amd_iommu_bind_pasid() commit 73dbd4a4230216b6a5540a362edceae0c9b4876b upstream. In function amd_iommu_bind_pasid(), the control flow jumps to label out_free when pasid_state->mm and mm is NULL. And mmput(mm) is called. In function mmput(mm), mm is referenced without validation. This will result in a NULL dereference bug. This patch fixes the bug. Signed-off-by: Pan Bian Fixes: f0aac63b873b ('iommu/amd: Don't hold a reference to mm_struct') Signed-off-by: Joerg Roedel Signed-off-by: Greg Kroah-Hartman commit 3d06032fb21d5d5d4758d1398d38654e5682fbb7 Author: Robin Murphy Date: Thu Mar 16 17:00:17 2017 +0000 iommu/dma: Don't reserve PCI I/O windows commit 938f1bbe35e3a7cb07e1fa7c512e2ef8bb866bdf upstream. Even if a host controller's CPU-side MMIO windows into PCI I/O space do happen to leak into PCI memory space such that it might treat them as peer addresses, trying to reserve the corresponding I/O space addresses doesn't do anything to help solve that problem. Stop doing a silly thing. Fixes: fade1ec055dc ("iommu/dma: Avoid PCI host bridge windows") Reviewed-by: Eric Auger Signed-off-by: Robin Murphy Signed-off-by: Joerg Roedel Signed-off-by: Greg Kroah-Hartman commit 092702fa4db58e7feca8b513c7faa52fdc4fb17f Author: Eric Ren Date: Fri Jun 23 15:08:55 2017 -0700 ocfs2: fix deadlock caused by recursive locking in xattr commit 8818efaaacb78c60a9d90c5705b6c99b75d7d442 upstream. Another deadlock path caused by recursive locking is reported. This kind of issue was introduced since commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()"). Two deadlock paths have been fixed by commit b891fa5024a9 ("ocfs2: fix deadlock issue when taking inode lock at vfs entry points"). Yes, we intend to fix this kind of case in incremental way, because it's hard to find out all possible paths at once. This one can be reproduced like this. On node1, cp a large file from home directory to ocfs2 mountpoint. While on node2, run setfacl/getfacl. Both nodes will hang up there. The backtraces: On node1: __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2] ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2] ocfs2_write_begin+0x43/0x1a0 [ocfs2] generic_perform_write+0xa9/0x180 __generic_file_write_iter+0x1aa/0x1d0 ocfs2_file_write_iter+0x4f4/0xb40 [ocfs2] __vfs_write+0xc3/0x130 vfs_write+0xb1/0x1a0 SyS_write+0x46/0xa0 On node2: __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2] ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2] ocfs2_xattr_set+0x12e/0xe80 [ocfs2] ocfs2_set_acl+0x22d/0x260 [ocfs2] ocfs2_iop_set_acl+0x65/0xb0 [ocfs2] set_posix_acl+0x75/0xb0 posix_acl_xattr_set+0x49/0xa0 __vfs_setxattr+0x69/0x80 __vfs_setxattr_noperm+0x72/0x1a0 vfs_setxattr+0xa7/0xb0 setxattr+0x12d/0x190 path_setxattr+0x9f/0xb0 SyS_setxattr+0x14/0x20 Fix this one by using ocfs2_inode_{lock|unlock}_tracker, which is exported by commit 439a36b8ef38 ("ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock"). Link: http://lkml.kernel.org/r/20170622014746.5815-1-zren@suse.com Fixes: 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()") Signed-off-by: Eric Ren Reported-by: Thomas Voegtle Tested-by: Thomas Voegtle Reviewed-by: Joseph Qi Cc: Mark Fasheh Cc: Joel Becker Cc: Junxiao Bi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 404dfb7533e45e451368c13fe5eb0d7564c810c8 Author: Junxiao Bi Date: Wed May 3 14:51:41 2017 -0700 ocfs2: o2hb: revert hb threshold to keep compatible commit 33496c3c3d7b88dcbe5e55aa01288b05646c6aca upstream. Configfs is the interface for ocfs2-tools to set configure to kernel and $configfs_dir/cluster/$clustername/heartbeat/dead_threshold is the one used to configure heartbeat dead threshold. Kernel has a default value of it but user can set O2CB_HEARTBEAT_THRESHOLD in /etc/sysconfig/o2cb to override it. Commit 45b997737a80 ("ocfs2/cluster: use per-attribute show and store methods") changed heartbeat dead threshold name while ocfs2-tools did not, so ocfs2-tools won't set this configurable and the default value is always used. So revert it. Fixes: 45b997737a80 ("ocfs2/cluster: use per-attribute show and store methods") Link: http://lkml.kernel.org/r/1490665245-15374-1-git-send-email-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi Acked-by: Joseph Qi Cc: Mark Fasheh Cc: Joel Becker Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit b363931e894a2d868f759ff60bf27548565adf58 Author: Andy Lutomirski Date: Sat Apr 22 00:01:22 2017 -0700 x86/mm: Fix flush_tlb_page() on Xen commit dbd68d8e84c606673ebbcf15862f8c155fa92326 upstream. flush_tlb_page() passes a bogus range to flush_tlb_others() and expects the latter to fix it up. native_flush_tlb_others() has the fixup but Xen's version doesn't. Move the fixup to flush_tlb_others(). AFAICS the only real effect is that, without this fix, Xen would flush everything instead of just the one page on remote vCPUs in when flush_tlb_page() was called. Signed-off-by: Andy Lutomirski Reviewed-by: Boris Ostrovsky Cc: Andrew Morton Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Konrad Rzeszutek Wilk Cc: Linus Torvalds Cc: Michal Hocko Cc: Nadav Amit Cc: Peter Zijlstra Cc: Rik van Riel Cc: Thomas Gleixner Fixes: e7b52ffd45a6 ("x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range") Link: http://lkml.kernel.org/r/10ed0e4dfea64daef10b87fb85df1746999b4dba.1492844372.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit ed815afd32cdd6f549036f50ae3a67c92e80c47a Author: Joerg Roedel Date: Thu Apr 6 16:19:22 2017 +0200 x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space commit 5ed386ec09a5d75bcf073967e55e895c2607a5c3 upstream. When this function fails it just sends a SIGSEGV signal to user-space using force_sig(). This signal is missing essential information about the cause, e.g. the trap_nr or an error code. Fix this by propagating the error to the only caller of mpx_handle_bd_fault(), do_bounds(), which sends the correct SIGSEGV signal to the process. Signed-off-by: Joerg Roedel Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Fixes: fe3d197f84319 ('x86, mpx: On-demand kernel allocation of bounds tables') Link: http://lkml.kernel.org/r/1491488362-27198-1-git-send-email-joro@8bytes.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 92ff9b6ce863575a4684c54b4897707d45933c5b Author: Kan Liang Date: Tue Apr 4 15:14:06 2017 -0400 perf/x86: Fix spurious NMI with PEBS Load Latency event commit fd583ad1563bec5f00140e1f2444adbcd331caad upstream. Spurious NMIs will be observed with the following command: while :; do perf record -bae "cpu/umask=0x01,event=0xcd,ldlat=0x80/pp" -e "cpu/umask=0x03,event=0x0/" -e "cpu/umask=0x02,event=0x0/" -e cycles,branches,cache-misses -e cache-references -- sleep 10 done The bug was introduced by commit: 8077eca079a2 ("perf/x86/pebs: Add workaround for broken OVFL status on HSW+") That commit clears the status bits for the counters used for PEBS events, by masking the whole 64 bits pebs_enabled. However, only the low 32 bits of both status and pebs_enabled are reserved for PEBS-able counters. For status bits 32-34 are fixed counter overflow bits. For pebs_enabled bits 32-34 are for PEBS Load Latency. In the test case, the PEBS Load Latency event and fixed counter event could overflow at the same time. The fixed counter overflow bit will be cleared by mistake. Once it is cleared, the fixed counter overflow never be processed, which finally trigger spurious NMI. Correct the PEBS enabled mask by ignoring the non-PEBS bits. Signed-off-by: Kan Liang Signed-off-by: Peter Zijlstra (Intel) Cc: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Jiri Olsa Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Thomas Gleixner Cc: Vince Weaver Fixes: 8077eca079a2 ("perf/x86/pebs: Add workaround for broken OVFL status on HSW+") Link: http://lkml.kernel.org/r/1491333246-3965-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit cc6b8fbcc65652d09aef71d0ca232a5c7791a0c7 Author: Baoquan He Date: Tue Jun 27 20:39:06 2017 +0800 x86/boot/KASLR: Fix kexec crash due to 'virt_addr' calculation bug commit 8eabf42ae5237e6b699aeac687b5b629e3537c8d upstream. Kernel text KASLR is separated into physical address and virtual address randomization. And for virtual address randomization, we only randomiza to get an offset between 16M and KERNEL_IMAGE_SIZE. So the initial value of 'virt_addr' should be LOAD_PHYSICAL_ADDR, but not the original kernel loading address 'output'. The bug will cause kernel boot failure if kernel is loaded at a different position than the address, 16M, which is decided at compiled time. Kexec/kdump is such practical case. To fix it, just assign LOAD_PHYSICAL_ADDR to virt_addr as initial value. Tested-by: Dave Young Signed-off-by: Baoquan He Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Fixes: 8391c73 ("x86/KASLR: Randomize virtual address separately") Link: http://lkml.kernel.org/r/1498567146-11990-3-git-send-email-bhe@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 875cfdbe15cc75390f06d0f38cf6b1b03b4a3cd3 Author: Thomas Gleixner Date: Fri Jun 23 10:50:38 2017 +0200 x86/mshyperv: Remove excess #includes from mshyperv.h commit 26fcd952d5c977a94ac64bb44ed409e37607b2c9 upstream. A recent commit included linux/slab.h in linux/irq.h. This breaks the build of vdso32 on a 64-bit kernel. The reason is that linux/irq.h gets included into the vdso code via linux/interrupt.h which is included from asm/mshyperv.h. That makes the 32-bit vdso compile fail, because slab.h includes the pgtable headers for 64-bit on a 64-bit build. Neither linux/clocksource.h nor linux/interrupt.h are needed in the mshyperv.h header file itself - it has a dependency on . Remove the includes and unbreak the build. Reported-by: Ingo Molnar Signed-off-by: Thomas Gleixner Cc: K. Y. Srinivasan Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Vitaly Kuznetsov Cc: devel@linuxdriverproject.org Fixes: dee863b571b0 ("hv: export current Hyper-V clocksource") Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1706231038460.2647@nanos Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit b3bc81143c19fbc59430fbcc06fd56cc10aaac7e Author: Josh Poimboeuf Date: Tue May 23 10:37:29 2017 -0500 Revert "x86/entry: Fix the end of the stack for newly forked tasks" commit ebd574994c63164d538a197172157318f58ac647 upstream. Petr Mladek reported the following warning when loading the livepatch sample module: WARNING: CPU: 1 PID: 3699 at arch/x86/kernel/stacktrace.c:132 save_stack_trace_tsk_reliable+0x133/0x1a0 ... Call Trace: __schedule+0x273/0x820 schedule+0x36/0x80 kthreadd+0x305/0x310 ? kthread_create_on_cpu+0x80/0x80 ? icmp_echo.part.32+0x50/0x50 ret_from_fork+0x2c/0x40 That warning means the end of the stack is no longer recognized as such for newly forked tasks. The problem was introduced with the following commit: ff3f7e2475bb ("x86/entry: Fix the end of the stack for newly forked tasks") ... which was completely misguided. It only partially fixed the reported issue, and it introduced another bug in the process. None of the other entry code saves the frame pointer before calling into C code, so it doesn't make sense for ret_from_fork to do so either. Contrary to what I originally thought, the original issue wasn't related to newly forked tasks. It was actually related to ftrace. When entry code calls into a function which then calls into an ftrace handler, the stack frame looks different than normal. The original issue will be fixed in the unwinder, in a subsequent patch. Reported-by: Petr Mladek Signed-off-by: Josh Poimboeuf Acked-by: Thomas Gleixner Cc: Dave Jones Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Steven Rostedt Cc: live-patching@vger.kernel.org Fixes: ff3f7e2475bb ("x86/entry: Fix the end of the stack for newly forked tasks") Link: http://lkml.kernel.org/r/f350760f7e82f0750c8d1dd093456eb212751caa.1495553739.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 839fe2c4164304d4c460b67db5a3f4e7689edd38 Author: Arnaldo Carvalho de Melo Date: Mon Apr 24 11:58:54 2017 -0300 tools arch: Sync arch/x86/lib/memcpy_64.S with the kernel commit e883d09c9eb2ffddfd057c17e6a0cef446ec8c9b upstream. Just a minor fix done in: Fixes: 26a37ab319a2 ("x86/mce: Fix copy/paste error in exception table entries") Cc: Tony Luck Link: http://lkml.kernel.org/n/tip-ni9jzdd5yxlail6pq8cuexw2@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman commit b856d45c710620d4324b660be1f36ce561ebc281 Author: Christophe JAILLET Date: Sat May 13 13:40:20 2017 +0200 ARM: davinci: PM: Do not free useful resources in normal path in 'davinci_pm_init' commit 95d7c1f18bf8ac03b0fc48eac1f1b11f867765b8 upstream. It is wrong to iounmap resources in the normal path of davinci_pm_init() The 3 ioremap'ed fields of 'pm_config' can be accessed later on in other functions, so we should return 'success' instead of unrolling everything. Fixes: aa9aa1ec2df6 ("ARM: davinci: PM: rework init, remove platform device") Signed-off-by: Christophe JAILLET [nsekhar@ti.com: commit message and minor style fixes] Signed-off-by: Sekhar Nori Signed-off-by: Greg Kroah-Hartman commit b0ed471883749029e49c9a6f02b7568d7f9819d5 Author: Christophe JAILLET Date: Sat May 13 13:40:05 2017 +0200 ARM: davinci: PM: Free resources in error handling path in 'davinci_pm_init' commit f3f6cc814f9cb61cfb738af2b126a8bf19e5ab4c upstream. If 'sram_alloc' fails, we need to free already allocated resources. Fixes: aa9aa1ec2df6 ("ARM: davinci: PM: rework init, remove platform device") Signed-off-by: Christophe JAILLET Signed-off-by: Sekhar Nori Signed-off-by: Greg Kroah-Hartman commit 0afbd9fd39caff83e178dbaaa2581929d449cb85 Author: Doug Berger Date: Thu Jun 29 18:41:36 2017 +0100 ARM: 8685/1: ensure memblock-limit is pmd-aligned commit 9e25ebfe56ece7541cd10a20d715cbdd148a2e06 upstream. The pmd containing memblock_limit is cleared by prepare_page_table() which creates the opportunity for early_alloc() to allocate unmapped memory if memblock_limit is not pmd aligned causing a boot-time hang. Commit 965278dcb8ab ("ARM: 8356/1: mm: handle non-pmd-aligned end of RAM") attempted to resolve this problem, but there is a path through the adjust_lowmem_bounds() routine where if all memory regions start and end on pmd-aligned addresses the memblock_limit will be set to arm_lowmem_limit. Since arm_lowmem_limit can be affected by the vmalloc early parameter, the value of arm_lowmem_limit may not be pmd-aligned. This commit corrects this oversight such that memblock_limit is always rounded down to pmd-alignment. Fixes: 965278dcb8ab ("ARM: 8356/1: mm: handle non-pmd-aligned end of RAM") Signed-off-by: Doug Berger Suggested-by: Mark Rutland Signed-off-by: Russell King Signed-off-by: Greg Kroah-Hartman commit 16dfde48319bc5a5091b8ababc2d86c1de763ee4 Author: Lorenzo Pieralisi Date: Fri May 26 17:40:02 2017 +0100 ARM64/ACPI: Fix BAD_MADT_GICC_ENTRY() macro implementation commit cb7cf772d83d2d4e6995c5bb9e0fb59aea8f7080 upstream. The BAD_MADT_GICC_ENTRY() macro checks if a GICC MADT entry passes muster from an ACPI specification standpoint. Current macro detects the MADT GICC entry length through ACPI firmware version (it changed from 76 to 80 bytes in the transition from ACPI 5.1 to ACPI 6.0 specification) but always uses (erroneously) the ACPICA (latest) struct (ie struct acpi_madt_generic_interrupt - that is 80-bytes long) length to check if the current GICC entry memory record exceeds the MADT table end in memory as defined by the MADT table header itself, which may result in false negatives depending on the ACPI firmware version and how the MADT entries are laid out in memory (ie on ACPI 5.1 firmware MADT GICC entries are 76 bytes long, so by adding 80 to a GICC entry start address in memory the resulting address may well be past the actual MADT end, triggering a false negative). Fix the BAD_MADT_GICC_ENTRY() macro by reshuffling the condition checks and update them to always use the firmware version specific MADT GICC entry length in order to carry out boundary checks. Fixes: b6cfb277378e ("ACPI / ARM64: add BAD_MADT_GICC_ENTRY() macro") Reported-by: Julien Grall Acked-by: Will Deacon Acked-by: Marc Zyngier Signed-off-by: Lorenzo Pieralisi Cc: Julien Grall Cc: Hanjun Guo Cc: Al Stone Signed-off-by: Catalin Marinas Signed-off-by: Greg Kroah-Hartman commit 5e1c6a5d7fa096fd2e2adc1baaf36f7701ef6560 Author: Timmy Li Date: Mon May 22 16:48:28 2017 +0100 ARM64: PCI: Fix struct acpi_pci_root_ops allocation failure path commit 717902cc93118119a6fce7765da6cf2786987418 upstream. Commit 093d24a20442 ("arm64: PCI: Manage controller-specific data on per-controller basis") added code to allocate ACPI PCI root_ops dynamically on a per host bridge basis but failed to update the corresponding memory allocation failure path in pci_acpi_scan_root() leading to a potential memory leakage. Fix it by adding the required kfree call. Fixes: 093d24a20442 ("arm64: PCI: Manage controller-specific data on per-controller basis") Reviewed-by: Tomasz Nowicki Signed-off-by: Timmy Li [lorenzo.pieralisi@arm.com: refactored code, rewrote commit log] Signed-off-by: Lorenzo Pieralisi CC: Will Deacon CC: Bjorn Helgaas Signed-off-by: Catalin Marinas Signed-off-by: Greg Kroah-Hartman commit 961010353d3e9bb2a7ddb434ee6c37b03271810b Author: Eric Anholt Date: Thu Apr 27 18:02:32 2017 -0700 watchdog: bcm281xx: Fix use of uninitialized spinlock. commit fedf266f9955d9a019643cde199a2fd9a0259f6f upstream. The bcm_kona_wdt_set_resolution_reg() call takes the spinlock, so initialize it earlier. Fixes a warning at boot with lock debugging enabled. Fixes: 6adb730dc208 ("watchdog: bcm281xx: Watchdog Driver") Signed-off-by: Eric Anholt Reviewed-by: Florian Fainelli Reviewed-by: Guenter Roeck Signed-off-by: Guenter Roeck Signed-off-by: Wim Van Sebroeck Signed-off-by: Greg Kroah-Hartman commit 662bb18efe1433aa6139af13408899ddac828ae7 Author: Dan Carpenter Date: Wed Jun 14 13:34:05 2017 +0300 xfrm: Oops on error in pfkey_msg2xfrm_state() commit 1e3d0c2c70cd3edb5deed186c5f5c75f2b84a633 upstream. There are some missing error codes here so we accidentally return NULL instead of an error pointer. It results in a NULL pointer dereference. Fixes: df71837d5024 ("[LSM-IPSec]: Security association restriction.") Signed-off-by: Dan Carpenter Signed-off-by: Steffen Klassert Signed-off-by: Greg Kroah-Hartman commit f37a5bfa5cf7ceb330367947dab8a8083282795f Author: Dan Carpenter Date: Wed Jun 14 13:35:37 2017 +0300 xfrm: NULL dereference on allocation failure commit e747f64336fc15e1c823344942923195b800aa1e upstream. The default error code in pfkey_msg2xfrm_state() is -ENOBUFS. We added a new call to security_xfrm_state_alloc() which sets "err" to zero so there several places where we can return ERR_PTR(0) if kmalloc() fails. The caller is expecting error pointers so it leads to a NULL dereference. Fixes: df71837d5024 ("[LSM-IPSec]: Security association restriction.") Signed-off-by: Dan Carpenter Signed-off-by: Steffen Klassert Signed-off-by: Greg Kroah-Hartman commit 29be0c1aefd38f07c5b0547c2f6d7423873c9aca Author: Sabrina Dubroca Date: Wed May 3 16:43:19 2017 +0200 xfrm: fix stack access out of bounds with CONFIG_XFRM_SUB_POLICY commit 9b3eb54106cf6acd03f07cf0ab01c13676a226c2 upstream. When CONFIG_XFRM_SUB_POLICY=y, xfrm_dst stores a copy of the flowi for that dst. Unfortunately, the code that allocates and fills this copy doesn't care about what type of flowi (flowi, flowi4, flowi6) gets passed. In multiple code paths (from raw_sendmsg, from TCP when replying to a FIN, in vxlan, geneve, and gre), the flowi that gets passed to xfrm is actually an on-stack flowi4, so we end up reading stuff from the stack past the end of the flowi4 struct. Since xfrm_dst->origin isn't used anywhere following commit ca116922afa8 ("xfrm: Eliminate "fl" and "pol" args to xfrm_bundle_ok()."), just get rid of it. xfrm_dst->partner isn't used either, so get rid of that too. Fixes: 9d6ec938019c ("ipv4: Use flowi4 in public route lookup interfaces.") Signed-off-by: Sabrina Dubroca Signed-off-by: Steffen Klassert Signed-off-by: Greg Kroah-Hartman commit e0cee9f3bfdf3c8e84db231d0247132b4c30bc6e Author: Hangbin Liu Date: Sun Jun 11 09:44:20 2017 +0800 xfrm: move xfrm_garbage_collect out of xfrm_policy_flush commit 138437f591dd9a42d53c6fed1a3c85e02678851c upstream. Now we will force to do garbage collection if any policy removed in xfrm_policy_flush(). But during xfrm_net_exit(). We call flow_cache_fini() first and set set fc->percpu to NULL. Then after we call xfrm_policy_fini() -> frxm_policy_flush() -> flow_cache_flush(), we will get NULL pointer dereference when check percpu_empty. The code path looks like: flow_cache_fini() - fc->percpu = NULL xfrm_policy_fini() - xfrm_policy_flush() - xfrm_garbage_collect() - flow_cache_flush() - flow_cache_percpu_empty() - fcp = per_cpu_ptr(fc->percpu, cpu) To reproduce, just add ipsec in netns and then remove the netns. v2: As Xin Long suggested, since only two other places need to call it. move xfrm_garbage_collect() outside xfrm_policy_flush(). v3: Fix subject mismatch after v2 fix. Fixes: 35db06912189 ("xfrm: do the garbage collection after flushing policy") Signed-off-by: Hangbin Liu Reviewed-by: Xin Long Signed-off-by: Steffen Klassert Signed-off-by: Greg Kroah-Hartman commit 4d03c6171114ed45d1267cf5d0f49d832de1e917 Author: Yossi Kuperman Date: Thu Jun 22 11:37:10 2017 +0300 xfrm6: Fix IPv6 payload_len in xfrm6_transport_finish commit 7c88e21aefcf86fb41b48b2e04528db5a30fbe18 upstream. IPv6 payload length indicates the size of the payload, including any extension headers. In xfrm6_transport_finish, ipv6_hdr(skb)->payload_len is set to the payload size only, regardless of the presence of any extension headers. After ESP GRO transport mode decapsulation, ipv6_rcv trims the packet according to the wrong payload_len, thus corrupting the packet. Set payload_len to account for extension headers as well. Fixes: 7785bba299a8 ("esp: Add a software GRO codepath") Signed-off-by: Yossi Kuperman Signed-off-by: Steffen Klassert Signed-off-by: Greg Kroah-Hartman commit c4ed418dc93b6d44b21acc3dbe060ac1be921b79 Author: Juergen Gross Date: Thu May 18 17:28:48 2017 +0200 xen/blkback: don't free be structure too early commit 71df1d7ccad1c36f7321d6b3b48f2ea42681c363 upstream. The be structure must not be freed when freeing the blkif structure isn't done. Otherwise a use-after-free of be when unmapping the ring used for communicating with the frontend will occur in case of a late call of xenblk_disconnect() (e.g. due to an I/O still active when trying to disconnect). Signed-off-by: Juergen Gross Tested-by: Steven Haigh Acked-by: Roger Pau Monné Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman commit f21d9eda322289bce95ade2b79f9cef314ab2b91 Author: Ard Biesheuvel Date: Fri Jun 23 15:08:41 2017 -0700 mm/vmalloc.c: huge-vmap: fail gracefully on unexpected huge vmap mappings commit 029c54b09599573015a5c18dbe59cbdf42742237 upstream. Existing code that uses vmalloc_to_page() may assume that any address for which is_vmalloc_addr() returns true may be passed into vmalloc_to_page() to retrieve the associated struct page. This is not un unreasonable assumption to make, but on architectures that have CONFIG_HAVE_ARCH_HUGE_VMAP=y, it no longer holds, and we need to ensure that vmalloc_to_page() does not go off into the weeds trying to dereference huge PUDs or PMDs as table entries. Given that vmalloc() and vmap() themselves never create huge mappings or deal with compound pages at all, there is no correct answer in this case, so return NULL instead, and issue a warning. When reading /proc/kcore on arm64, you will hit an oops as soon as you hit the huge mappings used for the various segments that make up the mapping of vmlinux. With this patch applied, you will no longer hit the oops, but the kcore contents willl be incorrect (these regions will be zeroed out) We are fixing this for kcore specifically, so it avoids vread() for those regions. At least one other problematic user exists, i.e., /dev/kmem, but that is currently broken on arm64 for other reasons. Link: http://lkml.kernel.org/r/20170609082226.26152-1-ard.biesheuvel@linaro.org Signed-off-by: Ard Biesheuvel Acked-by: Mark Rutland Reviewed-by: Laura Abbott Cc: Michal Hocko Cc: zhong jiang Cc: Dave Hansen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 0ec03ce7d79dc9a5c47d26bab38c78075d42de9c Author: Thomas Gleixner Date: Tue May 23 23:23:32 2017 +0200 pinctrl/amd: Use regular interrupt instead of chained commit ba714a9c1dea85e0bf2899d02dfeb9c70040427c upstream. The AMD pinctrl driver uses a chained interrupt to demultiplex the GPIO interrupts. Kevin Vandeventer reported, that his new AMD Ryzen locks up hard on boot when the AMD pinctrl driver is initialized. The reason is an interrupt storm. It's not clear whether that's caused by hardware or firmware or both. Using chained interrupts on X86 is a dangerous endavour. If a system is misconfigured or the hardware buggy there is no safety net to catch an interrupt storm. Convert the driver to use a regular interrupt for the demultiplex handler. This allows the interrupt storm detector to catch the malfunction and lets the system boot up. This should be backported to stable because it's likely that more users run into this problem as the AMD Ryzen machines are spreading. Reported-by: Kevin Vandeventer Link: https://bugzilla.suse.com/show_bug.cgi?id=1034261 Signed-off-by: Thomas Gleixner Signed-off-by: Linus Walleij Signed-off-by: Borislav Petkov Signed-off-by: Greg Kroah-Hartman commit d9efa9db58338c359b56ce0fe9435fc201bffe03 Author: Baoquan He Date: Thu May 4 10:25:47 2017 +0800 x86/mm: Fix boot crash caused by incorrect loop count calculation in sync_global_pgds() commit fc5f9d5f151c9fff21d3d1d2907b888a5aec3ff7 upstream. Jeff Moyer reported that on his system with two memory regions 0~64G and 1T~1T+192G, and kernel option "memmap=192G!1024G" added, enabling KASLR will make the system hang intermittently during boot. While adding 'nokaslr' won't. The back trace is: Oops: 0000 [#1] SMP RIP: memcpy_erms() [ .... ] Call Trace: pmem_rw_page() bdev_read_page() do_mpage_readpage() mpage_readpages() blkdev_readpages() __do_page_cache_readahead() force_page_cache_readahead() page_cache_sync_readahead() generic_file_read_iter() blkdev_read_iter() __vfs_read() vfs_read() SyS_read() entry_SYSCALL_64_fastpath() This crash happens because the for loop count calculation in sync_global_pgds() is not correct. When a mapping area crosses PGD entries, we should calculate the starting address of region which next PGD covers and assign it to next for loop count, but not add PGDIR_SIZE directly. The old code works right only if the mapping area is an exact multiple of PGDIR_SIZE, otherwize the end region could be skipped so that it can't be synchronized to all other processes from kernel PGD init_mm.pgd. In Jeff's system, emulated pmem area [1024G, 1216G) is smaller than PGDIR_SIZE. While 'nokaslr' works because PAGE_OFFSET is 1T aligned, it makes this area be mapped inside one PGD entry. With KASLR enabled, this area could cross two PGD entries, then the next PGD entry won't be synced to all other processes. That is why we saw empty PGD. Fix it. Reported-by: Jeff Moyer Signed-off-by: Baoquan He Cc: Andrew Morton Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dan Williams Cc: Dave Hansen Cc: Dave Young Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Jinbum Park Cc: Josh Poimboeuf Cc: Kees Cook Cc: Kirill A. Shutemov Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Garnier Cc: Thomas Gleixner Cc: Yasuaki Ishimatsu Cc: Yinghai Lu Link: http://lkml.kernel.org/r/1493864747-8506-1-git-send-email-bhe@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Dan Williams Signed-off-by: Greg Kroah-Hartman commit a340661a36d529038a2f8144b5fb2b132d227ebc Author: Vallish Vaidyeshwara Date: Fri Jun 23 18:53:06 2017 +0000 dm thin: do not queue freed thin mapping for next stage processing commit 00a0ea33b495ee6149bf5a77ac5807ce87323abb upstream. process_prepared_discard_passdown_pt1() should cleanup dm_thin_new_mapping in cases of error. dm_pool_inc_data_range() can fail trying to get a block reference: metadata operation 'dm_pool_inc_data_range' failed: error = -61 When dm_pool_inc_data_range() fails, dm thin aborts current metadata transaction and marks pool as PM_READ_ONLY. Memory for thin mapping is released as well. However, current thin mapping will be queued onto next stage as part of queue_passdown_pt2() or passdown_endio(). This dangling thin mapping memory when processed and accessed in next stage will lead to device mapper crashing. Code flow without fix: -> process_prepared_discard_passdown_pt1(m) -> dm_thin_remove_range() -> discard passdown --> passdown_endio(m) queues m onto next stage -> dm_pool_inc_data_range() fails, frees memory m but does not remove it from next stage queue -> process_prepared_discard_passdown_pt2(m) -> processes freed memory m and crashes One such stack: Call Trace: [] dm_cell_release_no_holder+0x2f/0x70 [dm_bio_prison] [] cell_defer_no_holder+0x3c/0x80 [dm_thin_pool] [] process_prepared_discard_passdown_pt2+0x4b/0x90 [dm_thin_pool] [] process_prepared+0x81/0xa0 [dm_thin_pool] [] do_worker+0xc5/0x820 [dm_thin_pool] [] ? __schedule+0x244/0x680 [] ? pwq_activate_delayed_work+0x42/0xb0 [] process_one_work+0x153/0x3f0 [] worker_thread+0x12b/0x4b0 [] ? rescuer_thread+0x350/0x350 [] kthread+0xca/0xe0 [] ? kthread_park+0x60/0x60 [] ret_from_fork+0x25/0x30 The fix is to first take the block ref count for discarded block and then do a passdown discard of this block. If block ref count fails, then bail out aborting current metadata transaction, mark pool as PM_READ_ONLY and also free current thin mapping memory (existing error handling code) without queueing this thin mapping onto next stage of processing. If block ref count succeeds, then passdown discard of this block. Discard callback of passdown_endio() will queue this thin mapping onto next stage of processing. Code flow with fix: -> process_prepared_discard_passdown_pt1(m) -> dm_thin_remove_range() -> dm_pool_inc_data_range() --> if fails, free memory m and bail out -> discard passdown --> passdown_endio(m) queues m onto next stage Reviewed-by: Eduardo Valentin Reviewed-by: Cristian Gafton Reviewed-by: Anchal Agarwal Signed-off-by: Vallish Vaidyeshwara Reviewed-by: Joe Thornber Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman commit d7822ccb7fdcc36ebb96fe4a25902b8358f14fa0 Author: Deepak Rawat Date: Mon Jun 26 14:39:08 2017 +0200 drm/vmwgfx: Free hash table allocated by cmdbuf managed res mgr commit 82fcee526ba8ca2c5d378bdf51b21b7eb058fe3a upstream. The hash table created during vmw_cmdbuf_res_man_create was never freed. This causes memory leak in context creation. Added the corresponding drm_ht_remove in vmw_cmdbuf_res_man_destroy. Tested for memory leak by running piglit overnight and kernel memory is not inflated which earlier was. Signed-off-by: Deepak Rawat Reviewed-by: Sinclair Yeh Signed-off-by: Thomas Hellstrom Signed-off-by: Greg Kroah-Hartman commit bdbe850337374f8aeb49a2d40b9cfe277bbdfcb1 Author: Kan Liang Date: Thu Jun 29 12:09:26 2017 -0700 perf/x86/intel/uncore: Fix wrong box pointer check commit 80c65fdb4c6920e332a9781a3de5877594b07522 upstream. Should not init a NULL box. It will cause system crash. The issue looks like caused by a typo. This was not noticed because there is no NULL box. Also, for most boxes, they are enabled by default. The init code is not critical. Fixes: fff4b87e594a ("perf/x86/intel/uncore: Make package handling more robust") Signed-off-by: Kan Liang Signed-off-by: Thomas Gleixner Link: http://lkml.kernel.org/r/20170629190926.2456-1-kan.liang@intel.com Signed-off-by: Greg Kroah-Hartman commit 60c9685b4287d62fe2621c3f27abc9e4b2906f28 Author: Vikas Shivappa Date: Mon Jun 26 11:55:49 2017 -0700 x86/intel_rdt: Fix memory leak on mount failure commit 79298acc4ba097e9ab78644e3e38902d73547c92 upstream. If mount fails, the kn_info directory is not freed causing memory leak. Add the missing error handling path. Fixes: 4e978d06dedb ("x86/intel_rdt: Add "info" files to resctrl file system") Signed-off-by: Vikas Shivappa Signed-off-by: Thomas Gleixner Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: vikas.shivappa@intel.com Cc: andi.kleen@intel.com Link: http://lkml.kernel.org/r/1498503368-20173-3-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Greg Kroah-Hartman commit dfa1f2448715d0e2b8229f8d43b9053bb8221cd2 Author: Bartosz Golaszewski Date: Fri Jun 23 13:45:16 2017 +0200 gpiolib: fix filtering out unwanted events commit ad537b822577fcc143325786cd6ad50d7b9df31c upstream. GPIOEVENT_REQUEST_BOTH_EDGES is not a single flag, but a binary OR of GPIOEVENT_REQUEST_RISING_EDGE and GPIOEVENT_REQUEST_FALLING_EDGE. The expression 'le->eflags & GPIOEVENT_REQUEST_BOTH_EDGES' we'll get evaluated to true even if only one event type was requested. Fix it by checking both RISING & FALLING flags explicitly. Fixes: 61f922db7221 ("gpio: userspace ABI for reading GPIO line events") Signed-off-by: Bartosz Golaszewski Signed-off-by: Linus Walleij Signed-off-by: Greg Kroah-Hartman commit 515a95fafafe0d98ca6c35af0674f2aaa7e4d0b9 Author: Miklos Szeredi Date: Wed Jun 28 13:41:22 2017 +0200 ovl: copy-up: don't unlock between lookup and link commit e85f82ff9b8ef503923a3be8ca6b5fd1908a7f3f upstream. Nothing prevents mischief on upper layer while we are busy copying up the data. Move the lookup right before the looked up dentry is actually used. Signed-off-by: Miklos Szeredi Fixes: 01ad3eb8a073 ("ovl: concurrent copy up of regular files") Signed-off-by: Greg Kroah-Hartman commit 003192c3d3e3c38cc604430266e2fe3c5ef5302c Author: Benjamin Coddington Date: Fri Jun 16 11:12:59 2017 -0400 Revert "NFS: nfs_rename() handle -ERESTARTSYS dentry left behind" commit d9f2950006f110f54444a10442752372ee568289 upstream. This reverts commit 920b4530fb80430ff30ef83efe21ba1fa5623731 which could call d_move() without holding the directory's i_mutex, and reverts commit d4ea7e3c5c0e341c15b073016dbf3ab6c65f12f3 "NFS: Fix old dentry rehash after move", which was a follow-up fix. Signed-off-by: Benjamin Coddington Fixes: 920b4530fb80 ("NFS: nfs_rename() handle -ERESTARTSYS dentry left behind") Reviewed-by: Jeff Layton Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit 95b2e0882b4236b9c2884e38a8c070e8c3cc436a Author: Trond Myklebust Date: Tue Jun 27 17:33:38 2017 -0400 NFSv4.1: Fix a race in nfs4_proc_layoutget commit bd171930e6a3de4f5cffdafbb944e50093dfb59b upstream. If the task calling layoutget is signalled, then it is possible for the calls to nfs4_sequence_free_slot() and nfs4_layoutget_prepare() to race, in which case we leak a slot. The fix is to move the call to nfs4_sequence_free_slot() into the nfs4_layoutget_release() so that it gets called at task teardown time. Fixes: 2e80dbe7ac51 ("NFSv4.1: Close callback races for OPEN, LAYOUTGET...") Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit f8da5dee0901ff291a1a99e2c37a23617d8a52ea Author: Benjamin Coddington Date: Fri Jun 2 11:21:34 2017 -0400 NFSv4.2: Don't send mode again in post-EXCLUSIVE4_1 SETATTR with umask commit 501e7a4689378f8b1690089bfdd4f1e12ec22903 upstream. Now that we have umask support, we shouldn't re-send the mode in a SETATTR following an exclusive CREATE, or we risk having the same problem fixed in commit 5334c5bdac92 ("NFS: Send attributes in OPEN request for NFS4_CREATE_EXCLUSIVE4_1"), which is that files with S_ISGID will have that bit stripped away. Signed-off-by: Benjamin Coddington Fixes: dff25ddb4808 ("nfs: add support for the umask attribute") Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit f521e0bfcbff93128bffba99343f03c7873a538e Author: Hui Wang Date: Wed Jun 28 08:59:16 2017 +0800 ALSA: hda - set input_path bitmap to zero after moving it to new place commit a8f20fd25bdce81a8e41767c39f456d346b63427 upstream. Recently we met a problem, the codec has valid adcs and input pins, and they can form valid input paths, but the driver does not build valid controls for them like "Mic boost", "Capture Volume" and "Capture Switch". Through debugging, I found the driver needs to shrink the invalid adcs and input paths for this machine, so it will move the whole column bitmap value to the previous column, after moving it, the driver forgets to set the original column bitmap value to zero, as a result, the driver will invalidate the path whose index value is the original colume bitmap value. After executing this function, all valid input paths are invalidated by a mistake, there are no any valid input paths, so the driver won't build controls for them. Fixes: 3a65bcdc577a ("ALSA: hda - Fix inconsistent input_paths after ADC reduction") Signed-off-by: Hui Wang Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit a189e2f294544f539d30a72bf5bc1bc045915339 Author: Takashi Iwai Date: Wed Jun 28 12:02:02 2017 +0200 ALSA: hda - Fix endless loop of codec configure commit d94815f917da770d42c377786dc428f542e38f71 upstream. azx_codec_configure() loops over the codecs found on the given controller via a linked list. The code used to work in the past, but in the current version, this may lead to an endless loop when a codec binding returns an error. The culprit is that the snd_hda_codec_configure() unregisters the device upon error, and this eventually deletes the given codec object from the bus. Since the list is initialized via list_del_init(), the next object points to the same device itself. This behavior change was introduced at splitting the HD-audio code code, and forgotten to adapt it here. For fixing this bug, just use a *_safe() version of list iteration. Fixes: d068ebc25e6e ("ALSA: hda - Move some codes up to hdac_bus struct") Reported-by: Daniel Vetter Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit f1a8a4fe832433ea40927bcff6b1a83ec6327242 Author: Paul Burton Date: Fri Mar 3 15:26:05 2017 -0800 MIPS: Fix IRQ tracing & lockdep when rescheduling commit d8550860d910c6b7b70f830f59003b33daaa52c9 upstream. When the scheduler sets TIF_NEED_RESCHED & we call into the scheduler from arch/mips/kernel/entry.S we disable interrupts. This is true regardless of whether we reach work_resched from syscall_exit_work, resume_userspace or by looping after calling schedule(). Although we disable interrupts in these paths we don't call trace_hardirqs_off() before calling into C code which may acquire locks, and we therefore leave lockdep with an inconsistent view of whether interrupts are disabled or not when CONFIG_PROVE_LOCKING & CONFIG_DEBUG_LOCKDEP are both enabled. Without tracing this interrupt state lockdep will print warnings such as the following once a task returns from a syscall via syscall_exit_partial with TIF_NEED_RESCHED set: [ 49.927678] ------------[ cut here ]------------ [ 49.934445] WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:3687 check_flags.part.41+0x1dc/0x1e8 [ 49.946031] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled) [ 49.946355] CPU: 0 PID: 1 Comm: init Not tainted 4.10.0-00439-gc9fd5d362289-dirty #197 [ 49.963505] Stack : 0000000000000000 ffffffff81bb5d6a 0000000000000006 ffffffff801ce9c4 [ 49.974431] 0000000000000000 0000000000000000 0000000000000000 000000000000004a [ 49.985300] ffffffff80b7e487 ffffffff80a24498 a8000000ff160000 ffffffff80ede8b8 [ 49.996194] 0000000000000001 0000000000000000 0000000000000000 0000000077c8030c [ 50.007063] 000000007fd8a510 ffffffff801cd45c 0000000000000000 a8000000ff127c88 [ 50.017945] 0000000000000000 ffffffff801cf928 0000000000000001 ffffffff80a24498 [ 50.028827] 0000000000000000 0000000000000001 0000000000000000 0000000000000000 [ 50.039688] 0000000000000000 a8000000ff127bd0 0000000000000000 ffffffff805509bc [ 50.050575] 00000000140084e0 0000000000000000 0000000000000000 0000000000040a00 [ 50.061448] 0000000000000000 ffffffff8010e1b0 0000000000000000 ffffffff805509bc [ 50.072327] ... [ 50.076087] Call Trace: [ 50.079869] [] show_stack+0x80/0xa8 [ 50.086577] [] dump_stack+0x10c/0x190 [ 50.093498] [] __warn+0xf0/0x108 [ 50.099889] [] warn_slowpath_fmt+0x3c/0x48 [ 50.107241] [] check_flags.part.41+0x1dc/0x1e8 [ 50.114961] [] lock_is_held_type+0x8c/0xb0 [ 50.122291] [] __schedule+0x8c0/0x10f8 [ 50.129221] [] schedule+0x30/0x98 [ 50.135659] [] work_resched+0x8/0x34 [ 50.142397] ---[ end trace 0cb4f6ef5b99fe21 ]--- [ 50.148405] possible reason: unannotated irqs-off. [ 50.154600] irq event stamp: 400463 [ 50.159566] hardirqs last enabled at (400463): [] _raw_spin_unlock_irqrestore+0x40/0xa8 [ 50.171981] hardirqs last disabled at (400462): [] _raw_spin_lock_irqsave+0x30/0xb0 [ 50.183897] softirqs last enabled at (400450): [] __do_softirq+0x4ac/0x6a8 [ 50.195015] softirqs last disabled at (400425): [] irq_exit+0x110/0x128 Fix this by using the TRACE_IRQS_OFF macro to call trace_hardirqs_off() when CONFIG_TRACE_IRQFLAGS is enabled. This is done before invoking schedule() following the work_resched label because: 1) Interrupts are disabled regardless of the path we take to reach work_resched() & schedule(). 2) Performing the tracing here avoids the need to do it in paths which disable interrupts but don't call out to C code before hitting a path which uses the RESTORE_SOME macro that will call trace_hardirqs_on() or trace_hardirqs_off() as appropriate. We call trace_hardirqs_on() using the TRACE_IRQS_ON macro before calling syscall_trace_leave() for similar reasons, ensuring that lockdep has a consistent view of state after we re-enable interrupts. Signed-off-by: Paul Burton Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/15385/ Signed-off-by: Ralf Baechle Signed-off-by: Greg Kroah-Hartman commit 5033e622480a88441a76af48d58fd9202e5acec8 Author: Paul Burton Date: Thu Mar 2 14:02:40 2017 -0800 MIPS: pm-cps: Drop manual cache-line alignment of ready_count commit 161c51ccb7a6faf45ffe09aa5cf1ad85ccdad503 upstream. We allocate memory for a ready_count variable per-CPU, which is accessed via a cached non-coherent TLB mapping to perform synchronisation between threads within the core using LL/SC instructions. In order to ensure that the variable is contained within its own data cache line we allocate 2 lines worth of memory & align the resulting pointer to a line boundary. This is however unnecessary, since kmalloc is guaranteed to return memory which is at least cache-line aligned (see ARCH_DMA_MINALIGN). Stop the redundant manual alignment. Besides cleaning up the code & avoiding needless work, this has the side effect of avoiding an arithmetic error found by Bryan on 64 bit systems due to the 32 bit size of the former dlinesz. This led the ready_count variable to have its upper 32b cleared erroneously for MIPS64 kernels, causing problems when ready_count was later used on MIPS64 via cpuidle. Signed-off-by: Paul Burton Fixes: 3179d37ee1ed ("MIPS: pm-cps: add PM state entry code for CPS systems") Reported-by: Bryan O'Donoghue Reviewed-by: Bryan O'Donoghue Tested-by: Bryan O'Donoghue Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/15383/ Signed-off-by: Ralf Baechle Signed-off-by: Greg Kroah-Hartman commit dba185d7a5d6be478c9c4d04d7a7ae683fd7b36e Author: James Hogan Date: Thu Jun 29 15:05:04 2017 +0100 MIPS: Avoid accidental raw backtrace commit 854236363370995a609a10b03e35fd3dc5e9e4a1 upstream. Since commit 81a76d7119f6 ("MIPS: Avoid using unwind_stack() with usermode") show_backtrace() invokes the raw backtracer when cp0_status & ST0_KSU indicates user mode to fix issues on EVA kernels where user and kernel address spaces overlap. However this is used by show_stack() which creates its own pt_regs on the stack and leaves cp0_status uninitialised in most of the code paths. This results in the non deterministic use of the raw back tracer depending on the previous stack content. show_stack() deals exclusively with kernel mode stacks anyway, so explicitly initialise regs.cp0_status to KSU_KERNEL (i.e. 0) to ensure we get a useful backtrace. Fixes: 81a76d7119f6 ("MIPS: Avoid using unwind_stack() with usermode") Signed-off-by: James Hogan Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/16656/ Signed-off-by: Ralf Baechle Signed-off-by: Greg Kroah-Hartman commit 9de0e07dfb9d6d479a3d48492902d8b9bb59eb4d Author: Karl Beldan Date: Tue Jun 27 19:22:16 2017 +0000 MIPS: head: Reorder instructions missing a delay slot commit 25d8b92e0af75d72ce8b99e63e5a449cc0888efa upstream. In this sequence the 'move' is assumed in the delay slot of the 'beq', but head.S is in reorder mode and the former gets pushed one 'nop' farther by the assembler. The corrected behavior made booting with an UHI supplied dtb erratic. Fixes: 15f37e158892 ("MIPS: store the appended dtb address in a variable") Signed-off-by: Karl Beldan Reviewed-by: James Hogan Cc: Jonas Gorski Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/16614/ Signed-off-by: Ralf Baechle Signed-off-by: Greg Kroah-Hartman commit c806e0188da8a36c141219bc7cf02de7803f974f Author: Juergen Gross Date: Thu May 18 17:28:49 2017 +0200 xen/blkback: don't use xen_blkif_get() in xen-blkback kthread commit a24fa22ce22ae302b3bf8f7008896d52d5d57b8d upstream. There is no need to use xen_blkif_get()/xen_blkif_put() in the kthread of xen-blkback. Thread stopping is synchronous and using the blkif reference counting in the kthread will avoid to ever let the reference count drop to zero at the end of an I/O running concurrent to disconnecting and multiple rings. Setting ring->xenblkd to NULL after stopping the kthread isn't needed as the kthread does this already. Signed-off-by: Juergen Gross Tested-by: Steven Haigh Acked-by: Roger Pau Monné Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman commit 887e338c2e830419c9dfd3890d8524eee0d58bd4 Author: Kinglong Mee Date: Thu Apr 27 11:13:38 2017 +0800 NFSv4.x/callback: Create the callback service through svc_create_pooled commit df807fffaabde625fa9adb82e3e5b88cdaa5709a upstream. As the comments for svc_set_num_threads() said, " Destroying threads relies on the service threads filling in rqstp->rq_task, which only the nfs ones do. Assumes the serv has been created using svc_create_pooled()." If creating service through svc_create(), the svc_pool_map_put() will be called in svc_destroy(), but the pool map isn't used. So that, the reference of pool map will be drop, the next using of pool map will get a zero npools. [ 137.992130] divide error: 0000 [#1] SMP [ 137.992148] Modules linked in: nfsd(E) nfsv4 nfs fscache fuse tun bridge stp llc ip_set nfnetlink vmw_vsock_vmci_transport vsock snd_seq_midi snd_seq_midi_event vmw_balloon coretemp crct10dif_pclmul crc32_pclmul ppdev ghash_clmulni_intel intel_rapl_perf joydev snd_ens1371 gameport snd_ac97_codec ac97_bus snd_seq snd_pcm snd_rawmidi snd_timer snd_seq_device snd soundcore parport_pc parport nfit acpi_cpufreq tpm_tis tpm_tis_core tpm vmw_vmci i2c_piix4 shpchp auth_rpcgss nfs_acl lockd(E) grace sunrpc(E) xfs libcrc32c vmwgfx drm_kms_helper ttm crc32c_intel drm e1000 mptspi scsi_transport_spi serio_raw mptscsih mptbase ata_generic pata_acpi [last unloaded: nfsd] [ 137.992336] CPU: 0 PID: 4514 Comm: rpc.nfsd Tainted: G E 4.11.0-rc8+ #536 [ 137.992777] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015 [ 137.993757] task: ffff955984101d00 task.stack: ffff9873c2604000 [ 137.994231] RIP: 0010:svc_pool_for_cpu+0x2b/0x80 [sunrpc] [ 137.994768] RSP: 0018:ffff9873c2607c18 EFLAGS: 00010246 [ 137.995227] RAX: 0000000000000000 RBX: ffff95598376f000 RCX: 0000000000000002 [ 137.995673] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9559944aec00 [ 137.996156] RBP: ffff9873c2607c18 R08: ffff9559944aec28 R09: 0000000000000000 [ 137.996609] R10: 0000000001080002 R11: 0000000000000000 R12: ffff95598376f010 [ 137.997063] R13: ffff95598376f018 R14: ffff9559944aec28 R15: ffff9559944aec00 [ 137.997584] FS: 00007f755529eb40(0000) GS:ffff9559bb600000(0000) knlGS:0000000000000000 [ 137.998048] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 137.998548] CR2: 000055f3aecd9660 CR3: 0000000084290000 CR4: 00000000001406f0 [ 137.999052] Call Trace: [ 137.999517] svc_xprt_do_enqueue+0xef/0x260 [sunrpc] [ 138.000028] svc_xprt_received+0x47/0x90 [sunrpc] [ 138.000487] svc_add_new_perm_xprt+0x76/0x90 [sunrpc] [ 138.000981] svc_addsock+0x14b/0x200 [sunrpc] [ 138.001424] ? recalc_sigpending+0x1b/0x50 [ 138.001860] ? __getnstimeofday64+0x41/0xd0 [ 138.002346] ? do_gettimeofday+0x29/0x90 [ 138.002779] write_ports+0x255/0x2c0 [nfsd] [ 138.003202] ? _copy_from_user+0x4e/0x80 [ 138.003676] ? write_recoverydir+0x100/0x100 [nfsd] [ 138.004098] nfsctl_transaction_write+0x48/0x80 [nfsd] [ 138.004544] __vfs_write+0x37/0x160 [ 138.004982] ? selinux_file_permission+0xd7/0x110 [ 138.005401] ? security_file_permission+0x3b/0xc0 [ 138.005865] vfs_write+0xb5/0x1a0 [ 138.006267] SyS_write+0x55/0xc0 [ 138.006654] entry_SYSCALL_64_fastpath+0x1a/0xa9 [ 138.007071] RIP: 0033:0x7f7554b9dc30 [ 138.007437] RSP: 002b:00007ffc9f92c788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 138.007807] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f7554b9dc30 [ 138.008168] RDX: 0000000000000002 RSI: 00005640cd536640 RDI: 0000000000000003 [ 138.008573] RBP: 00007ffc9f92c780 R08: 0000000000000001 R09: 0000000000000002 [ 138.008918] R10: 0000000000000064 R11: 0000000000000246 R12: 0000000000000004 [ 138.009254] R13: 00005640cdbf77a0 R14: 00005640cdbf7720 R15: 00007ffc9f92c238 [ 138.009610] Code: 0f 1f 44 00 00 48 8b 87 98 00 00 00 55 48 89 e5 48 83 78 08 00 74 10 8b 05 07 42 02 00 83 f8 01 74 40 83 f8 02 74 19 31 c0 31 d2 b7 88 00 00 00 5d 89 d0 48 c1 e0 07 48 03 87 90 00 00 00 c3 [ 138.010664] RIP: svc_pool_for_cpu+0x2b/0x80 [sunrpc] RSP: ffff9873c2607c18 [ 138.011061] ---[ end trace b3468224cafa7d11 ]--- Signed-off-by: Kinglong Mee Signed-off-by: J. Bruce Fields Signed-off-by: Greg Kroah-Hartman commit a3042f8a0cdc7b213a8960bd731a97bf17d084c6 Author: Eric Leblond Date: Thu May 11 18:56:38 2017 +0200 netfilter: synproxy: fix conntrackd interaction commit 87e94dbc210a720a34be5c1174faee5c84be963e upstream. This patch fixes the creation of connection tracking entry from netlink when synproxy is used. It was missing the addition of the synproxy extension. This was causing kernel crashes when a conntrack entry created by conntrackd was used after the switch of traffic from active node to the passive node. Signed-off-by: Eric Leblond Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit f19613afaf2622f5854a2266e2aa9abc9f9d145a Author: Serhey Popovych Date: Tue Jun 20 14:35:23 2017 +0300 rtnetlink: add IFLA_GROUP to ifla_policy [ Upstream commit db833d40ad3263b2ee3b59a1ba168bb3cfed8137 ] Network interface groups support added while ago, however there is no IFLA_GROUP attribute description in policy and netlink message size calculations until now. Add IFLA_GROUP attribute to the policy. Fixes: cbda10fa97d7 ("net_device: add support for network device groups") Signed-off-by: Serhey Popovych Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 4ade61f463634383f18c614a2b9812ce0f7ac448 Author: Serhey Popovych Date: Tue Jun 20 13:29:25 2017 +0300 ipv6: Do not leak throw route references [ Upstream commit 07f615574f8ac499875b21c1142f26308234a92c ] While commit 73ba57bfae4a ("ipv6: fix backtracking for throw routes") does good job on error propagation to the fib_rules_lookup() in fib rules core framework that also corrects throw routes handling, it does not solve route reference leakage problem happened when we return -EAGAIN to the fib_rules_lookup() and leave routing table entry referenced in arg->result. If rule with matched throw route isn't last matched in the list we overwrite arg->result losing reference on throw route stored previously forever. We also partially revert commit ab997ad40839 ("ipv6: fix the incorrect return value of throw route") since we never return routing table entry with dst.error == -EAGAIN when CONFIG_IPV6_MULTIPLE_TABLES is on. Also there is no point to check for RTF_REJECT flag since it is always set throw route. Fixes: 73ba57bfae4a ("ipv6: fix backtracking for throw routes") Signed-off-by: Serhey Popovych Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 16c4d1be8fe4a7039f992506d4a49e58b1ed05ee Author: Gao Feng Date: Fri Jun 16 15:00:02 2017 +0800 net: 8021q: Fix one possible panic caused by BUG_ON in free_netdev [ Upstream commit 9745e362add89432d2c951272a99b0a5fe4348a9 ] The register_vlan_device would invoke free_netdev directly, when register_vlan_dev failed. It would trigger the BUG_ON in free_netdev if the dev was already registered. In this case, the netdev would be freed in netdev_run_todo later. So add one condition check now. Only when dev is not registered, then free it directly. The following is the part coredump when netdev_upper_dev_link failed in register_vlan_dev. I removed the lines which are too long. [ 411.237457] ------------[ cut here ]------------ [ 411.237458] kernel BUG at net/core/dev.c:7998! [ 411.237484] invalid opcode: 0000 [#1] SMP [ 411.237705] [last unloaded: 8021q] [ 411.237718] CPU: 1 PID: 12845 Comm: vconfig Tainted: G E 4.12.0-rc5+ #6 [ 411.237737] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015 [ 411.237764] task: ffff9cbeb6685580 task.stack: ffffa7d2807d8000 [ 411.237782] RIP: 0010:free_netdev+0x116/0x120 [ 411.237794] RSP: 0018:ffffa7d2807dbdb0 EFLAGS: 00010297 [ 411.237808] RAX: 0000000000000002 RBX: ffff9cbeb6ba8fd8 RCX: 0000000000001878 [ 411.237826] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 0000000000000000 [ 411.237844] RBP: ffffa7d2807dbdc8 R08: 0002986100029841 R09: 0002982100029801 [ 411.237861] R10: 0004000100029980 R11: 0004000100029980 R12: ffff9cbeb6ba9000 [ 411.238761] R13: ffff9cbeb6ba9060 R14: ffff9cbe60f1a000 R15: ffff9cbeb6ba9000 [ 411.239518] FS: 00007fb690d81700(0000) GS:ffff9cbebb640000(0000) knlGS:0000000000000000 [ 411.239949] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 411.240454] CR2: 00007f7115624000 CR3: 0000000077cdf000 CR4: 00000000003406e0 [ 411.240936] Call Trace: [ 411.241462] vlan_ioctl_handler+0x3f1/0x400 [8021q] [ 411.241910] sock_ioctl+0x18b/0x2c0 [ 411.242394] do_vfs_ioctl+0xa1/0x5d0 [ 411.242853] ? sock_alloc_file+0xa6/0x130 [ 411.243465] SyS_ioctl+0x79/0x90 [ 411.243900] entry_SYSCALL_64_fastpath+0x1e/0xa9 [ 411.244425] RIP: 0033:0x7fb69089a357 [ 411.244863] RSP: 002b:00007ffcd04e0fc8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [ 411.245445] RAX: ffffffffffffffda RBX: 00007ffcd04e2884 RCX: 00007fb69089a357 [ 411.245903] RDX: 00007ffcd04e0fd0 RSI: 0000000000008983 RDI: 0000000000000003 [ 411.246527] RBP: 00007ffcd04e0fd0 R08: 0000000000000000 R09: 1999999999999999 [ 411.246976] R10: 000000000000053f R11: 0000000000000202 R12: 0000000000000004 [ 411.247414] R13: 00007ffcd04e1128 R14: 00007ffcd04e2888 R15: 0000000000000001 [ 411.249129] RIP: free_netdev+0x116/0x120 RSP: ffffa7d2807dbdb0 Signed-off-by: Gao Feng Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit c207e0594f5dbb4c8d24118559c7142791360b88 Author: Wei Wang Date: Fri Jun 16 10:46:37 2017 -0700 decnet: always not take dst->__refcnt when inserting dst into hash table [ Upstream commit 76371d2e3ad1f84426a30ebcd8c3b9b98f4c724f ] In the existing dn_route.c code, dn_route_output_slow() takes dst->__refcnt before calling dn_insert_route() while dn_route_input_slow() does not take dst->__refcnt before calling dn_insert_route(). This makes the whole routing code very buggy. In dn_dst_check_expire(), dnrt_free() is called when rt expires. This makes the routes inserted by dn_route_output_slow() not able to be freed as the refcnt is not released. In dn_dst_gc(), dnrt_drop() is called to release rt which could potentially cause the dst->__refcnt to be dropped to -1. In dn_run_flush(), dst_free() is called to release all the dst. Again, it makes the dst inserted by dn_route_output_slow() not able to be released and also, it does not wait on the rcu and could potentially cause crash in the path where other users still refer to this dst. This patch makes sure both input and output path do not take dst->__refcnt before calling dn_insert_route() and also makes sure dnrt_free()/dst_free() is called when removing dst from the hash table. The only difference between those 2 calls is that dnrt_free() waits on the rcu while dst_free() does not. Signed-off-by: Wei Wang Acked-by: Martin KaFai Lau Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 941bdec095d55e0127b4e3bce6e0ea73ee9f5462 Author: Maor Dickman Date: Thu May 18 15:15:08 2017 +0300 net/mlx5e: Fix timestamping capabilities reporting [ Upstream commit f0b381178b01b831f9907d72f467d6443afdea67 ] Misuse of (BIT) macro caused to report wrong flags for "Hardware Transmit Timestamp Modes" and "Hardware Receive Filter Modes" Fixes: ef9814deafd0 ('net/mlx5e: Add HW timestamping (TS) support') Signed-off-by: Maor Dickman Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit f464aace786f4428549f36c4eb0e9da2d7e9b6f7 Author: Eli Cohen Date: Thu Jun 8 11:33:16 2017 -0500 net/mlx5: Wait for FW readiness before initializing command interface [ Upstream commit 6c780a0267b8a1075f40b39851132eeaefefcff5 ] Before attempting to initialize the command interface we must wait till the fw_initializing bit is clear. If we fail to meet this condition the hardware will drop our configuration, specifically the descriptors page address. This scenario can happen when the firmware is still executing an FLR flow and did not finish yet so the driver needs to wait for that to finish. Fixes: e3297246c2c8 ('net/mlx5_core: Wait for FW readiness on startup') Signed-off-by: Eli Cohen Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit c7d1260afbd09f0240f858620c708ab6f776f4b9 Author: Or Gerlitz Date: Thu Jun 15 20:08:32 2017 +0300 net/mlx5e: Avoid doing a cleanup call if the profile doesn't have it [ Upstream commit 31ac93386d135a6c96de9c8bab406f5ccabf5a4d ] The error flow of mlx5e_create_netdev calls the cleanup call of the given profile without checking if it exists, fix that. Currently the VF reps don't register that callback and we crash if getting into error -- can be reproduced by the user doing ctrl^C while attempting to change the sriov mode from legacy to switchdev. Fixes: 26e59d8077a3 '(net/mlx5e: Implement mlx5e interface attach/detach callbacks') Signed-off-by: Or Gerlitz Reported-by: Sabrina Dubroca Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit 050efbe12925bc9fe1fa2695afc2ba186c1f9cc2 Author: Chris Mi Date: Tue May 16 07:07:11 2017 -0400 net/mlx5e: Fix min inline value for VF rep SQs [ Upstream commit 5f195c2c5cba60241004146cd12d71451d6b0fc4 ] The offending commit only changed the code path for PF/VF, but it didn't take care of VF representors. As a result, since params->tx_min_inline_mode for VF representors is kzalloced to 0 (MLX5_INLINE_MODE_NONE), all VF reps SQs were set to that mode. This actually works on CX5 by default but broke CX4. Fix that by adding a call to query the min inline mode from the VF rep build up code. Fixes: a6f402e49901 ("net/mlx5e: Tx, no inline copy on ConnectX-5") Signed-off-by: Chris Mi Reviewed-by: Or Gerlitz Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit eb0d418f2e579a2e9de610368c3100fe58d2c0ba Author: Xin Long Date: Thu Jun 15 17:49:08 2017 +0800 sctp: return next obj by passing pos + 1 into sctp_transport_get_idx [ Upstream commit 988c7322116970696211e902b468aefec95b6ec4 ] In sctp_for_each_transport, pos is used to save how many objs it has dumped. Now it gets the last obj by sctp_transport_get_idx, then gets the next obj by sctp_transport_get_next. The issue is that in the meanwhile if some objs in transport hashtable are removed and the objs nums are less than pos, sctp_transport_get_idx would return NULL and hti.walker.tbl is NULL as well. At this moment it should stop hti, instead of continue getting the next obj. Or it would cause a NULL pointer dereference in sctp_transport_get_next. This patch is to pass pos + 1 into sctp_transport_get_idx to get the next obj directly, even if pos > objs nums, it would return NULL and stop hti. Fixes: 626d16f50f39 ("sctp: export some apis or variables for sctp_diag and reuse some for proc") Signed-off-by: Xin Long Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 247ab3c1c1f70add7e2f0a317bf679769677e53b Author: Xin Long Date: Thu Jun 15 16:33:58 2017 +0800 ipv6: fix calling in6_ifa_hold incorrectly for dad work [ Upstream commit f8a894b218138888542a5058d0e902378fd0d4ec ] Now when starting the dad work in addrconf_mod_dad_work, if the dad work is idle and queued, it needs to hold ifa. The problem is there's one gap in [1], during which if the pending dad work is removed elsewhere. It will miss to hold ifa, but the dad word is still idea and queue. if (!delayed_work_pending(&ifp->dad_work)) in6_ifa_hold(ifp); <--------------[1] mod_delayed_work(addrconf_wq, &ifp->dad_work, delay); An use-after-free issue can be caused by this. Chen Wei found this issue when WARN_ON(!hlist_unhashed(&ifp->addr_lst)) in net6_ifa_finish_destroy was hit because of it. As Hannes' suggestion, this patch is to fix it by holding ifa first in addrconf_mod_dad_work, then calling mod_delayed_work and putting ifa if the dad_work is already in queue. Note that this patch did not choose to fix it with: if (!mod_delayed_work(delay)) in6_ifa_hold(ifp); As with it, when delay == 0, dad_work would be scheduled immediately, all addrconf_mod_dad_work(0) callings had to be moved under ifp->lock. Reported-by: Wei Chen Suggested-by: Hannes Frederic Sowa Acked-by: Hannes Frederic Sowa Signed-off-by: Xin Long Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit bdfae324ba1552520f711f9db8ee9c7e81061541 Author: Jesper Dangaard Brouer Date: Wed Jun 14 13:27:37 2017 +0200 net: don't global ICMP rate limit packets originating from loopback [ Upstream commit 849a44de91636c24cea799cb8ad8c36433feb913 ] Florian Weimer seems to have a glibc test-case which requires that loopback interfaces does not get ICMP ratelimited. This was broken by commit c0303efeab73 ("net: reduce cycles spend on ICMP replies that gets rate limited"). An ICMP response will usually be routed back-out the same incoming interface. Thus, take advantage of this and skip global ICMP ratelimit when the incoming device is loopback. In the unlikely event that the outgoing it not loopback, due to strange routing policy rules, ICMP rate limiting still works via peer ratelimiting via icmpv4_xrlim_allow(). Thus, we should still comply with RFC1812 (section 4.3.2.8 "Rate Limiting"). This seems to fix the reproducer given by Florian. While still avoiding to perform expensive and unneeded outgoing route lookup for rate limited packets (in the non-loopback case). Fixes: c0303efeab73 ("net: reduce cycles spend on ICMP replies that gets rate limited") Reported-by: Florian Weimer Reported-by: "H.J. Lu" Signed-off-by: Jesper Dangaard Brouer Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 487dd0ab72ed0fb8a0a1347b4bd140313aa858cc Author: Bjørn Mork Date: Tue Jun 13 19:10:18 2017 +0200 qmi_wwan: new Telewell and Sierra device IDs [ Upstream commit 60cfe1eaccb8af598ebe1bdc44e157ea30fcdd81 ] A new Sierra Wireless EM7305 device ID used in a Toshiba laptop, and two Longcheer device IDs entries used by Telewell TW-3G HSPA+ branded modems. Reported-by: Petr Kloc Reported-by: Teemu Likonen Signed-off-by: Bjørn Mork Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 5113c2dcb96d7004ed023fbb11fd7ca21bfa6494 Author: WANG Cong Date: Tue Jun 20 10:46:27 2017 -0700 igmp: add a missing spin_lock_init() [ Upstream commit b4846fc3c8559649277e3e4e6b5cec5348a8d208 ] Andrey reported a lockdep warning on non-initialized spinlock: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 1 PID: 4099 Comm: a.out Not tainted 4.12.0-rc6+ #9 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 dump_stack+0x292/0x395 lib/dump_stack.c:52 register_lock_class+0x717/0x1aa0 kernel/locking/lockdep.c:755 ? 0xffffffffa0000000 __lock_acquire+0x269/0x3690 kernel/locking/lockdep.c:3255 lock_acquire+0x22d/0x560 kernel/locking/lockdep.c:3855 __raw_spin_lock_bh ./include/linux/spinlock_api_smp.h:135 _raw_spin_lock_bh+0x36/0x50 kernel/locking/spinlock.c:175 spin_lock_bh ./include/linux/spinlock.h:304 ip_mc_clear_src+0x27/0x1e0 net/ipv4/igmp.c:2076 igmpv3_clear_delrec+0xee/0x4f0 net/ipv4/igmp.c:1194 ip_mc_destroy_dev+0x4e/0x190 net/ipv4/igmp.c:1736 We miss a spin_lock_init() in igmpv3_add_delrec(), probably because previously we never use it on this code path. Since we already unlink it from the global mc_tomb list, it is probably safe not to acquire this spinlock here. It does not harm to have it although, to avoid conditional locking. Fixes: c38b7d327aaf ("igmp: acquire pmc lock for ip_mc_clear_src()") Reported-by: Andrey Konovalov Signed-off-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 20407b1d4a3a3bb09003e9ab7ce10aab27d9d91f Author: WANG Cong Date: Mon Jun 12 09:52:26 2017 -0700 igmp: acquire pmc lock for ip_mc_clear_src() [ Upstream commit c38b7d327aafd1e3ad7ff53eefac990673b65667 ] Andrey reported a use-after-free in add_grec(): for (psf = *psf_list; psf; psf = psf_next) { ... psf_next = psf->sf_next; where the struct ip_sf_list's were already freed by: kfree+0xe8/0x2b0 mm/slub.c:3882 ip_mc_clear_src+0x69/0x1c0 net/ipv4/igmp.c:2078 ip_mc_dec_group+0x19a/0x470 net/ipv4/igmp.c:1618 ip_mc_drop_socket+0x145/0x230 net/ipv4/igmp.c:2609 inet_release+0x4e/0x1c0 net/ipv4/af_inet.c:411 sock_release+0x8d/0x1e0 net/socket.c:597 sock_close+0x16/0x20 net/socket.c:1072 This happens because we don't hold pmc->lock in ip_mc_clear_src() and a parallel mr_ifc_timer timer could jump in and access them. The RCU lock is there but it is merely for pmc itself, this spinlock could actually ensure we don't access them in parallel. Thanks to Eric and Long for discussion on this bug. Reported-by: Andrey Konovalov Cc: Eric Dumazet Cc: Xin Long Signed-off-by: Cong Wang Reviewed-by: Xin Long Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 42b0540a98b19c0310780698a447e116aec98881 Author: Christian Perle Date: Mon Jun 12 10:06:57 2017 +0200 proc: snmp6: Use correct type in memset [ Upstream commit 3500cd73dff48f28f4ba80c171c4c80034d40f76 ] Reading /proc/net/snmp6 yields bogus values on 32 bit kernels. Use "u64" instead of "unsigned long" in sizeof(). Fixes: 4a4857b1c81e ("proc: Reduce cache miss in snmp6_seq_show") Signed-off-by: Christian Perle Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 8fc4f0e6c3049d4fcbb41ba15b00398542ab09b0 Author: Majd Dibbiny Date: Sun May 28 14:47:56 2017 +0300 net/mlx5: Enable 4K UAR only when page size is bigger than 4K [ Upstream commit 91828bd89940e8145f91751a015bc11bc486aad0 ] When the page size isn't bigger than 4K, there is no added value of enabling 4K UAR feature in the Firmware. Modified the condition of enabling the 4K UAR accordingly. Fixes: f502d834950a ("net/mlx5: Activate support for 4K UARs") Signed-off-by: Majd Dibbiny Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit f0e6f2314d662a1e6b6bd77e1746a17f744fdacb Author: Tal Gilboa Date: Mon May 29 17:02:55 2017 +0300 net/mlx5e: Fix wrong indications in DIM due to counter wraparound [ Upstream commit 53acd76ce571e3b71f9205f2d49ab285a9f1aad8 ] DIM (Dynamically-tuned Interrupt Moderation) is a mechanism designed for changing the channel interrupt moderation values in order to reduce CPU overhead for all traffic types. Each iteration of the algorithm, DIM calculates the difference in throughput, packet rate and interrupt rate from last iteration in order to make a decision. DIM relies on counters for each metric. When these counters get to their type's max value they wraparound. In this case the delta between 'end' and 'start' samples is negative and when translated to unsigned integers - very high. This results in a false indication to the algorithm and might result in a wrong decision. The fix calculates the 'distance' between 'end' and 'start' samples in a cyclic way around the relevant type's max value. It can also be viewed as an absolute value around the type's max value instead of around 0. Testing show higher stability in DIM profile selection and no wraparound issues. Fixes: cb3c7fd4f839 ("net/mlx5e: Support adaptive RX coalescing") Signed-off-by: Tal Gilboa Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit f98a0883afef8c2a20978891a0b3fb02af50ff62 Author: Tal Gilboa Date: Mon May 15 14:13:16 2017 +0300 net/mlx5e: Added BW check for DIM decision mechanism [ Upstream commit c3164d2fc48fd4fa0477ab658b644559c3fe9073 ] DIM (Dynamically-tuned Interrupt Moderation) is a mechanism designed for changing the channel interrupt moderation values in order to reduce CPU overhead for all traffic types. Until now only interrupt and packet rate were sampled. We found a scenario on which we get a false indication since a change in DIM caused more aggregation and reduced packet rate while increasing BW. We now regard a change as succesfull iff: current_BW > (prev_BW + threshold) or current_BW ~= prev_BW and current_PR > (prev_PR + threshold) or current_BW ~= prev_BW and current_PR ~= prev_PR and current_IR < (prev_IR - threshold) Where BW = Bandwidth, PR = Packet rate and IR = Interrupt rate Improvements (ConnectX-4Lx 25GbE, single RX queue, LRO off) -------------------------------------------------- packet size | before[Mb/s] | after[Mb/s] | gain | 2B | 343.4 | 359.4 | 4.5% | 16B | 2739.7 | 2814.8 | 2.7% | 64B | 9739 | 10185.3 | 4.5% | Fixes: cb3c7fd4f839 ("net/mlx5e: Support adaptive RX coalescing") Signed-off-by: Tal Gilboa Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit ff8cf391fed865f7e44c67e7181e51537b184af1 Author: Huy Nguyen Date: Mon May 8 11:46:50 2017 -0500 net/mlx5: Remove several module events out of ethtool stats [ Upstream commit f729860a177d097ac44321fb2f7d927a0c54c5a3 ] Remove the following module event counters out of ethtool stats. The reason for removing these event counters is that these events do not occur without techinician's intervention. module_pwr_budget_exd module_long_range module_no_eeprom module_enforce_part module_unknown_id module_unknown_status module_plug Fixes: bedb7c909c19 ("net/mlx5e: Add port module event counters to ethtool stats") Signed-off-by: Huy Nguyen Reviewed by: Gal Pressman Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit e938c05e064c67d1de62d0886ffe6267c5085564 Author: Jia-Ju Bai Date: Sat Jun 10 17:03:35 2017 +0800 net: tipc: Fix a sleep-in-atomic bug in tipc_msg_reverse [ Upstream commit 343eba69c6968190d8654b857aea952fed9a6749 ] The kernel may sleep under a rcu read lock in tipc_msg_reverse, and the function call path is: tipc_l2_rcv_msg (acquire the lock by rcu_read_lock) tipc_rcv tipc_sk_rcv tipc_msg_reverse pskb_expand_head(GFP_KERNEL) --> may sleep tipc_node_broadcast tipc_node_xmit_skb tipc_node_xmit tipc_sk_rcv tipc_msg_reverse pskb_expand_head(GFP_KERNEL) --> may sleep To fix it, "GFP_KERNEL" is replaced with "GFP_ATOMIC". Signed-off-by: Jia-Ju Bai Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit a89d15fc8bac19f5ae10b87d3a1df48d3457b102 Author: Jia-Ju Bai Date: Sat Jun 10 16:49:39 2017 +0800 net: caif: Fix a sleep-in-atomic bug in cfpkt_create_pfx [ Upstream commit f146e872eb12ebbe92d8e583b2637e0741440db3 ] The kernel may sleep under a rcu read lock in cfpkt_create_pfx, and the function call path is: cfcnfg_linkup_rsp (acquire the lock by rcu_read_lock) cfctrl_linkdown_req cfpkt_create cfpkt_create_pfx alloc_skb(GFP_KERNEL) --> may sleep cfserl_receive (acquire the lock by rcu_read_lock) cfpkt_split cfpkt_create_pfx alloc_skb(GFP_KERNEL) --> may sleep There is "in_interrupt" in cfpkt_create_pfx to decide use "GFP_KERNEL" or "GFP_ATOMIC". In this situation, "GFP_KERNEL" is used because the function is called under a rcu read lock, instead in interrupt. To fix it, only "GFP_ATOMIC" is used in cfpkt_create_pfx. Signed-off-by: Jia-Ju Bai Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 29b8ea35b0c1d4d6c4ab4e3c3ccfb774a22a3bb7 Author: Xin Long Date: Sat Jun 10 14:48:14 2017 +0800 sctp: disable BH in sctp_for_each_endpoint [ Upstream commit 581409dacc9176b0de1f6c4ca8d66e13aa8e1b29 ] Now sctp holds read_lock when foreach sctp_ep_hashtable without disabling BH. If CPU schedules to another thread A at this moment, the thread A may be trying to hold the write_lock with disabling BH. As BH is disabled and CPU cannot schedule back to the thread holding the read_lock, while the thread A keeps waiting for the read_lock. A dead lock would be triggered by this. This patch is to fix this dead lock by calling read_lock_bh instead to disable BH when holding the read_lock in sctp_for_each_endpoint. Fixes: 626d16f50f39 ("sctp: export some apis or variables for sctp_diag and reuse some for proc") Reported-by: Xiumei Mu Signed-off-by: Xin Long Acked-by: Marcelo Ricardo Leitner Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit e607742172be5f4d7970f38e3dc263848ff37cf0 Author: Krister Johansen Date: Thu Jun 8 13:12:38 2017 -0700 Fix an intermittent pr_emerg warning about lo becoming free. [ Upstream commit f186ce61bb8235d80068c390dc2aad7ca427a4c2 ] It looks like this: Message from syslogd@flamingo at Apr 26 00:45:00 ... kernel:unregister_netdevice: waiting for lo to become free. Usage count = 4 They seem to coincide with net namespace teardown. The message is emitted by netdev_wait_allrefs(). Forced a kdump in netdev_run_todo, but found that the refcount on the lo device was already 0 at the time we got to the panic. Used bcc to check the blocking in netdev_run_todo. The only places where we're off cpu there are in the rcu_barrier() and msleep() calls. That behavior is expected. The msleep time coincides with the amount of time we spend waiting for the refcount to reach zero; the rcu_barrier() wait times are not excessive. After looking through the list of callbacks that the netdevice notifiers invoke in this path, it appears that the dst_dev_event is the most interesting. The dst_ifdown path places a hold on the loopback_dev as part of releasing the dev associated with the original dst cache entry. Most of our notifier callbacks are straight-forward, but this one a) looks complex, and b) places a hold on the network interface in question. I constructed a new bcc script that watches various events in the liftime of a dst cache entry. Note that dst_ifdown will take a hold on the loopback device until the invalidated dst entry gets freed. [ __dst_free] on DST: ffff883ccabb7900 IF tap1008300eth0 invoked at 1282115677036183 __dst_free rcu_nocb_kthread kthread ret_from_fork Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 6e8f72327b7743db71827efd2c327a4beec94c3b Author: Mateusz Jurczyk Date: Thu Jun 8 11:13:36 2017 +0200 af_unix: Add sockaddr length checks before accessing sa_family in bind and connect handlers [ Upstream commit defbcf2decc903a28d8398aa477b6881e711e3ea ] Verify that the caller-provided sockaddr structure is large enough to contain the sa_family field, before accessing it in bind() and connect() handlers of the AF_UNIX socket. Since neither syscall enforces a minimum size of the corresponding memory region, very short sockaddrs (zero or one byte long) result in operating on uninitialized memory while referencing .sa_family. Signed-off-by: Mateusz Jurczyk Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 83cb92f4cf116e059d83c6e103c7fd9acbac76d9 Author: David Ahern Date: Thu Jun 8 11:31:11 2017 -0600 net: vrf: Make add_fib_rules per network namespace flag [ Upstream commit 097d3c9508dc58286344e4a22b300098cf0c1566 ] Commit 1aa6c4f6b8cd8 ("net: vrf: Add l3mdev rules on first device create") adds the l3mdev FIB rule the first time a VRF device is created. However, it only creates the rule once and only in the namespace the first device is created - which may not be init_net. Fix by using the net_generic capability to make the add_fib_rules flag per network namespace. Fixes: 1aa6c4f6b8cd8 ("net: vrf: Add l3mdev rules on first device create") Reported-by: Petr Machata Signed-off-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 5586883813df1a46a99d79b8ab69963b9f3401ba Author: David Ahern Date: Wed Jun 7 12:26:23 2017 -0600 net: ipv6: Release route when device is unregistering [ Upstream commit 8397ed36b7c585f8d3e06c431f4137309124f78f ] Roopa reported attempts to delete a bond device that is referenced in a multipath route is hanging: $ ifdown bond2 # ifupdown2 command that deletes virtual devices unregister_netdevice: waiting for bond2 to become free. Usage count = 2 Steps to reproduce: echo 1 > /proc/sys/net/ipv6/conf/all/ignore_routes_with_linkdown ip link add dev bond12 type bond ip link add dev bond13 type bond ip addr add 2001:db8:2::0/64 dev bond12 ip addr add 2001:db8:3::0/64 dev bond13 ip route add 2001:db8:33::0/64 nexthop via 2001:db8:2::2 nexthop via 2001:db8:3::2 ip link del dev bond12 ip link del dev bond13 The root cause is the recent change to keep routes on a linkdown. Update the check to detect when the device is unregistering and release the route for that case. Fixes: a1a22c12060e4 ("net: ipv6: Keep nexthop of multipath route on admin down") Reported-by: Roopa Prabhu Signed-off-by: David Ahern Acked-by: Roopa Prabhu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 199f4baff672c12657637791964b194b8212706d Author: Mintz, Yuval Date: Wed Jun 7 21:00:33 2017 +0300 net: Zero ifla_vf_info in rtnl_fill_vfinfo() [ Upstream commit 0eed9cf58446b28b233388b7f224cbca268b6986 ] Some of the structure's fields are not initialized by the rtnetlink. If driver doesn't set those in ndo_get_vf_config(), they'd leak memory to user. Signed-off-by: Yuval Mintz CC: Michal Schmidt Reviewed-by: Greg Rose Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d8d01fc9bad3e0d06c34cb4ed64254b34ca6da06 Author: Mateusz Jurczyk Date: Wed Jun 7 16:14:29 2017 +0200 decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb [ Upstream commit dd0da17b209ed91f39872766634ca967c170ada1 ] Verify that the length of the socket buffer is sufficient to cover the nlmsghdr structure before accessing the nlh->nlmsg_len field for further input sanitization. If the client only supplies 1-3 bytes of data in sk_buff, then nlh->nlmsg_len remains partially uninitialized and contains leftover memory from the corresponding kernel allocation. Operating on such data may result in indeterminate evaluation of the nlmsg_len < sizeof(*nlh) expression. The bug was discovered by a runtime instrumentation designed to detect use of uninitialized memory in the kernel. The patch prevents this and other similar tools (e.g. KMSAN) from flagging this behavior in the future. Signed-off-by: Mateusz Jurczyk Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit e34cacd27f477800cb37cffbd055b0560a94065e Author: Johannes Berg Date: Fri Jun 9 21:33:09 2017 +0200 mac80211: free netdev on dev_alloc_name() error [ Upstream commit c7a61cba71fd151cc7d9ebe53a090e0e61eeebf3 ] The change to remove free_netdev() from ieee80211_if_free() erroneously didn't add the necessary free_netdev() for when ieee80211_if_free() is called directly in one place, rather than as the priv_destructor. Add the missing call. Fixes: cf124db566e6 ("net: Fix inconsistent teardown and release of private netdev state.") Signed-off-by: Johannes Berg Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 64603b75f8f625437866b896bbad466459dcf7ed Author: Stephen Rothwell Date: Thu Jun 8 19:06:29 2017 +1000 net: s390: fix up for "Fix inconsistent teardown and release of private netdev state" [ Upstream commit cd1997f6c11483da819a7719aa013093b8003743 ] Signed-off-by: Stephen Rothwell Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 95876855a55072572895a236b156ffb357fd5538 Author: David S. Miller Date: Mon May 8 12:52:56 2017 -0400 net: Fix inconsistent teardown and release of private netdev state. [ Upstream commit cf124db566e6b036b8bcbe8decbed740bdfac8c6 ] Network devices can allocate reasources and private memory using netdev_ops->ndo_init(). However, the release of these resources can occur in one of two different places. Either netdev_ops->ndo_uninit() or netdev->destructor(). The decision of which operation frees the resources depends upon whether it is necessary for all netdev refs to be released before it is safe to perform the freeing. netdev_ops->ndo_uninit() presumably can occur right after the NETDEV_UNREGISTER notifier completes and the unicast and multicast address lists are flushed. netdev->destructor(), on the other hand, does not run until the netdev references all go away. Further complicating the situation is that netdev->destructor() almost universally does also a free_netdev(). This creates a problem for the logic in register_netdevice(). Because all callers of register_netdevice() manage the freeing of the netdev, and invoke free_netdev(dev) if register_netdevice() fails. If netdev_ops->ndo_init() succeeds, but something else fails inside of register_netdevice(), it does call ndo_ops->ndo_uninit(). But it is not able to invoke netdev->destructor(). This is because netdev->destructor() will do a free_netdev() and then the caller of register_netdevice() will do the same. However, this means that the resources that would normally be released by netdev->destructor() will not be. Over the years drivers have added local hacks to deal with this, by invoking their destructor parts by hand when register_netdevice() fails. Many drivers do not try to deal with this, and instead we have leaks. Let's close this hole by formalizing the distinction between what private things need to be freed up by netdev->destructor() and whether the driver needs unregister_netdevice() to perform the free_netdev(). netdev->priv_destructor() performs all actions to free up the private resources that used to be freed by netdev->destructor(), except for free_netdev(). netdev->needs_free_netdev is a boolean that indicates whether free_netdev() should be done at the end of unregister_netdevice(). Now, register_netdevice() can sanely release all resources after ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit() and netdev->priv_destructor(). And at the end of unregister_netdevice(), we invoke netdev->priv_destructor() and optionally call free_netdev(). Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 3227b51e72f4d8e33cd6274e471c08b2fbec6e2e Author: Alexander Potapenko Date: Tue Jun 6 15:56:54 2017 +0200 net: don't call strlen on non-terminated string in dev_set_alias() [ Upstream commit c28294b941232931fbd714099798eb7aa7e865d7 ] KMSAN reported a use of uninitialized memory in dev_set_alias(), which was caused by calling strlcpy() (which in turn called strlen()) on the user-supplied non-terminated string. Signed-off-by: Alexander Potapenko Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman