commit 2ffe8469f55805a0ce527bdf32ee85857dfad87a Author: Jiri Slaby Date: Wed Aug 20 13:43:00 2014 +0200 Linux 3.12.27 commit 9db8cb5a6fa96ef0637e55f66dd970e51224d99c Author: Ales Novak Date: Fri Jun 6 14:35:39 2014 -0700 drivers/rtc/interface.c: fix infinite loop in initializing the alarm commit ee1d90146815fdc8d653c558b327fff2acba041d upstream. In __rtc_read_alarm(), if the alarm time retrieved by rtc_read_alarm_internal() from the device contains invalid values (e.g. month=2,mday=31) and the year not set (=-1), the initialization will loop infinitely because the year-fixing loop expects the time being invalid due to leap year. Fix reduces the loop to the leap years and adds final validity check. Signed-off-by: Ales Novak Acked-by: Alessandro Zummo Reported-by: Jiri Bohac Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Jiri Slaby commit 33751b9ed01ee46fd1c20cf8c42c396704d88b28 Author: Jan Beulich Date: Fri Aug 8 14:20:09 2014 -0700 drivers/rtc/rtc-efi.c: check for invalid data coming back from UEFI commit 6e85bab6bc1019f9b87c53b32da3ad7791e7ddf9 upstream. In particular seeing zero in eft->month is problematic, as it results in -1 (converted to unsigned int, i.e. yielding 0xffffffff) getting passed to rtc_year_days(), where the value gets used as an array index (normally resulting in a crash). This was observed with the driver enabled on x86 on some Fujitsu system (with possibly not up to date firmware, but anyway). Perhaps efi_read_alarm() should not fail if neither enabled nor pending are set, but the returned time is invalid? Signed-off-by: Jan Beulich Reported-by: Raymund Will Cc: Alessandro Zummo Cc: Jingoo Han Acked-by: Lee, Chun-Yi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Jiri Slaby commit e7afccd211af3976fd1c874d9aea7dee0005921e Author: Lee, Chun-Yi Date: Fri Jun 6 14:35:48 2014 -0700 drivers/rtc/rtc-efi.c: avoid subtracting day twice when computing year days commit 809d9627087e1db63b8672c1f264af73b13116fb upstream. Compared source code of rtc-lib.c::rtc_year_days() with efirtc.c::rtc_year_days(), found the code in rtc-efi decreases value of day twice when it computing year days. rtc-lib.c::rtc_year_days() has already decrease days and return the year days from 0 to 365. Signed-off-by: Lee, Chun-Yi Cc: Alessandro Zummo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Jiri Slaby commit 8df01ee264d62a04507fbf7e28416953861fc0d1 Author: Vitaliy Kulikov Date: Thu Nov 14 11:52:16 2013 -0600 ALSA: hda - load EQ params into IDT codec on HP bNB13 systems commit d009f3deb788f7d06fe04c52eaf812b657a0ca68 upstream. Adds linear EQ filtering for integrated speaker protection Signed-off-by: Vitaliy Kulikov Signed-off-by: Takashi Iwai Signed-off-by: Jiri Slaby commit fb5d0e891e9e70755b4aa2d2b7561250ddb7be6e Author: Benjamin Tisssoires Date: Wed Jan 8 17:18:45 2014 -0500 HID: logitech-dj: Fix USB 3.0 issue commit 42c22dbf81ebd1146960875ddfe71630cb2b3ae6 upstream. This fix (not very clean though) should fix the long time USB3 issue that was spotted last year. The rational has been given by Hans de Goede: ---- I think the most likely cause for this is a firmware bug in the unifying receiver, likely a race condition. The most prominent difference between having a USB-2 device plugged into an EHCI (so USB-2 only) port versus an XHCI port will be inter packet timing. Specifically if you send packets (ie hid reports) one at a time, then with the EHCI controller their will be a significant pause between them, where with XHCI they will be very close together in time. The reason for this is the difference in EHCI / XHCI controller OS <-> driver interfaces. For non periodic endpoints (control, bulk) the EHCI uses a circular linked-list of commands in dma-memory, which it follows to execute commands, if the list is empty, it will go into an idle state and re-check periodically. The XHCI uses a ring of commands per endpoint, and if the OS places anything new on the ring it will do an ioport write, waking up the XHCI making it send the new packet immediately. For periodic transfers (isoc, interrupt) the delay between packets when sending one at a time (rather then queuing them up) will be even larger, because they need to be inserted into the EHCI schedule 2 ms in the future so the OS driver can be sure that the EHCI driver does not try to start executing the time slot in question before the insertion has completed. So a possible fix may be to insert a delay between packets being send to the receiver. ---- I tested this on a buggy Haswell USB 3.0 motherboard, and I always get the notification after adding the msleep. Signed-off-by: Benjamin Tissoires Signed-off-by: Jiri Kosina Signed-off-by: Jiri Slaby commit 0fc236527720e4dd52260bb9a2fd0fa6423eb0f5 Author: Jiri Kosina Date: Wed Jul 9 09:48:06 2014 -0700 Input: i8042 - add Acer Aspire 5710 to nomux blacklist commit 8c947e20cb1f442c704852b2ca24b81981b09493 upstream. Acer Aspire needs to be added to nomux blacklist, otherwise the touchpad misbehaves rather randomly. Signed-off-by: Jiri Kosina Signed-off-by: Dmitry Torokhov Signed-off-by: Jiri Slaby commit 93a5d430af33e98399b5677cc77f22838d590e40 Author: Laurent Dufour Date: Thu Apr 10 15:02:13 2014 +0200 PCI: rphahp: Fix endianess issues commit 761ce53330a4f02c58768631027d1c1dd0d538f7 upstream. Numerical values stored in the device tree are encoded in Big Endian and should be byte swapped when running in Little Endian. The RPA hotplug module should convert those values as well. Note that in rpaphp_get_drc_props(), the comparison between indexes[i+1] and *index is done using the BE values (whatever is the current endianess). This doesn't matter since we are checking for equality here. This way only the returned value is byte swapped. RPA also made RTAS calls which implies BE values to be used. According to the patch done in RTAS (http://patchwork.ozlabs.org/patch/336865), no additional conversion is required in RPA. Signed-off-by: Laurent Dufour Signed-off-by: Bjorn Helgaas Signed-off-by: Jiri Slaby commit 56cf7e077c081a5874dcbd722cbae771bb5177fc Author: Ying Xue Date: Fri Oct 18 07:23:14 2013 +0200 tipc: don't use memcpy to copy from user space commit 5c0a0fc81f4dc786b42c4fc9c7c72ba635406ab5 upstream. tipc_msg_build() calls skb_copy_to_linear_data_offset() to copy data from user space to kernel space. However, the latter function does in its turn call memcpy() to perform the actual copying. This poses an obvious security and robustness risk, since memcpy() never makes any validity check on the pointer it is copying from. To correct this, we the replace the offending function call with a call to memcpy_fromiovecend(), which uses copy_from_user() to perform the copying. Signed-off-by: Ying Xue Reviewed-by: Paul Gortmaker Signed-off-by: Jon Maloy Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit e0c516a109e873556edd487912c62bc39ad5c9ce Author: Nithin Sujir Date: Fri Sep 20 16:46:56 2013 -0700 tg3: Add support for new 577xx device ids commit 68273712a19e9107a498a371532b3b3eb6dbb14c upstream. This patch adds support for 57764, 57765, 57787, 57782 and 57786 devices. Signed-off-by: Nithin Nayak Sujir Signed-off-by: Michael Chan Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit f3cc3cedbac7d7d8af56b4acdcf5ee042c436f7c Author: Maurizio Lombardi Date: Tue Apr 1 13:58:22 2014 +0200 bnx2fc: fix memory leak in bnx2fc_allocate_hash_table() commit fdbcbcab0eae6773430546697ace0b3fe48e7fbc upstream. In case of error, the bnx2fc_allocate_hash_table() didn't free all the memory it allocated. Signed-off-by: Maurizio Lombardi Acked-by: Eddie Wai Signed-off-by: Christoph Hellwig Signed-off-by: Jiri Slaby commit f4ee0d62f2c63d5ef77fde1575aeef457e054902 Author: Yuval Mintz Date: Sat Sep 28 08:46:07 2013 +0300 bnx2x: Test nvram when interface is down commit bd8e012b5d369933f50842294372ed580f5d9605 upstream. Since commit 3fb43eb ("bnx2x: Change to D3hot only on removal") nvram is accessible whenever the driver is loaded - Thus it is possible to test it during self-test even if the interface is down Signed-off-by: Yuval Mintz Signed-off-by: Ariel Elior Signed-off-by: Eilon Greenstein Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit 5f55b99739d4a2b339aab0e4f3ef9091f0c86b15 Author: Dan Carpenter Date: Tue May 27 00:04:44 2014 +0300 RDMA/cxgb3: Fix information leak in send_abort() commit e4514cbd972786af67dd6c442c072685387e22a2 upstream. The cpl_abort_req struct has several reserved members which need to be cleared to avoid disclosing kernel information. I have added a memset() so now it matches the cxgb4 version of this function. Signed-off-by: Dan Carpenter Acked-by: Steve Wise Signed-off-by: Roland Dreier Signed-off-by: Jiri Slaby commit 465a844d10e4f4ba2f6bf0a2f4833d65ea08097c Author: David Gibson Date: Fri Dec 20 15:10:44 2013 +1100 netxen: Correct off-by-one errors in bounds checks commit 4710b2ba873692194c636811ceda398f95e02db2 upstream. netxen_process_lro() contains two bounds checks. One for the ring number against the number of rings, and one for the Rx buffer ID against the array of receive buffers. Both of these have off-by-one errors, using > instead of >=. The correct versions are used in netxen_process_rcv(), they're just wrong in netxen_process_lro(). Signed-off-by: David Gibson Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit 854ad0a065a8cd0136533b29478ba2cf658c87a3 Author: Russell King Date: Mon Jun 10 12:16:54 2013 +0100 DMA-API: net: brocade/bna/bnad.c: fix 32-bit DMA mask handling commit 3e5480791e3b0e239d2cd4e5ecd43a7d2585484b upstream. The fallback to 32-bit DMA mask is rather odd: if (!dma_set_mask(&pdev->dev, DMA_BIT_MASK(64)) && !dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64))) { *using_dac = true; } else { err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32)); if (err) { err = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32)); if (err) goto release_regions; } This means we only try and set the coherent DMA mask if we failed to set a 32-bit DMA mask, and only if both fail do we fail the driver. Adjust this so that if either setting fails, we fail the driver - and thereby end up properly setting both the DMA mask and the coherent DMA mask in the fallback case. Signed-off-by: Russell King Signed-off-by: Jiri Slaby commit 383bc2fefbf3af2dd4fe1b12192445d02d2ff114 Author: Wei Yongjun Date: Tue Sep 24 05:18:45 2013 +0000 igbvf: add missing iounmap() on error in igbvf_probe() commit de524681f88ff4ed293aa239f83c8cb04d59b47d upstream. Add the missing iounmap() before return from igbvf_probe() in the error handling case. Signed-off-by: Wei Yongjun Tested-by: Aaron Brown Tested-by: Sibai Li Signed-off-by: Jeff Kirsher Signed-off-by: Jiri Slaby commit 9b0b3e8944123d9888b71f84d844ec8ebcefc70d Author: Dan Carpenter Date: Fri Sep 13 20:44:20 2013 +0000 igbvf: integer wrapping bug setting the mtu commit 3de9e65f011b95235a789b12abc4730570cdb737 upstream. If new_mtu is very large then "new_mtu + ETH_HLEN + ETH_FCS_LEN" can wrap and the check on the next line can underflow. This is one of those bugs which can be triggered by the user if you have namespaces configured. Also since this is something the user can trigger then we don't want to have dev_err() message. This is a static checker fix and I'm not sure what the impact is. Signed-off-by: Dan Carpenter Tested-by: Aaron Brown Tested-by: Sibai Li Sibai.li@intel.com> Signed-off-by: Jeff Kirsher Signed-off-by: Jiri Slaby commit ae09150a6de6cdade01b451ed95abe8f041251fe Author: Russell King Date: Mon Jun 10 12:26:32 2013 +0100 DMA-API: net: intel/igbvf: fix 32-bit DMA mask handling commit c21b8ebc2f1613fd0a9d5aa0d0d1083aee8ca306 upstream. The fallback to 32-bit DMA mask is rather odd: err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64)); if (!err) { err = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64)); if (!err) pci_using_dac = 1; } else { err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32)); if (err) { err = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32)); if (err) { dev_err(&pdev->dev, "No usable DMA " "configuration, aborting\n"); goto err_dma; } } } This means we only set the coherent DMA mask in the fallback path if the DMA mask set failed, which is silly. This fixes it to set the coherent DMA mask only if dma_set_mask() succeeded, and to error out if either fails. Acked-by: Jeff Kirsher Signed-off-by: Russell King Signed-off-by: Jiri Slaby commit 47687f86b33ee462925abaddd858a80bd3002f19 Author: Akeem G Abodunrin Date: Fri Nov 8 01:54:07 2013 +0000 igb: Fixed Wake On LAN support commit 42ce4126d8bc2e128e1f207cf79bb0623fac498f upstream. This patch fixes Wake on LAN being reported as supported on some Ethernet ports, in contrary to Hardware capability. Signed-off-by: Akeem G Abodunrin Tested-by: Aaron Brown Signed-off-by: Jeff Kirsher Signed-off-by: Jiri Slaby commit 1a05931d6738d3c70958d6b93fa4ada02f1d8ecd Author: Fujinaka, Todd Date: Wed Oct 23 05:52:11 2013 +0000 igb: Don't let ethtool try to write to iNVM in i210/i211 commit a71fc313c4f569be5788caff07ef1fe346842c5b upstream. Don't let ethtool try to write to iNVM in i210/i211. This fixes an issue seen by Marek Vasut. Reported-by: Marek Vasut Signed-off-by: Todd Fujinaka Signed-off-by: Jeff Kirsher Signed-off-by: Jiri Slaby commit e316cc5097285a8cea595ed1052ca4d0626059e3 Author: Stefan Assmann Date: Tue Sep 24 05:18:39 2013 +0000 igb: fix driver reload with VF assigned to guest commit 781798a11e2820ee35fa9142869bb8cec117dedc upstream. commit fa44f2f185f7f9da19d331929bb1b56c1ccd1d93 broke reloading of igb, when VFs are assigned to a guest, in several ways. 1. on module load adapter->vf_data does not get properly allocated, resulting in a null pointer exception when accessing adapter->vf_data in igb_reset() on module reload. modprobe -r igb ; modprobe igb max_vfs=7 [ 215.215837] igb 0000:01:00.1: removed PHC on eth1 [ 216.932072] igb 0000:01:00.1: IOV Disabled [ 216.937038] igb 0000:01:00.0: removed PHC on eth0 [ 217.127032] igb 0000:01:00.0: Cannot deallocate SR-IOV virtual functions while they are assigned - VFs will not be deallocated [ 217.146178] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.0.5-k [ 217.154050] igb: Copyright (c) 2007-2013 Intel Corporation. [ 217.160688] igb 0000:01:00.0: Enabling SR-IOV VFs using the module parameter is deprecated - please use the pci sysfs interface. [ 217.173703] igb 0000:01:00.0: irq 103 for MSI/MSI-X [ 217.179227] igb 0000:01:00.0: irq 104 for MSI/MSI-X [ 217.184735] igb 0000:01:00.0: irq 105 for MSI/MSI-X [ 217.220082] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 [ 217.228846] IP: [] igb_reset+0xc5/0x4b0 [igb] [ 217.235472] PGD 3607ec067 PUD 36170b067 PMD 0 [ 217.240461] Oops: 0002 [#1] SMP [ 217.244085] Modules linked in: igb(+) igbvf mptsas mptscsih mptbase scsi_transport_sas [last unloaded: igb] [ 217.255040] CPU: 4 PID: 4833 Comm: modprobe Not tainted 3.11.0+ #46 [...] [ 217.390007] [] igb_probe+0x892/0xfd0 [igb] [ 217.396422] [] local_pci_probe+0x1e/0x40 [ 217.402641] [] pci_device_probe+0xf9/0x110 [...] 2. A follow up issue, pci_enable_sriov() should only be called if no VFs were still allocated on module unload. Otherwise pci_enable_sriov() gets called multiple times in a row rendering the NIC unusable until reset. 3. simply calling igb_enable_sriov() in igb_probe_vfs() is not enough as the interrupts need to be re-setup. Switching that to igb_pci_enable_sriov(). Signed-off-by: Stefan Assmann Tested-by: Aaron Brown Tested-by: Sibai Li Signed-off-by: Jeff Kirsher Signed-off-by: Jiri Slaby commit 9362bb6d02c524909018ebc8eb8b95a3822ad066 Author: Carolyn Wyborny Date: Fri Aug 16 00:39:10 2013 +0000 igb: Fix master/slave mode for all m88 i354 PHY's commit d1c17d806b6a52ff020322bec457717a91ea50a9 upstream. This patch calls code to set the master/slave mode for all m88 gen 2 PHY's. This patch also removes the call to this function for I210 devices only from the function that is not called by I210 devices. Signed-off-by: Carolyn Wyborny Tested-by: Jeff Pieper Signed-off-by: Jeff Kirsher Signed-off-by: Jiri Slaby commit 767972b19f99648d634307fbad8c3beeb1ccdc5b Author: Fujinaka, Todd Date: Tue Oct 1 04:33:55 2013 -0700 igb: Add ethtool offline tests for i354 commit a4e979a27db3eb77e286dbe484e96c0c9c986e83 upstream. Add the ethtool offline tests for i354 devices. Signed-off-by: Todd Fujinaka Tested-by: Aaron Brown Signed-off-by: Jeff Kirsher Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit e0cb1eee84963d171418bb557943e2a11b3685fb Author: Russell King Date: Mon Jun 10 12:24:50 2013 +0100 DMA-API: net: intel/igb: fix 32-bit DMA mask handling commit dc4ff9bb7534ebd153f8441ec0e9190964ad8944 upstream. The fallback to 32-bit DMA mask is rather odd: err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64)); if (!err) { err = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64)); if (!err) pci_using_dac = 1; } else { err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32)); if (err) { err = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32)); if (err) { dev_err(&pdev->dev, "No usable DMA configuration, aborting\n"); goto err_dma; } } } This means we only set the coherent DMA mask in the fallback path if the DMA mask set failed, which is silly. This fixes it to set the coherent DMA mask only if dma_set_mask() succeeded, and to error out if either fails. Acked-by: Jeff Kirsher Signed-off-by: Russell King Signed-off-by: Jiri Slaby commit 9367cece0ab4d45ccdfdc0b33b206a300e0a8c59 Author: Don Skidmore Date: Tue Oct 1 04:33:49 2013 -0700 ixgbevf: cleanup redundant mailbox read failure check commit c7bb417dbb8888cfd20824d54f9af9c92b9ff43d upstream. Since we are already checking for read failure in check_link we don't need to do it here. Instead just make sure the watchdog task gets scheduled, if we are up, and it can be done there. This will better follow igbvf method of handling a mailbox event and message timeout. Signed-off-by: Alexander Duyck Signed-off-by: Don Skidmore Tested-by: Stephen Ko Signed-off-by: Jeff Kirsher Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit 6532156efe9293a42ff2db4c7b30a60c7126b577 Author: Russell King Date: Mon Jun 10 12:49:38 2013 +0100 DMA-API: net: intel/ixgbevf: fix 32-bit DMA mask handling commit 53567aa4e00399aa59339bba81b285a5b95f425c upstream. The fallback to 32-bit DMA mask is rather odd: if (!dma_set_mask(&pdev->dev, DMA_BIT_MASK(64)) && !dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64))) { pci_using_dac = 1; } else { err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32)); if (err) { err = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32)); if (err) { dev_err(&pdev->dev, "No usable DMA " "configuration, aborting\n"); goto err_dma; } } pci_using_dac = 0; } This means we only set the coherent DMA mask in the fallback path if the DMA mask set failed, which is silly. This fixes it to set the coherent DMA mask only if dma_set_mask() succeeded, and to error out if either fails. Acked-by: Jeff Kirsher Signed-off-by: Russell King Signed-off-by: Jiri Slaby commit edd1e69cae088d17cc360bb88f55f4bacacbb4d4 Author: Emil Tantilov Date: Sat Oct 26 08:13:20 2013 +0000 ixgbe: fix inconsistent clearing of the multicast table commit cf78959c0d7afbde31498afc4212294c28e2c278 upstream. This patch resolves an issue where the MTA table can be cleared when the interface is reset while in promisc mode. As result IPv6 traffic between VFs will be interrupted. This patch makes the update of the MTA table unconditional to avoid the inconsistent clearing on reset. Signed-off-by: Emil Tantilov Tested-by: Phil Schmitt Signed-off-by: Jeff Kirsher Signed-off-by: Jiri Slaby commit 0a021a10f8f791971ab795bac74e10529d41e8a6 Author: Jacob Keller Date: Sat Sep 21 05:05:44 2013 +0000 ixgbe: fix qv_lock_napi call in ixgbe_napi_disable_all commit 27d9ce4fd0e2e75c2907f6d3dc0487012a3e4298 upstream. ixgbe_napi_disable_all calls napi_disable on each queue, however the busy polling code introduced a local_bh_disable()d context around the napi_disable. The original author did not realize that napi_disable might sleep, which would cause a sleep while atomic BUG. In addition, on a single processor system, the ixgbe_qv_lock_napi loop shouldn't have to mdelay. This patch adds an ixgbe_qv_disable along with a new IXGBE_QV_STATE_DISABLED bit, which it uses to indicate to the poll and napi routines that the q_vector has been disabled. Now the ixgbe_napi_disable_all function will wait until all pending work has been finished and prevent any future work from being started. Signed-off-by: Jacob Keller Cc: Eliezer Tamir Cc: Alexander Duyck Cc: Hyong-Youb Kim Cc: Amir Vadai Cc: Dmitry Kravkov Tested-by: Phil Schmitt Signed-off-by: Jeff Kirsher Signed-off-by: Jiri Slaby commit 5ad438f9fa0cbc611e7851d6154833e6c0a692e1 Author: Emil Tantilov Date: Tue Oct 22 08:21:04 2013 +0000 ixgbe: fix rx-usecs range checks for BQL commit 2e0103810c6fed6a736c4a3af87b0f5c6bd8cd5b upstream. This patch resolves an issue where the logic used to detect changes in rx-usecs was incorrect and was masked by the call to ixgbe_update_rsc(). Setting rx-usecs between 0,2-9 and 1,10 and up requires a reset to allow ixgbe_configure_tx_ring() to set the correct value for TXDCTL.WTHRESH in order to avoid Tx hangs with BQL enabled. Signed-off-by: Emil Tantilov Tested-by: Phil Schmitt Signed-off-by: Jeff Kirsher Signed-off-by: Jiri Slaby commit 40580423ec6a3428fc8741448af92b946df80f12 Author: Russell King Date: Mon Jun 10 12:47:42 2013 +0100 DMA-API: net: intel/ixgbe: fix 32-bit DMA mask handling commit f5f2eda8049644a27af5fdf59c3766589358e435 upstream. The fallback to 32-bit DMA mask is rather odd: if (!dma_set_mask(&pdev->dev, DMA_BIT_MASK(64)) && !dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64))) { pci_using_dac = 1; } else { err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32)); if (err) { err = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32)); if (err) { dev_err(&pdev->dev, "No usable DMA configuration, aborting\n"); goto err_dma; } } pci_using_dac = 0; } This means we only set the coherent DMA mask in the fallback path if the DMA mask set failed, which is silly. This fixes it to set the coherent DMA mask only if dma_set_mask() succeeded, and to error out if either fails. Acked-by: Jeff Kirsher Signed-off-by: Russell King Signed-off-by: Jiri Slaby commit bbf0947cce5b50c9263ee21488c1d73b74b0cbbc Author: Vladimir Davydov Date: Sat Nov 23 07:18:01 2013 +0000 e1000: fix possible reset_task running after adapter down commit 74a1b1ea8a30b035aaad833bbd6b9263e72acfac upstream. On e1000_down(), we should ensure every asynchronous work is canceled before proceeding. Since the watchdog_task can schedule other works apart from itself, it should be stopped first, but currently it is stopped after the reset_task. This can result in the following race leading to the reset_task running after the module unload: e1000_down_and_stop(): e1000_watchdog(): ---------------------- ----------------- cancel_work_sync(reset_task) schedule_work(reset_task) cancel_delayed_work_sync(watchdog_task) The patch moves cancel_delayed_work_sync(watchdog_task) at the beginning of e1000_down_and_stop() thus ensuring the race is impossible. Cc: Tushar Dave Cc: Patrick McHardy Signed-off-by: Vladimir Davydov Tested-by: Aaron Brown Signed-off-by: Jeff Kirsher Signed-off-by: Jiri Slaby commit d2c68516ae0399625de1d0753b4f81514368d1c1 Author: yzhu1 Date: Sat Nov 23 07:07:40 2013 +0000 e1000: prevent oops when adapter is being closed and reset simultaneously commit 6a7d64e3e09e11181a07a2e8cd6af5d6355133be upstream. This change is based on a similar change made to e1000e support in commit bb9e44d0d0f4 ("e1000e: prevent oops when adapter is being closed and reset simultaneously"). The same issue has also been observed on the older e1000 cards. Here, we have increased the RESET_COUNT value to 50 because there are too many accesses to e1000 nic on stress tests to e1000 nic, it is not enough to set RESET_COUT 25. Experimentation has shown that it is enough to set RESET_COUNT 50. Signed-off-by: yzhu1 Tested-by: Aaron Brown Signed-off-by: Jeff Kirsher Signed-off-by: Jiri Slaby commit 65b452dd55c938193e6b5ac87e02c57b334d4aa6 Author: Hong Zhiguo Date: Tue Oct 22 18:32:56 2013 +0000 e1000: fix wrong queue idx calculation commit 49a45a0686cc2b43bcb3834a68416a201475dc77 upstream. tx_ring and adapter->tx_ring are already of type "struct e1000_tx_ring *" Signed-off-by: Hong Zhiguo Tested-by: Aaron Brown Signed-off-by: Jeff Kirsher Signed-off-by: Jiri Slaby commit 8584ebae95d4884d1a09bd3f66cababaf677a600 Author: Mika Westerberg Date: Thu Jan 16 14:39:39 2014 +0200 e1000e: Fix compilation warning when !CONFIG_PM_SLEEP commit 38a529b5d42e4cfc5ac94844e61335a00eb2d320 upstream. Commit 7509963c703b (e1000e: Fix a compile flag mis-match for suspend/resume) moved suspend and resume hooks to be available when CONFIG_PM is set. However, it can be set even if CONFIG_PM_SLEEP is not set causing following warnings to be emitted: drivers/net/ethernet/intel/e1000e/netdev.c:6178:12: warning: ‘e1000_suspend’ defined but not used [-Wunused-function] drivers/net/ethernet/intel/e1000e/netdev.c:6185:12: warning: ‘e1000_resume’ defined but not used [-Wunused-function] To fix this make the hooks to be available only when CONFIG_PM_SLEEP is set and remove CONFIG_PM wrapping from driver ops because this is already handled by SET_SYSTEM_SLEEP_PM_OPS() and SET_RUNTIME_PM_OPS(). Signed-off-by: Mika Westerberg Cc: Dave Ertman Cc: Aaron Brown Cc: Jeff Kirsher Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit 8c69f855ad9655b293647c7b383d19cbc45be756 Author: David Ertman Date: Tue Dec 17 04:42:42 2013 +0000 e1000e: Fix a compile flag mis-match for suspend/resume commit 7509963c703b71eebccc421585e7f48ebbbd3f38 upstream. This patch addresses a mis-match between the declaration and usage of the e1000_suspend and e1000_resume functions. Previously, these functions were declared in a CONFIG_PM_SLEEP wrapper, and then utilized within a CONFIG_PM wrapper. Both the declaration and usage will now be contained within CONFIG_PM wrappers. Signed-off-by: Dave Ertman Tested-by: Aaron Brown Signed-off-by: Jeff Kirsher Signed-off-by: Jiri Slaby commit 174729be6b473370b855754b71d9c214c9ca24b2 Author: Russell King Date: Mon Jun 10 12:22:30 2013 +0100 DMA-API: net: intel/e1000e: fix 32-bit DMA mask handling commit 718a39eb587e038f7ded076afcfd8d709879139f upstream. The fallback to 32-bit DMA mask is rather odd: err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64)); if (!err) { err = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64)); if (!err) pci_using_dac = 1; } else { err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32)); if (err) { err = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32)); if (err) { dev_err(&pdev->dev, "No usable DMA configuration, aborting\n"); goto err_dma; } } } This means we only set the coherent DMA mask in the fallback path if the DMA mask set failed, which is silly. This fixes it to set the coherent DMA mask only if dma_set_mask() succeeded, and to error out if either fails. Acked-by: Jeff Kirsher Signed-off-by: Russell King Signed-off-by: Jiri Slaby commit 8bac7a35e60ca70c8d12ddbfdf28a8df5a976b2b Author: Russell King Date: Wed Jun 26 13:49:44 2013 +0100 DMA-API: provide a helper to set both DMA and coherent DMA masks commit 4aa806b771d16b810771d86ce23c4c3160888db3 upstream. Provide a helper to set both the DMA and coherent DMA masks to the same value - this avoids duplicated code in a number of drivers, sometimes with buggy error handling, and also allows us identify which drivers do things differently. Signed-off-by: Russell King Signed-off-by: Jiri Slaby commit d9be25f8975760c11671fb0d214bfeb25c9bc2b3 Author: Keith Packard Date: Mon Jan 20 13:31:10 2014 -0800 fbcon: Clean up fbcon data in fb_info on FB_EVENT_FB_UNBIND with 0 fbs commit 5f4dc28bd9c8a990ed6253303b7a821a7abfe9fa upstream. When FB_EVENT_FB_UNBIND is sent, fbcon has two paths, one path taken when there is another frame buffer to switch any affected vcs to and another path when there isn't. In the case where there is another frame buffer to use, fbcon_fb_unbind calls set_con2fb_map to remap all of the affected vcs to the replacement frame buffer. set_con2fb_map will eventually call con2fb_release_oldinfo when the last vcs gets unmapped from the old frame buffer. con2fb_release_oldinfo frees the fbcon data that is hooked off of the fb_info structure, including the cursor timer. In the case where there isn't another frame buffer to use, fbcon_fb_unbind simply calls fbcon_unbind, which doesn't clear the con2fb_map or free the fbcon data hooked from the fb_info structure. In particular, it doesn't stop the cursor blink timer. When the fb_info structure is then freed, we end up with a timer queue pointing into freed memory and "bad things" start happening. This patch first changes con2fb_release_oldinfo so that it can take a NULL pointer for the new frame buffer, but still does all of the deallocation and cursor timer cleanup. Finally, the patch tries to replicate some of what set_con2fb_map does by clearing the con2fb_map for the affected vcs and calling the modified con2fb_release_info function to clean up the fb_info structure. Signed-off-by: Keith Packard Signed-off-by: Tomi Valkeinen Signed-off-by: Jiri Slaby commit 0e8701c008436378d89f97e438ed8ded0debc7d4 Author: Cedric Le Goater Date: Wed Dec 4 17:49:51 2013 +0100 offb: Little endian fixes commit 212c0cbd5be721a39ef3e2f723e0c78008f9e955 upstream. The "screen" properties : depth, width, height, linebytes need to be converted to the host endian order when read from the device tree. The offb_init_palette_hacks() routine also made assumption on the host endian order. Signed-off-by: Cédric Le Goater Signed-off-by: Benjamin Herrenschmidt Signed-off-by: Jiri Slaby commit 056a01ab3c65605a16aa4841de0530d9d3d66b9f Author: Jan Kara Date: Thu Jun 26 12:28:57 2014 -0400 ext4: Fix block zeroing when punching holes in indirect block files commit 77ea2a4ba657a1ad4fb7c64bc5cdce84b8a132b6 upstream. free_holes_block() passed local variable as a block pointer to ext4_clear_blocks(). Thus ext4_clear_blocks() zeroed out this local variable instead of proper place in inode / indirect block. We later zero out proper place in inode / indirect block but don't dirty the inode / buffer again which can lead to subtle issues (some changes e.g. to inode can be lost). Signed-off-by: Jan Kara Signed-off-by: Theodore Ts'o Signed-off-by: Jiri Slaby commit 8b18c0adbc5d0cb1530692e72bcfb88fd7bb77bb Author: Eric W. Biederman Date: Mon Jul 28 17:26:07 2014 -0700 mnt: Correct permission checks in do_remount commit 9566d6742852c527bf5af38af5cbb878dad75705 upstream. While invesgiating the issue where in "mount --bind -oremount,ro ..." would result in later "mount --bind -oremount,rw" succeeding even if the mount started off locked I realized that there are several additional mount flags that should be locked and are not. In particular MNT_NOSUID, MNT_NODEV, MNT_NOEXEC, and the atime flags in addition to MNT_READONLY should all be locked. These flags are all per superblock, can all be changed with MS_BIND, and should not be changable if set by a more privileged user. The following additions to the current logic are added in this patch. - nosuid may not be clearable by a less privileged user. - nodev may not be clearable by a less privielged user. - noexec may not be clearable by a less privileged user. - atime flags may not be changeable by a less privileged user. The logic with atime is that always setting atime on access is a global policy and backup software and auditing software could break if atime bits are not updated (when they are configured to be updated), and serious performance degradation could result (DOS attack) if atime updates happen when they have been explicitly disabled. Therefore an unprivileged user should not be able to mess with the atime bits set by a more privileged user. The additional restrictions are implemented with the addition of MNT_LOCK_NOSUID, MNT_LOCK_NODEV, MNT_LOCK_NOEXEC, and MNT_LOCK_ATIME mnt flags. Taken together these changes and the fixes for MNT_LOCK_READONLY should make it safe for an unprivileged user to create a user namespace and to call "mount --bind -o remount,... ..." without the danger of mount flags being changed maliciously. Cc: stable@vger.kernel.org Acked-by: Serge E. Hallyn Signed-off-by: "Eric W. Biederman" Signed-off-by: Jiri Slaby commit cab259f821fad20afa688d3fbeb47356447ac20b Author: Eric W. Biederman Date: Mon Jul 28 17:10:56 2014 -0700 mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount commit 07b645589dcda8b7a5249e096fece2a67556f0f4 upstream. There are no races as locked mount flags are guaranteed to never change. Moving the test into do_remount makes it more visible, and ensures all filesystem remounts pass the MNT_LOCK_READONLY permission check. This second case is not an issue today as filesystem remounts are guarded by capable(CAP_DAC_ADMIN) and thus will always fail in less privileged mount namespaces, but it could become an issue in the future. Cc: stable@vger.kernel.org Acked-by: Serge E. Hallyn Signed-off-by: "Eric W. Biederman" Signed-off-by: Jiri Slaby commit 25c1def33a2f74079f3062b7afdf98fcf9f34e6d Author: Eric W. Biederman Date: Mon Jul 28 16:26:53 2014 -0700 mnt: Only change user settable mount flags in remount commit a6138db815df5ee542d848318e5dae681590fccd upstream. Kenton Varda discovered that by remounting a read-only bind mount read-only in a user namespace the MNT_LOCK_READONLY bit would be cleared, allowing an unprivileged user to the remount a read-only mount read-write. Correct this by replacing the mask of mount flags to preserve with a mask of mount flags that may be changed, and preserve all others. This ensures that any future bugs with this mask and remount will fail in an easy to detect way where new mount flags simply won't change. Cc: stable@vger.kernel.org Acked-by: Serge E. Hallyn Signed-off-by: "Eric W. Biederman" Signed-off-by: Jiri Slaby commit 542b50ef6910233f971d30aae0e1322339e25ae9 Author: Naoya Horiguchi Date: Wed Jul 23 14:00:19 2014 -0700 mm: hugetlb: fix copy_hugetlb_page_range() commit 0253d634e0803a8376a0d88efee0bf523d8673f9 upstream. Commit 4a705fef9862 ("hugetlb: fix copy_hugetlb_page_range() to handle migration/hwpoisoned entry") changed the order of huge_ptep_set_wrprotect() and huge_ptep_get(), which leads to breakage in some workloads like hugepage-backed heap allocation via libhugetlbfs. This patch fixes it. The test program for the problem is shown below: $ cat heap.c #include #include #include #define HPS 0x200000 int main() { int i; char *p = malloc(HPS); memset(p, '1', HPS); for (i = 0; i < 5; i++) { if (!fork()) { memset(p, '2', HPS); p = malloc(HPS); memset(p, '3', HPS); free(p); return 0; } } sleep(1); free(p); return 0; } $ export HUGETLB_MORECORE=yes ; export HUGETLB_NO_PREFAULT= ; hugectl --heap ./heap Fixes 4a705fef9862 ("hugetlb: fix copy_hugetlb_page_range() to handle migration/hwpoisoned entry"), so is applicable to -stable kernels which include it. Signed-off-by: Naoya Horiguchi Reported-by: Guillaume Morin Suggested-by: Guillaume Morin Acked-by: Hugh Dickins Cc: [2.6.37+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Jiri Slaby commit f4050f0ccbe5a21396e51fce3319192dc61841c1 Author: Naoya Horiguchi Date: Mon Jun 23 13:22:03 2014 -0700 hugetlb: fix copy_hugetlb_page_range() to handle migration/hwpoisoned entry commit 4a705fef986231a3e7a6b1a6d3c37025f021f49f upstream. There's a race between fork() and hugepage migration, as a result we try to "dereference" a swap entry as a normal pte, causing kernel panic. The cause of the problem is that copy_hugetlb_page_range() can't handle "swap entry" family (migration entry and hwpoisoned entry) so let's fix it. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Naoya Horiguchi Acked-by: Hugh Dickins Cc: Christoph Lameter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Jiri Slaby commit d7c722da3c3aeed898eccbad143ff6bfd84812e5 Author: Eliad Peller Date: Tue Feb 11 12:30:18 2014 +0200 mac80211: reset probe_send_count also in HW_CONNECTION_MONITOR case commit 448cd2e248732326632957e52ea9c44729affcb2 upstream. In case of beacon_loss with IEEE80211_HW_CONNECTION_MONITOR device, mac80211 probes the ap (and disconnects on timeout) but ignores the ack. If we already got an ack, there's no reason to continue disconnecting. this can help devices that supports IEEE80211_HW_CONNECTION_MONITOR only partially (e.g. take care of keep alives, but does not probe the ap. In case the device wants to disconnect without probing, it can just call ieee80211_connection_loss. Signed-off-by: Eliad Peller Signed-off-by: Johannes Berg Signed-off-by: Jiri Slaby commit 27b249776f31bf46e2436bf3e41624a208e7ba61 Author: Ilan Peer Date: Tue Dec 24 22:08:14 2013 +0200 iwlwifi: mvm: Add a missed beacons threshold commit 12d423e816c69b0b4457bc047dda9a0a1c1a53c1 upstream. Instead of always calling ieee80211_beacon_loss() on every missed beacons notification, call this function only if the number of consecutive missed beacons from last rx is higher than a predefined threshold. Signed-off-by: Ilan Peer Signed-off-by: Emmanuel Grumbach Signed-off-by: Jiri Slaby commit 15f4a2d048fd6447b8a66d20231cfad4dca5fa86 Author: Andrey Utkin Date: Mon Aug 4 23:47:41 2014 +0300 arch/sparc/math-emu/math_32.c: drop stray break operator [ Upstream commit 093758e3daede29cb4ce6aedb111becf9d4bfc57 ] This commit is a guesswork, but it seems to make sense to drop this break, as otherwise the following line is never executed and becomes dead code. And that following line actually saves the result of local calculation by the pointer given in function argument. So the proposed change makes sense if this code in the whole makes sense (but I am unable to analyze it in the whole). Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=81641 Reported-by: David Binderman Signed-off-by: Andrey Utkin Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit 7e8d4ace2a7c12cbbb533413ad449024d9411b00 Author: Sowmini Varadhan Date: Fri Aug 1 09:50:40 2014 -0400 sparc64: ldc_connect() should not return EINVAL when handshake is in progress. [ Upstream commit 4ec1b01029b4facb651b8ef70bc20a4be4cebc63 ] The LDC handshake could have been asynchronously triggered after ldc_bind() enables the ldc_rx() receive interrupt-handler (and thus intercepts incoming control packets) and before vio_port_up() calls ldc_connect(). If that is the case, ldc_connect() should return 0 and let the state-machine progress. Signed-off-by: Sowmini Varadhan Acked-by: Karl Volz Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit 5c1c4ea37d89f674b1f9583ac87c2045a4e365ff Author: Christopher Alexander Tobias Schulze Date: Sun Aug 3 16:01:53 2014 +0200 sunsab: Fix detection of BREAK on sunsab serial console [ Upstream commit fe418231b195c205701c0cc550a03f6c9758fd9e ] Fix detection of BREAK on sunsab serial console: BREAK detection was only performed when there were also serial characters received simultaneously. To handle all BREAKs correctly, the check for BREAK and the corresponding call to uart_handle_break() must also be done if count == 0, therefore duplicate this code fragment and pull it out of the loop over the received characters. Patch applies to 3.16-rc6. Signed-off-by: Christopher Alexander Tobias Schulze Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit a98877459ce8471c492eba5c86f659a8936fb368 Author: Christopher Alexander Tobias Schulze Date: Sun Aug 3 15:44:52 2014 +0200 bbc-i2c: Fix BBC I2C envctrl on SunBlade 2000 [ Upstream commit 5cdceab3d5e02eb69ea0f5d8fa9181800baf6f77 ] Fix regression in bbc i2c temperature and fan control on some Sun systems that causes the driver to refuse to load due to the bbc_i2c_bussel resource not being present on the (second) i2c bus where the temperature sensors and fan control are located. (The check for the number of resources was removed when the driver was ported to a pure OF driver in mid 2008.) Signed-off-by: Christopher Alexander Tobias Schulze Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit 7eca30d5af4302eda6b876dce3af504b490f9e9c Author: David S. Miller Date: Mon Aug 4 20:07:37 2014 -0700 sparc64: Guard against flushing openfirmware mappings. [ Upstream commit 4ca9a23765da3260058db3431faf5b4efd8cf926 ] Based almost entirely upon a patch by Christopher Alexander Tobias Schulze. In commit db64fe02258f1507e13fe5212a989922323685ce ("mm: rewrite vmap layer") lazy VMAP tlb flushing was added to the vmalloc layer. This causes problems on sparc64. Sparc64 has two VMAP mapped regions and they are not contiguous with eachother. First we have the malloc mapping area, then another unrelated region, then the vmalloc region. This "another unrelated region" is where the firmware is mapped. If the lazy TLB flushing logic in the vmalloc code triggers after we've had both a module unload and a vfree or similar, it will pass an address range that goes from somewhere inside the malloc region to somewhere inside the vmalloc region, and thus covering the openfirmware area entirely. The sparc64 kernel learns about openfirmware's dynamic mappings in this region early in the boot, and then services TLB misses in this area. But openfirmware has some locked TLB entries which are not mentioned in those dynamic mappings and we should thus not disturb them. These huge lazy TLB flush ranges causes those openfirmware locked TLB entries to be removed, resulting in all kinds of problems including hard hangs and crashes during reboot/reset. Besides causing problems like this, such huge TLB flush ranges are also incredibly inefficient. A plea has been made with the author of the VMAP lazy TLB flushing code, but for now we'll put a safety guard into our flush_tlb_kernel_range() implementation. Since the implementation has become non-trivial, stop defining it as a macro and instead make it a function in a C source file. Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit 8dbeb4c1447f85e4d020506f671c08ec4cd81c2a Author: David S. Miller Date: Mon Aug 4 16:34:01 2014 -0700 sparc64: Do not insert non-valid PTEs into the TSB hash table. [ Upstream commit 18f38132528c3e603c66ea464727b29e9bbcb91b ] The assumption was that update_mmu_cache() (and the equivalent for PMDs) would only be called when the PTE being installed will be accessible by the user. This is not true for code paths originating from remove_migration_pte(). There are dire consequences for placing a non-valid PTE into the TSB. The TLB miss frramework assumes thatwhen a TSB entry matches we can just load it into the TLB and return from the TLB miss trap. So if a non-valid PTE is in there, we will deadlock taking the TLB miss over and over, never satisfying the miss. Just exit early from update_mmu_cache() and friends in this situation. Based upon a report and patch from Christopher Alexander Tobias Schulze. Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit 71adc6cba6c1107fd60809af883f440b1d0923d1 Author: David S. Miller Date: Sat May 17 11:28:05 2014 -0700 sparc64: Add membar to Niagara2 memcpy code. [ Upstream commit 5aa4ecfd0ddb1e6dcd1c886e6c49677550f581aa ] This is the prevent previous stores from overlapping the block stores done by the memcpy loop. Based upon a glibc patch by Jose E. Marchesi Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit ffd19429051807486c032f22a51bb922ec499d02 Author: David S. Miller Date: Wed May 7 14:07:32 2014 -0700 sparc64: Fix huge TSB mapping on pre-UltraSPARC-III cpus. [ Upstream commit b18eb2d779240631a098626cb6841ee2dd34fda0 ] Access to the TSB hash tables during TLB misses requires that there be an atomic 128-bit quad load available so that we fetch a matching TAG and DATA field at the same time. On cpus prior to UltraSPARC-III only virtual address based quad loads are available. UltraSPARC-III and later provide physical address based variants which are easier to use. When we only have virtual address based quad loads available this means that we have to lock the TSB into the TLB at a fixed virtual address on each cpu when it runs that process. We can't just access the PAGE_OFFSET based aliased mapping of these TSBs because we cannot take a recursive TLB miss inside of the TLB miss handler without risking running out of hardware trap levels (some trap combinations can be deep, such as those generated by register window spill and fill traps). Without huge pages it's working perfectly fine, but when the huge TSB got added another chunk of fixed virtual address space was not allocated for this second TSB mapping. So we were mapping both the 8K and 4MB TSBs to the same exact virtual address, causing multiple TLB matches which gives undefined behavior. Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit 49d8dd90871341cd21b0255fbf6df3077c5fbb3b Author: David S. Miller Date: Tue May 6 21:27:37 2014 -0700 sparc64: Don't bark so loudly about 32-bit tasks generating 64-bit fault addresses. [ Upstream commit e5c460f46ae7ee94831cb55cb980f942aa9e5a85 ] This was found using Dave Jone's trinity tool. When a user process which is 32-bit performs a load or a store, the cpu chops off the top 32-bits of the effective address before translating it. This is because we run 32-bit tasks with the PSTATE_AM (address masking) bit set. We can't run the kernel with that bit set, so when the kernel accesses userspace no address masking occurs. Since a 32-bit process will have no mappings in that region we will properly fault, so we don't try to handle this using access_ok(), which can safely just be a NOP on sparc64. Real faults from 32-bit processes should never generate such addresses so a bug check was added long ago, and it barks in the logs if this happens. But it also barks when a kernel user access causes this condition, and that _can_ happen. For example, if a pointer passed into a system call is "0xfffffffc" and the kernel access 4 bytes offset from that pointer. Just handle such faults normally via the exception entries. Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit 168e88a714d78847b2c6adecd5e5c3969085f600 Author: David S. Miller Date: Tue Apr 29 13:28:23 2014 -0700 sparc64: Give more detailed information in {pgd,pmd}_ERROR() and kill pte_ERROR(). [ Upstream commit fe866433f843b080246ce729b5e6b27b5f5d9a58 ] pte_ERROR() is not used anywhere, delete it. For pgd_ERROR() and pmd_ERROR(), output something similar to x86, giving the address of the pgd/pmd as well as it's value. Also provide the caller, since these macros are invoked from pgd_clear_bad() and pmd_clear_bad() which provides little context as to what high level operation was occuring when the BAD state was detected. Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit 53171720ebb51e530d9d7c5d1eef40d2411cdb9a Author: David S. Miller Date: Mon Apr 28 23:52:11 2014 -0700 sparc64: Fix top-level fault handling bugs. [ Upstream commit 70ffc6ebaead783ac8dafb1e87df0039bb043596 ] Make get_user_insn() able to cope with huge PMDs. Next, make do_fault_siginfo() more robust when get_user_insn() can't actually fetch the instruction. In particular, use the MMU announced fault address when that happens, instead of calling compute_effective_address() and computing garbage. Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit 46d044e759115fdeb3ca5b74293d0cd641ae97c8 Author: David S. Miller Date: Mon Apr 28 23:50:08 2014 -0700 sparc64: Handle 32-bit tasks properly in compute_effective_address(). [ Upstream commit d037d16372bbe4d580342bebbb8826821ad9edf0 ] If we have a 32-bit task we must chop off the top 32-bits of the 64-bit value just as the cpu would. Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit 47b752610108d84c8032b75877b94cdafff0a90e Author: Kirill Tkhai Date: Thu Apr 17 00:45:24 2014 +0400 sparc64: Make itc_sync_lock raw [ Upstream commit 49b6c01f4c1de3b5e5427ac5aba80f9f6d27837a ] One more place where we must not be able to be preempted or to be interrupted in RT. Always actually disable interrupts during synchronization cycle. Signed-off-by: Kirill Tkhai Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit b87c3db222bba76dfc08d6ce65d9c73c8f1df00c Author: David S. Miller Date: Wed Apr 30 19:37:48 2014 -0700 sparc64: Fix argument sign extension for compat_sys_futex(). [ Upstream commit aa3449ee9c87d9b7660dd1493248abcc57769e31 ] Only the second argument, 'op', is signed. Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit 76ae4fcb2283279d807fa56c10455f35ccc09e22 Author: Eric Dumazet Date: Tue Aug 5 16:49:52 2014 +0200 sctp: fix possible seqlock seadlock in sctp_packet_transmit() [ Upstream commit 757efd32d5ce31f67193cc0e6a56e4dffcc42fb1 ] Dave reported following splat, caused by improper use of IP_INC_STATS_BH() in process context. BUG: using __this_cpu_add() in preemptible [00000000] code: trinity-c117/14551 caller is __this_cpu_preempt_check+0x13/0x20 CPU: 3 PID: 14551 Comm: trinity-c117 Not tainted 3.16.0+ #33 ffffffff9ec898f0 0000000047ea7e23 ffff88022d32f7f0 ffffffff9e7ee207 0000000000000003 ffff88022d32f818 ffffffff9e397eaa ffff88023ee70b40 ffff88022d32f970 ffff8801c026d580 ffff88022d32f828 ffffffff9e397ee3 Call Trace: [] dump_stack+0x4e/0x7a [] check_preemption_disabled+0xfa/0x100 [] __this_cpu_preempt_check+0x13/0x20 [] sctp_packet_transmit+0x692/0x710 [sctp] [] sctp_outq_flush+0x2a2/0xc30 [sctp] [] ? mark_held_locks+0x7c/0xb0 [] ? _raw_spin_unlock_irqrestore+0x5d/0x80 [] sctp_outq_uncork+0x1a/0x20 [sctp] [] sctp_cmd_interpreter.isra.23+0x1142/0x13f0 [sctp] [] sctp_do_sm+0xdb/0x330 [sctp] [] ? preempt_count_sub+0xab/0x100 [] ? sctp_cname+0x70/0x70 [sctp] [] sctp_primitive_ASSOCIATE+0x3a/0x50 [sctp] [] sctp_sendmsg+0x88f/0xe30 [sctp] [] ? lock_release_holdtime.part.28+0x9a/0x160 [] ? put_lock_stats.isra.27+0xe/0x30 [] inet_sendmsg+0x104/0x220 [] ? inet_sendmsg+0x5/0x220 [] sock_sendmsg+0x9e/0xe0 [] ? might_fault+0xb9/0xc0 [] ? might_fault+0x5e/0xc0 [] SYSC_sendto+0x124/0x1c0 [] ? syscall_trace_enter+0x250/0x330 [] SyS_sendto+0xe/0x10 [] tracesys+0xdd/0xe2 This is a followup of commits f1d8cba61c3c4b ("inet: fix possible seqlock deadlocks") and 7f88c6b23afbd315 ("ipv6: fix possible seqlock deadlock in ip6_finish_output2") Signed-off-by: Eric Dumazet Cc: Hannes Frederic Sowa Reported-by: Dave Jones Acked-by: Neil Horman Acked-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit cb3770c1f6b13463dcb48802d2cc99591651fa63 Author: Sasha Levin Date: Thu Jul 31 23:00:35 2014 -0400 iovec: make sure the caller actually wants anything in memcpy_fromiovecend [ Upstream commit 06ebb06d49486676272a3c030bfeef4bd969a8e6 ] Check for cases when the caller requests 0 bytes instead of running off and dereferencing potentially invalid iovecs. Signed-off-by: Sasha Levin Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit 4b29e9c0a64a0ce0256505315cc3f5819ddbec64 Author: Vlad Yasevich Date: Thu Jul 31 10:33:06 2014 -0400 net: Correctly set segment mac_len in skb_segment(). [ Upstream commit fcdfe3a7fa4cb74391d42b6a26dc07c20dab1d82 ] When performing segmentation, the mac_len value is copied right out of the original skb. However, this value is not always set correctly (like when the packet is VLAN-tagged) and we'll end up copying a bad value. One way to demonstrate this is to configure a VM which tags packets internally and turn off VLAN acceleration on the forwarding bridge port. The packets show up corrupt like this: 16:18:24.985548 52:54:00:ab:be:25 > 52:54:00:26:ce:a3, ethertype 802.1Q (0x8100), length 1518: vlan 100, p 0, ethertype 0x05e0, 0x0000: 8cdb 1c7c 8cdb 0064 4006 b59d 0a00 6402 ...|...d@.....d. 0x0010: 0a00 6401 9e0d b441 0a5e 64ec 0330 14fa ..d....A.^d..0.. 0x0020: 29e3 01c9 f871 0000 0101 080a 000a e833)....q.........3 0x0030: 000f 8c75 6e65 7470 6572 6600 6e65 7470 ...unetperf.netp 0x0040: 6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp 0x0050: 6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp 0x0060: 6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp ... This also leads to awful throughput as GSO packets are dropped and cause retransmissions. The solution is to set the mac_len using the values already available in then new skb. We've already adjusted all of the header offset, so we might as well correctly figure out the mac_len using skb_reset_mac_len(). After this change, packets are segmented correctly and performance is restored. CC: Eric Dumazet Signed-off-by: Vlad Yasevich Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 6a25e8f778995cabb0cfe2acb3247e3b42dec35f Author: Vlad Yasevich Date: Thu Jul 31 10:30:25 2014 -0400 macvlan: Initialize vlan_features to turn on offload support. [ Upstream commit 081e83a78db9b0ae1f5eabc2dedecc865f509b98 ] Macvlan devices do not initialize vlan_features. As a result, any vlan devices configured on top of macvlans perform very poorly. Initialize vlan_features based on the vlan features of the lower-level device. Signed-off-by: Vlad Yasevich Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 4a07c786e3d9fbe989d8b5bf9920a1e34afd8b91 Author: Daniel Borkmann Date: Tue Jul 22 15:22:45 2014 +0200 net: sctp: inherit auth_capable on INIT collisions [ Upstream commit 1be9a950c646c9092fb3618197f7b6bfb50e82aa ] Jason reported an oops caused by SCTP on his ARM machine with SCTP authentication enabled: Internal error: Oops: 17 [#1] ARM CPU: 0 PID: 104 Comm: sctp-test Not tainted 3.13.0-68744-g3632f30c9b20-dirty #1 task: c6eefa40 ti: c6f52000 task.ti: c6f52000 PC is at sctp_auth_calculate_hmac+0xc4/0x10c LR is at sg_init_table+0x20/0x38 pc : [] lr : [] psr: 40000013 sp : c6f538e8 ip : 00000000 fp : c6f53924 r10: c6f50d80 r9 : 00000000 r8 : 00010000 r7 : 00000000 r6 : c7be4000 r5 : 00000000 r4 : c6f56254 r3 : c00c8170 r2 : 00000001 r1 : 00000008 r0 : c6f1e660 Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: 0005397f Table: 06f28000 DAC: 00000015 Process sctp-test (pid: 104, stack limit = 0xc6f521c0) Stack: (0xc6f538e8 to 0xc6f54000) [...] Backtrace: [] (sctp_auth_calculate_hmac+0x0/0x10c) from [] (sctp_packet_transmit+0x33c/0x5c8) [] (sctp_packet_transmit+0x0/0x5c8) from [] (sctp_outq_flush+0x7fc/0x844) [] (sctp_outq_flush+0x0/0x844) from [] (sctp_outq_uncork+0x24/0x28) [] (sctp_outq_uncork+0x0/0x28) from [] (sctp_side_effects+0x1134/0x1220) [] (sctp_side_effects+0x0/0x1220) from [] (sctp_do_sm+0xac/0xd4) [] (sctp_do_sm+0x0/0xd4) from [] (sctp_assoc_bh_rcv+0x118/0x160) [] (sctp_assoc_bh_rcv+0x0/0x160) from [] (sctp_inq_push+0x6c/0x74) [] (sctp_inq_push+0x0/0x74) from [] (sctp_rcv+0x7d8/0x888) While we already had various kind of bugs in that area ec0223ec48a9 ("net: sctp: fix sctp_sf_do_5_1D_ce to verify if we/peer is AUTH capable") and b14878ccb7fa ("net: sctp: cache auth_enable per endpoint"), this one is a bit of a different kind. Giving a bit more background on why SCTP authentication is needed can be found in RFC4895: SCTP uses 32-bit verification tags to protect itself against blind attackers. These values are not changed during the lifetime of an SCTP association. Looking at new SCTP extensions, there is the need to have a method of proving that an SCTP chunk(s) was really sent by the original peer that started the association and not by a malicious attacker. To cause this bug, we're triggering an INIT collision between peers; normal SCTP handshake where both sides intent to authenticate packets contains RANDOM; CHUNKS; HMAC-ALGO parameters that are being negotiated among peers: ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ----------> <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] --------- -------------------- COOKIE-ECHO --------------------> <-------------------- COOKIE-ACK --------------------- RFC4895 says that each endpoint therefore knows its own random number and the peer's random number *after* the association has been established. The local and peer's random number along with the shared key are then part of the secret used for calculating the HMAC in the AUTH chunk. Now, in our scenario, we have 2 threads with 1 non-blocking SEQ_PACKET socket each, setting up common shared SCTP_AUTH_KEY and SCTP_AUTH_ACTIVE_KEY properly, and each of them calling sctp_bindx(3), listen(2) and connect(2) against each other, thus the handshake looks similar to this, e.g.: ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ----------> <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] --------- <--------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ----------- -------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] --------> ... Since such collisions can also happen with verification tags, the RFC4895 for AUTH rather vaguely says under section 6.1: In case of INIT collision, the rules governing the handling of this Random Number follow the same pattern as those for the Verification Tag, as explained in Section 5.2.4 of RFC 2960 [5]. Therefore, each endpoint knows its own Random Number and the peer's Random Number after the association has been established. In RFC2960, section 5.2.4, we're eventually hitting Action B: B) In this case, both sides may be attempting to start an association at about the same time but the peer endpoint started its INIT after responding to the local endpoint's INIT. Thus it may have picked a new Verification Tag not being aware of the previous Tag it had sent this endpoint. The endpoint should stay in or enter the ESTABLISHED state but it MUST update its peer's Verification Tag from the State Cookie, stop any init or cookie timers that may running and send a COOKIE ACK. In other words, the handling of the Random parameter is the same as behavior for the Verification Tag as described in Action B of section 5.2.4. Looking at the code, we exactly hit the sctp_sf_do_dupcook_b() case which triggers an SCTP_CMD_UPDATE_ASSOC command to the side effect interpreter, and in fact it properly copies over peer_{random, hmacs, chunks} parameters from the newly created association to update the existing one. Also, the old asoc_shared_key is being released and based on the new params, sctp_auth_asoc_init_active_key() updated. However, the issue observed in this case is that the previous asoc->peer.auth_capable was 0, and has *not* been updated, so that instead of creating a new secret, we're doing an early return from the function sctp_auth_asoc_init_active_key() leaving asoc->asoc_shared_key as NULL. However, we now have to authenticate chunks from the updated chunk list (e.g. COOKIE-ACK). That in fact causes the server side when responding with ... <------------------ AUTH; COOKIE-ACK ----------------- ... to trigger a NULL pointer dereference, since in sctp_packet_transmit(), it discovers that an AUTH chunk is being queued for xmit, and thus it calls sctp_auth_calculate_hmac(). Since the asoc->active_key_id is still inherited from the endpoint, and the same as encoded into the chunk, it uses asoc->asoc_shared_key, which is still NULL, as an asoc_key and dereferences it in ... crypto_hash_setkey(desc.tfm, &asoc_key->data[0], asoc_key->len) ... causing an oops. All this happens because sctp_make_cookie_ack() called with the *new* association has the peer.auth_capable=1 and therefore marks the chunk with auth=1 after checking sctp_auth_send_cid(), but it is *actually* sent later on over the then *updated* association's transport that didn't initialize its shared key due to peer.auth_capable=0. Since control chunks in that case are not sent by the temporary association which are scheduled for deletion, they are issued for xmit via SCTP_CMD_REPLY in the interpreter with the context of the *updated* association. peer.auth_capable was 0 in the updated association (which went from COOKIE_WAIT into ESTABLISHED state), since all previous processing that performed sctp_process_init() was being done on temporary associations, that we eventually throw away each time. The correct fix is to update to the new peer.auth_capable value as well in the collision case via sctp_assoc_update(), so that in case the collision migrated from 0 -> 1, sctp_auth_asoc_init_active_key() can properly recalculate the secret. This therefore fixes the observed server panic. Fixes: 730fc3d05cd4 ("[SCTP]: Implete SCTP-AUTH parameter processing") Reported-by: Jason Gunthorpe Signed-off-by: Daniel Borkmann Tested-by: Jason Gunthorpe Cc: Vlad Yasevich Acked-by: Vlad Yasevich Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit 3ef99737f145fce80734b1e868b77422676489d7 Author: Christoph Paasch Date: Tue Jul 29 13:40:57 2014 +0200 tcp: Fix integer-overflow in TCP vegas [ Upstream commit 1f74e613ded11517db90b2bd57e9464d9e0fb161 ] In vegas we do a multiplication of the cwnd and the rtt. This may overflow and thus their result is stored in a u64. However, we first need to cast the cwnd so that actually 64-bit arithmetic is done. Then, we need to do do_div to allow this to be used on 32-bit arches. Cc: Stephen Hemminger Cc: Neal Cardwell Cc: Eric Dumazet Cc: David Laight Cc: Doug Leith Fixes: 8d3a564da34e (tcp: tcp_vegas cong avoid fix) Signed-off-by: Christoph Paasch Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit 1ba9d8a3a0be1c762f04031b1111aa1455408d38 Author: Christoph Paasch Date: Tue Jul 29 12:07:27 2014 +0200 tcp: Fix integer-overflows in TCP veno [ Upstream commit 45a07695bc64b3ab5d6d2215f9677e5b8c05a7d0 ] In veno we do a multiplication of the cwnd and the rtt. This may overflow and thus their result is stored in a u64. However, we first need to cast the cwnd so that actually 64-bit arithmetic is done. A first attempt at fixing 76f1017757aa0 ([TCP]: TCP Veno congestion control) was made by 159131149c2 (tcp: Overflow bug in Vegas), but it failed to add the required cast in tcp_veno_cong_avoid(). Fixes: 76f1017757aa0 ([TCP]: TCP Veno congestion control) Signed-off-by: Christoph Paasch Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit 2cf6a6ae722a13c73219a698f873955926252026 Author: Andrey Ryabinin Date: Sat Jul 26 21:26:58 2014 +0400 net: sendmsg: fix NULL pointer dereference [ Upstream commit 40eea803c6b2cfaab092f053248cbeab3f368412 ] Sasha's report: > While fuzzing with trinity inside a KVM tools guest running the latest -next > kernel with the KASAN patchset, I've stumbled on the following spew: > > [ 4448.949424] ================================================================== > [ 4448.951737] AddressSanitizer: user-memory-access on address 0 > [ 4448.952988] Read of size 2 by thread T19638: > [ 4448.954510] CPU: 28 PID: 19638 Comm: trinity-c76 Not tainted 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813 > [ 4448.956823] ffff88046d86ca40 0000000000000000 ffff880082f37e78 ffff880082f37a40 > [ 4448.958233] ffffffffb6e47068 ffff880082f37a68 ffff880082f37a58 ffffffffb242708d > [ 4448.959552] 0000000000000000 ffff880082f37a88 ffffffffb24255b1 0000000000000000 > [ 4448.961266] Call Trace: > [ 4448.963158] dump_stack (lib/dump_stack.c:52) > [ 4448.964244] kasan_report_user_access (mm/kasan/report.c:184) > [ 4448.965507] __asan_load2 (mm/kasan/kasan.c:352) > [ 4448.966482] ? netlink_sendmsg (net/netlink/af_netlink.c:2339) > [ 4448.967541] netlink_sendmsg (net/netlink/af_netlink.c:2339) > [ 4448.968537] ? get_parent_ip (kernel/sched/core.c:2555) > [ 4448.970103] sock_sendmsg (net/socket.c:654) > [ 4448.971584] ? might_fault (mm/memory.c:3741) > [ 4448.972526] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3740) > [ 4448.973596] ? verify_iovec (net/core/iovec.c:64) > [ 4448.974522] ___sys_sendmsg (net/socket.c:2096) > [ 4448.975797] ? put_lock_stats.isra.13 (./arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254) > [ 4448.977030] ? lock_release_holdtime (kernel/locking/lockdep.c:273) > [ 4448.978197] ? lock_release_non_nested (kernel/locking/lockdep.c:3434 (discriminator 1)) > [ 4448.979346] ? check_chain_key (kernel/locking/lockdep.c:2188) > [ 4448.980535] __sys_sendmmsg (net/socket.c:2181) > [ 4448.981592] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600) > [ 4448.982773] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607) > [ 4448.984458] ? syscall_trace_enter (arch/x86/kernel/ptrace.c:1500 (discriminator 2)) > [ 4448.985621] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600) > [ 4448.986754] SyS_sendmmsg (net/socket.c:2201) > [ 4448.987708] tracesys (arch/x86/kernel/entry_64.S:542) > [ 4448.988929] ================================================================== This reports means that we've come to netlink_sendmsg() with msg->msg_name == NULL and msg->msg_namelen > 0. After this report there was no usual "Unable to handle kernel NULL pointer dereference" and this gave me a clue that address 0 is mapped and contains valid socket address structure in it. This bug was introduced in f3d3342602f8bcbf37d7c46641cb9bca7618eb1c (net: rework recvmsg handler msg_name and msg_namelen logic). Commit message states that: "Set msg->msg_name = NULL if user specified a NULL in msg_name but had a non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't affect sendto as it would bail out earlier while trying to copy-in the address." But in fact this affects sendto when address 0 is mapped and contains socket address structure in it. In such case copy-in address will succeed, verify_iovec() function will successfully exit with msg->msg_namelen > 0 and msg->msg_name == NULL. This patch fixes it by setting msg_namelen to 0 if msg_name == NULL. Cc: Hannes Frederic Sowa Cc: Eric Dumazet Cc: Reported-by: Sasha Levin Signed-off-by: Andrey Ryabinin Acked-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit 2b03bce84364b12644b35a88f6d9e88bfcb7aacc Author: Eric Dumazet Date: Sat Jul 26 08:58:10 2014 +0200 ip: make IP identifiers less predictable [ Upstream commit 04ca6973f7c1a0d8537f2d9906a0cf8e69886d75 ] In "Counting Packets Sent Between Arbitrary Internet Hosts", Jeffrey and Jedidiah describe ways exploiting linux IP identifier generation to infer whether two machines are exchanging packets. With commit 73f156a6e8c1 ("inetpeer: get rid of ip_id_count"), we changed IP id generation, but this does not really prevent this side-channel technique. This patch adds a random amount of perturbation so that IP identifiers for a given destination [1] are no longer monotonically increasing after an idle period. Note that prandom_u32_max(1) returns 0, so if generator is used at most once per jiffy, this patch inserts no hole in the ID suite and do not increase collision probability. This is jiffies based, so in the worst case (HZ=1000), the id can rollover after ~65 seconds of idle time, which should be fine. We also change the hash used in __ip_select_ident() to not only hash on daddr, but also saddr and protocol, so that ICMP probes can not be used to infer information for other protocols. For IPv6, adds saddr into the hash as well, but not nexthdr. If I ping the patched target, we can see ID are now hard to predict. 21:57:11.008086 IP (...) A > target: ICMP echo request, seq 1, length 64 21:57:11.010752 IP (... id 2081 ...) target > A: ICMP echo reply, seq 1, length 64 21:57:12.013133 IP (...) A > target: ICMP echo request, seq 2, length 64 21:57:12.015737 IP (... id 3039 ...) target > A: ICMP echo reply, seq 2, length 64 21:57:13.016580 IP (...) A > target: ICMP echo request, seq 3, length 64 21:57:13.019251 IP (... id 3437 ...) target > A: ICMP echo reply, seq 3, length 64 [1] TCP sessions uses a per flow ID generator not changed by this patch. Signed-off-by: Eric Dumazet Reported-by: Jeffrey Knockel Reported-by: Jedidiah R. Crandall Cc: Willy Tarreau Cc: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit 16cc7c2f0ce25aaa048b626477f594668203c44d Author: Eric Dumazet Date: Mon Jun 2 05:26:03 2014 -0700 inetpeer: get rid of ip_id_count [ Upstream commit 73f156a6e8c1074ac6327e0abd1169e95eb66463 ] Ideally, we would need to generate IP ID using a per destination IP generator. linux kernels used inet_peer cache for this purpose, but this had a huge cost on servers disabling MTU discovery. 1) each inet_peer struct consumes 192 bytes 2) inetpeer cache uses a binary tree of inet_peer structs, with a nominal size of ~66000 elements under load. 3) lookups in this tree are hitting a lot of cache lines, as tree depth is about 20. 4) If server deals with many tcp flows, we have a high probability of not finding the inet_peer, allocating a fresh one, inserting it in the tree with same initial ip_id_count, (cf secure_ip_id()) 5) We garbage collect inet_peer aggressively. IP ID generation do not have to be 'perfect' Goal is trying to avoid duplicates in a short period of time, so that reassembly units have a chance to complete reassembly of fragments belonging to one message before receiving other fragments with a recycled ID. We simply use an array of generators, and a Jenkin hash using the dst IP as a key. ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it belongs (it is only used from this file) secure_ip_id() and secure_ipv6_id() no longer are needed. Rename ip_select_ident_more() to ip_select_ident_segs() to avoid unnecessary decrement/increment of the number of segments. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit 9ac1f1f2b7b385891a5a63d09337a2d66fd76673 Author: Dmitry Kravkov Date: Thu Jul 24 18:54:47 2014 +0300 bnx2x: fix crash during TSO tunneling [ Upstream commit fe26566d8a05151ba1dce75081f6270f73ec4ae1 ] When TSO packet is transmitted additional BD w/o mapping is used to describe the packed. The BD needs special handling in tx completion. kernel: Call Trace: kernel: [] dump_stack+0x19/0x1b kernel: [] warn_slowpath_common+0x61/0x80 kernel: [] warn_slowpath_fmt+0x5c/0x80 kernel: [] ? find_iova+0x4d/0x90 kernel: [] intel_unmap_page.part.36+0x142/0x160 kernel: [] intel_unmap_page+0x26/0x30 kernel: [] bnx2x_free_tx_pkt+0x157/0x2b0 [bnx2x] kernel: [] bnx2x_tx_int+0xac/0x220 [bnx2x] kernel: [] ? read_tsc+0x9/0x20 kernel: [] bnx2x_poll+0xbb/0x3c0 [bnx2x] kernel: [] net_rx_action+0x15a/0x250 kernel: [] __do_softirq+0xf7/0x290 kernel: [] call_softirq+0x1c/0x30 kernel: [] do_softirq+0x55/0x90 kernel: [] irq_exit+0x115/0x120 kernel: [] do_IRQ+0x58/0xf0 kernel: [] common_interrupt+0x6d/0x6d kernel: [] ? clockevents_notify+0x127/0x140 kernel: [] ? cpuidle_enter_state+0x4f/0xc0 kernel: [] cpuidle_idle_call+0xc5/0x200 kernel: [] arch_cpu_idle+0xe/0x30 kernel: [] cpu_startup_entry+0xf5/0x290 kernel: [] start_secondary+0x265/0x27b kernel: ---[ end trace 11aa7726f18d7e80 ]--- Fixes: a848ade408b ("bnx2x: add CSUM and TSO support for encapsulation protocols") Reported-by: Yulong Pei Cc: Michal Schmidt Signed-off-by: Dmitry Kravkov Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit 9377a0c19fd479731713fca0e653d9d0d55f4415 Author: Boris Ostrovsky Date: Wed Jul 9 13:18:18 2014 -0400 x86/espfix/xen: Fix allocation of pages for paravirt page tables commit 8762e5092828c4dc0f49da5a47a644c670df77f3 upstream. init_espfix_ap() is currently off by one level when informing hypervisor that allocated pages will be used for ministacks' page tables. The most immediate effect of this on a PV guest is that if 'stack_page = __get_free_page()' returns a non-zeroed-out page the hypervisor will refuse to use it for a page table (which it shouldn't be anyway). This will result in warnings by both Xen and Linux. More importantly, a subsequent write to that page (again, by a PV guest) is likely to result in fatal page fault. Signed-off-by: Boris Ostrovsky Link: http://lkml.kernel.org/r/1404926298-5565-1-git-send-email-boris.ostrovsky@oracle.com Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: H. Peter Anvin Signed-off-by: Jiri Slaby commit 834ed96b54b04bfd07c7e0964d20ccc2da0c534d Author: Minfei Huang Date: Wed Jun 4 16:11:53 2014 -0700 lib/btree.c: fix leak of whole btree nodes commit c75b53af2f0043aff500af0a6f878497bef41bca upstream. I use btree from 3.14-rc2 in my own module. When the btree module is removed, a warning arises: kmem_cache_destroy btree_node: Slab cache still has objects CPU: 13 PID: 9150 Comm: rmmod Tainted: GF O 3.14.0-rc2 #1 Hardware name: Inspur NF5270M3/NF5270M3, BIOS CHEETAH_2.1.3 09/10/2013 Call Trace: dump_stack+0x49/0x5d kmem_cache_destroy+0xcf/0xe0 btree_module_exit+0x10/0x12 [btree] SyS_delete_module+0x198/0x1f0 system_call_fastpath+0x16/0x1b The cause is that it doesn't release the last btree node, when height = 1 and fill = 1. [akpm@linux-foundation.org: remove unneeded test of NULL] Signed-off-by: Minfei Huang Cc: Joern Engel Cc: Johannes Berg Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Jiri Slaby commit c6f5709bebb3a04ce4bd0bebc0eba31e6d19cde7 Author: Sasha Levin Date: Mon Jul 14 17:02:31 2014 -0700 net/l2tp: don't fall back on UDP [get|set]sockopt commit 3cf521f7dc87c031617fd47e4b7aa2593c2f3daf upstream. The l2tp [get|set]sockopt() code has fallen back to the UDP functions for socket option levels != SOL_PPPOL2TP since day one, but that has never actually worked, since the l2tp socket isn't an inet socket. As David Miller points out: "If we wanted this to work, it'd have to look up the tunnel and then use tunnel->sk, but I wonder how useful that would be" Since this can never have worked so nobody could possibly have depended on that functionality, just remove the broken code and return -EINVAL. Reported-by: Sasha Levin Acked-by: James Chapman Acked-by: David Miller Cc: Phil Turnbull Cc: Vegard Nossum Cc: Willy Tarreau Signed-off-by: Linus Torvalds Signed-off-by: Jiri Slaby commit e0a37035c42e82250492cce9a914f78a3a8c036a Author: Max Filippov Date: Sat May 24 21:48:28 2014 +0400 xtensa: add fixup for double exception raised in window overflow commit 17290231df16eeee5dfc198dbf5ee4b419996dcd upstream. There are two FIXMEs in the double exception handler 'for the extremely unlikely case'. This case gets hit by gcc during kernel build once in a few hours, resulting in an unrecoverable exception condition. Provide missing fixup routine to handle this case. Double exception literals now need 8 more bytes, add them to the linker script. Also replace bbsi instructions with bbsi.l as we're branching depending on 8th and 7th LSB-based bits of exception address. This may be tested by adding the explicit DTLB invalidation to window overflow handlers, like the following: # --- a/arch/xtensa/kernel/vectors.S # +++ b/arch/xtensa/kernel/vectors.S # @@ -592,6 +592,14 @@ ENDPROC(_WindowUnderflow4) # ENTRY_ALIGN64(_WindowOverflow8) # # s32e a0, a9, -16 # + bbsi.l a9, 31, 1f # + rsr a0, ccount # + bbsi.l a0, 4, 1f # + pdtlb a0, a9 # + idtlb a0 # + movi a0, 9 # + idtlb a0 # +1: # l32e a0, a1, -12 # s32e a2, a9, -8 # s32e a1, a9, -12 Signed-off-by: Max Filippov Signed-off-by: Jiri Slaby commit 3516cca653f2c410b8f736a73daf8089f7a18628 Author: Johannes Berg Date: Mon Jul 7 12:01:11 2014 +0200 Revert "mac80211: move "bufferable MMPDU" check to fix AP mode scan" commit 08b9939997df30e42a228e1ecb97f99e9c8ea84e upstream. This reverts commit 277d916fc2e959c3f106904116bb4f7b1148d47a as it was at least breaking iwlwifi by setting the IEEE80211_TX_CTL_NO_PS_BUFFER flag in all kinds of interface modes, not only for AP mode where it is appropriate. To avoid reintroducing the original problem, explicitly check for probe request frames in the multicast buffering code. Fixes: 277d916fc2e9 ("mac80211: move "bufferable MMPDU" check to fix AP mode scan") Signed-off-by: Johannes Berg Signed-off-by: Jiri Slaby commit 6e1af05639abfc6f1841e6bf8b5c8492971ed1f2 Author: Malcolm Priestley Date: Wed Jul 23 21:35:11 2014 +0100 staging: vt6655: Fix Warning on boot handle_irq_event_percpu. commit 6cff1f6ad4c615319c1a146b2aa0af1043c5e9f5 upstream. WARNING: CPU: 0 PID: 929 at /home/apw/COD/linux/kernel/irq/handle.c:147 handle_irq_event_percpu+0x1d1/0x1e0() irq 17 handler device_intr+0x0/0xa80 [vt6655_stage] enabled interrupts Using spin_lock_irqsave appears to fix this. Signed-off-by: Malcolm Priestley Signed-off-by: Jiri Slaby commit ad878a9e0f1bb5e5c81039ab10d1b416c5fa58f1 Author: Andy Lutomirski Date: Wed Jul 23 08:34:11 2014 -0700 x86_64/entry/xen: Do not invoke espfix64 on Xen commit 7209a75d2009dbf7745e2fd354abf25c3deb3ca3 upstream. This moves the espfix64 logic into native_iret. To make this work, it gets rid of the native patch for INTERRUPT_RETURN: INTERRUPT_RETURN on native kernels is now 'jmp native_iret'. This changes the 16-bit SS behavior on Xen from OOPSing to leaking some bits of the Xen hypervisor's RSP (I think). [ hpa: this is a nonzero cost on native, but probably not enough to measure. Xen needs to fix this in their own code, probably doing something equivalent to espfix64. ] Signed-off-by: Andy Lutomirski Link: http://lkml.kernel.org/r/7b8f1d8ef6597cb16ae004a43c56980a7de3cf94.1406129132.git.luto@amacapital.net Signed-off-by: H. Peter Anvin Signed-off-by: Jiri Slaby commit da629b7d89012aed5da81acc405933aa5389b626 Author: H. Peter Anvin Date: Sun May 4 10:36:22 2014 -0700 x86, espfix: Make it possible to disable 16-bit support commit 34273f41d57ee8d854dcd2a1d754cbb546cb548f upstream. Embedded systems, which may be very memory-size-sensitive, are extremely unlikely to ever encounter any 16-bit software, so make it a CONFIG_EXPERT option to turn off support for any 16-bit software whatsoever. Signed-off-by: H. Peter Anvin Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com Signed-off-by: Jiri Slaby commit 9217746a13b46bf0da738c0bacdb77b0db288137 Author: H. Peter Anvin Date: Sun May 4 10:00:49 2014 -0700 x86, espfix: Make espfix64 a Kconfig option, fix UML commit 197725de65477bc8509b41388157c1a2283542bb upstream. Make espfix64 a hidden Kconfig option. This fixes the x86-64 UML build which had broken due to the non-existence of init_espfix_bsp() in UML: since UML uses its own Kconfig, this option does not appear in the UML build. This also makes it possible to make support for 16-bit segments a configuration option, for the people who want to minimize the size of the kernel. Reported-by: Ingo Molnar Signed-off-by: H. Peter Anvin Cc: Richard Weinberger Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com Signed-off-by: Jiri Slaby commit 134b7223097fea833afabe883d4d183364392eea Author: H. Peter Anvin Date: Fri May 2 11:33:51 2014 -0700 x86, espfix: Fix broken header guard commit 20b68535cd27183ebd3651ff313afb2b97dac941 upstream. Header guard is #ifndef, not #ifdef... Reported-by: Fengguang Wu Signed-off-by: H. Peter Anvin Signed-off-by: Jiri Slaby commit 24ebf77fa93bc2c5879f1d19973a78ecd1a8cfed Author: H. Peter Anvin Date: Thu May 1 14:12:23 2014 -0700 x86, espfix: Move espfix definitions into a separate header file commit e1fe9ed8d2a4937510d0d60e20705035c2609aea upstream. Sparse warns that the percpu variables aren't declared before they are defined. Rather than hacking around it, move espfix definitions into a proper header file. Reported-by: Fengguang Wu Signed-off-by: H. Peter Anvin Signed-off-by: Jiri Slaby commit 7e329475de1f97df56e1cfa412e5b3479994c202 Author: H. Peter Anvin Date: Tue Apr 29 16:46:09 2014 -0700 x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack commit 3891a04aafd668686239349ea58f3314ea2af86b upstream. The IRET instruction, when returning to a 16-bit segment, only restores the bottom 16 bits of the user space stack pointer. This causes some 16-bit software to break, but it also leaks kernel state to user space. We have a software workaround for that ("espfix") for the 32-bit kernel, but it relies on a nonzero stack segment base which is not available in 64-bit mode. In checkin: b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels we "solved" this by forbidding 16-bit segments on 64-bit kernels, with the logic that 16-bit support is crippled on 64-bit kernels anyway (no V86 support), but it turns out that people are doing stuff like running old Win16 binaries under Wine and expect it to work. This works around this by creating percpu "ministacks", each of which is mapped 2^16 times 64K apart. When we detect that the return SS is on the LDT, we copy the IRET frame to the ministack and use the relevant alias to return to userspace. The ministacks are mapped readonly, so if IRET faults we promote #GP to #DF which is an IST vector and thus has its own stack; we then do the fixup in the #DF handler. (Making #GP an IST exception would make the msr_safe functions unsafe in NMI/MC context, and quite possibly have other effects.) Special thanks to: - Andy Lutomirski, for the suggestion of using very small stack slots and copy (as opposed to map) the IRET frame there, and for the suggestion to mark them readonly and let the fault promote to #DF. - Konrad Wilk for paravirt fixup and testing. - Borislav Petkov for testing help and useful comments. Reported-by: Brian Gerst Signed-off-by: H. Peter Anvin Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com Cc: Konrad Rzeszutek Wilk Cc: Borislav Petkov Cc: Andrew Lutomriski Cc: Linus Torvalds Cc: Dirk Hohndel Cc: Arjan van de Ven Cc: comex Cc: Alexander van Heukelum Cc: Boris Ostrovsky Cc: # consider after upstream merge Signed-off-by: Jiri Slaby commit a7b4794cc25bdfb541d77f190fee9784542927ef Author: H. Peter Anvin Date: Wed May 21 10:22:59 2014 -0700 Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option" commit 7ed6fb9b5a5510e4ef78ab27419184741169978a upstream. This reverts commit fa81511bb0bbb2b1aace3695ce869da9762624ff in preparation of merging in the proper fix (espfix64). Signed-off-by: H. Peter Anvin Signed-off-by: Jiri Slaby commit 62c54cb10f458a548305bc4c2fb55fc0ad9d79ea Author: Jan Kara Date: Fri Aug 1 12:20:02 2014 +0200 timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks commit 504d58745c9ca28d33572e2d8a9990b43e06075d upstream. clockevents_increase_min_delta() calls printk() from under hrtimer_bases.lock. That causes lock inversion on scheduler locks because printk() can call into the scheduler. Lockdep puts it as: ====================================================== [ INFO: possible circular locking dependency detected ] 3.15.0-rc8-06195-g939f04b #2 Not tainted ------------------------------------------------------- trinity-main/74 is trying to acquire lock: (&port_lock_key){-.....}, at: [<811c60be>] serial8250_console_write+0x8c/0x10c but task is already holding lock: (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #5 (hrtimer_bases.lock){-.-...}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<8103c918>] __hrtimer_start_range_ns+0x1c/0x197 [<8107ec20>] perf_swevent_start_hrtimer.part.41+0x7a/0x85 [<81080792>] task_clock_event_start+0x3a/0x3f [<810807a4>] task_clock_event_add+0xd/0x14 [<8108259a>] event_sched_in+0xb6/0x17a [<810826a2>] group_sched_in+0x44/0x122 [<81082885>] ctx_sched_in.isra.67+0x105/0x11f [<810828e6>] perf_event_sched_in.isra.70+0x47/0x4b [<81082bf6>] __perf_install_in_context+0x8b/0xa3 [<8107eb8e>] remote_function+0x12/0x2a [<8105f5af>] smp_call_function_single+0x2d/0x53 [<8107e17d>] task_function_call+0x30/0x36 [<8107fb82>] perf_install_in_context+0x87/0xbb [<810852c9>] SYSC_perf_event_open+0x5c6/0x701 [<810856f9>] SyS_perf_event_open+0x17/0x19 [<8142f8ee>] syscall_call+0x7/0xb -> #4 (&ctx->lock){......}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f04c>] _raw_spin_lock+0x21/0x30 [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f [<8142cacc>] __schedule+0x4c6/0x4cb [<8142cae0>] schedule+0xf/0x11 [<8142f9a6>] work_resched+0x5/0x30 -> #3 (&rq->lock){-.-.-.}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f04c>] _raw_spin_lock+0x21/0x30 [<81040873>] __task_rq_lock+0x33/0x3a [<8104184c>] wake_up_new_task+0x25/0xc2 [<8102474b>] do_fork+0x15c/0x2a0 [<810248a9>] kernel_thread+0x1a/0x1f [<814232a2>] rest_init+0x1a/0x10e [<817af949>] start_kernel+0x303/0x308 [<817af2ab>] i386_start_kernel+0x79/0x7d -> #2 (&p->pi_lock){-.-...}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<810413dd>] try_to_wake_up+0x1d/0xd6 [<810414cd>] default_wake_function+0xb/0xd [<810461f3>] __wake_up_common+0x39/0x59 [<81046346>] __wake_up+0x29/0x3b [<811b8733>] tty_wakeup+0x49/0x51 [<811c3568>] uart_write_wakeup+0x17/0x19 [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb [<811c5f28>] serial8250_handle_irq+0x54/0x6a [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c [<811c56d8>] serial8250_interrupt+0x38/0x9e [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2 [<81051296>] handle_irq_event+0x2c/0x43 [<81052cee>] handle_level_irq+0x57/0x80 [<81002a72>] handle_irq+0x46/0x5c [<810027df>] do_IRQ+0x32/0x89 [<8143036e>] common_interrupt+0x2e/0x33 [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49 [<811c25a4>] uart_start+0x2d/0x32 [<811c2c04>] uart_write+0xc7/0xd6 [<811bc6f6>] n_tty_write+0xb8/0x35e [<811b9beb>] tty_write+0x163/0x1e4 [<811b9cd9>] redirected_tty_write+0x6d/0x75 [<810b6ed6>] vfs_write+0x75/0xb0 [<810b7265>] SyS_write+0x44/0x77 [<8142f8ee>] syscall_call+0x7/0xb -> #1 (&tty->write_wait){-.....}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<81046332>] __wake_up+0x15/0x3b [<811b8733>] tty_wakeup+0x49/0x51 [<811c3568>] uart_write_wakeup+0x17/0x19 [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb [<811c5f28>] serial8250_handle_irq+0x54/0x6a [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c [<811c56d8>] serial8250_interrupt+0x38/0x9e [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2 [<81051296>] handle_irq_event+0x2c/0x43 [<81052cee>] handle_level_irq+0x57/0x80 [<81002a72>] handle_irq+0x46/0x5c [<810027df>] do_IRQ+0x32/0x89 [<8143036e>] common_interrupt+0x2e/0x33 [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49 [<811c25a4>] uart_start+0x2d/0x32 [<811c2c04>] uart_write+0xc7/0xd6 [<811bc6f6>] n_tty_write+0xb8/0x35e [<811b9beb>] tty_write+0x163/0x1e4 [<811b9cd9>] redirected_tty_write+0x6d/0x75 [<810b6ed6>] vfs_write+0x75/0xb0 [<810b7265>] SyS_write+0x44/0x77 [<8142f8ee>] syscall_call+0x7/0xb -> #0 (&port_lock_key){-.....}: [<8104a62d>] __lock_acquire+0x9ea/0xc6d [<8104a942>] lock_acquire+0x92/0x101 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<811c60be>] serial8250_console_write+0x8c/0x10c [<8104e402>] call_console_drivers.constprop.31+0x87/0x118 [<8104f5d5>] console_unlock+0x1d7/0x398 [<8104fb70>] vprintk_emit+0x3da/0x3e4 [<81425f76>] printk+0x17/0x19 [<8105bfa0>] clockevents_program_min_delta+0x104/0x116 [<8105c548>] clockevents_program_event+0xe7/0xf3 [<8105cc1c>] tick_program_event+0x1e/0x23 [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f [<8103c49e>] __remove_hrtimer+0x5b/0x79 [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66 [<8103cb4b>] hrtimer_cancel+0xd/0x18 [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30 [<81080705>] task_clock_event_stop+0x20/0x64 [<81080756>] task_clock_event_del+0xd/0xf [<81081350>] event_sched_out+0xab/0x11e [<810813e0>] group_sched_out+0x1d/0x66 [<81081682>] ctx_sched_out+0xaf/0xbf [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f [<8142cacc>] __schedule+0x4c6/0x4cb [<8142cae0>] schedule+0xf/0x11 [<8142f9a6>] work_resched+0x5/0x30 other info that might help us debug this: Chain exists of: &port_lock_key --> &ctx->lock --> hrtimer_bases.lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(hrtimer_bases.lock); lock(&ctx->lock); lock(hrtimer_bases.lock); lock(&port_lock_key); *** DEADLOCK *** 4 locks held by trinity-main/74: #0: (&rq->lock){-.-.-.}, at: [<8142c6f3>] __schedule+0xed/0x4cb #1: (&ctx->lock){......}, at: [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f #2: (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66 #3: (console_lock){+.+...}, at: [<8104fb5d>] vprintk_emit+0x3c7/0x3e4 stack backtrace: CPU: 0 PID: 74 Comm: trinity-main Not tainted 3.15.0-rc8-06195-g939f04b #2 00000000 81c3a310 8b995c14 81426f69 8b995c44 81425a99 8161f671 8161f570 8161f538 8161f559 8161f538 8b995c78 8b142bb0 00000004 8b142fdc 8b142bb0 8b995ca8 8104a62d 8b142fac 000016f2 81c3a310 00000001 00000001 00000003 Call Trace: [<81426f69>] dump_stack+0x16/0x18 [<81425a99>] print_circular_bug+0x18f/0x19c [<8104a62d>] __lock_acquire+0x9ea/0xc6d [<8104a942>] lock_acquire+0x92/0x101 [<811c60be>] ? serial8250_console_write+0x8c/0x10c [<811c6032>] ? wait_for_xmitr+0x76/0x76 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<811c60be>] ? serial8250_console_write+0x8c/0x10c [<811c60be>] serial8250_console_write+0x8c/0x10c [<8104af87>] ? lock_release+0x191/0x223 [<811c6032>] ? wait_for_xmitr+0x76/0x76 [<8104e402>] call_console_drivers.constprop.31+0x87/0x118 [<8104f5d5>] console_unlock+0x1d7/0x398 [<8104fb70>] vprintk_emit+0x3da/0x3e4 [<81425f76>] printk+0x17/0x19 [<8105bfa0>] clockevents_program_min_delta+0x104/0x116 [<8105cc1c>] tick_program_event+0x1e/0x23 [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f [<8103c49e>] __remove_hrtimer+0x5b/0x79 [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66 [<8103cb4b>] hrtimer_cancel+0xd/0x18 [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30 [<81080705>] task_clock_event_stop+0x20/0x64 [<81080756>] task_clock_event_del+0xd/0xf [<81081350>] event_sched_out+0xab/0x11e [<810813e0>] group_sched_out+0x1d/0x66 [<81081682>] ctx_sched_out+0xaf/0xbf [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f [<8104416d>] ? __dequeue_entity+0x23/0x27 [<81044505>] ? pick_next_task_fair+0xb1/0x120 [<8142cacc>] __schedule+0x4c6/0x4cb [<81047574>] ? trace_hardirqs_off_caller+0xd7/0x108 [<810475b0>] ? trace_hardirqs_off+0xb/0xd [<81056346>] ? rcu_irq_exit+0x64/0x77 Fix the problem by using printk_deferred() which does not call into the scheduler. Reported-by: Fengguang Wu Signed-off-by: Jan Kara Signed-off-by: Thomas Gleixner Signed-off-by: Jiri Slaby commit d6a1cfb5f64485010bb0429a7bcb899b96ee92e9 Author: John Stultz Date: Wed Jun 4 16:11:40 2014 -0700 printk: rename printk_sched to printk_deferred commit aac74dc495456412c4130a1167ce4beb6c1f0b38 upstream. After learning we'll need some sort of deferred printk functionality in the timekeeping core, Peter suggested we rename the printk_sched function so it can be reused by needed subsystems. This only changes the function name. No logic changes. Signed-off-by: John Stultz Reviewed-by: Steven Rostedt Cc: Jan Kara Cc: Peter Zijlstra Cc: Jiri Bohac Cc: Thomas Gleixner Cc: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Jiri Slaby commit 49cbb95e7c78ca45aad2d985fc7f186ec6020dbc Author: Anssi Hannula Date: Fri Aug 1 11:55:47 2014 -0400 dm cache: fix race affecting dirty block count commit 44fa816bb778edbab6b6ddaaf24908dd6295937e upstream. nr_dirty is updated without locking, causing it to drift so that it is non-zero (either a small positive integer, or a very large one when an underflow occurs) even when there are no actual dirty blocks. This was due to a race between the workqueue and map function accessing nr_dirty in parallel without proper protection. People were seeing under runs due to a race on increment/decrement of nr_dirty, see: https://lkml.org/lkml/2014/6/3/648 Fix this by using an atomic_t for nr_dirty. Reported-by: roma1390@gmail.com Signed-off-by: Anssi Hannula Signed-off-by: Joe Thornber Signed-off-by: Mike Snitzer Signed-off-by: Jiri Slaby commit 9ee4fa038c195fc4ccd45aedaafbc8fd489ec33b Author: Greg Thelen Date: Thu Jul 31 09:07:19 2014 -0700 dm bufio: fully initialize shrinker commit d8c712ea471ce7a4fd1734ad2211adf8469ddddc upstream. 1d3d4437eae1 ("vmscan: per-node deferred work") added a flags field to struct shrinker assuming that all shrinkers were zero filled. The dm bufio shrinker is not zero filled, which leaves arbitrary kmalloc() data in flags. So far the only defined flags bit is SHRINKER_NUMA_AWARE. But there are proposed patches which add other bits to shrinker.flags (e.g. memcg awareness). Rather than simply initializing the shrinker, this patch uses kzalloc() when allocating the dm_bufio_client to ensure that the embedded shrinker and any other similar structures are zeroed. This fixes theoretical over aggressive shrinking of dm bufio objects. If the uninitialized dm_bufio_client.shrinker.flags contains SHRINKER_NUMA_AWARE then shrink_slab() would call the dm shrinker for each numa node rather than just once. This has been broken since 3.12. Signed-off-by: Greg Thelen Acked-by: Mikulas Patocka Signed-off-by: Mike Snitzer Signed-off-by: Jiri Slaby commit 8504485d7e3adb86929a240e7e3fafc855c52bc2 Author: Lars-Peter Clausen Date: Thu Jul 17 16:59:00 2014 +0100 iio: buffer: Fix demux table creation commit 61bd55ce1667809f022be88da77db17add90ea4e upstream. When creating the demux table we need to iterate over the selected scan mask for the buffer to get the samples which should be copied to destination buffer. Right now the code uses the mask which contains all active channels, which means the demux table contains entries which causes it to copy all the samples from source to destination buffer one by one without doing any demuxing. Signed-off-by: Lars-Peter Clausen Signed-off-by: Jonathan Cameron Signed-off-by: Jiri Slaby commit e167bebf57cd9033574ee7f78f54e76fe1529e77 Author: Peter Meerwald Date: Wed Jul 16 19:32:00 2014 +0100 iio:bma180: Missing check for frequency fractional part commit 9b2a4d35a6ceaf217be61ed8eb3c16986244f640 upstream. val2 should be zero This will make no difference for correct inputs but will reject incorrect ones with a decimal part in the value written to the sysfs interface. Signed-off-by: Peter Meerwald Cc: Oleksandr Kravchenko Signed-off-by: Jonathan Cameron Signed-off-by: Jiri Slaby commit 357249fb9640cd72a4e6d8008ce1f1be0154e37c Author: Peter Meerwald Date: Wed Jul 16 19:32:00 2014 +0100 iio:bma180: Fix scale factors to report correct acceleration units commit 381676d5e86596b11e22a62f196e192df6091373 upstream. The userspace interface for acceleration sensors is documented as using m/s^2 units [Documentation/ABI/testing/sysfs-bus-iio] The fullscale raw values for the BMA80 corresponds to -/+ 1, 1.5, 2, etc G depending on the selected mode. The scale table was converting to G rather than m/s^2. Change the scaling table to match the documented interface. See commit 71702e6e, iio: mma8452: Use correct acceleration units, for a related fix. Signed-off-by: Peter Meerwald Cc: Oleksandr Kravchenko Signed-off-by: Jonathan Cameron Signed-off-by: Jiri Slaby commit 0a5d8d71f528efeeffa3415fb61846e6f18541ae Author: Malcolm Priestley Date: Wed Jul 23 21:35:12 2014 +0100 staging: vt6655: Fix disassociated messages every 10 seconds commit 4aa0abed3a2a11b7d71ad560c1a3e7631c5a31cd upstream. byReAssocCount is incremented every second resulting in disassociated message being send every 10 seconds whether connection or not. byReAssocCount should only advance while eCommandState is in WLAN_ASSOCIATE_WAIT Change existing scope to if condition. Signed-off-by: Malcolm Priestley Signed-off-by: Jiri Slaby commit 5f24f2a63160e8bc5f12bc2d786048d5ce2f49cb Author: Michal Hocko Date: Wed Jul 30 16:08:33 2014 -0700 memcg: oom_notify use-after-free fix commit 2bcf2e92c3918ce62ab4e934256e47e9a16d19c3 upstream. Paul Furtado has reported the following GPF: general protection fault: 0000 [#1] SMP Modules linked in: ipv6 dm_mod xen_netfront coretemp hwmon x86_pkg_temp_thermal crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 microcode pcspkr ext4 jbd2 mbcache raid0 xen_blkfront CPU: 3 PID: 3062 Comm: java Not tainted 3.16.0-rc5 #1 task: ffff8801cfe8f170 ti: ffff8801d2ec4000 task.ti: ffff8801d2ec4000 RIP: e030:mem_cgroup_oom_synchronize+0x140/0x240 RSP: e02b:ffff8801d2ec7d48 EFLAGS: 00010283 RAX: 0000000000000001 RBX: ffff88009d633800 RCX: 000000000000000e RDX: fffffffffffffffe RSI: ffff88009d630200 RDI: ffff88009d630200 RBP: ffff8801d2ec7da8 R08: 0000000000000012 R09: 00000000fffffffe R10: 0000000000000000 R11: 0000000000000000 R12: ffff88009d633800 R13: ffff8801d2ec7d48 R14: dead000000100100 R15: ffff88009d633a30 FS: 00007f1748bb4700(0000) GS:ffff8801def80000(0000) knlGS:0000000000000000 CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007f4110300308 CR3: 00000000c05f7000 CR4: 0000000000002660 Call Trace: pagefault_out_of_memory+0x18/0x90 mm_fault_error+0xa9/0x1a0 __do_page_fault+0x478/0x4c0 do_page_fault+0x2c/0x40 page_fault+0x28/0x30 Code: 44 00 00 48 89 df e8 40 ca ff ff 48 85 c0 49 89 c4 74 35 4c 8b b0 30 02 00 00 4c 8d b8 30 02 00 00 4d 39 fe 74 1b 0f 1f 44 00 00 <49> 8b 7e 10 be 01 00 00 00 e8 42 d2 04 00 4d 8b 36 4d 39 fe 75 RIP mem_cgroup_oom_synchronize+0x140/0x240 Commit fb2a6fc56be6 ("mm: memcg: rework and document OOM waiting and wakeup") has moved mem_cgroup_oom_notify outside of memcg_oom_lock assuming it is protected by the hierarchical OOM-lock. Although this is true for the notification part the protection doesn't cover unregistration of event which can happen in parallel now so mem_cgroup_oom_notify can see already unlinked and/or freed mem_cgroup_eventfd_list. Fix this by using memcg_oom_lock also in mem_cgroup_oom_notify. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=80881 Fixes: fb2a6fc56be6 (mm: memcg: rework and document OOM waiting and wakeup) Signed-off-by: Michal Hocko Reported-by: Paul Furtado Tested-by: Paul Furtado Acked-by: Johannes Weiner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Jiri Slaby commit 06c188135130cd30aade48dfaa3beac58c08f074 Author: David Rientjes Date: Wed Jul 30 16:08:24 2014 -0700 mm, thp: do not allow thp faults to avoid cpuset restrictions commit b104a35d32025ca740539db2808aa3385d0f30eb upstream. The page allocator relies on __GFP_WAIT to determine if ALLOC_CPUSET should be set in allocflags. ALLOC_CPUSET controls if a page allocation should be restricted only to the set of allowed cpuset mems. Transparent hugepages clears __GFP_WAIT when defrag is disabled to prevent the fault path from using memory compaction or direct reclaim. Thus, it is unfairly able to allocate outside of its cpuset mems restriction as a side-effect. This patch ensures that ALLOC_CPUSET is only cleared when the gfp mask is truly GFP_ATOMIC by verifying it is also not a thp allocation. Signed-off-by: David Rientjes Reported-by: Alex Thorlton Tested-by: Alex Thorlton Cc: Bob Liu Cc: Dave Hansen Cc: Hedi Berriche Cc: Hugh Dickins Cc: Johannes Weiner Cc: Kirill A. Shutemov Cc: Mel Gorman Cc: Rik van Riel Cc: Srivatsa S. Bhat Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Jiri Slaby commit c5fec566bef6a027e75c84c35ec970482eb88cea Author: Maxim Patlasov Date: Wed Jul 30 16:08:21 2014 -0700 mm/page-writeback.c: fix divide by zero in bdi_dirty_limits() commit f6789593d5cea42a4ecb1cbeab6a23ade5ebbba7 upstream. Under memory pressure, it is possible for dirty_thresh, calculated by global_dirty_limits() in balance_dirty_pages(), to equal zero. Then, if strictlimit is true, bdi_dirty_limits() tries to resolve the proportion: bdi_bg_thresh : bdi_thresh = background_thresh : dirty_thresh by dividing by zero. Signed-off-by: Maxim Patlasov Acked-by: Rik van Riel Cc: Michal Hocko Cc: KOSAKI Motohiro Cc: Wu Fengguang Cc: Johannes Weiner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Jiri Slaby commit ca519c15876ef6cdb6ccede7fc6832868ee09494 Author: James Bottomley Date: Thu Jul 3 19:17:34 2014 +0200 scsi: handle flush errors properly commit 89fb4cd1f717a871ef79fa7debbe840e3225cd54 upstream. Flush commands don't transfer data and thus need to be special cased in the I/O completion handler so that we can propagate errors to the block layer and filesystem. Signed-off-by: James Bottomley Reported-by: Steven Haber Tested-by: Steven Haber Reviewed-by: Martin K. Petersen Signed-off-by: Christoph Hellwig Signed-off-by: Jiri Slaby commit 55415f0d061cde055304bbd70332830a45a3b222 Author: Alexandre Bounine Date: Wed Jul 30 16:08:26 2014 -0700 rapidio/tsi721_dma: fix failure to obtain transaction descriptor commit 0193ed8225e1a79ed64632106ec3cc81798cb13c upstream. This is a bug fix for the situation when function tsi721_desc_get() fails to obtain a free transaction descriptor. The bug usually results in a memory access crash dump when data transfer scatter-gather list has more entries than size of hardware buffer descriptors ring. This fix ensures that error is properly returned to a caller instead of an invalid entry. This patch is applicable to kernel versions starting from v3.5. Signed-off-by: Alexandre Bounine Cc: Matt Porter Cc: Andre van Herk Cc: Stef van Os Cc: Vinod Koul Cc: Dan Williams Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Jiri Slaby commit a774b9a4f275ebc272f8cc8e6c4f29f601aa695d Author: Eliad Peller Date: Thu Jul 17 15:00:56 2014 +0300 cfg80211: fix mic_failure tracing commit 8c26d458394be44e135d1c6bd4557e1c4e1a0535 upstream. tsc can be NULL (mac80211 currently always passes NULL), resulting in NULL-dereference. check before copying it. Signed-off-by: Eliad Peller Signed-off-by: Emmanuel Grumbach Signed-off-by: Johannes Berg Signed-off-by: Jiri Slaby commit f97dcfbb2d1d5664b4c523470f3fe587c7c8a439 Author: Felix Fietkau Date: Wed Jul 23 15:40:54 2014 +0200 ath9k: fix aggregation session lockup commit c01fac1c77a00227f706a1654317023e3f4ac7f0 upstream. If an aggregation session fails, frames still end up in the driver queue with IEEE80211_TX_CTL_AMPDU set. This causes tx for the affected station/tid to stall, since ath_tx_get_tid_subframe returning packets to send. Fix this by clearing IEEE80211_TX_CTL_AMPDU as long as no aggregation session is running. Reported-by: Antonio Quartulli Signed-off-by: Felix Fietkau Signed-off-by: John W. Linville Signed-off-by: Jiri Slaby commit 8e49be939c5868035f26a77e997cab2e4f6b5d13 Author: Konstantin Khlebnikov Date: Fri Jul 25 09:17:12 2014 +0100 ARM: 8115/1: LPAE: reduce damage caused by idmap to virtual memory layout commit 811a2407a3cf7bbd027fbe92d73416f17485a3d8 upstream. On LPAE, each level 1 (pgd) page table entry maps 1GiB, and the level 2 (pmd) entries map 2MiB. When the identity mapping is created on LPAE, the pgd pointers are copied from the swapper_pg_dir. If we find that we need to modify the contents of a pmd, we allocate a new empty pmd table and insert it into the appropriate 1GB slot, before then filling it with the identity mapping. However, if the 1GB slot covers the kernel lowmem mappings, we obliterate those mappings. When replacing a PMD, first copy the old PMD contents to the new PMD, so that we preserve the existing mappings, particularly the mappings of the kernel itself. [rewrote commit message and added code comment -- rmk] Fixes: ae2de101739c ("ARM: LPAE: Add identity mapping support for the 3-level page table format") Signed-off-by: Konstantin Khlebnikov Signed-off-by: Russell King Signed-off-by: Jiri Slaby commit 06bcee7de7ca6b3bd122437d9d4181ef399e7847 Author: Milan Broz Date: Tue Jul 29 18:41:09 2014 +0000 crypto: af_alg - properly label AF_ALG socket commit 4c63f83c2c2e16a13ce274ee678e28246bd33645 upstream. Th AF_ALG socket was missing a security label (e.g. SELinux) which means that socket was in "unlabeled" state. This was recently demonstrated in the cryptsetup package (cryptsetup v1.6.5 and later.) See https://bugzilla.redhat.com/show_bug.cgi?id=1115120 This patch clones the sock's label from the parent sock and resolves the issue (similar to AF_BLUETOOTH protocol family). Signed-off-by: Milan Broz Acked-by: Paul Moore Signed-off-by: Herbert Xu Signed-off-by: Jiri Slaby commit cbcbb4c4826ff594b091e143b0f049f13ab7a64e Author: Martin Schwidefsky Date: Wed Jul 30 13:04:49 2014 +0200 s390/ptrace: fix PSW mask check commit dab6cf55f81a6e16b8147aed9a843e1691dcd318 upstream. The PSW mask check of the PTRACE_POKEUSR_AREA command is incorrect. The PSW_MASK_USER define contains the PSW_MASK_ASC bits, the ptrace interface accepts all combinations for the address-space-control bits. To protect the kernel space the PSW mask check in ptrace needs to reject the address-space-control bit combination for home space. Fixes CVE-2014-3534 Signed-off-by: Martin Schwidefsky Signed-off-by: Jiri Slaby