commit fcba09f2b0bf27eeaa1d4d439edb649585f35040
Author: Greg Kroah-Hartman
Date:   Sat Oct 3 13:52:18 2015 +0200

    Linux 4.2.3

commit b2b2c7be0fc8e9b0f6f32215cd23b54b07ec4b31
Author: Kyle Evans
Date:   Fri Sep 11 10:40:17 2015 -0500

    hp-wmi: limit hotkey enable

    commit 8a1513b49321e503fd6c8b6793e3b1f9a8a3285b upstream.

    Do not write initialize magic on systems that do not have
    feature query 0xb. Fixes Bug #82451.

    Redefine FEATURE_QUERY to align with 0xb and FEATURE2 with 0xd
    for code clarity.

    Add a new test function, hp_wmi_bios_2008_later() & simplify
    hp_wmi_bios_2009_later(), which fixes a bug in cases where an
    improper value is returned. Probably also fixes Bug #69131.

    Add missing __init tag.

    Signed-off-by: Kyle Evans
    Signed-off-by: Darren Hart
    Signed-off-by: Greg Kroah-Hartman

commit 6abf903c8eb352a3705353789ac200d188466f16
Author: Luis Henriques
Date:   Thu Sep 17 16:01:40 2015 -0700

    zram: fix possible use after free in zcomp_create()

    commit 3aaf14da807a4e9931a37f21e4251abb8a67021b upstream.

    zcomp_create() verifies the success of zcomp_strm_{multi,single}_create()
    through comp->stream, which can potentially be pointing to memory that
    was freed if these functions returned an error.

    While at it, replace an 'ERR_PTR(-ENOMEM)' with a more generic
    'ERR_PTR(error)', as in the future zcomp_strm_{multi,single}_create()
    could return other error codes. Function documentation updated
    accordingly.

    Fixes: beca3ec71fe5 ("zram: add multi stream functionality")
    Signed-off-by: Luis Henriques
    Acked-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit 92b52680751f95fa275c5a2dfb274de5c320d358
Author: Carol L Soto
Date:   Thu Aug 27 14:43:25 2015 -0500

    net/mlx4_core: Capping number of requested MSIXs to MAX_MSIX

    [ Upstream commit 9293267a3e2a7a2555d8ddc8f9301525e5b03b1b ]

    We currently manage IRQs in pool_bm which is a bit field of MAX_MSIX bits.
    Thus, allocating more than MAX_MSIX interrupts can't be managed in
    pool_bm. Fixing this by capping number of requested MSIXs to MAX_MSIX.

    Signed-off-by: Matan Barak
    Signed-off-by: Carol L Soto
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 0d106a6a020b5605c8a4748b9862af62ef2f8e59
Author: Stas Sergeev
Date:   Mon Jul 20 17:49:58 2015 -0700

    mvneta: use inband status only when explicitly enabled

    [ Upstream commit f8af8e6eb95093d5ce5ebcc52bd1929b0433e172 in net-next tree,
      will be pushed to Linus very soon. ]

    The commit 898b2970e2c9 ("mvneta: implement SGMII-based in-band link state
    signaling") implemented the link parameters auto-negotiation unconditionally.
    Unfortunately it appears that some HW that implements SGMII protocol,
    doesn't generate the inband status, so it is not possible to auto-negotiate
    anything with such HW.

    This patch enables the auto-negotiation only if explicitly requested with
    the 'managed' DT property.

    This patch fixes the following regression:
    https://lkml.org/lkml/2015/7/8/865

    Signed-off-by: Stas Sergeev

    CC: Thomas Petazzoni
    CC: netdev@vger.kernel.org
    CC: linux-kernel@vger.kernel.org
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 40448fc0043995e83336befcca83642f6f158c03
Author: Stas Sergeev
Date:   Mon Jul 20 17:49:57 2015 -0700

    of_mdio: add new DT property 'managed' to specify the PHY management type

    [ Upstream commit 4cba5c2103657d43d0886e4cff8004d95a3d0def in net-next tree,
      will be pushed to Linus very soon. ]

    Currently the PHY management type is selected by the MAC driver arbitrarily.
    The decision is based on the presence of the "fixed-link" node and on a
    will of the driver's authors.
    This caused a regression recently, when mvneta driver suddenly started
    to use the in-band status for auto-negotiation on fixed links.
    It appears the auto-negotiation may not work when expected by the MAC driver.
    Sebastien Rannou explains:

    << Yes, I confirm that my HW does not generate an in-band status.
    AFAIK, it's a PHY that aggregates 4xSGMIIs to 1xQSGMII ; the MAC side of
    the PHY (with inband status) is connected to the switch through QSGMII,
    and in this context we are on the media side of the PHY. >>
    https://lkml.org/lkml/2015/7/10/206

    This patch introduces the new string property 'managed' that allows
    the user to set the management type explicitly.
    The supported values are:
    "auto" - default. Uses either MDIO or nothing, depending on the presence
    of the fixed-link node
    "in-band-status" - use in-band status

    Signed-off-by: Stas Sergeev

    CC: Rob Herring
    CC: Pawel Moll
    CC: Mark Rutland
    CC: Ian Campbell
    CC: Kumar Gala
    CC: Florian Fainelli
    CC: Grant Likely
    CC: devicetree@vger.kernel.org
    CC: linux-kernel@vger.kernel.org
    CC: netdev@vger.kernel.org
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit bfba942d0d287d2765612629826b87a0749bf6bd
Author: Stas Sergeev
Date:   Mon Jul 20 17:49:56 2015 -0700

    net: phy: fixed_phy: handle link-down case

    [ Upstream 868a4215be9a6d80548ccb74763b883dc99d32a2 in net-next tree,
      will be pushed to Linus very soon. ]

    fixed_phy_register() currently hardcodes the fixed PHY link to 1, and
    expects to find a "speed" parameter to provide correct information
    towards the fixed PHY consumer.

    In a subsequent change, where we allow "managed" (e.g: (RS)GMII in-band
    status auto-negotiation) fixed PHYs, none of these parameters can be
    provided since they will be auto-negotiated, hence, we just provide a
    zero-initialized fixed_phy_status to fixed_phy_register() which makes it
    fail when we call fixed_phy_update_regs() since status.speed = 0 which
    makes us hit the "default" label and error out.

    Without this change, we would also see potentially inconsistent
    speed/duplex parameters for fixed PHYs when the link is DOWN.

    CC: netdev@vger.kernel.org
    CC: linux-kernel@vger.kernel.org
    Signed-off-by: Stas Sergeev
    [florian: add more background to why this is correct and desirable]
    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit b11c94db52901ca6b5167a2089f14e679cbe0cee
Author: Florian Fainelli
Date:   Mon Jul 20 17:49:55 2015 -0700

    net: dsa: bcm_sf2: Do not override speed settings

    [ Upstream d2eac98f7d1b950b762a7eca05a9ce0ea1d878d2 in net-next tree,
      will be pushed to Linus very soon. ]

    The SF2 driver currently overrides speed settings for its port
    configured using a fixed PHY, this is both unnecessary and incorrect,
    because we keep feedback to the hardware parameters that we read from
    the PHY device, which in the case of a fixed PHY cannot possibly change
    speed.

    This is a required change to allow the fixed PHY code to allow
    registering a PHY with a link configured as DOWN by default and avoid
    some sort of circular dependency where we require the link_update
    callback to run to program the hardware, and we then utilize the fixed
    PHY parameters to program the hardware with the same settings.

    Fixes: 246d7f773c13 ("net: dsa: add Broadcom SF2 switch driver")
    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 4c8f9d6cf799cf77d6e0a2ad0d26e066be630bf9
Author: Guillaume Nault
Date:   Thu Sep 24 12:54:01 2015 +0200

    ppp: fix lockdep splat in ppp_dev_uninit()

    [ Upstream commit 58a89ecaca53736aa465170530acea4f8be34ab4 ]

    ppp_dev_uninit() locks all_ppp_mutex while under rtnl mutex protection.
    ppp_create_interface() must then lock these mutexes in that same order
    to avoid possible deadlock.
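The ordering rule stated above (rtnl_mutex first, then all_ppp_mutex, on both the creation and the teardown paths) can be illustrated with a small userspace sketch. It is a hypothetical fixed-rank checker, much simpler than lockdep, which learns lock dependencies at run time instead of using predefined ranks; `lock_rank`, `held_locks`, `acquire()` and `release()` are invented names, not kernel APIs. The "old order" function mirrors dependency #1 in the splat below, where rtnl_mutex is taken in register_netdev() while all_ppp_mutex is already held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical fixed ranks: a lock may only be taken while every lock
 * already held has a lower rank. RANK_RTNL < RANK_ALL_PPP encodes the
 * order "rtnl_mutex, then all_ppp_mutex". */
enum lock_rank { RANK_RTNL = 1, RANK_ALL_PPP = 2 };

#define MAX_HELD 8

struct held_locks {
	int ranks[MAX_HELD]; /* stack of currently held lock ranks */
	int depth;
	bool violation;      /* set once an out-of-order acquisition happens */
};

static void acquire(struct held_locks *h, enum lock_rank rank)
{
	assert(h->depth < MAX_HELD);
	/* While the order is respected, held ranks are strictly increasing,
	 * so comparing against the top of the stack is enough. */
	if (h->depth > 0 && h->ranks[h->depth - 1] >= (int)rank)
		h->violation = true;
	h->ranks[h->depth++] = (int)rank;
}

static void release(struct held_locks *h)
{
	assert(h->depth > 0); /* releasing with nothing held is a bug */
	h->depth--;
}

/* Teardown: unregister_netdevice() runs under rtnl_mutex and
 * ppp_dev_uninit() then takes all_ppp_mutex. */
static bool teardown_order_ok(void)
{
	struct held_locks h = { .depth = 0, .violation = false };

	acquire(&h, RANK_RTNL);
	acquire(&h, RANK_ALL_PPP);
	release(&h);
	release(&h);
	return !h.violation;
}

/* Creation, old order: all_ppp_mutex held when rtnl_mutex is taken. */
static bool create_old_order_ok(void)
{
	struct held_locks h = { .depth = 0, .violation = false };

	acquire(&h, RANK_ALL_PPP);
	acquire(&h, RANK_RTNL);
	release(&h);
	release(&h);
	return !h.violation;
}

/* Creation, fixed order: same order as teardown. */
static bool create_fixed_order_ok(void)
{
	struct held_locks h = { .depth = 0, .violation = false };

	acquire(&h, RANK_RTNL);
	acquire(&h, RANK_ALL_PPP);
	release(&h);
	release(&h);
	return !h.violation;
}
```

With these helpers, teardown_order_ok() and create_fixed_order_ok() return true while create_old_order_ok() returns false: once both paths agree on one order, no thread can hold one mutex while waiting for the other in the reverse order.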
    [ 120.880011] ======================================================
    [ 120.880011] [ INFO: possible circular locking dependency detected ]
    [ 120.880011] 4.2.0 #1 Not tainted
    [ 120.880011] -------------------------------------------------------
    [ 120.880011] ppp-apitest/15827 is trying to acquire lock:
    [ 120.880011]  (&pn->all_ppp_mutex){+.+.+.}, at: [] ppp_dev_uninit+0x64/0xb0 [ppp_generic]
    [ 120.880011]
    [ 120.880011] but task is already holding lock:
    [ 120.880011]  (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x12/0x14
    [ 120.880011]
    [ 120.880011] which lock already depends on the new lock.
    [ 120.880011]
    [ 120.880011]
    [ 120.880011] the existing dependency chain (in reverse order) is:
    [ 120.880011]
    [ 120.880011] -> #1 (rtnl_mutex){+.+.+.}:
    [ 120.880011]        [] lock_acquire+0xcf/0x10e
    [ 120.880011]        [] mutex_lock_nested+0x56/0x341
    [ 120.880011]        [] rtnl_lock+0x12/0x14
    [ 120.880011]        [] register_netdev+0x11/0x27
    [ 120.880011]        [] ppp_ioctl+0x289/0xc98 [ppp_generic]
    [ 120.880011]        [] do_vfs_ioctl+0x4ea/0x532
    [ 120.880011]        [] SyS_ioctl+0x4e/0x7d
    [ 120.880011]        [] entry_SYSCALL_64_fastpath+0x12/0x6f
    [ 120.880011]
    [ 120.880011] -> #0 (&pn->all_ppp_mutex){+.+.+.}:
    [ 120.880011]        [] __lock_acquire+0xb07/0xe76
    [ 120.880011]        [] lock_acquire+0xcf/0x10e
    [ 120.880011]        [] mutex_lock_nested+0x56/0x341
    [ 120.880011]        [] ppp_dev_uninit+0x64/0xb0 [ppp_generic]
    [ 120.880011]        [] rollback_registered_many+0x19e/0x252
    [ 120.880011]        [] rollback_registered+0x29/0x38
    [ 120.880011]        [] unregister_netdevice_queue+0x6a/0x77
    [ 120.880011]        [] ppp_release+0x42/0x79 [ppp_generic]
    [ 120.880011]        [] __fput+0xec/0x192
    [ 120.880011]        [] ____fput+0x9/0xb
    [ 120.880011]        [] task_work_run+0x66/0x80
    [ 120.880011]        [] prepare_exit_to_usermode+0x8c/0xa7
    [ 120.880011]        [] syscall_return_slowpath+0xe4/0x104
    [ 120.880011]        [] int_ret_from_sys_call+0x25/0x9f
    [ 120.880011]
    [ 120.880011] other info that might help us debug this:
    [ 120.880011]
    [ 120.880011]  Possible unsafe locking scenario:
    [ 120.880011]
    [ 120.880011]        CPU0                    CPU1
    [ 120.880011]        ----                    ----
    [ 120.880011]   lock(rtnl_mutex);
    [ 120.880011]                                lock(&pn->all_ppp_mutex);
    [ 120.880011]                                lock(rtnl_mutex);
    [ 120.880011]   lock(&pn->all_ppp_mutex);
    [ 120.880011]
    [ 120.880011]  *** DEADLOCK ***

    Fixes: 8cb775bc0a34 ("ppp: fix device unregistration upon netns deletion")
    Reported-by: Sedat Dilek
    Tested-by: Sedat Dilek
    Signed-off-by: Guillaume Nault
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 45c191bb3aabf6df3db4ba5e94dd24b96edf6ab5
Author: Wilson Kok
Date:   Tue Sep 22 21:40:22 2015 -0700

    fib_rules: fix fib rule dumps across multiple skbs

    [ Upstream commit 41fc014332d91ee90c32840bf161f9685b7fbf2b ]

    dump_rules returns skb length and not error.
    But when family == AF_UNSPEC, the caller of dump_rules
    assumes that it returns an error. Hence, when family == AF_UNSPEC,
    we continue trying to dump on -EMSGSIZE errors resulting in
    incorrect dump idx carried between skbs belonging to the same dump.
    This results in fib rule dump always only dumping rules that fit
    into the first skb.

    This patch fixes dump_rules to return error so that we exit correctly
    and idx is correctly maintained between skbs that are part of the
    same dump.

    Signed-off-by: Wilson Kok
    Signed-off-by: Roopa Prabhu
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit d8abd0589da3bc4fabe1450a5e98099b28874a30
Author: WANG Cong
Date:   Tue Sep 22 17:01:11 2015 -0700

    net: revert "net_sched: move tp->root allocation into fw_init()"

    [ Upstream commit d8aecb10115497f6cdf841df8c88ebb3ba25fa28 ]

    fw filter uses tp->root==NULL to check if it is the old method,
    so it doesn't need allocation at all in this case. This patch
    reverts the offending commit and adds some comments for old
    method to make it obvious.

    Fixes: 33f8b9ecdb15 ("net_sched: move tp->root allocation into fw_init()")
    Reported-by: Akshat Kakkar
    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit dd9eb1b17ca8fdbcfa496b61a7a5a2a34445a3da
Author: David Woodhouse
Date:   Wed Sep 23 19:45:08 2015 +0100

    Fix AF_PACKET ABI breakage in 4.2

    [ Upstream commit d3869efe7a8a2298516d9af4f91487cf486ca945 ]

    Commit 7d82410950aa ("virtio: add explicit big-endian support to memory
    accessors") accidentally changed the virtio_net header used by
    AF_PACKET with PACKET_VNET_HDR from host-endian to big-endian.

    Since virtio_legacy_is_little_endian() is a very long identifier,
    define a vio_le macro and use that throughout the code instead of the
    hard-coded 'false' for little-endian.

    This restores the ABI to match 4.1 and earlier kernels, and makes my
    test program work again.

    Signed-off-by: David Woodhouse
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 9d0af4ef230500589fec21785e66b24af81d8ca7
Author: Eric Dumazet
Date:   Wed Sep 23 14:00:21 2015 -0700

    tcp: add proper TS val into RST packets

    [ Upstream commit 675ee231d960af2af3606b4480324e26797eb010 ]

    RST packets sent on behalf of TCP connections with TS option (RFC 7323
    TCP timestamps) have incorrect TS val (set to 0), but correct TS ecr.
        A > B: Flags [S], seq 0, win 65535, options [mss 1000,nop,nop,TS val 100 ecr 0], length 0
        B > A: Flags [S.], seq 2444755794, ack 1, win 28960, options [mss 1460,nop,nop,TS val 7264344 ecr 100], length 0
        A > B: Flags [.], ack 1, win 65535, options [nop,nop,TS val 110 ecr 7264344], length 0
        B > A: Flags [R.], seq 1, ack 1, win 28960, options [nop,nop,TS val 0 ecr 110], length 0

    We need to call skb_mstamp_get() to get proper TS val,
    derived from skb->skb_mstamp

    Note that RFC 1323 was advocating to not send TS option in RST segment,
    but RFC 7323 recommends the opposite :

      Once TSopt has been successfully negotiated, that is both <SYN> and
      <SYN,ACK> contain TSopt, the TSopt MUST be sent in every non-<RST>
      segment for the duration of the connection, and SHOULD be sent in an
      <RST> segment (see Section 5.2 for details)

    Note this RFC recommends to send TS val = 0, but we believe it is
    premature : We do not know if all TCP stacks are properly
    handling the receive side :

       When an <RST> segment is
       received, it MUST NOT be subjected to the PAWS check by verifying an
       acceptable value in SEG.TSval, and information from the Timestamps
       option MUST NOT be used to update connection state information.
       SEG.TSecr MAY be used to provide stricter acceptance checks.

    In 5 years, if/when all TCP stacks are RFC 7323 ready, we might consider
    to decide to send TS val = 0, if it buys something.

    Fixes: 7faee5c0d514 ("tcp: remove TCP_SKB_CB(skb)->when")
    Signed-off-by: Eric Dumazet
    Acked-by: Yuchung Cheng
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 1e0c9d37719e535b948b506638d87125ff266373
Author: Jesse Gross
Date:   Mon Sep 21 20:21:20 2015 -0700

    openvswitch: Zero flows on allocation.

    [ Upstream commit ae5f2fb1d51fa128a460bcfbe3c56d7ab8bf6a43 ]

    When support for megaflows was introduced, OVS needed to start
    installing flows with a mask applied to them. Since masking is an
    expensive operation, OVS also had an optimization that would only
    take the parts of the flow keys that were covered by a non-zero
    mask.
    The values stored in the remaining pieces should not matter because
    they are masked out.

    While this works fine for the purposes of matching (which must always
    look at the mask), serialization to netlink can be problematic. Since
    the flow and the mask are serialized separately, the uninitialized
    portions of the flow can be encoded with whatever values happen to be
    present.

    In terms of functionality, this has little effect since these fields
    will be masked out by definition. However, it leaks kernel memory to
    userspace, which is a potential security vulnerability. It is also
    possible that other code paths could look at the masked key and get
    uninitialized data, although this does not currently appear to be an
    issue in practice.

    This removes the mask optimization for flows that are being installed.
    This was always intended to be the case as the mask optimizations were
    really targeting per-packet flow operations.

    Fixes: 03f0d916 ("openvswitch: Mega flow implementation")
    Signed-off-by: Jesse Gross
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit ccbe6aba49fde168abba5649976da6370453cbf8
Author: Russell King
Date:   Mon Sep 21 21:42:59 2015 +0100

    net: dsa: actually force the speed on the CPU port

    [ Upstream commit 53adc9e83028d9e35b6408231ebaf62a94a16e4d ]

    Commit 54d792f257c6 ("net: dsa: Centralise global and port setup
    code into mv88e6xxx.") merged in the 4.2 merge window broke the link
    speed forcing for the CPU port of Marvell DSA switches.

    The original code was:

            /* MAC Forcing register: don't force link, speed, duplex
             * or flow control state to any particular values on physical
             * ports, but force the CPU port and all DSA ports to 1000 Mb/s
             * full duplex.
             */
            if (dsa_is_cpu_port(ds, p) || ds->dsa_port_mask & (1 << p))
                    REG_WRITE(addr, 0x01, 0x003e);
            else
                    REG_WRITE(addr, 0x01, 0x0003);

    but the new code does a read-modify-write:

                    reg = _mv88e6xxx_reg_read(ds, REG_PORT(port), PORT_PCS_CTRL);
                    if (dsa_is_cpu_port(ds, port) ||
                        ds->dsa_port_mask & (1 << port)) {
                            reg |= PORT_PCS_CTRL_FORCE_LINK |
                                    PORT_PCS_CTRL_LINK_UP |
                                    PORT_PCS_CTRL_DUPLEX_FULL |
                                    PORT_PCS_CTRL_FORCE_DUPLEX;
                            if (mv88e6xxx_6065_family(ds))
                                    reg |= PORT_PCS_CTRL_100;
                            else
                                    reg |= PORT_PCS_CTRL_1000;

    The link speed in the PCS control register is a two bit field. Forcing
    the link speed in this way doesn't ensure that the bit field is set to
    the correct value - on the hardware I have here, the speed bitfield
    remains set to 0x03, resulting in the speed not being forced to
    gigabit.

    We must clear both bits before forcing the link speed.

    Fixes: 54d792f257c6 ("net: dsa: Centralise global and port setup code into mv88e6xxx.")
    Signed-off-by: Russell King
    Acked-by: Andrew Lunn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit e2a3131de43c6e8072ed618330c49f14d87dba6e
Author: Herbert Xu
Date:   Tue Sep 22 11:38:56 2015 +0800

    netlink: Replace rhash_portid with bound

    [ Upstream commit da314c9923fed553a007785a901fd395b7eb6c19 ]

    On Mon, Sep 21, 2015 at 02:20:22PM -0400, Tejun Heo wrote:
    >
    > store_release and load_acquire are different from the usual memory
    > barriers and can't be paired this way. You have to pair store_release
    > and load_acquire. Besides, it isn't a particularly good idea to

    OK I've decided to drop the acquire/release helpers as they
    don't help us at all and simply pessimises the code by using
    full memory barriers (on some architectures) where only a
    write or read barrier is needed.

    > depend on memory barriers embedded in other data structures like the
    > above. Here, especially, rhashtable_insert() would have write barrier
    > *before* the entry is hashed not necessarily *after*, which means that
    > in the above case, a socket which appears to have set bound to a
    > reader might not visible when the reader tries to look up the socket
    > on the hashtable.

    But you are right we do need an explicit write barrier here to
    ensure that the hashing is visible.

    > There's no reason to be overly smart here. This isn't a crazy hot
    > path, write barriers tend to be very cheap, store_release more so.
    > Please just do smp_store_release() and note what it's paired with.

    It's not about being overly smart. It's about actually understanding
    what's going on with the code. I've seen too many instances of
    people simply sprinkling synchronisation primitives around without
    any knowledge of what is happening underneath, which is just a
    recipe for creating hard-to-debug races.

    > > @@ -1539,7 +1546,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
    > >                 }
    > >         }
    > >
    > > -       if (!nlk->portid) {
    > > +       if (!nlk->bound) {
    >
    > I don't think you can skip load_acquire here just because this is the
    > second deref of the variable. That doesn't change anything. Race
    > condition could still happen between the first and second tests and
    > skipping the second would lead to the same kind of bug.

    The reason this one is OK is because we do not use nlk->portid or
    try to get nlk from the hash table before we return to user-space.

    However, there is a real bug here that none of these acquire/release
    helpers discovered. The two bound tests here used to be a single
    one. Now that they are separate it is entirely possible for another
    thread to come in the middle and bind the socket. So we need to
    repeat the portid check in order to maintain consistency.
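The pairing Tejun asks for (a release store on the writer side matched by an acquire load on the reader side) can be sketched in userspace with C11 atomics. `struct model_sock` and its helpers are hypothetical stand-ins for the netlink socket, not kernel code; per Herbert's reply, the patch itself uses explicit write/read barriers instead of acquire/release helpers, so treat this only as a model of the ordering requirement. In line with the commit's note about reading nlk->bound only once in netlink_bind, the reader takes a single snapshot of the flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a netlink socket's binding state. */
struct model_sock {
	uint32_t portid;   /* plain field, written before publication */
	atomic_bool bound; /* publication flag */
};

static void model_sock_init(struct model_sock *sk)
{
	sk->portid = 0;
	atomic_init(&sk->bound, false);
}

/* Writer: fill in portid first, then publish with a release store, so
 * any reader whose acquire load sees bound == true also sees portid. */
static void model_publish(struct model_sock *sk, uint32_t portid)
{
	sk->portid = portid;
	atomic_store_explicit(&sk->bound, true, memory_order_release);
}

/* Reader: load bound exactly once (acquire) and base every later
 * decision on that snapshot instead of re-reading the flag. */
static bool model_lookup(struct model_sock *sk, uint32_t *portid_out)
{
	bool bound = atomic_load_explicit(&sk->bound, memory_order_acquire);

	if (!bound)
		return false;
	*portid_out = sk->portid; /* ordered after the acquire load */
	return true;
}
```

In a multi-threaded run, the acquire load in model_lookup() synchronizes with the release store in model_publish(); that pair is what "note what it's paired with" asks a comment to document.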
    > > @@ -1587,7 +1594,7 @@ static int netlink_connect(struct socket *sock, struct sockaddr *addr,
    > >             !netlink_allowed(sock, NL_CFG_F_NONROOT_SEND))
    > >                 return -EPERM;
    > >
    > > -       if (!nlk->portid)
    > > +       if (!nlk->bound)
    >
    > Don't we need load_acquire here too? Is this path holding a lock
    > which makes that unnecessary?

    Ditto.

    ---8<---
    The commit 1f770c0a09da855a2b51af6d19de97fb955eca85 ("netlink:
    Fix autobind race condition that leads to zero port ID") created
    some new races that can occur due to inconsistencies between the
    two port IDs.

    Tejun is right that a barrier is unavoidable. Therefore I am
    reverting to the original patch that used a boolean to indicate
    that a user netlink socket has been bound.

    Barriers have been added where necessary to ensure that a valid
    portid and the hashed socket is visible.

    I have also changed netlink_insert to only return EBUSY if the
    socket is bound to a portid different to the requested one. This
    combined with only reading nlk->bound once in netlink_bind fixes
    a race where two threads that bind the socket at the same time
    with different port IDs may both succeed.

    Fixes: 1f770c0a09da ("netlink: Fix autobind race condition that leads to zero port ID")
    Reported-by: Tejun Heo
    Reported-by: Linus Torvalds
    Signed-off-by: Herbert Xu
    Nacked-by: Tejun Heo
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 6e32e731184134db406c428f491a9811cf58252a
Author: Herbert Xu
Date:   Fri Sep 18 19:16:50 2015 +0800

    netlink: Fix autobind race condition that leads to zero port ID

    [ Upstream commit 1f770c0a09da855a2b51af6d19de97fb955eca85 ]

    The commit c0bb07df7d981e4091432754e30c9c720e2c0c78 ("netlink:
    Reset portid after netlink_insert failure") introduced a race
    condition where if two threads try to autobind the same socket
    one of them may end up with a zero port ID. This led to kernel
    deadlocks that were observed by multiple people.
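One way to picture the race, and the separate rhash_portid fix described in the next paragraph, is to replay a possible interleaving sequentially. In this hypothetical model (not kernel code), the old scheme writes the candidate port ID into `portid` before hashing and resets it to 0 when the insert fails, so a thread that reads `portid` in between can conclude the socket is bound and then be left with 0. The new scheme keeps the candidate in `rhash_portid` and copies it to `portid` only after hashing succeeds. The commit message does not spell out the exact kernel call paths, so the interleaving itself is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two port ID fields (not kernel code). */
struct model_nlk {
	uint32_t rhash_portid; /* candidate key used only while hashing */
	uint32_t portid;       /* the value other threads read */
};

/* What a concurrent thread concludes from an unlocked read of portid. */
static bool looks_bound(const struct model_nlk *nlk)
{
	return nlk->portid != 0;
}

/* Old scheme: the candidate becomes visible in portid before hashing... */
static void old_begin_insert(struct model_nlk *nlk, uint32_t candidate)
{
	nlk->portid = candidate;
}

/* ...and is reset to 0 when the insert fails. */
static void old_fail_insert(struct model_nlk *nlk)
{
	nlk->portid = 0;
}

/* New scheme: only the separate hashing key is touched up front. */
static void new_begin_insert(struct model_nlk *nlk, uint32_t candidate)
{
	nlk->rhash_portid = candidate;
}

static void new_fail_insert(struct model_nlk *nlk)
{
	nlk->rhash_portid = 0;
}

/* The real port ID is published only after hashing succeeded. */
static void new_commit_insert(struct model_nlk *nlk)
{
	nlk->portid = nlk->rhash_portid;
}
```

Replayed in order, the old scheme lets the second thread see a non-zero candidate, decide the socket is already bound, and then be left with port ID 0 once the failed insert is rolled back; with the new scheme the second thread never observes a port ID that might still be rolled back.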
    This patch reverts that commit and instead fixes it by introducing
    a separate rhash_portid variable so that the real portid is only set
    after the socket has been successfully hashed.

    Fixes: c0bb07df7d98 ("netlink: Reset portid after netlink_insert failure")
    Reported-by: Tejun Heo
    Reported-by: Linus Torvalds
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 3463bb420c2c8ac9ecabc907575ae2297f83f45c
Author: Michael S. Tsirkin
Date:   Fri Sep 18 13:41:09 2015 +0300

    macvtap: fix TUNSETSNDBUF values > 64k

    [ Upstream commit 3ea79249e81e5ed051f2e6480cbde896d99046e8 ]

    Upon TUNSETSNDBUF, macvtap reads the requested sndbuf size into
    a local variable u.
    commit 39ec7de7092b ("macvtap: fix uninitialized access on
    TUNSETIFF") changed its type to u16 (which is the right thing to
    do for all other macvtap ioctls), breaking all values > 64k.

    The value of TUNSETSNDBUF is actually a signed 32 bit integer, so
    the right thing to do is to read it into an int.

    Cc: David S. Miller
    Fixes: 39ec7de7092b ("macvtap: fix uninitialized access on TUNSETIFF")
    Reported-by: Mark A. Peloquin
    Bisected-by: Matthew Rosato
    Reported-by: Christian Borntraeger
    Signed-off-by: Michael S. Tsirkin
    Tested-by: Matthew Rosato
    Acked-by: Christian Borntraeger
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit d192596179a62abe787ed7184fc1cd1d1ec41920
Author: Eric Dumazet
Date:   Tue Sep 15 18:29:47 2015 -0700

    net/mlx4_en: really allow to change RSS key

    [ Upstream commit 4671fc6d47e0a0108fe24a4d830347d6a6ef4aa7 ]

    When changing rss key, we do not want to overwrite user provided key
    by the one provided by netdev_rss_key_fill(), which is the host random
    key generated at boot time.

    Fixes: 947cbb0ac242 ("net/mlx4_en: Support for configurable RSS hash function")
    Signed-off-by: Eric Dumazet
    Cc: Eyal Perry
    CC: Amir Vadai
    Acked-by: Or Gerlitz
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 065b4761929a2315ddcc2f4356a9efbd6d105d0e
Author: Roopa Prabhu
Date:   Tue Sep 15 14:44:29 2015 -0700

    rtnetlink: catch -EOPNOTSUPP errors from ndo_bridge_getlink

    [ Upstream commit d64f69b0373a7d0bcec8b5da7712977518a8f42b ]

    problem reported:

    kernel 4.1.3
    ------------
    # bridge vlan
    port    vlan ids
    eth0     1 PVID Egress Untagged
             90
             91
             92
             93
             94
             95
             96
             97
             98
             99
             100
    vmbr0    1 PVID Egress Untagged
             94

    kernel 4.2
    -----------
    # bridge vlan
    port    vlan ids

    ndo_bridge_getlink can return -EOPNOTSUPP when an interface's
    ndo_bridge_getlink op is set to switchdev_port_bridge_getlink
    and CONFIG_SWITCHDEV is not defined. This today can happen to
    bond, rocker and team devices. This patch adds -EOPNOTSUPP
    checks after calls to ndo_bridge_getlink.

    Fixes: 85fdb956726ff2a ("switchdev: cut over to new switchdev_port_bridge_getlink")
    Reported-by: Alexandre DERUMIER
    Signed-off-by: Roopa Prabhu
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit abb7a0340081651d037faa2149163f475f2c44ea
Author: Simon Guinot
Date:   Tue Sep 15 22:41:21 2015 +0200

    net: mvneta: fix DMA buffer unmapping in mvneta_rx()

    [ Upstream commit daf158d0d544cec80b7b30deff8cfc59a6e17610 ]

    This patch fixes a regression introduced by the commit a84e32894191
    ("net: mvneta: fix refilling for Rx DMA buffers"). Due to this commit
    the newly allocated Rx buffers are DMA-unmapped in place of those
    passed to the networking stack. Obviously, this causes data
    corruptions.

    This patch fixes the issue by ensuring that the right Rx buffers are
    DMA-unmapped.

    Reported-by: Oren Laskin
    Signed-off-by: Simon Guinot
    Fixes: a84e32894191 ("net: mvneta: fix refilling for Rx DMA buffers")
    Cc: # v3.8+
    Tested-by: Oren Laskin
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit c5de8f88c0177f3ac724cd4ad4caa685fee11945
Author: Linus Lüssing
Date:   Fri Sep 11 18:39:48 2015 +0200

    bridge: fix igmpv3 / mldv2 report parsing

    [ Upstream commit c2d4fbd2163e607915cc05798ce7fb7f31117cc1 ]

    With the newly introduced helper functions the skb pulling is hidden in
    the checksumming function - and undone before returning to the caller.

    The IGMPv3 and MLDv2 report parsing functions in the bridge still
    assumed that the skb is pointing to the beginning of the IGMP/MLD
    message while it is now kept at the beginning of the IPv4/6 header,
    breaking the message parsing and creating packet loss.

    Fixing this by taking the offset between IP and IGMP/MLD header into
    account, too.

    Fixes: 9afd85c9e455 ("net: Export IGMP/MLD message validation code")
    Reported-by: Tobias Powalowski
    Tested-by: Tobias Powalowski
    Signed-off-by: Linus Lüssing
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 9a04c65c6bcd5fea9e892d338f3d3da8df46c9a1
Author: Marcelo Ricardo Leitner
Date:   Thu Sep 10 17:31:15 2015 -0300

    sctp: fix race on protocol/netns initialization

    [ Upstream commit 8e2d61e0aed2b7c4ecb35844fe07e0b2b762dee4 ]

    Consider sctp module is unloaded and is being requested because a user
    is creating a sctp socket.

    During initialization, sctp will add the new protocol type and then
    initialize pernet subsys:

            status = sctp_v4_protosw_init();
            if (status)
                    goto err_protosw_init;

            status = sctp_v6_protosw_init();
            if (status)
                    goto err_v6_protosw_init;

            status = register_pernet_subsys(&sctp_net_ops);

    The problem is that after those calls to sctp_v{4,6}_protosw_init(), it
    is possible for userspace to create SCTP sockets like if the module is
    already fully loaded.
    If that happens, one of the possible effects is that we will have readers
    for net->sctp.local_addr_list list earlier than expected and
    sctp_net_init() does not take precautions while dealing with that list,
    leading to a potential panic but not limited to that, as
    sctp_sock_init() will copy a bunch of blank/partially initialized values
    from net->sctp.

    The race happens like this:

    CPU 0                                   | CPU 1
    socket()                                |
      __sock_create                         | socket()
        inet_create                         |   __sock_create
          list_for_each_entry_rcu(          |
             answer, &inetsw[sock->type],   |
             list) {                        |     inet_create
          /* no hits */                     |
          if (unlikely(err)) {              |
           ...                              |
           request_module()                 |
           /* socket creation is blocked    |
            * the module is fully loaded    |
            */                              |
             sctp_init                      |
               sctp_v4_protosw_init         |
                 inet_register_protosw      |
                   list_add_rcu(&p->list,   |
                                last_perm); |
                                            |       list_for_each_entry_rcu(
                                            |         answer, &inetsw[sock->type],
               sctp_v6_protosw_init         |         list) {
                                            |         /* hit, so assumes protocol
                                            |          * is already loaded
                                            |          */
                                            |       /* socket creation continues
                                            |        * before netns is initialized
                                            |        */
               register_pernet_subsys       |

    Simply inverting the initialization order between
    register_pernet_subsys() and sctp_v4_protosw_init() is not possible
    because register_pernet_subsys() will create a control sctp socket, so
    the protocol must be already visible by then. Deferring the socket
    creation to a work-queue is not good especially because we lose the
    ability to handle its errors.

    So, as suggested by Vlad, the fix is to split netns initialization in
    two moments: defaults and control socket, so that the defaults are
    already loaded by when we register the protocol, while control socket
    initialization is kept at the same moment it is today.

    Fixes: 4db67e808640 ("sctp: Make the address lists per network namespace")
    Signed-off-by: Vlad Yasevich
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 62f43b58d2b2c4f0200b9ca2b997f4c484f0272f
Author: Daniel Borkmann
Date:   Thu Sep 10 20:05:46 2015 +0200

    netlink, mmap: transform mmap skb into full skb on taps

    [ Upstream commit 1853c949646005b5959c483becde86608f548f24 ]

    Ken-ichirou reported that running netlink in mmap mode for receive in
    combination with nlmon will throw a NULL pointer dereference in
    __kfree_skb() on nlmon_xmit(), in my case I can also trigger an "unable
    to handle kernel paging request". The problem is the skb_clone() in
    __netlink_deliver_tap_skb() for skbs that are mmaped.

    I.e. the cloned skb doesn't have a destructor, whereas the mmap netlink
    skb has it pointed to netlink_skb_destructor(), set in the handler
    netlink_ring_setup_skb(). There, skb->head is being set to NULL, so
    that in such cases, __kfree_skb() doesn't perform a skb_release_data()
    via skb_release_all(), where skb->head is possibly being freed through
    kfree(head) into slab allocator, although netlink mmap skb->head points
    to the mmap buffer. Similarly, the same has to be done also for large
    netlink skbs where the data area is vmalloced. Therefore, as discussed,
    make a copy for these rather rare cases for now. This fixes the issue
    on my and Ken-ichirou's test-cases.

    Reference: http://thread.gmane.org/gmane.linux.network/371129
    Fixes: bcbde0d449ed ("net: netlink: virtual tap device management")
    Reported-by: Ken-ichirou MATSUZAWA
    Signed-off-by: Daniel Borkmann
    Tested-by: Ken-ichirou MATSUZAWA
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 12e082bc14a2c95787a79228ab8a1f9300cc8667
Author: Florian Fainelli
Date:   Tue Sep 8 20:06:41 2015 -0700

    net: dsa: bcm_sf2: Fix 64-bits register writes

    [ Upstream commit 03679a14739a0d4c14b52ba65a69ff553bfba73b ]

    The macro to write 64-bits quantities to the 32-bits register swapped
    the value and offsets arguments, we want to preserve the ordering of
    the arguments with respect to how writel() is implemented for instance:
    value first, offset/base second.

    Fixes: 246d7f773c13 ("net: dsa: add Broadcom SF2 switch driver")
    Signed-off-by: Florian Fainelli
    Reviewed-by: Vivien Didelot
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit e60f4a39c2173ad637d5c6541404b7847acac246
Author: Roopa Prabhu
Date:   Tue Sep 8 10:53:04 2015 -0700

    ipv6: fix multipath route replace error recovery

    [ Upstream commit 6b9ea5a64ed5eeb3f68f2e6fcce0ed1179801d1e ]

    Problem:
    The ecmp route replace support for ipv6 in the kernel, deletes the
    existing ecmp route too early, ie when it installs the first nexthop.
    If there is an error in installing the subsequent nexthops, it's too
    late to recover the already deleted existing route leaving the fib
    in an inconsistent state.

    This patch reduces the possibility of this by doing the following:
    a) Changes the existing multipath route add code to a two stage process:
      build rt6_infos + insert them
            ip6_route_add rt6_info creation code is moved into
            ip6_route_info_create.
    b) This ensures that most errors are caught during building rt6_infos
      and we fail early
    c) Separates multipath add and del code. Because add needs the special
      two stage mode in a) and delete essentially does not care.
d) In any event, if the code fails while inserting a route again, a warning is printed (this should be unlikely).

Before the patch:
$ip -6 route show
3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024

/* Try replacing the route with a duplicate nexthop */
$ip -6 route change 3000:1000:1000:1000::2/128 nexthop via fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1
RTNETLINK answers: File exists

$ip -6 route show
/* previously added ecmp route 3000:1000:1000:1000::2 disappears from
 * kernel */

After the patch:
$ip -6 route show
3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024

/* Try replacing the route with a duplicate nexthop */
$ip -6 route change 3000:1000:1000:1000::2/128 nexthop via fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1
RTNETLINK answers: File exists

$ip -6 route show
3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024

Fixes: 27596472473a ("ipv6: fix ECMP route replacement")
Signed-off-by: Roopa Prabhu
Reviewed-by: Nikolay Aleksandrov
Acked-by: Nicolas Dichtel
Signed-off-by: David S.
Miller
Signed-off-by: Greg Kroah-Hartman

commit 5548af0c5fc799dd4e165e9755a909e5cb60e4a0
Author: Florian Fainelli
Date: Sat Sep 5 13:07:27 2015 -0700

net: dsa: bcm_sf2: Fix ageing conditions and operation

[ Upstream commit 39797a279d62972cd914ef580fdfacb13e508bf8 ]

The comparison check between cur_hw_state and hw_state is currently invalid because cur_hw_state is right-shifted by G_MISTP_SHIFT, while hw_state is not. We therefore end up comparing bits 2:0 with bits 7:5, which causes an additional ageing to occur. Fix this by not shifting cur_hw_state while reading it, and instead masking the value with the appropriately shifted bitmask.

The other problem with the fast-ageing process is that we did not set the EN_AGE_DYNAMIC bit to request the ageing to occur for dynamically learned MAC addresses.

Finally, write back 0 to the FAST_AGE_CTRL register to avoid leaving spurious bits set from one operation to the next.

Fixes: 12f460f23423 ("net: dsa: bcm_sf2: add HW bridging support")
Signed-off-by: Florian Fainelli
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

commit f5f10834321f31c4b08f4f9760e0857cfa90add4
Author: Richard Laing
Date: Thu Sep 3 13:52:31 2015 +1200

net/ipv6: Correct PIM6 mrt_lock handling

[ Upstream commit 25b4a44c19c83d98e8c0807a7ede07c1f28eab8b ]

In the IPv6 multicast routing code, the mrt_lock was not being released correctly in the MFC iterator. As a result, adding or deleting a MIF would cause a hang because the mrt_lock could not be acquired.

This fix is a copy of the code for the IPv4 case and ensures that the lock is released correctly.

Signed-off-by: Richard Laing
Acked-by: Cong Wang
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

commit c8bf2008b31f0c522290b32a246e48a551001128
Author: Atsushi Nemoto
Date: Wed Sep 2 17:49:29 2015 +0900

net: eth: altera: fix napi poll_list corruption

[ Upstream commit 4548a697e4969d695047cebd6d9af5e2f6cc728e ]

tse_poll() calls __napi_complete() with IRQs enabled.
This leads to NAPI poll_list corruption and may stop all NAPI drivers from working. Use napi_complete() instead of __napi_complete().

Signed-off-by: Atsushi Nemoto
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

commit 496e7b36b54f554a314bc218c4f02d51b81a2d81
Author: Russell King
Date: Wed Sep 2 17:24:14 2015 +0800

net: fec: clear receive interrupts before processing a packet

[ Upstream commit ed63f1dcd5788d36f942fbcce350742385e3e18c ]

This patch re-submits patch "db3421c114cfa6326", because patch "4d494cdc92b3b9a0" removed the change.

Clear any pending receive interrupt before we process a pending packet. This helps to avoid any spurious interrupts being raised after we have fully cleaned the receive ring, while still allowing an interrupt to be raised if we receive another packet.

The position of this is critical: we must do this prior to reading the next packet status to avoid potentially dropping an interrupt when a packet is still pending.

Acked-by: Fugang Duan
Signed-off-by: Russell King
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

commit dd35e5b8ad3ddcb3dd13e076ba87e16fd4bd2e99
Author: Daniel Borkmann
Date: Thu Sep 3 00:29:07 2015 +0200

ipv6: fix exthdrs offload registration in out_rt path

[ Upstream commit e41b0bedba0293b9e1e8d1e8ed553104b9693656 ]

We previously registered the IPPROTO_ROUTING offload with inet6_add_offload(), but in the error path we try to unregister it with inet_del_offload(). This doesn't seem correct; it should actually be inet6_del_offload(). Also, ipv6_exthdrs_offload_exit() from that commit seems rather incorrect (it also uses rthdr_offload twice), but it was removed entirely later on.

Fixes: 3336288a9fea ("ipv6: Switch to using new offload infrastructure.")
Signed-off-by: Daniel Borkmann
Signed-off-by: David S.
Miller
Signed-off-by: Greg Kroah-Hartman

commit 4230591d474281936dc03c37d0177f6015aedaea
Author: Daniel Borkmann
Date: Wed Sep 2 14:00:36 2015 +0200

sock, diag: fix panic in sock_diag_put_filterinfo

[ Upstream commit b382c08656000c12a146723a153b85b13a855b49 ]

diag socket's sock_diag_put_filterinfo() dumps classic BPF programs upon request to user space (ss -0 -b). However, native eBPF programs attached to sockets (SO_ATTACH_BPF) cannot be dumped with this method, because their orig_prog is always NULL. Yet sock_diag_put_filterinfo() unconditionally tries to access its filter length and to copy the filter insns from there. Internal cBPF to eBPF transformations attached to sockets don't have this issue, as orig_prog state is kept.

It's currently only used by packet sockets. If we want to add native eBPF support in the future, this needs to be done through a different attribute than PACKET_DIAG_FILTER, so as not to confuse possible user space disassemblers that work on diag data.

Fixes: 89aa075832b0 ("net: sock: allow eBPF programs to be attached to sockets")
Signed-off-by: Daniel Borkmann
Acked-by: Nicolas Dichtel
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

commit 001fc2f5d7ee719cf698eee845bc95d468b16380
Author: Mark Salter
Date: Tue Sep 1 09:36:05 2015 -0400

phylib: fix device deletion order in mdiobus_unregister()

[ Upstream commit b6c6aedcbcbacd7b0cb4b64ed5ac835bc1c60a03 ]

commit 8b63ec1837fa ("phylib: Make PHYs children of their MDIO bus, not the bus' parent.") uncovered a problem in mdiobus_unregister() which leads to this warning when I reboot an APM Mustang (arm64) platform:

WARNING: CPU: 7 PID: 4239 at fs/sysfs/group.c:224 sysfs_remove_group+0xa0/0xa4()
sysfs group fffffe0000e07a10 not found for kobject 'xgene-mii-eth0:03'
...
CPU: 7 PID: 4239 Comm: reboot Tainted: G E 4.2.0-0.18.el7.test15.aarch64 #1
Hardware name: AppliedMicro Mustang/Mustang, BIOS 1.1.0 Aug 26 2015
Call Trace:
[] dump_backtrace+0x0/0x170
[] show_stack+0x20/0x2c
[] dump_stack+0x78/0x9c
[] warn_slowpath_common+0xa0/0xd8
[] warn_slowpath_fmt+0x74/0x88
[] sysfs_remove_group+0x9c/0xa4
[] dpm_sysfs_remove+0x5c/0x70
[] device_del+0x44/0x208
[] device_unregister+0x2c/0x7c
[] mdiobus_unregister+0x48/0x94
[] xgene_enet_mdio_remove+0x28/0x44
[] xgene_enet_remove+0xd0/0xd8
[] xgene_enet_shutdown+0x2c/0x3c
[] platform_drv_shutdown+0x24/0x40
[] device_shutdown+0xf0/0x1b4
[] kernel_restart_prepare+0x40/0x4c
[] kernel_restart+0x1c/0x80
[] SyS_reboot+0x17c/0x250

The problem is that mdiobus_unregister() deletes the bus device before unregistering the phy devices on the bus. This wasn't a problem before, because the phys were not children of the bus:

/sys/devices/platform/APMC0D05:00/net/eth0/xgene-mii-eth0:03
/sys/devices/platform/APMC0D05:00/net/eth0/xgene-mii-eth0

But now that they are:

/sys/devices/platform/APMC0D05:00/net/eth0/xgene-mii-eth0/xgene-mii-eth0:03

when mdiobus_unregister() deletes the bus device, the phy subdirs are also removed from sysfs. So when the phys are unregistered afterward, we get the warning.

This patch changes the order so that the phys are unregistered before the bus device is deleted.

Fixes: 8b63ec1837fa ("phylib: Make PHYs children of their MDIO bus, not the bus' parent.")
Signed-off-by: Mark Salter
Reviewed-by: Florian Fainelli
Tested-by: Mark Langsdorf
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman