commit cf5ae2989a32c391d7769933e0267e6fbfae8e14
Author: Greg Kroah-Hartman
Date:   Mon Nov 21 10:11:59 2016 +0100

    Linux 4.8.10

commit 5cd8f6788ff34999dbd4cbec81a6adfc215e1e60
Author: Michal Nazarewicz
Date:   Tue Oct 4 02:07:34 2016 +0200

    usb: gadget: f_fs: stop sleeping in ffs_func_eps_disable

    commit a9e6f83c2df199187a5248f824f31b6787ae23ae upstream.

    ffs_func_eps_disable is called from atomic context, so it cannot sleep and thus cannot grab a mutex. Change the handling of epfile->read_buffer to use a non-sleeping synchronisation method.

    Reported-by: Chen Yu
    Signed-off-by: Michał Nazarewicz
    Fixes: 9353afbbfa7b ("buffer data from ‘oversized’ OUT requests")
    Tested-by: John Stultz
    Tested-by: Chen Yu
    Signed-off-by: Felipe Balbi
    Signed-off-by: Greg Kroah-Hartman

commit e2458382c792eb1be48ce3d604a37d1af9baa9f4
Author: Michal Nazarewicz
Date:   Tue Oct 4 02:07:33 2016 +0200

    usb: gadget: f_fs: edit epfile->ep under lock

    commit 454915dde06a51133750c6745f0ba57361ba209d upstream.

    epfile->ep is protected by ffs->eps_lock (not epfile->mutex), so clear it while holding the spin lock.

    Tested-by: John Stultz
    Tested-by: Chen Yu
    Signed-off-by: Michal Nazarewicz
    Signed-off-by: Felipe Balbi
    Signed-off-by: Greg Kroah-Hartman

commit e34a0f1c53b5d412a57268397543d31db33e73fb
Author: David S. Miller
Date:   Mon Oct 24 21:25:31 2016 -0700

    sparc64: Delete now unused user copy fixup functions.

    [ Upstream commit 0fd0ff01d4c3c01e7fe69b762ee1a13236639acc ]

    Now that all of the user copy routines are converted to return accurate residual lengths when an exception occurs, we no longer need the broken fixup routines.

    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit af97481a6f5ba3bfd1f1e1fcc8254fdbb3416eb5
Author: David S. Miller
Date:   Mon Oct 24 21:22:27 2016 -0700

    sparc64: Delete now unused user copy assembler helpers.

    [ Upstream commit 614da3d9685b67917cab48c8452fd8bf93de0867 ]

    All of __ret{,l}_mone{_asi,_fp,_asi_fpu} are now unused.

    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit ac663c54f40b2830b1ca32d1ae9d683fe248b14c
Author: David S. Miller
Date:   Mon Oct 24 21:20:35 2016 -0700

    sparc64: Convert U3copy_{from,to}_user to accurate exception reporting.

    [ Upstream commit ee841d0aff649164080e445e84885015958d8ff4 ]

    Report the exact number of bytes which have not been successfully copied when an exception occurs, using the running remaining length.

    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit d91bb7a87e261436efcc18f27415690aa3a2d32e
Author: David S. Miller
Date:   Mon Oct 24 20:46:44 2016 -0700

    sparc64: Convert NG2copy_{from,to}_user to accurate exception reporting.

    [ Upstream commit e93704e4464fdc191f73fce35129c18de2ebf95d ]

    Report the exact number of bytes which have not been successfully copied when an exception occurs, using the running remaining length.

    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit a15859f9d8396cce7c55ccdb7e75f70f14cbc349
Author: David S. Miller
Date:   Mon Oct 24 19:32:12 2016 -0700

    sparc64: Convert NGcopy_{from,to}_user to accurate exception reporting.

    [ Upstream commit 7ae3aaf53f1695877ccd5ebbc49ea65991e41f1e ]

    Report the exact number of bytes which have not been successfully copied when an exception occurs, using the running remaining length.

    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit bb522726d3115bcb4e7147a594e6f8a0cb4e3fef
Author: David S. Miller
Date:   Mon Oct 24 18:58:05 2016 -0700

    sparc64: Convert NG4copy_{from,to}_user to accurate exception reporting.

    [ Upstream commit 95707704800988093a9b9a27e0f2f67f5b4bf2fa ]

    Report the exact number of bytes which have not been successfully copied when an exception occurs, using the running remaining length.

    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit b0580eadc19ff3a617a7d07cfaf2a985153c114e
Author: David S. Miller
Date:   Mon Aug 15 16:07:50 2016 -0700

    sparc64: Convert U1copy_{from,to}_user to accurate exception reporting.

    [ Upstream commit cb736fdbb208eb3420f1a2eb2bfc024a6e9dcada ]

    Report the exact number of bytes which have not been successfully copied when an exception occurs, using the running remaining length.

    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 50e927483ccf8ea406ebd9cbd63934da8ddcc14e
Author: David S. Miller
Date:   Mon Aug 15 15:26:38 2016 -0700

    sparc64: Convert GENcopy_{from,to}_user to accurate exception reporting.

    [ Upstream commit d0796b555ba60c22eb41ae39a8362156cb08eee9 ]

    Report the exact number of bytes which have not been successfully copied when an exception occurs, using the running remaining length.

    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 620ec41010d14a385aec45a1b530d09c65072355
Author: David S. Miller
Date:   Mon Aug 15 15:08:18 2016 -0700

    sparc64: Convert copy_in_user to accurate exception reporting.

    [ Upstream commit 0096ac9f47b1a2e851b3165d44065d18e5f13d58 ]

    Report the exact number of bytes which have not been successfully copied when an exception occurs, using the running remaining length.

    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit bf4d0da8e800f0c15a5fae65ac83342f56306d2c
Author: David S. Miller
Date:   Mon Aug 15 14:47:54 2016 -0700

    sparc64: Prepare to move to more saner user copy exception handling.

    [ Upstream commit 83a17d2661674d8c198adc0e183418f72aabab79 ]

    The fixup helper function mechanism for handling user copy faults is not 100% accurate, and can never be made so.

    We are going to transition the code to return the running return length, which is always tracked in one or more registers of each of these routines.

    In order to convert them one by one, we have to allow the existing behavior to continue functioning.

    Therefore make all the copy code that wants the fixup helper to be used return negative one.

    After all of the user copy routines have been converted, this logic and the fixup helpers themselves can be removed completely.
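To make "return the running remaining length" concrete, here is a minimal C model, not the sparc64 assembler: a byte-at-a-time copy loop that keeps a `remaining` counter and, when a simulated fault occurs, returns that counter as the exact number of bytes not copied. The name `model_copy_user` and the `fault_at` parameter (which stands in for a real page fault) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of a user-copy routine with accurate exception
 * reporting. fault_at simulates the offset at which the access would
 * fault; pass a value >= len for "no fault". The return value is the
 * running remaining length, i.e. the exact number of bytes NOT copied.
 */
static size_t model_copy_user(char *dst, const char *src, size_t len,
                              size_t fault_at)
{
    size_t remaining = len;              /* updated as each byte lands */

    assert(dst != NULL && src != NULL);
    while (remaining > 0) {
        size_t off = len - remaining;    /* next byte to copy */

        if (off == fault_at)             /* simulated exception */
            return remaining;            /* exact residual, no fixup guess */
        dst[off] = src[off];
        remaining--;
    }
    return 0;                            /* everything was copied */
}
```

A caller can derive the number of bytes actually transferred as `len - ret`. This is the property the separate fixup helpers could only approximate.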

    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit bbbab9f59ea795bb647da007235952c73cd2174f
Author: David S. Miller
Date:   Wed Aug 10 14:41:33 2016 -0700

    sparc64: Delete __ret_efault.

    [ Upstream commit aa95ce361ed95c72ac42dcb315166bce5cf1a014 ]

    It is completely unused.

    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 81a91edbb91adeb35ea3bd7be11003c4901df4ac
Author: David S. Miller
Date:   Thu Oct 27 09:04:54 2016 -0700

    sparc64: Handle extremely large kernel TLB range flushes more gracefully.

    [ Upstream commit a74ad5e660a9ee1d071665e7e8ad822784a2dc7f ]

    When the vmalloc area gets fragmented, and because the firmware mapping area sits between where modules live and the vmalloc area, we can sometimes receive requests for enormous kernel TLB range flushes.

    When this happens the CPU just spins flushing billions of pages, and this triggers the NMI watchdog and other problems.

    We took care of this on the TSB side by doing a linear scan of the table once we pass a certain threshold.

    Do something similar for the TLB flush; however, we are limited by the TLB flush facilities provided by the different chip variants.

    First of all we use a (mostly arbitrary) cut-off of 256K, which is about 32 pages. This can be tuned in the future.

    The huge range code path for each chip works as follows:

    1) On spitfire we flush all non-locked TLB entries using diagnostic accesses.

    2) On cheetah we use the "flush all" TLB flush.

    3) On sun4v/hypervisor we do a TLB context flush on context 0, which unlike previous chips does not remove "permanent" or locked entries.

    We could probably do something better on spitfire, such as limiting the flush to kernel TLB entries or even doing range comparisons. However, that probably isn't worth it since those chips are old and the TLB only had 64 entries.

    Reported-by: James Clarke
    Tested-by: James Clarke
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 7f8a50eb38d313f08e5fcb734987c00a7ecd05fb
Author: David S. Miller
Date:   Wed Oct 26 10:20:14 2016 -0700

    sparc64: Fix illegal relative branches in hypervisor patched TLB cross-call code.

    [ Upstream commit a236441bb69723032db94128761a469030c3fe6d ]

    Just like the non-cross-call TLB flush handlers, the cross-call ones need to avoid doing PC-relative branches outside of their code blocks.

    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit f7ef55af2f1b905f204688f7216e93bc06c91e72
Author: David S. Miller
Date:   Wed Oct 26 10:08:22 2016 -0700

    sparc64: Fix instruction count in comment for __hypervisor_flush_tlb_pending.

    [ Upstream commit 830cda3f9855ff092b0e9610346d110846fc497c ]

    Noticed by James Clarke.

    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 2a28ab3d41482b44558b8ec981f492e3b4489c65
Author: David S. Miller
Date:   Tue Oct 25 16:23:26 2016 -0700

    sparc64: Fix illegal relative branches in hypervisor patched TLB code.

    [ Upstream commit b429ae4d5b565a71dfffd759dfcd4f6c093ced94 ]

    When we copy code over to patch another piece of code, we can only use PC-relative branches that target code within that piece of code.

    Such PC-relative branches cannot be made to external symbols because the patch moves the location of the code and thus modifies the relative address of external symbols.

    Use an absolute jmpl to fix this problem.

    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit f4fb552a033e08e3c294ad76b0d6e7c9c1a37182
Author: David S. Miller
Date:   Tue Oct 25 19:43:17 2016 -0700

    sparc64: Handle extremely large kernel TSB range flushes sanely.

    [ Upstream commit 849c498766060a16aad5b0e0d03206726e7d2fa4 ]

    If the number of pages we are flushing is more than twice the number of entries in the TSB, just scan the TSB table for matches rather than probing each and every page in the range.

    Based upon a patch and report by James Clarke.

    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 51915c6d90704046600414e117d80e6a76ba13e6
Author: James Clarke
Date:   Mon Oct 24 19:49:25 2016 +0100

    sparc: Handle negative offsets in arch_jump_label_transform

    [ Upstream commit 9d9fa230206a3aea6ef451646c97122f04777983 ]

    Additionally, if the offset will overflow the immediate for a ba,pt instruction, fall back on a standard ba to get an extra 3 bits.

    Signed-off-by: James Clarke
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit da6fe239ceffd8bbe1463c361283f0c4c756c983
Author: Baruch Siach
Date:   Fri Aug 12 16:04:33 2016 +0300

    spi: spidev_test: fix build with musl libc

    commit 8736f8022e532a3c1d8873aac78e1113c6ffc3b9 upstream.

    spidev.h uses _IOC_SIZEBITS directly. musl libc does not provide this macro unless linux/ioctl.h is included explicitly.

    Fixes build failures like:

    In file included from .../host/usr/arm-buildroot-linux-musleabihf/sysroot/usr/include/sys/ioctl.h:7:0,
                     from .../build/spidev_test-v3.15/spidev_test.c:20:
    .../build/spidev_test-v3.15/spidev_test.c: In function ‘transfer’:
    .../build/spidev_test-v3.15/spidev_test.c:75:18: error: ‘_IOC_SIZEBITS’ undeclared (first use in this function)
      ret = ioctl(fd, SPI_IOC_MESSAGE(1), &tr);
                      ^

    Signed-off-by: Baruch Siach
    Signed-off-by: Mark Brown
    Cc: Ralph Sennhauser
    Signed-off-by: Greg Kroah-Hartman

commit 4ea98e573d65b79714d99a4707b771d3a8ec98ae
Author: Florian Fainelli
Date:   Sun Nov 13 17:50:35 2016 -0800

    net: stmmac: Fix lack of link transition for fixed PHYs

    [ Upstream commit c51e424dc79e1428afc4d697cdb6a07f7af70cbf ]

    Commit 52f95bbfcf72 ("stmmac: fix adjust link call in case of a switch is attached") added some logic to avoid polling the fixed PHY and therefore invoking the adjust_link callback more than once, since this is a fixed PHY and link events won't be generated.

    This works fine the first time, because we start with phydev->irq = PHY_POLL, so we call adjust_link, then we set phydev->irq = PHY_IGNORE_INTERRUPT and we stop polling the PHY.

    Now, if we call ndo_close(), which calls phy_stop() and also does an explicit netif_carrier_off(), we end up with the link down. Upon calling ndo_open() again, despite starting the PHY state machine, we have PHY_IGNORE_INTERRUPT set and we generate no link event at all, so the link is permanently down.

    Fixes: 52f95bbfcf72 ("stmmac: fix adjust link call in case of a switch is attached")
    Signed-off-by: Florian Fainelli
    Acked-by: Giuseppe Cavallaro
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 150b491b1b88d467ee6e2982b7baa074e53b60a0
Author: Xin Long
Date:   Sun Nov 13 21:44:37 2016 +0800

    sctp: change sk state only when it has assocs in sctp_shutdown

    [ Upstream commit 5bf35ddfee052d44f39ebaa395d87101c8918405 ]

    Currently, when users shut down a sock with SEND_SHUTDOWN in sctp, the sk state is changed to SCTP_SS_CLOSING even if the sock has no connection (assoc), which is not what we expect. Worse, if users then try to listen on this sock, the kernel could even panic when it dereferences sctp_sk(sk)->bind_hash in sctp_inet_listen, as bind_hash is NULL when the sock has no assoc.

    This patch moves the sk state change after the check that the sk's assoc list is not empty, and also merges these two if() conditions and reduces the indent level.

    Fixes: d46e416c11c8 ("sctp: sctp should change socket state when shutdown is received")
    Reported-by: Andrey Konovalov
    Tested-by: Andrey Konovalov
    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 5235fcfa6cf8a06e001ef133eda224cffd08f6c9
Author: Baoquan He
Date:   Sun Nov 13 13:01:33 2016 +0800

    bnx2: Wait for in-flight DMA to complete at probe stage

    [ Upstream commit 6df77862f63f389df3b1ad879738e04440d7385d ]

    In-flight DMA from the 1st kernel could continue in the kdump kernel. A new io-page table has already been created before bnx2 does its reset at the open stage.
    We have to wait at the probe stage for the in-flight DMA to complete, to prevent it from looking up entries in the newly created io-page table.

    Suggested-by: Michael Chan
    Signed-off-by: Baoquan He
    Acked-by: Michael Chan
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 6523ff2e27fe40d1467c213404223ac80ce5aaa6
Author: Baoquan He
Date:   Sun Nov 13 13:01:32 2016 +0800

    Revert "bnx2: Reset device during driver initialization"

    [ Upstream commit 5d0d4b91bf627f14f95167b738d524156c9d440b ]

    This reverts commit 3e1be7ad2d38c6bd6aeef96df9bd0a7822f4e51c.

    When people build the bnx2 driver into the kernel, it will fail to detect and load firmware, because the firmware is contained in the initramfs and the initramfs has not been uncompressed yet during do_initcalls. So revert commit 3e1be7a and work out a new way in the later patch.

    Signed-off-by: Baoquan He
    Acked-by: Paul Menzel
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 224fb8cbefb229d9ce7b01ac5c4979bb0020cf38
Author: Arkadi Sharshevsky
Date:   Fri Nov 11 16:34:26 2016 +0100

    mlxsw: spectrum_router: Correctly dump neighbour activity

    [ Upstream commit 42cdb338f40a98e6558bae35456fe86b6e90e1ef ]

    The device's neighbour table is periodically dumped in order to update the kernel about active neighbours.

    A single dump session may span multiple queries, until the response carries fewer records than requested or a record (which can contain up to four neighbour entries) is not full.

    The current code stops the session when the number of returned records is zero, which can result in an infinite loop under a high packet rate.

    Fix this by stopping the session according to the above logic.

    Fixes: c723c735fa6b ("mlxsw: spectrum_router: Periodically update the kernel's neigh table")
    Signed-off-by: Arkadi Sharshevsky
    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 9092bbd64bd91d4cd08f9221368913f8cbec2a40
Author: Yotam Gigi
Date:   Fri Nov 11 16:34:25 2016 +0100

    mlxsw: spectrum: Fix refcount bug on span entries

    [ Upstream commit 2d644d4c7506646f9c4a2afceb7fd5f030bc0c9f ]

    When binding a port to a newly created span entry, its refcount is initialized to zero even though it has a bound port. That leads to unexpected behaviour when the user tries to delete that port from the span entry.

    Fix this by initializing the reference count to 1.

    Also add a warning to the put function.

    Fixes: 763b4b70afcd ("mlxsw: spectrum: Add support in matchall mirror TC offloading")
    Signed-off-by: Yotam Gigi
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 5712922773b59684c943c52d4925e4fb25c26c8c
Author: Mike Frysinger
Date:   Thu Nov 10 19:08:39 2016 -0500

    Revert "include/uapi/linux/atm_zatm.h: include linux/time.h"

    [ Upstream commit 7b5b74efcca00f15c2aec1dc7175bfe34b6ec643 ]

    This reverts commit cf00713a655d ("include/uapi/linux/atm_zatm.h: include linux/time.h").

    This attempted to fix userspace breakage that no longer existed when the patch was merged. Almost one year earlier, commit 70ba07b675b5 ("atm: remove 'struct zatm_t_hist'") deleted the struct in question.

    Since this patch was merged, we have had to deal with people being unable to include this header in conjunction with standard C library headers like stdlib.h (which linux-atm does). Example breakage:

    x86_64-pc-linux-gnu-gcc -DHAVE_CONFIG_H -I. -I../.. -I./../q2931 -I./../saal \
      -I. -DCPPFLAGS_TEST -I../../src/include -O2 -march=native -pipe -g \
      -frecord-gcc-switches -freport-bug -Wimplicit-function-declaration \
      -Wnonnull -Wstrict-aliasing -Wparentheses -Warray-bounds \
      -Wfree-nonheap-object -Wreturn-local-addr -fno-strict-aliasing -Wall \
      -Wshadow -Wpointer-arith -Wwrite-strings -Wstrict-prototypes -c zntune.c
    In file included from /usr/include/linux/atm_zatm.h:17:0,
                     from zntune.c:17:
    /usr/include/linux/time.h:9:8: error: redefinition of ‘struct timespec’
     struct timespec {
            ^
    In file included from /usr/include/sys/select.h:43:0,
                     from /usr/include/sys/types.h:219,
                     from /usr/include/stdlib.h:314,
                     from zntune.c:9:
    /usr/include/time.h:120:8: note: originally defined here
     struct timespec
            ^

    Signed-off-by: Mike Frysinger
    Acked-by: Mikko Rapeli
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 2b5f22e4f7fd208c8d392e5c3755cea1f562cb98
Author: Eric Dumazet
Date:   Thu Nov 10 13:12:35 2016 -0800

    tcp: take care of truncations done by sk_filter()

    [ Upstream commit ac6e780070e30e4c35bd395acfe9191e6268bdd3 ]

    With syzkaller's help, Marco Grassi found a bug in the TCP stack, crashing in tcp_collapse().

    The root cause is that sk_filter() can truncate the incoming skb, but the TCP stack was not really expecting this to happen. It probably was expecting a simple DROP or ACCEPT behavior.

    We first need to make sure no part of the TCP header could be removed. Then we need to adjust TCP_SKB_CB(skb)->end_seq.

    Many thanks to the syzkaller team and Marco for giving us a reproducer.

    Signed-off-by: Eric Dumazet
    Reported-by: Marco Grassi
    Reported-by: Vladis Dronov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 22a78d4c7f43e04c2a8a5301f2200f9c8b4a9c37
Author: Stephen Suryaputra Lin
Date:   Thu Nov 10 11:16:15 2016 -0500

    ipv4: use new_gw for redirect neigh lookup

    [ Upstream commit 969447f226b451c453ddc83cac6144eaeac6f2e3 ]

    In v2.6, ip_rt_redirect() calls arp_bind_neighbour(), which returns 0, and then the state of the neigh for the new_gw is checked.
    If the state isn't valid, the redirected route is deleted. This behavior is maintained up to v3.5.7 by check_peer_redirect(), because rt->rt_gateway is assigned to peer->redirect_learned.a4 before calling ipv4_neigh_lookup().

    After commit 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again."), ipv4_neigh_lookup() is performed without the rt_gateway assigned to the new_gw. When rt_gateway (old_gw) isn't zero, the function uses it as the key. The neigh is most likely valid, since the old_gw is the one that sends the ICMP redirect message. Then the new_gw is assigned to fib_nh_exception. The problem is that the new_gw ARP may never get resolved and the traffic is blackholed.

    So, use the new_gw for the neigh lookup.

    Changes from v1:
     - use __ipv4_neigh_lookup instead (per Eric Dumazet).

    Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.")
    Signed-off-by: Stephen Suryaputra Lin
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit bccb4093d464ca8b33a096999c4bb317f84ad652
Author: Eric Dumazet
Date:   Wed Nov 9 16:04:46 2016 -0800

    net: __skb_flow_dissect() must cap its return value

    [ Upstream commit 34fad54c2537f7c99d07375e50cb30aa3c23bd83 ]

    After Tom's patch, the thoff field could point past the end of the buffer, which could fool some callers.

    If an skb was provided, skb->len should be the upper limit. If not, hlen is supposed to be the upper limit.

    Fixes: a6e544b0a88b ("flow_dissector: Jump to exit code in __skb_flow_dissect")
    Signed-off-by: Eric Dumazet
    Reported-by: Yibin Yang
    Acked-by: Willem de Bruijn
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit a1632e969a55fc24ac8113a889b93a1093ed4da7
Author: David Ahern
Date:   Mon Nov 7 12:03:09 2016 -0800

    net: icmp_route_lookup should use rt dev to determine L3 domain

    [ Upstream commit 9d1a6c4ea43e48c7880c85971c17939b56832d8a ]

    icmp_send is called in response to some event.
    The skb may not have the device set (skb->dev is NULL), but it is expected to have an rt. Update icmp_route_lookup to use the rt on the skb to determine the L3 domain.

    Fixes: 613d09b30f8b ("net: Use VRF device index for lookups on TX")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 9885f474d92b7ce09a6176133f9670525fe22e9b
Author: Soheil Hassas Yeganeh
Date:   Fri Nov 4 15:36:49 2016 -0400

    sock: fix sendmmsg for partial sendmsg

    [ Upstream commit 3023898b7d4aac65987bd2f485cc22390aae6f78 ]

    Do not send the next message in sendmmsg for partial sendmsg invocations.

    sendmmsg assumes that it can continue sending the next message when the return value of the individual sendmsg invocations is positive. This corrupts the data for TCP, SCTP, and UNIX streams.

    For example, sendmmsg([["abcd"], ["efgh"]]) can result in a stream of "aefgh" if the first sendmsg invocation sends only the first byte while the second sendmsg goes through.

    Datagram sockets either send the entire datagram or fail, so this patch affects only sockets of type SOCK_STREAM and SOCK_SEQPACKET.

    Fixes: 228e548e6020 ("net: Add sendmmsg socket system call")
    Signed-off-by: Soheil Hassas Yeganeh
    Signed-off-by: Eric Dumazet
    Signed-off-by: Willem de Bruijn
    Signed-off-by: Neal Cardwell
    Acked-by: Maciej Żenczykowski
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit b78ba0a0f231303739bea3e4344d31d5eddf0778
Author: Alexander Duyck
Date:   Fri Nov 4 15:11:57 2016 -0400

    fib_trie: Correct /proc/net/route off by one error

    [ Upstream commit fd0285a39b1cb496f60210a9a00ad33a815603e7 ]

    The display of /proc/net/route has had a couple of issues because, when I originally rewrote most of fib_trie, I made the iterator track the next value to use instead of the current one.

    In addition, it had an off-by-one error where I was tracking the first piece of data as position 0, even though in reality that belonged to the SEQ_START_TOKEN.
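The position bookkeeping described above can be made concrete with a simplified C model of a seq_file-style iterator. It is not fib_trie code: a linked list stands in for the trie's leaves, and `model_start_token` stands in for `SEQ_START_TOKEN`. All names are hypothetical. In the model, position 0 is reserved for the header, leaves are numbered from 1, and the iterator remembers the last position and leaf it reported. A read that resumes at or after that position can therefore walk forward instead of starting from the head.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical leaf list standing in for the trie's leaves, in key order. */
struct model_leaf {
    int key;
    const struct model_leaf *next;
};

/* Stand-in for SEQ_START_TOKEN: position 0 is the header line. */
static const struct model_leaf model_start_token;
#define MODEL_START_TOKEN (&model_start_token)

struct model_route_iter {
    const struct model_leaf *head;
    size_t pos;                      /* last reported position (1-based) */
    const struct model_leaf *cur;    /* leaf reported at pos, NULL if none */
};

/* Return the item at seq position pos: the header token for 0, otherwise
 * the pos-th leaf counting from 1, or NULL past the end. */
static const struct model_leaf *model_route_seek(struct model_route_iter *it,
                                                 size_t pos)
{
    const struct model_leaf *l;
    size_t p;

    assert(it != NULL);
    if (pos == 0)
        return MODEL_START_TOKEN;
    if (it->cur != NULL && it->pos <= pos) {
        l = it->cur;                 /* resume from the last reported leaf */
        p = it->pos;
    } else {
        l = it->head;                /* restart: the first leaf is position 1 */
        p = 1;
    }
    while (l != NULL && p < pos) {
        l = l->next;
        p++;
    }
    if (l != NULL) {                 /* remember what was just reported */
        it->pos = pos;
        it->cur = l;
    }
    return l;
}
```

Re-reading the last reported position returns the same leaf rather than skipping it. This is the kind of off-by-one that tracking the *next* position instead of the last one makes easy to introduce.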

    This patch updates the code so the iterator tracks the last reported position and key instead of the next expected position and key. In addition, it shifts things so that all of the leaves start at 1 instead of reporting leaves starting at offset 0 as valid. With these two issues addressed, this should resolve any off-by-one errors that were present in the display of /proc/net/route.

    Fixes: 25b97c016b26 ("ipv4: off-by-one in continuation handling in /proc/net/route")
    Cc: Andy Whitcroft
    Reported-by: Jason Baron
    Tested-by: Jason Baron
    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 92fd1c1f2fd27a352b91ad1f874775618aa1865a
Author: David Ahern
Date:   Thu Nov 3 16:17:26 2016 -0700

    net: icmp6_send should use dst dev to determine L3 domain

    [ Upstream commit 5d41ce29e3b91ef305f88d23f72b3359de329cec ]

    icmp6_send is called in response to some event. The skb may not have the device set (skb->dev is NULL), but it is expected to have a dst set. Update icmp6_send to use the dst on the skb to determine the L3 domain.

    Fixes: ca254490c8dfd ("net: Add VRF support to IPv6 stack")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 09ee09498bcae967b6bde03940c8ba45dcd19875
Author: Daniel Borkmann
Date:   Fri Nov 4 00:01:19 2016 +0100

    bpf: fix htab map destruction when extra reserve is in use

    [ Upstream commit 483bed2b0ddd12ec33fc9407e0c6e1088e77a97c ]

    Commit a6ed3ea65d98 ("bpf: restore behavior of bpf_map_update_elem") added an extra per-cpu reserve to the hash table map to restore the old behaviour from pre-prealloc times. When non-prealloc is in use for a map, the problem is as follows. Once an extra element from the reserve has been linked into the hash table, and the hash table is destroyed because its refcount dropped to zero, htab_map_free() -> delete_all_elements() will walk the whole hash table and drop all elements via htab_elem_free().
    The element from the extra reserve is then first fed to the wrong backend allocator and eventually freed twice.

    Fixes: a6ed3ea65d98 ("bpf: restore behavior of bpf_map_update_elem")
    Reported-by: Dmitry Vyukov
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit de289ad2e5754953716f8d835846c804137a3a08
Author: Marcelo Ricardo Leitner
Date:   Thu Nov 3 17:03:41 2016 -0200

    sctp: assign assoc_id earlier in __sctp_connect

    [ Upstream commit 7233bc84a3aeda835d334499dc00448373caf5c0 ]

    sctp_wait_for_connect() currently already holds the asoc to keep it alive during the sleep, in case another thread releases it. But Andrey Konovalov and Dmitry Vyukov reported a use-after-free in such a situation.

    The problem is that __sctp_connect() doesn't take a ref on the asoc and will read from the asoc after calling sctp_wait_for_connect(). By then another thread may have closed it, and the _put in sctp_wait_for_connect will actually release it, causing the use-after-free.

    The fix is to do the read before waiting for the connect instead of after, which avoids this issue because the socket is still locked at that point. Returning the asoc id in case of failure should cause no issue, as the application shouldn't rely on that number in such situations anyway.

    This issue doesn't exist in the sctp_sendmsg() path.

    Reported-by: Dmitry Vyukov
    Reported-by: Andrey Konovalov
    Tested-by: Andrey Konovalov
    Signed-off-by: Marcelo Ricardo Leitner
    Reviewed-by: Xin Long
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 76b5fee5cfa0ae36ef162196410740e4c92f0a96
Author: Eric Dumazet
Date:   Thu Nov 3 08:59:46 2016 -0700

    ipv6: dccp: add missing bind_conflict to dccp_ipv6_mapped

    [ Upstream commit 990ff4d84408fc55942ca6644f67e361737b3d8e ]

    While fuzzing the kernel with syzkaller, Andrey reported a nasty crash in inet6_bind() caused by DCCP lacking a required method.
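The failure mode in the previous paragraph is a per-family ops table that leaves a required function pointer unset. A heavily simplified, hypothetical C model follows; none of these names are kernel API. The upstream fix adds the missing method to `dccp_ipv6_mapped`. The NULL check in `model_bind` exists only so the model can report the failure. As the crash report implies, the real generic path calls the method without checking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified per-family ops table. */
struct model_af_ops {
    const char *name;
    int (*bind_conflict)(int port);  /* required by the generic bind path */
};

/* Hypothetical conflict check: pretend port 80 is already bound. */
static int model_bind_conflict(int port)
{
    return port == 80;
}

static const struct model_af_ops model_ipv6_ops = { "ipv6", model_bind_conflict };
/* The bug class: a table that never set the required method. */
static const struct model_af_ops model_mapped_ops_missing = { "ipv6_mapped", NULL };

/* Generic bind path. Returns 0 on success, -2 on a conflict, and -1 when
 * the method is missing (where real code would dereference NULL). */
static int model_bind(const struct model_af_ops *ops, int port)
{
    assert(ops != NULL);
    if (ops->bind_conflict == NULL)
        return -1;
    return ops->bind_conflict(port) ? -2 : 0;
}
```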

    Fixes: ab1e0a13d7029 ("[SOCK] proto: Add hashinfo member to struct proto")
    Signed-off-by: Eric Dumazet
    Reported-by: Andrey Konovalov
    Tested-by: Andrey Konovalov
    Cc: Arnaldo Carvalho de Melo
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 84d9c612bb7a9e44c6bf286bedfbe72a6d2d71d4
Author: Eric Dumazet
Date:   Wed Nov 2 20:30:48 2016 -0700

    ipv6: dccp: fix out of bound access in dccp_v6_err()

    [ Upstream commit 1aa9d1a0e7eefcc61696e147d123453fc0016005 ]

    dccp_v6_err() does not use pskb_may_pull() and might access garbage.

    We only need 4 bytes at the beginning of the DCCP header, like TCP, so the 8 bytes pulled in icmpv6_notify() are more than enough.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit ba93cf7d2118774c0b2dcfccc8ae999427815caa
Author: Eric Dumazet
Date:   Wed Nov 2 19:00:40 2016 -0700

    dccp: fix out of bound access in dccp_v4_err()

    [ Upstream commit 6706a97fec963d6cb3f7fc2978ec1427b4651214 ]

    dccp_v4_err() does not use pskb_may_pull() and might access garbage.

    We only need 4 bytes at the beginning of the DCCP header, like TCP, so the 8 bytes pulled in icmp_socket_deliver() are more than enough.

    This patch might allow more ICMP messages to be processed, as some routers still limit the size of reflected bytes to 28 (RFC 792), instead of extended lengths (RFC 1812 4.3.2.3).

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 378a611013748171f37aef165148bb75fbce85e2
Author: Eric Dumazet
Date:   Wed Nov 2 18:04:24 2016 -0700

    dccp: do not send reset to already closed sockets

    [ Upstream commit 346da62cc186c4b4b1ac59f87f4482b47a047388 ]

    Andrey reported the following warning while fuzzing with syzkaller:

    WARNING: CPU: 1 PID: 21072 at net/dccp/proto.c:83 dccp_set_state+0x229/0x290
    Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 21072 Comm: syz-executor Not tainted 4.9.0-rc1+ #293 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 ffff88003d4c7738 ffffffff81b474f4 0000000000000003 dffffc0000000000 ffffffff844f8b00 ffff88003d4c7804 ffff88003d4c7800 ffffffff8140c06a 0000000041b58ab3 ffffffff8479ab7d ffffffff8140beae ffffffff8140cd00 Call Trace: [< inline >] __dump_stack lib/dump_stack.c:15 [] dump_stack+0xb3/0x10f lib/dump_stack.c:51 [] panic+0x1bc/0x39d kernel/panic.c:179 [] __warn+0x1cc/0x1f0 kernel/panic.c:542 [] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585 [] dccp_set_state+0x229/0x290 net/dccp/proto.c:83 [] dccp_close+0x612/0xc10 net/dccp/proto.c:1016 [] inet_release+0xef/0x1c0 net/ipv4/af_inet.c:415 [] sock_release+0x8e/0x1d0 net/socket.c:570 [] sock_close+0x16/0x20 net/socket.c:1017 [] __fput+0x29d/0x720 fs/file_table.c:208 [] ____fput+0x15/0x20 fs/file_table.c:244 [] task_work_run+0xf8/0x170 kernel/task_work.c:116 [< inline >] exit_task_work include/linux/task_work.h:21 [] do_exit+0x883/0x2ac0 kernel/exit.c:828 [] do_group_exit+0x10e/0x340 kernel/exit.c:931 [] get_signal+0x634/0x15a0 kernel/signal.c:2307 [] do_signal+0x8d/0x1a30 arch/x86/kernel/signal.c:807 [] exit_to_usermode_loop+0xe5/0x130 arch/x86/entry/common.c:156 [< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190 [] syscall_return_slowpath+0x1a8/0x1e0 arch/x86/entry/common.c:259 [] entry_SYSCALL_64_fastpath+0xc0/0xc2 Dumping ftrace buffer: (ftrace buffer empty) Kernel Offset: disabled Fix this the same way we did for TCP in commit 565b7b2d2e63 ("tcp: do not send reset to already closed sockets") Signed-off-by: Eric Dumazet Reported-by: Andrey Konovalov Tested-by: Andrey Konovalov Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman commit 72b03e549b9582ab257219cbe4236a6e0f683ad0 Author: Eric Dumazet Date: Wed Nov 2 17:14:41 2016 -0700 dccp: do not release listeners too soon [ Upstream commit c3f24cfb3e508c70c26ee8569d537c8ca67a36c6 ] Andrey Konovalov reported following error while fuzzing with syzkaller : IPv4: Attempt to release alive inet socket ffff880068e98940 kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Modules linked in: CPU: 1 PID: 3905 Comm: a.out Not tainted 4.9.0-rc3+ #333 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff88006b9e0000 task.stack: ffff880068770000 RIP: 0010:[] [] selinux_socket_sock_rcv_skb+0xff/0x6a0 security/selinux/hooks.c:4639 RSP: 0018:ffff8800687771c8 EFLAGS: 00010202 RAX: ffff88006b9e0000 RBX: 1ffff1000d0eee3f RCX: 1ffff1000d1d312a RDX: 1ffff1000d1d31a6 RSI: dffffc0000000000 RDI: 0000000000000010 RBP: ffff880068777360 R08: 0000000000000000 R09: 0000000000000002 R10: dffffc0000000000 R11: 0000000000000006 R12: ffff880068e98940 R13: 0000000000000002 R14: ffff880068777338 R15: 0000000000000000 FS: 00007f00ff760700(0000) GS:ffff88006cd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020008000 CR3: 000000006a308000 CR4: 00000000000006e0 Stack: ffff8800687771e0 ffffffff812508a5 ffff8800686f3168 0000000000000007 ffff88006ac8cdfc ffff8800665ea500 0000000041b58ab3 ffffffff847b5480 ffffffff819eac60 ffff88006b9e0860 ffff88006b9e0868 ffff88006b9e07f0 Call Trace: [] security_sock_rcv_skb+0x75/0xb0 security/security.c:1317 [] sk_filter_trim_cap+0x67/0x10e0 net/core/filter.c:81 [] __sk_receive_skb+0x30/0xa00 net/core/sock.c:460 [] dccp_v4_rcv+0xdb2/0x1910 net/dccp/ipv4.c:873 [] ip_local_deliver_finish+0x332/0xad0 net/ipv4/ip_input.c:216 [< inline >] NF_HOOK_THRESH ./include/linux/netfilter.h:232 [< inline >] NF_HOOK ./include/linux/netfilter.h:255 [] 
ip_local_deliver+0x1c2/0x4b0 net/ipv4/ip_input.c:257 [< inline >] dst_input ./include/net/dst.h:507 [] ip_rcv_finish+0x750/0x1c40 net/ipv4/ip_input.c:396 [< inline >] NF_HOOK_THRESH ./include/linux/netfilter.h:232 [< inline >] NF_HOOK ./include/linux/netfilter.h:255 [] ip_rcv+0x96f/0x12f0 net/ipv4/ip_input.c:487 [] __netif_receive_skb_core+0x1897/0x2a50 net/core/dev.c:4213 [] __netif_receive_skb+0x2a/0x170 net/core/dev.c:4251 [] netif_receive_skb_internal+0x1b3/0x390 net/core/dev.c:4279 [] netif_receive_skb+0x48/0x250 net/core/dev.c:4303 [] tun_get_user+0xbd5/0x28a0 drivers/net/tun.c:1308 [] tun_chr_write_iter+0xda/0x190 drivers/net/tun.c:1332 [< inline >] new_sync_write fs/read_write.c:499 [] __vfs_write+0x334/0x570 fs/read_write.c:512 [] vfs_write+0x17b/0x500 fs/read_write.c:560 [< inline >] SYSC_write fs/read_write.c:607 [] SyS_write+0xd4/0x1a0 fs/read_write.c:599 [] entry_SYSCALL_64_fastpath+0x1f/0xc2 It turns out DCCP calls __sk_receive_skb(), and this broke when lookups no longer took a reference on listeners. Fix this issue by adding a @refcounted parameter to __sk_receive_skb(), so that sock_put() is used only when needed. Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood") Signed-off-by: Eric Dumazet Reported-by: Andrey Konovalov Tested-by: Andrey Konovalov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit b3523a0773edd6722d35b9cdfd756337e3911df3 Author: Eric Dumazet Date: Wed Nov 2 14:41:50 2016 -0700 tcp: fix return value for partial writes [ Upstream commit 79d8665b9545e128637c51cf7febde9c493b6481 ] After my commit d41a69f1d390f ("tcp: make tcp_sendmsg() aware of socket backlog"), tcp_sendmsg() might restart its loop after processing the socket backlog. If sk_err is set, we blindly return an error, even though we already copied data from user space. We should instead return the number of bytes that could be copied; otherwise user space might resend data and corrupt the stream. This might happen if another thread is using recvmsg(MSG_ERRQUEUE) to process timestamps.
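The return-value rule described above can be sketched in user space as follows: once any bytes have been copied, report that count instead of the pending error. The helper `sendmsg_result()` and its parameters are hypothetical and only model the rule from the commit message; this is not the actual tcp_sendmsg() code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical helper modelling the rule from the commit message:
 * a pending socket error (sk_err) must not hide partial progress.
 *
 * copied - bytes already taken from user space by this call
 * sk_err - pending positive errno value on the socket, or 0
 */
static ssize_t sendmsg_result(size_t copied, int sk_err)
{
	assert(sk_err >= 0);

	if (copied > 0)
		return (ssize_t)copied;   /* partial write: report progress */
	if (sk_err)
		return -sk_err;           /* nothing sent: report the error */
	return 0;
}
```

If the caller got -EPIPE after 4096 bytes had already been queued, it would likely resend those bytes and duplicate them in the stream, which is the corruption the commit describes.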
The issue was diagnosed by Soheil and Willem; big kudos to them! Fixes: d41a69f1d390f ("tcp: make tcp_sendmsg() aware of socket backlog") Signed-off-by: Eric Dumazet Cc: Willem de Bruijn Cc: Soheil Hassas Yeganeh Cc: Yuchung Cheng Cc: Neal Cardwell Tested-by: Soheil Hassas Yeganeh Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 1f49cc6fa91c703b9a2cbb6fe7176cc44e930c8d Author: Lance Richardson Date: Wed Nov 2 16:36:17 2016 -0400 ipv4: allow local fragmentation in ip_finish_output_gso() [ Upstream commit 9ee6c5dc816aa8256257f2cd4008a9291ec7e985 ] Some configurations (e.g. a geneve interface with the default MTU of 1500 over an Ethernet interface with a 1500 MTU) result in the transmission of packets that exceed the configured MTU. While this should be considered a "bad" configuration, it is still allowed and should not result in the sending of packets that exceed the configured MTU. Fix this by dropping the assumption in ip_finish_output_gso() that locally originated GSO packets will never need fragmentation. Basic testing using iperf (observing CPU usage and bandwidth) has shown no measurable performance impact for traffic not requiring fragmentation. Fixes: c7ba65d7b649 ("net: ip: push gso skb forwarding handling down the stack") Reported-by: Jan Tluka Signed-off-by: Lance Richardson Acked-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 842a858fa048294c86837be560c277abf1611246 Author: Eric Dumazet Date: Wed Nov 2 07:53:17 2016 -0700 tcp: fix potential memory corruption [ Upstream commit ac9e70b17ecd7c6e933ff2eaf7ab37429e71bf4d ] Imagine the initial value of max_skb_frags is 17 and the last skb in the write queue has 15 frags. Then max_skb_frags is lowered to 14 or a smaller value. tcp_sendmsg() will then be allowed to add additional page frags and eventually go past MAX_SKB_FRAGS, overflowing struct skb_shared_info.
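A minimal sketch of how lowering the limit can lead past MAX_SKB_FRAGS, assuming the "start a new skb" check compares the frag count to the limit with `==` (the patch itself is not reproduced here, so treat this as an assumption). Starting from 15 frags with a limit of 14, an equality test never fires, while a `>=` test stops immediately. The helper names and the `MAX_FRAGS` value of 17 are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FRAGS 17  /* hypothetical stand-in for MAX_SKB_FRAGS */

/* Equality check: never fires once nr_frags is already above limit. */
static bool must_start_new_skb_eq(int nr_frags, int limit)
{
	return nr_frags == limit;
}

/* Range check: fires once nr_frags has reached or passed the limit. */
static bool must_start_new_skb_ge(int nr_frags, int limit)
{
	return nr_frags >= limit;
}

/*
 * Simulate appending up to max_appends frags to one skb that already
 * holds `start` frags; return the final frag count.
 */
static int frags_after_filling(int start, int limit,
			       bool (*must_stop)(int, int), int max_appends)
{
	assert(start >= 0 && max_appends >= 0);

	int n = start;
	for (int i = 0; i < max_appends && !must_stop(n, limit); i++)
		n++;
	return n;
}
```

With the equality variant, `frags_after_filling(15, 14, must_start_new_skb_eq, 10)` grows the skb to 25 frags, past the 17-slot array, whereas the `>=` variant leaves it at 15.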
Fixes: 5f74f82ea34c ("net:Add sysctl_max_skb_frags") Signed-off-by: Eric Dumazet Cc: Hans Westgaard Ry Cc: Håkon Bugge Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit fc3b825f2c81a627459fd261faca10afa94cf087 Author: Eli Cooper Date: Tue Nov 1 23:45:12 2016 +0800 ip6_tunnel: Clear IP6CB in ip6tunnel_xmit() [ Upstream commit 23f4ffedb7d751c7e298732ba91ca75d224bc1a6 ] skb->cb may contain data from previous layers. In the observed scenario, the garbage data were misinterpreted as IP6CB(skb)->frag_max_size, so that small packets sent through the tunnel were mistakenly fragmented. This patch unconditionally clears the control buffer in ip6tunnel_xmit(), which affects ip6_tunnel, ip6_udp_tunnel and ip6_gre. Currently none of these tunnels sets IP6CB(skb)->flags; otherwise the clearing would need to be done earlier. Cc: stable@vger.kernel.org Signed-off-by: Eli Cooper Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f5f4b71d56324556015f5d6e3db3d5e5fc05dac8 Author: Andy Gospodarek Date: Mon Oct 31 13:32:03 2016 -0400 bgmac: stop clearing DMA receive control register right after it is set [ Upstream commit fcdefccac976ee51dd6071832b842d8fb41c479c ] The current bgmac code initializes some DMA settings in the receive control register for some hardware and then immediately clears those settings. Not clearing those settings results in a ~420Mbps *improvement* in throughput; this system can now receive frames at line rate on Broadcom 5871x hardware, compared to ~520Mbps before this change. I also tested a few other values but found no discernible difference in CPU utilization, even with different burst size and prefetching values.
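The bgmac ordering problem can be illustrated with plain integers standing in for the receive control register. This is a sketch only: `RX_SETTINGS` and `KEEP_MASK` are hypothetical values, not taken from the driver, although the keep-mask follows the commit's remark that the old code cleared all but bits 16-17.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define KEEP_MASK   ((1u << 16) | (1u << 17))  /* bits 16-17 survive */
#define RX_SETTINGS 0x00000f00u                /* made-up DMA tuning bits */

static_assert((KEEP_MASK & RX_SETTINGS) == 0,
	      "settings must not overlap the preserved bits");

/* Old order: program the settings, then clear everything outside
 * KEEP_MASK, which throws the settings away again. */
static uint32_t rx_ctl_set_then_clear(uint32_t hw_default)
{
	uint32_t ctl = hw_default;

	ctl |= RX_SETTINGS;
	ctl &= KEEP_MASK;
	return ctl;
}

/* New order: clear right after reading the hardware default,
 * then program the settings so they survive. */
static uint32_t rx_ctl_clear_then_set(uint32_t hw_default)
{
	uint32_t ctl = hw_default & KEEP_MASK;

	ctl |= RX_SETTINGS;
	return ctl;
}
```

Keeping the clear but moving it before the settings are applied matches the commit's cautious choice not to remove it outright, since other hardware using this IP block may depend on it.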
On the hardware tested there was no need to keep the code that cleared all but bits 16-17, but since there is a wide variety of hardware that uses this driver (I did not look at the docs for all hardware using this IP block), I find it wise to move this call up and clear bits just after reading the default value from the hardware rather than completely removing it. This is a good candidate for -stable >=3.14, since that is when the code that was supposed to improve performance (but did not) was introduced. Signed-off-by: Andy Gospodarek Fixes: 56ceecde1f29 ("bgmac: initialize the DMA controller of core...") Cc: Hauke Mehrtens Acked-by: Hauke Mehrtens Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 0c7f764d2c6affb393e7031519a14935c5fa2586 Author: Eric Dumazet Date: Sat Oct 29 11:02:36 2016 -0700 net: mangle zero checksum in skb_checksum_help() [ Upstream commit 4f2e4ad56a65f3b7d64c258e373cb71e8d2499f4 ] Sending a zero checksum is OK for TCP, but not for UDP. A UDPv6 receiver should by default drop a frame with a 0 checksum, and a UDPv4 receiver would not verify the checksum and might accept a corrupted packet. Simply replace such a checksum with 0xffff (the one's-complement equivalent of 0), regardless of transport. This error was caught on SIT tunnels, but seems generic. Signed-off-by: Eric Dumazet Cc: Maciej Żenczykowski Cc: Willem de Bruijn Acked-by: Maciej Żenczykowski Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit ac22a3ba07964d39e8c2cf2abda7c610d861eefe Author: Eric Dumazet Date: Fri Oct 28 13:40:24 2016 -0700 net: clear sk_err_soft in sk_clone_lock() [ Upstream commit e551c32d57c88923f99f8f010e89ca7ed0735e83 ] At accept() time, it is possible the parent has a non-zero sk_err_soft, left over from a prior error. Make sure we do not leave this value in the child, as it makes future getsockopt(SO_ERROR) calls quite unreliable. Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Signed-off-by: David S.
Miller Signed-off-by: Greg Kroah-Hartman commit 5b078dc6fb6487d05bfca078f5c3806523bc5712 Author: Florian Westphal Date: Fri Oct 28 18:43:11 2016 +0200 dctcp: avoid bogus doubling of cwnd after loss [ Upstream commit ce6dd23329b1ee6a794acf5f7e40f8e89b8317ee ] If a congestion control module doesn't provide an .undo_cwnd function, tcp_undo_cwnd_reduction() will set cwnd to tp->snd_cwnd = max(tp->snd_cwnd, tp->snd_ssthresh << 1); ... which makes sense for reno (it sets ssthresh to half the current cwnd), but it makes no sense for dctcp, which sets ssthresh based on the current congestion estimate. This can cause severe growth of cwnd (eventually overflowing u32). Fix this by saving the last cwnd on loss and restoring cwnd based on that, similar to cubic and other algorithms. Fixes: e3118e8359bb7c ("net: tcp: add DCTCP congestion control algorithm") Cc: Lawrence Brakmo Cc: Andrew Shewmaker Cc: Glenn Judd Acked-by: Daniel Borkmann Signed-off-by: Florian Westphal Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman
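A user-space model of the two dctcp undo policies. `reno_style_undo()` mirrors the max(tp->snd_cwnd, tp->snd_ssthresh << 1) fallback quoted above, and `loss_cwnd_undo()` follows the commit's description of saving cwnd on loss and restoring from it. The struct, its field names and the DCTCP-style reduction (alpha scaled so that 1024 means 1.0, ssthresh = cwnd * (1 - alpha/2)) are simplifying assumptions, not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified connection state (not struct tcp_sock). */
struct cc_state {
	uint32_t snd_cwnd;
	uint32_t snd_ssthresh;
	uint32_t loss_cwnd;   /* cwnd saved when the loss was detected */
};

static uint32_t max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/* DCTCP-style reduction on loss: ssthresh = cwnd - cwnd*alpha/2048,
 * floored at 2; the pre-loss cwnd is remembered in loss_cwnd. */
static void on_loss(struct cc_state *s, uint32_t alpha)
{
	assert(alpha <= 1024);
	uint32_t cut = (uint32_t)(((uint64_t)s->snd_cwnd * alpha) >> 11);

	s->loss_cwnd = s->snd_cwnd;
	s->snd_ssthresh = max_u32(s->snd_cwnd - cut, 2);
	s->snd_cwnd = s->snd_ssthresh;
}

/* Generic fallback, which assumes ssthresh is half the old cwnd. */
static uint32_t reno_style_undo(const struct cc_state *s)
{
	return max_u32(s->snd_cwnd, s->snd_ssthresh << 1);
}

/* Undo based on the cwnd saved at loss time. */
static uint32_t loss_cwnd_undo(const struct cc_state *s)
{
	return max_u32(s->snd_cwnd, s->loss_cwnd);
}
```

With little ECN marking (alpha near 0), DCTCP barely lowers ssthresh, so for a cwnd of 100 the reno-style undo returns 200 while the saved-cwnd undo returns 100; repeating such loss/undo cycles would compound into the runaway growth the commit describes.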