commit 9a088971000c4e7a4abddf9751649ead4d8a0fe0 Author: Greg Kroah-Hartman Date: Wed Dec 18 16:09:17 2019 +0100 Linux 5.4.5 commit 68159412b26e141ada8c41d30cb08871690e3126 Author: Heiner Kallweit Date: Fri Dec 6 23:27:15 2019 +0100 r8169: add missing RX enabling for WoL on RTL8125 [ Upstream commit 00222d1394104f0fd6c01ca9f578afec9e0f148b ] RTL8125 also requires to enable RX for WoL. v2: add missing Fixes tag Fixes: f1bce4ad2f1c ("r8169: add support for RTL8125") Signed-off-by: Heiner Kallweit Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 157560f95d4cb5e3d15a91e489d0acbf399fabf1 Author: Vladimir Oltean Date: Tue Dec 3 17:45:35 2019 +0200 net: mscc: ocelot: unregister the PTP clock on deinit [ Upstream commit 9385973fe8db9743fa93bf17245635be4eb8c4a6 ] Currently a switch driver deinit frees the regmaps, but the PTP clock is still out there, available to user space via /dev/ptpN. Any PTP operation is a ticking time bomb, since it will attempt to use the freed regmaps and thus trigger kernel panics: [ 4.291746] fsl_enetc 0000:00:00.2 eth1: error -22 setting up slave phy [ 4.291871] mscc_felix 0000:00:00.5: Failed to register DSA switch: -22 [ 4.308666] mscc_felix: probe of 0000:00:00.5 failed with error -22 [ 6.358270] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000088 [ 6.367090] Mem abort info: [ 6.369888] ESR = 0x96000046 [ 6.369891] EC = 0x25: DABT (current EL), IL = 32 bits [ 6.369892] SET = 0, FnV = 0 [ 6.369894] EA = 0, S1PTW = 0 [ 6.369895] Data abort info: [ 6.369897] ISV = 0, ISS = 0x00000046 [ 6.369899] CM = 0, WnR = 1 [ 6.369902] user pgtable: 4k pages, 48-bit VAs, pgdp=00000020d58c7000 [ 6.369904] [0000000000000088] pgd=00000020d5912003, pud=00000020d5915003, pmd=0000000000000000 [ 6.369914] Internal error: Oops: 96000046 [#1] PREEMPT SMP [ 6.420443] Modules linked in: [ 6.423506] CPU: 1 PID: 262 Comm: phc_ctl Not tainted 5.4.0-03625-gb7b2a5dadd7f #204 [ 6.431273] Hardware name: LS1028A RDB Board (DT) [ 6.435989] pstate: 40000085 (nZcv daIf -PAN -UAO) [ 6.440802] pc : css_release+0x24/0x58 [ 6.444561] lr : regmap_read+0x40/0x78 [ 6.448316] sp : ffff800010513cc0 [ 6.451636] x29: ffff800010513cc0 x28: ffff002055873040 [ 6.456963] x27: 0000000000000000 x26: 0000000000000000 [ 6.462289] x25: 0000000000000000 x24: 0000000000000000 [ 6.467617] x23: 0000000000000000 x22: 0000000000000080 [ 6.472944] x21: ffff800010513d44 x20: 0000000000000080 [ 6.478270] x19: 0000000000000000 x18: 0000000000000000 [ 6.483596] x17: 0000000000000000 x16: 0000000000000000 [ 6.488921] x15: 0000000000000000 x14: 0000000000000000 [ 6.494247] x13: 0000000000000000 x12: 0000000000000000 [ 6.499573] x11: 0000000000000000 x10: 0000000000000000 [ 6.504899] x9 : 0000000000000000 x8 : 0000000000000000 [ 6.510225] x7 : 0000000000000000 x6 : ffff800010513cf0 [ 6.515550] x5 : 0000000000000000 x4 : 0000000fffffffe0 [ 6.520876] x3 : 0000000000000088 x2 : ffff800010513d44 [ 6.526202] x1 : ffffcada668ea000 x0 : ffffcada64d8b0c0 [ 6.531528] Call trace: [ 6.533977] css_release+0x24/0x58 [ 6.537385] regmap_read+0x40/0x78 [ 6.540795] __ocelot_read_ix+0x6c/0xa0 [ 6.544641] ocelot_ptp_gettime64+0x4c/0x110 [ 6.548921] ptp_clock_gettime+0x4c/0x58 [ 6.552853] pc_clock_gettime+0x5c/0xa8 [ 6.556699] __arm64_sys_clock_gettime+0x68/0xc8 [ 6.561331] el0_svc_common.constprop.2+0x7c/0x178 [ 6.566133] el0_svc_handler+0x34/0xa0 [ 6.569891] el0_sync_handler+0x114/0x1d0 [ 6.573908] el0_sync+0x140/0x180 [ 6.577232] Code: d503201f b00119a1 91022263 b27b7be4 (f9004663) [ 6.583349] ---[ end trace d196b9b14cdae2da ]--- [ 6.587977] Kernel panic - not syncing: Fatal exception [ 6.593216] SMP: stopping secondary CPUs [ 6.597151] Kernel Offset: 0x4ada54400000 from 0xffff800010000000 [ 6.603261] PHYS_OFFSET: 0xffffd0a7c0000000 [ 6.607454] CPU features: 0x10002,21806008 [ 6.611558] Memory Limit: none And now that ocelot->ptp_clock is checked at exit, prevent a potential error where ptp_clock_register returned a pointer-encoded error, which we are keeping in the ocelot private data structure. So now, ocelot->ptp_clock is now either NULL or a valid pointer. Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support") Cc: Antoine Tenart Reviewed-by: Florian Fainelli Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit dd561233e068044b7b2203fd55cb1337459ff1f8 Author: Shannon Nelson Date: Tue Dec 3 14:17:34 2019 -0800 ionic: keep users rss hash across lif reset [ Upstream commit ffac2027e18f006f42630f2e01a8a9bd8dc664b5 ] If the user has specified their own RSS hash key, don't lose it across queue resets such as DOWN/UP, MTU change, and number of channels change. This is fixed by moving the key initialization to a little earlier in the lif creation. Also, let's clean up the RSS config a little better on the way down by setting it all to 0. Fixes: aa3198819bea ("ionic: Add RSS support") Signed-off-by: Shannon Nelson Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 9bd01a33c780a4f10a2e36ecf42dcab9f6b01aa5 Author: Jonathan Lemon Date: Tue Dec 3 14:01:14 2019 -0800 xdp: obtain the mem_id mutex before trying to remove an entry. [ Upstream commit 86c76c09898332143be365c702cf8d586ed4ed21 ] A lockdep splat was observed when trying to remove an xdp memory model from the table since the mutex was obtained when trying to remove the entry, but not before the table walk started: Fix the splat by obtaining the lock before starting the table walk. Fixes: c3f812cea0d7 ("page_pool: do not release pool until inflight == 0.") Reported-by: Grygorii Strashko Signed-off-by: Jonathan Lemon Tested-by: Grygorii Strashko Acked-by: Jesper Dangaard Brouer Acked-by: Ilias Apalodimas Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 05f646cb2174d1a4e032b60b99097f5c4b522616 Author: Jonathan Lemon Date: Thu Nov 14 14:13:00 2019 -0800 page_pool: do not release pool until inflight == 0. [ Upstream commit c3f812cea0d7006469d1cf33a4a9f0a12bb4b3a3 ] The page pool keeps track of the number of pages in flight, and it isn't safe to remove the pool until all pages are returned. Disallow removing the pool until all pages are back, so the pool is always available for page producers. Make the page pool responsible for its own delayed destruction instead of relying on XDP, so the page pool can be used without the xdp memory model. When all pages are returned, free the pool and notify xdp if the pool is registered with the xdp memory system. Have the callback perform a table walk since some drivers (cpsw) may share the pool among multiple xdp_rxq_info. Note that the increment of pages_state_release_cnt may result in inflight == 0, resulting in the pool being released. Fixes: d956a048cd3f ("xdp: force mem allocator removal and periodic warning") Signed-off-by: Jonathan Lemon Acked-by: Jesper Dangaard Brouer Acked-by: Ilias Apalodimas Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 6b2377de13af6821d2a196a1e4326369e1a4fabd Author: Aya Levin Date: Sun Dec 1 16:33:55 2019 +0200 net/mlx5e: ethtool, Fix analysis of speed setting [ Upstream commit 3d7cadae51f1b7f28358e36d0a1ce3f0ae2eee60 ] When setting speed to 100G via ethtool (AN is set to off), only 25G*4 is configured while the user, who has an advanced HW which supports extended PTYS, expects also 50G*2 to be configured. With this patch, when extended PTYS mode is available, configure PTYS via extended fields. Fixes: 4b95840a6ced ("net/mlx5e: Fix matching of speed to PRM link modes") Signed-off-by: Aya Levin Reviewed-by: Eran Ben Elisha Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit dd54484500ec8018bd7c27caf7b8458d6328c255 Author: Aya Levin Date: Sun Dec 1 14:45:25 2019 +0200 net/mlx5e: Fix translation of link mode into speed [ Upstream commit 6d485e5e555436d2c13accdb10807328c4158a17 ] Add a missing value in translation of PTYS ext_eth_proto_oper to its corresponding speed. When ext_eth_proto_oper bit 10 is set, ethtool shows unknown speed. With this fix, ethtool shows speed is 100G as expected. Fixes: a08b4ed1373d ("net/mlx5: Add support to ext_* fields introduced in Port Type and Speed register") Signed-off-by: Aya Levin Reviewed-by: Eran Ben Elisha Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit 65523f0fe7b885da9454e97ca97996d2b3392be5 Author: Roi Dayan Date: Wed Dec 4 11:25:43 2019 +0200 net/mlx5e: Fix freeing flow with kfree() and not kvfree() [ Upstream commit a23dae79fb6555c808528707c6389345d0b0c189 ] Flows are allocated with kzalloc() so free with kfree(). Fixes: 04de7dda7394 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows") Signed-off-by: Roi Dayan Reviewed-by: Eli Britstein Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit 2e4e7670cba5bf6c900e832c13f597db262f5ed5 Author: Eran Ben Elisha Date: Thu Dec 5 10:30:22 2019 +0200 net/mlx5e: Fix SFF 8472 eeprom length [ Upstream commit c431f8597863a91eea6024926e0c1b179cfa4852 ] SFF 8472 eeprom length is 512 bytes. Fix module info return value to support 512 bytes read. Fixes: ace329f4ab3b ("net/mlx5e: ethtool, Remove unsupported SFP EEPROM high pages query") Signed-off-by: Eran Ben Elisha Reviewed-by: Aya Levin Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit 4e57c233915e898678e5654d9b91ab24600fe7e5 Author: Aaron Conole Date: Tue Dec 3 16:34:14 2019 -0500 act_ct: support asymmetric conntrack [ Upstream commit 95219afbb980f10934de9f23a3e199be69c5ed09 ] The act_ct TC module shares a common conntrack and NAT infrastructure exposed via netfilter. It's possible that a packet needs both SNAT and DNAT manipulation, due to e.g. tuple collision. Netfilter can support this because it runs through the NAT table twice - once on ingress and again after egress. The act_ct action doesn't have such capability. Like netfilter hook infrastructure, we should run through NAT twice to keep the symmetry. Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct") Signed-off-by: Aaron Conole Acked-by: Marcelo Ricardo Leitner Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 411fdb975269ac2d1d746535eadb25dea3813189 Author: Eran Ben Elisha Date: Mon Nov 25 12:11:49 2019 +0200 net/mlx5e: Fix TXQ indices to be sequential [ Upstream commit c55d8b108caa2ec1ae8dddd02cb9d3a740f7c838 ] Cited patch changed (channel index, tc) => (TXQ index) mapping to be a static one, in order to keep indices consistent when changing number of channels or TCs. For 32 channels (OOB) and 8 TCs, real num of TXQs is 256. When reducing the amount of channels to 8, the real num of TXQs will be changed to 64. This indices method is buggy: - Channel #0, TC 3, the TXQ index is 96. - Index 8 is not valid, as there is no such TXQ from driver perspective (As it represents channel #8, TC 0, which is not valid with the above configuration). As part of driver's select queue, it calls netdev_pick_tx which returns an index in the range of real number of TXQs. Depends on the return value, with the examples above, driver could have returned index larger than the real number of tx queues, or crash the kernel as it tries to read invalid address of SQ which was not allocated. Fix that by allocating sequential TXQ indices, and hold a new mapping between (channel index, tc) => (real TXQ index). This mapping will be updated as part of priv channels activation, and is used in mlx5e_select_queue to find the selected queue index. The existing indices mapping (channel_tc2txq) is no longer needed, as it is used only for statistics structures and can be calculated on run time. Delete its definintion and updates. Fixes: 8bfaf07f7806 ("net/mlx5e: Present SW stats when state is not opened") Signed-off-by: Eran Ben Elisha Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit cd477d06d22d8b6d058962043060785c76819446 Author: Martin Varghese Date: Thu Dec 5 05:57:22 2019 +0530 net: Fixed updating of ethertype in skb_mpls_push() [ Upstream commit d04ac224b1688f005a84f764cfe29844f8e9da08 ] The skb_mpls_push was not updating ethertype of an ethernet packet if the packet was originally received from a non ARPHRD_ETHER device. In the below OVS data path flow, since the device corresponding to port 7 is an l3 device (ARPHRD_NONE) the skb_mpls_push function does not update the ethertype of the packet even though the previous push_eth action had added an ethernet header to the packet. recirc_id(0),in_port(7),eth_type(0x0800),ipv4(tos=0/0xfc,ttl=64,frag=no), actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00), push_mpls(label=13,tc=0,ttl=64,bos=1,eth_type=0x8847),4 Fixes: 8822e270d697 ("net: core: move push MPLS functionality from OvS to core helper") Signed-off-by: Martin Varghese Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 10fec3e5660b40839c0109e45569909ec5e33916 Author: Taehee Yoo Date: Thu Dec 5 07:23:39 2019 +0000 hsr: fix a NULL pointer dereference in hsr_dev_xmit() [ Upstream commit df95467b6d2bfce49667ee4b71c67249b01957f7 ] hsr_dev_xmit() calls hsr_port_get_hsr() to find master node and that would return NULL if master node is not existing in the list. But hsr_dev_xmit() doesn't check return pointer so a NULL dereference could occur. Test commands: ip netns add nst ip link add veth0 type veth peer name veth1 ip link add veth2 type veth peer name veth3 ip link set veth1 netns nst ip link set veth3 netns nst ip link set veth0 up ip link set veth2 up ip link add hsr0 type hsr slave1 veth0 slave2 veth2 ip a a 192.168.100.1/24 dev hsr0 ip link set hsr0 up ip netns exec nst ip link set veth1 up ip netns exec nst ip link set veth3 up ip netns exec nst ip link add hsr1 type hsr slave1 veth1 slave2 veth3 ip netns exec nst ip a a 192.168.100.2/24 dev hsr1 ip netns exec nst ip link set hsr1 up hping3 192.168.100.2 -2 --flood & modprobe -rv hsr Splat looks like: [ 217.351122][ T1635] kasan: CONFIG_KASAN_INLINE enabled [ 217.352969][ T1635] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 217.354297][ T1635] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 217.355507][ T1635] CPU: 1 PID: 1635 Comm: hping3 Not tainted 5.4.0+ #192 [ 217.356472][ T1635] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 217.357804][ T1635] RIP: 0010:hsr_dev_xmit+0x34/0x90 [hsr] [ 217.373010][ T1635] Code: 48 8d be 00 0c 00 00 be 04 00 00 00 48 83 ec 08 e8 21 be ff ff 48 8d 78 10 48 ba 00 b [ 217.376919][ T1635] RSP: 0018:ffff8880cd8af058 EFLAGS: 00010202 [ 217.377571][ T1635] RAX: 0000000000000000 RBX: ffff8880acde6840 RCX: 0000000000000002 [ 217.379465][ T1635] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: 0000000000000010 [ 217.380274][ T1635] RBP: ffff8880acde6840 R08: ffffed101b440d5d R09: 0000000000000001 [ 217.381078][ T1635] R10: 0000000000000001 R11: ffffed101b440d5c R12: ffff8880bffcc000 [ 217.382023][ T1635] R13: ffff8880bffcc088 R14: 0000000000000000 R15: ffff8880ca675c00 [ 217.383094][ T1635] FS: 00007f060d9d1740(0000) GS:ffff8880da000000(0000) knlGS:0000000000000000 [ 217.384289][ T1635] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 217.385009][ T1635] CR2: 00007faf15381dd0 CR3: 00000000d523c001 CR4: 00000000000606e0 [ 217.385940][ T1635] Call Trace: [ 217.386544][ T1635] dev_hard_start_xmit+0x160/0x740 [ 217.387114][ T1635] __dev_queue_xmit+0x1961/0x2e10 [ 217.388118][ T1635] ? check_object+0xaf/0x260 [ 217.391466][ T1635] ? __alloc_skb+0xb9/0x500 [ 217.392017][ T1635] ? init_object+0x6b/0x80 [ 217.392629][ T1635] ? netdev_core_pick_tx+0x2e0/0x2e0 [ 217.393175][ T1635] ? __alloc_skb+0xb9/0x500 [ 217.393727][ T1635] ? rcu_read_lock_sched_held+0x90/0xc0 [ 217.394331][ T1635] ? rcu_read_lock_bh_held+0xa0/0xa0 [ 217.395013][ T1635] ? kasan_unpoison_shadow+0x30/0x40 [ 217.395668][ T1635] ? __kasan_kmalloc.constprop.4+0xa0/0xd0 [ 217.396280][ T1635] ? __kmalloc_node_track_caller+0x3a8/0x3f0 [ 217.399007][ T1635] ? __kasan_kmalloc.constprop.4+0xa0/0xd0 [ 217.400093][ T1635] ? __kmalloc_reserve.isra.46+0x2e/0xb0 [ 217.401118][ T1635] ? memset+0x1f/0x40 [ 217.402529][ T1635] ? __alloc_skb+0x317/0x500 [ 217.404915][ T1635] ? arp_xmit+0xca/0x2c0 [ ... ] Fixes: 311633b60406 ("hsr: switch ->dellink() to ->ndo_uninit()") Acked-by: Cong Wang Signed-off-by: Taehee Yoo Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 2cbaf5fb573a5f150109c11d9e4a99006b885702 Author: Martin Varghese Date: Mon Dec 2 10:49:51 2019 +0530 Fixed updating of ethertype in function skb_mpls_pop [ Upstream commit 040b5cfbcefa263ccf2c118c4938308606bb7ed8 ] The skb_mpls_pop was not updating ethertype of an ethernet packet if the packet was originally received from a non ARPHRD_ETHER device. In the below OVS data path flow, since the device corresponding to port 7 is an l3 device (ARPHRD_NONE) the skb_mpls_pop function does not update the ethertype of the packet even though the previous push_eth action had added an ethernet header to the packet. recirc_id(0),in_port(7),eth_type(0x8847), mpls(label=12/0xfffff,tc=0/0,ttl=0/0x0,bos=1/1), actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00), pop_mpls(eth_type=0x800),4 Fixes: ed246cee09b9 ("net: core: move pop MPLS functionality from OvS to core helper") Signed-off-by: Martin Varghese Acked-by: Pravin B Shelar Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 23fbdd5d1e826454a1ce199e716e2015033212c4 Author: Cong Wang Date: Thu Dec 5 19:39:02 2019 -0800 gre: refetch erspan header from skb->data after pskb_may_pull() [ Upstream commit 0e4940928c26527ce8f97237fef4c8a91cd34207 ] After pskb_may_pull() we should always refetch the header pointers from the skb->data in case it got reallocated. In gre_parse_header(), the erspan header is still fetched from the 'options' pointer which is fetched before pskb_may_pull(). Found this during code review of a KMSAN bug report. Fixes: cb73ee40b1b3 ("net: ip_gre: use erspan key field for tunnel lookup") Cc: Lorenzo Bianconi Signed-off-by: Cong Wang Acked-by: Lorenzo Bianconi Acked-by: William Tu Reviewed-by: Simon Horman Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 71bc12b1fb4afedf52d558a2cfb351f68831caeb Author: Yoshiki Komachi Date: Tue Dec 3 19:40:12 2019 +0900 cls_flower: Fix the behavior using port ranges with hw-offload [ Upstream commit 8ffb055beae58574d3e77b4bf9d4d15eace1ca27 ] The recent commit 5c72299fba9d ("net: sched: cls_flower: Classify packets using port ranges") had added filtering based on port ranges to tc flower. However the commit missed necessary changes in hw-offload code, so the feature gave rise to generating incorrect offloaded flow keys in NIC. One more detailed example is below: $ tc qdisc add dev eth0 ingress $ tc filter add dev eth0 ingress protocol ip flower ip_proto tcp \ dst_port 100-200 action drop With the setup above, an exact match filter with dst_port == 0 will be installed in NIC by hw-offload. IOW, the NIC will have a rule which is equivalent to the following one. $ tc qdisc add dev eth0 ingress $ tc filter add dev eth0 ingress protocol ip flower ip_proto tcp \ dst_port 0 action drop The behavior was caused by the flow dissector which extracts packet data into the flow key in the tc flower. More specifically, regardless of exact match or specified port ranges, fl_init_dissector() set the FLOW_DISSECTOR_KEY_PORTS flag in struct flow_dissector to extract port numbers from skb in skb_flow_dissect() called by fl_classify(). Note that device drivers received the same struct flow_dissector object as used in skb_flow_dissect(). Thus, offloaded drivers could not identify which of these is used because the FLOW_DISSECTOR_KEY_PORTS flag was set to struct flow_dissector in either case. This patch adds the new FLOW_DISSECTOR_KEY_PORTS_RANGE flag and the new tp_range field in struct fl_flow_key to recognize which filters are applied to offloaded drivers. At this point, when filters based on port ranges passed to drivers, drivers return the EOPNOTSUPP error because they do not support the feature (the newly created FLOW_DISSECTOR_KEY_PORTS_RANGE flag). Fixes: 5c72299fba9d ("net: sched: cls_flower: Classify packets using port ranges") Signed-off-by: Yoshiki Komachi Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 554d2e14c5e1dac1b15ebd0c461084f7b733cb03 Author: John Hurley Date: Thu Dec 5 17:03:35 2019 +0000 net: sched: allow indirect blocks to bind to clsact in TC [ Upstream commit 25a443f74bcff2c4d506a39eae62fc15ad7c618a ] When a device is bound to a clsact qdisc, bind events are triggered to registered drivers for both ingress and egress. However, if a driver registers to such a device using the indirect block routines then it is assumed that it is only interested in ingress offload and so only replays ingress bind/unbind messages. The NFP driver supports the offload of some egress filters when registering to a block with qdisc of type clsact. However, on unregister, if the block is still active, it will not receive an unbind egress notification which can prevent proper cleanup of other registered callbacks. Modify the indirect block callback command in TC to send messages of ingress and/or egress bind depending on the qdisc in use. NFP currently supports egress offload for TC flower offload so the changes are only added to TC. Fixes: 4d12ba42787b ("nfp: flower: allow offloading of matches on 'internal' ports") Signed-off-by: John Hurley Acked-by: Jakub Kicinski Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 1b511a9d2c09bc7a0b0ea3d2b4538547b7615284 Author: John Hurley Date: Thu Dec 5 17:03:34 2019 +0000 net: core: rename indirect block ingress cb function [ Upstream commit dbad3408896c3c5722ec9cda065468b3df16c5bf ] With indirect blocks, a driver can register for callbacks from a device that is does not 'own', for example, a tunnel device. When registering to or unregistering from a new device, a callback is triggered to generate a bind/unbind event. This, in turn, allows the driver to receive any existing rules or to properly clean up installed rules. When first added, it was assumed that all indirect block registrations would be for ingress offloads. However, the NFP driver can, in some instances, support clsact qdisc binds for egress offload. Change the name of the indirect block callback command in flow_offload to remove the 'ingress' identifier from it. While this does not change functionality, a follow up patch will implement a more more generic callback than just those currently just supporting ingress offload. Fixes: 4d12ba42787b ("nfp: flower: allow offloading of matches on 'internal' ports") Signed-off-by: John Hurley Acked-by: Jakub Kicinski Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit ee0dc0c3f371197ff8dbaf4ce874bef2e33674ea Author: Guillaume Nault Date: Fri Dec 6 12:38:49 2019 +0100 tcp: Protect accesses to .ts_recent_stamp with {READ,WRITE}_ONCE() [ Upstream commit 721c8dafad26ccfa90ff659ee19755e3377b829d ] Syncookies borrow the ->rx_opt.ts_recent_stamp field to store the timestamp of the last synflood. Protect them with READ_ONCE() and WRITE_ONCE() since reads and writes aren't serialised. Use of .rx_opt.ts_recent_stamp for storing the synflood timestamp was introduced by a0f82f64e269 ("syncookies: remove last_synq_overflow from struct tcp_sock"). But unprotected accesses were already there when timestamp was stored in .last_synq_overflow. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Guillaume Nault Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit e70ee16481f9030030b51349f2131116ac916859 Author: Guillaume Nault Date: Fri Dec 6 12:38:43 2019 +0100 tcp: tighten acceptance of ACKs not matching a child socket [ Upstream commit cb44a08f8647fd2e8db5cc9ac27cd8355fa392d8 ] When no synflood occurs, the synflood timestamp isn't updated. Therefore it can be so old that time_after32() can consider it to be in the future. That's a problem for tcp_synq_no_recent_overflow() as it may report that a recent overflow occurred while, in fact, it's just that jiffies has grown past 'last_overflow' + TCP_SYNCOOKIE_VALID + 2^31. Spurious detection of recent overflows lead to extra syncookie verification in cookie_v[46]_check(). At that point, the verification should fail and the packet dropped. But we should have dropped the packet earlier as we didn't even send a syncookie. Let's refine tcp_synq_no_recent_overflow() to report a recent overflow only if jiffies is within the [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval. This way, no spurious recent overflow is reported when jiffies wraps and 'last_overflow' becomes in the future from the point of view of time_after32(). However, if jiffies wraps and enters the [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval (with 'last_overflow' being a stale synflood timestamp), then tcp_synq_no_recent_overflow() still erroneously reports an overflow. In such cases, we have to rely on syncookie verification to drop the packet. We unfortunately have no way to differentiate between a fresh and a stale syncookie timestamp. In practice, using last_overflow as lower bound is problematic. If the synflood timestamp is concurrently updated between the time we read jiffies and the moment we store the timestamp in 'last_overflow', then 'now' becomes smaller than 'last_overflow' and tcp_synq_no_recent_overflow() returns true, potentially dropping a valid syncookie. Reading jiffies after loading the timestamp could fix the problem, but that'd require a memory barrier. Let's just accommodate for potential timestamp growth instead and extend the interval using 'last_overflow - HZ' as lower bound. Signed-off-by: Guillaume Nault Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 9afe690185bcdeae3989410a3684f02e0a1fc9e9 Author: Guillaume Nault Date: Fri Dec 6 12:38:36 2019 +0100 tcp: fix rejected syncookies due to stale timestamps [ Upstream commit 04d26e7b159a396372646a480f4caa166d1b6720 ] If no synflood happens for a long enough period of time, then the synflood timestamp isn't refreshed and jiffies can advance so much that time_after32() can't accurately compare them any more. Therefore, we can end up in a situation where time_after32(now, last_overflow + HZ) returns false, just because these two values are too far apart. In that case, the synflood timestamp isn't updated as it should be, which can trick tcp_synq_no_recent_overflow() into rejecting valid syncookies. For example, let's consider the following scenario on a system with HZ=1000: * The synflood timestamp is 0, either because that's the timestamp of the last synflood or, more commonly, because we're working with a freshly created socket. * We receive a new SYN, which triggers synflood protection. Let's say that this happens when jiffies == 2147484649 (that is, 'synflood timestamp' + HZ + 2^31 + 1). * Then tcp_synq_overflow() doesn't update the synflood timestamp, because time_after32(2147484649, 1000) returns false. With: - 2147484649: the value of jiffies, aka. 'now'. - 1000: the value of 'last_overflow' + HZ. * A bit later, we receive the ACK completing the 3WHS. But cookie_v[46]_check() rejects it because tcp_synq_no_recent_overflow() says that we're not under synflood. That's because time_after32(2147484649, 120000) returns false. With: - 2147484649: the value of jiffies, aka. 'now'. - 120000: the value of 'last_overflow' + TCP_SYNCOOKIE_VALID. Of course, in reality jiffies would have increased a bit, but this condition will last for the next 119 seconds, which is far enough to accommodate for jiffie's growth. Fix this by updating the overflow timestamp whenever jiffies isn't within the [last_overflow, last_overflow + HZ] range. That shouldn't have any performance impact since the update still happens at most once per second. Now we're guaranteed to have fresh timestamps while under synflood, so tcp_synq_no_recent_overflow() can safely use it with time_after32() in such situations. Stale timestamps can still make tcp_synq_no_recent_overflow() return the wrong verdict when not under synflood. This will be handled in the next patch. For 64 bits architectures, the problem was introduced with the conversion of ->tw_ts_recent_stamp to 32 bits integer by commit cca9bab1b72c ("tcp: use monotonic timestamps for PAWS"). The problem has always been there on 32 bits architectures. Fixes: cca9bab1b72c ("tcp: use monotonic timestamps for PAWS") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Guillaume Nault Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 48d58ae9e87aaa11814364ddb52b3461f9abac57 Author: Sabrina Dubroca Date: Wed Dec 4 15:35:53 2019 +0100 net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup [ Upstream commit 6c8991f41546c3c472503dff1ea9daaddf9331c2 ] ipv6_stub uses the ip6_dst_lookup function to allow other modules to perform IPv6 lookups. However, this function skips the XFRM layer entirely. All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the ip_route_output_key and ip_route_output helpers) for their IPv4 lookups, which calls xfrm_lookup_route(). This patch fixes this inconsistent behavior by switching the stub to ip6_dst_lookup_flow, which also calls xfrm_lookup_route(). This requires some changes in all the callers, as these two functions take different arguments and have different return types. Fixes: 5f81bd2e5d80 ("ipv6: export a stub for IPv6 symbols used by vxlan") Reported-by: Xiumei Mu Signed-off-by: Sabrina Dubroca Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 8cadbd146a8712cffef5921559d24b00911ac4b7 Author: Sabrina Dubroca Date: Wed Dec 4 15:35:52 2019 +0100 net: ipv6: add net argument to ip6_dst_lookup_flow [ Upstream commit c4e85f73afb6384123e5ef1bba3315b2e3ad031e ] This will be used in the conversion of ipv6_stub to ip6_dst_lookup_flow, as some modules currently pass a net argument without a socket to ip6_dst_lookup. This is equivalent to commit 343d60aada5a ("ipv6: change ipv6_stub_impl.ipv6_dst_lookup to take net argument"). Signed-off-by: Sabrina Dubroca Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 9617d69d663de358957df86862984414e0bbc1cf Author: Huy Nguyen Date: Fri Sep 6 09:28:46 2019 -0500 net/mlx5e: Query global pause state before setting prio2buffer [ Upstream commit 73e6551699a32fac703ceea09214d6580edcf2d5 ] When the user changes prio2buffer mapping while global pause is enabled, mlx5 driver incorrectly sets all active buffers (buffer that has at least one priority mapped) to lossy. Solution: If global pause is enabled, set all the active buffers to lossless in prio2buffer command. Also, add error message when buffer size is not enough to meet xoff threshold. Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration") Signed-off-by: Huy Nguyen Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit 0703996ff4a1344e7bab4f35d933c1ee75d78a79 Author: Taehee Yoo Date: Fri Dec 6 05:25:48 2019 +0000 tipc: fix ordering of tipc module init and exit routine [ Upstream commit 9cf1cd8ee3ee09ef2859017df2058e2f53c5347f ] In order to set/get/dump, the tipc uses the generic netlink infrastructure. So, when tipc module is inserted, init function calls genl_register_family(). After genl_register_family(), set/get/dump commands are immediately allowed and these callbacks internally use the net_generic. net_generic is allocated by register_pernet_device() but this is called after genl_register_family() in the __init function. So, these callbacks would use un-initialized net_generic. Test commands: #SHELL1 while : do modprobe tipc modprobe -rv tipc done #SHELL2 while : do tipc link list done Splat looks like: [ 59.616322][ T2788] kasan: CONFIG_KASAN_INLINE enabled [ 59.617234][ T2788] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 59.618398][ T2788] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 59.619389][ T2788] CPU: 3 PID: 2788 Comm: tipc Not tainted 5.4.0+ #194 [ 59.620231][ T2788] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 59.621428][ T2788] RIP: 0010:tipc_bcast_get_broadcast_mode+0x131/0x310 [tipc] [ 59.622379][ T2788] Code: c7 c6 ef 8b 38 c0 65 ff 0d 84 83 c9 3f e8 d7 a5 f2 e3 48 8d bb 38 11 00 00 48 b8 00 00 00 00 [ 59.622550][ T2780] NET: Registered protocol family 30 [ 59.624627][ T2788] RSP: 0018:ffff88804b09f578 EFLAGS: 00010202 [ 59.624630][ T2788] RAX: dffffc0000000000 RBX: 0000000000000011 RCX: 000000008bc66907 [ 59.624631][ T2788] RDX: 0000000000000229 RSI: 000000004b3cf4cc RDI: 0000000000001149 [ 59.624633][ T2788] RBP: ffff88804b09f588 R08: 0000000000000003 R09: fffffbfff4fb3df1 [ 59.624635][ T2788] R10: fffffbfff50318f8 R11: ffff888066cadc18 R12: ffffffffa6cc2f40 [ 59.624637][ T2788] R13: 1ffff11009613eba R14: ffff8880662e9328 R15: ffff8880662e9328 [ 59.624639][ T2788] FS: 00007f57d8f7b740(0000) GS:ffff88806cc00000(0000) knlGS:0000000000000000 [ 59.624645][ T2788] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 59.625875][ T2780] tipc: Started in single node mode [ 59.626128][ T2788] CR2: 00007f57d887a8c0 CR3: 000000004b140002 CR4: 00000000000606e0 [ 59.633991][ T2788] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 59.635195][ T2788] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 59.636478][ T2788] Call Trace: [ 59.637025][ T2788] tipc_nl_add_bc_link+0x179/0x1470 [tipc] [ 59.638219][ T2788] ? lock_downgrade+0x6e0/0x6e0 [ 59.638923][ T2788] ? __tipc_nl_add_link+0xf90/0xf90 [tipc] [ 59.639533][ T2788] ? tipc_nl_node_dump_link+0x318/0xa50 [tipc] [ 59.640160][ T2788] ? mutex_lock_io_nested+0x1380/0x1380 [ 59.640746][ T2788] tipc_nl_node_dump_link+0x4fd/0xa50 [tipc] [ 59.641356][ T2788] ? tipc_nl_node_reset_link_stats+0x340/0x340 [tipc] [ 59.642088][ T2788] ? __skb_ext_del+0x270/0x270 [ 59.642594][ T2788] genl_lock_dumpit+0x85/0xb0 [ 59.643050][ T2788] netlink_dump+0x49c/0xed0 [ 59.643529][ T2788] ? __netlink_sendskb+0xc0/0xc0 [ 59.644044][ T2788] ? __netlink_dump_start+0x190/0x800 [ 59.644617][ T2788] ? __mutex_unlock_slowpath+0xd0/0x670 [ 59.645177][ T2788] __netlink_dump_start+0x5a0/0x800 [ 59.645692][ T2788] genl_rcv_msg+0xa75/0xe90 [ 59.646144][ T2788] ? __lock_acquire+0xdfe/0x3de0 [ 59.646692][ T2788] ? genl_family_rcv_msg_attrs_parse+0x320/0x320 [ 59.647340][ T2788] ? genl_lock_dumpit+0xb0/0xb0 [ 59.647821][ T2788] ? genl_unlock+0x20/0x20 [ 59.648290][ T2788] ? genl_parallel_done+0xe0/0xe0 [ 59.648787][ T2788] ? find_held_lock+0x39/0x1d0 [ 59.649276][ T2788] ? genl_rcv+0x15/0x40 [ 59.649722][ T2788] ? lock_contended+0xcd0/0xcd0 [ 59.650296][ T2788] netlink_rcv_skb+0x121/0x350 [ 59.650828][ T2788] ? genl_family_rcv_msg_attrs_parse+0x320/0x320 [ 59.651491][ T2788] ? netlink_ack+0x940/0x940 [ 59.651953][ T2788] ? lock_acquire+0x164/0x3b0 [ 59.652449][ T2788] genl_rcv+0x24/0x40 [ 59.652841][ T2788] netlink_unicast+0x421/0x600 [ ... ] Fixes: 7e4369057806 ("tipc: fix a slab object leak") Fixes: a62fbccecd62 ("tipc: make subscriber server support net namespace") Signed-off-by: Taehee Yoo Acked-by: Jon Maloy Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 2fc7d173ea6121349165f49c8bd91f82c79a9da1 Author: Eric Dumazet Date: Thu Dec 5 10:10:15 2019 -0800 tcp: md5: fix potential overestimation of TCP option space [ Upstream commit 9424e2e7ad93ffffa88f882c9bc5023570904b55 ] Back in 2008, Adam Langley fixed the corner case of packets for flows having all of the following options : MD5 TS SACK Since MD5 needs 20 bytes, and TS needs 12 bytes, no sack block can be cooked from the remaining 8 bytes. tcp_established_options() correctly sets opts->num_sack_blocks to zero, but returns 36 instead of 32. This means TCP cooks packets with 4 extra bytes at the end of options, containing unitialized bytes. Fixes: 33ad798c924b ("tcp: options clean up") Signed-off-by: Eric Dumazet Reported-by: syzbot Acked-by: Neal Cardwell Acked-by: Soheil Hassas Yeganeh Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 0fa3554e921483c34807359cd9bf30034163fa8c Author: Aaron Conole Date: Tue Dec 3 16:34:13 2019 -0500 openvswitch: support asymmetric conntrack [ Upstream commit 5d50aa83e2c8e91ced2cca77c198b468ca9210f4 ] The openvswitch module shares a common conntrack and NAT infrastructure exposed via netfilter. It's possible that a packet needs both SNAT and DNAT manipulation, due to e.g. tuple collision. Netfilter can support this because it runs through the NAT table twice - once on ingress and again after egress. The openvswitch module doesn't have such capability. Like netfilter hook infrastructure, we should run through NAT twice to keep the symmetry. Fixes: 05752523e565 ("openvswitch: Interface with NAT.") Signed-off-by: Aaron Conole Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 61c6c1296a5e3d122223890198ab017f07321def Author: Valentin Vidic Date: Thu Dec 5 07:41:18 2019 +0100 net/tls: Fix return values to avoid ENOTSUPP [ Upstream commit 4a5cdc604b9cf645e6fa24d8d9f055955c3c8516 ] ENOTSUPP is not available in userspace, for example: setsockopt failed, 524, Unknown error 524 Signed-off-by: Valentin Vidic Acked-by: Jakub Kicinski Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 94fbebd20a607d29aa8028c55d5a3521c49acd95 Author: Mian Yousaf Kaukab Date: Thu Dec 5 10:41:16 2019 +0100 net: thunderx: start phy before starting autonegotiation [ Upstream commit a350d2e7adbb57181d33e3aa6f0565632747feaa ] Since commit 2b3e88ea6528 ("net: phy: improve phy state checking") phy_start_aneg() expects phy state to be >= PHY_UP. Call phy_start() before calling phy_start_aneg() during probe so that autonegotiation is initiated. As phy_start() takes care of calling phy_start_aneg(), drop the explicit call to phy_start_aneg(). Network fails without this patch on Octeon TX. Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking") Signed-off-by: Mian Yousaf Kaukab Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit c774abc60719a5472bbfb692b6462184866e0d2b Author: Eric Dumazet Date: Sat Dec 7 11:34:45 2019 -0800 net_sched: validate TCA_KIND attribute in tc_chain_tmplt_add() [ Upstream commit 2dd5616ecdcebdf5a8d007af64e040d4e9214efe ] Use the new tcf_proto_check_kind() helper to make sure user provided value is well formed. BUG: KMSAN: uninit-value in string_nocheck lib/vsprintf.c:606 [inline] BUG: KMSAN: uninit-value in string+0x4be/0x600 lib/vsprintf.c:668 CPU: 0 PID: 12358 Comm: syz-executor.1 Not tainted 5.4.0-rc8-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x220 lib/dump_stack.c:118 kmsan_report+0x128/0x220 mm/kmsan/kmsan_report.c:108 __msan_warning+0x64/0xc0 mm/kmsan/kmsan_instr.c:245 string_nocheck lib/vsprintf.c:606 [inline] string+0x4be/0x600 lib/vsprintf.c:668 vsnprintf+0x218f/0x3210 lib/vsprintf.c:2510 __request_module+0x2b1/0x11c0 kernel/kmod.c:143 tcf_proto_lookup_ops+0x171/0x700 net/sched/cls_api.c:139 tc_chain_tmplt_add net/sched/cls_api.c:2730 [inline] tc_ctl_chain+0x1904/0x38a0 net/sched/cls_api.c:2850 rtnetlink_rcv_msg+0x115a/0x1580 net/core/rtnetlink.c:5224 netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5242 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0xf3e/0x1020 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x110f/0x1330 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:637 [inline] sock_sendmsg net/socket.c:657 [inline] ___sys_sendmsg+0x14ff/0x1590 net/socket.c:2311 __sys_sendmsg net/socket.c:2356 [inline] __do_sys_sendmsg net/socket.c:2365 [inline] __se_sys_sendmsg+0x305/0x460 net/socket.c:2363 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2363 do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45a649 Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f0790795c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 000000000045a649 RDX: 0000000000000000 RSI: 0000000020000300 RDI: 0000000000000006 RBP: 000000000075bfc8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f07907966d4 R13: 00000000004c8db5 R14: 00000000004df630 R15: 00000000ffffffff Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:149 [inline] kmsan_internal_poison_shadow+0x5c/0x110 mm/kmsan/kmsan.c:132 kmsan_slab_alloc+0x97/0x100 mm/kmsan/kmsan_hooks.c:86 slab_alloc_node mm/slub.c:2773 [inline] __kmalloc_node_track_caller+0xe27/0x11a0 mm/slub.c:4381 __kmalloc_reserve net/core/skbuff.c:141 [inline] __alloc_skb+0x306/0xa10 net/core/skbuff.c:209 alloc_skb include/linux/skbuff.h:1049 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1174 [inline] netlink_sendmsg+0x783/0x1330 net/netlink/af_netlink.c:1892 sock_sendmsg_nosec net/socket.c:637 [inline] sock_sendmsg net/socket.c:657 [inline] ___sys_sendmsg+0x14ff/0x1590 net/socket.c:2311 __sys_sendmsg net/socket.c:2356 [inline] __do_sys_sendmsg net/socket.c:2365 [inline] __se_sys_sendmsg+0x305/0x460 net/socket.c:2363 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2363 do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 6f96c3c6904c ("net_sched: fix backward compatibility for TCA_KIND") Signed-off-by: Eric Dumazet Reported-by: syzbot Acked-by: Cong Wang Cc: Marcelo Ricardo Leitner Cc: Jamal Hadi Salim Cc: Jiri Pirko Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 2bbcffbfc2a51739b6c5933a657e6cff2c597887 Author: Dust Li Date: Tue Dec 3 11:17:40 2019 +0800 net: sched: fix dump qlen for sch_mq/sch_mqprio with NOLOCK subqueues [ Upstream commit 2f23cd42e19c22c24ff0e221089b7b6123b117c5 ] sch->q.len hasn't been set if the subqueue is a NOLOCK qdisc in mq_dump() and mqprio_dump(). Fixes: ce679e8df7ed ("net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio") Signed-off-by: Dust Li Signed-off-by: Tony Lu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 5fc9fc7aac9a9d0e6007988f0c075767756072c8 Author: Grygorii Strashko Date: Fri Dec 6 14:28:20 2019 +0200 net: ethernet: ti: cpsw: fix extra rx interrupt [ Upstream commit 51302f77bedab8768b761ed1899c08f89af9e4e2 ] Now RX interrupt is triggered twice every time, because in cpsw_rx_interrupt() it is asked first and then disabled. So there will be pending interrupt always, when RX interrupt is enabled again in NAPI handler. Fix it by first disabling IRQ and then do ask. Fixes: 870915feabdc ("drivers: net: cpsw: remove disable_irq/enable_irq as irq can be masked from cpsw itself") Signed-off-by: Grygorii Strashko Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 0706bfdfa7047408dcb572915063ed115fb76b09 Author: Alexander Lobakin Date: Thu Dec 5 13:02:35 2019 +0300 net: dsa: fix flow dissection on Tx path [ Upstream commit 8bef0af09a5415df761b04fa487a6c34acae74bc ] Commit 43e665287f93 ("net-next: dsa: fix flow dissection") added an ability to override protocol and network offset during flow dissection for DSA-enabled devices (i.e. controllers shipped as switch CPU ports) in order to fix skb hashing for RPS on Rx path. However, skb_hash() and added part of code can be invoked not only on Rx, but also on Tx path if we have a multi-queued device and: - kernel is running on UP system or - XPS is not configured. The call stack in this two cases will be like: dev_queue_xmit() -> __dev_queue_xmit() -> netdev_core_pick_tx() -> netdev_pick_tx() -> skb_tx_hash() -> skb_get_hash(). The problem is that skbs queued for Tx have both network offset and correct protocol already set up even after inserting a CPU tag by DSA tagger, so calling tag_ops->flow_dissect() on this path actually only breaks flow dissection and hashing. This can be observed by adding debug prints just before and right after tag_ops->flow_dissect() call to the related block of code: Before the patch: Rx path (RPS): [ 19.240001] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */ [ 19.244271] tag_ops->flow_dissect() [ 19.247811] Rx: proto: 0x0800, nhoff: 8 /* ETH_P_IP */ [ 19.215435] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */ [ 19.219746] tag_ops->flow_dissect() [ 19.223241] Rx: proto: 0x0806, nhoff: 8 /* ETH_P_ARP */ [ 18.654057] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */ [ 18.658332] tag_ops->flow_dissect() [ 18.661826] Rx: proto: 0x8100, nhoff: 8 /* ETH_P_8021Q */ Tx path (UP system): [ 18.759560] Tx: proto: 0x0800, nhoff: 26 /* ETH_P_IP */ [ 18.763933] tag_ops->flow_dissect() [ 18.767485] Tx: proto: 0x920b, nhoff: 34 /* junk */ [ 22.800020] Tx: proto: 0x0806, nhoff: 26 /* ETH_P_ARP */ [ 22.804392] tag_ops->flow_dissect() [ 22.807921] Tx: proto: 0x920b, nhoff: 34 /* junk */ [ 16.898342] Tx: proto: 0x86dd, nhoff: 26 /* ETH_P_IPV6 */ [ 16.902705] tag_ops->flow_dissect() [ 16.906227] Tx: proto: 0x920b, nhoff: 34 /* junk */ After: Rx path (RPS): [ 16.520993] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */ [ 16.525260] tag_ops->flow_dissect() [ 16.528808] Rx: proto: 0x0800, nhoff: 8 /* ETH_P_IP */ [ 15.484807] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */ [ 15.490417] tag_ops->flow_dissect() [ 15.495223] Rx: proto: 0x0806, nhoff: 8 /* ETH_P_ARP */ [ 17.134621] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */ [ 17.138895] tag_ops->flow_dissect() [ 17.142388] Rx: proto: 0x8100, nhoff: 8 /* ETH_P_8021Q */ Tx path (UP system): [ 15.499558] Tx: proto: 0x0800, nhoff: 26 /* ETH_P_IP */ [ 20.664689] Tx: proto: 0x0806, nhoff: 26 /* ETH_P_ARP */ [ 18.565782] Tx: proto: 0x86dd, nhoff: 26 /* ETH_P_IPV6 */ In order to fix that we can add the check 'proto == htons(ETH_P_XDSA)' to prevent code from calling tag_ops->flow_dissect() on Tx. I also decided to initialize 'offset' variable so tagger callbacks can now safely leave it untouched without provoking a chaos. Fixes: 43e665287f93 ("net-next: dsa: fix flow dissection") Signed-off-by: Alexander Lobakin Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit c1780f088f400f0cede9a2ea55761d8effa35ca4 Author: Nikolay Aleksandrov Date: Tue Dec 3 16:48:06 2019 +0200 net: bridge: deny dev_set_mac_address() when unregistering [ Upstream commit c4b4c421857dc7b1cf0dccbd738472360ff2cd70 ] We have an interesting memory leak in the bridge when it is being unregistered and is a slave to a master device which would change the mac of its slaves on unregister (e.g. bond, team). This is a very unusual setup but we do end up leaking 1 fdb entry because dev_set_mac_address() would cause the bridge to insert the new mac address into its table after all fdbs are flushed, i.e. after dellink() on the bridge has finished and we call NETDEV_UNREGISTER the bond/team would release it and will call dev_set_mac_address() to restore its original address and that in turn will add an fdb in the bridge. One fix is to check for the bridge dev's reg_state in its ndo_set_mac_address callback and return an error if the bridge is not in NETREG_REGISTERED. Easy steps to reproduce: 1. add bond in mode != A/B 2. add any slave to the bond 3. add bridge dev as a slave to the bond 4. destroy the bridge device Trace: unreferenced object 0xffff888035c4d080 (size 128): comm "ip", pid 4068, jiffies 4296209429 (age 1413.753s) hex dump (first 32 bytes): 41 1d c9 36 80 88 ff ff 00 00 00 00 00 00 00 00 A..6............ d2 19 c9 5e 3f d7 00 00 00 00 00 00 00 00 00 00 ...^?........... backtrace: [<00000000ddb525dc>] kmem_cache_alloc+0x155/0x26f [<00000000633ff1e0>] fdb_create+0x21/0x486 [bridge] [<0000000092b17e9c>] fdb_insert+0x91/0xdc [bridge] [<00000000f2a0f0ff>] br_fdb_change_mac_address+0xb3/0x175 [bridge] [<000000001de02dbd>] br_stp_change_bridge_id+0xf/0xff [bridge] [<00000000ac0e32b1>] br_set_mac_address+0x76/0x99 [bridge] [<000000006846a77f>] dev_set_mac_address+0x63/0x9b [<00000000d30738fc>] __bond_release_one+0x3f6/0x455 [bonding] [<00000000fc7ec01d>] bond_netdev_event+0x2f2/0x400 [bonding] [<00000000305d7795>] notifier_call_chain+0x38/0x56 [<0000000028885d4a>] call_netdevice_notifiers+0x1e/0x23 [<000000008279477b>] rollback_registered_many+0x353/0x6a4 [<0000000018ef753a>] unregister_netdevice_many+0x17/0x6f [<00000000ba854b7a>] rtnl_delete_link+0x3c/0x43 [<00000000adf8618d>] rtnl_dellink+0x1dc/0x20a [<000000009b6395fd>] rtnetlink_rcv_msg+0x23d/0x268 Fixes: 43598813386f ("bridge: add local MAC address to forwarding table (v2)") Reported-by: syzbot+2add91c08eb181fea1bf@syzkaller.appspotmail.com Signed-off-by: Nikolay Aleksandrov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 62d7fdb00b0af2c0f41ea76ceff18bb99e794b66 Author: Vladyslav Tarasiuk Date: Fri Dec 6 13:51:05 2019 +0000 mqprio: Fix out-of-bounds access in mqprio_dump [ Upstream commit 9f104c7736904ac72385bbb48669e0c923ca879b ] When user runs a command like tc qdisc add dev eth1 root mqprio KASAN stack-out-of-bounds warning is emitted. Currently, NLA_ALIGN macro used in mqprio_dump provides too large buffer size as argument for nla_put and memcpy down the call stack. The flow looks like this: 1. nla_put expects exact object size as an argument; 2. Later it provides this size to memcpy; 3. To calculate correct padding for SKB, nla_put applies NLA_ALIGN macro itself. Therefore, NLA_ALIGN should not be applied to the nla_put parameter. Otherwise it will lead to out-of-bounds memory access in memcpy. Fixes: 4e8b86c06269 ("mqprio: Introduce new hardware offload mode and shaper in mqprio") Signed-off-by: Vladyslav Tarasiuk Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 20f72aae9b21577e5c325c53817ce4ea00eb1133 Author: Eric Dumazet Date: Thu Dec 5 20:43:46 2019 -0800 inet: protect against too small mtu values. [ Upstream commit 501a90c945103e8627406763dac418f20f3837b2 ] syzbot was once again able to crash a host by setting a very small mtu on loopback device. Let's make inetdev_valid_mtu() available in include/net/ip.h, and use it in ip_setup_cork(), so that we protect both ip_append_page() and __ip_append_data() Also add a READ_ONCE() when the device mtu is read. Pairs this lockless read with one WRITE_ONCE() in __dev_set_mtu(), even if other code paths might write over this field. Add a big comment in include/linux/netdevice.h about dev->mtu needing READ_ONCE()/WRITE_ONCE() annotations. Hopefully we will add the missing ones in followup patches. [1] refcount_t: saturated; leaking memory. WARNING: CPU: 0 PID: 9464 at lib/refcount.c:22 refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 9464 Comm: syz-executor850 Not tainted 5.4.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x197/0x210 lib/dump_stack.c:118 panic+0x2e3/0x75c kernel/panic.c:221 __warn.cold+0x2f/0x3e kernel/panic.c:582 report_bug+0x289/0x300 lib/bug.c:195 fixup_bug arch/x86/kernel/traps.c:174 [inline] fixup_bug arch/x86/kernel/traps.c:169 [inline] do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:267 do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:286 invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027 RIP: 0010:refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22 Code: 06 31 ff 89 de e8 c8 f5 e6 fd 84 db 0f 85 6f ff ff ff e8 7b f4 e6 fd 48 c7 c7 e0 71 4f 88 c6 05 56 a6 a4 06 01 e8 c7 a8 b7 fd <0f> 0b e9 50 ff ff ff e8 5c f4 e6 fd 0f b6 1d 3d a6 a4 06 31 ff 89 RSP: 0018:ffff88809689f550 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff815e4336 RDI: ffffed1012d13e9c RBP: ffff88809689f560 R08: ffff88809c50a3c0 R09: fffffbfff15d31b1 R10: fffffbfff15d31b0 R11: ffffffff8ae98d87 R12: 0000000000000001 R13: 0000000000040100 R14: ffff888099041104 R15: ffff888218d96e40 refcount_add include/linux/refcount.h:193 [inline] skb_set_owner_w+0x2b6/0x410 net/core/sock.c:1999 sock_wmalloc+0xf1/0x120 net/core/sock.c:2096 ip_append_page+0x7ef/0x1190 net/ipv4/ip_output.c:1383 udp_sendpage+0x1c7/0x480 net/ipv4/udp.c:1276 inet_sendpage+0xdb/0x150 net/ipv4/af_inet.c:821 kernel_sendpage+0x92/0xf0 net/socket.c:3794 sock_sendpage+0x8b/0xc0 net/socket.c:936 pipe_to_sendpage+0x2da/0x3c0 fs/splice.c:458 splice_from_pipe_feed fs/splice.c:512 [inline] __splice_from_pipe+0x3ee/0x7c0 fs/splice.c:636 splice_from_pipe+0x108/0x170 fs/splice.c:671 generic_splice_sendpage+0x3c/0x50 fs/splice.c:842 do_splice_from fs/splice.c:861 [inline] direct_splice_actor+0x123/0x190 fs/splice.c:1035 splice_direct_to_actor+0x3b4/0xa30 fs/splice.c:990 do_splice_direct+0x1da/0x2a0 fs/splice.c:1078 do_sendfile+0x597/0xd00 fs/read_write.c:1464 __do_sys_sendfile64 fs/read_write.c:1525 [inline] __se_sys_sendfile64 fs/read_write.c:1511 [inline] __x64_sys_sendfile64+0x1dd/0x220 fs/read_write.c:1511 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x441409 Code: e8 ac e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fffb64c4f78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441409 RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000005 RBP: 0000000000073b8a R08: 0000000000000010 R09: 0000000000000010 R10: 0000000000010001 R11: 0000000000000246 R12: 0000000000402180 R13: 0000000000402210 R14: 0000000000000000 R15: 0000000000000000 Kernel Offset: disabled Rebooting in 86400 seconds.. Fixes: 1470ddf7f8ce ("inet: Remove explicit write references to sk/inet in ip_append_data") Signed-off-by: Eric Dumazet Reported-by: syzbot Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman