author | Zefan Li <lizefan@huawei.com> | 2014-11-06 09:49:23 +0800 |
---|---|---|
committer | Zefan Li <lizefan@huawei.com> | 2014-11-06 09:49:23 +0800 |
commit | aa09243c2e99ab49664de5037fe71deb1204d570 (patch) | |
tree | 2cd03820e9e7114a77680f50c7c2451820b9b281 | |
parent | 67d68dea442ead6e572127fd6bac0d2020062b9c (diff) | |
download | linux-3.4.y-queue-aa09243c2e99ab49664de5037fe71deb1204d570.tar.gz |
Add one commit that fixes network regression
-rw-r--r-- | patches/net-do-not-enable-tx-nocache-copy-by-default.patch | 138 |
-rw-r--r-- | patches/series | 1 |
2 files changed, 139 insertions, 0 deletions
diff --git a/patches/net-do-not-enable-tx-nocache-copy-by-default.patch b/patches/net-do-not-enable-tx-nocache-copy-by-default.patch
new file mode 100644
index 0000000..cb7341b
--- /dev/null
+++ b/patches/net-do-not-enable-tx-nocache-copy-by-default.patch
@@ -0,0 +1,138 @@
+From cdb3f4a31b64c3a1c6eef40bc01ebc9594c58a8c Mon Sep 17 00:00:00 2001
+From: Benjamin Poirier <bpoirier@suse.de>
+Date: Tue, 7 Jan 2014 10:11:10 -0500
+Subject: net: Do not enable tx-nocache-copy by default
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+commit cdb3f4a31b64c3a1c6eef40bc01ebc9594c58a8c upstream.
+
+There are many cases where this feature does not improve performance or even
+reduces it.
+
+For example, here are the results from tests that I've run using 3.12.6 on one
+Intel Xeon W3565 and one i7 920 connected by ixgbe adapters. The results are
+from the Xeon, but they're similar on the i7. All numbers report the
+mean±stddev over 10 runs of 10s.
+
+1) latency tests similar to what is described in "c6e1a0d net: Allow no-cache
+copy from user on transmit"
+There is no statistically significant difference between tx-nocache-copy
+on/off.
+nic irqs spread out (one queue per cpu)
+
+200x netperf -r 1400,1
+tx-nocache-copy off
+        692000±1000 tps
+        50/90/95/99% latency (us): 275±2/643.8±0.4/799±1/2474.4±0.3
+tx-nocache-copy on
+        693000±1000 tps
+        50/90/95/99% latency (us): 274±1/644.1±0.7/800±2/2474.5±0.7
+
+200x netperf -r 14000,14000
+tx-nocache-copy off
+        86450±80 tps
+        50/90/95/99% latency (us): 334.37±0.02/838±1/2100±20/3990±40
+tx-nocache-copy on
+        86110±60 tps
+        50/90/95/99% latency (us): 334.28±0.01/837±2/2110±20/3990±20
+
+2) single stream throughput tests
+tx-nocache-copy leads to higher service demand
+
+                        throughput  cpu0        cpu1         demand
+                        (Gb/s)      (Gcycle)    (Gcycle)     (cycle/B)
+
+nic irqs and netperf on cpu0 (1x netperf -T0,0 -t omni -- -d send)
+
+tx-nocache-copy off     9402±5      9.4±0.2                  0.80±0.01
+tx-nocache-copy on      9403±3      9.85±0.04                0.838±0.004
+
+nic irqs on cpu0, netperf on cpu1 (1x netperf -T1,1 -t omni -- -d send)
+
+tx-nocache-copy off     9401±5      5.83±0.03   5.0±0.1      0.923±0.007
+tx-nocache-copy on      9404±2      5.74±0.03   5.523±0.009  0.958±0.002
+
+As a second example, here are some results from Eric Dumazet with latest
+net-next.
+tx-nocache-copy also leads to higher service demand
+
+(cpu is Intel(R) Xeon(R) CPU X5660 @ 2.80GHz)
+
+lpq83:~# ./ethtool -K eth0 tx-nocache-copy on
+lpq83:~# perf stat ./netperf -H lpq84 -c
+MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET
+Recv   Send    Send                          Utilization       Service Demand
+Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
+Size   Size    Size     Time     Throughput  local    remote   local   remote
+bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB
+
+ 87380  16384  16384    10.00      9407.44   2.50     -1.00    0.522   -1.000
+
+ Performance counter stats for './netperf -H lpq84 -c':
+
+       4282.648396 task-clock                #    0.423 CPUs utilized
+             9,348 context-switches          #    0.002 M/sec
+                88 CPU-migrations            #    0.021 K/sec
+               355 page-faults               #    0.083 K/sec
+    11,812,797,651 cycles                    #    2.758 GHz                     [82.79%]
+     9,020,522,817 stalled-cycles-frontend   #   76.36% frontend cycles idle    [82.54%]
+     4,579,889,681 stalled-cycles-backend    #   38.77% backend cycles idle     [67.33%]
+     6,053,172,792 instructions              #    0.51 insns per cycle
+                                             #    1.49 stalled cycles per insn  [83.64%]
+       597,275,583 branches                  #  139.464 M/sec                   [83.70%]
+         8,960,541 branch-misses             #    1.50% of all branches         [83.65%]
+
+      10.128990264 seconds time elapsed
+
+lpq83:~# ./ethtool -K eth0 tx-nocache-copy off
+lpq83:~# perf stat ./netperf -H lpq84 -c
+MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET
+Recv   Send    Send                          Utilization       Service Demand
+Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
+Size   Size    Size     Time     Throughput  local    remote   local   remote
+bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB
+
+ 87380  16384  16384    10.00      9412.45   2.15     -1.00    0.449   -1.000
+
+ Performance counter stats for './netperf -H lpq84 -c':
+
+       2847.375441 task-clock                #    0.281 CPUs utilized
+            11,632 context-switches          #    0.004 M/sec
+                49 CPU-migrations            #    0.017 K/sec
+               354 page-faults               #    0.124 K/sec
+     7,646,889,749 cycles                    #    2.686 GHz                     [83.34%]
+     6,115,050,032 stalled-cycles-frontend   #   79.97% frontend cycles idle    [83.31%]
+     1,726,460,071 stalled-cycles-backend    #   22.58% backend cycles idle     [66.55%]
+     2,079,702,453 instructions              #    0.27 insns per cycle
+                                             #    2.94 stalled cycles per insn  [83.22%]
+       363,773,213 branches                  #  127.757 M/sec                   [83.29%]
+         4,242,732 branch-misses             #    1.17% of all branches         [83.51%]
+
+      10.128449949 seconds time elapsed
+
+CC: Tom Herbert <therbert@google.com>
+Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
+Signed-off-by: David S. Miller <davem@davemloft.net>
+Signed-off-by: Zefan Li <lizefan@huawei.com>
+---
+ net/core/dev.c |    5 -----
+ 1 file changed, 5 deletions(-)
+
+--- a/net/core/dev.c
++++ b/net/core/dev.c
+@@ -5546,13 +5546,8 @@ int register_netdevice(struct net_device
+ 	dev->features |= NETIF_F_SOFT_FEATURES;
+ 	dev->wanted_features = dev->features & dev->hw_features;
+ 
+-	/* Turn on no cache copy if HW is doing checksum */
+ 	if (!(dev->flags & IFF_LOOPBACK)) {
+ 		dev->hw_features |= NETIF_F_NOCACHE_COPY;
+-		if (dev->features & NETIF_F_ALL_CSUM) {
+-			dev->wanted_features |= NETIF_F_NOCACHE_COPY;
+-			dev->features |= NETIF_F_NOCACHE_COPY;
+-		}
+ 	}
+ 
+ 	/* Make NETIF_F_HIGHDMA inheritable to VLAN devices.
diff --git a/patches/series b/patches/series
index ad92e7e..e030510 100644
--- a/patches/series
+++ b/patches/series
@@ -87,3 +87,4 @@ ipv4-avoid-parallel-route-cache-gc-executions.patch
 ipv4-disable-bh-while-doing-route-gc.patch
 rtl8192ce-Fix-null-dereference-in-watchdog.patch
 ipv6-reuse-ip6_frag_id-from-ip6_ufo_append_data.patch
+net-do-not-enable-tx-nocache-copy-by-default.patch
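
The "demand (cycle/B)" column in the single stream results of the queued patch can be roughly reproduced from the other columns. The commit message does not state the formula, so the sketch below is an assumption: it divides the total cycles spent by the bytes sent in one 10s run. It also assumes the throughput figures are in Mb/s (the column is labelled Gb/s, but values near 9400 on ixgbe adapters only fit Mb/s). The function name and parameters are hypothetical and appear nowhere in the kernel or in netperf.

```c
#include <assert.h>

/*
 * Hypothetical helper: cycles spent per byte sent ("service demand").
 *
 * Assumptions, none of which are stated in the patch:
 *   - throughput_mbps is in 10^6 bits per second
 *   - run_seconds is the length of one netperf run
 *   - total_gcycles is the sum of the per-CPU cycle counts, in 10^9 cycles
 */
double service_demand_cycles_per_byte(double throughput_mbps,
                                      double run_seconds,
                                      double total_gcycles)
{
    assert(throughput_mbps > 0.0 && run_seconds > 0.0);

    /* bits per second * seconds / 8 = bytes sent during the run */
    double bytes_sent = throughput_mbps * 1e6 * run_seconds / 8.0;

    /* total cycles divided by bytes sent */
    return total_gcycles * 1e9 / bytes_sent;
}
```

With the "nic irqs and netperf on cpu0" rows, calling the helper with 9402 Mb/s, 10 s and 9.4 Gcycle gives about 0.80 cycle/B, and 9403 Mb/s with 9.85 Gcycle gives about 0.838 cycle/B, in line with the reported means. For the two-CPU rows, summing cpu0 and cpu1 (5.83 + 5.0 Gcycle) gives roughly 0.92 cycle/B, which falls within the reported 0.923±0.007.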