Problem symptoms

While running a UDP reverse-mode traffic test on an embedded platform, we found that the throughput could not reach the target rate and the packet loss rate was very high.

iperf3 -c 162.16.0.1 -b 600M -u -R
Connecting to host 162.16.0.1, port 5201
Reverse mode, remote host 162.16.0.1 is sending
[  6] local 162.16.0.215 port 45228 connected to 162.16.0.1 port 5201
[ ID] Interval           Transfer     Bitrate         Jitter     Lost/Total Datagrams
[  6]   0.00-1.00   sec  2.06 MBytes  17.3 Mbits/sec  0.600 ms   93125/94616 (98%)
[  6]   1.00-2.00   sec  2.23 MBytes  18.7 Mbits/sec  0.636 ms   51638/53256 (97%)
[  6]   2.00-3.00   sec  1.86 MBytes                  0.659 ms   50717/52064 (97%)
[  6]   3.00-4.20   sec  1.28 MBytes  8.96 Mbits/sec  20.741 ms  49359/50287 (98%)
[  6]   4.20-5.00   sec  1.19 MBytes  12.5 Mbits/sec  0.945 ms   53518/54382 (98%)
[  6]   5.00-6.12   sec  2.04 MBytes  15.3 Mbits/sec  16.536 ms  47297/48777 (97%)
[  6]   6.12-7.00   sec  1.45 MBytes  13.8 Mbits/sec  8.949 ms   44629/45679 (98%)
[  6]   7.00-8.00   sec  1.73 MBytes  14.5 Mbits/sec  0.901 ms   59480/60735 (98%)
[  6]   8.00-9.05   sec  2.00 MBytes  16.1 Mbits/sec  9.359 ms   50948/52397 (97%)
[  6]   9.05-10.00  sec  1.45 MBytes  12.7 Mbits/sec  1.580 ms   50707/51756 (97%)

[ ID] Interval           Transfer     Bitrate         Jitter     Lost/Total Datagrams
[  6]   0.00-10.99  sec   786 MBytes   600 Mbits/sec  0.000 ms   0/569210 (0%)        sender
[  6]   0.00-10.00  sec  17.3 MBytes  14.5 Mbits/sec  1.580 ms   551418/563949 (98%)  receiver

iperf Done.

Attempts to tweak some parameters

Tuning some Linux kernel parameters

echo 4194240 > /proc/sys/net/core/rmem_max


echo 4194240 > /proc/sys/net/core/wmem_max


echo '4096 87380 4194240' > /proc/sys/net/ipv4/tcp_rmem


echo '4096 65538 4194240' > /proc/sys/net/ipv4/tcp_wmem


echo '4194240 4194240 4194240' > /proc/sys/net/ipv4/tcp_mem


echo 196608 > /proc/sys/net/core/rmem_default


echo 196608 > /proc/sys/net/core/wmem_default


echo 1000 > /proc/sys/net/core/netdev_budget


echo 3000 > /proc/sys/net/core/netdev_max_backlog
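For completeness, the same values can also be applied through sysctl where that utility is present on the embedded image; this is only an equivalent restatement of a few of the echo commands above, not an additional change:

# Equivalent sysctl form of some of the settings above (sketch)
sysctl -w net.core.rmem_max=4194240
sysctl -w net.core.wmem_max=4194240
sysctl -w net.ipv4.tcp_rmem="4096 87380 4194240"
sysctl -w net.core.netdev_max_backlog=3000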

Adjusting these parameters did not help.

Adjusting iperf3 parameters

Increasing the socket buffer size with iperf3's -w option did not help either.
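For reference, the kind of invocation tried looked something like the following; the 4M buffer value is an assumed example, not a figure from the original test:

# UDP reverse test with an enlarged socket buffer via -w (buffer size is an assumed example)
iperf3 -c 162.16.0.1 -u -b 600M -R -w 4M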

Debug analysis

Some observations of the Linux system

interrupt

As observed with cat /proc/interrupts, the network interrupts all land on CPU0, and the system does not support SMP IRQ affinity.
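A simple way to watch this while the test is running:

# Interrupt counters per CPU; during the test only the CPU0 column grows for the NIC IRQ
watch -n 1 'cat /proc/interrupts'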

softirq

From cat /proc/softirqs we can see that NET_RX softirqs mainly occur on CPU0, and mpstat shows that softirq processing consumes essentially 100% of CPU0.
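The per-CPU breakdown below was presumably collected with something along these lines (the exact invocation is an assumption):

# Per-CPU utilization, one sample per second
mpstat -P ALL 1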

15:20:24  CPU   %usr  %nice   %sys  %iowait  %irq   %soft  %steal  %guest  %idle
15:20:24  all   1.06   0.00   4.76     0.00  0.00   53.97    0.00    0.00  40.21
15:20:24    0   0.00   0.00   0.00     0.00  0.00  100.00    0.00    0.00   0.00
15:20:24    1   3.33   0.00  10.00     0.00  0.00    2.22    0.00    0.00  84.44

Using perf to analyze where the CPU time goes

perf record -e cpu-clock iperf3 -c 162.16.0.1 -b 200M -u -R -t 100
perf report

The result

        iperf3  [kernel.kallsyms]  [k] nf_inet_local_in
        iperf3  [kernel.kallsyms]  [k] __arch_local_irq_restore
 3.61%  iperf3  [kernel.kallsyms]  [k] __copy_user_common
 3.41%  iperf3  [kernel.kallsyms]  [k] handle_sys
 3.21%  iperf3  [kernel.kallsyms]  [k] do_select
 3.21%  iperf3  [kernel.kallsyms]  [k] tcp_poll
 2.81%  iperf3  libc-2.24.so       [.] memcpy
 2.61%  iperf3  [kernel.kallsyms]  [k] __bzero
 2.61%  iperf3  [kernel.kallsyms]  [k] ipt_do_table

As you can see, nf_inet_local_in takes the largest share of CPU time. On inspection, this turned out to be a hook function registered by a netfilter kernel module, and performance improved once that module was removed. Besides nf_inet_local_in, ipt_do_table also consumes a lot of CPU because the system has a large number of iptables rules configured; clearing them improves things further.
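The hook module can be unloaded with rmmod; the module name below is only a placeholder, since the original does not name the module that registered nf_inet_local_in:

# Locate and unload the netfilter module that registered the hook (module name is a placeholder)
lsmod | grep nf_
rmmod <offending_module>

The iptables policies and rules were then reset and flushed: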

iptables -P INPUT ACCEPT


iptables -P FORWARD ACCEPT


iptables -P OUTPUT ACCEPT


iptables -t nat -F


iptables -t mangle -F


iptables -F


iptables -X


iptables -X -t nat


iptables -X -t raw
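After flushing, the now-empty rule set can be confirmed with:

# List remaining rules with packet counters; the chains should be empty
iptables -L -n -v
iptables -t nat -L -n -v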

A further attempt

irq smp affinity

The system does not support setting SMP affinity for the hardware interrupt, so the NIC IRQ cannot be moved off CPU0.
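On platforms that do support it, the NIC interrupt would normally be steered to another core like this (the IRQ number 24 is a placeholder); on this platform the setting is simply not available:

# Pin the NIC IRQ to CPU1 (bitmask 0x2); IRQ number is a placeholder
echo 2 > /proc/irq/24/smp_affinity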

soft irq

After enabling RPS and configuring RFS, running iperf3 with -P 2 (two parallel streams) raises the throughput by another 30-50 Mbits/sec.

echo 3 > /sys/class/net/waninterface/queues/rx-x/rps_cpus


echo 32768 > /sys/class/net/waninterface/queues/rx-x/rps_flow_cnt
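Note that RFS also relies on the global socket flow table; per the kernel's scaling documentation, rps_sock_flow_entries must be set as well (the value here simply mirrors the per-queue count above):

# Global RFS flow table size; required for the per-queue rps_flow_cnt to take effect
echo 32768 > /proc/sys/net/core/rps_sock_flow_entries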

Looking at /proc/softirqs at this point shows soft interrupts now increasing on both CPUs. On top of this, re-applying some of the Linux parameters mentioned above brings the traffic up to about 300 Mbits/sec.
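A quick check that NET_RX processing is now spread across both CPUs:

# NET_RX softirq counters per CPU; both columns should now be increasing
grep NET_RX /proc/softirqs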

Reference articles

https://ylgrgyq.github.io/201…

https://ylgrgyq.github.io/201…

https://blog.packagecloud.io/…

https://colobu.com/2019/12/09…

http://www.brendangregg.com/p…

https://tqr.ink/2017/07/09/im…