linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-09-14 06:35:12 +00:00

Author	SHA1	Message	Date
Horatiu Vultur	735fec995b	net: lan966x: Implement SIOCSHWTSTAMP and SIOCGHWTSTAMP Implement the ioctl callbacks SIOCSHWTSTAMP and SIOCGHWTSTAMP to allow to configure the ports to enable/disable timestamping for TX. The RX timestamping is always enabled. The HW is capable to run both 1-step timestamping and 2-step timestamping. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-01 14:18:43 +00:00
Horatiu Vultur	d096459494	net: lan966x: Add support for ptp clocks The lan966x has 3 PHC. Enable each of them, for now all the timestamping is happening on the first PHC. Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-01 14:18:43 +00:00
Horatiu Vultur	d700dff41d	net: lan966x: Add registers that are use for ptp functionality Add the registers that will be used to configure the PHC in the HW. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-01 14:18:43 +00:00
Horatiu Vultur	2f92512e1c	dt-bindings: net: lan966x: Extend with the ptp interrupt Extend dt-bindings for lan966x with ptp interrupt. This is generated when doing 2-step timestamping and the timestamp can be read from the FIFO. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-01 14:18:43 +00:00
Guillaume Nault	9f397dd5f1	selftests: fib rule: Don't echo modified sysctls Run sysctl in quiet mode. Echoing the modified sysctl doesn't bring any useful information. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-01 14:14:47 +00:00
Guillaume Nault	21f25cd436	selftests: fib rule: Log test description All callers of fib_rule6_test_match_n_redirect() and fib_rule4_test_match_n_redirect() pass a third argument containing a description of the test being run. Instead of ignoring this argument, let's use it for logging instead of printing a truncated version of the command. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-01 14:14:47 +00:00
Guillaume Nault	2e25211363	selftests: fib rule: Drop erroneous TABLE variable The fib_rule6_del_by_pref() and fib_rule4_del_by_pref() functions use an uninitialised $TABLE variable. They should use $RTABLE instead. This doesn't alter the result of the test, as it just makes the grep command less specific (but since the script always uses the same table number, that doesn't really matter). Let's fix it anyway and, while there, specify the filtering parameters directly in 'ip -X rule show' to avoid the extra grep command entirely. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-01 14:14:47 +00:00
Guillaume Nault	8af2ba9a78	selftests: fib rule: Make 'getmatch' and 'match' local variables Let's restrict the scope of these variables to avoid possible interferences. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-01 14:14:47 +00:00
David S. Miller	1d02c03986	Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next -queue Tony Nguyen says: ==================== 10GbE Intel Wired LAN Driver Updates 2022-01-31 Alexander Lobakin says: This is an interpolation of [0] to other Intel Ethernet drivers (and is (re)based on its code). The main aim is to keep XDP metadata not only in case with build_skb(), but also when we do napi_alloc_skb() + memcpy(). All Intel drivers suffers from the same here: - metadata gets lost on XDP_PASS in legacy-rx; - excessive headroom allocation on XSK Rx to skbs; - metadata gets lost on XSK Rx to skbs. Those get especially actual in XDP Hints upcoming. I couldn't have addressed the first one for all Intel drivers due to that they don't reserve any headroom for now in legacy-rx mode even with XDP enabled. This is hugely wrong, but requires quite a bunch of work and a separate series. Luckily, ice doesn't suffer from that. igc has 1 and 3 already fixed in [0]. [0] https://lore.kernel.org/netdev/163700856423.565980.10162564921347693758.stgit@firesoul ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-01 12:53:18 +00:00
Sergey Shtylyov	9a90986efc	sh_eth: kill useless initializers in sh_eth_{suspend\|resume}() sh_eth_{suspend\|resume}() initialize their local variable 'ret' to 0 but this value is never really used, thus we can kill those intializers... Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://lore.kernel.org/r/f09d7c64-4a2b-6973-09a4-10d759ed0df4@omp.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-01-31 21:37:24 -08:00
Hyeonggon Yoo	7354a426e0	net: ena: Do not waste napi skb cache By profiling, discovered that ena device driver allocates skb by build_skb() and frees by napi_skb_cache_put(). Because the driver does not use napi skb cache in allocation path, napi skb cache is periodically filled and flushed. This is waste of napi skb cache. As ena_alloc_skb() is called only in napi, Use napi_build_skb() and napi_alloc_skb() when allocating skb. This patch was tested on aws a1.metal instance. [ jwiedmann.dev@gmail.com: Use napi_alloc_skb() instead of netdev_alloc_skb_ip_align() to keep things consistent. ] Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: Shay Agroskin <shayagr@amazon.com> Link: https://lore.kernel.org/r/YfUAkA9BhyOJRT4B@ip-172-31-19-208.ap-northeast-1.compute.internal Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-01-31 21:36:21 -08:00
Venkata Sudheer Kumar Bhavaraju	ef10bd49df	qed: use msleep() in qed_mcp_cmd() and add qed_mcp_cmd_nosleep() for udelay. Change qed_mcp_cmd() to use msleep() (by setting QED_MB_FLAG_CAN_SLEEP flag) and add new nosleep() version of the api. These api are used to issue cmds to management fw and the change affects how driver behaves while waiting for a response/resource. All sleepable callers of the existing api now use msleep() version. For non-sleepable callers, the new nosleep() version is explicitly used. Signed-off-by: Venkata Sudheer Kumar Bhavaraju <vbhavaraju@marvell.com> Signed-off-by: Alok Prasad <palok@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Link: https://lore.kernel.org/r/20220131005235.1647881-1-vbhavaraju@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-01-31 21:24:35 -08:00
Alexander Lobakin	f322a620be	ixgbe: respect metadata on XSK Rx to skb For now, if the XDP prog returns XDP_PASS on XSK, the metadata will be lost as it doesn't get copied to the skb. Copy it along with the frame headers. Account its size on skb allocation, and when copying just treat it as a part of the frame and do a pull after to "move" it to the "reserved" zone. net_prefetch() xdp->data_meta and align the copy size to speed-up memcpy() a little and better match ixgbe_construct_skb(). Fixes: `d0bcacd0a1` ("ixgbe: add AF_XDP zero-copy Rx support") Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com> Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-01-31 09:50:20 -08:00
Alexander Lobakin	8f405221a7	ixgbe: don't reserve excessive XDP_PACKET_HEADROOM on XSK Rx to skb {__,}napi_alloc_skb() allocates and reserves additional NET_SKB_PAD + NET_IP_ALIGN for any skb. OTOH, ixgbe_construct_skb_zc() currently allocates and reserves additional `xdp->data - xdp->data_hard_start`, which is XDP_PACKET_HEADROOM for XSK frames. There's no need for that at all as the frame is post-XDP and will go only to the networking stack core. Pass the size of the actual data only to __napi_alloc_skb() and don't reserve anything. This will give enough headroom for stack processing. Fixes: `d0bcacd0a1` ("ixgbe: add AF_XDP zero-copy Rx support") Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-01-31 09:47:21 -08:00
Alexander Lobakin	1fbdaa1338	ixgbe: pass bi->xdp to ixgbe_construct_skb_zc() directly To not dereference bi->xdp each time in ixgbe_construct_skb_zc(), pass bi->xdp as an argument instead of bi. We can also call xsk_buff_free() outside of the function as well as assign bi->xdp to NULL, which seems to make it closer to its name. Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-01-31 09:47:21 -08:00
Alexander Lobakin	f9e61d365b	igc: don't reserve excessive XDP_PACKET_HEADROOM on XSK Rx to skb {__,}napi_alloc_skb() allocates and reserves additional NET_SKB_PAD + NET_IP_ALIGN for any skb. OTOH, igc_construct_skb_zc() currently allocates and reserves additional `xdp->data_meta - xdp->data_hard_start`, which is about XDP_PACKET_HEADROOM for XSK frames. There's no need for that at all as the frame is post-XDP and will go only to the networking stack core. Pass the size of the actual data only (+ meta) to __napi_alloc_skb() and don't reserve anything. This will give enough headroom for stack processing. Also, net_prefetch() xdp->data_meta and align the copy size to speed-up memcpy() a little and better match igc_construct_skb(). Fixes: `fc9df2a0b5` ("igc: Enable RX via AF_XDP zero-copy") Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-01-31 09:47:13 -08:00
Alexander Lobakin	45a34ca680	ice: respect metadata on XSK Rx to skb For now, if the XDP prog returns XDP_PASS on XSK, the metadata will be lost as it doesn't get copied to the skb. Copy it along with the frame headers. Account its size on skb allocation, and when copying just treat it as a part of the frame and do a pull after to "move" it to the "reserved" zone. net_prefetch() xdp->data_meta and align the copy size to speed-up memcpy() a little and better match ice_construct_skb(). Fixes: `2d4238f556` ("ice: Add support for AF_XDP") Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com> Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-01-31 09:46:05 -08:00
Alexander Lobakin	dc44572d19	ice: don't reserve excessive XDP_PACKET_HEADROOM on XSK Rx to skb {__,}napi_alloc_skb() allocates and reserves additional NET_SKB_PAD + NET_IP_ALIGN for any skb. OTOH, ice_construct_skb_zc() currently allocates and reserves additional `xdp->data - xdp->data_hard_start`, which is XDP_PACKET_HEADROOM for XSK frames. There's no need for that at all as the frame is post-XDP and will go only to the networking stack core. Pass the size of the actual data only to __napi_alloc_skb() and don't reserve anything. This will give enough headroom for stack processing. Fixes: `2d4238f556` ("ice: Add support for AF_XDP") Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-01-31 09:43:35 -08:00
Alexander Lobakin	ee803dca96	ice: respect metadata in legacy-rx/ice_construct_skb() In "legacy-rx" mode represented by ice_construct_skb(), we can still use XDP (and XDP metadata), but after XDP_PASS the metadata will be lost as it doesn't get copied to the skb. Copy it along with the frame headers. Account its size on skb allocation, and when copying just treat it as a part of the frame and do a pull after to "move" it to the "reserved" zone. Point net_prefetch() to xdp->data_meta instead of data. This won't change anything when the meta is not here, but will save some cache misses otherwise. Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com> Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-01-31 09:43:35 -08:00
Alexander Lobakin	6dba29537c	i40e: respect metadata on XSK Rx to skb For now, if the XDP prog returns XDP_PASS on XSK, the metadata will be lost as it doesn't get copied to the skb. Copy it along with the frame headers. Account its size on skb allocation, and when copying just treat it as a part of the frame and do a pull after to "move" it to the "reserved" zone. net_prefetch() xdp->data_meta and align the copy size to speed-up memcpy() a little and better match i40e_construct_skb(). Fixes: `0a714186d3` ("i40e: add AF_XDP zero-copy Rx support") Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com> Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-01-31 09:42:58 -08:00
Alexander Lobakin	bc97f9c6f9	i40e: don't reserve excessive XDP_PACKET_HEADROOM on XSK Rx to skb {__,}napi_alloc_skb() allocates and reserves additional NET_SKB_PAD + NET_IP_ALIGN for any skb. OTOH, i40e_construct_skb_zc() currently allocates and reserves additional `xdp->data - xdp->data_hard_start`, which is XDP_PACKET_HEADROOM for XSK frames. There's no need for that at all as the frame is post-XDP and will go only to the networking stack core. Pass the size of the actual data only to __napi_alloc_skb() and don't reserve anything. This will give enough headroom for stack processing. Fixes: `0a714186d3` ("i40e: add AF_XDP zero-copy Rx support") Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-01-31 09:37:47 -08:00
David S. Miller	b43471cc10	Merge branch 'mana-XDP-counters' Haiyang Zhang says: ==================== net: mana: Add XDP counters, reuse dropped pages Add drop, tx counters for XDP. Reuse dropped pages ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-31 15:39:59 +00:00
Haiyang Zhang	a6bf5703f1	net: mana: Reuse XDP dropped page Reuse the dropped page in RX path to save page allocation overhead. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-31 15:39:58 +00:00
Haiyang Zhang	d356abb95b	net: mana: Add counter for XDP_TX This counter will show up in ethtool stat. It is the number of packets received and forwarded by XDP program. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-31 15:39:58 +00:00
Haiyang Zhang	f90f84201e	net: mana: Add counter for packet dropped by XDP This counter will show up in ethtool stat data. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-31 15:39:58 +00:00
David S. Miller	780bf05f44	Merge branch 'smc-improvements' Tony Lu says: ==================== net/smc: Improvements for TCP_CORK and sendfile() Currently, SMC use default implement for syscall sendfile() [1], which is wildly used in nginx and big data sences. Usually, applications use sendfile() with TCP_CORK: fstat(20, {st_mode=S_IFREG\|0644, st_size=4096, ...}) = 0 setsockopt(19, SOL_TCP, TCP_CORK, [1], 4) = 0 writev(19, [{iov_base="HTTP/1.1 200 OK\r\nServer: nginx/1"..., iov_len=240}], 1) = 240 sendfile(19, 20, [0] => [4096], 4096) = 4096 close(20) = 0 setsockopt(19, SOL_TCP, TCP_CORK, [0], 4) = 0 The above is an example of Nginx, when sendfile() on, Nginx first enables TCP_CORK, write headers, the data will not be sent. Then call sendfile(), it reads file and write to sndbuf. When TCP_CORK is cleared, all pending data is sent out. The performance of the default implement of sendfile is lower than when it is off. After investigation, it shows two parts to improve: - unnecessary lock contention of delayed work - less data per send than when sendfile off Patch #1 tries to reduce lock_sock() contention in smc_tx_work(). Patch #2 removes timed work for corking, and let applications control it. See TCP_CORK [2] MSG_MORE [3]. Patch #3 adds MSG_SENDPAGE_NOTLAST for corking more data when sendfile(). Test environments: - CPU Intel Xeon Platinum 8 core, mem 32 GiB, nic Mellanox CX4 - socket sndbuf / rcvbuf: 16384 / 131072 bytes - server: smc_run nginx - client: smc_run ./wrk -c 100 -t 2 -d 30 http://192.168.100.1:8080/4k.html - payload: 4KB local disk file Items QPS sendfile off 272477.10 sendfile on (orig) 223622.79 sendfile on (this) 395847.21 This benchmark shows +45.28% improvement compared with sendfile off, and +77.02% compared with original sendfile implement. [1] https://man7.org/linux/man-pages/man2/sendfile.2.html [2] https://linux.die.net/man/7/tcp [3] https://man7.org/linux/man-pages/man2/send.2.html ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-31 15:08:20 +00:00
Tony Lu	be9a16ccca	net/smc: Cork when sendpage with MSG_SENDPAGE_NOTLAST flag This introduces a new corked flag, MSG_SENDPAGE_NOTLAST, which is involved in syscall sendfile() [1], it indicates this is not the last page. So we can cork the data until the page is not specify this flag. It has the same effect as MSG_MORE, but existed in sendfile() only. This patch adds a option MSG_SENDPAGE_NOTLAST for corking data, try to cork more data before sending when using sendfile(), which acts like TCP's behaviour. Also, this reimplements the default sendpage to inform that it is supported to some extent. [1] https://man7.org/linux/man-pages/man2/sendfile.2.html Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-31 15:08:20 +00:00
Tony Lu	139653bc66	net/smc: Remove corked dealyed work Based on the manual of TCP_CORK [1] and MSG_MORE [2], these two options have the same effect. Applications can set these options and informs the kernel to pend the data, and send them out only when the socket or syscall does not specify this flag. In other words, there's no need to send data out by a delayed work, which will queue a lot of work. This removes corked delayed work with SMC_TX_CORK_DELAY (250ms), and the applications control how/when to send them out. It improves the performance for sendfile and throughput, and remove unnecessary race of lock_sock(). This also unlocks the limitation of sndbuf, and try to fill it up before sending. [1] https://linux.die.net/man/7/tcp [2] https://man7.org/linux/man-pages/man2/send.2.html Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-31 15:08:20 +00:00
Tony Lu	ea785a1a57	net/smc: Send directly when TCP_CORK is cleared According to the man page of TCP_CORK [1], if set, don't send out partial frames. All queued partial frames are sent when option is cleared again. When applications call setsockopt to disable TCP_CORK, this call is protected by lock_sock(), and tries to mod_delayed_work() to 0, in order to send pending data right now. However, the delayed work smc_tx_work is also protected by lock_sock(). There introduces lock contention for sending data. To fix it, send pending data directly which acts like TCP, without lock_sock() protected in the context of setsockopt (already lock_sock()ed), and cancel unnecessary dealyed work, which is protected by lock. [1] https://linux.die.net/man/7/tcp Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-31 15:08:20 +00:00
David S. Miller	01b2a99515	Merge branch 'hash-rethink' Akhmat Karakotov says: ==================== Make hash rethink configurable As it was shown in the report by Alexander Azimov, hash rethink at the client-side may lead to connection timeout toward stateful anycast services. Tom Herbert created a patchset to address this issue by applying hash rethink only after a negative routing event (3RTOs) [1]. This change also affects server-side behavior, which we found undesirable. This patchset changes defaults in a way to make them safe: hash rethink at the client-side is disabled and enabled at the server-side upon each RTO event or in case of duplicate acknowledgments. This patchset provides two options to change default behaviour. The hash rethink may be disabled at the server-side by the new sysctl option. Changes in the sysctl option don't affect default behavior at the client-side. Hash rethink can also be enabled/disabled with socket option or bpf syscalls which ovewrite both default and sysctl settings. This socket option is available on both client and server-side. This should provide mechanics to enable hash rethink inside administrative domain, such as DC, where hash rethink at the client-side can be desirable. [1] https://lore.kernel.org/netdev/20210809185314.38187-1-tom@herbertland.com/ v2: - Changed sysctl default to ENABLED in all patches. Reduced sysctl and socket option size to u8. Fixed netns bug reported by kernel test robot. v3: - Fixed bug with bad u8 comparison. Moved sk_txrehash to use less bytes in struct. Added WRITE_ONCE() in setsockopt in and READ_ONCE() in tcp_rtx_synack. v4: - Rebase and add documentation for sysctl option. v5: - Move sk_txrehash out of busy poll ifdef. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-31 15:05:25 +00:00
Akhmat Karakotov	cb6cd2cec7	tcp: Change SYN ACK retransmit behaviour to account for rehash Disabling rehash behavior did not affect SYN ACK retransmits because hash was forcefully changed bypassing the sk_rethink_hash function. This patch adds a condition which checks for rehash mode before resetting hash. Signed-off-by: Akhmat Karakotov <hmukos@yandex-team.ru> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-31 15:05:25 +00:00
Akhmat Karakotov	e7b9bfd184	bpf: Add SO_TXREHASH setsockopt Add bpf socket option to override rehash behaviour from userspace or from bpf. Signed-off-by: Akhmat Karakotov <hmukos@yandex-team.ru> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-31 15:05:25 +00:00
Akhmat Karakotov	2127324a7d	txhash: Add txrehash sysctl description Update Documentation/admin-guide/sysctl/net.rst with txrehash usage description. Signed-off-by: Akhmat Karakotov <hmukos@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-31 15:05:25 +00:00
Akhmat Karakotov	26859240e4	txhash: Add socket option to control TX hash rethink behavior Add the SO_TXREHASH socket option to control hash rethink behavior per socket. When default mode is set, sockets disable rehash at initialization and use sysctl option when entering listen state. setsockopt() overrides default behavior. Signed-off-by: Akhmat Karakotov <hmukos@yandex-team.ru> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-31 15:05:25 +00:00
Akhmat Karakotov	e187013abe	txhash: Make rethinking txhash behavior configurable via sysctl Add a per ns sysctl that controls the txhash rethink behavior: net.core.txrehash. When enabled, the same behavior is retained, when disabled, rethink is not performed. Sysctl is enabled by default. Signed-off-by: Akhmat Karakotov <hmukos@yandex-team.ru> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-31 15:05:24 +00:00
Gerhard Engleder	678dfd5280	selftests/net: timestamping: Fix bind_phc check timestamping checks socket options during initialisation. For the field bind_phc of the socket option SO_TIMESTAMPING it expects the value -1 if PHC is not bound. Actually the value of bind_phc is 0 if PHC is not bound. This results in the following output: SIOCSHWTSTAMP: tx_type 0 requested, got 0; rx_filter 0 requested, got 0 SO_TIMESTAMP 0 SO_TIMESTAMPNS 0 SO_TIMESTAMPING flags 0, bind phc 0 not expected, flags 0, bind phc -1 This is fixed by setting default value and expected value of bind_phc to 0. Fixes: `2214d70324` ("selftests/net: timestamping: support binding PHC") Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-31 11:44:04 +00:00
David S. Miller	116ea68dc7	Merge branch 'renesas-dead-code' Sergey Shtylyov says: ==================== Remove some dead code in the Renesas Ethernet drivers Here are 2 patches against DaveM's 'net-next.git' repo. The Renesas drivers call their ndo_stop() methods directly and they always return 0, making the result checks pointless, hence remove them... ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-31 11:42:13 +00:00
Sergey Shtylyov	e7d966f9ea	sh_eth: sh_eth_close() always returns 0 sh_eth_close() always returns 0, hence the check in sh_eth_wol_restore() is pointless (however we cannot change the prototype of sh_eth_close() as it implements the driver's ndo_stop() method). Found by Linux Verification Center (linuxtesting.org) with the SVACE static analysis tool. Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-31 11:42:13 +00:00
Sergey Shtylyov	be94a51f3e	ravb: ravb_close() always returns 0 ravb_close() always returns 0, hence the check in ravb_wol_restore() is pointless (however, we cannot change the prototype of ravb_close() as it implements the driver's ndo_stop() method). Found by Linux Verification Center (linuxtesting.org) with the SVACE static analysis tool. Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-31 11:42:13 +00:00
Wei Yongjun	cc4598cf17	net/fsl: xgmac_mdio: fix return value check in xgmac_mdio_probe() In case of error, the function devm_ioremap() returns NULL pointer not ERR_PTR(). The IS_ERR() test in the return value check should be replaced with NULL test. Fixes: `1d14eb15dc` ("net/fsl: xgmac_mdio: Use managed device resources") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-31 11:36:19 +00:00
David Ahern	47ed9442b2	ipv4: Make ip_idents_reserve static ip_idents_reserve is only used in net/ipv4/route.c. Make it static and remove the export. Signed-off-by: David Ahern <dsahern@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-31 11:33:10 +00:00
Heiner Kallweit	d192181c2c	r8169: add rtl_disable_exit_l1() Add rtl_disable_exit_l1() for ensuring that the chip doesn't inadvertently exit ASPM L1 when being in a low-power mode. The new function is called from rtl_prepare_power_down() which has to be moved in the code to avoid a forward declaration. According to Realtek OCP register 0xc0ac shadows ERI register 0xd4 on RTL8168 versions from RTL8168g. This allows to simplify the code a little. v2: - call rtl_disable_exit_l1() also if DASH or WoL are enabled Suggested-by: Chun-Hao Lin <hau@realtek.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-31 11:32:20 +00:00
Sergey Shtylyov	73c105ad2a	phy: make phy_set_max_speed() void After following the call tree of phy_set_max_speed(), it became clear that this function never returns anything but 0, so we can change its result type to void and drop the result checks from the three drivers that actually bothered to do it... Found by Linux Verification Center (linuxtesting.org) with the SVACE static analysis tool. Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-31 11:30:56 +00:00
David S. Miller	fe8930278c	Merge branch 'dsa-mv88e6xxx-Improve-indirect-addressing-performance' Tobias Waldekranz says: ==================== net: dsa: mv88e6xxx: Improve indirect addressing performance The individual patches have all the details. This work was triggered by recent work on a platform that took 16s (sic) to load the mv88e6xxx module. The first patch gets rid of most of that time by replacing a very long delay with a tighter poll loop to wait for the busy bit to clear. The second patch shaves off some more time by avoiding redundant busy-bit-checks, saving 1 out of 4 MDIO operations for every register read/write in the optimal case. v1 -> v2: - Make sure that we always poll the busy bit at least twice, in the unlikely event that the first one is quick to query the hardware, but is then scheduled out for a long time before the timeout is checked. v2 -> v3: - Fallback to the longer sleeps after the initial two poll attempts. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-31 11:29:13 +00:00
Tobias Waldekranz	7bca16b22e	net: dsa: mv88e6xxx: Improve indirect addressing performance Before this change, both the read and write callback would start out by asserting that the chip's busy flag was cleared. However, both callbacks also made sure to wait for the clearing of the busy bit before returning - making the initial check superfluous. The only time that would ever have an effect was if the busy bit was initially set for some reason. With that in mind, make sure to perform an initial check of the busy bit, after which both read and write can rely the previous operation to have waited for the bit to clear. This cuts the number of operations on the underlying MDIO bus by 25% Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-31 11:29:12 +00:00
Tobias Waldekranz	35da1dfd94	net: dsa: mv88e6xxx: Improve performance of busy bit polling Avoid a long delay when a busy bit is still set and has to be polled again. Measurements on a system with 2 Opals (6097F) and one Agate (6352) show that even with this much tighter loop, we have about a 50% chance of the bit being cleared on the first poll, all other accesses see the bit being cleared on the second poll. On a standard MDIO bus running MDC at 2.5MHz, a single access with 32 bits of preamble plus 32 bits of data takes 64(1/2.5MHz) = 25.6us. This means that mv88e6xxx_smi_direct_wait took 26us + CPU overhead in the fast scenario, but 26us + 1500us + 26us + CPU overhead in the slow case - bringing the average close to 1ms. With this change in place, the slow case is closer to 226us + CPU overhead, with the average well below 100us - a 10x improvement. This translates to real-world winnings. On a 3-chip 20-port system, the modprobe time drops by 88%: Before: root@coronet:~# time modprobe mv88e6xxx real 0m 15.99s user 0m 0.00s sys 0m 1.52s After: root@coronet:~# time modprobe mv88e6xxx real 0m 2.21s user 0m 0.00s sys 0m 1.54s Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-31 11:29:12 +00:00
Sun Shouxin	0da8aa00bf	net: bonding: Add support for IPV6 ns/na to balance-alb/balance-tlb mode Since ipv6 neighbor solicitation and advertisement messages isn't handled gracefully in bond6 driver, we can see packet drop due to inconsistency between mac address in the option message and source MAC . Another examples is ipv6 neighbor solicitation and advertisement messages from VM via tap attached to host bridge, the src mac might be changed through balance-alb mode, but it is not synced with Link-layer address in the option message. The patch implements bond6's tx handle for ipv6 neighbor solicitation and advertisement messages. Suggested-by: Hu Yadi <huyd12@chinatelecom.cn> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Sun Shouxin <sunshouxin@chinatelecom.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-31 11:24:12 +00:00
Jakub Kicinski	4f0e30407e	ipv4: drop fragmentation code from ip_options_build() Since v2.5.44 and addition of ip_options_fragment() ip_options_build() does not render headers for fragments directly. @is_frag is always 0. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-29 17:53:07 +00:00
David S. Miller	ff58831fa0	Merge branch 'Cadence-ZyncMP-SGMII' Robert Hancock says: ==================== Cadence MACB/GEM support for ZynqMP SGMII Changes to allow SGMII mode to work properly in the GEM driver on the Xilinx ZynqMP platform. Changes since v3: -more code formatting and error handling fixes Changes since v2: -fixed missing includes in DT binding example -fixed phy_init and phy_power_on error handling/cleanup, moved phy_power_on to open rather than probe Changes since v1: -changed order of controller reset and PHY init as per suggestion -switched device reset to be optional -updated bindings doc patch for switch to YAML ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-29 17:49:21 +00:00
Robert Hancock	e461bd6f43	arm64: dts: zynqmp: Added GEM reset definitions The Cadence GEM/MACB driver now utilizes the platform-level reset on the ZynqMP platform. Add reset definitions to the ZynqMP platform device tree to allow this to be used. Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-29 17:49:21 +00:00

1 2 3 4 5 ...

1072653 commits