linux-stable

Commit Graph

Author	SHA1	Message	Date
Ahmed Zaki	fb6e30a725	net: ethtool: pass a pointer to parameters to get/set_rxfh ethtool ops The get/set_rxfh ethtool ops currently takes the rxfh (RSS) parameters as direct function arguments. This will force us to change the API (and all drivers' functions) every time some new parameters are added. This is part 1/2 of the fix, as suggested in [1]: - First simplify the code by always providing a pointer to all params (indir, key and func); the fact that some of them may be NULL seems like a weird historic thing or a premature optimization. It will simplify the drivers if all pointers are always present. - Then make the functions take a dev pointer, and a pointer to a single struct wrapping all arguments. The set_* should also take an extack. Link: https://lore.kernel.org/netdev/20231121152906.2dd5f487@kernel.org/ [1] Suggested-by: Jakub Kicinski <kuba@kernel.org> Suggested-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Link: https://lore.kernel.org/r/20231213003321.605376-2-ahmed.zaki@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-13 22:07:16 -08:00
justinstitt@google.com	e403cffff1	net: Convert some ethtool_sprintf() to ethtool_puts() This patch converts some basic cases of ethtool_sprintf() to ethtool_puts(). The conversions are used in cases where ethtool_sprintf() was being used with just two arguments: | ethtool_sprintf(&data, buffer[i].name); or when it's used with format string: "%s" | ethtool_sprintf(&data, "%s", buffer[i].name); which both now become: | ethtool_puts(&data, buffer[i].name); Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-08 10:56:25 +00:00
William Tu	54f00cce11	vmxnet3: Add XDP support. The patch adds native-mode XDP support: XDP DROP, PASS, TX, and REDIRECT. Background: The vmxnet3 rx consists of three rings: ring0, ring1, and dataring. For r0 and r1, buffers at r0 are allocated using alloc_skb APIs and dma mapped to the ring's descriptor. If LRO is enabled and packet size is larger than 3K, VMXNET3_MAX_SKB_BUF_SIZE, then r1 is used to map the rest of the buffer larger than VMXNET3_MAX_SKB_BUF_SIZE. Each buffer in r1 is allocated using alloc_page. So for LRO packets, the payload will be in one buffer from r0 and multiple from r1, for non-LRO packets, only one descriptor in r0 is used for packet size less than 3k. When receiving a packet, the first descriptor will have the sop (start of packet) bit set, and the last descriptor will have the eop (end of packet) bit set. Non-LRO packets will have only one descriptor with both sop and eop set. Other than r0 and r1, vmxnet3 dataring is specifically designed for handling packets with small size, usually 128 bytes, defined in VMXNET3_DEF_RXDATA_DESC_SIZE, by simply copying the packet from the backend driver in ESXi to the ring's memory region at front-end vmxnet3 driver, in order to avoid memory mapping/unmapping overhead. In summary, packet size: A. < 128B: use dataring B. 128B - 3K: use ring0 (VMXNET3_RX_BUF_SKB) C. > 3K: use ring0 and ring1 (VMXNET3_RX_BUF_SKB + VMXNET3_RX_BUF_PAGE) As a result, the patch adds XDP support for packets using dataring and r0 (case A and B), not the large packet size when LRO is enabled. XDP Implementation: When a user loads an XDP prog, vmxnet3 driver checks configurations, such as mtu, lro, and re-allocates the rx buffer size for reserving the extra headroom, XDP_PACKET_HEADROOM, for XDP frame. The XDP prog will then be associated with every rx queue of the device. 
Note that when using dataring for small packet size, vmxnet3 (front-end driver) doesn't control the buffer allocation, as a result we allocate a new page and copy packet from the dataring to XDP frame. The receive side of XDP is implemented for case A and B, by invoking the bpf program at vmxnet3_rq_rx_complete and handle its returned action. The vmxnet3_process_xdp(), vmxnet3_process_xdp_small() function handles the ring0 and dataring case separately, and decides the next journey of the packet afterward. For TX, vmxnet3 has split header design. Outgoing packets are parsed first and protocol headers (L2/L3/L4) are copied to the backend. The rest of the payload are dma mapped. Since XDP_TX does not parse the packet protocol, the entire XDP frame is dma mapped for transmission and transmitted in a batch. Later on, the frame is freed and recycled back to the memory pool. Performance: Tested using two VMs inside one ESXi vSphere 7.0 machine, using single core on each vmxnet3 device, sender using DPDK testpmd tx-mode attached to vmxnet3 device, sending 64B or 512B UDP packet. VM1 txgen: $ dpdk-testpmd -l 0-3 -n 1 -- -i --nb-cores=3 \ --forward-mode=txonly --eth-peer=0,<mac addr of vm2> option: add "--txonly-multi-flow" option: use --txpkts=512 or 64 byte VM2 running XDP: $ ./samples/bpf/xdp_rxq_info -d ens160 -a <options> --skb-mode $ ./samples/bpf/xdp_rxq_info -d ens160 -a <options> options: XDP_DROP, XDP_PASS, XDP_TX To test REDIRECT to cpu 0, use $ ./samples/bpf/xdp_redirect_cpu -d ens160 -c 0 -e drop Single core performance comparison with skb-mode. 
64B: skb-mode -> native-mode XDP_DROP: 1.6Mpps -> 2.4Mpps XDP_PASS: 338Kpps -> 367Kpps XDP_TX: 1.1Mpps -> 2.3Mpps REDIRECT-drop: 1.3Mpps -> 2.3Mpps 512B: skb-mode -> native-mode XDP_DROP: 863Kpps -> 1.3Mpps XDP_PASS: 275Kpps -> 376Kpps XDP_TX: 554Kpps -> 1.2Mpps REDIRECT-drop: 659Kpps -> 1.2Mpps Demo: https://youtu.be/4lm1CSCi78Q Future work: - XDP frag support - use napi_consume_skb() instead of dev_kfree_skb_any at unmap - stats using u64_stats_t - using bitfield macro BIT() - optimization for DMA synchronization using actual frame length, instead of always max_len Signed-off-by: William Tu <u9012063@gmail.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-14 08:03:52 +01:00
Yunsheng Lin	b51f4113eb	net: introduce and use skb_frag_fill_page_desc() Most users use __skb_frag_set_page()/skb_frag_off_set()/ skb_frag_size_set() to fill the page desc for a skb frag. Introduce skb_frag_fill_page_desc() to do that. net/bpf/test_run.c does not call skb_frag_off_set() to set the offset, "copy_from_user(page_address(page), ...)" and 'shinfo' being part of the 'data' kzalloced in bpf_test_init() suggest that it is assuming offset to be initialized as zero, so call skb_frag_fill_page_desc() with offset being zero for this case. Also, skb_frag_set_page() is not used anymore, so remove it. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-05-13 19:47:56 +01:00
Seiji Nishikawa	6f4833383e	net: vmxnet3: Fix NULL pointer dereference in vmxnet3_rq_rx_complete() When vmxnet3_rq_create() fails to allocate rq->data_ring.base due to page allocation failure, subsequent call to vmxnet3_rq_rx_complete() can result in NULL pointer dereference. To fix this bug, check not only that rxDataRingUsed is true but also that adapter->rxdataring_enabled is true before calling memcpy() in vmxnet3_rq_rx_complete(). [1728352.477993] ethtool: page allocation failure: order:9, mode:0x6000c0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0 ... [1728352.478009] Call Trace: [1728352.478028] dump_stack+0x41/0x60 [1728352.478035] warn_alloc.cold.120+0x7b/0x11b [1728352.478038] ? _cond_resched+0x15/0x30 [1728352.478042] ? __alloc_pages_direct_compact+0x15f/0x170 [1728352.478043] __alloc_pages_slowpath+0xcd3/0xd10 [1728352.478047] __alloc_pages_nodemask+0x2e2/0x320 [1728352.478049] __dma_direct_alloc_pages.constprop.25+0x8a/0x120 [1728352.478053] dma_direct_alloc+0x5a/0x2a0 [1728352.478056] vmxnet3_rq_create.part.57+0x17c/0x1f0 [vmxnet3] ... [1728352.478188] vmxnet3 0000:0b:00.0 ens192: rx data ring will be disabled ... [1728352.515347] BUG: unable to handle kernel NULL pointer dereference at 0000000000000034 ... [1728352.515440] RIP: 0010:memcpy_orig+0x54/0x130 ... [1728352.515655] Call Trace: [1728352.515665] <IRQ> [1728352.515672] vmxnet3_rq_rx_complete+0x419/0xef0 [vmxnet3] [1728352.515690] vmxnet3_poll_rx_only+0x31/0xa0 [vmxnet3] ... Signed-off-by: Seiji Nishikawa <snishika@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-19 09:03:05 +01:00
Ronak Doshi	3bced313b9	vmxnet3: use gro callback when UPT is enabled Currently, vmxnet3 uses GRO callback only if LRO is disabled. However, on smartNic based setups where UPT is supported, LRO can be enabled from guest VM but UPT device does not support LRO as of now. In such cases, there can be performance degradation as GRO is not being done. This patch fixes this issue by calling GRO API when UPT is enabled. We use updateRxProd to determine if UPT mode is active or not. To clarify a few things discussed over the thread: The patch is not neglecting any feature bits nor disabling GRO. It uses GRO callback when UPT is active as LRO is not available in UPT. GRO callback cannot be used as default for all cases as it degrades performance for non-UPT cases or for cases when LRO is already done in ESXi. Cc: stable@vger.kernel.org Fixes: `6f91f4ba04` ("vmxnet3: add support for capability registers") Signed-off-by: Ronak Doshi <doshir@vmware.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230323200721.27622-1-doshir@vmware.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-03-24 19:13:49 -07:00
Ronak Doshi	ec76d0c2da	vmxnet3: move rss code block under eop descriptor Commit `b3973bb400` ("vmxnet3: set correct hash type based on rss information") added hashType information into skb. However, rssType field is populated for eop descriptor. This can lead to incorrect reporting of hashType for packets which use multiple rx descriptors. Multiple rx descriptors are used for Jumbo frame or LRO packets, which can hit this issue. This patch moves the RSS codeblock under eop descriptor. Cc: stable@vger.kernel.org Fixes: `b3973bb400` ("vmxnet3: set correct hash type based on rss information") Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Peng Li <lpeng@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Link: https://lore.kernel.org/r/20230208223900.5794-1-doshir@vmware.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-09 22:50:46 -08:00
Ronak Doshi	3d8f2c4269	vmxnet3: correctly report csum_level for encapsulated packet Commit `dacce2be33` ("vmxnet3: add geneve and vxlan tunnel offload support") added support for encapsulation offload. However, the patch did not correctly report the csum_level for encapsulated packet. This patch fixes this issue by reporting correct csum level for the encapsulated packet. Fixes: `dacce2be33` ("vmxnet3: add geneve and vxlan tunnel offload support") Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Peng Li <lpeng@vmware.com> Link: https://lore.kernel.org/r/20221220202556.24421-1-doshir@vmware.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-21 17:55:30 -08:00
Ronak Doshi	409e8ec8c5	vmxnet3: use correct intrConf reference when using extended queues Commit `39f9895a00` ("vmxnet3: add support for 32 Tx/Rx queues") added support for 32Tx/Rx queues. As a part of this patch, intrConf structure was extended to incorporate increased queues. This patch fixes the issue where incorrect reference is being used. Fixes: `39f9895a00` ("vmxnet3: add support for 32 Tx/Rx queues") Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-12-02 10:30:07 +00:00
Ronak Doshi	40b8c2a1af	vmxnet3: correctly report encapsulated LRO packet Commit `dacce2be33` ("vmxnet3: add geneve and vxlan tunnel offload support") added support for encapsulation offload. However, the patch did not correctly report the encapsulated packet which is LRO'ed by the hypervisor. This patch fixes this issue by using correct callback for the LRO'ed encapsulated packet. Fixes: `dacce2be33` ("vmxnet3: add geneve and vxlan tunnel offload support") Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-12-02 10:30:07 +00:00
Jakub Kicinski	b48b89f9c1	net: drop the weight argument from netif_napi_add We tell driver developers to always pass NAPI_POLL_WEIGHT as the weight to netif_napi_add(). This may be confusing to newcomers, drop the weight argument, those who really need to tweak the weight can use netif_napi_add_weight(). Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for CAN Link: https://lore.kernel.org/r/20220927132753.750069-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-28 18:57:14 -07:00
Wolfram Sang	fb3ceec187	net: move from strlcpy with unused retval to strscpy Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for CAN Link: https://lore.kernel.org/r/20220830201457.7984-1-wsa+renesas@sang-engineering.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-31 14:11:07 -07:00
Ronak Doshi	5b91884bf5	vmxnet3: do not reschedule napi for rx processing Commit 2c5a5748105a ("vmxnet3: add support for out of order rx completion") added support for out of order rx completion. Within that patch, an enhancement was done to reschedule napi for processing rx completions. However, it can lead to missing an interrupt. So, this patch reverts that part of the code. Fixes: `2c5a574810` ("vmxnet3: add support for out of order rx completion") Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-29 12:11:23 +01:00
Andrey Turkin	ffcdd1197d	vmxnet3: Implement ethtool's get_channels command Some tools (e.g. libxdp) use that information. Signed-off-by: Andrey Turkin <andrey.turkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-20 10:21:15 +01:00
Andrey Turkin	bdeed8b095	vmxnet3: Record queue number to incoming packets Make generic XDP processing attribute packets to their actual queues instead of queue #0. This improves AF_XDP performance considerably since softirq threads no longer fight over single AF_XDP socket spinlock. Signed-off-by: Andrey Turkin <andrey.turkin@gmail.com> Link: https://lore.kernel.org/r/20220717022050.822766-2-andrey.turkin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-18 20:38:53 -07:00
Ronak Doshi	a56b158a50	vmxnet3: disable overlay offloads if UPT device does not support Commit `6f91f4ba04` ("vmxnet3: add support for capability registers") added support for capability registers. These registers are used to advertize capabilities of the device. The patch updated the dev_caps to disable outer checksum offload if PTCR register does not support it. However, it missed to update other overlay offloads. This patch fixes this issue. Fixes: `6f91f4ba04` ("vmxnet3: add support for capability registers") Signed-off-by: Ronak Doshi <doshir@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-20 09:12:46 +01:00
Ronak Doshi	acc38e041b	vmxnet3: update to version 7 With all vmxnet3 version 7 changes incorporated in the vmxnet3 driver, the driver can configure emulation to run at vmxnet3 version 7, provided the emulation advertises support for version 7. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-06-09 12:42:01 +02:00
Ronak Doshi	60cafa0395	vmxnet3: use ext1 field to indicate encapsulated packet Till vmxnet3 version 6, om field of transmit descriptor was used to indicate encapsulated offload packet and msscof was used to indirectly indicate TSO/CSO. From version 7 and later, ext1 field will be used to indicate whether packet is encapsulated or not and om fields will continue to indicate if the packet is TSO or CSO. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-06-09 12:42:01 +02:00
Ronak Doshi	d2857b99a7	vmxnet3: limit number of TXDs used for TSO packet Currently, vmxnet3 does not have a limit on number of descriptors used for a TSO packet. However, with UPT, for hardware performance reasons, this patch limits the number of transmit descriptors to 24 for a TSO packet. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-06-09 12:42:01 +02:00
Ronak Doshi	c7112ebd27	vmxnet3: add command to set ring buffer sizes This patch adds a new command to set ring buffer sizes. This is required to pass the buffer size information to passthrough devices. For performance reasons, with version7 and later, ring1 will contain only mtu size buffers (bound to 3K). Packets > 3K will use both ring1 and ring2. Also, ring sizes are round down to power of 2 and ring2 default size is increased to 512. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-06-09 12:42:01 +02:00
Ronak Doshi	2c5a574810	vmxnet3: add support for out of order rx completion Currently, vmxnet3 processes rx completions in-order i.e. no out of order completion descriptor is expected. With UPT, if hardware supports LRO, then hardware can report out of order rx completions. This patch enhances vmxnet3 to add this support. This support takes effect only when the corresponding feature bit is set. Also, minor enhancements are done for performance. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-06-09 12:42:01 +02:00
Ronak Doshi	543fb67405	vmxnet3: add support for large passthrough BAR register For vmxnet3 to work in UPT mode, the BAR sizes have been increased. The PT page has been extended to 2 pages and also includes OOB pages as a part of PT BAR. This patch enhances vmxnet3 to use appropriate BAR offsets based on the capability registered. To use new offsets, VMXNET3_CAP_LARGE_BAR needs to be set by the device. If it is not set then the device will use legacy PT page layout. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-06-09 12:42:00 +02:00
Ronak Doshi	6f91f4ba04	vmxnet3: add support for capability registers This patch enhances vmxnet3 to support capability registers which allows it to enable features selectively. The DCR register tracks the capabilities vmxnet3 device supports. The PTCR register states the capabilities that the passthrough device supports. With the help of these registers, vmxnet3 can enable only those features which the passthrough device supports. This allows smooth transition to Uniform-Passthrough (UPT) mode if the virtual nic requests it. If PTCR register returns nothing or error it means UPT is not being requested and vnic will continue in emulation mode. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-06-09 12:42:00 +02:00
Ronak Doshi	55f0395fca	vmxnet3: prepare for version 7 changes vmxnet3 is currently at version 6 and this patch initiates the preparation to accommodate changes for up to version 7. Introduced utility macros for vmxnet3 version 7 comparison and update Copyright information. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-06-09 12:42:00 +02:00
Zixuan Fu	edf410cb74	net: vmxnet3: fix possible NULL pointer dereference in vmxnet3_rq_cleanup() In vmxnet3_rq_create(), when dma_alloc_coherent() fails, vmxnet3_rq_destroy() is called. It sets rq->rx_ring[i].base to NULL. Then vmxnet3_rq_create() returns an error to its callers vmxnet3_rq_create_all() -> vmxnet3_change_mtu(). Then vmxnet3_change_mtu() calls vmxnet3_force_close() -> dev_close() in error handling code. And the driver calls vmxnet3_close() -> vmxnet3_quiesce_dev() -> vmxnet3_rq_cleanup_all() -> vmxnet3_rq_cleanup(). In vmxnet3_rq_cleanup(), rq->rx_ring[ring_idx].base is accessed, but this variable is NULL, causing a NULL pointer dereference. To fix this possible bug, an if statement is added to check whether rq->rx_ring[0].base is NULL in vmxnet3_rq_cleanup() and exit early if so. The error log in our fault-injection testing is shown as follows: [ 65.220135] BUG: kernel NULL pointer dereference, address: 0000000000000008 ... [ 65.222633] RIP: 0010:vmxnet3_rq_cleanup_all+0x396/0x4e0 [vmxnet3] ... [ 65.227977] Call Trace: ... [ 65.228262] vmxnet3_quiesce_dev+0x80f/0x8a0 [vmxnet3] [ 65.228580] vmxnet3_close+0x2c4/0x3f0 [vmxnet3] [ 65.228866] __dev_close_many+0x288/0x350 [ 65.229607] dev_close_many+0xa4/0x480 [ 65.231124] dev_close+0x138/0x230 [ 65.231933] vmxnet3_force_close+0x1f0/0x240 [vmxnet3] [ 65.232248] vmxnet3_change_mtu+0x75d/0x920 [vmxnet3] ... Fixes: `d1a890fa37` ("net: VMware virtual Ethernet NIC driver: vmxnet3") Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Signed-off-by: Zixuan Fu <r33s3n6@gmail.com> Link: https://lore.kernel.org/r/20220514050711.2636709-1-r33s3n6@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-05-17 12:03:52 +02:00
Zixuan Fu	9e7fef9521	net: vmxnet3: fix possible use-after-free bugs in vmxnet3_rq_alloc_rx_buf() In vmxnet3_rq_alloc_rx_buf(), when dma_map_single() fails, rbi->skb is freed immediately. Similarly, in another branch, when dma_map_page() fails, rbi->page is also freed. In the two cases, vmxnet3_rq_alloc_rx_buf() returns an error to its callers vmxnet3_rq_init() -> vmxnet3_rq_init_all() -> vmxnet3_activate_dev(). Then vmxnet3_activate_dev() calls vmxnet3_rq_cleanup_all() in error handling code, and rbi->skb or rbi->page are freed again in vmxnet3_rq_cleanup_all(), causing use-after-free bugs. To fix these possible bugs, rbi->skb and rbi->page should be cleared after they are freed. The error log in our fault-injection testing is shown as follows: [ 14.319016] BUG: KASAN: use-after-free in consume_skb+0x2f/0x150 ... [ 14.321586] Call Trace: ... [ 14.325357] consume_skb+0x2f/0x150 [ 14.325671] vmxnet3_rq_cleanup_all+0x33a/0x4e0 [vmxnet3] [ 14.326150] vmxnet3_activate_dev+0xb9d/0x2ca0 [vmxnet3] [ 14.326616] vmxnet3_open+0x387/0x470 [vmxnet3] ... [ 14.361675] Allocated by task 351: ... [ 14.362688] __netdev_alloc_skb+0x1b3/0x6f0 [ 14.362960] vmxnet3_rq_alloc_rx_buf+0x1b0/0x8d0 [vmxnet3] [ 14.363317] vmxnet3_activate_dev+0x3e3/0x2ca0 [vmxnet3] [ 14.363661] vmxnet3_open+0x387/0x470 [vmxnet3] ... [ 14.367309] [ 14.367412] Freed by task 351: ... 
[ 14.368932] __dev_kfree_skb_any+0xd2/0xe0 [ 14.369193] vmxnet3_rq_alloc_rx_buf+0x71e/0x8d0 [vmxnet3] [ 14.369544] vmxnet3_activate_dev+0x3e3/0x2ca0 [vmxnet3] [ 14.369883] vmxnet3_open+0x387/0x470 [vmxnet3] [ 14.370174] __dev_open+0x28a/0x420 [ 14.370399] __dev_change_flags+0x192/0x590 [ 14.370667] dev_change_flags+0x7a/0x180 [ 14.370919] do_setlink+0xb28/0x3570 [ 14.371150] rtnl_newlink+0x1160/0x1740 [ 14.371399] rtnetlink_rcv_msg+0x5bf/0xa50 [ 14.371661] netlink_rcv_skb+0x1cd/0x3e0 [ 14.371913] netlink_unicast+0x5dc/0x840 [ 14.372169] netlink_sendmsg+0x856/0xc40 [ 14.372420] ____sys_sendmsg+0x8a7/0x8d0 [ 14.372673] __sys_sendmsg+0x1c2/0x270 [ 14.372914] do_syscall_64+0x41/0x90 [ 14.373145] entry_SYSCALL_64_after_hwframe+0x44/0xae ... Fixes: `5738a09d58` ("vmxnet3: fix checks for dma mapping errors") Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Signed-off-by: Zixuan Fu <r33s3n6@gmail.com> Link: https://lore.kernel.org/r/20220514050656.2636588-1-r33s3n6@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-05-17 12:02:27 +02:00
Christophe JAILLET	c38f306839	vmxnet3: Remove useless DMA-32 fallback configuration As stated in [1], dma_set_mask() with a 64-bit mask never fails if dev->dma_mask is non-NULL. So, if it fails, the 32 bits case will also fail for the same reason. So if dma_set_mask_and_coherent() succeeds, 'dma64' is known to be 'true'. Simplify code and remove some dead code accordingly. [1]: https://lkml.org/lkml/2021/6/7/398 Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/43e5dcf1a5e9e9c5d2d86f87810d6e93e3d22e32.1641718188.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-01-09 16:52:19 -08:00
Jakub Kicinski	3150a73366	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-09 13:23:02 -08:00
Ronak Doshi	f71ef02f1a	vmxnet3: fix minimum vectors alloc issue Commit `39f9895a00` ("vmxnet3: add support for 32 Tx/Rx queues") added support for 32Tx/Rx queues. Within that patch, value of VMXNET3_LINUX_MIN_MSIX_VECT was updated. However, there is a case (numvcpus = 2) which actually requires 3 intrs which matches VMXNET3_LINUX_MIN_MSIX_VECT which then is treated as failure by stack to allocate more vectors. This patch fixes this issue. Fixes: `39f9895a00` ("vmxnet3: add support for 32 Tx/Rx queues") Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Link: https://lore.kernel.org/r/20211207081737.14000-1-doshir@vmware.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-08 17:56:39 -08:00
Hao Chen	7462494408	ethtool: extend ringparam setting/getting API with rx_buf_len Add two new parameters kernel_ringparam and extack for .get_ringparam and .set_ringparam to extend more ring params through netlink. Signed-off-by: Hao Chen <chenhao288@hisilicon.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-22 12:31:49 +00:00
Jean Sacren	1d6d336fed	net: vmxnet3: remove multiple false checks in vmxnet3_ethtool.c In one if branch, (ec->rx_coalesce_usecs != 0) is checked. When it is checked again in two more places, it is always false and has no effect on the whole check expression. We should remove it in both places. In another if branch, (ec->use_adaptive_rx_coalesce != 0) is checked. When it is checked again, it is always false. We should remove the entire branch with it. In addition we might as well let C precedence dictate by getting rid of two pairs of parentheses in the neighboring lines in order to keep expressions on both sides of '||' in balance with checkpatch warning silenced. Signed-off-by: Jean Sacren <sakiwit@gmail.com> Link: https://lore.kernel.org/r/20211031012728.8325-1-sakiwit@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-11-01 16:35:27 -07:00
Jakub Kicinski	7df621a3ee	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net include/net/sock.h `7b50ecfcc6` ("net: Rename ->stream_memory_read to ->sock_is_readable") `4c1e34c0db` ("vsock: Enable y2038 safe timeval for timeout") drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c `0daa55d033` ("octeontx2-af: cn10k: debugfs for dumping LMTST map table") `e77bcdd1f6` ("octeontx2-af: Display all enabled PF VF rsrc_alloc entries.") Adjacent code addition in both cases, keep both. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-28 10:43:58 -07:00
Dongli Zhang	9159f10240	vmxnet3: do not stop tx queues after netif_device_detach() The netif_device_detach() conditionally stops all tx queues if the queues are running. There is no need to call netif_tx_stop_all_queues() again. Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-28 12:51:17 +01:00
Jakub Kicinski	8bc7823ed3	net: drivers: get ready for const netdev->dev_addr Commit `406f42fa0d` ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. We will make netdev->dev_addr a const. Make sure local references to netdev->dev_addr are constant. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-24 13:59:45 +01:00
Jakub Kicinski	ea52a0b58e	net: use dev_addr_set() Use dev_addr_set() instead of writing directly to netdev->dev_addr in various misc and old drivers. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-09 11:55:01 +01:00
Yufeng Mo	f3ccfda193	ethtool: extend coalesce setting uAPI with CQE mode In order to support more coalesce parameters through netlink, add two new parameters kernel_coal and extack for .set_coalesce and .get_coalesce, then some extra info can return to user with the netlink API. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-24 07:38:29 -07:00
Christophe JAILLET	bf7bec4620	vmxnet3: switch from 'pci_' to 'dma_' API The wrappers in include/linux/pci-dma-compat.h should go away. The patch has been generated with the coccinelle script below. It has been hand modified to use 'dma_set_mask_and_coherent()' instead of 'pci_set_dma_mask()/pci_set_consistent_dma_mask()' when applicable. This is less verbose. The explicit 'err = -EIO;' has been removed because 'dma_set_mask_and_coherent()' returns 0 or -EIO, so its return code can be used directly. It has been compile tested. @@ @@ - PCI_DMA_BIDIRECTIONAL + DMA_BIDIRECTIONAL @@ @@ - PCI_DMA_TODEVICE + DMA_TO_DEVICE @@ @@ - PCI_DMA_FROMDEVICE + DMA_FROM_DEVICE @@ @@ - PCI_DMA_NONE + DMA_NONE @@ expression e1, e2, e3; @@ - pci_alloc_consistent(e1, e2, e3) + dma_alloc_coherent(&e1->dev, e2, e3, GFP_) @@ expression e1, e2, e3; @@ - pci_zalloc_consistent(e1, e2, e3) + dma_alloc_coherent(&e1->dev, e2, e3, GFP_) @@ expression e1, e2, e3, e4; @@ - pci_free_consistent(e1, e2, e3, e4) + dma_free_coherent(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_map_single(e1, e2, e3, e4) + dma_map_single(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_unmap_single(e1, e2, e3, e4) + dma_unmap_single(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4, e5; @@ - pci_map_page(e1, e2, e3, e4, e5) + dma_map_page(&e1->dev, e2, e3, e4, e5) @@ expression e1, e2, e3, e4; @@ - pci_unmap_page(e1, e2, e3, e4) + dma_unmap_page(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_map_sg(e1, e2, e3, e4) + dma_map_sg(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_unmap_sg(e1, e2, e3, e4) + dma_unmap_sg(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_single_for_cpu(e1, e2, e3, e4) + dma_sync_single_for_cpu(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_single_for_device(e1, e2, e3, e4) + dma_sync_single_for_device(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_sg_for_cpu(e1, e2, e3, e4) + 
dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_sg_for_device(e1, e2, e3, e4) + dma_sync_sg_for_device(&e1->dev, e2, e3, e4) @@ expression e1, e2; @@ - pci_dma_mapping_error(e1, e2) + dma_mapping_error(&e1->dev, e2) @@ expression e1, e2; @@ - pci_set_dma_mask(e1, e2) + dma_set_mask(&e1->dev, e2) @@ expression e1, e2; @@ - pci_set_consistent_dma_mask(e1, e2) + dma_set_coherent_mask(&e1->dev, e2) Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-23 12:02:28 +01:00
Ronak Doshi	ce2639ad69	vmxnet3: update to version 6 With all vmxnet3 version 6 changes incorporated in the vmxnet3 driver, the driver can configure emulation to run at vmxnet3 version 6, provided the emulation advertises support for version 6. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 17:32:14 -07:00
Ronak Doshi	8c5663e461	vmxnet3: increase maximum configurable mtu to 9190 This patch increases the maximum configurable mtu to 9190 to accommodate jumbo packets of overlay traffic. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 17:32:14 -07:00
Ronak Doshi	b3973bb400	vmxnet3: set correct hash type based on rss information As vmxnet3 supports IP/TCP/UDP RSS, this patch sets appropriate hash type based on the type of RSS performed. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 17:32:14 -07:00
Ronak Doshi	79d124bb36	vmxnet3: add support for ESP IPv6 RSS Vmxnet3 version 4 added support for ESP RSS. However, only IPv4 was supported. With vmxnet3 version 6, this patch enables RSS for ESP IPv6 packets as well. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 17:32:14 -07:00
Ronak Doshi	15ccf2f4b0	vmxnet3: remove power of 2 limitation on the queues With version 6, vmxnet3 relaxes the restriction on queues to be power of two. This is helpful in cases (Edge VM) where vcpus are less than 8 and device requires more than 4 queues. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 17:32:14 -07:00
Ronak Doshi	39f9895a00	vmxnet3: add support for 32 Tx/Rx queues Currently, vmxnet3 supports maximum of 8 Tx/Rx queues. With increase in number of vcpus on a VM, to achieve better performance and utilize idle vcpus, we need to increase the max number of queues supported. This patch enhances vmxnet3 to support maximum of 32 Tx/Rx queues. Increasing the Rx queues also increases the probability of distributing the traffic from different flows to different queues with RSS. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 17:32:14 -07:00
Ronak Doshi	69dbef0d1c	vmxnet3: prepare for version 6 changes vmxnet3 is currently at version 4 and this patch initiates the preparation to accommodate changes for up to version 6. Introduced utility macros for vmxnet3 version 6 comparison and update Copyright information. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 17:32:14 -07:00
Ronak Doshi	b22580233d	vmxnet3: fix cksum offload issues for tunnels with non-default udp ports Commit `dacce2be33` ("vmxnet3: add geneve and vxlan tunnel offload support") added support for encapsulation offload. However, the inner offload capability is to be restricted to UDP tunnels with default Vxlan and Geneve ports. This patch fixes the issue for tunnels with non-default ports using features check capability and filtering appropriate features for such tunnels. Fixes: `dacce2be33` ("vmxnet3: add geneve and vxlan tunnel offload support") Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-02 13:41:15 -07:00
Alexander Duyck	3b78b3067f	vmxnet3: Update driver to use ethtool_sprintf So this patch actually does 3 things. First it removes a stray white space at the start of the variable declaration in vmxnet3_get_strings. Second it flips the logic for the string test so that we exit immediately if we are not looking for the stats strings. Doing this we can avoid unnecessary indentation and line wrapping. Then finally it updates the code to use ethtool_sprintf rather than a memcpy and pointer increment to write the ethtool strings. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-17 11:42:31 -07:00
Ronak Doshi	de1da8bcf4	vmxnet3: Remove buf_info from device accessible structures buf_info structures in RX & TX queues are private driver data that do not need to be visible to the device. Although there is physical address and length in the queue descriptor that points to these structures, their layout is not standardized, and device never looks at them. So let's allocate these structures in non-DMA-able memory, and fill physical address as all-ones and length as zero in the queue descriptor. That should alleviate worries brought by Martin Radev in https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20210104/022829.html that malicious vmxnet3 device could subvert SVM/TDX guarantees. Signed-off-by: Petr Vandrovec <petr@vmware.com> Signed-off-by: Ronak Doshi <doshir@vmware.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 21:07:03 -08:00
Ronak Doshi	1dac3b1bc6	vmxnet3: fix cksum offload issues for non-udp tunnels Commit `dacce2be33` ("vmxnet3: add geneve and vxlan tunnel offload support") added support for encapsulation offload. However, the inner offload capability is to be restricted to UDP tunnels. This patch fixes the issue for non-udp tunnels by adding features check capability and filtering appropriate features for non-udp tunnels. Fixes: `dacce2be33` ("vmxnet3: add geneve and vxlan tunnel offload support") Signed-off-by: Ronak Doshi <doshir@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-25 16:41:40 -07:00
Gustavo A. R. Silva	df561f6688	treewide: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and their variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>	2020-08-23 17:36:59 -05:00
Ronak Doshi	8a7f280f29	vmxnet3: use correct tcp hdr length when packet is encapsulated Commit `dacce2be33` ("vmxnet3: add geneve and vxlan tunnel offload support") added support for encapsulation offload. However, while calculating tcp hdr length, it does not take into account if the packet is encapsulated or not. This patch fixes this issue by using correct reference for inner tcp header. Fixes: `dacce2be33` ("vmxnet3: add geneve and vxlan tunnel offload support") Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-10 12:09:38 -07:00

259 Commits