linux-stable

Commit Graph

Author	SHA1	Message	Date
Moshe Shemesh	c12f4c6ac3	net/mlx5: Move fw reset unload to mlx5_fw_reset_complete_reload Refactor fw reset code to have the unload driver part done on mlx5_fw_reset_complete_reload(), so if it was called by the PF which initiated the reload fw activate flow, the unload part will be handled by the mlx5_devlink_reload_fw_activate() callback itself and not by the reset event work. This will be used by the downstream patch to invoke devlink reload callbacks with devlink lock held. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 21:58:46 -07:00
Jiri Pirko	2dec18ad82	net: devlink: remove region snapshots list dependency on devlink->lock After mlx4 driver is converted to do locked reload, devlink_region_snapshot_create() may be called from both locked and unlocked context. Note that in mlx4 region snapshots could be created on any command failure. That can happen in any flow that involves commands to FW, which means most of the driver flows. So resolve this by removing dependency on devlink->lock for region snapshots list consistency and introduce new mutex to ensure it. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 21:58:46 -07:00
Jiri Pirko	5502e8712c	net: devlink: remove region snapshot ID tracking dependency on devlink->lock After mlx4 driver is converted to do locked reload, functions to get/put regions snapshot ID may be called from both locked and unlocked context. So resolve this by removing dependency on devlink->lock for region snapshot ID tracking by using internal xa_lock() to maintain shapshot_ids xa_array consistency. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 21:58:46 -07:00
Jakub Kicinski	1515a1b899	Merge branch 'add-framework-for-selftests-in-devlink' Vikas Gupta says: ==================== add framework for selftests in devlink Add support for selftests in the devlink framework. Adds a callback .selftests_check and .selftests_run in devlink_ops. User can add test(s) suite which is subsequently passed to the driver and driver can opt for running particular tests based on its capabilities. Patchset adds a flash based test for the bnxt_en driver. ==================== Link: https://lore.kernel.org/r/20220727165721.37959-1-vikas.gupta@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 21:57:11 -07:00
vikas	5b6ff128fd	bnxt_en: implement callbacks for devlink selftests Add callbacks ============= .selftest_check: returns true for flash selftest. .selftest_run: runs a flash selftest. Also, refactor NVM APIs so that they can be used with devlink and ethtool both. Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 21:56:53 -07:00
Vikas Gupta	08f588fa30	devlink: introduce framework for selftests Add a framework for running selftests. Framework exposes devlink commands and test suite(s) to the user to execute and query the supported tests by the driver. Below are new entries in devlink_nl_ops devlink_nl_cmd_selftests_show_doit/dumpit: To query the supported selftests by the drivers. devlink_nl_cmd_selftests_run: To execute selftests. Users can provide a test mask for executing group tests or standalone tests. Documentation/networking/devlink/ path is already part of MAINTAINERS & the new files come under this path. Hence no update needed to the MAINTAINERS Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 21:56:53 -07:00
Jakub Kicinski	68be7b82e7	Merge branch 'mlx5e-use-tls-tx-pool-to-improve-connection-rate' Tariq Toukan says: ==================== mlx5e use TLS TX pool to improve connection rate To offload encryption operations, the mlx5 device maintains state and keeps track of every kTLS device-offloaded connection. Two HW objects are used per TX context of a kTLS offloaded connection: a. Transport interface send (TIS) object, to reach the HW context. b. Data Encryption Key (DEK) to perform the crypto operations. These two objects are created and destroyed per TLS TX context, via FW commands. In total, 4 FW commands are issued per TLS TX context, which seriously limits the connection rate. In this series, we aim to save creation and destroy of TIS objects by recycling them. Upon recycling of a TIS, the HW still needs to be notified for the re-mapping between a TIS and a context. This is done by posting WQEs via an SQ, significantly faster API than the FW command interface. A pool is used for recycling. The pool dynamically interacts to the load and connection rate, growing and shrinking accordingly. Saving the TIS FW commands per context increases connection rate by ~42%, from 11.6K to 16.5K connections per sec. Connection rate is still limited by FW bottleneck due to the remaining per context FW commands (DEK create/destroy). This will soon be addressed in a followup series. By combining the two series, the FW bottleneck will be released, and a significantly higher (about 100K connections per sec) kTLS TX device-offloaded connection rate is reached. ==================== Link: https://lore.kernel.org/r/20220727094346.10540-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 21:50:57 -07:00
Tariq Toukan	624bf09921	net/mlx5e: kTLS, Dynamically re-size TX recycling pool Let the TLS TX recycle pool be more flexible in size, by continuously and dynamically allocating and releasing HW resources in response to changes in the connections rate and load. Allocate and release pool entries in bulks (16). Use a workqueue to release/allocate in the background. Allocate a new bulk when the pool size goes lower than the low threshold (1K). Symmetric operation is done when the pool size gets greater than the upper threshold (4K). Every idle pool entry holds: 1 TIS, 1 DEK (HW resources), in addition to ~100 bytes in host memory. Start with an empty pool to minimize memory and HW resources waste for non-TLS users that have the device-offload TLS enabled. Upon a new request, in case the pool is empty, do not wait for a whole bulk allocation to complete. Instead, trigger an instant allocation of a single resource to reduce latency. Performance tests: Before: 11,684 CPS After: 16,556 CPS Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 21:50:55 -07:00
Tariq Toukan	c4dfe704f5	net/mlx5e: kTLS, Recycle objects of device-offloaded TLS TX connections The transport interface send (TIS) object is responsible for performing all transport related operations of the transmit side. The ConnectX HW uses a TIS object to save and access the TLS crypto information and state of an offloaded TX kTLS connection. Before this patch, we used to create a new TIS per connection and destroy it once it’s closed. Every create and destroy of a TIS is a FW command. Same applies for the private TLS context, where we used to dynamically allocate and free it per connection. Resources recycling reduce the impact of the allocation/free operations and helps speeding up the connection rate. In this feature we maintain a pool of TX objects and use it to recycle the resources instead of re-creating them per connection. A cached TIS popped from the pool is updated to serve the new connection via the fast-path HW interface, updating the tls static and progress params. This is a very fast operation, significantly faster than FW commands. On recycling, a WQE fence is required after the context params change. This guarantees that the data is sent after the context has been successfully updated in hardware, and that the context modification doesn't interfere with existing traffic. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 21:50:55 -07:00
Tariq Toukan	23b1cf1e3f	net/mlx5e: kTLS, Take stats out of OOO handler Let the caller of mlx5e_ktls_tx_handle_ooo() take care of updating the stats, according to the returned value. As the switch/case blocks are already there, this change saves unnecessary branches in the handler. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 21:50:54 -07:00
Tariq Toukan	da6682faa8	net/mlx5e: kTLS, Introduce TLS-specific create TIS TLS TIS objects have a defined role in mapping and reaching the HW TLS contexts. Some standard TIS attributes (like LAG port affinity) are not relevant for them. Use a dedicated TLS TIS create function instead of the generic mlx5e_create_tis. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 21:50:54 -07:00
Tariq Toukan	7adc91e0c9	net/tls: Multi-threaded calls to TX tls_dev_del Multiple TLS device-offloaded contexts can be added in parallel via concurrent calls to .tls_dev_add, while calls to .tls_dev_del are sequential in tls_device_gc_task. This is not a sustainable behavior. This creates a rate gap between add and del operations (addition rate outperforms the deletion rate). When running for enough time, the TLS device resources could get exhausted, failing to offload new connections. Replace the single-threaded garbage collector work with a per-context alternative, so they can be handled on several cores in parallel. Use a new dedicated destruct workqueue for this. Tested with mlx5 device: Before: 22141 add/sec, 103 del/sec After: 11684 add/sec, 11684 del/sec Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 21:50:54 -07:00
Tariq Toukan	113671b255	net/tls: Perform immediate device ctx cleanup when possible TLS context destructor can be run in atomic context. Cleanup operations for device-offloaded contexts could require access and interaction with the device callbacks, which might sleep. Hence, the cleanup of such contexts must be deferred and completed inside an async work. For all others, this is not necessary, as cleanup is atomic. Invoke cleanup immediately for them, avoiding queueing redundant gc work. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 21:50:54 -07:00
Yang Li	8fd1e15177	tls: rx: Fix unsigned comparison with less than zero The return from the call to tls_rx_msg_size() is int, it can be a negative error code, however this is being assigned to an unsigned long variable 'sz', so making 'sz' an int. Eliminate the following coccicheck warning: ./net/tls/tls_strp.c:211:6-8: WARNING: Unsigned expression compared with zero: sz < 0 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Link: https://lore.kernel.org/r/20220728031019.32838-1-yang.lee@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 21:50:39 -07:00
Jakub Kicinski	37e2618834	Merge branch 'tls-rx-follow-ups-to-rx-work' Jakub Kicinski says: ==================== tls: rx: follow ups to rx work A selection of unrelated changes. First some selftest polishing. Next a change to rcvtimeo handling for locking based on an exchange with Eric. Follow up to Paolo's comments from yesterday. Last but not least a fix to a false positive warning, turns out I've been testing with DEBUG_NET=n this whole time. ==================== Link: https://lore.kernel.org/r/20220727031524.358216-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 21:50:02 -07:00
Jakub Kicinski	e20691fa36	tls: rx: fix the false positive warning I went too far in the accessor conversion, we can't use tls_strp_msg() after decryption because the message may not be ready. What we care about on this path is that the output skb is detached, i.e. we didn't somehow just turn around and used the input skb with its TCP data still attached. So look at the anchor directly. Fixes: `84c61fe1a7` ("tls: rx: do not use the standard strparser") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 21:50:00 -07:00
Jakub Kicinski	d11ef9cc5a	tls: strp: rename and multithread the workqueue Paolo points out that there seems to be no strong reason strparser users a single threaded workqueue. Perhaps there were some performance or pinning considerations? Since we don't know (and it's the slow path) let's default to the most natural, multi-threaded choice. Also rename the workqueue to "tls-". Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 21:49:59 -07:00
Jakub Kicinski	70f03fc2fc	tls: rx: don't consider sock_rcvtimeo() cumulative Eric indicates that restarting rcvtimeo on every wait may be fine. I thought that we should consider it cumulative, and made tls_rx_reader_lock() return the remaining timeo after acquiring the reader lock. tls_rx_rec_wait() gets its timeout passed in by value so it does not keep track of time previously spent. Make the lock waiting consistent with tls_rx_rec_wait() - don't keep track of time spent. Read the timeo fresh in tls_rx_rec_wait(). It's unclear to me why callers are supposed to cache the value. Link: https://lore.kernel.org/all/CANn89iKcmSfWgvZjzNGbsrndmCch2HC_EPZ7qmGboDNaWoviNQ@mail.gmail.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 21:49:59 -07:00
Jakub Kicinski	86c591fb91	selftests: tls: handful of memrnd() and length checks Add a handful of memory randomizations and precise length checks. Nothing is really broken here, I did this to increase confidence when debugging. It does fix a GCC warning, tho. Apparently GCC recognizes that memory needs to be initialized for send() but does not recognize that for write(). Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 21:49:59 -07:00
Xie Shaowen	efe3e6b5ae	net: usb: delete extra space and tab in blank line delete extra space and tab in blank line, there is no functional change. Signed-off-by: Xie Shaowen <studentxswpy@163.com> Link: https://lore.kernel.org/r/20220727081253.3043941-1-studentxswpy@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 21:48:20 -07:00
Jakub Kicinski	272ac32f56	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 18:21:16 -07:00
Daniel Müller	64893e83f9	libbpf: Support PPC in arch_specific_syscall_pfx Commit `708ac5bea0` ("libbpf: add ksyscall/kretsyscall sections support for syscall kprobes") added the arch_specific_syscall_pfx() function, which returns a string representing the architecture in use. As it turns out this function is currently not aware of Power PC, where NULL is returned. That's being flagged by the libbpf CI system, which builds for ppc64le and the compiler sees a NULL pointer being passed in to a %s format string. With this change we add representations for two more architectures, for Power PC and Power PC 64, and also adjust the string format logic to handle NULL pointers gracefully, in an attempt to prevent similar issues with other architectures in the future. Fixes: `708ac5bea0` ("libbpf: add ksyscall/kretsyscall sections support for syscall kprobes") Signed-off-by: Daniel Müller <deso@posteo.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220728222345.3125975-1-deso@posteo.net	2022-07-28 16:11:18 -07:00
Lama Kayal	069448b2fd	net/mlx5e: Move mlx5e_init_l2_addr to en_main Move the function declaration of mlx5e_init_l2_addr to en/fs.h, rename to mlx5e_fs_init_l2_addr to align with the fs API functions naming convention and let it take mlx5e_flow_steering as arguments while keeping implementation at en_fs.c file. This helps maintain a clean driver code and avoids unnecessary dependencies. Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-07-28 13:55:30 -07:00
Lama Kayal	a02c07ea5d	net/mlx5e: Split en_fs ndo's and move to en_main Add inner callee for ndo mlx5e_vlan_rx_add_vid and mlx5e_vlan_rx_kill_vid, to separate the priv usage from other flow steering flows. Move wrapper ndo's into en_main, and split the rest of the functionality into a separate part inside en_fs. Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-07-28 13:55:29 -07:00
Lama Kayal	5b031add2f	net/mlx5e: Separate mlx5e_set_rx_mode_work and move caller to en_main Separate mlx5e_set_rx_mode into two, and move caller to en_main while keeping implementation in en_fs in the newly declared function mlx5e_fs_set_rx_mode. This; to minimize the coupling of flow_steering to priv. Add a parallel boolean member vlan_strip_disable to mlx5e_flow_steering that's updated similarly as its identical in priv, thus making it possible to adjust the rx_mode work handler to current changes. Also, add state_destroy boolean to mlx5e_flow_steering struct which replaces the old check : !test_bit(MLX5E_STATE_DESTROYING, &priv->state). This state member is updated accordingly prior to INIT_WORK(mlx5e_set_rx_mode_work), This is done for similar purposes as mentioned earlier and to minimize argument passings. Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-07-28 13:55:29 -07:00
Lama Kayal	7bb7071568	net/mlx5e: Add mdev to flow_steering struct Make flow_steering struct contain mlx5_core_dev such that it becomes self contained and easier to decouple later on this series. Let its values be initialized in mlx5e_fs_init(). Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-07-28 13:55:29 -07:00
Lama Kayal	6a7bc5d0e1	net/mlx5e: Report flow steering errors with mdev err report API Let en_fs report errors via mdev error report API, aka mlx5_core_* macros, thus replace the netdev API reports. This to minimize netdev coupling to the flow steering struct. Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-07-28 13:55:28 -07:00
Lama Kayal	af8bbf7300	net/mlx5e: Convert mlx5e_flow_steering member of mlx5e_priv to pointer Make mlx5e_flow_steering member of mlx5e_priv a pointer. Add dynamic allocation respectively. Allocate fs for all profiles when initializing profile, symmetrically deallocate at profile cleanup. Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-07-28 13:55:28 -07:00
Lama Kayal	454533aa87	net/mlx5e: Allocate VLAN and TC for featured profiles only Introduce allocation and de-allocation functions for both flow steering VLAN and TC as part of fs API. Add allocations of VLAN and TC as nic profile feature, such that fs_init() will allocate both VLAN and TC only if they're featured in the profile. VLAN and TC are relevant for nic_profile only. Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-07-28 13:55:28 -07:00
Lama Kayal	23bde065c3	net/mlx5e: Make mlx5e_tc_table private Move mlx5e_tc_table struct to en_tc.c thus make it private. Introduce allocation and deallocation functions as part of the tc API to allow this switch smoothly. Convert mlx5e_nic_chain() macro to a function of en_tc.c. Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-07-28 13:55:27 -07:00
Lama Kayal	65f586c273	net/mlx5e: Convert mlx5e_tc_table member of mlx5e_flow_steering to pointer Make fs.tc be a pointer and allocate it dynamically. Add mlx5e_priv pointer to mlx5e_tc_table, and thus get a work-around to accessing priv via tc when handling tc events inside mlx5e_tc_netdev_event. Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-07-28 13:55:27 -07:00
Roi Dayan	7d1a5ce46e	net/mlx5e: TC, Support tc action api for police Add support for tc action api for police. Offloading standalone police action without a tc rule and reporting stats. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-07-28 13:55:27 -07:00
Roi Dayan	f8e9d413a2	net/mlx5e: TC, Separate get/update/replace meter functions mlx5e_tc_meter_get() to get an existing meter. mlx5e_tc_meter_update() to update an existing meter without refcount. mlx5e_tc_meter_replace() to get/create a meter and update if needed. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-07-28 13:55:26 -07:00
Roi Dayan	b50ce4350c	net/mlx5e: Add red and green counters for metering Add red and green counters per meter instance. TC police action is implemented as a meter instance. The meter counters represent the police action notexceed/exceed counters. TC rules using the same meter instance will use the same counters. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-07-28 13:55:26 -07:00
Roi Dayan	e5b1db2741	net/mlx5e: TC, Allocate post meter ft per rule To support a TC police action notexceed counter and supporting actions other than drop/pipe there is a need to create separate ft and rules per rule and not to use a common one created on eswitch init. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-07-28 13:55:26 -07:00
Yevgeny Kliteynik	8920d92b8b	net/mlx5: DR, Add support for flow metering ASO Add support for ASO action of type flow metering on device that supports STEv1. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Hamdan Igbaria <hamdani@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-07-28 13:55:25 -07:00
Gal Pressman	ec082d31c1	net/mlx5e: Fix wrong use of skb_tcp_all_headers() with encapsulation Use skb_inner_tcp_all_headers() instead of skb_tcp_all_headers() when transmitting an encapsulated packet in mlx5e_tx_get_gso_ihs(). Fixes: `504148fedb` ("net: add skb_[inner_]tcp_all_headers helpers") Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-07-28 13:55:25 -07:00
Linus Torvalds	33ea1340ba	Including fixes from bluetooth and netfilter, no known blockers for the release. Current release - regressions: - wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop(), fix taking the lock before its initialized - Bluetooth: mgmt: fix double free on error path Current release - new code bugs: - eth: ice: fix tunnel checksum offload with fragmented traffic Previous releases - regressions: - tcp: md5: fix IPv4-mapped support after refactoring, don't take the pure v6 path - Revert "tcp: change pingpong threshold to 3", improving detection of interactive sessions - mld: fix netdev refcount leak in mld_{query \| report}_work() due to a race - Bluetooth: - always set event mask on suspend, avoid early wake ups - L2CAP: fix use-after-free caused by l2cap_chan_put - bridge: do not send empty IFLA_AF_SPEC attribute Previous releases - always broken: - ping6: fix memleak in ipv6_renew_options() - sctp: prevent null-deref caused by over-eager error paths - virtio-net: fix the race between refill work and close, resulting in NAPI scheduled after close and a BUG() - macsec: - fix three netlink parsing bugs - avoid breaking the device state on invalid change requests - fix a memleak in another error path Misc: - dt-bindings: net: ethernet-controller: rework 'fixed-link' schema - two more batches of sysctl data race adornment Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmLi0uAACgkQMUZtbf5S IrvRlhAAuDzmhVZh2pMT44CmoPS6GGnf7k4fL7lctYa1m+AuvE+zfCx3VOK7tj7V OGecNx+CISA4xsAN4iNYVBMnq4437tuXCPDK+SZPpXlVflEP/SW9muCW8b4OwNFZ W5bswL0wlla5WYEgZmvUhEBcsfAN8r71u6j71adHpmGqoIA74Dw9VCKNzLsb8qMB 9sSeMmHDsOgDHw/rWCNo5JNr8DS35g0zMYJp1Q7Nu2H588dRk6aNXbW1nJ5Yph9c ZC7E6HweXrLHJbBC7SKf2mJfyzEIyvTvfnoi4knaBFE5b3g70ImZJ7AiDe/C37ZA bTmfT4FKZSA5WPKvd8OPxFwJBnrjd1AyHqvdY4wFsDk+PtNXzjOpBFPsNOXzBqIn ccwl9t4v/6Vre+miZyh6P0wmsZNTipcTe/CEYQQcIUGKgcx3rB5usikynoZGRc7Q cASpIaXfWEIUTw+NUd902NKgPq0oiuqjGBHc2Vl3k+4KcNUX16CtCFs6ufzIsSLe vlfcFL/myxuhbdcvNqUw5w+hyoSSA/54lvo5o+Uccgs/ifc/u7C43v6yX9gG/SQl MBa2PVWC0ztO6y6HKjd5OhrkUBenK1dLl9uGvExAi47kBImLdlttIanzZ/asUxzO 9L2Ocq2WMcBPIjsGDTQirdRMAGYqLEKSgTKf8zMHAEkRXWFfn2Y= =xnan -----END PGP SIGNATURE----- Merge tag 'net-5.19-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth and netfilter, no known blockers for the release. Current release - regressions: - wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop(), fix taking the lock before its initialized - Bluetooth: mgmt: fix double free on error path Current release - new code bugs: - eth: ice: fix tunnel checksum offload with fragmented traffic Previous releases - regressions: - tcp: md5: fix IPv4-mapped support after refactoring, don't take the pure v6 path - Revert "tcp: change pingpong threshold to 3", improving detection of interactive sessions - mld: fix netdev refcount leak in mld_{query \| report}_work() due to a race - Bluetooth: - always set event mask on suspend, avoid early wake ups - L2CAP: fix use-after-free caused by l2cap_chan_put - bridge: do not send empty IFLA_AF_SPEC attribute Previous releases - always broken: - ping6: fix memleak in ipv6_renew_options() - sctp: prevent null-deref caused by over-eager error paths - virtio-net: fix the race between refill work and close, resulting in NAPI scheduled after close and a BUG() - macsec: - fix three netlink parsing bugs - avoid breaking the device state on invalid change requests - fix a memleak in another error path Misc: - dt-bindings: net: ethernet-controller: rework 'fixed-link' schema - two more batches of sysctl data race adornment" * tag 'net-5.19-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (67 commits) stmmac: dwmac-mediatek: fix resource leak in probe ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptr net: ping6: Fix memleak in ipv6_renew_options(). net/funeth: Fix fun_xdp_tx() and XDP packet reclaim sctp: leave the err path free in sctp_stream_init to sctp_stream_free sfc: disable softirqs for ptp TX ptp: ocp: Select CRC16 in the Kconfig. tcp: md5: fix IPv4-mapped support virtio-net: fix the race between refill work and close mptcp: Do not return EINPROGRESS when subflow creation succeeds Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put Bluetooth: Always set event mask on suspend Bluetooth: mgmt: Fix double free on error path wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop() ice: do not setup vlan for loopback VSI ice: check (DD \| EOF) bits on Rx descriptor rather than (EOP \| RS) ice: Fix VSIs unable to share unicast MAC ice: Fix tunnel checksum offload with fragmented traffic ice: Fix max VLANs available for VF netfilter: nft_queue: only allow supported familes and hooks ...	2022-07-28 11:54:59 -07:00
Maciej Fijalkowski	44ece4e1a3	ice: allow toggling loopback mode via ndo_set_features callback Add support for NETIF_F_LOOPBACK. This feature can be set via: $ ethtool -K eth0 loopback <on\|off> Feature can be useful for local data path tests. Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-28 11:44:40 -07:00
Maciej Fijalkowski	c67672fa26	ice: compress branches in ice_set_features() Instead of rather verbose comparison of current netdev->features bits vs the incoming ones from user, let us compress them by a helper features set that will be the result of netdev->features XOR features. This way, current, extensive branches: if (features & NETIF_F_BIT && !(netdev->features & NETIF_F_BIT)) set_feature(true); else if (!(features & NETIF_F_BIT) && netdev->features & NETIF_F_BIT) set_feature(false); can become: netdev_features_t changed = netdev->features ^ features; if (changed & NETIF_F_BIT) set_feature(!!(features & NETIF_F_BIT)); This is nothing new as currently several other drivers use this approach, which I find much more convenient. Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-28 11:44:40 -07:00
Michal Wilczynski	a419526de6	ice: Fix promiscuous mode not turning off When trust is turned off for the VF, the expectation is that promiscuous and allmulticast filters are removed. Currently default VSI filter is not getting cleared in this flow. Example: ip link set enp236s0f0 vf 0 trust on ip link set enp236s0f0v0 promisc on ip link set enp236s0f0 vf 0 trust off /* promiscuous mode is still enabled on VF0 */ Remove switch filters for both cases. This commit fixes above behavior by removing default VSI filters and allmulticast filters when vf-true-promisc-support is OFF. Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-28 11:44:40 -07:00
Michal Wilczynski	d7393425e7	ice: Introduce enabling promiscuous mode on multiple VF's In current implementation default VSI switch filter is only able to forward traffic to a single VSI. This limits promiscuous mode with private flag 'vf-true-promisc-support' to a single VF. Enabling it on the second VF won't work. Also allmulticast support doesn't seem to be properly implemented when vf-true-promisc-support is true. Use standard ice_add_rule_internal() function that already implements forwarding to multiple VSI's instead of constructing AQ call manually. Add switch filter for allmulticast mode when vf-true-promisc-support is enabled. The same filter is added regardless of the flag - it doesn't matter for this case. Remove unnecessary fields in switch structure. From now on book keeping will be done by ice_add_rule_internal(). Refactor unnecessarily passed function arguments. To test: 1) Create 2 VM's, and two VF's. Attach VF's to VM's. 2) Enable promiscuous mode on both of them and check if traffic is seen on both of them. Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-28 11:44:22 -07:00
Jacob Keller	d8fae2504e	igb: convert .adjfreq to .adjfine The 82576 PTP implementation still uses .adjfreq instead of using the newer .adjfine. This implementation uses a pre-simplified calculation since the base increment value for the 82576 is just 16 * 2^19. Converting this into scaled_ppm is tricky, and makes the intent a bit less clear. Simply convert to the normal flow of multiplying the base increment value by the scaled_ppm and then dividing by 1000000ULL << 16. This can be implemented using mul_u64_u64_div_u64 which can avoid the possible overflow that might occur for large adjustments. Use of .adjfine can improve the precision of small adjustments and gets us one driver closer to removing the old implementation from the kernel entirely. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-28 11:03:01 -07:00
Jacob Keller	5a5542324a	ixgbe: convert .adjfreq to .adjfine Convert the ixgbe PTP frequency adjustment implementations from .adjfreq to .adjfine. This allows using the scaled parts per million adjustment from the PTP core and results in a more precise adjustment for small corrections. To avoid overflow, use mul_u64_u64_div_u64 to perform the calculation. On X86 platforms, this will use instructions that perform the operations with 128bit intermediate values. For other architectures, the implementation will limit the loss of precision as much as possible. This change slightly improves the precision of frequency adjustments for all ixgbe based devices, and gets us one driver closer to being able to remove the older .adjfreq implementation from the kernel. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-28 11:02:02 -07:00
Jacob Keller	ccd3bf9859	i40e: convert .adjfreq to .adjfine The i40e driver currently implements the .adjfreq handler for frequency adjustments. This takes the adjustment parameter in parts per billion. The PTP core supports .adjfine which provides an adjustment in scaled parts per million. This has a higher resolution and can result in more precise adjustments for small corrections. Convert the existing .adjfreq implementation to the newer .adjfine implementation. This is trivial since it just requires changing the divisor from 1000000000ULL to (1000000ULL << 16) in the mul_u64_u64_div_u64 call. This improves the precision of the adjustments and gets us one driver closer to removing the old .adjfreq support from the kernel. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-28 10:59:38 -07:00
Jacob Keller	3626a690b7	i40e: use mul_u64_u64_div_u64 for PTP frequency calculation The i40e device has a different clock rate depending on the current link speed. This requires using a different increment rate for the PTP clock registers. For slower link speeds, the base increment value is larger. Directly multiplying the larger increment value by the parts per billion adjustment might overflow. To avoid this, the i40e implementation defaults to using the lower increment value and then multiplying the adjustment afterwards. This causes a loss of precision for lower link speeds. We can fix this by using mul_u64_u64_div_u64 instead of performing the multiplications using standard C operations. On X86, this will use special instructions that perform the multiplication and division with 128bit intermediate values. For other architectures, the fallback implementation will limit the loss of precision for large values. Small adjustments don't overflow anyways and won't lose precision at all. This allows first multiplying the base increment value and then performing the adjustment calculation, since we no longer fear overflowing. It also makes it easier to convert to the even more precise .adjfine implementation in a following change. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-28 10:58:30 -07:00
Jacob Keller	abab010f16	e1000e: convert .adjfreq to .adjfine The PTP implementation for the e1000e driver uses the older .adjfreq method. This method takes an adjustment in parts per billion. The newer .adjfine implementation uses scaled_ppm. The use of scaled_ppm allows for finer grained adjustments and is preferred over using the older implementation. Make use of mul_u64_u64_div_u64 in order to handle possible overflow of the multiplication used to calculate the desired adjustment to the hardware increment value. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-28 10:57:06 -07:00
Jacob Keller	ab8e8db27e	e1000e: remove unnecessary range check in e1000e_phc_adjfreq The e1000e_phc_adjfreq function validates that the input delta is within the maximum range. This is already handled by the core PTP code and this is a duplicate and thus unnecessary check. It also complicates refactoring to use the newer .adjfine implementation, where the input is no longer specified in parts per billion. Remove the range validation check. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-28 10:55:56 -07:00
Jacob Keller	4488df1401	ice: implement adjfine with mul_u64_u64_div_u64 The PTP frequency adjustment code needs to determine an appropriate adjustment given an input scaled_ppm adjustment. We calculate the adjustment to the register by multiplying the base (nominal) increment value by the scaled_ppm and then dividing by the scaled one million value. For very large adjustments, this might overflow. To avoid this, both the scaled_ppm and divisor values are downshifted. We can avoid that on X86 architectures by using mul_u64_u64_div_u64. This helper function will perform the multiplication and division with 128bit intermediate values. We know that scaled_ppm is never larger than the divisor so this operation will never result in an overflow. This improves the accuracy of the calculations for large adjustment values on X86. It is likely an improvement on other architectures as well because the default implementation of mul_u64_u64_div_u64 is smarter than the original approach taken in the ice code. Additionally, this implementation is easier to read, using fewer local variables and lines of code to implement. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-28 10:54:47 -07:00
Dan Carpenter	4d3d3a1b24	stmmac: dwmac-mediatek: fix resource leak in probe If mediatek_dwmac_clks_config() fails, then call stmmac_remove_config_dt() before returning. Otherwise it is a resource leak. Fixes: `fa4b3ca60e` ("stmmac: dwmac-mediatek: fix clock issue") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/YuJ4aZyMUlG6yGGa@kili Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 10:43:04 -07:00

1 2 3 4 5 ...

1109078 Commits All Branches Search

1109078 Commits

All Branches