Commit graph

1091880 commits

Author SHA1 Message Date
Jason Wang
3a246644e8 vdpa: ifcvf: set pci driver data in probe
[ Upstream commit bd8bb9aed5 ]

We should set the pci driver data in probe instead of the vdpa device
adding callback. Otherwise if no vDPA device is created we will lose
the pointer to the management device.

Fixes: 6b5df347c6 ("vDPA/ifcvf: implement management netlink framework for ifcvf")
Tested-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220524055557.1938-1-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:58 +02:00
Eric Dumazet
d8e1bc6029 tcp: tcp_rtx_synack() can be called from process context
[ Upstream commit 0a375c8224 ]

Laurent reported the enclosed report [1]

This bug triggers with following coditions:

0) Kernel built with CONFIG_DEBUG_PREEMPT=y

1) A new passive FastOpen TCP socket is created.
   This FO socket waits for an ACK coming from client to be a complete
   ESTABLISHED one.
2) A socket operation on this socket goes through lock_sock()
   release_sock() dance.
3) While the socket is owned by the user in step 2),
   a retransmit of the SYN is received and stored in socket backlog.
4) At release_sock() time, the socket backlog is processed while
   in process context.
5) A SYNACK packet is cooked in response of the SYN retransmit.
6) -> tcp_rtx_synack() is called in process context.

Before blamed commit, tcp_rtx_synack() was always called from BH handler,
from a timer handler.

Fix this by using TCP_INC_STATS() & NET_INC_STATS()
which do not assume caller is in non preemptible context.

[1]
BUG: using __this_cpu_add() in preemptible [00000000] code: epollpep/2180
caller is tcp_rtx_synack.part.0+0x36/0xc0
CPU: 10 PID: 2180 Comm: epollpep Tainted: G           OE     5.16.0-0.bpo.4-amd64 #1  Debian 5.16.12-1~bpo11+1
Hardware name: Supermicro SYS-5039MC-H8TRF/X11SCD-F, BIOS 1.7 11/23/2021
Call Trace:
 <TASK>
 dump_stack_lvl+0x48/0x5e
 check_preemption_disabled+0xde/0xe0
 tcp_rtx_synack.part.0+0x36/0xc0
 tcp_rtx_synack+0x8d/0xa0
 ? kmem_cache_alloc+0x2e0/0x3e0
 ? apparmor_file_alloc_security+0x3b/0x1f0
 inet_rtx_syn_ack+0x16/0x30
 tcp_check_req+0x367/0x610
 tcp_rcv_state_process+0x91/0xf60
 ? get_nohz_timer_target+0x18/0x1a0
 ? lock_timer_base+0x61/0x80
 ? preempt_count_add+0x68/0xa0
 tcp_v4_do_rcv+0xbd/0x270
 __release_sock+0x6d/0xb0
 release_sock+0x2b/0x90
 sock_setsockopt+0x138/0x1140
 ? __sys_getsockname+0x7e/0xc0
 ? aa_sk_perm+0x3e/0x1a0
 __sys_setsockopt+0x198/0x1e0
 __x64_sys_setsockopt+0x21/0x30
 do_syscall_64+0x38/0xc0
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: 168a8f5805 ("tcp: TCP Fast Open Server - main code path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Laurent Fasnacht <laurent.fasnacht@proton.ch>
Acked-by: Neal Cardwell <ncardwell@google.com>
Link: https://lore.kernel.org/r/20220530213713.601888-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:58 +02:00
Guoju Fang
4e659b3968 net: sched: add barrier to fix packet stuck problem for lockless qdisc
[ Upstream commit 2e8728c955 ]

In qdisc_run_end(), the spin_unlock() only has store-release semantic,
which guarantees all earlier memory access are visible before it. But
the subsequent test_bit() has no barrier semantics so may be reordered
ahead of the spin_unlock(). The store-load reordering may cause a packet
stuck problem.

The concurrent operations can be described as below,
         CPU 0                      |          CPU 1
   qdisc_run_end()                  |     qdisc_run_begin()
          .                         |           .
 ----> /* may be reorderd here */   |           .
|         .                         |           .
|     spin_unlock()                 |         set_bit()
|         .                         |         smp_mb__after_atomic()
 ---- test_bit()                    |         spin_trylock()
          .                         |          .

Consider the following sequence of events:
    CPU 0 reorder test_bit() ahead and see MISSED = 0
    CPU 1 calls set_bit()
    CPU 1 calls spin_trylock() and return fail
    CPU 0 executes spin_unlock()

At the end of the sequence, CPU 0 calls spin_unlock() and does nothing
because it see MISSED = 0. The skb on CPU 1 has beed enqueued but no one
take it, until the next cpu pushing to the qdisc (if ever ...) will
notice and dequeue it.

This patch fix this by adding one explicit barrier. As spin_unlock() and
test_bit() ordering is a store-load ordering, a full memory barrier
smp_mb() is needed here.

Fixes: a90c57f2ce ("net: sched: fix packet stuck problem for lockless qdisc")
Signed-off-by: Guoju Fang <gjfang@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220528101628.120193-1-gjfang@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:58 +02:00
Maxim Mikityanskiy
c3e82d328f net/mlx5e: Update netdev features after changing XDP state
[ Upstream commit f6279f113a ]

Some features (LRO, HW GRO) conflict with XDP. If there is an attempt to
enable such features while XDP is active, they will be set to `off
[requested on]`. In order to activate these features after XDP is turned
off, the driver needs to call netdev_update_features(). This commit adds
this missing call after XDP state changes.

Fixes: cf6e34c8c2 ("net/mlx5e: Properly block LRO when XDP is enabled")
Fixes: b0617e7b35 ("net/mlx5e: Properly block HW GRO when XDP is enabled")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:58 +02:00
Changcheng Liu
4fef1600d7 net/mlx5: correct ECE offset in query qp output
[ Upstream commit 3fc2a9e89b ]

ECE field should be after opt_param_mask in query qp output.

Fixes: 6b646a7e4a ("net/mlx5: Add ability to read and write ECE options")
Signed-off-by: Changcheng Liu <jerrliu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:58 +02:00
Maxim Mikityanskiy
449357caff net/mlx5e: Disable softirq in mlx5e_activate_rq to avoid race condition
[ Upstream commit 2e642afb61 ]

When the driver activates the channels, it assumes NAPI isn't running
yet. mlx5e_activate_rq posts a NOP WQE to ICOSQ to trigger a hardware
interrupt and start NAPI, which will run mlx5e_alloc_rx_mpwqe and post
UMR WQEs to ICOSQ to be able to receive packets with striding RQ.

Unfortunately, a race condition is possible if NAPI is triggered by
something else (for example, TX) at a bad timing, before
mlx5e_activate_rq finishes. In this case, mlx5e_alloc_rx_mpwqe may post
UMR WQEs to ICOSQ, and with the bad timing, the wqe_info of the first
UMR may be overwritten by the wqe_info of the NOP posted by
mlx5e_activate_rq.

The consequence is that icosq->db.wqe_info[0].num_wqebbs will be changed
from MLX5E_UMR_WQEBBS to 1, disrupting the integrity of the array-based
linked list in wqe_info[]. mlx5e_poll_ico_cq will hang in an infinite
loop after processing wqe_info[0], because after the corruption, the
next item to be processed will be wqe_info[1], which is filled with
zeros, and `sqcc += wi->num_wqebbs` will never move further.

This commit fixes this race condition by using async_icosq to post the
NOP and trigger the interrupt. async_icosq is always protected with a
spinlock, eliminating the race condition.

Fixes: bc77b240b3 ("net/mlx5e: Add fragmented memory support for RX multi packet WQE")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reported-by: Karsten Nielsen <karsten@foo-bar.dk>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:58 +02:00
Paul Blakey
27d58bf44d net/mlx5: CT: Fix header-rewrite re-use for tupels
[ Upstream commit 1f2856cde6 ]

Tuple entries that don't have nat configured for them
which are added to the ct nat table will always create
a new modify header, as we don't check for possible
re-use on them. The same for tuples that have nat configured
for them but are added to ct table.

Fix the above by only avoiding wasteful re-use lookup
for actually natted entries in ct nat table.

Fixes: 7fac5c2ece ("net/mlx5: CT: Avoid reusing modify header context for natted entries")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:58 +02:00
Maor Dickman
2adaac7ba5 net/mlx5e: TC NIC mode, fix tc chains miss table
[ Upstream commit 66cb64e292 ]

The cited commit changed promisc table to be created on demand with the
highest priority in the NIC table replacing the vlan table, this caused
tc NIC tables miss flow to skip the prmoisc table because it use vlan
table as miss table.

OVS offload in NIC mode use promisc by default so any unicast packet
which will be handled by tc NIC tables miss flow will skip the promisc
rule and will be dropped.

Fix this by adding new empty table in new tc level with low priority and
point the nic tc chain miss to it, the new table is managed so it will
point to vlan table if promisc is disabled and to promisc table if enabled.

Fixes: 1c46d7409f ("net/mlx5e: Optimize promiscuous mode")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:57 +02:00
Leon Romanovsky
79caa98f73 net/mlx5: Don't use already freed action pointer
[ Upstream commit 80b2bd737d ]

The call to mlx5dr_action_destroy() releases "action" memory. That
pointer is set to miss_action later and generates the following smatch
error:

 drivers/net/ethernet/mellanox/mlx5/core/steering/fs_dr.c:53 set_miss_action()
 warn: 'action' was already freed.

Make sure that the pointer is always valid by setting NULL after destroy.

Fixes: 6a48faeeca ("net/mlx5: Add direct rule fs_cmd implementation")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:57 +02:00
Christophe JAILLET
c7680f2b50 virtio: pci: Fix an error handling path in vp_modern_probe()
[ Upstream commit 7a836a2aba ]

If an error occurs after a successful pci_request_selected_regions() call,
it should be undone by a corresponding pci_release_selected_regions() call,
as already done in vp_modern_remove().

Fixes: fd502729fb ("virtio-pci: introduce modern device module")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Message-Id: <237109725aad2c3c03d14549f777b1927c84b045.1648977064.git.christophe.jaillet@wanadoo.fr>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:57 +02:00
Eli Cohen
1714361c2d vdpa: Fix error logic in vdpa_nl_cmd_dev_get_doit
[ Upstream commit 7a6691f1f8 ]

In vdpa_nl_cmd_dev_get_doit(), if the call to genlmsg_reply() fails we
must not call nlmsg_free() since this is done inside genlmsg_reply().

Fix it.

Fixes: bc0d90ee02 ("vdpa: Enable user to query vdpa device info")
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20220518133804.1075129-2-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:57 +02:00
Weizhao Ouyang
0b2df9a868 erofs: fix 'backmost' member of z_erofs_decompress_frontend
[ Upstream commit 4398d3c31b ]

Initialize 'backmost' to true in DECOMPRESS_FRONTEND_INIT.

Fixes: 5c6dcc57e2 ("erofs: get rid of `struct z_erofs_collector'")
Signed-off-by: Weizhao Ouyang <o451686892@gmail.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20220530075114.918874-1-o451686892@gmail.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:57 +02:00
Hangbin Liu
86ee0a5910 bonding: show NS IPv6 targets in proc master info
[ Upstream commit 4a1f14df55 ]

When adding bond new parameter ns_targets. I forgot to print this
in bond master proc info. After updating, the bond master info will look
like:

ARP IP target/s (n.n.n.n form): 192.168.1.254
NS IPv6 target/s (XX::XX form): 2022::1, 2022::2

Fixes: 4e24be018e ("bonding: add new parameter ns_targets")
Reported-by: Li Liang <liali@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20220530062639.37179-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:57 +02:00
Viorel Suman
08a4fc1350 net: phy: at803x: disable WOL at probe
[ Upstream commit d7cd5e06c9 ]

Before 7beecaf7d5 ("net: phy: at803x: improve the WOL feature") patch
"at803x_get_wol" implementation used AT803X_INTR_ENABLE_WOL value to set
WAKE_MAGIC flag, and now AT803X_WOL_EN value is used for the same purpose.
The problem here is that the values of these two bits are different after
hardware reset: AT803X_INTR_ENABLE_WOL=0 after hardware reset, but
AT803X_WOL_EN=1. So now, if called right after boot, "at803x_get_wol" will
set WAKE_MAGIC flag, even if WOL function is not enabled by calling
"at803x_set_wol" function. The patch disables WOL function on probe thus
the behavior is consistent.

Fixes: 7beecaf7d5 ("net: phy: at803x: improve the WOL feature")
Signed-off-by: Viorel Suman <viorel.suman@nxp.com>
Link: https://lore.kernel.org/r/20220527084935.235274-1-viorel.suman@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:57 +02:00
Haisu Wang
33271e8cbe blk-mq: do not update io_ticks with passthrough requests
[ Upstream commit b81c14ca14 ]

Flush or passthrough requests are not accounted as normal IO in completion.
To reflect iostat for slow IO, io_ticks is updated when stat show called
based on inflight numbers.
It may cause inconsistent io_ticks calculation result.

So do not account non-passthrough request when check inflight.

Fixes: 86d7331299 ("block: update io_ticks when io hang")
Signed-off-by: Haisu Wang <haisuwang@tencent.com>
Reviewed-by: samuelliao <samuelliao@tencent.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220530064059.1120058-1-haisuwang@tencent.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:57 +02:00
Peter Zijlstra
f1c4f31df1 sched/autogroup: Fix sysctl move
[ Upstream commit 82f586f923 ]

Ivan reported /proc/sys/kernel/sched_autogroup_enabled went walk-about
and using the noautogroup command line parameter would result in a
boot error message.

Turns out the sysctl move placed the init function wrong.

Fixes: c8eaf6ac76 ("sched: move autogroup sysctls into its own file")
Reported-by: Ivan Kozik <ivan@ludios.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Ivan Kozik <ivan@ludios.org>
Link: https://lkml.kernel.org/r/YpR2IqndgsyMzN00@worktop.programming.kicks-ass.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:57 +02:00
Jens Axboe
b79f0b14a5 block: make bioset_exit() fully resilient against being called twice
[ Upstream commit 605f7415ec ]

Most of bioset_exit() is fine being called twice, as it clears the
various allocations etc when they are freed. The exception is
bio_alloc_cache_destroy(), which does not clear ->cache when it has
freed it.

This isn't necessarily a bug, but can be if buggy users does call the
exit path more then once, or with just a memset() bioset which has
never been initialized. dm appears to be one such user.

Fixes: be4d234d7a ("bio: add allocation cache abstraction")
Link: https://lore.kernel.org/linux-block/YpK7m+14A+pZKs5k@casper.infradead.org/
Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:57 +02:00
Íñigo Huguet
b2d60329a0 sfc: fix wrong tx channel offset with efx_separate_tx_channels
[ Upstream commit c308dfd1b4 ]

tx_channel_offset is calculated in efx_allocate_msix_channels, but it is
also calculated again in efx_set_channels because it was originally done
there, and when efx_allocate_msix_channels was introduced it was
forgotten to be removed from efx_set_channels.

Moreover, the old calculation is wrong when using
efx_separate_tx_channels because now we can have XDP channels after the
TX channels, so n_channels - n_tx_channels doesn't point to the first TX
channel.

Remove the old calculation from efx_set_channels, and add the
initialization of this variable if MSI or legacy interrupts are used,
next to the initialization of the rest of the related variables, where
it was missing.

Fixes: 3990a8fffb ("sfc: allocate channels for XDP tx queues")
Reported-by: Tianhao Zhao <tizhao@redhat.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:56 +02:00
Martin Habets
913d45f02d sfc: fix considering that all channels have TX queues
[ Upstream commit 2e102b53f8 ]

Normally, all channels have RX and TX queues, but this is not true if
modparam efx_separate_tx_channels=1 is used. In that cases, some
channels only have RX queues and others only TX queues (or more
preciselly, they have them allocated, but not initialized).

Fix efx_channel_has_tx_queues to return the correct value for this case
too.

Messages shown at probe time before the fix:
 sfc 0000:03:00.0 ens6f0np0: MC command 0x82 inlen 544 failed rc=-22 (raw=0) arg=0
 ------------[ cut here ]------------
 netdevice: ens6f0np0: failed to initialise TXQ -1
 WARNING: CPU: 1 PID: 626 at drivers/net/ethernet/sfc/ef10.c:2393 efx_ef10_tx_init+0x201/0x300 [sfc]
 [...] stripped
 RIP: 0010:efx_ef10_tx_init+0x201/0x300 [sfc]
 [...] stripped
 Call Trace:
  efx_init_tx_queue+0xaa/0xf0 [sfc]
  efx_start_channels+0x49/0x120 [sfc]
  efx_start_all+0x1f8/0x430 [sfc]
  efx_net_open+0x5a/0xe0 [sfc]
  __dev_open+0xd0/0x190
  __dev_change_flags+0x1b3/0x220
  dev_change_flags+0x21/0x60
 [...] stripped

Messages shown at remove time before the fix:
 sfc 0000:03:00.0 ens6f0np0: failed to flush 10 queues
 sfc 0000:03:00.0 ens6f0np0: failed to flush queues

Fixes: 8700aff089 ("sfc: fix channel allocation with brute force")
Reported-by: Tianhao Zhao <tizhao@redhat.com>
Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com>
Tested-by: Íñigo Huguet <ihuguet@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:56 +02:00
Hangbin Liu
fdff9f7c55 bonding: NS target should accept link local address
[ Upstream commit 5e1eeef69c ]

When setting bond NS target, we use bond_is_ip6_target_ok() to check
if the address valid. The link local address was wrongly rejected in
bond_changelink(), as most time the user just set the ARP/NS target to
gateway, while the IPv6 gateway is always a link local address when user
set up interface via SLAAC.

So remove the link local addr check when setting bond NS target.

Fixes: 129e3c1bab ("bonding: add new option ns_ip6_target")
Reported-by: Li Liang <liali@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:56 +02:00
Christoph Hellwig
c90322fc9b block: use bio_queue_enter instead of blk_queue_enter in bio_poll
[ Upstream commit ebd076bf7d ]

We want to have a valid live gendisk to call ->poll and not just a
request_queue, so call the right helper.

Fixes: 3e08773c38 ("block: switch polling to be bio based")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220523124302.526186-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:56 +02:00
Yu Xiao
ebbd33392d nfp: only report pause frame configuration for physical device
[ Upstream commit 0649e4d634 ]

Only report pause frame configuration for physical device. Logical
port of both PCI PF and PCI VF do not support it.

Fixes: 9fdc5d85a8 ("nfp: update ethtool reporting of pauseframe control")
Signed-off-by: Yu Xiao <yu.xiao@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:56 +02:00
Eric Dumazet
41e191fe72 tcp: add accessors to read/set tp->snd_cwnd
[ Upstream commit 4057037535 ]

We had various bugs over the years with code
breaking the assumption that tp->snd_cwnd is greater
than zero.

Lately, syzbot reported the WARN_ON_ONCE(!tp->prior_cwnd) added
in commit 8b8a321ff7 ("tcp: fix zero cwnd in tcp_cwnd_reduction")
can trigger, and without a repro we would have to spend
considerable time finding the bug.

Instead of complaining too late, we want to catch where
and when tp->snd_cwnd is set to an illegal value.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Link: https://lore.kernel.org/r/20220405233538.947344-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:56 +02:00
Guangguan Wang
5831e434e8 net/smc: fixes for converting from "struct smc_cdc_tx_pend **" to "struct smc_wr_tx_pend_priv *"
[ Upstream commit e225c9a5a7 ]

"struct smc_cdc_tx_pend **" can not directly convert
to "struct smc_wr_tx_pend_priv *".

Fixes: 2bced6aefa ("net/smc: put slot when connection is killed")
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:56 +02:00
Heinrich Schuchardt
d610e2c4ff riscv: read-only pages should not be writable
[ Upstream commit 630f972d76 ]

If EFI pages are marked as read-only,
we should remove the _PAGE_WRITE flag.

The current code overwrites an unused value.

Fixes: b91540d52a ("RISC-V: Add EFI runtime services")
Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
Link: https://lore.kernel.org/r/20220528014132.91052-1-heinrich.schuchardt@canonical.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:56 +02:00
Zhang Wensheng
f128a4ad1d nbd: fix possible overflow on 'first_minor' in nbd_dev_add()
[ Upstream commit 858f1bf65d ]

When 'index' is a big numbers, it may become negative which forced
to 'int'. then 'index << part_shift' might overflow to a positive
value that is not greater than '0xfffff', then sysfs might complains
about duplicate creation. Because of this, move the 'index' judgment
to the front will fix it and be better.

Fixes: b0d9111a2d ("nbd: use an idr to keep track of nbd devices")
Fixes: 940c264984 ("nbd: fix possible overflow for 'first_minor' in nbd_dev_add()")
Signed-off-by: Zhang Wensheng <zhangwensheng5@huawei.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20220521073749.3146892-6-yukuai3@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:56 +02:00
Yu Kuai
5171ef20ba nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
[ Upstream commit 2895f1831e ]

Otherwise io will hung because request will only be completed if the
cmd has the flag 'NBD_CMD_INFLIGHT'.

Fixes: 07175cb1ba ("nbd: make sure request completion won't concurrent")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20220521073749.3146892-4-yukuai3@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:55 +02:00
Christoph Hellwig
be7c76c47f block: take destination bvec offsets into account in bio_copy_data_iter
[ Upstream commit 403d50341c ]

Appartly bcache can copy into bios that do not just contain fresh
pages but can have offsets into the bio_vecs.  Restore support for tht
in bio_copy_data_iter.

Fixes: f8b679a070 ("block: rewrite bio_copy_data_iter to use bvec_kmap_local and memcpy_to_bvec")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220524143919.1155501-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:55 +02:00
Menglong Dong
819d7d136b bpf: Fix probe read error in ___bpf_prog_run()
[ Upstream commit caff1fa411 ]

I think there is something wrong with BPF_PROBE_MEM in ___bpf_prog_run()
in big-endian machine. Let's make a test and see what will happen if we
want to load a 'u16' with BPF_PROBE_MEM.

Let's make the src value '0x0001', the value of dest register will become
0x0001000000000000, as the value will be loaded to the first 2 byte of
DST with following code:

  bpf_probe_read_kernel(&DST, SIZE, (const void *)(long) (SRC + insn->off));

Obviously, the value in DST is not correct. In fact, we can compare
BPF_PROBE_MEM with LDX_MEM_H:

  DST = *(SIZE *)(unsigned long) (SRC + insn->off);

If the memory load is done by LDX_MEM_H, the value in DST will be 0x1 now.

And I think this error results in the test case 'test_bpf_sk_storage_map'
failing:

  test_bpf_sk_storage_map:PASS:bpf_iter_bpf_sk_storage_map__open_and_load 0 nsec
  test_bpf_sk_storage_map:PASS:socket 0 nsec
  test_bpf_sk_storage_map:PASS:map_update 0 nsec
  test_bpf_sk_storage_map:PASS:socket 0 nsec
  test_bpf_sk_storage_map:PASS:map_update 0 nsec
  test_bpf_sk_storage_map:PASS:socket 0 nsec
  test_bpf_sk_storage_map:PASS:map_update 0 nsec
  test_bpf_sk_storage_map:PASS:attach_iter 0 nsec
  test_bpf_sk_storage_map:PASS:create_iter 0 nsec
  test_bpf_sk_storage_map:PASS:read 0 nsec
  test_bpf_sk_storage_map:FAIL:ipv6_sk_count got 0 expected 3
  $10/26 bpf_iter/bpf_sk_storage_map:FAIL

The code of the test case is simply, it will load sk->sk_family to the
register with BPF_PROBE_MEM and check if it is AF_INET6. With this patch,
now the test case 'bpf_iter' can pass:

  $10  bpf_iter:OK

Fixes: 2a02759ef5 ("bpf: Add support for BTF pointers to interpreter")
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/bpf/20220524021228.533216-1-imagedong@tencent.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:55 +02:00
Song Liu
f8ad3a0007 selftests/bpf: fix stacktrace_build_id with missing kprobe/urandom_read
[ Upstream commit 59ed76fe2f ]

Kernel function urandom_read is replaced with urandom_read_iter.
Therefore, kprobe on urandom_read is not working any more:

[root@eth50-1 bpf]# ./test_progs -n 161
test_stacktrace_build_id:PASS:skel_open_and_load 0 nsec
libbpf: kprobe perf_event_open() failed: No such file or directory
libbpf: prog 'oncpu': failed to create kprobe 'urandom_read+0x0' \
        perf event: No such file or directory
libbpf: prog 'oncpu': failed to auto-attach: -2
test_stacktrace_build_id:FAIL:attach_tp err -2
161     stacktrace_build_id:FAIL

Fix this by replacing urandom_read with urandom_read_iter in the test.

Fixes: 1b388e7765 ("random: convert to using fops->read_iter()")
Reported-by: Mykola Lysenko <mykolal@fb.com>
Signed-off-by: Song Liu <song@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20220526191608.2364049-1-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:55 +02:00
Zhihao Cheng
1174ab8ba3 ubi: ubi_create_volume: Fix use-after-free when volume creation failed
[ Upstream commit 8c03a1c21d ]

There is an use-after-free problem for 'eba_tbl' in ubi_create_volume()'s
error handling path:

  ubi_eba_replace_table(vol, eba_tbl)
    vol->eba_tbl = tbl
out_mapping:
  ubi_eba_destroy_table(eba_tbl)   // Free 'eba_tbl'
out_unlock:
  put_device(&vol->dev)
    vol_release
      kfree(tbl->entries)	  // UAF

Fix it by removing redundant 'eba_tbl' releasing.
Fetch a reproducer in [Link].

Fixes: 493cfaeaa0 ("mtd: utilize new cdev_device_add helper function")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215965
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:55 +02:00
Zhihao Cheng
edb8693702 ubi: fastmap: Fix high cpu usage of ubi_bgt by making sure wl_pool not empty
[ Upstream commit d09e9a2bdd ]

There at least 6 PEBs reserved on UBI device:
1. EBA_RESERVED_PEBS[1]
2. WL_RESERVED_PEBS[1]
3. UBI_LAYOUT_VOLUME_EBS[2]
4. MIN_FASTMAP_RESERVED_PEBS[2]

When all ubi volumes take all their PEBs, there are 3 (EBA_RESERVED_PEBS +
WL_RESERVED_PEBS + MIN_FASTMAP_RESERVED_PEBS - MIN_FASTMAP_TAKEN_PEBS[1])
free PEBs. Since commit f9c34bb529 ("ubi: Fix producing anchor PEBs")
and commit 4b68bf9a69 ("ubi: Select fastmap anchor PEBs considering
wear level rules") applied, there is only 1 (3 - FASTMAP_ANCHOR_PEBS[1] -
FASTMAP_NEXT_ANCHOR_PEBS[1]) free PEB to fill pool and wl_pool, after
filling pool, wl_pool is always empty. So, UBI could be stuck in an
infinite loop:

	ubi_thread	   system_wq
wear_leveling_worker <--------------------------------------------------
  get_peb_for_wl							|
    // fm_wl_pool, used = size = 0					|
    schedule_work(&ubi->fm_work)					|
									|
		    update_fastmap_work_fn				|
		      ubi_update_fastmap				|
			ubi_refill_pools				|
			// ubi->free_count - ubi->beb_rsvd_pebs < 5	|
			// wl_pool is not filled with any PEBs		|
			schedule_erase(old_fm_anchor)			|
			ubi_ensure_anchor_pebs				|
			  __schedule_ubi_work(wear_leveling_worker)	|
									|
__erase_worker								|
  ensure_wear_leveling							|
    __schedule_ubi_work(wear_leveling_worker) --------------------------

, which cause high cpu usage of ubi_bgt:
top - 12:10:42 up 5 min,  2 users,  load average: 1.76, 0.68, 0.27
Tasks: 123 total,   3 running,  54 sleeping,   0 stopped,   0 zombie

  PID USER PR   NI VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1589 root 20   0   0      0      0 R  45.0  0.0   0:38.86 ubi_bgt0d
  319 root 20   0   0      0      0 I  15.2  0.0   0:15.29 kworker/0:3-eve
  371 root 20   0   0      0      0 I  14.9  0.0   0:12.85 kworker/3:3-eve
   20 root 20   0   0      0      0 I  11.3  0.0   0:05.33 kworker/1:0-eve
  202 root 20   0   0      0      0 I  11.3  0.0   0:04.93 kworker/2:3-eve

In commit 4b68bf9a69 ("ubi: Select fastmap anchor PEBs considering
wear level rules"), there are three key changes:
  1) Choose the fastmap anchor when the most free PEBs are available.
  2) Enable anchor move within the anchor area again as it is useful
     for distributing wear.
  3) Import a candidate fm anchor and check this PEB's erase count during
     wear leveling. If the wear leveling limit is exceeded, use the used
     anchor area PEB with the lowest erase count to replace it.

The anchor candidate can be removed, we can check fm_anchor PEB's erase
count during wear leveling. Fix it by:
  1) Removing 'fm_next_anchor' and check 'fm_anchor' during wear leveling.
  2) Preferentially filling one free peb into fm_wl_pool in condition of
     ubi->free_count > ubi->beb_rsvd_pebs, then try to reserve enough
     free count for fastmap non anchor pebs after the above prerequisites
     are met.
Then, there are at least 1 PEB in pool and 1 PEB in wl_pool after calling
ubi_refill_pools() with all erase works done.

Fetch a reproducer in [Link].

Fixes: 4b68bf9a69 ("ubi: Select fastmap anchor PEBs ... rules")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215407
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:55 +02:00
Baokun Li
69295267c4 jffs2: fix memory leak in jffs2_do_fill_super
[ Upstream commit c14adb1cf7 ]

If jffs2_iget() or d_make_root() in jffs2_do_fill_super() returns
an error, we can observe the following kmemleak report:

--------------------------------------------
unreferenced object 0xffff888105a65340 (size 64):
  comm "mount", pid 710, jiffies 4302851558 (age 58.239s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff859c45e5>] kmem_cache_alloc_trace+0x475/0x8a0
    [<ffffffff86160146>] jffs2_sum_init+0x96/0x1a0
    [<ffffffff86140e25>] jffs2_do_mount_fs+0x745/0x2120
    [<ffffffff86149fec>] jffs2_do_fill_super+0x35c/0x810
    [<ffffffff8614aae9>] jffs2_fill_super+0x2b9/0x3b0
    [...]
unreferenced object 0xffff8881bd7f0000 (size 65536):
  comm "mount", pid 710, jiffies 4302851558 (age 58.239s)
  hex dump (first 32 bytes):
    bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
    bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
  backtrace:
    [<ffffffff858579ba>] kmalloc_order+0xda/0x110
    [<ffffffff85857a11>] kmalloc_order_trace+0x21/0x130
    [<ffffffff859c2ed1>] __kmalloc+0x711/0x8a0
    [<ffffffff86160189>] jffs2_sum_init+0xd9/0x1a0
    [<ffffffff86140e25>] jffs2_do_mount_fs+0x745/0x2120
    [<ffffffff86149fec>] jffs2_do_fill_super+0x35c/0x810
    [<ffffffff8614aae9>] jffs2_fill_super+0x2b9/0x3b0
    [...]
--------------------------------------------

This is because the resources allocated in jffs2_sum_init() are not
released. Call jffs2_sum_exit() to release these resources to solve
the problem.

Fixes: e631ddba58 ("[JFFS2] Add erase block summary support (mount time improvement)")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:55 +02:00
Genjian Zhang
69a3378524 ep93xx: clock: Do not return the address of the freed memory
[ Upstream commit 8a7322a3a0 ]

Avoid return freed memory addresses,Modified to the actual error
return value of clk_register().

Fixes: 9645ccc7bd ("ep93xx: clock: convert in-place to COMMON_CLK")
Signed-off-by: Genjian Zhang <zhanggenjian@kylinos.cn>
Acked-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:55 +02:00
Christoph Hellwig
4c10e26c35 block, loop: support partitions without scanning
[ Upstream commit b9684a71fc ]

Historically we did distinguish between a flag that surpressed partition
scanning, and a combinations of the minors variable and another flag if
any partitions were supported.  This was generally confusing and doesn't
make much sense, but some corner case uses of the loop driver actually
do want to support manually added partitions on a device that does not
actively scan for partitions.  To make things worsee the loop driver
also wants to dynamically toggle the scanning for partitions on a live
gendisk, which makes the disk->flags updates non-atomic.

Introduce a new GD_SUPPRESS_PART_SCAN bit in disk->state that disables
just scanning for partitions, and toggle that instead of GENHD_FL_NO_PART
in the loop driver.

Fixes: 1ebe2e5f9d ("block: remove GENHD_FL_EXT_DEVT")
Reported-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220527055806.1972352-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:55 +02:00
Alexander Lobakin
9707b27648 modpost: fix removing numeric suffixes
[ Upstream commit b5beffa20d ]

With the `-z unique-symbol` linker flag or any similar mechanism,
it is possible to trigger the following:

ERROR: modpost: "param_set_uint.0" [vmlinux] is a static EXPORT_SYMBOL

The reason is that for now the condition from remove_dot():

if (m && (s[n + m] == '.' || s[n + m] == 0))

which was designed to test if it's a dot or a '\0' after the suffix
is never satisfied.
This is due to that `s[n + m]` always points to the last digit of a
numeric suffix, not on the symbol next to it (from a custom debug
print added to modpost):

param_set_uint.0, s[n + m] is '0', s[n + m + 1] is '\0'

So it's off-by-one and was like that since 2014.

Fix this for the sake of any potential upcoming features, but don't
bother stable-backporting, as it's well hidden -- apart from that
LD flag, it can be triggered only with GCC LTO which never landed
upstream.

Fixes: fcd38ed0ff ("scripts: modpost: fix compilation warning")
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:54 +02:00
Miaoqian Lin
8a1a125515 net: dsa: mv88e6xxx: Fix refcount leak in mv88e6xxx_mdios_register
[ Upstream commit 02ded5a173 ]

of_get_child_by_name() returns a node pointer with refcount
incremented, we should use of_node_put() on it when done.

mv88e6xxx_mdio_register() pass the device node to of_mdiobus_register().
We don't need the device node after it.

Add missing of_node_put() to avoid refcount leak.

Fixes: a3c53be55c ("net: dsa: mv88e6xxx: Support multiple MDIO busses")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Reviewed-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:54 +02:00
Miaoqian Lin
2e44f21c38 net: ethernet: ti: am65-cpsw-nuss: Fix some refcount leaks
[ Upstream commit 5dd89d2fc4 ]

of_get_child_by_name() returns a node pointer with refcount
incremented, we should use of_node_put() on it when not need anymore.
am65_cpsw_init_cpts() and am65_cpsw_nuss_probe() don't release
the refcount in error case.
Add missing of_node_put() to avoid refcount leak.

Fixes: b1f66a5bee ("net: ethernet: ti: am65-cpsw-nuss: enable packet timestamping support")
Fixes: 93a7653031 ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:54 +02:00
Dan Carpenter
0b238f75b6 net: ethernet: mtk_eth_soc: out of bounds read in mtk_hwlro_get_fdir_entry()
[ Upstream commit e7e7104e2d ]

The "fsp->location" variable comes from user via ethtool_get_rxnfc().
Check that it is valid to prevent an out of bounds read.

Fixes: 7aab747e55 ("net: ethernet: mediatek: add ethtool functions to configure RX flows of HW LRO")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:54 +02:00
Vincent Ray
7ce6a502eb net: sched: fixed barrier to prevent skbuff sticking in qdisc backlog
[ Upstream commit a54ce37036 ]

In qdisc_run_begin(), smp_mb__before_atomic() used before test_bit()
does not provide any ordering guarantee as test_bit() is not an atomic
operation. This, added to the fact that the spin_trylock() call at
the beginning of qdisc_run_begin() does not guarantee acquire
semantics if it does not grab the lock, makes it possible for the
following statement :

if (test_bit(__QDISC_STATE_MISSED, &qdisc->state))

to be executed before an enqueue operation called before
qdisc_run_begin().

As a result the following race can happen :

           CPU 1                             CPU 2

      qdisc_run_begin()               qdisc_run_begin() /* true */
        set(MISSED)                            .
      /* returns false */                      .
          .                            /* sees MISSED = 1 */
          .                            /* so qdisc not empty */
          .                            __qdisc_run()
          .                                    .
          .                              pfifo_fast_dequeue()
 ----> /* may be done here */                  .
|         .                                clear(MISSED)
|         .                                    .
|         .                                smp_mb __after_atomic();
|         .                                    .
|         .                                /* recheck the queue */
|         .                                /* nothing => exit   */
|   enqueue(skb1)
|         .
|   qdisc_run_begin()
|         .
|     spin_trylock() /* fail */
|         .
|     smp_mb__before_atomic() /* not enough */
|         .
 ---- if (test_bit(MISSED))
        return false;   /* exit */

In the above scenario, CPU 1 and CPU 2 both try to grab the
qdisc->seqlock at the same time. Only CPU 2 succeeds and enters the
bypass code path, where it emits its skb then calls __qdisc_run().

CPU1 fails, sets MISSED and goes down the traditionnal enqueue() +
dequeue() code path. But when executing qdisc_run_begin() for the
second time, after enqueuing its skbuff, it sees the MISSED bit still
set (by itself) and consequently chooses to exit early without setting
it again nor trying to grab the spinlock again.

Meanwhile CPU2 has seen MISSED = 1, cleared it, checked the queue
and found it empty, so it returned.

At the end of the sequence, we end up with skb1 enqueued in the
backlog, both CPUs out of __dev_xmit_skb(), the MISSED bit not set,
and no __netif_schedule() called made. skb1 will now linger in the
qdisc until somebody later performs a full __qdisc_run(). Associated
to the bypass capacity of the qdisc, and the ability of the TCP layer
to avoid resending packets which it knows are still in the qdisc, this
can lead to serious traffic "holes" in a TCP connection.

We fix this by replacing the smp_mb__before_atomic() / test_bit() /
set_bit() / smp_mb__after_atomic() sequence inside qdisc_run_begin()
by a single test_and_set_bit() call, which is more concise and
enforces the needed memory barriers.

Fixes: 89837eb4b2 ("net: sched: add barrier to ensure correct ordering for lockless qdisc")
Signed-off-by: Vincent Ray <vray@kalrayinc.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220526001746.2437669-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:54 +02:00
Michael Walle
00c041773a net: lan966x: check devm_of_phy_get() for -EDEFER_PROBE
[ Upstream commit b58cdd4388 ]

At the moment, if devm_of_phy_get() returns an error the serdes
simply isn't set. While it is bad to ignore an error in general, there
is a particular bug that network isn't working if the serdes driver is
compiled as a module. In that case, devm_of_phy_get() returns
-EDEFER_PROBE and the error is silently ignored.

The serdes is optional, it is not there if the port is using RGMII, in
which case devm_of_phy_get() returns -ENODEV. Rearrange the error
handling so that -ENODEV will be handled but other error codes will
abort the probing.

Fixes: d28d6d2e37 ("net: lan966x: add port module support")
Signed-off-by: Michael Walle <michael@walle.cc>
Link: https://lore.kernel.org/r/20220525231239.1307298-1-michael@walle.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:54 +02:00
Dan Carpenter
607c5cd1a0 drm/amdgpu: Off by one in dm_dmub_outbox1_low_irq()
[ Upstream commit a35faec3db ]

The > ARRAY_SIZE() should be >= ARRAY_SIZE() to prevent an out of bounds
access.

Fixes: e27c41d5b0 ("drm/amd/display: Support for DMUB HPD interrupt handling")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:54 +02:00
Eddie James
119b7d5480 spi: fsi: Fix spurious timeout
[ Upstream commit 61bf40ef51 ]

The driver may return a timeout error even if the status register
indicates that the transfer may proceed. Fix this by restructuring
the polling loop.

Fixes: 89b35e3f28 ("spi: fsi: Implement a timeout for polling status")
Signed-off-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/r/20220525165852.33167-2-eajames@linux.ibm.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:54 +02:00
liuyacan
776fcaf860 net/smc: set ini->smcrv2.ib_dev_v2 to NULL if SMC-Rv2 is unavailable
[ Upstream commit b3b1a17538 ]

In the process of checking whether RDMAv2 is available, the current
implementation first sets ini->smcrv2.ib_dev_v2, and then allocates
smc buf desc and register rmb, but the latter may fail. In this case,
the pointer should be reset.

Fixes: e49300a6bf ("net/smc: add listen processing for SMC-Rv2")
Signed-off-by: liuyacan <liuyacan@corp.netease.com>
Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/20220525085408.812273-1-liuyacan@corp.netease.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:54 +02:00
Siddharth Vadapalli
1808f7c9f2 net: ethernet: ti: am65-cpsw: Fix fwnode passed to phylink_create()
[ Upstream commit 0b7180072a ]

am65-cpsw-nuss driver incorrectly uses fwnode member of common
ethernet device's "struct device_node" instead of using fwnode
member of the port's "struct device_node" in phylink_create().
This results in all ports having the same phy data when there
are multiple ports with their phy properties populated in their
respective nodes rather than the common ethernet device node.

Fix it here by using fwnode member of the port's node.

Fixes: e8609e6947 ("net: ethernet: ti: am65-cpsw: Convert to PHYLINK")
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Link: https://lore.kernel.org/r/20220524062558.19296-1-s-vadapalli@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:53 +02:00
Taehee Yoo
4b8032d39b amt: fix possible memory leak in amt_rcv()
[ Upstream commit 1a1a0e80e0 ]

If an amt receives packets and it finds socket.
If it can't find a socket, it should free a received skb.
But it doesn't.
So, a memory leak would possibly occur.

Fixes: cbc21dc1cf ("amt: add data plane of amt interface")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:53 +02:00
Taehee Yoo
5559f47604 amt: fix return value of amt_update_handler()
[ Upstream commit ac1dbf5598 ]

If a relay receives an update message, it lookup a tunnel.
and if there is no tunnel for that message, it should be treated
as an error, not a success.
But amt_update_handler() returns false, which means success.

Fixes: cbc21dc1cf ("amt: add data plane of amt interface")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:53 +02:00
Jann Horn
c15d5277f0 s390/crypto: fix scatterwalk_unmap() callers in AES-GCM
[ Upstream commit bd52cd5e23 ]

The argument of scatterwalk_unmap() is supposed to be the void* that was
returned by the previous scatterwalk_map() call.
The s390 AES-GCM implementation was instead passing the pointer to the
struct scatter_walk.

This doesn't actually break anything because scatterwalk_unmap() only uses
its argument under CONFIG_HIGHMEM and ARCH_HAS_FLUSH_ON_KUNMAP.

Fixes: bf7fa03870 ("s390/crypto: add s390 platform specific aes gcm support.")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Harald Freudenberger <freude@linux.ibm.com>
Link: https://lore.kernel.org/r/20220517143047.3054498-1-jannh@google.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:53 +02:00
Krzysztof Kozlowski
6033ae2443 clocksource/drivers/oxnas-rps: Fix irq_of_parse_and_map() return value
[ Upstream commit 9c04a8ff03 ]

The irq_of_parse_and_map() returns 0 on failure, not a negative ERRNO.

Fixes: 89355274e1 ("clocksource/drivers/oxnas-rps: Add Oxford Semiconductor RPS Dual Timer")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
Link: https://lore.kernel.org/r/20220422104101.55754-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:53 +02:00
Christoph Hellwig
0a9cd23d29 scsi: sd: Don't call blk_cleanup_disk() in sd_probe()
[ Upstream commit 7274ce0558 ]

In SCSI the midlayer has ownership of the request_queue, so on probe
failure we must only put the gendisk, but leave the request_queue alone.

Link: https://lore.kernel.org/r/20220523083813.227935-1-hch@lst.de
Fixes: 03252259e1 ("scsi: sd: Clean up gendisk if device_add_disk() failed")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 18:44:53 +02:00