778 commits

Author SHA1 Message Date
Alexander Aring
7d28c870a7 fs: dlm: fix DLM_IFL_CB_PENDING gets overwritten
commit a034c1370d upstream.

This patch introduces a new internal flag value per lkb to hold
internal flags that are not sent over the wire. The current lkb internal
flags, stored in lkb->lkb_flags, are split into upper and lower bits; the
lower bits are used to share internal flags over the wire with the
cluster-wide lkb copies on other nodes.

In commit 61bed0baa4 ("fs: dlm: use a non-static queue for callbacks")
we introduced a new internal flag for pending callbacks for the dlm
callback queue. This flag is protected by the lkb->lkb_cb_lock lock.
That commit overlooked that the dlm receive path, because of the upper
and lower bits mentioned above, reads the flags, masks them and writes
them back; see for example receive_flags() in fs/dlm/lock.c. This flag
manipulation is not done atomically and is not protected by
lkb->lkb_cb_lock. This has unknown side effects on the current callback
handling.

In the future we should move to set/clear/test bit functionality and
avoid reading, masking and writing back flag values. In later patches we
will move the upper bits, which are not shared with other cluster nodes,
to the newly introduced non-shared internal flag field to avoid similar
issues.

Cc: stable@vger.kernel.org
Fixes: 61bed0baa4 ("fs: dlm: use a non-static queue for callbacks")
Reported-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-11 23:10:55 +09:00
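The idea described in the commit above (keep node-local flags out of the word that the receive path rewrites, and use set/clear/test operations on them) can be illustrated with a minimal userspace sketch. Everything here is an assumption made for illustration: the struct, the `sketch_` names, the mask and the bit position are hypothetical, and C11 atomics stand in for the kernel's bit operations. It is not the code from fs/dlm.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only: wire-visible flags and
 * node-local flags live in two separate words. */
#define SKETCH_WIRE_MASK   0x0000FFFFu
#define SKETCH_CB_PENDING  (1u << 0)   /* bit in the local word */

struct lkb_sketch {
    uint32_t wire_flags;           /* replaced wholesale on receive */
    _Atomic uint32_t local_flags;  /* never written by the receive path */
};

/* Receive path: only touches the wire-visible word. */
static void sketch_receive_flags(struct lkb_sketch *lkb, uint32_t from_wire)
{
    lkb->wire_flags = from_wire & SKETCH_WIRE_MASK;
}

/* Local flag handling: atomic set/clear/test, no read-mask-write. */
static void sketch_set_cb_pending(struct lkb_sketch *lkb)
{
    atomic_fetch_or(&lkb->local_flags, SKETCH_CB_PENDING);
}

static void sketch_clear_cb_pending(struct lkb_sketch *lkb)
{
    atomic_fetch_and(&lkb->local_flags, ~SKETCH_CB_PENDING);
}

static bool sketch_test_cb_pending(struct lkb_sketch *lkb)
{
    return (atomic_load(&lkb->local_flags) & SKETCH_CB_PENDING) != 0;
}
```

With this split, a receive that rewrites the wire-visible word cannot clobber a pending-callback bit that was set concurrently, because the two paths no longer share a word.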
Alexander Aring
23a7f43153 fs: dlm: send FIN ack back in right cases
commit 00908b3388 upstream.

This patch sends an ack back for a received FIN message only when we
are in a valid state. In other cases, where a sender might be waiting for
an ack, we just let it time out on the sender's side and hope that all
other cleanups will remove the FIN message from their sending queue. For
example, we should never send out an ACK while in LAST_ACK state, and we
cannot assume working socket communication when we are in CLOSED state.

Cc: stable@vger.kernel.org
Fixes: 489d8e559c ("fs: dlm: add reliable connection if reconnect")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-10 09:29:30 +01:00
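The rule from the commit above, acking a FIN only in valid states, can be sketched as a small predicate. The commit message only says that LAST_ACK and CLOSED must not ack; the set of states that do ack is an assumption borrowed from the TCP state machine, and the enum names are hypothetical rather than taken from fs/dlm/midcomms.c.

```c
#include <stdbool.h>

/* Hypothetical state names modeled on TCP; the real enum may differ. */
enum sketch_state {
    SKETCH_CLOSED,
    SKETCH_ESTABLISHED,
    SKETCH_FIN_WAIT1,
    SKETCH_FIN_WAIT2,
    SKETCH_CLOSE_WAIT,
    SKETCH_LAST_ACK,
    SKETCH_CLOSING,
};

/* Returns true if a received FIN should be answered with an ack. */
static bool sketch_should_ack_fin(enum sketch_state state)
{
    switch (state) {
    case SKETCH_ESTABLISHED:
    case SKETCH_FIN_WAIT1:
    case SKETCH_FIN_WAIT2:
        /* Assumption: states in which a FIN from the peer is expected. */
        return true;
    case SKETCH_LAST_ACK:   /* named in the commit message: never ack */
    case SKETCH_CLOSED:     /* named in the commit message: no working socket */
    default:
        /* Let the sender time out instead of acking. */
        return false;
    }
}
```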
Alexander Aring
dcfde37c99 fs: dlm: move sending fin message into state change handling
commit a584963618 upstream.

This patch moves the send fin handling, which should appear in a specific
state change, into the state change handling while the per node
state_lock is held. I experienced issues with other messages because
we changed the state and a fin message was sent out in a different state.

Cc: stable@vger.kernel.org
Fixes: 489d8e559c ("fs: dlm: add reliable connection if reconnect")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-10 09:29:30 +01:00
Alexander Aring
5cdc5869b6 fs: dlm: don't set stop rx flag after node reset
commit 15c63db8e8 upstream.

Similar to the stop tx flag, the rx flag should warn about a dlm message
being received at DLM_FIN state change, when we are assuming no other
dlm application messages. If we receive a FIN message and we are in the
state DLM_FIN_WAIT2 we call midcomms_node_reset() which puts the
midcomms node into DLM_CLOSED state. Afterwards we should not set the
DLM_NODE_FLAG_STOP_RX flag any more. This patch changes where
DLM_NODE_FLAG_STOP_RX is set: it is now set in those state changes where
we receive a FIN message and assume that no other dlm application
messages will be received until we hit DLM_CLOSED state.

Cc: stable@vger.kernel.org
Fixes: 489d8e559c ("fs: dlm: add reliable connection if reconnect")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-10 09:29:30 +01:00
Alexander Aring
c7283d6461 fs: dlm: fix race setting stop tx flag
commit 164272113b upstream.

This patch sets the stop tx flag before we commit the dlm message.
This flag will report about unexpected transmissions after we
send the DLM_FIN message out, which should be the last message sent.
When we commit the dlm fin message, it could be that we already
got an ack back and the CLOSED state change already happened.
We should not set this flag when we are in CLOSED state. To avoid this
race we simply set the tx flag before the state change can be in
progress by moving it before dlm_midcomms_commit_mhandle().

Cc: stable@vger.kernel.org
Fixes: 489d8e559c ("fs: dlm: add reliable connection if reconnect")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-10 09:29:29 +01:00
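The ordering problem described in the commit above can be modeled in a single-threaded sketch, where a "fast ack" is simulated as happening inside the commit call. All names are hypothetical, and the assumption that a node reset clears the flag is made only for this illustration.

```c
#include <stdbool.h>

enum sketch_node_state { SKETCH_NODE_ESTABLISHED, SKETCH_NODE_CLOSED };

struct sketch_node {
    enum sketch_node_state state;
    bool stop_tx;   /* marks "no further transmissions expected" */
};

/* Assumption: reaching CLOSED resets the node, including its flags. */
static void sketch_node_reset(struct sketch_node *node)
{
    node->state = SKETCH_NODE_CLOSED;
    node->stop_tx = false;
}

/* Stand-in for committing the FIN message. If ack_is_fast is true, the
 * peer's ack is modeled as arriving before this function returns. */
static void sketch_commit_fin(struct sketch_node *node, bool ack_is_fast)
{
    if (ack_is_fast)
        sketch_node_reset(node);
}

/* Racy order: the flag is set after the commit and can land in CLOSED. */
static void sketch_send_fin_flag_after(struct sketch_node *node, bool ack_is_fast)
{
    sketch_commit_fin(node, ack_is_fast);
    node->stop_tx = true;
}

/* Fixed order: the flag is set before any state change can be in progress. */
static void sketch_send_fin_flag_before(struct sketch_node *node, bool ack_is_fast)
{
    node->stop_tx = true;
    sketch_commit_fin(node, ack_is_fast);
}
```

In the racy variant a fast ack leaves the node in CLOSED with the flag still set. In the fixed variant the reset clears it.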
Alexander Aring
73a2bd498a fs: dlm: be sure to call dlm_send_queue_flush()
commit 7354fa4ef6 upstream.

If we release a midcomms node structure, there should be nothing left
inside the dlm midcomms send queue. However, sometimes this is not true
because I believe some DLM_FIN message was not acked... if we run
into a shutdown timeout, then we should be sure there is no pending send
dlm message inside this queue when releasing midcomms node structure.

Cc: stable@vger.kernel.org
Fixes: 489d8e559c ("fs: dlm: add reliable connection if reconnect")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-10 09:29:29 +01:00
Alexander Aring
a2de9f9b68 fs: dlm: fix use after free in midcomms commit
commit 724b6bab0d upstream.

While working on processing dlm message in softirq context I experienced
the following KASAN use-after-free warning:

[  151.760477] ==================================================================
[  151.761803] BUG: KASAN: use-after-free in dlm_midcomms_commit_mhandle+0x19d/0x4b0
[  151.763414] Read of size 4 at addr ffff88811a980c60 by task lock_torture/1347

[  151.765284] CPU: 7 PID: 1347 Comm: lock_torture Not tainted 6.1.0-rc4+ #2828
[  151.766778] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-3.module+el8.7.0+16134+e5908aa2 04/01/2014
[  151.768726] Call Trace:
[  151.769277]  <TASK>
[  151.769748]  dump_stack_lvl+0x5b/0x86
[  151.770556]  print_report+0x180/0x4c8
[  151.771378]  ? kasan_complete_mode_report_info+0x7c/0x1e0
[  151.772241]  ? dlm_midcomms_commit_mhandle+0x19d/0x4b0
[  151.773069]  kasan_report+0x93/0x1a0
[  151.773668]  ? dlm_midcomms_commit_mhandle+0x19d/0x4b0
[  151.774514]  __asan_load4+0x7e/0xa0
[  151.775089]  dlm_midcomms_commit_mhandle+0x19d/0x4b0
[  151.775890]  ? create_message.isra.29.constprop.64+0x57/0xc0
[  151.776770]  send_common+0x19f/0x1b0
[  151.777342]  ? remove_from_waiters+0x60/0x60
[  151.778017]  ? lock_downgrade+0x410/0x410
[  151.778648]  ? __this_cpu_preempt_check+0x13/0x20
[  151.779421]  ? rcu_lockdep_current_cpu_online+0x88/0xc0
[  151.780292]  _convert_lock+0x46/0x150
[  151.780893]  convert_lock+0x7b/0xc0
[  151.781459]  dlm_lock+0x3ac/0x580
[  151.781993]  ? 0xffffffffc0540000
[  151.782522]  ? torture_stop+0x120/0x120 [dlm_locktorture]
[  151.783379]  ? dlm_scan_rsbs+0xa70/0xa70
[  151.784003]  ? preempt_count_sub+0xd6/0x130
[  151.784661]  ? is_module_address+0x47/0x70
[  151.785309]  ? torture_stop+0x120/0x120 [dlm_locktorture]
[  151.786166]  ? 0xffffffffc0540000
[  151.786693]  ? lockdep_init_map_type+0xc3/0x360
[  151.787414]  ? 0xffffffffc0540000
[  151.787947]  torture_dlm_lock_sync.isra.3+0xe9/0x150 [dlm_locktorture]
[  151.789004]  ? torture_stop+0x120/0x120 [dlm_locktorture]
[  151.789858]  ? 0xffffffffc0540000
[  151.790392]  ? lock_torture_cleanup+0x20/0x20 [dlm_locktorture]
[  151.791347]  ? delay_tsc+0x94/0xc0
[  151.791898]  torture_ex_iter+0xc3/0xea [dlm_locktorture]
[  151.792735]  ? torture_start+0x30/0x30 [dlm_locktorture]
[  151.793606]  lock_torture+0x177/0x270 [dlm_locktorture]
[  151.794448]  ? torture_dlm_lock_sync.isra.3+0x150/0x150 [dlm_locktorture]
[  151.795539]  ? lock_torture_stats+0x80/0x80 [dlm_locktorture]
[  151.796476]  ? do_raw_spin_lock+0x11e/0x1e0
[  151.797152]  ? mark_held_locks+0x34/0xb0
[  151.797784]  ? _raw_spin_unlock_irqrestore+0x30/0x70
[  151.798581]  ? __kthread_parkme+0x79/0x110
[  151.799246]  ? trace_preempt_on+0x2a/0xf0
[  151.799902]  ? __kthread_parkme+0x79/0x110
[  151.800579]  ? preempt_count_sub+0xd6/0x130
[  151.801271]  ? __kasan_check_read+0x11/0x20
[  151.801963]  ? __kthread_parkme+0xec/0x110
[  151.802630]  ? lock_torture_stats+0x80/0x80 [dlm_locktorture]
[  151.803569]  kthread+0x192/0x1d0
[  151.804104]  ? kthread_complete_and_exit+0x30/0x30
[  151.804881]  ret_from_fork+0x1f/0x30
[  151.805480]  </TASK>

[  151.806111] Allocated by task 1347:
[  151.806681]  kasan_save_stack+0x26/0x50
[  151.807308]  kasan_set_track+0x25/0x30
[  151.807920]  kasan_save_alloc_info+0x1e/0x30
[  151.808609]  __kasan_slab_alloc+0x63/0x80
[  151.809263]  kmem_cache_alloc+0x1ad/0x830
[  151.809916]  dlm_allocate_mhandle+0x17/0x20
[  151.810590]  dlm_midcomms_get_mhandle+0x96/0x260
[  151.811344]  _create_message+0x95/0x180
[  151.811994]  create_message.isra.29.constprop.64+0x57/0xc0
[  151.812880]  send_common+0x129/0x1b0
[  151.813467]  _convert_lock+0x46/0x150
[  151.814074]  convert_lock+0x7b/0xc0
[  151.814648]  dlm_lock+0x3ac/0x580
[  151.815199]  torture_dlm_lock_sync.isra.3+0xe9/0x150 [dlm_locktorture]
[  151.816258]  torture_ex_iter+0xc3/0xea [dlm_locktorture]
[  151.817129]  lock_torture+0x177/0x270 [dlm_locktorture]
[  151.817986]  kthread+0x192/0x1d0
[  151.818518]  ret_from_fork+0x1f/0x30

[  151.819369] Freed by task 1336:
[  151.819890]  kasan_save_stack+0x26/0x50
[  151.820514]  kasan_set_track+0x25/0x30
[  151.821128]  kasan_save_free_info+0x2e/0x50
[  151.821812]  __kasan_slab_free+0x107/0x1a0
[  151.822483]  kmem_cache_free+0x204/0x5e0
[  151.823152]  dlm_free_mhandle+0x18/0x20
[  151.823781]  dlm_mhandle_release+0x2e/0x40
[  151.824454]  rcu_core+0x583/0x1330
[  151.825047]  rcu_core_si+0xe/0x20
[  151.825594]  __do_softirq+0xf4/0x5c2

[  151.826450] Last potentially related work creation:
[  151.827238]  kasan_save_stack+0x26/0x50
[  151.827870]  __kasan_record_aux_stack+0xa2/0xc0
[  151.828609]  kasan_record_aux_stack_noalloc+0xb/0x20
[  151.829415]  call_rcu+0x4c/0x760
[  151.829954]  dlm_mhandle_delete+0x97/0xb0
[  151.830718]  dlm_process_incoming_buffer+0x2fc/0xb30
[  151.831524]  process_dlm_messages+0x16e/0x470
[  151.832245]  process_one_work+0x505/0xa10
[  151.832905]  worker_thread+0x67/0x650
[  151.833507]  kthread+0x192/0x1d0
[  151.834046]  ret_from_fork+0x1f/0x30

[  151.834900] The buggy address belongs to the object at ffff88811a980c30
                which belongs to the cache dlm_mhandle of size 88
[  151.836894] The buggy address is located 48 bytes inside of
                88-byte region [ffff88811a980c30, ffff88811a980c88)

[  151.839007] The buggy address belongs to the physical page:
[  151.839904] page:0000000076cf5d62 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11a980
[  151.841378] flags: 0x8000000000000200(slab|zone=2)
[  151.842141] raw: 8000000000000200 0000000000000000 dead000000000122 ffff8881089b43c0
[  151.843401] raw: 0000000000000000 0000000000220022 00000001ffffffff 0000000000000000
[  151.844640] page dumped because: kasan: bad access detected

[  151.845822] Memory state around the buggy address:
[  151.846602]  ffff88811a980b00: fb fb fb fb fc fc fc fc fa fb fb fb fb fb fb fb
[  151.847761]  ffff88811a980b80: fb fb fb fc fc fc fc fa fb fb fb fb fb fb fb fb
[  151.848921] >ffff88811a980c00: fb fb fc fc fc fc fa fb fb fb fb fb fb fb fb fb
[  151.850076]                                                        ^
[  151.851085]  ffff88811a980c80: fb fc fc fc fc fa fb fb fb fb fb fb fb fb fb fb
[  151.852269]  ffff88811a980d00: fc fc fc fc fa fb fb fb fb fb fb fb fb fb fb fc
[  151.853428] ==================================================================
[  151.855618] Disabling lock debugging due to kernel taint

It is accessing a mhandle in dlm_midcomms_commit_mhandle() and the mhandle
was freed by a call_rcu() call in dlm_process_incoming_buffer(),
dlm_mhandle_delete(). It looks like it was freed because an ack of
this message was received. There is a short race between committing the
dlm message to be transmitted and getting an ack back. If the ack is
faster than returning from dlm_midcomms_commit_msg_3_2(), then we run
into a use-after free because we still need to reference the mhandle when
calling srcu_read_unlock().

To avoid that, we don't allow that mhandle to be freed between
dlm_midcomms_commit_msg_3_2() and srcu_read_unlock() by using rcu read
lock. We can do that because mhandle is protected by rcu handling.

Cc: stable@vger.kernel.org
Fixes: 489d8e559c ("fs: dlm: add reliable connection if reconnect")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-10 09:29:29 +01:00
Alexander Aring
714329e2d0 fs: dlm: start midcomms before scand
commit aad633dc0c upstream.

The scand kthread can send dlm messages out, especially dlm remove
messages to free memory for unused rsb on other nodes. To send out dlm
messages, midcomms must be initialized. This patch moves the midcomms
start before scand is started.

Cc: stable@vger.kernel.org
Fixes: e7fd41792f ("[DLM] The core of the DLM for GFS2/CLVM")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-10 09:29:29 +01:00
Yang Yingliang
a43a0987ac fs: dlm: fix return value check in dlm_memory_init()
[ Upstream commit 8113aa9136 ]

It should check 'cb_cache', after calling kmem_cache_create("dlm_cb").

Fixes: 61bed0baa4 ("fs: dlm: use a non-static queue for callbacks")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-10 09:27:43 +01:00
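The pattern behind the fix above is to test the pointer that was just assigned and to unwind earlier allocations on failure. A minimal userspace sketch follows. The allocator, the failure hook and the "sketch_other" cache are hypothetical stand-ins, and malloc() replaces kmem_cache_create(). Only the "dlm_cb" name comes from the commit message.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct sketch_cache { const char *name; };

/* Test hook (hypothetical): creation of the cache with this name fails. */
static const char *sketch_fail_name;

static struct sketch_cache *sketch_cache_create(const char *name)
{
    struct sketch_cache *cache;

    if (sketch_fail_name && strcmp(name, sketch_fail_name) == 0)
        return NULL;
    cache = malloc(sizeof(*cache));
    if (cache)
        cache->name = name;
    return cache;
}

static struct sketch_cache *other_cache;
static struct sketch_cache *cb_cache;

static int sketch_memory_init(void)
{
    other_cache = sketch_cache_create("sketch_other");
    if (!other_cache)
        return -ENOMEM;

    cb_cache = sketch_cache_create("dlm_cb");
    if (!cb_cache)              /* check the pointer just assigned */
        goto free_other;

    return 0;

free_other:
    free(other_cache);
    other_cache = NULL;
    return -ENOMEM;
}
```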
Benjamin Coddington
98123866fc Treewide: Stop corrupting socket's task_frag
Since moving to memalloc_nofs_save/restore, SUNRPC has stopped setting the
GFP_NOIO flag on sk_allocation which the networking system uses to decide
when it is safe to use current->task_frag.  The results of this are
unexpected corruption in task_frag when SUNRPC is involved in memory
reclaim.

The corruption can be seen in crashes, but the root cause is often
difficult to ascertain as a crashing machine's stack trace will have no
evidence of being near NFS or SUNRPC code.  I believe this problem to
be much more pervasive than reports to the community may indicate.

Fix this by having kernel users of sockets that may corrupt task_frag due
to reclaim set sk_use_task_frag = false.  Preemptively correcting this
situation for users that still set sk_allocation allows them to convert to
memalloc_nofs_save/restore without the same unexpected corruptions that are
sure to follow, unlikely to show up in testing, and difficult to bisect.

CC: Philipp Reisner <philipp.reisner@linbit.com>
CC: Lars Ellenberg <lars.ellenberg@linbit.com>
CC: "Christoph Böhmwalder" <christoph.boehmwalder@linbit.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Josef Bacik <josef@toxicpanda.com>
CC: Keith Busch <kbusch@kernel.org>
CC: Christoph Hellwig <hch@lst.de>
CC: Sagi Grimberg <sagi@grimberg.me>
CC: Lee Duncan <lduncan@suse.com>
CC: Chris Leech <cleech@redhat.com>
CC: Mike Christie <michael.christie@oracle.com>
CC: "James E.J. Bottomley" <jejb@linux.ibm.com>
CC: "Martin K. Petersen" <martin.petersen@oracle.com>
CC: Valentina Manea <valentina.manea.m@gmail.com>
CC: Shuah Khan <shuah@kernel.org>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: David Howells <dhowells@redhat.com>
CC: Marc Dionne <marc.dionne@auristor.com>
CC: Steve French <sfrench@samba.org>
CC: Christine Caulfield <ccaulfie@redhat.com>
CC: David Teigland <teigland@redhat.com>
CC: Mark Fasheh <mark@fasheh.com>
CC: Joel Becker <jlbec@evilplan.org>
CC: Joseph Qi <joseph.qi@linux.alibaba.com>
CC: Eric Van Hensbergen <ericvh@gmail.com>
CC: Latchesar Ionkov <lucho@ionkov.net>
CC: Dominique Martinet <asmadeus@codewreck.org>
CC: Ilya Dryomov <idryomov@gmail.com>
CC: Xiubo Li <xiubli@redhat.com>
CC: Chuck Lever <chuck.lever@oracle.com>
CC: Jeff Layton <jlayton@kernel.org>
CC: Trond Myklebust <trond.myklebust@hammerspace.com>
CC: Anna Schumaker <anna@kernel.org>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>

Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-19 17:28:49 -08:00
Alexander Aring
7a5e9f1f83 fs: dlm: fix building without lockdep
This patch uses assert_spin_locked() instead of lockdep_is_held()
where it's available to use because lockdep_is_held() is only available
if CONFIG_LOCKDEP is set.

In other cases like lockdep_sock_is_held() we surround it by a
CONFIG_LOCKDEP ifdef.

Fixes: dbb751ffab ("fs: dlm: parallelize lowcomms socket handling")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-22 10:14:26 -06:00
Alexander Aring
dbb751ffab fs: dlm: parallelize lowcomms socket handling
This patch is a rework of the lowcomms handling; the main goal is to let
recvmsg() and sendpage() run in parallel. Parallel in two senses:
1. per connection and 2. recvmsg()/sendpage() don't block each
other.

Currently recvmsg()/sendpage() cannot run in parallel because the two
workqueues "dlm_recv" and "dlm_send" are ordered workqueues. That means
only one work item can be executed at a time. The number of queued items
grows with the number of nodes in the cluster. The current two
workqueues for sending and receiving can also block each other if the
same connection is handled at the same time in the dlm_recv and dlm_send
workqueues, because of a per-connection mutex for the socket handling.

To make it more parallel we introduce one "dlm_io" workqueue which is
not an ordered workqueue; the number of workers is not limited. Using
per-connection SEND/RECV pending flags we schedule workers ordered per
connection and per send and receive task. To get rid of the mutex
blocking those workers from doing socket handling we switch to a
semaphore which treats socket operations as read-side and socket
releases as write-side operations, to prevent sock_release() from being
called while the socket is in use.

There might be further optimization by removing the semaphore and
replacing it with another synchronization mechanism; however, due to
other circumstances, e.g. the othercon behaviour, this change seems
complicated to do. I added comments about removing the othercon handling
and moving to a different synchronization mechanism once that is done.
That has to wait for the next dlm major version upgrade because it is
not backwards compatible with the current connect mechanism.

The processing of dlm messages still needs to be handled by an ordered
workqueue. A dlm_process ordered workqueue was introduced which gets
filled by the receive worker. This is probably the next bottleneck of
DLM, but the application can't currently parse dlm messages in parallel.
A comment was added about moving dlm processing out of the workqueue
context into a non-sleepable softirq to get message processing done fast.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 09:45:49 -06:00
Alexander Aring
1351975ac1 fs: dlm: don't init error value
This patch removes an unnecessary initialization of an error value to
-EINVAL.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 09:45:49 -06:00
Alexander Aring
c852a6d706 fs: dlm: use saved sk_error_report()
This patch changes the handling of calling the original
sk_error_report() by not putting it on the stack and calling it later.
If listen_sock.sk_error_report() is NULL at this moment, it indicates
a bug in our implementation.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 09:45:49 -06:00
Alexander Aring
e9dd5fd849 fs: dlm: use sock2con without checking null
This patch removes null checks on private data for sockets. If we hit a
null dereference there, we have a bug in our implementation that lets
such a callback occur in this state.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 09:45:49 -06:00
Alexander Aring
6f0b0b5d7a fs: dlm: remove dlm_node_addrs lookup list
This patch merges the dlm_node_addrs lookup list into the connection
structure. It is a per-node mapping to some configuration set up by
configfs. We don't need two lookup structures. The connection hash now
has the same lifetime as the dlm_node_addrs entries. That means we only
add new entries when the cluster is configured and not when new
connections come in, remove a connection when a node gets fenced, and
clean up all connections when dlm exits. It should work the same and
will even expose more issues, because we no longer try to somehow keep
those two data structures in sync with the current cluster configuration.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 09:45:49 -06:00
Alexander Aring
c51c9cd8ad fs: dlm: don't put dlm_local_addrs on heap
This patch stops allocating the dlm_local_addr[] pointers on the
heap. Instead we directly store "struct sockaddr_storage" values.
This removes the function deinit_local() because it was only freeing
memory.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 09:45:49 -06:00
Alexander Aring
c3d88dfd15 fs: dlm: cleanup listen sock handling
This patch removes save_listen_callbacks() and add_listen_sock() as they
are only used once in the lowcomms functionality. For lowcomms shutdown
it's not necessary to flush the whole workqueues to synchronize with
restoring the old sk_data_ready() callback. Only the listen con receive
work needs to be cancelled. For each individual node shutdown we should
be sure that the last ack has been transmitted, which is done by flushing
the per-connection swork worker.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 09:45:49 -06:00
Alexander Aring
4f567acb0b fs: dlm: remove socket shutdown handling
Since commit 489d8e559c ("fs: dlm: add reliable connection if
reconnect") we have functionality like what TCP offers for half-closed
sockets on the dlm application protocol layer. This feature is required
because the cluster manager events about leaving resource memberships
can already have occurred locally while other cluster nodes still have a
leave membership pending over the cluster manager protocol. During this
time the local dlm node has already shut down its connection and doesn't
transmit any new dlm messages, but it still needs to be able to accept
dlm messages because of the pending leave membership request of the
cluster manager protocol, over which the dlm kernel implementation has
no control.

We have this functionality on the application layer for two reasons.
The main reason is that SCTP does not support such functionality on the
socket layer, but we can do it inside the application layer.

Another small issue is that this feature is broken in the TCP world
because some NAT devices do not implement such functionality
correctly. This is the same reason why the reliable connection session
layer in DLM exists. We give up on middle devices in the network
which send out e.g. TCP resets. In DLM we cannot have any message
dropped, and we ensure over a session layer that it can't happen.

Back to the half-closed graceful shutdown handling. It's no longer
necessary to do it on the socket layer (which is only supported for TCP
sockets) because we do it on the application layer. This patch removes
this handling; if there are still issues, then we have a problem in the
application layer handling.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 09:45:49 -06:00
Alexander Aring
1037c2a94a fs: dlm: use listen sock as dlm running indicator
This patch switches the check for whether dlm lowcomms is running from
dlm_allow_conn to whether we actually have a listen socket set. The
listen socket will be set and unset in the lowcomms start and shutdown
functionality. To synchronize with the data_ready() callback we set the
socket callback to NULL while the socket lock is held.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 09:45:49 -06:00
Alexander Aring
dd070a56e0 fs: dlm: use list_first_entry_or_null
Instead of checking list_empty() we can do the same with
list_first_entry_or_null() and return NULL if the returned value is NULL.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 09:45:49 -06:00
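To show what the helper named in the commit above does, here is a simplified userspace re-implementation. The `sketch_` macros and functions are stand-ins written for this illustration, not the kernel's <linux/list.h>, and `struct sketch_msg` is a hypothetical entry type.

```c
#include <stddef.h>

/* Circular doubly linked list head, as in the kernel's list idiom. */
struct list_head { struct list_head *next, *prev; };

#define SKETCH_LIST_INIT(name) { &(name), &(name) }

static void sketch_list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Recover the containing struct from a pointer to its list member. */
#define sketch_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* First entry of the list, or NULL if the list is empty. */
#define sketch_first_entry_or_null(head, type, member) \
    ((head)->next != (head) ? \
     sketch_container_of((head)->next, type, member) : NULL)

struct sketch_msg {
    int seq;
    struct list_head list;
};

/* One lookup replaces "if (list_empty(q)) return NULL; return first". */
static struct sketch_msg *sketch_peek(struct list_head *queue)
{
    return sketch_first_entry_or_null(queue, struct sketch_msg, list);
}
```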
Alexander Aring
01ea3d7701 fs: dlm: remove twice INIT_WORK
This patch removes a duplicate INIT_WORK() call. We already do this
inside dlm_lowcomms_init(), which is called only once when dlm is
loaded.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 09:45:49 -06:00
Alexander Aring
8b0188b0d6 fs: dlm: add midcomms init/start functions
This patch introduces the leftovers of the init, start, stop and exit
functionality. The dlm application layer should always call the midcomms
layer, which becomes aware of such an event and redirects it to the
lowcomms layer. Some functionality which is currently handled inside the
start functionality of midcomms and lowcomms should be handled in the
init functionality, as it only needs to be initialized once when dlm is
loaded.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 09:45:49 -06:00
Alexander Aring
17827754e5 fs: dlm: add dst nodeid for msg tracing
In DLM, when we send a dlm message it is easy to add the lock resource
name, but an additional lookup is required when tracing the receive
side. The idea here is to move the lookup work to the user, who uses a
lookup to find the send message matching a recv message. Note that DLM
can't drop any message, which is guaranteed by a special session
layer.

For doing the lookup a 3-tuple is required as a unique identification:
dst nodeid, src nodeid and sequence number. This patch adds the
destination nodeid to the dlm message trace points. The source nodeid is
given by the h_nodeid field inside the header.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 09:45:49 -06:00
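A trace consumer could match send and receive events on the 3-tuple from the commit above roughly as follows. This is a sketch under assumptions: the struct, the field types and the linear search are hypothetical and only illustrate why all three fields are needed for a unique match.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical key extracted from a send or recv trace event. */
struct sketch_trace_key {
    int dst_nodeid;
    int src_nodeid;
    uint32_t seq;
};

static bool sketch_key_equal(const struct sketch_trace_key *a,
                             const struct sketch_trace_key *b)
{
    return a->dst_nodeid == b->dst_nodeid &&
           a->src_nodeid == b->src_nodeid &&
           a->seq == b->seq;
}

/* Returns the index of the send event matching recv, or -1 if none. */
static int sketch_find_send(const struct sketch_trace_key *sends, int count,
                            const struct sketch_trace_key *recv)
{
    for (int i = 0; i < count; i++) {
        if (sketch_key_equal(&sends[i], recv))
            return i;
    }
    return -1;
}
```

Sequence numbers alone are not enough here: in this sketch two sends from the same source to different destinations may carry the same sequence number, and only the destination nodeid tells them apart.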
Alexander Aring
554d849616 fs: dlm: rename DLM_IFL_NEED_SCHED to DLM_IFL_CB_PENDING
This patch renames DLM_IFL_NEED_SCHED to DLM_IFL_CB_PENDING because
CB_PENDING is a more fitting name for this flag. The flag is set when
callback enqueue returns DLM_ENQUEUE_CALLBACK_NEED_SCHED because the
callback worker needs to be queued. The flag indicates that callbacks
are currently pending and is unset once the callback work for the
specific lkb is done. Needing to schedule is part of this, but a more
fitting name says that there are callbacks pending to be called.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 09:45:49 -06:00
Alexander Aring
740bb8fc10 fs: dlm: ast do WARN_ON_ONCE() on hotpath
This patch changes the ast hotpath to use WARN_ON_ONCE() instead of
WARN_ON() in very unlikely cases, so that we don't spam the console
output if we run into states where it would trigger over and over
again.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 09:45:49 -06:00
Alexander Aring
9267c85769 fs: dlm: drop lkb ref in bug case
This patch drops the lkb reference in a very unlikely case which
should not happen in practice. However, if it happens, we clean up the
reference just in case.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 09:45:49 -06:00
Alexander Aring
f217d7ccb9 fs: dlm: avoid false-positive checker warning
This patch avoids the false-positive checker warning about writing 112
bytes into an 88-byte field "e->request", see:

[   54.891560] dlm: csmb1: dlm_recover_directory 23 out 2 messages
[   54.990542] ------------[ cut here ]------------
[   54.991012] memcpy: detected field-spanning write (size 112) of single field "&e->request" at fs/dlm/requestqueue.c:47 (size 88)
[   54.992150] WARNING: CPU: 0 PID: 297 at fs/dlm/requestqueue.c:47 dlm_add_requestqueue+0x177/0x180
[   54.993002] CPU: 0 PID: 297 Comm: kworker/u4:3 Not tainted 6.1.0-rc5-00008-ge01d50cbd6ee #248
[   54.993878] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-1.fc36 04/01/2014
[   54.994718] Workqueue: dlm_recv process_recv_sockets
[   54.995230] RIP: 0010:dlm_add_requestqueue+0x177/0x180
[   54.995731] Code: e7 01 0f 85 3b ff ff ff b9 58 00 00 00 48 c7 c2 c0 41 74 82 4c 89 ee 48 c7 c7 20 42 74 82 c6 05 8b 8d 30 02 01 e8 51 07 be 00 <0f> 0b e9 12 ff ff ff 66 90 0f 1f 44 00 00 41 57 48 8d 87 10 08 00
[   54.997483] RSP: 0018:ffffc90000b1fbe8 EFLAGS: 00010282
[   54.997990] RAX: 0000000000000000 RBX: ffff888024fc3d00 RCX: 0000000000000000
[   54.998667] RDX: 0000000000000001 RSI: ffffffff81155014 RDI: fffff52000163f73
[   54.999342] RBP: ffff88800dbac000 R08: 0000000000000001 R09: ffffc90000b1fa5f
[   54.999997] R10: fffff52000163f4b R11: 203a7970636d656d R12: ffff88800cfb0018
[   55.000673] R13: 0000000000000070 R14: ffff888024fc3d18 R15: 0000000000000000
[   55.001344] FS:  0000000000000000(0000) GS:ffff88806d600000(0000) knlGS:0000000000000000
[   55.002078] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   55.002603] CR2: 00007f35d4f0b9a0 CR3: 0000000025495002 CR4: 0000000000770ef0
[   55.003258] PKRU: 55555554
[   55.003514] Call Trace:
[   55.003756]  <TASK>
[   55.003953]  dlm_receive_buffer+0x1c0/0x200
[   55.004348]  dlm_process_incoming_buffer+0x46d/0x780
[   55.004786]  ? kernel_recvmsg+0x8b/0xc0
[   55.005150]  receive_from_sock.isra.0+0x168/0x420
[   55.005582]  ? process_listen_recv_socket+0x10/0x10
[   55.006018]  ? finish_task_switch.isra.0+0xe0/0x400
[   55.006469]  ? __switch_to+0x2fe/0x6a0
[   55.006808]  ? read_word_at_a_time+0xe/0x20
[   55.007197]  ? strscpy+0x146/0x190
[   55.007505]  process_one_work+0x3d0/0x6b0
[   55.007863]  worker_thread+0x8d/0x620
[   55.008209]  ? __kthread_parkme+0xd8/0xf0
[   55.008565]  ? process_one_work+0x6b0/0x6b0
[   55.008937]  kthread+0x171/0x1a0
[   55.009251]  ? kthread_exit+0x60/0x60
[   55.009582]  ret_from_fork+0x1f/0x30
[   55.009903]  </TASK>
[   55.010120] ---[ end trace 0000000000000000 ]---
[   55.025783] dlm: csmb1: dlm_recover 5 generation 3 done: 201 ms
[   55.026466] gfs2: fsid=smbcluster:csmb1.0: recover generation 3 done

It seems the checker is unable to detect the additional bytes which
were allocated for the flexible array in struct dlm_message. To solve
it we split the memcpy() into one copy for the 88-byte struct and
another memcpy() for the flexible array m_extra field.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 09:45:49 -06:00
Alexander Aring
775af20746 fs: dlm: use WARN_ON_ONCE() instead of WARN_ON()
To avoid spamming the console with WARN_ON() output about invalid states
in the dlm midcomms hot path, we switch to WARN_ON_ONCE() so that a
possible issue with the midcomms state handling is reported only once.
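
The warn-once idea can be sketched in userspace C as below. This is not the
kernel's WARN_ON_ONCE() macro; demo_check_state() and demo_warn_count are
hypothetical names used only to show the behaviour.

```c
#include <stdbool.h>
#include <stdio.h>

/* Counts emitted warnings so the effect is observable in this sketch. */
static int demo_warn_count;

/* Hypothetical state check on a hot path: warn only the first time. */
static bool demo_check_state(int state)
{
	static bool warned;	/* one flag for this call site, set once */
	bool invalid = state < 0;

	if (invalid && !warned) {
		warned = true;
		demo_warn_count++;
		fprintf(stderr, "WARNING: invalid state %d\n", state);
	}
	/* The condition is still returned so callers can react every time. */
	return invalid;
}
```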

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 12:59:41 -06:00
Alexander Aring
3e54c9e80e fs: dlm: fix log of lowcomms vs midcomms
This patch fixes a small issue in the message printed when
dlm_midcomms_start() fails to start: it reported that the dlm
subcomponent lowcomms had failed, but lowcomms sits behind the midcomms
layer.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 12:59:41 -06:00
Alexander Aring
9c693d76ab fs: dlm: catch dlm_add_member() error
This patch catches a possible dlm_add_member() error and delivers it to
the dlm recovery handling.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 12:59:41 -06:00
Alexander Aring
194a3fb488 fs: dlm: relax sending to allow receiving
This patch additionally drops the sock_mutex when there is a burst of
messages to send. Since we have acknowledge handling, we free sending
buffers only when we receive an ack back. If we are stuck looping in
send_to_sock() because dlm sends a lot of messages and we never leave the
loop, the sending buffers fill up very quickly. We can't receive during
this iteration because the sock_mutex is held. This patch unlocks the
sock_mutex so that it is possible to receive messages while a burst of
messages is being sent. This allows memory to be freed because acks which
have already been received can be processed.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 12:59:41 -06:00
Alexander Aring
3872f87b09 fs: dlm: remove ls_remove_wait waitqueue
This patch removes the ls_remove_wait waitqueue handling. The current
handling tries to wait before a lookup is sent out for an identical
resource name which is going to be removed. The remove message should be
sent out before the new lookup message. The reason is that after a lookup
request and response, the specific remote rsb is actually in use. A
subsequent remove message would delete the rsb on the remote side while
it is still being used.

To reach a similar behaviour we simply send the remove message out while
the rsb lookup lock is held and the rsb is removed from the toss list.
Other find_rsb() calls would never have the chance to bring an rsb back
to life while a remove message is being sent out (without holding the
lock).

This behaviour requires a non-sleepable context, which should be provided
now and might be the reason why it was not implemented this way in the
first place.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 12:59:41 -06:00
Alexander Aring
e1711fe3fd fs: dlm: allow different allocation context per _create_message
This patch gives the user control over the allocation context on a
per-message basis. Currently all messages are forced to be created under
the GFP_NOFS context.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 12:59:41 -06:00
Alexander Aring
61bed0baa4 fs: dlm: use a non-static queue for callbacks
This patch introduces a queue implementation for callbacks using Linux
lists. The current callback queue handling is implemented with a static
limit of 6 entries, see DLM_CALLBACKS_SIZE. The sequence number inside
the callback structure was used to see whether the entries in the static
array are valid or not. We no longer need sequence numbers to offer such
functionality with a dynamic data structure which grows and shrinks at
runtime.

We assume that every callback will be delivered to the DLM user once it
is queued. Therefore the callback flag DLM_CB_SKIP was dropped, and the
check for skipping a bast was moved before the worker handling instead of
skipping while the callback worker executes. This reduces unnecessary
queueing of the callback worker.

All saved last callbacks are pointers now and don't need to be copied
over. There is a reference counter for callback structures which takes
care of freeing them at the right time once they are no longer
referenced.
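
A minimal userspace sketch of a dynamically sized, reference-counted callback
queue follows. It uses a plain singly linked list rather than the kernel list
API, and every name in it (demo_cb, demo_queue, demo_enqueue() and so on) is
hypothetical, so it illustrates the idea rather than the patch itself.

```c
#include <stdlib.h>

/* Hypothetical callback entry, shared by the queue and "last saved" users. */
struct demo_cb {
	int refcount;
	int mode;
	struct demo_cb *next;
};

struct demo_queue {
	struct demo_cb *head;
	struct demo_cb *tail;
};

static int demo_freed;	/* number of entries freed so far, for observation */

static void demo_cb_get(struct demo_cb *cb)
{
	cb->refcount++;
}

static void demo_cb_put(struct demo_cb *cb)
{
	if (--cb->refcount == 0) {
		free(cb);
		demo_freed++;
	}
}

/* Append a new entry; the queue grows as needed, there is no fixed limit. */
static int demo_enqueue(struct demo_queue *q, int mode)
{
	struct demo_cb *cb = calloc(1, sizeof(*cb));

	if (!cb)
		return -1;
	cb->refcount = 1;	/* reference owned by the queue */
	cb->mode = mode;
	if (q->tail)
		q->tail->next = cb;
	else
		q->head = cb;
	q->tail = cb;
	return 0;
}

/* Remove the oldest entry; the queue's reference passes to the caller. */
static struct demo_cb *demo_dequeue(struct demo_queue *q)
{
	struct demo_cb *cb = q->head;

	if (!cb)
		return NULL;
	q->head = cb->next;
	if (!q->head)
		q->tail = NULL;
	cb->next = NULL;
	return cb;
}
```

A consumer that wants to remember the last delivered callback can take an
extra reference with demo_cb_get() and keep the pointer instead of copying
the entry.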

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 12:59:41 -06:00
Alexander Aring
27d3994ebb fs: dlm: move last cast bast time to function call
This patch moves the recording of the debugging information for the last
cast and bast time to the point where the cast and bast functions are
called.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 12:59:41 -06:00
Alexander Aring
92e9573330 fs: dlm: use spin lock instead of mutex
There is no need to use a mutex in those hot path sections. We change it
to a spin lock to serve callbacks faster by not allowing scheduling. The
locked sections will not be held for a long time.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 12:59:41 -06:00
Alexander Aring
a4c0352bb1 fs: dlm: convert ls_cb_mutex mutex to spinlock
This patch converts the ls_cb_mutex mutex to a spinlock; there is no
sleepable context when this lock is held.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 12:59:41 -06:00
Alexander Aring
d3e4dc5d68 fs: dlm: use list_first_entry marco
Instead of using list_entry() this patch moves to using the
list_first_entry() macro.
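
The difference between the two macros can be sketched in userspace C. The
macros below are simplified re-creations for illustration, and struct demo_cb
with its helper functions is hypothetical.

```c
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Simplified userspace versions of the list macros. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) \
	container_of(ptr, type, member)
#define list_first_entry(head, type, member) \
	list_entry((head)->next, type, member)

/* Hypothetical element type embedding a list node. */
struct demo_cb {
	int id;
	struct list_head list;
};

static void demo_list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void demo_list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Old style: the caller spells out head->next itself. */
static int demo_first_id_old(struct list_head *q)
{
	return list_entry(q->next, struct demo_cb, list)->id;
}

/* New style: the intent "first element" is stated by the macro name. */
static int demo_first_id_new(struct list_head *q)
{
	return list_first_entry(q, struct demo_cb, list)->id;
}
```

Both helpers return the same element; as in the kernel, neither form is safe
on an empty list, so callers must check for emptiness first.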

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 12:59:41 -06:00
Alexander Aring
85839f27b1 fs: dlm: let dlm_add_cb queue work after resume only
We should allow dlm_add_cb() to call queue_work() only after recovery has
queued the pending work for delayed lkbs. This patch moves the switch of
LSFL_CB_DELAY to after the delayed lkb work has been processed.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 12:59:41 -06:00
Alexander Aring
e01c4b7bd4 fd: dlm: trace send/recv of dlm message and rcom
This patch adds tracepoints for the send and recv cases of dlm messages
and dlm rcom messages. When sending a dlm message we add the dlm rsb
resource name this dlm message belongs to. This has the advantage that
dlm messages can be followed on a per-lock basis. For a received message
the resource name can be extracted by following the sequence number of
the sent message.

The dlm message DLM_MSG_PURGE doesn't belong to a lock request and will
not set the resource name in a dlm_message trace. The same applies to all
rcom messages.

Some additional handling is required for this debugging functionality,
which is kept as small as possible. The midcomms layer also becomes aware
of lock resource names; for now this is required to make a connection
between sequence numbers and lock resource names. It is for debugging
purposes only.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 12:59:41 -06:00
Alexander Aring
5b787667e8 fs: dlm: use packet in dlm_mhandle
To allow more than just dereferencing the inner header we directly point
to the inner dlm packet which allows us to dereference the header, rcom
or message structure.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 12:59:41 -06:00
Alexander Aring
57a5724ef0 fs: dlm: remove send repeat remove handling
This patch removes the send repeat remove handling. This handling is
there to repeatedly send DLM_MSG_REMOVE messages in cases where the dlm
stack thinks the message was not received the first time. In cases of
message drops this functionality is necessary, but since the DLM midcomms
layer guarantees that no messages are dropped between cluster nodes, this
feature is no longer strictly necessary. Due to message
delays/processing it could be that two send_repeat_remove() messages are
sent out while the other should still be on its way. We remove the repeat
remove handling because we are sure that the message cannot be dropped
due to communication errors.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 12:59:41 -06:00
Alexander Aring
f0f4bb431b fs: dlm: retry accept() until -EAGAIN or error returns
This patch fixes a race if we get a socket data ready event twice while
the listen connection worker is queued. Currently it will be served only
once, but we need to do it (in this case twice) until we hit -EAGAIN,
which tells us there is no pending accept going on.

This patch wraps the accept in a do-while loop until we receive a return
value which is different from 0, as it was done before commit d11ccd451b
("fs: dlm: listen socket out of connection hash").
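
The loop shape described above can be sketched as follows. The accept step is
abstracted behind a hypothetical callback (accept_fn) so the sketch stays
self-contained; demo_drain_accepts() and demo_fake_accept() are illustrative
names and not functions from fs/dlm.

```c
#include <errno.h>

/*
 * Hypothetical accept step: returns 0 when one connection was accepted,
 * -EAGAIN when nothing is pending, or another negative errno on error.
 */
typedef int (*accept_fn)(void *ctx);

/* Keep accepting until the step reports something other than success. */
static int demo_drain_accepts(accept_fn accept_one, void *ctx, int *accepted)
{
	int ret;

	*accepted = 0;
	do {
		ret = accept_one(ctx);
		if (ret == 0)
			(*accepted)++;
	} while (ret == 0);

	/* -EAGAIN here means "queue drained", anything else is an error. */
	return ret;
}

/* Test double: ctx points to the number of pending connections. */
static int demo_fake_accept(void *ctx)
{
	int *pending = ctx;

	if (*pending == 0)
		return -EAGAIN;
	(*pending)--;
	return 0;
}
```

With two pending connections and a single worker invocation, both are served
before the loop ends with -EAGAIN.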

Cc: stable@vger.kernel.org
Fixes: d11ccd451b ("fs: dlm: listen socket out of connection hash")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 12:59:41 -06:00
Alexander Aring
08ae0547e7 fs: dlm: fix sock release if listen fails
This patch fixes a double sock_release() call when listen() is called for
the dlm lowcomms listen socket. The caller of dlm_listen_for_all() should
never care about releasing the socket if dlm_listen_for_all() fails; it
is now done only once if listen() fails.
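
The ownership rule can be sketched like this: the helper that fails is the
only place that releases the socket, and its caller just propagates the
error. All names (demo_sock, demo_listen_for_all(), demo_start()) are
hypothetical stand-ins.

```c
#include <stdbool.h>

/* Hypothetical socket handle that records how often it was released. */
struct demo_sock {
	int release_count;
};

static void demo_sock_release(struct demo_sock *sock)
{
	sock->release_count++;
}

/* The helper releases the socket itself when its listen step fails. */
static int demo_listen_for_all(struct demo_sock *sock, bool listen_fails)
{
	if (listen_fails) {
		demo_sock_release(sock);
		return -1;
	}
	return 0;
}

/* The caller only forwards the error and must not release again. */
static int demo_start(struct demo_sock *sock, bool listen_fails)
{
	return demo_listen_for_all(sock, listen_fails);
}
```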

Cc: stable@vger.kernel.org
Fixes: 2dc6b1158c ("fs: dlm: introduce generic listen")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 12:59:41 -06:00
Paulo Miguel Almeida
d96d0f9617 dlm: replace one-element array with fixed size array
One-element arrays are deprecated. So, replace one-element array with
fixed size array member in struct dlm_ls, and refactor the rest of the
code, accordingly.
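
A small userspace sketch of the pattern: a trailing one-element array that
was over-allocated at runtime is replaced by a member with its full fixed
size. DEMO_NAME_LEN, struct demo_ls and demo_ls_create() are hypothetical and
only illustrate the refactoring style.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_NAME_LEN 64	/* hypothetical maximum name length */

/*
 * Old style (deprecated): "char name[1];" as the last member, with the
 * allocation enlarged by the name length. New style: the full size is
 * part of the type, so sizeof() and bounds checkers see the real extent.
 */
struct demo_ls {
	int id;
	char name[DEMO_NAME_LEN + 1];
};

static struct demo_ls *demo_ls_create(const char *name)
{
	size_t len = strlen(name);
	struct demo_ls *ls;

	if (len > DEMO_NAME_LEN)
		return NULL;

	/* No "+ len" needed any more; calloc() also zero-terminates name. */
	ls = calloc(1, sizeof(*ls));
	if (!ls)
		return NULL;
	memcpy(ls->name, name, len);
	return ls;
}
```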

Link: https://github.com/KSPP/linux/issues/79
Link: https://github.com/KSPP/linux/issues/228
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836
Link: https://lore.kernel.org/lkml/Y0W5jkiXUkpNl4ap@mail.google.com/

Signed-off-by: Paulo Miguel Almeida <paulo.miguel.almeida.rodenas@gmail.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 12:58:47 -06:00
Linus Torvalds
0326074ff4 Networking changes for 6.1.
Core
 ----
 
  - Introduce and use a single page frag cache for allocating small skb
    heads, clawing back the 10-20% performance regression in UDP flood
    test from previous fixes.
 
  - Run packets which already went thru HW coalescing thru SW GRO.
    This significantly improves TCP segment coalescing and simplifies
    deployments as different workloads benefit from HW or SW GRO.
 
  - Shrink the size of the base zero-copy send structure.
 
  - Move TCP init under a new slow / sleepable version of DO_ONCE().
 
 BPF
 ---
 
  - Add BPF-specific, any-context-safe memory allocator.
 
  - Add helpers/kfuncs for PKCS#7 signature verification from BPF
    programs.
 
  - Define a new map type and related helpers for user space -> kernel
    communication over a ring buffer (BPF_MAP_TYPE_USER_RINGBUF).
 
  - Allow targeting BPF iterators to loop through resources of one
    task/thread.
 
  - Add ability to call selected destructive functions.
    Expose crash_kexec() to allow BPF to trigger a kernel dump.
    Use CAP_SYS_BOOT check on the loading process to judge permissions.
 
  - Enable BPF to collect custom hierarchical cgroup stats efficiently
    by integrating with the rstat framework.
 
  - Support struct arguments for trampoline based programs.
    Only structs with size <= 16B and x86 are supported.
 
  - Invoke cgroup/connect{4,6} programs for unprivileged ICMP ping
    sockets (instead of just TCP and UDP sockets).
 
  - Add a helper for accessing CLOCK_TAI for time sensitive network
    related programs.
 
  - Support accessing network tunnel metadata's flags.
 
  - Make TCP SYN ACK RTO tunable by BPF programs with TCP Fast Open.
 
  - Add support for writing to Netfilter's nf_conn:mark.
 
 Protocols
 ---------
 
  - WiFi: more Extremely High Throughput (EHT) and Multi-Link
    Operation (MLO) work (802.11be, WiFi 7).
 
  - vsock: improve support for SO_RCVLOWAT.
 
  - SMC: support SO_REUSEPORT.
 
  - Netlink: define and document how to use netlink in a "modern" way.
    Support reporting missing attributes via extended ACK.
 
  - IPSec: support collect metadata mode for xfrm interfaces.
 
  - TCPv6: send consistent autoflowlabel in SYN_RECV state
    and RST packets.
 
  - TCP: introduce optional per-netns connection hash table to allow
    better isolation between namespaces (opt-in, at the cost of memory
    and cache pressure).
 
  - MPTCP: support TCP_FASTOPEN_CONNECT.
 
  - Add NEXT-C-SID support in Segment Routing (SRv6) End behavior.
 
  - Adjust IP_UNICAST_IF sockopt behavior for connected UDP sockets.
 
  - Open vSwitch:
    - Allow specifying ifindex of new interfaces.
    - Allow conntrack and metering in non-initial user namespace.
 
  - TLS: support the Korean ARIA-GCM crypto algorithm.
 
  - Remove DECnet support.
 
 Driver API
 ----------
 
  - Allow selecting the conduit interface used by each port
    in DSA switches, at runtime.
 
  - Ethernet Power Sourcing Equipment and Power Device support.
 
  - Add tc-taprio support for queueMaxSDU parameter, i.e. setting
    per traffic class max frame size for time-based packet schedules.
 
  - Support PHY rate matching - adapting between differing host-side
    and link-side speeds.
 
  - Introduce QUSGMII PHY mode and 1000BASE-KX interface mode.
 
  - Validate OF (device tree) nodes for DSA shared ports; make
    phylink-related properties mandatory on DSA and CPU ports.
    Enforcing more uniformity should allow transitioning to phylink.
 
  - Require that flash component name used during update matches one
    of the components for which version is reported by info_get().
 
  - Remove "weight" argument from driver-facing NAPI API as much
    as possible. It's one of those magic knobs which seemed like
    a good idea at the time but is too indirect to use in practice.
 
  - Support offload of TLS connections with 256 bit keys.
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - Microchip KSZ9896 6-port Gigabit Ethernet Switch
    - Renesas Ethernet AVB (EtherAVB-IF) Gen4 SoCs
    - Analog Devices ADIN1110 and ADIN2111 industrial single pair
      Ethernet (10BASE-T1L) MAC+PHY.
    - Rockchip RV1126 Gigabit Ethernet (a version of stmmac IP).
 
  - Ethernet SFPs / modules:
    - RollBall / Hilink / Turris 10G copper SFPs
    - HALNy GPON module
 
  - WiFi:
    - CYW43439 SDIO chipset (brcmfmac)
    - CYW89459 PCIe chipset (brcmfmac)
    - BCM4378 on Apple platforms (brcmfmac)
 
 Drivers
 -------
 
  - CAN:
    - gs_usb: HW timestamp support
 
  - Ethernet PHYs:
    - lan8814: cable diagnostics
 
  - Ethernet NICs:
    - Intel (100G):
      - implement control of FCS/CRC stripping
      - port splitting via devlink
      - L2TPv3 filtering offload
    - nVidia/Mellanox:
      - tunnel offload for sub-functions
      - MACSec offload, w/ Extended packet number and replay
        window offload
      - significantly restructure, and optimize the AF_XDP support,
        align the behavior with other vendors
    - Huawei:
      - configuring DSCP map for traffic class selection
      - querying standard FEC statistics
      - querying SerDes lane number via ethtool
    - Marvell/Cavium:
      - egress priority flow control
      - MACSec offload
    - AMD/SolarFlare:
      - PTP over IPv6 and raw Ethernet
    - small / embedded:
      - ax88772: convert to phylink (to support SFP cages)
      - altera: tse: convert to phylink
      - ftgmac100: support fixed link
      - enetc: standard Ethtool counters
      - macb: ZynqMP SGMII dynamic configuration support
      - tsnep: support multi-queue and use page pool
      - lan743x: Rx IP & TCP checksum offload
      - igc: add xdp frags support to ndo_xdp_xmit
 
  - Ethernet high-speed switches:
    - Marvell (prestera):
      - support SPAN port features (traffic mirroring)
      - nexthop object offloading
    - Microchip (sparx5):
      - multicast forwarding offload
      - QoS queuing offload (tc-mqprio, tc-tbf, tc-ets)
 
  - Ethernet embedded switches:
    - Marvell (mv88e6xxx):
      - support RGMII cmode
    - NXP (felix):
      - standardized ethtool counters
    - Microchip (lan966x):
      - QoS queuing offload (tc-mqprio, tc-tbf, tc-cbs, tc-ets)
      - traffic policing and mirroring
      - link aggregation / bonding offload
      - QUSGMII PHY mode support
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - cold boot calibration support on WCN6750
    - support to connect to a non-transmit MBSSID AP profile
    - enable remain-on-channel support on WCN6750
    - Wake-on-WLAN support for WCN6750
    - support to provide transmit power from firmware via nl80211
    - support to get power save duration for each client
    - spectral scan support for 160 MHz
 
  - MediaTek WiFi (mt76):
    - WiFi-to-Ethernet bridging offload for MT7986 chips
 
  - RealTek WiFi (rtw89):
    - P2P support
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski.

* tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1864 commits)
  eth: pse: add missing static inlines
  once: rename _SLOW to _SLEEPABLE
  net: pse-pd: add regulator based PSE driver
  dt-bindings: net: pse-dt: add bindings for regulator based PoDL PSE controller
  ethtool: add interface to interact with Ethernet Power Equipment
  net: mdiobus: search for PSE nodes by parsing PHY nodes.
  net: mdiobus: fwnode_mdiobus_register_phy() rework error handling
  net: add framework to support Ethernet PSE and PDs devices
  dt-bindings: net: phy: add PoDL PSE property
  net: marvell: prestera: Propagate nh state from hw to kernel
  net: marvell: prestera: Add neighbour cache accounting
  net: marvell: prestera: add stub handler neighbour events
  net: marvell: prestera: Add heplers to interact with fib_notifier_info
  net: marvell: prestera: Add length macros for prestera_ip_addr
  net: marvell: prestera: add delayed wq and flush wq on deinit
  net: marvell: prestera: Add strict cleanup of fib arbiter
  net: marvell: prestera: Add cleanup of allocated fib_nodes
  net: marvell: prestera: Add router nexthops ABI
  eth: octeon: fix build after netif_napi_add() changes
  net/mlx5: E-Switch, Return EBUSY if can't get mode lock
  ...
2022-10-04 13:38:03 -07:00
Alexander Aring
3b7610302a fs: dlm: fix possible use after free if tracing
This patch fixes a possible use after free if tracing for the specific
event is enabled. To avoid the use after free we introduce an out_put
label like all other user lock specific requests and save in a boolean
whether to do a put or not, which depends on the execution path of
dlm_user_request().
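
The control flow described above can be sketched in userspace C. The names
(demo_lkb, demo_trace(), demo_put(), demo_request()) are hypothetical; the
point is only that tracing happens while the reference is still held and the
put is decided by a boolean set along the execution path.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted lock object. */
struct demo_lkb {
	int refcount;
	int id;
};

static int demo_last_traced_id;	/* stand-in for a trace event sink */

static void demo_trace(const struct demo_lkb *lkb)
{
	demo_last_traced_id = lkb->id;
}

static void demo_put(struct demo_lkb *lkb)
{
	if (--lkb->refcount == 0)
		free(lkb);
}

static int demo_request(struct demo_lkb *lkb, bool fail)
{
	bool do_put = true;
	int error = 0;

	if (fail) {
		error = -1;
		goto out_put;
	}

	/* Success path: the reference stays with the lock, no put needed. */
	do_put = false;

out_put:
	/* Trace first, while the object is guaranteed to be alive ... */
	demo_trace(lkb);
	/* ... and drop the reference only afterwards, if this path owns it. */
	if (do_put)
		demo_put(lkb);
	return error;
}
```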

Cc: stable@vger.kernel.org
Fixes: 7a3de7324c ("fs: dlm: trace user space callbacks")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-09-26 09:58:07 -05:00
Jakub Kicinski
9c5d03d362 genetlink: start to validate reserved header bytes
We had historically not checked that genlmsghdr.reserved
is 0 on input which prevents us from using those precious
bytes in the future.

One use case would be to extend the cmd field, which is
currently just 8 bits wide and 256 is not a lot of commands
for some core families.

To make sure that new families do the right thing by default
put the onus of opting out of validation on existing families.
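
A hedged sketch of such a check follows. The header layout mirrors the three
fields of genlmsghdr (cmd, version, reserved), but the family structure, its
first_validated_cmd field and demo_header_check() are assumptions made for
illustration and not the kernel's actual interface.

```c
#include <errno.h>
#include <stdint.h>

/* Same field layout as the generic netlink header. */
struct demo_genlmsghdr {
	uint8_t cmd;
	uint8_t version;
	uint16_t reserved;
};

/*
 * Hypothetical per-family opt-out: commands below this value existed
 * before validation started and are not checked. New families would
 * leave it at 0 so that every command is validated by default.
 */
struct demo_family {
	uint8_t first_validated_cmd;
};

static int demo_header_check(const struct demo_family *family,
			     const struct demo_genlmsghdr *hdr)
{
	/* Legacy commands keep their old, unvalidated behaviour. */
	if (hdr->cmd < family->first_validated_cmd)
		return 0;
	/* Reserved bytes must be zero so they can be given meaning later. */
	if (hdr->reserved != 0)
		return -EINVAL;
	return 0;
}
```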

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com> (NetLabel)
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-29 12:47:15 +01:00
Alexander Aring
56171e0db2 fs: dlm: const void resource name parameter
The resource name parameter should never be changed by DLM so we declare
it as const. At some points it is handled as a char pointer, but a
resource name can be a non-printable ASCII string as well. This patch
changes it to be handled as a void pointer, as it is offered by the DLM
API.
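
Why a const void pointer plus an explicit length fits better than a char
pointer can be sketched as follows; demo_name_equal() is a hypothetical
helper, not part of the DLM API.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Compare two resource names as opaque byte sequences. The names are
 * never modified (const) and may contain any byte, including NUL, so
 * string functions such as strcmp() would be the wrong tool.
 */
static bool demo_name_equal(const void *a, size_t alen,
			    const void *b, size_t blen)
{
	return alen == blen && memcmp(a, b, alen) == 0;
}
```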

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-08-23 15:02:47 -05:00