Commit graph

935 commits

Author SHA1 Message Date
David Howells
76426abf9b afs: Revert "afs: Hide silly-rename files from userspace"
[ Upstream commit 0aec3847d0 ]

This reverts commit 57e9d49c54.

This undoes the hiding of .__afsXXXX silly-rename files.  The problem with
hiding them is that rm can't then manually delete them.

This also reverts commit 5f7a076466 ("afs: Fix
endless loop in directory parsing") as that's a bugfix for the above.

Fixes: 57e9d49c54 ("afs: Hide silly-rename files from userspace")
Reported-by: Markus Suvanto <markus.suvanto@gmail.com>
Link: https://lists.infradead.org/pipermail/linux-afs/2024-February/008102.html
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/3085695.1710328121@warthog.procyon.org.uk
Reviewed-by: Jeffrey E Altman <jaltman@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-26 18:20:57 -04:00
David Howells
058ed71e0f afs: Fix endless loop in directory parsing
[ Upstream commit 5f7a076466 ]

If a directory has a block with only ".__afsXXXX" files in it (from
uncompleted silly-rename), these .__afsXXXX files are skipped but without
advancing the file position in the dir_context.  This leads to
afs_dir_iterate() repeating the block again and again.

Fix this by making the code that skips the .__afsXXXX file also manually
advance the file position.

The symptoms are a soft lookup:

        watchdog: BUG: soft lockup - CPU#3 stuck for 52s! [check:5737]
        ...
        RIP: 0010:afs_dir_iterate_block+0x39/0x1fd
        ...
         ? watchdog_timer_fn+0x1a6/0x213
        ...
         ? asm_sysvec_apic_timer_interrupt+0x16/0x20
         ? afs_dir_iterate_block+0x39/0x1fd
         afs_dir_iterate+0x10a/0x148
         afs_readdir+0x30/0x4a
         iterate_dir+0x93/0xd3
         __do_sys_getdents64+0x6b/0xd4

This is almost certainly the actual fix for:

        https://bugzilla.kernel.org/show_bug.cgi?id=218496

Fixes: 57e9d49c54 ("afs: Hide silly-rename files from userspace")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/786185.1708694102@warthog.procyon.org.uk
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Markus Suvanto <markus.suvanto@gmail.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-06 14:45:09 +00:00
Daniil Dulov
e8530b170e afs: Increase buffer size in afs_update_volume_status()
[ Upstream commit 6ea38e2aeb ]

The max length of volume->vid value is 20 characters.
So increase idbuf[] size up to 24 to avoid overflow.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

[DH: Actually, it's 20 + NUL, so increase it to 24 and use snprintf()]

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: Daniil Dulov <d.dulov@aladdin.ru>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20240211150442.3416-1-d.dulov@aladdin.ru/ # v1
Link: https://lore.kernel.org/r/20240212083347.10742-1-d.dulov@aladdin.ru/ # v2
Link: https://lore.kernel.org/r/20240219143906.138346-3-dhowells@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01 13:26:36 +01:00
Oleg Nesterov
ea4eb77c53 afs: fix the usage of read_seqbegin_or_lock() in afs_find_server*()
[ Upstream commit 1702e0654c ]

David Howells says:

 (5) afs_find_server().

     There could be a lot of servers in the list and each server can have
     multiple addresses, so I think this would be better with an exclusive
     second pass.

     The server list isn't likely to change all that often, but when it does
     change, there's a good chance several servers are going to be
     added/removed one after the other.  Further, this is only going to be
     used for incoming cache management/callback requests from the server,
     which hopefully aren't going to happen too often - but it is remotely
     drivable.

 (6) afs_find_server_by_uuid().

     Similarly to (5), there could be a lot of servers to search through, but
     they are in a tree not a flat list, so it should be faster to process.
     Again, it's not likely to change that often and, again, when it does
     change it's likely to involve multiple changes.  This can be driven
     remotely by an incoming cache management request but is mostly going to
     be driven by setting up or reconfiguring a volume's server list -
     something that also isn't likely to happen often.

Make the "seq" counter odd on the 2nd pass, otherwise read_seqbegin_or_lock()
never takes the lock.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/20231130115614.GA21581@redhat.com/
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-05 20:12:48 +00:00
Oleg Nesterov
eef7c4cd98 afs: fix the usage of read_seqbegin_or_lock() in afs_lookup_volume_rcu()
[ Upstream commit 4121b43371 ]

David Howells says:

 (2) afs_lookup_volume_rcu().

     There can be a lot of volumes known by a system.  A thousand would
     require a 10-step walk and this is drivable by remote operation, so I
     think this should probably take a lock on the second pass too.

Make the "seq" counter odd on the 2nd pass, otherwise read_seqbegin_or_lock()
never takes the lock.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/20231130115606.GA21571@redhat.com/
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-05 20:12:48 +00:00
David Howells
ab49164c60 afs: Hide silly-rename files from userspace
[ Upstream commit 57e9d49c54 ]

There appears to be a race between silly-rename files being created/removed
and various userspace tools iterating over the contents of a directory,
leading to such errors as:

	find: './kernel/.tmp_cpio_dir/include/dt-bindings/reset/.__afs2080': No such file or directory
	tar: ./include/linux/greybus/.__afs3C95: File removed before we read it

when building a kernel.

Fix afs_readdir() so that it doesn't return .__afsXXXX silly-rename files
to userspace.  This doesn't stop them being looked up directly by name as
we need to be able to look them up from within the kernel as part of the
silly-rename algorithm.

Fixes: 79ddbfa500 ("afs: Implement sillyrename for unlink and rename")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-31 16:17:04 -08:00
David Howells
d0eafc7631 mm, netfs, fscache: stop read optimisation when folio removed from pagecache
[ Upstream commit b4fa966f03 ]

Fscache has an optimisation by which reads from the cache are skipped
until we know that (a) there's data there to be read and (b) that data
isn't entirely covered by pages resident in the netfs pagecache.  This is
done with two flags manipulated by fscache_note_page_release():

	if (...
	    test_bit(FSCACHE_COOKIE_HAVE_DATA, &cookie->flags) &&
	    test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags))
		clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);

where the NO_DATA_TO_READ flag causes cachefiles_prepare_read() to
indicate that netfslib should download from the server or clear the page
instead.

The fscache_note_page_release() function is intended to be called from
->releasepage() - but that only gets called if PG_private or PG_private_2
is set - and currently the former is at the discretion of the network
filesystem and the latter is only set whilst a page is being written to
the cache, so sometimes we miss clearing the optimisation.

Fix this by following Willy's suggestion[1] and adding an address_space
flag, AS_RELEASE_ALWAYS, that causes filemap_release_folio() to always call
->release_folio() if it's set, even if PG_private or PG_private_2 aren't
set.

Note that this would require folio_test_private() and page_has_private() to
become more complicated.  To avoid that, in the places[*] where these are
used to conditionalise calls to filemap_release_folio() and
try_to_release_page(), the tests are removed the those functions just
jumped to unconditionally and the test is performed there.

[*] There are some exceptions in vmscan.c where the check guards more than
just a call to the releaser.  I've added a function, folio_needs_release()
to wrap all the checks for that.

AS_RELEASE_ALWAYS should be set if a non-NULL cookie is obtained from
fscache and cleared in ->evict_inode() before truncate_inode_pages_final()
is called.

Additionally, the FSCACHE_COOKIE_NO_DATA_TO_READ flag needs to be cleared
and the optimisation cancelled if a cachefiles object already contains data
when we open it.

[dwysocha@redhat.com: call folio_mapping() inside folio_needs_release()]
  Link: 902c990e31
Link: https://lkml.kernel.org/r/20230628104852.3391651-3-dhowells@redhat.com
Fixes: 1f67e6d0b1 ("fscache: Provide a function to note the release of a page")
Fixes: 047487c947 ("cachefiles: Implement the I/O routines")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reported-by: Rohith Surabattula <rohiths.msft@gmail.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Tested-by: SeongJae Park <sj@kernel.org>
Cc: Daire Byrne <daire.byrne@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steve French <sfrench@samba.org>
Cc: Shyam Prasad N <nspmangalore@gmail.com>
Cc: Rohith Surabattula <rohiths.msft@gmail.com>
Cc: Dave Wysochanski <dwysocha@redhat.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jingbo Xu <jefflexu@linux.alibaba.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: 1898efcdbe ("block: update the stable_writes flag in bdev_add")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-10 17:10:31 +01:00
David Howells
9b4c95a63e afs: Fix use-after-free due to get/remove race in volume tree
[ Upstream commit 9a6b294ab4 ]

When an afs_volume struct is put, its refcount is reduced to 0 before
the cell->volume_lock is taken and the volume removed from the
cell->volumes tree.

Unfortunately, this means that the lookup code can race and see a volume
with a zero ref in the tree, resulting in a use-after-free:

    refcount_t: addition on 0; use-after-free.
    WARNING: CPU: 3 PID: 130782 at lib/refcount.c:25 refcount_warn_saturate+0x7a/0xda
    ...
    RIP: 0010:refcount_warn_saturate+0x7a/0xda
    ...
    Call Trace:
     afs_get_volume+0x3d/0x55
     afs_create_volume+0x126/0x1de
     afs_validate_fc+0xfe/0x130
     afs_get_tree+0x20/0x2e5
     vfs_get_tree+0x1d/0xc9
     do_new_mount+0x13b/0x22e
     do_mount+0x5d/0x8a
     __do_sys_mount+0x100/0x12a
     do_syscall_64+0x3a/0x94
     entry_SYSCALL_64_after_hwframe+0x62/0x6a

Fix this by:

 (1) When putting, use a flag to indicate if the volume has been removed
     from the tree and skip the rb_erase if it has.

 (2) When looking up, use a conditional ref increment and if it fails
     because the refcount is 0, replace the node in the tree and set the
     removal flag.

Fixes: 20325960f8 ("afs: Reorganise volume and server trees to be rooted on the cell")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-01 12:38:58 +00:00
David Howells
17605162e2 afs: Fix overwriting of result of DNS query
[ Upstream commit a9e01ac8c5 ]

In afs_update_cell(), ret is the result of the DNS lookup and the errors
are to be handled by a switch - however, the value gets clobbered in
between by setting it to -ENOMEM in case afs_alloc_vlserver_list()
fails.

Fix this by moving the setting of -ENOMEM into the error handling for
OOM failure.  Further, only do it if we don't have an alternative error
to return.

Found by Linux Verification Center (linuxtesting.org) with SVACE.  Based
on a patch from Anastasia Belova [1].

Fixes: d5c32c89b2 ("afs: Fix cell DNS lookup")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
cc: Anastasia Belova <abelova@astralinux.ru>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: lvc-project@linuxtesting.org
Link: https://lore.kernel.org/r/20231221085849.1463-1-abelova@astralinux.ru/ [1]
Link: https://lore.kernel.org/r/1700862.1703168632@warthog.procyon.org.uk/ # v1
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-01 12:38:58 +00:00
David Howells
087b96adc6 afs: Fix dynamic root lookup DNS check
[ Upstream commit 74cef6872c ]

In the afs dynamic root directory, the ->lookup() function does a DNS check
on the cell being asked for and if the DNS upcall reports an error it will
report an error back to userspace (typically ENOENT).

However, if a failed DNS upcall returns a new-style result, it will return
a valid result, with the status field set appropriately to indicate the
type of failure - and in that case, dns_query() doesn't return an error and
we let stat() complete with no error - which can cause confusion in
userspace as subsequent calls that trigger d_automount then fail with
ENOENT.

Fix this by checking the status result from a valid dns_query() and
returning an error if it indicates a failure.

Fixes: bbb4c4323a ("dns: Allow the dns resolver to retrieve a server set")
Reported-by: Markus Suvanto <markus.suvanto@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=216637
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Markus Suvanto <markus.suvanto@gmail.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-01 12:38:57 +00:00
David Howells
9c6ea7abe3 afs: Fix the dynamic root's d_delete to always delete unused dentries
[ Upstream commit 71f8b55bc3 ]

Fix the afs dynamic root's d_delete function to always delete unused
dentries rather than only deleting them if they're positive.  With things
as they stand upstream, negative dentries stemming from failed DNS lookups
stick around preventing retries.

Fixes: 66c7e1d319 ("afs: Split the dynroot stuff out and give it its own ops tables")
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Markus Suvanto <markus.suvanto@gmail.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-01 12:38:57 +00:00
David Howells
e0cda159c8 afs: Fix refcount underflow from error handling race
[ Upstream commit 52bf9f6c09 ]

If an AFS cell that has an unreachable (eg. ENETUNREACH) server listed (VL
server or fileserver), an asynchronous probe to one of its addresses may
fail immediately because sendmsg() returns an error.  When this happens, a
refcount underflow can happen if certain events hit a very small window.

The way this occurs is:

 (1) There are two levels of "call" object, the afs_call and the
     rxrpc_call.  Each of them can be transitioned to a "completed" state
     in the event of success or failure.

 (2) Asynchronous afs_calls are self-referential whilst they are active to
     prevent them from evaporating when they're not being processed.  This
     reference is disposed of when the afs_call is completed.

     Note that an afs_call may only be completed once; once completed
     completing it again will do nothing.

 (3) When a call transmission is made, the app-side rxrpc code queues a Tx
     buffer for the rxrpc I/O thread to transmit.  The I/O thread invokes
     sendmsg() to transmit it - and in the case of failure, it transitions
     the rxrpc_call to the completed state.

 (4) When an rxrpc_call is completed, the app layer is notified.  In this
     case, the app is kafs and it schedules a work item to process events
     pertaining to an afs_call.

 (5) When the afs_call event processor is run, it goes down through the
     RPC-specific handler to afs_extract_data() to retrieve data from rxrpc
     - and, in this case, it picks up the error from the rxrpc_call and
     returns it.

     The error is then propagated to the afs_call and that is completed
     too.  At this point the self-reference is released.

 (6) If the rxrpc I/O thread manages to complete the rxrpc_call within the
     window between rxrpc_send_data() queuing the request packet and
     checking for call completion on the way out, then
     rxrpc_kernel_send_data() will return the error from sendmsg() to the
     app.

 (7) Then afs_make_call() will see an error and will jump to the error
     handling path which will attempt to clean up the afs_call.

 (8) The problem comes when the error handling path in afs_make_call()
     tries to unconditionally drop an async afs_call's self-reference.
     This self-reference, however, may already have been dropped by
     afs_extract_data() completing the afs_call

 (9) The refcount underflows when we return to afs_do_probe_vlserver() and
     that tries to drop its reference on the afs_call.

Fix this by making afs_make_call() attempt to complete the afs_call rather
than unconditionally putting it.  That way, if afs_extract_data() manages
to complete the call first, afs_make_call() won't do anything.

The bug can be forced by making do_udp_sendmsg() return -ENETUNREACH and
sticking an msleep() in rxrpc_send_data() after the 'success:' label to
widen the race window.

The error message looks something like:

    refcount_t: underflow; use-after-free.
    WARNING: CPU: 3 PID: 720 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
    ...
    RIP: 0010:refcount_warn_saturate+0xba/0x110
    ...
    afs_put_call+0x1dc/0x1f0 [kafs]
    afs_fs_get_capabilities+0x8b/0xe0 [kafs]
    afs_fs_probe_fileserver+0x188/0x1e0 [kafs]
    afs_lookup_server+0x3bf/0x3f0 [kafs]
    afs_alloc_server_list+0x130/0x2e0 [kafs]
    afs_create_volume+0x162/0x400 [kafs]
    afs_get_tree+0x266/0x410 [kafs]
    vfs_get_tree+0x25/0xc0
    fc_mount+0xe/0x40
    afs_d_automount+0x1b3/0x390 [kafs]
    __traverse_mounts+0x8f/0x210
    step_into+0x340/0x760
    path_openat+0x13a/0x1260
    do_filp_open+0xaf/0x160
    do_sys_openat2+0xaf/0x170

or something like:

    refcount_t: underflow; use-after-free.
    ...
    RIP: 0010:refcount_warn_saturate+0x99/0xda
    ...
    afs_put_call+0x4a/0x175
    afs_send_vl_probes+0x108/0x172
    afs_select_vlserver+0xd6/0x311
    afs_do_cell_detect_alias+0x5e/0x1e9
    afs_cell_detect_alias+0x44/0x92
    afs_validate_fc+0x9d/0x134
    afs_get_tree+0x20/0x2e6
    vfs_get_tree+0x1d/0xc9
    fc_mount+0xe/0x33
    afs_d_automount+0x48/0x9d
    __traverse_mounts+0xe0/0x166
    step_into+0x140/0x274
    open_last_lookups+0x1c1/0x1df
    path_openat+0x138/0x1c3
    do_filp_open+0x55/0xb4
    do_sys_openat2+0x6c/0xb6

Fixes: 34fa47612b ("afs: Fix race in async call refcounting")
Reported-by: Bill MacAllister <bill@ca-zephyr.org>
Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1052304
Suggested-by: Jeffrey E Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/2633992.1702073229@warthog.procyon.org.uk/ # v1
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-20 17:00:15 +01:00
David Howells
48b3ee0134 afs: Fix file locking on R/O volumes to operate in local mode
[ Upstream commit b590eb41be ]

AFS doesn't really do locking on R/O volumes as fileservers don't maintain
state with each other and thus a lock on a R/O volume file on one
fileserver will not be be visible to someone looking at the same file on
another fileserver.

Further, the server may return an error if you try it.

Fix this by doing what other AFS clients do and handle filelocking on R/O
volume files entirely within the client and don't touch the server.

Fixes: 6c6c1d63c2 ("afs: Provide mount-time configurable byte-range file locking emulation")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-03 07:32:08 +01:00
David Howells
f9cf17836e afs: Return ENOENT if no cell DNS record can be found
[ Upstream commit 0167236e7d ]

Make AFS return error ENOENT if no cell SRV or AFSDB DNS record (or
cellservdb config file record) can be found rather than returning
EDESTADDRREQ.

Also add cell name lookup info to the cursor dump.

Fixes: d5c32c89b2 ("afs: Fix cell DNS lookup")
Reported-by: Markus Suvanto <markus.suvanto@gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216637
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-03 07:32:08 +01:00
David Howells
d2b3bc8c7f afs: Make error on cell lookup failure consistent with OpenAFS
[ Upstream commit 2a4ca1b4b7 ]

When kafs tries to look up a cell in the DNS or the local config, it will
translate a lookup failure into EDESTADDRREQ whereas OpenAFS translates it
into ENOENT.  Applications such as West expect the latter behaviour and
fail if they see the former.

This can be seen by trying to mount an unknown cell:

   # mount -t afs %example.com:cell.root /mnt
   mount: /mnt: mount(2) system call failed: Destination address required.

Fixes: 4d673da145 ("afs: Support the AFS dynamic root")
Reported-by: Markus Suvanto <markus.suvanto@gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216637
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-03 07:32:07 +01:00
David Howells
790ea5bc40 afs: Fix afs_server_list to be cleaned up with RCU
[ Upstream commit e6bace7313 ]

afs_server_list is accessed with the rcu_read_lock() held from
volume->servers, so it needs to be cleaned up correctly.

Fix this by using kfree_rcu() instead of kfree().

Fixes: 8a070a9648 ("afs: Detect cell aliases 1 - Cells with root volumes")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-03 07:32:06 +01:00
David Howells
f5ea303502 afs: Fix accidental truncation when storing data
[ Upstream commit 03275585ca ]

When an AFS FS.StoreData RPC call is made, amongst other things it is
given the resultant file size to be.  On the server, this is processed
by truncating the file to new size and then writing the data.

Now, kafs has a lock (vnode->io_lock) that serves to serialise
operations against a specific vnode (ie.  inode), but the parameters for
the op are set before the lock is taken.  This allows two writebacks
(say sync and kswapd) to race - and if writes are ongoing the writeback
for a later write could occur before the writeback for an earlier one if
the latter gets interrupted.

Note that afs_writepages() cannot take i_mutex and only takes a shared
lock on vnode->validate_lock.

Also note that the server does the truncation and the write inside a
lock, so there's no problem at that end.

Fix this by moving the calculation for the proposed new i_size inside
the vnode->io_lock.  Also reset the iterator (which we might have read
from) and update the mtime setting there.

Fixes: bd80d8a80e ("afs: Use ITER_XARRAY for writing")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/3526895.1687960024@warthog.procyon.org.uk/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:22:06 +02:00
David Howells
616aba5536 afs: Fix vlserver probe RTT handling
[ Upstream commit ba00b19067 ]

In the same spirit as commit ca57f02295 ("afs: Fix fileserver probe
RTT handling"), don't rule out using a vlserver just because there
haven't been enough packets yet to calculate a real rtt.  Always set the
server's probe rtt from the estimate provided by rxrpc_kernel_get_srtt,
which is capped at 1 second.

This could lead to EDESTADDRREQ errors when accessing a cell for the
first time, even though the vl servers are known and have responded to a
probe.

Fixes: 1d4adfaf65 ("rxrpc: Make rxrpc_kernel_get_srtt() indicate validity")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2023-June/006746.html
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-21 16:01:02 +02:00
David Howells
84c699681a afs: Fix setting of mtime when creating a file/dir/symlink
[ Upstream commit a27648c742 ]

kafs incorrectly passes a zero mtime (ie. 1st Jan 1970) to the server when
creating a file, dir or symlink because the mtime recorded in the
afs_operation struct gets passed to the server by the marshalling routines,
but the afs_mkdir(), afs_create() and afs_symlink() functions don't set it.

This gets masked if a file or directory is subsequently modified.

Fix this by filling in op->mtime before calling the create op.

Fixes: e49c7b2f6d ("afs: Build an abstraction around an "operation" concept")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-14 11:15:16 +02:00
Marc Dionne
274c0b0c2f afs: Avoid endless loop if file is larger than expected
[ Upstream commit 9ea4eff4b6 ]

afs_read_dir fetches an amount of data that's based on what the inode
size is thought to be.  If the file on the server is larger than what
was fetched, the code rechecks i_size and retries.  If the local i_size
was not properly updated, this can lead to an endless loop of fetching
i_size from the server and noticing each time that the size is larger on
the server.

If it is known that the remote size is larger than i_size, bump up the
fetch size to that size.

Fixes: f3ddee8dc4 ("afs: Fix directory handling")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-11 23:03:38 +09:00
David Howells
7b6ccf752a afs: Fix getattr to report server i_size on dirs, not local size
[ Upstream commit 45f66fa03b ]

Fix afs_getattr() to report the server's idea of the file size of a
directory rather than the local size.  The local size may differ as we edit
the local copy to avoid having to redownload it and we may end up with a
differently structured blob of a different size.

However, if the directory is discarded from the pagecache we then download
it again and the user may see the directory file size apparently change.

Fixes: 63a4681ff3 ("afs: Locally edit directory data for mkdir/create/unlink/...")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-11 23:03:38 +09:00
Marc Dionne
2cfce11132 afs: Fix updating of i_size with dv jump from server
[ Upstream commit d7f74e9a91 ]

If the data version returned from the server is larger than expected,
the local data is invalidated, but we may still want to note the remote
file size.

Since we're setting change_size, we have to also set data_changed
for the i_size to get updated.

Fixes: 3f4aa98181 ("afs: Fix EOF corruption")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-11 23:03:38 +09:00
Al Viro
5a19095103 use less confusing names for iov_iter direction initializers
[ Upstream commit de4eda9de2 ]

READ/WRITE proved to be actively confusing - the meanings are
"data destination, as used with read(2)" and "data source, as
used with write(2)", but people keep interpreting those as
"we read data from it" and "we write data to it", i.e. exactly
the wrong way.

Call them ITER_DEST and ITER_SOURCE - at least that is harder
to misinterpret...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Stable-dep-of: 6dd88fd59d ("vhost-scsi: unbreak any layout for response")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-09 11:28:04 +01:00
David Howells
b4a59fd2e5 afs: Fix lost servers_outstanding count
[ Upstream commit 36f82c93ee ]

The afs_fs_probe_dispatcher() work function is passed a count on
net->servers_outstanding when it is scheduled (which may come via its
timer).  This is passed back to the work_item, passed to the timer or
dropped at the end of the dispatcher function.

But, at the top of the dispatcher function, there are two checks which
skip the rest of the function: if the network namespace is being destroyed
or if there are no fileservers to probe.  These two return paths, however,
do not drop the count passed to the dispatcher, and so, sometimes, the
destruction of a network namespace, such as induced by rmmod of the kafs
module, may get stuck in afs_purge_servers(), waiting for
net->servers_outstanding to become zero.

Fix this by adding the missing decrements in afs_fs_probe_dispatcher().

Fixes: f6cbb368bc ("afs: Actively poll fileservers to maintain NAT or firewall openings")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/167164544917.2072364.3759519569649459359.stgit@warthog.procyon.org.uk/
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31 13:33:08 +01:00
Marc Dionne
ef4d3ea405 afs: Fix server->active leak in afs_put_server
The atomic_read was accidentally replaced with atomic_inc_return,
which prevents the server from getting cleaned up and causes rmmod
to hang with a warning:

    Can't purge s=00000001

Fixes: 2757a4dc18 ("afs: Fix access after dec in put functions")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20221130174053.2665818-1-marc.dionne@auristor.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-11-30 10:02:37 -08:00
David Howells
ca57f02295 afs: Fix fileserver probe RTT handling
The fileserver probing code attempts to work out the best fileserver to
use for a volume by retrieving the RTT calculated by AF_RXRPC for the
probe call sent to each server and comparing them.  Sometimes, however,
no RTT estimate is available and rxrpc_kernel_get_srtt() returns false,
leading good fileservers to be given an RTT of UINT_MAX and thus causing
the rotation algorithm to ignore them.

Fix afs_select_fileserver() to ignore rxrpc_kernel_get_srtt()'s return
value and just take the estimated RTT it provides - which will be capped
at 1 second.

Fixes: 1d4adfaf65 ("rxrpc: Make rxrpc_kernel_get_srtt() indicate validity")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/166965503999.3392585.13954054113218099395.stgit@warthog.procyon.org.uk/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-11-28 18:40:07 -08:00
Linus Torvalds
7a3353c5c4 struct file-related stuff
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYzxjIQAKCRBZ7Krx/gZQ
 6/FPAQCNCZygQzd+54//vo4kTwv5T2Bv3hS8J51rASPJT87/BQD/TfCLS5urt/Gt
 81A1dFOfnTXseofuBKyGSXwQm0dWpgA=
 =PLre
 -----END PGP SIGNATURE-----

Merge tag 'pull-file' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs file updates from Al Viro:
 "struct file-related stuff"

* tag 'pull-file' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  dma_buf_getfile(): don't bother with ->f_flags reassignments
  Change calling conventions for filldir_t
  locks: fix TOCTOU race when granting write lease
2022-10-06 17:13:18 -07:00
Linus Torvalds
26b1224903 Networking fixes for 6.0-rc5, including fixes from rxrpc, netfilter,
wireless and bluetooth subtrees
 
 Current release - regressions:
   - skb: export skb drop reaons to user by TRACE_DEFINE_ENUM
 
   - bluetooth: fix regression preventing ACL packet transmission
 
 Current release - new code bugs:
   - dsa: microchip: fix kernel oops on ksz8 switches
 
   - dsa: qca8k: fix NULL pointer dereference for of_device_get_match_data
 
 Previous releases - regressions:
   - netfilter: clean up hook list when offload flags check fails
 
   - wifi: mt76: fix crash in chip reset fail
 
   - rxrpc: fix ICMP/ICMP6 error handling
 
   - ice: fix DMA mappings leak
 
   - i40e: fix kernel crash during module removal
 
 Previous releases - always broken:
   - ipv6: sr: fix out-of-bounds read when setting HMAC data.
 
   - tcp: TX zerocopy should not sense pfmemalloc status
 
   - sch_sfb: don't assume the skb is still around after enqueueing to child
 
   - netfilter: drop dst references before setting
 
   - wifi: wilc1000: fix DMA on stack objects
 
   - rxrpc: fix an insufficiently large sglist in rxkad_verify_packet_2()
 
   - fec: use a spinlock to guard `fep->ptp_clk_on`
 
 Misc:
   - usb: qmi_wwan: add Quectel RM520N
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmMZwOMSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkCVMP/jUOqbRTzPOqV7+HhDM65Ww9Prb1WYKS
 4o9Vi9DAL/DH8tXSDDtacE0GbtN6Mlr0kEYJE+lwn1hFsfGY1mzH2pcVjJqtZ7fh
 6o2joaQMJ5lJe4dsr2k0TRtYhCzdeCrvzoTs2qLSeb5KYFOsxMtBikSF3kOJ1TZq
 3bOK3OomiT/XO4FCnyPJvg8VBghkp69oNnefOavM8x7FzrMh6MY7VQem/IjnlIU2
 9sHvMPLai81B1NXu2eybThEYSqutZHzM1PLuqIMhSMwdiSrVRlLFq7/BNP0FxCTR
 /Mby+fnU4xIu+TK1bRoUM0fsipNceVEkXln4pQrRmxu1Yw62RFn5p+MGCM+Sc1UB
 8HZzjtNRtlD+ZfRyQKs2m1+WJtbuESyzAeJaKoQ2eO2StmLqlnZLEYGl6oVQj6Uj
 l3Bn1aBIOYrSM0T/qTNmAv7BTJYvpomlwqoQ5A1+5b0jKhn39Un3Yj3eU2aEv8eX
 mJ80KAWk26Cc7ZA8WbL70Ac8wV76+TcpWCezVvrcZqrFoYAvicLrwEc92Fq0IiG5
 bkA84zhd1TOFSuYVH7f79lSUPYHIs8FxFXIV8SZZGLbhj6nk607cwgbpA9IXizZT
 CUajNoJclS1tELPG7VOv6DJS+VnpdsCm0+q3dnYTYiLlrEX71Bgc/ofgGOBKTeS3
 iRyS2V0GLYoW
 =DeAB
 -----END PGP SIGNATURE-----

Merge tag 'net-6.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from rxrpc, netfilter, wireless and bluetooth
  subtrees.

  Current release - regressions:

   - skb: export skb drop reaons to user by TRACE_DEFINE_ENUM

   - bluetooth: fix regression preventing ACL packet transmission

  Current release - new code bugs:

   - dsa: microchip: fix kernel oops on ksz8 switches

   - dsa: qca8k: fix NULL pointer dereference for
     of_device_get_match_data

  Previous releases - regressions:

   - netfilter: clean up hook list when offload flags check fails

   - wifi: mt76: fix crash in chip reset fail

   - rxrpc: fix ICMP/ICMP6 error handling

   - ice: fix DMA mappings leak

   - i40e: fix kernel crash during module removal

  Previous releases - always broken:

   - ipv6: sr: fix out-of-bounds read when setting HMAC data.

   - tcp: TX zerocopy should not sense pfmemalloc status

   - sch_sfb: don't assume the skb is still around after
     enqueueing to child

   - netfilter: drop dst references before setting

   - wifi: wilc1000: fix DMA on stack objects

   - rxrpc: fix an insufficiently large sglist in
     rxkad_verify_packet_2()

   - fec: use a spinlock to guard `fep->ptp_clk_on`

  Misc:

   - usb: qmi_wwan: add Quectel RM520N"

* tag 'net-6.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
  sch_sfb: Also store skb len before calling child enqueue
  net: phy: lan87xx: change interrupt src of link_up to comm_ready
  net/smc: Fix possible access to freed memory in link clear
  net: ethernet: mtk_eth_soc: check max allowed hash in mtk_ppe_check_skb
  net: skb: export skb drop reaons to user by TRACE_DEFINE_ENUM
  net: ethernet: mtk_eth_soc: fix typo in __mtk_foe_entry_clear
  net: dsa: felix: access QSYS_TAG_CONFIG under tas_lock in vsc9959_sched_speed_set
  net: dsa: felix: disable cut-through forwarding for frames oversized for tc-taprio
  net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet
  net: usb: qmi_wwan: add Quectel RM520N
  net: dsa: qca8k: fix NULL pointer dereference for of_device_get_match_data
  tcp: fix early ETIMEDOUT after spurious non-SACK RTO
  stmmac: intel: Simplify intel_eth_pci_remove()
  net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()
  ipv6: sr: fix out-of-bounds read when setting HMAC data.
  bonding: accept unsolicited NA message
  bonding: add all node mcast address when slave up
  bonding: use unspecified address if no available link local address
  wifi: use struct_group to copy addresses
  wifi: mac80211_hwsim: check length for virtio packets
  ...
2022-09-08 08:15:01 -04:00
David Howells
0066f1b0e2 afs: Return -EAGAIN, not -EREMOTEIO, when a file already locked
When trying to get a file lock on an AFS file, the server may return
UAEAGAIN to indicate that the lock is already held.  This is currently
translated by the default path to -EREMOTEIO.

Translate it instead to -EAGAIN so that we know we can retry it.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey E Altman <jaltman@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/166075761334.3533338.2591992675160918098.stgit@warthog.procyon.org.uk/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-09-06 21:33:01 -04:00
David Howells
7903192c4b afs: Use the operation issue time instead of the reply time for callbacks
rxrpc and kafs between them try to use the receive timestamp on the first
data packet (ie. the one with sequence number 1) as a base from which to
calculate the time at which callback promise and lock expiration occurs.

However, we don't know how long it took for the server to send us the reply
from it having completed the basic part of the operation - it might then,
for instance, have to send a bunch of a callback breaks, depending on the
particular operation.

Fix this by using the time at which the operation is issued on the client
as a base instead.  That should never be longer than the server's idea of
the expiry time.

Fixes: 781070551c ("afs: Fix calculation of callback expiry time")
Fixes: 2070a3e449 ("rxrpc: Allow the reply time to be obtained on a client call")
Suggested-by: Jeffrey E Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2022-09-01 11:44:13 +01:00
Al Viro
25885a35a7 Change calling conventions for filldir_t
filldir_t instances (directory iterators callbacks) used to return 0 for
"OK, keep going" or -E... for "stop".  Note that it's *NOT* how the
error values are reported - the rules for those are callback-dependent
and ->iterate{,_shared}() instances only care about zero vs. non-zero
(look at emit_dir() and friends).

So let's just return bool ("should we keep going?") - it's less confusing
that way.  The choice between "true means keep going" and "true means
stop" is bikesheddable; we have two groups of callbacks -
	do something for everything in directory, until we run into problem
and
	find an entry in directory and do something to it.

The former tended to use 0/-E... conventions - -E<something> on failure.
The latter tended to use 0/1, 1 being "stop, we are done".
The callers treated anything non-zero as "stop", ignoring which
non-zero value did they get.

"true means stop" would be more natural for the second group; "true
means keep going" - for the first one.  I tried both variants and
the things like
	if allocation failed
		something = -ENOMEM;
		return true;
just looked unnatural and asking for trouble.

[folded suggestion from Matthew Wilcox <willy@infradead.org>]
Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-17 17:25:04 -04:00
David Howells
8549a26308 afs: Enable multipage folio support
Enable multipage folio support for the afs filesystem.

Support has already been implemented in netfslib, fscache and cachefiles
and in most of afs, but I've waited for Matthew Wilcox's latest folio
changes.

Note that it does require a change to afs_write_begin() to return the
correct subpage.  This is a "temporary" change as we're working on
getting rid of the need for ->write_begin() and ->write_end()
completely, at least as far as network filesystems are concerned - but
it doesn't prevent afs from making use of the capability.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: kafs-testing@auristor.com
Cc: Marc Dionne <marc.dionne@auristor.com>
Cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/lkml/2274528.1645833226@warthog.procyon.org.uk/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-08-13 17:20:51 -07:00
Linus Torvalds
4b22e20741 AFS fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmLpXWwACgkQ+7dXa6fL
 C2tpSg//WX4+1XvsehIZxykJYmQ7XjVjrg9uGIEaslUry2nqEECDeK7LppKLuCoD
 1xMBc5GRn4h5YbBCGbMWnr+r6eezcXdwr5Fv+On4fD3l5AWTk5xTaR8v1So3G9bK
 nAkWaoSrFapuifWgBkDGy9NhLW/k6d24BAKCqGhOlPcVzi4B95C4gQ5sIlMlH9xw
 jUkHB+HSLZj/yBV47sWQUs0uy5fCiOgm3qgWYl2cTHD4qJ052i0ELO30uAv9Oykq
 YjgOUryZlCP/vLSWzOeK8bgINjCLetd+2jGT96bV15BouQ4uoHSrgt7ACtp1I3NP
 WAPGdzlzXZqaYSrqX2sPP/Tmq0/SipoKCRqK5e2yetmqs7zznKZe0ATwFhjJYp7L
 IOULEaOAyrAeT9KQEwqtANXXoUxyEUqz2GKiybXWC8oUFJ0RAKpSf4LRi9cpoaRz
 VDmi0F4/ZA+nMDOPPdbSZTtDCu+68Lz2l6Z7ONwIZ1qzyXJB4BC4wb+9rx7UftA9
 QuX9TqGUtjlWiPcM0qvdFoPOIA3xOXbZVtlvqygFxzDDXGlRVZdUa3uqMjELKzK1
 v1yRZIS00zqWM4lWFyaQNCZHwFhlnixRzkAmvQcioAb/NJL3eA4n3FbL+RBTSrWP
 XEtgpI32ROobA3S+69LomYH4AisMgP9XDP04mOQqQfIxg9GFdEI=
 =fXqy
 -----END PGP SIGNATURE-----

Merge tag 'afs-fixes-20220802' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull AFS fixes from David Howells:
 "Fix AFS refcount handling.

  The first patch converts afs to use refcount_t for its refcounts and
  the second patch fixes afs_put_call() and afs_put_server() to save the
  values they're going to log in the tracepoint before decrementing the
  refcount"

* tag 'afs-fixes-20220802' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Fix access after dec in put functions
  afs: Use refcount_t rather than atomic_t
2022-08-09 10:08:08 -07:00
Linus Torvalds
f00654007f Folio changes for 6.0
- Fix an accounting bug that made NR_FILE_DIRTY grow without limit
    when running xfstests
 
  - Convert more of mpage to use folios
 
  - Remove add_to_page_cache() and add_to_page_cache_locked()
 
  - Convert find_get_pages_range() to filemap_get_folios()
 
  - Improvements to the read_cache_page() family of functions
 
  - Remove a few unnecessary checks of PageError
 
  - Some straightforward filesystem conversions to use folios
 
  - Split PageMovable users out from address_space_operations into their
    own movable_operations
 
  - Convert aops->migratepage to aops->migrate_folio
 
  - Remove nobh support (Christoph Hellwig)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmLpViQACgkQDpNsjXcp
 gj5pBgf/f3+K7Hi3qw7aYQCYJQ7IA/bLyE/DLWI59kuiao6wDSve40B9YH9X++Ha
 mRLp55bkQS+bwS2xa4jlqrIDJzAfNoWlXaXZHUXGL1C/52ChTF6jaH2cvO9PVlDS
 7fLv1hy2LwiIdzpKJkUW7T+kcQGj3QLKqtQ4x8zD0LGMg055yvt/qndHSUi41nWT
 /58+6W8Sk4vvRgkpeChFzF1lGLy00+FGT8y5V2kM9uRliFQ7XPCwqB2a3e5jbW6z
 C1NXQmRnopCrnOT1TFIhK3DyX6MDIWV5qcikNAmCKFb9fQFPmjDLPt9iSoMGjw2M
 Z+UVhJCaU3ISccd0DG5Ra/vzs9/O9Q==
 =DgUi
 -----END PGP SIGNATURE-----

Merge tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache

Pull folio updates from Matthew Wilcox:

 - Fix an accounting bug that made NR_FILE_DIRTY grow without limit
   when running xfstests

 - Convert more of mpage to use folios

 - Remove add_to_page_cache() and add_to_page_cache_locked()

 - Convert find_get_pages_range() to filemap_get_folios()

 - Improvements to the read_cache_page() family of functions

 - Remove a few unnecessary checks of PageError

 - Some straightforward filesystem conversions to use folios

 - Split PageMovable users out from address_space_operations into
   their own movable_operations

 - Convert aops->migratepage to aops->migrate_folio

 - Remove nobh support (Christoph Hellwig)

* tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache: (78 commits)
  fs: remove the NULL get_block case in mpage_writepages
  fs: don't call ->writepage from __mpage_writepage
  fs: remove the nobh helpers
  jfs: stop using the nobh helper
  ext2: remove nobh support
  ntfs3: refactor ntfs_writepages
  mm/folio-compat: Remove migration compatibility functions
  fs: Remove aops->migratepage()
  secretmem: Convert to migrate_folio
  hugetlb: Convert to migrate_folio
  aio: Convert to migrate_folio
  f2fs: Convert to filemap_migrate_folio()
  ubifs: Convert to filemap_migrate_folio()
  btrfs: Convert btrfs_migratepage to migrate_folio
  mm/migrate: Add filemap_migrate_folio()
  mm/migrate: Convert migrate_page() to migrate_folio()
  nfs: Convert to migrate_folio
  btrfs: Convert btree_migratepage to migrate_folio
  mm/migrate: Convert expected_page_refs() to folio_expected_refs()
  mm/migrate: Convert buffer_migrate_page() to buffer_migrate_folio()
  ...
2022-08-03 10:35:43 -07:00
David Howells
2757a4dc18 afs: Fix access after dec in put functions
Reference-putting functions should not access the object being put after
decrementing the refcount unless they reduce the refcount to zero.

Fix a couple of instances of this in afs by copying the information to be
logged by tracepoint to local variables before doing the decrement.

[Fixed a bit in afs_put_server() that I'd missed but Marc caught]

Fixes: 341f741f04 ("afs: Refcount the afs_call struct")
Fixes: 4521819369 ("afs: Trace afs_server usage")
Fixes: 977e5f8ed0 ("afs: Split the usage count on struct afs_server")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/165911278430.3745403.16526310736054780645.stgit@warthog.procyon.org.uk/ # v1
2022-08-02 18:21:29 +01:00
David Howells
c56f9ec8b2 afs: Use refcount_t rather than atomic_t
Use refcount_t rather than atomic_t in afs to make use of the count
checking facilities provided.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/165911277768.3745403.423349776836296452.stgit@warthog.procyon.org.uk/ # v1
2022-08-02 18:10:11 +01:00
Xiubo Li
fac47b43c7 netfs: do not unlock and put the folio twice
check_write_begin() will unlock and put the folio when return
non-zero.  So we should avoid unlocking and putting it twice in
netfs layer.

Change the way ->check_write_begin() works in the following two ways:

 (1) Pass it a pointer to the folio pointer, allowing it to unlock and put
     the folio prior to doing the stuff it wants to do, provided it clears
     the folio pointer.

 (2) Change the return values such that 0 with folio pointer set means
     continue, 0 with folio pointer cleared means re-get and all error
     codes indicating an error (no special treatment for -EAGAIN).

[ bagasdotme: use Sphinx code text syntax for *foliop pointer ]

Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/56423
Link: https://lore.kernel.org/r/cf169f43-8ee7-8697-25da-0204d1b4343e@redhat.com
Co-developed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-07-14 10:10:12 +02:00
Matthew Wilcox (Oracle)
59fc647405 afs: Remove check of PageError
If read_mapping_page() encounters an error, it returns an errno, not a
page with PageError set, so this is dead code.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-06-29 08:51:06 -04:00
David Howells
cb78d1b5ef afs: Fix dynamic root getattr
The recent patch to make afs_getattr consult the server didn't account
for the pseudo-inodes employed by the dynamic root-type afs superblock
not having a volume or a server to access, and thus an oops occurs if
such a directory is stat'd.

Fix this by checking to see if the vnode->volume pointer actually points
anywhere before following it in afs_getattr().

This can be tested by stat'ing a directory in /afs.  It may be
sufficient just to do "ls /afs" and the oops looks something like:

        BUG: kernel NULL pointer dereference, address: 0000000000000020
        ...
        RIP: 0010:afs_getattr+0x8b/0x14b
        ...
        Call Trace:
         <TASK>
         vfs_statx+0x79/0xf5
         vfs_fstatat+0x49/0x62

Fixes: 2aeb8c86d4 ("afs: Fix afs_getattr() to refetch file status if callback break occurred")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/165408450783.1031787.7941404776393751186.stgit@warthog.procyon.org.uk/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-21 11:47:30 -05:00
David Howells
40a8110120 netfs: Rename the netfs_io_request cleanup op and give it an op pointer
The netfs_io_request cleanup op is now always in a position to be given a
pointer to a netfs_io_request struct, so this can be passed in instead of
the mapping and private data arguments (both of which are included in the
struct).

So rename the ->cleanup op to ->free_request (to match ->init_request) and
pass in the I/O pointer.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
2022-06-10 20:55:21 +01:00
Linus Torvalds
e81fb4198e netfs: Further cleanups after struct netfs_inode wrapper introduced
Change the signature of netfs helper functions to take a struct netfs_inode
pointer rather than a struct inode pointer where appropriate, thereby
relieving the need for the network filesystem to convert its internal inode
format down to the VFS inode only for netfslib to bounce it back up.  For
type safety, it's better not to do that (and it's less typing too).

Give netfs_write_begin() an extra argument to pass in a pointer to the
netfs_inode struct rather than deriving it internally from the file
pointer.  Note that the ->write_begin() and ->write_end() ops are intended
to be replaced in the future by netfslib code that manages this without the
need to call in twice for each page.

netfs_readpage() and similar are intended to be pointed at directly by the
address_space_operations table, so must stick to the signature dictated by
the function pointers there.

Changes
=======
- Updated the kerneldoc comments and documentation [DH].

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/CAHk-=wgkwKyNmNdKpQkqZ6DnmUL-x9hp0YBnUGjaPFEAdxDTbw@mail.gmail.com/
2022-06-10 20:55:21 +01:00
David Howells
102d841055 afs: Fix some checker issues
Remove an unused global variable and make another static as reported by
make C=1.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
2022-06-10 20:55:21 +01:00
David Howells
874c8ca1e6 netfs: Fix gcc-12 warning by embedding vfs inode in netfs_i_context
While randstruct was satisfied with using an open-coded "void *" offset
cast for the netfs_i_context <-> inode casting, __builtin_object_size() as
used by FORTIFY_SOURCE was not as easily fooled.  This was causing the
following complaint[1] from gcc v12:

  In file included from include/linux/string.h:253,
                   from include/linux/ceph/ceph_debug.h:7,
                   from fs/ceph/inode.c:2:
  In function 'fortify_memset_chk',
      inlined from 'netfs_i_context_init' at include/linux/netfs.h:326:2,
      inlined from 'ceph_alloc_inode' at fs/ceph/inode.c:463:2:
  include/linux/fortify-string.h:242:25: warning: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning]
    242 |                         __write_overflow_field(p_size_field, size);
        |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix this by embedding a struct inode into struct netfs_i_context (which
should perhaps be renamed to struct netfs_inode).  The struct inode
vfs_inode fields are then removed from the 9p, afs, ceph and cifs inode
structs and vfs_inode is then simply changed to "netfs.inode" in those
filesystems.

Further, rename netfs_i_context to netfs_inode, get rid of the
netfs_inode() function that converted a netfs_i_context pointer to an
inode pointer (that can now be done with &ctx->inode) and rename the
netfs_i_context() function to netfs_inode() (which is now a wrapper
around container_of()).

Most of the changes were done with:

  perl -p -i -e 's/vfs_inode/netfs.inode/'g \
        `git grep -l 'vfs_inode' -- fs/{9p,afs,ceph,cifs}/*.[ch]`

Kees suggested doing it with a pair structure[2] and a special
declarator to insert that into the network filesystem's inode
wrapper[3], but I think it's cleaner to embed it - and then it doesn't
matter if struct randomisation reorders things.

Dave Chinner suggested using a filesystem-specific VFS_I() function in
each filesystem to convert that filesystem's own inode wrapper struct
into the VFS inode struct[4].

Version #2:
 - Fix a couple of missed name changes due to a disabled cifs option.
 - Rename nfs_i_context to nfs_inode
 - Use "netfs" instead of "nic" as the member name in per-fs inode wrapper
   structs.

[ This also undoes commit 507160f46c ("netfs: gcc-12: temporarily
  disable '-Wattribute-warning' for now") that is no longer needed ]

Fixes: bc899ee1c8 ("netfs: Add a netfs inode context")
Reported-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
cc: Jonathan Corbet <corbet@lwn.net>
cc: Eric Van Hensbergen <ericvh@gmail.com>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Steve French <smfrench@gmail.com>
cc: William Kucharski <william.kucharski@oracle.com>
cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
cc: Dave Chinner <david@fromorbit.com>
cc: linux-doc@vger.kernel.org
cc: v9fs-developer@lists.sourceforge.net
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: samba-technical@lists.samba.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-hardening@vger.kernel.org
Link: https://lore.kernel.org/r/d2ad3a3d7bdd794c6efb562d2f2b655fb67756b9.camel@kernel.org/ [1]
Link: https://lore.kernel.org/r/20220517210230.864239-1-keescook@chromium.org/ [2]
Link: https://lore.kernel.org/r/20220518202212.2322058-1-keescook@chromium.org/ [3]
Link: https://lore.kernel.org/r/20220524101205.GI2306852@dread.disaster.area/ [4]
Link: https://lore.kernel.org/r/165296786831.3591209.12111293034669289733.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165305805651.4094995.7763502506786714216.stgit@warthog.procyon.org.uk # v2
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-09 13:55:00 -07:00
Linus Torvalds
507160f46c netfs: gcc-12: temporarily disable '-Wattribute-warning' for now
This is a pure band-aid so that I can continue merging stuff from people
while some of the gcc-12 fallout gets sorted out.

In particular, gcc-12 is very unhappy about the kinds of pointer
arithmetic tricks that netfs does, and that makes the fortify checks
trigger in afs and ceph:

  In function ‘fortify_memset_chk’,
      inlined from ‘netfs_i_context_init’ at include/linux/netfs.h:327:2,
      inlined from ‘afs_set_netfs_context’ at fs/afs/inode.c:61:2,
      inlined from ‘afs_root_iget’ at fs/afs/inode.c:543:2:
  include/linux/fortify-string.h:258:25: warning: call to ‘__write_overflow_field’ declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning]
    258 |                         __write_overflow_field(p_size_field, size);
        |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

and the reason is that netfs_i_context_init() is passed a 'struct inode'
pointer, and then it does

        struct netfs_i_context *ctx = netfs_i_context(inode);

        memset(ctx, 0, sizeof(*ctx));

where that netfs_i_context() function just does pointer arithmetic on
the inode pointer, knowing that the netfs_i_context is laid out
immediately after it in memory.

This is all truly disgusting, since the whole "netfs_i_context is laid
out immediately after it in memory" is not actually remotely true in
general, but is just made to be that way for afs and ceph.

See for example fs/cifs/cifsglob.h:

  struct cifsInodeInfo {
        struct {
                /* These must be contiguous */
                struct inode    vfs_inode;      /* the VFS's inode record */
                struct netfs_i_context netfs_ctx; /* Netfslib context */
        };
	[...]

and realize that this is all entirely wrong, and the pointer arithmetic
that netfs_i_context() is doing is also very very wrong and wouldn't
give the right answer if netfs_ctx had different alignment rules from a
'struct inode', for example).

Anyway, that's just a long-winded way to say "the gcc-12 warning is
actually quite reasonable, and our code happens to work but is pretty
disgusting".

This is getting fixed properly, but for now I made the mistake of
thinking "the week right after the merge window tends to be calm for me
as people take a breather" and I did a sustem upgrade.  And I got gcc-12
as a result, so to continue merging fixes from people and not have the
end result drown in warnings, I am fixing all these gcc-12 issues I hit.

Including with these kinds of temporary fixes.

Cc: Kees Cook <keescook@chromium.org>
Cc: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/all/AEEBCF5D-8402-441D-940B-105AA718C71F@chromium.org/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-09 11:29:36 -07:00
David Howells
17eabd4256 afs: Fix infinite loop found by xfstest generic/676
In AFS, a directory is handled as a file that the client downloads and
parses locally for the purposes of performing lookup and getdents
operations.  The in-kernel afs filesystem has a number of functions that
do this.

A directory file is arranged as a series of 2K blocks divided into
32-byte slots, where a directory entry occupies one or more slots, plus
each block starts with one or more metadata blocks.

When parsing a block, if the last slots are occupied by a dirent that
occupies more than a single slot and the file position points at a slot
that's not the initial one, the logic in afs_dir_iterate_block() that
skips over it won't advance the file pointer to the end of it.  This
will cause an infinite loop in getdents() as it will keep retrying that
block and failing to advance beyond the final entry.

Fix this by advancing the file pointer if the next entry will be beyond
it when we skip a block.

This was found by the generic/676 xfstest but can also be triggered with
something like:

	~/xfstests-dev/src/t_readdir_3 /xfstest.test/z 4000 1

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: http://lore.kernel.org/r/165391973497.110268.2939296942213894166.stgit@warthog.procyon.org.uk/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-01 11:11:51 -07:00
Linus Torvalds
62e5873ec9 Misc hardening changes for 5.19-rc1
Hi Linus,
 
 Please, pull the following hardening changes that I've been collecting
 in my tree during the last development cycle. All of them have been
 baking in linux-next.
 
 Replace open-coded instances with size_t saturating arithmetic helpers:
 
 - virt: acrn: Prefer array_size and struct_size over open coded arithmetic (Len Baker)
 - afs: Prefer struct_size over open coded arithmetic (Len Baker)
 
 Thanks
 --
 Gustavo
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAmKNUvYACgkQRwW0y0cG
 2zGm1g//YZYWBEzqOncd1K4/TBzbMytxE7oQYqUiS4e3V5D1eiT8924BLr7iuWMd
 c64Lv0trjyHCjdqBxGLMz5cv5EZd206xS3bqIQ8gqcKHOK/z7ob/c7YJxdUcXaWK
 oyGY65Arg9XnuRIJvNeiwdJNpPGsSyI8BSRC6gf6NFYke/GNIgv3rxmVBSbH8V7W
 QVBWf3AWR8+ZlXJFHWEfIAMRwFtumAhak3ZpN/4SDWM6cw3PIm1TFEO5Jx/rWaNn
 LZth+91uJU6UeSrhrME/QQ9Q1kgmLUMqzW3U/HpMky7Ugi0ffWIHc83XYDOgrFRJ
 Br5KLL4c5SHMJfPXqHMsbRNG/bI6fJfd2A0vUDIQMdy4kX88MAzTsUzSiGe6gAAc
 a3RQV4qUkBvC53glZm4pb0IGY4j3vO1tf6Hd9BdxH4knpfQzVNTAwHXUxDq8kDBu
 Bir3B60qAktszxdp/QukmfoTRpVPSV732TMgIllVPFtFD/KYCiYFhSILPouh5JWJ
 3TP8ND1hKrV3P9FVirlJxdFLcADu8dyrmwxKdxZVa6ek9xtr0gt111FYcTeZG+eS
 PaJ7N8G+OkvYktv14/+0DBgIQdeHQ/kp6D4eTjQZ6y0k0b7uAI3HRv7fxWh8eyOw
 RmaE8jQCHokUnerouBUIGEwAmyFhN3atogqIHczB4ebBNbEeOGg=
 =22O8
 -----END PGP SIGNATURE-----

Merge tag 'size_t-saturating-helpers-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux

Pull misc hardening updates from Gustavo Silva:
 "Replace a few open-coded instances with size_t saturating arithmetic
  helpers"

* tag 'size_t-saturating-helpers-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
  virt: acrn: Prefer array_size and struct_size over open coded arithmetic
  afs: Prefer struct_size over open coded arithmetic
2022-05-25 13:56:57 -07:00
Linus Torvalds
7e062cda7d Networking changes for 5.19.
Core
 ----
 
  - Support TCPv6 segmentation offload with super-segments larger than
    64k bytes using the IPv6 Jumbogram extension header (AKA BIG TCP).
 
  - Generalize skb freeing deferral to per-cpu lists, instead of
    per-socket lists.
 
  - Add a netdev statistic for packets dropped due to L2 address
    mismatch (rx_otherhost_dropped).
 
  - Continue work annotating skb drop reasons.
 
  - Accept alternative netdev names (ALT_IFNAME) in more netlink
    requests.
 
  - Add VLAN support for AF_PACKET SOCK_RAW GSO.
 
  - Allow receiving skb mark from the socket as a cmsg.
 
  - Enable memcg accounting for veth queues, sysctl tables and IPv6.
 
 BPF
 ---
 
  - Add libbpf support for User Statically-Defined Tracing (USDTs).
 
  - Speed up symbol resolution for kprobes multi-link attachments.
 
  - Support storing typed pointers to referenced and unreferenced
    objects in BPF maps.
 
  - Add support for BPF link iterator.
 
  - Introduce access to remote CPU map elements in BPF per-cpu map.
 
  - Allow middle-of-the-road settings for the
    kernel.unprivileged_bpf_disabled sysctl.
 
  - Implement basic types of dynamic pointers e.g. to allow for
    dynamically sized ringbuf reservations without extra memory copies.
 
 Protocols
 ---------
 
  - Retire port only listening_hash table, add a second bind table
    hashed by port and address. Avoid linear list walk when binding
    to very popular ports (e.g. 443).
 
  - Add bridge FDB bulk flush filtering support allowing user space
    to remove all FDB entries matching a condition.
 
  - Introduce accept_unsolicited_na sysctl for IPv6 to implement
    router-side changes for RFC9131.
 
  - Support for MPTCP path manager in user space.
 
  - Add MPTCP support for fallback to regular TCP for connections
    that have never connected additional subflows or transmitted
    out-of-sequence data (partial support for RFC8684 fallback).
 
  - Avoid races in MPTCP-level window tracking, stabilize and improve
    throughput.
 
  - Support lockless operation of GRE tunnels with seq numbers enabled.
 
  - WiFi support for host based BSS color collision detection.
 
  - Add support for SO_TXTIME/SCM_TXTIME on CAN sockets.
 
  - Support transmission w/o flow control in CAN ISOTP (ISO 15765-2).
 
  - Support zero-copy Tx with TLS 1.2 crypto offload (sendfile).
 
  - Allow matching on the number of VLAN tags via tc-flower.
 
  - Add tracepoint for tcp_set_ca_state().
 
 Driver API
 ----------
 
  - Improve error reporting from classifier and action offload.
 
  - Add support for listing line cards in switches (devlink).
 
  - Add helpers for reporting page pool statistics with ethtool -S.
 
  - Add support for reading clock cycles when using PTP virtual clocks,
    instead of having the driver convert to time before reporting.
    This makes it possible to report time from different vclocks.
 
  - Support configuring low-latency Tx descriptor push via ethtool.
 
  - Separate Clause 22 and Clause 45 MDIO accesses more explicitly.
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - Marvell's Octeon NIC PCI Endpoint support (octeon_ep)
    - Sunplus SP7021 SoC (sp7021_emac)
    - Add support for Renesas RZ/V2M (in ravb)
    - Add support for MediaTek mt7986 switches (in mtk_eth_soc)
 
  - Ethernet PHYs:
    - ADIN1100 industrial PHYs (w/ 10BASE-T1L and SQI reporting)
    - TI DP83TD510 PHY
    - Microchip LAN8742/LAN88xx PHYs
 
  - WiFi:
    - Driver for pureLiFi X, XL, XC devices (plfxlc)
    - Driver for Silicon Labs devices (wfx)
    - Support for WCN6750 (in ath11k)
    - Support Realtek 8852ce devices (in rtw89)
 
  - Mobile:
    - MediaTek T700 modems (Intel 5G 5000 M.2 cards)
 
  - CAN:
   - ctucanfd: add support for CTU CAN FD open-source IP core
     from Czech Technical University in Prague
 
 Drivers
 -------
 
  - Delete a number of old drivers still using virt_to_bus().
 
  - Ethernet NICs:
    - intel: support TSO on tunnels MPLS
    - broadcom: support multi-buffer XDP
    - nfp: support VF rate limiting
    - sfc: use hardware tx timestamps for more than PTP
    - mlx5: multi-port eswitch support
    - hyper-v: add support for XDP_REDIRECT
    - atlantic: XDP support (including multi-buffer)
    - macb: improve real-time perf by deferring Tx processing to NAPI
 
  - High-speed Ethernet switches:
    - mlxsw: implement basic line card information querying
    - prestera: add support for traffic policing on ingress and egress
 
  - Embedded Ethernet switches:
    - lan966x: add support for packet DMA (FDMA)
    - lan966x: add support for PTP programmable pins
    - ti: cpsw_new: enable bc/mc storm prevention
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - Wake-on-WLAN support for QCA6390 and WCN6855
    - device recovery (firmware restart) support
    - support setting Specific Absorption Rate (SAR) for WCN6855
    - read country code from SMBIOS for WCN6855/QCA6390
    - enable keep-alive during WoWLAN suspend
    - implement remain-on-channel support
 
  - MediaTek WiFi (mt76):
    - support Wireless Ethernet Dispatch offloading packet movement
      between the Ethernet switch and WiFi interfaces
    - non-standard VHT MCS10-11 support
    - mt7921 AP mode support
    - mt7921 IPv6 NS offload support
 
  - Ethernet PHYs:
    - micrel: ksz9031/ksz9131: cabletest support
    - lan87xx: SQI support for T1 PHYs
    - lan937x: add interrupt support for link detection
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmKNMPQACgkQMUZtbf5S
 IrsRARAAuDyYs6jFYB3p+xazZdOnbF4iAgVv71+DQGvmsCl6CB9OrsNZMlvE85OL
 Q3gjcRbgjrkN4lhgI8DmiGYbsUJnAvVjFdNjccz1Z/vTLYvuIM0ol54MUp5S+9WY
 StncOJkOGJxxR/Gi5gzVmejPDsysU3Jik+hm/fpIcz8pybXxAsFKU5waY5qfl+/T
 TZepfV0VCfqRDjqcF1qA5+jJZNU8pdodQlZ1+mh8bwu6Jk1ZkWkj6Ov8MWdwQldr
 LnPeK/9hIGzkdJYHZfajxA3t8D0K5CHzSuih2bJ9ry8ZXgVBkXEThew778/R5izW
 uB0YZs9COFlrIP7XHjtRTy/2xHOdYIPlj2nWhVdfuQDX8Crvt4VRN6EZ1rjko1ZJ
 WanfG6WHF8NH5pXBRQbh3kIMKBnYn6OIzuCfCQSqd+niHcxFIM4vRiggeXI5C5TW
 vJgEWfK6X+NfDiFVa3xyCrEmp5ieA/pNecpwd8rVkql+MtFAAw4vfsotLKOJEAru
 J/XL6UE+YuLqIJV9ACZ9x1AFXXAo661jOxBunOo4VXhXVzWS9lYYz5r5ryIkgT/8
 /Fr0zjANJWgfIuNdIBtYfQ4qG+LozGq038VA06RhFUAZ5tF9DzhqJs2Q2AFuWWBC
 ewCePJVqo1j2Ceq2mGonXRt47OEnlePoOxTk9W+cKZb7ZWE+zEo=
 =Wjii
 -----END PGP SIGNATURE-----

Merge tag 'net-next-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core
  ----

   - Support TCPv6 segmentation offload with super-segments larger than
     64k bytes using the IPv6 Jumbogram extension header (AKA BIG TCP).

   - Generalize skb freeing deferral to per-cpu lists, instead of
     per-socket lists.

   - Add a netdev statistic for packets dropped due to L2 address
     mismatch (rx_otherhost_dropped).

   - Continue work annotating skb drop reasons.

   - Accept alternative netdev names (ALT_IFNAME) in more netlink
     requests.

   - Add VLAN support for AF_PACKET SOCK_RAW GSO.

   - Allow receiving skb mark from the socket as a cmsg.

   - Enable memcg accounting for veth queues, sysctl tables and IPv6.

  BPF
  ---

   - Add libbpf support for User Statically-Defined Tracing (USDTs).

   - Speed up symbol resolution for kprobes multi-link attachments.

   - Support storing typed pointers to referenced and unreferenced
     objects in BPF maps.

   - Add support for BPF link iterator.

   - Introduce access to remote CPU map elements in BPF per-cpu map.

   - Allow middle-of-the-road settings for the
     kernel.unprivileged_bpf_disabled sysctl.

   - Implement basic types of dynamic pointers e.g. to allow for
     dynamically sized ringbuf reservations without extra memory copies.

  Protocols
  ---------

   - Retire port only listening_hash table, add a second bind table
     hashed by port and address. Avoid linear list walk when binding to
     very popular ports (e.g. 443).

   - Add bridge FDB bulk flush filtering support allowing user space to
     remove all FDB entries matching a condition.

   - Introduce accept_unsolicited_na sysctl for IPv6 to implement
     router-side changes for RFC9131.

   - Support for MPTCP path manager in user space.

   - Add MPTCP support for fallback to regular TCP for connections that
     have never connected additional subflows or transmitted
     out-of-sequence data (partial support for RFC8684 fallback).

   - Avoid races in MPTCP-level window tracking, stabilize and improve
     throughput.

   - Support lockless operation of GRE tunnels with seq numbers enabled.

   - WiFi support for host based BSS color collision detection.

   - Add support for SO_TXTIME/SCM_TXTIME on CAN sockets.

   - Support transmission w/o flow control in CAN ISOTP (ISO 15765-2).

   - Support zero-copy Tx with TLS 1.2 crypto offload (sendfile).

   - Allow matching on the number of VLAN tags via tc-flower.

   - Add tracepoint for tcp_set_ca_state().

  Driver API
  ----------

   - Improve error reporting from classifier and action offload.

   - Add support for listing line cards in switches (devlink).

   - Add helpers for reporting page pool statistics with ethtool -S.

   - Add support for reading clock cycles when using PTP virtual clocks,
     instead of having the driver convert to time before reporting. This
     makes it possible to report time from different vclocks.

   - Support configuring low-latency Tx descriptor push via ethtool.

   - Separate Clause 22 and Clause 45 MDIO accesses more explicitly.

  New hardware / drivers
  ----------------------

   - Ethernet:
      - Marvell's Octeon NIC PCI Endpoint support (octeon_ep)
      - Sunplus SP7021 SoC (sp7021_emac)
      - Add support for Renesas RZ/V2M (in ravb)
      - Add support for MediaTek mt7986 switches (in mtk_eth_soc)

   - Ethernet PHYs:
      - ADIN1100 industrial PHYs (w/ 10BASE-T1L and SQI reporting)
      - TI DP83TD510 PHY
      - Microchip LAN8742/LAN88xx PHYs

   - WiFi:
      - Driver for pureLiFi X, XL, XC devices (plfxlc)
      - Driver for Silicon Labs devices (wfx)
      - Support for WCN6750 (in ath11k)
      - Support Realtek 8852ce devices (in rtw89)

   - Mobile:
      - MediaTek T700 modems (Intel 5G 5000 M.2 cards)

   - CAN:
      - ctucanfd: add support for CTU CAN FD open-source IP core from
        Czech Technical University in Prague

  Drivers
  -------

   - Delete a number of old drivers still using virt_to_bus().

   - Ethernet NICs:
      - intel: support TSO on tunnels MPLS
      - broadcom: support multi-buffer XDP
      - nfp: support VF rate limiting
      - sfc: use hardware tx timestamps for more than PTP
      - mlx5: multi-port eswitch support
      - hyper-v: add support for XDP_REDIRECT
      - atlantic: XDP support (including multi-buffer)
      - macb: improve real-time perf by deferring Tx processing to NAPI

   - High-speed Ethernet switches:
      - mlxsw: implement basic line card information querying
      - prestera: add support for traffic policing on ingress and egress

   - Embedded Ethernet switches:
      - lan966x: add support for packet DMA (FDMA)
      - lan966x: add support for PTP programmable pins
      - ti: cpsw_new: enable bc/mc storm prevention

   - Qualcomm 802.11ax WiFi (ath11k):
      - Wake-on-WLAN support for QCA6390 and WCN6855
      - device recovery (firmware restart) support
      - support setting Specific Absorption Rate (SAR) for WCN6855
      - read country code from SMBIOS for WCN6855/QCA6390
      - enable keep-alive during WoWLAN suspend
      - implement remain-on-channel support

   - MediaTek WiFi (mt76):
      - support Wireless Ethernet Dispatch offloading packet movement
        between the Ethernet switch and WiFi interfaces
      - non-standard VHT MCS10-11 support
      - mt7921 AP mode support
      - mt7921 IPv6 NS offload support

   - Ethernet PHYs:
      - micrel: ksz9031/ksz9131: cabletest support
      - lan87xx: SQI support for T1 PHYs
      - lan937x: add interrupt support for link detection"

* tag 'net-next-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1809 commits)
  ptp: ocp: Add firmware header checks
  ptp: ocp: fix PPS source selector debugfs reporting
  ptp: ocp: add .init function for sma_op vector
  ptp: ocp: vectorize the sma accessor functions
  ptp: ocp: constify selectors
  ptp: ocp: parameterize input/output sma selectors
  ptp: ocp: revise firmware display
  ptp: ocp: add Celestica timecard PCI ids
  ptp: ocp: Remove #ifdefs around PCI IDs
  ptp: ocp: 32-bit fixups for pci start address
  Revert "net/smc: fix listen processing for SMC-Rv2"
  ath6kl: Use cc-disable-warning to disable -Wdangling-pointer
  selftests/bpf: Dynptr tests
  bpf: Add dynptr data slices
  bpf: Add bpf_dynptr_read and bpf_dynptr_write
  bpf: Dynptr support for ring buffers
  bpf: Add bpf_dynptr_from_mem for local dynptrs
  bpf: Add verifier support for dynptrs
  bpf: Suppress 'passing zero to PTR_ERR' warning
  bpf: Introduce bpf_arch_text_invalidate for bpf_prog_pack
  ...
2022-05-25 12:22:58 -07:00
Linus Torvalds
fdaf9a5840 Page cache changes for 5.19
- Appoint myself page cache maintainer
 
  - Fix how scsicam uses the page cache
 
  - Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS
 
  - Remove the AOP flags entirely
 
  - Remove pagecache_write_begin() and pagecache_write_end()
 
  - Documentation updates
 
  - Convert several address_space operations to use folios:
    - is_dirty_writeback
    - readpage becomes read_folio
    - releasepage becomes release_folio
    - freepage becomes free_folio
 
  - Change filler_t to require a struct file pointer be the first argument
    like ->read_folio
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmKNMDUACgkQDpNsjXcp
 gj4/mwf/bpHhXH4ZoNIvtUpTF6rZbqeffmc0VrbxCZDZ6igRnRPglxZ9H9v6L53O
 7B0FBQIfxgNKHZpdqGdOkv8cjg/GMe/HJUbEy5wOakYPo4L9fZpHbDZ9HM2Eankj
 xBqLIBgBJ7doKr+Y62DAN19TVD8jfRfVtli5mqXJoNKf65J7BkxljoTH1L3EXD9d
 nhLAgyQjR67JQrT/39KMW+17GqLhGefLQ4YnAMONtB6TVwX/lZmigKpzVaCi4r26
 bnk5vaR/3PdjtNxIoYvxdc71y2Eg05n2jEq9Wcy1AaDv/5vbyZUlZ2aBSaIVbtKX
 WfrhN9O3L0bU5qS7p9PoyfLc9wpq8A==
 =djLv
 -----END PGP SIGNATURE-----

Merge tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache

Pull page cache updates from Matthew Wilcox:

 - Appoint myself page cache maintainer

 - Fix how scsicam uses the page cache

 - Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS

 - Remove the AOP flags entirely

 - Remove pagecache_write_begin() and pagecache_write_end()

 - Documentation updates

 - Convert several address_space operations to use folios:
     - is_dirty_writeback
     - readpage becomes read_folio
     - releasepage becomes release_folio
     - freepage becomes free_folio

 - Change filler_t to require a struct file pointer be the first
   argument like ->read_folio

* tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache: (107 commits)
  nilfs2: Fix some kernel-doc comments
  Appoint myself page cache maintainer
  fs: Remove aops->freepage
  secretmem: Convert to free_folio
  nfs: Convert to free_folio
  orangefs: Convert to free_folio
  fs: Add free_folio address space operation
  fs: Convert drop_buffers() to use a folio
  fs: Change try_to_free_buffers() to take a folio
  jbd2: Convert release_buffer_page() to use a folio
  jbd2: Convert jbd2_journal_try_to_free_buffers to take a folio
  reiserfs: Convert release_buffer_page() to use a folio
  fs: Remove last vestiges of releasepage
  ubifs: Convert to release_folio
  reiserfs: Convert to release_folio
  orangefs: Convert to release_folio
  ocfs2: Convert to release_folio
  nilfs2: Remove comment about releasepage
  nfs: Convert to release_folio
  jfs: Convert to release_folio
  ...
2022-05-24 19:55:07 -07:00
David Howells
adc9613ff6 afs: Adjust ACK interpretation to try and cope with NAT
If a client's address changes, say if it is NAT'd, this can disrupt an in
progress operation.  For most operations, this is not much of a problem,
but StoreData can be different as some servers modify the target file as
the data comes in, so if a store request is disrupted, the file can get
corrupted on the server.

The problem is that the server doesn't recognise packets that come after
the change of address as belonging to the original client and will bounce
them, either by sending an OUT_OF_SEQUENCE ACK to the apparent new call if
the packet number falls within the initial sequence number window of a call
or by sending an EXCEEDS_WINDOW ACK if it falls outside and then aborting
it.  In both cases, firstPacket will be 1 and previousPacket will be 0 in
the ACK information.

Fix this by the following means:

 (1) If a client call receives an EXCEEDS_WINDOW ACK with firstPacket as 1
     and previousPacket as 0, assume this indicates that the server saw the
     incoming packets from a different peer and thus as a different call.
     Fail the call with error -ENETRESET.

 (2) Also fail the call if a similar OUT_OF_SEQUENCE ACK occurs if the
     first packet has been hard-ACK'd.  If it hasn't been hard-ACK'd, the
     ACK packet will cause it to get retransmitted, so the call will just
     be repeated.

 (3) Make afs_select_fileserver() treat -ENETRESET as a straight fail of
     the operation.

 (4) Prioritise the error code over things like -ECONNRESET as the server
     did actually respond.

 (5) Make writeback treat -ENETRESET as a retryable error and make it
     redirty all the pages involved in a write so that the VM will retry.

Note that there is still a circumstance that I can't easily deal with: if
the operation is fully received and processed by the server, but the reply
is lost due to address change.  There's no way to know if the op happened.
We can examine the server, but a conflicting change could have been made by
a third party - and we can't tell the difference.  In such a case, a
message like:

    kAFS: vnode modified {100058:146266} b7->b8 YFS.StoreData64 (op=2646a)

will be logged to dmesg on the next op to touch the file and the client
will reset the inode state, including invalidating clean parts of the
pagecache.

Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2021-December/004811.html # v1
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22 21:03:02 +01:00
David Howells
de696c4784 rxrpc, afs: Fix selection of abort codes
The RX_USER_ABORT code should really only be used to indicate that the user
of the rxrpc service (ie. userspace) implicitly caused a call to be aborted
- for instance if the AF_RXRPC socket is closed whilst the call was in
progress.  (The user may also explicitly abort a call and specify the abort
code to use).

Change some of the points of generation to use other abort codes instead:

 (1) Abort the call with RXGEN_SS_UNMARSHAL or RXGEN_CC_UNMARSHAL if we see
     ENOMEM and EFAULT during received data delivery and abort with
     RX_CALL_DEAD in the default case.

 (2) Abort with RXGEN_SS_MARSHAL if we get ENOMEM whilst trying to send a
     reply.

 (3) Abort with RX_CALL_DEAD if we stop hearing from the peer if we had
     heard from the peer and abort with RX_CALL_TIMEOUT if we hadn't.

 (4) Abort with RX_CALL_DEAD if we try to disconnect a call that's not
     completed successfully or been aborted.

Reported-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22 21:03:02 +01:00