linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-10-04 16:15:11 +00:00

Author	SHA1	Message	Date
Gao Xiang	798eecaea0	erofs: don't warn MicroLZMA format anymore The LZMA algorithm support has been landed for more than one year since Linux 5.16. Besides, the new XZ Utils 5.4 has been available in most Linux distributions. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20231021020137.1646959-1-hsiangkao@linux.alibaba.com	2023-10-31 06:56:47 +08:00
Amir Goldstein	24e16e385f	ovl: add support for appending lowerdirs one by one Add new mount options lowerdir+ and datadir+ that can be used to add layers to lower layers stack one by one. Unlike the legacy lowerdir mount option, special characters (i.e. colons and commas) are not unescaped with these new mount options. The new mount options can be repeated to compose a large stack of lower layers, but they may not be mixed with the legacy lowerdir mount option, because for displaying lower layers in mountinfo, we do not want to mix escaped with unescaped lower layers path syntax. Similar to data-only layer rules with the lowerdir mount option, the datadir+ option must follow at least one lowerdir+ option and the lowerdir+ option must not follow the datadir+ option. If the legacy lowerdir mount option follows lowerdir+ and datadir+ mount options, it overrides them. Specifically, calling: fsconfig(FSCONFIG_SET_STRING, "lowerdir", "", 0); can be used to reset previously setup lower layers. Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Link: https://lore.kernel.org/r/CAJfpegt7VC94KkRtb1dfHG8+4OzwPBLYqhtc8=QFUxpFJE+=RQ@mail.gmail.com/ Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-10-31 00:13:02 +02:00
Amir Goldstein	819829f031	ovl: refactor layer parsing helpers In preparation for new mount options to add lowerdirs one by one, generalize ovl_parse_param_upperdir() into helper ovl_parse_layer() that will be used for parsing a single lower layer. Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Link: https://lore.kernel.org/r/CAJfpegt7VC94KkRtb1dfHG8+4OzwPBLYqhtc8=QFUxpFJE+=RQ@mail.gmail.com/ Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-10-31 00:13:02 +02:00
Amir Goldstein	0cea4c097d	ovl: store and show the user provided lowerdir mount option We are about to add new mount options for adding lowerdir one by one, but those mount options will not support escaping. For the existing case, where lowerdir mount option is provided as a colon separated list, store the user provided (possibly escaped) string and display it as is when showing the lowerdir mount option. Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-10-31 00:13:02 +02:00
Amir Goldstein	c835110b58	ovl: remove unused code in lowerdir param parsing Commit `beae836e9c` ("ovl: temporarily disable appending lowedirs") removed the ability to append lowerdirs with syntax lowerdir=":<path>". Remove leftover code and comments that are irrelevant with lowerdir append mode disabled. Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-10-31 00:13:02 +02:00
Alexander Larsson	bc8df7a3dc	ovl: Add an alternative type of whiteout An xattr whiteout (called "xwhiteout" in the code) is a regular file of zero size with the "overlay.whiteout" xattr set. A file like this in a directory with the "overlay.whiteouts" xattr set will be treated the same way as a regular whiteout. The "overlay.whiteouts" directory xattr is used in order to efficiently handle overlay checks in readdir(), as we only need to check xattrs in affected directories. The advantage of this kind of whiteout is that they can be escaped using the standard overlay xattr escaping mechanism. So, a file with an "overlay.overlay.whiteout" xattr would be unescaped to "overlay.whiteout", which could then be consumed by another overlayfs as a whiteout. Overlayfs itself doesn't create whiteouts like this, but a userspace mechanism could use this alternative mechanism to convert images that may contain whiteouts to be used with overlayfs. To work as a whiteout for both regular overlayfs mounts as well as userxattr mounts both the "user.overlay.whiteout" and the "trusted.overlay.whiteout" xattrs will need to be created. Signed-off-by: Alexander Larsson <alexl@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-10-31 00:12:59 +02:00
Alexander Larsson	dad02fad84	ovl: Support escaped overlay.* xattrs There are cases where you want to use an overlayfs mount as a lowerdir for another overlayfs mount. For example, if the system rootfs is on overlayfs due to composefs, or to make it volatile (via tmpfs), then you cannot currently store a lowerdir on the rootfs. This means you can't e.g. store on the rootfs a prepared container image for use with overlayfs. To work around this, we introduce an escaping mechanism for overlayfs xattrs. Whenever the lower/upper dir has an xattr named "overlay.overlay.XYZ", we list it as "overlay.XYZ" in listxattrs, and when the user calls getxattr or setxattr on "overlay.XYZ", we apply it to "overlay.overlay.XYZ" in the backing directories. This allows storing any kind of overlay xattrs in an overlayfs mount that can be used as a lowerdir in another mount. It is possible to stack this mechanism multiple times, such that "overlay.overlay.overlay.XYZ" will survive two levels of overlay mounts, however this is not all that useful in practice because of stack depth limitations of overlayfs mounts. Note: These escaped xattrs are copied to upper during copy-up. Signed-off-by: Alexander Larsson <alexl@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-10-31 00:12:59 +02:00
Alexander Larsson	d431e65260	ovl: Add OVL_XATTR_TRUSTED/USER_PREFIX_LEN macros These match the ones for e.g. XATTR_TRUSTED_PREFIX_LEN. Signed-off-by: Alexander Larsson <alexl@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-10-31 00:12:59 +02:00
Amir Goldstein	420a62dde6	ovl: Move xattr support to new xattrs.c file This moves the code from super.c and inode.c, and makes ovl_xattr_get/set() static. This is in preparation for doing more work on xattrs support. Signed-off-by: Alexander Larsson <alexl@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-10-31 00:12:59 +02:00
Amir Goldstein	5b02bfc1e7	ovl: do not encode lower fh with upper sb_writers held When lower fs is a nested overlayfs, calling encode_fh() on a lower directory dentry may trigger copy up and take sb_writers on the upper fs of the lower nested overlayfs. The lower nested overlayfs may have the same upper fs as this overlayfs, so nested sb_writers lock is illegal. Move all the callers that encode lower fh to before ovl_want_write(). Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-10-31 00:12:57 +02:00
Amir Goldstein	c63e56a4a6	ovl: do not open/llseek lower file with upper sb_writers held overlayfs file open (ovl_maybe_lookup_lowerdata) and overlay file llseek take the ovl_inode_lock, without holding upper sb_writers. In case of nested lower overlay that uses same upper fs as this overlay, lockdep will warn about (possibly false positive) circular lock dependency when doing open/llseek of lower ovl file during copy up with our upper sb_writers held, because the locking ordering seems reverse to the locking order in ovl_copy_up_start(): - lower ovl_inode_lock - upper sb_writers Let the copy up "transaction" keep an elevated mnt write count on upper mnt, but leave taking upper sb_writers to lower level helpers only when they actually need it. This avoids holding upper sb_writers during lower file open/llseek and prevents the lockdep warning. Minimizing the scope of upper sb_writers during copy up is also needed for fixing other possible deadlocks by a following patch. Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-10-31 00:12:57 +02:00
Amir Goldstein	162d064440	ovl: reorder ovl_want_write() after ovl_inode_lock() Make the locking order of ovl_inode_lock() strictly between the two vfs stacked layers, i.e.: - ovl vfs locks: sb_writers, inode_lock, ... - ovl_inode_lock - upper vfs locks: sb_writers, inode_lock, ... To that effect, move ovl_want_write() into the helpers ovl_nlink_start() and ovl_copy_up_start which currently take the ovl_inode_lock() after ovl_want_write(). Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-10-31 00:12:57 +02:00
Amir Goldstein	d08d3b3c2c	ovl: split ovl_want_write() into two helpers ovl_get_write_access() gets write access to upper mnt without taking freeze protection on upper sb and ovl_start_write() only takes freeze protection on upper sb. These helpers will be used to breakup the large ovl_want_write() scope during copy up into finer grained freeze protection scopes. Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-10-31 00:12:57 +02:00
Amir Goldstein	c002728f60	ovl: add helper ovl_file_modified() A simple wrapper for updating ovl inode size/mtime, to conform with ovl_file_accessed(). Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-10-31 00:12:55 +02:00
Amir Goldstein	f7621b11e8	ovl: protect copying of realinode attributes to ovl inode ovl_copyattr() may be called concurrently from aio completion context without any lock and that could lead to overlay inode attributes getting permanently out of sync with real inode attributes. Use ovl inode spinlock to protect ovl_copyattr(). Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-10-31 00:12:55 +02:00
Amir Goldstein	389a4a4a19	ovl: punt write aio completion to workqueue We want to protect concurrent updates of ovl inode size and mtime (i.e. ovl_copyattr()) from aio completion context. Punt write aio completion to a workqueue so that we can protect ovl_copyattr() with a spinlock. Export sb_init_dio_done_wq(), so that overlayfs can use its own dio workqueue to punt aio completions. Suggested-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/8620dfd3-372d-4ae0-aa3f-2fe97dda1bca@kernel.dk/ Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-10-31 00:12:54 +02:00
Amir Goldstein	5f034d3473	ovl: propagate IOCB_APPEND flag on writes to realfile If ovl file is opened O_APPEND, the underlying realfile is also opened O_APPEND, so it makes sense to propagate the IOCB_APPEND flags on sync writes to realfile, just as we do with aio writes. Effectively, because sync ovl writes are protected by inode lock, this change only makes a difference if the realfile is written to (size extending writes) from underneath overlayfs. The behavior in this case is undefined, so it is ok if we change the behavior (to fail the ovl IOCB_APPEND write). Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-10-31 00:12:54 +02:00
Amir Goldstein	db5b5e83ee	ovl: use simpler function to convert iocb to rw flags Overlayfs implements its own function to translate iocb flags into rw flags, so that they can be passed into another vfs call. With commit `ce71bfea20` ("fs: align IOCB_* flags with RWF_* flags") Jens created a 1:1 matching between the iocb flags and rw flags, simplifying the conversion. Signed-off-by: Alessio Balsini <balsini@android.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-10-31 00:12:54 +02:00
Linus Torvalds	9e87705289	Initial bcachefs pull request for 6.7-rc1 Here's the bcachefs filesystem pull request. One new patch since last week: the exportfs constants ended up conflicting with other filesystems that are also getting added to the global enum, so switched to new constants picked by Amir. I'll also be sending another pull request later on in the cycle bringing things up to date my master branch that people are currently running; that will be restricted to fs/bcachefs/, naturally. Testing - fstests as well as the bcachefs specific tests in ktest: https://evilpiepirate.org/~testdashboard/ci?branch=bcachefs-for-upstream It's also been soaking in linux-next, which resulted in a whole bunch of smatch complaints and fixes and a patch or two from Kees. The only new non fs/bcachefs/ patch is the objtool patch that adds bcachefs functions to the list of noreturns. The patch that exports osq_lock() has been dropped for now, per Ingo. Prereq patch list: `faf1dce852` objtool: Add bcachefs noreturns `73badee428` lib/generic-radix-tree.c: Add peek_prev() `9492261ff2` lib/generic-radix-tree.c: Don't overflow in peek() `0fb5d567f5` MAINTAINERS: Add entry for generic-radix-tree `b414e8ecd4` closures: Add a missing include `48b7935722` closures: closure_nr_remaining() `ced58fc7ab` closures: closure_wait_event() `bd0d22e41e` MAINTAINERS: Add entry for closures `8c8d2d9670` bcache: move closures to lib/ `957e48087d` locking: export contention tracepoints for bcachefs six locks `21db931445` lib: Export errname `83feeb1955` lib/string_helpers: string_get_size() now returns characters wrote `7d672f4094` stacktrace: Export stack_trace_save_tsk `771eb4fe8b` fs: factor out d_mark_tmpfile() `2b69987be5` sched: Add task_struct->faults_disabled_mapping -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmU/wyIACgkQE6szbY3K bnZc1xAAqjQBGXdtgtKQvk0/ru0WaMZguMsOHd3BUXIbm30F6eJqnoXQ/ahALofc Ju6NrOgcy9wmdPKWpbeF+aK3WnkAW9jShDd0QieVH6PkhyYyh5r11iR/EVtjjLu5 
6Teodn8fyTqn9WSDtKG15QreTCJrEasAoGFQKQDA8oiXC7zc+RSpLUkkTWD/pxyW zVqkGGiAUG4x6FON+X2a3QBa9WCahIgV6XzHstGLsmOECxKO/LopGR5jThuIhv9t Yo0wodQTKAgb9QviG6V3f2dJLQKKUVDmVEGTXv+8Hl3d8CiYBJeIh+icp+VESBo1 m8ev0y2xbTPLwgm5v0Uj4o/G8ISZ+qmcexV2zQ9xUWUAd2AjEBzhCh9BrNXM5qSg o7mphH+Pt6bJXgzxb2RkYJixU11yG3yuHPOCrRGGFpVHiNYhdHuJeDZOqChWZB8x 6kY0uvU0X0tqVfWKxMwTwuqG8mJ5BkJNvnEvYi05QEZG0dDcUhgOqYlNNaL8vGkl qVixOwE4aH4kscdmW2gXY1c76VSebheyN8n6Wj1zrmTw4hTJH7ZWXPtmbRqQzpB6 U6w3NjVyopbIjuF+syWeGqitTT/8fpvgZU4E9MpKGmHX4ADgecp6YSZQzzxTJn7D cbVX7YQxhmsM50C1PW7A8yLCspD/uRNiKLvzb/g9gFSInk4rV+U= =g+ia -----END PGP SIGNATURE----- Merge tag 'bcachefs-2023-10-30' of https://evilpiepirate.org/git/bcachefs Pull initial bcachefs updates from Kent Overstreet: "Here's the bcachefs filesystem pull request. One new patch since last week: the exportfs constants ended up conflicting with other filesystems that are also getting added to the global enum, so switched to new constants picked by Amir. The only new non fs/bcachefs/ patch is the objtool patch that adds bcachefs functions to the list of noreturns. 
The patch that exports osq_lock() has been dropped for now, per Ingo" * tag 'bcachefs-2023-10-30' of https://evilpiepirate.org/git/bcachefs: (2781 commits) exportfs: Change bcachefs fid_type enum to avoid conflicts bcachefs: Refactor memcpy into direct assignment bcachefs: Fix drop_alloc_keys() bcachefs: snapshot_create_lock bcachefs: Fix snapshot skiplists during snapshot deletion bcachefs: bch2_sb_field_get() refactoring bcachefs: KEY_TYPE_error now counts towards i_sectors bcachefs: Fix handling of unknown bkey types bcachefs: Switch to unsafe_memcpy() in a few places bcachefs: Use struct_size() bcachefs: Correctly initialize new buckets on device resize bcachefs: Fix another smatch complaint bcachefs: Use strsep() in split_devs() bcachefs: Add iops fields to bch_member bcachefs: Rename bch_sb_field_members -> bch_sb_field_members_v1 bcachefs: New superblock section members_v2 bcachefs: Add new helper to retrieve bch_member from sb bcachefs: bucket_lock() is now a sleepable lock bcachefs: fix crc32c checksum merge byte order problem bcachefs: Fix bch2_inode_delete_keys() ...	2023-10-30 11:09:38 -10:00
Linus Torvalds	d5acbc60fa	for-6.7-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmU/xAEACgkQxWXV+ddt WDvYKg//SjTimA5Nins9mb4jdz8n+dDeZnQhKzy3FqInU41EzDRc4WwnEODmDlTa AyU9rGB3k0JNSUc075jZFCyLqq/ARiOqRi4x33Gk0ckIlc4X5OgBoqP2XkPh0VlP txskLCrmhc3pwyR4ErlFDX2jebIUXfkv39bJuE40grGvUatRe+WNq0ERIrgO8RAr Rc3hBotMH8AIqfD1L6j1ZiZIAyrOkT1BJMuqeoq27/gJZn/MRhM9TCrMTzfWGaoW SxPrQiCDEN3KECsOY/caroMn3AekDijg/ley1Nf7Z0N6oEV+n4VWWPBFE9HhRz83 9fIdvSbGjSJF6ekzTjcVXPAbcuKZFzeqOdBRMIW3TIUo7mZQyJTVkMsc1y/NL2Z3 9DhlRLIzvWJJjt1CEK0u18n5IU+dGngdktbhWWIuIlo8r+G/iKR/7zqU92VfWLHL Z7/eh6HgH5zr2bm+yKORbrUjkv4IVhGVarW8D4aM+MCG0lFN2GaPcJCCUrp4n7rZ PzpQbxXa38ANBk6hsp4ndS8TJSBL9moY8tumzLcKg97nzNMV6KpBdV/G6/QfRLCN 3kM6UbwTAkMwGcQS86Mqx6s04ORLnQeD6f7N6X4Ppx0Mi/zkjI2HkRuvQGp12B0v iZjCCZAYY2Iu+/TU0GrCXSss/grzIAUPzM9msyV3XGO/VBpwdec= =9TVx -----END PGP SIGNATURE----- Merge tag 'for-6.7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "New features: - raid-stripe-tree New tree for logical file extent mapping where the physical mapping may not match on multiple devices. This is now used in zoned mode to implement RAID0/RAID1* profiles, but can be used in non-zoned mode as well. The support for RAID56 is in development and will eventually fix the problems with the current implementation. This is a backward incompatible feature and has to be enabled at mkfs time. - simple quota accounting (squota) A simplified mode of qgroup that accounts all space on the initial extent owners (a subvolume), the snapshots are then cheap to create and delete. The deletion of snapshots in fully accounting qgroups is a known CPU/IO performance bottleneck. The squota is not suitable for the general use case but works well for containers where the original subvolume exists for the whole time. This is a backward incompatible feature as it needs extending some structures, but can be enabled on an existing filesystem. 
- temporary filesystem fsid (temp_fsid) The fsid identifies a filesystem and is hard coded in the structures, which disallows mounting the same fsid found on different devices. For a single device filesystem this is not strictly necessary, a new temporary fsid can be generated on mount e.g. after a device is cloned. This will be used by Steam Deck for root partition A/B testing, or can be used for VM root images. Other user visible changes: - filesystems with partially finished metadata_uuid conversion cannot be mounted anymore and the uuid fixup has to be done by btrfs-progs (btrfstune). Performance improvements: - reduce reservations for checksum deletions (with enabled free space tree by factor of 4), on a sample workload on file with many extents the deletion time decreased by 12% - make extent state merges more efficient during insertions, reduce rb-tree iterations (run time of critical functions reduced by 5%) Core changes: - the integrity check functionality has been removed, this was a debugging feature and removal does not affect other integrity checks like checksums or tree-checker - space reservation changes: - more efficient delayed ref reservations, this avoids building up too much work or overusing or exhausting the global block reserve in some situations - move delayed refs reservation to the transaction start time, this prevents some ENOSPC corner cases related to exhaustion of global reserve - improvements in reducing excessive reservations for block group items - adjust overcommit logic in near full situations, account for one more chunk to eventually allocate metadata chunk, this is mostly relevant for small filesystems (<10GiB) - single device filesystems are scanned but not registered (except seed devices), this allows temp_fsid to work - qgroup iterations do not need GFP_ATOMIC allocations anymore - cleanups, refactoring, reduced data structure size, function parameter simplifications, error handling fixes" * tag 'for-6.7-tag' of 
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (156 commits) btrfs: open code timespec64 in struct btrfs_inode btrfs: remove redundant log root tree index assignment during log sync btrfs: remove redundant initialization of variable dirty in btrfs_update_time() btrfs: sysfs: show temp_fsid feature btrfs: disable the device add feature for temp-fsid btrfs: disable the seed feature for temp-fsid btrfs: update comment for temp-fsid, fsid, and metadata_uuid btrfs: remove pointless empty log context list check when syncing log btrfs: update comment for struct btrfs_inode::lock btrfs: remove pointless barrier from btrfs_sync_file() btrfs: add and use helpers for reading and writing last_trans_committed btrfs: add and use helpers for reading and writing fs_info->generation btrfs: add and use helpers for reading and writing log_transid btrfs: add and use helpers for reading and writing last_log_commit btrfs: support cloned-device mount capability btrfs: add helper function find_fsid_by_disk btrfs: stop reserving excessive space for block group item insertions btrfs: stop reserving excessive space for block group item updates btrfs: reorder btrfs_inode to fill gaps btrfs: open code btrfs_ordered_inode_tree in btrfs_inode ...	2023-10-30 10:42:06 -10:00
Linus Torvalds	8829687a4a	fscrypt updates for 6.7 This update adds support for configuring the crypto data unit size (i.e. the granularity of file contents encryption) to be less than the filesystem block size. This can allow users to use inline encryption hardware in some cases when it wouldn't otherwise be possible. In addition, there are two commits that are prerequisites for the extent-based encryption support that the btrfs folks are working on. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCZT8acBQcZWJpZ2dlcnNA Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK+czAQDkStgX1ICJQANnxwbrg/SUVdZjPuFH sJw3sUVpBR81TwEA/SyWh3YzVNZdpE7PWNrCknrC+qnO8hd9QBEjnQfwIQc= =t44a -----END PGP SIGNATURE----- Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux Pull fscrypt updates from Eric Biggers: "This update adds support for configuring the crypto data unit size (i.e. the granularity of file contents encryption) to be less than the filesystem block size. This can allow users to use inline encryption hardware in some cases when it wouldn't otherwise be possible. In addition, there are two commits that are prerequisites for the extent-based encryption support that the btrfs folks are working on" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: fscrypt: track master key presence separately from secret fscrypt: rename fscrypt_info => fscrypt_inode_info fscrypt: support crypto data unit size less than filesystem block size fscrypt: replace get_ino_and_lblk_bits with just has_32bit_inodes fscrypt: compute max_lblk_bits from s_maxbytes and block size fscrypt: make the bounce page pool opt-in instead of opt-out fscrypt: make it clearer that key_prefix is deprecated	2023-10-30 10:23:42 -10:00
Linus Torvalds	8b16da681e	NFSD 6.7 Release Notes This release completes the SunRPC thread scheduler work that was begun in v6.6. The scheduler can now find an svc thread to wake in constant time and without a list walk. Thanks again to Neil Brown for this overhaul. Lorenzo Bianconi contributed infrastructure for a netlink-based NFSD control plane. The long-term plan is to provide the same functionality as found in /proc/fs/nfsd, plus some interesting additions, and then migrate the NFSD user space utilities to netlink. A long series to overhaul NFSD's NFSv4 operation encoding was applied in this release. The goals are to bring this family of encoding functions in line with the matching NFSv4 decoding functions and with the NFSv2 and NFSv3 XDR functions, preparing the way for better memory safety and maintainability. A further improvement to NFSD's write delegation support was contributed by Dai Ngo. This adds a CB_GETATTR callback, enabling the server to retrieve cached size and mtime data from clients holding write delegations. If the server can retrieve this information, it does not have to recall the delegation in some cases. The usual panoply of bug fixes and minor improvements round out this release. As always I am grateful to all contributors, reviewers, and testers. 
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmU5IuoACgkQM2qzM29m f5eVsg//bVp8S93ci/oDlKfzOwH2fO5e5rna91wrDpJxkd51h6KTx55dSRG5sjAZ EywIVOann6xCtsixAPyff5Cweg2dWvzQRsy1ZnvWQ1qZBzD5KAJY5LPkeSFUCKBo Zani/qTOYbxzgFMjZx+yDSXDPKG68WYZBQK59SI7mURu4SYdk8aRyNY8mjHfr0Vh Aqrcny4oVtXV4sL5P5G/2FUW7WKT3olA3jSYlRRNMhbs2qpEemRCCrspOEMMad+b t1+ZCg+U27PMranvOJnof4RU7peZbaxDWA0gyiUbivVXVtZn9uOs0ffhktkvechL ePc33dqdp2ITdKIPA6JlaRv5WflKXQw0YYM9Kv5mcR4A2el7owL4f/pMlPhtbYwJ IOJv15KdKVN979G2e6WMYiKK+iHfaUUguhMEXnfnGoAajHOZNQiUEo3iFQAD7LDc DvMF8d9QqYmB9IW8FOYaRRfZGJOQHf3TL79Nd08z/bn5swvlvfj77leux9Sb+0/m Luk2Xvz2AJVSXE31wzabaGHkizN+BtH+e4MMbXUHBPW5jE9v7XOnEUFr4UdZyr9P Gl87A7NcrzNjJWT5TrnzM4sOslNsx46Aeg+VuNt2fSRn2dm6iBu2B8s0N4imx6dV PX1y9VSLq5WRhjrFZ1qeiZdsuTaQtrEiNDoRIQR6nCJPAV80iFk= =B4wJ -----END PGP SIGNATURE----- Merge tag 'nfsd-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd updates from Chuck Lever: "This release completes the SunRPC thread scheduler work that was begun in v6.6. The scheduler can now find an svc thread to wake in constant time and without a list walk. Thanks again to Neil Brown for this overhaul. Lorenzo Bianconi contributed infrastructure for a netlink-based NFSD control plane. The long-term plan is to provide the same functionality as found in /proc/fs/nfsd, plus some interesting additions, and then migrate the NFSD user space utilities to netlink. A long series to overhaul NFSD's NFSv4 operation encoding was applied in this release. The goals are to bring this family of encoding functions in line with the matching NFSv4 decoding functions and with the NFSv2 and NFSv3 XDR functions, preparing the way for better memory safety and maintainability. A further improvement to NFSD's write delegation support was contributed by Dai Ngo. This adds a CB_GETATTR callback, enabling the server to retrieve cached size and mtime data from clients holding write delegations. 
If the server can retrieve this information, it does not have to recall the delegation in some cases. The usual panoply of bug fixes and minor improvements round out this release. As always I am grateful to all contributors, reviewers, and testers" * tag 'nfsd-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (127 commits) svcrdma: Fix tracepoint printk format svcrdma: Drop connection after an RDMA Read error NFSD: clean up alloc_init_deleg() NFSD: Fix frame size warning in svc_export_parse() NFSD: Rewrite synopsis of nfsd_percpu_counters_init() nfsd: Clean up errors in nfs3proc.c nfsd: Clean up errors in nfs4state.c NFSD: Clean up errors in stats.c NFSD: simplify error paths in nfsd_svc() NFSD: Clean up nfsd4_encode_seek() NFSD: Clean up nfsd4_encode_offset_status() NFSD: Clean up nfsd4_encode_copy_notify() NFSD: Clean up nfsd4_encode_copy() NFSD: Clean up nfsd4_encode_test_stateid() NFSD: Clean up nfsd4_encode_exchange_id() NFSD: Clean up nfsd4_do_encode_secinfo() NFSD: Clean up nfsd4_encode_access() NFSD: Clean up nfsd4_encode_readdir() NFSD: Clean up nfsd4_encode_entry4() NFSD: Add an nfsd4_encode_nfs_cookie4() helper ...	2023-10-30 10:12:29 -10:00
Linus Torvalds	14ab6d425e	vfs-6.7.ctime -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZTppYgAKCRCRxhvAZXjc okIHAP9anLz1QDyMLH12ASuHjgBc0Of3jcB6NB97IWGpL4O21gEA46ohaD+vcJuC YkBLU3lXqQ87nfu28ExFAzh10hG2jwM= =m4pB -----END PGP SIGNATURE----- Merge tag 'vfs-6.7.ctime' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs inode time accessor updates from Christian Brauner: "This finishes the conversion of all inode time fields to accessor functions as discussed on list. Changing timestamps manually as we used to do before is error prone. Using accessor functions makes this robust. It does not contain the switch of the time fields to discrete 64 bit integers to replace struct timespec and free up space in struct inode. But after this, the switch can be trivially made and the patch should only affect the vfs if we decide to do it" * tag 'vfs-6.7.ctime' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (86 commits) fs: rename inode i_atime and i_mtime fields security: convert to new timestamp accessors selinux: convert to new timestamp accessors apparmor: convert to new timestamp accessors sunrpc: convert to new timestamp accessors mm: convert to new timestamp accessors bpf: convert to new timestamp accessors ipc: convert to new timestamp accessors linux: convert to new timestamp accessors zonefs: convert to new timestamp accessors xfs: convert to new timestamp accessors vboxsf: convert to new timestamp accessors ufs: convert to new timestamp accessors udf: convert to new timestamp accessors ubifs: convert to new timestamp accessors tracefs: convert to new timestamp accessors sysv: convert to new timestamp accessors squashfs: convert to new timestamp accessors server: convert to new timestamp accessors client: convert to new timestamp accessors ...	2023-10-30 09:47:13 -10:00
Linus Torvalds	7352a6765c	vfs-6.7.xattr -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZTppWAAKCRCRxhvAZXjc okB2AP4jjoRErJBwj245OIDJqzoj4m4UVOVd0MH2AkiSpANczwD/TToChdpusY2y qAYg1fQoGMbDVlb7Txaj9qI9ieCf9w0= =2PXg -----END PGP SIGNATURE----- Merge tag 'vfs-6.7.xattr' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs xattr updates from Christian Brauner: "The 's_xattr' field of 'struct super_block' currently requires a mutable table of 'struct xattr_handler' entries (although each handler itself is const). However, no code in vfs actually modifies the tables. This changes the type of 's_xattr' to allow const tables, and modifies existing file systems to move their tables to .rodata. This is desirable because these tables contain entries with function pointers in them; moving them to .rodata makes it considerably less likely to be modified accidentally or maliciously at runtime" * tag 'vfs-6.7.xattr' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (30 commits) const_structs.checkpatch: add xattr_handler net: move sockfs_xattr_handlers to .rodata shmem: move shmem_xattr_handlers to .rodata overlayfs: move xattr tables to .rodata xfs: move xfs_xattr_handlers to .rodata ubifs: move ubifs_xattr_handlers to .rodata squashfs: move squashfs_xattr_handlers to .rodata smb: move cifs_xattr_handlers to .rodata reiserfs: move reiserfs_xattr_handlers to .rodata orangefs: move orangefs_xattr_handlers to .rodata ocfs2: move ocfs2_xattr_handlers and ocfs2_xattr_handler_map to .rodata ntfs3: move ntfs_xattr_handlers to .rodata nfs: move nfs4_xattr_handlers to .rodata kernfs: move kernfs_xattr_handlers to .rodata jfs: move jfs_xattr_handlers to .rodata jffs2: move jffs2_xattr_handlers to .rodata hfsplus: move hfsplus_xattr_handlers to .rodata hfs: move hfs_xattr_handlers to .rodata gfs2: move gfs2_xattr_handlers_max to .rodata fuse: move fuse_xattr_handlers to .rodata ...	2023-10-30 09:29:44 -10:00
Linus Torvalds	3b3f874cc1	vfs-6.7.misc -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZTpoQAAKCRCRxhvAZXjc ovFNAQDgIRjXfZ1Ku+USxsRRdqp8geJVaNc3PuMmYhOYhUenqgEAmC1m+p0y31dS P6+HlL16Mqgu0tpLCcJK9BibpDZ0Ew4= =7yD1 -----END PGP SIGNATURE----- Merge tag 'vfs-6.7.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual miscellaneous features, cleanups, and fixes for vfs and individual fses. Features: - Rename and export helpers that get write access to a mount. They are used in overlayfs to get write access to the upper mount. - Print the pretty name of the root device on boot failure. This helps in scenarios where we would usually only print "unknown-block(1,2)". - Add an internal SB_I_NOUMASK flag. This is another part in the endless POSIX ACL saga in a way. When POSIX ACLs are enabled via SB_POSIXACL the vfs cannot strip the umask because if the relevant inode has POSIX ACLs set it might take the umask from there. But if the inode doesn't have any POSIX ACLs set then we apply the umask in the filesystem itself. So we end up with: (1) no SB_POSIXACL -> strip umask in vfs (2) SB_POSIXACL -> strip umask in filesystem The umask semantics associated with SB_POSIXACL allowed filesystems that don't even support POSIX ACLs at all to raise SB_POSIXACL purely to avoid umask stripping. That specifically means NFS v4 and Overlayfs. NFS v4 does it because it delegates this to the server and Overlayfs because it needs to delegate umask stripping to the upper filesystem, i.e., the filesystem used as the writable layer. This went so far that SB_POSIXACL is raised even on kernels that don't even have POSIX ACL support at all. Stop this blatant abuse and add SB_I_NOUMASK which is an internal superblock flag that filesystems can raise to opt out of umask handling. That should really only be the two mentioned above. It's not that we want any filesystems to do this. 
Ideally we have all umask handling always in the vfs. - Make overlayfs use SB_I_NOUMASK too. - Now that we have SB_I_NOUMASK, stop checking for SB_POSIXACL in IS_POSIXACL() if the kernel doesn't have support for it. This is a very old patch but it's only possible to do this now with the wider cleanup that was done. - Follow-up work on fake path handling from last cycle. Citing mostly from Amir: When overlayfs was first merged, overlayfs files of regular files and directories, the ones that are installed in file table, had a "fake" path, namely, f_path is the overlayfs path and f_inode is the "real" inode on the underlying filesystem. In v6.5, we took another small step by introducing the backing_file container and the file_real_path() helper. This change allowed vfs and filesystem code to get the "real" path of an overlayfs backing file. With this change, we were able to make fsnotify work correctly and report events on the "real" filesystem objects that were accessed via overlayfs. This method works fine, but it still leaves the vfs vulnerable to new code that is not aware of files with fake path. A recent example is commit `db1d1e8b98` ("IMA: use vfs_getattr_nosec to get the i_version"). This commit uses direct referencing to f_path in IMA code that otherwise uses file_inode() and file_dentry() to reference the filesystem objects that it is measuring. This contains work to switch things around: instead of having filesystem code opt-in to get the "real" path, have generic code opt-in for the "fake" path in the few places that it is needed. It is far more likely that new filesystem code that does not use the file_dentry() and file_real_path() helpers will end up causing crashes or averting LSM/audit rules if we keep the "fake" path exposed by default. This change already makes file_dentry() moot, but for now we did not change this helper, just added a WARN_ON() in ovl_d_real() to catch if we have made any wrong assumptions. 
After the dust settles on this change, we can make file_dentry() a plain accessor and we can drop the inode argument to ->d_real(). - Switch struct file to SLAB_TYPESAFE_BY_RCU. This looks like a small change but it really isn't and I would like to see everyone on their tippie toes for any possible bugs from this work. Essentially we've been doing most of what SLAB_TYPESAFE_BY_RCU does for files for a very long time because of the nasty interactions between the SCM_RIGHTS file descriptor garbage collection. So extending it makes a lot of sense but it is a subtle change. There are almost no places that fiddle with file rcu semantics directly and the ones that did mess around with struct file internal under rcu have been made to stop doing that because it really was always dodgy. I forgot to put in the link tag for this change and the discussion in the commit so adding it into the merge message: https://lore.kernel.org/r/20230926162228.68666-1-mjguzik@gmail.com Cleanups: - Various smaller pipe cleanups including the removal of a spin lock that was only used to protect against writes without pipe_lock() from O_NOTIFICATION_PIPE aka watch queues. As that was never implemented, remove the additional locking from pipe_write(). - Annotate struct watch_filter with the new __counted_by attribute. - Clarify do_unlinkat() cleanup so that it doesn't look like an extra iput() is done that would cause issues. - Simplify file cleanup when the file has never been opened. - Use module helper instead of open-coding it. - Predict error unlikely for stale retry. - Use WRITE_ONCE() for mount expiry field instead of just commenting that one hopes the compiler doesn't get smart. Fixes: - Fix readahead on block devices. - Fix writeback when lazytime is enabled and inodes whose timestamp is the only thing that changed reside on wb->b_dirty_time. This caused excessively large zombie memory cgroup when lazytime was enabled as such inodes weren't handled fast enough. 
- Convert BUG_ON() to WARN_ON_ONCE() in open_last_lookups()" * tag 'vfs-6.7.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (26 commits) file, i915: fix file reference for mmap_singleton() vfs: Convert BUG_ON to WARN_ON_ONCE in open_last_lookups writeback, cgroup: switch inodes with dirty timestamps to release dying cgwbs chardev: Simplify usage of try_module_get() ovl: rely on SB_I_NOUMASK fs: fix umask on NFS with CONFIG_FS_POSIX_ACL=n fs: store real path instead of fake path in backing file f_path fs: create helper file_user_path() for user displayed mapped file path fs: get mnt_writers count for an open backing file's real path vfs: stop counting on gcc not messing with mnt_expiry_mark if not asked vfs: predict the error in retry_estale as unlikely backing file: free directly vfs: fix readahead(2) on block devices io_uring: use files_lookup_fd_locked() file: convert to SLAB_TYPESAFE_BY_RCU vfs: shave work on failed file open fs: simplify misleading code to remove ambiguity regarding ihold()/iput() watch_queue: Annotate struct watch_filter with __counted_by fs/pipe: use spinlock in pipe_read() only if there is a watch_queue fs/pipe: remove unnecessary spinlock from pipe_write() ...	2023-10-30 09:14:19 -10:00
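The umask rule described in the vfs-6.7.misc pull message above can be modelled in a few lines. This is a userspace sketch in Python, not kernel code: `vfs_strip_umask` and its boolean parameters are hypothetical stand-ins for the SB_POSIXACL and SB_I_NOUMASK superblock flags, and the sketch only captures the decision the message describes, namely that the vfs strips the umask when neither flag is set.

```python
def vfs_strip_umask(mode: int, umask: int,
                    sb_posixacl: bool, sb_i_noumask: bool) -> int:
    """Model of the vfs-level umask decision (hypothetical helper).

    - neither flag set -> the vfs strips the umask itself
    - SB_POSIXACL set  -> stripping is left to the filesystem
    - SB_I_NOUMASK set -> the filesystem opted out of vfs umask handling
    """
    if not sb_posixacl and not sb_i_noumask:
        return mode & ~umask
    return mode


if __name__ == "__main__":
    # Requested mode 0666 with umask 022 under the three flag combinations.
    for posixacl, noumask in [(False, False), (True, False), (False, True)]:
        result = vfs_strip_umask(0o666, 0o022, posixacl, noumask)
        print(f"posixacl={posixacl} noumask={noumask} -> {oct(result)}")
```

In this model, NFS v4 and overlayfs would correspond to the `sb_i_noumask=True` case rather than borrowing `sb_posixacl=True` to get the same effect.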
Linus Torvalds	0d63d8b229	vfs-6.7.autofs -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZTppJgAKCRCRxhvAZXjc omYoAQD+g3BxYbxEdEuhrnbaZMljp2GEYn+L6I2txdvmp/TpSQEAsQipcEgMC1WI uc9IDiakYWWCSaN8F7BGR7zKsK5feAc= =V+jp -----END PGP SIGNATURE----- Merge tag 'vfs-6.7.autofs' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull autofs mount api updates from Christian Brauner: "This ports autofs to the new mount api. The patchset has existed for quite a while but never made it upstream. Ian picked it back up. This also fixes a bug where fs_param_is_fd() was passed a garbage param->dirfd but it expected it to be set to the fd that was used to set param->file otherwise result->uint_32 contains nonsense. So make sure it's set. One less filesystem using the old mount api. We're getting there, albeit rather slow. The last remaining major filesystem that hasn't converted is btrfs. Patches exist - I even wrote them - but so far they haven't made it upstream" * tag 'vfs-6.7.autofs' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: autofs: fix add autofs_parse_fd() fsconfig: ensure that dirfd is set to aux autofs: fix protocol sub version setting autofs: convert autofs to use the new mount api autofs: validate protocol version autofs: refactor parse_options() autofs: reformat 0pt enum declaration autofs: refactor super block info init autofs: add autofs_parse_fd() autofs: refactor autofs_prepare_pipe()	2023-10-30 09:10:21 -10:00
Linus Torvalds	d4e175f2c4	vfs-6.7.super -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZT0C2gAKCRCRxhvAZXjc otV8AQCK5F9ONoQ7ISpdrKyUJiswySGXx0CYPfXbSg5gHH87zgEAua3vwVKeGXXF 5iVsdiNzIIQDwGDx7FyxufL4ggcN6gQ= =E1kV -----END PGP SIGNATURE----- Merge tag 'vfs-6.7.super' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs superblock updates from Christian Brauner: "This contains the work to make block device opening functions return a struct bdev_handle instead of just a struct block_device. The same struct bdev_handle is then also passed to block device closing functions. This allows us to propagate context from opening to closing a block device without having to modify all users every time. Sidenote, in the future we might even want to try and have block device opening functions return a struct file directly but that's a series on top of this. These are further preparatory changes to be able to count writable opens and blocking writes to mounted block devices. That's a separate piece of work for next cycle and for that we absolutely need the changes to btrfs that have been quietly dropped somehow. Originally the series contained a patch that removed the old blkdev_() helpers. But since this would've caused needless churn in -next for bcachefs we ended up delaying it. The second piece of work addresses one of the major annoyances about the work last cycle, namely that we required dropping s_umount whenever we used the superblock and fs_holder_ops for a block device. The reason for that requirement had been that in some codepaths s_umount could've been taken under disk->open_mutex (that's always been the case, at least theoretically). For example, on surprise block device removal or media change. And opening and closing block devices required grabbing disk->open_mutex as well. So we did the work and went through the block layer and fixed all those places so that s_umount is never taken under disk->open_mutex. 
This means no more brittle games where we yield and reacquire s_umount during block device opening and closing and no more requirements where block devices need to be closed. Filesystems don't need to care about this. There's a bunch of other follow-up work such as moving block device freezing and thawing to holder operations which makes it work for all block devices and not just the main block device just as we did for surprise removal. But that is for next cycle. Tested with fstests for all major fses, blktests, LTP" * tag 'vfs-6.7.super' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (37 commits) porting: update locking requirements fs: assert that open_mutex isn't held over holder ops block: assert that we're not holding open_mutex over blk_report_disk_dead block: move bdev_mark_dead out of disk_check_media_change block: WARN_ON_ONCE() when we remove active partitions block: simplify bdev_del_partition() fs: Avoid grabbing sb->s_umount under bdev->bd_holder_lock jfs: fix log->bdev_handle null ptr deref in lbmStartIO bcache: Fixup error handling in register_cache() xfs: Convert to bdev_open_by_path() reiserfs: Convert to bdev_open_by_dev/path() ocfs2: Convert to use bdev_open_by_dev() nfs/blocklayout: Convert to use bdev_open_by_dev/path() jfs: Convert to bdev_open_by_dev() f2fs: Convert to bdev_open_by_dev/path() ext4: Convert to bdev_open_by_dev() erofs: Convert to use bdev_open_by_path() btrfs: Convert to bdev_open_by_path() fs: Convert to bdev_open_by_dev() mm/swap: Convert to use bdev_open_by_dev() ...	2023-10-30 08:59:05 -10:00
Steve French	7588b83066	Add definition for new smb3.1.1 command type Add structs and defines for new SMB3.1.1 command, server to client notification. See MS-SMB2 section 2.2.44 Signed-off-by: Steve French <stfrench@microsoft.com>	2023-10-30 09:57:03 -05:00
Steve French	d5a3c153fd	SMB3: clarify some of the unused CreateOption flags Update comments to show flags which should not be set (zero). See MS-SMB2 section 2.2.13 Signed-off-by: Steve French <stfrench@microsoft.com>	2023-10-30 09:57:03 -05:00
Zhihao Cheng	7569049359	ubifs: ubifs_link: Fix wrong name len calculating when UBIFS is encrypted The length of the dentry name is calculated after the raw name is encrypted, except in ubifs_link(), which can make the size of the directory underflow. Here is a reproducer: touch $TMP/file; mkdir $TMP/dir; stat $TMP/dir; for i in $(seq 1 8); do ln $TMP/file $TMP/dir/$i; unlink $TMP/dir/$i; done; stat $TMP/dir The size of dir will underflow (-96). Fix it by calculating the dentry name's length after the name is encrypted. Fixes: `f4f61d2cc6` ("ubifs: Implement encrypted filenames") Reported-by: Roland Ruckerbauer <roland.ruckerbauer@robart.cc> Link: https://lore.kernel.org/linux-mtd/1638777819.2925845.1695222544742.JavaMail.zimbra@robart.cc/T/#u Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2023-10-28 23:19:08 +02:00
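The arithmetic behind the -96 in the ubifs_link() commit above can be sketched as follows. This is a Python model, not UBIFS code. The node sizes (160 and 56 bytes) and the 8-byte alignment are assumptions recalled from the UBIFS on-media format, and the 32-byte encrypted name length is an assumption chosen because it reproduces the reported value; all helper names are hypothetical.

```python
# Assumed constants (not taken from the commit message; verify against
# fs/ubifs/ubifs-media.h before relying on them).
UBIFS_INO_NODE_SZ = 160   # assumed size of an empty directory
UBIFS_DENT_NODE_SZ = 56   # assumed size of a directory entry node header


def calc_dent_size(name_len: int) -> int:
    """Entry size: header + name + NUL terminator, rounded up to 8 bytes."""
    return (UBIFS_DENT_NODE_SZ + name_len + 1 + 7) & ~7


def dir_size_after(iterations: int, plain_len: int,
                   encrypted_len: int, fixed: bool) -> int:
    """Directory size after repeated link + unlink of one name.

    Before the fix, link accounts for the plaintext name length while
    unlink accounts for the encrypted one; after the fix both use the
    encrypted length.
    """
    size = UBIFS_INO_NODE_SZ
    link_len = encrypted_len if fixed else plain_len
    for _ in range(iterations):
        size += calc_dent_size(link_len)        # link adds the entry
        size -= calc_dent_size(encrypted_len)   # unlink removes it
    return size


if __name__ == "__main__":
    print("before fix:", dir_size_after(8, 1, 32, fixed=False))
    print("after fix: ", dir_size_after(8, 1, 32, fixed=True))
```

Under these assumptions each iteration adds 64 bytes and removes 96, so eight iterations take the size from 160 to -96, while the fixed accounting leaves it at 160.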
Konstantin Meskhidze	d81efd6610	ubifs: fix possible dereference after free 'old_idx' could be dereferenced after free via 'rb_link_node' function call. Fixes: `b5fda08ef2` ("ubifs: Fix memleak when insert_old_idx() failed") Co-developed-by: Ivanov Mikhail <ivanov.mikhail1@huawei-partners.com> Signed-off-by: Konstantin Meskhidze <konstantin.meskhidze@huawei.com> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2023-10-28 23:18:12 +02:00
Ferry Meng	60f2f4a81d	ubifs: Fix missing error code err Fix smatch warning: fs/ubifs/journal.c:1610 ubifs_jnl_truncate() warn: missing error code 'err' Signed-off-by: Ferry Meng <mengferry@linux.alibaba.com> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2023-10-28 22:10:52 +02:00
Vincent Whitchurch	f4a04c97fb	ubifs: Fix memory leak of bud->log_hash Ensure that the allocated bud->log_hash (if any) is freed in all cases when the bud itself is freed, to fix this leak caught by kmemleak: # keyctl add logon foo:bar data @s # echo clear > /sys/kernel/debug/kmemleak # mount -t ubifs /dev/ubi0_0 mnt -o auth_hash_name=sha256,auth_key=foo:bar # echo a > mnt/x # umount mnt # mount -t ubifs /dev/ubi0_0 mnt -o auth_hash_name=sha256,auth_key=foo:bar # umount mnt # sleep 5 # echo scan > /sys/kernel/debug/kmemleak # echo scan > /sys/kernel/debug/kmemleak # cat /sys/kernel/debug/kmemleak unreferenced object 0xff... (size 128): comm "mount" backtrace: __kmalloc __ubifs_hash_get_desc+0x5d/0xe0 ubifs ubifs_replay_journal ubifs_mount ... Fixes: `da8ef65f95` ("ubifs: Authenticate replayed journal") Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2023-10-28 22:09:03 +02:00
Yang Li	48ec6328de	ubifs: Fix some kernel-doc comments Add descriptions of @time and @flags to ubifs_update_time() to silence the warnings: fs/ubifs/file.c:1383: warning: Function parameter or member 'time' not described in 'ubifs_update_time' fs/ubifs/file.c:1383: warning: Function parameter or member 'flags' not described in 'ubifs_update_time' Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5848 Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2023-10-28 22:03:14 +02:00
Amir Goldstein	d9e5d9221d	fs: fix build error with CONFIG_EXPORTFS=m or not defined Many of the filesystems that call the generic exportfs helpers do not select the EXPORTFS config. Move generic_encode_ino32_fh() to libfs.c, same as generic_fh_to_*() to avoid having to fix all those config dependencies. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310262151.renqMvme-lkp@intel.com/ Fixes: dfaf653dc415 ("exportfs: make ->encode_fh() a mandatory method for NFS export") Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231026204540.143217-1-amir73il@gmail.com Tested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-28 16:16:19 +02:00
Amir Goldstein	ceb3388043	freevxfs: derive f_fsid from bdev->bd_dev The majority of blockdev filesystems, which do not have a UUID in their on-disk format, derive f_fsid of statfs(2) from bdev->bd_dev. Use the same practice for freevxfs. This will allow reporting fanotify events with fanotify_event_info_fid. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231024121457.3014063-1-amir73il@gmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-28 16:16:19 +02:00
Amir Goldstein	ae62bcb5e7	fs: report f_fsid from s_dev for "simple" filesystems There are many "simple" filesystems (*) that report null f_fsid in statfs(2). Those "simple" filesystems report sb->s_dev as the st_dev field of the stat syscalls for all inodes of the filesystem (**). In order to enable fanotify reporting of events with fsid on those "simple" filesystems, report the sb->s_dev number in f_fsid field of statfs(2). (*) For most of the "simple" filesystems referred to in this commit, the ->statfs() operation is simple_statfs(). Some of those fs assign the simple_statfs() method directly in their ->s_op struct and some assign it indirectly via a call to simple_fill_super() or to pseudo_fs_fill_super() with either custom or "simple" s_op. We also make the same change to efivarfs and hugetlbfs, although they do not use simple_statfs(), because they use the simple_* inode operations (e.g. simple_lookup()). (**) For most of the "simple" filesystems, the ->getattr() method is not assigned, so stat() is implemented by generic_fillattr(). A few "simple" filesystems use the simple_getattr() method which also calls generic_fillattr() to fill most of the stat struct. The two exceptions are procfs and 9p. procfs implements several different ->getattr() methods, but they all end up calling generic_fillattr() to fill the st_dev field from sb->s_dev. 9p has more complicated ->getattr() methods, but they too, end up calling generic_fillattr() to fill the st_dev field from sb->s_dev. Note that 9p and kernfs also call simple_statfs() from custom ->statfs() methods which already fill the f_fsid field, but v9fs_statfs() calls simple_statfs() only in case f_fsid was not filled and kernfs_statfs() overwrites f_fsid after calling simple_statfs(). 
Link: https://lore.kernel.org/r/20230919094820.g5bwharbmy2dq46w@quack3/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231023143049.2944970-1-amir73il@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-28 16:16:18 +02:00
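One way to observe the two fields discussed in the f_fsid commit above from userspace is to read them side by side. This is a minimal Python sketch: `dev_and_fsid` is a hypothetical helper, `os.statvfs_result.f_fsid` requires Python 3.7 or later, and the sketch makes no claim about how the kernel encodes s_dev into f_fsid, so the two numbers need not compare equal.

```python
import os


def dev_and_fsid(path: str) -> tuple[int, int]:
    """Return (st_dev from stat, f_fsid from statvfs) for a path.

    On a "simple" filesystem that reports a null fsid, the second value
    is expected to be 0; after the change described above it should be
    derived from the same device number that st_dev reports.
    """
    st = os.stat(path)
    vfs = os.statvfs(path)
    return st.st_dev, vfs.f_fsid


if __name__ == "__main__":
    # "/" is used only because it exists everywhere; a pseudo filesystem
    # mount point such as /sys is the more interesting case.
    for path in ("/", "/sys"):
        if os.path.exists(path):
            dev, fsid = dev_and_fsid(path)
            print(f"{path}: st_dev={dev:#x} f_fsid={fsid:#x}")
```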
Amir Goldstein	64343119d7	exportfs: support encoding non-decodeable file handles by default AT_HANDLE_FID was added as an API for name_to_handle_at() that requests the encoding of a file id, which is not intended to be decoded. This file id is used by fanotify to describe objects in events. So far, overlayfs is the only filesystem that supports encoding non-decodeable file ids, by providing export_operations with an ->encode_fh() method and without a ->decode_fh() method. Add support for encoding non-decodeable file ids to all the filesystems that do not provide export_operations, by encoding a file id of type FILEID_INO64_GEN from { i_ino, i_generation }. A filesystem that does not support NFS export can opt out of encoding non-decodeable file ids for fanotify by defining an empty export_operations struct (i.e. with a NULL ->encode_fh() method). This allows the use of fanotify events with file ids on filesystems like 9p which do not support NFS export to bring fanotify in feature parity with inotify on those filesystems. Note that fanotify also requires that the filesystems report a non-null fsid. Currently, many simple filesystems that have support for inotify (e.g. debugfs, tracefs, sysfs) report a null fsid, so can still not be used with fanotify in file id reporting mode. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231023180801.2953446-5-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-28 16:16:18 +02:00
Amir Goldstein	41d1ddd271	exportfs: define FILEID_INO64_GEN* file handle types Similar to the common FILEID_INO32* file handle types, define common FILEID_INO64* file handle types. The type values of FILEID_INO64_GEN and FILEID_INO64_GEN_PARENT are the values returned by fuse and xfs for 64bit ino encoded file handle types. Note that these type values are filesystem specific and they do not define a universal file handle format, for example: fuse encodes FILEID_INO64_GEN as [ino-hi32,ino-lo32,gen] and xfs encodes FILEID_INO64_GEN as [host-order-ino64,gen] (a.k.a xfs_fid64). The FILEID_INO64_GEN fhandle type is going to be used for file ids for fanotify from filesystems that do not support NFS export. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231023180801.2953446-4-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-28 16:16:18 +02:00
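The point that the same FILEID_INO64_GEN type value does not imply the same byte layout can be illustrated with a small Python sketch. It packs the two layouts named in the commit message above, [ino-hi32, ino-lo32, gen] and [host-order-ino64, gen], assuming a little-endian host and 32-bit words. The function names are hypothetical and this is not the kernel's encoding code.

```python
import struct


def fuse_style_fid(ino: int, gen: int) -> bytes:
    """[ino-hi32, ino-lo32, gen] as three 32-bit words (little-endian assumed)."""
    return struct.pack("<III", ino >> 32, ino & 0xFFFFFFFF, gen)


def xfs_style_fid(ino: int, gen: int) -> bytes:
    """[host-order-ino64, gen]: one 64-bit word, then a 32-bit word."""
    return struct.pack("<QI", ino, gen)


if __name__ == "__main__":
    ino, gen = 0x0000000100000002, 7
    print("fuse-style:", fuse_style_fid(ino, gen).hex())
    print("xfs-style: ", xfs_style_fid(ino, gen).hex())
```

On a little-endian host the 64-bit word stores its low half first, so the two 12-byte buffers carry the inode halves in opposite order. That is why a consumer cannot decode a handle from the type value alone.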
Amir Goldstein	e21fc2038c	exportfs: make ->encode_fh() a mandatory method for NFS export Rename the default helper for encoding FILEID_INO32_GEN* file handles to generic_encode_ino32_fh() and convert the filesystems that used the default implementation to use the generic helper explicitly. After this change, exportfs_encode_inode_fh() no longer has a default implementation to encode FILEID_INO32_GEN* file handles. This is a step towards allowing filesystems to encode non-decodeable file handles for fanotify without having to implement any export_operations. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231023180801.2953446-3-amir73il@gmail.com Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-28 16:15:15 +02:00
Christian Brauner	3b224e1df6	fs: assert that open_mutex isn't held over holder ops With recent block level changes we should never be in a situation where we hold disk->open_mutex when calling into these helpers. So assert that in the code. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231017184823.1383356-6-hch@lst.de Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-28 13:29:23 +02:00
Jan Kara	fd1464105c	fs: Avoid grabbing sb->s_umount under bdev->bd_holder_lock The implementation of bdev holder operations such as fs_bdev_mark_dead() and fs_bdev_sync() grab sb->s_umount semaphore under bdev->bd_holder_lock. This is problematic because it leads to disk->open_mutex -> sb->s_umount lock ordering which is counterintuitive (usually we grab higher level (e.g. filesystem) locks first and lower level (e.g. block layer) locks later) and indeed makes lockdep complain about possible locking cycles whenever we open a block device while holding sb->s_umount semaphore. Implement a function bdev_super_lock_shared() which safely transitions from holding bdev->bd_holder_lock to holding sb->s_umount on a live superblock without introducing the problematic lock dependency. We use this function in fs_bdev_sync() and fs_bdev_mark_dead(). Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231018152924.3858-1-jack@suse.cz Link: https://lore.kernel.org/r/20231017184823.1383356-1-hch@lst.de Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-28 13:29:22 +02:00
Lizhi Xu	6306ff39a7	jfs: fix log->bdev_handle null ptr deref in lbmStartIO When sbi->flag is JFS_NOINTEGRITY in lmLogOpen(), log->bdev_handle is never initialized, so its value will be NULL. Therefore, add a "log->no_integrity=1" check in lbmStartIO() to avoid such problems. Reported-and-tested-by: syzbot+23bc20037854bb335d59@syzkaller.appspotmail.com Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com> Link: https://lore.kernel.org/r/20231009094557.1398920-1-lizhi.xu@windriver.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-28 13:29:22 +02:00
Jan Kara	e340dd63f6	xfs: Convert to bdev_open_by_path() Convert xfs to use bdev_open_by_path() and pass the handle around. CC: "Darrick J. Wong" <djwong@kernel.org> CC: linux-xfs@vger.kernel.org Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-28-jack@suse.cz Acked-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-28 13:29:21 +02:00
Jan Kara	ba1787a5ed	reiserfs: Convert to bdev_open_by_dev/path() Convert reiserfs to use bdev_open_by_dev/path() and pass the handle around. CC: reiserfs-devel@vger.kernel.org Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-27-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-28 13:29:21 +02:00
Jan Kara	ebc4185497	ocfs2: Convert to use bdev_open_by_dev() Convert ocfs2 heartbeat code to use bdev_open_by_dev() and pass the handle around. CC: Joseph Qi <joseph.qi@linux.alibaba.com> CC: ocfs2-devel@oss.oracle.com Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-26-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-28 13:29:21 +02:00
Jan Kara	3fe5d9fb0b	nfs/blocklayout: Convert to use bdev_open_by_dev/path() Convert block device handling to use bdev_open_by_dev/path() and pass the handle around. CC: linux-nfs@vger.kernel.org CC: Trond Myklebust <trond.myklebust@hammerspace.com> CC: Anna Schumaker <anna@kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-25-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-28 13:29:21 +02:00
Jan Kara	898c57f456	jfs: Convert to bdev_open_by_dev() Convert jfs to use bdev_open_by_dev() and pass the handle around. CC: Dave Kleikamp <shaggy@kernel.org> CC: jfs-discussion@lists.sourceforge.net Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-24-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-28 13:29:21 +02:00
Jan Kara	2b107946f8	f2fs: Convert to bdev_open_by_dev/path() Convert f2fs to use bdev_open_by_dev/path() and pass the handle around. CC: Jaegeuk Kim <jaegeuk@kernel.org> CC: Chao Yu <chao@kernel.org> CC: linux-f2fs-devel@lists.sourceforge.net Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-23-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-28 13:29:20 +02:00
Jan Kara	d577c8aaed	ext4: Convert to bdev_open_by_dev() Convert ext4 to use bdev_open_by_dev() and pass the handle around. CC: linux-ext4@vger.kernel.org CC: Ted Tso <tytso@mit.edu> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-22-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-28 13:29:20 +02:00
Jan Kara	4984572008	erofs: Convert to use bdev_open_by_path() Convert erofs to use bdev_open_by_path() and pass the handle around. CC: Gao Xiang <xiang@kernel.org> CC: Chao Yu <chao@kernel.org> CC: linux-erofs@lists.ozlabs.org Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-21-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-28 13:29:20 +02:00
Jan Kara	86ec15d00b	btrfs: Convert to bdev_open_by_path() Convert btrfs to use bdev_open_by_path() and pass the handle around. We also drop the holder from struct btrfs_device as it is now not needed anymore. CC: David Sterba <dsterba@suse.com> CC: linux-btrfs@vger.kernel.org Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-20-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-28 13:29:20 +02:00
Jan Kara	f4a48bc36c	fs: Convert to bdev_open_by_dev() Convert mount code to use bdev_open_by_dev() and propagate the handle around to bdev_release(). Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-19-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-28 13:29:19 +02:00
Linus Torvalds	d1b0949f23	assorted fixes all over the place -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZTxSwQAKCRBZ7Krx/gZQ 6zadAP9o/724KPDCY3ybgwKyEQ1UNjHTriFRBeoF3o2q0WgidwEA+/xS0Xk3i25w xnSZO/8My1edE1IcK/JDwewH/J+4Kw0= =N/Lv -----END PGP SIGNATURE----- Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc filesystem fixes from Al Viro: "Assorted fixes all over the place: literally nothing in common, could have been three separate pull requests. All are simple regression fixes, but not for anything from this cycle" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ceph_wait_on_conflict_unlink(): grab reference before dropping ->d_lock io_uring: kiocb_done() should not trust ->ki_pos if ->{read,write}_iter() failed sparc32: fix a braino in fault handling in csum_and_copy_..._user()	2023-10-27 16:44:58 -10:00
Al Viro	dc32464a5f	ceph_wait_on_conflict_unlink(): grab reference before dropping ->d_lock Use of dget() after we'd dropped ->d_lock is too late - dentry might be gone by that point. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2023-10-27 20:14:38 -04:00
Lukas Wunner	201c0da4d0	treewide: Add SPDX identifier to IETF ASN.1 modules Per section 4.c. of the IETF Trust Legal Provisions, "Code Components" in IETF Documents are licensed on the terms of the BSD-3-Clause license: https://trustee.ietf.org/documents/trust-legal-provisions/tlp-5/ The term "Code Components" specifically includes ASN.1 modules: https://trustee.ietf.org/documents/trust-legal-provisions/code-components-list-3/ Add an SPDX identifier as well as a copyright notice pursuant to section 6.d. of the Trust Legal Provisions to all ASN.1 modules in the tree which are derived from IETF Documents. Section 4.d. of the Trust Legal Provisions requests that each Code Component identify the RFC from which it is taken, so link that RFC in every ASN.1 module. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2023-10-27 18:04:28 +08:00
Dominique Martinet	e02be6390d	9p/fs: add MODULE_DESCIPTION Fix modpost warning that MODULE_DESCRIPTION is missing in fs/9p/9p.o Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Message-ID: <20231025223107.1274963-1-asmadeus@codewreck.org> Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com>	2023-10-27 12:44:13 +09:00
Steven Rostedt (Google)	29e06c1070	eventfs: Fix typo in eventfs_inode union comment It's eventfs_inode not eventfs_indoe. There's no deer involved! Link: https://lore.kernel.org/linux-trace-kernel/20231024131024.5634c743@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Fixes: `5790b1fb3d` ("eventfs: Remove eventfs_file and just use eventfs_inode") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2023-10-25 21:26:26 -04:00
Steven Rostedt (Google)	a9de4eb15a	eventfs: Fix WARN_ON() in create_file_dentry() As the comment right above a WARN_ON() in create_file_dentry() states: * Note, with the mutex held, the e_dentry cannot have content * and the ei->is_freed be true at the same time. But the WARN_ON() only has: WARN_ON_ONCE(ei->is_freed); Whereas what matches the comment (and what it should actually do) is: dentry = *e_dentry; WARN_ON_ONCE(dentry && ei->is_freed) Also in that case, set dentry to NULL (although it should never happen). Link: https://lore.kernel.org/linux-trace-kernel/20231024123628.62b88755@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Fixes: `5790b1fb3d` ("eventfs: Remove eventfs_file and just use eventfs_inode") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2023-10-25 21:26:26 -04:00
Hugh Dickins	ddc1a5cbc0	mempolicy: alloc_pages_mpol() for NUMA policy without vma Shrink shmem's stack usage by eliminating the pseudo-vma from its folio allocation. alloc_pages_mpol(gfp, order, pol, ilx, nid) becomes the principal actor for passing mempolicy choice down to __alloc_pages(), rather than vma_alloc_folio(gfp, order, vma, addr, hugepage). vma_alloc_folio() and alloc_pages() remain, but as wrappers around alloc_pages_mpol(). alloc_pages_bulk_*() untouched, except to provide the additional args to policy_nodemask(), which subsumes policy_node(). Cleanup throughout, cutting out some unhelpful "helpers". It would all be much simpler without MPOL_INTERLEAVE, but that adds a dynamic to the constant mpol: complicated by v3.6 commit `09c231cb8b` ("tmpfs: distribute interleave better across nodes"), which added ino bias to the interleave, hidden from mm/mempolicy.c until this commit. Hence "ilx" throughout, the "interleave index". Originally I thought it could be done just with nid, but that's wrong: the nodemask may come from the shared policy layer below a shmem vma, or it may come from the task layer above a shmem vma; and without the final nodemask then nodeid cannot be decided. And how ilx is applied depends also on page order. The interleave index is almost always irrelevant unless MPOL_INTERLEAVE: with one exception in alloc_pages_mpol(), where the NO_INTERLEAVE_INDEX passed down from vma-less alloc_pages() is also used as hint not to use THP-style hugepage allocation - to avoid the overhead of a hugepage arg (though I don't understand why we never just added a GFP bit for THP - if it actually needs a different allocation strategy from other pages of the same order). vma_alloc_folio() still carries its hugepage arg here, but it is not used, and should be removed when agreed. get_vma_policy() no longer allows a NULL vma: over time I believe we've eradicated all the places which used to need it e.g. 
swapoff and madvise used to pass NULL vma to read_swap_cache_async(), but now know the vma. [hughd@google.com: handle NULL mpol being passed to __read_swap_cache_async()] Link: https://lkml.kernel.org/r/ea419956-4751-0102-21f7-9c93cb957892@google.com Link: https://lkml.kernel.org/r/74e34633-6060-f5e3-aee-7040d43f2e93@google.com Link: https://lkml.kernel.org/r/1738368e-bac0-fd11-ed7f-b87142a939fe@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Huang Ying <ying.huang@intel.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tejun heo <tj@kernel.org> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: Domenico Cerasuolo <mimmocerasuolo@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:16 -07:00
Hugh Dickins	4b981bc1aa	kernfs: drop shared NUMA mempolicy hooks It seems strange that kernfs should be an outlier with a set_policy and get_policy in its kernfs_vm_ops. Ah, it dates back to v2.6.30's commit `095160aee9` ("sysfs: fix some bin_vm_ops errors"), when I had crashed on powerpc's pci_mmap_legacy_page_range() fallback to shmem_zero_setup(). Well, that was commendably thorough, to give sysfs-bin a set_policy and get_policy, just to avoid the way it was coded resulting in EINVAL from mmap when CONFIG_NUMA; but somehow feels a bit over-the-top to me now. It's easier to say that nobody should expect to manage a shmem object's shared NUMA mempolicy via some kernfs backdoor to that object: delete that code (and there's no longer an EINVAL from mmap in the NUMA case). This then leaves set_policy/get_policy as implemented only by shmem - though importantly also by SysV SHM, which has to interface with shmem which implements them, and with SHM_HUGETLB which does not. Link: https://lkml.kernel.org/r/302164-a760-4a9e-879b-6870c9b4013@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tejun heo <tj@kernel.org> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:15 -07:00
Hugh Dickins	10969b5571	hugetlbfs: drop shared NUMA mempolicy pretence Patch series "mempolicy: cleanups leading to NUMA mpol without vma", v2. Mostly cleanups in mm/mempolicy.c, but finally removing the pseudo-vma from shmem folio allocation, and removing the mmap_lock around folio migration for mbind and migrate_pages syscalls. This patch (of 12): hugetlbfs_fallocate() goes through the motions of pasting a shared NUMA mempolicy onto its pseudo-vma, but how could there ever be a shared NUMA mempolicy for this file? hugetlb_vm_ops has never offered a set_policy method, and hugetlbfs_parse_param() has never supported any mpol options for a mount-wide default policy. It's just an illusion: clean it away so as not to confuse others, giving us more freedom to adjust shmem's set_policy/get_policy implementation. But hugetlbfs_inode_info is still required, just to accommodate seals. Yes, shared NUMA mempolicy support could be added to hugetlbfs, with a set_policy method and/or mpol mount option (Andi's first posting did include an admitted-unsatisfactory hugetlb_set_policy()); but it seems that nobody has bothered to add that in the nineteen years since v2.6.7 made it possible, and there is at least one company that has invested enough into hugetlbfs, that I guess they have learnt well enough how to manage its NUMA, without needing shared mempolicy. Remove linux/mempolicy.h from linux/hugetlb.h: include linux/pagemap.h in its place, because hugetlb.h's recently added use of filemap_lock_folio() requires that (although most .configs and .c's get it in some other way). 
Link: https://lkml.kernel.org/r/ebc0987e-beff-8bfb-9283-234c2cbd17c5@google.com Link: https://lkml.kernel.org/r/cae82d4b-904a-faaf-282a-34fcc188c81f@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tejun heo <tj@kernel.org> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:15 -07:00
Hugh Dickins	1cbf0a5884	ext4: add __GFP_NOWARN to GFP_NOWAIT in readahead Since commit `e509ad4d77` ("ext4: use bdev_getblk() to avoid memory reclaim in readahead path") rightly replaced GFP_NOFAIL allocations by GFP_NOWAIT allocations, I've occasionally been seeing "page allocation failure: order:0" warnings under load: all with ext4_sb_breadahead_unmovable() in the stack. I don't think those warnings are of any interest: suppress them with __GFP_NOWARN. Link: https://lkml.kernel.org/r/7bc6ad16-9a4d-dd90-202e-47d6cbb5a136@google.com Fixes: `e509ad4d77` ("ext4: use bdev_getblk() to avoid memory reclaim in readahead path") Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Hui Zhu <teawater@antgroup.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:14 -07:00
Matthew Wilcox (Oracle)	0a88810d9b	buffer: remove folio_create_empty_buffers() With all users converted, remove the old create_empty_buffers() and rename folio_create_empty_buffers() to create_empty_buffers(). Link: https://lkml.kernel.org/r/20231016201114.1928083-28-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:10 -07:00
Matthew Wilcox (Oracle)	c9f2480ed7	ufs: remove ufs_get_locked_page() Both callers are now converted to ufs_get_locked_folio(). Link: https://lkml.kernel.org/r/20231016201114.1928083-27-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:10 -07:00
Matthew Wilcox (Oracle)	c7e8812ce5	ufs: convert ufs_change_blocknr() to use folios Convert the locked_page argument to a folio, then use folios throughout. Saves three hidden calls to compound_head(). Link: https://lkml.kernel.org/r/20231016201114.1928083-26-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:09 -07:00
Matthew Wilcox (Oracle)	e7ca7f1725	ufs: use ufs_get_locked_folio() in ufs_alloc_lastblock() Switch to the folio APIs, saving one folio->page->folio conversion. Link: https://lkml.kernel.org/r/20231016201114.1928083-25-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:09 -07:00
Matthew Wilcox (Oracle)	5fb7bd50b3	ufs: add ufs_get_locked_folio and ufs_put_locked_folio Convert the _page variants to call them. Saves a few hidden calls to compound_head(). Link: https://lkml.kernel.org/r/20231016201114.1928083-24-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:09 -07:00
Matthew Wilcox (Oracle)	44f6857526	reiserfs: convert writepage to use a folio Convert the incoming page to a folio and then use it throughout the writeback path. This definitely isn't enough to support large folios, but I don't expect reiserfs to gain support for those before it is removed. Link: https://lkml.kernel.org/r/20231016201114.1928083-23-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:09 -07:00
Matthew Wilcox (Oracle)	414ae0a440	ocfs2: convert ocfs2_map_page_blocks to use a folio Convert the page argument to a folio and then use the folio APIs throughout. Replaces three hidden calls to compound_head() with one explicit one. Link: https://lkml.kernel.org/r/20231016201114.1928083-22-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:09 -07:00
Matthew Wilcox (Oracle)	c3f4200ac6	ntfs3: convert ntfs_zero_range() to use a folio Use the folio API throughout, saving six hidden calls to compound_head(). Link: https://lkml.kernel.org/r/20231016201114.1928083-21-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:09 -07:00
Matthew Wilcox (Oracle)	24a7b35285	ntfs: convert ntfs_prepare_pages_for_non_resident_write() to folios Convert each element of the pages array to a folio before using it. This in no way renders the function large-folio safe, but it does remove a lot of hidden calls to compound_head(). Link: https://lkml.kernel.org/r/20231016201114.1928083-20-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:09 -07:00
Matthew Wilcox (Oracle)	a04eb7cb18	ntfs: convert ntfs_writepage to use a folio Use folio APIs throughout. Saves many hidden calls to compound_head(). Link: https://lkml.kernel.org/r/20231016201114.1928083-19-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:09 -07:00
Matthew Wilcox (Oracle)	a2da3afce9	ntfs: convert ntfs_read_block() to use a folio The caller already has the folio, so pass it in and use the folio API throughout saving five hidden calls to compound_head(). Link: https://lkml.kernel.org/r/20231016201114.1928083-18-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:09 -07:00
Matthew Wilcox (Oracle)	922b12eff0	nilfs2: convert nilfs_lookup_dirty_data_buffers to use folio_create_empty_buffers This function was already using a folio, so this update to the new API removes a single folio->page->folio conversion. Link: https://lkml.kernel.org/r/20231016201114.1928083-17-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:09 -07:00
Matthew Wilcox (Oracle)	73c32e07a3	nilfs2: remove nilfs_page_get_nth_block All users have now been converted to get_nth_block(). Link: https://lkml.kernel.org/r/20231016201114.1928083-16-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:09 -07:00
Matthew Wilcox (Oracle)	664c87b75e	nilfs2: convert nilfs_mdt_get_frozen_buffer to use a folio Remove a number of folio->page->folio conversions. Link: https://lkml.kernel.org/r/20231016201114.1928083-15-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:09 -07:00
Matthew Wilcox (Oracle)	1a846bf388	nilfs2: convert nilfs_mdt_forget_block() to use a folio Remove a number of folio->page->folio conversions. Link: https://lkml.kernel.org/r/20231016201114.1928083-14-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:09 -07:00
Matthew Wilcox (Oracle)	4093602d6b	nilfs2: convert nilfs_copy_page() to nilfs_copy_folio() Both callers already have a folio, so pass it in and use it directly. Removes a lot of hidden calls to compound_head(). Link: https://lkml.kernel.org/r/20231016201114.1928083-13-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:09 -07:00
Matthew Wilcox (Oracle)	c5521c7689	nilfs2: convert nilfs_grab_buffer() to use a folio Remove a number of folio->page->folio conversions. Link: https://lkml.kernel.org/r/20231016201114.1928083-12-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:08 -07:00
Matthew Wilcox (Oracle)	6c346be91d	nilfs2: convert nilfs_mdt_freeze_buffer to use a folio Remove a number of folio->page->folio conversions. Link: https://lkml.kernel.org/r/20231016201114.1928083-11-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:08 -07:00
Matthew Wilcox (Oracle)	4064a0aa8a	gfs2: convert gfs2_write_buf_to_page() to use a folio Remove several folio->page->folio conversions. Link: https://lkml.kernel.org/r/20231016201114.1928083-10-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:08 -07:00
Matthew Wilcox (Oracle)	c646e57372	gfs2: convert gfs2_getjdatabuf to use a folio Use the folio APIs, saving four hidden calls to compound_head(). Link: https://lkml.kernel.org/r/20231016201114.1928083-9-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:08 -07:00
Matthew Wilcox (Oracle)	0eb751791d	gfs2: convert gfs2_getbuf() to folios Remove several folio->page->folio conversions. Also use __GFP_NOFAIL instead of calling yield() and the new get_nth_bh(). Link: https://lkml.kernel.org/r/20231016201114.1928083-8-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:08 -07:00
Matthew Wilcox (Oracle)	81cb277ebd	gfs2: convert inode unstuffing to use a folio Use the folio APIs, removing numerous hidden calls to compound_head(). Also remove the stale comment about the page being looked up if it's NULL. Link: https://lkml.kernel.org/r/20231016201114.1928083-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:08 -07:00
Matthew Wilcox (Oracle)	0217fbb027	buffer: add get_nth_bh() Extract this useful helper from nilfs_page_get_nth_block() Link: https://lkml.kernel.org/r/20231016201114.1928083-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:08 -07:00
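As a rough illustration of what a helper like get_nth_bh() does, the sketch below walks a circular list of buffers a fixed number of steps. It is a userspace model under stated assumptions: `model_bh` and `model_get_nth_bh()` are hypothetical names, and the only kernel detail assumed is that the buffers attached to one folio are chained in a ring through a `b_this_page` pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of a buffer head; only the ring link matters. */
struct model_bh {
	struct model_bh *b_this_page;	/* next buffer in the same folio (circular) */
	int id;				/* illustration only */
};

/* Advance 'count' links from 'bh' and return the buffer reached. */
static struct model_bh *model_get_nth_bh(struct model_bh *bh, unsigned int count)
{
	while (count--)
		bh = bh->b_this_page;
	return bh;
}

/* Link 'n' buffers from 'bufs' into a ring and return the head. */
static struct model_bh *model_make_ring(struct model_bh *bufs, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++) {
		bufs[i].id = (int)i;
		bufs[i].b_this_page = &bufs[(i + 1) % n];
	}
	return &bufs[0];
}
```

Because the list is circular, asking for the buffer at an index equal to the ring length lands back on the head, so callers are expected to pass an index smaller than the number of buffers.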
Matthew Wilcox (Oracle)	d405999367	ext4: convert to folio_create_empty_buffers Remove an unnecessary folio->page->folio conversion and take advantage of the new return value from folio_create_empty_buffers(). Link: https://lkml.kernel.org/r/20231016201114.1928083-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:08 -07:00
Matthew Wilcox (Oracle)	4f05f139e3	mpage: convert map_buffer_to_folio() to folio_create_empty_buffers() Saves a folio->page->folio conversion. Link: https://lkml.kernel.org/r/20231016201114.1928083-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:08 -07:00
Matthew Wilcox (Oracle)	3decb8564e	buffer: make folio_create_empty_buffers() return a buffer_head Patch series "Finish the create_empty_buffers() transition", v2. Pankaj recently added folio_create_empty_buffers() as the folio equivalent to create_empty_buffers(). This patch set finishes the conversion by first converting all remaining filesystems to call folio_create_empty_buffers(), then renaming it back to create_empty_buffers(). I took the opportunity to make a few simplifications like making folio_create_empty_buffers() return the head buffer and extracting get_nth_bh() from nilfs2. A few of the patches in this series aren't directly related to create_empty_buffers(), but I saw them while I was working on this and thought they'd be easy enough to add to this series. Compile-tested only, other than ext4. This patch (of 26): Almost all callers want to know the first BH that was allocated for this folio. We already have that handy, so return it. Link: https://lkml.kernel.org/r/20231016201114.1928083-1-willy@infradead.org Link: https://lkml.kernel.org/r/20231016201114.1928083-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:08 -07:00
Dominique Martinet	9b5c628183	9p: v9fs_listxattr: fix %s null argument warning W=1 warns about null argument to kprintf: In file included from fs/9p/xattr.c:12: In function ‘v9fs_xattr_get’, inlined from ‘v9fs_listxattr’ at fs/9p/xattr.c:142:9: include/net/9p/9p.h:55:2: error: ‘%s’ directive argument is null [-Werror=format-overflow=] 55 | _p9_debug(level, __func__, fmt, ##__VA_ARGS__) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Use an empty string instead of : - this is ok 9p-wise because p9pdu_vwritef serializes a null string and an empty string the same way (one '0' word for length) - since this degrades the print statements, add new single quotes for xattr's name delimiter (Old: "file = (null)", new: "file = ''") Link: https://lore.kernel.org/r/20231008060138.517057-1-suhui@nfschina.com Suggested-by: Su Hui <suhui@nfschina.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Acked-by: Christian Schoenebeck <linux_oss@crudebyte.com> Message-ID: <20231025103445.1248103-2-asmadeus@codewreck.org>	2023-10-26 07:05:52 +09:00
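The claim in the entry above, that a null string and an empty string serialize identically, can be sketched as follows. This is a userspace model, not the kernel's p9pdu_vwritef(): `model_put_9p_string()` is a hypothetical helper, and it assumes the usual 9P string layout of a 2-byte little-endian length followed by the bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of 9P string encoding: a 16-bit little-endian length
 * followed by the string bytes (no terminator). A NULL pointer is treated
 * as a zero-length string. Returns the number of bytes written, or 0 if
 * the string does not fit.
 */
static size_t model_put_9p_string(const char *s, uint8_t *out, size_t cap)
{
	size_t len = s ? strlen(s) : 0;

	if (len > UINT16_MAX || cap < len + 2)
		return 0;
	out[0] = (uint8_t)(len & 0xff);
	out[1] = (uint8_t)(len >> 8);
	if (len)
		memcpy(out + 2, s, len);
	return len + 2;
}
```

In this model both NULL and "" produce the same two zero bytes on the wire, which is why swapping one for the other only changes the debug output.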
Christian Brauner	61d4fb0b34	file, i915: fix file reference for mmap_singleton() Today we got a report at [1] for rcu stalls on the i915 testsuite in [2] due to the conversion of files to SLAB_TYPESAFE_BY_RCU. Afaict, get_file_rcu() goes into an infinite loop trying to carefully verify that i915->gem.mmap_singleton hasn't changed - see the splat below. So I stared at this code to figure out what it actually does. It seems that the i915->gem.mmap_singleton pointer itself never had rcu semantics. The i915->gem.mmap_singleton is replaced in file->f_op->release::singleton_release(): static int singleton_release(struct inode *inode, struct file *file) { struct drm_i915_private *i915 = file->private_data; cmpxchg(&i915->gem.mmap_singleton, file, NULL); drm_dev_put(&i915->drm); return 0; } The cmpxchg() is ordered against a concurrent update of i915->gem.mmap_singleton from mmap_singleton(). IOW, when mmap_singleton() fails to get a reference on i915->gem.mmap_singleton: While mmap_singleton() does rcu_read_lock(); file = get_file_rcu(&i915->gem.mmap_singleton); rcu_read_unlock(); it allocates a new file via anon_inode_getfile() and does smp_store_mb(i915->gem.mmap_singleton, file); So, then what happens in the case of this bug is that at some point fput() is called and drops the file->f_count to zero, leaving the pointer in i915->gem.mmap_singleton intact. Now, there might be delays until file->f_op->release::singleton_release() is called and i915->gem.mmap_singleton is set to NULL. Say concurrently another task hits mmap_singleton() and does: rcu_read_lock(); file = get_file_rcu(&i915->gem.mmap_singleton); rcu_read_unlock(); When get_file_rcu() fails to get a reference via atomic_inc_not_zero() it will try the reload from i915->gem.mmap_singleton expecting it to be NULL, assuming it has comparable semantics as we expect in __fget_files_rcu(). 
But it hasn't so it reloads the same pointer again, trying the same atomic_inc_not_zero() again and doing so until file->f_op->release::singleton_release() of the old file has been called. So, in contrast to __fget_files_rcu() here we want to not retry when atomic_inc_not_zero() has failed. We only want to retry in case we managed to get a reference but the pointer did change on reload. <3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: <3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238 <3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20) <6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002 <6> [511.395962] Call Trace: <6> [511.395966] <TASK> <6> [511.395974] ? __schedule+0x3a8/0xd70 <6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 <6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140 <6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 <6> [511.396029] ? get_file_rcu+0x10/0x30 <6> [511.396039] ? get_file_rcu+0x10/0x30 <6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915] <6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915] <6> [511.396903] ? mmap_region+0x253/0xb60 <6> [511.396925] ? do_mmap+0x334/0x5c0 <6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0 <6> [511.396949] ? rcu_is_watching+0x11/0x50 <6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915] <6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915] <6> [511.397762] ? igt_mmap+0x11e/0x150 [i915] <6> [511.398139] ? __trace_bprintk+0x76/0x90 <6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915] <6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915] <6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915] <6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915] <6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915] <6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915] <6> [511.400692] ? pci_device_probe+0x95/0x120 <6> [511.400704] ? 
really_probe+0x164/0x3c0 <6> [511.400715] ? __pfx___driver_attach+0x10/0x10 <6> [511.400722] ? __driver_probe_device+0x73/0x160 <6> [511.400731] ? driver_probe_device+0x19/0xa0 <6> [511.400741] ? __driver_attach+0xb6/0x180 <6> [511.400749] ? __pfx___driver_attach+0x10/0x10 <6> [511.400756] ? bus_for_each_dev+0x77/0xd0 <6> [511.400770] ? bus_add_driver+0x114/0x210 <6> [511.400781] ? driver_register+0x5b/0x110 <6> [511.400791] ? i915_init+0x23/0xc0 [i915] <6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915] <6> [511.401503] ? do_one_initcall+0x57/0x270 <6> [511.401515] ? rcu_is_watching+0x11/0x50 <6> [511.401521] ? kmalloc_trace+0xa3/0xb0 <6> [511.401532] ? do_init_module+0x5f/0x210 <6> [511.401544] ? load_module+0x1d00/0x1f60 <6> [511.401581] ? init_module_from_file+0x86/0xd0 <6> [511.401590] ? init_module_from_file+0x86/0xd0 <6> [511.401613] ? idempotent_init_module+0x17c/0x230 <6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0 <6> [511.401650] ? do_syscall_64+0x3c/0x90 <6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8 <6> [511.401684] </TASK> Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963 Cc: Jann Horn <jannh@google.com>, Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-25 22:17:04 +02:00
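The retry rule described in the entry above (give up when the reference count is already zero, retry only when a reference was obtained but the pointer changed on reload) can be sketched with C11 atomics. This is a userspace model under stated assumptions, not the kernel's get_file_rcu(): `model_file`, `model_inc_not_zero()` and `model_get_file()` are hypothetical names, and RCU itself is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a refcounted file object. */
struct model_file {
	atomic_long f_count;
};

/* Take a reference unless the count has already dropped to zero. */
static bool model_inc_not_zero(atomic_long *count)
{
	long old = atomic_load(count);

	while (old != 0) {
		/* On failure 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(count, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Load the pointer in 'slot' and try to take a reference.
 * - count already zero: return NULL instead of spinning on a dying file;
 * - reference taken and pointer unchanged on reload: success;
 * - reference taken but pointer changed: drop it and retry.
 */
static struct model_file *model_get_file(_Atomic(struct model_file *) *slot)
{
	for (;;) {
		struct model_file *file = atomic_load(slot);

		if (!file)
			return NULL;
		if (!model_inc_not_zero(&file->f_count))
			return NULL;
		if (file == atomic_load(slot))
			return file;
		atomic_fetch_sub(&file->f_count, 1);	/* stands in for fput() */
	}
}
```

In this model a caller that gets NULL back is expected to allocate a fresh object and publish it, which mirrors what the commit message says mmap_singleton() does after failing to get a reference.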
Matthew Wilcox (Oracle)	82dd620653	ext2: Convert ext2_prepare_chunk and ext2_commit_chunk to folios All callers now have a folio, so pass it in. Saves one call to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz>	2023-10-25 20:31:51 +02:00
Matthew Wilcox (Oracle)	da3a849a5c	ext2: Convert ext2_make_empty() to use a folio Remove two hidden calls to compound_head() by using the folio API. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230921200746.3303942-9-willy@infradead.org>	2023-10-25 20:31:29 +02:00
Matthew Wilcox (Oracle)	c2d20492e2	ext2: Convert ext2_unlink() and ext2_rename() to use folios This involves changing ext2_find_entry(), ext2_dotdot(), ext2_inode_by_name(), ext2_set_link() and ext2_delete_entry() to take a folio. These were also the last users of ext2_get_page() and ext2_put_page(), so remove those at the same time. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230921200746.3303942-8-willy@infradead.org>	2023-10-25 20:31:29 +02:00
Matthew Wilcox (Oracle)	7e56bbf15d	ext2: Convert ext2_delete_entry() to use folios Save some calls to compound_head() by using the folio API. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230921200746.3303942-7-willy@infradead.org>	2023-10-25 20:28:33 +02:00
Matthew Wilcox (Oracle)	f4b830cfce	ext2: Convert ext2_empty_dir() to use a folio Save two calls to compound_head() by using the folio API. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230921200746.3303942-6-willy@infradead.org>	2023-10-25 20:19:01 +02:00
Matthew Wilcox (Oracle)	1de0736c3a	ext2: Convert ext2_add_link() to use a folio Remove five hidden calls to compound_head() and fix a couple of places that assumed PAGE_SIZE. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230921200746.3303942-5-willy@infradead.org>	2023-10-25 20:19:01 +02:00
Matthew Wilcox (Oracle)	51706b6fd4	ext2: Convert ext2_readdir to use a folio Saves a hidden call to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230921200746.3303942-4-willy@infradead.org>	2023-10-25 20:19:00 +02:00
Matthew Wilcox (Oracle)	52df49ee83	ext2: Add ext2_get_folio() Convert ext2_get_page() into ext2_get_folio() and keep the original function around as a temporary wrapper. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230921200746.3303942-3-willy@infradead.org>	2023-10-25 20:19:00 +02:00
Matthew Wilcox (Oracle)	46f84a9bea	ext2: Convert ext2_check_page to ext2_check_folio Support in this function for large folios is limited to supporting filesystems with block size > PAGE_SIZE. This new functionality will only be supported on machines without HIGHMEM, so the problem of kmap_local only being able to map a single page in the folio can be ignored. We will not use large folios for ext2 directories on HIGHMEM machines. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230921200746.3303942-2-willy@infradead.org>	2023-10-25 20:19:00 +02:00
Amir Goldstein	66c62769bc	exportfs: add helpers to check if filesystem can encode/decode file handles The logic of whether filesystem can encode/decode file handles is open coded in many places. In preparation to changing the logic, move the open coded logic into inline helpers. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231023180801.2953446-2-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-24 17:57:45 +02:00
Ian Kent	d3c5006176	autofs: fix add autofs_parse_fd() We are seeing systemd hang on its autofs direct mount at /proc/sys/fs/binfmt_misc. Historically this was due to a mismatch in the communication structure size between a 64 bit kernel and a 32 bit user space and was fixed by making the pipe communication record oriented. During autofs v5 development I decided to stay with the existing usage instead of changing to a packed structure for autofs <=> user space communications which turned out to be a mistake on my part. Problems arose and they were fixed by allowing for the 64 bit to 32 bit size difference in the automount(8) code. Along the way systemd started to use autofs and eventually encountered this problem too. systemd refused to compensate for the length difference insisting it be fixed in the kernel. Fortunately Linus implemented the packetized pipe which resolved the problem in a straightforward and simple way. In the autofs mount api conversion series I inadvertently dropped the packet pipe flag settings when adding the autofs_parse_fd() function. This patch fixes that omission. Fixes: `546694b8f6` ("autofs: add autofs_parse_fd()") Signed-off-by: Ian Kent <raven@themaw.net> Link: https://lore.kernel.org/r/20231023093359.64265-1-raven@themaw.net Tested-by: Anders Roxell <anders.roxell@linaro.org> Cc: Bill O'Donnell <bodonnel@redhat.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Anders Roxell <anders.roxell@linaro.org> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Reported-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-24 11:04:45 +02:00
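The "packetized pipe" mentioned in the autofs fix above is visible from user space as a pipe opened with O_DIRECT. The following is a minimal userspace sketch of that behaviour, not the kernel-side change itself; the function name packet_pipe_read_sizes() is hypothetical and the code assumes Linux with glibc.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for pipe2() and O_DIRECT on glibc */
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write two records of different sizes into a
 * packet-mode pipe, then read them back with an oversized buffer.
 * Stores the size returned by each read() in sizes[] and returns 0 on
 * success, -1 on any failure.
 */
int packet_pipe_read_sizes(ssize_t sizes[2])
{
	int fds[2];
	char buf[64];
	int ret = -1;

	/* O_DIRECT puts the pipe into packet mode. */
	if (pipe2(fds, O_DIRECT) != 0)
		return -1;

	if (write(fds[1], "abc", 3) != 3 || write(fds[1], "defgh", 5) != 5)
		goto out;

	/* Each read() is expected to return one packet, not both. */
	sizes[0] = read(fds[0], buf, sizeof(buf));
	sizes[1] = read(fds[0], buf, sizeof(buf));
	if (sizes[0] < 0 || sizes[1] < 0)
		goto out;
	ret = 0;
out:
	close(fds[0]);
	close(fds[1]);
	return ret;
}
```

In packet mode the reader sees record boundaries regardless of how large its buffer is, which is why a structure size mismatch between a 64 bit kernel and a 32 bit user space no longer desynchronizes the stream.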
Bernd Schubert	c04d905f6c	vfs: Convert BUG_ON to WARN_ON_ONCE in open_last_lookups The calling code actually handles -ECHILD, so this BUG_ON can be converted to WARN_ON_ONCE. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Link: https://lore.kernel.org/r/20231023184718.11143-1-bschubert@ddn.com Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Dharmendra Singh <dsingh@ddn.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-24 10:51:05 +02:00
Linus Torvalds	d88520ad73	fix for lock_rename() misuse in nfsd -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZS4N8wAKCRBZ7Krx/gZQ 65q9AQDhucfo26czFALs6aOceZ1K+FUu3OzgU0gbQaCCLhuubwD/Uu3GXL2KrVaj uMk7Wv6a68/j1VXwtNMpSb0MV09j/wM= =xKoB -----END PGP SIGNATURE----- Merge tag 'pull-nfsd-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull nfsd fix from Al Viro: "Catch from lock_rename() audit; nfsd_rename() checked that both directories belonged to the same filesystem, but only after having done lock_rename(). Trivial fix, tested and acked by nfs folks" * tag 'pull-nfsd-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: nfsd: lock_rename() needs both directories to live on the same fs	2023-10-23 20:40:04 -10:00
Yue Haibing	a321af9dd0	fs/9p: Remove unused function declaration v9fs_inode2stat() Commit `531b1094b7` ("[PATCH] v9fs: zero copy implementation") declared but never implemented this. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Message-ID: <20230807141726.38860-1-yuehaibing@huawei.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2023-10-24 13:52:56 +09:00
Linus Torvalds	e017769f4c	for-6.6-rc7-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmU2lLEACgkQxWXV+ddt WDvCThAApe+zMNdEhQ/cgrvfzP/X91Q53PXQsdVsrujPyUV8eEV4oJzEwVbJhRdw 3ukIQtvyAMNiWhEBhOQRwxjuUoTCApGAeEEEl1cWWEqQ7G2/2LS4+bcWzgQ3Vu32 dzYL37ddsfe4n7OgfnymtMrnv7kge0XbAlY3GbavaDccZDQDqcD5wSAOyOhfIsH7 kcu4sA5Fi44wVSfAJX1Dms+wXfsmQu/sd3c9Gcyce9Hpy1cEW3vWbApLBE4K0aKX /JHTdmkAJ20a4APQsfGH+UymyuZgr8d2eGmL9rVYKhT/c+Dow0lNAWYkvGf/MawM CX3GdP6f6ZOR/anCPZ8nqZCE5AoFykGazvpCCSrvCOpU7o7GqxbAQkWWFcMp1FHW 9TFrj81WK18DeCfCNw7lR3sdMy/2o2nnSUAw3DFY4n/3Lek7FUmrBTHvXlWDot7T TM9CzYGF840QhL5s5SMYS09YmeI0I34L7HJAi/+qli48SooGuL9RZ29TmzHIX69Y 2bgpS64j06p/AGEnfHAcT1LbpiFCPmO5cpXKv/t40GL5QO5d4WV698ysDGoPYUPO 8CPL85Y8cao56KGJLyOroGz0P1bo+RdNe5bN6xJJoTRn1Y9oUA+bQSnN8x9iuunF 9QZrAIHzNyDcRGzoqgDW+3bivOvIus/Dto/u1P3ap68kP2HTVsY= =gOyi -----END PGP SIGNATURE----- Merge tag 'for-6.6-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "One more fix for a problem with snapshot of a newly created subvolume that can lead to inconsistent data under some circumstances. Kernel 6.5 added a performance optimization to skip transaction commit for subvolume creation but this could end up with newer data on disk but not linked to other structures. The fix itself is an added condition, the rest of the patch is a parameter added to several functions" * tag 'for-6.6-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix unwritten extent buffer after snapshotting a new subvolume	2023-10-23 07:59:13 -10:00
Filipe Manana	eb96e22193	btrfs: fix unwritten extent buffer after snapshotting a new subvolume When creating a snapshot of a subvolume that was created in the current transaction, we can end up not persisting a dirty extent buffer that is referenced by the snapshot, resulting in IO errors due to checksum failures when trying to read the extent buffer later from disk. A sequence of steps that leads to this is the following: 1) At ioctl.c:create_subvol() we allocate an extent buffer, with logical address 36007936, for the leaf/root of a new subvolume that has an ID of 291. We mark the extent buffer as dirty, and at this point the subvolume tree has a single node/leaf which is also its root (level 0); 2) We no longer commit the transaction used to create the subvolume at create_subvol(). We used to, but that was recently removed in commit `1b53e51a4a` ("btrfs: don't commit transaction for every subvol create"); 3) The transaction used to create the subvolume has an ID of 33, so the extent buffer 36007936 has a generation of 33; 4) Several updates happen to subvolume 291 during transaction 33, several files created and its tree height changes from 0 to 1, so we end up with a new root at level 1 and the extent buffer 36007936 is now a leaf of that new root node, which is extent buffer 36048896. The commit root remains as 36007936, since we are still at transaction 33; 5) Creation of a snapshot of subvolume 291, with an ID of 292, starts at ioctl.c:create_snapshot(). This triggers a commit of transaction 33 and we end up at transaction.c:create_pending_snapshot(), in the critical section of a transaction commit. There we COW the root of subvolume 291, which is extent buffer 36048896. The COW operation returns extent buffer 36048896, since there's no need to COW because the extent buffer was created in this transaction and it was not written yet. Then we call btrfs_copy_root() against the root node 36048896. 
During this operation we allocate a new extent buffer to turn into the root node of the snapshot, copy the contents of the root node 36048896 into this snapshot root extent buffer, set the owner to 292 (the ID of the snapshot), etc, and then we call btrfs_inc_ref(). This will create a delayed reference for each leaf pointed by the root node with a reference root of 292 - this includes a reference for the leaf 36007936. After that we set the bit BTRFS_ROOT_FORCE_COW in the root's state. Then we call btrfs_insert_dir_item(), to create the directory entry in the tree of subvolume 291 that points to the snapshot. This ends up needing to modify leaf 36007936 to insert the respective directory items. Because the bit BTRFS_ROOT_FORCE_COW is set for the root's state, we need to COW the leaf. We end up at btrfs_force_cow_block() and then at update_ref_for_cow(). At update_ref_for_cow() we call btrfs_block_can_be_shared() which returns false, despite the fact the leaf 36007936 is shared - the subvolume's root and the snapshot's root point to that leaf. The reason that it incorrectly returns false is because the commit root of the subvolume is extent buffer 36007936 - it was the initial root of the subvolume when we created it. So btrfs_block_can_be_shared() which has the following logic: int btrfs_block_can_be_shared(struct btrfs_root *root, struct extent_buffer *buf) { if (test_bit(BTRFS_ROOT_SHAREABLE, &root->state) && buf != root->node && buf != root->commit_root && (btrfs_header_generation(buf) <= btrfs_root_last_snapshot(&root->root_item) || btrfs_header_flag(buf, BTRFS_HEADER_FLAG_RELOC))) return 1; return 0; } Returns false (0) since 'buf' (extent buffer 36007936) matches the root's commit root. As a result, at update_ref_for_cow(), we don't check for the number of references for extent buffer 36007936, we just assume it's not shared and therefore that it has only 1 reference, so we set the local variable 'refs' to 1. 
Later on, in the final if-else statement at update_ref_for_cow(): static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct extent_buffer *buf, struct extent_buffer *cow, int *last_ref) { (...) if (refs > 1) { (...) } else { (...) btrfs_clear_buffer_dirty(trans, buf); *last_ref = 1; } } So we mark the extent buffer 36007936 as not dirty, and as a result we don't write it to disk later in the transaction commit, despite the fact that the snapshot's root points to it. Attempting to access the leaf or dumping the tree for example shows that the extent buffer was not written: $ btrfs inspect-internal dump-tree -t 292 /dev/sdb btrfs-progs v6.2.2 file tree key (292 ROOT_ITEM 33) node 36110336 level 1 items 2 free space 119 generation 33 owner 292 node 36110336 flags 0x1(WRITTEN) backref revision 1 checksum stored a8103e3e checksum calced a8103e3e fs uuid 90c9a46f-ae9f-4626-9aff-0cbf3e2e3a79 chunk uuid e8c9c885-78f4-4d31-85fe-89e5f5fd4a07 key (256 INODE_ITEM 0) block 36007936 gen 33 key (257 EXTENT_DATA 0) block 36052992 gen 33 checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29 checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29 total bytes 107374182400 bytes used 38572032 uuid 90c9a46f-ae9f-4626-9aff-0cbf3e2e3a79 The respective on disk region is full of zeroes as the device was trimmed at mkfs time. Obviously 'btrfs check' also detects and complains about this: $ btrfs check /dev/sdb Opening filesystem to check... 
Checking filesystem on /dev/sdb UUID: 90c9a46f-ae9f-4626-9aff-0cbf3e2e3a79 generation: 33 (33) [1/7] checking root items [2/7] checking extents checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29 checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29 checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29 bad tree block 36007936, bytenr mismatch, want=36007936, have=0 owner ref check failed [36007936 4096] ERROR: errors found in extent allocation tree or chunk allocation [3/7] checking free space tree [4/7] checking fs roots checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29 checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29 checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29 bad tree block 36007936, bytenr mismatch, want=36007936, have=0 The following tree block(s) is corrupted in tree 292: tree block bytenr: 36110336, level: 1, node key: (256, 1, 0) root 292 root dir 256 not found ERROR: errors found in fs roots found 38572032 bytes used, error(s) found total csum bytes: 16048 total tree bytes: 1265664 total fs tree bytes: 1118208 total extent tree bytes: 65536 btree space waste bytes: 562598 file data blocks allocated: 65978368 referenced 36569088 Fix this by updating btrfs_block_can_be_shared() to consider that an extent buffer may be shared if it matches the commit root and if its generation matches the current transaction's generation. This can be reproduced with the following script: $ cat test.sh #!/bin/bash MNT=/mnt/sdi DEV=/dev/sdi # Use a filesystem with a 64K node size so that we have the same node # size on every machine regardless of its page size (on x86_64 default # node size is 16K due to the 4K page size, while on PPC it's 64K by # default). This way we can make sure we are able to create a btree for # the subvolume with a height of 2. 
mkfs.btrfs -f -n 64K $DEV mount $DEV $MNT btrfs subvolume create $MNT/subvol # Create a few empty files on the subvolume, this bumps its btree # height to 2 (root node at level 1 and 2 leaves). for ((i = 1; i <= 300; i++)); do echo -n > $MNT/subvol/file_$i done btrfs subvolume snapshot -r $MNT/subvol $MNT/subvol/snap umount $DEV btrfs check $DEV Running it on a 6.5 kernel (or any 6.6-rc kernel at the moment): $ ./test.sh Create subvolume '/mnt/sdi/subvol' Create a readonly snapshot of '/mnt/sdi/subvol' in '/mnt/sdi/subvol/snap' Opening filesystem to check... Checking filesystem on /dev/sdi UUID: bbdde2ff-7d02-45ca-8a73-3c36f23755a1 [1/7] checking root items [2/7] checking extents parent transid verify failed on 30539776 wanted 7 found 5 parent transid verify failed on 30539776 wanted 7 found 5 parent transid verify failed on 30539776 wanted 7 found 5 Ignoring transid failure owner ref check failed [30539776 65536] ERROR: errors found in extent allocation tree or chunk allocation [3/7] checking free space tree [4/7] checking fs roots parent transid verify failed on 30539776 wanted 7 found 5 Ignoring transid failure Wrong key of child node/leaf, wanted: (256, 1, 0), have: (2, 132, 0) Wrong generation of child node/leaf, wanted: 5, have: 7 root 257 root dir 256 not found ERROR: errors found in fs roots found 917504 bytes used, error(s) found total csum bytes: 0 total tree bytes: 851968 total fs tree bytes: 393216 total extent tree bytes: 65536 btree space waste bytes: 736550 file data blocks allocated: 0 referenced 0 A test case for fstests will follow soon. Fixes: `1b53e51a4a` ("btrfs: don't commit transaction for every subvol create") CC: stable@vger.kernel.org # 6.5+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-10-23 17:17:30 +02:00
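The sharing check described in the btrfs commit above can be modelled in user space. This is a simplified sketch, not the kernel code: the struct and function names (model_buf, model_root, model_can_be_shared_old(), model_can_be_shared_fixed(), model_scenario()) are hypothetical, the "fixed" variant is one reading of the sentence "an extent buffer may be shared if it matches the commit root and if its generation matches the current transaction's generation", and the scenario assumes the root's last snapshot generation is already 33 when the check runs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct model_buf {
	uint64_t bytenr;
	uint64_t generation;
	bool reloc;                          /* models BTRFS_HEADER_FLAG_RELOC */
};

struct model_root {
	bool shareable;                      /* models BTRFS_ROOT_SHAREABLE */
	const struct model_buf *node;        /* current root node */
	const struct model_buf *commit_root; /* root as of the last commit */
	uint64_t last_snapshot;
};

/* The logic quoted in the commit message, before the fix. */
bool model_can_be_shared_old(const struct model_root *root,
			     const struct model_buf *buf)
{
	return root->shareable &&
	       buf != root->node &&
	       buf != root->commit_root &&
	       (buf->generation <= root->last_snapshot || buf->reloc);
}

/*
 * Sketch of the fix: a buffer that is still the commit root may be
 * shared when it was created in the current transaction.
 */
bool model_can_be_shared_fixed(const struct model_root *root,
			       const struct model_buf *buf,
			       uint64_t transid)
{
	if (!root->shareable || buf == root->node)
		return false;
	if (!(buf->generation <= root->last_snapshot || buf->reloc))
		return false;
	if (buf != root->commit_root)
		return true;
	return buf->generation == transid;
}

/* Rebuild the scenario from the commit message and run one variant. */
bool model_scenario(bool use_fix)
{
	struct model_buf leaf = { 36007936, 33, false };     /* old root */
	struct model_buf new_root = { 36048896, 33, false }; /* level 1 */
	/* Assumption: last_snapshot is 33 at the time of the check. */
	struct model_root subvol = { true, &new_root, &leaf, 33 };

	return use_fix ? model_can_be_shared_fixed(&subvol, &leaf, 33)
		       : model_can_be_shared_old(&subvol, &leaf);
}
```

With the old logic the leaf is rejected only because it equals the commit root; the extra generation comparison lets the same leaf be treated as possibly shared, so its reference count is looked up instead of being assumed to be 1.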
Andreas Gruenbacher	1358706907	gfs2: Stop using GFS2_BASIC_BLOCK and GFS2_BASIC_BLOCK_SHIFT Header gfs2_ondisk.h defines GFS2_BASIC_BLOCK and GFS2_BASIC_BLOCK_SHIFT in a misguided attempt to abstract away the fact that sectors on block devices are 512 or (1 << 9) bytes in size. Stop using those definitions. I would be inclined to remove those definitions altogether, but the gfs2 user-space tools are using them. In addition, instead of GFS2_SB(inode)->sd_sb.sb_bsize_shift, simply use inode->i_blkbits. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Andrew Price <anprice@redhat.com>	2023-10-23 11:47:13 +02:00
Andreas Gruenbacher	2d8d799061	gfs2: setattr_chown: Add missing initialization Add a missing initialization of variable ap in setattr_chown(). Without it, chown() may be able to bypass quotas. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-10-23 11:47:13 +02:00
Catherine Hoang	14a537983b	xfs: allow read IO and FICLONE to run concurrently One of our VM cluster management products needs to snapshot KVM image files so that they can be restored in case of failure. Snapshotting is done by redirecting VM disk writes to a sidecar file and using reflink on the disk image, specifically the FICLONE ioctl as used by "cp --reflink". Reflink locks the source and destination files while it operates, which means that reads from the main vm disk image are blocked, causing the vm to stall. When an image file is heavily fragmented, the copy process could take several minutes. Some of the vm image files have 50-100 million extent records, and duplicating that much metadata locks the file for 30 minutes or more. Having activities suspended for such a long time in a cluster node could result in node eviction. Clone operations and read IO do not change any data in the source file, so they should be able to run concurrently. Demote the exclusive locks taken by FICLONE to shared locks to allow reads while cloning. While a clone is in progress, writes will take the IOLOCK_EXCL, so they block until the clone completes. Link: https://lore.kernel.org/linux-xfs/8911B94D-DD29-4D6E-B5BC-32EAF1866245@oracle.com/ Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-10-23 12:02:26 +05:30
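For reference, the FICLONE ioctl named in the xfs commit above is issued on the destination file with the source file descriptor as its argument. The following is a minimal userspace sketch; the helper name reflink_clone() and the file mode are illustrative assumptions, and the call only succeeds on a filesystem that supports reflink.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>   /* FICLONE */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: make dst_path share all of src_path's extents.
 * Returns 0 on success or a negative errno value on failure.
 */
int reflink_clone(const char *src_path, const char *dst_path)
{
	int src, dst, ret = 0;

	src = open(src_path, O_RDONLY);
	if (src < 0)
		return -errno;

	dst = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (dst < 0) {
		ret = -errno;
		close(src);
		return ret;
	}

	/* The ioctl goes to the destination; the source fd is the argument. */
	if (ioctl(dst, FICLONE, src) != 0)
		ret = -errno;

	close(dst);
	close(src);
	return ret;
}
```

This is the operation whose locks the commit demotes from exclusive to shared, so that reads of the source file can proceed while the clone is in progress.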
Christoph Hellwig	35dc55b9e8	xfs: handle nimaps=0 from xfs_bmapi_write in xfs_alloc_file_space If xfs_bmapi_write finds a delalloc extent at the requested range, it tries to convert the entire delalloc extent to a real allocation. But if the allocator cannot find a single free extent large enough to cover the start block of the requested range, xfs_bmapi_write will return 0 but leave *nimaps set to 0. In that case we simply need to keep looping with the same startoffset_fsb so that one of the following allocations will eventually reach the requested range. Note that this could affect any caller of xfs_bmapi_write that covers an existing delayed allocation. As far as I can tell we do not have any other such caller, though - the regular writeback path uses xfs_bmapi_convert_delalloc to convert delayed allocations to real ones, and direct I/O invalidates the page cache first. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-10-23 11:06:54 +05:30
Cheng Lin	2b99e410b2	xfs: introduce protection for drop nlink When an abnormal drop_nlink is detected on the inode, return an error to avoid propagating the corruption. Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-10-23 11:04:47 +05:30
Chandan Babu R	9fa8753aa1	xfs: CPU usage optimizations for realtime allocator [v2.3] This is version 2 of [Omar's] XFS realtime allocator optimization patch series. Changes since v1 [1]: - Fixed potential overflow in patch 4. - Changed deprecated typedefs to normal struct names - Fixed broken indentation - Used xfs_fileoff_t instead of xfs_fsblock_t where appropriate. - Added calls to xfs_rtbuf_cache_relse anywhere that the cache is used instead of relying on the buffers being dirtied and thus attached to the transaction. - Clarified comments and commit messages in a few places. - Added Darrick's Reviewed-bys. Cover letter from v1: Our distributed storage system uses XFS's realtime device support as a way to split an XFS filesystem between an SSD and an HDD -- we configure the HDD as the realtime device so that metadata goes on the SSD and data goes on the HDD. We've been running this in production for a few years now, so we have some fairly fragmented filesystems. This has exposed various CPU inefficiencies in the realtime allocator. These became even worse when we experimented with using XFS_XFLAG_EXTSIZE to force files to be allocated contiguously. This series adds several optimizations that don't change the realtime allocator's decisions, but make them happen more efficiently, mainly by avoiding redundant work. We've tested these in production and measured ~10% lower CPU utilization. Furthermore, it made it possible to use XFS_XFLAG_EXTSIZE to force contiguous allocations -- without these patches, our most fragmented systems would become unresponsive due to high CPU usage in the realtime allocator, but with them, CPU utilization is actually ~4-6% lower than before, and disk I/O utilization is 15-20% lower. Patches 2 and 3 are preparations for later optimizations; the remaining patches are the optimizations themselves. 
[1]: https://lore.kernel.org/linux-xfs/cover.1687296675.git.osandov@osandov.com/ v2.1: djwong rebased everything atop his own cleanups, added dave's rtalloc_args v2.2: rebase with new apis and clean them up too v2.3: move struct definition around for lolz With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZTFTEQAKCRBKO3ySh0YR pkbpAQD52XiQ7GXAiZSfJDStw1cr3C664AhmX/KdGtzHPmqxIQD/e8I0u1GEPezK 6qnipiiFBq/px/hqakx5t8VgHfucwwY= =XdBg -----END PGP SIGNATURE----- Merge tag 'rtalloc-speedups-6.7_2023-10-19' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.7-mergeA xfs: CPU usage optimizations for realtime allocator [v2.3] This is version 2 of [Omar's] XFS realtime allocator optimization patch series. Changes since v1 [1]: - Fixed potential overflow in patch 4. - Changed deprecated typedefs to normal struct names - Fixed broken indentation - Used xfs_fileoff_t instead of xfs_fsblock_t where appropriate. - Added calls to xfs_rtbuf_cache_relse anywhere that the cache is used instead of relying on the buffers being dirtied and thus attached to the transaction. - Clarified comments and commit messages in a few places. - Added Darrick's Reviewed-bys. Cover letter from v1: Our distributed storage system uses XFS's realtime device support as a way to split an XFS filesystem between an SSD and an HDD -- we configure the HDD as the realtime device so that metadata goes on the SSD and data goes on the HDD. We've been running this in production for a few years now, so we have some fairly fragmented filesystems. This has exposed various CPU inefficiencies in the realtime allocator. These became even worse when we experimented with using XFS_XFLAG_EXTSIZE to force files to be allocated contiguously. 
This series adds several optimizations that don't change the realtime allocator's decisions, but make them happen more efficiently, mainly by avoiding redundant work. We've tested these in production and measured ~10% lower CPU utilization. Furthermore, it made it possible to use XFS_XFLAG_EXTSIZE to force contiguous allocations -- without these patches, our most fragmented systems would become unresponsive due to high CPU usage in the realtime allocator, but with them, CPU utilization is actually ~4-6% lower than before, and disk I/O utilization is 15-20% lower. Patches 2 and 3 are preparations for later optimizations; the remaining patches are the optimizations themselves. [1]: https://lore.kernel.org/linux-xfs/cover.1687296675.git.osandov@osandov.com/ v2.1: djwong rebased everything atop his own cleanups, added dave's rtalloc_args v2.2: rebase with new apis and clean them up too v2.3: move struct definition around for lolz With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'rtalloc-speedups-6.7_2023-10-19' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: don't look for end of extent further than necessary in xfs_rtallocate_extent_near() xfs: don't try redundant allocations in xfs_rtallocate_extent_near() xfs: limit maxlen based on available space in xfs_rtallocate_extent_near() xfs: return maximum free size from xfs_rtany_summary() xfs: invert the realtime summary cache xfs: simplify rt bitmap/summary block accessor functions xfs: simplify xfs_rtbuf_get calling conventions xfs: cache last bitmap block in realtime allocator xfs: consolidate realtime allocation arguments	2023-10-23 10:59:22 +05:30
Chandan Babu R	830b4abfe2	xfs: refactor rtbitmap/summary accessors [v1.2] Since the rtbitmap and rtsummary accessor functions have proven more controversial than the rest of the macro refactoring, split the patchset into two to make review easier. v1.1: various cleanups suggested by hch v1.2: rework the accessor functions to reduce the amount of cursor tracking required, and create explicit bitmap/summary logging functions With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZTFTEQAKCRBKO3ySh0YR prDOAQD5497tQO1iiA8bdO/kHGHwUWZub45q6+AwZRjzzc5BdAD8CgvZ/8F14hGC i80DF+GsmFL95mNHwlB1FS+lhJMF8wk= =R0rf -----END PGP SIGNATURE----- Merge tag 'refactor-rtbitmap-accessors-6.7_2023-10-19' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.7-mergeA xfs: refactor rtbitmap/summary accessors [v1.2] Since the rtbitmap and rtsummary accessor functions have proven more controversial than the rest of the macro refactoring, split the patchset into two to make review easier. v1.1: various cleanups suggested by hch v1.2: rework the accessor functions to reduce the amount of cursor tracking required, and create explicit bitmap/summary logging functions With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'refactor-rtbitmap-accessors-6.7_2023-10-19' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: use accessor functions for summary info words xfs: create helpers for rtsummary block/wordcount computations xfs: use accessor functions for bitmap words xfs: create a helper to handle logging parts of rt bitmap/summary blocks	2023-10-23 10:54:45 +05:30
Chandan Babu R	035e32f752	xfs: refactor rtbitmap/summary macros [v1.1] In preparation for adding block headers and enforcing endian order in rtbitmap and rtsummary blocks, replace open-coded geometry computations and fugly macros with proper helper functions that can be typechecked. Soon we'll be needing to add more complex logic to the helpers. v1.1: various cleanups suggested by hch With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZTFTEQAKCRBKO3ySh0YR pn5HAP436EI/oXPv2LN6YgYXgbVEzhFdS4CsVE75L9Vs1Wux0wEAsKLyTNC/Re4d rhpraQBZty0/YbTSzvXS0IxUIeOq7Qk= =FzAh -----END PGP SIGNATURE----- Merge tag 'refactor-rtbitmap-macros-6.7_2023-10-19' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.7-mergeA xfs: refactor rtbitmap/summary macros [v1.1] In preparation for adding block headers and enforcing endian order in rtbitmap and rtsummary blocks, replace open-coded geometry computations and fugly macros with proper helper functions that can be typechecked. Soon we'll be needing to add more complex logic to the helpers. v1.1: various cleanups suggested by hch With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'refactor-rtbitmap-macros-6.7_2023-10-19' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: create helpers for rtbitmap block/wordcount computations xfs: convert rt summary macros to helpers xfs: convert open-coded xfs_rtword_t pointer accesses to helper xfs: remove XFS_BLOCKWSIZE and XFS_BLOCKWMASK macros xfs: convert the rtbitmap block and bit macros to static inline functions	2023-10-23 10:49:53 +05:30
Chandan Babu R	9d4ca5afa6	xfs: refactor rt extent unit conversions [v1.1] This series replaces all the open-coded integer division and multiplication conversions between rt blocks and rt extents with calls to static inline helpers. Having cleaned all that up, the helpers are augmented to skip the expensive operations in favor of bit shifts and masking if the rt extent size is a power of two. v1.1: various cleanups suggested by hch With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZTFTEQAKCRBKO3ySh0YR pn8VAPwIqtQs6t19YAuIPy+Iwo0xls+G/XWTD+l+rWurtJBGKwD/csGP7Bt0w71i UzJ+VhyFUacV4MJdyGCrkGi0tuwUjQU= =zyjM -----END PGP SIGNATURE----- Merge tag 'refactor-rt-unit-conversions-6.7_2023-10-19' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.7-mergeA xfs: refactor rt extent unit conversions [v1.1] This series replaces all the open-coded integer division and multiplication conversions between rt blocks and rt extents with calls to static inline helpers. Having cleaned all that up, the helpers are augmented to skip the expensive operations in favor of bit shifts and masking if the rt extent size is a power of two. v1.1: various cleanups suggested by hch With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'refactor-rt-unit-conversions-6.7_2023-10-19' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: use shifting and masking when converting rt extents, if possible xfs: create rt extent rounding helpers for realtime extent blocks xfs: convert do_div calls to xfs_rtb_to_rtx helper calls xfs: create helpers to convert rt block numbers to rt extent numbers xfs: create a helper to convert extlen to rtextlen xfs: create a helper to compute leftovers of realtime extents xfs: create a helper to convert rtextents to rtblocks	2023-10-23 10:45:10 +05:30
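The idea behind the rt extent unit conversion series above (use shifts and masks when the rt extent size is a power of two, and fall back to division otherwise) can be sketched in user space as follows. This is an illustration under stated assumptions, not the XFS implementation: struct rt_geom and all function names here are hypothetical and only loosely follow the helper names listed in the series.

```c
#include <stdint.h>

/* Hypothetical geometry: blocks per rt extent plus a cached log2. */
struct rt_geom {
	uint32_t rextsize; /* blocks per rt extent, must be non-zero */
	int rextlog;       /* log2(rextsize), or -1 if not a power of two */
};

struct rt_geom rt_geom_init(uint32_t rextsize)
{
	struct rt_geom g = { rextsize, -1 };

	/* A power of two has exactly one bit set. */
	if (rextsize != 0 && (rextsize & (rextsize - 1)) == 0) {
		g.rextlog = 0;
		while ((UINT32_C(1) << g.rextlog) != rextsize)
			g.rextlog++;
	}
	return g;
}

/* rt block number -> rt extent number */
uint64_t rtb_to_rtx(const struct rt_geom *g, uint64_t rtbno)
{
	if (g->rextlog >= 0)
		return rtbno >> g->rextlog;
	return rtbno / g->rextsize;
}

/* rt block number -> block offset within its rt extent */
uint64_t rtb_to_rtxoff(const struct rt_geom *g, uint64_t rtbno)
{
	if (g->rextlog >= 0)
		return rtbno & (g->rextsize - 1);
	return rtbno % g->rextsize;
}

/* rt extent number -> first rt block number of that extent */
uint64_t rtx_to_rtb(const struct rt_geom *g, uint64_t rtx)
{
	if (g->rextlog >= 0)
		return rtx << g->rextlog;
	return rtx * g->rextsize;
}
```

Caching the log2 once at setup time means callers pay for the power-of-two test only once, and both code paths return the same results for any extent size.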
Chandan Babu R	3ef52c0109	xfs: clean up realtime type usage [v1.1] The realtime code uses xfs_rtblock_t and xfs_fsblock_t in a lot of places, and it's very confusing. Clean up all the type usage so that an xfs_rtblock_t is always a block within the realtime volume, an xfs_fileoff_t is always a file offset within a realtime metadata file, and an xfs_rtxnumber_t is always a rt extent within the realtime volume. v1.1: various cleanups suggested by hch With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZTFTEQAKCRBKO3ySh0YR pkdNAP0XCeqWYWQK7QspUy2PU0fJRtVtXi44hyz3DMPrKJg4MgEAgbuvy9iUoqXX xeQ9hYBQS5N8+lSmgwRgMqTvEFTNgQU= =cExb -----END PGP SIGNATURE----- Merge tag 'clean-up-realtime-units-6.7_2023-10-19' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.7-mergeA xfs: clean up realtime type usage [v1.1] The realtime code uses xfs_rtblock_t and xfs_fsblock_t in a lot of places, and it's very confusing. Clean up all the type usage so that an xfs_rtblock_t is always a block within the realtime volume, an xfs_fileoff_t is always a file offset within a realtime metadata file, and an xfs_rtxnumber_t is always a rt extent within the realtime volume. v1.1: various cleanups suggested by hch With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'clean-up-realtime-units-6.7_2023-10-19' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: convert rt extent numbers to xfs_rtxnum_t xfs: rename xfs_verify_rtext to xfs_verify_rtbext xfs: convert rt bitmap extent lengths to xfs_rtbxlen_t xfs: convert rt bitmap/summary block numbers to xfs_fileoff_t xfs: convert xfs_extlen_t to xfs_rtxlen_t in the rt allocator xfs: move the xfs_rtbitmap.c declarations to xfs_rtbitmap.h xfs: make sure maxlen is still congruent with prod when rounding down xfs: fix units conversion error in xfs_bmap_del_extent_delay	2023-10-23 10:40:39 +05:30
Chandan Babu R	d0e85e79d6	Merge tag 'realtime-fixes-6.7_2023-10-19' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.7-mergeA xfs: minor bugfixes for rt stuff [v1.1] This is a preparatory patchset that fixes a few miscellaneous bugs before we start in on larger cleanups of realtime units usage. v1.1: various cleanups suggested by hch With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'realtime-fixes-6.7_2023-10-19' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: rt stubs should return negative errnos when rt disabled xfs: prevent rt growfs when quota is enabled xfs: hoist freeing of rt data fork extent mappings xfs: bump max fsgeom struct version	2023-10-23 10:36:35 +05:30
Namjae Jeon	0c180317c6	ksmbd: add support for surrogate pair conversion ksmbd lacks support for converting filenames that include surrogate pair characters. This triggers a "file or folder does not exist" error in Windows clients. [Steps to Reproduce for bug] 1. Create surrogate pair file touch $(echo -e '\xf0\x9d\x9f\xa3') touch $(echo -e '\xf0\x9d\x9f\xa4') 2. Try to open these files in ksmbd share through Windows client. This patch updates the unicode functions not to consider surrogate pairs (and IVS). Reviewed-by: Marios Makassikis <mmakassikis@freebox.fr> Tested-by: Marios Makassikis <mmakassikis@freebox.fr> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2023-10-22 19:06:27 -05:00
Kangjing Huang	ecce70cf17	ksmbd: fix missing RDMA-capable flag for IPoIB device in ksmbd_rdma_capable_netdev() Physical ib_device does not have an underlying net_device, thus its association with IPoIB net_device cannot be retrieved via ops.get_netdev() or ib_device_get_by_netdev(). ksmbd reads the physical ib_device port GUID from the lower 16 bytes of the hardware addresses on the IPoIB net_device and matches its underlying ib_device using ib_find_gid(). Signed-off-by: Kangjing Huang <huangkangjing@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2023-10-22 19:06:27 -05:00
Marios Makassikis	807252f028	ksmbd: fix recursive locking in vfs helpers Running smb2.rename test from Samba smbtorture suite against a kernel built with lockdep triggers a "possible recursive locking detected" warning. This is because mnt_want_write() is called twice with no mnt_drop_write() in between: -> ksmbd_vfs_mkdir() -> ksmbd_vfs_kern_path_create() -> kern_path_create() -> filename_create() -> mnt_want_write() -> mnt_want_write() Fix this by removing the mnt_want_write/mnt_drop_write calls from vfs helpers that call kern_path_create(). Full lockdep trace below: ============================================ WARNING: possible recursive locking detected 6.6.0-rc5 #775 Not tainted -------------------------------------------- kworker/1:1/32 is trying to acquire lock: ffff888005ac83f8 (sb_writers#5){.+.+}-{0:0}, at: ksmbd_vfs_mkdir+0xe1/0x410 but task is already holding lock: ffff888005ac83f8 (sb_writers#5){.+.+}-{0:0}, at: filename_create+0xb6/0x260 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(sb_writers#5); lock(sb_writers#5); * DEADLOCK * May be due to missing lock nesting notation 4 locks held by kworker/1:1/32: #0: ffff8880064e4138 ((wq_completion)ksmbd-io){+.+.}-{0:0}, at: process_one_work+0x40e/0x980 #1: ffff888005b0fdd0 ((work_completion)(&work->work)){+.+.}-{0:0}, at: process_one_work+0x40e/0x980 #2: ffff888005ac83f8 (sb_writers#5){.+.+}-{0:0}, at: filename_create+0xb6/0x260 #3: ffff8880057ce760 (&type->i_mutex_dir_key#3/1){+.+.}-{3:3}, at: filename_create+0x123/0x260 Cc: stable@vger.kernel.org Fixes: `40b268d384` ("ksmbd: add mnt_want_write to ksmbd vfs functions") Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2023-10-22 19:06:27 -05:00
Namjae Jeon	3354db6688	ksmbd: fix kernel-doc comment of ksmbd_vfs_setxattr() Fix the argument list in the kernel-doc comment of ksmbd_vfs_setxattr() to clear the warning reported by the kernel-doc script: fs/smb/server/vfs.c:929: warning: Function parameter or member 'path' not described in 'ksmbd_vfs_setxattr' Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2023-10-22 19:06:27 -05:00
Namjae Jeon	1819a90429	ksmbd: reorganize ksmbd_iov_pin_rsp() If ksmbd_iov_pin_rsp() fails, the io vector should be rolled back. This patch moves memory allocations to before setting the io vector to avoid rollbacks. Fixes: `e2b76ab8b5` ("ksmbd: add support for read compound") Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2023-10-22 19:06:27 -05:00
Cheng-Han Wu	eacc655e18	ksmbd: Remove unused field in ksmbd_user struct fs/smb/server/mgmt/user_config.h:21: Remove the unused field 'failed_login_count' from the ksmbd_user struct. Signed-off-by: Cheng-Han Wu <hank20010209@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2023-10-22 19:06:27 -05:00
Meetakshi Setiya	1460720c59	cifs: Add client version details to NTLM authenticate message The NTLM authenticate message currently sets the NTLMSSP_NEGOTIATE_VERSION flag but does not populate the VERSION structure. This commit fixes this bug by ensuring that the flag is set and the version details are included in the message. Signed-off-by: Meetakshi Setiya <msetiya@microsoft.com> Reviewed-by: Bharath SM <bharathsm@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2023-10-22 19:03:42 -05:00
Steve French	475efd9808	smb3: fix touch -h of symlink For example: touch -h -t 02011200 testfile where testfile is a symlink would not change the timestamp, but touch -t 02011200 testfile does work to change the timestamp of the target Suggested-by: David Howells <dhowells@redhat.com> Reported-by: Micah Veilleux <micah.veilleux@iba-group.com> Closes: https://bugzilla.samba.org/show_bug.cgi?id=14476 Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2023-10-22 19:03:41 -05:00
Trond Myklebust	6e7434abcd	NFSv4/pnfs: Allow layoutget to return EAGAIN for softerr mounts If we're using the 'softerr' mount option, we may want to allow layoutget to return EAGAIN to allow knfsd server threads to return a JUKEBOX/DELAY error to the client instead of busy waiting. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2023-10-22 19:47:56 -04:00
Trond Myklebust	5b9d31ae1c	NFSv4: Add a parameter to limit the number of retries after NFS4ERR_DELAY When using a 'softerr' mount, the NFSv4 client can get stuck waiting forever while the server just returns NFS4ERR_DELAY. Among other things, this causes the knfsd server threads to busy wait. Add a parameter that tells the NFSv4 client how many times to retry before giving up. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2023-10-22 19:47:56 -04:00
Kees Cook	7413ab70cb	bcachefs: Refactor memcpy into direct assignment The memcpy() in bch2_bkey_append_ptr() is operating on an embedded fake flexible array which looks to the compiler like it has 0 size. This causes W=1 builds to emit warnings due to -Wstringop-overflow: In file included from include/linux/string.h:254, from include/linux/bitmap.h:11, from include/linux/cpumask.h:12, from include/linux/smp.h:13, from include/linux/lockdep.h:14, from include/linux/radix-tree.h:14, from include/linux/backing-dev-defs.h:6, from fs/bcachefs/bcachefs.h:182: fs/bcachefs/extents.c: In function 'bch2_bkey_append_ptr': include/linux/fortify-string.h:57:33: warning: writing 8 bytes into a region of size 0 [-Wstringop-overflow=] 57 \| #define __underlying_memcpy __builtin_memcpy \| ^ include/linux/fortify-string.h:648:9: note: in expansion of macro '__underlying_memcpy' 648 \| __underlying_##op(p, q, __fortify_size); \ \| ^~~~~~~~~~~~~ include/linux/fortify-string.h:693:26: note: in expansion of macro '__fortify_memcpy_chk' 693 \| #define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \ \| ^~~~~~~~~~~~~~~~~~~~ fs/bcachefs/extents.c:235:17: note: in expansion of macro 'memcpy' 235 \| memcpy((void *) &k->v + bkey_val_bytes(&k->k), \| ^~~~~~ fs/bcachefs/bcachefs_format.h:287:33: note: destination object 'v' of size 0 287 \| struct bch_val v; \| ^ Avoid making any structure changes and just replace the u64 copy into a direct assignment, side-stepping the entire problem. Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Brian Foster <bfoster@redhat.com> Cc: linux-bcachefs@vger.kernel.org Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202309192314.VBsjiIm5-lkp@intel.com/ Link: https://lore.kernel.org/r/20231010235609.work.594-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:16 -04:00
Kent Overstreet	795413c548	bcachefs: Fix drop_alloc_keys() For consistency with the rest of the reconstruct_alloc option, we should be skipping all alloc keys. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:16 -04:00
Kent Overstreet	37fad9497f	bcachefs: snapshot_create_lock Add a new lock for snapshot creation - this addresses a few races with logged operations and snapshot deletion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:16 -04:00
Kent Overstreet	1e2d399970	bcachefs: Fix snapshot skiplists during snapshot deletion In snapshot deletion, we have to pick new skiplist nodes for entries that point to nodes being deleted. The function that finds a new skiplist node, skipping over entries being deleted, was incorrect: if n = 0, but the parent node is being deleted, we also need to skip over that node. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:16 -04:00
Kent Overstreet	4637429e39	bcachefs: bch2_sb_field_get() refactoring Instead of using token pasting to generate methods for each superblock section, just make the type a parameter to bch2_sb_field_get(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:16 -04:00
Kent Overstreet	be47e0ba4f	bcachefs: KEY_TYPE_error now counts towards i_sectors KEY_TYPE_error is used when all replicas in an extent are marked as failed; it indicates that data was present, but has been lost. So that i_sectors doesn't change when replacing extents with KEY_TYPE_error, we now have to count error keys as allocations - this fixes fsck errors later. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:16 -04:00
Kent Overstreet	6929d5e74e	bcachefs: Fix handling of unknown bkey types min_val_size was U8_MAX for unknown key types, causing us to flag any known key as invalid - it should have been 0. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:16 -04:00
Kent Overstreet	88d39fd544	bcachefs: Switch to unsafe_memcpy() in a few places The new fortify checking doesn't work for us in all places; this switches to unsafe_memcpy() where appropriate to silence a few warnings/errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:16 -04:00
Christophe JAILLET	c2d81c2412	bcachefs: Use struct_size() Use struct_size() instead of hand writing it. This is less verbose and more robust. While at it, prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:16 -04:00
Kent Overstreet	69d1f052d1	bcachefs: Correctly initialize new buckets on device resize bch2_dev_resize() was never updated for the allocator rewrite with persistent freelists, and it wasn't noticed because the tests weren't running fsck - oops. Fix this by running bch2_dev_freespace_init() for the new buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:16 -04:00
Kent Overstreet	4fc1f402c6	bcachefs: Fix another smatch complaint This should be harmless, but initialize last_seq anyways. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:16 -04:00
Kent Overstreet	dc08c661a2	bcachefs: Use strsep() in split_devs() Minor refactoring to fix a smatch complaint. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:15 -04:00
Hunter Shaffer	40f7914e8d	bcachefs: Add iops fields to bch_member Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:15 -04:00
Hunter Shaffer	9af26120f0	bcachefs: Rename bch_sb_field_members -> bch_sb_field_members_v1 Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:15 -04:00
Hunter Shaffer	3f7b9713da	bcachefs: New superblock section members_v2 members_v2 has dynamically resizable entries so that we can extend bch_member. The members can no longer be accessed with simple array indexing. Instead, members_v2_get is used to find a member's exact location within the array and returns a copy of that member. Alternatively, member_v2_get_mut retrieves a mutable pointer to a member. Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:15 -04:00
Hunter Shaffer	1241df5872	bcachefs: Add new helper to retrieve bch_member from sb Prep work for introducing bch_sb_field_members_v2 - introduce new helpers that will check for members_v2 if it exists, otherwise using v1 Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:15 -04:00
Kent Overstreet	73bbeaa2de	bcachefs: bucket_lock() is now a sleepable lock fsck_err() may sleep - it takes a mutex and may allocate memory, so bucket_lock() needs to be a sleepable lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:15 -04:00
Brian Foster	3c40841cdc	bcachefs: fix crc32c checksum merge byte order problem An fsstress task on a big endian system (s390x) quickly produces a bunch of CRC errors in the system logs. Most of these are related to the narrow CRCs path, but the fundamental problem can be reduced to a single write and re-read (after dropping caches) of a previously merged extent. The key merge path that handles extent merges eventually calls into bch2_checksum_merge() to combine the CRCs of the associated extents. This code attempts to avoid a byte order swap by feeding the le64 values into the crc32c code, but the latter casts the resulting u64 value down to a u32, which truncates the high bytes where the actual crc value ends up. This results in a CRC value that does not change (since it is merged with a CRC of 0), and checksum failures ensue. Fix the checksum merge code to swap to cpu byte order on the boundaries to the external crc code such that any value casting is handled properly. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:15 -04:00
Kent Overstreet	4220666398	bcachefs: Fix bch2_inode_delete_keys() bch2_inode_delete_keys() was using BTREE_ITER_NOT_EXTENTS, on the assumption that it would never need to split extents. But that caused a race with extents being split by other threads - specifically, the data move path. Extents iterators have the iterator position pointing to the start of the extent, which avoids the race. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:15 -04:00
Kent Overstreet	7dcf62c06d	bcachefs: Make btree root read errors recoverable The entire btree will be lost, but that is better than the entire filesystem not being recoverable. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:15 -04:00
Kent Overstreet	1ee608c65d	bcachefs: Fall back to requesting passphrase directly We can only do this in userspace, unfortunately - but kernel keyrings have never seemed to work reliably, so this is a useful fallback. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:15 -04:00
Kent Overstreet	d281701b00	bcachefs: Fix looping around bch2_propagate_key_to_snapshot_leaves() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:15 -04:00
Kent Overstreet	d2a990d1b1	bcachefs: bch_err_msg(), bch_err_fn() now filters out transaction restart errors These errors aren't actual errors, and should never be printed - do this in the common helpers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:15 -04:00
Kent Overstreet	a190cbcfa0	bcachefs: Silence transaction restart error message Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:15 -04:00
Kent Overstreet	1e3b40980b	bcachefs: More assertions for nocow locking - assert in shutdown path that no nocow locks are held - check for overflow when taking nocow locks Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:15 -04:00
Kent Overstreet	efedfc2ece	bcachefs: nocow locking: Fix lock leak Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:15 -04:00
Kent Overstreet	793a06d984	bcachefs: Fixes for building in userspace Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:15 -04:00
Kent Overstreet	03ef80b469	bcachefs: Ignore unknown mount options This makes mount option handling consistent with other filesystems - options may be handled at different layers, so an option we don't know about might not be intended for us. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:15 -04:00
Kent Overstreet	b560e32ef7	bcachefs: Always check for invalid bkeys in main commit path Previously, we would check for invalid bkeys at transaction commit time, but only if CONFIG_BCACHEFS_DEBUG=y. This check is important enough to always be on - it appears there's been corruption making it into the journal that would have been caught by it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:15 -04:00
Kent Overstreet	eebe8a8459	bcachefs: Make sure to initialize equiv when creating new snapshots Previously, equiv was set in the snapshot deletion path, which is where it's needed - equiv, for snapshot ID equivalence classes, would ideally be a private data structure to the snapshot deletion path. But if a new snapshot is created while snapshot deletion is running, move_key_to_correct_snapshot() moves a key to snapshot id 0 - oops. Fixes: https://github.com/koverstreet/bcachefs/issues/593 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:15 -04:00
Kent Overstreet	82142a5541	bcachefs: Fix a null ptr deref in bch2_get_alloc_in_memory_pos() Reported-by: smatch Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:15 -04:00
Torge Matthies	d8b6f8c3c6	bcachefs: Fix changing durability using sysfs Signed-off-by: Torge Matthies <openglfreak@googlemail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:15 -04:00
Brian Foster	7239f8e0ee	bcachefs: initial freeze/unfreeze support Initial support for the vfs superblock freeze and unfreeze operations. Superblock freeze occurs in stages, where the vfs attempts to quiesce high level write operations, page faults, fs internal operations, and then finally calls into the filesystem for any last stage steps (i.e. log flushing, etc.) before marking the superblock frozen. The majority of write paths are covered by freeze protection (i.e. sb_start_write() and friends) in higher level common code, with the exception of the fs-internal SB_FREEZE_FS stage (i.e. sb_start_intwrite()). This typically maps to active filesystem transactions in a manner that allows the vfs to implement a barrier of internal fs operations during the freeze sequence. This is not a viable model for bcachefs, however, because it utilizes transactions both to populate the journal as well as to perform journal reclaim. This means that mapping intwrite protection to transaction lifecycle or transaction commit is likely to deadlock freeze, as quiescing the journal requires transactional operations blocked by the final stage of freeze. The flipside of this is that bcachefs does already maintain its own internal sets of write references for similar purposes, currently utilized for transitions from read-write to read-only mode. Since this largely mirrors the high level sequence involved with freeze, we can simply invoke this mechanism in the freeze callback to fully quiesce the filesystem in the final stage. This means that while the SB_FREEZE_FS stage is essentially a no-op, the ->freeze_fs() callback that immediately follows begins by performing effectively the same step by quiescing all internal write references. One caveat to this approach is that without integration of internal freeze protection, write operations gated on internal write refs will fail with an internal -EROFS error rather than block on acquiring freeze protection. 
IOW, this is roughly equivalent to only having support for sb_start_intwrite_trylock(), and not the blocking variant. Many of these paths already use non-blocking internal write refs and so would map into an sb_start_intwrite_trylock() anyways. The only instance of this I've been able to uncover that doesn't explicitly rely on a higher level non-blocking write ref is the bch2_rbio_narrow_crcs() path, which updates crcs in certain read cases, and Kent has pointed out isn't critical if it happens to fail due to read-only status. Given that, implement basic freeze support as described above and leave tighter integration with internal freeze protection as a possible future enhancement. There are multiple potential ideas worth exploring here. For example, we could implement a multi-stage freeze callback that might allow bcachefs to quiesce its internal write references without deadlocks, we could integrate intwrite protection with bcachefs' internal write references somehow or another, or perhaps consider implementing blocking support for internal write refs to be used specifically for freeze, etc. In the meantime, this enables functional freeze support and the associated test coverage that comes with it. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:15 -04:00
Kent Overstreet	40a53b9215	bcachefs: More minor smatch fixes - fix a few uninitialized return values - return a proper error code in lookup_lostfound() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:14 -04:00
Kent Overstreet	51c801bc64	bcachefs: Minor bch2_btree_node_get() smatch fixes - it's no longer possible for trans to be NULL - also, move "wait for read to complete" to the slowpath, __bch2_btree_node_get(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:14 -04:00
Kent Overstreet	d04fdf5c10	bcachefs: snapshots: Use kvfree_rcu_mightsleep() kvfree_rcu() was renamed - not removed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:14 -04:00
Kent Overstreet	97ecc23632	bcachefs: Fix strndup_user() error checking strndup_user() returns an error pointer, not NULL. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:14 -04:00
Kent Overstreet	cfda31c033	bcachefs: drop journal lock before calling journal_write bch2_journal_write() expects process context, it takes journal_lock as needed. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:14 -04:00
Kent Overstreet	4b33a1916a	bcachefs: bch2_ioctl_disk_resize_journal(): check for integer truncation Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:14 -04:00
Kent Overstreet	75e0c4789b	bcachefs: Fix error checks in bch2_chacha_encrypt_key() crypto_alloc_sync_skcipher() returns an ERR_PTR, not NULL. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:14 -04:00
Kent Overstreet	a55fc65eb2	bcachefs: Fix an overflow check When bucket sector counts were changed from u16s to u32s, a few things were missed. This fixes an overflow check, and a truncation that prevented the overflow check from firing. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:14 -04:00
Kent Overstreet	f7f6943a8c	bcachefs: Fix copy_to_user() usage in flush_buf() copy_to_user() returns the number of bytes that could not be copied - not an errcode. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:14 -04:00
Brian Foster	3e55189b50	bcachefs: fix race between journal entry close and pin set bcachefs freeze testing via fstests generic/390 occasionally reproduces the following BUG from bch2_fs_read_only(): BUG_ON(atomic_long_read(&c->btree_key_cache.nr_dirty)); This indicates that one or more dirty key cache keys still exist after the attempt to flush and quiesce the fs. The sequence that leads to this problem actually occurs on unfreeze (ro->rw), and looks something like the following: - Task A begins a transaction commit and acquires journal_res for the current seq. This transaction intends to perform key cache insertion. - Task B begins a bch2_journal_flush() via bch2_sync_fs(). This ends up in journal_entry_want_write(), which closes the current journal entry and drops the reference to the pin list created on entry open. The pin put pops the front of the journal via fast reclaim since the reference count has dropped to 0. - Task A attempts to set the journal pin for the associated cached key, but bch2_journal_pin_set() skips the pin insert because the seq of the transaction reservation is behind the front of the pin list fifo. The end result is that the pin associated with the cached key is not added, which prevents a subsequent reclaim from processing the key and thus leaves it dangling at freeze time. The fundamental cause of this problem is that the front of the journal is allowed to pop before a transaction with outstanding reservation on the associated journal seq is able to add a pin. The count for the pin list associated with the seq drops to zero and is prematurely reclaimed as a result. The logical fix for this problem lies in how the journal buffer is managed in similar scenarios where the entry might have been closed before a transaction with outstanding reservations happens to be committed. 
When a journal entry is opened, the current sequence number is bumped, the associated pin list is initialized with a reference count of 1, and the journal buffer reference count is bumped (via journal_state_inc()). When a journal reservation is acquired, the reservation also acquires a reference on the associated buffer. If the journal entry is closed in the meantime, it drops both the pin and buffer references held by the open entry, but the buffer still has references held by outstanding reservation. After the associated transaction commits, the reservation release drops the associated buffer references and the buffer is written out once the reference count has dropped to zero. The fundamental problem here is that the lifecycle of the pin list reference held by an open journal entry is too short to cover the processing of transactions with outstanding reservations. The simplest way to address this is to expand the pin list reference to the lifecycle of the buffer vs. the shorter lifecycle of the open journal entry. This ensures the pin list for a seq with outstanding reservation cannot be popped and reclaimed before all outstanding reservations have been released, even if the associated journal entry has been closed for further reservations. Move the pin put from journal entry close to where final processing of the journal buffer occurs. Create a duplicate helper to cover the case where the caller doesn't already hold the journal lock. This allows generic/390 to pass reliably. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:14 -04:00
Brian Foster	fc08031bb8	bcachefs: prepare journal buf put to handle pin put bcachefs freeze testing has uncovered some raciness between journal entry open/close and pin list reference count management. The details of the problem are described in a separate patch. In preparation for the associated fix, refactor the journal buffer put path a bit to allow it to eventually handle dropping the pin list reference currently held by an open journal entry. Retain the journal write dispatch helper since the closure code is inlined and we don't want to increase the amount of inline code in the transaction commit path, but rename the function to reflect the purpose of final processing of the journal buffer. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:14 -04:00
Brian Foster	92b63f5bf0	bcachefs: refactor pin put helpers We have a couple journal pin put helpers to handle cases where the journal lock is already held or not. Refactor the helpers to lock and reclaim from the highest level and open code the reclaim from the one caller of the internal variant. The latter call will be moved into the journal buf release helper in a later patch. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:14 -04:00
Dan Carpenter	d67a72bfc9	bcachefs: snapshot: Add missing assignment in bch2_delete_dead_snapshots() This code accidentally left out the "ret = " assignment so the errors from for_each_btree_key2() are not checked. Fixes: 53534482a250 ("bcachefs: for_each_btree_key2()") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:14 -04:00
Dan Carpenter	1f12900ab5	bcachefs: fs-ioctl: Fix copy_to_user() error code The copy_to_user() function returns the number of bytes that it wasn't able to copy but we want to return -EFAULT to the user. Fixes: e0750d947352 ("bcachefs: Initial commit") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:14 -04:00
Dan Carpenter	b6c22147e0	bcachefs: acl: Add missing check in bch2_acl_chmod() The "ret = bkey_err(k);" assignment was accidentally left out so the call to bch2_btree_iter_peek_slot() is not checked for errors. Fixes: 53306e096d91 ("bcachefs: Always check for transaction restarts") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:14 -04:00
Dan Carpenter	e9a0a26ed0	bcachefs: acl: Uninitialized variable in bch2_acl_chmod() The clean up code at the end of the function uses "acl" so it needs to be initialized to NULL. Fixes: 53306e096d91 ("bcachefs: Always check for transaction restarts") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:14 -04:00
Nick Desaulniers	265cc42315	bcachefs: Fix -Wself-assign Fixes the following observed error reported by Nathan on IRC. fs/bcachefs/io_misc.c:467:6: error: explicitly assigning value of variable of type 'int' to itself [-Werror,-Wself-assign] 467 \| ret = ret; \| ~~~ ^ ~~~ Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:14 -04:00
Jiapeng Chong	3b59fbec86	bcachefs: Remove duplicate include ./fs/bcachefs/btree_update.h: journal.h is included more than once. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6573 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:14 -04:00
Dan Carpenter	867c1fe018	bcachefs: fix error checking in bch2_fs_alloc() There is a typo here where it uses ";" instead of "?:". The result is that bch2_fs_fs_io_direct_init() is called unconditionally and the errors from it are not checked. Fixes: 0060c68159fc ("bcachefs: Split up fs-io.[ch]") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reviewed-by: Brian Foster <bfoster@redhat.com>	2023-10-22 17:10:14 -04:00
Dan Carpenter	4ba985b84d	bcachefs: chardev: fix an integer overflow (32 bit only) On 32 bit systems, "sizeof(*arg) + replica_entries_bytes" can have an integer overflow leading to memory corruption. Use size_add() to prevent this. Fixes: b44dd3797034 ("bcachefs: Redo filesystem usage ioctls") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:14 -04:00
Dan Carpenter	301e0237ca	bcachefs: chardev: return -EFAULT if copy_to_user() fails The copy_to_user() function returns the number of bytes remaining but we want to return -EFAULT to the user. Fixes: e0750d947352 ("bcachefs: Initial commit") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:14 -04:00
Kent Overstreet	8c2d82a6fe	bcachefs: Change bucket_lock() to use bit_spin_lock() bucket_lock() previously open coded a spinlock, because we need to cram a spinlock into a single byte. But it turns out not all archs support xchg() on a single byte; since we need struct bucket to be small, this means we have to play fun games with casts and ifdefs for endianness. This fixes building on 32 bit arm, and likely other architectures. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Cc: linux-bcachefs@vger.kernel.org Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:14 -04:00
Kent Overstreet	439c172bc7	bcachefs: Kill other unreachable() uses Per previous commit, bare unreachable() considered harmful, convert to BUG() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Josh Poimboeuf	3764647b25	bcachefs: Remove undefined behavior in bch2_dev_buckets_reserved() In general it's a good idea to avoid using bare unreachable() because it introduces undefined behavior in compiled code. In this case it even confuses GCC into emitting an empty unused bch2_dev_buckets_reserved.part.0() function. Use BUG() instead, which is nice and defined. While in theory it should never trigger, if something were to go awry and the BCH_WATERMARK_NR case were to actually hit, the failure mode is much more robust. Fixes the following warnings: vmlinux.o: warning: objtool: bch2_bucket_alloc_trans() falls through to next function bch2_reset_alloc_cursors() vmlinux.o: warning: objtool: bch2_dev_buckets_reserved.part.0() is missing an ELF size annotation Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Christophe JAILLET	0198b2356b	bcachefs: Remove a redundant and harmless bch2_free_super() call Remove a redundant call to bch2_free_super(). This is harmless because bch2_free_super() has a memset() at its end. So a second call would only lead to kfree(NULL). Remove the redundant call and only rely on the error handling path. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Christophe JAILLET	71933fb69b	bcachefs: Fix use-after-free in bch2_dev_add() If __bch2_dev_attach_bdev() fails, bch2_dev_free() is called twice. Once here and another time in the error handling path. This leads to several use-after-free. Remove the redundant call and only rely on the error handling path. Fixes: 6a44735653d4 ("bcachefs: Improved superblock-related error messages") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Brian Foster	a9737e0b38	bcachefs: add module description to fix modpost warning modpost produces the following warning: WARNING: modpost: missing MODULE_DESCRIPTION() in fs/bcachefs/bcachefs.o Add a module description for bcachefs. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Kent Overstreet	6bd68ec266	bcachefs: Heap allocate btree_trans We're using more stack than we'd like in a number of functions, and btree_trans is the biggest object that we stack allocate. But we have to do a heap allocation to initialize it anyways, so there's no real downside to heap allocating the entire thing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Kent Overstreet	96dea3d599	bcachefs: Fix W=12 build errors Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Yang Li	b5e85d4d0c	bcachefs: Remove unneeded semicolon ./fs/bcachefs/btree_gc.c:1249:2-3: Unneeded semicolon ./fs/bcachefs/btree_gc.c:1521:2-3: Unneeded semicolon ./fs/bcachefs/btree_gc.c:1575:2-3: Unneeded semicolon ./fs/bcachefs/counters.c:46:2-3: Unneeded semicolon Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Kent Overstreet	7bba0dc6fc	bcachefs: Add a missing prefetch include Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Nathan Chancellor	e82f5f40f2	bcachefs: Fix -Wcompare-distinct-pointer-types in bch2_copygc_get_buckets() When building bcachefs for 32-bit ARM, there is a warning when using max() to compare an expression involving 'size_t' with an 'unsigned long' literal: fs/bcachefs/movinggc.c:159:21: error: comparison of distinct pointer types ('typeof (16UL) *' (aka 'unsigned long *') and 'typeof (buckets_in_flight->nr / 4) *' (aka 'unsigned int *')) [-Werror,-Wcompare-distinct-pointer-types] 159 \| size_t nr_to_get = max(16UL, buckets_in_flight->nr / 4); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/minmax.h:76:19: note: expanded from macro 'max' 76 \| #define max(x, y) __careful_cmp(x, y, >) \| ^~~~~~~~~~~~~~~~~~~~~~ include/linux/minmax.h:38:24: note: expanded from macro '__careful_cmp' 38 \| __builtin_choose_expr(__safe_cmp(x, y), \ \| ^~~~~~~~~~~~~~~~ include/linux/minmax.h:28:4: note: expanded from macro '__safe_cmp' 28 \| (__typecheck(x, y) && __no_side_effects(x, y)) \| ^~~~~~~~~~~~~~~~~ include/linux/minmax.h:22:28: note: expanded from macro '__typecheck' 22 \| (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) \| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~ 1 error generated. On 64-bit architectures, size_t is 'unsigned long', so there is no warning when comparing these two expressions. Use max_t(size_t, ...) for this situation, eliminating the warning. Fixes: dd49018737d4 ("bcachefs: Rhashtable based buckets_in_flight for copygc") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Nathan Chancellor	53eda6f713	bcachefs: Fix -Wcompare-distinct-pointer-types in do_encrypt() When building bcachefs for 32-bit ARM, there is a warning when using min() to compare a variable of type 'size_t' with an expression of type 'unsigned long': fs/bcachefs/checksum.c:142:22: error: comparison of distinct pointer types ('typeof (len) *' (aka 'unsigned int *') and 'typeof (((1UL) << 12) - offset) *' (aka 'unsigned long *')) [-Werror,-Wcompare-distinct-pointer-types] 142 \| unsigned pg_len = min(len, PAGE_SIZE - offset); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/minmax.h:69:19: note: expanded from macro 'min' 69 \| #define min(x, y) __careful_cmp(x, y, <) \| ^~~~~~~~~~~~~~~~~~~~~~ include/linux/minmax.h:38:24: note: expanded from macro '__careful_cmp' 38 \| __builtin_choose_expr(__safe_cmp(x, y), \ \| ^~~~~~~~~~~~~~~~ include/linux/minmax.h:28:4: note: expanded from macro '__safe_cmp' 28 \| (__typecheck(x, y) && __no_side_effects(x, y)) \| ^~~~~~~~~~~~~~~~~ include/linux/minmax.h:22:28: note: expanded from macro '__typecheck' 22 \| (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) \| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~ 1 error generated. On 64-bit architectures, size_t is 'unsigned long', so there is no warning when comparing these two expressions. Use min_t(size_t, ...) for this situation, eliminating the warning. Fixes: 1fb50457684f ("bcachefs: Fix memory corruption in encryption path") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Nathan Chancellor	1f70225d77	bcachefs: Fix -Wincompatible-function-pointer-types-strict from key_invalid callbacks When building bcachefs with -Wincompatible-function-pointer-types-strict, a clang warning designed to catch issues with mismatched function pointer types, which will be fatal at runtime due to kernel Control Flow Integrity (kCFI), there are several instances along the lines of: fs/bcachefs/bkey_methods.c:118:2: error: incompatible function pointer types initializing 'int (*)(const struct bch_fs *, struct bkey_s_c, enum bkey_invalid_flags, struct printbuf *)' with an expression of type 'int (const struct bch_fs *, struct bkey_s_c, unsigned int, struct printbuf *)' [-Werror,-Wincompatible-function-pointer-types-strict] 118 \| BCH_BKEY_TYPES() \| ^~~~~~~~~~~~~~~~ fs/bcachefs/bcachefs_format.h:342:2: note: expanded from macro 'BCH_BKEY_TYPES' 342 \| x(deleted, 0) \ \| ^~~~~~~~~~~~~~~~~~~~~~~~~~ fs/bcachefs/bkey_methods.c:117:41: note: expanded from macro 'x' 117 \| #define x(name, nr) [KEY_TYPE_##name] = bch2_bkey_ops_##name, \| ^~~~~~~~~~~~~~~~~~~~ <scratch space>:206:1: note: expanded from here 206 \| bch2_bkey_ops_deleted \| ^~~~~~~~~~~~~~~~~~~~~ fs/bcachefs/bkey_methods.c:34:17: note: expanded from macro 'bch2_bkey_ops_deleted' 34 \| .key_invalid = deleted_key_invalid, \ \| ^~~~~~~~~~~~~~~~~~~ The flags parameter should be of type 'enum bkey_invalid_flags', not 'unsigned int'. Adjust the type everywhere so that there is no more warning. Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Nathan Chancellor	0940863fd2	bcachefs: Fix -Wformat in bch2_bucket_gens_invalid() When building bcachefs for 32-bit ARM, there is a compiler warning in bch2_bucket_gens_invalid() due to use of an incorrect format specifier: fs/bcachefs/alloc_background.c:530:10: error: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Werror,-Wformat] 529 \| prt_printf(err, "bad val size (%lu != %zu)", \| ~~~ \| %zu 530 \| bkey_val_bytes(k.k), sizeof(struct bch_bucket_gens)); \| ^~~~~~~~~~~~~~~~~~~ fs/bcachefs/util.h:223:54: note: expanded from macro 'prt_printf' 223 \| #define prt_printf(_out, ...) bch2_prt_printf(_out, __VA_ARGS__) \| ^~~~~~~~~~~ On 64-bit architectures, size_t is 'unsigned long', so there is no warning when using %lu but on 32-bit architectures, size_t is 'unsigned int'. Use '%zu', the format specifier for 'size_t', to eliminate the warning. Fixes: 4be0d766a7e9 ("bcachefs: bucket_gens btree") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Nathan Chancellor	14f63ff3f6	bcachefs: Fix -Wformat in bch2_alloc_v4_invalid() When building bcachefs for 32-bit ARM, there is a compiler warning in bch2_alloc_v4_invalid() due to use of an incorrect format specifier: fs/bcachefs/alloc_background.c:246:30: error: format specifies type 'unsigned long' but the argument has type 'unsigned int' [-Werror,-Wformat] 245 \| prt_printf(err, "bad val size (%u > %lu)", \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ \| %u 246 \| alloc_v4_u64s(a.v), bkey_val_u64s(k.k)); \| ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~ fs/bcachefs/bkey.h:58:27: note: expanded from macro 'bkey_val_u64s' 58 \| #define bkey_val_u64s(_k) ((_k)->u64s - BKEY_U64s) \| ^ fs/bcachefs/util.h:223:54: note: expanded from macro 'prt_printf' 223 \| #define prt_printf(_out, ...) bch2_prt_printf(_out, __VA_ARGS__) \| ^~~~~~~~~~~ This expression is of type 'size_t'. On 64-bit architectures, size_t is 'unsigned long', so there is no warning when using %lu but on 32-bit architectures, size_t is 'unsigned int'. Use '%zu', the format specifier for 'size_t' to eliminate the warning. Fixes: 11be8e8db283 ("bcachefs: New on disk format: Backpointers") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Nathan Chancellor	f7ed15eb17	bcachefs: Fix -Wformat in bch2_btree_key_cache_to_text() When building bcachefs for 32-bit ARM, there is a compiler warning in bch2_btree_key_cache_to_text() due to use of an incorrect format specifier: fs/bcachefs/btree_key_cache.c:1060:36: error: format specifies type 'size_t' (aka 'unsigned int') but the argument has type 'long' [-Werror,-Wformat] 1060 \| prt_printf(out, "nr_freed:\t%zu", atomic_long_read(&c->nr_freed)); \| ~~~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ \| %ld fs/bcachefs/util.h:223:54: note: expanded from macro 'prt_printf' 223 \| #define prt_printf(_out, ...) bch2_prt_printf(_out, __VA_ARGS__) \| ^~~~~~~~~~~ 1 error generated. On 64-bit architectures, size_t is 'unsigned long', so there is no warning when using %zu but on 32-bit architectures, size_t is 'unsigned int'. Use '%lu' to match the other format specifiers used in this function for printing values returned from atomic_long_read(). Fixes: 6d799930ce0f ("bcachefs: btree key cache pcpu freedlist") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Nathan Chancellor	fac1250a8c	bcachefs: Fix -Wformat in bch2_set_bucket_needs_journal_commit() When building bcachefs for 32-bit ARM, there is a compiler warning in bch2_set_bucket_needs_journal_commit() due to a debug print using the wrong specifier: fs/bcachefs/buckets_waiting_for_journal.c:137:30: error: format specifies type 'size_t' (aka 'unsigned int') but the argument has type 'unsigned long' [-Werror,-Wformat] 136 \| pr_debug("took %zu rehashes, table at %zu/%zu elements", \| ~~~ \| %lu 137 \| nr_rehashes, nr_elements, 1UL << b->t->bits); \| ^~~~~~~~~~~~~~~~~ include/linux/printk.h:579:26: note: expanded from macro 'pr_debug' 579 \| dynamic_pr_debug(fmt, ##__VA_ARGS__) \| ~~~ ^~~~~~~~~~~ include/linux/dynamic_debug.h:270:22: note: expanded from macro 'dynamic_pr_debug' 270 \| pr_fmt(fmt), ##__VA_ARGS__) \| ~~~ ^~~~~~~~~~~ include/linux/dynamic_debug.h:250:59: note: expanded from macro '_dynamic_func_call' 250 \| _dynamic_func_call_cls(_DPRINTK_CLASS_DFLT, fmt, func, ##__VA_ARGS__) \| ^~~~~~~~~~~ include/linux/dynamic_debug.h:248:65: note: expanded from macro '_dynamic_func_call_cls' 248 \| __dynamic_func_call_cls(__UNIQUE_ID(ddebug), cls, fmt, func, ##__VA_ARGS__) \| ^~~~~~~~~~~ include/linux/dynamic_debug.h:224:15: note: expanded from macro '__dynamic_func_call_cls' 224 \| func(&id, ##__VA_ARGS__); \ \| ^~~~~~~~~~~ 1 error generated. On 64-bit architectures, size_t is 'unsigned long', so there is no warning when using %zu but on 32-bit architectures, size_t is 'unsigned int'. Use the correct specifier to resolve the warning. Fixes: 7a82e75ddaef ("bcachefs: New data structure for buckets waiting on journal commit") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Colin Ian King	6bf3766b52	bcachefs: Fix a handful of spelling mistakes in various messages There are several spelling mistakes in error messages. Fix these. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Colin Ian King	74c1e4221b	bcachefs: remove redundant pointer q The pointer q is being assigned a value but it is never read. The assignment and pointer are redundant and can be removed. Cleans up clang scan build warning: fs/bcachefs/quota.c:813:2: warning: Value stored to 'q' is never read [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Colin Ian King	2a831e4ba9	bcachefs: remove duplicated assignment to variable offset_into_extent Variable offset_into_extent is being assigned to zero and a few statements later it is being re-assigned again to the same value. The second assignment is redundant and can be removed. Cleans up clang-scan build warning: fs/bcachefs/io.c:2722:3: warning: Value stored to 'offset_into_extent' is never read [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Colin Ian King	c04cbc0dfd	bcachefs: remove redundant initializations of variables start_offset and end_offset The variables start_offset and end_offset are being initialized with values that are never read; they are being re-assigned later on. The initializations are redundant and can be removed. Cleans up clang-scan build warnings: fs/bcachefs/fs-io.c:243:11: warning: Value stored to 'start_offset' during its initialization is never read [deadcode.DeadStores] fs/bcachefs/fs-io.c:244:11: warning: Value stored to 'end_offset' during its initialization is never read [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Colin Ian King	519d6c8845	bcachefs: remove redundant initialization of pointer dst The pointer dst is being initialized with a value that is never read; it is being re-assigned later on when it is used in a while-loop. The initialization is redundant and can be removed. Cleans up clang-scan build warning: fs/bcachefs/disk_groups.c:186:30: warning: Value stored to 'dst' during its initialization is never read [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Colin Ian King	7cb0e6992e	bcachefs: remove redundant initialization of pointer d The pointer d is being initialized with a value that is never read, it is being re-assigned later on when it is used in a for-loop. The initialization is redundant and can be removed. Cleans up clang-scan build warning: fs/bcachefs/buckets.c:1303:25: warning: Value stored to 'd' during its initialization is never read [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:12 -04:00
Kent Overstreet	feb5cc3981	bcachefs: trace_read_nopromote() Add a tracepoint to print the reason a read wasn't promoted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:12 -04:00
Kent Overstreet	f3e374efbf	bcachefs: Log finsert/fcollapse operations Now that we have the logged operations btree, we can make finsert/fcollapse atomic w.r.t. unclean shutdown as well. This adds bch_logged_op_finsert to represent the state of an finsert or fcollapse, which is a bit more complicated than truncate since we need to track our position in the "shift extents" operation. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:12 -04:00
Kent Overstreet	b030e262b5	bcachefs: Log truncate operations Previously, we guaranteed atomicity of truncate after unclean shutdown with the BCH_INODE_I_SIZE_DIRTY flag - which required a full scan of the inodes btree. Recently the deleted inodes btree was added so that we no longer have to scan for deleted inodes, but truncate was unfinished and that change left it broken. This patch uses the new logged operations btree to fix truncate atomicity; we now log an operation that can be replayed at the start of a truncate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:12 -04:00
Kent Overstreet	aaad530ac6	bcachefs: BTREE_ID_logged_ops Add a new btree for long running logged operations - i.e. for logging operations that we can't do within a single btree transaction, so that they can be resumed if we crash. Keys in the logged operations btree will represent operations in progress, with the state of the operation stored in the value. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:12 -04:00
Kent Overstreet	5902cc283c	bcachefs: New io_misc.c helpers This pulls the non vfs specific parts of truncate and finsert/fcollapse out of fs-io.c, and moves them to io_misc.c. This is prep work for logging these operations, to make them atomic in the event of a crash. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:12 -04:00
Kent Overstreet	1809b8cba7	bcachefs: Break up io.c More reorganization, this splits up io.c into - io_read.c - io_misc.c - fallocate, fpunch, truncate - io_write.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:12 -04:00
Kent Overstreet	cbf57db53f	bcachefs: bch2_trans_update_get_key_cache() Factor out a slowpath into a separate function. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:12 -04:00
Kent Overstreet	aef32bf7cc	bcachefs: __bch2_btree_insert() -> bch2_btree_insert_trans() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:12 -04:00
Kent Overstreet	39791d7de2	bcachefs: Kill incorrect assertion In the bch2_fs_alloc() error path we call bch2_fs_free() without setting BCH_FS_STOPPING - this is fine. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:12 -04:00
Kent Overstreet	e46c181af9	bcachefs: Convert more code to bch_err_msg() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:12 -04:00
Kent Overstreet	da187cacb8	bcachefs: Kill missing inode warnings in bch2_quota_read() bch2_quota_read(), when scanning for inodes, may attempt to look up inodes that have been deleted in the main subvolume - this is not an error. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:12 -04:00
Kent Overstreet	c7afec9bd6	bcachefs: Fix bch_sb_handle type blk_mode_t was recently introduced; we should be using it now, instead of fmode_t. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:12 -04:00
Kent Overstreet	c872afa224	bcachefs: Fix bch2_propagate_key_to_snapshot_leaves() When we handle a transaction restart in a nested context, we need to return -BCH_ERR_transaction_restart_nested because we invalidated the outer context's iterators and locks. bch2_propagate_key_to_snapshot_leaves() wasn't doing this, this patch fixes it to use trans_was_restarted(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:12 -04:00
Kent Overstreet	5b7fbdcd5b	bcachefs: Fix silent enum conversion error This changes mark_btree_node_locked() to take an enum btree_node_locked_type, not a six_lock_type, since BTREE_NODE_UNLOCKED is -1 which may cause problems converting back and forth to six_lock_type if short enums are in use. With this change, we never store BTREE_NODE_UNLOCKED in a six_lock_type enum. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:12 -04:00
Kent Overstreet	5cfd69775e	bcachefs: Array bounds fixes It's no longer legal to use a zero size array as a flexible array member - this causes UBSAN to complain. This patch switches our zero size arrays to normal flexible array members when possible, and inserts casts in other places (e.g. where we use the zero size array as a marker partway through an array). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:12 -04:00
Kent Overstreet	a9a7bbab14	bcachefs: bch2_acl_to_text() We can now print out acls from bch2_xattr_to_text(), when the xattr contains an acl. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:12 -04:00
Brian Foster	197763a70b	bcachefs: restart journal reclaim thread on ro->rw transitions Commit c2d5ff36065a4 ("bcachefs: Start journal reclaim thread earlier") tweaked reclaim thread management to start a bit earlier in the mount sequence by moving the start call from __bch2_fs_read_write() to bch2_fs_journal_start(). This has the side effect of never starting the reclaim thread on a ro->rw transition, which can be observed by monitoring reclaim behavior via the journal_reclaim tracepoints. I.e. once an fs has remounted ro->rw, we only ever rely on direct reclaim from that point forward. Since bch2_journal_reclaim_start() properly handles the case where the reclaim thread has already been created, restore the start call in the read-write helper. This allows the reclaim thread to start early when appropriate and also exit/restart on remounts or freeze cycles. In the latter case it may be possible to simply allow the task to freeze rather than destroy it, but for now just fix the immediate bug. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:12 -04:00
Kent Overstreet	097d4cc8fd	bcachefs: Fix snapshot_skiplist_good() We weren't correctly checking snapshot skiplist nodes - we were checking if they were in the same tree, not if they were an actual ancestor. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:12 -04:00
Kent Overstreet	cba37d81f5	bcachefs: Kill stripe check in bch2_alloc_v4_invalid() Since we set bucket data type to BCH_DATA_stripe based on the data pointer, not just the stripe pointer, it doesn't make sense to check for no stripe in the .key_invalid method - this is a situation that shouldn't happen, but our other fsck/repair code handles it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:12 -04:00
Kent Overstreet	9d2a7bd8b7	bcachefs: Improve bch2_moving_ctxt_to_text() Print more information out about moving contexts - fold in the output of the redundant bch2_data_jobs_to_text(), and also include information relevant to whether move_data() should be blocked. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:11 -04:00
Kent Overstreet	cc07773f15	bcachefs: Put bkey invalid check in commit path in a more useful place When doing updates early in recovery, before we can go RW, we still want to check that keys are valid at commit time - this moves key invalid checking to before the "btree updates to journal" path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:11 -04:00
Kent Overstreet	71aba59029	bcachefs: Always check alloc data type Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:11 -04:00
Kent Overstreet	4491283f8d	bcachefs: Fix a double free on invalid bkey Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:11 -04:00
Kent Overstreet	a111901f52	bcachefs: bch2_propagate_key_to_snapshot_leaves() If fsck finds a key that needs work done, the primary example being an unlinked inode that needs to be deleted, and the key is in an internal snapshot node, we have a bit of a conundrum. The conundrum is that internal snapshot nodes are shared, and we in general do updates in internal snapshot nodes because there may be overwrites in some snapshots and not others, and this may affect other keys referenced by this key (i.e. extents). For example, we might be seeing an unlinked inode in an internal snapshot node, but then in one child snapshot the inode might have been reattached and might not be unlinked. Deleting the inode in the internal snapshot node would be wrong, because then we'll delete all the extents that the child snapshot references. But if an unlinked inode does not have any overwrites in child snapshots, we're fine: the inode is overwritten in all child snapshots, so we can do the deletion at the point of commonality in the snapshot tree, i.e. the node where we found it. This patch adds a new helper, bch2_propagate_key_to_snapshot_leaves(), to handle the case where we need to update a key that does have overwrites in child snapshots: we copy the key to leaf snapshot nodes, and then rewind fsck and process the needed updates there. With this, fsck can now always correctly handle unlinked inodes found in internal snapshot nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:11 -04:00
Kent Overstreet	f55d6e07bc	bcachefs: Cleanup redundant snapshot nodes After deleting snapshots, we may be left with a snapshot tree where some nodes only have one child, and we have a linear chain. Interior snapshot nodes are never used directly (i.e. they never have subvolumes that point to them), they are only referred to by child snapshot nodes - hence, they are redundant. The existing code talks about redundant snapshot nodes as forming an equivalence class; i.e. nodes for which snapshot_t->equiv is equal. In a given equivalence class, we only ever need a single key at a given position - i.e. multiple versions with different snapshot fields are redundant. The existing snapshot cleanup code deletes these redundant keys, but not redundant nodes. It turns out this is buggy, because we assume that after snapshot deletion finishes we should only have a single key per equivalence class, but the btree update path doesn't preserve this - overwriting keys in old snapshots doesn't check for the equivalence class being equal, and thus we can end up with duplicate keys in the same equivalence class and fsck complaining about snapshot deletion not having run correctly. The equivalence class notion has been leaking out of the core snapshots code and into too much other code, i.e. fsck, so this patch takes a different approach: snapshot deletion now moves keys to the node in an equivalence class being kept (the leafiest node) and then deletes the redundant nodes in the equivalence class. Some work has to be done to correctly delete interior snapshot nodes; snapshot node depth and skiplist fields for descendent nodes have to be fixed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:11 -04:00
Kent Overstreet	da52576080	bcachefs: Fix btree write buffer with snapshots btrees Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:11 -04:00
Kent Overstreet	66487c54ad	bcachefs: Fix is_ancestor bitmap The is_ancestor bitmap is an optimization for bch2_snapshot_is_ancestor; once we get sufficiently close to the ancestor ID we're searching for we test a bitmap. But initialization of the is_ancestor bitmap was broken; we do it by using bch2_snapshot_parent(), but we call that on nodes that haven't been initialized yet with bch2_mark_snapshot(). Fix this by adding a separate loop in bch2_snapshots_read() for initializing the is_ancestor bitmap, and also add some new debug asserts for checking this sort of breakage in the future. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:11 -04:00
Kent Overstreet	fa5bed376a	bcachefs: move check_pos_snapshot_overwritten() to snapshot.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:11 -04:00
Kent Overstreet	7573041ab9	bcachefs: Fix bch2_mount error path In the bch2_mount() error path, we were calling deactivate_locked_super(), which calls ->kill_sb(), which in our case was calling bch2_fs_free() without __bch2_fs_stop(). This changes bch2_mount() to just call bch2_fs_stop() directly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:11 -04:00
Kent Overstreet	adc0e95091	bcachefs: Delete a faulty assertion Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:11 -04:00
Kent Overstreet	55d5276d2e	bcachefs: Improve btree_path_relock_fail tracepoint In https://github.com/koverstreet/bcachefs/issues/450, we're seeing unexplained btree_path_relock_fail events - according to the information currently in the tracepoint, it appears the relock should be succeeding. This adds lock counts to the tracepoint to help track it down. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:11 -04:00
Kent Overstreet	d0445e131e	bcachefs: Fix divide by zero in rebalance_work() This fixes https://github.com/koverstreet/bcachefs-tools/issues/159 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:11 -04:00
Kent Overstreet	8e877caaad	bcachefs: Split out snapshot.c subvolume.c has gotten a bit large, this splits out a separate file just for managing snapshot trees - BTREE_ID_snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:11 -04:00
Kent Overstreet	e5570df295	bcachefs: stack_trace_save_tsk() depends on CONFIG_STACKTRACE Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:11 -04:00
Kent Overstreet	62898dd12b	bcachefs: Fix swallowing of data in buffered write path In __bch2_buffered_write, if we fail to write to an entire !uptodate folio, we have to back out the write, bail out and retry. But we were missing an iov_iter_revert() call, so the data written to the folio was lost and the rest of the write shifted to the wrong offset. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:11 -04:00
Brian Foster	8c9b0f7bdc	bcachefs: fix up wonky error handling in bch2_seek_pagecache_hole() The folio_hole_offset() helper returns a mix of bool and int types. The latter is to support a possible -EAGAIN error code when using nonblocking locks. This is not only confusing, but the only caller also essentially ignores errors outside of stopping the range iteration. This means an -EAGAIN error can't return directly from folio_hole_offset() and may be lost via bch2_clamp_data_hole(). Fix up the error handling and make it more readable. __filemap_get_folio() returns -ENOENT instead of NULL when no folio exists, so reuse the same error code in folio_hole_offset(). Fix up bch2_seek_pagecache_hole() to return the current offset on -ENOENT, but otherwise return unexpected error code up to the caller. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:11 -04:00
Kent Overstreet	029b85fe41	bcachefs: Fix bkey format calculation For extents, we increase the number of bits of the size field to allow extents to get bigger due to merging - but this code didn't check for overflow. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:11 -04:00
Kent Overstreet	c8ef8c3eb5	bcachefs: Fix bch2_extent_fallocate() - There was no need for a retry loop in bch2_extent_fallocate(); if we have to retry we may be overwriting something different and we need to return an error and let the caller retry. - The bch2_alloc_sectors_start() error path was wrong, and wasn't running our cleanup at the end of the function. This also fixes a very rare open bucket leak due to the missing cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:11 -04:00
Kent Overstreet	ff5b741c25	bcachefs: Zero btree_paths on allocation This fixes a bug in the cycle detector, bch2_check_for_deadlock() - we have to make sure the node pointers in the btree paths array are set to something not-garbage before another thread may see them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:11 -04:00
Kent Overstreet	e9679b4a06	bcachefs: Fix 'pointer to invalid device' check This fixes the device removal tests, which have been failing at random due to the fact that when we're running the .key_invalid checks in the write path the key may actually no longer exist - we might be racing with the keys being deleted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:11 -04:00
Joshua Ashton	a125c0742c	bcachefs: Lower BCH_NAME_MAX to 512 To ensure we aren't shooting ourselves in the foot after merge for potentially doing future revisions for dirent or for storing multiple names for casefolding, limit this to 512 for now. Previously this define was linked to the max size a d_name in bch_dirent could be. Signed-off-by: Joshua Ashton <joshua@froggi.es> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:11 -04:00
Joshua Ashton	29c336afeb	bcachefs: Optimize bch2_dirent_name_bytes Avoids doing a full strnlen for getting the length of the name of a dirent entry. Given the fact that the name of dirents is stored at the end of the bkey's value, and we know the length of that in u64s, we can find the last u64 and figure out how many NUL bytes are at the end of the string. On little endian systems this ends up being the leading zeros of the last u64, whereas on big endian systems this ends up being the trailing zeros of the last u64. We can take that value in bits and divide it by 8 to get the number of NUL bytes at the end. There is no endian-fixup or other compatibility here as this is string data interpreted as a u64. Signed-off-by: Joshua Ashton <joshua@froggi.es> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:10 -04:00
Joshua Ashton	01a7e74fe1	bcachefs: Introduce bch2_dirent_get_name A nice cleanup that avoids a bunch of open-coding name/string usage around dirent usage. Will be used by casefolding impl in future commits. Signed-off-by: Joshua Ashton <joshua@froggi.es> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:10 -04:00
Kent Overstreet	f854ce4d0a	bcachefs: six locks: Guard against wakee exiting in __six_lock_wakeup() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:10 -04:00
Kent Overstreet	93ee2c4b21	bcachefs: Don't open code closure_nr_remaining() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:10 -04:00
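
The trailing-NUL trick described in 29c336afeb (bcachefs: Optimize bch2_dirent_name_bytes) can be sketched in userspace C as follows. This is a minimal illustration, not the kernel code: `count_trailing_nuls()` and `padded_name_len()` are hypothetical names, the buffer is assumed to hold only a name padded with NUL bytes up to the next multiple of 8 bytes (no dirent header), and the GCC/Clang builtins `__builtin_clzll()` and `__builtin_ctzll()` are assumed to be available.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: number of NUL bytes at the end of the 8 bytes that
 * were loaded into 'last' as a native-endian 64-bit integer.
 */
static unsigned count_trailing_nuls(uint64_t last)
{
	/* clz/ctz are undefined for 0, so handle the all-NUL word separately. */
	if (!last)
		return 8;
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	/* Big endian: the last bytes in memory are the low-order bytes. */
	return (unsigned)__builtin_ctzll(last) / 8;
#else
	/* Little endian: the last bytes in memory are the high-order bytes. */
	return (unsigned)__builtin_clzll(last) / 8;
#endif
}

/*
 * Hypothetical helper: length of a name stored in 'buf', where 'buf' is
 * nr_u64s * 8 bytes long (nr_u64s >= 1) and the name is NUL-padded to
 * exactly that size. Only the last 8 bytes are inspected, so no strnlen()
 * style scan over the whole name is needed.
 */
static size_t padded_name_len(const unsigned char *buf, size_t nr_u64s)
{
	uint64_t last;

	/* memcpy avoids alignment and aliasing problems when loading the word. */
	memcpy(&last, buf + (nr_u64s - 1) * sizeof(last), sizeof(last));

	return nr_u64s * sizeof(last) - count_trailing_nuls(last);
}
```

As in the commit description, the loaded word is never byte-swapped: it is string data that is only reinterpreted as an integer to count zero bits, and dividing the bit count by 8 rounds down to whole NUL bytes.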


87633 commits