Commit Graph

144 Commits

Author SHA1 Message Date
Linus Torvalds 910202f00a vfs-6.9.super
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZem4DwAKCRCRxhvAZXjc
 ooTRAQDRI6Qz6wJym5Yblta8BScMGbt/SgrdgkoCvT6y83MtqwD+Nv/AZQzi3A3l
 9NdULtniW1reuCYkc8R7dYM8S+yAwAc=
 =Y1qX
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.9.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull block handle updates from Christian Brauner:
 "Last cycle we changed opening of block devices, and opening a block
  device would return a bdev_handle. This allowed us to implement
  support for restricting and forbidding writes to mounted block
  devices. It was accompanied by converting and adding helpers to
  operate on bdev_handles instead of plain block devices.

  That was already a good step forward but ultimately it isn't necessary
  to have special purpose helpers for opening block devices internally
  that return a bdev_handle.

  Fundamentally, opening a block device internally should just be
  equivalent to opening files. So now all internal opens of block
  devices return files just as a userspace open would. Instead of
  introducing a separate indirection into bdev_open_by_*() via struct
  bdev_handle bdev_file_open_by_*() is made to just return a struct
  file. Opening and closing a block device just becomes equivalent to
  opening and closing a file.

  This all works well because internally we already have a pseudo fs for
  block devices and so opening block devices is simple. There's a few
  places where we needed to be careful such as during boot when the
  kernel is supposed to mount the rootfs directly without init doing it.
  Here we need to take care to ensure that we flush out any asynchronous
  file close. That's what we already do for opening, unpacking, and
  closing the initramfs. So nothing new here.

  The equivalence of opening and closing block devices to regular files
  is a win in and of itself. But it also has various other advantages.
  We can remove struct bdev_handle completely. Various low-level helpers
  are now private to the block layer. Other helpers were simply
  removable completely.

  A follow-up series that is already reviewed build on this and makes it
  possible to remove bdev->bd_inode and allows various clean ups of the
  buffer head code as well. All places where we stashed a bdev_handle
  now just stash a file and use simple accessors to get to the actual
  block device which was already the case for bdev_handle"

* tag 'vfs-6.9.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits)
  block: remove bdev_handle completely
  block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access
  bdev: remove bdev pointer from struct bdev_handle
  bdev: make struct bdev_handle private to the block layer
  bdev: make bdev_{release, open_by_dev}() private to block layer
  bdev: remove bdev_open_by_path()
  reiserfs: port block device access to file
  ocfs2: port block device access to file
  nfs: port block device access to files
  jfs: port block device access to file
  f2fs: port block device access to files
  ext4: port block device access to file
  erofs: port device access to file
  btrfs: port device access to file
  bcachefs: port block device access to file
  target: port block device access to file
  s390: port block device access to file
  nvme: port block device access to file
  block2mtd: port device access to files
  bcache: port block device access to files
  ...
2024-03-11 10:52:34 -07:00
Christian Brauner 9f2f767f5e
bcachefs: port block device access to file
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-18-adbd023e19cc@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-25 12:05:25 +01:00
Su Yue 7dcfb87af9 bcachefs: fix kmemleak in __bch2_read_super error handling path
During xfstest tests, there are some kmemleak reports e.g. generic/051 with
if USE_KMEMLEAK=yes:

====================================================================
EXPERIMENTAL kmemleak reported some memory leaks!  Due to the way kmemleak
works, the leak might be from an earlier test, or something totally unrelated.
unreferenced object 0xffff9ef905aaf778 (size 8):
  comm "mount.bcachefs", pid 169844, jiffies 4295281209 (age 87.040s)
  hex dump (first 8 bytes):
    a5 cc cc cc cc cc cc cc                          ........
  backtrace:
    [<ffffffff87fd9a43>] __kmem_cache_alloc_node+0x1f3/0x2c0
    [<ffffffff87f49b66>] kmalloc_trace+0x26/0xb0
    [<ffffffffc0a3fefe>] __bch2_read_super+0xfe/0x4e0 [bcachefs]
    [<ffffffffc0a3ad22>] bch2_fs_open+0x262/0x1710 [bcachefs]
    [<ffffffffc09c9e24>] bch2_mount+0x4c4/0x640 [bcachefs]
    [<ffffffff88080c90>] legacy_get_tree+0x30/0x60
    [<ffffffff8802c748>] vfs_get_tree+0x28/0xf0
    [<ffffffff88061fe5>] path_mount+0x475/0xb60
    [<ffffffff880627e5>] __x64_sys_mount+0x105/0x140
    [<ffffffff88932642>] do_syscall_64+0x42/0xf0
    [<ffffffff88a000e6>] entry_SYSCALL_64_after_hwframe+0x6e/0x76
unreferenced object 0xffff9ef96cdc4fc0 (size 32):
  comm "mount.bcachefs", pid 169844, jiffies 4295281209 (age 87.040s)
  hex dump (first 32 bytes):
    2f 64 65 76 2f 6d 61 70 70 65 72 2f 74 65 73 74  /dev/mapper/test
    2d 31 00 cc cc cc cc cc cc cc cc cc cc cc cc cc  -1..............
  backtrace:
    [<ffffffff87fd9a43>] __kmem_cache_alloc_node+0x1f3/0x2c0
    [<ffffffff87f4a081>] __kmalloc_node_track_caller+0x51/0x150
    [<ffffffff87f3adc2>] kstrdup+0x32/0x60
    [<ffffffffc0a3ff1a>] __bch2_read_super+0x11a/0x4e0 [bcachefs]
    [<ffffffffc0a3ad22>] bch2_fs_open+0x262/0x1710 [bcachefs]
    [<ffffffffc09c9e24>] bch2_mount+0x4c4/0x640 [bcachefs]
    [<ffffffff88080c90>] legacy_get_tree+0x30/0x60
    [<ffffffff8802c748>] vfs_get_tree+0x28/0xf0
    [<ffffffff88061fe5>] path_mount+0x475/0xb60
    [<ffffffff880627e5>] __x64_sys_mount+0x105/0x140
    [<ffffffff88932642>] do_syscall_64+0x42/0xf0
    [<ffffffff88a000e6>] entry_SYSCALL_64_after_hwframe+0x6e/0x76
====================================================================

The leak happens if bdev_open_by_path() failed to open a block device
then it goes label 'out' directly without call of bch2_free_super().

Fix it by going to label 'err' instead of 'out' if bdev_open_by_path()
fails.

Signed-off-by: Su Yue <glass.su@suse.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-10 17:37:34 -05:00
Linus Torvalds 35a4474b5c More bcachefs updates for 6.7-rc1
- assorted prep work for disk space accounting rewrite
  - BTREE_TRIGGER_ATOMIC: after combining our trigger callbacks, this
    makes our trigger context more explicit
  - A few fixes to avoid excessive transaction restarts on multithreaded
    workloads: fstests (in addition to ktest tests) are now checking
    slowpath counters, and that's shaking out a few bugs
  - Assorted tracepoint improvements
  - Starting to break up bcachefs_format.h and move on disk types so
    they're with the code they belong to; this will make room to start
    documenting the on disk format better.
  - A few minor fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmWtjOsACgkQE6szbY3K
 bnbyXRAAsx+yM81TFqsLzRRqf8oocRwf2dj5XzExz9Ig/lYQS5LIVROS2OxwDsAc
 DeaYQSTcph9dkOswCrNR96bBnEgmmZ1ClfVI6WRXvm6vs4rjhSMNbNaVyySrMUVn
 5p/Lsn1/RKl0lWMYlHrdryo+106zRcr6z1Hiv9QCXkXhzdkV8wFYDkfbMveShUsu
 KobC29wvd2EfZr04nqsIXS/y/iRIXhtZqJmFCiAguN70UWrwUwArpELHI5Ve+WPZ
 9VjgFXW6Ka3QxJs/20tX+t24DrC+eDXR44DzQmxwG5mPBBpXkcSk5UgRw/EUag5U
 5+mDZQ5Ei3gvZvUwrilMosVy3pIw0IuvqeqwDGFoFXs1cce01QCMN+NG/dBTQw9i
 KGGxJw5sOrZ8fIiFnypk1M+r9NVtA8MjriLNR5bJjCWPSpWqzkT2HzxFXc6HmTZu
 vsE/AxwC1RLA6B2HZlDEqLOdHE3cofkDiIzWM5ABvb4p118iyk9hE6HhAufk5UdE
 HaG646kGB8pUY/sCxBIOD6K2pgthDFv+fftTM7X+uIazD3bovvPQCEInu48/KAHn
 /KmslSPO0txyjnRFMbXFJvd4Fgfo44GcBCeqGpy3B79aEJ3nroyRZ0qNnnsqj0Gl
 picUWjTn4W561Q1zBXuE/6cLWEp+sfaqYQcM8L3CCitRTVDPaCQ=
 =yd+F
 -----END PGP SIGNATURE-----

Merge tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs

Pull more bcachefs updates from Kent Overstreet:
 "Some fixes, Some refactoring, some minor features:

   - Assorted prep work for disk space accounting rewrite

   - BTREE_TRIGGER_ATOMIC: after combining our trigger callbacks, this
     makes our trigger context more explicit

   - A few fixes to avoid excessive transaction restarts on
     multithreaded workloads: fstests (in addition to ktest tests) are
     now checking slowpath counters, and that's shaking out a few bugs

   - Assorted tracepoint improvements

   - Starting to break up bcachefs_format.h and move on disk types so
     they're with the code they belong to; this will make room to start
     documenting the on disk format better.

   - A few minor fixes"

* tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs: (46 commits)
  bcachefs: Improve inode_to_text()
  bcachefs: logged_ops_format.h
  bcachefs: reflink_format.h
  bcachefs; extents_format.h
  bcachefs: ec_format.h
  bcachefs: subvolume_format.h
  bcachefs: snapshot_format.h
  bcachefs: alloc_background_format.h
  bcachefs: xattr_format.h
  bcachefs: dirent_format.h
  bcachefs: inode_format.h
  bcachefs; quota_format.h
  bcachefs: sb-counters_format.h
  bcachefs: counters.c -> sb-counters.c
  bcachefs: comment bch_subvolume
  bcachefs: bch_snapshot::btime
  bcachefs: add missing __GFP_NOWARN
  bcachefs: opts->compression can now also be applied in the background
  bcachefs: Prep work for variable size btree node buffers
  bcachefs: grab s_umount only if snapshotting
  ...
2024-01-21 14:01:12 -08:00
Kent Overstreet 3a58dfbc46 bcachefs: counters.c -> sb-counters.c
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:10 -05:00
Kent Overstreet 4ae016607b bcachefs: Print size of superblock with space allocated
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:09 -05:00
Linus Torvalds 999a36b52b bcachefs updates for 6.8:
- btree write buffer rewrite: instead of adding keys to the btree write
    buffer at transaction commit time, we know journal them with a
    different journal entry type and copy them from the journal to the
    write buffer just prior to journal write.
 
    This reduces the number of atomic operations on shared cachelines
    in the transaction commit path and is a signicant performance
    improvement on some workloads: multithreaded 4k random writes went
    from ~650k iops to ~850k iops.
 
  - Bring back optimistic spinning for six locks: the new implementation
    doesn't use osq locks; instead we add to the lock waitlist as normal,
    and then spin on the lock_acquired bit in the waitlist entry, _not_
    the lock itself.
 
  - BCH_IOCTL_DEV_USAGE_V2, which allows for new data types
  - BCH_IOCTL_OFFLINE_FSCK, which runs the kernel implementation of fsck
    but without mounting: useful for transparently using the kernel
    version of fsck from 'bcachefs fsck' when the kernel version is a
    better match for the on disk filesystem.
 
  - BCH_IOCTL_ONLINE_FSCK: online fsck. Not all passes are supported yet,
    but the passes that are supported are fully featured - errors may be
    corrected as normal.
 
    The new ioctls use the new 'thread_with_file' abstraction for kicking
    off a kthread that's tied to a file descriptor returned to userspace
    via the ioctl.
 
  - btree_paths within a btree_trans are now dynamically growable,
    instead of being limited to 64. This is important for the
    check_directory_structure phase of fsck, and also fixes some issues
    we were having with btree path overflow in the reflink btree.
 
  - Trigger refactoring; prep work for the upcoming disk space accounting
    rewrite
 
  - Numerous bugfixes :)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmWe8PUACgkQE6szbY3K
 bnYw6g/9GAXfIGasTZZwK2XEr36RYtEFYMwd/m9V1ET0DH6d/MFH9G7tTYl52AQ4
 k9cDFb0d2qdtNk2Rlml1lHFrxMzkp2Q7j9S4YcETrE+/Dir8ODVcJXrGeNTCMGmz
 B+C12mTOpWrzGMrioRgFZjWAnacsY3RP8NFRTT9HIJHO9UCP+xN5y++sX10C5Gwv
 7UVWTaUwjkgdYWkR8RCKGXuG5cNNlRp4Y0eeK2XruG1iI9VAilir1glcD/YMOY8M
 vECQzmf2ZLGFS/tpnmqVhNbNwVWpTQMYassvKaisWNHLDUgskOoF8YfoYSH27t7F
 GBb1154O2ga6ea866677FDeNVlg386mGCTUy2xOhMpDL3zW+/Is+8MdfJI4MJP5R
 EwcjHnn2bk0C2kULbAohw0gnU42FulfvsLNnrfxCeygmZrDoOOCL1HpvnBG4vskc
 Fp6NK83l974QnyLdPsjr1yB2d2pgb+uMP1v76IukQi0IjNSAyvwSa5nloPTHRzpC
 j6e2cFpdtX+6vEu6KngXVKTblSEnwhVBTaTR37Lr8PX1sZqFS/+mjRDgg3HZa/GI
 u0fC0mQyVL9KjDs5LJGpTc/qs8J4mpoS5+dfzn38MI76dFxd5TYZKWVfILTrOtDF
 ugDnoLkMuYFdueKI2M3YzxXyaA7HBT+7McAdENuJJzJnEuSAZs0=
 =JvA2
 -----END PGP SIGNATURE-----

Merge tag 'bcachefs-2024-01-10' of https://evilpiepirate.org/git/bcachefs

Pull bcachefs updates from Kent Overstreet:

 - btree write buffer rewrite: instead of adding keys to the btree write
   buffer at transaction commit time, we now journal them with a
   different journal entry type and copy them from the journal to the
   write buffer just prior to journal write.

   This reduces the number of atomic operations on shared cachelines in
   the transaction commit path and is a signicant performance
   improvement on some workloads: multithreaded 4k random writes went
   from ~650k iops to ~850k iops.

 - Bring back optimistic spinning for six locks: the new implementation
   doesn't use osq locks; instead we add to the lock waitlist as normal,
   and then spin on the lock_acquired bit in the waitlist entry, _not_
   the lock itself.

 - New ioctls:

    - BCH_IOCTL_DEV_USAGE_V2, which allows for new data types

    - BCH_IOCTL_OFFLINE_FSCK, which runs the kernel implementation of
      fsck but without mounting: useful for transparently using the
      kernel version of fsck from 'bcachefs fsck' when the kernel
      version is a better match for the on disk filesystem.

    - BCH_IOCTL_ONLINE_FSCK: online fsck. Not all passes are supported
      yet, but the passes that are supported are fully featured - errors
      may be corrected as normal.

   The new ioctls use the new 'thread_with_file' abstraction for kicking
   off a kthread that's tied to a file descriptor returned to userspace
   via the ioctl.

 - btree_paths within a btree_trans are now dynamically growable,
   instead of being limited to 64. This is important for the
   check_directory_structure phase of fsck, and also fixes some issues
   we were having with btree path overflow in the reflink btree.

 - Trigger refactoring; prep work for the upcoming disk space accounting
   rewrite

 - Numerous bugfixes :)

* tag 'bcachefs-2024-01-10' of https://evilpiepirate.org/git/bcachefs: (226 commits)
  bcachefs: eytzinger0_find() search should be const
  bcachefs: move "ptrs not changing" optimization to bch2_trigger_extent()
  bcachefs: fix simulateously upgrading & downgrading
  bcachefs: Restart recovery passes more reliably
  bcachefs: bch2_dump_bset() doesn't choke on u64s == 0
  bcachefs: improve checksum error messages
  bcachefs: improve validate_bset_keys()
  bcachefs: print sb magic when relevant
  bcachefs: __bch2_sb_field_to_text()
  bcachefs: %pg is banished
  bcachefs: Improve would_deadlock trace event
  bcachefs: fsck_err()s don't need to manually check c->sb.version anymore
  bcachefs: Upgrades now specify errors to fix, like downgrades
  bcachefs: no thread_with_file in userspace
  bcachefs: Don't autofix errors we can't fix
  bcachefs: add missing bch2_latency_acct() call
  bcachefs: increase max_active on io_complete_wq
  bcachefs: add time_stats for btree_node_read_done()
  bcachefs: don't clear accessed bit in btree node fill
  bcachefs: Add an option to control btree node prefetching
  ...
2024-01-10 16:34:17 -08:00
Linus Torvalds 3f6984e730 vfs-6.8.super
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZZUx4wAKCRCRxhvAZXjc
 osaNAQC/c+xXVfiq/pFbuK9MQLna4RGZaGcG9k312YniXbHq0AD9HAf4aPcZwPy1
 /wkD4pauj3UZ3f0xBSyazGBvAXyN0Qc=
 =iFAQ
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs super updates from Christian Brauner:
 "This contains the super work for this cycle including the long-awaited
  series by Jan to make it possible to prevent writing to mounted block
  devices:

   - Writing to mounted devices is dangerous and can lead to filesystem
     corruption as well as crashes. Furthermore syzbot comes with more
     and more involved examples how to corrupt block device under a
     mounted filesystem leading to kernel crashes and reports we can do
     nothing about. Add tracking of writers to each block device and a
     kernel cmdline argument which controls whether other writeable
     opens to block devices open with BLK_OPEN_RESTRICT_WRITES flag are
     allowed.

     Note that this effectively only prevents modification of the
     particular block device's page cache by other writers. The actual
     device content can still be modified by other means - e.g. by
     issuing direct scsi commands, by doing writes through devices lower
     in the storage stack (e.g. in case loop devices, DM, or MD are
     involved) etc. But blocking direct modifications of the block
     device page cache is enough to give filesystems a chance to perform
     data validation when loading data from the underlying storage and
     thus prevent kernel crashes.

     Syzbot can use this cmdline argument option to avoid uninteresting
     crashes. Also users whose userspace setup does not need writing to
     mounted block devices can set this option for hardening. We expect
     that this will be interesting to quite a few workloads.

     Btrfs is currently opted out of this because they still haven't
     merged patches we require for this to work from three kernel
     releases ago.

   - Reimplement block device freezing and thawing as holder operations
     on the block device.

     This allows us to extend block device freezing to all devices
     associated with a superblock and not just the main device. It also
     allows us to remove get_active_super() and thus another function
     that scans the global list of superblocks.

     Freezing via additional block devices only works if the filesystem
     chooses to use @fs_holder_ops for these additional devices as well.
     That currently only includes ext4 and xfs.

     Earlier releases switched get_tree_bdev() and mount_bdev() to use
     @fs_holder_ops. The remaining nilfs2 open-coded version of
     mount_bdev() has been converted to rely on @fs_holder_ops as well.
     So block device freezing for the main block device will continue to
     work as before.

     There should be no regressions in functionality. The only special
     case is btrfs where block device freezing for the main block device
     never worked because sb->s_bdev isn't set. Block device freezing
     for btrfs can be fixed once they can switch to @fs_holder_ops but
     that can happen whenever they're ready"

* tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits)
  block: Fix a memory leak in bdev_open_by_dev()
  super: don't bother with WARN_ON_ONCE()
  super: massage wait event mechanism
  ext4: Block writes to journal device
  xfs: Block writes to log device
  fs: Block writes to mounted block devices
  btrfs: Do not restrict writes to btrfs devices
  block: Add config option to not allow writing to mounted devices
  block: Remove blkdev_get_by_*() functions
  bcachefs: Convert to bdev_open_by_path()
  fs: handle freezing from multiple devices
  fs: remove dead check
  nilfs2: simplify device handling
  fs: streamline thaw_super_locked
  ext4: simplify device handling
  xfs: simplify device handling
  fs: simplify setup_bdev_super() calls
  blkdev: comment fs_holder_ops
  porting: document block device freeze and thaw changes
  fs: remove unused helper
  ...
2024-01-08 10:43:51 -08:00
Kent Overstreet e7999235e6 bcachefs: fix simulateously upgrading & downgrading
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:21 -05:00
Kent Overstreet 4819b66e29 bcachefs: improve checksum error messages
new helpers:
 - bch2_csum_to_text()
 - bch2_csum_err_msg()

standardize our checksum error messages a bit, and print out the
checksums a bit more nicely.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:21 -05:00
Kent Overstreet 5e448c4893 bcachefs: print sb magic when relevant
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:21 -05:00
Kent Overstreet 5b88365660 bcachefs: __bch2_sb_field_to_text()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:21 -05:00
Kent Overstreet 1f5af5fc17 bcachefs: %pg is banished
not portable to userspace

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:21 -05:00
Kent Overstreet 15eaaa4c31 bcachefs: Upgrades now specify errors to fix, like downgrades
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:20 -05:00
Kent Overstreet 6b00de06f5 bcachefs: bch_member->seq
Add new fields for split brain detection:

 - bch_member->seq, which tracks the sequence number of the last superblock
   write that happened to each member device

 - bch_sb->write_time, which tracks the time of the last superblock write,
   to allow detection of when two members have diverged but had the same
   number of superblock writes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:19 -05:00
Kent Overstreet cea07a7b6a bcachefs: vstruct_for_each() now declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet 9fea2274f7 bcachefs: for_each_member_device() now declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet 447c1c0105 bcachefs: check for failure to downgrade
With the upcoming member seq patch, it's now critical that we don't ever
write to a superblock that hasn't been version downgraded - failure to
update member seq fields will cause split brain detection to fire
erroniously.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet 3c471b6588 bcachefs: convert bch_fs_flags to x-macro
Now we can print out filesystem flags in sysfs, useful for debugging
various "what's my filesystem doing" issues.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00
Daniel Hill 2b161cc7cb bcachefs: add a quieter bch2_read_super
If we're looking for a bcachefs supers iteratively we don't want to see
this error.

This function replaces KERN_ERR with KERN_INFO for when we don't find a
bcachefs superblock but preserves other errors.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00
Kent Overstreet 84f1638795 bcachefs: bch_sb_field_downgrade
Add a new superblock section that contains a list of
  { minor version, recovery passes, errors_to_fix }

that is - a list of recovery passes that must be run when downgrading
past a given version, and a list of errors to silently fix.

The upcoming disk accounting rewrite is not going to be fully
compatible: we're going to have to regenerate accounting both when
upgrading to the new version, and also from downgrading from the new
version, since the new method of doing disk space accounting is a
completely different architecture based on deltas, and synchronizing
them for every jounal entry write to maintain compatibility is going to
be too expensive and impractical.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:07 -05:00
Kent Overstreet 8b16413cda bcachefs: bch_sb.recovery_passes_required
Add two new superblock fields. Since the main section of the superblock
is now fully, we have to add a new variable length section for them -
bch_sb_field_ext.

 - recovery_passes_requried: recovery passes that must be run on the
   next mount
 - errors_silent: errors that will be silently fixed

These are to improve upgrading and dwongrading: these fields won't be
cleared until after recovery successfully completes, so there won't be
any issues with crashing partway through an upgrade or a downgrade.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:07 -05:00
Brian Foster 63807d9518 bcachefs: preserve device path as device name
Various userspace scripts/tools may expect mount entries in
/proc/mounts to reflect the device path names used to mount the
associated filesystem. bcachefs seems to normalize the device path
to the underlying device name based on the block device. This
confuses tools like fstests when the test devices might be lvm or
device-mapper based.

The default behavior for show_vfsmnt() appers to be to use the
string passed to alloc_vfsmnt(), so tweak bcachefs to copy the path
at device superblock read time and to display it via
->show_devname().

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24 02:42:07 -05:00
Jan Kara 1bfdc94b28
bcachefs: Convert to bdev_open_by_path()
Convert bcachefs to use bdev_open_by_path() and pass the handle around.

CC: Kent Overstreet <kent.overstreet@linux.dev>
CC: Brian Foster <bfoster@redhat.com>
CC: <linux-bcachefs@vger.kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20231101174325.10596-1-jack@suse.cz
Acked-by: Kent Overstreet <kent.overstreet@linux.dev>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-11-18 14:59:25 +01:00
Kent Overstreet 59154f2c66 bcachefs: bch2_prt_datetime()
Improved, better named version of pr_time().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-05 13:12:08 -05:00
Kent Overstreet 01ccee225a bcachefs: Add missing printk newlines
This was causing error messages in -tools to not get printed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-04 22:19:13 -04:00
Kent Overstreet f5d26fa31e bcachefs: bch_sb_field_errors
Add a new superblock section to keep counts of errors seen since
filesystem creation: we'll be addingcounters for every distinct fsck
error.

The new superblock section has entries of the for [ id, count,
time_of_last_error ]; this is intended to let us see what errors are
occuring - and getting fixed - via show-super output.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-01 21:11:08 -04:00
Kent Overstreet 94119eeb02 bcachefs: Add IO error counts to bch_member
We now track IO errors per device since filesystem creation.

IO error counts can be viewed in sysfs, or with the 'bcachefs
show-super' command.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-01 21:11:08 -04:00
Kent Overstreet 4637429e39 bcachefs: bch2_sb_field_get() refactoring
Instead of using token pasting to generate methods for each superblock
section, just make the type a parameter to bch2_sb_field_get().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:16 -04:00
Hunter Shaffer 9af26120f0 bcachefs: Rename bch_sb_field_members -> bch_sb_field_members_v1
Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:15 -04:00
Hunter Shaffer 3f7b9713da bcachefs: New superblock section members_v2
members_v2 has dynamically resizable entries so that we can extend
bch_member. The members can no longer be accessed with simple array
indexing Instead members_v2_get is used to find a member's exact
location within the array and returns a copy of that member.
Alternatively member_v2_get_mut retrieves a mutable point to a member.

Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:15 -04:00
Hunter Shaffer 1241df5872 bcachefs: Add new helper to retrieve bch_member from sb
Prep work for introducing bch_sb_field_members_v2 - introduce new
helpers that will check for members_v2 if it exists, otherwise using v1

Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:15 -04:00
Kent Overstreet 793a06d984 bcachefs: Fixes for building in userspace
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:15 -04:00
Kent Overstreet 40a53b9215 bcachefs: More minor smatch fixes
- fix a few uninitialized return values
 - return a proper error code in lookup_lostfound()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:14 -04:00
Kent Overstreet 96dea3d599 bcachefs: Fix W=12 build errors
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:13 -04:00
Colin Ian King 6bf3766b52 bcachefs: Fix a handful of spelling mistakes in various messages
There are several spelling mistakes in error messages. Fix these.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:13 -04:00
Kent Overstreet 1809b8cba7 bcachefs: Break up io.c
More reorganization, this splits up io.c into
 - io_read.c
 - io_misc.c - fallocate, fpunch, truncate
 - io_write.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:12 -04:00
Kent Overstreet a37ad1a3ab bcachefs: sb-clean.c
Pull code for bch_sb_field_clean out into its own file.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:10 -04:00
Kent Overstreet 0ec3985694 bcachefs: Move bch_sb_field_crypt code to checksum.c
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:10 -04:00
Kent Overstreet fb8e5b4cae bcachefs: sb-members.c
Split out a new file for bch_sb_field_members - we'll likely want to
move more code here in the future.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:10 -04:00
Kent Overstreet 1e81f89b02 bcachefs: Fix assorted checkpatch nits
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:10 -04:00
Kent Overstreet 6fe893eade bcachefs: Fix for sb buffer being misaligned
On old kernels, kmalloc() may return an allocation that's not naturally
aligned - this resulted in a bug where we allocated a bio with not
enough biovecs. Fix this by using buf_pages().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:10 -04:00
Kent Overstreet bf5a261c7a bcachefs: Assorted fixes for clang
clang had a few more warnings about enum conversion, and also didn't
like the opts.c initializer.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:09 -04:00
Kent Overstreet 813e0cecd1 bcachefs: Upgrade path fixes
Some minor fixes to not print errors that are actually due to a verson
upgrade.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:07 -04:00
Kent Overstreet f39d1aca4d bcachefs: Add buffered IO fallback for userspace
In userspace, we want to be able to switch to buffered IO when we're
dealing with an image on a filesystem/device that doesn't support the
blocksize the filesystem was formatted with.

This plumbs through !opts.direct_io -> FMODE_BUFFERED, which will be
supported by the shim version of blkdev_get_by_path() in -tools, and it
adds a fallback to disable direct IO and retry for userspace.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:07 -04:00
Kent Overstreet b912913613 bcachefs: Fix build error on weird gcc
fixes
./include/linux/stddef.h:8:14: error: positional initialization of field in ‘struct’ declared with ‘designated_init’ attribute [-Werror=designated-init]

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:07 -04:00
Kent Overstreet 065bd3356c bcachefs: Version table now lists required recovery passes
Now that we've got forward compatibility sorted out, we should be doing
more frequent version upgrades in the future.

To avoid having to run a full fsck for every version upgrade, this
improves the BCH_METADATA_VERSIONS() table to explicitly specify a
bitmask of recovery passes to run when upgrading to or past a given
version.

This means we can also delete PASS_UPGRADE().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:06 -04:00
Kent Overstreet 6619d84626 bcachefs: bch2_sb_maybe_downgrade(), bch2_sb_upgrade()
Add some new helpers, and fix upgrade/downgrade in bch2_fs_initialize().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:06 -04:00
Kent Overstreet ba8eeae8ee bcachefs: bcachefs_metadata_version_major_minor
This introduces major/minor versioning to the superblock version number.
Major version number changes indicate incompatible releases; we can move
forward to a new major version number, but not backwards. Minor version
numbers indicate compatible changes - these add features, but can still
be mounted and used by old versions.

With the recent patches that make it possible to roll out new btrees and
key types without breaking compatibility, we should be able to roll out
most new features without incompatible changes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:06 -04:00
Kent Overstreet 3045bb958a bcachefs: version_upgrade is now an enum
The version_upgrade parameter is now an enum, not a bool, and it's
persistent in the superblock:
 - compatible (default):	upgrade to the latest compatible version
 - incompatible:		upgrade to latest incompatible version
 - none

Currently all upgrades are incompatible upgrades, but the next release
will introduce major:minor versions.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:06 -04:00