Commit graph

87835 commits

Author SHA1 Message Date
Linus Torvalds
bdb2701f0b for-6.7-rc5-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmV5rTIACgkQxWXV+ddt
 WDuLUg/+Ix/CeA+JY6VZMA2kBHMzmRexSjYONWfQwIL7LPBy4sOuSEaTZt+QQMs+
 AEKau1YfTgo7e9S2DlbZhIWp6P87VFui7Q1E99uJEmKelakvf94DbMrufPTTKjaD
 JG2KB6LsD59yWwfbGHEAVVNGSMRk2LDXzcUWMK6/uzu/7Bcr4ataOymWd86/blUV
 cw5g87uAHpBn+R1ARTf1CkqyYiI9UldNUJmW1q7dwxOyYG+weUtJImosw2Uda76y
 wQXAFQAH3vsFzTC+qjC9Vz7cnyAX9qAw48ODRH7rIT1BQ3yAFQbfXE20jJ/fSE+C
 lz3p05tA9373KAOtLUHmANBwe3NafCnlut6ZYRfpTcEzUslAO5PnajPaHh5Al7uC
 Iwdpy49byoyVFeNf0yECBsuDP8s86HlUALF8mdJabPI1Kl66MUea6KgS1oyO3pCB
 hfqLbpofV4JTywtIRLGQTQvzSwkjPHTbSwtZ9nftTw520a5f7memDu5vi4XzFd+B
 NrJxmz2DrMRlwrLgWg9OXXgx1riWPvHnIoqzjG5W6A9N74Ud1/oz7t3VzjGSQ5S2
 UikRB6iofPE0deD8IF6H6DvFfvQxU9d9BJ6IS9V2zRt5vdgJ2w08FlqbLZewSY4x
 iaQ+L7UYKDjC9hdosXVNu/6fAspyBVdSp2NbKk14fraZtNAoPNs=
 =uF/Q
 -----END PGP SIGNATURE-----

Merge tag 'for-6.7-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
  "Some fixes to quota accounting code, mostly around error handling and
   correctness:

   - free reserves on various error paths, after IO errors or
     transaction abort

   - don't clear reserved range at the folio release time, it'll be
     properly cleared after final write

   - fix integer overflow due to int used when passing around size of
     freed reservations

   - fix a regression in squota accounting that missed some cases with
     delayed refs"

* tag 'for-6.7-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: ensure releasing squota reserve on head refs
  btrfs: don't clear qgroup reserved bit in release_folio
  btrfs: free qgroup pertrans reserve on transaction abort
  btrfs: fix qgroup_free_reserved_data int overflow
  btrfs: free qgroup reserve when ORDERED_IOERR is set
2023-12-14 11:53:00 -08:00
Linus Torvalds
5bd7ef53ff ufs got broken this merge window on folio conversion - calling
conventions for filemap_lock_folio() are not the same as for
 find_lock_page()
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZXnZHAAKCRBZ7Krx/gZQ
 6y3QAQCazzMsqWYmqfkbR5yGjolKBPS6ILFWBHWoFySs9/WptAEA3c/960nhFuh1
 aQE9Qp5zUlbWmSZ5zjz3Q2lX8N/jugU=
 =kyMm
 -----END PGP SIGNATURE-----

Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull ufs fix from Al Viro:
 "ufs got broken this merge window on folio conversion - calling
  conventions for filemap_lock_folio() are not the same as for
  find_lock_page()"

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix ufs_get_locked_folio() breakage
2023-12-13 11:09:58 -08:00
Al Viro
485053bb81 fix ufs_get_locked_folio() breakage
filemap_lock_folio() returns ERR_PTR(-ENOENT) if the thing is not
in cache - not NULL like find_lock_page() used to.

Fixes: 5fb7bd50b3 "ufs: add ufs_get_locked_folio and ufs_put_locked_folio"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-12-13 11:14:09 -05:00
Linus Torvalds
cf52eed70e Fix various bugs / regressions for ext4, including a soft lockup, a
WARN_ON, and a BUG.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmV4tMwACgkQ8vlZVpUN
 gaNZRAf/ejQZne9iZck8SSV62mkR9E7EwN9J2+gkWFrlsyurErZlVsBA5yRB+i9A
 V1v6DRGDnYFwKFNHJhR/RW9NhEwpYkX9Vo3miksSCq8rsAB1kjSs3xVrTBIYi/8c
 ztw4ncyxW7RRFRmruzFfUEKriiyJzxJYx+EqbNsQHcl5ET6Y2/5zM0bChV9MwuN3
 iS1Rm98RbHVrylzKbGG562MaGdJyUYvQ+mnRCgma1mTu6K9SWLJg211icLTsDhHg
 XEB/QGWji2O7xOudcry8wLIpoR6rYPAhWfbkLekW1K9hjV3iXuJoVjj7eB9LctMf
 FAXr8u0FKJI0iIQyrQrEEqIuh+jKBA==
 =4zQL
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Fix various bugs / regressions for ext4, including a soft lockup, a
  WARN_ON, and a BUG"

* tag 'ext4_for_linus-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  jbd2: fix soft lockup in journal_finish_inode_data_buffers()
  ext4: fix warning in ext4_dio_write_end_io()
  jbd2: increase the journal IO's priority
  jbd2: correct the printing of write_flags in jbd2_write_superblock()
  ext4: prevent the normalized size from exceeding EXT_MAX_BLOCKS
2023-12-12 11:37:04 -08:00
Linus Torvalds
eaadbbaaff fuse fixes for 6.7-rc6
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCZXhQgQAKCRDh3BK/laaZ
 PHL9AQC0y7A+HLH6oXM8uI8rqC8e78qGdoGGl+Ppapae+BhO8gD+Or5B4yR0MZKR
 Z/j0zMe57mmRMplxcwz/LXXCqeE9+w0=
 =cn6g
 -----END PGP SIGNATURE-----

Merge tag 'fuse-fixes-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse fixes from Miklos Szeredi:

 - Fix a couple of potential crashes, one introduced in 6.6 and one
   in 5.10

 - Fix misbehavior of virtiofs submounts on memory pressure

 - Clarify naming in the uAPI for a recent feature

* tag 'fuse-fixes-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: disable FOPEN_PARALLEL_DIRECT_WRITES with FUSE_DIRECT_IO_ALLOW_MMAP
  fuse: dax: set fc->dax to NULL in fuse_dax_conn_free()
  fuse: share lookup state between submount and its parent
  docs/fuse-io: Document the usage of DIRECT_IO_ALLOW_MMAP
  fuse: Rename DIRECT_IO_RELAX to DIRECT_IO_ALLOW_MMAP
2023-12-12 11:06:41 -08:00
Linus Torvalds
8b8cd4beea nine smb3 server fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmV3pUoACgkQiiy9cAdy
 T1GQCgv/YURd8zz5k+GSOvUF2tCl6zW6h0NJQbWRIjgl4i7eGHZwIslgCI6kZIN1
 AFrSyUj4tZQmFvh0aVWLZeWsoKETbSkOYkz2dC4X/lC8LJD3VAy3vzAhu4oSAWva
 +pItQVlOTG0CcmMSTANSfw0sSsCwC2BHAUJnnu7ypgERI3wllOPtxE1xN9mT/8Bf
 NxJDZa3jtZd2hC4Cda1NTYYEfaSGufEOzPZIW9/h5ftpRo0qtEZkKh9TPddBMGm4
 yMnt1sSp4DHoW6xOyGOt+7kJAGA5NtP3/voLSjirG558Bb4HjWhBT+Dkxe6dUiXn
 i9gi1bFJ/8gRulv1cTdOxTFGE+i9Wr4PzpG2g82qugYRTl3LqLoJBa8NH+WzKz+q
 AX8EySFdlJtE++wTMNZB5hgFuJNGkzRi3YbjrQjvHFDQvaSVHvtayyhuEN+UcqAe
 gWuj1PTDKy6cfkxFYPDEBtMgp1u4+72nWOxoYUE5LyvzkLCLjfgMKCDX03RlAvfZ
 zB76cU/3
 =yMkH
 -----END PGP SIGNATURE-----

Merge tag '6.7-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

 - Memory leak fix (in lock error path)

 - Two fixes for create with allocation size

 - FIx for potential UAF in lease break error path

 - Five directory lease (caching) fixes found during additional recent
   testing

* tag '6.7-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: fix wrong name of SMB2_CREATE_ALLOCATION_SIZE
  ksmbd: fix wrong allocation size update in smb2_open()
  ksmbd: avoid duplicate opinfo_put() call on error of smb21_lease_break_ack()
  ksmbd: lazy v2 lease break on smb2_write()
  ksmbd: send v2 lease break notification for directory
  ksmbd: downgrade RWH lease caching state to RH for directory
  ksmbd: set v2 lease capability
  ksmbd: set epoch in create context v2 lease
  ksmbd: fix memory leak in smb2_lock()
2023-12-12 10:30:10 -08:00
Ye Bin
6c02757c93 jbd2: fix soft lockup in journal_finish_inode_data_buffers()
There's issue when do io test:
WARN: soft lockup - CPU#45 stuck for 11s! [jbd2/dm-2-8:4170]
CPU: 45 PID: 4170 Comm: jbd2/dm-2-8 Kdump: loaded Tainted: G  OE
Call trace:
 dump_backtrace+0x0/0x1a0
 show_stack+0x24/0x30
 dump_stack+0xb0/0x100
 watchdog_timer_fn+0x254/0x3f8
 __hrtimer_run_queues+0x11c/0x380
 hrtimer_interrupt+0xfc/0x2f8
 arch_timer_handler_phys+0x38/0x58
 handle_percpu_devid_irq+0x90/0x248
 generic_handle_irq+0x3c/0x58
 __handle_domain_irq+0x68/0xc0
 gic_handle_irq+0x90/0x320
 el1_irq+0xcc/0x180
 queued_spin_lock_slowpath+0x1d8/0x320
 jbd2_journal_commit_transaction+0x10f4/0x1c78 [jbd2]
 kjournald2+0xec/0x2f0 [jbd2]
 kthread+0x134/0x138
 ret_from_fork+0x10/0x18

Analyzed informations from vmcore as follows:
(1) There are about 5k+ jbd2_inode in 'commit_transaction->t_inode_list';
(2) Now is processing the 855th jbd2_inode;
(3) JBD2 task has TIF_NEED_RESCHED flag;
(4) There's no pags in address_space around the 855th jbd2_inode;
(5) There are some process is doing drop caches;
(6) Mounted with 'nodioread_nolock' option;
(7) 128 CPUs;

According to informations from vmcore we know 'journal->j_list_lock' spin lock
competition is fierce. So journal_finish_inode_data_buffers() maybe process
slowly. Theoretically, there is scheduling point in the filemap_fdatawait_range_keep_errors().
However, if inode's address_space has no pages which taged with PAGECACHE_TAG_WRITEBACK,
will not call cond_resched(). So may lead to soft lockup.
journal_finish_inode_data_buffers
  filemap_fdatawait_range_keep_errors
    __filemap_fdatawait_range
      while (index <= end)
        nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end, PAGECACHE_TAG_WRITEBACK);
        if (!nr_pages)
           break;    --> If 'nr_pages' is equal zero will break, then will not call cond_resched()
        for (i = 0; i < nr_pages; i++)
          wait_on_page_writeback(page);
        cond_resched();

To solve above issue, add scheduling point in the journal_finish_inode_data_buffers();

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20231211112544.3879780-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-12-12 10:25:46 -05:00
Linus Torvalds
26aff84943 More bcachefs bugfixes for 6.7:
- Fix a rare emergency shutdown path bug: dropping journal pins after
    the filesystem has mostly been torn down is not what we want.
  - Fix some concurrency issues with the btree write buffer and journal
    replay by not using the btree write buffer until journal replay is
    finished
  - A fixup from the prior patch to kill journal pre-reservations: at the
    start of the btree update path, where previously we took a
    pre-reservation, we do at least want to check the journal watermark.
  - Fix a race between dropping device metadata and btree node writes,
    which would re-add a pointer to a device that had just been dropped
  - Fix one of the SCRU lock warnings, in
    bch2_compression_stats_to_text().
  - Partial fix for a rare transaction paths overflow, when indirect
    extents had been split by background tasks, by not running certain
    triggers when they're not needed.
  - Fix for creating a snapshot with implicit source in a subdirectory of
    the containing subvolume
  - Don't unfreeze when we're emergency read-only
  - Fix for rebalance spinning trying to compress unwritten extentns
  - Another deleted_inodes fix, for directories
  - Fix a rare deadlock (usually just an unecessary wait) when flushing
    the journal with an open journal entry.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmV2T4EACgkQE6szbY3K
 bnay1w/+PyH5qwE2gOy17rno6cWSNyKJELUkcqVNqrSTZpuA+TbMbcV8+oOeBnG1
 9/ShwKRvwwNC4HVk6KySoTMo9lRkaZ5wX6DpEsOqxoN8aCp6kqiCUxr0inAAyVdu
 O8FktP83eSX/vERWNlCeGLdi1KsCK0BWVbVMpkiVEO9QhLpS9eo1C8btstDIjbsv
 TVGvKO7IpVgibSBwymQPKpZa6BGN4d6emLlgKStdpVVR1RwJW3eLJwi1EV2hSp1f
 LBnTI5eD64pu+phEb4zE83JX932XAbxdBWaHlN1y3i4l6+sJDu63Y4R8bkbW+rnJ
 cbiyYM5IuAH6MFbbh9rIW8kEIvjrX13mY94oGlK8ClCI9WX129jD5538tEH624U5
 KnhCZpkuzeGC5CVXNAzdJ8NP/Aj9qtKvSyssG6R5ZTitQ1FnTZ391Wb2pIRgj9pm
 yVfpJ/Q4cizVfSsKBvtr0U5I444zq50z+brKwegIoH8uMuGHKXcIgTUOu4q5pKDD
 znjS9eFrQTN2li2HB3LMxuS94yUmozqwgxClMptynLsHVknQH7F3cAdD+mYbwW5Q
 GUOd/QTlpskBYAUfBS8ewllowRjLGDJyrGvbR9Mvitk8CxOLRgoDipdh1K13jDMS
 zCmG1eQgdbtPHTM6fqif8Bu8xtgK7p2r099dcBhhiWmRyLPo5Qw=
 =l5sa
 -----END PGP SIGNATURE-----

Merge tag 'bcachefs-2023-12-10' of https://evilpiepirate.org/git/bcachefs

Pull more bcachefs bugfixes from Kent Overstreet:

 - Fix a rare emergency shutdown path bug: dropping journal pins after
   the filesystem has mostly been torn down is not what we want.

 - Fix some concurrency issues with the btree write buffer and journal
   replay by not using the btree write buffer until journal replay is
   finished

 - A fixup from the prior patch to kill journal pre-reservations: at the
   start of the btree update path, where previously we took a
   pre-reservation, we do at least want to check the journal watermark.

 - Fix a race between dropping device metadata and btree node writes,
   which would re-add a pointer to a device that had just been dropped

 - Fix one of the SCRU lock warnings, in
   bch2_compression_stats_to_text().

 - Partial fix for a rare transaction paths overflow, when indirect
   extents had been split by background tasks, by not running certain
   triggers when they're not needed.

 - Fix for creating a snapshot with implicit source in a subdirectory of
   the containing subvolume

 - Don't unfreeze when we're emergency read-only

 - Fix for rebalance spinning trying to compress unwritten extentns

 - Another deleted_inodes fix, for directories

 - Fix a rare deadlock (usually just an unecessary wait) when flushing
   the journal with an open journal entry.

* tag 'bcachefs-2023-12-10' of https://evilpiepirate.org/git/bcachefs:
  bcachefs: Close journal entry if necessary when flushing all pins
  bcachefs: Fix uninitialized var in bch2_journal_replay()
  bcachefs: Fix deleted inode check for dirs
  bcachefs: rebalance shouldn't attempt to compress unwritten extents
  bcachefs: don't attempt rw on unfreeze when shutdown
  bcachefs: Fix creating snapshot with implict source
  bcachefs: Don't run indirect extent trigger unless inserting/deleting
  bcachefs: Convert compression_stats to for_each_btree_key2
  bcachefs: Fix bch2_extent_drop_ptrs() call
  bcachefs: Fix a journal deadlock in replay
  bcachefs; Don't use btree write buffer until journal replay is finished
  bcachefs: Don't drop journal pins in exit path
2023-12-11 16:13:51 -08:00
David Howells
52bf9f6c09 afs: Fix refcount underflow from error handling race
If an AFS cell that has an unreachable (eg. ENETUNREACH) server listed (VL
server or fileserver), an asynchronous probe to one of its addresses may
fail immediately because sendmsg() returns an error.  When this happens, a
refcount underflow can happen if certain events hit a very small window.

The way this occurs is:

 (1) There are two levels of "call" object, the afs_call and the
     rxrpc_call.  Each of them can be transitioned to a "completed" state
     in the event of success or failure.

 (2) Asynchronous afs_calls are self-referential whilst they are active to
     prevent them from evaporating when they're not being processed.  This
     reference is disposed of when the afs_call is completed.

     Note that an afs_call may only be completed once; once completed
     completing it again will do nothing.

 (3) When a call transmission is made, the app-side rxrpc code queues a Tx
     buffer for the rxrpc I/O thread to transmit.  The I/O thread invokes
     sendmsg() to transmit it - and in the case of failure, it transitions
     the rxrpc_call to the completed state.

 (4) When an rxrpc_call is completed, the app layer is notified.  In this
     case, the app is kafs and it schedules a work item to process events
     pertaining to an afs_call.

 (5) When the afs_call event processor is run, it goes down through the
     RPC-specific handler to afs_extract_data() to retrieve data from rxrpc
     - and, in this case, it picks up the error from the rxrpc_call and
     returns it.

     The error is then propagated to the afs_call and that is completed
     too.  At this point the self-reference is released.

 (6) If the rxrpc I/O thread manages to complete the rxrpc_call within the
     window between rxrpc_send_data() queuing the request packet and
     checking for call completion on the way out, then
     rxrpc_kernel_send_data() will return the error from sendmsg() to the
     app.

 (7) Then afs_make_call() will see an error and will jump to the error
     handling path which will attempt to clean up the afs_call.

 (8) The problem comes when the error handling path in afs_make_call()
     tries to unconditionally drop an async afs_call's self-reference.
     This self-reference, however, may already have been dropped by
     afs_extract_data() completing the afs_call

 (9) The refcount underflows when we return to afs_do_probe_vlserver() and
     that tries to drop its reference on the afs_call.

Fix this by making afs_make_call() attempt to complete the afs_call rather
than unconditionally putting it.  That way, if afs_extract_data() manages
to complete the call first, afs_make_call() won't do anything.

The bug can be forced by making do_udp_sendmsg() return -ENETUNREACH and
sticking an msleep() in rxrpc_send_data() after the 'success:' label to
widen the race window.

The error message looks something like:

    refcount_t: underflow; use-after-free.
    WARNING: CPU: 3 PID: 720 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
    ...
    RIP: 0010:refcount_warn_saturate+0xba/0x110
    ...
    afs_put_call+0x1dc/0x1f0 [kafs]
    afs_fs_get_capabilities+0x8b/0xe0 [kafs]
    afs_fs_probe_fileserver+0x188/0x1e0 [kafs]
    afs_lookup_server+0x3bf/0x3f0 [kafs]
    afs_alloc_server_list+0x130/0x2e0 [kafs]
    afs_create_volume+0x162/0x400 [kafs]
    afs_get_tree+0x266/0x410 [kafs]
    vfs_get_tree+0x25/0xc0
    fc_mount+0xe/0x40
    afs_d_automount+0x1b3/0x390 [kafs]
    __traverse_mounts+0x8f/0x210
    step_into+0x340/0x760
    path_openat+0x13a/0x1260
    do_filp_open+0xaf/0x160
    do_sys_openat2+0xaf/0x170

or something like:

    refcount_t: underflow; use-after-free.
    ...
    RIP: 0010:refcount_warn_saturate+0x99/0xda
    ...
    afs_put_call+0x4a/0x175
    afs_send_vl_probes+0x108/0x172
    afs_select_vlserver+0xd6/0x311
    afs_do_cell_detect_alias+0x5e/0x1e9
    afs_cell_detect_alias+0x44/0x92
    afs_validate_fc+0x9d/0x134
    afs_get_tree+0x20/0x2e6
    vfs_get_tree+0x1d/0xc9
    fc_mount+0xe/0x33
    afs_d_automount+0x48/0x9d
    __traverse_mounts+0xe0/0x166
    step_into+0x140/0x274
    open_last_lookups+0x1c1/0x1df
    path_openat+0x138/0x1c3
    do_filp_open+0x55/0xb4
    do_sys_openat2+0x6c/0xb6

Fixes: 34fa47612b ("afs: Fix race in async call refcounting")
Reported-by: Bill MacAllister <bill@ca-zephyr.org>
Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1052304
Suggested-by: Jeffrey E Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/2633992.1702073229@warthog.procyon.org.uk/ # v1
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-11 15:40:41 -08:00
Kent Overstreet
a66ff26b0f bcachefs: Close journal entry if necessary when flushing all pins
Since outstanding journal buffers hold a journal pin, when flushing all
pins we need to close the current journal entry if necessary so its pin
can be released.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-10 16:53:46 -05:00
Kent Overstreet
4a147af208 bcachefs: Fix uninitialized var in bch2_journal_replay()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-10 12:23:07 -05:00
Linus Torvalds
ca20f1622b Char/Misc driver fixes for 6.7-rc5
Here are some small fixes for 6.7-rc5 for a variety of small driver
 subsystems.  Included in here are:
   - debugfs revert for reported issue
   - greybus revert for reported issue
   - greybus fixup for endian build warning
   - coresight driver fixes
   - nvmem driver fixes
   - devcoredump fix
   - parport new device id
   - ndtest build fix
 
 All of these have ben in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZXRxAQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykphwCfaF0Dh6oajneYbo/pq70+an876uYAnjwALPfr
 g2EezrYYUAkkPACOd27t
 =q7gC
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char / misc driver fixes from Greg KH:
 "Here are some small fixes for 6.7-rc5 for a variety of small driver
  subsystems. Included in here are:

   - debugfs revert for reported issue

   - greybus revert for reported issue

   - greybus fixup for endian build warning

   - coresight driver fixes

   - nvmem driver fixes

   - devcoredump fix

   - parport new device id

   - ndtest build fix

  All of these have ben in linux-next with no reported issues"

* tag 'char-misc-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  nvmem: Do not expect fixed layouts to grab a layout driver
  parport: Add support for Brainboxes IX/UC/PX parallel cards
  Revert "greybus: gb-beagleplay: Ensure le for values in transport"
  greybus: gb-beagleplay: Ensure le for values in transport
  greybus: BeaglePlay driver needs CRC_CCITT
  Revert "debugfs: annotate debugfs handlers vs. removal with lockdep"
  devcoredump: Send uevent once devcd is ready
  ndtest: fix typo class_regster -> class_register
  misc: mei: client.c: fix problem of return '-EOVERFLOW' in mei_cl_write
  misc: mei: client.c: return negative error code in mei_cl_write
  mei: pxp: fix mei_pxp_send_message return value
  coresight: ultrasoc-smb: Fix uninitialized before use buf_hw_base
  coresight: ultrasoc-smb: Config SMB buffer before register sink
  coresight: ultrasoc-smb: Fix sleep while close preempt in enable_smb
  Documentation: coresight: fix `make refcheckdocs` warning
  hwtracing: hisi_ptt: Don't try to attach a task
  hwtracing: hisi_ptt: Handle the interrupt in hardirq context
  hwtracing: hisi_ptt: Add dummy callback pmu::read()
  coresight: Fix crash when Perf and sysfs modes are used concurrently
  coresight: etm4x: Remove bogous __exit annotation for some functions
2023-12-09 12:44:10 -08:00
Linus Torvalds
2099306c4e Six smb3 client fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmVzknIACgkQiiy9cAdy
 T1HIsgwAkpJsKz0ReQB0PDNGQ9/ZHnxhQ5V/E1sCtwbXeu50YffIo+hpABtkK8aK
 ViZAEgpcoO0kWzSr7zvPwxkykbDLV8pBXqvwHTg/KxXpxCEGt2VNCP5oOAFFp1IM
 8XpBkitB1rCuM6P6vJ5nmRK1gpwaGS/qwLWqROejvsouur/w/KfOH6bNEq/PMypo
 sUc7N5WP01grnh9/ipvgUBHSjKpWJFZ7Y2eRXirCXBMEHmosqbam0Fac6njYflLE
 4gc8YSrfxHaBqrgschAzcnwQrGY0lEy/UN9KpUOCKqGxP4ha45ni5t9foiCArhI5
 qzA7Ns//zqRcl+/A5DmJi8brvbHWnDzG4wQ5IyEm81w/8W5HqXt+9kGSxs6q8K0n
 MdVBq2eRl3untIRTXtMpp2xvPyAdKOuYZYfe27nv64Il2Xg3xUnh0viuX5M/BHBH
 kIqhMcvfFigdjYcgpgAuDeVXXmjUoW/9+Ybdc19d4ypsrxAGqMjQqLKgBLrAewDu
 DS80ihnn
 =LR8C
 -----END PGP SIGNATURE-----

Merge tag '6.7-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:
 "Six smb3 client fixes:

   - Fixes for copy_file_range and clone (cache invalidation and file
     size), also addresses an xfstest failure

   - Fix to return proper error if REMAP_FILE_DEDUP set (also fixes
     xfstest generic/304)

   - Fix potential null pointer reference with DFS

   - Multichannel fix addressing (reverting an earlier patch) some of
     the problems with enabling/disabling channels dynamically

  Still working on a followon multichannel fix to address another issue
  found in reconnect testing that will send next week"

* tag '6.7-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: reconnect worker should take reference on server struct unconditionally
  Revert "cifs: reconnect work should have reference on server struct"
  cifs: Fix non-availability of dedup breaking generic/304
  smb: client: fix potential NULL deref in parse_dfs_referrals()
  cifs: Fix flushing, invalidation and file size with FICLONE
  cifs: Fix flushing, invalidation and file size with copy_file_range()
2023-12-09 12:10:56 -08:00
Linus Torvalds
8e819a7623 31 hotfixes. 10 of these address pre-6.6 issues and are marked cc:stable.
The remainder address post-6.6 issues or aren't considered serious enough
 to justify backporting.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZXKEfwAKCRDdBJ7gKXxA
 jlRpAQCiAp1nSqIz/fOKTzoQRaTDXU/m+C+6ZAXdKLDfvQBhpwEAnxxjZ8IgF+8Z
 Klz/GirHX5w5o7jE2wb8iObo1nR75Qo=
 =omRq
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2023-12-07-18-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "31 hotfixes. Ten of these address pre-6.6 issues and are marked
  cc:stable. The remainder address post-6.6 issues or aren't considered
  serious enough to justify backporting"

* tag 'mm-hotfixes-stable-2023-12-07-18-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (31 commits)
  mm/madvise: add cond_resched() in madvise_cold_or_pageout_pte_range()
  nilfs2: prevent WARNING in nilfs_sufile_set_segment_usage()
  mm/hugetlb: have CONFIG_HUGETLB_PAGE select CONFIG_XARRAY_MULTI
  scripts/gdb: fix lx-device-list-bus and lx-device-list-class
  MAINTAINERS: drop Antti Palosaari
  highmem: fix a memory copy problem in memcpy_from_folio
  nilfs2: fix missing error check for sb_set_blocksize call
  kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP
  units: add missing header
  drivers/base/cpu: crash data showing should depends on KEXEC_CORE
  mm/damon/sysfs-schemes: add timeout for update_schemes_tried_regions
  scripts/gdb/tasks: fix lx-ps command error
  mm/Kconfig: make userfaultfd a menuconfig
  selftests/mm: prevent duplicate runs caused by TEST_GEN_PROGS
  mm/damon/core: copy nr_accesses when splitting region
  lib/group_cpus.c: avoid acquiring cpu hotplug lock in group_cpus_evenly
  checkstack: fix printed address
  mm/memory_hotplug: fix error handling in add_memory_resource()
  mm/memory_hotplug: add missing mem_hotplug_lock
  .mailmap: add a new address mapping for Chester Lin
  ...
2023-12-08 08:36:23 -08:00
Namjae Jeon
1373665448 ksmbd: fix wrong name of SMB2_CREATE_ALLOCATION_SIZE
MS confirm that "AISi" name of SMB2_CREATE_ALLOCATION_SIZE in MS-SMB2
specification is a typo. cifs/ksmbd have been using this wrong name from
MS-SMB2. It should be "AlSi". Also It will cause problem when running
smb2.create.open test in smbtorture against ksmbd.

Cc: stable@vger.kernel.org
Fixes: 12197a7fdd ("Clarify SMB2/SMB3 create context and add missing ones")
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-08 10:11:33 -06:00
Namjae Jeon
a9f106c765 ksmbd: fix wrong allocation size update in smb2_open()
When client send SMB2_CREATE_ALLOCATION_SIZE create context, ksmbd update
old size to ->AllocationSize in smb2 create response. ksmbd_vfs_getattr()
should be called after it to get updated stat result.

Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-08 10:11:33 -06:00
Namjae Jeon
658609d9a6 ksmbd: avoid duplicate opinfo_put() call on error of smb21_lease_break_ack()
opinfo_put() could be called twice on error of smb21_lease_break_ack().
It will cause UAF issue if opinfo is referenced on other places.

Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-08 10:11:33 -06:00
Namjae Jeon
c2a721eead ksmbd: lazy v2 lease break on smb2_write()
Don't immediately send directory lease break notification on smb2_write().
Instead, It postpones it until smb2_close().

Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-08 10:11:33 -06:00
Namjae Jeon
d47d9886ae ksmbd: send v2 lease break notification for directory
If client send different parent key, different client guid, or there is
no parent lease key flags in create context v2 lease, ksmbd send lease
break to client.

Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-08 10:11:33 -06:00
Kent Overstreet
6d1980f0af bcachefs: Fix deleted inode check for dirs
We could delete directories transactionally on rmdir()/unlink(), but we
don't; instead, like with regular files we wait for the VFS to call
evict().

That means that our check for directories in the deleted inodes btree is
wrong - the check should be for non-empty directories.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-08 00:39:56 -05:00
Ryusuke Konishi
675abf8df1 nilfs2: prevent WARNING in nilfs_sufile_set_segment_usage()
If nilfs2 reads a disk image with corrupted segment usage metadata, and
its segment usage information is marked as an error for the segment at the
write location, nilfs_sufile_set_segment_usage() can trigger WARN_ONs
during log writing.

Segments newly allocated for writing with nilfs_sufile_alloc() will not
have this error flag set, but this unexpected situation will occur if the
segment indexed by either nilfs->ns_segnum or nilfs->ns_nextnum (active
segment) was marked in error.

Fix this issue by inserting a sanity check to treat it as a file system
corruption.

Since error returns are not allowed during the execution phase where
nilfs_sufile_set_segment_usage() is used, this inserts the sanity check
into nilfs_sufile_mark_dirty() which pre-reads the buffer containing the
segment usage record to be updated and sets it up in a dirty state for
writing.

In addition, nilfs_sufile_set_segment_usage() is also called when
canceling log writing and undoing segment usage update, so in order to
avoid issuing the same kernel warning in that case, in case of
cancellation, avoid checking the error flag in
nilfs_sufile_set_segment_usage().

Link: https://lkml.kernel.org/r/20231205085947.4431-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+14e9f834f6ddecece094@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=14e9f834f6ddecece094
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06 16:12:50 -08:00
Sidhartha Kumar
4a3ef6be03 mm/hugetlb: have CONFIG_HUGETLB_PAGE select CONFIG_XARRAY_MULTI
After commit a08c7193e4 "mm/filemap: remove hugetlb special casing in
filemap.c", hugetlb pages are stored in the page cache in base page sized
indexes.  This leads to multi index stores in the xarray which is only
supporting through CONFIG_XARRAY_MULTI.  The other page cache user of
multi index stores ,THP, selects XARRAY_MULTI.  Have CONFIG_HUGETLB_PAGE
follow this behavior as well to avoid the BUG() with a CONFIG_HUGETLB_PAGE
&& !CONFIG_XARRAY_MULTI config.

Link: https://lkml.kernel.org/r/20231204183234.348697-1-sidhartha.kumar@oracle.com
Fixes: a08c7193e4 ("mm/filemap: remove hugetlb special casing in filemap.c")
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06 16:12:49 -08:00
Ryusuke Konishi
d61d0ab573 nilfs2: fix missing error check for sb_set_blocksize call
When mounting a filesystem image with a block size larger than the page
size, nilfs2 repeatedly outputs long error messages with stack traces to
the kernel log, such as the following:

 getblk(): invalid block size 8192 requested
 logical block size: 512
 ...
 Call Trace:
  dump_stack_lvl+0x92/0xd4
  dump_stack+0xd/0x10
  bdev_getblk+0x33a/0x354
  __breadahead+0x11/0x80
  nilfs_search_super_root+0xe2/0x704 [nilfs2]
  load_nilfs+0x72/0x504 [nilfs2]
  nilfs_mount+0x30f/0x518 [nilfs2]
  legacy_get_tree+0x1b/0x40
  vfs_get_tree+0x18/0xc4
  path_mount+0x786/0xa88
  __ia32_sys_mount+0x147/0x1a8
  __do_fast_syscall_32+0x56/0xc8
  do_fast_syscall_32+0x29/0x58
  do_SYSENTER_32+0x15/0x18
  entry_SYSENTER_32+0x98/0xf1
 ...

This overloads the system logger.  And to make matters worse, it sometimes
crashes the kernel with a memory access violation.

This is because the return value of the sb_set_blocksize() call, which
should be checked for errors, is not checked.

The latter issue is due to out-of-buffer memory being accessed based on a
large block size that caused sb_set_blocksize() to fail for buffers read
with the initial minimum block size that remained unupdated in the
super_block structure.

Since nilfs2 mkfs tool does not accept block sizes larger than the system
page size, this has been overlooked.  However, it is possible to create
this situation by intentionally modifying the tool or by passing a
filesystem image created on a system with a large page size to a system
with a smaller page size and mounting it.

Fix this issue by inserting the expected error handling for the call to
sb_set_blocksize().

Link: https://lkml.kernel.org/r/20231129141547.4726-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06 16:12:48 -08:00
Lizhi Xu
eb66b8abae squashfs: squashfs_read_data need to check if the length is 0
When the length passed in is 0, the pagemap_scan_test_walk() caller should
bail.  This error causes at least a WARN_ON().

Link: https://lkml.kernel.org/r/20231116031352.40853-1-lizhi.xu@windriver.com
Reported-by: syzbot+32d3767580a1ea339a81@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/0000000000000526f2060a30a085@google.com
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Reviewed-by: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06 16:12:45 -08:00
Peter Xu
4980e837ca mm/pagemap: fix wr-protect even if PM_SCAN_WP_MATCHING not set
The new pagemap ioctl contains a fast path for wr-protections without
looking into category masks.  It forgets to check PM_SCAN_WP_MATCHING
before applying the wr-protections.  It can cause, e.g., pte markers
installed on archs that do not even support uffd wr-protect.

WARNING: CPU: 0 PID: 5059 at mm/memory.c:1520 zap_pte_range mm/memory.c:1520 [inline]

Link: https://lkml.kernel.org/r/20231116201547.536857-3-peterx@redhat.com
Fixes: 12f6b01a0b ("fs/proc/task_mmu: add fast paths to get/clear PAGE_IS_WRITTEN flag")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: syzbot+7ca4b2719dc742b8d0a4@syzkaller.appspotmail.com
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Andrei Vagin <avagin@gmail.com>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06 16:12:45 -08:00
Peter Xu
0dff1b407d mm/pagemap: fix ioctl(PAGEMAP_SCAN) on vma check
Patch series "mm/pagemap: A few fixes to the recent PAGEMAP_SCAN".

This series should fix two known reports from syzbot on the new
PAGEMAP_SCAN ioctl():

https://lore.kernel.org/all/000000000000b0e576060a30ee3b@google.com/
https://lore.kernel.org/all/000000000000773fa7060a31e2cc@google.com/

The 3rd patch is something I found when testing these patches.


This patch (of 3):

The new ioctl(PAGEMAP_SCAN) relies on vma wr-protect capability provided
by userfault, however in the vma test it didn't explicitly require the vma
to have wr-protect function enabled, even if PM_SCAN_WP_MATCHING flag is
set.

It means the pagemap code can now apply uffd-wp bit to a page in the vma
even if not registered to userfaultfd at all.

Then in whatever way as long as the pte got written and page fault
resolved, we'll apply the write bit even if uffd-wp bit is set.  We'll see
a pte that has both UFFD_WP and WRITE bit set.  Anything later that looks
up the pte for uffd-wp bit will trigger the warning:

WARNING: CPU: 1 PID: 5071 at arch/x86/include/asm/pgtable.h:403 pte_uffd_wp arch/x86/include/asm/pgtable.h:403 [inline]

Fix it by doing proper check over the vma attributes when
PM_SCAN_WP_MATCHING is specified.

Link: https://lkml.kernel.org/r/20231116201547.536857-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20231116201547.536857-2-peterx@redhat.com
Fixes: 52526ca7fd ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: syzbot+e94c5aaf7890901ebf9b@syzkaller.appspotmail.com
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Andrei Vagin <avagin@gmail.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06 16:12:44 -08:00
Daniel Hill
e597288839 bcachefs: rebalance shouldn't attempt to compress unwritten extents
This fixes a bug where rebalance would loop repeatedly on the same
extents.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-06 17:43:21 -05:00
Boris Burkov
e85a0adacf btrfs: ensure releasing squota reserve on head refs
A reservation goes through a 3 step lifetime:

- generated during delalloc
- released/counted by ordered_extent allocation
- freed by running delayed ref

That third step depends on must_insert_reserved on the head ref, so the
head ref with that field set owns the reservation. Once you prepare to
run the head ref, must_insert_reserved is unset, which means that
running the ref must free the reservation, whether or not it succeeds,
or else the reservation is leaked. That results in either a risk of
spurious ENOSPC if the fs stays writeable or a warning on unmount if it
is readonly.

The existing squota code was aware of these invariants, but missed a few
cases. Improve it by adding a helper function to use in the cleanup
paths and call it from the existing early returns in running delayed
refs. This also simplifies btrfs_record_squota_delta and struct
btrfs_quota_delta.

This fixes (or at least improves the reliability of) generic/475 with
"mkfs -O squota". On my machine, that test failed ~4/10 times without
this patch and passed 100/100 times with it.

Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-06 22:32:57 +01:00
Boris Burkov
a86805504b btrfs: don't clear qgroup reserved bit in release_folio
The EXTENT_QGROUP_RESERVED bit is used to "lock" regions of the file for
duplicate reservations. That is two writes to that range in one
transaction shouldn't create two reservations, as the reservation will
only be freed once when the write finally goes down. Therefore, it is
never OK to clear that bit without freeing the associated qgroup
reserve. At this point, we don't want to be freeing the reserve, so mask
off the bit.

CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-06 22:32:52 +01:00
Boris Burkov
b321a52cce btrfs: free qgroup pertrans reserve on transaction abort
If we abort a transaction, we never run the code that frees the pertrans
qgroup reservation. This results in warnings on unmount as that
reservation has been leaked. The leak isn't a huge issue since the fs is
read-only, but it's better to clean it up when we know we can/should. Do
it during the cleanup_transaction step of aborting.

CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-06 22:32:49 +01:00
Boris Burkov
9e65bfca24 btrfs: fix qgroup_free_reserved_data int overflow
The reserved data counter and input parameter is a u64, but we
inadvertently accumulate it in an int. Overflowing that int results in
freeing the wrong amount of data and breaking reserve accounting.

Unfortunately, this overflow rot spreads from there, as the qgroup
release/free functions rely on returning an int to take advantage of
negative values for error codes.

Therefore, the full fix is to return the "released" or "freed" amount by
a u64 argument and to return 0 or negative error code via the return
value.

Most of the call sites simply ignore the return value, though some
of them handle the error and count the returned bytes. Change all of
them accordingly.

CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-06 22:32:46 +01:00
Boris Burkov
f63e1164b9 btrfs: free qgroup reserve when ORDERED_IOERR is set
An ordered extent completing is a critical moment in qgroup reserve
handling, as the ownership of the reservation is handed off from the
ordered extent to the delayed ref. In the happy path we release (unlock)
but do not free (decrement counter) the reservation, and the delayed ref
drives the free. However, on an error, we don't create a delayed ref,
since there is no ref to add. Therefore, free on the error path.

CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-06 22:32:40 +01:00
Shyam Prasad N
04909192ad cifs: reconnect worker should take reference on server struct unconditionally
Reconnect worker currently assumes that the server struct
is alive and only takes reference on the server if it needs
to call smb2_reconnect.

With the new ability to disable channels based on whether the
server has multichannel disabled, this becomes a problem when
we need to disable established channels. While disabling the
channels and deallocating the server, there could be reconnect
work that could not be cancelled (because it started).

This change forces the reconnect worker to unconditionally
take a reference on the server when it runs.

Also, this change now allows smb2_reconnect to know if it was
called by the reconnect worker. Based on this, the cifs_put_tcp_session
can decide whether it can cancel the reconnect work synchronously or not.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-06 11:04:23 -06:00
Shyam Prasad N
8233425248 Revert "cifs: reconnect work should have reference on server struct"
This reverts commit 19a4b9d6c3.

This earlier commit was making an assumption that each mod_delayed_work
called for the reconnect work would result in smb2_reconnect_server
being called twice. This assumption turns out to be untrue. So reverting
this change for now.

I will submit a follow-up patch to fix the actual problem in a different
way.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-06 11:03:36 -06:00
Brian Foster
5796230582 bcachefs: don't attempt rw on unfreeze when shutdown
The internal freeze mechanism in bcachefs mostly reuses the generic
rw<->ro transition code. If the fs happens to shutdown during or
after freeze, a transition back to rw can fail. This is expected,
but returning an error from the unfreeze callout prevents the
filesystem from being unfrozen.

Skip the read write transition if the fs is shutdown. This allows
the fs to unfreeze at the vfs level so writes will no longer block,
but will still fail due to the emergency read-only state of the fs.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-06 00:21:30 -05:00
Kent Overstreet
7aebaabfed bcachefs: Fix creating snapshot with implict source
When creating a snapshot without specifying the source subvolume, we use
the subvolume containing the new snapshot.

Previously, this worked if the directory containing the new snapshot was
the subvolume root - but we were using the incorrect helper, and got a
subvolume ID of 0 when the parent directory wasn't the root of the
subvolume, causing an emergency read-only.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-06 00:21:30 -05:00
David Howells
691a41d8da cifs: Fix non-availability of dedup breaking generic/304
Deduplication isn't supported on cifs, but cifs doesn't reject it, instead
treating it as extent duplication/cloning.  This can cause generic/304 to go
silly and run for hours on end.

Fix cifs to indicate EOPNOTSUPP if REMAP_FILE_DEDUP is set in
->remap_file_range().

Note that it's unclear whether or not commit b073a08016 is meant to cause
cifs to return an error if REMAP_FILE_DEDUP.

Fixes: b073a08016 ("cifs: fix that return -EINVAL when do dedupe operation")
Cc: stable@vger.kernel.org
Suggested-by: Dave Chinner <david@fromorbit.com>
cc: Xiaoli Feng <fengxiaoli0714@gmail.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Darrick Wong <darrick.wong@oracle.com>
cc: fstests@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/3876191.1701555260@warthog.procyon.org.uk/
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-05 21:12:04 -06:00
Paulo Alcantara
92414333eb smb: client: fix potential NULL deref in parse_dfs_referrals()
If server returned no data for FSCTL_DFS_GET_REFERRALS, @dfs_rsp will
remain NULL and then parse_dfs_referrals() will dereference it.

Fix this by returning -EIO when no output data is returned.

Besides, we can't fix it in SMB2_ioctl() as some FSCTLs are allowed to
return no data as per MS-SMB2 2.2.32.

Fixes: 9d49640a21 ("CIFS: implement get_dfs_refer for SMB2+")
Cc: stable@vger.kernel.org
Reported-by: Robert Morris <rtm@csail.mit.edu>
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-05 21:12:00 -06:00
Namjae Jeon
eb547407f3 ksmbd: downgrade RWH lease caching state to RH for directory
RWH(Read + Write + Handle) caching state is not supported for directory.
ksmbd downgrade it to RH for directory if client send RWH caching lease
state.

Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-05 20:43:23 -06:00
Namjae Jeon
18dd1c367c ksmbd: set v2 lease capability
Set SMB2_GLOBAL_CAP_DIRECTORY_LEASING to ->capabilities to inform server
support directory lease to client.

Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-05 20:43:23 -06:00
Namjae Jeon
d045850b62 ksmbd: set epoch in create context v2 lease
To support v2 lease(directory lease), ksmbd set epoch in create context
v2 lease response.

Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-05 20:43:23 -06:00
Zizhi Wo
8f17527230 ksmbd: fix memory leak in smb2_lock()
In smb2_lock(), if setup_async_work() executes successfully,
work->cancel_argv will bind the argv that generated by kmalloc(). And
release_async_work() is called in ksmbd_conn_try_dequeue_request() or
smb2_lock() to release argv.
However, when setup_async_work function fails, work->cancel_argv has not
been bound to the argv, resulting in the previously allocated argv not
being released. Call kfree() to fix it.

Fixes: e2f34481b2 ("cifsd: add server-side procedures for SMB3")
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-05 20:43:23 -06:00
Kent Overstreet
f88d811a23 bcachefs: Don't run indirect extent trigger unless inserting/deleting
This fixes a transaction path overflow reported in the snapshot deletion
path, when moving extents to the correct snapshot.

The root of the issue is that creating/deleting a reflink pointer can
generate an unbounded number of updates, if it is allowed to reference
an unbounded number of indirect extents; to prevent this, merging of
reflink pointers has been disabled.

But there's a hole, which is that copygc/rebalance may fragment existing
extents in the course of moving them around, and if an indirect extent
becomes too fragmented we'll then become unable to delete the reflink
pointer.

The eventual solution is going to be to tweak trigger handling so that
we can process large reflink pointers incrementally when necessary, and
notice that trigger updates don't need to be run for the part of the
reflink pointer not changing. That is going to be a bigger project
though, for another patch.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-04 16:04:55 -05:00
Kent Overstreet
adcf4ee642 bcachefs: Convert compression_stats to for_each_btree_key2
for_each_btree_key2() runs each loop iteration in a btree transaction,
and thus does not cause SRCU lock hold time problems.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-04 16:04:55 -05:00
Kent Overstreet
131898b0cb bcachefs: Fix bch2_extent_drop_ptrs() call
Also, make bch2_extent_drop_ptrs() safer, so it works with extents and
non-extents iterators.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-04 16:04:55 -05:00
Kent Overstreet
87b0d8d3d0 bcachefs: Fix a journal deadlock in replay
Recently, journal pre-reservations were removed. They were for reserving
space ahead of time in the journal for operations that are required for
journal reclaim, e.g. btree key cache flushing and interior node btree
updates.

Instead we have watermarks - only operations for journal reclaim are
allowed when the journal is low on space, and in general we're quite
good about doing operations in the order that will free up space in the
journal quickest when we're low on space. If we're doing a journal
reclaim operation out of order, we usually do it in nonblocking mode if
it's not freeing up space at the end of the journal.

There's an exceptino though - interior btree node update operations have
to be BCH_WATERMARK_reclaim - once they've been started, and they can't
be nonblocking. Generally this is fine because they'll only be a very
small fraction of transaction commits - but there's an exception, which
is during journal replay.

Journal replay does many btree operations, but doesn't need to commit
them to the journal since they're already in the journal. So killing off
of pre-reservation, plus another change to make journal replay more
efficient by initially doing the replay in sorted btree order, made it
possible for the interior update operations replay generates to fill and
deadlock the journal.

Fix this by introducing a new check on journal space at the _start_ of
an interior update operation. This causes us to block if necessary in
exactly the same way as we used to when interior updates took a journal
pre-reservaiton, but without all the expensive accounting
pre-reservations required.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-04 16:04:55 -05:00
Kent Overstreet
ef6fae4a13 bcachefs; Don't use btree write buffer until journal replay is finished
The keys being replayed by journal replay have to be synchronized with
updates by other threads that overwrite them. We rely on btree node
locks for synchronizing - but since btree write buffer updates take no
btree locks, that won't work.

Instead, simply disable using the btree write buffer until journal
replay is finished.

This fixes a rare backpointers error in the merge_torture_flakey test.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-04 15:46:31 -05:00
David Howells
c54fc3a4f3 cifs: Fix flushing, invalidation and file size with FICLONE
Fix a number of issues in the cifs filesystem implementation of the FICLONE
ioctl in cifs_remap_file_range().  This is analogous to the previously
fixed bug in cifs_file_copychunk_range() and can share the helper
functions.

Firstly, the invalidation of the destination range is handled incorrectly:
We shouldn't just invalidate the whole file as dirty data in the file may
get lost and we can't just call truncate_inode_pages_range() to invalidate
the destination range as that will erase parts of a partial folio at each
end whilst invalidating and discarding all the folios in the middle.  We
need to force all the folios covering the range to be reloaded, but we
mustn't lose dirty data in them that's not in the destination range.

Further, we shouldn't simply round out the range to PAGE_SIZE at each end
as cifs should move to support multipage folios.

Secondly, there's an issue whereby a write may have extended the file
locally, but not have been written back yet.  This can leaves the local
idea of the EOF at a later point than the server's EOF.  If a clone request
is issued, this will fail on the server with STATUS_INVALID_VIEW_SIZE
(which gets translated to -EIO locally) if the clone source extends past
the server's EOF.

Fix this by:

 (0) Flush the source region (already done).  The flush does nothing and
     the EOF isn't moved if the source region has no dirty data.

 (1) Move the EOF to the end of the source region if it isn't already at
     least at this point.  If we can't do this, for instance if the server
     doesn't support it, just flush the entire source file.

 (2) Find the folio (if present) at each end of the range, flushing it and
     increasing the region-to-be-invalidated to cover those in their
     entirety.

 (3) Fully discard all the folios covering the range as we want them to be
     reloaded.

 (4) Then perform the extent duplication.

Thirdly, set i_size after doing the duplicate_extents operation as this
value may be used by various things internally.  stat() hides the issue
because setting ->time to 0 causes cifs_getatr() to revalidate the
attributes.

These were causing the cifs/001 xfstest to fail.

Fixes: 04b38d6012 ("vfs: pull btrfs clone API to vfs layer")
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: stable@vger.kernel.org
cc: Christoph Hellwig <hch@lst.de>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-04 14:15:05 -06:00
David Howells
7b2404a886 cifs: Fix flushing, invalidation and file size with copy_file_range()
Fix a number of issues in the cifs filesystem implementation of the
copy_file_range() syscall in cifs_file_copychunk_range().

Firstly, the invalidation of the destination range is handled incorrectly:
We shouldn't just invalidate the whole file as dirty data in the file may
get lost and we can't just call truncate_inode_pages_range() to invalidate
the destination range as that will erase parts of a partial folio at each
end whilst invalidating and discarding all the folios in the middle.  We
need to force all the folios covering the range to be reloaded, but we
mustn't lose dirty data in them that's not in the destination range.

Further, we shouldn't simply round out the range to PAGE_SIZE at each end
as cifs should move to support multipage folios.

Secondly, there's an issue whereby a write may have extended the file
locally, but not have been written back yet.  This can leaves the local
idea of the EOF at a later point than the server's EOF.  If a copy request
is issued, this will fail on the server with STATUS_INVALID_VIEW_SIZE
(which gets translated to -EIO locally) if the copy source extends past the
server's EOF.

Fix this by:

 (0) Flush the source region (already done).  The flush does nothing and
     the EOF isn't moved if the source region has no dirty data.

 (1) Move the EOF to the end of the source region if it isn't already at
     least at this point.  If we can't do this, for instance if the server
     doesn't support it, just flush the entire source file.

 (2) Find the folio (if present) at each end of the range, flushing it and
     increasing the region-to-be-invalidated to cover those in their
     entirety.

 (3) Fully discard all the folios covering the range as we want them to be
     reloaded.

 (4) Then perform the copy.

Thirdly, set i_size after doing the copychunk_range operation as this value
may be used by various things internally.  stat() hides the issue because
setting ->time to 0 causes cifs_getatr() to revalidate the attributes.

These were causing the generic/075 xfstest to fail.

Fixes: 620d8745b3 ("Introduce cifs_copy_file_range()")
Cc: stable@vger.kernel.org
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-04 14:14:43 -06:00
Amir Goldstein
3f29f1c336 fuse: disable FOPEN_PARALLEL_DIRECT_WRITES with FUSE_DIRECT_IO_ALLOW_MMAP
The new fuse init flag FUSE_DIRECT_IO_ALLOW_MMAP breaks assumptions made by
FOPEN_PARALLEL_DIRECT_WRITES and causes test generic/095 to hit
BUG_ON(fi->writectr < 0) assertions in fuse_set_nowrite():

generic/095 5s ...
  kernel BUG at fs/fuse/dir.c:1756!
...
  ? fuse_set_nowrite+0x3d/0xdd
  ? do_raw_spin_unlock+0x88/0x8f
  ? _raw_spin_unlock+0x2d/0x43
  ? fuse_range_is_writeback+0x71/0x84
  fuse_sync_writes+0xf/0x19
  fuse_direct_io+0x167/0x5bd
  fuse_direct_write_iter+0xf0/0x146

Auto disable FOPEN_PARALLEL_DIRECT_WRITES when server negotiated
FUSE_DIRECT_IO_ALLOW_MMAP.

Fixes: e78662e818 ("fuse: add a new fuse init flag to relax restrictions in no cache mode")
Cc: <stable@vger.kernel.org> # v6.6
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-12-04 10:19:32 +01:00