Commit graph

1234480 commits

Author SHA1 Message Date
Chandan Babu R
47c460efc4 xfs: fix realtime geometry integer overflows [v2]
Merge tag 'fix-rtmount-overflows-6.8_2023-12-06' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.8-mergeA

xfs: fix realtime geometry integer overflows

While reading through the realtime geometry support code in xfsprogs, I
noticed a discrepancy between the sb_rextslog computation used when
writing out the superblock during mkfs and the validation code used in
xfs_repair.  This discrepancy would lead to system failure for a runt rt
volume having more than 1 rt block but zero rt extents in length.  Most
people aren't going to configure a 1M extent size for their 360k rt
floppy disk volume, but I did!

In the process of studying that code, it occurred to me that there is a
second bug in the computation -- the use of highbit32 for a 64-bit
value means that the upper 32 bits are not considered in the search for
a high bit.  This causes the creation of a realtime summary file that is
the wrong length.  If rextents is a multiple of 2^32 then this will
appear to work fine because highbit32 returns -1 for an input of 0; but
for all other cases the rt summary is undersized, leading to failures.

Fix the first problem by standardizing the computation with a helper in
libxfs; and the second problem by correcting the computation.  This will
cause any existing rt volumes larger than 2^32 blocks to fail validation
but they probably were already crashing anyway.
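
The truncation described above can be reproduced in user space.  The
following is a minimal sketch, not the kernel code: demo_highbit64 and
demo_highbit32 are hypothetical stand-ins for the kernel helpers, written
only to mirror the behavior this message describes (return the index of
the highest set bit, or -1 for an input of 0).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in: index of the highest set bit, or -1 if v == 0. */
static int demo_highbit64(uint64_t v)
{
	int bit = -1;

	while (v) {
		v >>= 1;
		bit++;
	}
	return bit;
}

/* Same search, but only the low 32 bits of the value are examined. */
static int demo_highbit32(uint64_t v)
{
	return demo_highbit64((uint32_t)v);
}

/* Usage: a 64-bit extent count whose high bit lives above bit 31. */
void demo_highbit_example(void)
{
	uint64_t rextents = (1ULL << 32) + 5;

	assert(demo_highbit64(rextents) == 32);
	assert(demo_highbit32(rextents) == 2);		/* upper bits were dropped */
	assert(demo_highbit32(1ULL << 32) == -1);	/* low 32 bits are all zero */
}
```

Any size computed from the 32-bit result is far too small whenever the
count does not fit in 32 bits, which is the undersized summary file the
message refers to.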

This has been lightly tested with fstests.  Enjoy!

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

* tag 'fix-rtmount-overflows-6.8_2023-12-06' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: don't allow overly small or large realtime volumes
  xfs: fix 32-bit truncation in xfs_compute_rextslog
  xfs: make rextslog computation consistent with mkfs
2023-12-07 14:03:11 +05:30
Chandan Babu R
34d3866668 xfs: continue removing defer item boilerplate [v2]
Merge tag 'reconstruct-defer-cleanups-6.8_2023-12-06' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.8-mergeA

xfs: continue removing defer item boilerplate

Now that we've restructured log intent item recovery to reconstruct the
incore deferred work state, apply further cleanups to that code to
remove boilerplate that is duplicated across all the _item.c files.
Having done that, collapse a bunch of trivial helpers to reduce the
overall call chain.  That enables us to refactor the relog code so that
the ->relog_item implementations only have to know how to format the
implementation-specific data encoded in an intent item and don't
themselves have to handle the log item juggling.

This has been lightly tested with fstests.  Enjoy!

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

* tag 'reconstruct-defer-cleanups-6.8_2023-12-06' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: move ->iop_relog to struct xfs_defer_op_type
  xfs: collapse the ->create_done functions
  xfs: hoist xfs_trans_add_item calls to defer ops functions
  xfs: clean out XFS_LI_DIRTY setting boilerplate from ->iop_relog
  xfs: use xfs_defer_create_done for the relogging operation
  xfs: hoist ->create_intent boilerplate to its callsite
  xfs: collapse the ->finish_item helpers
  xfs: hoist intent done flag setting to ->finish_item callsite
  xfs: don't set XFS_TRANS_HAS_INTENT_DONE when there's no ATTRD log item
2023-12-07 13:58:08 +05:30
Chandan Babu R
6b4ffe97e9 xfs: log intent item recovery should reconstruct defer work state [v3]
Merge tag 'reconstruct-defer-work-6.8_2023-12-06' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.8-mergeA

xfs: log intent item recovery should reconstruct defer work state

Long Li reported a KASAN report from a UAF when intent recovery fails:

 ==================================================================
 BUG: KASAN: slab-use-after-free in xfs_cui_release+0xb7/0xc0
 Read of size 4 at addr ffff888012575e60 by task kworker/u8:3/103
 CPU: 3 PID: 103 Comm: kworker/u8:3 Not tainted 6.4.0-rc7-next-20230619-00003-g94543a53f9a4-dirty #166
 Workqueue: xfs-cil/sda xlog_cil_push_work
 Call Trace:
  <TASK>
  dump_stack_lvl+0x50/0x70
  print_report+0xc2/0x600
  kasan_report+0xb6/0xe0
  xfs_cui_release+0xb7/0xc0
  xfs_cud_item_release+0x3c/0x90
  xfs_trans_committed_bulk+0x2d5/0x7f0
  xlog_cil_committed+0xaba/0xf20
  xlog_cil_push_work+0x1a60/0x2360
  process_one_work+0x78e/0x1140
  worker_thread+0x58b/0xf60
  kthread+0x2cd/0x3c0
  ret_from_fork+0x1f/0x30
  </TASK>

 Allocated by task 531:
  kasan_save_stack+0x22/0x40
  kasan_set_track+0x25/0x30
  __kasan_slab_alloc+0x55/0x60
  kmem_cache_alloc+0x195/0x5f0
  xfs_cui_init+0x198/0x1d0
  xlog_recover_cui_commit_pass2+0x133/0x5f0
  xlog_recover_items_pass2+0x107/0x230
  xlog_recover_commit_trans+0x3e7/0x9c0
  xlog_recovery_process_trans+0x140/0x1d0
  xlog_recover_process_ophdr+0x1a0/0x3d0
  xlog_recover_process_data+0x108/0x2d0
  xlog_recover_process+0x1f6/0x280
  xlog_do_recovery_pass+0x609/0xdb0
  xlog_do_log_recovery+0x84/0xe0
  xlog_do_recover+0x7d/0x470
  xlog_recover+0x25f/0x490
  xfs_log_mount+0x2dd/0x6f0
  xfs_mountfs+0x11ce/0x1e70
  xfs_fs_fill_super+0x10ec/0x1b20
  get_tree_bdev+0x3c8/0x730
  vfs_get_tree+0x89/0x2c0
  path_mount+0xecf/0x1800
  do_mount+0xf3/0x110
  __x64_sys_mount+0x154/0x1f0
  do_syscall_64+0x39/0x80
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

 Freed by task 531:
  kasan_save_stack+0x22/0x40
  kasan_set_track+0x25/0x30
  kasan_save_free_info+0x2b/0x40
  __kasan_slab_free+0x114/0x1b0
  kmem_cache_free+0xf8/0x510
  xfs_cui_item_free+0x95/0xb0
  xfs_cui_release+0x86/0xc0
  xlog_recover_cancel_intents.isra.0+0xf8/0x210
  xlog_recover_finish+0x7e7/0x980
  xfs_log_mount_finish+0x2bb/0x4a0
  xfs_mountfs+0x14bf/0x1e70
  xfs_fs_fill_super+0x10ec/0x1b20
  get_tree_bdev+0x3c8/0x730
  vfs_get_tree+0x89/0x2c0
  path_mount+0xecf/0x1800
  do_mount+0xf3/0x110
  __x64_sys_mount+0x154/0x1f0
  do_syscall_64+0x39/0x80
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

 The buggy address belongs to the object at ffff888012575dc8
  which belongs to the cache xfs_cui_item of size 432
 The buggy address is located 152 bytes inside of
  freed 432-byte region [ffff888012575dc8, ffff888012575f78)

 The buggy address belongs to the physical page:
 page:ffffea0000495d00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888012576208 pfn:0x12574
 head:ffffea0000495d00 order:2 entire_mapcount:0 nr_pages_mapped:0 pincount:0
 flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
 page_type: 0xffffffff()
 raw: 001fffff80010200 ffff888012092f40 ffff888014570150 ffff888014570150
 raw: ffff888012576208 00000000001e0010 00000001ffffffff 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  ffff888012575d00: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
  ffff888012575d80: fc fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb
 >ffff888012575e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                        ^
  ffff888012575e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff888012575f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc
 ==================================================================

"If process intents fails, intent items left in AIL will be deleted
from AIL and freed in error handling, even intent items that have been
recovered and created done items. After this, uaf will be triggered when
done item committed, because at this point the released intent item will
be accessed.

xlog_recover_finish                     xlog_cil_push_work
----------------------------            ---------------------------
xlog_recover_process_intents
  xfs_cui_item_recover//cui_refcount == 1
    xfs_trans_get_cud
    xfs_trans_commit
      <add cud item to cil>
  xfs_cui_item_recover
    <error occurred and return>
xlog_recover_cancel_intents
  xfs_cui_release     //cui_refcount == 0
    xfs_cui_item_free //free cui
  <release other intent items>
xlog_force_shutdown   //shutdown
                               <...>
                                        <push items in cil>
                                        xlog_cil_committed
                                          xfs_cud_item_release
                                            xfs_cui_release // UAF

"Intent log items are created with a reference count of 2, one for the
creator, and one for the intent done object. Log recovery explicitly
drops the creator reference after it is inserted into the AIL, but it
then processes the log item as if it also owns the intent-done reference.

"The code in ->iop_recovery should assume that it passes the reference
to the done intent, we can remove the intent item from the AIL after
creating the done-intent, but if that code fails before creating the
done-intent then it needs to release the intent reference by log recovery
itself.

"That way when we go to cancel the intent, the only intents we find in
the AIL are the ones we know have not been processed yet and hence we
can safely drop both the creator and the intent done reference from
xlog_recover_cancel_intents().

"Hence if we remove the intent from the list of intents that need to
be recovered after we have done the initial recovery, we achieve two
things:

"1. the tail of the log can be moved forward with the commit of the
done intent or new intent to continue the operation, and

"2. We avoid the problem of trying to determine how many reference
counts we need to drop from intent recovery cancelling because we
never come across intents we've actually attempted recovery on."

Restated: The cause of the UAF is that xlog_recover_cancel_intents
thinks that it owns the refcount on any intent item in the AIL, and that
it's always safe to release these intent items.  This is not true after
the recovery function creates a log intent done item and points it at
the log intent item because releasing the done item always releases the
intent item.
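
The ownership rule in the restatement can be sketched in user space.
This is only an illustrative model, not the kernel implementation: every
demo_* name is hypothetical, the intent is assumed to hold the single
reference that remains after log recovery drops the creator reference,
and a "freed" flag stands in for actually freeing memory so that a
double release trips an assertion instead of corrupting the heap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; not the kernel structures. */
struct demo_intent { int refcount; bool freed; };
struct demo_done { struct demo_intent *intent; bool released; };
struct demo_pending { struct demo_intent *intent; struct demo_done *done; };

static void demo_intent_release(struct demo_intent *ip)
{
	assert(!ip->freed);		/* touching a freed intent is the UAF */
	assert(ip->refcount > 0);
	if (--ip->refcount == 0)
		ip->freed = true;
}

/* Releasing a done item always releases the intent it points at. */
static void demo_done_release(struct demo_done *dp)
{
	assert(!dp->released);
	dp->released = true;
	demo_intent_release(dp->intent);
}

/* Cancel: if both items exist, release only the done item. */
static void demo_pending_cancel(struct demo_pending *pp)
{
	if (pp->done)
		demo_done_release(pp->done);
	else if (pp->intent)
		demo_intent_release(pp->intent);
	pp->intent = NULL;
	pp->done = NULL;
}

/* Usage: each intent is released exactly once on either path. */
void demo_cancel_example(void)
{
	struct demo_intent a = { .refcount = 1 }, b = { .refcount = 1 };
	struct demo_done done_a = { .intent = &a };
	struct demo_pending with_done = { .intent = &a, .done = &done_a };
	struct demo_pending intent_only = { .intent = &b };

	demo_pending_cancel(&with_done);
	demo_pending_cancel(&intent_only);
	assert(a.freed && a.refcount == 0);
	assert(b.freed && b.refcount == 0);
}
```

Releasing the intent directly while a done item still points at it would
leave the done item holding a dangling pointer, which is the pattern in
the KASAN report above.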

The runtime defer ops code avoids all this by tracking both the log
intent and the intent done items, and releasing only the intent done
item if both have been created.  Long Li proposed fixing this by adding
state flags, but I have a more comprehensive fix.

First, observe that the latter half of the intent _recover functions are
nearly open-coded versions of the corresponding _finish_one function
that uses an onstack deferred work item to single-step through the item.

Second, notice that the recover function is not an exact match because
of the odd behavior that unfinished recovered work items are relogged
with separate log intent items instead of a single new log intent item,
which is what the defer ops machinery does.

Dave and I have long suspected that recovery should be reconstructing
the defer work state from what's in the recovered intent item.  Now we
finally have an excuse to refactor the code to do that.

This series starts by fixing a resource leak in LARP recovery.  We fix
the bug that Long Li reported by switching the intent recovery code to
construct chains of xfs_defer_pending objects and then using the defer
pending objects to track the intent/done item ownership.  Finally, we
clean up the code to reconstruct the exact incore state, which means we
can remove all the opencoded _recover code, which makes maintaining log
items much easier.

v2: minor changes per review comments
v3: pick up more rvb tags, fix build errors

This has been lightly tested with fstests.  Enjoy!

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

* tag 'reconstruct-defer-work-6.8_2023-12-06' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: move ->iop_recover to xfs_defer_op_type
  xfs: use xfs_defer_finish_one to finish recovered work items
  xfs: dump the recovered xattri log item if corruption happens
  xfs: recreate work items when recovering intent items
  xfs: transfer recovered intent item ownership in ->iop_recover
  xfs: pass the xfs_defer_pending object to iop_recover
  xfs: use xfs_defer_pending objects to recover intent items
  xfs: don't leak recovered attri intent items
2023-12-07 13:50:54 +05:30
Darrick J. Wong
3f3cec0310 xfs: force small EFIs for reaping btree extents
Introduce the concept of a defer ops barrier to separate consecutively
queued pending work items of the same type.  With a barrier in place,
the two work items will be tracked separately, and receive separate log
intent items.  The goal here is to prevent reaping of old metadata
blocks from creating unnecessarily huge EFIs that could then run the
risk of overflowing the scrub transaction.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:19 -08:00
Darrick J. Wong
6bb9ea8ecd xfs: log EFIs for all btree blocks being used to stage a btree
We need to log EFIs for every extent that we allocate for the purpose of
staging a new btree so that if we fail then the blocks will be freed
during log recovery.  Use the autoreaping mechanism provided by the
previous patch to attach paused freeing work to the scrub transaction.
We can then mark the EFIs stale if we decide to commit the new btree, or
we can unpause the EFIs if we decide to abort the repair.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:19 -08:00
Darrick J. Wong
be40841763 xfs: implement block reservation accounting for btrees we're staging
Create a new xrep_newbt structure to encapsulate a fake root for
creating a staged btree cursor as well as to track all the blocks that
we need to reserve in order to build that btree.

As for the particular choice of lowspace thresholds and btree block
slack factors -- at this point one could say that the thresholds in
online repair come from bulkload_estimate_ag_slack in xfs_repair[1].
But that's not the entire story, since the offline btree rebuilding
code in xfs_repair was merged as a retroport of the online btree code
in this patchset!

Before xfs_btree_staging.[ch] came along, xfs_repair determined the
slack factor (aka the number of slots to leave unfilled in each new
btree block) via open-coded logic in repair/phase5.c[2].  At that point
the slack factors were arbitrary quantities per btree.  The rmapbt
automatically left 10 slots free; everything else left zero.

That had a noticeable effect on performance straight after mounting
because adding records to /any/ btree would result in splits.  A few
years ago when this patch was first written, Dave and I decided that
repair should generate btree blocks that were 75% full unless space was
tight, in which case it should try to fill the blocks to nearly full.
We defined tight as ~10% free to avoid repair failures but settled on
3/32 (~9%) to avoid div64.
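
The threshold arithmetic can be sketched as follows.  This is a
hypothetical illustration rather than the code in xfs_repair or the
kernel: the demo_* names are invented, "75% full" is modeled as leaving
a quarter of the record slots empty, and the slack of 2 used for the
"nearly full" case is an arbitrary illustrative value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative assumption: slots left empty when space is tight. */
#define DEMO_TIGHT_SLACK	2

/*
 * Tight means fewer than 3/32 (9.375%) of the blocks are free.  Cross
 * multiplying avoids a 64-bit division; this assumes block counts small
 * enough that multiplying by 32 cannot overflow.
 */
static bool demo_space_is_tight(uint64_t free_blocks, uint64_t total_blocks)
{
	return free_blocks * 32 < total_blocks * 3;
}

/* Number of record slots to leave unfilled in each new btree block. */
static unsigned int demo_slack(unsigned int maxrecs, uint64_t free_blocks,
			       uint64_t total_blocks)
{
	if (demo_space_is_tight(free_blocks, total_blocks))
		return maxrecs > DEMO_TIGHT_SLACK ? DEMO_TIGHT_SLACK : 0;
	return maxrecs / 4;	/* leave a quarter empty, i.e. ~75% full */
}

/* Usage: 100 slots per block in a 100-block allocation group. */
void demo_slack_example(void)
{
	assert(demo_slack(100, 50, 100) == 25);	/* plenty of space */
	assert(demo_slack(100, 10, 100) == 25);	/* 10% free: not tight */
	assert(demo_slack(100, 9, 100) == 2);	/* 9% free: tight */
}
```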

IOWs, we mostly pulled the thresholds out of thin air.  We've been
QAing with those geometry numbers ever since. ;)

Link: https://git.kernel.org/pub/scm/fs/xfs/xfsprogs-dev.git/tree/repair/bulkload.c?h=v6.5.0#n114
Link: https://git.kernel.org/pub/scm/fs/xfs/xfsprogs-dev.git/tree/repair/phase5.c?h=v4.19.0#n1349
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-12-06 18:45:18 -08:00
Darrick J. Wong
4c8ecd1cfd xfs: remove unused fields from struct xbtree_ifakeroot
Remove these unused fields since nobody uses them.  They should have
been removed years ago in a different cleanup series from Christoph
Hellwig.

Fixes: daf83964a3 ("xfs: move the per-fork nextents fields into struct xfs_ifork")
Fixes: f7e67b20ec ("xfs: move the fork format fields into struct xfs_ifork")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-12-06 18:45:18 -08:00
Darrick J. Wong
e3042be36c xfs: automatic freeing of freshly allocated unwritten space
As mentioned in the previous commit, online repair wants to allocate
space to write out a new metadata structure, and it also wants to hedge
against system crashes during repairs by logging (and later cancelling)
EFIs to free the space if we crash before committing the new data
structure.

Therefore, create a trio of functions to schedule automatic reaping of
freshly allocated unwritten space.  xfs_alloc_schedule_autoreap creates
a paused EFI representing the space we just allocated.  Once the
allocations are made and the autoreaps scheduled, we can start writing
to disk.

If the writes succeed, xfs_alloc_cancel_autoreap marks the EFI work
items as stale and unpauses the pending deferred work item.  Assuming
that's done in the same transaction that commits the new structure into
the filesystem, we guarantee that either the new object is fully
visible, or that all the space gets reclaimed.

If the writes succeed but only part of an extent was used, repair must
call the same _cancel_autoreap function to kill the first EFI and then
log a new EFI to free the unused space.  The first EFI is already
committed, so it cannot be changed.

For full extents that aren't used, xfs_alloc_commit_autoreap will
unpause the EFI, which results in the space being freed during the next
_defer_finish cycle.
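
The three outcomes can be summarized as a small state model.  This is a
sketch under stated assumptions, not the kernel code: the demo_* names
and the two boolean fields are hypothetical, and the model tracks only
whether the deferred free would eventually run.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one autoreap EFI; the field names are invented. */
struct demo_autoreap {
	bool paused;	/* the deferred free is being held back */
	bool stale;	/* the deferred free must never run */
};

/* After allocating: log a paused EFI covering the new space. */
static void demo_schedule_autoreap(struct demo_autoreap *ar)
{
	ar->paused = true;
	ar->stale = false;
}

/* The new structure was committed: keep the space, neutralize the EFI. */
static void demo_cancel_autoreap(struct demo_autoreap *ar)
{
	ar->stale = true;
	ar->paused = false;
}

/* The extent went unused: let the deferred free proceed. */
static void demo_commit_autoreap(struct demo_autoreap *ar)
{
	ar->paused = false;
}

/* Would the next finish cycle free the space? */
static bool demo_will_free(const struct demo_autoreap *ar)
{
	return !ar->paused && !ar->stale;
}

/* Usage: one extent ends up in the new structure, one is never used. */
void demo_autoreap_example(void)
{
	struct demo_autoreap kept, unused;

	demo_schedule_autoreap(&kept);
	demo_schedule_autoreap(&unused);
	assert(!demo_will_free(&kept) && !demo_will_free(&unused));

	demo_cancel_autoreap(&kept);	/* writes succeeded */
	demo_commit_autoreap(&unused);	/* extent was not needed */
	assert(!demo_will_free(&kept));
	assert(demo_will_free(&unused));
}
```

Note the naming: "cancel" cancels the reaping, so the space is kept,
while "commit" commits to the reaping, so the space is freed.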

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:18 -08:00
Darrick J. Wong
4c88fef3af xfs: remove __xfs_free_extent_later
xfs_free_extent_later is a trivial helper, so remove it to reduce the
amount of thinking required to understand the deferred freeing
interface.  This will make it easier to introduce automatic reaping of
speculative allocations in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:18 -08:00
Darrick J. Wong
4dffb2cbb4 xfs: allow pausing of pending deferred work items
Traditionally, all pending deferred work attached to a transaction is
finished when one of the xfs_defer_finish* functions is called.
However, online repair wants to be able to allocate space for a new data
structure, format a new metadata structure into the allocated space, and
commit that into the filesystem.

As a hedge against system crashes during repairs, we also want to log
some EFI items for the allocated space speculatively, and cancel them if
we elect to commit the new data structure.

Therefore, introduce the idea of pausing a pending deferred work item.
Log intent items are still created for paused items and relogged as
necessary.  However, paused items are pushed onto a side list before we
start calling ->finish_item, and the whole list is reattached to the
transaction afterwards.  New work items are never attached to paused
pending items.

Modify xfs_defer_cancel to clean up pending deferred work items holding
a log intent item but not a log intent done item, since that is now
possible.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:18 -08:00
Darrick J. Wong
6b12613940 xfs: don't append work items to logged xfs_defer_pending objects
When someone tries to add a deferred work item to xfs_defer_add, it will
try to attach the work item to the most recently added xfs_defer_pending
object attached to the transaction.  However, it doesn't check if the
pending object has a log intent item attached to it.  This is incorrect
behavior because we cannot add more work to an object that has already
been committed to the ondisk log.

Therefore, change the behavior not to append to pending items with a
non-null dfp_intent.  In practice this has not been an issue because the
only way xfs_defer_add gets called after log intent items have been
committed is from the defer ops ->finish_item functions themselves, and
the @dop_pending isolation in xfs_defer_finish_noroll protects the
pending items that have already been logged.
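
The new rule amounts to a predicate along the following lines.  This is
a hypothetical sketch, not the function added by the patch: struct
demo_dfp and demo_can_append are invented names, only the dfp_intent
field name comes from the text above, and the type comparison is an
assumption about how work items are grouped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified pending-work object. */
struct demo_dfp {
	int type;			/* kind of deferred work queued here */
	const void *dfp_intent;		/* non-NULL once an intent was logged */
};

/* May a new work item of the given type be appended to this object? */
static bool demo_can_append(const struct demo_dfp *dfp, int type)
{
	if (dfp == NULL)
		return false;		/* nothing queued yet */
	if (dfp->type != type)
		return false;		/* different kind of work */
	if (dfp->dfp_intent != NULL)
		return false;		/* already committed to the log */
	return true;
}

/* Usage: only the unlogged object of the same type accepts more work. */
void demo_append_example(void)
{
	int logged_item = 0;	/* any non-NULL address will do */
	struct demo_dfp fresh = { .type = 1, .dfp_intent = NULL };
	struct demo_dfp logged = { .type = 1, .dfp_intent = &logged_item };

	assert(demo_can_append(&fresh, 1));
	assert(!demo_can_append(&fresh, 2));
	assert(!demo_can_append(&logged, 1));
	assert(!demo_can_append(NULL, 1));
}
```

When the predicate returns false, the caller would start a new pending
object for the work item instead of appending to the existing one.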

However, the next patch will add the ability to pause a deferred extent
free object during online btree rebuilding, and any new extfree work
items need to have their own pending event.

While we're at it, hoist the predicate to its own static inline function
for readability.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:18 -08:00
Darrick J. Wong
3f113c2739 xfs: make xchk_iget safer in the presence of corrupt inode btrees
When scrub is trying to iget an inode, ensure that it won't end up
deadlocked on a cycle in the inode btree by using an empty transaction
to store all the buffers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:17 -08:00
Darrick J. Wong
9c07bca793 xfs: elide ->create_done calls for unlogged deferred work
Extended attribute updates use the deferred work machinery to manage
state across a chain of smaller transactions.  All previous deferred
work users have employed log intent items and log done items to manage
restarting of interrupted operations, which means that ->create_intent
sets dfp_intent to a log intent item and ->create_done uses that item to
create a log intent done item.

However, xattrs have used the INCOMPLETE flag to deal with the lack of
recovery support for an interrupted transaction chain.  Log items are
optional if the xattr update caller didn't set XFS_DA_OP_LOGGED to
require a restartable sequence.

In other words, ->create_intent can return NULL to say that there's no
log intent item.  If that's the case, no log intent done item should be
created.  Clean up xfs_defer_create_done not to do this, so that the
->create_done functions don't have to check for non-null dfp_intent
themselves.
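
A minimal userspace sketch of that cleanup follows.  All names here
(sketch_pending, sketch_create_done, sketch_make_done) are hypothetical
and only model the rule described above: when there is no intent item,
the done callback is never invoked.

```c
#include <stddef.h>

/* Hypothetical stand-in for a pending deferred-work object. */
struct sketch_pending {
	void	*intent;	/* NULL when the work is unlogged */
	void	*done;		/* done item, created only for logged work */
};

typedef void *(*sketch_create_done_fn)(void *intent);

/* Counts callback invocations so the effect can be observed. */
int sketch_done_calls;

/* Example callback: pretends the done item is derived from the intent. */
void *sketch_make_done(void *intent)
{
	sketch_done_calls++;
	return intent;
}

/* Central helper: skip the callback entirely for unlogged work. */
void sketch_create_done(struct sketch_pending *dfp, sketch_create_done_fn fn)
{
	if (dfp->intent == NULL)
		return;
	dfp->done = fn(dfp->intent);
}
```

Because the null check lives in the central helper, the individual
callbacks in this sketch never have to test for a missing intent.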

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:17 -08:00
Darrick J. Wong
e14293803f xfs: don't allow overly small or large realtime volumes
Don't allow realtime volumes that are less than one rt extent long.
This has been broken across 4 LTS kernels with nobody noticing, so let's
just disable it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:17 -08:00
Darrick J. Wong
a49c708f9a xfs: move ->iop_relog to struct xfs_defer_op_type
The only log items that need relogging are the ones created for deferred
work operations, and the only part of the code base that relogs log
items is the deferred work machinery.  Move the function pointers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:17 -08:00
Darrick J. Wong
94da54d582 xfs: document what LARP means
Christoph requested a blurb somewhere explaining exactly what LARP
means.  I don't know of a good place other than the source code (debug
knobs aren't covered in Documentation/), so here it is.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:17 -08:00
Darrick J. Wong
cf8f0e6c14 xfs: fix 32-bit truncation in xfs_compute_rextslog
It's quite reasonable that some customer somewhere will want to
configure a realtime volume with more than 2^32 extents.  If they try to
do this, the highbit32() call will truncate the upper bits of the
xfs_rtbxlen_t and produce the wrong value for rextslog.  This in turn
causes the rsumlevels to be wrong, which results in a realtime summary
file that is the wrong length.  Fix that.
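
To illustrate the truncation, here is a minimal userspace sketch.  The
helper names (sketch_highbit32, sketch_highbit64, rextslog_truncating,
rextslog_full) are hypothetical stand-ins rather than the kernel's
functions; the sketch assumes only that the 32-bit helper sees the low
32 bits of its argument and returns -1 for zero.

```c
#include <stdint.h>

/* Index of the highest set bit, or -1 if v is zero. */
int sketch_highbit64(uint64_t v)
{
	int bit = -1;

	while (v) {
		v >>= 1;
		bit++;
	}
	return bit;
}

/* 32-bit variant: the caller's value has already lost its upper bits. */
int sketch_highbit32(uint32_t v)
{
	return sketch_highbit64(v);
}

/* Problematic form: the cast silently discards bits 32..63. */
int rextslog_truncating(uint64_t rtextents)
{
	return sketch_highbit32((uint32_t)rtextents);
}

/* Full-width form: every bit of the 64-bit count is considered. */
int rextslog_full(uint64_t rtextents)
{
	return sketch_highbit64(rtextents);
}
```

For example, with 0x100000080 rt extents the truncating form yields 7
while the full-width form yields 32.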

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:17 -08:00
Darrick J. Wong
a6a38f309a xfs: make rextslog computation consistent with mkfs
There's a weird discrepancy in xfsprogs dating back to the creation of
the Linux port -- if there are zero rt extents, mkfs will set
sb_rextents and sb_rextslog both to zero:

	sbp->sb_rextslog =
		(uint8_t)(rtextents ?
			libxfs_highbit32((unsigned int)rtextents) : 0);

However, that's not the check that xfs_repair uses for nonzero rtblocks:

	if (sb->sb_rextslog !=
			libxfs_highbit32((unsigned int)sb->sb_rextents))

The difference here is that xfs_highbit32 returns -1 if its argument is
zero.  Unfortunately, this means that in the weird corner case of a
realtime volume shorter than 1 rt extent, xfs_repair will immediately
flag a freshly formatted filesystem as corrupt.  Because mkfs has been
writing ondisk artifacts like this for decades, we have to accept that
as "correct".  TBH, zero rextslog for zero rtextents makes more sense to
me anyway.
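
The mismatch can be reproduced with a small userspace sketch.  The
function names below are hypothetical; they only mirror the two
snippets quoted above, and assume a highbit32-style helper that returns
-1 for a zero argument.

```c
#include <stdint.h>

/* Index of the highest set bit in v, or -1 if v is zero. */
int sketch_highbit32(uint32_t v)
{
	int bit = -1;

	while (v) {
		v >>= 1;
		bit++;
	}
	return bit;
}

/* What mkfs writes: zero rt extents gives a zero rextslog. */
uint8_t sketch_mkfs_rextslog(uint64_t rtextents)
{
	return rtextents ? (uint8_t)sketch_highbit32((uint32_t)rtextents) : 0;
}

/* Repair-style check: compares directly against highbit32. */
int sketch_old_check_ok(uint8_t sb_rextslog, uint64_t sb_rextents)
{
	return sb_rextslog == sketch_highbit32((uint32_t)sb_rextents);
}

/* Consistent check: reuses the same computation that mkfs uses. */
int sketch_new_check_ok(uint8_t sb_rextslog, uint64_t sb_rextents)
{
	return sb_rextslog == sketch_mkfs_rextslog(sb_rextents);
}
```

In this sketch, a freshly computed rextslog for zero rt extents fails
the old check (0 is compared against -1) but passes the consistent one.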

Regrettably, the superblock verifier checks created in commit
f8e566c0f5 (see the Fixes tag below) copied xfs_repair even though mkfs
has been writing out such filesystems for ages.  Fix the superblock
verifier to accept what mkfs spits out; the userspace version of this
patch will have to fix xfs_repair as well.

Note that the new helper leaves the zeroday bug where the upper 32 bits
of sb_rextents are ripped off and only the lower 32 bits are fed to
highbit32.  This leads to a seriously undersized rt summary file, which
immediately breaks mkfs:

$ hugedisk.sh foo /dev/sdc $(( 0x100000080 * 4096))B
$ /sbin/mkfs.xfs -f /dev/sda -m rmapbt=0,reflink=0 -r rtdev=/dev/mapper/foo
meta-data=/dev/sda               isize=512    agcount=4, agsize=1298176 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0    bigtime=1 inobtcount=1 nrext64=1
data     =                       bsize=4096   blocks=5192704, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =/dev/mapper/foo        extsz=4096   blocks=4294967424, rtextents=4294967424
Discarding blocks...Done.
mkfs.xfs: Error initializing the realtime space [117 - Structure needs cleaning]

The next patch will drop support for rt volumes with fewer than 1 or
more than 2^32-1 rt extents, since they've clearly been broken forever.

Fixes: f8e566c0f5 ("xfs: validate the realtime geometry in xfs_validate_sb_common")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:17 -08:00
Darrick J. Wong
8a9aa763e1 xfs: collapse the ->create_done functions
Move the meat of the ->create_done function helpers into ->create_done
to reduce the amount of boilerplate.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:16 -08:00
Darrick J. Wong
b28852a5bd xfs: hoist xfs_trans_add_item calls to defer ops functions
Remove even more repeated boilerplate.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:16 -08:00
Darrick J. Wong
3e0958be21 xfs: clean out XFS_LI_DIRTY setting boilerplate from ->iop_relog
Hoist this dirty flag setting to the ->iop_relog callsite to reduce
boilerplate.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:16 -08:00
Darrick J. Wong
bd3a88f6b7 xfs: use xfs_defer_create_done for the relogging operation
Now that we have a helper to handle creating a log intent done item and
updating all the necessary state flags, use it to reduce boilerplate in
the ->iop_relog implementations.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:16 -08:00
Darrick J. Wong
f3fd7f6fce xfs: hoist ->create_intent boilerplate to its callsite
Hoist the dirty flag setting code out of each ->create_intent
implementation up to the callsite to reduce boilerplate further.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:16 -08:00
Darrick J. Wong
e6e5299fcb xfs: collapse the ->finish_item helpers
Each log item's ->finish_item function sets up a small amount of state
and calls another function to do the work.  Collapse that other function
into ->finish_item to reduce the call stack height.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:16 -08:00
Darrick J. Wong
db7ccc0bac xfs: move ->iop_recover to xfs_defer_op_type
Finish off the series by moving the intent item recovery function
pointer to the xfs_defer_op_type struct, since this is really a deferred
work function now.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:15 -08:00
Darrick J. Wong
3dd75c8db1 xfs: hoist intent done flag setting to ->finish_item callsite
Each log intent item's ->finish_item call chain inevitably includes some
code to set the dirty flag of the transaction.  If there's an associated
log intent done item, it also sets the item's dirty flag and the
transaction's INTENT_DONE flag.  This is repeated throughout the
codebase.

Reduce the LOC by moving all that to xfs_defer_finish_one.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:15 -08:00
Darrick J. Wong
e5f1a5146e xfs: use xfs_defer_finish_one to finish recovered work items
Get rid of the open-coded calls to xfs_defer_finish_one.  This also
means that the recovery transaction takes care of cleaning up the dfp,
and we have solved (I hope) all the ownership issues in recovery.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:15 -08:00
Darrick J. Wong
172538beba xfs: don't set XFS_TRANS_HAS_INTENT_DONE when there's no ATTRD log item
XFS_TRANS_HAS_INTENT_DONE is a flag to the CIL that we've added a log
intent done item to the transaction.  This enables an optimization
wherein we avoid writing out log intent and log intent done items if
they would have ended up in the same checkpoint.  This reduces writes to
the ondisk log and speeds up recovery as a result.

However, callers can use the defer ops machinery to modify xattrs
without using the log items.  In this situation, there won't be an
intent done item, so we do not need to set the flag.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:15 -08:00
Darrick J. Wong
a51489e140 xfs: dump the recovered xattri log item if corruption happens
If xfs_attri_item_recover receives a corruption error when it tries to
finish a recovered log intent item, it should dump the log item for
debugging, just like all the other log intent items.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:15 -08:00
Darrick J. Wong
e70fb328d5 xfs: recreate work items when recovering intent items
Recreate work items for each xfs_defer_pending object when we are
recovering intent items.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:15 -08:00
Darrick J. Wong
deb4cd8ba8 xfs: transfer recovered intent item ownership in ->iop_recover
Now that we pass the xfs_defer_pending object into the intent item
recovery functions, we know exactly when ownership of the sole refcount
passes from the recovery context to the intent done item.  At that
point, we need to null out dfp_intent so that the recovery mechanism
won't release it.  This should fix the UAF problem reported by Long Li.

Note that we still want to recreate the full deferred work state.  That
will be addressed in the next patches.

Fixes: 2e76f188fd ("xfs: cancel intents immediately if process_intents fails")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:14 -08:00
Darrick J. Wong
a050acdfa8 xfs: pass the xfs_defer_pending object to iop_recover
Now that log intent item recovery recreates the xfs_defer_pending state,
we should pass that into the ->iop_recover routines so that the intent
item can finish the recreation work.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:14 -08:00
Darrick J. Wong
03f7767c9f xfs: use xfs_defer_pending objects to recover intent items
One thing I never quite got around to doing is porting the log intent
item recovery code to reconstruct the deferred pending work state.  As a
result, each intent item open codes xfs_defer_finish_one in its recovery
method, because that's what the EFI code did before xfs_defer.c even
existed.

This is a gross thing to have left unfixed -- if an EFI cannot proceed
due to busy extents, we end up creating separate new EFIs for each
unfinished work item, which is a change in behavior from what runtime
would have done.

Worse yet, Long Li pointed out that there's a UAF in the recovery code.
The ->commit_pass2 function adds the intent item to the AIL and drops
the refcount.  The one remaining refcount is now owned by the recovery
mechanism (aka the log intent items in the AIL) with the intent of
giving the refcount to the intent done item in the ->iop_recover
function.

However, if something fails later in recovery, xlog_recover_finish will
walk the recovered intent items in the AIL and release them.  If the CIL
hasn't been pushed before that point (which is possible since we don't
force the log until later) then the intent done release will try to free
its associated intent, which has already been freed.

This patch starts to address this mess by having the ->commit_pass2
functions recreate the xfs_defer_pending state.  The next few patches
will fix the recovery functions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:14 -08:00
Darrick J. Wong
07bcbdf020 xfs: don't leak recovered attri intent items
If recovery finds an xattr log intent item calling for the removal of an
attribute and the file doesn't even have an attr fork, we know that the
removal is trivially complete.  However, we can't just exit the recovery
function without doing something about the recovered log intent item --
it's still on the AIL, and not logging an attrd item means it stays
there forever.

This has likely not been seen in practice because few people use LARP
and the runtime code won't log the attri for a no-attrfork removexattr
operation.  But let's fix this anyway.

Also we shouldn't really be testing the attr fork presence until we've
taken the ILOCK, though this doesn't matter much in recovery, which is
single threaded.

Fixes: fdaf1bb3ca ("xfs: ATTR_REPLACE algorithm with LARP enabled needs rework")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-06 18:45:14 -08:00
Linus Torvalds
33cc938e65 Linux 6.7-rc4 2023-12-03 18:52:56 +09:00
Linus Torvalds
968f35f4ab five cifs/smb3 fixes

Merge tag 'v6.7-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - Two fallocate fixes

 - Fix warnings from new gcc

 - Two symlink fixes

* tag 'v6.7-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client, common: fix fortify warnings
  cifs: Fix FALLOC_FL_INSERT_RANGE by setting i_size after EOF moved
  cifs: Fix FALLOC_FL_ZERO_RANGE by setting i_size if EOF moved
  smb: client: report correct st_size for SMB and NFS symlinks
  smb: client: fix missing mode bits for SMB symlinks
2023-12-03 09:08:26 +09:00
Linus Torvalds
55abae438c firewire fixes for 6.7-rc4
This pull request includes a single patch to fix a long-standing memory
 leak when device registration fails for fw_unit. The issue is rarely
 encountered, but the fix should be applied to stable releases since it
 corrects inappropriate API usage.

Merge tag 'firewire-fixes-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394

Pull firewire fix from Takashi Sakamoto:
 "A single patch to fix a long-standing memory leak when device
  registration for fw_unit fails. We rarely encounter the issue, but it
  should be applied to stable releases, since it fixes inappropriate API
  usage"

* tag 'firewire-fixes-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: core: fix possible memory leak in create_units()
2023-12-03 09:03:07 +09:00
Linus Torvalds
1b8af6552c powerpc fixes for 6.7 #3
-  Fix corruption of f0/vs0 during FP/Vector save, seen as userspace crashes
     when using io-uring workers (in particular with MariaDB).
 
  -  Fix KVM_RUN potentially clobbering all host userspace FP/Vector registers.
 
 Thanks to: Timothy Pearson, Jens Axboe, Nicholas Piggin.

Merge tag 'powerpc-6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix corruption of f0/vs0 during FP/Vector save, seen as userspace
   crashes when using io-uring workers (in particular with MariaDB)

 - Fix KVM_RUN potentially clobbering all host userspace FP/Vector
   registers

Thanks to Timothy Pearson, Jens Axboe, and Nicholas Piggin.

* tag 'powerpc-6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  KVM: PPC: Book3S HV: Fix KVM_RUN clobbering FP/VEC user registers
  powerpc: Don't clobber f0/vs0 during fp|altivec register save
2023-12-03 08:43:35 +09:00
Linus Torvalds
17b17be28d VFIO fixes for v6.7-rc4
- Fix the lifecycle of a mutex in the pds variant driver such that
    a reset prior to opening the device won't find it uninitialized.
    Implement the release path to symmetrically destroy the mutex.
    Also switch a different lock from spinlock to mutex as the code
    path has the potential to sleep and doesn't need the spinlock
    context otherwise. (Brett Creeley)
 
  - Fix an issue detected via randconfig where KVM tries to symbol_get
    an undeclared function.  The symbol is temporarily declared
    unconditionally here, which resolves the problem and avoids churn
    relative to a series pending for the next merge window which
    resolves some of this symbol ugliness, but also fixes Kconfig
    dependencies. (Sean Christopherson)

Merge tag 'vfio-v6.7-rc4' of https://github.com/awilliam/linux-vfio

Pull vfio fixes from Alex Williamson:

 - Fix the lifecycle of a mutex in the pds variant driver such that a
   reset prior to opening the device won't find it uninitialized.
   Implement the release path to symmetrically destroy the mutex. Also
   switch a different lock from spinlock to mutex as the code path has
   the potential to sleep and doesn't need the spinlock context
   otherwise (Brett Creeley)

 - Fix an issue detected via randconfig where KVM tries to symbol_get an
   undeclared function. The symbol is temporarily declared
   unconditionally here, which resolves the problem and avoids churn
   relative to a series pending for the next merge window which resolves
   some of this symbol ugliness, but also fixes Kconfig dependencies
   (Sean Christopherson)

* tag 'vfio-v6.7-rc4' of https://github.com/awilliam/linux-vfio:
  vfio: Drop vfio_file_iommu_group() stub to fudge around a KVM wart
  vfio/pds: Fix possible sleep while in atomic context
  vfio/pds: Fix mutex lock->magic != lock warning
2023-12-03 08:37:39 +09:00
Linus Torvalds
deb4b9dd3b xen: branch for v6.7-rc4

Merge tag 'for-linus-6.7a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - A fix for the Xen event driver setting the correct return value when
   experiencing an allocation failure

 - A fix for allocating space for a struct in the percpu area to not
   cross page boundaries (this one is for x86, a similar one for Arm was
   already in the pull request for rc3)

* tag 'for-linus-6.7a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/events: fix error code in xen_bind_pirq_msi_to_irq()
  x86/xen: fix percpu vcpu_info allocation
2023-12-03 08:31:53 +09:00
Linus Torvalds
669fc83452 Probes fixes for v6.7-rc3:
- objpool: Fix an objpool overrun caused by memory/cache access delay,
   especially on big.LITTLE SoCs. The objpool uses a copy of the object
   slot index in its internal loop, but the slot index can be changed on
   another processor in parallel. In that case, the difference between
   the local copy of 'head' and the 'slot->last' index will be bigger
   than the local slot size, so we need to re-read slot::head to update it.
 
 - kretprobe: Use the appropriate RCU API for the kretprobe holder. Since
   kretprobe_holder::rp is RCU managed, it should be accessed with
   rcu_assign_pointer() and rcu_dereference_check(). Also add the __rcu
   tag so that sparse can find wrong usage.
 
 - rethook: Use the appropriate RCU API for rethook::handler. As with
   kretprobe, rethook::handler is RCU managed and should be accessed with
   rcu_assign_pointer() and rcu_dereference_check(). This also adds the
   __rcu tag so that sparse can find wrong usage.

Merge tag 'probes-fixes-v6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes fixes from Masami Hiramatsu:

 - objpool: Fix an objpool overrun caused by memory/cache access delay,
   especially on big.LITTLE SoCs. The objpool uses a copy of the object
   slot index in its internal loop, but the slot index can be changed on
   another processor in parallel. In that case, the difference between
   the local copy of 'head' and the 'slot->last' index will be bigger
   than the local slot size, so we need to re-read slot::head to update
   it.

 - kretprobe: Use the appropriate RCU API for the kretprobe holder. Since
   kretprobe_holder::rp is RCU managed, it should be accessed with
   rcu_assign_pointer() and rcu_dereference_check(). Also add the __rcu
   tag so that sparse can find wrong usage.

 - rethook: Use the appropriate RCU API for rethook::handler. As with
   kretprobe, rethook::handler is RCU managed and should be accessed with
   rcu_assign_pointer() and rcu_dereference_check(). This also adds the
   __rcu tag so that sparse can find wrong usage.

* tag 'probes-fixes-v6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rethook: Use __rcu pointer for rethook::handler
  kprobes: consistent rcu api usage for kretprobe holder
  lib: objpool: fix head overrun on RK3588 SBC
2023-12-03 08:02:49 +09:00
Linus Torvalds
815fb87b75 Power management fixes for 6.7-rc4
- Fix the AMD P-state driver's EPP sysfs interface in the cases when the
    performance governor is in use (Ayush Jain).
 
  - Make the ->fast_switch() callback in the AMD P-state driver return the
    target frequency as expected (Gautham R. Shenoy).
 
  - Allow user space to control the range of frequencies to use via
    scaling_min_freq and scaling_max_freq when AMD P-state driver is in
    use (Wyes Karny).
 
  - Prevent power domains needed for wakeup signaling from being turned
    off during system suspend on Qualcomm systems and prevent performance
    states votes from runtime-suspended devices from being lost across
    a system suspend-resume cycle in qcom-cpufreq-nvmem (Stephan Gerhold).
 
  - Fix disabling the 792 MHz OPP in the imx6q cpufreq driver for the
    i.MX6ULL types that can run at that frequency (Christoph Niedermaier).
 
  - Eliminate unnecessary and harmful conversions to uW from the DTPM
    (dynamic thermal and power management) framework (Lukasz Luba).

Merge tag 'pm-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix issues in two cpufreq drivers, in the AMD P-state driver and
  in the power-capping DTPM framework.

  Specifics:

   - Fix the AMD P-state driver's EPP sysfs interface in the cases when
     the performance governor is in use (Ayush Jain)

   - Make the ->fast_switch() callback in the AMD P-state driver return
     the target frequency as expected (Gautham R. Shenoy)

   - Allow user space to control the range of frequencies to use via
     scaling_min_freq and scaling_max_freq when AMD P-state driver is in
     use (Wyes Karny)

   - Prevent power domains needed for wakeup signaling from being turned
     off during system suspend on Qualcomm systems and prevent
     performance states votes from runtime-suspended devices from being
     lost across a system suspend-resume cycle in qcom-cpufreq-nvmem
     (Stephan Gerhold)

   - Fix disabling the 792 MHz OPP in the imx6q cpufreq driver for the
     i.MX6ULL types that can run at that frequency (Christoph
     Niedermaier)

   - Eliminate unnecessary and harmful conversions to uW from the DTPM
     (dynamic thermal and power management) framework (Lukasz Luba)"

* tag 'pm-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq/amd-pstate: Only print supported EPP values for performance governor
  cpufreq/amd-pstate: Fix scaling_min_freq and scaling_max_freq update
  powercap: DTPM: Fix unneeded conversions to micro-Watts
  cpufreq/amd-pstate: Fix the return value of amd_pstate_fast_switch()
  pmdomain: qcom: rpmpd: Set GENPD_FLAG_ACTIVE_WAKEUP
  cpufreq: qcom-nvmem: Preserve PM domain votes in system suspend
  cpufreq: qcom-nvmem: Enable virtual power domain devices
  cpufreq: imx6q: Don't disable 792 Mhz OPP unnecessarily
2023-12-02 09:01:00 +09:00
Linus Torvalds
ce474ae7d0 ACPI fixes for 6.7-rc4
- Fix a recently introduced build issue on ARM32 platforms caused by an
    inadvertent header file breakage (Dave Jiang).
 
  - Eliminate questionable usage of acpi_driver_data() in the ACPI
    backlight cooling device code that leads to NULL pointer dereferences
    after recent ACPI core changes (Hans de Goede).

Merge tag 'acpi-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "This fixes a recently introduced build issue on ARM32 and a NULL
  pointer dereference in the ACPI backlight driver due to a design issue
  exposed by a recent change in the ACPI bus type code.

  Specifics:

   - Fix a recently introduced build issue on ARM32 platforms caused by
     an inadvertent header file breakage (Dave Jiang)

   - Eliminate questionable usage of acpi_driver_data() in the ACPI
     backlight cooling device code that leads to NULL pointer
     dereferences after recent ACPI core changes (Hans de Goede)"

* tag 'acpi-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: video: Use acpi_video_device for cooling-dev driver data
  ACPI: Fix ARM32 platforms compile issue introduced by fw_table changes
2023-12-02 08:52:20 +09:00
Linus Torvalds
35f8458480 Fix a 6.7-rc1 regression where the arm64 KPTI ends up enabled even on
systems that don't need it.

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fix from Catalin Marinas:
 "Fix a regression where the arm64 KPTI ends up enabled even on systems
  that don't need it"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Avoid enabling KPTI unnecessarily
2023-12-02 08:48:59 +09:00
Linus Torvalds
1a2b418566 IOMMU Fixes for Linux v6.7-rc3
Including:
 
 	- Fix race conditions in device probe path
 
 	- Handle ERR_PTR() returns in __iommu_domain_alloc() path
 
 	- Update MAINTAINERS entry for Qualcomm IOMMUs
 
 	- Printk argument fix in device tree specific code
 
 	- Several Intel VT-d fixes from Lu Baolu:
 	  - Do not support enforcing cache coherency for non-empty domains
 	  - Avoid devTLB invalidation if iommu is off
 	  - Disable PCI ATS in legacy passthrough mode
 	  - Support non-PCI devices when clearing context
 	  - Fix incorrect cache invalidation for mm notification
 	  - Add MTL to quirk list to skip TE disabling
 	  - Set variable intel_dirty_ops to static
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmVpq1UACgkQK/BELZcB
 GuOcUA/+Kko7SEbNCxdRlGZWYUtDrF4Aa6UOwFAFnqJikEpf3IOI6lGl/aNF/zZ/
 NMcAL53Jk9UNvKXlbwdVAOlEOco1LRGPbZFFJUGcS33iG9JhmHrMpmGuOoQe67zU
 Po5uxLigSXRlFBNqst61G8juOlyR0r6Kwel4LNoyfJ7RJuJ76j9gPL2CaIajrfB0
 mkKWKP8Qw11yA6A9sWLyFUOdLdmg4oG1rtjZRrcYGTvyKZrFxod8aQB12xcl2wd3
 RZYel2OReGML3tSczrwAHJis6XawNn9ifm2Z7jxyN10zIUH60BSl8NJrToxkIkfi
 eMNyGb89ZUDxZfH5oRHDo9ZRg3r/6AZnkvK1rA24onUJQtqlWKfnHzW4i9Hn4sdV
 ecckcyIRkRm54q9vsRh2Ke074OzOREVAeL7bUGmh8zsfc2mQSGofgZDxB12cL7re
 k3+GaPE2TTUpE9VSWttQN87d5wyds7Dfd0RiDu4T7mbP62b3UMhP//scTiWTQ4M5
 iYTHhJwrTebSjjEQB/0h09grYEFS2jsAimc6uj6MDnGTUGSlzd1VgiBZCn+ea8KL
 z0d0yi6rcVMBDUpU/GLUsaYji2trTxDV4popxz+gpDZx1wLBgaDbX8QJ4K3lH6a3
 51REakYTL4xb7m9aNGcvx+sYyB/OXlvGhc2jUv0kdJd4DmR8Un4=
 =O/Sr
 -----END PGP SIGNATURE-----

Merge tag 'iommu-fixes-v6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:

 - Fix race conditions in device probe path

 - Handle ERR_PTR() returns in __iommu_domain_alloc() path

 - Update MAINTAINERS entry for Qualcomm IOMMUs

 - Printk argument fix in device tree specific code

 - Several Intel VT-d fixes from Lu Baolu:
     - Do not support enforcing cache coherency for non-empty domains
     - Avoid devTLB invalidation if iommu is off
     - Disable PCI ATS in legacy passthrough mode
     - Support non-PCI devices when clearing context
     - Fix incorrect cache invalidation for mm notification
     - Add MTL to quirk list to skip TE disabling
     - Set variable intel_dirty_ops to static

* tag 'iommu-fixes-v6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu: Fix printk arg in of_iommu_get_resv_regions()
  iommu/vt-d: Set variable intel_dirty_ops to static
  iommu/vt-d: Fix incorrect cache invalidation for mm notification
  iommu/vt-d: Add MTL to quirk list to skip TE disabling
  iommu/vt-d: Make context clearing consistent with context mapping
  iommu/vt-d: Disable PCI ATS in legacy passthrough mode
  iommu/vt-d: Omit devTLB invalidation requests when TES=0
  iommu/vt-d: Support enforce_cache_coherency only for empty domains
  iommu: Avoid more races around device probe
  MAINTAINERS: list all Qualcomm IOMMU drivers in the QUALCOMM IOMMU entry
  iommu: Flow ERR_PTR out from __iommu_domain_alloc()
2023-12-02 08:42:39 +09:00
Linus Torvalds
06a3c59f9c sound fixes for 6.7-rc4
No surprise here, including only a collection of HD-audio
 device-specific small fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmVpvSIOHHRpd2FpQHN1
 c2UuZGUACgkQLtJE4w1nLE8PRA/+PQFZaI5L5tcEB9+mjH3TUVSYtbBweOBToEbg
 QFt3KNS550Q6Mcvnhrp1AH7amFkWt354e8iZz5tVAITrFkwkYWImlneZweCWQXeB
 /TvIViuUr8KMfl2eyAMVRUkuqQ7EoVrkL1uIjNWCernNUQqlc+oUTn6D42lxiwkG
 mQpzJL/MknSPV+31lv0wZyj7TDSCDPRhmX6Lu6FH+QbbhdZUGpl8etmUUYNa8D1G
 hh2tPjVGuUxZDcIYtVD3NIjOTFvTyVF3S+cxqW2WT1qs2ZmWH+nBBqZbkGva7gfB
 /s0TNTTOLq6VD58B5vf25D1AUVEMub5295PKrCnKfo2VO+/PMPQWXBS2pYuCey6y
 MZpqCKRStgBCCfCZg6byWiVKwqsSJJTvKlXAginYXuK/jFJhnJxqVRQ9Mc6Op81z
 +c1p0Zy9IEhTAdF2RLGw99zho1B5cx/zBGRvVwq0c9NV+LQwMoRsFKbPr6QAM65K
 bAb+3wVQ3cG1y8FVz3Amq5JfcbymJ15CLwI5nW3Op47MrUSsYXflbPfgExI7dRTH
 Xegm8jPbw8V9r66BYAATRQztENxQSGINL/0ngGAS4jEpKN2XTgjSJgBizBdvRis5
 RbJERBICEUhFGlogREsareSkkoV20lgIxJ5Nc/uve4RtLaV8ovheTA48NspXZ8ld
 6HlRRIg=
 =57FL
 -----END PGP SIGNATURE-----

Merge tag 'sound-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "No surprise here, including only a collection of HD-audio
  device-specific small fixes"

* tag 'sound-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda: Disable power-save on KONTRON SinglePC
  ALSA: hda/realtek: Add supported ALC257 for ChromeOS
  ALSA: hda/realtek: Headset Mic VREF to 100%
  ALSA: hda: intel-nhlt: Ignore vbps when looking for DMIC 32 bps format
  ALSA: hda: cs35l56: Enable low-power hibernation mode on SPI
  ALSA: cs35l41: Fix for old systems which do not support command
  ALSA: hda: cs35l41: Remove unnecessary boolean state variable firmware_running
  ALSA: hda - Fix speaker and headset mic pin config for CHUWI CoreBook XPro
2023-12-02 08:33:29 +09:00
Linus Torvalds
b1e51588aa drm fixes for 6.7-rc4
drm:
 - Revert unexport of prime helpers for fd/handle conversion
 
 dma_resv:
 - Do not double add fences in dma_resv_add_fence.
 
 gpuvm:
 - Fix GPUVM license identifier.
 
 i915:
 - Mark internal GSC engine with reserved uabi class
 - Take VGA converters into account in eDP probe
 - Fix intel_pre_plane_updates() call to ensure workarounds get applied
 
 panel:
 - Revert panel fixes as they require exporting device_is_dependent.
 
 nouveau:
 - fix oversized allocations in new vm path
 - fix zero-length array
 - remove a stray lock
 
 nt36523:
 - Fix error check for nt36523.
 
 amdgpu:
 - DMUB fix
 - DCN 3.5 fixes
 - XGMI fix
 - DCN 3.2 fixes
 - Vangogh suspend fix
 - NBIO 7.9 fix
 - GFX11 golden register fix
 - Backlight fix
 - NBIO 7.11 fix
 - IB test overflow fix
 - DCN 3.1.4 fixes
 - fix a runtime pm ref count
 - Retimer fix
 - ABM fix
 - DCN 3.1.5 fix
 - Fix AGP addressing
 - Fix possible memory leak in SMU error path
 - Make sure PME is enabled in D3
 - Fix possible NULL pointer dereference in debugfs
 - EEPROM fix
 - GC 9.4.3 fix
 
 amdkfd:
 - IP version check fix
 - Fix memory leak in pqm_uninit()
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmVpf7sACgkQDHTzWXnE
 hr7dzBAAg7iqxlScxzbDEwsKjRRmRv77lRjKJbk8jcjee1b2Bubtj3Ezaa4iIVFc
 iYFV8GfyeyisSYxlzNHDIA0F95IGdASQSAs1pIHzegCobYyybax2hr/ZhQeThFG1
 xwji2PCHnbD2+r1jvVgvY6234+Re7hfQotsL+KWqLGz5YS4sgLw+Vyofm1WxTt7K
 OTbVvunvfn1PgT36LFOEjaVHAog9IHbZ/4cwEqgucOoTl7rAOXV5mN4WRsBh1B+E
 h6Ny9O8rMRajC6PDmUiwUBuSPj+TDj8K7WtfUQWhR1EySEikcwClfpWPM/LCyuPr
 wdv3LCPBIJnFQOa0pJX/3afTc0tB69ETtGrw342fo+wyUOKwaIt9OVlRaDgcjExv
 TIeTEjP6GJ7lcd/8VkAj97qBexFUQrcKVZjrRfTJuv+ee9KcCv0TA7I8hsqaOew/
 2xirLknX5kHt/pYH6CYsYzEt4ruoE5F0m1fjCpj6VubSl5B+kZsuQzui9BE1WusO
 +VKsujluhmcQHWH5KQYWqeVtgYa7ZP18o4P+tF5Xon52w2VeP8EqREbgH0uFIcd/
 jPVVjyyEgUgWpyw42v0w+EwGXRACnT5kMtDHrKNVsgF43cev2nvKq2GHQ4djvOG/
 CYWYCxro+ZbGDfEfPBDY16qh1ra+dnlZTYwYfBcGT5u4RfgDzPo=
 =YlVI
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-2023-12-01' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Weekly fixes, mostly amdgpu fixes with a scattering of nouveau, i915,
  and a couple of reverts. Hopefully it will quieten down in coming
  weeks.

  drm:
   - Revert unexport of prime helpers for fd/handle conversion

  dma_resv:
   - Do not double add fences in dma_resv_add_fence.

  gpuvm:
   - Fix GPUVM license identifier.

  i915:
   - Mark internal GSC engine with reserved uabi class
   - Take VGA converters into account in eDP probe
   - Fix intel_pre_plane_updates() call to ensure workarounds get applied

  panel:
   - Revert panel fixes as they require exporting device_is_dependent.

  nouveau:
   - fix oversized allocations in new vm path
   - fix zero-length array
   - remove a stray lock

  nt36523:
   - Fix error check for nt36523.

  amdgpu:
   - DMUB fix
   - DCN 3.5 fixes
   - XGMI fix
   - DCN 3.2 fixes
   - Vangogh suspend fix
   - NBIO 7.9 fix
   - GFX11 golden register fix
   - Backlight fix
   - NBIO 7.11 fix
   - IB test overflow fix
   - DCN 3.1.4 fixes
   - fix a runtime pm ref count
   - Retimer fix
   - ABM fix
   - DCN 3.1.5 fix
   - Fix AGP addressing
   - Fix possible memory leak in SMU error path
   - Make sure PME is enabled in D3
   - Fix possible NULL pointer dereference in debugfs
   - EEPROM fix
   - GC 9.4.3 fix

  amdkfd:
   - IP version check fix
   - Fix memory leak in pqm_uninit()"

* tag 'drm-fixes-2023-12-01' of git://anongit.freedesktop.org/drm/drm: (53 commits)
  Revert "drm/prime: Unexport helpers for fd/handle conversion"
  drm/amdgpu: Use another offset for GC 9.4.3 remap
  drm/amd/display: Fix some HostVM parameters in DML
  drm/amdkfd: Free gang_ctx_bo and wptr_bo in pqm_uninit
  drm/amdgpu: Update EEPROM I2C address for smu v13_0_0
  drm/amd/display: Allow DTBCLK disable for DCN35
  drm/amdgpu: Fix cat debugfs amdgpu_regs_didt causes kernel null pointer
  drm/amd: Enable PCIe PME from D3
  drm/amd/pm: fix a memleak in aldebaran_tables_init
  drm/amdgpu: fix AGP addressing when GART is not at 0
  drm/amd/display: update dcn315 lpddr pstate latency
  drm/amd/display: fix ABM disablement
  drm/amd/display: Fix black screen on video playback with embedded panel
  drm/amd/display: Fix conversions between bytes and KB
  drm/amdkfd: Use common function for IP version check
  drm/amd/display: Remove config update
  drm/amd/display: Update DCN35 clock table policy
  drm/amd/display: force toggle rate wa for first link training for a retimer
  drm/amdgpu: correct the amdgpu runtime dereference usage count
  drm/amd/display: Update min Z8 residency time to 2100 for DCN314
  ...
2023-12-02 08:18:59 +09:00
Linus Torvalds
c9a925b7bc io_uring-6.7-2023-11-30
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmVo5jYQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpuNNEAC7sm239X9ixFQ7E70kxz1FyPpF6KS5oVWT
 piAEbUkWQON5yI7M7C2l8w6/xDi8yCIf272HnvPlYfJzkoOfjt7hjWpxtKUOVYJe
 MtL+KSiXqtdv8fvYaS61dyPzqJQ7q7D1cnCVUv7eKm7hSo7NZQH94fzC7UW+Xm/G
 2wng8C5Ltd0IfLpqQJnrn/yGnsCw3PfQYiC7unMXB5NT5eriM5jnHGpl9EPMLxbP
 TWIyUYiqxzrd9QkCTdpEZkKP35Pho/tzCtc3mN0+9tcMuoESX0KnQiR5q5IPet4/
 kkTvZZ7Kw18k8Eb99JSH2G2maFrrEZg0C3MfTF0W4O2t19Pajx8cyhVyAa1ib32o
 TcT6+M1XdAp2rEpfDSRvNCqRpMXm1zARpo4GvEHqGbY5/VefXeJPPaJyAu0CLNlk
 p1FJCQq8hMHd71GCfzb9d1Z+Mozd7dOO1CJqPYz35WXdtXSJ0b8Hw/aVIaYT9JP7
 IbP9IE7ZuPPZq+BC6FTH1O2zbJ0h+PSC5yAONw+Py3YHUT586e11nCyhQUrOJQmE
 kJENcknQCtcFgckXzT5ROh+Vlt6KHjltrVOAT3Jl2LhRssczJo6/4+BZfgvHJipE
 TSOdKFS1Saxh0XX8DGovYT78rg3tullzkvWEVFRrDk6MlFOCHGAs1E0Prz7yqzE4
 KscqZOwIZw==
 =wA/u
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-6.7-2023-11-30' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:

 - Fix an issue with discontig page checking for IORING_SETUP_NO_MMAP

 - Fix an issue where IORING_SETUP_NO_MMAP also disallowed mmap'ed
   buffer rings

 - Fix an issue with deferred release of memory mapped pages

 - Fix a lockdep issue with IORING_SETUP_NO_MMAP

 - Use fget/fput consistently, even from our sync system calls. No real
   issue here, but if we were ever to allow closing io_uring descriptors
   it would be required. Let's play it safe and just use the full ref
   counted versions upfront. Most uses of io_uring are threaded anyway,
   and hence already doing the full version underneath.

* tag 'io_uring-6.7-2023-11-30' of git://git.kernel.dk/linux:
  io_uring: use fget/fput consistently
  io_uring: free io_buffer_list entries via RCU
  io_uring/kbuf: prune deferred locked cache when tearing down
  io_uring/kbuf: recycle freed mapped buffer ring entries
  io_uring/kbuf: defer release of mapped buffer rings
  io_uring: enable io_mem_alloc/free to be used in other parts
  io_uring: don't guard IORING_OFF_PBUF_RING with SETUP_NO_MMAP
  io_uring: don't allow discontig pages for IORING_SETUP_NO_MMAP
2023-12-02 06:47:32 +09:00
Linus Torvalds
ee0c8a9b34 block-6.7-2023-12-01
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmVqKo4QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpqL9D/9bPvuA+Oogx+C/kNConjxnuyPBiXcZjb/4
 5gO/6N0FC8yu+HQqgscGTyEjJO2FKfLx+YxxBs1UVIt4Tm+jZwC3nPqw9X4W3RCz
 pK9fxCNlzxey0SZU3ZJQIOtqP3df5Yuas9V/h35GS4m1XaoDE6cPpsIVUrAnoNwg
 W990L8sOy6y4XzMPzyHJCyoDCay1Qp2ly0Vdlz4/ESRmEp564i42nFN+8zpZ/w7h
 V+Ekn6JwP1ssqUeY/k43QcfRzYwSvvnTQJ1y9t3erf6HcHtpbCgnL1jTaGEmr4IS
 1sw3ffqo23xBSsGP+D2OF4+9pwGI9+xwNpYnRdrpDPxKhCn5EEh+g6+f+m7YEnFV
 q1swlMTqHtRLFdYbKe8Tl8hPRwEeSpKy8sXph56hwGZY0T/IyB+Pe3aXrh1DYPA5
 4+GASZHFQPH82P1ibVNdpMRZe4rPPblw38GZauZ1JbI0m0zXqEveB2AgZeCcw1ky
 l7KBdMdGBqSWYVmfKcJd3f30vKPyhMSp4eE9/LFp24vmyIIw+dSp6vup0yrM6jk9
 taUU6PCHzaxmI1YGz1BzNVa8cfYKB6aiWeQ2OGa4Z7ba4TuksMLkbfVvu21jdi+z
 PsL/KlqPSPwFL/3XAZagIb3BXUhoQyfwIU8GnAuw2wTU5RJzWnbwF3wXpNaBIJxI
 8y5OWsFqIg==
 =5kb6
 -----END PGP SIGNATURE-----

Merge tag 'block-6.7-2023-12-01' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

 - NVMe pull request via Keith:
     - Invalid namespace identification error handling (Maurizio, Ewan,
       Keith)
     - Fabrics keep-alive tuning (Mark)

 - Fix for a bad error check regression in bcache (Markus)

 - Fix for a performance regression with O_DIRECT (Ming)

 - Fix for a flush related deadlock (Ming)

 - Make the read-only warning per-partition (Yu)

* tag 'block-6.7-2023-12-01' of git://git.kernel.dk/linux:
  nvme-core: check for too small lba shift
  blk-mq: don't count completed flush data request as inflight in case of quiesce
  block: Document the role of the two attribute groups
  block: warn once for each partition in bio_check_ro()
  block: move .bd_inode into 1st cacheline of block_device
  nvme: check for valid nvme_identify_ns() before using it
  nvme-core: fix a memory leak in nvme_ns_info_from_identify()
  nvme: fine-tune sending of first keep-alive
  bcache: revert replacing IS_ERR_OR_NULL with IS_ERR
2023-12-02 06:39:30 +09:00
Linus Torvalds
abd792f330 - Fix DM verity target's FEC support to always initialize IO before it
frees it.  Also fix alignment of struct dm_verity_fec_io within the
   per-bio-data.
 
 - Fix DM verity target to not perform FEC for failed readahead IO.
 
 - Update DM flakey target to use MAX_ORDER rather than MAX_ORDER - 1.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmVqIjwACgkQxSPxCi2d
 A1pcwAgA00/Fln0p84cD3wFKauC61RALx5awoS0S2obAN+JY9yLs3xl1XDm92HyI
 9giOXofHVKIlOQW6qASfZoCNGvtKPCVoKZF9KXKCqpK8wyKpuuG+yTPVeSsOK/fw
 pKcPp3FyXsu+9FXH3oO9xauLPOiGDC7BfIcHFQITHzT7qwMxQcPQ1HwfVwjrWIjG
 lgIQToiSZokBKBWXKyo63SMVkwWhlTdrfG1CJrc0UC9/f6DBMS0RTYJqmNJ3V8ak
 i0QyQdGZxc9TFuZe/G+Oq381z+X42iRDlluVU3ClMQTyoemQRcySi98CjRLruu7x
 1H79s8ZIaJc/4mkxlJUQingL+dmuGA==
 =Av5r
 -----END PGP SIGNATURE-----

Merge tag 'dm-6.7/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - Fix DM verity target's FEC support to always initialize IO before it
   frees it. Also fix alignment of struct dm_verity_fec_io within the
   per-bio-data

 - Fix DM verity target to not perform FEC for failed readahead IO

 - Update DM flakey target to use MAX_ORDER rather than MAX_ORDER - 1

* tag 'dm-6.7/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm-flakey: start allocating with MAX_ORDER
  dm-verity: align struct dm_verity_fec_io properly
  dm verity: don't perform FEC for failed readahead IO
  dm verity: initialize fec io before freeing it
2023-12-02 06:32:29 +09:00