Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.
So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.
The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():
static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;
cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);
return 0;
}
The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:
While mmap_singleton() does
rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();
it allocates a new file via anon_inode_getfile() and does
smp_store_mb(i915->gem.mmap_singleton, file);
So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.
Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.
Say concurrently another task hits mmap_singleton() and does:
rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();
When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().
But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.
So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.
<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>
Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
The calling code actually handles -ECHILD, so this BUG_ON
can be converted to WARN_ON_ONCE.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Link: https://lore.kernel.org/r/20231023184718.11143-1-bschubert@ddn.com
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Dharmendra Singh <dsingh@ddn.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
The cgwb cleanup routine will try to release the dying cgwb by switching
the attached inodes. It fetches the attached inodes from wb->b_attached
list, omitting the fact that inodes only with dirty timestamps reside in
wb->b_dirty_time list, which is the case when lazytime is enabled. This
causes enormous zombie memory cgroup when lazytime is enabled, as inodes
with dirty timestamps can not be switched to a live cgwb for a long time.
It is reasonable not to switch cgwb for inodes with dirty data, as
otherwise it may break the bandwidth restrictions. However since the
writeback of inode metadata is not accounted for, let's also switch
inodes with dirty timestamps to avoid zombie memory and block cgroups
when laztytime is enabled.
Fixes: c22d70a162 ("writeback, cgroup: release dying cgwbs by switching attached inodes")
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20231014125511.102978-1-jefflexu@linux.alibaba.com
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
In commit f61b9bb3f838 ("fs: add a new SB_I_NOUMASK flag") we added a
new SB_I_NOUMASK flag that is used by filesystems like NFS to indicate
that umask stripping is never supposed to be done in the vfs independent
of whether or not POSIX ACLs are supported.
Overlayfs falls into the same category as it raises SB_POSIXACL
unconditionally to defer umask application to the upper filesystem.
Now that we have SB_I_NOUMASK use that and make SB_POSIXACL properly
conditional on whether or not the kernel does have support for it. This
will enable use to turn IS_POSIXACL() into nop on kernels that don't
have POSIX ACL support avoding bugs from missed umask stripping.
Link: https://lore.kernel.org/r/20231012-einband-uferpromenade-80541a047a1f@brauner
Acked-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Make IS_POSIXACL() return false if POSIX ACL support is disabled.
Never skip applying the umask in namei.c and never bother to do any
ACL specific checks if the filesystem falsely indicates it has ACLs
enabled when the feature is completely disabled in the kernel.
This fixes a problem where the umask is always ignored in the NFS
client when compiled without CONFIG_FS_POSIX_ACL. This is a 4 year
old regression caused by commit 013cdf1088 which itself was not
completely wrong, but failed to consider all the side effects by
misdesigned VFS code.
Prior to that commit, there were two places where the umask could be
applied, for example when creating a directory:
1. in the VFS layer in SYSCALL_DEFINE3(mkdirat), but only if
!IS_POSIXACL()
2. again (unconditionally) in nfs3_proc_mkdir()
The first one does not apply, because even without
CONFIG_FS_POSIX_ACL, the NFS client sets SB_POSIXACL in
nfs_fill_super().
After that commit, (2.) was replaced by:
2b. in posix_acl_create(), called by nfs3_proc_mkdir()
There's one branch in posix_acl_create() which applies the umask;
however, without CONFIG_FS_POSIX_ACL, posix_acl_create() is an empty
dummy function which does not apply the umask.
The approach chosen by this patch is to make IS_POSIXACL() always
return false when POSIX ACL support is disabled, so the umask always
gets applied by the VFS layer. This is consistent with the (regular)
behavior of posix_acl_create(): that function returns early if
IS_POSIXACL() is false, before applying the umask.
Therefore, posix_acl_create() is responsible for applying the umask if
there is ACL support enabled in the file system (SB_POSIXACL), and the
VFS layer is responsible for all other cases (no SB_POSIXACL or no
CONFIG_FS_POSIX_ACL).
Signed-off-by: Max Kellermann <mk@cm4all.com>
Link: https://lore.kernel.org/r/151603744662.29035.4910161264124875658.stgit@rabbit.intern.cm-ag
Signed-off-by: Christian Brauner <brauner@kernel.org>
A backing file struct stores two path's, one "real" path that is referring
to f_inode and one "fake" path, which should be displayed to users in
/proc/<pid>/maps.
There is a lot more potential code that needs to know the "real" path, then
code that needs to know the "fake" path.
Instead of code having to request the "real" path with file_real_path(),
store the "real" path in f_path and require code that needs to know the
"fake" path request it with file_user_path().
Replace the file_real_path() helper with a simple const accessor f_path().
After this change, file_dentry() is not expected to observe any files
with overlayfs f_path and real f_inode, so the call to ->d_real() should
not be needed. Leave the ->d_real() call for now and add an assertion
in ovl_d_real() to catch if we made wrong assumptions.
Suggested-by: Miklos Szeredi <miklos@szeredi.hu>
Link: https://lore.kernel.org/r/CAJfpegtt48eXhhjDFA1ojcHPNKj3Go6joryCPtEFAKpocyBsnw@mail.gmail.com/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231009153712.1566422-4-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Overlayfs uses backing files with "fake" overlayfs f_path and "real"
underlying f_inode, in order to use underlying inode aops for mapped
files and to display the overlayfs path in /proc/<pid>/maps.
In preparation for storing the overlayfs "fake" path instead of the
underlying "real" path in struct backing_file, define a noop helper
file_user_path() that returns f_path for now.
Use the new helper in procfs and kernel logs whenever a path of a
mapped file is displayed to users.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231009153712.1566422-3-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
A writeable mapped backing file can perform writes to the real inode.
Therefore, the real path mount must be kept writable so long as the
writable map exists.
This may not be strictly needed for ovelrayfs private upper mount,
but it is correct to take the mnt_writers count in the vfs helper.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231009153712.1566422-2-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
So happens it already was not doing it, but there is no need to "hope"
as indicated in the comment.
No changes in generated assembly.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20231004111916.728135-3-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Backing files as used by overlayfs are never installed into file
descriptor tables and are explicitly documented as such. They aren't
subject to rcu access conditions like regular files are.
Their lifetime is bound to the lifetime of the overlayfs file, i.e.,
they're stashed in ovl_file->private_data and go away otherwise. If
they're set as vma->vm_file - which is their main purpose - then they're
subject to regular refcount rules and vma->vm_file can't be installed
into an fdtable after having been set. All in all I don't see any need
for rcu delay here. So free it directly.
This all hinges on such hybrid beasts to never actually be installed
into fdtables which - as mentioned before - is not allowed. So add an
explicit WARN_ON_ONCE() so we catch any case where someone is suddenly
trying to install one of those things into a file descriptor table so we
can have a nice long chat with them.
Link: https://lore.kernel.org/r/20231005-sakralbau-wappnen-f5c31755ed70@brauner
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Readahead was factored to call generic_fadvise. That refactor added an
S_ISREG restriction which broke readahead on block devices.
In addition to S_ISREG, this change checks S_ISBLK to fix block device
readahead. There is no change in behavior with any file type besides block
devices in this change.
Fixes: 3d8f761531 ("vfs: implement readahead(2) using POSIX_FADV_WILLNEED")
Signed-off-by: Reuben Hawkins <reubenhwk@gmail.com>
Link: https://lore.kernel.org/r/20231003015704.2415-1-reubenhwk@gmail.com
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
While valid we don't need to open-code rcu dereferences if we're
acquiring file_lock anyway.
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/20231010030615.GO800259@ZenIV
Signed-off-by: Christian Brauner <brauner@kernel.org>
In recent discussions around some performance improvements in the file
handling area we discussed switching the file cache to rely on
SLAB_TYPESAFE_BY_RCU which allows us to get rid of call_rcu() based
freeing for files completely. This is a pretty sensitive change overall
but it might actually be worth doing.
The main downside is the subtlety. The other one is that we should
really wait for Jann's patch to land that enables KASAN to handle
SLAB_TYPESAFE_BY_RCU UAFs. Currently it doesn't but a patch for this
exists.
With SLAB_TYPESAFE_BY_RCU objects may be freed and reused multiple times
which requires a few changes. So it isn't sufficient anymore to just
acquire a reference to the file in question under rcu using
atomic_long_inc_not_zero() since the file might have already been
recycled and someone else might have bumped the reference.
In other words, callers might see reference count bumps from newer
users. For this reason it is necessary to verify that the pointer is the
same before and after the reference count increment. This pattern can be
seen in get_file_rcu() and __files_get_rcu().
In addition, it isn't possible to access or check fields in struct file
without first aqcuiring a reference on it. Not doing that was always
very dodgy and it was only usable for non-pointer data in struct file.
With SLAB_TYPESAFE_BY_RCU it is necessary that callers first acquire a
reference under rcu or they must hold the files_lock of the fdtable.
Failing to do either one of this is a bug.
Thanks to Jann for pointing out that we need to ensure memory ordering
between reallocations and pointer check by ensuring that all subsequent
loads have a dependency on the second load in get_file_rcu() and
providing a fixup that was folded into this patch.
Cc: Jann Horn <jannh@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Failed opens (mostly ENOENT) legitimately happen a lot, for example here
are stats from stracing kernel build for few seconds (strace -fc make):
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ------------------
0.76 0.076233 5 15040 3688 openat
(this is tons of header files tried in different paths)
In the common case of there being nothing to close (only the file object
to free) there is a lot of overhead which can be avoided.
This is most notably delegation of freeing to task_work, which comes
with an enormous cost (see 021a160abf ("fs: use __fput_sync in
close(2)" for an example).
Benchmarked with will-it-scale with a custom testcase based on
tests/open1.c, stuffed into tests/openneg.c:
[snip]
while (1) {
int fd = open("/tmp/nonexistent", O_RDONLY);
assert(fd == -1);
(*iterations)++;
}
[/snip]
Sapphire Rapids, openneg_processes -t 1 (ops/s):
before: 1950013
after: 2914973 (+49%)
file refcount is checked as a safety belt against buggy consumers with
an atomic cmpxchg. Technically it is not necessary, but it happens to
not be measurable due to several other atomics which immediately follow.
Optmizing them away to make this atomic into a problem is left as an
exercise for the reader.
v2:
- unexport fput_badopen and move to fs/internal.h
- handle the refcount with cmpxchg, adjust commentary accordingly
- tweak the commit message
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20230926162228.68666-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Because 'inode' is being initialised before checking if 'dentry' is negative
it looks like an extra iput() on 'inode' may happen since the ihold() is
done only if the dentry is *not* negative. In reality this doesn't happen
because d_is_negative() is never true if ->d_inode is NULL. This patch only
makes the code easier to understand, as I was initially mislead by it.
Signed-off-by: Luís Henriques <lhenriques@suse.de>
Link: https://lore.kernel.org/r/20230928152341.303-1-lhenriques@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct watch_filter.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: David Howells <dhowells@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Siddh Raman Pant <code@siddh.me>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qian Cai <cai@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Siddh Raman Pant <code@siddh.me>
Reviewed-by: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Message-Id: <20230922175407.work.754-kees@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
If there is no watch_queue, holding the pipe mutex is enough to
prevent concurrent writes, and we can avoid the spinlock.
O_NOTIFICATION_QUEUE is an exotic and rarely used feature, and of all
the pipes that exist at any given time, only very few actually have a
watch_queue, therefore it appears worthwile to optimize the common
case.
This patch does not optimize pipe_resize_ring() where the spinlocks
could be avoided as well; that does not seem like a worthwile
optimization because this function is not called often.
Related commits:
- commit 8df441294d ("pipe: Check for ring full inside of the
spinlock in pipe_write()")
- commit b667b86734 ("pipe: Advance tail pointer inside of wait
spinlock in pipe_read()")
- commit 189b0ddc24 ("pipe: Fix missing lock in pipe_resize_ring()")
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Message-Id: <20230921075755.1378787-4-max.kellermann@ionos.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
This reverts commit 8df441294d ("pipe: Check for ring full inside of
the spinlock in pipe_write()") which was obsoleted by commit
c73be61ced ("pipe: Add general notification queue support") because
now pipe_write() fails early with -EXDEV if there is a watch_queue.
Without a watch_queue, no notifications can be posted to the pipe and
mutex protection is enough, as can be seen in splice_pipe_to_pipe()
which does not use the spinlock either.
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Message-Id: <20230921075755.1378787-3-max.kellermann@ionos.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
This declutters the code by reducing the number of #ifdefs and makes
the watch_queue checks simpler. This has no runtime effect; the
machine code is identical.
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Message-Id: <20230921075755.1378787-2-max.kellermann@ionos.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
This has no effect on 64 bit because there are 10 32-bit integers
surrounding the two bools, but on 32 bit architectures, this reduces
the struct size by 4 bytes by merging the two bools into one word.
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Message-Id: <20230921075755.1378787-1-max.kellermann@ionos.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
SB_POSIXACL must be set when a filesystem supports POSIX ACLs, but NFSv4
also sets this flag to prevent the VFS from applying the umask on
newly-created files. NFSv4 doesn't support POSIX ACLs however, which
causes confusion when other subsystems try to test for them.
Add a new SB_I_NOUMASK flag that allows filesystems to opt-in to umask
stripping without advertising support for POSIX ACLs. Set the new flag
on NFSv4 instead of SB_POSIXACL.
Also, move mode_strip_umask to namei.h and convert init_mknod and
init_mkdir to use it.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Message-Id: <20230911-acl-fix-v3-1-b25315333f6c@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Given a wrong root device, current log may not give the pretty name
which is useful to locate root cause.
For example, there are 2 blk devs in a VM, /dev/vda which has 2 partitials
/dev/vda1 and /dev/vda2 and /dev/vdb which is blank. /dev/vda2 is the
right root dev. When set "root=/dev/vdb", we get error log:
[ 0.635575] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(254,16)
It's not straightforward to find out the root cause as there is lack of
the root devive name therefore hard for people to get those info from the
device number, in the example, (254,16).
It is more comprehensive way to hint the root cause if pretty name is
given here, like:
[ 0.559887] Kernel panic - not syncing: VFS: Unable to mount root fs on "/dev/vdb" or unknown-block(254,16)
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Message-Id: <20230907091025.3436878-1-jianyong.wu@arm.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Overlayfs is going to use those to get write access on the upper mount
during entire copy up without taking freeze protection on upper sb for
the entire copy up.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Message-Id: <20230908132900.2983519-3-amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Before exporting these helpers to modules, make their names more
meaningful.
The names mnt_{get,put)_write_access*() were chosen, because they rhyme
with the inode {get,put)_write_access() helpers, which have a very close
meaning for the inode object.
Suggested-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230817-anfechtbar-ruhelosigkeit-8c6cca8443fc@brauner/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Message-Id: <20230908132900.2983519-2-amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Add CI integration support files for drm subsystem to gitlab.freedesktop.org instance.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmTv5XMACgkQDHTzWXnE
hr6xGg/+N2Eb3QN7nbhAtpL80Xy1mR4fSYz/ZC8yad+942Nmn9T35PgnrABj0a6+
drQJMqOay1dooe/KRKpLJGYIxCklQUpRn35jiW9KqYkUTwBS77wrGlMySKfzv6eL
yl9Pl69xfQHm/cmigDWe0JavvpDrRnmxBuGrHSjVPHltWVcTDRTyMYpnh6xzZ8JQ
zUpSLhDqkpNJ11072UxTkALwGMIanGOJAYHOoRkJi65BhRcK5QfzslXzpibf484t
FlEh2I5XPS+GBYn84GIvujzn3Bic7Dz9By2z6qIeTX93W5csYnLmKqI0ErAvlGIz
9h6ELeVTnDc/OZa8FmPe4GIc08dKf5cpuEI5szJW30eRVtxI6zygOszHKJArrXkl
JDRuuRXd/SegQvV8NeGaetjh+t53U8utXxPqS6gD50zb5TzYs2nkxOgNGnXjAUjY
DtwvQDsIsL5tFWwaRd7p+Ph23v2F9T486RpCcFkZFPMSbxO8k2lkEyzKBFPruhgl
aYiRtcCnNPajzjOU+Hhgw2ayfR3GN3FDcURLdrKFYnwVZynxpaZDPcMyVzOvYFOr
aH4TfJcwsm/CFhm8Pq5GVgwWJsLPteZEHTq7pRzcjwbt2zs+bzbaCn3twVyxg5a2
R1vNtNI+UuvbEC6srBM/CTa2KpUtmg5dHWppdoboex7XhlF6j1g=
=X1RZ
-----END PGP SIGNATURE-----
Merge tag 'topic/drm-ci-2023-08-31-1' of git://anongit.freedesktop.org/drm/drm
Pull drm ci scripts from Dave Airlie:
"This is a bunch of ci integration for the freedesktop gitlab instance
where we currently do upstream userspace testing on diverse sets of
GPU hardware. From my perspective I think it's an experiment worth
going with and seeing how the benefits/noise playout keeping these
files useful.
Ideally I'd like to get this so we can do pre-merge testing on PRs
eventually.
Below is some info from danvet on why we've ended up making the
decision and how we can roll it back if we decide it was a bad plan.
Why in upstream?
- like documentation, testcases, tools CI integration is one of these
things where you can waste endless amounts of time if you
accidentally have a version that doesn't match your source code
- but also like the above, there's a balance, this is the initial cut
of what we think makes sense to keep in sync vs out-of-tree,
probably needs adjustment
- gitlab supports out-of-repo gitlab integration and that's what's
been used for the kernel in drm, but it results in per-driver
fragmentation and lots of duplicated effort. the simple act of
smashing an arbitrary winner into a topic branch already started
surfacing patches on dri-devel and sparking good cross driver team
discussions
Why gitlab?
- it's not any more shit than any of the other CI
- drm userspace uses it extensively for everything in userspace, we
have a lot of people and experience with this, including
integration of hw testing labs
- media userspace like gstreamer is also on gitlab.fd.o, and there's
discussion to extend this to the media subsystem in some fashion
Can this be shared?
- there's definitely a pile of code that could move to scripts/ if
other subsystem adopt ci integration in upstream kernel git. other
bits are more drm/gpu specific like the igt-gpu-tests/tools
integration
- docker images can be run locally or in other CI runners
Will we regret this?
- it's all in one directory, intentionally, for easy deletion
- probably 1-2 years in upstream to see whether this is worth it or a
Big Mistake. that's roughly what it took to _really_ roll out solid
CI in the bigger userspace projects we have on gitlab.fd.o like
mesa3d"
* tag 'topic/drm-ci-2023-08-31-1' of git://anongit.freedesktop.org/drm/drm:
drm: ci: docs: fix build warning - add missing escape
drm: Add initial ci/ subdirectory
fix a ld.lld linker (in)compatibility quirk and make the x86 SMP init code a bit
more conservative to fix kexec() lockups.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmT97boRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1jObA//X7nug+d+IMLIs+c4579z4ZhkltMRxJVI
Btf8sdHpwgTUtKaOLmJnGiJ7f0GK5NtoaNtGUJF28aQETVOhco0Fvg/R8k1FE2Tc
CJqw6oy2FjVqD9qzZPCXh6QCvTtGjN5GF+xmoUbf7eZ9U8IVvOxBG+7yDMorQw3P
zzjIccLLg/aDvNLN/yZD2oqw6UGHZuh/Qr/0Q4PkZ7zL+yWV8EC+HOx3rlQklq0x
hh6YMwa4LGr3przUObHsfNS11EDzLDhg2WtTQMr12vlnpUB2eXnXWLklr6rpWjcz
qJiMxkrEkygB7seXnuQ0b4KHN/17zdkJ+t6vZoznUTXs1ohIDLWdiNTSl03qCs9B
V98a1x3MPT6aro9O/5ywyAJwPb0hvsg2S0ODFWab0Z3oRUbIG/k6dTEYlP7qZw8v
EFMtLy6M2EILXetj8q2ZGcA0rKz7pj/z9SosWDzqNj76w7xGwDKrSWoKJckkCwG+
j+ycBuKfrpxVYOF4ywvONSf35QTIW8BR0sM9Lg1GZuwaeincFwLf0cmS4ljGRyZ1
Vsi0SfpIgVQkeY/17onTa1C5X6c2wIE9nq253M58Xnc9B2EWpYImr+4PVZk6s4GI
GExvdPC/rIIwYa0LmvYTTlpHEd7f5qIAhfcEtMAuGSjVDLvmdDGFkaU7TgJ6Jcw2
D12wKSAAgPU=
=S38E
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Fix preemption delays in the SGX code, remove unnecessarily
UAPI-exported code, fix a ld.lld linker (in)compatibility quirk and
make the x86 SMP init code a bit more conservative to fix kexec()
lockups"
* tag 'x86-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Break up long non-preemptible delays in sgx_vepc_release()
x86: Remove the arch_calc_vm_prot_bits() macro from the UAPI
x86/build: Fix linker fill bytes quirk/incompatibility for ld.lld
x86/smp: Don't send INIT to non-present and non-booted CPUs
affecting certain Intel systems.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmT97GIRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1iLjRAAjk67cOUubVYXYOwlI4/5c+XiMC1qcqv9
M0uK30h+EECqWdm5Waj4mjFJ3seWUPrNsmdY9L4a6Gd1+s6W10wIabCEJnDBcfj5
M4Nv7OBnbfKb2JGfUfgd0OSSoclxL/QJh5UXMI+vLPKS2hMC5wL3BdKsxwrmbT+y
ZwilRbkHU0kSZpkkKE9yZNOcHhBMZ6LuQGlCAQ5k99Q8Z2KbxPa8T/8zx/RjYfuw
5WrE2gJJPXE/J/IHIvg7auhu2IcpzOmWPtb9J8MPExrUZLSQIYNhdqqyY9lvagKu
OdYcotBEoZM2+MSXzVWSQPKVVKox03FDErwdGef/QLFmzzldFFGUjKSRBfpnMK9Q
esjSiIGXlxh8fFcyCS/6FH48zA5y4BnBSzmPn3xt63VK0+EHPtUOUt8Xiug8DEYs
O8hW+3SYmhr6seF9ioX0RCoqft8dQuNz07zZWfLu5Qc0Bl8aLBhwRoXsYKnKqems
uqQ7bOb1FFwrS6AgebDhqHJUiw6WaeldsEolDyZzbKGJqFHJuTTFaECHApceLD7M
V4ymgy6QoKeccctr8qnwzX/5xbwTx8fRfTIuL9ZR2KC0NJLF/UwIw+bEAq2vuv3A
ROFYH+z1h4IvlCE3o3mRhGGYTxkdFw+BRXrWP690qxyRHBoHE+JwccSu36CKwBYS
LiZn7CWuFtg=
=Zq1S
-----END PGP SIGNATURE-----
Merge tag 'perf-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf event fix from Ingo Molnar:
"Work around a firmware bug in the uncore PMU driver, affecting certain
Intel systems"
* tag 'perf-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/uncore: Correct the number of CHAs on EMR
perf tools maintainership:
- Add git information for perf-tools and perf-tools-next trees/branches to the
MAINTAINERS file. That is where development now takes place and myself and
Namhyung Kim have write access, more people to come as we emulate other
maintainer groups.
perf record:
- Record kernel data maps when 'perf record --data' is used, so that global variables can
be resolved and used in tools that do data profiling.
perf trace:
- Remove the old, experimental support for BPF events in which a .c file was passed as
an event: "perf trace -e hello.c" to then get compiled and loaded.
The only known usage for that, that shipped with the kernel as an example for such events,
augmented the raw_syscalls tracepoints and was converted to a libbpf skeleton, reusing all
the user space components and the BPF code connected to the syscalls.
In the end just the way to glue the BPF part and the user space type beautifiers changed,
now being performed by libbpf skeletons.
The next step is to use BTF to do pretty printing of all syscall types, as discussed with
Alan Maguire and others.
Now, on a perf built with BUILD_BPF_SKEL=1 we get most if not all path/filenames/strings,
some of the networking data structures, perf_event_attr, etc, i.e. systemwide tracing of
nanosleep calls and perf_event_open syscalls while 'perf stat' runs 'sleep' for 5 seconds:
# perf trace -a -e *nanosleep,perf* perf stat -e cycles,instructions sleep 5
0.000 ( 9.034 ms): perf/327641 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 327642 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3
9.039 ( 0.006 ms): perf/327641 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x1 (PERF_COUNT_HW_INSTRUCTIONS), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 327642 (perf-exec), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
? ( ): gpm/991 ... [continued]: clock_nanosleep()) = 0
10.133 ( ): sleep/327642 clock_nanosleep(rqtp: { .tv_sec: 5, .tv_nsec: 0 }, rmtp: 0x7ffd36f83ed0) ...
? ( ): pool-gsd-smart/3051 ... [continued]: clock_nanosleep()) = 0
30.276 ( ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ...
223.215 (1000.430 ms): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) = 0
30.276 (2000.394 ms): gpm/991 ... [continued]: clock_nanosleep()) = 0
1230.814 ( ): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) ...
1230.814 (1000.404 ms): pool-gsd-smart/3051 ... [continued]: clock_nanosleep()) = 0
2030.886 ( ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ...
2237.709 (1000.153 ms): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) = 0
? ( ): crond/1172 ... [continued]: clock_nanosleep()) = 0
3242.699 ( ): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) ...
2030.886 (2000.385 ms): gpm/991 ... [continued]: clock_nanosleep()) = 0
3728.078 ( ): crond/1172 clock_nanosleep(rqtp: { .tv_sec: 60, .tv_nsec: 0 }, rmtp: 0x7ffe0971dcf0) ...
3242.699 (1000.158 ms): pool-gsd-smart/3051 ... [continued]: clock_nanosleep()) = 0
4031.409 ( ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ...
10.133 (5000.375 ms): sleep/327642 ... [continued]: clock_nanosleep()) = 0
Performance counter stats for 'sleep 5':
2,617,347 cycles
1,855,997 instructions # 0.71 insn per cycle
5.002282128 seconds time elapsed
0.000855000 seconds user
0.000852000 seconds sys
#
perf annotate:
- Building with binutils' libopcode now is opt-in (BUILD_NONDISTRO=1) for
licensing reasons, and we missed a build test on tools/perf/tests makefile.
Since we now default to NDEBUG=1, we ended up segfaulting when building with
BUILD_NONDISTRO=1 because a needed initialization routine was being "error
checked" via an assert.
Fix it by explicitly checking the result and aborting instead if it fails.
We better back propagate the error, but at least 'perf annotate' on samples
collected for a BPF program is back working when perf is built with
BUILD_NONDISTRO=1.
perf report/top:
- Add back TUI hierarchy mode header, that is seen when using 'perf report/top --hierarchy'.
- Fix the number of entries for 'e' key in the TUI that was preventing navigation of
lines when expanding an entry.
perf report/script:
- Support cross platform register handling, allowing a perf.data file collected
on one architecture to have registers sampled correctly displayed when
analysis tools such as 'perf report' and 'perf script' are used on a different
architecture.
- Fix handling of event attributes in pipe mode, i.e. when one uses:
perf record -o - | perf report -i -
When no perf.data files are used.
- Handle files generated via pipe mode with a version of perf and then read
also via pipe mode with a different version of perf, where the event attr
record may have changed, use the record size field to properly support this
version mismatch.
perf probe:
- Accessing global variables from uprobes isn't supported, make the error
message state that instead of stating that some minimal kernel version is
needed to have that feature. This seems just a tool limitation, the kernel
probably has all that is needed.
perf tests:
- Fix a reference count related leak in the dlfilter v0 API where the result
of a thread__find_symbol_fb() is not matched with an addr_location__exit()
to drop the reference counts of the resolved components (machine, thread, map,
symbol, etc). Add a dlfilter test to make sure that doesn't regresses.
- Lots of fixes for the 'perf test' written in shell script related to problems
found with the shellcheck utility.
- Fixes for 'perf test' shell scripts testing features enabled when perf is
built with BUILD_BPF_SKEL=1, such as 'perf stat' bpf counters.
- Add perf record sample filtering test, things like the following example, that gets
implemented as a BPF filter attached to the event:
# perf record -e task-clock -c 10000 --filter 'ip < 0xffffffff00000000'
- Improve the way the task_analyzer test checks if libtraceevent is linked,
using 'perf version --build-options' instead of the more expensinve
'perf record -e "sched:sched_switch"'.
- Add support for riscv in the mmap-basic test. (This went as well via the RiscV tree, same contents).
libperf:
- Implement riscv mmap support (This went as well via the RiscV tree, same contents).
perf script:
- New tool that converts perf.data files to the firefox profiler format so that one can use
the visualizer at https://profiler.firefox.com/. Done by Anup Sharma as part of this year's
Google Summer of Code.
One can generate the output and upload it to the web interface but Anup also automated
everything:
perf script gecko -F 99 -a sleep 60
- Support syscall name parsing on arm64.
- Print "cgroup" field on the same line as "comm".
perf bench:
- Add new 'uprobe' benchmark to measure the overhead of uprobes with/without
BPF programs attached to it.
- breakpoints are not available on power9, skip that test.
perf stat:
- Add #num_cpus_online literal to be used in 'perf stat' metrics, and add this extra
'perf test' check that exemplifies its purpose:
TEST_ASSERT_VAL("#num_cpus_online",
expr__parse(&num_cpus_online, ctx, "#num_cpus_online") == 0);
TEST_ASSERT_VAL("#num_cpus", expr__parse(&num_cpus, ctx, "#num_cpus") == 0);
TEST_ASSERT_VAL("#num_cpus >= #num_cpus_online", num_cpus >= num_cpus_online);
Miscellaneous:
- Improve tool startup time by lazily reading PMU, JSON, sysfs data.
- Improve error reporting in the parsing of events, passing YYLTYPE to error routines,
so that the output can show were the parsing error was found.
- Add 'perf test' entries to check the parsing of events improvements.
- Fix various leak for things detected by -fsanitize=address, mostly things that would
be freed at tool exit, including:
- Free evsel->filter on the destructor.
- Allow tools to register a thread->priv destructor and use it in 'perf trace'.
- Free evsel->priv in 'perf trace'.
- Free string returned by synthesize_perf_probe_point() when the caller fails
to do all it needs.
- Adjust various compiler options to not consider errors some warnings when
building with broken headers found in things like python, flex, bison, as we
otherwise build with -Werror. Some for gcc, some for clang, some for some
specific version of those, some for some specific version of flex or bison, or
some specific combination of these components, bah.
- Allow customization of clang options for BPF target, this helps building on
gentoo where there are other oddities where BPF targets gets passed some compiler
options intended for the native build, so building with WERROR=0 helps while
these oddities are fixed.
- Dont pass ERR_PTR() values to perf_session__delete() in 'perf top' and 'perf lock',
fixing some segfaults when handling some odd failures.
- Add LTO build option.
- Fix format of unordered lists in the perf docs (tools/perf/Documentation).
- Overhaul the bison files, using constructs such as YYNOMEM.
- Remove unused tokens from the bison .y files.
- Add more comments to various structs.
- A few LoongArch enablement patches.
Vendor events (JSON):
- Add JSON metrics for Yitian 710 DDR (aarch64). Things like:
EventName, BriefDescription
visible_window_limit_reached_rd, "At least one entry in read queue reaches the visible window limit.",
visible_window_limit_reached_wr, "At least one entry in write queue reaches the visible window limit.",
op_is_dqsosc_mpc , "A DQS Oscillator MPC command to DRAM.",
op_is_dqsosc_mrr , "A DQS Oscillator MRR command to DRAM.",
op_is_tcr_mrr , "A Temperature Compensated Refresh(TCR) MRR command to DRAM.",
- Add AmpereOne metrics (aarch64).
- Update N2 and V2 metrics (aarch64) and events using Arm telemetry repo.
- Update scale units and descriptions of common topdown metrics on aarch64. Things like:
- "MetricExpr": "stall_slot_frontend / (#slots * cpu_cycles)",
- "BriefDescription": "Frontend bound L1 topdown metric",
+ "MetricExpr": "100 * (stall_slot_frontend / (#slots * cpu_cycles))",
+ "BriefDescription": "This metric is the percentage of total slots that were stalled due to resource constraints in the frontend of the processor.",
- Update events for intel: meteorlake to 1.04, sapphirerapids to 1.15, Icelake+ metric constraints.
- Update files for the power10 platform.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZPfJZgAKCRCyPKLppCJ+
J1/eAP9lgtavD0V75wy1p5zyotkceOmPTkk1DYFVx2Euhxa/lAD/YW/JvuVSo0Gr
HqJP52XaV0tF8gG+YxL+Lay/Ke0P5AQ=
=d12c
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-for-v6.6-1-2023-09-05' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Arnaldo Carvalho de Melo:
"perf tools maintainership:
- Add git information for perf-tools and perf-tools-next trees and
branches to the MAINTAINERS file. That is where development now
takes place and myself and Namhyung Kim have write access, more
people to come as we emulate other maintainer groups.
perf record:
- Record kernel data maps when 'perf record --data' is used, so that
global variables can be resolved and used in tools that do data
profiling.
perf trace:
- Remove the old, experimental support for BPF events in which a .c
file was passed as an event: "perf trace -e hello.c" to then get
compiled and loaded.
The only known usage for that, that shipped with the kernel as an
example for such events, augmented the raw_syscalls tracepoints and
was converted to a libbpf skeleton, reusing all the user space
components and the BPF code connected to the syscalls.
In the end just the way to glue the BPF part and the user space
type beautifiers changed, now being performed by libbpf skeletons.
The next step is to use BTF to do pretty printing of all syscall
types, as discussed with Alan Maguire and others.
Now, on a perf built with BUILD_BPF_SKEL=1 we get most if not all
path/filenames/strings, some of the networking data structures,
perf_event_attr, etc, i.e. systemwide tracing of nanosleep calls
and perf_event_open syscalls while 'perf stat' runs 'sleep' for 5
seconds:
# perf trace -a -e *nanosleep,perf* perf stat -e cycles,instructions sleep 5
0.000 ( 9.034 ms): perf/327641 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 327642 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3
9.039 ( 0.006 ms): perf/327641 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x1 (PERF_COUNT_HW_INSTRUCTIONS), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 327642 (perf-exec), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
? ( ): gpm/991 ... [continued]: clock_nanosleep()) = 0
10.133 ( ): sleep/327642 clock_nanosleep(rqtp: { .tv_sec: 5, .tv_nsec: 0 }, rmtp: 0x7ffd36f83ed0) ...
? ( ): pool-gsd-smart/3051 ... [continued]: clock_nanosleep()) = 0
30.276 ( ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ...
223.215 (1000.430 ms): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) = 0
30.276 (2000.394 ms): gpm/991 ... [continued]: clock_nanosleep()) = 0
1230.814 ( ): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) ...
1230.814 (1000.404 ms): pool-gsd-smart/3051 ... [continued]: clock_nanosleep()) = 0
2030.886 ( ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ...
2237.709 (1000.153 ms): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) = 0
? ( ): crond/1172 ... [continued]: clock_nanosleep()) = 0
3242.699 ( ): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) ...
2030.886 (2000.385 ms): gpm/991 ... [continued]: clock_nanosleep()) = 0
3728.078 ( ): crond/1172 clock_nanosleep(rqtp: { .tv_sec: 60, .tv_nsec: 0 }, rmtp: 0x7ffe0971dcf0) ...
3242.699 (1000.158 ms): pool-gsd-smart/3051 ... [continued]: clock_nanosleep()) = 0
4031.409 ( ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ...
10.133 (5000.375 ms): sleep/327642 ... [continued]: clock_nanosleep()) = 0
Performance counter stats for 'sleep 5':
2,617,347 cycles
1,855,997 instructions # 0.71 insn per cycle
5.002282128 seconds time elapsed
0.000855000 seconds user
0.000852000 seconds sys
perf annotate:
- Building with binutils' libopcode now is opt-in (BUILD_NONDISTRO=1)
for licensing reasons, and we missed a build test on
tools/perf/tests makefile.
Since we now default to NDEBUG=1, we ended up segfaulting when
building with BUILD_NONDISTRO=1 because a needed initialization
routine was being "error checked" via an assert.
Fix it by explicitly checking the result and aborting instead if it
fails.
We better back propagate the error, but at least 'perf annotate' on
samples collected for a BPF program is back working when perf is
built with BUILD_NONDISTRO=1.
perf report/top:
- Add back TUI hierarchy mode header, that is seen when using 'perf
report/top --hierarchy'.
- Fix the number of entries for 'e' key in the TUI that was
preventing navigation of lines when expanding an entry.
perf report/script:
- Support cross platform register handling, allowing a perf.data file
collected on one architecture to have registers sampled correctly
displayed when analysis tools such as 'perf report' and 'perf
script' are used on a different architecture.
- Fix handling of event attributes in pipe mode, i.e. when one uses:
perf record -o - | perf report -i -
When no perf.data files are used.
- Handle files generated via pipe mode with a version of perf and
then read also via pipe mode with a different version of perf,
where the event attr record may have changed, use the record size
field to properly support this version mismatch.
perf probe:
- Accessing global variables from uprobes isn't supported, make the
error message state that instead of stating that some minimal
kernel version is needed to have that feature. This seems just a
tool limitation, the kernel probably has all that is needed.
perf tests:
- Fix a reference count related leak in the dlfilter v0 API where the
result of a thread__find_symbol_fb() is not matched with an
addr_location__exit() to drop the reference counts of the resolved
components (machine, thread, map, symbol, etc). Add a dlfilter test
to make sure that doesn't regresses.
- Lots of fixes for the 'perf test' written in shell script related
to problems found with the shellcheck utility.
- Fixes for 'perf test' shell scripts testing features enabled when
perf is built with BUILD_BPF_SKEL=1, such as 'perf stat' bpf
counters.
- Add perf record sample filtering test, things like the following
example, that gets implemented as a BPF filter attached to the
event:
# perf record -e task-clock -c 10000 --filter 'ip < 0xffffffff00000000'
- Improve the way the task_analyzer test checks if libtraceevent is
linked, using 'perf version --build-options' instead of the more
expensinve 'perf record -e "sched:sched_switch"'.
- Add support for riscv in the mmap-basic test. (This went as well
via the RiscV tree, same contents).
libperf:
- Implement riscv mmap support (This went as well via the RiscV tree,
same contents).
perf script:
- New tool that converts perf.data files to the firefox profiler
format so that one can use the visualizer at
https://profiler.firefox.com/. Done by Anup Sharma as part of this
year's Google Summer of Code.
One can generate the output and upload it to the web interface but
Anup also automated everything:
perf script gecko -F 99 -a sleep 60
- Support syscall name parsing on arm64.
- Print "cgroup" field on the same line as "comm".
perf bench:
- Add new 'uprobe' benchmark to measure the overhead of uprobes
with/without BPF programs attached to it.
- breakpoints are not available on power9, skip that test.
perf stat:
- Add #num_cpus_online literal to be used in 'perf stat' metrics, and
add this extra 'perf test' check that exemplifies its purpose:
TEST_ASSERT_VAL("#num_cpus_online",
expr__parse(&num_cpus_online, ctx, "#num_cpus_online") == 0);
TEST_ASSERT_VAL("#num_cpus", expr__parse(&num_cpus, ctx, "#num_cpus") == 0);
TEST_ASSERT_VAL("#num_cpus >= #num_cpus_online", num_cpus >= num_cpus_online);
Miscellaneous:
- Improve tool startup time by lazily reading PMU, JSON, sysfs data.
- Improve error reporting in the parsing of events, passing YYLTYPE
to error routines, so that the output can show were the parsing
error was found.
- Add 'perf test' entries to check the parsing of events
improvements.
- Fix various leak for things detected by -fsanitize=address, mostly
things that would be freed at tool exit, including:
- Free evsel->filter on the destructor.
- Allow tools to register a thread->priv destructor and use it in
'perf trace'.
- Free evsel->priv in 'perf trace'.
- Free string returned by synthesize_perf_probe_point() when the
caller fails to do all it needs.
- Adjust various compiler options to not consider errors some
warnings when building with broken headers found in things like
python, flex, bison, as we otherwise build with -Werror. Some for
gcc, some for clang, some for some specific version of those, some
for some specific version of flex or bison, or some specific
combination of these components, bah.
- Allow customization of clang options for BPF target, this helps
building on gentoo where there are other oddities where BPF targets
gets passed some compiler options intended for the native build, so
building with WERROR=0 helps while these oddities are fixed.
- Dont pass ERR_PTR() values to perf_session__delete() in 'perf top'
and 'perf lock', fixing some segfaults when handling some odd
failures.
- Add LTO build option.
- Fix format of unordered lists in the perf docs
(tools/perf/Documentation)
- Overhaul the bison files, using constructs such as YYNOMEM.
- Remove unused tokens from the bison .y files.
- Add more comments to various structs.
- A few LoongArch enablement patches.
Vendor events (JSON):
- Add JSON metrics for Yitian 710 DDR (aarch64). Things like:
EventName, BriefDescription
visible_window_limit_reached_rd, "At least one entry in read queue reaches the visible window limit.",
visible_window_limit_reached_wr, "At least one entry in write queue reaches the visible window limit.",
op_is_dqsosc_mpc , "A DQS Oscillator MPC command to DRAM.",
op_is_dqsosc_mrr , "A DQS Oscillator MRR command to DRAM.",
op_is_tcr_mrr , "A Temperature Compensated Refresh(TCR) MRR command to DRAM.",
- Add AmpereOne metrics (aarch64).
- Update N2 and V2 metrics (aarch64) and events using Arm telemetry
repo.
- Update scale units and descriptions of common topdown metrics on
aarch64. Things like:
- "MetricExpr": "stall_slot_frontend / (#slots * cpu_cycles)",
- "BriefDescription": "Frontend bound L1 topdown metric",
+ "MetricExpr": "100 * (stall_slot_frontend / (#slots * cpu_cycles))",
+ "BriefDescription": "This metric is the percentage of total slots that were stalled due to resource constraints in the frontend of the processor.",
- Update events for intel: meteorlake to 1.04, sapphirerapids to
1.15, Icelake+ metric constraints.
- Update files for the power10 platform"
* tag 'perf-tools-for-v6.6-1-2023-09-05' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (217 commits)
perf parse-events: Fix driver config term
perf parse-events: Fixes relating to no_value terms
perf parse-events: Fix propagation of term's no_value when cloning
perf parse-events: Name the two term enums
perf list: Don't print Unit for "default_core"
perf vendor events intel: Fix modifier in tma_info_system_mem_parallel_reads for skylake
perf dlfilter: Avoid leak in v0 API test use of resolve_address()
perf metric: Add #num_cpus_online literal
perf pmu: Remove str from perf_pmu_alias
perf parse-events: Make common term list to strbuf helper
perf parse-events: Minor help message improvements
perf pmu: Avoid uninitialized use of alias->str
perf jevents: Use "default_core" for events with no Unit
perf test stat_bpf_counters_cgrp: Enhance perf stat cgroup BPF counter test
perf test shell stat_bpf_counters: Fix test on Intel
perf test shell record_bpf_filter: Skip 6.2 kernel
libperf: Get rid of attr.id field
perf tools: Convert to perf_record_header_attr_id()
libperf: Add perf_record_header_attr_id()
perf tools: Handle old data in PERF_RECORD_ATTR
...
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmT8qQwACgkQiiy9cAdy
T1Hjzgv/dCmqowHN9fI1gP/SpkbyZt0/GrlhpRMELQ2THAYuJmWQ9i/vR7VFMMv7
C7vfx3nYTQfQAcsshA+bfOkABr02A9HD5HuBEhWldEFQD/qIPrPMyU53DJ6m4QOb
81/8ox6XJMDNHWJUPJVfhAyHoKBxYyNcqqkITnBcSpqENbm7IZ8UhUJe1l75nkBi
IYR28e4Aci5uGPuMB7NAuKmJM0IKi1ipMy4MCXkSD/vQZStz76mhzx200Wm6ObuX
jqPzIb32simsVY7q/Hbkxt2MPDUbsn6kBhOSe8oJVxEgrMy56pZCOByPjc0ZFzE6
SziZ0JanrfjlGdBg5qjmcUamEMV44Oi1QQ6PpkHRA2zDyXMpv8ulwy/3Z40O7zbb
WBBar0HsJE5osCL0jiwLbNorctEnmtSKWcyXOI924Bli4eesE/ojq2u4H4KrlXzA
gDqPwAhhw/PrbuEmgRy9maYE8KwutZh2HCRvxgFB1NgbcNLvKgkQrly4QP29hKQy
+HViqKAe
=6QhY
-----END PGP SIGNATURE-----
Merge tag '6.6-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- six smb3 client fixes including ones to allow controlling smb3
directory caching timeout and limits, and one debugging improvement
- one fix for nls Kconfig (don't need to expose NLS_UCS2_UTILS option)
- one minor spnego registry update
* tag '6.6-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
spnego: add missing OID to oid registry
smb3: fix minor typo in SMB2_GLOBAL_CAP_LARGE_MTU
cifs: update internal module version number for cifs.ko
smb3: allow controlling maximum number of cached directories
smb3: add trace point for queryfs (statfs)
nls: Hide new NLS_UCS2_UTILS
smb3: allow controlling length of time directory entries are cached with dir leases
smb: propagate error code of extract_sharename()
Add some kunit tests for page extraction for ITER_BVEC, ITER_KVEC and
ITER_XARRAY type iterators. ITER_UBUF and ITER_IOVEC aren't dealt with
as they require userspace VM interaction. ITER_DISCARD isn't dealt with
either as that can't be extracted.
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Hildenbrand <david@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add some kunit tests for page extraction for ITER_BVEC, ITER_KVEC and
ITER_XARRAY type iterators. ITER_UBUF and ITER_IOVEC aren't dealt with
as they require userspace VM interaction. ITER_DISCARD isn't dealt with
either as that does nothing.
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Hildenbrand <david@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
iov_iter_extract_pages() doesn't correctly handle skipping over initial
zero-length entries in ITER_KVEC and ITER_BVEC-type iterators.
The problem is that it accidentally reduces maxsize to 0 when it
skipping and thus runs to the end of the array and returns 0.
Fix this by sticking the calculated size-to-copy in a new variable
rather than back in maxsize.
Fixes: 7d58fe7310 ("iov_iter: Add a function to extract a page list from an iterator")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Hildenbrand <david@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- sh: push-switch: Reorder cleanup operations to avoid use-after-free bug
- sh: boards: Fix CEU buffer size passed to dma_declare_coherent_memory()
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEYv+KdYTgKVaVRgAGdCY7N/W1+RMFAmT8zzIACgkQdCY7N/W1
+RNmHhAAxp8Hme2NiDlhoZkYj7I2WOzKcHmAxD2Ss+pLWwwoo9g1DgD/7aOuQu2m
lkeI3+b1xaRQY1quSVvEzbuIzeKvBnzBT1utceITaH9AKJ8ffha4OHgO4LMBCH3A
UGWPK/Yp+KuAPjFvmvY6jZvEluLWQWLAFc0Gxs8thxR09TjEhAhMCvK+YAYe6GVe
uFhFklBNkCbcR/TelhFXPscrTUfQQtM1VjLNuj6Uh1G5gMDBPSiCJVGobnAtJOdq
cVZG8oUAT6UUl6d6hDSucv0APS5bpWYJBfXBlwopOokA+uKZOAT75mf3WwOLperT
l2HQzvFamP3p1thqrUyqsLq6i0DtP0Aaa+M5lHqM6yX+H0mEmBD6JXhCL/Ka7g2/
ezA5LZTmfOlcXb4zAmORZA7jp8/KH7q1bfGARV79jHOZ+unUQ4aHufo7+hw+WMO3
nwELWLBGuCn/Gd8bNTHSPHCY2hiWep4eFSqpevI+JQGJ8/Jvpo7WdllZ1BqZ9bpB
yhABvKt+bBLfXElG7kH1ctBVaAubWai7bBvF/ASIC08dceP2SbYojj/msZOz0SsM
gGUMrXM7jZ/jEsWuDd5uOFs+QupA1HhDtcF1KRFx6o4//EDOn2g/0rB+GVaSCcI0
mddhtvRWjYRO9po8IAnprH+qU67ZWy5ouZ6U0/i2FL8E/rNoHfk=
=sNwf
-----END PGP SIGNATURE-----
Merge tag 'sh-for-v6.6-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux
Pull sh updates from Adrian Glaubitz:
- Fix a use-after-free bug in the push-switch driver (Duoming Zhou)
- Fix calls to dma_declare_coherent_memory() that incorrectly passed
the buffer end address instead of the buffer size as the size
parameter
* tag 'sh-for-v6.6-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
sh: push-switch: Reorder cleanup operations to avoid use-after-free bug
sh: boards: Fix CEU buffer size passed to dma_declare_coherent_memory()
* The kernel now dynamically probes for misaligned access speed, as
opposed to relying on a table of known implementations.
* Support for non-coherent devices on systems using the Andes AX45MP
core, including the RZ/Five SoCs.
* Support for the V extension in ptrace(), again.
* Support for KASLR.
* Support for the BPF prog pack allocator in RISC-V.
* A handful of bug fixes and cleanups.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmT8eV0THHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYiQYTD/9V6asKMDdWUV+gti/gRvJsiYUjIrrK
h4MB8hL3fHfCLBpTD4rU6K1Gx6hzPjGsxIuQyAq/hf752KB/9XUiIVziRBv2ZEBb
GuTFCXfg0QXBUlxBZzFw5SKUuKXgRaMAQ14qjy3tfLk31YMQmBtAlEPdDM8mZOCQ
zNI3bbdn6zASeaSMh7hwBoOJWP2ACoOEW7RcD44EDT8jb3YW5rEF86x0XtYLgJb6
xhaR4ieIdaOLxz2RbjXj0GcPIBfhTxZbwN3fLlD8PxuGqCKn5kN03bPPwP9tMTAc
z02EgVcSDvJWpYikuuTkPMxpSi18OZPJ6eriwOv5ccP5NXQScO09iGo7IZEM7OzO
j1IrIXyncU4BhxlpWombU454Va+ezUlfh9uh+MrJ+Bnve3T3S9ax7AV4S8vkJZlT
bnmJVS/g7L/7nxTQdJ3zoAo2WzFQXL0C8SR5tGo/3aRk0uYoliHy/W419f55F9GZ
rFcc+LMqai8N4bLN3whaK0NnuodNWHoNlpcd/5ncJwecswuDkah3LWcd4rwBrWhu
8RIkIfpdr/vTQjUVXVLeMHdKB+lST3iF1feMqJj0PfTyvTZi5yfSppjAfkAdVq+9
lHqAjsaGdiCrOtLxb0oBR2PTDQPAm2gN2meuSMommDQR6Vul8K5WcQml9Zx9QEWA
eDXWYDZISKYJbA==
=s89m
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.6-mw2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull more RISC-V updates from Palmer Dabbelt:
- The kernel now dynamically probes for misaligned access speed, as
opposed to relying on a table of known implementations.
- Support for non-coherent devices on systems using the Andes AX45MP
core, including the RZ/Five SoCs.
- Support for the V extension in ptrace(), again.
- Support for KASLR.
- Support for the BPF prog pack allocator in RISC-V.
- A handful of bug fixes and cleanups.
* tag 'riscv-for-linus-6.6-mw2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (25 commits)
soc: renesas: Kconfig: For ARCH_R9A07G043 select the required configs if dependencies are met
riscv: Kconfig.errata: Add dependency for RISCV_SBI in ERRATA_ANDES config
riscv: Kconfig.errata: Drop dependency for MMU in ERRATA_ANDES_CMO config
riscv: Kconfig: Select DMA_DIRECT_REMAP only if MMU is enabled
bpf, riscv: use prog pack allocator in the BPF JIT
riscv: implement a memset like function for text
riscv: extend patch_text_nosync() for multiple pages
bpf: make bpf_prog_pack allocator portable
riscv: libstub: Implement KASLR by using generic functions
libstub: Fix compilation warning for rv32
arm64: libstub: Move KASLR handling functions to kaslr.c
riscv: Dump out kernel offset information on panic
riscv: Introduce virtual kernel mapping KASLR
RISC-V: Add ptrace support for vectors
soc: renesas: Kconfig: Select the required configs for RZ/Five SoC
cache: Add L2 cache management for Andes AX45MP RISC-V core
dt-bindings: cache: andestech,ax45mp-cache: Add DT binding documentation for L2 cache controller
riscv: mm: dma-noncoherent: nonstandard cache operations support
riscv: errata: Add Andes alternative ports
riscv: asm: vendorid_list: Add Andes Technology to the vendors list
...
The original code puts flush_work() before timer_shutdown_sync()
in switch_drv_remove(). Although we use flush_work() to stop
the worker, it could be rescheduled in switch_timer(). As a result,
a use-after-free bug can occur. The details are shown below:
(cpu 0) | (cpu 1)
switch_drv_remove() |
flush_work() |
... | switch_timer // timer
| schedule_work(&psw->work)
timer_shutdown_sync() |
... | switch_work_handler // worker
kfree(psw) // free |
| psw->state = 0 // use
This patch puts timer_shutdown_sync() before flush_work() to
mitigate the bugs. As a result, the worker and timer will be
stopped safely before the deallocate operations.
Fixes: 9f5e8eee5c ("sh: generic push-switch framework.")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Link: https://lore.kernel.org/r/20230802033737.9738-1-duoming@zju.edu.cn
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
In all these cases, the last argument to dma_declare_coherent_memory() is
the buffer end address, but the expected value should be the size of the
reserved region.
Fixes: 39fb993038 ("media: arch: sh: ap325rxa: Use new renesas-ceu camera driver")
Fixes: c2f9b05fd5 ("media: arch: sh: ecovec: Use new renesas-ceu camera driver")
Fixes: f3590dc329 ("media: arch: sh: kfr2r09: Use new renesas-ceu camera driver")
Fixes: 186c446f4b ("media: arch: sh: migor: Use new renesas-ceu camera driver")
Fixes: 1a3c230b41 ("media: arch: sh: ms7724se: Use new renesas-ceu camera driver")
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Link: https://lore.kernel.org/r/20230724120742.2187-1-petrtesarik@huaweicloud.com
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Mostly small stragglers that missed the initial merge. Driver updates
are qla2xxx and smartpqi (mp3sas has a high diffstat due to the
volatile qualifier removal, fnic due to unused function removal and
sd.c has a lot of code shuffling to remove forward declarations).
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZPyD4CYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishZWGAQDlh/3q
5YJp7f8sIqmgdOiKl3bln3API9Y0MPsC3z5TsAEAv9LYQZH3He4XvxMy/v5FioEs
8IWIoBsUZtcgoK6mI4w=
=8Aj8
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"Mostly small stragglers that missed the initial merge.
Driver updates are qla2xxx and smartpqi (mp3sas has a high diffstat
due to the volatile qualifier removal, fnic due to unused function
removal and sd.c has a lot of code shuffling to remove forward
declarations)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (38 commits)
scsi: ufs: core: No need to update UPIU.header.flags and lun in advanced RPMB handler
scsi: ufs: core: Add advanced RPMB support where UFSHCI 4.0 does not support EHS length in UTRD
scsi: mpt3sas: Remove volatile qualifier
scsi: mpt3sas: Perform additional retries if doorbell read returns 0
scsi: libsas: Simplify sas_queue_reset() and remove unused code
scsi: ufs: Fix the build for the old ARM OABI
scsi: qla2xxx: Fix unused variable warning in qla2xxx_process_purls_pkt()
scsi: fnic: Remove unused functions fnic_scsi_host_start/end_tag()
scsi: qla2xxx: Fix spelling mistake "tranport" -> "transport"
scsi: fnic: Replace sgreset tag with max_tag_id
scsi: qla2xxx: Remove unused variables in qla24xx_build_scsi_type_6_iocbs()
scsi: qla2xxx: Fix nvme_fc_rcv_ls_req() undefined error
scsi: smartpqi: Change driver version to 2.1.24-046
scsi: smartpqi: Enhance error messages
scsi: smartpqi: Enhance controller offline notification
scsi: smartpqi: Enhance shutdown notification
scsi: smartpqi: Simplify lun_number assignment
scsi: smartpqi: Rename pciinfo to pci_info
scsi: smartpqi: Rename MACRO to clarify purpose
scsi: smartpqi: Add abort handler
...
Here is one last fixup for your tree for 6.6-rc1. It resolves a problem
with the way that symbol_get was changed in the module tree merge in
your tree to fix up the DVB drivers which rely on this old api to attach
new devices.
As the changelog comment says:
In commit 9011e49d54 ("modules: only allow symbol_get of
EXPORT_SYMBOL_GPL modules") the use of symbol_get is properly restricted
to GPL-only marked symbols. This interacts oddly with the DVB logic
which only uses dvb_attach() to load the dvb driver which then uses
symbol_get().
Fix this up by properly marking all of the dvb_attach attach symbols as
EXPORT_SYMBOL_GPL().
This has been acked by Hans from the V4L driver side, and from Luis from
the module side, and Christoph said it was the correct solution, and was
tested by the original reporter of the issue.
It has passed 0-day testing, but has not been in linux-next due to it
only being sent yesterday.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZPwnEQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylvTQCgxfqe4gH03Pqiw/52XUvn40XvPtUAoJRfLVsp
GwCxypgSc0F+SPHRaaiO
=BvGY
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver symbol lookup fix from Greg KH:
"Here is one last fixup for your tree for 6.6-rc1. It resolves a
problem with the way that symbol_get was changed in the module tree
merge in your tree to fix up the DVB drivers which rely on this old
api to attach new devices.
As the changelog comment says:
In commit 9011e49d54 ("modules: only allow symbol_get of
EXPORT_SYMBOL_GPL modules") the use of symbol_get is properly
restricted to GPL-only marked symbols. This interacts oddly with the
DVB logic which only uses dvb_attach() to load the dvb driver which
then uses symbol_get().
Fix this up by properly marking all of the dvb_attach attach symbols
as EXPORT_SYMBOL_GPL().
This has been acked by Hans from the V4L driver side, Luis from the
module side, Mauro on the media side, and Christoph said it was the
correct solution, and was tested by the original reporter of the
issue.
It has passed 0-day testing, but has not been in linux-next due to it
only being sent yesterday"
* tag 'driver-core-6.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
media: dvb: symbol fixup for dvb_attach()
- move a dma-debug call that prints a message out from a lock that's
causing problems with the lock order in serial drivers (Sergey Senozhatsky)
- fix the CONFIG_DMA_NUMA_CMA Kconfig entry to have the right dependency
on not default to y (Christoph Hellwig)
- move an ifdef a bit to remove a __maybe_unused that seems to trip up
some sensitivities (Christoph Hellwig)
- revert a bogus check in the CMA allocator (Zhenhua Huang)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmT8MJ8LHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYOOyg/7BaZt4lih24cSFhXsU10wB49cKQuOngsT+4KFX7Eg
8LqdCK/L61M1VWm/1tcw6/XKK7oBDADIWnlutFTv1BHIaC0fjbFk0IpOQyWoXsYE
vlCzm6Sh1rrryYK/dviLwC3+ccF33C5dzsZsKzeJ89v0cP//rENdOfXhwAIT2Wc7
FG/h0W7wWeQG4jHCz+DGhRp7X6r5urwW72KNap4BlBaDpZdAV9E5w1OeaplnENXt
E7CK8rnHZz/AgH98LIMa929fgNhJ7Bec5mV9pItpBzYWAwL2iWk6k11FbAkxDiFQ
MQ7gMnH5KRkCVpH/QILX1NU5ImsjdjyCUYn+2q8OJ5Y2C42K7V5ClspIgBCFk7XH
AbCAGveoRUoQ3iCnQYZ0YPJOtyNSaUNB+q4NzwR/WFipiXGJBPflggI8gPzRiN8e
8SnKhduODklzLhOZZr7+nUEJpmwgR8aCkmhERZN/bw8iidBs/chiy9t1PfLmiOH9
R545BWEfsgpRikpEgb0HnuGD/zg26LcugLUilNbSZq0fzH4tkD5iCKKUitAITYLP
ED0wDj9AdyQ98L6aSgAxc+9ip3eATFntM6hlg/16Ve4zdDTbFgucZpJhdSuVF8BI
rq1VV3wztUm++zhkTFptK9eN2T+AQtpC+3XI6QyXpByADbX+m8GbQJm+r8V3hu5A
X64=
=AvYr
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-6.6-2023-09-09' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
- move a dma-debug call that prints a message out from a lock that's
causing problems with the lock order in serial drivers (Sergey
Senozhatsky)
- fix the CONFIG_DMA_NUMA_CMA Kconfig entry to have the right
dependency and not default to y (Christoph Hellwig)
- move an ifdef a bit to remove a __maybe_unused that seems to trip up
some sensitivities (Christoph Hellwig)
- revert a bogus check in the CMA allocator (Zhenhua Huang)
* tag 'dma-mapping-6.6-2023-09-09' of git://git.infradead.org/users/hch/dma-mapping:
Revert "dma-contiguous: check for memory region overlap"
dma-pool: remove a __maybe_unused label in atomic_pool_expand
dma-contiguous: fix the Kconfig entry for CONFIG_DMA_NUMA_CMA
dma-debug: don't call __dma_entry_alloc_check_leak() under free_entries_lock
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmT8goMUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vz2jxAAraKdqDFC7ARsslpG2KI4sUad/2S6
eMgEE4ud/yJzs8NbMKH0QZvnBo+TbTGoxakersX9MBU+FNSofll7BTdfURwV8pmx
BhAycP9kFBljBFQAYEpxJ2pBIEqReq+58Fs9oHHgMg5WcJ9VvRecFnP6stXhH8Jd
taFQ2vlYDMDFnF++nSIK9wvZzN7Ku+fng9ChdI2OZFqx8EX6XaDE3c/1LZP2vms7
SoaC8ZxI+66u1oo51XZz0QPWLVrKxaVRACtn02HbiQ7YLpfOK0zyPXUMwTpcOgng
urwcd/pDZoOF8DPjhPd6UwJKGs15swupzNiyQKNSnReff8DpzA+SWVe3ntIX/NF4
AVscmLa/XntiiXHgeXtWXc6Ve7255eOQruy3tt0A5bmWGDWY1Ik+hpPC1yevZfYU
0DMrVZPNntSEmEKml4iPQptvEGAVcYFN0VCemcEunqcDFyUaCh7SkIoKL80rf/h8
bIbDHOGYYYmGH9AwOYgUcmwAYK154z99EvO45ptiXuC05wfw5X3JedO8S2YsK7Yx
6ZCzs/ZpvPpTxnvoDw59qVcZ1T6PR8BoExEYKUXuBfgYVivoGHHh/BNaRsKqc1F4
+FA/Yn2JrUoANNm1uyJv+z0zteXzOH/hIPxnPajH8Ojh2lCdniOBi8+SZgk7PUss
k4CDTrVHH1eO73M=
=MzOj
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.6-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI fixes from Bjorn Helgaas:
- Add PCI_DYNAMIC_OF_NODES dependency on OF_IRQ to fix sparc64 build
error (Lizhi Hou)
- After coalescing host bridge resources, free any released resources
to avoid a leak (Ross Lagerwall)
- Revert a quirk that prevented NVIDIA T4 GPUs from using Secondary Bus
Reset. The quirk worked around an issue that we now think is related
to the Root Port, not the GPU (Bjorn Helgaas)
* tag 'pci-v6.6-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
Revert "PCI: Mark NVIDIA T4 GPUs to avoid bus reset"
PCI: Free released resource after coalescing
PCI: Fix CONFIG_PCI_DYNAMIC_OF_NODES kconfig dependencies
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEoE9b9c3U2JxX98mqbmZLrHqL0iMFAmTspXgACgkQbmZLrHqL
0iOD1hAAjQfLo8AmAKN0xWhr7+8HQjdSHgkqnOgeySDvWu40aWlSRRQ6bsuXoQ9G
TGNX6Emqsgc6CcZ63x5yeerLMudaaSliRSDdcN3HUTmnnqSl/mDteBIsYRpsjpA8
hl7qZU377N8WRTglQjvta62jVp/o67EJxQGqhQiI4Yi9GY0KLB8Pn6ta8A0S2x5A
C8RHUv7O7006Ox64uYP9Woes469oltPvkU2RiDnGoaZ3055SFkoylB1yflpK74JM
fipTM6Y4ciiEbEJT8H1gIYd8qVEDHn4r7QLA2m9oZo1/uhDErVCKMS7sOmHvphfP
aBXpPv5H4f9wQ6MXuFlZWFmbQXzVdYcL6eje24rcYNSGNLbjl92FqmZoH8wU5MUB
JXuQdJ6+eDMyue78n9XOurBbTVUSq+KIeSpWktheBujNiBEgQ08O4Nr8TBJsPttR
vcBqX+QcK0ivTgtWRojVyd3rlJcyHFbhaOk4x+5Og1VlLyi+dkfbcOiKpaMTZyIs
cw7aLhK4mlUfL26U/bPLSFo7PmWaNO/UaialtykqG+9mELW2WiobBKEHCXUw7xs2
TuFkRDv42w3DpKPm2cr93wbxbcoBFpOZMcX1CgYKBOGVon2UZDs26Jtvm8yj//Px
nkqDRwYWFVJVF/3RIq0dGW24GbNH7Y5IK4OEqr+Cd71U7lyTjMA=
=rOGo
-----END PGP SIGNATURE-----
Merge tag 'ntb-6.6' of https://github.com/jonmason/ntb
Pull NTB updates from Jon Mason:
"Link toggling fixes and debugfs error path fixes"
[ And for everybody like me who always have to remind themselves what
the TLA of the day is, and what NTB stands for - it's a PCIe
"Non-Transparent Bridge" thing - Linus ]
* tag 'ntb-6.6' of https://github.com/jonmason/ntb:
ntb: Check tx descriptors outstanding instead of head/tail for tx queue
ntb: Fix calculation ntb_transport_tx_free_entry()
ntb: Drop packets when qp link is down
ntb: Clean up tx tail index on link down
ntb: amd: Drop unnecessary error check for debugfs_create_dir
NTB: ntb_tool: Switch to memdup_user_nul() helper
dtivers: ntb: fix parameter check in perf_setup_dbgfs()
ntb: Remove error checking for debugfs_create_dir()
Add missing OID to the registry. Some servers and clients (including
Windows) now request "NEGOEX - SPNEGEO Extended Negotiation Security")
See https://datatracker.ietf.org/doc/html/draft-zhu-negoex-02
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
In commit 9011e49d54 ("modules: only allow symbol_get of
EXPORT_SYMBOL_GPL modules") the use of symbol_get is properly restricted
to GPL-only marked symbols. This interacts oddly with the DVB logic
which only uses dvb_attach() to load the dvb driver which then uses
symbol_get().
Fix this up by properly marking all of the dvb_attach attach symbols as
EXPORT_SYMBOL_GPL().
Fixes: 9011e49d54 ("modules: only allow symbol_get of EXPORT_SYMBOL_GPL modules")
Cc: stable <stable@kernel.org>
Reported-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: linux-media@vger.kernel.org
Cc: linux-modules@vger.kernel.org
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Link: https://lore.kernel.org/r/20230908092035.3815268-2-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmT7eEwACgkQiiy9cAdy
T1EY1Qv+OIgraDJ23wl4FX17dEQLKqcVH5wCWhU9IEq9oIhKSMngDE4kAKu/unEJ
GOKy9QfY2eshPWHCoZjh9lI1Xg776nC3GeAtekOFHXX9FXRZfWP0+Sp6HRNk1TxD
iTrRUDGnGA20QF7tgxNo5EJb2yvxBmEmhiSTwlFUuV+oOjvbQw0eB2EheEEzhPJF
7qrlvC0GvvgUOrVVg6YEMCNFb9bvrns06Rwrl3Hq86gaauXIg4pBIF/AHiwh7jCo
+upXo1vgfWp29XMwX79LqCOx9u96q0i5AZonXTN2W/AxsgsWA6RQnGyZmL+Q/2/4
w4jN76NaLspoVE8grbzcSJq8b3Rlx2HUlzBaZgqAbuB8zx/59RXf559Aewt+Y9Pt
anGsTTpp6a9nc9YYlAGSmaA97y5fnHO47sHpKHVhR9vKbQgvnqGzvmqgNUeRbDzb
wgUjOma6SPfZL6JNqNb+SrktI410MjS11/+mP4NlTrVVomYyKZXslcvpnMuoiHMH
U4RpYB84
=9Rd7
-----END PGP SIGNATURE-----
Merge tag '6.6-rc-ksmbd' of git://git.samba.org/ksmbd
Pull smb server update from Steve French:
"After two years, many fixes and much testing, ksmbd is no longer
experimental"
* tag '6.6-rc-ksmbd' of git://git.samba.org/ksmbd:
ksmbd: remove experimental warning
- Fix a bug encountered by people using bittorrent where they'd get
NULL pointer dereferences on page cache lookups when using XFS
- Two documentation fixes
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmT7XLYACgkQDpNsjXcp
gj6qrAf+LiAs3dUELOjrqaQQbbNGp4na+YwJCiezuvwZn8P+ieJpt6QCEDHEb1jH
LCOjr0GFMhHnAWp9Q0Qay4IXoKk8DPkA/avSaZgsl5blmMyNqFMgHklU7mjRvhCG
ayb/NeZYwrJhA9NyueXYuH3h7QDryxyIN3TZS1/7z13YrohMIQeu3q7X/ZBMh7NS
uPd7vmDj8TnZ/agQzplQ4XDov9lrzkUXDJqpMvn/Gbr4K7y66UZa3SLxi1JPrnah
ffDvBlK2OImNBoaADfiRImWc7QlXVkF/B08xUcJ6tXAeO6xJDykkie+gjsF2S040
YP2YIG+IWi47zqa25EuxFRtavwUh6w==
=4GKy
-----END PGP SIGNATURE-----
Merge tag 'xarray-6.6' of git://git.infradead.org/users/willy/xarray
Pull xarray fixes from Matthew Wilcox:
- Fix a bug encountered by people using bittorrent where they'd get
NULL pointer dereferences on page cache lookups when using XFS
- Two documentation fixes
* tag 'xarray-6.6' of git://git.infradead.org/users/willy/xarray:
idr: fix param name in idr_alloc_cyclic() doc
xarray: Document necessary flag in alloc functions
XArray: Do not return sibling entries from xa_load()
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmT7NSUQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpuIuEACxQj0JjIVwgp8VlUcMAEFap7GH940qs8is
tnRrGS48G/z3WZ0zizGQ1hF2m9oOJ1NwZ1bBrGJlVBP8KSgyR9IxI6WVt0JBZgg8
DNQt+v5FoX4v5ZbFr/NTqz/6YnPWVwyTkJs4JIi9YLmpKkd58N+51n3wl56xyrp7
8OcjHPH1ZI0qGar2poPXI7EKaweSzJW7nr3/cHujYoCfAGn7eZbHNgY0wdAtWdhM
oGFABgMKTFKA0rCSaP8uo7GIqov+KXvFUNOuC6eNjtaUvXMecPLEJxvDhhi52fml
LEH1a9bmtazu6bufrbI2D4SmXLNBcqnQw5d/1pn9oRO/UU6AAi6E7I/L1iodKWav
bkr2i+YMnhUqVDFRwy+j78bcGRlLPWupvj68RdpvAvzhdW/85p9nH66CJfTnjQ3q
xJJKoUREd/8C6ktm5xHYyETxcL5q6Vdz1w5GmsCyOWWpsXsEbh0GHn6pRDkLWJf4
OQhO4CbxIARGE5NRK8MYl/J/581yS1cIdGZAg7hsVHzTGDSt3h/Hlh6ONpWCcvXD
SJI39W83oBSOnsbRG4FU6Bnt/WHTD0yi07cr867l9HEq63JOK/OduMjXA/tdRfwP
fZU3RfFYcS8+h9UHgrSgvCk03fCjq1HsWCDfpc6sUuRG1fH6jSNFdkCKqhCoU9Py
K2IKi1IE0Q==
=4SLW
-----END PGP SIGNATURE-----
Merge tag 'block-6.6-2023-09-08' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- Fix null_blk polled IO timeout handling (Chengming)
- Regression fix for swapped arguments in drbd bvec_set_page()
(Christoph)
- String length handling fix for s390 dasd (Heiko)
- Fixes for blk-throttle accounting (Yu)
- Fix page pinning issue for same page segments (Christoph)
- Remove redundant file_remove_privs() call (Christoph)
- Fix a regression in partition handling for devices not supporting
partitions (Li)
* tag 'block-6.6-2023-09-08' of git://git.kernel.dk/linux:
drbd: swap bvec_set_page len and offset
block: fix pin count management when merging same-page segments
null_blk: fix poll request timeout handling
s390/dasd: fix string length handling
block: don't add or resize partition on the disk with GENHD_FL_NO_PART
block: remove the call to file_remove_privs in blkdev_write_iter
blk-throttle: consider 'carryover_ios/bytes' in throtl_trim_slice()
blk-throttle: use calculate_io/bytes_allowed() for throtl_trim_slice()
blk-throttle: fix wrong comparation while 'carryover_ios/bytes' is negative
blk-throttle: print signed value 'carryover_bytes/ios' for user
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmT7NXsQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgphevEACwpgak3KV+zE+u/2g8rsjRu9FDkgi9hW+7
RKbwPtgTN3R3/z+eGq57gWW3aSJx0Mqaxf/mtEqf+K+yxK3FbDDKAkkwMG0S/TGT
FT+beCeJHiA9tTruoW1ipGh7gxS0D3JWwPRl5i7+rtagzdbOSYWmsrNPVXl3uVee
x6sORlZ3Ye2VRd0m+3fQ7nWw+epMKebMrlpmWC7og0mdt/zhF+q6zgAV8tdEYe7E
VQ8yrWa9LyYw07a+M2kra2+IZzSWYZE7qZE1OZcwbI9ARvqNkXyslqXfW0gVrDml
lfoUnwqsRI45RfTOTao6lAzc5QZHTlrl1Mqg79139grDaEM/bCx/kQq2ZUG03Tcy
FUgKNrz6E/te5u1ar8E8ppXEAmjjAsKiYZsT+d5nHv7xPV9F15Ryu+Mr9sFAiHXw
hMyInmBJ6JcQx9BYpvxTMi7N4Mfiw2Z9GTE2yhhVPvEAG418+YYa7CSNdXVSy/M2
QuAQEc+K1CcVhmgNZbSe2FnSCve1khHURT0C3PYI0u1VHpZmkzcSY5utdKuqeRDw
SqHk/XO3bBWW13UuLBnIB8+jk1AGKDc0vJQrdzxxxSYHc7uTx30QHYQMdA90Bqyx
2z/lg7IwfITgFkRO30AGXbe2NN4YnbseJ5wyfHvlN/WcHrDo9zbZUt6gReVCcOWA
/NBftGe8WA==
=eHn2
-----END PGP SIGNATURE-----
Merge tag 'io_uring-6.6-2023-09-08' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
"A few fixes that should go into the 6.6-rc merge window:
- Fix for a regression this merge window caused by the SQPOLL
affinity patch, where we can race with SQPOLL thread shutdown and
cause an oops when trying to set affinity (Gabriel)
- Fix for a regression this merge window where fdinfo reading with
for a ring setup with IORING_SETUP_NO_SQARRAY will attempt to
deference the non-existing SQ ring array (me)
- Add the patch that allows more finegrained control over who can use
io_uring (Matteo)
- Locking fix for a regression added this merge window for IOPOLL
overflow (Pavel)
- IOPOLL fix for stable, breaking our loop if helper threads are
exiting (Pavel)
Also had a fix for unreaped iopoll requests from io-wq from Ming, but
we found an issue with that and hence it got reverted. Will get this
sorted for a future rc"
* tag 'io_uring-6.6-2023-09-08' of git://git.kernel.dk/linux:
Revert "io_uring: fix IO hang in io_wq_put_and_exit from do_exit()"
io_uring: fix unprotected iopoll overflow
io_uring: break out of iowq iopoll on teardown
io_uring: add a sysctl to disable io_uring system-wide
io_uring/fdinfo: only print ->sq_array[] if it's there
io_uring: fix IO hang in io_wq_put_and_exit from do_exit()
io_uring: Don't set affinity on a dying sqpoll thread