linux-stable/kernel
Mike Christie f9010dbdce fork, vhost: Use CLONE_THREAD to fix freezer/ps regression
When switching from kthreads to vhost_tasks two bugs were added:
1. The vhost worker tasks's now show up as processes so scripts doing
ps or ps a would not incorrectly detect the vhost task as another
process.  2. kthreads disabled freeze by setting PF_NOFREEZE, but
vhost tasks's didn't disable or add support for them.

To fix both bugs, this switches the vhost task to be thread in the
process that does the VHOST_SET_OWNER ioctl, and has vhost_worker call
get_signal to support SIGKILL/SIGSTOP and freeze signals. Note that
SIGKILL/STOP support is required because CLONE_THREAD requires
CLONE_SIGHAND which requires those 2 signals to be supported.

This is a modified version of the patch written by Mike Christie
<michael.christie@oracle.com> which was a modified version of patch
originally written by Linus.

Much of what depended upon PF_IO_WORKER now depends on PF_USER_WORKER.
Including ignoring signals, setting up the register state, and having
get_signal return instead of calling do_group_exit.

Tidied up the vhost_task abstraction so that the definition of
vhost_task only needs to be visible inside of vhost_task.c.  Making
it easier to review the code and tell what needs to be done where.
As part of this the main loop has been moved from vhost_worker into
vhost_task_fn.  vhost_worker now returns true if work was done.

The main loop has been updated to call get_signal which handles
SIGSTOP, freezing, and collects the message that tells the thread to
exit as part of process exit.  This collection clears
__fatal_signal_pending.  This collection is not guaranteed to
clear signal_pending() so clear that explicitly so the schedule()
sleeps.

For now the vhost thread continues to exist and run work until the
last file descriptor is closed and the release function is called as
part of freeing struct file.  To avoid hangs in the coredump
rendezvous and when killing threads in a multi-threaded exec.  The
coredump code and de_thread have been modified to ignore vhost threads.

Remvoing the special case for exec appears to require teaching
vhost_dev_flush how to directly complete transactions in case
the vhost thread is no longer running.

Removing the special case for coredump rendezvous requires either the
above fix needed for exec or moving the coredump rendezvous into
get_signal.

Fixes: 6e890c5d50 ("vhost: use vhost_tasks for worker threads")
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Co-developed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-06-01 17:15:33 -04:00
..
bpf bpf: fix a memory leak in the LRU and LRU_PERCPU hash maps 2023-05-22 10:26:39 -07:00
cgroup cgroup changes for v6.4-rc1 2023-04-29 10:05:22 -07:00
configs Char/Misc drivers for 6.4-rc1 2023-04-27 12:07:50 -07:00
debug
dma dma-mapping updates for Linux 6.4 2023-04-29 10:29:57 -07:00
entry ptrace: Provide set/get interface for syscall user dispatch 2023-04-16 14:23:07 +02:00
events perf/core: Fix perf_sample_data not properly initialized for different swevents in perf_tp_event() 2023-05-08 10:58:26 +02:00
futex
gcov
irq xen: branch for v6.4-rc4 2023-05-27 09:42:56 -07:00
kcsan - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
livepatch Scheduler changes for v6.4: 2023-04-28 14:53:30 -07:00
locking Two fixes for debugobjects: 2023-05-28 07:15:33 -04:00
module module: fix module load for ia64 2023-05-30 09:29:40 -07:00
power More power management updates for 6.4-rc1 2023-05-03 12:01:05 -07:00
printk - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
rcu RCU Changes for 6.4: 2023-04-24 12:16:14 -07:00
sched sched: fix cid_lock kernel-doc warnings 2023-05-08 10:58:28 +02:00
time tick/broadcast: Make broadcast device replacement work correctly 2023-05-08 23:18:16 +02:00
trace tracing: Have function_graph selftest call cond_resched() 2023-05-28 21:15:46 -04:00
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
Makefile modules-6.4-rc1 2023-04-27 16:36:55 -07:00
acct.c
async.c
audit.c
audit.h
audit_fsnotify.c
audit_tree.c
audit_watch.c
auditfilter.c
auditsc.c
backtracetest.c
bounds.c
capability.c
cfi.c
compat.c
configs.c
context_tracking.c
cpu.c - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
cpu_pm.c
crash_core.c mm, treewide: redefine MAX_ORDER sanely 2023-04-05 19:42:46 -07:00
crash_dump.c
cred.c
delayacct.c delayacct: track delays from IRQ/SOFTIRQ 2023-04-18 16:39:34 -07:00
dma.c
exec_domain.c
exit.c fork, vhost: Use CLONE_THREAD to fix freezer/ps regression 2023-06-01 17:15:33 -04:00
extable.c
fail_function.c
fork.c fork, vhost: Use CLONE_THREAD to fix freezer/ps regression 2023-06-01 17:15:33 -04:00
freezer.c
gen_kheaders.sh
groups.c
hung_task.c kernel/hung_task.c: set some hung_task.c variables storage-class-specifier to static 2023-04-08 13:45:37 -07:00
iomem.c
irq_work.c trace: Add trace_ipi_send_cpu() 2023-03-24 11:01:29 +01:00
jump_label.c
kallsyms.c
kallsyms_internal.h
kallsyms_selftest.c
kallsyms_selftest.h
kcmp.c
kcov.c
kexec.c
kexec_core.c
kexec_elf.c
kexec_file.c kexec: remove unnecessary arch_kexec_kernel_image_load() 2023-04-08 13:45:38 -07:00
kexec_internal.h
kheaders.c kheaders: Use array declaration instead of char 2023-03-24 20:10:59 -07:00
kprobes.c
ksysfs.c kernel/ksysfs.c: use sysfs_emit for sysfs show handlers 2023-03-24 17:09:14 +01:00
kthread.c - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
latencytop.c
module_signature.c
notifier.c notifiers: add tracepoints to the notifiers infrastructure 2023-04-08 13:45:38 -07:00
nsproxy.c convert setns(2) to fdget()/fdput() 2023-04-20 22:55:35 -04:00
padata.c
panic.c cpu: Mark nmi_panic_self_stop() __noreturn 2023-04-14 17:31:26 +02:00
params.c
pid.c pid: add pidfd_prepare() 2023-04-03 11:16:56 +02:00
pid_namespace.c kernel: pid_namespace: simplify sysctls with register_sysctl() 2023-05-02 19:23:29 -07:00
pid_sysctl.h kernel: pid_namespace: simplify sysctls with register_sysctl() 2023-05-02 19:23:29 -07:00
profile.c
ptrace.c ptrace: Provide set/get interface for syscall user dispatch 2023-04-16 14:23:07 +02:00
range.c
reboot.c
regset.c
relay.c relayfs: fix out-of-bounds access in relay_file_read 2023-05-02 17:23:27 -07:00
resource.c
resource_kunit.c
rseq.c
scftorture.c
scs.c
seccomp.c seccomp: simplify sysctls with register_sysctl_init() 2023-04-13 11:49:20 -07:00
signal.c fork, vhost: Use CLONE_THREAD to fix freezer/ps regression 2023-06-01 17:15:33 -04:00
smp.c trace,smp: Trace all smp_function_call*() invocations 2023-03-24 11:01:30 +01:00
smpboot.c
smpboot.h
softirq.c softirq: Add trace points for tasklet entry/exit 2023-04-15 10:17:16 +02:00
stackleak.c stackleak: allow to specify arch specific stackleak poison function 2023-04-20 11:36:35 +02:00
stacktrace.c
static_call.c
static_call_inline.c
stop_machine.c
sys.c mm/ksm: unmerge and clear VM_MERGEABLE when setting PR_SET_MEMORY_MERGE=0 2023-05-02 17:21:49 -07:00
sys_ni.c
sysctl-test.c
sysctl.c mm: compaction: move compaction sysctl to its own file 2023-04-13 11:49:35 -07:00
task_work.c
taskstats.c
torture.c
tracepoint.c
tsacct.c
ucount.c
uid16.c
uid16.h
umh.c
up.c
user-return-notifier.c
user.c
user_namespace.c
usermode_driver.c
utsname.c
utsname_sysctl.c utsname: simplify one-level sysctl registration for uts_kern_table 2023-04-13 11:49:35 -07:00
vhost_task.c fork, vhost: Use CLONE_THREAD to fix freezer/ps regression 2023-06-01 17:15:33 -04:00
watch_queue.c modules-6.4-rc1 2023-04-27 16:36:55 -07:00
watchdog.c
watchdog_hld.c
workqueue.c workqueue changes for v6.4-rc1 2023-04-29 09:48:52 -07:00
workqueue_internal.h