vhost: Fix worker hangs due to missed wake up calls

We can race where we have added work to the work_list, but
vhost_task_fn has passed that check but not yet set us into
TASK_INTERRUPTIBLE. wake_up_process will see us in TASK_RUNNING and
just return.

This bug was intoduced in commit f9010dbdce ("fork, vhost: Use
CLONE_THREAD to fix freezer/ps regression") when I moved the setting
of TASK_INTERRUPTIBLE to simplfy the code and avoid get_signal from
logging warnings about being in the wrong state. This moves the setting
of TASK_INTERRUPTIBLE back to before we test if we need to stop the
task to avoid a possible race there as well. We then have vhost_worker
set TASK_RUNNING if it finds work similar to before.

Fixes: f9010dbdce ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230607192338.6041-3-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This commit is contained in:
Mike Christie 2023-06-07 14:23:38 -05:00 committed by Michael S. Tsirkin
parent a284f09eff
commit 4b13cbef79
2 changed files with 12 additions and 8 deletions

View File

@ -341,6 +341,8 @@ static bool vhost_worker(void *data)
node = llist_del_all(&worker->work_list);
if (node) {
__set_current_state(TASK_RUNNING);
node = llist_reverse_order(node);
/* make sure flag is seen after deletion */
smp_wmb();

View File

@ -28,10 +28,6 @@ static int vhost_task_fn(void *data)
for (;;) {
bool did_work;
/* mb paired w/ vhost_task_stop */
if (test_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags))
break;
if (!dead && signal_pending(current)) {
struct ksignal ksig;
/*
@ -48,11 +44,17 @@ static int vhost_task_fn(void *data)
clear_thread_flag(TIF_SIGPENDING);
}
did_work = vtsk->fn(vtsk->data);
if (!did_work) {
set_current_state(TASK_INTERRUPTIBLE);
schedule();
/* mb paired w/ vhost_task_stop */
set_current_state(TASK_INTERRUPTIBLE);
if (test_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags)) {
__set_current_state(TASK_RUNNING);
break;
}
did_work = vtsk->fn(vtsk->data);
if (!did_work)
schedule();
}
complete(&vtsk->exited);