accel/habanalabs: reset device if scrubbing failed

If scrubbing memory after user released device has failed it means
the device is in a bad state and should be reset.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Ofir Bitton <obitton@habana.ai>
This commit is contained in:
Oded Gabbay 2023-06-12 14:24:05 +03:00
parent 89803af535
commit 37d72439a4

View file

@ -454,8 +454,10 @@ static void hpriv_release(struct kref *ref)
/* Scrubbing is handled within hl_device_reset(), so here need to do it directly */
int rc = hdev->asic_funcs->scrub_device_mem(hdev);
if (rc)
if (rc) {
dev_err(hdev->dev, "failed to scrub memory from hpriv release (%d)\n", rc);
hl_device_reset(hdev, HL_DRV_RESET_HARD);
}
}
/* Now we can mark the compute_ctx as not active. Even if a reset is running in a different