linux-stable/drivers/misc/habanalabs
Oded Gabbay 5c2207ba23 habanalabs: increase timeout during reset
[ Upstream commit 7a65ee046b ]

When doing training, the DL framework (e.g. TensorFlow) performs hundreds
of thousands of memory allocations and mappings. If the driver needs to
perform a hard reset during training, it kills the application and unmaps
all of those memory allocations. Unfortunately, because of the large
number of mappings, the driver cannot finish unmapping within the current
timeout (5 seconds). Therefore, increase the timeout significantly, to 30
seconds, to avoid a situation where the driver resets the device with
active mappings, which can sometimes cause a kernel bug.

Note that the reset does not necessarily take the full 30 seconds, because
the reset thread checks once per second whether the unmap operation is done.

Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24 17:50:28 +02:00
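
The wait described in the commit message above (poll once per second, give up after a fixed number of seconds) can be sketched in user-space C as follows. This is only an illustration under stated assumptions, not the driver's code: `wait_for_unmap`, `struct fake_unmap`, `fake_unmap_done`, `fake_tick`, and the `RESET_WAIT_*` constants are hypothetical names, and the one-second sleep is replaced by a callback so the logic can be exercised without actually waiting.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constants mirroring the values named in the commit message. */
#define RESET_WAIT_OLD_SEC 5
#define RESET_WAIT_NEW_SEC 30

/* Hypothetical callbacks: "is the unmap finished?" and "wait one second". */
typedef bool (*unmap_done_fn)(void *ctx);
typedef void (*tick_fn)(void *ctx);

/*
 * Poll up to timeout_sec times, one tick between polls.
 * Returns the number of ticks waited on success, or -1 if the
 * unmap is still not finished when the timeout expires.
 */
static int wait_for_unmap(unmap_done_fn done, tick_fn tick, void *ctx,
                          int timeout_sec)
{
    for (int waited = 0; waited < timeout_sec; waited++) {
        if (done(ctx))
            return waited;      /* finished early, stop waiting */
        tick(ctx);              /* stands in for a one-second sleep */
    }
    /* One last check after the final tick. */
    return done(ctx) ? timeout_sec : -1;
}

/* Fake workload: an unmap that needs seconds_left more seconds. */
struct fake_unmap {
    int seconds_left;
};

static bool fake_unmap_done(void *ctx)
{
    const struct fake_unmap *f = ctx;
    return f->seconds_left <= 0;
}

static void fake_tick(void *ctx)
{
    struct fake_unmap *f = ctx;
    f->seconds_left--;          /* one simulated second of progress */
}
```

With a simulated unmap that needs 7 seconds, the 5-second limit expires and the function reports a timeout, while the 30-second limit returns after 7 ticks rather than waiting the full 30, which matches the behavior the commit message describes.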
goya habanalabs: patched cb equals user cb in device memset 2020-03-12 13:00:11 +01:00
include habanalabs: stop using the acronym KMD 2019-09-05 14:55:27 +03:00
asid.c habanalabs: stop using the acronym KMD 2019-09-05 14:55:27 +03:00
command_buffer.c habanalabs: stop using the acronym KMD 2019-09-05 14:55:27 +03:00
command_submission.c habanalabs: rate limit error msg on waiting for CS 2020-01-12 12:21:31 +01:00
context.c habanalabs: rate limit error msg on waiting for CS 2020-01-12 12:21:31 +01:00
debugfs.c habanalabs: replace __cpu_to_le32/64 with cpu_to_le32/64 2019-09-05 14:55:27 +03:00
device.c habanalabs: do not halt CoreSight during hard reset 2020-03-12 13:00:11 +01:00
firmware_if.c habanalabs: fix host memory polling in BE architecture 2019-07-29 11:40:25 +03:00
habanalabs.h habanalabs: increase timeout during reset 2020-06-24 17:50:28 +02:00
habanalabs_drv.c habanalabs: create two char devices per ASIC 2019-09-05 14:55:26 +03:00
habanalabs_ioctl.c habanalabs: add uapi to retrieve aggregate H/W events 2019-09-05 14:55:27 +03:00
hw_queue.c habanalabs: add uapi to retrieve device utilization 2019-09-05 14:55:27 +03:00
hwmon.c habanalabs: display card name as sensors header 2019-09-05 14:55:27 +03:00
irq.c habanalabs: replace __le32_to_cpu with le32_to_cpu 2019-09-05 14:55:27 +03:00
Kconfig treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
Makefile treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
memory.c habanalabs: skip VA block list update in reset flow 2020-01-04 19:18:18 +01:00
mmu.c habanalabs: add WARN in case of bad MMU mapping 2019-05-31 18:25:20 +03:00
pci.c habanalabs: increase PCI ELBI timeout for Palladium 2019-05-13 14:44:50 +03:00
sysfs.c habanalabs: replace __cpu_to_le32/64 with cpu_to_le32/64 2019-09-05 14:55:27 +03:00