drm/amdgpu: Reset RAS error count and status regs

Reset the RAS error count and error status registers after
reading to prevent over reporting error counts on Aldebaran.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-By: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This commit is contained in:
Mukul Joshi 2021-03-24 11:36:33 -04:00 committed by Alex Deucher
parent 5f41741a6d
commit 1f0d8e3781
1 changed files with 6 additions and 0 deletions

View File

@ -501,6 +501,12 @@ static ssize_t amdgpu_ras_sysfs_read(struct device *dev,
if (amdgpu_ras_query_error_status(obj->adev, &info))
return -EINVAL;
if (obj->adev->asic_type == CHIP_ALDEBARAN) {
if (amdgpu_ras_reset_error_status(obj->adev, info.head.block))
DRM_WARN("Failed to reset error counter and error status");
}
return sysfs_emit(buf, "%s: %lu\n%s: %lu\n", "ue", info.ue_count,
"ce", info.ce_count);
}