Commit Graph

29 Commits

Author SHA1 Message Date
Christian König 6f05c4e9d1 drm/amdgpu: move static CSA address to top of address space v2
Move the CSA area to the top of the VA space to avoid clashing with
HMM/ATC in the lower range on GFX9.

v2: wrong sign noticed by Roger, rebase on CSA_VADDR cleanup, handle VA
hole on GFX9 as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-02-19 14:18:48 -05:00
Monk Liu 84e5b5161e drm/amdgpu:free CSA in unified place
instead of doing it in each GFX ip's sw_fini

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-12-06 12:47:51 -05:00
Monk Liu 75bc6099bc drm/amdgpu:read VRAMLOST from gim
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-12-04 16:41:45 -05:00
Monk Liu 13a752e3a2 drm/amdgpu:cleanup in_sriov_reset and lock_reset
since now gpu reset is unified with gpu_recover
for both bare-metal and SR-IOV:

1)rename in_sriov_reset to in_gpu_reset
2)move lock_reset from adev->virt to adev

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-12-04 16:41:31 -05:00
Monk Liu 5740682e66 drm/amdgpu:implement new GPU recover(v3)
1,new imple names amdgpu_gpu_recover which gives more hint
on what it does compared with gpu_reset

2,gpu_recover unify bare-metal and SR-IOV, only the asic reset
part is implemented differently

3,gpu_recover will increase hang job karma and mark its entity/context
as guilty if exceeds limit

V2:

4,in scheduler main routine the job from guilty context  will be immedialy
fake signaled after it poped from queue and its fence be set with
"-ECANCELED" error

5,in scheduler recovery routine all jobs from the guilty entity would be
dropped

6,in run_job() routine the real IB submission would be skipped if @skip parameter
equales true or there was VRAM lost occured.

V3:

7,replace deprecated gpu reset, use new gpu recover

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-12-04 16:41:30 -05:00
pding b636176efd drm/amdgpu/virt: add wait_reset virt ops
Driver can use this interface to check if there's a function level
reset done in hypervisor. It's helpful when IRQ handler for reset
is not ready, or special handling is required.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: pding <Pixel.Ding@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-12-04 16:33:13 -05:00
pding a16f8f11c5 drm/amdgpu/virt: add function to check MMIO (v2)
MMIO space can be blocked on virtualised device. Add this
function to check if MMIO is blocked or not.

Todo: need a reliable method such like communation
with hypervisor.

v2:
 - add comments inline

Signed-off-by: pding <Pixel.Ding@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-12-04 16:33:13 -05:00
Horace Chen 2dc8f81e4f drm/amdgpu: SR-IOV data exchange between PF&VF
SR-IOV need to exchange some data between PF&VF through shared VRAM

PF will copy some necessary firmware and information to the shared
VRAM. It also requires some information from VF. PF will send a
key through mailbox2 to help guest calculate checksum so that it can
verify whether the data is correct.

So check the data on the specified offset of the shared VRAM, if the
checksum is right, read values from it and write some VF information
next to the data from PF.

Signed-off-by: Horace Chen <horace.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-10-19 15:26:59 -04:00
Alex Deucher e23b74aab5 drm/amdgpu: fix vf error handling
The error handling for virtual functions assumed a single
vf per VM and didn't properly account for bare metal.  Make
the error arrays per device and add locking.

Reviewed-by: Gavin Wan <gavin.wan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-28 16:03:20 -04:00
Christian König 0f4b3c6862 drm/amdgpu: cleanup static CSA handling
Move the CSA bo_va from the VM to the fpriv structure.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-08-17 15:46:05 -04:00
Gavin Wan 890419409a drm/amdgpu: Support passing amdgpu critical error to host via GPU Mailbox.
This feature works for SRIOV enviroment. For non-SRIOV enviroment, the
trans_error function does nothing.

The error information includes error_code (16bit), error_flags(16bit)
and error_data(64bit). Since there are not many errors, we keep the
errors in an array and transfer all errors to Host before amdgpu
initialization function (amdgpu_device_init) exit.

Signed-off-by: Gavin Wan <Gavin.Wan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-07-14 11:05:52 -04:00
Monk Liu 7225f8736c drm/amdgpu:use job* to replace voluntary
that way we can know which job cause hang and
can do per sched reset/recovery instead of all
sched.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-05-24 17:40:38 -04:00
Shaoyun Liu cdf6adb28f drm/amdgpu: Move kiq ring lock out of virt structure
The usage of kiq should not depend on the virtualization.

Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by:Andres Rodriquez <andresx7@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-05-24 17:40:12 -04:00
Xiangliang Yu 904cd3891d drm/amdgpu/virt: add two functions for MM table
Add two functions to allocate & free MM table memory.

Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-04-28 17:32:58 -04:00
Xiangliang Yu ecb2b9c698 drm/amdgpu/virt: add structure for MM table
Add new structure for MM table for multi media scheduler of sriov.

Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-03-29 23:55:10 -04:00
Monk Liu 480da26260 drm/amdgpu:use work instead of delay-work
no need to use a delay work since we don't know how
much time hypervisor takes on FLR, so just polling
and waiting in a work.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-03-29 23:53:11 -04:00
Monk Liu 147b5983bb drm/amdgpu:add lock_reset for SRIOV
this lock is used for sriov_gpu_reset, only get this mutex
can run into sriov_gpu_reset.

we have couple source triggers gpu_reset for SRIOV:
1) submit timedout and trigger reset voluntarily
2) invalid instruction detected by ENGINE and trigger reset voluntarily
2) hypervisor found world switch hang and trigger flr and notify guest to
   do reset.

all need take care and we need a mutex to protect the consistency of
reset routine.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-03-29 23:52:46 -04:00
Monk Liu ed17c71b3a drm/amdgpu:change kiq lock name
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-03-29 23:52:45 -04:00
Monk Liu a90ad3c2af drm/amdgpu:implement SRIOV gpu_reset (v2)
implement SRIOV gpu_reset for future use.
it wil be called from:
1) job timeout
2) privl access or instruction error interrupt
3) hypervisor detect VF hang

v2: agd: rebase on upstream

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-03-29 23:52:45 -04:00
Monk Liu ae65a26dd3 drm/amdgpu:add META_DATA struct for CSA/SRIOV v2
META-DATA is used in GFX cmd submit, we have two
format suit for META-DATA-init, one is legacy and another
is for chained-ib preempt, which is used in vulkan
UMD.

v2: drop use CP version number to judge if chain-ib
supports or not, we wait for it mature

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-01-27 11:13:33 -05:00
Xiangliang Yu ab71ac56f6 drm/amdgpu/virt: implement VI virt operation interfaces
VI has asic specific virt support, which including mailbox and
golden registers init.

Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: shaoyunl <Shaoyun.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-01-27 11:13:24 -05:00
Xiangliang Yu 1e9f139279 drm/amdgpu/virt: add high level interfaces for virt
Add high level interfaces that is not relate to specific asic. So
asic files just need to implement the interfaces to support
virtualization.

Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-01-27 11:13:23 -05:00
Xiangliang Yu bc992ba5a3 drm/amdgpu/virt: use kiq to access registers (v2)
For virtualization, it is must for driver to use KIQ to access
registers when it is out of GPU full access mode.

v2: agd: rebase

Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-01-27 11:13:23 -05:00
Xiangliang Yu 5ec9f06e10 drm/amdgpu/virt: add runtime flag
Add new flag to define gpu runtime that is out of full gpu access.

Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-01-27 11:13:22 -05:00
Xiangliang Yu 880e87e380 drm/amdgpu/gfx8: implement emit_rreg/wreg function
Implement emit_rreg/wreg function for kiq ring.

Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-01-27 11:13:21 -05:00
Monk Liu 4e4bbe7343 drm/amdgpu:add new file for SRIOV
for SRIOV usage, CSA is only used per device and each
VM will map on it.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-01-27 11:13:19 -05:00
Monk Liu bd7de27d81 drm/amdgpu:new field members for SRIOV
and implement CSA functions in this file

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-01-27 11:13:19 -05:00
Xiangliang Yu 5a5099cbf4 drm/amdgpu/virt: rename fieldes of virtualization structure
Use acronym to rename fields to make easy to spell out.

Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-01-27 11:13:04 -05:00
Monk Liu ceeb50ed77 drm/amdgpu:cleanup virt related define
move virtual machine related structure to amdgpu_virt.h
easy for developer to maintain for virualization stuffs

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2016-09-22 10:24:20 -04:00