Commit graph

151 commits

Author SHA1 Message Date
Dave Airlie
40d47c5fb4 amd-drm-next-6.9-2024-02-19:
amdgpu:
 - ATHUB 4.1 support
 - EEPROM support updates
 - RAS updates
 - LSDMA 7.0 support
 - JPEG DPG support
 - IH 7.0 support
 - HDP 7.0 support
 - VCN 5.0 support
 - Misc display fixes
 - Retimer fixes
 - DCN 3.5 fixes
 - VCN 4.x fixes
 - PSR fixes
 - PSP 14.0 support
 - VA_RESERVED cleanup
 - SMU 13.0.6 updates
 - NBIO 7.11 updates
 - SDMA 6.1 updates
 - MMHUB 3.3 updates
 - Suspend/resume fixes
 - DMUB updates
 
 amdkfd:
 - Trap handler enhancements
 - Fix cache size reporting
 - Relocate the trap handler
 
 radeon:
 - fix typo in print statement
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZdPLUgAKCRC93/aFa7yZ
 2AakAQDmhlQkJAxIxJw4/5mEQY5zaMJ033lcZGzBQbj8uL42pQD/aQ/gdN/bOfPZ
 gsdidzgL5MThBOFfw72pBkEoE+kQXgc=
 =oP2J
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.9-2024-02-19' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.9-2024-02-19:

amdgpu:
- ATHUB 4.1 support
- EEPROM support updates
- RAS updates
- LSDMA 7.0 support
- JPEG DPG support
- IH 7.0 support
- HDP 7.0 support
- VCN 5.0 support
- Misc display fixes
- Retimer fixes
- DCN 3.5 fixes
- VCN 4.x fixes
- PSR fixes
- PSP 14.0 support
- VA_RESERVED cleanup
- SMU 13.0.6 updates
- NBIO 7.11 updates
- SDMA 6.1 updates
- MMHUB 3.3 updates
- Suspend/resume fixes
- DMUB updates

amdkfd:
- Trap handler enhancements
- Fix cache size reporting
- Relocate the trap handler

radeon:
- fix typo in print statement

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240219214810.4911-1-alexander.deucher@amd.com
2024-02-22 13:21:19 +10:00
Mario Limonciello
ce311df91d Revert "drm/amd: flush any delayed gfxoff on suspend entry"
commit ab4750332d ("drm/amdgpu/sdma5.2: add begin/end_use ring
callbacks") caused GFXOFF control to be used more heavily and the
codepath that was removed from commit 0dee726395 ("drm/amd: flush any
delayed gfxoff on suspend entry") now can be exercised at suspend again.

Users report that by using GNOME to suspend the lockscreen trigger will
cause SDMA traffic and the system can deadlock.

This reverts commit 0dee726395.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Fixes: ab4750332d ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-13 08:59:50 -05:00
Dave Airlie
b344e64fbd amd-drm-next-6.9-2024-02-09:
amdgpu:
 - Validate DMABuf imports in compute VMs
 - Add RAS ACA framework
 - PSP 13 fixes
 - Misc code cleanups
 - Replay fixes
 - Atom interpretor PS, WS bounds checking
 - DML2 fixes
 - Audio fixes
 - DCN 3.5 Z state fixes
 - Remove deprecated ida_simple usage
 - UBSAN fixes
 - RAS fixes
 - Enable seq64 infrastructure
 - DC color block enablement
 - Documentation updates
 - DC documentation updates
 - DMCUB updates
 - S3 fixes
 - VCN 4.0.5 fixes
 - DP MST fixes
 - SR-IOV fixes
 
 amdkfd:
 - Validate DMABuf imports in compute VMs
 - SVM fixes
 - Trap handler updates
 
 radeon:
 - Atom interpretor PS, WS bounds checking
 - Misc code cleanups
 
 UAPI:
 - Bump KFD version so UMDs know that the fixes that enable the management of
   VA mappings in compute VMs using the GEM_VA ioctl for DMABufs exported from KFD are present
 - Add INFO query for input power.  This matches the existing INFO query for average
   power.  Used in gaming HUDs, etc.
   Example userspace: https://github.com/Umio-Yasuno/libdrm-amdgpu-sys-rs/tree/input_power
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZcaM8gAKCRC93/aFa7yZ
 2L64AP9S8Wh5T2dEm3Nr8zBR008KdFQyOGVoO4qwlmyJMgin3wEA57gHiUrvs3o7
 HRR+PU4JMo4OxQZNpVQtYYHc1BL6nQU=
 =3AqF
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.9-2024-02-09' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.9-2024-02-09:

amdgpu:
- Validate DMABuf imports in compute VMs
- Add RAS ACA framework
- PSP 13 fixes
- Misc code cleanups
- Replay fixes
- Atom interpretor PS, WS bounds checking
- DML2 fixes
- Audio fixes
- DCN 3.5 Z state fixes
- Remove deprecated ida_simple usage
- UBSAN fixes
- RAS fixes
- Enable seq64 infrastructure
- DC color block enablement
- Documentation updates
- DC documentation updates
- DMCUB updates
- S3 fixes
- VCN 4.0.5 fixes
- DP MST fixes
- SR-IOV fixes

amdkfd:
- Validate DMABuf imports in compute VMs
- SVM fixes
- Trap handler updates

radeon:
- Atom interpretor PS, WS bounds checking
- Misc code cleanups

UAPI:
- Bump KFD version so UMDs know that the fixes that enable the management of
  VA mappings in compute VMs using the GEM_VA ioctl for DMABufs exported from KFD are present
- Add INFO query for input power.  This matches the existing INFO query for average
  power.  Used in gaming HUDs, etc.
  Example userspace: https://github.com/Umio-Yasuno/libdrm-amdgpu-sys-rs/tree/input_power

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240209221459.5453-1-alexander.deucher@amd.com
2024-02-13 11:32:23 +10:00
Jani Nikula
345a36c4f1 drm/amdgpu: prefer snprintf over sprintf
This will trade the W=1 warning -Wformat-overflow to
-Wformat-truncation. This lets us enable -Wformat-overflow subsystem
wide.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Pan, Xinhui <Xinhui.Pan@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/fea7a52924f98b1ac24f4a7e6ba21d7754422430.1704908087.git.jani.nikula@intel.com
2024-01-31 11:04:25 +02:00
Srinivasan Shanmugam
be91a828d0 drm/amdgpu: Cleanup inconsistent indenting in 'amdgpu_gfx_enable_kcq()'
Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:645 amdgpu_gfx_enable_kcq() warn: inconsistent indenting

Cc: Le Ma <Le.Ma@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22 17:13:26 -05:00
Victor Lu
85150626ea drm/amdgpu: Use correct KIQ MEC engine for gfx9.4.3 (v5)
amdgpu_kiq_wreg/rreg is hardcoded to use MEC engine 0.

Add an xcc_id parameter to amdgpu_kiq_wreg/rreg, define W/RREG32_XCC
and amdgpu_device_xcc_wreg/rreg to use the new xcc_id parameter.

Using amdgpu_sriov_runtime to determine whether to access via kiq or
RLC is sufficient for now.

v5: add condition in amdgpu_device_xcc_w/rreg, remove trace func call

v4: avoid using amdgpu_sriov_w/rreg

v3: use W/RREG32_XCC to handle non-kiq case

v2: define amdgpu_device_xcc_wreg/rreg instead of changing parameters
    of amdgpu_device_wreg/rreg

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 17:03:07 -05:00
Alex Deucher
ba0fb4b48c drm/amdgpu: don't put MQDs in VRAM on ARM | ARM64
Issues were reported with commit 1cfb4d6121
("drm/amdgpu: put MQDs in VRAM") on an ADLINK Ampere
Altra Developer Platform (AVA developer platform).

Various ARM systems seem to have problems related
to PCIe and MMIO access.  In this case, I'm not sure
if this is specific to the ADLINK platform or ARM
in general.  Seems to be some coherency issue with
VRAM.  For now, just don't put MQDs in VRAM on ARM.

Link: https://lists.freedesktop.org/archives/amd-gfx/2023-October/100453.html
Fixes: 1cfb4d6121 ("drm/amdgpu: put MQDs in VRAM")
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: alexey.klimov@linaro.org
2023-11-03 11:59:51 -04:00
Stanley.Yang
b1338a8e71 drm/amdgpu: Workaround to skip kiq ring test during ras gpu recovery
This is workaround, kiq ring test failed in suspend stage when do ras
recovery.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-19 18:26:52 -04:00
Lijo Lazar
4e8303cf2c drm/amdgpu: Use function for IP version check
Use an inline function for version check. Gives more flexibility to
handle any format changes.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20 12:23:28 -04:00
Mario Limonciello
0dee726395 drm/amd: flush any delayed gfxoff on suspend entry
DCN 3.1.4 is reported to hang on s2idle entry if graphics activity
is happening during entry.  This is because GFXOFF was scheduled as
delayed but RLC gets disabled in s2idle entry sequence which will
hang GFX IP if not already in GFXOFF.

To help this problem, flush any delayed work for GFXOFF early in
s2idle entry sequence to ensure that it's off when RLC is changed.

commit 4b31b92b14 ("drm/amdgpu: complete gfxoff allow signal during
suspend without delay") modified power gating flow so that if called
in s0ix that it ensured that GFXOFF wasn't put in work queue but
instead processed immediately.

This is dead code due to commit 10cb67eb8a ("drm/amdgpu: skip
CG/PG for gfx during S0ix") because GFXOFF will now not be explicitly
called as part of the suspend entry code.  Remove that dead code.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-16 11:35:14 -04:00
Srinivasan Shanmugam
50fbe0cc95 drm/amdgpu: Add -ENOMEM error handling when there is no memory
Return -ENOMEM, when there is no sufficient dynamically allocated memory

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25 13:47:26 -04:00
Srinivasan Shanmugam
37c3fc6620 drm/amdgpu: Return -ENOMEM when there is no memory in 'amdgpu_gfx_mqd_sw_init'
Return -ENOMEM, when there is no sufficient dynamically allocated memory
to create MQD backup for ring

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25 13:36:16 -04:00
Srinivasan Shanmugam
8cce16826f drm/amdgpu: Fix unused variable in amdgpu_gfx.c
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c: In function ‘amdgpu_gfx_disable_kcq’:
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:497:6: warning: variable ‘j’ set but not used [-Wunused-but-set-variable]
  497 |  int j;
      |      ^
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c: In function ‘amdgpu_gfx_disable_kgq’:
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:528:6: warning: variable ‘j’ set but not used [-Wunused-but-set-variable]
  528 |  int j;
      |      ^
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c: In function ‘amdgpu_gfx_enable_kgq’:
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:630:12: warning: variable ‘j’ set but not used [-Wunused-but-set-variable]
  630 |  int r, i, j;
      |

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:01:10 -04:00
Shiwu Zhang
e825fb641b drm/amdgpu: fix the memory override in kiq ring struct
This is introduced by the code merge and will let the
adev->gfx.kiq[0].ring struct being overrided

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:41:04 -04:00
Jack Xiao
e602157ec0 drm/amdgpu: fix S3 issue if MQD in VRAM
1. Need flush HDP for MQD putting in vram
2. Zero out mes MQD

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:38:33 -04:00
Tao Zhou
d78c71321e drm/amdgpu: add GFX RAS common function
The common function can help reduce redundant code.

v2: remove xcp operation, only need to do RAS operations for all
instances.
v3: remove check for GFX RAS support, will be checked in higher level.
    add amdgpu prefix for the function name.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:37:26 -04:00
Lijo Lazar
f9632096be drm/amdgpu: Add compute mode descriptor function
Keep a helper function to get description of compute partition mode.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:57:48 -04:00
Lijo Lazar
b6f90baafe drm/amdgpu: Move memory partition query to gmc
GMC block handles memory related information, it makes more sense to
keep memory partition functions in gmc block.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:56:50 -04:00
Lijo Lazar
ded7d99eb5 drm/amdgpu: Add flags for partition mode query
It's not required to take lock on all cases while querying partition
mode. Querying partition mode during KFD init process doesn't need to
take a lock. Init process after a switch will already be happening under
lock. Control the behaviour by adding flags to xcp_query_partition_mode.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:55:29 -04:00
Lijo Lazar
233bb3733b drm/amdgpu: Use unique doorbell range per xcc
Program different ranges in each XCC with MEC_DOORBELL_RANGE_LOWER/HIGHER.
Keeping the same range causes CPF in other XCCs also to be busy when an IB
packet is submitted to KCQ. Only the XCC which processes the packet
comes back to idle afterwards and this causes other CPs not be idle.
This in turn affects clockgating behavior as RLC doesn't get idle
interrupt.

LOWER/HIGHER covers only KIQ/KCQs which are per XCC queues. Assigning
different ranges doesn't seem to have any side effect as user queue ranges
are outside of this range. User queue tests - PM4 through KFD and AQL
through rocr - have the same results after this change.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:52:06 -04:00
Lijo Lazar
8e7fd19380 drm/amdgpu: Switch to SOC partition funcs
For GFXv9.4.3, use SOC level partition switch implementation rather than
keeping them at GFX IP level. Change the exisiting implementation in
GFX IP for keeping partition mode and restrict it to only GFX related
switch.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:49:42 -04:00
Shiwu Zhang
993d218f82 drm/amdgpu: remove partition attributes sys file for gfx_v9_4_3
For driver de-init like rmmod operations those partition specific
attributes need to be removed accordingly.

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:48:59 -04:00
Shiwu Zhang
37dd9d58a5 drm/amdgpu: fix kcq mqd_backup buffer double free for multi-XCD
For gfx_v9_4_3 and beyond, struct kiq has its own mqd_backup pointer
rather than using the last pointer from mec struct. Then the kfree
operation on the pointer from the mec struct should be removed otherwise
it will cause double free on the first kcq's mqd_backup buffer on XCD1.

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:48:52 -04:00
Rajneesh Bhardwaj
ea2d2f8ece drm/amdgpu: detect current GPU memory partition mode
- Add helpers to detect the current GPU memory partition.
 - Add current memory partition mode sysfs node.

Tested-by: Ori Messinger <Ori.Messinger@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:48:42 -04:00
Mukul Joshi
cb30544e3c drm/amdgpu: Fix failure when switching to DPX mode
Fix the if condition which causes dynamic repartitioning
to fail when trying to switch to DPX mode.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:46:54 -04:00
Mukul Joshi
0c7315e7d5 drm/amdkfd: Add device repartition support
GFX9.4.3 will support dynamic repartitioning of the GPU through sysfs.
Add device repartitioning support in KFD to repartition GPU from one
mode to other.

v2: squash in fix ("drm/amdkfd: Fix warning kgd2kfd_unlock_kfd defined but not used")

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:45:38 -04:00
Lijo Lazar
8078f1c610 drm/amdgpu: Change num_xcd to xcc_mask
Instead of number of XCCs, keep a mask of XCCs for the exact XCCs
available on the ASIC. XCC configuration could differ based on
different ASIC configs.

v2:
	Rename num_xcd to num_xcc (Hawking)
	Use smaller xcc_mask size, changed to u16 (Le)

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:44:38 -04:00
Le Ma
20bedf1379 drm/amdgpu: introduce new doorbell assignment table for GC 9.4.3
Four basic reasons as below to do the change:
  1. number of ring expand a lot on GC 9.4.3, and adjustment on old
     assignment cannot make each ring in a continuous doorbell space.
  2. the SDMA doorbell index should not exceed 0x1FF on SDMA 4.2.2 due to
     regDOORBELLx_CTRL_ENTRY.BIF_DOORBELLx_RANGE_OFFSET_ENTRY field width.
  3. re-design the doorbell assignment and unify the calculation as
     "start + ring/inst id" will make the code much concise.
  4. only defining the START/END makes the table look simple

v2: (Lijo)
  1. replace name
  2. use num_inst_per_aid/sdma_doorbell_range instead of hardcoding

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:44:03 -04:00
Le Ma
feb36dd014 drm/amdgpu: convert the doorbell_index to 2 dwords offset for kiq
KIQ doorbell_index is non-zero from XCC1, thus need to left-shift it like
other rings.

Signed-off-by: Le Ma <le.ma@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:43:56 -04:00
Shiwu Zhang
147862d00b drm/amdgpu: enable the ring and IB test for slave kcq
With the mec FW update to utilize the mqd base set by
driver for kcq mapping, slave kcq ring test and IB test
can be re-enabled.

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:42:08 -04:00
Le Ma
98a54e88e8 drm/amdgpu: add sysfs node for compute partition mode
Add current/available compute partitin mode sysfs node.

v2: make the sysfs node as IP independent one in amdgpu_gfx.c

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:40:25 -04:00
Le Ma
3566938b34 drm/amdgpu: assign different AMDGPU_GFXHUB for rings on each xcc
Pass the xcc_id to AMDGPU_GFXHUB(x)

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:40:22 -04:00
Hawking Zhang
f4caf58426 drm/amdgpu: introduce vmhub definition for multi-partition cases (v3)
v1: Each partition has its own gfxhub or mmhub. adjust
the num of MAX_VMHUBS and the GFXHUB/MMHUB layout (Le)

v2: re-design the AMDGPU_GFXHUB/AMDGPU_MMHUB layout (Le)

v3: apply the gfxhub/mmhub layout to new IPs (Hawking)

v4: fix up gmc11 (Alex)

v5: rebase (Alex)

Signed-off-by: Le Ma <le.ma@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:40:03 -04:00
Dan Carpenter
3fb9dd5fef drm/amdgpu: release correct lock in amdgpu_gfx_enable_kgq()
This function was releasing the incorrect lock on the error path.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 1156e1a60f ("drm/amdgpu: add [en/dis]able_kgq() functions")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:36:53 -04:00
Horatio Zhang
2f48965bdc drm/amdgpu: drop gfx_v11_0_cp_ecc_error_irq_funcs
The gfx.cp_ecc_error_irq is retired in gfx11. In gfx_v11_0_hw_fini still
use amdgpu_irq_put to disable this interrupt, which caused the call trace
in this function.

[  102.873958] Call Trace:
[  102.873959]  <TASK>
[  102.873961]  gfx_v11_0_hw_fini+0x23/0x1e0 [amdgpu]
[  102.874019]  gfx_v11_0_suspend+0xe/0x20 [amdgpu]
[  102.874072]  amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu]
[  102.874122]  amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu]
[  102.874172]  amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu]
[  102.874223]  amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu]
[  102.874321]  amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu]
[  102.874375]  process_one_work+0x21f/0x3f0
[  102.874377]  worker_thread+0x200/0x3e0
[  102.874378]  ? process_one_work+0x3f0/0x3f0
[  102.874379]  kthread+0xfd/0x130
[  102.874380]  ? kthread_complete_and_exit+0x20/0x20
[  102.874381]  ret_from_fork+0x22/0x30

v2:
- Handle umc and gfx ras cases in separated patch
- Retired the gfx_v11_0_cp_ecc_error_irq_funcs in gfx11

v3:
- Improve the subject and code comments
- Add judgment on gfx11 in the function of amdgpu_gfx_ras_late_init

v4:
- Drop the define of CP_ME1_PIPE_INST_ADDR_INTERVAL and
SET_ECC_ME_PIPE_STATE which using in gfx_v11_0_set_cp_ecc_error_state
- Check cp_ecc_error_irq.funcs rather than ip version for a more
sustainable life

v5:
- Simplify judgment conditions

Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:35:20 -04:00
Dan Carpenter
7a6a2e59aa drm/amdgpu: unlock the correct lock in amdgpu_gfx_enable_kcq()
We changed which lock we are supposed to take but this error path
was accidentally over looked so it still drops the old lock.

Fixes: def799c659 ("drm/amdgpu: add multi-xcc support to amdgpu_gfx interfaces (v4)")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:31:46 -04:00
Alex Deucher
1cfb4d6121 drm/amdgpu: put MQDs in VRAM
Reduces preemption latency.
Only enable this for gfx10 and 11 for now
to avoid changing behavior on gfx 8 and 9.

v2: move MES MQDs into VRAM as well (YuBiao)
v3: enable on gfx10, 11 only (Alex)
v4: minor style changes, document why gfx10/11 only (Alex)

Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:30:58 -04:00
Alex Deucher
1156e1a60f drm/amdgpu: add [en/dis]able_kgq() functions
To replace the IP specific variants which are largely
duplicate.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:28:28 -04:00
Guchun Chen
34305ac364 drm/amdgpu: check correct allocated mqd_backup object after alloc
Instead of the default one, check the right mqd_backup object.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Cc: Le Ma <le.ma@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:24:07 -04:00
Guchun Chen
7a4685cdfb drm/amdgpu: fix a build warning by a typo in amdgpu_gfx.c
This should be a typo when intruducing multi-xx support.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Cc: Le Ma <le.ma@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:24:02 -04:00
Alex Deucher
b185c31847 drm/amdgpu: track MQD size for gfx and compute
It varies by generation and we need to know the size
to expose this via debugfs.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-24 18:36:45 -04:00
Le Ma
47659738fb drm/amdgpu: allocate doorbell index for multi-die case
Allocate different doorbell index for kiq/kcq rings
on each die

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-21 08:49:37 -04:00
Le Ma
66daccde42 drm/amdgpu: add master/slave check in init phase
Skip KCQ setup on slave xcc as there's no use case.

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-20 15:43:27 -04:00
Le Ma
def799c659 drm/amdgpu: add multi-xcc support to amdgpu_gfx interfaces (v4)
v1: Modify kiq_init/fini, mqd_sw_init/fini and
enable/disable_kcq to adapt to multi-die case.
Pass 0 as default to all asics with single xcc (Le)
v2: squash commits to avoid breaking the build (Le)
v3: unify naming style (Le)
v4: apply the changes to gc v11_0 (Hawking)

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-18 16:28:55 -04:00
Le Ma
be697aa3a7 drm/amdgpu: move queue_bitmap to an independent structure (v3)
To allocate independent queue_bitmap for each XCD,
then the old bitmap policy can be continued to use
with a clear logic.

Use mec_bitmap[0] as default for all non-GC 9.4.3 IPs.

v2: squash commits to avoid breaking the build
v3: unify naming style

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-18 16:28:54 -04:00
Le Ma
277bd3371f drm/amdgpu: convert gfx.kiq to array type (v3)
v1: more kiq instances are a available in SOC (Le)
v2: squash commits to avoid breaking the build (Le)
v3: make the conversion for gfx/mec v11_0 (Hawking)

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-18 16:28:54 -04:00
Le Ma
0530553ba8 drm/amdgpu: move vmhub out of amdgpu_ring_funcs (v4)
It looks better to place this field in ring
structure. Also drop the repeated ring funcs definitions
if there's no difference except for vmhub field.

v2: rename the field to vm_hub like others (Le)
v3: apply the changes to new ip blocks (Hawking)
v4: fix vcn sw ring (Alex)

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-14 13:47:49 -04:00
Hawking Zhang
af8312a38f drm/amdgpu: Correct gfx ras_late_init callback
Use default gfx ras_late_init callback for gfx
ras block.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-13 17:27:48 -04:00
Lang Yu
25959dd67d drm/amdgpu: allow multipipe policy on ASICs with one MEC
Always enable multipipe policy on ASICs with GC VERSION > 9.0.0
instead of MEC number > 1.

This will allow multipipe policy on ASICs with one MEC,
e.g., gfx11 APUs.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-18 22:48:49 -05:00
YiPeng Chai
ac7b25d92c drm/amdgpu: Perform gpu reset after gfx finishes processing ras poison consumption on gfx_v11_0_3
Perform gpu reset after gfx finishes processing
ras poison consumption on gfx_v11_0_3.

V2:
 Move gfx poison consumption handler from hw_ops to ip
 function level.

V3:
 Adjust the calling position of amdgpu_gfx_poison_consumation_handler.

V4:
   Since gfx v11_0_3 does not have .hw_ops instance, the .hw_ops null
 pointer check in amdgpu_ras_interrupt_poison_consumption_handler
 needs to be adjusted.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-17 16:11:51 -05:00