Commit Graph

665 Commits

Author SHA1 Message Date
Jack Zhang 86b93fd62d drm/amdgpu/sriov Don't send msg when smu suspend
For sriov and pp_onevf_mode, do not send message to set smu
status, because smu doesn't support these messages under VF.

Besides, it should skip smu_suspend when pp_onevf_mode is disabled.

Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-07 11:44:24 -05:00
Alex Deucher 2cb44fb093 drm/amdgpu: enable GPU reset by default on renoir
Everything is in place.

Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-27 16:46:45 -05:00
Alex Deucher 658c663947 drm/amdgpu: enable GPU reset by default on Navi
Has been working fine for a while.

Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-27 16:46:45 -05:00
Nirmoy Das a9d4fe2fd6 drm/amdgpu: remove unnecessary conversion to bool
Better clean that up before some automation starts to complain about it

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-22 16:55:27 -05:00
chen gong c68dbcd8f9 drm/amdgpu: add kiq version interface for RREG32/WREG32
Reading some registers by mmio will result in hang when GPU is in
"gfxoff" state.This problem can be solved by GPU in "ring command
packages" way.

Signed-off-by: chen gong <curry.gong@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-22 16:34:22 -05:00
chen gong d33a99c4b6 drm/amdgpu: provide a generic function interface for reading/writing register by KIQ
Move amdgpu_virt_kiq_rreg/amdgpu_virt_kiq_wreg function to amdgpu_gfx.c,
and rename them to amdgpu_kiq_rreg/amdgpu_kiq_wreg.Make it generic and
flexible.

Signed-off-by: chen gong <curry.gong@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-22 16:34:14 -05:00
Pan, Xinhui bd05221123 drm/amdgpu: add the lost mutex_init back
Initialize notifier_lock.

Bug: https://gitlab.freedesktop.org/drm/amd/issues/1016
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-17 16:03:09 -05:00
Hawking Zhang e9d4cf918f drm/amdgpu: add arcturus to gpu recovery check code path
support check if dirver should try gpu recovery for
arcturus

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-16 13:40:54 -05:00
Evan Quan 9530273ec9 drm/amd/powerplay: cover the powerplay implementation details V3
This can save users much troubles. As they do not
actually need to care whether swSMU or traditional
powerplay routine should be used.

V2: apply the fixes to vi.c and cik.c also
V3: squash in oops fix

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-14 10:18:08 -05:00
Jack Zhang 895bd048fb amd/amdgpu/sriov tdr enablement with pp_onevf_mode
Under sriov and pp_onevf mode,
1.take resume instead of hw_init for smc recover to avoid
potential memory leak.

2.add return condition inside smc resume function for
sriov_pp_onevf_mode and pm_enabled param.

Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-07 11:58:19 -05:00
zhengbin 2a9b90ae47 drm/amdgpu: use true, false for bool variable in amdgpu_device.c
Fixes coccicheck warning:

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3961:1-19: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3981:1-19: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-23 15:00:01 -05:00
Ma Feng e3c00faa7a drm/amdgpu: Remove unneeded variable 'ret' in amdgpu_device.c
Fixes coccicheck warning:

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1036:5-8: Unneeded variable: "ret". Return "0" on line 1079

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Ma Feng <mafeng.ma@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-23 14:58:56 -05:00
Jane Jian d83c7a07a7 drm/amdgpu: update VCN1(dual instances) fw types ID and VCN ip block type
Previously there is no VCN1 type ID in psp gfx interface. Also add VCN ip
block type unless the reinit after FLR for sriov would fail.

Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-18 16:33:26 -05:00
Andrey Grodzovsky c96cf2823d drm/amdgpu: Switch from system_highpri_wq to system_unbound_wq
This is to avoid queueing jobs to same CPU during XGMI hive reset
because there is a strict timeline for when the reset commands
must reach all the GPUs in the hive.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-18 16:09:13 -05:00
Andrey Grodzovsky c6a6e2db99 drm/amdgpu: Redo XGMI reset synchronization.
Use task barrier in XGMI hive to synchronize ASIC resets
across devices in XGMI hive.

v2: Return right away with a warning if no xgmi hive, update doc.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-18 16:09:13 -05:00
Andrey Grodzovsky 041a62bc06 drm/amdgpu: reverts commit ce316fa55e.
In preparation for doing XGMI reset synchronization using task barrier.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-18 16:09:12 -05:00
Monk Liu 5a7489a7e1 drm/amdgpu: fix KIQ ring test fail in TDR of SRIOV
issues:
MEC is ruined by the amdkfd_pre_reset after VF FLR done

fix:
amdkfd_pre_reset() would ruin MEC after hypervisor finished the VF FLR,
the correct sequence is do amdkfd_pre_reset before VF FLR but there is
a limitation to block this sequence:
if we do pre_reset() before VF FLR, it would go KIQ way to do register
access and stuck there, because KIQ probably won't work by that time
(e.g. you already made GFX hang)

so the best way right now is to simply remove it.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-18 16:09:12 -05:00
Nirmoy Das f880799d7f amd/amdgpu: add sched array to IPs with multiple run-queues
This sched array can be passed on to entity creation routine
instead of manually creating such sched array on every context creation.

v2: squash in missing break fix

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-18 16:09:12 -05:00
Nirmoy Das 0c88b43032 drm/amdgpu: replace vm_pte's run-queue list with drm gpu scheds list
drm_sched_entity_init() takes drm gpu scheduler list instead of
drm_sched_rq list. This makes conversion of drm_sched_rq list
to drm gpu scheduler list unnecessary

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-18 16:09:12 -05:00
Emily Deng 8973d9ec8f drm/amdgpu/sriov: Tonga sriov also need load firmware with smu
Fix Tonga sriov load driver fail issue.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewd-by Yintian Tao <Yintian.tao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-18 16:09:06 -05:00
Yong Zhao d7f72fe482 drm/amdgpu: Add CU info print log
The log will be useful for easily getting the CU info on various
emulation models or ASICs.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-18 16:09:06 -05:00
Simon Ser 93b09a9a89 drm/amdgpu: log when amdgpu.dc=1 but ASIC is unsupported
This makes it easier to figure out whether the kernel parameter has been
taken into account.

Signed-off-by: Simon Ser <contact@emersion.fr>
Cc: Harry Wentland <hwentlan@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-11 15:22:08 -05:00
Yintian Tao c9ffa427db drm/amd/powerplay: enable pp one vf mode for vega10
Originally, due to the restriction from PSP and SMU, VF has
to send message to hypervisor driver to handle powerplay
change which is complicated and redundant. Currently, SMU
and PSP can support VF to directly handle powerplay
change by itself. Therefore, the old code about the handshake
between VF and PF to handle powerplay will be removed and VF
will use new the registers below to handshake with SMU.
mmMP1_SMN_C2PMSG_101: register to handle SMU message
mmMP1_SMN_C2PMSG_102: register to handle SMU parameter
mmMP1_SMN_C2PMSG_103: register to handle SMU response

v2: remove module parameter pp_one_vf
v3: fix the parens
v4: forbid vf to change smu feature
v5: use hwmon_attributes_visible to skip sepicified hwmon atrribute
v6: change skip condition at vega10_copy_table_to_smc

Signed-off-by: Yintian Tao <yttao@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-11 15:22:07 -05:00
Le Ma 00eaa57172 drm/amdgpu: clear err_event_athub flag after reset exit
Otherwise next err_event_athub error cannot call gpu reset. And following
resume sequence will not be affected by this flag.

v2: create function to clear amdgpu_ras_in_intr for modularity of ras driver

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-05 16:26:11 -05:00
Le Ma b823821f22 drm/amdgpu: support full gpu reset workflow when ras err_event_athub occurs
This athub fatal error can be recovered by baco without system-level reboot,
so add a mode to use baco for the recovery. Not affect the default psp reset
situations for now.

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-05 16:26:03 -05:00
Le Ma ce316fa55e drm/amdgpu: add concurrent baco reset support for XGMI
Currently each XGMI node reset wq does not run in parrallel if bound to same
cpu. Make change to bound the xgmi_reset_work item to different cpus.

XGMI requires all nodes enter into baco within very close proximity before
any node exit baco. So schedule the xgmi_reset_work wq twice for enter/exit
baco respectively.

To use baco for XGMI, PMFW supported for baco on XGMI needs to be involved.

The case that PSP reset and baco reset coexist within an XGMI hive never exist
and is not in the consideration.

v2: define use_baco flag to simplify the code for xgmi baco sequence

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-05 16:25:53 -05:00
Le Ma 7a22677b95 drm/amdgpu: enable/disable doorbell interrupt in baco entry/exit helper
This operation is needed when baco entry/exit for ras recovery

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-05 16:25:45 -05:00
Emily Deng 0ea203a912 drm/amdgpu/sriov: No need the event 3 and 4 now
As will call unload kms when initialize fail, and the unload kms will
send event 3 and 4, so don't need event 3 and 4 in device init.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Zhan Liu <zhan.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-02 17:39:28 -05:00
Yintian Tao 7c868b592d drm/amdgpu: not remove sysfs if not create sysfs
When load amdgpu failed before create pm_sysfs and ucode_sysfs,
the pm_sysfs and ucode_sysfs should not be removed.
Otherwise, there will be warning call trace just like below.
[   24.836386] [drm] VCE initialized successfully.
[   24.841352] amdgpu 0000:00:07.0: amdgpu_device_ip_init failed
[   25.370383] amdgpu 0000:00:07.0: Fatal error during GPU init
[   25.889575] [drm] amdgpu: finishing device.
[   26.069128] amdgpu 0000:00:07.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[   26.070110] [drm:gfx_v9_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
[   26.200309] [TTM] Finalizing pool allocator
[   26.200314] [TTM] Finalizing DMA pool allocator
[   26.200349] [TTM] Zone  kernel: Used memory at exit: 0 KiB
[   26.200351] [TTM] Zone   dma32: Used memory at exit: 0 KiB
[   26.200353] [drm] amdgpu: ttm finalized
[   26.205329] ------------[ cut here ]------------
[   26.205330] sysfs group 'fw_version' not found for kobject '0000:00:07.0'
[   26.205347] WARNING: CPU: 0 PID: 1228 at fs/sysfs/group.c:256 sysfs_remove_group+0x80/0x90
[   26.205348] Modules linked in: amdgpu(OE+) gpu_sched(OE) ttm(OE) drm_kms_helper(OE) drm(OE) i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache binfmt_misc snd_hda_codec_generic ledtrig_audio crct10dif_pclmul snd_hda_intel crc32_pclmul snd_hda_codec ghash_clmulni_intel snd_hda_core snd_hwdep snd_pcm snd_timer input_leds snd joydev soundcore serio_raw pcspkr evbug aesni_intel aes_x86_64 crypto_simd cryptd mac_hid glue_helper sunrpc ip_tables x_tables autofs4 8139too psmouse 8139cp mii i2c_piix4 pata_acpi floppy
[   26.205369] CPU: 0 PID: 1228 Comm: modprobe Tainted: G           OE     5.2.0-rc1 #1
[   26.205370] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[   26.205372] RIP: 0010:sysfs_remove_group+0x80/0x90
[   26.205374] Code: e8 35 b9 ff ff 5b 41 5c 41 5d 5d c3 48 89 df e8 f6 b5 ff ff eb c6 49 8b 55 00 49 8b 34 24 48 c7 c7 48 7a 70 98 e8 60 63 d3 ff <0f> 0b eb d7 66 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
[   26.205375] RSP: 0018:ffffbee242b0b908 EFLAGS: 00010282
[   26.205376] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
[   26.205377] RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff97ad6f817380
[   26.205377] RBP: ffffbee242b0b920 R08: ffffffff98f520c4 R09: 00000000000002b3
[   26.205378] R10: ffffbee242b0b8f8 R11: 00000000000002b3 R12: ffffffffc0e58240
[   26.205379] R13: ffff97ad6d1fe0b0 R14: ffff97ad4db954c8 R15: ffff97ad4db7fff0
[   26.205380] FS:  00007ff3d8a1c4c0(0000) GS:ffff97ad6f800000(0000) knlGS:0000000000000000
[   26.205381] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   26.205381] CR2: 00007f9b2ef1df04 CR3: 000000042aab8001 CR4: 00000000003606f0
[   26.205384] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   26.205385] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   26.205385] Call Trace:
[   26.205461]  amdgpu_ucode_sysfs_fini+0x18/0x20 [amdgpu]
[   26.205518]  amdgpu_device_fini+0x3b4/0x560 [amdgpu]
[   26.205573]  amdgpu_driver_unload_kms+0x4f/0xa0 [amdgpu]
[   26.205623]  amdgpu_driver_load_kms+0xcd/0x250 [amdgpu]
[   26.205637]  drm_dev_register+0x12b/0x1c0 [drm]
[   26.205695]  amdgpu_pci_probe+0x12a/0x1e0 [amdgpu]
[   26.205699]  local_pci_probe+0x47/0xa0
[   26.205701]  pci_device_probe+0x106/0x1b0
[   26.205704]  really_probe+0x21a/0x3f0
[   26.205706]  driver_probe_device+0x11c/0x140
[   26.205707]  device_driver_attach+0x58/0x60
[   26.205709]  __driver_attach+0xc3/0x140

Signed-off-by: Yintian Tao <yttao@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-02 17:39:05 -05:00
Alex Deucher de185019bc drm/amdgpu: move pci handling out of pm ops
The documentation says the that PCI core handles this
for you unless you choose to implement it.  Just rely
on the PCI core to handle the pci specific bits.

Reviewed-by: Zhan Liu <zhan.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-26 14:51:03 -05:00
Alex Deucher 3840c5bcc2 drm/amdgpu: disentangle runtime pm and vga_switcheroo
Originally we only supported runtime pm on PX/HG laptops
so vga_switcheroo and runtime pm are sort of entangled.

Attempt to logically separate them.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-19 16:42:53 -05:00
Alex Deucher 361dbd01a1 drm/amdgpu: add helpers for baco entry and exit
BACO - Bus Active, Chip Off

Will be used for runtime pm.  Entry will enter the BACO
state (chip off).  Exit will exit the BACO state (chip on).

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-19 16:42:52 -05:00
Alex Deucher 31af062acf drm/amdgpu: rename amdgpu_device_is_px to amdgpu_device_supports_boco (v2)
BACO - Bus Active, Chip Off
BOCO - Bus Off, Chip Off

To better match what we are checking for and to align with
amdgpu_device_supports_baco.

BOCO is used on PowerXpress/Hybrid Graphics systems and BACO
is used on desktop dGPU boards.

v2: fix typo in documentation

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-19 16:42:50 -05:00
Alex Deucher a69cba42b1 drm/amdgpu: add a amdgpu_device_supports_baco helper
BACO - Bus Active, Chip Off

To check if a device supports BACO or not.  This will be
used in determining when to enable runtime pm.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-19 16:42:49 -05:00
Yintian Tao d0d13fe874 drm/amdgpu: put flush_delayed_work at first
There is one regression from 042f3d7b745cd76aa
To put flush_delayed_work after adev->shutdown = true
which will make amdgpu_ih_process not response the irq
At last, all ib ring tests will be failed just like below

[drm] amdgpu: finishing device.
[drm] Fence fallback timer expired on ring gfx
[drm] Fence fallback timer expired on ring comp_1.0.0
[drm] Fence fallback timer expired on ring comp_1.1.0
[drm] Fence fallback timer expired on ring comp_1.2.0
[drm] Fence fallback timer expired on ring comp_1.3.0
[drm] Fence fallback timer expired on ring comp_1.0.1
amdgpu 0000:00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on comp_1.1.1 (-110).
amdgpu 0000:00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on comp_1.2.1 (-110).
amdgpu 0000:00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on comp_1.3.1 (-110).
amdgpu 0000:00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on sdma0 (-110).
amdgpu 0000:00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on sdma1 (-110).
amdgpu 0000:00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on uvd_enc_0.0 (-110).
amdgpu 0000:00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vce0 (-110).
[drm:amdgpu_device_delayed_init_work_handler [amdgpu]] *ERROR* ib ring test failed (-110).

v2: replace cancel_delayed_work_sync() with flush_delayed_work()

Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-19 10:12:51 -05:00
Leo Liu 52f2e779ad drm/amdgpu: add driver support for JPEG2.0 and above
By using JPEG IP block type

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-19 10:12:50 -05:00
yu kuai b8b7213057 drm/amdgpu: add function parameter description in 'amdgpu_device_set_cg_state'
Fixes gcc warning:

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1954: warning: Function
parameter or member 'state' not described in 'amdgpu_device_set_cg_state'

Fixes: e3ecdffac9 ("drm/amdgpu: add documentation for amdgpu_device.c")
Signed-off-by: yu kuai <yukuai3@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-13 15:29:44 -05:00
Bhawanpreet Lakha b86a1aa36a drm/amd/display: rename DCN1_0 kconfig to DCN
Since dcn20 and dcn21 are under dcn1 it doesnt make sense to
have it named dcn1.

Change it to "dcn" to make it generic

Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-13 15:29:44 -05:00
Bhawanpreet Lakha aca935c7cc drm/amd/display: Drop CONFIG_DRM_AMD_DC_DCN2_1 flag
[Why]

DCN21 is stable enough to be build by default. So drop the flags.

[How]

Remove them using the unifdef tool. The following commands were executed
in sequence:

$ find -name '*.c' -exec unifdef -m -DCONFIG_DRM_AMD_DC_DCN2_1 -UCONFIG_TRIM_DRM_AMD_DC_DCN2_1 '{}' ';'
$ find -name '*.h' -exec unifdef -m -DCONFIG_DRM_AMD_DC_DCN2_1 -UCONFIG_TRIM_DRM_AMD_DC_DCN2_1 '{}' ';'

In addition:

* Remove from kconfig, and replace any dependencies with DCN1_0.
* Remove from any makefiles.
* Fix and cleanup Renoir definitions in dal_asic_id.h
* Expand DCN1 ifdef to include DCN21 code in the following files:
    * clk_mgr/clk_mgr.c: dc_clk_mgr_create()
    * core/dc_resources.c: dc_create_resource_pool()
    * gpio/hw_factory.c: dal_hw_factory_init()
    * gpio/hw_translate.c: dal_hw_translate_init()

Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-13 15:29:44 -05:00
Bhawanpreet Lakha 1da37801a8 drm/amd/display: Drop CONFIG_DRM_AMD_DC_DCN2_0 and DSC_SUPPORTED
[Why]

DCN2 and DSC are stable enough to be build by default. So drop the flags.

[How]

Remove them using the unifdef tool. The following commands were executed
in sequence:

$ find -name '*.c' -exec unifdef -m -DCONFIG_DRM_AMD_DC_DSC_SUPPORT -DCONFIG_DRM_AMD_DC_DCN2_0 -UCONFIG_TRIM_DRM_AMD_DC_DCN2_0 '{}' ';'
$ find -name '*.h' -exec unifdef -m -DCONFIG_DRM_AMD_DC_DSC_SUPPORT -DCONFIG_DRM_AMD_DC_DCN2_0 -UCONFIG_TRIM_DRM_AMD_DC_DCN2_0 '{}' ';'

In addition:

* Remove from kconfig, and replace any dependencies with DCN1_0.
* Remove from any makefiles.
* Fix and cleanup NV defninitions in dal_asic_id.h
* Expand DCN1 ifdef to include DCN2 code in the following files:
    * clk_mgr/clk_mgr.c: dc_clk_mgr_create()
    * core/dc_resources.c: dc_create_resource_pool()
    * dce/dce_dmcu.c: dcn20_*lock_phy()
    * dce/dce_dmcu.c: dcn20_funcs
    * dce/dce_dmcu.c: dcn20_dmcu_create()
    * gpio/hw_factory.c: dal_hw_factory_init()
    * gpio/hw_translate.c: dal_hw_translate_init()

Signed-off-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-13 15:29:44 -05:00
Jesse Zhang 9f87516764 drm/amd/amdgpu: finish delay works before release resources
flush/cancel delayed works before doing finalization
to avoid concurrently requests.

Signed-off-by: Jesse Zhang <zhexi.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-11 17:38:13 -05:00
Evan Quan 60599a0363 drm/amdgpu: perform p-state switch after the whole hive initialized
P-state switch should be performed after all devices from the hive
get initialized.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by:  Jonathan Kim <Jonathan.Kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-06 16:27:48 -05:00
Evan Quan b0adca4d50 drm/amdgpu: register gpu instance before fan boost feature enablment
Otherwise, the feature enablement will be skipped due to wrong count.

Fixes: beff74bc6e ("drm/amdgpu: fix a race in GPU reset with IB test (v2)")
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-06 16:27:47 -05:00
Evan Quan 39ea6e5f9e drm/amdgpu: change pstate only after all XGMI device initialized
Pstate settings should be performed after all device of the
XGMI setup get initialized.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-06 16:27:46 -05:00
Le Ma bff77e86a3 drm/amdgpu: bypass some cleanup work after err_event_athub (v2)
PSP lost connection when err_event_athub occurs. These cleanup work can be
skipped in BACO reset.

v2: squash in missing include (Alex)

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <hawking.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-10-30 11:06:52 -04:00
Wambui Karuga f440ff44b1 drm/amd: correct "_LENTH" mispelling in constant
Correct the "_LENTH" mispelling in the AMDGPU_MAX_TIMEOUT_PARAM_LENGTH
constant.

Signed-off-by: Wambui Karuga <wambui.karugax@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-10-28 11:19:17 -04:00
Andrey Grodzovsky 121a2bc6ae drm/amdgpu: Move amdgpu_ras_recovery_init to after SMU ready.
For Arcturus the I2C traffic is done through SMU tables and so
we must postpone RAS recovery init to after they are ready
which is in amdgpu_device_ip_hw_init_phase2.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-10-25 16:50:10 -04:00
Tianci.Yin e35e2b117f drm/amdgpu: add a generic fb accessing helper function(v3)
add a generic helper function for accessing framebuffer via MMIO

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Tianci.Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-10-17 16:31:13 -04:00
Alex Deucher 897483d8a0 drm/amdgpu: move gpu reset out of amdgpu_device_suspend
Move it into the caller.  There are cases were we don't
want it.  We need it for hibernation, but we don't need
it for runtime pm, so drop it for runtime pm.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-10-15 15:51:39 -04:00
Alex Deucher 803cc26d5c drm/amdgpu: move pci_save_state into suspend path
for amdgpu_device_suspend.  This follows the logic
in the resume path.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-10-15 15:51:39 -04:00