diff --git a/Documentation/accel/qaic/aic100.rst b/Documentation/accel/qaic/aic100.rst
index c80d0f1307db..590dae77ea12 100644
--- a/Documentation/accel/qaic/aic100.rst
+++ b/Documentation/accel/qaic/aic100.rst
@@ -36,8 +36,9 @@ AIC100 DID (0xa100).
 
 AIC100 does not implement FLR (function level reset).
 
-AIC100 implements MSI but does not implement MSI-X. AIC100 requires 17 MSIs to
-operate (1 for MHI, 16 for the DMA Bridge).
+AIC100 implements MSI but does not implement MSI-X. AIC100 prefers 17 MSIs to
+operate (1 for MHI, 16 for the DMA Bridge). Falling back to 1 MSI is possible in
+scenarios where reserving 32 MSIs isn't feasible.
 
 As a PCIe device, AIC100 utilizes BARs to provide host interfaces to the device
 hardware. AIC100 provides 3, 64-bit BARs.
@@ -220,10 +221,14 @@ of the defined channels, and their uses.
 +----------------+---------+----------+----------------------------------------+
 | QAIC_DEBUG     | 18 & 19 | AMSS     | Not used.                              |
 +----------------+---------+----------+----------------------------------------+
-| QAIC_TIMESYNC  | 20 & 21 | SBL/AMSS | Used to synchronize timestamps in the  |
+| QAIC_TIMESYNC  | 20 & 21 | SBL      | Used to synchronize timestamps in the  |
 |                |         |          | device side logs with the host time    |
 |                |         |          | source.                                |
 +----------------+---------+----------+----------------------------------------+
+| QAIC_TIMESYNC  | 22 & 23 | AMSS     | Used to periodically synchronize       |
+| _PERIODIC      |         |          | timestamps in the device side logs with|
+|                |         |          | the host time source.                  |
++----------------+---------+----------+----------------------------------------+
 
 DMA Bridge
 ==========
diff --git a/Documentation/accel/qaic/qaic.rst b/Documentation/accel/qaic/qaic.rst
index c88502383136..f81020736ebf 100644
--- a/Documentation/accel/qaic/qaic.rst
+++ b/Documentation/accel/qaic/qaic.rst
@@ -10,6 +10,9 @@ accelerator products.
 Interrupts
 ==========
 
+IRQ Storm Mitigation
+--------------------
+
 While the AIC100 DMA Bridge hardware implements an IRQ storm mitigation
 mechanism, it is still possible for an IRQ storm to occur. A storm can happen
 if the workload is particularly quick, and the host is responsive. If the host
@@ -35,6 +38,26 @@ generates 100k IRQs per second (per /proc/interrupts) is reduced to roughly 64
 IRQs over 5 minutes while keeping the host system stable, and having the same
 workload throughput performance (within run to run noise variation).
 
+Single MSI Mode
+---------------
+
+MultiMSI is not well supported on all systems; virtualized ones even less so
+(circa 2023). Between hypervisors masking the PCIe MSI capability structure to
+large memory requirements for vIOMMUs (required for supporting MultiMSI), it is
+useful to be able to fall back to a single MSI when needed.
+
+To support this fallback, we allow the case where only one MSI is able to be
+allocated, and share that one MSI between MHI and the DBCs. The device detects
+when only one MSI has been configured and directs the interrupts for the DBCs
+to the interrupt normally used for MHI. Unfortunately this means that the
+interrupt handlers for every DBC and MHI wake up for every interrupt that
+arrives; however, the DBC threaded irq handlers only are started when work to be
+done is detected (MHI will always start its threaded handler).
+
+If the DBC is configured to force MSI interrupts, this can circumvent the
+software IRQ storm mitigation mentioned above. Since the MSI is shared it is
+never disabled, allowing each new entry to the FIFO to trigger a new interrupt.
+
 
 Neural Network Control (NNC) Protocol
 =====================================
@@ -178,3 +201,8 @@ overrides this for that call. Default is 5000 (5 seconds).
 
 Sets the polling interval in microseconds (us) when datapath polling is active.
 Takes effect at the next polling interval. Default is 100 (100 us).
+
+**timesync_delay_ms (unsigned int)**
+
+Sets the time interval in milliseconds (ms) between two consecutive timesync
+operations. Default is 1000 (1000 ms).
diff --git a/Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.yaml b/Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.yaml
index dae55b8a267b..dc078ceeca9a 100644
--- a/Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.yaml
+++ b/Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.yaml
@@ -17,6 +17,7 @@ properties:
   compatible:
     enum:
       - brcm,2711-v3d
+      - brcm,2712-v3d
       - brcm,7268-v3d
       - brcm,7278-v3d
 
diff --git a/Documentation/gpu/drm-kms-helpers.rst b/Documentation/gpu/drm-kms-helpers.rst
index b748b8ae70b2..59cfe8a7a8ba 100644
--- a/Documentation/gpu/drm-kms-helpers.rst
+++ b/Documentation/gpu/drm-kms-helpers.rst
@@ -363,6 +363,12 @@ EDID Helper Functions Reference
 .. kernel-doc:: drivers/gpu/drm/drm_edid.c
    :export:
 
+.. kernel-doc:: include/drm/drm_eld.h
+   :internal:
+
+.. kernel-doc:: drivers/gpu/drm/drm_eld.c
+   :export:
+
 SCDC Helper Functions Reference
 ===============================
 
diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index 602010cb6894..acc5901ac840 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -552,6 +552,12 @@ Overview
 .. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
    :doc: Overview
 
+Flow Control
+------------
+
+.. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
+   :doc: Flow Control
+
 Scheduler Function References
 -----------------------------
 
diff --git a/Documentation/gpu/todo.rst b/Documentation/gpu/todo.rst
index 03fe5d1247be..b62c7fa0c2bc 100644
--- a/Documentation/gpu/todo.rst
+++ b/Documentation/gpu/todo.rst
@@ -621,6 +621,23 @@ Contact: Javier Martinez Canillas <javierm@redhat.com>
 
 Level: Intermediate
 
+Clean up and document former selftests suites
+---------------------------------------------
+
+Some KUnit test suites (drm_buddy, drm_cmdline_parser, drm_damage_helper,
+drm_format, drm_framebuffer, drm_dp_mst_helper, drm_mm, drm_plane_helper and
+drm_rect) are former selftests suites that have been converted over when KUnit
+was first introduced.
+
+These suites were fairly undocumented, and with different goals than what unit
+tests can be. Trying to identify what each test in these suites actually test
+for, whether that makes sense for a unit test, and either remove it if it
+doesn't or document it if it does would be of great help.
+
+Contact: Maxime Ripard <mripard@kernel.org>
+
+Level: Intermediate
+
 Enable trinity for DRM
 ----------------------
 
diff --git a/MAINTAINERS b/MAINTAINERS
index ea790149af79..b9a54de83019 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6503,8 +6503,7 @@ T:	git git://anongit.freedesktop.org/drm/drm-misc
 F:	drivers/gpu/drm/sun4i/sun8i*
 
 DRM DRIVER FOR ARM PL111 CLCD
-M:	Emma Anholt <emma@anholt.net>
-S:	Supported
+S:	Orphan
 T:	git git://anongit.freedesktop.org/drm/drm-misc
 F:	drivers/gpu/drm/pl111/
 
@@ -6619,8 +6618,7 @@ F:	Documentation/devicetree/bindings/display/panel/himax,hx8394.yaml
 F:	drivers/gpu/drm/panel/panel-himax-hx8394.c
 
 DRM DRIVER FOR HX8357D PANELS
-M:	Emma Anholt <emma@anholt.net>
-S:	Maintained
+S:	Orphan
 T:	git git://anongit.freedesktop.org/drm/drm-misc
 F:	Documentation/devicetree/bindings/display/himax,hx8357d.txt
 F:	drivers/gpu/drm/tiny/hx8357d.c
@@ -7213,8 +7211,8 @@ F:	Documentation/devicetree/bindings/display/ti/
 F:	drivers/gpu/drm/omapdrm/
 
 DRM DRIVERS FOR V3D
-M:	Emma Anholt <emma@anholt.net>
 M:	Melissa Wen <mwen@igalia.com>
+M:	Maíra Canal <mcanal@igalia.com>
 S:	Supported
 T:	git git://anongit.freedesktop.org/drm/drm-misc
 F:	Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.yaml
@@ -7222,7 +7220,6 @@ F:	drivers/gpu/drm/v3d/
 F:	include/uapi/drm/v3d_drm.h
 
 DRM DRIVERS FOR VC4
-M:	Emma Anholt <emma@anholt.net>
 M:	Maxime Ripard <mripard@kernel.org>
 S:	Supported
 T:	git git://github.com/anholt/linux
diff --git a/drivers/accel/ivpu/Kconfig b/drivers/accel/ivpu/Kconfig
index 1a4c4ed9d113..682c53245286 100644
--- a/drivers/accel/ivpu/Kconfig
+++ b/drivers/accel/ivpu/Kconfig
@@ -1,16 +1,17 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
 config DRM_ACCEL_IVPU
-	tristate "Intel VPU for Meteor Lake and newer"
+	tristate "Intel NPU (Neural Processing Unit)"
 	depends on DRM_ACCEL
 	depends on X86_64 && !UML
 	depends on PCI && PCI_MSI
 	select FW_LOADER
-	select SHMEM
+	select DRM_GEM_SHMEM_HELPER
 	select GENERIC_ALLOCATOR
 	help
-	  Choose this option if you have a system that has an 14th generation Intel CPU
-	  or newer. VPU stands for Versatile Processing Unit and it's a CPU-integrated
-	  inference accelerator for Computer Vision and Deep Learning applications.
+	  Choose this option if you have a system with an 14th generation
+	  Intel CPU (Meteor Lake) or newer. Intel NPU (formerly called Intel VPU)
+	  is a CPU-integrated inference accelerator for Computer Vision
+	  and Deep Learning applications.
 
 	  If "M" is selected, the module will be called intel_vpu.
diff --git a/drivers/accel/ivpu/ivpu_debugfs.c b/drivers/accel/ivpu/ivpu_debugfs.c
index ea453b985b49..19035230563d 100644
--- a/drivers/accel/ivpu/ivpu_debugfs.c
+++ b/drivers/accel/ivpu/ivpu_debugfs.c
@@ -14,6 +14,7 @@
 #include "ivpu_fw.h"
 #include "ivpu_fw_log.h"
 #include "ivpu_gem.h"
+#include "ivpu_hw.h"
 #include "ivpu_jsm_msg.h"
 #include "ivpu_pm.h"
 
@@ -115,6 +116,31 @@ static const struct drm_debugfs_info vdev_debugfs_list[] = {
 	{"reset_pending", reset_pending_show, 0},
 };
 
+static ssize_t
+dvfs_mode_fops_write(struct file *file, const char __user *user_buf, size_t size, loff_t *pos)
+{
+	struct ivpu_device *vdev = file->private_data;
+	struct ivpu_fw_info *fw = vdev->fw;
+	u32 dvfs_mode;
+	int ret;
+
+	ret = kstrtou32_from_user(user_buf, size, 0, &dvfs_mode);
+	if (ret < 0)
+		return ret;
+
+	fw->dvfs_mode = dvfs_mode;
+
+	ivpu_pm_schedule_recovery(vdev);
+
+	return size;
+}
+
+static const struct file_operations dvfs_mode_fops = {
+	.owner = THIS_MODULE,
+	.open = simple_open,
+	.write = dvfs_mode_fops_write,
+};
+
 static int fw_log_show(struct seq_file *s, void *v)
 {
 	struct ivpu_device *vdev = s->private;
@@ -151,6 +177,30 @@ static const struct file_operations fw_log_fops = {
 	.release = single_release,
 };
 
+static ssize_t
+fw_profiling_freq_fops_write(struct file *file, const char __user *user_buf,
+			     size_t size, loff_t *pos)
+{
+	struct ivpu_device *vdev = file->private_data;
+	bool enable;
+	int ret;
+
+	ret = kstrtobool_from_user(user_buf, size, &enable);
+	if (ret < 0)
+		return ret;
+
+	ivpu_hw_profiling_freq_drive(vdev, enable);
+	ivpu_pm_schedule_recovery(vdev);
+
+	return size;
+}
+
+static const struct file_operations fw_profiling_freq_fops = {
+	.owner = THIS_MODULE,
+	.open = simple_open,
+	.write = fw_profiling_freq_fops_write,
+};
+
 static ssize_t
 fw_trace_destination_mask_fops_write(struct file *file, const char __user *user_buf,
 				     size_t size, loff_t *pos)
@@ -280,6 +330,9 @@ void ivpu_debugfs_init(struct ivpu_device *vdev)
 	debugfs_create_file("force_recovery", 0200, debugfs_root, vdev,
 			    &ivpu_force_recovery_fops);
 
+	debugfs_create_file("dvfs_mode", 0200, debugfs_root, vdev,
+			    &dvfs_mode_fops);
+
 	debugfs_create_file("fw_log", 0644, debugfs_root, vdev,
 			    &fw_log_fops);
 	debugfs_create_file("fw_trace_destination_mask", 0200, debugfs_root, vdev,
@@ -291,4 +344,8 @@ void ivpu_debugfs_init(struct ivpu_device *vdev)
 
 	debugfs_create_file("reset_engine", 0200, debugfs_root, vdev,
 			    &ivpu_reset_engine_fops);
+
+	if (ivpu_hw_gen(vdev) >= IVPU_HW_40XX)
+		debugfs_create_file("fw_profiling_freq_drive", 0200,
+				    debugfs_root, vdev, &fw_profiling_freq_fops);
 }
diff --git a/drivers/accel/ivpu/ivpu_drv.c b/drivers/accel/ivpu/ivpu_drv.c
index 790603017653..64927682161b 100644
--- a/drivers/accel/ivpu/ivpu_drv.c
+++ b/drivers/accel/ivpu/ivpu_drv.c
@@ -31,8 +31,6 @@
 			   __stringify(DRM_IVPU_DRIVER_MINOR) "."
 #endif
 
-static const struct drm_driver driver;
-
 static struct lock_class_key submitted_jobs_xa_lock_class_key;
 
 int ivpu_dbg_mask;
@@ -41,7 +39,7 @@ MODULE_PARM_DESC(dbg_mask, "Driver debug mask. See IVPU_DBG_* macros.");
 
 int ivpu_test_mode;
 module_param_named_unsafe(test_mode, ivpu_test_mode, int, 0644);
-MODULE_PARM_DESC(test_mode, "Test mode: 0 - normal operation, 1 - fw unit test, 2 - null hw");
+MODULE_PARM_DESC(test_mode, "Test mode mask. See IVPU_TEST_MODE_* macros.");
 
 u8 ivpu_pll_min_ratio;
 module_param_named(pll_min_ratio, ivpu_pll_min_ratio, byte, 0644);
@@ -93,8 +91,8 @@ static void file_priv_release(struct kref *ref)
 	ivpu_dbg(vdev, FILE, "file_priv release: ctx %u\n", file_priv->ctx.id);
 
 	ivpu_cmdq_release_all(file_priv);
-	ivpu_bo_remove_all_bos_from_context(&file_priv->ctx);
 	ivpu_jsm_context_release(vdev, file_priv->ctx.id);
+	ivpu_bo_remove_all_bos_from_context(vdev, &file_priv->ctx);
 	ivpu_mmu_user_context_fini(vdev, &file_priv->ctx);
 	drm_WARN_ON(&vdev->drm, xa_erase_irq(&vdev->context_xa, file_priv->ctx.id) != file_priv);
 	mutex_destroy(&file_priv->lock);
@@ -317,16 +315,14 @@ static int ivpu_wait_for_ready(struct ivpu_device *vdev)
 	unsigned long timeout;
 	int ret;
 
-	if (ivpu_test_mode == IVPU_TEST_MODE_FW_TEST)
+	if (ivpu_test_mode & IVPU_TEST_MODE_FW_TEST)
 		return 0;
 
-	ivpu_ipc_consumer_add(vdev, &cons, IVPU_IPC_CHAN_BOOT_MSG);
+	ivpu_ipc_consumer_add(vdev, &cons, IVPU_IPC_CHAN_BOOT_MSG, NULL);
 
 	timeout = jiffies + msecs_to_jiffies(vdev->timeout.boot);
 	while (1) {
-		ret = ivpu_ipc_irq_handler(vdev);
-		if (ret)
-			break;
+		ivpu_ipc_irq_handler(vdev, NULL);
 		ret = ivpu_ipc_receive(vdev, &cons, &ipc_hdr, NULL, 0);
 		if (ret != -ETIMEDOUT || time_after_eq(jiffies, timeout))
 			break;
@@ -362,7 +358,7 @@ int ivpu_boot(struct ivpu_device *vdev)
 	int ret;
 
 	/* Update boot params located at first 4KB of FW memory */
-	ivpu_fw_boot_params_setup(vdev, vdev->fw->mem->kvaddr);
+	ivpu_fw_boot_params_setup(vdev, ivpu_bo_vaddr(vdev->fw->mem));
 
 	ret = ivpu_hw_boot_fw(vdev);
 	if (ret) {
@@ -414,7 +410,9 @@ static const struct drm_driver driver = {
 
 	.open = ivpu_open,
 	.postclose = ivpu_postclose,
-	.gem_prime_import = ivpu_gem_prime_import,
+
+	.gem_create_object = ivpu_gem_create_object,
+	.gem_prime_import_sg_table = drm_gem_shmem_prime_import_sg_table,
 
 	.ioctls = ivpu_drm_ioctls,
 	.num_ioctls = ARRAY_SIZE(ivpu_drm_ioctls),
@@ -427,6 +425,13 @@ static const struct drm_driver driver = {
 	.minor = DRM_IVPU_DRIVER_MINOR,
 };
 
+static irqreturn_t ivpu_irq_thread_handler(int irq, void *arg)
+{
+	struct ivpu_device *vdev = arg;
+
+	return ivpu_ipc_irq_thread_handler(vdev);
+}
+
 static int ivpu_irq_init(struct ivpu_device *vdev)
 {
 	struct pci_dev *pdev = to_pci_dev(vdev->drm.dev);
@@ -440,8 +445,8 @@ static int ivpu_irq_init(struct ivpu_device *vdev)
 
 	vdev->irq = pci_irq_vector(pdev, 0);
 
-	ret = devm_request_irq(vdev->drm.dev, vdev->irq, vdev->hw->ops->irq_handler,
-			       IRQF_NO_AUTOEN, DRIVER_NAME, vdev);
+	ret = devm_request_threaded_irq(vdev->drm.dev, vdev->irq, vdev->hw->ops->irq_handler,
+					ivpu_irq_thread_handler, IRQF_NO_AUTOEN, DRIVER_NAME, vdev);
 	if (ret)
 		ivpu_err(vdev, "Failed to request an IRQ %d\n", ret);
 
@@ -533,6 +538,11 @@ static int ivpu_dev_init(struct ivpu_device *vdev)
 	xa_init_flags(&vdev->context_xa, XA_FLAGS_ALLOC);
 	xa_init_flags(&vdev->submitted_jobs_xa, XA_FLAGS_ALLOC1);
 	lockdep_set_class(&vdev->submitted_jobs_xa.xa_lock, &submitted_jobs_xa_lock_class_key);
+	INIT_LIST_HEAD(&vdev->bo_list);
+
+	ret = drmm_mutex_init(&vdev->drm, &vdev->bo_list_lock);
+	if (ret)
+		goto err_xa_destroy;
 
 	ret = ivpu_pci_init(vdev);
 	if (ret)
@@ -550,7 +560,7 @@ static int ivpu_dev_init(struct ivpu_device *vdev)
 	/* Power up early so the rest of init code can access VPU registers */
 	ret = ivpu_hw_power_up(vdev);
 	if (ret)
-		goto err_xa_destroy;
+		goto err_power_down;
 
 	ret = ivpu_mmu_global_context_init(vdev);
 	if (ret)
@@ -574,20 +584,15 @@ static int ivpu_dev_init(struct ivpu_device *vdev)
 
 	ivpu_pm_init(vdev);
 
-	ret = ivpu_job_done_thread_init(vdev);
+	ret = ivpu_boot(vdev);
 	if (ret)
 		goto err_ipc_fini;
 
-	ret = ivpu_boot(vdev);
-	if (ret)
-		goto err_job_done_thread_fini;
-
+	ivpu_job_done_consumer_init(vdev);
 	ivpu_pm_enable(vdev);
 
 	return 0;
 
-err_job_done_thread_fini:
-	ivpu_job_done_thread_fini(vdev);
 err_ipc_fini:
 	ivpu_ipc_fini(vdev);
 err_fw_fini:
@@ -612,7 +617,7 @@ static void ivpu_dev_fini(struct ivpu_device *vdev)
 	ivpu_shutdown(vdev);
 	if (IVPU_WA(d3hot_after_power_off))
 		pci_set_power_state(to_pci_dev(vdev->drm.dev), PCI_D3hot);
-	ivpu_job_done_thread_fini(vdev);
+	ivpu_job_done_consumer_fini(vdev);
 	ivpu_pm_cancel_recovery(vdev);
 
 	ivpu_ipc_fini(vdev);
diff --git a/drivers/accel/ivpu/ivpu_drv.h b/drivers/accel/ivpu/ivpu_drv.h
index 417ddeca8517..ebc4b84f27b2 100644
--- a/drivers/accel/ivpu/ivpu_drv.h
+++ b/drivers/accel/ivpu/ivpu_drv.h
@@ -17,9 +17,10 @@
 #include <uapi/drm/ivpu_accel.h>
 
 #include "ivpu_mmu_context.h"
+#include "ivpu_ipc.h"
 
 #define DRIVER_NAME "intel_vpu"
-#define DRIVER_DESC "Driver for Intel Versatile Processing Unit (VPU)"
+#define DRIVER_DESC "Driver for Intel NPU (Neural Processing Unit)"
 #define DRIVER_DATE "20230117"
 
 #define PCI_DEVICE_ID_MTL   0x7d1d
@@ -88,6 +89,7 @@ struct ivpu_wa_table {
 	bool d3hot_after_power_off;
 	bool interrupt_clear_with_0;
 	bool disable_clock_relinquish;
+	bool disable_d0i3_msg;
 };
 
 struct ivpu_hw_info;
@@ -115,8 +117,11 @@ struct ivpu_device {
 	struct xarray context_xa;
 	struct xa_limit context_xa_limit;
 
+	struct mutex bo_list_lock; /* Protects bo_list */
+	struct list_head bo_list;
+
 	struct xarray submitted_jobs_xa;
-	struct task_struct *job_done_thread;
+	struct ivpu_ipc_consumer job_done_consumer;
 
 	atomic64_t unique_id_counter;
 
@@ -126,6 +131,7 @@ struct ivpu_device {
 		int tdr;
 		int reschedule_suspend;
 		int autosuspend;
+		int d0i3_entry_msg;
 	} timeout;
 };
 
@@ -148,9 +154,11 @@ extern u8 ivpu_pll_min_ratio;
 extern u8 ivpu_pll_max_ratio;
 extern bool ivpu_disable_mmu_cont_pages;
 
-#define IVPU_TEST_MODE_DISABLED  0
-#define IVPU_TEST_MODE_FW_TEST   1
-#define IVPU_TEST_MODE_NULL_HW   2
+#define IVPU_TEST_MODE_FW_TEST            BIT(0)
+#define IVPU_TEST_MODE_NULL_HW            BIT(1)
+#define IVPU_TEST_MODE_NULL_SUBMISSION    BIT(2)
+#define IVPU_TEST_MODE_D0I3_MSG_DISABLE   BIT(4)
+#define IVPU_TEST_MODE_D0I3_MSG_ENABLE    BIT(5)
 extern int ivpu_test_mode;
 
 struct ivpu_file_priv *ivpu_file_priv_get(struct ivpu_file_priv *file_priv);
diff --git a/drivers/accel/ivpu/ivpu_fw.c b/drivers/accel/ivpu/ivpu_fw.c
index 691da521dde5..6576232f3e67 100644
--- a/drivers/accel/ivpu/ivpu_fw.c
+++ b/drivers/accel/ivpu/ivpu_fw.c
@@ -33,12 +33,17 @@
 
 #define ADDR_TO_L2_CACHE_CFG(addr) ((addr) >> 31)
 
-#define IVPU_FW_CHECK_API(vdev, fw_hdr, name, min_major) \
+/* Check if FW API is compatible with the driver */
+#define IVPU_FW_CHECK_API_COMPAT(vdev, fw_hdr, name, min_major) \
 	ivpu_fw_check_api(vdev, fw_hdr, #name, \
 			  VPU_##name##_API_VER_INDEX, \
 			  VPU_##name##_API_VER_MAJOR, \
 			  VPU_##name##_API_VER_MINOR, min_major)
 
+/* Check if API version is lower that the given version */
+#define IVPU_FW_CHECK_API_VER_LT(vdev, fw_hdr, name, major, minor) \
+	ivpu_fw_check_api_ver_lt(vdev, fw_hdr, #name, VPU_##name##_API_VER_INDEX, major, minor)
+
 static char *ivpu_firmware;
 module_param_named_unsafe(firmware, ivpu_firmware, charp, 0644);
 MODULE_PARM_DESC(firmware, "VPU firmware binary in /lib/firmware/..");
@@ -105,6 +110,19 @@ ivpu_fw_check_api(struct ivpu_device *vdev, const struct vpu_firmware_header *fw
 	return 0;
 }
 
+static bool
+ivpu_fw_check_api_ver_lt(struct ivpu_device *vdev, const struct vpu_firmware_header *fw_hdr,
+			 const char *str, int index, u16 major, u16 minor)
+{
+	u16 fw_major = (u16)(fw_hdr->api_version[index] >> 16);
+	u16 fw_minor = (u16)(fw_hdr->api_version[index]);
+
+	if (fw_major < major || (fw_major == major && fw_minor < minor))
+		return true;
+
+	return false;
+}
+
 static int ivpu_fw_parse(struct ivpu_device *vdev)
 {
 	struct ivpu_fw_info *fw = vdev->fw;
@@ -164,9 +182,9 @@ static int ivpu_fw_parse(struct ivpu_device *vdev)
 	ivpu_info(vdev, "Firmware: %s, version: %s", fw->name,
 		  (const char *)fw_hdr + VPU_FW_HEADER_SIZE);
 
-	if (IVPU_FW_CHECK_API(vdev, fw_hdr, BOOT, 3))
+	if (IVPU_FW_CHECK_API_COMPAT(vdev, fw_hdr, BOOT, 3))
 		return -EINVAL;
-	if (IVPU_FW_CHECK_API(vdev, fw_hdr, JSM, 3))
+	if (IVPU_FW_CHECK_API_COMPAT(vdev, fw_hdr, JSM, 3))
 		return -EINVAL;
 
 	fw->runtime_addr = runtime_addr;
@@ -182,6 +200,8 @@ static int ivpu_fw_parse(struct ivpu_device *vdev)
 	fw->trace_destination_mask = VPU_TRACE_DESTINATION_VERBOSE_TRACING;
 	fw->trace_hw_component_mask = -1;
 
+	fw->dvfs_mode = 0;
+
 	ivpu_dbg(vdev, FW_BOOT, "Size: file %lu image %u runtime %u shavenn %u\n",
 		 fw->file->size, fw->image_size, fw->runtime_size, fw->shave_nn_size);
 	ivpu_dbg(vdev, FW_BOOT, "Address: runtime 0x%llx, load 0x%llx, entry point 0x%llx\n",
@@ -195,6 +215,24 @@ static void ivpu_fw_release(struct ivpu_device *vdev)
 	release_firmware(vdev->fw->file);
 }
 
+/* Initialize workarounds that depend on FW version */
+static void
+ivpu_fw_init_wa(struct ivpu_device *vdev)
+{
+	const struct vpu_firmware_header *fw_hdr = (const void *)vdev->fw->file->data;
+
+	if (IVPU_FW_CHECK_API_VER_LT(vdev, fw_hdr, BOOT, 3, 17) ||
+	    (ivpu_hw_gen(vdev) > IVPU_HW_37XX) ||
+	    (ivpu_test_mode & IVPU_TEST_MODE_D0I3_MSG_DISABLE))
+		vdev->wa.disable_d0i3_msg = true;
+
+	/* Force enable the feature for testing purposes */
+	if (ivpu_test_mode & IVPU_TEST_MODE_D0I3_MSG_ENABLE)
+		vdev->wa.disable_d0i3_msg = false;
+
+	IVPU_PRINT_WA(disable_d0i3_msg);
+}
+
 static int ivpu_fw_update_global_range(struct ivpu_device *vdev)
 {
 	struct ivpu_fw_info *fw = vdev->fw;
@@ -248,7 +286,7 @@ static int ivpu_fw_mem_init(struct ivpu_device *vdev)
 
 	if (fw->shave_nn_size) {
 		fw->mem_shave_nn = ivpu_bo_alloc_internal(vdev, vdev->hw->ranges.shave.start,
-							  fw->shave_nn_size, DRM_IVPU_BO_UNCACHED);
+							  fw->shave_nn_size, DRM_IVPU_BO_WC);
 		if (!fw->mem_shave_nn) {
 			ivpu_err(vdev, "Failed to allocate shavenn buffer\n");
 			ret = -ENOMEM;
@@ -297,6 +335,8 @@ int ivpu_fw_init(struct ivpu_device *vdev)
 	if (ret)
 		goto err_fw_release;
 
+	ivpu_fw_init_wa(vdev);
+
 	ret = ivpu_fw_mem_init(vdev);
 	if (ret)
 		goto err_fw_release;
@@ -422,14 +462,31 @@ static void ivpu_fw_boot_params_print(struct ivpu_device *vdev, struct vpu_boot_
 		 boot_params->punit_telemetry_sram_size);
 	ivpu_dbg(vdev, FW_BOOT, "boot_params.vpu_telemetry_enable = 0x%x\n",
 		 boot_params->vpu_telemetry_enable);
+	ivpu_dbg(vdev, FW_BOOT, "boot_params.dvfs_mode = %u\n",
+		 boot_params->dvfs_mode);
+	ivpu_dbg(vdev, FW_BOOT, "boot_params.d0i3_delayed_entry = %d\n",
+		 boot_params->d0i3_delayed_entry);
+	ivpu_dbg(vdev, FW_BOOT, "boot_params.d0i3_residency_time_us = %lld\n",
+		 boot_params->d0i3_residency_time_us);
+	ivpu_dbg(vdev, FW_BOOT, "boot_params.d0i3_entry_vpu_ts = %llu\n",
+		 boot_params->d0i3_entry_vpu_ts);
 }
 
 void ivpu_fw_boot_params_setup(struct ivpu_device *vdev, struct vpu_boot_params *boot_params)
 {
 	struct ivpu_bo *ipc_mem_rx = vdev->ipc->mem_rx;
 
-	/* In case of warm boot we only have to reset the entrypoint addr */
+	/* In case of warm boot only update variable params */
 	if (!ivpu_fw_is_cold_boot(vdev)) {
+		boot_params->d0i3_residency_time_us =
+			ktime_us_delta(ktime_get_boottime(), vdev->hw->d0i3_entry_host_ts);
+		boot_params->d0i3_entry_vpu_ts = vdev->hw->d0i3_entry_vpu_ts;
+
+		ivpu_dbg(vdev, FW_BOOT, "boot_params.d0i3_residency_time_us = %lld\n",
+			 boot_params->d0i3_residency_time_us);
+		ivpu_dbg(vdev, FW_BOOT, "boot_params.d0i3_entry_vpu_ts = %llu\n",
+			 boot_params->d0i3_entry_vpu_ts);
+
 		boot_params->save_restore_ret_address = 0;
 		vdev->pm->is_warmboot = true;
 		wmb(); /* Flush WC buffers after writing save_restore_ret_address */
@@ -442,6 +499,13 @@ void ivpu_fw_boot_params_setup(struct ivpu_device *vdev, struct vpu_boot_params
 	boot_params->vpu_id = to_pci_dev(vdev->drm.dev)->bus->number;
 	boot_params->frequency = ivpu_hw_reg_pll_freq_get(vdev);
 
+	/*
+	 * This param is a debug firmware feature.  It switches default clock
+	 * to higher resolution one for fine-grained and more accurate firmware
+	 * task profiling.
+	 */
+	boot_params->perf_clk_frequency = ivpu_hw_profiling_freq_get(vdev);
+
 	/*
 	 * Uncached region of VPU address space, covers IPC buffers, job queues
 	 * and log buffers, programmable to L2$ Uncached by VPU MTRR
@@ -493,6 +557,11 @@ void ivpu_fw_boot_params_setup(struct ivpu_device *vdev, struct vpu_boot_params
 	boot_params->punit_telemetry_sram_base = ivpu_hw_reg_telemetry_offset_get(vdev);
 	boot_params->punit_telemetry_sram_size = ivpu_hw_reg_telemetry_size_get(vdev);
 	boot_params->vpu_telemetry_enable = ivpu_hw_reg_telemetry_enable_get(vdev);
+	boot_params->dvfs_mode = vdev->fw->dvfs_mode;
+	if (!IVPU_WA(disable_d0i3_msg))
+		boot_params->d0i3_delayed_entry = 1;
+	boot_params->d0i3_residency_time_us = 0;
+	boot_params->d0i3_entry_vpu_ts = 0;
 
 	wmb(); /* Flush WC buffers after writing bootparams */
 
diff --git a/drivers/accel/ivpu/ivpu_fw.h b/drivers/accel/ivpu/ivpu_fw.h
index 10ae2847f0ef..66b60fa161b5 100644
--- a/drivers/accel/ivpu/ivpu_fw.h
+++ b/drivers/accel/ivpu/ivpu_fw.h
@@ -27,6 +27,7 @@ struct ivpu_fw_info {
 	u32 trace_level;
 	u32 trace_destination_mask;
 	u64 trace_hw_component_mask;
+	u32 dvfs_mode;
 };
 
 int ivpu_fw_init(struct ivpu_device *vdev);
diff --git a/drivers/accel/ivpu/ivpu_gem.c b/drivers/accel/ivpu/ivpu_gem.c
index c91852f2edc8..1dda4f38ea25 100644
--- a/drivers/accel/ivpu/ivpu_gem.c
+++ b/drivers/accel/ivpu/ivpu_gem.c
@@ -20,215 +20,18 @@
 #include "ivpu_mmu.h"
 #include "ivpu_mmu_context.h"
 
-MODULE_IMPORT_NS(DMA_BUF);
-
 static const struct drm_gem_object_funcs ivpu_gem_funcs;
 
-static struct lock_class_key prime_bo_lock_class_key;
-
-static int __must_check prime_alloc_pages_locked(struct ivpu_bo *bo)
+static inline void ivpu_dbg_bo(struct ivpu_device *vdev, struct ivpu_bo *bo, const char *action)
 {
-	/* Pages are managed by the underlying dma-buf */
-	return 0;
-}
-
-static void prime_free_pages_locked(struct ivpu_bo *bo)
-{
-	/* Pages are managed by the underlying dma-buf */
-}
-
-static int prime_map_pages_locked(struct ivpu_bo *bo)
-{
-	struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
-	struct sg_table *sgt;
-
-	sgt = dma_buf_map_attachment_unlocked(bo->base.import_attach, DMA_BIDIRECTIONAL);
-	if (IS_ERR(sgt)) {
-		ivpu_err(vdev, "Failed to map attachment: %ld\n", PTR_ERR(sgt));
-		return PTR_ERR(sgt);
-	}
-
-	bo->sgt = sgt;
-	return 0;
-}
-
-static void prime_unmap_pages_locked(struct ivpu_bo *bo)
-{
-	dma_buf_unmap_attachment_unlocked(bo->base.import_attach, bo->sgt, DMA_BIDIRECTIONAL);
-	bo->sgt = NULL;
-}
-
-static const struct ivpu_bo_ops prime_ops = {
-	.type = IVPU_BO_TYPE_PRIME,
-	.name = "prime",
-	.alloc_pages = prime_alloc_pages_locked,
-	.free_pages = prime_free_pages_locked,
-	.map_pages = prime_map_pages_locked,
-	.unmap_pages = prime_unmap_pages_locked,
-};
-
-static int __must_check shmem_alloc_pages_locked(struct ivpu_bo *bo)
-{
-	int npages = ivpu_bo_size(bo) >> PAGE_SHIFT;
-	struct page **pages;
-
-	pages = drm_gem_get_pages(&bo->base);
-	if (IS_ERR(pages))
-		return PTR_ERR(pages);
-
-	if (bo->flags & DRM_IVPU_BO_WC)
-		set_pages_array_wc(pages, npages);
-	else if (bo->flags & DRM_IVPU_BO_UNCACHED)
-		set_pages_array_uc(pages, npages);
-
-	bo->pages = pages;
-	return 0;
-}
-
-static void shmem_free_pages_locked(struct ivpu_bo *bo)
-{
-	if (ivpu_bo_cache_mode(bo) != DRM_IVPU_BO_CACHED)
-		set_pages_array_wb(bo->pages, ivpu_bo_size(bo) >> PAGE_SHIFT);
-
-	drm_gem_put_pages(&bo->base, bo->pages, true, false);
-	bo->pages = NULL;
-}
-
-static int ivpu_bo_map_pages_locked(struct ivpu_bo *bo)
-{
-	int npages = ivpu_bo_size(bo) >> PAGE_SHIFT;
-	struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
-	struct sg_table *sgt;
-	int ret;
-
-	sgt = drm_prime_pages_to_sg(&vdev->drm, bo->pages, npages);
-	if (IS_ERR(sgt)) {
-		ivpu_err(vdev, "Failed to allocate sgtable\n");
-		return PTR_ERR(sgt);
-	}
-
-	ret = dma_map_sgtable(vdev->drm.dev, sgt, DMA_BIDIRECTIONAL, 0);
-	if (ret) {
-		ivpu_err(vdev, "Failed to map BO in IOMMU: %d\n", ret);
-		goto err_free_sgt;
-	}
-
-	bo->sgt = sgt;
-	return 0;
-
-err_free_sgt:
-	kfree(sgt);
-	return ret;
-}
-
-static void ivpu_bo_unmap_pages_locked(struct ivpu_bo *bo)
-{
-	struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
-
-	dma_unmap_sgtable(vdev->drm.dev, bo->sgt, DMA_BIDIRECTIONAL, 0);
-	sg_free_table(bo->sgt);
-	kfree(bo->sgt);
-	bo->sgt = NULL;
-}
-
-static const struct ivpu_bo_ops shmem_ops = {
-	.type = IVPU_BO_TYPE_SHMEM,
-	.name = "shmem",
-	.alloc_pages = shmem_alloc_pages_locked,
-	.free_pages = shmem_free_pages_locked,
-	.map_pages = ivpu_bo_map_pages_locked,
-	.unmap_pages = ivpu_bo_unmap_pages_locked,
-};
-
-static int __must_check internal_alloc_pages_locked(struct ivpu_bo *bo)
-{
-	unsigned int i, npages = ivpu_bo_size(bo) >> PAGE_SHIFT;
-	struct page **pages;
-	int ret;
-
-	pages = kvmalloc_array(npages, sizeof(*bo->pages), GFP_KERNEL);
-	if (!pages)
-		return -ENOMEM;
-
-	for (i = 0; i < npages; i++) {
-		pages[i] = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
-		if (!pages[i]) {
-			ret = -ENOMEM;
-			goto err_free_pages;
-		}
-		cond_resched();
-	}
-
-	bo->pages = pages;
-	return 0;
-
-err_free_pages:
-	while (i--)
-		put_page(pages[i]);
-	kvfree(pages);
-	return ret;
-}
-
-static void internal_free_pages_locked(struct ivpu_bo *bo)
-{
-	unsigned int i, npages = ivpu_bo_size(bo) >> PAGE_SHIFT;
-
-	if (ivpu_bo_cache_mode(bo) != DRM_IVPU_BO_CACHED)
-		set_pages_array_wb(bo->pages, ivpu_bo_size(bo) >> PAGE_SHIFT);
-
-	for (i = 0; i < npages; i++)
-		put_page(bo->pages[i]);
-
-	kvfree(bo->pages);
-	bo->pages = NULL;
-}
-
-static const struct ivpu_bo_ops internal_ops = {
-	.type = IVPU_BO_TYPE_INTERNAL,
-	.name = "internal",
-	.alloc_pages = internal_alloc_pages_locked,
-	.free_pages = internal_free_pages_locked,
-	.map_pages = ivpu_bo_map_pages_locked,
-	.unmap_pages = ivpu_bo_unmap_pages_locked,
-};
-
-static int __must_check ivpu_bo_alloc_and_map_pages_locked(struct ivpu_bo *bo)
-{
-	struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
-	int ret;
-
-	lockdep_assert_held(&bo->lock);
-	drm_WARN_ON(&vdev->drm, bo->sgt);
-
-	ret = bo->ops->alloc_pages(bo);
-	if (ret) {
-		ivpu_err(vdev, "Failed to allocate pages for BO: %d", ret);
-		return ret;
-	}
-
-	ret = bo->ops->map_pages(bo);
-	if (ret) {
-		ivpu_err(vdev, "Failed to map pages for BO: %d", ret);
-		goto err_free_pages;
-	}
-	return ret;
-
-err_free_pages:
-	bo->ops->free_pages(bo);
-	return ret;
-}
-
-static void ivpu_bo_unmap_and_free_pages(struct ivpu_bo *bo)
-{
-	mutex_lock(&bo->lock);
-
-	WARN_ON(!bo->sgt);
-	bo->ops->unmap_pages(bo);
-	WARN_ON(bo->sgt);
-	bo->ops->free_pages(bo);
-	WARN_ON(bo->pages);
-
-	mutex_unlock(&bo->lock);
+	if (bo->ctx)
+		ivpu_dbg(vdev, BO, "%6s: size %zu has_pages %d dma_mapped %d handle %u ctx %d vpu_addr 0x%llx mmu_mapped %d\n",
+			 action, ivpu_bo_size(bo), (bool)bo->base.pages, (bool)bo->base.sgt,
+			 bo->handle, bo->ctx->id, bo->vpu_addr, bo->mmu_mapped);
+	else
+		ivpu_dbg(vdev, BO, "%6s: size %zu has_pages %d dma_mapped %d handle %u (not added to context)\n",
+			 action, ivpu_bo_size(bo), (bool)bo->base.pages, (bool)bo->base.sgt,
+			 bo->handle);
 }
 
 /*
@@ -245,21 +48,24 @@ int __must_check ivpu_bo_pin(struct ivpu_bo *bo)
 
 	mutex_lock(&bo->lock);
 
-	if (!bo->vpu_addr) {
-		ivpu_err(vdev, "vpu_addr not set for BO ctx_id: %d handle: %d\n",
-			 bo->ctx->id, bo->handle);
+	ivpu_dbg_bo(vdev, bo, "pin");
+
+	if (!bo->ctx) {
+		ivpu_err(vdev, "vpu_addr not allocated for BO %d\n", bo->handle);
 		ret = -EINVAL;
 		goto unlock;
 	}
 
-	if (!bo->sgt) {
-		ret = ivpu_bo_alloc_and_map_pages_locked(bo);
-		if (ret)
-			goto unlock;
-	}
-
 	if (!bo->mmu_mapped) {
-		ret = ivpu_mmu_context_map_sgt(vdev, bo->ctx, bo->vpu_addr, bo->sgt,
+		struct sg_table *sgt = drm_gem_shmem_get_pages_sgt(&bo->base);
+
+		if (IS_ERR(sgt)) {
+			ret = PTR_ERR(sgt);
+			ivpu_err(vdev, "Failed to map BO in IOMMU: %d\n", ret);
+			goto unlock;
+		}
+
+		ret = ivpu_mmu_context_map_sgt(vdev, bo->ctx, bo->vpu_addr, sgt,
 					       ivpu_bo_is_snooped(bo));
 		if (ret) {
 			ivpu_err(vdev, "Failed to map BO in MMU: %d\n", ret);
@@ -281,248 +87,213 @@ ivpu_bo_alloc_vpu_addr(struct ivpu_bo *bo, struct ivpu_mmu_context *ctx,
 	struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
 	int ret;
 
-	if (!range) {
-		if (bo->flags & DRM_IVPU_BO_SHAVE_MEM)
-			range = &vdev->hw->ranges.shave;
-		else if (bo->flags & DRM_IVPU_BO_DMA_MEM)
-			range = &vdev->hw->ranges.dma;
-		else
-			range = &vdev->hw->ranges.user;
-	}
+	mutex_lock(&bo->lock);
 
-	mutex_lock(&ctx->lock);
-	ret = ivpu_mmu_context_insert_node_locked(ctx, range, ivpu_bo_size(bo), &bo->mm_node);
+	ret = ivpu_mmu_context_insert_node(ctx, range, ivpu_bo_size(bo), &bo->mm_node);
 	if (!ret) {
 		bo->ctx = ctx;
 		bo->vpu_addr = bo->mm_node.start;
-		list_add_tail(&bo->ctx_node, &ctx->bo_list);
+	} else {
+		ivpu_err(vdev, "Failed to add BO to context %u: %d\n", ctx->id, ret);
 	}
-	mutex_unlock(&ctx->lock);
+
+	ivpu_dbg_bo(vdev, bo, "alloc");
+
+	mutex_unlock(&bo->lock);
 
 	return ret;
 }
 
-static void ivpu_bo_free_vpu_addr(struct ivpu_bo *bo)
+static void ivpu_bo_unbind_locked(struct ivpu_bo *bo)
 {
 	struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
-	struct ivpu_mmu_context *ctx = bo->ctx;
 
-	ivpu_dbg(vdev, BO, "remove from ctx: ctx %d vpu_addr 0x%llx allocated %d mmu_mapped %d\n",
-		 ctx->id, bo->vpu_addr, (bool)bo->sgt, bo->mmu_mapped);
+	lockdep_assert_held(&bo->lock);
 
-	mutex_lock(&bo->lock);
+	ivpu_dbg_bo(vdev, bo, "unbind");
+
+	/* TODO: dma_unmap */
 
 	if (bo->mmu_mapped) {
-		drm_WARN_ON(&vdev->drm, !bo->sgt);
-		ivpu_mmu_context_unmap_sgt(vdev, ctx, bo->vpu_addr, bo->sgt);
+		drm_WARN_ON(&vdev->drm, !bo->ctx);
+		drm_WARN_ON(&vdev->drm, !bo->vpu_addr);
+		drm_WARN_ON(&vdev->drm, !bo->base.sgt);
+		ivpu_mmu_context_unmap_sgt(vdev, bo->ctx, bo->vpu_addr, bo->base.sgt);
 		bo->mmu_mapped = false;
 	}
 
-	mutex_lock(&ctx->lock);
-	list_del(&bo->ctx_node);
-	bo->vpu_addr = 0;
-	bo->ctx = NULL;
-	ivpu_mmu_context_remove_node_locked(ctx, &bo->mm_node);
-	mutex_unlock(&ctx->lock);
+	if (bo->ctx) {
+		ivpu_mmu_context_remove_node(bo->ctx, &bo->mm_node);
+		bo->vpu_addr = 0;
+		bo->ctx = NULL;
+	}
+}
 
+static void ivpu_bo_unbind(struct ivpu_bo *bo)
+{
+	mutex_lock(&bo->lock);
+	ivpu_bo_unbind_locked(bo);
 	mutex_unlock(&bo->lock);
 }
 
-void ivpu_bo_remove_all_bos_from_context(struct ivpu_mmu_context *ctx)
+void ivpu_bo_remove_all_bos_from_context(struct ivpu_device *vdev, struct ivpu_mmu_context *ctx)
 {
-	struct ivpu_bo *bo, *tmp;
+	struct ivpu_bo *bo;
 
-	list_for_each_entry_safe(bo, tmp, &ctx->bo_list, ctx_node)
-		ivpu_bo_free_vpu_addr(bo);
+	if (drm_WARN_ON(&vdev->drm, !ctx))
+		return;
+
+	mutex_lock(&vdev->bo_list_lock);
+	list_for_each_entry(bo, &vdev->bo_list, bo_list_node) {
+		mutex_lock(&bo->lock);
+		if (bo->ctx == ctx)
+			ivpu_bo_unbind_locked(bo);
+		mutex_unlock(&bo->lock);
+	}
+	mutex_unlock(&vdev->bo_list_lock);
+}
+
+struct drm_gem_object *ivpu_gem_create_object(struct drm_device *dev, size_t size)
+{
+	struct ivpu_bo *bo;
+
+	if (size == 0 || !PAGE_ALIGNED(size))
+		return ERR_PTR(-EINVAL);
+
+	bo = kzalloc(sizeof(*bo), GFP_KERNEL);
+	if (!bo)
+		return ERR_PTR(-ENOMEM);
+
+	bo->base.base.funcs = &ivpu_gem_funcs;
+	bo->base.pages_mark_dirty_on_put = true; /* VPU can dirty a BO anytime */
+
+	INIT_LIST_HEAD(&bo->bo_list_node);
+	mutex_init(&bo->lock);
+
+	return &bo->base.base;
 }
 
 static struct ivpu_bo *
-ivpu_bo_alloc(struct ivpu_device *vdev, struct ivpu_mmu_context *mmu_context,
-	      u64 size, u32 flags, const struct ivpu_bo_ops *ops,
-	      const struct ivpu_addr_range *range, u64 user_ptr)
+ivpu_bo_create(struct ivpu_device *vdev, u64 size, u32 flags)
 {
+	struct drm_gem_shmem_object *shmem;
 	struct ivpu_bo *bo;
-	int ret = 0;
-
-	if (drm_WARN_ON(&vdev->drm, size == 0 || !PAGE_ALIGNED(size)))
-		return ERR_PTR(-EINVAL);
 
 	switch (flags & DRM_IVPU_BO_CACHE_MASK) {
 	case DRM_IVPU_BO_CACHED:
-	case DRM_IVPU_BO_UNCACHED:
 	case DRM_IVPU_BO_WC:
 		break;
 	default:
 		return ERR_PTR(-EINVAL);
 	}
 
-	bo = kzalloc(sizeof(*bo), GFP_KERNEL);
-	if (!bo)
-		return ERR_PTR(-ENOMEM);
+	shmem = drm_gem_shmem_create(&vdev->drm, size);
+	if (IS_ERR(shmem))
+		return ERR_CAST(shmem);
 
-	mutex_init(&bo->lock);
-	bo->base.funcs = &ivpu_gem_funcs;
+	bo = to_ivpu_bo(&shmem->base);
+	bo->base.map_wc = flags & DRM_IVPU_BO_WC;
 	bo->flags = flags;
-	bo->ops = ops;
-	bo->user_ptr = user_ptr;
 
-	if (ops->type == IVPU_BO_TYPE_SHMEM)
-		ret = drm_gem_object_init(&vdev->drm, &bo->base, size);
-	else
-		drm_gem_private_object_init(&vdev->drm, &bo->base, size);
+	mutex_lock(&vdev->bo_list_lock);
+	list_add_tail(&bo->bo_list_node, &vdev->bo_list);
+	mutex_unlock(&vdev->bo_list_lock);
 
-	if (ret) {
-		ivpu_err(vdev, "Failed to initialize drm object\n");
-		goto err_free;
-	}
-
-	if (flags & DRM_IVPU_BO_MAPPABLE) {
-		ret = drm_gem_create_mmap_offset(&bo->base);
-		if (ret) {
-			ivpu_err(vdev, "Failed to allocate mmap offset\n");
-			goto err_release;
-		}
-	}
-
-	if (mmu_context) {
-		ret = ivpu_bo_alloc_vpu_addr(bo, mmu_context, range);
-		if (ret) {
-			ivpu_err(vdev, "Failed to add BO to context: %d\n", ret);
-			goto err_release;
-		}
-	}
+	ivpu_dbg(vdev, BO, "create: vpu_addr 0x%llx size %zu flags 0x%x\n",
+		 bo->vpu_addr, bo->base.base.size, flags);
 
 	return bo;
+}
 
-err_release:
-	drm_gem_object_release(&bo->base);
-err_free:
-	kfree(bo);
-	return ERR_PTR(ret);
+static int ivpu_bo_open(struct drm_gem_object *obj, struct drm_file *file)
+{
+	struct ivpu_file_priv *file_priv = file->driver_priv;
+	struct ivpu_device *vdev = file_priv->vdev;
+	struct ivpu_bo *bo = to_ivpu_bo(obj);
+	struct ivpu_addr_range *range;
+
+	if (bo->flags & DRM_IVPU_BO_SHAVE_MEM)
+		range = &vdev->hw->ranges.shave;
+	else if (bo->flags & DRM_IVPU_BO_DMA_MEM)
+		range = &vdev->hw->ranges.dma;
+	else
+		range = &vdev->hw->ranges.user;
+
+	return ivpu_bo_alloc_vpu_addr(bo, &file_priv->ctx, range);
 }
 
 static void ivpu_bo_free(struct drm_gem_object *obj)
 {
+	struct ivpu_device *vdev = to_ivpu_device(obj->dev);
 	struct ivpu_bo *bo = to_ivpu_bo(obj);
-	struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
 
-	if (bo->ctx)
-		ivpu_dbg(vdev, BO, "free: ctx %d vpu_addr 0x%llx allocated %d mmu_mapped %d\n",
-			 bo->ctx->id, bo->vpu_addr, (bool)bo->sgt, bo->mmu_mapped);
-	else
-		ivpu_dbg(vdev, BO, "free: ctx (released) allocated %d mmu_mapped %d\n",
-			 (bool)bo->sgt, bo->mmu_mapped);
+	mutex_lock(&vdev->bo_list_lock);
+	list_del(&bo->bo_list_node);
+	mutex_unlock(&vdev->bo_list_lock);
 
 	drm_WARN_ON(&vdev->drm, !dma_resv_test_signaled(obj->resv, DMA_RESV_USAGE_READ));
 
-	vunmap(bo->kvaddr);
-
-	if (bo->ctx)
-		ivpu_bo_free_vpu_addr(bo);
-
-	if (bo->sgt)
-		ivpu_bo_unmap_and_free_pages(bo);
-
-	if (bo->base.import_attach)
-		drm_prime_gem_destroy(&bo->base, bo->sgt);
-
-	drm_gem_object_release(&bo->base);
+	ivpu_dbg_bo(vdev, bo, "free");
 
+	ivpu_bo_unbind(bo);
 	mutex_destroy(&bo->lock);
-	kfree(bo);
+
+	drm_WARN_ON(obj->dev, bo->base.pages_use_count > 1);
+	drm_gem_shmem_free(&bo->base);
 }
 
-static int ivpu_bo_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
-{
-	struct ivpu_bo *bo = to_ivpu_bo(obj);
-	struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
-
-	ivpu_dbg(vdev, BO, "mmap: ctx %u handle %u vpu_addr 0x%llx size %zu type %s",
-		 bo->ctx->id, bo->handle, bo->vpu_addr, ivpu_bo_size(bo), bo->ops->name);
-
-	if (obj->import_attach) {
-		/* Drop the reference drm_gem_mmap_obj() acquired.*/
-		drm_gem_object_put(obj);
-		vma->vm_private_data = NULL;
-		return dma_buf_mmap(obj->dma_buf, vma, 0);
-	}
-
-	vm_flags_set(vma, VM_PFNMAP | VM_DONTEXPAND);
-	vma->vm_page_prot = ivpu_bo_pgprot(bo, vm_get_page_prot(vma->vm_flags));
-
-	return 0;
-}
-
-static struct sg_table *ivpu_bo_get_sg_table(struct drm_gem_object *obj)
-{
-	struct ivpu_bo *bo = to_ivpu_bo(obj);
-	loff_t npages = obj->size >> PAGE_SHIFT;
-	int ret = 0;
-
-	mutex_lock(&bo->lock);
-
-	if (!bo->sgt)
-		ret = ivpu_bo_alloc_and_map_pages_locked(bo);
-
-	mutex_unlock(&bo->lock);
-
-	if (ret)
-		return ERR_PTR(ret);
-
-	return drm_prime_pages_to_sg(obj->dev, bo->pages, npages);
-}
-
-static vm_fault_t ivpu_vm_fault(struct vm_fault *vmf)
-{
-	struct vm_area_struct *vma = vmf->vma;
-	struct drm_gem_object *obj = vma->vm_private_data;
-	struct ivpu_bo *bo = to_ivpu_bo(obj);
-	loff_t npages = obj->size >> PAGE_SHIFT;
-	pgoff_t page_offset;
-	struct page *page;
-	vm_fault_t ret;
-	int err;
-
-	mutex_lock(&bo->lock);
-
-	if (!bo->sgt) {
-		err = ivpu_bo_alloc_and_map_pages_locked(bo);
-		if (err) {
-			ret = vmf_error(err);
-			goto unlock;
-		}
-	}
-
-	/* We don't use vmf->pgoff since that has the fake offset */
-	page_offset = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
-	if (page_offset >= npages) {
-		ret = VM_FAULT_SIGBUS;
-	} else {
-		page = bo->pages[page_offset];
-		ret = vmf_insert_pfn(vma, vmf->address, page_to_pfn(page));
-	}
-
-unlock:
-	mutex_unlock(&bo->lock);
-
-	return ret;
-}
-
-static const struct vm_operations_struct ivpu_vm_ops = {
-	.fault = ivpu_vm_fault,
-	.open = drm_gem_vm_open,
-	.close = drm_gem_vm_close,
+static const struct dma_buf_ops ivpu_bo_dmabuf_ops =  {
+	.cache_sgt_mapping = true,
+	.attach = drm_gem_map_attach,
+	.detach = drm_gem_map_detach,
+	.map_dma_buf = drm_gem_map_dma_buf,
+	.unmap_dma_buf = drm_gem_unmap_dma_buf,
+	.release = drm_gem_dmabuf_release,
+	.mmap = drm_gem_dmabuf_mmap,
+	.vmap = drm_gem_dmabuf_vmap,
+	.vunmap = drm_gem_dmabuf_vunmap,
 };
 
+static struct dma_buf *ivpu_bo_export(struct drm_gem_object *obj, int flags)
+{
+	struct drm_device *dev = obj->dev;
+	struct dma_buf_export_info exp_info = {
+		.exp_name = KBUILD_MODNAME,
+		.owner = dev->driver->fops->owner,
+		.ops = &ivpu_bo_dmabuf_ops,
+		.size = obj->size,
+		.flags = flags,
+		.priv = obj,
+		.resv = obj->resv,
+	};
+	void *sgt;
+
+	/*
+	 * Make sure that pages are allocated and dma-mapped before exporting the bo.
+	 * DMA-mapping is required if the bo will be imported to the same device.
+	 */
+	sgt = drm_gem_shmem_get_pages_sgt(to_drm_gem_shmem_obj(obj));
+	if (IS_ERR(sgt))
+		return sgt;
+
+	return drm_gem_dmabuf_export(dev, &exp_info);
+}
+
 static const struct drm_gem_object_funcs ivpu_gem_funcs = {
 	.free = ivpu_bo_free,
-	.mmap = ivpu_bo_mmap,
-	.vm_ops = &ivpu_vm_ops,
-	.get_sg_table = ivpu_bo_get_sg_table,
+	.open = ivpu_bo_open,
+	.export = ivpu_bo_export,
+	.print_info = drm_gem_shmem_object_print_info,
+	.pin = drm_gem_shmem_object_pin,
+	.unpin = drm_gem_shmem_object_unpin,
+	.get_sg_table = drm_gem_shmem_object_get_sg_table,
+	.vmap = drm_gem_shmem_object_vmap,
+	.vunmap = drm_gem_shmem_object_vunmap,
+	.mmap = drm_gem_shmem_object_mmap,
+	.vm_ops = &drm_gem_shmem_vm_ops,
 };
 
-int
-ivpu_bo_create_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
+int ivpu_bo_create_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 {
 	struct ivpu_file_priv *file_priv = file->driver_priv;
 	struct ivpu_device *vdev = file_priv->vdev;
@@ -537,23 +308,20 @@ ivpu_bo_create_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	if (size == 0)
 		return -EINVAL;
 
-	bo = ivpu_bo_alloc(vdev, &file_priv->ctx, size, args->flags, &shmem_ops, NULL, 0);
+	bo = ivpu_bo_create(vdev, size, args->flags);
 	if (IS_ERR(bo)) {
 		ivpu_err(vdev, "Failed to create BO: %pe (ctx %u size %llu flags 0x%x)",
 			 bo, file_priv->ctx.id, args->size, args->flags);
 		return PTR_ERR(bo);
 	}
 
-	ret = drm_gem_handle_create(file, &bo->base, &bo->handle);
+	ret = drm_gem_handle_create(file, &bo->base.base, &bo->handle);
 	if (!ret) {
 		args->vpu_addr = bo->vpu_addr;
 		args->handle = bo->handle;
 	}
 
-	drm_gem_object_put(&bo->base);
-
-	ivpu_dbg(vdev, BO, "alloc shmem: ctx %u vpu_addr 0x%llx size %zu flags 0x%x\n",
-		 file_priv->ctx.id, bo->vpu_addr, ivpu_bo_size(bo), bo->flags);
+	drm_gem_object_put(&bo->base.base);
 
 	return ret;
 }
@@ -563,8 +331,8 @@ ivpu_bo_alloc_internal(struct ivpu_device *vdev, u64 vpu_addr, u64 size, u32 fla
 {
 	const struct ivpu_addr_range *range;
 	struct ivpu_addr_range fixed_range;
+	struct iosys_map map;
 	struct ivpu_bo *bo;
-	pgprot_t prot;
 	int ret;
 
 	drm_WARN_ON(&vdev->drm, !PAGE_ALIGNED(vpu_addr));
@@ -578,81 +346,42 @@ ivpu_bo_alloc_internal(struct ivpu_device *vdev, u64 vpu_addr, u64 size, u32 fla
 		range = &vdev->hw->ranges.global;
 	}
 
-	bo = ivpu_bo_alloc(vdev, &vdev->gctx, size, flags, &internal_ops, range, 0);
+	bo = ivpu_bo_create(vdev, size, flags);
 	if (IS_ERR(bo)) {
 		ivpu_err(vdev, "Failed to create BO: %pe (vpu_addr 0x%llx size %llu flags 0x%x)",
 			 bo, vpu_addr, size, flags);
 		return NULL;
 	}
 
+	ret = ivpu_bo_alloc_vpu_addr(bo, &vdev->gctx, range);
+	if (ret)
+		goto err_put;
+
 	ret = ivpu_bo_pin(bo);
 	if (ret)
 		goto err_put;
 
-	if (ivpu_bo_cache_mode(bo) != DRM_IVPU_BO_CACHED)
-		drm_clflush_pages(bo->pages, ivpu_bo_size(bo) >> PAGE_SHIFT);
-
-	if (bo->flags & DRM_IVPU_BO_WC)
-		set_pages_array_wc(bo->pages, ivpu_bo_size(bo) >> PAGE_SHIFT);
-	else if (bo->flags & DRM_IVPU_BO_UNCACHED)
-		set_pages_array_uc(bo->pages, ivpu_bo_size(bo) >> PAGE_SHIFT);
-
-	prot = ivpu_bo_pgprot(bo, PAGE_KERNEL);
-	bo->kvaddr = vmap(bo->pages, ivpu_bo_size(bo) >> PAGE_SHIFT, VM_MAP, prot);
-	if (!bo->kvaddr) {
-		ivpu_err(vdev, "Failed to map BO into kernel virtual memory\n");
+	ret = drm_gem_shmem_vmap(&bo->base, &map);
+	if (ret)
 		goto err_put;
-	}
-
-	ivpu_dbg(vdev, BO, "alloc internal: ctx 0 vpu_addr 0x%llx size %zu flags 0x%x\n",
-		 bo->vpu_addr, ivpu_bo_size(bo), flags);
 
 	return bo;
 
 err_put:
-	drm_gem_object_put(&bo->base);
+	drm_gem_object_put(&bo->base.base);
 	return NULL;
 }
 
 void ivpu_bo_free_internal(struct ivpu_bo *bo)
 {
-	drm_gem_object_put(&bo->base);
-}
+	struct iosys_map map = IOSYS_MAP_INIT_VADDR(bo->base.vaddr);
 
-struct drm_gem_object *ivpu_gem_prime_import(struct drm_device *dev, struct dma_buf *buf)
-{
-	struct ivpu_device *vdev = to_ivpu_device(dev);
-	struct dma_buf_attachment *attach;
-	struct ivpu_bo *bo;
-
-	attach = dma_buf_attach(buf, dev->dev);
-	if (IS_ERR(attach))
-		return ERR_CAST(attach);
-
-	get_dma_buf(buf);
-
-	bo = ivpu_bo_alloc(vdev, NULL, buf->size, DRM_IVPU_BO_MAPPABLE, &prime_ops, NULL, 0);
-	if (IS_ERR(bo)) {
-		ivpu_err(vdev, "Failed to import BO: %pe (size %lu)", bo, buf->size);
-		goto err_detach;
-	}
-
-	lockdep_set_class(&bo->lock, &prime_bo_lock_class_key);
-
-	bo->base.import_attach = attach;
-
-	return &bo->base;
-
-err_detach:
-	dma_buf_detach(buf, attach);
-	dma_buf_put(buf);
-	return ERR_CAST(bo);
+	drm_gem_shmem_vunmap(&bo->base, &map);
+	drm_gem_object_put(&bo->base.base);
 }
 
 int ivpu_bo_info_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 {
-	struct ivpu_file_priv *file_priv = file->driver_priv;
-	struct ivpu_device *vdev = to_ivpu_device(dev);
 	struct drm_ivpu_bo_info *args = data;
 	struct drm_gem_object *obj;
 	struct ivpu_bo *bo;
@@ -665,21 +394,12 @@ int ivpu_bo_info_ioctl(struct drm_device *dev, void *data, struct drm_file *file
 	bo = to_ivpu_bo(obj);
 
 	mutex_lock(&bo->lock);
-
-	if (!bo->ctx) {
-		ret = ivpu_bo_alloc_vpu_addr(bo, &file_priv->ctx, NULL);
-		if (ret) {
-			ivpu_err(vdev, "Failed to allocate vpu_addr: %d\n", ret);
-			goto unlock;
-		}
-	}
-
 	args->flags = bo->flags;
 	args->mmap_offset = drm_vma_node_offset_addr(&obj->vma_node);
 	args->vpu_addr = bo->vpu_addr;
 	args->size = obj->size;
-unlock:
 	mutex_unlock(&bo->lock);
+
 	drm_gem_object_put(obj);
 	return ret;
 }
@@ -714,41 +434,41 @@ static void ivpu_bo_print_info(struct ivpu_bo *bo, struct drm_printer *p)
 {
 	unsigned long dma_refcount = 0;
 
-	if (bo->base.dma_buf && bo->base.dma_buf->file)
-		dma_refcount = atomic_long_read(&bo->base.dma_buf->file->f_count);
+	mutex_lock(&bo->lock);
 
-	drm_printf(p, "%5u %6d %16llx %10lu %10u %12lu %14s\n",
-		   bo->ctx->id, bo->handle, bo->vpu_addr, ivpu_bo_size(bo),
-		   kref_read(&bo->base.refcount), dma_refcount, bo->ops->name);
+	if (bo->base.base.dma_buf && bo->base.base.dma_buf->file)
+		dma_refcount = atomic_long_read(&bo->base.base.dma_buf->file->f_count);
+
+	drm_printf(p, "%-3u %-6d 0x%-12llx %-10lu 0x%-8x %-4u %-8lu",
+		   bo->ctx->id, bo->handle, bo->vpu_addr, bo->base.base.size,
+		   bo->flags, kref_read(&bo->base.base.refcount), dma_refcount);
+
+	if (bo->base.base.import_attach)
+		drm_printf(p, " imported");
+
+	if (bo->base.pages)
+		drm_printf(p, " has_pages");
+
+	if (bo->mmu_mapped)
+		drm_printf(p, " mmu_mapped");
+
+	drm_printf(p, "\n");
+
+	mutex_unlock(&bo->lock);
 }
 
 void ivpu_bo_list(struct drm_device *dev, struct drm_printer *p)
 {
 	struct ivpu_device *vdev = to_ivpu_device(dev);
-	struct ivpu_file_priv *file_priv;
-	unsigned long ctx_id;
 	struct ivpu_bo *bo;
 
-	drm_printf(p, "%5s %6s %16s %10s %10s %12s %14s\n",
-		   "ctx", "handle", "vpu_addr", "size", "refcount", "dma_refcount", "type");
+	drm_printf(p, "%-3s %-6s %-14s %-10s %-10s %-4s %-8s %s\n",
+		   "ctx", "handle", "vpu_addr", "size", "flags", "refs", "dma_refs", "attribs");
 
-	mutex_lock(&vdev->gctx.lock);
-	list_for_each_entry(bo, &vdev->gctx.bo_list, ctx_node)
+	mutex_lock(&vdev->bo_list_lock);
+	list_for_each_entry(bo, &vdev->bo_list, bo_list_node)
 		ivpu_bo_print_info(bo, p);
-	mutex_unlock(&vdev->gctx.lock);
-
-	xa_for_each(&vdev->context_xa, ctx_id, file_priv) {
-		file_priv = ivpu_file_priv_get_by_ctx_id(vdev, ctx_id);
-		if (!file_priv)
-			continue;
-
-		mutex_lock(&file_priv->ctx.lock);
-		list_for_each_entry(bo, &file_priv->ctx.bo_list, ctx_node)
-			ivpu_bo_print_info(bo, p);
-		mutex_unlock(&file_priv->ctx.lock);
-
-		ivpu_file_priv_put(&file_priv);
-	}
+	mutex_unlock(&vdev->bo_list_lock);
 }
 
 void ivpu_bo_list_print(struct drm_device *dev)
diff --git a/drivers/accel/ivpu/ivpu_gem.h b/drivers/accel/ivpu/ivpu_gem.h
index a0b4d4a32b3b..d75cad0d3c74 100644
--- a/drivers/accel/ivpu/ivpu_gem.h
+++ b/drivers/accel/ivpu/ivpu_gem.h
@@ -6,84 +6,52 @@
 #define __IVPU_GEM_H__
 
 #include <drm/drm_gem.h>
+#include <drm/drm_gem_shmem_helper.h>
 #include <drm/drm_mm.h>
 
-struct dma_buf;
-struct ivpu_bo_ops;
 struct ivpu_file_priv;
 
 struct ivpu_bo {
-	struct drm_gem_object base;
-	const struct ivpu_bo_ops *ops;
-
+	struct drm_gem_shmem_object base;
 	struct ivpu_mmu_context *ctx;
-	struct list_head ctx_node;
+	struct list_head bo_list_node;
 	struct drm_mm_node mm_node;
 
-	struct mutex lock; /* Protects: pages, sgt, mmu_mapped */
-	struct sg_table *sgt;
-	struct page **pages;
-	bool mmu_mapped;
-
-	void *kvaddr;
+	struct mutex lock; /* Protects: ctx, mmu_mapped, vpu_addr */
 	u64 vpu_addr;
 	u32 handle;
 	u32 flags;
-	uintptr_t user_ptr;
-	u32 job_status;
-};
-
-enum ivpu_bo_type {
-	IVPU_BO_TYPE_SHMEM = 1,
-	IVPU_BO_TYPE_INTERNAL,
-	IVPU_BO_TYPE_PRIME,
-};
-
-struct ivpu_bo_ops {
-	enum ivpu_bo_type type;
-	const char *name;
-	int (*alloc_pages)(struct ivpu_bo *bo);
-	void (*free_pages)(struct ivpu_bo *bo);
-	int (*map_pages)(struct ivpu_bo *bo);
-	void (*unmap_pages)(struct ivpu_bo *bo);
+	u32 job_status; /* Valid only for command buffer */
+	bool mmu_mapped;
 };
 
 int ivpu_bo_pin(struct ivpu_bo *bo);
-void ivpu_bo_remove_all_bos_from_context(struct ivpu_mmu_context *ctx);
-void ivpu_bo_list(struct drm_device *dev, struct drm_printer *p);
-void ivpu_bo_list_print(struct drm_device *dev);
+void ivpu_bo_remove_all_bos_from_context(struct ivpu_device *vdev, struct ivpu_mmu_context *ctx);
 
-struct ivpu_bo *
-ivpu_bo_alloc_internal(struct ivpu_device *vdev, u64 vpu_addr, u64 size, u32 flags);
+struct drm_gem_object *ivpu_gem_create_object(struct drm_device *dev, size_t size);
+struct ivpu_bo *ivpu_bo_alloc_internal(struct ivpu_device *vdev, u64 vpu_addr, u64 size, u32 flags);
 void ivpu_bo_free_internal(struct ivpu_bo *bo);
-struct drm_gem_object *ivpu_gem_prime_import(struct drm_device *dev, struct dma_buf *dma_buf);
-void ivpu_bo_unmap_sgt_and_remove_from_context(struct ivpu_bo *bo);
 
 int ivpu_bo_create_ioctl(struct drm_device *dev, void *data, struct drm_file *file);
 int ivpu_bo_info_ioctl(struct drm_device *dev, void *data, struct drm_file *file);
 int ivpu_bo_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file);
 
+void ivpu_bo_list(struct drm_device *dev, struct drm_printer *p);
+void ivpu_bo_list_print(struct drm_device *dev);
+
 static inline struct ivpu_bo *to_ivpu_bo(struct drm_gem_object *obj)
 {
-	return container_of(obj, struct ivpu_bo, base);
+	return container_of(obj, struct ivpu_bo, base.base);
 }
 
 static inline void *ivpu_bo_vaddr(struct ivpu_bo *bo)
 {
-	return bo->kvaddr;
+	return bo->base.vaddr;
 }
 
 static inline size_t ivpu_bo_size(struct ivpu_bo *bo)
 {
-	return bo->base.size;
-}
-
-static inline struct page *ivpu_bo_get_page(struct ivpu_bo *bo, u64 offset)
-{
-	if (offset > ivpu_bo_size(bo) || !bo->pages)
-		return NULL;
-
-	return bo->pages[offset / PAGE_SIZE];
+	return bo->base.base.size;
 }
 
 static inline u32 ivpu_bo_cache_mode(struct ivpu_bo *bo)
@@ -96,20 +64,9 @@ static inline bool ivpu_bo_is_snooped(struct ivpu_bo *bo)
 	return ivpu_bo_cache_mode(bo) == DRM_IVPU_BO_CACHED;
 }
 
-static inline pgprot_t ivpu_bo_pgprot(struct ivpu_bo *bo, pgprot_t prot)
-{
-	if (bo->flags & DRM_IVPU_BO_WC)
-		return pgprot_writecombine(prot);
-
-	if (bo->flags & DRM_IVPU_BO_UNCACHED)
-		return pgprot_noncached(prot);
-
-	return prot;
-}
-
 static inline struct ivpu_device *ivpu_bo_to_vdev(struct ivpu_bo *bo)
 {
-	return to_ivpu_device(bo->base.dev);
+	return to_ivpu_device(bo->base.base.dev);
 }
 
 static inline void *ivpu_to_cpu_addr(struct ivpu_bo *bo, u32 vpu_addr)
diff --git a/drivers/accel/ivpu/ivpu_hw.h b/drivers/accel/ivpu/ivpu_hw.h
index 1079e06255ba..b2909168a0a6 100644
--- a/drivers/accel/ivpu/ivpu_hw.h
+++ b/drivers/accel/ivpu/ivpu_hw.h
@@ -15,8 +15,11 @@ struct ivpu_hw_ops {
 	int (*power_down)(struct ivpu_device *vdev);
 	int (*reset)(struct ivpu_device *vdev);
 	bool (*is_idle)(struct ivpu_device *vdev);
+	int (*wait_for_idle)(struct ivpu_device *vdev);
 	void (*wdt_disable)(struct ivpu_device *vdev);
 	void (*diagnose_failure)(struct ivpu_device *vdev);
+	u32 (*profiling_freq_get)(struct ivpu_device *vdev);
+	void (*profiling_freq_drive)(struct ivpu_device *vdev, bool enable);
 	u32 (*reg_pll_freq_get)(struct ivpu_device *vdev);
 	u32 (*reg_telemetry_offset_get)(struct ivpu_device *vdev);
 	u32 (*reg_telemetry_size_get)(struct ivpu_device *vdev);
@@ -58,6 +61,8 @@ struct ivpu_hw_info {
 	u32 sku;
 	u16 config;
 	int dma_bits;
+	ktime_t d0i3_entry_host_ts;
+	u64 d0i3_entry_vpu_ts;
 };
 
 extern const struct ivpu_hw_ops ivpu_hw_37xx_ops;
@@ -85,6 +90,11 @@ static inline bool ivpu_hw_is_idle(struct ivpu_device *vdev)
 	return vdev->hw->ops->is_idle(vdev);
 };
 
+static inline int ivpu_hw_wait_for_idle(struct ivpu_device *vdev)
+{
+	return vdev->hw->ops->wait_for_idle(vdev);
+};
+
 static inline int ivpu_hw_power_down(struct ivpu_device *vdev)
 {
 	ivpu_dbg(vdev, PM, "HW power down\n");
@@ -104,6 +114,16 @@ static inline void ivpu_hw_wdt_disable(struct ivpu_device *vdev)
 	vdev->hw->ops->wdt_disable(vdev);
 };
 
+static inline u32 ivpu_hw_profiling_freq_get(struct ivpu_device *vdev)
+{
+	return vdev->hw->ops->profiling_freq_get(vdev);
+};
+
+static inline void ivpu_hw_profiling_freq_drive(struct ivpu_device *vdev, bool enable)
+{
+	return vdev->hw->ops->profiling_freq_drive(vdev, enable);
+};
+
 /* Register indirect accesses */
 static inline u32 ivpu_hw_reg_pll_freq_get(struct ivpu_device *vdev)
 {
diff --git a/drivers/accel/ivpu/ivpu_hw_37xx.c b/drivers/accel/ivpu/ivpu_hw_37xx.c
index 5c0246b9e522..4ab1f14cf360 100644
--- a/drivers/accel/ivpu/ivpu_hw_37xx.c
+++ b/drivers/accel/ivpu/ivpu_hw_37xx.c
@@ -29,6 +29,7 @@
 
 #define PLL_REF_CLK_FREQ	     (50 * 1000000)
 #define PLL_SIMULATION_FREQ	     (10 * 1000000)
+#define PLL_PROF_CLK_FREQ	     (38400 * 1000)
 #define PLL_DEFAULT_EPP_VALUE	     0x80
 
 #define TIM_SAFE_ENABLE		     0xf1d0dead
@@ -37,7 +38,7 @@
 #define TIMEOUT_US		     (150 * USEC_PER_MSEC)
 #define PWR_ISLAND_STATUS_TIMEOUT_US (5 * USEC_PER_MSEC)
 #define PLL_TIMEOUT_US		     (1500 * USEC_PER_MSEC)
-#define IDLE_TIMEOUT_US		     (500 * USEC_PER_MSEC)
+#define IDLE_TIMEOUT_US		     (5 * USEC_PER_MSEC)
 
 #define ICB_0_IRQ_MASK ((REG_FLD(VPU_37XX_HOST_SS_ICB_STATUS_0, HOST_IPC_FIFO_INT)) | \
 			(REG_FLD(VPU_37XX_HOST_SS_ICB_STATUS_0, MMU_IRQ_0_INT)) | \
@@ -90,6 +91,7 @@ static void ivpu_hw_timeouts_init(struct ivpu_device *vdev)
 	vdev->timeout.tdr = 2000;
 	vdev->timeout.reschedule_suspend = 10;
 	vdev->timeout.autosuspend = 10;
+	vdev->timeout.d0i3_entry_msg = 5;
 }
 
 static int ivpu_pll_wait_for_cmd_send(struct ivpu_device *vdev)
@@ -651,10 +653,6 @@ static int ivpu_hw_37xx_power_up(struct ivpu_device *vdev)
 {
 	int ret;
 
-	ret = ivpu_hw_37xx_reset(vdev);
-	if (ret)
-		ivpu_warn(vdev, "Failed to reset HW: %d\n", ret);
-
 	ret = ivpu_hw_37xx_d0i3_disable(vdev);
 	if (ret)
 		ivpu_warn(vdev, "Failed to disable D0I3: %d\n", ret);
@@ -718,12 +716,28 @@ static bool ivpu_hw_37xx_is_idle(struct ivpu_device *vdev)
 	       REG_TEST_FLD(VPU_37XX_BUTTRESS_VPU_STATUS, IDLE, val);
 }
 
+static int ivpu_hw_37xx_wait_for_idle(struct ivpu_device *vdev)
+{
+	return REGB_POLL_FLD(VPU_37XX_BUTTRESS_VPU_STATUS, IDLE, 0x1, IDLE_TIMEOUT_US);
+}
+
+static void ivpu_hw_37xx_save_d0i3_entry_timestamp(struct ivpu_device *vdev)
+{
+	vdev->hw->d0i3_entry_host_ts = ktime_get_boottime();
+	vdev->hw->d0i3_entry_vpu_ts = REGV_RD64(VPU_37XX_CPU_SS_TIM_PERF_FREE_CNT);
+}
+
 static int ivpu_hw_37xx_power_down(struct ivpu_device *vdev)
 {
 	int ret = 0;
 
-	if (!ivpu_hw_37xx_is_idle(vdev) && ivpu_hw_37xx_reset(vdev))
-		ivpu_err(vdev, "Failed to reset the VPU\n");
+	ivpu_hw_37xx_save_d0i3_entry_timestamp(vdev);
+
+	if (!ivpu_hw_37xx_is_idle(vdev)) {
+		ivpu_warn(vdev, "VPU not idle during power down\n");
+		if (ivpu_hw_37xx_reset(vdev))
+			ivpu_warn(vdev, "Failed to reset the VPU\n");
+	}
 
 	if (ivpu_pll_disable(vdev)) {
 		ivpu_err(vdev, "Failed to disable PLL\n");
@@ -756,6 +770,16 @@ static void ivpu_hw_37xx_wdt_disable(struct ivpu_device *vdev)
 	REGV_WR32(VPU_37XX_CPU_SS_TIM_GEN_CONFIG, val);
 }
 
+static u32 ivpu_hw_37xx_profiling_freq_get(struct ivpu_device *vdev)
+{
+	return PLL_PROF_CLK_FREQ;
+}
+
+static void ivpu_hw_37xx_profiling_freq_drive(struct ivpu_device *vdev, bool enable)
+{
+	/* Profiling freq - is a debug feature. Unavailable on VPU 37XX. */
+}
+
 static u32 ivpu_hw_37xx_pll_to_freq(u32 ratio, u32 config)
 {
 	u32 pll_clock = PLL_REF_CLK_FREQ * ratio;
@@ -867,17 +891,20 @@ static void ivpu_hw_37xx_irq_noc_firewall_handler(struct ivpu_device *vdev)
 }
 
 /* Handler for IRQs from VPU core (irqV) */
-static u32 ivpu_hw_37xx_irqv_handler(struct ivpu_device *vdev, int irq)
+static bool ivpu_hw_37xx_irqv_handler(struct ivpu_device *vdev, int irq, bool *wake_thread)
 {
 	u32 status = REGV_RD32(VPU_37XX_HOST_SS_ICB_STATUS_0) & ICB_0_IRQ_MASK;
 
+	if (!status)
+		return false;
+
 	REGV_WR32(VPU_37XX_HOST_SS_ICB_CLEAR_0, status);
 
 	if (REG_TEST_FLD(VPU_37XX_HOST_SS_ICB_STATUS_0, MMU_IRQ_0_INT, status))
 		ivpu_mmu_irq_evtq_handler(vdev);
 
 	if (REG_TEST_FLD(VPU_37XX_HOST_SS_ICB_STATUS_0, HOST_IPC_FIFO_INT, status))
-		ivpu_ipc_irq_handler(vdev);
+		ivpu_ipc_irq_handler(vdev, wake_thread);
 
 	if (REG_TEST_FLD(VPU_37XX_HOST_SS_ICB_STATUS_0, MMU_IRQ_1_INT, status))
 		ivpu_dbg(vdev, IRQ, "MMU sync complete\n");
@@ -894,17 +921,17 @@ static u32 ivpu_hw_37xx_irqv_handler(struct ivpu_device *vdev, int irq)
 	if (REG_TEST_FLD(VPU_37XX_HOST_SS_ICB_STATUS_0, NOC_FIREWALL_INT, status))
 		ivpu_hw_37xx_irq_noc_firewall_handler(vdev);
 
-	return status;
+	return true;
 }
 
 /* Handler for IRQs from Buttress core (irqB) */
-static u32 ivpu_hw_37xx_irqb_handler(struct ivpu_device *vdev, int irq)
+static bool ivpu_hw_37xx_irqb_handler(struct ivpu_device *vdev, int irq)
 {
 	u32 status = REGB_RD32(VPU_37XX_BUTTRESS_INTERRUPT_STAT) & BUTTRESS_IRQ_MASK;
 	bool schedule_recovery = false;
 
-	if (status == 0)
-		return 0;
+	if (!status)
+		return false;
 
 	if (REG_TEST_FLD(VPU_37XX_BUTTRESS_INTERRUPT_STAT, FREQ_CHANGE, status))
 		ivpu_dbg(vdev, IRQ, "FREQ_CHANGE irq: %08x",
@@ -940,23 +967,27 @@ static u32 ivpu_hw_37xx_irqb_handler(struct ivpu_device *vdev, int irq)
 	if (schedule_recovery)
 		ivpu_pm_schedule_recovery(vdev);
 
-	return status;
+	return true;
 }
 
 static irqreturn_t ivpu_hw_37xx_irq_handler(int irq, void *ptr)
 {
 	struct ivpu_device *vdev = ptr;
-	u32 ret_irqv, ret_irqb;
+	bool irqv_handled, irqb_handled, wake_thread = false;
 
 	REGB_WR32(VPU_37XX_BUTTRESS_GLOBAL_INT_MASK, 0x1);
 
-	ret_irqv = ivpu_hw_37xx_irqv_handler(vdev, irq);
-	ret_irqb = ivpu_hw_37xx_irqb_handler(vdev, irq);
+	irqv_handled = ivpu_hw_37xx_irqv_handler(vdev, irq, &wake_thread);
+	irqb_handled = ivpu_hw_37xx_irqb_handler(vdev, irq);
 
 	/* Re-enable global interrupts to re-trigger MSI for pending interrupts */
 	REGB_WR32(VPU_37XX_BUTTRESS_GLOBAL_INT_MASK, 0x0);
 
-	return IRQ_RETVAL(ret_irqb | ret_irqv);
+	if (wake_thread)
+		return IRQ_WAKE_THREAD;
+	if (irqv_handled || irqb_handled)
+		return IRQ_HANDLED;
+	return IRQ_NONE;
 }
 
 static void ivpu_hw_37xx_diagnose_failure(struct ivpu_device *vdev)
@@ -993,11 +1024,14 @@ const struct ivpu_hw_ops ivpu_hw_37xx_ops = {
 	.info_init = ivpu_hw_37xx_info_init,
 	.power_up = ivpu_hw_37xx_power_up,
 	.is_idle = ivpu_hw_37xx_is_idle,
+	.wait_for_idle = ivpu_hw_37xx_wait_for_idle,
 	.power_down = ivpu_hw_37xx_power_down,
 	.reset = ivpu_hw_37xx_reset,
 	.boot_fw = ivpu_hw_37xx_boot_fw,
 	.wdt_disable = ivpu_hw_37xx_wdt_disable,
 	.diagnose_failure = ivpu_hw_37xx_diagnose_failure,
+	.profiling_freq_get = ivpu_hw_37xx_profiling_freq_get,
+	.profiling_freq_drive = ivpu_hw_37xx_profiling_freq_drive,
 	.reg_pll_freq_get = ivpu_hw_37xx_reg_pll_freq_get,
 	.reg_telemetry_offset_get = ivpu_hw_37xx_reg_telemetry_offset_get,
 	.reg_telemetry_size_get = ivpu_hw_37xx_reg_telemetry_size_get,
diff --git a/drivers/accel/ivpu/ivpu_hw_37xx_reg.h b/drivers/accel/ivpu/ivpu_hw_37xx_reg.h
index 4083beb5e9db..f6fec1919202 100644
--- a/drivers/accel/ivpu/ivpu_hw_37xx_reg.h
+++ b/drivers/accel/ivpu/ivpu_hw_37xx_reg.h
@@ -240,6 +240,8 @@
 #define VPU_37XX_CPU_SS_TIM_GEN_CONFIG					0x06021008u
 #define VPU_37XX_CPU_SS_TIM_GEN_CONFIG_WDOG_TO_INT_CLR_MASK		BIT_MASK(9)
 
+#define VPU_37XX_CPU_SS_TIM_PERF_FREE_CNT				0x06029000u
+
 #define VPU_37XX_CPU_SS_DOORBELL_0					0x06300000u
 #define VPU_37XX_CPU_SS_DOORBELL_0_SET_MASK				BIT_MASK(0)
 
diff --git a/drivers/accel/ivpu/ivpu_hw_40xx.c b/drivers/accel/ivpu/ivpu_hw_40xx.c
index e691c49c9841..eba2fdef2ace 100644
--- a/drivers/accel/ivpu/ivpu_hw_40xx.c
+++ b/drivers/accel/ivpu/ivpu_hw_40xx.c
@@ -39,6 +39,7 @@
 #define TIMEOUT_US		     (150 * USEC_PER_MSEC)
 #define PWR_ISLAND_STATUS_TIMEOUT_US (5 * USEC_PER_MSEC)
 #define PLL_TIMEOUT_US		     (1500 * USEC_PER_MSEC)
+#define IDLE_TIMEOUT_US		     (5 * USEC_PER_MSEC)
 
 #define WEIGHTS_DEFAULT              0xf711f711u
 #define WEIGHTS_ATS_DEFAULT          0x0000f711u
@@ -139,18 +140,21 @@ static void ivpu_hw_timeouts_init(struct ivpu_device *vdev)
 		vdev->timeout.tdr = 2000000;
 		vdev->timeout.reschedule_suspend = 1000;
 		vdev->timeout.autosuspend = -1;
+		vdev->timeout.d0i3_entry_msg = 500;
 	} else if (ivpu_is_simics(vdev)) {
 		vdev->timeout.boot = 50;
 		vdev->timeout.jsm = 500;
 		vdev->timeout.tdr = 10000;
 		vdev->timeout.reschedule_suspend = 10;
 		vdev->timeout.autosuspend = -1;
+		vdev->timeout.d0i3_entry_msg = 100;
 	} else {
 		vdev->timeout.boot = 1000;
 		vdev->timeout.jsm = 500;
 		vdev->timeout.tdr = 2000;
 		vdev->timeout.reschedule_suspend = 10;
 		vdev->timeout.autosuspend = 10;
+		vdev->timeout.d0i3_entry_msg = 5;
 	}
 }
 
@@ -824,12 +828,6 @@ static int ivpu_hw_40xx_power_up(struct ivpu_device *vdev)
 {
 	int ret;
 
-	ret = ivpu_hw_40xx_reset(vdev);
-	if (ret) {
-		ivpu_err(vdev, "Failed to reset HW: %d\n", ret);
-		return ret;
-	}
-
 	ret = ivpu_hw_40xx_d0i3_disable(vdev);
 	if (ret)
 		ivpu_warn(vdev, "Failed to disable D0I3: %d\n", ret);
@@ -898,10 +896,23 @@ static bool ivpu_hw_40xx_is_idle(struct ivpu_device *vdev)
 	       REG_TEST_FLD(VPU_40XX_BUTTRESS_VPU_STATUS, IDLE, val);
 }
 
+static int ivpu_hw_40xx_wait_for_idle(struct ivpu_device *vdev)
+{
+	return REGB_POLL_FLD(VPU_40XX_BUTTRESS_VPU_STATUS, IDLE, 0x1, IDLE_TIMEOUT_US);
+}
+
+static void ivpu_hw_40xx_save_d0i3_entry_timestamp(struct ivpu_device *vdev)
+{
+	vdev->hw->d0i3_entry_host_ts = ktime_get_boottime();
+	vdev->hw->d0i3_entry_vpu_ts = REGV_RD64(VPU_40XX_CPU_SS_TIM_PERF_EXT_FREE_CNT);
+}
+
 static int ivpu_hw_40xx_power_down(struct ivpu_device *vdev)
 {
 	int ret = 0;
 
+	ivpu_hw_40xx_save_d0i3_entry_timestamp(vdev);
+
 	if (!ivpu_hw_40xx_is_idle(vdev) && ivpu_hw_40xx_reset(vdev))
 		ivpu_warn(vdev, "Failed to reset the VPU\n");
 
@@ -933,6 +944,19 @@ static void ivpu_hw_40xx_wdt_disable(struct ivpu_device *vdev)
 	REGV_WR32(VPU_40XX_CPU_SS_TIM_GEN_CONFIG, val);
 }
 
+static u32 ivpu_hw_40xx_profiling_freq_get(struct ivpu_device *vdev)
+{
+	return vdev->hw->pll.profiling_freq;
+}
+
+static void ivpu_hw_40xx_profiling_freq_drive(struct ivpu_device *vdev, bool enable)
+{
+	if (enable)
+		vdev->hw->pll.profiling_freq = PLL_PROFILING_FREQ_HIGH;
+	else
+		vdev->hw->pll.profiling_freq = PLL_PROFILING_FREQ_DEFAULT;
+}
+
 /* Register indirect accesses */
 static u32 ivpu_hw_40xx_reg_pll_freq_get(struct ivpu_device *vdev)
 {
@@ -1023,13 +1047,12 @@ static void ivpu_hw_40xx_irq_noc_firewall_handler(struct ivpu_device *vdev)
 }
 
 /* Handler for IRQs from VPU core (irqV) */
-static irqreturn_t ivpu_hw_40xx_irqv_handler(struct ivpu_device *vdev, int irq)
+static bool ivpu_hw_40xx_irqv_handler(struct ivpu_device *vdev, int irq, bool *wake_thread)
 {
 	u32 status = REGV_RD32(VPU_40XX_HOST_SS_ICB_STATUS_0) & ICB_0_IRQ_MASK;
-	irqreturn_t ret = IRQ_NONE;
 
 	if (!status)
-		return IRQ_NONE;
+		return false;
 
 	REGV_WR32(VPU_40XX_HOST_SS_ICB_CLEAR_0, status);
 
@@ -1037,7 +1060,7 @@ static irqreturn_t ivpu_hw_40xx_irqv_handler(struct ivpu_device *vdev, int irq)
 		ivpu_mmu_irq_evtq_handler(vdev);
 
 	if (REG_TEST_FLD(VPU_40XX_HOST_SS_ICB_STATUS_0, HOST_IPC_FIFO_INT, status))
-		ret |= ivpu_ipc_irq_handler(vdev);
+		ivpu_ipc_irq_handler(vdev, wake_thread);
 
 	if (REG_TEST_FLD(VPU_40XX_HOST_SS_ICB_STATUS_0, MMU_IRQ_1_INT, status))
 		ivpu_dbg(vdev, IRQ, "MMU sync complete\n");
@@ -1054,17 +1077,17 @@ static irqreturn_t ivpu_hw_40xx_irqv_handler(struct ivpu_device *vdev, int irq)
 	if (REG_TEST_FLD(VPU_40XX_HOST_SS_ICB_STATUS_0, NOC_FIREWALL_INT, status))
 		ivpu_hw_40xx_irq_noc_firewall_handler(vdev);
 
-	return ret;
+	return true;
 }
 
 /* Handler for IRQs from Buttress core (irqB) */
-static irqreturn_t ivpu_hw_40xx_irqb_handler(struct ivpu_device *vdev, int irq)
+static bool ivpu_hw_40xx_irqb_handler(struct ivpu_device *vdev, int irq)
 {
 	bool schedule_recovery = false;
 	u32 status = REGB_RD32(VPU_40XX_BUTTRESS_INTERRUPT_STAT) & BUTTRESS_IRQ_MASK;
 
-	if (status == 0)
-		return IRQ_NONE;
+	if (!status)
+		return false;
 
 	if (REG_TEST_FLD(VPU_40XX_BUTTRESS_INTERRUPT_STAT, FREQ_CHANGE, status))
 		ivpu_dbg(vdev, IRQ, "FREQ_CHANGE");
@@ -1116,26 +1139,27 @@ static irqreturn_t ivpu_hw_40xx_irqb_handler(struct ivpu_device *vdev, int irq)
 	if (schedule_recovery)
 		ivpu_pm_schedule_recovery(vdev);
 
-	return IRQ_HANDLED;
+	return true;
 }
 
 static irqreturn_t ivpu_hw_40xx_irq_handler(int irq, void *ptr)
 {
+	bool irqv_handled, irqb_handled, wake_thread = false;
 	struct ivpu_device *vdev = ptr;
-	irqreturn_t ret = IRQ_NONE;
 
 	REGB_WR32(VPU_40XX_BUTTRESS_GLOBAL_INT_MASK, 0x1);
 
-	ret |= ivpu_hw_40xx_irqv_handler(vdev, irq);
-	ret |= ivpu_hw_40xx_irqb_handler(vdev, irq);
+	irqv_handled = ivpu_hw_40xx_irqv_handler(vdev, irq, &wake_thread);
+	irqb_handled = ivpu_hw_40xx_irqb_handler(vdev, irq);
 
 	/* Re-enable global interrupts to re-trigger MSI for pending interrupts */
 	REGB_WR32(VPU_40XX_BUTTRESS_GLOBAL_INT_MASK, 0x0);
 
-	if (ret & IRQ_WAKE_THREAD)
+	if (wake_thread)
 		return IRQ_WAKE_THREAD;
-
-	return ret;
+	if (irqv_handled || irqb_handled)
+		return IRQ_HANDLED;
+	return IRQ_NONE;
 }
 
 static void ivpu_hw_40xx_diagnose_failure(struct ivpu_device *vdev)
@@ -1185,11 +1209,14 @@ const struct ivpu_hw_ops ivpu_hw_40xx_ops = {
 	.info_init = ivpu_hw_40xx_info_init,
 	.power_up = ivpu_hw_40xx_power_up,
 	.is_idle = ivpu_hw_40xx_is_idle,
+	.wait_for_idle = ivpu_hw_40xx_wait_for_idle,
 	.power_down = ivpu_hw_40xx_power_down,
 	.reset = ivpu_hw_40xx_reset,
 	.boot_fw = ivpu_hw_40xx_boot_fw,
 	.wdt_disable = ivpu_hw_40xx_wdt_disable,
 	.diagnose_failure = ivpu_hw_40xx_diagnose_failure,
+	.profiling_freq_get = ivpu_hw_40xx_profiling_freq_get,
+	.profiling_freq_drive = ivpu_hw_40xx_profiling_freq_drive,
 	.reg_pll_freq_get = ivpu_hw_40xx_reg_pll_freq_get,
 	.reg_telemetry_offset_get = ivpu_hw_40xx_reg_telemetry_offset_get,
 	.reg_telemetry_size_get = ivpu_hw_40xx_reg_telemetry_size_get,
diff --git a/drivers/accel/ivpu/ivpu_ipc.c b/drivers/accel/ivpu/ivpu_ipc.c
index a4ca40b184d4..e86621f16f85 100644
--- a/drivers/accel/ivpu/ivpu_ipc.c
+++ b/drivers/accel/ivpu/ivpu_ipc.c
@@ -5,7 +5,7 @@
 
 #include <linux/genalloc.h>
 #include <linux/highmem.h>
-#include <linux/kthread.h>
+#include <linux/pm_runtime.h>
 #include <linux/wait.h>
 
 #include "ivpu_drv.h"
@@ -17,19 +17,12 @@
 #include "ivpu_pm.h"
 
 #define IPC_MAX_RX_MSG	128
-#define IS_KTHREAD()	(get_current()->flags & PF_KTHREAD)
 
 struct ivpu_ipc_tx_buf {
 	struct ivpu_ipc_hdr ipc;
 	struct vpu_jsm_msg jsm;
 };
 
-struct ivpu_ipc_rx_msg {
-	struct list_head link;
-	struct ivpu_ipc_hdr *ipc_hdr;
-	struct vpu_jsm_msg *jsm_msg;
-};
-
 static void ivpu_ipc_msg_dump(struct ivpu_device *vdev, char *c,
 			      struct ivpu_ipc_hdr *ipc_hdr, u32 vpu_addr)
 {
@@ -139,8 +132,49 @@ static void ivpu_ipc_tx(struct ivpu_device *vdev, u32 vpu_addr)
 	ivpu_hw_reg_ipc_tx_set(vdev, vpu_addr);
 }
 
-void
-ivpu_ipc_consumer_add(struct ivpu_device *vdev, struct ivpu_ipc_consumer *cons, u32 channel)
+static void
+ivpu_ipc_rx_msg_add(struct ivpu_device *vdev, struct ivpu_ipc_consumer *cons,
+		    struct ivpu_ipc_hdr *ipc_hdr, struct vpu_jsm_msg *jsm_msg)
+{
+	struct ivpu_ipc_info *ipc = vdev->ipc;
+	struct ivpu_ipc_rx_msg *rx_msg;
+
+	lockdep_assert_held(&ipc->cons_lock);
+	lockdep_assert_irqs_disabled();
+
+	rx_msg = kzalloc(sizeof(*rx_msg), GFP_ATOMIC);
+	if (!rx_msg) {
+		ivpu_ipc_rx_mark_free(vdev, ipc_hdr, jsm_msg);
+		return;
+	}
+
+	atomic_inc(&ipc->rx_msg_count);
+
+	rx_msg->ipc_hdr = ipc_hdr;
+	rx_msg->jsm_msg = jsm_msg;
+	rx_msg->callback = cons->rx_callback;
+
+	if (rx_msg->callback) {
+		list_add_tail(&rx_msg->link, &ipc->cb_msg_list);
+	} else {
+		spin_lock(&cons->rx_lock);
+		list_add_tail(&rx_msg->link, &cons->rx_msg_list);
+		spin_unlock(&cons->rx_lock);
+		wake_up(&cons->rx_msg_wq);
+	}
+}
+
+static void
+ivpu_ipc_rx_msg_del(struct ivpu_device *vdev, struct ivpu_ipc_rx_msg *rx_msg)
+{
+	list_del(&rx_msg->link);
+	ivpu_ipc_rx_mark_free(vdev, rx_msg->ipc_hdr, rx_msg->jsm_msg);
+	atomic_dec(&vdev->ipc->rx_msg_count);
+	kfree(rx_msg);
+}
+
+void ivpu_ipc_consumer_add(struct ivpu_device *vdev, struct ivpu_ipc_consumer *cons,
+			   u32 channel, ivpu_ipc_rx_callback_t rx_callback)
 {
 	struct ivpu_ipc_info *ipc = vdev->ipc;
 
@@ -148,13 +182,15 @@ ivpu_ipc_consumer_add(struct ivpu_device *vdev, struct ivpu_ipc_consumer *cons,
 	cons->channel = channel;
 	cons->tx_vpu_addr = 0;
 	cons->request_id = 0;
-	spin_lock_init(&cons->rx_msg_lock);
+	cons->aborted = false;
+	cons->rx_callback = rx_callback;
+	spin_lock_init(&cons->rx_lock);
 	INIT_LIST_HEAD(&cons->rx_msg_list);
 	init_waitqueue_head(&cons->rx_msg_wq);
 
-	spin_lock_irq(&ipc->cons_list_lock);
+	spin_lock_irq(&ipc->cons_lock);
 	list_add_tail(&cons->link, &ipc->cons_list);
-	spin_unlock_irq(&ipc->cons_list_lock);
+	spin_unlock_irq(&ipc->cons_lock);
 }
 
 void ivpu_ipc_consumer_del(struct ivpu_device *vdev, struct ivpu_ipc_consumer *cons)
@@ -162,18 +198,14 @@ void ivpu_ipc_consumer_del(struct ivpu_device *vdev, struct ivpu_ipc_consumer *c
 	struct ivpu_ipc_info *ipc = vdev->ipc;
 	struct ivpu_ipc_rx_msg *rx_msg, *r;
 
-	spin_lock_irq(&ipc->cons_list_lock);
+	spin_lock_irq(&ipc->cons_lock);
 	list_del(&cons->link);
-	spin_unlock_irq(&ipc->cons_list_lock);
+	spin_unlock_irq(&ipc->cons_lock);
 
-	spin_lock_irq(&cons->rx_msg_lock);
-	list_for_each_entry_safe(rx_msg, r, &cons->rx_msg_list, link) {
-		list_del(&rx_msg->link);
-		ivpu_ipc_rx_mark_free(vdev, rx_msg->ipc_hdr, rx_msg->jsm_msg);
-		atomic_dec(&ipc->rx_msg_count);
-		kfree(rx_msg);
-	}
-	spin_unlock_irq(&cons->rx_msg_lock);
+	spin_lock_irq(&cons->rx_lock);
+	list_for_each_entry_safe(rx_msg, r, &cons->rx_msg_list, link)
+		ivpu_ipc_rx_msg_del(vdev, rx_msg);
+	spin_unlock_irq(&cons->rx_lock);
 
 	ivpu_ipc_tx_release(vdev, cons->tx_vpu_addr);
 }
@@ -202,52 +234,61 @@ unlock:
 	return ret;
 }
 
+static bool ivpu_ipc_rx_need_wakeup(struct ivpu_ipc_consumer *cons)
+{
+	bool ret;
+
+	spin_lock_irq(&cons->rx_lock);
+	ret = !list_empty(&cons->rx_msg_list) || cons->aborted;
+	spin_unlock_irq(&cons->rx_lock);
+
+	return ret;
+}
+
 int ivpu_ipc_receive(struct ivpu_device *vdev, struct ivpu_ipc_consumer *cons,
 		     struct ivpu_ipc_hdr *ipc_buf,
-		     struct vpu_jsm_msg *ipc_payload, unsigned long timeout_ms)
+		     struct vpu_jsm_msg *jsm_msg, unsigned long timeout_ms)
 {
-	struct ivpu_ipc_info *ipc = vdev->ipc;
 	struct ivpu_ipc_rx_msg *rx_msg;
 	int wait_ret, ret = 0;
 
-	wait_ret = wait_event_timeout(cons->rx_msg_wq,
-				      (IS_KTHREAD() && kthread_should_stop()) ||
-				      !list_empty(&cons->rx_msg_list),
-				      msecs_to_jiffies(timeout_ms));
+	if (drm_WARN_ONCE(&vdev->drm, cons->rx_callback, "Consumer works only in async mode\n"))
+		return -EINVAL;
 
-	if (IS_KTHREAD() && kthread_should_stop())
-		return -EINTR;
+	wait_ret = wait_event_timeout(cons->rx_msg_wq,
+				      ivpu_ipc_rx_need_wakeup(cons),
+				      msecs_to_jiffies(timeout_ms));
 
 	if (wait_ret == 0)
 		return -ETIMEDOUT;
 
-	spin_lock_irq(&cons->rx_msg_lock);
+	spin_lock_irq(&cons->rx_lock);
+	if (cons->aborted) {
+		spin_unlock_irq(&cons->rx_lock);
+		return -ECANCELED;
+	}
 	rx_msg = list_first_entry_or_null(&cons->rx_msg_list, struct ivpu_ipc_rx_msg, link);
 	if (!rx_msg) {
-		spin_unlock_irq(&cons->rx_msg_lock);
+		spin_unlock_irq(&cons->rx_lock);
 		return -EAGAIN;
 	}
-	list_del(&rx_msg->link);
-	spin_unlock_irq(&cons->rx_msg_lock);
 
 	if (ipc_buf)
 		memcpy(ipc_buf, rx_msg->ipc_hdr, sizeof(*ipc_buf));
 	if (rx_msg->jsm_msg) {
-		u32 size = min_t(int, rx_msg->ipc_hdr->data_size, sizeof(*ipc_payload));
+		u32 size = min_t(int, rx_msg->ipc_hdr->data_size, sizeof(*jsm_msg));
 
 		if (rx_msg->jsm_msg->result != VPU_JSM_STATUS_SUCCESS) {
 			ivpu_dbg(vdev, IPC, "IPC resp result error: %d\n", rx_msg->jsm_msg->result);
 			ret = -EBADMSG;
 		}
 
-		if (ipc_payload)
-			memcpy(ipc_payload, rx_msg->jsm_msg, size);
+		if (jsm_msg)
+			memcpy(jsm_msg, rx_msg->jsm_msg, size);
 	}
 
-	ivpu_ipc_rx_mark_free(vdev, rx_msg->ipc_hdr, rx_msg->jsm_msg);
-	atomic_dec(&ipc->rx_msg_count);
-	kfree(rx_msg);
-
+	ivpu_ipc_rx_msg_del(vdev, rx_msg);
+	spin_unlock_irq(&cons->rx_lock);
 	return ret;
 }
 
@@ -260,7 +301,7 @@ ivpu_ipc_send_receive_internal(struct ivpu_device *vdev, struct vpu_jsm_msg *req
 	struct ivpu_ipc_consumer cons;
 	int ret;
 
-	ivpu_ipc_consumer_add(vdev, &cons, channel);
+	ivpu_ipc_consumer_add(vdev, &cons, channel, NULL);
 
 	ret = ivpu_ipc_send(vdev, &cons, req);
 	if (ret) {
@@ -285,23 +326,19 @@ consumer_del:
 	return ret;
 }
 
-int ivpu_ipc_send_receive(struct ivpu_device *vdev, struct vpu_jsm_msg *req,
-			  enum vpu_ipc_msg_type expected_resp_type,
-			  struct vpu_jsm_msg *resp, u32 channel,
-			  unsigned long timeout_ms)
+int ivpu_ipc_send_receive_active(struct ivpu_device *vdev, struct vpu_jsm_msg *req,
+				 enum vpu_ipc_msg_type expected_resp, struct vpu_jsm_msg *resp,
+				 u32 channel, unsigned long timeout_ms)
 {
 	struct vpu_jsm_msg hb_req = { .type = VPU_JSM_MSG_QUERY_ENGINE_HB };
 	struct vpu_jsm_msg hb_resp;
 	int ret, hb_ret;
 
-	ret = ivpu_rpm_get(vdev);
-	if (ret < 0)
-		return ret;
+	drm_WARN_ON(&vdev->drm, pm_runtime_status_suspended(vdev->drm.dev));
 
-	ret = ivpu_ipc_send_receive_internal(vdev, req, expected_resp_type, resp,
-					     channel, timeout_ms);
+	ret = ivpu_ipc_send_receive_internal(vdev, req, expected_resp, resp, channel, timeout_ms);
 	if (ret != -ETIMEDOUT)
-		goto rpm_put;
+		return ret;
 
 	hb_ret = ivpu_ipc_send_receive_internal(vdev, &hb_req, VPU_JSM_MSG_QUERY_ENGINE_HB_DONE,
 						&hb_resp, VPU_IPC_CHAN_ASYNC_CMD,
@@ -311,7 +348,21 @@ int ivpu_ipc_send_receive(struct ivpu_device *vdev, struct vpu_jsm_msg *req,
 		ivpu_pm_schedule_recovery(vdev);
 	}
 
-rpm_put:
+	return ret;
+}
+
+int ivpu_ipc_send_receive(struct ivpu_device *vdev, struct vpu_jsm_msg *req,
+			  enum vpu_ipc_msg_type expected_resp, struct vpu_jsm_msg *resp,
+			  u32 channel, unsigned long timeout_ms)
+{
+	int ret;
+
+	ret = ivpu_rpm_get(vdev);
+	if (ret < 0)
+		return ret;
+
+	ret = ivpu_ipc_send_receive_active(vdev, req, expected_resp, resp, channel, timeout_ms);
+
 	ivpu_rpm_put(vdev);
 	return ret;
 }
@@ -329,35 +380,7 @@ ivpu_ipc_match_consumer(struct ivpu_device *vdev, struct ivpu_ipc_consumer *cons
 	return false;
 }
 
-static void
-ivpu_ipc_dispatch(struct ivpu_device *vdev, struct ivpu_ipc_consumer *cons,
-		  struct ivpu_ipc_hdr *ipc_hdr, struct vpu_jsm_msg *jsm_msg)
-{
-	struct ivpu_ipc_info *ipc = vdev->ipc;
-	struct ivpu_ipc_rx_msg *rx_msg;
-	unsigned long flags;
-
-	lockdep_assert_held(&ipc->cons_list_lock);
-
-	rx_msg = kzalloc(sizeof(*rx_msg), GFP_ATOMIC);
-	if (!rx_msg) {
-		ivpu_ipc_rx_mark_free(vdev, ipc_hdr, jsm_msg);
-		return;
-	}
-
-	atomic_inc(&ipc->rx_msg_count);
-
-	rx_msg->ipc_hdr = ipc_hdr;
-	rx_msg->jsm_msg = jsm_msg;
-
-	spin_lock_irqsave(&cons->rx_msg_lock, flags);
-	list_add_tail(&rx_msg->link, &cons->rx_msg_list);
-	spin_unlock_irqrestore(&cons->rx_msg_lock, flags);
-
-	wake_up(&cons->rx_msg_wq);
-}
-
-int ivpu_ipc_irq_handler(struct ivpu_device *vdev)
+void ivpu_ipc_irq_handler(struct ivpu_device *vdev, bool *wake_thread)
 {
 	struct ivpu_ipc_info *ipc = vdev->ipc;
 	struct ivpu_ipc_consumer *cons;
@@ -375,7 +398,7 @@ int ivpu_ipc_irq_handler(struct ivpu_device *vdev)
 		vpu_addr = ivpu_hw_reg_ipc_rx_addr_get(vdev);
 		if (vpu_addr == REG_IO_ERROR) {
 			ivpu_err_ratelimited(vdev, "Failed to read IPC rx addr register\n");
-			return -EIO;
+			return;
 		}
 
 		ipc_hdr = ivpu_to_cpu_addr(ipc->mem_rx, vpu_addr);
@@ -405,15 +428,15 @@ int ivpu_ipc_irq_handler(struct ivpu_device *vdev)
 		}
 
 		dispatched = false;
-		spin_lock_irqsave(&ipc->cons_list_lock, flags);
+		spin_lock_irqsave(&ipc->cons_lock, flags);
 		list_for_each_entry(cons, &ipc->cons_list, link) {
 			if (ivpu_ipc_match_consumer(vdev, cons, ipc_hdr, jsm_msg)) {
-				ivpu_ipc_dispatch(vdev, cons, ipc_hdr, jsm_msg);
+				ivpu_ipc_rx_msg_add(vdev, cons, ipc_hdr, jsm_msg);
 				dispatched = true;
 				break;
 			}
 		}
-		spin_unlock_irqrestore(&ipc->cons_list_lock, flags);
+		spin_unlock_irqrestore(&ipc->cons_lock, flags);
 
 		if (!dispatched) {
 			ivpu_dbg(vdev, IPC, "IPC RX msg 0x%x dropped (no consumer)\n", vpu_addr);
@@ -421,7 +444,28 @@ int ivpu_ipc_irq_handler(struct ivpu_device *vdev)
 		}
 	}
 
-	return 0;
+	if (wake_thread)
+		*wake_thread = !list_empty(&ipc->cb_msg_list);
+}
+
+irqreturn_t ivpu_ipc_irq_thread_handler(struct ivpu_device *vdev)
+{
+	struct ivpu_ipc_info *ipc = vdev->ipc;
+	struct ivpu_ipc_rx_msg *rx_msg, *r;
+	struct list_head cb_msg_list;
+
+	INIT_LIST_HEAD(&cb_msg_list);
+
+	spin_lock_irq(&ipc->cons_lock);
+	list_splice_tail_init(&ipc->cb_msg_list, &cb_msg_list);
+	spin_unlock_irq(&ipc->cons_lock);
+
+	list_for_each_entry_safe(rx_msg, r, &cb_msg_list, link) {
+		rx_msg->callback(vdev, rx_msg->ipc_hdr, rx_msg->jsm_msg);
+		ivpu_ipc_rx_msg_del(vdev, rx_msg);
+	}
+
+	return IRQ_HANDLED;
 }
 
 int ivpu_ipc_init(struct ivpu_device *vdev)
@@ -456,10 +500,10 @@ int ivpu_ipc_init(struct ivpu_device *vdev)
 		goto err_free_rx;
 	}
 
+	spin_lock_init(&ipc->cons_lock);
 	INIT_LIST_HEAD(&ipc->cons_list);
-	spin_lock_init(&ipc->cons_list_lock);
+	INIT_LIST_HEAD(&ipc->cb_msg_list);
 	drmm_mutex_init(&vdev->drm, &ipc->lock);
-
 	ivpu_ipc_reset(vdev);
 	return 0;
 
@@ -472,6 +516,13 @@ err_free_tx:
 
 void ivpu_ipc_fini(struct ivpu_device *vdev)
 {
+	struct ivpu_ipc_info *ipc = vdev->ipc;
+
+	drm_WARN_ON(&vdev->drm, ipc->on);
+	drm_WARN_ON(&vdev->drm, !list_empty(&ipc->cons_list));
+	drm_WARN_ON(&vdev->drm, !list_empty(&ipc->cb_msg_list));
+	drm_WARN_ON(&vdev->drm, atomic_read(&ipc->rx_msg_count) > 0);
+
 	ivpu_ipc_mem_fini(vdev);
 }
 
@@ -488,16 +539,27 @@ void ivpu_ipc_disable(struct ivpu_device *vdev)
 {
 	struct ivpu_ipc_info *ipc = vdev->ipc;
 	struct ivpu_ipc_consumer *cons, *c;
-	unsigned long flags;
+	struct ivpu_ipc_rx_msg *rx_msg, *r;
+
+	drm_WARN_ON(&vdev->drm, !list_empty(&ipc->cb_msg_list));
 
 	mutex_lock(&ipc->lock);
 	ipc->on = false;
 	mutex_unlock(&ipc->lock);
 
-	spin_lock_irqsave(&ipc->cons_list_lock, flags);
-	list_for_each_entry_safe(cons, c, &ipc->cons_list, link)
+	spin_lock_irq(&ipc->cons_lock);
+	list_for_each_entry_safe(cons, c, &ipc->cons_list, link) {
+		spin_lock(&cons->rx_lock);
+		if (!cons->rx_callback)
+			cons->aborted = true;
+		list_for_each_entry_safe(rx_msg, r, &cons->rx_msg_list, link)
+			ivpu_ipc_rx_msg_del(vdev, rx_msg);
+		spin_unlock(&cons->rx_lock);
 		wake_up(&cons->rx_msg_wq);
-	spin_unlock_irqrestore(&ipc->cons_list_lock, flags);
+	}
+	spin_unlock_irq(&ipc->cons_lock);
+
+	drm_WARN_ON(&vdev->drm, atomic_read(&ipc->rx_msg_count) > 0);
 }
 
 void ivpu_ipc_reset(struct ivpu_device *vdev)
@@ -505,6 +567,7 @@ void ivpu_ipc_reset(struct ivpu_device *vdev)
 	struct ivpu_ipc_info *ipc = vdev->ipc;
 
 	mutex_lock(&ipc->lock);
+	drm_WARN_ON(&vdev->drm, ipc->on);
 
 	memset(ivpu_bo_vaddr(ipc->mem_tx), 0, ivpu_bo_size(ipc->mem_tx));
 	memset(ivpu_bo_vaddr(ipc->mem_rx), 0, ivpu_bo_size(ipc->mem_rx));
diff --git a/drivers/accel/ivpu/ivpu_ipc.h b/drivers/accel/ivpu/ivpu_ipc.h
index 68f5b6668e00..40ca3cc4e61f 100644
--- a/drivers/accel/ivpu/ivpu_ipc.h
+++ b/drivers/accel/ivpu/ivpu_ipc.h
@@ -42,13 +42,26 @@ struct ivpu_ipc_hdr {
 	u8 status;
 } __packed __aligned(IVPU_IPC_ALIGNMENT);
 
+typedef void (*ivpu_ipc_rx_callback_t)(struct ivpu_device *vdev,
+				       struct ivpu_ipc_hdr *ipc_hdr,
+				       struct vpu_jsm_msg *jsm_msg);
+
+struct ivpu_ipc_rx_msg {
+	struct list_head link;
+	struct ivpu_ipc_hdr *ipc_hdr;
+	struct vpu_jsm_msg *jsm_msg;
+	ivpu_ipc_rx_callback_t callback;
+};
+
 struct ivpu_ipc_consumer {
 	struct list_head link;
 	u32 channel;
 	u32 tx_vpu_addr;
 	u32 request_id;
+	bool aborted;
+	ivpu_ipc_rx_callback_t rx_callback;
 
-	spinlock_t rx_msg_lock; /* Protects rx_msg_list */
+	spinlock_t rx_lock; /* Protects rx_msg_list and aborted */
 	struct list_head rx_msg_list;
 	wait_queue_head_t rx_msg_wq;
 };
@@ -60,8 +73,9 @@ struct ivpu_ipc_info {
 
 	atomic_t rx_msg_count;
 
-	spinlock_t cons_list_lock; /* Protects cons_list */
+	spinlock_t cons_lock; /* Protects cons_list and cb_msg_list */
 	struct list_head cons_list;
+	struct list_head cb_msg_list;
 
 	atomic_t request_id;
 	struct mutex lock; /* Lock on status */
@@ -75,19 +89,22 @@ void ivpu_ipc_enable(struct ivpu_device *vdev);
 void ivpu_ipc_disable(struct ivpu_device *vdev);
 void ivpu_ipc_reset(struct ivpu_device *vdev);
 
-int ivpu_ipc_irq_handler(struct ivpu_device *vdev);
+void ivpu_ipc_irq_handler(struct ivpu_device *vdev, bool *wake_thread);
+irqreturn_t ivpu_ipc_irq_thread_handler(struct ivpu_device *vdev);
 
 void ivpu_ipc_consumer_add(struct ivpu_device *vdev, struct ivpu_ipc_consumer *cons,
-			   u32 channel);
+			   u32 channel, ivpu_ipc_rx_callback_t callback);
 void ivpu_ipc_consumer_del(struct ivpu_device *vdev, struct ivpu_ipc_consumer *cons);
 
 int ivpu_ipc_receive(struct ivpu_device *vdev, struct ivpu_ipc_consumer *cons,
-		     struct ivpu_ipc_hdr *ipc_buf, struct vpu_jsm_msg *ipc_payload,
+		     struct ivpu_ipc_hdr *ipc_buf, struct vpu_jsm_msg *jsm_msg,
 		     unsigned long timeout_ms);
 
+int ivpu_ipc_send_receive_active(struct ivpu_device *vdev, struct vpu_jsm_msg *req,
+				 enum vpu_ipc_msg_type expected_resp, struct vpu_jsm_msg *resp,
+				 u32 channel, unsigned long timeout_ms);
 int ivpu_ipc_send_receive(struct ivpu_device *vdev, struct vpu_jsm_msg *req,
-			  enum vpu_ipc_msg_type expected_resp_type,
-			  struct vpu_jsm_msg *resp, u32 channel,
-			  unsigned long timeout_ms);
+			  enum vpu_ipc_msg_type expected_resp, struct vpu_jsm_msg *resp,
+			  u32 channel, unsigned long timeout_ms);
 
 #endif /* __IVPU_IPC_H__ */
diff --git a/drivers/accel/ivpu/ivpu_job.c b/drivers/accel/ivpu/ivpu_job.c
index 8983e3a4fdf9..7206cf9cdb4a 100644
--- a/drivers/accel/ivpu/ivpu_job.c
+++ b/drivers/accel/ivpu/ivpu_job.c
@@ -7,7 +7,6 @@
 
 #include <linux/bitfield.h>
 #include <linux/highmem.h>
-#include <linux/kthread.h>
 #include <linux/pci.h>
 #include <linux/module.h>
 #include <uapi/drm/ivpu_accel.h>
@@ -24,10 +23,6 @@
 #define JOB_ID_CONTEXT_MASK  GENMASK(31, 8)
 #define JOB_MAX_BUFFER_COUNT 65535
 
-static unsigned int ivpu_tdr_timeout_ms;
-module_param_named(tdr_timeout_ms, ivpu_tdr_timeout_ms, uint, 0644);
-MODULE_PARM_DESC(tdr_timeout_ms, "Timeout for device hang detection, in milliseconds, 0 - default");
-
 static void ivpu_cmdq_ring_db(struct ivpu_device *vdev, struct ivpu_cmdq *cmdq)
 {
 	ivpu_hw_reg_db_set(vdev, cmdq->db_id);
@@ -196,6 +191,8 @@ static int ivpu_cmdq_push_job(struct ivpu_cmdq *cmdq, struct ivpu_job *job)
 	entry->batch_buf_addr = job->cmd_buf_vpu_addr;
 	entry->job_id = job->job_id;
 	entry->flags = 0;
+	if (unlikely(ivpu_test_mode & IVPU_TEST_MODE_NULL_SUBMISSION))
+		entry->flags = VPU_JOB_FLAGS_NULL_SUBMISSION_MASK;
 	wmb(); /* Ensure that tail is updated after filling entry */
 	header->tail = next_entry;
 	wmb(); /* Flush WC buffer for jobq header */
@@ -264,7 +261,7 @@ static void job_release(struct kref *ref)
 
 	for (i = 0; i < job->bo_count; i++)
 		if (job->bos[i])
-			drm_gem_object_put(&job->bos[i]->base);
+			drm_gem_object_put(&job->bos[i]->base.base);
 
 	dma_fence_put(job->done_fence);
 	ivpu_file_priv_put(&job->file_priv);
@@ -340,23 +337,12 @@ static int ivpu_job_done(struct ivpu_device *vdev, u32 job_id, u32 job_status)
 	ivpu_dbg(vdev, JOB, "Job complete:  id %3u ctx %2d engine %d status 0x%x\n",
 		 job->job_id, job->file_priv->ctx.id, job->engine_idx, job_status);
 
+	ivpu_stop_job_timeout_detection(vdev);
+
 	job_put(job);
 	return 0;
 }
 
-static void ivpu_job_done_message(struct ivpu_device *vdev, void *msg)
-{
-	struct vpu_ipc_msg_payload_job_done *payload;
-	struct vpu_jsm_msg *job_ret_msg = msg;
-	int ret;
-
-	payload = (struct vpu_ipc_msg_payload_job_done *)&job_ret_msg->payload;
-
-	ret = ivpu_job_done(vdev, payload->job_id, payload->job_status);
-	if (ret)
-		ivpu_err(vdev, "Failed to finish job %d: %d\n", payload->job_id, ret);
-}
-
 void ivpu_jobs_abort_all(struct ivpu_device *vdev)
 {
 	struct ivpu_job *job;
@@ -398,11 +384,13 @@ static int ivpu_direct_job_submission(struct ivpu_job *job)
 	if (ret)
 		goto err_xa_erase;
 
+	ivpu_start_job_timeout_detection(vdev);
+
 	ivpu_dbg(vdev, JOB, "Job submitted: id %3u addr 0x%llx ctx %2d engine %d next %d\n",
 		 job->job_id, job->cmd_buf_vpu_addr, file_priv->ctx.id,
 		 job->engine_idx, cmdq->jobq->header.tail);
 
-	if (ivpu_test_mode == IVPU_TEST_MODE_NULL_HW) {
+	if (ivpu_test_mode & IVPU_TEST_MODE_NULL_HW) {
 		ivpu_job_done(vdev, job->job_id, VPU_JSM_STATUS_SUCCESS);
 		cmdq->jobq->header.head = cmdq->jobq->header.tail;
 		wmb(); /* Flush WC buffer for jobq header */
@@ -448,7 +436,7 @@ ivpu_job_prepare_bos_for_submit(struct drm_file *file, struct ivpu_job *job, u32
 	}
 
 	bo = job->bos[CMD_BUF_IDX];
-	if (!dma_resv_test_signaled(bo->base.resv, DMA_RESV_USAGE_READ)) {
+	if (!dma_resv_test_signaled(bo->base.base.resv, DMA_RESV_USAGE_READ)) {
 		ivpu_warn(vdev, "Buffer is already in use\n");
 		return -EBUSY;
 	}
@@ -468,7 +456,7 @@ ivpu_job_prepare_bos_for_submit(struct drm_file *file, struct ivpu_job *job, u32
 	}
 
 	for (i = 0; i < buf_count; i++) {
-		ret = dma_resv_reserve_fences(job->bos[i]->base.resv, 1);
+		ret = dma_resv_reserve_fences(job->bos[i]->base.base.resv, 1);
 		if (ret) {
 			ivpu_warn(vdev, "Failed to reserve fences: %d\n", ret);
 			goto unlock_reservations;
@@ -477,7 +465,7 @@ ivpu_job_prepare_bos_for_submit(struct drm_file *file, struct ivpu_job *job, u32
 
 	for (i = 0; i < buf_count; i++) {
 		usage = (i == CMD_BUF_IDX) ? DMA_RESV_USAGE_WRITE : DMA_RESV_USAGE_BOOKKEEP;
-		dma_resv_add_fence(job->bos[i]->base.resv, job->done_fence, usage);
+		dma_resv_add_fence(job->bos[i]->base.base.resv, job->done_fence, usage);
 	}
 
 unlock_reservations:
@@ -562,61 +550,36 @@ free_handles:
 	return ret;
 }
 
-static int ivpu_job_done_thread(void *arg)
+static void
+ivpu_job_done_callback(struct ivpu_device *vdev, struct ivpu_ipc_hdr *ipc_hdr,
+		       struct vpu_jsm_msg *jsm_msg)
 {
-	struct ivpu_device *vdev = (struct ivpu_device *)arg;
-	struct ivpu_ipc_consumer cons;
-	struct vpu_jsm_msg jsm_msg;
-	bool jobs_submitted;
-	unsigned int timeout;
+	struct vpu_ipc_msg_payload_job_done *payload;
 	int ret;
 
-	ivpu_dbg(vdev, JOB, "Started %s\n", __func__);
-
-	ivpu_ipc_consumer_add(vdev, &cons, VPU_IPC_CHAN_JOB_RET);
-
-	while (!kthread_should_stop()) {
-		timeout = ivpu_tdr_timeout_ms ? ivpu_tdr_timeout_ms : vdev->timeout.tdr;
-		jobs_submitted = !xa_empty(&vdev->submitted_jobs_xa);
-		ret = ivpu_ipc_receive(vdev, &cons, NULL, &jsm_msg, timeout);
-		if (!ret) {
-			ivpu_job_done_message(vdev, &jsm_msg);
-		} else if (ret == -ETIMEDOUT) {
-			if (jobs_submitted && !xa_empty(&vdev->submitted_jobs_xa)) {
-				ivpu_err(vdev, "TDR detected, timeout %d ms", timeout);
-				ivpu_hw_diagnose_failure(vdev);
-				ivpu_pm_schedule_recovery(vdev);
-			}
-		}
+	if (!jsm_msg) {
+		ivpu_err(vdev, "IPC message has no JSM payload\n");
+		return;
 	}
 
-	ivpu_ipc_consumer_del(vdev, &cons);
-
-	ivpu_jobs_abort_all(vdev);
-
-	ivpu_dbg(vdev, JOB, "Stopped %s\n", __func__);
-	return 0;
-}
-
-int ivpu_job_done_thread_init(struct ivpu_device *vdev)
-{
-	struct task_struct *thread;
-
-	thread = kthread_run(&ivpu_job_done_thread, (void *)vdev, "ivpu_job_done_thread");
-	if (IS_ERR(thread)) {
-		ivpu_err(vdev, "Failed to start job completion thread\n");
-		return -EIO;
+	if (jsm_msg->result != VPU_JSM_STATUS_SUCCESS) {
+		ivpu_err(vdev, "Invalid JSM message result: %d\n", jsm_msg->result);
+		return;
 	}
 
-	get_task_struct(thread);
-	wake_up_process(thread);
-
-	vdev->job_done_thread = thread;
-
-	return 0;
+	payload = (struct vpu_ipc_msg_payload_job_done *)&jsm_msg->payload;
+	ret = ivpu_job_done(vdev, payload->job_id, payload->job_status);
+	if (!ret && !xa_empty(&vdev->submitted_jobs_xa))
+		ivpu_start_job_timeout_detection(vdev);
 }
 
-void ivpu_job_done_thread_fini(struct ivpu_device *vdev)
+void ivpu_job_done_consumer_init(struct ivpu_device *vdev)
 {
-	kthread_stop_put(vdev->job_done_thread);
+	ivpu_ipc_consumer_add(vdev, &vdev->job_done_consumer,
+			      VPU_IPC_CHAN_JOB_RET, ivpu_job_done_callback);
+}
+
+void ivpu_job_done_consumer_fini(struct ivpu_device *vdev)
+{
+	ivpu_ipc_consumer_del(vdev, &vdev->job_done_consumer);
 }
diff --git a/drivers/accel/ivpu/ivpu_job.h b/drivers/accel/ivpu/ivpu_job.h
index 5514c2d8a609..45a2f2ec82e5 100644
--- a/drivers/accel/ivpu/ivpu_job.h
+++ b/drivers/accel/ivpu/ivpu_job.h
@@ -59,8 +59,8 @@ int ivpu_submit_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 void ivpu_cmdq_release_all(struct ivpu_file_priv *file_priv);
 void ivpu_cmdq_reset_all_contexts(struct ivpu_device *vdev);
 
-int ivpu_job_done_thread_init(struct ivpu_device *vdev);
-void ivpu_job_done_thread_fini(struct ivpu_device *vdev);
+void ivpu_job_done_consumer_init(struct ivpu_device *vdev);
+void ivpu_job_done_consumer_fini(struct ivpu_device *vdev);
 
 void ivpu_jobs_abort_all(struct ivpu_device *vdev);
 
diff --git a/drivers/accel/ivpu/ivpu_jsm_msg.c b/drivers/accel/ivpu/ivpu_jsm_msg.c
index 0c2fe7142024..8cea0dd731b9 100644
--- a/drivers/accel/ivpu/ivpu_jsm_msg.c
+++ b/drivers/accel/ivpu/ivpu_jsm_msg.c
@@ -4,6 +4,7 @@
  */
 
 #include "ivpu_drv.h"
+#include "ivpu_hw.h"
 #include "ivpu_ipc.h"
 #include "ivpu_jsm_msg.h"
 
@@ -36,6 +37,17 @@ const char *ivpu_jsm_msg_type_to_str(enum vpu_ipc_msg_type type)
 	IVPU_CASE_TO_STR(VPU_JSM_MSG_DESTROY_CMD_QUEUE);
 	IVPU_CASE_TO_STR(VPU_JSM_MSG_SET_CONTEXT_SCHED_PROPERTIES);
 	IVPU_CASE_TO_STR(VPU_JSM_MSG_HWS_REGISTER_DB);
+	IVPU_CASE_TO_STR(VPU_JSM_MSG_HWS_RESUME_CMDQ);
+	IVPU_CASE_TO_STR(VPU_JSM_MSG_HWS_SUSPEND_CMDQ);
+	IVPU_CASE_TO_STR(VPU_JSM_MSG_HWS_RESUME_CMDQ_RSP);
+	IVPU_CASE_TO_STR(VPU_JSM_MSG_HWS_SUSPEND_CMDQ_DONE);
+	IVPU_CASE_TO_STR(VPU_JSM_MSG_HWS_SET_SCHEDULING_LOG);
+	IVPU_CASE_TO_STR(VPU_JSM_MSG_HWS_SET_SCHEDULING_LOG_RSP);
+	IVPU_CASE_TO_STR(VPU_JSM_MSG_HWS_SCHEDULING_LOG_NOTIFICATION);
+	IVPU_CASE_TO_STR(VPU_JSM_MSG_HWS_ENGINE_RESUME);
+	IVPU_CASE_TO_STR(VPU_JSM_MSG_HWS_RESUME_ENGINE_DONE);
+	IVPU_CASE_TO_STR(VPU_JSM_MSG_STATE_DUMP);
+	IVPU_CASE_TO_STR(VPU_JSM_MSG_STATE_DUMP_RSP);
 	IVPU_CASE_TO_STR(VPU_JSM_MSG_BLOB_DEINIT);
 	IVPU_CASE_TO_STR(VPU_JSM_MSG_DYNDBG_CONTROL);
 	IVPU_CASE_TO_STR(VPU_JSM_MSG_JOB_DONE);
@@ -65,6 +77,12 @@ const char *ivpu_jsm_msg_type_to_str(enum vpu_ipc_msg_type type)
 	IVPU_CASE_TO_STR(VPU_JSM_MSG_SET_CONTEXT_SCHED_PROPERTIES_RSP);
 	IVPU_CASE_TO_STR(VPU_JSM_MSG_BLOB_DEINIT_DONE);
 	IVPU_CASE_TO_STR(VPU_JSM_MSG_DYNDBG_CONTROL_RSP);
+	IVPU_CASE_TO_STR(VPU_JSM_MSG_PWR_D0I3_ENTER);
+	IVPU_CASE_TO_STR(VPU_JSM_MSG_PWR_D0I3_ENTER_DONE);
+	IVPU_CASE_TO_STR(VPU_JSM_MSG_DCT_ENABLE);
+	IVPU_CASE_TO_STR(VPU_JSM_MSG_DCT_ENABLE_DONE);
+	IVPU_CASE_TO_STR(VPU_JSM_MSG_DCT_DISABLE);
+	IVPU_CASE_TO_STR(VPU_JSM_MSG_DCT_DISABLE_DONE);
 	}
 	#undef IVPU_CASE_TO_STR
 
@@ -243,3 +261,23 @@ int ivpu_jsm_context_release(struct ivpu_device *vdev, u32 host_ssid)
 	return ivpu_ipc_send_receive(vdev, &req, VPU_JSM_MSG_SSID_RELEASE_DONE, &resp,
 				     VPU_IPC_CHAN_ASYNC_CMD, vdev->timeout.jsm);
 }
+
+int ivpu_jsm_pwr_d0i3_enter(struct ivpu_device *vdev)
+{
+	struct vpu_jsm_msg req = { .type = VPU_JSM_MSG_PWR_D0I3_ENTER };
+	struct vpu_jsm_msg resp;
+	int ret;
+
+	if (IVPU_WA(disable_d0i3_msg))
+		return 0;
+
+	req.payload.pwr_d0i3_enter.send_response = 1;
+
+	ret = ivpu_ipc_send_receive_active(vdev, &req, VPU_JSM_MSG_PWR_D0I3_ENTER_DONE,
+					   &resp, VPU_IPC_CHAN_GEN_CMD,
+					   vdev->timeout.d0i3_entry_msg);
+	if (ret)
+		return ret;
+
+	return ivpu_hw_wait_for_idle(vdev);
+}
diff --git a/drivers/accel/ivpu/ivpu_jsm_msg.h b/drivers/accel/ivpu/ivpu_jsm_msg.h
index 66979a948c7c..ae75e5dbcc41 100644
--- a/drivers/accel/ivpu/ivpu_jsm_msg.h
+++ b/drivers/accel/ivpu/ivpu_jsm_msg.h
@@ -22,4 +22,5 @@ int ivpu_jsm_trace_get_capability(struct ivpu_device *vdev, u32 *trace_destinati
 int ivpu_jsm_trace_set_config(struct ivpu_device *vdev, u32 trace_level, u32 trace_destination_mask,
 			      u64 trace_hw_component_mask);
 int ivpu_jsm_context_release(struct ivpu_device *vdev, u32 host_ssid);
+int ivpu_jsm_pwr_d0i3_enter(struct ivpu_device *vdev);
 #endif
diff --git a/drivers/accel/ivpu/ivpu_mmu.c b/drivers/accel/ivpu/ivpu_mmu.c
index 2538c78fbebe..2228c44b115f 100644
--- a/drivers/accel/ivpu/ivpu_mmu.c
+++ b/drivers/accel/ivpu/ivpu_mmu.c
@@ -230,7 +230,12 @@
 				  (REG_FLD(IVPU_MMU_REG_GERROR, MSI_PRIQ_ABT)) | \
 				  (REG_FLD(IVPU_MMU_REG_GERROR, MSI_ABT)))
 
-static char *ivpu_mmu_event_to_str(u32 cmd)
+#define IVPU_MMU_CERROR_NONE         0x0
+#define IVPU_MMU_CERROR_ILL          0x1
+#define IVPU_MMU_CERROR_ABT          0x2
+#define IVPU_MMU_CERROR_ATC_INV_SYNC 0x3
+
+static const char *ivpu_mmu_event_to_str(u32 cmd)
 {
 	switch (cmd) {
 	case IVPU_MMU_EVT_F_UUT:
@@ -276,6 +281,22 @@ static char *ivpu_mmu_event_to_str(u32 cmd)
 	}
 }
 
+static const char *ivpu_mmu_cmdq_err_to_str(u32 err)
+{
+	switch (err) {
+	case IVPU_MMU_CERROR_NONE:
+		return "No CMDQ Error";
+	case IVPU_MMU_CERROR_ILL:
+		return "Illegal command";
+	case IVPU_MMU_CERROR_ABT:
+		return "External abort on CMDQ read";
+	case IVPU_MMU_CERROR_ATC_INV_SYNC:
+		return "Sync failed to complete ATS invalidation";
+	default:
+		return "Unknown CMDQ Error";
+	}
+}
+
 static void ivpu_mmu_config_check(struct ivpu_device *vdev)
 {
 	u32 val_ref;
@@ -479,10 +500,7 @@ static int ivpu_mmu_cmdq_sync(struct ivpu_device *vdev)
 	u64 val;
 	int ret;
 
-	val = FIELD_PREP(IVPU_MMU_CMD_OPCODE, CMD_SYNC) |
-	      FIELD_PREP(IVPU_MMU_CMD_SYNC_0_CS, 0x2) |
-	      FIELD_PREP(IVPU_MMU_CMD_SYNC_0_MSH, 0x3) |
-	      FIELD_PREP(IVPU_MMU_CMD_SYNC_0_MSI_ATTR, 0xf);
+	val = FIELD_PREP(IVPU_MMU_CMD_OPCODE, CMD_SYNC);
 
 	ret = ivpu_mmu_cmdq_cmd_write(vdev, "SYNC", val, 0);
 	if (ret)
@@ -492,8 +510,15 @@ static int ivpu_mmu_cmdq_sync(struct ivpu_device *vdev)
 	REGV_WR32(IVPU_MMU_REG_CMDQ_PROD, q->prod);
 
 	ret = ivpu_mmu_cmdq_wait_for_cons(vdev);
-	if (ret)
-		ivpu_err(vdev, "Timed out waiting for consumer: %d\n", ret);
+	if (ret) {
+		u32 err;
+
+		val = REGV_RD32(IVPU_MMU_REG_CMDQ_CONS);
+		err = REG_GET_FLD(IVPU_MMU_REG_CMDQ_CONS, ERR, val);
+
+		ivpu_err(vdev, "Timed out waiting for MMU consumer: %d, error: %s\n", ret,
+			 ivpu_mmu_cmdq_err_to_str(err));
+	}
 
 	return ret;
 }
@@ -750,9 +775,12 @@ int ivpu_mmu_init(struct ivpu_device *vdev)
 
 	ivpu_dbg(vdev, MMU, "Init..\n");
 
-	drmm_mutex_init(&vdev->drm, &mmu->lock);
 	ivpu_mmu_config_check(vdev);
 
+	ret = drmm_mutex_init(&vdev->drm, &mmu->lock);
+	if (ret)
+		return ret;
+
 	ret = ivpu_mmu_structs_alloc(vdev);
 	if (ret)
 		return ret;
diff --git a/drivers/accel/ivpu/ivpu_mmu_context.c b/drivers/accel/ivpu/ivpu_mmu_context.c
index c1050a2df954..12a8c09d4547 100644
--- a/drivers/accel/ivpu/ivpu_mmu_context.c
+++ b/drivers/accel/ivpu/ivpu_mmu_context.c
@@ -5,6 +5,9 @@
 
 #include <linux/bitfield.h>
 #include <linux/highmem.h>
+#include <linux/set_memory.h>
+
+#include <drm/drm_cache.h>
 
 #include "ivpu_drv.h"
 #include "ivpu_hw.h"
@@ -39,12 +42,57 @@
 #define IVPU_MMU_ENTRY_MAPPED  (IVPU_MMU_ENTRY_FLAG_AF | IVPU_MMU_ENTRY_FLAG_USER | \
 				IVPU_MMU_ENTRY_FLAG_NG | IVPU_MMU_ENTRY_VALID)
 
+static void *ivpu_pgtable_alloc_page(struct ivpu_device *vdev, dma_addr_t *dma)
+{
+	dma_addr_t dma_addr;
+	struct page *page;
+	void *cpu;
+
+	page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
+	if (!page)
+		return NULL;
+
+	set_pages_array_wc(&page, 1);
+
+	dma_addr = dma_map_page(vdev->drm.dev, page, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
+	if (dma_mapping_error(vdev->drm.dev, dma_addr))
+		goto err_free_page;
+
+	cpu = vmap(&page, 1, VM_MAP, pgprot_writecombine(PAGE_KERNEL));
+	if (!cpu)
+		goto err_dma_unmap_page;
+
+
+	*dma = dma_addr;
+	return cpu;
+
+err_dma_unmap_page:
+	dma_unmap_page(vdev->drm.dev, dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
+
+err_free_page:
+	put_page(page);
+	return NULL;
+}
+
+static void ivpu_pgtable_free_page(struct ivpu_device *vdev, u64 *cpu_addr, dma_addr_t dma_addr)
+{
+	struct page *page;
+
+	if (cpu_addr) {
+		page = vmalloc_to_page(cpu_addr);
+		vunmap(cpu_addr);
+		dma_unmap_page(vdev->drm.dev, dma_addr & ~IVPU_MMU_ENTRY_FLAGS_MASK, PAGE_SIZE,
+			       DMA_BIDIRECTIONAL);
+		set_pages_array_wb(&page, 1);
+		put_page(page);
+	}
+}
+
 static int ivpu_mmu_pgtable_init(struct ivpu_device *vdev, struct ivpu_mmu_pgtable *pgtable)
 {
 	dma_addr_t pgd_dma;
 
-	pgtable->pgd_dma_ptr = dma_alloc_coherent(vdev->drm.dev, IVPU_MMU_PGTABLE_SIZE, &pgd_dma,
-						  GFP_KERNEL);
+	pgtable->pgd_dma_ptr = ivpu_pgtable_alloc_page(vdev, &pgd_dma);
 	if (!pgtable->pgd_dma_ptr)
 		return -ENOMEM;
 
@@ -53,13 +101,6 @@ static int ivpu_mmu_pgtable_init(struct ivpu_device *vdev, struct ivpu_mmu_pgtab
 	return 0;
 }
 
-static void ivpu_mmu_pgtable_free(struct ivpu_device *vdev, u64 *cpu_addr, dma_addr_t dma_addr)
-{
-	if (cpu_addr)
-		dma_free_coherent(vdev->drm.dev, IVPU_MMU_PGTABLE_SIZE, cpu_addr,
-				  dma_addr & ~IVPU_MMU_ENTRY_FLAGS_MASK);
-}
-
 static void ivpu_mmu_pgtables_free(struct ivpu_device *vdev, struct ivpu_mmu_pgtable *pgtable)
 {
 	int pgd_idx, pud_idx, pmd_idx;
@@ -84,19 +125,19 @@ static void ivpu_mmu_pgtables_free(struct ivpu_device *vdev, struct ivpu_mmu_pgt
 				pte_dma_ptr = pgtable->pte_ptrs[pgd_idx][pud_idx][pmd_idx];
 				pte_dma = pgtable->pmd_ptrs[pgd_idx][pud_idx][pmd_idx];
 
-				ivpu_mmu_pgtable_free(vdev, pte_dma_ptr, pte_dma);
+				ivpu_pgtable_free_page(vdev, pte_dma_ptr, pte_dma);
 			}
 
 			kfree(pgtable->pte_ptrs[pgd_idx][pud_idx]);
-			ivpu_mmu_pgtable_free(vdev, pmd_dma_ptr, pmd_dma);
+			ivpu_pgtable_free_page(vdev, pmd_dma_ptr, pmd_dma);
 		}
 
 		kfree(pgtable->pmd_ptrs[pgd_idx]);
 		kfree(pgtable->pte_ptrs[pgd_idx]);
-		ivpu_mmu_pgtable_free(vdev, pud_dma_ptr, pud_dma);
+		ivpu_pgtable_free_page(vdev, pud_dma_ptr, pud_dma);
 	}
 
-	ivpu_mmu_pgtable_free(vdev, pgtable->pgd_dma_ptr, pgtable->pgd_dma);
+	ivpu_pgtable_free_page(vdev, pgtable->pgd_dma_ptr, pgtable->pgd_dma);
 }
 
 static u64*
@@ -108,7 +149,7 @@ ivpu_mmu_ensure_pud(struct ivpu_device *vdev, struct ivpu_mmu_pgtable *pgtable,
 	if (pud_dma_ptr)
 		return pud_dma_ptr;
 
-	pud_dma_ptr = dma_alloc_wc(vdev->drm.dev, IVPU_MMU_PGTABLE_SIZE, &pud_dma, GFP_KERNEL);
+	pud_dma_ptr = ivpu_pgtable_alloc_page(vdev, &pud_dma);
 	if (!pud_dma_ptr)
 		return NULL;
 
@@ -131,7 +172,7 @@ err_free_pmd_ptrs:
 	kfree(pgtable->pmd_ptrs[pgd_idx]);
 
 err_free_pud_dma_ptr:
-	ivpu_mmu_pgtable_free(vdev, pud_dma_ptr, pud_dma);
+	ivpu_pgtable_free_page(vdev, pud_dma_ptr, pud_dma);
 	return NULL;
 }
 
@@ -145,7 +186,7 @@ ivpu_mmu_ensure_pmd(struct ivpu_device *vdev, struct ivpu_mmu_pgtable *pgtable,
 	if (pmd_dma_ptr)
 		return pmd_dma_ptr;
 
-	pmd_dma_ptr = dma_alloc_wc(vdev->drm.dev, IVPU_MMU_PGTABLE_SIZE, &pmd_dma, GFP_KERNEL);
+	pmd_dma_ptr = ivpu_pgtable_alloc_page(vdev, &pmd_dma);
 	if (!pmd_dma_ptr)
 		return NULL;
 
@@ -160,7 +201,7 @@ ivpu_mmu_ensure_pmd(struct ivpu_device *vdev, struct ivpu_mmu_pgtable *pgtable,
 	return pmd_dma_ptr;
 
 err_free_pmd_dma_ptr:
-	ivpu_mmu_pgtable_free(vdev, pmd_dma_ptr, pmd_dma);
+	ivpu_pgtable_free_page(vdev, pmd_dma_ptr, pmd_dma);
 	return NULL;
 }
 
@@ -174,7 +215,7 @@ ivpu_mmu_ensure_pte(struct ivpu_device *vdev, struct ivpu_mmu_pgtable *pgtable,
 	if (pte_dma_ptr)
 		return pte_dma_ptr;
 
-	pte_dma_ptr = dma_alloc_wc(vdev->drm.dev, IVPU_MMU_PGTABLE_SIZE, &pte_dma, GFP_KERNEL);
+	pte_dma_ptr = ivpu_pgtable_alloc_page(vdev, &pte_dma);
 	if (!pte_dma_ptr)
 		return NULL;
 
@@ -249,38 +290,6 @@ static void ivpu_mmu_context_unmap_page(struct ivpu_mmu_context *ctx, u64 vpu_ad
 	ctx->pgtable.pte_ptrs[pgd_idx][pud_idx][pmd_idx][pte_idx] = IVPU_MMU_ENTRY_INVALID;
 }
 
-static void
-ivpu_mmu_context_flush_page_tables(struct ivpu_mmu_context *ctx, u64 vpu_addr, size_t size)
-{
-	struct ivpu_mmu_pgtable *pgtable = &ctx->pgtable;
-	u64 end_addr = vpu_addr + size;
-
-	/* Align to PMD entry (2 MB) */
-	vpu_addr &= ~(IVPU_MMU_PTE_MAP_SIZE - 1);
-
-	while (vpu_addr < end_addr) {
-		int pgd_idx = FIELD_GET(IVPU_MMU_PGD_INDEX_MASK, vpu_addr);
-		u64 pud_end = (pgd_idx + 1) * (u64)IVPU_MMU_PUD_MAP_SIZE;
-
-		while (vpu_addr < end_addr && vpu_addr < pud_end) {
-			int pud_idx = FIELD_GET(IVPU_MMU_PUD_INDEX_MASK, vpu_addr);
-			u64 pmd_end = (pud_idx + 1) * (u64)IVPU_MMU_PMD_MAP_SIZE;
-
-			while (vpu_addr < end_addr && vpu_addr < pmd_end) {
-				int pmd_idx = FIELD_GET(IVPU_MMU_PMD_INDEX_MASK, vpu_addr);
-
-				clflush_cache_range(pgtable->pte_ptrs[pgd_idx][pud_idx][pmd_idx],
-						    IVPU_MMU_PGTABLE_SIZE);
-				vpu_addr += IVPU_MMU_PTE_MAP_SIZE;
-			}
-			clflush_cache_range(pgtable->pmd_ptrs[pgd_idx][pud_idx],
-					    IVPU_MMU_PGTABLE_SIZE);
-		}
-		clflush_cache_range(pgtable->pud_ptrs[pgd_idx], IVPU_MMU_PGTABLE_SIZE);
-	}
-	clflush_cache_range(pgtable->pgd_dma_ptr, IVPU_MMU_PGTABLE_SIZE);
-}
-
 static int
 ivpu_mmu_context_map_pages(struct ivpu_device *vdev, struct ivpu_mmu_context *ctx,
 			   u64 vpu_addr, dma_addr_t dma_addr, size_t size, u64 prot)
@@ -327,6 +336,9 @@ ivpu_mmu_context_map_sgt(struct ivpu_device *vdev, struct ivpu_mmu_context *ctx,
 	u64 prot;
 	u64 i;
 
+	if (drm_WARN_ON(&vdev->drm, !ctx))
+		return -EINVAL;
+
 	if (!IS_ALIGNED(vpu_addr, IVPU_MMU_PAGE_SIZE))
 		return -EINVAL;
 
@@ -349,10 +361,11 @@ ivpu_mmu_context_map_sgt(struct ivpu_device *vdev, struct ivpu_mmu_context *ctx,
 			mutex_unlock(&ctx->lock);
 			return ret;
 		}
-		ivpu_mmu_context_flush_page_tables(ctx, vpu_addr, size);
 		vpu_addr += size;
 	}
 
+	/* Ensure page table modifications are flushed from wc buffers to memory */
+	wmb();
 	mutex_unlock(&ctx->lock);
 
 	ret = ivpu_mmu_invalidate_tlb(vdev, ctx->id);
@@ -369,8 +382,8 @@ ivpu_mmu_context_unmap_sgt(struct ivpu_device *vdev, struct ivpu_mmu_context *ct
 	int ret;
 	u64 i;
 
-	if (!IS_ALIGNED(vpu_addr, IVPU_MMU_PAGE_SIZE))
-		ivpu_warn(vdev, "Unaligned vpu_addr: 0x%llx\n", vpu_addr);
+	if (drm_WARN_ON(&vdev->drm, !ctx))
+		return;
 
 	mutex_lock(&ctx->lock);
 
@@ -378,10 +391,11 @@ ivpu_mmu_context_unmap_sgt(struct ivpu_device *vdev, struct ivpu_mmu_context *ct
 		size_t size = sg_dma_len(sg) + sg->offset;
 
 		ivpu_mmu_context_unmap_pages(ctx, vpu_addr, size);
-		ivpu_mmu_context_flush_page_tables(ctx, vpu_addr, size);
 		vpu_addr += size;
 	}
 
+	/* Ensure page table modifications are flushed from wc buffers to memory */
+	wmb();
 	mutex_unlock(&ctx->lock);
 
 	ret = ivpu_mmu_invalidate_tlb(vdev, ctx->id);
@@ -390,28 +404,34 @@ ivpu_mmu_context_unmap_sgt(struct ivpu_device *vdev, struct ivpu_mmu_context *ct
 }
 
 int
-ivpu_mmu_context_insert_node_locked(struct ivpu_mmu_context *ctx,
-				    const struct ivpu_addr_range *range,
-				    u64 size, struct drm_mm_node *node)
+ivpu_mmu_context_insert_node(struct ivpu_mmu_context *ctx, const struct ivpu_addr_range *range,
+			     u64 size, struct drm_mm_node *node)
 {
-	lockdep_assert_held(&ctx->lock);
+	int ret;
 
+	WARN_ON(!range);
+
+	mutex_lock(&ctx->lock);
 	if (!ivpu_disable_mmu_cont_pages && size >= IVPU_MMU_CONT_PAGES_SIZE) {
-		if (!drm_mm_insert_node_in_range(&ctx->mm, node, size, IVPU_MMU_CONT_PAGES_SIZE, 0,
-						 range->start, range->end, DRM_MM_INSERT_BEST))
-			return 0;
+		ret = drm_mm_insert_node_in_range(&ctx->mm, node, size, IVPU_MMU_CONT_PAGES_SIZE, 0,
+						  range->start, range->end, DRM_MM_INSERT_BEST);
+		if (!ret)
+			goto unlock;
 	}
 
-	return drm_mm_insert_node_in_range(&ctx->mm, node, size, IVPU_MMU_PAGE_SIZE, 0,
-					   range->start, range->end, DRM_MM_INSERT_BEST);
+	ret = drm_mm_insert_node_in_range(&ctx->mm, node, size, IVPU_MMU_PAGE_SIZE, 0,
+					  range->start, range->end, DRM_MM_INSERT_BEST);
+unlock:
+	mutex_unlock(&ctx->lock);
+	return ret;
 }
 
 void
-ivpu_mmu_context_remove_node_locked(struct ivpu_mmu_context *ctx, struct drm_mm_node *node)
+ivpu_mmu_context_remove_node(struct ivpu_mmu_context *ctx, struct drm_mm_node *node)
 {
-	lockdep_assert_held(&ctx->lock);
-
+	mutex_lock(&ctx->lock);
 	drm_mm_remove_node(node);
+	mutex_unlock(&ctx->lock);
 }
 
 static int
@@ -421,7 +441,6 @@ ivpu_mmu_context_init(struct ivpu_device *vdev, struct ivpu_mmu_context *ctx, u3
 	int ret;
 
 	mutex_init(&ctx->lock);
-	INIT_LIST_HEAD(&ctx->bo_list);
 
 	ret = ivpu_mmu_pgtable_init(vdev, &ctx->pgtable);
 	if (ret) {
diff --git a/drivers/accel/ivpu/ivpu_mmu_context.h b/drivers/accel/ivpu/ivpu_mmu_context.h
index f15d8c630d8a..535db3a1fc74 100644
--- a/drivers/accel/ivpu/ivpu_mmu_context.h
+++ b/drivers/accel/ivpu/ivpu_mmu_context.h
@@ -23,10 +23,9 @@ struct ivpu_mmu_pgtable {
 };
 
 struct ivpu_mmu_context {
-	struct mutex lock; /* protects: mm, pgtable, bo_list */
+	struct mutex lock; /* Protects: mm, pgtable */
 	struct drm_mm mm;
 	struct ivpu_mmu_pgtable pgtable;
-	struct list_head bo_list;
 	u32 id;
 };
 
@@ -39,11 +38,9 @@ int ivpu_mmu_user_context_init(struct ivpu_device *vdev, struct ivpu_mmu_context
 void ivpu_mmu_user_context_fini(struct ivpu_device *vdev, struct ivpu_mmu_context *ctx);
 void ivpu_mmu_user_context_mark_invalid(struct ivpu_device *vdev, u32 ssid);
 
-int ivpu_mmu_context_insert_node_locked(struct ivpu_mmu_context *ctx,
-					const struct ivpu_addr_range *range,
-					u64 size, struct drm_mm_node *node);
-void ivpu_mmu_context_remove_node_locked(struct ivpu_mmu_context *ctx,
-					 struct drm_mm_node *node);
+int ivpu_mmu_context_insert_node(struct ivpu_mmu_context *ctx, const struct ivpu_addr_range *range,
+				 u64 size, struct drm_mm_node *node);
+void ivpu_mmu_context_remove_node(struct ivpu_mmu_context *ctx, struct drm_mm_node *node);
 
 int ivpu_mmu_context_map_sgt(struct ivpu_device *vdev, struct ivpu_mmu_context *ctx,
 			     u64 vpu_addr, struct sg_table *sgt, bool llc_coherent);
diff --git a/drivers/accel/ivpu/ivpu_pm.c b/drivers/accel/ivpu/ivpu_pm.c
index e9b16cbc26f4..0af8864cb3b5 100644
--- a/drivers/accel/ivpu/ivpu_pm.c
+++ b/drivers/accel/ivpu/ivpu_pm.c
@@ -15,6 +15,7 @@
 #include "ivpu_fw.h"
 #include "ivpu_ipc.h"
 #include "ivpu_job.h"
+#include "ivpu_jsm_msg.h"
 #include "ivpu_mmu.h"
 #include "ivpu_pm.h"
 
@@ -22,6 +23,10 @@ static bool ivpu_disable_recovery;
 module_param_named_unsafe(disable_recovery, ivpu_disable_recovery, bool, 0644);
 MODULE_PARM_DESC(disable_recovery, "Disables recovery when VPU hang is detected");
 
+static unsigned long ivpu_tdr_timeout_ms;
+module_param_named(tdr_timeout_ms, ivpu_tdr_timeout_ms, ulong, 0644);
+MODULE_PARM_DESC(tdr_timeout_ms, "Timeout for device hang detection, in milliseconds, 0 - default");
+
 #define PM_RESCHEDULE_LIMIT     5
 
 static void ivpu_pm_prepare_cold_boot(struct ivpu_device *vdev)
@@ -69,27 +74,31 @@ retry:
 	ret = ivpu_hw_power_up(vdev);
 	if (ret) {
 		ivpu_err(vdev, "Failed to power up HW: %d\n", ret);
-		return ret;
+		goto err_power_down;
 	}
 
 	ret = ivpu_mmu_enable(vdev);
 	if (ret) {
 		ivpu_err(vdev, "Failed to resume MMU: %d\n", ret);
-		ivpu_hw_power_down(vdev);
-		return ret;
+		goto err_power_down;
 	}
 
 	ret = ivpu_boot(vdev);
-	if (ret) {
-		ivpu_mmu_disable(vdev);
-		ivpu_hw_power_down(vdev);
-		if (!ivpu_fw_is_cold_boot(vdev)) {
-			ivpu_warn(vdev, "Failed to resume the FW: %d. Retrying cold boot..\n", ret);
-			ivpu_pm_prepare_cold_boot(vdev);
-			goto retry;
-		} else {
-			ivpu_err(vdev, "Failed to resume the FW: %d\n", ret);
-		}
+	if (ret)
+		goto err_mmu_disable;
+
+	return 0;
+
+err_mmu_disable:
+	ivpu_mmu_disable(vdev);
+err_power_down:
+	ivpu_hw_power_down(vdev);
+
+	if (!ivpu_fw_is_cold_boot(vdev)) {
+		ivpu_pm_prepare_cold_boot(vdev);
+		goto retry;
+	} else {
+		ivpu_err(vdev, "Failed to resume the FW: %d\n", ret);
 	}
 
 	return ret;
@@ -136,6 +145,31 @@ void ivpu_pm_schedule_recovery(struct ivpu_device *vdev)
 	}
 }
 
+static void ivpu_job_timeout_work(struct work_struct *work)
+{
+	struct ivpu_pm_info *pm = container_of(work, struct ivpu_pm_info, job_timeout_work.work);
+	struct ivpu_device *vdev = pm->vdev;
+	unsigned long timeout_ms = ivpu_tdr_timeout_ms ? ivpu_tdr_timeout_ms : vdev->timeout.tdr;
+
+	ivpu_err(vdev, "TDR detected, timeout %lu ms", timeout_ms);
+	ivpu_hw_diagnose_failure(vdev);
+
+	ivpu_pm_schedule_recovery(vdev);
+}
+
+void ivpu_start_job_timeout_detection(struct ivpu_device *vdev)
+{
+	unsigned long timeout_ms = ivpu_tdr_timeout_ms ? ivpu_tdr_timeout_ms : vdev->timeout.tdr;
+
+	/* No-op if already queued */
+	queue_delayed_work(system_wq, &vdev->pm->job_timeout_work, msecs_to_jiffies(timeout_ms));
+}
+
+void ivpu_stop_job_timeout_detection(struct ivpu_device *vdev)
+{
+	cancel_delayed_work_sync(&vdev->pm->job_timeout_work);
+}
+
 int ivpu_pm_suspend_cb(struct device *dev)
 {
 	struct drm_device *drm = dev_get_drvdata(dev);
@@ -153,6 +187,8 @@ int ivpu_pm_suspend_cb(struct device *dev)
 		}
 	}
 
+	ivpu_jsm_pwr_d0i3_enter(vdev);
+
 	ivpu_suspend(vdev);
 	ivpu_pm_prepare_warm_boot(vdev);
 
@@ -188,6 +224,7 @@ int ivpu_pm_runtime_suspend_cb(struct device *dev)
 {
 	struct drm_device *drm = dev_get_drvdata(dev);
 	struct ivpu_device *vdev = to_ivpu_device(drm);
+	bool hw_is_idle = true;
 	int ret;
 
 	ivpu_dbg(vdev, PM, "Runtime suspend..\n");
@@ -200,11 +237,16 @@ int ivpu_pm_runtime_suspend_cb(struct device *dev)
 		return -EAGAIN;
 	}
 
+	if (!vdev->pm->suspend_reschedule_counter)
+		hw_is_idle = false;
+	else if (ivpu_jsm_pwr_d0i3_enter(vdev))
+		hw_is_idle = false;
+
 	ret = ivpu_suspend(vdev);
 	if (ret)
 		ivpu_err(vdev, "Failed to set suspend VPU: %d\n", ret);
 
-	if (!vdev->pm->suspend_reschedule_counter) {
+	if (!hw_is_idle) {
 		ivpu_warn(vdev, "VPU failed to enter idle, force suspended.\n");
 		ivpu_pm_prepare_cold_boot(vdev);
 	} else {
@@ -304,6 +346,7 @@ void ivpu_pm_init(struct ivpu_device *vdev)
 
 	atomic_set(&pm->in_reset, 0);
 	INIT_WORK(&pm->recovery_work, ivpu_pm_recovery_work);
+	INIT_DELAYED_WORK(&pm->job_timeout_work, ivpu_job_timeout_work);
 
 	if (ivpu_disable_recovery)
 		delay = -1;
@@ -318,6 +361,7 @@ void ivpu_pm_init(struct ivpu_device *vdev)
 
 void ivpu_pm_cancel_recovery(struct ivpu_device *vdev)
 {
+	drm_WARN_ON(&vdev->drm, delayed_work_pending(&vdev->pm->job_timeout_work));
 	cancel_work_sync(&vdev->pm->recovery_work);
 }
 
diff --git a/drivers/accel/ivpu/ivpu_pm.h b/drivers/accel/ivpu/ivpu_pm.h
index 044db150be07..97c6e0b0aa42 100644
--- a/drivers/accel/ivpu/ivpu_pm.h
+++ b/drivers/accel/ivpu/ivpu_pm.h
@@ -12,6 +12,7 @@ struct ivpu_device;
 
 struct ivpu_pm_info {
 	struct ivpu_device *vdev;
+	struct delayed_work job_timeout_work;
 	struct work_struct recovery_work;
 	atomic_t in_reset;
 	atomic_t reset_counter;
@@ -37,5 +38,7 @@ int __must_check ivpu_rpm_get_if_active(struct ivpu_device *vdev);
 void ivpu_rpm_put(struct ivpu_device *vdev);
 
 void ivpu_pm_schedule_recovery(struct ivpu_device *vdev);
+void ivpu_start_job_timeout_detection(struct ivpu_device *vdev);
+void ivpu_stop_job_timeout_detection(struct ivpu_device *vdev);
 
 #endif /* __IVPU_PM_H__ */
diff --git a/drivers/accel/ivpu/vpu_boot_api.h b/drivers/accel/ivpu/vpu_boot_api.h
index 6b71be92ba65..04c954258563 100644
--- a/drivers/accel/ivpu/vpu_boot_api.h
+++ b/drivers/accel/ivpu/vpu_boot_api.h
@@ -11,7 +11,10 @@
  *  The bellow values will be used to construct the version info this way:
  *  fw_bin_header->api_version[VPU_BOOT_API_VER_ID] = (VPU_BOOT_API_VER_MAJOR << 16) |
  *  VPU_BOOT_API_VER_MINOR;
- *  VPU_BOOT_API_VER_PATCH will be ignored. KMD and compatibility is not affected if this changes.
+ *  VPU_BOOT_API_VER_PATCH will be ignored. KMD and compatibility is not affected if this changes
+ *  This information is collected by using vpuip_2/application/vpuFirmware/make_std_fw_image.py
+ *  If a header is missing this info we ignore the header, if a header is missing or contains
+ *  partial info a build error will be generated.
  */
 
 /*
@@ -24,12 +27,12 @@
  * Minor version changes when API backward compatibility is preserved.
  * Resets to 0 if Major version is incremented.
  */
-#define VPU_BOOT_API_VER_MINOR 12
+#define VPU_BOOT_API_VER_MINOR 20
 
 /*
  * API header changed (field names, documentation, formatting) but API itself has not been changed
  */
-#define VPU_BOOT_API_VER_PATCH 2
+#define VPU_BOOT_API_VER_PATCH 4
 
 /*
  * Index in the API version table
@@ -63,6 +66,12 @@ struct vpu_firmware_header {
 	/* Size of memory require for firmware execution */
 	u32 runtime_size;
 	u32 shave_nn_fw_size;
+	/* Size of primary preemption buffer. */
+	u32 preemption_buffer_1_size;
+	/* Size of secondary preemption buffer. */
+	u32 preemption_buffer_2_size;
+	/* Space reserved for future preemption-related fields. */
+	u32 preemption_reserved[6];
 };
 
 /*
@@ -89,6 +98,14 @@ enum VPU_BOOT_L2_CACHE_CFG_TYPE {
 	VPU_BOOT_L2_CACHE_CFG_NUM = 2
 };
 
+/** VPU MCA ECC signalling mode. By default, no signalling is used */
+enum VPU_BOOT_MCA_ECC_SIGNAL_TYPE {
+	VPU_BOOT_MCA_ECC_NONE = 0,
+	VPU_BOOT_MCA_ECC_CORR = 1,
+	VPU_BOOT_MCA_ECC_FATAL = 2,
+	VPU_BOOT_MCA_ECC_BOTH = 3
+};
+
 /**
  * Logging destinations.
  *
@@ -131,9 +148,11 @@ enum vpu_trace_destination {
 #define VPU_TRACE_PROC_BIT_ACT_SHV_3 22
 #define VPU_TRACE_PROC_NO_OF_HW_DEVS 23
 
-/* KMB HW component IDs are sequential, so define first and last IDs. */
-#define VPU_TRACE_PROC_BIT_KMB_FIRST VPU_TRACE_PROC_BIT_LRT
-#define VPU_TRACE_PROC_BIT_KMB_LAST  VPU_TRACE_PROC_BIT_SHV_15
+/* VPU 30xx HW component IDs are sequential, so define first and last IDs. */
+#define VPU_TRACE_PROC_BIT_30XX_FIRST VPU_TRACE_PROC_BIT_LRT
+#define VPU_TRACE_PROC_BIT_30XX_LAST  VPU_TRACE_PROC_BIT_SHV_15
+#define VPU_TRACE_PROC_BIT_KMB_FIRST  VPU_TRACE_PROC_BIT_30XX_FIRST
+#define VPU_TRACE_PROC_BIT_KMB_LAST   VPU_TRACE_PROC_BIT_30XX_LAST
 
 struct vpu_boot_l2_cache_config {
 	u8 use;
@@ -148,6 +167,25 @@ struct vpu_warm_boot_section {
 	u32 is_clear_op;
 };
 
+/*
+ * When HW scheduling mode is enabled, a present period is defined.
+ * It will be used by VPU to swap between normal and focus priorities
+ * to prevent starving of normal priority band (when implemented).
+ * Host must provide a valid value at boot time in
+ * `vpu_focus_present_timer_ms`. If the value provided by the host is not within the
+ * defined range a default value will be used. Here we define the min. and max.
+ * allowed values and the and default value of the present period. Units are milliseconds.
+ */
+#define VPU_PRESENT_CALL_PERIOD_MS_DEFAULT 50
+#define VPU_PRESENT_CALL_PERIOD_MS_MIN	   16
+#define VPU_PRESENT_CALL_PERIOD_MS_MAX	   10000
+
+/**
+ * Macros to enable various operation modes within the VPU.
+ * To be defined as part of 32 bit mask.
+ */
+#define VPU_OP_MODE_SURVIVABILITY 0x1
+
 struct vpu_boot_params {
 	u32 magic;
 	u32 vpu_id;
@@ -218,6 +256,7 @@ struct vpu_boot_params {
 	 * the threshold will not be logged); applies to every enabled logging
 	 * destination and loggable HW component. See 'mvLog_t' enum for acceptable
 	 * values.
+	 * TODO: EISW-33556: Move log level definition (mvLog_t) to this file.
 	 */
 	u32 default_trace_level;
 	u32 boot_type;
@@ -249,7 +288,36 @@ struct vpu_boot_params {
 	u32 temp_sensor_period_ms;
 	/** PLL ratio for efficient clock frequency */
 	u32 pn_freq_pll_ratio;
-	u32 pad4[28];
+	/** DVFS Mode: Default: 0, Max Performance: 1, On Demand: 2, Power Save: 3 */
+	u32 dvfs_mode;
+	/**
+	 * Depending on DVFS Mode:
+	 * On-demand: Default if 0.
+	 *    Bit 0-7   - uint8_t: Highest residency percent
+	 *    Bit 8-15  - uint8_t: High residency percent
+	 *    Bit 16-23 - uint8_t: Low residency percent
+	 *    Bit 24-31 - uint8_t: Lowest residency percent
+	 *    Bit 32-35 - unsigned 4b: PLL Ratio increase amount on highest residency
+	 *    Bit 36-39 - unsigned 4b: PLL Ratio increase amount on high residency
+	 *    Bit 40-43 - unsigned 4b: PLL Ratio decrease amount on low residency
+	 *    Bit 44-47 - unsigned 4b: PLL Ratio decrease amount on lowest frequency
+	 *    Bit 48-55 - uint8_t: Period (ms) for residency decisions
+	 *    Bit 56-63 - uint8_t: Averaging windows (as multiples of period. Max: 30 decimal)
+	 * Power Save/Max Performance: Unused
+	 */
+	u64 dvfs_param;
+	/**
+	 * D0i3 delayed entry
+	 * Bit0: Disable CPU state save on D0i2 entry flow.
+	 *       0: Every D0i2 entry saves state. Save state IPC message ignored.
+	 *       1: IPC message required to save state on D0i3 entry flow.
+	 */
+	u32 d0i3_delayed_entry;
+	/* Time spent by VPU in D0i3 state */
+	u64 d0i3_residency_time_us;
+	/* Value of VPU perf counter at the time of entering D0i3 state . */
+	u64 d0i3_entry_vpu_ts;
+	u32 pad4[20];
 	/* Warm boot information: 0x400 - 0x43F */
 	u32 warm_boot_sections_count;
 	u32 warm_boot_start_address_reference;
@@ -274,8 +342,12 @@ struct vpu_boot_params {
 	u32 vpu_scheduling_mode;
 	/* Present call period in milliseconds. */
 	u32 vpu_focus_present_timer_ms;
-	/* Unused/reserved: 0x478 - 0xFFF */
-	u32 pad6[738];
+	/* VPU ECC Signaling */
+	u32 vpu_uses_ecc_mca_signal;
+	/* Values defined by VPU_OP_MODE* macros */
+	u32 vpu_operation_mode;
+	/* Unused/reserved: 0x480 - 0xFFF */
+	u32 pad6[736];
 };
 
 /*
diff --git a/drivers/accel/ivpu/vpu_jsm_api.h b/drivers/accel/ivpu/vpu_jsm_api.h
index 2949ec8365bd..7da7622742be 100644
--- a/drivers/accel/ivpu/vpu_jsm_api.h
+++ b/drivers/accel/ivpu/vpu_jsm_api.h
@@ -22,12 +22,12 @@
 /*
  * Minor version changes when API backward compatibility is preserved.
  */
-#define VPU_JSM_API_VER_MINOR 0
+#define VPU_JSM_API_VER_MINOR 15
 
 /*
  * API header changed (field names, documentation, formatting) but API itself has not been changed
  */
-#define VPU_JSM_API_VER_PATCH 1
+#define VPU_JSM_API_VER_PATCH 0
 
 /*
  * Index in the API version table
@@ -84,11 +84,13 @@
  * Job flags bit masks.
  */
 #define VPU_JOB_FLAGS_NULL_SUBMISSION_MASK 0x00000001
+#define VPU_JOB_FLAGS_PRIVATE_DATA_MASK	   0xFF000000
 
 /*
  * Sizes of the reserved areas in jobs, in bytes.
  */
-#define VPU_JOB_RESERVED_BYTES	     16
+#define VPU_JOB_RESERVED_BYTES 8
+
 /*
  * Sizes of the reserved areas in job queues, in bytes.
  */
@@ -108,6 +110,20 @@
  */
 #define VPU_DYNDBG_CMD_MAX_LEN 96
 
+/*
+ * For HWS command queue scheduling, we can prioritise command queues inside the
+ * same process with a relative in-process priority. Valid values for relative
+ * priority are given below - max and min.
+ */
+#define VPU_HWS_COMMAND_QUEUE_MAX_IN_PROCESS_PRIORITY 7
+#define VPU_HWS_COMMAND_QUEUE_MIN_IN_PROCESS_PRIORITY -7
+
+/*
+ * For HWS priority scheduling, we can have multiple realtime priority bands.
+ * They are numbered 0 to a MAX.
+ */
+#define VPU_HWS_MAX_REALTIME_PRIORITY_LEVEL 31U
+
 /*
  * Job format.
  */
@@ -117,8 +133,14 @@ struct vpu_job_queue_entry {
 	u32 flags; /**< Flags bit field, see VPU_JOB_FLAGS_* above */
 	u64 root_page_table_addr; /**< Address of root page table to use for this job */
 	u64 root_page_table_update_counter; /**< Page tables update events counter */
-	u64 preemption_buffer_address; /**< Address of the preemption buffer to use for this job */
-	u64 preemption_buffer_size; /**< Size of the preemption buffer to use for this job */
+	u64 primary_preempt_buf_addr;
+	/**< Address of the primary preemption buffer to use for this job */
+	u32 primary_preempt_buf_size;
+	/**< Size of the primary preemption buffer to use for this job */
+	u32 secondary_preempt_buf_size;
+	/**< Size of secondary preemption buffer to use for this job */
+	u64 secondary_preempt_buf_addr;
+	/**< Address of secondary preemption buffer to use for this job */
 	u8 reserved_0[VPU_JOB_RESERVED_BYTES];
 };
 
@@ -152,6 +174,46 @@ enum vpu_trace_entity_type {
 	VPU_TRACE_ENTITY_TYPE_HW_COMPONENT = 2,
 };
 
+/*
+ * HWS specific log buffer header details.
+ * Total size is 32 bytes.
+ */
+struct vpu_hws_log_buffer_header {
+	/* Written by VPU after adding a log entry. Initialised by host to 0. */
+	u32 first_free_entry_index;
+	/* Incremented by VPU every time the VPU overwrites the 0th entry;
+	 * initialised by host to 0.
+	 */
+	u32 wraparound_count;
+	/*
+	 * This is the number of buffers that can be stored in the log buffer provided by the host.
+	 * It is written by host before passing buffer to VPU. VPU should consider it read-only.
+	 */
+	u64 num_of_entries;
+	u64 reserved[2];
+};
+
+/*
+ * HWS specific log buffer entry details.
+ * Total size is 32 bytes.
+ */
+struct vpu_hws_log_buffer_entry {
+	/* VPU timestamp must be an invariant timer tick (not impacted by DVFS) */
+	u64 vpu_timestamp;
+	/*
+	 * Operation type:
+	 *     0 - context state change
+	 *     1 - queue new work
+	 *     2 - queue unwait sync object
+	 *     3 - queue no more work
+	 *     4 - queue wait sync object
+	 */
+	u32 operation_type;
+	u32 reserved;
+	/* Operation data depends on operation type */
+	u64 operation_data[2];
+};
+
 /*
  * Host <-> VPU IPC messages types.
  */
@@ -228,6 +290,23 @@ enum vpu_ipc_msg_type {
 	 * deallocated or reassigned to another context.
 	 */
 	VPU_JSM_MSG_HWS_REGISTER_DB = 0x1117,
+	/** Control command: Log buffer setting */
+	VPU_JSM_MSG_HWS_SET_SCHEDULING_LOG = 0x1118,
+	/* Control command: Suspend command queue. */
+	VPU_JSM_MSG_HWS_SUSPEND_CMDQ = 0x1119,
+	/* Control command: Resume command queue */
+	VPU_JSM_MSG_HWS_RESUME_CMDQ = 0x111a,
+	/* Control command: Resume engine after reset */
+	VPU_JSM_MSG_HWS_ENGINE_RESUME = 0x111b,
+	/* Control command: Enable survivability/DCT mode */
+	VPU_JSM_MSG_DCT_ENABLE = 0x111c,
+	/* Control command: Disable survivability/DCT mode */
+	VPU_JSM_MSG_DCT_DISABLE = 0x111d,
+	/**
+	 * Dump VPU state. To be used for debug purposes only.
+	 * NOTE: Please introduce new ASYNC commands before this one. *
+	 */
+	VPU_JSM_MSG_STATE_DUMP = 0x11FF,
 	/* IPC Host -> Device, General commands */
 	VPU_JSM_MSG_GENERAL_CMD = 0x1200,
 	VPU_JSM_MSG_BLOB_DEINIT = VPU_JSM_MSG_GENERAL_CMD,
@@ -236,6 +315,10 @@ enum vpu_ipc_msg_type {
 	 * Linux command: `echo '<dyndbg_cmd>' > <debugfs>/dynamic_debug/control`.
 	 */
 	VPU_JSM_MSG_DYNDBG_CONTROL = 0x1201,
+	/**
+	 * Perform the save procedure for the D0i3 entry
+	 */
+	VPU_JSM_MSG_PWR_D0I3_ENTER = 0x1202,
 	/* IPC Device -> Host, Job completion */
 	VPU_JSM_MSG_JOB_DONE = 0x2100,
 	/* IPC Device -> Host, Async command completion */
@@ -304,11 +387,35 @@ enum vpu_ipc_msg_type {
 	VPU_JSM_MSG_DESTROY_CMD_QUEUE_RSP = 0x2216,
 	/** Response to control command: Set context scheduling properties */
 	VPU_JSM_MSG_SET_CONTEXT_SCHED_PROPERTIES_RSP = 0x2217,
+	/** Response to control command: Log buffer setting */
+	VPU_JSM_MSG_HWS_SET_SCHEDULING_LOG_RSP = 0x2218,
+	/* IPC Device -> Host, HWS notify index entry of log buffer written */
+	VPU_JSM_MSG_HWS_SCHEDULING_LOG_NOTIFICATION = 0x2219,
+	/* IPC Device -> Host, HWS completion of a context suspend request */
+	VPU_JSM_MSG_HWS_SUSPEND_CMDQ_DONE = 0x221a,
+	/* Response to control command: Resume command queue */
+	VPU_JSM_MSG_HWS_RESUME_CMDQ_RSP = 0x221b,
+	/* Response to control command: Resume engine command response */
+	VPU_JSM_MSG_HWS_RESUME_ENGINE_DONE = 0x221c,
+	/* Response to control command: Enable survivability/DCT mode */
+	VPU_JSM_MSG_DCT_ENABLE_DONE = 0x221d,
+	/* Response to control command: Disable survivability/DCT mode */
+	VPU_JSM_MSG_DCT_DISABLE_DONE = 0x221e,
+	/**
+	 * Response to state dump control command.
+	 * NOTE: Please introduce new ASYNC responses before this one. *
+	 */
+	VPU_JSM_MSG_STATE_DUMP_RSP = 0x22FF,
 	/* IPC Device -> Host, General command completion */
 	VPU_JSM_MSG_GENERAL_CMD_DONE = 0x2300,
 	VPU_JSM_MSG_BLOB_DEINIT_DONE = VPU_JSM_MSG_GENERAL_CMD_DONE,
 	/** Response to VPU_JSM_MSG_DYNDBG_CONTROL. */
 	VPU_JSM_MSG_DYNDBG_CONTROL_RSP = 0x2301,
+	/**
+	 * Acknowledgment of completion of the save procedure initiated by
+	 * VPU_JSM_MSG_PWR_D0I3_ENTER
+	 */
+	VPU_JSM_MSG_PWR_D0I3_ENTER_DONE = 0x2302,
 };
 
 enum vpu_ipc_msg_status { VPU_JSM_MSG_FREE, VPU_JSM_MSG_ALLOCATED };
@@ -593,12 +700,12 @@ struct vpu_ipc_msg_payload_hws_priority_band_setup {
 	 * Default quantum in 100ns units for scheduling across processes
 	 * within a priority band
 	 */
-	u64 process_quantum[VPU_HWS_NUM_PRIORITY_BANDS];
+	u32 process_quantum[VPU_HWS_NUM_PRIORITY_BANDS];
 	/*
 	 * Default grace period in 100ns units for processes that preempt each
 	 * other within a priority band
 	 */
-	u64 process_grace_period[VPU_HWS_NUM_PRIORITY_BANDS];
+	u32 process_grace_period[VPU_HWS_NUM_PRIORITY_BANDS];
 	/*
 	 * For normal priority band, specifies the target VPU percentage
 	 * in situations when it's starved by the focus band.
@@ -608,32 +715,51 @@ struct vpu_ipc_msg_payload_hws_priority_band_setup {
 	u32 reserved_0;
 };
 
-/* HWS create command queue request */
+/*
+ * @brief HWS create command queue request.
+ * Host will create a command queue via this command.
+ * Note: Cmdq group is a handle of an object which
+ * may contain one or more command queues.
+ * @see VPU_JSM_MSG_CREATE_CMD_QUEUE
+ * @see VPU_JSM_MSG_CREATE_CMD_QUEUE_RSP
+ */
 struct vpu_ipc_msg_payload_hws_create_cmdq {
 	/* Process id */
 	u64 process_id;
 	/* Host SSID */
 	u32 host_ssid;
-	/* Zero Padding */
-	u32 reserved;
+	/* Engine for which queue is being created */
+	u32 engine_idx;
+	/*
+	 * Cmdq group may be set to 0 or equal to
+	 * cmdq_id while each priority band contains
+	 * only single engine instances.
+	 */
+	u64 cmdq_group;
 	/* Command queue id */
 	u64 cmdq_id;
 	/* Command queue base */
 	u64 cmdq_base;
 	/* Command queue size */
 	u32 cmdq_size;
-	/* Reserved */
+	/* Zero padding */
 	u32 reserved_0;
 };
 
-/* HWS create command queue response */
+/*
+ * @brief HWS create command queue response.
+ * @see VPU_JSM_MSG_CREATE_CMD_QUEUE
+ * @see VPU_JSM_MSG_CREATE_CMD_QUEUE_RSP
+ */
 struct vpu_ipc_msg_payload_hws_create_cmdq_rsp {
 	/* Process id */
 	u64 process_id;
 	/* Host SSID */
 	u32 host_ssid;
-	/* Zero Padding */
-	u32 reserved;
+	/* Engine for which queue is being created */
+	u32 engine_idx;
+	/* Command queue group */
+	u64 cmdq_group;
 	/* Command queue id */
 	u64 cmdq_id;
 };
@@ -661,7 +787,7 @@ struct vpu_ipc_msg_payload_hws_set_context_sched_properties {
 	/* Inside realtime band assigns a further priority */
 	u32 realtime_priority_level;
 	/* Priority relative to other contexts in the same process */
-	u32 in_process_priority;
+	s32 in_process_priority;
 	/* Zero padding / Reserved */
 	u32 reserved_1;
 	/* Context quantum relative to other contexts of same priority in the same process */
@@ -694,6 +820,123 @@ struct vpu_jsm_hws_register_db {
 	u64 cmdq_size;
 };
 
+/*
+ * @brief Structure to set another buffer to be used for scheduling-related logging.
+ * The size of the logging buffer and the number of entries is defined as part of the
+ * buffer itself as described next.
+ * The log buffer received from the host is made up of;
+ *   - header:     32 bytes in size, as shown in 'struct vpu_hws_log_buffer_header'.
+ *                 The header contains the number of log entries in the buffer.
+ *   - log entry:  0 to n-1, each log entry is 32 bytes in size, as shown in
+ *                 'struct vpu_hws_log_buffer_entry'.
+ *                 The entry contains the VPU timestamp, operation type and data.
+ * The host should provide the notify index value of log buffer to VPU. This is a
+ * value defined within the log buffer and when written to will generate the
+ * scheduling log notification.
+ * The host should set engine_idx and vpu_log_buffer_va to 0 to disable logging
+ * for a particular engine.
+ * VPU will handle one log buffer for each of supported engines.
+ * VPU should allow the logging to consume one host_ssid.
+ * @see VPU_JSM_MSG_HWS_SET_SCHEDULING_LOG
+ * @see VPU_JSM_MSG_HWS_SET_SCHEDULING_LOG_RSP
+ * @see VPU_JSM_MSG_HWS_SCHEDULING_LOG_NOTIFICATION
+ */
+struct vpu_ipc_msg_payload_hws_set_scheduling_log {
+	/* Engine ordinal */
+	u32 engine_idx;
+	/* Host SSID */
+	u32 host_ssid;
+	/*
+	 * VPU log buffer virtual address.
+	 * Set to 0 to disable logging for this engine.
+	 */
+	u64 vpu_log_buffer_va;
+	/*
+	 * Notify index of log buffer. VPU_JSM_MSG_HWS_SCHEDULING_LOG_NOTIFICATION
+	 * is generated when an event log is written to this index.
+	 */
+	u64 notify_index;
+};
+
+/*
+ * @brief The scheduling log notification is generated by VPU when it writes
+ * an event into the log buffer at the notify_index. VPU notifies host with
+ * VPU_JSM_MSG_HWS_SCHEDULING_LOG_NOTIFICATION. This is an asynchronous
+ * message from VPU to host.
+ * @see VPU_JSM_MSG_HWS_SCHEDULING_LOG_NOTIFICATION
+ * @see VPU_JSM_MSG_HWS_SET_SCHEDULING_LOG
+ */
+struct vpu_ipc_msg_payload_hws_scheduling_log_notification {
+	/* Engine ordinal */
+	u32 engine_idx;
+	/* Zero Padding */
+	u32 reserved_0;
+};
+
+/*
+ * @brief HWS suspend command queue request and done structure.
+ * Host will request the suspend of contexts and VPU will;
+ *   - Suspend all work on this context
+ *   - Preempt any running work
+ *   - Asynchronously perform the above and return success immediately once
+ *     all items above are started successfully
+ *   - Notify the host of completion of these operations via
+ *     VPU_JSM_MSG_HWS_SUSPEND_CMDQ_DONE
+ *   - Reject any other context operations on a context with an in-flight
+ *     suspend request running
+ * Same structure used when VPU notifies host of completion of a context suspend
+ * request. The ids and suspend fence value reported in this command will match
+ * the one in the request from the host to suspend the context. Once suspend is
+ * complete, VPU will not access any data relating to this command queue until
+ * it is resumed.
+ * @see VPU_JSM_MSG_HWS_SUSPEND_CMDQ
+ * @see VPU_JSM_MSG_HWS_SUSPEND_CMDQ_DONE
+ */
+struct vpu_ipc_msg_payload_hws_suspend_cmdq {
+	/* Host SSID */
+	u32 host_ssid;
+	/* Zero Padding */
+	u32 reserved_0;
+	/* Command queue id */
+	u64 cmdq_id;
+	/*
+	 * Suspend fence value - reported by the VPU suspend context
+	 * completed once suspend is complete.
+	 */
+	u64 suspend_fence_value;
+};
+
+/*
+ * @brief HWS Resume command queue request / response structure.
+ * Host will request the resume of a context;
+ *  - VPU will resume all work on this context
+ *  - Scheduler will allow this context to be scheduled
+ * @see VPU_JSM_MSG_HWS_RESUME_CMDQ
+ * @see VPU_JSM_MSG_HWS_RESUME_CMDQ_RSP
+ */
+struct vpu_ipc_msg_payload_hws_resume_cmdq {
+	/* Host SSID */
+	u32 host_ssid;
+	/* Zero Padding */
+	u32 reserved_0;
+	/* Command queue id */
+	u64 cmdq_id;
+};
+
+/*
+ * @brief HWS Resume engine request / response structure.
+ * After a HWS engine reset, all scheduling is stopped on VPU until a engine resume.
+ * Host shall send this command to resume scheduling of any valid queue.
+ * @see VPU_JSM_MSG_HWS_RESUME_ENGINE
+ * @see VPU_JSM_MSG_HWS_RESUME_ENGINE_DONE
+ */
+struct vpu_ipc_msg_payload_hws_resume_engine {
+	/* Engine to be resumed */
+	u32 engine_idx;
+	/* Reserved */
+	u32 reserved_0;
+};
+
 /**
  * Payload for VPU_JSM_MSG_TRACE_SET_CONFIG[_RSP] and
  * VPU_JSM_MSG_TRACE_GET_CONFIG_RSP messages.
@@ -938,6 +1181,35 @@ struct vpu_ipc_msg_payload_dyndbg_control {
 	char dyndbg_cmd[VPU_DYNDBG_CMD_MAX_LEN];
 };
 
+/**
+ * Payload for VPU_JSM_MSG_PWR_D0I3_ENTER
+ *
+ * This is a bi-directional payload.
+ */
+struct vpu_ipc_msg_payload_pwr_d0i3_enter {
+	/**
+	 * 0: VPU_JSM_MSG_PWR_D0I3_ENTER_DONE is not sent to the host driver
+	 *    The driver will poll for D0i2 Idle state transitions.
+	 * 1: VPU_JSM_MSG_PWR_D0I3_ENTER_DONE is sent after VPU state save is complete
+	 */
+	u32 send_response;
+	u32 reserved_0;
+};
+
+/**
+ * Payload for VPU_JSM_MSG_DCT_ENABLE message.
+ *
+ * Default values for DCT active/inactive times are 5.3ms and 30ms respectively,
+ * corresponding to a 85% duty cycle. This payload allows the host to tune these
+ * values according to application requirements.
+ */
+struct vpu_ipc_msg_payload_pwr_dct_control {
+	/** Duty cycle active time in microseconds */
+	u32 dct_active_us;
+	/** Duty cycle inactive time in microseconds */
+	u32 dct_inactive_us;
+};
+
 /*
  * Payloads union, used to define complete message format.
  */
@@ -974,6 +1246,13 @@ union vpu_ipc_msg_payload {
 	struct vpu_ipc_msg_payload_hws_destroy_cmdq hws_destroy_cmdq;
 	struct vpu_ipc_msg_payload_hws_set_context_sched_properties
 		hws_set_context_sched_properties;
+	struct vpu_ipc_msg_payload_hws_set_scheduling_log hws_set_scheduling_log;
+	struct vpu_ipc_msg_payload_hws_scheduling_log_notification hws_scheduling_log_notification;
+	struct vpu_ipc_msg_payload_hws_suspend_cmdq hws_suspend_cmdq;
+	struct vpu_ipc_msg_payload_hws_resume_cmdq hws_resume_cmdq;
+	struct vpu_ipc_msg_payload_hws_resume_engine hws_resume_engine;
+	struct vpu_ipc_msg_payload_pwr_d0i3_enter pwr_d0i3_enter;
+	struct vpu_ipc_msg_payload_pwr_dct_control pwr_dct_control;
 };
 
 /*
diff --git a/drivers/accel/qaic/Makefile b/drivers/accel/qaic/Makefile
index 2418418f7a50..3f7f6dfde7f2 100644
--- a/drivers/accel/qaic/Makefile
+++ b/drivers/accel/qaic/Makefile
@@ -9,4 +9,5 @@ qaic-y := \
 	mhi_controller.o \
 	qaic_control.o \
 	qaic_data.o \
-	qaic_drv.o
+	qaic_drv.o \
+	qaic_timesync.o
diff --git a/drivers/accel/qaic/mhi_controller.c b/drivers/accel/qaic/mhi_controller.c
index 5036e58e7235..5d3cc30009cc 100644
--- a/drivers/accel/qaic/mhi_controller.c
+++ b/drivers/accel/qaic/mhi_controller.c
@@ -348,7 +348,7 @@ static struct mhi_channel_config aic100_channels[] = {
 		.local_elements = 0,
 		.event_ring = 0,
 		.dir = DMA_TO_DEVICE,
-		.ee_mask = MHI_CH_EE_SBL | MHI_CH_EE_AMSS,
+		.ee_mask = MHI_CH_EE_SBL,
 		.pollcfg = 0,
 		.doorbell = MHI_DB_BRST_DISABLE,
 		.lpm_notify = false,
@@ -364,7 +364,39 @@ static struct mhi_channel_config aic100_channels[] = {
 		.local_elements = 0,
 		.event_ring = 0,
 		.dir = DMA_FROM_DEVICE,
-		.ee_mask = MHI_CH_EE_SBL | MHI_CH_EE_AMSS,
+		.ee_mask = MHI_CH_EE_SBL,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_TIMESYNC_PERIODIC",
+		.num = 22,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_TO_DEVICE,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.num = 23,
+		.name = "QAIC_TIMESYNC_PERIODIC",
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_FROM_DEVICE,
+		.ee_mask = MHI_CH_EE_AMSS,
 		.pollcfg = 0,
 		.doorbell = MHI_DB_BRST_DISABLE,
 		.lpm_notify = false,
@@ -468,7 +500,7 @@ static int mhi_reset_and_async_power_up(struct mhi_controller *mhi_cntrl)
 }
 
 struct mhi_controller *qaic_mhi_register_controller(struct pci_dev *pci_dev, void __iomem *mhi_bar,
-						    int mhi_irq)
+						    int mhi_irq, bool shared_msi)
 {
 	struct mhi_controller *mhi_cntrl;
 	int ret;
@@ -500,6 +532,10 @@ struct mhi_controller *qaic_mhi_register_controller(struct pci_dev *pci_dev, voi
 		return ERR_PTR(-ENOMEM);
 
 	mhi_cntrl->irq[0] = mhi_irq;
+
+	if (shared_msi) /* MSI shared with data path, no IRQF_NO_SUSPEND */
+		mhi_cntrl->irq_flags = IRQF_SHARED;
+
 	mhi_cntrl->fw_image = "qcom/aic100/sbl.bin";
 
 	/* use latest configured timeout */
diff --git a/drivers/accel/qaic/mhi_controller.h b/drivers/accel/qaic/mhi_controller.h
index 2ae45d768e24..500e7f4af2af 100644
--- a/drivers/accel/qaic/mhi_controller.h
+++ b/drivers/accel/qaic/mhi_controller.h
@@ -8,7 +8,7 @@
 #define MHICONTROLLERQAIC_H_
 
 struct mhi_controller *qaic_mhi_register_controller(struct pci_dev *pci_dev, void __iomem *mhi_bar,
-						    int mhi_irq);
+						    int mhi_irq, bool shared_msi);
 void qaic_mhi_free_controller(struct mhi_controller *mhi_cntrl, bool link_up);
 void qaic_mhi_start_reset(struct mhi_controller *mhi_cntrl);
 void qaic_mhi_reset_done(struct mhi_controller *mhi_cntrl);
diff --git a/drivers/accel/qaic/qaic.h b/drivers/accel/qaic/qaic.h
index e3f4c30f3ffd..bc40d52dc010 100644
--- a/drivers/accel/qaic/qaic.h
+++ b/drivers/accel/qaic/qaic.h
@@ -123,6 +123,8 @@ struct qaic_device {
 	struct srcu_struct	dev_lock;
 	/* true: Device under reset; false: Device not under reset */
 	bool			in_reset;
+	/* true: single MSI is used to operate device */
+	bool			single_msi;
 	/*
 	 * true: A tx MHI transaction has failed and a rx buffer is still queued
 	 * in control device. Such a buffer is considered lost rx buffer
@@ -137,6 +139,10 @@ struct qaic_device {
 	u32 (*gen_crc)(void *msg);
 	/* Validate the CRC of a control message */
 	bool (*valid_crc)(void *msg);
+	/* MHI "QAIC_TIMESYNC" channel device */
+	struct mhi_device	*qts_ch;
+	/* Work queue for tasks related to MHI "QAIC_TIMESYNC" channel */
+	struct workqueue_struct	*qts_wq;
 };
 
 struct qaic_drm_device {
diff --git a/drivers/accel/qaic/qaic_control.c b/drivers/accel/qaic/qaic_control.c
index 388abd40024b..84915824be54 100644
--- a/drivers/accel/qaic/qaic_control.c
+++ b/drivers/accel/qaic/qaic_control.c
@@ -1138,7 +1138,7 @@ static int abort_dma_cont(struct qaic_device *qdev, struct wrapper_list *wrapper
 		if (!list_is_first(&wrapper->list, &wrappers->list))
 			kref_put(&wrapper->ref_count, free_wrapper);
 
-	wrapper = add_wrapper(wrappers, offsetof(struct wrapper_msg, trans) + sizeof(*out_trans));
+	wrapper = add_wrapper(wrappers, sizeof(*wrapper));
 
 	if (!wrapper)
 		return -ENOMEM;
diff --git a/drivers/accel/qaic/qaic_data.c b/drivers/accel/qaic/qaic_data.c
index 4a8e43a7a6a4..8da81768f2ab 100644
--- a/drivers/accel/qaic/qaic_data.c
+++ b/drivers/accel/qaic/qaic_data.c
@@ -51,6 +51,7 @@
 		})
 #define NUM_EVENTS	128
 #define NUM_DELAYS	10
+#define fifo_at(base, offset) ((base) + (offset) * get_dbc_req_elem_size())
 
 static unsigned int wait_exec_default_timeout_ms = 5000; /* 5 sec default */
 module_param(wait_exec_default_timeout_ms, uint, 0600);
@@ -1058,6 +1059,16 @@ unlock_usr_srcu:
 	return ret;
 }
 
+static inline u32 fifo_space_avail(u32 head, u32 tail, u32 q_size)
+{
+	u32 avail = head - tail - 1;
+
+	if (head <= tail)
+		avail += q_size;
+
+	return avail;
+}
+
 static inline int copy_exec_reqs(struct qaic_device *qdev, struct bo_slice *slice, u32 dbc_id,
 				 u32 head, u32 *ptail)
 {
@@ -1066,27 +1077,20 @@ static inline int copy_exec_reqs(struct qaic_device *qdev, struct bo_slice *slic
 	u32 tail = *ptail;
 	u32 avail;
 
-	avail = head - tail;
-	if (head <= tail)
-		avail += dbc->nelem;
-
-	--avail;
-
+	avail = fifo_space_avail(head, tail, dbc->nelem);
 	if (avail < slice->nents)
 		return -EAGAIN;
 
 	if (tail + slice->nents > dbc->nelem) {
 		avail = dbc->nelem - tail;
 		avail = min_t(u32, avail, slice->nents);
-		memcpy(dbc->req_q_base + tail * get_dbc_req_elem_size(), reqs,
-		       sizeof(*reqs) * avail);
+		memcpy(fifo_at(dbc->req_q_base, tail), reqs, sizeof(*reqs) * avail);
 		reqs += avail;
 		avail = slice->nents - avail;
 		if (avail)
 			memcpy(dbc->req_q_base, reqs, sizeof(*reqs) * avail);
 	} else {
-		memcpy(dbc->req_q_base + tail * get_dbc_req_elem_size(), reqs,
-		       sizeof(*reqs) * slice->nents);
+		memcpy(fifo_at(dbc->req_q_base, tail), reqs, sizeof(*reqs) * slice->nents);
 	}
 
 	*ptail = (tail + slice->nents) % dbc->nelem;
@@ -1094,46 +1098,31 @@ static inline int copy_exec_reqs(struct qaic_device *qdev, struct bo_slice *slic
 	return 0;
 }
 
-/*
- * Based on the value of resize we may only need to transmit first_n
- * entries and the last entry, with last_bytes to send from the last entry.
- * Note that first_n could be 0.
- */
 static inline int copy_partial_exec_reqs(struct qaic_device *qdev, struct bo_slice *slice,
-					 u64 resize, u32 dbc_id, u32 head, u32 *ptail)
+					 u64 resize, struct dma_bridge_chan *dbc, u32 head,
+					 u32 *ptail)
 {
-	struct dma_bridge_chan *dbc = &qdev->dbc[dbc_id];
 	struct dbc_req *reqs = slice->reqs;
 	struct dbc_req *last_req;
 	u32 tail = *ptail;
-	u64 total_bytes;
 	u64 last_bytes;
 	u32 first_n;
 	u32 avail;
-	int ret;
-	int i;
 
-	avail = head - tail;
-	if (head <= tail)
-		avail += dbc->nelem;
+	avail = fifo_space_avail(head, tail, dbc->nelem);
 
-	--avail;
-
-	total_bytes = 0;
-	for (i = 0; i < slice->nents; i++) {
-		total_bytes += le32_to_cpu(reqs[i].len);
-		if (total_bytes >= resize)
+	/*
+	 * After this for loop is complete, first_n represents the index
+	 * of the last DMA request of this slice that needs to be
+	 * transferred after resizing and last_bytes represents DMA size
+	 * of that request.
+	 */
+	last_bytes = resize;
+	for (first_n = 0; first_n < slice->nents; first_n++)
+		if (last_bytes > le32_to_cpu(reqs[first_n].len))
+			last_bytes -= le32_to_cpu(reqs[first_n].len);
+		else
 			break;
-	}
-
-	if (total_bytes < resize) {
-		/* User space should have used the full buffer path. */
-		ret = -EINVAL;
-		return ret;
-	}
-
-	first_n = i;
-	last_bytes = i ? resize + le32_to_cpu(reqs[i].len) - total_bytes : resize;
 
 	if (avail < (first_n + 1))
 		return -EAGAIN;
@@ -1142,22 +1131,21 @@ static inline int copy_partial_exec_reqs(struct qaic_device *qdev, struct bo_sli
 		if (tail + first_n > dbc->nelem) {
 			avail = dbc->nelem - tail;
 			avail = min_t(u32, avail, first_n);
-			memcpy(dbc->req_q_base + tail * get_dbc_req_elem_size(), reqs,
-			       sizeof(*reqs) * avail);
+			memcpy(fifo_at(dbc->req_q_base, tail), reqs, sizeof(*reqs) * avail);
 			last_req = reqs + avail;
 			avail = first_n - avail;
 			if (avail)
 				memcpy(dbc->req_q_base, last_req, sizeof(*reqs) * avail);
 		} else {
-			memcpy(dbc->req_q_base + tail * get_dbc_req_elem_size(), reqs,
-			       sizeof(*reqs) * first_n);
+			memcpy(fifo_at(dbc->req_q_base, tail), reqs, sizeof(*reqs) * first_n);
 		}
 	}
 
-	/* Copy over the last entry. Here we need to adjust len to the left over
+	/*
+	 * Copy over the last entry. Here we need to adjust len to the left over
 	 * size, and set src and dst to the entry it is copied to.
 	 */
-	last_req = dbc->req_q_base + (tail + first_n) % dbc->nelem * get_dbc_req_elem_size();
+	last_req = fifo_at(dbc->req_q_base, (tail + first_n) % dbc->nelem);
 	memcpy(last_req, reqs + slice->nents - 1, sizeof(*reqs));
 
 	/*
@@ -1168,6 +1156,9 @@ static inline int copy_partial_exec_reqs(struct qaic_device *qdev, struct bo_sli
 	last_req->len = cpu_to_le32((u32)last_bytes);
 	last_req->src_addr = reqs[first_n].src_addr;
 	last_req->dest_addr = reqs[first_n].dest_addr;
+	if (!last_bytes)
+		/* Disable DMA transfer */
+		last_req->cmd = GENMASK(7, 2) & reqs[first_n].cmd;
 
 	*ptail = (tail + first_n + 1) % dbc->nelem;
 
@@ -1227,26 +1218,17 @@ static int send_bo_list_to_device(struct qaic_device *qdev, struct drm_file *fil
 		bo->req_id = dbc->next_req_id++;
 
 		list_for_each_entry(slice, &bo->slices, slice) {
-			/*
-			 * If this slice does not fall under the given
-			 * resize then skip this slice and continue the loop
-			 */
-			if (is_partial && pexec[i].resize && pexec[i].resize <= slice->offset)
-				continue;
-
 			for (j = 0; j < slice->nents; j++)
 				slice->reqs[j].req_id = cpu_to_le16(bo->req_id);
 
-			/*
-			 * If it is a partial execute ioctl call then check if
-			 * resize has cut this slice short then do a partial copy
-			 * else do complete copy
-			 */
-			if (is_partial && pexec[i].resize &&
-			    pexec[i].resize < slice->offset + slice->size)
+			if (is_partial && (!pexec[i].resize || pexec[i].resize <= slice->offset))
+				/* Configure the slice for no DMA transfer */
+				ret = copy_partial_exec_reqs(qdev, slice, 0, dbc, head, tail);
+			else if (is_partial && pexec[i].resize < slice->offset + slice->size)
+				/* Configure the slice to be partially DMA transferred */
 				ret = copy_partial_exec_reqs(qdev, slice,
-							     pexec[i].resize - slice->offset,
-							     dbc->id, head, tail);
+							     pexec[i].resize - slice->offset, dbc,
+							     head, tail);
 			else
 				ret = copy_exec_reqs(qdev, slice, dbc->id, head, tail);
 			if (ret) {
@@ -1466,6 +1448,16 @@ irqreturn_t dbc_irq_handler(int irq, void *data)
 
 	rcu_id = srcu_read_lock(&dbc->ch_lock);
 
+	if (datapath_polling) {
+		srcu_read_unlock(&dbc->ch_lock, rcu_id);
+		/*
+		 * Normally datapath_polling will not have irqs enabled, but
+		 * when running with only one MSI the interrupt is shared with
+		 * MHI so it cannot be disabled. Return ASAP instead.
+		 */
+		return IRQ_HANDLED;
+	}
+
 	if (!dbc->usr) {
 		srcu_read_unlock(&dbc->ch_lock, rcu_id);
 		return IRQ_HANDLED;
@@ -1488,7 +1480,8 @@ irqreturn_t dbc_irq_handler(int irq, void *data)
 		return IRQ_NONE;
 	}
 
-	disable_irq_nosync(irq);
+	if (!dbc->qdev->single_msi)
+		disable_irq_nosync(irq);
 	srcu_read_unlock(&dbc->ch_lock, rcu_id);
 	return IRQ_WAKE_THREAD;
 }
@@ -1559,12 +1552,12 @@ irqreturn_t dbc_irq_threaded_fn(int irq, void *data)
 	u32 tail;
 
 	rcu_id = srcu_read_lock(&dbc->ch_lock);
+	qdev = dbc->qdev;
 
 	head = readl(dbc->dbc_base + RSPHP_OFF);
 	if (head == U32_MAX) /* PCI link error */
 		goto error_out;
 
-	qdev = dbc->qdev;
 read_fifo:
 
 	if (!event_count) {
@@ -1645,14 +1638,14 @@ read_fifo:
 	goto read_fifo;
 
 normal_out:
-	if (likely(!datapath_polling))
+	if (!qdev->single_msi && likely(!datapath_polling))
 		enable_irq(irq);
-	else
+	else if (unlikely(datapath_polling))
 		schedule_work(&dbc->poll_work);
 	/* checking the fifo and enabling irqs is a race, missed event check */
 	tail = readl(dbc->dbc_base + RSPTP_OFF);
 	if (tail != U32_MAX && head != tail) {
-		if (likely(!datapath_polling))
+		if (!qdev->single_msi && likely(!datapath_polling))
 			disable_irq_nosync(irq);
 		goto read_fifo;
 	}
@@ -1661,9 +1654,9 @@ normal_out:
 
 error_out:
 	srcu_read_unlock(&dbc->ch_lock, rcu_id);
-	if (likely(!datapath_polling))
+	if (!qdev->single_msi && likely(!datapath_polling))
 		enable_irq(irq);
-	else
+	else if (unlikely(datapath_polling))
 		schedule_work(&dbc->poll_work);
 
 	return IRQ_HANDLED;
diff --git a/drivers/accel/qaic/qaic_drv.c b/drivers/accel/qaic/qaic_drv.c
index 6f58095767df..b12226385003 100644
--- a/drivers/accel/qaic/qaic_drv.c
+++ b/drivers/accel/qaic/qaic_drv.c
@@ -27,6 +27,7 @@
 
 #include "mhi_controller.h"
 #include "qaic.h"
+#include "qaic_timesync.h"
 
 MODULE_IMPORT_NS(DMA_BUF);
 
@@ -324,6 +325,7 @@ static void cleanup_qdev(struct qaic_device *qdev)
 	cleanup_srcu_struct(&qdev->dev_lock);
 	pci_set_drvdata(qdev->pdev, NULL);
 	destroy_workqueue(qdev->cntl_wq);
+	destroy_workqueue(qdev->qts_wq);
 }
 
 static struct qaic_device *create_qdev(struct pci_dev *pdev, const struct pci_device_id *id)
@@ -347,6 +349,12 @@ static struct qaic_device *create_qdev(struct pci_dev *pdev, const struct pci_de
 	if (!qdev->cntl_wq)
 		return NULL;
 
+	qdev->qts_wq = alloc_workqueue("qaic_ts", WQ_UNBOUND, 0);
+	if (!qdev->qts_wq) {
+		destroy_workqueue(qdev->cntl_wq);
+		return NULL;
+	}
+
 	pci_set_drvdata(pdev, qdev);
 	qdev->pdev = pdev;
 
@@ -424,14 +432,24 @@ static int init_msi(struct qaic_device *qdev, struct pci_dev *pdev)
 	int i;
 
 	/* Managed release since we use pcim_enable_device */
-	ret = pci_alloc_irq_vectors(pdev, 1, 32, PCI_IRQ_MSI);
-	if (ret < 0)
-		return ret;
+	ret = pci_alloc_irq_vectors(pdev, 32, 32, PCI_IRQ_MSI);
+	if (ret == -ENOSPC) {
+		ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_MSI);
+		if (ret < 0)
+			return ret;
 
-	if (ret < 32) {
-		pci_err(pdev, "%s: Requested 32 MSIs. Obtained %d MSIs which is less than the 32 required.\n",
-			__func__, ret);
-		return -ENODEV;
+		/*
+		 * Operate in one MSI mode. All interrupts will be directed to
+		 * MSI0; every interrupt will wake up all the interrupt handlers
+		 * (MHI and DBC[0-15]). Since the interrupt is now shared, it is
+		 * not disabled during DBC threaded handler, but only one thread
+		 * will be allowed to run per DBC, so while it can be
+		 * interrupted, it shouldn't race with itself.
+		 */
+		qdev->single_msi = true;
+		pci_info(pdev, "Allocating 32 MSIs failed, operating in 1 MSI mode. Performance may be impacted.\n");
+	} else if (ret < 0) {
+		return ret;
 	}
 
 	mhi_irq = pci_irq_vector(pdev, 0);
@@ -439,15 +457,17 @@ static int init_msi(struct qaic_device *qdev, struct pci_dev *pdev)
 		return mhi_irq;
 
 	for (i = 0; i < qdev->num_dbc; ++i) {
-		ret = devm_request_threaded_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
+		ret = devm_request_threaded_irq(&pdev->dev,
+						pci_irq_vector(pdev, qdev->single_msi ? 0 : i + 1),
 						dbc_irq_handler, dbc_irq_threaded_fn, IRQF_SHARED,
 						"qaic_dbc", &qdev->dbc[i]);
 		if (ret)
 			return ret;
 
 		if (datapath_polling) {
-			qdev->dbc[i].irq = pci_irq_vector(pdev, i + 1);
-			disable_irq_nosync(qdev->dbc[i].irq);
+			qdev->dbc[i].irq = pci_irq_vector(pdev, qdev->single_msi ? 0 : i + 1);
+			if (!qdev->single_msi)
+				disable_irq_nosync(qdev->dbc[i].irq);
 			INIT_WORK(&qdev->dbc[i].poll_work, irq_polling_work);
 		}
 	}
@@ -479,7 +499,8 @@ static int qaic_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		goto cleanup_qdev;
 	}
 
-	qdev->mhi_cntrl = qaic_mhi_register_controller(pdev, qdev->bar_0, mhi_irq);
+	qdev->mhi_cntrl = qaic_mhi_register_controller(pdev, qdev->bar_0, mhi_irq,
+						       qdev->single_msi);
 	if (IS_ERR(qdev->mhi_cntrl)) {
 		ret = PTR_ERR(qdev->mhi_cntrl);
 		goto cleanup_qdev;
@@ -586,6 +607,10 @@ static int __init qaic_init(void)
 		goto free_pci;
 	}
 
+	ret = qaic_timesync_init();
+	if (ret)
+		pr_debug("qaic: qaic_timesync_init failed %d\n", ret);
+
 	return 0;
 
 free_pci:
@@ -611,6 +636,7 @@ static void __exit qaic_exit(void)
 	 * reinitializing the link_up state after the cleanup is done.
 	 */
 	link_up = true;
+	qaic_timesync_deinit();
 	mhi_driver_unregister(&qaic_mhi_driver);
 	pci_unregister_driver(&qaic_pci_driver);
 }
diff --git a/drivers/accel/qaic/qaic_timesync.c b/drivers/accel/qaic/qaic_timesync.c
new file mode 100644
index 000000000000..301f4462d51b
--- /dev/null
+++ b/drivers/accel/qaic/qaic_timesync.c
@@ -0,0 +1,395 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved. */
+
+#include <linux/io.h>
+#include <linux/kernel.h>
+#include <linux/math64.h>
+#include <linux/mhi.h>
+#include <linux/mod_devicetable.h>
+#include <linux/module.h>
+#include <linux/time64.h>
+#include <linux/timer.h>
+
+#include "qaic.h"
+#include "qaic_timesync.h"
+
+#define QTIMER_REG_OFFSET			0xa28
+#define QAIC_TIMESYNC_SIGNATURE			0x55aa
+#define QAIC_CONV_QTIMER_TO_US(qtimer)		(mul_u64_u32_div(qtimer, 10, 192))
+
+static unsigned int timesync_delay_ms = 1000; /* 1 sec default */
+module_param(timesync_delay_ms, uint, 0600);
+MODULE_PARM_DESC(timesync_delay_ms, "Delay in ms between two consecutive timesync operations");
+
+enum qts_msg_type {
+	QAIC_TS_CMD_TO_HOST,
+	QAIC_TS_SYNC_REQ,
+	QAIC_TS_ACK_TO_HOST,
+	QAIC_TS_MSG_TYPE_MAX
+};
+
+/**
+ * struct qts_hdr - Timesync message header structure.
+ * @signature: Unique signature to identify the timesync message.
+ * @reserved_1: Reserved for future use.
+ * @reserved_2: Reserved for future use.
+ * @msg_type: sub-type of the timesync message.
+ * @reserved_3: Reserved for future use.
+ */
+struct qts_hdr {
+	__le16	signature;
+	__le16	reserved_1;
+	u8	reserved_2;
+	u8	msg_type;
+	__le16	reserved_3;
+} __packed;
+
+/**
+ * struct qts_timeval - Structure to carry time information.
+ * @tv_sec: Seconds part of the time.
+ * @tv_usec: uS (microseconds) part of the time.
+ */
+struct qts_timeval {
+	__le64	tv_sec;
+	__le64	tv_usec;
+} __packed;
+
+/**
+ * struct qts_host_time_sync_msg_data - Structure to denote the timesync message.
+ * @header: Header of the timesync message.
+ * @data: Time information.
+ */
+struct qts_host_time_sync_msg_data {
+	struct	qts_hdr header;
+	struct	qts_timeval data;
+} __packed;
+
+/**
+ * struct mqts_dev - MHI QAIC Timesync Control device.
+ * @qdev: Pointer to the root device struct driven by QAIC driver.
+ * @mhi_dev: Pointer to associated MHI device.
+ * @timer: Timer handle used for timesync.
+ * @qtimer_addr: Device QTimer register pointer.
+ * @buff_in_use: atomic variable to track if the sync_msg buffer is in use.
+ * @dev: Device pointer to qdev->pdev->dev stored for easy access.
+ * @sync_msg: Buffer used to send timesync message over MHI.
+ */
+struct mqts_dev {
+	struct qaic_device *qdev;
+	struct mhi_device *mhi_dev;
+	struct timer_list timer;
+	void __iomem *qtimer_addr;
+	atomic_t buff_in_use;
+	struct device *dev;
+	struct qts_host_time_sync_msg_data *sync_msg;
+};
+
+struct qts_resp_msg {
+	struct qts_hdr	hdr;
+} __packed;
+
+struct qts_resp {
+	struct qts_resp_msg	data;
+	struct work_struct	work;
+	struct qaic_device	*qdev;
+};
+
+#ifdef readq
+static u64 read_qtimer(const volatile void __iomem *addr)
+{
+	return readq(addr);
+}
+#else
+static u64 read_qtimer(const volatile void __iomem *addr)
+{
+	u64 low, high;
+
+	low = readl(addr);
+	high = readl(addr + sizeof(u32));
+	return low | (high << 32);
+}
+#endif
+
+static void qaic_timesync_ul_xfer_cb(struct mhi_device *mhi_dev, struct mhi_result *mhi_result)
+{
+	struct mqts_dev *mqtsdev = dev_get_drvdata(&mhi_dev->dev);
+
+	dev_dbg(mqtsdev->dev, "%s status: %d xfer_len: %zu\n", __func__,
+		mhi_result->transaction_status, mhi_result->bytes_xferd);
+
+	atomic_set(&mqtsdev->buff_in_use, 0);
+}
+
+static void qaic_timesync_dl_xfer_cb(struct mhi_device *mhi_dev, struct mhi_result *mhi_result)
+{
+	struct mqts_dev *mqtsdev = dev_get_drvdata(&mhi_dev->dev);
+
+	dev_err(mqtsdev->dev, "%s no data expected on dl channel\n", __func__);
+}
+
+static void qaic_timesync_timer(struct timer_list *t)
+{
+	struct mqts_dev *mqtsdev = from_timer(mqtsdev, t, timer);
+	struct qts_host_time_sync_msg_data *sync_msg;
+	u64 device_qtimer_us;
+	u64 device_qtimer;
+	u64 host_time_us;
+	u64 offset_us;
+	u64 host_sec;
+	int ret;
+
+	if (atomic_read(&mqtsdev->buff_in_use)) {
+		dev_dbg(mqtsdev->dev, "%s buffer not free, schedule next cycle\n", __func__);
+		goto mod_timer;
+	}
+	atomic_set(&mqtsdev->buff_in_use, 1);
+
+	sync_msg = mqtsdev->sync_msg;
+	sync_msg->header.signature = cpu_to_le16(QAIC_TIMESYNC_SIGNATURE);
+	sync_msg->header.msg_type = QAIC_TS_SYNC_REQ;
+	/* Read host UTC time and convert to uS*/
+	host_time_us = div_u64(ktime_get_real_ns(), NSEC_PER_USEC);
+	device_qtimer = read_qtimer(mqtsdev->qtimer_addr);
+	device_qtimer_us = QAIC_CONV_QTIMER_TO_US(device_qtimer);
+	/* Offset between host UTC and device time */
+	offset_us = host_time_us - device_qtimer_us;
+
+	host_sec = div_u64(offset_us, USEC_PER_SEC);
+	sync_msg->data.tv_usec = cpu_to_le64(offset_us - host_sec * USEC_PER_SEC);
+	sync_msg->data.tv_sec = cpu_to_le64(host_sec);
+	ret = mhi_queue_buf(mqtsdev->mhi_dev, DMA_TO_DEVICE, sync_msg, sizeof(*sync_msg), MHI_EOT);
+	if (ret && (ret != -EAGAIN)) {
+		dev_err(mqtsdev->dev, "%s unable to queue to mhi:%d\n", __func__, ret);
+		return;
+	} else if (ret == -EAGAIN) {
+		atomic_set(&mqtsdev->buff_in_use, 0);
+	}
+
+mod_timer:
+	ret = mod_timer(t, jiffies + msecs_to_jiffies(timesync_delay_ms));
+	if (ret)
+		dev_err(mqtsdev->dev, "%s mod_timer error:%d\n", __func__, ret);
+}
+
+static int qaic_timesync_probe(struct mhi_device *mhi_dev, const struct mhi_device_id *id)
+{
+	struct qaic_device *qdev = pci_get_drvdata(to_pci_dev(mhi_dev->mhi_cntrl->cntrl_dev));
+	struct mqts_dev *mqtsdev;
+	struct timer_list *timer;
+	int ret;
+
+	mqtsdev = kzalloc(sizeof(*mqtsdev), GFP_KERNEL);
+	if (!mqtsdev) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	timer = &mqtsdev->timer;
+	mqtsdev->mhi_dev = mhi_dev;
+	mqtsdev->qdev = qdev;
+	mqtsdev->dev = &qdev->pdev->dev;
+
+	mqtsdev->sync_msg = kzalloc(sizeof(*mqtsdev->sync_msg), GFP_KERNEL);
+	if (!mqtsdev->sync_msg) {
+		ret = -ENOMEM;
+		goto free_mqts_dev;
+	}
+	atomic_set(&mqtsdev->buff_in_use, 0);
+
+	ret = mhi_prepare_for_transfer(mhi_dev);
+	if (ret)
+		goto free_sync_msg;
+
+	/* Qtimer register pointer */
+	mqtsdev->qtimer_addr = qdev->bar_0 + QTIMER_REG_OFFSET;
+	timer_setup(timer, qaic_timesync_timer, 0);
+	timer->expires = jiffies + msecs_to_jiffies(timesync_delay_ms);
+	add_timer(timer);
+	dev_set_drvdata(&mhi_dev->dev, mqtsdev);
+
+	return 0;
+
+free_sync_msg:
+	kfree(mqtsdev->sync_msg);
+free_mqts_dev:
+	kfree(mqtsdev);
+out:
+	return ret;
+};
+
+static void qaic_timesync_remove(struct mhi_device *mhi_dev)
+{
+	struct mqts_dev *mqtsdev = dev_get_drvdata(&mhi_dev->dev);
+
+	del_timer_sync(&mqtsdev->timer);
+	mhi_unprepare_from_transfer(mqtsdev->mhi_dev);
+	kfree(mqtsdev->sync_msg);
+	kfree(mqtsdev);
+}
+
+static const struct mhi_device_id qaic_timesync_match_table[] = {
+	{ .chan = "QAIC_TIMESYNC_PERIODIC"},
+	{},
+};
+
+MODULE_DEVICE_TABLE(mhi, qaic_timesync_match_table);
+
+static struct mhi_driver qaic_timesync_driver = {
+	.id_table = qaic_timesync_match_table,
+	.remove = qaic_timesync_remove,
+	.probe = qaic_timesync_probe,
+	.ul_xfer_cb = qaic_timesync_ul_xfer_cb,
+	.dl_xfer_cb = qaic_timesync_dl_xfer_cb,
+	.driver = {
+		.name = "qaic_timesync_periodic",
+	},
+};
+
+static void qaic_boot_timesync_worker(struct work_struct *work)
+{
+	struct qts_resp *resp = container_of(work, struct qts_resp, work);
+	struct qts_host_time_sync_msg_data *req;
+	struct qts_resp_msg data = resp->data;
+	struct qaic_device *qdev = resp->qdev;
+	struct mhi_device *mhi_dev;
+	struct timespec64 ts;
+	int ret;
+
+	mhi_dev = qdev->qts_ch;
+	/* Queue the response message beforehand to avoid race conditions */
+	ret = mhi_queue_buf(mhi_dev, DMA_FROM_DEVICE, &resp->data, sizeof(resp->data), MHI_EOT);
+	if (ret) {
+		kfree(resp);
+		dev_warn(&mhi_dev->dev, "Failed to re-queue response buffer %d\n", ret);
+		return;
+	}
+
+	switch (data.hdr.msg_type) {
+	case QAIC_TS_CMD_TO_HOST:
+		req = kzalloc(sizeof(*req), GFP_KERNEL);
+		if (!req)
+			break;
+
+		req->header = data.hdr;
+		req->header.msg_type = QAIC_TS_SYNC_REQ;
+		ktime_get_real_ts64(&ts);
+		req->data.tv_sec = cpu_to_le64(ts.tv_sec);
+		req->data.tv_usec = cpu_to_le64(div_u64(ts.tv_nsec, NSEC_PER_USEC));
+
+		ret = mhi_queue_buf(mhi_dev, DMA_TO_DEVICE, req, sizeof(*req), MHI_EOT);
+		if (ret) {
+			kfree(req);
+			dev_dbg(&mhi_dev->dev, "Failed to send request message. Error %d\n", ret);
+		}
+		break;
+	case QAIC_TS_ACK_TO_HOST:
+		dev_dbg(&mhi_dev->dev, "ACK received from device\n");
+		break;
+	default:
+		dev_err(&mhi_dev->dev, "Invalid message type %u.\n", data.hdr.msg_type);
+	}
+}
+
+static int qaic_boot_timesync_queue_resp(struct mhi_device *mhi_dev, struct qaic_device *qdev)
+{
+	struct qts_resp *resp;
+	int ret;
+
+	resp = kzalloc(sizeof(*resp), GFP_KERNEL);
+	if (!resp)
+		return -ENOMEM;
+
+	resp->qdev = qdev;
+	INIT_WORK(&resp->work, qaic_boot_timesync_worker);
+
+	ret = mhi_queue_buf(mhi_dev, DMA_FROM_DEVICE, &resp->data, sizeof(resp->data), MHI_EOT);
+	if (ret) {
+		kfree(resp);
+		dev_warn(&mhi_dev->dev, "Failed to queue response buffer %d\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void qaic_boot_timesync_remove(struct mhi_device *mhi_dev)
+{
+	struct qaic_device *qdev;
+
+	qdev = dev_get_drvdata(&mhi_dev->dev);
+	mhi_unprepare_from_transfer(qdev->qts_ch);
+	qdev->qts_ch = NULL;
+}
+
+static int qaic_boot_timesync_probe(struct mhi_device *mhi_dev, const struct mhi_device_id *id)
+{
+	struct qaic_device *qdev = pci_get_drvdata(to_pci_dev(mhi_dev->mhi_cntrl->cntrl_dev));
+	int ret;
+
+	ret = mhi_prepare_for_transfer(mhi_dev);
+	if (ret)
+		return ret;
+
+	qdev->qts_ch = mhi_dev;
+	dev_set_drvdata(&mhi_dev->dev, qdev);
+
+	ret = qaic_boot_timesync_queue_resp(mhi_dev, qdev);
+	if (ret) {
+		dev_set_drvdata(&mhi_dev->dev, NULL);
+		qdev->qts_ch = NULL;
+		mhi_unprepare_from_transfer(mhi_dev);
+	}
+
+	return ret;
+}
+
+static void qaic_boot_timesync_ul_xfer_cb(struct mhi_device *mhi_dev, struct mhi_result *mhi_result)
+{
+	kfree(mhi_result->buf_addr);
+}
+
+static void qaic_boot_timesync_dl_xfer_cb(struct mhi_device *mhi_dev, struct mhi_result *mhi_result)
+{
+	struct qts_resp *resp = container_of(mhi_result->buf_addr, struct qts_resp, data);
+
+	if (mhi_result->transaction_status || mhi_result->bytes_xferd != sizeof(resp->data)) {
+		kfree(resp);
+		return;
+	}
+
+	queue_work(resp->qdev->qts_wq, &resp->work);
+}
+
+static const struct mhi_device_id qaic_boot_timesync_match_table[] = {
+	{ .chan = "QAIC_TIMESYNC"},
+	{},
+};
+
+static struct mhi_driver qaic_boot_timesync_driver = {
+	.id_table = qaic_boot_timesync_match_table,
+	.remove = qaic_boot_timesync_remove,
+	.probe = qaic_boot_timesync_probe,
+	.ul_xfer_cb = qaic_boot_timesync_ul_xfer_cb,
+	.dl_xfer_cb = qaic_boot_timesync_dl_xfer_cb,
+	.driver = {
+		.name = "qaic_timesync",
+	},
+};
+
+int qaic_timesync_init(void)
+{
+	int ret;
+
+	ret = mhi_driver_register(&qaic_timesync_driver);
+	if (ret)
+		return ret;
+	ret = mhi_driver_register(&qaic_boot_timesync_driver);
+
+	return ret;
+}
+
+void qaic_timesync_deinit(void)
+{
+	mhi_driver_unregister(&qaic_boot_timesync_driver);
+	mhi_driver_unregister(&qaic_timesync_driver);
+}
diff --git a/drivers/accel/qaic/qaic_timesync.h b/drivers/accel/qaic/qaic_timesync.h
new file mode 100644
index 000000000000..851b7acd43bb
--- /dev/null
+++ b/drivers/accel/qaic/qaic_timesync.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0-only
+ *
+ * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#ifndef __QAIC_TIMESYNC_H__
+#define __QAIC_TIMESYNC_H__
+
+int qaic_timesync_init(void);
+void qaic_timesync_deinit(void);
+#endif /* __QAIC_TIMESYNC_H__ */
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 3eee8636f847..cdbc56e07649 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -75,15 +75,15 @@ config DRM_KUNIT_TEST_HELPERS
 config DRM_KUNIT_TEST
 	tristate "KUnit tests for DRM" if !KUNIT_ALL_TESTS
 	depends on DRM && KUNIT
-	select PRIME_NUMBERS
+	select DRM_BUDDY
 	select DRM_DISPLAY_DP_HELPER
 	select DRM_DISPLAY_HELPER
-	select DRM_LIB_RANDOM
-	select DRM_KMS_HELPER
-	select DRM_BUDDY
-	select DRM_EXPORT_FOR_TESTS if m
-	select DRM_KUNIT_TEST_HELPERS
 	select DRM_EXEC
+	select DRM_EXPORT_FOR_TESTS if m
+	select DRM_KMS_HELPER
+	select DRM_KUNIT_TEST_HELPERS
+	select DRM_LIB_RANDOM
+	select PRIME_NUMBERS
 	default KUNIT_ALL_TESTS
 	help
 	  This builds unit tests for DRM. This option is not useful for
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 8e1bde059170..cdbe91ac0bfc 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -22,6 +22,7 @@ drm-y := \
 	drm_drv.o \
 	drm_dumb_buffers.o \
 	drm_edid.o \
+	drm_eld.o \
 	drm_encoder.o \
 	drm_file.o \
 	drm_fourcc.o \
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
index 625db444df1c..10d56979fe3b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
@@ -290,7 +290,7 @@ static int suspend_resume_compute_scheduler(struct amdgpu_device *adev, bool sus
 	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
 		struct amdgpu_ring *ring = &adev->gfx.compute_ring[i];
 
-		if (!(ring && ring->sched.thread))
+		if (!(ring && drm_sched_wqueue_ready(&ring->sched)))
 			continue;
 
 		/* stop secheduler and drain ring. */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index a53f436fa9f1..c1efa13bccbb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1665,9 +1665,9 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)
 	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
 		struct amdgpu_ring *ring = adev->rings[i];
 
-		if (!ring || !ring->sched.thread)
+		if (!ring || !drm_sched_wqueue_ready(&ring->sched))
 			continue;
-		kthread_park(ring->sched.thread);
+		drm_sched_wqueue_stop(&ring->sched);
 	}
 
 	seq_puts(m, "run ib test:\n");
@@ -1681,9 +1681,9 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)
 	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
 		struct amdgpu_ring *ring = adev->rings[i];
 
-		if (!ring || !ring->sched.thread)
+		if (!ring || !drm_sched_wqueue_ready(&ring->sched))
 			continue;
-		kthread_unpark(ring->sched.thread);
+		drm_sched_wqueue_start(&ring->sched);
 	}
 
 	up_write(&adev->reset_domain->sem);
@@ -1903,7 +1903,8 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
 
 	ring = adev->rings[val];
 
-	if (!ring || !ring->funcs->preempt_ib || !ring->sched.thread)
+	if (!ring || !ring->funcs->preempt_ib ||
+	    !drm_sched_wqueue_ready(&ring->sched))
 		return -EINVAL;
 
 	/* the last preemption failed */
@@ -1921,7 +1922,7 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
 		goto pro_end;
 
 	/* stop the scheduler */
-	kthread_park(ring->sched.thread);
+	drm_sched_wqueue_stop(&ring->sched);
 
 	/* preempt the IB */
 	r = amdgpu_ring_preempt_ib(ring);
@@ -1955,7 +1956,7 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
 
 failure:
 	/* restart the scheduler */
-	kthread_unpark(ring->sched.thread);
+	drm_sched_wqueue_start(&ring->sched);
 
 	up_read(&adev->reset_domain->sem);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 7eeaf0aa7f81..2a6684a38714 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2573,7 +2573,7 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
 			break;
 		}
 
-		r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
+		r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, NULL,
 				   DRM_SCHED_PRIORITY_COUNT,
 				   ring->num_hw_submission, 0,
 				   timeout, adev->reset_domain->wq,
@@ -4964,7 +4964,7 @@ bool amdgpu_device_has_job_running(struct amdgpu_device *adev)
 	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
 		struct amdgpu_ring *ring = adev->rings[i];
 
-		if (!ring || !ring->sched.thread)
+		if (!ring || !drm_sched_wqueue_ready(&ring->sched))
 			continue;
 
 		spin_lock(&ring->sched.job_list_lock);
@@ -5103,7 +5103,7 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
 	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
 		struct amdgpu_ring *ring = adev->rings[i];
 
-		if (!ring || !ring->sched.thread)
+		if (!ring || !drm_sched_wqueue_ready(&ring->sched))
 			continue;
 
 		/* Clear job fence from fence drv to avoid force_completion
@@ -5592,7 +5592,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 		for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
 			struct amdgpu_ring *ring = tmp_adev->rings[i];
 
-			if (!ring || !ring->sched.thread)
+			if (!ring || !drm_sched_wqueue_ready(&ring->sched))
 				continue;
 
 			drm_sched_stop(&ring->sched, job ? &job->base : NULL);
@@ -5668,7 +5668,7 @@ skip_hw_reset:
 		for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
 			struct amdgpu_ring *ring = tmp_adev->rings[i];
 
-			if (!ring || !ring->sched.thread)
+			if (!ring || !drm_sched_wqueue_ready(&ring->sched))
 				continue;
 
 			drm_sched_start(&ring->sched, true);
@@ -5991,7 +5991,7 @@ pci_ers_result_t amdgpu_pci_error_detected(struct pci_dev *pdev, pci_channel_sta
 		for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
 			struct amdgpu_ring *ring = adev->rings[i];
 
-			if (!ring || !ring->sched.thread)
+			if (!ring || !drm_sched_wqueue_ready(&ring->sched))
 				continue;
 
 			drm_sched_stop(&ring->sched, NULL);
@@ -6119,7 +6119,7 @@ void amdgpu_pci_resume(struct pci_dev *pdev)
 	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
 		struct amdgpu_ring *ring = adev->rings[i];
 
-		if (!ring || !ring->sched.thread)
+		if (!ring || !drm_sched_wqueue_ready(&ring->sched))
 			continue;
 
 		drm_sched_start(&ring->sched, true);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 1f357198533f..62bb7fc7448a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -115,7 +115,7 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	if (!entity)
 		return 0;
 
-	return drm_sched_job_init(&(*job)->base, entity, owner);
+	return drm_sched_job_init(&(*job)->base, entity, 1, owner);
 }
 
 int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index ee97814ebd99..1513a9c602ee 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -87,6 +87,7 @@
 #include <drm/drm_blend.h>
 #include <drm/drm_fourcc.h>
 #include <drm/drm_edid.h>
+#include <drm/drm_eld.h>
 #include <drm/drm_vblank.h>
 #include <drm/drm_audio_component.h>
 #include <drm/drm_gem_atomic_helper.h>
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index 2444fc33dd7c..c3f677130def 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -2382,10 +2382,10 @@ int drm_atomic_helper_setup_commit(struct drm_atomic_state *state,
 EXPORT_SYMBOL(drm_atomic_helper_setup_commit);
 
 /**
- * drm_atomic_helper_wait_for_dependencies - wait for required preceeding commits
+ * drm_atomic_helper_wait_for_dependencies - wait for required preceding commits
  * @old_state: atomic state object with old state structures
  *
- * This function waits for all preceeding commits that touch the same CRTC as
+ * This function waits for all preceding commits that touch the same CRTC as
  * @old_state to both be committed to the hardware (as signalled by
  * drm_atomic_helper_commit_hw_done()) and executed by the hardware (as signalled
  * by calling drm_crtc_send_vblank_event() on the &drm_crtc_state.event).
diff --git a/drivers/gpu/drm/drm_client.c b/drivers/gpu/drm/drm_client.c
index c3027115d055..9403b3f576f7 100644
--- a/drivers/gpu/drm/drm_client.c
+++ b/drivers/gpu/drm/drm_client.c
@@ -5,7 +5,6 @@
 
 #include <linux/iosys-map.h>
 #include <linux/list.h>
-#include <linux/module.h>
 #include <linux/mutex.h>
 #include <linux/seq_file.h>
 #include <linux/slab.h>
@@ -84,16 +83,13 @@ int drm_client_init(struct drm_device *dev, struct drm_client_dev *client,
 	if (!drm_core_check_feature(dev, DRIVER_MODESET) || !dev->driver->dumb_create)
 		return -EOPNOTSUPP;
 
-	if (funcs && !try_module_get(funcs->owner))
-		return -ENODEV;
-
 	client->dev = dev;
 	client->name = name;
 	client->funcs = funcs;
 
 	ret = drm_client_modeset_create(client);
 	if (ret)
-		goto err_put_module;
+		return ret;
 
 	ret = drm_client_open(client);
 	if (ret)
@@ -105,10 +101,6 @@ int drm_client_init(struct drm_device *dev, struct drm_client_dev *client,
 
 err_free:
 	drm_client_modeset_free(client);
-err_put_module:
-	if (funcs)
-		module_put(funcs->owner);
-
 	return ret;
 }
 EXPORT_SYMBOL(drm_client_init);
@@ -177,8 +169,6 @@ void drm_client_release(struct drm_client_dev *client)
 	drm_client_modeset_free(client);
 	drm_client_close(client);
 	drm_dev_put(dev);
-	if (client->funcs)
-		module_put(client->funcs->owner);
 }
 EXPORT_SYMBOL(drm_client_release);
 
diff --git a/drivers/gpu/drm/drm_connector.c b/drivers/gpu/drm/drm_connector.c
index c3725086f413..b0516505f7ae 100644
--- a/drivers/gpu/drm/drm_connector.c
+++ b/drivers/gpu/drm/drm_connector.c
@@ -1198,6 +1198,12 @@ static const u32 dp_colorspaces =
  * 	drm_connector_set_path_property(), in the case of DP MST with the
  * 	path property the MST manager created. Userspace cannot change this
  * 	property.
+ *
+ * 	In the case of DP MST, the property has the format
+ * 	``mst:<parent>-<ports>`` where ``<parent>`` is the KMS object ID of the
+ * 	parent connector and ``<ports>`` is a hyphen-separated list of DP MST
+ * 	port numbers. Note, KMS object IDs are not guaranteed to be stable
+ * 	across reboots.
  * TILE:
  * 	Connector tile group property to indicate how a set of DRM connector
  * 	compose together into one logical screen. This is used by both high-res
diff --git a/drivers/gpu/drm/drm_crtc_internal.h b/drivers/gpu/drm/drm_crtc_internal.h
index 8556c3b3ff88..6b646e0783be 100644
--- a/drivers/gpu/drm/drm_crtc_internal.h
+++ b/drivers/gpu/drm/drm_crtc_internal.h
@@ -222,6 +222,8 @@ int drm_mode_addfb2_ioctl(struct drm_device *dev,
 			  void *data, struct drm_file *file_priv);
 int drm_mode_rmfb_ioctl(struct drm_device *dev,
 			void *data, struct drm_file *file_priv);
+int drm_mode_closefb_ioctl(struct drm_device *dev,
+			   void *data, struct drm_file *file_priv);
 int drm_mode_getfb(struct drm_device *dev,
 		   void *data, struct drm_file *file_priv);
 int drm_mode_getfb2_ioctl(struct drm_device *dev,
diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
index 39db08f803ea..cb4031d5dcbb 100644
--- a/drivers/gpu/drm/drm_edid.c
+++ b/drivers/gpu/drm/drm_edid.c
@@ -41,10 +41,12 @@
 #include <drm/drm_displayid.h>
 #include <drm/drm_drv.h>
 #include <drm/drm_edid.h>
+#include <drm/drm_eld.h>
 #include <drm/drm_encoder.h>
 #include <drm/drm_print.h>
 
 #include "drm_crtc_internal.h"
+#include "drm_internal.h"
 
 static int oui(u8 first, u8 second, u8 third)
 {
@@ -5509,6 +5511,27 @@ static void clear_eld(struct drm_connector *connector)
 	connector->audio_latency[1] = 0;
 }
 
+/*
+ * Get 3-byte SAD buffer from struct cea_sad.
+ */
+void drm_edid_cta_sad_get(const struct cea_sad *cta_sad, u8 *sad)
+{
+	sad[0] = cta_sad->format << 3 | cta_sad->channels;
+	sad[1] = cta_sad->freq;
+	sad[2] = cta_sad->byte2;
+}
+
+/*
+ * Set struct cea_sad from 3-byte SAD buffer.
+ */
+void drm_edid_cta_sad_set(struct cea_sad *cta_sad, const u8 *sad)
+{
+	cta_sad->format = (sad[0] & 0x78) >> 3;
+	cta_sad->channels = sad[0] & 0x07;
+	cta_sad->freq = sad[1] & 0x7f;
+	cta_sad->byte2 = sad[2];
+}
+
 /*
  * drm_edid_to_eld - build ELD from EDID
  * @connector: connector corresponding to the HDMI/DP sink
@@ -5593,7 +5616,7 @@ static void drm_edid_to_eld(struct drm_connector *connector,
 }
 
 static int _drm_edid_to_sad(const struct drm_edid *drm_edid,
-			    struct cea_sad **sads)
+			    struct cea_sad **psads)
 {
 	const struct cea_db *db;
 	struct cea_db_iter iter;
@@ -5602,20 +5625,16 @@ static int _drm_edid_to_sad(const struct drm_edid *drm_edid,
 	cea_db_iter_edid_begin(drm_edid, &iter);
 	cea_db_iter_for_each(db, &iter) {
 		if (cea_db_tag(db) == CTA_DB_AUDIO) {
-			int j;
+			struct cea_sad *sads;
+			int i;
 
 			count = cea_db_payload_len(db) / 3; /* SAD is 3B */
-			*sads = kcalloc(count, sizeof(**sads), GFP_KERNEL);
-			if (!*sads)
+			sads = kcalloc(count, sizeof(*sads), GFP_KERNEL);
+			*psads = sads;
+			if (!sads)
 				return -ENOMEM;
-			for (j = 0; j < count; j++) {
-				const u8 *sad = &db->data[j * 3];
-
-				(*sads)[j].format = (sad[0] & 0x78) >> 3;
-				(*sads)[j].channels = sad[0] & 0x7;
-				(*sads)[j].freq = sad[1] & 0x7F;
-				(*sads)[j].byte2 = sad[2];
-			}
+			for (i = 0; i < count; i++)
+				drm_edid_cta_sad_set(&sads[i], &db->data[i * 3]);
 			break;
 		}
 	}
diff --git a/drivers/gpu/drm/drm_eld.c b/drivers/gpu/drm/drm_eld.c
new file mode 100644
index 000000000000..5177991aa272
--- /dev/null
+++ b/drivers/gpu/drm/drm_eld.c
@@ -0,0 +1,55 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#include <drm/drm_edid.h>
+#include <drm/drm_eld.h>
+
+#include "drm_internal.h"
+
+/**
+ * drm_eld_sad_get - get SAD from ELD to struct cea_sad
+ * @eld: ELD buffer
+ * @sad_index: SAD index
+ * @cta_sad: destination struct cea_sad
+ *
+ * @return: 0 on success, or negative on errors
+ */
+int drm_eld_sad_get(const u8 *eld, int sad_index, struct cea_sad *cta_sad)
+{
+	const u8 *sad;
+
+	if (sad_index >= drm_eld_sad_count(eld))
+		return -EINVAL;
+
+	sad = eld + DRM_ELD_CEA_SAD(drm_eld_mnl(eld), sad_index);
+
+	drm_edid_cta_sad_set(cta_sad, sad);
+
+	return 0;
+}
+EXPORT_SYMBOL(drm_eld_sad_get);
+
+/**
+ * drm_eld_sad_set - set SAD to ELD from struct cea_sad
+ * @eld: ELD buffer
+ * @sad_index: SAD index
+ * @cta_sad: source struct cea_sad
+ *
+ * @return: 0 on success, or negative on errors
+ */
+int drm_eld_sad_set(u8 *eld, int sad_index, const struct cea_sad *cta_sad)
+{
+	u8 *sad;
+
+	if (sad_index >= drm_eld_sad_count(eld))
+		return -EINVAL;
+
+	sad = eld + DRM_ELD_CEA_SAD(drm_eld_mnl(eld), sad_index);
+
+	drm_edid_cta_sad_get(cta_sad, sad);
+
+	return 0;
+}
+EXPORT_SYMBOL(drm_eld_sad_set);
diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index 446458aca8e9..5ddaffd32586 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -913,7 +913,7 @@ static void print_size(struct drm_printer *p, const char *stat,
 	unsigned u;
 
 	for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
-		if (sz < SZ_1K)
+		if (sz == 0 || !IS_ALIGNED(sz, SZ_1K))
 			break;
 		sz = div_u64(sz, SZ_1K);
 	}
diff --git a/drivers/gpu/drm/drm_flip_work.c b/drivers/gpu/drm/drm_flip_work.c
index 060b753881a2..8c6090a90d56 100644
--- a/drivers/gpu/drm/drm_flip_work.c
+++ b/drivers/gpu/drm/drm_flip_work.c
@@ -27,14 +27,12 @@
 #include <drm/drm_print.h>
 #include <drm/drm_util.h>
 
-/**
- * drm_flip_work_allocate_task - allocate a flip-work task
- * @data: data associated to the task
- * @flags: allocator flags
- *
- * Allocate a drm_flip_task object and attach private data to it.
- */
-struct drm_flip_task *drm_flip_work_allocate_task(void *data, gfp_t flags)
+struct drm_flip_task {
+	struct list_head node;
+	void *data;
+};
+
+static struct drm_flip_task *drm_flip_work_allocate_task(void *data, gfp_t flags)
 {
 	struct drm_flip_task *task;
 
@@ -44,18 +42,8 @@ struct drm_flip_task *drm_flip_work_allocate_task(void *data, gfp_t flags)
 
 	return task;
 }
-EXPORT_SYMBOL(drm_flip_work_allocate_task);
 
-/**
- * drm_flip_work_queue_task - queue a specific task
- * @work: the flip-work
- * @task: the task to handle
- *
- * Queues task, that will later be run (passed back to drm_flip_func_t
- * func) on a work queue after drm_flip_work_commit() is called.
- */
-void drm_flip_work_queue_task(struct drm_flip_work *work,
-			      struct drm_flip_task *task)
+static void drm_flip_work_queue_task(struct drm_flip_work *work, struct drm_flip_task *task)
 {
 	unsigned long flags;
 
@@ -63,7 +51,6 @@ void drm_flip_work_queue_task(struct drm_flip_work *work,
 	list_add_tail(&task->node, &work->queued);
 	spin_unlock_irqrestore(&work->lock, flags);
 }
-EXPORT_SYMBOL(drm_flip_work_queue_task);
 
 /**
  * drm_flip_work_queue - queue work
diff --git a/drivers/gpu/drm/drm_format_helper.c b/drivers/gpu/drm/drm_format_helper.c
index f93a4efcee90..b1be458ed4dd 100644
--- a/drivers/gpu/drm/drm_format_helper.c
+++ b/drivers/gpu/drm/drm_format_helper.c
@@ -20,6 +20,97 @@
 #include <drm/drm_print.h>
 #include <drm/drm_rect.h>
 
+/**
+ * drm_format_conv_state_init - Initialize format-conversion state
+ * @state: The state to initialize
+ *
+ * Clears all fields in struct drm_format_conv_state. The state will
+ * be empty with no preallocated resources.
+ */
+void drm_format_conv_state_init(struct drm_format_conv_state *state)
+{
+	state->tmp.mem = NULL;
+	state->tmp.size = 0;
+	state->tmp.preallocated = false;
+}
+EXPORT_SYMBOL(drm_format_conv_state_init);
+
+/**
+ * drm_format_conv_state_copy - Copy format-conversion state
+ * @state: Destination state
+ * @old_state: Source state
+ *
+ * Copies format-conversion state from @old_state to @state; except for
+ * temporary storage.
+ */
+void drm_format_conv_state_copy(struct drm_format_conv_state *state,
+				const struct drm_format_conv_state *old_state)
+{
+	/*
+	 * So far, there's only temporary storage here, which we don't
+	 * duplicate. Just clear the fields.
+	 */
+	state->tmp.mem = NULL;
+	state->tmp.size = 0;
+	state->tmp.preallocated = false;
+}
+EXPORT_SYMBOL(drm_format_conv_state_copy);
+
+/**
+ * drm_format_conv_state_reserve - Allocates storage for format conversion
+ * @state: The format-conversion state
+ * @new_size: The minimum allocation size
+ * @flags: Flags for kmalloc()
+ *
+ * Allocates at least @new_size bytes and returns a pointer to the memory
+ * range. After calling this function, previously returned memory blocks
+ * are invalid. It's best to collect all memory requirements of a format
+ * conversion and call this function once to allocate the range.
+ *
+ * Returns:
+ * A pointer to the allocated memory range, or NULL otherwise.
+ */
+void *drm_format_conv_state_reserve(struct drm_format_conv_state *state,
+				    size_t new_size, gfp_t flags)
+{
+	void *mem;
+
+	if (new_size <= state->tmp.size)
+		goto out;
+	else if (state->tmp.preallocated)
+		return NULL;
+
+	mem = krealloc(state->tmp.mem, new_size, flags);
+	if (!mem)
+		return NULL;
+
+	state->tmp.mem = mem;
+	state->tmp.size = new_size;
+
+out:
+	return state->tmp.mem;
+}
+EXPORT_SYMBOL(drm_format_conv_state_reserve);
+
+/**
+ * drm_format_conv_state_release - Releases an format-conversion storage
+ * @state: The format-conversion state
+ *
+ * Releases the memory range references by the format-conversion state.
+ * After this call, all pointers to the memory are invalid. Prefer
+ * drm_format_conv_state_init() for cleaning up and unloading a driver.
+ */
+void drm_format_conv_state_release(struct drm_format_conv_state *state)
+{
+	if (state->tmp.preallocated)
+		return;
+
+	kfree(state->tmp.mem);
+	state->tmp.mem = NULL;
+	state->tmp.size = 0;
+}
+EXPORT_SYMBOL(drm_format_conv_state_release);
+
 static unsigned int clip_offset(const struct drm_rect *clip, unsigned int pitch, unsigned int cpp)
 {
 	return clip->y1 * pitch + clip->x1 * cpp;
@@ -45,6 +136,7 @@ EXPORT_SYMBOL(drm_fb_clip_offset);
 static int __drm_fb_xfrm(void *dst, unsigned long dst_pitch, unsigned long dst_pixsize,
 			 const void *vaddr, const struct drm_framebuffer *fb,
 			 const struct drm_rect *clip, bool vaddr_cached_hint,
+			 struct drm_format_conv_state *state,
 			 void (*xfrm_line)(void *dbuf, const void *sbuf, unsigned int npixels))
 {
 	unsigned long linepixels = drm_rect_width(clip);
@@ -60,7 +152,7 @@ static int __drm_fb_xfrm(void *dst, unsigned long dst_pitch, unsigned long dst_p
 	 * one line at a time.
 	 */
 	if (!vaddr_cached_hint) {
-		stmp = kmalloc(sbuf_len, GFP_KERNEL);
+		stmp = drm_format_conv_state_reserve(state, sbuf_len, GFP_KERNEL);
 		if (!stmp)
 			return -ENOMEM;
 	}
@@ -79,8 +171,6 @@ static int __drm_fb_xfrm(void *dst, unsigned long dst_pitch, unsigned long dst_p
 		dst += dst_pitch;
 	}
 
-	kfree(stmp);
-
 	return 0;
 }
 
@@ -88,6 +178,7 @@ static int __drm_fb_xfrm(void *dst, unsigned long dst_pitch, unsigned long dst_p
 static int __drm_fb_xfrm_toio(void __iomem *dst, unsigned long dst_pitch, unsigned long dst_pixsize,
 			      const void *vaddr, const struct drm_framebuffer *fb,
 			      const struct drm_rect *clip, bool vaddr_cached_hint,
+			      struct drm_format_conv_state *state,
 			      void (*xfrm_line)(void *dbuf, const void *sbuf, unsigned int npixels))
 {
 	unsigned long linepixels = drm_rect_width(clip);
@@ -101,9 +192,9 @@ static int __drm_fb_xfrm_toio(void __iomem *dst, unsigned long dst_pitch, unsign
 	void *dbuf;
 
 	if (vaddr_cached_hint) {
-		dbuf = kmalloc(dbuf_len, GFP_KERNEL);
+		dbuf = drm_format_conv_state_reserve(state, dbuf_len, GFP_KERNEL);
 	} else {
-		dbuf = kmalloc(stmp_off + sbuf_len, GFP_KERNEL);
+		dbuf = drm_format_conv_state_reserve(state, stmp_off + sbuf_len, GFP_KERNEL);
 		stmp = dbuf + stmp_off;
 	}
 	if (!dbuf)
@@ -124,8 +215,6 @@ static int __drm_fb_xfrm_toio(void __iomem *dst, unsigned long dst_pitch, unsign
 		dst += dst_pitch;
 	}
 
-	kfree(dbuf);
-
 	return 0;
 }
 
@@ -134,6 +223,7 @@ static int drm_fb_xfrm(struct iosys_map *dst,
 		       const unsigned int *dst_pitch, const u8 *dst_pixsize,
 		       const struct iosys_map *src, const struct drm_framebuffer *fb,
 		       const struct drm_rect *clip, bool vaddr_cached_hint,
+		       struct drm_format_conv_state *state,
 		       void (*xfrm_line)(void *dbuf, const void *sbuf, unsigned int npixels))
 {
 	static const unsigned int default_dst_pitch[DRM_FORMAT_MAX_PLANES] = {
@@ -146,10 +236,12 @@ static int drm_fb_xfrm(struct iosys_map *dst,
 	/* TODO: handle src in I/O memory here */
 	if (dst[0].is_iomem)
 		return __drm_fb_xfrm_toio(dst[0].vaddr_iomem, dst_pitch[0], dst_pixsize[0],
-					  src[0].vaddr, fb, clip, vaddr_cached_hint, xfrm_line);
+					  src[0].vaddr, fb, clip, vaddr_cached_hint, state,
+					  xfrm_line);
 	else
 		return __drm_fb_xfrm(dst[0].vaddr, dst_pitch[0], dst_pixsize[0],
-				     src[0].vaddr, fb, clip, vaddr_cached_hint, xfrm_line);
+				     src[0].vaddr, fb, clip, vaddr_cached_hint, state,
+				     xfrm_line);
 }
 
 /**
@@ -235,6 +327,7 @@ static void drm_fb_swab32_line(void *dbuf, const void *sbuf, unsigned int pixels
  * @fb: DRM framebuffer
  * @clip: Clip rectangle area to copy
  * @cached: Source buffer is mapped cached (eg. not write-combined)
+ * @state: Transform and conversion state
  *
  * This function copies parts of a framebuffer to display memory and swaps per-pixel
  * bytes during the process. Destination and framebuffer formats must match. The
@@ -249,7 +342,8 @@ static void drm_fb_swab32_line(void *dbuf, const void *sbuf, unsigned int pixels
  */
 void drm_fb_swab(struct iosys_map *dst, const unsigned int *dst_pitch,
 		 const struct iosys_map *src, const struct drm_framebuffer *fb,
-		 const struct drm_rect *clip, bool cached)
+		 const struct drm_rect *clip, bool cached,
+		 struct drm_format_conv_state *state)
 {
 	const struct drm_format_info *format = fb->format;
 	u8 cpp = DIV_ROUND_UP(drm_format_info_bpp(format, 0), 8);
@@ -268,7 +362,7 @@ void drm_fb_swab(struct iosys_map *dst, const unsigned int *dst_pitch,
 		return;
 	}
 
-	drm_fb_xfrm(dst, dst_pitch, &cpp, src, fb, clip, cached, swab_line);
+	drm_fb_xfrm(dst, dst_pitch, &cpp, src, fb, clip, cached, state, swab_line);
 }
 EXPORT_SYMBOL(drm_fb_swab);
 
@@ -295,6 +389,7 @@ static void drm_fb_xrgb8888_to_rgb332_line(void *dbuf, const void *sbuf, unsigne
  * @src: Array of XRGB8888 source buffers
  * @fb: DRM framebuffer
  * @clip: Clip rectangle area to copy
+ * @state: Transform and conversion state
  *
  * This function copies parts of a framebuffer to display memory and converts the
  * color format during the process. Destination and framebuffer formats must match. The
@@ -309,13 +404,13 @@ static void drm_fb_xrgb8888_to_rgb332_line(void *dbuf, const void *sbuf, unsigne
  */
 void drm_fb_xrgb8888_to_rgb332(struct iosys_map *dst, const unsigned int *dst_pitch,
 			       const struct iosys_map *src, const struct drm_framebuffer *fb,
-			       const struct drm_rect *clip)
+			       const struct drm_rect *clip, struct drm_format_conv_state *state)
 {
 	static const u8 dst_pixsize[DRM_FORMAT_MAX_PLANES] = {
 		1,
 	};
 
-	drm_fb_xfrm(dst, dst_pitch, dst_pixsize, src, fb, clip, false,
+	drm_fb_xfrm(dst, dst_pitch, dst_pixsize, src, fb, clip, false, state,
 		    drm_fb_xrgb8888_to_rgb332_line);
 }
 EXPORT_SYMBOL(drm_fb_xrgb8888_to_rgb332);
@@ -364,6 +459,7 @@ static void drm_fb_xrgb8888_to_rgb565_swab_line(void *dbuf, const void *sbuf,
  * @src: Array of XRGB8888 source buffer
  * @fb: DRM framebuffer
  * @clip: Clip rectangle area to copy
+ * @state: Transform and conversion state
  * @swab: Swap bytes
  *
  * This function copies parts of a framebuffer to display memory and converts the
@@ -379,7 +475,8 @@ static void drm_fb_xrgb8888_to_rgb565_swab_line(void *dbuf, const void *sbuf,
  */
 void drm_fb_xrgb8888_to_rgb565(struct iosys_map *dst, const unsigned int *dst_pitch,
 			       const struct iosys_map *src, const struct drm_framebuffer *fb,
-			       const struct drm_rect *clip, bool swab)
+			       const struct drm_rect *clip, struct drm_format_conv_state *state,
+			       bool swab)
 {
 	static const u8 dst_pixsize[DRM_FORMAT_MAX_PLANES] = {
 		2,
@@ -392,7 +489,7 @@ void drm_fb_xrgb8888_to_rgb565(struct iosys_map *dst, const unsigned int *dst_pi
 	else
 		xfrm_line = drm_fb_xrgb8888_to_rgb565_line;
 
-	drm_fb_xfrm(dst, dst_pitch, dst_pixsize, src, fb, clip, false, xfrm_line);
+	drm_fb_xfrm(dst, dst_pitch, dst_pixsize, src, fb, clip, false, state, xfrm_line);
 }
 EXPORT_SYMBOL(drm_fb_xrgb8888_to_rgb565);
 
@@ -421,6 +518,7 @@ static void drm_fb_xrgb8888_to_xrgb1555_line(void *dbuf, const void *sbuf, unsig
  * @src: Array of XRGB8888 source buffer
  * @fb: DRM framebuffer
  * @clip: Clip rectangle area to copy
+ * @state: Transform and conversion state
  *
  * This function copies parts of a framebuffer to display memory and converts
  * the color format during the process. The parameters @dst, @dst_pitch and
@@ -436,13 +534,13 @@ static void drm_fb_xrgb8888_to_xrgb1555_line(void *dbuf, const void *sbuf, unsig
  */
 void drm_fb_xrgb8888_to_xrgb1555(struct iosys_map *dst, const unsigned int *dst_pitch,
 				 const struct iosys_map *src, const struct drm_framebuffer *fb,
-				 const struct drm_rect *clip)
+				 const struct drm_rect *clip, struct drm_format_conv_state *state)
 {
 	static const u8 dst_pixsize[DRM_FORMAT_MAX_PLANES] = {
 		2,
 	};
 
-	drm_fb_xfrm(dst, dst_pitch, dst_pixsize, src, fb, clip, false,
+	drm_fb_xfrm(dst, dst_pitch, dst_pixsize, src, fb, clip, false, state,
 		    drm_fb_xrgb8888_to_xrgb1555_line);
 }
 EXPORT_SYMBOL(drm_fb_xrgb8888_to_xrgb1555);
@@ -473,6 +571,7 @@ static void drm_fb_xrgb8888_to_argb1555_line(void *dbuf, const void *sbuf, unsig
  * @src: Array of XRGB8888 source buffer
  * @fb: DRM framebuffer
  * @clip: Clip rectangle area to copy
+ * @state: Transform and conversion state
  *
  * This function copies parts of a framebuffer to display memory and converts
  * the color format during the process. The parameters @dst, @dst_pitch and
@@ -488,13 +587,13 @@ static void drm_fb_xrgb8888_to_argb1555_line(void *dbuf, const void *sbuf, unsig
  */
 void drm_fb_xrgb8888_to_argb1555(struct iosys_map *dst, const unsigned int *dst_pitch,
 				 const struct iosys_map *src, const struct drm_framebuffer *fb,
-				 const struct drm_rect *clip)
+				 const struct drm_rect *clip, struct drm_format_conv_state *state)
 {
 	static const u8 dst_pixsize[DRM_FORMAT_MAX_PLANES] = {
 		2,
 	};
 
-	drm_fb_xfrm(dst, dst_pitch, dst_pixsize, src, fb, clip, false,
+	drm_fb_xfrm(dst, dst_pitch, dst_pixsize, src, fb, clip, false, state,
 		    drm_fb_xrgb8888_to_argb1555_line);
 }
 EXPORT_SYMBOL(drm_fb_xrgb8888_to_argb1555);
@@ -525,6 +624,7 @@ static void drm_fb_xrgb8888_to_rgba5551_line(void *dbuf, const void *sbuf, unsig
  * @src: Array of XRGB8888 source buffer
  * @fb: DRM framebuffer
  * @clip: Clip rectangle area to copy
+ * @state: Transform and conversion state
  *
  * This function copies parts of a framebuffer to display memory and converts
  * the color format during the process. The parameters @dst, @dst_pitch and
@@ -540,13 +640,13 @@ static void drm_fb_xrgb8888_to_rgba5551_line(void *dbuf, const void *sbuf, unsig
  */
 void drm_fb_xrgb8888_to_rgba5551(struct iosys_map *dst, const unsigned int *dst_pitch,
 				 const struct iosys_map *src, const struct drm_framebuffer *fb,
-				 const struct drm_rect *clip)
+				 const struct drm_rect *clip, struct drm_format_conv_state *state)
 {
 	static const u8 dst_pixsize[DRM_FORMAT_MAX_PLANES] = {
 		2,
 	};
 
-	drm_fb_xfrm(dst, dst_pitch, dst_pixsize, src, fb, clip, false,
+	drm_fb_xfrm(dst, dst_pitch, dst_pixsize, src, fb, clip, false, state,
 		    drm_fb_xrgb8888_to_rgba5551_line);
 }
 EXPORT_SYMBOL(drm_fb_xrgb8888_to_rgba5551);
@@ -575,6 +675,7 @@ static void drm_fb_xrgb8888_to_rgb888_line(void *dbuf, const void *sbuf, unsigne
  * @src: Array of XRGB8888 source buffers
  * @fb: DRM framebuffer
  * @clip: Clip rectangle area to copy
+ * @state: Transform and conversion state
  *
  * This function copies parts of a framebuffer to display memory and converts the
  * color format during the process. Destination and framebuffer formats must match. The
@@ -590,13 +691,13 @@ static void drm_fb_xrgb8888_to_rgb888_line(void *dbuf, const void *sbuf, unsigne
  */
 void drm_fb_xrgb8888_to_rgb888(struct iosys_map *dst, const unsigned int *dst_pitch,
 			       const struct iosys_map *src, const struct drm_framebuffer *fb,
-			       const struct drm_rect *clip)
+			       const struct drm_rect *clip, struct drm_format_conv_state *state)
 {
 	static const u8 dst_pixsize[DRM_FORMAT_MAX_PLANES] = {
 		3,
 	};
 
-	drm_fb_xfrm(dst, dst_pitch, dst_pixsize, src, fb, clip, false,
+	drm_fb_xfrm(dst, dst_pitch, dst_pixsize, src, fb, clip, false, state,
 		    drm_fb_xrgb8888_to_rgb888_line);
 }
 EXPORT_SYMBOL(drm_fb_xrgb8888_to_rgb888);
@@ -623,6 +724,7 @@ static void drm_fb_xrgb8888_to_argb8888_line(void *dbuf, const void *sbuf, unsig
  * @src: Array of XRGB8888 source buffer
  * @fb: DRM framebuffer
  * @clip: Clip rectangle area to copy
+ * @state: Transform and conversion state
  *
  * This function copies parts of a framebuffer to display memory and converts the
  * color format during the process. The parameters @dst, @dst_pitch and @src refer
@@ -638,13 +740,13 @@ static void drm_fb_xrgb8888_to_argb8888_line(void *dbuf, const void *sbuf, unsig
  */
 void drm_fb_xrgb8888_to_argb8888(struct iosys_map *dst, const unsigned int *dst_pitch,
 				 const struct iosys_map *src, const struct drm_framebuffer *fb,
-				 const struct drm_rect *clip)
+				 const struct drm_rect *clip, struct drm_format_conv_state *state)
 {
 	static const u8 dst_pixsize[DRM_FORMAT_MAX_PLANES] = {
 		4,
 	};
 
-	drm_fb_xfrm(dst, dst_pitch, dst_pixsize, src, fb, clip, false,
+	drm_fb_xfrm(dst, dst_pitch, dst_pixsize, src, fb, clip, false, state,
 		    drm_fb_xrgb8888_to_argb8888_line);
 }
 EXPORT_SYMBOL(drm_fb_xrgb8888_to_argb8888);
@@ -669,13 +771,14 @@ static void drm_fb_xrgb8888_to_abgr8888_line(void *dbuf, const void *sbuf, unsig
 static void drm_fb_xrgb8888_to_abgr8888(struct iosys_map *dst, const unsigned int *dst_pitch,
 					const struct iosys_map *src,
 					const struct drm_framebuffer *fb,
-					const struct drm_rect *clip)
+					const struct drm_rect *clip,
+					struct drm_format_conv_state *state)
 {
 	static const u8 dst_pixsize[DRM_FORMAT_MAX_PLANES] = {
 		4,
 	};
 
-	drm_fb_xfrm(dst, dst_pitch, dst_pixsize, src, fb, clip, false,
+	drm_fb_xfrm(dst, dst_pitch, dst_pixsize, src, fb, clip, false, state,
 		    drm_fb_xrgb8888_to_abgr8888_line);
 }
 
@@ -699,13 +802,14 @@ static void drm_fb_xrgb8888_to_xbgr8888_line(void *dbuf, const void *sbuf, unsig
 static void drm_fb_xrgb8888_to_xbgr8888(struct iosys_map *dst, const unsigned int *dst_pitch,
 					const struct iosys_map *src,
 					const struct drm_framebuffer *fb,
-					const struct drm_rect *clip)
+					const struct drm_rect *clip,
+					struct drm_format_conv_state *state)
 {
 	static const u8 dst_pixsize[DRM_FORMAT_MAX_PLANES] = {
 		4,
 	};
 
-	drm_fb_xfrm(dst, dst_pitch, dst_pixsize, src, fb, clip, false,
+	drm_fb_xfrm(dst, dst_pitch, dst_pixsize, src, fb, clip, false, state,
 		    drm_fb_xrgb8888_to_xbgr8888_line);
 }
 
@@ -735,6 +839,7 @@ static void drm_fb_xrgb8888_to_xrgb2101010_line(void *dbuf, const void *sbuf, un
  * @src: Array of XRGB8888 source buffers
  * @fb: DRM framebuffer
  * @clip: Clip rectangle area to copy
+ * @state: Transform and conversion state
  *
  * This function copies parts of a framebuffer to display memory and converts the
  * color format during the process. Destination and framebuffer formats must match. The
@@ -750,13 +855,14 @@ static void drm_fb_xrgb8888_to_xrgb2101010_line(void *dbuf, const void *sbuf, un
  */
 void drm_fb_xrgb8888_to_xrgb2101010(struct iosys_map *dst, const unsigned int *dst_pitch,
 				    const struct iosys_map *src, const struct drm_framebuffer *fb,
-				    const struct drm_rect *clip)
+				    const struct drm_rect *clip,
+				    struct drm_format_conv_state *state)
 {
 	static const u8 dst_pixsize[DRM_FORMAT_MAX_PLANES] = {
 		4,
 	};
 
-	drm_fb_xfrm(dst, dst_pitch, dst_pixsize, src, fb, clip, false,
+	drm_fb_xfrm(dst, dst_pitch, dst_pixsize, src, fb, clip, false, state,
 		    drm_fb_xrgb8888_to_xrgb2101010_line);
 }
 EXPORT_SYMBOL(drm_fb_xrgb8888_to_xrgb2101010);
@@ -788,6 +894,7 @@ static void drm_fb_xrgb8888_to_argb2101010_line(void *dbuf, const void *sbuf, un
  * @src: Array of XRGB8888 source buffers
  * @fb: DRM framebuffer
  * @clip: Clip rectangle area to copy
+ * @state: Transform and conversion state
  *
  * This function copies parts of a framebuffer to display memory and converts
  * the color format during the process. The parameters @dst, @dst_pitch and
@@ -803,13 +910,14 @@ static void drm_fb_xrgb8888_to_argb2101010_line(void *dbuf, const void *sbuf, un
  */
 void drm_fb_xrgb8888_to_argb2101010(struct iosys_map *dst, const unsigned int *dst_pitch,
 				    const struct iosys_map *src, const struct drm_framebuffer *fb,
-				    const struct drm_rect *clip)
+				    const struct drm_rect *clip,
+				    struct drm_format_conv_state *state)
 {
 	static const u8 dst_pixsize[DRM_FORMAT_MAX_PLANES] = {
 		4,
 	};
 
-	drm_fb_xfrm(dst, dst_pitch, dst_pixsize, src, fb, clip, false,
+	drm_fb_xfrm(dst, dst_pitch, dst_pixsize, src, fb, clip, false, state,
 		    drm_fb_xrgb8888_to_argb2101010_line);
 }
 EXPORT_SYMBOL(drm_fb_xrgb8888_to_argb2101010);
@@ -839,6 +947,7 @@ static void drm_fb_xrgb8888_to_gray8_line(void *dbuf, const void *sbuf, unsigned
  * @src: Array of XRGB8888 source buffers
  * @fb: DRM framebuffer
  * @clip: Clip rectangle area to copy
+ * @state: Transform and conversion state
  *
  * This function copies parts of a framebuffer to display memory and converts the
  * color format during the process. Destination and framebuffer formats must match. The
@@ -858,13 +967,13 @@ static void drm_fb_xrgb8888_to_gray8_line(void *dbuf, const void *sbuf, unsigned
  */
 void drm_fb_xrgb8888_to_gray8(struct iosys_map *dst, const unsigned int *dst_pitch,
 			      const struct iosys_map *src, const struct drm_framebuffer *fb,
-			      const struct drm_rect *clip)
+			      const struct drm_rect *clip, struct drm_format_conv_state *state)
 {
 	static const u8 dst_pixsize[DRM_FORMAT_MAX_PLANES] = {
 		1,
 	};
 
-	drm_fb_xfrm(dst, dst_pitch, dst_pixsize, src, fb, clip, false,
+	drm_fb_xfrm(dst, dst_pitch, dst_pixsize, src, fb, clip, false, state,
 		    drm_fb_xrgb8888_to_gray8_line);
 }
 EXPORT_SYMBOL(drm_fb_xrgb8888_to_gray8);
@@ -878,6 +987,7 @@ EXPORT_SYMBOL(drm_fb_xrgb8888_to_gray8);
  * @src:	The framebuffer memory to copy from
  * @fb:		The framebuffer to copy from
  * @clip:	Clip rectangle area to copy
+ * @state: Transform and conversion state
  *
  * This function copies parts of a framebuffer to display memory. If the
  * formats of the display and the framebuffer mismatch, the blit function
@@ -896,7 +1006,7 @@ EXPORT_SYMBOL(drm_fb_xrgb8888_to_gray8);
  */
 int drm_fb_blit(struct iosys_map *dst, const unsigned int *dst_pitch, uint32_t dst_format,
 		const struct iosys_map *src, const struct drm_framebuffer *fb,
-		const struct drm_rect *clip)
+		const struct drm_rect *clip, struct drm_format_conv_state *state)
 {
 	uint32_t fb_format = fb->format->format;
 
@@ -904,44 +1014,44 @@ int drm_fb_blit(struct iosys_map *dst, const unsigned int *dst_pitch, uint32_t d
 		drm_fb_memcpy(dst, dst_pitch, src, fb, clip);
 		return 0;
 	} else if (fb_format == (dst_format | DRM_FORMAT_BIG_ENDIAN)) {
-		drm_fb_swab(dst, dst_pitch, src, fb, clip, false);
+		drm_fb_swab(dst, dst_pitch, src, fb, clip, false, state);
 		return 0;
 	} else if (fb_format == (dst_format & ~DRM_FORMAT_BIG_ENDIAN)) {
-		drm_fb_swab(dst, dst_pitch, src, fb, clip, false);
+		drm_fb_swab(dst, dst_pitch, src, fb, clip, false, state);
 		return 0;
 	} else if (fb_format == DRM_FORMAT_XRGB8888) {
 		if (dst_format == DRM_FORMAT_RGB565) {
-			drm_fb_xrgb8888_to_rgb565(dst, dst_pitch, src, fb, clip, false);
+			drm_fb_xrgb8888_to_rgb565(dst, dst_pitch, src, fb, clip, state, false);
 			return 0;
 		} else if (dst_format == DRM_FORMAT_XRGB1555) {
-			drm_fb_xrgb8888_to_xrgb1555(dst, dst_pitch, src, fb, clip);
+			drm_fb_xrgb8888_to_xrgb1555(dst, dst_pitch, src, fb, clip, state);
 			return 0;
 		} else if (dst_format == DRM_FORMAT_ARGB1555) {
-			drm_fb_xrgb8888_to_argb1555(dst, dst_pitch, src, fb, clip);
+			drm_fb_xrgb8888_to_argb1555(dst, dst_pitch, src, fb, clip, state);
 			return 0;
 		} else if (dst_format == DRM_FORMAT_RGBA5551) {
-			drm_fb_xrgb8888_to_rgba5551(dst, dst_pitch, src, fb, clip);
+			drm_fb_xrgb8888_to_rgba5551(dst, dst_pitch, src, fb, clip, state);
 			return 0;
 		} else if (dst_format == DRM_FORMAT_RGB888) {
-			drm_fb_xrgb8888_to_rgb888(dst, dst_pitch, src, fb, clip);
+			drm_fb_xrgb8888_to_rgb888(dst, dst_pitch, src, fb, clip, state);
 			return 0;
 		} else if (dst_format == DRM_FORMAT_ARGB8888) {
-			drm_fb_xrgb8888_to_argb8888(dst, dst_pitch, src, fb, clip);
+			drm_fb_xrgb8888_to_argb8888(dst, dst_pitch, src, fb, clip, state);
 			return 0;
 		} else if (dst_format == DRM_FORMAT_XBGR8888) {
-			drm_fb_xrgb8888_to_xbgr8888(dst, dst_pitch, src, fb, clip);
+			drm_fb_xrgb8888_to_xbgr8888(dst, dst_pitch, src, fb, clip, state);
 			return 0;
 		} else if (dst_format == DRM_FORMAT_ABGR8888) {
-			drm_fb_xrgb8888_to_abgr8888(dst, dst_pitch, src, fb, clip);
+			drm_fb_xrgb8888_to_abgr8888(dst, dst_pitch, src, fb, clip, state);
 			return 0;
 		} else if (dst_format == DRM_FORMAT_XRGB2101010) {
-			drm_fb_xrgb8888_to_xrgb2101010(dst, dst_pitch, src, fb, clip);
+			drm_fb_xrgb8888_to_xrgb2101010(dst, dst_pitch, src, fb, clip, state);
 			return 0;
 		} else if (dst_format == DRM_FORMAT_ARGB2101010) {
-			drm_fb_xrgb8888_to_argb2101010(dst, dst_pitch, src, fb, clip);
+			drm_fb_xrgb8888_to_argb2101010(dst, dst_pitch, src, fb, clip, state);
 			return 0;
 		} else if (dst_format == DRM_FORMAT_BGRX8888) {
-			drm_fb_swab(dst, dst_pitch, src, fb, clip, false);
+			drm_fb_swab(dst, dst_pitch, src, fb, clip, false, state);
 			return 0;
 		}
 	}
@@ -978,6 +1088,7 @@ static void drm_fb_gray8_to_mono_line(void *dbuf, const void *sbuf, unsigned int
  * @src: Array of XRGB8888 source buffers
  * @fb: DRM framebuffer
  * @clip: Clip rectangle area to copy
+ * @state: Transform and conversion state
  *
  * This function copies parts of a framebuffer to display memory and converts the
  * color format during the process. Destination and framebuffer formats must match. The
@@ -1002,7 +1113,7 @@ static void drm_fb_gray8_to_mono_line(void *dbuf, const void *sbuf, unsigned int
  */
 void drm_fb_xrgb8888_to_mono(struct iosys_map *dst, const unsigned int *dst_pitch,
 			     const struct iosys_map *src, const struct drm_framebuffer *fb,
-			     const struct drm_rect *clip)
+			     const struct drm_rect *clip, struct drm_format_conv_state *state)
 {
 	static const unsigned int default_dst_pitch[DRM_FORMAT_MAX_PLANES] = {
 		0, 0, 0, 0
@@ -1042,7 +1153,7 @@ void drm_fb_xrgb8888_to_mono(struct iosys_map *dst, const unsigned int *dst_pitc
 	 * Allocate a buffer to be used for both copying from the cma
 	 * memory and to store the intermediate grayscale line pixels.
 	 */
-	src32 = kmalloc(len_src32 + linepixels, GFP_KERNEL);
+	src32 = drm_format_conv_state_reserve(state, len_src32 + linepixels, GFP_KERNEL);
 	if (!src32)
 		return;
 
@@ -1056,8 +1167,6 @@ void drm_fb_xrgb8888_to_mono(struct iosys_map *dst, const unsigned int *dst_pitc
 		vaddr += fb->pitches[0];
 		mono += dst_pitch_0;
 	}
-
-	kfree(src32);
 }
 EXPORT_SYMBOL(drm_fb_xrgb8888_to_mono);
 
diff --git a/drivers/gpu/drm/drm_framebuffer.c b/drivers/gpu/drm/drm_framebuffer.c
index d3ba0698b84b..09e289fca5c3 100644
--- a/drivers/gpu/drm/drm_framebuffer.c
+++ b/drivers/gpu/drm/drm_framebuffer.c
@@ -394,6 +394,31 @@ static void drm_mode_rmfb_work_fn(struct work_struct *w)
 	}
 }
 
+static int drm_mode_closefb(struct drm_framebuffer *fb,
+			    struct drm_file *file_priv)
+{
+	struct drm_framebuffer *fbl;
+	bool found = false;
+
+	mutex_lock(&file_priv->fbs_lock);
+	list_for_each_entry(fbl, &file_priv->fbs, filp_head)
+		if (fb == fbl)
+			found = true;
+
+	if (!found) {
+		mutex_unlock(&file_priv->fbs_lock);
+		return -ENOENT;
+	}
+
+	list_del_init(&fb->filp_head);
+	mutex_unlock(&file_priv->fbs_lock);
+
+	/* Drop the reference that was stored in the fbs list */
+	drm_framebuffer_put(fb);
+
+	return 0;
+}
+
 /**
  * drm_mode_rmfb - remove an FB from the configuration
  * @dev: drm device
@@ -410,9 +435,8 @@ static void drm_mode_rmfb_work_fn(struct work_struct *w)
 int drm_mode_rmfb(struct drm_device *dev, u32 fb_id,
 		  struct drm_file *file_priv)
 {
-	struct drm_framebuffer *fb = NULL;
-	struct drm_framebuffer *fbl = NULL;
-	int found = 0;
+	struct drm_framebuffer *fb;
+	int ret;
 
 	if (!drm_core_check_feature(dev, DRIVER_MODESET))
 		return -EOPNOTSUPP;
@@ -421,24 +445,13 @@ int drm_mode_rmfb(struct drm_device *dev, u32 fb_id,
 	if (!fb)
 		return -ENOENT;
 
-	mutex_lock(&file_priv->fbs_lock);
-	list_for_each_entry(fbl, &file_priv->fbs, filp_head)
-		if (fb == fbl)
-			found = 1;
-	if (!found) {
-		mutex_unlock(&file_priv->fbs_lock);
-		goto fail_unref;
+	ret = drm_mode_closefb(fb, file_priv);
+	if (ret != 0) {
+		drm_framebuffer_put(fb);
+		return ret;
 	}
 
-	list_del_init(&fb->filp_head);
-	mutex_unlock(&file_priv->fbs_lock);
-
-	/* drop the reference we picked up in framebuffer lookup */
-	drm_framebuffer_put(fb);
-
 	/*
-	 * we now own the reference that was stored in the fbs list
-	 *
 	 * drm_framebuffer_remove may fail with -EINTR on pending signals,
 	 * so run this in a separate stack as there's no way to correctly
 	 * handle this after the fb is already removed from the lookup table.
@@ -457,10 +470,6 @@ int drm_mode_rmfb(struct drm_device *dev, u32 fb_id,
 		drm_framebuffer_put(fb);
 
 	return 0;
-
-fail_unref:
-	drm_framebuffer_put(fb);
-	return -ENOENT;
 }
 
 int drm_mode_rmfb_ioctl(struct drm_device *dev,
@@ -471,6 +480,28 @@ int drm_mode_rmfb_ioctl(struct drm_device *dev,
 	return drm_mode_rmfb(dev, *fb_id, file_priv);
 }
 
+int drm_mode_closefb_ioctl(struct drm_device *dev,
+			   void *data, struct drm_file *file_priv)
+{
+	struct drm_mode_closefb *r = data;
+	struct drm_framebuffer *fb;
+	int ret;
+
+	if (!drm_core_check_feature(dev, DRIVER_MODESET))
+		return -EOPNOTSUPP;
+
+	if (r->pad)
+		return -EINVAL;
+
+	fb = drm_framebuffer_lookup(dev, file_priv, r->fb_id);
+	if (!fb)
+		return -ENOENT;
+
+	ret = drm_mode_closefb(fb, file_priv);
+	drm_framebuffer_put(fb);
+	return ret;
+}
+
 /**
  * drm_mode_getfb - get FB info
  * @dev: drm device for the ioctl
diff --git a/drivers/gpu/drm/drm_gem_atomic_helper.c b/drivers/gpu/drm/drm_gem_atomic_helper.c
index 5d4b9cd077f7..e440f458b663 100644
--- a/drivers/gpu/drm/drm_gem_atomic_helper.c
+++ b/drivers/gpu/drm/drm_gem_atomic_helper.c
@@ -218,7 +218,14 @@ void
 __drm_gem_duplicate_shadow_plane_state(struct drm_plane *plane,
 				       struct drm_shadow_plane_state *new_shadow_plane_state)
 {
+	struct drm_plane_state *plane_state = plane->state;
+	struct drm_shadow_plane_state *shadow_plane_state =
+		to_drm_shadow_plane_state(plane_state);
+
 	__drm_atomic_helper_plane_duplicate_state(plane, &new_shadow_plane_state->base);
+
+	drm_format_conv_state_copy(&shadow_plane_state->fmtcnv_state,
+				   &new_shadow_plane_state->fmtcnv_state);
 }
 EXPORT_SYMBOL(__drm_gem_duplicate_shadow_plane_state);
 
@@ -266,6 +273,7 @@ EXPORT_SYMBOL(drm_gem_duplicate_shadow_plane_state);
  */
 void __drm_gem_destroy_shadow_plane_state(struct drm_shadow_plane_state *shadow_plane_state)
 {
+	drm_format_conv_state_release(&shadow_plane_state->fmtcnv_state);
 	__drm_atomic_helper_plane_destroy_state(&shadow_plane_state->base);
 }
 EXPORT_SYMBOL(__drm_gem_destroy_shadow_plane_state);
@@ -302,6 +310,7 @@ void __drm_gem_reset_shadow_plane(struct drm_plane *plane,
 				  struct drm_shadow_plane_state *shadow_plane_state)
 {
 	__drm_atomic_helper_plane_reset(plane, &shadow_plane_state->base);
+	drm_format_conv_state_init(&shadow_plane_state->fmtcnv_state);
 }
 EXPORT_SYMBOL(__drm_gem_reset_shadow_plane);
 
diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
index 08c088319652..54f5e8851de5 100644
--- a/drivers/gpu/drm/drm_gpuvm.c
+++ b/drivers/gpu/drm/drm_gpuvm.c
@@ -61,6 +61,42 @@
  * contained within struct drm_gpuva already. Hence, for inserting &drm_gpuva
  * entries from within dma-fence signalling critical sections it is enough to
  * pre-allocate the &drm_gpuva structures.
+ *
+ * &drm_gem_objects which are private to a single VM can share a common
+ * &dma_resv in order to improve locking efficiency (e.g. with &drm_exec).
+ * For this purpose drivers must pass a &drm_gem_object to drm_gpuvm_init(), in
+ * the following called 'resv object', which serves as the container of the
+ * GPUVM's shared &dma_resv. This resv object can be a driver specific
+ * &drm_gem_object, such as the &drm_gem_object containing the root page table,
+ * but it can also be a 'dummy' object, which can be allocated with
+ * drm_gpuvm_resv_object_alloc().
+ *
+ * In order to connect a struct drm_gpuva its backing &drm_gem_object each
+ * &drm_gem_object maintains a list of &drm_gpuvm_bo structures, and each
+ * &drm_gpuvm_bo contains a list of &drm_gpuva structures.
+ *
+ * A &drm_gpuvm_bo is an abstraction that represents a combination of a
+ * &drm_gpuvm and a &drm_gem_object. Every such combination should be unique.
+ * This is ensured by the API through drm_gpuvm_bo_obtain() and
+ * drm_gpuvm_bo_obtain_prealloc() which first look into the corresponding
+ * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this
+ * particular combination. If not existent a new instance is created and linked
+ * to the &drm_gem_object.
+ *
+ * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also used
+ * as entry for the &drm_gpuvm's lists of external and evicted objects. Those
+ * lists are maintained in order to accelerate locking of dma-resv locks and
+ * validation of evicted objects bound in a &drm_gpuvm. For instance, all
+ * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be locked by calling
+ * drm_gpuvm_exec_lock(). Once locked drivers can call drm_gpuvm_validate() in
+ * order to validate all evicted &drm_gem_objects. It is also possible to lock
+ * additional &drm_gem_objects by providing the corresponding parameters to
+ * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while making
+ * use of helper functions such as drm_gpuvm_prepare_range() or
+ * drm_gpuvm_prepare_objects().
+ *
+ * Every bound &drm_gem_object is treated as external object when its &dma_resv
+ * structure is different than the &drm_gpuvm's common &dma_resv structure.
  */
 
 /**
@@ -386,21 +422,42 @@
 /**
  * DOC: Locking
  *
- * Generally, the GPU VA manager does not take care of locking itself, it is
- * the drivers responsibility to take care about locking. Drivers might want to
- * protect the following operations: inserting, removing and iterating
- * &drm_gpuva objects as well as generating all kinds of operations, such as
- * split / merge or prefetch.
+ * In terms of managing &drm_gpuva entries DRM GPUVM does not take care of
+ * locking itself, it is the drivers responsibility to take care about locking.
+ * Drivers might want to protect the following operations: inserting, removing
+ * and iterating &drm_gpuva objects as well as generating all kinds of
+ * operations, such as split / merge or prefetch.
  *
- * The GPU VA manager also does not take care of the locking of the backing
- * &drm_gem_object buffers GPU VA lists by itself; drivers are responsible to
- * enforce mutual exclusion using either the GEMs dma_resv lock or alternatively
- * a driver specific external lock. For the latter see also
- * drm_gem_gpuva_set_lock().
+ * DRM GPUVM also does not take care of the locking of the backing
+ * &drm_gem_object buffers GPU VA lists and &drm_gpuvm_bo abstractions by
+ * itself; drivers are responsible to enforce mutual exclusion using either the
+ * GEMs dma_resv lock or alternatively a driver specific external lock. For the
+ * latter see also drm_gem_gpuva_set_lock().
  *
- * However, the GPU VA manager contains lockdep checks to ensure callers of its
- * API hold the corresponding lock whenever the &drm_gem_objects GPU VA list is
- * accessed by functions such as drm_gpuva_link() or drm_gpuva_unlink().
+ * However, DRM GPUVM contains lockdep checks to ensure callers of its API hold
+ * the corresponding lock whenever the &drm_gem_objects GPU VA list is accessed
+ * by functions such as drm_gpuva_link() or drm_gpuva_unlink(), but also
+ * drm_gpuvm_bo_obtain() and drm_gpuvm_bo_put().
+ *
+ * The latter is required since on creation and destruction of a &drm_gpuvm_bo
+ * the &drm_gpuvm_bo is attached / removed from the &drm_gem_objects gpuva list.
+ * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and
+ * &drm_gem_object must be able to observe previous creations and destructions
+ * of &drm_gpuvm_bos in order to keep instances unique.
+ *
+ * The &drm_gpuvm's lists for keeping track of external and evicted objects are
+ * protected against concurrent insertion / removal and iteration internally.
+ *
+ * However, drivers still need ensure to protect concurrent calls to functions
+ * iterating those lists, namely drm_gpuvm_prepare_objects() and
+ * drm_gpuvm_validate().
+ *
+ * Alternatively, drivers can set the &DRM_GPUVM_RESV_PROTECTED flag to indicate
+ * that the corresponding &dma_resv locks are held in order to protect the
+ * lists. If &DRM_GPUVM_RESV_PROTECTED is set, internal locking is disabled and
+ * the corresponding lockdep checks are enabled. This is an optimization for
+ * drivers which are capable of taking the corresponding &dma_resv locks and
+ * hence do not require internal locking.
  */
 
 /**
@@ -430,6 +487,7 @@
  *	{
  *		struct drm_gpuva_ops *ops;
  *		struct drm_gpuva_op *op
+ *		struct drm_gpuvm_bo *vm_bo;
  *
  *		driver_lock_va_space();
  *		ops = drm_gpuvm_sm_map_ops_create(gpuvm, addr, range,
@@ -437,6 +495,10 @@
  *		if (IS_ERR(ops))
  *			return PTR_ERR(ops);
  *
+ *		vm_bo = drm_gpuvm_bo_obtain(gpuvm, obj);
+ *		if (IS_ERR(vm_bo))
+ *			return PTR_ERR(vm_bo);
+ *
  *		drm_gpuva_for_each_op(op, ops) {
  *			struct drm_gpuva *va;
  *
@@ -449,7 +511,7 @@
  *
  *				driver_vm_map();
  *				drm_gpuva_map(gpuvm, va, &op->map);
- *				drm_gpuva_link(va);
+ *				drm_gpuva_link(va, vm_bo);
  *
  *				break;
  *			case DRM_GPUVA_OP_REMAP: {
@@ -476,11 +538,11 @@
  *				driver_vm_remap();
  *				drm_gpuva_remap(prev, next, &op->remap);
  *
- *				drm_gpuva_unlink(va);
  *				if (prev)
- *					drm_gpuva_link(prev);
+ *					drm_gpuva_link(prev, va->vm_bo);
  *				if (next)
- *					drm_gpuva_link(next);
+ *					drm_gpuva_link(next, va->vm_bo);
+ *				drm_gpuva_unlink(va);
  *
  *				break;
  *			}
@@ -496,6 +558,7 @@
  *				break;
  *			}
  *		}
+ *		drm_gpuvm_bo_put(vm_bo);
  *		driver_unlock_va_space();
  *
  *		return 0;
@@ -505,6 +568,7 @@
  *
  *	struct driver_context {
  *		struct drm_gpuvm *gpuvm;
+ *		struct drm_gpuvm_bo *vm_bo;
  *		struct drm_gpuva *new_va;
  *		struct drm_gpuva *prev_va;
  *		struct drm_gpuva *next_va;
@@ -525,6 +589,7 @@
  *				  struct drm_gem_object *obj, u64 offset)
  *	{
  *		struct driver_context ctx;
+ *		struct drm_gpuvm_bo *vm_bo;
  *		struct drm_gpuva_ops *ops;
  *		struct drm_gpuva_op *op;
  *		int ret = 0;
@@ -534,16 +599,23 @@
  *		ctx.new_va = kzalloc(sizeof(*ctx.new_va), GFP_KERNEL);
  *		ctx.prev_va = kzalloc(sizeof(*ctx.prev_va), GFP_KERNEL);
  *		ctx.next_va = kzalloc(sizeof(*ctx.next_va), GFP_KERNEL);
- *		if (!ctx.new_va || !ctx.prev_va || !ctx.next_va) {
+ *		ctx.vm_bo = drm_gpuvm_bo_create(gpuvm, obj);
+ *		if (!ctx.new_va || !ctx.prev_va || !ctx.next_va || !vm_bo) {
  *			ret = -ENOMEM;
  *			goto out;
  *		}
  *
+ *		// Typically protected with a driver specific GEM gpuva lock
+ *		// used in the fence signaling path for drm_gpuva_link() and
+ *		// drm_gpuva_unlink(), hence pre-allocate.
+ *		ctx.vm_bo = drm_gpuvm_bo_obtain_prealloc(ctx.vm_bo);
+ *
  *		driver_lock_va_space();
  *		ret = drm_gpuvm_sm_map(gpuvm, &ctx, addr, range, obj, offset);
  *		driver_unlock_va_space();
  *
  *	out:
+ *		drm_gpuvm_bo_put(ctx.vm_bo);
  *		kfree(ctx.new_va);
  *		kfree(ctx.prev_va);
  *		kfree(ctx.next_va);
@@ -556,7 +628,7 @@
  *
  *		drm_gpuva_map(ctx->vm, ctx->new_va, &op->map);
  *
- *		drm_gpuva_link(ctx->new_va);
+ *		drm_gpuva_link(ctx->new_va, ctx->vm_bo);
  *
  *		// prevent the new GPUVA from being freed in
  *		// driver_mapping_create()
@@ -568,22 +640,23 @@
  *	int driver_gpuva_remap(struct drm_gpuva_op *op, void *__ctx)
  *	{
  *		struct driver_context *ctx = __ctx;
+ *		struct drm_gpuva *va = op->remap.unmap->va;
  *
  *		drm_gpuva_remap(ctx->prev_va, ctx->next_va, &op->remap);
  *
- *		drm_gpuva_unlink(op->remap.unmap->va);
- *		kfree(op->remap.unmap->va);
- *
  *		if (op->remap.prev) {
- *			drm_gpuva_link(ctx->prev_va);
+ *			drm_gpuva_link(ctx->prev_va, va->vm_bo);
  *			ctx->prev_va = NULL;
  *		}
  *
  *		if (op->remap.next) {
- *			drm_gpuva_link(ctx->next_va);
+ *			drm_gpuva_link(ctx->next_va, va->vm_bo);
  *			ctx->next_va = NULL;
  *		}
  *
+ *		drm_gpuva_unlink(va);
+ *		kfree(va);
+ *
  *		return 0;
  *	}
  *
@@ -597,6 +670,201 @@
  *	}
  */
 
+/**
+ * get_next_vm_bo_from_list() - get the next vm_bo element
+ * @__gpuvm: the &drm_gpuvm
+ * @__list_name: the name of the list we're iterating on
+ * @__local_list: a pointer to the local list used to store already iterated items
+ * @__prev_vm_bo: the previous element we got from get_next_vm_bo_from_list()
+ *
+ * This helper is here to provide lockless list iteration. Lockless as in, the
+ * iterator releases the lock immediately after picking the first element from
+ * the list, so list insertion deletion can happen concurrently.
+ *
+ * Elements popped from the original list are kept in a local list, so removal
+ * and is_empty checks can still happen while we're iterating the list.
+ */
+#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo)	\
+	({										\
+		struct drm_gpuvm_bo *__vm_bo = NULL;					\
+											\
+		drm_gpuvm_bo_put(__prev_vm_bo);						\
+											\
+		spin_lock(&(__gpuvm)->__list_name.lock);				\
+		if (!(__gpuvm)->__list_name.local_list)					\
+			(__gpuvm)->__list_name.local_list = __local_list;		\
+		else									\
+			drm_WARN_ON((__gpuvm)->drm,					\
+				    (__gpuvm)->__list_name.local_list != __local_list);	\
+											\
+		while (!list_empty(&(__gpuvm)->__list_name.list)) {			\
+			__vm_bo = list_first_entry(&(__gpuvm)->__list_name.list,	\
+						   struct drm_gpuvm_bo,			\
+						   list.entry.__list_name);		\
+			if (kref_get_unless_zero(&__vm_bo->kref)) {			\
+				list_move_tail(&(__vm_bo)->list.entry.__list_name,	\
+					       __local_list);				\
+				break;							\
+			} else {							\
+				list_del_init(&(__vm_bo)->list.entry.__list_name);	\
+				__vm_bo = NULL;						\
+			}								\
+		}									\
+		spin_unlock(&(__gpuvm)->__list_name.lock);				\
+											\
+		__vm_bo;								\
+	})
+
+/**
+ * for_each_vm_bo_in_list() - internal vm_bo list iterator
+ * @__gpuvm: the &drm_gpuvm
+ * @__list_name: the name of the list we're iterating on
+ * @__local_list: a pointer to the local list used to store already iterated items
+ * @__vm_bo: the struct drm_gpuvm_bo to assign in each iteration step
+ *
+ * This helper is here to provide lockless list iteration. Lockless as in, the
+ * iterator releases the lock immediately after picking the first element from the
+ * list, hence list insertion and deletion can happen concurrently.
+ *
+ * It is not allowed to re-assign the vm_bo pointer from inside this loop.
+ *
+ * Typical use:
+ *
+ *	struct drm_gpuvm_bo *vm_bo;
+ *	LIST_HEAD(my_local_list);
+ *
+ *	ret = 0;
+ *	for_each_vm_bo_in_list(gpuvm, <list_name>, &my_local_list, vm_bo) {
+ *		ret = do_something_with_vm_bo(..., vm_bo);
+ *		if (ret)
+ *			break;
+ *	}
+ *	// Drop ref in case we break out of the loop.
+ *	drm_gpuvm_bo_put(vm_bo);
+ *	restore_vm_bo_list(gpuvm, <list_name>, &my_local_list);
+ *
+ *
+ * Only used for internal list iterations, not meant to be exposed to the outside
+ * world.
+ */
+#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo)	\
+	for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,		\
+						__local_list, NULL);		\
+	     __vm_bo;								\
+	     __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,		\
+						__local_list, __vm_bo))
+
+static void
+__restore_vm_bo_list(struct drm_gpuvm *gpuvm, spinlock_t *lock,
+		     struct list_head *list, struct list_head **local_list)
+{
+	/* Merge back the two lists, moving local list elements to the
+	 * head to preserve previous ordering, in case it matters.
+	 */
+	spin_lock(lock);
+	if (*local_list) {
+		list_splice(*local_list, list);
+		*local_list = NULL;
+	}
+	spin_unlock(lock);
+}
+
+/**
+ * restore_vm_bo_list() - move vm_bo elements back to their original list
+ * @__gpuvm: the &drm_gpuvm
+ * @__list_name: the name of the list we're iterating on
+ *
+ * When we're done iterating a vm_bo list, we should call restore_vm_bo_list()
+ * to restore the original state and let new iterations take place.
+ */
+#define restore_vm_bo_list(__gpuvm, __list_name)			\
+	__restore_vm_bo_list((__gpuvm), &(__gpuvm)->__list_name.lock,	\
+			     &(__gpuvm)->__list_name.list,		\
+			     &(__gpuvm)->__list_name.local_list)
+
+static void
+cond_spin_lock(spinlock_t *lock, bool cond)
+{
+	if (cond)
+		spin_lock(lock);
+}
+
+static void
+cond_spin_unlock(spinlock_t *lock, bool cond)
+{
+	if (cond)
+		spin_unlock(lock);
+}
+
+static void
+__drm_gpuvm_bo_list_add(struct drm_gpuvm *gpuvm, spinlock_t *lock,
+			struct list_head *entry, struct list_head *list)
+{
+	cond_spin_lock(lock, !!lock);
+	if (list_empty(entry))
+		list_add_tail(entry, list);
+	cond_spin_unlock(lock, !!lock);
+}
+
+/**
+ * drm_gpuvm_bo_list_add() - insert a vm_bo into the given list
+ * @__vm_bo: the &drm_gpuvm_bo
+ * @__list_name: the name of the list to insert into
+ * @__lock: whether to lock with the internal spinlock
+ *
+ * Inserts the given @__vm_bo into the list specified by @__list_name.
+ */
+#define drm_gpuvm_bo_list_add(__vm_bo, __list_name, __lock)			\
+	__drm_gpuvm_bo_list_add((__vm_bo)->vm,					\
+				__lock ? &(__vm_bo)->vm->__list_name.lock :	\
+					 NULL,					\
+				&(__vm_bo)->list.entry.__list_name,		\
+				&(__vm_bo)->vm->__list_name.list)
+
+static void
+__drm_gpuvm_bo_list_del(struct drm_gpuvm *gpuvm, spinlock_t *lock,
+			struct list_head *entry, bool init)
+{
+	cond_spin_lock(lock, !!lock);
+	if (init) {
+		if (!list_empty(entry))
+			list_del_init(entry);
+	} else {
+		list_del(entry);
+	}
+	cond_spin_unlock(lock, !!lock);
+}
+
+/**
+ * drm_gpuvm_bo_list_del_init() - remove a vm_bo from the given list
+ * @__vm_bo: the &drm_gpuvm_bo
+ * @__list_name: the name of the list to insert into
+ * @__lock: whether to lock with the internal spinlock
+ *
+ * Removes the given @__vm_bo from the list specified by @__list_name.
+ */
+#define drm_gpuvm_bo_list_del_init(__vm_bo, __list_name, __lock)		\
+	__drm_gpuvm_bo_list_del((__vm_bo)->vm,					\
+				__lock ? &(__vm_bo)->vm->__list_name.lock :	\
+					 NULL,					\
+				&(__vm_bo)->list.entry.__list_name,		\
+				true)
+
+/**
+ * drm_gpuvm_bo_list_del() - remove a vm_bo from the given list
+ * @__vm_bo: the &drm_gpuvm_bo
+ * @__list_name: the name of the list to insert into
+ * @__lock: whether to lock with the internal spinlock
+ *
+ * Removes the given @__vm_bo from the list specified by @__list_name.
+ */
+#define drm_gpuvm_bo_list_del(__vm_bo, __list_name, __lock)			\
+	__drm_gpuvm_bo_list_del((__vm_bo)->vm,					\
+				__lock ? &(__vm_bo)->vm->__list_name.lock :	\
+					 NULL,					\
+				&(__vm_bo)->list.entry.__list_name,		\
+				false)
+
 #define to_drm_gpuva(__node)	container_of((__node), struct drm_gpuva, rb.node)
 
 #define GPUVA_START(node) ((node)->va.addr)
@@ -618,8 +886,14 @@ drm_gpuvm_check_overflow(u64 addr, u64 range)
 {
 	u64 end;
 
-	return WARN(check_add_overflow(addr, range, &end),
-		    "GPUVA address limited to %zu bytes.\n", sizeof(end));
+	return check_add_overflow(addr, range, &end);
+}
+
+static bool
+drm_gpuvm_warn_check_overflow(struct drm_gpuvm *gpuvm, u64 addr, u64 range)
+{
+	return drm_WARN(gpuvm->drm, drm_gpuvm_check_overflow(addr, range),
+			"GPUVA address limited to %zu bytes.\n", sizeof(addr));
 }
 
 static bool
@@ -643,7 +917,18 @@ drm_gpuvm_in_kernel_node(struct drm_gpuvm *gpuvm, u64 addr, u64 range)
 	return krange && addr < kend && kstart < end;
 }
 
-static bool
+/**
+ * drm_gpuvm_range_valid() - checks whether the given range is valid for the
+ * given &drm_gpuvm
+ * @gpuvm: the GPUVM to check the range for
+ * @addr: the base address
+ * @range: the range starting from the base address
+ *
+ * Checks whether the range is within the GPUVM's managed boundaries.
+ *
+ * Returns: true for a valid range, false otherwise
+ */
+bool
 drm_gpuvm_range_valid(struct drm_gpuvm *gpuvm,
 		      u64 addr, u64 range)
 {
@@ -651,11 +936,52 @@ drm_gpuvm_range_valid(struct drm_gpuvm *gpuvm,
 	       drm_gpuvm_in_mm_range(gpuvm, addr, range) &&
 	       !drm_gpuvm_in_kernel_node(gpuvm, addr, range);
 }
+EXPORT_SYMBOL_GPL(drm_gpuvm_range_valid);
+
+static void
+drm_gpuvm_gem_object_free(struct drm_gem_object *obj)
+{
+	drm_gem_object_release(obj);
+	kfree(obj);
+}
+
+static const struct drm_gem_object_funcs drm_gpuvm_object_funcs = {
+	.free = drm_gpuvm_gem_object_free,
+};
+
+/**
+ * drm_gpuvm_resv_object_alloc() - allocate a dummy &drm_gem_object
+ * @drm: the drivers &drm_device
+ *
+ * Allocates a dummy &drm_gem_object which can be passed to drm_gpuvm_init() in
+ * order to serve as root GEM object providing the &drm_resv shared across
+ * &drm_gem_objects local to a single GPUVM.
+ *
+ * Returns: the &drm_gem_object on success, NULL on failure
+ */
+struct drm_gem_object *
+drm_gpuvm_resv_object_alloc(struct drm_device *drm)
+{
+	struct drm_gem_object *obj;
+
+	obj = kzalloc(sizeof(*obj), GFP_KERNEL);
+	if (!obj)
+		return NULL;
+
+	obj->funcs = &drm_gpuvm_object_funcs;
+	drm_gem_private_object_init(drm, obj, 0);
+
+	return obj;
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_resv_object_alloc);
 
 /**
  * drm_gpuvm_init() - initialize a &drm_gpuvm
  * @gpuvm: pointer to the &drm_gpuvm to initialize
  * @name: the name of the GPU VA space
+ * @flags: the &drm_gpuvm_flags for this GPUVM
+ * @drm: the &drm_device this VM resides in
+ * @r_obj: the resv &drm_gem_object providing the GPUVM's common &dma_resv
  * @start_offset: the start offset of the GPU VA space
  * @range: the size of the GPU VA space
  * @reserve_offset: the start of the kernel reserved GPU VA area
@@ -668,8 +994,10 @@ drm_gpuvm_range_valid(struct drm_gpuvm *gpuvm,
  * &name is expected to be managed by the surrounding driver structures.
  */
 void
-drm_gpuvm_init(struct drm_gpuvm *gpuvm,
-	       const char *name,
+drm_gpuvm_init(struct drm_gpuvm *gpuvm, const char *name,
+	       enum drm_gpuvm_flags flags,
+	       struct drm_device *drm,
+	       struct drm_gem_object *r_obj,
 	       u64 start_offset, u64 range,
 	       u64 reserve_offset, u64 reserve_range,
 	       const struct drm_gpuvm_ops *ops)
@@ -677,45 +1005,676 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm,
 	gpuvm->rb.tree = RB_ROOT_CACHED;
 	INIT_LIST_HEAD(&gpuvm->rb.list);
 
-	drm_gpuvm_check_overflow(start_offset, range);
+	INIT_LIST_HEAD(&gpuvm->extobj.list);
+	spin_lock_init(&gpuvm->extobj.lock);
+
+	INIT_LIST_HEAD(&gpuvm->evict.list);
+	spin_lock_init(&gpuvm->evict.lock);
+
+	kref_init(&gpuvm->kref);
+
+	gpuvm->name = name ? name : "unknown";
+	gpuvm->flags = flags;
+	gpuvm->ops = ops;
+	gpuvm->drm = drm;
+	gpuvm->r_obj = r_obj;
+
+	drm_gem_object_get(r_obj);
+
+	drm_gpuvm_warn_check_overflow(gpuvm, start_offset, range);
 	gpuvm->mm_start = start_offset;
 	gpuvm->mm_range = range;
 
-	gpuvm->name = name ? name : "unknown";
-	gpuvm->ops = ops;
-
 	memset(&gpuvm->kernel_alloc_node, 0, sizeof(struct drm_gpuva));
-
 	if (reserve_range) {
 		gpuvm->kernel_alloc_node.va.addr = reserve_offset;
 		gpuvm->kernel_alloc_node.va.range = reserve_range;
 
-		if (likely(!drm_gpuvm_check_overflow(reserve_offset,
-						     reserve_range)))
+		if (likely(!drm_gpuvm_warn_check_overflow(gpuvm, reserve_offset,
+							  reserve_range)))
 			__drm_gpuva_insert(gpuvm, &gpuvm->kernel_alloc_node);
 	}
 }
 EXPORT_SYMBOL_GPL(drm_gpuvm_init);
 
-/**
- * drm_gpuvm_destroy() - cleanup a &drm_gpuvm
- * @gpuvm: pointer to the &drm_gpuvm to clean up
- *
- * Note that it is a bug to call this function on a manager that still
- * holds GPU VA mappings.
- */
-void
-drm_gpuvm_destroy(struct drm_gpuvm *gpuvm)
+static void
+drm_gpuvm_fini(struct drm_gpuvm *gpuvm)
 {
 	gpuvm->name = NULL;
 
 	if (gpuvm->kernel_alloc_node.va.range)
 		__drm_gpuva_remove(&gpuvm->kernel_alloc_node);
 
-	WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
-	     "GPUVA tree is not empty, potentially leaking memory.");
+	drm_WARN(gpuvm->drm, !RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
+		 "GPUVA tree is not empty, potentially leaking memory.\n");
+
+	drm_WARN(gpuvm->drm, !list_empty(&gpuvm->extobj.list),
+		 "Extobj list should be empty.\n");
+	drm_WARN(gpuvm->drm, !list_empty(&gpuvm->evict.list),
+		 "Evict list should be empty.\n");
+
+	drm_gem_object_put(gpuvm->r_obj);
 }
-EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
+
+static void
+drm_gpuvm_free(struct kref *kref)
+{
+	struct drm_gpuvm *gpuvm = container_of(kref, struct drm_gpuvm, kref);
+
+	drm_gpuvm_fini(gpuvm);
+
+	if (drm_WARN_ON(gpuvm->drm, !gpuvm->ops->vm_free))
+		return;
+
+	gpuvm->ops->vm_free(gpuvm);
+}
+
+/**
+ * drm_gpuvm_put() - drop a struct drm_gpuvm reference
+ * @gpuvm: the &drm_gpuvm to release the reference of
+ *
+ * This releases a reference to @gpuvm.
+ *
+ * This function may be called from atomic context.
+ */
+void
+drm_gpuvm_put(struct drm_gpuvm *gpuvm)
+{
+	if (gpuvm)
+		kref_put(&gpuvm->kref, drm_gpuvm_free);
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_put);
+
+static int
+__drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
+			    struct drm_exec *exec,
+			    unsigned int num_fences)
+{
+	struct drm_gpuvm_bo *vm_bo;
+	LIST_HEAD(extobjs);
+	int ret = 0;
+
+	for_each_vm_bo_in_list(gpuvm, extobj, &extobjs, vm_bo) {
+		ret = drm_exec_prepare_obj(exec, vm_bo->obj, num_fences);
+		if (ret)
+			break;
+	}
+	/* Drop ref in case we break out of the loop. */
+	drm_gpuvm_bo_put(vm_bo);
+	restore_vm_bo_list(gpuvm, extobj);
+
+	return ret;
+}
+
+static int
+drm_gpuvm_prepare_objects_locked(struct drm_gpuvm *gpuvm,
+				 struct drm_exec *exec,
+				 unsigned int num_fences)
+{
+	struct drm_gpuvm_bo *vm_bo;
+	int ret = 0;
+
+	drm_gpuvm_resv_assert_held(gpuvm);
+	list_for_each_entry(vm_bo, &gpuvm->extobj.list, list.entry.extobj) {
+		ret = drm_exec_prepare_obj(exec, vm_bo->obj, num_fences);
+		if (ret)
+			break;
+
+		if (vm_bo->evicted)
+			drm_gpuvm_bo_list_add(vm_bo, evict, false);
+	}
+
+	return ret;
+}
+
+/**
+ * drm_gpuvm_prepare_objects() - prepare all assoiciated BOs
+ * @gpuvm: the &drm_gpuvm
+ * @exec: the &drm_exec locking context
+ * @num_fences: the amount of &dma_fences to reserve
+ *
+ * Calls drm_exec_prepare_obj() for all &drm_gem_objects the given
+ * &drm_gpuvm contains mappings of.
+ *
+ * Using this function directly, it is the drivers responsibility to call
+ * drm_exec_init() and drm_exec_fini() accordingly.
+ *
+ * Note: This function is safe against concurrent insertion and removal of
+ * external objects, however it is not safe against concurrent usage itself.
+ *
+ * Drivers need to make sure to protect this case with either an outer VM lock
+ * or by calling drm_gpuvm_prepare_vm() before this function within the
+ * drm_exec_until_all_locked() loop, such that the GPUVM's dma-resv lock ensures
+ * mutual exclusion.
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+int
+drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
+			  struct drm_exec *exec,
+			  unsigned int num_fences)
+{
+	if (drm_gpuvm_resv_protected(gpuvm))
+		return drm_gpuvm_prepare_objects_locked(gpuvm, exec,
+							num_fences);
+	else
+		return __drm_gpuvm_prepare_objects(gpuvm, exec, num_fences);
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
+
+/**
+ * drm_gpuvm_prepare_range() - prepare all BOs mapped within a given range
+ * @gpuvm: the &drm_gpuvm
+ * @exec: the &drm_exec locking context
+ * @addr: the start address within the VA space
+ * @range: the range to iterate within the VA space
+ * @num_fences: the amount of &dma_fences to reserve
+ *
+ * Calls drm_exec_prepare_obj() for all &drm_gem_objects mapped between @addr
+ * and @addr + @range.
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+int
+drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct drm_exec *exec,
+			u64 addr, u64 range, unsigned int num_fences)
+{
+	struct drm_gpuva *va;
+	u64 end = addr + range;
+	int ret;
+
+	drm_gpuvm_for_each_va_range(va, gpuvm, addr, end) {
+		struct drm_gem_object *obj = va->gem.obj;
+
+		ret = drm_exec_prepare_obj(exec, obj, num_fences);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
+
+/**
+ * drm_gpuvm_exec_lock() - lock all dma-resv of all assoiciated BOs
+ * @vm_exec: the &drm_gpuvm_exec wrapper
+ *
+ * Acquires all dma-resv locks of all &drm_gem_objects the given
+ * &drm_gpuvm contains mappings of.
+ *
+ * Addionally, when calling this function with struct drm_gpuvm_exec::extra
+ * being set the driver receives the given @fn callback to lock additional
+ * dma-resv in the context of the &drm_gpuvm_exec instance. Typically, drivers
+ * would call drm_exec_prepare_obj() from within this callback.
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+int
+drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec)
+{
+	struct drm_gpuvm *gpuvm = vm_exec->vm;
+	struct drm_exec *exec = &vm_exec->exec;
+	unsigned int num_fences = vm_exec->num_fences;
+	int ret;
+
+	drm_exec_init(exec, vm_exec->flags);
+
+	drm_exec_until_all_locked(exec) {
+		ret = drm_gpuvm_prepare_vm(gpuvm, exec, num_fences);
+		drm_exec_retry_on_contention(exec);
+		if (ret)
+			goto err;
+
+		ret = drm_gpuvm_prepare_objects(gpuvm, exec, num_fences);
+		drm_exec_retry_on_contention(exec);
+		if (ret)
+			goto err;
+
+		if (vm_exec->extra.fn) {
+			ret = vm_exec->extra.fn(vm_exec);
+			drm_exec_retry_on_contention(exec);
+			if (ret)
+				goto err;
+		}
+	}
+
+	return 0;
+
+err:
+	drm_exec_fini(exec);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
+
+static int
+fn_lock_array(struct drm_gpuvm_exec *vm_exec)
+{
+	struct {
+		struct drm_gem_object **objs;
+		unsigned int num_objs;
+	} *args = vm_exec->extra.priv;
+
+	return drm_exec_prepare_array(&vm_exec->exec, args->objs,
+				      args->num_objs, vm_exec->num_fences);
+}
+
+/**
+ * drm_gpuvm_exec_lock_array() - lock all dma-resv of all assoiciated BOs
+ * @vm_exec: the &drm_gpuvm_exec wrapper
+ * @objs: additional &drm_gem_objects to lock
+ * @num_objs: the number of additional &drm_gem_objects to lock
+ *
+ * Acquires all dma-resv locks of all &drm_gem_objects the given &drm_gpuvm
+ * contains mappings of, plus the ones given through @objs.
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+int
+drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
+			  struct drm_gem_object **objs,
+			  unsigned int num_objs)
+{
+	struct {
+		struct drm_gem_object **objs;
+		unsigned int num_objs;
+	} args;
+
+	args.objs = objs;
+	args.num_objs = num_objs;
+
+	vm_exec->extra.fn = fn_lock_array;
+	vm_exec->extra.priv = &args;
+
+	return drm_gpuvm_exec_lock(vm_exec);
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
+
+/**
+ * drm_gpuvm_exec_lock_range() - prepare all BOs mapped within a given range
+ * @vm_exec: the &drm_gpuvm_exec wrapper
+ * @addr: the start address within the VA space
+ * @range: the range to iterate within the VA space
+ *
+ * Acquires all dma-resv locks of all &drm_gem_objects mapped between @addr and
+ * @addr + @range.
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+int
+drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
+			  u64 addr, u64 range)
+{
+	struct drm_gpuvm *gpuvm = vm_exec->vm;
+	struct drm_exec *exec = &vm_exec->exec;
+	int ret;
+
+	drm_exec_init(exec, vm_exec->flags);
+
+	drm_exec_until_all_locked(exec) {
+		ret = drm_gpuvm_prepare_range(gpuvm, exec, addr, range,
+					      vm_exec->num_fences);
+		drm_exec_retry_on_contention(exec);
+		if (ret)
+			goto err;
+	}
+
+	return ret;
+
+err:
+	drm_exec_fini(exec);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
+
+static int
+__drm_gpuvm_validate(struct drm_gpuvm *gpuvm, struct drm_exec *exec)
+{
+	const struct drm_gpuvm_ops *ops = gpuvm->ops;
+	struct drm_gpuvm_bo *vm_bo;
+	LIST_HEAD(evict);
+	int ret = 0;
+
+	for_each_vm_bo_in_list(gpuvm, evict, &evict, vm_bo) {
+		ret = ops->vm_bo_validate(vm_bo, exec);
+		if (ret)
+			break;
+	}
+	/* Drop ref in case we break out of the loop. */
+	drm_gpuvm_bo_put(vm_bo);
+	restore_vm_bo_list(gpuvm, evict);
+
+	return ret;
+}
+
+static int
+drm_gpuvm_validate_locked(struct drm_gpuvm *gpuvm, struct drm_exec *exec)
+{
+	const struct drm_gpuvm_ops *ops = gpuvm->ops;
+	struct drm_gpuvm_bo *vm_bo, *next;
+	int ret = 0;
+
+	drm_gpuvm_resv_assert_held(gpuvm);
+
+	list_for_each_entry_safe(vm_bo, next, &gpuvm->evict.list,
+				 list.entry.evict) {
+		ret = ops->vm_bo_validate(vm_bo, exec);
+		if (ret)
+			break;
+
+		dma_resv_assert_held(vm_bo->obj->resv);
+		if (!vm_bo->evicted)
+			drm_gpuvm_bo_list_del_init(vm_bo, evict, false);
+	}
+
+	return ret;
+}
+
+/**
+ * drm_gpuvm_validate() - validate all BOs marked as evicted
+ * @gpuvm: the &drm_gpuvm to validate evicted BOs
+ * @exec: the &drm_exec instance used for locking the GPUVM
+ *
+ * Calls the &drm_gpuvm_ops::vm_bo_validate callback for all evicted buffer
+ * objects being mapped in the given &drm_gpuvm.
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+int
+drm_gpuvm_validate(struct drm_gpuvm *gpuvm, struct drm_exec *exec)
+{
+	const struct drm_gpuvm_ops *ops = gpuvm->ops;
+
+	if (unlikely(!ops || !ops->vm_bo_validate))
+		return -EOPNOTSUPP;
+
+	if (drm_gpuvm_resv_protected(gpuvm))
+		return drm_gpuvm_validate_locked(gpuvm, exec);
+	else
+		return __drm_gpuvm_validate(gpuvm, exec);
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
+
+/**
+ * drm_gpuvm_resv_add_fence - add fence to private and all extobj
+ * dma-resv
+ * @gpuvm: the &drm_gpuvm to add a fence to
+ * @exec: the &drm_exec locking context
+ * @fence: fence to add
+ * @private_usage: private dma-resv usage
+ * @extobj_usage: extobj dma-resv usage
+ */
+void
+drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
+			 struct drm_exec *exec,
+			 struct dma_fence *fence,
+			 enum dma_resv_usage private_usage,
+			 enum dma_resv_usage extobj_usage)
+{
+	struct drm_gem_object *obj;
+	unsigned long index;
+
+	drm_exec_for_each_locked_object(exec, index, obj) {
+		dma_resv_assert_held(obj->resv);
+		dma_resv_add_fence(obj->resv, fence,
+				   drm_gpuvm_is_extobj(gpuvm, obj) ?
+				   extobj_usage : private_usage);
+	}
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
+
+/**
+ * drm_gpuvm_bo_create() - create a new instance of struct drm_gpuvm_bo
+ * @gpuvm: The &drm_gpuvm the @obj is mapped in.
+ * @obj: The &drm_gem_object being mapped in the @gpuvm.
+ *
+ * If provided by the driver, this function uses the &drm_gpuvm_ops
+ * vm_bo_alloc() callback to allocate.
+ *
+ * Returns: a pointer to the &drm_gpuvm_bo on success, NULL on failure
+ */
+struct drm_gpuvm_bo *
+drm_gpuvm_bo_create(struct drm_gpuvm *gpuvm,
+		    struct drm_gem_object *obj)
+{
+	const struct drm_gpuvm_ops *ops = gpuvm->ops;
+	struct drm_gpuvm_bo *vm_bo;
+
+	if (ops && ops->vm_bo_alloc)
+		vm_bo = ops->vm_bo_alloc();
+	else
+		vm_bo = kzalloc(sizeof(*vm_bo), GFP_KERNEL);
+
+	if (unlikely(!vm_bo))
+		return NULL;
+
+	vm_bo->vm = drm_gpuvm_get(gpuvm);
+	vm_bo->obj = obj;
+	drm_gem_object_get(obj);
+
+	kref_init(&vm_bo->kref);
+	INIT_LIST_HEAD(&vm_bo->list.gpuva);
+	INIT_LIST_HEAD(&vm_bo->list.entry.gem);
+
+	INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
+	INIT_LIST_HEAD(&vm_bo->list.entry.evict);
+
+	return vm_bo;
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_bo_create);
+
+static void
+drm_gpuvm_bo_destroy(struct kref *kref)
+{
+	struct drm_gpuvm_bo *vm_bo = container_of(kref, struct drm_gpuvm_bo,
+						  kref);
+	struct drm_gpuvm *gpuvm = vm_bo->vm;
+	const struct drm_gpuvm_ops *ops = gpuvm->ops;
+	struct drm_gem_object *obj = vm_bo->obj;
+	bool lock = !drm_gpuvm_resv_protected(gpuvm);
+
+	if (!lock)
+		drm_gpuvm_resv_assert_held(gpuvm);
+
+	drm_gpuvm_bo_list_del(vm_bo, extobj, lock);
+	drm_gpuvm_bo_list_del(vm_bo, evict, lock);
+
+	drm_gem_gpuva_assert_lock_held(obj);
+	list_del(&vm_bo->list.entry.gem);
+
+	if (ops && ops->vm_bo_free)
+		ops->vm_bo_free(vm_bo);
+	else
+		kfree(vm_bo);
+
+	drm_gpuvm_put(gpuvm);
+	drm_gem_object_put(obj);
+}
+
+/**
+ * drm_gpuvm_bo_put() - drop a struct drm_gpuvm_bo reference
+ * @vm_bo: the &drm_gpuvm_bo to release the reference of
+ *
+ * This releases a reference to @vm_bo.
+ *
+ * If the reference count drops to zero, the &gpuvm_bo is destroyed, which
+ * includes removing it from the GEMs gpuva list. Hence, if a call to this
+ * function can potentially let the reference count drop to zero the caller must
+ * hold the dma-resv or driver specific GEM gpuva lock.
+ *
+ * This function may only be called from non-atomic context.
+ */
+void
+drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
+{
+	might_sleep();
+
+	if (vm_bo)
+		kref_put(&vm_bo->kref, drm_gpuvm_bo_destroy);
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
+
+static struct drm_gpuvm_bo *
+__drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
+		    struct drm_gem_object *obj)
+{
+	struct drm_gpuvm_bo *vm_bo;
+
+	drm_gem_gpuva_assert_lock_held(obj);
+	drm_gem_for_each_gpuvm_bo(vm_bo, obj)
+		if (vm_bo->vm == gpuvm)
+			return vm_bo;
+
+	return NULL;
+}
+
+/**
+ * drm_gpuvm_bo_find() - find the &drm_gpuvm_bo for the given
+ * &drm_gpuvm and &drm_gem_object
+ * @gpuvm: The &drm_gpuvm the @obj is mapped in.
+ * @obj: The &drm_gem_object being mapped in the @gpuvm.
+ *
+ * Find the &drm_gpuvm_bo representing the combination of the given
+ * &drm_gpuvm and &drm_gem_object. If found, increases the reference
+ * count of the &drm_gpuvm_bo accordingly.
+ *
+ * Returns: a pointer to the &drm_gpuvm_bo on success, NULL on failure
+ */
+struct drm_gpuvm_bo *
+drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
+		  struct drm_gem_object *obj)
+{
+	struct drm_gpuvm_bo *vm_bo = __drm_gpuvm_bo_find(gpuvm, obj);
+
+	return vm_bo ? drm_gpuvm_bo_get(vm_bo) : NULL;
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_bo_find);
+
+/**
+ * drm_gpuvm_bo_obtain() - obtains and instance of the &drm_gpuvm_bo for the
+ * given &drm_gpuvm and &drm_gem_object
+ * @gpuvm: The &drm_gpuvm the @obj is mapped in.
+ * @obj: The &drm_gem_object being mapped in the @gpuvm.
+ *
+ * Find the &drm_gpuvm_bo representing the combination of the given
+ * &drm_gpuvm and &drm_gem_object. If found, increases the reference
+ * count of the &drm_gpuvm_bo accordingly. If not found, allocates a new
+ * &drm_gpuvm_bo.
+ *
+ * A new &drm_gpuvm_bo is added to the GEMs gpuva list.
+ *
+ * Returns: a pointer to the &drm_gpuvm_bo on success, an ERR_PTR on failure
+ */
+struct drm_gpuvm_bo *
+drm_gpuvm_bo_obtain(struct drm_gpuvm *gpuvm,
+		    struct drm_gem_object *obj)
+{
+	struct drm_gpuvm_bo *vm_bo;
+
+	vm_bo = drm_gpuvm_bo_find(gpuvm, obj);
+	if (vm_bo)
+		return vm_bo;
+
+	vm_bo = drm_gpuvm_bo_create(gpuvm, obj);
+	if (!vm_bo)
+		return ERR_PTR(-ENOMEM);
+
+	drm_gem_gpuva_assert_lock_held(obj);
+	list_add_tail(&vm_bo->list.entry.gem, &obj->gpuva.list);
+
+	return vm_bo;
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain);
+
+/**
+ * drm_gpuvm_bo_obtain_prealloc() - obtains and instance of the &drm_gpuvm_bo
+ * for the given &drm_gpuvm and &drm_gem_object
+ * @__vm_bo: A pre-allocated struct drm_gpuvm_bo.
+ *
+ * Find the &drm_gpuvm_bo representing the combination of the given
+ * &drm_gpuvm and &drm_gem_object. If found, increases the reference
+ * count of the found &drm_gpuvm_bo accordingly, while the @__vm_bo reference
+ * count is decreased. If not found @__vm_bo is returned without further
+ * increase of the reference count.
+ *
+ * A new &drm_gpuvm_bo is added to the GEMs gpuva list.
+ *
+ * Returns: a pointer to the found &drm_gpuvm_bo or @__vm_bo if no existing
+ * &drm_gpuvm_bo was found
+ */
+struct drm_gpuvm_bo *
+drm_gpuvm_bo_obtain_prealloc(struct drm_gpuvm_bo *__vm_bo)
+{
+	struct drm_gpuvm *gpuvm = __vm_bo->vm;
+	struct drm_gem_object *obj = __vm_bo->obj;
+	struct drm_gpuvm_bo *vm_bo;
+
+	vm_bo = drm_gpuvm_bo_find(gpuvm, obj);
+	if (vm_bo) {
+		drm_gpuvm_bo_put(__vm_bo);
+		return vm_bo;
+	}
+
+	drm_gem_gpuva_assert_lock_held(obj);
+	list_add_tail(&__vm_bo->list.entry.gem, &obj->gpuva.list);
+
+	return __vm_bo;
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
+
+/**
+ * drm_gpuvm_bo_extobj_add() - adds the &drm_gpuvm_bo to its &drm_gpuvm's
+ * extobj list
+ * @vm_bo: The &drm_gpuvm_bo to add to its &drm_gpuvm's the extobj list.
+ *
+ * Adds the given @vm_bo to its &drm_gpuvm's extobj list if not on the list
+ * already and if the corresponding &drm_gem_object is an external object,
+ * actually.
+ */
+void
+drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
+{
+	struct drm_gpuvm *gpuvm = vm_bo->vm;
+	bool lock = !drm_gpuvm_resv_protected(gpuvm);
+
+	if (!lock)
+		drm_gpuvm_resv_assert_held(gpuvm);
+
+	if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
+		drm_gpuvm_bo_list_add(vm_bo, extobj, lock);
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
+
+/**
+ * drm_gpuvm_bo_evict() - add / remove a &drm_gpuvm_bo to / from the &drm_gpuvms
+ * evicted list
+ * @vm_bo: the &drm_gpuvm_bo to add or remove
+ * @evict: indicates whether the object is evicted
+ *
+ * Adds a &drm_gpuvm_bo to or removes it from the &drm_gpuvms evicted list.
+ */
+void
+drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict)
+{
+	struct drm_gpuvm *gpuvm = vm_bo->vm;
+	struct drm_gem_object *obj = vm_bo->obj;
+	bool lock = !drm_gpuvm_resv_protected(gpuvm);
+
+	dma_resv_assert_held(obj->resv);
+	vm_bo->evicted = evict;
+
+	/* Can't add external objects to the evicted list directly if not using
+	 * internal spinlocks, since in this case the evicted list is protected
+	 * with the VM's common dma-resv lock.
+	 */
+	if (drm_gpuvm_is_extobj(gpuvm, obj) && !lock)
+		return;
+
+	if (evict)
+		drm_gpuvm_bo_list_add(vm_bo, evict, lock);
+	else
+		drm_gpuvm_bo_list_del_init(vm_bo, evict, lock);
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
 
 static int
 __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
@@ -764,11 +1723,21 @@ drm_gpuva_insert(struct drm_gpuvm *gpuvm,
 {
 	u64 addr = va->va.addr;
 	u64 range = va->va.range;
+	int ret;
 
 	if (unlikely(!drm_gpuvm_range_valid(gpuvm, addr, range)))
 		return -EINVAL;
 
-	return __drm_gpuva_insert(gpuvm, va);
+	ret = __drm_gpuva_insert(gpuvm, va);
+	if (likely(!ret))
+		/* Take a reference of the GPUVM for the successfully inserted
+		 * drm_gpuva. We can't take the reference in
+		 * __drm_gpuva_insert() itself, since we don't want to increse
+		 * the reference count for the GPUVM's kernel_alloc_node.
+		 */
+		drm_gpuvm_get(gpuvm);
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(drm_gpuva_insert);
 
@@ -795,35 +1764,46 @@ drm_gpuva_remove(struct drm_gpuva *va)
 	struct drm_gpuvm *gpuvm = va->vm;
 
 	if (unlikely(va == &gpuvm->kernel_alloc_node)) {
-		WARN(1, "Can't destroy kernel reserved node.\n");
+		drm_WARN(gpuvm->drm, 1,
+			 "Can't destroy kernel reserved node.\n");
 		return;
 	}
 
 	__drm_gpuva_remove(va);
+	drm_gpuvm_put(va->vm);
 }
 EXPORT_SYMBOL_GPL(drm_gpuva_remove);
 
 /**
  * drm_gpuva_link() - link a &drm_gpuva
  * @va: the &drm_gpuva to link
+ * @vm_bo: the &drm_gpuvm_bo to add the &drm_gpuva to
  *
- * This adds the given &va to the GPU VA list of the &drm_gem_object it is
- * associated with.
+ * This adds the given &va to the GPU VA list of the &drm_gpuvm_bo and the
+ * &drm_gpuvm_bo to the &drm_gem_object it is associated with.
+ *
+ * For every &drm_gpuva entry added to the &drm_gpuvm_bo an additional
+ * reference of the latter is taken.
  *
  * This function expects the caller to protect the GEM's GPUVA list against
- * concurrent access using the GEMs dma_resv lock.
+ * concurrent access using either the GEMs dma_resv lock or a driver specific
+ * lock set through drm_gem_gpuva_set_lock().
  */
 void
-drm_gpuva_link(struct drm_gpuva *va)
+drm_gpuva_link(struct drm_gpuva *va, struct drm_gpuvm_bo *vm_bo)
 {
 	struct drm_gem_object *obj = va->gem.obj;
+	struct drm_gpuvm *gpuvm = va->vm;
 
 	if (unlikely(!obj))
 		return;
 
-	drm_gem_gpuva_assert_lock_held(obj);
+	drm_WARN_ON(gpuvm->drm, obj != vm_bo->obj);
 
-	list_add_tail(&va->gem.entry, &obj->gpuva.list);
+	va->vm_bo = drm_gpuvm_bo_get(vm_bo);
+
+	drm_gem_gpuva_assert_lock_held(obj);
+	list_add_tail(&va->gem.entry, &vm_bo->list.gpuva);
 }
 EXPORT_SYMBOL_GPL(drm_gpuva_link);
 
@@ -834,20 +1814,31 @@ EXPORT_SYMBOL_GPL(drm_gpuva_link);
  * This removes the given &va from the GPU VA list of the &drm_gem_object it is
  * associated with.
  *
+ * This removes the given &va from the GPU VA list of the &drm_gpuvm_bo and
+ * the &drm_gpuvm_bo from the &drm_gem_object it is associated with in case
+ * this call unlinks the last &drm_gpuva from the &drm_gpuvm_bo.
+ *
+ * For every &drm_gpuva entry removed from the &drm_gpuvm_bo a reference of
+ * the latter is dropped.
+ *
  * This function expects the caller to protect the GEM's GPUVA list against
- * concurrent access using the GEMs dma_resv lock.
+ * concurrent access using either the GEMs dma_resv lock or a driver specific
+ * lock set through drm_gem_gpuva_set_lock().
  */
 void
 drm_gpuva_unlink(struct drm_gpuva *va)
 {
 	struct drm_gem_object *obj = va->gem.obj;
+	struct drm_gpuvm_bo *vm_bo = va->vm_bo;
 
 	if (unlikely(!obj))
 		return;
 
 	drm_gem_gpuva_assert_lock_held(obj);
-
 	list_del_init(&va->gem.entry);
+
+	va->vm_bo = NULL;
+	drm_gpuvm_bo_put(vm_bo);
 }
 EXPORT_SYMBOL_GPL(drm_gpuva_unlink);
 
@@ -992,10 +1983,10 @@ drm_gpuva_remap(struct drm_gpuva *prev,
 		struct drm_gpuva *next,
 		struct drm_gpuva_op_remap *op)
 {
-	struct drm_gpuva *curr = op->unmap->va;
-	struct drm_gpuvm *gpuvm = curr->vm;
+	struct drm_gpuva *va = op->unmap->va;
+	struct drm_gpuvm *gpuvm = va->vm;
 
-	drm_gpuva_remove(curr);
+	drm_gpuva_remove(va);
 
 	if (op->prev) {
 		drm_gpuva_init_from_op(prev, op->prev);
@@ -1637,9 +2628,8 @@ err_free_ops:
 EXPORT_SYMBOL_GPL(drm_gpuvm_prefetch_ops_create);
 
 /**
- * drm_gpuvm_gem_unmap_ops_create() - creates the &drm_gpuva_ops to unmap a GEM
- * @gpuvm: the &drm_gpuvm representing the GPU VA space
- * @obj: the &drm_gem_object to unmap
+ * drm_gpuvm_bo_unmap_ops_create() - creates the &drm_gpuva_ops to unmap a GEM
+ * @vm_bo: the &drm_gpuvm_bo abstraction
  *
  * This function creates a list of operations to perform unmapping for every
  * GPUVA attached to a GEM.
@@ -1656,15 +2646,14 @@ EXPORT_SYMBOL_GPL(drm_gpuvm_prefetch_ops_create);
  * Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR on failure
  */
 struct drm_gpuva_ops *
-drm_gpuvm_gem_unmap_ops_create(struct drm_gpuvm *gpuvm,
-			       struct drm_gem_object *obj)
+drm_gpuvm_bo_unmap_ops_create(struct drm_gpuvm_bo *vm_bo)
 {
 	struct drm_gpuva_ops *ops;
 	struct drm_gpuva_op *op;
 	struct drm_gpuva *va;
 	int ret;
 
-	drm_gem_gpuva_assert_lock_held(obj);
+	drm_gem_gpuva_assert_lock_held(vm_bo->obj);
 
 	ops = kzalloc(sizeof(*ops), GFP_KERNEL);
 	if (!ops)
@@ -1672,8 +2661,8 @@ drm_gpuvm_gem_unmap_ops_create(struct drm_gpuvm *gpuvm,
 
 	INIT_LIST_HEAD(&ops->list);
 
-	drm_gem_for_each_gpuva(va, obj) {
-		op = gpuva_op_alloc(gpuvm);
+	drm_gpuvm_bo_for_each_va(va, vm_bo) {
+		op = gpuva_op_alloc(vm_bo->vm);
 		if (!op) {
 			ret = -ENOMEM;
 			goto err_free_ops;
@@ -1687,10 +2676,10 @@ drm_gpuvm_gem_unmap_ops_create(struct drm_gpuvm *gpuvm,
 	return ops;
 
 err_free_ops:
-	drm_gpuva_ops_free(gpuvm, ops);
+	drm_gpuva_ops_free(vm_bo->vm, ops);
 	return ERR_PTR(ret);
 }
-EXPORT_SYMBOL_GPL(drm_gpuvm_gem_unmap_ops_create);
+EXPORT_SYMBOL_GPL(drm_gpuvm_bo_unmap_ops_create);
 
 /**
  * drm_gpuva_ops_free() - free the given &drm_gpuva_ops
diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 8462b657c375..b12c463bc460 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -22,6 +22,7 @@
  */
 
 #include <linux/kthread.h>
+#include <linux/types.h>
 
 #include <drm/drm_ioctl.h>
 #include <drm/drm_vblank.h>
@@ -31,6 +32,7 @@
 
 #define DRM_IF_VERSION(maj, min) (maj << 16 | min)
 
+struct cea_sad;
 struct dentry;
 struct dma_buf;
 struct iosys_map;
@@ -267,3 +269,7 @@ int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
 void drm_framebuffer_print_info(struct drm_printer *p, unsigned int indent,
 				const struct drm_framebuffer *fb);
 void drm_framebuffer_debugfs_init(struct drm_device *dev);
+
+/* drm_edid.c */
+void drm_edid_cta_sad_get(const struct cea_sad *cta_sad, u8 *sad);
+void drm_edid_cta_sad_set(struct cea_sad *cta_sad, const u8 *sad);
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 77590b0f38fa..44fda68c28ae 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -675,6 +675,7 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
 	DRM_IOCTL_DEF(DRM_IOCTL_MODE_ADDFB, drm_mode_addfb_ioctl, 0),
 	DRM_IOCTL_DEF(DRM_IOCTL_MODE_ADDFB2, drm_mode_addfb2_ioctl, 0),
 	DRM_IOCTL_DEF(DRM_IOCTL_MODE_RMFB, drm_mode_rmfb_ioctl, 0),
+	DRM_IOCTL_DEF(DRM_IOCTL_MODE_CLOSEFB, drm_mode_closefb_ioctl, 0),
 	DRM_IOCTL_DEF(DRM_IOCTL_MODE_PAGE_FLIP, drm_mode_page_flip_ioctl, DRM_MASTER),
 	DRM_IOCTL_DEF(DRM_IOCTL_MODE_DIRTYFB, drm_mode_dirtyfb_ioctl, DRM_MASTER),
 	DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_DUMB, drm_mode_create_dumb_ioctl, 0),
diff --git a/drivers/gpu/drm/drm_mipi_dbi.c b/drivers/gpu/drm/drm_mipi_dbi.c
index e90f0bf895b3..daac649aabdb 100644
--- a/drivers/gpu/drm/drm_mipi_dbi.c
+++ b/drivers/gpu/drm/drm_mipi_dbi.c
@@ -197,12 +197,14 @@ EXPORT_SYMBOL(mipi_dbi_command_stackbuf);
  * @fb: The source framebuffer
  * @clip: Clipping rectangle of the area to be copied
  * @swap: When true, swap MSB/LSB of 16-bit values
+ * @fmtcnv_state: Format-conversion state
  *
  * Returns:
  * Zero on success, negative error code on failure.
  */
 int mipi_dbi_buf_copy(void *dst, struct iosys_map *src, struct drm_framebuffer *fb,
-		      struct drm_rect *clip, bool swap)
+		      struct drm_rect *clip, bool swap,
+		      struct drm_format_conv_state *fmtcnv_state)
 {
 	struct drm_gem_object *gem = drm_gem_fb_get_obj(fb, 0);
 	struct iosys_map dst_map = IOSYS_MAP_INIT_VADDR(dst);
@@ -215,12 +217,13 @@ int mipi_dbi_buf_copy(void *dst, struct iosys_map *src, struct drm_framebuffer *
 	switch (fb->format->format) {
 	case DRM_FORMAT_RGB565:
 		if (swap)
-			drm_fb_swab(&dst_map, NULL, src, fb, clip, !gem->import_attach);
+			drm_fb_swab(&dst_map, NULL, src, fb, clip, !gem->import_attach,
+				    fmtcnv_state);
 		else
 			drm_fb_memcpy(&dst_map, NULL, src, fb, clip);
 		break;
 	case DRM_FORMAT_XRGB8888:
-		drm_fb_xrgb8888_to_rgb565(&dst_map, NULL, src, fb, clip, swap);
+		drm_fb_xrgb8888_to_rgb565(&dst_map, NULL, src, fb, clip, fmtcnv_state, swap);
 		break;
 	default:
 		drm_err_once(fb->dev, "Format is not supported: %p4cc\n",
@@ -252,7 +255,7 @@ static void mipi_dbi_set_window_address(struct mipi_dbi_dev *dbidev,
 }
 
 static void mipi_dbi_fb_dirty(struct iosys_map *src, struct drm_framebuffer *fb,
-			      struct drm_rect *rect)
+			      struct drm_rect *rect, struct drm_format_conv_state *fmtcnv_state)
 {
 	struct mipi_dbi_dev *dbidev = drm_to_mipi_dbi_dev(fb->dev);
 	unsigned int height = rect->y2 - rect->y1;
@@ -270,7 +273,7 @@ static void mipi_dbi_fb_dirty(struct iosys_map *src, struct drm_framebuffer *fb,
 	if (!dbi->dc || !full || swap ||
 	    fb->format->format == DRM_FORMAT_XRGB8888) {
 		tr = dbidev->tx_buf;
-		ret = mipi_dbi_buf_copy(tr, src, fb, rect, swap);
+		ret = mipi_dbi_buf_copy(tr, src, fb, rect, swap, fmtcnv_state);
 		if (ret)
 			goto err_msg;
 	} else {
@@ -332,7 +335,8 @@ void mipi_dbi_pipe_update(struct drm_simple_display_pipe *pipe,
 		return;
 
 	if (drm_atomic_helper_damage_merged(old_state, state, &rect))
-		mipi_dbi_fb_dirty(&shadow_plane_state->data[0], fb, &rect);
+		mipi_dbi_fb_dirty(&shadow_plane_state->data[0], fb, &rect,
+				  &shadow_plane_state->fmtcnv_state);
 
 	drm_dev_exit(idx);
 }
@@ -368,7 +372,8 @@ void mipi_dbi_enable_flush(struct mipi_dbi_dev *dbidev,
 	if (!drm_dev_enter(&dbidev->drm, &idx))
 		return;
 
-	mipi_dbi_fb_dirty(&shadow_plane_state->data[0], fb, &rect);
+	mipi_dbi_fb_dirty(&shadow_plane_state->data[0], fb, &rect,
+			  &shadow_plane_state->fmtcnv_state);
 	backlight_enable(dbidev->backlight);
 
 	drm_dev_exit(idx);
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
index 2416c526f9b0..3d0f8d182506 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
@@ -535,7 +535,7 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
 
 	ret = drm_sched_job_init(&submit->sched_job,
 				 &ctx->sched_entity[args->pipe],
-				 submit->ctx);
+				 1, submit->ctx);
 	if (ret)
 		goto err_submit_put;
 
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
index 9276756e1397..5105d290e72e 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -1917,7 +1917,7 @@ static int etnaviv_gpu_rpm_suspend(struct device *dev)
 	u32 idle, mask;
 
 	/* If there are any jobs in the HW queue, we're not idle */
-	if (atomic_read(&gpu->sched.hw_rq_count))
+	if (atomic_read(&gpu->sched.credit_count))
 		return -EBUSY;
 
 	/* Check whether the hardware (except FE and MC) is idle */
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
index 9b79f218e21a..c4b04b0dee16 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
@@ -134,7 +134,7 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
 {
 	int ret;
 
-	ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
+	ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, NULL,
 			     DRM_SCHED_PRIORITY_COUNT,
 			     etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
 			     msecs_to_jiffies(500), NULL, NULL,
diff --git a/drivers/gpu/drm/gud/gud_pipe.c b/drivers/gpu/drm/gud/gud_pipe.c
index a02f75be81f0..e163649816d5 100644
--- a/drivers/gpu/drm/gud/gud_pipe.c
+++ b/drivers/gpu/drm/gud/gud_pipe.c
@@ -51,7 +51,8 @@ static bool gud_is_big_endian(void)
 
 static size_t gud_xrgb8888_to_r124(u8 *dst, const struct drm_format_info *format,
 				   void *src, struct drm_framebuffer *fb,
-				   struct drm_rect *rect)
+				   struct drm_rect *rect,
+				   struct drm_format_conv_state *fmtcnv_state)
 {
 	unsigned int block_width = drm_format_info_block_width(format, 0);
 	unsigned int bits_per_pixel = 8 / block_width;
@@ -75,7 +76,7 @@ static size_t gud_xrgb8888_to_r124(u8 *dst, const struct drm_format_info *format
 
 	iosys_map_set_vaddr(&dst_map, buf);
 	iosys_map_set_vaddr(&vmap, src);
-	drm_fb_xrgb8888_to_gray8(&dst_map, NULL, &vmap, fb, rect);
+	drm_fb_xrgb8888_to_gray8(&dst_map, NULL, &vmap, fb, rect, fmtcnv_state);
 	pix8 = buf;
 
 	for (y = 0; y < height; y++) {
@@ -152,7 +153,8 @@ static size_t gud_xrgb8888_to_color(u8 *dst, const struct drm_format_info *forma
 static int gud_prep_flush(struct gud_device *gdrm, struct drm_framebuffer *fb,
 			  const struct iosys_map *src, bool cached_reads,
 			  const struct drm_format_info *format, struct drm_rect *rect,
-			  struct gud_set_buffer_req *req)
+			  struct gud_set_buffer_req *req,
+			  struct drm_format_conv_state *fmtcnv_state)
 {
 	u8 compression = gdrm->compression;
 	struct iosys_map dst;
@@ -178,23 +180,23 @@ retry:
 	 */
 	if (format != fb->format) {
 		if (format->format == GUD_DRM_FORMAT_R1) {
-			len = gud_xrgb8888_to_r124(buf, format, vaddr, fb, rect);
+			len = gud_xrgb8888_to_r124(buf, format, vaddr, fb, rect, fmtcnv_state);
 			if (!len)
 				return -ENOMEM;
 		} else if (format->format == DRM_FORMAT_R8) {
-			drm_fb_xrgb8888_to_gray8(&dst, NULL, src, fb, rect);
+			drm_fb_xrgb8888_to_gray8(&dst, NULL, src, fb, rect, fmtcnv_state);
 		} else if (format->format == DRM_FORMAT_RGB332) {
-			drm_fb_xrgb8888_to_rgb332(&dst, NULL, src, fb, rect);
+			drm_fb_xrgb8888_to_rgb332(&dst, NULL, src, fb, rect, fmtcnv_state);
 		} else if (format->format == DRM_FORMAT_RGB565) {
-			drm_fb_xrgb8888_to_rgb565(&dst, NULL, src, fb, rect,
+			drm_fb_xrgb8888_to_rgb565(&dst, NULL, src, fb, rect, fmtcnv_state,
 						  gud_is_big_endian());
 		} else if (format->format == DRM_FORMAT_RGB888) {
-			drm_fb_xrgb8888_to_rgb888(&dst, NULL, src, fb, rect);
+			drm_fb_xrgb8888_to_rgb888(&dst, NULL, src, fb, rect, fmtcnv_state);
 		} else {
 			len = gud_xrgb8888_to_color(buf, format, vaddr, fb, rect);
 		}
 	} else if (gud_is_big_endian() && format->cpp[0] > 1) {
-		drm_fb_swab(&dst, NULL, src, fb, rect, cached_reads);
+		drm_fb_swab(&dst, NULL, src, fb, rect, cached_reads, fmtcnv_state);
 	} else if (compression && cached_reads && pitch == fb->pitches[0]) {
 		/* can compress directly from the framebuffer */
 		buf = vaddr + rect->y1 * pitch;
@@ -266,7 +268,8 @@ static int gud_usb_bulk(struct gud_device *gdrm, size_t len)
 
 static int gud_flush_rect(struct gud_device *gdrm, struct drm_framebuffer *fb,
 			  const struct iosys_map *src, bool cached_reads,
-			  const struct drm_format_info *format, struct drm_rect *rect)
+			  const struct drm_format_info *format, struct drm_rect *rect,
+			  struct drm_format_conv_state *fmtcnv_state)
 {
 	struct gud_set_buffer_req req;
 	size_t len, trlen;
@@ -274,7 +277,7 @@ static int gud_flush_rect(struct gud_device *gdrm, struct drm_framebuffer *fb,
 
 	drm_dbg(&gdrm->drm, "Flushing [FB:%d] " DRM_RECT_FMT "\n", fb->base.id, DRM_RECT_ARG(rect));
 
-	ret = gud_prep_flush(gdrm, fb, src, cached_reads, format, rect, &req);
+	ret = gud_prep_flush(gdrm, fb, src, cached_reads, format, rect, &req, fmtcnv_state);
 	if (ret)
 		return ret;
 
@@ -318,6 +321,7 @@ static void gud_flush_damage(struct gud_device *gdrm, struct drm_framebuffer *fb
 			     const struct iosys_map *src, bool cached_reads,
 			     struct drm_rect *damage)
 {
+	struct drm_format_conv_state fmtcnv_state = DRM_FORMAT_CONV_STATE_INIT;
 	const struct drm_format_info *format;
 	unsigned int i, lines;
 	size_t pitch;
@@ -340,7 +344,7 @@ static void gud_flush_damage(struct gud_device *gdrm, struct drm_framebuffer *fb
 		rect.y1 += i * lines;
 		rect.y2 = min_t(u32, rect.y1 + lines, damage->y2);
 
-		ret = gud_flush_rect(gdrm, fb, src, cached_reads, format, &rect);
+		ret = gud_flush_rect(gdrm, fb, src, cached_reads, format, &rect, &fmtcnv_state);
 		if (ret) {
 			if (ret != -ENODEV && ret != -ECONNRESET &&
 			    ret != -ESHUTDOWN && ret != -EPROTO)
@@ -350,6 +354,8 @@ static void gud_flush_damage(struct gud_device *gdrm, struct drm_framebuffer *fb
 			break;
 		}
 	}
+
+	drm_format_conv_state_release(&fmtcnv_state);
 }
 
 void gud_flush_work(struct work_struct *work)
diff --git a/drivers/gpu/drm/i915/display/intel_audio.c b/drivers/gpu/drm/i915/display/intel_audio.c
index 19605264a35c..39f5b698e08a 100644
--- a/drivers/gpu/drm/i915/display/intel_audio.c
+++ b/drivers/gpu/drm/i915/display/intel_audio.c
@@ -25,6 +25,7 @@
 #include <linux/kernel.h>
 
 #include <drm/drm_edid.h>
+#include <drm/drm_eld.h>
 #include <drm/i915_component.h>
 
 #include "i915_drv.h"
diff --git a/drivers/gpu/drm/i915/display/intel_crtc_state_dump.c b/drivers/gpu/drm/i915/display/intel_crtc_state_dump.c
index 66fe880af8f3..2d15e82c0b3d 100644
--- a/drivers/gpu/drm/i915/display/intel_crtc_state_dump.c
+++ b/drivers/gpu/drm/i915/display/intel_crtc_state_dump.c
@@ -4,6 +4,7 @@
  */
 
 #include <drm/drm_edid.h>
+#include <drm/drm_eld.h>
 
 #include "i915_drv.h"
 #include "intel_crtc_state_dump.h"
diff --git a/drivers/gpu/drm/i915/display/intel_sdvo.c b/drivers/gpu/drm/i915/display/intel_sdvo.c
index a636f42ceae5..3eac559043d7 100644
--- a/drivers/gpu/drm/i915/display/intel_sdvo.c
+++ b/drivers/gpu/drm/i915/display/intel_sdvo.c
@@ -35,6 +35,7 @@
 #include <drm/drm_atomic_helper.h>
 #include <drm/drm_crtc.h>
 #include <drm/drm_edid.h>
+#include <drm/drm_eld.h>
 
 #include "i915_drv.h"
 #include "i915_reg.h"
diff --git a/drivers/gpu/drm/lima/lima_device.c b/drivers/gpu/drm/lima/lima_device.c
index 02cef0cea657..0bf7105c8748 100644
--- a/drivers/gpu/drm/lima/lima_device.c
+++ b/drivers/gpu/drm/lima/lima_device.c
@@ -514,7 +514,7 @@ int lima_device_suspend(struct device *dev)
 
 	/* check any task running */
 	for (i = 0; i < lima_pipe_num; i++) {
-		if (atomic_read(&ldev->pipe[i].base.hw_rq_count))
+		if (atomic_read(&ldev->pipe[i].base.credit_count))
 			return -EBUSY;
 	}
 
diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
index 295f0353a02e..c3bf8cda8498 100644
--- a/drivers/gpu/drm/lima/lima_sched.c
+++ b/drivers/gpu/drm/lima/lima_sched.c
@@ -123,7 +123,7 @@ int lima_sched_task_init(struct lima_sched_task *task,
 	for (i = 0; i < num_bos; i++)
 		drm_gem_object_get(&bos[i]->base.base);
 
-	err = drm_sched_job_init(&task->base, &context->base, vm);
+	err = drm_sched_job_init(&task->base, &context->base, 1, vm);
 	if (err) {
 		kfree(task->bos);
 		return err;
@@ -488,7 +488,7 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, const char *name)
 
 	INIT_WORK(&pipe->recover_work, lima_sched_recover_work);
 
-	return drm_sched_init(&pipe->base, &lima_sched_ops,
+	return drm_sched_init(&pipe->base, &lima_sched_ops, NULL,
 			      DRM_SCHED_PRIORITY_COUNT,
 			      1,
 			      lima_job_hang_limit,
diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
index 41b13dec9bef..f62ab5257e66 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -841,7 +841,8 @@ static void suspend_scheduler(struct msm_gpu *gpu)
 	 */
 	for (i = 0; i < gpu->nr_rings; i++) {
 		struct drm_gpu_scheduler *sched = &gpu->rb[i]->sched;
-		kthread_park(sched->thread);
+
+		drm_sched_wqueue_stop(sched);
 	}
 }
 
@@ -851,7 +852,8 @@ static void resume_scheduler(struct msm_gpu *gpu)
 
 	for (i = 0; i < gpu->nr_rings; i++) {
 		struct drm_gpu_scheduler *sched = &gpu->rb[i]->sched;
-		kthread_unpark(sched->thread);
+
+		drm_sched_wqueue_start(sched);
 	}
 }
 
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index 99744de6c05a..c002cabe7b9c 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -48,7 +48,7 @@ static struct msm_gem_submit *submit_create(struct drm_device *dev,
 		return ERR_PTR(ret);
 	}
 
-	ret = drm_sched_job_init(&submit->base, queue->entity, queue);
+	ret = drm_sched_job_init(&submit->base, queue->entity, 1, queue);
 	if (ret) {
 		kfree(submit->hw_fence);
 		kfree(submit);
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
index 95257ab0185d..4968568e3b54 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.c
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
@@ -94,7 +94,7 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id,
 	 /* currently managing hangcheck ourselves: */
 	sched_timeout = MAX_SCHEDULE_TIMEOUT;
 
-	ret = drm_sched_init(&ring->sched, &msm_sched_ops,
+	ret = drm_sched_init(&ring->sched, &msm_sched_ops, NULL,
 			     DRM_SCHED_PRIORITY_COUNT,
 			     num_hw_submissions, 0, sched_timeout,
 			     NULL, NULL, to_msm_bo(ring->bo)->name, gpu->dev->dev);
diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index 7840b6428afb..df8da9cab515 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -38,6 +38,7 @@
 #include <drm/drm_atomic.h>
 #include <drm/drm_atomic_helper.h>
 #include <drm/drm_edid.h>
+#include <drm/drm_eld.h>
 #include <drm/drm_fb_helper.h>
 #include <drm/drm_probe_helper.h>
 #include <drm/drm_vblank.h>
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 0f3bd187ede6..7afad86da64b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -148,10 +148,17 @@ nouveau_bo_del_ttm(struct ttm_buffer_object *bo)
 	 * If nouveau_bo_new() allocated this buffer, the GEM object was never
 	 * initialized, so don't attempt to release it.
 	 */
-	if (bo->base.dev)
+	if (bo->base.dev) {
+		/* Gem objects not being shared with other VMs get their
+		 * dma_resv from a root GEM object.
+		 */
+		if (nvbo->no_share)
+			drm_gem_object_put(nvbo->r_obj);
+
 		drm_gem_object_release(&bo->base);
-	else
+	} else {
 		dma_resv_fini(&bo->base._resv);
+	}
 
 	kfree(nvbo);
 }
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.h b/drivers/gpu/drm/nouveau/nouveau_bo.h
index 07f671cf895e..70c551921a9e 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.h
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.h
@@ -26,6 +26,11 @@ struct nouveau_bo {
 	struct list_head entry;
 	int pbbo_index;
 	bool validate_mapped;
+
+	/* Root GEM object we derive the dma_resv of in case this BO is not
+	 * shared between VMs.
+	 */
+	struct drm_gem_object *r_obj;
 	bool no_share;
 
 	/* GPU address space is independent of CPU word size */
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 50589f982d1a..f603eaef1560 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -190,6 +190,8 @@ nouveau_cli_work_queue(struct nouveau_cli *cli, struct dma_fence *fence,
 static void
 nouveau_cli_fini(struct nouveau_cli *cli)
 {
+	struct nouveau_uvmm *uvmm = nouveau_cli_uvmm_locked(cli);
+
 	/* All our channels are dead now, which means all the fences they
 	 * own are signalled, and all callback functions have been called.
 	 *
@@ -199,7 +201,8 @@ nouveau_cli_fini(struct nouveau_cli *cli)
 	WARN_ON(!list_empty(&cli->worker));
 
 	usif_client_fini(cli);
-	nouveau_uvmm_fini(&cli->uvmm);
+	if (uvmm)
+		nouveau_uvmm_fini(uvmm);
 	nouveau_sched_entity_fini(&cli->sched_entity);
 	nouveau_vmm_fini(&cli->svm);
 	nouveau_vmm_fini(&cli->vmm);
diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
index e73a233c6572..7f7051df84a6 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -93,7 +93,10 @@ struct nouveau_cli {
 	struct nvif_mmu mmu;
 	struct nouveau_vmm vmm;
 	struct nouveau_vmm svm;
-	struct nouveau_uvmm uvmm;
+	struct {
+		struct nouveau_uvmm *ptr;
+		bool disabled;
+	} uvmm;
 
 	struct nouveau_sched_entity sched_entity;
 
@@ -121,10 +124,7 @@ struct nouveau_cli_work {
 static inline struct nouveau_uvmm *
 nouveau_cli_uvmm(struct nouveau_cli *cli)
 {
-	if (!cli || !cli->uvmm.vmm.cli)
-		return NULL;
-
-	return &cli->uvmm;
+	return cli ? cli->uvmm.ptr : NULL;
 }
 
 static inline struct nouveau_uvmm *
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index a0d303e5ce3d..49c2bcbef129 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -111,7 +111,8 @@ nouveau_gem_object_open(struct drm_gem_object *gem, struct drm_file *file_priv)
 	if (vmm->vmm.object.oclass < NVIF_CLASS_VMM_NV50)
 		return 0;
 
-	if (nvbo->no_share && uvmm && &uvmm->resv != nvbo->bo.base.resv)
+	if (nvbo->no_share && uvmm &&
+	    drm_gpuvm_resv(&uvmm->base) != nvbo->bo.base.resv)
 		return -EPERM;
 
 	ret = ttm_bo_reserve(&nvbo->bo, false, false, NULL);
@@ -245,7 +246,7 @@ nouveau_gem_new(struct nouveau_cli *cli, u64 size, int align, uint32_t domain,
 		if (unlikely(!uvmm))
 			return -EINVAL;
 
-		resv = &uvmm->resv;
+		resv = drm_gpuvm_resv(&uvmm->base);
 	}
 
 	if (!(domain & (NOUVEAU_GEM_DOMAIN_VRAM | NOUVEAU_GEM_DOMAIN_GART)))
@@ -288,6 +289,11 @@ nouveau_gem_new(struct nouveau_cli *cli, u64 size, int align, uint32_t domain,
 	if (drm->client.device.info.family >= NV_DEVICE_INFO_V0_TESLA)
 		nvbo->valid_domains &= domain;
 
+	if (nvbo->no_share) {
+		nvbo->r_obj = drm_gpuvm_resv_obj(&uvmm->base);
+		drm_gem_object_get(nvbo->r_obj);
+	}
+
 	*pnvbo = nvbo;
 	return 0;
 }
diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.c b/drivers/gpu/drm/nouveau/nouveau_sched.c
index 7c376c4ccdcf..e03fd2bc8a11 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sched.c
+++ b/drivers/gpu/drm/nouveau/nouveau_sched.c
@@ -89,7 +89,7 @@ nouveau_job_init(struct nouveau_job *job,
 
 	}
 
-	ret = drm_sched_job_init(&job->base, &entity->base, NULL);
+	ret = drm_sched_job_init(&job->base, &entity->base, 1, NULL);
 	if (ret)
 		goto err_free_chains;
 
@@ -435,7 +435,7 @@ int nouveau_sched_init(struct nouveau_drm *drm)
 	if (!drm->sched_wq)
 		return -ENOMEM;
 
-	return drm_sched_init(sched, &nouveau_sched_ops,
+	return drm_sched_init(sched, &nouveau_sched_ops, NULL,
 			      DRM_SCHED_PRIORITY_COUNT,
 			      NOUVEAU_SCHED_HW_SUBMISSIONS, 0, job_hang_limit,
 			      NULL, NULL, "nouveau_sched", drm->dev->dev);
diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
index 5cf892c50f43..eda7bb8624f1 100644
--- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
@@ -62,6 +62,8 @@ struct bind_job_op {
 	enum vm_bind_op op;
 	u32 flags;
 
+	struct drm_gpuvm_bo *vm_bo;
+
 	struct {
 		u64 addr;
 		u64 range;
@@ -929,25 +931,13 @@ nouveau_uvmm_sm_unmap_cleanup(struct nouveau_uvmm *uvmm,
 static int
 nouveau_uvmm_validate_range(struct nouveau_uvmm *uvmm, u64 addr, u64 range)
 {
-	u64 end = addr + range;
-	u64 kernel_managed_end = uvmm->kernel_managed_addr +
-				 uvmm->kernel_managed_size;
-
 	if (addr & ~PAGE_MASK)
 		return -EINVAL;
 
 	if (range & ~PAGE_MASK)
 		return -EINVAL;
 
-	if (end <= addr)
-		return -EINVAL;
-
-	if (addr < NOUVEAU_VA_SPACE_START ||
-	    end > NOUVEAU_VA_SPACE_END)
-		return -EINVAL;
-
-	if (addr < kernel_managed_end &&
-	    end > uvmm->kernel_managed_addr)
+	if (!drm_gpuvm_range_valid(&uvmm->base, addr, range))
 		return -EINVAL;
 
 	return 0;
@@ -1113,22 +1103,28 @@ bind_validate_region(struct nouveau_job *job)
 }
 
 static void
-bind_link_gpuvas(struct drm_gpuva_ops *ops, struct nouveau_uvma_prealloc *new)
+bind_link_gpuvas(struct bind_job_op *bop)
 {
+	struct nouveau_uvma_prealloc *new = &bop->new;
+	struct drm_gpuvm_bo *vm_bo = bop->vm_bo;
+	struct drm_gpuva_ops *ops = bop->ops;
 	struct drm_gpuva_op *op;
 
 	drm_gpuva_for_each_op(op, ops) {
 		switch (op->op) {
 		case DRM_GPUVA_OP_MAP:
-			drm_gpuva_link(&new->map->va);
+			drm_gpuva_link(&new->map->va, vm_bo);
 			break;
-		case DRM_GPUVA_OP_REMAP:
+		case DRM_GPUVA_OP_REMAP: {
+			struct drm_gpuva *va = op->remap.unmap->va;
+
 			if (op->remap.prev)
-				drm_gpuva_link(&new->prev->va);
+				drm_gpuva_link(&new->prev->va, va->vm_bo);
 			if (op->remap.next)
-				drm_gpuva_link(&new->next->va);
-			drm_gpuva_unlink(op->remap.unmap->va);
+				drm_gpuva_link(&new->next->va, va->vm_bo);
+			drm_gpuva_unlink(va);
 			break;
+		}
 		case DRM_GPUVA_OP_UNMAP:
 			drm_gpuva_unlink(op->unmap.va);
 			break;
@@ -1150,10 +1146,17 @@ nouveau_uvmm_bind_job_submit(struct nouveau_job *job)
 
 	list_for_each_op(op, &bind_job->ops) {
 		if (op->op == OP_MAP) {
-			op->gem.obj = drm_gem_object_lookup(job->file_priv,
-							    op->gem.handle);
-			if (!op->gem.obj)
+			struct drm_gem_object *obj = op->gem.obj =
+				drm_gem_object_lookup(job->file_priv,
+						      op->gem.handle);
+			if (!obj)
 				return -ENOENT;
+
+			dma_resv_lock(obj->resv, NULL);
+			op->vm_bo = drm_gpuvm_bo_obtain(&uvmm->base, obj);
+			dma_resv_unlock(obj->resv);
+			if (IS_ERR(op->vm_bo))
+				return PTR_ERR(op->vm_bo);
 		}
 
 		ret = bind_validate_op(job, op);
@@ -1364,7 +1367,7 @@ nouveau_uvmm_bind_job_submit(struct nouveau_job *job)
 		case OP_UNMAP_SPARSE:
 		case OP_MAP:
 		case OP_UNMAP:
-			bind_link_gpuvas(op->ops, &op->new);
+			bind_link_gpuvas(op);
 			break;
 		default:
 			break;
@@ -1511,6 +1514,12 @@ nouveau_uvmm_bind_job_free_work_fn(struct work_struct *work)
 		if (!IS_ERR_OR_NULL(op->ops))
 			drm_gpuva_ops_free(&uvmm->base, op->ops);
 
+		if (!IS_ERR_OR_NULL(op->vm_bo)) {
+			dma_resv_lock(obj->resv, NULL);
+			drm_gpuvm_bo_put(op->vm_bo);
+			dma_resv_unlock(obj->resv);
+		}
+
 		if (obj)
 			drm_gem_object_put(obj);
 	}
@@ -1648,18 +1657,6 @@ err_free:
 	return ret;
 }
 
-int
-nouveau_uvmm_ioctl_vm_init(struct drm_device *dev,
-			   void *data,
-			   struct drm_file *file_priv)
-{
-	struct nouveau_cli *cli = nouveau_cli(file_priv);
-	struct drm_nouveau_vm_init *init = data;
-
-	return nouveau_uvmm_init(&cli->uvmm, cli, init->kernel_managed_addr,
-				 init->kernel_managed_size);
-}
-
 static int
 nouveau_uvmm_vm_bind(struct nouveau_uvmm_bind_job_args *args)
 {
@@ -1776,15 +1773,18 @@ void
 nouveau_uvmm_bo_map_all(struct nouveau_bo *nvbo, struct nouveau_mem *mem)
 {
 	struct drm_gem_object *obj = &nvbo->bo.base;
+	struct drm_gpuvm_bo *vm_bo;
 	struct drm_gpuva *va;
 
 	dma_resv_assert_held(obj->resv);
 
-	drm_gem_for_each_gpuva(va, obj) {
-		struct nouveau_uvma *uvma = uvma_from_va(va);
+	drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
+		drm_gpuvm_bo_for_each_va(va, vm_bo) {
+			struct nouveau_uvma *uvma = uvma_from_va(va);
 
-		nouveau_uvma_map(uvma, mem);
-		drm_gpuva_invalidate(va, false);
+			nouveau_uvma_map(uvma, mem);
+			drm_gpuva_invalidate(va, false);
+		}
 	}
 }
 
@@ -1792,29 +1792,53 @@ void
 nouveau_uvmm_bo_unmap_all(struct nouveau_bo *nvbo)
 {
 	struct drm_gem_object *obj = &nvbo->bo.base;
+	struct drm_gpuvm_bo *vm_bo;
 	struct drm_gpuva *va;
 
 	dma_resv_assert_held(obj->resv);
 
-	drm_gem_for_each_gpuva(va, obj) {
-		struct nouveau_uvma *uvma = uvma_from_va(va);
+	drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
+		drm_gpuvm_bo_for_each_va(va, vm_bo) {
+			struct nouveau_uvma *uvma = uvma_from_va(va);
 
-		nouveau_uvma_unmap(uvma);
-		drm_gpuva_invalidate(va, true);
+			nouveau_uvma_unmap(uvma);
+			drm_gpuva_invalidate(va, true);
+		}
 	}
 }
 
-int
-nouveau_uvmm_init(struct nouveau_uvmm *uvmm, struct nouveau_cli *cli,
-		  u64 kernel_managed_addr, u64 kernel_managed_size)
+static void
+nouveau_uvmm_free(struct drm_gpuvm *gpuvm)
 {
-	int ret;
-	u64 kernel_managed_end = kernel_managed_addr + kernel_managed_size;
+	struct nouveau_uvmm *uvmm = uvmm_from_gpuvm(gpuvm);
 
-	mutex_init(&uvmm->mutex);
-	dma_resv_init(&uvmm->resv);
-	mt_init_flags(&uvmm->region_mt, MT_FLAGS_LOCK_EXTERN);
-	mt_set_external_lock(&uvmm->region_mt, &uvmm->mutex);
+	kfree(uvmm);
+}
+
+static const struct drm_gpuvm_ops gpuvm_ops = {
+	.vm_free = nouveau_uvmm_free,
+};
+
+int
+nouveau_uvmm_ioctl_vm_init(struct drm_device *dev,
+			   void *data,
+			   struct drm_file *file_priv)
+{
+	struct nouveau_uvmm *uvmm;
+	struct nouveau_cli *cli = nouveau_cli(file_priv);
+	struct drm_device *drm = cli->drm->dev;
+	struct drm_gem_object *r_obj;
+	struct drm_nouveau_vm_init *init = data;
+	u64 kernel_managed_end;
+	int ret;
+
+	if (check_add_overflow(init->kernel_managed_addr,
+			       init->kernel_managed_size,
+			       &kernel_managed_end))
+		return -EINVAL;
+
+	if (kernel_managed_end > NOUVEAU_VA_SPACE_END)
+		return -EINVAL;
 
 	mutex_lock(&cli->mutex);
 
@@ -1823,39 +1847,48 @@ nouveau_uvmm_init(struct nouveau_uvmm *uvmm, struct nouveau_cli *cli,
 		goto out_unlock;
 	}
 
-	if (kernel_managed_end <= kernel_managed_addr) {
-		ret = -EINVAL;
+	uvmm = kzalloc(sizeof(*uvmm), GFP_KERNEL);
+	if (!uvmm) {
+		ret = -ENOMEM;
 		goto out_unlock;
 	}
 
-	if (kernel_managed_end > NOUVEAU_VA_SPACE_END) {
-		ret = -EINVAL;
+	r_obj = drm_gpuvm_resv_object_alloc(drm);
+	if (!r_obj) {
+		kfree(uvmm);
+		ret = -ENOMEM;
 		goto out_unlock;
 	}
 
-	uvmm->kernel_managed_addr = kernel_managed_addr;
-	uvmm->kernel_managed_size = kernel_managed_size;
+	mutex_init(&uvmm->mutex);
+	mt_init_flags(&uvmm->region_mt, MT_FLAGS_LOCK_EXTERN);
+	mt_set_external_lock(&uvmm->region_mt, &uvmm->mutex);
 
-	drm_gpuvm_init(&uvmm->base, cli->name,
+	drm_gpuvm_init(&uvmm->base, cli->name, 0, drm, r_obj,
 		       NOUVEAU_VA_SPACE_START,
 		       NOUVEAU_VA_SPACE_END,
-		       kernel_managed_addr, kernel_managed_size,
-		       NULL);
+		       init->kernel_managed_addr,
+		       init->kernel_managed_size,
+		       &gpuvm_ops);
+	/* GPUVM takes care from here on. */
+	drm_gem_object_put(r_obj);
 
 	ret = nvif_vmm_ctor(&cli->mmu, "uvmm",
 			    cli->vmm.vmm.object.oclass, RAW,
-			    kernel_managed_addr, kernel_managed_size,
-			    NULL, 0, &cli->uvmm.vmm.vmm);
+			    init->kernel_managed_addr,
+			    init->kernel_managed_size,
+			    NULL, 0, &uvmm->vmm.vmm);
 	if (ret)
-		goto out_free_gpuva_mgr;
+		goto out_gpuvm_fini;
 
-	cli->uvmm.vmm.cli = cli;
+	uvmm->vmm.cli = cli;
+	cli->uvmm.ptr = uvmm;
 	mutex_unlock(&cli->mutex);
 
 	return 0;
 
-out_free_gpuva_mgr:
-	drm_gpuvm_destroy(&uvmm->base);
+out_gpuvm_fini:
+	drm_gpuvm_put(&uvmm->base);
 out_unlock:
 	mutex_unlock(&cli->mutex);
 	return ret;
@@ -1870,9 +1903,6 @@ nouveau_uvmm_fini(struct nouveau_uvmm *uvmm)
 	struct nouveau_sched_entity *entity = &cli->sched_entity;
 	struct drm_gpuva *va, *next;
 
-	if (!cli)
-		return;
-
 	rmb(); /* for list_empty to work without lock */
 	wait_event(entity->job.wq, list_empty(&entity->job.list.head));
 
@@ -1910,8 +1940,6 @@ nouveau_uvmm_fini(struct nouveau_uvmm *uvmm)
 
 	mutex_lock(&cli->mutex);
 	nouveau_vmm_fini(&uvmm->vmm);
-	drm_gpuvm_destroy(&uvmm->base);
+	drm_gpuvm_put(&uvmm->base);
 	mutex_unlock(&cli->mutex);
-
-	dma_resv_fini(&uvmm->resv);
 }
diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.h b/drivers/gpu/drm/nouveau/nouveau_uvmm.h
index a308c59760a5..f0a6d98ace4f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_uvmm.h
+++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.h
@@ -12,12 +12,6 @@ struct nouveau_uvmm {
 	struct nouveau_vmm vmm;
 	struct maple_tree region_mt;
 	struct mutex mutex;
-	struct dma_resv resv;
-
-	u64 kernel_managed_addr;
-	u64 kernel_managed_size;
-
-	bool disabled;
 };
 
 struct nouveau_uvma_region {
@@ -82,8 +76,6 @@ struct nouveau_uvmm_bind_job_args {
 
 #define to_uvmm_bind_job(job) container_of((job), struct nouveau_uvmm_bind_job, base)
 
-int nouveau_uvmm_init(struct nouveau_uvmm *uvmm, struct nouveau_cli *cli,
-		      u64 kernel_managed_addr, u64 kernel_managed_size);
 void nouveau_uvmm_fini(struct nouveau_uvmm *uvmm);
 
 void nouveau_uvmm_bo_map_all(struct nouveau_bo *nvbov, struct nouveau_mem *mem);
diff --git a/drivers/gpu/drm/omapdrm/omap_drv.c b/drivers/gpu/drm/omapdrm/omap_drv.c
index b2835b3ea6f5..6598c9c08ba1 100644
--- a/drivers/gpu/drm/omapdrm/omap_drv.c
+++ b/drivers/gpu/drm/omapdrm/omap_drv.c
@@ -69,7 +69,6 @@ static void omap_atomic_commit_tail(struct drm_atomic_state *old_state)
 {
 	struct drm_device *dev = old_state->dev;
 	struct omap_drm_private *priv = dev->dev_private;
-	bool fence_cookie = dma_fence_begin_signalling();
 
 	dispc_runtime_get(priv->dispc);
 
@@ -92,6 +91,8 @@ static void omap_atomic_commit_tail(struct drm_atomic_state *old_state)
 		omap_atomic_wait_for_completion(dev, old_state);
 
 		drm_atomic_helper_commit_planes(dev, old_state, 0);
+
+		drm_atomic_helper_commit_hw_done(old_state);
 	} else {
 		/*
 		 * OMAP3 DSS seems to have issues with the work-around above,
@@ -101,12 +102,10 @@ static void omap_atomic_commit_tail(struct drm_atomic_state *old_state)
 		drm_atomic_helper_commit_planes(dev, old_state, 0);
 
 		drm_atomic_helper_commit_modeset_enables(dev, old_state);
+
+		drm_atomic_helper_commit_hw_done(old_state);
 	}
 
-	drm_atomic_helper_commit_hw_done(old_state);
-
-	dma_fence_end_signalling(fence_cookie);
-
 	/*
 	 * Wait for completion of the page flips to ensure that old buffers
 	 * can't be touched by the hardware anymore before cleaning up planes.
diff --git a/drivers/gpu/drm/panel/panel-edp.c b/drivers/gpu/drm/panel/panel-edp.c
index 95c8472d878a..f22677373171 100644
--- a/drivers/gpu/drm/panel/panel-edp.c
+++ b/drivers/gpu/drm/panel/panel-edp.c
@@ -973,6 +973,8 @@ static const struct panel_desc auo_b116xak01 = {
 	},
 	.delay = {
 		.hpd_absent = 200,
+		.unprepare = 500,
+		.enable = 50,
 	},
 };
 
@@ -1801,6 +1803,12 @@ static const struct panel_delay delay_200_500_e50 = {
 	.enable = 50,
 };
 
+static const struct panel_delay delay_200_500_e80 = {
+	.hpd_absent = 200,
+	.unprepare = 500,
+	.enable = 80,
+};
+
 static const struct panel_delay delay_200_500_e80_d50 = {
 	.hpd_absent = 200,
 	.unprepare = 500,
@@ -1820,6 +1828,19 @@ static const struct panel_delay delay_200_500_e200 = {
 	.enable = 200,
 };
 
+static const struct panel_delay delay_200_500_e200_d10 = {
+	.hpd_absent = 200,
+	.unprepare = 500,
+	.enable = 200,
+	.disable = 10,
+};
+
+static const struct panel_delay delay_200_150_e200 = {
+	.hpd_absent = 200,
+	.unprepare = 150,
+	.enable = 200,
+};
+
 #define EDP_PANEL_ENTRY(vend_chr_0, vend_chr_1, vend_chr_2, product_id, _delay, _name) \
 { \
 	.name = _name, \
@@ -1840,34 +1861,69 @@ static const struct edp_panel_entry edp_panels[] = {
 	EDP_PANEL_ENTRY('A', 'U', 'O', 0x145c, &delay_200_500_e50, "B116XAB01.4"),
 	EDP_PANEL_ENTRY('A', 'U', 'O', 0x1e9b, &delay_200_500_e50, "B133UAN02.1"),
 	EDP_PANEL_ENTRY('A', 'U', 'O', 0x1ea5, &delay_200_500_e50, "B116XAK01.6"),
-	EDP_PANEL_ENTRY('A', 'U', 'O', 0x405c, &auo_b116xak01.delay, "B116XAK01"),
+	EDP_PANEL_ENTRY('A', 'U', 'O', 0x208d, &delay_200_500_e50, "B140HTN02.1"),
+	EDP_PANEL_ENTRY('A', 'U', 'O', 0x235c, &delay_200_500_e50, "B116XTN02.3"),
+	EDP_PANEL_ENTRY('A', 'U', 'O', 0x239b, &delay_200_500_e50, "B116XAN06.1"),
+	EDP_PANEL_ENTRY('A', 'U', 'O', 0x255c, &delay_200_500_e50, "B116XTN02.5"),
+	EDP_PANEL_ENTRY('A', 'U', 'O', 0x403d, &delay_200_500_e50, "B140HAN04.0"),
+	EDP_PANEL_ENTRY('A', 'U', 'O', 0x405c, &auo_b116xak01.delay, "B116XAK01.0"),
 	EDP_PANEL_ENTRY('A', 'U', 'O', 0x582d, &delay_200_500_e50, "B133UAN01.0"),
 	EDP_PANEL_ENTRY('A', 'U', 'O', 0x615c, &delay_200_500_e50, "B116XAN06.1"),
+	EDP_PANEL_ENTRY('A', 'U', 'O', 0x635c, &delay_200_500_e50, "B116XAN06.3"),
+	EDP_PANEL_ENTRY('A', 'U', 'O', 0x639c, &delay_200_500_e50, "B140HAK02.7"),
 	EDP_PANEL_ENTRY('A', 'U', 'O', 0x8594, &delay_200_500_e50, "B133UAN01.0"),
+	EDP_PANEL_ENTRY('A', 'U', 'O', 0xf390, &delay_200_500_e50, "B140XTN07.7"),
 
+	EDP_PANEL_ENTRY('B', 'O', 'E', 0x0715, &delay_200_150_e200, "NT116WHM-N21"),
+	EDP_PANEL_ENTRY('B', 'O', 'E', 0x0731, &delay_200_500_e80, "NT116WHM-N42"),
+	EDP_PANEL_ENTRY('B', 'O', 'E', 0x0741, &delay_200_500_e200, "NT116WHM-N44"),
 	EDP_PANEL_ENTRY('B', 'O', 'E', 0x0786, &delay_200_500_p2e80, "NV116WHM-T01"),
 	EDP_PANEL_ENTRY('B', 'O', 'E', 0x07d1, &boe_nv133fhm_n61.delay, "NV133FHM-N61"),
+	EDP_PANEL_ENTRY('B', 'O', 'E', 0x07f6, &delay_200_500_e200, "NT140FHM-N44"),
 	EDP_PANEL_ENTRY('B', 'O', 'E', 0x082d, &boe_nv133fhm_n61.delay, "NV133FHM-N62"),
+	EDP_PANEL_ENTRY('B', 'O', 'E', 0x08b2, &delay_200_500_e200, "NT140WHM-N49"),
+	EDP_PANEL_ENTRY('B', 'O', 'E', 0x09c3, &delay_200_500_e50, "NT116WHM-N21,836X2"),
 	EDP_PANEL_ENTRY('B', 'O', 'E', 0x094b, &delay_200_500_e50, "NT116WHM-N21"),
+	EDP_PANEL_ENTRY('B', 'O', 'E', 0x0951, &delay_200_500_e80, "NV116WHM-N47"),
 	EDP_PANEL_ENTRY('B', 'O', 'E', 0x095f, &delay_200_500_e50, "NE135FBM-N41 v8.1"),
+	EDP_PANEL_ENTRY('B', 'O', 'E', 0x0979, &delay_200_500_e50, "NV116WHM-N49 V8.0"),
 	EDP_PANEL_ENTRY('B', 'O', 'E', 0x098d, &boe_nv110wtm_n61.delay, "NV110WTM-N61"),
+	EDP_PANEL_ENTRY('B', 'O', 'E', 0x09ae, &delay_200_500_e200, "NT140FHM-N45"),
 	EDP_PANEL_ENTRY('B', 'O', 'E', 0x09dd, &delay_200_500_e50, "NT116WHM-N21"),
 	EDP_PANEL_ENTRY('B', 'O', 'E', 0x0a5d, &delay_200_500_e50, "NV116WHM-N45"),
 	EDP_PANEL_ENTRY('B', 'O', 'E', 0x0ac5, &delay_200_500_e50, "NV116WHM-N4C"),
+	EDP_PANEL_ENTRY('B', 'O', 'E', 0x0b43, &delay_200_500_e200, "NV140FHM-T09"),
+	EDP_PANEL_ENTRY('B', 'O', 'E', 0x0b56, &delay_200_500_e80, "NT140FHM-N47"),
+	EDP_PANEL_ENTRY('B', 'O', 'E', 0x0c20, &delay_200_500_e80, "NT140FHM-N47"),
 
+	EDP_PANEL_ENTRY('C', 'M', 'N', 0x1132, &delay_200_500_e80_d50, "N116BGE-EA2"),
+	EDP_PANEL_ENTRY('C', 'M', 'N', 0x1138, &innolux_n116bca_ea1.delay, "N116BCA-EA1-RC4"),
 	EDP_PANEL_ENTRY('C', 'M', 'N', 0x1139, &delay_200_500_e80_d50, "N116BGE-EA2"),
+	EDP_PANEL_ENTRY('C', 'M', 'N', 0x1145, &delay_200_500_e80_d50, "N116BCN-EB1"),
 	EDP_PANEL_ENTRY('C', 'M', 'N', 0x114c, &innolux_n116bca_ea1.delay, "N116BCA-EA1"),
 	EDP_PANEL_ENTRY('C', 'M', 'N', 0x1152, &delay_200_500_e80_d50, "N116BCN-EA1"),
 	EDP_PANEL_ENTRY('C', 'M', 'N', 0x1153, &delay_200_500_e80_d50, "N116BGE-EA2"),
 	EDP_PANEL_ENTRY('C', 'M', 'N', 0x1154, &delay_200_500_e80_d50, "N116BCA-EA2"),
+	EDP_PANEL_ENTRY('C', 'M', 'N', 0x1157, &delay_200_500_e80_d50, "N116BGE-EA2"),
+	EDP_PANEL_ENTRY('C', 'M', 'N', 0x115b, &delay_200_500_e80_d50, "N116BCN-EB1"),
 	EDP_PANEL_ENTRY('C', 'M', 'N', 0x1247, &delay_200_500_e80_d50, "N120ACA-EA1"),
+	EDP_PANEL_ENTRY('C', 'M', 'N', 0x142b, &delay_200_500_e80_d50, "N140HCA-EAC"),
+	EDP_PANEL_ENTRY('C', 'M', 'N', 0x144f, &delay_200_500_e80_d50, "N140HGA-EA1"),
+	EDP_PANEL_ENTRY('C', 'M', 'N', 0x1468, &delay_200_500_e80, "N140HGA-EA1"),
+	EDP_PANEL_ENTRY('C', 'M', 'N', 0x14e5, &delay_200_500_e80_d50, "N140HGA-EA1"),
 	EDP_PANEL_ENTRY('C', 'M', 'N', 0x14d4, &delay_200_500_e80_d50, "N140HCA-EAC"),
+	EDP_PANEL_ENTRY('C', 'M', 'N', 0x14d6, &delay_200_500_e80_d50, "N140BGA-EA4"),
 
+	EDP_PANEL_ENTRY('H', 'K', 'C', 0x2d5c, &delay_200_500_e200, "MB116AN01-2"),
+
+	EDP_PANEL_ENTRY('I', 'V', 'O', 0x048e, &delay_200_500_e200_d10, "M116NWR6 R5"),
 	EDP_PANEL_ENTRY('I', 'V', 'O', 0x057d, &delay_200_500_e200, "R140NWF5 RH"),
 	EDP_PANEL_ENTRY('I', 'V', 'O', 0x854a, &delay_200_500_p2e100, "M133NW4J"),
 	EDP_PANEL_ENTRY('I', 'V', 'O', 0x854b, &delay_200_500_p2e100, "R133NW4K-R0"),
+	EDP_PANEL_ENTRY('I', 'V', 'O', 0x8c4d, &delay_200_150_e200, "R140NWFM R1"),
 
 	EDP_PANEL_ENTRY('K', 'D', 'B', 0x0624, &kingdisplay_kd116n21_30nv_a010.delay, "116N21-30NV-A010"),
+	EDP_PANEL_ENTRY('K', 'D', 'C', 0x0809, &delay_200_500_e50, "KD116N2930A15"),
 	EDP_PANEL_ENTRY('K', 'D', 'B', 0x1120, &delay_200_500_e80_d50, "116N29-30NK-C007"),
 
 	EDP_PANEL_ENTRY('S', 'H', 'P', 0x1511, &delay_200_500_e50, "LQ140M1JW48"),
diff --git a/drivers/gpu/drm/panel/panel-novatek-nt35510.c b/drivers/gpu/drm/panel/panel-novatek-nt35510.c
index d6dceb858008..83a9cf53d269 100644
--- a/drivers/gpu/drm/panel/panel-novatek-nt35510.c
+++ b/drivers/gpu/drm/panel/panel-novatek-nt35510.c
@@ -1023,7 +1023,7 @@ static const struct nt35510_config nt35510_hydis_hva40wv1 = {
 		.hdisplay = 480,
 		.hsync_start = 480 + 2, /* HFP = 2 */
 		.hsync_end = 480 + 2 + 0, /* HSync = 0 */
-		.htotal = 480 + 2 + 0 + 5, /* HFP = 5 */
+		.htotal = 480 + 2 + 0 + 5, /* HBP = 5 */
 		.vdisplay = 800,
 		.vsync_start = 800 + 2, /* VFP = 2 */
 		.vsync_end = 800 + 2 + 0, /* VSync = 0 */
diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c b/drivers/gpu/drm/panfrost/panfrost_device.c
index 28f7046e1b1a..c90ad5ee34e7 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.c
+++ b/drivers/gpu/drm/panfrost/panfrost_device.c
@@ -403,7 +403,7 @@ void panfrost_device_reset(struct panfrost_device *pfdev)
 	panfrost_job_enable_interrupts(pfdev);
 }
 
-static int panfrost_device_resume(struct device *dev)
+static int panfrost_device_runtime_resume(struct device *dev)
 {
 	struct panfrost_device *pfdev = dev_get_drvdata(dev);
 
@@ -413,7 +413,7 @@ static int panfrost_device_resume(struct device *dev)
 	return 0;
 }
 
-static int panfrost_device_suspend(struct device *dev)
+static int panfrost_device_runtime_suspend(struct device *dev)
 {
 	struct panfrost_device *pfdev = dev_get_drvdata(dev);
 
@@ -426,5 +426,75 @@ static int panfrost_device_suspend(struct device *dev)
 	return 0;
 }
 
-EXPORT_GPL_RUNTIME_DEV_PM_OPS(panfrost_pm_ops, panfrost_device_suspend,
-			      panfrost_device_resume, NULL);
+static int panfrost_device_resume(struct device *dev)
+{
+	struct panfrost_device *pfdev = dev_get_drvdata(dev);
+	int ret;
+
+	if (pfdev->comp->pm_features & BIT(GPU_PM_VREG_OFF)) {
+		unsigned long freq = pfdev->pfdevfreq.fast_rate;
+		struct dev_pm_opp *opp;
+
+		opp = dev_pm_opp_find_freq_ceil(dev, &freq);
+		if (IS_ERR(opp))
+			return PTR_ERR(opp);
+		dev_pm_opp_set_opp(dev, opp);
+		dev_pm_opp_put(opp);
+	}
+
+	if (pfdev->comp->pm_features & BIT(GPU_PM_CLK_DIS)) {
+		ret = clk_enable(pfdev->clock);
+		if (ret)
+			goto err_clk;
+
+		if (pfdev->bus_clock) {
+			ret = clk_enable(pfdev->bus_clock);
+			if (ret)
+				goto err_bus_clk;
+		}
+	}
+
+	ret = pm_runtime_force_resume(dev);
+	if (ret)
+		goto err_resume;
+
+	return 0;
+
+err_resume:
+	if (pfdev->comp->pm_features & BIT(GPU_PM_CLK_DIS) && pfdev->bus_clock)
+		clk_disable(pfdev->bus_clock);
+err_bus_clk:
+	if (pfdev->comp->pm_features & BIT(GPU_PM_CLK_DIS))
+		clk_disable(pfdev->clock);
+err_clk:
+	if (pfdev->comp->pm_features & BIT(GPU_PM_VREG_OFF))
+		dev_pm_opp_set_opp(dev, NULL);
+	return ret;
+}
+
+static int panfrost_device_suspend(struct device *dev)
+{
+	struct panfrost_device *pfdev = dev_get_drvdata(dev);
+	int ret;
+
+	ret = pm_runtime_force_suspend(dev);
+	if (ret)
+		return ret;
+
+	if (pfdev->comp->pm_features & BIT(GPU_PM_CLK_DIS)) {
+		if (pfdev->bus_clock)
+			clk_disable(pfdev->bus_clock);
+
+		clk_disable(pfdev->clock);
+	}
+
+	if (pfdev->comp->pm_features & BIT(GPU_PM_VREG_OFF))
+		dev_pm_opp_set_opp(dev, NULL);
+
+	return 0;
+}
+
+EXPORT_GPL_DEV_PM_OPS(panfrost_pm_ops) = {
+	RUNTIME_PM_OPS(panfrost_device_runtime_suspend, panfrost_device_runtime_resume, NULL)
+	SYSTEM_SLEEP_PM_OPS(panfrost_device_suspend, panfrost_device_resume)
+};
diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
index 1ef38f60d5dc..0fc558db6bfd 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -25,6 +25,16 @@ struct panfrost_perfcnt;
 #define NUM_JOB_SLOTS 3
 #define MAX_PM_DOMAINS 5
 
+/**
+ * enum panfrost_gpu_pm - Supported kernel power management features
+ * @GPU_PM_CLK_DIS:  Allow disabling clocks during system suspend
+ * @GPU_PM_VREG_OFF: Allow turning off regulators during system suspend
+ */
+enum panfrost_gpu_pm {
+	GPU_PM_CLK_DIS,
+	GPU_PM_VREG_OFF,
+};
+
 struct panfrost_features {
 	u16 id;
 	u16 revision;
@@ -75,6 +85,9 @@ struct panfrost_compatible {
 
 	/* Vendor implementation quirks callback */
 	void (*vendor_quirk)(struct panfrost_device *pfdev);
+
+	/* Allowed PM features */
+	u8 pm_features;
 };
 
 struct panfrost_device {
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 7cabf4e3d1f2..a926d71e8131 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -274,7 +274,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
 
 	ret = drm_sched_job_init(&job->base,
 				 &file_priv->sched_entity[slot],
-				 NULL);
+				 1, NULL);
 	if (ret)
 		goto out_put_job;
 
@@ -734,6 +734,7 @@ static const struct panfrost_compatible mediatek_mt8183_b_data = {
 	.supply_names = mediatek_mt8183_b_supplies,
 	.num_pm_domains = ARRAY_SIZE(mediatek_mt8183_pm_domains),
 	.pm_domain_names = mediatek_mt8183_pm_domains,
+	.pm_features = BIT(GPU_PM_CLK_DIS) | BIT(GPU_PM_VREG_OFF),
 };
 
 static const char * const mediatek_mt8186_pm_domains[] = { "core0", "core1" };
@@ -742,6 +743,7 @@ static const struct panfrost_compatible mediatek_mt8186_data = {
 	.supply_names = mediatek_mt8183_b_supplies,
 	.num_pm_domains = ARRAY_SIZE(mediatek_mt8186_pm_domains),
 	.pm_domain_names = mediatek_mt8186_pm_domains,
+	.pm_features = BIT(GPU_PM_CLK_DIS) | BIT(GPU_PM_VREG_OFF),
 };
 
 static const char * const mediatek_mt8192_supplies[] = { "mali", NULL };
@@ -752,6 +754,7 @@ static const struct panfrost_compatible mediatek_mt8192_data = {
 	.supply_names = mediatek_mt8192_supplies,
 	.num_pm_domains = ARRAY_SIZE(mediatek_mt8192_pm_domains),
 	.pm_domain_names = mediatek_mt8192_pm_domains,
+	.pm_features = BIT(GPU_PM_CLK_DIS) | BIT(GPU_PM_VREG_OFF),
 };
 
 static const struct of_device_id dt_match[] = {
diff --git a/drivers/gpu/drm/panfrost/panfrost_dump.c b/drivers/gpu/drm/panfrost/panfrost_dump.c
index e7942ac449c6..47751302f1bc 100644
--- a/drivers/gpu/drm/panfrost/panfrost_dump.c
+++ b/drivers/gpu/drm/panfrost/panfrost_dump.c
@@ -220,16 +220,8 @@ void panfrost_core_dump(struct panfrost_job *job)
 
 		iter.hdr->bomap.data[0] = bomap - bomap_start;
 
-		for_each_sgtable_page(bo->base.sgt, &page_iter, 0) {
-			struct page *page = sg_page_iter_page(&page_iter);
-
-			if (!IS_ERR(page)) {
-				*bomap++ = page_to_phys(page);
-			} else {
-				dev_err(pfdev->dev, "Panfrost Dump: wrong page\n");
-				*bomap++ = 0;
-			}
-		}
+		for_each_sgtable_page(bo->base.sgt, &page_iter, 0)
+			*bomap++ = page_to_phys(sg_page_iter_page(&page_iter));
 
 		iter.hdr->bomap.iova = mapping->mmnode.start << PAGE_SHIFT;
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c
index f0be7e19b13e..09f5e1563ebd 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
@@ -60,14 +60,21 @@ int panfrost_gpu_soft_reset(struct panfrost_device *pfdev)
 
 	gpu_write(pfdev, GPU_INT_MASK, 0);
 	gpu_write(pfdev, GPU_INT_CLEAR, GPU_IRQ_RESET_COMPLETED);
-	gpu_write(pfdev, GPU_CMD, GPU_CMD_SOFT_RESET);
 
+	gpu_write(pfdev, GPU_CMD, GPU_CMD_SOFT_RESET);
 	ret = readl_relaxed_poll_timeout(pfdev->iomem + GPU_INT_RAWSTAT,
-		val, val & GPU_IRQ_RESET_COMPLETED, 100, 10000);
+		val, val & GPU_IRQ_RESET_COMPLETED, 10, 10000);
 
 	if (ret) {
-		dev_err(pfdev->dev, "gpu soft reset timed out\n");
-		return ret;
+		dev_err(pfdev->dev, "gpu soft reset timed out, attempting hard reset\n");
+
+		gpu_write(pfdev, GPU_CMD, GPU_CMD_HARD_RESET);
+		ret = readl_relaxed_poll_timeout(pfdev->iomem + GPU_INT_RAWSTAT, val,
+						 val & GPU_IRQ_RESET_COMPLETED, 100, 10000);
+		if (ret) {
+			dev_err(pfdev->dev, "gpu hard reset timed out\n");
+			return ret;
+		}
 	}
 
 	gpu_write(pfdev, GPU_INT_CLEAR, GPU_IRQ_MASK_ALL);
@@ -362,32 +369,42 @@ unsigned long long panfrost_cycle_counter_read(struct panfrost_device *pfdev)
 	return ((u64)hi << 32) | lo;
 }
 
+static u64 panfrost_get_core_mask(struct panfrost_device *pfdev)
+{
+	u64 core_mask;
+
+	if (pfdev->features.l2_present == 1)
+		return U64_MAX;
+
+	/*
+	 * Only support one core group now.
+	 * ~(l2_present - 1) unsets all bits in l2_present except
+	 * the bottom bit. (l2_present - 2) has all the bits in
+	 * the first core group set. AND them together to generate
+	 * a mask of cores in the first core group.
+	 */
+	core_mask = ~(pfdev->features.l2_present - 1) &
+		     (pfdev->features.l2_present - 2);
+	dev_info_once(pfdev->dev, "using only 1st core group (%lu cores from %lu)\n",
+		      hweight64(core_mask),
+		      hweight64(pfdev->features.shader_present));
+
+	return core_mask;
+}
+
 void panfrost_gpu_power_on(struct panfrost_device *pfdev)
 {
 	int ret;
 	u32 val;
-	u64 core_mask = U64_MAX;
+	u64 core_mask;
 
 	panfrost_gpu_init_quirks(pfdev);
+	core_mask = panfrost_get_core_mask(pfdev);
 
-	if (pfdev->features.l2_present != 1) {
-		/*
-		 * Only support one core group now.
-		 * ~(l2_present - 1) unsets all bits in l2_present except
-		 * the bottom bit. (l2_present - 2) has all the bits in
-		 * the first core group set. AND them together to generate
-		 * a mask of cores in the first core group.
-		 */
-		core_mask = ~(pfdev->features.l2_present - 1) &
-			     (pfdev->features.l2_present - 2);
-		dev_info_once(pfdev->dev, "using only 1st core group (%lu cores from %lu)\n",
-			      hweight64(core_mask),
-			      hweight64(pfdev->features.shader_present));
-	}
 	gpu_write(pfdev, L2_PWRON_LO, pfdev->features.l2_present & core_mask);
 	ret = readl_relaxed_poll_timeout(pfdev->iomem + L2_READY_LO,
 		val, val == (pfdev->features.l2_present & core_mask),
-		100, 20000);
+		10, 20000);
 	if (ret)
 		dev_err(pfdev->dev, "error powering up gpu L2");
 
@@ -395,22 +412,40 @@ void panfrost_gpu_power_on(struct panfrost_device *pfdev)
 		  pfdev->features.shader_present & core_mask);
 	ret = readl_relaxed_poll_timeout(pfdev->iomem + SHADER_READY_LO,
 		val, val == (pfdev->features.shader_present & core_mask),
-		100, 20000);
+		10, 20000);
 	if (ret)
 		dev_err(pfdev->dev, "error powering up gpu shader");
 
 	gpu_write(pfdev, TILER_PWRON_LO, pfdev->features.tiler_present);
 	ret = readl_relaxed_poll_timeout(pfdev->iomem + TILER_READY_LO,
-		val, val == pfdev->features.tiler_present, 100, 1000);
+		val, val == pfdev->features.tiler_present, 10, 1000);
 	if (ret)
 		dev_err(pfdev->dev, "error powering up gpu tiler");
 }
 
 void panfrost_gpu_power_off(struct panfrost_device *pfdev)
 {
-	gpu_write(pfdev, TILER_PWROFF_LO, 0);
-	gpu_write(pfdev, SHADER_PWROFF_LO, 0);
-	gpu_write(pfdev, L2_PWROFF_LO, 0);
+	u64 core_mask = panfrost_get_core_mask(pfdev);
+	int ret;
+	u32 val;
+
+	gpu_write(pfdev, SHADER_PWROFF_LO, pfdev->features.shader_present & core_mask);
+	ret = readl_relaxed_poll_timeout(pfdev->iomem + SHADER_PWRTRANS_LO,
+					 val, !val, 1, 1000);
+	if (ret)
+		dev_err(pfdev->dev, "shader power transition timeout");
+
+	gpu_write(pfdev, TILER_PWROFF_LO, pfdev->features.tiler_present);
+	ret = readl_relaxed_poll_timeout(pfdev->iomem + TILER_PWRTRANS_LO,
+					 val, !val, 1, 1000);
+	if (ret)
+		dev_err(pfdev->dev, "tiler power transition timeout");
+
+	gpu_write(pfdev, L2_PWROFF_LO, pfdev->features.l2_present & core_mask);
+	ret = readl_poll_timeout(pfdev->iomem + L2_PWRTRANS_LO,
+				 val, !val, 0, 1000);
+	if (ret)
+		dev_err(pfdev->dev, "l2 power transition timeout");
 }
 
 int panfrost_gpu_init(struct panfrost_device *pfdev)
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index ecd2e035147f..f9446e197428 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -852,7 +852,7 @@ int panfrost_job_init(struct panfrost_device *pfdev)
 		js->queue[j].fence_context = dma_fence_context_alloc(1);
 
 		ret = drm_sched_init(&js->queue[j].sched,
-				     &panfrost_sched_ops,
+				     &panfrost_sched_ops, NULL,
 				     DRM_SCHED_PRIORITY_COUNT,
 				     nentries, 0,
 				     msecs_to_jiffies(JOB_TIMEOUT_MS),
@@ -963,7 +963,7 @@ int panfrost_job_is_idle(struct panfrost_device *pfdev)
 
 	for (i = 0; i < NUM_JOB_SLOTS; i++) {
 		/* If there are any jobs in the HW queue, we're not idle */
-		if (atomic_read(&js->queue[i].sched.hw_rq_count))
+		if (atomic_read(&js->queue[i].sched.credit_count))
 			return false;
 	}
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h
index 55ec807550b3..c25743b05c55 100644
--- a/drivers/gpu/drm/panfrost/panfrost_regs.h
+++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
@@ -44,6 +44,7 @@
 	 GPU_IRQ_MULTIPLE_FAULT)
 #define GPU_CMD				0x30
 #define   GPU_CMD_SOFT_RESET		0x01
+#define   GPU_CMD_HARD_RESET		0x02
 #define   GPU_CMD_PERFCNT_CLEAR		0x03
 #define   GPU_CMD_PERFCNT_SAMPLE	0x04
 #define   GPU_CMD_CYCLE_COUNT_START	0x05
diff --git a/drivers/gpu/drm/radeon/radeon_audio.c b/drivers/gpu/drm/radeon/radeon_audio.c
index d6ccaf24ee0c..279bf130a18c 100644
--- a/drivers/gpu/drm/radeon/radeon_audio.c
+++ b/drivers/gpu/drm/radeon/radeon_audio.c
@@ -26,6 +26,7 @@
 #include <linux/component.h>
 
 #include <drm/drm_crtc.h>
+#include <drm/drm_eld.h>
 #include "dce6_afmt.h"
 #include "evergreen_hdmi.h"
 #include "radeon.h"
diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h b/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
index 3143ecaaff86..f8ed093b7356 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
@@ -51,7 +51,7 @@ DECLARE_EVENT_CLASS(drm_sched_job,
 			   __assign_str(name, sched_job->sched->name);
 			   __entry->job_count = spsc_queue_count(&entity->job_queue);
 			   __entry->hw_job_count = atomic_read(
-				   &sched_job->sched->hw_rq_count);
+				   &sched_job->sched->credit_count);
 			   ),
 	    TP_printk("entity=%p, id=%llu, fence=%p, ring=%s, job count:%u, hw job count:%d",
 		      __entry->entity, __entry->id,
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 409e4256f6e7..4d42b1e4daa6 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -370,7 +370,7 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
 		container_of(cb, struct drm_sched_entity, cb);
 
 	drm_sched_entity_clear_dep(f, cb);
-	drm_sched_wakeup_if_can_queue(entity->rq->sched);
+	drm_sched_wakeup(entity->rq->sched, entity);
 }
 
 /**
@@ -602,7 +602,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 		if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
 			drm_sched_rq_update_fifo(entity, submit_ts);
 
-		drm_sched_wakeup_if_can_queue(entity->rq->sched);
+		drm_sched_wakeup(entity->rq->sched, entity);
 	}
 }
 EXPORT_SYMBOL(drm_sched_entity_push_job);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 99797a8c836a..044a8c4875ba 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -48,7 +48,30 @@
  * through the jobs entity pointer.
  */
 
-#include <linux/kthread.h>
+/**
+ * DOC: Flow Control
+ *
+ * The DRM GPU scheduler provides a flow control mechanism to regulate the rate
+ * in which the jobs fetched from scheduler entities are executed.
+ *
+ * In this context the &drm_gpu_scheduler keeps track of a driver specified
+ * credit limit representing the capacity of this scheduler and a credit count;
+ * every &drm_sched_job carries a driver specified number of credits.
+ *
+ * Once a job is executed (but not yet finished), the job's credits contribute
+ * to the scheduler's credit count until the job is finished. If by executing
+ * one more job the scheduler's credit count would exceed the scheduler's
+ * credit limit, the job won't be executed. Instead, the scheduler will wait
+ * until the credit count has decreased enough to not overflow its credit limit.
+ * This implies waiting for previously executed jobs.
+ *
+ * Optionally, drivers may register a callback (update_job_credits) provided by
+ * struct drm_sched_backend_ops to update the job's credits dynamically. The
+ * scheduler executes this callback every time the scheduler considers a job for
+ * execution and subsequently checks whether the job fits the scheduler's credit
+ * limit.
+ */
+
 #include <linux/wait.h>
 #include <linux/sched.h>
 #include <linux/completion.h>
@@ -76,6 +99,51 @@ int drm_sched_policy = DRM_SCHED_POLICY_FIFO;
 MODULE_PARM_DESC(sched_policy, "Specify the scheduling policy for entities on a run-queue, " __stringify(DRM_SCHED_POLICY_RR) " = Round Robin, " __stringify(DRM_SCHED_POLICY_FIFO) " = FIFO (default).");
 module_param_named(sched_policy, drm_sched_policy, int, 0444);
 
+static u32 drm_sched_available_credits(struct drm_gpu_scheduler *sched)
+{
+	u32 credits;
+
+	drm_WARN_ON(sched, check_sub_overflow(sched->credit_limit,
+					      atomic_read(&sched->credit_count),
+					      &credits));
+
+	return credits;
+}
+
+/**
+ * drm_sched_can_queue -- Can we queue more to the hardware?
+ * @sched: scheduler instance
+ * @entity: the scheduler entity
+ *
+ * Return true if we can push at least one more job from @entity, false
+ * otherwise.
+ */
+static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
+				struct drm_sched_entity *entity)
+{
+	struct drm_sched_job *s_job;
+
+	s_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
+	if (!s_job)
+		return false;
+
+	if (sched->ops->update_job_credits) {
+		s_job->credits = sched->ops->update_job_credits(s_job);
+
+		drm_WARN(sched, !s_job->credits,
+			 "Jobs with zero credits bypass job-flow control.\n");
+	}
+
+	/* If a job exceeds the credit limit, truncate it to the credit limit
+	 * itself to guarantee forward progress.
+	 */
+	if (drm_WARN(sched, s_job->credits > sched->credit_limit,
+		     "Jobs may not exceed the credit limit, truncate.\n"))
+		s_job->credits = sched->credit_limit;
+
+	return drm_sched_available_credits(sched) >= s_job->credits;
+}
+
 static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a,
 							    const struct rb_node *b)
 {
@@ -187,12 +255,18 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
 /**
  * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run
  *
+ * @sched: the gpu scheduler
  * @rq: scheduler run queue to check.
  *
- * Try to find a ready entity, returns NULL if none found.
+ * Try to find the next ready entity.
+ *
+ * Return an entity if one is found; return an error-pointer (!NULL) if an
+ * entity was ready, but the scheduler had insufficient credits to accommodate
+ * its job; return NULL, if no ready entity was found.
  */
 static struct drm_sched_entity *
-drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
+drm_sched_rq_select_entity_rr(struct drm_gpu_scheduler *sched,
+			      struct drm_sched_rq *rq)
 {
 	struct drm_sched_entity *entity;
 
@@ -202,6 +276,14 @@ drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
 	if (entity) {
 		list_for_each_entry_continue(entity, &rq->entities, list) {
 			if (drm_sched_entity_is_ready(entity)) {
+				/* If we can't queue yet, preserve the current
+				 * entity in terms of fairness.
+				 */
+				if (!drm_sched_can_queue(sched, entity)) {
+					spin_unlock(&rq->lock);
+					return ERR_PTR(-ENOSPC);
+				}
+
 				rq->current_entity = entity;
 				reinit_completion(&entity->entity_idle);
 				spin_unlock(&rq->lock);
@@ -211,8 +293,15 @@ drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
 	}
 
 	list_for_each_entry(entity, &rq->entities, list) {
-
 		if (drm_sched_entity_is_ready(entity)) {
+			/* If we can't queue yet, preserve the current entity in
+			 * terms of fairness.
+			 */
+			if (!drm_sched_can_queue(sched, entity)) {
+				spin_unlock(&rq->lock);
+				return ERR_PTR(-ENOSPC);
+			}
+
 			rq->current_entity = entity;
 			reinit_completion(&entity->entity_idle);
 			spin_unlock(&rq->lock);
@@ -231,12 +320,18 @@ drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
 /**
  * drm_sched_rq_select_entity_fifo - Select an entity which provides a job to run
  *
+ * @sched: the gpu scheduler
  * @rq: scheduler run queue to check.
  *
- * Find oldest waiting ready entity, returns NULL if none found.
+ * Find oldest waiting ready entity.
+ *
+ * Return an entity if one is found; return an error-pointer (!NULL) if an
+ * entity was ready, but the scheduler had insufficient credits to accommodate
+ * its job; return NULL, if no ready entity was found.
  */
 static struct drm_sched_entity *
-drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
+drm_sched_rq_select_entity_fifo(struct drm_gpu_scheduler *sched,
+				struct drm_sched_rq *rq)
 {
 	struct rb_node *rb;
 
@@ -246,6 +341,14 @@ drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
 
 		entity = rb_entry(rb, struct drm_sched_entity, rb_tree_node);
 		if (drm_sched_entity_is_ready(entity)) {
+			/* If we can't queue yet, preserve the current entity in
+			 * terms of fairness.
+			 */
+			if (!drm_sched_can_queue(sched, entity)) {
+				spin_unlock(&rq->lock);
+				return ERR_PTR(-ENOSPC);
+			}
+
 			rq->current_entity = entity;
 			reinit_completion(&entity->entity_idle);
 			break;
@@ -256,6 +359,42 @@ drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
 	return rb ? rb_entry(rb, struct drm_sched_entity, rb_tree_node) : NULL;
 }
 
+/**
+ * drm_sched_run_job_queue - enqueue run-job work
+ * @sched: scheduler instance
+ */
+static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
+{
+	if (!READ_ONCE(sched->pause_submit))
+		queue_work(sched->submit_wq, &sched->work_run_job);
+}
+
+/**
+ * __drm_sched_run_free_queue - enqueue free-job work
+ * @sched: scheduler instance
+ */
+static void __drm_sched_run_free_queue(struct drm_gpu_scheduler *sched)
+{
+	if (!READ_ONCE(sched->pause_submit))
+		queue_work(sched->submit_wq, &sched->work_free_job);
+}
+
+/**
+ * drm_sched_run_free_queue - enqueue free-job work if ready
+ * @sched: scheduler instance
+ */
+static void drm_sched_run_free_queue(struct drm_gpu_scheduler *sched)
+{
+	struct drm_sched_job *job;
+
+	spin_lock(&sched->job_list_lock);
+	job = list_first_entry_or_null(&sched->pending_list,
+				       struct drm_sched_job, list);
+	if (job && dma_fence_is_signaled(&job->s_fence->finished))
+		__drm_sched_run_free_queue(sched);
+	spin_unlock(&sched->job_list_lock);
+}
+
 /**
  * drm_sched_job_done - complete a job
  * @s_job: pointer to the job which is done
@@ -267,7 +406,7 @@ static void drm_sched_job_done(struct drm_sched_job *s_job, int result)
 	struct drm_sched_fence *s_fence = s_job->s_fence;
 	struct drm_gpu_scheduler *sched = s_fence->sched;
 
-	atomic_dec(&sched->hw_rq_count);
+	atomic_sub(s_job->credits, &sched->credit_count);
 	atomic_dec(sched->score);
 
 	trace_drm_sched_process_job(s_fence);
@@ -275,7 +414,7 @@ static void drm_sched_job_done(struct drm_sched_job *s_job, int result)
 	dma_fence_get(&s_fence->finished);
 	drm_sched_fence_finished(s_fence, result);
 	dma_fence_put(&s_fence->finished);
-	wake_up_interruptible(&sched->wake_up_worker);
+	__drm_sched_run_free_queue(sched);
 }
 
 /**
@@ -299,11 +438,36 @@ static void drm_sched_job_done_cb(struct dma_fence *f, struct dma_fence_cb *cb)
  */
 static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
 {
+	lockdep_assert_held(&sched->job_list_lock);
+
 	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
 	    !list_empty(&sched->pending_list))
-		queue_delayed_work(sched->timeout_wq, &sched->work_tdr, sched->timeout);
+		mod_delayed_work(sched->timeout_wq, &sched->work_tdr, sched->timeout);
 }
 
+static void drm_sched_start_timeout_unlocked(struct drm_gpu_scheduler *sched)
+{
+	spin_lock(&sched->job_list_lock);
+	drm_sched_start_timeout(sched);
+	spin_unlock(&sched->job_list_lock);
+}
+
+/**
+ * drm_sched_tdr_queue_imm: - immediately start job timeout handler
+ *
+ * @sched: scheduler for which the timeout handling should be started.
+ *
+ * Start timeout handling immediately for the named scheduler.
+ */
+void drm_sched_tdr_queue_imm(struct drm_gpu_scheduler *sched)
+{
+	spin_lock(&sched->job_list_lock);
+	sched->timeout = 0;
+	drm_sched_start_timeout(sched);
+	spin_unlock(&sched->job_list_lock);
+}
+EXPORT_SYMBOL(drm_sched_tdr_queue_imm);
+
 /**
  * drm_sched_fault - immediately start timeout handler
  *
@@ -388,7 +552,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
 
 	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
 
-	/* Protects against concurrent deletion in drm_sched_get_cleanup_job */
+	/* Protects against concurrent deletion in drm_sched_get_finished_job */
 	spin_lock(&sched->job_list_lock);
 	job = list_first_entry_or_null(&sched->pending_list,
 				       struct drm_sched_job, list);
@@ -416,11 +580,8 @@ static void drm_sched_job_timedout(struct work_struct *work)
 		spin_unlock(&sched->job_list_lock);
 	}
 
-	if (status != DRM_GPU_SCHED_STAT_ENODEV) {
-		spin_lock(&sched->job_list_lock);
-		drm_sched_start_timeout(sched);
-		spin_unlock(&sched->job_list_lock);
-	}
+	if (status != DRM_GPU_SCHED_STAT_ENODEV)
+		drm_sched_start_timeout_unlocked(sched);
 }
 
 /**
@@ -439,13 +600,13 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
 {
 	struct drm_sched_job *s_job, *tmp;
 
-	kthread_park(sched->thread);
+	drm_sched_wqueue_stop(sched);
 
 	/*
 	 * Reinsert back the bad job here - now it's safe as
-	 * drm_sched_get_cleanup_job cannot race against us and release the
+	 * drm_sched_get_finished_job cannot race against us and release the
 	 * bad job at this point - we parked (waited for) any in progress
-	 * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
+	 * (earlier) cleanups and drm_sched_get_finished_job will not be called
 	 * now until the scheduler thread is unparked.
 	 */
 	if (bad && bad->sched == sched)
@@ -468,7 +629,7 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
 					      &s_job->cb)) {
 			dma_fence_put(s_job->s_fence->parent);
 			s_job->s_fence->parent = NULL;
-			atomic_dec(&sched->hw_rq_count);
+			atomic_sub(s_job->credits, &sched->credit_count);
 		} else {
 			/*
 			 * remove job from pending_list.
@@ -529,7 +690,7 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
 	list_for_each_entry_safe(s_job, tmp, &sched->pending_list, list) {
 		struct dma_fence *fence = s_job->s_fence->parent;
 
-		atomic_inc(&sched->hw_rq_count);
+		atomic_add(s_job->credits, &sched->credit_count);
 
 		if (!full_recovery)
 			continue;
@@ -546,13 +707,10 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
 			drm_sched_job_done(s_job, -ECANCELED);
 	}
 
-	if (full_recovery) {
-		spin_lock(&sched->job_list_lock);
-		drm_sched_start_timeout(sched);
-		spin_unlock(&sched->job_list_lock);
-	}
+	if (full_recovery)
+		drm_sched_start_timeout_unlocked(sched);
 
-	kthread_unpark(sched->thread);
+	drm_sched_wqueue_start(sched);
 }
 EXPORT_SYMBOL(drm_sched_start);
 
@@ -613,6 +771,8 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs);
  * drm_sched_job_init - init a scheduler job
  * @job: scheduler job to init
  * @entity: scheduler entity to use
+ * @credits: the number of credits this job contributes to the schedulers
+ * credit limit
  * @owner: job owner for debugging
  *
  * Refer to drm_sched_entity_push_job() documentation
@@ -630,7 +790,7 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs);
  */
 int drm_sched_job_init(struct drm_sched_job *job,
 		       struct drm_sched_entity *entity,
-		       void *owner)
+		       u32 credits, void *owner)
 {
 	if (!entity->rq) {
 		/* This will most likely be followed by missing frames
@@ -641,7 +801,13 @@ int drm_sched_job_init(struct drm_sched_job *job,
 		return -ENOENT;
 	}
 
+	if (unlikely(!credits)) {
+		pr_err("*ERROR* %s: credits cannot be 0!\n", __func__);
+		return -EINVAL;
+	}
+
 	job->entity = entity;
+	job->credits = credits;
 	job->s_fence = drm_sched_fence_alloc(entity, owner);
 	if (!job->s_fence)
 		return -ENOMEM;
@@ -854,27 +1020,18 @@ void drm_sched_job_cleanup(struct drm_sched_job *job)
 EXPORT_SYMBOL(drm_sched_job_cleanup);
 
 /**
- * drm_sched_can_queue -- Can we queue more to the hardware?
- * @sched: scheduler instance
- *
- * Return true if we can push more jobs to the hw, otherwise false.
- */
-static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched)
-{
-	return atomic_read(&sched->hw_rq_count) <
-		sched->hw_submission_limit;
-}
-
-/**
- * drm_sched_wakeup_if_can_queue - Wake up the scheduler
+ * drm_sched_wakeup - Wake up the scheduler if it is ready to queue
  * @sched: scheduler instance
+ * @entity: the scheduler entity
  *
  * Wake up the scheduler if we can queue jobs.
  */
-void drm_sched_wakeup_if_can_queue(struct drm_gpu_scheduler *sched)
+void drm_sched_wakeup(struct drm_gpu_scheduler *sched,
+		      struct drm_sched_entity *entity)
 {
-	if (drm_sched_can_queue(sched))
-		wake_up_interruptible(&sched->wake_up_worker);
+	if (drm_sched_entity_is_ready(entity))
+		if (drm_sched_can_queue(sched, entity))
+			drm_sched_run_job_queue(sched);
 }
 
 /**
@@ -882,7 +1039,11 @@ void drm_sched_wakeup_if_can_queue(struct drm_gpu_scheduler *sched)
  *
  * @sched: scheduler instance
  *
- * Returns the entity to process or NULL if none are found.
+ * Return an entity to process or NULL if none are found.
+ *
+ * Note, that we break out of the for-loop when "entity" is non-null, which can
+ * also be an error-pointer--this assures we don't process lower priority
+ * run-queues. See comments in the respectively called functions.
  */
 static struct drm_sched_entity *
 drm_sched_select_entity(struct drm_gpu_scheduler *sched)
@@ -890,23 +1051,20 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
 	struct drm_sched_entity *entity;
 	int i;
 
-	if (!drm_sched_can_queue(sched))
-		return NULL;
-
 	/* Kernel run queue has higher priority than normal run queue*/
 	for (i = sched->num_rqs - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
 		entity = drm_sched_policy == DRM_SCHED_POLICY_FIFO ?
-			drm_sched_rq_select_entity_fifo(sched->sched_rq[i]) :
-			drm_sched_rq_select_entity_rr(sched->sched_rq[i]);
+			drm_sched_rq_select_entity_fifo(sched, sched->sched_rq[i]) :
+			drm_sched_rq_select_entity_rr(sched, sched->sched_rq[i]);
 		if (entity)
 			break;
 	}
 
-	return entity;
+	return IS_ERR(entity) ? NULL : entity;
 }
 
 /**
- * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed
+ * drm_sched_get_finished_job - fetch the next finished job to be destroyed
  *
  * @sched: scheduler instance
  *
@@ -914,7 +1072,7 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
  * ready for it to be destroyed.
  */
 static struct drm_sched_job *
-drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
+drm_sched_get_finished_job(struct drm_gpu_scheduler *sched)
 {
 	struct drm_sched_job *job, *next;
 
@@ -934,8 +1092,10 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
 						typeof(*next), list);
 
 		if (next) {
-			next->s_fence->scheduled.timestamp =
-				dma_fence_timestamp(&job->s_fence->finished);
+			if (test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
+				     &next->s_fence->scheduled.flags))
+				next->s_fence->scheduled.timestamp =
+					dma_fence_timestamp(&job->s_fence->finished);
 			/* start TO timer for next job */
 			drm_sched_start_timeout(sched);
 		}
@@ -985,91 +1145,82 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
 EXPORT_SYMBOL(drm_sched_pick_best);
 
 /**
- * drm_sched_blocked - check if the scheduler is blocked
+ * drm_sched_free_job_work - worker to call free_job
  *
- * @sched: scheduler instance
- *
- * Returns true if blocked, otherwise false.
+ * @w: free job work
  */
-static bool drm_sched_blocked(struct drm_gpu_scheduler *sched)
+static void drm_sched_free_job_work(struct work_struct *w)
 {
-	if (kthread_should_park()) {
-		kthread_parkme();
-		return true;
-	}
+	struct drm_gpu_scheduler *sched =
+		container_of(w, struct drm_gpu_scheduler, work_free_job);
+	struct drm_sched_job *job;
 
-	return false;
+	if (READ_ONCE(sched->pause_submit))
+		return;
+
+	job = drm_sched_get_finished_job(sched);
+	if (job)
+		sched->ops->free_job(job);
+
+	drm_sched_run_free_queue(sched);
+	drm_sched_run_job_queue(sched);
 }
 
 /**
- * drm_sched_main - main scheduler thread
+ * drm_sched_run_job_work - worker to call run_job
  *
- * @param: scheduler instance
- *
- * Returns 0.
+ * @w: run job work
  */
-static int drm_sched_main(void *param)
+static void drm_sched_run_job_work(struct work_struct *w)
 {
-	struct drm_gpu_scheduler *sched = (struct drm_gpu_scheduler *)param;
+	struct drm_gpu_scheduler *sched =
+		container_of(w, struct drm_gpu_scheduler, work_run_job);
+	struct drm_sched_entity *entity;
+	struct dma_fence *fence;
+	struct drm_sched_fence *s_fence;
+	struct drm_sched_job *sched_job;
 	int r;
 
-	sched_set_fifo_low(current);
+	if (READ_ONCE(sched->pause_submit))
+		return;
 
-	while (!kthread_should_stop()) {
-		struct drm_sched_entity *entity = NULL;
-		struct drm_sched_fence *s_fence;
-		struct drm_sched_job *sched_job;
-		struct dma_fence *fence;
-		struct drm_sched_job *cleanup_job = NULL;
+	entity = drm_sched_select_entity(sched);
+	if (!entity)
+		return;
 
-		wait_event_interruptible(sched->wake_up_worker,
-					 (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
-					 (!drm_sched_blocked(sched) &&
-					  (entity = drm_sched_select_entity(sched))) ||
-					 kthread_should_stop());
-
-		if (cleanup_job)
-			sched->ops->free_job(cleanup_job);
-
-		if (!entity)
-			continue;
-
-		sched_job = drm_sched_entity_pop_job(entity);
-
-		if (!sched_job) {
-			complete_all(&entity->entity_idle);
-			continue;
-		}
-
-		s_fence = sched_job->s_fence;
-
-		atomic_inc(&sched->hw_rq_count);
-		drm_sched_job_begin(sched_job);
-
-		trace_drm_run_job(sched_job, entity);
-		fence = sched->ops->run_job(sched_job);
+	sched_job = drm_sched_entity_pop_job(entity);
+	if (!sched_job) {
 		complete_all(&entity->entity_idle);
-		drm_sched_fence_scheduled(s_fence, fence);
-
-		if (!IS_ERR_OR_NULL(fence)) {
-			/* Drop for original kref_init of the fence */
-			dma_fence_put(fence);
-
-			r = dma_fence_add_callback(fence, &sched_job->cb,
-						   drm_sched_job_done_cb);
-			if (r == -ENOENT)
-				drm_sched_job_done(sched_job, fence->error);
-			else if (r)
-				DRM_DEV_ERROR(sched->dev, "fence add callback failed (%d)\n",
-					  r);
-		} else {
-			drm_sched_job_done(sched_job, IS_ERR(fence) ?
-					   PTR_ERR(fence) : 0);
-		}
-
-		wake_up(&sched->job_scheduled);
+		return;	/* No more work */
 	}
-	return 0;
+
+	s_fence = sched_job->s_fence;
+
+	atomic_add(sched_job->credits, &sched->credit_count);
+	drm_sched_job_begin(sched_job);
+
+	trace_drm_run_job(sched_job, entity);
+	fence = sched->ops->run_job(sched_job);
+	complete_all(&entity->entity_idle);
+	drm_sched_fence_scheduled(s_fence, fence);
+
+	if (!IS_ERR_OR_NULL(fence)) {
+		/* Drop for original kref_init of the fence */
+		dma_fence_put(fence);
+
+		r = dma_fence_add_callback(fence, &sched_job->cb,
+					   drm_sched_job_done_cb);
+		if (r == -ENOENT)
+			drm_sched_job_done(sched_job, fence->error);
+		else if (r)
+			DRM_DEV_ERROR(sched->dev, "fence add callback failed (%d)\n", r);
+	} else {
+		drm_sched_job_done(sched_job, IS_ERR(fence) ?
+				   PTR_ERR(fence) : 0);
+	}
+
+	wake_up(&sched->job_scheduled);
+	drm_sched_run_job_queue(sched);
 }
 
 /**
@@ -1077,8 +1228,10 @@ static int drm_sched_main(void *param)
  *
  * @sched: scheduler instance
  * @ops: backend operations for this scheduler
+ * @submit_wq: workqueue to use for submission. If NULL, an ordered wq is
+ *	       allocated and used
  * @num_rqs: number of runqueues, one for each priority, up to DRM_SCHED_PRIORITY_COUNT
- * @hw_submission: number of hw submissions that can be in flight
+ * @credit_limit: the number of credits this scheduler can hold from all jobs
  * @hang_limit: number of times to allow a job to hang before dropping it
  * @timeout: timeout value in jiffies for the scheduler
  * @timeout_wq: workqueue to use for timeout work. If NULL, the system_wq is
@@ -1091,14 +1244,15 @@ static int drm_sched_main(void *param)
  */
 int drm_sched_init(struct drm_gpu_scheduler *sched,
 		   const struct drm_sched_backend_ops *ops,
-		   u32 num_rqs, uint32_t hw_submission, unsigned int hang_limit,
+		   struct workqueue_struct *submit_wq,
+		   u32 num_rqs, u32 credit_limit, unsigned int hang_limit,
 		   long timeout, struct workqueue_struct *timeout_wq,
 		   atomic_t *score, const char *name, struct device *dev)
 {
 	int i, ret;
 
 	sched->ops = ops;
-	sched->hw_submission_limit = hw_submission;
+	sched->credit_limit = credit_limit;
 	sched->name = name;
 	sched->timeout = timeout;
 	sched->timeout_wq = timeout_wq ? : system_wq;
@@ -1121,14 +1275,22 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
 		return 0;
 	}
 
+	if (submit_wq) {
+		sched->submit_wq = submit_wq;
+		sched->own_submit_wq = false;
+	} else {
+		sched->submit_wq = alloc_ordered_workqueue(name, 0);
+		if (!sched->submit_wq)
+			return -ENOMEM;
+
+		sched->own_submit_wq = true;
+	}
+	ret = -ENOMEM;
 	sched->sched_rq = kmalloc_array(num_rqs, sizeof(*sched->sched_rq),
 					GFP_KERNEL | __GFP_ZERO);
-	if (!sched->sched_rq) {
-		drm_err(sched, "%s: out of memory for sched_rq\n", __func__);
-		return -ENOMEM;
-	}
+	if (!sched->sched_rq)
+		goto Out_free;
 	sched->num_rqs = num_rqs;
-	ret = -ENOMEM;
 	for (i = DRM_SCHED_PRIORITY_MIN; i < sched->num_rqs; i++) {
 		sched->sched_rq[i] = kzalloc(sizeof(*sched->sched_rq[i]), GFP_KERNEL);
 		if (!sched->sched_rq[i])
@@ -1136,31 +1298,27 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
 		drm_sched_rq_init(sched, sched->sched_rq[i]);
 	}
 
-	init_waitqueue_head(&sched->wake_up_worker);
 	init_waitqueue_head(&sched->job_scheduled);
 	INIT_LIST_HEAD(&sched->pending_list);
 	spin_lock_init(&sched->job_list_lock);
-	atomic_set(&sched->hw_rq_count, 0);
+	atomic_set(&sched->credit_count, 0);
 	INIT_DELAYED_WORK(&sched->work_tdr, drm_sched_job_timedout);
+	INIT_WORK(&sched->work_run_job, drm_sched_run_job_work);
+	INIT_WORK(&sched->work_free_job, drm_sched_free_job_work);
 	atomic_set(&sched->_score, 0);
 	atomic64_set(&sched->job_id_count, 0);
-
-	/* Each scheduler will run on a seperate kernel thread */
-	sched->thread = kthread_run(drm_sched_main, sched, sched->name);
-	if (IS_ERR(sched->thread)) {
-		ret = PTR_ERR(sched->thread);
-		sched->thread = NULL;
-		DRM_DEV_ERROR(sched->dev, "Failed to create scheduler for %s.\n", name);
-		goto Out_unroll;
-	}
+	sched->pause_submit = false;
 
 	sched->ready = true;
 	return 0;
 Out_unroll:
 	for (--i ; i >= DRM_SCHED_PRIORITY_MIN; i--)
 		kfree(sched->sched_rq[i]);
+Out_free:
 	kfree(sched->sched_rq);
 	sched->sched_rq = NULL;
+	if (sched->own_submit_wq)
+		destroy_workqueue(sched->submit_wq);
 	drm_err(sched, "%s: Failed to setup GPU scheduler--out of memory\n", __func__);
 	return ret;
 }
@@ -1178,8 +1336,7 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
 	struct drm_sched_entity *s_entity;
 	int i;
 
-	if (sched->thread)
-		kthread_stop(sched->thread);
+	drm_sched_wqueue_stop(sched);
 
 	for (i = sched->num_rqs - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
 		struct drm_sched_rq *rq = sched->sched_rq[i];
@@ -1202,6 +1359,8 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
 	/* Confirm no work left behind accessing device structures */
 	cancel_delayed_work_sync(&sched->work_tdr);
 
+	if (sched->own_submit_wq)
+		destroy_workqueue(sched->submit_wq);
 	sched->ready = false;
 	kfree(sched->sched_rq);
 	sched->sched_rq = NULL;
@@ -1252,3 +1411,42 @@ void drm_sched_increase_karma(struct drm_sched_job *bad)
 	}
 }
 EXPORT_SYMBOL(drm_sched_increase_karma);
+
+/**
+ * drm_sched_wqueue_ready - Is the scheduler ready for submission
+ *
+ * @sched: scheduler instance
+ *
+ * Returns true if submission is ready
+ */
+bool drm_sched_wqueue_ready(struct drm_gpu_scheduler *sched)
+{
+	return sched->ready;
+}
+EXPORT_SYMBOL(drm_sched_wqueue_ready);
+
+/**
+ * drm_sched_wqueue_stop - stop scheduler submission
+ *
+ * @sched: scheduler instance
+ */
+void drm_sched_wqueue_stop(struct drm_gpu_scheduler *sched)
+{
+	WRITE_ONCE(sched->pause_submit, true);
+	cancel_work_sync(&sched->work_run_job);
+	cancel_work_sync(&sched->work_free_job);
+}
+EXPORT_SYMBOL(drm_sched_wqueue_stop);
+
+/**
+ * drm_sched_wqueue_start - start scheduler submission
+ *
+ * @sched: scheduler instance
+ */
+void drm_sched_wqueue_start(struct drm_gpu_scheduler *sched)
+{
+	WRITE_ONCE(sched->pause_submit, false);
+	queue_work(sched->submit_wq, &sched->work_run_job);
+	queue_work(sched->submit_wq, &sched->work_free_job);
+}
+EXPORT_SYMBOL(drm_sched_wqueue_start);
diff --git a/drivers/gpu/drm/solomon/ssd130x.c b/drivers/gpu/drm/solomon/ssd130x.c
index e0174f82e353..bef293922b98 100644
--- a/drivers/gpu/drm/solomon/ssd130x.c
+++ b/drivers/gpu/drm/solomon/ssd130x.c
@@ -808,7 +808,8 @@ static void ssd132x_clear_screen(struct ssd130x_device *ssd130x, u8 *data_array)
 static int ssd130x_fb_blit_rect(struct drm_framebuffer *fb,
 				const struct iosys_map *vmap,
 				struct drm_rect *rect,
-				u8 *buf, u8 *data_array)
+				u8 *buf, u8 *data_array,
+				struct drm_format_conv_state *fmtcnv_state)
 {
 	struct ssd130x_device *ssd130x = drm_to_ssd130x(fb->dev);
 	struct iosys_map dst;
@@ -826,7 +827,7 @@ static int ssd130x_fb_blit_rect(struct drm_framebuffer *fb,
 		return ret;
 
 	iosys_map_set_vaddr(&dst, buf);
-	drm_fb_xrgb8888_to_mono(&dst, &dst_pitch, vmap, fb, rect);
+	drm_fb_xrgb8888_to_mono(&dst, &dst_pitch, vmap, fb, rect, fmtcnv_state);
 
 	drm_gem_fb_end_cpu_access(fb, DMA_FROM_DEVICE);
 
@@ -838,7 +839,8 @@ static int ssd130x_fb_blit_rect(struct drm_framebuffer *fb,
 static int ssd132x_fb_blit_rect(struct drm_framebuffer *fb,
 				const struct iosys_map *vmap,
 				struct drm_rect *rect, u8 *buf,
-				u8 *data_array)
+				u8 *data_array,
+				struct drm_format_conv_state *fmtcnv_state)
 {
 	struct ssd130x_device *ssd130x = drm_to_ssd130x(fb->dev);
 	unsigned int dst_pitch = drm_rect_width(rect);
@@ -855,7 +857,7 @@ static int ssd132x_fb_blit_rect(struct drm_framebuffer *fb,
 		return ret;
 
 	iosys_map_set_vaddr(&dst, buf);
-	drm_fb_xrgb8888_to_gray8(&dst, &dst_pitch, vmap, fb, rect);
+	drm_fb_xrgb8888_to_gray8(&dst, &dst_pitch, vmap, fb, rect, fmtcnv_state);
 
 	drm_gem_fb_end_cpu_access(fb, DMA_FROM_DEVICE);
 
@@ -871,6 +873,7 @@ static int ssd130x_primary_plane_atomic_check(struct drm_plane *plane,
 	struct ssd130x_device *ssd130x = drm_to_ssd130x(drm);
 	struct drm_plane_state *plane_state = drm_atomic_get_new_plane_state(state, plane);
 	struct ssd130x_plane_state *ssd130x_state = to_ssd130x_plane_state(plane_state);
+	struct drm_shadow_plane_state *shadow_plane_state = &ssd130x_state->base;
 	struct drm_crtc *crtc = plane_state->crtc;
 	struct drm_crtc_state *crtc_state = NULL;
 	const struct drm_format_info *fi;
@@ -895,6 +898,16 @@ static int ssd130x_primary_plane_atomic_check(struct drm_plane *plane,
 
 	pitch = drm_format_info_min_pitch(fi, 0, ssd130x->width);
 
+	if (plane_state->fb->format != fi) {
+		void *buf;
+
+		/* format conversion necessary; reserve buffer */
+		buf = drm_format_conv_state_reserve(&shadow_plane_state->fmtcnv_state,
+						    pitch, GFP_KERNEL);
+		if (!buf)
+			return -ENOMEM;
+	}
+
 	ssd130x_state->buffer = kcalloc(pitch, ssd130x->height, GFP_KERNEL);
 	if (!ssd130x_state->buffer)
 		return -ENOMEM;
@@ -909,6 +922,7 @@ static int ssd132x_primary_plane_atomic_check(struct drm_plane *plane,
 	struct ssd130x_device *ssd130x = drm_to_ssd130x(drm);
 	struct drm_plane_state *plane_state = drm_atomic_get_new_plane_state(state, plane);
 	struct ssd130x_plane_state *ssd130x_state = to_ssd130x_plane_state(plane_state);
+	struct drm_shadow_plane_state *shadow_plane_state = &ssd130x_state->base;
 	struct drm_crtc *crtc = plane_state->crtc;
 	struct drm_crtc_state *crtc_state = NULL;
 	const struct drm_format_info *fi;
@@ -933,6 +947,16 @@ static int ssd132x_primary_plane_atomic_check(struct drm_plane *plane,
 
 	pitch = drm_format_info_min_pitch(fi, 0, ssd130x->width);
 
+	if (plane_state->fb->format != fi) {
+		void *buf;
+
+		/* format conversion necessary; reserve buffer */
+		buf = drm_format_conv_state_reserve(&shadow_plane_state->fmtcnv_state,
+						    pitch, GFP_KERNEL);
+		if (!buf)
+			return -ENOMEM;
+	}
+
 	ssd130x_state->buffer = kcalloc(pitch, ssd130x->height, GFP_KERNEL);
 	if (!ssd130x_state->buffer)
 		return -ENOMEM;
@@ -968,7 +992,8 @@ static void ssd130x_primary_plane_atomic_update(struct drm_plane *plane,
 
 		ssd130x_fb_blit_rect(fb, &shadow_plane_state->data[0], &dst_clip,
 				     ssd130x_plane_state->buffer,
-				     ssd130x_crtc_state->data_array);
+				     ssd130x_crtc_state->data_array,
+				     &shadow_plane_state->fmtcnv_state);
 	}
 
 	drm_dev_exit(idx);
@@ -1002,7 +1027,8 @@ static void ssd132x_primary_plane_atomic_update(struct drm_plane *plane,
 
 		ssd132x_fb_blit_rect(fb, &shadow_plane_state->data[0], &dst_clip,
 				     ssd130x_plane_state->buffer,
-				     ssd130x_crtc_state->data_array);
+				     ssd130x_crtc_state->data_array,
+				     &shadow_plane_state->fmtcnv_state);
 	}
 
 	drm_dev_exit(idx);
diff --git a/drivers/gpu/drm/tegra/hdmi.c b/drivers/gpu/drm/tegra/hdmi.c
index 0ba3ca3ac509..a1fcee665023 100644
--- a/drivers/gpu/drm/tegra/hdmi.c
+++ b/drivers/gpu/drm/tegra/hdmi.c
@@ -24,6 +24,7 @@
 #include <drm/drm_atomic_helper.h>
 #include <drm/drm_crtc.h>
 #include <drm/drm_debugfs.h>
+#include <drm/drm_eld.h>
 #include <drm/drm_file.h>
 #include <drm/drm_fourcc.h>
 #include <drm/drm_probe_helper.h>
diff --git a/drivers/gpu/drm/tegra/sor.c b/drivers/gpu/drm/tegra/sor.c
index d5a3d3f4fece..83341576630d 100644
--- a/drivers/gpu/drm/tegra/sor.c
+++ b/drivers/gpu/drm/tegra/sor.c
@@ -20,6 +20,7 @@
 #include <drm/display/drm_scdc_helper.h>
 #include <drm/drm_atomic_helper.h>
 #include <drm/drm_debugfs.h>
+#include <drm/drm_eld.h>
 #include <drm/drm_file.h>
 #include <drm/drm_panel.h>
 #include <drm/drm_simple_kms_helper.h>
diff --git a/drivers/gpu/drm/tests/Makefile b/drivers/gpu/drm/tests/Makefile
index ba7baa622675..2645af241ff0 100644
--- a/drivers/gpu/drm/tests/Makefile
+++ b/drivers/gpu/drm/tests/Makefile
@@ -9,6 +9,7 @@ obj-$(CONFIG_DRM_KUNIT_TEST) += \
 	drm_connector_test.o \
 	drm_damage_helper_test.o \
 	drm_dp_mst_helper_test.o \
+	drm_exec_test.o \
 	drm_format_helper_test.o \
 	drm_format_test.o \
 	drm_framebuffer_test.o \
@@ -17,7 +18,6 @@ obj-$(CONFIG_DRM_KUNIT_TEST) += \
 	drm_modes_test.o \
 	drm_plane_helper_test.o \
 	drm_probe_helper_test.o \
-	drm_rect_test.o	\
-	drm_exec_test.o
+	drm_rect_test.o
 
 CFLAGS_drm_mm_test.o := $(DISABLE_STRUCTLEAK_PLUGIN)
diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c b/drivers/gpu/drm/tests/drm_buddy_test.c
index 09ee6f6af896..ea2af6bd9abe 100644
--- a/drivers/gpu/drm/tests/drm_buddy_test.c
+++ b/drivers/gpu/drm/tests/drm_buddy_test.c
@@ -13,315 +13,11 @@
 
 #include "../lib/drm_random.h"
 
-#define TIMEOUT(name__)								\
-	unsigned long name__ = jiffies + MAX_SCHEDULE_TIMEOUT
-
-static unsigned int random_seed;
-
 static inline u64 get_size(int order, u64 chunk_size)
 {
 	return (1 << order) * chunk_size;
 }
 
-__printf(2, 3)
-static bool __timeout(unsigned long timeout, const char *fmt, ...)
-{
-	va_list va;
-
-	if (!signal_pending(current)) {
-		cond_resched();
-		if (time_before(jiffies, timeout))
-			return false;
-	}
-
-	if (fmt) {
-		va_start(va, fmt);
-		vprintk(fmt, va);
-		va_end(va);
-	}
-
-	return true;
-}
-
-static void __dump_block(struct kunit *test, struct drm_buddy *mm,
-			 struct drm_buddy_block *block, bool buddy)
-{
-	kunit_err(test, "block info: header=%llx, state=%u, order=%d, offset=%llx size=%llx root=%d buddy=%d\n",
-		  block->header, drm_buddy_block_state(block),
-			  drm_buddy_block_order(block), drm_buddy_block_offset(block),
-			  drm_buddy_block_size(mm, block), !block->parent, buddy);
-}
-
-static void dump_block(struct kunit *test, struct drm_buddy *mm,
-		       struct drm_buddy_block *block)
-{
-	struct drm_buddy_block *buddy;
-
-	__dump_block(test, mm, block, false);
-
-	buddy = drm_get_buddy(block);
-	if (buddy)
-		__dump_block(test, mm, buddy, true);
-}
-
-static int check_block(struct kunit *test, struct drm_buddy *mm,
-		       struct drm_buddy_block *block)
-{
-	struct drm_buddy_block *buddy;
-	unsigned int block_state;
-	u64 block_size;
-	u64 offset;
-	int err = 0;
-
-	block_state = drm_buddy_block_state(block);
-
-	if (block_state != DRM_BUDDY_ALLOCATED &&
-	    block_state != DRM_BUDDY_FREE && block_state != DRM_BUDDY_SPLIT) {
-		kunit_err(test, "block state mismatch\n");
-		err = -EINVAL;
-	}
-
-	block_size = drm_buddy_block_size(mm, block);
-	offset = drm_buddy_block_offset(block);
-
-	if (block_size < mm->chunk_size) {
-		kunit_err(test, "block size smaller than min size\n");
-		err = -EINVAL;
-	}
-
-	/* We can't use is_power_of_2() for a u64 on 32-bit systems. */
-	if (block_size & (block_size - 1)) {
-		kunit_err(test, "block size not power of two\n");
-		err = -EINVAL;
-	}
-
-	if (!IS_ALIGNED(block_size, mm->chunk_size)) {
-		kunit_err(test, "block size not aligned to min size\n");
-		err = -EINVAL;
-	}
-
-	if (!IS_ALIGNED(offset, mm->chunk_size)) {
-		kunit_err(test, "block offset not aligned to min size\n");
-		err = -EINVAL;
-	}
-
-	if (!IS_ALIGNED(offset, block_size)) {
-		kunit_err(test, "block offset not aligned to block size\n");
-		err = -EINVAL;
-	}
-
-	buddy = drm_get_buddy(block);
-
-	if (!buddy && block->parent) {
-		kunit_err(test, "buddy has gone fishing\n");
-		err = -EINVAL;
-	}
-
-	if (buddy) {
-		if (drm_buddy_block_offset(buddy) != (offset ^ block_size)) {
-			kunit_err(test, "buddy has wrong offset\n");
-			err = -EINVAL;
-		}
-
-		if (drm_buddy_block_size(mm, buddy) != block_size) {
-			kunit_err(test, "buddy size mismatch\n");
-			err = -EINVAL;
-		}
-
-		if (drm_buddy_block_state(buddy) == block_state &&
-		    block_state == DRM_BUDDY_FREE) {
-			kunit_err(test, "block and its buddy are free\n");
-			err = -EINVAL;
-		}
-	}
-
-	return err;
-}
-
-static int check_blocks(struct kunit *test, struct drm_buddy *mm,
-			struct list_head *blocks, u64 expected_size, bool is_contiguous)
-{
-	struct drm_buddy_block *block;
-	struct drm_buddy_block *prev;
-	u64 total;
-	int err = 0;
-
-	block = NULL;
-	prev = NULL;
-	total = 0;
-
-	list_for_each_entry(block, blocks, link) {
-		err = check_block(test, mm, block);
-
-		if (!drm_buddy_block_is_allocated(block)) {
-			kunit_err(test, "block not allocated\n");
-			err = -EINVAL;
-		}
-
-		if (is_contiguous && prev) {
-			u64 prev_block_size;
-			u64 prev_offset;
-			u64 offset;
-
-			prev_offset = drm_buddy_block_offset(prev);
-			prev_block_size = drm_buddy_block_size(mm, prev);
-			offset = drm_buddy_block_offset(block);
-
-			if (offset != (prev_offset + prev_block_size)) {
-				kunit_err(test, "block offset mismatch\n");
-				err = -EINVAL;
-			}
-		}
-
-		if (err)
-			break;
-
-		total += drm_buddy_block_size(mm, block);
-		prev = block;
-	}
-
-	if (!err) {
-		if (total != expected_size) {
-			kunit_err(test, "size mismatch, expected=%llx, found=%llx\n",
-				  expected_size, total);
-			err = -EINVAL;
-		}
-		return err;
-	}
-
-	if (prev) {
-		kunit_err(test, "prev block, dump:\n");
-		dump_block(test, mm, prev);
-	}
-
-	kunit_err(test, "bad block, dump:\n");
-	dump_block(test, mm, block);
-
-	return err;
-}
-
-static int check_mm(struct kunit *test, struct drm_buddy *mm)
-{
-	struct drm_buddy_block *root;
-	struct drm_buddy_block *prev;
-	unsigned int i;
-	u64 total;
-	int err = 0;
-
-	if (!mm->n_roots) {
-		kunit_err(test, "n_roots is zero\n");
-		return -EINVAL;
-	}
-
-	if (mm->n_roots != hweight64(mm->size)) {
-		kunit_err(test, "n_roots mismatch, n_roots=%u, expected=%lu\n",
-			  mm->n_roots, hweight64(mm->size));
-		return -EINVAL;
-	}
-
-	root = NULL;
-	prev = NULL;
-	total = 0;
-
-	for (i = 0; i < mm->n_roots; ++i) {
-		struct drm_buddy_block *block;
-		unsigned int order;
-
-		root = mm->roots[i];
-		if (!root) {
-			kunit_err(test, "root(%u) is NULL\n", i);
-			err = -EINVAL;
-			break;
-		}
-
-		err = check_block(test, mm, root);
-
-		if (!drm_buddy_block_is_free(root)) {
-			kunit_err(test, "root not free\n");
-			err = -EINVAL;
-		}
-
-		order = drm_buddy_block_order(root);
-
-		if (!i) {
-			if (order != mm->max_order) {
-				kunit_err(test, "max order root missing\n");
-				err = -EINVAL;
-			}
-		}
-
-		if (prev) {
-			u64 prev_block_size;
-			u64 prev_offset;
-			u64 offset;
-
-			prev_offset = drm_buddy_block_offset(prev);
-			prev_block_size = drm_buddy_block_size(mm, prev);
-			offset = drm_buddy_block_offset(root);
-
-			if (offset != (prev_offset + prev_block_size)) {
-				kunit_err(test, "root offset mismatch\n");
-				err = -EINVAL;
-			}
-		}
-
-		block = list_first_entry_or_null(&mm->free_list[order],
-						 struct drm_buddy_block, link);
-		if (block != root) {
-			kunit_err(test, "root mismatch at order=%u\n", order);
-			err = -EINVAL;
-		}
-
-		if (err)
-			break;
-
-		prev = root;
-		total += drm_buddy_block_size(mm, root);
-	}
-
-	if (!err) {
-		if (total != mm->size) {
-			kunit_err(test, "expected mm size=%llx, found=%llx\n",
-				  mm->size, total);
-			err = -EINVAL;
-		}
-		return err;
-	}
-
-	if (prev) {
-		kunit_err(test, "prev root(%u), dump:\n", i - 1);
-		dump_block(test, mm, prev);
-	}
-
-	if (root) {
-		kunit_err(test, "bad root(%u), dump:\n", i);
-		dump_block(test, mm, root);
-	}
-
-	return err;
-}
-
-static void mm_config(u64 *size, u64 *chunk_size)
-{
-	DRM_RND_STATE(prng, random_seed);
-	u32 s, ms;
-
-	/* Nothing fancy, just try to get an interesting bit pattern */
-
-	prandom_seed_state(&prng, random_seed);
-
-	/* Let size be a random number of pages up to 8 GB (2M pages) */
-	s = 1 + drm_prandom_u32_max_state((BIT(33 - 12)) - 1, &prng);
-	/* Let the chunk size be a random power of 2 less than size */
-	ms = BIT(drm_prandom_u32_max_state(ilog2(s), &prng));
-	/* Round size down to the chunk size */
-	s &= -ms;
-
-	/* Convert from pages to bytes */
-	*chunk_size = (u64)ms << 12;
-	*size = (u64)s << 12;
-}
-
 static void drm_test_buddy_alloc_pathological(struct kunit *test)
 {
 	u64 mm_size, size, start = 0;
@@ -403,96 +99,6 @@ static void drm_test_buddy_alloc_pathological(struct kunit *test)
 	drm_buddy_fini(&mm);
 }
 
-static void drm_test_buddy_alloc_smoke(struct kunit *test)
-{
-	u64 mm_size, chunk_size, start = 0;
-	unsigned long flags = 0;
-	struct drm_buddy mm;
-	int *order;
-	int i;
-
-	DRM_RND_STATE(prng, random_seed);
-	TIMEOUT(end_time);
-
-	mm_config(&mm_size, &chunk_size);
-
-	KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, mm_size, chunk_size),
-			       "buddy_init failed\n");
-
-	order = drm_random_order(mm.max_order + 1, &prng);
-	KUNIT_ASSERT_TRUE(test, order);
-
-	for (i = 0; i <= mm.max_order; ++i) {
-		struct drm_buddy_block *block;
-		int max_order = order[i];
-		bool timeout = false;
-		LIST_HEAD(blocks);
-		u64 total, size;
-		LIST_HEAD(tmp);
-		int order, err;
-
-		KUNIT_ASSERT_FALSE_MSG(test, check_mm(test, &mm),
-				       "pre-mm check failed, abort\n");
-
-		order = max_order;
-		total = 0;
-
-		do {
-retry:
-			size = get_size(order, chunk_size);
-			err = drm_buddy_alloc_blocks(&mm, start, mm_size, size, size, &tmp, flags);
-			if (err) {
-				if (err == -ENOMEM) {
-					KUNIT_FAIL(test, "buddy_alloc hit -ENOMEM with order=%d\n",
-						   order);
-				} else {
-					if (order--) {
-						err = 0;
-						goto retry;
-					}
-
-					KUNIT_FAIL(test, "buddy_alloc with order=%d failed\n",
-						   order);
-				}
-
-				break;
-			}
-
-			block = list_first_entry_or_null(&tmp, struct drm_buddy_block, link);
-			KUNIT_ASSERT_TRUE_MSG(test, block, "alloc_blocks has no blocks\n");
-
-			list_move_tail(&block->link, &blocks);
-			KUNIT_EXPECT_EQ_MSG(test, drm_buddy_block_order(block), order,
-					    "buddy_alloc order mismatch\n");
-
-			total += drm_buddy_block_size(&mm, block);
-
-			if (__timeout(end_time, NULL)) {
-				timeout = true;
-				break;
-			}
-		} while (total < mm.size);
-
-		if (!err)
-			err = check_blocks(test, &mm, &blocks, total, false);
-
-		drm_buddy_free_list(&mm, &blocks);
-
-		if (!err) {
-			KUNIT_EXPECT_FALSE_MSG(test, check_mm(test, &mm),
-					       "post-mm check failed\n");
-		}
-
-		if (err || timeout)
-			break;
-
-		cond_resched();
-	}
-
-	kfree(order);
-	drm_buddy_fini(&mm);
-}
-
 static void drm_test_buddy_alloc_pessimistic(struct kunit *test)
 {
 	u64 mm_size, size, start = 0;
@@ -634,64 +240,6 @@ static void drm_test_buddy_alloc_optimistic(struct kunit *test)
 	drm_buddy_fini(&mm);
 }
 
-static void drm_test_buddy_alloc_range(struct kunit *test)
-{
-	unsigned long flags = DRM_BUDDY_RANGE_ALLOCATION;
-	u64 offset, size, rem, chunk_size, end;
-	unsigned long page_num;
-	struct drm_buddy mm;
-	LIST_HEAD(blocks);
-
-	mm_config(&size, &chunk_size);
-
-	KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, size, chunk_size),
-			       "buddy_init failed");
-
-	KUNIT_ASSERT_FALSE_MSG(test, check_mm(test, &mm),
-			       "pre-mm check failed, abort!");
-
-	rem = mm.size;
-	offset = 0;
-
-	for_each_prime_number_from(page_num, 1, ULONG_MAX - 1) {
-		struct drm_buddy_block *block;
-		LIST_HEAD(tmp);
-
-		size = min(page_num * mm.chunk_size, rem);
-		end = offset + size;
-
-		KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, offset, end,
-								    size, mm.chunk_size,
-									&tmp, flags),
-				"alloc_range with offset=%llx, size=%llx failed\n", offset, size);
-
-		block = list_first_entry_or_null(&tmp, struct drm_buddy_block, link);
-		KUNIT_ASSERT_TRUE_MSG(test, block, "alloc_range has no blocks\n");
-
-		KUNIT_ASSERT_EQ_MSG(test, drm_buddy_block_offset(block), offset,
-				    "alloc_range start offset mismatch, found=%llx, expected=%llx\n",
-							drm_buddy_block_offset(block), offset);
-
-		KUNIT_ASSERT_FALSE(test, check_blocks(test, &mm, &tmp, size, true));
-
-		list_splice_tail(&tmp, &blocks);
-
-		offset += size;
-
-		rem -= size;
-		if (!rem)
-			break;
-
-		cond_resched();
-	}
-
-	drm_buddy_free_list(&mm, &blocks);
-
-	KUNIT_EXPECT_FALSE_MSG(test, check_mm(test, &mm), "post-mm check failed\n");
-
-	drm_buddy_fini(&mm);
-}
-
 static void drm_test_buddy_alloc_limit(struct kunit *test)
 {
 	u64 size = U64_MAX, start = 0;
@@ -727,29 +275,16 @@ static void drm_test_buddy_alloc_limit(struct kunit *test)
 	drm_buddy_fini(&mm);
 }
 
-static int drm_buddy_suite_init(struct kunit_suite *suite)
-{
-	while (!random_seed)
-		random_seed = get_random_u32();
-
-	kunit_info(suite, "Testing DRM buddy manager, with random_seed=0x%x\n", random_seed);
-
-	return 0;
-}
-
 static struct kunit_case drm_buddy_tests[] = {
 	KUNIT_CASE(drm_test_buddy_alloc_limit),
-	KUNIT_CASE(drm_test_buddy_alloc_range),
 	KUNIT_CASE(drm_test_buddy_alloc_optimistic),
 	KUNIT_CASE(drm_test_buddy_alloc_pessimistic),
-	KUNIT_CASE(drm_test_buddy_alloc_smoke),
 	KUNIT_CASE(drm_test_buddy_alloc_pathological),
 	{}
 };
 
 static struct kunit_suite drm_buddy_test_suite = {
 	.name = "drm_buddy",
-	.suite_init = drm_buddy_suite_init,
 	.test_cases = drm_buddy_tests,
 };
 
diff --git a/drivers/gpu/drm/tests/drm_format_helper_test.c b/drivers/gpu/drm/tests/drm_format_helper_test.c
index f6408e56f786..08992636ec05 100644
--- a/drivers/gpu/drm/tests/drm_format_helper_test.c
+++ b/drivers/gpu/drm/tests/drm_format_helper_test.c
@@ -20,6 +20,10 @@
 
 #define TEST_USE_DEFAULT_PITCH 0
 
+static unsigned char fmtcnv_state_mem[PAGE_SIZE];
+static struct drm_format_conv_state fmtcnv_state =
+	DRM_FORMAT_CONV_STATE_INIT_PREALLOCATED(fmtcnv_state_mem, sizeof(fmtcnv_state_mem));
+
 struct convert_to_gray8_result {
 	unsigned int dst_pitch;
 	const u8 expected[TEST_BUF_SIZE];
@@ -630,8 +634,7 @@ static void drm_test_fb_xrgb8888_to_gray8(struct kunit *test)
 	const unsigned int *dst_pitch = (result->dst_pitch == TEST_USE_DEFAULT_PITCH) ?
 		NULL : &result->dst_pitch;
 
-	drm_fb_xrgb8888_to_gray8(&dst, dst_pitch, &src, &fb, &params->clip);
-
+	drm_fb_xrgb8888_to_gray8(&dst, dst_pitch, &src, &fb, &params->clip, &fmtcnv_state);
 	KUNIT_EXPECT_MEMEQ(test, buf, result->expected, dst_size);
 }
 
@@ -664,7 +667,7 @@ static void drm_test_fb_xrgb8888_to_rgb332(struct kunit *test)
 	const unsigned int *dst_pitch = (result->dst_pitch == TEST_USE_DEFAULT_PITCH) ?
 		NULL : &result->dst_pitch;
 
-	drm_fb_xrgb8888_to_rgb332(&dst, dst_pitch, &src, &fb, &params->clip);
+	drm_fb_xrgb8888_to_rgb332(&dst, dst_pitch, &src, &fb, &params->clip, &fmtcnv_state);
 	KUNIT_EXPECT_MEMEQ(test, buf, result->expected, dst_size);
 }
 
@@ -697,12 +700,14 @@ static void drm_test_fb_xrgb8888_to_rgb565(struct kunit *test)
 	const unsigned int *dst_pitch = (result->dst_pitch == TEST_USE_DEFAULT_PITCH) ?
 		NULL : &result->dst_pitch;
 
-	drm_fb_xrgb8888_to_rgb565(&dst, dst_pitch, &src, &fb, &params->clip, false);
+	drm_fb_xrgb8888_to_rgb565(&dst, dst_pitch, &src, &fb, &params->clip,
+				  &fmtcnv_state, false);
 	buf = le16buf_to_cpu(test, (__force const __le16 *)buf, dst_size / sizeof(__le16));
 	KUNIT_EXPECT_MEMEQ(test, buf, result->expected, dst_size);
 
 	buf = dst.vaddr; /* restore original value of buf */
-	drm_fb_xrgb8888_to_rgb565(&dst, &result->dst_pitch, &src, &fb, &params->clip, true);
+	drm_fb_xrgb8888_to_rgb565(&dst, &result->dst_pitch, &src, &fb, &params->clip,
+				  &fmtcnv_state, true);
 	buf = le16buf_to_cpu(test, (__force const __le16 *)buf, dst_size / sizeof(__le16));
 	KUNIT_EXPECT_MEMEQ(test, buf, result->expected_swab, dst_size);
 
@@ -711,7 +716,8 @@ static void drm_test_fb_xrgb8888_to_rgb565(struct kunit *test)
 
 	int blit_result = 0;
 
-	blit_result = drm_fb_blit(&dst, dst_pitch, DRM_FORMAT_RGB565, &src, &fb, &params->clip);
+	blit_result = drm_fb_blit(&dst, dst_pitch, DRM_FORMAT_RGB565, &src, &fb, &params->clip,
+				  &fmtcnv_state);
 
 	buf = le16buf_to_cpu(test, (__force const __le16 *)buf, dst_size / sizeof(__le16));
 
@@ -748,7 +754,7 @@ static void drm_test_fb_xrgb8888_to_xrgb1555(struct kunit *test)
 	const unsigned int *dst_pitch = (result->dst_pitch == TEST_USE_DEFAULT_PITCH) ?
 		NULL : &result->dst_pitch;
 
-	drm_fb_xrgb8888_to_xrgb1555(&dst, dst_pitch, &src, &fb, &params->clip);
+	drm_fb_xrgb8888_to_xrgb1555(&dst, dst_pitch, &src, &fb, &params->clip, &fmtcnv_state);
 	buf = le16buf_to_cpu(test, (__force const __le16 *)buf, dst_size / sizeof(__le16));
 	KUNIT_EXPECT_MEMEQ(test, buf, result->expected, dst_size);
 
@@ -757,7 +763,8 @@ static void drm_test_fb_xrgb8888_to_xrgb1555(struct kunit *test)
 
 	int blit_result = 0;
 
-	blit_result = drm_fb_blit(&dst, dst_pitch, DRM_FORMAT_XRGB1555, &src, &fb, &params->clip);
+	blit_result = drm_fb_blit(&dst, dst_pitch, DRM_FORMAT_XRGB1555, &src, &fb, &params->clip,
+				  &fmtcnv_state);
 
 	buf = le16buf_to_cpu(test, (__force const __le16 *)buf, dst_size / sizeof(__le16));
 
@@ -794,7 +801,7 @@ static void drm_test_fb_xrgb8888_to_argb1555(struct kunit *test)
 	const unsigned int *dst_pitch = (result->dst_pitch == TEST_USE_DEFAULT_PITCH) ?
 		NULL : &result->dst_pitch;
 
-	drm_fb_xrgb8888_to_argb1555(&dst, dst_pitch, &src, &fb, &params->clip);
+	drm_fb_xrgb8888_to_argb1555(&dst, dst_pitch, &src, &fb, &params->clip, &fmtcnv_state);
 	buf = le16buf_to_cpu(test, (__force const __le16 *)buf, dst_size / sizeof(__le16));
 	KUNIT_EXPECT_MEMEQ(test, buf, result->expected, dst_size);
 
@@ -803,7 +810,8 @@ static void drm_test_fb_xrgb8888_to_argb1555(struct kunit *test)
 
 	int blit_result = 0;
 
-	blit_result = drm_fb_blit(&dst, dst_pitch, DRM_FORMAT_ARGB1555, &src, &fb, &params->clip);
+	blit_result = drm_fb_blit(&dst, dst_pitch, DRM_FORMAT_ARGB1555, &src, &fb, &params->clip,
+				  &fmtcnv_state);
 
 	buf = le16buf_to_cpu(test, (__force const __le16 *)buf, dst_size / sizeof(__le16));
 
@@ -840,7 +848,7 @@ static void drm_test_fb_xrgb8888_to_rgba5551(struct kunit *test)
 	const unsigned int *dst_pitch = (result->dst_pitch == TEST_USE_DEFAULT_PITCH) ?
 		NULL : &result->dst_pitch;
 
-	drm_fb_xrgb8888_to_rgba5551(&dst, dst_pitch, &src, &fb, &params->clip);
+	drm_fb_xrgb8888_to_rgba5551(&dst, dst_pitch, &src, &fb, &params->clip, &fmtcnv_state);
 	buf = le16buf_to_cpu(test, (__force const __le16 *)buf, dst_size / sizeof(__le16));
 	KUNIT_EXPECT_MEMEQ(test, buf, result->expected, dst_size);
 
@@ -849,7 +857,8 @@ static void drm_test_fb_xrgb8888_to_rgba5551(struct kunit *test)
 
 	int blit_result = 0;
 
-	blit_result = drm_fb_blit(&dst, dst_pitch, DRM_FORMAT_RGBA5551, &src, &fb, &params->clip);
+	blit_result = drm_fb_blit(&dst, dst_pitch, DRM_FORMAT_RGBA5551, &src, &fb, &params->clip,
+				  &fmtcnv_state);
 
 	buf = le16buf_to_cpu(test, (__force const __le16 *)buf, dst_size / sizeof(__le16));
 
@@ -890,7 +899,7 @@ static void drm_test_fb_xrgb8888_to_rgb888(struct kunit *test)
 	const unsigned int *dst_pitch = (result->dst_pitch == TEST_USE_DEFAULT_PITCH) ?
 		NULL : &result->dst_pitch;
 
-	drm_fb_xrgb8888_to_rgb888(&dst, dst_pitch, &src, &fb, &params->clip);
+	drm_fb_xrgb8888_to_rgb888(&dst, dst_pitch, &src, &fb, &params->clip, &fmtcnv_state);
 	KUNIT_EXPECT_MEMEQ(test, buf, result->expected, dst_size);
 
 	buf = dst.vaddr; /* restore original value of buf */
@@ -898,7 +907,8 @@ static void drm_test_fb_xrgb8888_to_rgb888(struct kunit *test)
 
 	int blit_result = 0;
 
-	blit_result = drm_fb_blit(&dst, dst_pitch, DRM_FORMAT_RGB888, &src, &fb, &params->clip);
+	blit_result = drm_fb_blit(&dst, dst_pitch, DRM_FORMAT_RGB888, &src, &fb, &params->clip,
+				  &fmtcnv_state);
 
 	KUNIT_EXPECT_FALSE(test, blit_result);
 	KUNIT_EXPECT_MEMEQ(test, buf, result->expected, dst_size);
@@ -933,7 +943,7 @@ static void drm_test_fb_xrgb8888_to_argb8888(struct kunit *test)
 	const unsigned int *dst_pitch = (result->dst_pitch == TEST_USE_DEFAULT_PITCH) ?
 		NULL : &result->dst_pitch;
 
-	drm_fb_xrgb8888_to_argb8888(&dst, dst_pitch, &src, &fb, &params->clip);
+	drm_fb_xrgb8888_to_argb8888(&dst, dst_pitch, &src, &fb, &params->clip, &fmtcnv_state);
 	buf = le32buf_to_cpu(test, (__force const __le32 *)buf, dst_size / sizeof(u32));
 	KUNIT_EXPECT_MEMEQ(test, buf, result->expected, dst_size);
 
@@ -942,7 +952,8 @@ static void drm_test_fb_xrgb8888_to_argb8888(struct kunit *test)
 
 	int blit_result = 0;
 
-	blit_result = drm_fb_blit(&dst, dst_pitch, DRM_FORMAT_ARGB8888, &src, &fb, &params->clip);
+	blit_result = drm_fb_blit(&dst, dst_pitch, DRM_FORMAT_ARGB8888, &src, &fb, &params->clip,
+				  &fmtcnv_state);
 
 	buf = le32buf_to_cpu(test, (__force const __le32 *)buf, dst_size / sizeof(u32));
 
@@ -979,7 +990,7 @@ static void drm_test_fb_xrgb8888_to_xrgb2101010(struct kunit *test)
 	const unsigned int *dst_pitch = (result->dst_pitch == TEST_USE_DEFAULT_PITCH) ?
 		NULL : &result->dst_pitch;
 
-	drm_fb_xrgb8888_to_xrgb2101010(&dst, dst_pitch, &src, &fb, &params->clip);
+	drm_fb_xrgb8888_to_xrgb2101010(&dst, dst_pitch, &src, &fb, &params->clip, &fmtcnv_state);
 	buf = le32buf_to_cpu(test, buf, dst_size / sizeof(u32));
 	KUNIT_EXPECT_MEMEQ(test, buf, result->expected, dst_size);
 
@@ -989,7 +1000,7 @@ static void drm_test_fb_xrgb8888_to_xrgb2101010(struct kunit *test)
 	int blit_result = 0;
 
 	blit_result = drm_fb_blit(&dst, dst_pitch, DRM_FORMAT_XRGB2101010, &src, &fb,
-				  &params->clip);
+				  &params->clip, &fmtcnv_state);
 
 	KUNIT_EXPECT_FALSE(test, blit_result);
 	KUNIT_EXPECT_MEMEQ(test, buf, result->expected, dst_size);
@@ -1024,7 +1035,7 @@ static void drm_test_fb_xrgb8888_to_argb2101010(struct kunit *test)
 	const unsigned int *dst_pitch = (result->dst_pitch == TEST_USE_DEFAULT_PITCH) ?
 		NULL : &result->dst_pitch;
 
-	drm_fb_xrgb8888_to_argb2101010(&dst, dst_pitch, &src, &fb, &params->clip);
+	drm_fb_xrgb8888_to_argb2101010(&dst, dst_pitch, &src, &fb, &params->clip, &fmtcnv_state);
 	buf = le32buf_to_cpu(test, (__force const __le32 *)buf, dst_size / sizeof(u32));
 	KUNIT_EXPECT_MEMEQ(test, buf, result->expected, dst_size);
 
@@ -1034,7 +1045,7 @@ static void drm_test_fb_xrgb8888_to_argb2101010(struct kunit *test)
 	int blit_result = 0;
 
 	blit_result = drm_fb_blit(&dst, dst_pitch, DRM_FORMAT_ARGB2101010, &src, &fb,
-				  &params->clip);
+				  &params->clip, &fmtcnv_state);
 
 	buf = le32buf_to_cpu(test, (__force const __le32 *)buf, dst_size / sizeof(u32));
 
@@ -1071,7 +1082,7 @@ static void drm_test_fb_xrgb8888_to_mono(struct kunit *test)
 	const unsigned int *dst_pitch = (result->dst_pitch == TEST_USE_DEFAULT_PITCH) ?
 		NULL : &result->dst_pitch;
 
-	drm_fb_xrgb8888_to_mono(&dst, dst_pitch, &src, &fb, &params->clip);
+	drm_fb_xrgb8888_to_mono(&dst, dst_pitch, &src, &fb, &params->clip, &fmtcnv_state);
 	KUNIT_EXPECT_MEMEQ(test, buf, result->expected, dst_size);
 }
 
@@ -1104,7 +1115,7 @@ static void drm_test_fb_swab(struct kunit *test)
 	const unsigned int *dst_pitch = (result->dst_pitch == TEST_USE_DEFAULT_PITCH) ?
 		NULL : &result->dst_pitch;
 
-	drm_fb_swab(&dst, dst_pitch, &src, &fb, &params->clip, false);
+	drm_fb_swab(&dst, dst_pitch, &src, &fb, &params->clip, false, &fmtcnv_state);
 	buf = le32buf_to_cpu(test, (__force const __le32 *)buf, dst_size / sizeof(u32));
 	KUNIT_EXPECT_MEMEQ(test, buf, result->expected, dst_size);
 
@@ -1114,7 +1125,7 @@ static void drm_test_fb_swab(struct kunit *test)
 	int blit_result;
 
 	blit_result = drm_fb_blit(&dst, dst_pitch, DRM_FORMAT_XRGB8888 | DRM_FORMAT_BIG_ENDIAN,
-				  &src, &fb, &params->clip);
+				  &src, &fb, &params->clip, &fmtcnv_state);
 	buf = le32buf_to_cpu(test, (__force const __le32 *)buf, dst_size / sizeof(u32));
 
 	KUNIT_EXPECT_FALSE(test, blit_result);
@@ -1123,7 +1134,8 @@ static void drm_test_fb_swab(struct kunit *test)
 	buf = dst.vaddr;
 	memset(buf, 0, dst_size);
 
-	blit_result = drm_fb_blit(&dst, dst_pitch, DRM_FORMAT_BGRX8888, &src, &fb, &params->clip);
+	blit_result = drm_fb_blit(&dst, dst_pitch, DRM_FORMAT_BGRX8888, &src, &fb, &params->clip,
+				  &fmtcnv_state);
 	buf = le32buf_to_cpu(test, (__force const __le32 *)buf, dst_size / sizeof(u32));
 
 	KUNIT_EXPECT_FALSE(test, blit_result);
@@ -1137,7 +1149,8 @@ static void drm_test_fb_swab(struct kunit *test)
 	mock_format.format |= DRM_FORMAT_BIG_ENDIAN;
 	fb.format = &mock_format;
 
-	blit_result = drm_fb_blit(&dst, dst_pitch, DRM_FORMAT_XRGB8888, &src, &fb, &params->clip);
+	blit_result = drm_fb_blit(&dst, dst_pitch, DRM_FORMAT_XRGB8888, &src, &fb, &params->clip,
+				  &fmtcnv_state);
 	buf = le32buf_to_cpu(test, (__force const __le32 *)buf, dst_size / sizeof(u32));
 
 	KUNIT_EXPECT_FALSE(test, blit_result);
@@ -1175,7 +1188,8 @@ static void drm_test_fb_xrgb8888_to_abgr8888(struct kunit *test)
 
 	int blit_result = 0;
 
-	blit_result = drm_fb_blit(&dst, dst_pitch, DRM_FORMAT_ABGR8888, &src, &fb, &params->clip);
+	blit_result = drm_fb_blit(&dst, dst_pitch, DRM_FORMAT_ABGR8888, &src, &fb, &params->clip,
+				  &fmtcnv_state);
 
 	buf = le32buf_to_cpu(test, (__force const __le32 *)buf, dst_size / sizeof(u32));
 
@@ -1214,7 +1228,8 @@ static void drm_test_fb_xrgb8888_to_xbgr8888(struct kunit *test)
 
 	int blit_result = 0;
 
-	blit_result = drm_fb_blit(&dst, dst_pitch, DRM_FORMAT_XBGR8888, &src, &fb, &params->clip);
+	blit_result = drm_fb_blit(&dst, dst_pitch, DRM_FORMAT_XBGR8888, &src, &fb, &params->clip,
+				  &fmtcnv_state);
 
 	buf = le32buf_to_cpu(test, (__force const __le32 *)buf, dst_size / sizeof(u32));
 
@@ -1817,7 +1832,8 @@ static void drm_test_fb_memcpy(struct kunit *test)
 
 	int blit_result;
 
-	blit_result = drm_fb_blit(dst, dst_pitches, params->format, src, &fb, &params->clip);
+	blit_result = drm_fb_blit(dst, dst_pitches, params->format, src, &fb, &params->clip,
+				  &fmtcnv_state);
 
 	KUNIT_EXPECT_FALSE(test, blit_result);
 	for (size_t i = 0; i < fb.format->num_planes; i++) {
diff --git a/drivers/gpu/drm/tests/drm_mm_test.c b/drivers/gpu/drm/tests/drm_mm_test.c
index 05d5e7af6d25..4e9247cf9977 100644
--- a/drivers/gpu/drm/tests/drm_mm_test.c
+++ b/drivers/gpu/drm/tests/drm_mm_test.c
@@ -17,10 +17,6 @@
 
 #include "../lib/drm_random.h"
 
-static unsigned int random_seed;
-static unsigned int max_iterations = 8192;
-static unsigned int max_prime = 128;
-
 enum {
 	BEST,
 	BOTTOMUP,
@@ -37,10 +33,6 @@ static const struct insert_mode {
 	[TOPDOWN] = { "top-down", DRM_MM_INSERT_HIGH },
 	[EVICT] = { "evict", DRM_MM_INSERT_EVICT },
 	{}
-}, evict_modes[] = {
-	{ "bottom-up", DRM_MM_INSERT_LOW },
-	{ "top-down", DRM_MM_INSERT_HIGH },
-	{}
 };
 
 static bool assert_no_holes(struct kunit *test, const struct drm_mm *mm)
@@ -97,57 +89,6 @@ static bool assert_one_hole(struct kunit *test, const struct drm_mm *mm, u64 sta
 	return ok;
 }
 
-static bool assert_continuous(struct kunit *test, const struct drm_mm *mm, u64 size)
-{
-	struct drm_mm_node *node, *check, *found;
-	unsigned long n;
-	u64 addr;
-
-	if (!assert_no_holes(test, mm))
-		return false;
-
-	n = 0;
-	addr = 0;
-	drm_mm_for_each_node(node, mm) {
-		if (node->start != addr) {
-			KUNIT_FAIL(test, "node[%ld] list out of order, expected %llx found %llx\n",
-				   n, addr, node->start);
-			return false;
-		}
-
-		if (node->size != size) {
-			KUNIT_FAIL(test, "node[%ld].size incorrect, expected %llx, found %llx\n",
-				   n, size, node->size);
-			return false;
-		}
-
-		if (drm_mm_hole_follows(node)) {
-			KUNIT_FAIL(test, "node[%ld] is followed by a hole!\n", n);
-			return false;
-		}
-
-		found = NULL;
-		drm_mm_for_each_node_in_range(check, mm, addr, addr + size) {
-			if (node != check) {
-				KUNIT_FAIL(test,
-					   "lookup return wrong node, expected start %llx, found %llx\n",
-					   node->start, check->start);
-				return false;
-			}
-			found = check;
-		}
-		if (!found) {
-			KUNIT_FAIL(test, "lookup failed for node %llx + %llx\n", addr, size);
-			return false;
-		}
-
-		addr += size;
-		n++;
-	}
-
-	return true;
-}
-
 static u64 misalignment(struct drm_mm_node *node, u64 alignment)
 {
 	u64 rem;
@@ -270,215 +211,6 @@ static void drm_test_mm_debug(struct kunit *test)
 			       nodes[0].start, nodes[0].size);
 }
 
-static struct drm_mm_node *set_node(struct drm_mm_node *node,
-				    u64 start, u64 size)
-{
-	node->start = start;
-	node->size = size;
-	return node;
-}
-
-static bool expect_reserve_fail(struct kunit *test, struct drm_mm *mm, struct drm_mm_node *node)
-{
-	int err;
-
-	err = drm_mm_reserve_node(mm, node);
-	if (likely(err == -ENOSPC))
-		return true;
-
-	if (!err) {
-		KUNIT_FAIL(test, "impossible reserve succeeded, node %llu + %llu\n",
-			   node->start, node->size);
-		drm_mm_remove_node(node);
-	} else {
-		KUNIT_FAIL(test,
-			   "impossible reserve failed with wrong error %d [expected %d], node %llu + %llu\n",
-		       err, -ENOSPC, node->start, node->size);
-	}
-	return false;
-}
-
-static bool noinline_for_stack check_reserve_boundaries(struct kunit *test, struct drm_mm *mm,
-							unsigned int count,
-							u64 size)
-{
-	const struct boundary {
-		u64 start, size;
-		const char *name;
-	} boundaries[] = {
-#define B(st, sz) { (st), (sz), "{ " #st ", " #sz "}" }
-		B(0, 0),
-		B(-size, 0),
-		B(size, 0),
-		B(size * count, 0),
-		B(-size, size),
-		B(-size, -size),
-		B(-size, 2 * size),
-		B(0, -size),
-		B(size, -size),
-		B(count * size, size),
-		B(count * size, -size),
-		B(count * size, count * size),
-		B(count * size, -count * size),
-		B(count * size, -(count + 1) * size),
-		B((count + 1) * size, size),
-		B((count + 1) * size, -size),
-		B((count + 1) * size, -2 * size),
-#undef B
-	};
-	struct drm_mm_node tmp = {};
-	int n;
-
-	for (n = 0; n < ARRAY_SIZE(boundaries); n++) {
-		if (!expect_reserve_fail(test, mm, set_node(&tmp, boundaries[n].start,
-							    boundaries[n].size))) {
-			KUNIT_FAIL(test, "boundary[%d:%s] failed, count=%u, size=%lld\n",
-				   n, boundaries[n].name, count, size);
-			return false;
-		}
-	}
-
-	return true;
-}
-
-static int __drm_test_mm_reserve(struct kunit *test, unsigned int count, u64 size)
-{
-	DRM_RND_STATE(prng, random_seed);
-	struct drm_mm mm;
-	struct drm_mm_node tmp, *nodes, *node, *next;
-	unsigned int *order, n, m, o = 0;
-	int ret, err;
-
-	/* For exercising drm_mm_reserve_node(), we want to check that
-	 * reservations outside of the drm_mm range are rejected, and to
-	 * overlapping and otherwise already occupied ranges. Afterwards,
-	 * the tree and nodes should be intact.
-	 */
-
-	DRM_MM_BUG_ON(!count);
-	DRM_MM_BUG_ON(!size);
-
-	ret = -ENOMEM;
-	order = drm_random_order(count, &prng);
-	if (!order)
-		goto err;
-
-	nodes = vzalloc(array_size(count, sizeof(*nodes)));
-	KUNIT_ASSERT_TRUE(test, nodes);
-
-	ret = -EINVAL;
-	drm_mm_init(&mm, 0, count * size);
-
-	if (!check_reserve_boundaries(test, &mm, count, size))
-		goto out;
-
-	for (n = 0; n < count; n++) {
-		nodes[n].start = order[n] * size;
-		nodes[n].size = size;
-
-		err = drm_mm_reserve_node(&mm, &nodes[n]);
-		if (err) {
-			KUNIT_FAIL(test, "reserve failed, step %d, start %llu\n",
-				   n, nodes[n].start);
-			ret = err;
-			goto out;
-		}
-
-		if (!drm_mm_node_allocated(&nodes[n])) {
-			KUNIT_FAIL(test, "reserved node not allocated! step %d, start %llu\n",
-				   n, nodes[n].start);
-			goto out;
-		}
-
-		if (!expect_reserve_fail(test, &mm, &nodes[n]))
-			goto out;
-	}
-
-	/* After random insertion the nodes should be in order */
-	if (!assert_continuous(test, &mm, size))
-		goto out;
-
-	/* Repeated use should then fail */
-	drm_random_reorder(order, count, &prng);
-	for (n = 0; n < count; n++) {
-		if (!expect_reserve_fail(test, &mm, set_node(&tmp, order[n] * size, 1)))
-			goto out;
-
-		/* Remove and reinsert should work */
-		drm_mm_remove_node(&nodes[order[n]]);
-		err = drm_mm_reserve_node(&mm, &nodes[order[n]]);
-		if (err) {
-			KUNIT_FAIL(test, "reserve failed, step %d, start %llu\n",
-				   n, nodes[n].start);
-			ret = err;
-			goto out;
-		}
-	}
-
-	if (!assert_continuous(test, &mm, size))
-		goto out;
-
-	/* Overlapping use should then fail */
-	for (n = 0; n < count; n++) {
-		if (!expect_reserve_fail(test, &mm, set_node(&tmp, 0, size * count)))
-			goto out;
-	}
-	for (n = 0; n < count; n++) {
-		if (!expect_reserve_fail(test, &mm, set_node(&tmp, size * n, size * (count - n))))
-			goto out;
-	}
-
-	/* Remove several, reinsert, check full */
-	for_each_prime_number(n, min(max_prime, count)) {
-		for (m = 0; m < n; m++) {
-			node = &nodes[order[(o + m) % count]];
-			drm_mm_remove_node(node);
-		}
-
-		for (m = 0; m < n; m++) {
-			node = &nodes[order[(o + m) % count]];
-			err = drm_mm_reserve_node(&mm, node);
-			if (err) {
-				KUNIT_FAIL(test, "reserve failed, step %d/%d, start %llu\n",
-					   m, n, node->start);
-				ret = err;
-				goto out;
-			}
-		}
-
-		o += n;
-
-		if (!assert_continuous(test, &mm, size))
-			goto out;
-	}
-
-	ret = 0;
-out:
-	drm_mm_for_each_node_safe(node, next, &mm)
-		drm_mm_remove_node(node);
-	drm_mm_takedown(&mm);
-	vfree(nodes);
-	kfree(order);
-err:
-	return ret;
-}
-
-static void drm_test_mm_reserve(struct kunit *test)
-{
-	const unsigned int count = min_t(unsigned int, BIT(10), max_iterations);
-	int n;
-
-	for_each_prime_number_from(n, 1, 54) {
-		u64 size = BIT_ULL(n);
-
-		KUNIT_ASSERT_FALSE(test, __drm_test_mm_reserve(test, count, size - 1));
-		KUNIT_ASSERT_FALSE(test, __drm_test_mm_reserve(test, count, size));
-		KUNIT_ASSERT_FALSE(test, __drm_test_mm_reserve(test, count, size + 1));
-
-		cond_resched();
-	}
-}
-
 static bool expect_insert(struct kunit *test, struct drm_mm *mm,
 			  struct drm_mm_node *node, u64 size, u64 alignment, unsigned long color,
 			const struct insert_mode *mode)
@@ -503,600 +235,6 @@ static bool expect_insert(struct kunit *test, struct drm_mm *mm,
 	return true;
 }
 
-static bool expect_insert_fail(struct kunit *test, struct drm_mm *mm, u64 size)
-{
-	struct drm_mm_node tmp = {};
-	int err;
-
-	err = drm_mm_insert_node(mm, &tmp, size);
-	if (likely(err == -ENOSPC))
-		return true;
-
-	if (!err) {
-		KUNIT_FAIL(test, "impossible insert succeeded, node %llu + %llu\n",
-			   tmp.start, tmp.size);
-		drm_mm_remove_node(&tmp);
-	} else {
-		KUNIT_FAIL(test,
-			   "impossible insert failed with wrong error %d [expected %d], size %llu\n",
-			   err, -ENOSPC, size);
-	}
-	return false;
-}
-
-static int __drm_test_mm_insert(struct kunit *test, unsigned int count, u64 size, bool replace)
-{
-	DRM_RND_STATE(prng, random_seed);
-	const struct insert_mode *mode;
-	struct drm_mm mm;
-	struct drm_mm_node *nodes, *node, *next;
-	unsigned int *order, n, m, o = 0;
-	int ret;
-
-	/* Fill a range with lots of nodes, check it doesn't fail too early */
-
-	DRM_MM_BUG_ON(!count);
-	DRM_MM_BUG_ON(!size);
-
-	ret = -ENOMEM;
-	nodes = vmalloc(array_size(count, sizeof(*nodes)));
-	KUNIT_ASSERT_TRUE(test, nodes);
-
-	order = drm_random_order(count, &prng);
-	if (!order)
-		goto err_nodes;
-
-	ret = -EINVAL;
-	drm_mm_init(&mm, 0, count * size);
-
-	for (mode = insert_modes; mode->name; mode++) {
-		for (n = 0; n < count; n++) {
-			struct drm_mm_node tmp;
-
-			node = replace ? &tmp : &nodes[n];
-			memset(node, 0, sizeof(*node));
-			if (!expect_insert(test, &mm, node, size, 0, n, mode)) {
-				KUNIT_FAIL(test, "%s insert failed, size %llu step %d\n",
-					   mode->name, size, n);
-				goto out;
-			}
-
-			if (replace) {
-				drm_mm_replace_node(&tmp, &nodes[n]);
-				if (drm_mm_node_allocated(&tmp)) {
-					KUNIT_FAIL(test,
-						   "replaced old-node still allocated! step %d\n",
-						   n);
-					goto out;
-				}
-
-				if (!assert_node(test, &nodes[n], &mm, size, 0, n)) {
-					KUNIT_FAIL(test,
-						   "replaced node did not inherit parameters, size %llu step %d\n",
-						   size, n);
-					goto out;
-				}
-
-				if (tmp.start != nodes[n].start) {
-					KUNIT_FAIL(test,
-						   "replaced node mismatch location expected [%llx + %llx], found [%llx + %llx]\n",
-						   tmp.start, size, nodes[n].start, nodes[n].size);
-					goto out;
-				}
-			}
-		}
-
-		/* After random insertion the nodes should be in order */
-		if (!assert_continuous(test, &mm, size))
-			goto out;
-
-		/* Repeated use should then fail */
-		if (!expect_insert_fail(test, &mm, size))
-			goto out;
-
-		/* Remove one and reinsert, as the only hole it should refill itself */
-		for (n = 0; n < count; n++) {
-			u64 addr = nodes[n].start;
-
-			drm_mm_remove_node(&nodes[n]);
-			if (!expect_insert(test, &mm, &nodes[n], size, 0, n, mode)) {
-				KUNIT_FAIL(test, "%s reinsert failed, size %llu step %d\n",
-					   mode->name, size, n);
-				goto out;
-			}
-
-			if (nodes[n].start != addr) {
-				KUNIT_FAIL(test,
-					   "%s reinsert node moved, step %d, expected %llx, found %llx\n",
-					   mode->name, n, addr, nodes[n].start);
-				goto out;
-			}
-
-			if (!assert_continuous(test, &mm, size))
-				goto out;
-		}
-
-		/* Remove several, reinsert, check full */
-		for_each_prime_number(n, min(max_prime, count)) {
-			for (m = 0; m < n; m++) {
-				node = &nodes[order[(o + m) % count]];
-				drm_mm_remove_node(node);
-			}
-
-			for (m = 0; m < n; m++) {
-				node = &nodes[order[(o + m) % count]];
-				if (!expect_insert(test, &mm, node, size, 0, n, mode)) {
-					KUNIT_FAIL(test,
-						   "%s multiple reinsert failed, size %llu step %d\n",
-							   mode->name, size, n);
-					goto out;
-				}
-			}
-
-			o += n;
-
-			if (!assert_continuous(test, &mm, size))
-				goto out;
-
-			if (!expect_insert_fail(test, &mm, size))
-				goto out;
-		}
-
-		drm_mm_for_each_node_safe(node, next, &mm)
-			drm_mm_remove_node(node);
-		DRM_MM_BUG_ON(!drm_mm_clean(&mm));
-
-		cond_resched();
-	}
-
-	ret = 0;
-out:
-	drm_mm_for_each_node_safe(node, next, &mm)
-		drm_mm_remove_node(node);
-	drm_mm_takedown(&mm);
-	kfree(order);
-err_nodes:
-	vfree(nodes);
-	return ret;
-}
-
-static void drm_test_mm_insert(struct kunit *test)
-{
-	const unsigned int count = min_t(unsigned int, BIT(10), max_iterations);
-	unsigned int n;
-
-	for_each_prime_number_from(n, 1, 54) {
-		u64 size = BIT_ULL(n);
-
-		KUNIT_ASSERT_FALSE(test, __drm_test_mm_insert(test, count, size - 1, false));
-		KUNIT_ASSERT_FALSE(test, __drm_test_mm_insert(test, count, size, false));
-		KUNIT_ASSERT_FALSE(test, __drm_test_mm_insert(test, count, size + 1, false));
-
-		cond_resched();
-	}
-}
-
-static void drm_test_mm_replace(struct kunit *test)
-{
-	const unsigned int count = min_t(unsigned int, BIT(10), max_iterations);
-	unsigned int n;
-
-	/* Reuse __drm_test_mm_insert to exercise replacement by inserting a dummy node,
-	 * then replacing it with the intended node. We want to check that
-	 * the tree is intact and all the information we need is carried
-	 * across to the target node.
-	 */
-
-	for_each_prime_number_from(n, 1, 54) {
-		u64 size = BIT_ULL(n);
-
-		KUNIT_ASSERT_FALSE(test, __drm_test_mm_insert(test, count, size - 1, true));
-		KUNIT_ASSERT_FALSE(test, __drm_test_mm_insert(test, count, size, true));
-		KUNIT_ASSERT_FALSE(test, __drm_test_mm_insert(test, count, size + 1, true));
-
-		cond_resched();
-	}
-}
-
-static bool expect_insert_in_range(struct kunit *test, struct drm_mm *mm, struct drm_mm_node *node,
-				   u64 size, u64 alignment, unsigned long color,
-				   u64 range_start, u64 range_end, const struct insert_mode *mode)
-{
-	int err;
-
-	err = drm_mm_insert_node_in_range(mm, node,
-					  size, alignment, color,
-					  range_start, range_end,
-					  mode->mode);
-	if (err) {
-		KUNIT_FAIL(test,
-			   "insert (size=%llu, alignment=%llu, color=%lu, mode=%s) nto range [%llx, %llx] failed with err=%d\n",
-				   size, alignment, color, mode->name,
-				   range_start, range_end, err);
-		return false;
-	}
-
-	if (!assert_node(test, node, mm, size, alignment, color)) {
-		drm_mm_remove_node(node);
-		return false;
-	}
-
-	return true;
-}
-
-static bool expect_insert_in_range_fail(struct kunit *test, struct drm_mm *mm,
-					u64 size, u64 range_start, u64 range_end)
-{
-	struct drm_mm_node tmp = {};
-	int err;
-
-	err = drm_mm_insert_node_in_range(mm, &tmp, size, 0, 0, range_start, range_end,
-					  0);
-	if (likely(err == -ENOSPC))
-		return true;
-
-	if (!err) {
-		KUNIT_FAIL(test,
-			   "impossible insert succeeded, node %llx + %llu, range [%llx, %llx]\n",
-				   tmp.start, tmp.size, range_start, range_end);
-		drm_mm_remove_node(&tmp);
-	} else {
-		KUNIT_FAIL(test,
-			   "impossible insert failed with wrong error %d [expected %d], size %llu, range [%llx, %llx]\n",
-				   err, -ENOSPC, size, range_start, range_end);
-	}
-
-	return false;
-}
-
-static bool assert_contiguous_in_range(struct kunit *test, struct drm_mm *mm,
-				       u64 size, u64 start, u64 end)
-{
-	struct drm_mm_node *node;
-	unsigned int n;
-
-	if (!expect_insert_in_range_fail(test, mm, size, start, end))
-		return false;
-
-	n = div64_u64(start + size - 1, size);
-	drm_mm_for_each_node(node, mm) {
-		if (node->start < start || node->start + node->size > end) {
-			KUNIT_FAIL(test,
-				   "node %d out of range, address [%llx + %llu], range [%llx, %llx]\n",
-					   n, node->start, node->start + node->size, start, end);
-			return false;
-		}
-
-		if (node->start != n * size) {
-			KUNIT_FAIL(test, "node %d out of order, expected start %llx, found %llx\n",
-				   n, n * size, node->start);
-			return false;
-		}
-
-		if (node->size != size) {
-			KUNIT_FAIL(test, "node %d has wrong size, expected size %llx, found %llx\n",
-				   n, size, node->size);
-			return false;
-		}
-
-		if (drm_mm_hole_follows(node) && drm_mm_hole_node_end(node) < end) {
-			KUNIT_FAIL(test, "node %d is followed by a hole!\n", n);
-			return false;
-		}
-
-		n++;
-	}
-
-	if (start > 0) {
-		node = __drm_mm_interval_first(mm, 0, start - 1);
-		if (drm_mm_node_allocated(node)) {
-			KUNIT_FAIL(test, "node before start: node=%llx+%llu, start=%llx\n",
-				   node->start, node->size, start);
-			return false;
-		}
-	}
-
-	if (end < U64_MAX) {
-		node = __drm_mm_interval_first(mm, end, U64_MAX);
-		if (drm_mm_node_allocated(node)) {
-			KUNIT_FAIL(test, "node after end: node=%llx+%llu, end=%llx\n",
-				   node->start, node->size, end);
-			return false;
-		}
-	}
-
-	return true;
-}
-
-static int __drm_test_mm_insert_range(struct kunit *test, unsigned int count, u64 size,
-				      u64 start, u64 end)
-{
-	const struct insert_mode *mode;
-	struct drm_mm mm;
-	struct drm_mm_node *nodes, *node, *next;
-	unsigned int n, start_n, end_n;
-	int ret;
-
-	DRM_MM_BUG_ON(!count);
-	DRM_MM_BUG_ON(!size);
-	DRM_MM_BUG_ON(end <= start);
-
-	/* Very similar to __drm_test_mm_insert(), but now instead of populating the
-	 * full range of the drm_mm, we try to fill a small portion of it.
-	 */
-
-	ret = -ENOMEM;
-	nodes = vzalloc(array_size(count, sizeof(*nodes)));
-	KUNIT_ASSERT_TRUE(test, nodes);
-
-	ret = -EINVAL;
-	drm_mm_init(&mm, 0, count * size);
-
-	start_n = div64_u64(start + size - 1, size);
-	end_n = div64_u64(end - size, size);
-
-	for (mode = insert_modes; mode->name; mode++) {
-		for (n = start_n; n <= end_n; n++) {
-			if (!expect_insert_in_range(test, &mm, &nodes[n], size, size, n,
-						    start, end, mode)) {
-				KUNIT_FAIL(test,
-					   "%s insert failed, size %llu, step %d [%d, %d], range [%llx, %llx]\n",
-						   mode->name, size, n, start_n, end_n, start, end);
-				goto out;
-			}
-		}
-
-		if (!assert_contiguous_in_range(test, &mm, size, start, end)) {
-			KUNIT_FAIL(test,
-				   "%s: range [%llx, %llx] not full after initialisation, size=%llu\n",
-				   mode->name, start, end, size);
-			goto out;
-		}
-
-		/* Remove one and reinsert, it should refill itself */
-		for (n = start_n; n <= end_n; n++) {
-			u64 addr = nodes[n].start;
-
-			drm_mm_remove_node(&nodes[n]);
-			if (!expect_insert_in_range(test, &mm, &nodes[n], size, size, n,
-						    start, end, mode)) {
-				KUNIT_FAIL(test, "%s reinsert failed, step %d\n", mode->name, n);
-				goto out;
-			}
-
-			if (nodes[n].start != addr) {
-				KUNIT_FAIL(test,
-					   "%s reinsert node moved, step %d, expected %llx, found %llx\n",
-					   mode->name, n, addr, nodes[n].start);
-				goto out;
-			}
-		}
-
-		if (!assert_contiguous_in_range(test, &mm, size, start, end)) {
-			KUNIT_FAIL(test,
-				   "%s: range [%llx, %llx] not full after reinsertion, size=%llu\n",
-				   mode->name, start, end, size);
-			goto out;
-		}
-
-		drm_mm_for_each_node_safe(node, next, &mm)
-			drm_mm_remove_node(node);
-		DRM_MM_BUG_ON(!drm_mm_clean(&mm));
-
-		cond_resched();
-	}
-
-	ret = 0;
-out:
-	drm_mm_for_each_node_safe(node, next, &mm)
-		drm_mm_remove_node(node);
-	drm_mm_takedown(&mm);
-	vfree(nodes);
-	return ret;
-}
-
-static int insert_outside_range(struct kunit *test)
-{
-	struct drm_mm mm;
-	const unsigned int start = 1024;
-	const unsigned int end = 2048;
-	const unsigned int size = end - start;
-
-	drm_mm_init(&mm, start, size);
-
-	if (!expect_insert_in_range_fail(test, &mm, 1, 0, start))
-		return -EINVAL;
-
-	if (!expect_insert_in_range_fail(test, &mm, size,
-					 start - size / 2, start + (size + 1) / 2))
-		return -EINVAL;
-
-	if (!expect_insert_in_range_fail(test, &mm, size,
-					 end - (size + 1) / 2, end + size / 2))
-		return -EINVAL;
-
-	if (!expect_insert_in_range_fail(test, &mm, 1, end, end + size))
-		return -EINVAL;
-
-	drm_mm_takedown(&mm);
-	return 0;
-}
-
-static void drm_test_mm_insert_range(struct kunit *test)
-{
-	const unsigned int count = min_t(unsigned int, BIT(13), max_iterations);
-	unsigned int n;
-
-	/* Check that requests outside the bounds of drm_mm are rejected. */
-	KUNIT_ASSERT_FALSE(test, insert_outside_range(test));
-
-	for_each_prime_number_from(n, 1, 50) {
-		const u64 size = BIT_ULL(n);
-		const u64 max = count * size;
-
-		KUNIT_ASSERT_FALSE(test, __drm_test_mm_insert_range(test, count, size, 0, max));
-		KUNIT_ASSERT_FALSE(test, __drm_test_mm_insert_range(test, count, size, 1, max));
-		KUNIT_ASSERT_FALSE(test, __drm_test_mm_insert_range(test, count, size, 0, max - 1));
-		KUNIT_ASSERT_FALSE(test, __drm_test_mm_insert_range(test, count, size, 0, max / 2));
-		KUNIT_ASSERT_FALSE(test, __drm_test_mm_insert_range(test, count, size,
-								    max / 2, max));
-		KUNIT_ASSERT_FALSE(test, __drm_test_mm_insert_range(test, count, size,
-								    max / 4 + 1, 3 * max / 4 - 1));
-
-		cond_resched();
-	}
-}
-
-static int prepare_frag(struct kunit *test, struct drm_mm *mm, struct drm_mm_node *nodes,
-			unsigned int num_insert, const struct insert_mode *mode)
-{
-	unsigned int size = 4096;
-	unsigned int i;
-
-	for (i = 0; i < num_insert; i++) {
-		if (!expect_insert(test, mm, &nodes[i], size, 0, i, mode) != 0) {
-			KUNIT_FAIL(test, "%s insert failed\n", mode->name);
-			return -EINVAL;
-		}
-	}
-
-	/* introduce fragmentation by freeing every other node */
-	for (i = 0; i < num_insert; i++) {
-		if (i % 2 == 0)
-			drm_mm_remove_node(&nodes[i]);
-	}
-
-	return 0;
-}
-
-static u64 get_insert_time(struct kunit *test, struct drm_mm *mm,
-			   unsigned int num_insert, struct drm_mm_node *nodes,
-			   const struct insert_mode *mode)
-{
-	unsigned int size = 8192;
-	ktime_t start;
-	unsigned int i;
-
-	start = ktime_get();
-	for (i = 0; i < num_insert; i++) {
-		if (!expect_insert(test, mm, &nodes[i], size, 0, i, mode) != 0) {
-			KUNIT_FAIL(test, "%s insert failed\n", mode->name);
-			return 0;
-		}
-	}
-
-	return ktime_to_ns(ktime_sub(ktime_get(), start));
-}
-
-static void drm_test_mm_frag(struct kunit *test)
-{
-	struct drm_mm mm;
-	const struct insert_mode *mode;
-	struct drm_mm_node *nodes, *node, *next;
-	unsigned int insert_size = 10000;
-	unsigned int scale_factor = 4;
-
-	/* We need 4 * insert_size nodes to hold intermediate allocated
-	 * drm_mm nodes.
-	 * 1 times for prepare_frag()
-	 * 1 times for get_insert_time()
-	 * 2 times for get_insert_time()
-	 */
-	nodes = vzalloc(array_size(insert_size * 4, sizeof(*nodes)));
-	KUNIT_ASSERT_TRUE(test, nodes);
-
-	/* For BOTTOMUP and TOPDOWN, we first fragment the
-	 * address space using prepare_frag() and then try to verify
-	 * that insertions scale quadratically from 10k to 20k insertions
-	 */
-	drm_mm_init(&mm, 1, U64_MAX - 2);
-	for (mode = insert_modes; mode->name; mode++) {
-		u64 insert_time1, insert_time2;
-
-		if (mode->mode != DRM_MM_INSERT_LOW &&
-		    mode->mode != DRM_MM_INSERT_HIGH)
-			continue;
-
-		if (prepare_frag(test, &mm, nodes, insert_size, mode))
-			goto err;
-
-		insert_time1 = get_insert_time(test, &mm, insert_size,
-					       nodes + insert_size, mode);
-		if (insert_time1 == 0)
-			goto err;
-
-		insert_time2 = get_insert_time(test, &mm, (insert_size * 2),
-					       nodes + insert_size * 2, mode);
-		if (insert_time2 == 0)
-			goto err;
-
-		kunit_info(test, "%s fragmented insert of %u and %u insertions took %llu and %llu nsecs\n",
-			   mode->name, insert_size, insert_size * 2, insert_time1, insert_time2);
-
-		if (insert_time2 > (scale_factor * insert_time1)) {
-			KUNIT_FAIL(test, "%s fragmented insert took %llu nsecs more\n",
-				   mode->name, insert_time2 - (scale_factor * insert_time1));
-			goto err;
-		}
-
-		drm_mm_for_each_node_safe(node, next, &mm)
-			drm_mm_remove_node(node);
-	}
-
-err:
-	drm_mm_for_each_node_safe(node, next, &mm)
-		drm_mm_remove_node(node);
-	drm_mm_takedown(&mm);
-	vfree(nodes);
-}
-
-static void drm_test_mm_align(struct kunit *test)
-{
-	const struct insert_mode *mode;
-	const unsigned int max_count = min(8192u, max_prime);
-	struct drm_mm mm;
-	struct drm_mm_node *nodes, *node, *next;
-	unsigned int prime;
-
-	/* For each of the possible insertion modes, we pick a few
-	 * arbitrary alignments and check that the inserted node
-	 * meets our requirements.
-	 */
-
-	nodes = vzalloc(array_size(max_count, sizeof(*nodes)));
-	KUNIT_ASSERT_TRUE(test, nodes);
-
-	drm_mm_init(&mm, 1, U64_MAX - 2);
-
-	for (mode = insert_modes; mode->name; mode++) {
-		unsigned int i = 0;
-
-		for_each_prime_number_from(prime, 1, max_count) {
-			u64 size = next_prime_number(prime);
-
-			if (!expect_insert(test, &mm, &nodes[i], size, prime, i, mode)) {
-				KUNIT_FAIL(test, "%s insert failed with alignment=%d",
-					   mode->name, prime);
-				goto out;
-			}
-
-			i++;
-		}
-
-		drm_mm_for_each_node_safe(node, next, &mm)
-			drm_mm_remove_node(node);
-		DRM_MM_BUG_ON(!drm_mm_clean(&mm));
-
-		cond_resched();
-	}
-
-out:
-	drm_mm_for_each_node_safe(node, next, &mm)
-		drm_mm_remove_node(node);
-	drm_mm_takedown(&mm);
-	vfree(nodes);
-}
-
 static void drm_test_mm_align_pot(struct kunit *test, int max)
 {
 	struct drm_mm mm;
@@ -1144,626 +282,6 @@ static void drm_test_mm_align64(struct kunit *test)
 	drm_test_mm_align_pot(test, 64);
 }
 
-static void show_scan(struct kunit *test, const struct drm_mm_scan *scan)
-{
-	kunit_info(test, "scan: hit [%llx, %llx], size=%lld, align=%lld, color=%ld\n",
-		   scan->hit_start, scan->hit_end, scan->size, scan->alignment, scan->color);
-}
-
-static void show_holes(struct kunit *test, const struct drm_mm *mm, int count)
-{
-	u64 hole_start, hole_end;
-	struct drm_mm_node *hole;
-
-	drm_mm_for_each_hole(hole, mm, hole_start, hole_end) {
-		struct drm_mm_node *next = list_next_entry(hole, node_list);
-		const char *node1 = NULL, *node2 = NULL;
-
-		if (drm_mm_node_allocated(hole))
-			node1 = kasprintf(GFP_KERNEL, "[%llx + %lld, color=%ld], ",
-					  hole->start, hole->size, hole->color);
-
-		if (drm_mm_node_allocated(next))
-			node2 = kasprintf(GFP_KERNEL, ", [%llx + %lld, color=%ld]",
-					  next->start, next->size, next->color);
-
-		kunit_info(test, "%sHole [%llx - %llx, size %lld]%s\n", node1,
-			   hole_start, hole_end, hole_end - hole_start, node2);
-
-		kfree(node2);
-		kfree(node1);
-
-		if (!--count)
-			break;
-	}
-}
-
-struct evict_node {
-	struct drm_mm_node node;
-	struct list_head link;
-};
-
-static bool evict_nodes(struct kunit *test, struct drm_mm_scan *scan,
-			struct evict_node *nodes, unsigned int *order, unsigned int count,
-			bool use_color, struct list_head *evict_list)
-{
-	struct evict_node *e, *en;
-	unsigned int i;
-
-	for (i = 0; i < count; i++) {
-		e = &nodes[order ? order[i] : i];
-		list_add(&e->link, evict_list);
-		if (drm_mm_scan_add_block(scan, &e->node))
-			break;
-	}
-	list_for_each_entry_safe(e, en, evict_list, link) {
-		if (!drm_mm_scan_remove_block(scan, &e->node))
-			list_del(&e->link);
-	}
-	if (list_empty(evict_list)) {
-		KUNIT_FAIL(test,
-			   "Failed to find eviction: size=%lld [avail=%d], align=%lld (color=%lu)\n",
-			   scan->size, count, scan->alignment, scan->color);
-		return false;
-	}
-
-	list_for_each_entry(e, evict_list, link)
-		drm_mm_remove_node(&e->node);
-
-	if (use_color) {
-		struct drm_mm_node *node;
-
-		while ((node = drm_mm_scan_color_evict(scan))) {
-			e = container_of(node, typeof(*e), node);
-			drm_mm_remove_node(&e->node);
-			list_add(&e->link, evict_list);
-		}
-	} else {
-		if (drm_mm_scan_color_evict(scan)) {
-			KUNIT_FAIL(test,
-				   "drm_mm_scan_color_evict unexpectedly reported overlapping nodes!\n");
-			return false;
-		}
-	}
-
-	return true;
-}
-
-static bool evict_nothing(struct kunit *test, struct drm_mm *mm,
-			  unsigned int total_size, struct evict_node *nodes)
-{
-	struct drm_mm_scan scan;
-	LIST_HEAD(evict_list);
-	struct evict_node *e;
-	struct drm_mm_node *node;
-	unsigned int n;
-
-	drm_mm_scan_init(&scan, mm, 1, 0, 0, 0);
-	for (n = 0; n < total_size; n++) {
-		e = &nodes[n];
-		list_add(&e->link, &evict_list);
-		drm_mm_scan_add_block(&scan, &e->node);
-	}
-	list_for_each_entry(e, &evict_list, link)
-		drm_mm_scan_remove_block(&scan, &e->node);
-
-	for (n = 0; n < total_size; n++) {
-		e = &nodes[n];
-
-		if (!drm_mm_node_allocated(&e->node)) {
-			KUNIT_FAIL(test, "node[%d] no longer allocated!\n", n);
-			return false;
-		}
-
-		e->link.next = NULL;
-	}
-
-	drm_mm_for_each_node(node, mm) {
-		e = container_of(node, typeof(*e), node);
-		e->link.next = &e->link;
-	}
-
-	for (n = 0; n < total_size; n++) {
-		e = &nodes[n];
-
-		if (!e->link.next) {
-			KUNIT_FAIL(test, "node[%d] no longer connected!\n", n);
-			return false;
-		}
-	}
-
-	return assert_continuous(test, mm, nodes[0].node.size);
-}
-
-static bool evict_everything(struct kunit *test, struct drm_mm *mm,
-			     unsigned int total_size, struct evict_node *nodes)
-{
-	struct drm_mm_scan scan;
-	LIST_HEAD(evict_list);
-	struct evict_node *e;
-	unsigned int n;
-	int err;
-
-	drm_mm_scan_init(&scan, mm, total_size, 0, 0, 0);
-	for (n = 0; n < total_size; n++) {
-		e = &nodes[n];
-		list_add(&e->link, &evict_list);
-		if (drm_mm_scan_add_block(&scan, &e->node))
-			break;
-	}
-
-	err = 0;
-	list_for_each_entry(e, &evict_list, link) {
-		if (!drm_mm_scan_remove_block(&scan, &e->node)) {
-			if (!err) {
-				KUNIT_FAIL(test, "Node %lld not marked for eviction!\n",
-					   e->node.start);
-				err = -EINVAL;
-			}
-		}
-	}
-	if (err)
-		return false;
-
-	list_for_each_entry(e, &evict_list, link)
-		drm_mm_remove_node(&e->node);
-
-	if (!assert_one_hole(test, mm, 0, total_size))
-		return false;
-
-	list_for_each_entry(e, &evict_list, link) {
-		err = drm_mm_reserve_node(mm, &e->node);
-		if (err) {
-			KUNIT_FAIL(test, "Failed to reinsert node after eviction: start=%llx\n",
-				   e->node.start);
-			return false;
-		}
-	}
-
-	return assert_continuous(test, mm, nodes[0].node.size);
-}
-
-static int evict_something(struct kunit *test, struct drm_mm *mm,
-			   u64 range_start, u64 range_end, struct evict_node *nodes,
-			   unsigned int *order, unsigned int count, unsigned int size,
-			   unsigned int alignment, const struct insert_mode *mode)
-{
-	struct drm_mm_scan scan;
-	LIST_HEAD(evict_list);
-	struct evict_node *e;
-	struct drm_mm_node tmp;
-	int err;
-
-	drm_mm_scan_init_with_range(&scan, mm, size, alignment, 0, range_start,
-				    range_end, mode->mode);
-	if (!evict_nodes(test, &scan, nodes, order, count, false, &evict_list))
-		return -EINVAL;
-
-	memset(&tmp, 0, sizeof(tmp));
-	err = drm_mm_insert_node_generic(mm, &tmp, size, alignment, 0,
-					 DRM_MM_INSERT_EVICT);
-	if (err) {
-		KUNIT_FAIL(test, "Failed to insert into eviction hole: size=%d, align=%d\n",
-			   size, alignment);
-		show_scan(test, &scan);
-		show_holes(test, mm, 3);
-		return err;
-	}
-
-	if (tmp.start < range_start || tmp.start + tmp.size > range_end) {
-		KUNIT_FAIL(test,
-			   "Inserted [address=%llu + %llu] did not fit into the request range [%llu, %llu]\n",
-			   tmp.start, tmp.size, range_start, range_end);
-		err = -EINVAL;
-	}
-
-	if (!assert_node(test, &tmp, mm, size, alignment, 0) ||
-	    drm_mm_hole_follows(&tmp)) {
-		KUNIT_FAIL(test,
-			   "Inserted did not fill the eviction hole: size=%lld [%d], align=%d [rem=%lld], start=%llx, hole-follows?=%d\n",
-			   tmp.size, size, alignment, misalignment(&tmp, alignment),
-			   tmp.start, drm_mm_hole_follows(&tmp));
-		err = -EINVAL;
-	}
-
-	drm_mm_remove_node(&tmp);
-	if (err)
-		return err;
-
-	list_for_each_entry(e, &evict_list, link) {
-		err = drm_mm_reserve_node(mm, &e->node);
-		if (err) {
-			KUNIT_FAIL(test, "Failed to reinsert node after eviction: start=%llx\n",
-				   e->node.start);
-			return err;
-		}
-	}
-
-	if (!assert_continuous(test, mm, nodes[0].node.size)) {
-		KUNIT_FAIL(test, "range is no longer continuous\n");
-		return -EINVAL;
-	}
-
-	return 0;
-}
-
-static void drm_test_mm_evict(struct kunit *test)
-{
-	DRM_RND_STATE(prng, random_seed);
-	const unsigned int size = 8192;
-	const struct insert_mode *mode;
-	struct drm_mm mm;
-	struct evict_node *nodes;
-	struct drm_mm_node *node, *next;
-	unsigned int *order, n;
-
-	/* Here we populate a full drm_mm and then try and insert a new node
-	 * by evicting other nodes in a random order. The drm_mm_scan should
-	 * pick the first matching hole it finds from the random list. We
-	 * repeat that for different allocation strategies, alignments and
-	 * sizes to try and stress the hole finder.
-	 */
-
-	nodes = vzalloc(array_size(size, sizeof(*nodes)));
-	KUNIT_ASSERT_TRUE(test, nodes);
-
-	order = drm_random_order(size, &prng);
-	if (!order)
-		goto err_nodes;
-
-	drm_mm_init(&mm, 0, size);
-	for (n = 0; n < size; n++) {
-		if (drm_mm_insert_node(&mm, &nodes[n].node, 1)) {
-			KUNIT_FAIL(test, "insert failed, step %d\n", n);
-			goto out;
-		}
-	}
-
-	/* First check that using the scanner doesn't break the mm */
-	if (!evict_nothing(test, &mm, size, nodes)) {
-		KUNIT_FAIL(test, "evict_nothing() failed\n");
-		goto out;
-	}
-	if (!evict_everything(test, &mm, size, nodes)) {
-		KUNIT_FAIL(test, "evict_everything() failed\n");
-		goto out;
-	}
-
-	for (mode = evict_modes; mode->name; mode++) {
-		for (n = 1; n <= size; n <<= 1) {
-			drm_random_reorder(order, size, &prng);
-			if (evict_something(test, &mm, 0, U64_MAX, nodes, order, size, n, 1,
-					    mode)) {
-				KUNIT_FAIL(test, "%s evict_something(size=%u) failed\n",
-					   mode->name, n);
-				goto out;
-			}
-		}
-
-		for (n = 1; n < size; n <<= 1) {
-			drm_random_reorder(order, size, &prng);
-			if (evict_something(test, &mm, 0, U64_MAX, nodes, order, size,
-					    size / 2, n, mode)) {
-				KUNIT_FAIL(test,
-					   "%s evict_something(size=%u, alignment=%u) failed\n",
-					   mode->name, size / 2, n);
-				goto out;
-			}
-		}
-
-		for_each_prime_number_from(n, 1, min(size, max_prime)) {
-			unsigned int nsize = (size - n + 1) / 2;
-
-			DRM_MM_BUG_ON(!nsize);
-
-			drm_random_reorder(order, size, &prng);
-			if (evict_something(test, &mm, 0, U64_MAX, nodes, order, size,
-					    nsize, n, mode)) {
-				KUNIT_FAIL(test,
-					   "%s evict_something(size=%u, alignment=%u) failed\n",
-					   mode->name, nsize, n);
-				goto out;
-			}
-		}
-
-		cond_resched();
-	}
-
-out:
-	drm_mm_for_each_node_safe(node, next, &mm)
-		drm_mm_remove_node(node);
-	drm_mm_takedown(&mm);
-	kfree(order);
-err_nodes:
-	vfree(nodes);
-}
-
-static void drm_test_mm_evict_range(struct kunit *test)
-{
-	DRM_RND_STATE(prng, random_seed);
-	const unsigned int size = 8192;
-	const unsigned int range_size = size / 2;
-	const unsigned int range_start = size / 4;
-	const unsigned int range_end = range_start + range_size;
-	const struct insert_mode *mode;
-	struct drm_mm mm;
-	struct evict_node *nodes;
-	struct drm_mm_node *node, *next;
-	unsigned int *order, n;
-
-	/* Like drm_test_mm_evict() but now we are limiting the search to a
-	 * small portion of the full drm_mm.
-	 */
-
-	nodes = vzalloc(array_size(size, sizeof(*nodes)));
-	KUNIT_ASSERT_TRUE(test, nodes);
-
-	order = drm_random_order(size, &prng);
-	if (!order)
-		goto err_nodes;
-
-	drm_mm_init(&mm, 0, size);
-	for (n = 0; n < size; n++) {
-		if (drm_mm_insert_node(&mm, &nodes[n].node, 1)) {
-			KUNIT_FAIL(test, "insert failed, step %d\n", n);
-			goto out;
-		}
-	}
-
-	for (mode = evict_modes; mode->name; mode++) {
-		for (n = 1; n <= range_size; n <<= 1) {
-			drm_random_reorder(order, size, &prng);
-			if (evict_something(test, &mm, range_start, range_end, nodes,
-					    order, size, n, 1, mode)) {
-				KUNIT_FAIL(test,
-					   "%s evict_something(size=%u) failed with range [%u, %u]\n",
-					   mode->name, n, range_start, range_end);
-				goto out;
-			}
-		}
-
-		for (n = 1; n <= range_size; n <<= 1) {
-			drm_random_reorder(order, size, &prng);
-			if (evict_something(test, &mm, range_start, range_end, nodes,
-					    order, size, range_size / 2, n, mode)) {
-				KUNIT_FAIL(test,
-					   "%s evict_something(size=%u, alignment=%u) failed with range [%u, %u]\n",
-					   mode->name, range_size / 2, n, range_start, range_end);
-				goto out;
-			}
-		}
-
-		for_each_prime_number_from(n, 1, min(range_size, max_prime)) {
-			unsigned int nsize = (range_size - n + 1) / 2;
-
-			DRM_MM_BUG_ON(!nsize);
-
-			drm_random_reorder(order, size, &prng);
-			if (evict_something(test, &mm, range_start, range_end, nodes,
-					    order, size, nsize, n, mode)) {
-				KUNIT_FAIL(test,
-					   "%s evict_something(size=%u, alignment=%u) failed with range [%u, %u]\n",
-					   mode->name, nsize, n, range_start, range_end);
-				goto out;
-			}
-		}
-
-		cond_resched();
-	}
-
-out:
-	drm_mm_for_each_node_safe(node, next, &mm)
-		drm_mm_remove_node(node);
-	drm_mm_takedown(&mm);
-	kfree(order);
-err_nodes:
-	vfree(nodes);
-}
-
-static unsigned int node_index(const struct drm_mm_node *node)
-{
-	return div64_u64(node->start, node->size);
-}
-
-static void drm_test_mm_topdown(struct kunit *test)
-{
-	const struct insert_mode *topdown = &insert_modes[TOPDOWN];
-
-	DRM_RND_STATE(prng, random_seed);
-	const unsigned int count = 8192;
-	unsigned int size;
-	unsigned long *bitmap;
-	struct drm_mm mm;
-	struct drm_mm_node *nodes, *node, *next;
-	unsigned int *order, n, m, o = 0;
-
-	/* When allocating top-down, we expect to be returned a node
-	 * from a suitable hole at the top of the drm_mm. We check that
-	 * the returned node does match the highest available slot.
-	 */
-
-	nodes = vzalloc(array_size(count, sizeof(*nodes)));
-	KUNIT_ASSERT_TRUE(test, nodes);
-
-	bitmap = bitmap_zalloc(count, GFP_KERNEL);
-	if (!bitmap)
-		goto err_nodes;
-
-	order = drm_random_order(count, &prng);
-	if (!order)
-		goto err_bitmap;
-
-	for (size = 1; size <= 64; size <<= 1) {
-		drm_mm_init(&mm, 0, size * count);
-		for (n = 0; n < count; n++) {
-			if (!expect_insert(test, &mm, &nodes[n], size, 0, n, topdown)) {
-				KUNIT_FAIL(test, "insert failed, size %u step %d\n", size, n);
-				goto out;
-			}
-
-			if (drm_mm_hole_follows(&nodes[n])) {
-				KUNIT_FAIL(test,
-					   "hole after topdown insert %d, start=%llx\n, size=%u",
-					   n, nodes[n].start, size);
-				goto out;
-			}
-
-			if (!assert_one_hole(test, &mm, 0, size * (count - n - 1)))
-				goto out;
-		}
-
-		if (!assert_continuous(test, &mm, size))
-			goto out;
-
-		drm_random_reorder(order, count, &prng);
-		for_each_prime_number_from(n, 1, min(count, max_prime)) {
-			for (m = 0; m < n; m++) {
-				node = &nodes[order[(o + m) % count]];
-				drm_mm_remove_node(node);
-				__set_bit(node_index(node), bitmap);
-			}
-
-			for (m = 0; m < n; m++) {
-				unsigned int last;
-
-				node = &nodes[order[(o + m) % count]];
-				if (!expect_insert(test, &mm, node, size, 0, 0, topdown)) {
-					KUNIT_FAIL(test, "insert failed, step %d/%d\n", m, n);
-					goto out;
-				}
-
-				if (drm_mm_hole_follows(node)) {
-					KUNIT_FAIL(test,
-						   "hole after topdown insert %d/%d, start=%llx\n",
-						   m, n, node->start);
-					goto out;
-				}
-
-				last = find_last_bit(bitmap, count);
-				if (node_index(node) != last) {
-					KUNIT_FAIL(test,
-						   "node %d/%d, size %d, not inserted into upmost hole, expected %d, found %d\n",
-						   m, n, size, last, node_index(node));
-					goto out;
-				}
-
-				__clear_bit(last, bitmap);
-			}
-
-			DRM_MM_BUG_ON(find_first_bit(bitmap, count) != count);
-
-			o += n;
-		}
-
-		drm_mm_for_each_node_safe(node, next, &mm)
-			drm_mm_remove_node(node);
-		DRM_MM_BUG_ON(!drm_mm_clean(&mm));
-		cond_resched();
-	}
-
-out:
-	drm_mm_for_each_node_safe(node, next, &mm)
-		drm_mm_remove_node(node);
-	drm_mm_takedown(&mm);
-	kfree(order);
-err_bitmap:
-	bitmap_free(bitmap);
-err_nodes:
-	vfree(nodes);
-}
-
-static void drm_test_mm_bottomup(struct kunit *test)
-{
-	const struct insert_mode *bottomup = &insert_modes[BOTTOMUP];
-
-	DRM_RND_STATE(prng, random_seed);
-	const unsigned int count = 8192;
-	unsigned int size;
-	unsigned long *bitmap;
-	struct drm_mm mm;
-	struct drm_mm_node *nodes, *node, *next;
-	unsigned int *order, n, m, o = 0;
-
-	/* Like drm_test_mm_topdown, but instead of searching for the last hole,
-	 * we search for the first.
-	 */
-
-	nodes = vzalloc(array_size(count, sizeof(*nodes)));
-	KUNIT_ASSERT_TRUE(test, nodes);
-
-	bitmap = bitmap_zalloc(count, GFP_KERNEL);
-	if (!bitmap)
-		goto err_nodes;
-
-	order = drm_random_order(count, &prng);
-	if (!order)
-		goto err_bitmap;
-
-	for (size = 1; size <= 64; size <<= 1) {
-		drm_mm_init(&mm, 0, size * count);
-		for (n = 0; n < count; n++) {
-			if (!expect_insert(test, &mm, &nodes[n], size, 0, n, bottomup)) {
-				KUNIT_FAIL(test,
-					   "bottomup insert failed, size %u step %d\n", size, n);
-				goto out;
-			}
-
-			if (!assert_one_hole(test, &mm, size * (n + 1), size * count))
-				goto out;
-		}
-
-		if (!assert_continuous(test, &mm, size))
-			goto out;
-
-		drm_random_reorder(order, count, &prng);
-		for_each_prime_number_from(n, 1, min(count, max_prime)) {
-			for (m = 0; m < n; m++) {
-				node = &nodes[order[(o + m) % count]];
-				drm_mm_remove_node(node);
-				__set_bit(node_index(node), bitmap);
-			}
-
-			for (m = 0; m < n; m++) {
-				unsigned int first;
-
-				node = &nodes[order[(o + m) % count]];
-				if (!expect_insert(test, &mm, node, size, 0, 0, bottomup)) {
-					KUNIT_FAIL(test, "insert failed, step %d/%d\n", m, n);
-					goto out;
-				}
-
-				first = find_first_bit(bitmap, count);
-				if (node_index(node) != first) {
-					KUNIT_FAIL(test,
-						   "node %d/%d not inserted into bottom hole, expected %d, found %d\n",
-						   m, n, first, node_index(node));
-					goto out;
-				}
-				__clear_bit(first, bitmap);
-			}
-
-			DRM_MM_BUG_ON(find_first_bit(bitmap, count) != count);
-
-			o += n;
-		}
-
-		drm_mm_for_each_node_safe(node, next, &mm)
-			drm_mm_remove_node(node);
-		DRM_MM_BUG_ON(!drm_mm_clean(&mm));
-		cond_resched();
-	}
-
-out:
-	drm_mm_for_each_node_safe(node, next, &mm)
-		drm_mm_remove_node(node);
-	drm_mm_takedown(&mm);
-	kfree(order);
-err_bitmap:
-	bitmap_free(bitmap);
-err_nodes:
-	vfree(nodes);
-}
-
 static void drm_test_mm_once(struct kunit *test, unsigned int mode)
 {
 	struct drm_mm mm;
@@ -1817,440 +335,18 @@ static void drm_test_mm_highest(struct kunit *test)
 	drm_test_mm_once(test, DRM_MM_INSERT_HIGH);
 }
 
-static void separate_adjacent_colors(const struct drm_mm_node *node,
-				     unsigned long color, u64 *start, u64 *end)
-{
-	if (drm_mm_node_allocated(node) && node->color != color)
-		++*start;
-
-	node = list_next_entry(node, node_list);
-	if (drm_mm_node_allocated(node) && node->color != color)
-		--*end;
-}
-
-static bool colors_abutt(struct kunit *test, const struct drm_mm_node *node)
-{
-	if (!drm_mm_hole_follows(node) &&
-	    drm_mm_node_allocated(list_next_entry(node, node_list))) {
-		KUNIT_FAIL(test, "colors abutt; %ld [%llx + %llx] is next to %ld [%llx + %llx]!\n",
-			   node->color, node->start, node->size,
-		       list_next_entry(node, node_list)->color,
-		       list_next_entry(node, node_list)->start,
-		       list_next_entry(node, node_list)->size);
-		return true;
-	}
-
-	return false;
-}
-
-static void drm_test_mm_color(struct kunit *test)
-{
-	const unsigned int count = min(4096u, max_iterations);
-	const struct insert_mode *mode;
-	struct drm_mm mm;
-	struct drm_mm_node *node, *nn;
-	unsigned int n;
-
-	/* Color adjustment complicates everything. First we just check
-	 * that when we insert a node we apply any color_adjustment callback.
-	 * The callback we use should ensure that there is a gap between
-	 * any two nodes, and so after each insertion we check that those
-	 * holes are inserted and that they are preserved.
-	 */
-
-	drm_mm_init(&mm, 0, U64_MAX);
-
-	for (n = 1; n <= count; n++) {
-		node = kzalloc(sizeof(*node), GFP_KERNEL);
-		if (!node)
-			goto out;
-
-		if (!expect_insert(test, &mm, node, n, 0, n, &insert_modes[0])) {
-			KUNIT_FAIL(test, "insert failed, step %d\n", n);
-			kfree(node);
-			goto out;
-		}
-	}
-
-	drm_mm_for_each_node_safe(node, nn, &mm) {
-		if (node->color != node->size) {
-			KUNIT_FAIL(test, "invalid color stored: expected %lld, found %ld\n",
-				   node->size, node->color);
-
-			goto out;
-		}
-
-		drm_mm_remove_node(node);
-		kfree(node);
-	}
-
-	/* Now, let's start experimenting with applying a color callback */
-	mm.color_adjust = separate_adjacent_colors;
-	for (mode = insert_modes; mode->name; mode++) {
-		u64 last;
-
-		node = kzalloc(sizeof(*node), GFP_KERNEL);
-		if (!node)
-			goto out;
-
-		node->size = 1 + 2 * count;
-		node->color = node->size;
-
-		if (drm_mm_reserve_node(&mm, node)) {
-			KUNIT_FAIL(test, "initial reserve failed!\n");
-			goto out;
-		}
-
-		last = node->start + node->size;
-
-		for (n = 1; n <= count; n++) {
-			int rem;
-
-			node = kzalloc(sizeof(*node), GFP_KERNEL);
-			if (!node)
-				goto out;
-
-			node->start = last;
-			node->size = n + count;
-			node->color = node->size;
-
-			if (drm_mm_reserve_node(&mm, node) != -ENOSPC) {
-				KUNIT_FAIL(test, "reserve %d did not report color overlap!", n);
-				goto out;
-			}
-
-			node->start += n + 1;
-			rem = misalignment(node, n + count);
-			node->start += n + count - rem;
-
-			if (drm_mm_reserve_node(&mm, node)) {
-				KUNIT_FAIL(test, "reserve %d failed", n);
-				goto out;
-			}
-
-			last = node->start + node->size;
-		}
-
-		for (n = 1; n <= count; n++) {
-			node = kzalloc(sizeof(*node), GFP_KERNEL);
-			if (!node)
-				goto out;
-
-			if (!expect_insert(test, &mm, node, n, n, n, mode)) {
-				KUNIT_FAIL(test, "%s insert failed, step %d\n", mode->name, n);
-				kfree(node);
-				goto out;
-			}
-		}
-
-		drm_mm_for_each_node_safe(node, nn, &mm) {
-			u64 rem;
-
-			if (node->color != node->size) {
-				KUNIT_FAIL(test,
-					   "%s invalid color stored: expected %lld, found %ld\n",
-					   mode->name, node->size, node->color);
-
-				goto out;
-			}
-
-			if (colors_abutt(test, node))
-				goto out;
-
-			div64_u64_rem(node->start, node->size, &rem);
-			if (rem) {
-				KUNIT_FAIL(test,
-					   "%s colored node misaligned, start=%llx expected alignment=%lld [rem=%lld]\n",
-					   mode->name, node->start, node->size, rem);
-				goto out;
-			}
-
-			drm_mm_remove_node(node);
-			kfree(node);
-		}
-
-		cond_resched();
-	}
-
-out:
-	drm_mm_for_each_node_safe(node, nn, &mm) {
-		drm_mm_remove_node(node);
-		kfree(node);
-	}
-	drm_mm_takedown(&mm);
-}
-
-static int evict_color(struct kunit *test, struct drm_mm *mm, u64 range_start,
-		       u64 range_end, struct evict_node *nodes, unsigned int *order,
-		unsigned int count, unsigned int size, unsigned int alignment,
-		unsigned long color, const struct insert_mode *mode)
-{
-	struct drm_mm_scan scan;
-	LIST_HEAD(evict_list);
-	struct evict_node *e;
-	struct drm_mm_node tmp;
-	int err;
-
-	drm_mm_scan_init_with_range(&scan, mm, size, alignment, color, range_start,
-				    range_end, mode->mode);
-	if (!evict_nodes(test, &scan, nodes, order, count, true, &evict_list))
-		return -EINVAL;
-
-	memset(&tmp, 0, sizeof(tmp));
-	err = drm_mm_insert_node_generic(mm, &tmp, size, alignment, color,
-					 DRM_MM_INSERT_EVICT);
-	if (err) {
-		KUNIT_FAIL(test,
-			   "Failed to insert into eviction hole: size=%d, align=%d, color=%lu, err=%d\n",
-			   size, alignment, color, err);
-		show_scan(test, &scan);
-		show_holes(test, mm, 3);
-		return err;
-	}
-
-	if (tmp.start < range_start || tmp.start + tmp.size > range_end) {
-		KUNIT_FAIL(test,
-			   "Inserted [address=%llu + %llu] did not fit into the request range [%llu, %llu]\n",
-			   tmp.start, tmp.size, range_start, range_end);
-		err = -EINVAL;
-	}
-
-	if (colors_abutt(test, &tmp))
-		err = -EINVAL;
-
-	if (!assert_node(test, &tmp, mm, size, alignment, color)) {
-		KUNIT_FAIL(test,
-			   "Inserted did not fit the eviction hole: size=%lld [%d], align=%d [rem=%lld], start=%llx\n",
-			   tmp.size, size, alignment, misalignment(&tmp, alignment), tmp.start);
-		err = -EINVAL;
-	}
-
-	drm_mm_remove_node(&tmp);
-	if (err)
-		return err;
-
-	list_for_each_entry(e, &evict_list, link) {
-		err = drm_mm_reserve_node(mm, &e->node);
-		if (err) {
-			KUNIT_FAIL(test, "Failed to reinsert node after eviction: start=%llx\n",
-				   e->node.start);
-			return err;
-		}
-	}
-
-	cond_resched();
-	return 0;
-}
-
-static void drm_test_mm_color_evict(struct kunit *test)
-{
-	DRM_RND_STATE(prng, random_seed);
-	const unsigned int total_size = min(8192u, max_iterations);
-	const struct insert_mode *mode;
-	unsigned long color = 0;
-	struct drm_mm mm;
-	struct evict_node *nodes;
-	struct drm_mm_node *node, *next;
-	unsigned int *order, n;
-
-	/* Check that the drm_mm_scan also honours color adjustment when
-	 * choosing its victims to create a hole. Our color_adjust does not
-	 * allow two nodes to be placed together without an intervening hole
-	 * enlarging the set of victims that must be evicted.
-	 */
-
-	nodes = vzalloc(array_size(total_size, sizeof(*nodes)));
-	KUNIT_ASSERT_TRUE(test, nodes);
-
-	order = drm_random_order(total_size, &prng);
-	if (!order)
-		goto err_nodes;
-
-	drm_mm_init(&mm, 0, 2 * total_size - 1);
-	mm.color_adjust = separate_adjacent_colors;
-	for (n = 0; n < total_size; n++) {
-		if (!expect_insert(test, &mm, &nodes[n].node,
-				   1, 0, color++,
-				   &insert_modes[0])) {
-			KUNIT_FAIL(test, "insert failed, step %d\n", n);
-			goto out;
-		}
-	}
-
-	for (mode = evict_modes; mode->name; mode++) {
-		for (n = 1; n <= total_size; n <<= 1) {
-			drm_random_reorder(order, total_size, &prng);
-			if (evict_color(test, &mm, 0, U64_MAX, nodes, order, total_size,
-					n, 1, color++, mode)) {
-				KUNIT_FAIL(test, "%s evict_color(size=%u) failed\n", mode->name, n);
-				goto out;
-			}
-		}
-
-		for (n = 1; n < total_size; n <<= 1) {
-			drm_random_reorder(order, total_size, &prng);
-			if (evict_color(test, &mm, 0, U64_MAX, nodes, order, total_size,
-					total_size / 2, n, color++, mode)) {
-				KUNIT_FAIL(test, "%s evict_color(size=%u, alignment=%u) failed\n",
-					   mode->name, total_size / 2, n);
-				goto out;
-			}
-		}
-
-		for_each_prime_number_from(n, 1, min(total_size, max_prime)) {
-			unsigned int nsize = (total_size - n + 1) / 2;
-
-			DRM_MM_BUG_ON(!nsize);
-
-			drm_random_reorder(order, total_size, &prng);
-			if (evict_color(test, &mm, 0, U64_MAX, nodes, order, total_size,
-					nsize, n, color++, mode)) {
-				KUNIT_FAIL(test, "%s evict_color(size=%u, alignment=%u) failed\n",
-					   mode->name, nsize, n);
-				goto out;
-			}
-		}
-
-		cond_resched();
-	}
-
-out:
-	drm_mm_for_each_node_safe(node, next, &mm)
-		drm_mm_remove_node(node);
-	drm_mm_takedown(&mm);
-	kfree(order);
-err_nodes:
-	vfree(nodes);
-}
-
-static void drm_test_mm_color_evict_range(struct kunit *test)
-{
-	DRM_RND_STATE(prng, random_seed);
-	const unsigned int total_size = 8192;
-	const unsigned int range_size = total_size / 2;
-	const unsigned int range_start = total_size / 4;
-	const unsigned int range_end = range_start + range_size;
-	const struct insert_mode *mode;
-	unsigned long color = 0;
-	struct drm_mm mm;
-	struct evict_node *nodes;
-	struct drm_mm_node *node, *next;
-	unsigned int *order, n;
-
-	/* Like drm_test_mm_color_evict(), but limited to small portion of the full
-	 * drm_mm range.
-	 */
-
-	nodes = vzalloc(array_size(total_size, sizeof(*nodes)));
-	KUNIT_ASSERT_TRUE(test, nodes);
-
-	order = drm_random_order(total_size, &prng);
-	if (!order)
-		goto err_nodes;
-
-	drm_mm_init(&mm, 0, 2 * total_size - 1);
-	mm.color_adjust = separate_adjacent_colors;
-	for (n = 0; n < total_size; n++) {
-		if (!expect_insert(test, &mm, &nodes[n].node,
-				   1, 0, color++,
-				   &insert_modes[0])) {
-			KUNIT_FAIL(test, "insert failed, step %d\n", n);
-			goto out;
-		}
-	}
-
-	for (mode = evict_modes; mode->name; mode++) {
-		for (n = 1; n <= range_size; n <<= 1) {
-			drm_random_reorder(order, range_size, &prng);
-			if (evict_color(test, &mm, range_start, range_end, nodes, order,
-					total_size, n, 1, color++, mode)) {
-				KUNIT_FAIL(test,
-					   "%s evict_color(size=%u) failed for range [%x, %x]\n",
-						mode->name, n, range_start, range_end);
-				goto out;
-			}
-		}
-
-		for (n = 1; n < range_size; n <<= 1) {
-			drm_random_reorder(order, total_size, &prng);
-			if (evict_color(test, &mm, range_start, range_end, nodes, order,
-					total_size, range_size / 2, n, color++, mode)) {
-				KUNIT_FAIL(test,
-					   "%s evict_color(size=%u, alignment=%u) failed for range [%x, %x]\n",
-					   mode->name, total_size / 2, n, range_start, range_end);
-				goto out;
-			}
-		}
-
-		for_each_prime_number_from(n, 1, min(range_size, max_prime)) {
-			unsigned int nsize = (range_size - n + 1) / 2;
-
-			DRM_MM_BUG_ON(!nsize);
-
-			drm_random_reorder(order, total_size, &prng);
-			if (evict_color(test, &mm, range_start, range_end, nodes, order,
-					total_size, nsize, n, color++, mode)) {
-				KUNIT_FAIL(test,
-					   "%s evict_color(size=%u, alignment=%u) failed for range [%x, %x]\n",
-					   mode->name, nsize, n, range_start, range_end);
-				goto out;
-			}
-		}
-
-		cond_resched();
-	}
-
-out:
-	drm_mm_for_each_node_safe(node, next, &mm)
-		drm_mm_remove_node(node);
-	drm_mm_takedown(&mm);
-	kfree(order);
-err_nodes:
-	vfree(nodes);
-}
-
-static int drm_mm_suite_init(struct kunit_suite *suite)
-{
-	while (!random_seed)
-		random_seed = get_random_u32();
-
-	kunit_info(suite,
-		   "Testing DRM range manager, with random_seed=0x%x max_iterations=%u max_prime=%u\n",
-		   random_seed, max_iterations, max_prime);
-
-	return 0;
-}
-
-module_param(random_seed, uint, 0400);
-module_param(max_iterations, uint, 0400);
-module_param(max_prime, uint, 0400);
-
 static struct kunit_case drm_mm_tests[] = {
 	KUNIT_CASE(drm_test_mm_init),
 	KUNIT_CASE(drm_test_mm_debug),
-	KUNIT_CASE(drm_test_mm_reserve),
-	KUNIT_CASE(drm_test_mm_insert),
-	KUNIT_CASE(drm_test_mm_replace),
-	KUNIT_CASE(drm_test_mm_insert_range),
-	KUNIT_CASE(drm_test_mm_frag),
-	KUNIT_CASE(drm_test_mm_align),
 	KUNIT_CASE(drm_test_mm_align32),
 	KUNIT_CASE(drm_test_mm_align64),
-	KUNIT_CASE(drm_test_mm_evict),
-	KUNIT_CASE(drm_test_mm_evict_range),
-	KUNIT_CASE(drm_test_mm_topdown),
-	KUNIT_CASE(drm_test_mm_bottomup),
 	KUNIT_CASE(drm_test_mm_lowest),
 	KUNIT_CASE(drm_test_mm_highest),
-	KUNIT_CASE(drm_test_mm_color),
-	KUNIT_CASE(drm_test_mm_color_evict),
-	KUNIT_CASE(drm_test_mm_color_evict_range),
 	{}
 };
 
 static struct kunit_suite drm_mm_test_suite = {
 	.name = "drm_mm",
-	.suite_init = drm_mm_suite_init,
 	.test_cases = drm_mm_tests,
 };
 
diff --git a/drivers/gpu/drm/tidss/tidss_kms.c b/drivers/gpu/drm/tidss/tidss_kms.c
index c979ad1af236..d096d8d2bc8f 100644
--- a/drivers/gpu/drm/tidss/tidss_kms.c
+++ b/drivers/gpu/drm/tidss/tidss_kms.c
@@ -4,8 +4,6 @@
  * Author: Tomi Valkeinen <tomi.valkeinen@ti.com>
  */
 
-#include <linux/dma-fence.h>
-
 #include <drm/drm_atomic.h>
 #include <drm/drm_atomic_helper.h>
 #include <drm/drm_bridge.h>
@@ -25,7 +23,6 @@ static void tidss_atomic_commit_tail(struct drm_atomic_state *old_state)
 {
 	struct drm_device *ddev = old_state->dev;
 	struct tidss_device *tidss = to_tidss(ddev);
-	bool fence_cookie = dma_fence_begin_signalling();
 
 	dev_dbg(ddev->dev, "%s\n", __func__);
 
@@ -36,7 +33,6 @@ static void tidss_atomic_commit_tail(struct drm_atomic_state *old_state)
 	drm_atomic_helper_commit_modeset_enables(ddev, old_state);
 
 	drm_atomic_helper_commit_hw_done(old_state);
-	dma_fence_end_signalling(fence_cookie);
 	drm_atomic_helper_wait_for_flip_done(ddev, old_state);
 
 	drm_atomic_helper_cleanup_planes(ddev, old_state);
diff --git a/drivers/gpu/drm/tiny/cirrus.c b/drivers/gpu/drm/tiny/cirrus.c
index c5c34cd2edc1..4e3a152f897a 100644
--- a/drivers/gpu/drm/tiny/cirrus.c
+++ b/drivers/gpu/drm/tiny/cirrus.c
@@ -411,7 +411,8 @@ static void cirrus_primary_plane_helper_atomic_update(struct drm_plane *plane,
 		unsigned int offset = drm_fb_clip_offset(pitch, format, &damage);
 		struct iosys_map dst = IOSYS_MAP_INIT_OFFSET(&vaddr, offset);
 
-		drm_fb_blit(&dst, &pitch, format->format, shadow_plane_state->data, fb, &damage);
+		drm_fb_blit(&dst, &pitch, format->format, shadow_plane_state->data, fb,
+			    &damage, &shadow_plane_state->fmtcnv_state);
 	}
 
 	drm_dev_exit(idx);
diff --git a/drivers/gpu/drm/tiny/ili9225.c b/drivers/gpu/drm/tiny/ili9225.c
index 4ceb68ffac4b..dd8b0a181be9 100644
--- a/drivers/gpu/drm/tiny/ili9225.c
+++ b/drivers/gpu/drm/tiny/ili9225.c
@@ -78,7 +78,7 @@ static inline int ili9225_command(struct mipi_dbi *dbi, u8 cmd, u16 data)
 }
 
 static void ili9225_fb_dirty(struct iosys_map *src, struct drm_framebuffer *fb,
-			     struct drm_rect *rect)
+			     struct drm_rect *rect, struct drm_format_conv_state *fmtcnv_state)
 {
 	struct mipi_dbi_dev *dbidev = drm_to_mipi_dbi_dev(fb->dev);
 	unsigned int height = rect->y2 - rect->y1;
@@ -98,7 +98,7 @@ static void ili9225_fb_dirty(struct iosys_map *src, struct drm_framebuffer *fb,
 	if (!dbi->dc || !full || swap ||
 	    fb->format->format == DRM_FORMAT_XRGB8888) {
 		tr = dbidev->tx_buf;
-		ret = mipi_dbi_buf_copy(tr, src, fb, rect, swap);
+		ret = mipi_dbi_buf_copy(tr, src, fb, rect, swap, fmtcnv_state);
 		if (ret)
 			goto err_msg;
 	} else {
@@ -171,7 +171,8 @@ static void ili9225_pipe_update(struct drm_simple_display_pipe *pipe,
 		return;
 
 	if (drm_atomic_helper_damage_merged(old_state, state, &rect))
-		ili9225_fb_dirty(&shadow_plane_state->data[0], fb, &rect);
+		ili9225_fb_dirty(&shadow_plane_state->data[0], fb, &rect,
+				 &shadow_plane_state->fmtcnv_state);
 
 	drm_dev_exit(idx);
 }
@@ -281,7 +282,8 @@ static void ili9225_pipe_enable(struct drm_simple_display_pipe *pipe,
 
 	ili9225_command(dbi, ILI9225_DISPLAY_CONTROL_1, 0x1017);
 
-	ili9225_fb_dirty(&shadow_plane_state->data[0], fb, &rect);
+	ili9225_fb_dirty(&shadow_plane_state->data[0], fb, &rect,
+			 &shadow_plane_state->fmtcnv_state);
 
 out_exit:
 	drm_dev_exit(idx);
diff --git a/drivers/gpu/drm/tiny/ofdrm.c b/drivers/gpu/drm/tiny/ofdrm.c
index 2d999a0facde..05a72473cfc6 100644
--- a/drivers/gpu/drm/tiny/ofdrm.c
+++ b/drivers/gpu/drm/tiny/ofdrm.c
@@ -758,7 +758,11 @@ static const uint64_t ofdrm_primary_plane_format_modifiers[] = {
 static int ofdrm_primary_plane_helper_atomic_check(struct drm_plane *plane,
 						   struct drm_atomic_state *new_state)
 {
+	struct drm_device *dev = plane->dev;
+	struct ofdrm_device *odev = ofdrm_device_of_dev(dev);
 	struct drm_plane_state *new_plane_state = drm_atomic_get_new_plane_state(new_state, plane);
+	struct drm_shadow_plane_state *new_shadow_plane_state =
+		to_drm_shadow_plane_state(new_plane_state);
 	struct drm_framebuffer *new_fb = new_plane_state->fb;
 	struct drm_crtc *new_crtc = new_plane_state->crtc;
 	struct drm_crtc_state *new_crtc_state = NULL;
@@ -777,6 +781,16 @@ static int ofdrm_primary_plane_helper_atomic_check(struct drm_plane *plane,
 	else if (!new_plane_state->visible)
 		return 0;
 
+	if (new_fb->format != odev->format) {
+		void *buf;
+
+		/* format conversion necessary; reserve buffer */
+		buf = drm_format_conv_state_reserve(&new_shadow_plane_state->fmtcnv_state,
+						    odev->pitch, GFP_KERNEL);
+		if (!buf)
+			return -ENOMEM;
+	}
+
 	new_crtc_state = drm_atomic_get_new_crtc_state(new_state, new_plane_state->crtc);
 
 	new_ofdrm_crtc_state = to_ofdrm_crtc_state(new_crtc_state);
@@ -817,7 +831,7 @@ static void ofdrm_primary_plane_helper_atomic_update(struct drm_plane *plane,
 
 		iosys_map_incr(&dst, drm_fb_clip_offset(dst_pitch, dst_format, &dst_clip));
 		drm_fb_blit(&dst, &dst_pitch, dst_format->format, shadow_plane_state->data, fb,
-			    &damage);
+			    &damage, &shadow_plane_state->fmtcnv_state);
 	}
 
 	drm_dev_exit(idx);
diff --git a/drivers/gpu/drm/tiny/repaper.c b/drivers/gpu/drm/tiny/repaper.c
index 73dd4f4289c2..8fd6758f5725 100644
--- a/drivers/gpu/drm/tiny/repaper.c
+++ b/drivers/gpu/drm/tiny/repaper.c
@@ -509,7 +509,8 @@ static void repaper_get_temperature(struct repaper_epd *epd)
 	epd->factored_stage_time = epd->stage_time * factor10x / 10;
 }
 
-static int repaper_fb_dirty(struct drm_framebuffer *fb)
+static int repaper_fb_dirty(struct drm_framebuffer *fb,
+			    struct drm_format_conv_state *fmtcnv_state)
 {
 	struct drm_gem_dma_object *dma_obj = drm_fb_dma_get_gem_obj(fb, 0);
 	struct repaper_epd *epd = drm_to_epd(fb->dev);
@@ -545,7 +546,7 @@ static int repaper_fb_dirty(struct drm_framebuffer *fb)
 
 	iosys_map_set_vaddr(&dst, buf);
 	iosys_map_set_vaddr(&vmap, dma_obj->vaddr);
-	drm_fb_xrgb8888_to_mono(&dst, &dst_pitch, &vmap, fb, &clip);
+	drm_fb_xrgb8888_to_mono(&dst, &dst_pitch, &vmap, fb, &clip, fmtcnv_state);
 
 	drm_gem_fb_end_cpu_access(fb, DMA_FROM_DEVICE);
 
@@ -830,13 +831,16 @@ static void repaper_pipe_update(struct drm_simple_display_pipe *pipe,
 				struct drm_plane_state *old_state)
 {
 	struct drm_plane_state *state = pipe->plane.state;
+	struct drm_format_conv_state fmtcnv_state = DRM_FORMAT_CONV_STATE_INIT;
 	struct drm_rect rect;
 
 	if (!pipe->crtc.state->active)
 		return;
 
 	if (drm_atomic_helper_damage_merged(old_state, state, &rect))
-		repaper_fb_dirty(state->fb);
+		repaper_fb_dirty(state->fb, &fmtcnv_state);
+
+	drm_format_conv_state_release(&fmtcnv_state);
 }
 
 static const struct drm_simple_display_pipe_funcs repaper_pipe_funcs = {
diff --git a/drivers/gpu/drm/tiny/simpledrm.c b/drivers/gpu/drm/tiny/simpledrm.c
index 5fefc895bca2..34bbbd7b53dd 100644
--- a/drivers/gpu/drm/tiny/simpledrm.c
+++ b/drivers/gpu/drm/tiny/simpledrm.c
@@ -19,6 +19,7 @@
 #include <drm/drm_drv.h>
 #include <drm/drm_fbdev_generic.h>
 #include <drm/drm_format_helper.h>
+#include <drm/drm_framebuffer.h>
 #include <drm/drm_gem_atomic_helper.h>
 #include <drm/drm_gem_framebuffer_helper.h>
 #include <drm/drm_gem_shmem_helper.h>
@@ -579,6 +580,44 @@ static const uint64_t simpledrm_primary_plane_format_modifiers[] = {
 	DRM_FORMAT_MOD_INVALID
 };
 
+static int simpledrm_primary_plane_helper_atomic_check(struct drm_plane *plane,
+						       struct drm_atomic_state *state)
+{
+	struct drm_plane_state *new_plane_state = drm_atomic_get_new_plane_state(state, plane);
+	struct drm_shadow_plane_state *new_shadow_plane_state =
+		to_drm_shadow_plane_state(new_plane_state);
+	struct drm_framebuffer *new_fb = new_plane_state->fb;
+	struct drm_crtc *new_crtc = new_plane_state->crtc;
+	struct drm_crtc_state *new_crtc_state = NULL;
+	struct drm_device *dev = plane->dev;
+	struct simpledrm_device *sdev = simpledrm_device_of_dev(dev);
+	int ret;
+
+	if (new_crtc)
+		new_crtc_state = drm_atomic_get_new_crtc_state(state, new_crtc);
+
+	ret = drm_atomic_helper_check_plane_state(new_plane_state, new_crtc_state,
+						  DRM_PLANE_NO_SCALING,
+						  DRM_PLANE_NO_SCALING,
+						  false, false);
+	if (ret)
+		return ret;
+	else if (!new_plane_state->visible)
+		return 0;
+
+	if (new_fb->format != sdev->format) {
+		void *buf;
+
+		/* format conversion necessary; reserve buffer */
+		buf = drm_format_conv_state_reserve(&new_shadow_plane_state->fmtcnv_state,
+						    sdev->pitch, GFP_KERNEL);
+		if (!buf)
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
 static void simpledrm_primary_plane_helper_atomic_update(struct drm_plane *plane,
 							 struct drm_atomic_state *state)
 {
@@ -609,7 +648,7 @@ static void simpledrm_primary_plane_helper_atomic_update(struct drm_plane *plane
 
 		iosys_map_incr(&dst, drm_fb_clip_offset(sdev->pitch, sdev->format, &dst_clip));
 		drm_fb_blit(&dst, &sdev->pitch, sdev->format->format, shadow_plane_state->data,
-			    fb, &damage);
+			    fb, &damage, &shadow_plane_state->fmtcnv_state);
 	}
 
 	drm_dev_exit(idx);
@@ -635,7 +674,7 @@ static void simpledrm_primary_plane_helper_atomic_disable(struct drm_plane *plan
 
 static const struct drm_plane_helper_funcs simpledrm_primary_plane_helper_funcs = {
 	DRM_GEM_SHADOW_PLANE_HELPER_FUNCS,
-	.atomic_check = drm_plane_helper_atomic_check,
+	.atomic_check = simpledrm_primary_plane_helper_atomic_check,
 	.atomic_update = simpledrm_primary_plane_helper_atomic_update,
 	.atomic_disable = simpledrm_primary_plane_helper_atomic_disable,
 };
diff --git a/drivers/gpu/drm/tiny/st7586.c b/drivers/gpu/drm/tiny/st7586.c
index 3cf4eec16a81..7336fa1ddaed 100644
--- a/drivers/gpu/drm/tiny/st7586.c
+++ b/drivers/gpu/drm/tiny/st7586.c
@@ -64,7 +64,8 @@ static const u8 st7586_lookup[] = { 0x7, 0x4, 0x2, 0x0 };
 
 static void st7586_xrgb8888_to_gray332(u8 *dst, void *vaddr,
 				       struct drm_framebuffer *fb,
-				       struct drm_rect *clip)
+				       struct drm_rect *clip,
+				       struct drm_format_conv_state *fmtcnv_state)
 {
 	size_t len = (clip->x2 - clip->x1) * (clip->y2 - clip->y1);
 	unsigned int x, y;
@@ -77,7 +78,7 @@ static void st7586_xrgb8888_to_gray332(u8 *dst, void *vaddr,
 
 	iosys_map_set_vaddr(&dst_map, buf);
 	iosys_map_set_vaddr(&vmap, vaddr);
-	drm_fb_xrgb8888_to_gray8(&dst_map, NULL, &vmap, fb, clip);
+	drm_fb_xrgb8888_to_gray8(&dst_map, NULL, &vmap, fb, clip, fmtcnv_state);
 	src = buf;
 
 	for (y = clip->y1; y < clip->y2; y++) {
@@ -93,7 +94,7 @@ static void st7586_xrgb8888_to_gray332(u8 *dst, void *vaddr,
 }
 
 static int st7586_buf_copy(void *dst, struct iosys_map *src, struct drm_framebuffer *fb,
-			   struct drm_rect *clip)
+			   struct drm_rect *clip, struct drm_format_conv_state *fmtcnv_state)
 {
 	int ret;
 
@@ -101,7 +102,7 @@ static int st7586_buf_copy(void *dst, struct iosys_map *src, struct drm_framebuf
 	if (ret)
 		return ret;
 
-	st7586_xrgb8888_to_gray332(dst, src->vaddr, fb, clip);
+	st7586_xrgb8888_to_gray332(dst, src->vaddr, fb, clip, fmtcnv_state);
 
 	drm_gem_fb_end_cpu_access(fb, DMA_FROM_DEVICE);
 
@@ -109,7 +110,7 @@ static int st7586_buf_copy(void *dst, struct iosys_map *src, struct drm_framebuf
 }
 
 static void st7586_fb_dirty(struct iosys_map *src, struct drm_framebuffer *fb,
-			    struct drm_rect *rect)
+			    struct drm_rect *rect, struct drm_format_conv_state *fmtcnv_state)
 {
 	struct mipi_dbi_dev *dbidev = drm_to_mipi_dbi_dev(fb->dev);
 	struct mipi_dbi *dbi = &dbidev->dbi;
@@ -121,7 +122,7 @@ static void st7586_fb_dirty(struct iosys_map *src, struct drm_framebuffer *fb,
 
 	DRM_DEBUG_KMS("Flushing [FB:%d] " DRM_RECT_FMT "\n", fb->base.id, DRM_RECT_ARG(rect));
 
-	ret = st7586_buf_copy(dbidev->tx_buf, src, fb, rect);
+	ret = st7586_buf_copy(dbidev->tx_buf, src, fb, rect, fmtcnv_state);
 	if (ret)
 		goto err_msg;
 
@@ -160,7 +161,8 @@ static void st7586_pipe_update(struct drm_simple_display_pipe *pipe,
 		return;
 
 	if (drm_atomic_helper_damage_merged(old_state, state, &rect))
-		st7586_fb_dirty(&shadow_plane_state->data[0], fb, &rect);
+		st7586_fb_dirty(&shadow_plane_state->data[0], fb, &rect,
+				&shadow_plane_state->fmtcnv_state);
 
 	drm_dev_exit(idx);
 }
@@ -238,7 +240,8 @@ static void st7586_pipe_enable(struct drm_simple_display_pipe *pipe,
 
 	msleep(100);
 
-	st7586_fb_dirty(&shadow_plane_state->data[0], fb, &rect);
+	st7586_fb_dirty(&shadow_plane_state->data[0], fb, &rect,
+			&shadow_plane_state->fmtcnv_state);
 
 	mipi_dbi_command(dbi, MIPI_DCS_SET_DISPLAY_ON);
 out_exit:
diff --git a/drivers/gpu/drm/v3d/Makefile b/drivers/gpu/drm/v3d/Makefile
index e8b314137020..4b21b20e4998 100644
--- a/drivers/gpu/drm/v3d/Makefile
+++ b/drivers/gpu/drm/v3d/Makefile
@@ -11,7 +11,8 @@ v3d-y := \
 	v3d_mmu.o \
 	v3d_perfmon.o \
 	v3d_trace_points.o \
-	v3d_sched.o
+	v3d_sched.o \
+	v3d_sysfs.o
 
 v3d-$(CONFIG_DEBUG_FS) += v3d_debugfs.o
 
diff --git a/drivers/gpu/drm/v3d/v3d_debugfs.c b/drivers/gpu/drm/v3d/v3d_debugfs.c
index 330669f51fa7..f843a50d5dce 100644
--- a/drivers/gpu/drm/v3d/v3d_debugfs.c
+++ b/drivers/gpu/drm/v3d/v3d_debugfs.c
@@ -12,69 +12,83 @@
 #include "v3d_drv.h"
 #include "v3d_regs.h"
 
-#define REGDEF(reg) { reg, #reg }
+#define REGDEF(min_ver, max_ver, reg) { min_ver, max_ver, reg, #reg }
 struct v3d_reg_def {
+	u32 min_ver;
+	u32 max_ver;
 	u32 reg;
 	const char *name;
 };
 
 static const struct v3d_reg_def v3d_hub_reg_defs[] = {
-	REGDEF(V3D_HUB_AXICFG),
-	REGDEF(V3D_HUB_UIFCFG),
-	REGDEF(V3D_HUB_IDENT0),
-	REGDEF(V3D_HUB_IDENT1),
-	REGDEF(V3D_HUB_IDENT2),
-	REGDEF(V3D_HUB_IDENT3),
-	REGDEF(V3D_HUB_INT_STS),
-	REGDEF(V3D_HUB_INT_MSK_STS),
+	REGDEF(33, 42, V3D_HUB_AXICFG),
+	REGDEF(33, 71, V3D_HUB_UIFCFG),
+	REGDEF(33, 71, V3D_HUB_IDENT0),
+	REGDEF(33, 71, V3D_HUB_IDENT1),
+	REGDEF(33, 71, V3D_HUB_IDENT2),
+	REGDEF(33, 71, V3D_HUB_IDENT3),
+	REGDEF(33, 71, V3D_HUB_INT_STS),
+	REGDEF(33, 71, V3D_HUB_INT_MSK_STS),
 
-	REGDEF(V3D_MMU_CTL),
-	REGDEF(V3D_MMU_VIO_ADDR),
-	REGDEF(V3D_MMU_VIO_ID),
-	REGDEF(V3D_MMU_DEBUG_INFO),
+	REGDEF(33, 71, V3D_MMU_CTL),
+	REGDEF(33, 71, V3D_MMU_VIO_ADDR),
+	REGDEF(33, 71, V3D_MMU_VIO_ID),
+	REGDEF(33, 71, V3D_MMU_DEBUG_INFO),
+
+	REGDEF(71, 71, V3D_GMP_STATUS(71)),
+	REGDEF(71, 71, V3D_GMP_CFG(71)),
+	REGDEF(71, 71, V3D_GMP_VIO_ADDR(71)),
 };
 
 static const struct v3d_reg_def v3d_gca_reg_defs[] = {
-	REGDEF(V3D_GCA_SAFE_SHUTDOWN),
-	REGDEF(V3D_GCA_SAFE_SHUTDOWN_ACK),
+	REGDEF(33, 33, V3D_GCA_SAFE_SHUTDOWN),
+	REGDEF(33, 33, V3D_GCA_SAFE_SHUTDOWN_ACK),
 };
 
 static const struct v3d_reg_def v3d_core_reg_defs[] = {
-	REGDEF(V3D_CTL_IDENT0),
-	REGDEF(V3D_CTL_IDENT1),
-	REGDEF(V3D_CTL_IDENT2),
-	REGDEF(V3D_CTL_MISCCFG),
-	REGDEF(V3D_CTL_INT_STS),
-	REGDEF(V3D_CTL_INT_MSK_STS),
-	REGDEF(V3D_CLE_CT0CS),
-	REGDEF(V3D_CLE_CT0CA),
-	REGDEF(V3D_CLE_CT0EA),
-	REGDEF(V3D_CLE_CT1CS),
-	REGDEF(V3D_CLE_CT1CA),
-	REGDEF(V3D_CLE_CT1EA),
+	REGDEF(33, 71, V3D_CTL_IDENT0),
+	REGDEF(33, 71, V3D_CTL_IDENT1),
+	REGDEF(33, 71, V3D_CTL_IDENT2),
+	REGDEF(33, 71, V3D_CTL_MISCCFG),
+	REGDEF(33, 71, V3D_CTL_INT_STS),
+	REGDEF(33, 71, V3D_CTL_INT_MSK_STS),
+	REGDEF(33, 71, V3D_CLE_CT0CS),
+	REGDEF(33, 71, V3D_CLE_CT0CA),
+	REGDEF(33, 71, V3D_CLE_CT0EA),
+	REGDEF(33, 71, V3D_CLE_CT1CS),
+	REGDEF(33, 71, V3D_CLE_CT1CA),
+	REGDEF(33, 71, V3D_CLE_CT1EA),
 
-	REGDEF(V3D_PTB_BPCA),
-	REGDEF(V3D_PTB_BPCS),
+	REGDEF(33, 71, V3D_PTB_BPCA),
+	REGDEF(33, 71, V3D_PTB_BPCS),
 
-	REGDEF(V3D_GMP_STATUS),
-	REGDEF(V3D_GMP_CFG),
-	REGDEF(V3D_GMP_VIO_ADDR),
+	REGDEF(33, 41, V3D_GMP_STATUS(33)),
+	REGDEF(33, 41, V3D_GMP_CFG(33)),
+	REGDEF(33, 41, V3D_GMP_VIO_ADDR(33)),
 
-	REGDEF(V3D_ERR_FDBGO),
-	REGDEF(V3D_ERR_FDBGB),
-	REGDEF(V3D_ERR_FDBGS),
-	REGDEF(V3D_ERR_STAT),
+	REGDEF(33, 71, V3D_ERR_FDBGO),
+	REGDEF(33, 71, V3D_ERR_FDBGB),
+	REGDEF(33, 71, V3D_ERR_FDBGS),
+	REGDEF(33, 71, V3D_ERR_STAT),
 };
 
 static const struct v3d_reg_def v3d_csd_reg_defs[] = {
-	REGDEF(V3D_CSD_STATUS),
-	REGDEF(V3D_CSD_CURRENT_CFG0),
-	REGDEF(V3D_CSD_CURRENT_CFG1),
-	REGDEF(V3D_CSD_CURRENT_CFG2),
-	REGDEF(V3D_CSD_CURRENT_CFG3),
-	REGDEF(V3D_CSD_CURRENT_CFG4),
-	REGDEF(V3D_CSD_CURRENT_CFG5),
-	REGDEF(V3D_CSD_CURRENT_CFG6),
+	REGDEF(41, 71, V3D_CSD_STATUS),
+	REGDEF(41, 41, V3D_CSD_CURRENT_CFG0(41)),
+	REGDEF(41, 41, V3D_CSD_CURRENT_CFG1(41)),
+	REGDEF(41, 41, V3D_CSD_CURRENT_CFG2(41)),
+	REGDEF(41, 41, V3D_CSD_CURRENT_CFG3(41)),
+	REGDEF(41, 41, V3D_CSD_CURRENT_CFG4(41)),
+	REGDEF(41, 41, V3D_CSD_CURRENT_CFG5(41)),
+	REGDEF(41, 41, V3D_CSD_CURRENT_CFG6(41)),
+	REGDEF(71, 71, V3D_CSD_CURRENT_CFG0(71)),
+	REGDEF(71, 71, V3D_CSD_CURRENT_CFG1(71)),
+	REGDEF(71, 71, V3D_CSD_CURRENT_CFG2(71)),
+	REGDEF(71, 71, V3D_CSD_CURRENT_CFG3(71)),
+	REGDEF(71, 71, V3D_CSD_CURRENT_CFG4(71)),
+	REGDEF(71, 71, V3D_CSD_CURRENT_CFG5(71)),
+	REGDEF(71, 71, V3D_CSD_CURRENT_CFG6(71)),
+	REGDEF(71, 71, V3D_V7_CSD_CURRENT_CFG7),
 };
 
 static int v3d_v3d_debugfs_regs(struct seq_file *m, void *unused)
@@ -85,38 +99,41 @@ static int v3d_v3d_debugfs_regs(struct seq_file *m, void *unused)
 	int i, core;
 
 	for (i = 0; i < ARRAY_SIZE(v3d_hub_reg_defs); i++) {
-		seq_printf(m, "%s (0x%04x): 0x%08x\n",
-			   v3d_hub_reg_defs[i].name, v3d_hub_reg_defs[i].reg,
-			   V3D_READ(v3d_hub_reg_defs[i].reg));
+		const struct v3d_reg_def *def = &v3d_hub_reg_defs[i];
+
+		if (v3d->ver >= def->min_ver && v3d->ver <= def->max_ver) {
+			seq_printf(m, "%s (0x%04x): 0x%08x\n",
+				   def->name, def->reg, V3D_READ(def->reg));
+		}
 	}
 
-	if (v3d->ver < 41) {
-		for (i = 0; i < ARRAY_SIZE(v3d_gca_reg_defs); i++) {
+	for (i = 0; i < ARRAY_SIZE(v3d_gca_reg_defs); i++) {
+		const struct v3d_reg_def *def = &v3d_gca_reg_defs[i];
+
+		if (v3d->ver >= def->min_ver && v3d->ver <= def->max_ver) {
 			seq_printf(m, "%s (0x%04x): 0x%08x\n",
-				   v3d_gca_reg_defs[i].name,
-				   v3d_gca_reg_defs[i].reg,
-				   V3D_GCA_READ(v3d_gca_reg_defs[i].reg));
+				   def->name, def->reg, V3D_GCA_READ(def->reg));
 		}
 	}
 
 	for (core = 0; core < v3d->cores; core++) {
 		for (i = 0; i < ARRAY_SIZE(v3d_core_reg_defs); i++) {
-			seq_printf(m, "core %d %s (0x%04x): 0x%08x\n",
-				   core,
-				   v3d_core_reg_defs[i].name,
-				   v3d_core_reg_defs[i].reg,
-				   V3D_CORE_READ(core,
-						 v3d_core_reg_defs[i].reg));
+			const struct v3d_reg_def *def = &v3d_core_reg_defs[i];
+
+			if (v3d->ver >= def->min_ver && v3d->ver <= def->max_ver) {
+				seq_printf(m, "core %d %s (0x%04x): 0x%08x\n",
+					   core, def->name, def->reg,
+					   V3D_CORE_READ(core, def->reg));
+			}
 		}
 
-		if (v3d_has_csd(v3d)) {
-			for (i = 0; i < ARRAY_SIZE(v3d_csd_reg_defs); i++) {
+		for (i = 0; i < ARRAY_SIZE(v3d_csd_reg_defs); i++) {
+			const struct v3d_reg_def *def = &v3d_csd_reg_defs[i];
+
+			if (v3d->ver >= def->min_ver && v3d->ver <= def->max_ver) {
 				seq_printf(m, "core %d %s (0x%04x): 0x%08x\n",
-					   core,
-					   v3d_csd_reg_defs[i].name,
-					   v3d_csd_reg_defs[i].reg,
-					   V3D_CORE_READ(core,
-							 v3d_csd_reg_defs[i].reg));
+					   core, def->name, def->reg,
+					   V3D_CORE_READ(core, def->reg));
 			}
 		}
 	}
@@ -147,8 +164,10 @@ static int v3d_v3d_debugfs_ident(struct seq_file *m, void *unused)
 		   str_yes_no(ident2 & V3D_HUB_IDENT2_WITH_MMU));
 	seq_printf(m, "TFU:        %s\n",
 		   str_yes_no(ident1 & V3D_HUB_IDENT1_WITH_TFU));
-	seq_printf(m, "TSY:        %s\n",
-		   str_yes_no(ident1 & V3D_HUB_IDENT1_WITH_TSY));
+	if (v3d->ver <= 42) {
+		seq_printf(m, "TSY:        %s\n",
+			   str_yes_no(ident1 & V3D_HUB_IDENT1_WITH_TSY));
+	}
 	seq_printf(m, "MSO:        %s\n",
 		   str_yes_no(ident1 & V3D_HUB_IDENT1_WITH_MSO));
 	seq_printf(m, "L3C:        %s (%dkb)\n",
@@ -177,10 +196,14 @@ static int v3d_v3d_debugfs_ident(struct seq_file *m, void *unused)
 		seq_printf(m, "  QPUs:         %d\n", nslc * qups);
 		seq_printf(m, "  Semaphores:   %d\n",
 			   V3D_GET_FIELD(ident1, V3D_IDENT1_NSEM));
-		seq_printf(m, "  BCG int:      %d\n",
-			   (ident2 & V3D_IDENT2_BCG_INT) != 0);
-		seq_printf(m, "  Override TMU: %d\n",
-			   (misccfg & V3D_MISCCFG_OVRTMUOUT) != 0);
+		if (v3d->ver <= 42) {
+			seq_printf(m, "  BCG int:      %d\n",
+				   (ident2 & V3D_IDENT2_BCG_INT) != 0);
+		}
+		if (v3d->ver < 40) {
+			seq_printf(m, "  Override TMU: %d\n",
+				   (misccfg & V3D_MISCCFG_OVRTMUOUT) != 0);
+		}
 	}
 
 	return 0;
@@ -212,14 +235,15 @@ static int v3d_measure_clock(struct seq_file *m, void *unused)
 	int measure_ms = 1000;
 
 	if (v3d->ver >= 40) {
+		int cycle_count_reg = V3D_PCTR_CYCLE_COUNT(v3d->ver);
 		V3D_CORE_WRITE(core, V3D_V4_PCTR_0_SRC_0_3,
-			       V3D_SET_FIELD(V3D_PCTR_CYCLE_COUNT,
+			       V3D_SET_FIELD(cycle_count_reg,
 					     V3D_PCTR_S0));
 		V3D_CORE_WRITE(core, V3D_V4_PCTR_0_CLR, 1);
 		V3D_CORE_WRITE(core, V3D_V4_PCTR_0_EN, 1);
 	} else {
 		V3D_CORE_WRITE(core, V3D_V3_PCTR_0_PCTRS0,
-			       V3D_PCTR_CYCLE_COUNT);
+			       V3D_PCTR_CYCLE_COUNT(v3d->ver));
 		V3D_CORE_WRITE(core, V3D_V3_PCTR_0_CLR, 1);
 		V3D_CORE_WRITE(core, V3D_V3_PCTR_0_EN,
 			       V3D_V3_PCTR_0_EN_ENABLE |
diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index ffbbe9d527d3..44a1ca57d6a4 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -19,6 +19,7 @@
 #include <linux/module.h>
 #include <linux/of_platform.h>
 #include <linux/platform_device.h>
+#include <linux/sched/clock.h>
 #include <linux/reset.h>
 
 #include <drm/drm_drv.h>
@@ -111,6 +112,10 @@ v3d_open(struct drm_device *dev, struct drm_file *file)
 	v3d_priv->v3d = v3d;
 
 	for (i = 0; i < V3D_MAX_QUEUES; i++) {
+		v3d_priv->enabled_ns[i] = 0;
+		v3d_priv->start_ns[i] = 0;
+		v3d_priv->jobs_sent[i] = 0;
+
 		sched = &v3d->queue[i].sched;
 		drm_sched_entity_init(&v3d_priv->sched_entity[i],
 				      DRM_SCHED_PRIORITY_NORMAL, &sched,
@@ -136,7 +141,35 @@ v3d_postclose(struct drm_device *dev, struct drm_file *file)
 	kfree(v3d_priv);
 }
 
-DEFINE_DRM_GEM_FOPS(v3d_drm_fops);
+static void v3d_show_fdinfo(struct drm_printer *p, struct drm_file *file)
+{
+	struct v3d_file_priv *file_priv = file->driver_priv;
+	u64 timestamp = local_clock();
+	enum v3d_queue queue;
+
+	for (queue = 0; queue < V3D_MAX_QUEUES; queue++) {
+		/* Note that, in case of a GPU reset, the time spent during an
+		 * attempt of executing the job is not computed in the runtime.
+		 */
+		drm_printf(p, "drm-engine-%s: \t%llu ns\n",
+			   v3d_queue_to_string(queue),
+			   file_priv->start_ns[queue] ? file_priv->enabled_ns[queue]
+						      + timestamp - file_priv->start_ns[queue]
+						      : file_priv->enabled_ns[queue]);
+
+		/* Note that we only count jobs that completed. Therefore, jobs
+		 * that were resubmitted due to a GPU reset are not computed.
+		 */
+		drm_printf(p, "v3d-jobs-%s: \t%llu jobs\n",
+			   v3d_queue_to_string(queue), file_priv->jobs_sent[queue]);
+	}
+}
+
+static const struct file_operations v3d_drm_fops = {
+	.owner = THIS_MODULE,
+	DRM_GEM_FOPS,
+	.show_fdinfo = drm_show_fdinfo,
+};
 
 /* DRM_AUTH is required on SUBMIT_CL for now, while we don't have GMP
  * protection between clients.  Note that render nodes would be
@@ -176,6 +209,7 @@ static const struct drm_driver v3d_drm_driver = {
 	.ioctls = v3d_drm_ioctls,
 	.num_ioctls = ARRAY_SIZE(v3d_drm_ioctls),
 	.fops = &v3d_drm_fops,
+	.show_fdinfo = v3d_show_fdinfo,
 
 	.name = DRIVER_NAME,
 	.desc = DRIVER_DESC,
@@ -187,6 +221,7 @@ static const struct drm_driver v3d_drm_driver = {
 
 static const struct of_device_id v3d_of_match[] = {
 	{ .compatible = "brcm,2711-v3d" },
+	{ .compatible = "brcm,2712-v3d" },
 	{ .compatible = "brcm,7268-v3d" },
 	{ .compatible = "brcm,7278-v3d" },
 	{},
@@ -281,8 +316,14 @@ static int v3d_platform_drm_probe(struct platform_device *pdev)
 	if (ret)
 		goto irq_disable;
 
+	ret = v3d_sysfs_init(dev);
+	if (ret)
+		goto drm_unregister;
+
 	return 0;
 
+drm_unregister:
+	drm_dev_unregister(drm);
 irq_disable:
 	v3d_irq_disable(v3d);
 gem_destroy:
@@ -296,6 +337,9 @@ static void v3d_platform_drm_remove(struct platform_device *pdev)
 {
 	struct drm_device *drm = platform_get_drvdata(pdev);
 	struct v3d_dev *v3d = to_v3d_dev(drm);
+	struct device *dev = &pdev->dev;
+
+	v3d_sysfs_destroy(dev);
 
 	drm_dev_unregister(drm);
 
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 106454f28956..4c59fefaa0b4 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -21,11 +21,27 @@ struct reset_control;
 
 #define V3D_MAX_QUEUES (V3D_CACHE_CLEAN + 1)
 
+static inline char *v3d_queue_to_string(enum v3d_queue queue)
+{
+	switch (queue) {
+	case V3D_BIN: return "bin";
+	case V3D_RENDER: return "render";
+	case V3D_TFU: return "tfu";
+	case V3D_CSD: return "csd";
+	case V3D_CACHE_CLEAN: return "cache_clean";
+	}
+	return "UNKNOWN";
+}
+
 struct v3d_queue_state {
 	struct drm_gpu_scheduler sched;
 
 	u64 fence_context;
 	u64 emit_seqno;
+
+	u64 start_ns;
+	u64 enabled_ns;
+	u64 jobs_sent;
 };
 
 /* Performance monitor object. The perform lifetime is controlled by userspace
@@ -167,6 +183,12 @@ struct v3d_file_priv {
 	} perfmon;
 
 	struct drm_sched_entity sched_entity[V3D_MAX_QUEUES];
+
+	u64 start_ns[V3D_MAX_QUEUES];
+
+	u64 enabled_ns[V3D_MAX_QUEUES];
+
+	u64 jobs_sent[V3D_MAX_QUEUES];
 };
 
 struct v3d_bo {
@@ -238,6 +260,11 @@ struct v3d_job {
 	 */
 	struct v3d_perfmon *perfmon;
 
+	/* File descriptor of the process that submitted the job that could be used
+	 * for collecting stats by process of GPU usage.
+	 */
+	struct drm_file *file;
+
 	/* Callback for the freeing of the job on refcount going to 0. */
 	void (*free)(struct kref *ref);
 };
@@ -418,3 +445,7 @@ int v3d_perfmon_destroy_ioctl(struct drm_device *dev, void *data,
 			      struct drm_file *file_priv);
 int v3d_perfmon_get_values_ioctl(struct drm_device *dev, void *data,
 				 struct drm_file *file_priv);
+
+/* v3d_sysfs.c */
+int v3d_sysfs_init(struct device *dev);
+void v3d_sysfs_destroy(struct device *dev);
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index 2e94ce788c71..9d2ac23c29e3 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -47,9 +47,9 @@ v3d_init_hw_state(struct v3d_dev *v3d)
 static void
 v3d_idle_axi(struct v3d_dev *v3d, int core)
 {
-	V3D_CORE_WRITE(core, V3D_GMP_CFG, V3D_GMP_CFG_STOP_REQ);
+	V3D_CORE_WRITE(core, V3D_GMP_CFG(v3d->ver), V3D_GMP_CFG_STOP_REQ);
 
-	if (wait_for((V3D_CORE_READ(core, V3D_GMP_STATUS) &
+	if (wait_for((V3D_CORE_READ(core, V3D_GMP_STATUS(v3d->ver)) &
 		      (V3D_GMP_STATUS_RD_COUNT_MASK |
 		       V3D_GMP_STATUS_WR_COUNT_MASK |
 		       V3D_GMP_STATUS_CFG_BUSY)) == 0, 100)) {
@@ -415,9 +415,10 @@ v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
 	job = *container;
 	job->v3d = v3d;
 	job->free = free;
+	job->file = file_priv;
 
 	ret = drm_sched_job_init(&job->base, &v3d_priv->sched_entity[queue],
-				 v3d_priv);
+				 1, v3d_priv);
 	if (ret)
 		goto fail;
 
@@ -1013,8 +1014,12 @@ v3d_gem_init(struct drm_device *dev)
 	u32 pt_size = 4096 * 1024;
 	int ret, i;
 
-	for (i = 0; i < V3D_MAX_QUEUES; i++)
+	for (i = 0; i < V3D_MAX_QUEUES; i++) {
 		v3d->queue[i].fence_context = dma_fence_context_alloc(1);
+		v3d->queue[i].start_ns = 0;
+		v3d->queue[i].enabled_ns = 0;
+		v3d->queue[i].jobs_sent = 0;
+	}
 
 	spin_lock_init(&v3d->mm_lock);
 	spin_lock_init(&v3d->job_lock);
@@ -1072,6 +1077,8 @@ v3d_gem_destroy(struct drm_device *dev)
 	 */
 	WARN_ON(v3d->bin_job);
 	WARN_ON(v3d->render_job);
+	WARN_ON(v3d->tfu_job);
+	WARN_ON(v3d->csd_job);
 
 	drm_mm_takedown(&v3d->mm);
 
diff --git a/drivers/gpu/drm/v3d/v3d_irq.c b/drivers/gpu/drm/v3d/v3d_irq.c
index e714d5318f30..afc76390a197 100644
--- a/drivers/gpu/drm/v3d/v3d_irq.c
+++ b/drivers/gpu/drm/v3d/v3d_irq.c
@@ -14,21 +14,23 @@
  */
 
 #include <linux/platform_device.h>
+#include <linux/sched/clock.h>
 
 #include "v3d_drv.h"
 #include "v3d_regs.h"
 #include "v3d_trace.h"
 
-#define V3D_CORE_IRQS ((u32)(V3D_INT_OUTOMEM |	\
-			     V3D_INT_FLDONE |	\
-			     V3D_INT_FRDONE |	\
-			     V3D_INT_CSDDONE |	\
-			     V3D_INT_GMPV))
+#define V3D_CORE_IRQS(ver) ((u32)(V3D_INT_OUTOMEM |	\
+				  V3D_INT_FLDONE |	\
+				  V3D_INT_FRDONE |	\
+				  V3D_INT_CSDDONE(ver) |	\
+				  (ver < 71 ? V3D_INT_GMPV : 0)))
 
-#define V3D_HUB_IRQS ((u32)(V3D_HUB_INT_MMU_WRV |	\
-			    V3D_HUB_INT_MMU_PTI |	\
-			    V3D_HUB_INT_MMU_CAP |	\
-			    V3D_HUB_INT_TFUC))
+#define V3D_HUB_IRQS(ver) ((u32)(V3D_HUB_INT_MMU_WRV |	\
+				 V3D_HUB_INT_MMU_PTI |	\
+				 V3D_HUB_INT_MMU_CAP |	\
+				 V3D_HUB_INT_TFUC |		\
+				 (ver >= 71 ? V3D_V7_HUB_INT_GMPV : 0)))
 
 static irqreturn_t
 v3d_hub_irq(int irq, void *arg);
@@ -100,6 +102,18 @@ v3d_irq(int irq, void *arg)
 	if (intsts & V3D_INT_FLDONE) {
 		struct v3d_fence *fence =
 			to_v3d_fence(v3d->bin_job->base.irq_fence);
+		struct v3d_file_priv *file = v3d->bin_job->base.file->driver_priv;
+		u64 runtime = local_clock() - file->start_ns[V3D_BIN];
+
+		file->enabled_ns[V3D_BIN] += local_clock() - file->start_ns[V3D_BIN];
+		file->jobs_sent[V3D_BIN]++;
+		v3d->queue[V3D_BIN].jobs_sent++;
+
+		file->start_ns[V3D_BIN] = 0;
+		v3d->queue[V3D_BIN].start_ns = 0;
+
+		file->enabled_ns[V3D_BIN] += runtime;
+		v3d->queue[V3D_BIN].enabled_ns += runtime;
 
 		trace_v3d_bcl_irq(&v3d->drm, fence->seqno);
 		dma_fence_signal(&fence->base);
@@ -109,15 +123,39 @@ v3d_irq(int irq, void *arg)
 	if (intsts & V3D_INT_FRDONE) {
 		struct v3d_fence *fence =
 			to_v3d_fence(v3d->render_job->base.irq_fence);
+		struct v3d_file_priv *file = v3d->render_job->base.file->driver_priv;
+		u64 runtime = local_clock() - file->start_ns[V3D_RENDER];
+
+		file->enabled_ns[V3D_RENDER] += local_clock() - file->start_ns[V3D_RENDER];
+		file->jobs_sent[V3D_RENDER]++;
+		v3d->queue[V3D_RENDER].jobs_sent++;
+
+		file->start_ns[V3D_RENDER] = 0;
+		v3d->queue[V3D_RENDER].start_ns = 0;
+
+		file->enabled_ns[V3D_RENDER] += runtime;
+		v3d->queue[V3D_RENDER].enabled_ns += runtime;
 
 		trace_v3d_rcl_irq(&v3d->drm, fence->seqno);
 		dma_fence_signal(&fence->base);
 		status = IRQ_HANDLED;
 	}
 
-	if (intsts & V3D_INT_CSDDONE) {
+	if (intsts & V3D_INT_CSDDONE(v3d->ver)) {
 		struct v3d_fence *fence =
 			to_v3d_fence(v3d->csd_job->base.irq_fence);
+		struct v3d_file_priv *file = v3d->csd_job->base.file->driver_priv;
+		u64 runtime = local_clock() - file->start_ns[V3D_CSD];
+
+		file->enabled_ns[V3D_CSD] += local_clock() - file->start_ns[V3D_CSD];
+		file->jobs_sent[V3D_CSD]++;
+		v3d->queue[V3D_CSD].jobs_sent++;
+
+		file->start_ns[V3D_CSD] = 0;
+		v3d->queue[V3D_CSD].start_ns = 0;
+
+		file->enabled_ns[V3D_CSD] += runtime;
+		v3d->queue[V3D_CSD].enabled_ns += runtime;
 
 		trace_v3d_csd_irq(&v3d->drm, fence->seqno);
 		dma_fence_signal(&fence->base);
@@ -127,7 +165,7 @@ v3d_irq(int irq, void *arg)
 	/* We shouldn't be triggering these if we have GMP in
 	 * always-allowed mode.
 	 */
-	if (intsts & V3D_INT_GMPV)
+	if (v3d->ver < 71 && (intsts & V3D_INT_GMPV))
 		dev_err(v3d->drm.dev, "GMP violation\n");
 
 	/* V3D 4.2 wires the hub and core IRQs together, so if we &
@@ -154,6 +192,18 @@ v3d_hub_irq(int irq, void *arg)
 	if (intsts & V3D_HUB_INT_TFUC) {
 		struct v3d_fence *fence =
 			to_v3d_fence(v3d->tfu_job->base.irq_fence);
+		struct v3d_file_priv *file = v3d->tfu_job->base.file->driver_priv;
+		u64 runtime = local_clock() - file->start_ns[V3D_TFU];
+
+		file->enabled_ns[V3D_TFU] += local_clock() - file->start_ns[V3D_TFU];
+		file->jobs_sent[V3D_TFU]++;
+		v3d->queue[V3D_TFU].jobs_sent++;
+
+		file->start_ns[V3D_TFU] = 0;
+		v3d->queue[V3D_TFU].start_ns = 0;
+
+		file->enabled_ns[V3D_TFU] += runtime;
+		v3d->queue[V3D_TFU].enabled_ns += runtime;
 
 		trace_v3d_tfu_irq(&v3d->drm, fence->seqno);
 		dma_fence_signal(&fence->base);
@@ -197,6 +247,11 @@ v3d_hub_irq(int irq, void *arg)
 		status = IRQ_HANDLED;
 	}
 
+	if (v3d->ver >= 71 && (intsts & V3D_V7_HUB_INT_GMPV)) {
+		dev_err(v3d->drm.dev, "GMP Violation\n");
+		status = IRQ_HANDLED;
+	}
+
 	return status;
 }
 
@@ -211,8 +266,8 @@ v3d_irq_init(struct v3d_dev *v3d)
 	 * for us.
 	 */
 	for (core = 0; core < v3d->cores; core++)
-		V3D_CORE_WRITE(core, V3D_CTL_INT_CLR, V3D_CORE_IRQS);
-	V3D_WRITE(V3D_HUB_INT_CLR, V3D_HUB_IRQS);
+		V3D_CORE_WRITE(core, V3D_CTL_INT_CLR, V3D_CORE_IRQS(v3d->ver));
+	V3D_WRITE(V3D_HUB_INT_CLR, V3D_HUB_IRQS(v3d->ver));
 
 	irq1 = platform_get_irq_optional(v3d_to_pdev(v3d), 1);
 	if (irq1 == -EPROBE_DEFER)
@@ -256,12 +311,12 @@ v3d_irq_enable(struct v3d_dev *v3d)
 
 	/* Enable our set of interrupts, masking out any others. */
 	for (core = 0; core < v3d->cores; core++) {
-		V3D_CORE_WRITE(core, V3D_CTL_INT_MSK_SET, ~V3D_CORE_IRQS);
-		V3D_CORE_WRITE(core, V3D_CTL_INT_MSK_CLR, V3D_CORE_IRQS);
+		V3D_CORE_WRITE(core, V3D_CTL_INT_MSK_SET, ~V3D_CORE_IRQS(v3d->ver));
+		V3D_CORE_WRITE(core, V3D_CTL_INT_MSK_CLR, V3D_CORE_IRQS(v3d->ver));
 	}
 
-	V3D_WRITE(V3D_HUB_INT_MSK_SET, ~V3D_HUB_IRQS);
-	V3D_WRITE(V3D_HUB_INT_MSK_CLR, V3D_HUB_IRQS);
+	V3D_WRITE(V3D_HUB_INT_MSK_SET, ~V3D_HUB_IRQS(v3d->ver));
+	V3D_WRITE(V3D_HUB_INT_MSK_CLR, V3D_HUB_IRQS(v3d->ver));
 }
 
 void
@@ -276,8 +331,8 @@ v3d_irq_disable(struct v3d_dev *v3d)
 
 	/* Clear any pending interrupts we might have left. */
 	for (core = 0; core < v3d->cores; core++)
-		V3D_CORE_WRITE(core, V3D_CTL_INT_CLR, V3D_CORE_IRQS);
-	V3D_WRITE(V3D_HUB_INT_CLR, V3D_HUB_IRQS);
+		V3D_CORE_WRITE(core, V3D_CTL_INT_CLR, V3D_CORE_IRQS(v3d->ver));
+	V3D_WRITE(V3D_HUB_INT_CLR, V3D_HUB_IRQS(v3d->ver));
 
 	cancel_work_sync(&v3d->overflow_mem_work);
 }
diff --git a/drivers/gpu/drm/v3d/v3d_regs.h b/drivers/gpu/drm/v3d/v3d_regs.h
index 3663e0d6bf76..1b1a62ad9585 100644
--- a/drivers/gpu/drm/v3d/v3d_regs.h
+++ b/drivers/gpu/drm/v3d/v3d_regs.h
@@ -57,6 +57,7 @@
 #define V3D_HUB_INT_MSK_STS                            0x0005c
 #define V3D_HUB_INT_MSK_SET                            0x00060
 #define V3D_HUB_INT_MSK_CLR                            0x00064
+# define V3D_V7_HUB_INT_GMPV                           BIT(6)
 # define V3D_HUB_INT_MMU_WRV                           BIT(5)
 # define V3D_HUB_INT_MMU_PTI                           BIT(4)
 # define V3D_HUB_INT_MMU_CAP                           BIT(3)
@@ -64,6 +65,7 @@
 # define V3D_HUB_INT_TFUC                              BIT(1)
 # define V3D_HUB_INT_TFUF                              BIT(0)
 
+/* GCA registers only exist in V3D < 41 */
 #define V3D_GCA_CACHE_CTRL                             0x0000c
 # define V3D_GCA_CACHE_CTRL_FLUSH                      BIT(0)
 
@@ -86,7 +88,8 @@
 # define V3D_TOP_GR_BRIDGE_SW_INIT_1                   0x0000c
 # define V3D_TOP_GR_BRIDGE_SW_INIT_1_V3D_CLK_108_SW_INIT BIT(0)
 
-#define V3D_TFU_CS                                     0x00400
+#define V3D_TFU_CS(ver) ((ver >= 71) ? 0x00700 : 0x00400)
+
 /* Stops current job, empties input fifo. */
 # define V3D_TFU_CS_TFURST                             BIT(31)
 # define V3D_TFU_CS_CVTCT_MASK                         V3D_MASK(23, 16)
@@ -95,7 +98,7 @@
 # define V3D_TFU_CS_NFREE_SHIFT                        8
 # define V3D_TFU_CS_BUSY                               BIT(0)
 
-#define V3D_TFU_SU                                     0x00404
+#define V3D_TFU_SU(ver) ((ver >= 71) ? 0x00704 : 0x00404)
 /* Interrupt when FINTTHR input slots are free (0 = disabled) */
 # define V3D_TFU_SU_FINTTHR_MASK                       V3D_MASK(13, 8)
 # define V3D_TFU_SU_FINTTHR_SHIFT                      8
@@ -106,39 +109,42 @@
 # define V3D_TFU_SU_THROTTLE_MASK                      V3D_MASK(1, 0)
 # define V3D_TFU_SU_THROTTLE_SHIFT                     0
 
-#define V3D_TFU_ICFG                                   0x00408
+#define V3D_TFU_ICFG(ver) ((ver >= 71) ? 0x00708 : 0x00408)
 /* Interrupt when the conversion is complete. */
 # define V3D_TFU_ICFG_IOC                              BIT(0)
 
 /* Input Image Address */
-#define V3D_TFU_IIA                                    0x0040c
+#define V3D_TFU_IIA(ver) ((ver >= 71) ? 0x0070c : 0x0040c)
 /* Input Chroma Address */
-#define V3D_TFU_ICA                                    0x00410
+#define V3D_TFU_ICA(ver) ((ver >= 71) ? 0x00710 : 0x00410)
 /* Input Image Stride */
-#define V3D_TFU_IIS                                    0x00414
+#define V3D_TFU_IIS(ver) ((ver >= 71) ? 0x00714 : 0x00414)
 /* Input Image U-Plane Address */
-#define V3D_TFU_IUA                                    0x00418
+#define V3D_TFU_IUA(ver) ((ver >= 71) ? 0x00718 : 0x00418)
+/* Image output config (VD 7.x only) */
+#define V3D_V7_TFU_IOC                                 0x0071c
 /* Output Image Address */
-#define V3D_TFU_IOA                                    0x0041c
+#define V3D_TFU_IOA(ver) ((ver >= 71) ? 0x00720 : 0x0041c)
 /* Image Output Size */
-#define V3D_TFU_IOS                                    0x00420
+#define V3D_TFU_IOS(ver) ((ver >= 71) ? 0x00724 : 0x00420)
 /* TFU YUV Coefficient 0 */
-#define V3D_TFU_COEF0                                  0x00424
-/* Use these regs instead of the defaults. */
+#define V3D_TFU_COEF0(ver) ((ver >= 71) ? 0x00728 : 0x00424)
+/* Use these regs instead of the defaults (V3D 4.x only) */
 # define V3D_TFU_COEF0_USECOEF                         BIT(31)
 /* TFU YUV Coefficient 1 */
-#define V3D_TFU_COEF1                                  0x00428
+#define V3D_TFU_COEF1(ver) ((ver >= 71) ? 0x0072c : 0x00428)
 /* TFU YUV Coefficient 2 */
-#define V3D_TFU_COEF2                                  0x0042c
+#define V3D_TFU_COEF2(ver) ((ver >= 71) ? 0x00730 : 0x0042c)
 /* TFU YUV Coefficient 3 */
-#define V3D_TFU_COEF3                                  0x00430
+#define V3D_TFU_COEF3(ver) ((ver >= 71) ? 0x00734 : 0x00430)
 
+/* V3D 4.x only */
 #define V3D_TFU_CRC                                    0x00434
 
 /* Per-MMU registers. */
 
 #define V3D_MMUC_CONTROL                               0x01000
-# define V3D_MMUC_CONTROL_CLEAR                        BIT(3)
+#define V3D_MMUC_CONTROL_CLEAR(ver) ((ver >= 71) ? BIT(11) : BIT(3))
 # define V3D_MMUC_CONTROL_FLUSHING                     BIT(2)
 # define V3D_MMUC_CONTROL_FLUSH                        BIT(1)
 # define V3D_MMUC_CONTROL_ENABLE                       BIT(0)
@@ -246,7 +252,6 @@
 
 #define V3D_CTL_L2TCACTL                               0x00030
 # define V3D_L2TCACTL_TMUWCF                           BIT(8)
-# define V3D_L2TCACTL_L2T_NO_WM                        BIT(4)
 /* Invalidates cache lines. */
 # define V3D_L2TCACTL_FLM_FLUSH                        0
 /* Removes cachelines without writing dirty lines back. */
@@ -267,8 +272,8 @@
 #define V3D_CTL_INT_MSK_CLR                            0x00064
 # define V3D_INT_QPU_MASK                              V3D_MASK(27, 16)
 # define V3D_INT_QPU_SHIFT                             16
-# define V3D_INT_CSDDONE                               BIT(7)
-# define V3D_INT_PCTR                                  BIT(6)
+#define V3D_INT_CSDDONE(ver) ((ver >= 71) ? BIT(6) : BIT(7))
+#define V3D_INT_PCTR(ver) ((ver >= 71) ? BIT(5) : BIT(6))
 # define V3D_INT_GMPV                                  BIT(5)
 # define V3D_INT_TRFB                                  BIT(4)
 # define V3D_INT_SPILLUSE                              BIT(3)
@@ -350,21 +355,25 @@
 #define V3D_V4_PCTR_0_SRC_X(x)                         (V3D_V4_PCTR_0_SRC_0_3 + \
 							4 * (x))
 # define V3D_PCTR_S0_MASK                              V3D_MASK(6, 0)
+# define V3D_V7_PCTR_S0_MASK                           V3D_MASK(7, 0)
 # define V3D_PCTR_S0_SHIFT                             0
 # define V3D_PCTR_S1_MASK                              V3D_MASK(14, 8)
+# define V3D_V7_PCTR_S1_MASK                           V3D_MASK(15, 8)
 # define V3D_PCTR_S1_SHIFT                             8
 # define V3D_PCTR_S2_MASK                              V3D_MASK(22, 16)
+# define V3D_V7_PCTR_S2_MASK                           V3D_MASK(23, 16)
 # define V3D_PCTR_S2_SHIFT                             16
 # define V3D_PCTR_S3_MASK                              V3D_MASK(30, 24)
+# define V3D_V7_PCTR_S3_MASK                           V3D_MASK(31, 24)
 # define V3D_PCTR_S3_SHIFT                             24
-# define V3D_PCTR_CYCLE_COUNT                          32
+#define V3D_PCTR_CYCLE_COUNT(ver) ((ver >= 71) ? 0 : 32)
 
 /* Output values of the counters. */
 #define V3D_PCTR_0_PCTR0                               0x00680
 #define V3D_PCTR_0_PCTR31                              0x006fc
 #define V3D_PCTR_0_PCTRX(x)                            (V3D_PCTR_0_PCTR0 + \
 							4 * (x))
-#define V3D_GMP_STATUS                                 0x00800
+#define V3D_GMP_STATUS(ver) ((ver >= 71) ? 0x00600 : 0x00800)
 # define V3D_GMP_STATUS_GMPRST                         BIT(31)
 # define V3D_GMP_STATUS_WR_COUNT_MASK                  V3D_MASK(30, 24)
 # define V3D_GMP_STATUS_WR_COUNT_SHIFT                 24
@@ -377,13 +386,13 @@
 # define V3D_GMP_STATUS_INVPROT                        BIT(1)
 # define V3D_GMP_STATUS_VIO                            BIT(0)
 
-#define V3D_GMP_CFG                                    0x00804
+#define V3D_GMP_CFG(ver) ((ver >= 71) ? 0x00604 : 0x00804)
 # define V3D_GMP_CFG_LBURSTEN                          BIT(3)
 # define V3D_GMP_CFG_PGCRSEN                           BIT()
 # define V3D_GMP_CFG_STOP_REQ                          BIT(1)
 # define V3D_GMP_CFG_PROT_ENABLE                       BIT(0)
 
-#define V3D_GMP_VIO_ADDR                               0x00808
+#define V3D_GMP_VIO_ADDR(ver) ((ver >= 71) ? 0x00608 : 0x00808)
 #define V3D_GMP_VIO_TYPE                               0x0080c
 #define V3D_GMP_TABLE_ADDR                             0x00810
 #define V3D_GMP_CLEAR_LOAD                             0x00814
@@ -398,25 +407,25 @@
 # define V3D_CSD_STATUS_HAVE_CURRENT_DISPATCH          BIT(1)
 # define V3D_CSD_STATUS_HAVE_QUEUED_DISPATCH           BIT(0)
 
-#define V3D_CSD_QUEUED_CFG0                            0x00904
+#define V3D_CSD_QUEUED_CFG0(ver) ((ver >= 71) ? 0x00930 : 0x00904)
 # define V3D_CSD_QUEUED_CFG0_NUM_WGS_X_MASK            V3D_MASK(31, 16)
 # define V3D_CSD_QUEUED_CFG0_NUM_WGS_X_SHIFT           16
 # define V3D_CSD_QUEUED_CFG0_WG_X_OFFSET_MASK          V3D_MASK(15, 0)
 # define V3D_CSD_QUEUED_CFG0_WG_X_OFFSET_SHIFT         0
 
-#define V3D_CSD_QUEUED_CFG1                            0x00908
+#define V3D_CSD_QUEUED_CFG1(ver) ((ver >= 71) ? 0x00934 : 0x00908)
 # define V3D_CSD_QUEUED_CFG1_NUM_WGS_Y_MASK            V3D_MASK(31, 16)
 # define V3D_CSD_QUEUED_CFG1_NUM_WGS_Y_SHIFT           16
 # define V3D_CSD_QUEUED_CFG1_WG_Y_OFFSET_MASK          V3D_MASK(15, 0)
 # define V3D_CSD_QUEUED_CFG1_WG_Y_OFFSET_SHIFT         0
 
-#define V3D_CSD_QUEUED_CFG2                            0x0090c
+#define V3D_CSD_QUEUED_CFG2(ver) ((ver >= 71) ? 0x00938 : 0x0090c)
 # define V3D_CSD_QUEUED_CFG2_NUM_WGS_Z_MASK            V3D_MASK(31, 16)
 # define V3D_CSD_QUEUED_CFG2_NUM_WGS_Z_SHIFT           16
 # define V3D_CSD_QUEUED_CFG2_WG_Z_OFFSET_MASK          V3D_MASK(15, 0)
 # define V3D_CSD_QUEUED_CFG2_WG_Z_OFFSET_SHIFT         0
 
-#define V3D_CSD_QUEUED_CFG3                            0x00910
+#define V3D_CSD_QUEUED_CFG3(ver) ((ver >= 71) ? 0x0093c : 0x00910)
 # define V3D_CSD_QUEUED_CFG3_OVERLAP_WITH_PREV         BIT(26)
 # define V3D_CSD_QUEUED_CFG3_MAX_SG_ID_MASK            V3D_MASK(25, 20)
 # define V3D_CSD_QUEUED_CFG3_MAX_SG_ID_SHIFT           20
@@ -428,23 +437,28 @@
 # define V3D_CSD_QUEUED_CFG3_WG_SIZE_SHIFT             0
 
 /* Number of batches, minus 1 */
-#define V3D_CSD_QUEUED_CFG4                            0x00914
+#define V3D_CSD_QUEUED_CFG4(ver) ((ver >= 71) ? 0x00940 : 0x00914)
 
 /* Shader address, pnan, singleseg, threading, like a shader record. */
-#define V3D_CSD_QUEUED_CFG5                            0x00918
+#define V3D_CSD_QUEUED_CFG5(ver) ((ver >= 71) ? 0x00944 : 0x00918)
 
 /* Uniforms address (4 byte aligned) */
-#define V3D_CSD_QUEUED_CFG6                            0x0091c
+#define V3D_CSD_QUEUED_CFG6(ver) ((ver >= 71) ? 0x00948 : 0x0091c)
 
-#define V3D_CSD_CURRENT_CFG0                          0x00920
-#define V3D_CSD_CURRENT_CFG1                          0x00924
-#define V3D_CSD_CURRENT_CFG2                          0x00928
-#define V3D_CSD_CURRENT_CFG3                          0x0092c
-#define V3D_CSD_CURRENT_CFG4                          0x00930
-#define V3D_CSD_CURRENT_CFG5                          0x00934
-#define V3D_CSD_CURRENT_CFG6                          0x00938
+/* V3D 7.x+ only */
+#define V3D_V7_CSD_QUEUED_CFG7                         0x0094c
 
-#define V3D_CSD_CURRENT_ID0                            0x0093c
+#define V3D_CSD_CURRENT_CFG0(ver) ((ver >= 71) ? 0x00958 : 0x00920)
+#define V3D_CSD_CURRENT_CFG1(ver) ((ver >= 71) ? 0x0095c : 0x00924)
+#define V3D_CSD_CURRENT_CFG2(ver) ((ver >= 71) ? 0x00960 : 0x00928)
+#define V3D_CSD_CURRENT_CFG3(ver) ((ver >= 71) ? 0x00964 : 0x0092c)
+#define V3D_CSD_CURRENT_CFG4(ver) ((ver >= 71) ? 0x00968 : 0x00930)
+#define V3D_CSD_CURRENT_CFG5(ver) ((ver >= 71) ? 0x0096c : 0x00934)
+#define V3D_CSD_CURRENT_CFG6(ver) ((ver >= 71) ? 0x00970 : 0x00938)
+/* V3D 7.x+ only */
+#define V3D_V7_CSD_CURRENT_CFG7                        0x00974
+
+#define V3D_CSD_CURRENT_ID0(ver) ((ver >= 71) ? 0x00978 : 0x0093c)
 # define V3D_CSD_CURRENT_ID0_WG_X_MASK                 V3D_MASK(31, 16)
 # define V3D_CSD_CURRENT_ID0_WG_X_SHIFT                16
 # define V3D_CSD_CURRENT_ID0_WG_IN_SG_MASK             V3D_MASK(11, 8)
@@ -452,7 +466,7 @@
 # define V3D_CSD_CURRENT_ID0_L_IDX_MASK                V3D_MASK(7, 0)
 # define V3D_CSD_CURRENT_ID0_L_IDX_SHIFT               0
 
-#define V3D_CSD_CURRENT_ID1                            0x00940
+#define V3D_CSD_CURRENT_ID1(ver) ((ver >= 71) ? 0x0097c : 0x00940)
 # define V3D_CSD_CURRENT_ID0_WG_Z_MASK                 V3D_MASK(31, 16)
 # define V3D_CSD_CURRENT_ID0_WG_Z_SHIFT                16
 # define V3D_CSD_CURRENT_ID0_WG_Y_MASK                 V3D_MASK(15, 0)
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 038e1ae589c7..fccbea2a5f2e 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -18,6 +18,7 @@
  * semaphores to interlock between them.
  */
 
+#include <linux/sched/clock.h>
 #include <linux/kthread.h>
 
 #include "v3d_drv.h"
@@ -76,6 +77,7 @@ static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job)
 {
 	struct v3d_bin_job *job = to_bin_job(sched_job);
 	struct v3d_dev *v3d = job->base.v3d;
+	struct v3d_file_priv *file = job->base.file->driver_priv;
 	struct drm_device *dev = &v3d->drm;
 	struct dma_fence *fence;
 	unsigned long irqflags;
@@ -107,6 +109,9 @@ static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job)
 	trace_v3d_submit_cl(dev, false, to_v3d_fence(fence)->seqno,
 			    job->start, job->end);
 
+	file->start_ns[V3D_BIN] = local_clock();
+	v3d->queue[V3D_BIN].start_ns = file->start_ns[V3D_BIN];
+
 	v3d_switch_perfmon(v3d, &job->base);
 
 	/* Set the current and end address of the control list.
@@ -131,6 +136,7 @@ static struct dma_fence *v3d_render_job_run(struct drm_sched_job *sched_job)
 {
 	struct v3d_render_job *job = to_render_job(sched_job);
 	struct v3d_dev *v3d = job->base.v3d;
+	struct v3d_file_priv *file = job->base.file->driver_priv;
 	struct drm_device *dev = &v3d->drm;
 	struct dma_fence *fence;
 
@@ -158,6 +164,9 @@ static struct dma_fence *v3d_render_job_run(struct drm_sched_job *sched_job)
 	trace_v3d_submit_cl(dev, true, to_v3d_fence(fence)->seqno,
 			    job->start, job->end);
 
+	file->start_ns[V3D_RENDER] = local_clock();
+	v3d->queue[V3D_RENDER].start_ns = file->start_ns[V3D_RENDER];
+
 	v3d_switch_perfmon(v3d, &job->base);
 
 	/* XXX: Set the QCFG */
@@ -176,6 +185,7 @@ v3d_tfu_job_run(struct drm_sched_job *sched_job)
 {
 	struct v3d_tfu_job *job = to_tfu_job(sched_job);
 	struct v3d_dev *v3d = job->base.v3d;
+	struct v3d_file_priv *file = job->base.file->driver_priv;
 	struct drm_device *dev = &v3d->drm;
 	struct dma_fence *fence;
 
@@ -190,20 +200,25 @@ v3d_tfu_job_run(struct drm_sched_job *sched_job)
 
 	trace_v3d_submit_tfu(dev, to_v3d_fence(fence)->seqno);
 
-	V3D_WRITE(V3D_TFU_IIA, job->args.iia);
-	V3D_WRITE(V3D_TFU_IIS, job->args.iis);
-	V3D_WRITE(V3D_TFU_ICA, job->args.ica);
-	V3D_WRITE(V3D_TFU_IUA, job->args.iua);
-	V3D_WRITE(V3D_TFU_IOA, job->args.ioa);
-	V3D_WRITE(V3D_TFU_IOS, job->args.ios);
-	V3D_WRITE(V3D_TFU_COEF0, job->args.coef[0]);
-	if (job->args.coef[0] & V3D_TFU_COEF0_USECOEF) {
-		V3D_WRITE(V3D_TFU_COEF1, job->args.coef[1]);
-		V3D_WRITE(V3D_TFU_COEF2, job->args.coef[2]);
-		V3D_WRITE(V3D_TFU_COEF3, job->args.coef[3]);
+	file->start_ns[V3D_TFU] = local_clock();
+	v3d->queue[V3D_TFU].start_ns = file->start_ns[V3D_TFU];
+
+	V3D_WRITE(V3D_TFU_IIA(v3d->ver), job->args.iia);
+	V3D_WRITE(V3D_TFU_IIS(v3d->ver), job->args.iis);
+	V3D_WRITE(V3D_TFU_ICA(v3d->ver), job->args.ica);
+	V3D_WRITE(V3D_TFU_IUA(v3d->ver), job->args.iua);
+	V3D_WRITE(V3D_TFU_IOA(v3d->ver), job->args.ioa);
+	if (v3d->ver >= 71)
+		V3D_WRITE(V3D_V7_TFU_IOC, job->args.v71.ioc);
+	V3D_WRITE(V3D_TFU_IOS(v3d->ver), job->args.ios);
+	V3D_WRITE(V3D_TFU_COEF0(v3d->ver), job->args.coef[0]);
+	if (v3d->ver >= 71 || (job->args.coef[0] & V3D_TFU_COEF0_USECOEF)) {
+		V3D_WRITE(V3D_TFU_COEF1(v3d->ver), job->args.coef[1]);
+		V3D_WRITE(V3D_TFU_COEF2(v3d->ver), job->args.coef[2]);
+		V3D_WRITE(V3D_TFU_COEF3(v3d->ver), job->args.coef[3]);
 	}
 	/* ICFG kicks off the job. */
-	V3D_WRITE(V3D_TFU_ICFG, job->args.icfg | V3D_TFU_ICFG_IOC);
+	V3D_WRITE(V3D_TFU_ICFG(v3d->ver), job->args.icfg | V3D_TFU_ICFG_IOC);
 
 	return fence;
 }
@@ -213,9 +228,10 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
 {
 	struct v3d_csd_job *job = to_csd_job(sched_job);
 	struct v3d_dev *v3d = job->base.v3d;
+	struct v3d_file_priv *file = job->base.file->driver_priv;
 	struct drm_device *dev = &v3d->drm;
 	struct dma_fence *fence;
-	int i;
+	int i, csd_cfg0_reg, csd_cfg_reg_count;
 
 	v3d->csd_job = job;
 
@@ -231,12 +247,17 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
 
 	trace_v3d_submit_csd(dev, to_v3d_fence(fence)->seqno);
 
+	file->start_ns[V3D_CSD] = local_clock();
+	v3d->queue[V3D_CSD].start_ns = file->start_ns[V3D_CSD];
+
 	v3d_switch_perfmon(v3d, &job->base);
 
-	for (i = 1; i <= 6; i++)
-		V3D_CORE_WRITE(0, V3D_CSD_QUEUED_CFG0 + 4 * i, job->args.cfg[i]);
+	csd_cfg0_reg = V3D_CSD_QUEUED_CFG0(v3d->ver);
+	csd_cfg_reg_count = v3d->ver < 71 ? 6 : 7;
+	for (i = 1; i <= csd_cfg_reg_count; i++)
+		V3D_CORE_WRITE(0, csd_cfg0_reg + 4 * i, job->args.cfg[i]);
 	/* CFG0 write kicks off the job. */
-	V3D_CORE_WRITE(0, V3D_CSD_QUEUED_CFG0, job->args.cfg[0]);
+	V3D_CORE_WRITE(0, csd_cfg0_reg, job->args.cfg[0]);
 
 	return fence;
 }
@@ -246,9 +267,25 @@ v3d_cache_clean_job_run(struct drm_sched_job *sched_job)
 {
 	struct v3d_job *job = to_v3d_job(sched_job);
 	struct v3d_dev *v3d = job->v3d;
+	struct v3d_file_priv *file = job->file->driver_priv;
+	u64 runtime;
+
+	file->start_ns[V3D_CACHE_CLEAN] = local_clock();
+	v3d->queue[V3D_CACHE_CLEAN].start_ns = file->start_ns[V3D_CACHE_CLEAN];
 
 	v3d_clean_caches(v3d);
 
+	runtime = local_clock() - file->start_ns[V3D_CACHE_CLEAN];
+
+	file->enabled_ns[V3D_CACHE_CLEAN] += runtime;
+	v3d->queue[V3D_CACHE_CLEAN].enabled_ns += runtime;
+
+	file->jobs_sent[V3D_CACHE_CLEAN]++;
+	v3d->queue[V3D_CACHE_CLEAN].jobs_sent++;
+
+	file->start_ns[V3D_CACHE_CLEAN] = 0;
+	v3d->queue[V3D_CACHE_CLEAN].start_ns = 0;
+
 	return NULL;
 }
 
@@ -336,7 +373,7 @@ v3d_csd_job_timedout(struct drm_sched_job *sched_job)
 {
 	struct v3d_csd_job *job = to_csd_job(sched_job);
 	struct v3d_dev *v3d = job->base.v3d;
-	u32 batches = V3D_CORE_READ(0, V3D_CSD_CURRENT_CFG4);
+	u32 batches = V3D_CORE_READ(0, V3D_CSD_CURRENT_CFG4(v3d->ver));
 
 	/* If we've made progress, skip reset and let the timer get
 	 * rearmed.
@@ -388,7 +425,7 @@ v3d_sched_init(struct v3d_dev *v3d)
 	int ret;
 
 	ret = drm_sched_init(&v3d->queue[V3D_BIN].sched,
-			     &v3d_bin_sched_ops,
+			     &v3d_bin_sched_ops, NULL,
 			     DRM_SCHED_PRIORITY_COUNT,
 			     hw_jobs_limit, job_hang_limit,
 			     msecs_to_jiffies(hang_limit_ms), NULL,
@@ -397,7 +434,7 @@ v3d_sched_init(struct v3d_dev *v3d)
 		return ret;
 
 	ret = drm_sched_init(&v3d->queue[V3D_RENDER].sched,
-			     &v3d_render_sched_ops,
+			     &v3d_render_sched_ops, NULL,
 			     DRM_SCHED_PRIORITY_COUNT,
 			     hw_jobs_limit, job_hang_limit,
 			     msecs_to_jiffies(hang_limit_ms), NULL,
@@ -406,7 +443,7 @@ v3d_sched_init(struct v3d_dev *v3d)
 		goto fail;
 
 	ret = drm_sched_init(&v3d->queue[V3D_TFU].sched,
-			     &v3d_tfu_sched_ops,
+			     &v3d_tfu_sched_ops, NULL,
 			     DRM_SCHED_PRIORITY_COUNT,
 			     hw_jobs_limit, job_hang_limit,
 			     msecs_to_jiffies(hang_limit_ms), NULL,
@@ -416,7 +453,7 @@ v3d_sched_init(struct v3d_dev *v3d)
 
 	if (v3d_has_csd(v3d)) {
 		ret = drm_sched_init(&v3d->queue[V3D_CSD].sched,
-				     &v3d_csd_sched_ops,
+				     &v3d_csd_sched_ops, NULL,
 				     DRM_SCHED_PRIORITY_COUNT,
 				     hw_jobs_limit, job_hang_limit,
 				     msecs_to_jiffies(hang_limit_ms), NULL,
@@ -425,7 +462,7 @@ v3d_sched_init(struct v3d_dev *v3d)
 			goto fail;
 
 		ret = drm_sched_init(&v3d->queue[V3D_CACHE_CLEAN].sched,
-				     &v3d_cache_clean_sched_ops,
+				     &v3d_cache_clean_sched_ops, NULL,
 				     DRM_SCHED_PRIORITY_COUNT,
 				     hw_jobs_limit, job_hang_limit,
 				     msecs_to_jiffies(hang_limit_ms), NULL,
diff --git a/drivers/gpu/drm/v3d/v3d_sysfs.c b/drivers/gpu/drm/v3d/v3d_sysfs.c
new file mode 100644
index 000000000000..d106845ba890
--- /dev/null
+++ b/drivers/gpu/drm/v3d/v3d_sysfs.c
@@ -0,0 +1,69 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2023 Igalia S.L.
+ */
+
+#include <linux/sched/clock.h>
+#include <linux/sysfs.h>
+
+#include "v3d_drv.h"
+
+static ssize_t
+gpu_stats_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct drm_device *drm = dev_get_drvdata(dev);
+	struct v3d_dev *v3d = to_v3d_dev(drm);
+	enum v3d_queue queue;
+	u64 timestamp = local_clock();
+	u64 active_runtime;
+	ssize_t len = 0;
+
+	len += sysfs_emit(buf, "queue\ttimestamp\tjobs\truntime\n");
+
+	for (queue = 0; queue < V3D_MAX_QUEUES; queue++) {
+		if (v3d->queue[queue].start_ns)
+			active_runtime = timestamp - v3d->queue[queue].start_ns;
+		else
+			active_runtime = 0;
+
+		/* Each line will display the queue name, timestamp, the number
+		 * of jobs sent to that queue and the runtime, as can be seem here:
+		 *
+		 * queue	timestamp	jobs	runtime
+		 * bin		239043069420	22620	17438164056
+		 * render	239043069420	22619	27284814161
+		 * tfu		239043069420	8763	394592566
+		 * csd		239043069420	3168	10787905530
+		 * cache_clean	239043069420	6127	237375940
+		 */
+		len += sysfs_emit_at(buf, len, "%s\t%llu\t%llu\t%llu\n",
+				     v3d_queue_to_string(queue),
+				     timestamp,
+				     v3d->queue[queue].jobs_sent,
+				     v3d->queue[queue].enabled_ns + active_runtime);
+	}
+
+	return len;
+}
+static DEVICE_ATTR_RO(gpu_stats);
+
+static struct attribute *v3d_sysfs_entries[] = {
+	&dev_attr_gpu_stats.attr,
+	NULL,
+};
+
+static struct attribute_group v3d_sysfs_attr_group = {
+	.attrs = v3d_sysfs_entries,
+};
+
+int
+v3d_sysfs_init(struct device *dev)
+{
+	return sysfs_create_group(&dev->kobj, &v3d_sysfs_attr_group);
+}
+
+void
+v3d_sysfs_destroy(struct device *dev)
+{
+	return sysfs_remove_group(&dev->kobj, &v3d_sysfs_attr_group);
+}
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 96365a772f77..bb7d86a0c6a1 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -58,6 +58,9 @@
 #define MAX_CAPSET_ID 63
 #define MAX_RINGS 64
 
+/* See virtio_gpu_ctx_create. One additional character for NULL terminator. */
+#define DEBUG_NAME_MAX_LEN 65
+
 struct virtio_gpu_object_params {
 	unsigned long size;
 	bool dumb;
@@ -274,6 +277,8 @@ struct virtio_gpu_fpriv {
 	uint64_t base_fence_ctx;
 	uint64_t ring_idx_mask;
 	struct mutex context_lock;
+	char debug_name[DEBUG_NAME_MAX_LEN];
+	bool explicit_debug_name;
 };
 
 /* virtgpu_ioctl.c */
diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index b24b11f25197..e4f76f315550 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -42,12 +42,19 @@
 static void virtio_gpu_create_context_locked(struct virtio_gpu_device *vgdev,
 					     struct virtio_gpu_fpriv *vfpriv)
 {
-	char dbgname[TASK_COMM_LEN];
+	if (vfpriv->explicit_debug_name) {
+		virtio_gpu_cmd_context_create(vgdev, vfpriv->ctx_id,
+					      vfpriv->context_init,
+					      strlen(vfpriv->debug_name),
+					      vfpriv->debug_name);
+	} else {
+		char dbgname[TASK_COMM_LEN];
 
-	get_task_comm(dbgname, current);
-	virtio_gpu_cmd_context_create(vgdev, vfpriv->ctx_id,
-				      vfpriv->context_init, strlen(dbgname),
-				      dbgname);
+		get_task_comm(dbgname, current);
+		virtio_gpu_cmd_context_create(vgdev, vfpriv->ctx_id,
+					      vfpriv->context_init, strlen(dbgname),
+					      dbgname);
+	}
 
 	vfpriv->context_created = true;
 }
@@ -107,6 +114,9 @@ static int virtio_gpu_getparam_ioctl(struct drm_device *dev, void *data,
 	case VIRTGPU_PARAM_SUPPORTED_CAPSET_IDs:
 		value = vgdev->capset_id_mask;
 		break;
+	case VIRTGPU_PARAM_EXPLICIT_DEBUG_NAME:
+		value = vgdev->has_context_init ? 1 : 0;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -565,8 +575,8 @@ static int virtio_gpu_context_init_ioctl(struct drm_device *dev,
 					 void *data, struct drm_file *file)
 {
 	int ret = 0;
-	uint32_t num_params, i, param, value;
-	uint64_t valid_ring_mask;
+	uint32_t num_params, i;
+	uint64_t valid_ring_mask, param, value;
 	size_t len;
 	struct drm_virtgpu_context_set_param *ctx_set_params = NULL;
 	struct virtio_gpu_device *vgdev = dev->dev_private;
@@ -580,7 +590,7 @@ static int virtio_gpu_context_init_ioctl(struct drm_device *dev,
 		return -EINVAL;
 
 	/* Number of unique parameters supported at this time. */
-	if (num_params > 3)
+	if (num_params > 4)
 		return -EINVAL;
 
 	ctx_set_params = memdup_user(u64_to_user_ptr(args->ctx_set_params),
@@ -642,6 +652,21 @@ static int virtio_gpu_context_init_ioctl(struct drm_device *dev,
 
 			vfpriv->ring_idx_mask = value;
 			break;
+		case VIRTGPU_CONTEXT_PARAM_DEBUG_NAME:
+			if (vfpriv->explicit_debug_name) {
+				ret = -EINVAL;
+				goto out_unlock;
+			}
+
+			ret = strncpy_from_user(vfpriv->debug_name,
+						u64_to_user_ptr(value),
+						DEBUG_NAME_MAX_LEN - 1);
+			if (ret < 0)
+				goto out_unlock;
+
+			vfpriv->explicit_debug_name = true;
+			ret = 0;
+			break;
 		default:
 			ret = -EINVAL;
 			goto out_unlock;
diff --git a/drivers/video/fbdev/simplefb.c b/drivers/video/fbdev/simplefb.c
index 62f99f6fccd3..fe682af63827 100644
--- a/drivers/video/fbdev/simplefb.c
+++ b/drivers/video/fbdev/simplefb.c
@@ -21,9 +21,11 @@
 #include <linux/platform_device.h>
 #include <linux/clk.h>
 #include <linux/of.h>
+#include <linux/of_address.h>
 #include <linux/of_clk.h>
 #include <linux/of_platform.h>
 #include <linux/parser.h>
+#include <linux/pm_domain.h>
 #include <linux/regulator/consumer.h>
 
 static const struct fb_fix_screeninfo simplefb_fix = {
@@ -77,6 +79,11 @@ struct simplefb_par {
 	unsigned int clk_count;
 	struct clk **clks;
 #endif
+#if defined CONFIG_OF && defined CONFIG_PM_GENERIC_DOMAINS
+	unsigned int num_genpds;
+	struct device **genpds;
+	struct device_link **genpd_links;
+#endif
 #if defined CONFIG_OF && defined CONFIG_REGULATOR
 	bool regulators_enabled;
 	u32 regulator_count;
@@ -121,12 +128,13 @@ struct simplefb_params {
 	u32 height;
 	u32 stride;
 	struct simplefb_format *format;
+	struct resource memory;
 };
 
 static int simplefb_parse_dt(struct platform_device *pdev,
 			   struct simplefb_params *params)
 {
-	struct device_node *np = pdev->dev.of_node;
+	struct device_node *np = pdev->dev.of_node, *mem;
 	int ret;
 	const char *format;
 	int i;
@@ -166,6 +174,23 @@ static int simplefb_parse_dt(struct platform_device *pdev,
 		return -EINVAL;
 	}
 
+	mem = of_parse_phandle(np, "memory-region", 0);
+	if (mem) {
+		ret = of_address_to_resource(mem, 0, &params->memory);
+		if (ret < 0) {
+			dev_err(&pdev->dev, "failed to parse memory-region\n");
+			of_node_put(mem);
+			return ret;
+		}
+
+		if (of_property_present(np, "reg"))
+			dev_warn(&pdev->dev, "preferring \"memory-region\" over \"reg\" property\n");
+
+		of_node_put(mem);
+	} else {
+		memset(&params->memory, 0, sizeof(params->memory));
+	}
+
 	return 0;
 }
 
@@ -193,6 +218,8 @@ static int simplefb_parse_pd(struct platform_device *pdev,
 		return -EINVAL;
 	}
 
+	memset(&params->memory, 0, sizeof(params->memory));
+
 	return 0;
 }
 
@@ -411,6 +438,89 @@ static void simplefb_regulators_enable(struct simplefb_par *par,
 static void simplefb_regulators_destroy(struct simplefb_par *par) { }
 #endif
 
+#if defined CONFIG_OF && defined CONFIG_PM_GENERIC_DOMAINS
+static void simplefb_detach_genpds(void *res)
+{
+	struct simplefb_par *par = res;
+	unsigned int i = par->num_genpds;
+
+	if (par->num_genpds <= 1)
+		return;
+
+	while (i--) {
+		if (par->genpd_links[i])
+			device_link_del(par->genpd_links[i]);
+
+		if (!IS_ERR_OR_NULL(par->genpds[i]))
+			dev_pm_domain_detach(par->genpds[i], true);
+	}
+}
+
+static int simplefb_attach_genpds(struct simplefb_par *par,
+				  struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+	unsigned int i;
+	int err;
+
+	err = of_count_phandle_with_args(dev->of_node, "power-domains",
+					 "#power-domain-cells");
+	if (err < 0) {
+		dev_info(dev, "failed to parse power-domains: %d\n", err);
+		return err;
+	}
+
+	par->num_genpds = err;
+
+	/*
+	 * Single power-domain devices are handled by the driver core, so
+	 * nothing to do here.
+	 */
+	if (par->num_genpds <= 1)
+		return 0;
+
+	par->genpds = devm_kcalloc(dev, par->num_genpds, sizeof(*par->genpds),
+				   GFP_KERNEL);
+	if (!par->genpds)
+		return -ENOMEM;
+
+	par->genpd_links = devm_kcalloc(dev, par->num_genpds,
+					sizeof(*par->genpd_links),
+					GFP_KERNEL);
+	if (!par->genpd_links)
+		return -ENOMEM;
+
+	for (i = 0; i < par->num_genpds; i++) {
+		par->genpds[i] = dev_pm_domain_attach_by_id(dev, i);
+		if (IS_ERR(par->genpds[i])) {
+			err = PTR_ERR(par->genpds[i]);
+			if (err == -EPROBE_DEFER) {
+				simplefb_detach_genpds(par);
+				return err;
+			}
+
+			dev_warn(dev, "failed to attach domain %u: %d\n", i, err);
+			continue;
+		}
+
+		par->genpd_links[i] = device_link_add(dev, par->genpds[i],
+						      DL_FLAG_STATELESS |
+						      DL_FLAG_PM_RUNTIME |
+						      DL_FLAG_RPM_ACTIVE);
+		if (!par->genpd_links[i])
+			dev_warn(dev, "failed to link power-domain %u\n", i);
+	}
+
+	return devm_add_action_or_reset(dev, simplefb_detach_genpds, par);
+}
+#else
+static int simplefb_attach_genpds(struct simplefb_par *par,
+				  struct platform_device *pdev)
+{
+	return 0;
+}
+#endif
+
 static int simplefb_probe(struct platform_device *pdev)
 {
 	int ret;
@@ -431,10 +541,14 @@ static int simplefb_probe(struct platform_device *pdev)
 	if (ret)
 		return ret;
 
-	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-	if (!res) {
-		dev_err(&pdev->dev, "No memory resource\n");
-		return -EINVAL;
+	if (params.memory.start == 0 && params.memory.end == 0) {
+		res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+		if (!res) {
+			dev_err(&pdev->dev, "No memory resource\n");
+			return -EINVAL;
+		}
+	} else {
+		res = &params.memory;
 	}
 
 	mem = request_mem_region(res->start, resource_size(res), "simplefb");
@@ -493,6 +607,10 @@ static int simplefb_probe(struct platform_device *pdev)
 	if (ret < 0)
 		goto error_clocks;
 
+	ret = simplefb_attach_genpds(par, pdev);
+	if (ret < 0)
+		goto error_regulators;
+
 	simplefb_clocks_enable(par, pdev);
 	simplefb_regulators_enable(par, pdev);
 
diff --git a/include/drm/drm_edid.h b/include/drm/drm_edid.h
index 882d2638708e..e98aa6818700 100644
--- a/include/drm/drm_edid.h
+++ b/include/drm/drm_edid.h
@@ -269,64 +269,6 @@ struct detailed_timing {
 #define DRM_EDID_DSC_MAX_SLICES			0xf
 #define DRM_EDID_DSC_TOTAL_CHUNK_KBYTES		0x3f
 
-/* ELD Header Block */
-#define DRM_ELD_HEADER_BLOCK_SIZE	4
-
-#define DRM_ELD_VER			0
-# define DRM_ELD_VER_SHIFT		3
-# define DRM_ELD_VER_MASK		(0x1f << 3)
-# define DRM_ELD_VER_CEA861D		(2 << 3) /* supports 861D or below */
-# define DRM_ELD_VER_CANNED		(0x1f << 3)
-
-#define DRM_ELD_BASELINE_ELD_LEN	2	/* in dwords! */
-
-/* ELD Baseline Block for ELD_Ver == 2 */
-#define DRM_ELD_CEA_EDID_VER_MNL	4
-# define DRM_ELD_CEA_EDID_VER_SHIFT	5
-# define DRM_ELD_CEA_EDID_VER_MASK	(7 << 5)
-# define DRM_ELD_CEA_EDID_VER_NONE	(0 << 5)
-# define DRM_ELD_CEA_EDID_VER_CEA861	(1 << 5)
-# define DRM_ELD_CEA_EDID_VER_CEA861A	(2 << 5)
-# define DRM_ELD_CEA_EDID_VER_CEA861BCD	(3 << 5)
-# define DRM_ELD_MNL_SHIFT		0
-# define DRM_ELD_MNL_MASK		(0x1f << 0)
-
-#define DRM_ELD_SAD_COUNT_CONN_TYPE	5
-# define DRM_ELD_SAD_COUNT_SHIFT	4
-# define DRM_ELD_SAD_COUNT_MASK		(0xf << 4)
-# define DRM_ELD_CONN_TYPE_SHIFT	2
-# define DRM_ELD_CONN_TYPE_MASK		(3 << 2)
-# define DRM_ELD_CONN_TYPE_HDMI		(0 << 2)
-# define DRM_ELD_CONN_TYPE_DP		(1 << 2)
-# define DRM_ELD_SUPPORTS_AI		(1 << 1)
-# define DRM_ELD_SUPPORTS_HDCP		(1 << 0)
-
-#define DRM_ELD_AUD_SYNCH_DELAY		6	/* in units of 2 ms */
-# define DRM_ELD_AUD_SYNCH_DELAY_MAX	0xfa	/* 500 ms */
-
-#define DRM_ELD_SPEAKER			7
-# define DRM_ELD_SPEAKER_MASK		0x7f
-# define DRM_ELD_SPEAKER_RLRC		(1 << 6)
-# define DRM_ELD_SPEAKER_FLRC		(1 << 5)
-# define DRM_ELD_SPEAKER_RC		(1 << 4)
-# define DRM_ELD_SPEAKER_RLR		(1 << 3)
-# define DRM_ELD_SPEAKER_FC		(1 << 2)
-# define DRM_ELD_SPEAKER_LFE		(1 << 1)
-# define DRM_ELD_SPEAKER_FLR		(1 << 0)
-
-#define DRM_ELD_PORT_ID			8	/* offsets 8..15 inclusive */
-# define DRM_ELD_PORT_ID_LEN		8
-
-#define DRM_ELD_MANUFACTURER_NAME0	16
-#define DRM_ELD_MANUFACTURER_NAME1	17
-
-#define DRM_ELD_PRODUCT_CODE0		18
-#define DRM_ELD_PRODUCT_CODE1		19
-
-#define DRM_ELD_MONITOR_NAME_STRING	20	/* offsets 20..(20+mnl-1) inclusive */
-
-#define DRM_ELD_CEA_SAD(mnl, sad)	(20 + (mnl) + 3 * (sad))
-
 struct edid {
 	u8 header[8];
 	/* Vendor & product info */
@@ -409,96 +351,6 @@ drm_hdmi_avi_infoframe_quant_range(struct hdmi_avi_infoframe *frame,
 				   const struct drm_display_mode *mode,
 				   enum hdmi_quantization_range rgb_quant_range);
 
-/**
- * drm_eld_mnl - Get ELD monitor name length in bytes.
- * @eld: pointer to an eld memory structure with mnl set
- */
-static inline int drm_eld_mnl(const uint8_t *eld)
-{
-	return (eld[DRM_ELD_CEA_EDID_VER_MNL] & DRM_ELD_MNL_MASK) >> DRM_ELD_MNL_SHIFT;
-}
-
-/**
- * drm_eld_sad - Get ELD SAD structures.
- * @eld: pointer to an eld memory structure with sad_count set
- */
-static inline const uint8_t *drm_eld_sad(const uint8_t *eld)
-{
-	unsigned int ver, mnl;
-
-	ver = (eld[DRM_ELD_VER] & DRM_ELD_VER_MASK) >> DRM_ELD_VER_SHIFT;
-	if (ver != 2 && ver != 31)
-		return NULL;
-
-	mnl = drm_eld_mnl(eld);
-	if (mnl > 16)
-		return NULL;
-
-	return eld + DRM_ELD_CEA_SAD(mnl, 0);
-}
-
-/**
- * drm_eld_sad_count - Get ELD SAD count.
- * @eld: pointer to an eld memory structure with sad_count set
- */
-static inline int drm_eld_sad_count(const uint8_t *eld)
-{
-	return (eld[DRM_ELD_SAD_COUNT_CONN_TYPE] & DRM_ELD_SAD_COUNT_MASK) >>
-		DRM_ELD_SAD_COUNT_SHIFT;
-}
-
-/**
- * drm_eld_calc_baseline_block_size - Calculate baseline block size in bytes
- * @eld: pointer to an eld memory structure with mnl and sad_count set
- *
- * This is a helper for determining the payload size of the baseline block, in
- * bytes, for e.g. setting the Baseline_ELD_Len field in the ELD header block.
- */
-static inline int drm_eld_calc_baseline_block_size(const uint8_t *eld)
-{
-	return DRM_ELD_MONITOR_NAME_STRING - DRM_ELD_HEADER_BLOCK_SIZE +
-		drm_eld_mnl(eld) + drm_eld_sad_count(eld) * 3;
-}
-
-/**
- * drm_eld_size - Get ELD size in bytes
- * @eld: pointer to a complete eld memory structure
- *
- * The returned value does not include the vendor block. It's vendor specific,
- * and comprises of the remaining bytes in the ELD memory buffer after
- * drm_eld_size() bytes of header and baseline block.
- *
- * The returned value is guaranteed to be a multiple of 4.
- */
-static inline int drm_eld_size(const uint8_t *eld)
-{
-	return DRM_ELD_HEADER_BLOCK_SIZE + eld[DRM_ELD_BASELINE_ELD_LEN] * 4;
-}
-
-/**
- * drm_eld_get_spk_alloc - Get speaker allocation
- * @eld: pointer to an ELD memory structure
- *
- * The returned value is the speakers mask. User has to use %DRM_ELD_SPEAKER
- * field definitions to identify speakers.
- */
-static inline u8 drm_eld_get_spk_alloc(const uint8_t *eld)
-{
-	return eld[DRM_ELD_SPEAKER] & DRM_ELD_SPEAKER_MASK;
-}
-
-/**
- * drm_eld_get_conn_type - Get device type hdmi/dp connected
- * @eld: pointer to an ELD memory structure
- *
- * The caller need to use %DRM_ELD_CONN_TYPE_HDMI or %DRM_ELD_CONN_TYPE_DP to
- * identify the display type connected.
- */
-static inline u8 drm_eld_get_conn_type(const uint8_t *eld)
-{
-	return eld[DRM_ELD_SAD_COUNT_CONN_TYPE] & DRM_ELD_CONN_TYPE_MASK;
-}
-
 /**
  * drm_edid_decode_mfg_id - Decode the manufacturer ID
  * @mfg_id: The manufacturer ID
diff --git a/include/drm/drm_eld.h b/include/drm/drm_eld.h
new file mode 100644
index 000000000000..0a88d10b28b0
--- /dev/null
+++ b/include/drm/drm_eld.h
@@ -0,0 +1,164 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#ifndef __DRM_ELD_H__
+#define __DRM_ELD_H__
+
+#include <linux/types.h>
+
+struct cea_sad;
+
+/* ELD Header Block */
+#define DRM_ELD_HEADER_BLOCK_SIZE	4
+
+#define DRM_ELD_VER			0
+# define DRM_ELD_VER_SHIFT		3
+# define DRM_ELD_VER_MASK		(0x1f << 3)
+# define DRM_ELD_VER_CEA861D		(2 << 3) /* supports 861D or below */
+# define DRM_ELD_VER_CANNED		(0x1f << 3)
+
+#define DRM_ELD_BASELINE_ELD_LEN	2	/* in dwords! */
+
+/* ELD Baseline Block for ELD_Ver == 2 */
+#define DRM_ELD_CEA_EDID_VER_MNL	4
+# define DRM_ELD_CEA_EDID_VER_SHIFT	5
+# define DRM_ELD_CEA_EDID_VER_MASK	(7 << 5)
+# define DRM_ELD_CEA_EDID_VER_NONE	(0 << 5)
+# define DRM_ELD_CEA_EDID_VER_CEA861	(1 << 5)
+# define DRM_ELD_CEA_EDID_VER_CEA861A	(2 << 5)
+# define DRM_ELD_CEA_EDID_VER_CEA861BCD	(3 << 5)
+# define DRM_ELD_MNL_SHIFT		0
+# define DRM_ELD_MNL_MASK		(0x1f << 0)
+
+#define DRM_ELD_SAD_COUNT_CONN_TYPE	5
+# define DRM_ELD_SAD_COUNT_SHIFT	4
+# define DRM_ELD_SAD_COUNT_MASK		(0xf << 4)
+# define DRM_ELD_CONN_TYPE_SHIFT	2
+# define DRM_ELD_CONN_TYPE_MASK		(3 << 2)
+# define DRM_ELD_CONN_TYPE_HDMI		(0 << 2)
+# define DRM_ELD_CONN_TYPE_DP		(1 << 2)
+# define DRM_ELD_SUPPORTS_AI		(1 << 1)
+# define DRM_ELD_SUPPORTS_HDCP		(1 << 0)
+
+#define DRM_ELD_AUD_SYNCH_DELAY		6	/* in units of 2 ms */
+# define DRM_ELD_AUD_SYNCH_DELAY_MAX	0xfa	/* 500 ms */
+
+#define DRM_ELD_SPEAKER			7
+# define DRM_ELD_SPEAKER_MASK		0x7f
+# define DRM_ELD_SPEAKER_RLRC		(1 << 6)
+# define DRM_ELD_SPEAKER_FLRC		(1 << 5)
+# define DRM_ELD_SPEAKER_RC		(1 << 4)
+# define DRM_ELD_SPEAKER_RLR		(1 << 3)
+# define DRM_ELD_SPEAKER_FC		(1 << 2)
+# define DRM_ELD_SPEAKER_LFE		(1 << 1)
+# define DRM_ELD_SPEAKER_FLR		(1 << 0)
+
+#define DRM_ELD_PORT_ID			8	/* offsets 8..15 inclusive */
+# define DRM_ELD_PORT_ID_LEN		8
+
+#define DRM_ELD_MANUFACTURER_NAME0	16
+#define DRM_ELD_MANUFACTURER_NAME1	17
+
+#define DRM_ELD_PRODUCT_CODE0		18
+#define DRM_ELD_PRODUCT_CODE1		19
+
+#define DRM_ELD_MONITOR_NAME_STRING	20	/* offsets 20..(20+mnl-1) inclusive */
+
+#define DRM_ELD_CEA_SAD(mnl, sad)	(20 + (mnl) + 3 * (sad))
+
+/**
+ * drm_eld_mnl - Get ELD monitor name length in bytes.
+ * @eld: pointer to an eld memory structure with mnl set
+ */
+static inline int drm_eld_mnl(const u8 *eld)
+{
+	return (eld[DRM_ELD_CEA_EDID_VER_MNL] & DRM_ELD_MNL_MASK) >> DRM_ELD_MNL_SHIFT;
+}
+
+int drm_eld_sad_get(const u8 *eld, int sad_index, struct cea_sad *cta_sad);
+int drm_eld_sad_set(u8 *eld, int sad_index, const struct cea_sad *cta_sad);
+
+/**
+ * drm_eld_sad - Get ELD SAD structures.
+ * @eld: pointer to an eld memory structure with sad_count set
+ */
+static inline const u8 *drm_eld_sad(const u8 *eld)
+{
+	unsigned int ver, mnl;
+
+	ver = (eld[DRM_ELD_VER] & DRM_ELD_VER_MASK) >> DRM_ELD_VER_SHIFT;
+	if (ver != 2 && ver != 31)
+		return NULL;
+
+	mnl = drm_eld_mnl(eld);
+	if (mnl > 16)
+		return NULL;
+
+	return eld + DRM_ELD_CEA_SAD(mnl, 0);
+}
+
+/**
+ * drm_eld_sad_count - Get ELD SAD count.
+ * @eld: pointer to an eld memory structure with sad_count set
+ */
+static inline int drm_eld_sad_count(const u8 *eld)
+{
+	return (eld[DRM_ELD_SAD_COUNT_CONN_TYPE] & DRM_ELD_SAD_COUNT_MASK) >>
+		DRM_ELD_SAD_COUNT_SHIFT;
+}
+
+/**
+ * drm_eld_calc_baseline_block_size - Calculate baseline block size in bytes
+ * @eld: pointer to an eld memory structure with mnl and sad_count set
+ *
+ * This is a helper for determining the payload size of the baseline block, in
+ * bytes, for e.g. setting the Baseline_ELD_Len field in the ELD header block.
+ */
+static inline int drm_eld_calc_baseline_block_size(const u8 *eld)
+{
+	return DRM_ELD_MONITOR_NAME_STRING - DRM_ELD_HEADER_BLOCK_SIZE +
+		drm_eld_mnl(eld) + drm_eld_sad_count(eld) * 3;
+}
+
+/**
+ * drm_eld_size - Get ELD size in bytes
+ * @eld: pointer to a complete eld memory structure
+ *
+ * The returned value does not include the vendor block. It's vendor specific,
+ * and comprises of the remaining bytes in the ELD memory buffer after
+ * drm_eld_size() bytes of header and baseline block.
+ *
+ * The returned value is guaranteed to be a multiple of 4.
+ */
+static inline int drm_eld_size(const u8 *eld)
+{
+	return DRM_ELD_HEADER_BLOCK_SIZE + eld[DRM_ELD_BASELINE_ELD_LEN] * 4;
+}
+
+/**
+ * drm_eld_get_spk_alloc - Get speaker allocation
+ * @eld: pointer to an ELD memory structure
+ *
+ * The returned value is the speakers mask. User has to use %DRM_ELD_SPEAKER
+ * field definitions to identify speakers.
+ */
+static inline u8 drm_eld_get_spk_alloc(const u8 *eld)
+{
+	return eld[DRM_ELD_SPEAKER] & DRM_ELD_SPEAKER_MASK;
+}
+
+/**
+ * drm_eld_get_conn_type - Get device type hdmi/dp connected
+ * @eld: pointer to an ELD memory structure
+ *
+ * The caller need to use %DRM_ELD_CONN_TYPE_HDMI or %DRM_ELD_CONN_TYPE_DP to
+ * identify the display type connected.
+ */
+static inline u8 drm_eld_get_conn_type(const u8 *eld)
+{
+	return eld[DRM_ELD_SAD_COUNT_CONN_TYPE] & DRM_ELD_CONN_TYPE_MASK;
+}
+
+#endif /* __DRM_ELD_H__ */
diff --git a/include/drm/drm_flip_work.h b/include/drm/drm_flip_work.h
index 21c3d512d25c..1eef3283a109 100644
--- a/include/drm/drm_flip_work.h
+++ b/include/drm/drm_flip_work.h
@@ -31,11 +31,10 @@
 /**
  * DOC: flip utils
  *
- * Util to queue up work to run from work-queue context after flip/vblank.
+ * Utility to queue up work to run from work-queue context after flip/vblank.
  * Typically this can be used to defer unref of framebuffer's, cursor
- * bo's, etc until after vblank.  The APIs are all thread-safe.
- * Moreover, drm_flip_work_queue_task and drm_flip_work_queue can be called
- * in atomic context.
+ * bo's, etc until after vblank. The APIs are all thread-safe. Moreover,
+ * drm_flip_work_commit() can be called in atomic context.
  */
 
 struct drm_flip_work;
@@ -51,16 +50,6 @@ struct drm_flip_work;
  */
 typedef void (*drm_flip_func_t)(struct drm_flip_work *work, void *val);
 
-/**
- * struct drm_flip_task - flip work task
- * @node: list entry element
- * @data: data to pass to &drm_flip_work.func
- */
-struct drm_flip_task {
-	struct list_head node;
-	void *data;
-};
-
 /**
  * struct drm_flip_work - flip work queue
  * @name: debug name
@@ -79,9 +68,6 @@ struct drm_flip_work {
 	spinlock_t lock;
 };
 
-struct drm_flip_task *drm_flip_work_allocate_task(void *data, gfp_t flags);
-void drm_flip_work_queue_task(struct drm_flip_work *work,
-			      struct drm_flip_task *task);
 void drm_flip_work_queue(struct drm_flip_work *work, void *val);
 void drm_flip_work_commit(struct drm_flip_work *work,
 		struct workqueue_struct *wq);
diff --git a/include/drm/drm_format_helper.h b/include/drm/drm_format_helper.h
index 291deb09475b..f13b34e0b752 100644
--- a/include/drm/drm_format_helper.h
+++ b/include/drm/drm_format_helper.h
@@ -15,6 +15,57 @@ struct drm_rect;
 
 struct iosys_map;
 
+/**
+ * struct drm_format_conv_state - Stores format-conversion state
+ *
+ * DRM helpers for format conversion store temporary state in
+ * struct drm_xfrm_buf. The buffer's resources can be reused
+ * among multiple conversion operations.
+ *
+ * All fields are considered private.
+ */
+struct drm_format_conv_state {
+	struct {
+		void *mem;
+		size_t size;
+		bool preallocated;
+	} tmp;
+};
+
+#define __DRM_FORMAT_CONV_STATE_INIT(_mem, _size, _preallocated) { \
+		.tmp = { \
+			.mem = (_mem), \
+			.size = (_size), \
+			.preallocated = (_preallocated), \
+		} \
+	}
+
+/**
+ * DRM_FORMAT_CONV_STATE_INIT - Initializer for struct drm_format_conv_state
+ *
+ * Initializes an instance of struct drm_format_conv_state to default values.
+ */
+#define DRM_FORMAT_CONV_STATE_INIT \
+	__DRM_FORMAT_CONV_STATE_INIT(NULL, 0, false)
+
+/**
+ * DRM_FORMAT_CONV_STATE_INIT_PREALLOCATED - Initializer for struct drm_format_conv_state
+ * @_mem: The preallocated memory area
+ * @_size: The number of bytes in _mem
+ *
+ * Initializes an instance of struct drm_format_conv_state to preallocated
+ * storage. The caller is responsible for releasing the provided memory range.
+ */
+#define DRM_FORMAT_CONV_STATE_INIT_PREALLOCATED(_mem, _size) \
+	__DRM_FORMAT_CONV_STATE_INIT(_mem, _size, true)
+
+void drm_format_conv_state_init(struct drm_format_conv_state *state);
+void drm_format_conv_state_copy(struct drm_format_conv_state *state,
+				const struct drm_format_conv_state *old_state);
+void *drm_format_conv_state_reserve(struct drm_format_conv_state *state,
+				    size_t new_size, gfp_t flags);
+void drm_format_conv_state_release(struct drm_format_conv_state *state);
+
 unsigned int drm_fb_clip_offset(unsigned int pitch, const struct drm_format_info *format,
 				const struct drm_rect *clip);
 
@@ -23,45 +74,49 @@ void drm_fb_memcpy(struct iosys_map *dst, const unsigned int *dst_pitch,
 		   const struct drm_rect *clip);
 void drm_fb_swab(struct iosys_map *dst, const unsigned int *dst_pitch,
 		 const struct iosys_map *src, const struct drm_framebuffer *fb,
-		 const struct drm_rect *clip, bool cached);
+		 const struct drm_rect *clip, bool cached,
+		 struct drm_format_conv_state *state);
 void drm_fb_xrgb8888_to_rgb332(struct iosys_map *dst, const unsigned int *dst_pitch,
 			       const struct iosys_map *src, const struct drm_framebuffer *fb,
-			       const struct drm_rect *clip);
+			       const struct drm_rect *clip, struct drm_format_conv_state *state);
 void drm_fb_xrgb8888_to_rgb565(struct iosys_map *dst, const unsigned int *dst_pitch,
 			       const struct iosys_map *src, const struct drm_framebuffer *fb,
-			       const struct drm_rect *clip, bool swab);
+			       const struct drm_rect *clip, struct drm_format_conv_state *state,
+			       bool swab);
 void drm_fb_xrgb8888_to_xrgb1555(struct iosys_map *dst, const unsigned int *dst_pitch,
 				 const struct iosys_map *src, const struct drm_framebuffer *fb,
-				 const struct drm_rect *clip);
+				 const struct drm_rect *clip, struct drm_format_conv_state *state);
 void drm_fb_xrgb8888_to_argb1555(struct iosys_map *dst, const unsigned int *dst_pitch,
 				 const struct iosys_map *src, const struct drm_framebuffer *fb,
-				 const struct drm_rect *clip);
+				 const struct drm_rect *clip, struct drm_format_conv_state *state);
 void drm_fb_xrgb8888_to_rgba5551(struct iosys_map *dst, const unsigned int *dst_pitch,
 				 const struct iosys_map *src, const struct drm_framebuffer *fb,
-				 const struct drm_rect *clip);
+				 const struct drm_rect *clip, struct drm_format_conv_state *state);
 void drm_fb_xrgb8888_to_rgb888(struct iosys_map *dst, const unsigned int *dst_pitch,
 			       const struct iosys_map *src, const struct drm_framebuffer *fb,
-			       const struct drm_rect *clip);
+			       const struct drm_rect *clip, struct drm_format_conv_state *state);
 void drm_fb_xrgb8888_to_argb8888(struct iosys_map *dst, const unsigned int *dst_pitch,
 				 const struct iosys_map *src, const struct drm_framebuffer *fb,
-				 const struct drm_rect *clip);
+				 const struct drm_rect *clip, struct drm_format_conv_state *state);
 void drm_fb_xrgb8888_to_xrgb2101010(struct iosys_map *dst, const unsigned int *dst_pitch,
 				    const struct iosys_map *src, const struct drm_framebuffer *fb,
-				    const struct drm_rect *clip);
+				    const struct drm_rect *clip,
+				    struct drm_format_conv_state *state);
 void drm_fb_xrgb8888_to_argb2101010(struct iosys_map *dst, const unsigned int *dst_pitch,
 				    const struct iosys_map *src, const struct drm_framebuffer *fb,
-				    const struct drm_rect *clip);
+				    const struct drm_rect *clip,
+				    struct drm_format_conv_state *state);
 void drm_fb_xrgb8888_to_gray8(struct iosys_map *dst, const unsigned int *dst_pitch,
 			      const struct iosys_map *src, const struct drm_framebuffer *fb,
-			      const struct drm_rect *clip);
+			      const struct drm_rect *clip, struct drm_format_conv_state *state);
 
 int drm_fb_blit(struct iosys_map *dst, const unsigned int *dst_pitch, uint32_t dst_format,
 		const struct iosys_map *src, const struct drm_framebuffer *fb,
-		const struct drm_rect *rect);
+		const struct drm_rect *clip, struct drm_format_conv_state *state);
 
 void drm_fb_xrgb8888_to_mono(struct iosys_map *dst, const unsigned int *dst_pitch,
 			     const struct iosys_map *src, const struct drm_framebuffer *fb,
-			     const struct drm_rect *clip);
+			     const struct drm_rect *clip, struct drm_format_conv_state *state);
 
 size_t drm_fb_build_fourcc_list(struct drm_device *dev,
 				const u32 *native_fourccs, size_t native_nfourccs,
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 16364487fde9..369505447acd 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -580,7 +580,7 @@ int drm_gem_evict(struct drm_gem_object *obj);
  * drm_gem_gpuva_init() - initialize the gpuva list of a GEM object
  * @obj: the &drm_gem_object
  *
- * This initializes the &drm_gem_object's &drm_gpuva list.
+ * This initializes the &drm_gem_object's &drm_gpuvm_bo list.
  *
  * Calling this function is only necessary for drivers intending to support the
  * &drm_driver_feature DRIVER_GEM_GPUVA.
@@ -593,28 +593,28 @@ static inline void drm_gem_gpuva_init(struct drm_gem_object *obj)
 }
 
 /**
- * drm_gem_for_each_gpuva() - iternator to walk over a list of gpuvas
- * @entry__: &drm_gpuva structure to assign to in each iteration step
- * @obj__: the &drm_gem_object the &drm_gpuvas to walk are associated with
+ * drm_gem_for_each_gpuvm_bo() - iterator to walk over a list of &drm_gpuvm_bo
+ * @entry__: &drm_gpuvm_bo structure to assign to in each iteration step
+ * @obj__: the &drm_gem_object the &drm_gpuvm_bo to walk are associated with
  *
- * This iterator walks over all &drm_gpuva structures associated with the
- * &drm_gpuva_manager.
+ * This iterator walks over all &drm_gpuvm_bo structures associated with the
+ * &drm_gem_object.
  */
-#define drm_gem_for_each_gpuva(entry__, obj__) \
-	list_for_each_entry(entry__, &(obj__)->gpuva.list, gem.entry)
+#define drm_gem_for_each_gpuvm_bo(entry__, obj__) \
+	list_for_each_entry(entry__, &(obj__)->gpuva.list, list.entry.gem)
 
 /**
- * drm_gem_for_each_gpuva_safe() - iternator to safely walk over a list of
- * gpuvas
- * @entry__: &drm_gpuva structure to assign to in each iteration step
- * @next__: &next &drm_gpuva to store the next step
- * @obj__: the &drm_gem_object the &drm_gpuvas to walk are associated with
+ * drm_gem_for_each_gpuvm_bo_safe() - iterator to safely walk over a list of
+ * &drm_gpuvm_bo
+ * @entry__: &drm_gpuvm_bostructure to assign to in each iteration step
+ * @next__: &next &drm_gpuvm_bo to store the next step
+ * @obj__: the &drm_gem_object the &drm_gpuvm_bo to walk are associated with
  *
- * This iterator walks over all &drm_gpuva structures associated with the
+ * This iterator walks over all &drm_gpuvm_bo structures associated with the
  * &drm_gem_object. It is implemented with list_for_each_entry_safe(), hence
  * it is save against removal of elements.
  */
-#define drm_gem_for_each_gpuva_safe(entry__, next__, obj__) \
-	list_for_each_entry_safe(entry__, next__, &(obj__)->gpuva.list, gem.entry)
+#define drm_gem_for_each_gpuvm_bo_safe(entry__, next__, obj__) \
+	list_for_each_entry_safe(entry__, next__, &(obj__)->gpuva.list, list.entry.gem)
 
 #endif /* __DRM_GEM_H__ */
diff --git a/include/drm/drm_gem_atomic_helper.h b/include/drm/drm_gem_atomic_helper.h
index 40b8b039518e..3e01c619a25e 100644
--- a/include/drm/drm_gem_atomic_helper.h
+++ b/include/drm/drm_gem_atomic_helper.h
@@ -5,6 +5,7 @@
 
 #include <linux/iosys-map.h>
 
+#include <drm/drm_format_helper.h>
 #include <drm/drm_fourcc.h>
 #include <drm/drm_plane.h>
 
@@ -49,6 +50,15 @@ struct drm_shadow_plane_state {
 	/** @base: plane state */
 	struct drm_plane_state base;
 
+	/**
+	 * @fmtcnv_state: Format-conversion state
+	 *
+	 * Per-plane state for format conversion.
+	 * Flags for copying shadow buffers into backend storage. Also holds
+	 * temporary storage for format conversion.
+	 */
+	struct drm_format_conv_state fmtcnv_state;
+
 	/* Transitional state - do not export or duplicate */
 
 	/**
diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
index bdfafc4a7705..8ca10461d8ac 100644
--- a/include/drm/drm_gpuvm.h
+++ b/include/drm/drm_gpuvm.h
@@ -25,13 +25,17 @@
  * OTHER DEALINGS IN THE SOFTWARE.
  */
 
+#include <linux/dma-resv.h>
 #include <linux/list.h>
 #include <linux/rbtree.h>
 #include <linux/types.h>
 
+#include <drm/drm_device.h>
 #include <drm/drm_gem.h>
+#include <drm/drm_exec.h>
 
 struct drm_gpuvm;
+struct drm_gpuvm_bo;
 struct drm_gpuvm_ops;
 
 /**
@@ -72,6 +76,12 @@ struct drm_gpuva {
 	 */
 	struct drm_gpuvm *vm;
 
+	/**
+	 * @vm_bo: the &drm_gpuvm_bo abstraction for the mapped
+	 * &drm_gem_object
+	 */
+	struct drm_gpuvm_bo *vm_bo;
+
 	/**
 	 * @flags: the &drm_gpuva_flags for this mapping
 	 */
@@ -107,7 +117,7 @@ struct drm_gpuva {
 		struct drm_gem_object *obj;
 
 		/**
-		 * @entry: the &list_head to attach this object to a &drm_gem_object
+		 * @entry: the &list_head to attach this object to a &drm_gpuvm_bo
 		 */
 		struct list_head entry;
 	} gem;
@@ -140,7 +150,7 @@ struct drm_gpuva {
 int drm_gpuva_insert(struct drm_gpuvm *gpuvm, struct drm_gpuva *va);
 void drm_gpuva_remove(struct drm_gpuva *va);
 
-void drm_gpuva_link(struct drm_gpuva *va);
+void drm_gpuva_link(struct drm_gpuva *va, struct drm_gpuvm_bo *vm_bo);
 void drm_gpuva_unlink(struct drm_gpuva *va);
 
 struct drm_gpuva *drm_gpuva_find(struct drm_gpuvm *gpuvm,
@@ -183,6 +193,22 @@ static inline bool drm_gpuva_invalidated(struct drm_gpuva *va)
 	return va->flags & DRM_GPUVA_INVALIDATED;
 }
 
+/**
+ * enum drm_gpuvm_flags - flags for struct drm_gpuvm
+ */
+enum drm_gpuvm_flags {
+	/**
+	 * @DRM_GPUVM_RESV_PROTECTED: GPUVM is protected externally by the
+	 * GPUVM's &dma_resv lock
+	 */
+	DRM_GPUVM_RESV_PROTECTED = BIT(0),
+
+	/**
+	 * @DRM_GPUVM_USERBITS: user defined bits
+	 */
+	DRM_GPUVM_USERBITS = BIT(1),
+};
+
 /**
  * struct drm_gpuvm - DRM GPU VA Manager
  *
@@ -201,6 +227,16 @@ struct drm_gpuvm {
 	 */
 	const char *name;
 
+	/**
+	 * @flags: the &drm_gpuvm_flags of this GPUVM
+	 */
+	enum drm_gpuvm_flags flags;
+
+	/**
+	 * @drm: the &drm_device this VM lives in
+	 */
+	struct drm_device *drm;
+
 	/**
 	 * @mm_start: start of the VA space
 	 */
@@ -226,6 +262,11 @@ struct drm_gpuvm {
 		struct list_head list;
 	} rb;
 
+	/**
+	 * @kref: reference count of this object
+	 */
+	struct kref kref;
+
 	/**
 	 * @kernel_alloc_node:
 	 *
@@ -238,16 +279,147 @@ struct drm_gpuvm {
 	 * @ops: &drm_gpuvm_ops providing the split/merge steps to drivers
 	 */
 	const struct drm_gpuvm_ops *ops;
+
+	/**
+	 * @r_obj: Resv GEM object; representing the GPUVM's common &dma_resv.
+	 */
+	struct drm_gem_object *r_obj;
+
+	/**
+	 * @extobj: structure holding the extobj list
+	 */
+	struct {
+		/**
+		 * @list: &list_head storing &drm_gpuvm_bos serving as
+		 * external object
+		 */
+		struct list_head list;
+
+		/**
+		 * @local_list: pointer to the local list temporarily storing
+		 * entries from the external object list
+		 */
+		struct list_head *local_list;
+
+		/**
+		 * @lock: spinlock to protect the extobj list
+		 */
+		spinlock_t lock;
+	} extobj;
+
+	/**
+	 * @evict: structure holding the evict list and evict list lock
+	 */
+	struct {
+		/**
+		 * @list: &list_head storing &drm_gpuvm_bos currently being
+		 * evicted
+		 */
+		struct list_head list;
+
+		/**
+		 * @local_list: pointer to the local list temporarily storing
+		 * entries from the evicted object list
+		 */
+		struct list_head *local_list;
+
+		/**
+		 * @lock: spinlock to protect the evict list
+		 */
+		spinlock_t lock;
+	} evict;
 };
 
 void drm_gpuvm_init(struct drm_gpuvm *gpuvm, const char *name,
+		    enum drm_gpuvm_flags flags,
+		    struct drm_device *drm,
+		    struct drm_gem_object *r_obj,
 		    u64 start_offset, u64 range,
 		    u64 reserve_offset, u64 reserve_range,
 		    const struct drm_gpuvm_ops *ops);
-void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
 
+/**
+ * drm_gpuvm_get() - acquire a struct drm_gpuvm reference
+ * @gpuvm: the &drm_gpuvm to acquire the reference of
+ *
+ * This function acquires an additional reference to @gpuvm. It is illegal to
+ * call this without already holding a reference. No locks required.
+ */
+static inline struct drm_gpuvm *
+drm_gpuvm_get(struct drm_gpuvm *gpuvm)
+{
+	kref_get(&gpuvm->kref);
+
+	return gpuvm;
+}
+
+void drm_gpuvm_put(struct drm_gpuvm *gpuvm);
+
+bool drm_gpuvm_range_valid(struct drm_gpuvm *gpuvm, u64 addr, u64 range);
 bool drm_gpuvm_interval_empty(struct drm_gpuvm *gpuvm, u64 addr, u64 range);
 
+struct drm_gem_object *
+drm_gpuvm_resv_object_alloc(struct drm_device *drm);
+
+/**
+ * drm_gpuvm_resv_protected() - indicates whether &DRM_GPUVM_RESV_PROTECTED is
+ * set
+ * @gpuvm: the &drm_gpuvm
+ *
+ * Returns: true if &DRM_GPUVM_RESV_PROTECTED is set, false otherwise.
+ */
+static inline bool
+drm_gpuvm_resv_protected(struct drm_gpuvm *gpuvm)
+{
+	return gpuvm->flags & DRM_GPUVM_RESV_PROTECTED;
+}
+
+/**
+ * drm_gpuvm_resv() - returns the &drm_gpuvm's &dma_resv
+ * @gpuvm__: the &drm_gpuvm
+ *
+ * Returns: a pointer to the &drm_gpuvm's shared &dma_resv
+ */
+#define drm_gpuvm_resv(gpuvm__) ((gpuvm__)->r_obj->resv)
+
+/**
+ * drm_gpuvm_resv_obj() - returns the &drm_gem_object holding the &drm_gpuvm's
+ * &dma_resv
+ * @gpuvm__: the &drm_gpuvm
+ *
+ * Returns: a pointer to the &drm_gem_object holding the &drm_gpuvm's shared
+ * &dma_resv
+ */
+#define drm_gpuvm_resv_obj(gpuvm__) ((gpuvm__)->r_obj)
+
+#define drm_gpuvm_resv_held(gpuvm__) \
+	dma_resv_held(drm_gpuvm_resv(gpuvm__))
+
+#define drm_gpuvm_resv_assert_held(gpuvm__) \
+	dma_resv_assert_held(drm_gpuvm_resv(gpuvm__))
+
+#define drm_gpuvm_resv_held(gpuvm__) \
+	dma_resv_held(drm_gpuvm_resv(gpuvm__))
+
+#define drm_gpuvm_resv_assert_held(gpuvm__) \
+	dma_resv_assert_held(drm_gpuvm_resv(gpuvm__))
+
+/**
+ * drm_gpuvm_is_extobj() - indicates whether the given &drm_gem_object is an
+ * external object
+ * @gpuvm: the &drm_gpuvm to check
+ * @obj: the &drm_gem_object to check
+ *
+ * Returns: true if the &drm_gem_object &dma_resv differs from the
+ * &drm_gpuvms &dma_resv, false otherwise
+ */
+static inline bool
+drm_gpuvm_is_extobj(struct drm_gpuvm *gpuvm,
+		    struct drm_gem_object *obj)
+{
+	return obj && obj->resv != drm_gpuvm_resv(gpuvm);
+}
+
 static inline struct drm_gpuva *
 __drm_gpuva_next(struct drm_gpuva *va)
 {
@@ -326,6 +498,302 @@ __drm_gpuva_next(struct drm_gpuva *va)
 #define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__) \
 	list_for_each_entry_safe(va__, next__, &(gpuvm__)->rb.list, rb.entry)
 
+/**
+ * struct drm_gpuvm_exec - &drm_gpuvm abstraction of &drm_exec
+ *
+ * This structure should be created on the stack as &drm_exec should be.
+ *
+ * Optionally, @extra can be set in order to lock additional &drm_gem_objects.
+ */
+struct drm_gpuvm_exec {
+	/**
+	 * @exec: the &drm_exec structure
+	 */
+	struct drm_exec exec;
+
+	/**
+	 * @flags: the flags for the struct drm_exec
+	 */
+	uint32_t flags;
+
+	/**
+	 * @vm: the &drm_gpuvm to lock its DMA reservations
+	 */
+	struct drm_gpuvm *vm;
+
+	/**
+	 * @num_fences: the number of fences to reserve for the &dma_resv of the
+	 * locked &drm_gem_objects
+	 */
+	unsigned int num_fences;
+
+	/**
+	 * @extra: Callback and corresponding private data for the driver to
+	 * lock arbitrary additional &drm_gem_objects.
+	 */
+	struct {
+		/**
+		 * @fn: The driver callback to lock additional &drm_gem_objects.
+		 */
+		int (*fn)(struct drm_gpuvm_exec *vm_exec);
+
+		/**
+		 * @priv: driver private data for the @fn callback
+		 */
+		void *priv;
+	} extra;
+};
+
+/**
+ * drm_gpuvm_prepare_vm() - prepare the GPUVMs common dma-resv
+ * @gpuvm: the &drm_gpuvm
+ * @exec: the &drm_exec context
+ * @num_fences: the amount of &dma_fences to reserve
+ *
+ * Calls drm_exec_prepare_obj() for the GPUVMs dummy &drm_gem_object.
+ *
+ * Using this function directly, it is the drivers responsibility to call
+ * drm_exec_init() and drm_exec_fini() accordingly.
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+static inline int
+drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
+		     struct drm_exec *exec,
+		     unsigned int num_fences)
+{
+	return drm_exec_prepare_obj(exec, gpuvm->r_obj, num_fences);
+}
+
+int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
+			      struct drm_exec *exec,
+			      unsigned int num_fences);
+
+int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
+			    struct drm_exec *exec,
+			    u64 addr, u64 range,
+			    unsigned int num_fences);
+
+int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec);
+
+int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
+			      struct drm_gem_object **objs,
+			      unsigned int num_objs);
+
+int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
+			      u64 addr, u64 range);
+
+/**
+ * drm_gpuvm_exec_unlock() - lock all dma-resv of all assoiciated BOs
+ * @vm_exec: the &drm_gpuvm_exec wrapper
+ *
+ * Releases all dma-resv locks of all &drm_gem_objects previously acquired
+ * through drm_gpuvm_exec_lock() or its variants.
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+static inline void
+drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
+{
+	drm_exec_fini(&vm_exec->exec);
+}
+
+int drm_gpuvm_validate(struct drm_gpuvm *gpuvm, struct drm_exec *exec);
+void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
+			      struct drm_exec *exec,
+			      struct dma_fence *fence,
+			      enum dma_resv_usage private_usage,
+			      enum dma_resv_usage extobj_usage);
+
+/**
+ * drm_gpuvm_exec_resv_add_fence()
+ * @vm_exec: the &drm_gpuvm_exec wrapper
+ * @fence: fence to add
+ * @private_usage: private dma-resv usage
+ * @extobj_usage: extobj dma-resv usage
+ *
+ * See drm_gpuvm_resv_add_fence().
+ */
+static inline void
+drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec *vm_exec,
+			      struct dma_fence *fence,
+			      enum dma_resv_usage private_usage,
+			      enum dma_resv_usage extobj_usage)
+{
+	drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec, fence,
+				 private_usage, extobj_usage);
+}
+
+/**
+ * drm_gpuvm_exec_validate()
+ * @vm_exec: the &drm_gpuvm_exec wrapper
+ *
+ * See drm_gpuvm_validate().
+ */
+static inline int
+drm_gpuvm_exec_validate(struct drm_gpuvm_exec *vm_exec)
+{
+	return drm_gpuvm_validate(vm_exec->vm, &vm_exec->exec);
+}
+
+/**
+ * struct drm_gpuvm_bo - structure representing a &drm_gpuvm and
+ * &drm_gem_object combination
+ *
+ * This structure is an abstraction representing a &drm_gpuvm and
+ * &drm_gem_object combination. It serves as an indirection to accelerate
+ * iterating all &drm_gpuvas within a &drm_gpuvm backed by the same
+ * &drm_gem_object.
+ *
+ * Furthermore it is used cache evicted GEM objects for a certain GPU-VM to
+ * accelerate validation.
+ *
+ * Typically, drivers want to create an instance of a struct drm_gpuvm_bo once
+ * a GEM object is mapped first in a GPU-VM and release the instance once the
+ * last mapping of the GEM object in this GPU-VM is unmapped.
+ */
+struct drm_gpuvm_bo {
+	/**
+	 * @vm: The &drm_gpuvm the @obj is mapped in. This is a reference
+	 * counted pointer.
+	 */
+	struct drm_gpuvm *vm;
+
+	/**
+	 * @obj: The &drm_gem_object being mapped in @vm. This is a reference
+	 * counted pointer.
+	 */
+	struct drm_gem_object *obj;
+
+	/**
+	 * @evicted: Indicates whether the &drm_gem_object is evicted; field
+	 * protected by the &drm_gem_object's dma-resv lock.
+	 */
+	bool evicted;
+
+	/**
+	 * @kref: The reference count for this &drm_gpuvm_bo.
+	 */
+	struct kref kref;
+
+	/**
+	 * @list: Structure containing all &list_heads.
+	 */
+	struct {
+		/**
+		 * @gpuva: The list of linked &drm_gpuvas.
+		 *
+		 * It is safe to access entries from this list as long as the
+		 * GEM's gpuva lock is held. See also struct drm_gem_object.
+		 */
+		struct list_head gpuva;
+
+		/**
+		 * @entry: Structure containing all &list_heads serving as
+		 * entry.
+		 */
+		struct {
+			/**
+			 * @gem: List entry to attach to the &drm_gem_objects
+			 * gpuva list.
+			 */
+			struct list_head gem;
+
+			/**
+			 * @evict: List entry to attach to the &drm_gpuvms
+			 * extobj list.
+			 */
+			struct list_head extobj;
+
+			/**
+			 * @evict: List entry to attach to the &drm_gpuvms evict
+			 * list.
+			 */
+			struct list_head evict;
+		} entry;
+	} list;
+};
+
+struct drm_gpuvm_bo *
+drm_gpuvm_bo_create(struct drm_gpuvm *gpuvm,
+		    struct drm_gem_object *obj);
+
+struct drm_gpuvm_bo *
+drm_gpuvm_bo_obtain(struct drm_gpuvm *gpuvm,
+		    struct drm_gem_object *obj);
+struct drm_gpuvm_bo *
+drm_gpuvm_bo_obtain_prealloc(struct drm_gpuvm_bo *vm_bo);
+
+/**
+ * drm_gpuvm_bo_get() - acquire a struct drm_gpuvm_bo reference
+ * @vm_bo: the &drm_gpuvm_bo to acquire the reference of
+ *
+ * This function acquires an additional reference to @vm_bo. It is illegal to
+ * call this without already holding a reference. No locks required.
+ */
+static inline struct drm_gpuvm_bo *
+drm_gpuvm_bo_get(struct drm_gpuvm_bo *vm_bo)
+{
+	kref_get(&vm_bo->kref);
+	return vm_bo;
+}
+
+void drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo);
+
+struct drm_gpuvm_bo *
+drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
+		  struct drm_gem_object *obj);
+
+void drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict);
+
+/**
+ * drm_gpuvm_bo_gem_evict()
+ * @obj: the &drm_gem_object
+ * @evict: indicates whether @obj is evicted
+ *
+ * See drm_gpuvm_bo_evict().
+ */
+static inline void
+drm_gpuvm_bo_gem_evict(struct drm_gem_object *obj, bool evict)
+{
+	struct drm_gpuvm_bo *vm_bo;
+
+	drm_gem_gpuva_assert_lock_held(obj);
+	drm_gem_for_each_gpuvm_bo(vm_bo, obj)
+		drm_gpuvm_bo_evict(vm_bo, evict);
+}
+
+void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
+
+/**
+ * drm_gpuvm_bo_for_each_va() - iterator to walk over a list of &drm_gpuva
+ * @va__: &drm_gpuva structure to assign to in each iteration step
+ * @vm_bo__: the &drm_gpuvm_bo the &drm_gpuva to walk are associated with
+ *
+ * This iterator walks over all &drm_gpuva structures associated with the
+ * &drm_gpuvm_bo.
+ *
+ * The caller must hold the GEM's gpuva lock.
+ */
+#define drm_gpuvm_bo_for_each_va(va__, vm_bo__) \
+	list_for_each_entry(va__, &(vm_bo)->list.gpuva, gem.entry)
+
+/**
+ * drm_gpuvm_bo_for_each_va_safe() - iterator to safely walk over a list of
+ * &drm_gpuva
+ * @va__: &drm_gpuva structure to assign to in each iteration step
+ * @next__: &next &drm_gpuva to store the next step
+ * @vm_bo__: the &drm_gpuvm_bo the &drm_gpuva to walk are associated with
+ *
+ * This iterator walks over all &drm_gpuva structures associated with the
+ * &drm_gpuvm_bo. It is implemented with list_for_each_entry_safe(), hence
+ * it is save against removal of elements.
+ *
+ * The caller must hold the GEM's gpuva lock.
+ */
+#define drm_gpuvm_bo_for_each_va_safe(va__, next__, vm_bo__) \
+	list_for_each_entry_safe(va__, next__, &(vm_bo)->list.gpuva, gem.entry)
+
 /**
  * enum drm_gpuva_op_type - GPU VA operation type
  *
@@ -595,8 +1063,7 @@ drm_gpuvm_prefetch_ops_create(struct drm_gpuvm *gpuvm,
 				 u64 addr, u64 range);
 
 struct drm_gpuva_ops *
-drm_gpuvm_gem_unmap_ops_create(struct drm_gpuvm *gpuvm,
-			       struct drm_gem_object *obj);
+drm_gpuvm_bo_unmap_ops_create(struct drm_gpuvm_bo *vm_bo);
 
 void drm_gpuva_ops_free(struct drm_gpuvm *gpuvm,
 			struct drm_gpuva_ops *ops);
@@ -616,6 +1083,14 @@ static inline void drm_gpuva_init_from_op(struct drm_gpuva *va,
  * operations to drivers.
  */
 struct drm_gpuvm_ops {
+	/**
+	 * @vm_free: called when the last reference of a struct drm_gpuvm is
+	 * dropped
+	 *
+	 * This callback is mandatory.
+	 */
+	void (*vm_free)(struct drm_gpuvm *gpuvm);
+
 	/**
 	 * @op_alloc: called when the &drm_gpuvm allocates
 	 * a struct drm_gpuva_op
@@ -640,6 +1115,42 @@ struct drm_gpuvm_ops {
 	 */
 	void (*op_free)(struct drm_gpuva_op *op);
 
+	/**
+	 * @vm_bo_alloc: called when the &drm_gpuvm allocates
+	 * a struct drm_gpuvm_bo
+	 *
+	 * Some drivers may want to embed struct drm_gpuvm_bo into driver
+	 * specific structures. By implementing this callback drivers can
+	 * allocate memory accordingly.
+	 *
+	 * This callback is optional.
+	 */
+	struct drm_gpuvm_bo *(*vm_bo_alloc)(void);
+
+	/**
+	 * @vm_bo_free: called when the &drm_gpuvm frees a
+	 * struct drm_gpuvm_bo
+	 *
+	 * Some drivers may want to embed struct drm_gpuvm_bo into driver
+	 * specific structures. By implementing this callback drivers can
+	 * free the previously allocated memory accordingly.
+	 *
+	 * This callback is optional.
+	 */
+	void (*vm_bo_free)(struct drm_gpuvm_bo *vm_bo);
+
+	/**
+	 * @vm_bo_validate: called from drm_gpuvm_validate()
+	 *
+	 * Drivers receive this callback for every evicted &drm_gem_object being
+	 * mapped in the corresponding &drm_gpuvm.
+	 *
+	 * Typically, drivers would call their driver specific variant of
+	 * ttm_bo_validate() from within this callback.
+	 */
+	int (*vm_bo_validate)(struct drm_gpuvm_bo *vm_bo,
+			      struct drm_exec *exec);
+
 	/**
 	 * @sm_step_map: called from &drm_gpuvm_sm_map to finally insert the
 	 * mapping once all previous steps were completed
diff --git a/include/drm/drm_mipi_dbi.h b/include/drm/drm_mipi_dbi.h
index 816f196b3d4c..e8e0f8d39f3a 100644
--- a/include/drm/drm_mipi_dbi.h
+++ b/include/drm/drm_mipi_dbi.h
@@ -12,6 +12,7 @@
 #include <drm/drm_device.h>
 #include <drm/drm_simple_kms_helper.h>
 
+struct drm_format_conv_state;
 struct drm_rect;
 struct gpio_desc;
 struct iosys_map;
@@ -192,7 +193,8 @@ int mipi_dbi_command_buf(struct mipi_dbi *dbi, u8 cmd, u8 *data, size_t len);
 int mipi_dbi_command_stackbuf(struct mipi_dbi *dbi, u8 cmd, const u8 *data,
 			      size_t len);
 int mipi_dbi_buf_copy(void *dst, struct iosys_map *src, struct drm_framebuffer *fb,
-		      struct drm_rect *clip, bool swap);
+		      struct drm_rect *clip, bool swap,
+		      struct drm_format_conv_state *fmtcnv_state);
 
 /**
  * mipi_dbi_command - MIPI DCS command with optional parameter(s)
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index d2fb81e34174..9a50348bd5c0 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -320,6 +320,7 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f);
  * @sched: the scheduler instance on which this job is scheduled.
  * @s_fence: contains the fences for the scheduling of job.
  * @finish_cb: the callback for the finished fence.
+ * @credits: the number of credits this job contributes to the scheduler
  * @work: Helper to reschdeule job kill to different context.
  * @id: a unique id assigned to each job scheduled on the scheduler.
  * @karma: increment on every hang caused by this job. If this exceeds the hang
@@ -339,6 +340,8 @@ struct drm_sched_job {
 	struct drm_gpu_scheduler	*sched;
 	struct drm_sched_fence		*s_fence;
 
+	u32				credits;
+
 	/*
 	 * work is used only after finish_cb has been used and will not be
 	 * accessed anymore.
@@ -462,29 +465,42 @@ struct drm_sched_backend_ops {
          * and it's time to clean it up.
 	 */
 	void (*free_job)(struct drm_sched_job *sched_job);
+
+	/**
+	 * @update_job_credits: Called when the scheduler is considering this
+	 * job for execution.
+	 *
+	 * This callback returns the number of credits the job would take if
+	 * pushed to the hardware. Drivers may use this to dynamically update
+	 * the job's credit count. For instance, deduct the number of credits
+	 * for already signalled native fences.
+	 *
+	 * This callback is optional.
+	 */
+	u32 (*update_job_credits)(struct drm_sched_job *sched_job);
 };
 
 /**
  * struct drm_gpu_scheduler - scheduler instance-specific data
  *
  * @ops: backend operations provided by the driver.
- * @hw_submission_limit: the max size of the hardware queue.
+ * @credit_limit: the credit limit of this scheduler
+ * @credit_count: the current credit count of this scheduler
  * @timeout: the time after which a job is removed from the scheduler.
  * @name: name of the ring for which this scheduler is being used.
  * @num_rqs: Number of run-queues. This is at most DRM_SCHED_PRIORITY_COUNT,
  *           as there's usually one run-queue per priority, but could be less.
  * @sched_rq: An allocated array of run-queues of size @num_rqs;
- * @wake_up_worker: the wait queue on which the scheduler sleeps until a job
- *                  is ready to be scheduled.
  * @job_scheduled: once @drm_sched_entity_do_release is called the scheduler
  *                 waits on this wait queue until all the scheduled jobs are
  *                 finished.
- * @hw_rq_count: the number of jobs currently in the hardware queue.
  * @job_id_count: used to assign unique id to the each job.
+ * @submit_wq: workqueue used to queue @work_run_job and @work_free_job
  * @timeout_wq: workqueue used to queue @work_tdr
+ * @work_run_job: work which calls run_job op of each scheduler.
+ * @work_free_job: work which calls free_job op of each scheduler.
  * @work_tdr: schedules a delayed call to @drm_sched_job_timedout after the
  *            timeout interval is over.
- * @thread: the kthread on which the scheduler which run.
  * @pending_list: the list of jobs which are currently in the job queue.
  * @job_list_lock: lock to protect the pending_list.
  * @hang_limit: once the hangs by a job crosses this limit then it is marked
@@ -493,24 +509,27 @@ struct drm_sched_backend_ops {
  * @_score: score used when the driver doesn't provide one
  * @ready: marks if the underlying HW is ready to work
  * @free_guilty: A hit to time out handler to free the guilty job.
+ * @pause_submit: pause queuing of @work_run_job on @submit_wq
+ * @own_submit_wq: scheduler owns allocation of @submit_wq
  * @dev: system &struct device
  *
  * One scheduler is implemented for each hardware ring.
  */
 struct drm_gpu_scheduler {
 	const struct drm_sched_backend_ops	*ops;
-	uint32_t			hw_submission_limit;
+	u32				credit_limit;
+	atomic_t			credit_count;
 	long				timeout;
 	const char			*name;
 	u32                             num_rqs;
 	struct drm_sched_rq             **sched_rq;
-	wait_queue_head_t		wake_up_worker;
 	wait_queue_head_t		job_scheduled;
-	atomic_t			hw_rq_count;
 	atomic64_t			job_id_count;
+	struct workqueue_struct		*submit_wq;
 	struct workqueue_struct		*timeout_wq;
+	struct work_struct		work_run_job;
+	struct work_struct		work_free_job;
 	struct delayed_work		work_tdr;
-	struct task_struct		*thread;
 	struct list_head		pending_list;
 	spinlock_t			job_list_lock;
 	int				hang_limit;
@@ -518,19 +537,22 @@ struct drm_gpu_scheduler {
 	atomic_t                        _score;
 	bool				ready;
 	bool				free_guilty;
+	bool				pause_submit;
+	bool				own_submit_wq;
 	struct device			*dev;
 };
 
 int drm_sched_init(struct drm_gpu_scheduler *sched,
 		   const struct drm_sched_backend_ops *ops,
-		   u32 num_rqs, uint32_t hw_submission, unsigned int hang_limit,
+		   struct workqueue_struct *submit_wq,
+		   u32 num_rqs, u32 credit_limit, unsigned int hang_limit,
 		   long timeout, struct workqueue_struct *timeout_wq,
 		   atomic_t *score, const char *name, struct device *dev);
 
 void drm_sched_fini(struct drm_gpu_scheduler *sched);
 int drm_sched_job_init(struct drm_sched_job *job,
 		       struct drm_sched_entity *entity,
-		       void *owner);
+		       u32 credits, void *owner);
 void drm_sched_job_arm(struct drm_sched_job *job);
 int drm_sched_job_add_dependency(struct drm_sched_job *job,
 				 struct dma_fence *fence);
@@ -550,8 +572,12 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
 				    struct drm_gpu_scheduler **sched_list,
                                    unsigned int num_sched_list);
 
+void drm_sched_tdr_queue_imm(struct drm_gpu_scheduler *sched);
 void drm_sched_job_cleanup(struct drm_sched_job *job);
-void drm_sched_wakeup_if_can_queue(struct drm_gpu_scheduler *sched);
+void drm_sched_wakeup(struct drm_gpu_scheduler *sched, struct drm_sched_entity *entity);
+bool drm_sched_wqueue_ready(struct drm_gpu_scheduler *sched);
+void drm_sched_wqueue_stop(struct drm_gpu_scheduler *sched);
+void drm_sched_wqueue_start(struct drm_gpu_scheduler *sched);
 void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad);
 void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery);
 void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched);
diff --git a/include/linux/iosys-map.h b/include/linux/iosys-map.h
index 1b06d074ade0..e3649a6563dd 100644
--- a/include/linux/iosys-map.h
+++ b/include/linux/iosys-map.h
@@ -168,9 +168,9 @@ struct iosys_map {
  * about the use of uninitialized variable.
  */
 #define IOSYS_MAP_INIT_OFFSET(map_, offset_) ({				\
-	struct iosys_map copy = *map_;					\
-	iosys_map_incr(&copy, offset_);					\
-	copy;								\
+	struct iosys_map copy_ = *map_;					\
+	iosys_map_incr(&copy_, offset_);				\
+	copy_;								\
 })
 
 /**
@@ -391,14 +391,14 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset,
  * Returns:
  * The value read from the mapping.
  */
-#define iosys_map_rd(map__, offset__, type__) ({				\
-	type__ val;								\
-	if ((map__)->is_iomem) {						\
-		__iosys_map_rd_io(val, (map__)->vaddr_iomem + (offset__), type__);\
-	} else {								\
-		__iosys_map_rd_sys(val, (map__)->vaddr + (offset__), type__);	\
-	}									\
-	val;									\
+#define iosys_map_rd(map__, offset__, type__) ({					\
+	type__ val_;									\
+	if ((map__)->is_iomem) {							\
+		__iosys_map_rd_io(val_, (map__)->vaddr_iomem + (offset__), type__);	\
+	} else {									\
+		__iosys_map_rd_sys(val_, (map__)->vaddr + (offset__), type__);		\
+	}										\
+	val_;										\
 })
 
 /**
@@ -413,13 +413,13 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset,
  * or if pointer may be unaligned (and problematic for the architecture
  * supported), use iosys_map_memcpy_to()
  */
-#define iosys_map_wr(map__, offset__, type__, val__) ({				\
-	type__ val = (val__);							\
-	if ((map__)->is_iomem) {						\
-		__iosys_map_wr_io(val, (map__)->vaddr_iomem + (offset__), type__);\
-	} else {								\
-		__iosys_map_wr_sys(val, (map__)->vaddr + (offset__), type__);	\
-	}									\
+#define iosys_map_wr(map__, offset__, type__, val__) ({					\
+	type__ val_ = (val__);								\
+	if ((map__)->is_iomem) {							\
+		__iosys_map_wr_io(val_, (map__)->vaddr_iomem + (offset__), type__);	\
+	} else {									\
+		__iosys_map_wr_sys(val_, (map__)->vaddr + (offset__), type__);		\
+	}										\
 })
 
 /**
@@ -485,9 +485,9 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset,
  * The value read from the mapping.
  */
 #define iosys_map_rd_field(map__, struct_offset__, struct_type__, field__) ({	\
-	struct_type__ *s;							\
+	struct_type__ *s_;							\
 	iosys_map_rd(map__, struct_offset__ + offsetof(struct_type__, field__),	\
-		     typeof(s->field__));					\
+		     typeof(s_->field__));					\
 })
 
 /**
@@ -508,9 +508,9 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset,
  * usage and memory layout.
  */
 #define iosys_map_wr_field(map__, struct_offset__, struct_type__, field__, val__) ({	\
-	struct_type__ *s;								\
+	struct_type__ *s_;								\
 	iosys_map_wr(map__, struct_offset__ + offsetof(struct_type__, field__),		\
-		     typeof(s->field__), val__);					\
+		     typeof(s_->field__), val__);					\
 })
 
 #endif /* __IOSYS_MAP_H__ */
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index de723566c5ae..8662b5aeea0c 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -1218,6 +1218,26 @@ extern "C" {
 
 #define DRM_IOCTL_SYNCOBJ_EVENTFD	DRM_IOWR(0xCF, struct drm_syncobj_eventfd)
 
+/**
+ * DRM_IOCTL_MODE_CLOSEFB - Close a framebuffer.
+ *
+ * This closes a framebuffer previously added via ADDFB/ADDFB2. The IOCTL
+ * argument is a framebuffer object ID.
+ *
+ * This IOCTL is similar to &DRM_IOCTL_MODE_RMFB, except it doesn't disable
+ * planes and CRTCs. As long as the framebuffer is used by a plane, it's kept
+ * alive. When the plane no longer uses the framebuffer (because the
+ * framebuffer is replaced with another one, or the plane is disabled), the
+ * framebuffer is cleaned up.
+ *
+ * This is useful to implement flicker-free transitions between two processes.
+ *
+ * Depending on the threat model, user-space may want to ensure that the
+ * framebuffer doesn't expose any sensitive user information: closed
+ * framebuffers attached to a plane can be read back by the next DRM master.
+ */
+#define DRM_IOCTL_MODE_CLOSEFB		DRM_IOWR(0xD0, struct drm_mode_closefb)
+
 /*
  * Device specific ioctls should only be in their respective headers
  * The device specific ioctl range is from 0x40 to 0x9f.
diff --git a/include/uapi/drm/drm_mode.h b/include/uapi/drm/drm_mode.h
index 128d09138ceb..09e7a471ee30 100644
--- a/include/uapi/drm/drm_mode.h
+++ b/include/uapi/drm/drm_mode.h
@@ -1323,6 +1323,16 @@ struct drm_mode_rect {
 	__s32 y2;
 };
 
+/**
+ * struct drm_mode_closefb
+ * @fb_id: Framebuffer ID.
+ * @pad: Must be zero.
+ */
+struct drm_mode_closefb {
+	__u32 fb_id;
+	__u32 pad;
+};
+
 #if defined(__cplusplus)
 }
 #endif
diff --git a/include/uapi/drm/ivpu_accel.h b/include/uapi/drm/ivpu_accel.h
index 262db0c3beee..de1944e42c65 100644
--- a/include/uapi/drm/ivpu_accel.h
+++ b/include/uapi/drm/ivpu_accel.h
@@ -196,7 +196,7 @@ struct drm_ivpu_bo_create {
 	 *
 	 * %DRM_IVPU_BO_UNCACHED:
 	 *
-	 * Allocated BO will not be cached on host side nor snooped on the VPU side.
+	 * Not supported. Use DRM_IVPU_BO_WC instead.
 	 *
 	 * %DRM_IVPU_BO_WC:
 	 *
diff --git a/include/uapi/drm/qaic_accel.h b/include/uapi/drm/qaic_accel.h
index 43ac5d864512..9dab32316aee 100644
--- a/include/uapi/drm/qaic_accel.h
+++ b/include/uapi/drm/qaic_accel.h
@@ -287,8 +287,9 @@ struct qaic_execute_entry {
  * struct qaic_partial_execute_entry - Defines a BO to resize and submit.
  * @handle: In. GEM handle of the BO to commit to the device.
  * @dir: In. Direction of data. 1 = to device, 2 = from device.
- * @resize: In. New size of the BO.  Must be <= the original BO size.  0 is
- *	    short for no resize.
+ * @resize: In. New size of the BO.  Must be <= the original BO size.
+ *	    @resize as 0 would be interpreted as no DMA transfer is
+ *	    involved.
  */
 struct qaic_partial_execute_entry {
 	__u32 handle;
diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index 3dfc0af8756a..1a7d7a689de3 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -319,6 +319,11 @@ struct drm_v3d_submit_tfu {
 
 	/* Pointer to an array of ioctl extensions*/
 	__u64 extensions;
+
+	struct {
+		__u32 ioc;
+		__u32 pad;
+	} v71;
 };
 
 /* Submits a compute shader for dispatch.  This job will block on any
diff --git a/include/uapi/drm/virtgpu_drm.h b/include/uapi/drm/virtgpu_drm.h
index b1d0e56565bc..c2ce71987e9b 100644
--- a/include/uapi/drm/virtgpu_drm.h
+++ b/include/uapi/drm/virtgpu_drm.h
@@ -97,6 +97,7 @@ struct drm_virtgpu_execbuffer {
 #define VIRTGPU_PARAM_CROSS_DEVICE 5 /* Cross virtio-device resource sharing  */
 #define VIRTGPU_PARAM_CONTEXT_INIT 6 /* DRM_VIRTGPU_CONTEXT_INIT */
 #define VIRTGPU_PARAM_SUPPORTED_CAPSET_IDs 7 /* Bitmask of supported capability set ids */
+#define VIRTGPU_PARAM_EXPLICIT_DEBUG_NAME 8 /* Ability to set debug name from userspace */
 
 struct drm_virtgpu_getparam {
 	__u64 param;
@@ -198,6 +199,7 @@ struct drm_virtgpu_resource_create_blob {
 #define VIRTGPU_CONTEXT_PARAM_CAPSET_ID       0x0001
 #define VIRTGPU_CONTEXT_PARAM_NUM_RINGS       0x0002
 #define VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK 0x0003
+#define VIRTGPU_CONTEXT_PARAM_DEBUG_NAME      0x0004
 struct drm_virtgpu_context_set_param {
 	__u64 param;
 	__u64 value;
diff --git a/sound/core/pcm_drm_eld.c b/sound/core/pcm_drm_eld.c
index 07075071972d..1cdca4d4fc9c 100644
--- a/sound/core/pcm_drm_eld.c
+++ b/sound/core/pcm_drm_eld.c
@@ -6,6 +6,7 @@
 #include <linux/export.h>
 #include <linux/hdmi.h>
 #include <drm/drm_edid.h>
+#include <drm/drm_eld.h>
 #include <sound/pcm.h>
 #include <sound/pcm_drm_eld.h>
 
diff --git a/sound/soc/codecs/hdac_hdmi.c b/sound/soc/codecs/hdac_hdmi.c
index b9c5ffbfb5ba..a188a89e40d9 100644
--- a/sound/soc/codecs/hdac_hdmi.c
+++ b/sound/soc/codecs/hdac_hdmi.c
@@ -16,6 +16,7 @@
 #include <linux/pm_runtime.h>
 #include <linux/hdmi.h>
 #include <drm/drm_edid.h>
+#include <drm/drm_eld.h>
 #include <sound/pcm_params.h>
 #include <sound/jack.h>
 #include <sound/soc.h>
diff --git a/sound/soc/codecs/hdmi-codec.c b/sound/soc/codecs/hdmi-codec.c
index 20da1eaa4f1c..a7bd1796aed6 100644
--- a/sound/soc/codecs/hdmi-codec.c
+++ b/sound/soc/codecs/hdmi-codec.c
@@ -17,6 +17,7 @@
 #include <sound/pcm_iec958.h>
 
 #include <drm/drm_crtc.h> /* This is only to get MAX_ELD_BYTES */
+#include <drm/drm_eld.h>
 
 #define HDMI_CODEC_CHMAP_IDX_UNKNOWN  -1
 
diff --git a/sound/x86/intel_hdmi_audio.c b/sound/x86/intel_hdmi_audio.c
index ab95fb34a635..02f5a7f9b728 100644
--- a/sound/x86/intel_hdmi_audio.c
+++ b/sound/x86/intel_hdmi_audio.c
@@ -30,6 +30,7 @@
 #include <sound/control.h>
 #include <sound/jack.h>
 #include <drm/drm_edid.h>
+#include <drm/drm_eld.h>
 #include <drm/intel_lpe_audio.h>
 #include "intel_hdmi_audio.h"