Merge tag 'drm-habanalabs-next-2023-01-26' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into drm-next

This tag contains habanalabs driver and accel changes for v6.3:

- Moved the driver to the accel subsystem. Currently only the files were
  moved (including the uapi file, which was also renamed); this does not
  yet include registering with the accel subsystem, which will probably
  happen only in the next kernel version.

- In case of a decoder error (AXI error) in Gaudi2, we can now find the
  exact IP that initiated the erroneous transaction and print its details
  for easier debugging.

- Add more trace events. We can now trace MMIO transactions and the
  communication with the preboot firmware (enabling them is sketched
  below).
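
  A minimal C sketch for enabling the new events from user space through
  tracefs; the mount point and the "habanalabs" event-group name are
  assumptions, not taken from this pull:

    #include <fcntl.h>
    #include <unistd.h>

    /* Enable every event in the (assumed) habanalabs trace group. */
    static int enable_habanalabs_events(void)
    {
            int fd = open("/sys/kernel/tracing/events/habanalabs/enable",
                          O_WRONLY);

            if (fd < 0)
                    return -1;
            if (write(fd, "1", 1) != 1) {
                    close(fd);
                    return -1;
            }
            return close(fd);
    }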

- Add support in Gaudi2 for an abrupt reset done by the firmware. Until
  now this was supported only in Gaudi1.

- Add a uAPI to flush memory transactions (to the device memory). This is
  needed by the communications library when doing p2p with a host NIC that
  accesses our HBM directly through the PCI BAR (see the sketch below).
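
  A hedged user-space sketch of the flush: HL_IOCTL_CS, union hl_cs_args
  and HL_CS_FLAGS_FLUSH_PCI_HBW_WRITES come from the uAPI; the installed
  header path and the already-open device fd are assumptions:

    #include <string.h>
    #include <sys/ioctl.h>
    #include <drm/habanalabs_accel.h>

    /* Issue a flush-only CS: no chunks, only the new flag. The memset
     * also zeroes the padding bytes, which the driver now validates.
     */
    static int flush_hbw_writes(int fd)
    {
            union hl_cs_args args;

            memset(&args, 0, sizeof(args));
            args.in.cs_flags = HL_CS_FLAGS_FLUSH_PCI_HBW_WRITES;

            return ioctl(fd, HL_IOCTL_CS, &args);
    }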

- Add a uAPI to pass a request through from user-space to the firmware and
  return the result to user-space. This allows the driver to avoid adding
  a new packet (in the communication channel with the firmware) for every
  new request type (see the sketch below).
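
  A sketch of the pass-through, assuming the HL_INFO_FW_GENERIC_REQ info
  opcode and HL_PASSTHROUGH_VERSIONS sub-opcode added here; everything
  else is illustrative scaffolding:

    #include <stdint.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <drm/habanalabs_accel.h>

    /* Send a generic request to the firmware and read the reply into
     * buf; the driver bounces it through CPU-accessible DMA memory.
     */
    static int fw_get_versions(int fd, void *buf, uint32_t buf_size)
    {
            struct hl_info_args args;

            memset(&args, 0, sizeof(args));
            args.op = HL_INFO_FW_GENERIC_REQ;
            args.fw_sub_opcode = HL_PASSTHROUGH_VERSIONS;
            args.return_pointer = (uint64_t) (uintptr_t) buf;
            args.return_size = buf_size;

            return ioctl(fd, HL_IOCTL_INFO, &args);
    }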

- Remove the option to export a dma-buf by memory allocation handle from
  our uAPI. This was planned for Gaudi2 but was never used; instead, we
  export by memory address (same as Gaudi1). In addition, we added the
  option to specify an offset from the address. This is needed in Gaudi2
  because there the user allocates the entire HBM in one allocation, but
  would like to export only a small part of it (see the sketch below).
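
  A sketch of the export path; HL_MEM_OP_EXPORT_DMABUF_FD is the existing
  opcode, but the export_dmabuf_fd field names (addr, offset, mem_size)
  and the returned-fd field are assumptions based on this description:

    #include <stdint.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <drm/habanalabs_accel.h>

    /* Export a slice of device memory as a dma-buf, by address plus
     * the new offset into the (single) HBM allocation.
     */
    static int export_hbm_slice(int fd, uint64_t addr, uint64_t offset,
                                uint64_t size)
    {
            union hl_mem_args args;

            memset(&args, 0, sizeof(args));
            args.in.op = HL_MEM_OP_EXPORT_DMABUF_FD;
            args.in.export_dmabuf_fd.addr = addr;
            args.in.export_dmabuf_fd.offset = offset;
            args.in.export_dmabuf_fd.mem_size = size;

            if (ioctl(fd, HL_IOCTL_MEMORY, &args))
                    return -1;

            return (int) args.out.fd;   /* dma-buf file descriptor */
    }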

- Multiple bug fixes, refactors and small optimizations.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Oded Gabbay <ogabbay@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230126213317.GA1520525@ogabbay-vm-u20.habana-labs.com
439 files changed, 3949 insertions(+), 1155 deletions(-)

Documentation/ABI/testing/sysfs-driver-habanalabs

@ -201,7 +201,19 @@ What: /sys/class/habanalabs/hl<n>/status
Date: Jan 2019
KernelVersion: 5.1
Contact: ogabbay@kernel.org
Description: Status of the card: "Operational", "Malfunction", "In reset".
Description: Status of the card:
* "operational" - Device is available for work.
* "in reset" - Device is going through reset, will be
available shortly.
* "disabled" - Device is not usable.
* "needs reset" - Device is not usable until a hard reset
is initiated.
* "in device creation" - Device is not available yet, as it
is still initializing.
* "in reset after device release" - Device is going through
a compute-reset which is executed after a device release
(relevant for Gaudi2 only).
What: /sys/class/habanalabs/hl<n>/thermal_ver
Date: Jan 2019

Documentation/accel/introduction.rst

@ -67,9 +67,9 @@ tree - drivers/accel/.
The accelerator devices will be exposed to the user space with the dedicated
261 major number and will have the following convention:
- device char files - /dev/accel/accel*
- sysfs - /sys/class/accel/accel*/
- debugfs - /sys/kernel/debug/accel/accel*/
- device char files - /dev/accel/accel\*
- sysfs - /sys/class/accel/accel\*/
- debugfs - /sys/kernel/debug/accel/\*/
Getting Started
===============

MAINTAINERS

@ -6893,6 +6893,7 @@ C: irc://irc.oftc.net/dri-devel
T: git https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/accel.git
F: Documentation/accel/
F: drivers/accel/
F: include/drm/drm_accel.h
DRM ACCEL DRIVERS FOR INTEL VPU
M: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
@ -9032,13 +9033,15 @@ F: block/partitions/efi.*
HABANALABS PCI DRIVER
M: Oded Gabbay <ogabbay@kernel.org>
L: dri-devel@lists.freedesktop.org
S: Supported
C: irc://irc.oftc.net/dri-devel
T: git https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux.git
F: Documentation/ABI/testing/debugfs-driver-habanalabs
F: Documentation/ABI/testing/sysfs-driver-habanalabs
F: drivers/misc/habanalabs/
F: drivers/accel/habanalabs/
F: include/trace/events/habanalabs.h
F: include/uapi/misc/habanalabs.h
F: include/uapi/drm/habanalabs_accel.h
HACKRF MEDIA DRIVER
M: Antti Palosaari <crope@iki.fi>

drivers/Makefile

@ -189,4 +189,4 @@ obj-$(CONFIG_COUNTER) += counter/
obj-$(CONFIG_MOST) += most/
obj-$(CONFIG_PECI) += peci/
obj-$(CONFIG_HTE) += hte/
obj-$(CONFIG_DRM_ACCEL) += accel/
obj-$(CONFIG_DRM_ACCEL) += accel/

drivers/accel/Kconfig

@ -23,4 +23,5 @@ menuconfig DRM_ACCEL
different device files, called accel/accel* (in /dev, sysfs
and debugfs).
source "drivers/accel/habanalabs/Kconfig"
source "drivers/accel/ivpu/Kconfig"

drivers/accel/Makefile

@ -1,3 +1,4 @@
# SPDX-License-Identifier: GPL-2.0-only
obj-y += habanalabs/
obj-y += ivpu/

drivers/accel/habanalabs/Kconfig

@ -3,8 +3,10 @@
# HabanaLabs AI accelerators driver
#
config HABANA_AI
tristate "HabanaAI accelerators (habanalabs)"
config DRM_ACCEL_HABANALABS
tristate "HabanaLabs AI accelerators"
depends on DRM_ACCEL
depends on X86_64
depends on PCI && HAS_IOMEM
select GENERIC_ALLOCATOR
select HWMON
@ -19,7 +21,7 @@ config HABANA_AI
the user to submit workloads to the devices.
The user-space interface is described in
include/uapi/misc/habanalabs.h
include/uapi/drm/habanalabs_accel.h
If unsure, say N.

drivers/accel/habanalabs/Makefile

@ -3,7 +3,7 @@
# Makefile for HabanaLabs AI accelerators driver
#
obj-$(CONFIG_HABANA_AI) := habanalabs.o
obj-$(CONFIG_DRM_ACCEL_HABANALABS) := habanalabs.o
include $(src)/common/Makefile
habanalabs-y += $(HL_COMMON_FILES)

drivers/accel/habanalabs/common/command_buffer.c

@ -5,7 +5,7 @@
* All Rights Reserved.
*/
#include <uapi/misc/habanalabs.h>
#include <uapi/drm/habanalabs_accel.h>
#include "habanalabs.h"
#include <linux/mm.h>
@ -88,6 +88,7 @@ static void cb_fini(struct hl_device *hdev, struct hl_cb *cb)
static void cb_do_release(struct hl_device *hdev, struct hl_cb *cb)
{
if (cb->is_pool) {
atomic_set(&cb->is_handle_destroyed, 0);
spin_lock(&hdev->cb_pool_lock);
list_add(&cb->pool_list, &hdev->cb_pool);
spin_unlock(&hdev->cb_pool_lock);
@ -298,8 +299,25 @@ int hl_cb_create(struct hl_device *hdev, struct hl_mem_mgr *mmg,
int hl_cb_destroy(struct hl_mem_mgr *mmg, u64 cb_handle)
{
struct hl_cb *cb;
int rc;
cb = hl_cb_get(mmg, cb_handle);
if (!cb) {
dev_dbg(mmg->dev, "CB destroy failed, no CB was found for handle %#llx\n",
cb_handle);
return -EINVAL;
}
/* Make sure that CB handle isn't destroyed more than once */
rc = atomic_cmpxchg(&cb->is_handle_destroyed, 0, 1);
hl_cb_put(cb);
if (rc) {
dev_dbg(mmg->dev, "CB destroy failed, handle %#llx was already destroyed\n",
cb_handle);
return -EINVAL;
}
rc = hl_mmap_mem_buf_put_handle(mmg, cb_handle);
if (rc < 0)
return rc; /* Invalid handle */
@ -350,7 +368,7 @@ int hl_cb_ioctl(struct hl_fpriv *hpriv, void *data)
int rc;
if (!hl_device_operational(hdev, &status)) {
dev_warn_ratelimited(hdev->dev,
dev_dbg_ratelimited(hdev->dev,
"Device is %s. Can't execute CB IOCTL\n",
hdev->status[status]);
return -EBUSY;

drivers/accel/habanalabs/common/command_submission.c

@ -5,7 +5,7 @@
* All Rights Reserved.
*/
#include <uapi/misc/habanalabs.h>
#include <uapi/drm/habanalabs_accel.h>
#include "habanalabs.h"
#include <linux/uaccess.h>
@ -13,7 +13,8 @@
#define HL_CS_FLAGS_TYPE_MASK (HL_CS_FLAGS_SIGNAL | HL_CS_FLAGS_WAIT | \
HL_CS_FLAGS_COLLECTIVE_WAIT | HL_CS_FLAGS_RESERVE_SIGNALS_ONLY | \
HL_CS_FLAGS_UNRESERVE_SIGNALS_ONLY | HL_CS_FLAGS_ENGINE_CORE_COMMAND)
HL_CS_FLAGS_UNRESERVE_SIGNALS_ONLY | HL_CS_FLAGS_ENGINE_CORE_COMMAND | \
HL_CS_FLAGS_FLUSH_PCI_HBW_WRITES)
#define MAX_TS_ITER_NUM 10
@ -397,8 +398,16 @@ static void hl_complete_job(struct hl_device *hdev, struct hl_cs_job *job)
* flow by calling 'hl_hw_queue_update_ci'.
*/
if (cs_needs_completion(cs) &&
(job->queue_type == QUEUE_TYPE_EXT || job->queue_type == QUEUE_TYPE_HW))
(job->queue_type == QUEUE_TYPE_EXT || job->queue_type == QUEUE_TYPE_HW)) {
/* In CS based completions, the timestamp is already available,
* so no need to extract it from job
*/
if (hdev->asic_prop.completion_mode == HL_COMPLETION_MODE_JOB)
cs->completion_timestamp = job->timestamp;
cs_put(cs);
}
hl_cs_job_put(job);
}
@ -775,7 +784,7 @@ static void cs_do_release(struct kref *ref)
}
if (cs->timestamp) {
cs->fence->timestamp = ktime_get();
cs->fence->timestamp = cs->completion_timestamp;
hl_push_cs_outcome(hdev, &cs->ctx->outcome_store, cs->sequence,
cs->fence->timestamp, cs->fence->error);
}
@ -1117,6 +1126,27 @@ void hl_release_pending_user_interrupts(struct hl_device *hdev)
wake_pending_user_interrupt_threads(interrupt);
}
static void force_complete_cs(struct hl_device *hdev)
{
struct hl_cs *cs;
spin_lock(&hdev->cs_mirror_lock);
list_for_each_entry(cs, &hdev->cs_mirror_list, mirror_node) {
cs->fence->error = -EIO;
complete_all(&cs->fence->completion);
}
spin_unlock(&hdev->cs_mirror_lock);
}
void hl_abort_waitings_for_completion(struct hl_device *hdev)
{
force_complete_cs(hdev);
force_complete_multi_cs(hdev);
hl_release_pending_user_interrupts(hdev);
}
static void job_wq_completion(struct work_struct *work)
{
struct hl_cs_job *job = container_of(work, struct hl_cs_job,
@ -1274,6 +1304,8 @@ static enum hl_cs_type hl_cs_get_cs_type(u32 cs_type_flags)
return CS_UNRESERVE_SIGNALS;
else if (cs_type_flags & HL_CS_FLAGS_ENGINE_CORE_COMMAND)
return CS_TYPE_ENGINE_CORE;
else if (cs_type_flags & HL_CS_FLAGS_FLUSH_PCI_HBW_WRITES)
return CS_TYPE_FLUSH_PCI_HBW_WRITES;
else
return CS_TYPE_DEFAULT;
}
@ -1286,6 +1318,13 @@ static int hl_cs_sanity_checks(struct hl_fpriv *hpriv, union hl_cs_args *args)
enum hl_device_status status;
enum hl_cs_type cs_type;
bool is_sync_stream;
int i;
for (i = 0 ; i < sizeof(args->in.pad) ; i++)
if (args->in.pad[i]) {
dev_dbg(hdev->dev, "Padding bytes must be 0\n");
return -EINVAL;
}
if (!hl_device_operational(hdev, &status)) {
return -EBUSY;
@ -2422,6 +2461,21 @@ static int cs_ioctl_engine_cores(struct hl_fpriv *hpriv, u64 engine_cores,
return rc;
}
static int cs_ioctl_flush_pci_hbw_writes(struct hl_fpriv *hpriv)
{
struct hl_device *hdev = hpriv->hdev;
struct asic_fixed_properties *prop = &hdev->asic_prop;
if (!prop->hbw_flush_reg) {
dev_dbg(hdev->dev, "HBW flush is not supported\n");
return -EOPNOTSUPP;
}
RREG32(prop->hbw_flush_reg);
return 0;
}
int hl_cs_ioctl(struct hl_fpriv *hpriv, void *data)
{
union hl_cs_args *args = data;
@ -2478,6 +2532,9 @@ int hl_cs_ioctl(struct hl_fpriv *hpriv, void *data)
rc = cs_ioctl_engine_cores(hpriv, args->in.engine_cores,
args->in.num_engine_cores, args->in.core_command);
break;
case CS_TYPE_FLUSH_PCI_HBW_WRITES:
rc = cs_ioctl_flush_pci_hbw_writes(hpriv);
break;
default:
rc = cs_ioctl_default(hpriv, chunks, num_chunks, &cs_seq,
args->in.cs_flags,
@ -2569,7 +2626,9 @@ static int hl_wait_for_fence(struct hl_ctx *ctx, u64 seq, struct hl_fence *fence
*status = CS_WAIT_STATUS_BUSY;
}
if (error == -ETIMEDOUT || error == -EIO)
if (completion_rc == -ERESTARTSYS)
rc = completion_rc;
else if (error == -ETIMEDOUT || error == -EIO)
rc = error;
return rc;
@ -2699,7 +2758,8 @@ static int hl_cs_poll_fences(struct multi_cs_data *mcs_data, struct multi_cs_com
break;
default:
dev_err(hdev->dev, "Invalid fence status\n");
return -EINVAL;
rc = -EINVAL;
break;
}
}
@ -2828,6 +2888,9 @@ static int hl_wait_multi_cs_completion(struct multi_cs_data *mcs_data,
if (completion_rc > 0)
mcs_data->timestamp = mcs_compl->timestamp;
if (completion_rc == -ERESTARTSYS)
return completion_rc;
mcs_data->wait_status = completion_rc;
return 0;
@ -2870,7 +2933,13 @@ static int hl_multi_cs_wait_ioctl(struct hl_fpriv *hpriv, void *data)
u32 size_to_copy;
u64 *cs_seq_arr;
u8 seq_arr_len;
int rc;
int rc, i;
for (i = 0 ; i < sizeof(args->in.pad) ; i++)
if (args->in.pad[i]) {
dev_dbg(hdev->dev, "Padding bytes must be 0\n");
return -EINVAL;
}
if (!hdev->supports_wait_for_multi_cs) {
dev_err(hdev->dev, "Wait for multi CS is not supported\n");
@ -2973,15 +3042,15 @@ static int hl_multi_cs_wait_ioctl(struct hl_fpriv *hpriv, void *data)
free_seq_arr:
kfree(cs_seq_arr);
if (rc)
return rc;
if (mcs_data.wait_status == -ERESTARTSYS) {
if (rc == -ERESTARTSYS) {
dev_err_ratelimited(hdev->dev,
"user process got signal while waiting for Multi-CS\n");
return -EINTR;
rc = -EINTR;
}
if (rc)
return rc;
/* update output args */
memset(args, 0, sizeof(*args));
@ -3119,19 +3188,18 @@ static int ts_buff_get_kernel_ts_record(struct hl_mmap_mem_buf *buf,
goto start_over;
}
} else {
/* Fill up the new registration node info */
requested_offset_record->ts_reg_info.buf = buf;
requested_offset_record->ts_reg_info.cq_cb = cq_cb;
requested_offset_record->ts_reg_info.timestamp_kernel_addr =
(u64 *) ts_buff->user_buff_address + ts_offset;
requested_offset_record->cq_kernel_addr =
(u64 *) cq_cb->kernel_address + cq_offset;
requested_offset_record->cq_target_value = target_value;
spin_unlock_irqrestore(wait_list_lock, flags);
}
/* Fill up the new registration node info */
requested_offset_record->ts_reg_info.in_use = 1;
requested_offset_record->ts_reg_info.buf = buf;
requested_offset_record->ts_reg_info.cq_cb = cq_cb;
requested_offset_record->ts_reg_info.timestamp_kernel_addr =
(u64 *) ts_buff->user_buff_address + ts_offset;
requested_offset_record->cq_kernel_addr =
(u64 *) cq_cb->kernel_address + cq_offset;
requested_offset_record->cq_target_value = target_value;
*pend = requested_offset_record;
dev_dbg(buf->mmg->dev, "Found available node in TS kernel CB %p\n",
@ -3179,7 +3247,7 @@ static int _hl_interrupt_wait_ioctl(struct hl_device *hdev, struct hl_ctx *ctx,
goto put_cq_cb;
}
/* Find first available record */
/* get ts buffer record */
rc = ts_buff_get_kernel_ts_record(buf, cq_cb, ts_offset,
cq_counters_offset, target_value,
&interrupt->wait_list_lock, &pend);
@ -3227,7 +3295,19 @@ static int _hl_interrupt_wait_ioctl(struct hl_device *hdev, struct hl_ctx *ctx,
* Note that we cannot have sorted list by target value,
* in order to shorten the list pass loop, since
* same list could have nodes for different cq counter handle.
* Note:
* Mark ts buff offset as in use here in the spinlock protection area
* to avoid getting in the re-use section in ts_buff_get_kernel_ts_record
* before adding the node to the list. this scenario might happen when
* multiple threads are racing on same offset and one thread could
* set the ts buff in ts_buff_get_kernel_ts_record then the other thread
* takes over and get to ts_buff_get_kernel_ts_record and then we will try
* to re-use the same ts buff offset, and will try to delete a non existing
* node from the list.
*/
if (register_ts_record)
pend->ts_reg_info.in_use = 1;
list_add_tail(&pend->wait_list_node, &interrupt->wait_list_head);
spin_unlock_irqrestore(&interrupt->wait_list_lock, flags);
@ -3489,14 +3569,15 @@ static int hl_interrupt_wait_ioctl(struct hl_fpriv *hpriv, void *data)
int hl_wait_ioctl(struct hl_fpriv *hpriv, void *data)
{
struct hl_device *hdev = hpriv->hdev;
union hl_wait_cs_args *args = data;
u32 flags = args->in.flags;
int rc;
/* If the device is not operational, no point in waiting for any command submission or
* user interrupt
/* If the device is not operational, or if an error has happened and user should release the
* device, there is no point in waiting for any command submission or user interrupt.
*/
if (!hl_device_operational(hpriv->hdev, NULL))
if (!hl_device_operational(hpriv->hdev, NULL) || hdev->reset_info.watchdog_active)
return -EBUSY;
if (flags & HL_WAIT_CS_FLAGS_INTERRUPT)

drivers/accel/habanalabs/common/device.c

@ -7,7 +7,7 @@
#define pr_fmt(fmt) "habanalabs: " fmt
#include <uapi/misc/habanalabs.h>
#include <uapi/drm/habanalabs_accel.h>
#include "habanalabs.h"
#include <linux/pci.h>
@ -428,8 +428,10 @@ static void hpriv_release(struct kref *ref)
*/
reset_device = hdev->reset_upon_device_release || hdev->reset_info.watchdog_active;
/* Unless device is reset in any case, check idle status and reset if device is not idle */
if (!reset_device && hdev->pdev && !hdev->pldm)
/* Check the device idle status and reset if not idle.
* Skip it if already in reset, or if device is going to be reset in any case.
*/
if (!hdev->reset_info.in_reset && !reset_device && hdev->pdev && !hdev->pldm)
device_is_idle = hdev->asic_funcs->is_device_idle(hdev, idle_mask,
HL_BUSY_ENGINES_MASK_EXT_SIZE, NULL);
if (!device_is_idle) {
@ -511,11 +513,6 @@ static int hl_device_release(struct inode *inode, struct file *filp)
return 0;
}
/* Each pending user interrupt holds the user's context, hence we
* must release them all before calling hl_ctx_mgr_fini().
*/
hl_release_pending_user_interrupts(hpriv->hdev);
hl_ctx_mgr_fini(hdev, &hpriv->ctx_mgr);
hl_mem_mgr_fini(&hpriv->mem_mgr);
@ -1428,8 +1425,8 @@ static void handle_reset_trigger(struct hl_device *hdev, u32 flags)
int hl_device_reset(struct hl_device *hdev, u32 flags)
{
bool hard_reset, from_hard_reset_thread, fw_reset, hard_instead_soft = false,
reset_upon_device_release = false, schedule_hard_reset = false, delay_reset,
from_dev_release, from_watchdog_thread;
reset_upon_device_release = false, schedule_hard_reset = false,
delay_reset, from_dev_release, from_watchdog_thread;
u64 idle_mask[HL_BUSY_ENGINES_MASK_EXT_SIZE] = {0};
struct hl_ctx *ctx;
int i, rc;
@ -1446,12 +1443,17 @@ int hl_device_reset(struct hl_device *hdev, u32 flags)
delay_reset = !!(flags & HL_DRV_RESET_DELAY);
from_watchdog_thread = !!(flags & HL_DRV_RESET_FROM_WD_THR);
if (!hard_reset && (hl_device_status(hdev) == HL_DEVICE_STATUS_MALFUNCTION)) {
dev_dbg(hdev->dev, "soft-reset isn't supported on a malfunctioning device\n");
return 0;
}
if (!hard_reset && !hdev->asic_prop.supports_compute_reset) {
hard_instead_soft = true;
hard_reset = true;
}
if (hdev->reset_upon_device_release && (flags & HL_DRV_RESET_DEV_RELEASE)) {
if (hdev->reset_upon_device_release && from_dev_release) {
if (hard_reset) {
dev_crit(hdev->dev,
"Aborting reset because hard-reset is mutually exclusive with reset-on-device-release\n");
@ -1512,6 +1514,7 @@ int hl_device_reset(struct hl_device *hdev, u32 flags)
&hdev->device_release_watchdog_work.reset_work);
if (from_dev_release) {
hdev->reset_info.in_compute_reset = 0;
flags |= HL_DRV_RESET_HARD;
flags &= ~HL_DRV_RESET_DEV_RELEASE;
hard_reset = true;
@ -1566,7 +1569,8 @@ int hl_device_reset(struct hl_device *hdev, u32 flags)
if (rc == -EBUSY) {
if (hdev->device_fini_pending) {
dev_crit(hdev->dev,
"Failed to kill all open processes, stopping hard reset\n");
"%s Failed to kill all open processes, stopping hard reset\n",
dev_name(&(hdev)->pdev->dev));
goto out_err;
}
@ -1576,7 +1580,8 @@ int hl_device_reset(struct hl_device *hdev, u32 flags)
if (rc) {
dev_crit(hdev->dev,
"Failed to kill all open processes, stopping hard reset\n");
"%s Failed to kill all open processes, stopping hard reset\n",
dev_name(&(hdev)->pdev->dev));
goto out_err;
}
@ -1627,14 +1632,16 @@ int hl_device_reset(struct hl_device *hdev, u32 flags)
* ensure driver puts the driver in a unusable state
*/
dev_crit(hdev->dev,
"Consecutive FW fatal errors received, stopping hard reset\n");
"%s Consecutive FW fatal errors received, stopping hard reset\n",
dev_name(&(hdev)->pdev->dev));
rc = -EIO;
goto out_err;
}
if (hdev->kernel_ctx) {
dev_crit(hdev->dev,
"kernel ctx was alive during hard reset, something is terribly wrong\n");
"%s kernel ctx was alive during hard reset, something is terribly wrong\n",
dev_name(&(hdev)->pdev->dev));
rc = -EBUSY;
goto out_err;
}
@ -1732,7 +1739,7 @@ int hl_device_reset(struct hl_device *hdev, u32 flags)
rc = hdev->asic_funcs->scrub_device_mem(hdev);
if (rc) {
dev_err(hdev->dev, "scrub mem failed from device reset (%d)\n", rc);
return rc;
goto out_err;
}
spin_lock(&hdev->reset_info.lock);
@ -1752,9 +1759,13 @@ int hl_device_reset(struct hl_device *hdev, u32 flags)
hdev->reset_info.needs_reset = false;
if (hard_reset)
dev_info(hdev->dev, "Successfully finished resetting the device\n");
dev_info(hdev->dev,
"Successfully finished resetting the %s device\n",
dev_name(&(hdev)->pdev->dev));
else
dev_dbg(hdev->dev, "Successfully finished resetting the device\n");
dev_dbg(hdev->dev,
"Successfully finished resetting the %s device\n",
dev_name(&(hdev)->pdev->dev));
if (hard_reset) {
hdev->reset_info.hard_reset_cnt++;
@ -1789,7 +1800,9 @@ int hl_device_reset(struct hl_device *hdev, u32 flags)
hdev->reset_info.in_compute_reset = 0;
if (hard_reset) {
dev_err(hdev->dev, "Failed to reset! Device is NOT usable\n");
dev_err(hdev->dev,
"%s Failed to reset! Device is NOT usable\n",
dev_name(&(hdev)->pdev->dev));
hdev->reset_info.hard_reset_cnt++;
} else if (reset_upon_device_release) {
spin_unlock(&hdev->reset_info.lock);
@ -1870,6 +1883,8 @@ int hl_device_cond_reset(struct hl_device *hdev, u32 flags, u64 event_mask)
hl_ctx_put(ctx);
hl_abort_waitings_for_completion(hdev);
return 0;
device_reset:
@ -2186,7 +2201,8 @@ int hl_device_init(struct hl_device *hdev, struct class *hclass)
}
dev_notice(hdev->dev,
"Successfully added device to habanalabs driver\n");
"Successfully added device %s to habanalabs driver\n",
dev_name(&(hdev)->pdev->dev));
hdev->init_done = true;
@ -2235,11 +2251,11 @@ int hl_device_init(struct hl_device *hdev, struct class *hclass)
device_cdev_sysfs_add(hdev);
if (hdev->pdev)
dev_err(&hdev->pdev->dev,
"Failed to initialize hl%d. Device is NOT usable !\n",
hdev->cdev_idx);
"Failed to initialize hl%d. Device %s is NOT usable !\n",
hdev->cdev_idx, dev_name(&(hdev)->pdev->dev));
else
pr_err("Failed to initialize hl%d. Device is NOT usable !\n",
hdev->cdev_idx);
pr_err("Failed to initialize hl%d. Device %s is NOT usable !\n",
hdev->cdev_idx, dev_name(&(hdev)->pdev->dev));
return rc;
}
@ -2295,7 +2311,8 @@ void hl_device_fini(struct hl_device *hdev)
if (ktime_compare(ktime_get(), timeout) > 0) {
dev_crit(hdev->dev,
"Failed to remove device because reset function did not finish\n");
"%s Failed to remove device because reset function did not finish\n",
dev_name(&(hdev)->pdev->dev));
return;
}
}
@ -2363,7 +2380,7 @@ void hl_device_fini(struct hl_device *hdev)
hl_mmu_fini(hdev);
vfree(hdev->captured_err_info.pgf_info.user_mappings);
vfree(hdev->captured_err_info.page_fault_info.user_mappings);
hl_eq_fini(hdev, &hdev->event_queue);
@ -2402,7 +2419,12 @@ void hl_device_fini(struct hl_device *hdev)
*/
inline u32 hl_rreg(struct hl_device *hdev, u32 reg)
{
return readl(hdev->rmmio + reg);
u32 val = readl(hdev->rmmio + reg);
if (unlikely(trace_habanalabs_rreg32_enabled()))
trace_habanalabs_rreg32(hdev->dev, reg, val);
return val;
}
/*
@ -2417,12 +2439,17 @@ inline u32 hl_rreg(struct hl_device *hdev, u32 reg)
*/
inline void hl_wreg(struct hl_device *hdev, u32 reg, u32 val)
{
if (unlikely(trace_habanalabs_wreg32_enabled()))
trace_habanalabs_wreg32(hdev->dev, reg, val);
writel(val, hdev->rmmio + reg);
}
void hl_capture_razwi(struct hl_device *hdev, u64 addr, u16 *engine_id, u16 num_of_engines,
u8 flags)
{
struct razwi_info *razwi_info = &hdev->captured_err_info.razwi_info;
if (num_of_engines > HL_RAZWI_MAX_NUM_OF_ENGINES_PER_RTR) {
dev_err(hdev->dev,
"Number of possible razwi initiators (%u) exceeded limit (%u)\n",
@ -2431,15 +2458,17 @@ void hl_capture_razwi(struct hl_device *hdev, u64 addr, u16 *engine_id, u16 num_
}
/* In case it's the first razwi since the device was opened, capture its parameters */
if (atomic_cmpxchg(&hdev->captured_err_info.razwi_info_recorded, 0, 1))
if (atomic_cmpxchg(&hdev->captured_err_info.razwi_info.razwi_detected, 0, 1))
return;
hdev->captured_err_info.razwi.timestamp = ktime_to_ns(ktime_get());
hdev->captured_err_info.razwi.addr = addr;
hdev->captured_err_info.razwi.num_of_possible_engines = num_of_engines;
memcpy(&hdev->captured_err_info.razwi.engine_id[0], &engine_id[0],
razwi_info->razwi.timestamp = ktime_to_ns(ktime_get());
razwi_info->razwi.addr = addr;
razwi_info->razwi.num_of_possible_engines = num_of_engines;
memcpy(&razwi_info->razwi.engine_id[0], &engine_id[0],
num_of_engines * sizeof(u16));
hdev->captured_err_info.razwi.flags = flags;
razwi_info->razwi.flags = flags;
razwi_info->razwi_info_available = true;
}
void hl_handle_razwi(struct hl_device *hdev, u64 addr, u16 *engine_id, u16 num_of_engines,
@ -2453,7 +2482,7 @@ void hl_handle_razwi(struct hl_device *hdev, u64 addr, u16 *engine_id, u16 num_o
static void hl_capture_user_mappings(struct hl_device *hdev, bool is_pmmu)
{
struct page_fault_info *pgf_info = &hdev->captured_err_info.pgf_info;
struct page_fault_info *pgf_info = &hdev->captured_err_info.page_fault_info;
struct hl_vm_phys_pg_pack *phys_pg_pack = NULL;
struct hl_vm_hash_node *hnode;
struct hl_userptr *userptr;
@ -2515,14 +2544,18 @@ static void hl_capture_user_mappings(struct hl_device *hdev, bool is_pmmu)
void hl_capture_page_fault(struct hl_device *hdev, u64 addr, u16 eng_id, bool is_pmmu)
{
struct page_fault_info *pgf_info = &hdev->captured_err_info.page_fault_info;
/* Capture only the first page fault */
if (atomic_cmpxchg(&hdev->captured_err_info.pgf_info_recorded, 0, 1))
if (atomic_cmpxchg(&pgf_info->page_fault_detected, 0, 1))
return;
hdev->captured_err_info.pgf_info.pgf.timestamp = ktime_to_ns(ktime_get());
hdev->captured_err_info.pgf_info.pgf.addr = addr;
hdev->captured_err_info.pgf_info.pgf.engine_id = eng_id;
pgf_info->page_fault.timestamp = ktime_to_ns(ktime_get());
pgf_info->page_fault.addr = addr;
pgf_info->page_fault.engine_id = eng_id;
hl_capture_user_mappings(hdev, is_pmmu);
pgf_info->page_fault_info_available = true;
}
void hl_handle_page_fault(struct hl_device *hdev, u64 addr, u16 eng_id, bool is_pmmu,

drivers/accel/habanalabs/common/firmware_if.c

@ -14,8 +14,32 @@
#include <linux/ctype.h>
#include <linux/vmalloc.h>
#include <trace/events/habanalabs.h>
#define FW_FILE_MAX_SIZE 0x1400000 /* maximum size of 20MB */
static char *comms_cmd_str_arr[COMMS_INVLD_LAST] = {
[COMMS_NOOP] = __stringify(COMMS_NOOP),
[COMMS_CLR_STS] = __stringify(COMMS_CLR_STS),
[COMMS_RST_STATE] = __stringify(COMMS_RST_STATE),
[COMMS_PREP_DESC] = __stringify(COMMS_PREP_DESC),
[COMMS_DATA_RDY] = __stringify(COMMS_DATA_RDY),
[COMMS_EXEC] = __stringify(COMMS_EXEC),
[COMMS_RST_DEV] = __stringify(COMMS_RST_DEV),
[COMMS_GOTO_WFE] = __stringify(COMMS_GOTO_WFE),
[COMMS_SKIP_BMC] = __stringify(COMMS_SKIP_BMC),
[COMMS_PREP_DESC_ELBI] = __stringify(COMMS_PREP_DESC_ELBI),
};
static char *comms_sts_str_arr[COMMS_STS_INVLD_LAST] = {
[COMMS_STS_NOOP] = __stringify(COMMS_STS_NOOP),
[COMMS_STS_ACK] = __stringify(COMMS_STS_ACK),
[COMMS_STS_OK] = __stringify(COMMS_STS_OK),
[COMMS_STS_ERR] = __stringify(COMMS_STS_ERR),
[COMMS_STS_VALID_ERR] = __stringify(COMMS_STS_VALID_ERR),
[COMMS_STS_TIMEOUT_ERR] = __stringify(COMMS_STS_TIMEOUT_ERR),
};
static char *extract_fw_ver_from_str(const char *fw_str)
{
char *str, *fw_ver, *whitespace;
@ -311,7 +335,7 @@ int hl_fw_send_cpu_message(struct hl_device *hdev, u32 hw_queue_id, u32 *msg,
dev_dbg(hdev->dev, "Device CPU packet timeout (0x%x) due to FW reset\n",
tmp);
else
dev_err(hdev->dev, "Device CPU packet timeout (0x%x)\n", tmp);
dev_err(hdev->dev, "Device CPU packet timeout (status = 0x%x)\n", tmp);
hdev->device_cpu_disabled = true;
goto out;
}
@ -1322,13 +1346,12 @@ static void detect_cpu_boot_status(struct hl_device *hdev, u32 status)
break;
default:
dev_err(hdev->dev,
"Device boot progress - Invalid status code %d\n",
status);
"Device boot progress - Invalid or unexpected status code %d\n", status);
break;
}
}
static int hl_fw_wait_preboot_ready(struct hl_device *hdev)
int hl_fw_wait_preboot_ready(struct hl_device *hdev)
{
struct pre_fw_load_props *pre_fw_load = &hdev->fw_loader.pre_fw_load;
u32 status;
@ -1353,8 +1376,8 @@ static int hl_fw_wait_preboot_ready(struct hl_device *hdev)
pre_fw_load->wait_for_preboot_timeout);
if (rc) {
dev_err(hdev->dev, "CPU boot ready status timeout\n");
detect_cpu_boot_status(hdev, status);
dev_err(hdev->dev, "CPU boot ready timeout (status = %d)\n", status);
/* If we read all FF, then something is totally wrong, no point
* of reading specific errors
@ -1634,6 +1657,7 @@ static void hl_fw_dynamic_send_cmd(struct hl_device *hdev,
val = FIELD_PREP(COMMS_COMMAND_CMD_MASK, cmd);
val |= FIELD_PREP(COMMS_COMMAND_SIZE_MASK, size);
trace_habanalabs_comms_send_cmd(hdev->dev, comms_cmd_str_arr[cmd]);
WREG32(le32_to_cpu(dyn_regs->kmd_msg_to_cpu), val);
}
@ -1691,6 +1715,8 @@ static int hl_fw_dynamic_wait_for_status(struct hl_device *hdev,
dyn_regs = &fw_loader->dynamic_loader.comm_desc.cpu_dyn_regs;
trace_habanalabs_comms_wait_status(hdev->dev, comms_sts_str_arr[expected_status]);
/* Wait for expected status */
rc = hl_poll_timeout(
hdev,
@ -1706,6 +1732,8 @@ static int hl_fw_dynamic_wait_for_status(struct hl_device *hdev,
return -EIO;
}
trace_habanalabs_comms_wait_status_done(hdev->dev, comms_sts_str_arr[expected_status]);
/*
* skip storing FW response for NOOP to preserve the actual desired
* FW status
@ -1778,6 +1806,8 @@ int hl_fw_dynamic_send_protocol_cmd(struct hl_device *hdev,
{
int rc;
trace_habanalabs_comms_protocol_cmd(hdev->dev, comms_cmd_str_arr[cmd]);
/* first send clear command to clean former commands */
rc = hl_fw_dynamic_send_clear_cmd(hdev, fw_loader);
if (rc)
@ -1884,7 +1914,7 @@ static int hl_fw_dynamic_validate_memory_bound(struct hl_device *hdev,
*
* @hdev: pointer to the habanalabs device structure
* @fw_loader: managing structure for loading device's FW
* @fw_desc: the descriptor form FW
* @fw_desc: the descriptor from FW
*
* @return 0 on success, otherwise non-zero error code
*/
@ -1901,11 +1931,11 @@ static int hl_fw_dynamic_validate_descriptor(struct hl_device *hdev,
int rc;
if (le32_to_cpu(fw_desc->header.magic) != HL_COMMS_DESC_MAGIC)
dev_warn(hdev->dev, "Invalid magic for dynamic FW descriptor (%x)\n",
dev_dbg(hdev->dev, "Invalid magic for dynamic FW descriptor (%x)\n",
fw_desc->header.magic);
if (fw_desc->header.version != HL_COMMS_DESC_VER)
dev_warn(hdev->dev, "Invalid version for dynamic FW descriptor (%x)\n",
dev_dbg(hdev->dev, "Invalid version for dynamic FW descriptor (%x)\n",
fw_desc->header.version);
/*
@ -1976,6 +2006,43 @@ static int hl_fw_dynamic_validate_response(struct hl_device *hdev,
return rc;
}
/*
* hl_fw_dynamic_read_descriptor_msg - read and show the ascii msg that sent by fw
*
* @hdev: pointer to the habanalabs device structure
* @fw_desc: the descriptor from FW
*/
static void hl_fw_dynamic_read_descriptor_msg(struct hl_device *hdev,
struct lkd_fw_comms_desc *fw_desc)
{
int i;
char *msg;
for (i = 0 ; i < LKD_FW_ASCII_MSG_MAX ; i++) {
if (!fw_desc->ascii_msg[i].valid)
return;
/* force NULL termination */
msg = fw_desc->ascii_msg[i].msg;
msg[LKD_FW_ASCII_MSG_MAX_LEN - 1] = '\0';
switch (fw_desc->ascii_msg[i].msg_lvl) {
case LKD_FW_ASCII_MSG_ERR:
dev_err(hdev->dev, "fw: %s", fw_desc->ascii_msg[i].msg);
break;
case LKD_FW_ASCII_MSG_WRN:
dev_warn(hdev->dev, "fw: %s", fw_desc->ascii_msg[i].msg);
break;
case LKD_FW_ASCII_MSG_INF:
dev_info(hdev->dev, "fw: %s", fw_desc->ascii_msg[i].msg);
break;
default:
dev_dbg(hdev->dev, "fw: %s", fw_desc->ascii_msg[i].msg);
break;
}
}
}
/**
* hl_fw_dynamic_read_and_validate_descriptor - read and validate FW descriptor
*
@ -1988,9 +2055,10 @@ static int hl_fw_dynamic_read_and_validate_descriptor(struct hl_device *hdev,
struct fw_load_mgr *fw_loader)
{
struct lkd_fw_comms_desc *fw_desc;
void __iomem *src, *temp_fw_desc;
struct pci_mem_region *region;
struct fw_response *response;
void *temp_fw_desc;
void __iomem *src;
u16 fw_data_size;
enum pci_region region_id;
int rc;
@ -2039,6 +2107,10 @@ static int hl_fw_dynamic_read_and_validate_descriptor(struct hl_device *hdev,
rc = hl_fw_dynamic_validate_descriptor(hdev, fw_loader,
(struct lkd_fw_comms_desc *) temp_fw_desc);
if (!rc)
hl_fw_dynamic_read_descriptor_msg(hdev, temp_fw_desc);
vfree(temp_fw_desc);
return rc;
@ -2354,7 +2426,7 @@ static int hl_fw_dynamic_wait_for_boot_fit_active(struct hl_device *hdev,
hdev->fw_poll_interval_usec,
dyn_loader->wait_for_bl_timeout);
if (rc) {
dev_err(hdev->dev, "failed to wait for boot\n");
dev_err(hdev->dev, "failed to wait for boot (status = %d)\n", status);
return rc;
}
@ -2381,7 +2453,7 @@ static int hl_fw_dynamic_wait_for_linux_active(struct hl_device *hdev,
hdev->fw_poll_interval_usec,
fw_loader->cpu_timeout);
if (rc) {
dev_err(hdev->dev, "failed to wait for Linux\n");
dev_err(hdev->dev, "failed to wait for Linux (status = %d)\n", status);
return rc;
}
@ -2459,51 +2531,54 @@ static void hl_fw_linux_update_state(struct hl_device *hdev,
static int hl_fw_dynamic_send_msg(struct hl_device *hdev,
struct fw_load_mgr *fw_loader, u8 msg_type, void *data)
{
struct lkd_msg_comms msg;
struct lkd_msg_comms *msg;
int rc;
memset(&msg, 0, sizeof(msg));
msg = kzalloc(sizeof(*msg), GFP_KERNEL);
if (!msg)
return -ENOMEM;
/* create message to be sent */
msg.header.type = msg_type;
msg.header.size = cpu_to_le16(sizeof(struct comms_msg_header));
msg.header.magic = cpu_to_le32(HL_COMMS_MSG_MAGIC);
msg->header.type = msg_type;
msg->header.size = cpu_to_le16(sizeof(struct comms_msg_header));
msg->header.magic = cpu_to_le32(HL_COMMS_MSG_MAGIC);
switch (msg_type) {
case HL_COMMS_RESET_CAUSE_TYPE:
msg.reset_cause = *(__u8 *) data;
msg->reset_cause = *(__u8 *) data;
break;
default:
dev_err(hdev->dev,
"Send COMMS message - invalid message type %u\n",
msg_type);
return -EINVAL;
rc = -EINVAL;
goto out;
}
rc = hl_fw_dynamic_request_descriptor(hdev, fw_loader,
sizeof(struct lkd_msg_comms));
if (rc)
return rc;
goto out;
/* copy message to space allocated by FW */
rc = hl_fw_dynamic_copy_msg(hdev, &msg, fw_loader);
rc = hl_fw_dynamic_copy_msg(hdev, msg, fw_loader);
if (rc)
return rc;
goto out;
rc = hl_fw_dynamic_send_protocol_cmd(hdev, fw_loader, COMMS_DATA_RDY,
0, true,
fw_loader->cpu_timeout);
if (rc)
return rc;
goto out;
rc = hl_fw_dynamic_send_protocol_cmd(hdev, fw_loader, COMMS_EXEC,
0, true,
fw_loader->cpu_timeout);
if (rc)
return rc;
return 0;
out:
kfree(msg);
return rc;
}
/**
@ -2560,13 +2635,43 @@ static int hl_fw_dynamic_init_cpu(struct hl_device *hdev,
}
if (!(hdev->fw_components & FW_TYPE_BOOT_CPU)) {
struct lkd_fw_binning_info *binning_info;
rc = hl_fw_dynamic_request_descriptor(hdev, fw_loader, 0);
if (rc)
goto protocol_err;
/* read preboot version */
return hl_fw_dynamic_read_device_fw_version(hdev, FW_COMP_PREBOOT,
rc = hl_fw_dynamic_read_device_fw_version(hdev, FW_COMP_PREBOOT,
fw_loader->dynamic_loader.comm_desc.cur_fw_ver);
if (rc)
return rc;
/* read binning info from preboot */
if (hdev->support_preboot_binning) {
binning_info = &fw_loader->dynamic_loader.comm_desc.binning_info;
hdev->tpc_binning = le64_to_cpu(binning_info->tpc_mask_l);
hdev->dram_binning = le32_to_cpu(binning_info->dram_mask);
hdev->edma_binning = le32_to_cpu(binning_info->edma_mask);
hdev->decoder_binning = le32_to_cpu(binning_info->dec_mask);
hdev->rotator_binning = le32_to_cpu(binning_info->rot_mask);
rc = hdev->asic_funcs->set_dram_properties(hdev);
if (rc)
return rc;
rc = hdev->asic_funcs->set_binning_masks(hdev);
if (rc)
return rc;
dev_dbg(hdev->dev,
"Read binning masks: tpc: 0x%llx, dram: 0x%llx, edma: 0x%x, dec: 0x%x, rot:0x%x\n",
hdev->tpc_binning, hdev->dram_binning, hdev->edma_binning,
hdev->decoder_binning, hdev->rotator_binning);
}
return 0;
}
/* load boot fit to FW */
@ -2687,7 +2792,7 @@ static int hl_fw_static_init_cpu(struct hl_device *hdev,
if (rc) {
dev_dbg(hdev->dev,
"No boot fit request received, resuming boot\n");
"No boot fit request received (status = %d), resuming boot\n", status);
} else {
rc = hdev->asic_funcs->load_boot_fit_to_device(hdev);
if (rc)
@ -2710,7 +2815,7 @@ static int hl_fw_static_init_cpu(struct hl_device *hdev,
if (rc) {
dev_err(hdev->dev,
"Timeout waiting for boot fit load ack\n");
"Timeout waiting for boot fit load ack (status = %d)\n", status);
goto out;
}
@ -2788,7 +2893,7 @@ static int hl_fw_static_init_cpu(struct hl_device *hdev,
if (rc) {
dev_err(hdev->dev,
"Failed to get ACK on skipping BMC, %d\n",
"Failed to get ACK on skipping BMC (status = %d)\n",
status);
WREG32(msg_to_cpu_reg, KMD_MSG_NA);
rc = -EIO;
@ -2815,7 +2920,7 @@ static int hl_fw_static_init_cpu(struct hl_device *hdev,
"Device reports FIT image is corrupted\n");
else
dev_err(hdev->dev,
"Failed to load firmware to device, %d\n",
"Failed to load firmware to device (status = %d)\n",
status);
rc = -EIO;
@ -3043,3 +3148,27 @@ int hl_fw_get_sec_attest_info(struct hl_device *hdev, struct cpucp_sec_attest_in
sizeof(struct cpucp_sec_attest_info), nonce,
HL_CPUCP_SEC_ATTEST_INFO_TINEOUT_USEC);
}
int hl_fw_send_generic_request(struct hl_device *hdev, enum hl_passthrough_type sub_opcode,
dma_addr_t buff, u32 *size)
{
struct cpucp_packet pkt = {0};
u64 result;
int rc = 0;
pkt.ctl = cpu_to_le32(CPUCP_PACKET_GENERIC_PASSTHROUGH << CPUCP_PKT_CTL_OPCODE_SHIFT);
pkt.addr = cpu_to_le64(buff);
pkt.data_max_size = cpu_to_le32(*size);
pkt.pkt_subidx = cpu_to_le32(sub_opcode);
rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *)&pkt, sizeof(pkt),
HL_CPUCP_INFO_TIMEOUT_USEC, &result);
if (rc)
dev_err(hdev->dev, "failed to send CPUCP data of generic fw pkt\n");
else
dev_dbg(hdev->dev, "generic pkt was successful, result: 0x%llx\n", result);
*size = (u32)result;
return rc;
}

drivers/accel/habanalabs/common/habanalabs.h

@ -11,7 +11,7 @@
#include "../include/common/cpucp_if.h"
#include "../include/common/qman_if.h"
#include "../include/hw_ip/mmu/mmu_general.h"
#include <uapi/misc/habanalabs.h>
#include <uapi/drm/habanalabs_accel.h>
#include <linux/cdev.h>
#include <linux/iopoll.h>
@ -29,6 +29,8 @@
#include <linux/coresight.h>
#include <linux/dma-buf.h>
#include "security.h"
#define HL_NAME "habanalabs"
struct hl_device;
@ -375,7 +377,8 @@ enum hl_cs_type {
CS_TYPE_COLLECTIVE_WAIT,
CS_RESERVE_SIGNALS,
CS_UNRESERVE_SIGNALS,
CS_TYPE_ENGINE_CORE
CS_TYPE_ENGINE_CORE,
CS_TYPE_FLUSH_PCI_HBW_WRITES,
};
/*
@ -545,6 +548,8 @@ struct hl_hints_range {
/**
* struct asic_fixed_properties - ASIC specific immutable properties.
* @hw_queues_props: H/W queues properties.
* @special_blocks: points to an array containing special blocks info.
* @skip_special_blocks_cfg: special blocks skip configs.
* @cpucp_info: received various information from CPU-CP regarding the H/W, e.g.
* available sensors.
* @uboot_ver: F/W U-boot version.
@ -644,6 +649,10 @@ struct hl_hints_range {
* (i.e. the DRAM supports multiple page sizes), otherwise
* it will shall be equal to dram_page_size.
* @num_engine_cores: number of engine cpu cores
* @num_of_special_blocks: special_blocks array size.
* @glbl_err_cause_num: global err cause number.
* @hbw_flush_reg: register to read to generate HBW flush. value of 0 means HBW flush is
* not supported.
* @collective_first_sob: first sync object available for collective use
* @collective_first_mon: first monitor available for collective use
* @sync_stream_first_sob: first sync object available for sync stream use
@ -692,6 +701,8 @@ struct hl_hints_range {
*/
struct asic_fixed_properties {
struct hw_queue_properties *hw_queues_props;
struct hl_special_block_info *special_blocks;
struct hl_skip_blocks_cfg skip_special_blocks_cfg;
struct cpucp_info cpucp_info;
char uboot_ver[VERSION_MAX_LEN];
char preboot_ver[VERSION_MAX_LEN];
@ -764,6 +775,9 @@ struct asic_fixed_properties {
u32 xbar_edge_enabled_mask;
u32 device_mem_alloc_default_page_size;
u32 num_engine_cores;
u32 num_of_special_blocks;
u32 glbl_err_cause_num;
u32 hbw_flush_reg;
u16 collective_first_sob;
u16 collective_first_mon;
u16 sync_stream_first_sob;
@ -935,6 +949,7 @@ struct hl_mmap_mem_buf {
* @size: holds the CB's size.
* @roundup_size: holds the cb size after roundup to page size.
* @cs_cnt: holds number of CS that this CB participates in.
* @is_handle_destroyed: atomic boolean indicating whether or not the CB handle was destroyed.
* @is_pool: true if CB was acquired from the pool, false otherwise.
* @is_internal: internally allocated
* @is_mmu_mapped: true if the CB is mapped to the device's MMU.
@ -951,6 +966,7 @@ struct hl_cb {
u32 size;
u32 roundup_size;
atomic_t cs_cnt;
atomic_t is_handle_destroyed;
u8 is_pool;
u8 is_internal;
u8 is_mmu_mapped;
@ -1077,20 +1093,25 @@ struct hl_cq {
atomic_t free_slots_cnt;
};
enum hl_user_interrupt_type {
HL_USR_INTERRUPT_CQ = 0,
HL_USR_INTERRUPT_DECODER,
};
/**
* struct hl_user_interrupt - holds user interrupt information
* @hdev: pointer to the device structure
* @type: user interrupt type
* @wait_list_head: head to the list of user threads pending on this interrupt
* @wait_list_lock: protects wait_list_head
* @interrupt_id: msix interrupt id
* @is_decoder: whether this entry represents a decoder interrupt
*/
struct hl_user_interrupt {
struct hl_device *hdev;
struct list_head wait_list_head;
spinlock_t wait_list_lock;
u32 interrupt_id;
bool is_decoder;
struct hl_device *hdev;
enum hl_user_interrupt_type type;
struct list_head wait_list_head;
spinlock_t wait_list_lock;
u32 interrupt_id;
};
/**
@ -1540,8 +1561,10 @@ struct engines_data {
* @check_if_razwi_happened: check if there was a razwi due to RR violation.
* @access_dev_mem: access device memory
* @set_dram_bar_base: set the base of the DRAM BAR
* @set_engine_cores: set a config command to enigne cores
* @set_engine_cores: set a config command to engine cores
* @send_device_activity: indication to FW about device availability
* @set_dram_properties: set DRAM related properties.
* @set_binning_masks: set binning/enable masks for all relevant components.
*/
struct hl_asic_funcs {
int (*early_init)(struct hl_device *hdev);
@ -1679,6 +1702,8 @@ struct hl_asic_funcs {
int (*set_engine_cores)(struct hl_device *hdev, u32 *core_ids,
u32 num_cores, u32 core_command);
int (*send_device_activity)(struct hl_device *hdev, bool open);
int (*set_dram_properties)(struct hl_device *hdev);
int (*set_binning_masks)(struct hl_device *hdev);
};
@ -1739,8 +1764,9 @@ struct hl_cs_counters_atomic {
* struct hl_dmabuf_priv - a dma-buf private object.
* @dmabuf: pointer to dma-buf object.
* @ctx: pointer to the dma-buf owner's context.
* @phys_pg_pack: pointer to physical page pack if the dma-buf was exported for
* memory allocation handle.
* @phys_pg_pack: pointer to physical page pack if the dma-buf was exported
* where virtual memory is supported.
* @memhash_hnode: pointer to the memhash node. this object holds the export count.
* @device_address: physical address of the device's memory. Relevant only
* if phys_pg_pack is NULL (dma-buf was exported from address).
* The total size can be taken from the dmabuf object.
@ -1749,6 +1775,7 @@ struct hl_dmabuf_priv {
struct dma_buf *dmabuf;
struct hl_ctx *ctx;
struct hl_vm_phys_pg_pack *phys_pg_pack;
struct hl_vm_hash_node *memhash_hnode;
uint64_t device_address;
};
@ -1923,6 +1950,7 @@ struct hl_userptr {
* @type: CS_TYPE_*.
* @jobs_cnt: counter of submitted jobs on all queues.
* @encaps_sig_hdl_id: encaps signals handle id, set for the first staged cs.
* @completion_timestamp: timestamp of the last completed cs job.
* @sob_addr_offset: sob offset from the configuration base address.
* @initial_sob_count: count of completed signals in SOB before current submission of signal or
* cs with encaps signals.
@ -1955,6 +1983,7 @@ struct hl_cs {
struct list_head staged_cs_node;
struct list_head debugfs_list;
struct hl_cs_encaps_sig_handle *encaps_sig_hdl;
ktime_t completion_timestamp;
u64 sequence;
u64 staged_sequence;
u64 timeout_jiffies;
@ -1990,6 +2019,7 @@ struct hl_cs {
* @debugfs_list: node in debugfs list of command submission jobs.
* @refcount: reference counter for usage of the CS job.
* @queue_type: the type of the H/W queue this job is submitted to.
* @timestamp: timestamp upon job completion
* @id: the id of this job inside a CS.
* @hw_queue_id: the id of the H/W queue this job is submitted to.
* @user_cb_size: the actual size of the CB we got from the user.
@ -2016,6 +2046,7 @@ struct hl_cs_job {
struct list_head debugfs_list;
struct kref refcount;
enum hl_queue_type queue_type;
ktime_t timestamp;
u32 id;
u32 hw_queue_id;
u32 user_cb_size;
@ -2076,12 +2107,16 @@ struct hl_cs_parser {
* hl_userptr).
* @node: node to hang on the hash table in context object.
* @vaddr: key virtual address.
* @handle: memory handle for device memory allocation.
* @ptr: value pointer (hl_vm_phys_pg_list or hl_userptr).
* @export_cnt: number of exports from within the VA block.
*/
struct hl_vm_hash_node {
struct hlist_node node;
u64 vaddr;
u64 handle;
void *ptr;
int export_cnt;
};
/**
@ -2109,10 +2144,10 @@ struct hl_vm_hw_block_list_node {
* @pages: the physical page array.
* @npages: num physical pages in the pack.
* @total_size: total size of all the pages in this list.
* @exported_size: buffer exported size.
* @node: used to attach to deletion list that is used when all the allocations are cleared
* at the teardown of the context.
* @mapping_cnt: number of shared mappings.
* @exporting_cnt: number of dma-buf exporting.
* @asid: the context related to this list.
* @page_size: size of each page in the pack.
* @flags: HL_MEM_* flags related to this list.
@ -2126,9 +2161,9 @@ struct hl_vm_phys_pg_pack {
u64 *pages;
u64 npages;
u64 total_size;
u64 exported_size;
struct list_head node;
atomic_t mapping_cnt;
u32 exporting_cnt;
u32 asid;
u32 page_size;
u32 flags;
@ -2675,11 +2710,11 @@ void hl_wreg(struct hl_device *hdev, u32 reg, u32 val);
p->size = sz; \
})
#define HL_USR_INTR_STRUCT_INIT(usr_intr, hdev, intr_id, decoder) \
#define HL_USR_INTR_STRUCT_INIT(usr_intr, hdev, intr_id, intr_type) \
({ \
usr_intr.hdev = hdev; \
usr_intr.interrupt_id = intr_id; \
usr_intr.is_decoder = decoder; \
usr_intr.type = intr_type; \
INIT_LIST_HEAD(&usr_intr.wait_list_head); \
spin_lock_init(&usr_intr.wait_list_lock); \
})
@ -2961,37 +2996,53 @@ struct undefined_opcode_info {
};
/**
* struct page_fault_info - info about page fault
* @pgf_info: page fault information.
* struct page_fault_info - page fault information.
* @page_fault: holds information collected during a page fault.
* @user_mappings: buffer containing user mappings.
* @num_of_user_mappings: number of user mappings.
* @page_fault_detected: if set as 1, then a page-fault was discovered for the
* first time after the driver has finished booting-up.
* Since we're looking for the page-fault's root cause,
* we don't care of the others that might follow it-
* so once changed to 1, it will remain that way.
* @page_fault_info_available: indicates that a page fault info is now available.
*/
struct page_fault_info {
struct hl_page_fault_info pgf;
struct hl_page_fault_info page_fault;
struct hl_user_mapping *user_mappings;
u64 num_of_user_mappings;
atomic_t page_fault_detected;
bool page_fault_info_available;
};
/**
* struct razwi_info - RAZWI information.
* @razwi: holds information collected during a RAZWI
* @razwi_detected: if set as 1, then a RAZWI was discovered for the
* first time after the driver has finished booting-up.
* Since we're looking for the RAZWI's root cause,
* we don't care of the others that might follow it-
* so once changed to 1, it will remain that way.
* @razwi_info_available: indicates that a RAZWI info is now available.
*/
struct razwi_info {
struct hl_info_razwi_event razwi;
atomic_t razwi_detected;
bool razwi_info_available;
};
/**
* struct hl_error_info - holds information collected during an error.
* @cs_timeout: CS timeout error information.
* @razwi: razwi information.
* @razwi_info_recorded: if set writing to razwi information is enabled.
* otherwise - disabled, so the first (root cause) razwi will not be
* overwritten.
* @undef_opcode: undefined opcode information
* @pgf_info: page fault information.
* @pgf_info_recorded: if set writing to page fault information is enabled.
* otherwise - disabled, so the first (root cause) page fault will not be
* overwritten.
* @razwi_info: RAZWI information.
* @undef_opcode: undefined opcode information.
* @page_fault_info: page fault information.
*/
struct hl_error_info {
struct cs_timeout_info cs_timeout;
struct hl_info_razwi_event razwi;
atomic_t razwi_info_recorded;
struct razwi_info razwi_info;
struct undefined_opcode_info undef_opcode;
struct page_fault_info pgf_info;
atomic_t pgf_info_recorded;
struct page_fault_info page_fault_info;
};
/**
@ -3157,6 +3208,8 @@ struct hl_reset_info {
* @edma_binning: contains mask of edma engines that is received from the f/w which
* indicates which edma engines are binned-out
* @device_release_watchdog_timeout_sec: device release watchdog timeout value in seconds.
* @rotator_binning: contains mask of rotators engines that is received from the f/w
* which indicates which rotator engines are binned-out(Gaudi3 and above).
* @id: device minor.
* @id_control: minor of the control device.
* @cdev_idx: char device index. Used for setting its name.
@ -3214,6 +3267,7 @@ struct hl_reset_info {
* @heartbeat: Controls if we want to enable the heartbeat mechanism vs. the f/w, which verifies
* that the f/w is always alive. Used only for testing.
* @supports_ctx_switch: true if a ctx switch is required upon first submission.
* @support_preboot_binning: true if we support read binning info from preboot.
*/
struct hl_device {
struct pci_dev *pdev;
@ -3322,6 +3376,7 @@ struct hl_device {
u32 decoder_binning;
u32 edma_binning;
u32 device_release_watchdog_timeout_sec;
u32 rotator_binning;
u16 id;
u16 id_control;
u16 cdev_idx;
@ -3355,6 +3410,7 @@ struct hl_device {
u8 supports_mmu_prefetch;
u8 reset_upon_device_release;
u8 supports_ctx_switch;
u8 support_preboot_binning;
/* Parameters for bring-up */
u64 nic_ports_mask;
@ -3729,6 +3785,7 @@ int hl_fw_cpucp_power_get(struct hl_device *hdev, u64 *power);
void hl_fw_ask_hard_reset_without_linux(struct hl_device *hdev);
void hl_fw_ask_halt_machine_without_linux(struct hl_device *hdev);
int hl_fw_init_cpu(struct hl_device *hdev);
int hl_fw_wait_preboot_ready(struct hl_device *hdev);
int hl_fw_read_preboot_status(struct hl_device *hdev);
int hl_fw_dynamic_send_protocol_cmd(struct hl_device *hdev,
struct fw_load_mgr *fw_loader,
@ -3772,6 +3829,8 @@ int hl_fw_get_clk_rate(struct hl_device *hdev, u32 *cur_clk, u32 *max_clk);
void hl_fw_set_pll_profile(struct hl_device *hdev);
void hl_sysfs_add_dev_clk_attr(struct hl_device *hdev, struct attribute_group *dev_clk_attr_grp);
void hl_sysfs_add_dev_vrm_attr(struct hl_device *hdev, struct attribute_group *dev_vrm_attr_grp);
int hl_fw_send_generic_request(struct hl_device *hdev, enum hl_passthrough_type sub_opcode,
dma_addr_t buff, u32 *size);
void hw_sob_get(struct hl_hw_sob *hw_sob);
void hw_sob_put(struct hl_hw_sob *hw_sob);
@ -3786,6 +3845,7 @@ void hl_dec_fini(struct hl_device *hdev);
void hl_dec_ctx_fini(struct hl_ctx *ctx);
void hl_release_pending_user_interrupts(struct hl_device *hdev);
void hl_abort_waitings_for_completion(struct hl_device *hdev);
int hl_cs_signal_sob_wraparound_handler(struct hl_device *hdev, u32 q_idx,
struct hl_hw_sob **hw_sob, u32 count, bool encaps_sig);

drivers/accel/habanalabs/common/habanalabs_drv.c

@ -222,9 +222,11 @@ int hl_device_open(struct inode *inode, struct file *filp)
hl_debugfs_add_file(hpriv);
atomic_set(&hdev->captured_err_info.cs_timeout.write_enable, 1);
atomic_set(&hdev->captured_err_info.razwi_info_recorded, 0);
atomic_set(&hdev->captured_err_info.pgf_info_recorded, 0);
atomic_set(&hdev->captured_err_info.razwi_info.razwi_detected, 0);
atomic_set(&hdev->captured_err_info.page_fault_info.page_fault_detected, 0);
hdev->captured_err_info.undef_opcode.write_enable = true;
hdev->captured_err_info.razwi_info.razwi_info_available = false;
hdev->captured_err_info.page_fault_info.page_fault_info_available = false;
hdev->open_counter++;
hdev->last_successful_open_jif = jiffies;

drivers/accel/habanalabs/common/habanalabs_ioctl.c

@ -7,7 +7,7 @@
#define pr_fmt(fmt) "habanalabs: " fmt
#include <uapi/misc/habanalabs.h>
#include <uapi/drm/habanalabs_accel.h>
#include "habanalabs.h"
#include <linux/fs.h>
@ -607,16 +607,20 @@ static int cs_timeout_info(struct hl_fpriv *hpriv, struct hl_info_args *args)
static int razwi_info(struct hl_fpriv *hpriv, struct hl_info_args *args)
{
void __user *out = (void __user *) (uintptr_t) args->return_pointer;
struct hl_device *hdev = hpriv->hdev;
u32 max_size = args->return_size;
struct hl_info_razwi_event *info = &hdev->captured_err_info.razwi;
void __user *out = (void __user *) (uintptr_t) args->return_pointer;
struct razwi_info *razwi_info;
if ((!max_size) || (!out))
return -EINVAL;
return copy_to_user(out, info, min_t(size_t, max_size, sizeof(struct hl_info_razwi_event)))
? -EFAULT : 0;
razwi_info = &hdev->captured_err_info.razwi_info;
if (!razwi_info->razwi_info_available)
return 0;
return copy_to_user(out, &razwi_info->razwi,
min_t(size_t, max_size, sizeof(struct hl_info_razwi_event))) ? -EFAULT : 0;
}
static int undefined_opcode_info(struct hl_fpriv *hpriv, struct hl_info_args *args)
@ -786,16 +790,20 @@ static int engine_status_info(struct hl_fpriv *hpriv, struct hl_info_args *args)
static int page_fault_info(struct hl_fpriv *hpriv, struct hl_info_args *args)
{
void __user *out = (void __user *) (uintptr_t) args->return_pointer;
struct hl_device *hdev = hpriv->hdev;
u32 max_size = args->return_size;
struct hl_page_fault_info *info = &hdev->captured_err_info.pgf_info.pgf;
void __user *out = (void __user *) (uintptr_t) args->return_pointer;
struct page_fault_info *pgf_info;
if ((!max_size) || (!out))
return -EINVAL;
return copy_to_user(out, info, min_t(size_t, max_size, sizeof(struct hl_page_fault_info)))
? -EFAULT : 0;
pgf_info = &hdev->captured_err_info.page_fault_info;
if (!pgf_info->page_fault_info_available)
return 0;
return copy_to_user(out, &pgf_info->page_fault,
min_t(size_t, max_size, sizeof(struct hl_page_fault_info))) ? -EFAULT : 0;
}
static int user_mappings_info(struct hl_fpriv *hpriv, struct hl_info_args *args)
@ -806,18 +814,68 @@ static int user_mappings_info(struct hl_fpriv *hpriv, struct hl_info_args *args)
struct page_fault_info *pgf_info;
u64 actual_size;
pgf_info = &hdev->captured_err_info.pgf_info;
args->array_size = pgf_info->num_of_user_mappings;
if (!out)
return -EINVAL;
pgf_info = &hdev->captured_err_info.page_fault_info;
if (!pgf_info->page_fault_info_available)
return 0;
args->array_size = pgf_info->num_of_user_mappings;
actual_size = pgf_info->num_of_user_mappings * sizeof(struct hl_user_mapping);
if (user_buf_size < actual_size)
return -ENOMEM;
return copy_to_user(out, pgf_info->user_mappings, min_t(size_t, user_buf_size, actual_size))
? -EFAULT : 0;
return copy_to_user(out, pgf_info->user_mappings, actual_size) ? -EFAULT : 0;
}
static int send_fw_generic_request(struct hl_device *hdev, struct hl_info_args *info_args)
{
void __user *buff = (void __user *) (uintptr_t) info_args->return_pointer;
u32 size = info_args->return_size;
dma_addr_t dma_handle;
bool need_input_buff;
void *fw_buff;
int rc = 0;
switch (info_args->fw_sub_opcode) {
case HL_PASSTHROUGH_VERSIONS:
need_input_buff = false;
break;
default:
return -EINVAL;
}
if (size > SZ_1M) {
dev_err(hdev->dev, "buffer size cannot exceed 1MB\n");
return -EINVAL;
}
fw_buff = hl_cpu_accessible_dma_pool_alloc(hdev, size, &dma_handle);
if (!fw_buff)
return -ENOMEM;
if (need_input_buff && copy_from_user(fw_buff, buff, size)) {
dev_dbg(hdev->dev, "Failed to copy from user FW buff\n");
rc = -EFAULT;
goto free_buff;
}
rc = hl_fw_send_generic_request(hdev, info_args->fw_sub_opcode, dma_handle, &size);
if (rc)
goto free_buff;
if (copy_to_user(buff, fw_buff, min(size, info_args->return_size))) {
dev_dbg(hdev->dev, "Failed to copy to user FW generic req output\n");
rc = -EFAULT;
}
free_buff:
hl_cpu_accessible_dma_pool_free(hdev, info_args->return_size, fw_buff);
return rc;
}
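
For context, here is a minimal user-space sketch of driving the new passthrough opcode through the INFO IOCTL. The opcode, sub-opcode and hl_info_args fields match the handler above; HL_IOCTL_INFO, the installed header path and the helper name are assumptions.

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/habanalabs_accel.h>	/* renamed uapi header; path assumed */

/* Sketch: fetch all FW versions via the generic passthrough request */
static int fetch_fw_versions(int dev_fd, void *buf, uint32_t buf_size)
{
	struct hl_info_args args;

	memset(&args, 0, sizeof(args));	/* pad bytes must be 0 */
	args.op = HL_INFO_FW_GENERIC_REQ;
	args.fw_sub_opcode = HL_PASSTHROUGH_VERSIONS;
	args.return_pointer = (uint64_t)(uintptr_t)buf;
	args.return_size = buf_size;	/* rejected by the driver above 1MB */

	return ioctl(dev_fd, HL_IOCTL_INFO, &args);
}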
static int _hl_info_ioctl(struct hl_fpriv *hpriv, void *data,
@ -826,9 +884,13 @@ static int _hl_info_ioctl(struct hl_fpriv *hpriv, void *data,
enum hl_device_status status;
struct hl_info_args *args = data;
struct hl_device *hdev = hpriv->hdev;
int rc;
if (args->pad) {
dev_dbg(hdev->dev, "Padding bytes must be 0\n");
return -EINVAL;
}
/*
* Information is returned for the following opcodes even if the device
* is disabled or in reset.
@ -893,7 +955,7 @@ static int _hl_info_ioctl(struct hl_fpriv *hpriv, void *data,
}
if (!hl_device_operational(hdev, &status)) {
dev_warn_ratelimited(dev,
dev_dbg_ratelimited(dev,
"Device is %s. Can't execute INFO IOCTL\n",
hdev->status[status]);
return -EBUSY;
@ -947,6 +1009,9 @@ static int _hl_info_ioctl(struct hl_fpriv *hpriv, void *data,
case HL_INFO_ENGINE_STATUS:
return engine_status_info(hpriv, args);
case HL_INFO_FW_GENERIC_REQ:
return send_fw_generic_request(hdev, args);
default:
dev_err(dev, "Invalid request %d\n", args->op);
rc = -EINVAL;
@ -975,7 +1040,7 @@ static int hl_debug_ioctl(struct hl_fpriv *hpriv, void *data)
int rc = 0;
if (!hl_device_operational(hdev, &status)) {
dev_warn_ratelimited(hdev->dev,
dev_dbg_ratelimited(hdev->dev,
"Device is %s. Can't execute DEBUG IOCTL\n",
hdev->status[status]);
return -EBUSY;
@ -1072,8 +1137,6 @@ static long _hl_ioctl(struct file *filep, unsigned int cmd, unsigned long arg,
retcode = -EFAULT;
goto out_err;
}
} else if (cmd & IOC_OUT) {
memset(kdata, 0, usize);
}
retcode = func(hpriv, kdata);


@ -72,15 +72,17 @@ static void irq_handle_eqe(struct work_struct *work)
* @hdev: pointer to device structure
* @cs_seq: command submission sequence
* @cq: completion queue
* @timestamp: interrupt timestamp
*
*/
static void job_finish(struct hl_device *hdev, u32 cs_seq, struct hl_cq *cq)
static void job_finish(struct hl_device *hdev, u32 cs_seq, struct hl_cq *cq, ktime_t timestamp)
{
struct hl_hw_queue *queue;
struct hl_cs_job *job;
queue = &hdev->kernel_queues[cq->hw_queue_id];
job = queue->shadow_queue[hl_pi_2_offset(cs_seq)];
job->timestamp = timestamp;
queue_work(hdev->cq_wq[cq->cq_idx], &job->finish_work);
atomic_inc(&queue->ci);
@ -91,9 +93,10 @@ static void job_finish(struct hl_device *hdev, u32 cs_seq, struct hl_cq *cq)
*
* @hdev: pointer to device structure
* @cs_seq: command submission sequence
* @timestamp: interrupt timestamp
*
*/
static void cs_finish(struct hl_device *hdev, u16 cs_seq)
static void cs_finish(struct hl_device *hdev, u16 cs_seq, ktime_t timestamp)
{
struct asic_fixed_properties *prop = &hdev->asic_prop;
struct hl_hw_queue *queue;
@ -113,6 +116,7 @@ static void cs_finish(struct hl_device *hdev, u16 cs_seq)
atomic_inc(&queue->ci);
}
cs->completion_timestamp = timestamp;
queue_work(hdev->cs_cmplt_wq, &cs->finish_work);
}
@ -130,6 +134,7 @@ irqreturn_t hl_irq_handler_cq(int irq, void *arg)
bool shadow_index_valid, entry_ready;
u16 shadow_index;
struct hl_cq_entry *cq_entry, *cq_base;
ktime_t timestamp = ktime_get();
if (hdev->disabled) {
dev_dbg(hdev->dev,
@ -171,9 +176,9 @@ irqreturn_t hl_irq_handler_cq(int irq, void *arg)
if (shadow_index_valid && !hdev->disabled) {
if (hdev->asic_prop.completion_mode ==
HL_COMPLETION_MODE_CS)
cs_finish(hdev, shadow_index);
cs_finish(hdev, shadow_index, timestamp);
else
job_finish(hdev, shadow_index, cq);
job_finish(hdev, shadow_index, cq, timestamp);
}
/* Clear CQ entry ready bit */
@ -228,7 +233,7 @@ static void hl_ts_free_objects(struct work_struct *work)
* list to a dedicated workqueue to do the actual put.
*/
static int handle_registration_node(struct hl_device *hdev, struct hl_user_pending_interrupt *pend,
struct list_head **free_list)
struct list_head **free_list, ktime_t now)
{
struct timestamp_reg_free_node *free_node;
u64 timestamp;
@ -246,7 +251,7 @@ static int handle_registration_node(struct hl_device *hdev, struct hl_user_pendi
if (!free_node)
return -ENOMEM;
timestamp = ktime_get_ns();
timestamp = ktime_to_ns(now);
*pend->ts_reg_info.timestamp_kernel_addr = timestamp;
@ -298,7 +303,7 @@ static void handle_user_interrupt(struct hl_device *hdev, struct hl_user_interru
if (pend->ts_reg_info.buf) {
if (!reg_node_handle_fail) {
rc = handle_registration_node(hdev, pend,
&ts_reg_free_list_head);
&ts_reg_free_list_head, now);
if (rc)
reg_node_handle_fail = true;
}
@ -333,13 +338,22 @@ irqreturn_t hl_irq_handler_user_interrupt(int irq, void *arg)
struct hl_user_interrupt *user_int = arg;
struct hl_device *hdev = user_int->hdev;
if (user_int->is_decoder)
handle_user_interrupt(hdev, &hdev->common_decoder_interrupt);
else
switch (user_int->type) {
case HL_USR_INTERRUPT_CQ:
handle_user_interrupt(hdev, &hdev->common_user_cq_interrupt);
/* Handle user cq or decoder interrupts registered on this specific irq */
handle_user_interrupt(hdev, user_int);
/* Handle user cq interrupt registered on this specific irq */
handle_user_interrupt(hdev, user_int);
break;
case HL_USR_INTERRUPT_DECODER:
handle_user_interrupt(hdev, &hdev->common_decoder_interrupt);
/* Handle decoder interrupt registered on this specific irq */
handle_user_interrupt(hdev, user_int);
break;
default:
break;
}
return IRQ_HANDLED;
}


@ -5,7 +5,7 @@
* All Rights Reserved.
*/
#include <uapi/misc/habanalabs.h>
#include <uapi/drm/habanalabs_accel.h>
#include "habanalabs.h"
#include "../include/hw_ip/mmu/mmu_general.h"
@ -19,7 +19,9 @@ MODULE_IMPORT_NS(DMA_BUF);
#define HL_MMU_DEBUG 0
/* use small pages for supporting non-pow2 (32M/40M/48M) DRAM phys page sizes */
#define DRAM_POOL_PAGE_SIZE SZ_8M
#define DRAM_POOL_PAGE_SIZE SZ_8M
#define MEM_HANDLE_INVALID ULONG_MAX
static int allocate_timestamps_buffers(struct hl_fpriv *hpriv,
struct hl_mem_in *args, u64 *handle);
@ -371,12 +373,6 @@ static int free_device_memory(struct hl_ctx *ctx, struct hl_mem_in *args)
return -EINVAL;
}
if (phys_pg_pack->exporting_cnt) {
spin_unlock(&vm->idr_lock);
dev_dbg(hdev->dev, "handle %u is exported, cannot free\n", handle);
return -EINVAL;
}
/* must remove from idr before the freeing of the physical pages as the refcount of the pool
* is also the trigger of the idr destroy
*/
@ -1240,6 +1236,7 @@ static int map_device_va(struct hl_ctx *ctx, struct hl_mem_in *args, u64 *device
hnode->ptr = vm_type;
hnode->vaddr = ret_vaddr;
hnode->handle = is_userptr ? MEM_HANDLE_INVALID : handle;
mutex_lock(&ctx->mem_hash_lock);
hash_add(ctx->mem_hash, &hnode->node, ret_vaddr);
@ -1313,6 +1310,12 @@ static int unmap_device_va(struct hl_ctx *ctx, struct hl_mem_in *args,
return -EINVAL;
}
if (hnode->export_cnt) {
mutex_unlock(&ctx->mem_hash_lock);
dev_err(hdev->dev, "failed to unmap %#llx, memory is exported\n", vaddr);
return -EINVAL;
}
hash_del(&hnode->node);
mutex_unlock(&ctx->mem_hash_lock);
@ -1545,10 +1548,10 @@ static int set_dma_sg(struct scatterlist *sg, u64 bar_address, u64 chunk_size,
}
static struct sg_table *alloc_sgt_from_device_pages(struct hl_device *hdev, u64 *pages, u64 npages,
u64 page_size, struct device *dev,
enum dma_data_direction dir)
u64 page_size, u64 exported_size,
struct device *dev, enum dma_data_direction dir)
{
u64 chunk_size, bar_address, dma_max_seg_size;
u64 chunk_size, bar_address, dma_max_seg_size, cur_size_to_export, cur_npages;
struct asic_fixed_properties *prop;
int rc, i, j, nents, cur_page;
struct scatterlist *sg;
@ -1574,16 +1577,23 @@ static struct sg_table *alloc_sgt_from_device_pages(struct hl_device *hdev, u64
if (!sgt)
return ERR_PTR(-ENOMEM);
/* remove export size restrictions in case not explicitly defined */
cur_size_to_export = exported_size ? exported_size : (npages * page_size);
/* If the size of each page is larger than the dma max segment size,
* then we can't combine pages and the number of entries in the SGL
* will just be the
* <number of pages> * <chunks of max segment size in each page>
*/
if (page_size > dma_max_seg_size)
nents = npages * DIV_ROUND_UP_ULL(page_size, dma_max_seg_size);
else
if (page_size > dma_max_seg_size) {
/* we should limit number of pages according to the exported size */
cur_npages = DIV_ROUND_UP_SECTOR_T(cur_size_to_export, page_size);
nents = cur_npages * DIV_ROUND_UP_SECTOR_T(page_size, dma_max_seg_size);
} else {
cur_npages = npages;
/* Get number of non-contiguous chunks */
for (i = 1, nents = 1, chunk_size = page_size ; i < npages ; i++) {
for (i = 1, nents = 1, chunk_size = page_size ; i < cur_npages ; i++) {
if (pages[i - 1] + page_size != pages[i] ||
chunk_size + page_size > dma_max_seg_size) {
nents++;
@ -1593,6 +1603,7 @@ static struct sg_table *alloc_sgt_from_device_pages(struct hl_device *hdev, u64
chunk_size += page_size;
}
}
rc = sg_alloc_table(sgt, nents, GFP_KERNEL | __GFP_ZERO);
if (rc)
@ -1615,7 +1626,8 @@ static struct sg_table *alloc_sgt_from_device_pages(struct hl_device *hdev, u64
else
cur_device_address += dma_max_seg_size;
chunk_size = min(size_left, dma_max_seg_size);
/* make sure not to export over exported size */
chunk_size = min3(size_left, dma_max_seg_size, cur_size_to_export);
bar_address = hdev->dram_pci_bar_start + cur_device_address;
@ -1623,6 +1635,8 @@ static struct sg_table *alloc_sgt_from_device_pages(struct hl_device *hdev, u64
if (rc)
goto error_unmap;
cur_size_to_export -= chunk_size;
if (size_left > dma_max_seg_size) {
size_left -= dma_max_seg_size;
} else {
@ -1634,7 +1648,7 @@ static struct sg_table *alloc_sgt_from_device_pages(struct hl_device *hdev, u64
/* Merge pages and put them into the scatterlist */
for_each_sgtable_dma_sg(sgt, sg, i) {
chunk_size = page_size;
for (j = cur_page + 1 ; j < npages ; j++) {
for (j = cur_page + 1 ; j < cur_npages ; j++) {
if (pages[j - 1] + page_size != pages[j] ||
chunk_size + page_size > dma_max_seg_size)
break;
@ -1645,10 +1659,13 @@ static struct sg_table *alloc_sgt_from_device_pages(struct hl_device *hdev, u64
bar_address = hdev->dram_pci_bar_start +
(pages[cur_page] - prop->dram_base_address);
/* make sure not to export over exported size */
chunk_size = min(chunk_size, cur_size_to_export);
rc = set_dma_sg(sg, bar_address, chunk_size, dev, dir);
if (rc)
goto error_unmap;
cur_size_to_export -= chunk_size;
cur_page = j;
}
}
@ -1719,6 +1736,7 @@ static struct sg_table *hl_map_dmabuf(struct dma_buf_attachment *attachment,
phys_pg_pack->pages,
phys_pg_pack->npages,
phys_pg_pack->page_size,
phys_pg_pack->exported_size,
attachment->dev,
dir);
else
@ -1726,6 +1744,7 @@ static struct sg_table *hl_map_dmabuf(struct dma_buf_attachment *attachment,
&hl_dmabuf->device_address,
1,
hl_dmabuf->dmabuf->size,
0,
attachment->dev,
dir);
@ -1763,18 +1782,20 @@ static void hl_unmap_dmabuf(struct dma_buf_attachment *attachment,
static void hl_release_dmabuf(struct dma_buf *dmabuf)
{
struct hl_dmabuf_priv *hl_dmabuf = dmabuf->priv;
struct hl_ctx *ctx = hl_dmabuf->ctx;
struct hl_device *hdev = ctx->hdev;
struct hl_vm *vm = &hdev->vm;
struct hl_ctx *ctx;
if (hl_dmabuf->phys_pg_pack) {
spin_lock(&vm->idr_lock);
hl_dmabuf->phys_pg_pack->exporting_cnt--;
spin_unlock(&vm->idr_lock);
if (!hl_dmabuf)
return;
ctx = hl_dmabuf->ctx;
if (hl_dmabuf->memhash_hnode) {
mutex_lock(&ctx->mem_hash_lock);
hl_dmabuf->memhash_hnode->export_cnt--;
mutex_unlock(&ctx->mem_hash_lock);
}
hl_ctx_put(hl_dmabuf->ctx);
hl_ctx_put(ctx);
kfree(hl_dmabuf);
}
@ -1785,7 +1806,7 @@ static const struct dma_buf_ops habanalabs_dmabuf_ops = {
.release = hl_release_dmabuf,
};
static int export_dmabuf_common(struct hl_ctx *ctx,
static int export_dmabuf(struct hl_ctx *ctx,
struct hl_dmabuf_priv *hl_dmabuf,
u64 total_size, int flags, int *dmabuf_fd)
{
@ -1806,7 +1827,7 @@ static int export_dmabuf_common(struct hl_ctx *ctx,
fd = dma_buf_fd(hl_dmabuf->dmabuf, flags);
if (fd < 0) {
dev_err(hdev->dev, "failed to get a file descriptor for a dma-buf\n");
dev_err(hdev->dev, "failed to get a file descriptor for a dma-buf, %d\n", fd);
rc = fd;
goto err_dma_buf_put;
}
@ -1819,36 +1840,13 @@ static int export_dmabuf_common(struct hl_ctx *ctx,
return 0;
err_dma_buf_put:
hl_dmabuf->dmabuf->priv = NULL;
dma_buf_put(hl_dmabuf->dmabuf);
return rc;
}
/**
* export_dmabuf_from_addr() - export a dma-buf object for the given memory
* address and size.
* @ctx: pointer to the context structure.
* @device_addr: device memory physical address.
* @size: size of device memory.
* @flags: DMA-BUF file/FD flags.
* @dmabuf_fd: pointer to result FD that represents the dma-buf object.
*
* Create and export a dma-buf object for an existing memory allocation inside
* the device memory, and return a FD which is associated with the dma-buf
* object.
*
* Return: 0 on success, non-zero for failure.
*/
static int export_dmabuf_from_addr(struct hl_ctx *ctx, u64 device_addr,
u64 size, int flags, int *dmabuf_fd)
static int validate_export_params_common(struct hl_device *hdev, u64 device_addr, u64 size)
{
struct hl_dmabuf_priv *hl_dmabuf;
struct hl_device *hdev = ctx->hdev;
struct asic_fixed_properties *prop;
u64 bar_address;
int rc;
prop = &hdev->asic_prop;
if (!IS_ALIGNED(device_addr, PAGE_SIZE)) {
dev_dbg(hdev->dev,
"exported device memory address 0x%llx should be aligned to 0x%lx\n",
@ -1863,49 +1861,150 @@ static int export_dmabuf_from_addr(struct hl_ctx *ctx, u64 device_addr,
return -EINVAL;
}
return 0;
}
static int validate_export_params_no_mmu(struct hl_device *hdev, u64 device_addr, u64 size)
{
struct asic_fixed_properties *prop = &hdev->asic_prop;
u64 bar_address;
int rc;
rc = validate_export_params_common(hdev, device_addr, size);
if (rc)
return rc;
if (device_addr < prop->dram_user_base_address ||
device_addr + size > prop->dram_end_address ||
device_addr + size < device_addr) {
(device_addr + size) > prop->dram_end_address ||
(device_addr + size) < device_addr) {
dev_dbg(hdev->dev,
"DRAM memory range 0x%llx (+0x%llx) is outside of DRAM boundaries\n",
device_addr, size);
return -EINVAL;
}
bar_address = hdev->dram_pci_bar_start +
(device_addr - prop->dram_base_address);
bar_address = hdev->dram_pci_bar_start + (device_addr - prop->dram_base_address);
if (bar_address + size >
hdev->dram_pci_bar_start + prop->dram_pci_bar_size ||
bar_address + size < bar_address) {
if ((bar_address + size) > (hdev->dram_pci_bar_start + prop->dram_pci_bar_size) ||
(bar_address + size) < bar_address) {
dev_dbg(hdev->dev,
"DRAM memory range 0x%llx (+0x%llx) is outside of PCI BAR boundaries\n",
device_addr, size);
return -EINVAL;
}
hl_dmabuf = kzalloc(sizeof(*hl_dmabuf), GFP_KERNEL);
if (!hl_dmabuf)
return -ENOMEM;
return 0;
}
hl_dmabuf->device_address = device_addr;
static int validate_export_params(struct hl_device *hdev, u64 device_addr, u64 size, u64 offset,
struct hl_vm_phys_pg_pack *phys_pg_pack)
{
struct asic_fixed_properties *prop = &hdev->asic_prop;
u64 bar_address;
int i, rc;
rc = export_dmabuf_common(ctx, hl_dmabuf, size, flags, dmabuf_fd);
rc = validate_export_params_common(hdev, device_addr, size);
if (rc)
goto err_free_dmabuf_wrapper;
return rc;
if ((offset + size) > phys_pg_pack->total_size) {
dev_dbg(hdev->dev, "offset %#llx and size %#llx exceed total map size %#llx\n",
offset, size, phys_pg_pack->total_size);
return -EINVAL;
}
for (i = 0 ; i < phys_pg_pack->npages ; i++) {
bar_address = hdev->dram_pci_bar_start +
(phys_pg_pack->pages[i] - prop->dram_base_address);
if ((bar_address + phys_pg_pack->page_size) >
(hdev->dram_pci_bar_start + prop->dram_pci_bar_size) ||
(bar_address + phys_pg_pack->page_size) < bar_address) {
dev_dbg(hdev->dev,
"DRAM memory range 0x%llx (+0x%x) is outside of PCI BAR boundaries\n",
phys_pg_pack->pages[i],
phys_pg_pack->page_size);
return -EINVAL;
}
}
return 0;
}
err_free_dmabuf_wrapper:
kfree(hl_dmabuf);
return rc;
static struct hl_vm_hash_node *memhash_node_export_get(struct hl_ctx *ctx, u64 addr)
{
struct hl_device *hdev = ctx->hdev;
struct hl_vm_hash_node *hnode;
/* get the memory handle */
mutex_lock(&ctx->mem_hash_lock);
hash_for_each_possible(ctx->mem_hash, hnode, node, (unsigned long)addr)
if (addr == hnode->vaddr)
break;
if (!hnode) {
mutex_unlock(&ctx->mem_hash_lock);
dev_dbg(hdev->dev, "map address %#llx not found\n", addr);
return ERR_PTR(-EINVAL);
}
if (upper_32_bits(hnode->handle)) {
mutex_unlock(&ctx->mem_hash_lock);
dev_dbg(hdev->dev, "invalid handle %#llx for map address %#llx\n",
hnode->handle, addr);
return ERR_PTR(-EINVAL);
}
/*
* node found, increase export count so this memory cannot be unmapped
* and the hash node cannot be deleted.
*/
hnode->export_cnt++;
mutex_unlock(&ctx->mem_hash_lock);
return hnode;
}
static void memhash_node_export_put(struct hl_ctx *ctx, struct hl_vm_hash_node *hnode)
{
mutex_lock(&ctx->mem_hash_lock);
hnode->export_cnt--;
mutex_unlock(&ctx->mem_hash_lock);
}
static struct hl_vm_phys_pg_pack *get_phys_pg_pack_from_hash_node(struct hl_device *hdev,
struct hl_vm_hash_node *hnode)
{
struct hl_vm_phys_pg_pack *phys_pg_pack;
struct hl_vm *vm = &hdev->vm;
spin_lock(&vm->idr_lock);
phys_pg_pack = idr_find(&vm->phys_pg_pack_handles, (u32) hnode->handle);
if (!phys_pg_pack) {
spin_unlock(&vm->idr_lock);
dev_dbg(hdev->dev, "no match for handle 0x%x\n", (u32) hnode->handle);
return ERR_PTR(-EINVAL);
}
spin_unlock(&vm->idr_lock);
if (phys_pg_pack->vm_type != VM_TYPE_PHYS_PACK) {
dev_dbg(hdev->dev, "handle 0x%llx does not represent DRAM memory\n", hnode->handle);
return ERR_PTR(-EINVAL);
}
return phys_pg_pack;
}
/**
* export_dmabuf_from_handle() - export a dma-buf object for the given memory
* handle.
* export_dmabuf_from_addr() - export a dma-buf object for the given memory
* address and size.
* @ctx: pointer to the context structure.
* @handle: device memory allocation handle.
* @addr: device address.
* @size: size of device memory to export.
* @offset: the offset into the buffer from which to start exporting.
* @flags: DMA-BUF file/FD flags.
* @dmabuf_fd: pointer to result FD that represents the dma-buf object.
*
@ -1915,87 +2014,69 @@ static int export_dmabuf_from_addr(struct hl_ctx *ctx, u64 device_addr,
*
* Return: 0 on success, non-zero for failure.
*/
static int export_dmabuf_from_handle(struct hl_ctx *ctx, u64 handle, int flags,
int *dmabuf_fd)
static int export_dmabuf_from_addr(struct hl_ctx *ctx, u64 addr, u64 size, u64 offset,
int flags, int *dmabuf_fd)
{
struct hl_vm_phys_pg_pack *phys_pg_pack;
struct hl_dmabuf_priv *hl_dmabuf;
struct hl_device *hdev = ctx->hdev;
struct hl_vm_phys_pg_pack *phys_pg_pack = NULL;
struct hl_vm_hash_node *hnode = NULL;
struct asic_fixed_properties *prop;
struct hl_vm *vm = &hdev->vm;
u64 bar_address;
int rc, i;
struct hl_dmabuf_priv *hl_dmabuf;
struct hl_device *hdev;
u64 export_addr;
int rc;
hdev = ctx->hdev;
prop = &hdev->asic_prop;
if (upper_32_bits(handle)) {
dev_dbg(hdev->dev, "no match for handle 0x%llx\n", handle);
/* offset must be 0 in devices without virtual memory support */
if (!prop->dram_supports_virtual_memory && offset) {
dev_dbg(hdev->dev, "offset is not allowed in device without virtual memory\n");
return -EINVAL;
}
spin_lock(&vm->idr_lock);
phys_pg_pack = idr_find(&vm->phys_pg_pack_handles, (u32) handle);
if (!phys_pg_pack) {
spin_unlock(&vm->idr_lock);
dev_dbg(hdev->dev, "no match for handle 0x%x\n", (u32) handle);
return -EINVAL;
}
/* increment now to avoid freeing device memory while exporting */
phys_pg_pack->exporting_cnt++;
spin_unlock(&vm->idr_lock);
if (phys_pg_pack->vm_type != VM_TYPE_PHYS_PACK) {
dev_dbg(hdev->dev, "handle 0x%llx does not represent DRAM memory\n", handle);
rc = -EINVAL;
goto err_dec_exporting_cnt;
}
for (i = 0 ; i < phys_pg_pack->npages ; i++) {
bar_address = hdev->dram_pci_bar_start +
(phys_pg_pack->pages[i] -
prop->dram_base_address);
if (bar_address + phys_pg_pack->page_size >
hdev->dram_pci_bar_start + prop->dram_pci_bar_size ||
bar_address + phys_pg_pack->page_size < bar_address) {
dev_dbg(hdev->dev,
"DRAM memory range 0x%llx (+0x%x) is outside of PCI BAR boundaries\n",
phys_pg_pack->pages[i],
phys_pg_pack->page_size);
rc = -EINVAL;
goto err_dec_exporting_cnt;
}
}
export_addr = addr + offset;
hl_dmabuf = kzalloc(sizeof(*hl_dmabuf), GFP_KERNEL);
if (!hl_dmabuf) {
rc = -ENOMEM;
goto err_dec_exporting_cnt;
if (!hl_dmabuf)
return -ENOMEM;
if (prop->dram_supports_virtual_memory) {
hnode = memhash_node_export_get(ctx, addr);
if (IS_ERR(hnode)) {
rc = PTR_ERR(hnode);
goto err_free_dmabuf_wrapper;
}
phys_pg_pack = get_phys_pg_pack_from_hash_node(hdev, hnode);
if (IS_ERR(phys_pg_pack)) {
rc = PTR_ERR(phys_pg_pack);
goto dec_memhash_export_cnt;
}
rc = validate_export_params(hdev, export_addr, size, offset, phys_pg_pack);
if (rc)
goto dec_memhash_export_cnt;
phys_pg_pack->exported_size = size;
hl_dmabuf->phys_pg_pack = phys_pg_pack;
hl_dmabuf->memhash_hnode = hnode;
} else {
rc = validate_export_params_no_mmu(hdev, export_addr, size);
if (rc)
goto err_free_dmabuf_wrapper;
}
hl_dmabuf->phys_pg_pack = phys_pg_pack;
hl_dmabuf->device_address = export_addr;
rc = export_dmabuf_common(ctx, hl_dmabuf, phys_pg_pack->total_size,
flags, dmabuf_fd);
rc = export_dmabuf(ctx, hl_dmabuf, size, flags, dmabuf_fd);
if (rc)
goto err_free_dmabuf_wrapper;
goto dec_memhash_export_cnt;
return 0;
dec_memhash_export_cnt:
if (prop->dram_supports_virtual_memory)
memhash_node_export_put(ctx, hnode);
err_free_dmabuf_wrapper:
kfree(hl_dmabuf);
err_dec_exporting_cnt:
spin_lock(&vm->idr_lock);
phys_pg_pack->exporting_cnt--;
spin_unlock(&vm->idr_lock);
return rc;
}
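
To make the reworked export flow concrete, a hypothetical user-space sketch of exporting a slice of the HBM by address plus offset. The export_dmabuf_fd field names mirror the MEMORY IOCTL dispatch further below; HL_IOCTL_MEMORY, the flag choice and the helper name are assumptions.

#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/habanalabs_accel.h>	/* renamed uapi header; path assumed */

/* Sketch: export 'size' bytes at 'offset' into the mapping at 'dev_addr' */
static int export_hbm_chunk(int dev_fd, uint64_t dev_addr, uint64_t size,
			    uint64_t offset)
{
	union hl_mem_args mem;

	memset(&mem, 0, sizeof(mem));
	mem.in.op = HL_MEM_OP_EXPORT_DMABUF_FD;
	mem.in.flags = O_RDWR | O_CLOEXEC;	/* dma-buf FD flags */
	mem.in.export_dmabuf_fd.addr = dev_addr;
	mem.in.export_dmabuf_fd.mem_size = size;
	mem.in.export_dmabuf_fd.offset = offset;	/* must be 0 without MMU */

	if (ioctl(dev_fd, HL_IOCTL_MEMORY, &mem))
		return -1;

	return mem.out.fd;	/* FD representing the dma-buf object */
}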
@ -2089,12 +2170,13 @@ static int hl_ts_mmap(struct hl_mmap_mem_buf *buf, struct vm_area_struct *vma, v
static int hl_ts_alloc_buf(struct hl_mmap_mem_buf *buf, gfp_t gfp, void *args)
{
struct hl_ts_buff *ts_buff = NULL;
u32 size, num_elements;
u32 num_elements;
size_t size;
void *p;
num_elements = *(u32 *)args;
ts_buff = kzalloc(sizeof(*ts_buff), GFP_KERNEL);
ts_buff = kzalloc(sizeof(*ts_buff), gfp);
if (!ts_buff)
return -ENOMEM;
@ -2180,7 +2262,7 @@ int hl_mem_ioctl(struct hl_fpriv *hpriv, void *data)
int rc, dmabuf_fd = -EBADF;
if (!hl_device_operational(hdev, &status)) {
dev_warn_ratelimited(hdev->dev,
dev_dbg_ratelimited(hdev->dev,
"Device is %s. Can't execute MEMORY IOCTL\n",
hdev->status[status]);
return -EBUSY;
@ -2269,17 +2351,12 @@ int hl_mem_ioctl(struct hl_fpriv *hpriv, void *data)
break;
case HL_MEM_OP_EXPORT_DMABUF_FD:
if (hdev->asic_prop.dram_supports_virtual_memory)
rc = export_dmabuf_from_handle(ctx,
args->in.export_dmabuf_fd.handle,
args->in.flags,
&dmabuf_fd);
else
rc = export_dmabuf_from_addr(ctx,
args->in.export_dmabuf_fd.handle,
args->in.export_dmabuf_fd.mem_size,
args->in.flags,
&dmabuf_fd);
rc = export_dmabuf_from_addr(ctx,
args->in.export_dmabuf_fd.addr,
args->in.export_dmabuf_fd.mem_size,
args->in.export_dmabuf_fd.offset,
args->in.flags,
&dmabuf_fd);
memset(args, 0, sizeof(*args));
args->out.fd = dmabuf_fd;
break;


@ -25,8 +25,7 @@ struct hl_mmap_mem_buf *hl_mmap_mem_buf_get(struct hl_mem_mgr *mmg, u64 handle)
buf = idr_find(&mmg->handles, lower_32_bits(handle >> PAGE_SHIFT));
if (!buf) {
spin_unlock(&mmg->lock);
dev_warn(mmg->dev,
"Buff get failed, no match to handle %#llx\n", handle);
dev_dbg(mmg->dev, "Buff get failed, no match to handle %#llx\n", handle);
return NULL;
}
kref_get(&buf->refcount);


@ -781,7 +781,7 @@ static void mmu_dma_mem_free_from_chunk(struct gen_pool *pool,
struct gen_pool_chunk *chunk,
void *data)
{
struct hl_device *hdev = (struct hl_device *)data;
struct hl_device *hdev = data;
hl_asic_dma_free_coherent(hdev, (chunk->end_addr - chunk->start_addr) + 1,
(void *)chunk->start_addr, chunk->phys_addr);


@ -344,7 +344,6 @@ static void dram_default_mapping_fini(struct hl_ctx *ctx)
}
}
hop2_pte_addr = hop2_addr;
hop2_pte_addr = hop2_addr;
for (i = 0 ; i < num_of_hop3 ; i++) {
clear_pte(ctx, hop2_pte_addr);


@ -10,6 +10,8 @@
#include <linux/pci.h>
#include <trace/events/habanalabs.h>
#define HL_PLDM_PCI_ELBI_TIMEOUT_MSEC (HL_PCI_ELBI_TIMEOUT_MSEC * 100)
#define IATU_REGION_CTRL_REGION_EN_MASK BIT(31)
@ -120,6 +122,9 @@ int hl_pci_elbi_read(struct hl_device *hdev, u64 addr, u32 *data)
if ((val & PCI_CONFIG_ELBI_STS_MASK) == PCI_CONFIG_ELBI_STS_DONE) {
pci_read_config_dword(pdev, mmPCI_CONFIG_ELBI_DATA, data);
if (unlikely(trace_habanalabs_elbi_read_enabled()))
trace_habanalabs_elbi_read(hdev->dev, (u32) addr, val);
return 0;
}
@ -179,8 +184,11 @@ static int hl_pci_elbi_write(struct hl_device *hdev, u64 addr, u32 data)
usleep_range(300, 500);
}
if ((val & PCI_CONFIG_ELBI_STS_MASK) == PCI_CONFIG_ELBI_STS_DONE)
if ((val & PCI_CONFIG_ELBI_STS_MASK) == PCI_CONFIG_ELBI_STS_DONE) {
if (unlikely(trace_habanalabs_elbi_write_enabled()))
trace_habanalabs_elbi_write(hdev->dev, (u32) addr, val);
return 0;
}
if (val & PCI_CONFIG_ELBI_STS_ERR)
return -EIO;


@ -7,6 +7,19 @@
#include "habanalabs.h"
static const char * const hl_glbl_error_cause[HL_MAX_NUM_OF_GLBL_ERR_CAUSE] = {
"Error due to un-priv read",
"Error due to un-secure read",
"Error due to read from unmapped reg",
"Error due to un-priv write",
"Error due to un-secure write",
"Error due to write to unmapped reg",
"External I/F write sec violation",
"External I/F write to un-mapped reg",
"Read to write only",
"Write to read only"
};
/**
* hl_get_pb_block - return the relevant block within the block array
*
@ -598,3 +611,164 @@ void hl_ack_pb_single_dcore(struct hl_device *hdev, u32 dcore_offset,
blocks_array_size);
}
static u32 hl_automated_get_block_base_addr(struct hl_device *hdev,
struct hl_special_block_info *block_info,
u32 major, u32 minor, u32 sub_minor)
{
u32 fw_block_base_address = block_info->base_addr +
major * block_info->major_offset +
minor * block_info->minor_offset +
sub_minor * block_info->sub_minor_offset;
struct asic_fixed_properties *prop = &hdev->asic_prop;
/* The calculation above returns an address for FW use; convert it to a
* driver-view address by subtracting the config base.
*/
return (fw_block_base_address - lower_32_bits(prop->cfg_base_address));
}
static bool hl_check_block_type_exclusion(struct hl_skip_blocks_cfg *skip_blocks_cfg,
int block_type)
{
int i;
/* Check if block type is listed in the exclusion list of block types */
for (i = 0 ; i < skip_blocks_cfg->block_types_len ; i++)
if (block_type == skip_blocks_cfg->block_types[i])
return true;
return false;
}
static bool hl_check_block_range_exclusion(struct hl_device *hdev,
struct hl_skip_blocks_cfg *skip_blocks_cfg,
struct hl_special_block_info *block_info,
u32 major, u32 minor, u32 sub_minor)
{
u32 blocks_in_range, block_base_addr_in_range, block_base_addr;
int i, j;
block_base_addr = hl_automated_get_block_base_addr(hdev, block_info,
major, minor, sub_minor);
for (i = 0 ; i < skip_blocks_cfg->block_ranges_len ; i++) {
blocks_in_range = (skip_blocks_cfg->block_ranges[i].end -
skip_blocks_cfg->block_ranges[i].start) /
HL_BLOCK_SIZE + 1;
for (j = 0 ; j < blocks_in_range ; j++) {
block_base_addr_in_range = skip_blocks_cfg->block_ranges[i].start +
j * HL_BLOCK_SIZE;
if (block_base_addr == block_base_addr_in_range)
return true;
}
}
return false;
}
static int hl_read_glbl_errors(struct hl_device *hdev,
u32 blk_idx, u32 major, u32 minor, u32 sub_minor, void *data)
{
struct hl_special_block_info *special_blocks = hdev->asic_prop.special_blocks;
struct hl_special_block_info *current_block = &special_blocks[blk_idx];
u32 glbl_err_addr, glbl_err_cause, addr_val, cause_val, block_base,
base = current_block->base_addr - lower_32_bits(hdev->asic_prop.cfg_base_address);
int i;
block_base = base + major * current_block->major_offset +
minor * current_block->minor_offset +
sub_minor * current_block->sub_minor_offset;
glbl_err_cause = block_base + HL_GLBL_ERR_CAUSE_OFFSET;
cause_val = RREG32(glbl_err_cause);
if (!cause_val)
return 0;
glbl_err_addr = block_base + HL_GLBL_ERR_ADDR_OFFSET;
addr_val = RREG32(glbl_err_addr);
for (i = 0 ; i < hdev->asic_prop.glbl_err_cause_num ; i++) {
if (cause_val & BIT(i))
dev_err_ratelimited(hdev->dev,
"%s, addr %#llx\n",
hl_glbl_error_cause[i],
hdev->asic_prop.cfg_base_address + block_base +
FIELD_GET(HL_GLBL_ERR_ADDRESS_MASK, addr_val));
}
WREG32(glbl_err_cause, cause_val);
return 0;
}
void hl_check_for_glbl_errors(struct hl_device *hdev)
{
struct asic_fixed_properties *prop = &hdev->asic_prop;
struct hl_special_blocks_cfg special_blocks_cfg;
struct iterate_special_ctx glbl_err_iter;
int rc;
memset(&special_blocks_cfg, 0, sizeof(special_blocks_cfg));
special_blocks_cfg.skip_blocks_cfg = &prop->skip_special_blocks_cfg;
glbl_err_iter.fn = &hl_read_glbl_errors;
glbl_err_iter.data = &special_blocks_cfg;
rc = hl_iterate_special_blocks(hdev, &glbl_err_iter);
if (rc)
dev_err_ratelimited(hdev->dev,
"Could not iterate special blocks, glbl error check failed\n");
}
int hl_iterate_special_blocks(struct hl_device *hdev, struct iterate_special_ctx *ctx)
{
struct hl_special_blocks_cfg *special_blocks_cfg =
(struct hl_special_blocks_cfg *)ctx->data;
struct hl_skip_blocks_cfg *skip_blocks_cfg =
special_blocks_cfg->skip_blocks_cfg;
u32 major, minor, sub_minor, blk_idx, num_blocks;
struct hl_special_block_info *block_info_arr;
int rc;
block_info_arr = hdev->asic_prop.special_blocks;
if (!block_info_arr)
return -EINVAL;
num_blocks = hdev->asic_prop.num_of_special_blocks;
for (blk_idx = 0 ; blk_idx < num_blocks ; blk_idx++, block_info_arr++) {
if (hl_check_block_type_exclusion(skip_blocks_cfg, block_info_arr->block_type))
continue;
for (major = 0 ; major < block_info_arr->major ; major++) {
minor = 0;
do {
sub_minor = 0;
do {
if ((hl_check_block_range_exclusion(hdev,
skip_blocks_cfg, block_info_arr,
major, minor, sub_minor)) ||
(skip_blocks_cfg->skip_block_hook &&
skip_blocks_cfg->skip_block_hook(hdev,
special_blocks_cfg,
blk_idx, major, minor, sub_minor))) {
sub_minor++;
continue;
}
rc = ctx->fn(hdev, blk_idx, major, minor,
sub_minor, ctx->data);
if (rc)
return rc;
sub_minor++;
} while (sub_minor < block_info_arr->sub_minor);
minor++;
} while (minor < block_info_arr->minor);
}
}
return 0;
}


@ -0,0 +1,163 @@
/* SPDX-License-Identifier: GPL-2.0
*
* Copyright 2016-2022 HabanaLabs, Ltd.
* All Rights Reserved.
*
*/
#ifndef SECURITY_H_
#define SECURITY_H_
#include <linux/io-64-nonatomic-lo-hi.h>
extern struct hl_device *hdev;
/* special blocks */
#define HL_MAX_NUM_OF_GLBL_ERR_CAUSE 10
#define HL_GLBL_ERR_ADDRESS_MASK GENMASK(11, 0)
/* GLBL_ERR_ADDR register offset from the start of the block */
#define HL_GLBL_ERR_ADDR_OFFSET 0xF44
/* GLBL_ERR_CAUSE register offset from the start of the block */
#define HL_GLBL_ERR_CAUSE_OFFSET 0xF48
/*
* struct hl_special_block_info - stores address details of a particular type of
* IP block which has a SPECIAL part.
*
* @block_type: block type as described in every ASIC's block_types enum.
* @base_addr: base address of the first block of particular type,
* e.g., address of NIC0_UMR0_0 of 'NIC_UMR' block.
* @major: number of major blocks of particular type.
* @minor: number of minor blocks of particular type.
* @sub_minor: number of sub minor blocks of particular type.
* @major_offset: address gap between 2 consecutive major blocks of particular type,
* e.g., offset between NIC0_UMR0_0 and NIC1_UMR0_0 is 0x80000.
* @minor_offset: address gap between 2 consecutive minor blocks of particular type,
* e.g., offset between NIC0_UMR0_0 and NIC0_UMR1_0 is 0x20000.
* @sub_minor_offset: address gap between 2 consecutive sub_minor blocks of particular
* type, e.g., offset between NIC0_UMR0_0 and NIC0_UMR0_1 is 0x1000.
*
* e.g., in Gaudi2, NIC_UMR blocks can be interpreted as:
* NIC<major>_UMR<minor>_<sub_minor> where major=12, minor=2, sub_minor=15.
* In other words, for each of 12 major numbers (i.e. 0 to 11) there are
* 2 blocks with different minor numbers (i.e. 0 to 1). Again, for each minor
* number there are 15 blocks with different sub_minor numbers (i.e. 0 to 14).
* So different blocks are NIC0_UMR0_0, NIC0_UMR0_1, ..., NIC0_UMR1_0, ....,
* NIC11_UMR1_14.
*
* Struct's formatted data is located in the SOL-based auto-generated protbits headers.
*/
struct hl_special_block_info {
int block_type;
u32 base_addr;
u32 major;
u32 minor;
u32 sub_minor;
u32 major_offset;
u32 minor_offset;
u32 sub_minor_offset;
};
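
As a worked illustration of the layout described above, a hedged sketch resolving the FW-view address of a single block instance, the same arithmetic hl_automated_get_block_base_addr() performs; the helper name is illustrative.

/* Sketch: FW-view address of block instance <major>.<minor>.<sub_minor> */
static u32 special_block_fw_addr(const struct hl_special_block_info *info,
				 u32 major, u32 minor, u32 sub_minor)
{
	/* e.g., NIC3_UMR1_7 = base + 3 * 0x80000 + 1 * 0x20000 + 7 * 0x1000 */
	return info->base_addr + major * info->major_offset +
	       minor * info->minor_offset + sub_minor * info->sub_minor_offset;
}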
/*
* struct hl_automated_pb_cfg - represents configurations of a particular type
* of IP block which has protection bits.
*
* @addr: address details as described in struct hl_special_block_info.
* @prot_map: each bit corresponds to one among 32 protection configuration regs
* (e.g., SPECIAL_GLBL_PRIV). '1' means 0xffffffff and '0' means 0x0
* to be written into the corresponding protection configuration reg.
* This bit is meaningful if same bit in data_map is 0, otherwise ignored.
* @data_map: each bit corresponds to one among 32 protection configuration regs
* (e.g., SPECIAL_GLBL_PRIV). '1' means corresponding protection
* configuration reg is to be written with a value in array pointed
* by 'data', otherwise the value is decided by 'prot_map'.
* @data: pointer to data array which stores the config value(s) to be written
* to corresponding protection configuration reg(s).
* @data_size: size of the data array.
*
* Each bit of 'data_map' and 'prot_map' fields corresponds to one among 32
* protection configuration registers e.g., SPECIAL GLBL PRIV regs (starting at
* offset 0xE80). '1' in 'data_map' means protection configuration to be done
* using configuration in data array. '0' in 'data_map" means protection
* configuration to be done as per the value of corresponding bit in 'prot_map'.
* '1' in 'prot_map' means the register to be programmed with 0xFFFFFFFF
* (all non-protected). '0' in 'prot_map' means the register to be programmed
* with 0x0 (all protected).
*
* e.g., prot_map = 0x00000001, data_map = 0xC0000000 , data = {0xff, 0x12}
* SPECIAL_GLBL_PRIV[0] = 0xFFFFFFFF
* SPECIAL_GLBL_PRIV[1..29] = 0x0
* SPECIAL_GLBL_PRIV[30] = 0xFF
* SPECIAL_GLBL_PRIV[31] = 0x12
*/
struct hl_automated_pb_cfg {
struct hl_special_block_info addr;
u32 prot_map;
u32 data_map;
const u32 *data;
u8 data_size;
};
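
A minimal sketch of the decision rule documented above. Consuming the data array in ascending order of set bits in data_map is an assumption of the sketch (it matches the worked example), and the helper name is made up.

/* Sketch: value to program into protection configuration register 'i' */
static u32 resolve_pb_value(const struct hl_automated_pb_cfg *cfg,
			    unsigned int i, unsigned int *data_idx)
{
	if (cfg->data_map & BIT(i))
		return cfg->data[(*data_idx)++];	/* explicit value from 'data' */

	/* otherwise prot_map decides: all non-protected or all protected */
	return (cfg->prot_map & BIT(i)) ? 0xFFFFFFFF : 0x0;
}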
/* struct hl_special_blocks_cfg - holds special blocks cfg data.
*
* @priv_automated_pb_cfg: points to the main privileged PB array.
* @sec_automated_pb_cfg: points to the main secured PB array.
* @skip_blocks_cfg: holds arrays of block types & block ranges to be excluded.
* @priv_cfg_size: size of the main privileged PB array.
* @sec_cfg_size: size of the main secured PB array.
* @prot_lvl_priv: indication whether this is a privileged or secured PB configuration.
*/
struct hl_special_blocks_cfg {
struct hl_automated_pb_cfg *priv_automated_pb_cfg;
struct hl_automated_pb_cfg *sec_automated_pb_cfg;
struct hl_skip_blocks_cfg *skip_blocks_cfg;
u32 priv_cfg_size;
u32 sec_cfg_size;
u8 prot_lvl_priv;
};
/* Automated security */
/* struct hl_skip_blocks_cfg - holds arrays of block types & block ranges to be
* excluded from special blocks configurations.
*
* @block_types: an array of block types NOT to be configured.
* @block_types_len: number of entries in the block_types array.
* @block_ranges: an array of block ranges not to be configured.
* @block_ranges_len: number of entries in the block_ranges array.
* @skip_block_hook: hook that will be called before initializing special blocks.
*/
struct hl_skip_blocks_cfg {
int *block_types;
size_t block_types_len;
struct range *block_ranges;
size_t block_ranges_len;
bool (*skip_block_hook)(struct hl_device *hdev,
struct hl_special_blocks_cfg *special_blocks_cfg,
u32 blk_idx, u32 major, u32 minor, u32 sub_minor);
};
/**
* struct iterate_special_ctx - HW module special block iterator
* @fn: function to apply to each HW module special block instance
* @data: optional internal data to the function iterator
*/
struct iterate_special_ctx {
/*
* callback for the HW module special block iterator
* @hdev: pointer to the habanalabs device structure
* @block_id: block (ASIC specific definition can be dcore/hdcore)
* @major: major block index within block_id
* @minor: minor block index within the major block
* @sub_minor: sub_minor block index within the minor block
* @data: function specific data
*/
int (*fn)(struct hl_device *hdev, u32 block_id, u32 major, u32 minor,
u32 sub_minor, void *data);
void *data;
};
int hl_iterate_special_blocks(struct hl_device *hdev, struct iterate_special_ctx *ctx);
void hl_check_for_glbl_errors(struct hl_device *hdev);
#endif /* SECURITY_H_ */


@ -6,7 +6,7 @@
*/
#include <linux/vmalloc.h>
#include <uapi/misc/habanalabs.h>
#include <uapi/drm/habanalabs_accel.h>
#include "habanalabs.h"
/**


@ -701,6 +701,8 @@ static int gaudi_set_fixed_properties(struct hl_device *hdev)
prop->dma_mask = 48;
prop->hbw_flush_reg = mmPCIE_WRAP_RR_ELBI_RD_SEC_REG_CTRL;
return 0;
}
@ -6432,12 +6434,6 @@ static int gaudi_send_job_on_qman0(struct hl_device *hdev,
else
timeout = HL_DEVICE_TIMEOUT_USEC;
if (!hdev->asic_funcs->is_device_idle(hdev, NULL, 0, NULL)) {
dev_err_ratelimited(hdev->dev,
"Can't send driver job on QMAN0 because the device is not idle\n");
return -EBUSY;
}
fence_ptr = hl_asic_dma_pool_zalloc(hdev, 4, GFP_KERNEL, &fence_dma_addr);
if (!fence_ptr) {
dev_err(hdev->dev,
@ -7584,7 +7580,7 @@ static int tpc_krn_event_to_tpc_id(u16 tpc_dec_event_type)
return (tpc_dec_event_type - GAUDI_EVENT_TPC0_KRN_ERR) / 6;
}
static void gaudi_print_clk_change_info(struct hl_device *hdev, u16 event_type)
static void gaudi_print_clk_change_info(struct hl_device *hdev, u16 event_type, u64 *event_mask)
{
ktime_t zero_time = ktime_set(0, 0);
@ -7612,6 +7608,7 @@ static void gaudi_print_clk_change_info(struct hl_device *hdev, u16 event_type)
hdev->clk_throttling.aggregated_reason |= HL_CLK_THROTTLE_THERMAL;
hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].start = ktime_get();
hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].end = zero_time;
*event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
dev_info_ratelimited(hdev->dev,
"Clock throttling due to overheating\n");
break;
@ -7619,6 +7616,7 @@ static void gaudi_print_clk_change_info(struct hl_device *hdev, u16 event_type)
case GAUDI_EVENT_FIX_THERMAL_ENV_E:
hdev->clk_throttling.current_reason &= ~HL_CLK_THROTTLE_THERMAL;
hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].end = ktime_get();
*event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
dev_info_ratelimited(hdev->dev,
"Thermal envelop is safe, back to optimal clock\n");
break;
@ -7887,8 +7885,7 @@ static void gaudi_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_entr
break;
case GAUDI_EVENT_FIX_POWER_ENV_S ... GAUDI_EVENT_FIX_THERMAL_ENV_E:
event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
gaudi_print_clk_change_info(hdev, event_type);
gaudi_print_clk_change_info(hdev, event_type, &event_mask);
hl_fw_unmask_irq(hdev, event_type);
break;
@ -9133,6 +9130,16 @@ static u32 *gaudi_get_stream_master_qid_arr(void)
return gaudi_stream_master;
}
static int gaudi_set_dram_properties(struct hl_device *hdev)
{
return 0;
}
static int gaudi_set_binning_masks(struct hl_device *hdev)
{
return 0;
}
static void gaudi_check_if_razwi_happened(struct hl_device *hdev)
{
}
@ -9259,6 +9266,8 @@ static const struct hl_asic_funcs gaudi_funcs = {
.access_dev_mem = hl_access_dev_mem,
.set_dram_bar_base = gaudi_set_hbm_bar_base,
.send_device_activity = gaudi_send_device_activity,
.set_dram_properties = gaudi_set_dram_properties,
.set_binning_masks = gaudi_set_binning_masks,
};
/**


@ -8,7 +8,7 @@
#ifndef GAUDIP_H_
#define GAUDIP_H_
#include <uapi/misc/habanalabs.h>
#include <uapi/drm/habanalabs_accel.h>
#include "../common/habanalabs.h"
#include "../include/common/hl_boot_if.h"
#include "../include/gaudi/gaudi_packets.h"


@ -11,7 +11,8 @@
#include "../include/gaudi/gaudi_masks.h"
#include "../include/gaudi/gaudi_reg_map.h"
#include <uapi/misc/habanalabs.h>
#include <uapi/drm/habanalabs_accel.h>
#define SPMU_SECTION_SIZE MME0_ACC_SPMU_MAX_OFFSET
#define SPMU_EVENT_TYPES_OFFSET 0x400
#define SPMU_MAX_COUNTERS 6


@ -8,7 +8,7 @@
#ifndef GAUDI2P_H_
#define GAUDI2P_H_
#include <uapi/misc/habanalabs.h>
#include <uapi/drm/habanalabs_accel.h>
#include "../common/habanalabs.h"
#include "../include/common/hl_boot_if.h"
#include "../include/gaudi2/gaudi2.h"
@ -240,6 +240,8 @@
#define GAUDI2_SOB_INCREMENT_BY_ONE (FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_SOB_OBJ_VAL_MASK, 1) | \
FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_SOB_OBJ_INC_MASK, 1))
#define GAUDI2_NUM_OF_GLBL_ERR_CAUSE 8
enum gaudi2_reserved_sob_id {
GAUDI2_RESERVED_SOB_CS_COMPLETION_FIRST,
GAUDI2_RESERVED_SOB_CS_COMPLETION_LAST =
@ -532,6 +534,41 @@ struct gaudi2_device {
u32 num_of_valid_hw_events;
};
/*
* Types of the Gaudi2 IP blocks, used by special blocks iterator.
* Required for scenarios where only particular block types can be
* addressed (e.g., special PLDM images).
*/
enum gaudi2_block_types {
GAUDI2_BLOCK_TYPE_PLL,
GAUDI2_BLOCK_TYPE_RTR,
GAUDI2_BLOCK_TYPE_CPU,
GAUDI2_BLOCK_TYPE_HIF,
GAUDI2_BLOCK_TYPE_HBM,
GAUDI2_BLOCK_TYPE_NIC,
GAUDI2_BLOCK_TYPE_PCIE,
GAUDI2_BLOCK_TYPE_PCIE_PMA,
GAUDI2_BLOCK_TYPE_PDMA,
GAUDI2_BLOCK_TYPE_EDMA,
GAUDI2_BLOCK_TYPE_PMMU,
GAUDI2_BLOCK_TYPE_PSOC,
GAUDI2_BLOCK_TYPE_ROT,
GAUDI2_BLOCK_TYPE_ARC_FARM,
GAUDI2_BLOCK_TYPE_DEC,
GAUDI2_BLOCK_TYPE_MME,
GAUDI2_BLOCK_TYPE_EU_BIST,
GAUDI2_BLOCK_TYPE_SYNC_MNGR,
GAUDI2_BLOCK_TYPE_STLB,
GAUDI2_BLOCK_TYPE_TPC,
GAUDI2_BLOCK_TYPE_HMMU,
GAUDI2_BLOCK_TYPE_SRAM,
GAUDI2_BLOCK_TYPE_XBAR,
GAUDI2_BLOCK_TYPE_KDMA,
GAUDI2_BLOCK_TYPE_XDMA,
GAUDI2_BLOCK_TYPE_XFT,
GAUDI2_BLOCK_TYPE_MAX
};
extern const u32 gaudi2_dma_core_blocks_bases[DMA_CORE_ID_SIZE];
extern const u32 gaudi2_qm_blocks_bases[GAUDI2_QUEUE_ID_SIZE];
extern const u32 gaudi2_mme_acc_blocks_bases[MME_ID_SIZE];


@ -5,7 +5,7 @@
* All Rights Reserved.
*/
#include "gaudi2_coresight_regs.h"
#include <uapi/misc/habanalabs.h>
#include <uapi/drm/habanalabs_accel.h>
#define GAUDI2_PLDM_CORESIGHT_TIMEOUT_USEC (CORESIGHT_TIMEOUT_USEC * 2000)
#define SPMU_MAX_COUNTERS 6
@ -2376,10 +2376,10 @@ static int gaudi2_config_bmon(struct hl_device *hdev, struct hl_debug_params *pa
WREG32(base_reg + mmBMON_ADDRH_S2_OFFSET, upper_32_bits(input->start_addr2));
WREG32(base_reg + mmBMON_ADDRL_E2_OFFSET, lower_32_bits(input->end_addr2));
WREG32(base_reg + mmBMON_ADDRH_E2_OFFSET, upper_32_bits(input->end_addr2));
WREG32(base_reg + mmBMON_ADDRL_S3_OFFSET, lower_32_bits(input->start_addr2));
WREG32(base_reg + mmBMON_ADDRH_S3_OFFSET, upper_32_bits(input->start_addr2));
WREG32(base_reg + mmBMON_ADDRL_E3_OFFSET, lower_32_bits(input->end_addr2));
WREG32(base_reg + mmBMON_ADDRH_E3_OFFSET, upper_32_bits(input->end_addr2));
WREG32(base_reg + mmBMON_ADDRL_S3_OFFSET, lower_32_bits(input->start_addr3));
WREG32(base_reg + mmBMON_ADDRH_S3_OFFSET, upper_32_bits(input->start_addr3));
WREG32(base_reg + mmBMON_ADDRL_E3_OFFSET, lower_32_bits(input->end_addr3));
WREG32(base_reg + mmBMON_ADDRH_E3_OFFSET, upper_32_bits(input->end_addr3));
WREG32(base_reg + mmBMON_IDL_OFFSET, 0x0);
WREG32(base_reg + mmBMON_IDH_OFFSET, 0x0);


@ -1561,6 +1561,7 @@ static const u32 gaudi2_pb_dcr0_tpc0_unsecured_regs[] = {
mmDCORE0_TPC0_CFG_LUT_FUNC128_BASE_ADDR_HI,
mmDCORE0_TPC0_CFG_LUT_FUNC256_BASE_ADDR_LO,
mmDCORE0_TPC0_CFG_LUT_FUNC256_BASE_ADDR_HI,
mmDCORE0_TPC0_CFG_KERNEL_KERNEL_CONFIG,
mmDCORE0_TPC0_CFG_KERNEL_SRF_0,
mmDCORE0_TPC0_CFG_KERNEL_SRF_1,
mmDCORE0_TPC0_CFG_KERNEL_SRF_2,
@ -1666,6 +1667,10 @@ static const u32 gaudi2_pb_dcr0_sm_glbl[] = {
mmDCORE0_SYNC_MNGR_GLBL_BASE,
};
static const u32 gaudi2_pb_dcr1_sm_glbl[] = {
mmDCORE1_SYNC_MNGR_GLBL_BASE,
};
static const struct range gaudi2_pb_dcr0_sm_glbl_unsecured_regs[] = {
{mmDCORE0_SYNC_MNGR_GLBL_CQ_BASE_ADDR_L_1, mmDCORE0_SYNC_MNGR_GLBL_CQ_BASE_ADDR_L_63},
{mmDCORE0_SYNC_MNGR_GLBL_CQ_BASE_ADDR_H_1, mmDCORE0_SYNC_MNGR_GLBL_CQ_BASE_ADDR_H_63},
@ -1678,14 +1683,14 @@ static const struct range gaudi2_pb_dcr0_sm_glbl_unsecured_regs[] = {
};
static const struct range gaudi2_pb_dcr_x_sm_glbl_unsecured_regs[] = {
{mmDCORE0_SYNC_MNGR_GLBL_CQ_BASE_ADDR_L_0, mmDCORE0_SYNC_MNGR_GLBL_CQ_BASE_ADDR_L_63},
{mmDCORE0_SYNC_MNGR_GLBL_CQ_BASE_ADDR_H_0, mmDCORE0_SYNC_MNGR_GLBL_CQ_BASE_ADDR_H_63},
{mmDCORE0_SYNC_MNGR_GLBL_CQ_SIZE_LOG2_0, mmDCORE0_SYNC_MNGR_GLBL_CQ_SIZE_LOG2_63},
{mmDCORE0_SYNC_MNGR_GLBL_CQ_PI_0, mmDCORE0_SYNC_MNGR_GLBL_CQ_PI_63},
{mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_L_0, mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_L_63},
{mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_H_0, mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_H_63},
{mmDCORE0_SYNC_MNGR_GLBL_LBW_DATA_0, mmDCORE0_SYNC_MNGR_GLBL_LBW_DATA_63},
{mmDCORE0_SYNC_MNGR_GLBL_CQ_INC_MODE_0, mmDCORE0_SYNC_MNGR_GLBL_CQ_INC_MODE_63},
{mmDCORE1_SYNC_MNGR_GLBL_CQ_BASE_ADDR_L_0, mmDCORE1_SYNC_MNGR_GLBL_CQ_BASE_ADDR_L_63},
{mmDCORE1_SYNC_MNGR_GLBL_CQ_BASE_ADDR_H_0, mmDCORE1_SYNC_MNGR_GLBL_CQ_BASE_ADDR_H_63},
{mmDCORE1_SYNC_MNGR_GLBL_CQ_SIZE_LOG2_0, mmDCORE1_SYNC_MNGR_GLBL_CQ_SIZE_LOG2_63},
{mmDCORE1_SYNC_MNGR_GLBL_CQ_PI_0, mmDCORE1_SYNC_MNGR_GLBL_CQ_PI_63},
{mmDCORE1_SYNC_MNGR_GLBL_LBW_ADDR_L_0, mmDCORE1_SYNC_MNGR_GLBL_LBW_ADDR_L_63},
{mmDCORE1_SYNC_MNGR_GLBL_LBW_ADDR_H_0, mmDCORE1_SYNC_MNGR_GLBL_LBW_ADDR_H_63},
{mmDCORE1_SYNC_MNGR_GLBL_LBW_DATA_0, mmDCORE1_SYNC_MNGR_GLBL_LBW_DATA_63},
{mmDCORE1_SYNC_MNGR_GLBL_CQ_INC_MODE_0, mmDCORE1_SYNC_MNGR_GLBL_CQ_INC_MODE_63},
};
static const u32 gaudi2_pb_arc_sched[] = {
@ -3358,14 +3363,6 @@ static int gaudi2_init_protection_bits(struct hl_device *hdev)
/* Sync Manager GLBL */
/* Unsecure all CQ registers */
rc |= hl_init_pb_ranges(hdev, NUM_OF_DCORES, DCORE_OFFSET,
HL_PB_SINGLE_INSTANCE, HL_PB_NA,
gaudi2_pb_dcr0_sm_glbl,
ARRAY_SIZE(gaudi2_pb_dcr0_sm_glbl),
gaudi2_pb_dcr_x_sm_glbl_unsecured_regs,
ARRAY_SIZE(gaudi2_pb_dcr_x_sm_glbl_unsecured_regs));
/* Secure Dcore0 CQ0 registers */
rc |= hl_init_pb_ranges(hdev, HL_PB_SHARED, HL_PB_NA,
HL_PB_SINGLE_INSTANCE, HL_PB_NA,
@ -3374,6 +3371,14 @@ static int gaudi2_init_protection_bits(struct hl_device *hdev)
gaudi2_pb_dcr0_sm_glbl_unsecured_regs,
ARRAY_SIZE(gaudi2_pb_dcr0_sm_glbl_unsecured_regs));
/* Unsecure all other CQ registers */
rc |= hl_init_pb_ranges(hdev, NUM_OF_DCORES - 1, DCORE_OFFSET,
HL_PB_SINGLE_INSTANCE, HL_PB_NA,
gaudi2_pb_dcr1_sm_glbl,
ARRAY_SIZE(gaudi2_pb_dcr1_sm_glbl),
gaudi2_pb_dcr_x_sm_glbl_unsecured_regs,
ARRAY_SIZE(gaudi2_pb_dcr_x_sm_glbl_unsecured_regs));
/* PSOC.
* Except for PSOC_GLOBAL_CONF, skip when security is enabled in F/W, because the blocks are
* protected by privileged RR.


@ -5420,6 +5420,16 @@ static int goya_scrub_device_dram(struct hl_device *hdev, u64 val)
return -EOPNOTSUPP;
}
static int goya_set_dram_properties(struct hl_device *hdev)
{
return 0;
}
static int goya_set_binning_masks(struct hl_device *hdev)
{
return 0;
}
static int goya_send_device_activity(struct hl_device *hdev, bool open)
{
return 0;
@ -5518,6 +5528,8 @@ static const struct hl_asic_funcs goya_funcs = {
.access_dev_mem = hl_access_dev_mem,
.set_dram_bar_base = goya_set_ddr_bar_base,
.send_device_activity = goya_send_device_activity,
.set_dram_properties = goya_set_dram_properties,
.set_binning_masks = goya_set_binning_masks,
};
/*


@ -8,7 +8,7 @@
#ifndef GOYAP_H_
#define GOYAP_H_
#include <uapi/misc/habanalabs.h>
#include <uapi/drm/habanalabs_accel.h>
#include "../common/habanalabs.h"
#include "../include/common/hl_boot_if.h"
#include "../include/goya/goya_packets.h"


@ -10,7 +10,7 @@
#include "../include/goya/asic_reg/goya_regs.h"
#include "../include/goya/asic_reg/goya_masks.h"
#include <uapi/misc/habanalabs.h>
#include <uapi/drm/habanalabs_accel.h>
#define GOYA_PLDM_CORESIGHT_TIMEOUT_USEC (CORESIGHT_TIMEOUT_USEC * 100)


@ -344,6 +344,16 @@ struct hl_eq_engine_arc_intr_data {
__le64 pad[5];
};
#define ADDR_DEC_ADDRESS_COUNT_MAX 4
/* Data structure specifies details of ADDR_DEC interrupt */
struct hl_eq_addr_dec_intr_data {
struct hl_eq_intr_cause intr_cause;
__le64 addr[ADDR_DEC_ADDRESS_COUNT_MAX];
__u8 addr_cnt;
__u8 pad[7];
};
struct hl_eq_entry {
struct hl_eq_header hdr;
union {
@ -358,6 +368,7 @@ struct hl_eq_entry {
struct hl_eq_razwi_with_intr_cause razwi_with_intr_cause;
struct hl_eq_hbm_sei_data sei_data; /* Gaudi2 HBM */
struct hl_eq_engine_arc_intr_data arc_data;
struct hl_eq_addr_dec_intr_data addr_dec;
__le64 data[7];
};
};
@ -643,6 +654,10 @@ enum pq_init_status {
* data corruption in case of mismatched driver/FW versions.
* Relevant only to Gaudi.
*
* CPUCP_PACKET_GENERIC_PASSTHROUGH -
* Generic opcode for all firmware info that is passed through the LKD
* to the host without being parsed by the LKD.
*
* CPUCP_PACKET_ACTIVE_STATUS_SET -
* LKD sends FW indication whether device is free or in use, this indication is reported
* also to the BMC.
@ -704,9 +719,12 @@ enum cpucp_packet_id {
CPUCP_PACKET_RESERVED5, /* not used */
CPUCP_PACKET_RESERVED6, /* not used */
CPUCP_PACKET_RESERVED7, /* not used */
CPUCP_PACKET_GENERIC_PASSTHROUGH, /* IOCTL */
CPUCP_PACKET_RESERVED8, /* not used */
CPUCP_PACKET_RESERVED9, /* not used */
CPUCP_PACKET_ACTIVE_STATUS_SET, /* internal */
CPUCP_PACKET_RESERVED9, /* not used */
CPUCP_PACKET_RESERVED10, /* not used */
CPUCP_PACKET_RESERVED11, /* not used */
CPUCP_PACKET_ID_MAX /* must be last */
};
@ -727,6 +745,11 @@ enum cpucp_packet_id {
#define CPUCP_PKT_RES_PLL_OUT3_SHIFT 48
#define CPUCP_PKT_RES_PLL_OUT3_MASK 0xFFFF000000000000ull
#define CPUCP_PKT_RES_EEPROM_OUT0_SHIFT 0
#define CPUCP_PKT_RES_EEPROM_OUT0_MASK 0x000000000000FFFFull
#define CPUCP_PKT_RES_EEPROM_OUT1_SHIFT 16
#define CPUCP_PKT_RES_EEPROM_OUT1_MASK 0x0000000000FF0000ull
#define CPUCP_PKT_VAL_PFC_IN1_SHIFT 0
#define CPUCP_PKT_VAL_PFC_IN1_MASK 0x0000000000000001ull
#define CPUCP_PKT_VAL_PFC_IN2_SHIFT 1
@ -805,8 +828,13 @@ struct cpucp_packet {
__le32 nonce;
};
/* For NIC requests */
__le32 port_index;
union {
/* For NIC requests */
__le32 port_index;
/* For Generic packet sub index */
__le32 pkt_subidx;
};
};
struct cpucp_unmask_irq_arr_packet {
@ -881,7 +909,9 @@ enum cpucp_in_attributes {
cpucp_in_max,
cpucp_in_lowest = 6,
cpucp_in_highest = 7,
cpucp_in_reset_history
cpucp_in_reset_history,
cpucp_in_intr_alarm_a,
cpucp_in_intr_alarm_b,
};
enum cpucp_curr_attributes {
@ -976,6 +1006,11 @@ enum pll_index {
IC_PLL = 16,
MC_PLL = 17,
EMMC_PLL = 18,
D2D_PLL = 19,
CS_PLL = 20,
C2C_PLL = 21,
NCH_PLL = 22,
C2M_PLL = 23,
PLL_MAX
};
@ -1135,8 +1170,9 @@ enum cpucp_serdes_type {
HLS1_SERDES_TYPE,
HLS1H_SERDES_TYPE,
HLS2_SERDES_TYPE,
UNKNOWN_SERDES_TYPE,
MAX_NUM_SERDES_TYPE = UNKNOWN_SERDES_TYPE
HLS2_TYPE_1_SERDES_TYPE,
MAX_NUM_SERDES_TYPE, /* number of types */
UNKNOWN_SERDES_TYPE = 0xFFFF /* serdes_type is u16 */
};
struct cpucp_nic_info {
@ -1160,6 +1196,21 @@ struct page_discard_info {
__le32 mmu_page_idx[PAGE_DISCARD_MAX];
};
/*
* struct frac_val - fractional value represented by "integer.frac".
* @integer: the integer part of the fractional value;
* @frac: the fractional part of the fractional value.
*/
struct frac_val {
union {
struct {
__le16 integer;
__le16 frac;
};
__le32 val;
};
};
/*
* struct ser_val - the SER (symbol error rate) value is represented by "integer * 10 ^ -exp".
* @integer: the integer part of the SER value;
@ -1183,8 +1234,12 @@ struct ser_val {
* @pcs_link: has PCS link.
* @phy_ready: is PHY ready.
* @auto_neg: is Autoneg enabled.
* @timeout_retransmission_cnt: timeout retransmission events
* @high_ber_cnt: high ber events
* @timeout_retransmission_cnt: timeout retransmission events.
* @high_ber_cnt: high ber events.
* @pre_fec_ser: pre FEC SER value.
* @post_fec_ser: post FEC SER value.
* @bandwidth: measured bandwidth.
* @lat: measured latency.
*/
struct cpucp_nic_status {
__le32 port;
@ -1200,6 +1255,10 @@ struct cpucp_nic_status {
__u8 auto_neg;
__le32 timeout_retransmission_cnt;
__le32 high_ber_cnt;
struct ser_val pre_fec_ser;
struct ser_val post_fec_ser;
struct frac_val bandwidth;
struct frac_val lat;
};
enum cpucp_hbm_row_replace_cause {
@ -1292,6 +1351,7 @@ struct cpucp_dev_info_signed {
__u8 certificate[SEC_CERTIFICATE_BUF_SZ];
};
#define DCORE_MON_REGS_SZ 512
/*
* struct dcore_monitor_regs_data - DCORE monitor regs data.
* the structure follows sync manager block layout. relevant only to Gaudi.
@ -1302,11 +1362,11 @@ struct cpucp_dev_info_signed {
* @mon_status: array of monitor status.
*/
struct dcore_monitor_regs_data {
__le32 mon_pay_addrl[512];
__le32 mon_pay_addrh[512];
__le32 mon_pay_data[512];
__le32 mon_arm[512];
__le32 mon_status[512];
__le32 mon_pay_addrl[DCORE_MON_REGS_SZ];
__le32 mon_pay_addrh[DCORE_MON_REGS_SZ];
__le32 mon_pay_data[DCORE_MON_REGS_SZ];
__le32 mon_arm[DCORE_MON_REGS_SZ];
__le32 mon_status[DCORE_MON_REGS_SZ];
};
/* contains SM data for each SYNC_MNGR (relevant only to Gaudi) */
@ -1317,4 +1377,14 @@ struct cpucp_monitor_dump {
struct dcore_monitor_regs_data sync_mngr_e_n;
};
/*
* The type of the generic request (and other input arguments) is fetched from the user by
* reading the "pkt_subidx" field in struct cpucp_packet.
*
* HL_PASSTHROUGH_VERSIONS - Fetch all firmware versions.
*/
enum hl_passthrough_type {
HL_PASSTHROUGH_VERSIONS,
};
#endif /* CPUCP_IF_H */


@ -40,6 +40,19 @@ enum cpu_boot_err {
CPU_BOOT_ERR_LAST = 64 /* we have 2 registers of 32 bits */
};
/*
* Mask for fatal failures
* This mask contains all possible fatal failures; runtime code clears
* the bits that are not relevant.
*/
#define CPU_BOOT_ERR_FATAL_MASK \
((1 << CPU_BOOT_ERR_DRAM_INIT_FAIL) | \
(1 << CPU_BOOT_ERR_PLL_FAIL) | \
(1 << CPU_BOOT_ERR_DEVICE_UNUSABLE_FAIL) | \
(1 << CPU_BOOT_ERR_BINNING_FAIL) | \
(1 << CPU_BOOT_ERR_DRAM_SKIPPED) | \
(1 << CPU_BOOT_ERR_EEPROM_FAIL))
/*
* CPU error bits in BOOT_ERROR registers
*
@ -439,7 +452,7 @@ struct cpu_dyn_regs {
/* TODO: remove the desc magic after the code is updated to use message */
/* HCDM - Habana Communications Descriptor Magic */
#define HL_COMMS_DESC_MAGIC 0x4843444D
#define HL_COMMS_DESC_VER 1
#define HL_COMMS_DESC_VER 3
/* HCMv - Habana Communications Message + header version */
#define HL_COMMS_MSG_MAGIC_VALUE 0x48434D00
@ -450,8 +463,10 @@ struct cpu_dyn_regs {
((ver) & HL_COMMS_MSG_MAGIC_VER_MASK))
#define HL_COMMS_MSG_MAGIC_V0 HL_COMMS_DESC_MAGIC
#define HL_COMMS_MSG_MAGIC_V1 HL_COMMS_MSG_MAGIC_VER(1)
#define HL_COMMS_MSG_MAGIC_V2 HL_COMMS_MSG_MAGIC_VER(2)
#define HL_COMMS_MSG_MAGIC_V3 HL_COMMS_MSG_MAGIC_VER(3)
#define HL_COMMS_MSG_MAGIC HL_COMMS_MSG_MAGIC_V1
#define HL_COMMS_MSG_MAGIC HL_COMMS_MSG_MAGIC_V3
#define HL_COMMS_MSG_MAGIC_VALIDATE_MAGIC(magic) \
(((magic) & HL_COMMS_MSG_MAGIC_MASK) == \
@ -474,22 +489,31 @@ enum comms_msg_type {
/*
* Binning information shared between LKD and FW
* @tpc_mask - TPC binning information
* @tpc_mask_l - TPC binning information lower 64 bit
* @dec_mask - Decoder binning information
* @hbm_mask - HBM binning information
* @dram_mask - DRAM binning information
* @edma_mask - EDMA binning information
* @mme_mask_l - MME binning information lower 32
* @mme_mask_h - MME binning information upper 32
* @reserved - reserved field for 64 bit alignment
* @rot_mask - Rotator binning information
* @xbar_mask - xBAR binning information
* @reserved - reserved field for future binning info w/o ABI change
* @tpc_mask_h - TPC binning information upper 64 bit
* @nic_mask - NIC binning information
*/
struct lkd_fw_binning_info {
__le64 tpc_mask;
__le64 tpc_mask_l;
__le32 dec_mask;
__le32 hbm_mask;
__le32 dram_mask;
__le32 edma_mask;
__le32 mme_mask_l;
__le32 mme_mask_h;
__le32 reserved;
__le32 rot_mask;
__le32 xbar_mask;
__le32 reserved0;
__le64 tpc_mask_h;
__le64 nic_mask;
__le32 reserved1[8];
};
/* TODO: remove this struct after the code is updated to use message */
@ -512,6 +536,23 @@ struct comms_msg_header {
__u8 reserved[4]; /* pad to 64 bit */
};
enum lkd_fw_ascii_msg_lvls {
LKD_FW_ASCII_MSG_ERR = 0,
LKD_FW_ASCII_MSG_WRN = 1,
LKD_FW_ASCII_MSG_INF = 2,
LKD_FW_ASCII_MSG_DBG = 3,
};
#define LKD_FW_ASCII_MSG_MAX_LEN 128
#define LKD_FW_ASCII_MSG_MAX 4 /* consider ABI when changing */
struct lkd_fw_ascii_msg {
__u8 valid;
__u8 msg_lvl;
__u8 reserved[6];
char msg[LKD_FW_ASCII_MSG_MAX_LEN];
};
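
A hedged sketch of how a consumer might drain these messages; the level-to-log mapping, the clearing of the valid flag and the helper name are assumptions, only the field semantics come from the struct above.

/* Sketch: print every valid FW ASCII message, then mark it consumed */
static void print_fw_ascii_msgs(struct hl_device *hdev,
				struct lkd_fw_ascii_msg *msgs)
{
	int i;

	for (i = 0 ; i < LKD_FW_ASCII_MSG_MAX ; i++) {
		if (!msgs[i].valid)
			continue;

		/* assumes FW NUL-terminates msg within LKD_FW_ASCII_MSG_MAX_LEN */
		if (msgs[i].msg_lvl == LKD_FW_ASCII_MSG_ERR)
			dev_err(hdev->dev, "FW: %s\n", msgs[i].msg);
		else
			dev_info(hdev->dev, "FW: %s\n", msgs[i].msg);

		msgs[i].valid = 0;
	}
}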
/* this is the main FW descriptor - consider ABI when changing */
struct lkd_fw_comms_desc {
struct comms_desc_header header;
@ -521,6 +562,8 @@ struct lkd_fw_comms_desc {
/* can be used for 1 more version w/o ABI change */
char reserved0[VERSION_MAX_LEN];
__le64 img_addr; /* address for next FW component load */
struct lkd_fw_binning_info binning_info;
struct lkd_fw_ascii_msg ascii_msg[LKD_FW_ASCII_MSG_MAX];
};
enum comms_reset_cause {
@ -545,6 +588,8 @@ struct lkd_fw_comms_msg {
char reserved0[VERSION_MAX_LEN];
/* address for next FW component load */
__le64 img_addr;
struct lkd_fw_binning_info binning_info;
struct lkd_fw_ascii_msg ascii_msg[LKD_FW_ASCII_MSG_MAX];
};
struct {
__u8 reset_cause;
@ -552,7 +597,7 @@ struct lkd_fw_comms_msg {
struct {
__u8 fw_cfg_skip; /* 1 - skip, 0 - don't skip */
};
struct lkd_fw_binning_info binning_info;
struct lkd_fw_binning_info binning_conf;
};
};
@ -699,4 +744,92 @@ struct comms_status {
};
};
/**
* HL_MODULES_MAX_NUM is determined by the size of modules_mask in struct
* hl_component_versions
*/
enum hl_modules {
HL_MODULES_BOOT_INFO = 0,
HL_MODULES_EEPROM,
HL_MODULES_FDT,
HL_MODULES_I2C,
HL_MODULES_LZ4,
HL_MODULES_MBEDTLS,
HL_MODULES_MAX_NUM = 16
};
/**
* HL_COMPONENTS_MAX_NUM is determined by the size of components_mask in
* struct cpucp_versions
*/
enum hl_components {
HL_COMPONENTS_PID = 0,
HL_COMPONENTS_MGMT,
HL_COMPONENTS_PREBOOT,
HL_COMPONENTS_PPBOOT,
HL_COMPONENTS_ARMCP,
HL_COMPONENTS_CPLD,
HL_COMPONENTS_UBOOT,
HL_COMPONENTS_MAX_NUM = 16
};
/**
* struct hl_component_versions - versions associated with hl component.
* @struct_size: size of the whole struct (including the dynamically sized modules).
* @modules_offset: offset of the modules field in this struct.
* @component: version of the component itself.
* @fw_os: Firmware OS Version.
* @modules_mask: i'th bit (from LSB) is a flag - on if module i in enum
* hl_modules is used.
* @modules_counter: number of set bits in modules_mask.
* @reserved: reserved for future use.
* @modules: versions of the component's modules. Elaborated explanation in
* struct cpucp_versions.
*/
struct hl_component_versions {
__le16 struct_size;
__le16 modules_offset;
__u8 component[VERSION_MAX_LEN];
__u8 fw_os[VERSION_MAX_LEN];
__le16 modules_mask;
__u8 modules_counter;
__u8 reserved[1];
__u8 modules[][VERSION_MAX_LEN];
};
/**
* struct hl_fw_versions - all versions (fuse, cpucp's components with their
* modules)
* @struct_size: size of the whole struct (including the dynamically sized components).
* @components_offset: offset of the components field in this struct.
* @fuse: silicon production FUSE information.
* @components_mask: i'th bit (from LSB) is a flag - on if component i in enum
* hl_components is used.
* @components_counter: number of set bits in components_mask.
* @reserved: reserved for future use.
* @components: versions of hl components. Index i corresponds to the i'th bit
* that is *on* in components_mask. For example, if
* components_mask=0b101, then *components represents arcpid and
* *(hl_component_versions*)((char*)components + 1') represents
* preboot, where 1' = components[0].struct_size.
*/
struct hl_fw_versions {
__le16 struct_size;
__le16 components_offset;
__u8 fuse[VERSION_MAX_LEN];
__le16 components_mask;
__u8 components_counter;
__u8 reserved[1];
struct hl_component_versions components[];
};
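
Given the self-describing sizes, walking the flexible components array looks roughly like this sketch; only the struct_size chaining comes from the comment above, the helper name is invented.

/* Sketch: return the idx'th component, hopping over variable-sized entries */
static struct hl_component_versions *
fw_versions_component(struct hl_fw_versions *vers, int idx)
{
	char *p = (char *)vers + le16_to_cpu(vers->components_offset);

	while (idx--)
		p += le16_to_cpu(((struct hl_component_versions *)p)->struct_size);

	return (struct hl_component_versions *)p;
}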
/* Max size of struct hl_component_versions */
#define HL_COMPONENT_VERSIONS_MAX_SIZE \
(sizeof(struct hl_component_versions) + HL_MODULES_MAX_NUM * \
VERSION_MAX_LEN)
/* Max size of struct hl_fw_versions */
#define HL_FW_VERSIONS_MAX_SIZE (sizeof(struct hl_fw_versions) + \
HL_COMPONENTS_MAX_NUM * HL_COMPONENT_VERSIONS_MAX_SIZE)
#endif /* HL_BOOT_IF_H */


@ -320,4 +320,6 @@
#define mmPSOC_TPC_PLL_NR 0xC73100
#define mmIF_W_PLL_NR 0x488100
#define mmPCIE_WRAP_RR_ELBI_RD_SEC_REG_CTRL 0xC01208
#endif /* ASIC_REG_GAUDI_REGS_H_ */

Some files were not shown because too many files have changed in this diff.