Commit graph

5961 commits

Author SHA1 Message Date
Linus Torvalds
a594533df0 drm for 6.2:
Initial accel subsystem support. There are no drivers yet, just the framework.
 
 New driver:
 - ofdrm - replacement for offb
 
 fbdev:
 - add support for nomodeset
 
 fourcc:
 - add Vivante tiled modifier
 
 core:
 - atomic-helpers: CRTC primary plane test fixes, fb access hooks
 - connector: TV API consistency, cmdline parser improvements
 - send connector hotplug on cleanup
 - sort makefile objects
 
 tests:
 - sort kunit tests
 - improve DP-MST tests
 - add kunit helpers to create a device
 
 sched:
 - module param for scheduling policy
 - refcounting fix
 
 buddy:
 - add back random seed log
 
 ttm:
 - convert ttm_resource to size_t
 - optimize pool allocations
 
 edid:
 - HFVSDB parsing support fixes
 - logging/debug improvements
 - DSC quirks
 
 dma-buf:
 - Add unlocked vmap and attachment mapping
 - move drivers to common locking convention
 - locking improvements
 
 firmware:
 - new API for rPI firmware and vc4
 
 xilinx:
 - zynqmp: displayport bridge support
 - dpsub fix
 
 bridge:
 - adv7533: Remove dynamic lane switching
 - it6505: Runtime PM support, sync improvements
 - ps8640: Handle AUX defer messages
 - tc358775: Drop soft-reset over I2C
 
 panel:
 - panel-edp: Add INX N116BGE-EA2 C2 and C4 support.
 - Jadard JD9365DA-H3
 - NewVision NV3051D
 
 amdgpu:
 - DCN support on ARM
 - DCN 2.1 secure display
 - Sienna Cichlid mode2 reset fixes
 - new GC 11.x firmware versions
 - drop AMD specific DSC workarounds in favour of drm code
 - clang warning fixes
 - scheduler rework
 - SR-IOV fixes
 - GPUVM locking fixes
 - fix memory leak in CS IOCTL error path
 - flexible array updates
 - enable new GC/PSP/SMU/NBIO IP
 - GFX preemption support for gfx9
 
 amdkfd:
 - cache size fixes
 - userptr fixes
 - enable cooperative launch on gfx 10.3
 - enable GC 11.0.4 KFD support
 
 radeon:
 - replace kmap with kmap_local_page
 - ACPI ref count fix
 - HDA audio notifier support
 
 i915:
 - DG2 enabled by default
 - MTL enablement work
 - hotplug refactoring
 - VBT improvements
 - Display and watermark refactoring
 - ADL-P workaround
 - temp disable runtime_pm for discrete-
 - fix for A380 as a secondary GPU
 - Wa_18017747507 for DG2
 - CS timestamp support fixes for gen5 and earlier
 - never purge busy TTM objects
 - use i915_sg_dma_sizes for all backends
 - demote GuC kernel contexts to normal priority
 - gvt: refactor for new MDEV interface
 - enable DC power states on eDP ports
 - fix gen 2/3 workarounds
 
 nouveau:
 - fix page fault handling
 - Ampere acceleration support
 - driver stability improvements
 - nva3 backlight support
 
 msm:
 - MSM_INFO_GET_FLAGS support
 - DPU: XR30 and P010 image formats
 - Qualcomm SM6115 support
 - DSI PHY support for QCM2290
 - HDMI: refactored dev init path
 - remove exclusive-fence hack
 - fix speed-bin detection
 - enable clamp to idle on 7c3
 - improved hangcheck detection
 
 vmwgfx:
 - fb and cursor refactoring
 - convert to generic hashtable
 - cursor improvements
 
 etnaviv:
 - hw workarounds
 - softpin MMU fixes
 
 ast:
 - atomic gamma LUT support
 - convert to SHMEM
 
 lcdif:
 - support YUV planes
 - Increase DMA burst size
 - FIFO threshold tuning
 
 meson:
 - fix return type of cvbs mode_valid
 
 mgag200:
 - fix PLL setup on some revisions
 
 sun4i:
 - A100 and D1 support
 
 udl:
 - modesetting improvements
 - hot unplug support
 
 vc4:
 - support PAL-M
 - fix regression preventing 4K @ 60Hz
 - fix NULL ptr deref
 
 v3d:
 - switch to drm managed resources
 
 renesas:
 - RZ/G2L DSI support
 - DU Kconfig cleanup
 
 mediatek:
 - fixup dpi and hdmi
 - MT8188 dpi support
 - MT8195 AFBC support
 
 tegra:
 - NVDEC hardware on Tegra234 SoC
 
 hdlcd:
 - switch to drm managed resources
 
 ingenic:
 - fix registration error path
 
 hisilicon:
 - convert to drm_mode_init
 
 maildp:
 - use managed resources
 
 mtk:
 - use drm_mode_init
 
 rockchip:
 - use drm_mode_copy
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmOXxI0ACgkQDHTzWXnE
 hr4NyBAAojK3N+XJf2b8LWuRKsShCr5FXlteEDxiYGLeB8/g4x3LztSfHgUg0iuS
 nP1m7Cx4snXcVNS6iyOsoZVq1EGUAWvv+mPWJe1UywjpyqtciTVQ11GEHRvI/w+V
 GRvkhmt/TsoZA0QIlS2MaOmhn9j17QOcuYTUjYdyRL4tsrHWrTASH5W1Jt2xmDyw
 5FUJvfukPWm100DVWbh6hWbCKL22bDDF/nj1H+G6hYSyTjVbk7wZ0vy2m6TnIHNF
 iyBHBIzFPg3BveiSlKe6aVX7Gq2d8bfqjHsgN5f1qcS4ejWEkHLVxJtBdOB+fOSC
 7o8Ms7WHi1AmnkOVCGRIjJ0cJrLZu2HDlyhViguAO1XQ3Jvuo/4WW3mplv+YPOMc
 c+P/zuPG42d4lrISuB8wspTdOgxmqpZDkg3HE6n1+jiVR0u4hTTYktoPnLsHX6KG
 l/l2B6aVAxE4b6P0q3ofYoAnk5rNsb1YUS+a8kC6f97TQ3gmOsN75iZXD/ASHg2r
 ozhh2wcFxIPkJhE7vqLWPIBCWQs93sGyQXoI7Q0TJaIAZTXV0VmO1BIofetpVImE
 7FhDC4wvBedXywN8NYUEFbCTOnIcDMteM/i6S1ns78s5UjDa5osPuS5I02VT1lbN
 tvnJoHNkhCt13lJz63b0HNFm3cPKoRosCQhJeshyUYaFKs+evL0=
 =pABG
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2022-12-13' of git://anongit.freedesktop.org/drm/drm

Pull drm updates from Dave Airlie:
 "The biggest highlight is that the accel subsystem framework is merged.
  Hopefully for 6.3 we will be able to line up a driver to use it.

  In drivers land, i915 enables DG2 support by default now, and nouveau
  has a big stability refactoring and initial ampere support, AMD
  includes new hw IP support and should build on ARM again. There is
  also an ofdrm driver to take over offb on platforms it's used.

  Stuff outside my tree, the dma-buf patches hit a few places, the vc4
  firmware changes also do, and i915 has some interactions with MEI for
  discrete GPUs. I think all of those should have been acked/reviewed by
  relevant parties.

  New driver:
   - ofdrm - replacement for offb

  fbdev:
   - add support for nomodeset

  fourcc:
   - add Vivante tiled modifier

  core:
   - atomic-helpers: CRTC primary plane test fixes, fb access hooks
   - connector: TV API consistency, cmdline parser improvements
   - send connector hotplug on cleanup
   - sort makefile objects

  tests:
   - sort kunit tests
   - improve DP-MST tests
   - add kunit helpers to create a device

  sched:
   - module param for scheduling policy
   - refcounting fix

  buddy:
   - add back random seed log

  ttm:
   - convert ttm_resource to size_t
   - optimize pool allocations

  edid:
   - HFVSDB parsing support fixes
   - logging/debug improvements
   - DSC quirks

  dma-buf:
   - Add unlocked vmap and attachment mapping
   - move drivers to common locking convention
   - locking improvements

  firmware:
   - new API for rPI firmware and vc4

  xilinx:
   - zynqmp: displayport bridge support
   - dpsub fix

  bridge:
   - adv7533: Remove dynamic lane switching
   - it6505: Runtime PM support, sync improvements
   - ps8640: Handle AUX defer messages
   - tc358775: Drop soft-reset over I2C

  panel:
   - panel-edp: Add INX N116BGE-EA2 C2 and C4 support.
   - Jadard JD9365DA-H3
   - NewVision NV3051D

  amdgpu:
   - DCN support on ARM
   - DCN 2.1 secure display
   - Sienna Cichlid mode2 reset fixes
   - new GC 11.x firmware versions
   - drop AMD specific DSC workarounds in favour of drm code
   - clang warning fixes
   - scheduler rework
   - SR-IOV fixes
   - GPUVM locking fixes
   - fix memory leak in CS IOCTL error path
   - flexible array updates
   - enable new GC/PSP/SMU/NBIO IP
   - GFX preemption support for gfx9

  amdkfd:
   - cache size fixes
   - userptr fixes
   - enable cooperative launch on gfx 10.3
   - enable GC 11.0.4 KFD support

  radeon:
   - replace kmap with kmap_local_page
   - ACPI ref count fix
   - HDA audio notifier support

  i915:
   - DG2 enabled by default
   - MTL enablement work
   - hotplug refactoring
   - VBT improvements
   - Display and watermark refactoring
   - ADL-P workaround
   - temp disable runtime_pm for discrete-
   - fix for A380 as a secondary GPU
   - Wa_18017747507 for DG2
   - CS timestamp support fixes for gen5 and earlier
   - never purge busy TTM objects
   - use i915_sg_dma_sizes for all backends
   - demote GuC kernel contexts to normal priority
   - gvt: refactor for new MDEV interface
   - enable DC power states on eDP ports
   - fix gen 2/3 workarounds

  nouveau:
   - fix page fault handling
   - Ampere acceleration support
   - driver stability improvements
   - nva3 backlight support

  msm:
   - MSM_INFO_GET_FLAGS support
   - DPU: XR30 and P010 image formats
   - Qualcomm SM6115 support
   - DSI PHY support for QCM2290
   - HDMI: refactored dev init path
   - remove exclusive-fence hack
   - fix speed-bin detection
   - enable clamp to idle on 7c3
   - improved hangcheck detection

  vmwgfx:
   - fb and cursor refactoring
   - convert to generic hashtable
   - cursor improvements

  etnaviv:
   - hw workarounds
   - softpin MMU fixes

  ast:
   - atomic gamma LUT support
   - convert to SHMEM

  lcdif:
   - support YUV planes
   - Increase DMA burst size
   - FIFO threshold tuning

  meson:
   - fix return type of cvbs mode_valid

  mgag200:
   - fix PLL setup on some revisions

  sun4i:
   - A100 and D1 support

  udl:
   - modesetting improvements
   - hot unplug support

  vc4:
   - support PAL-M
   - fix regression preventing 4K @ 60Hz
   - fix NULL ptr deref

  v3d:
   - switch to drm managed resources

  renesas:
   - RZ/G2L DSI support
   - DU Kconfig cleanup

  mediatek:
   - fixup dpi and hdmi
   - MT8188 dpi support
   - MT8195 AFBC support

  tegra:
   - NVDEC hardware on Tegra234 SoC

  hdlcd:
   - switch to drm managed resources

  ingenic:
   - fix registration error path

  hisilicon:
   - convert to drm_mode_init

  maildp:
   - use managed resources

  mtk:
   - use drm_mode_init

  rockchip:
   - use drm_mode_copy"

* tag 'drm-next-2022-12-13' of git://anongit.freedesktop.org/drm/drm: (1397 commits)
  drm/amdgpu: fix mmhub register base coding error
  drm/amdgpu: add tmz support for GC IP v11.0.4
  drm/amdgpu: enable GFX Clock Gating control for GC IP v11.0.4
  drm/amdgpu: enable GFX Power Gating for GC IP v11.0.4
  drm/amdgpu: enable GFX IP v11.0.4 CG support
  drm/amdgpu: Make amdgpu_ring_mux functions as static
  drm/amdgpu: generally allow over-commit during BO allocation
  drm/amd/display: fix array index out of bound error in DCN32 DML
  drm/amd/display: 3.2.215
  drm/amd/display: set optimized required for comp buf changes
  drm/amd/display: Add debug option to skip PSR CRTC disable
  drm/amd/display: correct DML calc error of UrgentLatency
  drm/amd/display: correct static_screen_event_mask
  drm/amd/display: Ensure commit_streams returns the DC return code
  drm/amd/display: read invalid ddc pin status cause engine busy
  drm/amd/display: Bypass DET swath fill check for max clocks
  drm/amd/display: Disable uclk pstate for subvp pipes
  drm/amd/display: Fix DCN2.1 default DSC clocks
  drm/amd/display: Enable dp_hdmi21_pcon support
  drm/amd/display: prevent seamless boot on displays that don't have the preferred dig
  ...
2022-12-13 11:59:58 -08:00
Bjorn Helgaas
6aecc0a59e cxl: Remove unnecessary cxl_pci_window_alignment()
cxl_pci_window_alignment() is referenced only via the struct
pci_controller_ops.window_alignment function pointer, and only in the
powerpc implementation of pcibios_window_alignment().

pcibios_window_alignment() defaults to returning 1 if the function pointer
is NULL, which is the same was what cxl_pci_window_alignment() does.

cxl_pci_window_alignment() is unnecessary, so remove it.  No functional
change intended.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221205223231.1268085-1-helgaas@kernel.org
2022-12-06 23:15:06 +11:00
Jason Gunthorpe
90337f526c Merge tag 'v6.1-rc7' into iommufd.git for-next
Resolve conflicts in drivers/vfio/vfio_main.c by using the iommfd version.
The rc fix was done a different way when iommufd patches reworked this
code.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-02 12:04:39 -04:00
David Hildenbrand
052d9b0f7a habanalabs: remove FOLL_FORCE usage
FOLL_FORCE is really only for ptrace access. As we unpin the pinned pages
using unpin_user_pages_dirty_lock(true), the assumption is that all these
pages are writable.

FOLL_FORCE in this case seems to be due to copy-and-past from other
drivers. Let's just remove it.

Link: https://lkml.kernel.org/r/20221116102659.70287-20-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Oded Gabbay <ogabbay@kernel.org>
Cc: Oded Gabbay <ogabbay@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 15:59:00 -08:00
Greg Kroah-Hartman
ae27e8869f This tag contains habanalabs driver changes for v6.2:
- New feature of graceful hard-reset. Instead of immediately killing the
   user-process when a command submission times out, we wait a bit and give
   the user-process notification and let it try to close things gracefully,
   with the ability to retrieve debug information.
 
 - Enhance the EventFD mechanism. Add new events such as access to illegal
   address (RAZWI), page fault, device unavailable. In addition, change the
   event workqueue to be handled in a single-threaded workqueue.
 
 - Allow the control device to work during reset of the ASIC, to enable
   monitoring applications to continue getting the data.
 
 - Add handling for Gaudi2 with PCI revision 2.
 
 - Reduce severity of prints due to power/thermal events.
 
 - Change how we use the h/w to perform memory scrubbing in Gaudi2.
 
 - Multiple bug fixes, refactors and renames.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEE7TEboABC71LctBLFZR1NuKta54AFAmN+QSEACgkQZR1NuKta
 54Ducgf+PsD85BWXlqWkTa2S8tn7h+OCETapAEY+JMRu1UB15ccLGZlH1O/L2try
 NBjUEcbzvS1KPYmNMubKXOIlacTJrukaoqvtMfLSe1+Y/zvfTpF1+LGLp39wSRo8
 R36A1VEQxalSZoFhQMERVoWBfjiVaW3ENqe3H9Vb/ab9QdMUzP4P4uaLsECtsLSy
 ft31ZcN+jPjhjSYC7xZjzd3KXvVqlQ/5TsXdX6nsxphrOUiKxT55Gsypkx5O4vt1
 Q4aw+v3Z0NgknDF90n7O90y/wgE3OqKHiKl+9l7lS/WkLhhaWknJ9zJlfLI8uiEH
 UjMku/EpH6SSN5hrCDQvtFaXJSgiWQ==
 =cHXD
 -----END PGP SIGNATURE-----

Merge tag 'misc-habanalabs-next-2022-11-23' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into char-misc-next

Oded writes:

This tag contains habanalabs driver changes for v6.2:

- New feature of graceful hard-reset. Instead of immediately killing the
  user-process when a command submission times out, we wait a bit and give
  the user-process notification and let it try to close things gracefully,
  with the ability to retrieve debug information.

- Enhance the EventFD mechanism. Add new events such as access to illegal
  address (RAZWI), page fault, device unavailable. In addition, change the
  event workqueue to be handled in a single-threaded workqueue.

- Allow the control device to work during reset of the ASIC, to enable
  monitoring applications to continue getting the data.

- Add handling for Gaudi2 with PCI revision 2.

- Reduce severity of prints due to power/thermal events.

- Change how we use the h/w to perform memory scrubbing in Gaudi2.

- Multiple bug fixes, refactors and renames.

* tag 'misc-habanalabs-next-2022-11-23' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux: (63 commits)
  habanalabs: fix VA range calculation
  habanalabs: fail driver load if EEPROM errors detected
  habanalabs: make print of engines idle mask more readable
  habanalabs: clear non-released encapsulated signals
  habanalabs: don't put context in hl_encaps_handle_do_release_sob()
  habanalabs: print context refcount value if hard reset fails
  habanalabs: add RMWREG32_SHIFTED to set a val within a mask
  habanalabs: fix rc when new CPUCP opcodes are not supported
  habanalabs/gaudi2: added memset for the cq_size register
  habanalabs: added return value check for hl_fw_dynamic_send_clear_cmd()
  habanalabs: increase the size of busy engines mask
  habanalabs/gaudi2: change memory scrub mechanism
  habanalabs: extend process wait timeout in device fine
  habanalabs: check schedule_hard_reset correctly
  habanalabs: reset device if still in use when released
  habanalabs/gaudi2: return to reset upon SM SEI BRESP error
  habanalabs/gaudi2: don't enable entries in the MSIX_GW table
  habanalabs/gaudi2: remove redundant firmware version check
  habanalabs/gaudi: fix print for firmware-alive event
  habanalabs: fix print for out-of-sync and pkt-failure events
  ...
2022-11-29 13:19:29 +01:00
Greg Kroah-Hartman
fb12940f51 driver core: fix up some missing class.devnode() conversions.
In commit ff62b8e658 ("driver core: make struct class.devnode() take a
const *") the ->devnode callback changed the pointer to be const, but a
few instances of PowerPC drivers were not caught for some reason.

Fix this up by changing the pointers to be const.

Fixes: ff62b8e658 ("driver core: make struct class.devnode() take a const *")
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Frederic Barrat <fbarrat@linux.ibm.com>
Cc: Andrew Donnellan <ajd@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linuxppc-dev@lists.ozlabs.org
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/r/20221128173539.3112234-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-29 09:28:46 +01:00
Al Viro
de4eda9de2 use less confusing names for iov_iter direction initializers
READ/WRITE proved to be actively confusing - the meanings are
"data destination, as used with read(2)" and "data source, as
used with write(2)", but people keep interpreting those as
"we read data from it" and "we write data to it", i.e. exactly
the wrong way.

Call them ITER_DEST and ITER_SOURCE - at least that is harder
to misinterpret...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-11-25 13:01:55 -05:00
Abel Vesa
9bde43a0e2 misc: fastrpc: Add dma_mask to fastrpc_channel_ctx
dma_set_mask_and_coherent only updates the mask to which the device
dma_mask pointer points to. Add a dma_mask to the channel ctx and set
the device dma_mask to point to that, otherwise the dma_set_mask will
return an error and the dma_set_coherent_mask will be skipped too.

Co-developed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20221125071405.148786-11-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-25 18:45:34 +01:00
Abel Vesa
532ad70c6d misc: fastrpc: Add mmap request assigning for static PD pool
If the mmap request is to add pages and thre are VMIDs associated with
that context, do a call to SCM to reassign that memory. Do not do this
for remote heap allocation, that is done on init create static process
only.

Co-developed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20221125071405.148786-10-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-25 18:45:34 +01:00
Abel Vesa
76e8e4ace1 misc: fastrpc: Safekeep mmaps on interrupted invoke
If the userspace daemon is killed in the middle of an invoke (e.g.
audiopd listerner invoke), we need to skip the unmapping on device
release, otherwise the DSP will crash. So lets safekeep all the maps
only if there is in invoke interrupted, by attaching them to the channel
context (which is resident until RPMSG driver is removed), and free them
on RPMSG driver remove.

Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20221125071405.148786-9-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-25 18:45:33 +01:00
Abel Vesa
0871561055 misc: fastrpc: Add support for audiopd
In order to be able to start the adsp listener for audiopd using adsprpcd,
we need to add the corresponding ioctl for creating a static process.
On that ioctl call we need to allocate the heap. Allocating the heap needs
to be happening only once and needs to be kept between different device
open calls, so attach it to the channel context to make sure that remains
until the RPMSG driver is removed. Then, if there are any VMIDs associated
with the static ADSP process, do a call to SCM to assign it.
And then, send all the necessary info related to heap to the DSP.

Co-developed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20221125071405.148786-8-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-25 18:45:33 +01:00
Abel Vesa
72fa6f7820 misc: fastrpc: Rework fastrpc_req_munmap
Move the lookup of the munmap request to the fastrpc_req_munmap and pass
on only the buf to the lower level fastrpc_req_munmap_impl. That way
we can use the lower level fastrpc_req_munmap_impl on error path in
fastrpc_req_mmap to free the buf without searching for the munmap
request it belongs to.

Co-developed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20221125071405.148786-7-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-25 18:45:33 +01:00
Abel Vesa
334f1a1cbe misc: fastrpc: Use fastrpc_map_put in fastrpc_map_create on fail
Move the kref_init right after the allocation so that we can use
fastrpc_map_put on any following error case.

Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20221125071405.148786-6-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-25 18:45:33 +01:00
Abel Vesa
6f18c7e845 misc: fastrpc: Add fastrpc_remote_heap_alloc
Split fastrpc_buf_alloc in such a way it allows allocation of remote
heap too and add fastrpc_remote_heap_alloc to do so.

Co-developed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20221125071405.148786-5-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-25 18:45:33 +01:00
Abel Vesa
1ce91d45ba misc: fastrpc: Add reserved mem support
The reserved mem support is needed for CMA heap support, which will be
used by AUDIOPD.

Co-developed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20221125071405.148786-4-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-25 18:45:33 +01:00
Abel Vesa
1959ab9edc misc: fastrpc: Rename audio protection domain to root
The AUDIO_PD will be done via static pd, so the proper name here is
actually ROOT_PD.

Co-developed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20221125071405.148786-3-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-25 18:45:33 +01:00
Greg Kroah-Hartman
ff62b8e658 driver core: make struct class.devnode() take a const *
The devnode() in struct class should not be modifying the device that is
passed into it, so mark it as a const * and propagate the function
signature changes out into all relevant subsystems that use this
callback.

Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Justin Sanders <justin@coraid.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Cc: Liam Mark <lmark@codeaurora.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Brian Starkey <Brian.Starkey@arm.com>
Cc: John Stultz <jstultz@google.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Sean Young <sean@mess.org>
Cc: Frank Haverkamp <haver@linux.ibm.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Xie Yongji <xieyongji@bytedance.com>
Cc: Gautam Dawar <gautam.dawar@xilinx.com>
Cc: Dan Carpenter <error27@gmail.com>
Cc: Eli Cohen <elic@nvidia.com>
Cc: Parav Pandit <parav@nvidia.com>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: alsa-devel@alsa-project.org
Cc: dri-devel@lists.freedesktop.org
Cc: kvm@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: linux-block@vger.kernel.org
Cc: linux-input@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Cc: linux-usb@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Link: https://lore.kernel.org/r/20221123122523.1332370-2-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-24 17:12:27 +01:00
Yang Yingliang
5f58cad1e4 ocxl: fix pci device refcount leak when calling get_function_0()
get_function_0() calls pci_get_domain_bus_and_slot(), as comment
says, it returns a pci device with refcount increment, so after
using it, pci_dev_put() needs be called.

Get the device reference when get_function_0() is not called, so
pci_dev_put() can be called in the error path and callers
unconditionally. And add comment above get_dvsec_vendor0() to tell
callers to call pci_dev_put().

Fixes: 87db7579eb ("ocxl: control via sysfs whether the FPGA is reloaded on a link reset")
Suggested-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221121154339.4088935-1-yangyingliang@huawei.com
2022-11-24 23:31:46 +11:00
Yang Yingliang
295faa1772 ocxl: fix possible name leak in ocxl_file_register_afu()
If device_register() returns error in ocxl_file_register_afu(),
the name allocated by dev_set_name() need be freed. As comment
of device_register() says, it should use put_device() to give
up the reference in the error path. So fix this by calling
put_device(), then the name can be freed in kobject_cleanup(),
and info is freed in info_release().

Fixes: 75ca758adb ("ocxl: Create a clear delineation between ocxl backend & frontend")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221111145929.2429271-1-yangyingliang@huawei.com
2022-11-24 23:31:44 +11:00
Yang Yingliang
8bf03f557d cxl: fix possible null-ptr-deref in cxl_pci_init_afu|adapter()
If device_register() fails in cxl_pci_afu|adapter(), the device
is not added, device_unregister() can not be called in the error
path, otherwise it will cause a null-ptr-deref because of removing
not added device.

As comment of device_register() says, it should use put_device() to give
up the reference in the error path. So split device_unregister() into
device_del() and put_device(), then goes to put dev when register fails.

Fixes: f204e0b8ce ("cxl: Driver code for powernv PCIe based cards for userspace access")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221111145440.2426970-2-yangyingliang@huawei.com
2022-11-24 23:31:25 +11:00
Yang Yingliang
f949ccee1d cxl: fix possible null-ptr-deref in cxl_guest_init_afu|adapter()
If device_register() fails in cxl_register_afu|adapter(), the device
is not added, device_unregister() can not be called in the error path,
otherwise it will cause a null-ptr-deref because of removing not added
device.

As comment of device_register() says, it should use put_device() to give
up the reference in the error path. So split device_unregister() into
device_del() and put_device(), then goes to put dev when register fails.

Fixes: 14baf4d9c7 ("cxl: Add guest-specific code")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221111145440.2426970-1-yangyingliang@huawei.com
2022-11-24 23:31:24 +11:00
Miaoqian Lin
1d09697ff2 cxl: Fix refcount leak in cxl_calc_capp_routing
of_get_next_parent() returns a node pointer with refcount incremented,
we should use of_node_put() on it when not need anymore.
This function only calls of_node_put() in normal path,
missing it in the error path.
Add missing of_node_put() to avoid refcount leak.

Fixes: f24be42aab ("cxl: Add psl9 specific code")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220605060038.62217-1-linmq006@gmail.com
2022-11-24 23:12:19 +11:00
Dave Airlie
d47f958083 Linux 6.1-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmN6wAgeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG0EYH/3/RO90NbrFItraN
 Lzr+d3VdbGjTu8xd1M+PRTmwh3zxLpB+Jwqr0T0A2gzL9B/D+AUPUJdrCVbv9DqS
 FLJAVqoeV20dNBAHSffOOLPsgCZ+Eu+LzlNN7Iqde0e8cyZICFMNktitui84Xm/i
 1NgFVgz9OZ6+aieYvUj3FrFq0p8GTIaC/oybDZrxYKcO8ZzKVMJ11swRw10wwq0g
 qOOECvV3w7wlQ8upQZkzFxItKFc7EexZI6R4elXeGSJJ9Hlc092dv/zsKB9dwV+k
 WcwkJrZRoezYXzgGBFxUcQtzi+ethjrPjuJuM1rYLUSIcfIW/0lkaSLgRoBu8D+I
 1GfXkXs=
 =gt6P
 -----END PGP SIGNATURE-----

Backmerge tag 'v6.1-rc6' into drm-next

Linux 6.1-rc6

This is needed for drm-misc-next and tegra.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2022-11-24 11:05:43 +10:00
Yang Yingliang
02cd3032b1 cxl: fix possible null-ptr-deref in cxl_pci_init_afu|adapter()
If device_register() fails in cxl_pci_afu|adapter(), the device
is not added, device_unregister() can not be called in the error
path, otherwise it will cause a null-ptr-deref because of removing
not added device.

As comment of device_register() says, it should use put_device() to give
up the reference in the error path. So split device_unregister() into
device_del() and put_device(), then goes to put dev when register fails.

Fixes: f204e0b8ce ("cxl: Driver code for powernv PCIe based cards for userspace access")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
Link: https://lore.kernel.org/r/20221111145440.2426970-2-yangyingliang@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-23 20:03:31 +01:00
Yang Yingliang
61c80d1c38 cxl: fix possible null-ptr-deref in cxl_guest_init_afu|adapter()
If device_register() fails in cxl_register_afu|adapter(), the device
is not added, device_unregister() can not be called in the error path,
otherwise it will cause a null-ptr-deref because of removing not added
device.

As comment of device_register() says, it should use put_device() to give
up the reference in the error path. So split device_unregister() into
device_del() and put_device(), then goes to put dev when register fails.

Fixes: 14baf4d9c7 ("cxl: Add guest-specific code")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>
Link: https://lore.kernel.org/r/20221111145440.2426970-1-yangyingliang@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-23 20:03:31 +01:00
Uwe Kleine-König
3127a86a37 misc: ds1682: Convert to i2c's .probe_new()
The probe function doesn't make use of the i2c_device_id * parameter so it
can be trivially converted.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20221118224540.619276-487-uwe@kleine-koenig.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-23 19:56:39 +01:00
Uwe Kleine-König
781edb0530 misc: bh1770glc: Convert to i2c's .probe_new()
The probe function doesn't make use of the i2c_device_id * parameter so it
can be trivially converted.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20221118224540.619276-486-uwe@kleine-koenig.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-23 19:56:39 +01:00
Uwe Kleine-König
9f28b675c1 misc: apds9802als: Convert to i2c's .probe_new()
The probe function doesn't make use of the i2c_device_id * parameter so it
can be trivially converted.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20221118224540.619276-484-uwe@kleine-koenig.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-23 19:56:39 +01:00
Uwe Kleine-König
6757c6480d misc: apds990x: Convert to i2c's .probe_new()
The probe function doesn't make use of the i2c_device_id * parameter so it
can be trivially converted.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20221118224540.619276-485-uwe@kleine-koenig.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-23 19:56:39 +01:00
Uwe Kleine-König
244179dbe1 misc: eeprom/idt_89hpesx: Convert to i2c's .probe_new()
The probe function doesn't make use of the i2c_device_id * parameter so it
can be trivially converted.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20221118224540.619276-489-uwe@kleine-koenig.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-23 19:56:39 +01:00
Uwe Kleine-König
db687ce718 misc: isl29003: Convert to i2c's .probe_new()
The probe function doesn't make use of the i2c_device_id * parameter so it
can be trivially converted.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20221118224540.619276-493-uwe@kleine-koenig.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-23 19:56:39 +01:00
Uwe Kleine-König
9c18dad44d misc: ics932s401: Convert to i2c's .probe_new()
The probe function doesn't make use of the i2c_device_id * parameter so it
can be trivially converted.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20221118224540.619276-492-uwe@kleine-koenig.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-23 19:56:39 +01:00
Uwe Kleine-König
654700c9fc misc: hmc6352: Convert to i2c's .probe_new()
The probe function doesn't make use of the i2c_device_id * parameter so it
can be trivially converted.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20221118224540.619276-491-uwe@kleine-koenig.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-23 19:56:38 +01:00
Uwe Kleine-König
99b0cb3f5f misc: eeprom/max6875: Convert to i2c's .probe_new()
The probe function doesn't make use of the i2c_device_id * parameter so it
can be trivially converted.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20221118224540.619276-490-uwe@kleine-koenig.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-23 19:56:08 +01:00
Uwe Kleine-König
327e1ad186 misc: isl29020: Convert to i2c's .probe_new()
The probe function doesn't make use of the i2c_device_id * parameter so it
can be trivially converted.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20221118224540.619276-494-uwe@kleine-koenig.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-23 19:56:07 +01:00
Uwe Kleine-König
8427bd8bde misc: tsl2550: Convert to i2c's .probe_new()
The probe function doesn't make use of the i2c_device_id * parameter so it
can be trivially converted.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20221118224540.619276-496-uwe@kleine-koenig.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-23 19:56:05 +01:00
Uwe Kleine-König
59ee8ca4ee misc: eeprom/eeprom: Convert to i2c's .probe_new()
The probe function doesn't make use of the i2c_device_id * parameter so it
can be trivially converted.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20221118224540.619276-488-uwe@kleine-koenig.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-23 19:56:03 +01:00
Uwe Kleine-König
7198cf0f1c misc: lis3lv02d/lis3lv02d_i2c: Convert to i2c's .probe_new()
The probe function doesn't make use of the i2c_device_id * parameter so it
can be trivially converted.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20221118224540.619276-495-uwe@kleine-koenig.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-23 19:56:01 +01:00
Zheng Wang
643a16a0eb misc: sgi-gru: fix use-after-free error in gru_set_context_option, gru_fault and gru_handle_user_call_os
In some bad situation, the gts may be freed gru_check_chiplet_assignment.
The call chain can be gru_unload_context->gru_free_gru_context->gts_drop
and kfree finally. However, the caller didn't know if the gts is freed
or not and use it afterwards. This will trigger a Use after Free bug.

Fix it by introducing a return value to see if it's in error path or not.
Free the gts in caller if gru_check_chiplet_assignment check failed.

Fixes: 55484c45db ("gru: allow users to specify gru chiplet 2")
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Acked-by: Dimitri Sivanich <sivanich@hpe.com>
Link: https://lore.kernel.org/r/20221110035033.19498-1-zyytlz.wz@163.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-23 19:55:48 +01:00
ruanjinjie
fd2c930cf6 misc: tifm: fix possible memory leak in tifm_7xx1_switch_media()
If device_register() returns error in tifm_7xx1_switch_media(),
name of kobject which is allocated in dev_set_name() called in device_add()
is leaked.

Never directly free @dev after calling device_register(), even
if it returned an error! Always use put_device() to give up the
reference initialized.

Fixes: 2428a8fe22 ("tifm: move common device management tasks from tifm_7xx1 to tifm_core")
Signed-off-by: ruanjinjie <ruanjinjie@huawei.com>
Link: https://lore.kernel.org/r/20221117064725.3478402-1-ruanjinjie@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-23 19:55:26 +01:00
Yang Yingliang
27158c7267 ocxl: fix pci device refcount leak when calling get_function_0()
get_function_0() calls pci_get_domain_bus_and_slot(), as comment
says, it returns a pci device with refcount increment, so after
using it, pci_dev_put() needs be called.

Get the device reference when get_function_0() is not called, so
pci_dev_put() can be called in the error path and callers
unconditionally. And add comment above get_dvsec_vendor0() to tell
callers to call pci_dev_put().

Fixes: 87db7579eb ("ocxl: control via sysfs whether the FPGA is reloaded on a link reset")
Suggested-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
Link: https://lore.kernel.org/r/20221121154339.4088935-1-yangyingliang@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-23 19:49:22 +01:00
Yang Yingliang
a4cb1004ae misc: ocxl: fix possible name leak in ocxl_file_register_afu()
If device_register() returns error in ocxl_file_register_afu(),
the name allocated by dev_set_name() need be freed. As comment
of device_register() says, it should use put_device() to give
up the reference in the error path. So fix this by calling
put_device(), then the name can be freed in kobject_cleanup(),
and info is freed in info_release().

Fixes: 75ca758adb ("ocxl: Create a clear delineation between ocxl backend & frontend")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>
Link: https://lore.kernel.org/r/20221111145929.2429271-1-yangyingliang@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-23 19:49:17 +01:00
Alexander Usyskin
0ef77698b8 mei: bus-fixup: change pxp mode only if message was sent
Move PXP mode state machine to SETUP mode only if
memory ready message sent successfully to the firmware.
Leave it in INIT mode otherwise to allow try to send message later.

Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Link: https://lore.kernel.org/r/20221116124735.2493847-3-alexander.usyskin@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-23 19:43:33 +01:00
Alexander Usyskin
83f47eea74 mei: add timeout to send
When driver wakes up the firmware from the low power state,
it is sending a memory ready message.
The send is done via synchronous/blocking function to ensure
that firmware is in ready state. However, in case of firmware
undergoing reset send might be block forever.
To address this issue a timeout is added to blocking
write command on the internal bus.

Introduce the __mei_cl_send_timeout function to use instead of
__mei_cl_send in cases where timeout is required.
The mei_cl_write has only two callers and there is no need to split
it into two functions.

Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Link: https://lore.kernel.org/r/20221116124735.2493847-2-alexander.usyskin@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-23 19:43:33 +01:00
Ohad Sharabi
19a17a9fb4 habanalabs: fix VA range calculation
Current implementation is fixing the page size to PAGE_SIZE whereas the
input page size may be different.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:54:10 +02:00
Ofir Bitton
5354a2a001 habanalabs: fail driver load if EEPROM errors detected
In case EEPROM is not burned, firmware sets default EEPROM values.
As this is not valid in production, driver should fail load upon any
EEPROM error reported by firmware.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:54:10 +02:00
Tomer Tayar
1b18cf33d6 habanalabs: make print of engines idle mask more readable
The engines idle mask was increased to be an array of 4 u64 entries.
To make the print of this mask more readable, remove the "0x" prefix,
and zero-pad each u64 to 16 bytes if either it isn't zero or if any of
the higher-order u64's is not zero.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:49:10 +02:00
Tomer Tayar
893afb248c habanalabs: clear non-released encapsulated signals
Reserved encapsulated signals which were not released hold the context
refcount, leading to a failure when killing the user process on device
reset or device fini.
Add the release of these left signals in the CS roll-back process.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:47:42 +02:00
Tomer Tayar
1f615120fc habanalabs: don't put context in hl_encaps_handle_do_release_sob()
hl_encaps_handle_do_release_sob() can be called only when the last
reference to the context object is released and hl_ctx_do_release() is
initiated, and therefore it shouldn't call hl_ctx_put().

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:46:29 +02:00
Tomer Tayar
408c46bd6e habanalabs: print context refcount value if hard reset fails
Failing to kill a user process during a hard reset can be due to a
reference to the user context which isn't released.
To make it easier to understand if this the reason for the failure and
not something else, add a print of the context refcount value.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:45:23 +02:00
Dafna Hirschfeld
0abcae8b48 habanalabs: add RMWREG32_SHIFTED to set a val within a mask
This is similar to RMWREG32, but the given 'val' is already shifted
according to the mask.
This allows several 'ORed' vals and masks to be set at once
The patch also fixes wrong usage of RMWREG32 by replacing
it with RMWREG32_SHIFTED

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:44:42 +02:00
Tomer Tayar
56fb517775 habanalabs: fix rc when new CPUCP opcodes are not supported
When the new CPUCP opcodes are not supported and a CPUCP packet fails,
the return value is the F/W error resposone which is a positive value.
If this packet is sent from IOCTL and the positive value is used, the
ICOTL will not be considered as unsuccessful.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:44:02 +02:00
Marco Pagani
6825b5f81f habanalabs/gaudi2: added memset for the cq_size register
The clang-analyzer reported a warning: "Value stored to
'cq_size_addr' is never read".

The cq_size register of dcore0 is not being zeroed using
gaudi2_memset_device_lbw(), along with the other cq_* registers,
even though the corresponding cq_size_addr variable is set.

Signed-off-by: Marco Pagani <marpagan@redhat.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:22:38 +02:00
Marco Pagani
5908560a7f habanalabs: added return value check for hl_fw_dynamic_send_clear_cmd()
The clang-analyzer reported a warning: "Value stored to 'rc' is never
read".

The return value check for the first hl_fw_dynamic_send_clear_cmd() call
in hl_fw_dynamic_send_protocol_cmd() appears to be missing.

Signed-off-by: Marco Pagani <marpagan@redhat.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:48 +02:00
Tomer Tayar
01907ba525 habanalabs: increase the size of busy engines mask
Increase the size of the busy engines mask in 'struct hl_info_hw_idle',
for future ASICs with more than 128 engines.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:48 +02:00
farah kassabri
18cd948204 habanalabs/gaudi2: change memory scrub mechanism
Currently the scrubbing mechanism used the EDMA engines by directly
setting the engine core registers to scrub a chunk of memory.
Due to a sporadic failure with this mechanism, it was decided to
initiate the engines via its QMAN using LIN-DMA packets.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:48 +02:00
Oded Gabbay
b585daa89d habanalabs: extend process wait timeout in device fine
Processes that use our device are likely to use at the same time other
devices such as remote storage.

In case our device is removed and a user process is still using the
device, we need to kill the user process. However, if that process
has a thread waiting for i/o to complete on remote storage, for example,
the process won't terminate.

Let's give it enough time to terminate before giving up.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
2022-11-23 16:13:48 +02:00
Oded Gabbay
f69c3e460a habanalabs: check schedule_hard_reset correctly
schedule_hard_reset can be true only if we didn't do hard-reset.
Therefore, no point of checking it in case hard_reset is true.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
2022-11-23 16:13:48 +02:00
Tomer Tayar
bc8e4bae70 habanalabs: reset device if still in use when released
If the device file is released while a context is still held, it won't
be possible to reopen it until the context is eventually released.
If that doesn't happen, only a device reset will revert it back to an
operational state, i.e. need to wait for a CS timeout or an error, or to
wait for an external intervention of injecting a reset via sysfs.

At this stage, after the device was released by user, context is held
either because of CS which were left running on the device and are not
relevant anymore, or due to missing cleanup steps from user side.

All of this is in any case handled in the device reset flow, so initiate
the reset at this point instead of waiting for it.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:48 +02:00
Tomer Tayar
9c604af0c9 habanalabs/gaudi2: return to reset upon SM SEI BRESP error
Due to a H/W issue in the LBW path to the PCIE_DBI MSI-X doorbell, there
were false sporadic error responses in SM when it was configured to
write to there, and hence no reset was done as part of handling the
relevant event.
Now that the virtual MSI-X doorbell is used, such errors in SM are not
expected and reset shouldn't be skipped.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:47 +02:00
Tomer Tayar
2c77ec14c2 habanalabs/gaudi2: don't enable entries in the MSIX_GW table
User should use the virtual MSI-X doorbell to generate interrupts from
the device, so there is no need to enable entries in the MSIX_GW table.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:47 +02:00
farah kassabri
24c983c88f habanalabs/gaudi2: remove redundant firmware version check
Firmware 1.7 is the first official firmware, so no need to check
if we are running a version below it.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:47 +02:00
Tomer Tayar
fe3e88c947 habanalabs/gaudi: fix print for firmware-alive event
Add missing le{32,64}_to_cpu conversions.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:47 +02:00
Tomer Tayar
5f8981d699 habanalabs: fix print for out-of-sync and pkt-failure events
Add missing le32_to_cpu() conversions, and use %d for the value
returned from atomic_read().

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:47 +02:00
Dani Liberman
d3027f4a62 habanalabs/gaudi2: add page fault notify event
Each time page fault happens, besides capturing its data, also notify
the user about it.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:47 +02:00
Ofir Bitton
a63de89bee habanalabs/gaudi2: classify power/thermal events as info
As power and thermal envelope events are pure informative and not
indicating an error, we reduce the print level to info only.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:46 +02:00
Ohad Sharabi
b829e01025 habanalabs: skip events info ioctl if not supported
Some ASICs haven't yet implemented this functionality and so the
ioctl call should fail and the user should be notified of the reason.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:46 +02:00
farah kassabri
3daa64eea1 habanalabs: fix firmware descriptor copy operation
This is needed to allow adding more data to the lkd_fw_comms_desc
structure.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:46 +02:00
Dani Liberman
413bdb176e habanalabs/gaudi2: add razwi notify event
Each time razwi (read-only zero, write ignored) event happens, besides
capturing its data, also notify the user about it.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:46 +02:00
Ofir Bitton
91bd822448 habanalabs/gaudi2: implement fp32 not supported event
Due to binning, Gaudi2 does not always support fp32.
We add support for such an event in case fp32 is used by the user
in such a device.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:46 +02:00
Dani Liberman
aff6354afd habanalabs/gaudi: add page fault notify event
Each time page fault happens, besides capturing its data, also notify
the user about it.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:46 +02:00
Dani Liberman
cd21701cde habanalabs: use single threaded WQ for event handling
Creating event queue workqueue using alloc_workqueue made it run in
multi threaded mode, which caused parallel dumping of events as well as
parallel events notifying to user, causing logs with multiple
events to be out of order.

Fixed by creating event queue workqueue as single threaded work queue.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:46 +02:00
Dani Liberman
cb5fb665f3 habanalabs/gaudi: add razwi notify event
Each time razwi (read-only zero, write ignore) happens, besides
capturing its data, also notify the user about it.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:46 +02:00
Ofir Bitton
841cd2d765 habanalabs/gaudi2: add PCI revision 2 support
Add support for Gaudi2 Device with PCI revision 2.
Functionality is exactly the same as revision 1, the only difference
is device name exposed to user.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:46 +02:00
Ofir Bitton
306206985a habanalabs: remove redundant gaudi2_sec asic type
As Gaudi2 has a single PCI id, the secured asic type is redundant.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:45 +02:00
Ofir Bitton
bdfef91e7c habanalabs: add warning print upon a PCI error
In order to know if driver catches PCI errors correctly, we need to
print a warning per each error.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:45 +02:00
Tomer Tayar
fc69aa8640 habanalabs: fix PCIe access to SRAM via debugfs
hl_access_sram_dram_region() uses a region base which is set within the
hl_set_dram_bar() function. However, for SRAM access this function is
not called, and we end up with a wrong value of region base and with a
bad calculated address.
Fix it by initializing the region base value independently of whether
hl_set_dram_bar() is called or not.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:45 +02:00
farah kassabri
679e968908 habanalabs: zero ts registration buff when allocated
To avoid memory corruption in kernel memory while using timestamp
registration nodes, zero the kernel buff memory when its allocated.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:45 +02:00
Tal Cohen
4a9c6e2cdf habanalabs: no consecutive err when user context is enabled
Consecutive error protects a device reset loop from being triggered
due to h/w issues and enters the device into an unavailable state.
When user may cause the error, an unavailable state
will prevent the user from running its workloads.

The commit prevents entering consecutive state when a user context
is enabled.

Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:44 +02:00
Tomer Tayar
1b363adc7f habanalabs: use graceful hard reset for CS timeouts
Use graceful hard reset when detecting a CS timeout that requires a
device reset.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:44 +02:00
Tomer Tayar
d1ce7e5ea1 habanalabs/gaudi2: use graceful hard reset for F/W events
Use graceful hard reset for F/W events on Gaudi2 device that require a
device reset.

While at it, do a small refactor of the checks and function calls,
to simplify it and to avoid code duplication.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:43 +02:00
Tomer Tayar
5b8873b39c habanalabs/gaudi: use graceful hard reset for F/W events
Use graceful hard reset for F/W events on Gaudi device that require a
device reset.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:43 +02:00
Tomer Tayar
11669b58fa habanalabs: add an option to control watchdog timeout via debugfs
Add an option to control the timeout value for the driver's watchdog
of the reset process. The timeout represents the amount of the user
has to close his process once he gets a device reset notification from
the driver.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:43 +02:00
Tomer Tayar
a88a6f5f5c habanalabs: add support for graceful hard reset
Calling hl_device_reset() for a hard reset will lead to a quite
immediate device reset and to killing user process.
For resets that follow errors, it disables the option to debug the
errors on both the device side and the user application side.

This patch adds a 'graceful hard reset' option and a new
hl_device_cond_reset() function.
Under some conditions, mainly if there is no user process or if he is
not registered to driver notifications, this function will execute hard
reset as usual.
Otherwise, the reset will be postponed and a notification will be sent
to user, to let him perform post-error actions and then to release the
device, after which reset will take place.

If device is not released by user in some defined time, a watchdog work
will execute the reset in any case.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:43 +02:00
Ohad Sharabi
d1e0ac37ed habanalabs: avoid divide by zero in device utilization
Currently there is no verification whether the divisor is legal.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:43 +02:00
Dani Liberman
6bcb2d05a5 habanalabs: fix user mappings calculation in case of page fault
As there are 2 types of user mappings, pmmu and hmmu, calculate
only the relevant mappings for the requested type.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:43 +02:00
Tomer Tayar
5ad06bb1d2 habanalabs/gaudi2: remove configurations to access the MSI-X doorbell
The virtual MSI-X doorbell is supported now in F/W, so all
configurations to access the PCIE_DBI MSI-X doorbell can be removed.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:43 +02:00
Ohad Sharabi
e325d5dbf3 habanalabs: allow setting HBM BAR to other regions
Up until now the use-case in the driver was that the HBM is accessed
using the HBM BAR, yet the BAR sometimes cannot cover the whole HBM and
so we needed to set the BAR to other HBM offset.
Now we are facing the need to access other PCI memory regions that can
be covered by the HBM BAR.
To answer that we are allowing the caller to determine if the HBM BAR
need to be set or not regardless of the PCI memory region.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:43 +02:00
Ohad Sharabi
24fdfb359c habanalabs: fix using freed pointer
The code uses the pointer for trace purpose (without actually
dereference it) but still get static analysis warning.
This patch eliminate the warning.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:43 +02:00
Dilip Puri
dc8d243cae habanalabs/gaudi2: unsecure CBU_EARLY_BRESP registers
NIC ARCs need to have access to CBU_EARLY_BRESP, hence we unsecure
those registers.

Signed-off-by: Dilip Puri <dilipp@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:42 +02:00
Tal Cohen
27cd39afde habanalabs: verify no zero event is sent
The event notifier mechanism should not raise an empty
event (event equals zero).

Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:42 +02:00
Dani Liberman
4f11694f27 habanalabs/gaudi2: capture page fault data
Capture page fault data when it happens.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:42 +02:00
Dani Liberman
15ac503cdc habanalabs/gaudi2: capture RAZWI information
Added function to calculate possible engines which caused
RAZWI (read-only zero, write ignored), from a given router id or
module index.

When getting RAZWI via PSOC IP, first the router id is calculated
and then the possible engines that caused the RAZWI are calculated.

There is a possibility that the RAZWI initiator is not an engine. In
that case, it will not be included in possible engines as it
doesn't have an engine id.

RAZWI information is captured when receiving event from engine or via
PSOC IP.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:41 +02:00
Dani Liberman
17f3f42af2 habanalabs: handle HBM MMU when capturing page fault data
In case of HBM MMU page fault, capture its relevant mappings.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:40 +02:00
Tomer Tayar
1eebb25929 habanalabs: move reset workqueue to be under hl_device
'struct hl_device_reset_work' is used as a wrapper for the reset work
and its parameters, including the reset workqueue on which it runs.
In a future commit, another reset related work with similar parameters
is going to be added, but it won't use the reset workqueue.

As in any case there is a single reset workqueue, and to allow the resue
of this structure, move the reset workqueue to 'struct hl_device'.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:40 +02:00
Tomer Tayar
51236cd95e habanalabs: allow unregistering eventfd when device non-operational
Unregistering eventfd is for releasing host resources and doesn't
involve an access to the device. As such, there is no reason to disallow
it when device isn't operational.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:40 +02:00
Tomer Tayar
3a83ebc521 habanalabs: skip idle status check if reset on device release
If reset upon device release is enabled, there is no need to check the
device idle status in hpriv_release(), because device is going to be
reset in any case.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:40 +02:00
Tal Cohen
5731b6e6f0 habanalabs/gaudi2: add device unavailable notification
Device unavailable notifies the user that there isn't an option to
retrieve debug information from the device.
When a critical device error occurs and the f/w performs the device
reset, a device unavailable notification shall be sent to the user
process.

Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:40 +02:00
Koby Elbaz
16448d6444 habanalabs/gaudi2: remove privileged MME clock configuration
Privileged MME clock configuration is removed as it is done by the f/w.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:40 +02:00
Dafna Hirschfeld
189b203ebb habanalabs: replace 'pf' to 'prefetch'
pf was an abbreviation for prefetch but because pf already stands
for 'physical function', we decided to change it to 'prefetch'.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:40 +02:00