Commit graph

998037 commits

Author SHA1 Message Date
Gustavo Pimentel
23188e0d45 dw-xdata-pcie: Update outdated info and improve text format
Removes old information related to the stop file interface in sysfs left
by mistake during patch revision.

Improves the document text format to be more user-friendly and adds
basic driver related information, such as support, datasheet, and author.

Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Link: https://lore.kernel.org/r/4e72f931474a784d478e5a67961ecf116911997a.1618066164.git.gustavo.pimentel@synopsys.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-14 19:47:28 +02:00
Gustavo Pimentel
b21a57636c dw-xdata-pcie: Fix documentation build warns
Fixes documentation build warnings related to indentation, text formatting,
and a missing toctree reference.

This fix solves the following warnings:

WARNING: Unexpected indentation.
WARNING: Block quote ends without a blank line; unexpected unindent.
WARNING: document isn't included in any toctree

Link: https://lore.kernel.org/linux-next/20210406214615.40cf3493@canb.auug.org.au/
Fixes: e1181b5bbc ("Documentation: misc-devices: Add Documentation for dw-xdata-pcie driver")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Link: https://lore.kernel.org/r/42ed2d9d27579291dc7cce89c0164bd9255fe337.1618066164.git.gustavo.pimentel@synopsys.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-14 19:47:28 +02:00
Greg Kroah-Hartman
31d8df9f4a MHI changes for v5.13

Merge tag 'mhi-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi into char-misc-next

Manivannan writes:

MHI changes for v5.13

core:

- Added support for Flash Programmer execution environment which allows the
  host machine (like x86) to flash the modem firmware to NAND or eMMC in the
  modem. The MHI bus will expose EDL channels (34, 35) and then the opensource
  QDL tool [1] can be used to flash the firmware from the host.
- Added an internal helper for polling the MHI registers with a retry interval.
  This helper is used now to poll for the MHI ready state in MHI STATUS
  register.
- Various fixes for issues found during the bringup of SDX24/SDX55 based Quectel
  and Telit modems.
- Updates to the Execution environment handling for proper downloading of the
  AMSS image from SBL (Secondary Bootloader) mode.
- Added support for sending STOP channel command to the MHI device and also made
  changes to the MHI core for proper handling of stop and restart.
- Fixed the runtime_pm handling in the core by forcing the device to be in wake
  mode until TX completion and allowing it to suspend for RX.
- Added sanity checks for values read from the device to avoid crash if those
  are corrupted somehow.
- Fixed warnings generated by sparse (W=2)
- Couple of kernel doc cleanups in mhi.h

pci_generic:

- Added support for runtime PM and generic PM
- Added Firehose channels for flashing the firmware
- Added support for modems such as Quectel EM1XXGR-L, SDX24, SDX65, Foxconn
  T99W175 exposing relevant channels.

[1] https://git.linaro.org/landing-teams/working/qualcomm/qdl.git

* tag 'mhi-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi: (49 commits)
  bus: mhi: fix typo in comments for struct mhi_channel_config
  bus: mhi: core: Fix shadow declarations
  bus: mhi: pci_generic: Constify mhi_controller_config struct definitions
  bus: mhi: pci_generic: Introduce Foxconn T99W175 support
  bus: mhi: core: Sanity check values from remote device before use
  bus: mhi: pci_generic: Add FIREHOSE channels
  bus: mhi: pci_generic: Implement PCI shutdown callback
  bus: mhi: Improve documentation on channel transfer setup APIs
  bus: mhi: core: Remove __ prefix for MHI channel unprepare function
  bus: mhi: core: Check channel execution environment before issuing reset
  bus: mhi: core: Clear configuration from channel context during reset
  bus: mhi: core: Hold device wake for channel update commands
  bus: mhi: core: Update debug messages to use client device
  bus: mhi: core: Improvements to the channel handling state machine
  bus: mhi: core: Clear context for stopped channels from remove()
  bus: mhi: core: Allow sending the STOP channel command
  bus: mhi: pci_generic: Add SDX65 based modem support
  bus: mhi: core: Remove pre_init flag used for power purposes
  bus: mhi: pm: reduce PM state change verbosity
  bus: mhi: core: Fix MHI runtime_pm behavior
  ...
2021-04-11 08:53:17 +02:00
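Sketch for the "polling the MHI registers with a retry interval" item in the merge above. This is an illustrative userspace analogue, not the MHI helper itself: the names poll_reg_field, reg_read_fn and fake_status_read are hypothetical, and the fake device only exists to show the loop terminating.

```c
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical register accessor: returns the current register value. */
typedef uint32_t (*reg_read_fn)(void *ctx);

/*
 * Poll until (reg & mask) == expected, sleeping delay_us between attempts.
 * Returns true on a match, false once `retries` reads have been made.
 */
static bool poll_reg_field(reg_read_fn read_reg, void *ctx, uint32_t mask,
                           uint32_t expected, unsigned int delay_us,
                           unsigned int retries)
{
    assert(read_reg != NULL);

    for (unsigned int i = 0; i < retries; i++) {
        if ((read_reg(ctx) & mask) == expected)
            return true;

        struct timespec ts = {
            .tv_sec = delay_us / 1000000u,
            .tv_nsec = (long)(delay_us % 1000000u) * 1000L,
        };
        nanosleep(&ts, NULL);
    }
    return false;
}

/* Fake device for illustration: reports ready (bit 0) on the third read. */
static uint32_t fake_status_read(void *ctx)
{
    unsigned int *reads = ctx;

    return ++(*reads) >= 3 ? 0x1u : 0x0u;
}
```

A caller would pass the status-register reader, the ready-bit mask, and a retry budget, and treat a false return as a timeout.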
Greg Kroah-Hartman
aa87e31757 This tag contains habanalabs driver changes for v5.13:

Merge tag 'misc-habanalabs-next-2021-04-10' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into char-misc-next

Oded writes:

This tag contains habanalabs driver changes for v5.13:

- Add support to reset device after the user closes the file descriptor.
  Because we support a single user, we can reset the device (if needed)
  after a user closes its file descriptor to make sure the device is in
  an idle and clean state for the next user.

- Add a new feature to allow the user to wait on interrupt. This is needed
  for future ASICs.

- Replace GFP_ATOMIC with GFP_KERNEL wherever possible and add code to
  support failure of allocating with GFP_ATOMIC.

- Update code to support the latest firmware image:
  - More security features are done in the firmware
  - Remove hard-coded assumptions and replace them with values that are
    sent to the firmware on loading.
  - Print device unusable error
  - Reset device in case the communication between driver and firmware
    gets out of sync.
  - Support new PCI device ids for secured GAUDI.

- Expose current power draw through the INFO IOCTL.

- Support resetting the device upon a request from the BMC (through F/W).

- Always use only a single MSI in GAUDI, due to H/W limitation.

- Improve data-path code by taking out code from spinlock protection.

- Allow user to specify custom timeout per Command Submission.

- Some enhancements to debugfs.

- Various minor changes and improvements.

* tag 'misc-habanalabs-next-2021-04-10' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux: (41 commits)
  habanalabs: print f/w boot unknown error
  habanalabs: update to latest F/W communication header
  habanalabs/gaudi: skip iATU if F/W security is enabled
  habanalabs/gaudi: derive security status from pci id
  habanalabs: move dram scrub to free sequence
  habanalabs: send dynamic msi-x indexes to f/w
  habanalabs/gaudi: clear QM errors only if not in stop_on_err mode
  habanalabs: support DEVICE_UNUSABLE error indication from FW
  habanalabs: use strscpy instead of sprintf and strlcpy
  habanalabs: remove the store jobs array from CS IOCTL
  habanalabs/gaudi: add debugfs to DMA from the device
  habanalabs/gaudi: sync stream add protection to SOB reset flow
  habanalabs: add custom timeout flag per cs
  habanalabs: improve utilization calculation
  habanalabs: support legacy and new pll indexes
  habanalabs: move relevant datapath work outside cs lock
  habanalabs: avoid soft lockup bug upon mapping error
  habanalabs/gaudi: Update async events header
  habanalabs/gaudi: unsecure TPC cfg status registers
  habanalabs/gaudi: always use single-msi mode
  ...
2021-04-11 08:52:09 +02:00
Phillip Potter
19ab233989 fbdev: zero-fill colormap in fbcmap.c
Use kzalloc() rather than kmalloc() for the dynamically allocated parts
of the colormap in fb_alloc_cmap_gfp, to prevent a leak of random kernel
data to userspace under certain circumstances.

Fixes a KMSAN-found infoleak bug reported by syzbot at:
https://syzkaller.appspot.com/bug?id=741578659feabd108ad9e06696f0c1f2e69c4b6e

Reported-by: syzbot+47fa9c9c648b765305b9@syzkaller.appspotmail.com
Cc: stable <stable@vger.kernel.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Phillip Potter <phil@philpotter.co.uk>
Link: https://lore.kernel.org/r/20210331220719.1499743-1-phil@philpotter.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-10 11:12:08 +02:00
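Sketch for 19ab233989 (illustrative, not taken from the patch). In userspace, calloc() versus malloc() is the closest analogue of kzalloc() versus kmalloc(): the zero-filling variant guarantees that entries the caller never writes read back as 0 rather than stale memory. The struct and function names below are hypothetical and much simpler than the kernel's struct fb_cmap.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for a colormap; not the kernel's struct fb_cmap. */
struct cmap_sketch {
    size_t len;
    uint16_t *red;
    uint16_t *green;
    uint16_t *blue;
};

/*
 * Allocate the channel arrays zero-filled, so unwritten entries can never
 * expose old heap contents when the map is later copied out.
 * Returns 0 on success, -1 on allocation failure.
 */
static int cmap_sketch_alloc(struct cmap_sketch *cmap, size_t len)
{
    assert(cmap != NULL && len > 0);

    cmap->red = calloc(len, sizeof(*cmap->red));
    cmap->green = calloc(len, sizeof(*cmap->green));
    cmap->blue = calloc(len, sizeof(*cmap->blue));
    if (!cmap->red || !cmap->green || !cmap->blue) {
        free(cmap->red);
        free(cmap->green);
        free(cmap->blue);
        cmap->red = cmap->green = cmap->blue = NULL;
        cmap->len = 0;
        return -1;
    }
    cmap->len = len;
    return 0;
}

/* Release the channel arrays and reset the struct to an empty state. */
static void cmap_sketch_free(struct cmap_sketch *cmap)
{
    free(cmap->red);
    free(cmap->green);
    free(cmap->blue);
    cmap->red = cmap->green = cmap->blue = NULL;
    cmap->len = 0;
}
```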
He Ying
2954a6f12f firmware: qcom-scm: Fix QCOM_SCM configuration
When CONFIG_QCOM_SCM is y and CONFIG_HAVE_ARM_SMCCC
is not set, compiling errors are encountered as follows:

drivers/firmware/qcom_scm-smc.o: In function `__scm_smc_do_quirk':
qcom_scm-smc.c:(.text+0x36): undefined reference to `__arm_smccc_smc'
drivers/firmware/qcom_scm-legacy.o: In function `scm_legacy_call':
qcom_scm-legacy.c:(.text+0xe2): undefined reference to `__arm_smccc_smc'
drivers/firmware/qcom_scm-legacy.o: In function `scm_legacy_call_atomic':
qcom_scm-legacy.c:(.text+0x1f0): undefined reference to `__arm_smccc_smc'

Note that __arm_smccc_smc is defined when HAVE_ARM_SMCCC is y.
So add dependency on HAVE_ARM_SMCCC in QCOM_SCM configuration.

Fixes: 916f743da3 ("firmware: qcom: scm: Move the scm driver to drivers/firmware")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: He Ying <heying24@huawei.com>
Link: https://lore.kernel.org/r/20210406094200.60952-1-heying24@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-10 11:01:19 +02:00
Yang Yingliang
0d5cf95465 speakup: i18n: Switch to kmemdup_nul() in spk_msg_set()
Use kmemdup_nul() helper instead of open-coding to
simplify the code in spk_msg_set().

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20210406034434.442251-1-yangyingliang@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-10 10:58:50 +02:00
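Sketch for 0d5cf95465 (illustrative). The open-coded pattern that kmemdup_nul() replaces is "allocate len + 1 bytes, copy len bytes, append a NUL". A userspace equivalent, under the hypothetical name memdup_nul_sketch, might look like this:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy `len` bytes from `src` into a fresh buffer of len + 1 bytes and
 * append a terminating NUL. Returns NULL on allocation failure; the
 * caller frees the result.
 */
static char *memdup_nul_sketch(const char *src, size_t len)
{
    assert(src != NULL);

    char *dst = malloc(len + 1);
    if (!dst)
        return NULL;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return dst;
}
```

Folding the three steps into one helper removes the chance of forgetting the terminator or miscounting the extra byte.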
Chen Huang
6c00365d53 w1: ds28e17: Use module_w1_family to simplify the code
module_w1_family() makes the code simpler by eliminating
boilerplate code.

Signed-off-by: Chen Huang <chenhuang5@huawei.com>
Link: https://lore.kernel.org/r/20210408130954.1158963-2-chenhuang5@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-10 10:58:21 +02:00
Chen Huang
88adcd6610 w1: ds2805: Use module_w1_family to simplify the code
module_w1_family() makes the code simpler by eliminating
boilerplate code.

Signed-off-by: Chen Huang <chenhuang5@huawei.com>
Link: https://lore.kernel.org/r/20210408130954.1158963-1-chenhuang5@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-10 10:58:21 +02:00
Hang Lu
a7dc1e6f99 binder: tell userspace to dump current backtrace when detected oneway spamming
When the async binder buffer is exhausted, some normal oneway transactions
are also discarded, which may cause system or application failures. By
that time, the binder debug information we dump may not be relevant to
the root cause, and the issue is difficult to debug without the
backtrace of the thread sending the spam.

This change sends BR_ONEWAY_SPAM_SUSPECT to userspace when oneway
spamming is detected, requesting a dump of the current backtrace. Oneway
spamming is reported only once when the threshold is exceeded (the target
process dips below 80% of its oneway space, and the current process is
responsible for either more than 50 transactions or more than 50% of the
oneway space). Detection restarts when the async buffer has returned to a
healthy state.

Acked-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Hang Lu <hangl@codeaurora.org>
Link: https://lore.kernel.org/r/1617961246-4502-3-git-send-email-hangl@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-10 10:52:04 +02:00
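Sketch for a7dc1e6f99 (illustrative, not the binder code). It models the report-once threshold described in the message. Two assumptions are made here: "dips below 80% of its oneway space" is read as "less than 20% of the oneway space is still free", and "healthy" is read as the inverse of that condition. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-target bookkeeping; not the kernel's struct binder_alloc. */
struct oneway_state {
    size_t async_total;   /* total oneway (async) space, in bytes */
    size_t async_free;    /* oneway space currently free, in bytes */
    bool spam_reported;   /* latched until the buffer is healthy again */
};

/*
 * Returns true when the sender should be asked to dump its backtrace.
 * sender_txns and sender_bytes describe what the current sender holds in
 * the target's oneway space.
 */
static bool oneway_spam_check(struct oneway_state *st, size_t sender_txns,
                              size_t sender_bytes)
{
    assert(st != NULL && st->async_total > 0);

    /* Assumed "healthy" state: at least 20% free. Re-arm detection. */
    if (st->async_free >= st->async_total / 5) {
        st->spam_reported = false;
        return false;
    }

    /* Already reported for this low-space episode. */
    if (st->spam_reported)
        return false;

    /* More than 50 transactions, or more than 50% of the oneway space. */
    if (sender_txns > 50 || sender_bytes > st->async_total / 2) {
        st->spam_reported = true;
        return true;
    }
    return false;
}
```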
Hang Lu
0051691574 binder: fix the missing BR_FROZEN_REPLY in binder_return_strings
Add BR_FROZEN_REPLY in binder_return_strings to support stat function.

Fixes: ae28c1be1e ("binder: BINDER_GET_FROZEN_INFO ioctl")
Acked-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Hang Lu <hangl@codeaurora.org>
Link: https://lore.kernel.org/r/1617961246-4502-2-git-send-email-hangl@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-10 10:52:04 +02:00
Jarvis Jiang
a503d1628c bus: mhi: fix typo in comments for struct mhi_channel_config
The word 'rung' is a typo in the comment below; fix it.
* @event_ring: The event rung index that services this channel

Signed-off-by: Jarvis Jiang <jarvis.w.jiang@gmail.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20210408100220.3853-1-jarvis.w.jiang@gmail.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2021-04-09 18:48:22 +05:30
Oded Gabbay
b575a7673e habanalabs: print f/w boot unknown error
We need to print a message to the kernel log in case we encounter
an unknown error in the f/w boot to help the user understand what
happened.

In addition, we shouldn't print the unknown-error message in case of known
errors.

Moreover, in case of warnings/info, we shouldn't return -EIO, which would
fail the initialization and mark the device as disabled.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:10:32 +03:00
Ohad Sharabi
669b018835 habanalabs: update to latest F/W communication header
Update files to the latest version from the F/W team.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:25 +03:00
Ofir Bitton
41f458f205 habanalabs/gaudi: skip iATU if F/W security is enabled
As part of securing GAUDI, the F/W will configure the PCI iATU
regions. If the driver identifies a secured PCI ID, it will know to
skip iATU configuration at a very early stage.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:25 +03:00
Ofir Bitton
e5042a6fa6 habanalabs/gaudi: derive security status from pci id
As the F/W security indication must be available before the driver
approaches the PCI bus, F/W security should be derived from the PCI ID
rather than fetched during the boot handshake with the F/W.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:25 +03:00
Bharat Jauhari
d4b1e5da54 habanalabs: move dram scrub to free sequence
DRAM scrubbing can take time, so it adds latency during allocation.
To minimize latency during initialization, scrubbing is moved to the
release call.
If scrubbing fails, the device is in a bad state,
so a HARD reset is initiated.

Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:25 +03:00
Ohad Sharabi
e9c2003be4 habanalabs: send dynamic msi-x indexes to f/w
In order to minimize hard coded values between F/W and the driver, we
send msi-x indexes dynamically to the F/W.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:24 +03:00
Tomer Tayar
1b4971573f habanalabs/gaudi: clear QM errors only if not in stop_on_err mode
Clearing QM errors by the driver will prevent these H/W blocks from
stopping in case they are configured to stop on errors, so perform this
clearing only if this mode is not in use.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:24 +03:00
Koby Elbaz
7d21114b03 habanalabs: support DEVICE_UNUSABLE error indication from FW
In case of multiple ECC errors, FW will set the DEVICE_UNUSABLE bit.
On boot-up, the driver will therefore fail inserting the device.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:24 +03:00
Oded Gabbay
ae382c22fc habanalabs: use strscpy instead of sprintf and strlcpy
Prefer the use of strscpy when copying the ASIC name into a char array,
to prevent accidentally exceeding the array's length.
In addition, strlcpy is frowned upon so replace it.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:24 +03:00
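Sketch for ae382c22fc (illustrative). The property that makes strscpy() preferable is that the destination is always NUL-terminated when its size is non-zero, and truncation is reported with -E2BIG instead of passing silently. The function below, strscpy_sketch, is a hypothetical userspace imitation of that contract, not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

/*
 * Copy src into dst, writing at most `count` bytes including the NUL.
 * Returns the number of characters copied (excluding the NUL), or
 * -E2BIG if count is 0 or src had to be truncated.
 */
static ssize_t strscpy_sketch(char *dst, const char *src, size_t count)
{
    assert(dst != NULL && src != NULL);

    if (count == 0)
        return -E2BIG;

    size_t i;
    for (i = 0; i < count - 1 && src[i] != '\0'; i++)
        dst[i] = src[i];
    dst[i] = '\0';   /* always terminated, even when truncated */

    return src[i] == '\0' ? (ssize_t)i : -E2BIG;
}
```

Copying an ASIC name into a fixed char array with this contract cannot overrun the array, and the caller can detect a name that did not fit.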
Oded Gabbay
131d1ba130 habanalabs: remove the store jobs array from CS IOCTL
The store part was never implemented in the code and was never used
by the userspace applications.

We currently use the related parameters for a different purpose with
a defined union. However, there is no point in that and it is better
to just remove the union and the store parameters.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:24 +03:00
Oded Gabbay
639781dcab habanalabs/gaudi: add debugfs to DMA from the device
When trying to debug a program, the user often needs to
dump large parts of the device's DRAM, which can reach tens of GBs.
Because reading from the device's internal memory through the PCI BAR
is extremely slow, debugging can take hours.

Instead, we can let the user copy data through one of the DMA
engines. This will make the operation much faster.

Currently, only GAUDI is supported.

In GAUDI, we need to find a PCI DMA engine that is IDLE and set the
DMA as secured to be able to bypass our MMU as we currently don't
map the temporary buffer to the MMU.

Example bash one-liner to dump the entire HBM to a file (~2 minutes):

for (( i=0x0; i < 0x800000000; i+=0x8000000 )); do \
printf '0x%x\n' $i | sudo tee /sys/kernel/debug/habanalabs/hl0/addr ; \
echo 0x8000000 | sudo tee /sys/kernel/debug/habanalabs/hl0/dma_size ; \
sudo cat /sys/kernel/debug/habanalabs/hl0/data_dma >> hbm.txt ; done

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:24 +03:00
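Worked arithmetic for the loop in 639781dcab (illustrative). The loop walks the range 0x0 to 0x800000000 (32 GiB) in steps of 0x8000000 (128 MiB), so it issues 256 DMA reads. The helper name dump_chunk_count is hypothetical; it only makes the rounding explicit for ranges that are not an exact multiple of the chunk size.

```c
#include <assert.h>
#include <stdint.h>

/* Number of fixed-size DMA reads needed to cover the range [start, end). */
static uint64_t dump_chunk_count(uint64_t start, uint64_t end, uint64_t chunk)
{
    assert(chunk > 0 && end >= start);

    return (end - start + chunk - 1) / chunk;   /* round up */
}
```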
farah kassabri
e65448faf4 habanalabs/gaudi: sync stream add protection to SOB reset flow
Since we moved the SOB reset flow to a workqueue, so that it is
no longer part of the fence release flow, we might reach a
scenario where a new context is created while we are in the middle
of resetting the SOB.
In such cases the reset may fail due to the idle check.
This will mess up the stream sync since the SOB value is invalid,
so we protect this area with a mutex to delay context creation.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:24 +03:00
Alon Mizrahi
cf39395034 habanalabs: add custom timeout flag per cs
There is a need to allow the user to send command submissions with a
custom timeout, as some CS take longer than the max timeout that is
used by default.

Signed-off-by: Alon Mizrahi <amizrahi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:24 +03:00
Koby Elbaz
cd5def8020 habanalabs: improve utilization calculation
The new approach is based on the notion that the relative
current power consumption is proportional to the
device's true utilization.
Utilization ranges between 0% and 100%.
Currently, dc_power values are hard-coded.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:24 +03:00
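Sketch for cd5def8020 (illustrative). The message only states that utilization is proportional to relative power consumption, lies in the 0 to 100 range, and uses a hard-coded dc_power. The linear mapping between dc_power and a max_power value below is an assumption made for this example, as are the function and parameter names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map the current power draw onto 0..100, assuming dc_power is the idle
 * baseline and max_power the draw at full load (both in the same unit).
 */
static uint32_t utilization_percent(uint64_t curr_power, uint64_t dc_power,
                                    uint64_t max_power)
{
    assert(max_power > dc_power);

    if (curr_power <= dc_power)
        return 0;     /* at or below the idle baseline */
    if (curr_power >= max_power)
        return 100;   /* clamp readings above the assumed maximum */

    return (uint32_t)((curr_power - dc_power) * 100 / (max_power - dc_power));
}
```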
Ohad Sharabi
e8f9392a5c habanalabs: support legacy and new pll indexes
In order to minimize the hard-coded values common to LKD and F/W,
this patch introduces a dynamic method to work with PLLs.
The formerly ASIC-specific PLL numbering is now common to all ASICs.
To be backward compatible, a bit in dev status is defined; if the bit is
not set, LKD will keep working with the old PLL numbering.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:24 +03:00
Ofir Bitton
8445dde1b9 habanalabs: move relevant datapath work outside cs lock
In order to shorten the time cs lock is being held, we move any
possible work outside of the cs lock.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:24 +03:00
farah kassabri
2f6274e477 habanalabs: avoid soft lockup bug upon mapping error
Add a short sleep between page unmappings in case mapping of
a large number of host pages failed, in order to
avoid a soft lockup bug during the rollback.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:24 +03:00
Ofir Bitton
d661d79930 habanalabs/gaudi: Update async events header
Update with latest version from the Firmware team.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:24 +03:00
Ofir Bitton
f951eb08a9 habanalabs/gaudi: unsecure TPC cfg status registers
Unsecure the relevant registers, as the TPC engine needs access to
TPC status.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:24 +03:00
Oded Gabbay
12e66a1727 habanalabs/gaudi: always use single-msi mode
The device can get into a deadlock if it uses indirect mode for MSI
interrupts (multi-msi) and has a hard-reset during an interrupt storm.

To prevent that, always use direct mode which means single-msi mode.

The F/W will prevent the host from writing to the indirect MSI
registers to prevent any malicious user from causing this scenario.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:23 +03:00
Ofir Bitton
2ea09537ad habanalabs/gaudi: reset device upon BMC request
In case the BMC of the devices' box wants to initiate a reset of
a specific device, it must go through the driver.
Once the driver receives the request, it will initiate a hard reset
flow.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:23 +03:00
Ofir Bitton
a5778d10a1 habanalabs: debugfs access to user mapped host addresses
In order to improve debuggability, we allow debugfs access
to user MMU-mapped host memory. Non-user host memory access will be
rejected.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:23 +03:00
Yang Li
dd0a25c77a habanalabs: Switch to using the new API kobj_to_dev()
Fix the following coccicheck warning:
./drivers/misc/habanalabs/common/sysfs.c:347:60-61: WARNING opportunity
for kobj_to_dev()

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:23 +03:00
Ohad Sharabi
99cb017e72 habanalabs: update hl_boot_if.h
Update to the latest version of the file as supplied by the F/W.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:23 +03:00
Ohad Sharabi
e42a6400fb habanalabs: skip DISABLE PCI packet to FW on heartbeat
If the reset is due to heartbeat, the device CPU is not responsive, in
which case there is no point in sending a PCI disable message to it.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:23 +03:00
Ofir Bitton
d5eb8373b2 habanalabs: replace GFP_ATOMIC with GFP_KERNEL
Due to incorrect assumptions that some of the
initialization and data path flows cannot sleep, most allocations
are being done using GFP_ATOMIC.
We modify the code to use GFP_ATOMIC only when really needed, as
sleepable flows should use GFP_KERNEL.
In addition, add a fallback to allocate memory using GFP_KERNEL
once the ATOMIC allocation fails.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:23 +03:00
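Sketch for d5eb8373b2 (illustrative). The fallback described in the message is "try the non-sleeping allocation first, then retry with one that may sleep". Userspace has no GFP flags, so the two allocators are passed in as function pointers; all names here are hypothetical stand-ins, and the pattern is only valid in a context that is actually allowed to sleep.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void *(*alloc_fn)(size_t size);

/*
 * Try the non-sleeping allocator first; if it fails, retry with the one
 * that may sleep. Returns NULL only if both attempts fail.
 */
static void *alloc_with_fallback(alloc_fn atomic_alloc, alloc_fn sleeping_alloc,
                                 size_t size)
{
    assert(atomic_alloc != NULL && sleeping_alloc != NULL);

    void *p = atomic_alloc(size);
    if (!p)
        p = sleeping_alloc(size);
    return p;
}

/* Stand-in for an exhausted atomic pool: always fails. */
static void *exhausted_atomic_alloc(size_t size)
{
    (void)size;
    return NULL;
}

/* Stand-in for an allocation that is allowed to sleep. */
static void *sleeping_alloc_sketch(size_t size)
{
    return malloc(size);
}
```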
Ofir Bitton
f209e5ad18 habanalabs/gaudi: update extended async event header
Update to the latest definition of the firmware

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:23 +03:00
Sagiv Ozeri
586f2caf0e habanalabs: return current power via INFO IOCTL
Add driver implementation for reading the current power from the device
CPU F/W.

Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:23 +03:00
Sagiv Ozeri
a4371c1a1e habanalabs: support HW blocks vm show
Improve the "vm" debugfs node to also print the virtual addresses which
are currently mapped to HW blocks in the device.

Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:23 +03:00
Ofir Bitton
6a2f5d7098 habanalabs: use a single FW loading bringup flag
For simplicity, use a single bringup flag indicating which FW
binaries should be loaded to the device.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:23 +03:00
Oded Gabbay
366addb0c3 habanalabs: use correct define for 32-bit max value
Timeout in wait for interrupt is in 32-bit variable so we need to use
the correct maximum value to compare.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:23 +03:00
Ofir Bitton
ab5f5c3089 habanalabs: wait for interrupt support
In order to support command submissions from user space, the driver
needs to add support for user interrupt completions. The driver will
allow multiple user threads to wait for an interrupt and perform
a comparison with a given user address once the interrupt expires.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:23 +03:00
Ofir Bitton
2d44c6f6b3 habanalabs: enable all IRQs for user interrupt support
In order to support user interrupts, the driver must enable all MSI-X
interrupts in case the user triggers them. We differentiate
between a valid user interrupt and an invalid one.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:22 +03:00
Ohad Sharabi
5d6a198f9d habanalabs: reset device in case of sync error
As the F/W is the first to detect an out-of-sync event, a new event is
added to notify the driver of such an event, in which case the driver
performs a hard reset.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:22 +03:00
Oded Gabbay
17b59dd339 habanalabs: change default CS timeout to 30 seconds
Because our graph contains network operations, we need to account
for delay in the network.

5 seconds timeout per CS is not enough to account for that.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:22 +03:00
Oded Gabbay
278b5f7acb habanalabs: print if device is used on FD close
Notify the user that although they closed the FD, the device is
still in use because there are live CS and/or memory mappings (mmaps).

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:22 +03:00
Oded Gabbay
d3ee681afd habanalabs: reset_upon_device_release is for bring-up
Move the field to correct location in structure and remove comment.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:22 +03:00
Oded Gabbay
23c3efd1fb habanalabs: fail reset if device is not idle
After any reset (soft or hard) the device (the engines/QMANs) should
be idle. If they are not idle, fail the reset. If it is soft-reset,
the driver will try to do hard-reset automatically. If it is hard-reset,
the driver will make the device non-operational.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:22 +03:00
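Sketch for 23c3efd1fb (illustrative). The post-reset decision described in the message reduces to a small table: idle means success, a non-idle soft reset escalates to a hard reset, and a non-idle hard reset leaves the device non-operational. The enum and function names are hypothetical and do not come from the driver.

```c
#include <assert.h>
#include <stdbool.h>

enum reset_kind { RESET_SOFT, RESET_HARD };

enum reset_outcome {
    RESET_OK,                 /* engines/QMANs idle, reset succeeded */
    RESET_ESCALATE_TO_HARD,   /* soft reset failed, try a hard reset */
    RESET_DEVICE_DISABLED     /* hard reset failed, device unusable */
};

/* Decide what to do after a reset, based on whether the engines are idle. */
static enum reset_outcome post_reset_check(enum reset_kind kind, bool engines_idle)
{
    assert(kind == RESET_SOFT || kind == RESET_HARD);

    if (engines_idle)
        return RESET_OK;

    return kind == RESET_SOFT ? RESET_ESCALATE_TO_HARD : RESET_DEVICE_DISABLED;
}
```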