Commit Graph

1136491 Commits

Author SHA1 Message Date
Bryan Brattlof 46cab93ab4 thermal/drivers/k3_j72xx_bandgap: Simplify k3_thermal_get_temp() function
The k3_thermal_get_temp() function can be simplified to return only
the result of k3_bgp_read_temp() without needing the 'ret' variable

Signed-off-by: Bryan Brattlof <bb@ti.com>
Link: https://lore.kernel.org/r/20221031232702.10339-2-bb@ti.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
2022-12-14 15:25:40 +01:00
Manivannan Sadhasivam 2baad24963 thermal/drivers/qcom: Demote error log of thermal zone register to debug
devm_thermal_of_zone_register() can fail with -ENODEV if thermal zone for
the channel is not represented in DT. This is perfectly fine since not all
sensors needs to be used for thermal zones but only a few in real world.

So demote the error log to debug to avoid spamming users.

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20221029052933.32421-1-manivannan.sadhasivam@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
2022-12-14 15:25:40 +01:00
Luca Weiss 8763f8acbf thermal/drivers/qcom/temp-alarm: Fix inaccurate warning for gen2
On gen2 chips the stage2 threshold is not 140 degC but 125 degC.

Make the warning message clearer by using this variable and also by
including the temperature that was checked for.

Fixes: aa92b3310c ("thermal/drivers/qcom-spmi-temp-alarm: Add support for GEN2 rev 1 PMIC peripherals")
Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Reviewed-by: Amit Kucheria <amitk@kernel.org>
Link: https://lore.kernel.org/r/20221020145237.942146-1-luca.weiss@fairphone.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
2022-12-14 15:25:40 +01:00
Krzysztof Kozlowski fa17c4136d dt-bindings: thermal: qcom-tsens: narrow interrupts for SC8280XP, SM6350 and SM8450
Narrow number of interrupts per variants: SC8280XP, SM6350 and SM8450.
The compatibles are already used and described.  They only missed the
constraints of number of interrupts.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20221116113140.69587-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
2022-12-14 15:25:40 +01:00
Christophe JAILLET de04f680b0 thermal/core/power allocator: Remove a useless include
This file does not use rcu, so there is no point in including
<linux/rculist.h>.

Remove it.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/9adeec47cb5a8193016272d5c8bf936235c1711d.1669459337.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
2022-12-14 15:25:40 +01:00
Alexander Stein de95d1341a thermal/drivers/imx8mm: Add hwmon support
Expose thermal sensors as HWMON devices.

Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Link: https://lore.kernel.org/r/20220726122331.323093-1-alexander.stein@ew.tq-group.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
2022-12-14 15:25:40 +01:00
Johan Hovold 6f89416462 thermal: qcom-spmi-adc-tm5: suppress probe-deferral error message
Drivers should not be logging errors on probe deferral. Switch to using
dev_err_probe() to log failures when parsing the devicetree to avoid
errors like:

	qcom-spmi-adc-tm5 c440000.spmi:pmic@0:adc-tm@3400: get dt data failed: -517

when a channel is not yet available.

Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Link: https://lore.kernel.org/r/20221102152630.696-1-johan+linaro@kernel.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
2022-12-14 15:25:40 +01:00
Daniel Golle c464856e63 dt-bindings: thermal: mediatek: add compatible string for MT7986 and MT7981 SoC
Document compatible string 'mediatek,mt7986-thermal' for V3 thermal
unit found in MT7986 SoCs.
'mediatek,mt7981-thermal' is also added as it is identical with the
thermal unit of MT7986.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
2022-12-14 15:25:40 +01:00
Geert Uytterhoeven 3f9cb57962 thermal: ti-soc-thermal: Drop comma after SoC match table sentinel
It does not make sense to have a comma after a sentinel, as any new
elements must be added before the sentinel.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Keerthy <j-keerthy@ti.com>
Link: https://lore.kernel.org/r/1d6de2a80b919cb11199e56ac06ad21c273ebe57.1669045586.git.geert+renesas@glider.be
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
2022-12-14 15:25:40 +01:00
Marek Vasut 4032916488 thermal/drivers/imx: Add support for loading calibration data from OCOTP
The TMU TASR, TCALIVn, TRIM registers must be explicitly programmed with
calibration values in OCOTP. Add support for reading the OCOTP calibration
data and programming those into the TMU hardware.

The MX8MM/MX8MN TMUv1 uses only one OCOTP cell, while MX8MP TMUv2 uses 4,
the programming differs in each case.

Based on U-Boot commits:
70487ff386c ("imx8mm: Load fuse for TMU TCALIV and TASR")
ebb9aab318b ("imx: load calibration parameters from fuse for i.MX8MP")

Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
2022-12-14 15:25:40 +01:00
Marek Vasut 8848c0d7a0 dt-bindings: thermal: imx8mm-thermal: Document optional nvmem-cells
The TMU TASR, TCALIVn, TRIM registers must be explicitly programmed with
calibration values from OCOTP. Document optional phandle to OCOTP nvmem
provider.

Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
2022-12-14 15:25:40 +01:00
Christian Marangi 89992d95ed thermal/drivers/qcom/tsens: Rework debugfs file structure
The current tsens debugfs structure is composed by:
- a tsens dir in debugfs with a version file
- a directory for each tsens istance with sensors file to dump all the
  sensors value.

This works on the assumption that we have the same version for each
istance but this assumption seems fragile and with more than one tsens
istance results in the version file not tracking each of them.

A better approach is to just create a subdirectory for each tsens
istance and put there version and sensors debugfs file.

Using this new implementation results in less code since debugfs entry
are created only on successful tsens probe.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Link: https://lore.kernel.org/r/20221022125657.22530-4-ansuelsmth@gmail.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
2022-12-14 15:25:40 +01:00
Christian Marangi c7e077e921 thermal/drivers/qcom/tsens: Fix wrong version id dbg_version_show
For VER_0 the version was incorrectly reported as 0.1.0.

Fix that and correctly report the major version for this old tsens
revision.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Link: https://lore.kernel.org/r/20221022125657.22530-3-ansuelsmth@gmail.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
2022-12-14 15:25:40 +01:00
Christian Marangi de48d8766a thermal/drivers/qcom/tsens: Init debugfs only with successful probe
Calibrate and tsens_register can fail or PROBE_DEFER. This will cause a
double or a wrong init of the debugfs information. Init debugfs only
with successful probe fixing warning about directory already present.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Acked-by: Thara Gopinath <thara.gopinath@linaro.org>
Link: https://lore.kernel.org/r/20221022125657.22530-2-ansuelsmth@gmail.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
2022-12-14 15:25:40 +01:00
Robert Marko 6840455deb thermal/drivers/tsens: Add IPQ8074 support
Qualcomm IPQ8074 uses tsens v2.3 IP, however unlike other tsens v2 IP
it only has one IRQ, that is used for up/low as well as critical.
It also does not support negative trip temperatures.

Signed-off-by: Robert Marko <robimarko@gmail.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220818220245.338396-4-robimarko@gmail.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2022-12-14 15:25:40 +01:00
Robert Marko f63baced38 thermal/drivers/tsens: Allow configuring min and max trips
IPQ8074 and IPQ6018 dont support negative trip temperatures and support
up to 204 degrees C as the max trip temperature.

So, instead of always setting the -40 as min and 120 degrees C as max
allow it to be configured as part of the features.

Signed-off-by: Robert Marko <robimarko@gmail.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220818220245.338396-3-robimarko@gmail.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2022-12-14 15:25:40 +01:00
Robert Marko 4360af3527 thermal/drivers/tsens: Add support for combined interrupt
Despite using tsens v2.3 IP, IPQ8074 and IPQ6018 only have one IRQ for
signaling both up/low and critical trips.

Signed-off-by: Robert Marko <robimarko@gmail.com>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20220818220245.338396-2-robimarko@gmail.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2022-12-14 15:25:40 +01:00
Robert Marko c6db32ec7c dt-bindings: thermal: tsens: Add ipq8074 compatible
Qualcomm IPQ8074 has tsens v2.3.0 block, though unlike existing v2 IP it
only uses one IRQ, so tsens v2 compatible cannot be used as the fallback.

We also have to make sure that correct interrupts are set according to
compatibles, so populate interrupt information per compatibles.

Signed-off-by: Robert Marko <robimarko@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220818220245.338396-1-robimarko@gmail.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2022-12-14 15:25:40 +01:00
Ido Schimmel 7ef2f023c2 thermal/of: Fix memory leak on thermal_of_zone_register() failure
The function does not free 'of_ops' upon failure, leading to a memory
leak [1].

Fix by freeing 'of_ops' in the error path.

[1]
unreferenced object 0xffff8ee846198c80 (size 128):
  comm "swapper/0", pid 1, jiffies 4294699704 (age 70.076s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    d0 3f 6e 8c ff ff ff ff 00 00 00 00 00 00 00 00  .?n.............
  backtrace:
    [<00000000d136f562>] __kmalloc_node_track_caller+0x42/0x120
    [<0000000063f31678>] kmemdup+0x1d/0x40
    [<00000000e6d24096>] thermal_of_zone_register+0x49/0x520
    [<000000005e78c755>] devm_thermal_of_zone_register+0x54/0x90
    [<00000000ee6b209e>] pmbus_add_sensor+0x1b4/0x1d0
    [<00000000896105e3>] pmbus_add_sensor_attrs_one+0x123/0x440
    [<0000000049e990a6>] pmbus_add_sensor_attrs+0xfe/0x1d0
    [<00000000466b5440>] pmbus_do_probe+0x66b/0x14e0
    [<0000000084d42285>] i2c_device_probe+0x13b/0x2f0
    [<0000000029e2ae74>] really_probe+0xce/0x2c0
    [<00000000692df15c>] driver_probe_device+0x19/0xd0
    [<00000000547d9cce>] __device_attach_driver+0x6f/0x100
    [<0000000020abd24b>] bus_for_each_drv+0x76/0xc0
    [<00000000665d9563>] __device_attach+0xfc/0x180
    [<000000008ddd4d6a>] bus_probe_device+0x82/0xa0
    [<000000009e61132b>] device_add+0x3fe/0x920

Fixes: 3fd6d6e2b4 ("thermal/of: Rework the thermal device tree initialization")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20221020103658.802457-1-idosch@nvidia.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2022-12-14 15:25:39 +01:00
Keerthy a7c42af78b thermal/drivers/k3_j72xx_bandgap: Fix the debug print message
The debug print message to check the workaround applicability is inverted.
Fix the same.

Fixes: ffcb2fc86e ("thermal: k3_j72xx_bandgap: Add the bandgap driver support")
Reported-by: Bryan Brattlof <bb@ti.com>
Signed-off-by: Keerthy <j-keerthy@ti.com>
Link: https://lore.kernel.org/r/20221010034126.3550-1-j-keerthy@ti.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2022-12-14 15:25:39 +01:00
Rob Herring 87f9fe8c4b dt-bindings: thermal: Convert generic-adc-thermal to DT schema
Convert the 'generic-adc-thermal' binding to DT schema format.

The binding said '#thermal-sensor-cells' should be 1, but all in tree
users are 0 and 1 doesn't make sense for a single channel.

Drop the example's related providers and consumers of the
'generic-adc-thermal' node as the convention is to not have those in
the examples.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20221011175235.3191509-1-robh@kernel.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2022-12-14 15:25:39 +01:00
Marcus Folkesson d37edc7370 thermal/drivers/imx8mm_thermal: Validate temperature range
Check against the upper temperature limit (125 degrees C) before
consider the temperature valid.

Fixes: 5eed800a68 ("thermal: imx8mm: Add support for i.MX8MM thermal monitoring unit")
Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com>
Reviewed-by: Jacky Bai <ping.bai@nxp.com>
Link: https://lore.kernel.org/r/20221014073507.1594844-1-marcus.folkesson@gmail.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2022-12-14 15:25:39 +01:00
Marcus Folkesson 1f455f144f thermal/drivers/imx8mm_thermal: Use GENMASK() when appropriate
GENMASK() is preferred to use for bitmasks.

Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com>
Link: https://lore.kernel.org/r/20221014081620.1599511-1-marcus.folkesson@gmail.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2022-12-14 15:25:39 +01:00
Luca Weiss f0f4c3adcf dt-bindings: thermal: tsens: Add sm8450 compatible
Document the tsens-v2 compatible for sm8450 SoC.

Signed-off-by: Luca Weiss <luca@z3ntu.xyz>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20221016090035.565350-5-luca@z3ntu.xyz
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2022-12-14 15:25:39 +01:00
Rafael J. Wysocki 7d4b19ab6b Merge Intel thermal control drivers changes for v6.2
- Add Raptor Lake-S support to the intel_tcc_cooling driver (Zhang
   Rui).

 - Make the intel_tcc_cooling driver detect TCC locking (Zhang Rui).

 - Address Coverity warning in intel_hfi_process_event() (Ricardo Neri).

 - Prevent accidental clearing of HFI in the package thermal interrupt
   status (Srinivas Pandruvada).

 - Protect the clearing of status bits in MSR_IA32_PACKAGE_THERM_STATUS
   and MSR_IA32_THERM_STATUS (Srinivas Pandruvada).

 - Allow the HFI interrupt handler to ACK an event for the same
   timestamp (Srinivas Pandruvada).

* thermal-intel:
  thermal: intel: hfi: ACK HFI for the same timestamp
  thermal: intel: Protect clearing of thermal status bits
  thermal: intel: Prevent accidental clearing of HFI status
  thermal: intel: intel_tcc_cooling: Add TCC cooling support for RaptorLake-S
  thermal: intel: intel_tcc_cooling: Detect TCC lock bit
  thermal: intel: hfi: Improve the type of hfi_features::nr_table_pages
2022-12-02 19:39:07 +01:00
Yang Yingliang 4748f9687c thermal: core: fix some possible name leaks in error paths
In some error paths before device_register(), the names allocated
by dev_set_name() are not freed. Move dev_set_name() front to
device_register(), so the name can be freed while calling
put_device().

Fixes: 1dd7128b83 ("thermal/core: Fix null pointer dereference in thermal_release()")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-25 19:51:41 +01:00
Srinivas Pandruvada c0e3acdcde thermal: intel: hfi: ACK HFI for the same timestamp
Some processors issue more than one HFI interrupt with the same
timestamp. Each interrupt must be acknowledged to let the hardware issue
new HFI interrupts. But this can't be done without some additional flow
modification in the existing interrupt handling.

For background, the HFI interrupt is a package level thermal interrupt
delivered via a LVT. This LVT is common for both the CPU and package
level interrupts. Hence, all CPUs receive the HFI interrupts. But only
one CPU should process interrupt and others simply exit by issuing EOI
to LAPIC.

The current HFI interrupt processing flow:

  1. Receive Thermal interrupt
  2. Check if there is an active HFI status in MSR_IA32_THERM_STATUS
  3. Try and get spinlock, one CPU will enter spinlock and others
     will simply return from here to issue EOI.
    (Let's assume CPU 4 is processing interrupt)
  4. Check the stored time-stamp from the HFI memory time-stamp
  5. if same
  6.      ignore interrupt, unlock and return
  7. Copy the HFI message to local buffer
  8. unlock spinlock
  9. ACK HFI interrupt
 10. Queue the message for processing in a work-queue

It is tempting to simply acknowledge all the interrupts even if they
have the same timestamp. This may cause some interrupts to not be
processed.

Let's say CPU5 is slightly late and reaches step 4 while CPU4 is
between steps 8 and 9.

Currently we simply ignore interrupts with the same timestamp. No
issue here for CPU5. When CPU4 acknowledges the interrupt, the next
HFI interrupt can be delivered.

If we acknowledge interrupts with the same timestamp (at step 6), there
is a race condition. Under the same scenario, CPU 5 will acknowledge
the HFI interrupt. This lets hardware generate another HFI interrupt,
before CPU 4 start executing step 9. Once CPU 4 complete step 9, it
will acknowledge the newly arrived HFI interrupt, without actually
processing it.

Acknowledge the interrupt when holding the spinlock. This avoids
contention of the interrupt acknowledgment.

Updated flow:

  1. Receive HFI Thermal interrupt
  2. Check if there is an active HFI status in MSR_IA32_THERM_STATUS
  3. Try and get spin-lock
     Let's assume CPU 4 is processing interrupt
  4.1 Read MSR_IA32_PACKAGE_THERM_STATUS and check HFI status bit
  4.2	If hfi status is 0
  4.3		unlock spinlock
  4.4		return
  4.5 Check the stored time-stamp from the HFI memory time-stamp
  5. if same
  6.1      ACK HFI Interrupt,
  6.2	unlock spinlock
  6.3	return
  7. Copy the HFI message to local buffer
  8. ACK HFI interrupt
  9. unlock spinlock
 10. Queue the message for processing in a work-queue

To avoid taking the lock unnecessarily, intel_hfi_process_event() checks
the status of the HFI interrupt before taking the lock. If CPU5 is late,
when it starts processing the interrupt there are two scenarios:

 a) CPU4 acknowledged the HFI interrupt before CPU5 read
    MSR_IA32_THERM_STATUS. CPU5 exits.

 b) CPU5 reads MSR_IA32_THERM_STATUS before CPU4 has acknowledged the
    interrupt. CPU5 will take the lock if CPU4 has released it. It then
    re-reads MSR_IA32_THERM_STATUS. If there is not a new interrupt,
    the HFI status bit is clear and CPU5 exits. If a new HFI interrupt
    was generated it will find that the status bit is set and it will
    continue to process the interrupt. In this case even if timestamp
    is not changed, the ACK can be issued as this is a new interrupt.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Tested-by: Arshad, Adeel<adeel.arshad@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-23 20:13:22 +01:00
Srinivas Pandruvada 930d06bf07 thermal: intel: Protect clearing of thermal status bits
The clearing of the package thermal status is done by Read-Modify-Write
operation. This may result in clearing of some new status bits which are
being or about to be processed.

For example, while clearing of HFI status, after read of thermal status
register, a new thermal status bit is set by the hardware. But during
write back, the newly generated status bit will be set to 0 or cleared.
So, it is not safe to do read-modify-write.

Since thermal status Read-Write bits can be set to only 0 not 1, it is
safe to set all other bits to 1 which are not getting cleared.

Create a common interface for clearing package thermal status bits. Use
this interface to replace existing code to clear thermal package status
bits.

It is safe to call from different CPUs without protection as there is no
read-modify-write. Also wrmsrl results in just single instruction. For
example while CPU 0 and CPU 3 are clearing bit 1 and 3 respectively. If
CPU 3 wins the race, it will write 0x4000aa2, then CPU 1 will write
0x4000aa8. The bits which are not part of clear are set to 1. The default
mask for bits, which can be written here is 0x4000aaa.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-23 20:09:06 +01:00
Srinivas Pandruvada 6fe1e64b60 thermal: intel: Prevent accidental clearing of HFI status
When there is a package thermal interrupt with PROCHOT log, it will be
processed and cleared. It is possible that there is an active HFI event
status, which is about to get processed or getting processed. While
clearing PROCHOT log bit, it will also clear HFI status bit. This means
that hardware is free to update HFI memory.

When clearing a package thermal interrupt, some processors will generate
a "general protection fault" when any of the read only bit is set to 1.

The driver maintains a mask of all read-write bits which can be set.

This mask doesn't include HFI status bit. This bit will also be cleared,
as it will be assumed read-only bit. So, add HFI status bit 26 to the
mask.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-23 20:09:06 +01:00
Guenter Roeck b778b4d782 thermal/core: Protect thermal device operations against thermal device removal
Thermal device operations may be called after thermal zone device removal.
After thermal zone device removal, thermal zone device operations must
no longer be called. To prevent such calls from happening, ensure that
the thermal device is registered before executing any thermal device
operations.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-14 19:04:37 +01:00
Guenter Roeck 91b3aafc22 thermal/core: Remove thermal_zone_set_trips()
Since no callers of thermal_zone_set_trips() are left, remove the function.
Document __thermal_zone_set_trips() instead. Explicitly state that the
thermal zone lock must be held when calling the function, and that the
pointer to the thermal zone must be valid.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-14 19:04:37 +01:00
Guenter Roeck 05eeee2b51 thermal/core: Protect sysfs accesses to thermal operations with thermal zone mutex
Protect access to thermal operations against thermal zone removal by
acquiring the thermal zone device mutex. After acquiring the mutex, check
if the thermal zone device is registered and abort the operation if not.

With this change, we can call __thermal_zone_device_update() instead of
thermal_zone_device_update() from trip_point_temp_store() and from
emul_temp_store(). Similar, we can call __thermal_zone_set_trips() instead
of thermal_zone_set_trips() from trip_point_hyst_store().

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-14 19:04:37 +01:00
Guenter Roeck ea37bec51f thermal/core: Protect hwmon accesses to thermal operations with thermal zone mutex
In preparation to protecting access to thermal operations against thermal
zone device removal, protect hwmon accesses to thermal zone operations
with the thermal zone mutex. After acquiring the mutex, ensure that the
thermal zone device is registered before proceeding.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-14 19:04:37 +01:00
Guenter Roeck 1c439dec35 thermal/core: Introduce locked version of thermal_zone_device_update
In thermal_zone_device_set_mode(), the thermal zone mutex is released only
to be reacquired in the subsequent call to thermal_zone_device_update().

Introduce __thermal_zone_device_update(), which is similar to
thermal_zone_device_update() but has to be called with the thermal device
mutex held. Call the new function from thermal_zone_device_set_mode()
to avoid the extra thermal device mutex release/acquire sequence in that
function.

With the new function in place, re-implement thermal_zone_device_update()
as wrapper around __thermal_zone_device_update() to acquire and release
the thermal device mutex.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-14 19:04:37 +01:00
Guenter Roeck ed97d10a8b thermal/core: Move parameter validation from __thermal_zone_get_temp to thermal_zone_get_temp
All callers of __thermal_zone_get_temp() already validated the
thermal zone parameters. Move validation to thermal_zone_get_temp()
where it is actually needed. Also add kernel documentation for
__thermal_zone_get_temp(), listing the requirement that the
function must be called with validated parameters and with thermal
device mutex held.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-14 19:04:37 +01:00
Guenter Roeck 1c6b300607 thermal/core: Ensure that thermal device is registered in thermal_zone_get_temp
Calls to thermal_zone_get_temp() are not protected against thermal zone
device removal. As result, it is possible that the thermal zone operations
callbacks are no longer valid when thermal_zone_get_temp() is called.
This may result in crashes such as

BUG: unable to handle page fault for address: ffffffffc04ef420
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
PGD 5d60e067 P4D 5d60e067 PUD 5d610067 PMD 110197067 PTE 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 1 PID: 3209 Comm: cat Tainted: G        W         5.10.136-19389-g615abc6eb807 #1 02df41ac0b12f3a64f4b34245188d8875bb3bce1
Hardware name: Google Coral/Coral, BIOS Google_Coral.10068.92.0 11/27/2018
RIP: 0010:thermal_zone_get_temp+0x26/0x73
Code: 89 c3 eb d3 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 53 48 85 ff 74 50 48 89 fb 48 81 ff 00 f0 ff ff 77 44 48 8b 83 98 03 00 00 <48> 83 78 10 00 74 36 49 89 f6 4c 8d bb d8 03 00 00 4c 89 ff e8 9f
RSP: 0018:ffffb3758138fd38 EFLAGS: 00010287
RAX: ffffffffc04ef410 RBX: ffff98f14d7fb000 RCX: 0000000000000000
RDX: ffff98f17cf90000 RSI: ffffb3758138fd64 RDI: ffff98f14d7fb000
RBP: ffffb3758138fd50 R08: 0000000000001000 R09: ffff98f17cf90000
R10: 0000000000000000 R11: ffffffff8dacad28 R12: 0000000000001000
R13: ffff98f1793a7d80 R14: ffff98f143231708 R15: ffff98f14d7fb018
FS:  00007ec166097800(0000) GS:ffff98f1bbd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffc04ef420 CR3: 000000010ee9a000 CR4: 00000000003506e0
Call Trace:
 temp_show+0x31/0x68
 dev_attr_show+0x1d/0x4f
 sysfs_kf_seq_show+0x92/0x107
 seq_read_iter+0xf5/0x3f2
 vfs_read+0x205/0x379
 __x64_sys_read+0x7c/0xe2
 do_syscall_64+0x43/0x55
 entry_SYSCALL_64_after_hwframe+0x61/0xc6

if a thermal device is removed while accesses to its device attributes
are ongoing.

The problem is exposed by code in iwl_op_mode_mvm_start(), which registers
a thermal zone device only to unregister it shortly afterwards if an
unrelated failure is encountered while accessing the hardware.

Check if the thermal zone device is registered after acquiring the
thermal zone device mutex to ensure this does not happen.

The code was tested by triggering the failure in iwl_op_mode_mvm_start()
on purpose. Without this patch, the kernel crashes reliably. The crash
is no longer observed after applying this and the preceding patches.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-14 19:04:37 +01:00
Guenter Roeck 30b2ae07d3 thermal/core: Delete device under thermal device zone lock
Thermal device attributes may still be opened after unregistering
the thermal zone and deleting the thermal device.

Currently there is no protection against accessing thermal device
operations after unregistering a thermal zone. To enable adding
such protection, protect the device delete operation with the
thermal zone device mutex. This requires splitting the call to
device_unregister() into its components, device_del() and put_device().
Only the first call can be executed under mutex protection, since
put_device() may result in releasing the thermal zone device memory.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-14 19:04:37 +01:00
Guenter Roeck d35f29ed9d thermal/core: Destroy thermal zone device mutex in release function
Accesses to thermal zones, and with it the thermal zone device mutex,
are still possible after the thermal zone device has been unregistered.
For example, thermal_zone_get_temp() can be called from temp_show()
in thermal_sysfs.c if the sysfs attribute was opened before the thermal
device was unregistered.

Move the call to mutex_destroy from thermal_zone_device_unregister()
to thermal_release() to ensure that it is only destroyed after it is
guaranteed to be no longer accessed.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-14 19:04:37 +01:00
Zhang Rui e77f069fd6 thermal: intel: intel_tcc_cooling: Add TCC cooling support for RaptorLake-S
Add RaptorLake to the list of processor models supported by the Intel
TCC cooling driver.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-09 14:58:02 +01:00
Zhang Rui be6abd3ed6 thermal: intel: intel_tcc_cooling: Detect TCC lock bit
When MSR_IA32_TEMPERATURE_TARGET is locked, TCC Offset can not be
updated even if the PROGRAMMABE Bit is set.

Yield the driver on platforms with MSR_IA32_TEMPERATURE_TARGET locked.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-09 14:58:02 +01:00
Ricardo Neri 54d9135cf2 thermal: intel: hfi: Improve the type of hfi_features::nr_table_pages
A Coverity static code scan raised a potential overflow_before_widen
warning when hfi_features::nr_table_pages is used as an argument to
memcpy in intel_hfi_process_event().

Even though the overflow can never happen (the maximum number of pages of
the HFI table is 0x10 and 0x10 << PAGE_SHIFT = 0x10000), using size_t as
the data type of hfi_features::nr_table_pages makes Coverity happy and
matches the data type of the argument 'size' of memcpy().

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-10-28 20:11:48 +02:00
Dan Carpenter e49a1e1ee0 thermal/core: fix error code in __thermal_cooling_device_register()
Return an error pointer if ->get_max_state() fails.  The current code
returns NULL which will cause an oops in the callers.

Fixes: c408b3d1d9 ("thermal: Validate new state in cur_state_store()")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-10-28 19:59:51 +02:00
Viresh Kumar a365105c68 thermal: sysfs: Reuse cdev->max_state
Now that the cooling device structure stores the max_state value, reuse
it and drop max_states from struct cooling_dev_stats.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-10-25 18:58:11 +02:00
Viresh Kumar c408b3d1d9 thermal: Validate new state in cur_state_store()
In cur_state_store(), the new state of the cooling device is received
from user-space and is not validated by the thermal core but the same is
left for the individual drivers to take care of. Apart from duplicating
the code it leaves possibility for introducing bugs where a driver may
not do it right.

Lets make the thermal core check the new state itself and store the max
value in the cooling device structure.

Link: https://lore.kernel.org/all/Y0ltRJRjO7AkawvE@kili/
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-10-25 18:58:11 +02:00
Linus Torvalds 247f34f7b8 Linux 6.1-rc2 2022-10-23 15:27:33 -07:00
Linus Torvalds 05b4ebd2c7 RISC-V:
- Fix compilation without RISCV_ISA_ZICBOM
 
 - Fix kvm_riscv_vcpu_timer_pending() for Sstc
 
 ARM:
 
 - Fix a bug preventing restoring an ITS containing mappings
   for very large and very sparse device topology
 
 - Work around a relocation handling error when compiling
   the nVHE object with profile optimisation
 
 - Fix for stage-2 invalidation holding the VM MMU lock
   for too long by limiting the walk to the largest
   block mapping size
 
 - Enable stack protection and branch profiling for VHE
 
 - Two selftest fixes
 
 x86:
 
 - add compat implementation for KVM_X86_SET_MSR_FILTER ioctl
 
 selftests:
 
 - synchronize includes between include/uapi and tools/include/uapi
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmNT2hQUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNeVwgAkGGk2F2SF5s+MQUQ9tDPxyuRbddN
 NPo/YRTKszKc8rK6d1TCbQi56I3e8Oa7kNkMF7CiBlAekB7B1r1ySg5qc+3lQebx
 moME30Ru4nmfqPcZ7971MA8Me7zZxGzvIviL5KIwm1ownGifdTsPZ9jCvu4EPdzv
 3dd10guH3GeBIq8QeQGEqNP4fticziwhE+IA3HZstcWsq96800Le7WNAgklfzdC+
 YTB81QU6whHv6N/7YvRcTbp+tER3VIKdFMmRD1FwC90flhXMbxTymESFXULfHCM2
 x/arGz2E31/QGgJo0/Yy2VPenr5ZMU57dL4SYWR02mwSfJQnJWb1cRdWnw==
 =rxQ7
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "RISC-V:

   - Fix compilation without RISCV_ISA_ZICBOM

   - Fix kvm_riscv_vcpu_timer_pending() for Sstc

  ARM:

   - Fix a bug preventing restoring an ITS containing mappings for very
     large and very sparse device topology

   - Work around a relocation handling error when compiling the nVHE
     object with profile optimisation

   - Fix for stage-2 invalidation holding the VM MMU lock for too long
     by limiting the walk to the largest block mapping size

   - Enable stack protection and branch profiling for VHE

   - Two selftest fixes

  x86:

   - add compat implementation for KVM_X86_SET_MSR_FILTER ioctl

  selftests:

   - synchronize includes between include/uapi and tools/include/uapi"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  tools: include: sync include/api/linux/kvm.h
  KVM: x86: Add compat handler for KVM_X86_SET_MSR_FILTER
  KVM: x86: Copy filter arg outside kvm_vm_ioctl_set_msr_filter()
  kvm: Add support for arch compat vm ioctls
  RISC-V: KVM: Fix kvm_riscv_vcpu_timer_pending() for Sstc
  RISC-V: Fix compilation without RISCV_ISA_ZICBOM
  KVM: arm64: vgic: Fix exit condition in scan_its_table()
  KVM: arm64: nvhe: Fix build with profile optimization
  KVM: selftests: Fix number of pages for memory slot in memslot_modification_stress_test
  KVM: arm64: selftests: Fix multiple versions of GIC creation
  KVM: arm64: Enable stack protection and branch profiling for VHE
  KVM: arm64: Limit stage2_apply_range() batch size to largest block
  KVM: arm64: Work out supported block level at compile time
2022-10-23 15:00:43 -07:00
Jason A. Donenfeld ca4582c286 Revert "mfd: syscon: Remove repetition of the regmap_get_val_endian()"
This reverts commit 72a9585972.

It broke reboots on big-endian MIPS and MIPS64 malta QEMU instances,
which use the syscon driver.  Little-endian is not effected, which means
likely it's important to handle regmap_get_val_endian() in this function
after all.

Fixes: 72a9585972 ("mfd: syscon: Remove repetition of the regmap_get_val_endian()")
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Lee Jones <lee@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-10-23 12:04:56 -07:00
Linus Torvalds 52826d3b2d kernel/utsname_sysctl.c: Fix hostname polling
Commit bfca3dd3d0 ("kernel/utsname_sysctl.c: print kernel arch") added
a new entry to the uts_kern_table[] array, but didn't update the
UTS_PROC_xyz enumerators of older entries, breaking anything that used
them.

Which is admittedly not many cases: it's really just the two uses of
uts_proc_notify() in kernel/sys.c.  But apparently journald-systemd
actually uses this to detect hostname changes.

Reported-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Fixes: bfca3dd3d0 ("kernel/utsname_sysctl.c: print kernel arch")
Link: https://lore.kernel.org/lkml/0c2b92a6-0f25-9538-178f-eee3b06da23f@secunet.com/
Link: https://linux-regtracking.leemhuis.info/regzbot/regression/0c2b92a6-0f25-9538-178f-eee3b06da23f@secunet.com/
Cc: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-10-23 12:01:01 -07:00
Linus Torvalds a703852408 - Fix raw data handling when perf events are used in bpf
- Rework how SIGTRAPs get delivered to events to address a bunch of
   problems with it. Add a selftest for that too
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmNVE9UACgkQEsHwGGHe
 VUpodRAAsn3s+xX02qfxRG0mYX/1nnOmnFnDhmUEPAZpDXjB6g3PFXAh26F9tw92
 dLm8fbfp8W3tK7TQrGyOGWMnb7oj4gEoXAcQBXvsq2KOJMVwRHHKwCeSSJ89DLAW
 zZEl2nzgea2cGyZn2RZJvhmbm8YdFON3v++Vv34yovs1MMoCcD7FOPLLGWkD2gk5
 6PK6lIlWEI/+vYKWhZdxgk6/PanBInXRUnbaBVj42US1XwfDLzPBEi9yyUUJQrht
 CQhfTpHn4Z5MC/hJTlFat4Jlaajql4JBcQ1SS5LW59M+6gdlBK4tL6zFt10zvU2m
 +kywOOIWiYLRRgFf6idGO45P5BuWOdmsXEaEg5KW7b6nJfGvgkd7WUYgCVOgtEY1
 r4Mf4hAQUunDHGQ4e8eHk7XemFJsoBSweYCTQ2O0yr/QzO2M6QBi/BR9PzUajyH+
 yShKEfrxXt4595BMH0nonSMpTKcE4Zxdj06LZSnGecEN8UUlx/n49uYwhFNdUkqM
 s6Wz6kSR76YRlKUmYnNzP1gkY6nJZ1nR6z7SjmMkioxf3VxhT9SY8K587r6hRUlr
 /NVA69iUhJy75VdttxZEmZzB03A7AjdudmZEisF0ImEmB1hxzYLHcDKJMTIj/r4/
 f8OXCg5ACKhFlnx1SdBVRtA+6+5ab368Fs2rItJQ4dzdxRi6VVY=
 =DXG7
 -----END PGP SIGNATURE-----

Merge tag 'perf_urgent_for_v6.1_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

 - Fix raw data handling when perf events are used in bpf

 - Rework how SIGTRAPs get delivered to events to address a bunch of
   problems with it. Add a selftest for that too

* tag 'perf_urgent_for_v6.1_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  bpf: Fix sample_flags for bpf_perf_event_output
  selftests/perf_events: Add a SIGTRAP stress test with disables
  perf: Fix missing SIGTRAPs
2022-10-23 10:14:45 -07:00
Linus Torvalds c70055d8d9 - Adjust code to not trip up CFI
- Fix sched group cookie matching
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmNVESwACgkQEsHwGGHe
 VUpIUg/9Ff8ZWaaray2uTML3cYMQF8yAsUIibAfeZuL9mEiz9TJy2k587qf61I34
 8cTcmrLWIkgrRsdn6XACDM3knU0BEPxhxonckFpbb5O8J93qRQZGGZZenAiUFO3m
 E+u71/MyP8IRqrsNQT1ESDAtE5TjP1KLmxWmpaKUxCqHWxmPmfg4rAvNBe4ZI2Kg
 UjvWxLMmVpOmCCCa5ti2udO2VEmFRLlQTFvt6Yo7Zny8bUZVYfvD3rKdauIP/+PL
 2JrRt3BxKNKygdR8rSRJKz2trxKQIXWGLeFHEfp7nH/mU48HvoW8ZI44owb4YkfN
 v9ZaTJ54KCoZY1pGif5X+TrETWoYdY/PKBwPWfLhOij6eO4SStwMaAzTSsT0tc9a
 xpy1AaV2XnUWvCBtrthTlaVn3/gBeNl0os+apyPcnQYZmiRp7CRfPozcHKyLJMQo
 oxAPXEHYZSSxz+hPVIenkVWXoFYBmm8CfSXwRoPOC7JYQNO6+0GBkJ8XAZAABbpN
 nUF70kbE0YQygP1gE2wDJgudaXOv4wXDKS1BHDoBmw2d4uSwAsgS9h6iMvATdX04
 UFoIP+kVTnQjYXBSzkfJkO+AcFtS4ReanX92tLI4IlUq6uPHRT5KC7L9Lq11NqUn
 kU16BaEaeSh4cuvGU7QEJSlM9sfWuolXcOgz5I80q7wjIcz1S1E=
 =N9Mz
 -----END PGP SIGNATURE-----

Merge tag 'sched_urgent_for_v6.1_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Borislav Petkov:

 - Adjust code to not trip up CFI

 - Fix sched group cookie matching

* tag 'sched_urgent_for_v6.1_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Introduce struct balance_callback to avoid CFI mismatches
  sched/core: Fix comparison in sched_group_cookie_match()
2022-10-23 10:10:55 -07:00