Commit Graph

1900 Commits

Author SHA1 Message Date
Linus Torvalds aa35a4835e - Add initial support for RAS hardware found on AMD server GPUs (MI200).
Those GPUs and CPUs are connected together through the coherent fabric
   and the GPU memory controllers report errors through x86's MCA so EDAC
   needs to support them. The amd64_edac driver supports now HBM (High
   Bandwidth Memory) and thus such heterogeneous memory controller
   systems
 
 - Other small cleanups and improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmSZiUwACgkQEsHwGGHe
 VUphSQ/+JLXTAQ06CNos98MR8iCGdThVujhWt1pBIgjhQFJuf4JlEEtKs9htjbud
 9HZvgnGbHahRoO8pMCB0jwtz0ATrPbaOvz4BofVp3SIRiR5jMI0tfmyl8iSrnA3Q
 m5pbMh6uiIAlH8aPqQXret2iwp7JXOjnBWksgbmUWkI7d2qseKu98ikXyC4QoCaD
 AGRJJ6OCA3P85rdT9qabOuXh6yoELOPKw3j243s22sTLiqn+EuoTE+QX5ZjrQ8Ts
 DyXN/pYI/vGVP7sECkWf7PsEf1BkL6m5KeXDB4Ij2YJesQnBlBZQdAcxdGdY8z3M
 f/qpLdrYvpcLHQy42Jm5VnnISOvMvAl8YWqCEyUmBjXcLwSPNIKHN9LQuznhnQHr
 vssRVqQUg1J+/UWAoIzHdrAQ6zvgv1xlX2dG2YOw3t1WMDnMhztW3eoQv04etD3d
 fqQH3MrkGHI4qeq1Mice1Gz+NWQG/PXVhgBzbTBDDCiRJkg1Dhxce1OMRUiM4tUW
 0JABoU+KS0RZAKXAwine6v5duYmwK36Vl1SSCCWjqFMeR7XMwWWHA9d7t8+wdT1l
 KBIEiRTcRnXaZXyLUPSPRbEF5ALS25RgWVPCA3ibuSUnJjGU7Z7/rbwlQryAefVB
 nqjATed0zat4fbL9bvnDuOKQEzkuySvUWpU+Eozxbct6oRu5ms0=
 =Vcif
 -----END PGP SIGNATURE-----

Merge tag 'ras_core_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS updates from Borislav Petkov:

 - Add initial support for RAS hardware found on AMD server GPUs (MI200).

   Those GPUs and CPUs are connected together through the coherent
   fabric and the GPU memory controllers report errors through x86's MCA
   so EDAC needs to support them. The amd64_edac driver supports now HBM
   (High Bandwidth Memory) and thus such heterogeneous memory controller
   systems

 - Other small cleanups and improvements

* tag 'ras_core_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  EDAC/amd64: Cache and use GPU node map
  EDAC/amd64: Add support for AMD heterogeneous Family 19h Model 30h-3Fh
  EDAC/amd64: Document heterogeneous system enumeration
  x86/MCE/AMD, EDAC/mce_amd: Decode UMC_V2 ECC errors
  x86/amd_nb: Re-sort and re-indent PCI defines
  x86/amd_nb: Add MI200 PCI IDs
  ras/debugfs: Fix error checking for debugfs_create_dir()
  x86/MCE: Check a hw error's address to determine proper recovery action
2023-06-26 15:09:18 -07:00
Linus Torvalds e5ce2f196f - amd64_edac: Add support for Zen4 client hardware
- amd64_edac: Remove the version string as it is useless and actively
   confusing when looking at backported versions of the driver
 
 - Add a driver for the Nuvoton NPCM memory controller
 
 - A debugfs error checking cleanup
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmSZSk4ACgkQEsHwGGHe
 VUqPaQ/8Dwc8vS4iibztAFyYisRGrJRfVP6k7Nl2tSgCi+Tg0BWNFTSMyBzoLNuY
 ewGUe1ZKNkKb3Vs8OE3E48vstVd6J/jcoMAxUmtl4uHxzjhVfoIruBD/xK2Q6mO1
 UDfRrfT2LZv/0/Tn7++QP3R3aQLvDqJC6IVAG1Hn4hqHSnhw7CqgCetbBY/M+hQR
 p9Xjtb2Gbm1UwMEK+z9DG9jNZR2vtPRfOeieAcHpOnDwTe2QY1jQGoeeVDfdfJbC
 iU2D87ad1V7o4p+7Eur0wwg8smuWqSVslWId6+qmtL4xePK6JUL9D+3kPEO4AjWV
 iYqDi4EcdXOglYnAEvKhRbN8eCFMaYyoZqpC10DUTccyWv5w/CW2tRc7ZOKDPgyZ
 LVpupz87rKaJ2C6ymQ41vv98hpHEiGSSHserK0aY4K03ecL+pnHp4Qu3ZID8YLCo
 V6P1R7S63YFO1TU0LSWiVBBcmoWg0Zy5MQkKc+2PcWYm6soGDYFoD5lURVoVAiw4
 YZhReq58NQwyZQYhxgpBmdZYaLlrvGiGQZx/dhuR5C2qF3uL3wdi5mYvP/vSmKbG
 vLPMl/DrqGQEHJnCU2U8Xo3kss3mf/Qv7qvusaxkjcub8wvfKRbX7w4QhXSU7+qb
 1sf6LPWBOk+xb2daUM1tzaMUnF3Pr+8gbzlAxlu1SmtG/HiC7JA=
 =e4qx
 -----END PGP SIGNATURE-----

Merge tag 'edac_updates_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

 - amd64_edac: Add support for Zen4 client hardware

 - amd64_edac: Remove the version string as it is useless and actively
   confusing when looking at backported versions of the driver

 - Add a driver for the Nuvoton NPCM memory controller

 - A debugfs error checking cleanup

* tag 'edac_updates_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/npcm: Add NPCM memory controller driver
  dt-bindings: memory-controllers: nuvoton: Add NPCM memory controller
  EDAC/thunderx: Check debugfs file creation retval properly
  EDAC/amd64: Add support for ECC on family 19h model 60h-7Fh
  EDAC/amd64: Remove module version string
2023-06-26 15:06:42 -07:00
Yazen Ghannam 4251566ebc EDAC/amd64: Cache and use GPU node map
AMD systems have historically provided an "AMD Node ID" that is a unique
identifier for each die in a multi-die package. This was associated with
a unique instance of the AMD Northbridge on a legacy system. And now it
is associated with a unique instance of the AMD Data Fabric on modern
systems. Each instance is referred to as a "Node"; this is an
AMD-specific term not to be confused with NUMA nodes.

The data fabric provides a number of interfaces accessible through a set
of functions in a single PCI device. There is one PCI device per Data
Fabric (AMD Node), and multi-die systems will see multiple such PCI
devices. The AMD Node ID matches a Node's position in the PCI hierarchy.
For example, the Node 0 is accessed using the first PCI device, Node 1
is accessed using the second, and so on. A logical CPU can find its AMD
Node ID using CPUID. Furthermore, the AMD Node ID is used within the
hardware fabric, so it is not purely a logical value.

Heterogeneous AMD systems, with a CPU Data Fabric connected to GPU data
fabrics, follow a similar convention. Each CPU and GPU die has a unique
AMD Node ID value, and each Node ID corresponds to PCI devices in
sequential order.

However, there are two caveats:
1) GPUs are not x86, and they don't have CPUID to read their AMD Node ID
like on CPUs. This means the value is more implicit and based on PCI
enumeration and hardware-specifics.
2) There is a gap in the hardware values for AMD Node IDs. Values 0-7
are for CPUs and values 8-15 are for GPUs.

For example, a system with one CPU die and two GPUs dies will have the
following values:
  CPU0 -> AMD Node 0
  GPU0 -> AMD Node 8
  GPU1 -> AMD Node 9

EDAC is the only subsystem where this has a practical effect. Memory
errors on AMD systems are commonly reported through MCA to a CPU on the
local AMD Node. The error information is passed along to EDAC where the
AMD EDAC modules use the AMD Node ID of reporting logical CPU to access
AMD Node information.

However, memory errors from a GPU die will be reported to the CPU die.
Therefore, the logical CPU's AMD Node ID can't be used since it won't
match the AMD Node ID of the GPU die. The AMD Node ID of the GPU die is
provided as part of the MCA information, and the value will match the
hardware enumeration (e.g. 8-15).

Handle this situation by discovering GPU dies the same way as CPU dies
in the AMD NB code. But do a "node id" fixup in AMD64 EDAC where it's
needed.

The GPU data fabrics provide a register with the base AMD Node ID for
their local "type", i.e. GPU data fabric. This value is the same for all
fabrics of the same type in a system.

Read and cache the base AMD Node ID from one of the GPU devices during
module initialization. Use this to fixup the "node id" when reporting
memory errors at runtime.

  [ bp: Squash a fix making gpu_node_map static as reported by
        Tom Rix <trix@redhat.com>.
    Link: https://lore.kernel.org/r/20230610210930.174074-1-trix@redhat.com ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Co-developed-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230515113537.1052146-6-muralimk@amd.com
2023-06-19 13:01:44 +02:00
Borislav Petkov (AMD) 852667c317 Merge ras/edac-drivers into for-next
* ras/edac-drivers:
  EDAC/npcm: Add NPCM memory controller driver
  dt-bindings: memory-controllers: nuvoton: Add NPCM memory controller

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-06-12 15:15:36 +02:00
Marvin Lin d244c610f1 EDAC/npcm: Add NPCM memory controller driver
Add driver for memory controller present on Nuvoton NPCM SoCs. The
memory controller supports single bit error correction and double bit
error detection.

Signed-off-by: Marvin Lin <milkfafa@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230111093245.318745-4-milkfafa@gmail.com
2023-06-12 15:14:10 +02:00
Borislav Petkov (AMD) 0a81fa5d74 Merge ras/edac-misc into for-next
* ras/edac-misc:
  EDAC/thunderx: Check debugfs file creation retval properly

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-06-07 10:50:07 +02:00
Yeqi Fu bf5c04ddd3 EDAC/thunderx: Check debugfs file creation retval properly
edac_debugfs_create_file() returns ERR_PTR by way of the respective
debugfs function it calls, if an error occurs.

The appropriate way to verify for errors is to use IS_ERR(). Do so.

  [ bp: Rewrite all text. ]

Signed-off-by: Yeqi Fu <asuk4.q@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230517173111.365787-1-asuk4.q@gmail.com
2023-06-06 23:04:56 +02:00
Muralidhara M K 9c42edd571 EDAC/amd64: Add support for AMD heterogeneous Family 19h Model 30h-3Fh
AMD Family 19h Model 30h-3Fh systems can be connected to AMD MI200
accelerator/GPU devices such that the CPU and GPU data fabrics are
connected together. In this configuration, the CPU manages error logging
and reporting for MCA banks located on the GPUs. This includes HBM memory
errors reported from Unified Memory Controllers (UMCs) on the GPUs.
The GPU memory errors are handled like CPU memory errors.

AMD CPU UMC support in EDAC can be re-used for GPU UMC support. However,
keeping them separate means drastic changes in one path (e.g. to support
newer products) should have less impact on the other path.

Also, simplify the "gpu_" helper functions where possible. GPU product
configuration, like memory type and channel count, is fixed compared to
CPU products.

GPU UMCs each have four physical connections (phys) connected to eight
channels. There is a single "chip select". This differs from CPUs where
each UMC has one physical connection connected to one channel, and each
channel has up to four "chip selects".

Enumerate each UMC "phy" as an EDAC CSROW, since there is only a single
chip select for each physical connection. This is similar to how a CPU
UMC "phy" is enumerated as an EDAC CHANNEL, since there is only a single
channel for each physical connection.

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230515113537.1052146-5-muralimk@amd.com
2023-06-05 12:27:18 +02:00
Yazen Ghannam c35977b00f x86/MCE/AMD, EDAC/mce_amd: Decode UMC_V2 ECC errors
The MI200 (Aldebaran) series of devices introduced a new SMCA bank type
for Unified Memory Controllers. The MCE subsystem already has support
for this new type. The MCE decoder module will decode the common MCA
error information for the new bank type, but it will not pass the
information to the AMD64 EDAC module for detailed memory error decoding.

Have the MCE decoder module recognize the new bank type as an SMCA UMC
memory error and pass the MCA information to AMD64 EDAC.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Co-developed-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230515113537.1052146-3-muralimk@amd.com
2023-06-05 12:27:11 +02:00
Manivannan Sadhasivam cbd77119b6 EDAC/qcom: Get rid of hardcoded register offsets
The LLCC EDAC register offsets varies between each SoC. Hardcoding the
register offsets won't work and will often result in crash due to
accessing the wrong locations.

Hence, get the register offsets from the LLCC driver matching the
individual SoCs.

Cc: <stable@vger.kernel.org> # 6.0: 5365cea199 ("soc: qcom: llcc: Rename reg_offset structs to reflect LLCC version")
Cc: <stable@vger.kernel.org> # 6.0: c13d7d261e ("soc: qcom: llcc: Pass LLCC version based register offsets to EDAC driver")
Cc: <stable@vger.kernel.org> # 6.0
Fixes: a6e9d7ef25 ("soc: qcom: llcc: Add configuration data for SM8450 SoC")
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230517114635.76358-3-manivannan.sadhasivam@linaro.org
2023-05-26 20:56:55 -07:00
Manivannan Sadhasivam 3d49f7406b EDAC/qcom: Remove superfluous return variable assignment in qcom_llcc_core_setup()
"ret" variable will be assigned on both success and failure cases. So there
is no need to initialize it during start of qcom_llcc_core_setup().

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230517114635.76358-2-manivannan.sadhasivam@linaro.org
2023-05-26 20:56:54 -07:00
Hristo Venev 6c79e42169 EDAC/amd64: Add support for ECC on family 19h model 60h-7Fh
Ryzen 9 7950X uses model 61h. Treat it as Epyc 9004, but with 2 channels
instead of 12.

With two 32GB dual-rank DIMMs the sizes appear to be reported correctly:

  EDAC MC0: Giving out device to module amd64_edac controller F19h_M60h: DEV 0000:00:18.3 (INTERRUPT)
  EDAC amd64: F19h_M60h detected (node 0).
  EDAC MC: UMC0 chip selects:
  EDAC amd64: MC: 0:     0MB 1:     0MB
  EDAC amd64: MC: 2: 16384MB 3: 16384MB
  EDAC MC: UMC1 chip selects:
  EDAC amd64: MC: 0:     0MB 1:     0MB
  EDAC amd64: MC: 2: 16384MB 3: 16384MB
  AMD64 EDAC driver v3.5.0

ECC errors can also be detected:

  mce: [Hardware Error]: Machine check events logged
  [Hardware Error]: Corrected error, no action required.
  [Hardware Error]: CPU:0 (19:61:2) MC21_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000400011b
  [Hardware Error]: Error Addr: 0x00000007ff7e93c0
  [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0x000100010a801203
  [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
  EDAC MC0: 1 CE Cannot decode normalized address on mc#0csrow#3channel#0 (csrow:3 channel:0 page:0x0 offset:0x0 grain:64 syndrome:0x1)
  [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD

According to Mario Limonciello, the same code should also work for
models 70h-7Fh (follow thread in Link).

  [ bp: Massage, the translation logic updates are pending. ]

Signed-off-by: Hristo Venev <hristo@venev.name>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20230425201239.324476-1-hristo@venev.name
Link: https://lore.kernel.org/r/20230511174506.875153-2-hristo@venev.name
2023-05-15 16:32:47 +02:00
Yazen Ghannam b34348a0d7 EDAC/amd64: Remove module version string
The AMD64 EDAC module version information is not exposed through ABI
like MODULE_VERSION(). Instead it is printed during module init.

Version numbers can be confusing in cases where module updates are
partly backported resulting in a difference between upstream and
backported module versions.

Remove the AMD64 EDAC module version information to avoid user
confusion.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230410190959.3367528-1-yazen.ghannam@amd.com
2023-05-10 15:49:52 +02:00
Linus Torvalds 556eb8b791 Driver core changes for 6.4-rc1
Here is the large set of driver core changes for 6.4-rc1.
 
 Once again, a busy development cycle, with lots of changes happening in
 the driver core in the quest to be able to move "struct bus" and "struct
 class" into read-only memory, a task now complete with these changes.
 
 This will make the future rust interactions with the driver core more
 "provably correct" as well as providing more obvious lifetime rules for
 all busses and classes in the kernel.
 
 The changes required for this did touch many individual classes and
 busses as many callbacks were changed to take const * parameters
 instead.  All of these changes have been submitted to the various
 subsystem maintainers, giving them plenty of time to review, and most of
 them actually did so.
 
 Other than those changes, included in here are a small set of other
 things:
   - kobject logging improvements
   - cacheinfo improvements and updates
   - obligatory fw_devlink updates and fixes
   - documentation updates
   - device property cleanups and const * changes
   - firwmare loader dependency fixes.
 
 All of these have been in linux-next for a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZEp7Sw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykitQCfamUHpxGcKOAGuLXMotXNakTEsxgAoIquENm5
 LEGadNS38k5fs+73UaxV
 =7K4B
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the large set of driver core changes for 6.4-rc1.

  Once again, a busy development cycle, with lots of changes happening
  in the driver core in the quest to be able to move "struct bus" and
  "struct class" into read-only memory, a task now complete with these
  changes.

  This will make the future rust interactions with the driver core more
  "provably correct" as well as providing more obvious lifetime rules
  for all busses and classes in the kernel.

  The changes required for this did touch many individual classes and
  busses as many callbacks were changed to take const * parameters
  instead. All of these changes have been submitted to the various
  subsystem maintainers, giving them plenty of time to review, and most
  of them actually did so.

  Other than those changes, included in here are a small set of other
  things:

   - kobject logging improvements

   - cacheinfo improvements and updates

   - obligatory fw_devlink updates and fixes

   - documentation updates

   - device property cleanups and const * changes

   - firwmare loader dependency fixes.

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits)
  device property: make device_property functions take const device *
  driver core: update comments in device_rename()
  driver core: Don't require dynamic_debug for initcall_debug probe timing
  firmware_loader: rework crypto dependencies
  firmware_loader: Strip off \n from customized path
  zram: fix up permission for the hot_add sysfs file
  cacheinfo: Add use_arch[|_cache]_info field/function
  arch_topology: Remove early cacheinfo error message if -ENOENT
  cacheinfo: Check cache properties are present in DT
  cacheinfo: Check sib_leaf in cache_leaves_are_shared()
  cacheinfo: Allow early level detection when DT/ACPI info is missing/broken
  cacheinfo: Add arm64 early level initializer implementation
  cacheinfo: Add arch specific early level initializer
  tty: make tty_class a static const structure
  driver core: class: remove struct class_interface * from callbacks
  driver core: class: mark the struct class in struct class_interface constant
  driver core: class: make class_register() take a const *
  driver core: class: mark class_release() as taking a const *
  driver core: remove incorrect comment for device_create*
  MIPS: vpe-cmp: remove module owner pointer from struct class usage.
  ...
2023-04-27 11:53:57 -07:00
Linus Torvalds a907047732 ARM: SoC drivers for v6.4
The most notable updates this time are for Qualcomm Snapdragon platforms.
 The Inline-Crypto-Engine gets a new DT binding and driver. A number of
 drivers now support additional Snapdragon variants, in particular the
 rsc, scm, geni, bwm, glink and socinfo, while the llcc (edac) and rpm
 drivers get notable functionality updates.
 
 Updates on other platforms include:
 
  - Various updates to the Mediatek mutex and mmsys drivers, including
    support for the Helio X10 SoC
 
  - Support for unidirectional mailbox channels in Arm SCMI firmware
 
  - Support for per cpu asynchronous notification in OP-TEE firmware
 
  - Minor updates for memory controller drivers.
 
  - Minor updates for Renesas, TI, Amlogic, Apple, Broadcom, Tegra,
    Allwinner, Versatile Express, Canaan, Microchip, Mediatek and i.MX
    SoC drivers, mainly updating the use of MODULE_LICENSE() macros and
    obsolete DT driver interfaces.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmRGmncACgkQYKtH/8kJ
 Uif6ghAAw1TiPTJzJLLCNx+txOVFB62WDglv3T1CufjfcWp0Eh0RJSCcsCOPV+/7
 UHi4+X4nPAcudeOFMFtslCR8ExLRWY4j7t2ZYo/k+VI3jdB8Qkbr6NAQgAuRdLYX
 WZ1cV6o76B3bhO2HqSVNVZ8/3Z7OAYw4j9VDD/4AbW+l3GyentlQTjabpJNREvSS
 5HzT3ZI33o7M8mM4uYmmEXVrg8sCupbRyL9S7jTiFXRLcfqujclhfezJ4UrJJv7b
 wxGf+e2YNMqKH6PiKYufzN1TYI2D0YQeB1m56Y9FsAKxgAyHh2xWpsHeyVnaw0jc
 KaKjRN/H3JDlW/VCMAjQOIShCZdAs02xHnEXxY6pKLMM6i8/FkzzNIxNQwXrx5KH
 zYESXVd6suOI0eCZT8zkKKLHRT5EJRaliUv5Z+Qp2BBe3vJVZD0JqSlZ7lOznplF
 lviwL6ydAMr2cfTgfMxbRiYQVDzncFkfnR3t55SC6rYjGt6QWjeS0dDbGHf4WVC4
 FDbnST4JaBmi+frh55VooX7EpzIv9wa0/taayaChd9qvXnh22uqaqho1sPYKZ6BI
 OXduHQ3qojJhKKKK1VJKzN5Ef3OHLQLNrvcc1DsKILrrES4w4LX1C9dmyh2CLXLo
 q5cX6L1iB1Hx5tujalDYBsHBBmbiT/1tNM2S7pAGigiGy4KEc28=
 =r6jm
 -----END PGP SIGNATURE-----

Merge tag 'soc-drivers-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC driver updates from Arnd Bergmann:
 "The most notable updates this time are for Qualcomm Snapdragon
  platforms. The Inline-Crypto-Engine gets a new DT binding and driver,
  and a number of drivers now support additional Snapdragon variants, in
  particular the rsc, scm, geni, bwm, glink and socinfo, while the llcc
  (edac) and rpm drivers get notable functionality updates.

  Updates on other platforms include:

   - Various updates to the Mediatek mutex and mmsys drivers, including
     support for the Helio X10 SoC

   - Support for unidirectional mailbox channels in Arm SCMI firmware

   - Support for per cpu asynchronous notification in OP-TEE firmware

   - Minor updates for memory controller drivers.

   - Minor updates for Renesas, TI, Amlogic, Apple, Broadcom, Tegra,
     Allwinner, Versatile Express, Canaan, Microchip, Mediatek and i.MX
     SoC drivers, mainly updating the use of MODULE_LICENSE() macros and
     obsolete DT driver interfaces"

* tag 'soc-drivers-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (165 commits)
  soc: ti: smartreflex: Simplify getting the opam_sr pointer
  bus: vexpress-config: Add explicit of_platform.h include
  soc: mediatek: Kconfig: Add MTK_CMDQ dependency to MTK_MMSYS
  memory: mtk-smi: mt8365: Add SMI Support
  dt-bindings: memory-controllers: mediatek,smi-larb: add mt8365
  dt-bindings: memory-controllers: mediatek,smi-common: add mt8365
  memory: tegra: read values from correct device
  dt-bindings: crypto: Add Qualcomm Inline Crypto Engine
  soc: qcom: Make the Qualcomm UFS/SDCC ICE a dedicated driver
  dt-bindings: firmware: document Qualcomm QCM2290 SCM
  soc: qcom: rpmh-rsc: Support RSC v3 minor versions
  soc: qcom: smd-rpm: Use GFP_ATOMIC in write path
  soc/tegra: fuse: Remove nvmem root only access
  soc/tegra: cbb: tegra194: Use of_address_count() helper
  soc/tegra: cbb: Remove MODULE_LICENSE in non-modules
  ARM: tegra: Remove MODULE_LICENSE in non-modules
  soc/tegra: flowctrl: Use devm_platform_get_and_ioremap_resource()
  soc: tegra: cbb: Drop empty platform remove function
  firmware: arm_scmi: Add support for unidirectional mailbox channels
  dt-bindings: firmware: arm,scmi: Support mailboxes unidirectional channels
  ...
2023-04-25 12:02:16 -07:00
Borislav Petkov (AMD) ce8ac91130 Merge branches 'edac-drivers', 'edac-amd64' and 'edac-misc' into edac-updates
Combine all queued EDAC changes for submission into v6.4:

* ras/edac-drivers:
  EDAC/i10nm: Add Intel Sierra Forest server support
  EDAC/skx: Fix overflows on the DRAM row address mapping arrays

* ras/edac-amd64: (27 commits)
  EDAC/amd64: Fix indentation in umc_determine_edac_cap()
  EDAC/amd64: Add get_err_info() to pvt->ops
  EDAC/amd64: Split dump_misc_regs() into dct/umc functions
  EDAC/amd64: Split init_csrows() into dct/umc functions
  EDAC/amd64: Split determine_edac_cap() into dct/umc functions
  EDAC/amd64: Rename f17h_determine_edac_ctl_cap()
  EDAC/amd64: Split setup_mci_misc_attrs() into dct/umc functions
  EDAC/amd64: Split ecc_enabled() into dct/umc functions
  EDAC/amd64: Split read_mc_regs() into dct/umc functions
  EDAC/amd64: Split determine_memory_type() into dct/umc functions
  EDAC/amd64: Split read_base_mask() into dct/umc functions
  EDAC/amd64: Split prep_chip_selects() into dct/umc functions
  EDAC/amd64: Rework hw_info_{get,put}
  EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt
  EDAC/amd64: Do not discover ECC symbol size for Family 17h and later
  EDAC/amd64: Drop dbam_to_cs() for Family 17h and later
  EDAC/amd64: Split get_csrow_nr_pages() into dct/umc functions
  EDAC/amd64: Rename debug_display_dimm_sizes()

* ras/edac-misc:
  EDAC/altera: Remove MODULE_LICENSE in non-module
  EDAC: Sanitize MODULE_AUTHOR strings
  EDAC/amd81[13]1: Remove trailing newline from MODULE_AUTHOR
  EDAC/i5100: Fix typo in comment
  EDAC/altera: Remove redundant error logging

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-04-24 09:14:30 +02:00
Qiuxu Zhuo 96ae3995c6 EDAC/i10nm: Add Intel Sierra Forest server support
The Sierra Forest CPU model uses similar memory controller registers as
Granite Rapids server. Add Sierra Forest CPU model ID for EDAC support.

Tested-by: Li Zhang <li4.zhang@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20230410131531.11914-1-qiuxu.zhuo@intel.com
2023-04-10 09:33:51 -07:00
Yang Li 49aba1c589 EDAC/amd64: Fix indentation in umc_determine_edac_cap()
Use consistent indentation to improve the readability and fix:

  drivers/edac/amd64_edac.c:1279 umc_determine_edac_cap() warn: inconsistent indenting

Fixes: f6a4b4a1aa ("EDAC/amd64: Split determine_edac_cap() into dct/umc functions")
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230404022557.46409-1-yang.lee@linux.alibaba.com
2023-04-04 17:22:55 +02:00
Nick Alcock e088d80e2a EDAC/altera: Remove MODULE_LICENSE in non-module
Since

  8b41fc4454 ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"),

MODULE_LICENSE declarations are used to identify modules. As
a consequence, uses of the macro in non-modules will cause modprobe to
misidentify their containing object file as a module when it is not
(false positives), and modprobe might succeed rather than failing with
a suitable error message.

altera_edac is not a module for a while now, remove the macro call.

Suggested-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230217141059.392471-24-nick.alcock@oracle.com
2023-04-01 13:18:50 +02:00
Borislav Petkov (AMD) 371b27f2f3 EDAC: Sanitize MODULE_AUTHOR strings
Fixup the remaining MODULE_AUTHOR strings to not contain newlines.
Shorten and unbreak others.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230328134309.23159-1-bp@alien8.de
2023-03-28 15:43:30 +02:00
Jonathan Neuschäfer 01db1030f1 EDAC/amd81[13]1: Remove trailing newline from MODULE_AUTHOR
MODULE_AUTHOR strings don't usually include a newline character.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230129165054.1675554-1-j.neuschaefer@gmx.net
2023-03-28 15:26:52 +02:00
Muralidhara M K b3ece3a6a2 EDAC/amd64: Add get_err_info() to pvt->ops
GPU Nodes will use a different method to determine the chip select
and channel of an error. A function pointer should be used rather than
introduce another branching condition.

Prepare for this by adding get_err_info() to pvt->ops. This function is
only called from the modern code path, so a legacy function is not
defined.

Make sure to call this after MCA_STATUS[SyndV] is checked, since the
csrow value is found in MCA_SYND.

  [ Yazen: rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-23-yazen.ghannam@amd.com
2023-03-24 13:03:21 +01:00
Muralidhara M K f6f36382d6 EDAC/amd64: Split dump_misc_regs() into dct/umc functions
Add a function pointer to pvt->ops.

No functional change is intended.

  [ Yazen: Rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-22-yazen.ghannam@amd.com
2023-03-24 13:03:21 +01:00
Muralidhara M K 6fb8b5fb9e EDAC/amd64: Split init_csrows() into dct/umc functions
Call them from their respective setup_mci_misc_attrs() paths.

Also, drop the check for an "empty" device, i.e. one without memory.
This is redundant and already done in instance_has_memory() earlier in
the init path.

No functional change is intended.

  [ Yazen: rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-21-yazen.ghannam@amd.com
2023-03-24 13:03:21 +01:00
Muralidhara M K f6a4b4a1aa EDAC/amd64: Split determine_edac_cap() into dct/umc functions
Call them from their respective setup_mci_misc_attrs() paths.

No functional change is intended.

  [ Yazen: rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-20-yazen.ghannam@amd.com
2023-03-24 13:03:21 +01:00
Yazen Ghannam 9369239e8d EDAC/amd64: Rename f17h_determine_edac_ctl_cap()
...to match the "umc_" prefix convention.

No functional change is intended.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-19-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Muralidhara M K 0a42a37f65 EDAC/amd64: Split setup_mci_misc_attrs() into dct/umc functions
The init_one_instance() path is shared between legacy and modern
systems. So add the new functions to a function pointer in pvt->ops.

No functional change is intended.

  [ Yazen: Rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-18-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Muralidhara M K eb2bcdfc37 EDAC/amd64: Split ecc_enabled() into dct/umc functions
Call them using a function pointer in pvt->ops. The "ECC enabled"
check is done outside of the hardware information gathering done in
hw_info_get(). So a high-level function pointer is needed to separate
the legacy and modern paths.

No functional change is intended.

  [Yazen: rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-17-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Muralidhara M K 32ecdf8688 EDAC/amd64: Split read_mc_regs() into dct/umc functions
Call them from their respective hw_info_get() paths.

ECC symbol size is not needed on UMC systems, so determine_ecc_sym_sz()
is left out of the UMC path. Do not save TOP_MEM* values on modern
controllers because they're not needed there (read: they were used only
for debugging, if anything).

  [ Yazen: rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-16-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Muralidhara M K 78ec161a91 EDAC/amd64: Split determine_memory_type() into dct/umc functions
Call them from their respective hw_info_get() paths.

Call them after all other hardware registers have been saved, since the
memory type for a device will be determined based on the saved
information.

  [ Yazen: rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-15-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Muralidhara M K b29dad9bf3 EDAC/amd64: Split read_base_mask() into dct/umc functions
Call them from their respective hw_info_get() paths.

Call the new functions after the setting the chip select base and mask
counts, since those are need to read the correct number of chip select
base and mask registers. And call the new functions before the remaining
set up, because the base and mask register values will be needed later.

  [Yazen: Rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-14-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Muralidhara M K 637f60ef2c EDAC/amd64: Split prep_chip_selects() into dct/umc functions
Call them from their respective hw_info_get() function. Avoid the
need for family/model-based function pointers.

Add the calls before reading hardware registers from the memory
controllers, since the number of chip select bases and masks needs to be
known first.

  [ Yazen: rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-13-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Yazen Ghannam 9a97a7f4d7 EDAC/amd64: Rework hw_info_{get,put}
The bulk of system-specific information is gathered at init time with
hw_info_get(). This function calls a number of helper functions, and
many of these helper functions are split between a modern UMC/DF path
and a legacy DCT path.

Split hw_info_get() into legacy and modern versions. This creates two
separate code paths early on, and legacy and modern helper functions can
be called directly in the appropriate code path.

Also, simplify hw_info_put() and share it between legacy and modern
systems. NULL pointer checks are done in pci_dev_put() and kfree(), so
they can be called unconditionally.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-12-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Muralidhara M K ed623d55ee EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt
Future AMD systems will support heterogeneous "AMD Node" types, e.g.
CPU and GPU types. Therefore, a global family type shared across all
AMD nodes is no longer appropriate.

Move struct low_ops routines and members of struct amd64_family_type
to struct amd64_pvt.

Currently, there are many code branches that split between "modern" and
"legacy" systems. Another code branch will be needed in order to cover
GPU cases. However, rather than introduce another branching case in
multiple functions, the current branching code should be switched to a
set of function pointers. This change makes the code more readable and
simplifies adding support for new families/models.

In order to reuse code, define two sets of function pointers. Use one
for modern systems (Family 17h and later). This will not change between
current CPU families. Use another set of function pointers for legacy
systems (before Family 17h). Use the Family 16h versions as default
for the legacy ops since these are the latest, and adjust the function
pointers as needed for older families.

  [ Yazen: rebased/reworked patch and reworded commit message. ]
  [  bp: Fix rev8 or later check. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-11-yazen.ghannam@amd.com
2023-03-24 13:03:19 +01:00
Yazen Ghannam 5a1adb375d EDAC/amd64: Do not discover ECC symbol size for Family 17h and later
The ECC symbol size was needed on legacy system to lookup the ECC syndrome.
This is not needed on modern systems because the ECC syndrome is explicitly
provided in the MCA information.

Remove the ECC symbol size discovery code for modern UMC-based systems.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-10-yazen.ghannam@amd.com
2023-03-24 13:03:19 +01:00
Yazen Ghannam a2e59ab8e9 EDAC/amd64: Drop dbam_to_cs() for Family 17h and later
The same function is used to calculate chip select size for all Zen-based
family/models. Therefore, a family/model function pointer is not necessary.

Drop the dbam_to_cs() function pointer for Family 17h and later systems.
Also, move the Family 17h function to avoid a forward declaration. Rename
it to indicate that the UMC Address Mask is used rather than the legacy
DBAM value.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-9-yazen.ghannam@amd.com
2023-03-24 13:03:19 +01:00
Yazen Ghannam c0984666fd EDAC/amd64: Split get_csrow_nr_pages() into dct/umc functions
Split get_csrow_nr_pages() into a legacy and modern versions in preparation
for further legacy/modern refactoring.

Also, rename f17_get_cs_mode() to match the new convention.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-8-yazen.ghannam@amd.com
2023-03-24 13:03:19 +01:00
Yazen Ghannam 00e4feb8c0 EDAC/amd64: Rename debug_display_dimm_sizes()
Use the "dct" and "umc" prefixes for legacy and modern versions
respectively.

Also, move the "dct" version to avoid a forward declaration, and fixup
some checkpatch warnings in the process.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-7-yazen.ghannam@amd.com
2023-03-24 12:54:47 +01:00
Jongwoo Han 5b6cb45072 EDAC/i5100: Fix typo in comment
Correct typo from 'preform' to 'perform' in comment.

Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230302021120.56794-1-jongwooo.han@gmail.com
2023-03-23 12:04:04 +01:00
Greg Kroah-Hartman cb4a0bec0b EDAC/sysfs: move to use bus_get_dev_root()
Direct access to the struct bus_type dev_root pointer is going away soon
so replace that with a call to bus_get_dev_root() instead, which is what
it is there for.

Cc: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rric@kernel.org>
Cc: linux-edac@vger.kernel.org
Link: https://lore.kernel.org/r/20230313182918.1312597-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-22 09:25:49 +01:00
Deepak R Varma 4e89780a4c EDAC/altera: Remove redundant error logging
A call to platform_get_irq() already prints an error on failure within
its own implementation. So printing another error based on its return
value in the caller is redundant and should be removed. The clean up
also makes if condition block braces unnecessary. Remove that as well.

Issue identified using platform_get_irq.cocci coccinelle semantic patch.

Signed-off-by: Deepak R Varma <drv@mailo.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/Y/+j27kqdhflPtaj@ubun2204.myguest.virtualbox.org
2023-03-21 22:50:46 +01:00
Manivannan Sadhasivam 721d3e91bf qcom: llcc/edac: Support polling mode for ECC handling
Not all Qcom platforms support IRQ mode for ECC handling. For those
platforms, the current EDAC driver will not be probed due to missing ECC
IRQ in devicetree.

So add support for polling mode so that the EDAC driver can be used on all
Qcom platforms supporting LLCC.

The polling delay of 5000ms is chosen based on Qcom downstream/vendor
driver.

Reported-by: Luca Weiss <luca.weiss@fairphone.com>
Tested-by: Luca Weiss <luca.weiss@fairphone.com>
Tested-by: Steev Klimaszewski <steev@kali.org> # Thinkpad X13s
Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230314080443.64635-14-manivannan.sadhasivam@linaro.org
2023-03-15 15:17:08 -07:00
Manivannan Sadhasivam ee13b50087 qcom: llcc/edac: Fix the base address used for accessing LLCC banks
The Qualcomm LLCC/EDAC drivers were using a fixed register stride for
accessing the (Control and Status Registers) CSRs of each LLCC bank.
This stride only works for some SoCs like SDM845 for which driver
support was initially added.

But the later SoCs use different register stride that vary between the
banks with holes in-between. So it is not possible to use a single register
stride for accessing the CSRs of each bank. By doing so could result in a
crash.

For fixing this issue, let's obtain the base address of each LLCC bank from
devicetree and get rid of the fixed stride. This also means, there is no
need to rely on reg-names property and the base addresses can be obtained
using the index.

First index is LLCC bank 0 and last index is LLCC broadcast. If the SoC
supports more than one bank, then those need to be defined in devicetree
for index from 1..N-1.

Reported-by: Parikshit Pareek <quic_ppareek@quicinc.com>
Tested-by: Luca Weiss <luca.weiss@fairphone.com>
Tested-by: Steev Klimaszewski <steev@kali.org> # Thinkpad X13s
Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230314080443.64635-13-manivannan.sadhasivam@linaro.org
2023-03-15 15:17:08 -07:00
Qiuxu Zhuo 71b1e3ba3f EDAC/skx: Fix overflows on the DRAM row address mapping arrays
The current DRAM row address mapping arrays skx_{open,close}_row[]
only support ranks with sizes up to 16G. Decoding a rank address
to a DRAM row address for a 32G rank by using either one of the
above arrays by the skx_edac driver, will result in an overflow on
the array.

For a 32G rank, the most significant DRAM row address bit (the
bit17) is mapped from the bit34 of the rank address. Add this new
mapping item to both arrays to fix the overflow issue.

Fixes: 4ec656bdf4 ("EDAC, skx_edac: Add EDAC driver for Skylake")
Reported-by: Feng Xu <feng.f.xu@intel.com>
Tested-by: Feng Xu <feng.f.xu@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/all/20230211011728.71764-1-qiuxu.zhuo@intel.com
2023-03-13 10:42:00 -07:00
Linus Torvalds d9de5ce8a5 - Add a driver for the RAS functionality on Xilinx's on chip memory
controller
 
 - Add support for decoding errors from the first and second level memory
   on SKL-based hardware
 
 - Add support for the memory controllers in Intel Granite Rapids and
   Emerald Rapids machines
 
 - First round of amd64_edac driver simplification and removal of
   unneeded functionality
 
 - The usual cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPw5xsACgkQEsHwGGHe
 VUohSA//bS/iFiglmpTKiY1qynPuVRfZCYGZov5JN+fRzpFQos1HISHGHTKtGbGJ
 pau8Y6+QJG5LkFdR8Nf1u25WOEaYhBHHj1crUAkmSIz9zYyinrdYyDTOC2LBTmSf
 BziOElAtuhTrvQ4WNL75cFzpaAKCGE7yuwZZFLVM3gHXiuVeZ3Spzbe0I9eJ4uDe
 Hvgg1/IVoGAsvhNouxG5ABgVzKWxoyqEDFZtLo1adLuv8cm0hwFKWqC7zw9Y/gj0
 b8tiqnoRxrEDNt8uc+D+y9HIXunB+YPBUcGhDZFrYAMlWQbENQ2WJSodIg0klNtv
 Nd62wWZavdtCv9rMjOdGFPuLvWV1Lr5uIsNVSEhuqRpXjywFdYycMfmuD30YIfA6
 k1t71pxGSB5fJ6qr/y0a4HkoRz9HON03Ki00gkVIMMo48k0DJKtzt6Mui8rtzIe3
 uNlSDxyMXQvEUg/nR54kPAropL5DvKRx7QJ3Z1Yh4KcFmH1NtjIqoJfDghK2Gz1X
 XIzIzeTJy+LRepZ6KRSEDOM8FrFzHkUKU9OZTnn/RlWha6nKyBaVyeb5kutJCW+N
 Ytj9DqSxpAFDRBvbUpHRRFL1h5bgss7+AXLpkmYBF0QKmYiYV/MBSBdNpEZ1B3VC
 CsRlD1IT6FSUhAdPqhAvbCDPOGpd/AvGhmLnfmn78wGIIWR0W24=
 =i3bo
 -----END PGP SIGNATURE-----

Merge tag 'edac_updates_for_v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

 - Add a driver for the RAS functionality on Xilinx's on chip memory
   controller

 - Add support for decoding errors from the first and second level
   memory on SKL-based hardware

 - Add support for the memory controllers in Intel Granite Rapids and
   Emerald Rapids machines

 - First round of amd64_edac driver simplification and removal of
   unneeded functionality

 - The usual cleanups and fixes

* tag 'edac_updates_for_v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/amd64: Shut up an -Werror,-Wsometimes-uninitialized clang false positive
  EDAC/amd64: Remove early_channel_count()
  EDAC/amd64: Remove PCI Function 0
  EDAC/amd64: Remove PCI Function 6
  EDAC/amd64: Remove scrub rate control for Family 17h and later
  EDAC/amd64: Don't set up EDAC PCI control on Family 17h+
  EDAC/i10nm: Add driver decoder for Sapphire Rapids server
  EDAC/i10nm: Add Intel Granite Rapids server support
  EDAC/i10nm: Make more configurations CPU model specific
  EDAC/i10nm: Add Intel Emerald Rapids server support
  EDAC/skx_common: Delete duplicated and unreachable code
  EDAC/skx_common: Enable EDAC support for the "near" memory
  EDAC/qcom: Add platform_device_id table for module autoloading
  EDAC/zynqmp: Add EDAC support for Xilinx ZynqMP OCM
  dt-bindings: edac: Add bindings for Xilinx ZynqMP OCM
2023-02-21 08:10:03 -08:00
Linus Torvalds 0246725d73 - Add support for reporting more bits of the physical address on error,
on newer AMD CPUs
 
 - Mask out bits which don't belong to the address of the error being
   reported
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPzUAEACgkQEsHwGGHe
 VUr22RAAh7fi3s8sDP4B2WBe1LPKZystZamxlLObBG2eLT7g0YmSKV12+bHCGf/B
 nGqz9iy+e/T1Khxv0gdEyuENwzuitXgiEOYgB4u70HimWy5422ZCzn1EiOMFtyST
 g0ehOR+tU84YwMVR40ui3spI1DHgeVPqVBLHBARZ1OAaA58N8eVREC6MqJAeAzIU
 +VYiBbn69quECTuU1P7yaT8NDnbm5G6pA1dhKLc7vLl9QWzoW1yWLLcp+oGFN6B8
 rcGDKEDK1OYtdHScRCfhFrznkeYP6SVnSt4wlAgX+HVGPoMpvq8nJygxCWdE0yjd
 aQGhdcVJkQlSqm1iJUv0MK9nkolqXVVSVTurpHunAq7ctul6Qm/X+fsfwBgSIXXn
 Gdj3in374MLWCz/xGqeBS8IiiPxGxJA9s350jyk02LK6Np6sXeuc4PpR66+6FAKQ
 Ypen+uWJ6oBof04bW7DBK0R14atA8EpOOLUrrGIsSkNSEIjLaCipMZOpRCbOw76N
 bXcdnKKsaEDjKtHClvx/vZXklfzWk0OgF8qtY0nGF+khvDAi3pQaIIlCehf0Qemh
 6j00TqIYBCXa0kuKktdPzVJSM7A7TZ5ftboa1IPhE+GYrFFee/VJ3yfgqz102FWI
 RJsY8JXt+EP3VMSOQYqQ5KzcLBJ2uDiRYtgUo4P1CITNpRfZEMc=
 =e9v9
 -----END PGP SIGNATURE-----

Merge tag 'ras_core_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS updates from Borislav Petkov:

 - Add support for reporting more bits of the physical address on error,
   on newer AMD CPUs

 - Mask out bits which don't belong to the address of the error being
   reported

* tag 'ras_core_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Mask out non-address bits from machine check bank
  x86/mce: Add support for Extended Physical Address MCA changes
  x86/mce: Define a function to extract ErrorAddr from MCA_ADDR
2023-02-21 08:04:51 -08:00
Yazen Ghannam 28980db947 EDAC/amd64: Shut up an -Werror,-Wsometimes-uninitialized clang false positive
Reportedly, clang cannot do interprocedural analysis:

  https://lore.kernel.org/r/20230213-amd64_edac-wsometimes-uninitialized-v1-1-5bde32b89e02@kernel.org

and see that those arguments won't be used uninitialized.

So, yeah, the code's fine even without this. Normally, such a "fix"
won't be applied but that warning gets automatically enabled in -Wall
builds and when CONFIG_WERROR is set in allmodconfig builds, the build
fails.

So shut it up with a minimal fix as this code will see more
reorganization very soon.

  [ bp: Write commit message. ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/Y%2BqdVHidnrrKvxiD@dev-arch.thelio-3990X
2023-02-14 17:56:14 +01:00
Yazen Ghannam c4605bde33 EDAC/amd64: Remove early_channel_count()
The early_channel_count() function seems to have been useful in the past
for knowing how many EDAC mci structures to populate. However, this is no
longer needed as the maximum channel count for a system is used instead.

Remove the early_channel_count() helper functions and related code. Use the
size of the channel layer when iterating over channel structures.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-6-yazen.ghannam@amd.com
2023-02-09 14:43:39 +01:00
Yazen Ghannam cf981562e6 EDAC/amd64: Remove PCI Function 0
PCI Function 0 is used on Family 17h and later only to read the "dhar"
value. This value is printed and provided through a module-specific
debug sysfs file. The value is not used for any Family 17h and later
code, and it does not have any apparent debug value on these systems.

Remove "dhar", Function 0 PCI IDs, and all related code.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-5-yazen.ghannam@amd.com
2023-02-09 14:41:56 +01:00
Yazen Ghannam 6229235f7c EDAC/amd64: Remove PCI Function 6
PCI Function 6 is used on Family 17h and later to access scrub
registers. With scrub access removed, this function has no other use.

Remove all Function 6 PCI IDs and related code.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-4-yazen.ghannam@amd.com
2023-02-09 11:50:52 +01:00