Commit graph

1509 commits

Author SHA1 Message Date
Linus Torvalds
11b1177399 * Do not build mpc85_edac as a module (Michael Ellerman)
* Correct edac_mc_find()'s return value on error	(Robert Richter)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAlzdOjIACgkQEsHwGGHe
 VUrkxQ/+J+mSL23ush4HNq43OqW7Wgpkdoljo+5hr1XWH790Kc/HNI+B/Qdsgl6I
 UoFkJPwMTfxqV/JZFuNC22ADnrXw6nRDZSE7+K+fHmpBgj2rK4hQ6tOYON9dTg/X
 9nPF1ylBQcdzwNt4eTTnS5U1eKBSWqmQm5NtYmq2s83DQ1v3r2Pckqx00SC5Stdy
 9vQLCPUR5SjsGpeQZnrdesxO89t3pWWzfZrAsCxPiY36VbUVmsHQVbHC9yfpLSIE
 6pzc0esQx6zqyb4e0Rx+aKJPJ6fS43NsxifkI81h7S5F/xPEuZy39rhAmnEOYCNE
 0ceQwl+ktzzGzmurgXf9Y5LLy8vOcCUJX4HBCkaweHSDRiP1xZtIl7Ao3aAaP4no
 AUQqX/O0BydkO+awarhCduM7vIGdr0B9xn0xCtUODitoTIzaoVZ8F2iF7xSOi/3F
 QkP4dSsf72ZK/EdA/8k1bxyBnxQ/TKgwLIQEJ1n9Bz7biPxZmqCyHQMDg7HaA1Uc
 crGkJJr0cGHb9TEAzy0ygtRNrKE+hpt0FGFGE0zlE4uX7YP1Y/Bw3R38zbcl2x9h
 nIqEmHwswVfBdm28RfxRH3LnS9SHwsXTtUk0y89Je1pK0qZ2Kv6nRiEFzWIAqPyY
 I/7eorSkNXU+vVnZko8cvuHEuJh1VUj2bGJeS6NUo9X5+IOE0LE=
 =yjXI
 -----END PGP SIGNATURE-----

Merge tag 'edac_fixes_for_5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC fixes from Borislav Petkov:

 - Do not build mpc85_edac as a module (Michael Ellerman)

 - Correct edac_mc_find()'s return value on error (Robert Richter)

* tag 'edac_fixes_for_5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC/mc: Fix edac_mc_find() in case no device is found
  EDAC/mpc85xx: Prevent building as a module
2019-05-16 11:55:35 -07:00
Robert Richter
29a0c84397 EDAC/mc: Fix edac_mc_find() in case no device is found
The function should return NULL in case no device is found, but it
always returns the last checked mc device from the list even if the
index did not match. Fix that.

I did some analysis why this did not raise any issues for about 3 years
and the reason is that edac_mc_find() is mostly used to search for
existing devices. Thus, the bug is not triggered.

 [ bp: Drop the if (mci->mc_idx > idx) test in favor of readability. ]

Fixes: c73e8833be ("EDAC, mc: Fix locking around mc_devices list")
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Link: https://lkml.kernel.org/r/20190514104838.15065-1-rrichter@marvell.com
2019-05-14 17:08:46 +02:00
Michael Ellerman
2b8358a951 EDAC/mpc85xx: Prevent building as a module
The mpc85xx EDAC driver can be configured as a module but then fails to
build because it uses two unexported symbols:

  ERROR: ".pci_find_hose_for_OF_device" [drivers/edac/mpc85xx_edac_mod.ko] undefined!
  ERROR: ".early_find_capability" [drivers/edac/mpc85xx_edac_mod.ko] undefined!

We don't want to export those symbols just for this driver, so make the
driver only configurable as a built-in.

This seems to have been broken since at least

  c92132f598 ("edac/85xx: Add PCIe error interrupt edac support")

(Nov 2013).

 [ bp: make it depend on EDAC=y so that the EDAC core doesn't get built
   as a module. ]

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linuxppc-dev@ozlabs.org
Cc: morbidrsa@gmail.com
Link: https://lkml.kernel.org/r/20190502141941.12927-1-mpe@ellerman.id.au
2019-05-10 20:15:11 +02:00
Linus Torvalds
ffa6f55eb6 Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Borislav Petkov:

 - Support for varying MCA bank numbers per CPU: this is in preparation
   for future CPU enablement (Yazen Ghannam)

 - MCA banks read race fix (Tony Luck)

 - Facility to filter MCEs which should not be logged (Yazen Ghannam)

 - The usual round of cleanups and fixes

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/MCE/AMD: Don't report L1 BTB MCA errors on some family 17h models
  x86/MCE: Add an MCE-record filtering function
  RAS/CEC: Increment cec_entered under the mutex lock
  x86/mce: Fix debugfs_simple_attr.cocci warnings
  x86/mce: Remove mce_report_event()
  x86/mce: Handle varying MCA bank counts
  x86/mce: Fix machine_check_poll() tests for error types
  MAINTAINERS: Fix file pattern for X86 MCE INFRASTRUCTURE
  x86/MCE: Group AMD function prototypes in <asm/mce.h>
2019-05-06 19:54:57 -07:00
Borislav Petkov
8de9930a46 Revert "EDAC/amd64: Support more than two controllers for chip select handling"
This reverts commit 0a227af521.

Unfortunately, this commit caused wrong detection of chip select sizes
on some F17h client machines:

  --- 00-rc6+     2019-02-14 14:28:03.126622904 +0100
  +++ 01-rc4+     2019-04-14 21:06:16.060614790 +0200
   EDAC amd64: MC: 0:     0MB 1:     0MB
  -EDAC amd64: MC: 2: 16383MB 3: 16383MB
  +EDAC amd64: MC: 2:     0MB 3: 2097151MB
   EDAC amd64: MC: 4:     0MB 5:     0MB
   EDAC amd64: MC: 6:     0MB 7:     0MB
   EDAC MC: UMC1 chip selects:
   EDAC amd64: MC: 0:     0MB 1:     0MB
  -EDAC amd64: MC: 2: 16383MB 3: 16383MB
  +EDAC amd64: MC: 2:     0MB 3: 2097151MB
   EDAC amd64: MC: 4:     0MB 5:     0MB
   EDAC amd64: MC: 6:     0MB 7:     0M

Revert it for now until it has been solved properly.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
2019-04-25 16:30:34 +02:00
Yazen Ghannam
71a84402b9 x86/MCE/AMD: Don't report L1 BTB MCA errors on some family 17h models
AMD family 17h Models 10h-2Fh may report a high number of L1 BTB MCA
errors under certain conditions. The errors are benign and can safely be
ignored. However, the high error rate may cause the MCA threshold
counter to overflow causing a high rate of thresholding interrupts.

In addition, users may see the errors reported through the AMD MCE
decoder module, even with the interrupt disabled, due to MCA polling.

Clear the "Counter Present" bit in the Instruction Fetch bank's
MCA_MISC0 register. This will prevent enabling MCA thresholding on this
bank which will prevent the high interrupt rate due to this error.

Define an AMD-specific function to filter these errors from the MCE
event pool so that they don't get reported during early boot.

Rename filter function in EDAC/mce_amd to avoid a naming conflict, while
at it.

 [ bp: Move function prototype to the internal header and
   massage/cleanup, fix typos. ]

Reported-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "clemej@gmail.com" <clemej@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Shirish S <Shirish.S@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Cc: <stable@vger.kernel.org> # 5.0.x: c95b323dcd: x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models
Cc: <stable@vger.kernel.org> # 5.0.x: 30aa3d26ed: x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk
Cc: <stable@vger.kernel.org> # 5.0.x: 9308fd4074: x86/MCE: Group AMD function prototypes in <asm/mce.h>
Cc: <stable@vger.kernel.org> # 5.0.x
Link: https://lkml.kernel.org/r/20190325163410.171021-2-Yazen.Ghannam@amd.com
2019-04-23 18:16:07 +02:00
Thor Thayer
fad9fab975 EDAC/altera, firmware/intel: Add Stratix10 ECC DBE SMC call
Reserve ECC Double Bit Error SMC call to alert U-Boot that a DBE has
occurred. Move the call from local EDAC header file to a common header.

 [ bp: Merge the two patches. ]

Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Richard Gong <richard.gong@intel.com>
Reviewed-by: Alan Tull <atull@kernel.org> # firmware
Cc: Greg KH <greg@kroah.com>
Cc: James Morse <james.morse@arm.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mchehab@kernel.org
Link: https://lkml.kernel.org/r/1553870639-23895-1-git-send-email-thor.thayer@linux.intel.com

Signed-off-by: Borislav Petkov <bp@suse.de>
2019-04-02 17:42:15 +02:00
Thor Thayer
788586efd1 EDAC/altera: Initialize peripheral FIFOs in probe()
The FIFO memory and ECC initialization doesn't need to be
done as a separate operation early in the startup.

Improve the Arria10 and Stratix10 peripheral FIFO init
by initializing memory and enabling ECC as part of the
device driver initialization.

Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/1553635771-32693-2-git-send-email-thor.thayer@linux.intel.com
2019-03-29 11:46:23 +01:00
Thor Thayer
436b0a583a EDAC/altera: Do less intrusive error injection
Improve the Arria10 and Stratix10 error injection routine
by reading the data and changing just 1 bit before writing
back out. Previous routine would overwrite the first bytes
to 0 then change 1 bit but this method is less intrusive.

Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/1553635771-32693-1-git-send-email-thor.thayer@linux.intel.com
2019-03-29 11:35:39 +01:00
Yazen Ghannam
fc00c6a416 EDAC/amd64: Adjust printed chip select sizes when interleaved
AMD systems may support chip select interleaving. However, on family
17h+ this was not taken into account when printing the chip select
sizes.

Add support to detect if chip selects are interleaved on family 17h+,
and adjust the sizes accordingly.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190228153558.127292-6-Yazen.Ghannam@amd.com
2019-03-27 00:13:25 +01:00
Yazen Ghannam
0a227af521 EDAC/amd64: Support more than two controllers for chip select handling
The struct chip_select array that's used for saving chip select bases
and masks is fixed at length of two. There should be one struct
chip_select for each controller, so this array should be increased to
support systems that may have more than two controllers.

Increase the size of the struct chip_select array to eight, which is the
largest number of controllers per die currently supported on AMD
systems.

Also, carve out the Family 17h+ reading of the bases/masks into a
separate function. This effectively reverts the original bases/masks
reading code to before Family 17h support was added.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190228153558.127292-5-Yazen.Ghannam@amd.com
2019-03-27 00:13:25 +01:00
Yazen Ghannam
7835961d37 EDAC/amd64: Recognize x16 symbol size
Future AMD systems may support x16 symbol sizes.

Recognize if a system is using x16 symbol size. Also, simplify the print
statement.

Note that a x16 syndrome vector table is not necessary like with x4 or
x8 syndromes. This is because systems that support x16 symbol sizes are
SMCA systems and in that case, the syndrome can be directly extracted
from the MCA_SYND[Syndrome] field.

 [ bp: massage. ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190228153558.127292-4-Yazen.Ghannam@amd.com
2019-03-27 00:13:25 +01:00
Yazen Ghannam
869adc4316 EDAC/amd64: Set maximum channel layer size depending on family
The AMD64 EDAC module currently hardcodes the EDAC channel layer size
count to two. Future AMD systems may have more channels than this.

Set the EDAC channel layer size equal to the maximum number of channels
possible for the system. On Family 17h and later, this is set in the
num_umcs variable. Older systems will continue to use two as the
default.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190325203319.7603-1-Yazen.Ghannam@amd.com
2019-03-27 00:13:25 +01:00
Yazen Ghannam
bdcee7747f EDAC/amd64: Support more than two Unified Memory Controllers
The first few models of Family 17h all had 2 Unified Memory Controllers
per Die, so this was treated as a fixed value. However, future systems
may have more Unified Memory Controllers per Die.

Related to this, the channel number and base address of a Unified Memory
Controller were found by matching on fixed, known values. However,
current and future systems follow this pattern for the channel number
and base address of a Unified Memory Controller: 0xYXXXXX, where Y is
the channel number. So matching on hardcoded values is not necessary.

Set the number of Unified Memory Controllers at driver init time based
on the family/model. Also, update the functions that find the channel
number and base address of a Unified Memory Controller to support more
than two.

 [ bp: Move num_umcs into the .c file and simplify comment. ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190228153558.127292-3-Yazen.Ghannam@amd.com
2019-03-27 00:13:25 +01:00
Yazen Ghannam
4d30d2bc3c EDAC/amd64: Use a macro for iterating over Unified Memory Controllers
Define and use a macro for looping over the number of Unified Memory
Controllers.

No functional change.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190228153558.127292-2-Yazen.Ghannam@amd.com
2019-03-27 00:13:25 +01:00
Yazen Ghannam
6e846239e5 EDAC/amd64: Add Family 17h Model 30h PCI IDs
Add the new Family 17h Model 30h PCI IDs to the AMD64 EDAC module.

This also fixes a probe failure that appeared when some other PCI IDs
for Family 17h Model 30h were added to the AMD NB code.

Fixes: be3518a16e (x86/amd_nb: Add PCI device IDs for family 17h, model 30h)
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190228153558.127292-1-Yazen.Ghannam@amd.com
2019-03-27 00:13:25 +01:00
Thor Thayer
1bd76ff448 EDAC, altera: Fix S10 Double Bit Error Notification
Stratix10 Double Bit Error Address was always read from SDRAM Address
register instead of each device's Address register.

To determine which device had the DBE, cycle through the EDAC devices
comparing the DBE value to the db_irq value. Once found, report the DBE
Address from the device registers as well as the device name.

Finally, notify the system via an SMC call and indicate the panic should
result in a system reboot. Change a run-time check to a Stratix10
compile-time check for a clean SMC notification.

Fixes: d5fc912556 ("EDAC, altera: Combine Stratix10 and Arria10 probe functions")
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/1552490842-25440-1-git-send-email-thor.thayer@linux.intel.com
2019-03-23 10:03:30 +01:00
Qiuxu Zhuo
fe783516e3 EDAC, skx, i10nm: Make skx_common.c a pure library
The following Kconfig constellations fail randconfig builds:

  CONFIG_ACPI_NFIT=y
  CONFIG_EDAC_DEBUG=y
  CONFIG_EDAC_SKX=m
  CONFIG_EDAC_I10NM=y

or

  CONFIG_ACPI_NFIT=y
  CONFIG_EDAC_DEBUG=y
  CONFIG_EDAC_SKX=y
  CONFIG_EDAC_I10NM=m

with:
  ...
  CC [M]  drivers/edac/skx_common.o
  ...
  .../skx_common.o:.../skx_common.c:672: undefined reference to `__this_module'

That is because if one of the two drivers - skx_edac or i10nm_edac - is
built-in and the other one is a module, the shared file skx_common.c
gets linked into a module object by kbuild. Therefore, when linking that
same file into vmlinux, the '__this_module' symbol used in debugfs isn't
defined, leading to the above error.

Fix it by moving all debugfs code from skx_common.c to both skx_base.c
and i10nm_base.c respectively. Thus, skx_common.c doesn't refer to the
'__this_module' symbol anymore.

Clarify skx_common.c's purpose at the top of the file for future
reference, while at it.

 [ bp: Make text more readable. ]

Fixes: d4dc89d069 ("EDAC, i10nm: Add a driver for Intel 10nm server processors")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190321221339.GA32323@agluck-desk
2019-03-23 09:43:50 +01:00
Linus Torvalds
e13284da94 Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Borislav Petkov:
 "This time around we have in store:

   - Disable MC4_MISC thresholding banks on all AMD family 0x15 models
     (Shirish S)

   - AMD MCE error descriptions update and error decode improvements
     (Yazen Ghannam)

   - The usual smaller conversions and fixes"

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Improve error message when kernel cannot recover, p2
  EDAC/mce_amd: Decode MCA_STATUS in bit definition order
  EDAC/mce_amd: Decode MCA_STATUS[Scrub] bit
  EDAC, mce_amd: Print ExtErrorCode and description on a single line
  EDAC, mce_amd: Match error descriptions to latest documentation
  x86/MCE/AMD, EDAC/mce_amd: Add new error descriptions for some SMCA bank types
  x86/MCE/AMD, EDAC/mce_amd: Add new McaTypes for CS, PSP, and SMU units
  x86/MCE/AMD, EDAC/mce_amd: Add new MP5, NBIO, and PCIE SMCA bank types
  RAS: Add a MAINTAINERS entry
  RAS: Use consistent types for UUIDs
  x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk
  x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models
  x86/MCE: Switch to use the new generic UUID API
2019-03-08 09:11:39 -08:00
Linus Torvalds
1b37b8c48d * A new EDAC AST 2500 SoC driver (Stefan M Schaeckeler)
* New i10nm EDAC driver for Intel 10nm CPUs (Qiuxu Zhuo and Tony Luck)
 
 * Altera SDRAM functionality carveout for separate enablement of RAS and
 SDRAM capabilities on some Altera chips. (Thor Thayer)
 
 * The usual round of cleanups and fixes
 
 Last but not least:
 
 * Recruit James Morse as a reviewer for the ARM side
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAlx+SqYACgkQEsHwGGHe
 VUpwxg//fDdlIcnNjPUKWcBQxfy7meFd5xlDwbbIbkdE1mfHLBP6n2gRVM9NguSm
 shYPXcdqIrFTn4D7nOxVLS2Gqa7cF/j9M+YaqTNfe9/OVI0oSeM84D2+kEUi2tHQ
 LkCbBL9W+SAk4wjcFUPrEuwPaABfPdt0g9wuEf3Yg+PQsZ4FojwF7p91plBiKo/X
 GewLIM4+QT/mIkyn5u+2UJWayUvtdc1nchBGg3klYaDTRsUqH9pn284bInj7/Woj
 r34288yXuksIhDnUd2h4F9RCdZegBLIZf/k7Rqdg+Acot64c3PprE+/SI9nFcYfn
 fcF/48Sv6vMfP5kDKeJhsDjWu85VdpP+Cp4bxebXx4NURWn30kyYGDdpvbpgWxzc
 XDOXiEDxfh43/dNEyqCRr86dcZS8ro1pQNlnQvxOJyMljdEGjbB4JizG2ZvVluBP
 hSu3ifgpTiBGJMRQQijha41SMuWE7Z1ZgZt/XnyPAKwEEFtQVrm7IfnDohag3VYw
 6kWMVeyenmx/yF1JmA0fTxAdeeZPMnbUx0JxHRo1wJXF+1b19b0P+1nYUjgKlXQN
 Wq78DGPkQ9InfISFegS/A2AMWk+ZgLZ5d4pVwRVWdyeOMQVUoXO4R3KQur1tV7gu
 vm5BpWRZUszhcVvuhly8fOTyOsudYsNe7EeMd2V0Q2FZBy81MH8=
 =A1kA
 -----END PGP SIGNATURE-----

Merge tag 'edac_for_5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC updates from Borislav Petkov:

 - A new EDAC AST 2500 SoC driver (Stefan M Schaeckeler)

 - New i10nm EDAC driver for Intel 10nm CPUs (Qiuxu Zhuo and Tony Luck)

 - Altera SDRAM functionality carveout for separate enablement of RAS
   and SDRAM capabilities on some Altera chips. (Thor Thayer)

 - The usual round of cleanups and fixes

And last but not least: recruit James Morse as a reviewer for the ARM
side.

* tag 'edac_for_5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC/altera: Add separate SDRAM EDAC config
  EDAC, altera: Add missing of_node_put()
  EDAC, skx_common: Add code to recognise new compound error code
  EDAC, i10nm: Fix randconfig builds
  EDAC, i10nm: Add a driver for Intel 10nm server processors
  EDAC, skx_edac: Delete duplicated code
  EDAC, skx_common: Separate common code out from skx_edac
  EDAC: Do not check return value of debugfs_create() functions
  EDAC: Add James Morse as a reviewer
  dt-bindings, EDAC: Add Aspeed AST2500
  EDAC, aspeed: Add an Aspeed AST2500 EDAC driver
2019-03-08 09:07:07 -08:00
Thor Thayer
580b5cf50c EDAC/altera: Add separate SDRAM EDAC config
The CONFIG_ALTERA_EDAC Kconfig symbol always enables the SDRAM EDAC
functionality. On the newer architectures, however, there are cases
where the peripheral EDAC functionality is enabled but SDRAM needs to be
disabled.

Move SDRAM functions so they can be contained inside the conditional
CONFIG. Create new CONFIG option just for SDRAM.

 [ bp: Massage commit message. ]

Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: dinguyen@kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux@armlinux.org.uk
Link: https://lkml.kernel.org/r/1551121006-4657-2-git-send-email-thor.thayer@linux.intel.com
2019-02-26 16:18:57 +01:00
Yazen Ghannam
a0bcd3c0b8 EDAC/mce_amd: Decode MCA_STATUS in bit definition order
Sort the MCA_STATUS bits in decode output to follow how they are defined
in the register.

The order is as follows:

  Bit | Decode
  ------------
  62  | Over
  61  | UC
  59  | MiscV
  58  | AddrV
  57  | PCC
  55  | TCC
  53  | SyndV
  46  | CECC
  45  | UECC
  44  | Deferred
  43  | Poison
  40  | Scrub

 [ bp: Massage a bit. ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20190212212417.107049-2-Yazen.Ghannam@amd.com
2019-02-15 14:36:31 +01:00
Yazen Ghannam
3f4da372ec EDAC/mce_amd: Decode MCA_STATUS[Scrub] bit
Previous AMD systems have had a bit in MCA_STATUS to indicate that an
error was detected on a scrub operation. However, this bit was defined
differently within different banks and families/models.

Starting with Family 17h, MCA_STATUS[40] is either Reserved/Read-as-Zero
or defined as "Scrub", for all MCA banks and CPU models. Therefore, this
bit can be defined as the "Scrub" bit.

Define MCA_STATUS[40] as "Scrub" and decode it in the AMD MCE decoding
module for Family 17h and newer systems.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190212212417.107049-1-Yazen.Ghannam@amd.com
2019-02-15 14:25:58 +01:00
Huang Zijiang
7f736599d6 EDAC, altera: Add missing of_node_put()
The call to of_parse_phandle() returns a node pointer with refcount
incremented thus it must be explicitly decremented here after the last
usage.

Signed-off-by: Huang Zijiang <huang.zijiang@zte.com.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: wang.yi59@zte.com.cn
Link: https://lkml.kernel.org/r/1550126347-27984-1-git-send-email-huang.zijiang@zte.com.cn
2019-02-15 12:02:47 +01:00
Tony Luck
cbfa482f7e EDAC, skx_common: Add code to recognise new compound error code
A new error code for systems that use DRAM as an extra level of cache
looks like:

    000F 0010 1MMM CCCC

where the MMM and CCCC bits are used for the same purpose as the
original code. For this new class of errors the ADXL translation will
provide details of both the DIMM used as cache for the error location
and the component that is being cached.

Note: This new error code is first supported in Skylake. Older EDAC
drivers do not need to be updated.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190205182109.27828-1-tony.luck@intel.com
2019-02-06 11:03:06 +01:00
Tony Luck
d6a9f7336d EDAC, i10nm: Fix randconfig builds
I10NM_EDAC depends on CONFIG_ACPI so make that dependency explicit.

Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190205180200.26865-1-tony.luck@intel.com
2019-02-06 10:40:58 +01:00
Yazen Ghannam
1c1522d32a EDAC, mce_amd: Print ExtErrorCode and description on a single line
Save a log line by printing the extended error code and the description
on a single line. This is similar to how errors are printed in other
subsystems, e.g. "#, description". If we don't have a valid description
then only the number/code is printed.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20190201225534.8177-6-Yazen.Ghannam@amd.com
2019-02-04 19:29:13 +01:00
Yazen Ghannam
e03447ee71 EDAC, mce_amd: Match error descriptions to latest documentation
Update the error descriptions to match the latest documentation for
easier searching. In some cases the changes are small and in other cases
the changes may be total rewording of the description.

No functional changes.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20190201225534.8177-5-Yazen.Ghannam@amd.com
2019-02-03 13:16:50 +01:00
Yazen Ghannam
8a5dd2cd2f x86/MCE/AMD, EDAC/mce_amd: Add new error descriptions for some SMCA bank types
Some SMCA bank types on future systems will report new error types even
though the bank type is not treated as a new version. These new error
types will reported by bits that are reserved in past systems.

Add the new error descriptions to the lists in edac_mce_amd.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Shirish S <Shirish.S@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190201225534.8177-4-Yazen.Ghannam@amd.com
2019-02-03 13:05:16 +01:00
Yazen Ghannam
3ad7e748c1 x86/MCE/AMD, EDAC/mce_amd: Add new McaTypes for CS, PSP, and SMU units
The existing CS, PSP, and SMU SMCA bank types will see new versions (as
indicated by their McaTypes) in future SMCA systems.

Add the new (HWID, MCATYPE) tuples for these new versions. Reuse the
same names as the older versions, since they are logically the same to
the user. SMCA systems won't mix and match IP blocks with different
McaType versions in the same system, so there isn't a need to
distinguish them. The MCA_IPID register is saved when logging an MCA
error, and that can be used to triage the error.

Also, add the new error descriptions to edac_mce_amd. Some error types
(positions in the list) are overloaded compared to the previous
McaTypes. Therefore, just create new lists of the error descriptions to
keep things simple even if some of the error descriptions are the same
between versions.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Shirish S <Shirish.S@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190201225534.8177-3-Yazen.Ghannam@amd.com
2019-02-03 13:01:57 +01:00
Yazen Ghannam
cbfa447edd x86/MCE/AMD, EDAC/mce_amd: Add new MP5, NBIO, and PCIE SMCA bank types
Add the (HWID, MCATYPE) tuples and names for the new MP5, NBIO, and
PCIE SMCA bank types.

Also, add their respective error descriptions to the MCE decoding module
edac_mce_amd.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Shirish S <Shirish.S@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190201225534.8177-2-Yazen.Ghannam@amd.com
2019-02-03 13:01:44 +01:00
Qiuxu Zhuo
d4dc89d069 EDAC, i10nm: Add a driver for Intel 10nm server processors
This driver supports the Intel 10nm series server integrated memory
controller. It gets the memory capacity and topology information by
reading the registers in PCI configuration space and memory-mapped I/O.

It decodes the memory error address to the platform specific address
by using the ACPI Address Translation (ADXL) Device Specific Method
(DSM).

Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190130191519.15393-5-tony.luck@intel.com
2019-02-02 13:33:18 +01:00
Qiuxu Zhuo
98f2fc829e EDAC, skx_edac: Delete duplicated code
Delete the duplicated code from skx_edac.c and rename skx_edac.c to
skx_base.c. Update the Makefile to build the skx_edac driver from
skx_base.c and skx_common.c.

Add SPDX to skx_base.c and clean out unnecessary #include lines.

 [ bp: Drop the license boilerplate - there's an SPDX identifier now. ]

Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190130191519.15393-4-tony.luck@intel.com
2019-02-02 13:33:11 +01:00
Qiuxu Zhuo
88a242c987 EDAC, skx_common: Separate common code out from skx_edac
Parts of skx_edac can be shared with the Intel 10nm server EDAC driver.

Carve out the common parts from skx_edac in preparation to support both
skx_edac driver and i10nm_edac drivers.

Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190130191519.15393-3-tony.luck@intel.com
2019-02-02 10:50:59 +01:00
Thor Thayer
245b6c6558 EDAC, altera: Fix S10 persistent register offset
Correct the persistent register offset where address and status are
stored.

Fixes: 08f08bfb7b ("EDAC, altera: Merge Stratix10 into the Arria10 SDRAM probe routine")
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: devicetree@vger.kernel.org
Cc: dinguyen@kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mark.rutland@arm.com
Cc: robh+dt@kernel.org
Cc: stable <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/1548179287-21760-2-git-send-email-thor.thayer@linux.intel.com
2019-01-24 17:13:59 +01:00
Greg Kroah-Hartman
912ebd99ed EDAC: Do not check return value of debugfs_create() functions
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.

 [ bp: Make edac_debugfs_init() return void too, while at it. ]

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190122152151.16139-17-gregkh@linuxfoundation.org
2019-01-23 10:41:18 +01:00
Stefan M Schaeckeler
9b7e6242ee EDAC, aspeed: Add an Aspeed AST2500 EDAC driver
Add support for the Aspeed AST2500 SoC.

Signed-off-by: Stefan M Schaeckeler <sschaeck@cisco.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Jeffery <andrew@aj.id.au>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: devicetree@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-aspeed@lists.ozlabs.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/1547743097-5236-2-git-send-email-schaecsn@gmx.net
2019-01-18 15:23:11 +01:00
Patrick Havelange
75dfa87035 EDAC, fsl_ddr: Add LS1021A to the list of supported hardware
The Freescale ddr driver also works on the LS1021A board.

Signed-off-by: Patrick Havelange <patrick.havelange@essensium.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: York Sun <york.sun@nxp.com>
Cc: arnout.vandecappelle@essensium.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: matthew.weber@rockwellcollins.com
Cc: patrick.havelange@essensium.com
Link: https://lkml.kernel.org/r/20181219104323.10324-1-patrick.havelange@essensium.com
2018-12-19 11:57:45 +01:00
YueHaibing
bd44735418 EDAC, i5000: Remove set but not used local variables
Remove unused local variables as reported by gcc's -Wunused-but-set-variable option.

 [ bp: simplify commit message. ]

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20181211095207.25936-1-yuehaibing@huawei.com
2018-12-11 14:53:49 +01:00
Alexandre Belloni
37d964f914 EDAC, i82975x: Fix spelling mistake "reserverd" -> "reserved"
Fix a spelling mistake in a register layout description.

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20181120153304.1218-1-alexandre.belloni@bootlin.com
2018-11-20 17:46:01 +01:00
York Sun
a59817fa8f EDAC, fsl: Move error injection under CONFIG_EDAC_DEBUG
Gate error injection feature with CONFIG_EDAC_DEBUG so that it is not
visible in production setups.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: York Sun <york.sun@nxp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20181119225303.13265-1-york.sun@nxp.com
2018-11-20 17:33:32 +01:00
Qiuxu Zhuo
fa1c071c1e EDAC, skx: Let EDAC core show the decoded result for debugfs
Current debugfs shows the decoded result in its own print format which
is inconvenient for analysis/statistics.

Use skx_mce_check_error() instead of skx_decode() for debugfs,
then the decoded result is showed via EDAC core in a more readable
format like "CPU_SrcID#[0-9]_MC#[0-9]_Chan#[0-9]_DIMM#[0-9]".

Print a warning the first time this interface is used so the
administrator can see the console log that error(s) have been faked.

Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: arozansk@redhat.com
CC: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1542353705-13531-1-git-send-email-qiuxu.zhuo@intel.com
2018-11-16 11:19:53 +01:00
Qiuxu Zhuo
85b9c8bfee EDAC, skx: Move debugfs node under EDAC's hierarchy
The debugfs node is /sys/kernel/debug/skx_edac_test. Rename it and move
under EDAC debugfs root directory. Remove the unused 'skx_fake_addr' and
remove the 'skx_test' on error.

Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: arozansk@redhat.com
CC: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1542353684-13496-1-git-send-email-qiuxu.zhuo@intel.com
2018-11-16 11:17:24 +01:00
Qiuxu Zhuo
e235dd43d8 EDAC, skx: Prepend hex formatting with '0x'
Some debug/error strings in hex formatting do not have the '0x' prefix.

Prepend hex formatting with '0x' for them, but with one exception:
"Couldn't enable %04x:%04x", instead of putting '0x' in this line,
add the word 'device'. We commonly use 8086:1234 without the leading
'0x' (e.g. as '-d' argument to lspci(8) and setpci(8) commands).

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Tony Luck <tony.luck@intel.com>
CC: arozansk@redhat.com
CC: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1542353660-13458-1-git-send-email-qiuxu.zhuo@intel.com
2018-11-16 11:17:19 +01:00
Qiuxu Zhuo
a6a386152a EDAC, skx: Fix function calling order in skx_exit()
The order of function calling in skx_exit() is not the reversed order in
skx_init(). Fix it.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Tony Luck <tony.luck@intel.com>
CC: arozansk@redhat.com
CC: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1542353616-13421-1-git-send-email-qiuxu.zhuo@intel.com
2018-11-16 11:17:11 +01:00
Borislav Petkov
861e6ed667 EDAC: Drop per-memory controller buses
... and use the single edac_subsys object returned from
subsys_system_register(). The idea is to have a single bus
and multiple devices on it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
CC: Aristeu Rozanski Filho <arozansk@redhat.com>
CC: Greg KH <gregkh@linuxfoundation.org>
CC: Justin Ernst <justin.ernst@hpe.com>
CC: linux-edac <linux-edac@vger.kernel.org>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Russ Anderson <rja@hpe.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20180926152752.GG5584@zn.tnic
2018-11-13 21:55:24 +01:00
Tony Luck
88a10b1517 EDAC: Don't add devices under /sys/bus/edac
Nobody(*) uses them. Dropping this will allow us to make the total
number of memory controllers configurable (as we won't have to worry
about duplicated device names under this directory).

(*) https://lkml.kernel.org/r/20180927221054.580220e5@coco.lan

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
CC: Aristeu Rozanski Filho <arozansk@redhat.com>
CC: Greg KH <gregkh@linuxfoundation.org>
CC: Justin Ernst <justin.ernst@hpe.com>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Russ Anderson <rja@hpe.com>
CC: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20181001224313.GA9487@agluck-desk
2018-11-13 20:26:41 +01:00
Colin Ian King
1722bc0e8c EDAC: Fix indentation issues in several EDAC drivers
Replace spaces with tabs and insert missing indentation.

 [ bp: Rewrite commit message. ]

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: "Arvind R." <arvino55@gmail.com>
CC: Mark Gross <mark.gross@intel.com>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Ranganathan Desikan <ravi@jetztechnologies.com>
CC: kernel-janitors@vger.kernel.org
CC: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20181109133757.21471-1-colin.king@canonical.com
2018-11-10 16:56:16 +01:00
Luck, Tony
24c9d423e8 EDAC, skx: Fix randconfig builds in a better way
It was previously noted that Kconfig complained about unmet dependencies
when trying to configure skx_edac together with CONFIG_ACPI=n. First fix
for this checked for ACPI when doing

  select ACPI_ADXL

but this required stub functions for the case where ACPI wasn't
selected. It also allowed building a driver that didn't actually work
for a system that has non-volatile DIMMs.

Arnd Bergmann pointed out that the right fix is to make EDAC_SKX
"depend on ACPI".

Fixes: a324e9396c ("EDAC, skx: Fix randconfig builds")
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
CC: Arnd Bergmann <arnd@arndb.de>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: linux-edac <linux-edac@vger.kernel.org>
CC: qiuxu.zhuo@intel.com
Link: http://lkml.kernel.org/r/20181106183914.GA26731@agluck-desk
2018-11-07 22:58:29 +01:00
YueHaibing
96c1c58eb0 EDAC, i82975x: Remove set but not used variable dtype
Fix this gcc -Wunused-but-set-variable warning:

  drivers/edac/i82975x_edac.c:378:16: warning: variable 'dtype'
  	set but not used [-Wunused-but-set-variable]

It was introduced in

  084a4fccef ("edac: move dimm properties to struct dimm_info")

but never used.

Also, remove the function i82975x_dram_type() and move the comment and
the assignment to the place where it is used.

 [ bp: massage commit message and shorten comment. ]

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: "Arvind R." <arvino55@gmail.com>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: ravi@jetztechnologies.com
CC: arvino55@gmail.com
CC: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20181107022237.14048-1-yuehaibing@huawei.com
2018-11-07 20:19:19 +01:00