15 Commits

Author SHA1 Message Date
Toshi Kani 807900395e acpi/nfit: Issue Start ARS to retrieve existing records
ACPI 6.2 defines in section 9.20.7.2 that the OSPM may call a Start
ARS with Flags Bit [1] set upon receiving the 0x81 notification.

  Upon receiving the notification, the OSPM may decide to issue
  a Start ARS with Flags Bit [1] set to prepare for the retrieval
  of existing records and issue the Query ARS Status function to
  retrieve the records.

Add support to call a Start ARS from acpi_nfit_uc_error_notify()
with ND_ARS_RETURN_PREV_DATA set when HW_ERROR_SCRUB_ON is not set.

Link: http://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Linda Knippers <linda.knippers@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-07-02 09:56:37 -07:00
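
The flag selection this commit describes can be sketched in userspace C as below. This is only an illustration, not the driver's code: the enum, its values, and the helper name `uc_error_ars_flags()` are hypothetical, and the only detail taken from the message is that Flags Bit [1] is set when HW_ERROR_SCRUB_ON is not.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for illustration; names follow the commit message, but the
 * enum values are assumptions made for this sketch. */
enum scrub_mode { HW_ERROR_SCRUB_OFF, HW_ERROR_SCRUB_ON };

/* Start ARS "Flags Bit [1]" as quoted from the spec text above. */
#define ND_ARS_RETURN_PREV_DATA (1u << 1)

/* Hypothetical helper: choose Start ARS flags for a 0x81 notification. */
uint8_t uc_error_ars_flags(enum scrub_mode mode)
{
	assert(mode == HW_ERROR_SCRUB_OFF || mode == HW_ERROR_SCRUB_ON);

	/* When scrubbing on hardware errors is enabled, issue a plain
	 * Start ARS (no flags). Otherwise only ask the platform to return
	 * the records it already holds. */
	return mode == HW_ERROR_SCRUB_ON ? 0 : ND_ARS_RETURN_PREV_DATA;
}
```

In the driver, the resulting flags would presumably be handed to whatever routine issues the Start ARS; that call is left out here.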
Toshi Kani 56b47fe657 acpi/nfit: Add support of NVDIMM memory error notification in ACPI 6.2
ACPI 6.2 defines a new ACPI notification value to NVDIMM Root Device
in Table 5-169.

 0x81 Unconsumed Uncorrectable Memory Error Detected
      Used to pro-actively notify OSPM of uncorrectable memory errors
      detected (for example a memory scrubbing engine that continuously
      scans the NVDIMMs memory). This is an optional notification. Only
      locations that were mapped in to SPA by the platform will generate
      a notification.

Add support of this notification value by initiating an ARS scan. This
will find new error locations and add their badblocks information.

Link: http://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Linda Knippers <linda.knippers@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-06-15 14:39:42 -07:00
Andy Shevchenko 41c8bdb3ab acpi, nfit: Switch to use new generic UUID API
There are new types and helpers that are supposed to be used in new code.

As a preparation to get rid of legacy types and API functions do
the conversion here.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-06-05 19:42:02 +02:00
Dan Williams fbabd829fe acpi, nfit: fix module unload vs workqueue shutdown race
The workqueue may still be running when the devres callbacks start
firing to deallocate an acpi_nfit_desc instance. Stop and flush the
workqueue before letting any other devres de-allocations proceed.

Reported-by: Linda Knippers <linda.knippers@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-18 10:55:37 -07:00
Dan Williams 9ccaed4bfd acpi, nfit: limit ->flush_probe() to initialization work
The nvdimm probe flushing mechanism gives userspace a sync point where
it knows all asynchronous driver probe sequences have completed.
However, it need not wait for other asynchronous actions, like
on-demand address-range-scrub. Track the init work separately from other
work in the workqueue, and only flush the former.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-17 12:34:17 -07:00
Dan Williams 1499934dcd acpi, nfit: support "map failed" dimms
Stop requiring dimms be successfully mapped into a
system-physical-address range. For provisioning and hardware remediation
purposes the kernel should account for failed devices in sysfs. If
possible it should still allow management commands to be sent to the
device.

Reported-by: Toshi Kani <toshi.kani@hpe.com>
Tested-by: Toshi Kani <toshi.kani@hpe.com>
Reported-by: Linda Knippers <linda.knippers@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-17 12:34:17 -07:00
Dan Williams a7de92dac9 tools/testing/nvdimm: unit test acpi_nfit_ctl()
A recent flurry of bug discoveries in the nfit driver's DSM marshalling
routine has highlighted the fact that we do not have unit test coverage
for this routine. Add a self-test of the acpi_nfit_ctl() routine before
probing the "nfit_test.0" device. This mocks stimulus to acpi_nfit_ctl(),
and if any of the tests fail, "nfit_test.0" will be unavailable, causing
the rest of the tests to either not run or fail.

This unit test will also be a place to land reproductions of quirky BIOS
behavior discovered in the field and ensure the kernel does not regress
against implementations it has seen in practice.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-12-06 17:42:36 -08:00
Vishal Verma 9ffd6350a1 nfit: don't start a full scrub by default for an MCE
Starting a full Address Range Scrub (ARS) on hitting a memory error
machine check exception may not always be desirable. Provide a way
through sysfs to toggle the behavior between just adding the address
(cache line) where the MCE happened to the poison list and doing a full
scrub. The former (selective insertion of the address) is done
unconditionally.

Cc: linux-acpi@vger.kernel.org
Cc: Linda Knippers <linda.knippers@hpe.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-30 17:00:10 -07:00
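
The behavior described in this commit (always record the faulting address, scrub fully only when asked) can be modeled with a small userspace sketch. Everything here is hypothetical: `struct bus_state`, `handle_mce()`, and the fixed-size list only stand in for the driver's poison list and the sysfs toggle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_POISON 16 /* arbitrary capacity for this sketch */

/* Hypothetical model of the per-bus state touched on an MCE. */
struct bus_state {
	bool full_scrub_on_mce;      /* models the sysfs toggle */
	uint64_t poison[MAX_POISON]; /* models the poison list */
	size_t npoison;
	unsigned int full_scrubs;    /* number of full ARS runs requested */
};

/* Returns 0 on success, -1 if the model's poison list is full. */
int handle_mce(struct bus_state *bus, uint64_t addr)
{
	assert(bus != NULL);

	if (bus->npoison == MAX_POISON)
		return -1;

	/* Selective insertion of the address happens unconditionally. */
	bus->poison[bus->npoison++] = addr;

	/* A full scrub is requested only when the toggle is enabled. */
	if (bus->full_scrub_on_mce)
		bus->full_scrubs++;

	return 0;
}
```

With the toggle off, a call only grows the poison list; with it on, the same call also counts one full scrub request.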
Dan Williams 231bf117aa tools/testing/nvdimm: unit test for acpi_nvdimm_notify()
Trigger an nmemX/nfit/flags attribute to fire an event whenever a
smart-threshold DSM is received.

Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-01 18:20:14 -07:00
Dan Williams ba9c8dd3c2 acpi, nfit: add dimm device notification support
Per "ACPI 6.1 Section 9.20.3" NVDIMM devices, children of the ACPI0012
NVDIMM Root device, can receive health event notifications.

Given that these devices are precluded from registering a notification
handler via acpi_driver.acpi_device_ops (due to no _HID), we use
acpi_install_notify_handler() directly.  The registered handler,
acpi_nvdimm_notify(), triggers a poll(2) event on the nmemX/nfit/flags
sysfs attribute when a health event notification is received.

Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-08-29 14:55:17 -07:00
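
From userspace, waiting on such a poll(2) event might look like the sketch below. It follows the usual pattern for sysfs attributes (read the current value, then poll for POLLPRI or POLLERR). The function name `wait_for_attr_event()` is hypothetical, and the caller supplies the attribute path.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Wait until a sysfs attribute signals an event.
 * Returns 1 if an event arrived, 0 on timeout, -1 on error. */
int wait_for_attr_event(const char *path, int timeout_ms)
{
	char buf[256];
	struct pollfd pfd;
	int rc;

	assert(path != NULL);

	pfd.fd = open(path, O_RDONLY);
	if (pfd.fd < 0)
		return -1;
	pfd.events = POLLPRI | POLLERR;
	pfd.revents = 0;

	/* Consume the current value first, so that poll() reports only
	 * changes signalled after this point. */
	if (read(pfd.fd, buf, sizeof(buf)) < 0) {
		close(pfd.fd);
		return -1;
	}

	rc = poll(&pfd, 1, timeout_ms);
	close(pfd.fd);
	if (rc < 0)
		return -1;
	return rc > 0 ? 1 : 0;
}
```

A caller would pass the path of an nmemX/nfit/flags attribute (for example somewhere under /sys/bus/nd/devices/, which is an assumption about the local sysfs layout) and re-read the attribute after a return value of 1.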
Dan Williams c14a868a5a tools/testing/nvdimm: unit test for acpi_nfit_notify()
We have had a couple bugs in this implementation in the past and before
we add another ->notify() implementation for nvdimm devices, let's allow
this routine to be exercised via nfit_test.

Rewrite acpi_nfit_notify() in terms of a generic struct device and
acpi_handle parameter, and then implement a mock acpi_evaluate_object()
that returns a _FIT payload.

Cc: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-08-23 07:49:42 -07:00
Vishal Verma c09f12186d acpi, nfit: check for the correct event code in notifications
Commit 209851649d "acpi: nfit: Add support for hot-add" added
support for _FIT notifications, but it neglected to verify the
notification event code matches the one in the ACPI spec for
"NFIT Update". Currently there is only one code in the spec, but
once additional codes are added, older kernels (without this fix)
will misbehave by assuming all event notifications are for an
NFIT Update.

Fixes: 209851649d ("acpi: nfit: Add support for hot-add")
Cc: <stable@vger.kernel.org>
Cc: <linux-acpi@vger.kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Reported-by: Linda Knippers <linda.knippers@hpe.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-08-23 07:49:08 -07:00
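
The check this fix adds can be illustrated with a minimal userspace model. The struct, the function name `nfit_notify()`, and the reload counter are hypothetical. The value 0x80 for "NFIT Update" is assumed here, since the message itself does not state it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed event code for "NFIT Update"; not stated in the message. */
#define NFIT_NOTIFY_UPDATE 0x80

/* Hypothetical state: counts how often the _FIT data was re-read. */
struct nfit_state {
	unsigned int fit_reloads;
};

/* Returns true if the event was handled, false if it was ignored. */
bool nfit_notify(struct nfit_state *st, uint32_t event)
{
	assert(st != NULL);

	/* Without this check, every notification would be treated as an
	 * NFIT Update, which is the misbehavior the fix prevents. */
	if (event != NFIT_NOTIFY_UPDATE)
		return false;

	st->fit_reloads++;
	return true;
}
```

In this model an unrelated code such as 0x81 leaves the state untouched, while 0x80 triggers exactly one reload.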
Dan Williams 0606263f24 Merge branch 'for-4.8/libnvdimm' into libnvdimm-for-next
2016-07-24 08:05:44 -07:00
Vishal Verma 6839a6d96f nfit: do an ARS scrub on hitting a latent media error
When a latent (unknown to 'badblocks') error is encountered, it will
trigger a machine check exception. On a system with machine check
recovery, this will only SIGBUS the process(es) which had the bad page
mapped (as opposed to a kernel panic on platforms without machine
check recovery features). In the former case, we want to trigger a full
rescan of that nvdimm bus. This will allow any additional, new errors
to be captured in the block devices' badblocks lists, and offending
operations on them can be trapped early, avoiding machine checks.

This is done by registering a callback function with the
x86_mce_decoder_chain and calling the new ars_rescan functionality with
the address in the mce notification.

Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-24 08:04:04 -07:00
Dan Williams bdf97013ce nfit: move to nfit/ sub-directory
With the arrival of x86-machine-check support the nfit driver will add a
(conditionally-compiled) source file.  Prepare for this by moving all
nfit source to drivers/acpi/nfit/.  This is pure code movement, no
functional changes.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-24 08:04:04 -07:00