docs/vm: hwpoison.txt: convert to ReST format

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Mike Rapoport 2018-03-21 21:22:25 +02:00 committed by Jonathan Corbet
parent 88ececc23c
commit b53ba58845


@@ -1,7 +1,14 @@
.. hwpoison:
========
hwpoison
========
What is hwpoison? What is hwpoison?
=================
Upcoming Intel CPUs have support for recovering from some memory errors Upcoming Intel CPUs have support for recovering from some memory errors
(``MCA recovery''). This requires the OS to declare a page "poisoned", (``MCA recovery``). This requires the OS to declare a page "poisoned",
kill the processes associated with it and avoid using it in the future. kill the processes associated with it and avoid using it in the future.
This patchkit implements the necessary infrastructure in the VM. This patchkit implements the necessary infrastructure in the VM.
@@ -46,9 +53,10 @@ address. This in theory allows other applications to handle
memory failures too. The expection is that near all applications memory failures too. The expection is that near all applications
won't do that, but some very specialized ones might. won't do that, but some very specialized ones might.
--- Failure recovery modes
======================
There are two (actually three) modi memory failure recovery can be in: There are two (actually three) modes memory failure recovery can be in:
vm.memory_failure_recovery sysctl set to zero: vm.memory_failure_recovery sysctl set to zero:
All memory failures cause a panic. Do not attempt recovery. All memory failures cause a panic. Do not attempt recovery.
@@ -67,9 +75,8 @@ late kill
This is best for memory error unaware applications and default This is best for memory error unaware applications and default
Note some pages are always handled as late kill. Note some pages are always handled as late kill.
--- User control
============
User control:
vm.memory_failure_recovery vm.memory_failure_recovery
See sysctl.txt See sysctl.txt
@@ -79,11 +86,19 @@ vm.memory_failure_early_kill
PR_MCE_KILL PR_MCE_KILL
Set early/late kill mode/revert to system default Set early/late kill mode/revert to system default
arg1: PR_MCE_KILL_CLEAR: Revert to system default
arg1: PR_MCE_KILL_SET: arg2 defines thread specific mode arg1: PR_MCE_KILL_CLEAR:
PR_MCE_KILL_EARLY: Early kill Revert to system default
PR_MCE_KILL_LATE: Late kill arg1: PR_MCE_KILL_SET:
PR_MCE_KILL_DEFAULT: Use system global default arg2 defines thread specific mode
PR_MCE_KILL_EARLY:
Early kill
PR_MCE_KILL_LATE:
Late kill
PR_MCE_KILL_DEFAULT
Use system global default
Note that if you want to have a dedicated thread which handles Note that if you want to have a dedicated thread which handles
the SIGBUS(BUS_MCEERR_AO) on behalf of the process, you should the SIGBUS(BUS_MCEERR_AO) on behalf of the process, you should
call prctl(PR_MCE_KILL_EARLY) on the designated thread. Otherwise, call prctl(PR_MCE_KILL_EARLY) on the designated thread. Otherwise,
@@ -92,48 +107,38 @@ PR_MCE_KILL
PR_MCE_KILL_GET PR_MCE_KILL_GET
return current mode return current mode
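Editor's illustration, not part of the commit: the PR_MCE_KILL and PR_MCE_KILL_GET calls described above could be exercised roughly as follows. This is a minimal sketch that assumes a Linux system and reaches prctl(2) through Python's ctypes. The helper names (`set_kill_mode`, `clear_kill_mode`, `get_kill_mode`) are hypothetical. The numeric constants are the values from the kernel's `prctl.h` uapi header. What the document calls arg1 and arg2 are the second and third arguments of the prctl call.

```python
import ctypes
import os

# Values as defined in the kernel's uapi prctl.h header.
PR_MCE_KILL = 33
PR_MCE_KILL_GET = 34
PR_MCE_KILL_CLEAR = 0    # arg1: revert to system default
PR_MCE_KILL_SET = 1      # arg1: arg2 defines the thread specific mode
PR_MCE_KILL_LATE = 0     # arg2: late kill
PR_MCE_KILL_EARLY = 1    # arg2: early kill
PR_MCE_KILL_DEFAULT = 2  # arg2: use system global default

MODE_NAMES = {
    PR_MCE_KILL_LATE: "late kill",
    PR_MCE_KILL_EARLY: "early kill",
    PR_MCE_KILL_DEFAULT: "system default",
}


def _prctl(option, arg2=0, arg3=0):
    """Call prctl(2) via libc; the unused trailing arguments are passed as zero."""
    libc = ctypes.CDLL(None, use_errno=True)
    ret = libc.prctl(ctypes.c_int(option), ctypes.c_ulong(arg2),
                     ctypes.c_ulong(arg3), ctypes.c_ulong(0), ctypes.c_ulong(0))
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def set_kill_mode(mode):
    """Set the early/late kill mode of the calling thread."""
    _prctl(PR_MCE_KILL, PR_MCE_KILL_SET, mode)


def clear_kill_mode():
    """Revert the calling thread to the system default."""
    _prctl(PR_MCE_KILL, PR_MCE_KILL_CLEAR)


def get_kill_mode():
    """Return the current mode as one of the PR_MCE_KILL_* mode values."""
    return _prctl(PR_MCE_KILL_GET)
```

A thread dedicated to handling SIGBUS(BUS_MCEERR_AO) would call `set_kill_mode(PR_MCE_KILL_EARLY)` on itself, and `MODE_NAMES[get_kill_mode()]` gives a readable name for the mode in effect.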
Testing
=======
--- * madvise(MADV_HWPOISON, ....) (as root) - Poison a page in the
process for testing
Testing: * hwpoison-inject module through debugfs ``/sys/kernel/debug/hwpoison/``
madvise(MADV_HWPOISON, ....)
(as root)
Poison a page in the process for testing
hwpoison-inject module through debugfs
/sys/kernel/debug/hwpoison/
corrupt-pfn corrupt-pfn
Inject hwpoison fault at PFN echoed into this file. This does Inject hwpoison fault at PFN echoed into this file. This does
some early filtering to avoid corrupted unintended pages in test suites. some early filtering to avoid corrupted unintended pages in test suites.
unpoison-pfn unpoison-pfn
Software-unpoison page at PFN echoed into this file. This way
Software-unpoison page at PFN echoed into this file. This a page can be reused again. This only works for Linux
way a page can be reused again. injected failures, not for real memory failures.
This only works for Linux injected failures, not for real
memory failures.
Note these injection interfaces are not stable and might change between Note these injection interfaces are not stable and might change between
kernel versions kernel versions
corrupt-filter-dev-major corrupt-filter-dev-major, corrupt-filter-dev-minor
corrupt-filter-dev-minor Only handle memory failures to pages associated with the file
system defined by block device major/minor. -1U is the
Only handle memory failures to pages associated with the file system defined wildcard value. This should be only used for testing with
by block device major/minor. -1U is the wildcard value. artificial injection.
This should be only used for testing with artificial injection.
corrupt-filter-memcg corrupt-filter-memcg
Limit injection to pages owned by memgroup. Specified by inode
number of the memcg.
Limit injection to pages owned by memgroup. Specified by inode number Example::
of the memcg.
Example:
mkdir /sys/fs/cgroup/mem/hwpoison mkdir /sys/fs/cgroup/mem/hwpoison
usemem -m 100 -s 1000 & usemem -m 100 -s 1000 &
@@ -145,24 +150,21 @@ Example:
page-types -p `pidof init` --hwpoison # shall do nothing page-types -p `pidof init` --hwpoison # shall do nothing
page-types -p `pidof usemem` --hwpoison # poison its pages page-types -p `pidof usemem` --hwpoison # poison its pages
corrupt-filter-flags-mask corrupt-filter-flags-mask, corrupt-filter-flags-value
corrupt-filter-flags-value When specified, only poison pages if ((page_flags & mask) ==
value). This allows stress testing of many kinds of
pages. The page_flags are the same as in /proc/kpageflags. The
flag bits are defined in include/linux/kernel-page-flags.h and
documented in Documentation/vm/pagemap.txt
When specified, only poison pages if ((page_flags & mask) == value). * Architecture specific MCE injector
This allows stress testing of many kinds of pages. The page_flags
are the same as in /proc/kpageflags. The flag bits are defined in
include/linux/kernel-page-flags.h and documented in
Documentation/vm/pagemap.txt
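Editor's illustration, not part of the commit: the filter condition ((page_flags & mask) == value) can be checked with a small predicate. This is a sketch only. The function name `passes_flag_filter` is hypothetical, and the two bit numbers are the /proc/kpageflags positions of KPF_DIRTY and KPF_LRU.

```python
# Bit positions of two /proc/kpageflags flags.
KPF_DIRTY = 4
KPF_LRU = 5


def passes_flag_filter(page_flags, mask, value):
    """Return True if a page with these flags would be selected by the filter."""
    return (page_flags & mask) == value


# Select only clean LRU pages: look at the LRU and DIRTY bits,
# and require LRU set with DIRTY clear.
mask = (1 << KPF_LRU) | (1 << KPF_DIRTY)
value = 1 << KPF_LRU

clean_lru_page = 1 << KPF_LRU
dirty_lru_page = (1 << KPF_LRU) | (1 << KPF_DIRTY)

print(passes_flag_filter(clean_lru_page, mask, value))
print(passes_flag_filter(dirty_lru_page, mask, value))
```

With a mask and value of zero the predicate accepts every page, so the filter has no effect.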
Architecture specific MCE injector
x86 has mce-inject, mce-test x86 has mce-inject, mce-test
Some portable hwpoison test programs in mce-test, see blow. Some portable hwpoison test programs in mce-test, see below.
--- References
==========
References:
http://halobates.de/mce-lc09-2.pdf http://halobates.de/mce-lc09-2.pdf
Overview presentation from LinuxCon 09 Overview presentation from LinuxCon 09
@@ -174,14 +176,11 @@ git://git.kernel.org/pub/scm/utils/cpu/mce/mce-inject.git
x86 specific injector x86 specific injector
--- Limitations
===========
Limitations:
- Not all page types are supported and never will. Most kernel internal - Not all page types are supported and never will. Most kernel internal
objects cannot be recovered, only LRU pages for now. objects cannot be recovered, only LRU pages for now.
- Right now hugepage support is missing. - Right now hugepage support is missing.
--- ---
Andi Kleen, Oct 2009 Andi Kleen, Oct 2009