Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git (synced 2024-11-01 00:48:50 +00:00)
docs/vm: hwpoison.txt: convert to ReST format
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
parent 88ececc23c
commit b53ba58845
1 changed file with 70 additions and 71 deletions
|
@ -1,7 +1,14 @@
|
||||||
|
.. hwpoison:
|
||||||
|
|
||||||
|
========
|
||||||
|
hwpoison
|
||||||
|
========
|
||||||
|
|
||||||
What is hwpoison?
|
What is hwpoison?
|
||||||
|
=================
|
||||||
|
|
||||||
Upcoming Intel CPUs have support for recovering from some memory errors
|
Upcoming Intel CPUs have support for recovering from some memory errors
|
||||||
(``MCA recovery''). This requires the OS to declare a page "poisoned",
|
(``MCA recovery``). This requires the OS to declare a page "poisoned",
|
||||||
kill the processes associated with it and avoid using it in the future.
|
kill the processes associated with it and avoid using it in the future.
|
||||||
|
|
||||||
This patchkit implements the necessary infrastructure in the VM.
|
This patchkit implements the necessary infrastructure in the VM.
|
||||||
|
@ -46,9 +53,10 @@ address. This in theory allows other applications to handle
|
||||||
memory failures too. The expection is that near all applications
|
memory failures too. The expection is that near all applications
|
||||||
won't do that, but some very specialized ones might.
|
won't do that, but some very specialized ones might.
|
||||||
|
|
||||||
---
|
Failure recovery modes
|
||||||
|
======================
|
||||||
|
|
||||||
There are two (actually three) modi memory failure recovery can be in:
|
There are two (actually three) modes memory failure recovery can be in:
|
||||||
|
|
||||||
vm.memory_failure_recovery sysctl set to zero:
|
vm.memory_failure_recovery sysctl set to zero:
|
||||||
All memory failures cause a panic. Do not attempt recovery.
|
All memory failures cause a panic. Do not attempt recovery.
|
||||||
|
@ -67,9 +75,8 @@ late kill
|
||||||
This is best for memory error unaware applications and default
|
This is best for memory error unaware applications and default
|
||||||
Note some pages are always handled as late kill.
|
Note some pages are always handled as late kill.
|
||||||
|
|
||||||
---
|
User control
|
||||||
|
============
|
||||||
User control:
|
|
||||||
|
|
||||||
vm.memory_failure_recovery
|
vm.memory_failure_recovery
|
||||||
See sysctl.txt
|
See sysctl.txt
|
||||||
|
@ -79,11 +86,19 @@ vm.memory_failure_early_kill
|
||||||
|
|
||||||
PR_MCE_KILL
|
PR_MCE_KILL
|
||||||
Set early/late kill mode/revert to system default
|
Set early/late kill mode/revert to system default
|
||||||
arg1: PR_MCE_KILL_CLEAR: Revert to system default
|
|
||||||
arg1: PR_MCE_KILL_SET: arg2 defines thread specific mode
|
arg1: PR_MCE_KILL_CLEAR:
|
||||||
PR_MCE_KILL_EARLY: Early kill
|
Revert to system default
|
||||||
PR_MCE_KILL_LATE: Late kill
|
arg1: PR_MCE_KILL_SET:
|
||||||
PR_MCE_KILL_DEFAULT: Use system global default
|
arg2 defines thread specific mode
|
||||||
|
|
||||||
|
PR_MCE_KILL_EARLY:
|
||||||
|
Early kill
|
||||||
|
PR_MCE_KILL_LATE:
|
||||||
|
Late kill
|
||||||
|
PR_MCE_KILL_DEFAULT
|
||||||
|
Use system global default
|
||||||
|
|
||||||
Note that if you want to have a dedicated thread which handles
|
Note that if you want to have a dedicated thread which handles
|
||||||
the SIGBUS(BUS_MCEERR_AO) on behalf of the process, you should
|
the SIGBUS(BUS_MCEERR_AO) on behalf of the process, you should
|
||||||
call prctl(PR_MCE_KILL_EARLY) on the designated thread. Otherwise,
|
call prctl(PR_MCE_KILL_EARLY) on the designated thread. Otherwise,
|
||||||
|
@ -92,48 +107,38 @@ PR_MCE_KILL
|
||||||
PR_MCE_KILL_GET
|
PR_MCE_KILL_GET
|
||||||
return current mode
|
return current mode
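
Reviewer's sketch (not part of the patch): the PR_MCE_KILL and PR_MCE_KILL_GET entries above can be exercised from user space roughly as follows. The helper names set_mce_kill_mode(), clear_mce_kill_mode() and get_mce_kill_mode() are hypothetical; only the prctl() options and constants come from the documented interface, and the sketch assumes a Linux system whose <sys/prctl.h> exposes them.

```c
#include <sys/prctl.h>

/*
 * Hypothetical helper: set the memory-failure kill policy of the
 * calling thread. mode is one of PR_MCE_KILL_EARLY, PR_MCE_KILL_LATE
 * or PR_MCE_KILL_DEFAULT. Returns 0 on success, -1 on error.
 */
int set_mce_kill_mode(unsigned long mode)
{
    /* The unused trailing arguments must be zero. */
    return prctl(PR_MCE_KILL, PR_MCE_KILL_SET, mode, 0, 0);
}

/* Hypothetical helper: revert the calling thread to the system default. */
int clear_mce_kill_mode(void)
{
    return prctl(PR_MCE_KILL, PR_MCE_KILL_CLEAR, 0, 0, 0);
}

/*
 * Hypothetical helper: query the current policy. On success the return
 * value is one of the PR_MCE_KILL_* mode constants, on error it is -1.
 */
int get_mce_kill_mode(void)
{
    return prctl(PR_MCE_KILL_GET, 0, 0, 0, 0);
}
```

A thread that is meant to receive SIGBUS(BUS_MCEERR_AO) on behalf of the process would call set_mce_kill_mode(PR_MCE_KILL_EARLY) on itself before installing its SIGBUS handler.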
|
||||||
|
|
||||||
|
Testing
|
||||||
|
=======
|
||||||
|
|
||||||
---
|
* madvise(MADV_HWPOISON, ....) (as root) - Poison a page in the
|
||||||
|
process for testing
|
||||||
|
|
||||||
Testing:
|
* hwpoison-inject module through debugfs ``/sys/kernel/debug/hwpoison/``
|
||||||
|
|
||||||
madvise(MADV_HWPOISON, ....)
|
|
||||||
(as root)
|
|
||||||
Poison a page in the process for testing
|
|
||||||
|
|
||||||
|
|
||||||
hwpoison-inject module through debugfs
|
|
||||||
|
|
||||||
/sys/kernel/debug/hwpoison/
|
|
||||||
|
|
||||||
corrupt-pfn
|
corrupt-pfn
|
||||||
|
|
||||||
Inject hwpoison fault at PFN echoed into this file. This does
|
Inject hwpoison fault at PFN echoed into this file. This does
|
||||||
some early filtering to avoid corrupted unintended pages in test suites.
|
some early filtering to avoid corrupted unintended pages in test suites.
|
||||||
|
|
||||||
unpoison-pfn
|
unpoison-pfn
|
||||||
|
Software-unpoison page at PFN echoed into this file. This way
|
||||||
Software-unpoison page at PFN echoed into this file. This
|
a page can be reused again. This only works for Linux
|
||||||
way a page can be reused again.
|
injected failures, not for real memory failures.
|
||||||
This only works for Linux injected failures, not for real
|
|
||||||
memory failures.
|
|
||||||
|
|
||||||
Note these injection interfaces are not stable and might change between
|
Note these injection interfaces are not stable and might change between
|
||||||
kernel versions
|
kernel versions
|
||||||
|
|
||||||
corrupt-filter-dev-major
|
corrupt-filter-dev-major, corrupt-filter-dev-minor
|
||||||
corrupt-filter-dev-minor
|
Only handle memory failures to pages associated with the file
|
||||||
|
system defined by block device major/minor. -1U is the
|
||||||
Only handle memory failures to pages associated with the file system defined
|
wildcard value. This should be only used for testing with
|
||||||
by block device major/minor. -1U is the wildcard value.
|
artificial injection.
|
||||||
This should be only used for testing with artificial injection.
|
|
||||||
|
|
||||||
corrupt-filter-memcg
|
corrupt-filter-memcg
|
||||||
|
Limit injection to pages owned by memgroup. Specified by inode
|
||||||
|
number of the memcg.
|
||||||
|
|
||||||
Limit injection to pages owned by memgroup. Specified by inode number
|
Example::
|
||||||
of the memcg.
|
|
||||||
|
|
||||||
Example:
|
|
||||||
mkdir /sys/fs/cgroup/mem/hwpoison
|
mkdir /sys/fs/cgroup/mem/hwpoison
|
||||||
|
|
||||||
usemem -m 100 -s 1000 &
|
usemem -m 100 -s 1000 &
|
||||||
|
@ -145,24 +150,21 @@ Example:
|
||||||
page-types -p `pidof init` --hwpoison # shall do nothing
|
page-types -p `pidof init` --hwpoison # shall do nothing
|
||||||
page-types -p `pidof usemem` --hwpoison # poison its pages
|
page-types -p `pidof usemem` --hwpoison # poison its pages
|
||||||
|
|
||||||
corrupt-filter-flags-mask
|
corrupt-filter-flags-mask, corrupt-filter-flags-value
|
||||||
corrupt-filter-flags-value
|
When specified, only poison pages if ((page_flags & mask) ==
|
||||||
|
value). This allows stress testing of many kinds of
|
||||||
|
pages. The page_flags are the same as in /proc/kpageflags. The
|
||||||
|
flag bits are defined in include/linux/kernel-page-flags.h and
|
||||||
|
documented in Documentation/vm/pagemap.txt
|
||||||
|
|
||||||
When specified, only poison pages if ((page_flags & mask) == value).
|
* Architecture specific MCE injector
|
||||||
This allows stress testing of many kinds of pages. The page_flags
|
|
||||||
are the same as in /proc/kpageflags. The flag bits are defined in
|
|
||||||
include/linux/kernel-page-flags.h and documented in
|
|
||||||
Documentation/vm/pagemap.txt
|
|
||||||
|
|
||||||
Architecture specific MCE injector
|
|
||||||
|
|
||||||
x86 has mce-inject, mce-test
|
x86 has mce-inject, mce-test
|
||||||
|
|
||||||
Some portable hwpoison test programs in mce-test, see blow.
|
Some portable hwpoison test programs in mce-test, see below.
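
Reviewer's sketch (not part of the patch): the corrupt-filter-flags-mask / corrupt-filter-flags-value rule described in the Testing section reduces to one masked comparison. The function name hwpoison_filter_matches() and the KPF_BIT() macro are hypothetical, and the bit numbers are copied here on the assumption that they match include/linux/kernel-page-flags.h; check them against the header before relying on them.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Bit numbers assumed to match include/linux/kernel-page-flags.h;
 * duplicated locally so the sketch builds without kernel headers.
 */
#define KPF_DIRTY 4
#define KPF_LRU   5
#define KPF_ANON  12

/* Hypothetical convenience macro: turn a bit number into a mask. */
#define KPF_BIT(n) (UINT64_C(1) << (n))

/*
 * Hypothetical predicate mirroring the documented rule: a page is
 * eligible for injection only if ((page_flags & mask) == value).
 */
bool hwpoison_filter_matches(uint64_t page_flags, uint64_t mask,
                             uint64_t value)
{
    return (page_flags & mask) == value;
}
```

For example, a mask of KPF_BIT(KPF_LRU) | KPF_BIT(KPF_DIRTY) with a value of KPF_BIT(KPF_LRU) selects pages that are on the LRU and not dirty, whatever their other flags are. A mask and value of zero match every page.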
|
||||||
|
|
||||||
---
|
References
|
||||||
|
==========
|
||||||
References:
|
|
||||||
|
|
||||||
http://halobates.de/mce-lc09-2.pdf
|
http://halobates.de/mce-lc09-2.pdf
|
||||||
Overview presentation from LinuxCon 09
|
Overview presentation from LinuxCon 09
|
||||||
|
@ -174,14 +176,11 @@ git://git.kernel.org/pub/scm/utils/cpu/mce/mce-inject.git
|
||||||
x86 specific injector
|
x86 specific injector
|
||||||
|
|
||||||
|
|
||||||
---
|
Limitations
|
||||||
|
===========
|
||||||
Limitations:
|
|
||||||
|
|
||||||
- Not all page types are supported and never will. Most kernel internal
|
- Not all page types are supported and never will. Most kernel internal
|
||||||
objects cannot be recovered, only LRU pages for now.
|
objects cannot be recovered, only LRU pages for now.
|
||||||
- Right now hugepage support is missing.
|
- Right now hugepage support is missing.
|
||||||
|
|
||||||
---
|
---
|
||||||
Andi Kleen, Oct 2009
|
Andi Kleen, Oct 2009
|
||||||
|
|
||||||
|
|