Commit Graph

14915 Commits

Author SHA1 Message Date
Ben Widawsky b7e3debdd0 mm/memory_hotplug.c: fix false softlockup during pfn range removal
When working with very large nodes, poisoning the struct pages (for which
there will be very many) can take a very long time.  If the system is
using voluntary preemptions, the software watchdog will not be able to
detect forward progress.  This patch addresses this issue by offering to
give up time like __remove_pages() does.  This behavior was introduced in
v5.6 with: commit d33695b16a ("mm/memory_hotplug: poison memmap in
remove_pfn_range_from_zone()")

Alternately, init_page_poison could do this cond_resched(), but it seems
to me that the caller of init_page_poison() is what actually knows whether
or not it should relax its own priority.

Based on Dan's notes, I think this is perfectly safe: commit f931ab479d
("mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}")

Aside from fixing the lockup, it is also a friendlier thing to do on lower
core systems that might wipe out large chunks of hotplug memory (probably
not a very common case).

Fixes this kind of splat:

  watchdog: BUG: soft lockup - CPU#46 stuck for 22s! [daxctl:9922]
  irq event stamp: 138450
  hardirqs last  enabled at (138449): [<ffffffffa1001f26>] trace_hardirqs_on_thunk+0x1a/0x1c
  hardirqs last disabled at (138450): [<ffffffffa1001f42>] trace_hardirqs_off_thunk+0x1a/0x1c
  softirqs last  enabled at (138448): [<ffffffffa1e00347>] __do_softirq+0x347/0x456
  softirqs last disabled at (138443): [<ffffffffa10c416d>] irq_exit+0x7d/0xb0
  CPU: 46 PID: 9922 Comm: daxctl Not tainted 5.7.0-BEN-14238-g373c6049b336 #30
  Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYXCRB1.86B.0578.D07.1902280810 02/28/2019
  RIP: 0010:memset_erms+0x9/0x10
  Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 <f3> aa 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01
  Call Trace:
   remove_pfn_range_from_zone+0x3a/0x380
   memunmap_pages+0x17f/0x280
   release_nodes+0x22a/0x260
   __device_release_driver+0x172/0x220
   device_driver_detach+0x3e/0xa0
   unbind_store+0x113/0x130
   kernfs_fop_write+0xdc/0x1c0
   vfs_write+0xde/0x1d0
   ksys_write+0x58/0xd0
   do_syscall_64+0x5a/0x120
   entry_SYSCALL_64_after_hwframe+0x49/0xb3
  Built 2 zonelists, mobility grouping on.  Total pages: 49050381
  Policy zone: Normal
  Built 3 zonelists, mobility grouping on.  Total pages: 49312525
  Policy zone: Normal

David said: "It really only is an issue for devmem.  Ordinary
hotplugged system memory is not affected (onlined/offlined in memory
block granularity)."

Link: http://lkml.kernel.org/r/20200619231213.1160351-1-ben.widawsky@intel.com
Fixes: commit d33695b16a ("mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()")
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reported-by: "Scargall, Steve" <steve.scargall@intel.com>
Reported-by: Ben Widawsky <ben.widawsky@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-26 00:27:38 -07:00
Christoph Hellwig 7a0e27b2a0 mm: remove vmalloc_exec
Merge vmalloc_exec into its only caller.  Note that for !CONFIG_MMU
__vmalloc_node_range maps to __vmalloc, which directly clears the
__GFP_HIGHMEM added by the vmalloc_exec stub anyway.

Link: http://lkml.kernel.org/r/20200618064307.32739-4-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-26 00:27:38 -07:00
Joonsoo Kim 0076f029cb mm/memory: fix IO cost for anonymous page
With synchronous IO swap device, swap-in is directly handled in fault
code.  Since IO cost notation isn't added there, with synchronous IO
swap device, LRU balancing could be wrongly biased.  Fix it to count it
in fault code.

Link: http://lkml.kernel.org/r/1592288204-27734-4-git-send-email-iamjoonsoo.kim@lge.com
Fixes: 314b57fb04 ("mm: balance LRU lists based on relative thrashing cache sizing")
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-26 00:27:38 -07:00
Joonsoo Kim cb6868832e mm/swap: fix for "mm: workingset: age nonresident information alongside anonymous pages"
Non-file-lru page could also be activated in mark_page_accessed() and we
need to count this activation for nonresident_age.

Note that it's better for this patch to be squashed into the patch "mm:
workingset: age nonresident information alongside anonymous pages".

Link: http://lkml.kernel.org/r/1592288204-27734-3-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-26 00:27:38 -07:00
Johannes Weiner 31d8fcac00 mm: workingset: age nonresident information alongside anonymous pages
Patch series "fix for "mm: balance LRU lists based on relative
thrashing" patchset"

This patchset fixes some problems of the patchset, "mm: balance LRU
lists based on relative thrashing", which is now merged on the mainline.

Patch "mm: workingset: let cache workingset challenge anon fix" is the
result of discussion with Johannes.  See following link.

  http://lkml.kernel.org/r/20200520232525.798933-6-hannes@cmpxchg.org

And, the other two are minor things which are found when I try to rebase
my patchset.

This patch (of 3):

After ("mm: workingset: let cache workingset challenge anon fix"), we
compare refault distances to active_file + anon.  But age of the
non-resident information is only driven by the file LRU.  As a result,
we may overestimate the recency of any incoming refaults and activate
them too eagerly, causing unnecessary LRU churn in certain situations.

Make anon aging drive nonresident age as well to address that.

Link: http://lkml.kernel.org/r/1592288204-27734-1-git-send-email-iamjoonsoo.kim@lge.com
Link: http://lkml.kernel.org/r/1592288204-27734-2-git-send-email-iamjoonsoo.kim@lge.com
Fixes: 34e58cac6d ("mm: workingset: let cache workingset challenge anon")
Reported-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-26 00:27:37 -07:00
Chris Down 03960e3318 mm/memcontrol.c: prevent missed memory.low load tears
Looks like one of these got missed when massaging in f86b810c26 ("mm,
memcg: prevent memory.low load/store tearing") with other linux-mm
changes.

Link: http://lkml.kernel.org/r/20200612174437.GA391453@chrisdown.name
Signed-off-by: Chris Down <chris@chrisdown.name>
Reported-by: Michal Koutny <mkoutny@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-26 00:27:37 -07:00
Muchun Song 3a98990ae2 mm/memcontrol.c: add missed css_put()
We should put the css reference when memory allocation failed.

Link: http://lkml.kernel.org/r/20200614122653.98829-1-songmuchun@bytedance.com
Fixes: f0a3a24b53 ("mm: memcg/slab: rework non-root kmem_cache lifecycle management")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Qian Cai <cai@lca.pw>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-26 00:27:37 -07:00
Johannes Weiner cd324edce5 mm: memcontrol: handle div0 crash race condition in memory.low
Tejun reports seeing rare div0 crashes in memory.low stress testing:

  RIP: 0010:mem_cgroup_calculate_protection+0xed/0x150
  Code: 0f 46 d1 4c 39 d8 72 57 f6 05 16 d6 42 01 40 74 1f 4c 39 d8 76 1a 4c 39 d1 76 15 4c 29 d1 4c 29 d8 4d 29 d9 31 d2 48 0f af c1 <49> f7 f1 49 01 c2 4c 89 96 38 01 00 00 5d c3 48 0f af c7 31 d2 49
  RSP: 0018:ffffa14e01d6fcd0 EFLAGS: 00010246
  RAX: 000000000243e384 RBX: 0000000000000000 RCX: 0000000000008f4b
  RDX: 0000000000000000 RSI: ffff8b89bee84000 RDI: 0000000000000000
  RBP: ffffa14e01d6fcd0 R08: ffff8b89ca7d40f8 R09: 0000000000000000
  R10: 0000000000000000 R11: 00000000006422f7 R12: 0000000000000000
  R13: ffff8b89d9617000 R14: ffff8b89bee84000 R15: ffffa14e01d6fdb8
  FS:  0000000000000000(0000) GS:ffff8b8a1f1c0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f93b1fc175b CR3: 000000016100a000 CR4: 0000000000340ea0
  Call Trace:
    shrink_node+0x1e5/0x6c0
    balance_pgdat+0x32d/0x5f0
    kswapd+0x1d7/0x3d0
    kthread+0x11c/0x160
    ret_from_fork+0x1f/0x30

This happens when parent_usage == siblings_protected.

We check that usage is bigger than protected, which should imply
parent_usage being bigger than siblings_protected.  However, we don't
read (or even update) these values atomically, and they can be out of
sync as the memory state changes under us.  A bit of fluctuation around
the target protection isn't a big deal, but we need to handle the div0
case.

Check the parent state explicitly to make sure we have a reasonable
positive value for the divisor.

Link: http://lkml.kernel.org/r/20200615140658.601684-1-hannes@cmpxchg.org
Fixes: 8a931f8013 ("mm: memcontrol: recursive memory.low protection")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Tejun Heo <tj@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Chris Down <chris@chrisdown.name>
Cc: Roman Gushchin <guro@fb.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-26 00:27:37 -07:00
Masanari Iida 8eab7035b2 mm/vmalloc.c: fix a warning while make xmldocs
This patch fixes following warning while "make xmldocs"

  mm/vmalloc.c:1877: warning: Excess function parameter 'prot' description in 'vm_map_ram'

This warning started since commit d4efd79a81 ("mm: remove the prot
argument from vm_map_ram").

Link: http://lkml.kernel.org/r/20200622152850.140871-1-standby24x7@gmail.com
Fixes: d4efd79a81 ("mm: remove the prot argument from vm_map_ram")
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-26 00:27:37 -07:00
Christophe Leroy 9449c9cb42 mm/debug_vm_pgtable: fix build failure with powerpc 8xx
Since commit 9e343b467c ("READ_ONCE: Enforce atomicity for
{READ,WRITE}_ONCE() memory accesses"), READ_ONCE() cannot be used
anymore to read complex page table entries.

This leads to:

      CC      mm/debug_vm_pgtable.o
    In file included from ./include/asm-generic/bug.h:5,
                     from ./arch/powerpc/include/asm/bug.h:109,
                     from ./include/linux/bug.h:5,
                     from ./include/linux/mmdebug.h:5,
                     from ./include/linux/gfp.h:5,
                     from mm/debug_vm_pgtable.c:13:
    In function 'pte_clear_tests',
        inlined from 'debug_vm_pgtable' at mm/debug_vm_pgtable.c:363:2:
    ./include/linux/compiler.h:392:38: error: Unsupported access size for {READ,WRITE}_ONCE().
    mm/debug_vm_pgtable.c:249:14: note: in expansion of macro 'READ_ONCE'
      249 |  pte_t pte = READ_ONCE(*ptep);
          |              ^~~~~~~~~
    make[2]: *** [mm/debug_vm_pgtable.o] Error 1

Fix it by using the recently added ptep_get() helper.

Link: http://lkml.kernel.org/r/6ca8c972e6c920dc4ae0d4affbed9703afa4d010.1592490570.git.christophe.leroy@csgroup.eu
Fixes: 9e343b467c ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-26 00:27:37 -07:00
Arjun Roy 7f70c2a68a mm/memory.c: properly pte_offset_map_lock/unlock in vm_insert_pages()
Calls to pte_offset_map() in vm_insert_pages() are erroneously not
matched with a call to pte_unmap().  This would cause problems on
architectures where that is not a no-op.

This patch does away with the non-traditional locking in the existing
code, and instead uses pte_offset_map_lock/unlock() as usual,
incrementing PTE as necessary.  The PTE pointer is kept within bounds
since we clamp it with PTRS_PER_PTE.

Link: http://lkml.kernel.org/r/20200618220446.20284-1-arjunroy.kdev@gmail.com
Fixes: 8cd3984d81 ("mm/memory.c: add vm_insert_pages()")
Signed-off-by: Arjun Roy <arjunroy@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-26 00:27:37 -07:00
Hugh Dickins 243bce09c9 mm: fix swap cache node allocation mask
Chris Murphy reports that a slightly overcommitted load, testing swap
and zram along with i915, splats and keeps on splatting, when it had
better fail less noisily:

  gnome-shell: page allocation failure: order:0,
  mode:0x400d0(__GFP_IO|__GFP_FS|__GFP_COMP|__GFP_RECLAIMABLE),
  nodemask=(null),cpuset=/,mems_allowed=0
  CPU: 2 PID: 1155 Comm: gnome-shell Not tainted 5.7.0-1.fc33.x86_64 #1
  Call Trace:
    dump_stack+0x64/0x88
    warn_alloc.cold+0x75/0xd9
    __alloc_pages_slowpath.constprop.0+0xcfa/0xd30
    __alloc_pages_nodemask+0x2df/0x320
    alloc_slab_page+0x195/0x310
    allocate_slab+0x3c5/0x440
    ___slab_alloc+0x40c/0x5f0
    __slab_alloc+0x1c/0x30
    kmem_cache_alloc+0x20e/0x220
    xas_nomem+0x28/0x70
    add_to_swap_cache+0x321/0x400
    __read_swap_cache_async+0x105/0x240
    swap_cluster_readahead+0x22c/0x2e0
    shmem_swapin+0x8e/0xc0
    shmem_swapin_page+0x196/0x740
    shmem_getpage_gfp+0x3a2/0xa60
    shmem_read_mapping_page_gfp+0x32/0x60
    shmem_get_pages+0x155/0x5e0 [i915]
    __i915_gem_object_get_pages+0x68/0xa0 [i915]
    i915_vma_pin+0x3fe/0x6c0 [i915]
    eb_add_vma+0x10b/0x2c0 [i915]
    i915_gem_do_execbuffer+0x704/0x3430 [i915]
    i915_gem_execbuffer2_ioctl+0x1ea/0x3e0 [i915]
    drm_ioctl_kernel+0x86/0xd0 [drm]
    drm_ioctl+0x206/0x390 [drm]
    ksys_ioctl+0x82/0xc0
    __x64_sys_ioctl+0x16/0x20
    do_syscall_64+0x5b/0xf0
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported on 5.7, but it goes back really to 3.1: when
shmem_read_mapping_page_gfp() was implemented for use by i915, and
allowed for __GFP_NORETRY and __GFP_NOWARN flags in most places, but
missed swapin's "& GFP_KERNEL" mask for page tree node allocation in
__read_swap_cache_async() - that was to mask off HIGHUSER_MOVABLE bits
from what page cache uses, but GFP_RECLAIM_MASK is now what's needed.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=208085
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2006151330070.11064@eggly.anvils
Fixes: 68da9f0557 ("tmpfs: pass gfp to shmem_getpage_gfp")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Chris Murphy <lists@colorremedies.com>
Analyzed-by: Vlastimil Babka <vbabka@suse.cz>
Analyzed-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Chris Murphy <lists@colorremedies.com>
Cc: <stable@vger.kernel.org>	[3.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-26 00:27:37 -07:00
Sebastian Andrzej Siewior 55860d96ca slub: cure list_slab_objects() from double fix
According to Christopher Lameter two fixes have been merged for the same
problem.  As far as I can tell, the code does not acquire the list_lock
and invoke kmalloc().  list_slab_objects() misses an unlock (the
counterpart to get_map()) and the memory allocated in free_partial()
isn't used.

Revert the mentioned commit.

Link: http://lkml.kernel.org/r/20200618201234.795692-1-bigeasy@linutronix.de
Fixes: aa456c7aeb ("slub: remove kmalloc under list_lock from list_slab_objects() V2")
Link: https://lkml.kernel.org/r/alpine.DEB.2.22.394.2006181501480.12014@www.lameter.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-26 00:27:37 -07:00
Waiman Long 8982ae527f mm/slab: use memzero_explicit() in kzfree()
The kzfree() function is normally used to clear some sensitive
information, like encryption keys, in the buffer before freeing it back to
the pool.  Memset() is currently used for buffer clearing.  However
unlikely, there is still a non-zero probability that the compiler may
choose to optimize away the memory clearing especially if LTO is being
used in the future.

To make sure that this optimization will never happen,
memzero_explicit(), which is introduced in v3.18, is now used in
kzfree() to future-proof it.

Link: http://lkml.kernel.org/r/20200616154311.12314-2-longman@redhat.com
Fixes: 3ef0e5ba46 ("slab: introduce kzfree()")
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Joe Perches <joe@perches.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: "Jason A . Donenfeld" <Jason@zx2c4.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-26 00:27:37 -07:00
Waiman Long d7670879c5 mm, slab: fix sign conversion problem in memcg_uncharge_slab()
It was found that running the LTP test on a PowerPC system could produce
erroneous values in /proc/meminfo, like:

  MemTotal:       531915072 kB
  MemFree:        507962176 kB
  MemAvailable:   1100020596352 kB

Using bisection, the problem is tracked down to commit 9c315e4d7d ("mm:
memcg/slab: cache page number in memcg_(un)charge_slab()").

In memcg_uncharge_slab() with a "int order" argument:

  unsigned int nr_pages = 1 << order;
    :
  mod_lruvec_state(lruvec, cache_vmstat_idx(s), -nr_pages);

The mod_lruvec_state() function will eventually call the
__mod_zone_page_state() which accepts a long argument.  Depending on the
compiler and how inlining is done, "-nr_pages" may be treated as a
negative number or a very large positive number.  Apparently, it was
treated as a large positive number in that PowerPC system leading to
incorrect stat counts.  This problem hasn't been seen in x86-64 yet,
perhaps the gcc compiler there has some slight difference in behavior.

It is fixed by making nr_pages a signed value.  For consistency, a similar
change is applied to memcg_charge_slab() as well.

Link: http://lkml.kernel.org/r/20200620184719.10994-1-longman@redhat.com
Fixes: 9c315e4d7d ("mm: memcg/slab: cache page number in memcg_(un)charge_slab()").
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-26 00:27:37 -07:00
Vlastimil Babka b9e20f0da1 mm, compaction: make capture control handling safe wrt interrupts
Hugh reports:

 "While stressing compaction, one run oopsed on NULL capc->cc in
  __free_one_page()'s task_capc(zone): compact_zone_order() had been
  interrupted, and a page was being freed in the return from interrupt.

  Though you would not expect it from the source, both gccs I was using
  (4.8.1 and 7.5.0) had chosen to compile compact_zone_order() with the
  ".cc = &cc" implemented by mov %rbx,-0xb0(%rbp) immediately before
  callq compact_zone - long after the "current->capture_control =
  &capc". An interrupt in between those finds capc->cc NULL (zeroed by
  an earlier rep stos).

  This could presumably be fixed by a barrier() before setting
  current->capture_control in compact_zone_order(); but would also need
  more care on return from compact_zone(), in order not to risk leaking
  a page captured by interrupt just before capture_control is reset.

  Maybe that is the preferable fix, but I felt safer for task_capc() to
  exclude the rather surprising possibility of capture at interrupt
  time"

I have checked that gcc10 also behaves the same.

The advantage of fix in compact_zone_order() is that we don't add
another test in the page freeing hot path, and that it might prevent
future problems if we stop exposing pointers to uninitialized structures
in current task.

So this patch implements the suggestion for compact_zone_order() with
barrier() (and WRITE_ONCE() to prevent store tearing) for setting
current->capture_control, and prevents page leaking with
WRITE_ONCE/READ_ONCE in the proper order.

Link: http://lkml.kernel.org/r/20200616082649.27173-1-vbabka@suse.cz
Fixes: 5e1f0f098b ("mm, compaction: capture a page under direct compaction")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Hugh Dickins <hughd@google.com>
Suggested-by: Hugh Dickins <hughd@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Li Wang <liwang@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>	[5.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-26 00:27:36 -07:00
Michal Hocko 545b1b077c mm: do_swap_page(): fix up the error code
do_swap_page() returns error codes from the VM_FAULT* space.  try_charge()
might return -ENOMEM, though, and then do_swap_page() simply returns 0
which means a success.

We almost never return ENOMEM for GFP_KERNEL single page charge.  Except
for async OOM handling (oom_disabled v1).  So this needs translation to
VM_FAULT_OOM otherwise the the page fault path will not notify the
userspace and wait for an action.

Link: http://lkml.kernel.org/r/20200617090238.GL9499@dhcp22.suse.cz
Fixes: 4c6355b25e ("mm: memcontrol: charge swapin pages on instantiation")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-26 00:27:36 -07:00
Linus Torvalds 7561393908 powerpc fixes for 5.8 #3
One fix for the interrupt rework we did last release which broke KVM-PR.
 
 Three commits fixing some fallout from the READ_ONCE() changes interacting badly
 with our 8xx 16K pages support, which uses a pte_t that is a structure of 4
 actual PTEs.
 
 A cleanup of the 8xx pte_update() to use the newly added pmd_off().
 
 A fix for a crash when handling an oops if CONFIG_DEBUG_VIRTUAL is enabled.
 
 A minor fix for the SPU syscall generation.
 
 Thanks to:
   Aneesh Kumar K.V, Christian Zigotzky, Christophe Leroy, Mike Rapoport,
   Nicholas Piggin.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl7vNVsTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgO3rD/46cXJQ9AMQqtZh3+sgWu95Zd6JOviL
 vfhWeH/kbt/p6OGPoLXoYChoFD44Mf7BmTEDflslYICrxvhu9zI2lYN+948zfrEY
 lIjjP+Dd6fr1D2o3+hnOOX/LHAVyyZJTsZp5i6ehTOXeUw8KOCF1ulVB3o5GgQK0
 I/0oewL/SXNFnZS5qLgF2/OFS/BH3OnDG6mpICxCetZC9mNbHrTzos403ijyrvcX
 AsE4JSzI2UM9kT0pWXLa9QR3RgfBZ4wtMrnKAwdGI/E+YqAa7TuHZatPDAqoCJYY
 aePEZdweaeLWHQaQYSqlNP7YLAHuSdvZ2SvU65c2EKaaXug9sZJImyboJl/fo0Xo
 EtZiVbfaTfqsyi7EVQnsLMFYmtquacXoUH//nIoTro4pRkeMsM94BiK2HISa+8Bs
 KGQxBsnK2UaTgWERZHiK2VaKY/Tl1vGs09u7R21GiE2aD25ly+/q1Uo+WUr6iRKh
 1v42AsH1VCeEZKAog43gBGOr7bCez8/90GNtTJnKTKndSRSybCH68ME/zBKdNACn
 A7M9E0CNNjTOQNJyQ2UhyiBJzUK6kT/5g+C4mEH5WG4FkO6YHT1JyEusvsfj6Oe9
 RwDr98iNuM8AhaT30XmUXithDAl6JA5+3S0OcC2bL2xQ0O/VBPGZhIzgSFU8T7BY
 qpDj8l/8zk64Fw==
 =qws5
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - One fix for the interrupt rework we did last release which broke
   KVM-PR

 - Three commits fixing some fallout from the READ_ONCE() changes
   interacting badly with our 8xx 16K pages support, which uses a pte_t
   that is a structure of 4 actual PTEs

 - A cleanup of the 8xx pte_update() to use the newly added pmd_off()

 - A fix for a crash when handling an oops if CONFIG_DEBUG_VIRTUAL is
   enabled

 - A minor fix for the SPU syscall generation

Thanks to Aneesh Kumar K.V, Christian Zigotzky, Christophe Leroy, Mike
Rapoport, Nicholas Piggin.

* tag 'powerpc-5.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/8xx: Provide ptep_get() with 16k pages
  mm: Allow arches to provide ptep_get()
  mm/gup: Use huge_ptep_get() in gup_hugepte()
  powerpc/syscalls: Use the number when building SPU syscall table
  powerpc/8xx: use pmd_off() to access a PMD entry in pte_update()
  powerpc/64s: Fix KVM interrupt using wrong save area
  powerpc: Fix kernel crash in show_instructions() w/DEBUG_VIRTUAL
2020-06-21 10:02:53 -07:00
Christophe Leroy 481e980a7c mm: Allow arches to provide ptep_get()
Since commit 9e343b467c ("READ_ONCE: Enforce atomicity for
{READ,WRITE}_ONCE() memory accesses") it is not possible anymore to
use READ_ONCE() to access complex page table entries like the one
defined for powerpc 8xx with 16k size pages.

Define a ptep_get() helper that architectures can override instead
of performing a READ_ONCE() on the page table entry pointer.

Fixes: 9e343b467c ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/087fa12b6e920e32315136b998aa834f99242695.1592225558.git.christophe.leroy@csgroup.eu
2020-06-20 22:14:53 +10:00
Christophe Leroy 55ca22633a mm/gup: Use huge_ptep_get() in gup_hugepte()
gup_hugepte() reads hugepage table entries, it can't read
them directly, huge_ptep_get() must be used.

Fixes: 9e343b467c ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ffc3714334c3bfaca6f13788ad039e8759ae413f.1592225558.git.christophe.leroy@csgroup.eu
2020-06-20 22:14:53 +10:00
Christoph Hellwig c0ee37e85e maccess: rename probe_user_{read,write} to copy_{from,to}_user_nofault
Better describe what these functions do.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-17 10:57:41 -07:00
Christoph Hellwig fe557319aa maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofault
Better describe what these functions do.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-17 10:57:41 -07:00
Linus Torvalds 6adc19fd13 Kbuild updates for v5.8 (2nd)
- fix build rules in binderfs sample
 
  - fix build errors when Kbuild recurses to the top Makefile
 
  - covert '---help---' in Kconfig to 'help'
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl7lBuYVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGHvIP/3iErjPshpg/phwH8NTCS4SFkiti
 BZRM+2lupSn7Qs53BTpVzIkXoHBJQZlJxlQ5HY8ScO+fiz28rKZr+b40us+je1Q+
 SkvSPfwZzxjEg7lAZutznG4KgItJLWJKmDyh9T8Y8TAuG4f8WO0hKnXoAp3YorS2
 zppEIxso8O5spZPjp+fF/fPbxPjIsabGK7Jp2LpSVFR5pVDHI/ycTlKQS+MFpMEx
 6JIpdFRw7TkvKew1dr5uAWT5btWHatEqjSR3JeyVHv3EICTGQwHmcHK67cJzGInK
 T51+DT7/CpKtmRgGMiTEu/INfMzzoQAKl6Fcu+vMaShTN97Hk9DpdtQyvA6P/h3L
 8GA4UBct05J7fjjIB7iUD+GYQ0EZbaFujzRXLYk+dQqEJRbhcCwvdzggGp0WvGRs
 1f8/AIpgnQv8JSL/bOMgGMS5uL2dSLsgbzTdr6RzWf1jlYdI1i4u7AZ/nBrwWP+Z
 iOBkKsVceEoJrTbaynl3eoYqFLtWyDau+//oBc2gUvmhn8ioM5dfqBRiJjxJnPG9
 /giRj6xRIqMMEw8Gg8PCG7WebfWxWyaIQwlWBbPok7DwISURK5mvOyakZL+Q25/y
 6MBr2H8NEJsf35q0GTINpfZnot7NX4JXrrndJH8NIRC7HEhwd29S041xlQJdP0rs
 E76xsOr3hrAmBu4P
 =1NIT
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull more Kbuild updates from Masahiro Yamada:

 - fix build rules in binderfs sample

 - fix build errors when Kbuild recurses to the top Makefile

 - covert '---help---' in Kconfig to 'help'

* tag 'kbuild-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  treewide: replace '---help---' in Kconfig files with 'help'
  kbuild: fix broken builds because of GZIP,BZIP2,LZOP variables
  samples: binderfs: really compile this sample and fix build issues
2020-06-13 13:29:16 -07:00
Masahiro Yamada a7f7f6248d treewide: replace '---help---' in Kconfig files with 'help'
Since commit 84af7a6194 ("checkpatch: kconfig: prefer 'help' over
'---help---'"), the number of '---help---' has been gradually
decreasing, but there are still more than 2400 instances.

This commit finishes the conversion. While I touched the lines,
I also fixed the indentation.

There are a variety of indentation styles found.

  a) 4 spaces + '---help---'
  b) 7 spaces + '---help---'
  c) 8 spaces + '---help---'
  d) 1 space + 1 tab + '---help---'
  e) 1 tab + '---help---'    (correct indentation)
  f) 1 tab + 1 space + '---help---'
  g) 1 tab + 2 spaces + '---help---'

In order to convert all of them to 1 tab + 'help', I ran the
following commend:

  $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-14 01:57:21 +09:00
Linus Torvalds b791d1bdf9 The Kernel Concurrency Sanitizer (KCSAN)
KCSAN is a dynamic race detector, which relies on compile-time
 instrumentation, and uses a watchpoint-based sampling approach to detect
 races.
 
 The feature was under development for quite some time and has already found
 legitimate bugs.
 
 Unfortunately it comes with a limitation, which was only understood late in
 the development cycle:
 
   It requires an up to date CLANG-11 compiler
 
 CLANG-11 is not yet released (scheduled for June), but it's the only
 compiler today which handles the kernel requirements and especially the
 annotations of functions to exclude them from KCSAN instrumentation
 correctly.
 
 These annotations really need to work so that low level entry code and
 especially int3 text poke handling can be completely isolated.
 
 A detailed discussion of the requirements and compiler issues can be found
 here:
 
   https://lore.kernel.org/lkml/CANpmjNMTsY_8241bS7=XAfqvZHFLrVEkv_uM4aDUWE_kh3Rvbw@mail.gmail.com/
 
 We came to the conclusion that trying to work around compiler limitations
 and bugs again would end up in a major trainwreck, so requiring a working
 compiler seemed to be the best choice.
 
 For Continous Integration purposes the compiler restriction is manageable
 and that's where most xxSAN reports come from.
 
 For a change this limitation might make GCC people actually look at their
 bugs. Some issues with CSAN in GCC are 7 years old and one has been 'fixed'
 3 years ago with a half baken solution which 'solved' the reported issue
 but not the underlying problem.
 
 The KCSAN developers also ponder to use a GCC plugin to become independent,
 but that's not something which will show up in a few days.
 
 Blocking KCSAN until wide spread compiler support is available is not a
 really good alternative because the continuous growth of lockless
 optimizations in the kernel demands proper tooling support.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl7im98THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoQ3xD/9+q87OmwnyoRTs6O3GDDbWZYoJGolh
 rctDOAYW8RSS73Fiw23z8hKlLl9tJCya6/X8Q9qoonB1YeIEPPRVj5HJWAMUNEIs
 YgjlZJFmh+mnbP/KQFctm3AWpoX8kqt3ncqj6zG72oQ9qKui691BY/2NmGVSLxUV
 DqtUYSKmi51XEQtZuXRuHEf3zBxoyeD43DaSCdJAXd6f5O2X7tmrWDuazHVeKzHV
 lhijvkyBvGMWvPg0IBrXkkLmeOvS0++MTGm3o+L72XF6nWpzTkcV7N0E9GEDFg45
 zwcidRVKD5d/1DoU5Tos96rCJpBEGh/wimlu0z14mcZpNiJgRQH5rzVEO9Y14UcP
 KL9FgRrb5dFw7yfX2zRQ070OFJ4AEDBMK0o5Lbu/QO5KLkvFkqnuWlQfmmtZJWCW
 DTRw/FgUgU7lvyPjRrao6HBvwy+yTb0u9K5seCOTRkuepR9nPJs0710pFiBsNCfV
 RY3cyggNBipAzgBOgLxixnq9+rHt70ton6S8Gijxpvt0dGGfO8k0wuEhFtA4zKrQ
 6HGK+pidxnoVdEgyQZhS+qzMMkyiUL0FXdaGJ2IX+/DC+Ij1UrUPjZBn7v25M0hQ
 ESkvxWKCn7snH4/NJsNxqCV1zyEc3zAW/WvLJUc9I7H8zPwtVvKWPrKEMzrJJ5bA
 aneySilbRxBFUg==
 =iplm
 -----END PGP SIGNATURE-----

Merge tag 'locking-kcsan-2020-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull the Kernel Concurrency Sanitizer from Thomas Gleixner:
 "The Kernel Concurrency Sanitizer (KCSAN) is a dynamic race detector,
  which relies on compile-time instrumentation, and uses a
  watchpoint-based sampling approach to detect races.

  The feature was under development for quite some time and has already
  found legitimate bugs.

  Unfortunately it comes with a limitation, which was only understood
  late in the development cycle:

     It requires an up to date CLANG-11 compiler

  CLANG-11 is not yet released (scheduled for June), but it's the only
  compiler today which handles the kernel requirements and especially
  the annotations of functions to exclude them from KCSAN
  instrumentation correctly.

  These annotations really need to work so that low level entry code and
  especially int3 text poke handling can be completely isolated.

  A detailed discussion of the requirements and compiler issues can be
  found here:

    https://lore.kernel.org/lkml/CANpmjNMTsY_8241bS7=XAfqvZHFLrVEkv_uM4aDUWE_kh3Rvbw@mail.gmail.com/

  We came to the conclusion that trying to work around compiler
  limitations and bugs again would end up in a major trainwreck, so
  requiring a working compiler seemed to be the best choice.

  For Continous Integration purposes the compiler restriction is
  manageable and that's where most xxSAN reports come from.

  For a change this limitation might make GCC people actually look at
  their bugs. Some issues with CSAN in GCC are 7 years old and one has
  been 'fixed' 3 years ago with a half baken solution which 'solved' the
  reported issue but not the underlying problem.

  The KCSAN developers also ponder to use a GCC plugin to become
  independent, but that's not something which will show up in a few
  days.

  Blocking KCSAN until wide spread compiler support is available is not
  a really good alternative because the continuous growth of lockless
  optimizations in the kernel demands proper tooling support"

* tag 'locking-kcsan-2020-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (76 commits)
  compiler_types.h, kasan: Use __SANITIZE_ADDRESS__ instead of CONFIG_KASAN to decide inlining
  compiler.h: Move function attributes to compiler_types.h
  compiler.h: Avoid nested statement expression in data_race()
  compiler.h: Remove data_race() and unnecessary checks from {READ,WRITE}_ONCE()
  kcsan: Update Documentation to change supported compilers
  kcsan: Remove 'noinline' from __no_kcsan_or_inline
  kcsan: Pass option tsan-instrument-read-before-write to Clang
  kcsan: Support distinguishing volatile accesses
  kcsan: Restrict supported compilers
  kcsan: Avoid inserting __tsan_func_entry/exit if possible
  ubsan, kcsan: Don't combine sanitizer with kcov on clang
  objtool, kcsan: Add kcsan_disable_current() and kcsan_enable_current_nowarn()
  kcsan: Add __kcsan_{enable,disable}_current() variants
  checkpatch: Warn about data_race() without comment
  kcsan: Use GFP_ATOMIC under spin lock
  Improve KCSAN documentation a bit
  kcsan: Make reporting aware of KCSAN tests
  kcsan: Fix function matching in report
  kcsan: Change data_race() to no longer require marking racing accesses
  kcsan: Move kcsan_{disable,enable}_current() to kcsan-checks.h
  ...
2020-06-11 18:55:43 -07:00
Naoya Horiguchi 03151c6e0b mm/memory-failure: send SIGBUS(BUS_MCEERR_AR) only to current thread
Action Required memory error should happen only when a processor is
about to access to a corrupted memory, so it's synchronous and only
affects current process/thread.

Recently commit 872e9a205c ("mm, memory_failure: don't send
BUS_MCEERR_AO for action required error") fixed the issue that Action
Required memory could unnecessarily send SIGBUS to the processes which
share the error memory.  But we still have another issue that we could
send SIGBUS to a wrong thread.

This is because collect_procs() and task_early_kill() fails to add the
current process to "to-kill" list.  So this patch is suggesting to fix
it.  With this fix, SIGBUS(BUS_MCEERR_AR) is never sent to non-current
process/thread.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Link: http://lkml.kernel.org/r/1591321039-22141-3-git-send-email-naoya.horiguchi@nec.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-11 18:17:47 -07:00
Naoya Horiguchi 4e018b450a mm/memory-failure: prioritize prctl(PR_MCE_KILL) over vm.memory_failure_early_kill
Patch series "hwpoison: fixes signaling on memory error"

This is a small patchset to solve issues in memory error handler to send
SIGBUS to proper process/thread as expected in configuration.  Please
see descriptions in individual patches for more details.

This patch (of 2):

Early-kill policy is controlled from two types of settings, one is
per-process setting prctl(PR_MCE_KILL) and the other is system-wide
setting vm.memory_failure_early_kill.  Users expect per-process setting
to override system-wide setting as many other settings do, but
early-kill setting doesn't work as such.

For example, if a system configures vm.memory_failure_early_kill to 1
(enabled), a process receives SIGBUS even if it's configured to
explicitly disable PF_MCE_KILL by prctl().  That's not desirable for
applications with their own policies.

This patch is suggesting to change the priority of these two types of
settings, by checking sysctl_memory_failure_early_kill only when a given
process has the default kill policy.

Note that this patch is solving a thread choice issue too.

Originally, collect_procs() always chooses the main thread when
vm.memory_failure_early_kill is 1, even if the process has a dedicated
thread for memory error handling.  SIGBUS should be sent to the
dedicated thread if early-kill is enabled via
vm.memory_failure_early_kill as we are doing for PR_MCE_KILL_EARLY
processes.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Link: http://lkml.kernel.org/r/1591321039-22141-1-git-send-email-naoya.horiguchi@nec.com
Link: http://lkml.kernel.org/r/1591321039-22141-2-git-send-email-naoya.horiguchi@nec.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-11 18:17:47 -07:00
Linus Torvalds 623f6dc593 Merge branch 'akpm' (patches from Andrew)
Merge some more updates from Andrew Morton:

 - various hotfixes and minor things

 - hch's use_mm/unuse_mm clearnups

Subsystems affected by this patch series: mm/hugetlb, scripts, kcov,
lib, nilfs, checkpatch, lib, mm/debug, ocfs2, lib, misc.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  kernel: set USER_DS in kthread_use_mm
  kernel: better document the use_mm/unuse_mm API contract
  kernel: move use_mm/unuse_mm to kthread.c
  kernel: move use_mm/unuse_mm to kthread.c
  stacktrace: cleanup inconsistent variable type
  lib: test get_count_order/long in test_bitops.c
  mm: add comments on pglist_data zones
  ocfs2: fix spelling mistake and grammar
  mm/debug_vm_pgtable: fix kernel crash by checking for THP support
  lib: fix bitmap_parse() on 64-bit big endian archs
  checkpatch: correct check for kernel parameters doc
  nilfs2: fix null pointer dereference at nilfs_segctor_do_construct()
  lib/lz4/lz4_decompress.c: document deliberate use of `&'
  kcov: check kcov_softirq in kcov_remote_stop()
  scripts/spelling: add a few more typos
  khugepaged: selftests: fix timeout condition in wait_for_scan()
2020-06-11 13:25:53 -07:00
Thomas Gleixner 37d1a04b13 Rebase locking/kcsan to locking/urgent
Merge the state of the locking kcsan branch before the read/write_once()
and the atomics modifications got merged.

Squash the fallout of the rebase on top of the read/write once and atomic
fallback work into the merge. The history of the original branch is
preserved in tag locking-kcsan-2020-06-02.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-06-11 20:02:46 +02:00
Christoph Hellwig f5678e7f2a kernel: better document the use_mm/unuse_mm API contract
Switch the function documentation to kerneldoc comments, and add
WARN_ON_ONCE asserts that the calling thread is a kernel thread and does
not have ->mm set (or has ->mm set in the case of unuse_mm).

Also give the functions a kthread_ prefix to better document the use case.

[hch@lst.de: fix a comment typo, cover the newly merged use_mm/unuse_mm caller in vfio]
  Link: http://lkml.kernel.org/r/20200416053158.586887-3-hch@lst.de
[sfr@canb.auug.org.au: powerpc/vas: fix up for {un}use_mm() rename]
  Link: http://lkml.kernel.org/r/20200422163935.5aa93ba5@canb.auug.org.au

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [usb]
Acked-by: Haren Myneni <haren@linux.ibm.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Link: http://lkml.kernel.org/r/20200404094101.672954-6-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-10 19:14:18 -07:00
Christoph Hellwig 9bf5b9eb23 kernel: move use_mm/unuse_mm to kthread.c
Patch series "improve use_mm / unuse_mm", v2.

This series improves the use_mm / unuse_mm interface by better documenting
the assumptions, and my taking the set_fs manipulations spread over the
callers into the core API.

This patch (of 3):

Use the proper API instead.

Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de

These helpers are only for use with kernel threads, and I will tie them
more into the kthread infrastructure going forward.  Also move the
prototypes to kthread.h - mmu_context.h was a little weird to start with
as it otherwise contains very low-level MM bits.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de
Link: http://lkml.kernel.org/r/20200416053158.586887-1-hch@lst.de
Link: http://lkml.kernel.org/r/20200404094101.672954-5-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-10 19:14:18 -07:00
Aneesh Kumar K.V 787d563b86 mm/debug_vm_pgtable: fix kernel crash by checking for THP support
Architectures can have CONFIG_TRANSPARENT_HUGEPAGE enabled but no THP
support enabled based on platforms.  For ex: with 4K PAGE_SIZE ppc64
supports THP only with radix translation.

This results in below crash when running with hash translation and 4K
PAGE_SIZE.

    kernel BUG at arch/powerpc/include/asm/book3s/64/hash-4k.h:140!
    cpu 0x61: Vector: 700 (Program Check) at [c000000ff948f860]
        pc: debug_vm_pgtable+0x480/0x8b0
        lr: debug_vm_pgtable+0x474/0x8b0
    ...
        debug_vm_pgtable+0x374/0x8b0 (unreliable)
        do_one_initcall+0x98/0x4f0
        kernel_init_freeable+0x330/0x3fc
        kernel_init+0x24/0x148

Check for THP support correctly

Link: http://lkml.kernel.org/r/20200608125252.407659-1-aneesh.kumar@linux.ibm.com
Fixes: 399145f9eb ("mm/debug: add tests validating architecture page table helpers")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-10 19:14:18 -07:00
Linus Torvalds 09102704c6 virtio: features, fixes
virtio-mem
 doorbell mapping for vdpa
 config interrupt support in ifc
 fixes all over the place
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl7fZ6APHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpkDoIAMcBcQx5su1iuX7vT35xzUWZO478eAf1jOMZ
 7KxKUVBeztkcxVFUlRVRu9MR6wOzwHils+1HD6025775Smr5M6x3aJxR6xOORaBj
 RoU6OVGkpDvbzsxlhW+xhONz4O7/RkveKJPCwzGjqHrsFeh92lkfTqroz/EuNpw+
 LZsO0+DhdUf123HbwHQp5lxW8EjyrRabgeZZg/D9VLPhoCP88vCjRhBXU2GPuaUl
 /UNXsQafn4xUgrxPaoN5f4Phn/P46NNrbZ1jmlkw/z/3QhF/DhktGXGaZsIHDCN/
 vicUii0or5QLeBsZpMbKko/BIe2xWHxFjkMRhMOMZOfcBb6sMBI=
 =auUa
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:

 - virtio-mem: paravirtualized memory hotplug

 - support doorbell mapping for vdpa

 - config interrupt support in ifc

 - fixes all over the place

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (40 commits)
  vhost/test: fix up after API change
  virtio_mem: convert device block size into 64bit
  virtio-mem: drop unnecessary initialization
  ifcvf: implement config interrupt in IFCVF
  vhost: replace -1 with VHOST_FILE_UNBIND in ioctls
  vhost_vdpa: Support config interrupt in vdpa
  ifcvf: ignore continuous setting same status value
  virtio-mem: Don't rely on implicit compiler padding for requests
  virtio-mem: Try to unplug the complete online memory block first
  virtio-mem: Use -ETXTBSY as error code if the device is busy
  virtio-mem: Unplug subblocks right-to-left
  virtio-mem: Drop manual check for already present memory
  virtio-mem: Add parent resource for all added "System RAM"
  virtio-mem: Better retry handling
  virtio-mem: Offline and remove completely unplugged memory blocks
  mm/memory_hotplug: Introduce offline_and_remove_memory()
  virtio-mem: Allow to offline partially unplugged memory blocks
  mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE
  virtio-mem: Paravirtualized memory hotunplug part 2
  virtio-mem: Paravirtualized memory hotunplug part 1
  ...
2020-06-10 13:42:09 -07:00
Christoph Hellwig 2a71e81d32 maccess: return -ERANGE when probe_kernel_read() fails
Allow the callers to distinguish a real unmapped address vs a range
that can't be probed.

Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200521152301.2587579-24-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:16 -07:00
Christoph Hellwig b58294ead1 maccess: allow architectures to provide kernel probing directly
Provide alternative versions of probe_kernel_read, probe_kernel_write
and strncpy_from_kernel_unsafe that don't need set_fs magic, but instead
use arch hooks that are modelled after unsafe_{get,put}_user to access
kernel memory in an exception safe way.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200521152301.2587579-19-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:15 -07:00
Christoph Hellwig fc3562d794 maccess: move user access routines together
Move kernel access vs user access routines together to ease upcoming
ifdefs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200521152301.2587579-18-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:15 -07:00
Christoph Hellwig 98a23609b1 maccess: always use strict semantics for probe_kernel_read
Except for historical confusion in the kprobes/uprobes and bpf tracers,
which has been fixed now, there is no good reason to ever allow user
memory accesses from probe_kernel_read.  Switch probe_kernel_read to only
read from kernel memory.

[akpm@linux-foundation.org: update it for "mm, dump_page(): do not crash with invalid mapping pointer"]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200521152301.2587579-17-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:15 -07:00
Christoph Hellwig 7676fbf21b maccess: remove strncpy_from_unsafe
All users are gone now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200521152301.2587579-16-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:15 -07:00
Christoph Hellwig eab0c6089b maccess: unify the probe kernel arch hooks
Currently architectures have to override every routine that probes
kernel memory, which includes a pure read and strcpy, both in strict
and not strict variants.  Just provide a single arch hooks instead to
make sure all architectures cover all the cases.

[akpm@linux-foundation.org: fix !CONFIG_X86_64 build]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200521152301.2587579-11-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:15 -07:00
Christoph Hellwig cd0309058f maccess: remove probe_read_common and probe_write_common
Each of the helpers has just two callers, which also different in
dealing with kernel or userspace pointers.  Just open code the logic
in the callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200521152301.2587579-10-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:15 -07:00
Christoph Hellwig 02dddb160e maccess: rename strnlen_unsafe_user to strnlen_user_nofault
This matches the naming of strnlen_user, and also makes it more clear
what the function is supposed to do.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200521152301.2587579-9-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:15 -07:00
Christoph Hellwig c4cb164426 maccess: rename strncpy_from_unsafe_strict to strncpy_from_kernel_nofault
This matches the naming of strncpy_from_user_nofault, and also makes it
more clear what the function is supposed to do.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200521152301.2587579-8-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:15 -07:00
Christoph Hellwig bd88bb5d40 maccess: rename strncpy_from_unsafe_user to strncpy_from_user_nofault
This matches the naming of strncpy_from_user, and also makes it more
clear what the function is supposed to do.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200521152301.2587579-7-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:15 -07:00
Christoph Hellwig 3f0acb1eb3 maccess: update the top of file comment
This file now also contains several helpers for accessing user memory.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200521152301.2587579-6-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:15 -07:00
Christoph Hellwig 4f6de12b37 maccess: clarify kerneldoc comments
Add proper kerneldoc comments for probe_kernel_read_strict and
probe_kernel_read strncpy_from_unsafe_strict and explain the different
versus the non-strict version.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200521152301.2587579-5-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:15 -07:00
Christoph Hellwig 48c49c0e5f maccess: remove various unused weak aliases
maccess tends to define lots of underscore prefixed symbols that then
have other weak aliases.  But except for two cases they are never
actually used, so remove them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200521152301.2587579-3-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:15 -07:00
Christoph Hellwig 0493cb0863 maccess: unexport probe_kernel_write()
Patch series "clean up and streamline probe_kernel_* and friends", v4.

This series start cleaning up the safe kernel and user memory probing
helpers in mm/maccess.c, and then allows architectures to implement the
kernel probing without overriding the address space limit and temporarily
allowing access to user memory.  It then switches x86 over to this new
mechanism by reusing the unsafe_* uaccess logic.

This version also switches to the saner copy_{from,to}_kernel_nofault
naming suggested by Linus.

I kept the x86 helpers as-is without calling unsage_{get,put}_user as that
avoids a number of hard to trace casts, and it will still work with the
asm-goto based version easily.

This patch (of 20):

probe_kernel_write() is not used by any modular code.

[sfr@canb.auug.org.au: turns out that probe_user_write is used in modular code]
  Link: http://lkml.kernel.org/r/20200602195741.4faaa348@canb.auug.org.au

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200521152301.2587579-1-hch@lst.de
Link: http://lkml.kernel.org/r/20200521152301.2587579-2-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:15 -07:00
Michel Lespinasse c1e8d7c6a7 mmap locking API: convert mmap_sem comments
Convert comments that reference mmap_sem to reference mmap_lock instead.

[akpm@linux-foundation.org: fix up linux-next leftovers]
[akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
[akpm@linux-foundation.org: more linux-next fixups, per Michel]

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Michel Lespinasse 3e4e28c5a8 mmap locking API: convert mmap_sem API comments
Convert comments that reference old mmap_sem APIs to reference
corresponding new mmap locking APIs instead.

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-12-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Michel Lespinasse da1c55f1b2 mmap locking API: rename mmap_sem to mmap_lock
Rename the mmap_sem field to mmap_lock.  Any new uses of this lock should
now go through the new mmap locking api.  The mmap_lock is still
implemented as a rwsem, though this could change in the future.

[akpm@linux-foundation.org: fix it for mm-gup-might_lock_readmmap_sem-in-get_user_pages_fast.patch]

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-11-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00