linux-stable/arch/powerpc/mm
Gautham R. Shenoy 07a971f35f powerpc/pseries: Fix cpu_hotplug_lock acquisition in resize_hpt()
[ Upstream commit c784be435d ]

The calls to arch_add_memory()/arch_remove_memory() are always made
with the read-side cpu_hotplug_lock acquired via memory_hotplug_begin().
On pSeries, arch_add_memory()/arch_remove_memory() eventually call
resize_hpt() which in turn calls stop_machine() which acquires the
read-side cpu_hotplug_lock again, thereby resulting in the recursive
acquisition of this lock.

In the absence of CONFIG_PROVE_LOCKING, we hadn't observed a system
lockup during a memory hotplug operation because cpus_read_lock() is a
per-cpu rwsem read, which, in the fast-path (in the absence of the
writer, which in our case is a CPU-hotplug operation) simply
increments the read_count on the semaphore. Thus a recursive read in
the fast-path doesn't cause any problems.

However, we can hit this problem in practice if there is a concurrent
CPU-Hotplug operation in progress which is waiting to acquire the
write-side of the lock. This will cause the second recursive read to
block until the writer finishes. While the writer is blocked since the
first read holds the lock. Thus both the reader as well as the writers
fail to make any progress thereby blocking both CPU-Hotplug as well as
Memory Hotplug operations.

Memory-Hotplug				CPU-Hotplug
CPU 0					CPU 1
------                                  ------

1. down_read(cpu_hotplug_lock.rw_sem)
   [memory_hotplug_begin]
					2. down_write(cpu_hotplug_lock.rw_sem)
					[cpu_up/cpu_down]
3. down_read(cpu_hotplug_lock.rw_sem)
   [stop_machine()]

Lockdep complains as follows in these code-paths.

 swapper/0/1 is trying to acquire lock:
 (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: stop_machine+0x2c/0x60

but task is already holding lock:
(____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: mem_hotplug_begin+0x20/0x50

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(cpu_hotplug_lock.rw_sem);
   lock(cpu_hotplug_lock.rw_sem);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 3 locks held by swapper/0/1:
  #0: (____ptrval____) (&dev->mutex){....}, at: __driver_attach+0x12c/0x1b0
  #1: (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: mem_hotplug_begin+0x20/0x50
  #2: (____ptrval____) (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x54/0x1a0

stack backtrace:
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc5-58373-gbc99402235f3-dirty #166
 Call Trace:
   dump_stack+0xe8/0x164 (unreliable)
   __lock_acquire+0x1110/0x1c70
   lock_acquire+0x240/0x290
   cpus_read_lock+0x64/0xf0
   stop_machine+0x2c/0x60
   pseries_lpar_resize_hpt+0x19c/0x2c0
   resize_hpt_for_hotplug+0x70/0xd0
   arch_add_memory+0x58/0xfc
   devm_memremap_pages+0x5e8/0x8f0
   pmem_attach_disk+0x764/0x830
   nvdimm_bus_probe+0x118/0x240
   really_probe+0x230/0x4b0
   driver_probe_device+0x16c/0x1e0
   __driver_attach+0x148/0x1b0
   bus_for_each_dev+0x90/0x130
   driver_attach+0x34/0x50
   bus_add_driver+0x1a8/0x360
   driver_register+0x108/0x170
   __nd_driver_register+0xd0/0xf0
   nd_pmem_driver_init+0x34/0x48
   do_one_initcall+0x1e0/0x45c
   kernel_init_freeable+0x540/0x64c
   kernel_init+0x2c/0x160
   ret_from_kernel_thread+0x5c/0x68

Fix this issue by
  1) Requiring all the calls to pseries_lpar_resize_hpt() be made
     with cpu_hotplug_lock held.

  2) In pseries_lpar_resize_hpt() invoke stop_machine_cpuslocked()
     as a consequence of 1)

  3) To satisfy 1), in hpt_order_set(), call mmu_hash_ops.resize_hpt()
     with cpu_hotplug_lock held.

Fixes: dbcf929c00 ("powerpc/pseries: Add support for hash table resizing")
Cc: stable@vger.kernel.org # v4.11+
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1557906352-29048-1-git-send-email-ego@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-11 18:18:45 +02:00
..
8xx_mmu.c powerpc/mm/slice: Fix hugepage allocation at hint address on 8xx 2018-06-05 11:42:00 +02:00
40x_mmu.c
44x_mmu.c
copro_fault.c
dma-noncoherent.c powerpc/mm: Rename map_page() to map_kernel_page() on 32-bit 2017-06-05 19:59:03 +10:00
dump_hashpagetable.c powerpc/mm: Use seq_putc() in two functions 2017-09-01 16:42:52 +10:00
dump_linuxpagetables.c powerpc/mm: Fix linux page tables build with some configs 2019-01-13 10:00:56 +01:00
fault.c powerpc/mm: Fix reporting of kernel execute faults on the 8xx 2019-02-12 19:46:07 +01:00
fsl_booke_mmu.c
hash64_4k.c
hash64_64k.c
hash_low_32.S powerpc: fix location of two EXPORT_SYMBOL 2017-09-01 16:42:45 +10:00
hash_native_64.c powerpc/powernv: Fix kexec crashes caused by tlbie tracing 2017-12-05 11:26:31 +01:00
hash_utils_64.c powerpc/pseries: Fix cpu_hotplug_lock acquisition in resize_hpt() 2019-10-11 18:18:45 +02:00
highmem.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
hugepage-hash64.c
hugetlbpage-book3e.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
hugetlbpage-hash64.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
hugetlbpage-radix.c powerpc/hugetlb: Handle mmap_min_addr correctly in get_unmapped_area callback 2019-04-05 22:31:30 +02:00
hugetlbpage.c powerpc/mm: Don't report hugepage tables as memory leaks when using kmemleak 2018-11-21 09:24:03 +01:00
init-common.c
init_32.c powerpc/mm/hugetlb: Add support for reserving gigantic huge pages via kernel command line 2017-08-16 14:56:12 +10:00
init_64.c powerpc/mm: Fix section mismatch warning in early_check_vec5() 2017-08-10 23:40:51 +10:00
Makefile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mem.c powerpc: Avoid code patching freed init sections 2018-10-13 09:27:28 +02:00
mmap.c powerpc/64s/radix: Fix 128TB-512TB virtual address boundary case allocation 2017-11-30 08:40:57 +00:00
mmu_context.c powerpc/mm: Make switch_mm_irqs_off() out of line 2017-08-23 22:48:51 +10:00
mmu_context_book3s64.c powerpc/64s/hash: Fix fork() with 512TB process address space 2017-11-30 08:40:57 +00:00
mmu_context_hash32.c
mmu_context_iommu.c KVM: PPC: Book3S: Fix guest DMA when guest partially backed by THP pages 2018-09-09 19:55:58 +02:00
mmu_context_nohash.c powerpc/mm/slice: Fix hugepage allocation at hint address on 8xx 2018-06-05 11:42:00 +02:00
mmu_decl.h powerpc/8xx: Getting rid of remaining use of CONFIG_8xx 2017-08-10 23:32:12 +10:00
numa.c powerpc/numa: improve control of topology updates 2019-05-31 06:47:25 -07:00
pgtable-book3e.c
pgtable-book3s64.c powerpc/mm/cxl: Add the fault handling cpu to mm cpumask 2017-08-17 23:31:52 +10:00
pgtable-hash64.c powerpc/mm: Use mm_is_thread_local() instread of open-coding 2017-08-23 22:27:45 +10:00
pgtable-radix.c powerpc/mm/radix: Use the right page size for vmemmap mapping 2019-09-21 07:15:27 +02:00
pgtable.c
pgtable_32.c powerpc/mm: Call flush_tlb_kernel_range with interrupts enabled 2017-10-04 22:15:30 +11:00
pgtable_64.c powerpc/mm: Flush radix process translations when setting MMU type 2018-02-22 15:42:16 +01:00
ppc_mmu_32.c
slb.c powerpc/64s: Fix compiler store ordering to SLB shadow area 2018-08-03 07:50:25 +02:00
slb_low.S powerpc/mm/hash64: Make vmalloc 56T on hash 2017-08-08 19:37:05 +10:00
slice.c powerpc/mm/hash: Handle mmap_min_addr correctly in get_unmapped_area topdown search 2019-05-08 07:20:53 +02:00
subpage-prot.c powerpc/mm/hash: Free the subpage_prot_table correctly 2017-07-27 13:05:50 +10:00
tlb-radix.c powerpc/radix: Remove trace_tlbie call from radix__flush_tlb_all 2018-02-22 15:42:16 +01:00
tlb_hash32.c
tlb_hash64.c powerpc/mm: Use mm_is_thread_local() instread of open-coding 2017-08-23 22:27:45 +10:00
tlb_low_64e.S powerpc/fsl: Flush the branch predictor at each kernel entry (64bit) 2019-04-03 06:25:14 +02:00
tlb_nohash.c powerpc/nohash: fix undefined behaviour when testing page size support 2018-11-21 09:24:03 +01:00
tlb_nohash_low.S powerpc/8xx: Getting rid of remaining use of CONFIG_8xx 2017-08-10 23:32:12 +10:00
vphn.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
vphn.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00