linux-stable/mm
Tetsuo Handa 63c79247fe mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock
commit 1007843a91 upstream.

syzbot is reporting circular locking dependency which involves
zonelist_update_seq seqlock [1], for this lock is checked by memory
allocation requests which do not need to be retried.

One deadlock scenario is kmalloc(GFP_ATOMIC) from an interrupt handler.

  CPU0
  ----
  __build_all_zonelists() {
    write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd
    // e.g. timer interrupt handler runs at this moment
      some_timer_func() {
        kmalloc(GFP_ATOMIC) {
          __alloc_pages_slowpath() {
            read_seqbegin(&zonelist_update_seq) {
              // spins forever because zonelist_update_seq.seqcount is odd
            }
          }
        }
      }
    // e.g. timer interrupt handler finishes
    write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even
  }

This deadlock scenario can be easily eliminated by not calling
read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM allocation
requests, for retry is applicable to only __GFP_DIRECT_RECLAIM allocation
requests.  But Michal Hocko does not know whether we should go with this
approach.

Another deadlock scenario which syzbot is reporting is a race between
kmalloc(GFP_ATOMIC) from tty_insert_flip_string_and_push_buffer() with
port->lock held and printk() from __build_all_zonelists() with
zonelist_update_seq held.

  CPU0                                   CPU1
  ----                                   ----
  pty_write() {
    tty_insert_flip_string_and_push_buffer() {
                                         __build_all_zonelists() {
                                           write_seqlock(&zonelist_update_seq);
                                           build_zonelists() {
                                             printk() {
                                               vprintk() {
                                                 vprintk_default() {
                                                   vprintk_emit() {
                                                     console_unlock() {
                                                       console_flush_all() {
                                                         console_emit_next_record() {
                                                           con->write() = serial8250_console_write() {
      spin_lock_irqsave(&port->lock, flags);
      tty_insert_flip_string() {
        tty_insert_flip_string_fixed_flag() {
          __tty_buffer_request_room() {
            tty_buffer_alloc() {
              kmalloc(GFP_ATOMIC | __GFP_NOWARN) {
                __alloc_pages_slowpath() {
                  zonelist_iter_begin() {
                    read_seqbegin(&zonelist_update_seq); // spins forever because zonelist_update_seq.seqcount is odd
                                                             spin_lock_irqsave(&port->lock, flags); // spins forever because port->lock is held
                    }
                  }
                }
              }
            }
          }
        }
      }
      spin_unlock_irqrestore(&port->lock, flags);
                                                             // message is printed to console
                                                             spin_unlock_irqrestore(&port->lock, flags);
                                                           }
                                                         }
                                                       }
                                                     }
                                                   }
                                                 }
                                               }
                                             }
                                           }
                                           write_sequnlock(&zonelist_update_seq);
                                         }
    }
  }

This deadlock scenario can be eliminated by

  preventing interrupt context from calling kmalloc(GFP_ATOMIC)

and

  preventing printk() from calling console_flush_all()

while zonelist_update_seq.seqcount is odd.

Since Petr Mladek thinks that __build_all_zonelists() can become a
candidate for deferring printk() [2], let's address this problem by

  disabling local interrupts in order to avoid kmalloc(GFP_ATOMIC)

and

  disabling synchronous printk() in order to avoid console_flush_all()

.

As a side effect of minimizing duration of zonelist_update_seq.seqcount
being odd by disabling synchronous printk(), latency at
read_seqbegin(&zonelist_update_seq) for both !__GFP_DIRECT_RECLAIM and
__GFP_DIRECT_RECLAIM allocation requests will be reduced.  Although, from
lockdep perspective, not calling read_seqbegin(&zonelist_update_seq) (i.e.
do not record unnecessary locking dependency) from interrupt context is
still preferable, even if we don't allow calling kmalloc(GFP_ATOMIC)
inside
write_seqlock(&zonelist_update_seq)/write_sequnlock(&zonelist_update_seq)
section...

Link: https://lkml.kernel.org/r/8796b95c-3da3-5885-fddd-6ef55f30e4d3@I-love.SAKURA.ne.jp
Fixes: 3d36424b3b ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2]
Reported-by: syzbot <syzbot+223c7461c58c58a4cb10@syzkaller.appspotmail.com>
  Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Petr Mladek <pmladek@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Patrick Daly <quic_pdaly@quicinc.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-17 11:11:51 +02:00
..
kasan panic: Consolidate open-coded panic_on_warn checks 2023-02-06 07:46:34 +01:00
Kconfig mm/hmm: select mmu notifier when selecting HMM 2019-06-15 11:54:51 +02:00
Kconfig.debug kmemcheck: rip it out 2018-02-22 15:42:24 +01:00
Makefile kmemcheck: rip it out 2018-02-22 15:42:24 +01:00
backing-dev.c mm: bdi: initialize bdi_min_ratio when bdi is unregistered 2021-12-14 10:16:54 +01:00
balloon_compaction.c virtio_balloon: fix deadlock on OOM 2018-10-13 09:27:30 +02:00
bootmem.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
cleancache.c
cma.c mm/cma.c: fail if fixed declaration can't be honored 2019-08-06 19:05:23 +02:00
cma.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
cma_debug.c mm/cma_debug.c: fix the break condition in cma_maxchunk_get() 2019-06-15 11:54:51 +02:00
compaction.c mm/compaction.c: clear total_{migrate,free}_scanned before scanning a new zone 2019-10-05 12:48:13 +02:00
debug.c mm: get rid of vmacache_flush_all() entirely 2018-09-19 22:43:48 +02:00
debug_page_ref.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dmapool.c
early_ioremap.c mm/early_ioremap: Fix boot hang with earlyprintk=efi,keep 2018-02-25 11:08:03 +01:00
fadvise.c mm/fadvise.c: fix signed overflow UBSAN complaint 2018-09-15 09:45:28 +02:00
failslab.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
filemap.c mm: fs: initialize fsdata passed to write_begin/write_end interface 2022-11-25 17:36:55 +01:00
frame_vector.c v4l2: don't fall back to follow_pfn() if pin_user_pages_fast() fails 2022-12-08 11:16:33 +01:00
frontswap.c
gup.c gup: document and work around "COW can break either way" issue 2021-04-28 12:08:42 +02:00
highmem.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
hmm.c mm, hmm: mark hmm_devmem_{add, add_resource} EXPORT_SYMBOL_GPL 2019-01-13 10:01:02 +01:00
huge_memory.c mm/huge_memory.c: don't discard hugepage if other processes are mapping it 2021-07-20 16:17:41 +02:00
hugetlb.c mm,hugetlb: take hugetlb_lock before decrementing h->resv_huge_pages 2022-11-03 23:50:54 +09:00
hugetlb_cgroup.c mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup() 2019-11-20 17:59:33 +01:00
hwpoison-inject.c
init-mm.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
internal.h mm/thp: fix vma_address() if virtual address below file offset 2021-07-11 12:48:10 +02:00
interval_tree.c lib/interval_tree: fast overlap detection 2017-09-08 18:26:49 -07:00
khugepaged.c mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths 2023-01-18 09:26:04 +01:00
kmemleak-test.c
kmemleak.c Revert "mm: kmemleak: take a full lowmem check in kmemleak_*_phys()" 2022-09-15 12:23:51 +02:00
ksm.c ksm: fix potential missing rmap_item for stable_node 2021-05-22 10:57:39 +02:00
list_lru.c mm/list_lru.c: fix memory leak in __memcg_init_list_lru_node 2019-06-19 08:20:54 +02:00
maccess.c uaccess: Add non-pagefault user-space write function 2020-09-09 19:03:11 +02:00
madvise.c mm: madvise(MADV_DODUMP): allow hugetlbfs pages 2018-10-10 08:54:22 +02:00
memblock.c memblock: use kfree() to release kmalloced memblock regions 2022-03-02 11:34:00 +01:00
memcontrol.c memcg: fix possible use-after-free in memcg_write_event_control() 2022-12-14 11:26:13 +01:00
memory-failure.c mm/memory-failure: make sure wait for page writeback in memory_failure 2021-06-30 08:48:48 -04:00
memory.c mm/khugepaged: fix GUP-fast interaction by sending IPI 2023-01-18 09:26:04 +01:00
memory_hotplug.c mm/memory_hotplug: use "unsigned long" for PFN in zone_for_pfn_range() 2021-09-22 11:45:34 +02:00
mempolicy.c migrate: hugetlb: check for hugetlb shared PMD in node migration 2023-02-22 12:46:04 +01:00
mempool.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
memtest.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
migrate.c mm/migrate_device.c: flush TLB while holding PTL 2022-10-26 13:16:50 +02:00
mincore.c mm/mincore.c: make mincore() more conservative 2019-05-21 18:50:16 +02:00
mlock.c mm/mlock.c: change count_mm_mlocked_page_nr return type 2019-07-10 09:54:36 +02:00
mm_init.c
mmap.c mm: Fix TLB flush for not-first PFNMAP mappings in unmap_region() 2022-09-20 11:51:30 +02:00
mmu_context.c
mmu_notifier.c mm/mmu_notifier: use hlist_add_head_rcu() 2019-07-31 07:28:56 +02:00
mmzone.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mprotect.c mm, numa: fix bad pmd by atomically check for pmd_trans_huge when marking page tables prot_numa 2020-03-11 18:03:02 +01:00
mremap.c mmmremap.c: avoid pointless invalidate_range_start/end on mremap(old_size=0) 2022-04-20 09:08:29 +02:00
msync.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
nobootmem.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
nommu.c x86/mm: split vmalloc_sync_all() 2020-04-02 16:34:20 +02:00
oom_kill.c mm, oom: do not trigger out_of_memory from the #PF 2021-11-26 11:40:37 +01:00
page-writeback.c mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio() 2020-01-23 08:20:32 +01:00
page_alloc.c mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock 2023-05-17 11:11:51 +02:00
page_counter.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
page_ext.c mm/page_ext.c: fix an imbalance with kmemleak 2019-04-05 22:31:27 +02:00
page_idle.c mm/page_idle.c: fix oops because end_pfn is larger than max_pfn 2019-07-03 13:15:59 +02:00
page_io.c swap: fix swapfile read/write offset 2021-03-07 11:27:46 +01:00
page_isolation.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
page_owner.c mm/page_owner.c: remove drain_all_pages from init_early_allocated_pages 2020-07-31 16:44:45 +02:00
page_poison.c page_poison: play nicely with KASAN 2019-04-05 22:31:28 +02:00
page_vma_mapped.c mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk() 2021-07-11 12:48:12 +02:00
pagewalk.c mm: pagewalk: fix termination condition in walk_pte_range() 2020-10-01 13:12:32 +02:00
percpu-internal.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
percpu-km.c percpu: convert spin_lock_irq to spin_lock_irqsave. 2019-02-12 19:46:05 +01:00
percpu-stats.c percpu: fix starting offset for chunk statistics traversal 2017-09-27 14:45:57 -07:00
percpu-vm.c percpu: add __GFP_NORETRY semantics to the percpu balancing path 2018-04-08 14:26:29 +02:00
percpu.c percpu: fix first chunk size calculation for populated bitmap 2020-09-23 10:46:36 +02:00
pgtable-generic.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
process_vm_access.c
quicklist.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
readahead.c readahead: stricter check for bdi io_pages 2018-09-09 19:55:53 +02:00
rmap.c mm/rmap: Fix anon_vma->degree ambiguity leading to double-reuse 2022-09-05 10:25:06 +02:00
rodata_test.c mm: fix RODATA_TEST failure "rodata_test: test data was not read only" 2017-10-03 17:54:24 -07:00
shmem.c memfd: fix F_SEAL_WRITE after shmem huge page allocated 2022-03-08 19:01:58 +01:00
slab.c mm/slab.c: fix an infinite loop in leaks_show() 2019-06-15 11:54:51 +02:00
slab.h mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag 2021-11-26 11:40:40 +01:00
slab_common.c mm/slab: use memzero_explicit() in kzfree() 2020-06-30 15:38:08 -04:00
slob.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
slub.c mm/slub: fix to return errno if kmalloc() fails 2022-09-28 10:56:50 +02:00
sparse-vmemmap.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sparse.c mm: sections are not offlined during memory hotremove 2018-05-16 10:10:27 +02:00
swap.c mm: avoid marking swap cached page as lazyfree 2017-10-03 17:54:24 -07:00
swap_cgroup.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
swap_slots.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
swap_state.c mm/swap_state: fix a data race in swapin_nr_pages 2020-10-01 13:12:46 +02:00
swapfile.c mm/swap: fix swap_info_struct race between swapoff and get_swap_pages() 2023-04-20 12:02:11 +02:00
truncate.c mm: cleancache: fix corruption on missed inode invalidation 2018-12-08 13:03:40 +01:00
usercopy.c usercopy: Avoid HIGHMEM pfn warning 2019-10-11 18:18:34 +02:00
userfaultfd.c mm: userfaultfd: fix missing cache flush in mcopy_atomic_pte() and __mcopy_atomic() 2022-05-15 19:40:27 +02:00
util.c mm: kvmalloc does not fallback to vmalloc for incompatible gfp flags 2023-02-06 07:46:35 +01:00
vmacache.c mm: get rid of vmacache_flush_all() entirely 2018-09-19 22:43:48 +02:00
vmalloc.c mm/vmalloc.c: don't dereference possible NULL pointer in __vunmap() 2020-06-03 08:18:11 +02:00
vmpressure.c mm, vmpressure: pass-through notification support 2017-07-10 16:32:31 -07:00
vmscan.c mm: memcg: make sure memory.events is uptodate when waking pollers 2021-04-07 12:47:03 +02:00
vmstat.c mm, vmstat: drop zone->lock in /proc/pagetypeinfo 2021-06-03 08:36:10 +02:00
workingset.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
z3fold.c z3fold: fix possible reclaim races 2018-12-01 09:42:54 +01:00
zbud.c
zpool.c
zsmalloc.c zsmalloc: fix races between asynchronous zspage free and page migration 2022-06-06 08:20:57 +02:00
zswap.c zswap: re-check zswap_is_full() after do zswap_shrink() 2018-09-05 09:26:30 +02:00