linux-stable/mm
Tetsuo Handa 93ca0d7b88 mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock
commit 1007843a91 upstream.

syzbot is reporting circular locking dependency which involves
zonelist_update_seq seqlock [1], for this lock is checked by memory
allocation requests which do not need to be retried.

One deadlock scenario is kmalloc(GFP_ATOMIC) from an interrupt handler.

  CPU0
  ----
  __build_all_zonelists() {
    write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd
    // e.g. timer interrupt handler runs at this moment
      some_timer_func() {
        kmalloc(GFP_ATOMIC) {
          __alloc_pages_slowpath() {
            read_seqbegin(&zonelist_update_seq) {
              // spins forever because zonelist_update_seq.seqcount is odd
            }
          }
        }
      }
    // e.g. timer interrupt handler finishes
    write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even
  }

This deadlock scenario can be easily eliminated by not calling
read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM allocation
requests, for retry is applicable to only __GFP_DIRECT_RECLAIM allocation
requests.  But Michal Hocko does not know whether we should go with this
approach.

Another deadlock scenario which syzbot is reporting is a race between
kmalloc(GFP_ATOMIC) from tty_insert_flip_string_and_push_buffer() with
port->lock held and printk() from __build_all_zonelists() with
zonelist_update_seq held.

  CPU0                                   CPU1
  ----                                   ----
  pty_write() {
    tty_insert_flip_string_and_push_buffer() {
                                         __build_all_zonelists() {
                                           write_seqlock(&zonelist_update_seq);
                                           build_zonelists() {
                                             printk() {
                                               vprintk() {
                                                 vprintk_default() {
                                                   vprintk_emit() {
                                                     console_unlock() {
                                                       console_flush_all() {
                                                         console_emit_next_record() {
                                                           con->write() = serial8250_console_write() {
      spin_lock_irqsave(&port->lock, flags);
      tty_insert_flip_string() {
        tty_insert_flip_string_fixed_flag() {
          __tty_buffer_request_room() {
            tty_buffer_alloc() {
              kmalloc(GFP_ATOMIC | __GFP_NOWARN) {
                __alloc_pages_slowpath() {
                  zonelist_iter_begin() {
                    read_seqbegin(&zonelist_update_seq); // spins forever because zonelist_update_seq.seqcount is odd
                                                             spin_lock_irqsave(&port->lock, flags); // spins forever because port->lock is held
                    }
                  }
                }
              }
            }
          }
        }
      }
      spin_unlock_irqrestore(&port->lock, flags);
                                                             // message is printed to console
                                                             spin_unlock_irqrestore(&port->lock, flags);
                                                           }
                                                         }
                                                       }
                                                     }
                                                   }
                                                 }
                                               }
                                             }
                                           }
                                           write_sequnlock(&zonelist_update_seq);
                                         }
    }
  }

This deadlock scenario can be eliminated by

  preventing interrupt context from calling kmalloc(GFP_ATOMIC)

and

  preventing printk() from calling console_flush_all()

while zonelist_update_seq.seqcount is odd.

Since Petr Mladek thinks that __build_all_zonelists() can become a
candidate for deferring printk() [2], let's address this problem by

  disabling local interrupts in order to avoid kmalloc(GFP_ATOMIC)

and

  disabling synchronous printk() in order to avoid console_flush_all()

.

As a side effect of minimizing duration of zonelist_update_seq.seqcount
being odd by disabling synchronous printk(), latency at
read_seqbegin(&zonelist_update_seq) for both !__GFP_DIRECT_RECLAIM and
__GFP_DIRECT_RECLAIM allocation requests will be reduced.  Although, from
lockdep perspective, not calling read_seqbegin(&zonelist_update_seq) (i.e.
do not record unnecessary locking dependency) from interrupt context is
still preferable, even if we don't allow calling kmalloc(GFP_ATOMIC)
inside
write_seqlock(&zonelist_update_seq)/write_sequnlock(&zonelist_update_seq)
section...

Link: https://lkml.kernel.org/r/8796b95c-3da3-5885-fddd-6ef55f30e4d3@I-love.SAKURA.ne.jp
Fixes: 3d36424b3b ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2]
Reported-by: syzbot <syzbot+223c7461c58c58a4cb10@syzkaller.appspotmail.com>
  Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Petr Mladek <pmladek@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Patrick Daly <quic_pdaly@quicinc.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-17 11:36:05 +02:00
..
kasan panic: Consolidate open-coded panic_on_warn checks 2023-02-06 07:52:50 +01:00
backing-dev.c mm: bdi: initialize bdi_min_ratio when bdi is unregistered 2021-12-14 14:49:00 +01:00
balloon_compaction.c mm/balloon_compaction: suppress allocation warnings 2019-09-04 07:42:01 -04:00
cleancache.c Driver Core and debugfs changes for 5.3-rc1 2019-07-12 12:24:03 -07:00
cma.c cma: don't quit at first error when activating reserved areas 2020-09-03 11:26:51 +02:00
cma.h
cma_debug.c
compaction.c mm, compaction: fix fast_isolate_around() to stay within boundaries 2023-01-18 11:41:44 +01:00
debug.c mm/debug.c: always print flags in dump_page() 2020-03-05 16:43:51 +01:00
debug_page_ref.c
dmapool.c mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options 2019-07-12 11:05:46 -07:00
early_ioremap.c
fadvise.c fs: Export generic_fadvise() 2019-08-30 22:43:58 -07:00
failslab.c
filemap.c mm: fs: initialize fsdata passed to write_begin/write_end interface 2022-11-25 17:42:22 +01:00
frame_vector.c v4l2: don't fall back to follow_pfn() if pin_user_pages_fast() fails 2022-12-08 11:23:06 +01:00
frontswap.c
gup.c mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page 2022-12-19 12:24:15 +01:00
gup_benchmark.c mm/gup: fix memory leak in __gup_benchmark_ioctl 2020-01-09 10:20:00 +01:00
highmem.c
hmm.c pagewalk: separate function pointers from iterator data 2019-09-07 04:28:04 -03:00
huge_memory.c mm/thp: check and bail out if page in deferred queue already 2023-03-11 16:44:05 +01:00
hugetlb.c mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page 2022-12-19 12:24:15 +01:00
hugetlb_cgroup.c mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup() 2019-11-15 18:34:00 -08:00
hwpoison-inject.c
init-mm.c mm/init-mm.c: include <linux/mman.h> for vm_committed_as_batch 2019-10-19 06:32:32 -04:00
internal.h mm/thp: fix vma_address() if virtual address below file offset 2021-06-30 08:47:52 -04:00
interval_tree.c
Kconfig mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING 2020-12-16 10:56:59 +01:00
Kconfig.debug mm, page_owner, debug_pagealloc: save and dump freeing stack trace 2019-09-24 15:54:08 -07:00
khugepaged.c mm/khugepaged: fix collapse_pte_mapped_thp() to allow anon_vma 2023-01-24 07:18:01 +01:00
kmemleak-test.c
kmemleak.c Revert "mm: kmemleak: take a full lowmem check in kmemleak_*_phys()" 2022-09-15 12:04:49 +02:00
ksm.c ksm: fix potential missing rmap_item for stable_node 2021-05-19 10:08:27 +02:00
list_lru.c mm: list_lru: set shrinker map bit when child nr_items is not zero 2020-12-11 13:23:31 +01:00
maccess.c uaccess: Add non-pagefault user-space write function 2020-01-17 19:48:40 +01:00
madvise.c mm: fix madivse_pageout mishandling on non-LRU page 2022-10-05 10:37:43 +02:00
Makefile mm: silence -Woverride-init/initializer-overrides 2019-09-24 15:54:10 -07:00
memblock.c Revert "mm: Always release pages to the buddy allocator in memblock_free_late()." 2023-02-22 12:50:39 +01:00
memcontrol.c mm: memcontrol: deprecate charge moving 2023-03-11 16:44:05 +01:00
memfd.c memfd: fix F_SEAL_WRITE after shmem huge page allocated 2022-03-08 19:07:49 +01:00
memory-failure.c mm/memory-failure: make sure wait for page writeback in memory_failure 2021-06-23 14:41:23 +02:00
memory.c mm: hugetlb: fix missing cache flush in copy_huge_page_from_user() 2022-05-15 19:54:47 +02:00
memory_hotplug.c mm/memory_hotplug: use "unsigned long" for PFN in zone_for_pfn_range() 2021-09-22 12:26:43 +02:00
mempolicy.c migrate: hugetlb: check for hugetlb shared PMD in node migration 2023-02-22 12:50:34 +01:00
mempool.c
memremap.c mm/memory_hotplug: shrink zones when offlining memory 2020-01-09 10:19:56 +01:00
memtest.c
migrate.c mm/migrate_device.c: flush TLB while holding PTL 2022-10-05 10:37:43 +02:00
mincore.c mm: untag user pointers passed to memory syscalls 2019-09-25 17:51:41 -07:00
mlock.c mm: untag user pointers passed to memory syscalls 2019-09-25 17:51:41 -07:00
mm_init.c
mmap.c mm: Fix TLB flush for not-first PFNMAP mappings in unmap_region() 2022-09-20 12:28:00 +02:00
mmu_context.c mm: fix kthread_use_mm() vs TLB invalidate 2020-09-03 11:26:51 +02:00
mmu_gather.c mm/khugepaged: fix GUP-fast interaction by sending IPI 2022-12-14 11:30:42 +01:00
mmu_notifier.c mm/mmu_notifiers: use the right return code for WARN_ON 2019-11-06 08:47:50 -08:00
mmzone.c arm: remove CONFIG_ARCH_HAS_HOLES_MEMORYMODEL 2022-05-15 19:54:46 +02:00
mprotect.c mm, numa: fix bad pmd by atomically check for pmd_trans_huge when marking page tables prot_numa 2020-03-12 13:00:19 +01:00
mremap.c mm/mremap: hold the rmap lock in write mode when moving page table entries. 2022-08-25 11:17:20 +02:00
msync.c mm: untag user pointers passed to memory syscalls 2019-09-25 17:51:41 -07:00
nommu.c x86/mm: split vmalloc_sync_all() 2020-03-25 08:25:58 +01:00
oom_kill.c oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup 2022-04-27 13:50:48 +02:00
page-writeback.c mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio() 2020-01-23 08:22:41 +01:00
page_alloc.c mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock 2023-05-17 11:36:05 +02:00
page_counter.c mm/page_counter.c: fix protection usage propagation 2020-08-21 13:05:27 +02:00
page_ext.c mm, page_owner: fix off-by-one error in __set_page_owner_handle() 2019-10-14 15:04:00 -07:00
page_idle.c
page_io.c mm: fix unexpected zeroed page mapping with zram swap 2022-05-12 12:23:48 +02:00
page_isolation.c mm/memory_hotplug: drain per-cpu pages again during memory offline 2020-09-23 12:40:47 +02:00
page_owner.c mm/page_owner: change split_page_owner to take a count 2020-10-29 09:57:52 +01:00
page_poison.c mm/page_poison.c: fix a typo in a comment 2019-09-24 15:54:08 -07:00
page_vma_mapped.c mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk() 2021-06-30 08:47:55 -04:00
pagewalk.c mm: pagewalk: Fix race between unmap and page walker 2022-10-15 07:54:36 +02:00
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c percpu: fix first chunk size calculation for populated bitmap 2020-09-23 12:40:45 +02:00
pgtable-generic.c mm/thp: fix __split_huge_pmd_locked() on shmem migration entry 2021-06-30 08:47:52 -04:00
process_vm_access.c
readahead.c
rmap.c mm/rmap: Fix anon_vma->degree ambiguity leading to double-reuse 2022-09-05 10:27:46 +02:00
rodata_test.c
shmem.c shmem: fix a race between shmem_unused_huge_shrink and shmem_evict_inode 2022-01-27 09:19:29 +01:00
shuffle.c mm/shuffle: don't move pages between zones and don't read garbage memmaps 2020-09-03 11:26:51 +02:00
shuffle.h
slab.c mm, debug_pagealloc: don't rely on static keys too early 2020-01-23 08:22:40 +01:00
slab.h mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag 2021-11-26 10:47:21 +01:00
slab_common.c mm: slab: fix kmem_cache_create failed when sysfs node not destroyed 2021-07-25 14:35:14 +02:00
slob.c mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two) 2019-10-07 15:47:20 -07:00
slub.c mm/slub: fix to return errno if kmalloc() fails 2022-09-28 11:04:04 +02:00
sparse-vmemmap.c mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap() 2019-07-18 17:08:07 -07:00
sparse.c mm/sparse: add the missing sparse_buffer_fini() in error branch 2021-05-14 09:44:32 +02:00
swap.c mm: introduce MADV_COLD 2019-09-25 17:51:41 -07:00
swap_cgroup.c
swap_slots.c
swap_state.c mm/swap_state: fix a data race in swapin_nr_pages 2020-10-01 13:18:08 +02:00
swapfile.c mm/swap: fix swap_info_struct race between swapoff and get_swap_pages() 2023-04-20 12:07:35 +02:00
truncate.c mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page() 2021-06-30 08:47:53 -04:00
usercopy.c mm/usercopy: return 1 from hardened_usercopy __setup() handler 2022-04-15 14:18:30 +02:00
userfaultfd.c mm: userfaultfd: fix missing cache flush in mcopy_atomic_pte() and __mcopy_atomic() 2022-05-15 19:54:47 +02:00
util.c random: move randomize_page() into mm where it belongs 2022-06-22 14:11:17 +02:00
vmacache.c
vmalloc.c mm/vunmap: add cond_resched() in vunmap_pmd_range 2020-09-03 11:26:52 +02:00
vmpressure.c mm/vmpressure.c: fix a signedness bug in vmpressure_register_event() 2019-10-07 15:47:19 -07:00
vmscan.c mm,vmscan: fix divide by zero in get_scan_count 2021-09-22 12:26:37 +02:00
vmstat.c arm: remove CONFIG_ARCH_HAS_HOLES_MEMORYMODEL 2022-05-15 19:54:46 +02:00
workingset.c mm: workingset: fix vmstat counters for shadow nodes 2019-08-13 16:06:52 -07:00
z3fold.c mm/z3fold: fix potential memory leak in z3fold_destroy_pool() 2021-07-14 16:53:47 +02:00
zbud.c
zpool.c zpool: add malloc_support_movable to zpool_driver 2019-09-24 15:54:12 -07:00
zsmalloc.c zsmalloc: fix races between asynchronous zspage free and page migration 2022-06-06 08:33:50 +02:00
zswap.c zswap: do not map same object twice 2019-09-24 15:54:12 -07:00