From: Jann Horn <jannh@google.com>

mm/slub: add missing TID updates on slab deactivation
commit eeaa345e12 upstream.

The fastpath in slab_alloc_node() assumes that c->slab is stable as long as
the TID stays the same. However, two places in __slab_alloc() currently
don't update the TID when deactivating the CPU slab.
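
For context, the fastpath pairs a TID snapshot with a cmpxchg-double over
the (freelist, tid) pair. A condensed sketch of that pattern (simplified
from mm/slub.c; exact field and helper names vary between kernel versions,
so treat this as illustrative rather than verbatim):

	/* slab_alloc_node() fastpath, condensed: */
	redo:
		tid = this_cpu_read(s->cpu_slab->tid);	/* TID snapshot */
		c = raw_cpu_ptr(s->cpu_slab);

		object = c->freelist;
		slab = c->slab;
		if (unlikely(!object || !slab || !node_match(slab, node))) {
			object = __slab_alloc(s, gfpflags, node, addr, c);
		} else if (unlikely(!this_cpu_cmpxchg_double(
				s->cpu_slab->freelist, s->cpu_slab->tid,
				object, tid,
				get_freepointer(s, object), next_tid(tid)))) {
			/*
			 * Popping (freelist, tid) atomically only succeeds
			 * if nobody bumped the TID in between -- this is
			 * what lets the fastpath treat c->slab as stable.
			 */
			goto redo;
		}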

If multiple operations race the right way, this could lead to an object
getting lost; or, in an even more unlikely situation, it could even lead to
an object being freed onto the wrong slab's freelist, messing up the
`inuse` counter and eventually causing a page to be freed to the page
allocator while it still contains slab objects.

(I haven't actually tested these cases, though; this is just based on
looking at the code. Writing testcases for this stuff seems like it'd be
a pain...)

The race leading to state inconsistency is (all operations on the same CPU
and kmem_cache):

 - task A: begin do_slab_free():
    - read TID
    - read pcpu freelist (==NULL)
    - check `slab == c->slab` (true)
 - [PREEMPT A->B]
 - task B: begin slab_alloc_node():
    - fastpath fails (`c->freelist` is NULL)
    - enter __slab_alloc()
    - slub_get_cpu_ptr() (disables preemption)
    - enter ___slab_alloc()
    - take local_lock_irqsave()
    - read c->freelist as NULL
    - get_freelist() returns NULL
    - write `c->slab = NULL`
    - drop local_unlock_irqrestore()
    - goto new_slab
    - slub_percpu_partial() is NULL
    - get_partial() returns NULL
    - slub_put_cpu_ptr() (enables preemption)
 - [PREEMPT B->A]
 - task A: finish do_slab_free():
    - this_cpu_cmpxchg_double() succeeds
    - [CORRUPT STATE: c->slab==NULL, c->freelist!=NULL]
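
Task A's cmpxchg can succeed because the (freelist, tid) pair it compares
against never changed: deactivation rewrote c->slab but left c->tid alone.
A sketch of the free-side fastpath (again condensed and illustrative, not
verbatim kernel code):

	/* do_slab_free(), condensed: */
		tid = this_cpu_read(s->cpu_slab->tid);	/* read before preempt */
		c = raw_cpu_ptr(s->cpu_slab);

		if (likely(slab == c->slab)) {	/* was true before B ran */
			freelist = READ_ONCE(c->freelist);
			set_freepointer(s, tail, freelist);
			/*
			 * Succeeds even though ___slab_alloc() has meanwhile
			 * set c->slab = NULL, because that path did not do
			 * c->tid = next_tid(c->tid): the stale TID still
			 * matches, and the freed object lands on c->freelist
			 * while c->slab is NULL.
			 */
			if (!this_cpu_cmpxchg_double(
					s->cpu_slab->freelist, s->cpu_slab->tid,
					freelist, tid,
					head, next_tid(tid)))
				goto redo;
		}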

From there, the object on c->freelist will be lost if task B is allowed to
continue: it will proceed to the retry_load_slab label, set c->slab, then
jump to load_freelist, which clobbers c->freelist.

But if we instead continue as follows, we get worse corruption:

 - task A: run __slab_free() on object from other struct slab:
    - CPU_PARTIAL_FREE case (slab was on no list, is now on pcpu partial)
 - task A: run slab_alloc_node() with NUMA node constraint:
    - fastpath fails (c->slab is NULL)
    - call __slab_alloc()
    - slub_get_cpu_ptr() (disables preemption)
    - enter ___slab_alloc()
    - c->slab is NULL: goto new_slab
    - slub_percpu_partial() is non-NULL
    - set c->slab to slub_percpu_partial(c)
    - [CORRUPT STATE: c->slab points to slab-1, c->freelist has objects
      from slab-2]
    - goto redo
    - node_match() fails
    - goto deactivate_slab
    - existing c->freelist is passed into deactivate_slab()
    - inuse count of slab-1 is decremented to account for object from
      slab-2
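
deactivate_slab() trusts that the freelist it is handed belongs to the
slab being deactivated: roughly, it walks that freelist, counts the free
objects, and subtracts the count from the slab's inuse field (a minimal
sketch, assuming the structure of recent mm/slub.c):

	/* deactivate_slab(s, slab, freelist), heavily condensed: */
	void *p;
	int free_delta = 0;

	for (p = freelist; p; p = get_freepointer(s, p))
		free_delta++;	/* assumed to be objects of `slab` */

	new.inuse -= free_delta;	/* wrong if freelist held slab-2's object */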

At this point, the inuse count of slab-1 is 1 lower than it should be.
This means that if we free all allocated objects in slab-1 except for one,
SLUB will think that slab-1 is completely unused, and may free its page,
leading to use-after-free.
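
The fix is to bump the TID at both deactivation sites in ___slab_alloc(),
so that any fastpath which snapshotted the old TID fails its cmpxchg and
retries. Schematically (matching the upstream change in spirit; exact
placement differs across stable branches):

	/* ___slab_alloc(), deactivate_slab path: */
	freelist = c->freelist;
	c->slab = NULL;
	c->freelist = NULL;
	c->tid = next_tid(c->tid);	/* previously missing */
	local_unlock_irqrestore(&s->cpu_slab->lock, flags);
	deactivate_slab(s, slab, freelist);

	/* ___slab_alloc(), DEACTIVATE_BYPASS path: */
	if (!freelist) {
		c->slab = NULL;
		c->tid = next_tid(c->tid);	/* previously missing */
		local_unlock_irqrestore(&s->cpu_slab->lock, flags);
		stat(s, DEACTIVATE_BYPASS);
		goto new_slab;
	}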

Fixes: c17dda40a6 ("slub: Separate out kmem_cache_cpu processing from deactivate_slab")
Fixes: 03e404af26 ("slub: fast release on full slab")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/r/20220608182205.2945720-1-jannh@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>