linux-stable/mm
Joonsoo Kim 0378730142 slab: fix oops when reading /proc/slab_allocators
Commit b1cb0982bd ("change the management method of free objects of
the slab") introduced a bug on slab leak detector
('/proc/slab_allocators').  This detector works like as following
decription.

 1. traverse all objects on all the slabs.
 2. determine whether it is active or not.
 3. if active, print who allocate this object.

but that commit changed the way how to manage free objects, so the logic
determining whether it is active or not is also changed.  In before, we
regard object in cpu caches as inactive one, but, with this commit, we
mistakenly regard object in cpu caches as active one.

This intoduces kernel oops if DEBUG_PAGEALLOC is enabled.  If
DEBUG_PAGEALLOC is enabled, kernel_map_pages() is used to detect who
corrupt free memory in the slab.  It unmaps page table mapping if object
is free and map it if object is active.  When slab leak detector check
object in cpu caches, it mistakenly think this object active so try to
access object memory to retrieve caller of allocation.  At this point,
page table mapping to this object doesn't exist, so oops occurs.

Following is oops message reported from Dave.

It blew up when something tried to read /proc/slab_allocators
(Just cat it, and you should see the oops below)

  Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  Modules linked in:
  [snip...]
  CPU: 1 PID: 9386 Comm: trinity-c33 Not tainted 3.14.0-rc5+ #131
  task: ffff8801aa46e890 ti: ffff880076924000 task.ti: ffff880076924000
  RIP: 0010:[<ffffffffaa1a8f4a>]  [<ffffffffaa1a8f4a>] handle_slab+0x8a/0x180
  RSP: 0018:ffff880076925de0  EFLAGS: 00010002
  RAX: 0000000000001000 RBX: 0000000000000000 RCX: 000000005ce85ce7
  RDX: ffffea00079be100 RSI: 0000000000001000 RDI: ffff880107458000
  RBP: ffff880076925e18 R08: 0000000000000001 R09: 0000000000000000
  R10: 0000000000000000 R11: 000000000000000f R12: ffff8801e6f84000
  R13: ffffea00079be100 R14: ffff880107458000 R15: ffff88022bb8d2c0
  FS:  00007fb769e45740(0000) GS:ffff88024d040000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffff8801e6f84ff8 CR3: 00000000a22db000 CR4: 00000000001407e0
  DR0: 0000000002695000 DR1: 0000000002695000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000070602
  Call Trace:
    leaks_show+0xce/0x240
    seq_read+0x28e/0x490
    proc_reg_read+0x3d/0x80
    vfs_read+0x9b/0x160
    SyS_read+0x58/0xb0
    tracesys+0xd4/0xd9
  Code: f5 00 00 00 0f 1f 44 00 00 48 63 c8 44 3b 0c 8a 0f 84 e3 00 00 00 83 c0 01 44 39 c0 72 eb 41 f6 47 1a 01 0f 84 e9 00 00 00 89 f0 <4d> 8b 4c 04 f8 4d 85 c9 0f 84 88 00 00 00 49 8b 7e 08 4d 8d 46
  RIP   handle_slab+0x8a/0x180

To fix the problem, I introduce an object status buffer on each slab.
With this, we can track object status precisely, so slab leak detector
would not access active object and no kernel oops would occur.  Memory
overhead caused by this fix is only imposed to CONFIG_DEBUG_SLAB_LEAK
which is mainly used for debugging, so memory overhead isn't big
problem.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-23 16:47:44 -07:00
..
backing-dev.c arch: Mass conversion of smp_mb__*() 2014-04-18 14:20:48 +02:00
balloon_compaction.c mm: print more details for bad_page() 2014-01-23 16:36:50 -08:00
bootmem.c mm/bootmem.c: remove unused local `map' 2013-11-13 12:09:09 +09:00
cleancache.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
compaction.c mm, compaction: properly signal and act upon lock and need_sched() contention 2014-06-04 16:54:11 -07:00
debug-pagealloc.c
dmapool.c mm/dmapool.c: reuse devres_release() to free resources 2014-06-04 16:54:08 -07:00
early_ioremap.c mm: create generic early_ioremap() support 2014-04-07 16:36:15 -07:00
fadvise.c
failslab.c
filemap.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-06-12 10:30:18 -07:00
filemap_xip.c seqcount: Add lockdep functionality to seqcount/seqlock structures 2013-11-06 12:40:26 +01:00
fremap.c mm: mark remap_file_pages() syscall as deprecated 2014-06-06 16:08:17 -07:00
frontswap.c swap: change swap_list_head to plist, add swap_avail_head 2014-06-04 16:54:07 -07:00
gup.c mm: cleanup __get_user_pages() 2014-06-04 16:54:05 -07:00
highmem.c
huge_memory.c mm: let mm_find_pmd fix buggy race with THP fault 2014-06-23 16:47:44 -07:00
hugetlb.c hugetlb: fix copy_hugetlb_page_range() to handle migration/hwpoisoned entry 2014-06-23 16:47:43 -07:00
hugetlb_cgroup.c cgroup: remove css_parent() 2014-05-16 13:22:48 -04:00
hwpoison-inject.c mm/hwpoison: add '#' to hwpoison_inject 2014-01-21 16:19:48 -08:00
init-mm.c
internal.h mm, compaction: properly signal and act upon lock and need_sched() contention 2014-06-04 16:54:11 -07:00
interval_tree.c
iov_iter.c bio_vec-backed iov_iter 2014-05-06 17:39:45 -04:00
Kconfig mm/zsmalloc: make zsmalloc module-buildable 2014-06-04 16:54:14 -07:00
Kconfig.debug
kmemcheck.c
kmemleak-test.c mm/kmemleak-test.c: use pr_fmt for logging 2014-06-06 16:08:18 -07:00
kmemleak.c mm: introduce kmemleak_update_trace() 2014-06-06 16:08:17 -07:00
ksm.c mm: let mm_find_pmd fix buggy race with THP fault 2014-06-23 16:47:44 -07:00
list_lru.c mm: keep page cache radix tree nodes in check 2014-04-03 16:21:01 -07:00
maccess.c
madvise.c mm: madvise: fix MADV_WILLNEED on shmem swapouts 2014-05-23 09:37:29 -07:00
Makefile mm: move get_user_pages()-related code to separate file 2014-06-04 16:54:04 -07:00
memblock.c mm/memblock.c: call kmemleak directly from memblock_(alloc|free) 2014-06-06 16:08:17 -07:00
memcontrol.c Merge branch 'for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2014-06-09 15:03:33 -07:00
memory-failure.c mm/memory-failure.c: support use of a dedicated thread to handle SIGBUS(BUS_MCEERR_AO) 2014-06-04 16:54:13 -07:00
memory.c mm: document do_fault_around() feature 2014-06-04 16:54:12 -07:00
memory_hotplug.c mm, migration: add destination page freeing callback 2014-06-04 16:54:06 -07:00
mempolicy.c Merge branch 'next' (accumulated 3.16 merge window patches) into master 2014-06-08 11:31:16 -07:00
mempool.c mm/mempool.c: update the kmemleak stack trace for mempool allocations 2014-06-06 16:08:17 -07:00
migrate.c mm: let mm_find_pmd fix buggy race with THP fault 2014-06-23 16:47:44 -07:00
mincore.c mm + fs: prepare for non-page entries in page cache radix trees 2014-04-03 16:21:00 -07:00
mlock.c mm: try_to_unmap_cluster() should lock_page() before mlocking 2014-04-07 16:35:57 -07:00
mm_init.c mm: bring back /sys/kernel/mm 2014-01-27 21:02:39 -08:00
mmap.c mm: convert some level-less printks to pr_* 2014-06-06 16:08:18 -07:00
mmu_context.c sched/mm: call finish_arch_post_lock_switch in idle_task_exit and use_mm 2014-02-21 08:50:17 +01:00
mmu_notifier.c mm: audit/fix non-modular users of module_init in core code 2014-01-23 16:36:52 -08:00
mmzone.c mm: numa: Change page last {nid,pid} into {cpu,pid} 2013-10-09 14:47:45 +02:00
mprotect.c mm: move mmu notifier call from change_protection to change_pmd_range 2014-04-07 16:35:50 -07:00
mremap.c mm, thp: close race between mremap() and split_huge_page() 2014-05-11 17:55:48 +09:00
msync.c mm/msync.c: sync only the requested range in msync() 2014-06-04 16:54:11 -07:00
nobootmem.c mm/memblock.c: call kmemleak directly from memblock_(alloc|free) 2014-06-06 16:08:17 -07:00
nommu.c mm: nommu: per-thread vma cache fix 2014-06-23 16:47:43 -07:00
oom_kill.c mm, oom: base root bonus on current usage 2014-01-30 16:56:56 -08:00
page-writeback.c Clean ups and miscellaneous bug fixes, in particular for the new 2014-06-08 13:03:35 -07:00
page_alloc.c mm, pcp: allow restoring percpu_pagelist_fraction default 2014-06-23 16:47:43 -07:00
page_cgroup.c mm/page_cgroup.c: mark functions as static 2014-04-03 16:21:02 -07:00
page_io.c fix __swap_writepage() compile failure on old gcc versions 2014-06-14 19:30:48 -05:00
page_isolation.c mm: memory-hotplug: enable memory hotplug to handle hugepage 2013-09-11 15:57:48 -07:00
pagewalk.c mm/pagewalk.c: fix walk_page_range() access of wrong PTEs 2013-10-30 14:27:03 -07:00
percpu-km.c
percpu-vm.c
percpu.c percpu: make pcpu_alloc_chunk() use pcpu_mem_free() instead of kfree() 2014-04-14 16:18:06 -04:00
pgtable-generic.c mm: fix TLB flush race between migration, and change_protection_range 2013-12-18 19:04:51 -08:00
process_vm_access.c start adding the tag to iov_iter 2014-05-06 17:32:49 -04:00
quicklist.c
readahead.c mm/readahead.c: inline ra_submit 2014-04-07 16:35:58 -07:00
rmap.c mm: let mm_find_pmd fix buggy race with THP fault 2014-06-23 16:47:44 -07:00
shmem.c shmem: fix faulting into a hole while it's punched 2014-06-23 16:47:44 -07:00
slab.c slab: fix oops when reading /proc/slab_allocators 2014-06-23 16:47:44 -07:00
slab.h memcg, slab: merge memcg_{bind,release}_pages to memcg_{un}charge_slab 2014-06-04 16:54:01 -07:00
slab_common.c slab: delete cache from list after __kmem_cache_shutdown succeeds 2014-06-04 16:54:08 -07:00
slob.c slab: get_online_mems for kmem_cache_{create,destroy,shrink} 2014-06-04 16:53:59 -07:00
slub.c slub: search partial list on numa_mem_id(), instead of numa_node_id() 2014-06-06 16:08:06 -07:00
sparse-vmemmap.c mm/sparse: use memblock apis for early memory allocations 2014-01-21 16:19:47 -08:00
sparse.c mm: use macros from compiler.h instead of __attribute__((...)) 2014-04-07 16:35:54 -07:00
swap.c mm: non-atomically mark page accessed during page cache allocation where possible 2014-06-04 16:54:10 -07:00
swap_state.c mm: page_alloc: convert hot/cold parameter and immediate callers to bool 2014-06-04 16:54:09 -07:00
swapfile.c mm/swapfile.c: delete the "last_in_cluster < scan_base" loop in the body of scan_swap_map() 2014-06-04 16:54:12 -07:00
truncate.c mm: filemap: update find_get_pages_tag() to deal with shadow entries 2014-05-06 13:04:59 -07:00
util.c nick kvfree() from apparmor 2014-05-06 14:02:53 -04:00
vmacache.c mm,vmacache: optimize overflow system-wide flushing 2014-06-04 16:53:57 -07:00
vmalloc.c mm/vmalloc.c: export unmap_kernel_range() 2014-06-04 16:54:14 -07:00
vmpressure.c arm, pm, vmpressure: add missing slab.h includes 2014-02-03 13:24:01 -05:00
vmscan.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-06-12 10:30:18 -07:00
vmstat.c mm: use the light version __mod_zone_page_state in mlocked_vma_newpage() 2014-06-04 16:54:07 -07:00
workingset.c mm: keep page cache radix tree nodes in check 2014-04-03 16:21:01 -07:00
zbud.c mm/zbud.c: make size unsigned like unique callsite 2014-06-04 16:54:13 -07:00
zsmalloc.c zsmalloc: fixup trivial zs size classes value in comments 2014-06-04 16:54:13 -07:00
zswap.c mm/zswap: NUMA aware allocation for zswap_dstmem 2014-06-04 16:54:14 -07:00