linux-stable/include/linux/slab_def.h
Thomas Garnier 7c00fce98c mm: reorganize SLAB freelist randomization
The kernel heap allocators are using a sequential freelist making their
allocation predictable.  This predictability makes kernel heap overflow
easier to exploit.  An attacker can careful prepare the kernel heap to
control the following chunk overflowed.

For example these attacks exploit the predictability of the heap:
 - Linux Kernel CAN SLUB overflow (https://goo.gl/oMNWkU)
 - Exploiting Linux Kernel Heap corruptions (http://goo.gl/EXLn95)

***Problems that needed solving:
 - Randomize the Freelist (singled linked) used in the SLUB allocator.
 - Ensure good performance to encourage usage.
 - Get best entropy in early boot stage.

***Parts:
 - 01/02 Reorganize the SLAB Freelist randomization to share elements
   with the SLUB implementation.
 - 02/02 The SLUB Freelist randomization implementation. Similar approach
   than the SLAB but tailored to the singled freelist used in SLUB.

***Performance data:

slab_test impact is between 3% to 4% on average for 100000 attempts
without smp.  It is a very focused testing, kernbench show the overall
impact on the system is way lower.

Before:

  Single thread testing
  =====================
  1. Kmalloc: Repeatedly allocate then free test
  100000 times kmalloc(8) -> 49 cycles kfree -> 77 cycles
  100000 times kmalloc(16) -> 51 cycles kfree -> 79 cycles
  100000 times kmalloc(32) -> 53 cycles kfree -> 83 cycles
  100000 times kmalloc(64) -> 62 cycles kfree -> 90 cycles
  100000 times kmalloc(128) -> 81 cycles kfree -> 97 cycles
  100000 times kmalloc(256) -> 98 cycles kfree -> 121 cycles
  100000 times kmalloc(512) -> 95 cycles kfree -> 122 cycles
  100000 times kmalloc(1024) -> 96 cycles kfree -> 126 cycles
  100000 times kmalloc(2048) -> 115 cycles kfree -> 140 cycles
  100000 times kmalloc(4096) -> 149 cycles kfree -> 171 cycles
  2. Kmalloc: alloc/free test
  100000 times kmalloc(8)/kfree -> 70 cycles
  100000 times kmalloc(16)/kfree -> 70 cycles
  100000 times kmalloc(32)/kfree -> 70 cycles
  100000 times kmalloc(64)/kfree -> 70 cycles
  100000 times kmalloc(128)/kfree -> 70 cycles
  100000 times kmalloc(256)/kfree -> 69 cycles
  100000 times kmalloc(512)/kfree -> 70 cycles
  100000 times kmalloc(1024)/kfree -> 73 cycles
  100000 times kmalloc(2048)/kfree -> 72 cycles
  100000 times kmalloc(4096)/kfree -> 71 cycles

After:

  Single thread testing
  =====================
  1. Kmalloc: Repeatedly allocate then free test
  100000 times kmalloc(8) -> 57 cycles kfree -> 78 cycles
  100000 times kmalloc(16) -> 61 cycles kfree -> 81 cycles
  100000 times kmalloc(32) -> 76 cycles kfree -> 93 cycles
  100000 times kmalloc(64) -> 83 cycles kfree -> 94 cycles
  100000 times kmalloc(128) -> 106 cycles kfree -> 107 cycles
  100000 times kmalloc(256) -> 118 cycles kfree -> 117 cycles
  100000 times kmalloc(512) -> 114 cycles kfree -> 116 cycles
  100000 times kmalloc(1024) -> 115 cycles kfree -> 118 cycles
  100000 times kmalloc(2048) -> 147 cycles kfree -> 131 cycles
  100000 times kmalloc(4096) -> 214 cycles kfree -> 161 cycles
  2. Kmalloc: alloc/free test
  100000 times kmalloc(8)/kfree -> 66 cycles
  100000 times kmalloc(16)/kfree -> 66 cycles
  100000 times kmalloc(32)/kfree -> 66 cycles
  100000 times kmalloc(64)/kfree -> 66 cycles
  100000 times kmalloc(128)/kfree -> 65 cycles
  100000 times kmalloc(256)/kfree -> 67 cycles
  100000 times kmalloc(512)/kfree -> 67 cycles
  100000 times kmalloc(1024)/kfree -> 64 cycles
  100000 times kmalloc(2048)/kfree -> 67 cycles
  100000 times kmalloc(4096)/kfree -> 67 cycles

Kernbench, before:

  Average Optimal load -j 12 Run (std deviation):
  Elapsed Time 101.873 (1.16069)
  User Time 1045.22 (1.60447)
  System Time 88.969 (0.559195)
  Percent CPU 1112.9 (13.8279)
  Context Switches 189140 (2282.15)
  Sleeps 99008.6 (768.091)

After:

  Average Optimal load -j 12 Run (std deviation):
  Elapsed Time 102.47 (0.562732)
  User Time 1045.3 (1.34263)
  System Time 88.311 (0.342554)
  Percent CPU 1105.8 (6.49444)
  Context Switches 189081 (2355.78)
  Sleeps 99231.5 (800.358)

This patch (of 2):

This commit reorganizes the previous SLAB freelist randomization to
prepare for the SLUB implementation.  It moves functions that will be
shared to slab_common.

The entropy functions are changed to align with the SLUB implementation,
now using get_random_(int|long) functions.  These functions were chosen
because they provide a bit more entropy early on boot and better
performance when specific arch instructions are not available.

[akpm@linux-foundation.org: fix build]
Link: http://lkml.kernel.org/r/1464295031-26375-2-git-send-email-thgarnie@google.com
Signed-off-by: Thomas Garnier <thgarnie@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00

101 lines
2.3 KiB
C

#ifndef _LINUX_SLAB_DEF_H
#define _LINUX_SLAB_DEF_H
#include <linux/reciprocal_div.h>
/*
* Definitions unique to the original Linux SLAB allocator.
*/
struct kmem_cache {
struct array_cache __percpu *cpu_cache;
/* 1) Cache tunables. Protected by slab_mutex */
unsigned int batchcount;
unsigned int limit;
unsigned int shared;
unsigned int size;
struct reciprocal_value reciprocal_buffer_size;
/* 2) touched by every alloc & free from the backend */
unsigned int flags; /* constant flags */
unsigned int num; /* # of objs per slab */
/* 3) cache_grow/shrink */
/* order of pgs per slab (2^n) */
unsigned int gfporder;
/* force GFP flags, e.g. GFP_DMA */
gfp_t allocflags;
size_t colour; /* cache colouring range */
unsigned int colour_off; /* colour offset */
struct kmem_cache *freelist_cache;
unsigned int freelist_size;
/* constructor func */
void (*ctor)(void *obj);
/* 4) cache creation/removal */
const char *name;
struct list_head list;
int refcount;
int object_size;
int align;
/* 5) statistics */
#ifdef CONFIG_DEBUG_SLAB
unsigned long num_active;
unsigned long num_allocations;
unsigned long high_mark;
unsigned long grown;
unsigned long reaped;
unsigned long errors;
unsigned long max_freeable;
unsigned long node_allocs;
unsigned long node_frees;
unsigned long node_overflow;
atomic_t allochit;
atomic_t allocmiss;
atomic_t freehit;
atomic_t freemiss;
#ifdef CONFIG_DEBUG_SLAB_LEAK
atomic_t store_user_clean;
#endif
/*
* If debugging is enabled, then the allocator can add additional
* fields and/or padding to every object. size contains the total
* object size including these internal fields, the following two
* variables contain the offset to the user object and its size.
*/
int obj_offset;
#endif /* CONFIG_DEBUG_SLAB */
#ifdef CONFIG_MEMCG
struct memcg_cache_params memcg_params;
#endif
#ifdef CONFIG_KASAN
struct kasan_cache kasan_info;
#endif
#ifdef CONFIG_SLAB_FREELIST_RANDOM
unsigned int *random_seq;
#endif
struct kmem_cache_node *node[MAX_NUMNODES];
};
static inline void *nearest_obj(struct kmem_cache *cache, struct page *page,
void *x) {
void *object = x - (x - page->s_mem) % cache->size;
void *last_object = page->s_mem + (cache->num - 1) * cache->size;
if (unlikely(object > last_object))
return last_object;
else
return object;
}
#endif /* _LINUX_SLAB_DEF_H */