linux-stable/scripts/kallsyms.c


/* Generate assembler source containing symbol information
*
* Copyright 2002 by Kai Germaschewski
*
* This software may be used and distributed according to the terms
* of the GNU General Public License, incorporated herein by reference.
*
* Usage: kallsyms [--all-symbols] [--absolute-percpu]
* [--base-relative] [--lto-clang] in.map > out.S
*
* Table compression uses all the unused char codes on the symbols and
* maps these to the most used substrings (tokens). For instance, it might
* map char code 0xF7 to represent "write_" and then in every symbol where
* "write_" appears it can be replaced by 0xF7, saving 5 bytes.
* The used codes themselves are also placed in the table so that the
* decompression can work without "special cases".
* Applied to kernel symbols, this usually produces a compression ratio
* of about 50%.
*
*/
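/*
 * A small worked example of the scheme above (the token value 0xF7 and the
 * symbol names are illustrative, not what a real build would pick):
 *
 *   before:  "write_lock"  "write_unlock"           (10 + 12 chars)
 *   token:   0xF7 -> "write_"                       (stored once, in the token table)
 *   after:   { 0xF7, 'l','o','c','k' }              (5 bytes)
 *            { 0xF7, 'u','n','l','o','c','k' }      (7 bytes)
 *
 * Every plain character also maps to itself in the token table, which is
 * why decompression needs no special cases.
 */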
#include <errno.h>
#include <getopt.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <limits.h>
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof(arr[0]))
#define KSYM_NAME_LEN 512
struct sym_entry {
unsigned long long addr;
unsigned int len;
unsigned int seq;
unsigned int start_pos;
unsigned int percpu_absolute;
unsigned char sym[];
};
struct addr_range {
const char *start_sym, *end_sym;
unsigned long long start, end;
};
static unsigned long long _text;
static unsigned long long relative_base;
static struct addr_range text_ranges[] = {
{ "_stext", "_etext" },
{ "_sinittext", "_einittext" },
};
#define text_range_text (&text_ranges[0])
#define text_range_inittext (&text_ranges[1])
static struct addr_range percpu_range = {
"__per_cpu_start", "__per_cpu_end", -1ULL, 0
};
static struct sym_entry **table;
static unsigned int table_size, table_cnt;
static int all_symbols;
static int absolute_percpu;
static int base_relative;
static int lto_clang;
static int token_profit[0x10000];
/* the table that holds the result of the compression */
static unsigned char best_table[256][2];
static unsigned char best_table_len[256];
static void usage(void)
{
fprintf(stderr, "Usage: kallsyms [--all-symbols] [--absolute-percpu] "
"[--base-relative] [--lto-clang] in.map > out.S\n");
exit(1);
}
static char *sym_name(const struct sym_entry *s)
{
return (char *)s->sym + 1;
}
static bool is_ignored_symbol(const char *name, char type)
{
if (type == 'u' || type == 'n')
return true;
if (toupper(type) == 'A') {
/* Keep these useful absolute symbols */
if (strcmp(name, "__kernel_syscall_via_break") &&
strcmp(name, "__kernel_syscall_via_epc") &&
strcmp(name, "__kernel_sigtramp") &&
strcmp(name, "__gp"))
return true;
}
return false;
}
static void check_symbol_range(const char *sym, unsigned long long addr,
struct addr_range *ranges, int entries)
{
size_t i;
struct addr_range *ar;
for (i = 0; i < entries; ++i) {
ar = &ranges[i];
if (strcmp(sym, ar->start_sym) == 0) {
ar->start = addr;
return;
} else if (strcmp(sym, ar->end_sym) == 0) {
ar->end = addr;
return;
}
}
}
static struct sym_entry *read_symbol(FILE *in, char **buf, size_t *buf_len)
{
char *name, type, *p;
unsigned long long addr;
size_t len;
ssize_t readlen;
struct sym_entry *sym;
errno = 0;
readlen = getline(buf, buf_len, in);
if (readlen < 0) {
if (errno) {
perror("read_symbol");
exit(EXIT_FAILURE);
}
return NULL;
}
if ((*buf)[readlen - 1] == '\n')
(*buf)[readlen - 1] = 0;
addr = strtoull(*buf, &p, 16);
if (*buf == p || *p++ != ' ' || !isascii((type = *p++)) || *p++ != ' ') {
fprintf(stderr, "line format error\n");
exit(EXIT_FAILURE);
}
name = p;
len = strlen(name);
if (len >= KSYM_NAME_LEN) {
fprintf(stderr, "Symbol %s too long for kallsyms (%zu >= %d).\n"
"Please increase KSYM_NAME_LEN both in kernel and kallsyms.c\n",
name, len, KSYM_NAME_LEN);
return NULL;
}
if (strcmp(name, "_text") == 0)
_text = addr;
/* Ignore most absolute/undefined (?) symbols. */
if (is_ignored_symbol(name, type))
return NULL;
check_symbol_range(name, addr, text_ranges, ARRAY_SIZE(text_ranges));
check_symbol_range(name, addr, &percpu_range, 1);
/* include the type field in the symbol name, so that it gets
* compressed together */
len++;
sym = malloc(sizeof(*sym) + len + 1);
if (!sym) {
fprintf(stderr, "kallsyms failure: "
"unable to allocate required amount of memory\n");
exit(EXIT_FAILURE);
}
sym->addr = addr;
sym->len = len;
sym->sym[0] = type;
strcpy(sym_name(sym), name);
sym->percpu_absolute = 0;
return sym;
}
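/*
 * Input format sketch: read_symbol() expects nm(1)-style map lines of the
 * form "<hex address> <one-char type> <name>", for example (addresses
 * hypothetical):
 *
 *   ffffffff81000000 T _text
 *   ffffffff810001c8 T _stext
 *
 * Anything else takes the "line format error" path above.
 */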
static int symbol_in_range(const struct sym_entry *s,
const struct addr_range *ranges, int entries)
{
size_t i;
const struct addr_range *ar;
for (i = 0; i < entries; ++i) {
ar = &ranges[i];
if (s->addr >= ar->start && s->addr <= ar->end)
return 1;
}
return 0;
}
static int symbol_valid(const struct sym_entry *s)
{
const char *name = sym_name(s);
/* if --all-symbols is not specified, then symbols outside the text
* and inittext sections are discarded */
if (!all_symbols) {
if (symbol_in_range(s, text_ranges,
ARRAY_SIZE(text_ranges)) == 0)
return 0;
/* Corner case. Discard any symbols with the same value as
* _etext or _einittext; they can move between pass 1 and 2 when
* the kallsyms data are added. If these symbols move then
* they may get dropped in pass 2, which breaks the kallsyms
* rules.
*/
if ((s->addr == text_range_text->end &&
strcmp(name, text_range_text->end_sym)) ||
(s->addr == text_range_inittext->end &&
strcmp(name, text_range_inittext->end_sym)))
return 0;
}
return 1;
}
/* remove all the invalid symbols from the table */
static void shrink_table(void)
{
unsigned int i, pos;
pos = 0;
for (i = 0; i < table_cnt; i++) {
if (symbol_valid(table[i])) {
if (pos != i)
table[pos] = table[i];
pos++;
} else {
free(table[i]);
}
}
table_cnt = pos;
/* If no valid symbol remains, exit with an error. */
if (!table_cnt) {
fprintf(stderr, "No valid symbol.\n");
exit(1);
}
}
static void read_map(const char *in)
{
FILE *fp;
struct sym_entry *sym;
char *buf = NULL;
size_t buflen = 0;
fp = fopen(in, "r");
if (!fp) {
perror(in);
exit(1);
}
while (!feof(fp)) {
sym = read_symbol(fp, &buf, &buflen);
if (!sym)
continue;
sym->start_pos = table_cnt;
if (table_cnt >= table_size) {
table_size += 10000;
table = realloc(table, sizeof(*table) * table_size);
if (!table) {
fprintf(stderr, "out of memory\n");
fclose(fp);
exit (1);
}
}
table[table_cnt++] = sym;
}
free(buf);
fclose(fp);
}
static void output_label(const char *label)
{
printf(".globl %s\n", label);
printf("\tALGN\n");
printf("%s:\n", label);
}
/* Keep symbols relocatable by emitting their addresses relative to '_text'. */
static void output_address(unsigned long long addr)
{
if (_text <= addr)
printf("\tPTR\t_text + %#llx\n", addr - _text);
else
printf("\tPTR\t_text - %#llx\n", _text - addr);
}
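/*
 * For example, with the hypothetical values _text = 0xffffffff81000000 and
 * addr = 0xffffffff810001c8, this emits
 *
 *   PTR _text + 0x1c8
 *
 * so the final value is resolved by the linker and remains correct if the
 * image is relocated.
 */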
/* uncompress a compressed symbol. When this function is called, the best table
* might still be compressed itself, so the function needs to be recursive */
static int expand_symbol(const unsigned char *data, int len, char *result)
{
int c, rlen, total=0;
while (len) {
c = *data;
/* if the table holds a single char that is the same as the one
* we are looking for, then end the search */
if (best_table[c][0]==c && best_table_len[c]==1) {
*result++ = c;
total++;
} else {
/* if not, recurse and expand */
rlen = expand_symbol(best_table[c], best_table_len[c], result);
total += rlen;
result += rlen;
}
data++;
len--;
}
*result=0;
return total;
}
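/*
 * Hypothetical illustration of the recursion: with best_table[0xF8] = "wr"
 * and best_table[0xF7] = { 0xF8, 'e' }, expanding the single byte 0xF7
 * recurses once and produces "wre".
 */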
static int symbol_absolute(const struct sym_entry *s)
{
return s->percpu_absolute;
}
static void cleanup_symbol_name(char *s)
{
char *p;
/*
* ASCII[.] = 2e
* ASCII[0-9] = 30,39
* ASCII[A-Z] = 41,5a
* ASCII[_] = 5f
* ASCII[a-z] = 61,7a
*
* As above, replacing the first '.' in ".llvm." with '\0' does not
* affect the main sorting, but it helps us with subsorting.
*/
p = strstr(s, ".llvm.");
if (p)
*p = '\0';
}
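/*
 * For example, a Clang-LTO-promoted static named "foo.llvm.123456789"
 * (name hypothetical) is trimmed to "foo"; other dot-suffixes are left
 * alone because only ".llvm." is matched.
 */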
static int compare_names(const void *a, const void *b)
{
int ret;
const struct sym_entry *sa = *(const struct sym_entry **)a;
const struct sym_entry *sb = *(const struct sym_entry **)b;
ret = strcmp(sym_name(sa), sym_name(sb));
if (!ret) {
if (sa->addr > sb->addr)
return 1;
else if (sa->addr < sb->addr)
return -1;
/* keep old order */
return (int)(sa->seq - sb->seq);
}
return ret;
}
static void sort_symbols_by_name(void)
{
qsort(table, table_cnt, sizeof(table[0]), compare_names);
}
static void write_src(void)
{
unsigned int i, k, off;
unsigned int best_idx[256];
unsigned int *markers;
char buf[KSYM_NAME_LEN];
printf("#include <asm/bitsperlong.h>\n");
printf("#if BITS_PER_LONG == 64\n");
printf("#define PTR .quad\n");
printf("#define ALGN .balign 8\n");
printf("#else\n");
printf("#define PTR .long\n");
printf("#define ALGN .balign 4\n");
printf("#endif\n");
printf("\t.section .rodata, \"a\"\n");
output_label("kallsyms_num_syms");
printf("\t.long\t%u\n", table_cnt);
printf("\n");
/* table of offset markers that give the offset in the compressed stream
* every 256 symbols */
markers = malloc(sizeof(unsigned int) * ((table_cnt + 255) / 256));
if (!markers) {
fprintf(stderr, "kallsyms failure: "
"unable to allocate required memory\n");
exit(EXIT_FAILURE);
}
output_label("kallsyms_names");
off = 0;
for (i = 0; i < table_cnt; i++) {
if ((i & 0xFF) == 0)
markers[i >> 8] = off;
table[i]->seq = i;
/* There cannot be any symbol of length zero. */
if (table[i]->len == 0) {
fprintf(stderr, "kallsyms failure: "
"unexpected zero symbol length\n");
exit(EXIT_FAILURE);
}
/* Only lengths that fit in up-to-two-byte ULEB128 are supported. */
if (table[i]->len > 0x3FFF) {
fprintf(stderr, "kallsyms failure: "
"unexpected huge symbol length\n");
exit(EXIT_FAILURE);
}
/* Encode length with ULEB128. */
if (table[i]->len <= 0x7F) {
/* Most symbols use a single byte for the length. */
printf("\t.byte 0x%02x", table[i]->len);
off += table[i]->len + 1;
} else {
/* "Big" symbols use two bytes. */
printf("\t.byte 0x%02x, 0x%02x",
(table[i]->len & 0x7F) | 0x80,
(table[i]->len >> 7) & 0x7F);
off += table[i]->len + 2;
}
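/* For instance, a (hypothetical) length of 300 would be emitted as
 * 0xac, 0x02, since (300 & 0x7F) | 0x80 == 0xac and 300 >> 7 == 2. */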
for (k = 0; k < table[i]->len; k++)
printf(", 0x%02x", table[i]->sym[k]);
printf("\n");
}
printf("\n");
/*
* Now that we wrote out the compressed symbol names, restore the
* original names, which are needed in some of the later steps.
*/
for (i = 0; i < table_cnt; i++) {
expand_symbol(table[i]->sym, table[i]->len, buf);
strcpy((char *)table[i]->sym, buf);
}
output_label("kallsyms_markers");
for (i = 0; i < ((table_cnt + 255) >> 8); i++)
printf("\t.long\t%u\n", markers[i]);
printf("\n");
free(markers);
output_label("kallsyms_token_table");
off = 0;
for (i = 0; i < 256; i++) {
best_idx[i] = off;
expand_symbol(best_table[i], best_table_len[i], buf);
printf("\t.asciz\t\"%s\"\n", buf);
off += strlen(buf) + 1;
}
printf("\n");
output_label("kallsyms_token_index");
for (i = 0; i < 256; i++)
printf("\t.short\t%d\n", best_idx[i]);
printf("\n");
if (!base_relative)
output_label("kallsyms_addresses");
else
output_label("kallsyms_offsets");
for (i = 0; i < table_cnt; i++) {
if (base_relative) {
/*
* Use the offset relative to the lowest value
* encountered of all relative symbols, and emit
* non-relocatable fixed offsets that will be fixed
* up at runtime.
*/
long long offset;
int overflow;
if (!absolute_percpu) {
offset = table[i]->addr - relative_base;
overflow = (offset < 0 || offset > UINT_MAX);
} else if (symbol_absolute(table[i])) {
offset = table[i]->addr;
overflow = (offset < 0 || offset > INT_MAX);
} else {
offset = relative_base - table[i]->addr - 1;
overflow = (offset < INT_MIN || offset >= 0);
}
if (overflow) {
fprintf(stderr, "kallsyms failure: "
"%s symbol value %#llx out of range in relative mode\n",
symbol_absolute(table[i]) ? "absolute" : "relative",
table[i]->addr);
exit(EXIT_FAILURE);
}
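/*
 * Worked example with hypothetical values: if relative_base is
 * 0xffffffff81000000, a relative symbol at 0xffffffff810001c8 is
 * stored as -(0x1c8 + 1) = -0x1c9 and can be recovered at runtime as
 * relative_base - 1 - value; an absolute per-cpu symbol at 0x4000
 * is stored as 0x4000 directly.
 */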
printf("\t.long\t%#x /* %s */\n", (int)offset, table[i]->sym);
} else if (!symbol_absolute(table[i])) {
output_address(table[i]->addr);
} else {
printf("\tPTR\t%#llx\n", table[i]->addr);
}
}
printf("\n");
if (base_relative) {
output_label("kallsyms_relative_base");
output_address(relative_base);
printf("\n");
}
if (lto_clang)
for (i = 0; i < table_cnt; i++)
cleanup_symbol_name((char *)table[i]->sym);
sort_symbols_by_name();
output_label("kallsyms_seqs_of_names");
for (i = 0; i < table_cnt; i++)
printf("\t.byte 0x%02x, 0x%02x, 0x%02x\n",
(unsigned char)(table[i]->seq >> 16),
(unsigned char)(table[i]->seq >> 8),
(unsigned char)(table[i]->seq >> 0));
printf("\n");
}
/* table lookup compression functions */
/* count all the possible tokens in a symbol */
static void learn_symbol(const unsigned char *symbol, int len)
{
int i;
for (i = 0; i < len - 1; i++)
token_profit[ symbol[i] + (symbol[i + 1] << 8) ]++;
}
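/*
 * For a (hypothetical) symbol "abc" this counts the adjacent pairs "ab"
 * and "bc"; the pair "ab" is indexed in token_profit[] as 'a' + ('b' << 8).
 */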
/* decrease the count for all the possible tokens in a symbol */
static void forget_symbol(const unsigned char *symbol, int len)
{
int i;
for (i = 0; i < len - 1; i++)
token_profit[ symbol[i] + (symbol[i + 1] << 8) ]--;
}
/* do the initial token count */
static void build_initial_token_table(void)
{
unsigned int i;
for (i = 0; i < table_cnt; i++)
learn_symbol(table[i]->sym, table[i]->len);
}
static unsigned char *find_token(unsigned char *str, int len,
const unsigned char *token)
{
int i;
for (i = 0; i < len - 1; i++) {
if (str[i] == token[0] && str[i+1] == token[1])
return &str[i];
}
return NULL;
}
/* replace a given token in all the valid symbols. Use the sampled symbols
* to update the counts */
static void compress_symbols(const unsigned char *str, int idx)
{
unsigned int i, len, size;
unsigned char *p1, *p2;
for (i = 0; i < table_cnt; i++) {
len = table[i]->len;
p1 = table[i]->sym;
/* find the token on the symbol */
p2 = find_token(p1, len, str);
if (!p2) continue;
/* decrease the counts for this symbol's tokens */
forget_symbol(table[i]->sym, len);
size = len;
do {
*p2 = idx;
p2++;
size -= (p2 - p1);
memmove(p2, p2 + 1, size);
p1 = p2;
len--;
if (size < 2) break;
/* find the token on the symbol */
p2 = find_token(p1, size, str);
} while (p2);
table[i]->len = len;
/* increase the counts for this symbol's new tokens */
learn_symbol(table[i]->sym, len);
}
}
/* search the token with the maximum profit */
static int find_best_token(void)
{
int i, best, bestprofit;
bestprofit=-10000;
best = 0;
for (i = 0; i < 0x10000; i++) {
if (token_profit[i] > bestprofit) {
best = i;
bestprofit = token_profit[i];
}
}
return best;
}
/* this is the core of the algorithm: calculate the "best" table */
static void optimize_result(void)
{
int i, best;
/* using the '\0' symbol last allows compress_symbols to use standard
* fast string functions */
for (i = 255; i >= 0; i--) {
/* if this table slot is empty (i.e. it is not used by an actual
* original char code) */
if (!best_table_len[i]) {
/* find the token with the best profit value */
best = find_best_token();
if (token_profit[best] == 0)
break;
/* place it in the "best" table */
best_table_len[i] = 2;
best_table[i][0] = best & 0xFF;
best_table[i][1] = (best >> 8) & 0xFF;
/* replace this token in all the valid symbols */
compress_symbols(best_table[i], i);
}
}
}
/* start by placing the char codes that actually occur in symbols into the table */
static void insert_real_symbols_in_table(void)
{
unsigned int i, j, c;
for (i = 0; i < table_cnt; i++) {
for (j = 0; j < table[i]->len; j++) {
c = table[i]->sym[j];
best_table[c][0]=c;
best_table_len[c]=1;
}
}
}
static void optimize_token_table(void)
{
build_initial_token_table();
insert_real_symbols_in_table();
optimize_result();
}
/* guess for "linker script provide" symbol */
static int may_be_linker_script_provide_symbol(const struct sym_entry *se)
{
const char *symbol = sym_name(se);
int len = se->len - 1;
if (len < 8)
return 0;
if (symbol[0] != '_' || symbol[1] != '_')
return 0;
/* __start_XXXXX */
if (!memcmp(symbol + 2, "start_", 6))
return 1;
/* __stop_XXXXX */
if (!memcmp(symbol + 2, "stop_", 5))
return 1;
/* __end_XXXXX */
if (!memcmp(symbol + 2, "end_", 4))
return 1;
/* __XXXXX_start */
if (!memcmp(symbol + len - 6, "_start", 6))
return 1;
/* __XXXXX_end */
if (!memcmp(symbol + len - 4, "_end", 4))
return 1;
return 0;
}
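/*
 * This matches names such as "__start___ex_table" or "__per_cpu_start",
 * so that in compare_symbols() below an ordinary symbol aliased at the
 * same address sorts ahead of the linker-script-provided one.
 */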
static int compare_symbols(const void *a, const void *b)
{
const struct sym_entry *sa = *(const struct sym_entry **)a;
const struct sym_entry *sb = *(const struct sym_entry **)b;
int wa, wb;
/* sort by address first */
if (sa->addr > sb->addr)
return 1;
if (sa->addr < sb->addr)
return -1;
/* sort by "weakness" type */
wa = (sa->sym[0] == 'w') || (sa->sym[0] == 'W');
wb = (sb->sym[0] == 'w') || (sb->sym[0] == 'W');
if (wa != wb)
return wa - wb;
/* sort by "linker script provide" type */
wa = may_be_linker_script_provide_symbol(sa);
wb = may_be_linker_script_provide_symbol(sb);
if (wa != wb)
return wa - wb;
/* sort by the number of prefix underscores */
wa = strspn(sym_name(sa), "_");
wb = strspn(sym_name(sb), "_");
if (wa != wb)
return wa - wb;
/* sort by initial order, so that other symbols are left undisturbed */
return sa->start_pos - sb->start_pos;
}
static void sort_symbols(void)
{
qsort(table, table_cnt, sizeof(table[0]), compare_symbols);
}
static void make_percpus_absolute(void)
{
unsigned int i;
for (i = 0; i < table_cnt; i++)
if (symbol_in_range(table[i], &percpu_range, 1)) {
/*
* Keep the 'A' override for percpu symbols to
* ensure consistent behavior compared to older
* versions of this tool.
*/
table[i]->sym[0] = 'A';
table[i]->percpu_absolute = 1;
}
}
/* find the minimum non-absolute symbol address */
static void record_relative_base(void)
{
unsigned int i;
for (i = 0; i < table_cnt; i++)
if (!symbol_absolute(table[i])) {
/*
* The table is sorted by address.
* Take the first non-absolute symbol value.
*/
relative_base = table[i]->addr;
return;
}
}
int main(int argc, char **argv)
{
while (1) {
static const struct option long_options[] = {
{"all-symbols", no_argument, &all_symbols, 1},
{"absolute-percpu", no_argument, &absolute_percpu, 1},
{"base-relative", no_argument, &base_relative, 1},
{"lto-clang", no_argument, &lto_clang, 1},
{},
};
int c = getopt_long(argc, argv, "", long_options, NULL);
if (c == -1)
break;
if (c != 0)
usage();
}
if (optind >= argc)
usage();
read_map(argv[optind]);
shrink_table();
if (absolute_percpu)
make_percpus_absolute();
sort_symbols();
if (base_relative)
record_relative_base();
optimize_token_table();
write_src();
return 0;
}
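/*
 * A typical invocation matching the usage string above (file names
 * hypothetical; in a real kernel build the pipeline is driven by
 * scripts/link-vmlinux.sh):
 *
 *   nm -n vmlinux.o > vmlinux.map
 *   scripts/kallsyms --all-symbols --base-relative vmlinux.map > kallsyms.S
 */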