Commit Graph

757 Commits

Author SHA1 Message Date
Kirill A. Shutemov 372fddf709 x86/mm: Introduce the 'no5lvl' kernel parameter
This kernel parameter allows to force kernel to use 4-level paging even
if hardware and kernel support 5-level paging.

The option may be useful to work around regressions related to 5-level
paging.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180518103528.59260-5-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-19 11:56:57 +02:00
Kirill A. Shutemov ad3fe525b9 x86/mm: Unify pgtable_l5_enabled usage in early boot code
Usually pgtable_l5_enabled is defined using cpu_feature_enabled().
cpu_feature_enabled() is not available in early boot code. We use
several different preprocessor tricks to get around it. It's messy.

Unify them all.

If cpu_feature_enabled() is not yet available, USE_EARLY_PGTABLE_L5 can
be defined before all includes. It makes pgtable_l5_enabled rely on
__pgtable_l5_enabled variable instead. This approach fits all early
users.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180518103528.59260-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-19 11:56:57 +02:00
Kirill A. Shutemov 30bbf728ba x86/boot/compressed/64: Fix trampoline page table address calculation
Hugh noticied that we calculate the address of the trampoline page table
incorrectly in cleanup_trampoline().

TRAMPOLINE_32BIT_PGTABLE_OFFSET has to be divided by sizeof(unsigned long),
since trampoline_32bit is an 'unsigned long' pointer.

TRAMPOLINE_32BIT_PGTABLE_OFFSET is zero so the bug doesn't have a
visible effect.

Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: e9d0e6330e ("x86/boot/compressed/64: Prepare new top-level page table for trampoline")
Link: http://lkml.kernel.org/r/20180518103528.59260-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-19 11:56:57 +02:00
Kirill A. Shutemov 589bb62be3 x86/boot/compressed/64: Fix moving page table out of trampoline memory
cleanup_trampoline() relocates the top-level page table out of
trampoline memory. We use 'top_pgtable' as our new top-level page table.

But if the 'top_pgtable' would be referenced from C in a usual way,
the address of the table will be calculated relative to RIP.
After kernel gets relocated, the address will be in the middle of
decompression buffer and the page table may get overwritten.
This leads to a crash.

We calculate the address of other page tables relative to the relocation
address. It makes them safe. We should do the same for 'top_pgtable'.

Calculate the address of 'top_pgtable' in assembly and pass down to
cleanup_trampoline().

Move the page table to .pgtable section where the rest of page tables
are. The section is @nobits so we save 4k in kernel image.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: e9d0e6330e ("x86/boot/compressed/64: Prepare new top-level page table for trampoline")
Link: http://lkml.kernel.org/r/20180516080131.27913-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-16 12:15:13 +02:00
Kirill A. Shutemov 5c9b0b1c49 x86/boot/compressed/64: Set up GOT for paging_prepare() and cleanup_trampoline()
Eric and Hugh have reported instant reboot due to my recent changes in
decompression code.

The root cause is that I didn't realize that we need to adjust GOT to be
able to run C code that early.

The problem is only visible with an older toolchain. Binutils >= 2.24 is
able to eliminate GOT references by replacing them with RIP-relative
address loads:

  https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commitdiff;h=80d873266dec

We need to adjust GOT two times:

 - before calling paging_prepare() using the initial load address
 - before calling C code from the relocated kernel

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 194a9749c7 ("x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G")
Link: http://lkml.kernel.org/r/20180516080131.27913-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-16 12:15:13 +02:00
Hans de Goede 1de3a1be8a efi/x86: Ignore unrealistically large option ROMs
setup_efi_pci() tries to save a copy of each PCI option ROM as this may
be necessary for the device driver for the PCI device to have access too.

On some systems the efi_pci_io_protocol's romimage and romsize fields
contain invalid data, which looks a bit like pointers pointing back into
other EFI code or data. Interpreting these pointers as romsize leads to
a very large value and if we then try to alloc this amount of memory to
save a copy the alloc call fails.

This leads to a "Failed to alloc mem for rom" error being printed on the
EFI console for each PCI device.

This commit avoids the printing of these errors, by checking romsize before
doing the alloc and if it is larger then the EFI spec limit of 16 MiB
silently ignore the ROM fields instead of trying to alloc mem and fail.

Tested-by: Hans de Goede <hdegoede@redhat.com>
[ardb: deduplicate 32/64 bit changes, use SZ_16M symbolic constant]
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180504060003.19618-16-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14 08:57:49 +02:00
Ard Biesheuvel 2c3625cb9f efi/x86: Fold __setup_efi_pci32() and __setup_efi_pci64() into one function
As suggested by Lukas, use his efi_call_proto() and efi_table_attr()
macros to merge __setup_efi_pci32() and __setup_efi_pci64() into a
single function, removing the need to duplicate changes made in
subsequent patches across both.

Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180504060003.19618-15-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14 08:57:49 +02:00
Ard Biesheuvel cb0ba79352 efi: Align efi_pci_io_protocol typedefs to type naming convention
In order to use the helper macros that perform type mangling with the
EFI PCI I/O protocol struct typedefs, align their Linux typenames with
the convention we use for definitionns that originate in the UEFI spec,
and add the trailing _t to each.

Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180504060003.19618-14-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14 08:57:48 +02:00
Ard Biesheuvel 0b3225ab94 efi: Avoid potential crashes, fix the 'struct efi_pci_io_protocol_32' definition for mixed mode
Mixed mode allows a kernel built for x86_64 to interact with 32-bit
EFI firmware, but requires us to define all struct definitions carefully
when it comes to pointer sizes.

'struct efi_pci_io_protocol_32' currently uses a 'void *' for the
'romimage' field, which will be interpreted as a 64-bit field
on such kernels, potentially resulting in bogus memory references
and subsequent crashes.

Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180504060003.19618-13-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14 08:56:29 +02:00
Dave Hansen fb43d6cb91 x86/mm: Do not auto-massage page protections
A PTE is constructed from a physical address and a pgprotval_t.
__PAGE_KERNEL, for instance, is a pgprot_t and must be converted
into a pgprotval_t before it can be used to create a PTE.  This is
done implicitly within functions like pfn_pte() by massage_pgprot().

However, this makes it very challenging to set bits (and keep them
set) if your bit is being filtered out by massage_pgprot().

This moves the bit filtering out of pfn_pte() and friends.  For
users of PAGE_KERNEL*, filtering will be done automatically inside
those macros but for users of __PAGE_KERNEL*, they need to do their
own filtering now.

Note that we also just move pfn_pte/pmd/pud() over to check_pgprot()
instead of massage_pgprot().  This way, we still *look* for
unsupported bits and properly warn about them if we find them.  This
might happen if an unfiltered __PAGE_KERNEL* value was passed in,
for instance.

- printk format warning fix from: Arnd Bergmann <arnd@arndb.de>
- boot crash fix from:            Tom Lendacky <thomas.lendacky@amd.com>
- crash bisected by:              Mike Galbraith <efault@gmx.de>

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reported-and-fixed-by: Arnd Bergmann <arnd@arndb.de>
Fixed-by: Tom Lendacky <thomas.lendacky@amd.com>
Bisected-by: Mike Galbraith <efault@gmx.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180406205509.77E1D7F6@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-12 09:04:22 +02:00
Linus Torvalds bc16d4052f Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
 "The main EFI changes in this cycle were:

   - Fix the apple-properties code (Andy Shevchenko)

   - Add WARN() on arm64 if UEFI Runtime Services corrupt the reserved
     x18 register (Ard Biesheuvel)

   - Use efi_switch_mm() on x86 instead of manipulating %cr3 directly
     (Sai Praneeth)

   - Fix early memremap leak in ESRT code (Ard Biesheuvel)

   - Switch to L"xxx" notation for wide string literals (Ard Biesheuvel)

   - ... plus misc other cleanups and bugfixes"

* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3
  x86/efi: Replace efi_pgd with efi_mm.pgd
  efi: Use string literals for efi_char16_t variable initializers
  efi/esrt: Fix handling of early ESRT table mapping
  efi: Use efi_mm in x86 as well as ARM
  efi: Make const array 'apple' static
  efi/apple-properties: Use memremap() instead of ioremap()
  efi: Reorder pr_notice() with add_device_randomness() call
  x86/efi: Replace GFP_ATOMIC with GFP_KERNEL in efi_query_variable_store()
  efi/arm64: Check whether x18 is preserved by runtime services calls
  efi/arm*: Stop printing addresses of virtual mappings
  efi/apple-properties: Remove redundant attribute initialization from unmarshal_key_value_pairs()
  efi/arm*: Only register page tables when they exist
2018-04-02 17:46:37 -07:00
Linus Torvalds d22fff8141 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:

 - Extend the memmap= boot parameter syntax to allow the redeclaration
   and dropping of existing ranges, and to support all e820 range types
   (Jan H. Schönherr)

 - Improve the W+X boot time security checks to remove false positive
   warnings on Xen (Jan Beulich)

 - Support booting as Xen PVH guest (Juergen Gross)

 - Improved 5-level paging (LA57) support, in particular it's possible
   now to have a single kernel image for both 4-level and 5-level
   hardware (Kirill A. Shutemov)

 - AMD hardware RAM encryption support (SME/SEV) fixes (Tom Lendacky)

 - Preparatory commits for hardware-encrypted RAM support on Intel CPUs.
   (Kirill A. Shutemov)

 - Improved Intel-MID support (Andy Shevchenko)

 - Show EFI page tables in page_tables debug files (Andy Lutomirski)

 - ... plus misc fixes and smaller cleanups

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
  x86/cpu/tme: Fix spelling: "configuation" -> "configuration"
  x86/boot: Fix SEV boot failure from change to __PHYSICAL_MASK_SHIFT
  x86/mm: Update comment in detect_tme() regarding x86_phys_bits
  x86/mm/32: Remove unused node_memmap_size_bytes() & CONFIG_NEED_NODE_MEMMAP_SIZE logic
  x86/mm: Remove pointless checks in vmalloc_fault
  x86/platform/intel-mid: Add special handling for ACPI HW reduced platforms
  ACPI, x86/boot: Introduce the ->reduced_hw_early_init() ACPI callback
  ACPI, x86/boot: Split out acpi_generic_reduce_hw_init() and export
  x86/pconfig: Provide defines and helper to run MKTME_KEY_PROG leaf
  x86/pconfig: Detect PCONFIG targets
  x86/tme: Detect if TME and MKTME is activated by BIOS
  x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
  x86/boot/compressed/64: Use page table in trampoline memory
  x86/boot/compressed/64: Use stack from trampoline memory
  x86/boot/compressed/64: Make sure we have a 32-bit code segment
  x86/mm: Do not use paravirtualized calls in native_set_p4d()
  kdump, vmcoreinfo: Export pgtable_l5_enabled value
  x86/boot/compressed/64: Prepare new top-level page table for trampoline
  x86/boot/compressed/64: Set up trampoline memory
  x86/boot/compressed/64: Save and restore trampoline memory
  ...
2018-04-02 15:45:30 -07:00
Linus Torvalds e68b4bad71 Merge branch 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build updates from Ingo Molnar:
 "The biggest change is the forcing of asm-goto support on x86, which
  effectively increases the GCC minimum supported version to gcc-4.5 (on
  x86)"

* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/build: Don't pass in -D__KERNEL__ multiple times
  x86: Remove FAST_FEATURE_TESTS
  x86: Force asm-goto
  x86/build: Drop superfluous ALIGN from the linker script
2018-04-02 14:37:03 -07:00
Cao jin d6289f36aa x86/build: Don't pass in -D__KERNEL__ multiple times
Some .<target>.cmd files under arch/x86 are showing two instances of
-D__KERNEL__, like arch/x86/boot/ and arch/x86/realmode/rm/.

__KERNEL__ is already defined in KBUILD_CPPFLAGS in the top Makefile,
so it can be dropped safely.

Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kbuild@vger.kernel.org
Link: http://lkml.kernel.org/r/20180316084944.3997-1-caoj.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-31 08:02:07 +02:00
Tom Lendacky 07344b15a9 x86/boot: Fix SEV boot failure from change to __PHYSICAL_MASK_SHIFT
In arch/x86/boot/compressed/kaslr_64.c, CONFIG_AMD_MEM_ENCRYPT support was
initially #undef'd to support SME with minimal effort.  When support for
SEV was added, the #undef remained and some minimal support for setting the
encryption bit was added for building identity mapped pagetable entries.

Commit b83ce5ee91 ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52")
changed __PHYSICAL_MASK_SHIFT from 46 to 52 in support of 5-level paging.
This change resulted in SEV guests failing to boot because the encryption
bit was no longer being automatically masked out.  The compressed boot
path now requires sme_me_mask to be defined in order for the pagetable
functions, such as pud_present(), to properly mask out the encryption bit
(currently bit 47) when evaluating pagetable entries.

Add an sme_me_mask variable in arch/x86/boot/compressed/mem_encrypt.S,
which is set when SEV is active, delete the #undef CONFIG_AMD_MEM_ENCRYPT
from arch/x86/boot/compressed/kaslr_64.c and use sme_me_mask when building
the identify mapped pagetable entries.

Fixes: b83ce5ee91 ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180327220711.8702.55842.stgit@tlendack-t1.amdoffice.net
2018-03-28 10:42:57 +02:00
Ingo Molnar 0bc91d4ba7 Linux 4.16-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJauCZfAAoJEHm+PkMAQRiGWGUH/2rhdQDkoJpYWnjaQkolECG8
 MxpGE7nmIIHxQcbSDdHTGJ8IhVm6Z5wZ7ym/PwCDTT043Y1y341sJrIwL2/nTG6d
 HVidk8hFvgN6QzlzVAHT3ZZMII/V9Zt+VV5SUYLGnPAVuJNHo/6uzWlTU5g+NTFo
 IquFDdQUaGBlkKqby+NoAFnkV1UAIkW0g22cfvPnlO5GMer0gusGyVNvVp7TNj3C
 sqj4Hvt3RMDLMNe9RZ2pFTiOD096n8FWpYftZneUTxFImhRV3Jg5MaaYZm9SI3HW
 tXrv/LChT/F1mi5Pkx6tkT5Hr8WvcrwDMJ4It1kom10RqWAgjxIR3CMm448ileY=
 =YKUG
 -----END PGP SIGNATURE-----

Merge tag 'v4.16-rc7' into x86/mm, to fix up conflict

 Conflicts:
	arch/x86/mm/init_64.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-27 08:43:39 +02:00
Linus Torvalds d2862360bf Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 and PTI fixes from Ingo Molnar:
 "Misc fixes:

   - fix EFI pagetables freeing

   - fix vsyscall pagetable setting on Xen PV guests

   - remove ancient CONFIG_X86_PPRO_FENCE=y - x86 is TSO again

   - fix two binutils (ld) development version related incompatibilities

   - clean up breakpoint handling

   - fix an x86 self-test"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry/64: Don't use IST entry for #BP stack
  x86/efi: Free efi_pgd with free_pages()
  x86/vsyscall/64: Use proper accessor to update P4D entry
  x86/cpu: Remove the CONFIG_X86_PPRO_FENCE=y quirk
  x86/boot/64: Verify alignment of the LOAD segment
  x86/build/64: Force the linker to use 2MB page size
  selftests/x86/ptrace_syscall: Fix for yet more glibc interference
2018-03-25 07:36:02 -10:00
H.J. Lu c55b8550fa x86/boot/64: Verify alignment of the LOAD segment
Since the x86-64 kernel must be aligned to 2MB, refuse to boot the
kernel if the alignment of the LOAD segment isn't a multiple of 2MB.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/CAMe9rOrR7xSJgUfiCoZLuqWUwymRxXPoGBW38%2BpN%3D9g%2ByKNhZw@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20 08:03:03 +01:00
Kirill A. Shutemov 194a9749c7 x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
This patch addresses a shortcoming in current boot process on machines
that supports 5-level paging.

If a bootloader enables 64-bit mode with 4-level paging, we might need to
switch over to 5-level paging. The switching requires the disabling
paging. It works fine if kernel itself is loaded below 4G.

But if the bootloader put the kernel above 4G (not sure if anybody does
this), we would lose control as soon as paging is disabled, because the
code becomes unreachable to the CPU.

This patch implements a trampoline in lower memory to handle this
situation.

We only need the memory for a very short time, until the main kernel
image sets up own page tables.

We go through the trampoline even if we don't have to: if we're already
in 5-level paging mode or if we don't need to switch to it. This way the
trampoline gets tested on every boot.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180312100246.89175-5-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12 11:49:25 +01:00
Kirill A. Shutemov 0a1756bd28 x86/boot/compressed/64: Use page table in trampoline memory
If a bootloader enables 64-bit mode with 4-level paging, we might need to
switch over to 5-level paging. The switching requires the disabling
paging. It works fine if kernel itself is loaded below 4G.

But if the bootloader put the kernel above 4G (i.e. in kexec() case),
we would lose control as soon as paging is disabled, because the code
becomes unreachable to the CPU.

To handle the situation, we need a trampoline in lower memory that would
take care of switching on 5-level paging.

Apart from the trampoline code itself we also need a place to store
top-level page table in lower memory as we don't have a way to load
64-bit values into CR3 in 32-bit mode. We only really need 8 bytes there
as we only use the very first entry of the page table. But we allocate a
whole page anyway.

This patch switches 32-bit code to use page table in trampoline memory.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180312100246.89175-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12 11:49:25 +01:00
Kirill A. Shutemov f7ff53e470 x86/boot/compressed/64: Use stack from trampoline memory
As the first step on using trampoline memory, let's make 32-bit code use
stack there.

Separate stack is required to return back from trampoline and we cannot
user stack from 64-bit mode as it may be above 4G.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180312100246.89175-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12 11:49:24 +01:00
Kirill A. Shutemov 7beebaccd5 x86/boot/compressed/64: Make sure we have a 32-bit code segment
When kernel starts in 64-bit mode we inherit the GDT from the bootloader.
It may cause a problem if the GDT doesn't have a 32-bit code segment
where we expect it to be.

Load our own GDT with known segments.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180312100246.89175-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12 11:49:24 +01:00
Ard Biesheuvel 36b649760e efi: Use string literals for efi_char16_t variable initializers
Now that we unambiguously build the entire kernel with -fshort-wchar,
it is no longer necessary to open code efi_char16_t[] initializers as
arrays of characters, and we can move to the L"xxx" notation instead.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180312084500.10764-6-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12 10:05:02 +01:00
Kirill A. Shutemov e9d0e6330e x86/boot/compressed/64: Prepare new top-level page table for trampoline
If trampoline code would need to switch between 4- and 5-level paging
modes, we have to use a page table in trampoline memory.

Having it in trampoline memory guarantees that it's below 4G and we can
point CR3 to it from 32-bit trampoline code.

We only use the page table if the desired paging mode doesn't match the
mode we are in. Otherwise the page table is unused and trampoline code
wouldn't touch CR3.

For 4- to 5-level paging transition, we set up current (4-level paging)
CR3 as the first and the only entry in a new top-level page table.

For 5- to 4-level paging transition, copy page table pointed by first
entry in the current top-level page table as our new top-level page
table.

If the page table is used by trampoline we would need to copy it to new
page table outside trampoline and update CR3 before restoring trampoline
memory.

Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180226180451.86788-6-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12 09:37:26 +01:00
Kirill A. Shutemov 32fcefa2bf x86/boot/compressed/64: Set up trampoline memory
This patch clears up trampoline memory and copies trampoline code in
place. It's not yet used though.

Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180226180451.86788-5-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12 09:37:25 +01:00
Kirill A. Shutemov fb5268354d x86/boot/compressed/64: Save and restore trampoline memory
The memory area we found for trampoline shouldn't contain anything
useful. But let's preserve the data anyway. Just to be on safe side.

paging_prepare() would save the data into a buffer.

cleanup_trampoline() would restore it back once we are done with the
trampoline.

Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180226180451.86788-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12 09:37:25 +01:00
Kirill A. Shutemov 3548e131ec x86/boot/compressed/64: Find a place for 32-bit trampoline
If a bootloader enables 64-bit mode with 4-level paging, we might need to
switch over to 5-level paging. The switching requires the disabling of
paging, which works fine if kernel itself is loaded below 4G.

But if the bootloader puts the kernel above 4G (not sure if anybody does
this), we would lose control as soon as paging is disabled, because the
code becomes unreachable to the CPU.

To handle the situation, we need a trampoline in lower memory that would
take care of switching on 5-level paging.

This patch finds a spot in low memory for a trampoline.

The heuristic is based on code in reserve_bios_regions().

We find the end of low memory based on BIOS and EBDA start addresses.
The trampoline is put just before end of low memory. It's mimic approach
taken to allocate memory for realtime trampoline.

Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180226180451.86788-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12 09:37:23 +01:00
Kirill A. Shutemov a403d79818 x86/boot/compressed/64: Describe the logic behind the LA57 check
The patch explains the LA57 check in more details.

Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180226180451.86788-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12 09:29:24 +01:00
Colin Ian King f779ca740f efi: Make const array 'apple' static
Don't populate the const read-only array 'buf' on the stack but instead
make it static. Makes the object code smaller by 64 bytes:

Before:
   text	   data	    bss	    dec	    hex	filename
   9264	      1	     16	   9281	   2441	arch/x86/boot/compressed/eboot.o

After:
   text	   data	    bss	    dec	    hex	filename
   9200	      1	     16	   9217	   2401	arch/x86/boot/compressed/eboot.o

(GCC version 7.2.0 x86_64)

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180308080020.22828-13-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09 09:30:35 +01:00
Ingo Molnar 1ea4fe8497 Merge branch 'x86/boot' into x86/mm, to unify branches
Both x86/mm and x86/boot contain 5-level paging related patches,
unify them to have a single tree to work against.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-26 08:48:08 +01:00
Ingo Molnar 3f7df3efeb Linux 4.16-rc3
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlqTdg8eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG10wH/iSt+OKmBdUZSAYv
 ADvfifLynLgugFYNzuijj8/gVt6b0ZIB2/wSYfdPjDErLFogis6wjnxl0lf3sEMB
 g7Oy8SE+pPPQ7587lFkg6Pj53405b6BwCbSkg8PLlwepSGiu0JmGvUYmz753tIeP
 kRIIQk/KrLlxNFixhGWNfQ9k8PqJ0NCgcbj+mTxmFkfIw2FKnBtYz72LR7Eut3Mt
 PJFh4pLKsHKlcjvX8+SehDdLwlEBv/ohDP7S7gRyR+QX1aNZhZAXyHQ0C8/tw8h6
 DnRvlTWp9EGTFxp8bYie5xcWusIcfy1eAA8yiG2kH+Mx7kLa8cmU234bHhUiu9yT
 YJSLoI4=
 =XBoV
 -----END PGP SIGNATURE-----

Merge tag 'v4.16-rc3' into x86/mm, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-26 08:41:15 +01:00
Ingo Molnar ed7158bae4 treewide/trivial: Remove ';;$' typo noise
On lkml suggestions were made to split up such trivial typo fixes into per subsystem
patches:

  --- a/arch/x86/boot/compressed/eboot.c
  +++ b/arch/x86/boot/compressed/eboot.c
  @@ -439,7 +439,7 @@ setup_uga32(void **uga_handle, unsigned long size, u32 *width, u32 *height)
          struct efi_uga_draw_protocol *uga = NULL, *first_uga;
          efi_guid_t uga_proto = EFI_UGA_PROTOCOL_GUID;
          unsigned long nr_ugas;
  -       u32 *handles = (u32 *)uga_handle;;
  +       u32 *handles = (u32 *)uga_handle;
          efi_status_t status = EFI_INVALID_PARAMETER;
          int i;

This patch is the result of the following script:

  $ sed -i 's/;;$/;/g' $(git grep -E ';;$'  | grep "\.[ch]:"  | grep -vwE 'for|ia64' | cut -d: -f1 | sort | uniq)

... followed by manual review to make sure it's all good.

Splitting this up is just crazy talk, let's get over with this and just do it.

Reported-by: Pavel Machek <pavel@ucw.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-22 10:59:33 +01:00
Kirill A. Shutemov 39b9552281 x86/mm: Optimize boot-time paging mode switching cost
By this point we have functioning boot-time switching between 4- and
5-level paging mode. But naive approach comes with cost.

Numbers below are for kernel build, allmodconfig, 5 times.

CONFIG_X86_5LEVEL=n:

 Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):

   17308719.892691      task-clock:u (msec)       #   26.772 CPUs utilized            ( +-  0.11% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
       331,993,164      page-faults:u             #    0.019 M/sec                    ( +-  0.01% )
43,614,978,867,455      cycles:u                  #    2.520 GHz                      ( +-  0.01% )
39,371,534,575,126      stalled-cycles-frontend:u #   90.27% frontend cycles idle     ( +-  0.09% )
28,363,350,152,428      instructions:u            #    0.65  insn per cycle
                                                  #    1.39  stalled cycles per insn  ( +-  0.00% )
 6,316,784,066,413      branches:u                #  364.948 M/sec                    ( +-  0.00% )
   250,808,144,781      branch-misses:u           #    3.97% of all branches          ( +-  0.01% )

     646.531974142 seconds time elapsed                                          ( +-  1.15% )

CONFIG_X86_5LEVEL=y:

 Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):

   17411536.780625      task-clock:u (msec)       #   26.426 CPUs utilized            ( +-  0.10% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
       331,868,663      page-faults:u             #    0.019 M/sec                    ( +-  0.01% )
43,865,909,056,301      cycles:u                  #    2.519 GHz                      ( +-  0.01% )
39,740,130,365,581      stalled-cycles-frontend:u #   90.59% frontend cycles idle     ( +-  0.05% )
28,363,358,997,959      instructions:u            #    0.65  insn per cycle
                                                  #    1.40  stalled cycles per insn  ( +-  0.00% )
 6,316,784,937,460      branches:u                #  362.793 M/sec                    ( +-  0.00% )
   251,531,919,485      branch-misses:u           #    3.98% of all branches          ( +-  0.00% )

     658.886307752 seconds time elapsed                                          ( +-  0.92% )

The patch tries to fix the performance regression by using
cpu_feature_enabled(X86_FEATURE_LA57) instead of pgtable_l5_enabled in
all hot code paths. These will statically patch the target code for
additional performance.

CONFIG_X86_5LEVEL=y + the patch:

 Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):

   17381990.268506      task-clock:u (msec)       #   26.907 CPUs utilized            ( +-  0.19% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
       331,862,625      page-faults:u             #    0.019 M/sec                    ( +-  0.01% )
43,697,726,320,051      cycles:u                  #    2.514 GHz                      ( +-  0.03% )
39,480,408,690,401      stalled-cycles-frontend:u #   90.35% frontend cycles idle     ( +-  0.05% )
28,363,394,221,388      instructions:u            #    0.65  insn per cycle
                                                  #    1.39  stalled cycles per insn  ( +-  0.00% )
 6,316,794,985,573      branches:u                #  363.410 M/sec                    ( +-  0.00% )
   251,013,232,547      branch-misses:u           #    3.97% of all branches          ( +-  0.01% )

     645.991174661 seconds time elapsed                                          ( +-  1.19% )

Unfortunately, this approach doesn't help with text size:

  vmlinux.before .text size:	8190319
  vmlinux.after .text size:	8200623

The .text section is increased by about 4k. Not sure if we can do anything
about this.

Signed-off-by: Kirill A. Shuemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180216114948.68868-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21 10:19:18 +01:00
Kirill A. Shutemov 6657fca06e x86/mm: Allow to boot without LA57 if CONFIG_X86_5LEVEL=y
All pieces of the puzzle are in place and we can now allow to boot with
CONFIG_X86_5LEVEL=y on a machine without LA57 support.

Kernel will detect that LA57 is missing and fold p4d at runtime.

Update the documentation and the Kconfig option description to reflect the
change.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214182542.69302-10-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-16 10:48:49 +01:00
Kirill A. Shutemov b16e770bfa x86/mm: Initialize 'pgdir_shift' and 'ptrs_per_p4d' at boot-time
Switching between paging modes requires the folding of the p4d page table level
when we only have 4 paging levels, which means we need to adjust 'pgdir_shift'
and 'ptrs_per_p4d' during early boot according to paging mode.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214182542.69302-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-16 10:48:47 +01:00
Kirill A. Shutemov 4c2b4058ab x86/mm: Initialize 'pgtable_l5_enabled' at boot-time
'pgtable_l5_enabled' indicates which paging mode we are using. We need to
initialize it at boot-time according to machine capability.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214182542.69302-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-16 10:48:46 +01:00
Kirill A. Shutemov c65e774fb3 x86/mm: Make PGDIR_SHIFT and PTRS_PER_P4D variable
For boot-time switching between 4- and 5-level paging we need to be able
to fold p4d page table level at runtime. It requires variable
PGDIR_SHIFT and PTRS_PER_P4D.

The change doesn't affect the kernel image size much:

   text	   data	    bss	    dec	    hex	filename
8628091	4734304	1368064	14730459	 e0c4db	vmlinux.before
8628393	4734340	1368064	14730797	 e0c62d	vmlinux.after

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-7-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-14 13:11:14 +01:00
Kirill A. Shutemov e626e6bb0d x86/mm: Introduce 'pgtable_l5_enabled'
The new flag would indicate what paging mode we are in.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-5-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-14 13:11:14 +01:00
Kirill A. Shutemov 4440977be1 x86/boot/compressed/64: Introduce paging_prepare()
Rename l5_paging_required() to paging_prepare() and change the
interface of the function.

This is a preparation for the next patch, which would make the function
also allocate memory for the 32-bit trampoline.

The function now returns a 128-bit structure. RAX would return
trampoline memory address (zero for now) and RDX would indicate if we
need to enable 5-level paging.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
[ Typo fixes and general clarification. ]
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180209142228.21231-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-11 12:36:18 +01:00
Kirill A. Shutemov 7cc4eb1bdd x86/boot/compressed/64: Rename pagetable.c to kaslr_64.c
The name of the file -- pagetable.c -- is misleading: it only contains
helpers used for KASLR in 64-bit mode.

Let's rename the file to reflect its content.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180209142228.21231-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-11 12:36:18 +01:00
Linus Torvalds ae0cb7be35 Merge branch 'next-tpm' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull tpm updates from James Morris:

 - reduce polling delays in tpm_tis

 - support retrieving TPM 2.0 Event Log through EFI before
   ExitBootServices

 - replace tpm-rng.c with a hwrng device managed by the driver for each
   TPM device

 - TPM resource manager synthesizes TPM_RC_COMMAND_CODE response instead
   of returning -EINVAL for unknown TPM commands. This makes user space
   more sound.

 - CLKRUN fixes:

    * Keep #CLKRUN disable through the entier TPM command/response flow

    * Check whether #CLKRUN is enabled before disabling and enabling it
      again because enabling it breaks PS/2 devices on a system where it
      is disabled

* 'next-tpm' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  tpm: remove unused variables
  tpm: remove unused data fields from I2C and OF device ID tables
  tpm: only attempt to disable the LPC CLKRUN if is already enabled
  tpm: follow coding style for variable declaration in tpm_tis_core_init()
  tpm: delete the TPM_TIS_CLK_ENABLE flag
  tpm: Update MAINTAINERS for Jason Gunthorpe
  tpm: Keep CLKRUN enabled throughout the duration of transmit_cmd()
  tpm_tis: Move ilb_base_addr to tpm_tis_data
  tpm2-cmd: allow more attempts for selftest execution
  tpm: return a TPM_RC_COMMAND_CODE response if command is not implemented
  tpm: Move Linux RNG connection to hwrng
  tpm: use struct tpm_chip for tpm_chip_find_get()
  tpm: parse TPM event logs based on EFI table
  efi: call get_event_log before ExitBootServices
  tpm: add event log format version
  tpm: rename event log provider files
  tpm: move tpm_eventlog.h outside of drivers folder
  tpm: use tpm_msleep() value as max delay
  tpm: reduce tpm polling delay in tpm_tis_core
  tpm: move wait_for_tpm_stat() to respective driver files
2018-01-31 13:12:31 -08:00
Thiebaud Weksteen 33b6d03469 efi: call get_event_log before ExitBootServices
With TPM 2.0 specification, the event logs may only be accessible by
calling an EFI Boot Service. Modify the EFI stub to copy the log area to
a new Linux-specific EFI configuration table so it remains accessible
once booted.

When calling this service, it is possible to specify the expected format
of the logs: TPM 1.2 (SHA1) or TPM 2.0 ("Crypto Agile"). For now, only the
first format is retrieved.

Signed-off-by: Thiebaud Weksteen <tweek@google.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Tested-by: Javier Martinez Canillas <javierm@redhat.com>
Tested-by: Jarkko Sakkinen  <jarkko.sakkinen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen  <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen  <jarkko.sakkinen@linux.intel.com>
2018-01-08 12:58:35 +02:00
Linus Torvalds 5aa90a8458 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 page table isolation updates from Thomas Gleixner:
 "This is the final set of enabling page table isolation on x86:

   - Infrastructure patches for handling the extra page tables.

   - Patches which map the various bits and pieces which are required to
     get in and out of user space into the user space visible page
     tables.

   - The required changes to have CR3 switching in the entry/exit code.

   - Optimizations for the CR3 switching along with documentation how
     the ASID/PCID mechanism works.

   - Updates to dump pagetables to cover the user space page tables for
     W+X scans and extra debugfs files to analyze both the kernel and
     the user space visible page tables

  The whole functionality is compile time controlled via a config switch
  and can be turned on/off on the command line as well"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
  x86/ldt: Make the LDT mapping RO
  x86/mm/dump_pagetables: Allow dumping current pagetables
  x86/mm/dump_pagetables: Check user space page table for WX pages
  x86/mm/dump_pagetables: Add page table directory to the debugfs VFS hierarchy
  x86/mm/pti: Add Kconfig
  x86/dumpstack: Indicate in Oops whether PTI is configured and enabled
  x86/mm: Clarify the whole ASID/kernel PCID/user PCID naming
  x86/mm: Use INVPCID for __native_flush_tlb_single()
  x86/mm: Optimize RESTORE_CR3
  x86/mm: Use/Fix PCID to optimize user/kernel switches
  x86/mm: Abstract switching CR3
  x86/mm: Allow flushing for future ASID switches
  x86/pti: Map the vsyscall page if needed
  x86/pti: Put the LDT in its own PGD if PTI is on
  x86/mm/64: Make a full PGD-entry size hole in the memory map
  x86/events/intel/ds: Map debug buffers in cpu_entry_area
  x86/cpu_entry_area: Add debugstore entries to cpu_entry_area
  x86/mm/pti: Map ESPFIX into user space
  x86/mm/pti: Share entry text PMD
  x86/entry: Align entry text section to PMD boundary
  ...
2017-12-29 17:02:49 -08:00
Thomas Gleixner aa8c6248f8 x86/mm/pti: Add infrastructure for page table isolation
Add the initial files for kernel page table isolation, with a minimal init
function and the boot time detection for this misfeature.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-23 21:12:59 +01:00
Kirill A. Shutemov 6d7e0ba2d2 x86/boot/compressed/64: Print error if 5-level paging is not supported
If the machine does not support the paging mode for which the kernel was
compiled, the boot process cannot continue.

It's not possible to let the kernel detect the mismatch as it does not even
reach the point where cpu features can be evaluted due to a triple fault in
the KASLR setup.

Instead of instantaneous silent reboot, emit an error message which gives
the user the information why the boot fails.

Fixes: 77ef56e4f0 ("x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y")
Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: stable@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: linux-mm@kvack.org
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20171204124059.63515-3-kirill.shutemov@linux.intel.com
2017-12-07 10:36:26 +01:00
Kirill A. Shutemov 08529078d8 x86/boot/compressed/64: Detect and handle 5-level paging at boot-time
Prerequisite for fixing the current problem of instantaneous reboots when a
5-level paging kernel is booted on 4-level paging hardware.

At the same time this change prepares the decompression code to boot-time
switching between 4- and 5-level paging.

[ tglx: Folded the GCC < 5 fix. ]

Fixes: 77ef56e4f0 ("x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: stable@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: linux-mm@kvack.org
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20171204124059.63515-2-kirill.shutemov@linux.intel.com
2017-12-07 10:34:39 +01:00
Chao Fan 69550d41ff x86/boot/KASLR: Remove unused variable
There are two variables "rc" in mem_avoid_memmap. One at the top of the
function and another one inside the while() loop. Drop the outer one as it
is unused. Cleanup some whitespace damage while at it.

Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: gregkh@linuxfoundation.org
Cc: n-horiguchi@ah.jp.nec.com
Cc: keescook@chromium.org
Link: https://lkml.kernel.org/r/20171123090847.15293-1-fanc.fnst@cn.fujitsu.com
2017-11-23 20:17:59 +01:00
Linus Torvalds 6a9f70b0a5 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
 "Three smaller changes:

   - clang fix

   - boot message beautification

   - unnecessary header inclusion removal"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Disable Clang warnings about GNU extensions
  x86/boot: Remove unnecessary #include <generated/utsrelease.h>
  x86/boot: Spell out "boot CPU" for BP
2017-11-13 16:32:30 -08:00
Tom Lendacky 1958b5fc40 x86/boot: Add early boot support when running with SEV active
Early in the boot process, add checks to determine if the kernel is
running with Secure Encrypted Virtualization (SEV) active.

Checking for SEV requires checking that the kernel is running under a
hypervisor (CPUID 0x00000001, bit 31), that the SEV feature is available
(CPUID 0x8000001f, bit 1) and then checking a non-interceptable SEV MSR
(0xc0010131, bit 0).

This check is required so that during early compressed kernel booting the
pagetables (both the boot pagetables and KASLR pagetables (if enabled) are
updated to include the encryption mask so that when the kernel is
decompressed into encrypted memory, it can boot properly.

After the kernel is decompressed and continues booting the same logic is
used to check if SEV is active and set a flag indicating so.  This allows
to distinguish between SME and SEV, each of which have unique differences
in how certain things are handled: e.g. DMA (always bounce buffered with
SEV) or EFI tables (always access decrypted with SME).

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: kvm@vger.kernel.org
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/20171020143059.3291-13-brijesh.singh@amd.com
2017-11-07 15:35:58 +01:00
Greg Kroah-Hartman b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Matthias Kaehlcke 6c3b56b197 x86/boot: Disable Clang warnings about GNU extensions
The kernel makes use of several GCC extensions, disable Clang warnings
about that in the boot code, as we already do for the rest of the kernel.

This suppresses the following warning when building with clang:

  ./include/linux/cgroup-defs.h:391:16: warning: field 'cgrp' with variable sized type 'struct cgroup' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
        struct cgroup cgrp;

Reported-by: Nick Desaulniers <nick.desaulniers@gmail.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171030194351.122090-1-mka@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-31 10:54:30 +01:00
Linus Torvalds f92e3da18b Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Transparently fall back to other poweroff method(s) if EFI poweroff
     fails (and returns)

   - Use separate PE/COFF section headers for the RX and RW parts of the
     ARM stub loader so that the firmware can use strict mapping
     permissions

   - Add support for requesting the firmware to wipe RAM at warm reboot

   - Increase the size of the random seed obtained from UEFI so CRNG
     fast init can complete earlier

   - Update the EFI framebuffer address if it points to a BAR that gets
     moved by the PCI resource allocation code

   - Enable "reset attack mitigation" of TPM environments: this is
     enabled if the kernel is configured with
     CONFIG_RESET_ATTACK_MITIGATION=y.

   - Clang related fixes

   - Misc cleanups, constification, refactoring, etc"

* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/bgrt: Use efi_mem_type()
  efi: Move efi_mem_type() to common code
  efi/reboot: Make function pointer orig_pm_power_off static
  efi/random: Increase size of firmware supplied randomness
  efi/libstub: Enable reset attack mitigation
  firmware/efi/esrt: Constify attribute_group structures
  firmware/efi: Constify attribute_group structures
  firmware/dcdbas: Constify attribute_group structures
  arm/efi: Split zImage code and data into separate PE/COFF sections
  arm/efi: Replace open coded constants with symbolic ones
  arm/efi: Remove pointless dummy .reloc section
  arm/efi: Remove forbidden values from the PE/COFF header
  drivers/fbdev/efifb: Allow BAR to be moved instead of claiming it
  efi/reboot: Fall back to original power-off method if EFI_RESET_SHUTDOWN returns
  efi/arm/arm64: Add missing assignment of efi.config_table
  efi/libstub/arm64: Set -fpie when building the EFI stub
  efi/libstub/arm64: Force 'hidden' visibility for section markers
  efi/libstub/arm64: Use hidden attribute for struct screen_info reference
  efi/arm: Don't mark ACPI reclaim memory as MEMBLOCK_NOMAP
2017-09-07 09:42:35 -07:00
Linus Torvalds 24e700e291 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 apic updates from Thomas Gleixner:
 "This update provides:

   - Cleanup of the IDT management including the removal of the extra
     tracing IDT. A first step to cleanup the vector management code.

   - The removal of the paravirt op adjust_exception_frame. This is a
     XEN specific issue, but merged through this branch to avoid nasty
     merge collisions

   - Prevent dmesg spam about the TSC DEADLINE bug, when the CPU has
     disabled the TSC DEADLINE timer in CPUID.

   - Adjust a debug message in the ioapic code to print out the
     information correctly"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
  x86/idt: Fix the X86_TRAP_BP gate
  x86/xen: Get rid of paravirt op adjust_exception_frame
  x86/eisa: Add missing include
  x86/idt: Remove superfluous ALIGNment
  x86/apic: Silence "FW_BUG TSC_DEADLINE disabled due to Errata" on CPUs without the feature
  x86/idt: Remove the tracing IDT leftovers
  x86/idt: Hide set_intr_gate()
  x86/idt: Simplify alloc_intr_gate()
  x86/idt: Deinline setup functions
  x86/idt: Remove unused functions/inlines
  x86/idt: Move interrupt gate initialization to IDT code
  x86/idt: Move APIC gate initialization to tables
  x86/idt: Move regular trap init to tables
  x86/idt: Move IST stack based traps to table init
  x86/idt: Move debug stack init to table based
  x86/idt: Switch early trap init to IDT tables
  x86/idt: Prepare for table based init
  x86/idt: Move early IDT setup out of 32-bit asm
  x86/idt: Move early IDT handler setup to IDT code
  x86/idt: Consolidate IDT invalidation
  ...
2017-09-04 17:43:56 -07:00
Linus Torvalds b1b6f83ac9 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm changes from Ingo Molnar:
 "PCID support, 5-level paging support, Secure Memory Encryption support

  The main changes in this cycle are support for three new, complex
  hardware features of x86 CPUs:

   - Add 5-level paging support, which is a new hardware feature on
     upcoming Intel CPUs allowing up to 128 PB of virtual address space
     and 4 PB of physical RAM space - a 512-fold increase over the old
     limits. (Supercomputers of the future forecasting hurricanes on an
     ever warming planet can certainly make good use of more RAM.)

     Many of the necessary changes went upstream in previous cycles,
     v4.14 is the first kernel that can enable 5-level paging.

     This feature is activated via CONFIG_X86_5LEVEL=y - disabled by
     default.

     (By Kirill A. Shutemov)

   - Add 'encrypted memory' support, which is a new hardware feature on
     upcoming AMD CPUs ('Secure Memory Encryption', SME) allowing system
     RAM to be encrypted and decrypted (mostly) transparently by the
     CPU, with a little help from the kernel to transition to/from
     encrypted RAM. Such RAM should be more secure against various
     attacks like RAM access via the memory bus and should make the
     radio signature of memory bus traffic harder to intercept (and
     decrypt) as well.

     This feature is activated via CONFIG_AMD_MEM_ENCRYPT=y - disabled
     by default.

     (By Tom Lendacky)

   - Enable PCID optimized TLB flushing on newer Intel CPUs: PCID is a
     hardware feature that attaches an address space tag to TLB entries
     and thus allows to skip TLB flushing in many cases, even if we
     switch mm's.

     (By Andy Lutomirski)

  All three of these features were in the works for a long time, and
  it's coincidence of the three independent development paths that they
  are all enabled in v4.14 at once"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (65 commits)
  x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)
  x86/mm: Use pr_cont() in dump_pagetable()
  x86/mm: Fix SME encryption stack ptr handling
  kvm/x86: Avoid clearing the C-bit in rsvd_bits()
  x86/CPU: Align CR3 defines
  x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages
  acpi, x86/mm: Remove encryption mask from ACPI page protection type
  x86/mm, kexec: Fix memory corruption with SME on successive kexecs
  x86/mm/pkeys: Fix typo in Documentation/x86/protection-keys.txt
  x86/mm/dump_pagetables: Speed up page tables dump for CONFIG_KASAN=y
  x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID
  x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y
  x86/mm: Allow userspace have mappings above 47-bit
  x86/mm: Prepare to expose larger address space to userspace
  x86/mpx: Do not allow MPX if we have mappings above 47-bit
  x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit()
  x86/xen: Redefine XEN_ELFNOTE_INIT_P2M using PUD_SIZE * PTRS_PER_PUD
  x86/mm/dump_pagetables: Fix printout of p4d level
  x86/mm/dump_pagetables: Generalize address normalization
  x86/boot: Fix memremap() related build failure
  ...
2017-09-04 12:21:28 -07:00
Linus Torvalds 45153920c7 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
 "The main changes are KASL related fixes and cleanups: in particular we
  now exclude certain physical memory ranges as KASLR randomization
  targets that have proven to be unreliable (early-)RAM on some firmware
  versions"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot/KASLR: Work around firmware bugs by excluding EFI_BOOT_SERVICES_* and EFI_LOADER_* from KASLR's choice
  x86/boot/KASLR: Prefer mirrored memory regions for the kernel physical address
  efi: Introduce efi_early_memdesc_ptr to get pointer to memmap descriptor
  x86/boot/KASLR: Rename process_e820_entry() into process_mem_region()
  x86/boot/KASLR: Switch to pass struct mem_vector to process_e820_entry()
  x86/boot/KASLR: Wrap e820 entries walking code into new function process_e820_entries()
2017-09-04 10:51:02 -07:00
Linus Torvalds b0c79f49c3 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:

 - Introduce the ORC unwinder, which can be enabled via
   CONFIG_ORC_UNWINDER=y.

   The ORC unwinder is a lightweight, Linux kernel specific debuginfo
   implementation, which aims to be DWARF done right for unwinding.
   Objtool is used to generate the ORC unwinder tables during build, so
   the data format is flexible and kernel internal: there's no
   dependency on debuginfo created by an external toolchain.

   The ORC unwinder is almost two orders of magnitude faster than the
   (out of tree) DWARF unwinder - which is important for perf call graph
   profiling. It is also significantly simpler and is coded defensively:
   there has not been a single ORC related kernel crash so far, even
   with early versions. (knock on wood!)

   But the main advantage is that enabling the ORC unwinder allows
   CONFIG_FRAME_POINTERS to be turned off - which speeds up the kernel
   measurably:

   With frame pointers disabled, GCC does not have to add frame pointer
   instrumentation code to every function in the kernel. The kernel's
   .text size decreases by about 3.2%, resulting in better cache
   utilization and fewer instructions executed, resulting in a broad
   kernel-wide speedup. Average speedup of system calls should be
   roughly in the 1-3% range - measurements by Mel Gorman [1] have shown
   a speedup of 5-10% for some function execution intense workloads.

   The main cost of the unwinder is that the unwinder data has to be
   stored in RAM: the memory cost is 2-4MB of RAM, depending on kernel
   config - which is a modest cost on modern x86 systems.

   Given how young the ORC unwinder code is it's not enabled by default
   - but given the performance advantages the plan is to eventually make
   it the default unwinder on x86.

   See Documentation/x86/orc-unwinder.txt for more details.

 - Remove lguest support: its intended role was that of a temporary
   proof of concept for virtualization, plus its removal will enable the
   reduction (removal) of the paravirt API as well, so Rusty agreed to
   its removal. (Juergen Gross)

 - Clean up and fix FSGS related functionality (Andy Lutomirski)

 - Clean up IO access APIs (Andy Shevchenko)

 - Enhance the symbol namespace (Jiri Slaby)

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
  objtool: Handle GCC stack pointer adjustment bug
  x86/entry/64: Use ENTRY() instead of ALIGN+GLOBAL for stub32_clone()
  x86/fpu/math-emu: Add ENDPROC to functions
  x86/boot/64: Extract efi_pe_entry() from startup_64()
  x86/boot/32: Extract efi_pe_entry() from startup_32()
  x86/lguest: Remove lguest support
  x86/paravirt/xen: Remove xen_patch()
  objtool: Fix objtool fallthrough detection with function padding
  x86/xen/64: Fix the reported SS and CS in SYSCALL
  objtool: Track DRAP separately from callee-saved registers
  objtool: Fix validate_branch() return codes
  x86: Clarify/fix no-op barriers for text_poke_bp()
  x86/switch_to/64: Rewrite FS/GS switching yet again to fix AMD CPUs
  selftests/x86/fsgsbase: Test selectors 1, 2, and 3
  x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps
  x86/fsgsbase/64: Fully initialize FS and GS state in start_thread_common
  x86/asm: Fix UNWIND_HINT_REGS macro for older binutils
  x86/asm/32: Fix regs_get_register() on segment registers
  x86/xen/64: Rearrange the SYSCALL entries
  x86/asm/32: Remove a bunch of '& 0xffff' from pt_regs segment reads
  ...
2017-09-04 09:52:57 -07:00
Naoya Horiguchi 0982adc746 x86/boot/KASLR: Work around firmware bugs by excluding EFI_BOOT_SERVICES_* and EFI_LOADER_* from KASLR's choice
There's a potential bug in how we select the KASLR kernel address n
the early boot code.

The KASLR boot code currently chooses the kernel image's physical memory
location from E820_TYPE_RAM regions by walking over all e820 entries.

E820_TYPE_RAM includes EFI_BOOT_SERVICES_CODE and EFI_BOOT_SERVICES_DATA
as well, so those regions can end up hosting the kernel image. According to
the UEFI spec, all memory regions marked as EfiBootServicesCode and
EfiBootServicesData are available as free memory after the first call
to ExitBootServices(). I.e. so such regions should be usable for the
kernel, per spec.

In real life however, we have workarounds for broken x86 firmware,
where we keep such regions reserved until SetVirtualAddressMap() is done.

See the following code in should_map_region():

	static bool should_map_region(efi_memory_desc_t *md)
	{
		...
		/*
		 * Map boot services regions as a workaround for buggy
		 * firmware that accesses them even when they shouldn't.
		 *
		 * See efi_{reserve,free}_boot_services().
		 */
		if (md->type =3D=3D EFI_BOOT_SERVICES_CODE ||
			md->type =3D=3D EFI_BOOT_SERVICES_DATA)
				return false;

This workaround suppressed a boot crash, but potential issues still
remain because no one prevents the regions from overlapping with kernel
image by KASLR.

So let's make sure that EFI_BOOT_SERVICES_{CODE|DATA} regions are never
chosen as kernel memory for the workaround to work fine.

Furthermore, EFI_LOADER_{CODE|DATA} regions are also excluded because
they can be used after ExitBootServices() as defined in EFI spec.

As a result, we choose kernel address only from EFI_CONVENTIONAL_MEMORY
which is the only memory type we know to be safely free.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Junichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: fanc.fnst@cn.fujitsu.com
Cc: izumi.taku@jp.fujitsu.com
Link: http://lkml.kernel.org/r/20170828074444.GC23181@hori1.linux.bs1.fc.nec.co.jp
[ Rewrote/fixed/clarified the changelog and the in code comments. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-31 12:00:35 +02:00
Jan H. Schönherr fb1cc2f916 x86/boot: Prevent faulty bootparams.screeninfo from causing harm
If a zero for the number of lines manages to slip through, scroll()
may underflow some offset calculations, causing accesses outside the
video memory.

Make the check in __putstr() more pessimistic to prevent that.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1503858223-14983-1-git-send-email-jschoenh@amazon.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 13:32:50 +02:00
Jiri Slaby 9e085cefc6 x86/boot/64: Extract efi_pe_entry() from startup_64()
Similarly to the 32-bit code, efi_pe_entry body() is somehow squashed into
startup_64().

In the old days, we forced startup_64() to start at offset 0x200 and efi_pe_entry()
to start at 0x210. But this requirement was removed long time ago, in:

  99f857db88 ("x86, build: Dynamically find entry points in compressed startup code")

The way it is now makes the code less readable and illogical. Given
we can now safely extract the inlined efi_pe_entry() body from
startup_64() into a separate function, we do so.

We also annotate the function appropriatelly by ENTRY+ENDPROC.

ABI offsets are preserved:

  0000000000000000 T startup_32
  0000000000000200 T startup_64
  0000000000000390 T efi64_stub_entry

On the top-level, it looked like:

	.org 0x200
	ENTRY(startup_64)
	#ifdef CONFIG_EFI_STUB		; start of inlined
		jmp     preferred_addr
	GLOBAL(efi_pe_entry)
		... ; a lot of assembly (efi_pe_entry)
		leaq    preferred_addr(%rax), %rax
		jmp     *%rax
	preferred_addr:
	#endif				; end of inlined
		... ; a lot of assembly (startup_64)
	ENDPROC(startup_64)

And it is now converted into:

	.org 0x200
	ENTRY(startup_64)
		... ; a lot of assembly (startup_64)
	ENDPROC(startup_64)

	#ifdef CONFIG_EFI_STUB
	ENTRY(efi_pe_entry)
		... ; a lot of assembly (efi_pe_entry)
		leaq    startup_64(%rax), %rax
		jmp     *%rax
	ENDPROC(efi_pe_entry)
	#endif

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: ard.biesheuvel@linaro.org
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20170824073327.4129-2-jslaby@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 13:23:29 +02:00
Jiri Slaby f4dee0bb65 x86/boot/32: Extract efi_pe_entry() from startup_32()
The efi_pe_entry() body is somehow squashed into startup_32(). In the old days,
we forced startup_32() to start at offset 0x00 and efi_pe_entry() to start
at 0x10.

But this requirement was removed long time ago, in:

  99f857db88 ("x86, build: Dynamically find entry points in compressed startup code")

The way it is now makes the code less readable and illogical. Given
we can now safely extract the inlined efi_pe_entry() body from
startup_32() into a separate function, we do so and we separate it to two
functions as they are marked already: efi_pe_entry() + efi32_stub_entry().

We also annotate the functions appropriatelly by ENTRY+ENDPROC.

ABI offset is preserved:

  0000   128 FUNC    GLOBAL DEFAULT    6 startup_32
  0080    60 FUNC    GLOBAL DEFAULT    6 efi_pe_entry
  00bc    68 FUNC    GLOBAL DEFAULT    6 efi32_stub_entry

On the top-level, it looked like this:

	ENTRY(startup_32)
	#ifdef CONFIG_EFI_STUB		; start of inlined
		jmp     preferred_addr
	ENTRY(efi_pe_entry)
		... ; a lot of assembly (efi_pe_entry)
	ENTRY(efi32_stub_entry)
		... ; a lot of assembly (efi32_stub_entry)
		leal    preferred_addr(%eax), %eax
		jmp     *%eax
	preferred_addr:
	#endif				; end of inlined
		... ; a lot of assembly (startup_32)
	ENDPROC(startup_32)

And it is now converted into:

	ENTRY(startup_32)
		... ; a lot of assembly (startup_32)
	ENDPROC(startup_32)

	#ifdef CONFIG_EFI_STUB
	ENTRY(efi_pe_entry)
		... ; a lot of assembly (efi_pe_entry)
	ENDPROC(efi_pe_entry)

	ENTRY(efi32_stub_entry)
		... ; a lot of assembly (efi32_stub_entry)
		leal    startup_32(%eax), %eax
		jmp     *%eax
	ENDPROC(efi32_stub_entry)
	#endif

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: ard.biesheuvel@linaro.org
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20170824073327.4129-1-jslaby@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 13:23:29 +02:00
Thomas Gleixner 64b163fab6 x86/idt: Unify gate_struct handling for 32/64-bit kernels
The first 32 bits of gate struct are the same for 32 and 64 bit kernels.

The 32-bit version uses desc_struct and no designated data structure,
so we need different accessors for 32 and 64 bit kernels.

Aside of that the macros which are necessary to build the 32-bit
gate descriptor are horrible to read.

Unify the gate structs and switch all code fiddling with it over.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064957.861974317@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 12:07:24 +02:00
Matthew Garrett ccc829ba36 efi/libstub: Enable reset attack mitigation
If a machine is reset while secrets are present in RAM, it may be
possible for code executed after the reboot to extract those secrets
from untouched memory. The Trusted Computing Group specified a mechanism
for requesting that the firmware clear all RAM on reset before booting
another OS. This is done by setting the MemoryOverwriteRequestControl
variable at startup. If userspace can ensure that all secrets are
removed as part of a controlled shutdown, it can reset this variable to
0 before triggering a hardware reboot.

Signed-off-by: Matthew Garrett <mjg59@google.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20170825155019.6740-2-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-26 09:20:33 +02:00
Ingo Molnar 413d63d71b Merge branch 'linus' into x86/mm to pick up fixes and to fix conflicts
Conflicts:
	arch/x86/kernel/head64.c
	arch/x86/mm/mmap.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-26 09:19:13 +02:00
Baoquan He c05cd79750 x86/boot/KASLR: Prefer mirrored memory regions for the kernel physical address
Currently KASLR will parse all e820 entries of RAM type and add all
candidate positions into the slots array. After that we choose one slot
randomly as the new position which the kernel will be decompressed into
and run at.

On systems with EFI enabled, e820 memory regions are coming from EFI
memory regions by combining adjacent regions.

These EFI memory regions have various attributes, and the "mirrored"
attribute is one of them. The physical memory region whose descriptors
in EFI memory map has EFI_MEMORY_MORE_RELIABLE attribute (bit: 16) are
mirrored. The address range mirroring feature of the kernel arranges such
mirrored regions into normal zones and other regions into movable zones.

With the mirroring feature enabled, the code and data of the kernel can only
be located in the more reliable mirrored regions. However, the current KASLR
code doesn't check EFI memory entries, and could choose a new kernel position
in non-mirrored regions. This will break the intended functionality of the
address range mirroring feature.

To fix this, if EFI is detected, iterate EFI memory map and pick the mirrored
region to process for adding candidate of randomization slot. If EFI is disabled
or no mirrored region found, still process the e820 memory map.

Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: ard.biesheuvel@linaro.org
Cc: fanc.fnst@cn.fujitsu.com
Cc: izumi.taku@jp.fujitsu.com
Cc: keescook@chromium.org
Cc: linux-efi@vger.kernel.org
Cc: matt@codeblueprint.co.uk
Cc: n-horiguchi@ah.jp.nec.com
Cc: thgarnie@google.com
Link: http://lkml.kernel.org/r/1502722464-20614-3-git-send-email-bhe@redhat.com
[ Rewrote most of the text. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-17 10:51:35 +02:00
Baoquan He 02e43c2dcd efi: Introduce efi_early_memdesc_ptr to get pointer to memmap descriptor
The existing map iteration helper for_each_efi_memory_desc_in_map can
only be used after the kernel initializes the EFI subsystem to set up
struct efi_memory_map.

Before that we also need iterate map descriptors which are stored in several
intermediate structures, like struct efi_boot_memmap for arch independent
usage and struct efi_info for x86 arch only.

Introduce efi_early_memdesc_ptr() to get pointer to a map descriptor, and
replace several places where that primitive is open coded.

Signed-off-by: Baoquan He <bhe@redhat.com>
[ Various improvements to the text. ]
Acked-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: ard.biesheuvel@linaro.org
Cc: fanc.fnst@cn.fujitsu.com
Cc: izumi.taku@jp.fujitsu.com
Cc: keescook@chromium.org
Cc: linux-efi@vger.kernel.org
Cc: n-horiguchi@ah.jp.nec.com
Cc: thgarnie@google.com
Link: http://lkml.kernel.org/r/20170816134651.GF21273@x1
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-17 10:50:57 +02:00
Ingo Molnar 2257e268b1 Merge branch 'linus' into x86/boot, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-17 10:50:48 +02:00
Matthias Kaehlcke 20c6c18904 x86/boot: Disable the address-of-packed-member compiler warning
The clang warning 'address-of-packed-member' is disabled for the general
kernel code, also disable it for the x86 boot code.

This suppresses a bunch of warnings like this when building with clang:

./arch/x86/include/asm/processor.h:535:30: warning: taking address of
  packed member 'sp0' of class or structure 'x86_hw_tss' may result in an
  unaligned pointer value [-Waddress-of-packed-member]
    return this_cpu_read_stable(cpu_tss.x86_tss.sp0);
                                ^~~~~~~~~~~~~~~~~~~
./arch/x86/include/asm/percpu.h:391:59: note: expanded from macro
  'this_cpu_read_stable'
    #define this_cpu_read_stable(var)       percpu_stable_op("mov", var)
                                                                    ^~~
./arch/x86/include/asm/percpu.h:228:16: note: expanded from macro
  'percpu_stable_op'
    : "p" (&(var)));
             ^~~

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Cc: Doug Anderson <dianders@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170725215053.135586-1-mka@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-28 08:39:08 +02:00
Tom Lendacky 21729f81ce x86/mm: Provide general kernel support for memory encryption
Changes to the existing page table macros will allow the SME support to
be enabled in a simple fashion with minimal changes to files that use these
macros.  Since the memory encryption mask will now be part of the regular
pagetable macros, we introduce two new macros (_PAGE_TABLE_NOENC and
_KERNPG_TABLE_NOENC) to allow for early pagetable creation/initialization
without the encryption mask before SME becomes active.  Two new pgprot()
macros are defined to allow setting or clearing the page encryption mask.

The FIXMAP_PAGE_NOCACHE define is introduced for use with MMIO.  SME does
not support encryption for MMIO areas so this define removes the encryption
mask from the page attribute.

Two new macros are introduced (__sme_pa() / __sme_pa_nodebug()) to allow
creating a physical address with the encryption mask.  These are used when
working with the cr3 register so that the PGD can be encrypted. The current
__va() macro is updated so that the virtual address is generated based off
of the physical address without the encryption mask thus allowing the same
virtual address to be generated regardless of whether encryption is enabled
for that physical location or not.

Also, an early initialization function is added for SME.  If SME is active,
this function:

 - Updates the early_pmd_flags so that early page faults create mappings
   with the encryption mask.

 - Updates the __supported_pte_mask to include the encryption mask.

 - Updates the protection_map entries to include the encryption mask so
   that user-space allocations will automatically have the encryption mask
   applied.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Toshimitsu Kani <toshi.kani@hpe.com>
Cc: kasan-dev@googlegroups.com
Cc: kvm@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/b36e952c4c39767ae7f0a41cf5345adf27438480.1500319216.git.thomas.lendacky@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-18 11:38:00 +02:00
Baoquan He 27aac20574 x86/boot/KASLR: Rename process_e820_entry() into process_mem_region()
Now process_e820_entry() is not limited to e820 entry processing, rename
it to process_mem_region(). And adjust the code comment accordingly.

Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: fanc.fnst@cn.fujitsu.com
Cc: izumi.taku@jp.fujitsu.com
Cc: matt@codeblueprint.co.uk
Cc: thgarnie@google.com
Link: http://lkml.kernel.org/r/1499603862-11516-4-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-18 11:11:12 +02:00
Baoquan He 87891b01b5 x86/boot/KASLR: Switch to pass struct mem_vector to process_e820_entry()
This makes process_e820_entry() be able to process any kind of memory
region.

Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: fanc.fnst@cn.fujitsu.com
Cc: izumi.taku@jp.fujitsu.com
Cc: matt@codeblueprint.co.uk
Cc: thgarnie@google.com
Link: http://lkml.kernel.org/r/1499603862-11516-3-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-18 11:11:11 +02:00
Baoquan He f62995c92a x86/boot/KASLR: Wrap e820 entries walking code into new function process_e820_entries()
The original function process_e820_entry() only takes care of each
e820 entry passed.

And move the E820_TYPE_RAM checking logic into process_e820_entries().

And remove the redundent local variable 'addr' definition in
find_random_phys_addr().

Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: fanc.fnst@cn.fujitsu.com
Cc: izumi.taku@jp.fujitsu.com
Cc: matt@codeblueprint.co.uk
Cc: thgarnie@google.com
Link: http://lkml.kernel.org/r/1499603862-11516-2-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-18 11:11:11 +02:00
Daniel Micay 6974f0c455 include/linux/string.h: add the option of fortified string.h functions
This adds support for compiling with a rough equivalent to the glibc
_FORTIFY_SOURCE=1 feature, providing compile-time and runtime buffer
overflow checks for string.h functions when the compiler determines the
size of the source or destination buffer at compile-time.  Unlike glibc,
it covers buffer reads in addition to writes.

GNU C __builtin_*_chk intrinsics are avoided because they would force a
much more complex implementation.  They aren't designed to detect read
overflows and offer no real benefit when using an implementation based
on inline checks.  Inline checks don't add up to much code size and
allow full use of the regular string intrinsics while avoiding the need
for a bunch of _chk functions and per-arch assembly to avoid wrapper
overhead.

This detects various overflows at compile-time in various drivers and
some non-x86 core kernel code.  There will likely be issues caught in
regular use at runtime too.

Future improvements left out of initial implementation for simplicity,
as it's all quite optional and can be done incrementally:

* Some of the fortified string functions (strncpy, strcat), don't yet
  place a limit on reads from the source based on __builtin_object_size of
  the source buffer.

* Extending coverage to more string functions like strlcat.

* It should be possible to optionally use __builtin_object_size(x, 1) for
  some functions (C strings) to detect intra-object overflows (like
  glibc's _FORTIFY_SOURCE=2), but for now this takes the conservative
  approach to avoid likely compatibility issues.

* The compile-time checks should be made available via a separate config
  option which can be enabled by default (or always enabled) once enough
  time has passed to get the issues it catches fixed.

Kees said:
 "This is great to have. While it was out-of-tree code, it would have
  blocked at least CVE-2016-3858 from being exploitable (improper size
  argument to strlcpy()). I've sent a number of fixes for
  out-of-bounds-reads that this detected upstream already"

[arnd@arndb.de: x86: fix fortified memcpy]
  Link: http://lkml.kernel.org/r/20170627150047.660360-1-arnd@arndb.de
[keescook@chromium.org: avoid panic() in favor of BUG()]
  Link: http://lkml.kernel.org/r/20170626235122.GA25261@beast
[keescook@chromium.org: move from -mm, add ARCH_HAS_FORTIFY_SOURCE, tweak Kconfig help]
Link: http://lkml.kernel.org/r/20170526095404.20439-1-danielmicay@gmail.com
Link: http://lkml.kernel.org/r/1497903987-21002-8-git-send-email-keescook@chromium.org
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:03 -07:00
Linus Torvalds 7a69f9c60b Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Continued work to add support for 5-level paging provided by future
     Intel CPUs. In particular we switch the x86 GUP code to the generic
     implementation. (Kirill A. Shutemov)

   - Continued work to add PCID CPU support to native kernels as well.
     In this round most of the focus is on reworking/refreshing the TLB
     flush infrastructure for the upcoming PCID changes. (Andy
     Lutomirski)"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
  x86/mm: Delete a big outdated comment about TLB flushing
  x86/mm: Don't reenter flush_tlb_func_common()
  x86/KASLR: Fix detection 32/64 bit bootloaders for 5-level paging
  x86/ftrace: Exclude functions in head64.c from function-tracing
  x86/mmap, ASLR: Do not treat unlimited-stack tasks as legacy mmap
  x86/mm: Remove reset_lazy_tlbstate()
  x86/ldt: Simplify the LDT switching logic
  x86/boot/64: Put __startup_64() into .head.text
  x86/mm: Add support for 5-level paging for KASLR
  x86/mm: Make kernel_physical_mapping_init() support 5-level paging
  x86/mm: Add sync_global_pgds() for configuration with 5-level paging
  x86/boot/64: Add support of additional page table level during early boot
  x86/boot/64: Rename init_level4_pgt and early_level4_pgt
  x86/boot/64: Rewrite startup_64() in C
  x86/boot/compressed: Enable 5-level paging during decompression stage
  x86/boot/efi: Define __KERNEL32_CS GDT on 64-bit configurations
  x86/boot/efi: Fix __KERNEL_CS definition of GDT entry on 64-bit configurations
  x86/boot/efi: Cleanup initialization of GDT entries
  x86/asm: Fix comment in return_from_SYSCALL_64()
  x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation
  ...
2017-07-03 14:45:09 -07:00
Linus Torvalds 25e09ca524 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
 "The main changes in this cycle were KASLR improvements for rare
  environments with special boot options, by Baoquan He. Also misc
  smaller changes/cleanups"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/debug: Extend the lower bound of crash kernel low reservations
  x86/boot: Remove unused copy_*_gs() functions
  x86/KASLR: Use the right memcpy() implementation
  Documentation/kernel-parameters.txt: Update 'memmap=' boot option description
  x86/KASLR: Handle the memory limit specified by the 'memmap=' and 'mem=' boot options
  x86/KASLR: Parse all 'memmap=' boot option entries
2017-07-03 13:40:38 -07:00
Kirill A. Shutemov a24261d70e x86/KASLR: Fix detection 32/64 bit bootloaders for 5-level paging
KASLR uses hack to detect whether we booted via startup_32() or
startup_64(): it checks what is loaded into cr3 and compares it to
_pgtables. _pgtables is the array of page tables where early code
allocates page table from.

KASLR expects cr3 to point to _pgtables if we booted via startup_32(), but
that's not true if we booted with 5-level paging enabled. In this case top
level page table is allocated separately and only the first p4d page table
is allocated from the array.

Let's modify the check to cover both 4- and 5-level paging cases.

The patch also renames 'level4p' to 'top_level_pgt' as it now can hold
page table for 4th or 5th level, depending on configuration.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170628121730.43079-1-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-30 08:56:53 +02:00
Baoquan He 8eabf42ae5 x86/boot/KASLR: Fix kexec crash due to 'virt_addr' calculation bug
Kernel text KASLR is separated into physical address and virtual
address randomization. And for virtual address randomization, we
only randomiza to get an offset between 16M and KERNEL_IMAGE_SIZE.
So the initial value of 'virt_addr' should be LOAD_PHYSICAL_ADDR,
but not the original kernel loading address 'output'.

The bug will cause kernel boot failure if kernel is loaded at a different
position than the address, 16M, which is decided at compiled time.
Kexec/kdump is such practical case.

To fix it, just assign LOAD_PHYSICAL_ADDR to virt_addr as initial
value.

Tested-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 8391c73 ("x86/KASLR: Randomize virtual address separately")
Link: http://lkml.kernel.org/r/1498567146-11990-3-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-30 08:53:14 +02:00
Baoquan He b892cb873c x86/boot/KASLR: Add checking for the offset of kernel virtual address randomization
For kernel text KASLR, the virtual address is confined to area of 1G,
[0xffffffff80000000, 0xffffffffc0000000). For the implemenataion of
virtual address randomization, we only randomize to get an offset
between 16M and 1G, then add this offset to the starting address,
0xffffffff80000000. Here 16M is the offset which is decided at linking
stage. So the amount of the local variable 'virt_addr' which respresents
the offset plus the kernel output size can not exceed KERNEL_IMAGE_SIZE.

Add a debug check for the offset. If out of bounds, print error
message and hang there.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1498567146-11990-2-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-30 08:53:14 +02:00
Kirill A. Shutemov 34bbb0009f x86/boot/compressed: Enable 5-level paging during decompression stage
We need to cover two basic cases: when bootloader left us in 32-bit mode
and when bootloader enabled long mode.

The patch implements unified codepath to enabled 5-level paging for both
cases. It means case when we start in 32-bit mode, we first enable long
mode with 4-level and then switch over to 5-level paging.

Switching from 4-level to 5-level paging is not trivial. We cannot do it
directly. Setting LA57 in long mode would trigger #GP. So we need to
switch off long mode first and the then re-enable with 5-level paging.

NOTE: The need of switching off long mode means we are in trouble if
bootloader put us above 4G boundary. If bootloader wants to boot 5-level
paging kernel, it has to put kernel below 4G or enable 5-level paging on
it's own, so we could avoid the step.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170606113133.22974-7-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13 08:56:53 +02:00
Kirill A. Shutemov 919a02d128 x86/boot/efi: Define __KERNEL32_CS GDT on 64-bit configurations
We would need to switch temporarily to compatibility mode during booting
with 5-level paging enabled. It would require 32-bit code segment
descriptor.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170606113133.22974-6-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13 08:56:53 +02:00
Kirill A. Shutemov 4c94117c7f x86/boot/efi: Fix __KERNEL_CS definition of GDT entry on 64-bit configurations
Define __KERNEL_CS GDT entry as long mode (.L=1, .D=0) on 64-bit
configurations.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170606113133.22974-5-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13 08:56:52 +02:00
Kirill A. Shutemov f8fceacbd1 x86/boot/efi: Cleanup initialization of GDT entries
This is preparation for following patches without changing semantics of the
code.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170606113133.22974-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13 08:56:51 +02:00
Andy Lutomirski 6c690ee103 x86/mm: Split read_cr3() into read_cr3_pa() and __read_cr3()
The kernel has several code paths that read CR3.  Most of them assume that
CR3 contains the PGD's physical address, whereas some of them awkwardly
use PHYSICAL_PAGE_MASK to mask off low bits.

Add explicit mask macros for CR3 and convert all of the CR3 readers.
This will keep them from breaking when PCID is enabled.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/883f8fb121f4616c1c1427ad87350bb2f5ffeca1.1497288170.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13 08:48:09 +02:00
Arnd Bergmann 5b8b9cf76a x86/KASLR: Use the right memcpy() implementation
The decompressor has its own implementation of the string functions,
but has to include the right header to get those, while implicitly
including linux/string.h may result in a link error:

  arch/x86/boot/compressed/kaslr.o: In function `choose_random_location':
  kaslr.c:(.text+0xf51): undefined reference to `_mmx_memcpy'

This has appeared now as KASLR started using memcpy(), via:

	d52e7d5a95 ("x86/KASLR: Parse all 'memmap=' boot option entries")

Other files in the decompressor already do the same thing.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170530091446.1000183-1-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-31 07:59:45 +02:00
Baoquan He 4cdba14f84 x86/KASLR: Handle the memory limit specified by the 'memmap=' and 'mem=' boot options
The 'mem=' boot option limits the max address a system can use - any memory
region above the limit will be removed.

Furthermore, the 'memmap=nn[KMG]' variant (with no offset specified) has the same
behaviour as 'mem='.

KASLR needs to consider this when choosing the random position for
decompressing the kernel. Do it.

Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dan.j.williams@intel.com
Cc: douly.fnst@cn.fujitsu.com
Cc: dyoung@redhat.com
Link: http://lkml.kernel.org/r/1494654390-23861-3-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-24 09:50:27 +02:00
Baoquan He d52e7d5a95 x86/KASLR: Parse all 'memmap=' boot option entries
In commit:

  f28442497b ("x86/boot: Fix KASLR and memmap= collision")

... the memmap= option is parsed so that KASLR can avoid those reserved
regions. It uses cmdline_find_option() to get the value if memmap=
is specified, however the problem is that cmdline_find_option() can only
find the last entry if multiple memmap entries are provided. This
is not correct.

Address this by checking each command line token for a "memmap=" match
and parse each instance instead of using cmdline_find_option().

Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dan.j.williams@intel.com
Cc: douly.fnst@cn.fujitsu.com
Cc: dyoung@redhat.com
Cc: m.mizuma@jp.fujitsu.com
Link: http://lkml.kernel.org/r/1494654390-23861-2-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-24 09:50:27 +02:00
Rob Landley 3780578761 x86/boot: Use CROSS_COMPILE prefix for readelf
The boot code Makefile contains a straight 'readelf' invocation. This
causes build warnings in cross compile environments, when there is no
unprefixed readelf accessible via $PATH.

Add the missing $(CROSS_COMPILE) prefix.

[ tglx: Rewrote changelog ]

Fixes: 98f7852537 ("x86/boot: Refuse to build with data relocations")
Signed-off-by: Rob Landley <rob@landley.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: "H.J. Lu" <hjl.tools@gmail.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/ced18878-693a-9576-a024-113ef39a22c0@landley.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-05-21 13:04:27 +02:00
Xunlei Pang 66aad4fdf2 x86/mm: Add support for gbpages to kernel_ident_mapping_init()
Kernel identity mappings on x86-64 kernels are created in two
ways: by the early x86 boot code, or by kernel_ident_mapping_init().

Native kernels (which is the dominant usecase) use the former,
but the kexec and the hibernation code uses kernel_ident_mapping_init().

There's a subtle difference between these two ways of how identity
mappings are created, the current kernel_ident_mapping_init() code
creates identity mappings always using 2MB page(PMD level) - while
the native kernel boot path also utilizes gbpages where available.

This difference is suboptimal both for performance and for memory
usage: kernel_ident_mapping_init() needs to allocate pages for the
page tables when creating the new identity mappings.

This patch adds 1GB page(PUD level) support to kernel_ident_mapping_init()
to address these concerns.

The primary advantage would be better TLB coverage/performance,
because we'd utilize 1GB TLBs instead of 2MB ones.

It is also useful for machines with large number of memory to
save paging structure allocations(around 4MB/TB using 2MB page)
when setting identity mappings for all the memory, after using
1GB page it will consume only 8KB/TB.

( Note that this change alone does not activate gbpages in kexec,
  we are doing that in a separate patch. )

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: akpm@linux-foundation.org
Cc: kexec@lists.infradead.org
Link: http://lkml.kernel.org/r/1493862171-8799-1-git-send-email-xlpang@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-08 08:28:40 +02:00
Kees Cook 60854a12d2 x86/boot: Declare error() as noreturn
The compressed boot function error() is used to halt execution, but it
wasn't marked with "noreturn". This fixes that in preparation for
supporting kernel FORTIFY_SOURCE, which uses the noreturn annotation
on panic, and calls error(). GCC would warn about a noreturn function
calling a non-noreturn function:

  arch/x86/boot/compressed/misc.c: In function ‘fortify_panic’:
  arch/x86/boot/compressed/misc.c:416:1: warning: ‘noreturn’ function does return
   }
 ^

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Daniel Micay <danielmicay@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/20170506045116.GA2879@beast
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-07 10:59:05 +02:00
Baoquan He da63b6b200 x86/KASLR: Fix kexec kernel boot crash when KASLR randomization fails
Dave found that a kdump kernel with KASLR enabled will reset to the BIOS
immediately if physical randomization failed to find a new position for
the kernel. A kernel with the 'nokaslr' option works in this case.

The reason is that KASLR will install a new page table for the identity
mapping, while it missed building it for the original kernel location
if KASLR physical randomization fails.

This only happens in the kexec/kdump kernel, because the identity mapping
has been built for kexec/kdump in the 1st kernel for the whole memory by
calling init_pgtable(). Here if physical randomizaiton fails, it won't build
the identity mapping for the original area of the kernel but change to a
new page table '_pgtable'. Then the kernel will triple fault immediately
caused by no identity mappings.

The normal kernel won't see this bug, because it comes here via startup_32()
and CR3 will be set to _pgtable already. In startup_32() the identity
mapping is built for the 0~4G area. In KASLR we just append to the existing
area instead of entirely overwriting it for on-demand identity mapping
building. So the identity mapping for the original area of kernel is still
there.

To fix it we just switch to the new identity mapping page table when physical
KASLR succeeds. Otherwise we keep the old page table unchanged just like
"nokaslr" does.

Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Dave Young <dyoung@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1493278940-5885-1-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-28 08:31:15 +02:00
Ingo Molnar 4729277156 Merge branch 'WIP.x86/boot' into x86/boot, to pick up ready branch
The E820 rework in WIP.x86/boot has gone through a couple of weeks
of exposure in -tip, merge it in a wider fashion.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-11 08:49:31 +02:00
Zhengyi Shen 5af218439f x86/boot: Fix Sparse warning by including required header file
Include declarations for various symbols defined in the error.h header file
to fix the following Sparse warnings:

  arch/x86/boot/compressed/error.c:8:6:
	warning: symbol 'warn' was not declared. Should it be static?
  arch/x86/boot/compressed/error.c:15:6:
	warning: symbol 'error' was not declared. Should it be static?

Signed-off-by: Zhengyi Shen <shenzhengyi@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1490770820-24472-1-git-send-email-shenzhengyi@gmail.com
[ Fixed/enhanced the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-31 08:13:54 +02:00
Ingo Molnar 0871d5a66d Merge branch 'linus' into WIP.x86/boot, to fix up conflicts and to pick up updates
Conflicts:
	arch/x86/xen/setup.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-01 09:02:26 +01:00
Linus Torvalds 292d386743 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
 "Misc updates:

   - fix e820 error handling

   - convert page table setup code from assembly to C

   - fix kexec environment bug

   - ... plus small cleanups"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kconfig: Remove misleading note regarding hibernation and KASLR
  x86/boot: Fix KASLR and memmap= collision
  x86/e820/32: Fix e820_search_gap() error handling on x86-32
  x86/boot/32: Convert the 32-bit pgtable setup code from assembly to C
  x86/e820: Make e820_search_gap() static and remove unused variables
2017-02-20 14:04:37 -08:00
David Howells de8cb45862 efi: Get and store the secure boot status
Get the firmware's secure-boot status in the kernel boot wrapper and stash
it somewhere that the main kernel image can find.

The efi_get_secureboot() function is extracted from the ARM stub and (a)
generalised so that it can be called from x86 and (b) made to use
efi_call_runtime() so that it can be run in mixed-mode.

For x86, it is stored in boot_params and can be overridden by the boot
loader or kexec.  This allows secure-boot mode to be passed on to a new
kernel.

Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1486380166-31868-5-git-send-email-ard.biesheuvel@linaro.org
[ Small readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-07 10:42:10 +01:00
David Howells a2cd2f3f29 x86/efi: Allow invocation of arbitrary runtime services
Provide the ability to perform mixed-mode runtime service calls for x86 in
the same way the following commit provided the ability to invoke for boot
services:

  0a637ee612 ("x86/efi: Allow invocation of arbitrary boot services")

Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1486380166-31868-2-git-send-email-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-07 10:42:09 +01:00
Lukas Wunner db4545d9a7 x86/efi: Deduplicate efi_char16_printk()
Eliminate the separate 32-bit and 64x- bit code paths by way of the shiny
new efi_call_proto() macro.

No functional change intended.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1485868902-20401-3-git-send-email-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-01 08:45:43 +01:00
Lukas Wunner 2bd79f30ee efi: Deduplicate efi_file_size() / _read() / _close()
There's one ARM, one x86_32 and one x86_64 version which can be folded
into a single shared version by masking their differences with the shiny
new efi_call_proto() macro.

No functional change intended.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1485868902-20401-2-git-send-email-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-01 08:45:42 +01:00
Ingo Molnar 7410aa1ca3 x86/boot/e820: Separate the E820 ABI structures from the in-kernel structures
Linus pointed out that relying on the compiler to pack structures with
enums is fragile not just for the kernel, but for external tooling as
well which might rely on our UAPI headers.

So separate the two from each other: introduce 'struct boot_e820_entry',
which is the boot protocol entry format.

This actually simplifies the code, as e820__update_table() is now never
called directly with boot protocol table entries - we can rely on
append_e820_table() and do a e820__update_table() call afterwards.

( This will allow further simplifications of __e820__update_table(),
  but that will be done in a separate patch. )

This change also has the side effect of not modifying the bootparams structure
anymore - which might be useful for debugging. In theory we could even constify
the boot_params structure - at least from the E820 code's point of view.

Remove the uapi/asm/e820/types.h file, as it's not used anymore - all
kernel side E820 types are defined in asm/e820/types.h.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-29 13:39:32 +01:00
Ingo Molnar 09821ff1d5 x86/boot/e820: Prefix the E820_* type names with "E820_TYPE_"
So there's a number of constants that start with "E820" but which
are not types - these create a confusing mixture when seen together
with 'enum e820_type' values:

	E820MAP
	E820NR
	E820_X_MAX
	E820MAX

To better differentiate the 'enum e820_type' values prefix them
with E820_TYPE_.

No change in functionality.

Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28 22:55:22 +01:00
Ingo Molnar 61a5010163 x86/boot/e820: Rename everything to e820_table
No change in functionality.

Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28 09:33:16 +01:00
Ingo Molnar acd4c04872 x86/boot/e820: Rename 'e820_map' variables to 'e820_array'
In line with the rename to 'struct e820_array', harmonize the naming of common e820
table variable names as well:

 e820          =>  e820_array
 e820_saved    =>  e820_array_saved
 e820_map      =>  e820_array
 initial_e820  =>  e820_array_init

This makes the variable names more consistent  and easier to grep for.

No change in functionality.

Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28 09:33:15 +01:00
Ingo Molnar 8ec67d97bf x86/boot/e820: Rename the basic e820 data types to 'struct e820_entry' and 'struct e820_array'
The 'e820entry' and 'e820map' names have various annoyances:

 - the missing underscore departs from the usual kernel style
   and makes the code look weird,

 - in the past I kept confusing the 'map' with the 'entry', because
   a 'map' is ambiguous in that regard,

 - it's not really clear from the 'e820map' that this is a regular
   C array.

Rename them to 'struct e820_entry' and 'struct e820_array' accordingly.

( Leave the legacy UAPI header alone but do the rename in the bootparam.h
  and e820/types.h file - outside tools relying on these defines should
  either adjust their code, or should use the legacy header, or should
  create their private copies for the definitions. )

No change in functionality.

Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28 09:33:14 +01:00
Ingo Molnar 5520b7e7d2 x86/boot/e820: Remove spurious asm/e820/api.h inclusions
A commonly used lowlevel x86 header, asm/pgtable.h, includes asm/e820/api.h
spuriously, without making direct use of it.

Removing it is not simple: over the years various .c code learned to rely
on this indirect inclusion.

Remove the unnecessary include - this should speed up the kernel build a bit,
as a large header is not included anymore in totally unrelated code.

Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28 09:31:14 +01:00
Dave Jiang f28442497b x86/boot: Fix KASLR and memmap= collision
CONFIG_RANDOMIZE_BASE=y relocates the kernel to a random base address.

However it does not take into account the memmap= parameter passed in from
the kernel command line. This results in the kernel sometimes being put in
the middle of memmap.

Teach KASLR to not insert the kernel in memmap defined regions. We support
up to 4 memmap regions: any additional regions will cause KASLR to disable.

The mem_avoid set has been augmented to add up to 4 unusable regions of
memmaps provided by the user to exclude those regions from the set of valid
address range to insert the uncompressed kernel image.

The nn@ss ranges will be skipped by the mem_avoid set since it indicates
that memory is useable.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: dan.j.williams@intel.com
Cc: david@fromorbit.com
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/148417664156.131935.2248592164852799738.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-25 12:35:50 +01:00
Linus Torvalds a9042defa2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial updates from Jiri Kosina.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  NTB: correct ntb_spad_count comment typo
  misc: ibmasm: fix typo in error message
  Remove references to dead make variable LINUX_INCLUDE
  Remove last traces of ikconfig.h
  treewide: Fix printk() message errors
  Documentation/device-mapper: s/getsize/getsz/
2016-12-14 11:12:25 -08:00
Paul Bolle 846221cfb8 Remove references to dead make variable LINUX_INCLUDE
Commit 4fd06960f1 ("Use the new x86 setup code for i386") introduced a
reference to the make variable LINUX_INCLUDE. That reference got moved
around a bit and copied twice and now there are three references to it.

There has never been a definition of that variable. (Presumably that is
because it started out as a mistyped reference to LINUXINCLUDE.) So this
reference has always been an empty string. Let's remove it before it
spreads any further.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-12-14 10:54:28 +01:00
Linus Torvalds 06cc6b969c Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
 "Misc cleanups/simplifications by Borislav Petkov, Paul Bolle and Wei
  Yang"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot/64: Optimize fixmap page fixup
  x86/boot: Simplify the GDTR calculation assembly code a bit
  x86/boot/build: Remove always empty $(USERINCLUDE)
2016-12-12 14:13:30 -08:00
Linus Torvalds 3940cf0b3d Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
 "The main changes in this development cycle were:

   - Implement EFI dev path parser and other changes to fully support
     thunderbolt devices on Apple Macbooks (Lukas Wunner)

   - Add RNG seeding via the EFI stub, on ARM/arm64 (Ard Biesheuvel)

   - Expose EFI framebuffer configuration to user-space, to improve
     tooling (Peter Jones)

   - Misc fixes and cleanups (Ivan Hu, Wei Yongjun, Yisheng Xie, Dan
     Carpenter, Roy Franz)"

* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/libstub: Make efi_random_alloc() allocate below 4 GB on 32-bit
  thunderbolt: Compile on x86 only
  thunderbolt, efi: Fix Kconfig dependencies harder
  thunderbolt, efi: Fix Kconfig dependencies
  thunderbolt: Use Device ROM retrieved from EFI
  x86/efi: Retrieve and assign Apple device properties
  efi: Allow bitness-agnostic protocol calls
  efi: Add device path parser
  efi/arm*/libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table
  efi/libstub: Add random.c to ARM build
  efi: Add support for seeding the RNG from a UEFI config table
  MAINTAINERS: Add ARM and arm64 EFI specific files to EFI subsystem
  efi/libstub: Fix allocation size calculations
  efi/efivar_ssdt_load: Don't return success on allocation failure
  efifb: Show framebuffer layout as device attributes
  efi/efi_test: Use memdup_user() as a cleanup
  efi/efi_test: Fix uninitialized variable 'rv'
  efi/efi_test: Fix uninitialized variable 'datasize'
  efi/arm*: Fix efi_init() error handling
  efi: Remove unused include of <linux/version.h>
2016-12-12 10:03:44 -08:00
H.J. Lu a980ce352f x86/build: Build compressed x86 kernels as PIE when !CONFIG_RELOCATABLE as well
Since the bootloader may load the compressed x86 kernel at any address,
it should always be built as PIE, not just when CONFIG_RELOCATABLE=y.

Otherwise, linker in binutils 2.27 will optimize GOT load into the
absolute address when building the compressed x86 kernel as a non-PIE
executable.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
[ Small wording changes. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-21 11:05:28 +01:00
Lukas Wunner 58c5475aba x86/efi: Retrieve and assign Apple device properties
Apple's EFI drivers supply device properties which are needed to support
Macs optimally. They contain vital information which cannot be obtained
any other way (e.g. Thunderbolt Device ROM). They're also used to convey
the current device state so that OS drivers can pick up where EFI
drivers left (e.g. GPU mode setting).

There's an EFI driver dubbed "AAPL,PathProperties" which implements a
per-device key/value store. Other EFI drivers populate it using a custom
protocol. The macOS bootloader /System/Library/CoreServices/boot.efi
retrieves the properties with the same protocol. The kernel extension
AppleACPIPlatform.kext subsequently merges them into the I/O Kit
registry (see ioreg(8)) where they can be queried by other kernel
extensions and user space.

This commit extends the efistub to retrieve the device properties before
ExitBootServices is called. It assigns them to devices in an fs_initcall
so that they can be queried with the API in <linux/property.h>.

Note that the device properties will only be available if the kernel is
booted with the efistub. Distros should adjust their installers to
always use the efistub on Macs. grub with the "linux" directive will not
work unless the functionality of this commit is duplicated in grub.
(The "linuxefi" directive should work but is not included upstream as of
this writing.)

The custom protocol has GUID 91BD12FE-F6C3-44FB-A5B7-5122AB303AE0 and
looks like this:

typedef struct {
	unsigned long version; /* 0x10000 */
	efi_status_t (*get) (
		IN	struct apple_properties_protocol *this,
		IN	struct efi_dev_path *device,
		IN	efi_char16_t *property_name,
		OUT	void *buffer,
		IN OUT	u32 *buffer_len);
		/* EFI_SUCCESS, EFI_NOT_FOUND, EFI_BUFFER_TOO_SMALL */
	efi_status_t (*set) (
		IN	struct apple_properties_protocol *this,
		IN	struct efi_dev_path *device,
		IN	efi_char16_t *property_name,
		IN	void *property_value,
		IN	u32 property_value_len);
		/* allocates copies of property name and value */
		/* EFI_SUCCESS, EFI_OUT_OF_RESOURCES */
	efi_status_t (*del) (
		IN	struct apple_properties_protocol *this,
		IN	struct efi_dev_path *device,
		IN	efi_char16_t *property_name);
		/* EFI_SUCCESS, EFI_NOT_FOUND */
	efi_status_t (*get_all) (
		IN	struct apple_properties_protocol *this,
		OUT	void *buffer,
		IN OUT	u32 *buffer_len);
		/* EFI_SUCCESS, EFI_BUFFER_TOO_SMALL */
} apple_properties_protocol;

Thanks to Pedro Vilaça for this blog post which was helpful in reverse
engineering Apple's EFI drivers and bootloader:
https://reverse.put.as/2016/06/25/apple-efi-firmware-passwords-and-the-scbo-myth/

If someone at Apple is reading this, please note there's a memory leak
in your implementation of the del() function as the property struct is
freed but the name and value allocations are not.

Neither the macOS bootloader nor Apple's EFI drivers check the protocol
version, but we do to avoid breakage if it's ever changed. It's been the
same since at least OS X 10.6 (2009).

The get_all() function conveniently fills a buffer with all properties
in marshalled form which can be passed to the kernel as a setup_data
payload. The number of device properties is dynamic and can change
between a first invocation of get_all() (to determine the buffer size)
and a second invocation (to retrieve the actual buffer), hence the
peculiar loop which does not finish until the buffer size settles.
The macOS bootloader does the same.

The setup_data payload is later on unmarshalled in an fs_initcall. The
idea is that most buses instantiate devices in "subsys" initcall level
and drivers are usually bound to these devices in "device" initcall
level, so we assign the properties in-between, i.e. in "fs" initcall
level.

This assumes that devices to which properties pertain are instantiated
from a "subsys" initcall or earlier. That should always be the case
since on macOS, AppleACPIPlatformExpert::matchEFIDevicePath() only
supports ACPI and PCI nodes and we've fully scanned those buses during
"subsys" initcall level.

The second assumption is that properties are only needed from a "device"
initcall or later. Seems reasonable to me, but should this ever not work
out, an alternative approach would be to store the property sets e.g. in
a btree early during boot. Then whenever device_add() is called, an EFI
Device Path would have to be constructed for the newly added device,
and looked up in the btree. That way, the property set could be assigned
to the device immediately on instantiation. And this would also work for
devices instantiated in a deferred fashion. It seems like this approach
would be more complicated and require more code. That doesn't seem
justified without a specific use case.

For comparison, the strategy on macOS is to assign properties to objects
in the ACPI namespace (AppleACPIPlatformExpert::mergeEFIProperties()).
That approach is definitely wrong as it fails for devices not present in
the namespace: The NHI EFI driver supplies properties for attached
Thunderbolt devices, yet on Macs with Thunderbolt 1 only one device
level behind the host controller is described in the namespace.
Consequently macOS cannot assign properties for chained devices. With
Thunderbolt 2 they started to describe three device levels behind host
controllers in the namespace but this grossly inflates the SSDT and
still fails if the user daisy-chained more than three devices.

We copy the property names and values from the setup_data payload to
swappable virtual memory and afterwards make the payload available to
the page allocator. This is just for the sake of good housekeeping, it
wouldn't occupy a meaningful amount of physical memory (4444 bytes on my
machine). Only the payload is freed, not the setup_data header since
otherwise we'd break the list linkage and we cannot safely update the
predecessor's ->next link because there's no locking for the list.

The payload is currently not passed on to kexec'ed kernels, same for PCI
ROMs retrieved by setup_efi_pci(). This can be added later if there is
demand by amending setup_efi_state(). The payload can then no longer be
made available to the page allocator of course.

Tested-by: Lukas Wunner <lukas@wunner.de> [MacBookPro9,1]
Tested-by: Pierre Moreau <pierre.morrow@free.fr> [MacBookPro11,3]
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andreas Noever <andreas.noever@gmail.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pedro Vilaça <reverser@put.as>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: grub-devel@gnu.org
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20161112213237.8804-9-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-13 08:23:16 +01:00
Wei Yang 064025f7ea x86/boot: Simplify the GDTR calculation assembly code a bit
This patch calculates the GDTR's base address via a single instruction.

( EBP contains the address where it is loaded and GDTR's base address is
  already set to "gdt" in compilation. It is fine to get the correct base
  address by adding the delta to GDTR's base address. )

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1478015364-5547-1-git-send-email-richard.weiyang@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-07 08:33:59 +01:00
Ingo Molnar 5465fe0fc3 * Refactor the EFI memory map code into architecture neutral files
and allow drivers to permanently reserve EFI boot services regions
    on x86, as well as ARM/arm64 - Matt Fleming
 
  * Add ARM support for the EFI esrt driver - Ard Biesheuvel
 
  * Make the EFI runtime services and efivar API interruptible by
    swapping spinlocks for semaphores - Sylvain Chouleur
 
  * Provide the EFI identity mapping for kexec which allows kexec to
    work on SGI/UV platforms with requiring the "noefi" kernel command
    line parameter - Alex Thorlton
 
  * Add debugfs node to dump EFI page tables on arm64 - Ard Biesheuvel
 
  * Merge the EFI test driver being carried out of tree until now in
    the FWTS project - Ivan Hu
 
  * Expand the list of flags for classifying EFI regions as "RAM" on
    arm64 so we align with the UEFI spec - Ard Biesheuvel
 
  * Optimise out the EFI mixed mode if it's unsupported (CONFIG_X86_32)
    or disabled (CONFIG_EFI_MIXED=n) and switch the early EFI boot
    services function table for direct calls, alleviating us from
    having to maintain the custom function table - Lukas Wunner
 
  * Miscellaneous cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQI2BAABCAAgBQJX0tCTGRxtYXR0QGNvZGVibHVlcHJpbnQuY28udWsACgkQLzhZ
 wI0jPVWLVBAAn/iM91Vmhggdk3t0wCMJzrNGonw61VJ9TZJVbCUJyiH0qdDUThhj
 R4rO+6Vf6yOuyswu+mGmae61tfsjwJHH+IPpB8nRLIGQRwzoxk+aGC7FzmQ0ISVO
 wIdv5shsmeWhFAyNB1D4hzlp1NxOZaqcU/0cfUVGe4HmK0Js3tUpWWx8VgJ7yvW+
 X1PBbfyChArGqiwV6FJz/mJxRAgByUfhvYMcX9HhQkou6F4U5Y8l3vlhUMbuAZAi
 ZfG2LWGYCQ+F4XKxMq2QDAtdUgBzlYWw6W60o55x9WO4cEVSzNVRgedto5o1Zea9
 2QGEr94gim+e5cJ/HeDIEmbWZhAqIdcNDqXSSBd1CDVQytp4PNAn6rxk+2S9kxoe
 T9Mk523HEabo+AZvDAPPJlzcsnIe83JYy69M1xFvcP25ebk7y2BwQtd1jwWPrPDQ
 Q/llzF93aezUFR/guvIw0oHckhQl0ZkNedL9Tq4+UKL0ibp2X4gSX636/x4PkBSP
 5+pyfmO1SAqTiiMQGQMnp4+ngPQeQrxkmVnh1P7cKlTNXg1IoS03t46Xn2Pj10cd
 3KneVDeN9DKIAOn7wPKuPnjTho+9FH36xbwTaIgbt0cWuFFfu090rmqOQfjAJEDN
 foHzsMZ7S6CmeOJnj97NNR8sMQDcc+p9bh1KXpJIHaZAgrKmvqPZpMk=
 =G7L6
 -----END PGP SIGNATURE-----

Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into efi/core

Pull EFI updates from Matt Fleming:

"* Refactor the EFI memory map code into architecture neutral files
   and allow drivers to permanently reserve EFI boot services regions
   on x86, as well as ARM/arm64 - Matt Fleming

 * Add ARM support for the EFI esrt driver - Ard Biesheuvel

 * Make the EFI runtime services and efivar API interruptible by
   swapping spinlocks for semaphores - Sylvain Chouleur

 * Provide the EFI identity mapping for kexec which allows kexec to
   work on SGI/UV platforms with requiring the "noefi" kernel command
   line parameter - Alex Thorlton

 * Add debugfs node to dump EFI page tables on arm64 - Ard Biesheuvel

 * Merge the EFI test driver being carried out of tree until now in
   the FWTS project - Ivan Hu

 * Expand the list of flags for classifying EFI regions as "RAM" on
   arm64 so we align with the UEFI spec - Ard Biesheuvel

 * Optimise out the EFI mixed mode if it's unsupported (CONFIG_X86_32)
   or disabled (CONFIG_EFI_MIXED=n) and switch the early EFI boot
   services function table for direct calls, alleviating us from
   having to maintain the custom function table - Lukas Wunner

 * Miscellaneous cleanups and fixes"

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-13 20:21:55 +02:00
Lukas Wunner 0a637ee612 x86/efi: Allow invocation of arbitrary boot services
We currently allow invocation of 8 boot services with efi_call_early().
Not included are LocateHandleBuffer and LocateProtocol in particular.
For graphics output or to retrieve PCI ROMs and Apple device properties,
we're thus forced to use the LocateHandle + AllocatePool + LocateHandle
combo, which is cumbersome and needs more code.

The ARM folks allow invocation of the full set of boot services but are
restricted to our 8 boot services in functions shared across arches.

Thus, rather than adding just LocateHandleBuffer and LocateProtocol to
struct efi_config, let's rework efi_call_early() to allow invocation of
arbitrary boot services by selecting the 64 bit vs 32 bit code path in
the macro itself.

When compiling for 32 bit or for 64 bit without mixed mode, the unused
code path is optimized away and the binary code is the same as before.
But on 64 bit with mixed mode enabled, this commit adds one compare
instruction to each invocation of a boot service and, depending on the
code path selected, two jump instructions. (Most of the time gcc
arranges the jumps in the 32 bit code path.) The result is a minuscule
performance penalty and the binary code becomes slightly larger and more
difficult to read when disassembled. This isn't a hot path, so these
drawbacks are arguably outweighed by the attainable simplification of
the C code. We have some overhead anyway for thunking or conversion
between calling conventions.

The 8 boot services can consequently be removed from struct efi_config.

No functional change intended (for now).

Example -- invocation of free_pool before (64 bit code path):
0x2d4      movq  %ds:efi_early, %rdx          ; efi_early
0x2db      movq  %ss:arg_0-0x20(%rsp), %rsi
0x2e0      xorl  %eax, %eax
0x2e2      movq  %ds:0x28(%rdx), %rdi         ; efi_early->free_pool
0x2e6      callq *%ds:0x58(%rdx)              ; efi_early->call()

Example -- invocation of free_pool after (64 / 32 bit mixed code path):
0x0dc      movq  %ds:efi_early, %rax          ; efi_early
0x0e3      cmpb  $0, %ds:0x28(%rax)           ; !efi_early->is64 ?
0x0e7      movq  %ds:0x20(%rax), %rdx         ; efi_early->call()
0x0eb      movq  %ds:0x10(%rax), %rax         ; efi_early->boot_services
0x0ef      je    $0x150
0x0f1      movq  %ds:0x48(%rax), %rdi         ; free_pool (64 bit)
0x0f5      xorl  %eax, %eax
0x0f7      callq *%rdx
...
0x150      movl  %ds:0x30(%rax), %edi         ; free_pool (32 bit)
0x153      jmp   $0x0f5

Size of eboot.o text section:
CONFIG_X86_32:                         6464 before, 6318 after
CONFIG_X86_64 && !CONFIG_EFI_MIXED:    7670 before, 7573 after
CONFIG_X86_64 &&  CONFIG_EFI_MIXED:    7670 before, 8319 after

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09 16:08:57 +01:00
Lukas Wunner 15cf7cae08 x86/efi: Remove unused find_bits() function
Left behind by commit fc37206427 ("efi/libstub: Move Graphics Output
Protocol handling to generic code").

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09 16:08:50 +01:00
Colin Ian King ac0e94b63e x86/efi: Initialize status to ensure garbage is not returned on small size
Although very unlikey, if size is too small or zero, then we end up with
status not being set and returning garbage. Instead, initializing status to
EFI_INVALID_PARAMETER to indicate that size is invalid in the calls to
setup_uga32 and setup_uga64.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09 16:08:44 +01:00
Jeffrey Hugo d64934019f x86/efi: Use efi_exit_boot_services()
The eboot code directly calls ExitBootServices.  This is inadvisable as the
UEFI spec details a complex set of errors, race conditions, and API
interactions that the caller of ExitBootServices must get correct.  The
eboot code attempts allocations after calling ExitBootSerives which is
not permitted per the spec.  Call the efi_exit_boot_services() helper
intead, which handles the allocation scenario properly.

Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-05 12:40:16 +01:00
Jeffrey Hugo dadb57abc3 efi/libstub: Allocate headspace in efi_get_memory_map()
efi_get_memory_map() allocates a buffer to store the memory map that it
retrieves.  This buffer may need to be reused by the client after
ExitBootServices() is called, at which point allocations are not longer
permitted.  To support this usecase, provide the allocated buffer size back
to the client, and allocate some additional headroom to account for any
reasonable growth in the map that is likely to happen between the call to
efi_get_memory_map() and the client reusing the buffer.

Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-05 12:18:17 +01:00
Linus Torvalds 77cd3d0c43 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
 "The main changes:

   - add initial commits to randomize kernel memory section virtual
     addresses, enabled via a new kernel option: RANDOMIZE_MEMORY
     (Thomas Garnier, Kees Cook, Baoquan He, Yinghai Lu)

   - enhance KASLR (RANDOMIZE_BASE) physical memory randomization (Kees
     Cook)

   - EBDA/BIOS region boot quirk cleanups (Andy Lutomirski, Ingo Molnar)

   - misc cleanups/fixes"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Simplify EBDA-vs-BIOS reservation logic
  x86/boot: Clarify what x86_legacy_features.reserve_bios_regions does
  x86/boot: Reorganize and clean up the BIOS area reservation code
  x86/mm: Do not reference phys addr beyond kernel
  x86/mm: Add memory hotplug support for KASLR memory randomization
  x86/mm: Enable KASLR for vmalloc memory regions
  x86/mm: Enable KASLR for physical mapping memory regions
  x86/mm: Implement ASLR for kernel memory regions
  x86/mm: Separate variable for trampoline PGD
  x86/mm: Add PUD VA support for physical mapping
  x86/mm: Update physical mapping variable names
  x86/mm: Refactor KASLR entropy functions
  x86/KASLR: Fix boot crash with certain memory configurations
  x86/boot/64: Add forgotten end of function marker
  x86/KASLR: Allow randomization below the load address
  x86/KASLR: Extend kernel image physical address randomization to addresses larger than 4G
  x86/KASLR: Randomize virtual address separately
  x86/KASLR: Clarify identity map interface
  x86/boot: Refuse to build with data relocations
  x86/KASLR, x86/power: Remove x86 hibernation restrictions
2016-07-25 17:32:28 -07:00
Thomas Garnier 021182e52f x86/mm: Enable KASLR for physical mapping memory regions
Add the physical mapping in the list of randomized memory regions.

The physical memory mapping holds most allocations from boot and heap
allocators. Knowing the base address and physical memory size, an attacker
can deduce the PDE virtual address for the vDSO memory page. This attack
was demonstrated at CanSecWest 2016, in the following presentation:

  "Getting Physical: Extreme Abuse of Intel Based Paged Systems":
  https://github.com/n3k/CansecWest2016_Getting_Physical_Extreme_Abuse_of_Intel_Based_Paging_Systems/blob/master/Presentation/CanSec2016_Presentation.pdf

(See second part of the presentation).

The exploits used against Linux worked successfully against 4.6+ but
fail with KASLR memory enabled:

  https://github.com/n3k/CansecWest2016_Getting_Physical_Extreme_Abuse_of_Intel_Based_Paging_Systems/tree/master/Demos/Linux/exploits

Similar research was done at Google leading to this patch proposal.

Variants exists to overwrite /proc or /sys objects ACLs leading to
elevation of privileges. These variants were tested against 4.6+.

The page offset used by the compressed kernel retains the static value
since it is not yet randomized during this boot stage.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1466556426-32664-7-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 17:35:15 +02:00
Thomas Garnier d899a7d146 x86/mm: Refactor KASLR entropy functions
Move the KASLR entropy functions into arch/x86/lib to be used in early
kernel boot for KASLR memory randomization.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1466556426-32664-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 17:33:45 +02:00
Baoquan He 6daa2ec0b3 x86/KASLR: Fix boot crash with certain memory configurations
Ye Xiaolong reported this boot crash:

|
|  XZ-compressed data is corrupt
|
|   -- System halted
|

Fix the bug in mem_avoid_overlap() of finding the earliest overlap.

Reported-and-tested-by: Ye Xiaolong <xiaolong.ye@intel.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 14:36:19 +02:00
Colin Ian King f6d1747f89 x86/efi: Remove unused variable 'efi'
Remove unused variable 'efi', it is never used. This fixes the following
clang build warning:

  arch/x86/boot/compressed/eboot.c:803:2: warning: Value stored to 'efi' is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1466839230-12781-4-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27 13:06:55 +02:00
Yinghai Lu e066cc4777 x86/KASLR: Allow randomization below the load address
Currently the kernel image physical address randomization's lower
boundary is the original kernel load address.

For bootloaders that load kernels into very high memory (e.g. kexec),
this means randomization takes place in a very small window at the
top of memory, ignoring the large region of physical memory below
the load address.

Since mem_avoid[] is already correctly tracking the regions that must be
avoided, this patch changes the minimum address to whatever is less:
512M (to conservatively avoid unknown things in lower memory) or the
load address. Now, for example, if the kernel is loaded at 8G, [512M,
8G) will be added to the list of possible physical memory positions.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[ Rewrote the changelog, refactored the code to use min(). ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1464216334-17200-6-git-send-email-keescook@chromium.org
[ Edited the changelog some more, plus the code comment as well. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-26 12:32:05 +02:00
Kees Cook ed9f007ee6 x86/KASLR: Extend kernel image physical address randomization to addresses larger than 4G
We want the physical address to be randomized anywhere between
16MB and the top of physical memory (up to 64TB).

This patch exchanges the prior slots[] array for the new slot_areas[]
array, and lifts the limitation of KERNEL_IMAGE_SIZE on the physical
address offset for 64-bit. As before, process_e820_entry() walks
memory and populates slot_areas[], splitting on any detected mem_avoid
collisions.

Finally, since the slots[] array and its associated functions are not
needed any more, so they are removed.

Based on earlier patches by Baoquan He.

Originally-from: Baoquan He <bhe@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1464216334-17200-5-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-26 12:32:05 +02:00
Baoquan He 8391c73c96 x86/KASLR: Randomize virtual address separately
The current KASLR implementation randomizes the physical and virtual
addresses of the kernel together (both are offset by the same amount). It
calculates the delta of the physical address where vmlinux was linked
to load and where it is finally loaded. If the delta is not equal to 0
(i.e. the kernel was relocated), relocation handling needs be done.

On 64-bit, this patch randomizes both the physical address where kernel
is decompressed and the virtual address where kernel text is mapped and
will execute from. We now have two values being chosen, so the function
arguments are reorganized to pass by pointer so they can be directly
updated. Since relocation handling only depends on the virtual address,
we must check the virtual delta, not the physical delta for processing
kernel relocations. This also populates the page table for the new
virtual address range. 32-bit does not support a separate virtual address,
so it continues to use the physical offset for its virtual offset.

Additionally updates the sanity checks done on the resulting kernel
addresses since they are potentially separate now.

[kees: rewrote changelog, limited virtual split to 64-bit only, update checks]
[kees: fix CONFIG_RANDOMIZE_BASE=n boot failure]
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1464216334-17200-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-26 12:32:04 +02:00
Kees Cook 11fdf97a3c x86/KASLR: Clarify identity map interface
This extracts the call to prepare_level4() into a top-level function
that the user of the pagetable.c interface must call to initialize
the new page tables. For clarity and to match the "finalize" function,
it has been renamed to initialize_identity_maps(). This function also
gains the initialization of mapping_info so we don't have to do it each
time in add_identity_map().

Additionally add copyright notice to the top, to make it clear that the
bulk of the pagetable.c code was written by Yinghai, and that I just
added bugs later. :)

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1464216334-17200-3-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-26 12:32:04 +02:00
Kees Cook 98f7852537 x86/boot: Refuse to build with data relocations
The compressed kernel is built with -fPIC/-fPIE so that it can run in any
location a bootloader happens to put it. However, since ELF relocation
processing is not happening (and all the relocation information has
already been stripped at link time), none of the code can use data
relocations (e.g. static assignments of pointers). This is already noted
in a warning comment at the top of misc.c, but this adds an explicit
check for the condition during the linking stage to block any such bugs
from appearing.

If this was in place with the earlier bug in pagetable.c, the build
would fail like this:

  ...
    CC      arch/x86/boot/compressed/pagetable.o
    DATAREL arch/x86/boot/compressed/vmlinux
  error: arch/x86/boot/compressed/pagetable.o has data relocations!
  make[2]: *** [arch/x86/boot/compressed/vmlinux] Error 1
  ...

A clean build shows:

  ...
    CC      arch/x86/boot/compressed/pagetable.o
    DATAREL arch/x86/boot/compressed/vmlinux
    LD      arch/x86/boot/compressed/vmlinux
  ...

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1464216334-17200-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-26 12:32:03 +02:00
Kees Cook 65fe935dd2 x86/KASLR, x86/power: Remove x86 hibernation restrictions
With the following fix:

  70595b479ce1 ("x86/power/64: Fix crash whan the hibernation code passes control to the image kernel")

... there is no longer a problem with hibernation resuming a
KASLR-booted kernel image, so remove the restriction.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux PM list <linux-pm@vger.kernel.org>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/20160613221002.GA29719@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-26 12:32:03 +02:00
Linus Torvalds 5b26fc8824 Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild updates from Michal Marek:

 - new option CONFIG_TRIM_UNUSED_KSYMS which does a two-pass build and
   unexports symbols which are not used in the current config [Nicolas
   Pitre]

 - several kbuild rule cleanups [Masahiro Yamada]

 - warning option adjustments for gcov etc [Arnd Bergmann]

 - a few more small fixes

* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (31 commits)
  kbuild: move -Wunused-const-variable to W=1 warning level
  kbuild: fix if_change and friends to consider argument order
  kbuild: fix adjust_autoksyms.sh for modules that need only one symbol
  kbuild: fix ksym_dep_filter when multiple EXPORT_SYMBOL() on the same line
  gcov: disable -Wmaybe-uninitialized warning
  gcov: disable tree-loop-im to reduce stack usage
  gcov: disable for COMPILE_TEST
  Kbuild: disable 'maybe-uninitialized' warning for CONFIG_PROFILE_ALL_BRANCHES
  Kbuild: change CC_OPTIMIZE_FOR_SIZE definition
  kbuild: forbid kernel directory to contain spaces and colons
  kbuild: adjust ksym_dep_filter for some cmd_* renames
  kbuild: Fix dependencies for final vmlinux link
  kbuild: better abstract vmlinux sequential prerequisites
  kbuild: fix call to adjust_autoksyms.sh when output directory specified
  kbuild: Get rid of KBUILD_STR
  kbuild: rename cmd_as_s_S to cmd_cpp_s_S
  kbuild: rename cmd_cc_i_c to cmd_cpp_i_c
  kbuild: drop redundant "PHONY += FORCE"
  kbuild: delete unnecessary "@:"
  kbuild: mark help target as PHONY
  ...
2016-05-26 22:01:22 -07:00
Linus Torvalds 9a45f036af Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
 "The biggest changes in this cycle were:

   - prepare for more KASLR related changes, by restructuring, cleaning
     up and fixing the existing boot code.  (Kees Cook, Baoquan He,
     Yinghai Lu)

   - simplifly/concentrate subarch handling code, eliminate
     paravirt_enabled() usage.  (Luis R Rodriguez)"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
  x86/KASLR: Clarify purpose of each get_random_long()
  x86/KASLR: Add virtual address choosing function
  x86/KASLR: Return earliest overlap when avoiding regions
  x86/KASLR: Add 'struct slot_area' to manage random_addr slots
  x86/boot: Add missing file header comments
  x86/KASLR: Initialize mapping_info every time
  x86/boot: Comment what finalize_identity_maps() does
  x86/KASLR: Build identity mappings on demand
  x86/boot: Split out kernel_ident_mapping_init()
  x86/boot: Clean up indenting for asm/boot.h
  x86/KASLR: Improve comments around the mem_avoid[] logic
  x86/boot: Simplify pointer casting in choose_random_location()
  x86/KASLR: Consolidate mem_avoid[] entries
  x86/boot: Clean up pointer casting
  x86/boot: Warn on future overlapping memcpy() use
  x86/boot: Extract error reporting functions
  x86/boot: Correctly bounds-check relocations
  x86/KASLR: Clean up unused code from old 'run_size' and rename it to 'kernel_total_size'
  x86/boot: Fix "run_size" calculation
  x86/boot: Calculate decompression size during boot not build
  ...
2016-05-16 15:54:01 -07:00
Kees Cook d2d3462f9f x86/KASLR: Clarify purpose of each get_random_long()
KASLR will be calling get_random_long() twice, but the debug output
won't distinguishing between them. This patch adds a report on when it
is fetching the physical vs virtual address. With this, once the virtual
offset is separate, the report changes from:

 KASLR using RDTSC...
 KASLR using RDTSC...

into:

 Physical KASLR using RDTSC...
 Virtual KASLR using RDTSC...

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462825332-10505-7-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-10 10:12:08 +02:00
Baoquan He 071a74930e x86/KASLR: Add virtual address choosing function
To support randomizing the kernel virtual address separately from the
physical address, this patch adds find_random_virt_addr() to choose
a slot anywhere between LOAD_PHYSICAL_ADDR and KERNEL_IMAGE_SIZE.
Since this address is virtual, not physical, we can place the kernel
anywhere in this region, as long as it is aligned and (in the case of
kernel being larger than the slot size) placed with enough room to load
the entire kernel image.

For clarity and readability, find_random_addr() is renamed to
find_random_phys_addr() and has "size" renamed to "image_size" to match
find_random_virt_addr().

Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rewrote changelog, refactored slot calculation for readability. ]
[ Renamed find_random_phys_addr() and size argument. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462825332-10505-6-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-10 10:12:06 +02:00
Kees Cook 06486d6c97 x86/KASLR: Return earliest overlap when avoiding regions
In preparation for being able to detect where to split up contiguous
memory regions that overlap with memory regions to avoid, we need to
pass back what the earliest overlapping region was. This modifies the
overlap checker to return that information.

Based on a separate mem_min_overlap() implementation by Baoquan He.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462825332-10505-5-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-10 10:12:04 +02:00
Baoquan He c401cf1524 x86/KASLR: Add 'struct slot_area' to manage random_addr slots
In order to support KASLR moving the kernel anywhere in physical memory
(which could be up to 64TB), we need to handle counting the potential
randomization locations in a more efficient manner.

In the worst case with 64TB, there could be roughly 32 * 1024 * 1024
randomization slots if CONFIG_PHYSICAL_ALIGN is 0x1000000. Currently
the starting address of candidate positions is stored into the slots[]
array, one at a time. This method would cost too much memory and it's
also very inefficient to get and save the slot information into the slot
array one by one.

This patch introduces 'struct slot_area' to manage each contiguous region
of randomization slots. Each slot_area will contain the starting address
and how many available slots are in this area. As with the original code,
the slot_areas[] will avoid the mem_avoid[] regions.

Since setup_data is a linked list, it could contain an unknown number
of memory regions to be avoided, which could cause us to fragment
the contiguous memory that the slot_area array is tracking. In normal
operation this level of fragmentation will be extremely rare, but we
choose a suitably large value (100) for the array. If setup_data forces
the slot_area array to become highly fragmented and there are more
slots available beyond the first 100 found, the rest will be ignored
for KASLR selection.

The function store_slot_info() is used to calculate the number of slots
available in the passed-in memory region and stores it into slot_areas[]
after adjusting for alignment and size requirements.

Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rewrote changelog, squashed with new functions. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462825332-10505-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-10 10:12:04 +02:00
Kees Cook cb18ef0da2 x86/boot: Add missing file header comments
There were some files with missing header comments. Since they are
included from both compressed and regular kernels, make note of that.
Also corrects a typo in the mem_avoid comments.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462825332-10505-3-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-10 10:12:03 +02:00
Kees Cook 434a6c9f90 x86/KASLR: Initialize mapping_info every time
As it turns out, mapping_info DOES need to be initialized every
time, because pgt_data address could be changed during kernel
relocation. So it can not be build time assigned.

Without this, page tables were not being corrected updated, which
could cause reboots when a physical address beyond 2G was chosen.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462825332-10505-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-10 10:12:02 +02:00
Borislav Petkov 36a39ac967 x86/boot: Comment what finalize_identity_maps() does
So it is not really obvious that finalize_identity_maps() doesn't do any
finalization but it *actually* writes CR3 with the ident PGD. Comment
that at the call site.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: bhe@redhat.com
Cc: dyoung@redhat.com
Cc: jkosina@suse.cz
Cc: linux-tip-commits@vger.kernel.org
Cc: luto@kernel.org
Cc: vgoyal@redhat.com
Cc: yinghai@kernel.org
Link: http://lkml.kernel.org/r/20160507100541.GA24613@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-10 10:12:02 +02:00
Kees Cook 3a94707d7a x86/KASLR: Build identity mappings on demand
Currently KASLR only supports relocation in a small physical range (from
16M to 1G), due to using the initial kernel page table identity mapping.
To support ranges above this, we need to have an identity mapping for the
desired memory range before we can decompress (and later run) the kernel.

32-bit kernels already have the needed identity mapping. This patch adds
identity mappings for the needed memory ranges on 64-bit kernels. This
happens in two possible boot paths:

If loaded via startup_32(), we need to set up the needed identity map.

If loaded from a 64-bit bootloader, the bootloader will have already
set up an identity mapping, and we'll start via the compressed kernel's
startup_64(). In this case, the bootloader's page tables need to be
avoided while selecting the new uncompressed kernel location. If not,
the decompressor could overwrite them during decompression.

To accomplish this, we could walk the pagetable and find every page
that is used, and add them to mem_avoid, but this needs extra code and
will require increasing the size of the mem_avoid array.

Instead, we can create a new set of page tables for our own identity
mapping instead. The pages for the new page table will come from the
_pagetable section of the compressed kernel, which means they are
already contained by in mem_avoid array. To do this, we reuse the code
from the uncompressed kernel's identity mapping routines.

The _pgtable will be shared by both the 32-bit and 64-bit paths to reduce
init_size, as now the compressed kernel's _rodata to _end will contribute
to init_size.

To handle the possible mappings, we need to increase the existing page
table buffer size:

When booting via startup_64(), we need to cover the old VO, params,
cmdline and uncompressed kernel. In an extreme case we could have them
all beyond the 512G boundary, which needs (2+2)*4 pages with 2M mappings.
And we'll need 2 for first 2M for VGA RAM. One more is needed for level4.
This gets us to 19 pages total.

When booting via startup_32(), KASLR could move the uncompressed kernel
above 4G, so we need to create extra identity mappings, which should only
need (2+2) pages at most when it is beyond the 512G boundary. So 19
pages is sufficient for this case as well.

The resulting BOOT_*PGT_SIZE defines use the "_SIZE" suffix on their
names to maintain logical consistency with the existing BOOT_HEAP_SIZE
and BOOT_STACK_SIZE defines.

This patch is based on earlier patches from Yinghai Lu and Baoquan He.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462572095-11754-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-07 07:38:39 +02:00
Kees Cook ed09acde44 x86/KASLR: Improve comments around the mem_avoid[] logic
This attempts to improve the comments that describe how the memory
range used for decompression is avoided. Additionally uses an enum
instead of raw numbers for the mem_avoid[] indexing.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20160506194459.GA16480@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-07 07:38:38 +02:00
Borislav Petkov 549f90db68 x86/boot: Simplify pointer casting in choose_random_location()
Pass them down as 'unsigned long' directly and get rid of more casting and
assignments.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: bhe@redhat.com
Cc: dyoung@redhat.com
Cc: linux-tip-commits@vger.kernel.org
Cc: luto@kernel.org
Cc: vgoyal@redhat.com
Cc: yinghai@kernel.org
Link: http://lkml.kernel.org/r/20160506115015.GI24044@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-07 07:38:38 +02:00
Yinghai Lu 9dc1969c24 x86/KASLR: Consolidate mem_avoid[] entries
The mem_avoid[] array is used to track positions that should be avoided (like
the compressed kernel, decompression code, etc) when selecting a memory
position for the randomly relocated kernel. Since ZO is now at the end of
the decompression buffer and the decompression code (and its heap and
stack) are at the front, we can safely consolidate the decompression entry,
the heap entry, and the stack entry. The boot_params memory, however, could
be elsewhere, so it should be explicitly included.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rwrote changelog, cleaned up code comments. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462486436-3707-3-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-06 09:00:59 +02:00
Kees Cook 2bc1cd39fa x86/boot: Clean up pointer casting
Currently extract_kernel() defines the input and output buffer pointers
as "unsigned char *" since that's effectively what they are. It passes
these to the decompressor routine and to the ELF parser, which both
logically deal with buffer pointers too. There is some casting ("unsigned
long") done to validate the numerical value of the pointers, but it is
relatively limited.

However, choose_random_location() operates almost exclusively on the
numerical representation of these pointers, so it ended up carrying
a lot of "unsigned long" casts. With the future physical/virtual split
these casts were going to multiply, so this attempts to solve the
problem by doing all the casting in choose_random_location()'s entry
and return instead of through-out the code. Adjusts argument names to
be more meaningful, and changes one us of "choice" to "output" to make
the future physical/virtual split more clear (i.e. "choice" should be
strictly a function return value and not used as an intermediate).

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462486436-3707-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-06 09:00:59 +02:00
Kees Cook 00ec2c3703 x86/boot: Warn on future overlapping memcpy() use
If an overlapping memcpy() is ever attempted, we should at least report
it, in case it might lead to problems, so it could be changed to a
memmove() call instead.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Lasse Collin <lasse.collin@tukaani.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1462229461-3370-3-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-03 08:15:58 +02:00
Kees Cook dc425a6e14 x86/boot: Extract error reporting functions
Currently to use warn(), a caller would need to include misc.h. However,
this means they would get the (unavailable during compressed boot)
gcc built-in memcpy family of functions. But since string.c is defining
these memcpy functions for use by misc.c, we end up in a weird circular
dependency.

To break this loop, move the error reporting functions outside of misc.c
with their own header so that they can be independently included by
other sources. Since the screen-writing routines use memmove(), keep the
low-level *_putstr() functions in misc.c.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Lasse Collin <lasse.collin@tukaani.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1462229461-3370-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-03 08:15:58 +02:00
Yinghai Lu 4abf061bf8 x86/boot: Correctly bounds-check relocations
Relocation handling performs bounds checking on the resulting calculated
addresses. The existing code uses output_len (VO size plus relocs size) as
the max address. This is not right since the max_addr check should stop at
the end of VO and exclude bss, brk, etc, which follows.  The valid range
should be VO [_text, __bss_start] in the loaded physical address space.

This patch adds an export for __bss_start in voffset.h and uses it to
set the correct limit for max_addr.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[ Rewrote the changelog. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1461888548-32439-7-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-29 11:03:30 +02:00
Yinghai Lu 4d2d542482 x86/KASLR: Clean up unused code from old 'run_size' and rename it to 'kernel_total_size'
Since 'run_size' is now calculated in misc.c, the old script and associated
argument passing is no longer needed. This patch removes them, and renames
'run_size' to the more descriptive 'kernel_total_size'.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rewrote the changelog, renamed 'run_size' to 'kernel_total_size' ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1461888548-32439-6-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-29 11:03:30 +02:00
Yinghai Lu 67b6662559 x86/boot: Fix "run_size" calculation
Currently, the "run_size" variable holds the total kernel size
(size of code plus brk and bss) and is calculated via the shell script
arch/x86/tools/calc_run_size.sh. It gets the file offset and mem size
of the .bss and .brk sections from the vmlinux, and adds them as follows:

  run_size = $(( $offsetA + $sizeA + $sizeB ))

However, this is not correct (it is too large). To illustrate, here's
a walk-through of the script's calculation, compared to the correct way
to find it.

First, offsetA is found as the starting address of the first .bss or
.brk section seen in the ELF file. The sizeA and sizeB values are the
respective section sizes.

 [bhe@x1 linux]$ objdump -h vmlinux

 vmlinux:     file format elf64-x86-64

 Sections:
 Idx Name    Size      VMA               LMA               File off  Algn
  27 .bss    00170000  ffffffff81ec8000  0000000001ec8000  012c8000  2**12
             ALLOC
  28 .brk    00027000  ffffffff82038000  0000000002038000  012c8000  2**0
             ALLOC

Here, offsetA is 0x012c8000, with sizeA at 0x00170000 and sizeB at
0x00027000. The resulting run_size is 0x145f000:

 0x012c8000 + 0x00170000 + 0x00027000 = 0x145f000

However, if we instead examine the ELF LOAD program headers, we see a
different picture.

 [bhe@x1 linux]$ readelf -l vmlinux

 Elf file type is EXEC (Executable file)
 Entry point 0x1000000
 There are 5 program headers, starting at offset 64

 Program Headers:
  Type        Offset             VirtAddr           PhysAddr
              FileSiz            MemSiz              Flags  Align
  LOAD        0x0000000000200000 0xffffffff81000000 0x0000000001000000
              0x0000000000b5e000 0x0000000000b5e000  R E    200000
  LOAD        0x0000000000e00000 0xffffffff81c00000 0x0000000001c00000
              0x0000000000145000 0x0000000000145000  RW     200000
  LOAD        0x0000000001000000 0x0000000000000000 0x0000000001d45000
              0x0000000000018158 0x0000000000018158  RW     200000
  LOAD        0x000000000115e000 0xffffffff81d5e000 0x0000000001d5e000
              0x000000000016a000 0x0000000000301000  RWE    200000
  NOTE        0x000000000099bcac 0xffffffff8179bcac 0x000000000179bcac
              0x00000000000001bc 0x00000000000001bc         4

 Section to Segment mapping:
  Segment Sections...
   00     .text .notes __ex_table .rodata __bug_table .pci_fixup .tracedata
          __ksymtab __ksymtab_gpl __ksymtab_strings __init_rodata __param
          __modver
   01     .data .vvar
   02     .data..percpu
   03     .init.text .init.data .x86_cpu_dev.init .parainstructions
          .altinstructions .altinstr_replacement .iommu_table .apicdrivers
          .exit.text .smp_locks .bss .brk
   04     .notes

As mentioned, run_size needs to be the size of the running kernel
including .bss and .brk. We can see from the Section/Segment mapping
above that .bss and .brk are included in segment 03 (which corresponds
to the final LOAD program header). To find the run_size, we calculate
the end of the LOAD segment from its PhysAddr start (0x0000000001d5e000)
and its MemSiz (0x0000000000301000), minus the physical load address of
the kernel (the first LOAD segment's PhysAddr: 0x0000000001000000). The
resulting run_size is 0x105f000:

 0x0000000001d5e000 + 0x0000000000301000 - 0x0000000001000000 = 0x105f000

So, from this we can see that the existing run_size calculation is
0x400000 too high. And, as it turns out, the correct run_size is
actually equal to VO_end - VO_text, which is certainly easier to calculate.
_end: 0xffffffff8205f000
_text:0xffffffff81000000

 0xffffffff8205f000 - 0xffffffff81000000 = 0x105f000

As a result, run_size is a simple constant, so we don't need to pass it
around; we already have voffset.h for such things. We can share voffset.h
between misc.c and header.S instead of getting run_size in other ways.
This patch moves voffset.h creation code to boot/compressed/Makefile,
and switches misc.c to use the VO_end - VO_text calculation for run_size.

Dependence before:

 boot/header.S ==> boot/voffset.h ==> vmlinux
 boot/header.S ==> compressed/vmlinux ==> compressed/misc.c

Dependence after:

 boot/header.S ==> compressed/vmlinux ==> compressed/misc.c ==> boot/voffset.h ==> vmlinux

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rewrote the changelog. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: lasse.collin@tukaani.org
Fixes: e6023367d7 ("x86, kaslr: Prevent .bss from overlaping initrd")
Link: http://lkml.kernel.org/r/1461888548-32439-5-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-29 11:03:30 +02:00
Yinghai Lu d607251ba9 x86/boot: Calculate decompression size during boot not build
Currently z_extract_offset is calculated in boot/compressed/mkpiggy.c.
This doesn't work well because mkpiggy.c doesn't know the details of the
decompressor in use. As a result, it can only make an estimation, which
has risks:

 - output + output_len (VO) could be much bigger than input + input_len
   (ZO). In this case, the decompressed kernel plus relocs could overwrite
   the decompression code while it is running.

 - The head code of ZO could be bigger than z_extract_offset. In this case
   an overwrite could happen when the head code is running to move ZO to
   the end of buffer. Though currently the size of the head code is very
   small it's still a potential risk. Since there is no rule to limit the
   size of the head code of ZO, it runs the risk of suddenly becoming a
   (hard to find) bug.

Instead, this moves the z_extract_offset calculation into header.S, and
makes adjustments to be sure that the above two cases can never happen,
and further corrects the comments describing the calculations.

Since we have (in the previous patch) made ZO always be located against
the end of decompression buffer, z_extract_offset is only used here to
calculate an appropriate buffer size (INIT_SIZE), and is not longer used
elsewhere. As such, it can be removed from voffset.h.

Additionally clean up #if/#else #define to improve readability.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rewrote the changelog and comments. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1461888548-32439-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-29 11:03:29 +02:00
Yinghai Lu 974f221c84 x86/boot: Move compressed kernel to the end of the decompression buffer
This change makes later calculations about where the kernel is located
easier to reason about. To better understand this change, we must first
clarify what 'VO' and 'ZO' are. These values were introduced in commits
by hpa:

  77d1a49995 ("x86, boot: make symbols from the main vmlinux available")
  37ba7ab5e3 ("x86, boot: make kernel_alignment adjustable; new bzImage fields")

Specifically:

All names prefixed with 'VO_':

 - relate to the uncompressed kernel image

 - the size of the VO image is: VO__end-VO__text ("VO_INIT_SIZE" define)

All names prefixed with 'ZO_':

 - relate to the bootable compressed kernel image (boot/compressed/vmlinux),
   which is composed of the following memory areas:
     - head text
     - compressed kernel (VO image and relocs table)
     - decompressor code

 - the size of the ZO image is: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE" define, though see below)

The 'INIT_SIZE' value is used to find the larger of the two image sizes:

 #define ZO_INIT_SIZE    (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
 #define VO_INIT_SIZE    (VO__end - VO__text)

 #if ZO_INIT_SIZE > VO_INIT_SIZE
 # define INIT_SIZE ZO_INIT_SIZE
 #else
 # define INIT_SIZE VO_INIT_SIZE
 #endif

The current code uses extract_offset to decide where to position the
copied ZO (i.e. ZO starts at extract_offset). (This is why ZO_INIT_SIZE
currently includes the extract_offset.)

Why does z_extract_offset exist? It's needed because we are trying to minimize
the amount of RAM used for the whole act of creating an uncompressed, executable,
properly relocation-linked kernel image in system memory. We do this so that
kernels can be booted on even very small systems.

To achieve the goal of minimal memory consumption we have implemented an in-place
decompression strategy: instead of cleanly separating the VO and ZO images and
also allocating some memory for the decompression code's runtime needs, we instead
create this elaborate layout of memory buffers where the output (decompressed)
stream, as it progresses, overlaps with and destroys the input (compressed)
stream. This can only be done safely if the ZO image is placed to the end of the
VO range, plus a certain amount of safety distance to make sure that when the last
bytes of the VO range are decompressed, the compressed stream pointer is safely
beyond the end of the VO range.

z_extract_offset is calculated in arch/x86/boot/compressed/mkpiggy.c during
the build process, at a point when we know the exact compressed and
uncompressed size of the kernel images and can calculate this safe minimum
offset value. (Note that the mkpiggy.c calculation is not perfect, because
we don't know the decompressor used at that stage, so the z_extract_offset
calculation is necessarily imprecise and is mostly based on gzip internals -
we'll improve that in the next patch.)

When INIT_SIZE is bigger than VO_INIT_SIZE (uncommon but possible),
the copied ZO occupies the memory from extract_offset to the end of
decompression buffer. It overlaps with the soon-to-be-uncompressed kernel
like this:

                            |-----compressed kernel image------|
                            V                                  V
0                       extract_offset                      +INIT_SIZE
|-----------|---------------|-------------------------|--------|
            |               |                         |        |
          VO__text      startup_32 of ZO          VO__end    ZO__end
            ^                                         ^
            |-------uncompressed kernel image---------|

When INIT_SIZE is equal to VO_INIT_SIZE (likely) there's still space
left from end of ZO to the end of decompressing buffer, like below.

                            |-compressed kernel image-|
                            V                         V
0                       extract_offset                      +INIT_SIZE
|-----------|---------------|-------------------------|--------|
            |               |                         |        |
          VO__text      startup_32 of ZO          ZO__end    VO__end
            ^                                                  ^
            |------------uncompressed kernel image-------------|

To simplify calculations and avoid special cases, it is cleaner to
always place the compressed kernel image in memory so that ZO__end
is at the end of the decompression buffer, instead of placing t at
the start of extract_offset as is currently done.

This patch adds BP_init_size (which is the INIT_SIZE as passed in from
the boot_params) into asm-offsets.c to make it visible to the assembly
code.

Then when moving the ZO, it calculates the starting position of
the copied ZO (via BP_init_size and the ZO run size) so that the VO__end
will be at the end of the decompression buffer. To make the position
calculation safe, the end of ZO is page aligned (and a comment is added
to the existing VO alignment for good measure).

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[ Rewrote changelog and comments. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1461888548-32439-3-git-send-email-keescook@chromium.org
[ Rewrote the changelog some more. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-29 11:03:29 +02:00
Baoquan He 6f9af75faa x86/KASLR: Handle kernel relocations above 2G correctly
When processing the relocation table, the offset used to calculate the
relocation is an 'int'. This is sufficient for calculating the physical
address of the relocs entry on 32-bit systems and on 64-bit systems when
the relocation is under 2G.

To handle relocations above 2G (seen in situations like kexec, netboot, etc),
this offset needs to be calculated using a 'long' to avoid wrapping and
miscalculating the relocation.

Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rewrote the changelog. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1461888548-32439-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-29 09:58:26 +02:00
Ard Biesheuvel fc37206427 efi/libstub: Move Graphics Output Protocol handling to generic code
The Graphics Output Protocol code executes in the stub, so create a generic
version based on the x86 version in libstub so that we can move other archs
to it in subsequent patches. The new source file gop.c is added to the
libstub build for all architectures, but only wired up for x86.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-18-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28 11:33:57 +02:00
Ard Biesheuvel 2c23b73c2d x86/efi: Prepare GOP handling code for reuse as generic code
In preparation of moving this code to drivers/firmware/efi and reusing
it on ARM and arm64, apply any changes that will be required to make this
code build for other architectures. This should make it easier to track
down problems that this move may cause to its operation on x86.

Note that the generic version uses slightly different ways of casting the
protocol methods and some other variables to the correct types, since such
method calls are not loosely typed on ARM and arm64 as they are on x86.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-17-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28 11:33:56 +02:00
Kees Cook 81b785f3e4 x86/boot: Rename overlapping memcpy() to memmove()
Instead of having non-standard memcpy() behavior, explicitly call the new
function memmove(), make it available to the decompressors, and switch
the two overlap cases (screen scrolling and ELF parsing) to use memmove().
Additionally documents the purpose of compressed/string.c.

Suggested-by: Lasse Collin <lasse.collin@tukaani.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20160426214606.GA5758@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28 11:02:29 +02:00
Kees Cook 0f8ede1b8c x86/KASLR: Warn when KASLR is disabled
If KASLR is built in but not available at run-time (either due to the
current conflict with hibernation, command-line request, or e820 parsing
failures), announce the state explicitly. To support this, a new "warn"
function is created, based on the existing "error" function.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1461185746-8017-6-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-22 10:00:51 +02:00
Kees Cook bf0118dbba x86/boot: Make memcpy() handle overlaps
Two uses of memcpy() (screen scrolling and ELF parsing) were handling
overlapping memory areas. While there were no explicitly noticed bugs
here (yet), it is best to fix this so that the copying will always be
safe.

Instead of making a new memmove() function that might collide with other
memmove() definitions in the decompressors, this just makes the compressed
boot code's copy of memcpy() overlap-safe.

Suggested-by: Lasse Collin <lasse.collin@tukaani.org>
Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1461185746-8017-5-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-22 10:00:50 +02:00
Kees Cook 1f208de37d x86/boot: Clean up things used by decompressors
This rearranges the pieces needed to include the decompressor code
in misc.c. It wasn't obvious why things were there, so a comment was
added and definitions consolidated.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1461185746-8017-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-22 10:00:50 +02:00
Baoquan He e8581e3d67 x86/KASLR: Drop CONFIG_RANDOMIZE_BASE_MAX_OFFSET
Currently CONFIG_RANDOMIZE_BASE_MAX_OFFSET is used to limit the maximum
offset for kernel randomization. This limit doesn't need to be a CONFIG
since it is tied completely to KERNEL_IMAGE_SIZE, and will make no sense
once physical and virtual offsets are randomized separately. This patch
removes CONFIG_RANDOMIZE_BASE_MAX_OFFSET and consolidates the Kconfig
help text.

[kees: rewrote changelog, dropped KERNEL_IMAGE_SIZE_DEFAULT, rewrote help]
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1461185746-8017-3-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-22 10:00:50 +02:00
Baoquan He 4252db1055 x86/KASLR: Update description for decompressor worst case size
The comment that describes the analysis for the size of the decompressor
code only took gzip into account (there are currently 6 other decompressors
that could be used). The actual z_extract_offset calculation in code was
already handling the correct maximum size, but this documentation hadn't
been updated. This updates the documentation, fixes several typos, moves
the comment to header.S, updates references, and adds a note at the end
of the decompressor include list to remind us about updating the comment
in the future.

(Instead of moving the comment to mkpiggy.c, where the calculation
is currently happening, it is being moved to header.S because
the calculations in mkpiggy.c will be removed in favor of header.S
calculations in a following patch, and it seemed like overkill to move
the giant comment twice, especially when there's already reference to
z_extract_offset in header.S.)

Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rewrote changelog, cleaned up comment style, moved comments around. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1461185746-8017-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-22 10:00:50 +02:00
Masahiro Yamada be1fb0e8eb kbuild: delete unnecessary "@:"
Since commit 2aedcd098a ('kbuild: suppress annoying "... is up to
date." message'), $(call if_changed,...) is evaluated to "@:"
when there is nothing to do.

We no longer need to add "@:" after $(call if_changed,...) to
suppress "... is up to date." message.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Michal Marek <mmarek@suse.com>
2016-04-20 10:36:57 +02:00
Kees Cook 9016875df4 x86/KASLR: Rename "random" to "random_addr"
The variable "random" is also the name of a libc function. It's better
coding style to avoid overloading such things, so rename it to the more
accurate "random_addr".

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1460997735-24785-7-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-19 10:30:51 +02:00
Kees Cook 7de828dfe6 x86/KASLR: Clarify purpose of kaslr.c
The name "choose_kernel_location" isn't specific enough, and doesn't
describe the primary thing it does: choosing a random location. This
patch renames it to "choose_random_location", and clarifies the what
routines are contained in the kaslr.c source file.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1460997735-24785-6-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-19 10:30:51 +02:00
Kees Cook c040288132 x86/boot: Clarify purpose of functions in misc.c
The function "decompress_kernel" now performs many more duties, so this
patch renames it to "extract_kernel" and updates callers and comments.
Additionally the file header comment for misc.c is improved to actually
describe what is contained.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1460997735-24785-5-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-19 10:30:51 +02:00
Kees Cook 6655e0aaf7 x86/boot: Rename "real_mode" to "boot_params"
The non-compressed boot code uses the (much more obvious) name
"boot_params" for the global pointer to the x86 boot parameters. The
compressed kernel loader code, though, was using the legacy name
"real_mode". There is no need to have a different name, and changing it
improves readability.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1460997735-24785-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-19 10:30:50 +02:00
Yinghai Lu 206f25a831 x86/KASLR: Remove unneeded boot_params argument
Since the boot_params can be found using the real_mode global variable,
there is no need to pass around a pointer to it. This slightly simplifies
the choose_kernel_location function and its callers.

[kees: rewrote changelog, tracked file rename]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1460997735-24785-3-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-19 10:30:50 +02:00
Kees Cook 9b238748cb x86/KASLR: Rename aslr.c to kaslr.c
In order to avoid confusion over what this file provides, rename it to
kaslr.c since it is used exclusively for the kernel ASLR, not userspace
ASLR.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1460997735-24785-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-19 10:30:50 +02:00
H.J. Lu 6d92bc9d48 x86/build: Build compressed x86 kernels as PIE
The 32-bit x86 assembler in binutils 2.26 will generate R_386_GOT32X
relocation to get the symbol address in PIC.  When the compressed x86
kernel isn't built as PIC, the linker optimizes R_386_GOT32X relocations
to their fixed symbol addresses.  However, when the compressed x86
kernel is loaded at a different address, it leads to the following
load failure:

  Failed to allocate space for phdrs

during the decompression stage.

If the compressed x86 kernel is relocatable at run-time, it should be
compiled with -fPIE, instead of -fPIC, if possible and should be built as
Position Independent Executable (PIE) so that linker won't optimize
R_386_GOT32X relocation to its fixed symbol address.

Older linkers generate R_386_32 relocations against locally defined
symbols, _bss, _ebss, _got and _egot, in PIE.  It isn't wrong, just less
optimal than R_386_RELATIVE.  But the x86 kernel fails to properly handle
R_386_32 relocations when relocating the kernel.  To generate
R_386_RELATIVE relocations, we mark _bss, _ebss, _got and _egot as
hidden in both 32-bit and 64-bit x86 kernels.

To build a 64-bit compressed x86 kernel as PIE, we need to disable the
relocation overflow check to avoid relocation overflow errors. We do
this with a new linker command-line option, -z noreloc-overflow, which
got added recently:

 commit 4c10bbaa0912742322f10d9d5bb630ba4e15dfa7
 Author: H.J. Lu <hjl.tools@gmail.com>
 Date:   Tue Mar 15 11:07:06 2016 -0700

    Add -z noreloc-overflow option to x86-64 ld

    Add -z noreloc-overflow command-line option to the x86-64 ELF linker to
    disable relocation overflow check.  This can be used to avoid relocation
    overflow check if there will be no dynamic relocation overflow at
    run-time.

The 64-bit compressed x86 kernel is built as PIE only if the linker supports
-z noreloc-overflow.  So far 64-bit relocatable compressed x86 kernel
boots fine even when it is built as a normal executable.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
[ Edited the changelog and comments. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-29 12:51:12 +02:00
Dmitry Vyukov 5c9a8750a6 kernel: add kcov code coverage
kcov provides code coverage collection for coverage-guided fuzzing
(randomized testing).  Coverage-guided fuzzing is a testing technique
that uses coverage feedback to determine new interesting inputs to a
system.  A notable user-space example is AFL
(http://lcamtuf.coredump.cx/afl/).  However, this technique is not
widely used for kernel testing due to missing compiler and kernel
support.

kcov does not aim to collect as much coverage as possible.  It aims to
collect more or less stable coverage that is function of syscall inputs.
To achieve this goal it does not collect coverage in soft/hard
interrupts and instrumentation of some inherently non-deterministic or
non-interesting parts of kernel is disbled (e.g.  scheduler, locking).

Currently there is a single coverage collection mode (tracing), but the
API anticipates additional collection modes.  Initially I also
implemented a second mode which exposes coverage in a fixed-size hash
table of counters (what Quentin used in his original patch).  I've
dropped the second mode for simplicity.

This patch adds the necessary support on kernel side.  The complimentary
compiler support was added in gcc revision 231296.

We've used this support to build syzkaller system call fuzzer, which has
found 90 kernel bugs in just 2 months:

  https://github.com/google/syzkaller/wiki/Found-Bugs

We've also found 30+ bugs in our internal systems with syzkaller.
Another (yet unexplored) direction where kcov coverage would greatly
help is more traditional "blob mutation".  For example, mounting a
random blob as a filesystem, or receiving a random blob over wire.

Why not gcov.  Typical fuzzing loop looks as follows: (1) reset
coverage, (2) execute a bit of code, (3) collect coverage, repeat.  A
typical coverage can be just a dozen of basic blocks (e.g.  an invalid
input).  In such context gcov becomes prohibitively expensive as
reset/collect coverage steps depend on total number of basic
blocks/edges in program (in case of kernel it is about 2M).  Cost of
kcov depends only on number of executed basic blocks/edges.  On top of
that, kernel requires per-thread coverage because there are always
background threads and unrelated processes that also produce coverage.
With inlined gcov instrumentation per-thread coverage is not possible.

kcov exposes kernel PCs and control flow to user-space which is
insecure.  But debugfs should not be mapped as user accessible.

Based on a patch by Quentin Casasnovas.

[akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
[akpm@linux-foundation.org: unbreak allmodconfig]
[akpm@linux-foundation.org: follow x86 Makefile layout standards]
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tavis Ormandy <taviso@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@google.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: David Drysdale <drysdale@google.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00
Josh Poimboeuf c0dd671686 objtool: Mark non-standard object files and directories
Code which runs outside the kernel's normal mode of operation often does
unusual things which can cause a static analysis tool like objtool to
emit false positive warnings:

 - boot image
 - vdso image
 - relocation
 - realmode
 - efi
 - head
 - purgatory
 - modpost

Set OBJECT_FILES_NON_STANDARD for their related files and directories,
which will tell objtool to skip checking them.  It's ok to skip them
because they don't affect runtime stack traces.

Also skip the following code which does the right thing with respect to
frame pointers, but is too "special" to be validated by a tool:

 - entry
 - mcount

Also skip the test_nx module because it modifies its exception handling
table at runtime, which objtool can't understand.  Fortunately it's
just a test module so it doesn't matter much.

Currently objtool is the only user of OBJECT_FILES_NON_STANDARD, but it
might eventually be useful for other tools.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris J Arges <chris.j.arges@canonical.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/366c080e3844e8a5b6a0327dc7e8c2b90ca3baeb.1456719558.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 08:35:02 +01:00
Andrey Ryabinin c6d308534a UBSAN: run-time undefined behavior sanity checker
UBSAN uses compile-time instrumentation to catch undefined behavior
(UB).  Compiler inserts code that perform certain kinds of checks before
operations that could cause UB.  If check fails (i.e.  UB detected)
__ubsan_handle_* function called to print error message.

So the most of the work is done by compiler.  This patch just implements
ubsan handlers printing errors.

GCC has this capability since 4.9.x [1] (see -fsanitize=undefined
option and its suboptions).
However GCC 5.x has more checkers implemented [2].
Article [3] has a bit more details about UBSAN in the GCC.

[1] - https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/Debugging-Options.html
[2] - https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html
[3] - http://developerblog.redhat.com/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/

Issues which UBSAN has found thus far are:

Found bugs:

 * out-of-bounds access - 97840cb67f ("netfilter: nfnetlink: fix
   insufficient validation in nfnetlink_bind")

undefined shifts:

 * d48458d4a7 ("jbd2: use a better hash function for the revoke
   table")

 * 10632008b9 ("clockevents: Prevent shift out of bounds")

 * 'x << -1' shift in ext4 -
   http://lkml.kernel.org/r/<5444EF21.8020501@samsung.com>

 * undefined rol32(0) -
   http://lkml.kernel.org/r/<1449198241-20654-1-git-send-email-sasha.levin@oracle.com>

 * undefined dirty_ratelimit calculation -
   http://lkml.kernel.org/r/<566594E2.3050306@odin.com>

 * undefined roundown_pow_of_two(0) -
   http://lkml.kernel.org/r/<1449156616-11474-1-git-send-email-sasha.levin@oracle.com>

 * [WONTFIX] undefined shift in __bpf_prog_run -
   http://lkml.kernel.org/r/<CACT4Y+ZxoR3UjLgcNdUm4fECLMx2VdtfrENMtRRCdgHB2n0bJA@mail.gmail.com>

   WONTFIX here because it should be fixed in bpf program, not in kernel.

signed overflows:

 * 32a8df4e0b ("sched: Fix odd values in effective_load()
   calculations")

 * mul overflow in ntp -
   http://lkml.kernel.org/r/<1449175608-1146-1-git-send-email-sasha.levin@oracle.com>

 * incorrect conversion into rtc_time in rtc_time64_to_tm() -
   http://lkml.kernel.org/r/<1449187944-11730-1-git-send-email-sasha.levin@oracle.com>

 * unvalidated timespec in io_getevents() -
   http://lkml.kernel.org/r/<CACT4Y+bBxVYLQ6LtOKrKtnLthqLHcw-BMp3aqP3mjdAvr9FULQ@mail.gmail.com>

 * [NOTABUG] signed overflow in ktime_add_safe() -
   http://lkml.kernel.org/r/<CACT4Y+aJ4muRnWxsUe1CMnA6P8nooO33kwG-c8YZg=0Xc8rJqw@mail.gmail.com>

[akpm@linux-foundation.org: fix unused local warning]
[akpm@linux-foundation.org: fix __int128 build woes]
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Yury Gribov <y.gribov@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-20 17:09:18 -08:00
Ingo Molnar 790a2ee242 * Make the EFI System Resource Table (ESRT) driver explicitly
non-modular by ripping out the module_* code since Kconfig doesn't
    allow it to be built as a module anyway - Paul Gortmaker
 
  * Make the x86 efi=debug kernel parameter, which enables EFI debug
    code and output, generic and usable by arm64 - Leif Lindholm
 
  * Add support to the x86 EFI boot stub for 64-bit Graphics Output
    Protocol frame buffer addresses - Matt Fleming
 
  * Detect when the UEFI v2.5 EFI_PROPERTIES_TABLE feature is enabled
    in the firmware and set an efi.flags bit so the kernel knows when
    it can apply more strict runtime mapping attributes - Ard Biesheuvel
 
  * Auto-load the efi-pstore module on EFI systems, just like we
    currently do for the efivars module - Ben Hutchings
 
  * Add "efi_fake_mem" kernel parameter which allows the system's EFI
    memory map to be updated with additional attributes for specific
    memory ranges. This is useful for testing the kernel code that handles
    the EFI_MEMORY_MORE_RELIABLE memmap bit even if your firmware
    doesn't include support - Taku Izumi
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWG7OwAAoJEC84WcCNIz1VEEEP/0SsdrwJ66B4MfP5YNjqHYWm
 +OTHR6Ovv2i10kc+NjOV/GN8sWPndnkLfIfJ4EqJ9BoQ9PDEYZilV2aleSQ4DrPm
 H7uGwBXQkfd76tZKX9pMToK76mkhg6M7M2LR3Suv3OGfOEzuozAOt3Ez37lpksTN
 2ByhHr/oGbhu99jC2ki5+k0ySH8PMqDBRxqrPbBzTD+FfB7bM11vAJbSNbSMQ21R
 ZwX0acZBLqb9J2Vf7tDsW+fCfz0TFo8JHW8jdLRFm/y2dpquzxswkkBpODgA8+VM
 0F5UbiUdkaIRug75I6N/OJ8+yLwdzuxm7ul+tbS3JrXGLAlK3850+dP2Pr5zQ2Ce
 zaYGRUy+tD5xMXqOKgzpu+Ia8XnDRLhOlHabiRd5fG6ZC9nR8E9uK52g79voSN07
 pADAJnVB03CGV/HdduDOI4C4UykUKubuArbQVkqWJcecV1Jic/tYI0gjeACmU1VF
 v8FzXpBUe3U3A0jauOz8PBz8M+k5qky/GbIrnEvXreBtKdt999LN9fykTN7rBOpo
 dk/6vTR1Jyv3aYc9EXHmRluktI6KmfWCqmRBOIgQveX1VhdRM+1w2LKC0+8co3dF
 v/DBh19KDyfPI8eOvxKykhn164UeAt03EXqDa46wFGr2nVOm/JiShL/d+QuyYU4G
 8xb/rET4JrhCG4gFMUZ7
 =1Oee
 -----END PGP SIGNATURE-----

Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into core/efi

Pull v4.4 EFI updates from Matt Fleming:

  - Make the EFI System Resource Table (ESRT) driver explicitly
    non-modular by ripping out the module_* code since Kconfig doesn't
    allow it to be built as a module anyway. (Paul Gortmaker)

  - Make the x86 efi=debug kernel parameter, which enables EFI debug
    code and output, generic and usable by arm64. (Leif Lindholm)

  - Add support to the x86 EFI boot stub for 64-bit Graphics Output
    Protocol frame buffer addresses. (Matt Fleming)

  - Detect when the UEFI v2.5 EFI_PROPERTIES_TABLE feature is enabled
    in the firmware and set an efi.flags bit so the kernel knows when
    it can apply more strict runtime mapping attributes - Ard Biesheuvel

  - Auto-load the efi-pstore module on EFI systems, just like we
    currently do for the efivars module. (Ben Hutchings)

  - Add "efi_fake_mem" kernel parameter which allows the system's EFI
    memory map to be updated with additional attributes for specific
    memory ranges. This is useful for testing the kernel code that handles
    the EFI_MEMORY_MORE_RELIABLE memmap bit even if your firmware
    doesn't include support. (Taku Izumi)

Note: there is a semantic conflict between the following two commits:

  8a53554e12 ("x86/efi: Fix multiple GOP device support")
  ae2ee627dc ("efifb: Add support for 64-bit frame buffer addresses")

I fixed up the interaction in the merge commit, changing the type of
current_fb_base from u32 to u64.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-14 16:51:34 +02:00
Kővágó, Zoltán 8a53554e12 x86/efi: Fix multiple GOP device support
When multiple GOP devices exists, but none of them implements
ConOut, the code should just choose the first GOP (according to
the comments). But currently 'fb_base' will refer to the last GOP,
while other parameters to the first GOP, which will likely
result in a garbled display.

I can reliably reproduce this bug using my ASRock Z87M Extreme4
motherboard with CSM and integrated GPU disabled, and two PCIe
video cards (NVidia GT640 and GTX980), booting from efi-stub
(booting from grub works fine).  On the primary display the
ASRock logo remains and on the secondary screen it is garbled
up completely.

Signed-off-by: Kővágó, Zoltán <DirtY.iCE.hu@gmail.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1444659236-24837-2-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-14 16:02:43 +02:00
Matt Fleming ae2ee627dc efifb: Add support for 64-bit frame buffer addresses
The EFI Graphics Output Protocol uses 64-bit frame buffer addresses
but these get truncated to 32-bit by the EFI boot stub when storing
the address in the 'lfb_base' field of 'struct screen_info'.

Add a 'ext_lfb_base' field for the upper 32-bits of the frame buffer
address and set VIDEO_TYPE_CAPABILITY_64BIT_BASE when the field is
useable.

It turns out that the reason no one has required this support so far
is that there's actually code in tianocore to "downgrade" PCI
resources that have option ROMs and 64-bit BARS from 64-bit to 32-bit
to cope with legacy option ROMs that can't handle 64-bit addresses.
The upshot is that basically all GOP devices in the wild use a 32-bit
frame buffer address.

Still, it is possible to build firmware that uses a full 64-bit GOP
frame buffer address. Chad did, which led to him reporting this issue.

Add support in anticipation of GOP devices using 64-bit addresses more
widely, and so that efifb works out of the box when that happens.

Reported-by: Chad Page <chad.page@znyx.com>
Cc: Pete Hawkins <pete.hawkins@znyx.com>
Acked-by: Peter Jones <pjones@redhat.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2015-10-12 14:20:06 +01:00
Yinghai Lu 2d3862d26e lib/decompressors: use real out buf size for gunzip with kernel
When loading x86 64bit kernel above 4GiB with patched grub2, got kernel
gunzip error.

| early console in decompress_kernel
| decompress_kernel:
|       input: [0x807f2143b4-0x807ff61aee]
|      output: [0x807cc00000-0x807f3ea29b] 0x027ea29c: output_len
| boot via startup_64
| KASLR using RDTSC...
|  new output: [0x46fe000000-0x470138cfff] 0x0338d000: output_run_size
|  decompress: [0x46fe000000-0x47007ea29b] <=== [0x807f2143b4-0x807ff61aee]
|
| Decompressing Linux... gz...
|
| uncompression error
|
| -- System halted

the new buffer is at 0x46fe000000ULL, decompressor_gzip is using
0xffffffb901ffffff as out_len.  gunzip in lib/zlib_inflate/inflate.c cap
that len to 0x01ffffff and decompress fails later.

We could hit this problem with crashkernel booting that uses kexec loading
kernel above 4GiB.

We have decompress_* support:
    1. inbuf[]/outbuf[] for kernel preboot.
    2. inbuf[]/flush() for initramfs
    3. fill()/flush() for initrd.
This bug only affect kernel preboot path that use outbuf[].

Add __decompress and take real out_buf_len for gunzip instead of guessing
wrong buf size.

Fixes: 1431574a1c (lib/decompressors: fix "no limit" output buffer length)
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Alexandre Courbot <acourbot@nvidia.com>
Cc: Jon Medhurst <tixy@linaro.org>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Linus Torvalds 11e612ddb4 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
 "The main x86 bootup related changes in this cycle were:

   - more boot time optimizations.  (Len Brown)

   - implement hex output to allow the debugging of early bootup
     parameters.  (Kees Cook)

   - remove obsolete MCA leftovers.  (Paolo Pisati)"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/smpboot: Remove APIC.wait_for_init_deassert and atomic init_deasserted
  x86/smpboot: Remove SIPI delays from cpu_up()
  x86/smpboot: Remove udelay(100) when polling cpu_callin_map
  x86/smpboot: Remove udelay(100) when polling cpu_initialized_map
  x86/boot: Obsolete the MCA sys_desc_table
  x86/boot: Add hex output for debugging
2015-09-01 09:04:31 -07:00
Ingo Molnar 5461bd81bf Linux 4.2-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJV0R4AAAoJEHm+PkMAQRiG8xIH/AmiRd+JDrs0qqEy46p6X8Gn
 0lB5/KsGycvIGIBTiy2nZzcT0Ly6LeFUKUjzPytlOhIZPMrxMVMShDaQKCXXIMUr
 1mN6hkvpkLNnUhvL2fR6mm0zkjbz3zZEazFY+Jic8wQrtSkHgfH0DXqSAo8le0f8
 kNrd5BPPhIwvpHGaNGFdTpbgpPcalXyQk/fHyvDGidbyXzY/d7l05QfYJ6XCD4Zm
 IAy48iK5BFts2+z3aOYrOeuuCcm1qFX8YArqzE1rfPp+U/LQpfUfij4cmOqDLn/F
 qnv9E7bRRVovvrgKe4I3Trta8kT53VLJvqpdw2Usqo8zvhs4VyrYpHC+gEE6YUY=
 =9Rd4
 -----END PGP SIGNATURE-----

Merge tag 'v4.2-rc7' into x86/boot, to refresh the branch before merging new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-17 10:41:59 +02:00
Ingo Molnar 5b929bd11d Merge branch 'x86/urgent' into x86/asm, before applying dependent patches
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 10:23:35 +02:00
Dmitry Skorodumov 7cc03e4896 x86/efi: Use all 64 bit of efi_memmap in setup_e820()
The efi_info structure stores low 32 bits of memory map
in efi_memmap and high 32 bits in efi_memmap_hi.

While constructing pointer in the setup_e820(), need
to take into account all 64 bit of the pointer.

It is because on 64bit machine the function
efi_get_memory_map() may return full 64bit pointer and before
the patch that pointer was truncated.

The issue is triggered on Parallles virtual machine and
fixed with this patch.

Signed-off-by: Dmitry Skorodumov <sdmitry@parallels.com>
Cc: Denis V. Lunev <den@openvz.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2015-07-30 18:07:10 +01:00
Paolo Pisati 949163015c x86/boot: Obsolete the MCA sys_desc_table
The kernel does not support the MCA bus anymroe, so mark sys_desc_table
as obsolete: remove any reference from the code together with the remaining
of MCA logic.

bloat-o-meter output:

  add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-55 (-55)
  function                                     old     new   delta
  i386_start_kernel                            128     119      -9
  setup_arch                                  1421    1375     -46

Signed-off-by: Paolo Pisati <p.pisati@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437409430-8491-1-git-send-email-p.pisati@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-21 10:55:11 +02:00
Kees Cook 79063a7c02 x86/boot: Add hex output for debugging
This is useful for reporting various addresses or other values
while debugging early boot, for example, the recent kernel image
size vs kernel run size. For example, when
CONFIG_X86_VERBOSE_BOOTUP is set, this is now visible at boot
time:

	early console in setup code
	early console in decompress_kernel
	input_data: 0x0000000001e1526e
	input_len: 0x0000000000732236
	output: 0x0000000001000000
	output_len: 0x0000000001535640
	run_size: 0x00000000021fb000
	KASLR using RDTSC...

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Joe Perches <joe@perches.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20150706230620.GA17501@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-07 08:59:05 +02:00
Andy Lutomirski 4ea1636b04 x86/asm/tsc: Rename native_read_tsc() to rdtsc()
Now that there is no paravirt TSC, the "native" is
inappropriate. The function does RDTSC, so give it the obvious
name: rdtsc().

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/fd43e16281991f096c1e4d21574d9e1402c62d39.1434501121.git.luto@kernel.org
[ Ported it to v4.2-rc1. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 15:23:28 +02:00
Andy Lutomirski 87be28aaf1 x86/asm/tsc: Replace rdtscll() with native_read_tsc()
Now that the ->read_tsc() paravirt hook is gone, rdtscll() is
just a wrapper around native_read_tsc(). Unwrap it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/d2449ae62c1b1fb90195bcfb19ef4a35883a04dc.1434501121.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 15:23:26 +02:00
Linus Torvalds 88793e5c77 The libnvdimm sub-system introduces, in addition to the libnvdimm-core,
4 drivers / enabling modules:
 
 NFIT:
 Instantiates an "nvdimm bus" with the core and registers memory devices
 (NVDIMMs) enumerated by the ACPI 6.0 NFIT (NVDIMM Firmware Interface
 table).  After registering NVDIMMs the NFIT driver then registers
 "region" devices.  A libnvdimm-region defines an access mode and the
 boundaries of persistent memory media.  A region may span multiple
 NVDIMMs that are interleaved by the hardware memory controller.  In
 turn, a libnvdimm-region can be carved into a "namespace" device and
 bound to the PMEM or BLK driver which will attach a Linux block device
 (disk) interface to the memory.
 
 PMEM:
 Initially merged in v4.1 this driver for contiguous spans of persistent
 memory address ranges is re-worked to drive PMEM-namespaces emitted by
 the libnvdimm-core.  In this update the PMEM driver, on x86, gains the
 ability to assert that writes to persistent memory have been flushed all
 the way through the caches and buffers in the platform to persistent
 media.  See memcpy_to_pmem() and wmb_pmem().
 
 BLK:
 This new driver enables access to persistent memory media through "Block
 Data Windows" as defined by the NFIT.  The primary difference of this
 driver to PMEM is that only a small window of persistent memory is
 mapped into system address space at any given point in time.  Per-NVDIMM
 windows are reprogrammed at run time, per-I/O, to access different
 portions of the media.  BLK-mode, by definition, does not support DAX.
 
 BTT:
 This is a library, optionally consumed by either PMEM or BLK, that
 converts a byte-accessible namespace into a disk with atomic sector
 update semantics (prevents sector tearing on crash or power loss).  The
 sinister aspect of sector tearing is that most applications do not know
 they have a atomic sector dependency.  At least today's disk's rarely
 ever tear sectors and if they do one almost certainly gets a CRC error
 on access.  NVDIMMs will always tear and always silently.  Until an
 application is audited to be robust in the presence of sector-tearing
 the usage of BTT is recommended.
 
 Thanks to: Ross Zwisler, Jeff Moyer, Vishal Verma, Christoph Hellwig,
 Ingo Molnar, Neil Brown, Boaz Harrosh, Robert Elliott, Matthew Wilcox,
 Andy Rudoff, Linda Knippers, Toshi Kani, Nicholas Moulin, Rafael
 Wysocki, and Bob Moore.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVjZGBAAoJEB7SkWpmfYgC4fkP/j+k6HmSRNU/yRYPyo7CAWvj
 3P5P1i6R6nMZZbjQrQArAXaIyLlFk4sEQDYsciR6dmslhhFZAkR2eFwVO5rBOyx3
 QN0yxEpyjJbroRFUrV/BLaFK4cq2oyJAFFHs0u7/pLHBJ4MDMqfRKAMtlnBxEkTE
 LFcqXapSlvWitSbjMdIBWKFEvncaiJ2mdsFqT4aZqclBBTj00eWQvEG9WxleJLdv
 +tj7qR/vGcwOb12X5UrbQXgwtMYos7A6IzhHbqwQL8IrOcJ6YB8NopJUpLDd7ZVq
 KAzX6ZYMzNueN4uvv6aDfqDRLyVL7qoxM9XIjGF5R8SV9sF2LMspm1FBpfowo1GT
 h2QMr0ky1nHVT32yspBCpE9zW/mubRIDtXxEmZZ53DIc4N6Dy9jFaNVmhoWtTAqG
 b9pndFnjUzzieCjX5pCvo2M5U6N0AQwsnq76/CasiWyhSa9DNKOg8MVDRg0rbxb0
 UvK0v8JwOCIRcfO3qiKcx+02nKPtjCtHSPqGkFKPySRvAdb+3g6YR26CxTb3VmnF
 etowLiKU7HHalLvqGFOlDoQG6viWes9Zl+ZeANBOCVa6rL2O7ZnXJtYgXf1wDQee
 fzgKB78BcDjXH4jHobbp/WBANQGN/GF34lse8yHa7Ym+28uEihDvSD1wyNLnefmo
 7PJBbN5M5qP5tD0aO7SZ
 =VtWG
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm

Pull libnvdimm subsystem from Dan Williams:
 "The libnvdimm sub-system introduces, in addition to the
  libnvdimm-core, 4 drivers / enabling modules:

  NFIT:
    Instantiates an "nvdimm bus" with the core and registers memory
    devices (NVDIMMs) enumerated by the ACPI 6.0 NFIT (NVDIMM Firmware
    Interface table).

    After registering NVDIMMs the NFIT driver then registers "region"
    devices.  A libnvdimm-region defines an access mode and the
    boundaries of persistent memory media.  A region may span multiple
    NVDIMMs that are interleaved by the hardware memory controller.  In
    turn, a libnvdimm-region can be carved into a "namespace" device and
    bound to the PMEM or BLK driver which will attach a Linux block
    device (disk) interface to the memory.

  PMEM:
    Initially merged in v4.1 this driver for contiguous spans of
    persistent memory address ranges is re-worked to drive
    PMEM-namespaces emitted by the libnvdimm-core.

    In this update the PMEM driver, on x86, gains the ability to assert
    that writes to persistent memory have been flushed all the way
    through the caches and buffers in the platform to persistent media.
    See memcpy_to_pmem() and wmb_pmem().

  BLK:
    This new driver enables access to persistent memory media through
    "Block Data Windows" as defined by the NFIT.  The primary difference
    of this driver to PMEM is that only a small window of persistent
    memory is mapped into system address space at any given point in
    time.

    Per-NVDIMM windows are reprogrammed at run time, per-I/O, to access
    different portions of the media.  BLK-mode, by definition, does not
    support DAX.

  BTT:
    This is a library, optionally consumed by either PMEM or BLK, that
    converts a byte-accessible namespace into a disk with atomic sector
    update semantics (prevents sector tearing on crash or power loss).

    The sinister aspect of sector tearing is that most applications do
    not know they have a atomic sector dependency.  At least today's
    disk's rarely ever tear sectors and if they do one almost certainly
    gets a CRC error on access.  NVDIMMs will always tear and always
    silently.  Until an application is audited to be robust in the
    presence of sector-tearing the usage of BTT is recommended.

  Thanks to: Ross Zwisler, Jeff Moyer, Vishal Verma, Christoph Hellwig,
  Ingo Molnar, Neil Brown, Boaz Harrosh, Robert Elliott, Matthew Wilcox,
  Andy Rudoff, Linda Knippers, Toshi Kani, Nicholas Moulin, Rafael
  Wysocki, and Bob Moore"

* tag 'libnvdimm-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm: (33 commits)
  arch, x86: pmem api for ensuring durability of persistent memory updates
  libnvdimm: Add sysfs numa_node to NVDIMM devices
  libnvdimm: Set numa_node to NVDIMM devices
  acpi: Add acpi_map_pxm_to_online_node()
  libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only
  pmem: flag pmem block devices as non-rotational
  libnvdimm: enable iostat
  pmem: make_request cleanups
  libnvdimm, pmem: fix up max_hw_sectors
  libnvdimm, blk: add support for blk integrity
  libnvdimm, btt: add support for blk integrity
  fs/block_dev.c: skip rw_page if bdev has integrity
  libnvdimm: Non-Volatile Devices
  tools/testing/nvdimm: libnvdimm unit test infrastructure
  libnvdimm, nfit, nd_blk: driver for BLK-mode access persistent memory
  nd_btt: atomic sector updates
  libnvdimm: infrastructure for btt devices
  libnvdimm: write blk label set
  libnvdimm: write pmem label set
  libnvdimm: blk labels and namespace instantiation
  ...
2015-06-29 10:34:42 -07:00
Ingo Molnar 927392d73a x86/boot: Add CONFIG_PARAVIRT_SPINLOCKS quirk to arch/x86/boot/compressed/misc.h
Linus reported the following new warning on x86 allmodconfig with GCC 5.1:

  > ./arch/x86/include/asm/spinlock.h: In function ‘arch_spin_lock’:
  > ./arch/x86/include/asm/spinlock.h:119:3: warning: implicit declaration
  > of function ‘__ticket_lock_spinning’ [-Wimplicit-function-declaration]
  >    __ticket_lock_spinning(lock, inc.tail);
  >    ^

This warning triggers because of these hacks in misc.h:

  /*
   * we have to be careful, because no indirections are allowed here, and
   * paravirt_ops is a kind of one. As it will only run in baremetal anyway,
   * we just keep it from happening
   */
  #undef CONFIG_PARAVIRT
  #undef CONFIG_KASAN

But these hacks were not updated when CONFIG_PARAVIRT_SPINLOCKS was added,
and eventually (with the introduction of queued paravirt spinlocks in
recent kernels) this created an invalid Kconfig combination and broke
the build.

So add a CONFIG_PARAVIRT_SPINLOCKS #undef line as well.

Also remove the _ASM_X86_DESC_H quirk: that undocumented quirk
was originally added ages ago, in:

  099e137726 ("x86: use ELF format in compressed images.")

and I went back to that kernel (and fixed up the main Makefile
which didn't build anymore) and checked what failure it
avoided: it avoided an include file dependencies related
build failure related to our old x86-platforms code.

That old code is long gone, the header dependencies got cleaned
up, and the build does not fail anymore with the totality of
asm/desc.h included - so remove the quirk.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-29 11:06:37 +02:00
Dan Williams ad5fb870c4 e820, efi: add ACPI 6.0 persistent memory types
ACPI 6.0 formalizes e820-type-7 and efi-type-14 as persistent memory.
Mark it "reserved" and allow it to be claimed by a persistent memory
device driver.

This definition is in addition to the Linux kernel's existing type-12
definition that was recently added in support of shipping platforms with
NVDIMM support that predate ACPI 6.0 (which now classifies type-12 as
OEM reserved).

Note, /proc/iomem can be consulted for differentiating legacy
"Persistent Memory (legacy)" E820_PRAM vs standard "Persistent Memory"
E820_PMEM.

Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-05-27 21:46:05 -04:00
Ingo Molnar c102cb097d * Avoid garbage names in efivarfs due to buggy firmware by zero'ing
EFI variable name - Ross Lagerwall
 
  * Stop erroneously dropping upper 32-bits of boot command line pointer
    in EFI boot stub and stash them in ext_cmd_line_ptr - Roy Franz
 
  * Fix double-free bug in error handling code path of EFI runtime map
    code - Dan Carpenter
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVSOSjAAoJEC84WcCNIz1VXk4P/R4GwmmzZBdYAseiwv6u/NRm
 bTXnK7SN1ZyY8WibEm8ptXJuTIyXZxmQYr4lY97canJy8P7umtoCP7P3tS0Ier8U
 N1AMFGes7xlwBhjIRz2Cr9e5plr5H3qk65JNMuUDp0/MVuPEiNEzi6efbL82dh9S
 RCLxQ94paX+wV6ltQMKWGD3v0WnHkzouuCdETCGaozqQmJx6PGzDmJ51kXYRWDyP
 esTCZpRHlIzKN0u3XEFgswlIev2wab0BtjXYOzUqb0AH1Q13OgQfiswX3WIG6k+c
 3xuMH4JByBIDwOLudgu0D6Sst2QwVJZnw6JavoEgGCFao0n6IPzUGolAWLFMdDhL
 Kparzc6ObHpiqYtqBjJXW+awOENVS4qIrn9MHc9wwsJxXOy++0YnyYCgge0iia47
 F2/pOHvkd52QiQ0gC442W0EdX1VlPCUR04G0s4d3UX3O875yl80QTyLQ4n7ZK074
 3wfi/9+Fuv8wWMJ4HI8FJgaTl57KzAP4ZPh2cy8oPs6bkiiwlnMWH24bEhlxKBK4
 mEIze045kyswz3rV7j1WX3MSXrPA2cM95L5WlvVTxckMn40QwLPBWSDCOJIj3K5K
 yhXNHHfHzG/GRm3SfD2i1EcK4gUW82awl72jJn0F69YMI5a+T1BIppEMP2pzsWE4
 FcwvWDxzWwKxYKJosfkk
 =f7a2
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent

Pull EFI fixes from Matt Fleming:

 * Avoid garbage names in efivarfs due to buggy firmware by zeroing
   EFI variable name. (Ross Lagerwall)

 * Stop erroneously dropping upper 32 bits of boot command line pointer
   in EFI boot stub and stash them in ext_cmd_line_ptr. (Roy Franz)

 * Fix double-free bug in error handling code path of EFI runtime map
   code. (Dan Carpenter)

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-06 08:30:24 +02:00
Roy Franz 98b228f550 x86/efi: Store upper bits of command line buffer address in ext_cmd_line_ptr
Until now, the EFI stub was only setting the 32 bit cmd_line_ptr in
the setup_header structure, so on 64 bit platforms this could be truncated.
This patch adds setting the upper bits of the buffer address in
ext_cmd_line_ptr.  This case was likely never hit, as the allocation
for this buffer is done at the lowest available address.  Only
x86_64 kernels have this problem, as the 1-1 mapping mandated
by EFI ensures that all memory is 32 bit addressable on 32 bit
platforms.  The EFI stub does not support mixed mode, so the
32 bit kernel on 64 bit firmware case does not need to be handled.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2015-04-17 15:46:32 +01:00
Borislav Petkov 78cac48c04 x86/mm/KASLR: Propagate KASLR status to kernel proper
Commit:

  e2b32e6785 ("x86, kaslr: randomize module base load address")

made module base address randomization unconditional and didn't regard
disabled KKASLR due to CONFIG_HIBERNATION and command line option
"nokaslr". For more info see (now reverted) commit:

  f47233c2d3 ("x86/mm/ASLR: Propagate base load address calculation")

In order to propagate KASLR status to kernel proper, we need a single bit
in boot_params.hdr.loadflags and we've chosen bit 1 thus leaving the
top-down allocated bits for bits supposed to be used by the bootloader.

Originally-From: Jiri Kosina <jkosina@suse.cz>
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03 15:26:15 +02:00
Denys Vlasenko 40e4f2d177 x86/asm/boot/64: Use __BOOT_TSS instead of literal $0x20
__BOOT_TSS = (GDT_ENTRY_BOOT_TSS * 8)
GDT_ENTRY_BOOT_TSS = (GDT_ENTRY_BOOT_CS + 2)
GDT_ENTRY_BOOT_CS = 2

(2 + 2) * 8 = 4 * 8 = 32 = 0x20

No code changes.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427899858-7165-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 12:00:20 +02:00
Ingo Molnar e4518ab90f Linux 4.0-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVD1VGAAoJEHm+PkMAQRiG7yoH/juKOQ1zbxi5M+mleDEEJtA0
 RxQSojqEMWIKrWi8PNZxjENn1OZB6XOLIXOhlyAZBrmgsjO34p1DyXlZMznr/R8W
 kQ2Xxs061hRtB3OuruMIqOApUrjuqsaCwgbgUS1qWmqZcoyZN4oELyZMP6OOlqv5
 UUBZm8MfyXGyxrCcg39mjct3VEOhiuEcvL6SUxOC380CdSVAnyqHFPcz0JVqMUn9
 9RUBs0T9cMdhb0mZ2bfXzt6AKArj63G2nXOum+VzFcvspSm2U+MPIDCuoE+ZbTPS
 jqIAgG0rj1ezRyb5oeJrvlU0Yy3u/cXoMPs9+kORvpladooYNLti8ovh6qllm0I=
 =d/ye
 -----END PGP SIGNATURE-----

Merge tag 'v4.0-rc5' into x86/asm, to resolve conflicts

Conflicts:
	arch/x86/kernel/entry_64.S

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 11:13:15 +01:00
Borislav Petkov 69797dafe3 Revert "x86/mm/ASLR: Propagate base load address calculation"
This reverts commit:

  f47233c2d3 ("x86/mm/ASLR: Propagate base load address calculation")

The main reason for the revert is that the new boot flag does not work
at all currently, and in order to make this work, we need non-trivial
changes to the x86 boot code which we didn't manage to get done in
time for merging.

And even if we did, they would've been too risky so instead of
rushing things and break booting 4.1 on boxes left and right, we
will be very strict and conservative and will take our time with
this to fix and test it properly.

Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: H. Peter Anvin <hpa@linux.intel.com
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/20150316100628.GD22995@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-16 11:18:21 +01:00
Ingo Molnar d2c032e3dc Linux 4.0-rc2
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJU9enEAAoJEHm+PkMAQRiG/ewIAJ4MW4tcAhaVj6ndCF3+uL/b
 RaVm1apUjsTloe5Fl0TT9J5CO3zdOetmMNToy2sf0W4MJDIyHf21o83l7eniV/6q
 al/c3fQ6HVtNjiSUNghTtzVlL+gUD1F60b9BGYi1V5h2Mp8u0NG1alTGLQfCB8sE
 ArB+v2aWEdSPn7mZDA0Yuc1In+8bkpht3oy+OLD/8JNkqqLnml9YOyPjM1cuRpBr
 NxKCLcPzSHH9/nR3T6XtkxXYV5xD3+CDm9roJhfHukoFmfT/G3C65Zcp2KEed/Cw
 QQpu+ox7fpUs10F/Fbfm8AE+tRB4o2sGh97sprXrO5oaFdx6FPIBo4WN8i/Vy68=
 =qpY+
 -----END PGP SIGNATURE-----

Merge tag 'v4.0-rc2' into x86/asm, to refresh the tree

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 06:35:43 +01:00
Linus Torvalds 5fbe4c224c Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:
 "This contains:

   - EFI fixes
   - a boot printout fix
   - ASLR/kASLR fixes
   - intel microcode driver fixes
   - other misc fixes

  Most of the linecount comes from an EFI revert"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm/ASLR: Avoid PAGE_SIZE redefinition for UML subarch
  x86/microcode/intel: Handle truncated microcode images more robustly
  x86/microcode/intel: Guard against stack overflow in the loader
  x86, mm/ASLR: Fix stack randomization on 64-bit systems
  x86/mm/init: Fix incorrect page size in init_memory_mapping() printks
  x86/mm/ASLR: Propagate base load address calculation
  Documentation/x86: Fix path in zero-page.txt
  x86/apic: Fix the devicetree build in certain configs
  Revert "efi/libstub: Call get_memory_map() to obtain map and desc sizes"
  x86/efi: Avoid triple faults during EFI mixed mode calls
2015-02-21 10:41:29 -08:00
Ingo Molnar a267b0a349 Merge branch 'tip-x86-kaslr' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/urgent
Pull ASLR and kASLR fixes from Borislav Petkov:

  - Add a global flag announcing KASLR state so that relevant code can do
    informed decisions based on its setting. (Jiri Kosina)

  - Fix a stack randomization entropy decrease bug. (Hector Marco-Gisbert)

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-19 12:31:34 +01:00
Jiri Kosina f47233c2d3 x86/mm/ASLR: Propagate base load address calculation
Commit:

  e2b32e6785 ("x86, kaslr: randomize module base load address")

makes the base address for module to be unconditionally randomized in
case when CONFIG_RANDOMIZE_BASE is defined and "nokaslr" option isn't
present on the commandline.

This is not consistent with how choose_kernel_location() decides whether
it will randomize kernel load base.

Namely, CONFIG_HIBERNATION disables kASLR (unless "kaslr" option is
explicitly specified on kernel commandline), which makes the state space
larger than what module loader is looking at. IOW CONFIG_HIBERNATION &&
CONFIG_RANDOMIZE_BASE is a valid config option, kASLR wouldn't be applied
by default in that case, but module loader is not aware of that.

Instead of fixing the logic in module.c, this patch takes more generic
aproach. It introduces a new bootparam setup data_type SETUP_KASLR and
uses that to pass the information whether kaslr has been applied during
kernel decompression, and sets a global 'kaslr_enabled' variable
accordingly, so that any kernel code (module loading, livepatching, ...)
can make decisions based on its value.

x86 module loader is converted to make use of this flag.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Link: https://lkml.kernel.org/r/alpine.LNX.2.00.1502101411280.10719@pobox.suse.cz
[ Always dump correct kaslr status when panicking ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-19 11:38:54 +01:00
Alexander Kuleshov fb148d83ec x86/asm/boot: Use already defined KEEP_SEGMENTS macro in head_{32,64}.S
There is already defined macro KEEP_SEGMENTS in
<asm/bootparam.h>, let's use it instead of hardcoded
constants.

Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1424331298-7456-1-git-send-email-kuleshovmail@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-19 10:05:04 +01:00
Andrey Ryabinin 393f203f5f x86_64: kasan: add interceptors for memset/memmove/memcpy functions
Recently instrumentation of builtin functions calls was removed from GCC
5.0.  To check the memory accessed by such functions, userspace asan
always uses interceptors for them.

So now we should do this as well.  This patch declares
memset/memmove/memcpy as weak symbols.  In mm/kasan/kasan.c we have our
own implementation of those functions which checks memory before accessing
it.

Default memset/memmove/memcpy now now always have aliases with '__'
prefix.  For files that built without kasan instrumentation (e.g.
mm/slub.c) original mem* replaced (via #define) with prefixed variants,
cause we don't want to check memory accesses there.

Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:41 -08:00
Andrey Ryabinin ef7f0d6a6c x86_64: add KASan support
This patch adds arch specific code for kernel address sanitizer.

16TB of virtual addressed used for shadow memory.  It's located in range
[ffffec0000000000 - fffffc0000000000] between vmemmap and %esp fixup
stacks.

At early stage we map whole shadow region with zero page.  Latter, after
pages mapped to direct mapping address range we unmap zero pages from
corresponding shadow (see kasan_map_shadow()) and allocate and map a real
shadow memory reusing vmemmap_populate() function.

Also replace __pa with __pa_nodebug before shadow initialized.  __pa with
CONFIG_DEBUG_VIRTUAL=y make external function call (__phys_addr)
__phys_addr is instrumented, so __asan_load could be called before shadow
area initialized.

Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:41 -08:00
Matt Fleming 96738c69a7 x86/efi: Avoid triple faults during EFI mixed mode calls
Andy pointed out that if an NMI or MCE is received while we're in the
middle of an EFI mixed mode call a triple fault will occur. This can
happen, for example, when issuing an EFI mixed mode call while running
perf.

The reason for the triple fault is that we execute the mixed mode call
in 32-bit mode with paging disabled but with 64-bit kernel IDT handlers
installed throughout the call.

At Andy's suggestion, stop playing the games we currently do at runtime,
such as disabling paging and installing a 32-bit GDT for __KERNEL_CS. We
can simply switch to the __KERNEL32_CS descriptor before invoking
firmware services, and run in compatibility mode. This way, if an
NMI/MCE does occur the kernel IDT handler will execute correctly, since
it'll jump to __KERNEL_CS automatically.

However, this change is only possible post-ExitBootServices(). Before
then the firmware "owns" the machine and expects for its 32-bit IDT
handlers to be left intact to service interrupts, etc.

So, we now need to distinguish between early boot and runtime
invocations of EFI services. During early boot, we need to restore the
GDT that the firmware expects to be present. We can only jump to the
__KERNEL32_CS code segment for mixed mode calls after ExitBootServices()
has been invoked.

A liberal sprinkling of comments in the thunking code should make the
differences in early and late environments more apparent.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2015-02-13 15:42:56 +00:00
Kees Cook d69911a68c x86, build: replace Perl script with Shell script
Commit e6023367d7 ("x86, kaslr: Prevent .bss from overlaping initrd")
added Perl to the required build environment.  This reimplements in
shell the Perl script used to find the size of the kernel with bss and
brk added.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Rob Landley <rob@landley.net>
Acked-by: Rob Landley <rob@landley.net>
Cc: Anca Emanuel <anca.emanuel@gmail.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-26 13:37:18 -08:00
Kees Cook f285f4a21c x86, boot: Skip relocs when load address unchanged
On 64-bit, relocation is not required unless the load address gets
changed. Without this, relocations do unexpected things when the kernel
is above 4G.

Reported-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Thomas D. <whissi@whissi.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20150116005146.GA4212@www.outflux.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-20 12:37:23 +01:00
Linus Torvalds 8139548136 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
 "Changes in this cycle are:

   - support module unload for efivarfs (Mathias Krause)

   - another attempt at moving x86 to libstub taking advantage of the
     __pure attribute (Ard Biesheuvel)

   - add EFI runtime services section to ptdump (Mathias Krause)"

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, ptdump: Add section for EFI runtime services
  efi/x86: Move x86 back to libstub
  efivarfs: Allow unloading when build as module
2014-12-10 12:42:16 -08:00
Linus Torvalds b6444bd0a1 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot and percpu updates from Ingo Molnar:
 "This tree contains a bootable images documentation update plus three
  slightly misplaced x86/asm percpu changes/optimizations"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86-64: Use RIP-relative addressing for most per-CPU accesses
  x86-64: Handle PC-relative relocations on per-CPU data
  x86: Convert a few more per-CPU items to read-mostly ones
  x86, boot: Document intermediates more clearly
2014-12-10 12:10:24 -08:00
Chris Clayton e2e68ae688 x86: Use $(OBJDUMP) instead of plain objdump
commit e6023367d7 'x86, kaslr: Prevent .bss from overlaping initrd'
broke the cross compile of x86. It added a objdump invocation, which
invokes the host native objdump and ignores an active cross tool
chain.

Use $(OBJDUMP) instead which takes the CROSS_COMPILE prefix into
account.

[ tglx: Massage changelog and use $(OBJDUMP) ]

Fixes: e6023367d7 'x86, kaslr: Prevent .bss from overlaping initrd'
Signed-off-by: Chris Clayton <chris2553@googlemail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/54705C8E.1080400@googlemail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-23 21:21:53 +01:00
Ard Biesheuvel 243b6754cd efi/x86: Move x86 back to libstub
This reverts commit 84be880560, which itself reverted my original
attempt to move x86 from #include'ing .c files from across the tree
to using the EFI stub built as a static library.

The issue that affected the original approach was that splitting
the implementation into several .o files resulted in the variable
'efi_early' becoming a global with external linkage, which under
-fPIC implies that references to it must go through the GOT. However,
dealing with this additional GOT entry turned out to be troublesome
on some EFI implementations. (GCC's visibility=hidden attribute is
supposed to lift this requirement, but it turned out not to work on
the 32-bit build.)

Instead, use a pure getter function to get a reference to efi_early.
This approach results in no additional GOT entries being generated,
so there is no need for any changes in the early GOT handling.

Tested-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-11-11 22:23:11 +00:00
Jan Beulich 6d24c5f72d x86-64: Handle PC-relative relocations on per-CPU data
This is in preparation of using RIP-relative addressing in many of the
per-CPU accesses.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/5458A15A0200007800044A9A@mail.emea.novell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-04 20:43:14 +01:00
Kees Cook fb7183ef3c x86, boot: Document intermediates more clearly
This adds a comment detailing the various intermediate files used to build
the bootable decompression image for the x86 kernel.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Junjie Mao <eternal.n08@gmail.com>
Link: http://lkml.kernel.org/r/20141031162204.GA26268@www.outflux.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-01 22:46:02 +01:00
Junjie Mao e6023367d7 x86, kaslr: Prevent .bss from overlaping initrd
When choosing a random address, the current implementation does not take into
account the reversed space for .bss and .brk sections. Thus the relocated kernel
may overlap other components in memory. Here is an example of the overlap from a
x86_64 kernel in qemu (the ranges of physical addresses are presented):

 Physical Address

    0x0fe00000                  --+--------------------+  <-- randomized base
                               /  |  relocated kernel  |
                   vmlinux.bin    | (from vmlinux.bin) |
    0x1336d000    (an ELF file)   +--------------------+--
                               \  |                    |  \
    0x1376d870                  --+--------------------+   |
                                  |    relocs table    |   |
    0x13c1c2a8                    +--------------------+   .bss and .brk
                                  |                    |   |
    0x13ce6000                    +--------------------+   |
                                  |                    |  /
    0x13f77000                    |       initrd       |--
                                  |                    |
    0x13fef374                    +--------------------+

The initrd image will then be overwritten by the memset during early
initialization:

[    1.655204] Unpacking initramfs...
[    1.662831] Initramfs unpacking failed: junk in compressed archive

This patch prevents the above situation by requiring a larger space when looking
for a random kernel base, so that existing logic can effectively avoids the
overlap.

[kees: switched to perl to avoid hex translation pain in mawk vs gawk]
[kees: calculated overlap without relocs table]

Fixes: 82fa9637a2 ("x86, kaslr: Select random position from e820 maps")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Junjie Mao <eternal.n08@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1414762838-13067-1-git-send-email-eternal.n08@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-01 22:20:50 +01:00
Linus Torvalds 8c81f48e16 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 EFI updates from Peter Anvin:
 "This patchset falls under the "maintainers that grovel" clause in the
  v3.18-rc1 announcement.  We had intended to push it late in the merge
  window since we got it into the -tip tree relatively late.

  Many of these are relatively simple things, but there are a couple of
  key bits, especially Ard's and Matt's patches"

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  rtc: Disable EFI rtc for x86
  efi: rtc-efi: Export platform:rtc-efi as module alias
  efi: Delete the in_nmi() conditional runtime locking
  efi: Provide a non-blocking SetVariable() operation
  x86/efi: Adding efi_printks on memory allocationa and pci.reads
  x86/efi: Mark initialization code as such
  x86/efi: Update comment regarding required phys mapped EFI services
  x86/efi: Unexport add_efi_memmap variable
  x86/efi: Remove unused efi_call* macros
  efi: Resolve some shadow warnings
  arm64: efi: Format EFI memory type & attrs with efi_md_typeattr_format()
  ia64: efi: Format EFI memory type & attrs with efi_md_typeattr_format()
  x86: efi: Format EFI memory type & attrs with efi_md_typeattr_format()
  efi: Introduce efi_md_typeattr_format()
  efi: Add macro for EFI_MEMORY_UCE memory attribute
  x86/efi: Clear EFI_RUNTIME_SERVICES if failing to enter virtual mode
  arm64/efi: Do not enter virtual mode if booting with efi=noruntime or noefi
  arm64/efi: uefi_init error handling fix
  efi: Add kernel param efi=noruntime
  lib: Add a generic cmdline parse function parse_option_str
  ...
2014-10-23 14:45:09 -07:00
Linus Torvalds 2fd7476de9 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc smaller fixes that missed the v3.17 cycle"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/build: Add arch/x86/purgatory/ make generated files to gitignore
  x86: Fix section conflict for numachip
  x86: Reject x32 executables if x32 ABI not supported
  x86_64, entry: Filter RFLAGS.NT on entry from userspace
  x86, boot, kaslr: Fix nuisance warning on 32-bit builds
2014-10-14 02:28:16 +02:00
Linus Torvalds 74da38631a Tinification for 3.18
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJUL0J0AAoJEA7Zo9+K/4c9w40P/iMFPfCethdBtPz5rI88CVr2
 7yU99TdbEPoRJm+rU4ohvHdB73p2KWINIKvpSThvegvjXbEcKxQkdpVWHsFJZeHS
 bZiYmhjxdCBvJGLrYo5IwqH0PrSjokTPzMUekUCk7BkUKNJRaDjfUBHvUmKsinUR
 dQL+3KE3edy6W3DL+FOd0QZwSOgmOfEibTWpfmg+n16kFNa75Kg/QLwjYRvtQplP
 eElywDZN07IhAeBFqKhKvlKmDSAeqMd8RfoPPo9Ts+reeIrWYjVNbl9ISOqXqy2x
 JoLeZQmwSXj/C9Ehr5e+aId2eO8In5xueQfXP8SS8dCC7VLwRbnNgyAQQZEslEBk
 QH0GhT6GqTamBdiNI3I+usfs65cEaialXh2afcoLwGS/iGD8MhZ8Dt+m4iyXNxEZ
 kT9VA4974mPjJ1g0mDDnYIxNjxF43m+SD5K1sR/XGpMcA8NdqMUmvKNcbePCobVa
 WTutIemQqGipNeWE94XwZEbc0B+aWwH7eiZOBMVGhWsHInd7QeTBTbfZlctyBkzf
 AswgsFjC5FW05CWK6J1Lf/UI1FD9PmHMKpmQUPED1+7okDTfqGjKjdREWgZSixUt
 LIRfWqWEaNpRRBFbDyt0C+F4pBRPLiRDaOyNhwEdtXuVGKRXb1G3qX7nFOJAZo6G
 GDTZo9iIRNSfm/M4tJ+n
 =2VyW
 -----END PGP SIGNATURE-----

Merge tag 'tiny/for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/josh/linux

Pull "tinification" patches from Josh Triplett.

Work on making smaller kernels.

* tag 'tiny/for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/josh/linux:
  bloat-o-meter: Ignore syscall aliases SyS_ and compat_SyS_
  mm: Support compiling out madvise and fadvise
  x86: Support compiling out human-friendly processor feature names
  x86: Drop support for /proc files when !CONFIG_PROC_FS
  x86, boot: Don't compile early_serial_console.c when !CONFIG_EARLY_PRINTK
  x86, boot: Don't compile aslr.c when !CONFIG_RANDOMIZE_BASE
  x86, boot: Use the usual -y -n mechanism for objects in vmlinux
  x86: Add "make tinyconfig" to configure the tiniest possible kernel
  x86, platform, kconfig: move kvmconfig functionality to a helper
2014-10-07 08:51:59 -04:00
Matt Fleming 75b128573b Merge branch 'next' into efi-next-merge
Conflicts:
	arch/x86/boot/compressed/eboot.c
2014-10-03 22:15:56 +01:00
Andre Müller 77e21e87ac x86/efi: Adding efi_printks on memory allocationa and pci.reads
All other calls to allocate memory seem to make some noise already, with the
exception of two calls (for gop, uga) in the setup_graphics path.

The purpose is to be noisy on worrysome errors immediately.

commit fb86b2440d ("x86/efi: Add better error logging to EFI boot
stub") introduces printing false alarms for lots of hardware. Rather
than playing Whack a Mole with non-fatal exit conditions, try the other
way round.

This is per Matt Fleming's suggestion:

> Where I think we could improve things
> is by adding efi_printk() message in certain error paths. Clearly, not
> all error paths need such messages, e.g. the EFI_INVALID_PARAMETER path
> you highlighted above, but it makes sense for memory allocation and PCI
> read failures.

Link: http://article.gmane.org/gmane.linux.kernel.efi/4628
Signed-off-by: Andre Müller <andre.muller@web.de>
Cc: Ulf Winkelvos <ulf@winkelvos.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-10-03 18:41:03 +01:00
Matt Fleming 5a17dae422 efi: Add efi= parameter parsing to the EFI boot stub
We need a way to customize the behaviour of the EFI boot stub, in
particular, we need a way to disable the "chunking" workaround, used
when reading files from the EFI System Partition.

One of my machines doesn't cope well when reading files in 1MB chunks to
a buffer above the 4GB mark - it appears that the "chunking" bug
workaround triggers another firmware bug. This was only discovered with
commit 4bf7111f50 ("x86/efi: Support initrd loaded above 4G"), and
that commit is perfectly valid. The symptom I observed was a corrupt
initrd rather than any kind of crash.

efi= is now used to specify EFI parameters in two very different
execution environments, the EFI boot stub and during kernel boot.

There is also a slight performance optimization by enabling efi=nochunk,
but that's offset by the fact that you're more likely to run into
firmware issues, at least on x86. This is the rationale behind leaving
the workaround enabled by default.

Also provide some documentation for EFI_READ_CHUNK_SIZE and why we're
using the current value of 1MB.

Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Roy Franz <roy.franz@linaro.org>
Cc: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-10-03 18:40:57 +01:00
Kees Cook 20cc28882b x86, boot, kaslr: Fix nuisance warning on 32-bit builds
Building 32-bit threw a warning on kASLR enabled builds:

arch/x86/boot/compressed/aslr.c: In function ‘mem_avoid_overlap’:
arch/x86/boot/compressed/aslr.c:198:17: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
   avoid.start = (u64)ptr;
                 ^

This fixes the warning; unsigned long should have been used here.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20141001183632.GA11431@www.outflux.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-10-01 11:41:24 -07:00
Linus Torvalds cd40fab6db Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "This has:

   - EFI revert to fix a boot regression
   - early_ioremap() fix for boot failure
   - KASLR fix for possible boot failures
   - EFI fix for corrupted string printing
   - remove a misleading EFI bootup 'failed!' error message

  Unfortunately it's all rather close to the merge window"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Truncate 64-bit values when calling 32-bit OutputString()
  x86/efi: Delete misleading efi_printk() error message
  Revert "efi/x86: efistub: Move shared dependencies to <asm/efi.h>"
  x86/kaslr: Avoid the setup_data area when picking location
  x86 early_ioremap: Increase FIX_BTMAPS_SLOTS to 8
2014-09-27 14:23:13 -07:00
Ingo Molnar 29282ac0bd * Revert the static library changes from the merge window since they're
causing issues for Macbooks and Fedora + Grub2 - Matt Fleming
 
  * Delete the misleading "setup_efi_pci() failed!" message which some
    people are seeing when booting EFI - Matt Fleming
 
  * Fix printing strings from the 32-bit EFI boot stub by only passing
    32-bit addresses to the firmware - Matt Fleming
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUIzOFAAoJEC84WcCNIz1VpSwQAJhp9Yu60dWXyqRpV+ER1yau
 MWHXunkeaccbBXNABkzuDUqb2a6DRIJow+/n+dYjIGY6Nf9zzLQFQ+s/EsS+IiyY
 s4rRvAqzfGYk1d6xzgvEccrdD4fRP32kFKnSlZpfTRuJZHZieD+f2y6TP7D33Ja3
 HV/ivPQHZNjxgsExpcE8Rz/QyOZpqRacTr9Gr7IusBRL6IMCyycfCTmt6d5pf1iD
 kQGSGwIBvgN4xMqPUdxTNo31bQZA5ZeywNOh9WhdSCL7FAIDfG9TmXt/J7ckq5ax
 0f6X92qgCs3peLY+/szgSZ2LsZI6I/FM1udc2SFiIOPvCwwytcJ94Wro5pLYzZ4i
 SnqB2xLLEmsR2J3MXIeY0aVy2VtHT4bYRnXYNd9G0eaVfrlJ+4lgwqJavAJmtDZx
 88ey1R8LKRQr+ueSv/BnOvE6T2+38HrjrMooFQsPvolRR0S6MITBr8I2hoRASkUt
 YsA+7s6+tO2QBmQYrKCYSAi9A7onMA9Fh93dmv7XLqFw/SsfVm3RnrNhOVsO9kPC
 zIsWZoS+PGwb4RRvM2i7JAEqUCbuLpIAHYEU6gqprWm1ERHsX9mfFYfsJQHzHuOY
 rg6+wtWQ9MGxek8POac4d2mC+PwC4DA0AkaTTZQBcdAJu+h/gZNt4w7mpk5v4Th1
 QYr/otShMvc84Zd+RMeV
 =5mVJ
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent

Pull EFI fixes from Matt Fleming:

  * Revert the static library changes from the merge window since they're
    causing issues for Macbooks and Fedora + Grub2 (Matt Fleming)

  * Delete the misleading "setup_efi_pci() failed!" message which some
    people are seeing when booting EFI (Matt Fleming)

  * Fix printing strings from the 32-bit EFI boot stub by only passing
    32-bit addresses to the firmware (Matt Fleming)

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-25 16:40:08 +02:00
Matt Fleming 115c6628a5 x86/efi: Truncate 64-bit values when calling 32-bit OutputString()
If we're executing the 32-bit efi_char16_printk() code path (i.e.
running on top of 32-bit firmware) we know that efi_early->text_output
will be a 32-bit value, even though ->text_output has type u64.

Unfortunately, we currently pass ->text_output directly to
efi_early->call() so for CONFIG_X86_32 the compiler will push a 64-bit
value onto the stack, causing the other parameters to be misaligned.

The way we handle this in the rest of the EFI boot stub is to pass
pointers as arguments to efi_early->call(), which automatically do the
right thing (pointers are 32-bit on CONFIG_X86_32, and we simply ignore
the upper 32-bits of the argument register if running in 64-bit mode
with 32-bit firmware).

This fixes a corruption bug when printing strings from the 32-bit EFI
boot stub.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=84241
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-09-24 21:56:46 +01:00
Matt Fleming 56394ab8c2 x86/efi: Delete misleading efi_printk() error message
A number of people are reporting seeing the "setup_efi_pci() failed!"
error message in what used to be a quiet boot,

  https://bugzilla.kernel.org/show_bug.cgi?id=81891

The message isn't all that helpful because setup_efi_pci() can return a
non-success error code for a variety of reasons, not all of them fatal.

Let's drop the return code from setup_efi_pci*() altogether, since
there's no way to process it in any meaningful way outside of the inner
__setup_efi_pci*() functions.

Reported-by: Darren Hart <dvhart@linux.intel.com>
Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Ulf Winkelvos <ulf@winkelvos.de>
Cc: Andre Müller <andre.muller@web.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-09-24 12:46:59 +01:00
Matt Fleming 84be880560 Revert "efi/x86: efistub: Move shared dependencies to <asm/efi.h>"
This reverts commit f23cf8bd5c ("efi/x86: efistub: Move shared
dependencies to <asm/efi.h>") as well as the x86 parts of commit
f4f75ad574 ("efi: efistub: Convert into static library").

The road leading to these two reverts is long and winding.

The above two commits were merged during the v3.17 merge window and
turned the common EFI boot stub code into a static library. This
necessitated making some symbols global in the x86 boot stub which
introduced new entries into the early boot GOT.

The problem was that we weren't fixing up the newly created GOT entries
before invoking the EFI boot stub, which sometimes resulted in hangs or
resets. This failure was reported by Maarten on his Macbook pro.

The proposed fix was commit 9cb0e39423 ("x86/efi: Fixup GOT in all
boot code paths"). However, that caused issues for Linus when booting
his Sony Vaio Pro 11. It was subsequently reverted in commit
f3670394c2.

So that leaves us back with Maarten's Macbook pro not booting.

At this stage in the release cycle the least risky option is to revert
the x86 EFI boot stub to the pre-merge window code structure where we
explicitly #include efi-stub-helper.c instead of linking with the static
library. The arm64 code remains unaffected.

We can take another swing at the x86 parts for v3.18.

Conflicts:
	arch/x86/include/asm/efi.h

Tested-by: Josh Boyer <jwboyer@fedoraproject.org>
Tested-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Tested-by: Leif Lindholm <leif.lindholm@linaro.org> [arm64]
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>,
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-09-23 22:01:55 +01:00
Linus Torvalds f3670394c2 Revert "x86/efi: Fixup GOT in all boot code paths"
This reverts commit 9cb0e39423.

It causes my Sony Vaio Pro 11 to immediately reboot at startup.

Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-22 23:05:49 -07:00
Kees Cook 0cacbfbeb5 x86/kaslr: Avoid the setup_data area when picking location
The KASLR location-choosing logic needs to avoid the setup_data
list memory areas as well. Without this, it would be possible to
have the ASLR position stomp on the memory, ultimately causing
the boot to fail.

Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Baoquan He <bhe@redhat.com>
Cc: stable@vger.kernel.org
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140911161931.GA12001@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-19 13:04:29 +02:00
Matt Fleming 9cb0e39423 x86/efi: Fixup GOT in all boot code paths
Maarten reported that his Macbook pro 8.2 stopped booting after commit
f23cf8bd5c ("efi/x86: efistub: Move shared dependencies to
<asm/efi.h>"), the main feature of which is changing the visibility of
symbol 'efi_early' from local to global.

By making 'efi_early' global we end up requiring an entry in the Global
Offset Table. Unfortunately, while we do include code to fixup GOT
entries in the early boot code, it's only called after we've executed
the EFI boot stub.

What this amounts to is that references to 'efi_early' in the EFI boot
stub don't point to the correct place.

Since we've got multiple boot entry points we need to be prepared to
fixup the GOT in multiple places, while ensuring that we never do it
more than once, otherwise the GOT entries will still point to the wrong
place.

Reported-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Tested-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-09-08 20:52:02 +01:00
Yinghai Lu 47226ad4f4 x86/efi: Only load initrd above 4g on second try
Mantas found that after commit 4bf7111f50 ("x86/efi: Support initrd
loaded above 4G"), the kernel freezes at the earliest possible moment
when trying to boot via UEFI on Asus laptop.

Revert to old way to load initrd under 4G on first try, second try will
use above 4G buffer when initrd is too big and does not fit under 4G.

[ The cause of the freeze appears to be a firmware bug when reading
  file data into buffers above 4GB, though the exact reason is unknown.
  Mantas reports that the hang can be avoid if the file size is a
  multiple of 512 bytes, but I've seen some ASUS firmware simply
  corrupting the file data rather than freezing.

  Laszlo fixed an issue in the upstream EDK2 DiskIO code in Aug 2013
  which may possibly be related, commit 4e39b75e ("MdeModulePkg/DiskIoDxe:
  fix source/destination pointer of overrun transfer").

  Whatever the cause, it's unlikely that a fix will be forthcoming
  from the vendor, hence the workaround - Matt ]

Cc: Laszlo Ersek <lersek@redhat.com>
Reported-by: Mantas Mikulėnas <grawity@gmail.com>
Reported-by: Harald Hoyer <harald@redhat.com>
Tested-by: Anders Darander <anders@chargestorm.se>
Tested-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-09-08 20:52:02 +01:00
Josh Triplett 3afed06a35 x86, boot: Don't compile early_serial_console.c when !CONFIG_EARLY_PRINTK
All the code in early_serial_console.c gets compiled out if
!CONFIG_EARLY_PRINTK, but early_serial_console.o itself still gets
compiled in.  Eliminate it from the compile entirely in that case.

This does not change the generated code at all, in either case.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
2014-08-17 14:58:24 -07:00
Josh Triplett 9e6abd2a98 x86, boot: Don't compile aslr.c when !CONFIG_RANDOMIZE_BASE
All the code in aslr.c gets compiled out if !CONFIG_RANDOMIZE_BASE, but
aslr.o itself still gets compiled in.  Eliminate it from the compile
entirely in that case.

This does not change the generated code at all, in either case.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
2014-08-17 14:58:24 -07:00
Josh Triplett 9a1cb47112 x86, boot: Use the usual -y -n mechanism for objects in vmlinux
Switch VMLINUX_OBJS to vmlinux-objs-y, to eliminate Makefile
conditionals in favor of vmlinux-objs-$(CONFIG_*) constructs.

This does not change the generated code at all.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
2014-08-17 14:58:24 -07:00
Linus Torvalds 76f09aa464 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI changes from Ingo Molnar:
 "Main changes in this cycle are:

   - arm64 efi stub fixes, preservation of FP/SIMD registers across
     firmware calls, and conversion of the EFI stub code into a static
     library - Ard Biesheuvel

   - Xen EFI support - Daniel Kiper

   - Support for autoloading the efivars driver - Lee, Chun-Yi

   - Use the PE/COFF headers in the x86 EFI boot stub to request that
     the stub be loaded with CONFIG_PHYSICAL_ALIGN alignment - Michael
     Brown

   - Consolidate all the x86 EFI quirks into one file - Saurabh Tangri

   - Additional error logging in x86 EFI boot stub - Ulf Winkelvos

   - Support loading initrd above 4G in EFI boot stub - Yinghai Lu

   - EFI reboot patches for ACPI hardware reduced platforms"

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
  efi/arm64: Handle missing virtual mapping for UEFI System Table
  arch/x86/xen: Silence compiler warnings
  xen: Silence compiler warnings
  x86/efi: Request desired alignment via the PE/COFF headers
  x86/efi: Add better error logging to EFI boot stub
  efi: Autoload efivars
  efi: Update stale locking comment for struct efivars
  arch/x86: Remove efi_set_rtc_mmss()
  arch/x86: Replace plain strings with constants
  xen: Put EFI machinery in place
  xen: Define EFI related stuff
  arch/x86: Remove redundant set_bit(EFI_MEMMAP) call
  arch/x86: Remove redundant set_bit(EFI_SYSTEM_TABLES) call
  efi: Introduce EFI_PARAVIRT flag
  arch/x86: Do not access EFI memory map if it is not available
  efi: Use early_mem*() instead of early_io*()
  arch/ia64: Define early_memunmap()
  x86/reboot: Add EFI reboot quirk for ACPI Hardware Reduced flag
  efi/reboot: Allow powering off machines using EFI
  efi/reboot: Add generic wrapper around EfiResetSystem()
  ...
2014-08-04 17:13:50 -07:00
Ulf Winkelvos fb86b2440d x86/efi: Add better error logging to EFI boot stub
Hopefully this will enable us to better debug:
https://bugzilla.kernel.org/show_bug.cgi?id=68761

Signed-off-by: Ulf Winkelvos <ulf@winkelvos.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-07-18 21:24:01 +01:00
Ard Biesheuvel f4f75ad574 efi: efistub: Convert into static library
This patch changes both x86 and arm64 efistub implementations
from #including shared .c files under drivers/firmware/efi to
building shared code as a static library.

The x86 code uses a stub built into the boot executable which
uncompresses the kernel at boot time. In this case, the library is
linked into the decompressor.

In the arm64 case, the stub is part of the kernel proper so the library
is linked into the kernel proper as well.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-07-18 21:22:19 +01:00
Ard Biesheuvel bd669475d1 efi: efistub: Refactor stub components
In order to move from the #include "../../../xxxxx.c" anti-pattern used
by both the x86 and arm64 versions of the stub to a static library
linked into either the kernel proper (arm64) or a separate boot
executable (x86), there is some prepatory work required.

This patch does the following:
- move forward declarations of functions shared between the arch
  specific and the generic parts of the stub to include/linux/efi.h
- move forward declarations of functions shared between various .c files
  of the generic stub code to a new local header file called "efistub.h"
- add #includes to all .c files which were formerly relying on the
  #includor to include the correct header files
- remove all static modifiers from functions which will need to be
  externally visible once we move to a static library

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-07-07 20:29:48 +01:00
Ard Biesheuvel f23cf8bd5c efi/x86: efistub: Move shared dependencies to <asm/efi.h>
This moves definitions depended upon both by code under arch/x86/boot
and under drivers/firmware/efi to <asm/efi.h>. This is in preparation of
turning the stub code under drivers/firmware/efi into a static library.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-07-07 20:29:46 +01:00
Yinghai Lu 4bf7111f50 x86/efi: Support initrd loaded above 4G
For boot efi kernel directly without bootloader.
If the kernel support XLF_CAN_BE_LOADED_ABOVE_4G, we should
not limit initrd under hdr->initrd_add_max.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-06-19 11:14:34 +01:00
Kees Cook 24f2e0273f x86, kaslr: boot-time selectable with hibernation
Changes kASLR from being compile-time selectable (blocked by
CONFIG_HIBERNATION), to being boot-time selectable (with hibernation
available by default) via the "kaslr" kernel command line.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-16 23:30:44 +02:00
Linus Torvalds 046f153343 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull x86 EFI updates from Peter Anvin:
 "A collection of EFI changes.  The perhaps most important one is to
  fully save and restore the FPU state around each invocation of EFI
  runtime, and to not choke on non-ASCII characters in the boot stub"

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efivars: Add compatibility code for compat tasks
  efivars: Refactor sanity checking code into separate function
  efivars: Stop passing a struct argument to efivar_validate()
  efivars: Check size of user object
  efivars: Use local variables instead of a pointer dereference
  x86/efi: Save and restore FPU context around efi_calls (i386)
  x86/efi: Save and restore FPU context around efi_calls (x86_64)
  x86/efi: Implement a __efi_call_virt macro
  x86, fpu: Extend the use of static_cpu_has_safe
  x86/efi: Delete most of the efi_call* macros
  efi: x86: Handle arbitrary Unicode characters
  efi: Add get_dram_base() helper function
  efi: Add shared printk wrapper for consistent prefixing
  efi: create memory map iteration helper
  efi: efi-stub-helper cleanup
2014-06-05 08:16:29 -07:00
Linus Torvalds e7a38766d2 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull x86 boot changes from Ingo Molnar:
 "Two small cleanups"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, boot: Remove misc.h inclusion from compressed/string.c
  x86, boot: Do not include boot.h in string.c
2014-06-03 15:41:57 -07:00
Vivek Goyal a9a17104a1 x86, boot: Remove misc.h inclusion from compressed/string.c
Given the fact that we removed inclusion of boot.h from boot/string.c
does not look like we need misc.h inclusion in compressed/string.c. So
remove it.

misc.h was also pulling in string_32.h which in turn had macros for
memcmp and memcpy. So we don't need to #undef memcmp and memcpy anymore.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Link: http://lkml.kernel.org/r/1398447972-27896-3-git-send-email-vgoyal@redhat.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-05-08 08:00:06 -07:00
Andi Kleen 2605fc216f asmlinkage, x86: Add explicit __visible to arch/x86/*
As requested by Linus add explicit __visible to the asmlinkage users.
This marks all functions visible to assembler.

Tree sweep for arch/x86/*

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1398984278-29319-3-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05 16:07:44 -07:00
Matt Fleming 62fa6e69a4 x86/efi: Delete most of the efi_call* macros
We really only need one phys and one virt function call, and then only
one assembly function to make firmware calls.

Since we are not using the C type system anyway, we're not really losing
much by deleting the macros apart from no longer having a check that
we are passing the correct number of parameters. The lack of duplicated
code seems like a worthwhile trade-off.

Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-04-17 13:26:30 +01:00
H. Peter Anvin c625d1c203 efi: x86: Handle arbitrary Unicode characters
Instead of truncating UTF-16 assuming all characters is ASCII,
properly convert it to UTF-8.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
[ Bug and style fixes. ]
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Leif Lindholm <leif.lindholm@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-04-17 12:29:25 +01:00
Linus Torvalds 8eab6cd031 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "This is a collection of minor fixes for x86, plus the IRET information
  leak fix (forbid the use of 16-bit segments in 64-bit mode)"

NOTE! We may have to relax the "forbid the use of 16-bit segments in
64-bit mode" part, since there may be people who still run and depend on
16-bit Windows binaries under Wine.

But I'm taking this in the current unconditional form for now to see who
(if anybody) screams bloody murder.  Maybe nobody cares.  And maybe
we'll have to update it with some kind of runtime enablement (like our
vm.mmap_min_addr tunable that people who run dosemu/qemu/wine already
need to tweak).

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels
  efi: Pass correct file handle to efi_file_{read,close}
  x86/efi: Correct EFI boot stub use of code32_start
  x86/efi: Fix boot failure with EFI stub
  x86/platform/hyperv: Handle VMBUS driver being a module
  x86/apic: Reinstate error IRQ Pentium erratum 3AP workaround
  x86, CMCI: Add proper detection of end of CMCI storms
2014-04-11 11:58:33 -07:00
Matt Fleming 47514c996f efi: Pass correct file handle to efi_file_{read,close}
We're currently passing the file handle for the root file system to
efi_file_read() and efi_file_close(), instead of the file handle for the
file we wish to read/close.

While this has worked up until now, it seems that it has only been by
pure luck. Olivier explains,

 "The issue is the UEFI Fat driver might return the same function for
  'fh->read()' and 'h->read()'. While in our case it does not work with
  a different implementation of EFI_SIMPLE_FILE_SYSTEM_PROTOCOL. In our
  case, we return a different pointer when reading a directory and
  reading a file."

Fixing this actually clears up the two functions because we can drop one
of the arguments, and instead only pass a file 'handle' argument.

Reported-by: Olivier Martin <olivier.martin@arm.com>
Reviewed-by: Olivier Martin <olivier.martin@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-04-10 21:20:03 +01:00
Matt Fleming 7e8213c1f3 x86/efi: Correct EFI boot stub use of code32_start
code32_start should point at the start of the protected mode code, and
*not* at the beginning of the bzImage. This is much easier to do in
assembly so document that callers of make_boot_params() need to fill out
code32_start.

The fallout from this bug is that we would end up relocating the image
but copying the image at some offset, resulting in what appeared to be
memory corruption.

Reported-by: Thomas Bächler <thomas@archlinux.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-04-10 21:19:52 +01:00
Matt Fleming 396f1a08db x86/efi: Fix boot failure with EFI stub
commit 54b52d8726 ("x86/efi: Build our own EFI services pointer
table") introduced a regression because the 64-bit file_size()
implementation passed a pointer to a 32-bit data object, instead of a
pointer to a 64-bit object.

Because the firmware treats the object as 64-bits regardless it was
reading random values from the stack for the upper 32-bits.

This resulted in people being unable to boot their machines, after
seeing the following error messages,

    Failed to get file info size
    Failed to alloc highmem for files

Reported-by: Dzmitry Sledneu <dzmitry.sledneu@gmail.com>
Reported-by: Koen Kooi <koen@dominion.thruhere.net>
Tested-by: Koen Kooi <koen@dominion.thruhere.net>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-04-10 21:19:47 +01:00
Linus Torvalds 9447dc4394 Merge branch 'x86/boot' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot changes from Peter Anvin:
 "This patchset is a set of cleanups aiming at librarize some of the
  common code from the boot environments.  We currently have three
  different "little environments" (boot, boot/compressed, and
  realmode/rm) in x86, and we are likely to soon get a fourth one
  (kexec/purgatory, which will have to be integrated in the kernel to
  support secure kexec).  This is primarily a cleanup in the
  anticipation of the latter.

  While Vivek implemented this, he ran into some bugs, in particular the
  memcmp implementation for when gcc punts from using the builtin would
  have a misnamed symbol, causing compilation errors if we were ever
  unlucky enough that gcc didn't want to inline the test"

* 'x86/boot' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, boot: Move memset() definition in compressed/string.c
  x86, boot: Move memcmp() into string.h and string.c
  x86, boot: Move optimized memcpy() 32/64 bit versions to compressed/string.c
  x86, boot: Create a separate string.h file to provide standard string functions
  x86, boot: Undef memcmp before providing a new definition
2014-04-02 12:23:49 -07:00
Matt Fleming 204b0a1a4b x86, efi: Abstract x86 efi_early calls
The ARM EFI boot stub doesn't need to care about the efi_early
infrastructure that x86 requires in order to do mixed mode thunking. So
wrap everything up in an efi_call_early() macro.

This allows x86 to do the necessary indirection jumps to call whatever
firmware interface is necessary (native or mixed mode), but also allows
the ARM folks to mask the fact that they don't support relocation in the
boot stub and need to pass 'sys_table_arg' to every function.

[ hpa: there are no object code changes from this patch ]

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/20140326091011.GB2958@console-pimps.org
Cc: Roy Franz <roy.franz@linaro.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-26 11:30:03 -07:00
Vivek Goyal 04999550f9 x86, boot: Move memset() definition in compressed/string.c
Currently compressed/misc.c needs to link against memset(). I think one of
the reasons of this need is inclusion of various header files which define
static inline functions and use memset() inside these. For example,
include/linux/bitmap.h

I think trying to include "../string.h" and using builtin version of memset
does not work because by the time "#define memset" shows up, it is too
late. Some other header file has already used memset() and expects to
find a definition during link phase.

Currently we have a C definitoin of memset() in misc.c. Move it to
compressed/string.c so that others can use it if need be.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Link: http://lkml.kernel.org/r/1395170800-11059-6-git-send-email-vgoyal@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-19 15:44:09 -07:00
Vivek Goyal fb4cac573e x86, boot: Move memcmp() into string.h and string.c
Try to treat memcmp() in same way as memcpy() and memset(). Provide a
declaration in boot/string.h and by default user gets a memcmp() which
maps to builtin function.

Move optimized definition of memcmp() in boot/string.c. Now a user can
do #undef memcmp and link against string.c to use optimzied memcmp().

It also simplifies boot/compressed/string.c where we had to redefine
memcmp(). That extra definition is gone now.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Link: http://lkml.kernel.org/r/1395170800-11059-5-git-send-email-vgoyal@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-19 15:44:04 -07:00
Vivek Goyal 820e8feca0 x86, boot: Move optimized memcpy() 32/64 bit versions to compressed/string.c
Move optimized versions of memcpy to compressed/string.c This will allow
any other code to use these functions too if need be in future. Again
trying to put definition in a common place instead of hiding it in misc.c

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Link: http://lkml.kernel.org/r/1395170800-11059-4-git-send-email-vgoyal@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-19 15:43:59 -07:00
Vivek Goyal aad830938e x86, boot: Undef memcmp before providing a new definition
With CONFIG_X86_32=y, string_32.h gets pulled in compressed/string.c by
"misch.h". string_32.h defines a macro to map memcmp to __builtin_memcmp().
And that macro in turn changes the name of memcmp() defined here and
converts it to __builtin_memcmp().

I thought that's not the intention though. We probably want to provide
our own optimized definition of memcmp(). If yes, then undef the memcmp
before we define a new memcmp.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Link: http://lkml.kernel.org/r/1395170800-11059-2-git-send-email-vgoyal@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-19 15:43:37 -07:00
Matt Fleming 617b3c37da Merge branch 'mixed-mode' into efi-for-mingo 2014-03-05 18:18:50 +00:00
Matt Fleming 994448f1af Merge remote-tracking branch 'tip/x86/efi-mixed' into efi-for-mingo
Conflicts:
	arch/x86/kernel/setup.c
	arch/x86/platform/efi/efi.c
	arch/x86/platform/efi/efi_64.c
2014-03-05 18:15:37 +00:00
Matt Fleming 3db4cafdfd x86/boot: Fix non-EFI build
The kbuild test robot reported the following errors, introduced with
commit 54b52d8726 ("x86/efi: Build our own EFI services pointer
table"),

 arch/x86/boot/compressed/head_32.o: In function `efi32_config':
>> (.data+0x58): undefined reference to `efi_call_phys'

 arch/x86/boot/compressed/head_64.o: In function `efi64_config':
>> (.data+0x90): undefined reference to `efi_call6'

Wrap the efi*_config structures in #ifdef CONFIG_EFI_STUB so that we
don't make references to EFI functions if they're not compiled in.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-05 10:19:07 +00:00
Matt Fleming 108d3f44b1 x86/boot: Don't overwrite cr4 when enabling PAE
Some EFI firmware makes use of the FPU during boottime services and
clearing X86_CR4_OSFXSR by overwriting %cr4 causes the firmware to
crash.

Add the PAE bit explicitly instead of trashing the existing contents,
leaving the rest of the bits as the firmware set them.

Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 21:43:59 +00:00
Matt Fleming b8ff87a615 x86/efi: Firmware agnostic handover entry points
The EFI handover code only works if the "bitness" of the firmware and
the kernel match, i.e. 64-bit firmware and 64-bit kernel - it is not
possible to mix the two. This goes against the tradition that a 32-bit
kernel can be loaded on a 64-bit BIOS platform without having to do
anything special in the boot loader. Linux distributions, for one thing,
regularly run only 32-bit kernels on their live media.

Despite having only one 'handover_offset' field in the kernel header,
EFI boot loaders use two separate entry points to enter the kernel based
on the architecture the boot loader was compiled for,

    (1) 32-bit loader: handover_offset
    (2) 64-bit loader: handover_offset + 512

Since we already have two entry points, we can leverage them to infer
the bitness of the firmware we're running on, without requiring any boot
loader modifications, by making (1) and (2) valid entry points for both
CONFIG_X86_32 and CONFIG_X86_64 kernels.

To be clear, a 32-bit boot loader will always use (1) and a 64-bit boot
loader will always use (2). It's just that, if a single kernel image
supports (1) and (2) that image can be used with both 32-bit and 64-bit
boot loaders, and hence both 32-bit and 64-bit EFI.

(1) and (2) must be 512 bytes apart at all times, but that is already
part of the boot ABI and we could never change that delta without
breaking existing boot loaders anyhow.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 21:25:06 +00:00
Matt Fleming c116e8d60a x86/efi: Split the boot stub into 32/64 code paths
Make the decision which code path to take at runtime based on
efi_early->is64.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 21:25:05 +00:00
Matt Fleming 0154416a71 x86/efi: Add early thunk code to go from 64-bit to 32-bit
Implement the transition code to go from IA32e mode to protected mode in
the EFI boot stub. This is required to use 32-bit EFI services from a
64-bit kernel.

Since EFI boot stub is executed in an identity-mapped region, there's
not much we need to do before invoking the 32-bit EFI boot services.
However, we do reload the firmware's global descriptor table
(efi32_boot_gdt) in case things like timer events are still running in
the firmware.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 21:25:04 +00:00
Matt Fleming 54b52d8726 x86/efi: Build our own EFI services pointer table
It's not possible to dereference the EFI System table directly when
booting a 64-bit kernel on a 32-bit EFI firmware because the size of
pointers don't match.

In preparation for supporting the above use case, build a list of
function pointers on boot so that callers don't have to worry about
converting pointer sizes through multiple levels of indirection.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 21:25:03 +00:00
Matt Fleming 677703cef0 efi: Add separate 32-bit/64-bit definitions
The traditional approach of using machine-specific types such as
'unsigned long' does not allow the kernel to interact with firmware
running in a different CPU mode, e.g. 64-bit kernel with 32-bit EFI.

Add distinct EFI structure definitions for both 32-bit and 64-bit so
that we can use them in the 32-bit and 64-bit code paths.

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 21:25:02 +00:00
Kees Cook e290e8c59d x86, kaslr: add missed "static" declarations
This silences build warnings about unexported variables and functions.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20140209215644.GA30339@www.outflux.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-25 16:59:29 -08:00
Linus Torvalds f4bcd8ccdd Merge branch 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 kernel address space randomization support from Peter Anvin:
 "This enables kernel address space randomization for x86"

* 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, kaslr: Clarify RANDOMIZE_BASE_MAX_OFFSET
  x86, kaslr: Remove unused including <linux/version.h>
  x86, kaslr: Use char array to gain sizeof sanity
  x86, kaslr: Add a circular multiply for better bit diffusion
  x86, kaslr: Mix entropy sources together as needed
  x86/relocs: Add percpu fixup for GNU ld 2.23
  x86, boot: Rename get_flags() and check_flags() to *_cpuflags()
  x86, kaslr: Raise the maximum virtual address to -1 GiB on x86_64
  x86, kaslr: Report kernel offset on panic
  x86, kaslr: Select random position from e820 maps
  x86, kaslr: Provide randomness functions
  x86, kaslr: Return location from decompress_kernel
  x86, boot: Move CPU flags out of cpucheck
  x86, relocs: Add more per-cpu gold special cases
2014-01-20 14:45:50 -08:00
Wei Yongjun 19259943f0 x86, kaslr: Remove unused including <linux/version.h>
Remove including <linux/version.h> that don't need it.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Link: http://lkml.kernel.org/r/CAPgLHd-Fjx1RybjWFAu1vHRfTvhWwMLL3x46BouC5uNxHPjy1A@mail.gmail.com
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-01-14 10:45:56 -08:00
H. Peter Anvin 8b3b005d67 x86, build: Pass in additional -mno-mmx, -mno-sse options
In checkin

    5551a34e5a x86-64, build: Always pass in -mno-sse

we unconditionally added -mno-sse to the main build, to keep newer
compilers from generating SSE instructions from autovectorization.
However, this did not extend to the special environments
(arch/x86/boot, arch/x86/boot/compressed, and arch/x86/realmode/rm).
Add -mno-sse to the compiler command line for these environments, and
add -mno-mmx to all the environments as well, as we don't want a
compiler to generate MMX code either.

This patch also removes a $(cc-option) call for -m32, since we have
long since stopped supporting compilers too old for the -m32 option,
and in fact hardcode it in other places in the Makefiles.

Reported-by: Kevin B. Smith <kevin.b.smith@intel.com>
Cc: Sunil K. Pandey <sunil.k.pandey@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: H. J. Lu <hjl.tools@gmail.com>
Link: http://lkml.kernel.org/n/tip-j21wzqv790q834n7yc6g80j1@git.kernel.org
Cc: <stable@vger.kernel.org> # build fix only
2013-12-09 15:52:39 -08:00
Kees Cook 327f7d7245 x86, kaslr: Use char array to gain sizeof sanity
The build_str needs to be char [] not char * for the sizeof() to report
the string length.

Reported-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20131112165607.GA5921@www.outflux.net
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2013-11-12 08:58:35 -08:00
H. Peter Anvin e8236c4d93 x86, kaslr: Add a circular multiply for better bit diffusion
If we don't have RDRAND (in which case nothing else *should* matter),
most sources have a highly biased entropy distribution.  Use a
circular multiply to diffuse the entropic bits.  A circular multiply
is a good operation for this: it is cheap on standard hardware and
because it is symmetric (unlike an ordinary multiply) it doesn't
introduce its own bias.

Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/20131111222839.GA28616@www.outflux.net
2013-11-11 23:05:49 -08:00
Kees Cook a653f3563c x86, kaslr: Mix entropy sources together as needed
Depending on availability, mix the RDRAND and RDTSC entropy together with
XOR. Only when neither is available should the i8254 be used. Update
the Kconfig documentation to reflect this. Additionally, since bits
used for entropy is masked elsewhere, drop the needless masking in
the get_random_long(). Similarly, use the entire TSC, not just the low
32 bits.

Finally, to improve the starting entropy, do a simple hashing of a
build-time versions string and the boot-time boot_params structure for
some additional level of unpredictability.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20131111222839.GA28616@www.outflux.net
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2013-11-11 22:29:44 -08:00
Linus Torvalds 69019d77c7 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 EFI changes from Ingo Molnar:
 "Main changes:

   - Add support for earlyprintk=efi which uses the EFI framebuffer.
     Very useful for debugging boot problems.

   - EFI stub support for large memory maps (more than 128 entries)

   - EFI ARM support - this was mostly done by generalizing x86 <-> ARM
     platform differences, such as by moving x86 EFI code into
     drivers/firmware/efi/ and sharing it with ARM.

   - Documentation updates

   - misc fixes"

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
  x86/efi: Add EFI framebuffer earlyprintk support
  boot, efi: Remove redundant memset()
  x86/efi: Fix config_table_type array termination
  x86 efi: bugfix interrupt disabling sequence
  x86: EFI stub support for large memory maps
  efi: resolve warnings found on ARM compile
  efi: Fix types in EFI calls to match EFI function definitions.
  efi: Renames in handle_cmdline_files() to complete generalization.
  efi: Generalize handle_ramdisks() and rename to handle_cmdline_files().
  efi: Allow efi_free() to be called with size of 0
  efi: use efi_get_memory_map() to get final map for x86
  efi: generalize efi_get_memory_map()
  efi: Rename __get_map() to efi_get_memory_map()
  efi: Move unicode to ASCII conversion to shared function.
  efi: Generalize relocate_kernel() for use by other architectures.
  efi: Move relocate_kernel() to shared file.
  efi: Enforce minimum alignment of 1 page on allocations.
  efi: Rename memory allocation/free functions
  efi: Add system table pointer argument to shared functions.
  efi: Move common EFI stub code from x86 arch code to common location
  ...
2013-11-12 10:48:30 +09:00
H. Peter Anvin 6e6a4932b0 x86, boot: Rename get_flags() and check_flags() to *_cpuflags()
When a function is used in more than one file it may not be possible
to immediately tell from context what the intended meaning is.  As
such, it is more important that the naming be self-evident.  Thus,
change get_flags() to get_cpuflags().

For consistency, change check_flags() to check_cpuflags() even though
it is only used in cpucheck.c.

Link: http://lkml.kernel.org/r/1381450698-28710-2-git-send-email-keescook@chromium.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-10-13 04:08:56 -07:00
Kees Cook 82fa9637a2 x86, kaslr: Select random position from e820 maps
Counts available alignment positions across all e820 maps, and chooses
one randomly for the new kernel base address, making sure not to collide
with unsafe memory areas.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/1381450698-28710-5-git-send-email-keescook@chromium.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-10-13 03:12:19 -07:00
Kees Cook 5bfce5ef55 x86, kaslr: Provide randomness functions
Adds potential sources of randomness: RDRAND, RDTSC, or the i8254.

This moves the pre-alternatives inline rdrand function into the header so
both pieces of code can use it. Availability of RDRAND is then controlled
by CONFIG_ARCH_RANDOM, if someone wants to disable it even for kASLR.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/1381450698-28710-4-git-send-email-keescook@chromium.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-10-13 03:12:12 -07:00
Kees Cook 8ab3820fd5 x86, kaslr: Return location from decompress_kernel
This allows decompress_kernel to return a new location for the kernel to
be relocated to. Additionally, enforces CONFIG_PHYSICAL_START as the
minimum relocation position when building with CONFIG_RELOCATABLE.

With CONFIG_RANDOMIZE_BASE set, the choose_kernel_location routine
will select a new location to decompress the kernel, though here it is
presently a no-op. The kernel command line option "nokaslr" is introduced
to bypass these routines.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/1381450698-28710-3-git-send-email-keescook@chromium.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-10-13 03:12:07 -07:00
Kees Cook dd78b97367 x86, boot: Move CPU flags out of cpucheck
Refactor the CPU flags handling out of the cpucheck routines so that
they can be reused by the future ASLR routines (in order to detect CPU
features like RDRAND and RDTSC).

This reworks has_eflag() and has_fpu() to be used on both 32-bit and
64-bit, and refactors the calls to cpuid to make them PIC-safe on 32-bit.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/1381450698-28710-2-git-send-email-keescook@chromium.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-10-13 03:12:02 -07:00
Geyslan G. Bem 49449c30c4 x86: mkpiggy.c: Explicitly close the output file
Even though the resource is released when the application is closed or
when returned from main function, modify the code to make it obvious,
and to keep static analysis tools from complaining.

Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Link: http://lkml.kernel.org/r/1381184219-10985-1-git-send-email-geyslan@gmail.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-10-08 11:36:09 -07:00
Bart Kuivenhoven 0ce6cda2c7 x86 efi: bugfix interrupt disabling sequence
The problem in efi_main was that the idt was cleared before the
interrupts were disabled.

The UEFI spec states that interrupts aren't used so this shouldn't be
too much of a problem. Peripherals however don't necessarily know about
this and thus might cause interrupts to happen anyway. Even if
ExitBootServices() has been called.

This means there is a risk of an interrupt being triggered while the IDT
register is nullified and the interrupt bit hasn't been cleared,
allowing for a triple fault.

This patch disables the interrupt flag, while leaving the existing IDT
in place. The CPU won't care about the IDT at all as long as the
interrupt bit is off, so it's safe to leave it in place as nothing will
ever happen to it.

[ Removed the now unused 'idt' variable - Matt ]

Signed-off-by: Bart Kuivenhoven <bemk@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-09-30 10:23:11 +01:00
Linn Crosetto d2078d5adb x86: EFI stub support for large memory maps
This patch fixes a problem with EFI memory maps larger than 128 entries
when booting using the EFI stub, which results in overflowing e820_map
in boot_params and an eventual halt when checking the map size in
sanitize_e820_map().

If the number of map entries is greater than what can fit in e820_map,
add the extra entries to the setup_data list using type SETUP_E820_EXT.
These extra entries are then picked up when the setup_data list is
parsed in parse_e820_ext().

Signed-off-by: Linn Crosetto <linn@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-09-30 10:23:10 +01:00
Roy Franz 46f4582e7c efi: Generalize handle_ramdisks() and rename to handle_cmdline_files().
The handle_cmdline_files now takes the option to handle as a string,
and returns the loaded data through parameters, rather than taking
an x86 specific setup_header structure.  For ARM, this will be used
to load a device tree blob in addition to initrd images.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Mark Salter <msalter@redhat.com>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-09-25 12:34:43 +01:00
Roy Franz 0e1cadb05b efi: Allow efi_free() to be called with size of 0
Make efi_free() safely callable with size of 0, similar to free() being
callable with NULL pointers, and do nothing in that case.
Remove size checks that this makes redundant.  This also avoids some
size checks in the ARM EFI stub code that will be added as well.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-09-25 12:34:42 +01:00
Roy Franz ae8e9060a3 efi: use efi_get_memory_map() to get final map for x86
Replace the open-coded memory map getting with the
efi_get_memory_map() that is now general enough to use.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-09-25 12:34:41 +01:00
Roy Franz 5fef3870c5 efi: Move unicode to ASCII conversion to shared function.
Move the open-coded conversion to a shared function for
use by all architectures.  Change the allocation to prefer
a high address for ARM, as this is required to avoid conflicts
with reserved regions in low memory.  We don't know the specifics
of these regions until after we process the command line and
device tree.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-09-25 12:34:38 +01:00
Roy Franz 4a9f3a7c33 efi: Generalize relocate_kernel() for use by other architectures.
Rename relocate_kernel() to efi_relocate_kernel(), and take
parameters rather than x86 specific structure.  Add max_addr
argument as for ARM we have some address constraints that we
need to enforce when relocating the kernel.  Add alloc_size
parameter for use by ARM64 which uses an uncompressed kernel,
and needs to allocate space for BSS.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-09-25 12:34:38 +01:00
Roy Franz c6866d7238 efi: Move relocate_kernel() to shared file.
The relocate_kernel() function will be generalized and used
by all architectures, as they all have similar requirements.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-09-25 12:34:37 +01:00
Roy Franz 40e4530a00 efi: Rename memory allocation/free functions
Rename them to be more similar, as low_free() could be used to free
memory allocated by both high_alloc() and low_alloc().
high_alloc() -> efi_high_alloc()
low_alloc()  -> efi_low_alloc()
low_free()   -> efi_free()

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Mark Salter <msalter@redhat.com>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-09-25 12:34:36 +01:00
Roy Franz 876dc36ace efi: Add system table pointer argument to shared functions.
Add system table pointer argument to shared EFI stub related functions
so they no longer use a global system table pointer as they did when part
of eboot.c.  For the ARM EFI stub this allows us to avoid global
variables completely and thereby not have to deal with GOT fixups.
Not having the EFI stub fixup its GOT, which is shared with the
decompressor, simplifies the relocating of the zImage to a
bootable address.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-09-25 12:34:35 +01:00
Roy Franz 7721da4c1e efi: Move common EFI stub code from x86 arch code to common location
No code changes made, just moving functions and #define from x86 arch
directory to common location.  Code is shared using #include, similar
to how decompression code is shared among architectures.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Mark Salter <msalter@redhat.com>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-09-25 12:34:34 +01:00
Roy Franz ed37ddffe2 efi: Add proper definitions for some EFI function pointers.
The x86/AMD64 EFI stubs must use a call wrapper to convert between
the Linux and EFI ABIs, so void pointers are sufficient.  For ARM,
the ABIs are compatible, so we can directly invoke the function
pointers.  The functions that are used by the ARM stub are updated
to match the EFI definitions.
Also add some EFI types used by EFI functions.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Mark Salter <msalter@redhat.com>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-09-25 12:34:33 +01:00
Linus Torvalds aafcd5d757 Merge branch 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 relocation changes from Ingo Molnar:
 "This tree contains a single change, ELF relocation handling in C - one
  of the kernel randomization patches that makes sense even without
  randomization present upstream"

* 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, relocs: Move ELF relocation handling to C
2013-09-04 09:38:10 -07:00
Kees Cook a021506107 x86, relocs: Move ELF relocation handling to C
Moves the relocation handling into C, after decompression. This requires
that the decompressed size is passed to the decompression routine as
well so that relocations can be found. Only kernels that need relocation
support will use the code (currently just x86_32), but this is laying
the ground work for 64-bit using it in support of KASLR.

Based on work by Neill Clift and Michael Davidson.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20130708161517.GA4832@www.outflux.net
Acked-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-07 21:00:04 -07:00
Roy Franz df981edcb9 x86, efi: correct call to free_pages
Specify memory size in pages, not bytes.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-07-26 15:22:32 +01:00
Kyungsik Lee f9b493ac9b arm: add support for LZ4-compressed kernel
Integrates the LZ4 decompression code to the arm pre-boot code.

Signed-off-by: Kyungsik Lee <kyungsik.lee@lge.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Florian Fainelli <florian@openwrt.org>
Cc: Yann Collet <yann.collet.73@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09 10:33:30 -07:00
Linus Torvalds 1982269a5c Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm changes from Ingo Molnar:
 "Misc improvements:

   - Fix /proc/mtrr reporting
   - Fix ioremap printout
   - Remove the unused pvclock fixmap entry on 32-bit
   - misc cleanups"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ioremap: Correct function name output
  x86: Fix /proc/mtrr with base/size more than 44bits
  ix86: Don't waste fixmap entries
  x86/mm: Drop unneeded include <asm/*pgtable, page*_types.h>
  x86_64: Correct phys_addr in cleanup_highmap comment
2013-07-02 16:29:05 -07:00
Linus Torvalds 4d6f843a38 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 EFI changes from Ingo Molnar:
 "Two fixes that should in principle increase robustness of our
  interaction with the EFI firmware, and a cleanup"

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, efi: retry ExitBootServices() on failure
  efi: Convert runtime services function ptrs
  UEFI: Don't pass boot services regions to SetVirtualAddressMap()
2013-07-02 16:25:50 -07:00
Zach Bobroff d3768d885c x86, efi: retry ExitBootServices() on failure
ExitBootServices is absolutely supposed to return a failure if any
ExitBootServices event handler changes the memory map.  Basically the
get_map loop should run again if ExitBootServices returns an error the
first time.  I would say it would be fair that if ExitBootServices gives
an error the second time then Linux would be fine in returning control
back to BIOS.

The second change is the following line:

again:
        size += sizeof(*mem_map) * 2;

Originally you were incrementing it by the size of one memory map entry.
The issue here is all related to the low_alloc routine you are using.
In this routine you are making allocations to get the memory map itself.
Doing this allocation or allocations can affect the memory map by more
than one record.

[ mfleming - changelog, code style ]
Signed-off-by: Zach Bobroff <zacharyb@ami.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-06-11 07:51:54 +01:00
Matthew Garrett f8b8404337 Modify UEFI anti-bricking code
This patch reworks the UEFI anti-bricking code, including an effective
reversion of cc5a080c and 31ff2f20. It turns out that calling
QueryVariableInfo() from boot services results in some firmware
implementations jumping to physical addresses even after entering virtual
mode, so until we have 1:1 mappings for UEFI runtime space this isn't
going to work so well.

Reverting these gets us back to the situation where we'd refuse to create
variables on some systems because they classify deleted variables as "used"
until the firmware triggers a garbage collection run, which they won't do
until they reach a lower threshold. This results in it being impossible to
install a bootloader, which is unhelpful.

Feedback from Samsung indicates that the firmware doesn't need more than
5KB of storage space for its own purposes, so that seems like a reasonable
threshold. However, there's still no guarantee that a platform will attempt
garbage collection merely because it drops below this threshold. It seems
that this is often only triggered if an attempt to write generates a
genuine EFI_OUT_OF_RESOURCES error. We can force that by attempting to
create a variable larger than the remaining space. This should fail, but if
it somehow succeeds we can then immediately delete it.

I've tested this on the UEFI machines I have available, but I don't have
a Samsung and so can't verify that it avoids the bricking problem.

Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: Lee, Chun-Y <jlee@suse.com> [ dummy variable cleanup ]
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-06-10 21:59:37 +01:00
Zhang Yanfei 592a9b8cc8 x86/mm: Drop unneeded include <asm/*pgtable, page*_types.h>
arch/x86/boot/compressed/head_64.S includes <asm/pgtable_types.h> and
 <asm/page_types.h> but it doesn't look like it needs them. So remove them.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/5191FAE2.4020403@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 11:47:23 +02:00
Linus Torvalds 874f6d1be7 Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "Misc smaller cleanups"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/lib: Fix spelling, put space between a numeral and its units
  x86/lib: Fix spelling in the comments
  x86, quirks: Shut-up a long-standing gcc warning
  x86, msr: Unify variable names
  x86-64, docs, mm: Add vsyscall range to virtual address space layout
  x86: Drop KERNEL_IMAGE_START
  x86_64: Use __BOOT_DS instead_of __KERNEL_DS for safety
2013-04-30 08:34:07 -07:00
H. Peter Anvin 697dfd8844 * The EFI variable anti-bricking algorithm merged in -rc8 broke booting
on some Apple machines because they implement EFI spec 1.10, which
    doesn't provide a QueryVariableInfo() runtime function and the logic
    used to check for the existence of that function was insufficient.
    Fix from Josh Boyer.
 
  * The anti-bricking algorithm also introduced a compiler warning on
    32-bit. Fix from Borislav Petkov.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJReOtLAAoJEC84WcCNIz1VFZgP/Aws1NdPo/RdyI6/oGkI7ZV4
 +5O79pLcaJt7ESuWjx2/9pto/qTzsWMri40HZivGbgxw+ViEdprGjJUFqSTn1LyJ
 QrYamP40jBdLFfh1oDHvsub8HiC72sjB/ILSoDvooHEniDmajrL6zZK7C66gP+na
 Q4ZN/Jp3x3XAW0s1mVJC4VnL60489Q/ndR3SH01hr2gqMSvmjwnhfiio6n9gYvdd
 egmoalTIst94+X0nW1VHA4HT3SRM7cuwCA/kDxtG6qitbsQMUKUoa+DOpMNfE8mD
 QdzmzZL115O+7ORj8Ki/JNS2CSyI83IRSQ3kcM1J5026mWIBMiM3h9Vlu5NwAyFA
 bapZSaYr7S5u9BU/vICGnpyYnSsLfjuB3CnAuJFyM0YVFjR6n7moUpnP1LNifGHX
 E/Qr1HDyIwwxE8K0f/n86a7BfstoMjzE74an6wOVXKDUY/RnH+FdWG/HDBPd8iG4
 Avei1bK2zLLcXK4Kqmx8EkXTK7VSFx6StCPjAVlpgYOAMpRmQEmNpd/3lF7Y70gp
 yXIBTSTKaPZ+/5SaeOPL2sgW37Uo9fFMphww2mLXGIdgO3L0BHD5hIq9pZQ7g0VK
 noDN7f6ViCuNYuZIrTAtLo9Oc+KKgqOXa0TovUhORkJ8Gk93moL4fgYyFVPvsYnD
 rQuTRJ3pZEEHlCmyZzBl
 =l/fT
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent' into x86/urgent

 * The EFI variable anti-bricking algorithm merged in -rc8 broke booting
   on some Apple machines because they implement EFI spec 1.10, which
   doesn't provide a QueryVariableInfo() runtime function and the logic
   used to check for the existence of that function was insufficient.
   Fix from Josh Boyer.

 * The anti-bricking algorithm also introduced a compiler warning on
   32-bit. Fix from Borislav Petkov.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-25 14:00:22 -07:00
Josh Boyer f697036b93 efi: Check EFI revision in setup_efi_vars
We need to check the runtime sys_table for the EFI version the firmware
specifies instead of just checking for a NULL QueryVariableInfo.  Older
implementations of EFI don't have QueryVariableInfo but the runtime is
a smaller structure, so the pointer to it may be pointing off into garbage.

This is apparently the case with several Apple firmwares that support EFI
1.10, and the current check causes them to no longer boot.  Fix based on
a suggestion from Matthew Garrett.

Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-04-24 16:19:01 +01:00
Borislav Petkov 51f8fbba64 x86, efi: Fix a build warning
Fix this:

arch/x86/boot/compressed/eboot.c: In function ‘setup_efi_vars’:
arch/x86/boot/compressed/eboot.c:269:2: warning: passing argument 1 of ‘efi_call_phys’ makes pointer from integer without a cast [enabled by default]
In file included from arch/x86/boot/compressed/eboot.c:12:0:
/w/kernel/linux/arch/x86/include/asm/efi.h:8:33: note: expected ‘void *’ but argument is of type ‘long unsigned int’

after cc5a080c5d ("efi: Pass boot services variable info to runtime
code").

Reported-by: Paul Bolle <pebolle@tiscali.nl>
Cc: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-04-24 11:57:15 +01:00
H. Peter Anvin c0a9f451e4 Merge remote-tracking branch 'efi/urgent' into x86/urgent
Matt Fleming (1):
      x86, efivars: firmware bug workarounds should be in platform
      code

Matthew Garrett (3):
      Move utf16 functions to kernel core and rename
      efi: Pass boot services variable info to runtime code
      efi: Distinguish between "remaining space" and actually used
      space

Richard Weinberger (2):
      x86,efi: Check max_size only if it is non-zero.
      x86,efi: Implement efi_no_storage_paranoia parameter

Sergey Vlasov (2):
      x86/Kconfig: Make EFI select UCS2_STRING
      efi: Export efi_query_variable_store() for efivars.ko

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-19 17:09:03 -07:00
Matthew Garrett cc5a080c5d efi: Pass boot services variable info to runtime code
EFI variables can be flagged as being accessible only within boot services.
This makes it awkward for us to figure out how much space they use at
runtime. In theory we could figure this out by simply comparing the results
from QueryVariableInfo() to the space used by all of our variables, but
that fails if the platform doesn't garbage collect on every boot. Thankfully,
calling QueryVariableInfo() while still inside boot services gives a more
reliable answer. This patch passes that information from the EFI boot stub
up to the efi platform code.

Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-04-15 21:31:09 +01:00
Jan Beulich 918708245e x86: Fix rebuild with EFI_STUB enabled
eboot.o and efi_stub_$(BITS).o didn't get added to "targets", and hence
their .cmd files don't get included by the build machinery, leading to
the files always getting rebuilt.

Rather than adding the two files individually, take the opportunity and
add $(VMLINUX_OBJS) to "targets" instead, thus allowing the assignment
at the top of the file to be shrunk quite a bit.

At the same time, remove a pointless flags override line - the variable
assigned to was misspelled anyway, and the options added are
meaningless for assembly sources.

[ hpa: the patch is not minimal, but I am taking it for -urgent anyway
  since the excess impact of the patch seems to be small enough. ]

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/515C5D2502000078000CA6AD@nat28.tlf.novell.com
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-05 13:59:23 -07:00
Lans Zhang 2dead15fb8 x86_64: Use __BOOT_DS instead_of __KERNEL_DS for safety
In startup_32, the running code still uses the initial GDT
located in setup. Thus, __BOOT_DS is preferred. Currently
__KERNEL_DS is lucky to equal to __BOOT_DS, but this is
not always a safe way.

Signed-off-by: Lans Zhang <lans.zhang2008@gmail.com>
Link: http://lkml.kernel.org/r/51300267.6000008@gmail.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-03-01 10:18:33 -08:00
Linus Torvalds e3c4877de8 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/EFI changes from Peter Anvin:

 - Improve the initrd handling in the EFI boot stub by allowing forward
   slashes in the pathname - from Chun-Yi Lee.

 - Cleanup code duplication in the EFI mixed kernel/firmware code - from
   Satoru Takeuchi.

 - efivarfs bug fixes for more strict filename validation, with lots of
   input from Al Viro.

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, efi: remove duplicate code in setup_arch() by using, efi_is_native()
  efivarfs: guid part of filenames are case-insensitive
  efivarfs: Validate filenames much more aggressively
  efivarfs: Use sizeof() instead of magic number
  x86, efi: Allow slash in file path of initrd
2013-02-27 16:17:42 -08:00
Linus Torvalds 2ef14f465b Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm changes from Peter Anvin:
 "This is a huge set of several partly interrelated (and concurrently
  developed) changes, which is why the branch history is messier than
  one would like.

  The *really* big items are two humonguous patchsets mostly developed
  by Yinghai Lu at my request, which completely revamps the way we
  create initial page tables.  In particular, rather than estimating how
  much memory we will need for page tables and then build them into that
  memory -- a calculation that has shown to be incredibly fragile -- we
  now build them (on 64 bits) with the aid of a "pseudo-linear mode" --
  a #PF handler which creates temporary page tables on demand.

  This has several advantages:

  1. It makes it much easier to support things that need access to data
     very early (a followon patchset uses this to load microcode way
     early in the kernel startup).

  2. It allows the kernel and all the kernel data objects to be invoked
     from above the 4 GB limit.  This allows kdump to work on very large
     systems.

  3. It greatly reduces the difference between Xen and native (Xen's
     equivalent of the #PF handler are the temporary page tables created
     by the domain builder), eliminating a bunch of fragile hooks.

  The patch series also gets us a bit closer to W^X.

  Additional work in this pull is the 64-bit get_user() work which you
  were also involved with, and a bunch of cleanups/speedups to
  __phys_addr()/__pa()."

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (105 commits)
  x86, mm: Move reserving low memory later in initialization
  x86, doc: Clarify the use of asm("%edx") in uaccess.h
  x86, mm: Redesign get_user with a __builtin_choose_expr hack
  x86: Be consistent with data size in getuser.S
  x86, mm: Use a bitfield to mask nuisance get_user() warnings
  x86/kvm: Fix compile warning in kvm_register_steal_time()
  x86-32: Add support for 64bit get_user()
  x86-32, mm: Remove reference to alloc_remap()
  x86-32, mm: Remove reference to resume_map_numa_kva()
  x86-32, mm: Rip out x86_32 NUMA remapping code
  x86/numa: Use __pa_nodebug() instead
  x86: Don't panic if can not alloc buffer for swiotlb
  mm: Add alloc_bootmem_low_pages_nopanic()
  x86, 64bit, mm: hibernate use generic mapping_init
  x86, 64bit, mm: Mark data/bss/brk to nx
  x86: Merge early kernel reserve for 32bit and 64bit
  x86: Add Crash kernel low reservation
  x86, kdump: Remove crashkernel range find limit for 64bit
  memblock: Add memblock_mem_size()
  x86, boot: Not need to check setup_header version for setup_data
  ...
2013-02-21 18:06:55 -08:00
Linus Torvalds 5abcd76f5d Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 bootup changes from Ingo Molnar:
 "Deal with bootloaders which fail to initialize unknown fields in
  boot_params to zero, by sanitizing boot params passed in.

  This unbreaks versions of kexec-utils.  Other bootloaders do not
  appear to show sensitivity to this change, but it's a possibility for
  breakage nevertheless."

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, boot: Sanitize boot_params if not zeroed on creation
2013-02-19 19:11:10 -08:00
Lee, Chun-Yi deb94101c4 x86, efi: Allow slash in file path of initrd
When initrd file didn't put at the same place with stub kernel, we
need give the file path of initrd, but need use backslash to separate
directory and file. It's not friendly to unix/linux user, and not so
intuitive for bootloader forward paramters to efi stub kernel by
chainloading.

This patch add support to handle_ramdisks for allow slash in file path
of initrd, it convert slash to backlash when parsing path.

In additional, this patch also separates print code of efi_char16_t from
efi_printk, and print out the path/filename of initrd when failed to open
initrd file. It's good for debug and discover typo.

Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Lee, Chun-Yi <jlee@suse.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-01-31 14:44:44 +00:00
H. Peter Anvin becbd66080 Various urgent EFI fixes and some warning cleanups for v3.8
* EFI boot stub fix for Macbook Pro's from Maarten Lankhorst
   * Fix an oops in efivarfs from Lingzhu Xiang
   * 32-bit warning cleanups from Jan Beulich
   * Patch to Boot on >512GB RAM systems from Nathan Zimmer
   * Set efi.runtime_version correctly
   * efivarfs updates
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRCBrMAAoJEC84WcCNIz1VTdcP/2u3ZqohOKJAwwMkyzB3nkrQ
 1mhxKGFDitAAvGQQCOq3oIMgBZHOevKznH3hZtX+hxBxwu7AuNL+qw6Baz8GYZpz
 guFvAZjm2JX2ko1PgtNvPUFZ1krw7TObLW2YstTWhSDoOlRK5kqmA+idaJf1aHDe
 /cwV6Mr6u5N/egyBBcQI1ydKLA6ogmx1zfDsS9b2Vzavw168RGqfrpH3ybcokYND
 /E2NtcRVZagBw35eZHEDNKcoPt5z+skCA4nJyA6bLbxMsq51ZKaK0PKKaA8vd70s
 6Pc7d6zkQG/ZmaxrRfsdQUAYfJRJq/cpeTgS4YurkZB0r0gdxk6I86vYlg+xXi0X
 eqLAkUJJJasVY/1NK/c2vsJ03W9wDYkd2IJpUcl7rWz7Aa/RurY32QmT3SnLop7m
 Tzj3CgXAu/RH8FyMNMWpI85tOis7OcMUfrjmnxquQdCZpLXSsh7Rf5EgBRiv9xhH
 txDOX3y21Jnv2A5efAVWm5EbyI204Wq2nVDzSu0xTMXWkzdBg+/OeyYfzV0Sdguf
 3/MzYTn7mVXh/EZtnvsTyNjgvVxzpXW6mAf+ne9iJaC8MUJVIeSjB7xzSfuHXUBU
 aUc9OnbkHRJCdVSeKqZbLwO3X5mTXqmDMfIcRle3BPewvZ9pOEv8VrGgsNxh9ixW
 JaCpiTdxJDFtz6cLVsNa
 =QrJx
 -----END PGP SIGNATURE-----

Merge tag 'efi-for-3.8' into x86/efi

Various urgent EFI fixes and some warning cleanups for v3.8

  * EFI boot stub fix for Macbook Pro's from Maarten Lankhorst
  * Fix an oops in efivarfs from Lingzhu Xiang
  * 32-bit warning cleanups from Jan Beulich
  * Patch to Boot on >512GB RAM systems from Nathan Zimmer
  * Set efi.runtime_version correctly
  * efivarfs updates

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-30 14:43:05 -08:00
Yinghai Lu 8ee2f2dfdb x86, boot: Update comments about entries for 64bit image
Now 64bit entry is fixed on 0x200, can not be changed anymore.

Update the comments to reflect that.

Also put info about it in boot.txt

-v2: fix some grammar error

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-27-git-send-email-yinghai@kernel.org
Cc: Rob Landley <rob@landley.net>
Cc: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 19:32:57 -08:00
Yinghai Lu ee92d81502 x86, boot: Support loading bzImage, boot_params and ramdisk above 4G
xloadflags bit 1 indicates that we can load the kernel and all data
structures above 4G; it is set if kernel is relocatable and 64bit.

bootloader will check if xloadflags bit 1 is set to decide if
it could load ramdisk and kernel high above 4G.

bootloader will fill value to ext_ramdisk_image/size for high 32bits
when it load ramdisk above 4G.
kernel use get_ramdisk_image/size to use ext_ramdisk_image/size to get
right positon for ramdisk.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Rob Landley <rob@landley.net>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Gokul Caushik <caushik1@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joe Millenbach <jmillenbach@gmail.com>
Link: http://lkml.kernel.org/r/1359058816-7615-26-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 19:32:33 -08:00
Yinghai Lu d3c433bf9a x86, boot: Move lldt/ltr out of 64bit code section
commit 08da5a2ca

    x86_64: Early segment setup for VT

sets up LDT and TR into a valid state in order to speed up boot
decompression under VT.

Those code are put in code64, and it is using GDT that is only
loaded from code32 path.

That breaks booting with 64bit bootloader that does not go through
code32 path and jump to startup_64 directly, and it has different
GDT.

Move those lines into code32 after their GDT is loaded.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-21-git-send-email-yinghai@kernel.org
Cc: Zachary Amsden <zamsden@gmail.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:26:19 -08:00
Yinghai Lu 187a8a73ce x86, boot: Move verify_cpu.S and no_longmode down
We need to move some code to 32bit section in following patch:

   x86, boot: Move lldt/ltr out of 64bit code section

but that will push startup_64 down from 0x200.

According to hpa, we can not change startup_64 position and that
is an ABI.

We could move function verify_cpu and no_longmode down, because
verify_cpu is used via function call and no_longmode will not
return, then we don't need to add extra code for jumping back.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-20-git-send-email-yinghai@kernel.org
Cc: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:26:15 -08:00
Yinghai Lu f1da834cd9 x86, boot: Add get_cmd_line_ptr()
Add an accessor function for the command line address.
Later we will add support for holding a 64-bit address via ext_cmd_line_ptr.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-17-git-send-email-yinghai@kernel.org
Cc: Gokul Caushik <caushik1@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joe Millenbach <jmillenbach@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:25:45 -08:00
Maarten Lankhorst 739701888f x86, efi: remove attribute check from setup_efi_pci
It looks like the original commit that copied the rom contents from
efi always copied the rom, and the fixup in setup_efi_pci from commit
886d751a2e ("x86, efi: correct precedence of operators in
setup_efi_pci") broke that.

This resulted in macbook pro's no longer finding the rom images, and
thus not being able to use the radeon card any more.

The solution is to just remove the check for now, and always copy the
rom if available.

Reported-by: Vitaly Budovski <vbudovski+news@gmail.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-01-29 17:52:06 +00:00
H. Peter Anvin 5dcd14ecd4 x86, boot: Sanitize boot_params if not zeroed on creation
Use the new sentinel field to detect bootloaders which fail to follow
protocol and don't initialize fields in struct boot_params that they
do not explicitly initialize to zero.

Based on an original patch and research by Yinghai Lu.
Changed by hpa to be invoked both in the decompression path and in the
kernel proper; the latter for the case where a bootloader takes over
decompression.

Originally-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-26-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 01:22:17 -08:00
David Woodhouse 99f857db88 x86, build: Dynamically find entry points in compressed startup code
We have historically hard-coded entry points in head.S just so it's easy
to build the executable/bzImage headers with references to them.

Unfortunately, this leads to boot loaders abusing these "known" addresses
even when they are *explicitly* told that they "should look at the ELF
header to find this address, as it may change in the future". And even
when the address in question *has* actually been changed in the past,
without fanfare or thought to compatibility.

Thus we have bootloaders doing stunningly broken things like jumping
to offset 0x200 in the kernel startup code in 64-bit mode, *hoping*
that startup_64 is still there (it has moved at least once
before). And hoping that it's actually a 64-bit kernel despite the
fact that we don't give them any indication of that fact.

This patch should hopefully remove the temptation to abuse internal
addresses in future, where sternly worded comments have not sufficed.
Instead of having hard-coded addresses and saying "please don't abuse
these", we actually pull the addresses out of the ELF payload into
zoffset.h, and make build.c shove them back into the right places in
the bzImage header.

Rather than including zoffset.h into build.c and thus having to rebuild
the tool for every kernel build, we parse it instead. The parsing code
is small and simple.

This patch doesn't actually move any of the interesting entry points, so
any offending bootloader will still continue to "work" after this patch
is applied. For some version of "work" which includes jumping into the
compressed payload and crashing, if the bzImage it's given is a 32-bit
kernel. No change there then.

[ hpa: some of the issues in the description are addressed or
  retconned by the 2.12 boot protocol.  This patch has been edited to
  only remove fixed addresses that were *not* thus retconned. ]

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Link: http://lkml.kernel.org/r/1358513837.2397.247.camel@shinybook.infradead.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Matt Fleming <matt.fleming@intel.com>
2013-01-27 20:19:37 -08:00
David Woodhouse b607e21267 x86, efi: Fix PCI ROM handing in EFI boot stub, in 32-bit mode
The 'Attributes' argument to pci->Attributes() function is 64-bit. So
when invoking in 32-bit mode it takes two registers, not just one.

This fixes memory corruption when booting via the 32-bit EFI boot stub.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1358513837.2397.247.camel@shinybook.infradead.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Matt Fleming <matt.fleming@intel.com>
2013-01-27 20:19:37 -08:00
David Woodhouse f791620fa7 x86, efi: Fix 32-bit EFI handover protocol entry point
If the bootloader calls the EFI handover entry point as a standard function
call, then it'll have a return address on the stack. We need to pop that
before calling efi_main(), or the arguments will all be out of position on
the stack.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1358513837.2397.247.camel@shinybook.infradead.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Matt Fleming <matt.fleming@intel.com>
2013-01-27 20:19:37 -08:00
David Woodhouse 70a479cbe8 x86, efi: Fix display detection in EFI boot stub
When booting under OVMF we have precisely one GOP device, and it
implements the ConOut protocol.

We break out of the loop when we look at it... and then promptly abort
because 'first_gop' never gets set. We should set first_gop *before*
breaking out of the loop. Yes, it doesn't really mean "first" any more,
but that doesn't matter. It's only a flag to indicate that a suitable
GOP was found.

In fact, we'd do just as well to initialise 'width' to zero in this
function, then just check *that* instead of first_gop. But I'll do the
minimal fix for now (and for stable@).

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1358513837.2397.247.camel@shinybook.infradead.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Matt Fleming <matt.fleming@intel.com>
2013-01-27 20:19:37 -08:00
Jan Beulich bc754790f9 x86, efi: fix 32-bit warnings in setup_efi_pci()
Fix four similar build warnings on 32-bit (casts between different
size pointers and integers).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Stefan Hasko <hasko.stevo@gmail.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-01-25 10:22:53 +00:00
Sasha Levin 886d751a2e x86, efi: correct precedence of operators in setup_efi_pci
With the current code, the condition in the if() doesn't make much sense due to
precedence of operators.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Link: http://lkml.kernel.org/r/1356030701-16284-25-git-send-email-sasha.levin@oracle.com
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-12-20 11:47:14 -08:00
Linus Torvalds 193c0d6825 PCI changes for the v3.8 merge window:
Host bridge hotplug:
     - Untangle _PRT from struct pci_bus (Bjorn Helgaas)
     - Request _OSC control before scanning root bus (Taku Izumi)
     - Assign resources when adding host bridge (Yinghai Lu)
     - Remove root bus when removing host bridge (Yinghai Lu)
     - Remove _PRT during hot remove (Yinghai Lu)
 
   SRIOV
     - Add sysfs knobs to control numVFs (Don Dutile)
 
   Power management
     - Notify devices when power resource turned on (Huang Ying)
 
   Bug fixes
     - Work around broken _SEG on HP xw9300 (Bjorn Helgaas)
     - Keep runtime PM enabled for unbound PCI devices (Huang Ying)
     - Fix Optimus dual-GPU runtime D3 suspend issue (Dave Airlie)
     - Fix xen frontend shutdown issue (David Vrabel)
     - Work around PLX PCI 9050 BAR alignment erratum (Ian Abbott)
 
   Miscellaneous
     - Add GPL license for drivers/pci/ioapic (Andrew Cooks)
     - Add standard PCI-X, PCIe ASPM register #defines (Bjorn Helgaas)
     - NumaChip remote PCI support (Daniel Blueman)
     - Fix PCIe Link Capabilities Supported Link Speed definition (Jingoo Han)
     - Convert dev_printk() to dev_info(), etc (Joe Perches)
     - Add support for non PCI BAR ROM data (Matthew Garrett)
     - Add x86 support for host bridge translation offset (Mike Yoknis)
     - Report success only when every driver supports AER (Vijay Pandarathil)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJQyKwSAAoJEPGMOI97Hn6zScgQAJZK2VDfCv74mKrgSDNokIzH
 5nVDrc9AHKJm7CUODs6keJK5d4TD/za3Zao68zrYHsJJKes2ni2Z3W34HP2RXKK2
 eOmePXOHYPPZMlimP9r9cVxNu1ZJCyp/yWSBcsPF4zUgWhBWLRaSj85I049gQ0sz
 +05nZYfLjVd3HNiaXsG4CQyMrNF46XEsLhF9vs+Nr2GHPwrpzhfScgYv63oDS86C
 3ICKsjmiRUZcNelxIFYmyxa5u89QdW5XHjzc9eHGQuus24Vxw+TZzsdfc17sUJEE
 HTyXY+RjDpOVhdtwwUjrCEOiyZYvy3g9+3sKxoxgt/76ghdUaR7fxITwB97qVMFD
 T0ESlKjSV/Qv5QYdyy5uP4zwNs/PXCWXkTg/L1m71F30BxKWDa7tgiA6uK7Z7fl5
 1aokKBdk3mtJJJIDJG1YkxPXx/JItTGCNYrx7CcFj49rSjrUWLQdmrYahersRIsB
 3wiD2xTi9e4dXeP/+VGzGOWB/sHk+73jvrvZe/REa1FCnMINDz4+9V9WaGROMqyq
 MQ8kX0KfYcNVNxy1GOXjU5wLpMN/t/QbvI7gwzRP1DAUCJPoOgFy7AjvSTVG3zuy
 8CtdOFttVkUn5dqsbQR0gVbyQVTS3PGSKz5XC/s8kVDWhja0xZTBYwrskM/4zdSD
 Xf48OyYV5EjpC3FYUSiU
 =OE3Q
 -----END PGP SIGNATURE-----

Merge tag 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI update from Bjorn Helgaas:
 "Host bridge hotplug:
   - Untangle _PRT from struct pci_bus (Bjorn Helgaas)
   - Request _OSC control before scanning root bus (Taku Izumi)
   - Assign resources when adding host bridge (Yinghai Lu)
   - Remove root bus when removing host bridge (Yinghai Lu)
   - Remove _PRT during hot remove (Yinghai Lu)

  SRIOV
    - Add sysfs knobs to control numVFs (Don Dutile)

  Power management
   - Notify devices when power resource turned on (Huang Ying)

  Bug fixes
   - Work around broken _SEG on HP xw9300 (Bjorn Helgaas)
   - Keep runtime PM enabled for unbound PCI devices (Huang Ying)
   - Fix Optimus dual-GPU runtime D3 suspend issue (Dave Airlie)
   - Fix xen frontend shutdown issue (David Vrabel)
   - Work around PLX PCI 9050 BAR alignment erratum (Ian Abbott)

  Miscellaneous
   - Add GPL license for drivers/pci/ioapic (Andrew Cooks)
   - Add standard PCI-X, PCIe ASPM register #defines (Bjorn Helgaas)
   - NumaChip remote PCI support (Daniel Blueman)
   - Fix PCIe Link Capabilities Supported Link Speed definition (Jingoo
     Han)
   - Convert dev_printk() to dev_info(), etc (Joe Perches)
   - Add support for non PCI BAR ROM data (Matthew Garrett)
   - Add x86 support for host bridge translation offset (Mike Yoknis)
   - Report success only when every driver supports AER (Vijay
     Pandarathil)"

Fix up trivial conflicts.

* tag 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (48 commits)
  PCI: Use phys_addr_t for physical ROM address
  x86/PCI: Add NumaChip remote PCI support
  ath9k: Use standard #defines for PCIe Capability ASPM fields
  iwlwifi: Use standard #defines for PCIe Capability ASPM fields
  iwlwifi: collapse wrapper for pcie_capability_read_word()
  iwlegacy: Use standard #defines for PCIe Capability ASPM fields
  iwlegacy: collapse wrapper for pcie_capability_read_word()
  cxgb3: Use standard #defines for PCIe Capability ASPM fields
  PCI: Add standard PCIe Capability Link ASPM field names
  PCI/portdrv: Use PCI Express Capability accessors
  PCI: Use standard PCIe Capability Link register field names
  x86: Use PCI setup data
  PCI: Add support for non-BAR ROMs
  PCI: Add pcibios_add_device
  EFI: Stash ROMs if they're not in the PCI BAR
  PCI: Add and use standard PCI-X Capability register names
  PCI/PM: Keep runtime PM enabled for unbound PCI devices
  xen-pcifront: Handle backend CLOSED without CLOSING
  PCI: SRIOV control and status via sysfs (documentation)
  PCI/AER: Report success only when every device has AER-aware driver
  ...
2012-12-13 12:14:47 -08:00
Matthew Garrett dd5fc854de EFI: Stash ROMs if they're not in the PCI BAR
EFI provides support for providing PCI ROMs via means other than the ROM
BAR. This support vanishes after we've exited boot services, so add support
for stashing copies of the ROMs in setup_data if they're not otherwise
available.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Seth Forshee <seth.forshee@canonical.com>
2012-12-05 14:33:26 -07:00
Matt Fleming 0f905a43ce x86, efi: Fix processor-specific memcpy() build error
Building for Athlon/Duron/K7 results in the following build error,

arch/x86/boot/compressed/eboot.o: In function `__constant_memcpy3d':
eboot.c:(.text+0x385): undefined reference to `_mmx_memcpy'
arch/x86/boot/compressed/eboot.o: In function `efi_main':
eboot.c:(.text+0x1a22): undefined reference to `_mmx_memcpy'

because the boot stub code doesn't link with the kernel proper, and
therefore doesn't have access to the 3DNow version of memcpy. So,
follow the example of misc.c and #undef memcpy so that we use the
version provided by misc.c.

See https://bugzilla.kernel.org/show_bug.cgi?id=50391

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: Ryan Underwood <nemesis@icequake.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: stable@vger.kernel.org
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2012-11-20 20:52:07 +00:00
Matthew Garrett e9b10953ed x86, EFI: Calculate the EFI framebuffer size instead of trusting the firmware
Seth Forshee reported that his system was reporting that the EFI framebuffer
stretched from 0x90010000-0xb0010000 despite the GPU's BAR only covering
0x90000000-0x9ffffff. It's safer to calculate this value from the pixel
stride and screen height (values we already depend on) rather than face
potential problems with resource allocation later on.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Tested-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2012-09-17 13:29:24 +01:00
Matthew Garrett f462ed939d efifb: Skip DMI checks if the bootloader knows what it's doing
The majority of the DMI checks in efifb are for cases where the bootloader
has provided invalid information. However, on some machines the overrides
may do more harm than good due to configuration differences between machines
with the same machine identifier. It turns out that it's possible for the
bootloader to get the correct information on GOP-based systems, but we
can't guarantee that the kernel's being booted with one that's been updated
to do so. Add support for a capabilities flag that can be set by the
bootloader, and skip the DMI checks in that case. Additionally, set this
flag in the UEFI stub code.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Acked-by: Peter Jones <pjones@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2012-09-17 13:29:23 +01:00
Matthew Garrett 9dead5bbb8 efi: Build EFI stub with EFI-appropriate options
We can't assume the presence of the red zone while we're still in a boot
services environment, so we should build with -fno-red-zone to avoid
problems. Change the size of wchar at the same time to make string handling
simpler.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2012-09-17 13:29:21 +01:00
Matthew Garrett 38cb5ef447 X86: Improve GOP detection in the EFI boot stub
We currently use the PCI IO protocol as a proxy for a functional GOP. This
is less than ideal, since some platforms will put the GOP on output devices
rather than the GPU itself. Move to using the conout protocol. This is not
guaranteed per-spec, but is part of the consplitter implementation that
causes this problem in the first place and so should be reliable.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2012-09-17 13:29:20 +01:00
Linus Torvalds 0a2fe19ccc Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pul x86/efi changes from Ingo Molnar:
 "This tree adds an EFI bootloader handover protocol, which, once
  supported on the bootloader side, will make bootup faster and might
  result in simpler bootloaders.

  The other change activates the EFI wall clock time accessors on x86-64
  as well, instead of the legacy RTC readout."

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, efi: Handover Protocol
  x86-64/efi: Use EFI to deal with platform wall clock
2012-07-26 13:13:25 -07:00
Gokul Caushik bd448d4d0a x86, boot: Exclude cmdline.c if you can't use it
CONFIG_EARLY_PRINTK is the only feature that might use command line
parsing in the decompression stage.  If it is disabled then we can
exclude the related code to save space. This can result in an estimated
space savings of 2240 bytes from the compressed kernel image.

Signed-off-by: Joe Millenbach <jmillenbach@gmail.com>
Link: http://lkml.kernel.org/r/1342746282-28497-8-git-send-email-jmillenbach@gmail.com
Signed-off-by: Gokul Caushik <caushik1@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-07-21 11:07:39 -07:00
Joe Millenbach cec49df9d3 x86, boot: Exclude early_serial_console.c if can't use it.
Removes early_serial_console.c code if we don't have the config option that
enables it (EARLY_PRINTK). When disabling this code, make early_serial_base a
constant 0 to allow the compiler to optimize away the code that checks for
early_serial_base.

Signed-off-by: Joe Millenbach <jmillenbach@gmail.com>
Link: http://lkml.kernel.org/r/1342746282-28497-7-git-send-email-jmillenbach@gmail.com
Signed-off-by: Gokul Caushik <caushik1@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-07-21 11:07:34 -07:00
Joe Millenbach 641a1cebfe x86, boot: Removed unused debug flag and set code
As we're no longer using the flag we don't need to extract the value from the
command line and store it. This is a step towards removing command line
parameter code.

Signed-off-by: Joe Millenbach <jmillenbach@gmail.com>
Link: http://lkml.kernel.org/r/1342746282-28497-6-git-send-email-jmillenbach@gmail.com
Signed-off-by: Gokul Caushik <caushik1@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-07-21 11:07:29 -07:00
Joe Millenbach 7aac3015b5 x86, boot: Switch output functions from command-line flags to conditional compilation
Changed putstr flagging from parameter to conditional compilation for puts,
debug_putstr, and error_putstr. This allows for space savings since most
configurations won't use this feature.

Signed-off-by: Joe Millenbach <jmillenbach@gmail.com>
Link: http://lkml.kernel.org/r/1342746282-28497-5-git-send-email-jmillenbach@gmail.com
Signed-off-by: Gokul Caushik <caushik1@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-07-21 11:07:25 -07:00
Joe Millenbach cb454fe104 x86, boot: Changed error putstr path to match new debug_putstr format
For consistency we changed the error output path to match the new debug path.

Signed-off-by: Joe Millenbach <jmillenbach@gmail.com>
Link: http://lkml.kernel.org/r/1342746282-28497-4-git-send-email-jmillenbach@gmail.com
Signed-off-by: Gokul Caushik <caushik1@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-07-21 11:07:22 -07:00
Joe Millenbach e605a42597 x86, boot: Wrap debug printing in a new debug_putstr function
Change all instances of if (debug) putstr(...) to a new debug_putstr(...).
This allows a future change to conditionally stub out debug_putstr to save
space.

Signed-off-by: Joe Millenbach <jmillenbach@gmail.com>
Link: http://lkml.kernel.org/r/1342746282-28497-3-git-send-email-jmillenbach@gmail.com
Signed-off-by: Gokul Caushik <caushik1@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-07-21 11:07:18 -07:00
Joe Millenbach 9f4e4392cb x86, boot: Removed quiet flag and switched quiet output to debug flag
There are only 3 uses of the quiet flag and they all protect output that
is only useful for debugging the stub, therefore we switched to using the
debug flag for all extra output.

Signed-off-by: Joe Millenbach <jmillenbach@gmail.com>
Link: http://lkml.kernel.org/r/1342746282-28497-2-git-send-email-jmillenbach@gmail.com
Signed-off-by: Gokul Caushik <caushik1@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-07-21 11:07:15 -07:00
Matt Fleming 9ca8f72a92 x86, efi: Handover Protocol
As things currently stand, traditional EFI boot loaders and the EFI
boot stub are carrying essentially the same initialisation code
required to setup an EFI machine for booting a kernel. There's really
no need to have this code in two places and the hope is that, with
this new protocol, initialisation and booting of the kernel can be
left solely to the kernel's EFI boot stub. The responsibilities of the
boot loader then become,

   o Loading the kernel image from boot media

File system code still needs to be carried by boot loaders for the
scenario where the kernel and initrd files reside on a file system
that the EFI firmware doesn't natively understand, such as ext4, etc.

   o Providing a user interface

Boot loaders still need to display any menus/interfaces, for example
to allow the user to select from a list of kernels.

Bump the boot protocol number because we added the 'handover_offset'
field to indicate the location of the handover protocol entry point.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Acked-and-Tested-by: Matthew Garrett <mjg@redhat.com>
Link: http://lkml.kernel.org/r/1342689828-16815-1-git-send-email-matt@console-pimps.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-07-20 16:18:58 -07:00
Matt Fleming 9fa7dedad3 x86, efi; Add EFI boot stub console support
We need a way of printing useful messages to the user, for example
when we fail to open an initrd file, instead of just hanging the
machine without giving the user any indication of what went wrong. So
sprinkle some error messages throughout the EFI boot stub code to make
it easier for users to diagnose/report problems.

Reported-by: Keshav P R <the.ridikulus.rat@gmail.com>
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/1331907517-3985-3-git-send-email-matt@console-pimps.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-06-01 09:11:26 -07:00
Matt Fleming 30dc0d0fe5 x86, efi: Only close open files in error path
The loop at the 'close_handles' label in handle_ramdisks() should be
using 'i', which represents the number of initrd files that were
successfully opened, not 'nr_initrds' which is the number of initrd=
arguments passed on the command line.

Currently, if we execute the loop to close all file handles and we
failed to open any initrds we'll try to call the close function on a
garbage pointer, causing the machine to hang.

Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/1331907517-3985-2-git-send-email-matt@console-pimps.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-06-01 09:11:10 -07:00
Linus Torvalds 8ca038dc10 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 EFI updates from Ingo Molnar:
 "This patchset makes changes to the bzImage EFI header, so that it can
  be signed with a secure boot signature tool.  It should not affect
  anyone who is not using the EFI self-boot feature in any way."

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, efi: Fix NumberOfRvaAndSizes field in PE32 header for EFI_STUB
  x86, efi: Fix .text section overlapping image header for EFI_STUB
  x86, efi: Fix issue of overlapping .reloc section for EFI_STUB
2012-05-23 10:40:34 -07:00
H. Peter Anvin 6520fe5564 x86, realmode: 16-bit real-mode code support for relocs tool
A new option is added to the relocs tool called '--realmode'.
This option causes the generation of 16-bit segment relocations
and 32-bit linear relocations for the real-mode code. When
the real-mode code is moved to the low-memory during kernel
initialization, these relocation entries can be used to
relocate the code properly.

In the assembly code 16-bit segment relocations must be relative
to the 'real_mode_seg' absolute symbol. Linear relocations must be
relative to a symbol prefixed with 'pa_'.

16-bit segment relocation is used to load cs:ip in 16-bit code.
Linear relocations are used in the 32-bit code for relocatable
data references. They are declared in the linker script of the
real-mode code.

The relocs tool is moved to arch/x86/tools/relocs.c, and added new
target archscripts that can be used to build scripts needed building
an architecture.  be compiled before building the arch/x86 tree.

[ hpa: accelerating this because it detects invalid absolute
  relocations, a serious bug in binutils 2.22.52.0.x which currently
  produces bad kernels. ]

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1336501366-28617-2-git-send-email-jarkko.sakkinen@intel.com
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>
2012-05-18 19:49:40 -07:00
Kusanagi Kouichi 7c77cda0fe x86, relocs: Remove an unused variable
sh_symtab is set but not used.

[ hpa: putting this in urgent because of the sheer harmlessness of the patch:
  it quiets a build warning but does not change any generated code. ]

Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp>
Link: http://lkml.kernel.org/r/20120401082932.D5E066FC03D@msa105.auone-net.jp
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>
2012-04-30 12:55:15 -07:00
Matt Fleming b1994304fc x86, efi: Add dedicated EFI stub entry point
The method used to work out whether we were booted by EFI firmware or
via a boot loader is broken. Because efi_main() is always executed
when booting from a boot loader we will dereference invalid pointers
either on the stack (CONFIG_X86_32) or contained in %rdx
(CONFIG_X86_64) when searching for an EFI System Table signature.

Instead of dereferencing these invalid system table pointers, add a
new entry point that is only used when booting from EFI firmware, when
we know the pointer arguments will be valid. With this change legacy
boot loaders will no longer execute efi_main(), but will instead skip
EFI stub initialisation completely.

[ hpa: Marking this for urgent/stable since it is a regression when
  the option is enabled; without the option the patch has no effect ]

Signed-off-by: Matt Fleming <matt.hfleming@intel.com>
Link: http://lkml.kernel.org/r/1334584744.26997.14.camel@mfleming-mobl1.ger.corp.intel.com
Reported-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> v3.3
2012-04-16 11:41:44 -07:00
H. Peter Anvin a9aff3eaaf Merge branch x86/build into x86/efi and fix up arch/x86/boot/tools/build.c
Reason for merge:
       The updates to the EFI boot stub generation conflicted with the
       changes to properly use the get/put_unaligned_le*() macros to
       generate images.

       This merge commit completes the conversion in
       arch/x86/boot/tools/build.c including the places in the code
       which had been changed on the x86/efi branch.

Resolved Conflicts:
	arch/x86/boot/tools/build.c

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-03-28 13:11:36 -07:00
Matt Fleming e31be363df x86, efi: Fix .text section overlapping image header for EFI_STUB
This change modifes the PE .text section to start after
the first sector of the kernel image.

The header may be modified by the UEFI secure boot signing,
so it is not appropriate for it to be included in one of the
image sections. Since the sections are part of the secure
boot hash, this modification to the .text section contents
would invalidate the secure boot signed hash.

Note: UEFI secure boot does hash the image header, but
fields that are changed by the signing process are excluded
from the hash calculation.  This exclusion process is only
handled for the image header, and not image sections.

Luckily, we can still easily boot without the first sector
by initializing a few fields in arch/x86/boot/compressed/eboot.c.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/1332520506-6472-3-git-send-email-jordan.l.justen@intel.com
[jordan.l.justen@intel.com: set .text vma & file offset]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-03-26 13:10:01 -07:00
Linus Torvalds 28f23d1f3b Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 "urgent" leftovers from Ingo Molnar:
 "Pending x86/urgent bits that were not high prio enough to warrant
  -rc-less v3.3-final inclusion."

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, efi: Fix pointer math issue in handle_ramdisks()
  x86/ioapic: Add register level checks to detect bogus io-apic entries
  x86, mce: Fix rcu splat in drain_mce_log_buffer()
  x86, memblock: Move mem_hole_size() to .init
2012-03-22 09:44:50 -07:00
Dan Carpenter c7b738351b x86, efi: Fix pointer math issue in handle_ramdisks()
"filename" is a efi_char16_t string so this check for reaching the end
of the array doesn't work.  We need to cast the pointer to (u8 *) before
doing the math.

This patch changes the "filename" to "filename_16" to avoid confusion in
the future.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: http://lkml.kernel.org/r/20120305180614.GA26880@elgon.mountain
Acked-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-03-16 13:03:24 -07:00
Matt Fleming 12871c5683 x86, mkpiggy: Don't open code put_unaligned_le32()
Use the new headers in tools/include instead of rolling our own
put_unaligned_le32() implementation.

Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/1330436245-24875-4-git-send-email-matt@console-pimps.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-02-28 10:22:57 -08:00
Matt Fleming 55f9709cd0 x86, relocs: Don't open code put_unaligned_le32()
Use the new headers in tools/include instead of rolling our own
put_unaligned_le32() implementation.

Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/1330436245-24875-3-git-send-email-matt@console-pimps.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-02-28 10:22:55 -08:00
Jesper Juhl 5067cf53ca x86/boot-image: Don't leak phdrs in arch/x86/boot/compressed/misc.c::Parse_elf()
We allocate memory with malloc(), but neglect to free it before
the variable 'phdrs' goes out of scope --> leak.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1201232332590.8772@swampdragon.chaosbits.net
[ Mostly harmless. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-01-26 11:30:29 +01:00
Maarten Lankhorst 2d2da60fb4 x86, efi: Break up large initrd reads
The efi boot stub tries to read the entire initrd in 1 go, however
some efi implementations hang if too much if asked to read too much
data at the same time. After some experimentation I found out that my
asrock p67 board will hang if asked to read chunks of 4MiB, so use a
safe value.

elilo reads in chunks of 16KiB, but since that requires many read
calls I use a value of 1 MiB.  hpa suggested adding individual
blacklists for when systems are found where this value causes a crash.

Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Link: http://lkml.kernel.org/r/4EEB3A02.3090201@gmail.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-12-16 08:34:35 -08:00
Matt Fleming 291f36325f x86, efi: EFI boot stub support
There is currently a large divide between kernel development and the
development of EFI boot loaders. The idea behind this patch is to give
the kernel developers full control over the EFI boot process. As
H. Peter Anvin put it,

"The 'kernel carries its own stub' approach been very successful in
dealing with BIOS, and would make a lot of sense to me for EFI as
well."

This patch introduces an EFI boot stub that allows an x86 bzImage to
be loaded and executed by EFI firmware. The bzImage appears to the
firmware as an EFI application. Luckily there are enough free bits
within the bzImage header so that it can masquerade as an EFI
application, thereby coercing the EFI firmware into loading it and
jumping to its entry point. The beauty of this masquerading approach
is that both BIOS and EFI boot loaders can still load and run the same
bzImage, thereby allowing a single kernel image to work in any boot
environment.

The EFI boot stub supports multiple initrds, but they must exist on
the same partition as the bzImage. Command-line arguments for the
kernel can be appended after the bzImage name when run from the EFI
shell, e.g.

Shell> bzImage console=ttyS0 root=/dev/sdb initrd=initrd.img

v7:
 - Fix checkpatch warnings.

v6:

 - Try to allocate initrd memory just below hdr->inird_addr_max.

v5:

 - load_options_size is UTF-16, which needs dividing by 2 to convert
   to the corresponding ASCII size.

v4:

 - Don't read more than image->load_options_size

v3:

 - Fix following warnings when compiling CONFIG_EFI_STUB=n

   arch/x86/boot/tools/build.c: In function ‘main’:
   arch/x86/boot/tools/build.c:138:24: warning: unused variable ‘pe_header’
   arch/x86/boot/tools/build.c:138:15: warning: unused variable ‘file_sz’

 - As reported by Matthew Garrett, some Apple machines have GOPs that
   don't have hardware attached. We need to weed these out by
   searching for ones that handle the PCIIO protocol.

 - Don't allocate memory if no initrds are on cmdline
 - Don't trust image->load_options_size

Maarten Lankhorst noted:
 - Don't strip first argument when booted from efibootmgr
 - Don't allocate too much memory for cmdline
 - Don't update cmdline_size, the kernel considers it read-only
 - Don't accept '\n' for initrd names

v2:

 - File alignment was too large, was 8192 should be 512. Reported by
   Maarten Lankhorst on LKML.
 - Added UGA support for graphics
 - Use VIDEO_TYPE_EFI instead of hard-coded number.
 - Move linelength assignment until after we've assigned depth
 - Dynamically fill out AddressOfEntryPoint in tools/build.c
 - Don't use magic number for GDT/TSS stuff. Requested by Andi Kleen
 - The bzImage may need to be relocated as it may have been loaded at
   a high address address by the firmware. This was required to get my
   macbook booting because the firmware loaded it at 0x7cxxxxxx, which
   triggers this error in decompress_kernel(),

	if (heap > ((-__PAGE_OFFSET-(128<<20)-1) & 0x7fffffff))
		error("Destination address too large");

Cc: Mike Waychison <mikew@google.com>
Cc: Matthew Garrett <mjg@redhat.com>
Tested-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/1321383097.2657.9.camel@mfleming-mobl1.ger.corp.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-12-12 14:26:10 -08:00