Commit Graph

201 Commits

Author SHA1 Message Date
Oleksandr Tymoshenko 62b71cd73d efi: fix panic in kdump kernel
Check if get_next_variable() is actually valid pointer before
calling it. In kdump kernel this method is set to NULL that causes
panic during the kexec-ed kernel boot.

Tested with QEMU and OVMF firmware.

Fixes: bad267f9e1 ("efi: verify that variable services are supported")
Signed-off-by: Oleksandr Tymoshenko <ovt@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-03-24 09:28:33 +01:00
Kuppuswamy Sathyanarayanan d228814b19 efi/libstub: Add get_event_log() support for CC platforms
To allow event log info access after boot, EFI boot stub extracts
the event log information and installs it in an EFI configuration
table. Currently, EFI boot stub only supports installation of event
log only for TPM 1.2 and TPM 2.0 protocols. Extend the same support
for CC protocol. Since CC platform also uses TCG2 format, reuse TPM2
support code as much as possible.

Link: https://uefi.org/specs/UEFI/2.10/38_Confidential_Computing.html#efi-cc-measurement-protocol [1]
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://lkml.kernel.org/r/0229a87e-fb19-4dad-99fc-4afd7ed4099a%40collabora.com
[ardb: Split out final events table handling to avoid version confusion]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-03-09 11:37:17 +01:00
Ard Biesheuvel 7a1381e831 efi/tpm: Use symbolic GUID name from spec for final events table
The LINUX_EFI_ GUID identifiers are only intended to be used to refer to
GUIDs that are part of the Linux implementation, and are not considered
external ABI. (Famous last words).

GUIDs that already have a symbolic name in the spec should use that
name, to avoid confusion between firmware components. So use the
official name EFI_TCG2_FINAL_EVENTS_TABLE_GUID for the TCG2 'final
events' configuration table.

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-03-09 11:36:56 +01:00
Masahisa Kojima 94f7f6182c efivarfs: automatically update super block flag
efivar operation is updated when the tee_stmm_efi module is probed.
tee_stmm_efi module supports SetVariable runtime service, but user needs
to manually remount the efivarfs as RW to enable the write access if the
previous efivar operation does not support SetVariable and efivarfs is
mounted as read-only.

This commit notifies the update of efivar operation to efivarfs
subsystem, then drops SB_RDONLY flag if the efivar operation supports
SetVariable.

Signed-off-by: Masahisa Kojima <masahisa.kojima@linaro.org>
[ardb: use per-superblock instance of the notifier block]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-12-11 11:19:18 +01:00
Masahisa Kojima 6bb3703aa5 efi: expose efivar generic ops register function
This is a preparation for supporting efivar operations provided by other
than efi subsystem.  Both register and unregister functions are exposed
so that non-efi subsystem can revert the efi generic operation.

Acked-by: Sumit Garg <sumit.garg@linaro.org>
Co-developed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Masahisa Kojima <masahisa.kojima@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-12-11 11:19:18 +01:00
Linus Torvalds 1e0c505e13 asm-generic updates for v6.7
The ia64 architecture gets its well-earned retirement as planned,
 now that there is one last (mostly) working release that will
 be maintained as an LTS kernel.
 
 The architecture specific system call tables are updated for
 the added map_shadow_stack() syscall and to remove references
 to the long-gone sys_lookup_dcookie() syscall.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmVC40IACgkQYKtH/8kJ
 Uidhmw/9EX+aWSXGoObJ3fngaNSMw+PmrEuP8qEKBHxfKHcCdX3hc451Oh4GlhaQ
 tru91pPwgNvN2/rfoKusxT+V4PemGIzfNni/04rp+P0kvmdw5otQ2yNhsQNsfVmq
 XGWvkxF4P2GO6bkjjfR/1dDq7GtlyXtwwPDKeLbYb6TnJOZjtx+EAN27kkfSn1Ms
 R4Sa3zJ+DfHUmHL5S9g+7UD/CZ5GfKNmIskI4Mz5GsfoUz/0iiU+Bge/9sdcdSJQ
 kmbLy5YnVzfooLZ3TQmBFsO3iAMWb0s/mDdtyhqhTVmTUshLolkPYyKnPFvdupyv
 shXcpEST2XJNeaDRnL2K4zSCdxdbnCZHDpjfl9wfioBg7I8NfhXKpf1jYZHH1de4
 LXq8ndEFEOVQw/zSpYWfQq1sux8Jiqr+UK/ukbVeFWiGGIUs91gEWtPAf8T0AZo9
 ujkJvaWGl98O1g5wmBu0/dAR6QcFJMDfVwbmlIFpU8O+MEaz6X8mM+O5/T0IyTcD
 eMbAUjj4uYcU7ihKzHEv/0SS9Of38kzff67CLN5k8wOP/9NlaGZ78o1bVle9b52A
 BdhrsAefFiWHp1jT6Y9Rg4HOO/TguQ9e6EWSKOYFulsiLH9LEFaB9RwZLeLytV0W
 vlAgY9rUW77g1OJcb7DoNv33nRFuxsKqsnz3DEIXtgozo9CzbYI=
 =H1vH
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull ia64 removal and asm-generic updates from Arnd Bergmann:

 - The ia64 architecture gets its well-earned retirement as planned,
   now that there is one last (mostly) working release that will be
   maintained as an LTS kernel.

 - The architecture specific system call tables are updated for the
   added map_shadow_stack() syscall and to remove references to the
   long-gone sys_lookup_dcookie() syscall.

* tag 'asm-generic-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  hexagon: Remove unusable symbols from the ptrace.h uapi
  asm-generic: Fix spelling of architecture
  arch: Reserve map_shadow_stack() syscall number for all architectures
  syscalls: Cleanup references to sys_lookup_dcookie()
  Documentation: Drop or replace remaining mentions of IA64
  lib/raid6: Drop IA64 support
  Documentation: Drop IA64 from feature descriptions
  kernel: Drop IA64 support from sig_fault handlers
  arch: Remove Itanium (IA-64) architecture
2023-11-01 15:28:33 -10:00
Ard Biesheuvel c03d21f05e Merge 3rd batch of EFI fixes into efi/urgent 2023-10-20 18:11:06 +02:00
Kuan-Wei Chiu 0d3ad19179 efi: fix memory leak in krealloc failure handling
In the previous code, there was a memory leak issue where the
previously allocated memory was not freed upon a failed krealloc
operation. This patch addresses the problem by releasing the old memory
before setting the pointer to NULL in case of a krealloc failure. This
ensures that memory is properly managed and avoids potential memory
leaks.

Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-10-13 12:32:37 +02:00
Kirill A. Shutemov 8dbe33956d efi/unaccepted: Make sure unaccepted table is mapped
Unaccepted table is now allocated from EFI_ACPI_RECLAIM_MEMORY. It
translates into E820_TYPE_ACPI, which is not added to memblock and
therefore not mapped in the direct mapping.

This causes a crash on the first touch of the table.

Use memblock_add() to make sure that the table is mapped in direct
mapping.

Align the range to the nearest page borders. Ranges smaller than page
size are not mapped.

Fixes: e7761d827e ("efi/unaccepted: Use ACPI reclaim memory for unaccepted memory table")
Reported-by: Hongyu Ning <hongyu.ning@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-09-19 16:11:36 +00:00
Ard Biesheuvel cf8e865810 arch: Remove Itanium (IA-64) architecture
The Itanium architecture is obsolete, and an informal survey [0] reveals
that any residual use of Itanium hardware in production is mostly HP-UX
or OpenVMS based. The use of Linux on Itanium appears to be limited to
enthusiasts that occasionally boot a fresh Linux kernel to see whether
things are still working as intended, and perhaps to churn out some
distro packages that are rarely used in practice.

None of the original companies behind Itanium still produce or support
any hardware or software for the architecture, and it is listed as
'Orphaned' in the MAINTAINERS file, as apparently, none of the engineers
that contributed on behalf of those companies (nor anyone else, for that
matter) have been willing to support or maintain the architecture
upstream or even be responsible for applying the odd fix. The Intel
firmware team removed all IA-64 support from the Tianocore/EDK2
reference implementation of EFI in 2018. (Itanium is the original
architecture for which EFI was developed, and the way Linux supports it
deviates significantly from other architectures.) Some distros, such as
Debian and Gentoo, still maintain [unofficial] ia64 ports, but many have
dropped support years ago.

While the argument is being made [1] that there is a 'for the common
good' angle to being able to build and run existing projects such as the
Grid Community Toolkit [2] on Itanium for interoperability testing, the
fact remains that none of those projects are known to be deployed on
Linux/ia64, and very few people actually have access to such a system in
the first place. Even if there were ways imaginable in which Linux/ia64
could be put to good use today, what matters is whether anyone is
actually doing that, and this does not appear to be the case.

There are no emulators widely available, and so boot testing Itanium is
generally infeasible for ordinary contributors. GCC still supports IA-64
but its compile farm [3] no longer has any IA-64 machines. GLIBC would
like to get rid of IA-64 [4] too because it would permit some overdue
code cleanups. In summary, the benefits to the ecosystem of having IA-64
be part of it are mostly theoretical, whereas the maintenance overhead
of keeping it supported is real.

So let's rip off the band aid, and remove the IA-64 arch code entirely.
This follows the timeline proposed by the Debian/ia64 maintainer [5],
which removes support in a controlled manner, leaving IA-64 in a known
good state in the most recent LTS release. Other projects will follow
once the kernel support is removed.

[0] https://lore.kernel.org/all/CAMj1kXFCMh_578jniKpUtx_j8ByHnt=s7S+yQ+vGbKt9ud7+kQ@mail.gmail.com/
[1] https://lore.kernel.org/all/0075883c-7c51-00f5-2c2d-5119c1820410@web.de/
[2] https://gridcf.org/gct-docs/latest/index.html
[3] https://cfarm.tetaneutral.net/machines/list/
[4] https://lore.kernel.org/all/87bkiilpc4.fsf@mid.deneb.enyo.de/
[5] https://lore.kernel.org/all/ff58a3e76e5102c94bb5946d99187b358def688a.camel@physik.fu-berlin.de/

Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-09-11 08:13:17 +00:00
Linus Torvalds 937d96d2d5 EFI updates for v6.5
Although some more stuff is brewing, the EFI changes that are ready for
 mainline are few, so not a lot to pull this cycle:
 
 - improve the PCI DMA paranoia logic in the EFI stub
 - some constification changes
 - add statfs support to efivarfs
 - allow user space to enumerate updatable firmware resources without
   CAP_SYS_ADMIN
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQQm/3uucuRGn1Dmh0wbglWLn0tXAUCZJ1jIwAKCRAwbglWLn0t
 XDs8AP9PAAWIgukyXkYpoxabaQQK1Pqw6Zv63XAcNYBHa4zjHwD/UTcYviQIlI0B
 Rfj4i8pDQVVfReSI+lKWvhXfRQ5Qbgs=
 =w6zX
 -----END PGP SIGNATURE-----

Merge tag 'efi-next-for-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI updates from Ard Biesheuvel:
 "Although some more stuff is brewing, the EFI changes that are ready
  for mainline are few this cycle:

   - improve the PCI DMA paranoia logic in the EFI stub

   - some constification changes

   - add statfs support to efivarfs

   - allow user space to enumerate updatable firmware resources without
     CAP_SYS_ADMIN"

* tag 'efi-next-for-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi/libstub: Disable PCI DMA before grabbing the EFI memory map
  efi/esrt: Allow ESRT access without CAP_SYS_ADMIN
  efivarfs: expose used and total size
  efi: make kobj_type structure constant
  efi: x86: make kobj_type structure constant
2023-06-30 21:35:52 -07:00
Linus Torvalds 2c96136a3f - Add support for unaccepted memory as specified in the UEFI spec v2.9.
The gist of it all is that Intel TDX and AMD SEV-SNP confidential
   computing guests define the notion of accepting memory before using it
   and thus preventing a whole set of attacks against such guests like
   memory replay and the like.
 
   There are a couple of strategies of how memory should be accepted
   - the current implementation does an on-demand way of accepting.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmSZ0f4ACgkQEsHwGGHe
 VUpasw//RKoNW9HSU1csY+XnG9uuaT6QKgji+gIEZWWIGPO9iibvbBj6P5WxJE8T
 fe7yb6CGa6d6thoU0v+mQGVVvCd7OjCFwPD5wAo4mXToD7Ig+4mI6jMkaKifqa2f
 N1Uuy8u/zQnGyWrP5Y//WH5bJYfsmds4UGwXI2nLvKlhE7MG90/ePjt7iqnnwZsy
 waLp6a0Q1VeOvnfRszFLHZw/SoER5RSJ4qeVqttkFNmPPEKMK1Kirrl2poR56OQJ
 nMr6LqVtD7erlSJ36VRXOKzLI443A4iIEIg/wBjIOU6L5ZEWJGNqtCDnIqFJ6+TM
 XatsejfRYkkMZH0qXtX9+M0u+HJHbZPCH5rEcA21P3Nbd7od/ANq91qCGoMjtUZ4
 7pZohMG8M6IDvkLiOb8fQVkR5k/9Jbk8UvdN/8jdPx1ERxYMFO3BDvJpV2gzrW4B
 KYtFTPR7j2nY3eKfDpe3flanqYzKUBsKoTlLnlH7UHaiMZ2idwG8AQjlrhC/erCq
 /Lq1LXt4Mq46FyHABc+PSHytu0WWj1nBUftRt+lviY/Uv7TlkBldOTT7wm7itsfF
 HUCTfLWl0CJXKPq8rbbZhAG/exN6Ay6MO3E3OcNq8A72E5y4cXenuG3ic/0tUuOu
 FfjpiMk35qE2Qb4hnj1YtF3XINtd1MpKcuwzGSzEdv9s3J7hrS0=
 =FS95
 -----END PGP SIGNATURE-----

Merge tag 'x86_cc_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 confidential computing update from Borislav Petkov:

 - Add support for unaccepted memory as specified in the UEFI spec v2.9.

   The gist of it all is that Intel TDX and AMD SEV-SNP confidential
   computing guests define the notion of accepting memory before using
   it and thus preventing a whole set of attacks against such guests
   like memory replay and the like.

   There are a couple of strategies of how memory should be accepted -
   the current implementation does an on-demand way of accepting.

* tag 'x86_cc_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  virt: sevguest: Add CONFIG_CRYPTO dependency
  x86/efi: Safely enable unaccepted memory in UEFI
  x86/sev: Add SNP-specific unaccepted memory support
  x86/sev: Use large PSC requests if applicable
  x86/sev: Allow for use of the early boot GHCB for PSC requests
  x86/sev: Put PSC struct on the stack in prep for unaccepted memory support
  x86/sev: Fix calculation of end address based on number of pages
  x86/tdx: Add unaccepted memory support
  x86/tdx: Refactor try_accept_one()
  x86/tdx: Make _tdx_hypercall() and __tdx_module_call() available in boot stub
  efi/unaccepted: Avoid load_unaligned_zeropad() stepping into unaccepted memory
  efi: Add unaccepted memory support
  x86/boot/compressed: Handle unaccepted memory
  efi/libstub: Implement support for unaccepted memory
  efi/x86: Get full memory map in allocate_e820()
  mm: Add support for unaccepted memory
2023-06-26 15:32:39 -07:00
Linus Torvalds 69cbeb61ff Revert "efi: random: refresh non-volatile random seed when RNG is initialized"
This reverts commit e7b813b32a (and the
subsequent fix for it: 41a15855c1 "efi: random: fix NULL-deref when
refreshing seed").

It turns otu to cause non-deterministic boot stalls on at least a HP
6730b laptop.

Reported-and-bisected-by: Sami Korkalainen <sami.korkalainen@proton.me>
Link: https://lore.kernel.org/all/GQUnKz2al3yke5mB2i1kp3SzNHjK8vi6KJEh7rnLrOQ24OrlljeCyeWveLW9pICEmB9Qc8PKdNt3w1t_g3-Uvxq1l8Wj67PpoMeWDoH8PKk=@proton.me/
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-06-21 10:58:46 -07:00
Kirill A. Shutemov 2053bc57f3 efi: Add unaccepted memory support
efi_config_parse_tables() reserves memory that holds unaccepted memory
configuration table so it won't be reused by page allocator.

Core-mm requires few helpers to support unaccepted memory:

 - accept_memory() checks the range of addresses against the bitmap and
   accept memory if needed.

 - range_contains_unaccepted_memory() checks if anything within the
   range requires acceptance.

Architectural code has to provide efi_get_unaccepted_table() that
returns pointer to the unaccepted memory configuration table.

arch_accept_memory() handles arch-specific part of memory acceptance.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20230606142637.5171-6-kirill.shutemov@linux.intel.com
2023-06-06 17:22:20 +02:00
Kirill A. Shutemov 745e3ed85f efi/libstub: Implement support for unaccepted memory
UEFI Specification version 2.9 introduces the concept of memory
acceptance: Some Virtual Machine platforms, such as Intel TDX or AMD
SEV-SNP, requiring memory to be accepted before it can be used by the
guest. Accepting happens via a protocol specific for the Virtual
Machine platform.

Accepting memory is costly and it makes VMM allocate memory for the
accepted guest physical address range. It's better to postpone memory
acceptance until memory is needed. It lowers boot time and reduces
memory overhead.

The kernel needs to know what memory has been accepted. Firmware
communicates this information via memory map: a new memory type --
EFI_UNACCEPTED_MEMORY -- indicates such memory.

Range-based tracking works fine for firmware, but it gets bulky for
the kernel: e820 (or whatever the arch uses) has to be modified on every
page acceptance. It leads to table fragmentation and there's a limited
number of entries in the e820 table.

Another option is to mark such memory as usable in e820 and track if the
range has been accepted in a bitmap. One bit in the bitmap represents a
naturally aligned power-2-sized region of address space -- unit.

For x86, unit size is 2MiB: 4k of the bitmap is enough to track 64GiB or
physical address space.

In the worst-case scenario -- a huge hole in the middle of the
address space -- It needs 256MiB to handle 4PiB of the address
space.

Any unaccepted memory that is not aligned to unit_size gets accepted
upfront.

The bitmap is allocated and constructed in the EFI stub and passed down
to the kernel via EFI configuration table. allocate_e820() allocates the
bitmap if unaccepted memory is present, according to the size of
unaccepted region.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230606142637.5171-4-kirill.shutemov@linux.intel.com
2023-06-06 16:58:23 +02:00
Anisse Astier d86ff3333c efivarfs: expose used and total size
When writing EFI variables, one might get errors with no other message
on why it fails. Being able to see how much is used by EFI variables
helps analyzing such issues.

Since this is not a conventional filesystem, block size is intentionally
set to 1 instead of PAGE_SIZE.

x86 quirks of reserved size are taken into account; so that available
and free size can be different, further helping debugging space issues.

With this patch, one can see the remaining space in EFI variable storage
via efivarfs, like this:

   $ df -h /sys/firmware/efi/efivars/
   Filesystem      Size  Used Avail Use% Mounted on
   efivarfs        176K  106K   66K  62% /sys/firmware/efi/efivars

Signed-off-by: Anisse Astier <an.astier@criteo.com>
[ardb: - rename efi_reserved_space() to efivar_reserved_space()
       - whitespace/coding style tweaks]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-05-17 18:21:34 +02:00
Linus Torvalds 06e1a81c48 A healthy mix of EFI contributions this time:
- Performance tweaks for efifb earlycon by Andy
 
 - Preparatory refactoring and cleanup work in the efivar layer by Johan,
   which is needed to accommodate the Snapdragon arm64 laptops that
   expose their EFI variable store via a TEE secure world API.
 
 - Enhancements to the EFI memory map handling so that Xen dom0 can
   safely access EFI configuration tables (Demi Marie)
 
 - Wire up the newly introduced IBT/BTI flag in the EFI memory attributes
   table, so that firmware that is generated with ENDBR/BTI landing pads
   will be mapped with enforcement enabled.
 
 - Clean up how we check and print the EFI revision exposed by the
   firmware.
 
 - Incorporate EFI memory attributes protocol definition contributed by
   Evgeniy and wire it up in the EFI zboot code. This ensures that these
   images can execute under new and stricter rules regarding the default
   memory permissions for EFI page allocations. (More work is in progress
   here)
 
 - CPER header cleanup by Dan Williams
 
 - Use a raw spinlock to protect the EFI runtime services stack on arm64
   to ensure the correct semantics under -rt. (Pierre)
 
 - EFI framebuffer quirk for Lenovo Ideapad by Darrell.
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmPzuwsACgkQw08iOZLZ
 jyS7dwwAm95DlDxFIQi4FmTm2mqJws9PyDrkfaAK1CoyqCgeOLQT2FkVolgr8jne
 pwpwCTXtYP8y0BZvdQEIjpAq/BHKaD3GJSPfl7lo+pnUu68PpsFWaV6EdT33KKfj
 QeF0MnUvrqUeTFI77D+S0ZW2zxdo9eCcahF3TPA52/bEiiDHWBF8Qm9VHeQGklik
 zoXA15ft3mgITybgjEA0ncGrVZiBMZrYoMvbdkeoedfw02GN/eaQn8d2iHBtTDEh
 3XNlo7ONX0v50cjt0yvwFEA0AKo0o7R1cj+ziKH/bc4KjzIiCbINhy7blroSq+5K
 YMlnPHuj8Nhv3I+MBdmn/nxRCQeQsE4RfRru04hfNfdcqjAuqwcBvRXvVnjWKZHl
 CmUYs+p/oqxrQ4BjiHfw0JKbXRsgbFI6o3FeeLH9kzI9IDUPpqu3Ma814FVok9Ai
 zbOCrJf5tEtg5tIavcUESEMBuHjEafqzh8c7j7AAqbaNjlihsqosDy9aYoarEi5M
 f/tLec86
 =+pOz
 -----END PGP SIGNATURE-----

Merge tag 'efi-next-for-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI updates from Ard Biesheuvel:
 "A healthy mix of EFI contributions this time:

   - Performance tweaks for efifb earlycon (Andy)

   - Preparatory refactoring and cleanup work in the efivar layer, which
     is needed to accommodate the Snapdragon arm64 laptops that expose
     their EFI variable store via a TEE secure world API (Johan)

   - Enhancements to the EFI memory map handling so that Xen dom0 can
     safely access EFI configuration tables (Demi Marie)

   - Wire up the newly introduced IBT/BTI flag in the EFI memory
     attributes table, so that firmware that is generated with ENDBR/BTI
     landing pads will be mapped with enforcement enabled

   - Clean up how we check and print the EFI revision exposed by the
     firmware

   - Incorporate EFI memory attributes protocol definition and wire it
     up in the EFI zboot code (Evgeniy)

     This ensures that these images can execute under new and stricter
     rules regarding the default memory permissions for EFI page
     allocations (More work is in progress here)

   - CPER header cleanup (Dan Williams)

   - Use a raw spinlock to protect the EFI runtime services stack on
     arm64 to ensure the correct semantics under -rt (Pierre)

   - EFI framebuffer quirk for Lenovo Ideapad (Darrell)"

* tag 'efi-next-for-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (24 commits)
  firmware/efi sysfb_efi: Add quirk for Lenovo IdeaPad Duet 3
  arm64: efi: Make efi_rt_lock a raw_spinlock
  efi: Add mixed-mode thunk recipe for GetMemoryAttributes
  efi: x86: Wire up IBT annotation in memory attributes table
  efi: arm64: Wire up BTI annotation in memory attributes table
  efi: Discover BTI support in runtime services regions
  efi/cper, cxl: Remove cxl_err.h
  efi: Use standard format for printing the EFI revision
  efi: Drop minimum EFI version check at boot
  efi: zboot: Use EFI protocol to remap code/data with the right attributes
  efi/libstub: Add memory attribute protocol definitions
  efi: efivars: prevent double registration
  efi: verify that variable services are supported
  efivarfs: always register filesystem
  efi: efivars: add efivars printk prefix
  efi: Warn if trying to reserve memory under Xen
  efi: Actually enable the ESRT under Xen
  efi: Apply allowlist to EFI configuration tables when running under Xen
  efi: xen: Implement memory descriptor lookup based on hypercall
  efi: memmap: Disregard bogus entries instead of returning them
  ...
2023-02-23 14:41:48 -08:00
Ard Biesheuvel 1758817e7e efi: Use standard format for printing the EFI revision
The UEFI spec section 4.2.1 describes the way the human readable EFI
revision should be constructed from the 32-bit revision field in the
system table:

  The upper 16 bits of this field contain the major revision value,
  and the lower 16 bits contain the minor revision value. The minor
  revision values are binary coded decimals and are limited to the
  range of 00..99.

  When printed or displayed UEFI spec revision is referred as (Major
  revision).(Minor revision upper decimal).(Minor revision lower
  decimal) or (Major revision).(Minor revision upper decimal) in case
  Minor revision lower decimal is set to 0.

Let's adhere to this when logging the EFI revision to the kernel log.

Note that the bit about binary coded decimals is bogus, and the minor
revision lower decimal is simply the minor revision modulo 10, given the
symbolic definitions provided by the spec itself:

  #define EFI_2_40_SYSTEM_TABLE_REVISION ((2<<16) | (40))
  #define EFI_2_31_SYSTEM_TABLE_REVISION ((2<<16) | (31))
  #define EFI_2_30_SYSTEM_TABLE_REVISION ((2<<16) | (30))

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-02-03 18:09:00 +01:00
Ard Biesheuvel 234fa51db9 efi: Drop minimum EFI version check at boot
We currently pass a minimum major version to the generic EFI helper that
checks the system table magic and version, and refuse to boot if the
value is lower.

The motivation for this check is unknown, and even the code that uses
major version 2 as the minimum (ARM, arm64 and RISC-V) should make it
past this check without problems, and boot to a point where we have
access to a console or some other means to inform the user that the
firmware's major revision number made us unhappy. (Revision 2.0 of the
UEFI specification was released in January 2006, whereas ARM, arm64 and
RISC-V support where added in 2009, 2013 and 2017, respectively, so
checking for major version 2 or higher is completely arbitrary)

So just drop the check.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-02-03 18:01:07 +01:00
Anton Gusev 966d47e1f2 efi: fix potential NULL deref in efi_mem_reserve_persistent
When iterating on a linked list, a result of memremap is dereferenced
without checking it for NULL.

This patch adds a check that falls back on allocating a new page in
case memremap doesn't succeed.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 18df7577ad ("efi/memreserve: deal with memreserve entries in unmapped memory")
Signed-off-by: Anton Gusev <aagusev@ispras.ru>
[ardb: return -ENOMEM instead of breaking out of the loop]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-02-03 14:52:10 +01:00
Johan Hovold bad267f9e1 efi: verify that variable services are supported
Current Qualcomm UEFI firmware does not implement the variable services
but not all revisions clear the corresponding bits in the RT_PROP table
services mask and instead the corresponding calls return
EFI_UNSUPPORTED.

This leads to efi core registering the generic efivar ops even when the
variable services are not supported or when they are accessed through
some other interface (e.g. Google SMI or the upcoming Qualcomm SCM
implementation).

Instead of playing games with init call levels to make sure that the
custom implementations are registered after the generic one, make sure
that get_next_variable() is actually supported before registering the
generic ops.

Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-01-26 21:32:01 +01:00
Demi Marie Obenour fa7bee867d efi: Warn if trying to reserve memory under Xen
Doing so cannot work and should never happen.

Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-01-23 11:33:24 +01:00
Demi Marie Obenour c0fecaa44d efi: Apply allowlist to EFI configuration tables when running under Xen
As it turns out, Xen does not guarantee that EFI boot services data
regions in memory are preserved, which means that EFI configuration
tables pointing into such memory regions may be corrupted before the
dom0 OS has had a chance to inspect them.

This is causing problems for Qubes OS when it attempts to perform system
firmware updates, which requires that the contents of the EFI System
Resource Table are valid when the fwupd userspace program runs.

However, other configuration tables such as the memory attributes table
or the runtime properties table are equally affected, and so we need a
comprehensive workaround that works for any table type.

So when running under Xen, check the EFI memory descriptor covering the
start of the table, and disregard the table if it does not reside in
memory that is preserved by Xen.

Co-developed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-01-23 11:33:24 +01:00
Demi Marie Obenour aca1d27ac3 efi: xen: Implement memory descriptor lookup based on hypercall
Xen on x86 boots dom0 in EFI mode but without providing a memory map.
This means that some consistency checks we would like to perform on
configuration tables or other data structures in memory are not
currently possible.  Xen does, however, expose EFI memory descriptor
info via a Xen hypercall, so let's wire that up instead.  It turns out
that the returned information is not identical to what Linux's
efi_mem_desc_lookup would return: the address returned is the address
passed to the hypercall, and the size returned is the number of bytes
remaining in the configuration table.  However, none of the callers of
efi_mem_desc_lookup() currently care about this.  In the future, Xen may
gain a hypercall that returns the actual start address, which can be
used instead.

Co-developed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-01-22 10:14:15 +01:00
Demi Marie Obenour ab03e91e60 efi: memmap: Disregard bogus entries instead of returning them
The ESRT code currently contains two consistency checks on the memory
descriptor it obtains, but one of them is both incomplete and can only
trigger on invalid descriptors.

So let's drop these checks, and instead disregard descriptors entirely
if the start address is misaligned, or if the number of pages reaches
to or beyond the end of the address space.  Note that the memory map as
a whole could still be inconsistent: multiple entries might cover the
same area, or the address could be outside of the addressable PA space,
but validating that goes beyond the scope of these helpers.  Also note
that since the physical address space is never 64-bits wide, a
descriptor that includes the last page of memory is not valid.  This is
fortunate, since it means that a valid physical address will never be an
error pointer and that the length of a memory descriptor in bytes will
fit in a 64-bit unsigned integer.

Co-developed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-01-22 10:14:15 +01:00
Johan Hovold ade7fd908d efi: efivars: drop kobject from efivars_register()
Since commit 0f5b2c69a4 ("efi: vars: Remove deprecated 'efivars' sysfs
interface") and the removal of the sysfs interface there are no users of
the efivars kobject.

Drop the kobject argument from efivars_register() and add a new
efivar_is_available() helper in favour of the old efivars_kobject().

Note that the new helper uses the prefix 'efivar' (i.e. without an 's')
for consistency with efivar_supports_writes() and the rest of the
interface (except the registration functions).

For the benefit of drivers with optional EFI support, also provide a
dummy implementation of efivar_is_available().

Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-01-17 16:14:36 +01:00
Johan Hovold 703c13fe3c efi: fix NULL-deref in init error path
In cases where runtime services are not supported or have been disabled,
the runtime services workqueue will never have been allocated.

Do not try to destroy the workqueue unconditionally in the unlikely
event that EFI initialisation fails to avoid dereferencing a NULL
pointer.

Fixes: 98086df8b7 ("efi: add missed destroy_workqueue when efisubsys_init fails")
Cc: stable@vger.kernel.org
Cc: Li Heng <liheng40@huawei.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-01-03 10:52:15 +01:00
Johan Hovold 41a15855c1 efi: random: fix NULL-deref when refreshing seed
Do not try to refresh the RNG seed in case the firmware does not support
setting variables.

This is specifically needed to prevent a NULL-pointer dereference on the
Lenovo X13s with some firmware revisions, or more generally, whenever
the runtime services have been disabled (e.g. efi=noruntime or with
PREEMPT_RT).

Fixes: e7b813b32a ("efi: random: refresh non-volatile random seed when RNG is initialized")
Reported-by: Steev Klimaszewski <steev@kali.org>
Reported-by: Bjorn Andersson <andersson@kernel.org>
Tested-by: Steev Klimaszewski <steev@kali.org>
Tested-by: Andrew Halaney <ahalaney@redhat.com> # sc8280xp-lenovo-thinkpad-x13s
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-12-20 03:13:45 +01:00
Linus Torvalds fc4c9f4504 EFI updates for v6.2:
- Refactor the zboot code so that it incorporates all the EFI stub
   logic, rather than calling the decompressed kernel as a EFI app.
 - Add support for initrd= command line option to x86 mixed mode.
 - Allow initrd= to be used with arbitrary EFI accessible file systems
   instead of just the one the kernel itself was loaded from.
 - Move some x86-only handling and manipulation of the EFI memory map
   into arch/x86, as it is not used anywhere else.
 - More flexible handling of any random seeds provided by the boot
   environment (i.e., systemd-boot) so that it becomes available much
   earlier during the boot.
 - Allow improved arch-agnostic EFI support in loaders, by setting a
   uniform baseline of supported features, and adding a generic magic
   number to the DOS/PE header. This should allow loaders such as GRUB or
   systemd-boot to reduce the amount of arch-specific handling
   substantially.
 - (arm64) Run EFI runtime services from a dedicated stack, and use it to
   recover from synchronous exceptions that might occur in the firmware
   code.
 - (arm64) Ensure that we don't allocate memory outside of the 48-bit
   addressable physical range.
 - Make EFI pstore record size configurable
 - Add support for decoding CXL specific CPER records
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmOTQ1cACgkQw08iOZLZ
 jyQRkAv+LqaZFWeVwhAQHiw/N3RnRM0nZHea6++D2p1y/ZbCpwv3pdLl2YHQ1KmW
 wDG9Nr4C1ITLtfy1YZKeYpwloQtq9S1GZDWnFpVv/hdo7L924eRAwIlxowWn1OnP
 ruxv2PaYXyb0plh1YD1f6E1BqrfUOtajET55Kxs9ZsxmnMtDpIX3NiYy4LKMBIZC
 +Eywt41M3uBX+wgmSujFBMVVJjhOX60WhUYXqy0RXwDKOyrz/oW5td+eotSCreB6
 FVbjvwQvUdtzn4s1FayOMlTrkxxLw4vLhsaUGAdDOHd3rg3sZT9Xh1HqFFD6nss6
 ZAzAYQ6BzdiV/5WSB9meJe+BeG1hjTNKjJI6JPO2lctzYJqlnJJzI6JzBuH9vzQ0
 dffLB8NITeEW2rphIh+q+PAKFFNbXWkJtV4BMRpqmzZ/w7HwupZbUXAzbWE8/5km
 qlFpr0kmq8GlVcbXNOFjmnQVrJ8jPYn+O3AwmEiVAXKZJOsMH0sjlXHKsonme9oV
 Sk71c6Em
 =JEXz
 -----END PGP SIGNATURE-----

Merge tag 'efi-next-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI updates from Ard Biesheuvel:
 "Another fairly sizable pull request, by EFI subsystem standards.

  Most of the work was done by me, some of it in collaboration with the
  distro and bootloader folks (GRUB, systemd-boot), where the main focus
  has been on removing pointless per-arch differences in the way EFI
  boots a Linux kernel.

   - Refactor the zboot code so that it incorporates all the EFI stub
     logic, rather than calling the decompressed kernel as a EFI app.

   - Add support for initrd= command line option to x86 mixed mode.

   - Allow initrd= to be used with arbitrary EFI accessible file systems
     instead of just the one the kernel itself was loaded from.

   - Move some x86-only handling and manipulation of the EFI memory map
     into arch/x86, as it is not used anywhere else.

   - More flexible handling of any random seeds provided by the boot
     environment (i.e., systemd-boot) so that it becomes available much
     earlier during the boot.

   - Allow improved arch-agnostic EFI support in loaders, by setting a
     uniform baseline of supported features, and adding a generic magic
     number to the DOS/PE header. This should allow loaders such as GRUB
     or systemd-boot to reduce the amount of arch-specific handling
     substantially.

   - (arm64) Run EFI runtime services from a dedicated stack, and use it
     to recover from synchronous exceptions that might occur in the
     firmware code.

   - (arm64) Ensure that we don't allocate memory outside of the 48-bit
     addressable physical range.

   - Make EFI pstore record size configurable

   - Add support for decoding CXL specific CPER records"

* tag 'efi-next-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (43 commits)
  arm64: efi: Recover from synchronous exceptions occurring in firmware
  arm64: efi: Execute runtime services from a dedicated stack
  arm64: efi: Limit allocations to 48-bit addressable physical region
  efi: Put Linux specific magic number in the DOS header
  efi: libstub: Always enable initrd command line loader and bump version
  efi: stub: use random seed from EFI variable
  efi: vars: prohibit reading random seed variables
  efi: random: combine bootloader provided RNG seed with RNG protocol output
  efi/cper, cxl: Decode CXL Error Log
  efi/cper, cxl: Decode CXL Protocol Error Section
  efi: libstub: fix efi_load_initrd_dev_path() kernel-doc comment
  efi: x86: Move EFI runtime map sysfs code to arch/x86
  efi: runtime-maps: Clarify purpose and enable by default for kexec
  efi: pstore: Add module parameter for setting the record size
  efi: xen: Set EFI_PARAVIRT for Xen dom0 boot on all architectures
  efi: memmap: Move manipulation routines into x86 arch tree
  efi: memmap: Move EFI fake memmap support into x86 arch tree
  efi: libstub: Undeprecate the command line initrd loader
  efi: libstub: Add mixed mode support to command line initrd loader
  efi: libstub: Permit mixed mode return types other than efi_status_t
  ...
2022-12-13 14:31:47 -08:00
Jason A. Donenfeld e7b813b32a efi: random: refresh non-volatile random seed when RNG is initialized
EFI has a rather unique benefit that it has access to some limited
non-volatile storage, where the kernel can store a random seed. Register
a notification for when the RNG is initialized, and at that point, store
a new random seed.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-22 14:53:36 +01:00
Ard Biesheuvel 196dff2712 efi: random: combine bootloader provided RNG seed with RNG protocol output
Instead of blindly creating the EFI random seed configuration table if
the RNG protocol is implemented and works, check whether such a EFI
configuration table was provided by an earlier boot stage and if so,
concatenate the existing and the new seeds, leaving it up to the core
code to mix it in and credit it the way it sees fit.

This can be used for, e.g., systemd-boot, to pass an additional seed to
Linux in a way that can be consumed by the kernel very early. In that
case, the following definitions should be used to pass the seed to the
EFI stub:

struct linux_efi_random_seed {
      u32     size; // of the 'seed' array in bytes
      u8      seed[];
};

The memory for the struct must be allocated as EFI_ACPI_RECLAIM_MEMORY
pool memory, and the address of the struct in memory should be installed
as a EFI configuration table using the following GUID:

LINUX_EFI_RANDOM_SEED_TABLE_GUID        1ce1e5bc-7ceb-42f2-81e5-8aadf180f57b

Note that doing so is safe even on kernels that were built without this
patch applied, but the seed will simply be overwritten with a seed
derived from the EFI RNG protocol, if available. The recommended seed
size is 32 bytes, and seeds larger than 512 bytes are considered
corrupted and ignored entirely.

In order to preserve forward secrecy, seeds from previous bootloaders
are memzero'd out, and in order to preserve memory, those older seeds
are also freed from memory. Freeing from memory without first memzeroing
is not safe to do, as it's possible that nothing else will ever
overwrite those pages used by EFI.

Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
[ardb: incorporate Jason's followup changes to extend the maximum seed
       size on the consumer end, memzero() it and drop a needless printk]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-11-18 10:19:55 +01:00
Ard Biesheuvel 1fff234de2 efi: x86: Move EFI runtime map sysfs code to arch/x86
The EFI runtime map code is only wired up on x86, which is the only
architecture that has a need for it in its implementation of kexec.

So let's move this code under arch/x86 and drop all references to it
from generic code. To ensure that the efi_runtime_map_init() is invoked
at the appropriate time use a 'sync' subsys_initcall() that will be
called right after the EFI initcall made from generic code where the
original invocation of efi_runtime_map_init() resided.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Dave Young <dyoung@redhat.com>
2022-11-18 09:14:09 +01:00
Ard Biesheuvel 732ea9db9d efi: libstub: Move screen_info handling to common code
Currently, arm64, RISC-V and LoongArch rely on the fact that struct
screen_info can be accessed directly, due to the fact that the EFI stub
and the core kernel are part of the same image. This will change after a
future patch, so let's ensure that the screen_info handling is able to
deal with this, by adopting the arm32 approach of passing it as a
configuration table. While at it, switch to ACPI reclaim memory to hold
the screen_info data, which is more appropriate for this kind of
allocation.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-11-09 12:42:02 +01:00
Ard Biesheuvel 161a438d73 efi: random: reduce seed size to 32 bytes
We no longer need at least 64 bytes of random seed to permit the early
crng init to complete. The RNG is now based on Blake2s, so reduce the
EFI seed size to the Blake2s hash size, which is sufficient for our
purposes.

While at it, drop the READ_ONCE(), which was supposed to prevent size
from being evaluated after seed was unmapped. However, this cannot
actually happen, so READ_ONCE() is unnecessary here.

Cc: <stable@vger.kernel.org> # v4.14+
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
2022-10-24 10:21:02 +02:00
Ard Biesheuvel 4b017e59f0 efi: ssdt: Don't free memory if ACPI table was loaded successfully
Amadeusz reports KASAN use-after-free errors introduced by commit
3881ee0b1e ("efi: avoid efivars layer when loading SSDTs from
variables"). The problem appears to be that the memory that holds the
new ACPI table is now freed unconditionally, instead of only when the
ACPI core reported a failure to load the table.

So let's fix this, by omitting the kfree() on success.

Cc: <stable@vger.kernel.org> # v6.0
Link: https://lore.kernel.org/all/a101a10a-4fbb-5fae-2e3c-76cf96ed8fbd@linux.intel.com/
Fixes: 3881ee0b1e ("efi: avoid efivars layer when loading SSDTs from variables")
Reported-by: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-10-21 11:09:40 +02:00
Linus Torvalds 27bc50fc90 - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
linux-next for a couple of months without, to my knowledge, any negative
   reports (or any positive ones, come to that).
 
 - Also the Maple Tree from Liam R.  Howlett.  An overlapping range-based
   tree for vmas.  It it apparently slight more efficient in its own right,
   but is mainly targeted at enabling work to reduce mmap_lock contention.
 
   Liam has identified a number of other tree users in the kernel which
   could be beneficially onverted to mapletrees.
 
   Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
   (https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com).
   This has yet to be addressed due to Liam's unfortunately timed
   vacation.  He is now back and we'll get this fixed up.
 
 - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer.  It uses
   clang-generated instrumentation to detect used-unintialized bugs down to
   the single bit level.
 
   KMSAN keeps finding bugs.  New ones, as well as the legacy ones.
 
 - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
   memory into THPs.
 
 - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support
   file/shmem-backed pages.
 
 - userfaultfd updates from Axel Rasmussen
 
 - zsmalloc cleanups from Alexey Romanov
 
 - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure
 
 - Huang Ying adds enhancements to NUMA balancing memory tiering mode's
   page promotion, with a new way of detecting hot pages.
 
 - memcg updates from Shakeel Butt: charging optimizations and reduced
   memory consumption.
 
 - memcg cleanups from Kairui Song.
 
 - memcg fixes and cleanups from Johannes Weiner.
 
 - Vishal Moola provides more folio conversions
 
 - Zhang Yi removed ll_rw_block() :(
 
 - migration enhancements from Peter Xu
 
 - migration error-path bugfixes from Huang Ying
 
 - Aneesh Kumar added ability for a device driver to alter the memory
   tiering promotion paths.  For optimizations by PMEM drivers, DRM
   drivers, etc.
 
 - vma merging improvements from Jakub Matěn.
 
 - NUMA hinting cleanups from David Hildenbrand.
 
 - xu xin added aditional userspace visibility into KSM merging activity.
 
 - THP & KSM code consolidation from Qi Zheng.
 
 - more folio work from Matthew Wilcox.
 
 - KASAN updates from Andrey Konovalov.
 
 - DAMON cleanups from Kaixu Xia.
 
 - DAMON work from SeongJae Park: fixes, cleanups.
 
 - hugetlb sysfs cleanups from Muchun Song.
 
 - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0HaPgAKCRDdBJ7gKXxA
 joPjAQDZ5LlRCMWZ1oxLP2NOTp6nm63q9PWcGnmY50FjD/dNlwEAnx7OejCLWGWf
 bbTuk6U2+TKgJa4X7+pbbejeoqnt5QU=
 =xfWx
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
   linux-next for a couple of months without, to my knowledge, any
   negative reports (or any positive ones, come to that).

 - Also the Maple Tree from Liam Howlett. An overlapping range-based
   tree for vmas. It it apparently slightly more efficient in its own
   right, but is mainly targeted at enabling work to reduce mmap_lock
   contention.

   Liam has identified a number of other tree users in the kernel which
   could be beneficially onverted to mapletrees.

   Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
   at [1]. This has yet to be addressed due to Liam's unfortunately
   timed vacation. He is now back and we'll get this fixed up.

 - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
   clang-generated instrumentation to detect used-unintialized bugs down
   to the single bit level.

   KMSAN keeps finding bugs. New ones, as well as the legacy ones.

 - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
   memory into THPs.

 - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to
   support file/shmem-backed pages.

 - userfaultfd updates from Axel Rasmussen

 - zsmalloc cleanups from Alexey Romanov

 - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and
   memory-failure

 - Huang Ying adds enhancements to NUMA balancing memory tiering mode's
   page promotion, with a new way of detecting hot pages.

 - memcg updates from Shakeel Butt: charging optimizations and reduced
   memory consumption.

 - memcg cleanups from Kairui Song.

 - memcg fixes and cleanups from Johannes Weiner.

 - Vishal Moola provides more folio conversions

 - Zhang Yi removed ll_rw_block() :(

 - migration enhancements from Peter Xu

 - migration error-path bugfixes from Huang Ying

 - Aneesh Kumar added ability for a device driver to alter the memory
   tiering promotion paths. For optimizations by PMEM drivers, DRM
   drivers, etc.

 - vma merging improvements from Jakub Matěn.

 - NUMA hinting cleanups from David Hildenbrand.

 - xu xin added aditional userspace visibility into KSM merging
   activity.

 - THP & KSM code consolidation from Qi Zheng.

 - more folio work from Matthew Wilcox.

 - KASAN updates from Andrey Konovalov.

 - DAMON cleanups from Kaixu Xia.

 - DAMON work from SeongJae Park: fixes, cleanups.

 - hugetlb sysfs cleanups from Muchun Song.

 - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.

Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1]

* tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits)
  hugetlb: allocate vma lock for all sharable vmas
  hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer
  hugetlb: fix vma lock handling during split vma and range unmapping
  mglru: mm/vmscan.c: fix imprecise comments
  mm/mglru: don't sync disk for each aging cycle
  mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
  mm: memcontrol: use do_memsw_account() in a few more places
  mm: memcontrol: deprecate swapaccounting=0 mode
  mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled
  mm/secretmem: remove reduntant return value
  mm/hugetlb: add available_huge_pages() func
  mm: remove unused inline functions from include/linux/mm_inline.h
  selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
  selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd
  selftests/vm: add thp collapse shmem testing
  selftests/vm: add thp collapse file and tmpfs testing
  selftests/vm: modularize thp collapse memory operations
  selftests/vm: dedup THP helpers
  mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
  mm/madvise: add file and shmem support to MADV_COLLAPSE
  ...
2022-10-10 17:53:04 -07:00
Ard Biesheuvel f4dc7fffa9 efi: libstub: unify initrd loading between architectures
Use a EFI configuration table to pass the initrd to the core kernel,
instead of per-arch methods. This cleans up the code considerably, and
should make it easier for architectures to get rid of their reliance on
DT for doing EFI boot in the future.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-09-27 13:21:36 +02:00
Liam R. Howlett 524e00b36e mm: remove rb tree.
Remove the RB tree and start using the maple tree for vm_area_struct
tracking.

Drop validate_mm() calls in expand_upwards() and expand_downwards() as the
lock is not held.

Link: https://lkml.kernel.org/r/20220906194824.2110408-18-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26 19:46:16 -07:00
Liam R. Howlett d4af56c5c7 mm: start tracking VMAs with maple tree
Start tracking the VMAs with the new maple tree structure in parallel with
the rb_tree.  Add debug and trace events for maple tree operations and
duplicate the rb_tree that is created on forks into the maple tree.

The maple tree is added to the mm_struct including the mm_init struct,
added support in required mm/mmap functions, added tracking in kernel/fork
for process forking, and used to find the unmapped_area and checked
against what the rbtree finds.

This also moves the mmap_lock() in exit_mmap() since the oom reaper call
does walk the VMAs.  Otherwise lockdep will be unhappy if oom happens.

When splitting a vma fails due to allocations of the maple tree nodes,
the error path in __split_vma() calls new->vm_ops->close(new).  The page
accounting for hugetlb is actually in the close() operation,  so it
accounts for the removal of 1/2 of the VMA which was not adjusted.  This
results in a negative exit value.  To avoid the negative charge, set
vm_start = vm_end and vm_pgoff = 0.

There is also a potential accounting issue in special mappings from
insert_vm_struct() failing to allocate, so reverse the charge there in
the failure scenario.

Link: https://lkml.kernel.org/r/20220906194824.2110408-9-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26 19:46:14 -07:00
Ard Biesheuvel 2d82e6227e efi: vars: Move efivar caching layer into efivarfs
Move the fiddly bits of the efivar layer into its only remaining user,
efivarfs, and confine its use to that particular module. All other uses
of the EFI variable store have no need for this additional layer of
complexity, given that they either only read variables, or read and
write variables into a separate GUIDed namespace, and cannot be used to
manipulate EFI variables that are covered by the EFI spec and/or affect
the boot flow.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-06-24 20:40:19 +02:00
Ard Biesheuvel 3881ee0b1e efi: avoid efivars layer when loading SSDTs from variables
The efivars intermediate variable access layer provides an abstraction
that permits the EFI variable store to be replaced by something else
that implements a compatible interface, and caches all variables in the
variable store for fast access via the efivarfs pseudo-filesystem.

The SSDT override feature does not take advantage of either feature, as
it is only used when the generic EFI implementation of efivars is used,
and it traverses all variables only once to find the ones it is
interested in, and frees all data structures that the efivars layer
keeps right after.

So in this case, let's just call EFI's code directly, using the function
pointers in struct efi.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-06-20 12:43:25 +02:00
Ma Wupeng 6365a1935c efi: Make code to find mirrored memory ranges generic
Commit b05b9f5f9d ("x86, mirror: x86 enabling - find mirrored memory
ranges") introduce the efi_find_mirror() function on x86. In order to reuse
the API we make it public.

Arm64 can support mirrored memory too, so function efi_find_mirror() is added to
efi_init() to this support for arm64.

Since efi_init() is shared by ARM, arm64 and riscv, this patch will bring
mirror memory support for these architectures, but this support is only tested
in arm64.

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Link: https://lore.kernel.org/r/20220614092156.1972846-2-mawupeng1@huawei.com
[ardb: fix subject to better reflect the payload]
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-06-15 12:11:19 +02:00
Dov Murik 20ffd9205e efi: Register efi_secret platform device if EFI secret area is declared
During efi initialization, check if coco_secret is defined in the EFI
configuration table; in such case, register platform device
"efi_secret".  This allows udev to automatically load the efi_secret
module (platform driver), which in turn will populate the
<securityfs>/secrets/coco directory in guests into which secrets were
injected.

Note that a declared address of an EFI secret area doesn't mean that
secrets where indeed injected to that area; if the secret area is not
populated, the driver will not load (but the platform device will still
be registered).

Signed-off-by: Dov Murik <dovmurik@linux.ibm.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Link: https://lore.kernel.org/r/20220412212127.154182-4-dovmurik@linux.ibm.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-04-13 19:11:20 +02:00
Dov Murik 1227418989 efi: Save location of EFI confidential computing area
Confidential computing (coco) hardware such as AMD SEV (Secure Encrypted
Virtualization) allows a guest owner to inject secrets into the VMs
memory without the host/hypervisor being able to read them.

Firmware support for secret injection is available in OVMF, which
reserves a memory area for secret injection and includes a pointer to it
the in EFI config table entry LINUX_EFI_COCO_SECRET_TABLE_GUID.

If EFI exposes such a table entry, uefi_init() will keep a pointer to
the EFI config table entry in efi.coco_secret, so it can be used later
by the kernel (specifically drivers/virt/coco/efi_secret).  It will also
appear in the kernel log as "CocoSecret=ADDRESS"; for example:

    [    0.000000] efi: EFI v2.70 by EDK II
    [    0.000000] efi: CocoSecret=0x7f22e680 SMBIOS=0x7f541000 ACPI=0x7f77e000 ACPI 2.0=0x7f77e014 MEMATTR=0x7ea0c018

The new functionality can be enabled with CONFIG_EFI_COCO_SECRET=y.

Signed-off-by: Dov Murik <dovmurik@linux.ibm.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Link: https://lore.kernel.org/r/20220412212127.154182-2-dovmurik@linux.ibm.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-04-13 19:11:18 +02:00
Javier Martinez Canillas a031651ff2 efi: Allow to enable EFI runtime services by default on RT
Commit d9f283ae71 ("efi: Disable runtime services on RT") disabled EFI
runtime services by default when the CONFIG_PREEMPT_RT option is enabled.

The rationale for that commit is that some EFI calls could take too much
time, leading to large latencies which is an issue for Real-Time kernels.

But a side effect of that change was that now is not possible anymore to
enable the EFI runtime services by default when CONFIG_PREEMPT_RT is set,
without passing an efi=runtime command line parameter to the kernel.

Instead, let's add a new EFI_DISABLE_RUNTIME boolean Kconfig option, that
would be set to n by default but to y if CONFIG_PREEMPT_RT is enabled.

That way, the current behaviour is preserved but gives users a mechanism
to enable the EFI runtimes services in their kernels if that is required.
For example, if the firmware could guarantee bounded time for EFI calls.

Also, having a separate boolean config could allow users to disable the
EFI runtime services by default even when CONFIG_PREEMPT_RT is not set.

Reported-by: Alexander Larsson <alexl@redhat.com>
Fixes: d9f283ae71 ("efi: Disable runtime services on RT")
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://lore.kernel.org/r/20220331151654.184433-1-javierm@redhat.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-04-13 19:10:52 +02:00
Randy Dunlap 9feaf8b387 efi: fix return value of __setup handlers
When "dump_apple_properties" is used on the kernel boot command line,
it causes an Unknown parameter message and the string is added to init's
argument strings:

  Unknown kernel command line parameters "dump_apple_properties
    BOOT_IMAGE=/boot/bzImage-517rc6 efivar_ssdt=newcpu_ssdt", will be
    passed to user space.

 Run /sbin/init as init process
   with arguments:
     /sbin/init
     dump_apple_properties
   with environment:
     HOME=/
     TERM=linux
     BOOT_IMAGE=/boot/bzImage-517rc6
     efivar_ssdt=newcpu_ssdt

Similarly when "efivar_ssdt=somestring" is used, it is added to the
Unknown parameter message and to init's environment strings, polluting
them (see examples above).

Change the return value of the __setup functions to 1 to indicate
that the __setup options have been handled.

Fixes: 58c5475aba ("x86/efi: Retrieve and assign Apple device properties")
Fixes: 475fb4e8b2 ("efi / ACPI: load SSTDs from EFI variables")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: linux-efi@vger.kernel.org
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Octavian Purdila <octavian.purdila@intel.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Link: https://lore.kernel.org/r/20220301041851.12459-1-rdunlap@infradead.org
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-03-01 09:02:21 +01:00
Ard Biesheuvel f5390cd0b4 efi: runtime: avoid EFIv2 runtime services on Apple x86 machines
Aditya reports [0] that his recent MacbookPro crashes in the firmware
when using the variable services at runtime. The culprit appears to be a
call to QueryVariableInfo(), which we did not use to call on Apple x86
machines in the past as they only upgraded from EFI v1.10 to EFI v2.40
firmware fairly recently, and QueryVariableInfo() (along with
UpdateCapsule() et al) was added in EFI v2.00.

The only runtime service introduced in EFI v2.00 that we actually use in
Linux is QueryVariableInfo(), as the capsule based ones are optional,
generally not used at runtime (all the LVFS/fwupd firmware update
infrastructure uses helper EFI programs that invoke capsule update at
boot time, not runtime), and not implemented by Apple machines in the
first place. QueryVariableInfo() is used to 'safely' set variables,
i.e., only when there is enough space. This prevents machines with buggy
firmwares from corrupting their NVRAMs when they run out of space.

Given that Apple machines have been using EFI v1.10 services only for
the longest time (the EFI v2.0 spec was released in 2006, and Linux
support for the newly introduced runtime services was added in 2011, but
the MacbookPro12,1 released in 2015 still claims to be EFI v1.10 only),
let's avoid the EFI v2.0 ones on all Apple x86 machines.

[0] https://lore.kernel.org/all/6D757C75-65B1-468B-842D-10410081A8E4@live.com/

Cc: <stable@vger.kernel.org>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Reported-by: Aditya Garg <gargaditya08@live.com>
Tested-by: Orlando Chamberlain <redecorating@protonmail.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Aditya Garg <gargaditya08@live.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215277
2022-01-23 10:31:27 +01:00
Sebastian Andrzej Siewior 720dff78de efi: Allow efi=runtime
In case the command line option "efi=noruntime" is default at built-time, the user
could overwrite its state by `efi=runtime' and allow it again.

This is useful on PREEMPT_RT where "efi=noruntime" is default and the
user might need to alter the boot order for instance.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2021-09-28 22:44:15 +02:00
Sebastian Andrzej Siewior d9f283ae71 efi: Disable runtime services on RT
Based on measurements the EFI functions get_variable /
get_next_variable take up to 2us which looks okay.
The functions get_time, set_time take around 10ms. These 10ms are too
much. Even one ms would be too much.
Ard mentioned that SetVariable might even trigger larger latencies if
the firmware will erase flash blocks on NOR.

The time-functions are used by efi-rtc and can be triggered during
run-time (either via explicit read/write or ntp sync).

The variable write could be used by pstore.
These functions can be disabled without much of a loss. The poweroff /
reboot hooks may be provided by PSCI.

Disable EFI's runtime wrappers on PREEMPT_RT.

This was observed on "EFI v2.60 by SoftIron Overdrive 1000".

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2021-09-28 22:43:53 +02:00
Marc Zyngier 2bab693a60 firmware/efi: Tell memblock about EFI iomem reservations
kexec_load_file() relies on the memblock infrastructure to avoid
stamping over regions of memory that are essential to the survival
of the system.

However, nobody seems to agree how to flag these regions as reserved,
and (for example) EFI only publishes its reservations in /proc/iomem
for the benefit of the traditional, userspace based kexec tool.

On arm64 platforms with GICv3, this can result in the payload being
placed at the location of the LPI tables. Shock, horror!

Let's augment the EFI reservation code with a memblock_reserve() call,
protecting our dear tables from the secondary kernel invasion.

Reported-by: Moritz Fischer <mdf@kernel.org>
Tested-by: Moritz Fischer <mdf@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2021-07-16 18:05:49 +02:00