Commit Graph

272 Commits

Author SHA1 Message Date
Ard Biesheuvel 9684ec186f arm64: Enable LPA2 at boot if supported by the system
Update the early kernel mapping code to take 52-bit virtual addressing
into account based on the LPA2 feature. This is a bit more involved than
LVA (which is supported with 64k pages only), given that some page table
descriptor bits change meaning in this case.

To keep the handling in asm to a minimum, the initial ID map is still
created with 48-bit virtual addressing, which implies that the kernel
image must be loaded into 48-bit addressable physical memory. This is
currently required by the boot protocol, even though we happen to
support placement outside of that for LVA/64k based configurations.

Enabling LPA2 involves more than setting TCR.T1SZ to a lower value,
there is also a DS bit in TCR that needs to be set, and which changes
the meaning of bits [9:8] in all page table descriptors. Since we cannot
enable DS and every live page table descriptor at the same time, let's
pivot through another temporary mapping. This avoids the need to
reintroduce manipulations of the page tables with the MMU and caches
disabled.

To permit the LPA2 feature to be overridden on the kernel command line,
which may be necessary to work around silicon errata, or to deal with
mismatched features on heterogeneous SoC designs, test for CPU feature
overrides first, and only then enable LPA2.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-78-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:40 +00:00
Ard Biesheuvel 9cce9c6c2c arm64: mm: Handle LVA support as a CPU feature
Currently, we detect CPU support for 52-bit virtual addressing (LVA)
extremely early, before creating the kernel page tables or enabling the
MMU. We cannot override the feature this early, and so large virtual
addressing is always enabled on CPUs that implement support for it if
the software support for it was enabled at build time. It also means we
rely on non-trivial code in asm to deal with this feature.

Given that both the ID map and the TTBR1 mapping of the kernel image are
guaranteed to be 48-bit addressable, it is not actually necessary to
enable support this early, and instead, we can model it as a CPU
feature. That way, we can rely on code patching to get the correct
TCR.T1SZ values programmed on secondary boot and resume from suspend.

On the primary boot path, we simply enable the MMU with 48-bit virtual
addressing initially, and update TCR.T1SZ if LVA is supported from C
code, right before creating the kernel mapping. Given that TTBR1 still
points to reserved_pg_dir at this point, updating TCR.T1SZ should be
safe without the need for explicit TLB maintenance.

Since this gets rid of all accesses to the vabits_actual variable from
asm code that occurred before TCR.T1SZ had been programmed, we no longer
have a need for this variable, and we can replace it with a C expression
that produces the correct value directly, based on the value of TCR.T1SZ.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-70-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:36 +00:00
Ard Biesheuvel 84b04d3e6b arm64: kernel: Create initial ID map from C code
The asm code that creates the initial ID map is rather intricate and
hard to follow. This is problematic because it makes adding support for
things like LPA2 or WXN more difficult than necessary. Also, it is
parameterized like the rest of the MM code to run with a configurable
number of levels, which is rather pointless, given that all AArch64 CPUs
implement support for 48-bit virtual addressing, and that many systems
exist with DRAM located outside of the 39-bit addressable range, which
is the only smaller VA size that is widely used, and we need additional
tricks to make things work in that combination.

So let's bite the bullet, and rip out all the asm macros, and fiddly
code, and replace it with a C implementation based on the newly added
routines for creating the early kernel VA mappings. And while at it,
create the initial ID map based on 48-bit virtual addressing as well,
regardless of the number of configured levels for the kernel proper.

Note that this code may execute with the MMU and caches disabled, and is
therefore not permitted to make unaligned accesses. This shouldn't
generally happen in any case for the algorithm as implemented, but to be
sure, let's pass -mstrict-align to the compiler just in case.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-66-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:34 +00:00
Ard Biesheuvel e6128a8e52 arm64: mm: Use 48-bit virtual addressing for the permanent ID map
Even though we support loading kernels anywhere in 48-bit addressable
physical memory, we create the ID maps based on the number of levels
that we happened to configure for the kernel VA and user VA spaces.

The reason for this is that the PGD/PUD/PMD based classification of
translation levels, along with the associated folding when the number of
levels is less than 5, does not permit creating a page table hierarchy
of a set number of levels. This means that, for instance, on 39-bit VA
kernels we need to configure an additional level above PGD level on the
fly, and 36-bit VA kernels still only support 47-bit virtual addressing
with this trick applied.

Now that we have a separate helper to populate page table hierarchies
that does not define the levels in terms of PUDS/PMDS/etc at all, let's
reuse it to create the permanent ID map with a fixed VA size of 48 bits.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-64-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:34 +00:00
Ard Biesheuvel 97a6f43bb0 arm64: head: Move early kernel mapping routines into C code
The asm version of the kernel mapping code works fine for creating a
coarse grained identity map, but for mapping the kernel down to its
exact boundaries with the right attributes, it is not suitable. This is
why we create a preliminary RWX kernel mapping first, and then rebuild
it from scratch later on.

So let's reimplement this in C, in a way that will make it unnecessary
to create the kernel page tables yet another time in paging_init().

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-63-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:33 +00:00
Ard Biesheuvel aa6a52b247 arm64: head: move memstart_offset_seed handling to C code
Now that we can set BSS variables from the early code running from the
ID map, we can set memstart_offset_seed directly from the C code that
derives the value instead of passing it back and forth between C and asm
code.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-60-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:32 +00:00
Ard Biesheuvel 8a6e40e1f6 arm64: head: move dynamic shadow call stack patching into early C runtime
Once we update the early kernel mapping code to only map the kernel once
with the right permissions, we can no longer perform code patching via
this mapping.

So move this code to an earlier stage of the boot, right after applying
the relocations.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-54-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:30 +00:00
Ard Biesheuvel dcfe969a64 arm64: head: Run feature override detection before mapping the kernel
To permit the feature overrides to be taken into account before the
KASLR init code runs and the kernel mapping is created, move the
detection code to an earlier stage in the boot.

In a subsequent patch, this will be taken advantage of by merging the
preliminary and permanent mappings of the kernel text and data into a
single one that gets created and relocated before start_kernel() is
called.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-53-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:30 +00:00
Ard Biesheuvel aa99aad798 arm64: head: Clear BSS and the kernel page tables in one go
We will move the CPU feature overrides into BSS in a subsequent patch,
and this requires that BSS is zeroed before the feature override
detection code runs. So let's map BSS read-write in the ID map, and zero
it via this mapping.

Since the kernel page tables are right next to it, and also zeroed via
the ID map, let's drop the separate clear_page_tables() function, and
just zero everything in one go.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-51-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:29 +00:00
Ard Biesheuvel e223a44912 arm64: idreg-override: Move to early mini C runtime
We will want to parse the ID register overrides even earlier, so that we
can take them into account before creating the kernel mapping. So
migrate the code and make it work in the context of the early C runtime.
We will move the invocation to an earlier stage in a subsequent patch.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-49-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:28 +00:00
Ard Biesheuvel 734958ef0b arm64: head: move relocation handling to C code
Now that we have a mini C runtime before the kernel mapping is up, we
can move the non-trivial relocation processing code out of head.S and
reimplement it in C.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-48-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:28 +00:00
Ard Biesheuvel 376f5a3bd7 arm64: mm: get rid of kimage_vaddr global variable
We store the address of _text in kimage_vaddr, but since commit
09e3c22a86 ("arm64: Use a variable to store non-global mappings
decision"), we no longer reference this variable from modules so we no
longer need to export it.

In fact, we don't need it at all so let's just get rid of it.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20231129111555.3594833-46-ardb@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-12-12 11:06:28 +00:00
Anshuman Khandual 62ce7af97b arm64/mm: Directly use ID_AA64MMFR2_EL1_VARange_MASK
Tools generated register fields have in place mask macros which can be used
directly instead of shifting the older right end sided masks.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20230711092055.245756-1-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-07-27 11:11:44 +01:00
Linus Torvalds e8069f5a8e ARM64:
* Eager page splitting optimization for dirty logging, optionally
   allowing for a VM to avoid the cost of hugepage splitting in the stage-2
   fault path.
 
 * Arm FF-A proxy for pKVM, allowing a pKVM host to safely interact with
   services that live in the Secure world. pKVM intervenes on FF-A calls
   to guarantee the host doesn't misuse memory donated to the hyp or a
   pKVM guest.
 
 * Support for running the split hypervisor with VHE enabled, known as
   'hVHE' mode. This is extremely useful for testing the split
   hypervisor on VHE-only systems, and paves the way for new use cases
   that depend on having two TTBRs available at EL2.
 
 * Generalized framework for configurable ID registers from userspace.
   KVM/arm64 currently prevents arbitrary CPU feature set configuration
   from userspace, but the intent is to relax this limitation and allow
   userspace to select a feature set consistent with the CPU.
 
 * Enable the use of Branch Target Identification (FEAT_BTI) in the
   hypervisor.
 
 * Use a separate set of pointer authentication keys for the hypervisor
   when running in protected mode, as the host is untrusted at runtime.
 
 * Ensure timer IRQs are consistently released in the init failure
   paths.
 
 * Avoid trapping CTR_EL0 on systems with Enhanced Virtualization Traps
   (FEAT_EVT), as it is a register commonly read from userspace.
 
 * Erratum workaround for the upcoming AmpereOne part, which has broken
   hardware A/D state management.
 
 RISC-V:
 
 * Redirect AMO load/store misaligned traps to KVM guest
 
 * Trap-n-emulate AIA in-kernel irqchip for KVM guest
 
 * Svnapot support for KVM Guest
 
 s390:
 
 * New uvdevice secret API
 
 * CMM selftest and fixes
 
 * fix racy access to target CPU for diag 9c
 
 x86:
 
 * Fix missing/incorrect #GP checks on ENCLS
 
 * Use standard mmu_notifier hooks for handling APIC access page
 
 * Drop now unnecessary TR/TSS load after VM-Exit on AMD
 
 * Print more descriptive information about the status of SEV and SEV-ES during
   module load
 
 * Add a test for splitting and reconstituting hugepages during and after
   dirty logging
 
 * Add support for CPU pinning in demand paging test
 
 * Add support for AMD PerfMonV2, with a variety of cleanups and minor fixes
   included along the way
 
 * Add a "nx_huge_pages=never" option to effectively avoid creating NX hugepage
   recovery threads (because nx_huge_pages=off can be toggled at runtime)
 
 * Move handling of PAT out of MTRR code and dedup SVM+VMX code
 
 * Fix output of PIC poll command emulation when there's an interrupt
 
 * Add a maintainer's handbook to document KVM x86 processes, preferred coding
   style, testing expectations, etc.
 
 * Misc cleanups, fixes and comments
 
 Generic:
 
 * Miscellaneous bugfixes and cleanups
 
 Selftests:
 
 * Generate dependency files so that partial rebuilds work as expected
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmSgHrIUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroORcAf+KkBlXwQMf+Q0Hy6Mfe0OtkKmh0Ae
 6HJ6dsuMfOHhWv5kgukh+qvuGUGzHq+gpVKmZg2yP3h3cLHOLUAYMCDm+rjXyjsk
 F4DbnJLfxq43Pe9PHRKFxxSecRcRYCNox0GD5UYL4PLKcH0FyfQrV+HVBK+GI8L3
 FDzUcyJkR12Lcj1qf++7fsbzfOshL0AJPmidQCoc6wkLJpUEr/nYUqlI1Kx3YNuQ
 LKmxFHS4l4/O/px3GKNDrLWDbrVlwciGIa3GZLS52PZdW3mAqT+cqcPcYK6SW71P
 m1vE80VbNELX5q3YSRoOXtedoZ3Pk97LEmz/xQAsJ/jri0Z5Syk0Ok0m/Q==
 =AMXp
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM64:

   - Eager page splitting optimization for dirty logging, optionally
     allowing for a VM to avoid the cost of hugepage splitting in the
     stage-2 fault path.

   - Arm FF-A proxy for pKVM, allowing a pKVM host to safely interact
     with services that live in the Secure world. pKVM intervenes on
     FF-A calls to guarantee the host doesn't misuse memory donated to
     the hyp or a pKVM guest.

   - Support for running the split hypervisor with VHE enabled, known as
     'hVHE' mode. This is extremely useful for testing the split
     hypervisor on VHE-only systems, and paves the way for new use cases
     that depend on having two TTBRs available at EL2.

   - Generalized framework for configurable ID registers from userspace.
     KVM/arm64 currently prevents arbitrary CPU feature set
     configuration from userspace, but the intent is to relax this
     limitation and allow userspace to select a feature set consistent
     with the CPU.

   - Enable the use of Branch Target Identification (FEAT_BTI) in the
     hypervisor.

   - Use a separate set of pointer authentication keys for the
     hypervisor when running in protected mode, as the host is untrusted
     at runtime.

   - Ensure timer IRQs are consistently released in the init failure
     paths.

   - Avoid trapping CTR_EL0 on systems with Enhanced Virtualization
     Traps (FEAT_EVT), as it is a register commonly read from userspace.

   - Erratum workaround for the upcoming AmpereOne part, which has
     broken hardware A/D state management.

  RISC-V:

   - Redirect AMO load/store misaligned traps to KVM guest

   - Trap-n-emulate AIA in-kernel irqchip for KVM guest

   - Svnapot support for KVM Guest

  s390:

   - New uvdevice secret API

   - CMM selftest and fixes

   - fix racy access to target CPU for diag 9c

  x86:

   - Fix missing/incorrect #GP checks on ENCLS

   - Use standard mmu_notifier hooks for handling APIC access page

   - Drop now unnecessary TR/TSS load after VM-Exit on AMD

   - Print more descriptive information about the status of SEV and
     SEV-ES during module load

   - Add a test for splitting and reconstituting hugepages during and
     after dirty logging

   - Add support for CPU pinning in demand paging test

   - Add support for AMD PerfMonV2, with a variety of cleanups and minor
     fixes included along the way

   - Add a "nx_huge_pages=never" option to effectively avoid creating NX
     hugepage recovery threads (because nx_huge_pages=off can be toggled
     at runtime)

   - Move handling of PAT out of MTRR code and dedup SVM+VMX code

   - Fix output of PIC poll command emulation when there's an interrupt

   - Add a maintainer's handbook to document KVM x86 processes,
     preferred coding style, testing expectations, etc.

   - Misc cleanups, fixes and comments

  Generic:

   - Miscellaneous bugfixes and cleanups

  Selftests:

   - Generate dependency files so that partial rebuilds work as
     expected"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (153 commits)
  Documentation/process: Add a maintainer handbook for KVM x86
  Documentation/process: Add a label for the tip tree handbook's coding style
  KVM: arm64: Fix misuse of KVM_ARM_VCPU_POWER_OFF bit index
  RISC-V: KVM: Remove unneeded semicolon
  RISC-V: KVM: Allow Svnapot extension for Guest/VM
  riscv: kvm: define vcpu_sbi_ext_pmu in header
  RISC-V: KVM: Expose IMSIC registers as attributes of AIA irqchip
  RISC-V: KVM: Add in-kernel virtualization of AIA IMSIC
  RISC-V: KVM: Expose APLIC registers as attributes of AIA irqchip
  RISC-V: KVM: Add in-kernel emulation of AIA APLIC
  RISC-V: KVM: Implement device interface for AIA irqchip
  RISC-V: KVM: Skeletal in-kernel AIA irqchip support
  RISC-V: KVM: Set kvm_riscv_aia_nr_hgei to zero
  RISC-V: KVM: Add APLIC related defines
  RISC-V: KVM: Add IMSIC related defines
  RISC-V: KVM: Implement guest external interrupt line management
  KVM: x86: Remove PRIx* definitions as they are solely for user space
  s390/uv: Update query for secret-UVCs
  s390/uv: replace scnprintf with sysfs_emit
  s390/uvdevice: Add 'Lock Secret Store' UVC
  ...
2023-07-03 15:32:22 -07:00
Marc Zyngier 1700f89cb9 KVM: arm64: Fix hVHE init on CPUs where HCR_EL2.E2H is not RES1
On CPUs where E2H is RES1, we very quickly set the scene for
running EL2 with a VHE configuration, as we do not have any other
choice.

However, CPUs that conform to the current writing of the architecture
start with E2H=0, and only later upgrade with E2H=1. This is all
good, but nothing there is actually reconfiguring EL2 to be able
to correctly run the kernel at EL1. Huhuh...

The "obvious" solution is not to just reinitialise the timer
controls like we do, but to really intitialise *everything*
unconditionally.

This requires a bit of surgery, and is a good opportunity to
remove the macro that messes with SPSR_EL2 in init_el2_state.

With that, hVHE now works correctly on my trusted A55 machine!

Reported-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230614155129.2697388-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-15 09:27:51 +00:00
Joey Gouly f0af339fc4 arm64: add PTE_UXN/PTE_WRITE to SWAPPER_*_FLAGS
With PIE enabled, the swapper PTEs would have a Permission Indirection Index
(PIIndex) of 0. A PIIndex of 0 is not currently used by any other PTEs.

To avoid using index 0 specifically for the swapper PTEs, mark them as
PTE_UXN and PTE_WRITE, so that they map to a PAGE_KERNEL_EXEC equivalent.

This also adds PTE_WRITE to KPTI_NG_PTE_FLAGS, which was tested by booting
with kpti=on.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230606145859.697944-12-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-06-06 16:52:41 +01:00
Neeraj Upadhyay 4e8f6e44bc arm64: Fix label placement in record_mmu_state()
Fix label so that pre_disable_mmu_workaround() is called
before clearing sctlr_el1.M.

Fixes: 2ced0f30a4 ("arm64: head: Switch endianness before populating the ID map")
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230425095700.22005-1-quic_neeraju@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-04-26 09:01:04 +01:00
Mark Rutland d54170812e arm64: fix .idmap.text assertion for large kernels
When building a kernel with many debug options enabled (which happens in
test configurations use by myself and syzbot), the kernel can become
large enough that portions of .text can be more than 128M away from
.idmap.text (which is placed inside the .rodata section). Where idmap
code branches into .text, the linker will place veneers in the
.idmap.text section to make those branches possible.

Unfortunately, as Ard reports, GNU LD has bseen observed to add 4K of
padding when adding such veneers, e.g.

| .idmap.text    0xffffffc01e48e5c0      0x32c arch/arm64/mm/proc.o
|                0xffffffc01e48e5c0                idmap_cpu_replace_ttbr1
|                0xffffffc01e48e600                idmap_kpti_install_ng_mappings
|                0xffffffc01e48e800                __cpu_setup
| *fill*         0xffffffc01e48e8ec        0x4
| .idmap.text.stub
|                0xffffffc01e48e8f0       0x18 linker stubs
|                0xffffffc01e48f8f0                __idmap_text_end = .
|                0xffffffc01e48f000                . = ALIGN (0x1000)
| *fill*         0xffffffc01e48f8f0      0x710
|                0xffffffc01e490000                idmap_pg_dir = .

This makes the __idmap_text_start .. __idmap_text_end region bigger than
the 4K we require it to fit within, and triggers an assertion in arm64's
vmlinux.lds.S, which breaks the build:

| LD      .tmp_vmlinux.kallsyms1
| aarch64-linux-gnu-ld: ID map text too big or misaligned
| make[1]: *** [scripts/Makefile.vmlinux:35: vmlinux] Error 1
| make: *** [Makefile:1264: vmlinux] Error 2

Avoid this by using an `ADRP+ADD+BLR` sequence for branches out of
.idmap.text, which avoids the need for veneers. These branches are only
executed once per boot, and only when the MMU is on, so there should be
no noticeable performance penalty in replacing `BL` with `ADRP+ADD+BLR`.

At the same time, remove the "x" and "w" attributes when placing code in
.idmap.text, as these are not necessary, and this will prevent the
linker from assuming that it is safe to place PLTs into .idmap.text,
causing it to warn if and when there are out-of-range branches within
.idmap.text, e.g.

|   LD      .tmp_vmlinux.kallsyms1
| arch/arm64/kernel/head.o: in function `primary_entry':
| (.idmap.text+0x1c): relocation truncated to fit: R_AARCH64_CALL26 against symbol `dcache_clean_poc' defined in .text section in arch/arm64/mm/cache.o
| arch/arm64/kernel/head.o: in function `init_el2':
| (.idmap.text+0x88): relocation truncated to fit: R_AARCH64_CALL26 against symbol `dcache_clean_poc' defined in .text section in arch/arm64/mm/cache.o
| make[1]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1
| make: *** [Makefile:1252: vmlinux] Error 2

Thus, if future changes add out-of-range branches in .idmap.text, it
should be easy enough to identify those from the resulting linker
errors.

Reported-by: syzbot+f8ac312e31226e23302b@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-arm-kernel/00000000000028ea4105f4e2ef54@google.com/
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Will Deacon <will@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230220162317.1581208-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-02-20 18:23:35 +00:00
Catalin Marinas 156010ed9c Merge branches 'for-next/sysreg', 'for-next/sme', 'for-next/kselftest', 'for-next/misc', 'for-next/sme2', 'for-next/tpidr2', 'for-next/scs', 'for-next/compat-hwcap', 'for-next/ftrace', 'for-next/efi-boot-mmu-on', 'for-next/ptrauth' and 'for-next/pseudo-nmi', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf:
  perf: arm_spe: Print the version of SPE detected
  perf: arm_spe: Add support for SPEv1.2 inverted event filtering
  perf: Add perf_event_attr::config3
  drivers/perf: fsl_imx8_ddr_perf: Remove set-but-not-used variable
  perf: arm_spe: Support new SPEv1.2/v8.7 'not taken' event
  perf: arm_spe: Use new PMSIDR_EL1 register enums
  perf: arm_spe: Drop BIT() and use FIELD_GET/PREP accessors
  arm64/sysreg: Convert SPE registers to automatic generation
  arm64: Drop SYS_ from SPE register defines
  perf: arm_spe: Use feature numbering for PMSEVFR_EL1 defines
  perf/marvell: Add ACPI support to TAD uncore driver
  perf/marvell: Add ACPI support to DDR uncore driver
  perf/arm-cmn: Reset DTM_PMU_CONFIG at probe
  drivers/perf: hisi: Extract initialization of "cpa_pmu->pmu"
  drivers/perf: hisi: Simplify the parameters of hisi_pmu_init()
  drivers/perf: hisi: Advertise the PERF_PMU_CAP_NO_EXCLUDE capability

* for-next/sysreg:
  : arm64 sysreg and cpufeature fixes/updates
  KVM: arm64: Use symbolic definition for ISR_EL1.A
  arm64/sysreg: Add definition of ISR_EL1
  arm64/sysreg: Add definition for ICC_NMIAR1_EL1
  arm64/cpufeature: Remove 4 bit assumption in ARM64_FEATURE_MASK()
  arm64/sysreg: Fix errors in 32 bit enumeration values
  arm64/cpufeature: Fix field sign for DIT hwcap detection

* for-next/sme:
  : SME-related updates
  arm64/sme: Optimise SME exit on syscall entry
  arm64/sme: Don't use streaming mode to probe the maximum SME VL
  arm64/ptrace: Use system_supports_tpidr2() to check for TPIDR2 support

* for-next/kselftest: (23 commits)
  : arm64 kselftest fixes and improvements
  kselftest/arm64: Don't require FA64 for streaming SVE+ZA tests
  kselftest/arm64: Copy whole EXTRA context
  kselftest/arm64: Fix enumeration of systems without 128 bit SME for SSVE+ZA
  kselftest/arm64: Fix enumeration of systems without 128 bit SME
  kselftest/arm64: Don't require FA64 for streaming SVE tests
  kselftest/arm64: Limit the maximum VL we try to set via ptrace
  kselftest/arm64: Correct buffer size for SME ZA storage
  kselftest/arm64: Remove the local NUM_VL definition
  kselftest/arm64: Verify simultaneous SSVE and ZA context generation
  kselftest/arm64: Verify that SSVE signal context has SVE_SIG_FLAG_SM set
  kselftest/arm64: Remove spurious comment from MTE test Makefile
  kselftest/arm64: Support build of MTE tests with clang
  kselftest/arm64: Initialise current at build time in signal tests
  kselftest/arm64: Don't pass headers to the compiler as source
  kselftest/arm64: Remove redundant _start labels from FP tests
  kselftest/arm64: Fix .pushsection for strings in FP tests
  kselftest/arm64: Run BTI selftests on systems without BTI
  kselftest/arm64: Fix test numbering when skipping tests
  kselftest/arm64: Skip non-power of 2 SVE vector lengths in fp-stress
  kselftest/arm64: Only enumerate power of two VLs in syscall-abi
  ...

* for-next/misc:
  : Miscellaneous arm64 updates
  arm64/mm: Intercept pfn changes in set_pte_at()
  Documentation: arm64: correct spelling
  arm64: traps: attempt to dump all instructions
  arm64: Apply dynamic shadow call stack patching in two passes
  arm64: el2_setup.h: fix spelling typo in comments
  arm64: Kconfig: fix spelling
  arm64: cpufeature: Use kstrtobool() instead of strtobool()
  arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
  arm64: make ARCH_FORCE_MAX_ORDER selectable

* for-next/sme2: (23 commits)
  : Support for arm64 SME 2 and 2.1
  arm64/sme: Fix __finalise_el2 SMEver check
  kselftest/arm64: Remove redundant _start labels from zt-test
  kselftest/arm64: Add coverage of SME 2 and 2.1 hwcaps
  kselftest/arm64: Add coverage of the ZT ptrace regset
  kselftest/arm64: Add SME2 coverage to syscall-abi
  kselftest/arm64: Add test coverage for ZT register signal frames
  kselftest/arm64: Teach the generic signal context validation about ZT
  kselftest/arm64: Enumerate SME2 in the signal test utility code
  kselftest/arm64: Cover ZT in the FP stress test
  kselftest/arm64: Add a stress test program for ZT0
  arm64/sme: Add hwcaps for SME 2 and 2.1 features
  arm64/sme: Implement ZT0 ptrace support
  arm64/sme: Implement signal handling for ZT
  arm64/sme: Implement context switching for ZT0
  arm64/sme: Provide storage for ZT0
  arm64/sme: Add basic enumeration for SME2
  arm64/sme: Enable host kernel to access ZT0
  arm64/sme: Manually encode ZT0 load and store instructions
  arm64/esr: Document ISS for ZT0 being disabled
  arm64/sme: Document SME 2 and SME 2.1 ABI
  ...

* for-next/tpidr2:
  : Include TPIDR2 in the signal context
  kselftest/arm64: Add test case for TPIDR2 signal frame records
  kselftest/arm64: Add TPIDR2 to the set of known signal context records
  arm64/signal: Include TPIDR2 in the signal context
  arm64/sme: Document ABI for TPIDR2 signal information

* for-next/scs:
  : arm64: harden shadow call stack pointer handling
  arm64: Stash shadow stack pointer in the task struct on interrupt
  arm64: Always load shadow stack pointer directly from the task struct

* for-next/compat-hwcap:
  : arm64: Expose compat ARMv8 AArch32 features (HWCAPs)
  arm64: Add compat hwcap SSBS
  arm64: Add compat hwcap SB
  arm64: Add compat hwcap I8MM
  arm64: Add compat hwcap ASIMDBF16
  arm64: Add compat hwcap ASIMDFHM
  arm64: Add compat hwcap ASIMDDP
  arm64: Add compat hwcap FPHP and ASIMDHP

* for-next/ftrace:
  : Add arm64 support for DYNAMICE_FTRACE_WITH_CALL_OPS
  arm64: avoid executing padding bytes during kexec / hibernation
  arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
  arm64: ftrace: Update stale comment
  arm64: patching: Add aarch64_insn_write_literal_u64()
  arm64: insn: Add helpers for BTI
  arm64: Extend support for CONFIG_FUNCTION_ALIGNMENT
  ACPI: Don't build ACPICA with '-Os'
  Compiler attributes: GCC cold function alignment workarounds
  ftrace: Add DYNAMIC_FTRACE_WITH_CALL_OPS

* for-next/efi-boot-mmu-on:
  : Permit arm64 EFI boot with MMU and caches on
  arm64: kprobes: Drop ID map text from kprobes blacklist
  arm64: head: Switch endianness before populating the ID map
  efi: arm64: enter with MMU and caches enabled
  arm64: head: Clean the ID map and the HYP text to the PoC if needed
  arm64: head: avoid cache invalidation when entering with the MMU on
  arm64: head: record the MMU state at primary entry
  arm64: kernel: move identity map out of .text mapping
  arm64: head: Move all finalise_el2 calls to after __enable_mmu

* for-next/ptrauth:
  : arm64 pointer authentication cleanup
  arm64: pauth: don't sign leaf functions
  arm64: unify asm-arch manipulation

* for-next/pseudo-nmi:
  : Pseudo-NMI code generation optimisations
  arm64: irqflags: use alternative branches for pseudo-NMI logic
  arm64: add ARM64_HAS_GIC_PRIO_RELAXED_SYNC cpucap
  arm64: make ARM64_HAS_GIC_PRIO_MASKING depend on ARM64_HAS_GIC_CPUIF_SYSREGS
  arm64: rename ARM64_HAS_IRQ_PRIO_MASKING to ARM64_HAS_GIC_PRIO_MASKING
  arm64: rename ARM64_HAS_SYSREG_GIC_CPUIF to ARM64_HAS_GIC_CPUIF_SYSREGS
2023-02-10 18:51:49 +00:00
Ard Biesheuvel 2ced0f30a4 arm64: head: Switch endianness before populating the ID map
Ensure that the endianness used for populating the ID map matches the
endianness that the running kernel will be using, as this is no longer
guaranteed now that create_idmap() is invoked before init_kernel_el().

Note that doing so is only safe if the MMU is off, as switching the
endianness with the MMU on results in the active ID map to become
invalid. So also clear the M bit when toggling the EE bit in SCTLR, and
mark the MMU as disabled at boot.

Note that the same issue has resulted in preserve_boot_args() recording
the contents of registers X0 ... X3 in the wrong byte order, although
this is arguably a very minor concern.

Fixes: 32b135a7fa ("arm64: head: avoid cache invalidation when entering with the MMU on")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20230125185910.962733-1-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-26 17:42:58 +00:00
Ard Biesheuvel 3dcf60bbfd arm64: head: Clean the ID map and the HYP text to the PoC if needed
If we enter with the MMU and caches enabled, the bootloader may not have
performed any cache maintenance to the PoC. So clean the ID mapped page
to the PoC, to ensure that instruction and data accesses with the MMU
off see the correct data. For similar reasons, clean all the HYP text to
the PoC as well when entering at EL2 with the MMU and caches enabled.

Note that this means primary_entry() itself needs to be moved into the
ID map as well, as we will return from init_kernel_el() with the MMU and
caches off.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230111102236.1430401-6-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-24 11:51:08 +00:00
Ard Biesheuvel 32b135a7fa arm64: head: avoid cache invalidation when entering with the MMU on
If we enter with the MMU on, there is no need for explicit cache
invalidation for stores to memory, as they will be coherent with the
caches.

Let's take advantage of this, and create the ID map with the MMU still
enabled if that is how we entered, and avoid any cache invalidation
calls in that case.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230111102236.1430401-5-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-24 11:51:07 +00:00
Ard Biesheuvel 9d7c13e5dd arm64: head: record the MMU state at primary entry
Prepare for being able to deal with primary entry with the MMU and
caches enabled, by recording whether or not we entered with the MMU on
in register x19 and in a global variable. (Note that setting this
variable to '1' does not require cache invalidation, nor is it required
for storing the bootargs in that case, so omit the cache maintenance).

Since boot with the MMU and caches enabled is not permitted by the bare
metal boot protocol, ensure that a diagnostic is emitted and a taint bit
set if the MMU was found to be enabled on a non-EFI boot, and panic()
once the console is likely to be up. We will make an exception for EFI
boot later, which has strict requirements for the mapping of system
memory, permitting us to relax the boot protocol and hand over from the
EFI stub to the core kernel with MMU and caches left enabled.

While at it, add 'pre_disable_mmu_workaround' macro invocations to
init_kernel_el, as its manipulation of SCTLR_ELx may amount to disabling
of the MMU after subsequent patches.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230111102236.1430401-4-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-24 11:51:07 +00:00
Ard Biesheuvel af7249b317 arm64: kernel: move identity map out of .text mapping
Reorganize the ID map slightly so that only code that is executed with
the MMU off or via the 1:1 mapping remains. This allows us to move the
identity map out of the .text segment, as it will no longer need
executable permissions via the kernel mapping.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230111102236.1430401-3-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-24 11:51:07 +00:00
Ard Biesheuvel 82e4958800 arm64: head: Move all finalise_el2 calls to after __enable_mmu
In the primary boot path, finalise_el2() is called much later than on
the secondary boot or resume-from-suspend paths, and this does not
appear to be intentional.

Since we aim to do as little as possible before enabling the MMU and
caches, align secondary and resume with primary boot, and defer the call
to after the MMU is turned on. This also removes the need to clean
finalise_el2() to the PoC once we enable support for booting with the
MMU on.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230111102236.1430401-2-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-24 11:51:07 +00:00
Ard Biesheuvel 2198d07c50 arm64: Always load shadow stack pointer directly from the task struct
All occurrences of the scs_load macro load the value of the shadow call
stack pointer from the task which is current at that point. So instead
of taking a task struct register argument in the scs_load macro to
specify the task struct to load from, let's always reference the current
task directly. This should make it much harder to exploit any
instruction sequences reloading the shadow call stack pointer register
from memory.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230109174800.3286265-2-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-20 14:26:18 +00:00
Ard Biesheuvel 3b619e22c4 arm64: implement dynamic shadow call stack for Clang
Implement dynamic shadow call stack support on Clang, by parsing the
unwind tables at init time to locate all occurrences of PACIASP/AUTIASP
instructions, and replacing them with the shadow call stack push and pop
instructions, respectively.

This is useful because the overhead of the shadow call stack is
difficult to justify on hardware that implements pointer authentication
(PAC), and given that the PAC instructions are executed as NOPs on
hardware that doesn't, we can just replace them without breaking
anything. As PACIASP/AUTIASP are guaranteed to be paired with respect to
manipulations of the return address, replacing them 1:1 with shadow call
stack pushes and pops is guaranteed to result in the desired behavior.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20221027155908.1940624-4-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-09 18:06:35 +00:00
Linus Torvalds 18fd049731 arm64 updates for 6.1:
- arm64 perf: DDR PMU driver for Alibaba's T-Head Yitian 710 SoC, SVE
   vector granule register added to the user regs together with SVE perf
   extensions documentation.
 
 - SVE updates: add HWCAP for SVE EBF16, update the SVE ABI documentation
   to match the actual kernel behaviour (zeroing the registers on syscall
   rather than "zeroed or preserved" previously).
 
 - More conversions to automatic system registers generation.
 
 - vDSO: use self-synchronising virtual counter access in gettimeofday()
   if the architecture supports it.
 
 - arm64 stacktrace cleanups and improvements.
 
 - arm64 atomics improvements: always inline assembly, remove LL/SC
   trampolines.
 
 - Improve the reporting of EL1 exceptions: rework BTI and FPAC exception
   handling, better EL1 undefs reporting.
 
 - Cortex-A510 erratum 2658417: remove BF16 support due to incorrect
   result.
 
 - arm64 defconfig updates: build CoreSight as a module, enable options
   necessary for docker, memory hotplug/hotremove, enable all PMUs
   provided by Arm.
 
 - arm64 ptrace() support for TPIDR2_EL0 (register provided with the SME
   extensions).
 
 - arm64 ftraces updates/fixes: fix module PLTs with mcount, remove
   unused function.
 
 - kselftest updates for arm64: simple HWCAP validation, FP stress test
   improvements, validation of ZA regs in signal handlers, include larger
   SVE and SME vector lengths in signal tests, various cleanups.
 
 - arm64 alternatives (code patching) improvements to robustness and
   consistency: replace cpucap static branches with equivalent
   alternatives, associate callback alternatives with a cpucap.
 
 - Miscellaneous updates: optimise kprobe performance of patching
   single-step slots, simplify uaccess_mask_ptr(), move MTE registers
   initialisation to C, support huge vmalloc() mappings, run softirqs on
   the per-CPU IRQ stack, compat (arm32) misalignment fixups for
   multiword accesses.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmM9W4cACgkQa9axLQDI
 XvEy3w/+LJ3KCFowWiz5gTAWikjv+UVssHjLMJixn47V7hsEFQ26Xnam/438rTMI
 kE95u6DHUpw2SMIxKzFRO7oI5cQtP+cWGwTtOUnjVO+U1oN+HqDOIbO9DbylWDcU
 eeeqMMmawMfTPuZrYklpOhXscsorbrKIvYBg7wHYOcwBYV3EPhWr89lwMvTVRuyJ
 qpX628KlkGMaBcONNhv3nS3qZcAOs0oHQCAVS4C8czLDL+vtJlumXUS3xr1Mqm72
 xtFe7sje8Djr2kZ8mzh0GbFiZEBoBD3F/l7ayq8gVRaVpToUt8sk36Stjs4LojF1
 6imuAfji/5TItkScq5KhGqj6MIugwp/eUVbRN74OLNTYx7msF1ZADNFQ+Q0UuY0H
 SYK13KvmOji0xjS8qAfhqrwNB79sk3fb+zF9LjETbdz4ZJCgg9gcFbSUTY0DvMfS
 MXZk/jVeB07olA8xYbjh0BRt4UV9xU628FPQzK5k7e4Nzl4jSvgtJZCZanfuVtjy
 /ZS1vbN8o7tQLBAlVnw+Exi/VedkKxkkMgm8tPKsMgERTFDx0Pc4Gs72hRpDnPWT
 MRbeCCGleAf3JQ5vF0coBDNOCEVvweQgShHOyHTz0GyhWXLCFx3RJICo5I4EIpps
 LLUk4JK0fO3LVrf1AEpu5ZP4+Sact0zfsH3gB7qyLPYFDmjDXD8=
 =jl3Z
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:

 - arm64 perf: DDR PMU driver for Alibaba's T-Head Yitian 710 SoC, SVE
   vector granule register added to the user regs together with SVE perf
   extensions documentation.

 - SVE updates: add HWCAP for SVE EBF16, update the SVE ABI
   documentation to match the actual kernel behaviour (zeroing the
   registers on syscall rather than "zeroed or preserved" previously).

 - More conversions to automatic system registers generation.

 - vDSO: use self-synchronising virtual counter access in gettimeofday()
   if the architecture supports it.

 - arm64 stacktrace cleanups and improvements.

 - arm64 atomics improvements: always inline assembly, remove LL/SC
   trampolines.

 - Improve the reporting of EL1 exceptions: rework BTI and FPAC
   exception handling, better EL1 undefs reporting.

 - Cortex-A510 erratum 2658417: remove BF16 support due to incorrect
   result.

 - arm64 defconfig updates: build CoreSight as a module, enable options
   necessary for docker, memory hotplug/hotremove, enable all PMUs
   provided by Arm.

 - arm64 ptrace() support for TPIDR2_EL0 (register provided with the SME
   extensions).

 - arm64 ftraces updates/fixes: fix module PLTs with mcount, remove
   unused function.

 - kselftest updates for arm64: simple HWCAP validation, FP stress test
   improvements, validation of ZA regs in signal handlers, include
   larger SVE and SME vector lengths in signal tests, various cleanups.

 - arm64 alternatives (code patching) improvements to robustness and
   consistency: replace cpucap static branches with equivalent
   alternatives, associate callback alternatives with a cpucap.

 - Miscellaneous updates: optimise kprobe performance of patching
   single-step slots, simplify uaccess_mask_ptr(), move MTE registers
   initialisation to C, support huge vmalloc() mappings, run softirqs on
   the per-CPU IRQ stack, compat (arm32) misalignment fixups for
   multiword accesses.

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (126 commits)
  arm64: alternatives: Use vdso/bits.h instead of linux/bits.h
  arm64/kprobe: Optimize the performance of patching single-step slot
  arm64: defconfig: Add Coresight as module
  kselftest/arm64: Handle EINTR while reading data from children
  kselftest/arm64: Flag fp-stress as exiting when we begin finishing up
  kselftest/arm64: Don't repeat termination handler for fp-stress
  ARM64: reloc_test: add __init/__exit annotations to module init/exit funcs
  arm64/mm: fold check for KFENCE into can_set_direct_map()
  arm64: ftrace: fix module PLTs with mcount
  arm64: module: Remove unused plt_entry_is_initialized()
  arm64: module: Make plt_equals_entry() static
  arm64: fix the build with binutils 2.27
  kselftest/arm64: Don't enable v8.5 for MTE selftest builds
  arm64: uaccess: simplify uaccess_mask_ptr()
  arm64: asm/perf_regs.h: Avoid C++-style comment in UAPI header
  kselftest/arm64: Fix typo in hwcap check
  arm64: mte: move register initialization to C
  arm64: mm: handle ARM64_KERNEL_USES_PMD_MAPS in vmemmap_populate()
  arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()
  arm64/sve: Add Perf extensions documentation
  ...
2022-10-06 11:51:49 -07:00
Mark Brown 8f40baded4 arm64/sysreg: Standardise naming for ID_AA64MMFR2_EL1.VARange
The kernel refers to ID_AA64MMFR2_EL1.VARange as LVA. In preparation for
automatic generation of defines for the system registers bring the naming
used by the kernel in sync with that of DDI0487H.a. No functional change.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Kristina Martsenko <kristina.martsenko@arm.com>
Link: https://lore.kernel.org/r/20220905225425.1871461-12-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-09 10:59:03 +01:00
Mark Brown a957c6be2b arm64/sysreg: Add _EL1 into ID_AA64MMFR2_EL1 definition names
Normally we include the full register name in the defines for fields within
registers but this has not been followed for ID registers. In preparation
for automatic generation of defines add the _EL1s into the defines for
ID_AA64MMFR2_EL1 to follow the convention. No functional changes.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Kristina Martsenko <kristina.martsenko@arm.com>
Link: https://lore.kernel.org/r/20220905225425.1871461-6-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-09 10:59:02 +01:00
Mark Brown 2d987e64e8 arm64/sysreg: Add _EL1 into ID_AA64MMFR0_EL1 definition names
Normally we include the full register name in the defines for fields within
registers but this has not been followed for ID registers. In preparation
for automatic generation of defines add the _EL1s into the defines for
ID_AA64MMFR0_EL1 to follow the convention. No functional changes.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Kristina Martsenko <kristina.martsenko@arm.com>
Link: https://lore.kernel.org/r/20220905225425.1871461-5-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-09 10:59:02 +01:00
Ard Biesheuvel e62b9e6f25 arm64: head: Ignore bogus KASLR displacement on non-relocatable kernels
Even non-KASLR kernels can be built as relocatable, to work around
broken bootloaders that violate the rules regarding physical placement
of the kernel image - in this case, the physical offset modulo 2 MiB is
used as the KASLR offset, and all absolute symbol references are fixed
up in the usual way. This workaround is enabled by default.

CONFIG_RELOCATABLE can also be disabled entirely, in which case the
relocation code and the code that captures the offset are omitted from
the build. However, since commit aacd149b62 ("arm64: head: avoid
relocating the kernel twice for KASLR"), this code got out of sync, and
we still add the offset to the kernel virtual address before populating
the page tables even though we never capture it. This means we add a
bogus value instead, breaking the boot entirely.

Fixes: aacd149b62 ("arm64: head: avoid relocating the kernel twice for KASLR")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Mikulas Patocka <mpatocka@redhat.com>
Link: https://lore.kernel.org/r/20220827070904.2216989-1-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-09-01 11:50:44 +01:00
Mark Rutland 1191b6256e arm64: fix KASAN_INLINE
Since commit:

  a004393f45 ("arm64: idreg-override: use early FDT mapping in ID map")

Kernels built with KASAN_INLINE=y die early in boot before producing any
console output. This is because the accesses made to the FDT (e.g. in
generic string processing functions) are instrumented with KASAN, and
with KASAN_INLINE=y any access to an address in TTBR0 results in a bogus
shadow VA, resulting in a data abort.

This patch fixes this by reverting commits:

  7559d9f975 ("arm64: setup: drop early FDT pointer helpers")
  bd0c3fa21878b6d0 ("arm64: idreg-override: use early FDT mapping in ID map")

... and using the TTBR1 fixmap mapping of the FDT.

Note that due to a later commit:

  b65e411d6c ("arm64: Save state of HCR_EL2.E2H before switch to EL1")

... which altered the prototype of init_feature_override() (and
invocation from head.S), commit bd0c3fa21878b6d0 does not revert
cleanly, and I've fixed that up manually.

Fixes: a004393f45 ("arm64: idreg-override: use early FDT mapping in ID map")
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220713140949.45440-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-07-20 16:08:10 +01:00
Marc Zyngier ae4b7e38e9 arm64: Allow sticky E2H when entering EL1
For CPUs that have the unfortunate mis-feature to be stuck in
VHE mode, we perform a funny dance where we completely shortcut
the normal boot process to enable VHE and run the kernel at EL2,
and only then start booting the kernel.

Not only this is pretty ugly, but it means that the EL2 finalisation
occurs before we have processed the sysreg override.

Instead, start executing the kernel as if it was an EL1 guest and
rely on the normal EL2 finalisation to go back to EL2.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220630160500.1536744-4-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01 15:22:51 +01:00
Marc Zyngier b65e411d6c arm64: Save state of HCR_EL2.E2H before switch to EL1
As we're about to switch the way E2H-stuck CPUs boot, save
the boot CPU E2H state as a flag tied to the boot mode
that can then be checked by the idreg override code.

This allows us to replace the is_kernel_in_hyp_mode() check
with a simple comparison with this state, even when running
at EL1. Note that this flag isn't saved in __boot_cpu_mode,
and is only kept in a register in the assembly code.

Use with caution.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220630160500.1536744-3-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01 15:22:51 +01:00
Marc Zyngier 7ddb0c3df7 arm64: Rename the VHE switch to "finalise_el2"
as we are about to perform a lot more in 'mutate_to_vhe' than
we currently do, this function really becomes the point where
we finalise the basic EL2 configuration.

Reflect this into the code by renaming a bunch of things:
- HVC_VHE_RESTART -> HVC_FINALISE_EL2
- switch_to_vhe --> finalise_el2
- mutate_to_vhe -> __finalise_el2

No functional changes.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220630160500.1536744-2-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01 15:22:51 +01:00
Ard Biesheuvel 0aaa68532e arm64: mm: fix booting with 52-bit address space
Joey reports that booting 52-bit VA capable builds on 52-bit VA capable
CPUs is broken since commit 0d9b1ffefa ("arm64: mm: make vabits_actual
a build time constant if possible"). This is due to the fact that the
primary CPU reads the vabits_actual variable before it has been
assigned.

The reason for deferring the assignment of vabits_actual was that we try
to perform as few stores to memory as we can with the MMU and caches
off, due to the cache coherency issues it creates.

Since __cpu_setup() [which is where the read of vabits_actual occurs] is
also called on the secondary boot path, we cannot just read the CPU ID
registers directly, given that the size of the VA space is decided by
the capabilities of the primary CPU. So let's read vabits_actual only on
the secondary boot path, and read the CPU ID registers directly on the
primary boot path, by making it a function parameter of __cpu_setup().

To ensure that all users of vabits_actual (including kasan_early_init())
observe the correct value, move the assignment of vabits_actual back
into asm code, but still defer it to after the MMU and caches have been
enabled.

Cc: Will Deacon <will@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Fixes: 0d9b1ffefa ("arm64: mm: make vabits_actual a build time constant if possible")
Reported-by: Joey Gouly <joey.gouly@arm.com>
Co-developed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220701111045.2944309-1-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01 15:19:07 +01:00
Mark Rutland bdbcd22d49 arm64: head: remove __PHYS_OFFSET
It's very easy to confuse __PHYS_OFFSET and PHYS_OFFSET. To clarify
things, let's remove __PHYS_OFFSET and use KERNEL_START directly, with
comments to show that we're using physical address, as we do for other
objects.

At the same time, update the comment regarding the kernel entry address
to mention __pa(KERNEL_START) rather than __pa(PAGE_OFFSET).

There should be no functional change as a result of this patch.

Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20220629041207.1670133-1-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-29 10:28:33 +01:00
Ard Biesheuvel 7559d9f975 arm64: setup: drop early FDT pointer helpers
We no longer need to call into the kernel to map the FDT before calling
into the kernel so let's drop the helpers we added for this.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-22-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24 17:18:11 +01:00
Ard Biesheuvel aacd149b62 arm64: head: avoid relocating the kernel twice for KASLR
Currently, when KASLR is in effect, we set up the kernel virtual address
space twice: the first time, the KASLR seed is looked up in the device
tree, and the kernel virtual mapping is torn down and recreated again,
after which the relocations are applied a second time. The latter step
means that statically initialized global pointer variables will be reset
to their initial values, and to ensure that BSS variables are not set to
values based on the initial translation, they are cleared again as well.

All of this is needed because we need the command line (taken from the
DT) to tell us whether or not to randomize the virtual address space
before entering the kernel proper. However, this code has expanded
little by little and now creates global state unrelated to the virtual
randomization of the kernel before the mapping is torn down and set up
again, and the BSS cleared for a second time. This has created some
issues in the past, and it would be better to avoid this little dance if
possible.

So instead, let's use the temporary mapping of the device tree, and
execute the bare minimum of code to decide whether or not KASLR should
be enabled, and what the seed is. Only then, create the virtual kernel
mapping, clear BSS, etc and proceed as normal.  This avoids the issues
around inconsistent global state due to BSS being cleared twice, and is
generally more maintainable, as it permits us to defer all the remaining
DT parsing and KASLR initialization to a later time.

This means the relocation fixup code runs only a single time as well,
allowing us to simplify the RELR handling code too, which is not
idempotent and was therefore required to keep track of the offset that
was applied the first time around.

Note that this means we have to clone a pair of FDT library objects, so
that we can control how they are built - we need the stack protector
and other instrumentation disabled so that the code can tolerate being
called this early. Note that only the kernel page tables and the
temporary stack are mapped read-write at this point, which ensures that
the early code does not modify any global state inadvertently.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-21-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24 17:18:11 +01:00
Ard Biesheuvel 005e12676a arm64: head: record CPU boot mode after enabling the MMU
In order to avoid having to touch memory with the MMU and caches
disabled, and therefore having to invalidate it from the caches
explicitly, just defer storing the value until after the MMU has been
turned on, unless we are giving up with an error.

While at it, move the associated variable definitions into C code.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-19-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24 17:18:10 +01:00
Ard Biesheuvel 6495b9ba62 arm64: head: populate kernel page tables with MMU and caches on
Now that we can access the entire kernel image via the ID map, we can
execute the page table population code with the MMU and caches enabled.
The only thing we need to ensure is that translations via TTBR1 remain
disabled while we are updating the page tables the second time around,
in case KASLR wants them to be randomized.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-18-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24 17:18:10 +01:00
Ard Biesheuvel c0be8f18a3 arm64: head: factor out TTBR1 assignment into a macro
Create a macro load_ttbr1 to avoid having to repeat the same instruction
sequence 3 times in a subsequent patch. No functional change intended.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-17-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24 17:18:10 +01:00
Ard Biesheuvel a004393f45 arm64: idreg-override: use early FDT mapping in ID map
Instead of calling into the kernel to map the FDT into the kernel page
tables before even calling start_kernel(), let's switch to the initial,
temporary mapping of the device tree that has been added to the ID map.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-16-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24 17:18:10 +01:00
Ard Biesheuvel f70b3a2332 arm64: head: create a temporary FDT mapping in the initial ID map
We need to access the DT very early to get at the command line and the
KASLR seed, which currently means we rely on some hacks to call into the
kernel before really calling into the kernel, which is undesirable.

So instead, let's create a mapping for the FDT in the initial ID map,
which is feasible now that it has been extended to cover more than a
single page or block, and can be updated in place to remap other output
addresses.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-15-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24 17:18:10 +01:00
Ard Biesheuvel d7bea55027 arm64: head: use relative references to the RELA and RELR tables
Formerly, we had to access the RELA and RELR tables via the kernel
mapping that was being relocated, and so deriving the start and end
addresses using ADRP/ADD references was not possible, as the relocation
code runs from the ID map.

Now that we map the entire kernel image via the ID map, we can simplify
this, and just load the entries via the ID map as well.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-14-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24 17:18:10 +01:00
Ard Biesheuvel c3cee924bd arm64: head: cover entire kernel image in initial ID map
As a first step towards avoiding the need to create, tear down and
recreate the kernel virtual mapping with MMU and caches disabled, start
by expanding the ID map so it covers the page tables as well as all
executable code. This will allow us to populate the page tables with the
MMU and caches on, and call KASLR init code before setting up the
virtual mapping.

Since this ID map is only needed at boot, create it as a temporary set
of page tables, and populate the permanent ID map after enabling the MMU
and caches. While at it, switch to read-only attributes for the where
possible, as writable permissions are only needed for the initial kernel
page tables. Note that on 4k granule configurations, the permanent ID
map will now be reduced to a single page rather than a 2M block mapping.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-13-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24 17:18:10 +01:00
Ard Biesheuvel b013c1e1c6 arm64: head: add helper function to remap regions in early page tables
The asm macros used to create the initial ID map and kernel mappings
don't support randomly remapping parts of the address space after it has
been populated. What we can do, however, given that all block or page
mappings are created at the final level, is take a subset of the mapped
range and update its attributes or output address. This will permit us
to make parts of these page tables read-only, or remap a part of it to
cover the device tree.

So add a helper that encapsulates this.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-12-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24 17:18:10 +01:00
Ard Biesheuvel 723d3a8ed1 arm64: head: pass ID map root table address to __enable_mmu()
We will be adding an initial ID map that covers the entire kernel image,
so we will pass the actual ID map root table to use to __enable_mmu(),
rather than hard code it.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-10-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24 17:18:09 +01:00
Ard Biesheuvel e42ade29e3 arm64: head: split off idmap creation code
Split off the creation of the ID map page tables, so that we can avoid
running it again unnecessarily when KASLR is in effect (which only
randomizes the virtual placement). This will permit us to drop some
explicit cache maintenance to the PoC which was necessary because the
cache invalidation being performed on some global variables might
otherwise clobber unrelated variables that happen to share a cacheline.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-8-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24 17:18:09 +01:00