Commit graph

5067 commits

Author SHA1 Message Date
Will Deacon
ccaeeec529 Merge branch 'for-next/lpa2-prep' into for-next/core
* for-next/lpa2-prep:
  arm64: mm: get rid of kimage_vaddr global variable
  arm64: mm: Take potential load offset into account when KASLR is off
  arm64: kernel: Disable latent_entropy GCC plugin in early C runtime
  arm64: Add ARM64_HAS_LPA2 CPU capability
  arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2]
  arm64/mm: Update tlb invalidation routines for FEAT_LPA2
  arm64/mm: Add lpa2_is_enabled() kvm_lpa2_is_enabled() stubs
  arm64/mm: Modify range-based tlbi to decrement scale
2024-01-04 12:27:42 +00:00
Will Deacon
88619527b4 Merge branch 'for-next/kbuild' into for-next/core
* for-next/kbuild:
  efi/libstub: zboot: do not use $(shell ...) in cmd_copy_and_pad
  arm64: properly install vmlinuz.efi
  arm64: replace <asm-generic/export.h> with <linux/export.h>
  arm64: vdso32: rename 32-bit debug vdso to vdso32.so.dbg
2024-01-04 12:27:35 +00:00
Will Deacon
79eb42b269 Merge branch 'for-next/fpsimd' into for-next/core
* for-next/fpsimd:
  arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD
  arm64: fpsimd: Preserve/restore kernel mode NEON at context switch
  arm64: fpsimd: Drop unneeded 'busy' flag
2024-01-04 12:27:29 +00:00
Mark Rutland
eb15d707c2 arm64: Align boot cpucap handling with system cpucap handling
Currently the detection+enablement of boot cpucaps is separate from the
patching of boot cpucap alternatives, which means there's a period where
cpus_have_cap($CAP) and alternative_has_cap($CAP) may be mismatched.

It would be preferable to manage the boot cpucaps in the same way as the
system cpucaps, both for clarity and to minimize the risk of accidental
usage of code relying upon an alternative which has not yet been
patched.

This patch aligns the handling of boot cpucaps with the handling of
system cpucaps:

* The existing setup_boot_cpu_capabilities() function is moved to be
  closer to the setup_system_capabilities() and setup_system_features()
  functions so that they're more clearly related and more likely to be
  updated together in future.

* The patching of boot cpucap alternatives is moved into
  setup_boot_cpu_capabilities(), immediately after boot cpucaps are
  detected and enabled.

* A new setup_boot_cpu_features() function is added to mirror
  setup_system_features(); this handles initialization of cpucap data
  structures and calls setup_boot_cpu_capabilities(). This makes
  init_cpu_features() a closer mirror to update_cpu_features(), and
  makes smp_prepare_boot_cpu() a closer mirror to smp_cpus_done().

Importantly, while these changes alter the structure of the code, they
retain the existing order of calls to:

  init_cpu_features(); // prefix initializing feature regs
  init_cpucap_indirect_list();
  detect_system_supports_pseudo_nmi();
  update_cpu_capabilities(SCOPE_BOOT_CPU | SCOPE_LOCAL_CPU);
  enable_cpu_capabilities(SCOPE_BOOT_CPU);
  apply_boot_alternatives();

... and hence there should be no functional change as a result of this
patch; this is purely a structural cleanup.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20231212170910.3745497-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-12-13 16:02:01 +00:00
Ard Biesheuvel
2632e25217 arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD
Now that kernel mode FPSIMD state is context switched along with other
task state, we can enable the existing logic that keeps track of which
task's FPSIMD state the CPU is holding in its registers. If it is the
context of the task that we are switching to, we can elide the reload of
the FPSIMD state from memory.

Note that we also need to check whether the FPSIMD state on this CPU is
the most recent: if a task gets migrated away and back again, the state
in memory may be more recent than the state in the CPU. So add another
CPU id field to task_struct to keep track of this. (We could reuse the
existing CPU id field used for user mode context, but that might result
in user state to be discarded unnecessarily, given that two distinct
CPUs could be holding the most recent user mode state and the most
recent kernel mode state)

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20231208113218.3001940-9-ardb@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-12-12 14:31:55 +00:00
Ard Biesheuvel
aefbab8e77 arm64: fpsimd: Preserve/restore kernel mode NEON at context switch
Currently, the FPSIMD register file is not preserved and restored along
with the general registers on exception entry/exit or context switch.
For this reason, we disable preemption when enabling FPSIMD for kernel
mode use in task context, and suspend the processing of softirqs so that
there are no concurrent uses in the kernel. (Kernel mode FPSIMD may not
be used at all in other contexts).

Disabling preemption while doing CPU intensive work on inputs of
potentially unbounded size is bad for real-time performance, which is
why we try and ensure that SIMD crypto code does not operate on more
than ~4k at a time, which is an arbitrary limit and requires assembler
code to implement efficiently.

We can avoid the need for disabling preemption if we can ensure that any
in-kernel users of the NEON will not lose the FPSIMD register state
across a context switch. And given that disabling softirqs implicitly
disables preemption as well, we will also have to ensure that a softirq
that runs code using FPSIMD can safely interrupt an in-kernel user.

So introduce a thread_info flag TIF_KERNEL_FPSTATE, and modify the
context switch hook for FPSIMD to preserve and restore the kernel mode
FPSIMD to/from struct thread_struct when it is set. This avoids any
scheduling blackouts due to prolonged use of FPSIMD in kernel mode,
without the need for manual yielding.

In order to support softirq processing while FPSIMD is being used in
kernel task context, use the same flag to decide whether the kernel mode
FPSIMD state needs to be preserved and restored before allowing FPSIMD
to be used in softirq context.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20231208113218.3001940-8-ardb@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-12-12 14:31:54 +00:00
Ard Biesheuvel
9b19700e62 arm64: fpsimd: Drop unneeded 'busy' flag
Kernel mode NEON will preserve the user mode FPSIMD state by saving it
into the task struct before clobbering the registers. In order to avoid
the need for preserving kernel mode state too, we disallow nested use of
kernel mode NEON, i..e, use in softirq context while the interrupted
task context was using kernel mode NEON too.

Originally, this policy was implemented using a per-CPU flag which was
exposed via may_use_simd(), requiring the users of the kernel mode NEON
to deal with the possibility that it might return false, and having NEON
and non-NEON code paths. This policy was changed by commit
13150149aa ("arm64: fpsimd: run kernel mode NEON with softirqs
disabled"), and now, softirq processing is disabled entirely instead,
and so may_use_simd() can never fail when called from task or softirq
context.

This means we can drop the fpsimd_context_busy flag entirely, and
instead, ensure that we disable softirq processing in places where we
formerly relied on the flag for preventing races in the FPSIMD preserve
routines.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20231208113218.3001940-7-ardb@google.com
[will: Folded in fix from CAMj1kXFhzbJRyWHELCivQW1yJaF=p07LLtbuyXYX3G1WtsdyQg@mail.gmail.com]
Signed-off-by: Will Deacon <will@kernel.org>
2023-12-12 14:29:16 +00:00
Ard Biesheuvel
376f5a3bd7 arm64: mm: get rid of kimage_vaddr global variable
We store the address of _text in kimage_vaddr, but since commit
09e3c22a86 ("arm64: Use a variable to store non-global mappings
decision"), we no longer reference this variable from modules so we no
longer need to export it.

In fact, we don't need it at all so let's just get rid of it.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20231129111555.3594833-46-ardb@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-12-12 11:06:28 +00:00
Ard Biesheuvel
a22fc8e102 arm64: mm: Take potential load offset into account when KASLR is off
We enable CONFIG_RELOCATABLE even when CONFIG_RANDOMIZE_BASE is
disabled, and this permits the loader (i.e., EFI) to place the kernel
anywhere in physical memory as long as the base address is 64k aligned.

This means that the 'KASLR' case described in the header that defines
the size of the statically allocated page tables could take effect even
when CONFIG_RANDMIZE_BASE=n. So check for CONFIG_RELOCATABLE instead.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20231129111555.3594833-45-ardb@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-12-12 11:06:27 +00:00
Masahiro Yamada
8fd7588fd4 arm64: replace <asm-generic/export.h> with <linux/export.h>
Commit ddb5cdbafa ("kbuild: generate KSYMTAB entries by modpost")
deprecated <asm-generic/export.h>, which is now a wrapper of
<linux/export.h>.

Replace #include <asm-generic/export.h> with #include <linux/export.h>.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20231126151045.1556686-1-masahiroy@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2023-12-05 14:25:30 +00:00
Ryan Roberts
b1366d21da arm64: Add ARM64_HAS_LPA2 CPU capability
Expose FEAT_LPA2 as a capability so that we can take advantage of
alternatives patching in the hypervisor.

Although FEAT_LPA2 presence is advertised separately for stage1 and
stage2, the expectation is that in practice both stages will either
support or not support it. Therefore, we combine both into a single
capability, allowing us to simplify the implementation. KVM requires
support in both stages in order to use LPA2 since the same library is
used for hyp stage 1 and guest stage 2 pgtables.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231127111737.1897081-6-ryan.roberts@arm.com
2023-11-27 15:03:50 +00:00
Anshuman Khandual
e477c8c483 arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2]
PAGE_SIZE support is tested against possible minimum and maximum values for
its respective ID_AA64MMFR0.TGRAN field, depending on whether it is signed
or unsigned. But then FEAT_LPA2 implementation needs to be validated for 4K
and 16K page sizes via feature specific ID_AA64MMFR0.TGRAN values. Hence it
adds FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2] values per ARM ARM (0487G.A).

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231127111737.1897081-5-ryan.roberts@arm.com
2023-11-27 15:03:50 +00:00
Ryan Roberts
c910f2b655 arm64/mm: Update tlb invalidation routines for FEAT_LPA2
FEAT_LPA2 impacts tlb invalidation in 2 ways; Firstly, the TTL field in
the non-range tlbi instructions can now validly take a 0 value as a
level hint for the 4KB granule (this is due to the extra level of
translation) - previously TTL=0b0100 meant no hint and was treated as
0b0000. Secondly, The BADDR field of the range-based tlbi instructions
is specified in 64KB units when LPA2 is in use (TCR.DS=1), whereas it is
in page units otherwise. Changes are required for tlbi to continue to
operate correctly when LPA2 is in use.

Solve the first problem by always adding the level hint if the level is
between [0, 3] (previously anything other than 0 was hinted, which
breaks in the new level -1 case from kvm). When running on non-LPA2 HW,
0 is still safe to hint as the HW will fall back to non-hinted. While we
are at it, we replace the notion of 0 being the non-hinted sentinel with
a macro, TLBI_TTL_UNKNOWN. This means callers won't need updating
if/when translation depth increases in future.

The second issue is more complex: When LPA2 is in use, use the non-range
tlbi instructions to forward align to a 64KB boundary first, then use
range-based tlbi from there on, until we have either invalidated all
pages or we have a single page remaining. If the latter, that is done
with non-range tlbi. We determine whether LPA2 is in use based on
lpa2_is_enabled() (for kernel calls) or kvm_lpa2_is_enabled() (for kvm
calls).

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231127111737.1897081-4-ryan.roberts@arm.com
2023-11-27 15:03:50 +00:00
Ryan Roberts
936a4ec281 arm64/mm: Add lpa2_is_enabled() kvm_lpa2_is_enabled() stubs
Add stub functions which is initially always return false. These provide
the hooks that we need to update the range-based TLBI routines, whose
operands are encoded differently depending on whether lpa2 is enabled or
not.

The kernel and kvm will enable the use of lpa2 asynchronously in future,
and part of that enablement will involve fleshing out their respective
hook to advertise when it is using lpa2.

Since the kernel's decision to use lpa2 relies on more than just whether
the HW supports the feature, it can't just use the same static key as
kvm. This is another reason to use separate functions. lpa2_is_enabled()
is already implemented as part of Ard's kernel lpa2 series. Since kvm
will make its decision solely based on HW support, kvm_lpa2_is_enabled()
will be defined as system_supports_lpa2() once kvm starts using lpa2.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231127111737.1897081-3-ryan.roberts@arm.com
2023-11-27 15:03:50 +00:00
Ryan Roberts
e2768b798a arm64/mm: Modify range-based tlbi to decrement scale
In preparation for adding support for LPA2 to the tlb invalidation
routines, modify the algorithm used by range-based tlbi to start at the
highest 'scale' and decrement instead of starting at the lowest 'scale'
and incrementing. This new approach makes it possible to maintain 64K
alignment as we work through the range, until the last op (at scale=0).
This is required when LPA2 is enabled. (This part will be added in a
subsequent commit).

This change is separated into its own patch because it will also impact
non-LPA2 systems, and I want to make it easy to bisect in case it leads
to performance regression (see below for benchmarks that suggest this
should not be a problem).

The original commit (d1d3aa98 "arm64: tlb: Use the TLBI RANGE feature in
arm64") stated this as the reason for _incrementing_ scale:

  However, in most scenarios, the pages = 1 when flush_tlb_range() is
  called. Start from scale = 3 or other proper value (such as scale
  =ilog2(pages)), will incur extra overhead. So increase 'scale' from 0
  to maximum.

But pages=1 is already special cased by the non-range invalidation path,
which will take care of it the first time through the loop (both in the
original commit and in my change), so I don't think switching to
decrement scale should have any extra performance impact after all.

Indeed benchmarking kernel compilation, a TLBI-heavy workload, suggests
that this new approach actually _improves_ performance slightly (using a
virtual machine on Apple M2):

Table shows time to execute kernel compilation workload with 8 jobs,
relative to baseline without this patch (more negative number is
bigger speedup). Repeated 9 times across 3 system reboots:

| counter   |       mean |     stdev |
|:----------|-----------:|----------:|
| real-time |      -0.6% |      0.0% |
| kern-time |      -1.6% |      0.5% |
| user-time |      -0.4% |      0.1% |

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231127111737.1897081-2-ryan.roberts@arm.com
2023-11-27 15:03:50 +00:00
Will Deacon
acfa60dbe0 arm64: mm: Fix "rodata=on" when CONFIG_RODATA_FULL_DEFAULT_ENABLED=y
When CONFIG_RODATA_FULL_DEFAULT_ENABLED=y, passing "rodata=on" on the
kernel command-line (rather than "rodata=full") should turn off the
"full" behaviour, leaving writable linear aliases of read-only kernel
memory. Unfortunately, the option has no effect in this situation and
the only way to disable the "rodata=full" behaviour is to disable rodata
protection entirely by passing "rodata=off".

Fix this by parsing the "on" and "off" options in the arch code,
additionally enforcing that 'rodata_full' cannot be set without also
setting 'rodata_enabled', allowing us to simplify a couple of checks
in the process.

Fixes: 2e8cff0a0e ("arm64: fix rodata=full")
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20231117131422.29663-1-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-11-22 18:46:05 +00:00
Linus Torvalds
3ca112b71f Probes fixes for v6.7-rc1:
- Documentation update: Add a note about argument and return value
    fetching is the best effort because it depends on the type.
 
  - objpool: Fix to make internal global variables static in
    test_objpool.c.
 
  - kprobes: Unify kprobes_exceptions_nofify() prototypes. There are
    the same prototypes in asm/kprobes.h for some architectures, but
    some of them are missing the prototype and it causes a warning.
    So move the prototype into linux/kprobes.h.
 
  - tracing: Fix to check the tracepoint event and return event at
    parsing stage. The tracepoint event doesn't support %return
    but if $retval exists, it will be converted to %return silently.
    This finds that case and rejects it.
 
  - tracing: Fix the order of the descriptions about the parameters
    of __kprobe_event_gen_cmd_start() to be consistent with the
    argument list of the function.
 -----BEGIN PGP SIGNATURE-----
 
 iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmVOwAQbHG1hc2FtaS5o
 aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8bItMH/0F/vyiirgLrRVvQ+5Tr
 Hm32oc1BQzxnQ0+9bjzk3r90KYk5cysBEEqxKzgxq9/RsJdyCczQUpxYehU0BoZT
 1B4pB5eQ0DwcdGAVk4TyBRYVBb3uhCyyZNXv+F60AsO8i87fHHoJXT9SoKK+Vgx4
 MAklE1gnxFFlRoYCBQpks89NajRx6n3aEL4/oXO3WYSrv+H2WGtZamB+RhpufkDx
 Qx5TkIGnjulcW6J5m7Px5N3z9AX00SbfooZHAae3fqsek5RPNecfc1/WiANNXrSm
 SYsG/i1jcHVvmk2YmCVokVLPKzhCOsKIuiW91rBu/Tu6lqiJmC+fxWxuZqAdXFUi
 +kw=
 =uymB
 -----END PGP SIGNATURE-----

Merge tag 'probes-fixes-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes fixes from Masami Hiramatsu:

 - Documentation update: Add a note about argument and return value
   fetching is the best effort because it depends on the type.

 - objpool: Fix to make internal global variables static in
   test_objpool.c.

 - kprobes: Unify kprobes_exceptions_nofify() prototypes. There are the
   same prototypes in asm/kprobes.h for some architectures, but some of
   them are missing the prototype and it causes a warning. So move the
   prototype into linux/kprobes.h.

 - tracing: Fix to check the tracepoint event and return event at
   parsing stage. The tracepoint event doesn't support %return but if
   $retval exists, it will be converted to %return silently. This finds
   that case and rejects it.

 - tracing: Fix the order of the descriptions about the parameters of
   __kprobe_event_gen_cmd_start() to be consistent with the argument
   list of the function.

* tag 'probes-fixes-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/kprobes: Fix the order of argument descriptions
  tracing: fprobe-event: Fix to check tracepoint event and return
  kprobes: unify kprobes_exceptions_nofify() prototypes
  lib: test_objpool: make global variables static
  Documentation: tracing: Add a note about argument and retval access
2023-11-10 16:35:04 -08:00
Linus Torvalds
ac347a0655 arm64 fixes:
- Move the MediaTek GIC quirk handling from irqchip to core. Before the
   merging window commit 44bd78dd2b ("irqchip/gic-v3: Disable pseudo
   NMIs on MediaTek devices w/ firmware issues") temporarily addressed
   this issue. Fixed now at a deeper level in the arch code.
 
 - Reject events meant for other PMUs in the CoreSight PMU driver,
   otherwise some of the core PMU events would disappear.
 
 - Fix the Armv8 PMUv3 driver driver to not truncate 64-bit registers,
   causing some events to be invisible.
 
 - Remove duplicate declaration of __arm64_sys##name following the patch
   to avoid prototype warning for syscalls.
 
 - Typos in the elf_hwcap documentation.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmVObkoACgkQa9axLQDI
 XvHfiQ//eM5pDYlXTtkD8lqqAMKL5270iig9kN3lbrHO9+fPU0f15tntPyBJbgdv
 mTLTkfw5Uz1WqCuJkDHIL3aqeJU7uphJQgS+X4Js//37txJ0T+soJ2LQ+yCxIhVi
 PrJBcfNe6lz+0j/AeP7548hXt+gmUFIkBrSqy0NYPnhEd9Ly1mkk5Ggvt6e1baU3
 STSjsjFXNl9YtmsiU4Yy3X4n/vrt3rqQzsuq18R51Cw/w/J/CvI2g6z0bhMcThY1
 NHrMJU5xhTfDxOASS2p40HFZau4yCtIvbr0Y0HF1UsXilBXp7F17J7j6Og6+IEO1
 bOTgPnZ9p6faZ4BrNvC8wYNtclonHf5eYyrdf+aUzoyDIXkAtAqAU9lPg1+2+Aiv
 FrRmROtgnLX1upM9fq7/sSX+SUYUZMibtVlt1aNqgQktVUkUc6t0tzaj7lBtnvXQ
 PkUnA7qcUnwsF3r2GbUvYI3mzQfN7hTt924eFOtiDcXjWwrhhXeBI3vQyCwS2JGa
 zl2D+5tw/tERKYXwkNHWw69d9BYu7eVP5cw06nOXk3iDVNW8dJf7J3eUWnqNl7Ss
 nSBdYKgE97MvVWmaeaKWrrOO//zeHKeFoaH1BxxlHRTwhgpi6DWcRccB8F9RqKwe
 eZP1vKW66qH82DpHR9ivQ+OXE1WCDi0ZdcKhi2KYdNtf6wuXssY=
 =c+Yt
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:
 "Mostly PMU fixes and a reworking of the pseudo-NMI disabling on broken
  MediaTek firmware:

   - Move the MediaTek GIC quirk handling from irqchip to core. Before
     the merging window commit 44bd78dd2b ("irqchip/gic-v3: Disable
     pseudo NMIs on MediaTek devices w/ firmware issues") temporarily
     addressed this issue. Fixed now at a deeper level in the arch code

   - Reject events meant for other PMUs in the CoreSight PMU driver,
     otherwise some of the core PMU events would disappear

   - Fix the Armv8 PMUv3 driver driver to not truncate 64-bit registers,
     causing some events to be invisible

   - Remove duplicate declaration of __arm64_sys##name following the
     patch to avoid prototype warning for syscalls

   - Typos in the elf_hwcap documentation"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64/syscall: Remove duplicate declaration
  Revert "arm64: smp: avoid NMI IPIs with broken MediaTek FW"
  arm64: Move MediaTek GIC quirk handling from irqchip to core
  arm64/arm: arm_pmuv3: perf: Don't truncate 64-bit registers
  perf: arm_cspmu: Reject events meant for other PMUs
  Documentation/arm64: Fix typos in elf_hwcaps
2023-11-10 12:22:14 -08:00
Arnd Bergmann
abc28463c8 kprobes: unify kprobes_exceptions_nofify() prototypes
Most architectures that support kprobes declare this function in their
own asm/kprobes.h header and provide an override, but some are missing
the prototype, which causes a warning for the __weak stub implementation:

kernel/kprobes.c:1865:12: error: no previous prototype for 'kprobe_exceptions_notify' [-Werror=missing-prototypes]
 1865 | int __weak kprobe_exceptions_notify(struct notifier_block *self,

Move the prototype into linux/kprobes.h so it is visible to all
the definitions.

Link: https://lore.kernel.org/all/20231108125843.3806765-4-arnd@kernel.org/

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-11-10 19:59:05 +09:00
Kevin Brodsky
f86128050d arm64/syscall: Remove duplicate declaration
Commit 6ac19f9651 ("arm64: avoid prototype warnings for syscalls")
added missing declarations to various syscall wrapper macros. It
however proved a little too zealous in __SYSCALL_DEFINEx(), as a
declaration for __arm64_sys##name was already present. A declaration
is required before the call to ALLOW_ERROR_INJECTION(), so keep
the original one and remove the new one.

Fixes: 6ac19f9651 ("arm64: avoid prototype warnings for syscalls")
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20231109141153.250046-1-kevin.brodsky@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-11-09 17:19:14 +00:00
Ilkka Koskinen
403edfa436 arm64/arm: arm_pmuv3: perf: Don't truncate 64-bit registers
The driver used to truncate several 64-bit registers such as PMCEID[n]
registers used to describe whether architectural and microarchitectural
events in range 0x4000-0x401f exist. Due to discarding the bits, the
driver made the events invisible, even if they existed.

Moreover, PMCCFILTR and PMCR registers have additional bits in the upper
32 bits. This patch makes them available although they aren't currently
used. Finally, functions handling PMXEVCNTR and PMXEVTYPER registers are
removed as they not being used at all.

Fixes: df29ddf4f0 ("arm64: perf: Abstract system register accesses away")
Reported-by: Carl Worth <carl@os.amperecomputing.com>
Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Acked-by: Will Deacon <will@kernel.org>
Closes: https://lore.kernel.org/..
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20231102183012.1251410-1-ilkka@os.amperecomputing.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-11-07 11:00:57 +00:00
Linus Torvalds
8f6f76a6a2 As usual, lots of singleton and doubleton patches all over the tree and
there's little I can say which isn't in the individual changelogs.
 
 The lengthier patch series are
 
 - "kdump: use generic functions to simplify crashkernel reservation in
   arch", from Baoquan He.  This is mainly cleanups and consolidation of
   the "crashkernel=" kernel parameter handling.
 
 - After much discussion, David Laight's "minmax: Relax type checks in
   min() and max()" is here.  Hopefully reduces some typecasting and the
   use of min_t() and max_t().
 
 - A group of patches from Oleg Nesterov which clean up and slightly fix
   our handling of reads from /proc/PID/task/...  and which remove
   task_struct.therad_group.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZUQP9wAKCRDdBJ7gKXxA
 jmOAAQDh8sxagQYocoVsSm28ICqXFeaY9Co1jzBIDdNesAvYVwD/c2DHRqJHEiS4
 63BNcG3+hM9nwGJHb5lyh5m79nBMRg0=
 =On4u
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:
 "As usual, lots of singleton and doubleton patches all over the tree
  and there's little I can say which isn't in the individual changelogs.

  The lengthier patch series are

   - 'kdump: use generic functions to simplify crashkernel reservation
     in arch', from Baoquan He. This is mainly cleanups and
     consolidation of the 'crashkernel=' kernel parameter handling

   - After much discussion, David Laight's 'minmax: Relax type checks in
     min() and max()' is here. Hopefully reduces some typecasting and
     the use of min_t() and max_t()

   - A group of patches from Oleg Nesterov which clean up and slightly
     fix our handling of reads from /proc/PID/task/... and which remove
     task_struct.thread_group"

* tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (64 commits)
  scripts/gdb/vmalloc: disable on no-MMU
  scripts/gdb: fix usage of MOD_TEXT not defined when CONFIG_MODULES=n
  .mailmap: add address mapping for Tomeu Vizoso
  mailmap: update email address for Claudiu Beznea
  tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions
  .mailmap: map Benjamin Poirier's address
  scripts/gdb: add lx_current support for riscv
  ocfs2: fix a spelling typo in comment
  proc: test ProtectionKey in proc-empty-vm test
  proc: fix proc-empty-vm test with vsyscall
  fs/proc/base.c: remove unneeded semicolon
  do_io_accounting: use sig->stats_lock
  do_io_accounting: use __for_each_thread()
  ocfs2: replace BUG_ON() at ocfs2_num_free_extents() with ocfs2_error()
  ocfs2: fix a typo in a comment
  scripts/show_delta: add __main__ judgement before main code
  treewide: mark stuff as __ro_after_init
  fs: ocfs2: check status values
  proc: test /proc/${pid}/statm
  compiler.h: move __is_constexpr() to compiler.h
  ...
2023-11-02 20:53:31 -10:00
Linus Torvalds
ecae0bd517 Many singleton patches against the MM code. The patch series which are
included in this merge do the following:
 
 - Kemeng Shi has contributed some compation maintenance work in the
   series "Fixes and cleanups to compaction".
 
 - Joel Fernandes has a patchset ("Optimize mremap during mutual
   alignment within PMD") which fixes an obscure issue with mremap()'s
   pagetable handling during a subsequent exec(), based upon an
   implementation which Linus suggested.
 
 - More DAMON/DAMOS maintenance and feature work from SeongJae Park i the
   following patch series:
 
 	mm/damon: misc fixups for documents, comments and its tracepoint
 	mm/damon: add a tracepoint for damos apply target regions
 	mm/damon: provide pseudo-moving sum based access rate
 	mm/damon: implement DAMOS apply intervals
 	mm/damon/core-test: Fix memory leaks in core-test
 	mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval
 
 - In the series "Do not try to access unaccepted memory" Adrian Hunter
   provides some fixups for the recently-added "unaccepted memory' feature.
   To increase the feature's checking coverage.  "Plug a few gaps where
   RAM is exposed without checking if it is unaccepted memory".
 
 - In the series "cleanups for lockless slab shrink" Qi Zheng has done
   some maintenance work which is preparation for the lockless slab
   shrinking code.
 
 - Qi Zheng has redone the earlier (and reverted) attempt to make slab
   shrinking lockless in the series "use refcount+RCU method to implement
   lockless slab shrink".
 
 - David Hildenbrand contributes some maintenance work for the rmap code
   in the series "Anon rmap cleanups".
 
 - Kefeng Wang does more folio conversions and some maintenance work in
   the migration code.  Series "mm: migrate: more folio conversion and
   unification".
 
 - Matthew Wilcox has fixed an issue in the buffer_head code which was
   causing long stalls under some heavy memory/IO loads.  Some cleanups
   were added on the way.  Series "Add and use bdev_getblk()".
 
 - In the series "Use nth_page() in place of direct struct page
   manipulation" Zi Yan has fixed a potential issue with the direct
   manipulation of hugetlb page frames.
 
 - In the series "mm: hugetlb: Skip initialization of gigantic tail
   struct pages if freed by HVO" has improved our handling of gigantic
   pages in the hugetlb vmmemmep optimizaton code.  This provides
   significant boot time improvements when significant amounts of gigantic
   pages are in use.
 
 - Matthew Wilcox has sent the series "Small hugetlb cleanups" - code
   rationalization and folio conversions in the hugetlb code.
 
 - Yin Fengwei has improved mlock()'s handling of large folios in the
   series "support large folio for mlock"
 
 - In the series "Expose swapcache stat for memcg v1" Liu Shixin has
   added statistics for memcg v1 users which are available (and useful)
   under memcg v2.
 
 - Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable)
   prctl so that userspace may direct the kernel to not automatically
   propagate the denial to child processes.  The series is named "MDWE
   without inheritance".
 
 - Kefeng Wang has provided the series "mm: convert numa balancing
   functions to use a folio" which does what it says.
 
 - In the series "mm/ksm: add fork-exec support for prctl" Stefan Roesch
   makes is possible for a process to propagate KSM treatment across
   exec().
 
 - Huang Ying has enhanced memory tiering's calculation of memory
   distances.  This is used to permit the dax/kmem driver to use "high
   bandwidth memory" in addition to Optane Data Center Persistent Memory
   Modules (DCPMM).  The series is named "memory tiering: calculate
   abstract distance based on ACPI HMAT"
 
 - In the series "Smart scanning mode for KSM" Stefan Roesch has
   optimized KSM by teaching it to retain and use some historical
   information from previous scans.
 
 - Yosry Ahmed has fixed some inconsistencies in memcg statistics in the
   series "mm: memcg: fix tracking of pending stats updates values".
 
 - In the series "Implement IOCTL to get and optionally clear info about
   PTEs" Peter Xu has added an ioctl to /proc/<pid>/pagemap which permits
   us to atomically read-then-clear page softdirty state.  This is mainly
   used by CRIU.
 
 - Hugh Dickins contributed the series "shmem,tmpfs: general maintenance"
   - a bunch of relatively minor maintenance tweaks to this code.
 
 - Matthew Wilcox has increased the use of the VMA lock over file-backed
   page faults in the series "Handle more faults under the VMA lock".  Some
   rationalizations of the fault path became possible as a result.
 
 - In the series "mm/rmap: convert page_move_anon_rmap() to
   folio_move_anon_rmap()" David Hildenbrand has implemented some cleanups
   and folio conversions.
 
 - In the series "various improvements to the GUP interface" Lorenzo
   Stoakes has simplified and improved the GUP interface with an eye to
   providing groundwork for future improvements.
 
 - Andrey Konovalov has sent along the series "kasan: assorted fixes and
   improvements" which does those things.
 
 - Some page allocator maintenance work from Kemeng Shi in the series
   "Two minor cleanups to break_down_buddy_pages".
 
 - In thes series "New selftest for mm" Breno Leitao has developed
   another MM self test which tickles a race we had between madvise() and
   page faults.
 
 - In the series "Add folio_end_read" Matthew Wilcox provides cleanups
   and an optimization to the core pagecache code.
 
 - Nhat Pham has added memcg accounting for hugetlb memory in the series
   "hugetlb memcg accounting".
 
 - Cleanups and rationalizations to the pagemap code from Lorenzo
   Stoakes, in the series "Abstract vma_merge() and split_vma()".
 
 - Audra Mitchell has fixed issues in the procfs page_owner code's new
   timestamping feature which was causing some misbehaviours.  In the
   series "Fix page_owner's use of free timestamps".
 
 - Lorenzo Stoakes has fixed the handling of new mappings of sealed files
   in the series "permit write-sealed memfd read-only shared mappings".
 
 - Mike Kravetz has optimized the hugetlb vmemmap optimization in the
   series "Batch hugetlb vmemmap modification operations".
 
 - Some buffer_head folio conversions and cleanups from Matthew Wilcox in
   the series "Finish the create_empty_buffers() transition".
 
 - As a page allocator performance optimization Huang Ying has added
   automatic tuning to the allocator's per-cpu-pages feature, in the series
   "mm: PCP high auto-tuning".
 
 - Roman Gushchin has contributed the patchset "mm: improve performance
   of accounted kernel memory allocations" which improves their performance
   by ~30% as measured by a micro-benchmark.
 
 - folio conversions from Kefeng Wang in the series "mm: convert page
   cpupid functions to folios".
 
 - Some kmemleak fixups in Liu Shixin's series "Some bugfix about
   kmemleak".
 
 - Qi Zheng has improved our handling of memoryless nodes by keeping them
   off the allocation fallback list.  This is done in the series "handle
   memoryless nodes more appropriately".
 
 - khugepaged conversions from Vishal Moola in the series "Some
   khugepaged folio conversions".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZULEMwAKCRDdBJ7gKXxA
 jhQHAQCYpD3g849x69DmHnHWHm/EHQLvQmRMDeYZI+nx/sCJOwEAw4AKg0Oemv9y
 FgeUPAD1oasg6CP+INZvCj34waNxwAc=
 =E+Y4
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Many singleton patches against the MM code. The patch series which are
  included in this merge do the following:

   - Kemeng Shi has contributed some compation maintenance work in the
     series 'Fixes and cleanups to compaction'

   - Joel Fernandes has a patchset ('Optimize mremap during mutual
     alignment within PMD') which fixes an obscure issue with mremap()'s
     pagetable handling during a subsequent exec(), based upon an
     implementation which Linus suggested

   - More DAMON/DAMOS maintenance and feature work from SeongJae Park i
     the following patch series:

	mm/damon: misc fixups for documents, comments and its tracepoint
	mm/damon: add a tracepoint for damos apply target regions
	mm/damon: provide pseudo-moving sum based access rate
	mm/damon: implement DAMOS apply intervals
	mm/damon/core-test: Fix memory leaks in core-test
	mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval

   - In the series 'Do not try to access unaccepted memory' Adrian
     Hunter provides some fixups for the recently-added 'unaccepted
     memory' feature. To increase the feature's checking coverage. 'Plug
     a few gaps where RAM is exposed without checking if it is
     unaccepted memory'

   - In the series 'cleanups for lockless slab shrink' Qi Zheng has done
     some maintenance work which is preparation for the lockless slab
     shrinking code

   - Qi Zheng has redone the earlier (and reverted) attempt to make slab
     shrinking lockless in the series 'use refcount+RCU method to
     implement lockless slab shrink'

   - David Hildenbrand contributes some maintenance work for the rmap
     code in the series 'Anon rmap cleanups'

   - Kefeng Wang does more folio conversions and some maintenance work
     in the migration code. Series 'mm: migrate: more folio conversion
     and unification'

   - Matthew Wilcox has fixed an issue in the buffer_head code which was
     causing long stalls under some heavy memory/IO loads. Some cleanups
     were added on the way. Series 'Add and use bdev_getblk()'

   - In the series 'Use nth_page() in place of direct struct page
     manipulation' Zi Yan has fixed a potential issue with the direct
     manipulation of hugetlb page frames

   - In the series 'mm: hugetlb: Skip initialization of gigantic tail
     struct pages if freed by HVO' has improved our handling of gigantic
     pages in the hugetlb vmmemmep optimizaton code. This provides
     significant boot time improvements when significant amounts of
     gigantic pages are in use

   - Matthew Wilcox has sent the series 'Small hugetlb cleanups' - code
     rationalization and folio conversions in the hugetlb code

   - Yin Fengwei has improved mlock()'s handling of large folios in the
     series 'support large folio for mlock'

   - In the series 'Expose swapcache stat for memcg v1' Liu Shixin has
     added statistics for memcg v1 users which are available (and
     useful) under memcg v2

   - Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable)
     prctl so that userspace may direct the kernel to not automatically
     propagate the denial to child processes. The series is named 'MDWE
     without inheritance'

   - Kefeng Wang has provided the series 'mm: convert numa balancing
     functions to use a folio' which does what it says

   - In the series 'mm/ksm: add fork-exec support for prctl' Stefan
     Roesch makes is possible for a process to propagate KSM treatment
     across exec()

   - Huang Ying has enhanced memory tiering's calculation of memory
     distances. This is used to permit the dax/kmem driver to use 'high
     bandwidth memory' in addition to Optane Data Center Persistent
     Memory Modules (DCPMM). The series is named 'memory tiering:
     calculate abstract distance based on ACPI HMAT'

   - In the series 'Smart scanning mode for KSM' Stefan Roesch has
     optimized KSM by teaching it to retain and use some historical
     information from previous scans

   - Yosry Ahmed has fixed some inconsistencies in memcg statistics in
     the series 'mm: memcg: fix tracking of pending stats updates
     values'

   - In the series 'Implement IOCTL to get and optionally clear info
     about PTEs' Peter Xu has added an ioctl to /proc/<pid>/pagemap
     which permits us to atomically read-then-clear page softdirty
     state. This is mainly used by CRIU

   - Hugh Dickins contributed the series 'shmem,tmpfs: general
     maintenance', a bunch of relatively minor maintenance tweaks to
     this code

   - Matthew Wilcox has increased the use of the VMA lock over
     file-backed page faults in the series 'Handle more faults under the
     VMA lock'. Some rationalizations of the fault path became possible
     as a result

   - In the series 'mm/rmap: convert page_move_anon_rmap() to
     folio_move_anon_rmap()' David Hildenbrand has implemented some
     cleanups and folio conversions

   - In the series 'various improvements to the GUP interface' Lorenzo
     Stoakes has simplified and improved the GUP interface with an eye
     to providing groundwork for future improvements

   - Andrey Konovalov has sent along the series 'kasan: assorted fixes
     and improvements' which does those things

   - Some page allocator maintenance work from Kemeng Shi in the series
     'Two minor cleanups to break_down_buddy_pages'

   - In thes series 'New selftest for mm' Breno Leitao has developed
     another MM self test which tickles a race we had between madvise()
     and page faults

   - In the series 'Add folio_end_read' Matthew Wilcox provides cleanups
     and an optimization to the core pagecache code

   - Nhat Pham has added memcg accounting for hugetlb memory in the
     series 'hugetlb memcg accounting'

   - Cleanups and rationalizations to the pagemap code from Lorenzo
     Stoakes, in the series 'Abstract vma_merge() and split_vma()'

   - Audra Mitchell has fixed issues in the procfs page_owner code's new
     timestamping feature which was causing some misbehaviours. In the
     series 'Fix page_owner's use of free timestamps'

   - Lorenzo Stoakes has fixed the handling of new mappings of sealed
     files in the series 'permit write-sealed memfd read-only shared
     mappings'

   - Mike Kravetz has optimized the hugetlb vmemmap optimization in the
     series 'Batch hugetlb vmemmap modification operations'

   - Some buffer_head folio conversions and cleanups from Matthew Wilcox
     in the series 'Finish the create_empty_buffers() transition'

   - As a page allocator performance optimization Huang Ying has added
     automatic tuning to the allocator's per-cpu-pages feature, in the
     series 'mm: PCP high auto-tuning'

   - Roman Gushchin has contributed the patchset 'mm: improve
     performance of accounted kernel memory allocations' which improves
     their performance by ~30% as measured by a micro-benchmark

   - folio conversions from Kefeng Wang in the series 'mm: convert page
     cpupid functions to folios'

   - Some kmemleak fixups in Liu Shixin's series 'Some bugfix about
     kmemleak'

   - Qi Zheng has improved our handling of memoryless nodes by keeping
     them off the allocation fallback list. This is done in the series
     'handle memoryless nodes more appropriately'

   - khugepaged conversions from Vishal Moola in the series 'Some
     khugepaged folio conversions'"

[ bcachefs conflicts with the dynamically allocated shrinkers have been
  resolved as per Stephen Rothwell in

     https://lore.kernel.org/all/20230913093553.4290421e@canb.auug.org.au/

  with help from Qi Zheng.

  The clone3 test filtering conflict was half-arsed by yours truly ]

* tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (406 commits)
  mm/damon/sysfs: update monitoring target regions for online input commit
  mm/damon/sysfs: remove requested targets when online-commit inputs
  selftests: add a sanity check for zswap
  Documentation: maple_tree: fix word spelling error
  mm/vmalloc: fix the unchecked dereference warning in vread_iter()
  zswap: export compression failure stats
  Documentation: ubsan: drop "the" from article title
  mempolicy: migration attempt to match interleave nodes
  mempolicy: mmap_lock is not needed while migrating folios
  mempolicy: alloc_pages_mpol() for NUMA policy without vma
  mm: add page_rmappable_folio() wrapper
  mempolicy: remove confusing MPOL_MF_LAZY dead code
  mempolicy: mpol_shared_policy_init() without pseudo-vma
  mempolicy trivia: use pgoff_t in shared mempolicy tree
  mempolicy trivia: slightly more consistent naming
  mempolicy trivia: delete those ancient pr_debug()s
  mempolicy: fix migrate_pages(2) syscall return nr_failed
  kernfs: drop shared NUMA mempolicy hooks
  hugetlbfs: drop shared NUMA mempolicy pretence
  mm/damon/sysfs-test: add a unit test for damon_sysfs_set_targets()
  ...
2023-11-02 19:38:47 -10:00
Linus Torvalds
6803bd7956 ARM:
* Generalized infrastructure for 'writable' ID registers, effectively
   allowing userspace to opt-out of certain vCPU features for its guest
 
 * Optimization for vSGI injection, opportunistically compressing MPIDR
   to vCPU mapping into a table
 
 * Improvements to KVM's PMU emulation, allowing userspace to select
   the number of PMCs available to a VM
 
 * Guest support for memory operation instructions (FEAT_MOPS)
 
 * Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing
   bugs and getting rid of useless code
 
 * Changes to the way the SMCCC filter is constructed, avoiding wasted
   memory allocations when not in use
 
 * Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing
   the overhead of errata mitigations
 
 * Miscellaneous kernel and selftest fixes
 
 LoongArch:
 
 * New architecture.  The hardware uses the same model as x86, s390
   and RISC-V, where guest/host mode is orthogonal to supervisor/user
   mode.  The virtualization extensions are very similar to MIPS,
   therefore the code also has some similarities but it's been cleaned
   up to avoid some of the historical bogosities that are found in
   arch/mips.  The kernel emulates MMU, timer and CSR accesses, while
   interrupt controllers are only emulated in userspace, at least for
   now.
 
 RISC-V:
 
 * Support for the Smstateen and Zicond extensions
 
 * Support for virtualizing senvcfg
 
 * Support for virtualized SBI debug console (DBCN)
 
 S390:
 
 * Nested page table management can be monitored through tracepoints
   and statistics
 
 x86:
 
 * Fix incorrect handling of VMX posted interrupt descriptor in KVM_SET_LAPIC,
   which could result in a dropped timer IRQ
 
 * Avoid WARN on systems with Intel IPI virtualization
 
 * Add CONFIG_KVM_MAX_NR_VCPUS, to allow supporting up to 4096 vCPUs without
   forcing more common use cases to eat the extra memory overhead.
 
 * Add virtualization support for AMD SRSO mitigation (IBPB_BRTYPE and
   SBPB, aka Selective Branch Predictor Barrier).
 
 * Fix a bug where restoring a vCPU snapshot that was taken within 1 second of
   creating the original vCPU would cause KVM to try to synchronize the vCPU's
   TSC and thus clobber the correct TSC being set by userspace.
 
 * Compute guest wall clock using a single TSC read to avoid generating an
   inaccurate time, e.g. if the vCPU is preempted between multiple TSC reads.
 
 * "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which complain
   about a "Firmware Bug" if the bit isn't set for select F/M/S combos.
   Likewise "virtualize" (ignore) MSR_AMD64_TW_CFG to appease Windows Server
   2022.
 
 * Don't apply side effects to Hyper-V's synthetic timer on writes from
   userspace to fix an issue where the auto-enable behavior can trigger
   spurious interrupts, i.e. do auto-enabling only for guest writes.
 
 * Remove an unnecessary kick of all vCPUs when synchronizing the dirty log
   without PML enabled.
 
 * Advertise "support" for non-serializing FS/GS base MSR writes as appropriate.
 
 * Harden the fast page fault path to guard against encountering an invalid
   root when walking SPTEs.
 
 * Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n.
 
 * Use the fast path directly from the timer callback when delivering Xen
   timer events, instead of waiting for the next iteration of the run loop.
   This was not done so far because previously proposed code had races,
   but now care is taken to stop the hrtimer at critical points such as
   restarting the timer or saving the timer information for userspace.
 
 * Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future flag.
 
 * Optimize injection of PMU interrupts that are simultaneous with NMIs.
 
 * Usual handful of fixes for typos and other warts.
 
 x86 - MTRR/PAT fixes and optimizations:
 
 * Clean up code that deals with honoring guest MTRRs when the VM has
   non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled.
 
 * Zap EPT entries when non-coherent DMA assignment stops/start to prevent
   using stale entries with the wrong memtype.
 
 * Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y.
   This was done as a workaround for virtual machine BIOSes that did not
   bother to clear CR0.CD (because ancient KVM/QEMU did not bother to
   set it, in turn), and there's zero reason to extend the quirk to
   also ignore guest PAT.
 
 x86 - SEV fixes:
 
 * Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts SHUTDOWN while
   running an SEV-ES guest.
 
 * Clean up the recognition of emulation failures on SEV guests, when KVM would
   like to "skip" the instruction but it had already been partially emulated.
   This makes it possible to drop a hack that second guessed the (insufficient)
   information provided by the emulator, and just do the right thing.
 
 Documentation:
 
 * Various updates and fixes, mostly for x86
 
 * MTRR and PAT fixes and optimizations:
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmVBZc0UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroP1LQf+NgsmZ1lkGQlKdSdijoQ856w+k0or
 l2SV1wUwiEdFPSGK+RTUlHV5Y1ni1dn/CqCVIJZKEI3ZtZ1m9/4HKIRXvbMwFHIH
 hx+E4Lnf8YUjsGjKTLd531UKcpphztZavQ6pXLEwazkSkDEra+JIKtooI8uU+9/p
 bd/eF1V+13a8CHQf1iNztFJVxqBJbVlnPx4cZDRQQvewskIDGnVDtwbrwCUKGtzD
 eNSzhY7si6O2kdQNkuA8xPhg29dYX9XLaCK2K1l8xOUm8WipLdtF86GAKJ5BVuOL
 6ek/2QCYjZ7a+coAZNfgSEUi8JmFHEqCo7cnKmWzPJp+2zyXsdudqAhT1g==
 =UIxm
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Generalized infrastructure for 'writable' ID registers, effectively
     allowing userspace to opt-out of certain vCPU features for its
     guest

   - Optimization for vSGI injection, opportunistically compressing
     MPIDR to vCPU mapping into a table

   - Improvements to KVM's PMU emulation, allowing userspace to select
     the number of PMCs available to a VM

   - Guest support for memory operation instructions (FEAT_MOPS)

   - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing
     bugs and getting rid of useless code

   - Changes to the way the SMCCC filter is constructed, avoiding wasted
     memory allocations when not in use

   - Load the stage-2 MMU context at vcpu_load() for VHE systems,
     reducing the overhead of errata mitigations

   - Miscellaneous kernel and selftest fixes

  LoongArch:

   - New architecture for kvm.

     The hardware uses the same model as x86, s390 and RISC-V, where
     guest/host mode is orthogonal to supervisor/user mode. The
     virtualization extensions are very similar to MIPS, therefore the
     code also has some similarities but it's been cleaned up to avoid
     some of the historical bogosities that are found in arch/mips. The
     kernel emulates MMU, timer and CSR accesses, while interrupt
     controllers are only emulated in userspace, at least for now.

  RISC-V:

   - Support for the Smstateen and Zicond extensions

   - Support for virtualizing senvcfg

   - Support for virtualized SBI debug console (DBCN)

  S390:

   - Nested page table management can be monitored through tracepoints
     and statistics

  x86:

   - Fix incorrect handling of VMX posted interrupt descriptor in
     KVM_SET_LAPIC, which could result in a dropped timer IRQ

   - Avoid WARN on systems with Intel IPI virtualization

   - Add CONFIG_KVM_MAX_NR_VCPUS, to allow supporting up to 4096 vCPUs
     without forcing more common use cases to eat the extra memory
     overhead.

   - Add virtualization support for AMD SRSO mitigation (IBPB_BRTYPE and
     SBPB, aka Selective Branch Predictor Barrier).

   - Fix a bug where restoring a vCPU snapshot that was taken within 1
     second of creating the original vCPU would cause KVM to try to
     synchronize the vCPU's TSC and thus clobber the correct TSC being
     set by userspace.

   - Compute guest wall clock using a single TSC read to avoid
     generating an inaccurate time, e.g. if the vCPU is preempted
     between multiple TSC reads.

   - "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which
     complain about a "Firmware Bug" if the bit isn't set for select
     F/M/S combos. Likewise "virtualize" (ignore) MSR_AMD64_TW_CFG to
     appease Windows Server 2022.

   - Don't apply side effects to Hyper-V's synthetic timer on writes
     from userspace to fix an issue where the auto-enable behavior can
     trigger spurious interrupts, i.e. do auto-enabling only for guest
     writes.

   - Remove an unnecessary kick of all vCPUs when synchronizing the
     dirty log without PML enabled.

   - Advertise "support" for non-serializing FS/GS base MSR writes as
     appropriate.

   - Harden the fast page fault path to guard against encountering an
     invalid root when walking SPTEs.

   - Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n.

   - Use the fast path directly from the timer callback when delivering
     Xen timer events, instead of waiting for the next iteration of the
     run loop. This was not done so far because previously proposed code
     had races, but now care is taken to stop the hrtimer at critical
     points such as restarting the timer or saving the timer information
     for userspace.

   - Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future
     flag.

   - Optimize injection of PMU interrupts that are simultaneous with
     NMIs.

   - Usual handful of fixes for typos and other warts.

  x86 - MTRR/PAT fixes and optimizations:

   - Clean up code that deals with honoring guest MTRRs when the VM has
     non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled.

   - Zap EPT entries when non-coherent DMA assignment stops/start to
     prevent using stale entries with the wrong memtype.

   - Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y

     This was done as a workaround for virtual machine BIOSes that did
     not bother to clear CR0.CD (because ancient KVM/QEMU did not bother
     to set it, in turn), and there's zero reason to extend the quirk to
     also ignore guest PAT.

  x86 - SEV fixes:

   - Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts
     SHUTDOWN while running an SEV-ES guest.

   - Clean up the recognition of emulation failures on SEV guests, when
     KVM would like to "skip" the instruction but it had already been
     partially emulated. This makes it possible to drop a hack that
     second guessed the (insufficient) information provided by the
     emulator, and just do the right thing.

  Documentation:

   - Various updates and fixes, mostly for x86

   - MTRR and PAT fixes and optimizations"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (164 commits)
  KVM: selftests: Avoid using forced target for generating arm64 headers
  tools headers arm64: Fix references to top srcdir in Makefile
  KVM: arm64: Add tracepoint for MMIO accesses where ISV==0
  KVM: arm64: selftest: Perform ISB before reading PAR_EL1
  KVM: arm64: selftest: Add the missing .guest_prepare()
  KVM: arm64: Always invalidate TLB for stage-2 permission faults
  KVM: x86: Service NMI requests after PMI requests in VM-Enter path
  KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI
  KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregs
  KVM: arm64: Refine _EL2 system register list that require trap reinjection
  arm64: Add missing _EL2 encodings
  arm64: Add missing _EL12 encodings
  KVM: selftests: aarch64: vPMU test for validating user accesses
  KVM: selftests: aarch64: vPMU register test for unimplemented counters
  KVM: selftests: aarch64: vPMU register test for implemented counters
  KVM: selftests: aarch64: Introduce vpmu_counter_access test
  tools: Import arm_pmuv3.h
  KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest
  KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run
  KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
  ...
2023-11-02 15:45:15 -10:00
Linus Torvalds
1e0c505e13 asm-generic updates for v6.7
The ia64 architecture gets its well-earned retirement as planned,
 now that there is one last (mostly) working release that will
 be maintained as an LTS kernel.
 
 The architecture specific system call tables are updated for
 the added map_shadow_stack() syscall and to remove references
 to the long-gone sys_lookup_dcookie() syscall.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmVC40IACgkQYKtH/8kJ
 Uidhmw/9EX+aWSXGoObJ3fngaNSMw+PmrEuP8qEKBHxfKHcCdX3hc451Oh4GlhaQ
 tru91pPwgNvN2/rfoKusxT+V4PemGIzfNni/04rp+P0kvmdw5otQ2yNhsQNsfVmq
 XGWvkxF4P2GO6bkjjfR/1dDq7GtlyXtwwPDKeLbYb6TnJOZjtx+EAN27kkfSn1Ms
 R4Sa3zJ+DfHUmHL5S9g+7UD/CZ5GfKNmIskI4Mz5GsfoUz/0iiU+Bge/9sdcdSJQ
 kmbLy5YnVzfooLZ3TQmBFsO3iAMWb0s/mDdtyhqhTVmTUshLolkPYyKnPFvdupyv
 shXcpEST2XJNeaDRnL2K4zSCdxdbnCZHDpjfl9wfioBg7I8NfhXKpf1jYZHH1de4
 LXq8ndEFEOVQw/zSpYWfQq1sux8Jiqr+UK/ukbVeFWiGGIUs91gEWtPAf8T0AZo9
 ujkJvaWGl98O1g5wmBu0/dAR6QcFJMDfVwbmlIFpU8O+MEaz6X8mM+O5/T0IyTcD
 eMbAUjj4uYcU7ihKzHEv/0SS9Of38kzff67CLN5k8wOP/9NlaGZ78o1bVle9b52A
 BdhrsAefFiWHp1jT6Y9Rg4HOO/TguQ9e6EWSKOYFulsiLH9LEFaB9RwZLeLytV0W
 vlAgY9rUW77g1OJcb7DoNv33nRFuxsKqsnz3DEIXtgozo9CzbYI=
 =H1vH
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull ia64 removal and asm-generic updates from Arnd Bergmann:

 - The ia64 architecture gets its well-earned retirement as planned,
   now that there is one last (mostly) working release that will be
   maintained as an LTS kernel.

 - The architecture specific system call tables are updated for the
   added map_shadow_stack() syscall and to remove references to the
   long-gone sys_lookup_dcookie() syscall.

* tag 'asm-generic-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  hexagon: Remove unusable symbols from the ptrace.h uapi
  asm-generic: Fix spelling of architecture
  arch: Reserve map_shadow_stack() syscall number for all architectures
  syscalls: Cleanup references to sys_lookup_dcookie()
  Documentation: Drop or replace remaining mentions of IA64
  lib/raid6: Drop IA64 support
  Documentation: Drop IA64 from feature descriptions
  kernel: Drop IA64 support from sig_fault handlers
  arch: Remove Itanium (IA-64) architecture
2023-11-01 15:28:33 -10:00
Linus Torvalds
56ec8e4cd8 arm64 updates for 6.7:
* Major refactoring of the CPU capability detection logic resulting in
   the removal of the cpus_have_const_cap() function and migrating the
   code to "alternative" branches where possible
 
 * Backtrace/kgdb: use IPIs and pseudo-NMI
 
 * Perf and PMU:
 
   - Add support for Ampere SoC PMUs
 
   - Multi-DTC improvements for larger CMN configurations with multiple
     Debug & Trace Controllers
 
   - Rework the Arm CoreSight PMU driver to allow separate registration of
     vendor backend modules
 
   - Fixes: add missing MODULE_DEVICE_TABLE to the amlogic perf
     driver; use device_get_match_data() in the xgene driver; fix NULL
     pointer dereference in the hisi driver caused by calling
     cpuhp_state_remove_instance(); use-after-free in the hisi driver
 
 * HWCAP updates:
 
   - FEAT_SVE_B16B16 (BFloat16)
 
   - FEAT_LRCPC3 (release consistency model)
 
   - FEAT_LSE128 (128-bit atomic instructions)
 
 * SVE: remove a couple of pseudo registers from the cpufeature code.
   There is logic in place already to detect mismatched SVE features
 
 * Miscellaneous:
 
   - Reduce the default swiotlb size (currently 64MB) if no ZONE_DMA
     bouncing is needed. The buffer is still required for small kmalloc()
     buffers
 
   - Fix module PLT counting with !RANDOMIZE_BASE
 
   - Restrict CPU_BIG_ENDIAN to LLVM IAS 15.x or newer move
     synchronisation code out of the set_ptes() loop
 
   - More compact cpufeature displaying enabled cores
 
   - Kselftest updates for the new CPU features
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmU7/QUACgkQa9axLQDI
 XvEx3xAAjICmHm+ryKJxS1IGXLYu2DXMcHUjeW6w1SxkK/vKhTMlHRx/CIWDze2l
 eENu7TcDLtTw+Gv9kqg30TSwzLfJhP9oFpX2T5TKkh5qlJlbz8fBtm+as14DTLCZ
 p2sra3J0w4B5JwTVqnj2RHOlEftMKvbyLGRkz3ve6wIUbsp5pXMkxAd/k3wOf0lC
 m6d9w1OMA2sOsw9YCgjcCNQGEzFMJk+13w7K+4w6A8Djn/Jxkt4fAFVn2ZlCiZzD
 NA2lTDWJqGmeGHo3iFdCTensWXmWTqjzxsNEf7PyBk5mBOdzDVxlTfEL7vnJg7gf
 BlTQ/nhIpra7rHQ9q2rwqEzbF+4Tn3uWlQfdDb7+/4goPjDh7tlBhEOYyOwTCEIT
 0t9cCSvBmSCKeXC3lKWWtJ+QJKhZHSmXN84EotTs65KyyfIsi4RuSezvV/+aIL86
 06sHYlYxETuujZP1cgOjf69Wsdsgizx0mqXJXf/xOjp22HFDcL4Bki6Rgi6t5OZj
 GEHG15kSE+eJ+RIpxpuAN8fdrlxYubsVLIksCqK7cZf9zXbQGIlifKAIrYiEx6kz
 FD+o+j/5niRWR6yJZCtCcGxqpSlwnYWPqc1Ds0GES8A/BphWMPozXUAZ0ll4Fnp1
 yyR2/Due/eBsCNESn579kP8989rashubB8vxvdx2fcWVtLC7VgE=
 =QaEo
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "No major architecture features this time around, just some new HWCAP
  definitions, support for the Ampere SoC PMUs and a few fixes/cleanups.

  The bulk of the changes is reworking of the CPU capability checking
  code (cpus_have_cap() etc).

   - Major refactoring of the CPU capability detection logic resulting
     in the removal of the cpus_have_const_cap() function and migrating
     the code to "alternative" branches where possible

   - Backtrace/kgdb: use IPIs and pseudo-NMI

   - Perf and PMU:

      - Add support for Ampere SoC PMUs

      - Multi-DTC improvements for larger CMN configurations with
        multiple Debug & Trace Controllers

      - Rework the Arm CoreSight PMU driver to allow separate
        registration of vendor backend modules

      - Fixes: add missing MODULE_DEVICE_TABLE to the amlogic perf
        driver; use device_get_match_data() in the xgene driver; fix
        NULL pointer dereference in the hisi driver caused by calling
        cpuhp_state_remove_instance(); use-after-free in the hisi driver

   - HWCAP updates:

      - FEAT_SVE_B16B16 (BFloat16)

      - FEAT_LRCPC3 (release consistency model)

      - FEAT_LSE128 (128-bit atomic instructions)

   - SVE: remove a couple of pseudo registers from the cpufeature code.
     There is logic in place already to detect mismatched SVE features

   - Miscellaneous:

      - Reduce the default swiotlb size (currently 64MB) if no ZONE_DMA
        bouncing is needed. The buffer is still required for small
        kmalloc() buffers

      - Fix module PLT counting with !RANDOMIZE_BASE

      - Restrict CPU_BIG_ENDIAN to LLVM IAS 15.x or newer move
        synchronisation code out of the set_ptes() loop

      - More compact cpufeature displaying enabled cores

      - Kselftest updates for the new CPU features"

 * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (83 commits)
  arm64: Restrict CPU_BIG_ENDIAN to GNU as or LLVM IAS 15.x or newer
  arm64: module: Fix PLT counting when CONFIG_RANDOMIZE_BASE=n
  arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a helper
  perf: hisi: Fix use-after-free when register pmu fails
  drivers/perf: hisi_pcie: Initialize event->cpu only on success
  drivers/perf: hisi_pcie: Check the type first in pmu::event_init()
  arm64: cpufeature: Change DBM to display enabled cores
  arm64: cpufeature: Display the set of cores with a feature
  perf/arm-cmn: Enable per-DTC counter allocation
  perf/arm-cmn: Rework DTC counters (again)
  perf/arm-cmn: Fix DTC domain detection
  drivers: perf: arm_pmuv3: Drop some unused arguments from armv8_pmu_init()
  drivers: perf: arm_pmuv3: Read PMMIR_EL1 unconditionally
  drivers/perf: hisi: use cpuhp_state_remove_instance_nocalls() for hisi_hns3_pmu uninit process
  clocksource/drivers/arm_arch_timer: limit XGene-1 workaround
  arm64: Remove system_uses_lse_atomics()
  arm64: Mark the 'addr' argument to set_ptes() and __set_pte_at() as unused
  drivers/perf: xgene: Use device_get_match_data()
  perf/amlogic: add missing MODULE_DEVICE_TABLE
  arm64/mm: Hoist synchronization out of set_ptes() loop
  ...
2023-11-01 09:34:55 -10:00
Paolo Bonzini
45b890f768 KVM/arm64 updates for 6.7
- Generalized infrastructure for 'writable' ID registers, effectively
    allowing userspace to opt-out of certain vCPU features for its guest
 
  - Optimization for vSGI injection, opportunistically compressing MPIDR
    to vCPU mapping into a table
 
  - Improvements to KVM's PMU emulation, allowing userspace to select
    the number of PMCs available to a VM
 
  - Guest support for memory operation instructions (FEAT_MOPS)
 
  - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing
    bugs and getting rid of useless code
 
  - Changes to the way the SMCCC filter is constructed, avoiding wasted
    memory allocations when not in use
 
  - Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing
    the overhead of errata mitigations
 
  - Miscellaneous kernel and selftest fixes
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCZUFJRgAKCRCivnWIJHzd
 FtgYAP9cMsc1Mhlw3jNQnTc6q0cbTulD/SoEDPUat1dXMqjs+gEAnskwQTrTX834
 fgGQeCAyp7Gmar+KeP64H0xm8kPSpAw=
 =R4M7
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for 6.7

 - Generalized infrastructure for 'writable' ID registers, effectively
   allowing userspace to opt-out of certain vCPU features for its guest

 - Optimization for vSGI injection, opportunistically compressing MPIDR
   to vCPU mapping into a table

 - Improvements to KVM's PMU emulation, allowing userspace to select
   the number of PMCs available to a VM

 - Guest support for memory operation instructions (FEAT_MOPS)

 - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing
   bugs and getting rid of useless code

 - Changes to the way the SMCCC filter is constructed, avoiding wasted
   memory allocations when not in use

 - Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing
   the overhead of errata mitigations

 - Miscellaneous kernel and selftest fixes
2023-10-31 16:37:07 -04:00
Linus Torvalds
3cf3fabccb Locking changes in this cycle are:
- Futex improvements:
 
     - Add the 'futex2' syscall ABI, which is an attempt to get away from the
       multiplex syscall and adds a little room for extentions, while lifting
       some limitations.
 
     - Fix futex PI recursive rt_mutex waiter state bug
 
     - Fix inter-process shared futexes on no-MMU systems
 
     - Use folios instead of pages
 
  - Micro-optimizations of locking primitives:
 
     - Improve arch_spin_value_unlocked() on asm-generic ticket spinlock
       architectures, to improve lockref code generation.
 
     - Improve the x86-32 lockref_get_not_zero() main loop by adding
       build-time CMPXCHG8B support detection for the relevant lockref code,
       and by better interfacing the CMPXCHG8B assembly code with the compiler.
 
     - Introduce arch_sync_try_cmpxchg() on x86 to improve sync_try_cmpxchg()
       code generation. Convert some sync_cmpxchg() users to sync_try_cmpxchg().
 
     - Micro-optimize rcuref_put_slowpath()
 
  - Locking debuggability improvements:
 
     - Improve CONFIG_DEBUG_RT_MUTEXES=y to have a fast-path as well
 
     - Enforce atomicity of sched_submit_work(), which is de-facto atomic but
       was un-enforced previously.
 
     - Extend <linux/cleanup.h>'s no_free_ptr() with __must_check semantics
 
     - Fix ww_mutex self-tests
 
     - Clean up const-propagation in <linux/seqlock.h> and simplify
       the API-instantiation macros a bit.
 
  - RT locking improvements:
 
     - Provide the rt_mutex_*_schedule() primitives/helpers and use them
       in the rtmutex code to avoid recursion vs. rtlock on the PI state.
 
     - Add nested blocking lockdep asserts to rt_mutex_lock(), rtlock_lock()
       and rwbase_read_lock().
 
  - Plus misc fixes & cleanups
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmU877IRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g9jw/+N7rxQ78dmFCYh4UWnLCYvuKP0/ivHErG
 493JcB8MupuA2tfJHIkDdr4aM2mNq2E61w69/WlZAQWWD6pdOhwgF5Xf5eoEcJm0
 vsAhWBGLxihXdtevPuMAx0dEpg3AMp2wc6i5PkN831KdPUgCNsrKq9Bfnfef7/G8
 MQTSHjmtba6jxleyxfEa4tE2xe5PJX825nRfkX2e1cf+stkYua+uJFxVxUfxFWGE
 4pBy70D9OC7MsJ44WWOA1gwkVtMMiBTmRPNjlP8Gz2GQ0f3ERHRwYk3jDHOPHZI6
 0GNt7pE3IMXQn2UuDtfkvv9IFTd+U5qD+APnWIn2ntWXqzGLFqOlmovMrobVn7El
 olYDCyweWPG71m1Qblsb1VK2QjRPQVJ9NAEg8RlDHIu2ThxHbMysDVGPVOYnPFq4
 S8QFpmldzbNoPU4rDJyT1fAmoUIrusBHkl+Us3yGfC74iM+fHnDEvaSoMZbzEdY1
 x/Nocj9XgKEgfXdYzrCWFmZ9xXqHkO25/wDL6yKqBdQtvaEalXuHTT6mQcYxrUPm
 Xx1BPan2Jg7p4u2oOFcVtKewUtRH9KBx8qytr5S+JK4PJbrBsixMnr84HLd/3X2V
 ykYkO+367T5MTYv4TnJDE5vdurzUqekKSCFPY3skPujPJfdLj1vsPzYf9iMkCLdo
 hU2f/R+Wpdk=
 =36Ff
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Info Molnar:
 "Futex improvements:

   - Add the 'futex2' syscall ABI, which is an attempt to get away from
     the multiplex syscall and adds a little room for extentions, while
     lifting some limitations.

   - Fix futex PI recursive rt_mutex waiter state bug

   - Fix inter-process shared futexes on no-MMU systems

   - Use folios instead of pages

  Micro-optimizations of locking primitives:

   - Improve arch_spin_value_unlocked() on asm-generic ticket spinlock
     architectures, to improve lockref code generation

   - Improve the x86-32 lockref_get_not_zero() main loop by adding
     build-time CMPXCHG8B support detection for the relevant lockref
     code, and by better interfacing the CMPXCHG8B assembly code with
     the compiler

   - Introduce arch_sync_try_cmpxchg() on x86 to improve
     sync_try_cmpxchg() code generation. Convert some sync_cmpxchg()
     users to sync_try_cmpxchg().

   - Micro-optimize rcuref_put_slowpath()

  Locking debuggability improvements:

   - Improve CONFIG_DEBUG_RT_MUTEXES=y to have a fast-path as well

   - Enforce atomicity of sched_submit_work(), which is de-facto atomic
     but was un-enforced previously.

   - Extend <linux/cleanup.h>'s no_free_ptr() with __must_check
     semantics

   - Fix ww_mutex self-tests

   - Clean up const-propagation in <linux/seqlock.h> and simplify the
     API-instantiation macros a bit

  RT locking improvements:

   - Provide the rt_mutex_*_schedule() primitives/helpers and use them
     in the rtmutex code to avoid recursion vs. rtlock on the PI state.

   - Add nested blocking lockdep asserts to rt_mutex_lock(),
     rtlock_lock() and rwbase_read_lock()

  .. plus misc fixes & cleanups"

* tag 'locking-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
  futex: Don't include process MM in futex key on no-MMU
  locking/seqlock: Fix grammar in comment
  alpha: Fix up new futex syscall numbers
  locking/seqlock: Propagate 'const' pointers within read-only methods, remove forced type casts
  locking/lockdep: Fix string sizing bug that triggers a format-truncation compiler-warning
  locking/seqlock: Change __seqprop() to return the function pointer
  locking/seqlock: Simplify SEQCOUNT_LOCKNAME()
  locking/atomics: Use atomic_try_cmpxchg_release() to micro-optimize rcuref_put_slowpath()
  locking/atomic, xen: Use sync_try_cmpxchg() instead of sync_cmpxchg()
  locking/atomic/x86: Introduce arch_sync_try_cmpxchg()
  locking/atomic: Add generic support for sync_try_cmpxchg() and its fallback
  locking/seqlock: Fix typo in comment
  futex/requeue: Remove unnecessary ‘NULL’ initialization from futex_proxy_trylock_atomic()
  locking/local, arch: Rewrite local_add_unless() as a static inline function
  locking/debug: Fix debugfs API return value checks to use IS_ERR()
  locking/ww_mutex/test: Make sure we bail out instead of livelock
  locking/ww_mutex/test: Fix potential workqueue corruption
  locking/ww_mutex/test: Use prng instead of rng to avoid hangs at bootup
  futex: Add sys_futex_requeue()
  futex: Add flags2 argument to futex_requeue()
  ...
2023-10-30 12:38:48 -10:00
Oliver Upton
123f42f0ad Merge branch kvm-arm64/pmu_pmcr_n into kvmarm/next
* kvm-arm64/pmu_pmcr_n:
  : User-defined PMC limit, courtesy Raghavendra Rao Ananta
  :
  : Certain VMMs may want to reserve some PMCs for host use while running a
  : KVM guest. This was a bit difficult before, as KVM advertised all
  : supported counters to the guest. Userspace can now limit the number of
  : advertised PMCs by writing to PMCR_EL0.N, as KVM's sysreg and PMU
  : emulation enforce the specified limit for handling guest accesses.
  KVM: selftests: aarch64: vPMU test for validating user accesses
  KVM: selftests: aarch64: vPMU register test for unimplemented counters
  KVM: selftests: aarch64: vPMU register test for implemented counters
  KVM: selftests: aarch64: Introduce vpmu_counter_access test
  tools: Import arm_pmuv3.h
  KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest
  KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run
  KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
  KVM: arm64: PMU: Set PMCR_EL0.N for vCPU based on the associated PMU
  KVM: arm64: PMU: Add a helper to read a vCPU's PMCR_EL0
  KVM: arm64: Select default PMU in KVM_ARM_VCPU_INIT handler
  KVM: arm64: PMU: Introduce helpers to set the guest's PMU

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30 20:24:19 +00:00
Oliver Upton
53ce49ea75 Merge branch kvm-arm64/mops into kvmarm/next
* kvm-arm64/mops:
  : KVM support for MOPS, courtesy of Kristina Martsenko
  :
  : MOPS adds new instructions for accelerating memcpy(), memset(), and
  : memmove() operations in hardware. This series brings virtualization
  : support for KVM guests, and allows VMs to run on asymmetrict systems
  : that may have different MOPS implementations.
  KVM: arm64: Expose MOPS instructions to guests
  KVM: arm64: Add handler for MOPS exceptions

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30 20:21:19 +00:00
Oliver Upton
a87a36436c Merge branch kvm-arm64/writable-id-regs into kvmarm/next
* kvm-arm64/writable-id-regs:
  : Writable ID registers, courtesy of Jing Zhang
  :
  : This series significantly expands the architectural feature set that
  : userspace can manipulate via the ID registers. A new ioctl is defined
  : that makes the mutable fields in the ID registers discoverable to
  : userspace.
  KVM: selftests: Avoid using forced target for generating arm64 headers
  tools headers arm64: Fix references to top srcdir in Makefile
  KVM: arm64: selftests: Test for setting ID register from usersapce
  tools headers arm64: Update sysreg.h with kernel sources
  KVM: selftests: Generate sysreg-defs.h and add to include path
  perf build: Generate arm64's sysreg-defs.h and add to include path
  tools: arm64: Add a Makefile for generating sysreg-defs.h
  KVM: arm64: Document vCPU feature selection UAPIs
  KVM: arm64: Allow userspace to change ID_AA64ZFR0_EL1
  KVM: arm64: Allow userspace to change ID_AA64PFR0_EL1
  KVM: arm64: Allow userspace to change ID_AA64MMFR{0-2}_EL1
  KVM: arm64: Allow userspace to change ID_AA64ISAR{0-2}_EL1
  KVM: arm64: Bump up the default KVM sanitised debug version to v8p8
  KVM: arm64: Reject attempts to set invalid debug arch version
  KVM: arm64: Advertise selected DebugVer in DBGDIDR.Version
  KVM: arm64: Use guest ID register values for the sake of emulation
  KVM: arm64: Document KVM_ARM_GET_REG_WRITABLE_MASKS
  KVM: arm64: Allow userspace to get the writable masks for feature ID registers

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30 20:21:09 +00:00
Oliver Upton
54b44ad26c Merge branch kvm-arm64/sgi-injection into kvmarm/next
* kvm-arm64/sgi-injection:
  : vSGI injection improvements + fixes, courtesy Marc Zyngier
  :
  : Avoid linearly searching for vSGI targets using a compressed MPIDR to
  : index a cache. While at it, fix some egregious bugs in KVM's mishandling
  : of vcpuid (user-controlled value) and vcpu_idx.
  KVM: arm64: Clarify the ordering requirements for vcpu/RD creation
  KVM: arm64: vgic-v3: Optimize affinity-based SGI injection
  KVM: arm64: Fast-track kvm_mpidr_to_vcpu() when mpidr_data is available
  KVM: arm64: Build MPIDR to vcpu index cache at runtime
  KVM: arm64: Simplify kvm_vcpu_get_mpidr_aff()
  KVM: arm64: Use vcpu_idx for invalidation tracking
  KVM: arm64: vgic: Use vcpu_idx for the debug information
  KVM: arm64: vgic-v2: Use cpuid from userspace as vcpu_id
  KVM: arm64: vgic-v3: Refactor GICv3 SGI generation
  KVM: arm64: vgic-its: Treat the collection target address as a vcpu_id
  KVM: arm64: vgic: Make kvm_vgic_inject_irq() take a vcpu pointer

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30 20:19:13 +00:00
Oliver Upton
df26b77915 Merge branch kvm-arm64/stage2-vhe-load into kvmarm/next
* kvm-arm64/stage2-vhe-load:
  : Setup stage-2 MMU from vcpu_load() for VHE
  :
  : Unlike nVHE, there is no need to switch the stage-2 MMU around on guest
  : entry/exit in VHE mode as the host is running at EL2. Despite this KVM
  : reloads the stage-2 on every guest entry, which is needless.
  :
  : This series moves the setup of the stage-2 MMU context to vcpu_load()
  : when running in VHE mode. This is likely to be a win across the board,
  : but also allows us to remove an ISB on the guest entry path for systems
  : with one of the speculative AT errata.
  KVM: arm64: Move VTCR_EL2 into struct s2_mmu
  KVM: arm64: Load the stage-2 MMU context in kvm_vcpu_load_vhe()
  KVM: arm64: Rename helpers for VHE vCPU load/put
  KVM: arm64: Reload stage-2 for VMID change on VHE
  KVM: arm64: Restore the stage-2 context in VHE's __tlb_switch_to_host()
  KVM: arm64: Don't zero VTTBR in __tlb_switch_to_host()

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30 20:18:56 +00:00
Oliver Upton
51e6079614 Merge branch kvm-arm64/nv-trap-fixes into kvmarm/next
* kvm-arm64/nv-trap-fixes:
  : NV trap forwarding fixes, courtesy Miguel Luis and Marc Zyngier
  :
  :  - Explicitly define the effects of HCR_EL2.NV on EL2 sysregs in the
  :    NV trap encoding
  :
  :  - Make EL2 registers that access AArch32 guest state UNDEF or RAZ/WI
  :    where appropriate for NV guests
  KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI
  KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregs
  KVM: arm64: Refine _EL2 system register list that require trap reinjection
  arm64: Add missing _EL2 encodings
  arm64: Add missing _EL12 encodings

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30 20:18:46 +00:00
Oliver Upton
25a35c1a3d Merge branch kvm-arm64/smccc-filter-cleanups into kvmarm/next
* kvm-arm64/smccc-filter-cleanups:
  : Cleanup the management of KVM's SMCCC maple tree
  :
  : Avoid the cost of maintaining the SMCCC filter maple tree if userspace
  : hasn't writen a rule to the filter. While at it, rip out the now
  : unnecessary VM flag to indicate whether or not the SMCCC filter was
  : configured.
  KVM: arm64: Use mtree_empty() to determine if SMCCC filter configured
  KVM: arm64: Only insert reserved ranges when SMCCC filter is used
  KVM: arm64: Add a predicate for testing if SMCCC filter is configured

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30 20:18:37 +00:00
Oliver Upton
d47dcb67fc Merge branch kvm-arm64/feature-flag-refactor into kvmarm/next
* kvm-arm64/feature-flag-refactor:
  : vCPU feature flag cleanup
  :
  : Clean up KVM's handling of vCPU feature flags to get rid of the
  : vCPU-scoped bitmaps and remove failure paths from kvm_reset_vcpu().
  KVM: arm64: Get rid of vCPU-scoped feature bitmap
  KVM: arm64: Remove unused return value from kvm_reset_vcpu()
  KVM: arm64: Hoist NV+SVE check into KVM_ARM_VCPU_INIT ioctl handler
  KVM: arm64: Prevent NV feature flag on systems w/o nested virt
  KVM: arm64: Hoist PAuth checks into KVM_ARM_VCPU_INIT ioctl
  KVM: arm64: Hoist SVE check into KVM_ARM_VCPU_INIT ioctl handler
  KVM: arm64: Hoist PMUv3 check into KVM_ARM_VCPU_INIT ioctl handler
  KVM: arm64: Add generic check for system-supported vCPU features

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30 20:18:14 +00:00
Oliver Upton
054056bf98 Merge branch kvm-arm64/misc into kvmarm/next
* kvm-arm64/misc:
  : Miscellaneous updates
  :
  :  - Put an upper bound on the number of I-cache invalidations by
  :    cacheline to avoid soft lockups
  :
  :  - Get rid of bogus refererence count transfer for THP mappings
  :
  :  - Do a local TLB invalidation on permission fault race
  :
  :  - Fixes for page_fault_test KVM selftest
  :
  :  - Add a tracepoint for detecting MMIO instructions unsupported by KVM
  KVM: arm64: Add tracepoint for MMIO accesses where ISV==0
  KVM: arm64: selftest: Perform ISB before reading PAR_EL1
  KVM: arm64: selftest: Add the missing .guest_prepare()
  KVM: arm64: Always invalidate TLB for stage-2 permission faults
  KVM: arm64: Do not transfer page refcount for THP adjustment
  KVM: arm64: Avoid soft lockups due to I-cache maintenance
  arm64: tlbflush: Rename MAX_TLBI_OPS
  KVM: arm64: Don't use kerneldoc comment for arm64_check_features()

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30 20:18:00 +00:00
Catalin Marinas
14dcf78a6c Merge branch 'for-next/cpus_have_const_cap' into for-next/core
* for-next/cpus_have_const_cap: (38 commits)
  : cpus_have_const_cap() removal
  arm64: Remove cpus_have_const_cap()
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_REPEAT_TLBI
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_NVIDIA_CARMEL_CNP
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_CAVIUM_23154
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_2645198
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1742098
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1542419
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_843419
  arm64: Avoid cpus_have_const_cap() for ARM64_UNMAP_KERNEL_AT_EL0
  arm64: Avoid cpus_have_const_cap() for ARM64_{SVE,SME,SME2,FA64}
  arm64: Avoid cpus_have_const_cap() for ARM64_SPECTRE_V2
  arm64: Avoid cpus_have_const_cap() for ARM64_SSBS
  arm64: Avoid cpus_have_const_cap() for ARM64_MTE
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_TLB_RANGE
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_WFXT
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_RNG
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_EPAN
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_PAN
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_GIC_PRIO_MASKING
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_DIT
  ...
2023-10-26 17:10:18 +01:00
Catalin Marinas
2baca17e6a Merge branch 'for-next/feat_lse128' into for-next/core
* for-next/feat_lse128:
  : HWCAP for FEAT_LSE128
  kselftest/arm64: add FEAT_LSE128 to hwcap test
  arm64: add FEAT_LSE128 HWCAP
2023-10-26 17:10:07 +01:00
Catalin Marinas
023113fe66 Merge branch 'for-next/feat_lrcpc3' into for-next/core
* for-next/feat_lrcpc3:
  : HWCAP for FEAT_LRCPC3
  selftests/arm64: add HWCAP2_LRCPC3 test
  arm64: add FEAT_LRCPC3 HWCAP
2023-10-26 17:10:05 +01:00
Catalin Marinas
2a3f8ce3bb Merge branch 'for-next/feat_sve_b16b16' into for-next/core
* for-next/feat_sve_b16b16:
  : Add support for FEAT_SVE_B16B16 (BFloat16)
  kselftest/arm64: Verify HWCAP2_SVE_B16B16
  arm64/sve: Report FEAT_SVE_B16B16 to userspace
2023-10-26 17:10:01 +01:00
Catalin Marinas
1519018ccb Merge branches 'for-next/sve-remove-pseudo-regs', 'for-next/backtrace-ipi', 'for-next/kselftest', 'for-next/misc' and 'for-next/cpufeat-display-cores', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf:
  perf: hisi: Fix use-after-free when register pmu fails
  drivers/perf: hisi_pcie: Initialize event->cpu only on success
  drivers/perf: hisi_pcie: Check the type first in pmu::event_init()
  perf/arm-cmn: Enable per-DTC counter allocation
  perf/arm-cmn: Rework DTC counters (again)
  perf/arm-cmn: Fix DTC domain detection
  drivers: perf: arm_pmuv3: Drop some unused arguments from armv8_pmu_init()
  drivers: perf: arm_pmuv3: Read PMMIR_EL1 unconditionally
  drivers/perf: hisi: use cpuhp_state_remove_instance_nocalls() for hisi_hns3_pmu uninit process
  drivers/perf: xgene: Use device_get_match_data()
  perf/amlogic: add missing MODULE_DEVICE_TABLE
  docs/perf: Add ampere_cspmu to toctree to fix a build warning
  perf: arm_cspmu: ampere_cspmu: Add support for Ampere SoC PMU
  perf: arm_cspmu: Support implementation specific validation
  perf: arm_cspmu: Support implementation specific filters
  perf: arm_cspmu: Split 64-bit write to 32-bit writes
  perf: arm_cspmu: Separate Arm and vendor module

* for-next/sve-remove-pseudo-regs:
  : arm64/fpsimd: Remove the vector length pseudo registers
  arm64/sve: Remove SMCR pseudo register from cpufeature code
  arm64/sve: Remove ZCR pseudo register from cpufeature code

* for-next/backtrace-ipi:
  : Add IPI for backtraces/kgdb, use NMI
  arm64: smp: Don't directly call arch_smp_send_reschedule() for wakeup
  arm64: smp: avoid NMI IPIs with broken MediaTek FW
  arm64: smp: Mark IPI globals as __ro_after_init
  arm64: kgdb: Implement kgdb_roundup_cpus() to enable pseudo-NMI roundup
  arm64: smp: IPI_CPU_STOP and IPI_CPU_CRASH_STOP should try for NMI
  arm64: smp: Add arch support for backtrace using pseudo-NMI
  arm64: smp: Remove dedicated wakeup IPI
  arm64: idle: Tag the arm64 idle functions as __cpuidle
  irqchip/gic-v3: Enable support for SGIs to act as NMIs

* for-next/kselftest:
  : Various arm64 kselftest updates
  kselftest/arm64: Validate SVCR in streaming SVE stress test

* for-next/misc:
  : Miscellaneous patches
  arm64: Restrict CPU_BIG_ENDIAN to GNU as or LLVM IAS 15.x or newer
  arm64: module: Fix PLT counting when CONFIG_RANDOMIZE_BASE=n
  arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a helper
  clocksource/drivers/arm_arch_timer: limit XGene-1 workaround
  arm64: Remove system_uses_lse_atomics()
  arm64: Mark the 'addr' argument to set_ptes() and __set_pte_at() as unused
  arm64/mm: Hoist synchronization out of set_ptes() loop
  arm64: swiotlb: Reduce the default size if no ZONE_DMA bouncing needed

* for-next/cpufeat-display-cores:
  : arm64 cpufeature display enabled cores
  arm64: cpufeature: Change DBM to display enabled cores
  arm64: cpufeature: Display the set of cores with a feature
2023-10-26 17:09:52 +01:00
Marc Zyngier
3f7915ccc9 KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI
When trapping accesses from a NV guest that tries to access
SPSR_{irq,abt,und,fiq}, make sure we handle them as RAZ/WI,
as if AArch32 wasn't implemented.

This involves a bit of repainting to make the visibility
handler more generic.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231023095444.1587322-6-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-25 00:25:06 +00:00
Miguel Luis
41f6c93447 arm64: Add missing _EL2 encodings
Some _EL2 encodings are missing. Add them.

Signed-off-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
[maz: dropped secure encodings]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231023095444.1587322-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-25 00:24:49 +00:00
Miguel Luis
d5cb781b77 arm64: Add missing _EL12 encodings
Some _EL12 encodings are missing. Add them.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Miguel Luis <miguel.luis@oracle.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231023095444.1587322-2-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-25 00:24:34 +00:00
Raghavendra Rao Ananta
4d20debf9c KVM: arm64: PMU: Set PMCR_EL0.N for vCPU based on the associated PMU
The number of PMU event counters is indicated in PMCR_EL0.N.
For a vCPU with PMUv3 configured, the value is set to the same
value as the current PE on every vCPU reset.  Unless the vCPU is
pinned to PEs that has the PMU associated to the guest from the
initial vCPU reset, the value might be different from the PMU's
PMCR_EL0.N on heterogeneous PMU systems.

Fix this by setting the vCPU's PMCR_EL0.N to the PMU's PMCR_EL0.N
value. Track the PMCR_EL0.N per guest, as only one PMU can be set
for the guest (PMCR_EL0.N must be the same for all vCPUs of the
guest), and it is convenient for updating the value.

To achieve this, the patch introduces a helper,
kvm_arm_pmu_get_max_counters(), that reads the maximum number of
counters from the arm_pmu associated to the VM. Make the function
global as upcoming patches will be interested to know the value
while setting the PMCR.N of the guest from userspace.

KVM does not yet support userspace modifying PMCR_EL0.N.
The following patch will add support for that.

Reviewed-by: Sebastian Ott <sebott@redhat.com>
Co-developed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Link: https://lore.kernel.org/r/20231020214053.2144305-5-rananta@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-24 22:59:30 +00:00
Marc Zyngier
fe49fd940e KVM: arm64: Move VTCR_EL2 into struct s2_mmu
We currently have a global VTCR_EL2 value for each guest, even
if the guest uses NV. This implies that the guest's own S2 must
fit in the host's. This is odd, for multiple reasons:

- the PARange values and the number of IPA bits don't necessarily
  match: you can have 33 bits of IPA space, and yet you can only
  describe 32 or 36 bits of PARange

- When userspace set the IPA space, it creates a contract with the
  kernel saying "this is the IPA space I'm prepared to handle".
  At no point does it constraint the guest's own IPA space as
  long as the guest doesn't try to use a [I]PA outside of the
  IPA space set by userspace

- We don't even try to hide the value of ID_AA64MMFR0_EL1.PARange.

And then there is the consequence of the above: if a guest tries
to create a S2 that has for input address something that is larger
than the IPA space defined by the host, we inject a fatal exception.

This is no good. For all intent and purposes, a guest should be
able to have the S2 it really wants, as long as the *output* address
of that S2 isn't outside of the IPA space.

For that, we need to have a per-s2_mmu VTCR_EL2 setting, which
allows us to represent the full PARange. Move the vctr field into
the s2_mmu structure, which has no impact whatsoever, except for NV.

Note that once we are able to override ID_AA64MMFR0_EL1.PARange
from userspace, we'll also be able to restrict the size of the
shadow S2 that NV uses.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231012205108.3937270-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-23 18:48:46 +00:00
Jeremy Linton
23b727dc20 arm64: cpufeature: Display the set of cores with a feature
The AMU feature can be enabled on a subset of the cores in a system.
Because of that, it prints a message for each core as it is detected.
This becomes tedious when there are hundreds of cores. Instead, for
CPU features which can be enabled on a subset of the present cores,
lets wait until update_cpu_capabilities() and print the subset of cores
the feature was enabled on.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@bytedance.com>
Tested-by: Punit Agrawal <punit.agrawal@bytedance.com>
Link: https://lore.kernel.org/r/20231017052322.1211099-2-jeremy.linton@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-23 16:40:37 +01:00
Oliver Upton
27cde4c0fe KVM: arm64: Rename helpers for VHE vCPU load/put
The names for the helpers we expose to the 'generic' KVM code are a bit
imprecise; we switch the EL0 + EL1 sysreg context and setup trap
controls that do not need to change for every guest entry/exit. Rename +
shuffle things around a bit in preparation for loading the stage-2 MMU
context on vcpu_load().

Link: https://lore.kernel.org/r/20231018233212.2888027-5-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-20 17:52:01 +00:00
Marc Zyngier
5eba523e1e KVM: arm64: Reload stage-2 for VMID change on VHE
Naturally, a change to the VMID for an MMU implies a new value for
VTTBR. Reload on VMID change in anticipation of loading stage-2 on
vcpu_load() instead of every guest entry.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231018233212.2888027-4-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-20 17:52:01 +00:00