Commit Graph

45 Commits

Author SHA1 Message Date
Alistair Popple 1af5a81099 mmu_notifiers: rename invalidate_range notifier
There are two main use cases for mmu notifiers.  One is by KVM which uses
mmu_notifier_invalidate_range_start()/end() to manage a software TLB.

The other is to manage hardware TLBs which need to use the
invalidate_range() callback because HW can establish new TLB entries at
any time.  Hence using start/end() can lead to memory corruption as these
callbacks happen too soon/late during page unmap.

mmu notifier users should therefore either use the start()/end() callbacks
or the invalidate_range() callbacks.  To make this usage clearer rename
the invalidate_range() callback to arch_invalidate_secondary_tlbs() and
update documention.

Link: https://lkml.kernel.org/r/6f77248cd25545c8020a54b4e567e8b72be4dca1.1690292440.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Andrew Donnellan <ajd@linux.ibm.com>
Cc: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Cc: Frederic Barrat <fbarrat@linux.ibm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Nicolin Chen <nicolinc@nvidia.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zhi Wang <zhi.wang.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:41 -07:00
Alistair Popple 6bbd42e2df mmu_notifiers: call invalidate_range() when invalidating TLBs
The invalidate_range() is going to become an architecture specific mmu
notifier used to keep the TLB of secondary MMUs such as an IOMMU in sync
with the CPU page tables.  Currently it is called from separate code paths
to the main CPU TLB invalidations.  This can lead to a secondary TLB not
getting invalidated when required and makes it hard to reason about when
exactly the secondary TLB is invalidated.

To fix this move the notifier call to the architecture specific TLB
maintenance functions for architectures that have secondary MMUs requiring
explicit software invalidations.

This fixes a SMMU bug on ARM64.  On ARM64 PTE permission upgrades require
a TLB invalidation.  This invalidation is done by the architecture
specific ptep_set_access_flags() which calls flush_tlb_page() if required.
However this doesn't call the notifier resulting in infinite faults being
generated by devices using the SMMU if it has previously cached a
read-only PTE in it's TLB.

Moving the invalidations into the TLB invalidation functions ensures all
invalidations happen at the same time as the CPU invalidation.  The
architecture specific flush_tlb_all() routines do not call the notifier as
none of the IOMMUs require this.

Link: https://lkml.kernel.org/r/0287ae32d91393a582897d6c4db6f7456b1001f2.1690292440.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Tested-by: SeongJae Park <sj@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Andrew Donnellan <ajd@linux.ibm.com>
Cc: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Cc: Frederic Barrat <fbarrat@linux.ibm.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Nicolin Chen <nicolinc@nvidia.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zhi Wang <zhi.wang.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:41 -07:00
Barry Song 43b3dfdd04 arm64: support batched/deferred tlb shootdown during page reclamation/migration
On x86, batched and deferred tlb shootdown has lead to 90% performance
increase on tlb shootdown.  on arm64, HW can do tlb shootdown without
software IPI.  But sync tlbi is still quite expensive.

Even running a simplest program which requires swapout can
prove this is true,
 #include <sys/types.h>
 #include <unistd.h>
 #include <sys/mman.h>
 #include <string.h>

 int main()
 {
 #define SIZE (1 * 1024 * 1024)
         volatile unsigned char *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                                          MAP_SHARED | MAP_ANONYMOUS, -1, 0);

         memset(p, 0x88, SIZE);

         for (int k = 0; k < 10000; k++) {
                 /* swap in */
                 for (int i = 0; i < SIZE; i += 4096) {
                         (void)p[i];
                 }

                 /* swap out */
                 madvise(p, SIZE, MADV_PAGEOUT);
         }
 }

Perf result on snapdragon 888 with 8 cores by using zRAM
as the swap block device.

 ~ # perf record taskset -c 4 ./a.out
 [ perf record: Woken up 10 times to write data ]
 [ perf record: Captured and wrote 2.297 MB perf.data (60084 samples) ]
 ~ # perf report
 # To display the perf.data header info, please use --header/--header-only options.
 # To display the perf.data header info, please use --header/--header-only options.
 #
 #
 # Total Lost Samples: 0
 #
 # Samples: 60K of event 'cycles'
 # Event count (approx.): 35706225414
 #
 # Overhead  Command  Shared Object      Symbol
 # ........  .......  .................  ......
 #
    21.07%  a.out    [kernel.kallsyms]  [k] _raw_spin_unlock_irq
     8.23%  a.out    [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
     6.67%  a.out    [kernel.kallsyms]  [k] filemap_map_pages
     6.16%  a.out    [kernel.kallsyms]  [k] __zram_bvec_write
     5.36%  a.out    [kernel.kallsyms]  [k] ptep_clear_flush
     3.71%  a.out    [kernel.kallsyms]  [k] _raw_spin_lock
     3.49%  a.out    [kernel.kallsyms]  [k] memset64
     1.63%  a.out    [kernel.kallsyms]  [k] clear_page
     1.42%  a.out    [kernel.kallsyms]  [k] _raw_spin_unlock
     1.26%  a.out    [kernel.kallsyms]  [k] mod_zone_state.llvm.8525150236079521930
     1.23%  a.out    [kernel.kallsyms]  [k] xas_load
     1.15%  a.out    [kernel.kallsyms]  [k] zram_slot_lock

ptep_clear_flush() takes 5.36% CPU in the micro-benchmark swapping in/out
a page mapped by only one process.  If the page is mapped by multiple
processes, typically, like more than 100 on a phone, the overhead would be
much higher as we have to run tlb flush 100 times for one single page. 
Plus, tlb flush overhead will increase with the number of CPU cores due to
the bad scalability of tlb shootdown in HW, so those ARM64 servers should
expect much higher overhead.

Further perf annonate shows 95% cpu time of ptep_clear_flush is actually
used by the final dsb() to wait for the completion of tlb flush.  This
provides us a very good chance to leverage the existing batched tlb in
kernel.  The minimum modification is that we only send async tlbi in the
first stage and we send dsb while we have to sync in the second stage.

With the above simplest micro benchmark, collapsed time to finish the
program decreases around 5%.

Typical collapsed time w/o patch:
 ~ # time taskset -c 4 ./a.out
 0.21user 14.34system 0:14.69elapsed
w/ patch:
 ~ # time taskset -c 4 ./a.out
 0.22user 13.45system 0:13.80elapsed

Also tested with benchmark in the commit on Kunpeng920 arm64 server
and observed an improvement around 12.5% with command
`time ./swap_bench`.
        w/o             w/
real    0m13.460s       0m11.771s
user    0m0.248s        0m0.279s
sys     0m12.039s       0m11.458s

Originally it's noticed a 16.99% overhead of ptep_clear_flush()
which has been eliminated by this patch:

[root@localhost yang]# perf record -- ./swap_bench && perf report
[...]
16.99%  swap_bench  [kernel.kallsyms]  [k] ptep_clear_flush

It is tested on 4,8,128 CPU platforms and shows to be beneficial on
large systems but may not have improvement on small systems like on
a 4 CPU platform.

Also this patch improve the performance of page migration. Using pmbench
and tries to migrate the pages of pmbench between node 0 and node 1 for
100 times for 1G memory, this patch decrease the time used around 20%
(prev 18.338318910 sec after 13.981866350 sec) and saved the time used
by ptep_clear_flush().

Link: https://lkml.kernel.org/r/20230717131004.12662-5-yangyicong@huawei.com
Tested-by: Yicong Yang <yangyicong@hisilicon.com>
Tested-by: Xin Hao <xhao@linux.alibaba.com>
Tested-by: Punit Agrawal <punit.agrawal@bytedance.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Xin Hao <xhao@linux.alibaba.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Nadav Amit <namit@vmware.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Barry Song <baohua@kernel.org>
Cc: Darren Hart <darren@os.amperecomputing.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: lipeifeng <lipeifeng@oppo.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zeng Tao <prime.zeng@hisilicon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:37 -07:00
Will Deacon 5e10f9887e arm64: mm: Fix TLBI vs ASID rollover
When switching to an 'mm_struct' for the first time following an ASID
rollover, a new ASID may be allocated and assigned to 'mm->context.id'.
This reassignment can happen concurrently with other operations on the
mm, such as unmapping pages and subsequently issuing TLB invalidation.

Consequently, we need to ensure that (a) accesses to 'mm->context.id'
are atomic and (b) all page-table updates made prior to a TLBI using the
old ASID are guaranteed to be visible to CPUs running with the new ASID.

This was found by inspection after reviewing the VMID changes from
Shameer but it looks like a real (yet hard to hit) bug.

Cc: <stable@vger.kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Jade Alglave <jade.alglave@arm.com>
Cc: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210806113109.2475-2-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-06 13:52:03 +01:00
Jason Wang 312b7104f3 arm64: fix typo in a comment
The double 'the' after 'If' in this comment "If the the TLB range ops
are supported..." is repeated. Consequently, one 'the' should be
removed from the comment.

Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
Link: https://lore.kernel.org/r/20210803142020.124230-1-wangborong@cdjrlc.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-03 16:19:33 +01:00
Sami Tolvanen 1764c3edc6 arm64: use a common .arch preamble for inline assembly
Commit 7c78f67e9b ("arm64: enable tlbi range instructions") breaks
LLVM's integrated assembler, because -Wa,-march is only passed to
external assemblers and therefore, the new instructions are not enabled
when IAS is used.

This change adds a common architecture version preamble, which can be
used in inline assembly blocks that contain instructions that require
a newer architecture version, and uses it to fix __TLBI_0 and __TLBI_1
with ARM64_TLB_RANGE.

Fixes: 7c78f67e9b ("arm64: enable tlbi range instructions")
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1106
Link: https://lore.kernel.org/r/20200827203608.1225689-1-samitolvanen@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-08-28 11:15:15 +01:00
Zhenyu Ye d1d3aa98b1 arm64: tlb: Use the TLBI RANGE feature in arm64
Add __TLBI_VADDR_RANGE macro and rewrite __flush_tlb_range().

When cpu supports TLBI feature, the minimum range granularity is
decided by 'scale', so we can not flush all pages by one instruction
in some cases.

For example, when the pages = 0xe81a, let's start 'scale' from
maximum, and find right 'num' for each 'scale':

1. scale = 3, we can flush no pages because the minimum range is
   2^(5*3 + 1) = 0x10000.
2. scale = 2, the minimum range is 2^(5*2 + 1) = 0x800, we can
   flush 0xe800 pages this time, the num = 0xe800/0x800 - 1 = 0x1c.
   Remaining pages is 0x1a;
3. scale = 1, the minimum range is 2^(5*1 + 1) = 0x40, no page
   can be flushed.
4. scale = 0, we flush the remaining 0x1a pages, the num =
   0x1a/0x2 - 1 = 0xd.

However, in most scenarios, the pages = 1 when flush_tlb_range() is
called. Start from scale = 3 or other proper value (such as scale =
ilog2(pages)), will incur extra overhead.
So increase 'scale' from 0 to maximum, the flush order is exactly
opposite to the example.

Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
Link: https://lore.kernel.org/r/20200715071945.897-4-yezhenyu2@huawei.com
[catalin.marinas@arm.com: removed unnecessary masks in __TLBI_VADDR_RANGE]
[catalin.marinas@arm.com: __TLB_RANGE_NUM subtracts 1]
[catalin.marinas@arm.com: minor adjustments to the comments]
[catalin.marinas@arm.com: introduce system_supports_tlb_range()]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-07-15 17:07:19 +01:00
Zhenyu Ye 61c11656b6 arm64: tlb: don't set the ttl value in flush_tlb_page_nosync
flush_tlb_page_nosync() may be called from pmd level, so we
can not set the ttl = 3 here.

The callstack is as follows:

	pmdp_set_access_flags
		ptep_set_access_flags
			flush_tlb_fix_spurious_fault
				flush_tlb_page
					flush_tlb_page_nosync

Fixes: e735b98a5f ("arm64: Add tlbi_user_level TLB invalidation helper")
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
Link: https://lore.kernel.org/r/20200710094158.468-1-yezhenyu2@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-07-10 16:27:49 +01:00
Catalin Marinas 34e36d81a0 arm64: Shift the __tlbi_level() indentation left
This is for consistency with the other __tlbi macros in this file. No
functional change.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-07-07 11:26:14 +01:00
Zhenyu Ye c4ab2cbc1d arm64: tlb: Set the TTL field in flush_tlb_range
This patch uses the cleared_* in struct mmu_gather to set the
TTL field in flush_tlb_range().

Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200625080314.230-6-yezhenyu2@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-07-07 11:23:47 +01:00
Zhenyu Ye e735b98a5f arm64: Add tlbi_user_level TLB invalidation helper
Add a level-hinted parameter to __tlbi_user, which only gets used
if ARMv8.4-TTL gets detected.

ARMv8.4-TTL provides the TTL field in tlbi instruction to indicate
the level of translation table walk holding the leaf entry for the
address that is being invalidated.

This patch set the default level value of flush_tlb_range() to 0,
which will be updated in future patches.  And set the ttl value of
flush_tlb_page_nosync() to 3 because it is only called to flush a
single pte page.

Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
Link: https://lore.kernel.org/r/20200625080314.230-4-yezhenyu2@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-07-07 11:23:46 +01:00
Marc Zyngier c10bc62ae4 arm64: Add level-hinted TLB invalidation helper
Add a level-hinted TLB invalidation helper that only gets used if
ARMv8.4-TTL gets detected.

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-07-07 09:27:15 +01:00
Will Deacon 51696d346c arm64: tlb: Ensure we execute an ISB following walk cache invalidation
05f2d2f83b ("arm64: tlbflush: Introduce __flush_tlb_kernel_pgtable")
added a new TLB invalidation helper which is used when freeing
intermediate levels of page table used for kernel mappings, but is
missing the required ISB instruction after completion of the TLBI
instruction.

Add the missing barrier.

Cc: <stable@vger.kernel.org>
Fixes: 05f2d2f83b ("arm64: tlbflush: Introduce __flush_tlb_kernel_pgtable")
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2019-08-27 17:38:26 +01:00
Thomas Gleixner caab277b1d treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license version 2 as
  published by the free software foundation this program is
  distributed in the hope that it will be useful but without any
  warranty without even the implied warranty of merchantability or
  fitness for a particular purpose see the gnu general public license
  for more details you should have received a copy of the gnu general
  public license along with this program if not see http www gnu org
  licenses

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 503 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Enrico Weigelt <info@metux.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19 17:09:07 +02:00
Will Deacon 01d57485fc arm64: tlbflush: Ensure start/end of address range are aligned to stride
Since commit 3d65b6bbc0 ("arm64: tlbi: Set MAX_TLBI_OPS to
PTRS_PER_PTE"), we resort to per-ASID invalidation when attempting to
perform more than PTRS_PER_PTE invalidation instructions in a single
call to __flush_tlb_range(). Whilst this is beneficial, the mmu_gather
code does not ensure that the end address of the range is rounded-up
to the stride when freeing intermediate page tables in pXX_free_tlb(),
which defeats our range checking.

Align the bounds passed into __flush_tlb_range().

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Reported-by: Hanjun Guo <guohanjun@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-06-12 16:19:45 +01:00
Linus Torvalds 5694cecdb0 arm64 festive updates for 4.21
In the end, we ended up with quite a lot more than I expected:
 
 - Support for ARMv8.3 Pointer Authentication in userspace (CRIU and
   kernel-side support to come later)
 
 - Support for per-thread stack canaries, pending an update to GCC that
   is currently undergoing review
 
 - Support for kexec_file_load(), which permits secure boot of a kexec
   payload but also happens to improve the performance of kexec
   dramatically because we can avoid the sucky purgatory code from
   userspace. Kdump will come later (requires updates to libfdt).
 
 - Optimisation of our dynamic CPU feature framework, so that all
   detected features are enabled via a single stop_machine() invocation
 
 - KPTI whitelisting of Cortex-A CPUs unaffected by Meltdown, so that
   they can benefit from global TLB entries when KASLR is not in use
 
 - 52-bit virtual addressing for userspace (kernel remains 48-bit)
 
 - Patch in LSE atomics for per-cpu atomic operations
 
 - Custom preempt.h implementation to avoid unconditional calls to
   preempt_schedule() from preempt_enable()
 
 - Support for the new 'SB' Speculation Barrier instruction
 
 - Vectorised implementation of XOR checksumming and CRC32 optimisations
 
 - Workaround for Cortex-A76 erratum #1165522
 
 - Improved compatibility with Clang/LLD
 
 - Support for TX2 system PMUS for profiling the L3 cache and DMC
 
 - Reflect read-only permissions in the linear map by default
 
 - Ensure MMIO reads are ordered with subsequent calls to Xdelay()
 
 - Initial support for memory hotplug
 
 - Tweak the threshold when we invalidate the TLB by-ASID, so that
   mremap() performance is improved for ranges spanning multiple PMDs.
 
 - Minor refactoring and cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABCgAGBQJcE4TmAAoJELescNyEwWM0Nr0H/iaU7/wQSzHyNXtZoImyKTul
 Blu2ga4/EqUrTU7AVVfmkl/3NBILWlgQVpY6tH6EfXQuvnxqD7CizbHyLdyO+z0S
 B5PsFUH2GLMNAi48AUNqGqkgb2knFbg+T+9IimijDBkKg1G/KhQnRg6bXX32mLJv
 Une8oshUPBVJMsHN1AcQknzKariuoE3u0SgJ+eOZ9yA2ZwKxP4yy1SkDt3xQrtI0
 lojeRjxcyjTP1oGRNZC+BWUtGOT35p7y6cGTnBd/4TlqBGz5wVAJUcdoxnZ6JYVR
 O8+ob9zU+4I0+SKt80s7pTLqQiL9rxkKZ5joWK1pr1g9e0s5N5yoETXKFHgJYP8=
 =sYdt
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 festive updates from Will Deacon:
 "In the end, we ended up with quite a lot more than I expected:

   - Support for ARMv8.3 Pointer Authentication in userspace (CRIU and
     kernel-side support to come later)

   - Support for per-thread stack canaries, pending an update to GCC
     that is currently undergoing review

   - Support for kexec_file_load(), which permits secure boot of a kexec
     payload but also happens to improve the performance of kexec
     dramatically because we can avoid the sucky purgatory code from
     userspace. Kdump will come later (requires updates to libfdt).

   - Optimisation of our dynamic CPU feature framework, so that all
     detected features are enabled via a single stop_machine()
     invocation

   - KPTI whitelisting of Cortex-A CPUs unaffected by Meltdown, so that
     they can benefit from global TLB entries when KASLR is not in use

   - 52-bit virtual addressing for userspace (kernel remains 48-bit)

   - Patch in LSE atomics for per-cpu atomic operations

   - Custom preempt.h implementation to avoid unconditional calls to
     preempt_schedule() from preempt_enable()

   - Support for the new 'SB' Speculation Barrier instruction

   - Vectorised implementation of XOR checksumming and CRC32
     optimisations

   - Workaround for Cortex-A76 erratum #1165522

   - Improved compatibility with Clang/LLD

   - Support for TX2 system PMUS for profiling the L3 cache and DMC

   - Reflect read-only permissions in the linear map by default

   - Ensure MMIO reads are ordered with subsequent calls to Xdelay()

   - Initial support for memory hotplug

   - Tweak the threshold when we invalidate the TLB by-ASID, so that
     mremap() performance is improved for ranges spanning multiple PMDs.

   - Minor refactoring and cleanups"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (125 commits)
  arm64: kaslr: print PHYS_OFFSET in dump_kernel_offset()
  arm64: sysreg: Use _BITUL() when defining register bits
  arm64: cpufeature: Rework ptr auth hwcaps using multi_entry_cap_matches
  arm64: cpufeature: Reduce number of pointer auth CPU caps from 6 to 4
  arm64: docs: document pointer authentication
  arm64: ptr auth: Move per-thread keys from thread_info to thread_struct
  arm64: enable pointer authentication
  arm64: add prctl control for resetting ptrauth keys
  arm64: perf: strip PAC when unwinding userspace
  arm64: expose user PAC bit positions via ptrace
  arm64: add basic pointer authentication support
  arm64/cpufeature: detect pointer authentication
  arm64: Don't trap host pointer auth use to EL2
  arm64/kvm: hide ptrauth from guests
  arm64/kvm: consistently handle host HCR_EL2 flags
  arm64: add pointer authentication register bits
  arm64: add comments about EC exception levels
  arm64: perf: Treat EXCLUDE_EL* bit definitions as unsigned
  arm64: kpti: Whitelist Cortex-A CPUs that don't implement the CSV3 field
  arm64: enable per-task stack canaries
  ...
2018-12-25 17:41:56 -08:00
Catalin Marinas ce8c80c536 arm64: Add workaround for Cortex-A76 erratum 1286807
On the affected Cortex-A76 cores (r0p0 to r3p0), if a virtual address
for a cacheable mapping of a location is being accessed by a core while
another core is remapping the virtual address to a new physical page
using the recommended break-before-make sequence, then under very rare
circumstances TLBI+DSB completes before a read using the translation
being invalidated has been observed by other observers. The workaround
repeats the TLBI+DSB operation and is shared with the Qualcomm Falkor
erratum 1009

Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-11-29 16:45:45 +00:00
Will Deacon 3d65b6bbc0 arm64: tlbi: Set MAX_TLBI_OPS to PTRS_PER_PTE
In order to reduce the possibility of soft lock-ups, we bound the
maximum number of TLBI operations performed by a single call to
flush_tlb_range() to an arbitrary constant of 1024.

Whilst this does the job of avoiding lock-ups, we can actually be a bit
smarter by defining this as PTRS_PER_PTE. Due to the structure of our
page tables, using PTRS_PER_PTE means that an outer loop calling
flush_tlb_range() for entire table entries will end up performing just a
single TLBI operation for each entry. As an example, mremap()ing a 1GB
range mapped using 4k pages now requires only 512 TLBI operations when
moving the page tables as opposed to 262144 operations (512*512) when
using the current threshold of 1024.

Cc: Joel Fernandes <joel@joelfernandes.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-27 19:01:21 +00:00
Alex Van Brunt 3403e56b41 arm64: mm: Don't wait for completion of TLB invalidation when page aging
When transitioning a PTE from young to old as part of page aging, we
can avoid waiting for the TLB invalidation to complete and therefore
drop the subsequent DSB instruction. Whilst this opens up a race with
page reclaim, where a PTE in active use via a stale, young TLB entry
does not update the underlying descriptor, the worst thing that happens
is that the page is reclaimed and then immediately faulted back in.

Given that we have a DSB in our context-switch path, the window for a
spurious reclaim is fairly limited and eliding the barrier claims to
boost NVMe/SSD accesses by over 10% on some platforms.

A similar optimisation was made for x86 in commit b13b1d2d86 ("x86/mm:
In the PTE swapout page reclaim case clear the accessed bit instead of
flushing the TLB").

Signed-off-by: Alex Van Brunt <avanbrunt@nvidia.com>
Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
[will: rewrote patch]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-26 16:59:46 +00:00
Will Deacon 7f08872774 arm64: tlb: Rewrite stale comment in asm/tlbflush.h
Peter Z asked me to justify the barrier usage in asm/tlbflush.h, but
actually that whole block comment needs to be rewritten.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-11 16:49:12 +01:00
Will Deacon ace8cb7545 arm64: tlb: Avoid synchronous TLBIs when freeing page tables
By selecting HAVE_RCU_TABLE_INVALIDATE, we can rely on tlb_flush() being
called if we fail to batch table pages for freeing. This in turn allows
us to postpone walk-cache invalidation until tlb_finish_mmu(), which
avoids lots of unnecessary DSBs and means we can shoot down the ASID if
the range is large enough.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-11 16:49:12 +01:00
Will Deacon 67a902ac59 arm64: tlbflush: Allow stride to be specified for __flush_tlb_range()
When we are unmapping intermediate page-table entries or huge pages, we
don't need to issue a TLBI instruction for every PAGE_SIZE chunk in the
VA range being unmapped.

Allow the invalidation stride to be passed to __flush_tlb_range(), and
adjust our "just nuke the ASID" heuristic to take this into account.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-11 16:49:11 +01:00
Will Deacon d8289d3a58 arm64: tlb: Justify non-leaf invalidation in flush_tlb_range()
Add a comment to explain why we can't get away with last-level
invalidation in flush_tlb_range()

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-11 16:49:11 +01:00
Will Deacon 45a284bc5e arm64: tlb: Add DSB ISHST prior to TLBI in __flush_tlb_[kernel_]pgtable()
__flush_tlb_[kernel_]pgtable() rely on set_pXd() having a DSB after
writing the new table entry and therefore avoid the barrier prior to the
TLBI instruction.

In preparation for delaying our walk-cache invalidation on the unmap()
path, move the DSB into the TLB invalidation routines.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-11 16:49:10 +01:00
Will Deacon 6899a4c82f arm64: tlb: Use last-level invalidation in flush_tlb_kernel_range()
flush_tlb_kernel_range() is only ever used to invalidate last-level
entries, so we can restrict the scope of the TLB invalidation
instruction.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-11 16:49:10 +01:00
Chintan Pandya 05f2d2f83b arm64: tlbflush: Introduce __flush_tlb_kernel_pgtable
Add an interface to invalidate intermediate page tables
from TLB for kernel.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Chintan Pandya <cpandya@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-06 13:17:14 +01:00
Philip Elcan 7f170499f7 arm64: tlbflush: avoid writing RES0 bits
Several of the bits of the TLBI register operand are RES0 per the ARM
ARM, so TLBI operations should avoid writing non-zero values to these
bits.

This patch adds a macro __TLBI_VADDR(addr, asid) that creates the
operand register in the correct format and honors the RES0 bits.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Philip Elcan <pelcan@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-28 15:20:17 +01:00
Will Deacon 9b0de864b5 arm64: mm: Invalidate both kernel and user ASIDs when performing TLBI
Since an mm has both a kernel and a user ASID, we need to ensure that
broadcast TLB maintenance targets both address spaces so that things
like CoW continue to work with the uaccess primitives in the kernel.

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-12-11 13:40:44 +00:00
Christopher Covington d9ff80f83e arm64: Work around Falkor erratum 1009
During a TLB invalidate sequence targeting the inner shareable domain,
Falkor may prematurely complete the DSB before all loads and stores using
the old translation are observed. Instruction fetches are not subject to
the conditions of this erratum. If the original code sequence includes
multiple TLB invalidate instructions followed by a single DSB, onle one of
the TLB instructions needs to be repeated to work around this erratum.
While the erratum only applies to cases in which the TLBI specifies the
inner-shareable domain (*IS form of TLBI) and the DSB is ISH form or
stronger (OSH, SYS), this changes applies the workaround overabundantly--
to local TLBI, DSB NSH sequences as well--for simplicity.

Based on work by Shanker Donthineni <shankerd@codeaurora.org>

Signed-off-by: Christopher Covington <cov@codeaurora.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-02-01 15:41:50 +00:00
Mark Rutland db68f3e759 arm64: tlbflush.h: add __tlbi() macro
As with dsb() and isb(), add a __tlbi() helper so that we can avoid
distracting asm boilerplate every time we want a TLBI. As some TLBI
operations take an argument while others do not, some pre-processor is
used to handle these two cases with different assembly blocks.

The existing tlbflush.h code is moved over to use the helper.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
[ rename helper to __tlbi, update comment and commit log ]
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-09-28 10:44:05 +01:00
Will Deacon 28c6fbc3b4 arm64: tlb: remove redundant barrier from __flush_tlb_pgtable
__flush_tlb_pgtable is used to invalidate intermediate page table
entries after they have been cleared and are about to be freed. Since
pXd_clear imply memory barriers, we don't need the extra one here.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-10-07 11:56:33 +01:00
Will Deacon f3e002c24e arm64: tlbflush: remove redundant ASID casts to (unsigned long)
The ASID macro returns a 64-bit (long long) value, so there is no need
to cast to (unsigned long) before shifting prior to a TLBI operation.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-10-07 11:56:06 +01:00
Will Deacon 8e63d38876 arm64: flush: use local TLB and I-cache invalidation
There are a number of places where a single CPU is running with a
private page-table and we need to perform maintenance on the TLB and
I-cache in order to ensure correctness, but do not require the operation
to be broadcast to other CPUs.

This patch adds local variants of tlb_flush_all and __flush_icache_all
to support these use-cases and updates the callers respectively.
__local_flush_icache_all also implies an isb, since it is intended to be
used synchronously.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David Daney <david.daney@cavium.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-10-07 11:45:27 +01:00
Catalin Marinas 4150e50bf5 arm64: Use last level TLBI for user pte changes
The flush_tlb_page() function is used on user address ranges when PTEs
(or PMDs/PUDs for huge pages) were changed (attributes or clearing). For
such cases, it is more efficient to invalidate only the last level of
the TLB with the "tlbi vale1is" instruction.

In the TLB shoot-down case, the TLB caching of the intermediate page
table levels (pmd, pud, pgd) is handled by __flush_tlb_pgtable() via the
__(pte|pmd|pud)_free_tlb() functions and it is not deferred to
tlb_finish_mmu() (as of commit 285994a62c - "arm64: Invalidate the TLB
corresponding to intermediate page table levels"). The tlb_flush()
function only needs to invalidate the TLB for the last level of page
tables; the __flush_tlb_range() function gains a fourth argument for
last level TLBI.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-28 11:44:01 +01:00
Catalin Marinas da4e73303e arm64: Clean up __flush_tlb(_kernel)_range functions
This patch moves the MAX_TLB_RANGE check into the
flush_tlb(_kernel)_range functions directly to avoid the
undescore-prefixed definitions (and for consistency with a subsequent
patch).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-28 11:43:15 +01:00
Will Deacon cba3574fd5 arm64: move update_mmu_cache() into asm/pgtable.h
Mark Brown reported an allnoconfig build failure in -next:

  Today's linux-next fails to build an arm64 allnoconfig due to "mm:
  make GUP handle pfn mapping unless FOLL_GET is requested" which
  causes:

  >       arm64-allnoconfig
  > ../mm/gup.c:51:4: error: implicit declaration of function
    'update_mmu_cache' [-Werror=implicit-function-declaration]

Fix the error by moving the function to asm/pgtable.h, as is the case
for most other architectures.

Reported-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-27 11:08:39 +01:00
Vladimir Murzin c7d6b573fe arm64: mm: remove reference to tlb.S from comment block
tlb.S has been removed since fa48e6f "arm64: mm: Optimise tlb flush logic
where we have >4K granule", so align comment with that.

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-06-12 15:36:52 +01:00
Catalin Marinas 285994a62c arm64: Invalidate the TLB corresponding to intermediate page table levels
The ARM architecture allows the caching of intermediate page table
levels and page table freeing requires a sequence like:

	pmd_clear()
	TLB invalidation
	pte page freeing

With commit 5e5f6dc105 (arm64: mm: enable HAVE_RCU_TABLE_FREE logic),
the page table freeing batching was moved from tlb_remove_page() to
tlb_remove_table(). The former takes care of TLB invalidation as this is
also shared with pte clearing and page cache page freeing. The latter,
however, does not invalidate the TLBs for intermediate page table levels
as it probably relies on the architecture code to do it if required.
When the mm->mm_users < 2, tlb_remove_table() does not do any batching
and page table pages are freed before tlb_finish_mmu() which performs
the actual TLB invalidation.

This patch introduces __tlb_flush_pgtable() for arm64 and calls it from
the {pte,pmd,pud}_free_tlb() directly without relying on deferred page
table freeing.

Fixes: 5e5f6dc105 arm64: mm: enable HAVE_RCU_TABLE_FREE logic
Reported-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
Tested-by: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-03-14 10:48:30 +00:00
Yingjoe Chen 06ff87bae8 arm64: mm: remove unused functions and variable protoypes
The functions __cpu_flush_user_tlb_range and __cpu_flush_kern_tlb_range
were removed in commit fa48e6f780 'arm64: mm: Optimise tlb flush logic
where we have >4K granule'. Global variable cpu_tlb was never used in
arm64.

Remove them.

Signed-off-by: Yingjoe Chen <yingjoe.chen@mediatek.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-02-26 18:25:38 +00:00
Mark Salter 05ac653054 arm64: fix soft lockup due to large tlb flush range
Under certain loads, this soft lockup has been observed:

   BUG: soft lockup - CPU#2 stuck for 22s! [ip6tables:1016]
   Modules linked in: ip6t_rpfilter ip6t_REJECT cfg80211 rfkill xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw vfat fat efivarfs xfs libcrc32c

   CPU: 2 PID: 1016 Comm: ip6tables Not tainted 3.13.0-0.rc7.30.sa2.aarch64 #1
   task: fffffe03e81d1400 ti: fffffe03f01f8000 task.ti: fffffe03f01f8000
   PC is at __cpu_flush_kern_tlb_range+0xc/0x40
   LR is at __purge_vmap_area_lazy+0x28c/0x3ac
   pc : [<fffffe000009c5cc>] lr : [<fffffe0000182710>] pstate: 80000145
   sp : fffffe03f01fbb70
   x29: fffffe03f01fbb70 x28: fffffe03f01f8000
   x27: fffffe0000b19000 x26: 00000000000000d0
   x25: 000000000000001c x24: fffffe03f01fbc50
   x23: fffffe03f01fbc58 x22: fffffe03f01fbc10
   x21: fffffe0000b2a3f8 x20: 0000000000000802
   x19: fffffe0000b2a3c8 x18: 000003fffdf52710
   x17: 000003ff9d8bb910 x16: fffffe000050fbfc
   x15: 0000000000005735 x14: 000003ff9d7e1a5c
   x13: 0000000000000000 x12: 000003ff9d7e1a5c
   x11: 0000000000000007 x10: fffffe0000c09af0
   x9 : fffffe0000ad1000 x8 : 000000000000005c
   x7 : fffffe03e8624000 x6 : 0000000000000000
   x5 : 0000000000000000 x4 : 0000000000000000
   x3 : fffffe0000c09cc8 x2 : 0000000000000000
   x1 : 000fffffdfffca80 x0 : 000fffffcd742150

The __cpu_flush_kern_tlb_range() function looks like:

  ENTRY(__cpu_flush_kern_tlb_range)
	dsb	sy
	lsr	x0, x0, #12
	lsr	x1, x1, #12
  1:	tlbi	vaae1is, x0
	add	x0, x0, #1
	cmp	x0, x1
	b.lo	1b
	dsb	sy
	isb
	ret
  ENDPROC(__cpu_flush_kern_tlb_range)

The above soft lockup shows the PC at tlbi insn with:

  x0 = 0x000fffffcd742150
  x1 = 0x000fffffdfffca80

So __cpu_flush_kern_tlb_range has 0x128ba930 tlbi flushes left
after it has already been looping for 23 seconds!.

Looking up one frame at __purge_vmap_area_lazy(), there is:

	...
	list_for_each_entry_rcu(va, &vmap_area_list, list) {
		if (va->flags & VM_LAZY_FREE) {
			if (va->va_start < *start)
				*start = va->va_start;
			if (va->va_end > *end)
				*end = va->va_end;
			nr += (va->va_end - va->va_start) >> PAGE_SHIFT;
			list_add_tail(&va->purge_list, &valist);
			va->flags |= VM_LAZY_FREEING;
			va->flags &= ~VM_LAZY_FREE;
		}
	}
	...
	if (nr || force_flush)
		flush_tlb_kernel_range(*start, *end);

So if two areas are being freed, the range passed to
flush_tlb_kernel_range() may be as large as the vmalloc
space. For arm64, this is ~240GB for 4k pagesize and ~2TB
for 64kpage size.

This patch works around this problem by adding a loop limit.
If the range is larger than the limit, use flush_tlb_all()
rather than flushing based on individual pages. The limit
chosen is arbitrary as the TLB size is implementation
specific and not accessible in an architected way. The aim
of the arbitrary limit is to avoid soft lockup.

Signed-off-by: Mark Salter <msalter@redhat.com>
[catalin.marinas@arm.com: commit log update]
[catalin.marinas@arm.com: marginal optimisation]
[catalin.marinas@arm.com: changed to MAX_TLB_RANGE and added comment]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-07-24 18:41:13 +01:00
Catalin Marinas 7f0b1bf045 arm64: Fix barriers used for page table modifications
The architecture specification states that both DSB and ISB are required
between page table modifications and subsequent memory accesses using the
corresponding virtual address. When TLB invalidation takes place, the
tlb_flush_* functions already have the necessary barriers. However, there are
other functions like create_mapping() for which this is not the case.

The patch adds the DSB+ISB instructions in the set_pte() function for
valid kernel mappings. The invalid pte case is handled by tlb_flush_*
and the user mappings in general have a corresponding update_mmu_cache()
call containing a DSB. Even when update_mmu_cache() isn't called, the
kernel can still cope with an unlikely spurious page fault by
re-executing the instruction.

In addition, the set_pmd, set_pud() functions gain an ISB for
architecture compliance when block mappings are created.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Leif Lindholm <leif.lindholm@linaro.org>
Acked-by: Steve Capper <steve.capper@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org>
2014-07-24 10:25:42 +01:00
Will Deacon 98f7685ee6 arm64: barriers: make use of barrier options with explicit barriers
When calling our low-level barrier macros directly, we can often suffice
with more relaxed behaviour than the default "all accesses, full system"
option.

This patch updates the users of dsb() to specify the option which they
actually require.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-09 17:03:15 +01:00
Steve Capper fa48e6f780 arm64: mm: Optimise tlb flush logic where we have >4K granule
The tlb maintainence functions: __cpu_flush_user_tlb_range and
__cpu_flush_kern_tlb_range do not take into consideration the page
granule when looping through the address range, and repeatedly flush
tlb entries for the same page when operating with 64K pages.

This patch re-works the logic s.t. we instead advance the loop by
 1 << (PAGE_SHIFT - 12), so avoid repeating ourselves.

Also the routines have been converted from assembler to static inline
functions to aid with legibility and potential compiler optimisations.

The isb() has been removed from flush_tlb_kernel_range(.) as it is
only needed when changing the execute permission of a mapping. If one
needs to set an area of the kernel as execute/non-execute an isb()
must be inserted after the call to flush_tlb_kernel_range.

Cc: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-09 17:00:48 +01:00
Steve Capper af07484863 ARM64: mm: THP support.
Bring Transparent HugePage support to ARM. The size of a
transparent huge page depends on the normal page size. A
transparent huge page is always represented as a pmd.

If PAGE_SIZE is 4KB, THPs are 2MB.
If PAGE_SIZE is 64KB, THPs are 512MB.

Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
2013-06-14 09:52:41 +01:00
Catalin Marinas 58d0ba578b arm64: TLB maintenance functionality
This patch adds the TLB maintenance functions. There is no distinction
made between the I and D TLBs. TLB maintenance operations are
automatically broadcast between CPUs in hardware. The inner-shareable
operations are always present, even on UP systems.

NOTE: Large part of this patch to be dropped once Peter Z's generic
mmu_gather patches are merged.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
2012-09-17 13:42:01 +01:00