Commit Graph

42 Commits

Author SHA1 Message Date
Matthew Wilcox (Oracle) 8b5989f333 arm: implement the new page table range API
Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio() and
flush_icache_pages().  Change the PG_dcache_clear flag from being per-page
to per-folio which makes __dma_page_dev_to_cpu() a bit more exciting. 
Also add flush_cache_pages(), even though this isn't used by generic code
(yet?)

[m.szyprowski@samsung.com: fix potential endless loop in __dma_page_dev_to_cpu()]
  Link: https://lkml.kernel.org/r/20230809172737.3574190-1-m.szyprowski@samsung.com
[willy@infradead.org: fix folio conversion in __dma_page_dev_to_cpu()]
  Link: https://lkml.kernel.org/r/20230823191852.1556561-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20230802151406.3735276-10-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24 16:20:20 -07:00
Hugh Dickins de2e4626c7 arm: adjust_pte() use pte_offset_map_nolock()
Instead of pte_lockptr(), use the recently added pte_offset_map_nolock()
in adjust_pte(): because it gives the not-locked ptl for precisely that
pte, which the caller can then safely lock; whereas pte_lockptr() is not
so tightly coupled, because it dereferences the pmd pointer again.

Link: https://lkml.kernel.org/r/4d5258bd-ffa0-018-253a-25f2c9b783f7@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: SeongJae Park <sj@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zack Rusin <zackr@vmware.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:23 -07:00
Hugh Dickins 766b59e876 arm: allow pte_offset_map[_lock]() to fail
Patch series "arch: allow pte_offset_map[_lock]() to fail", v2.

What is it all about?  Some mmap_lock avoidance i.e.  latency reduction. 
Initially just for the case of collapsing shmem or file pages to THPs; but
likely to be relied upon later in other contexts e.g.  freeing of empty
page tables (but that's not work I'm doing).  mmap_write_lock avoidance
when collapsing to anon THPs?  Perhaps, but again that's not work I've
done: a quick attempt was not as easy as the shmem/file case.

I would much prefer not to have to make these small but wide-ranging
changes for such a niche case; but failed to find another way, and have
heard that shmem MADV_COLLAPSE's usefulness is being limited by that
mmap_write_lock it currently requires.

These changes (though of course not these exact patches, and not all of
these architectures!) have been in Google's data centre kernel for three
years now: we do rely upon them.

What are the per-arch changes about?  Generally, two things.

One: the current mmap locking may not be enough to guard against that
tricky transition between pmd entry pointing to page table, and empty pmd
entry, and pmd entry pointing to huge page: pte_offset_map() will have to
validate the pmd entry for itself, returning NULL if no page table is
there.  What to do about that varies: often the nearby error handling
indicates just to skip it; but in some cases a "goto again" looks
appropriate (and if that risks an infinite loop, then there must have been
an oops, or pfn 0 mistaken for page table, before).

Deeper study of each site might show that 90% of them here in arch code
could only fail if there's corruption e.g.  a transition to THP would be
surprising on an arch without HAVE_ARCH_TRANSPARENT_HUGEPAGE.  But given
the likely extension to freeing empty page tables, I have not limited this
set of changes to THP; and it has been easier, and sets a better example,
if each site is given appropriate handling.

Two: pte_offset_map() will need to do an rcu_read_lock(), with the
corresponding rcu_read_unlock() in pte_unmap().  But most architectures
never supported CONFIG_HIGHPTE, so some don't always call pte_unmap()
after pte_offset_map(), or have used userspace pte_offset_map() where
pte_offset_kernel() is more correct.  No problem in the current tree, but
a problem once an rcu_read_unlock() will be needed to keep balance.

A common special case of that comes in arch/*/mm/hugetlbpage.c, if the
architecture supports hugetlb pages down at the lowest PTE level. 
huge_pte_alloc() uses pte_alloc_map(), but generic hugetlb code does no
corresponding pte_unmap(); similarly for huge_pte_offset().

In rare transient cases, not yet made possible, pte_offset_map() and
pte_offset_map_lock() may not find a page table: handle appropriately.

Link: https://lkml.kernel.org/r/a4963be9-7aa6-350-66d0-2ba843e1af44@google.com
Link: https://lkml.kernel.org/r/813429a1-204a-1844-eeae-7fd72826c28@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: John David Anglin <dave.anglin@bell.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Will Deacon <will@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:19:05 -07:00
Mike Rapoport e31cf2f4ca mm: don't include asm/pgtable.h if linux/mm.h is already included
Patch series "mm: consolidate definitions of page table accessors", v2.

The low level page table accessors (pXY_index(), pXY_offset()) are
duplicated across all architectures and sometimes more than once.  For
instance, we have 31 definition of pgd_offset() for 25 supported
architectures.

Most of these definitions are actually identical and typically it boils
down to, e.g.

static inline unsigned long pmd_index(unsigned long address)
{
        return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
}

static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
{
        return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
}

These definitions can be shared among 90% of the arches provided
XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.

For architectures that really need a custom version there is always
possibility to override the generic version with the usual ifdefs magic.

These patches introduce include/linux/pgtable.h that replaces
include/asm-generic/pgtable.h and add the definitions of the page table
accessors to the new header.

This patch (of 12):

The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
functions involving page table manipulations, e.g.  pte_alloc() and
pmd_alloc().  So, there is no point to explicitly include <asm/pgtable.h>
in the files that include <linux/mm.h>.

The include statements in such cases are remove with a simple loop:

	for f in $(git grep -l "include <linux/mm.h>") ; do
		sed -i -e '/include <asm\/pgtable.h>/ d' $f
	done

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:13 -07:00
Mike Rapoport 84e6ffb2c4 arm: add support for folded p4d page tables
Implement primitives necessary for the 4th level folding, add walks of p4d
level where appropriate, and remove __ARCH_USE_5LEVEL_HACK.

[rppt@linux.ibm.com: fix kexec]
  Link: http://lkml.kernel.org/r/20200508174232.GA759899@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: James Morse <james.morse@arm.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200414153455.21744-3-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 19:06:21 -07:00
Thomas Gleixner d2912cb15b treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500
Based on 2 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license version 2 as
  published by the free software foundation

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license version 2 as
  published by the free software foundation #

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 4122 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19 17:09:55 +02:00
Huang Ying cb9f753a37 mm: fix races between swapoff and flush dcache
Thanks to commit 4b3ef9daa4 ("mm/swap: split swap cache into 64MB
trunks"), after swapoff the address_space associated with the swap
device will be freed.  So page_mapping() users which may touch the
address_space need some kind of mechanism to prevent the address_space
from being freed during accessing.

The dcache flushing functions (flush_dcache_page(), etc) in architecture
specific code may access the address_space of swap device for anonymous
pages in swap cache via page_mapping() function.  But in some cases
there are no mechanisms to prevent the swap device from being swapoff,
for example,

  CPU1					CPU2
  __get_user_pages()			swapoff()
    flush_dcache_page()
      mapping = page_mapping()
        ...				  exit_swap_address_space()
        ...				    kvfree(spaces)
        mapping_mapped(mapping)

The address space may be accessed after being freed.

But from cachetlb.txt and Russell King, flush_dcache_page() only care
about file cache pages, for anonymous pages, flush_anon_page() should be
used.  The implementation of flush_dcache_page() in all architectures
follows this too.  They will check whether page_mapping() is NULL and
whether mapping_mapped() is true to determine whether to flush the
dcache immediately.  And they will use interval tree (mapping->i_mmap)
to find all user space mappings.  While mapping_mapped() and
mapping->i_mmap isn't used by anonymous pages in swap cache at all.

So, to fix the race between swapoff and flush dcache, __page_mapping()
is add to return the address_space for file cache pages and NULL
otherwise.  All page_mapping() invoking in flush dcache functions are
replaced with page_mapping_file().

[akpm@linux-foundation.org: simplify page_mapping_file(), per Mike]
Link: http://lkml.kernel.org/r/20180305083634.15174-1-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Zankel <chris@zankel.net>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:26 -07:00
Russell King 4ed89f2228 ARM: convert printk(KERN_* to pr_*
Convert many (but not all) printk(KERN_* to pr_* to simplify the code.
We take the opportunity to join some printk lines together so we don't
split the message across several lines, and we also add a few levels
to some messages which were previously missing them.

Tested-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-11-21 15:24:50 +00:00
Kirill A. Shutemov 57c1ffcefb mm: rename USE_SPLIT_PTLOCKS to USE_SPLIT_PTE_PTLOCKS
We're going to introduce split page table lock for PMD level.  Let's
rename existing split ptlock for PTE level to avoid confusion.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Alex Thorlton <athorlton@sgi.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-15 09:32:14 +09:00
Michel Lespinasse 6b2dbba8b6 mm: replace vma prio_tree with an interval tree
Implement an interval tree as a replacement for the VMA prio_tree.  The
algorithms are similar to lib/interval_tree.c; however that code can't be
directly reused as the interval endpoints are not explicitly stored in the
VMA.  So instead, the common algorithm is moved into a template and the
details (node type, how to get interval endpoints from the node, etc) are
filled in using the C preprocessor.

Once the interval tree functions are available, using them as a
replacement to the VMA prio tree is a relatively simple, mechanical job.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:39 +09:00
Paul Gortmaker d91ef63bd5 arm: remove several unnecessary module.h include instances
Building these files does not reveal a hidden need for
any of these.  Since module.h brings in the whole kitchen
sink, it just needlessly adds 30k+ lines to the cpp burden.

There are probably lots more, but ARM files of mach-* and plat-*
don't get coverage via a simple yesconfig build.  They will have
to be cleaned up and tested via using their respective configs.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:48 -04:00
Russell King 516295e5ab ARM: pgtable: add pud-level code
Add pud_offset() et.al. between the pgd and pmd code in preparation of
using pgtable-nopud.h rather than 4level-fixup.h.

This incorporates a fix from Jamie Iles <jamie@jamieiles.com> for
uaccess_with_memcpy.c.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-02-21 19:24:14 +00:00
Russell King f6e3354d02 ARM: pgtable: introduce pteval_t to represent a pte value
This makes everywhere dealing with pte values use the same type.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-11-26 20:45:47 +00:00
Mika Westerberg 4e54d93d3c ARM: 6464/2: fix spinlock recursion in adjust_pte()
When running following code in a machine which has VIVT caches and
USE_SPLIT_PTLOCKS is not defined:

  fd = open("/etc/passwd", O_RDONLY);
  addr = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
  addr2 = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);

  v = *((int *)addr);

we will hang in spinlock recursion in the page fault handler:

  BUG: spinlock recursion on CPU#0, mmap_test/717
  lock: c5e295d8, .magic: dead4ead, .owner: mmap_test/717,
                  .owner_cpu: 0
  [<c0026604>] (unwind_backtrace+0x0/0xec)
  [<c014ee48>] (do_raw_spin_lock+0x40/0x140)
  [<c0027f68>] (update_mmu_cache+0x208/0x250)
  [<c0079db4>] (__do_fault+0x320/0x3ec)
  [<c007af7c>] (handle_mm_fault+0x2f0/0x6d8)
  [<c0027834>] (do_page_fault+0xdc/0x1cc)
  [<c00202d0>] (do_DataAbort+0x34/0x94)

This comes from the fact that when USE_SPLIT_PTLOCKS is not defined,
the only lock protecting the page tables is mm->page_table_lock
which is already locked before update_mmu_cache() is called.

Signed-off-by: Mika Westerberg <mika.westerberg@iki.fi>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-10-28 13:53:47 +01:00
Peter Zijlstra ece0e2b640 mm: remove pte_*map_nested()
Since we no longer need to provide KM_type, the whole pte_*map_nested()
API is now redundant, remove it.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 16:52:08 -07:00
Catalin Marinas 6012191aa9 ARM: 6380/1: Introduce __sync_icache_dcache() for VIPT caches
On SMP systems, there is a small chance of a PTE becoming visible to a
different CPU before the current cache maintenance operations in
update_mmu_cache(). To avoid this, cache maintenance must be handled in
set_pte_at() (similar to IA-64 and PowerPC).

This patch provides a unified VIPT cache handling mechanism and
implements the __sync_icache_dcache() function for ARMv6 onwards
architectures. It is called from set_pte_at() and replaces the
update_mmu_cache(). The latter is still used on VIVT hardware where a
vm_area_struct is required.

Tested-by: Rabin Vincent <rabin.vincent@stericsson.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-09-19 12:17:44 +01:00
Catalin Marinas c01778001a ARM: 6379/1: Assume new page cache pages have dirty D-cache
There are places in Linux where writes to newly allocated page cache
pages happen without a subsequent call to flush_dcache_page() (several
PIO drivers including USB HCD). This patch changes the meaning of
PG_arch_1 to be PG_dcache_clean and always flush the D-cache for a newly
mapped page in update_mmu_cache().

The patch also sets the PG_arch_1 bit in the DMA cache maintenance
function to avoid additional cache flushing in update_mmu_cache().

Tested-by: Rabin Vincent <rabin.vincent@stericsson.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-09-19 12:17:43 +01:00
Russell King ac1d426e82 Merge branch 'devel-stable' into devel
Conflicts:
	arch/arm/Kconfig
	arch/arm/include/asm/system.h
	arch/arm/mm/Kconfig
2010-05-17 17:24:04 +01:00
Russell King f76348a360 ARM: remove unnecessary cache flush
This cache flush occurs when we first insert a page into the page
tables, where a page did not exist previously.  There can be no
cache lines associated with this virtual mapping, so this cache
flush is redundant.

Tested-by: Mike Rapoport <mike@compulab.co.il>
Tested-by: Mikael Pettersson <mikpe at it.uu.se>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-04-14 13:13:25 +01:00
Tejun Heo 5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Russell King ae1402022e ARM: make_coherent(): fix problems with highpte, part 2
update_mmu_cache() is called with the page table for the faulted-in
page still mapped.  We need to modify the PTE for this page to ensure
coherency with other shared mappings when multiple shared mappings
exist within a MM.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-02-20 16:42:51 +00:00
Russell King 4b3073e1c5 MM: Pass a PTE pointer to update_mmu_cache() rather than the PTE itself
On VIVT ARM, when we have multiple shared mappings of the same file
in the same MM, we need to ensure that we have coherency across all
copies.  We do this via make_coherent() by making the pages
uncacheable.

This used to work fine, until we allowed highmem with highpte - we
now have a page table which is mapped as required, and is not available
for modification via update_mmu_cache().

Ralf Beache suggested getting rid of the PTE value passed to
update_mmu_cache():

  On MIPS update_mmu_cache() calls __update_tlb() which walks pagetables
  to construct a pointer to the pte again.  Passing a pte_t * is much
  more elegant.  Maybe we might even replace the pte argument with the
  pte_t?

Ben Herrenschmidt would also like the pte pointer for PowerPC:

  Passing the ptep in there is exactly what I want.  I want that
  -instead- of the PTE value, because I have issue on some ppc cases,
  for I$/D$ coherency, where set_pte_at() may decide to mask out the
  _PAGE_EXEC.

So, pass in the mapped page table pointer into update_mmu_cache(), and
remove the PTE value, updating all implementations and call sites to
suit.

Includes a fix from Stephen Rothwell:

  sparc: fix fallout from update_mmu_cache API change

  Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-02-20 16:41:46 +00:00
Russell King ed42acaef1 ARM: make_coherent: avoid recalculating the pfn for the modified page
We already know the pfn for the page to be modified in make_coherent,
so let's stop recalculating it unnecessarily.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-01-20 13:48:30 +00:00
Russell King 56dd47098a ARM: make_coherent: fix problems with highpte, part 1
update_mmu_cache() is called with a page table already mapped.  We
call make_coherent(), which then calls adjust_pte() which wants to
map other page tables.  This causes kmap_atomic() to BUG() because
the slot its trying to use is already taken.

Since do_adjust_pte() modifies the page tables, we are also missing
any form of locking, so we're risking corrupting the page tables.

Fix this by using pte_offset_map_nested(), and taking the pte page
table lock around do_adjust_pte().

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-01-20 13:48:30 +00:00
Russell King f8a85f1164 ARM: make_coherent: convert adjust_pte() to use p*d_none_or_clear_bad()
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-01-20 13:48:29 +00:00
Russell King c26c20b823 ARM: make_coherent: split adjust_pte() in two
adjust_pte() walks the page tables, and do_adjust_pte() does the
page table manipulation.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-01-20 13:48:29 +00:00
Russell King 52e8bfd81a ARM: Fix wrong shared bit for CPU write buffer bug test
It is unpredictable to have the same memory mapped using different
shared bit settings for ARMv6 and ARMv7 CPUs.  Fix this for the CPU
write buffer bug test.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-12-23 19:54:31 +00:00
Russell King 7b0a1003e7 ARM: Reduce __flush_dcache_page() visibility
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-12-04 14:58:50 +00:00
Russell King 421fe93cc4 ARM: ZERO_PAGE: Avoid flush_dcache_page() for zero page
The zero page is read-only, and has its cache state cleared during
boot.  No further maintanence for this page is required.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-12-01 18:20:07 +00:00
Nitin Gupta 787b2faadc ARM: force dcache flush if dcache_dirty bit set
On ARM, update_mmu_cache() does dcache flush for a page only if
it has a kernel mapping (page_mapping(page) != NULL). The correct
behavior would be to force the flush based on dcache_dirty bit only.

One of the cases where present logic would be a problem is when
a RAM based block device[1] is used as a swap disk. In this case,
we would have in-memory data corruption as shown in steps below:

do_swap_page()
{
    - Allocate a new page (if not already in swap cache)
    - Issue read from swap disk
        - Block driver issues flush_dcache_page()
        - flush_dcache_page() simply sets PG_dcache_dirty bit and does not
          actually issue a flush since this page has no user space mapping yet.
    - Now, if swap disk is almost full, this newly read page is removed
      from swap cache and corrsponding swap slot is freed.
    - Map this page anonymously in user space.
    - update_mmu_cache()
        - Since this page does not have kernel mapping (its not in page/swap
          cache and is mapped anonymously), it does not issue dcache flush
          even if dcache_dirty bit is set by flush_dcache_page() above.

    <user now gets stale data since dcache was never flushed>
}

Same problem exists on mips too.

[1] example:
 - brd (RAM based block device)
 - ramzswap (RAM based compressed swap device)

Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-10-12 17:52:26 +01:00
Nicolas Pitre 08e445bd6a [ARM] 5366/1: fix shared memory coherency with VIVT L1 + L2 caches
When there are multiple L1-aliasing userland mappings of the same physical
page, we currently remap each of them uncached, to prevent VIVT cache
aliasing issues. (E.g. writes to one of the mappings not being immediately
visible via another mapping.)  However, when we do this remapping, there
could still be stale data in the L2 cache, and an uncached mapping might
bypass L2 and go straight to RAM.  This would cause reads from such
mappings to see old data (until the dirty L2 line is eventually evicted.)

This issue is solved by forcing a L2 cache flush whenever the shared page
is made L1 uncacheable.

Ideally, we would make L1 uncacheable and L2 cacheable as L2 is PIPT. But
Feroceon does not support that combination, and the TEX=5 C=0 B=0 encoding
for XSc3 doesn't appear to work in practice.

Signed-off-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-01-28 16:55:00 +00:00
Russell King 6a4690c22f Merge branch 'ptebits' into devel
Conflicts:

	arch/arm/Kconfig
2008-10-09 21:31:56 +01:00
Russell King bb30f36f9b [ARM] Introduce new PTE memory type bits
Provide L_PTE_MT_xxx definitions to describe the memory types that we
use in Linux/ARM.  These definitions are carefully picked such that:

1. their LSBs match what is required for pre-ARMv6 CPUs.
2. they all have a unique encoding, including after modification
   by build_mem_type_table() (the result being that some have more
   than one combination.)

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-10-01 16:40:56 +01:00
Russell King 09d9bae064 [ARM] sparse: fix several warnings
arch/arm/kernel/process.c:270:6: warning: symbol 'show_fpregs' was not declared. Should it be static?

This function isn't used, so can be removed.

arch/arm/kernel/setup.c:532:9: warning: symbol 'len' shadows an earlier one
arch/arm/kernel/setup.c:524:6: originally declared here

A function containing two 'len's.

arch/arm/mm/fault-armv.c:188:13: warning: symbol 'check_writebuffer_bugs' was not declared. Should it be static?
arch/arm/mm/mmap.c:122:5: warning: symbol 'valid_phys_addr_range' was not declared. Should it be static?
arch/arm/mm/mmap.c:137:5: warning: symbol 'valid_mmap_phys_addr_range' was not declared. Should it be static?

Missing includes.

arch/arm/kernel/traps.c:71:77: warning: Using plain integer as NULL pointer
arch/arm/mm/ioremap.c:355:46: error: incompatible types in comparison expression (different address spaces)

Sillies.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-09-05 14:11:24 +01:00
Russell King 46097c7dd8 [ARM] cachetype: move definitions to separate header
Rather than pollute asm/cacheflush.h with the cache type definitions,
move them to asm/cachetype.h, and include this new header where
necessary.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-09-01 12:06:24 +01:00
Russell King 53cdb27a93 [ARM] Fix shared mmap when more than two maps of the same file exist
The shared mmap code works fine for the test case, which only checked
for two shared maps of the same file.  However, three shared maps
result in one mapping remaining cached, resulting in stale data being
visible via that mapping.  Fix this.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-07-27 10:35:54 +01:00
Catalin Marinas 826cbdaff2 [ARM] 5092/1: Fix the I-cache invalidation on ARMv6 and later CPUs
This patch adds the I-cache invalidation in update_mmu_cache if the
corresponding vma is marked as executable. It also invalidates the
I-cache if a thread migrates to a CPU it never ran on.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-07-03 16:39:57 +01:00
George G. Davis cb36bb7516 [ARM] 4191/1: Remove redundant __flush_dcache_page() function prototype
Commit 1c9d3df5e8 added function prototype
__flush_dcache_page() in include/asm-arm/cacheflush.h.  So we can remove
the prototype for same in arch/arm/mm/fault-armv.c since it is now
redundant to have it there.

Signed-off-by: George G. Davis <gdavis@mvista.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-02-16 12:57:55 +00:00
Russell King ad1ae2fe7f [ARM] Unuse another Linux PTE bit
L_PTE_ASID is not really required to be stored in every PTE, since we
can identify it via the address passed to set_pte_at().  So, create
set_pte_ext() which takes the address of the PTE to set, the Linux
PTE value, and the additional CPU PTE bits which aren't encoded in
the Linux PTE value.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-12-13 14:34:43 +00:00
Hugh Dickins 69b0475456 [PATCH] mm: arm ready for split ptlock
Prepare arm for the split page_table_lock: three issues.

Signal handling's preserve and restore of iwmmxt context currently involves
reading and writing that context to and from user space, while holding
page_table_lock to secure the user page(s) against kswapd.  If we split the
lock, then the structure might span two pages, secured by to read into and
write from a kernel stack buffer, copying that out and in without locking (the
structure is 160 bytes in size, and here we're near the top of the kernel
stack).  Or would the overhead be noticeable?

arm_syscall's cmpxchg emulation use pte_offset_map_lock, instead of
pte_offset_map and mm-wide page_table_lock; and strictly, it should now also
take mmap_sem before descending to pmd, to guard against another thread
munmapping, and the page table pulled out beneath this thread.

Updated two comments in fault-armv.c.  adjust_pte is interesting, since its
modification of a pte in one part of the mm depends on the lock held when
calling update_mmu_cache for a pte in some other part of that mm.  This can't
be done with a split page_table_lock (and we've already taken the lowest lock
in the hierarchy here): so we'll have to disable split on arm, unless
CONFIG_CPU_CACHE_VIPT to ensures adjust_pte never used.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:42 -07:00
Russell King 8830f04a09 [PATCH] ARM: Fix delayed dcache flush for ARMv6 non-aliasing caches
flush_dcache_page() did nothing for these caches, but since they
suffer from I/D cache coherency issues, we need to ensure that data
is written back to RAM.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-06-20 09:51:03 +01:00
Linus Torvalds 1da177e4c3 Linux-2.6.12-rc2
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!
2005-04-16 15:20:36 -07:00