Commit graph

349 commits

Author SHA1 Message Date
Ian Campbell
c92a7a54d6 x86: reduce arch/x86/mm/ioremap.o size
> Don't we have a special section for page-aligned data so it doesn't
> waste most of two pages?

We have .bss.page_aligned and it seems appropriate to use it.

    text	   data	    bss	    dec	    hex	filename
    -   3388	   8236	      4	  11628	   2d6c	../build-32/arch/x86/mm/ioremap.o
    +   3388	     48	   4100	   7536	   1d70	../build-32/arch/x86/mm/ioremap.o

Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-17 17:40:47 +02:00
Yinghai Lu
04adf11435 x86: remove never used nodenumer in pda
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-17 17:40:47 +02:00
Yinghai Lu
beafe91f1c x86: get apic_id later in acpi_numa_processor_affinity_init
we don't need get that so early.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-17 17:40:47 +02:00
Andi Kleen
ef9257668e x86: do kernel direct mapping at boot using GB pages
The AMD Fam10h CPUs support new Gigabyte page table entry for
mapping 1GB at a time. Use this for the kernel direct mapping.

Only done for 64bit because i386 does not support GB page tables.

This only applies to the data portion of the direct mapping; the
kernel text mapping stays with 2MB pages because the AMD Fam10h
microarchitecture does not support GB ITLBs and AMD recommends
against using GB mappings for code.

Can be disabled with disable_gbpages on the kernel command line

[ tglx@linutronix.de: simplify enable code ]
[ Yinghai Lu <yinghai.lu@sun.com>: boot fix on 256 GB RAM ]

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-17 17:40:45 +02:00
Ingo Molnar
00d1c5e057 x86: add gbpages switches
These new controls toggle experimental support for a new CPU feature,
the straightforward extension of largepages from the pmd level to the
pud level, which allows 1GB (kernel) TLBs instead of 2MB TLBs.

Turn it off by default, as this code has not been tested well enough yet.

Use the CONFIG_DIRECT_GBPAGES=y .config option or gbpages on the
boot line can be used to enable it. If enabled in the .config then
nogbpages boot option disables it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-17 17:40:45 +02:00
H. Peter Anvin
fe770bf031 x86: clean up the page table dumper and add 32-bit support
Clean up the page table dumper (fix boundary conditions, table driven
address ranges, some formatting changes since it is no longer using
the kernel log but a separate virtual file), and generalize to 32
bits.

[ mingo@elte.hu: x86: fix the pagetable dumper ]

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-17 17:40:45 +02:00
Arjan van de Ven
926e5392ba x86: add code to dump the (kernel) page tables for visual inspection by kernel developers
This patch adds code to the kernel to have an (optional)
/proc/kernel_page_tables debug file that basically dumps the kernel
pagetables; this allows us kernel developers to verify that nothing fishy is
going on and that the various mappings are set up correctly. This was quite
useful in finding various change_page_attr() bugs, and is very likely to be
useful in the future as well.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: mingo@elte.hu
Cc: tglx@tglx.de
Cc: hpa@zytor.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-17 17:40:45 +02:00
H. Peter Anvin
2596e0fae0 x86: unify arch/x86/mm/Makefile
Unify arch/x86/mm/Makefile between 32 and 64 bits.

All configuration variables that are protected by Kconfig constraints
have been put in the common part of the Makefile; however, the NUMA
files are totally different between 32 and 64 bits and are handled via
an ifdef.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-17 17:40:45 +02:00
Thomas Gleixner
ee7ae7a198 x86: add debug info to DEBUG_PAGEALLOC
Add debug information for DEBUG_PAGEALLOC to get some statistics about
the pool usage and split status.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-17 17:40:45 +02:00
Ingo Molnar
b4e0409a36 x86: check vmlinux limits, 64-bit
these build-time and link-time checks would have prevented the
vmlinux size regression.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:40:45 +02:00
Andrew Morton
9c312058b2 Avoid false positive warnings in kmap_atomic_prot() with DEBUG_HIGHMEM
I believe http://bugzilla.kernel.org/show_bug.cgi?id=10318 is a false
positive.  There's no way in which networking will be using highmem pages
here, so it won't be taking the KM_USER0 kmap slot, so there's no point in
performing these checks.

Cc: Pawel Staszewski <pstaszewski@artcom.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Christoph Lameter <clameter@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
 [ Really sad.  We lose almost all real-life coverage of the debug tests
   with this patch. Now it will only report problems for the cases where
   people actually end up using a HIGHMEM page, not when they just _might_
   use one.    - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-28 13:08:14 -07:00
Ingo Molnar
3085354de6 x86: prefetch fix #2
Linus noticed a second bug and an uncleanliness:

 - we'd return on any instruction fetch fault

 - we'd use both the value of 16 and the PF_INSTR symbol which are
   the same and make no sense

the cleanup nicely unifies this piece of logic.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-27 22:00:16 +01:00
Christoph Lameter
25e59881f1 x86: stricter check in follow_huge_addr()
The first page of the compound page is determined in follow_huge_addr()
but then PageCompound() only checks if the page is part of a compound page.
PageHead() allows checking if this is indeed the first page of the
compound.

Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-27 16:08:45 +01:00
Ingo Molnar
bc713dcf35 x86: fix prefetch workaround
some early Athlon XP's and Opterons generate bogus faults on prefetch
instructions. The workaround for this regressed over .24 - reinstate it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-27 16:08:44 +01:00
Suresh Siddha
d546b67a94 x86: fix performance drop for glx
fix the 3D performance drop reported at:

   http://bugzilla.kernel.org/show_bug.cgi?id=10328

fb drivers are using ioremap()/ioremap_nocache(), followed by mtrr_add with
WC attribute. Recent changes in page attribute code made both
ioremap()/ioremap_nocache() mappings as UC (instead of previous UC-). This
breaks the graphics performance, as the effective memory type is UC instead
of expected WC.

The correct way to fix this is to add ioremap_wc() (which uses UC- in the
absence of PAT kernel support and WC with PAT) and change all the
fb drivers to use this new ioremap_wc() API.

We can take this correct and longer route for post 2.6.25. For now,
revert back to the UC- behavior for ioremap/ioremap_nocache.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-26 22:23:41 +01:00
Yinghai Lu
76c324182b x86: fix trim mtrr not to setup_memory two times
we could call find_max_pfn() directly instead of setup_memory() to get
max_pfn needed for mtrr trimming.

otherwise setup_memory() is called two times... that is duplicated...

[ mingo@elte.hu: both Thomas and me simulated a double call to
  setup_bootmem_allocator() and can confirm that it is a real bug
  which can hang in certain configs. It's not been reported yet but
  that is probably due to the relatively scarce nature of
  MTRR-trimming systems. ]

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-26 22:23:41 +01:00
Linus Torvalds
b9e76a0074 x86-32: Pass the full resource data to ioremap()
It appears that 64-bit PCI resources cannot possibly ever have worked on
x86-32 even when the RESOURCES_64BIT config option was set, because any
driver that tried to [pci_]ioremap() the resource would have been unable
to do so because the high 32 bits would have been silently dropped on
the floor by the ioremap() routines that only used "unsigned long".

Change them to use "resource_size_t" instead, which properly encodes the
whole 64-bit resource data if RESOURCES_64BIT is enabled.

Acked-by: H. Peter Anvin <hpa@kernel.org>
Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-24 11:22:39 -07:00
Yinghai Lu
37bff62e98 x86_64: free_bootmem should take phys
so use nodedata_phys directly.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-03-21 17:06:15 +01:00
Thomas Gleixner
985a34bd75 x86: remove quicklists
quicklists cause a serious memory leak on 32-bit x86,
as documented at:

  http://bugzilla.kernel.org/show_bug.cgi?id=9991

the reason is that the quicklist pool is a special-purpose
cache that grows out of proportion. It is not accounted for
anywhere and users have no way to even realize that it's
the quicklists that are causing RAM usage spikes. It was
supposed to be a relatively small pool, but as demonstrated
by KOSAKI Motohiro, they can grow as large as:

  Quicklists:    1194304 kB

given how much trouble this code has caused historically,
and given that Andrew objected to its introduction on x86
(years ago), the best option at this point is to remove them.

[ any performance benefits of caching constructed pgds should
  be implemented in a more generic way (possibly within the page
  allocator), while still allowing constructed pages to be
  allocated by other workloads. ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-11 17:11:55 +01:00
Ingo Molnar
9a46d7e5b6 x86: ioremap, remove WARN_ON()
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-11 17:11:54 +01:00
Yinghai Lu
7c9e92b6cd x86: not set node to cpu_to_node if the node is not online
resolve boot problem reported by Mel Gorman:

   http://lkml.org/lkml/2008/2/13/404

init_cpu_to_node will use cpu->apic (from MADT or mptable) and
apic->node(from SRAT or AMD config space with k8_bus_64.c) to have
cpu->node mapping, and later identify_cpu will overwrite them
again...(with nearby_node...)

this patch checks if the node is online, otherwise it will not
update cpu_node map. so keep cpu_node map to online node before
identify_cpu..., to prevent possible error.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
2008-03-04 17:10:12 +01:00
Rafael J. Wysocki
9b5cf48b06 x86: revert "x86: CPA: avoid split of alias mappings"
Revert:

  commit 8be8f54bae
  Author: Thomas Gleixner <tglx@linutronix.de>
  Date:   Sat Feb 23 20:43:21 2008 +0100

      x86: CPA: avoid split of alias mappings

because it clearly mishandles the case when __change_page_attr(), called
from __change_page_attr_set_clr(), changes cpa->processed to 1 and
cpa_process_alias(cpa) is executed right after that.

This crashes my x86-64 test box early in the boot process
(ref. http://bugzilla.kernel.org/show_bug.cgi?id=10140#c4).

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-03 14:18:27 +01:00
Thomas Gleixner
8be8f54bae x86: CPA: avoid split of alias mappings
avoid over-eager large page splitup.

When the target area needs to be split or is split already (ioremap)
then the current code enforces the split of large mappings in the alias
regions even if we could avoid it.

Use a separate variable processed in the cpa_data structure to carry
the number of pages which have been processed instead of reusing the
numpages variable. This keeps numpages intact and gives the alias code
a chance to keep large mappings intact.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-29 18:55:42 +01:00
Ingo Molnar
b16bf712f4 x86: fix leak un ioremap_page_range() failure
Jan Beulich noticed it during code review that if a driver's ioremap()
fails (say due to -ENOMEM) then we might leak the struct vm_area.

Free it properly.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-29 18:55:42 +01:00
Ingo Molnar
88f3aec7af x86: fix spontaneous reboot with allyesconfig bzImage
recently the 64-bit allyesconfig bzImage kernel started spontaneously
rebooting during early bootup.

after a few fun hours spent with early init debugging, it turns out
that we've got this rather annoying limit on the size of the kernel
image:

      #define KERNEL_TEXT_SIZE  (40*1024*1024)

which limit my vmlinux just happened to pass:

       text           data       bss        dec       hex   filename
   29703744        4222751   8646224   42572719   2899baf   vmlinux

40 MB is 42572719 bytes, so my vmlinux was just 1.5% above this limit :-/

So it happily crashed right in head_64.S, which - as we all know - is
the most debuggable code in the whole architecture ;-)

So increase the limit to allow an up to 128MB kernel image to be mapped.
(should anyone be that crazy or lazy)

We have a full 4K of pagetable (level2_kernel_pgt) allocated for these
mappings already, so there's no RAM overhead and the limit was rather
pointless and arbitrary.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-26 12:55:56 +01:00
Yinghai Lu
3b57bc461f x86: remove double-checking empty zero pages debug
so far no one complained about that.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-26 12:55:55 +01:00
Ingo Molnar
92cb54a37a x86: make DEBUG_PAGEALLOC and CPA more robust
Use PF_MEMALLOC to prevent recursive calls in the DBEUG_PAGEALLOC
case. This makes the code simpler and more robust against allocation
failures.

This fixes the following fallback to non-mmconfig:

   http://lkml.org/lkml/2008/2/20/551
   http://bugzilla.kernel.org/show_bug.cgi?id=10083

Also, for DEBUG_PAGEALLOC=n reduce the pool size to one page.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-26 12:55:50 +01:00
Rafael J. Wysocki
8a235efad5 Hibernation: Handle DEBUG_PAGEALLOC on x86
Make hibernation work with CONFIG_DEBUG_PAGEALLOC set on x86, by
checking if the pages to be copied are marked as present in the
kernel mapping and temporarily marking them as present if that's not
the case.  No functional modifications are introduced if
CONFIG_DEBUG_PAGEALLOC is unset.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-02-21 02:15:28 -05:00
Linus Torvalds
5d9c4a7de6 Merge branch 'agp-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6
* 'agp-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6:
  agp: fix missing casts that produced a warning.
  agp: add support for 662/671 to agp driver
  fix historic ioremap() abuse in AGP
  agp/sis: Suspend support for SiS AGP
  agp/sis: Clear bit 2 from aperture size byte as well
2008-02-19 18:29:57 -08:00
Arjan van de Ven
156fbc3fbe x86: fix page_is_ram() thinko
page_is_ram() has a special case for the 640k-1M bios area, however
due to a thinko the special case checks the e820 table entry and
not the memory the user has asked for. This patch fixes the bug.

[ mingo@elte.hu: this too is better solved in the e820 space, but those
  fixes are too intrusive for v2.6.25. ]

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-19 16:18:34 +01:00
Arjan van de Ven
d8a9e6a51e x86: fix WARN_ON() message: teach page_is_ram() about the special 4Kb bios data page
This patch teaches page_is_ram() about the fact that the first
4Kb of memory are special on x86, even though the E820 table
normally doesn't exclude it.

This fixes the WARN_ON() reported by Laurent Riffard who was also
very helpful in diagnosing the issue.

[ mingo@elte.hu: we are working on doing this properly in the e820
  space, but for 2.6.25 this is the better fix. ]

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-19 16:18:34 +01:00
Sam Ravnborg
d01b9ad56e x86: fix section mismatch in srat_64.c:reserve_hotadd
reserve_hotadd() are only used by __init acpi_numa_memory_affinity_init().
Annotate reserve_hotadd() with __init is the trivial fix.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-19 16:18:31 +01:00
Andi Kleen
8e31c2ac11 x86: CPA: remove BUG_ON for LRU/Compound pages
New implementation does not use lru for anything so there is no need
to reject pages that are in the LRU. Similar for compound pages (which
were checked because they also use page->lru)

[ tglx@linutronix.de: removed unused variable ]

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-19 16:18:29 +01:00
Arjan van dev Ven
fcea424d31 fix historic ioremap() abuse in AGP
Several AGP drivers right now use ioremap_nocache() on kernel ram in order
to turn a page of regular memory uncached.

There are two problems with this:

    1) This is a total nightmare for the ioremap() implementation to keep
       various mappings of the same page coherent.

    2) It's a total nightmare for the AGP code since it adds a ton of
       complexity in terms of keeping track of 2 different pointers to
       the same thing, in terms of error handling etc etc.

This patch fixes this by making the AGP drivers use the new
set_memory_XX APIs instead.

Note: amd-k7-agp.c is built on Alpha too, and generic.c is built
on ia64 as well, which do not yet have the set_memory_*() APIs,
so for them some we have a few ugly #ifdefs - hopefully they'll
be fixed soon.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Dave Airlie <airlied@linux.ie>
2008-02-19 14:46:39 +10:00
Yinghai Lu
b7ad149d62 x86: reenable support for system without on node0
One system doesn't have RAM for node0 installed.

SRAT: PXM 0 -> APIC 0 -> Node 0
SRAT: PXM 0 -> APIC 1 -> Node 0
SRAT: PXM 1 -> APIC 2 -> Node 1
SRAT: PXM 1 -> APIC 3 -> Node 1
SRAT: Node 1 PXM 1 0-a0000
SRAT: Node 1 PXM 1 0-dd000000
SRAT: Node 1 PXM 1 0-123000000
ACPI: SLIT: nodes = 2
 10 13
 13 10
mapped APIC to ffffffffff5fb000 (        fee00000)
Bootmem setup node 1 0000000000000000-0000000123000000
  NODE_DATA [000000000000e000 - 0000000000014fff]
  bootmap [0000000000015000 -  00000000000395ff] pages 25
Could not find start_pfn for node 0
Pid: 0, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #14

Call Trace:
 [<ffffffff80bab498>] free_area_init_node+0x22/0x381
 [<ffffffff8045ffc5>] generic_swap+0x0/0x17
 [<ffffffff80bab0cc>] find_zone_movable_pfns_for_nodes+0x54/0x271
 [<ffffffff80baba5f>] free_area_init_nodes+0x239/0x287
 [<ffffffff80ba6311>] paging_init+0x46/0x4c
 [<ffffffff80b9dda5>] setup_arch+0x3c3/0x44e
 [<ffffffff80b978be>] start_kernel+0x6f/0x2c7
 [<ffffffff80b971cc>] _sinittext+0x1cc/0x1d3

This happens because node 0 is not online, but the node state in
mm/page_alloc.c has node 0 set.

        nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
                [N_POSSIBLE] = NODE_MASK_ALL,
                [N_ONLINE] = { { [0] = 1UL } },

So we need to clear node_online_map before initializing the memory.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-18 20:54:14 +01:00
Thomas Gleixner
f34b439f34 x86: CPA: avoid double checking of alias ranges
When the CPA code is called with an virtual address in the range of
the direct mapping or the high alias then we do not need to run
through the alias check for this range.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-18 20:54:14 +01:00
Thomas Gleixner
af96e4438a x86: CPA no alias checking for _NX
NX settings are not required to be consistent across alias mappings.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-18 20:54:14 +01:00
Thomas Gleixner
31eedd823c x86: zap invalid and unused pmds in early boot
The early boot code maps KERNEL_TEXT_SIZE (currently 40MB) starting
from __START_KERNEL_map. The kernel itself only needs _text to _end
mapped in the high alias. On relocatible kernels the ASM setup code
adjusts the compile time created high mappings to the relocation. This
creates invalid pmd entries for negative offsets:

0xffffffff80000000 -> pmd entry: ffffffffff2001e3
It points outside of the physical address space and is marked present.

This starts at the virtual address __START_KERNEL_map and goes up to
the point where the first valid physical address (0x0) is mapped.

Zap the mappings before _text and after _end right away in early
boot. This removes also the invalid entries.

Furthermore it simplifies the range check for high aliases.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-18 20:54:14 +01:00
Thomas Gleixner
c31c7d4844 x86: CPA, fix alias checks
c_p_a() did not discover all aliases correctly. (such as when called
on vmalloc()-ed areas or ioremap()-ed areas)

Push the alias checks to the lower, physical level and consistently
discover all aliases that might exist: the low direct mappings and
the high linear kernel-text mappings (on 64-bit).

Thanks to Andi Kleen for pointing out that this was buggy.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-18 20:54:14 +01:00
Ingo Molnar
f8d8406bcb x86: cpa, fix out of date comment
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-14 23:30:21 +01:00
Thomas Gleixner
69b1415e93 x86: cpa: ensure page alignment
the cpa API is page aligned - warn about any weird alignments.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-14 23:30:20 +01:00
Harvey Harrison
7bfeab9af9 x86: include proper prototypes for rodata_test
extern should not appear in C files.  Also, the definitions
do not match the prototype currently, not sure what way you
want to go with this, I've switched the prototype to return
int, but I can see going to the void return as well.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-14 23:30:20 +01:00
Adrian Bunk
cae30f8270 x86: make dump_pagetable() static
dump_pagetable() can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-14 23:30:19 +01:00
Andi Kleen
5d3c8b21e2 x86: CPA: fix gbpages support in try_preserve_large_page
[ mingo@elte.hu: while gbpages cannot be enabled on mainline currently,
  keep the code uptodate and this fix is easy enough. ]

Use correct page sizes and masks for GB pages in try_preserve_large_page()

This prevents a boot hang on a GB capable system with CONFIG_DIRECT_GBPAGES
enabled.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-13 16:20:35 +01:00
Jeremy Fitzhardinge
37cc8d7f96 x86/early_ioremap: don't assume we're using swapper_pg_dir
At the early stages of boot, before the kernel pagetable has been
fully initialized, a Xen kernel will still be running off the
Xen-provided pagetables rather than swapper_pg_dir[].  Therefore,
readback cr3 to determine the base of the pagetable rather than
assuming swapper_pg_dir[].

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Tested-by: Jody Belka <knew-linux@pimb.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-13 16:20:35 +01:00
Thomas Gleixner
81772fea41 x86: remove over noisy debug printk
pageattr-test.c contains a noisy debug printk that people reported.
The condition under which it prints (randomly tapping into a mem_map[]
hole and not being able to c_p_a() there) is valid behavior and not
interesting to report.

Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-11 11:24:24 -08:00
Thomas Gleixner
fac8493960 x86: cpa, strict range check in try_preserve_large_page()
Right now, we check only the first 4k page for static required protections.
This does not take overlapping regions into account. So we might end up
setting the wrong permissions/protections for other parts of this large page.

This can be optimized further, but correctness is the important part.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-09 23:24:09 +01:00
Thomas Gleixner
eb5b5f024c x86: cpa, use page pool
Switch the split page code to use the page pool. We do this
unconditionally to avoid different behaviour with and without
DEBUG_PAGEALLOC enabled.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-09 23:24:09 +01:00
Thomas Gleixner
76ebd0548d x86: introduce page pool in cpa
DEBUG_PAGEALLOC was not possible on 64-bit due to its early-bootup
hardcoded reliance on PSE pages, and the unrobustness of the runtime
splitup of large pages. The splitup ended in recursive calls to
alloc_pages() when a page for a pte split was requested.

Avoid the recursion with a preallocated page pool, which is used to
split up large mappings and gets refilled in the return path of
kernel_map_pages after the split has been done. The size of the page
pool is adjusted to the available memory.

This part just implements the page pool and the initialization w/o
using it yet.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-09 23:24:09 +01:00
Ian Campbell
b6fbb669c8 x86: fix early_ioremap pagetable ops
Some important parts of f6df72e71e got
dropped along the way, reintroduce them.

Only affects paravirt guests.

Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-09 23:24:09 +01:00