I've solved the problem I was having with the simulator and not
booting Debian.
The problem is that the number of bits for the virtual linear array
short-format VHPT (Virtually mapped linear page table, VMLPT for
short) is being tested incorrectly.
There are two problems:
1. The PAL call that should tell the kernel the size of the
virtual address space isn't implemented for the simulator, so
the kernel uses the default 50. This is addressed separately
in dc90e95f31
2. In arch/ia64/mm/init.c there's code to calcualte the size
of the VMLPT based on the number of implemented virtual address
bits and the page size. It checks to see if the VMLPT base
address overlaps the top of the mapped region, but this check
doesn't allow for the address space hole, and in fact will
never trigger.
Here's an alternative test and panic, that I think is more accurate.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Not all of the PAL VM calls are implemented for the SKI simulator.
Don't just give up if one fails, print information from the calls
that succeed.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch implements PAL_VM_SUMMARY (and PAL_MEM_ATTRIB for good
measure) and pretends that the simulated machine is a McKinley.
Some extra comments and clean-up by Tony Luck.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
It has been reported that the way Linux handles NODEFER for signals is
not consistent with the way other Unix boxes handle it. I've written a
program to test the behavior of how this flag affects signals and had
several reports from people who ran this on various Unix boxes,
confirming that Linux seems to be unique on the way this is handled.
The way NODEFER affects signals on other Unix boxes is as follows:
1) If NODEFER is set, other signals in sa_mask are still blocked.
2) If NODEFER is set and the signal is in sa_mask, then the signal is
still blocked. (Note: this is the behavior of all tested but Linux _and_
NetBSD 2.0 *).
The way NODEFER affects signals on Linux:
1) If NODEFER is set, other signals are _not_ blocked regardless of
sa_mask (Even NetBSD doesn't do this).
2) If NODEFER is set and the signal is in sa_mask, then the signal being
handled is not blocked.
The patch converts signal handling in all current Linux architectures to
the way most Unix boxes work.
Unix boxes that were tested: DU4, AIX 5.2, Irix 6.5, NetBSD 2.0, SFU
3.5 on WinXP, AIX 5.3, Mac OSX, and of course Linux 2.6.13-rcX.
* NetBSD was the only other Unix to behave like Linux on point #2. The
main concern was brought up by point #1 which even NetBSD isn't like
Linux. So with this patch, we leave NetBSD as the lonely one that
behaves differently here with #2.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
pcibios_bus_to_resource is exported on all architectures except ia64
and sparc. Add exports for the two missing architectures. Needed when
Yenta socket support is compiled as a module.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Thanks to Stephane, we've now worked out the real cause of the
`Linux will not boot on simulator' problem. Turns out it's a stack
overflow because the stack pointer wasn't being initialised properly
in boot_head.S (it was being initialised to the lowest instead of the
highest address of the stack, so the first push started to overwrite
data in the BSS).
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
After building a fresh tree with gcc 4 I can't boot the simulator as
the bootloader loader dies with
loading /home/ianw/kerntest/kerncomp//build/sim_defconfig/vmlinux...
failed to read phdr
After some investigation I believe this is do with differences between
the alignment of variables on the stack between gcc 3 and 4 and the
ski simulator. If you trace through with the simulator you can see
that the disk_stat structure value returned from the SSC_WAIT_COMPLETION
call seems to be only half loaded. I guess it doesn't like the alignment
of the input.
Signed-off-by: Ian Wienand <ianw@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Just `make oldconfig' doesn't help for the zx1 defconfig ---
because we need the MPT Fusion drivers, which are picked up as not
selected.
Tested on HP ZX2000 and ZX2600.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The main issue is that bus_fixup calls may potentially call
functions that require a valid bus->sysdata pointer. Since
this is the case, we must set the bus->sysdata pointer before
calling the bus_fixup functions. The remaining changes are
simple fixes to make sure memory is cleaned up in the function.
Signed-off-by: Prarit Bhargava <prarit@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The current one doesn't even make sense anymore on i386 where it
apparently came from.
Follow-up wordsmithing by Matthew Wilcox and Tony Luck.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The PFM_LOAD_CONTEXT may fail silently and cause a session
to remain reserved even though it should not. This can happen
when the commands succeeds in reserving the session but fails
when it actually tries to attach to the load_pid. In that case,
the command has failed but will return 0. More importantly,
the session will remain reserved. This patch fixes the problem.
Signed-off-by: <stephane.eranian@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
this changeset broke the "nohalt" kernel boot option.
8df5a500a3
default_idle() is looking at new variable can_do_pal_halt. However,
that variable did not get cleared upon "nohalt" boot option. Result
is that "nohalt" option is ignored until perfmon is exercised.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This removes sys_set_zone_reclaim() for now. While i'm sure Martin is
trying to solve a real problem, we must not hard-code an incomplete and
insufficient approach into a syscall, because syscalls are pretty much
for eternity. I am quite strongly convinced that this syscall must not
hit v2.6.13 in its current form.
Firstly, the syscall lacks basic syscall design: e.g. it allows the
global setting of VM policy for unprivileged users. (!) [ Imagine an
Oracle installation and a SAP installation on the same NUMA box fighting
over the 'optimal' setting for this flag. What will they do? Will they
try to set the flag to their own preferred value every second or so? ]
Secondly, it was added based on a single datapoint from Martin:
http://marc.theaimsgroup.com/?l=linux-mm&m=111763597218177&w=2
where Martin characterizes the numbers the following way:
' Run-to-run variability for "make -j" is huge, so these numbers aren't
terribly useful except to see that with reclaim the benchmark still
finishes in a reasonable amount of time. '
in other words: the fundamental problem has likely not been solved, only
a tendential move into the right direction has been observed, and a
handful of numbers were picked out of a set of hugely variable results,
without showing the variability data. How much variance is there
run-to-run?
I'd really suggest to first walk the walk and see what's needed to get
stable & predictable kernel compilation numbers on that NUMA box, before
adding random syscalls to tune a particular aspect of the VM ... which
approach might not even matter once the whole picture has been analyzed
and understood!
The third, most important point is that the syscall exposes VM tuning
internals in a completely unstructured way. What sense does it make to
have a _GLOBAL_ per-node setting for 'should we go to another node for
reclaim'? If then it might make sense to do this per-app, via numalib or
so.
The change is minimalistic in that it doesnt remove the syscall and the
underlying infrastructure changes, only the user-visible changes. We
could perhaps add a CAP_SYS_ADMIN-only sysctl for this hack, a'ka
/proc/sys/vm/swappiness, but even that looks quite counterproductive
when the generic approach is that we are trying to reduce the number of
external factors in the VM balance picture.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
unwind.c can read the wrong unat bits from switch_stack.
sw->caller_unat is the value of ar.unat when the task was blocked.
sw->ar_unat is the value of ar.unat after doing st8.spill for r4-7.
IOW, ar_unat is caller_unat with 4 bits changed.
unw_access_gr() uses sw->ar_unat for r4-7 (correct), but it also uses
sw->ar_unat for other scratch registers (incorrect). sw->ar_unat
should only be used for r4-7, everything else should use
sw->caller_unat, unless modified by unwind info. Using sw->ar_unat
risks picking up the 4 bits that were overwritten when r4-7 were saved.
Also this line is wrong
unw.sw_off[unw.preg_index[UNW_REG_PFS]] = SW(AR_UNAT);
and should be
unw.sw_off[unw.preg_index[UNW_REG_PFS]] = SW(AR_PFS);
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Here's the patch again to fix the code to handle if the values between
MAX_USER_RT_PRIO and MAX_RT_PRIO are different.
Without this patch, an SMP system will crash if the values are
different.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
machine_restart, machine_halt and machine_power_off are machine
specific hooks deep into the reboot logic, that modules
have no business messing with. Usually code should be calling
kernel_restart, kernel_halt, kernel_power_off, or
emergency_restart. So don't export machine_restart,
machine_halt, and machine_power_off so we can catch buggy users.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
XPC calls smp_processor_id() twice from xpc_setup_infrastructure() with
preemption enabled, which gets flagged if 'DEBUG_PREEMPT=y'. This patch
replaces the two calls to smp_processor_id() by a single call to
raw_smp_processor_id() since any CPU within the partition will do.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The Altix subarch does not provide node information via ACPI. Instead hooks
are used to fixup pci structures. This patch determines the nodes for Altix
PCI busses.
Remote Bridges:
---------------
Altix supports remote I/O nodes without memory or processors but with bridges.
The TIOCA type of bridge is an AGP bridge and the PROM provides information
about the closest node. That information will be returned by pcibus_to_node.
The TIOCP remote bridge type is a PCI bridge but the PROM does not provide a
closest node id. pcibus_to_node will return -1 for devices on those bridges
meaning that device control structures may be allocated on any node.
Safeguard:
----------
Should the fixups result in invalid node information for a pci controller then
a warning will be printed and pcibus_to_node will return -1.
This patch also fixes the "FIXME" in sn_dma_alloc_coherent. This means that
dma_alloc_coherent will now use alloc_pages_node to allocate memory local to
the node that the PCI device is connected to.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Check with PAL to see what the i-cache line size is for
each level of the cache, and so use the correct stride
when flushing the cache.
Acked-by: David Mosberger
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch removes the CONFIG_IA64_SGI_SN_SIM option entirely, allowing
any kernel bootable on sn2 to also be booted in the simulator.
Boot tested on Altix and HP rx2600.
Signed-off-by: Greg Edwards <edwardsg@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
pcibus_to_node provides a way for the Linux kernel to identify to which
node a certain pcibus connects to. Allocations of control structures
for devices can then be made on the node where the pci bus is located
to allow local access during interrupt and other device manipulation.
This patch provides a new "node" field in the the pci_controller
structure. The node field will be set based on ACPI information (thanks
to Alex Williamson <alex.williamson@hp.com for that piece).
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Create a new top-level menu named "Networking" thus moving
net related options and protocol selection way from the drivers
menu and up on the top-level where they belong.
To implement this all architectures has to source "net/Kconfig" before
drivers/*/Kconfig in their Kconfig file. This change has been
implemented for all architectures.
Device drivers for ordinary NIC's are still to be found
in the Device Drivers section, but Bluetooth, IrDA and ax25
are located with their corresponding menu entries under the new
networking menu item.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ACPI 3.0 added a Correctable Platform Error Interrupt (CPEI)
Processor Overide flag to MADT.Platform_Interrupt_Source.
Record the processor that was provided as hint from ACPI.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Current assign_irq_vector() will panic if interrupt vectors is running
out. But I think how to handle the case of lack of interrupt vectors
should be handled by the caller of this function. For example, some
PCI devices can raise the interrupt signal via both MSI and I/O
APIC. So even if the driver for these device fails to allocate a
vector for MSI, the driver still has a chance to use I/O APIC based
interrupt. But currently there is no chance for these driver to use
I/O APIC based interrupt because kernel will panic when
assign_irq_vector() fails to allocate interrupt vector.
The following patch changes assign_irq_vector() for ia64 to return
-ENOSPC on error instead of panic (as i386 and x86_64 versions do).
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Description: Replace schedule_timeout() with msleep_interruptible() to
guarantee the task delays as expected.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Acked-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
changing CONFIG_LOCALVERSION rebuilds too much, for no appearent reason.
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Jesse Barnes provided the original version of this patch months ago, but
other changes kept conflicting with it, so it got deferred. Greg Edwards
dug it out of obscurity just over a week ago, and almost immediately
another conflicting patch appeared (Bob Picco's memory-less nodes).
I've resolved the conflicts and got it running again. CONFIG_SGI_TIOCX
is set to "y" in defconfig, which causes a Tiger to not boot (oops in
tiocx_init). But that can be resolved later ... get this in now before it
gets stale again.
Signed-off-by: Tony Luck <tony.luck@intel.com>
I reworked how nodes with only CPUs are treated. The patch below seems
simpler to me and has eliminated the complicated routine
reassign_cpu_only_nodes. There isn't any longer the requirement
to modify ACPI NUMA information which was in large part the
complexity introduced in reassign_cpu_only_nodes.
This patch will produce a different number of nodes. For example,
reassign_cpu_only_nodes would reduce two CPUonly nodes and one memory node
configuration to one memory+CPUs node configuration. This patch
doesn't change the number of nodes which means the user will see three. Two
nodes without memory and one node with all the memory.
While doing this patch, I noticed that early_nr_phys_cpus_node isn't serving
any useful purpose. It is called once in find_pernode_space but the value
isn't used to computer pernode space.
Signed-off-by: bob.picco <bob.picco@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
restore_sigcontext calls ia64_set_local_fpu_owner() which requires that
preempt be disabled.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch is the SGI hotplug driver and additional changes required for
the driver. These modifications include changes to the SN io_init.c code
for memory management, the inclusion of new SAL calls to enable and disable
PCI slots, and a hotplug-style driver.
Signed-off-by: Prarit Bhargava <prarit@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch is a rewrite of the code to check the PROM version. The current
code has some deficiences in the way PROM comparisons were made. The minimum
value of PROM that will boot has also been changed to 4.04.
Signed-off-by: Prarit Bhargava <prarit@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch moves header files out of the arch/ia64/sn directories and into
include/asm-ia64/sn. These files were being included by other subsystems
and should be under include/asm-ia64/sn.
Signed-off-by: Prarit Bhargava <prarit@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch fixes the SN IRQ code such that cpu affinity and
Hotplug can modify IRQ values. The sn_irq_info structures are now locked
using a RCU lock mechanism to avoid lock contention in the lost interrupt
WAR code.
Signed-off-by: Prarit Bhargava <prarit@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>