Only pass the address of a u64 if that is what the function requires.
[Split out of a larger patch - sfr]
[update comment - sfr]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add ps3vram driver, which exposes unused video RAM on the PS3 as a MTD
device suitable for storage or swap. Fast data transfer is achieved
using a local cache in system RAM and DMA transfers via the GPU.
Signed-off-by: Vivien Chappelier <vivien.chappelier@free.fr>
Signed-off-by: Jim Paris <jim@jtan.com>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
iSeries dependent drivers fail to build, when CONFIG_VIOPATH is disabled.
Fix the problem by making those drivers select it.
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
SPIN_LOCK_UNLOCKED is deprecated. The following makes the change suggested
in Documentation/spinlocks.txt
The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
declarer name DEFINE_SPINLOCK;
identifier xxx_lock;
@@
- spinlock_t xxx_lock = SPIN_LOCK_UNLOCKED;
+ DEFINE_SPINLOCK(xxx_lock);
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
SPIN_LOCK_UNLOCKED is deprecated. The following makes the change suggested
in Documentation/spinlocks.txt
The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
declarer name DEFINE_SPINLOCK;
identifier xxx_lock;
@@
- spinlock_t xxx_lock = SPIN_LOCK_UNLOCKED;
+ DEFINE_SPINLOCK(xxx_lock);
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The non-zero return from the prepare callback is returned by sys_kexec_load()
to userspace, indicating that kexec is not supported on the machine.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch fixes some unbalanced OF node references in the
powermac code
Signed-off-by: Nicolas Palix <npalix@diku.dk>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
... and don't bother in callers. Don't bother with zeroing i_blocks,
while we are at it - it's already been zeroed.
i_mode is not worth the effort; it has no common default value.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits)
stacktrace: provide save_stack_trace_tsk() weak alias
rcu: provide RCU options on non-preempt architectures too
printk: fix discarding message when recursion_bug
futex: clean up futex_(un)lock_pi fault handling
"Tree RCU": scalable classic RCU implementation
futex: rename field in futex_q to clarify single waiter semantics
x86/swiotlb: add default swiotlb_arch_range_needs_mapping
x86/swiotlb: add default phys<->bus conversion
x86: unify pci iommu setup and allow swiotlb to compile for 32 bit
x86: add swiotlb allocation functions
swiotlb: consolidate swiotlb info message printing
swiotlb: support bouncing of HighMem pages
swiotlb: factor out copy to/from device
swiotlb: add arch hook to force mapping
swiotlb: allow architectures to override phys<->bus<->phys conversions
swiotlb: add comment where we handle the overflow of a dma mask on 32 bit
rcu: fix rcutorture behavior during reboot
resources: skip sanity check of busy resources
swiotlb: move some definitions to header
swiotlb: allow architectures to override swiotlb pool allocation
...
Fix up trivial conflicts in
arch/x86/kernel/Makefile
arch/x86/mm/init_32.c
include/linux/hardirq.h
as per Ingo's suggestions.
There is a call to local_irq_restore in the normal exit case, so it would
seem that there should be one on an error return as well.
The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
expression l;
expression E,E1,E2;
@@
local_irq_save(l);
... when != local_irq_restore(l)
when != spin_unlock_irqrestore(E,l)
when any
when strict
(
if (...) { ... when != local_irq_restore(l)
when != spin_unlock_irqrestore(E1,l)
+ local_irq_restore(l);
return ...;
}
|
if (...)
+ {local_irq_restore(l);
return ...;
+ }
|
spin_unlock_irqrestore(E2,l);
|
local_irq_restore(l);
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Some 83xx boards were not ready for the optional QUICC Engine support.
This patch fixes following build errors:
arch/powerpc/platforms/built-in.o: In function `flush_disable_caches':
(.text+0xb308): undefined reference to `par_io_data_set'
arch/powerpc/platforms/built-in.o: In function `flush_disable_caches':
(.text+0xb334): undefined reference to `par_io_data_set'
arch/powerpc/platforms/built-in.o: In function `flush_disable_caches':
(.text+0xb408): undefined reference to `qe_ic_get_high_irq'
arch/powerpc/platforms/built-in.o: In function `flush_disable_caches':
(.text+0xb478): undefined reference to `qe_ic_get_low_irq'
arch/powerpc/platforms/built-in.o: In function `mpc832x_spi_init':
mpc832x_rdb.c:(.init.text+0x574c): undefined reference to `par_io_config_pin'
mpc832x_rdb.c:(.init.text+0x5768): undefined reference to `par_io_config_pin'
mpc832x_rdb.c:(.init.text+0x5784): undefined reference to `par_io_config_pin'
mpc832x_rdb.c:(.init.text+0x57a0): undefined reference to `par_io_config_pin'
mpc832x_rdb.c:(.init.text+0x57bc): undefined reference to `par_io_config_pin'
arch/powerpc/platforms/built-in.o:mpc832x_rdb.c:(.init.text+0x57d8): more undefined references to `par_io_config_pin' follow
arch/powerpc/platforms/built-in.o: In function `mpc836x_rdk_init_IRQ':
mpc836x_rdk.c:(.init.text+0x5e84): undefined reference to `qe_ic_init'
arch/powerpc/platforms/built-in.o: In function `mpc836x_rdk_setup_arch':
mpc836x_rdk.c:(.init.text+0x5f10): undefined reference to `qe_reset'
make: *** [.tmp_vmlinux1] Error 1
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
- Update the device tree per QE USB bindings;
- Add timer (FSL GTM) node;
- Add gpio-controller node for BCSR13 bank (GPIOs on that bank
are used to control the USB transceiver);
- Set up other BCSR registers;
- Configure the QE Par IO.
The work is loosely based on Li Yang's patch[1], which was used
to support peripheral mode only.
[1] http://ozlabs.org/pipermail/linuxppc-dev/2008-August/061357.html
The s-o-b line of the original patch preserved here.
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The driver supports very simple GPIO controllers, that is, when a
controller provides just a 'data' register. Such controllers may be
found in various BCSRs (Board's FPGAs used to control board's
switches, LEDs, chip-selects, Ethernet/USB PHY power, etc).
So far we support only 1-byte GPIO banks. Support for other widths may
be implemented when/if needed.
p.s.
To avoid "made up" compatible entries (like compatible = "simple-gpio"),
boards must call simple_gpiochip_init() to pass the compatible string.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This patch fixes following sparse warnings:
CHECK 83xx/usb.c
83xx/usb.c:205:5: warning: symbol 'mpc837x_usb_cfg' was not declared. Should it be static?
CHECK 83xx/mpc831x_rdb.c
83xx/mpc831x_rdb.c:45:13: warning: symbol 'mpc831x_rdb_init_IRQ' was not declared. Should it be static?
CHECK 83xx/mpc832x_rdb.c
83xx/mpc832x_rdb.c:133:13: warning: symbol 'mpc832x_rdb_init_IRQ' was not declared. Should it be static?
CHECK 83xx/mpc832x_mds.c
83xx/mpc832x_mds.c:68:12: warning: Using plain integer as NULL pointer
83xx/mpc832x_mds.c:72:13: warning: incorrect type in assignment (different address spaces)
83xx/mpc832x_mds.c:72:13: expected unsigned char [usertype] *static [toplevel] bcsr_regs
83xx/mpc832x_mds.c:72:13: got void [noderef] <asn:2>*
83xx/mpc832x_mds.c:99:11: warning: incorrect type in argument 1 (different address spaces)
83xx/mpc832x_mds.c:99:11: expected void volatile [noderef] <asn:2>*addr
83xx/mpc832x_mds.c:99:11: got unsigned char [usertype] *static [toplevel] bcsr_regs
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (144 commits)
powerpc/44x: Support 16K/64K base page sizes on 44x
powerpc: Force memory size to be a multiple of PAGE_SIZE
powerpc/32: Wire up the trampoline code for kdump
powerpc/32: Add the ability for a classic ppc kernel to be loaded at 32M
powerpc/32: Allow __ioremap on RAM addresses for kdump kernel
powerpc/32: Setup OF properties for kdump
powerpc/32/kdump: Implement crash_setup_regs() using ppc_save_regs()
powerpc: Prepare xmon_save_regs for use with kdump
powerpc: Remove default kexec/crash_kernel ops assignments
powerpc: Make default kexec/crash_kernel ops implicit
powerpc: Setup OF properties for ppc32 kexec
powerpc/pseries: Fix cpu hotplug
powerpc: Fix KVM build on ppc440
powerpc/cell: add QPACE as a separate Cell platform
powerpc/cell: fix build breakage with CONFIG_SPUFS disabled
powerpc/mpc5200: fix error paths in PSC UART probe function
powerpc/mpc5200: add rts/cts handling in PSC UART driver
powerpc/mpc5200: Make PSC UART driver update serial errors counters
powerpc/mpc5200: Remove obsolete code from mpc5200 MDIO driver
powerpc/mpc5200: Add MDMA/UDMA support to MPC5200 ATA driver
...
Fix trivial conflict in drivers/char/Makefile as per Paul's directions
This adds support for 16k and 64k page sizes on PowerPC 44x processors.
The PGDIR table is much smaller than a page when using 16k or 64k
pages (512 and 32 bytes respectively) so we allocate the PGDIR with
kzalloc() instead of __get_free_pages().
One PTE table covers rather a large memory area when using 16k or 64k
pages (32MB or 512MB respectively), so we can easily put FIXMAP and
PKMAP in the area covered by one PTE table.
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Vladimir Panfilov <pvr@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Impact: New APIs
The old node_to_cpumask/node_to_pcibus returned a cpumask_t: these
return a pointer to a struct cpumask. Part of removing cpumasks from
the stack.
(Also replaces powerpc internal uses of node_to_cpumask).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, pseries_cpu_die() calls msleep() while polling RTAS for
the status of the dying cpu.
However, if the cpu that is going down also happens to be the one
doing the tick then we're hosed as the tick_do_timer_cpu 'baton' is
only passed later on in tick_shutdown() when _cpu_down() does the
CPU_DEAD notification. Therefore jiffies won't be updated anymore.
This replaces that msleep() with a cpu_relax() to make sure we're not
going to schedule at that point.
With this patch my test box survives a 100k iterations hotplug stress
test on _all_ cpus, whereas without it, it quickly dies after ~50
iterations.
Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Since the QPACE (Chromodynamics Parallel Computing on the
Cell Broadband Engine) platform doesn't use a iommu, doesn't
have PCI devices and a MPIC much lesser setup and
configurations are needed. So far all devices are detected
as OF device. A notifier function is used to set the dma_ops
for the of_platform bus. Further this patch splits the
PPC_CELL_NATIVE into PPC_CELL_COMMON which are parts that are
shared with the QPACE platform and the rest.
Signed-off-by: Benjamin Krill <ben@codiert.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
CBE_THERM and OPROFILE_CELL both cannot be built without
SPU_FS disabled, so make the dependency explicit.
Reported-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The MPC5200 internal interrupt controller setup function needs to set
the default interrupt controller when it is called. Without this
irq_create_of_mapping() cannot be called without first determining
the pointer to the irq controller (ie. call with controller = NULL).
Reported-by: Steven Cavanagh <scavanagh@secretlab.ca>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
This patch adds documentation to the mpc5200 interrupt controller
driver and cleans up some minor coding conventions. It also moves the
contents of mpc52xx_pic.h into the driver proper (except for a small
common bit that is moved to the common mpc52xx.h) because the
information encoded there is not required by any other part of kernel
code. Finally for code readability sake, the L2_OFFSET shift value
is removed because the code using it resolves to a noop.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Currently, we never set _PAGE_COHERENT in the PTEs, we just OR it in
in the hash code based on some CPU feature bit. We also manipulate
_PAGE_NO_CACHE and _PAGE_GUARDED by hand in all sorts of places.
This changes the logic so that instead, the PTE now contains
_PAGE_COHERENT for all normal RAM pages thay have I = 0 on platforms
that need it. The hash code clears it if the feature bit is not set.
It also adds some clean accessors to setup various valid combinations
of access flags and change various bits of code to use them instead.
This should help having the PTE actually containing the bit
combinations that we really want.
I also removed _PAGE_GUARDED from _PAGE_BASE on 44x and instead
set it explicitely from the TLB miss. I will ultimately remove it
completely as it appears that it might not be needed after all
but in the meantime, having it in the TLB miss makes things a
lot easier.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We're soon running out of CPU features and I need to add some new
ones for various MMU related bits, so this patch separates the MMU
features from the CPU features. I moved over the 32-bit MMU related
ones, added base features for MMU type families, but didn't move
over any 64-bit only feature yet.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This splits the mmu_context handling between 32-bit hash based
processors, 64-bit hash based processors and everybody else. This is
preliminary work for adding SMP support for BookE processors.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When running Active Memory Sharing, pages can get marked as
"loaned" with the hypervisor by the CMM driver. This state gets
cleared by the system firmware when rebooting the partition.
When using kexec to boot a new kernel, this state never gets
cleared and the hypervisor and CMM driver can get out of sync
with respect to the number of pages currently marked "loaned".
Fix this by adding a reboot notifier to the CMM driver to deflate
the balloon and mark all pages as active.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When running Active Memory Sharing, the Collaborative Memory Manager
(CMM) may mark some pages as "loaned" with the hypervisor.
Periodically, the CMM will query the hypervisor for a loan request,
which is a single signed value. When kexec'ing into a kdump kernel,
the CMM driver in the kdump kernel is not aware of the pages the
previous kernel had marked as "loaned", so the hypervisor and the CMM
driver are out of sync. This results in the CMM driver getting a
negative loan request, which can then get treated as a large unsigned
value and can cause kdump to hang due to the CMM driver inflating too
large. Since there really is no clean way for the CMM driver in the
kdump kernel to clean this up, simply disable CMM in the kdump kernel.
This fixes hangs we were seeing doing kdump with AMS.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Otherwise you get lot of errors like these:
drivers/block/viodasd.c:72: error: dereferencing pointer to incomplete type
drivers/block/viodasd.c: In function 'viodasd_open':
drivers/block/viodasd.c:135: error: dereferencing pointer to incomplete type
drivers/block/viodasd.c: In function 'viodasd_release':
drivers/block/viodasd.c:184: error: dereferencing pointer to incomplete type
drivers/block/viodasd.c: In function 'viodasd_getgeo':
drivers/block/viodasd.c:209: error: dereferencing pointer to incomplete type
drivers/block/viodasd.c:214: error: implicit declaration of function 'get_capacity'
drivers/block/viodasd.c: At top level:
drivers/block/viodasd.c:222: error: variable 'viodasd_fops' has initializer but incomplete type
drivers/block/viodasd.c:223: error: unknown field 'owner' specified in initializer
Discovered by a randconfig build.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
ibm_configure_kernel_dump is passed as the token to rtas_call() is
never initialised. This sets it to something sane.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Acked-by: Nathan Lynch <ntl@pobox.com>
Acked-by: Manish Ahuja <mahujam@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
print_dump_header() will be called at least once with a NULL pointer in
a normal boot sequence. If DEBUG is defined then we will dereference
the pointer and crash. Add a quick fix to exit early in the NULL pointer
case.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Acked-by: Manish Ahuja <mahujam@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch fixes a long-standing performance bug in classic RCU that
results in massive internal-to-RCU lock contention on systems with
more than a few hundred CPUs. Although this patch creates a separate
flavor of RCU for ease of review and patch maintenance, it is intended
to replace classic RCU.
This patch still handles stress better than does mainline, so I am still
calling it ready for inclusion. This patch is against the -tip tree.
Nevertheless, experience on an actual 1000+ CPU machine would still be
most welcome.
Most of the changes noted below were found while creating an rcutiny
(which should permit ejecting the current rcuclassic) and while doing
detailed line-by-line documentation.
Updates from v9 (http://lkml.org/lkml/2008/12/2/334):
o Fixes from remainder of line-by-line code walkthrough,
including comment spelling, initialization, undesirable
narrowing due to type conversion, removing redundant memory
barriers, removing redundant local-variable initialization,
and removing redundant local variables.
I do not believe that any of these fixes address the CPU-hotplug
issues that Andi Kleen was seeing, but please do give it a whirl
in case the machine is smarter than I am.
A writeup from the walkthrough may be found at the following
URL, in case you are suffering from terminal insomnia or
masochism:
http://www.kernel.org/pub/linux/kernel/people/paulmck/tmp/rcutree-walkthrough.2008.12.16a.pdf
o Made rcutree tracing use seq_file, as suggested some time
ago by Lai Jiangshan.
o Added a .csv variant of the rcudata debugfs trace file, to allow
people having thousands of CPUs to drop the data into
a spreadsheet. Tested with oocalc and gnumeric. Updated
documentation to suit.
Updates from v8 (http://lkml.org/lkml/2008/11/15/139):
o Fix a theoretical race between grace-period initialization and
force_quiescent_state() that could occur if more than three
jiffies were required to carry out the grace-period
initialization. Which it might, if you had enough CPUs.
o Apply Ingo's printk-standardization patch.
o Substitute local variables for repeated accesses to global
variables.
o Fix comment misspellings and redundant (but harmless) increments
of ->n_rcu_pending (this latter after having explicitly added it).
o Apply checkpatch fixes.
Updates from v7 (http://lkml.org/lkml/2008/10/10/291):
o Fixed a number of problems noted by Gautham Shenoy, including
the cpu-stall-detection bug that he was having difficulty
convincing me was real. ;-)
o Changed cpu-stall detection to wait for ten seconds rather than
three in order to reduce false positive, as suggested by Ingo
Molnar.
o Produced a design document (http://lwn.net/Articles/305782/).
The act of writing this document uncovered a number of both
theoretical and "here and now" bugs as noted below.
o Fix dynticks_nesting accounting confusion, simplify WARN_ON()
condition, fix kerneldoc comments, and add memory barriers
in dynticks interface functions.
o Add more data to tracing.
o Remove unused "rcu_barrier" field from rcu_data structure.
o Count calls to rcu_pending() from scheduling-clock interrupt
to use as a surrogate timebase should jiffies stop counting.
o Fix a theoretical race between force_quiescent_state() and
grace-period initialization. Yes, initialization does have to
go on for some jiffies for this race to occur, but given enough
CPUs...
Updates from v6 (http://lkml.org/lkml/2008/9/23/448):
o Fix a number of checkpatch.pl complaints.
o Apply review comments from Ingo Molnar and Lai Jiangshan
on the stall-detection code.
o Fix several bugs in !CONFIG_SMP builds.
o Fix a misspelled config-parameter name so that RCU now announces
at boot time if stall detection is configured.
o Run tests on numerous combinations of configurations parameters,
which after the fixes above, now build and run correctly.
Updates from v5 (http://lkml.org/lkml/2008/9/15/92, bad subject line):
o Fix a compiler error in the !CONFIG_FANOUT_EXACT case (blew a
changeset some time ago, and finally got around to retesting
this option).
o Fix some tracing bugs in rcupreempt that caused incorrect
totals to be printed.
o I now test with a more brutal random-selection online/offline
script (attached). Probably more brutal than it needs to be
on the people reading it as well, but so it goes.
o A number of optimizations and usability improvements:
o Make rcu_pending() ignore the grace-period timeout when
there is no grace period in progress.
o Make force_quiescent_state() avoid going for a global
lock in the case where there is no grace period in
progress.
o Rearrange struct fields to improve struct layout.
o Make call_rcu() initiate a grace period if RCU was
idle, rather than waiting for the next scheduling
clock interrupt.
o Invoke rcu_irq_enter() and rcu_irq_exit() only when
idle, as suggested by Andi Kleen. I still don't
completely trust this change, and might back it out.
o Make CONFIG_RCU_TRACE be the single config variable
manipulated for all forms of RCU, instead of the prior
confusion.
o Document tracing files and formats for both rcupreempt
and rcutree.
Updates from v4 for those missing v5 given its bad subject line:
o Separated dynticks interface so that NMIs and irqs call separate
functions, greatly simplifying it. In particular, this code
no longer requires a proof of correctness. ;-)
o Separated dynticks state out into its own per-CPU structure,
avoiding the duplicated accounting.
o The case where a dynticks-idle CPU runs an irq handler that
invokes call_rcu() is now correctly handled, forcing that CPU
out of dynticks-idle mode.
o Review comments have been applied (thank you all!!!).
For but one example, fixed the dynticks-ordering issue that
Manfred pointed out, saving me much debugging. ;-)
o Adjusted rcuclassic and rcupreempt to handle dynticks changes.
Attached is an updated patch to Classic RCU that applies a hierarchy,
greatly reducing the contention on the top-level lock for large machines.
This passes 10-hour concurrent rcutorture and online-offline testing on
128-CPU ppc64 without dynticks enabled, and exposes some timekeeping
bugs in presence of dynticks (exciting working on a system where
"sleep 1" hangs until interrupted...), which were fixed in the
2.6.27 kernel. It is getting more reliable than mainline by some
measures, so the next version will be against -tip for inclusion.
See also Manfred Spraul's recent patches (or his earlier work from
2004 at http://marc.info/?l=linux-kernel&m=108546384711797&w=2).
We will converge onto a common patch in the fullness of time, but are
currently exploring different regions of the design space. That said,
I have already gratefully stolen quite a few of Manfred's ideas.
This patch provides CONFIG_RCU_FANOUT, which controls the bushiness
of the RCU hierarchy. Defaults to 32 on 32-bit machines and 64 on
64-bit machines. If CONFIG_NR_CPUS is less than CONFIG_RCU_FANOUT,
there is no hierarchy. By default, the RCU initialization code will
adjust CONFIG_RCU_FANOUT to balance the hierarchy, so strongly NUMA
architectures may choose to set CONFIG_RCU_FANOUT_EXACT to disable
this balancing, allowing the hierarchy to be exactly aligned to the
underlying hardware. Up to two levels of hierarchy are permitted
(in addition to the root node), allowing up to 16,384 CPUs on 32-bit
systems and up to 262,144 CPUs on 64-bit systems. I just know that I
am going to regret saying this, but this seems more than sufficient
for the foreseeable future. (Some architectures might wish to set
CONFIG_RCU_FANOUT=4, which would limit such architectures to 64 CPUs.
If this becomes a real problem, additional levels can be added, but I
doubt that it will make a significant difference on real hardware.)
In the common case, a given CPU will manipulate its private rcu_data
structure and the rcu_node structure that it shares with its immediate
neighbors. This can reduce both lock and memory contention by multiple
orders of magnitude, which should eliminate the need for the strange
manipulations that are reported to be required when running Linux on
very large systems.
Some shortcomings:
o More bugs will probably surface as a result of an ongoing
line-by-line code inspection.
Patches will be provided as required.
o There are probably hangs, rcutorture failures, &c. Seems
quite stable on a 128-CPU machine, but that is kind of small
compared to 4096 CPUs. However, seems to do better than
mainline.
Patches will be provided as required.
o The memory footprint of this version is several KB larger
than rcuclassic.
A separate UP-only rcutiny patch will be provided, which will
reduce the memory footprint significantly, even compared
to the old rcuclassic. One such patch passes light testing,
and has a memory footprint smaller even than rcuclassic.
Initial reaction from various embedded guys was "it is not
worth it", so am putting it aside.
Credits:
o Manfred Spraul for ideas, review comments, and bugs spotted,
as well as some good friendly competition. ;-)
o Josh Triplett, Ingo Molnar, Peter Zijlstra, Mathieu Desnoyers,
Lai Jiangshan, Andi Kleen, Andy Whitcroft, and Andrew Morton
for reviews and comments.
o Thomas Gleixner for much-needed help with some timer issues
(see patches below).
o Jon M. Tollefson, Tim Pepper, Andrew Theurer, Jose R. Santos,
Andy Whitcroft, Darrick Wong, Nishanth Aravamudan, Anton
Blanchard, Dave Kleikamp, and Nathan Lynch for keeping machines
alive despite my heavy abuse^Wtesting.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently there are a number of platforms that open code access to
the ppc_pci_flags global variable. However, that variable is not
present if CONFIG_PCI is not set, which can lead to a build break.
This introduces a number of accessor functions that are defined
to be empty in the case of CONFIG_PCI being disabled. The
various platform files in the kernel are updated to use these.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Since "Factor out cpu joining/unjoining the GIQ"
(b4963255ad) the WARN_ON in
xics_set_cpu_giq() is being triggered during boot on JS20 because the
GIQ indicator is not available on that platform. While the warning is
harmless and the system runs normally, it's nicer to check for the
existence of the indicator before trying to manipulate it.
Implement rtas_indicator_present(), which searches the
/rtas/rtas-indicators property for the given indicator token, and use
this function in xics_set_cpu_giq().
Also use a WARN statement in xics_set_cpu_giq to get better
information on failure.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Acked-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The hard_smp_processor_id functions are the appropriate interfaces for
managing physical CPU ids.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
commit 059e4938f8 ("powerpc/ps3: Add a sub-match
id to ps3_system_bus") forgot to update the module alias support:
- Add the sub-match ids to the module aliases, so udev can distinguish
between different types of sub-devices.
- Rename PS3_MODULE_ALIAS_GRAPHICS to PS3_MODULE_ALIAS_GPU_FB, as ps3fb
binds to the "FB" sub-device.
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Change the debug message in dma_sb_region_create() from
pr_info() to DBG() to quiet the dmesg output.
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
of_node_put is needed before discarding a value received from
of_find_node_by_name, eg in error handling code or when the device
node is no longer used.
The semantic match that catches the bug is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@r exists@
local idexpression struct device_node *n;
position p1, p2;
statement S1,S2;
expression E,E1;
expression *ptr != NULL;
@@
(
if (!(n@p1 = of_find_node_by_name(...))) S1
|
n@p1 = of_find_node_by_name(...)
)
<... when != of_node_put(n)
when != if (...) { <+... of_node_put(n) ...+> }
when != true !n || ...
when != n = E
when != E = n
if (!n || ...) S2
...>
(
return \(0\|<+...n...+>\|ptr\);
|
return@p2 ...;
|
n = E1
|
E1 = n
)
@script:python@
p1 << r.p1;
p2 << r.p2;
@@
print "* file: %s of_find_node_by_name %s return %s" % (p1[0].file,p1[0].line,p2[0].line)
// </smpl>
Signed-off-by: Nicolas Palix <npalix@diku.dk>
Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Commit d015fe995 'powerpc/cell/axon-msi: Retry on missing interrupt'
has turned a rare failure to kexec on QS22 into a reproducible
error, which we have now analysed.
The problem is that after a kexec, the MSIC hardware still points
into the middle of the old ring buffer. We set up the ring buffer
during reboot, but not the offset into it. On older kernels, this
would cause a storm of thousands of spurious interrupts after a
kexec, which would most of the time get dropped silently.
With the new code, we time out on each interrupt, waiting for
it to become valid. If more interrupts come in that we time
out on, this goes on indefinitely, which eventually leads to
a hard crash.
The solution in this commit is to read the current offset from
the MSIC when reinitializing it. This now works correctly, as
expected.
Reported-by: Dirk Herrendoerfer <d.herrendoerfer@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>