If split page table lock is in use, we embed the lock into struct page
of table's page. We have to disable split lock, if spinlock_t is too
big be to be embedded, like when DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC
enabled.
This patch add support for dynamic allocation of split page table lock
if we can't embed it to struct page.
page->ptl is unsigned long now and we use it as spinlock_t if
sizeof(spinlock_t) <= sizeof(long), otherwise it's pointer to spinlock_t.
The spinlock_t allocated in pgtable_page_ctor() for PTE table and in
pgtable_pmd_page_ctor() for PMD table. All other helpers converted to
support dynamically allocated page->ptl.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In split page table lock case, we embed spinlock_t into struct page.
For obvious reason, we don't want to increase size of struct page if
spinlock_t is too big, like with DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC or
on -rt kernel. So we disable split page table lock, if spinlock_t is
too big.
This patchset allows to allocate the lock dynamically if spinlock_t is
big. In this page->ptl is used to store pointer to spinlock instead of
spinlock itself. It costs additional cache line for indirect access,
but fix page fault scalability for multi-threaded applications.
LOCK_STAT depends on DEBUG_SPINLOCK, so on current kernel enabling
LOCK_STAT to analyse scalability issues breaks scalability. ;)
The patchset mostly fixes this. Results for ./thp_memscale -c 80 -b 512M
on 4-socket machine:
baseline, no CONFIG_LOCK_STAT: 9.115460703 seconds time elapsed
baseline, CONFIG_LOCK_STAT=y: 53.890567123 seconds time elapsed
patched, no CONFIG_LOCK_STAT: 8.852250368 seconds time elapsed
patched, CONFIG_LOCK_STAT=y: 11.069770759 seconds time elapsed
Patch count is scary, but most of them trivial. Overview:
Patches 1-4 Few bug fixes. No dependencies to other patches.
Probably should applied as soon as possible.
Patch 5 Changes signature of pgtable_page_ctor(). We will use it
for dynamic lock allocation, so it can fail.
Patches 6-8 Add missing constructor/destructor calls on few archs.
It's fixes NR_PAGETABLE accounting and prepare to use
split ptl.
Patches 9-33 Add pgtable_page_ctor() fail handling to all archs.
Patches 34 Finally adds support of dynamically-allocated page->pte.
Also contains documentation for split page table lock.
This patch (of 34):
I've missed that we preallocate few pmds on pgd_alloc() if X86_PAE
enabled. Let's add missed constructor/destructor calls.
I haven't noticed it during testing since prep_new_page() clears
page->mapping and therefore page->ptl. It's effectively equal to
spin_lock_init(&page->ptl).
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Howells <dhowells@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Modify struct acpi_dev_node to contain a pointer to struct acpi_device
associated with the given device object (that is, its ACPI companion
device) instead of an ACPI handle corresponding to it. Introduce two
new macros for manipulating that pointer in a CONFIG_ACPI-safe way,
ACPI_COMPANION() and ACPI_COMPANION_SET(), and rework the
ACPI_HANDLE() macro to take the above changes into account.
Drop the ACPI_HANDLE_SET() macro entirely and rework its users to
use ACPI_COMPANION_SET() instead. For some of them who used to
pass the result of acpi_get_child() directly to ACPI_HANDLE_SET()
introduce a helper routine acpi_preset_companion() doing an
equivalent thing.
The main motivation for doing this is that there are things
represented by struct acpi_device objects that don't have valid
ACPI handles (so called fixed ACPI hardware features, such as
power and sleep buttons) and we would like to create platform
device objects for them and "glue" them to their ACPI companions
in the usual way (which currently is impossible due to the
lack of valid ACPI handles). However, there are more reasons
why it may be useful.
First, struct acpi_device pointers allow of much better type checking
than void pointers which are ACPI handles, so it should be more
difficult to write buggy code using modified struct acpi_dev_node
and the new macros. Second, the change should help to reduce (over
time) the number of places in which the result of ACPI_HANDLE() is
passed to acpi_bus_get_device() in order to obtain a pointer to the
struct acpi_device associated with the given "physical" device,
because now that pointer is returned by ACPI_COMPANION() directly.
Finally, the change should make it easier to write generic code that
will build both for CONFIG_ACPI set and unset without adding explicit
compiler directives to it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> # on Haswell
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Aaron Lu <aaron.lu@intel.com> # for ATA and SDIO part
We've always been able to use either method on VLV, but it appears more
recent BIOSes only support the gen6 method, so switch over to that.
References: https://bugs.freedesktop.org/show_bug.cgi?id=71370
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Pull two x86 fixes from Ingo Molnar.
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/amd: Tone down printk(), don't treat a missing firmware file as an error
x86/dumpstack: Fix printk_address for direct addresses
Pull core locking changes from Ingo Molnar:
"The biggest changes:
- add lockdep support for seqcount/seqlocks structures, this
unearthed both bugs and required extra annotation.
- move the various kernel locking primitives to the new
kernel/locking/ directory"
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
block: Use u64_stats_init() to initialize seqcounts
locking/lockdep: Mark __lockdep_count_forward_deps() as static
lockdep/proc: Fix lock-time avg computation
locking/doc: Update references to kernel/mutex.c
ipv6: Fix possible ipv6 seqlock deadlock
cpuset: Fix potential deadlock w/ set_mems_allowed
seqcount: Add lockdep functionality to seqcount/seqlock structures
net: Explicitly initialize u64_stats_sync structures for lockdep
locking: Move the percpu-rwsem code to kernel/locking/
locking: Move the lglocks code to kernel/locking/
locking: Move the rwsem code to kernel/locking/
locking: Move the rtmutex code to kernel/locking/
locking: Move the semaphore core to kernel/locking/
locking: Move the spinlock code to kernel/locking/
locking: Move the lockdep code to kernel/locking/
locking: Move the mutex code to kernel/locking/
hung_task debugging: Add tracepoint to report the hang
x86/locking/kconfig: Update paravirt spinlock Kconfig description
lockstat: Report avg wait and hold times
lockdep, x86/alternatives: Drop ancient lockdep fixup message
...
Pull x86/trace changes from Ingo Molnar:
"This adds page fault tracepoints which have zero runtime cost in the
disabled case via IDT trickery (no NOPs in the page fault hotpath)"
* 'x86-trace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, trace: Change user|kernel_page_fault to page_fault_user|kernel
x86, trace: Add page fault tracepoints
x86, trace: Delete __trace_alloc_intr_gate()
x86, trace: Register exception handler to trace IDT
x86, trace: Remove __alloc_intr_gate()
- New power capping framework and the the Intel Running Average Power
Limit (RAPL) driver using it from Srinivas Pandruvada and Jacob Pan.
- Addition of the in-kernel switching feature to the arm_big_little
cpufreq driver from Viresh Kumar and Nicolas Pitre.
- cpufreq support for iMac G5 from Aaro Koskinen.
- Baytrail processors support for intel_pstate from Dirk Brandewie.
- cpufreq support for Midway/ECX-2000 from Mark Langsdorf.
- ARM vexpress/TC2 cpufreq support from Sudeep KarkadaNagesha.
- ACPI power management support for the I2C and SPI bus types from
Mika Westerberg and Lv Zheng.
- cpufreq core fixes and cleanups from Viresh Kumar, Srivatsa S Bhat,
Stratos Karafotis, Xiaoguang Chen, Lan Tianyu.
- cpufreq drivers updates (mostly fixes and cleanups) from Viresh Kumar,
Aaro Koskinen, Jungseok Lee, Sudeep KarkadaNagesha, Lukasz Majewski,
Manish Badarkhe, Hans-Christian Egtvedt, Evgeny Kapaev.
- intel_pstate updates from Dirk Brandewie and Adrian Huang.
- ACPICA update to version 20130927 includig fixes and cleanups and
some reduction of divergences between the ACPICA code in the kernel
and ACPICA upstream in order to improve the automatic ACPICA patch
generation process. From Bob Moore, Lv Zheng, Tomasz Nowicki,
Naresh Bhat, Bjorn Helgaas, David E Box.
- ACPI IPMI driver fixes and cleanups from Lv Zheng.
- ACPI hotplug fixes and cleanups from Bjorn Helgaas, Toshi Kani,
Zhang Yanfei, Rafael J Wysocki.
- Conversion of the ACPI AC driver to the platform bus type and
multiple driver fixes and cleanups related to ACPI from Zhang Rui.
- ACPI processor driver fixes and cleanups from Hanjun Guo, Jiang Liu,
Bartlomiej Zolnierkiewicz, Mathieu Rhéaume, Rafael J Wysocki.
- Fixes and cleanups and new blacklist entries related to the ACPI
video support from Aaron Lu, Felipe Contreras, Lennart Poettering,
Kirill Tkhai.
- cpuidle core cleanups from Viresh Kumar and Lorenzo Pieralisi.
- cpuidle drivers fixes and cleanups from Daniel Lezcano, Jingoo Han,
Bartlomiej Zolnierkiewicz, Prarit Bhargava.
- devfreq updates from Sachin Kamat, Dan Carpenter, Manish Badarkhe.
- Operation Performance Points (OPP) core updates from Nishanth Menon.
- Runtime power management core fix from Rafael J Wysocki and update
from Ulf Hansson.
- Hibernation fixes from Aaron Lu and Rafael J Wysocki.
- Device suspend/resume lockup detection mechanism from Benoit Goby.
- Removal of unused proc directories created for various ACPI drivers
from Lan Tianyu.
- ACPI LPSS driver fix and new device IDs for the ACPI platform scan
handler from Heikki Krogerus and Jarkko Nikula.
- New ACPI _OSI blacklist entry for Toshiba NB100 from Levente Kurusa.
- Assorted fixes and cleanups related to ACPI from Andy Shevchenko,
Al Stone, Bartlomiej Zolnierkiewicz, Colin Ian King, Dan Carpenter,
Felipe Contreras, Jianguo Wu, Lan Tianyu, Yinghai Lu, Mathias Krause,
Liu Chuansheng.
- Assorted PM fixes and cleanups from Andy Shevchenko, Thierry Reding,
Jean-Christophe Plagniol-Villard.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABCAAGBQJSfPKLAAoJEILEb/54YlRxH6YQAJwDKi25RCZziFSIenXuqzC/
c6JxoH/tSnDHJHhcTgqh7H7Raa+zmatMDf0m2oEv2Wjfx4Lt4BQK4iefhe/zY4lX
yJ8uXDg+U8DYhDX2XwbwnFpd1M1k/A+s2gIHDTHHGnE0kDngXdd8RAFFktBmooTZ
l5LBQvOrTlgX/ZfqI/MNmQ6lfY6kbCABGSHV1tUUsDA6Kkvk/LAUTOMSmptv1q22
hcs6k55vR34qADPkUX5GghjmcYJv+gNtvbDEJUjcmCwVoPWouF415m7R5lJ8w3/M
49Q8Tbu5HELWLwca64OorS8qh/P7sgUOf1BX5IDzHnJT+TGeDfvcYbMv2Z275/WZ
/bqhuLuKBpsHQ2wvEeT+lYV3FlifKeTf1FBxER3ApjzI3GfpmVVQ+dpEu8e9hcTh
ZTPGzziGtoIsHQ0unxb+zQOyt1PmIk+cU4IsKazs5U20zsVDMcKzPrb19Od49vMX
gCHvRzNyOTqKWpE83Ss4NGOVPAG02AXiXi/BpuYBHKDy6fTH/liKiCw5xlCDEtmt
lQrEbupKpc/dhCLo5ws6w7MZzjWJs2eSEQcNR4DlR++pxIpYOOeoPTXXrghgZt2X
mmxZI2qsJ7GAvPzII8OBeF3CRO3fabZ6Nez+M+oEZjGe05ZtpB3ccw410HwieqBn
dYpJFt/BHK189odhV9CM
=JCxk
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management updates from Rafael J Wysocki:
- New power capping framework and the the Intel Running Average Power
Limit (RAPL) driver using it from Srinivas Pandruvada and Jacob Pan.
- Addition of the in-kernel switching feature to the arm_big_little
cpufreq driver from Viresh Kumar and Nicolas Pitre.
- cpufreq support for iMac G5 from Aaro Koskinen.
- Baytrail processors support for intel_pstate from Dirk Brandewie.
- cpufreq support for Midway/ECX-2000 from Mark Langsdorf.
- ARM vexpress/TC2 cpufreq support from Sudeep KarkadaNagesha.
- ACPI power management support for the I2C and SPI bus types from Mika
Westerberg and Lv Zheng.
- cpufreq core fixes and cleanups from Viresh Kumar, Srivatsa S Bhat,
Stratos Karafotis, Xiaoguang Chen, Lan Tianyu.
- cpufreq drivers updates (mostly fixes and cleanups) from Viresh
Kumar, Aaro Koskinen, Jungseok Lee, Sudeep KarkadaNagesha, Lukasz
Majewski, Manish Badarkhe, Hans-Christian Egtvedt, Evgeny Kapaev.
- intel_pstate updates from Dirk Brandewie and Adrian Huang.
- ACPICA update to version 20130927 includig fixes and cleanups and
some reduction of divergences between the ACPICA code in the kernel
and ACPICA upstream in order to improve the automatic ACPICA patch
generation process. From Bob Moore, Lv Zheng, Tomasz Nowicki, Naresh
Bhat, Bjorn Helgaas, David E Box.
- ACPI IPMI driver fixes and cleanups from Lv Zheng.
- ACPI hotplug fixes and cleanups from Bjorn Helgaas, Toshi Kani, Zhang
Yanfei, Rafael J Wysocki.
- Conversion of the ACPI AC driver to the platform bus type and
multiple driver fixes and cleanups related to ACPI from Zhang Rui.
- ACPI processor driver fixes and cleanups from Hanjun Guo, Jiang Liu,
Bartlomiej Zolnierkiewicz, Mathieu Rhéaume, Rafael J Wysocki.
- Fixes and cleanups and new blacklist entries related to the ACPI
video support from Aaron Lu, Felipe Contreras, Lennart Poettering,
Kirill Tkhai.
- cpuidle core cleanups from Viresh Kumar and Lorenzo Pieralisi.
- cpuidle drivers fixes and cleanups from Daniel Lezcano, Jingoo Han,
Bartlomiej Zolnierkiewicz, Prarit Bhargava.
- devfreq updates from Sachin Kamat, Dan Carpenter, Manish Badarkhe.
- Operation Performance Points (OPP) core updates from Nishanth Menon.
- Runtime power management core fix from Rafael J Wysocki and update
from Ulf Hansson.
- Hibernation fixes from Aaron Lu and Rafael J Wysocki.
- Device suspend/resume lockup detection mechanism from Benoit Goby.
- Removal of unused proc directories created for various ACPI drivers
from Lan Tianyu.
- ACPI LPSS driver fix and new device IDs for the ACPI platform scan
handler from Heikki Krogerus and Jarkko Nikula.
- New ACPI _OSI blacklist entry for Toshiba NB100 from Levente Kurusa.
- Assorted fixes and cleanups related to ACPI from Andy Shevchenko, Al
Stone, Bartlomiej Zolnierkiewicz, Colin Ian King, Dan Carpenter,
Felipe Contreras, Jianguo Wu, Lan Tianyu, Yinghai Lu, Mathias Krause,
Liu Chuansheng.
- Assorted PM fixes and cleanups from Andy Shevchenko, Thierry Reding,
Jean-Christophe Plagniol-Villard.
* tag 'pm+acpi-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (386 commits)
cpufreq: conservative: fix requested_freq reduction issue
ACPI / hotplug: Consolidate deferred execution of ACPI hotplug routines
PM / runtime: Use pm_runtime_put_sync() in __device_release_driver()
ACPI / event: remove unneeded NULL pointer check
Revert "ACPI / video: Ignore BIOS initial backlight value for HP 250 G1"
ACPI / video: Quirk initial backlight level 0
ACPI / video: Fix initial level validity test
intel_pstate: skip the driver if ACPI has power mgmt option
PM / hibernate: Avoid overflow in hibernate_preallocate_memory()
ACPI / hotplug: Do not execute "insert in progress" _OST
ACPI / hotplug: Carry out PCI root eject directly
ACPI / hotplug: Merge device hot-removal routines
ACPI / hotplug: Make acpi_bus_hot_remove_device() internal
ACPI / hotplug: Simplify device ejection routines
ACPI / hotplug: Fix handle_root_bridge_removal()
ACPI / hotplug: Refuse to hot-remove all objects with disabled hotplug
ACPI / scan: Start matching drivers after trying scan handlers
ACPI: Remove acpi_pci_slot_init() headers from internal.h
ACPI / blacklist: fix name of ThinkPad Edge E530
PowerCap: Fix build error with option -Werror=format-security
...
Conflicts:
arch/arm/mach-omap2/opp.c
drivers/Kconfig
drivers/spi/spi.c
If a nested guest does a NM fault but its CR0 doesn't contain the TS
flag (because it was already cleared by the guest with L1 aid) then we
have to activate FPU ourselves in L0 and then continue to L2. If TS flag
is set then we fallback on the previous behavior, forward the fault to
L1 if it asked for.
Signed-off-by: Anthoine Bourgeois <bourgeois@bertin.fr>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull networking updates from David Miller:
1) The addition of nftables. No longer will we need protocol aware
firewall filtering modules, it can all live in userspace.
At the core of nftables is a, for lack of a better term, virtual
machine that executes byte codes to inspect packet or metadata
(arriving interface index, etc.) and make verdict decisions.
Besides support for loading packet contents and comparing them, the
interpreter supports lookups in various datastructures as
fundamental operations. For example sets are supports, and
therefore one could create a set of whitelist IP address entries
which have ACCEPT verdicts attached to them, and use the appropriate
byte codes to do such lookups.
Since the interpreted code is composed in userspace, userspace can
do things like optimize things before giving it to the kernel.
Another major improvement is the capability of atomically updating
portions of the ruleset. In the existing netfilter implementation,
one has to update the entire rule set in order to make a change and
this is very expensive.
Userspace tools exist to create nftables rules using existing
netfilter rule sets, but both kernel implementations will need to
co-exist for quite some time as we transition from the old to the
new stuff.
Kudos to Patrick McHardy, Pablo Neira Ayuso, and others who have
worked so hard on this.
2) Daniel Borkmann and Hannes Frederic Sowa made several improvements
to our pseudo-random number generator, mostly used for things like
UDP port randomization and netfitler, amongst other things.
In particular the taus88 generater is updated to taus113, and test
cases are added.
3) Support 64-bit rates in HTB and TBF schedulers, from Eric Dumazet
and Yang Yingliang.
4) Add support for new 577xx tigon3 chips to tg3 driver, from Nithin
Sujir.
5) Fix two fatal flaws in TCP dynamic right sizing, from Eric Dumazet,
Neal Cardwell, and Yuchung Cheng.
6) Allow IP_TOS and IP_TTL to be specified in sendmsg() ancillary
control message data, much like other socket option attributes.
From Francesco Fusco.
7) Allow applications to specify a cap on the rate computed
automatically by the kernel for pacing flows, via a new
SO_MAX_PACING_RATE socket option. From Eric Dumazet.
8) Make the initial autotuned send buffer sizing in TCP more closely
reflect actual needs, from Eric Dumazet.
9) Currently early socket demux only happens for TCP sockets, but we
can do it for connected UDP sockets too. Implementation from Shawn
Bohrer.
10) Refactor inet socket demux with the goal of improving hash demux
performance for listening sockets. With the main goals being able
to use RCU lookups on even request sockets, and eliminating the
listening lock contention. From Eric Dumazet.
11) The bonding layer has many demuxes in it's fast path, and an RCU
conversion was started back in 3.11, several changes here extend the
RCU usage to even more locations. From Ding Tianhong and Wang
Yufen, based upon suggestions by Nikolay Aleksandrov and Veaceslav
Falico.
12) Allow stackability of segmentation offloads to, in particular, allow
segmentation offloading over tunnels. From Eric Dumazet.
13) Significantly improve the handling of secret keys we input into the
various hash functions in the inet hashtables, TCP fast open, as
well as syncookies. From Hannes Frederic Sowa. The key fundamental
operation is "net_get_random_once()" which uses static keys.
Hannes even extended this to ipv4/ipv6 fragmentation handling and
our generic flow dissector.
14) The generic driver layer takes care now to set the driver data to
NULL on device removal, so it's no longer necessary for drivers to
explicitly set it to NULL any more. Many drivers have been cleaned
up in this way, from Jingoo Han.
15) Add a BPF based packet scheduler classifier, from Daniel Borkmann.
16) Improve CRC32 interfaces and generic SKB checksum iterators so that
SCTP's checksumming can more cleanly be handled. Also from Daniel
Borkmann.
17) Add a new PMTU discovery mode, IP_PMTUDISC_INTERFACE, which forces
using the interface MTU value. This helps avoid PMTU attacks,
particularly on DNS servers. From Hannes Frederic Sowa.
18) Use generic XPS for transmit queue steering rather than internal
(re-)implementation in virtio-net. From Jason Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
random32: add test cases for taus113 implementation
random32: upgrade taus88 generator to taus113 from errata paper
random32: move rnd_state to linux/random.h
random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized
random32: add periodic reseeding
random32: fix off-by-one in seeding requirement
PHY: Add RTL8201CP phy_driver to realtek
xtsonic: add missing platform_set_drvdata() in xtsonic_probe()
macmace: add missing platform_set_drvdata() in mace_probe()
ethernet/arc/arc_emac: add missing platform_set_drvdata() in arc_emac_probe()
ipv6: protect for_each_sk_fl_rcu in mem_check with rcu_read_lock_bh
vlan: Implement vlan_dev_get_egress_qos_mask as an inline.
ixgbe: add warning when max_vfs is out of range.
igb: Update link modes display in ethtool
netfilter: push reasm skb through instead of original frag skbs
ip6_output: fragment outgoing reassembled skb properly
MAINTAINERS: mv643xx_eth: take over maintainership from Lennart
net_sched: tbf: support of 64bit rates
ixgbe: deleting dfwd stations out of order can cause null ptr deref
ixgbe: fix build err, num_rx_queues is only available with CONFIG_RPS
...
Merge first patch-bomb from Andrew Morton:
"Quite a lot of other stuff is banked up awaiting further
next->mainline merging, but this batch contains:
- Lots of random misc patches
- OCFS2
- Most of MM
- backlight updates
- lib/ updates
- printk updates
- checkpatch updates
- epoll tweaking
- rtc updates
- hfs
- hfsplus
- documentation
- procfs
- update gcov to gcc-4.7 format
- IPC"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (269 commits)
ipc, msg: fix message length check for negative values
ipc/util.c: remove unnecessary work pending test
devpts: plug the memory leak in kill_sb
./Makefile: export initial ramdisk compression config option
init/Kconfig: add option to disable kernel compression
drivers: w1: make w1_slave::flags long to avoid memory corruption
drivers/w1/masters/ds1wm.cuse dev_get_platdata()
drivers/memstick/core/ms_block.c: fix unreachable state in h_msb_read_page()
drivers/memstick/core/mspro_block.c: fix attributes array allocation
drivers/pps/clients/pps-gpio.c: remove redundant of_match_ptr
kernel/panic.c: reduce 1 byte usage for print tainted buffer
gcov: reuse kbasename helper
kernel/gcov/fs.c: use pr_warn()
kernel/module.c: use pr_foo()
gcov: compile specific gcov implementation based on gcc version
gcov: add support for gcc 4.7 gcov format
gcov: move gcov structs definitions to a gcc version specific file
kernel/taskstats.c: return -ENOMEM when alloc memory fails in add_del_listener()
kernel/taskstats.c: add nla_nest_cancel() for failure processing between nla_nest_start() and nla_nest_end()
kernel/sysctl_binary.c: use scnprintf() instead of snprintf()
...
Pull vfs updates from Al Viro:
"All kinds of stuff this time around; some more notable parts:
- RCU'd vfsmounts handling
- new primitives for coredump handling
- files_lock is gone
- Bruce's delegations handling series
- exportfs fixes
plus misc stuff all over the place"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
ecryptfs: ->f_op is never NULL
locks: break delegations on any attribute modification
locks: break delegations on link
locks: break delegations on rename
locks: helper functions for delegation breaking
locks: break delegations on unlink
namei: minor vfs_unlink cleanup
locks: implement delegations
locks: introduce new FL_DELEG lock flag
vfs: take i_mutex on renamed file
vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
vfs: don't use PARENT/CHILD lock classes for non-directories
vfs: pull ext4's double-i_mutex-locking into common code
exportfs: fix quadratic behavior in filehandle lookup
exportfs: better variable name
exportfs: move most of reconnect_path to helper function
exportfs: eliminate unused "noprogress" counter
exportfs: stop retrying once we race with rename/remove
exportfs: clear DISCONNECTED on all parents sooner
exportfs: more detailed comment for path_reconnect
...
Pull percpu changes from Tejun Heo:
"Two smallish changes for percpu. Two patches to remove unused
this_cpu_xor() and one to fix a bug in percpu init failure path so
that it can reach the proper BUG() instead of oopsing earlier"
* 'for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
x86: remove this_cpu_xor() implementation
percpu: remove this_cpu_xor() implementation
percpu: fix bootmem error handling in pcpu_page_first_chunk()
Only a couple of arches (sh/x86) use fpu_counter in task_struct so it can
be moved out into ARCH specific thread_struct, reducing the size of
task_struct for other arches.
Compile tested i386_defconfig + gcc 4.7.3
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Paul Mundt <paul.mundt@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The hot-Pluggable field in SRAT specifies which memory is hotpluggable.
As we mentioned before, if hotpluggable memory is used by the kernel, it
cannot be hot-removed. So memory hotplug users may want to set all
hotpluggable memory in ZONE_MOVABLE so that the kernel won't use it.
Memory hotplug users may also set a node as movable node, which has
ZONE_MOVABLE only, so that the whole node can be hot-removed.
But the kernel cannot use memory in ZONE_MOVABLE. By doing this, the
kernel cannot use memory in movable nodes. This will cause NUMA
performance down. And other users may be unhappy.
So we need a way to allow users to enable and disable this functionality.
In this patch, we introduce movable_node boot option to allow users to
choose to not to consume hotpluggable memory at early boot time and later
we can set it as ZONE_MOVABLE.
To achieve this, the movable_node boot option will control the memblock
allocation direction. That said, after memblock is ready, before SRAT is
parsed, we should allocate memory near the kernel image as we explained in
the previous patches. So if movable_node boot option is set, the kernel
does the following:
1. After memblock is ready, make memblock allocate memory bottom up.
2. After SRAT is parsed, make memblock behave as default, allocate memory
top down.
Users can specify "movable_node" in kernel commandline to enable this
functionality. For those who don't use memory hotplug or who don't want
to lose their NUMA performance, just don't specify anything. The kernel
will work as before.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Suggested-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Memory reserved for crashkernel could be large. So we should not allocate
this memory bottom up from the end of kernel image.
When SRAT is parsed, we will be able to know which memory is hotpluggable,
and we can avoid allocating this memory for the kernel. So reorder
reserve_crashkernel() after SRAT is parsed.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The Linux kernel cannot migrate pages used by the kernel. As a result,
kernel pages cannot be hot-removed. So we cannot allocate hotpluggable
memory for the kernel.
In a memory hotplug system, any numa node the kernel resides in should be
unhotpluggable. And for a modern server, each node could have at least
16GB memory. So memory around the kernel image is highly likely
unhotpluggable.
ACPI SRAT (System Resource Affinity Table) contains the memory hotplug
info. But before SRAT is parsed, memblock has already started to allocate
memory for the kernel. So we need to prevent memblock from doing this.
So direct memory mapping page tables setup is the case.
init_mem_mapping() is called before SRAT is parsed. To prevent page
tables being allocated within hotpluggable memory, we will use bottom-up
direction to allocate page tables from the end of kernel image to the
higher memory.
Note:
As for allocating page tables in lower memory, TJ said:
: This is an optional behavior which is triggered by a very specific kernel
: boot param, which I suspect is gonna need to stick around to support
: memory hotplug in the current setup unless we add another layer of address
: translation to support memory hotplug.
As for page tables may occupy too much lower memory if using 4K mapping
(CONFIG_DEBUG_PAGEALLOC and CONFIG_KMEMCHECK both disable using >4k
pages), TJ said:
: But as I said in the same paragraph, parsing SRAT earlier doesn't solve
: the problem in itself either. Ignoring the option if 4k mapping is
: required and memory consumption would be prohibitive should work, no?
: Something like that would be necessary if we're gonna worry about cases
: like this no matter how we implement it, but, frankly, I'm not sure this
: is something worth worrying about.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Create a new function memory_map_top_down to factor out of the top-down
direct memory mapping pagetable setup. This is also a preparation for the
following patch, which will introduce the bottom-up memory mapping. That
said, we will put the two ways of pagetable setup into separate functions,
and choose to use which way in init_mem_mapping, which makes the code more
clear.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use more appropriate NUMA_NO_NODE instead of -1 in all archs' module_alloc()
Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Support the next generation Intel Atom processor
mirco-architecture, formerly called Silvermont.
The server version, formerly called "Avoton",
is named the "Intel(R) Atom(TM) Processor C2000 Product Family".
The client version, formerly called "Bay Trail",
is named the "Intel Atom Processor Z3000 Series",
as well as various "Intel Pentium Processor"
and "Intel Celeron Processor" brands, depending
on form-factor.
Silvermont has a set of MSRs not far off from NHM,
but the RAPL register set is a sub-set of those previously supported.
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Do it the same way as done in microcode_intel.c: use pr_debug()
for missing firmware files.
There seem to be CPUs out there for which no microcode update
has been submitted to kernel-firmware repo yet resulting in
scary sounding error messages in dmesg:
microcode: failed to load file amd-ucode/microcode_amd_fam16h.bin
Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1384274383-43510-1-git-send-email-trenn@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Consider a kernel crash in a module, simulated the following way:
static int my_init(void)
{
char *map = (void *)0x5;
*map = 3;
return 0;
}
module_init(my_init);
When we turn off FRAME_POINTERs, the very first instruction in
that function causes a BUG. The problem is that we print IP in
the BUG report using %pB (from printk_address). And %pB
decrements the pointer by one to fix printing addresses of
functions with tail calls.
This was added in commit 71f9e59800 ("x86, dumpstack: Use
%pB format specifier for stack trace") to fix the call stack
printouts.
So instead of correct output:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000005
IP: [<ffffffffa01ac000>] my_init+0x0/0x10 [pb173]
We get:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000005
IP: [<ffffffffa0152000>] 0xffffffffa0151fff
To fix that, we use %pS only for stack addresses printouts (via
newly added printk_stack_address) and %pB for regs->ip (via
printk_address). I.e. we revert to the old behaviour for all
except call stacks. And since from all those reliable is 1, we
remove that parameter from printk_address.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: joe@perches.com
Cc: jirislaby@gmail.com
Link: http://lkml.kernel.org/r/1382706418-8435-1-git-send-email-jslaby@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
usual for this cycle with lots of clean-up.
- Cross arch clean-up and consolidation of early DT scanning code.
- Clean-up and removal of arch prom.h headers. Makes arch specific
prom.h optional on all but Sparc.
- Addition of interrupts-extended property for devices connected to
multiple interrupt controllers.
- Refactoring of DT interrupt parsing code in preparation for deferred
probe of interrupts.
- ARM cpu and cpu topology bindings documentation.
- Various DT vendor binding documentation updates.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJSgPQ4AAoJEMhvYp4jgsXif28H/1WkrXq5+lCFQZF8nbYdE2h0
R8PsfiJJmAl6/wFgQTsRel+ScMk2hiP08uTyqf2RLnB1v87gCF7MKVaLOdONfUDi
huXbcQGWCmZv0tbBIklxJe3+X3FIJch4gnyUvPudD1m8a0R0LxWXH/NhdTSFyB20
PNjhN/IzoN40X1PSAhfB5ndWnoxXBoehV/IVHVDU42vkPVbVTyGAw5qJzHW8CLyN
2oGTOalOO4ffQ7dIkBEQfj0mrgGcODToPdDvUQyyGZjYK2FY2sGrjyquir6SDcNa
Q4gwatHTu0ygXpyphjtQf5tc3ZCejJ/F0s3olOAS1ahKGfe01fehtwPRROQnCK8=
=GCbY
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
"DeviceTree updates for 3.13. This is a bit larger pull request than
usual for this cycle with lots of clean-up.
- Cross arch clean-up and consolidation of early DT scanning code.
- Clean-up and removal of arch prom.h headers. Makes arch specific
prom.h optional on all but Sparc.
- Addition of interrupts-extended property for devices connected to
multiple interrupt controllers.
- Refactoring of DT interrupt parsing code in preparation for
deferred probe of interrupts.
- ARM cpu and cpu topology bindings documentation.
- Various DT vendor binding documentation updates"
* tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (82 commits)
powerpc: add missing explicit OF includes for ppc
dt/irq: add empty of_irq_count for !OF_IRQ
dt: disable self-tests for !OF_IRQ
of: irq: Fix interrupt-map entry matching
MIPS: Netlogic: replace early_init_devtree() call
of: Add Panasonic Corporation vendor prefix
of: Add Chunghwa Picture Tubes Ltd. vendor prefix
of: Add AU Optronics Corporation vendor prefix
of/irq: Fix potential buffer overflow
of/irq: Fix bug in interrupt parsing refactor.
of: set dma_mask to point to coherent_dma_mask
of: add vendor prefix for PHYTEC Messtechnik GmbH
DT: sort vendor-prefixes.txt
of: Add vendor prefix for Cadence
of: Add empty for_each_available_child_of_node() macro definition
arm/versatile: Fix versatile irq specifications.
of/irq: create interrupts-extended property
microblaze/pci: Drop PowerPC-ism from irq parsing
of/irq: Create of_irq_parse_and_map_pci() to consolidate arch code.
of/irq: Use irq_of_parse_and_map()
...
Pull x86 UV debug changes from Ingo Molnar:
"Various SGI UV debuggability improvements, amongst them KDB support,
with related core KDB enabling patches changing kernel/debug/kdb/"
* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "x86/UV: Add uvtrace support"
x86/UV: Add call to KGDB/KDB from NMI handler
kdb: Add support for external NMI handler to call KGDB/KDB
x86/UV: Check for alloc_cpumask_var() failures properly in uv_nmi_setup()
x86/UV: Add uvtrace support
x86/UV: Add kdump to UV NMI handler
x86/UV: Add summary of cpu activity to UV NMI handler
x86/UV: Update UV support for external NMI signals
x86/UV: Move NMI support
Pull x86 uaccess changes from Ingo Molnar:
"A single change that micro-optimizes __copy_*_user_inatomic(), used by
the futex code"
* 'x86-uaccess-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Add 1/2/4/8 byte optimization to 64bit __copy_{from,to}_user_inatomic
Pull x86 reboot changes from Ingo Molnar:
"Misc changes - the only one with functional impact should be commit
16c21ae5ca ("reboot: Allow specifying warm/cold reset for CF9 boot
type") which extends cold/warm reboot handling to the 0xCF9 reboot
method"
* 'x86-reboot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/reboot: Correct pr_info() log message in the set_bios/pci/kbd_reboot()
x86/reboot: Sort reboot DMI quirks by vendor
x86/reboot: Remove the duplicate C6100 entry in the reboot quirks list
reboot: Allow specifying warm/cold reset for CF9 boot type
Pull x86 platform fixlet from Ingo Molnar:
"A single __initdata fix"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/geode: Fix incorrect placement of __initdata tag
Pull x86 mm fixlet from Ingo Molnar:
"One cleanup that documents a particular detail in init_mem_mapping()"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Add 'step_size' comments to init_mem_mapping()
Pull x86 RAS changes from Ingo Molnar:
"The biggest change adds support for Intel 'CPER' (UEFI Common Platform
Error Record) error logging, which builds upon an enhanced error
logging mechanism available on Xeon processors.
Full description is here:
http://www.intel.com/content/www/us/en/architecture-and-technology/enhanced-mca-logging-xeon-paper.html
This change provides a module (and support code) to check for an
extended error log and prints extra details about the error on the
console"
* 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ACPI, x86: Fix extended error log driver to depend on CONFIG_X86_LOCAL_APIC
dmi: Avoid unaligned memory access in save_mem_devices()
Move cper.c from drivers/acpi/apei to drivers/firmware/efi
EDAC, GHES: Update ghes error record info
ACPI, APEI, CPER: Cleanup CPER memory error output format
ACPI, APEI, CPER: Enhance memory reporting capability
ACPI, APEI, CPER: Add UEFI 2.4 support for memory error
DMI: Parse memory device (type 17) in SMBIOS
ACPI, x86: Extended error log driver for x86 platform
bitops: Introduce a more generic BITMASK macro
ACPI, CPER: Update cper info
ACPI, APEI, CPER: Fix status check during error printing
Pull x86 iommu changes from Ingo Molnar:
"Make it easier to turn off the old AMD GART code"
* 'x86-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/iommu: Clean up the CONFIG_GART_IOMMU config option a bit
x86/iommu: Don't make AMD_GART depend on EXPERT and default y
Pull x86/intel-mid changes from Ingo Molnar:
"Update the 'intel mid' (mobile internet device) platform code as Intel
is rolling out more SoC designs.
This gets rid of most of the 'MRST' platform code in the process,
mostly by renaming and shuffling code around into their respective
'intel-mid' platform drivers"
* 'x86-intel-mid-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, intel-mid: Do not re-introduce usage of obsolete __cpuinit
intel_mid: Move platform device setups to their own platform_<device>.* files
x86: intel-mid: Add section for sfi device table
intel-mid: sfi: Allow struct devs_id.get_platform_data to be NULL
intel_mid: Moved SFI related code to sfi.c
intel_mid: Added custom handler for ipc devices
intel_mid: Added custom device_handler support
intel_mid: Refactored sfi_parse_devs() function
intel_mid: Renamed *mrst* to *intel_mid*
pci: intel_mid: Return true/false in function returning bool
intel_mid: Renamed *mrst* to *intel_mid*
mrst: Fixed indentation issues
mrst: Fixed printk/pr_* related issues
Pull x86/hyperv changes from Ingo Molnar:
"These changes enable Linux guests to boot as 'Modern VM' guest kernels
on MS-Hyperv hosts"
* 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, hyperv: Move a variable to avoid an unused variable warning
x86, hyperv: Fix build error due to missing <asm/apic.h> include
x86, hyperv: Correctly guard the local APIC calibration code
x86, hyperv: Get the local APIC timer frequency from the hypervisor
Pull x86 EFI changes from Ingo Molnar:
"Main changes:
- Add support for earlyprintk=efi which uses the EFI framebuffer.
Very useful for debugging boot problems.
- EFI stub support for large memory maps (more than 128 entries)
- EFI ARM support - this was mostly done by generalizing x86 <-> ARM
platform differences, such as by moving x86 EFI code into
drivers/firmware/efi/ and sharing it with ARM.
- Documentation updates
- misc fixes"
* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
x86/efi: Add EFI framebuffer earlyprintk support
boot, efi: Remove redundant memset()
x86/efi: Fix config_table_type array termination
x86 efi: bugfix interrupt disabling sequence
x86: EFI stub support for large memory maps
efi: resolve warnings found on ARM compile
efi: Fix types in EFI calls to match EFI function definitions.
efi: Renames in handle_cmdline_files() to complete generalization.
efi: Generalize handle_ramdisks() and rename to handle_cmdline_files().
efi: Allow efi_free() to be called with size of 0
efi: use efi_get_memory_map() to get final map for x86
efi: generalize efi_get_memory_map()
efi: Rename __get_map() to efi_get_memory_map()
efi: Move unicode to ASCII conversion to shared function.
efi: Generalize relocate_kernel() for use by other architectures.
efi: Move relocate_kernel() to shared file.
efi: Enforce minimum alignment of 1 page on allocations.
efi: Rename memory allocation/free functions
efi: Add system table pointer argument to shared functions.
efi: Move common EFI stub code from x86 arch code to common location
...
Pull x86 cpu changes from Ingo Molnar:
"The biggest change that stands out is the increase of the
CONFIG_NR_CPUS range from 4096 to 8192 - as real hardware out there
already went beyond 4k CPUs ...
We only allow more than 512 CPUs if offstack cpumasks are enabled.
CONFIG_MAXSMP=y remains to be the 'you are nuts!' extreme testcase,
which now means a max of 8192 CPUs"
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Increase max CPU count to 8192
x86/cpu: Allow higher NR_CPUS values
x86/cpu: Always print SMP information in /proc/cpuinfo
x86/cpu: Track legacy CPU model data only on 32-bit kernels
Pull x86 cleanups from Ingo Molnar:
"Two small cleanups"
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, msr: Use file_inode(), not f_mapping->host
x86: mkpiggy.c: Explicitly close the output file
Pull x86 build changes from Ingo Molnar:
"Two small changes"
* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, defconfig: Add DEVTMPFS and DEVTMPFS_MOUNT to *86*_defconfig
x86, build: move build output statistics away from stderr
Pull x86 user access changes from Ingo Molnar:
"This tree contains two copy_[from/to]_user() build time checking
changes/enhancements from Jan Beulich.
The desired outcome is to get better compiler warnings with
CONFIG_DEBUG_STRICT_USER_COPY_CHECKS=y, to keep people from
introducing bugs such as overflows and information leaks"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Unify copy_to_user() and add size checking to it
x86: Unify copy_from_user() size checking
Pull x86/apic fix from Ingo Molnar:
"A single fix to the IO-APIC / local-APIC shutdown sequence"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Disable I/O APIC before shutdown of the local APIC
Pull timer changes from Ingo Molnar:
"Main changes in this cycle were:
- Updated full dynticks support.
- Event stream support for architected (ARM) timers.
- ARM clocksource driver updates.
- Move arm64 to using the generic sched_clock framework & resulting
cleanup in the generic sched_clock code.
- Misc fixes and cleanups"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
x86/time: Honor ACPI FADT flag indicating absence of a CMOS RTC
clocksource: sun4i: remove IRQF_DISABLED
clocksource: sun4i: Report the minimum tick that we can program
clocksource: sun4i: Select CLKSRC_MMIO
clocksource: Provide timekeeping for efm32 SoCs
clocksource: em_sti: convert to clk_prepare/unprepare
time: Fix signedness bug in sysfs_get_uname() and its callers
timekeeping: Fix some trivial typos in comments
alarmtimer: return EINVAL instead of ENOTSUPP if rtcdev doesn't exist
clocksource: arch_timer: Do not register arch_sys_counter twice
timer stats: Add a 'Collection: active/inactive' line to timer usage statistics
sched_clock: Remove sched_clock_func() hook
arch_timer: Move to generic sched_clock framework
clocksource: tcb_clksrc: Remove IRQF_DISABLED
clocksource: tcb_clksrc: Improve driver robustness
clocksource: tcb_clksrc: Replace clk_enable/disable with clk_prepare_enable/disable_unprepare
clocksource: arm_arch_timer: Use clocksource for suspend timekeeping
clocksource: dw_apb_timer_of: Mark a few more functions as __init
clocksource: Put nodes passed to CLOCKSOURCE_OF_DECLARE callbacks centrally
arm: zynq: Enable arm_global_timer
...
Pull scheduler changes from Ingo Molnar:
"The main changes in this cycle are:
- (much) improved CONFIG_NUMA_BALANCING support from Mel Gorman, Rik
van Riel, Peter Zijlstra et al. Yay!
- optimize preemption counter handling: merge the NEED_RESCHED flag
into the preempt_count variable, by Peter Zijlstra.
- wait.h fixes and code reorganization from Peter Zijlstra
- cfs_bandwidth fixes from Ben Segall
- SMP load-balancer cleanups from Peter Zijstra
- idle balancer improvements from Jason Low
- other fixes and cleanups"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits)
ftrace, sched: Add TRACE_FLAG_PREEMPT_RESCHED
stop_machine: Fix race between stop_two_cpus() and stop_cpus()
sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus
sched: Fix asymmetric scheduling for POWER7
sched: Move completion code from core.c to completion.c
sched: Move wait code from core.c to wait.c
sched: Move wait.c into kernel/sched/
sched/wait: Fix __wait_event_interruptible_lock_irq_timeout()
sched: Avoid throttle_cfs_rq() racing with period_timer stopping
sched: Guarantee new group-entities always have weight
sched: Fix hrtimer_cancel()/rq->lock deadlock
sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining
sched: Fix race on toggling cfs_bandwidth_used
sched: Remove extra put_online_cpus() inside sched_setaffinity()
sched/rt: Fix task_tick_rt() comment
sched/wait: Fix build breakage
sched/wait: Introduce prepare_to_wait_event()
sched/wait: Add ___wait_cond_timeout() to wait_event*_timeout() too
sched: Remove get_online_cpus() usage
sched: Fix race in migrate_swap_stop()
...
Pull perf updates from Ingo Molnar:
"As a first remark I'd like to note that the way to build perf tooling
has been simplified and sped up, in the future it should be enough for
you to build perf via:
cd tools/perf/
make install
(ie without the -j option.) The build system will figure out the
number of CPUs and will do a parallel build+install.
The various build system inefficiencies and breakages Linus reported
against the v3.12 pull request should now be resolved - please
(re-)report any remaining annoyances or bugs.
Main changes on the perf kernel side:
* Performance optimizations:
. perf ring-buffer code optimizations, by Peter Zijlstra
. perf ring-buffer code optimizations, by Oleg Nesterov
. x86 NMI call-stack processing optimizations, by Peter Zijlstra
. perf context-switch optimizations, by Peter Zijlstra
. perf sampling speedups, by Peter Zijlstra
. x86 Intel PEBS processing speedups, by Peter Zijlstra
* Enhanced hardware support:
. for Intel Ivy Bridge-EP uncore PMUs, by Zheng Yan
. for Haswell transactions, by Andi Kleen, Peter Zijlstra
* Core perf events code enhancements and fixes by Oleg Nesterov:
. for uprobes, if fork() is called with pending ret-probes
. for uprobes platform support code
* New ABI details by Andi Kleen:
. Report x86 Haswell TSX transaction abort cost as weight
Main changes on the perf tooling side (some of these tooling changes
utilize the above kernel side changes):
* 'perf report/top' enhancements:
. Convert callchain children list to rbtree, greatly reducing the
time taken for callchain processing, from Namhyung Kim.
. Add new COMM infrastructure, further improving histogram
processing, from Frédéric Weisbecker, one fix from Namhyung Kim.
. Add /proc/kcore based live-annotation improvements, including
build-id cache support, multi map 'call' instruction navigation
fixes, kcore address validation, objdump workarounds. From
Adrian Hunter.
. Show progress on histogram collapsing, that can take a long
time, from Namhyung Kim.
. Add --max-stack option to limit callchain stack scan in 'top'
and 'report', improving callchain processing when reducing the
stack depth is an option, from Waiman Long.
. Add new option --ignore-vmlinux for perf top, from Willy
Tarreau.
* 'perf trace' enhancements:
. 'perf trace' now can can use a 'perf probe' dynamic tracepoints
to hook into the userspace -> kernel pathname copy so that it
can map fds to pathnames without reading /proc/pid/fd/ symlinks.
From Arnaldo Carvalho de Melo.
. Show VFS path associated with fd in live sessions, using a
'vfs_getname' 'perf probe' created dynamic tracepoint or by
looking at /proc/pid/fd, from Arnaldo Carvalho de Melo.
. Add 'trace' beautifiers for lots of syscall arguments, from
Arnaldo Carvalho de Melo.
. Implement more compact 'trace' output by suppressing zeroed
args, from Arnaldo Carvalho de Melo.
. Show thread COMM by default in 'trace', from Arnaldo Carvalho de
Melo.
. Add option to show full timestamp in 'trace', from David Ahern.
. Add 'record' command in 'trace', to record raw_syscalls:*, from
David Ahern.
. Add summary option to dump syscall statistics in 'trace', from
David Ahern.
. Improve error messages in 'trace', providing hints about system
configuration steps needed for using it, from Ramkumar
Ramachandra.
. 'perf trace' now emits hints as to why tracing is not possible,
helping the user to setup the system to allow tracing in the
desired permission granularity, telling if the problem is due to
debugfs not being mounted or with not enough permission for
!root, /proc/sys/kernel/perf_event_paranoit value, etc. From
Arnaldo Carvalho de Melo.
* 'perf record' enhancements:
. Check maximum frequency rate for record/top, emitting better
error messages, from Jiri Olsa.
. 'perf record' code cleanups, from David Ahern.
. Improve write_output error message in 'perf record', from Adrian
Hunter.
. Allow specifying B/K/M/G unit to the --mmap-pages arguments,
from Jiri Olsa.
. Fix command line callchain attribute tests to handle the new
-g/--call-chain semantics, from Arnaldo Carvalho de Melo.
* 'perf kvm' enhancements:
. Disable live kvm command if timerfd is not supported, from David
Ahern.
. Fix detection of non-core features, from David Ahern.
* 'perf list' enhancements:
. Add usage to 'perf list', from David Ahern.
. Show error in 'perf list' if tracepoints not available, from
Pekka Enberg.
* 'perf probe' enhancements:
. Support "$vars" meta argument syntax for local variables,
allowing asking for all possible variables at a given probe
point to be collected when it hits, from Masami Hiramatsu.
* 'perf sched' enhancements:
. Address the root cause of that 'perf sched' stack initialization
build slowdown, by programmatically setting a big array after
moving the global variable back to the stack. Fix from Adrian
Hunter.
* 'perf script' enhancements:
. Set up output options for in-stream attributes, from Adrian
Hunter.
. Print addr by default for BTS in 'perf script', from Adrian
Juntmer
* 'perf stat' enhancements:
. Improved messages when doing profiling in all or a subset of
CPUs using a workload as the session delimitator, as in:
'perf stat --cpu 0,2 sleep 10s'
from Arnaldo Carvalho de Melo.
. Add units to nanosec-based counters in 'perf stat', from David
Ahern.
. Remove bogus info when using 'perf stat' -e cycles/instructions,
from Ramkumar Ramachandra.
* 'perf lock' enhancements:
. 'perf lock' fixes and cleanups, from Davidlohr Bueso.
* 'perf test' enhancements:
. Fixup PERF_SAMPLE_TRANSACTION handling in sample synthesizing
and 'perf test', from Adrian Hunter.
. Clarify the "sample parsing" test entry, from Arnaldo Carvalho
de Melo.
. Consider PERF_SAMPLE_TRANSACTION in the "sample parsing" test,
from Arnaldo Carvalho de Melo.
. Memory leak fixes in 'perf test', from Felipe Pena.
* 'perf bench' enhancements:
. Change the procps visible command-name of invididual benchmark
tests plus cleanups, from Ingo Molnar.
* Generic perf tooling infrastructure/plumbing changes:
. Separating data file properties from session, code
reorganization from Jiri Olsa.
. Fix version when building out of tree, as when using one of
these:
$ make help | grep perf
perf-tar-src-pkg - Build perf-3.12.0.tar source tarball
perf-targz-src-pkg - Build perf-3.12.0.tar.gz source tarball
perf-tarbz2-src-pkg - Build perf-3.12.0.tar.bz2 source tarball
perf-tarxz-src-pkg - Build perf-3.12.0.tar.xz source tarball
$
from David Ahern.
. Enhance option parse error message, showing just the help lines
of the options affected, from Namhyung Kim.
. libtraceevent updates from upstream trace-cmd repo, from Steven
Rostedt.
. Always use perf_evsel__set_sample_bit to set sample_type, from
Adrian Hunter.
. Memory and mmap leak fixes from Chenggang Qin.
. Assorted build fixes for from David Ahern and Jiri Olsa.
. Speed up and prettify the build system, from Ingo Molnar.
. Implement addr2line directly using libbfd, from Roberto Vitillo.
. Separate the GTK support in a separate libperf-gtk.so DSO, that
is only loaded when --gtk is specified, from Namhyung Kim.
. perf bash completion fixes and improvements from Ramkumar
Ramachandra.
. Support for Openembedded/Yocto -dbg packages, from Ricardo
Ribalda Delgado.
And lots and lots of other fixes and code reorganizations that did not
make it into the list, see the shortlog, diffstat and the Git log for
details!"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (300 commits)
uprobes: Fix the memory out of bound overwrite in copy_insn()
uprobes: Fix the wrong usage of current->utask in uprobe_copy_process()
perf tools: Remove unneeded include
perf record: Remove post_processing_offset variable
perf record: Remove advance_output function
perf record: Refactor feature handling into a separate function
perf trace: Don't relookup fields by name in each sample
perf tools: Fix version when building out of tree
perf evsel: Ditch evsel->handler.data field
uprobes: Export write_opcode() as uprobe_write_opcode()
uprobes: Introduce arch_uprobe->ixol
uprobes: Kill module_init() and module_exit()
uprobes: Move function declarations out of arch
perf/x86/intel: Add Ivy Bridge-EP uncore IRP box support
perf/x86/intel/uncore: Add filter support for IvyBridge-EP QPI boxes
perf: Factor out strncpy() in perf_event_mmap_event()
tools/perf: Add required memory barriers
perf: Fix arch_perf_out_copy_user default
perf: Update a stale comment
perf: Optimize perf_output_begin() -- address calculation
...
Pull IRQ changes from Ingo Molnar:
"The biggest change this cycle are the softirq/hardirq stack
interaction and nesting fixes, cleanups and reorganizations from
Frederic. This is the longer followup story to the softirq nesting
fix that is already upstream (commit ded7975475: "irq: Force hardirq
exit's softirq processing on its own stack")"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip: bcm2835: Convert to use IRQCHIP_DECLARE macro
powerpc: Tell about irq stack coverage
x86: Tell about irq stack coverage
irq: Optimize softirq stack selection in irq exit
irq: Justify the various softirq stack choices
irq: Improve a bit softirq debugging
irq: Optimize call to softirq on hardirq exit
irq: Consolidate do_softirq() arch overriden implementations
x86/irq: Correct comment about i8259 initialization
This reverts commit 8eba18428a.
uv_trace() is not used by anything, nor is uv_trace_nmi_func, nor
uv_trace_func.
That's not how we do instrumentation code in the kernel: we add
tracepoints, printk()s, etc. so that everyone not just those with
magic kernel modules can debug a system.
So remove this unused (and misguied) piece of code.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Hedi Berriche <hedi@sgi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jason Wessel <jason.wessel@windriver.com>
Link: http://lkml.kernel.org/n/tip-tumfBffmr4jmnt8Gyxanoblg@git.kernel.org
Tracepoints are named hierachially, and it makes more sense to keep a
general flow of information level from general to specific from left
to right, i.e.
x86_exceptions.page_fault_user|kernel
rather than
x86_exceptions.user|kernel_page_fault
Suggested-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/20131111082955.GB12405@gmail.com
Currently irq vector handlers for tracing are registered in both set_intr_gate()
and __trace_alloc_intr_gate() in alloc_intr_gate().
But, we don't need to do that twice.
So, let's delete __trace_alloc_intr_gate().
Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/52716E1B.7090205@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
This patch registers exception handlers for tracing to a trace IDT.
To implemented it in set_intr_gate(), this patch does followings.
- Register the exception handlers to
the trace IDT by prepending "trace_" to the handler's names.
- Also, newly introduce trace_page_fault() to add tracepoints
in a subsequent patch.
Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/52716DEC.5050204@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Prepare to move set_intr_gate() into a macro by removing
__alloc_intr_gate().
The purpose is to avoid failing a kernel build after applying a
subsequent patch which changes set_intr_gate() into a macro.
Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/52716DB8.1080702@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* stefano/swiotlb-xen-9.1:
swiotlb-xen: fix error code returned by xen_swiotlb_map_sg_attrs
swiotlb-xen: static inline xen_phys_to_bus, xen_bus_to_phys, xen_virt_to_bus and range_straddles_page_boundary
grant-table: call set_phys_to_machine after mapping grant refs
arm,arm64: do not always merge biovec if we are running on Xen
swiotlb: print a warning when the swiotlb is full
swiotlb-xen: use xen_dma_map/unmap_page, xen_dma_sync_single_for_cpu/device
xen: introduce xen_dma_map/unmap_page and xen_dma_sync_single_for_cpu/device
swiotlb-xen: use xen_alloc/free_coherent_pages
xen: introduce xen_alloc/free_coherent_pages
arm64/xen: get_dma_ops: return xen_dma_ops if we are running as xen_initial_domain
arm/xen: get_dma_ops: return xen_dma_ops if we are running as xen_initial_domain
swiotlb-xen: introduce xen_swiotlb_set_dma_mask
xen/arm,arm64: enable SWIOTLB_XEN
xen: make xen_create_contiguous_region return the dma address
xen/x86: allow __set_phys_to_machine for autotranslate guests
arm/xen,arm64/xen: introduce p2m
arm64: define DMA_ERROR_CODE
arm: make SWIOTLB available
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Conflicts:
arch/arm/include/asm/dma-mapping.h
drivers/xen/swiotlb-xen.c
[Conflicts arose b/c "arm: make SWIOTLB available" v8 was in Stefano's
branch, while I had v9 + Ack from Russel. I also fixed up white-space
issues]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQEcBAABAgAGBQJSWyGgAAoJEHm+PkMAQRiGA8MH/35upHXImoRCsI5uC1qvHJtI
QvQAhDFxoEXbFUKeaYgTfcM8q9FgnqnfjhLf8eYa4Q7tDZeqLXOE8bkI807mSZMl
yECr3jcwlV+zyhV2MP/HdwTjzy25bwxLM3Zy43S7QROrYoMHZYznil/QPfyMATCJ
XLPuXZC1FtuUen89n4BoDIuL8QaVrIR/zLqFklAQcdTcGpLHSOwFtH8gb2WaRLhv
+4IikFRFgTNZiMR5tP0GPc6UH6TVTvRb4QKSqqa7J8OmfAIvOzAUdhqWSPOIwWwt
Z/+JFxFDczAcNmpv4gE6jkgc2vR8CVeHsvh0j61RDSFObBWspwk337CSyUZxYSA=
=w4VQ
-----END PGP SIGNATURE-----
Merge tag 'v3.12-rc5' into stable/for-linus-3.13
Linux 3.12-rc5
Because the Stefano branch (for SWIOTLB ARM changes) is based on that.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* tag 'v3.12-rc5': (550 commits)
Linux 3.12-rc5
watchdog: sunxi: Fix section mismatch
watchdog: kempld_wdt: Fix bit mask definition
watchdog: ts72xx_wdt: locking bug in ioctl
ARM: exynos: dts: Update 5250 arch timer node with clock frequency
parisc: let probe_kernel_read() capture access to page zero
parisc: optimize variable initialization in do_page_fault
parisc: fix interruption handler to respect pagefault_disable()
parisc: mark parisc_terminate() noreturn and cold.
parisc: remove unused syscall_ipi() function.
parisc: kill SMP single function call interrupt
parisc: Export flush_cache_page() (needed by lustre)
vfs: allow O_PATH file descriptors for fstatfs()
ext4: fix memory leak in xattr
ARC: Ignore ptrace SETREGSET request for synthetic register "stop_pc"
ALSA: hda - Sony VAIO Pro 13 (haswell) now has a working headset jack
ALSA: hda - Add a headset mic model for ALC269 and friends
ALSA: hda - Fix microphone for Sony VAIO Pro 13 (Haswell model)
compiler/gcc4: Add quirk for 'asm goto' miscompilation bug
Revert "i915: Update VGA arbiter support for newer devices"
...
commit 6efa20e49b
("xen: Support 64-bit PV guest receiving NMIs") and
commit cd9151e26d
( "xen/balloon: set a mapping for ballooned out pages")
added new instances of __cpuinit usage.
We removed this a couple versions ago; we now want to remove
the compat no-op stubs. Introducing new users is not what
we want to see at this point in time, as it will break once
the stubs are gone.
Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
All the BARs have the ability to grow.
v2: Pulled out the simulator workaround to a separate patch.
Rebased.
v3: Rebase onto latest vlv patches from Jesse.
v4: Rebased on top of the early stolen quirk patch from Jesse.
v5: Use the new macro names.
s/INTEL_BDW_PCI_IDS_D/INTEL_BDW_D_IDS
s/INTEL_BDW_PCI_IDS_M/INTEL_BDW_M_IDS
It's Jesse's fault for not following the convention I originally set.
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* pci/misc:
PCI: Enable upstream bridges even for VFs on virtual buses
PCI: Add pci_upstream_bridge()
PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()
The commit 712b6aa873 [Nov7 linux-next
via tip/auto-latest] ("intel_mid: Renamed *mrst* to *intel_mid*")
adds a __cpuinit.
We removed this a couple versions ago; we now want to remove
the compat no-op stubs. Introducing new users is not what
we want to see at this point in time, as it will break once
the stubs are gone.
Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Link: http://lkml.kernel.org/r/1383849290-11250-1-git-send-email-paul.gortmaker@windriver.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
We need to copy padding to kernel space first before looking at it.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
In reboot and crash path, when we shut down the local APIC, the I/O APIC is
still active. This may cause issues because external interrupts
can still come in and disturb the local APIC during shutdown process.
To quiet external interrupts, disable I/O APIC before shutdown local APIC.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1382578212-4677-1-git-send-email-fenghua.yu@intel.com
Cc: <stable@kernel.org>
[ I suppose the 'issue' is a hang during shutdown. It's a fine change nevertheless. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Certain platforms do not allow writes in the MSI-X BARs to setup or tear
down vector values. To combat against the generic code trying to write to
that and either silently being ignored or crashing due to the pagetables
being marked R/O this patch introduces a platform override.
Note that we keep two separate, non-weak, functions default_mask_msi_irqs()
and default_mask_msix_irqs() for the behavior of the arch_mask_msi_irqs()
and arch_mask_msix_irqs(), as the default behavior is needed by x86 PCI
code.
For Xen, which does not allow the guest to write to MSI-X tables - as the
hypervisor is solely responsible for setting the vector values - we
implement two nops.
This fixes a Xen guest crash when passing a PCI device with MSI-X to the
guest. See the bugzilla for more details.
[bhelgaas: add bugzilla info]
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=64581
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
CC: Zhenzhong Duan <zhenzhong.duan@oracle.com>
This patch proposes to remove the IRQF_DISABLED flag from x86/xen
code. It's a NOOP since 2.6.35 and it will be removed one day.
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Currently xol_get_insn_slot() assumes that we should simply copy
arch_uprobe->insn[] which is (ignoring arch_uprobe_analyze_insn)
just the copy of the original insn.
This is not true for arm which needs to create another insn to
execute it out-of-line.
So this patch simply adds the new member, ->ixol into the union.
This doesn't make any difference for x86 and powerpc, but arm
can divorce insn/ixol and initialize the correct xol insn in
arch_uprobe_analyze_insn().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Move the function declarations from the arch headers to the common
header, since only the function bodies are architecture-specific.
These changes are from Vincent Rabin's uprobes patch.
[ oleg: update arch/powerpc/include/asm/uprobes.h ]
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: David A. Long <dave.long@linaro.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
The variable hv_lapic_frequency causes an unused variable warning if
CONFIG_X86_LOCAL_APIC is disabled. Since the variable is only used
inside a small if statement, move the declaration of that variable
into the if statement itself.
Cc: K. Y. Srinivasan <kys@microsoft.com>
Link: http://lkml.kernel.org/r/1381444224-3303-1-git-send-email-kys@microsoft.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Currently seqlocks and seqcounts don't support lockdep.
After running across a seqcount related deadlock in the timekeeping
code, I used a less-refined and more focused variant of this patch
to narrow down the cause of the issue.
This is a first-pass attempt to properly enable lockdep functionality
on seqlocks and seqcounts.
Since seqcounts are used in the vdso gettimeofday code, I've provided
non-lockdep accessors for those needs.
I've also handled one case where there were nested seqlock writers
and there may be more edge cases.
Comments and feedback would be appreciated!
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1381186321-4906-3-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Unlike other uncore boxes, IRP boxes live in PCI buses with no UBOX
device. For PCI bus without UBOX device, we find the next bus that
has UBOX device and use its 'bus to socket' mapping.
Besides the counter/control registers in IRP boxes are not properly
aligned.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: eranian@google.com
Cc: "Yan Zheng" <zheng.z.yan@intel.com>
Link: http://lkml.kernel.org/r/1383197815-17706-2-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The encoding for filter registers of IvyBridge-EP uncore QPI boxes is
completely the same as SandyBridge-EP.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: eranian@google.com
Cc: "Yan Zheng" <zheng.z.yan@intel.com>
Link: http://lkml.kernel.org/r/1383197815-17706-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The arch_perf_output_copy_user() default of
__copy_from_user_inatomic() returns bytes not copied, while all other
argument functions given DEFINE_OUTPUT_COPY() return bytes copied.
Since copy_from_user_nmi() is the odd duck out by returning bytes
copied where all other *copy_{to,from}* functions return bytes not
copied, change it over and ammend DEFINE_OUTPUT_COPY() to expect bytes
not copied.
Oddly enough DEFINE_OUTPUT_COPY() already returned bytes not copied
while expecting its worker functions to return bytes copied.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: will.deacon@arm.com
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20131030201622.GR16117@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The prototype for kvm_check_iopl appeared in commit
f850e2e603 ("KVM: x86 emulator: Check IOPL
level during io instruction emulation"), but the function never actually
existed. Remove the prototype.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
complete_pio ceased to exist in commit
7972995b0c ("KVM: x86 emulator: Move
string pio emulation into emulator.c"), but the prototype remained.
Remove its prototype.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
In certain occasions it is possible for a hung task detector
positive to be false: continuation from a paused VM, for example.
Add a method to reset detection, similar as is done
with other kernel watchdogs.
Acked-by: Don Zickus <dzickus@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Implement reset of kernel watchdogs at pvclock read time. This avoids
adding special code to every watchdog.
This is possible for watchdogs which measure time based on sched_clock() or
ktime_get() variants.
Suggested by Don Zickus.
Acked-by: Don Zickus <dzickus@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
I noticed that srcu_read_lock/unlock both have a memory barrier,
so just by moving srcu_read_unlock earlier we can get rid of
one call to smp_mb() using smp_mb__after_srcu_read_unlock instead.
Unsurprisingly, the gain is small but measureable using the unit test
microbenchmark:
before
vmcall in the ballpark of 1410 cycles
after
vmcall in the ballpark of 1360 cycles
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
The MAXSMP option is intended to enable silly large numbers of
CPUs for testing purposes. The current value of 4096 isn't very
silly any longer as there are actual SGI machines that approach
6096 CPUs when taking HT into account.
Increase the value to a nice round 8192 to account for this and
allow for short term future increases.
Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Cc: prarit@redhat.com
Cc: Russ Anderson <rja@sgi.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20131105143816.GK9944@hansolo.jdub.homelinux.org
[ Tweaked it so that MAXSMP simply sets the maximum of the normal range. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The current range for SMP configs is 2 - 512 CPUs, or a full
4096 in the case of MAXSMP. There are machines that have 1024
CPUs in them today and configuring a kernel for that means you
are forced to set MAXSMP. This adds additional unnecessary
overhead. While that overhead might be considered tiny for
large machines, it isn't necessarily so if you are building a
kernel that runs across a wide variety of machines.
To cover the range of more common machines today, we allow
NR_CPUS to be up to 4096 when CPUMASK_OFFSTACK is enabled.
Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Cc: prarit@redhat.com
Cc: Russ Anderson <rja@sgi.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20131105143728.GJ9944@hansolo.jdub.homelinux.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently show_cpuinfo_core() displays cpu core information only if
the number of threads per a whole cores is 2 or larger.
However, this condition doesn't care about the number of
sockets. For example, this condition doesn't hold on systems
with two logical cpus consisting of two sockets and a single
core on each socket - yet the topology information would be
interesting to see in that case as well.
I don't know whether or not there are processors in real world
by which such configurations are possible, but at least on
vitual machine environments, such configuration can occur,
typically when no explicit SMP information is provided in
advance.
For example, on qemu/KVM, SMP information is specified via -smp
command-line option, more specifically, its syntax is:
-smp n[,cores=cores][,threads=threads][,sockets=sockets][,maxcpus=maxcpus]
If this is not specified, qemu tells configuration with
n-sockets, 1-core and 1-thread to the guest machine, on which
guest, MP information is not displayed in /proc/cpuinfo.
I saw this situation on VMWare guest environment, too.
To fix this issue, this patch simply removes the condition
because this information is useful even if there's only 1
thread.
Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/5277D644.4090707@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Conflicts:
kernel/Makefile
There are conflicts in kernel/Makefile due to file moving in the
scheduler tree - resolve them.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.15 (GNU/Linux)
iQEcBAABAgAGBQJSdt9HAAoJEHm+PkMAQRiGnzEH/345Keg5dp+oKACnokBfzOtp
V0p3g5EBsGtzEVnV+1B96trczDUtWdDFFr5GfGSj565NBQpFyc+iZC1mC99RDJCs
WUquGFqlLMK2aV0SbKwCO4K1rJ5A0TRVj0ZRJOUJUY7jwNf5Qahny0WBVjO/8qAY
UvJK1rktBClhKdH53YtpDHHgXBeZ2LOrzt1fQ/AMpujGbZauGvnLdNOli5r2kCFK
jzoOgFLvX+PHU/5/d4/QyJPeQNPva5hjk5Ho9UuSJYhnFtPO3EkD4XZLcpcbNEJb
LqBvbnZWm6CS435lfU1l93RqQa5xMO9ITk0oe4h69syTSHwWk9aJ+ZTc/4Up+t8=
=57MC
-----END PGP SIGNATURE-----
Merge tag 'v3.12' into x86/cpu, to refresh the branch before queueing up more changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In commit 8a4d0a687a "ftrace: Use breakpoint method to update ftrace
caller", we choose to use breakpoint method to update the ftrace
caller. But we also need to skip over the breakpoint in function
ftrace_int3_handler() for them. Otherwise weird things would happen.
Cc: stable@vger.kernel.org # 3.5+
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Currently cpuid emulation is traced only when executed by intercept.
Move trace point so that emulator invocation is traced too.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
All decode_register() callers check if instruction has rex prefix
to properly decode one byte operand. It make sense to move the check
inside.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
The defconfig kernel can not run under neither fedora16 x86_64 laptop
nor fedora17 x86_64 pc. After enable DEVTMPFS* in x86_64_defconfig, it
will be OK.
DEVTMPFS* is only related with software, so for i386_defconfig may also
need them (at least, it has no negative effect for defconfig).
Signed-off-by: Chen Gang <gang.chen@asianux.com>
Link: http://lkml.kernel.org/r/52784DFF.8040004@asianux.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Conflicts:
drivers/net/ethernet/emulex/benet/be.h
drivers/net/netconsole.c
net/bridge/br_private.h
Three mostly trivial conflicts.
The net/bridge/br_private.h conflict was a function signature (argument
addition) change overlapping with the extern removals from Joe Perches.
In drivers/net/netconsole.c we had one change adjusting a printk message
whilst another changed "printk(KERN_INFO" into "pr_info(".
Lastly, the emulex change was a new inline function addition overlapping
with Joe Perches's extern removals.
Signed-off-by: David S. Miller <davem@davemloft.net>