Commit graph

65062 commits

Author SHA1 Message Date
Stephane Eranian
3e702ff6d1 perf/x86: Add LBR software filter support for Intel CPUs
This patch adds an internal sofware filter to complement
the (optional) LBR hardware filter.

The software filter is necessary:

 - as a substitute when there is no HW LBR filter (e.g., Atom, Core)
 - to complement HW LBR filter in case of errata (e.g., Nehalem/Westmere)
 - to provide finer grain filtering (e.g., all processors)

Sometimes the LBR HW filter cannot distinguish between two types
of branches. For instance, to capture syscall as CALLS, it is necessary
to enable the LBR_FAR filter which will also capture JMP instructions.
Thus, a second pass is necessary to filter those out, this is what the
SW filter can do.

The SW filter is built on top of the internal x86 disassembler. It
is a best effort filter especially for user level code. It is subject
to the availability of the text page of the program.

The SW filter is enabled on all Intel processors. It is bypassed
when the user is capturing all branches at all priv levels.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-9-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-05 14:55:42 +01:00
Stephane Eranian
60ce0fbd07 perf/x86: Implement PERF_SAMPLE_BRANCH for Intel CPUs
This patch implements PERF_SAMPLE_BRANCH support for Intel
x86processors. It connects PERF_SAMPLE_BRANCH to the actual LBR.

The patch adds the hooks in the PMU irq handler to save the LBR
on counter overflow for both regular and PEBS modes.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-8-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-05 14:55:41 +01:00
Stephane Eranian
88c9a65e13 perf/x86: Disable LBR support for older Intel Atom processors
The patch adds a restriction for Intel Atom LBR support. Only
steppings 10 (PineView) and more recent are supported. Older models
do not have a functional LBR. Their LBR does not freeze on PMU
interrupt which makes LBR unusable in the context of perf_events.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-7-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-05 14:55:41 +01:00
Stephane Eranian
c5cc2cd906 perf/x86: Add Intel LBR mappings for PERF_SAMPLE_BRANCH filters
This patch adds the mappings from the generic PERF_SAMPLE_BRANCH_*
filters to the actual Intel x86LBR filters, whenever they exist.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-6-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-05 14:55:41 +01:00
Stephane Eranian
ff3fb511ba perf/x86: Sync branch stack sampling with precise_sampling
If precise sampling is enabled on Intel x86 then perf_event uses PEBS.
To correct for the off-by-one error of PEBS, perf_event uses LBR when
precise_sample > 1.

On Intel x86 PERF_SAMPLE_BRANCH_STACK is implemented using LBR,
therefore both features must be coordinated as they may not
configure LBR the same way.

For PEBS, LBR needs to capture all branches at the priv level of
the associated event.

This patch checks that the branch type and priv level of BRANCH_STACK
is compatible with that of the PEBS LBR requirement, thereby allowing:

   $ perf record -b any,u -e instructions:upp ....

But:

   $ perf record -b any_call,u -e instructions:upp

Is not possible.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-5-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-05 14:55:40 +01:00
Stephane Eranian
b36817e886 perf/x86: Add Intel LBR sharing logic
The Intel LBR on some recent processor is capable
of filtering branches by type. The filter is configurable
via the LBR_SELECT MSR register.

There are limitation on how this register can be used.

On Nehalem/Westmere, the LBR_SELECT is shared by the two HT threads
when HT is on. It is private to each core when HT is off.

On SandyBridge, the LBR_SELECT register is private to each thread
when HT is on. It is private to each core when HT is off.

The kernel must manage the sharing of LBR_SELECT. It allows
multiple users on the same logical CPU to use LBR_SELECT as
long as they program it with the same value. Across sibling
CPUs (HT threads), the same restriction applies on NHM/WSM.

This patch implements this sharing logic by leveraging the
mechanism put in place for managing the offcore_response
shared MSR.

We modify __intel_shared_reg_get_constraints() to cause
x86_get_event_constraint() to be called because LBR may
be associated with events that may be counter constrained.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-4-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-05 14:55:40 +01:00
Stephane Eranian
225ce53910 perf/x86: Add Intel LBR MSR definitions
This patch adds the LBR definitions for NHM/WSM/SNB and Core.
It also adds the definitions for the architected LBR MSR:
LBR_SELECT, LBRT_TOS.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-05 14:55:39 +01:00
Stephane Eranian
bce38cd53e perf: Add generic taken branch sampling support
This patch adds the ability to sample taken branches to the
perf_event interface.

The ability to capture taken branches is very useful for all
sorts of analysis. For instance, basic block profiling, call
counts, statistical call graph.

This new capability requires hardware assist and as such may
not be available on all HW platforms. On Intel x86 it is
implemented on top of the Last Branch Record (LBR) facility.

To enable taken branches sampling, the PERF_SAMPLE_BRANCH_STACK
bit must be set in attr->sample_type.

Sampled taken branches may be filtered by type and/or priv
levels.

The patch adds a new field, called branch_sample_type, to the
perf_event_attr structure. It contains a bitmask of filters
to apply to the sampled taken branches.

Filters may be implemented in HW. If the HW filter does not exist
or is not good enough, some arch may also implement a SW filter.

The following generic filters are currently defined:
- PERF_SAMPLE_USER
  only branches whose targets are at the user level

- PERF_SAMPLE_KERNEL
  only branches whose targets are at the kernel level

- PERF_SAMPLE_HV
  only branches whose targets are at the hypervisor level

- PERF_SAMPLE_ANY
  any type of branches (subject to priv levels filters)

- PERF_SAMPLE_ANY_CALL
  any call branches (may incl. syscall on some arch)

- PERF_SAMPLE_ANY_RET
  any return branches (may incl. syscall returns on some arch)

- PERF_SAMPLE_IND_CALL
  indirect call branches

Obviously filter may be combined. The priv level bits are optional.
If not provided, the priv level of the associated event are used. It
is possible to collect branches at a priv level different from the
associated event. Use of kernel, hv priv levels is subject to permissions
and availability (hv).

The number of taken branch records present in each sample may vary based
on HW, the type of sampled branches, the executed code. Therefore
each sample contains the number of taken branches it contains.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-2-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-05 14:55:39 +01:00
Ingo Molnar
737f24bda7 Merge branch 'perf/urgent' into perf/core
Conflicts:
	tools/perf/builtin-record.c
	tools/perf/builtin-top.c
	tools/perf/perf.h
	tools/perf/util/top.h

Merge reason: resolve these cherry-picking conflicts.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-05 09:20:08 +01:00
Joerg Roedel
1018faa6cf perf/x86/kvm: Fix Host-Only/Guest-Only counting with SVM disabled
It turned out that a performance counter on AMD does not
count at all when the GO or HO bit is set in the control
register and SVM is disabled in EFER.

This patch works around this issue by masking out the HO bit
in the performance counter control register when SVM is not
enabled.

The GO bit is not touched because it is only set when the
user wants to count in guest-mode only. So when SVM is
disabled the counter should not run at all and the
not-counting is the intended behaviour.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Avi Kivity <avi@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: stable@vger.kernel.org # v3.2
Link: http://lkml.kernel.org/r/1330523852-19566-1-git-send-email-joerg.roedel@amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-02 12:16:39 +01:00
Linus Torvalds
e25bda5642 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce/AMD: Fix UP build error
  x86: Specify a size for the cmp in the NMI handler
  x86/nmi: Test saved %cs in NMI to determine nested NMI case
  x86/amd: Fix L1i and L2 cache sharing information for AMD family 15h processors
  x86/microcode: Remove noisy AMD microcode warning
2012-02-27 07:55:51 -08:00
Heiko Carstens
048cd4e51d compat: fix compile breakage on s390
The new is_compat_task() define for the !COMPAT case in
include/linux/compat.h conflicts with a similar define in
arch/s390/include/asm/compat.h.

This is the minimal patch which fixes the build issues.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-27 07:54:27 -08:00
Linus Torvalds
500dd2370e Two fixes to fix a memory corruption bug when WC pages never get
converted back to WB but end up being recycled in the general memory
 pool as WC.
 
 Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQEcBAABAgAGBQJPStrRAAoJEFjIrFwIi8fJovAH/RBUJdeDw8x5ki2yDhAz/80S
 +yZKiGaaUYYCB0Fo/BIwVhBQeDabGz8rJCdOv40tRpRCiRD7JIfMo5tCS6QIFF7P
 UvhVuJcqltxIoRjz7nGX8iSUl48JKy9vqmqWXIucG3rYQ7YOkadwVTbhsg4a9U6P
 fcqexzUuXb4fr6CNBBpL3LqHfDaKNovgESHlAmzrcaRGbOADp9LVlWkR6kwiTnIA
 e5yU/DEW9Ej6wJM90Mx9Rg3y22hBZEL1p5NJjaiMrOY2LzX7bE4+mTgtk+a4FNGD
 8WJZm/WWhdsWrKlj8vCKOuJkIgQYJURVMySEGdzM91P1FpJ3edJxIM3qlA958vc=
 =jggO
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-fixes-3.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Two fixes to fix a memory corruption bug when WC pages never get
converted back to WB but end up being recycled in the general memory
pool as WC.

There is a better way of fixing this, but there is not enough time to do
the full benchmarking to pick one of the right options - so picking the
one that favors stability for right now.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

* tag 'stable/for-linus-fixes-3.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/pat: Disable PAT support for now.
  xen/setup: Remove redundant filtering of PTE masks.
2012-02-26 21:03:16 -08:00
Linus Torvalds
ee3253241a This is the arch/c6x part of commit 7c43185138
which was dropped because c6x had not yet been merged at the time.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJPR5ojAAoJEOiN4VijXeFPflcP/3oiSnfN64ND3NTVBOB8zXyV
 /CB2F5UQsia3L+Qag3nuB2Vgm6l+5+yPbMHx2IO55XwhG8P31nS7FSsX/kA6YcTQ
 EXN7Y9Y3/1q1trLxZf4ZrFuBev794Tg2X6HmIQztWPp/ru7z5iftnWGQinboCHKf
 8ftsKMCJqOmp4IwX1lj3513qn4FjPI1Pnvdp1P9oAVCuvjrKHDiKxS7pbcODTsHH
 KkVxZQVbF2EHw3LtTFYiryBrVq5arkQU8mbhoouv40TIa2/CdM5F8FKtdEhMkdsy
 6g/JVNdzFGCPMo0eg5eX3oSkr/1ohcFjZmbvq3NkSmh9YmdPfBUMIFTylJr8ShEX
 MlSK2T86cm7W213LUcHW/A5PjwgFEWiBHBI6TJ+2ojfMrrT+bEX8LjGRqQGK+uFR
 P+JhaWYk1KH5njgJWyuh/Q1HXjM3ucURwd3/PWsmZ0fyZm7OcuX1BXGFqLdICrVx
 49DmgzxqIxoyxfIg5EGdLFlR5MQaTxxQirVPkCACWL6avttr767vJ3H76lFGkVSW
 Tg+2KDLF3Ta3t6GUrBo+yPpI5oAWPDW5FAlPpanxfDu6Up3QCWfj2hlk2DZZoJMS
 kHPUF3ShO0lcyp4ZXMNaE7pJZevtmJV++eHSPZvL36ySFHXwQ/NmXhkT8agA05kY
 8VVEBYxS0dnynTO0igTs
 =YDuM
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming

This is the arch/c6x part of commit 7c43185138 ("Kbuild: Use dtc's -d
(dependency) option") which was dropped because c6x had not yet been
merged at the time.

* tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming:
  Kbuild: Use dtc's -d (dependency) option
2012-02-24 09:01:46 -08:00
Linus Torvalds
37e79cbf7d SH/R-Mobile fixes for 3.3-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.15 (GNU/Linux)
 
 iEYEABECAAYFAk9HFAgACgkQGkmNcg7/o7gQ3QCgveYM5yvFmfaeK+7uKyot5CkC
 RkkAoMoo0cOs3B0s5JlFv1Fl7IKfgVWk
 =odpk
 -----END PGP SIGNATURE-----

Merge tag 'rmobile-for-linus' of git://github.com/pmundt/linux-sh

SH/R-Mobile fixes for 3.3-rc5

* tag 'rmobile-for-linus' of git://github.com/pmundt/linux-sh:
  arch/arm/mach-shmobile/board-ag5evm.c: included linux/dma-mapping.h twice
  ARM: mach-shmobile: r8a7779 PFC IPSR4 fix
  ARM: mach-shmobile: sh73a0 PSTR 32-bit access fix
  ARM: mach-shmobile: add GPIO-to-IRQ translation to sh7372
  ARM: mach-shmobile: clock-sh73a0: add DSIxPHY clock support
  arm: fix compile failure in mach-shmobile/board-ag5evm.c
  ARM: mach-shmobile: mackerel: add ak4642 amixer settings on comment
  ARM: mach-shmobile: mackerel: use renesas_usbhs instead of r8a66597_hcd
  ARM: mach-shmobile: simplify MMCIF DMA configuration
  ARM: mach-shmobile: IRQ driven GPIO key support for Kota2
  ARM: mach-shmobile: sh73a0 IRQ sparse alloc fix
  ARM: mach-shmobile: sh73a0 PINT IRQ base fix
2012-02-24 08:57:22 -08:00
Linus Torvalds
0e69e08401 SuperH fixes for 3.3-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.15 (GNU/Linux)
 
 iEYEABECAAYFAk9HE98ACgkQGkmNcg7/o7g4+QCfYIcWunicH7+tLMjH/h5MgHq2
 S5EAoJdIKyAPfV1tsIY0ykJZQP+H4t4X
 =QqTI
 -----END PGP SIGNATURE-----

Merge tag 'sh-for-linus' of git://github.com/pmundt/linux-sh

SuperH fixes for 3.3-rc5

* tag 'sh-for-linus' of git://github.com/pmundt/linux-sh:
  sh: Fix sh2a build error for CONFIG_CACHE_WRITETHROUGH
  sh: modify a resource of sh_eth_giga1_resources in board-sh7757lcr
  arch/sh: remove references to cpu_*_map.
  sh: Fix typo in pci-sh7780.c
  sh: add platform_device for SPI1 in setup-sh7757
  sh: modify resource for SPI0 in setup-sh7757
  sh: se7724: fix compile breakage
  sh: clkfwk: bugfix: use clk_reparent() for div6 clocks
  sh: clock-sh7724: fixup sh_fsi clock settings
  sh: sh7757lcr: update to the new MMCIF DMA configuration
  sh: fix the sh_mmcif_plat_data in board-sh7757lcr
  video: pvr2fb: Fix up spurious section mismatch warnings.
  sh: Defer to asm-generic/device.h.
2012-02-24 08:56:51 -08:00
Ingo Molnar
c5905afb0e static keys: Introduce 'struct static_key', static_key_true()/false() and static_key_slow_[inc|dec]()
So here's a boot tested patch on top of Jason's series that does
all the cleanups I talked about and turns jump labels into a
more intuitive to use facility. It should also address the
various misconceptions and confusions that surround jump labels.

Typical usage scenarios:

        #include <linux/static_key.h>

        struct static_key key = STATIC_KEY_INIT_TRUE;

        if (static_key_false(&key))
                do unlikely code
        else
                do likely code

Or:

        if (static_key_true(&key))
                do likely code
        else
                do unlikely code

The static key is modified via:

        static_key_slow_inc(&key);
        ...
        static_key_slow_dec(&key);

The 'slow' prefix makes it abundantly clear that this is an
expensive operation.

I've updated all in-kernel code to use this everywhere. Note
that I (intentionally) have not pushed through the rename
blindly through to the lowest levels: the actual jump-label
patching arch facility should be named like that, so we want to
decouple jump labels from the static-key facility a bit.

On non-jump-label enabled architectures static keys default to
likely()/unlikely() branches.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jason Baron <jbaron@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: a.p.zijlstra@chello.nl
Cc: mathieu.desnoyers@efficios.com
Cc: davem@davemloft.net
Cc: ddaney.cavm@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20120222085809.GA26397@elte.hu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-02-24 10:05:59 +01:00
Danny Kukawka
7372a4cd6c arch/arm/mach-shmobile/board-ag5evm.c: included linux/dma-mapping.h twice
arch/arm/mach-shmobile/board-ag5evm.c: included 'linux/dma-mapping.h'
twice, remove the duplicate.

Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-02-24 13:32:17 +09:00
Magnus Damm
74eb436ec0 ARM: mach-shmobile: r8a7779 PFC IPSR4 fix
Fix the bit field width information for the IPSR4 register
in the r8a7779 pin function controller (PFC).

Without this fix the Marzen board fails to receive data
over the serial console due to misconfigured pin function
for the RX pin.

Signed-off-by: Magnus Damm <damm@opensource.se>
Tested-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Tested-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-02-24 13:24:59 +09:00
Magnus Damm
689189fb01 ARM: mach-shmobile: sh73a0 PSTR 32-bit access fix
Convert the sh73a0 SMP code to use 32-bit PSTR access.

This fixes wakeup from deep sleep for sh73a0 secondary CPUs.

Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-02-24 13:24:58 +09:00
Paul Mundt
35eb304b5c Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into rmobile-fixes-for-linus 2012-02-24 13:23:23 +09:00
Phil Edworthy
1ae911cba4 sh: Fix sh2a build error for CONFIG_CACHE_WRITETHROUGH
Signed-off-by: Phil Edworthy <phil.edworthy@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-02-24 13:21:46 +09:00
Shimoda, Yoshihiro
befe0756d5 sh: modify a resource of sh_eth_giga1_resources in board-sh7757lcr
The latest sh_eth driver needs a resource of TSU in the channel 1,
if the controller has TSU registers. So, this patch adds the resource.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-02-24 13:21:46 +09:00
Rusty Russell
004f4ce9f3 arch/sh: remove references to cpu_*_map.
This has been obsolescent for a while; time for the final push.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: linux-sh@vger.kernel.org
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-02-24 13:21:45 +09:00
Masanari Iida
ecfb68c673 sh: Fix typo in pci-sh7780.c
Correct spelling "erorr" to "error" in
arch/sh/drivers/pci/pci-sh7780.c

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-02-24 13:21:44 +09:00
Linus Torvalds
73c8e679aa Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
BenH says:
 'Here are a few more powerpc bits for you.  A stupid regression I
  introduced with my previous commit to "fix" program check exceptions
  (brown paper bag for me), fix the cpuidle default, a bug fix for
  something that isn't strictly speaking a regression but some upstream
  changes causes it to show in lockdep now while it didn't before, and
  finally a trivial one for rusty to make his life easier later on
  removing the old cpumask cruft. '

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc: Fix various issues with return to userspace
  cpuidle: Default y on powerpc pSeries
  powerpc: Fix program check handling when lockdep is enabled
  powerpc: Remove references to cpu_*_map
2012-02-23 11:48:36 -08:00
Linus Torvalds
71c01b9d5b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
It contains 3 important fixes for ColdFire based machines:
 - fix processes getting stuck when running from strace
 - fix kernel vmalloced pages not being visible in all kernel contexts
 - fix shared user pages sometimes being visible in another process
   context

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68k: Do not set global share for non-kernel shared pages
  m68k: Add shared bit to Coldfire kernel page entries
  m68knommu: fix syscall tracing stuck process
2012-02-22 08:45:08 -08:00
Borislav Petkov
3f806e5098 x86/mce/AMD: Fix UP build error
141168c36c ("x86: Simplify code by removing a !SMP #ifdefs
from 'struct cpuinfo_x86'") removed a bunch of CONFIG_SMP ifdefs
around code touching struct cpuinfo_x86 members but also caused
the following build error with Randy's randconfigs:

mce_amd.c:(.cpuinit.text+0x4723): undefined reference to `cpu_llc_shared_map'

Restore the #ifdef in threshold_create_bank() which creates
symlinks on the non-BSP CPUs.

There's a better patch series being worked on by Kevin Winchester
which will solve this in a cleaner fashion, but that series is
too ambitious for v3.3 merging - so we first queue up this trivial
fix and then do the rest for v3.4.

Signed-off-by: Borislav Petkov <bp@alien8.de>
Acked-by: Kevin Winchester <kjwinchester@gmail.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Nick Bowler <nbowler@elliptictech.com>
Link: http://lkml.kernel.org/r/20120203191801.GA2846@x1.osrc.amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-02-22 13:36:30 +01:00
Benjamin Herrenschmidt
18b246fa60 powerpc: Fix various issues with return to userspace
We have a few problems when returning to userspace. This is a
quick set of fixes for 3.3, I'll look into a more comprehensive
rework for 3.4. This fixes:

 - We kept interrupts soft-disabled when schedule'ing or calling
do_signal when returning to userspace as a result of a hardware
interrupt.

 - Rename do_signal to do_notify_resume like all other archs (and
do_signal_pending back to do_signal, which it was before Roland
changed it).

 - Add the missing call to key_replace_session_keyring() to
do_notify_resume().

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
2012-02-22 16:48:53 +11:00
Michael Ellerman
922b9f86a0 powerpc: Fix program check handling when lockdep is enabled
In commit 54321242af ("Disable interrupts early in Program Check"), we
switched from enabling to disabling interrupts in program_check_common.

Whereas ENABLE_INTS leaves r3 untouched, if lockdep is enabled DISABLE_INTS
calls into lockdep code and will clobber r3. That means we pass a bogus
struct pt_regs* into program_check_exception() and all hell breaks loose.

So load our regs pointer into r3 after we call DISABLE_INTS.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-22 16:48:49 +11:00
Rusty Russell
07d2f1a54a powerpc: Remove references to cpu_*_map
This has been obsolescent for a while; time for the final push.

In adjacent context, replaced old cpus_* with cpumask_*.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-22 16:48:47 +11:00
Linus Torvalds
6b0d1abb35 Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm
A few more things this time around.  The only thing warranting some
commentry is the modpost change, which allows folk building a Thumb2
enabled kernel to see section mismatch warnings.  This is why many
weren't noticed with OMAP.

* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
  ARM/audit: include audit header and fix audit arch
  ARM: OMAP: fix voltage domain build errors with PM_OPP disabled
  ARM/PCI: Remove ARM's duplicate definition of 'pcibios_max_latency'
  ARM: 7336/1: smp_twd: Don't register CPUFREQ notifiers if local timers are not initialised
  ARM: 7327/1: need to include asm/system.h in asm/processor.h
  ARM: 7326/2: PL330: fix null pointer dereference in pl330_chan_ctrl()
  ARM: 7164/3: PL330: Fix the size of the dst_cache_ctrl field
  ARM: 7325/1: fix v7 boot with lockdep enabled
  ARM: 7324/1: modpost: Fix section warnings for ARM for many compilers
  ARM: 7323/1: Do not allow ARM_LPAE on pre-ARMv7 architectures
2012-02-21 18:24:42 -08:00
Linus Torvalds
faf309009e sys_poll: fix incorrect type for 'timeout' parameter
The 'poll()' system call timeout parameter is supposed to be 'int', not
'long'.

Now, the reason this matters is that right now 32-bit compat mode is
broken on at least x86-64, because the 32-bit code just calls
'sys_poll()' directly on x86-64, and the 32-bit argument will have been
zero-extended, turning a signed 'int' into a large unsigned 'long'
value.

We could just introduce a 'compat_sys_poll()' function for this, and
that may eventually be what we have to do, but since the actual standard
poll() semantics is *supposed* to be 'int', and since at least on x86-64
glibc sign-extends the argument before invocing the system call (so
nobody can actually use a 64-bit timeout value in user space _anyway_,
even in 64-bit binaries), the simpler solution would seem to be to just
fix the definition of the system call to match what it should have been
from the very start.

If it turns out that somebody somehow circumvents the user-level libc
64-bit sign extension and actually uses a large unsigned 64-bit timeout
despite that not being how poll() is supposed to work, we will need to
do the compat_sys_poll() approach.

Reported-by: Thomas Meyer <thomas@m3y3r.de>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-21 17:24:20 -08:00
Eric Paris
5180bb392a ARM/audit: include audit header and fix audit arch
Both bugs being fixed were introduced in:
29ef73b7a8

Include linux/audit.h to fix below build errors:

  CC      arch/arm/kernel/ptrace.o
arch/arm/kernel/ptrace.c: In function 'syscall_trace':
arch/arm/kernel/ptrace.c:919: error: implicit declaration of function 'audit_syscall_exit'
arch/arm/kernel/ptrace.c:921: error: implicit declaration of function 'audit_syscall_entry'
arch/arm/kernel/ptrace.c:921: error: 'AUDIT_ARCH_ARMEB' undeclared (first use in this function)
arch/arm/kernel/ptrace.c:921: error: (Each undeclared identifier is reported only once
arch/arm/kernel/ptrace.c:921: error: for each function it appears in.)
make[1]: *** [arch/arm/kernel/ptrace.o] Error 1
make: *** [arch/arm/kernel] Error 2

This part of the patch is:
Reported-by: Axel Lin <axel.lin@gmail.com>
Reported-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
(They both provided patches to fix it)

This patch also (at the request of the list) fixes the fact that
ARM has both LE and BE versions however the audit code was called as if
it was always BE.  If audit userspace were to try to interpret the bits
it got from a LE system it would obviously do so incorrectly.  Fix this
by using the right arch flag on the right system.

This part of the patch is:
Reported-by: Russell King - ARM Linux <linux@arm.linux.org.uk>

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-02-21 16:50:14 +00:00
Russell King
3ddd4d0c62 ARM: OMAP: fix voltage domain build errors with PM_OPP disabled
The voltage domain code wants the voltage tables, which are in the
opp*.c files.  These files aren't built when PM_OPP is disabled,
causing the following build errors at link time:

twl-common.c:(.init.text+0x2e48): undefined reference to `omap34xx_vddmpu_volt_data'
twl-common.c:(.init.text+0x2e4c): undefined reference to `omap34xx_vddcore_volt_data'
twl-common.c:(.init.text+0x2e5c): undefined reference to `omap36xx_vddmpu_volt_data'
twl-common.c:(.init.text+0x2e60): undefined reference to `omap36xx_vddcore_volt_data'
twl-common.c:(.init.text+0x2830): undefined reference to `omap44xx_vdd_mpu_volt_data'
twl-common.c:(.init.text+0x283c): undefined reference to `omap44xx_vdd_iva_volt_data'
twl-common.c:(.init.text+0x2844): undefined reference to `omap44xx_vdd_core_volt_data'

Acked-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-02-21 09:36:34 +00:00
Myron Stowe
e23e8c0690 ARM/PCI: Remove ARM's duplicate definition of 'pcibios_max_latency'
The patch series to re-factor PCI's 'latency timer' setup (re:
http://marc.info/?l=linux-kernel&m=131983853831049&w=2) forgot to
remove the ARM specific definition of 'pcibios_max_latency' once such
had been moved into the pci core resulting in ARM related compile
errors -
  drivers/built-in.o:(.data+0x230): multiple definition of
  `pcibios_max_latency'
  arch/arm/common/built-in.o:(.data+0x40c): first defined here
  make[1]: *** [vmlinux.o] Error 1

In the series, patch 2/16 (commit 168c8619fd) converted the ARM
specific version of 'pcibios_set_master()' to a non-inlined version.
This was done in preperation for hosting it up into PCI's core, which
was done in patch 10/16 (commit 96c5590058) of the series (and
where the removal of ARM's 'pcibios_max_latency' was overlooked).

Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-02-21 09:35:32 +00:00
Santosh Shilimkar
910ba598c8 ARM: 7336/1: smp_twd: Don't register CPUFREQ notifiers if local timers are not initialised
Current ARM local timer code registers CPUFREQ notifiers even in case
the twd_timer_setup() isn't called. That seems to be wrong and
would eventually lead to kernel crash on the CPU frequency transitions
on the SOCs where the local timer doesn't exist or broken because of
hardware BUG. Fix it by testing twd_evt and *__this_cpu_ptr(twd_evt).

The issue was observed with v3.3-rc3 and building an OMAP2+ kernel
on OMAP3 SOC which doesn't have TWD.

Below is the dump for reference :

 Unable to handle kernel paging request at virtual address 007e900
 pgd = cdc20000
 [007e9000] *pgd=00000000
 Internal error: Oops: 5 [#1] SMP
 Modules linked in:
 CPU: 0    Not tainted  (3.3.0-rc3-pm+debug+initramfs #9)
 PC is at twd_update_frequency+0x34/0x48
 LR is at twd_update_frequency+0x10/0x48
 pc : [<c001382c>]    lr : [<c0013808>]    psr: 60000093
 sp : ce311dd8  ip : 00000000  fp : 00000000
 r10: 00000000  r9 : 00000001  r8 : ce310000
 r7 : c0440458  r6 : c00137f8  r5 : 00000000  r4 : c0947a74
 r3 : 00000000  r2 : 007e9000  r1 : 00000000  r0 : 00000000
 Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment usr
 Control: 10c5387d  Table: 8dc20019  DAC: 00000015
 Process sh (pid: 599, stack limit = 0xce3102f8)
 Stack: (0xce311dd8 to 0xce312000)
 1dc0:                                                       6000c
 1de0: 00000001 00000002 00000000 00000000 00000000 00000000 00000
 1e00: ffffffff c093d8f0 00000000 ce311ebc 00000001 00000001 ce310
 1e20: c001386c c0437c4c c0e95b60 c0e95ba8 00000001 c0e95bf8 ffff4
 1e40: 00000000 00000000 c005ef74 ce310000 c0435cf0 ce311ebc 00000
 1e60: ce352b40 0007a120 c08d5108 c08ba040 c08ba040 c005f030 00000
 1e80: c08bc554 c032fe2c 0007a120 c08d4b64 ce352b40 c08d8618 ffff8
 1ea0: c08ba040 c033364c ce311ecc c0433b50 00000002 ffffffea c0330
 1ec0: 0007a120 0007a120 22222201 00000000 22222222 00000000 ce357
 1ee0: ce3d6000 cdc2aed8 ce352ba0 c0470164 00000002 c032f47c 00034
 1f00: c0331cac ce352b40 00000007 c032f6d0 ce352bbc 0003d090 c0930
 1f20: c093d8bc c03306a4 00000007 ce311f80 00000007 cdc2aec0 ce358
 1f40: ce8d20c0 00000007 b6fe5000 ce311f80 00000007 ce310000 0000c
 1f60: c000de74 ce987400 ce8d20c0 b6fe5000 00000000 00000000 0000c
 1f80: 00000000 00000000 001fbac8 00000000 00000007 001fbac8 00004
 1fa0: c000df04 c000dd60 00000007 001fbac8 00000001 b6fe5000 00000
 1fc0: 00000007 001fbac8 00000007 00000004 b6fe5000 00000000 00202
 1fe0: 00000000 beb565f8 00101ffc 00008e8c 60000010 00000001 00000
 [<c001382c>] (twd_update_frequency+0x34/0x48) from [<c008ac4c>] )
 [<c008ac4c>] (smp_call_function_single+0x17c/0x1c8) from [<c0013)
 [<c0013890>] (twd_cpufreq_transition+0x24/0x30) from [<c0437c4c>)
 [<c0437c4c>] (notifier_call_chain+0x44/0x84) from [<c005efe4>] ()
 [<c005efe4>] (__srcu_notifier_call_chain+0x70/0xa4) from [<c005f)
 [<c005f030>] (srcu_notifier_call_chain+0x18/0x20) from [<c032fe2)
 [<c032fe2c>] (cpufreq_notify_transition+0xc8/0x1b0) from [<c0333)
 [<c033364c>] (omap_target+0x1b4/0x28c) from [<c032f47c>] (__cpuf)
 [<c032f47c>] (__cpufreq_driver_target+0x50/0x64) from [<c0331d24)
 [<c0331d24>] (cpufreq_set+0x78/0x98) from [<c032f6d0>] (store_sc)
 [<c032f6d0>] (store_scaling_setspeed+0x5c/0x74) from [<c03306a4>)
 [<c03306a4>] (store+0x58/0x74) from [<c014d868>] (sysfs_write_fi)
 [<c014d868>] (sysfs_write_file+0x80/0xb4) from [<c00f2c2c>] (vfs)
 [<c00f2c2c>] (vfs_write+0xa8/0x138) from [<c00f2e9c>] (sys_write)
 [<c00f2e9c>] (sys_write+0x40/0x6c) from [<c000dd60>] (ret_fast_s)
 Code: e594300c e792210c e1a01000 e5840004 (e7930002)
 ---[ end trace 5da3b5167c1ecdda ]---

Reported-by: Kevin Hilman <khilman@ti.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-02-21 09:26:46 +00:00
Linus Torvalds
27e74da980 i387: export 'fpu_owner_task' per-cpu variable
(And define it properly for x86-32, which had its 'current_task'
declaration in separate from x86-64)

Bitten by my dislike for modules on the machines I use, and the fact
that apparently nobody else actually wanted to test the patches I sent
out.

Snif. Nobody else cares.

Anyway, we probably should uninline the 'kernel_fpu_begin()' function
that is what modules actually use and that references this, but this is
the minimal fix for now.

Reported-by: Josh Boyer <jwboyer@gmail.com>
Reported-and-tested-by: Jongman Heo <jongman.heo@samsung.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-20 19:34:10 -08:00
Steven Rostedt
a38449ef59 x86: Specify a size for the cmp in the NMI handler
Linus noticed that the cmp used to check if the code segment is
__KERNEL_CS or not did not specify a size. Perhaps it does not matter
as H. Peter Anvin noted that user space can not set the bottom two
bits of the %cs register. But it's best not to let the assembly choose
and change things between different versions of gas, but instead just
pick the size.

Four bytes are used to compare the saved code segment against
__KERNEL_CS. Perhaps this might mess up Xen, but we can fix that when
the time comes.

Also I noticed that there was another non-specified cmp that checks
the special stack variable if it is 1 or 0. This too probably doesn't
matter what cmp is used, but this patch uses cmpl just to make it non
ambiguous.

Link: http://lkml.kernel.org/r/CA+55aFxfAn9MWRgS3O5k2tqN5ys1XrhSFVO5_9ZAoZKDVgNfGA@mail.gmail.com

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-02-20 19:45:26 -05:00
Linus Torvalds
39e255dab5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  [S390] correct ktime to tod clock comparator conversion
  [S390] 3215 deadlock with tty_wakeup
  [S390] incorrect PageTables counter for kvm page tables
  [S390] idle: avoid RCU usage in extended quiescent state
2012-02-20 16:13:39 -08:00
Linus Torvalds
7e16838d94 i387: support lazy restore of FPU state
This makes us recognize when we try to restore FPU state that matches
what we already have in the FPU on this CPU, and avoids the restore
entirely if so.

To do this, we add two new data fields:

 - a percpu 'fpu_owner_task' variable that gets written any time we
   update the "has_fpu" field, and thus acts as a kind of back-pointer
   to the task that owns the CPU.  The exception is when we save the FPU
   state as part of a context switch - if the save can keep the FPU
   state around, we leave the 'fpu_owner_task' variable pointing at the
   task whose FP state still remains on the CPU.

 - a per-thread 'last_cpu' field, that indicates which CPU that thread
   used its FPU on last.  We update this on every context switch
   (writing an invalid CPU number if the last context switch didn't
   leave the FPU in a lazily usable state), so we know that *that*
   thread has done nothing else with the FPU since.

These two fields together can be used when next switching back to the
task to see if the CPU still matches: if 'fpu_owner_task' matches the
task we are switching to, we know that no other task (or kernel FPU
usage) touched the FPU on this CPU in the meantime, and if the current
CPU number matches the 'last_cpu' field, we know that this thread did no
other FP work on any other CPU, so the FPU state on the CPU must match
what was saved on last context switch.

In that case, we can avoid the 'f[x]rstor' entirely, and just clear the
CR0.TS bit.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-20 10:58:54 -08:00
Linus Torvalds
80ab6f1e8c i387: use 'restore_fpu_checking()' directly in task switching code
This inlines what is usually just a couple of instructions, but more
importantly it also fixes the theoretical error case (can that FPU
restore really ever fail? Maybe we should remove the checking).

We can't start sending signals from within the scheduler, we're much too
deep in the kernel and are holding the runqueue lock etc.  So don't
bother even trying.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-20 10:58:28 -08:00
Linus Torvalds
cea20ca3f3 i387: fix up some fpu_counter confusion
This makes sure we clear the FPU usage counter for newly created tasks,
just so that we start off in a known state (for example, don't try to
preload the FPU state on the first task switch etc).

It also fixes a thinko in when we increment the fpu_counter at task
switch time, introduced by commit 34ddc81a23 ("i387: re-introduce FPU
state preloading at context switch time").  We should increment the
*new* task fpu_counter, not the old task, and only if we decide to use
that state (whether lazily or preloaded).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-20 10:24:09 -08:00
Konrad Rzeszutek Wilk
8eaffa67b4 xen/pat: Disable PAT support for now.
[Pls also look at https://lkml.org/lkml/2012/2/10/228]

Using of PAT to change pages from WB to WC works quite nicely.
Changing it back to WB - not so much. The crux of the matter is
that the code that does this (__page_change_att_set_clr) has only
limited information so when it tries to the change it gets
the "raw" unfiltered information instead of the properly filtered one -
and the "raw" one tell it that PSE bit is on (while infact it
is not).  As a result when the PTE is set to be WB from WC, we get
tons of:

:WARNING: at arch/x86/xen/mmu.c:475 xen_make_pte+0x67/0xa0()
:Hardware name: HP xw4400 Workstation
.. snip..
:Pid: 27, comm: kswapd0 Tainted: G        W    3.2.2-1.fc16.x86_64 #1
:Call Trace:
: [<ffffffff8106dd1f>] warn_slowpath_common+0x7f/0xc0
: [<ffffffff8106dd7a>] warn_slowpath_null+0x1a/0x20
: [<ffffffff81005a17>] xen_make_pte+0x67/0xa0
: [<ffffffff810051bd>] __raw_callee_save_xen_make_pte+0x11/0x1e
: [<ffffffff81040e15>] ? __change_page_attr_set_clr+0x9d5/0xc00
: [<ffffffff8114c2e8>] ? __purge_vmap_area_lazy+0x158/0x1d0
: [<ffffffff8114cca5>] ? vm_unmap_aliases+0x175/0x190
: [<ffffffff81041168>] change_page_attr_set_clr+0x128/0x4c0
: [<ffffffff81041542>] set_pages_array_wb+0x42/0xa0
: [<ffffffff8100a9b2>] ? check_events+0x12/0x20
: [<ffffffffa0074d4c>] ttm_pages_put+0x1c/0x70 [ttm]
: [<ffffffffa0074e98>] ttm_page_pool_free+0xf8/0x180 [ttm]
: [<ffffffffa0074f78>] ttm_pool_mm_shrink+0x58/0x90 [ttm]
: [<ffffffff8112ba04>] shrink_slab+0x154/0x310
: [<ffffffff8112f17a>] balance_pgdat+0x4fa/0x6c0
: [<ffffffff8112f4b8>] kswapd+0x178/0x3d0
: [<ffffffff815df134>] ? __schedule+0x3d4/0x8c0
: [<ffffffff81090410>] ? remove_wait_queue+0x50/0x50
: [<ffffffff8112f340>] ? balance_pgdat+0x6c0/0x6c0
: [<ffffffff8108fb6c>] kthread+0x8c/0xa0

for every page. The proper fix for this is has been posted
and is https://lkml.org/lkml/2012/2/10/228
"x86/cpa: Use pte_attrs instead of pte_flags on CPA/set_p.._wb/wc operations."
along with a detailed description of the problem and solution.

But since that posting has gone nowhere I am proposing
this band-aid solution so that at least users don't get
the page corruption (the pages that are WC don't get changed to WB
and end up being recycled for filesystem or other things causing
mysterious crashes).

The negative impact of this patch is that users of WC flag
(which are InfiniBand, radeon, nouveau drivers) won't be able
to set that flag - so they are going to see performance degradation.
But stability is more important here.

Fixes RH BZ# 742032, 787403, and 745574
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-02-20 10:41:35 -05:00
Konrad Rzeszutek Wilk
416d721474 xen/setup: Remove redundant filtering of PTE masks.
commit 7347b4082e "xen: Allow
unprivileged Xen domains to create iomap pages" added a redundant
line in the early bootup code to filter out the PTE. That
filtering is already done a bit earlier so this extra processing
is not required.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-02-20 10:40:54 -05:00
Steven Rostedt
45d5a1683c x86/nmi: Test saved %cs in NMI to determine nested NMI case
Currently, the NMI handler tests if it is nested by checking the
special variable saved on the stack (set during NMI handling)
and whether the saved stack is the NMI stack as well (to prevent
the race when the variable is set to zero).

But userspace may set their %rsp to any value as long as they do
not derefence it, and it may make it point to the NMI stack,
which will prevent NMIs from triggering while the userspace app
is running. (I tested this, and it is indeed the case)

Add another check to determine nested NMIs by looking at the
saved %cs (code segment register) and making sure that it is the
kernel code segment.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1329687817.1561.27.camel@acer.local.home
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-02-20 09:09:57 +01:00
Linus Torvalds
be2874cb4e These are the bug fixes that have accumulated since 3.3-rc3 in arm-soc.
The majority of them are regression fixes for stuff that broke during
 the merge 3.3 window.
 
 The notable ones are:
 
 * The at91 ata drivers both broke because of an earlier cleanup patch that
   some other patches were based on. Jean-Christophe decided to remove
   the legacy at91_ide driver and fix the new-style at91-pata driver while
   keeping the cleanup patch. I almost rejected the patches for being too
   late and too big but in the end decided to accept them because they
   fix a regression.
 
 * A patch fixing build breakage from the sysdev-to-device conversion
   colliding with other changes touches a number of mach-s3c files.
 
 * b0654037 "ARM: orion: Fix Orion5x GPIO regression from MPP cleanup"
   is a mechanical change that unfortunately touches a lot of lines
   that should up in the diffstat.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIVAwUATztTM2CrR//JCVInAQJu3Q/+KN4npDjjJbRm1FR4J+z7dEy3631gt7Ku
 M64JuC2259da0AtXlHXoc8XB7ZrBkMR2k1n+Q42FqUFVILOXcrHSTId6osPQ8WYE
 TGWR0E2APP6/w4YH3dz0aTUauX0HhnWNP4ShWalWxw2Zsc1nhPNcMO3k57E/PNnp
 nUHb2ZR+Huqk9Eje6/Vkr7OQq7dhl0KJvITJKCT1H93vVYZc5l2O5ZytcOC3dsFg
 yMP/btmu9JlCenOwoKcQFv6ug0tWAYiY4ALqQujLN0kcf7rmjLLOG2HQrnycmeh3
 gv9jwK04gYxHkhPbCLCgO/bg906LVcYIl9/TY7jXK8oE4kR0vVxdQOWEzKIX5+KO
 dyAuwy3uGi4szG8f1DKnz1h7vR1MEyBVgQ+yRqnfhLh7mFmuZcOlGTzziD3csDXG
 qd5B2xf9WvLupfpbvgnHUUKEIJVfWPDoJeN3jGCOjd4+j8OzPR6yeAtU85TDQzIx
 IKs2x+0zrYMBre3R+m5vb9v3IhPb1wZU29eXXRzDmLuHJDM00Qc8LmpiWUoeu3cX
 DwuLstYLm8EhWN+LnjAABd3mKeR5tyBojK3EsDFRxIfz3mKHVNEAPE6Iky60Lfwr
 Pq+LgBBftFfcct70UyXWSK7UI92suavDgCHVejIxpbvIWF1UVY7S1mgmflZ1WZAL
 R5tdx6oe5Y4=
 =QYVn
 -----END PGP SIGNATURE-----

Merge tag 'fixes-3.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

These are the bug fixes that have accumulated since 3.3-rc3 in arm-soc.
The majority of them are regression fixes for stuff that broke during
the merge 3.3 window.

The notable ones are:

* The at91 ata drivers both broke because of an earlier cleanup patch that
  some other patches were based on. Jean-Christophe decided to remove
  the legacy at91_ide driver and fix the new-style at91-pata driver while
  keeping the cleanup patch. I almost rejected the patches for being too
  late and too big but in the end decided to accept them because they
  fix a regression.

* A patch fixing build breakage from the sysdev-to-device conversion
  colliding with other changes touches a number of mach-s3c files.

* b0654037 "ARM: orion: Fix Orion5x GPIO regression from MPP cleanup"
  is a mechanical change that unfortunately touches a lot of lines
  that should up in the diffstat.

* tag 'fixes-3.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (28 commits)
  ARM: at91: drop ide driver in favor of the pata one
  pata/at91: use newly introduced SMC accessors
  ARM: at91: add accessor to manage SMC
  ARM: at91:rtc/rtc-at91sam9: ioremap register bank
  ARM: at91: USB AT91 gadget registration for module
  ep93xx: fix build of vision_ep93xx.c
  ARM: OMAP2xxx: PM: fix OMAP2xxx-specific UART idle bug in v3.3
  ARM: orion: Fix USB phy for orion5x.
  ARM: orion: Fix Orion5x GPIO regression from MPP cleanup
  ARM: EXYNOS: Add cpu-offset property in gic device tree node
  ARM: EXYNOS: Bring exynos4-dt up to date
  ARM: OMAP3: cm-t35: fix section mismatch warning
  ARM: OMAP2: Fix the OMAP2 only build break seen with 2011+ ARM tool-chains
  ARM: tegra: paz00: fix wrong UART port on mini-pcie plug
  ARM: tegra: paz00: fix wrong SD1 power gpio
  i2c: tegra: Add devexit_p() for remove
  ARM: EXYNOS: Correct M-5MOLS sensor clock frequency on Universal C210 board
  ARM: EXYNOS: Correct framebuffer window size on Nuri board
  ARM: SAMSUNG: Fix missing api-change from subsys_interface change
  ARM: EXYNOS: Fix "warning: initialization from incompatible pointer type"
  ...
2012-02-18 15:40:00 -08:00
Linus Torvalds
06ca7c4376 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Here are a few more fixes for powerpc.  Some are regressions, the rest
is simple/obvious/nasty enough that I deemed it good to go now.

Here's also step one of deprecating legacy iSeries support: we are
removing it from the main defconfig.

Nobody seems to be using it anymore and the code is nasty to maintain,
(involves horrible hacks in various low level areas of the kernel) so we
plan to actually rip it out at some point.  For now let's just avoid
building it by default.  Stephen will proceed to do the actual removal
later (probably 3.4 or 3.5).

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc/perf: power_pmu_start restores incorrect values, breaking frequency events
  powerpc/adb: Use set_current_state()
  powerpc: Disable interrupts early in Program Check
  powerpc: Remove legacy iSeries from ppc64_defconfig
  powerpc/fsl/pci: Fix PCIe fixup regression
  powerpc: Fix kernel log of oops/panic instruction dump
2012-02-18 15:26:37 -08:00
Linus Torvalds
34ddc81a23 i387: re-introduce FPU state preloading at context switch time
After all the FPU state cleanups and finally finding the problem that
caused all our FPU save/restore problems, this re-introduces the
preloading of FPU state that was removed in commit b3b0870ef3 ("i387:
do not preload FPU state at task switch time").

However, instead of simply reverting the removal, this reimplements
preloading with several fixes, most notably

 - properly abstracted as a true FPU state switch, rather than as
   open-coded save and restore with various hacks.

   In particular, implementing it as a proper FPU state switch allows us
   to optimize the CR0.TS flag accesses: there is no reason to set the
   TS bit only to then almost immediately clear it again.  CR0 accesses
   are quite slow and expensive, don't flip the bit back and forth for
   no good reason.

 - Make sure that the same model works for both x86-32 and x86-64, so
   that there are no gratuitous differences between the two due to the
   way they save and restore segment state differently due to
   architectural differences that really don't matter to the FPU state.

 - Avoid exposing the "preload" state to the context switch routines,
   and in particular allow the concept of lazy state restore: if nothing
   else has used the FPU in the meantime, and the process is still on
   the same CPU, we can avoid restoring state from memory entirely, just
   re-expose the state that is still in the FPU unit.

   That optimized lazy restore isn't actually implemented here, but the
   infrastructure is set up for it.  Of course, older CPU's that use
   'fnsave' to save the state cannot take advantage of this, since the
   state saving also trashes the state.

In other words, there is now an actual _design_ to the FPU state saving,
rather than just random historical baggage.  Hopefully it's easier to
follow as a result.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-18 14:03:48 -08:00
Linus Torvalds
f94edacf99 i387: move TS_USEDFPU flag from thread_info to task_struct
This moves the bit that indicates whether a thread has ownership of the
FPU from the TS_USEDFPU bit in thread_info->status to a word of its own
(called 'has_fpu') in task_struct->thread.has_fpu.

This fixes two independent bugs at the same time:

 - changing 'thread_info->status' from the scheduler causes nasty
   problems for the other users of that variable, since it is defined to
   be thread-synchronous (that's what the "TS_" part of the naming was
   supposed to indicate).

   So perfectly valid code could (and did) do

	ti->status |= TS_RESTORE_SIGMASK;

   and the compiler was free to do that as separate load, or and store
   instructions.  Which can cause problems with preemption, since a task
   switch could happen in between, and change the TS_USEDFPU bit. The
   change to TS_USEDFPU would be overwritten by the final store.

   In practice, this seldom happened, though, because the 'status' field
   was seldom used more than once, so gcc would generally tend to
   generate code that used a read-modify-write instruction and thus
   happened to avoid this problem - RMW instructions are naturally low
   fat and preemption-safe.

 - On x86-32, the current_thread_info() pointer would, during interrupts
   and softirqs, point to a *copy* of the real thread_info, because
   x86-32 uses %esp to calculate the thread_info address, and thus the
   separate irq (and softirq) stacks would cause these kinds of odd
   thread_info copy aliases.

   This is normally not a problem, since interrupts aren't supposed to
   look at thread information anyway (what thread is running at
   interrupt time really isn't very well-defined), but it confused the
   heck out of irq_fpu_usable() and the code that tried to squirrel
   away the FPU state.

   (It also caused untold confusion for us poor kernel developers).

It also turns out that using 'task_struct' is actually much more natural
for most of the call sites that care about the FPU state, since they
tend to work with the task struct for other reasons anyway (ie
scheduling).  And the FPU data that we are going to save/restore is
found there too.

Thanks to Arjan Van De Ven <arjan@linux.intel.com> for pointing us to
the %esp issue.

Cc: Arjan van de Ven <arjan@linux.intel.com>
Reported-and-tested-by: Raphael Prevost <raphael@buro.asia>
Acked-and-tested-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-18 10:19:41 -08:00